Database Sharding

Wednesday, September 27, 2023

Database Sharding

What is Database Sharding?

Any application or website that sees significant growth will need to scale to take care of the increase in traffic. Some organisation's choose to scale their databases dynamically.

Sharding is a database architecture pattern related to horizontal partitioning.

Separating a table's rows into multiple different tables (which are known as partitions)
Each partition will have the same table and schema but the rows would be different
Data in each of the rows is different
Data is broken up into smaller sets known as shards.
These shards are then pushed into multiple distributed database nodes.
All the data collectively in all these shards represent the entire dataset.
This is also known as scaling out.
We can add more machines as the load increases.
Leads to faster processing.

Vertical partitioning

Columns are separated and put into new tables
Data is unique but there is a common column in all tables to align the data
Known as scaling up
Upgrade hardware, CPU and memory.

Benefits

As the table grows, queries slow down.
With horizontal sharding, queries go through fewer rows and hence are faster.
If there is an outage, only a single shard may get impacted and while it may impact the application, the overall impact would be less as compared to the whole DB being down.

Drawbacks

Complex
Very difficult to return to unsharded/original form
Unbalanced shards

In case the sharding was done by name e.g. A-H, I-P and S-Z
If the names are more in A-H and less in S-Z, it would lead to unbalanced shards
Application queries to A-H will slow down
Thus, A-H shard is known as database hotspot.

Not supported by all DBs like PostgreSQL

Sharding Architectures

Some of the Architectures that are important for all to be aware of:

Range Based sharding

Sharding data based on a range (obvious by the name)
Relatively simple to implement
Every shard has unique set of data
Read and write from the shard where the range falls in.
As expected, this can lead to unbalanced data as well.

Directory Based sharding

Create a lookup table that will contain a mapping of a Key and shard ID.
Do note that the key here should have low cardinality (less # of possible values)
Application needs to have another query to get shard ID from the lookup table.
Can the lookup table become a point of failure? Yes, it can.

Key Based Sharding

Also known as Hash Based Sharding
Creates a hash value from a key and that hash value represents the Shard ID
The shard key value should be static and not change with time
If data grows and more shards need to be added, effort would be needed to add corresponding hash and do remapping of existing values.

My Learning Cafe

Wednesday, September 27, 2023

Database Sharding

No comments:

Post a Comment