A collection of resources that discuss database sharding.
- https://en.wikipedia.org/wiki/Shard_(database_architecture)
- [Feb 2019] http://highscalability.com/blog/2019/2/19/intro-to-redis-cluster-sharding-advantages-limitations-deplo.html
Redis Cluster is the native sharding implementation available within Redis that allows you to automatically distribute your data across multiple nodes without having to rely on external tools and utilities. The article covers sharding with Redis Cluster, where a Redis Cluster is divided into 16384 slots and these slots are assigned to multiple Redis nodes.
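
As a rough illustration of the slot mapping described above, the sketch below computes a key's slot the way the Redis Cluster Specification describes it: CRC16 (XMODEM variant) of the key, modulo 16384, hashing only the hash-tag content when the key contains a non-empty `{...}` section. The helper name `key_slot` is made up for this example; real client libraries do this for you.

```python
import binascii

SLOT_COUNT = 16384  # number of hash slots in a Redis Cluster


def key_slot(key: str) -> int:
    """Return the hash slot for a key (illustrative helper, not a Redis API)."""
    data = key.encode("utf-8")
    # Hash tags: if the key contains "{...}" with non-empty content, only that
    # part is hashed, so related keys can be forced into the same slot.
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        if end != -1 and end != start + 1:
            data = data[start + 1:end]
    # binascii.crc_hqx with an initial value of 0 is the CRC16-CCITT (XMODEM) variant.
    return binascii.crc_hqx(data, 0) % SLOT_COUNT


if __name__ == "__main__":
    for k in ("foo", "hello", "{user1000}.following", "{user1000}.followers"):
        print(k, key_slot(k))
```

The two `{user1000}` keys land in the same slot because only `user1000` is hashed, which is what makes multi-key operations on related keys possible in a cluster.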
The Redis Cluster Specification is the definitive guide to understanding the internals of the technology, while the Redis Cluster Tutorial provides deployment and administration guidelines.
- [Jan 2019] https://scalegrid.io/blog/scalegrid-hosting-adds-support-for-highly-available-redis-clusters-with-automated-sharding/
ScaleGrid: a fully managed Database as a Service, including a fully managed offering of the native sharding implementation available within Redis.
ScaleGrid hosting adds support for highly available Redis Clusters with automated sharding.
- [October 2010] Problems in Sharding: Lessons from the Foursquare Incident (17 Hours of Downtime).
- MongoDB: the database Foursquare uses, which contributed to the incident.
- Foursquare ran MongoDB on 2 EC2 nodes with 66 GB of RAM.
Data is replicated to slaves for redundancy.
MongoDB: a document-oriented database that stores data in JSON format; indexes can be created on any field, and it supports auto-sharding.
- Data was not distributed evenly over the two shards. One grew to 67 GB, larger than RAM, and the other to 50 GB.
- Page faults started occurring due to demand paging once RAM was exhausted.
The whole system slowed down because disks are slow.
Queries became slow, causing a backlog of operations, which caused more demand paging, and so the whole system went down.
- A third shard was added, yet the first was still hitting disk.
Reason: Foursquare check-ins are small (about 300 bytes), while MongoDB uses 4K pages.
When a check-in is moved, its 4K page stays allocated; memory becomes available only when all the data has been moved off a page, i.e.
a 300-byte check-in still ties up a whole 4K page.
- MongoDB's compaction feature was turned on, but compaction was slow because of the size of the data and because of EC2's network disk storage.
EBS is relatively slow.
- Solution: reload the system from backups so that the data was resharded across three shards. All data now fits in physical RAM.
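
The page-size reasoning in the incident above can be made concrete with a little arithmetic. The sketch below is a simplified model, not a description of MongoDB's storage engine: it assumes pages are fully packed with 300-byte records and that the records moved to the new shard are picked independently at random. Under those assumptions a page is freed only when every record on it has been moved. The function names are invented for the example.

```python
def records_per_page(page_bytes: int, record_bytes: int) -> int:
    """How many whole records fit on one page."""
    return page_bytes // record_bytes


def fraction_of_pages_freed(fraction_moved: float, per_page: int) -> float:
    """Probability that a page is completely emptied.

    Assumes each record is moved independently with probability
    `fraction_moved`; the page is freed only if all of its records move.
    """
    return fraction_moved ** per_page


if __name__ == "__main__":
    per_page = records_per_page(4096, 300)  # 4K pages, ~300-byte check-ins
    print("records per page:", per_page)
    for moved in (0.05, 0.33, 0.5, 0.9):
        freed = fraction_of_pages_freed(moved, per_page)
        print(f"moved {moved:.0%} of records -> {freed:.6%} of pages freed")
```

Even moving half of the records frees almost no pages in this model, which is consistent with the first shard still hitting disk after the third shard was added and with the need for compaction or a full reload.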
- [October 2010] Cache and System Design Blog with Redis : Focusing on Demand Paging and Page Faults.
- The Goal of sharding is to continually structure a system to ensure data is chunked in small enough units and spread across enough nodes that data operations are not limited by resource constraints.
- Some of the issues we should be aware of with sharding:
- Node Exhaustion : Hardware resources exhausted
- Shard Exhaustion : Capacity of Shard is exceeded.
- Shard Allocation Imbalance : Uneven spread of data across a shard set.
- Hot Key : Excessive access for a particular key or set of Keys.
- Small Shard Set : Data is stored in too few shards.
- Slow Recovery Operations
- Cloud Bursting : How quickly can your system respond to cloud bursts ?
- Fragmentation :
- Naive Disaster Handling, e.g. when you add shards, are you required to add all the data again?
- Excessive Paging
- Single Points of Failure
- Data Opacity : Data is not observable
- Reliance on developer omniscience
- Excessive Locking
- Slow IO
- Transactional Model Mismatch
- Poor Operations Model
- Insufficient Pool Capacity
- Slow bulk import
- Object Size and other limits
- High Latency and Poor Connection sensitivity
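
Several of the issues listed above (shard exhaustion, shard allocation imbalance, hot keys) can be spotted with very simple checks. The following is a minimal sketch of such checks; the function names and the thresholds are assumptions made for illustration, and the sample numbers are the shard sizes and RAM figure from the Foursquare notes above.

```python
from collections import Counter
from typing import Iterable, List, Mapping


def imbalance_ratio(shard_sizes: Mapping[str, float]) -> float:
    """Largest shard size divided by the mean shard size (1.0 = perfectly even)."""
    if not shard_sizes:
        raise ValueError("need at least one shard")
    mean = sum(shard_sizes.values()) / len(shard_sizes)
    if mean == 0:
        return 1.0  # all shards empty: treat as balanced
    return max(shard_sizes.values()) / mean


def shards_over_capacity(shard_sizes: Mapping[str, float], capacity: float) -> List[str]:
    """Names of shards whose data no longer fits in `capacity` (e.g. RAM in GB)."""
    return sorted(name for name, size in shard_sizes.items() if size > capacity)


def hot_keys(accesses: Iterable[str], share_threshold: float = 0.2) -> List[str]:
    """Keys that receive more than `share_threshold` of all accesses."""
    counts = Counter(accesses)
    total = sum(counts.values())
    return sorted(k for k, c in counts.items() if c / total > share_threshold)


if __name__ == "__main__":
    sizes_gb = {"shard0": 67.0, "shard1": 50.0}
    print("imbalance ratio:", round(imbalance_ratio(sizes_gb), 3))
    print("over 66 GB of RAM:", shards_over_capacity(sizes_gb, 66.0))
    print("hot keys:", hot_keys(["a"] * 8 + ["b", "c"], share_threshold=0.5))
```

In practice these numbers would come from monitoring rather than hard-coded dictionaries; the point is only that imbalance and exhaustion are measurable well before a shard starts paging.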
- Possible Responses to Troubled Shards
- Use more powerful nodes.
- Use a larger number of nodes.
- Move a shard to a different node.
- Move keys from a shard to another shard or into its own shard.
- Add more shards and move existing data to those shards.
- Condition Traffic.
- Dark-mode switches
- Read-only disaster mode
- Reload from a backup
- Key Choice
- Key Mapping: there are ways to map keys to shards other than hashing. Example: Flickr.
- Replicate the data and load balance requests across replicas
- Monitor
- Automation
- Background Reindexing and defragmentation
- Capacity Planning
- Selection
- Test
- Figure a way to get out of Technical Debt
- Better algorithm selection
- Separate Historical and Real Time data
- Use more compact data structures
- Faster IO
- Think about SPOF.
- Figure out how you will handle downtime.
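
Two of the responses listed above, mapping keys by something other than hashing and moving keys from one shard to another, can be illustrated with a lookup-table (directory) mapping. This is a minimal sketch under stated assumptions: the `ShardDirectory` class and its methods are hypothetical, it tracks only the mapping and not the data copy itself, and it does not claim to reproduce how Flickr or any other site implements its mapping.

```python
from typing import Dict, Iterable, List


class ShardDirectory:
    """Lookup-table key mapping: every key is pinned to a shard by an explicit entry."""

    def __init__(self, shards: Iterable[str]) -> None:
        self.shards: List[str] = list(shards)
        if not self.shards:
            raise ValueError("need at least one shard")
        self.assignment: Dict[str, str] = {}
        self.load: Dict[str, int] = {s: 0 for s in self.shards}

    def shard_for(self, key: str) -> str:
        """Return the key's shard, assigning new keys to the least-loaded shard."""
        if key not in self.assignment:
            target = min(self.shards, key=lambda s: self.load[s])
            self.assignment[key] = target
            self.load[target] += 1
        return self.assignment[key]

    def add_shard(self, shard: str) -> None:
        """Add a shard; existing keys keep their current assignment."""
        if shard in self.load:
            raise ValueError(f"shard {shard!r} already exists")
        self.shards.append(shard)
        self.load[shard] = 0

    def move_key(self, key: str, shard: str) -> None:
        """Re-pin one key (e.g. a hot key) to another shard.

        A real system would also have to copy the key's data before switching.
        """
        if shard not in self.load:
            raise KeyError(shard)
        old = self.shard_for(key)
        self.load[old] -= 1
        self.assignment[key] = shard
        self.load[shard] += 1


if __name__ == "__main__":
    directory = ShardDirectory(["s0", "s1"])
    print(directory.shard_for("user:1"), directory.shard_for("user:2"))
    directory.add_shard("s2")          # no existing key moves
    directory.move_key("user:1", "s2")  # explicit, per-key rebalancing
    print(directory.assignment)
```

The trade-off is that the directory itself becomes something to scale and protect (it can turn into a single point of failure), whereas hash-based mapping needs no lookup but moves keys whenever the shard count changes.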
- The two articles below are exact copies of one another; the DigitalOcean one came first.
{
https://www.digitalocean.com/community/tutorials/understanding-database-sharding
https://medium.com/@varshney.shivam786/database-sharding-system-design-1-63203f494fc0
}
- https://www.educative.io/courses/grokking-the-system-design-interview/mEN8lJXV1LA
- Articles with Sharding as Tag
- [January 2008] http://highscalability.com/blog/2008/1/10/sharding-with-cookie-based-session-storage.html
- [April 2008] How was PostgreSQL used as the production DB at Skype?
It has some interesting related resources. Worth reading.