System Design
  • Introduction
  • System Design Process
  • System Design Systematic Approach
  • System Design Topics
  • System Design Interview Tips
  • Object Oriented Design
  • System Design Problems
    • Designing an API Rate Limiter
    • Design News Feed
    • Design Recommendation System
    • Design Photo Sharing App
    • Design Location Based App
    • Design Messenger App
    • Design Twitter
    • Design Uber Lyft
    • Design Surge Pricing
  • Architect's Toolbox
    • Cache Design
    • Database and Cache
    • Pull vs Poll
    • Geo Location
    • Storage Estimation
    • ID Generator
    • Latency Numbers
    • Encoding Decoding Encryption Decryption
  • Systems Design Glossary
    • Consistent Hashing
    • Sharding or Partitioning
    • Database Indexes
    • Proxies
    • Caching
    • Queues
    • SQL vs. NoSQL
    • CAP Theorem
    • Distributed Messaging System
    • Long-Polling vs WebSockets vs Server-Sent Events
    • Producer and Consumer
    • Latency, Bandwidth and Throughput
    • Microservices Architecture
    • RESTful API
    • Concurrent Programming
  • Distributed System Resources
    • Distributed System Notes
  • Reference
Powered by GitBook
On this page
  • 1. Partitioning Methods
  • 2. Partitioning Criteria
  • 3. Common Problems of Sharding

Was this helpful?

  1. Systems Design Glossary

Sharding or Partitioning

Excerpt from Grokking System Design

1. Partitioning Methods

a. Horizontal partitioning (Row):

In this scheme, we put different rows into different tables.

b. Vertical Partitioning (Column):

In this scheme, we divide our data to store tables related to a specific feature to their own server.

c. Directory Based Partitioning (Custom Mapping - loosely coupled approach):

Create a lookup service which knows your current partitioning scheme and abstracts it away from the DB access code.

2. Partitioning Criteria

a. Key or Hash-based partitioning

Under this scheme, we apply a hash function to some key attribute of the entity we are storing, that yields the partition number.

CON: The fundamental problem with this approach is that it effectively fixes the total number of DB servers, since adding new servers means changing the hash function which would require redistribution of data and downtime for the service. Workaround: Consistent Hashing

b. List partitioning

In this scheme, each partition is assigned a list of values.

c. Round-robin partitioning

This is a very simple strategy that ensures uniform data distribution. With ‘n’ partitions, the ‘i’ tuple is assigned to partition (i mod n).

d. Composite partitioning:

Under this scheme, we combine any of above partitioning schemes to devise a new scheme.

3. Common Problems of Sharding

a. Joins and Denormalization:

Performing joins on a database which is running on one server is straightforward, but once a database is partitioned and spread across multiple machines it is often not feasible to perform joins that span database shards.

b. Referential integrity:

As we saw that performing a cross-shard query on a partitioned database is not feasible, similarly trying to enforce data integrity constraints such as foreign keys in a sharded database can be extremely difficult.

c. Rebalancing:

There could be many reasons we have to change our sharding scheme:

  • The data distribution is not uniform

  • There are a lot of load on a shard

PreviousConsistent HashingNextDatabase Indexes

Last updated 5 years ago

Was this helpful?