> For the complete documentation index, see [llms.txt](https://aaronice.gitbook.io/system-design/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://aaronice.gitbook.io/system-design/distributed-systems/sharding-or-partitioning.md).

# Sharding or Partitioning

Excerpt from Grokking System Design

## 1. Partitioning Methods

**a. Horizontal partitioning (Row):**

In this scheme, we put different rows into different tables.

**b. Vertical Partitioning (Column):**

In this scheme, we divide our data to store tables related to a specific feature to their own server.

**c. Directory Based Partitioning (Custom Mapping - loosely coupled approach):**

Create a lookup service which knows your current partitioning scheme and abstracts it away from the DB access code.

## 2. Partitioning Criteria

**a. Key or Hash-based partitioning**

Under this scheme, we apply a hash function to some key attribute of the entity we are storing, that yields the partition number.

CON: The fundamental problem with this approach is that it effectively fixes the total number of DB servers, since adding new servers means changing the hash function which would require redistribution of data and downtime for the service. Workaround: Consistent Hashing

**b. List partitioning**

In this scheme, each partition is assigned a list of values.

**c. Round-robin partitioning**

This is a very simple strategy that ensures uniform data distribution. With ‘n’ partitions, the ‘i’ tuple is assigned to partition (i mod n).

**d. Composite partitioning:**

Under this scheme, we combine any of above partitioning schemes to devise a new scheme.

## 3. Common Problems of Sharding

**a. Joins and Denormalization:**

Performing joins on a database which is running on one server is straightforward, but once a database is partitioned and spread across multiple machines it is often not feasible to perform joins that span database shards.

**b. Referential integrity:**

As we saw that performing a cross-shard query on a partitioned database is not feasible, similarly trying to enforce data integrity constraints such as foreign keys in a sharded database can be extremely difficult.

**c. Rebalancing:**

There could be many reasons we have to change our sharding scheme:

* The data distribution is not uniform
* There are a lot of load on a shard


---

# Agent Instructions
This documentation is published with GitBook. GitBook is the documentation platform designed so that both humans and AI agents can read, navigate, and reason over technical content effectively. Learn more at gitbook.com.

## Querying This Documentation
If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter, and the optional `goal` query parameter:

```
GET https://aaronice.gitbook.io/system-design/distributed-systems/sharding-or-partitioning.md?ask=<question>&goal=<endgoal>
```

`ask` is the immediate question: it should be specific, self-contained, and written in natural language.
`goal` is optional and describes the broader end goal you are ultimately trying to accomplish on behalf of the user. GitBook uses it to tailor the answer towards what is most useful for that goal.

The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
