# ID Generator

## (Random) ID Genrator

### Requirements

Usually there are a few requirements for ID generators:

* **They cannot be arbitrarily long.** Let’s say we keep it within **64 bits**.
* **ID is incremented by date.** This gives the system a lot of flexibility, e.g. you can **sort** users by ID, which is same as ordering by register date.

### Approaches

#### Single Machine

Start with something simple and keep optimizing it later, which is more true when the question is broad.

**How:**

In the simplest case, we can just keep incrementing ID from 1, 2, 3 … N, which in fact is one of the most popular ways to generate ID in many real life projects. If user A’s ID is larger than user B, then we know that A registered later.

**Pros:**

Start with single machine design, not only is this easy to design, but it’s good enough for many of the cases.

**Cons:**

hard to scale

#### 3rd party service

**How:**

Keep a single separate server that is only responsible for ID generation. More specifically, when a user registers to the product, for whichever database that handles this request, it’ll connect to the 3rd party server to ask for a random ID.

**Pros:**

Since all the ID generation is handled in a single server, there’s no risk of generating duplicate IDs.

**Cons:**

3rd party server will soon become the bottleneck

#### Multiple machine solution

Example: Open Source ID Generator -- Snowflake from Twitter

**How:**

Since within a single timestamp there can also be multiple users, we could solve this with two approaches.

1. We assign a **server ID** to each ID generation server and the final ID is a combination of timestamp and the server ID.
2. We can also allow multiple requests within a single **timestamp** on a single server. We can keep a **counter** on each server, which indicates how many IDs have been generated in the current timestamp. So the **final ID is a combination of timestamp, serverID and the counter.**

```
Final ID = Timestamp + ServerID + Counter
```

**Limitation for counter**: since the ID cannot be arbitarily long, the counter may end up with only **8bits** for instance. In this case, the server can only handle **256** **requests** **within** a **single** **timestamp** at most. -- If it frequently exceeds this limit, we need to add more instances.

## Clock synchronization

There’s a hidden assumption that all ID generation servers have the **same** **clock** to generate the timestamp, which might not be true in distributed systems. In reality, system clocks can drastically skew in distributed systems

## Industry Practices

* **Flickr’s** [**ticket servers**](http://code.flickr.net/2010/02/08/ticket-servers-distributed-unique-primary-keys-on-the-cheap/)
* **Twitter open sourced their ID generator called** [**Snowflake**](https://blog.twitter.com/2010/announcing-snowflake)
* **Instagram:** [**Sharding & IDs at Instagram**](https://instagram-engineering.com/sharding-ids-at-instagram-1cf5a71e5a5c) **(✨）**
  * Before starting out, we listed out what **features** were essential in our system:
    1. **Generated IDs** should be **sortable** by **time** (so a list of photo IDs, for example, could be sorted without fetching more information about the photos)
    2. **IDs** should ideally be **64 bits** (for smaller indexes, and better storage in systems like Redis)
    3. The system should introduce as few new ‘moving parts’ as possible — a large part of how we’ve been able to scale Instagram with very few engineers is by choosing simple, easy-to-understand solutions that we trust.
  * **Each of our IDs consists of:**
    * 41 bits for **time** **in** **milliseconds** (gives us 41 years of IDs with a custom epoch)
    * 13 bits that represent the **logical** **shard ID**
    * 10 bits that represent an **auto-incrementing sequence**, modulus 1024. This means we can generate 1024 IDs, per shard, per millisecond
  * **Example**: September 9th, 2011, at 5:00pm, which is 1387263000 **milliseconds** since the beginning of our epoch
    * To start our ID, we fill the left-most 41 bits with this value (**epoch** **timestamp in ms**) with a **left-shift**
      * `id = 1387263000 <<(64-41)`
    * Next, take the **shard ID** for this particular piece of data we’re trying to insert. Let’s say we’re **sharding** by **user ID**, and there are 2000 **logical** **shards**; if our user ID is 31341, then the shard ID is 31341 % 2000 -> 1341. We fill the next 13 bits with this value:
      * `id |= 1341 <<(64-41-13)`
    * Finally, we take whatever the next value of our **auto-increment sequence** (this sequence is unique to each table in each schema) and fill out the remaining bits. Let’s say we’d generated 5,000 IDs for this table already; our next value is 5,001, which we take and mod by 1024 (so it fits in 10 bits)
      * `id |= (5001 % 1024)`

.

## Reference

Random ID Generator - Gainlo Blog: <http://blog.gainlo.co/index.php/2016/06/07/random-id-generator/>

Instagram Engineering Blog: [Sharding & IDs at Instagram](https://instagram-engineering.com/sharding-ids-at-instagram-1cf5a71e5a5c). (👍)

[Generating unique IDs in a distributed environment at high scale.](https://www.callicoder.com/distributed-unique-id-sequence-number-generator/)


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://aaronice.gitbook.io/system-design/architecture-toolbox/id-generator.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
