System Design
  • Introduction
  • System Design Process
  • System Design Systematic Approach
  • System Design Topics
  • System Design Interview Tips
  • Object Oriented Design
  • System Design Problems
    • Designing an API Rate Limiter
    • Design News Feed
    • Design Recommendation System
    • Design Photo Sharing App
    • Design Location Based App
    • Design Messenger App
    • Design Twitter
    • Design Uber Lyft
    • Design Surge Pricing
  • Architect's Toolbox
    • Cache Design
    • Database and Cache
    • Pull vs Poll
    • Geo Location
    • Storage Estimation
    • ID Generator
    • Latency Numbers
    • Encoding Decoding Encryption Decryption
  • Systems Design Glossary
    • Consistent Hashing
    • Sharding or Partitioning
    • Database Indexes
    • Proxies
    • Caching
    • Queues
    • SQL vs. NoSQL
    • CAP Theorem
    • Distributed Messaging System
    • Long-Polling vs WebSockets vs Server-Sent Events
    • Producer and Consumer
    • Latency, Bandwidth and Throughput
    • Microservices Architecture
    • RESTful API
    • Concurrent Programming
  • Distributed System Resources
    • Distributed System Notes
  • Reference
Powered by GitBook
On this page
  • (Random) ID Genrator
  • Requirements
  • Approaches
  • Clock synchronization
  • Industry Practices
  • Reference

Was this helpful?

  1. Architect's Toolbox

ID Generator

(Random) ID Genrator

Requirements

Usually there are a few requirements for ID generators:

  • They cannot be arbitrarily long. Let’s say we keep it within 64 bits.

  • ID is incremented by date. This gives the system a lot of flexibility, e.g. you can sort users by ID, which is same as ordering by register date.

Approaches

Single Machine

Start with something simple and keep optimizing it later, which is more true when the question is broad.

How:

In the simplest case, we can just keep incrementing ID from 1, 2, 3 … N, which in fact is one of the most popular ways to generate ID in many real life projects. If user A’s ID is larger than user B, then we know that A registered later.

Pros:

Start with single machine design, not only is this easy to design, but it’s good enough for many of the cases.

Cons:

hard to scale

3rd party service

How:

Keep a single separate server that is only responsible for ID generation. More specifically, when a user registers to the product, for whichever database that handles this request, it’ll connect to the 3rd party server to ask for a random ID.

Pros:

Since all the ID generation is handled in a single server, there’s no risk of generating duplicate IDs.

Cons:

3rd party server will soon become the bottleneck

Multiple machine solution

Example: Open Source ID Generator -- Snowflake from Twitter

How:

Since within a single timestamp there can also be multiple users, we could solve this with two approaches.

  1. We assign a server ID to each ID generation server and the final ID is a combination of timestamp and the server ID.

  2. We can also allow multiple requests within a single timestamp on a single server. We can keep a counter on each server, which indicates how many IDs have been generated in the current timestamp. So the final ID is a combination of timestamp, serverID and the counter.

Final ID = Timestamp + ServerID + Counter

Limitation for counter: since the ID cannot be arbitarily long, the counter may end up with only 8bits for instance. In this case, the server can only handle 256 requests within a single timestamp at most. -- If it frequently exceeds this limit, we need to add more instances.

Clock synchronization

There’s a hidden assumption that all ID generation servers have the same clock to generate the timestamp, which might not be true in distributed systems. In reality, system clocks can drastically skew in distributed systems

Industry Practices

    • Before starting out, we listed out what features were essential in our system:

      1. Generated IDs should be sortable by time (so a list of photo IDs, for example, could be sorted without fetching more information about the photos)

      2. IDs should ideally be 64 bits (for smaller indexes, and better storage in systems like Redis)

      3. The system should introduce as few new ‘moving parts’ as possible — a large part of how we’ve been able to scale Instagram with very few engineers is by choosing simple, easy-to-understand solutions that we trust.

    • Each of our IDs consists of:

      • 41 bits for time in milliseconds (gives us 41 years of IDs with a custom epoch)

      • 13 bits that represent the logical shard ID

      • 10 bits that represent an auto-incrementing sequence, modulus 1024. This means we can generate 1024 IDs, per shard, per millisecond

    • Example: September 9th, 2011, at 5:00pm, which is 1387263000 milliseconds since the beginning of our epoch

      • To start our ID, we fill the left-most 41 bits with this value (epoch timestamp in ms) with a left-shift

        • id = 1387263000 <<(64-41)

      • Next, take the shard ID for this particular piece of data we’re trying to insert. Let’s say we’re sharding by user ID, and there are 2000 logical shards; if our user ID is 31341, then the shard ID is 31341 % 2000 -> 1341. We fill the next 13 bits with this value:

        • id |= 1341 <<(64-41-13)

      • Finally, we take whatever the next value of our auto-increment sequence (this sequence is unique to each table in each schema) and fill out the remaining bits. Let’s say we’d generated 5,000 IDs for this table already; our next value is 5,001, which we take and mod by 1024 (so it fits in 10 bits)

        • id |= (5001 % 1024)

.

Reference

PreviousStorage EstimationNextLatency Numbers

Last updated 5 years ago

Was this helpful?

Flickr’s

Twitter open sourced their ID generator called

Instagram: (✨)

Random ID Generator - Gainlo Blog:

Instagram Engineering Blog: . (👍)

ticket servers
Snowflake
Sharding & IDs at Instagram
http://blog.gainlo.co/index.php/2016/06/07/random-id-generator/
Sharding & IDs at Instagram
Generating unique IDs in a distributed environment at high scale.