Design Messenger App

Message Handling

To get a message from the server, the user has two options:

  1. Pull model: Users can periodically ask the server if there are any new messages for them.

  2. Push model: Users can keep a connection open with the server and can depend upon the server to notify them whenever there are new messages.

Pull Model

Server needs to keep track of messages that are still waiting to be delivered, and as soon as the receiving user connects to the server to ask for any new message, the server can return all the pending messages.



To minimize latency for the user, they have to check the server quite frequently, and most of the time they will be getting an empty response if there are no pending message. This will waste a lot of resources and does not look like an efficient solution.

Push Model

All the active users keep a connection open with the server, then as soon as the server receives a message it can immediately pass the message to the intended user


This way, the server does not need to keep track of the pending messages, and we will have minimum latency, as the messages are delivered instantly on the opened connection.


How will clients maintain an open connection with the server?

We can use HTTP Long Polling or WebSockets.

- Long Polling

In long polling, clients can request information from the server with the expectation that the server may not respond immediately.

If the server has no new data for the client when the poll is received, instead of sending an empty response, the server holds the request open and waits for response information to become available.

Once it does have new information, the server immediately sends the response to the client, completing the open request.

Upon receipt of the server response, the client can immediately issue another server request for future updates.

This gives a lot of improvements in latencies, throughputs, and performance. The long polling request can timeout or can receive a disconnect from the server, in that case, the client has to open a new request.

FAQ for Message Handling

How can the server keep track of all the opened connection to redirect messages to the users efficiently?

Hash Map: UserID -> Connection Object

The server can maintain a hash table, where “key” would be the UserID and “value” would be the connection object.

So whenever the server receives a message for a user, it looks up that user in the hash table to find the connection object and sends the message on the open request.

What will happen when the server receives a message for a user who has gone offline?

Assuming the messenger app only support limited offline messages.

If the receiver has disconnected, the server can notify the sender about the delivery failure. If it is a temporary disconnect, e.g., the receiver’s long-poll request just timed out, then we should expect a reconnect from the user. In that case, we can ask the sender to retry sending the message. This retry could be embedded in the client’s logic so that users don’t have to retype the message. The server can also store the message for a while and retry sending it once the receiver reconnects.

What about offline messages?

How many chat servers we need?

Let’s plan for 500 million connections at any time. Assuming a modern server can handle 50K concurrent connections at any time, we would need 10K such servers.

How do we know which server holds the connection to which user?

We can introduce a software load balancer in front of our chat servers; that can map each UserID to a server to redirect the request.

How should the server process a ‘deliver message’ request?

The server needs to do the following things upon receiving a new message: 1) Store the message in the database 2) Send the message to the receiver and 3) Send an acknowledgment to the sender.

The chat server will first find the server that holds the connection for the receiver and pass the message to that server to send it to the receiver. The chat server can then send the acknowledgment to the sender; we don’t need to wait for storing the message in the database (this can happen in the background). Storing the message is discussed in the next section.

Storing and retrieving the messages from the database

Which storage system we should use?

Requirements: 1. support a very high rate of small updates; 2. fetch a range of records quickly

We need to have a database that can support a very high rate of small updates and also fetch a range of records quickly.

This is required because we have a huge number of small messages that need to be inserted in the database and, while querying, a user is mostly interested in sequentially accessing the messages.

We cannot use RDBMS like MySQL or NoSQL like MongoDB because we cannot afford to read/write a row from the database every time a user receives/sends a message. This will not only make the basic operations of our service run with high latency, but also create a huge load on databases.

>> Choice:

Wide-column database solution like HBase

HBase is a column-oriented key-value NoSQL database that can store multiple values against one key into multiple columns. HBase is modeled after Google’s BigTable and runs on top of Hadoop Distributed File System (HDFS). HBase groups data together to store new data in a memory buffer and, once the buffer is full, it dumps the data to the disk.

HBase groups data together to store new data in a memory buffer and, once the buffer is full, it dumps the data to the disk. This way of storage not only helps storing a lot of small data quickly, but also fetching rows by the key or scanning ranges of rows. HBase is also an efficient database to store variably sized data, which is also required by our service.

How should clients efficiently fetch data from the server?

Pagination depend on device screen size

Data partitioning

We will be storing a lot of data (3.6PB for five years), we need to distribute it onto multiple database servers.

Partition scheme? UserID or MessageID?

Total message data as estimated 500 million DAU * 40 message per day * 100 bytes per message ~ 2TB per day; that is

2TB * 365 days * 5 years ~= 3.6PB

Partitioning based on UserID (YES - faster for fetching

We partition based on the hash of the UserID so that we can keep all messages of a user on the same database.

How many shards? if one DB shard is 4TB, 3.6PB / 4TB ~= 900 shards, assume 1k shards.

So we will find the shard number by “hash(UserID) % 1000” and then store/retrieve the data from there.

This partitioning scheme will also be very quick to fetch chat history for any user.

In the beginning, we can start with fewer database servers with multiple shards residing on one physical server.

Our hash function needs to understand this logical partitioning scheme so that it can map multiple logical partitions on one physical server.

Partitioning based on MessageID (NO - slow for fetching range of messages of a chat)

If we store different messages of a user on separate database shards, fetching a range of messages of a chat would be very slow, so we should not adopt this scheme.


We can cache a few recent messages (say last 15) in a few recent conversations that are visible in a user’s viewport (say last 5).

Since we decided to store all of the user’s messages on one shard, cache for a user should entirely reside on one machine too.

Load Balancing

load balancer in front of our chat servers; that can map each UserID to a server that holds the connection for the user and then direct the request to that server

Reference and Other Resources

Last updated