# Design Messenger App

### Message Handling

To get a message from the server, the user has two options:

1. **Pull model:** Users can **periodically ask the server** if there are any new messages for them.
2. **Push model:** Users can **keep a connection open with the server** and can depend upon the server to notify them whenever there are new messages.

#### Pull Model

Server needs to keep track of messages that are still waiting to be delivered, and as soon as the receiving user connects to the server to ask for any new message, the server can return all the pending messages.

**Pro:**

**Cons:**

**To minimize latency** for the user, they have to **check the server quite frequently,** and most of the time they will be getting an empty response if there are no pending message. This will **waste a lot of resources** and **does not look like an efficient solution**.

#### Push Model

All the **active users keep a connection open with the server**, then as soon as the server receives a message it can immediately pass the message to the intended user

**Pros:**

This way, the server **does not need to keep track of the pending messages**, and we will have **minimum latency**, as the messages are delivered instantly on the opened connection.

**Cons:**

#### How **will clients maintain an open connection with the server?**

We can use HTTP [Long Polling](https://www.educative.io/collection/page/5668639101419520/5649050225344512/5715426797420544) or [WebSockets](https://www.educative.io/collection/page/5668639101419520/5649050225344512/5715426797420544).

**- Long Polling**

In long polling, **clients** can **request** information from the **server** with the **expectation** that the **server may not respond immediately**.

If the server has no new data for the client when the poll is received, instead of sending an empty response, the **server holds the request open and waits for response information to become available**.

Once it does have new information, the server immediately sends the response to the client, completing the open request.

**Upon** **receipt** of the **server** **response**, the **client** can **immediately** **issue** another **server** **request** **for** **future** updates.

This gives a lot of improvements in latencies, throughputs, and performance. The long polling request can timeout or can receive a disconnect from the server, in that case, the client has to open a new request.

### FAQ for Message Handling

#### **How can the server keep track of all the opened connection to redirect messages to the users efficiently?**

&#x20;**Hash Map: UserID -> Connection Object**

The server can maintain a hash table, where “key” would be the UserID and “value” would be the connection object.

So whenever the server receives a message for a user, it looks up that user in the hash table to find the connection object and sends the message on the open request.

#### **What will happen when the server receives a message for a user who has gone offline?**

**Assuming the messenger app only support limited offline messages.**

If the receiver has disconnected, the server can notify the sender about the delivery failure. If it is a temporary disconnect, e.g., the receiver’s long-poll request just timed out, then we should expect a reconnect from the user. In that case, we can ask the **sender** to **retry** sending the **message**. This **retry** could be **embedded** in the **client’s** logic so that users don’t have to retype the message. The **server** can also **store** the **message** for **a** **while** and **retry** sending it once the receiver reconnects.

**What about offline messages?**

#### **How many chat servers we need?**

Let’s plan for 500 million connections at any time. Assuming a modern server can handle 50K concurrent connections at any time, we would need 10K such servers.

#### **How do we know which server holds the connection to which user?**

We can introduce a software load balancer in front of our chat servers; that can map each UserID to a server to redirect the request.

#### **How should the server process a ‘deliver message’ request?**

The server needs to do the following things upon receiving a new message: 1) Store the message in the database 2) Send the message to the receiver and 3) Send an acknowledgment to the sender.

The chat server will first find the server that holds the connection for the receiver and pass the message to that server to send it to the receiver. The chat server can then send the acknowledgment to the sender; we don’t need to wait for storing the message in the database (this can happen in the background). Storing the message is discussed in the next section.

### Storing and retrieving the messages from the database <a href="#b-storing-and-retrieving-the-messages-from-the-database" id="b-storing-and-retrieving-the-messages-from-the-database"></a>

#### **Which storage system we should use?**

> **Requirements: 1. support a very high rate of small updates; 2. fetch a range of records quickly**

We need to have a database that can **support a very high rate of small updates** and **also fetch a range of records quickly**.

This is required because we have a huge number of small messages that need to be inserted in the database and, while querying, a user is mostly interested in **sequentially accessing the messages**.

We **cannot** use **RDBMS** like **MySQL** or **NoSQL** like **MongoDB** because we cannot afford to read/write a row from the database every time a user receives/sends a message. This will not only make the **basic** **operations** of our service run with **high** **latency**, but also create a **huge** **load** on **databases**.

**>> Choice:**

Wide-column database solution like [**HBase**](https://en.wikipedia.org/wiki/Apache_HBase)

**HBase** is a **column-oriented key-value NoSQL database** that can store multiple values against one key into multiple columns. HBase is modeled after Google’s [BigTable](https://en.wikipedia.org/wiki/Bigtable) and runs on top of Hadoop Distributed File System ([HDFS](https://en.wikipedia.org/wiki/Apache_Hadoop)). HBase groups data together to store new data in a memory buffer and, once the buffer is full, it dumps the data to the disk.

**HBase** groups data together to store new data in a **memory** **buffer** and, once the **buffer** is **full**, it **dumps** the **data** to the **disk**. This way of storage not only **helps storing a lot of small data quickly**, but also **fetching rows by the key or scanning ranges of rows**. HBase is also an **efficient** database to **store variably sized data**, which is also required by our service.

#### **How should clients efficiently fetch data from the server?**

Pagination depend on device screen size

### Data partitioning

We will be storing a lot of data (3.6PB for five years), we need to distribute it onto multiple database servers.

***Partition scheme***? **UserID** or **MessageID**?

Total message data as estimated 500 million DAU \* 40 message per day \* 100 bytes per message \~ 2TB per day; that is

2TB \* 365 days \* 5 years \~= 3.6PB

#### **Partitioning based on UserID (YES - faster for fetching**

We partition based on the hash of the UserID so that we can keep all messages of a user on the same database.

How many shards? if one DB shard is 4TB, 3.6PB / 4TB \~= 900 shards, assume 1k shards.

So we will find the shard number by “hash(UserID) % 1000” and then store/retrieve the data from there.

**This** **partitioning** **scheme** will also be **very** **quick** to **fetch chat history for any user**.

In the beginning, we can start with fewer database servers with multiple shards residing on one physical server.

Our **hash** **function** needs to understand this **logical partitioning scheme** so that it can **map multiple logical partitions** on one **physical server**.

#### **Partitioning based on MessageID (NO - slow for fetching range of messages of a chat)**

If we store **different** **messages** of a **user** on **separate** **database** **shards**, **fetching a range of messages of a chat** would be **very slow**, so we **should** **not adopt this scheme**.

### Cache

We can cache a few recent messages (say last 15) in a few recent conversations that are visible in a user’s viewport (say last 5).

Since we decided to **store all of the user’s messages on one shard**, **cache for a user should entirely reside on one machine** too.

### Load Balancing

load balancer in front of our chat servers; that can map each UserID to a server that holds the connection for the user and then direct the request to that server

## Reference and Other Resources

* [微信为什么不丢消息？](https://www.w3cschool.cn/architectroad/architectroad-the-reliable-delivery-of-instant-messaging.html)
