Social network service
Scenario
User stories, requirements of the system
Post tweets
Follow other people
Favorite tweets
Timeline consisting of top tweets from people the user follows
Tweets can contain photos and videos.
Extended Requirements
Searching for tweets.
Replying to a tweet.
Trending topics – current hot topics/searches.
Tagging other users.
Tweet Notification.
Who to follow? Suggestions?
Moments.
Capacity Estimation
Estimate: 200 Million DAU
Tweets View (Read): 200M users * 100 tweets ~ 20B per day
Storage (Write):
New Tweets (Text): 100M tweets * (280 + 30 ) bytes per tweets ~ 30GB per day
Multi-media (Photos, Videos): (100M / 5 photos * 200KB) + (100M/10 videos * 2MB) ~= 24TB per day
Service
System APIs
Parameters:
api_dev_key (string): The API developer key of a registered account. This will be used to, among other things, throttle users based on their allocated quota.
tweet_data (string): The text of the tweet, typically up to 140 characters.
tweet_location (string): Optional location (longitude, latitude) this Tweet refers to. user_location (string): Optional location (longitude, latitude) of the user adding the tweet.
media_ids (number[]): Optional list of media_ids to be associated with the Tweet. (All the media photo, video, etc. need to be uploaded separately).
Returns: (string)
A successful post will return the URL to access that tweet. Otherwise, an appropriate HTTP error is returned.
High Level System Component
Write: 100M / 86400s = 1150 tweets per second
Read: 20B / 86400s = 230K tweets per second
Read-heavy system.
This traffic will be distributed unevenly throughout the day, though, at peak time we should expect at least a few thousand write requests and around 1M read
Storage
Database Schema
-
-
-
Data Sharding
Sharding based on UserID:
We can try storing all the data of a user on one server. While storing, we can pass the UserID to our hash function that will map the user to a database server where we will store all of the user’s tweets, favorites, follows, etc.
Issues with this approach:
Hot user problem - lot of queries on the server holding hot user, high load will affect performance of the service
Non-uniform distribution of storage - some user can end up storing a lot more tweets or having a lot more followers
What if one user's data can't fit in one shard? - What if we cannot store all tweets of a user on one shard?
Unavailability of all of the user's data on the shard - if that shard is down or higher latency if it's serving high load for specific user
--
To recover from these situation: repartition/redistribute our data or use consistent hashing
Sharding based on TweetID
Our hash function will map each TweetID to a random server where we will store that Tweet.
Pros:
Solve the problem of hot users
Cons:
In contrast to sharding by UserID, we have to query all database partitions to find tweets of a user, which can result in higher latencies
Store hot tweets in cache in front of database servers
Last updated