Activity feeds show activities from the people you follow and they represent the core of many popular products such as Facebook, Twitter, Instagram, LinkedIn, and Pinterest.
Feeds are hard to scale because there is no clear way to shard the data across multiple machines. Once you have a large user base, almost everyone ends up being connected to everyone else.
The next ideas show how Stream scaled their feed infrastructure over time.
The first solution was to store all activities in a single Postgres table and build the feed at read-time with a query like:
SELECT * FROM love WHERE user_id IN (...)
This solution ran smoothly up to 10M rows and, with some fine-tuning, held up to 100M rows and 1M users. Beyond that point performance dropped and users occasionally waited multiple seconds for their feeds to load.
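The read-time approach can be sketched as follows. This is a minimal illustration, not Stream's actual code: it uses an in-memory SQLite database as a stand-in for Postgres, and the `id` and `created_at` columns, the index, and the `build_feed` helper are assumptions added for the example. Only the `love` table and its `user_id` column come from the query above.

```python
import sqlite3


def build_feed(conn, followed_ids, limit=10):
    """Build a feed at read time from the activities of followed users."""
    if not followed_ids:
        return []
    # One "?" placeholder per followed user keeps the query parameterized.
    placeholders = ",".join("?" for _ in followed_ids)
    query = (
        "SELECT id, user_id, created_at FROM love "
        f"WHERE user_id IN ({placeholders}) "
        "ORDER BY created_at DESC, id DESC LIMIT ?"
    )
    return conn.execute(query, [*followed_ids, limit]).fetchall()


# SQLite stands in for Postgres here; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE love (id INTEGER PRIMARY KEY, user_id INTEGER, created_at INTEGER)"
)
conn.execute("CREATE INDEX love_user_created ON love (user_id, created_at)")
conn.executemany(
    "INSERT INTO love (user_id, created_at) VALUES (?, ?)",
    [(1, 100), (2, 101), (3, 102), (1, 103)],
)

# A user who follows users 1 and 3 gets their activities, newest first.
feed = build_feed(conn, [1, 3])
```

Every feed load runs this query, so its cost grows with both the table size and the number of followed users, which fits the slowdown described above.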
The next step was to pre-compute the feeds in Redis.
This solution was easy to set up and maintain, but it got expensive when they needed to store more data in Redis. As the user base grew, the queries to Postgres became slower, and it would have been too expensive to move more data into Redis to speed them up.
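Pre-computing feeds means doing the work at write time: each new activity is pushed onto the feed of every follower, so reading a feed becomes a cheap lookup. The sketch below illustrates the idea with plain Python containers standing in for Redis lists. The function names, the feed length cap, and the choice to store only activity IDs are assumptions made for the example, not details disclosed in the source.

```python
from collections import defaultdict, deque

MAX_FEED_LENGTH = 3  # hypothetical cap so each feed stays bounded in memory

followers = defaultdict(set)  # author_id -> set of follower ids
# Each feed keeps only the newest activity IDs; older ones fall off the end.
feeds = defaultdict(lambda: deque(maxlen=MAX_FEED_LENGTH))


def follow(follower_id, author_id):
    followers[author_id].add(follower_id)


def publish(author_id, activity_id):
    """Fan out on write: push the activity ID onto every follower's feed."""
    for follower_id in followers[author_id]:
        feeds[follower_id].appendleft(activity_id)


def read_feed(user_id):
    """Reading is a lookup of the pre-computed list, newest first."""
    return list(feeds[user_id])


follow(10, 1)
follow(10, 2)
follow(11, 1)
for author_id, activity_id in [(1, 100), (2, 101), (1, 102), (2, 103)]:
    publish(author_id, activity_id)
```

In this sketch the feed holds only IDs, so showing a feed would still require fetching the activity details from the primary database. The trade-off is memory: every activity is copied once per follower.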
Next they moved the pre-computed feeds into Cassandra and stored the full activity data in them to avoid having to query Postgres when loading the feed.
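The difference between a feed of references and a feed that carries the full activity data can be illustrated with a small in-memory sketch. Nothing here uses Cassandra or Postgres; the `activities` dictionary, the field names, and the lookup counter are hypothetical stand-ins chosen to make the extra round trips visible.

```python
# Stand-in for the primary activity store (Postgres in the text above).
activities = {
    100: {"id": 100, "actor": "alice", "verb": "love", "object": "photo:1"},
    101: {"id": 101, "actor": "bob", "verb": "love", "object": "photo:2"},
}
lookups = 0  # counts reads against the primary store


def load_activity(activity_id):
    global lookups
    lookups += 1
    return activities[activity_id]


# Feed of references: each read must resolve every ID in the primary store.
id_feed = [101, 100]
# Denormalized feed: the full activity is copied into the feed at write time.
full_feed = [dict(activities[101]), dict(activities[100])]


def read_id_feed(feed):
    return [load_activity(activity_id) for activity_id in feed]


def read_full_feed(feed):
    return list(feed)  # no call to the primary store


# Both reads return the same activities; only the first one hits the store.
assert read_id_feed(id_feed) == read_full_feed(full_feed)
```

The denormalized feed trades extra storage for a read path that never touches the primary database.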
Even though Cassandra performed well, it was complex to optimize and hard to diagnose during slowdowns and other issues.
Also, past a certain point they often found themselves delegating logic to the database layer because it was hard to make their Python code faster.
In 2018 they upgraded their architecture again. They haven't disclosed how their data is organized, but they mentioned what technologies they are using.
They replaced Cassandra with RocksDB because it is simpler to maintain and delivers better, more consistent performance.
They replaced Python with Go to:
IDEAS CURATED BY
Alt account of @ocp. I use it to stash ideas about software engineering