Scaling Activity Feeds to 300M Users @ Stream - Deepstash

The challenge of scaling activity feeds

Activity feeds show activities from the people you follow, and they are at the core of many popular products such as Facebook, Twitter, Instagram, LinkedIn, and Pinterest.

Feeds are hard to scale because there is no obvious way to shard the data across multiple machines. Once the user base is large, almost everyone ends up connected to almost everyone else, so however users are partitioned, most feeds still depend on data spread across many shards.
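To make the sharding problem concrete, here is a minimal Python sketch. It assumes users are placed on shards with a simple `user_id % NUM_SHARDS` rule and that a user follows a few hundred accounts spread across the user base. The shard count, follow count, and helper names are hypothetical. The point is only that a read-time feed query ends up contacting most or all shards.

```python
import random

NUM_SHARDS = 16  # hypothetical cluster size


def shard_for(user_id: int) -> int:
    """Place a user on a shard with a simple modulo rule (an assumption)."""
    return user_id % NUM_SHARDS


def shards_touched(followees: list[int]) -> int:
    """Count the distinct shards a read-time feed query must contact."""
    return len({shard_for(u) for u in followees})


# A hypothetical user following 200 accounts spread over a 1M-user base.
rng = random.Random(42)
followees = rng.sample(range(1_000_000), 200)
print(f"{shards_touched(followees)} of {NUM_SHARDS} shards contacted")
```

When hundreds of follows are scattered across the user base, nearly every shard holds at least one followee. Adding machines therefore does not shrink the set of shards each feed read depends on.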

The next ideas show how Stream scaled their feed infrastructure over time.

Phase 1: Python + Postgres

The first solution was to store all activities in a single Postgres table and build each feed at read time with a query like:

SELECT * FROM love WHERE user_id IN (...)

This solution ran smoothly up to 10M rows and, with some fine-tuning, held up to 100M rows and 1M users. Beyond that point performance degraded, and users occasionally waited several seconds for their feeds to load.
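The read-time ("pull") approach can be sketched as follows. This is a minimal illustration, not Stream's code. An in-memory SQLite database stands in for Postgres, and the `item_id` and `created_at` columns and the `read_feed` helper are hypothetical. Only the `love` table name and the `user_id IN (...)` shape come from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the single Postgres table
conn.execute(
    "CREATE TABLE love (id INTEGER PRIMARY KEY, user_id INTEGER, "
    "item_id INTEGER, created_at INTEGER)"
)
conn.execute("CREATE INDEX love_user_created ON love (user_id, created_at)")
conn.executemany(
    "INSERT INTO love (user_id, item_id, created_at) VALUES (?, ?, ?)",
    [(1, 100, 10), (2, 101, 20), (3, 102, 30), (2, 103, 40)],
)


def read_feed(conn, followees, limit=20):
    """Build a feed at read time by merging recent activities of everyone followed."""
    if not followees:
        return []
    placeholders = ",".join("?" for _ in followees)
    sql = (
        "SELECT user_id, item_id, created_at FROM love "
        f"WHERE user_id IN ({placeholders}) "
        "ORDER BY created_at DESC LIMIT ?"
    )
    return conn.execute(sql, [*followees, limit]).fetchall()


print(read_feed(conn, [1, 2]))  # → [(2, 103, 40), (2, 101, 20), (1, 100, 10)]
```

Every feed load repeats this merge and sort over all followed users. Its cost therefore grows with both the number of followees and the size of the table, which is consistent with the slowdown past 100M rows.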

Phase 2: Python + Redis + Postgres

The next step was to pre-compute the feeds in Redis.

  • They stored a feed for every user and populated each one by fanning out new activities to all followers.
  • To minimize memory usage they stored only activity IDs in Redis and queried Postgres when loading the feed to get the full activity data.
  • To distribute load they sharded feeds across multiple Redis machines.

This solution was easy to set up and maintain, but it became expensive as they needed to store more data in Redis. As the user base grew, the Postgres queries that loaded the full activity data became slower, and moving more of that data into Redis to speed them up would have been too expensive.
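The fan-out-on-write design described above can be sketched in a few lines of Python. Plain dictionaries stand in for the Redis machines and the Postgres table. The shard count, feed length, follower graph, and function names are all hypothetical. In real Redis, the push-and-trim step would typically be an `LPUSH` followed by an `LTRIM`.

```python
NUM_SHARDS = 4    # hypothetical number of Redis machines
FEED_LENGTH = 5   # hypothetical cap on activity IDs kept per feed

# Each dict stands in for one Redis machine: user_id -> activity IDs, newest first.
redis_shards: list[dict[int, list[int]]] = [{} for _ in range(NUM_SHARDS)]
# Stands in for the Postgres table holding the full activity rows.
activities_db: dict[int, dict] = {}
# Hypothetical follower graph: actor -> followers.
followers: dict[int, list[int]] = {1: [10, 11], 2: [10]}


def shard_for(user_id: int) -> dict[int, list[int]]:
    """Pick the Redis machine that owns this user's feed."""
    return redis_shards[user_id % NUM_SHARDS]


def publish(activity_id: int, actor_id: int, payload: dict) -> None:
    activities_db[activity_id] = {"id": activity_id, "actor": actor_id, **payload}
    # Fan out on write: push only the ID into each follower's precomputed feed.
    for follower in followers.get(actor_id, []):
        feed = shard_for(follower).setdefault(follower, [])
        feed.insert(0, activity_id)
        del feed[FEED_LENGTH:]  # keep memory bounded


def load_feed(user_id: int) -> list[dict]:
    ids = shard_for(user_id).get(user_id, [])
    # Hydrate the IDs with full activity data from the primary database.
    return [activities_db[i] for i in ids if i in activities_db]


publish(1, 1, {"verb": "love", "item": 100})
publish(2, 2, {"verb": "love", "item": 101})
print([a["id"] for a in load_feed(10)])  # → [2, 1]
```

Storing only IDs keeps the cache small. However, every feed load still pays for the hydration lookup, and that is the Postgres cost that grew with the user base.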

Phase 3: Python + Cassandra + Postgres (1)

Next they moved the pre-computed feeds into Cassandra and stored the full activity data in them to avoid having to query Postgres when loading the feed.

  • To reduce memory usage and cost, they kept only the latest 3,600 activities per feed for active users and 180 for inactive users, falling back to Postgres beyond that point.
  • To reduce compute costs, they split fanout tasks in two: active users went to a high-priority queue and inactive users to a low-priority one. The high-priority queue had a higher-capacity buffer to absorb spikes, whereas the low-priority queue relied on autoscaling and spot instances.
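Both cost-saving measures can be sketched together. The 3,600 and 180 caps come from the text. Everything else is a hypothetical illustration, not Stream's implementation: the `deque`s standing in for the two task queues, the `is_even` activity rule, the `fallback` callback standing in for the Postgres query, and all function names.

```python
from collections import deque
from typing import Callable

ACTIVE_CAP = 3600   # activities kept per feed for active users
INACTIVE_CAP = 180  # activities kept per feed for inactive users

high_priority: deque = deque()  # stands in for the well-buffered queue
low_priority: deque = deque()   # stands in for the autoscaled, spot-instance queue


def feed_cap(active: bool) -> int:
    return ACTIVE_CAP if active else INACTIVE_CAP


def enqueue_fanout(activity_id: int, followers: list[int],
                   is_active: Callable[[int], bool]) -> None:
    """Split one fanout job into a high- and a low-priority batch."""
    active = [f for f in followers if is_active(f)]
    inactive = [f for f in followers if not is_active(f)]
    if active:
        high_priority.append((activity_id, active))
    if inactive:
        low_priority.append((activity_id, inactive))


def write_to_feed(feed: list, activity: object, cap: int) -> None:
    feed.insert(0, activity)  # newest first
    del feed[cap:]            # drop anything past the user's cap


def read_feed(feed: list, offset: int, limit: int,
              fallback: Callable[[int, int], list]) -> list:
    """Serve from the stored window; past its end, fall back to the primary DB."""
    page = feed[offset:offset + limit]
    if len(page) < limit:
        page += fallback(offset + len(page), limit - len(page))
    return page


def is_even(user_id: int) -> bool:
    return user_id % 2 == 0  # hypothetical "is active" rule


enqueue_fanout(42, [1, 2, 3, 4], is_even)
print(high_priority[0], low_priority[0])  # → (42, [2, 4]) (42, [1, 3])
```

Separate queues let the latency-sensitive work for active users keep its reserved capacity. Meanwhile, the larger but less urgent inactive workload runs on cheaper, interruptible capacity.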

Phase 3: Python + Cassandra + Postgres (2)

Even though Cassandra performed well, it was complex to tune and hard to diagnose during slowdowns and other incidents.

Also, over time they often found themselves pushing logic down into the database layer, because it was hard to make the Python code faster.

Phase 4: Go + RocksDB

In 2018 they upgraded their architecture again. They haven't disclosed how the data is organized, only which technologies they use.

They replaced Cassandra with RocksDB because it is simpler to maintain and delivers both better and more consistent performance.

They replaced Python with Go to:

  • Speed up features such as aggregation, ranking and serialization
  • Simplify their infrastructure and improve latency. Thanks to Go's lower memory and CPU usage, they served the same number of requests with 10 times fewer servers.

IDEAS CURATED BY

ocpodariu

Alt account of @ocp. I use it to stash ideas about software engineering