Phase 3: Python + Cassandra + Postgres (1)

Next, they moved the pre-computed feeds into Cassandra and stored the full activity data there, so loading a feed no longer required a Postgres query.

To reduce memory usage and cost, they stored the last 3,600 activities for active users and only 180 for inactive users, falling back to Postgres beyond that point.
To reduce compute costs, they split fanout tasks in two: active users got a high-priority queue and inactive users a low-priority one. The high-priority queue had a larger capacity buffer to absorb spikes, whereas the low-priority queue relied on autoscaling and spot instances.
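
The two cost-saving ideas above (a per-user feed cap and a priority split for fanout) can be sketched in a few lines of Python. This is only an illustration: the 3,600 and 180 limits come from the text, while the function names, the in-memory deque and lists, and the `is_active` callback are hypothetical stand-ins for Cassandra and a real task queue.

```python
from collections import deque

# Limits taken from the text; everything else here is an illustrative assumption.
ACTIVE_FEED_LIMIT = 3600
INACTIVE_FEED_LIMIT = 180


def feed_limit(is_active: bool) -> int:
    """Return how many activities to keep in the pre-computed feed."""
    return ACTIVE_FEED_LIMIT if is_active else INACTIVE_FEED_LIMIT


def push_activity(feed: deque, activity: dict, is_active: bool) -> None:
    """Prepend the newest activity and trim the feed to its limit."""
    feed.appendleft(activity)
    limit = feed_limit(is_active)
    while len(feed) > limit:
        feed.pop()  # drop the oldest entry; older history would be read from Postgres


def route_fanout(follower_ids, is_active, high_queue: list, low_queue: list) -> None:
    """Split one fanout into a high-priority and a low-priority batch."""
    for follower_id in follower_ids:
        target = high_queue if is_active(follower_id) else low_queue
        target.append(follower_id)
```

In a real deployment the two lists would be separate queues with different worker pools, which is what lets the low-priority side run on cheaper, interruptible capacity.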

Phase 2: Python + Redis + Postgres

The next step was to pre-compute the feeds in Redis.

They stored a feed for every user and populated them by fanning out new activities to all followers.
To minimize memory usage they stored only activity IDs in Redis and queried Postgres wh...
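
A minimal sketch of this fan-out-on-write pattern, assuming plain Python dictionaries as stand-ins for Redis (the per-user lists of IDs) and Postgres (the full activity rows). The names `publish`, `load_feed`, `feeds`, `activities`, and `followers` are hypothetical, and the lookup step in `load_feed` is an assumption about how stored IDs are turned back into activities.

```python
from collections import defaultdict

# In-memory stand-ins (assumptions): a real system would use Redis and Postgres.
feeds = defaultdict(list)      # user_id -> list of activity IDs, newest first
activities = {}                # activity_id -> full activity row
followers = defaultdict(set)   # user_id -> IDs of the users who follow them


def publish(activity_id: int, author_id: int, payload: str) -> None:
    """Store the activity once, then fan its ID out to every follower's feed."""
    activities[activity_id] = {"id": activity_id, "user_id": author_id, "payload": payload}
    for follower_id in followers[author_id]:
        feeds[follower_id].insert(0, activity_id)


def load_feed(user_id: int, limit: int = 20) -> list:
    """Read IDs from the pre-computed feed, then look up the full rows."""
    return [activities[activity_id] for activity_id in feeds[user_id][:limit]]
```

Keeping only IDs in the feed means each activity body is stored once, at the price of a second lookup on every read.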

Phase 3: Python + Cassandra + Postgres (2)

Even though Cassandra performed well, it was complex to tune and hard to diagnose during slowdowns and other issues.

Also, past a certain point they often found themselves pushing logic down to the database layer because it was hard to make the Python code faster.

Phase 1: Python + Postgres

The first solution was to store all activities in a single Postgres table and build the feed at read-time with a query like:

SELECT * FROM love WHERE user_id IN (...)

This solution ran smoothly up to 10M rows and with some fine tuning it held...
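
To make the read-time approach concrete, here is a small runnable sketch that uses SQLite in place of Postgres. The `love` table and `user_id` column come from the query above; the `id` column, the `ORDER BY ... LIMIT` clause, and the `build_feed` helper are assumptions added for illustration.

```python
import sqlite3


def build_feed(conn: sqlite3.Connection, followed_ids: list, limit: int = 20) -> list:
    """Build a feed at read time by selecting rows from every followed user."""
    if not followed_ids:
        return []
    # One "?" placeholder per followed user keeps the query parameterized.
    placeholders = ", ".join("?" for _ in followed_ids)
    query = (
        f"SELECT id, user_id FROM love WHERE user_id IN ({placeholders}) "
        "ORDER BY id DESC LIMIT ?"
    )
    return conn.execute(query, [*followed_ids, limit]).fetchall()


# Tiny in-memory dataset (made up for the example).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE love (id INTEGER PRIMARY KEY, user_id INTEGER)")
conn.executemany(
    "INSERT INTO love (id, user_id) VALUES (?, ?)",
    [(1, 7), (2, 8), (3, 9), (4, 7)],
)
```

Every feed load scans activity from all followed users, so the cost of a read grows with both the table size and the number of accounts a user follows.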

Phase 4: Go + RocksDB

In 2018 they upgraded their architecture again. They haven't disclosed how their data is organized, but they did mention which technologies they use.

They replaced Cassandra with Rock...

The challenge of scaling activity feeds

Activity feeds show activities from the people you follow, and they are at the core of many popular products such as Facebook, Twitter, Instagram, LinkedIn, and Pinterest.

Feeds are hard to scale because there is no clear way to shard the data across multiple machines. On...

CURATED FROM

Scaling Activity Feeds to 300M Users @ Stream

highscalability.com

6 ideas

IDEAS CURATED BY

Ovidiu Podariu (Tech)

@ocpodariu

Alt account of @ocp. I use it to stash ideas about software engineering
