Phase 3: Python + Cassandra + Postgres (1) - Deepstash

Phase 3: Python + Cassandra + Postgres (1)

Next, they moved the pre-computed feeds into Cassandra and stored the full activity data in each feed, so that loading a feed no longer required a query to Postgres.

  • To reduce memory usage and cost, they stored the last 3,600 activities for active users and only 180 for inactive users (falling back to Postgres beyond that point).
  • To reduce compute costs, they split fanout tasks in two: active users got a high-priority queue and inactive users a low-priority one. The high-priority queue had a larger capacity buffer to cope with spikes, whereas the low-priority queue relied on autoscaling and spot instances.
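
The two points above can be illustrated with a minimal Python sketch. Only the caps of 3,600 and 180 come from the text; the function names, the in-memory `deque` standing in for a Cassandra feed, and the plain lists standing in for task queues are all hypothetical, and this is not the original implementation.

```python
from collections import deque

# Feed length caps taken from the text; everything else here is hypothetical.
ACTIVE_FEED_LIMIT = 3600
INACTIVE_FEED_LIMIT = 180


def feed_limit(is_active: bool) -> int:
    """Return how many activities to keep in a pre-computed feed."""
    return ACTIVE_FEED_LIMIT if is_active else INACTIVE_FEED_LIMIT


def append_activity(feed: deque, activity: dict, is_active: bool) -> None:
    """Put the newest activity at the front and drop entries past the cap."""
    feed.appendleft(activity)
    limit = feed_limit(is_active)
    while len(feed) > limit:
        feed.pop()  # oldest entry; older history would be read from Postgres


def route_fanout(follower_ids, is_active, high_queue, low_queue):
    """Split one fanout into per-follower tasks on two priority queues."""
    for follower_id in follower_ids:
        queue = high_queue if is_active(follower_id) else low_queue
        queue.append(follower_id)
```

In this sketch an inactive user's feed never grows past 180 entries, and a follower's activity status alone decides which queue receives the fanout task.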

MORE IDEAS ON THIS

Phase 2: Python + Redis + Postgres

The next step was to pre-compute the feeds in Redis.

  • They stored a feed for every user and populated them by fanning out new activities to all followers.
  • To minimize memory usage they stored only activity IDs in Redis and queried Postgres wh...
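
As a rough illustration of fanning out on write while keeping only activity IDs in the feed, here is a minimal sketch. The dictionaries are hypothetical in-memory stand-ins (`activities` for the Postgres table, `feeds` for the per-user feeds in Redis), and the function names are assumptions, not the original code.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for the real stores.
activities = {}               # activity_id -> full activity record
feeds = defaultdict(list)     # user_id -> activity IDs, newest first
followers = defaultdict(set)  # user_id -> IDs of the users who follow them


def publish(activity_id, actor_id, payload):
    """Store the full activity once, then fan out only its ID."""
    activities[activity_id] = {"id": activity_id, "actor": actor_id, **payload}
    for follower_id in followers[actor_id]:
        feeds[follower_id].insert(0, activity_id)


def load_feed(user_id, limit=10):
    """Read the pre-computed IDs and resolve them against the activity store."""
    return [activities[a] for a in feeds[user_id][:limit]]
```

The trade-off shown here is that each feed stays small, but every read still needs a lookup in the activity store.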

Phase 3: Python + Cassandra + Postgres (2)

Even though Cassandra performed well, it was complex to optimize and hard to diagnose during slowdowns and other issues.

Also, past a certain point, they often found themselves delegating logic to the database layer because the Python code was hard to make faster.

Phase 1: Python + Postgres

The first solution was to store all activities in a single Postgres table and build the feed at read-time with a query like:

SELECT * FROM love WHERE user_id IN (...)

This solution ran smoothly up to 10M rows and with some fine tuning it held...
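
A read-time feed query of this shape can be sketched in Python as follows. The sketch uses the standard-library `sqlite3` module as a stand-in for Postgres so that it runs without a server; the `id` and `item` columns, the ordering by `id`, and the `build_feed` helper are assumptions added for illustration.

```python
import sqlite3


def build_feed(conn, followed_ids, limit=20):
    """Build a feed at read time from the single activity table."""
    if not followed_ids:
        return []
    # One placeholder per followed user keeps the query parameterized.
    placeholders = ", ".join("?" for _ in followed_ids)
    query = (
        f"SELECT id, user_id, item FROM love "
        f"WHERE user_id IN ({placeholders}) "
        f"ORDER BY id DESC LIMIT ?"
    )
    return conn.execute(query, (*followed_ids, limit)).fetchall()


# Hypothetical sample data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE love (id INTEGER PRIMARY KEY, user_id INTEGER, item TEXT)")
conn.executemany(
    "INSERT INTO love (user_id, item) VALUES (?, ?)",
    [(1, "shoes"), (2, "hat"), (3, "scarf"), (2, "coat")],
)
```

Calling `build_feed(conn, [2, 3])` returns the rows for users 2 and 3, newest first. Every read scans for all followed users, so the cost of this approach grows with the size of the table.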

Phase 4: Go + RocksDB

In 2018 they upgraded their architecture again. They haven't disclosed how their data is organized, but they mentioned what technologies they are using.

They replaced Cassandra with Rock...

The challenge of scaling activity feeds

Activity feeds show activities from the people you follow and they represent the core of many popular products such as Facebook, Twitter, Instagram, LinkedIn, and Pinterest.

Feeds are hard to scale because there is no clear way to shard the data across multiple machines. On...

IDEAS CURATED BY

ocpodariu

Alt account of @ocp. I use it to stash ideas about software engineering
