Running controlled experiments: A/B Testing - Deepstash

At Netflix, running A/B tests, where possible, allows us to substantiate causality and confidently make changes to the product knowing that our members have voted for them with their actions.

An A/B test starts with an idea — some change we can make to the UI, the personalization systems that help members find content, the signup flow for new members, or any other part of the Netflix experience that we believe will produce a positive result for our members.


To run the experiment, we take a subset of our members, usually a simple random sample, and then use random assignment to split that sample evenly into two groups. Group “A,” often called the “control group,” continues to receive the base Netflix UI experience, while Group “B,” often called the “treatment group,” receives a different experience, based on a specific hypothesis about improving the member experience (more on those hypotheses below). Here, Group B receives the Upside Down box art.
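
The sample-then-split step can be sketched in a few lines of Python. This is only an illustration under made-up assumptions (integer member IDs, a fixed sample size); a production assignment system would be far more involved:

```python
import random

def assign_groups(member_ids, sample_size, seed=42):
    """Draw a simple random sample of members, then randomly
    split it evenly into control (A) and treatment (B)."""
    rng = random.Random(seed)
    sample = rng.sample(member_ids, sample_size)  # sampling without replacement
    rng.shuffle(sample)                           # random assignment to groups
    half = sample_size // 2
    return sample[:half], sample[half:]

# Hypothetical population of 10,000 members; sample 2,000 of them.
control, treatment = assign_groups(list(range(10_000)), 2_000)
```

Seeding the generator makes the illustration reproducible; in a real system each test would get its own randomization.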

With many experiments, including the Upside Down box art example, we need to think carefully about what our metrics are telling us. Suppose we look at the click-through rate, measuring the fraction of members in each experience that clicked on a title. This metric alone may be a misleading measure of whether this new UI is a success, as members might click on a title in the Upside Down product experience only in order to read it more easily.
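
As a minimal sketch, the click-through rate is just clicks over impressions per group; the numbers below are invented for illustration:

```python
def click_through_rate(clicks, impressions):
    """Fraction of members shown an experience who clicked on a title."""
    return clicks / impressions if impressions else 0.0

# Hypothetical counts: the treatment's higher CTR alone doesn't prove
# the new UI is better -- members may click just to read a flipped title.
ctr_a = click_through_rate(480, 1_000)  # control
ctr_b = click_through_rate(520, 1_000)  # treatment
```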

Because we create our control (“A”) and treatment (“B”) groups using random assignment, we can ensure that individuals in the two groups are, on average, balanced on all dimensions that may be meaningful to the test. Random assignment ensures, for example, that the average length of Netflix membership is not markedly different between the control and treatment groups, nor are content preferences, primary language selections, and so forth. 
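
One simple way to sanity-check this balance is to compare a covariate's mean across the two groups. The covariate name and data shape below are assumptions for illustration:

```python
from statistics import mean

def mean_gap(control, treatment, covariate):
    """Absolute difference in a covariate's mean between control and
    treatment. With random assignment this gap should be small; a large
    gap suggests something went wrong with the split."""
    return abs(mean(m[covariate] for m in control)
               - mean(m[covariate] for m in treatment))

# Hypothetical members with a membership-length covariate, in months.
control_members = [{"tenure": 10}, {"tenure": 20}]
treatment_members = [{"tenure": 14}, {"tenure": 18}]
gap = mean_gap(control_members, treatment_members, "tenure")
```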

A/B tests let us make causal statements. We’ve introduced the Upside Down product experience to Group B only, and because we’ve randomly assigned members to groups A and B, everything else is held constant between the two groups. We can therefore conclude with high probability (more on the details next time) that the Upside Down product caused the reduction in engagement.
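
The article defers the statistical details, but the standard textbook tool for "high probability" statements about a difference in rates between two groups is a two-proportion z-test. The sketch below is that generic test, not Netflix's actual methodology:

```python
from math import sqrt, erf

def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Two-sided z-test for a difference in engagement rates
    between groups A and B (pooled-variance version)."""
    p_a, p_b = success_a / n_a, success_b / n_b
    p_pool = (success_a + success_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal CDF, via erf.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Hypothetical counts: a large gap in engagement yields a tiny p-value.
z, p = two_proportion_z_test(600, 1_000, 400, 1_000)
```

A small p-value says the observed gap is unlikely under pure chance; combined with random assignment, that supports a causal reading.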

With the Top 10 example, the hypothesis read: “Showing members the Top 10 experience will help them find something to watch, increasing member joy and satisfaction.” The primary decision metric for this test (and many others) is a measure of member engagement with Netflix: are the ideas we are testing helping our members to choose Netflix as their entertainment destination on any given night? Our research shows that this metric (details omitted) is correlated, in the long term, with the probability that members will retain their subscriptions.

RELATED IDEA

Netflix was created with the idea of putting consumer choice and control at the centre of the entertainment experience, and as a company, we continuously evolve our product offerings to improve on that value proposition.

Back in 2010, the Netflix UI was static, with limited navigation options and a presentation inspired by displays at a video rental store.

Now, the UI is immersive and video-forward, the navigation options richer but less obtrusive, and the box art presentation takes greater advantage of the digital experience.

Your metrics are a reflection of your strategy. They help answer the question: is the strategy working? Metrics without a strategy are like looking at a bunch of random numbers.

You need to define the strategy before you define your metrics. What are the key hypotheses of the strategy? What metrics would indicate those hypotheses are true?

A/B Testing: Definition & How it Works

A/B testing is used to find the best marketing strategies. It can be used to test everything from website copy to sales emails, allowing you to find the best-performing version of your campaign before spending your entire budget on one that doesn't work.

While A/B testing is time-consuming, its advantages are enough to offset the time investment. Proper A/B tests make a huge difference in the effectiveness of your campaign. Narrowing down and combining the most effective elements of a campaign creates a higher return on investment, a lower risk of failure, and a stronger marketing plan.
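
Picking the winning version boils down to comparing conversion rates across variants. A minimal sketch, with made-up email-campaign numbers (significance testing, as discussed above, would come before acting on the winner):

```python
def best_variant(results):
    """Given {variant_name: (conversions, recipients)}, return the
    variant with the highest conversion rate."""
    rates = {name: conv / n for name, (conv, n) in results.items()}
    return max(rates, key=rates.get)

# Hypothetical sales-email test: two subject lines, 2,000 recipients each.
winner = best_variant({
    "subject_line_a": (120, 2_000),  # 6.0% conversion
    "subject_line_b": (150, 2_000),  # 7.5% conversion
})
```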
