Learn more about artificialintelligence with this collection
Understanding machine learning models
Improving data analysis and decision-making
How Google uses logic in machine learning
Slicing means separating your data into subgroups and looking at metric values for each subgroup separately. We commonly slice along dimensions like browser, locale, domain, device type, and so on. If the underlying phenomenon is likely to work differently across subgroups, you must slice the data to confirm whether that is indeed the case.
Even if you do not expect slicing to produce different results, looking at a few slices for internal consistency gives you greater confidence that you are measuring the right thing.
102
515 reads
MORE IDEAS ON THIS
When making theories about data, we often want to assert that "X causes Y"—for example, "the page getting slower caused users to click less." You can not simply establish causation because of correlation. By considering how you would validate a theory of causation, you can usually develop a good ...
97
271 reads
Randomness exists and will fool us. Some people think, “Google has so much data; the noise goes away.” This simply isn’t true. Every number or summary of data that you produce should have an accompanying notion of your confidence in this estimate (through measures such as confidence intervals and...
104
698 reads
Especially if you are trying to capture a new phenomenon, try to measure the same underlying thing in multiple ways. Then, determine whether these multiple measurements are consistent.
By using multiple measurements, you can identify bugs in measurement or logging code, unexpected feature...
97
296 reads
The previous points suggested some ways to get yourself to do the right kinds of soundness checking and validation. But sharing with a peer is one of the best ways to force yourself to do all these things. A skilled peer can provide qualitatively different feedback than the consumers of your data...
97
285 reads
Typically, data analysis for a complex problem is iterative. You will discover anomalies, trends, or other features of the data. Naturally, you will develop theories to explain this data. Don’t just develop a theory and proclaim it to be true. Look for evidence (inside or outside the data) to con...
97
263 reads
97
248 reads
Both slicing and consistency over time are particular examples of checking for reproducibility. If a phenomenon is important and meaningful, you should see it across different user populations and time. But verifying reproducibility means more than performing these two checks. If you are building...
97
276 reads
There’s always a motivation to analyze data. Formulating your needs as questions or hypotheses helps ensure that you are gathering the data you should be gathering and that you are thinking about the possible gaps in the data. Of course, the questions you ask should evolve as you look at the data...
97
266 reads
If you create new metrics (possibly by gathering a novel data source) and try to learn something new, you won’t know if your new metric is right. With new metrics, you should first apply them to a known feature or data.
If you have a new metric for where users are directing their attention...
97
262 reads
Examine outliers carefully because they can be canaries in the coal mine that indicate more fundamental problems with your analysis.
It's fine to exclude outliers from your data or to lump them together into an "unusual" category, but you should make sure that you know why data ended up i...
108
755 reads
Not only do we typically work with very large data sets, but those data sets are extremely rich. That is, each row of data typically has many, many attributes. When you combine this with the temporal sequences of events for a given user, there are an enormous number of ways of looking at the data...
103
1.21K reads
As part of the "Validation" stage, before actually answering the question you are interested in (for example, "Did adding a picture of a face increase or decrease clicks?"), rule out any other variability in the data that might affect the experiment. For example:
97
303 reads
The most interesting metrics are ratios of underlying measures. Oftentimes, interesting filtering or other data choices are hidden in the precise definitions of the numerator and denominator. For example, which of the following does “Queries / User” actually mean?
101
332 reads
Looking at day-over-day data also gives you a sense of the variation in the data that would eventually lead to confidence intervals or claims of statistical significance. This should not generally replace rigorous confidence-interval calculation, but often with large changes you can see they will...
97
390 reads
Almost every large data analysis starts by filtering data in various stages. Maybe you want to consider only US users, or web searches, or searches with ads. Whatever the case, you must:
98
355 reads
Good data analysis will have a story to tell. To make sure it’s the right story, you need to tell the story to yourself, then look for evidence that it’s wrong. One way of doing this is to ask yourself, “What experiments would I run that would validate/invalidate the story I am telling?” Even if ...
99
266 reads
Before looking at any data, make sure you understand the context in which the data was collected. If the data comes from an experiment, look at the configuration of the experiment. If it's from new client instrumentation, make sure you have at least a rough understanding of how the data is collec...
97
347 reads
When doing exploratory analysis, perform as many iterations of the whole analysis as possible. Typically you will have multiple steps of signal gathering, processing, modeling, etc. If you spend too long getting the very first stage of your initial signals perfect, you are missing out on opportun...
97
258 reads
You should almost always try slicing data by units of time because many disturbances to underlying data happen as our systems evolve over time. (We often use days, but other units of time may also be useful.)
During the initial launch of a feature or new data collection, practitioners ofte...
99
414 reads
Most practitioners use summary metrics (for example, mean, median, standard deviation, and so on) to communicate about distributions.
However, you should usually examine much richer distribution representations by generating histograms, cumulative distribution functions (CDFs), Quantile-Q...
118
887 reads
We typically define various metrics around user success.
You can not use the metric that is fed back to your system as a basis for evaluating your change. If you show more ads that get more clicks, you can not use “more clicks” as a basis for deciding that users are happier, even though “m...
98
245 reads
With a large volume of data, it can be tempting to focus solely on statistical significance or to home in on the details of every bit of data. But you need to ask yourself, "Even if it is true that value X is 0.1% more than value Y, does it matter?"
This can be especially important if you...
98
447 reads
Often you will be calculating a metric that is similar to things that have been counted in the past. You should compare your metrics to metrics reported in the past, even if these measurements are on different user populations.
You do not need to get an exact agreement, but you should be i...
97
263 reads
Validation: Do I believe the data is self-consistent, that it was collected correctly, and that it represents what I think it does?
Description: What's the objective interpretation of this data? For example, "Users make fewer queries classified as X," "In t...
104
341 reads
Anytime you are producing new analysis code, you need to look at examples from the underlying data and how your code is interpreting those examples. Your analysis is abstracting away many details from the underlying data to produce useful summaries.
How you sample these examples is importa...
103
606 reads
When looking at new features and new data, it's particularly tempting to jump right into the metrics that are new or special for this new feature. However, you should always look at standard metrics first, even if you expect them to change.
For example, when adding a new universal block t...
97
285 reads
CURATED FROM
Explore the World’s
Best Ideas
Save ideas for later reading, for personalized stashes, or for remembering it later.
Start
31 ideas
Start
44 ideas
# Personal Growth
Take Your Ideas
Anywhere
Just press play and we take care of the words.
No Internet access? No problem. Within the mobile app, all your ideas are available, even when offline.
Ideas for your next work project? Quotes that inspire you? Put them in the right place so you never lose them.
Start
47 ideas
Start
75 ideas
My Stashes
Join
2 Million Stashers
4.8
5,740 Reviews
App Store
4.7
72,690 Reviews
Google Play
Sean Green
Great interesting short snippets of informative articles. Highly recommended to anyone who loves information and lacks patience.
“
Ashley Anthony
This app is LOADED with RELEVANT, HELPFUL, AND EDUCATIONAL material. It is creatively intellectual, yet minimal enough to not overstimulate and create a learning block. I am exceptionally impressed with this app!
“
samz905
Don’t look further if you love learning new things. A refreshing concept that provides quick ideas for busy thought leaders.
“
Shankul Varada
Best app ever! You heard it right. This app has helped me get back on my quest to get things done while equipping myself with knowledge everyday.
“
Jamyson Haug
Great for quick bits of information and interesting ideas around whatever topics you are interested in. Visually, it looks great as well.
“
Giovanna Scalzone
Brilliant. It feels fresh and encouraging. So many interesting pieces of information that are just enough to absorb and apply. So happy I found this.
“
Ghazala Begum
Even five minutes a day will improve your thinking. I've come across new ideas and learnt to improve existing ways to become more motivated, confident and happier.
“
Laetitia Berton
I have only been using it for a few days now, but I have found answers to questions I had never consciously formulated, or to problems I face everyday at work or at home. I wish I had found this earlier, highly recommended!
“
Read & Learn
20x Faster
without
deepstash
with
deepstash
with
deepstash
Access to 200,000+ ideas
—
Access to the mobile app
—
Unlimited idea saving & library
—
—
Unlimited history
—
—
Unlimited listening to ideas
—
—
Downloading & offline access
—
—
Personalized recommendations
—
—
Supercharge your mind with one idea per day
Enter your email and spend 1 minute every day to learn something new.
I agree to receive email updates