The 5 Basic Statistics Concepts Data Scientists Need to Know

Curated from: towardsdatascience.com

Data Science And Statistics

Statistics is the use of mathematics to analyze data. Instead of guesstimating, it lets us draw concrete, fact-based conclusions from the data.

The most widely used statistical concept in data science is statistical features. These include important measurements such as bias, variance, mean, median and percentiles. They are all easy to compute in code.
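As a rough illustration of how code-friendly these measurements are, the sketch below computes several of them with Python's standard `statistics` module. The sample values are made up for this example, and bias is left out because it can only be computed against a known true value.

```python
import statistics

# Hypothetical sample; the value 95 is a deliberate outlier.
data = [2, 4, 4, 5, 7, 9, 95]

mean = statistics.mean(data)           # arithmetic average
median = statistics.median(data)       # middle value of the sorted data
variance = statistics.pvariance(data)  # population variance (spread around the mean)

# quantiles(n=4) returns the three quartile cut points:
# the 25th, 50th and 75th percentiles.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

print(f"mean={mean} median={median} variance={variance:.2f}")
print(f"25th={q1} 50th={q2} 75th={q3}")
```

With this sample the mean is 18 while the median is 5, which shows how a single outlier pulls the mean but not the median.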

Data Reading On Statistical Features

A typical diagram of a data set, the box plot, carries a lot of information.

If the box is short, most data points are similar; if it is tall, the values are spread over a wide range and the variance is high.
The median (the line in the middle of the box) is often a more reliable measure of the center than the mean, because it is much less affected by outlier values.
The lower edge of the box marks a smaller percentile (the 25th) and the upper edge a larger one (the 75th).
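The numbers a box plot draws can be sketched in a few lines. The helper `box_plot_summary` below is a hypothetical name introduced for this example, and the 1.5 × IQR rule for flagging outliers is a common plotting convention assumed here, not something the text above specifies.

```python
import statistics

def box_plot_summary(values):
    """Return the numbers a box plot typically draws for a list of values."""
    # Quartile cut points: bottom of the box, median line, top of the box.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1  # height of the box (interquartile range)
    # Assumed convention: points beyond 1.5 * IQR from the box are outliers.
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {"q1": q1, "median": median, "q3": q3, "iqr": iqr, "outliers": outliers}

# Hypothetical sample with one extreme value.
summary = box_plot_summary([2, 4, 4, 5, 7, 9, 95])
print(summary)
```

A small `iqr` corresponds to a short box (similar values), and a large one to a tall box.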

Probability Distributions

In data science, probability is the chance that an event will occur, usually expressed on a scale from 0 to 1. A value of 0 means we are certain the event will not occur, while 1 means we are certain it will. A probability distribution describes the probabilities of all possible values.

Common Probability Distributions

The common probability distributions are:

Uniform Distribution: A simple "on or off" distribution. Every value inside a given range is equally likely, and anything outside that range has probability 0.
Normal (or Gaussian) Distribution: This distribution is symmetric, with the same spread on both sides of the mean. It is described by the average value of the data (the mean) and its spread (the standard deviation).
Poisson Distribution: This is similar to the Normal distribution but adds skewness, so the data spreads differently on either side of the peak. It models counts of events, and the skew is strongest when the average count is small.
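The three distributions can be compared with a minimal sketch using only the standard library. `uniform_pdf` and `poisson_pmf` are helper names written for this example; `NormalDist` comes from Python's `statistics` module. The parameter values are arbitrary.

```python
import math
from statistics import NormalDist

def uniform_pdf(x, low, high):
    """Density of a continuous uniform distribution on [low, high]."""
    return 1.0 / (high - low) if low <= x <= high else 0.0

def poisson_pmf(k, lam):
    """Probability of exactly k events when the average count is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

normal = NormalDist(mu=0.0, sigma=1.0)

# Uniform: constant inside the range, zero outside it.
print(uniform_pdf(0.5, 0, 1), uniform_pdf(2.0, 0, 1))

# Normal: the density is the same at equal distances from the mean.
print(normal.pdf(-1.0), normal.pdf(1.0))

# Poisson: skewness is 1 / sqrt(lam), so it shrinks as lam grows
# and the shape moves closer to a Normal distribution.
for lam in (1, 4, 25):
    print(lam, poisson_pmf(lam, lam), 1 / math.sqrt(lam))
```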

Dimensionality Reduction

Reducing the number of dimensions (or feature variables) in a dataset is known as Dimensionality Reduction.

If a cube contains 1000 points, we can reduce its dimensionality by projecting the 3D data onto a 2D plane. We can also remove feature variables to reduce the data volume. This is generally done with features that have a low correlation with the output we are trying to predict, and is called feature pruning.
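Feature pruning can be sketched as follows, assuming "correlation" means the Pearson correlation between each feature and the output. The function names, the toy data, and the 0.3 threshold are all assumptions made for this example; a real cutoff depends on the problem.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def prune_features(features, target, threshold=0.3):
    """Keep only features whose absolute correlation with the target reaches the threshold."""
    return {
        name: column
        for name, column in features.items()
        if abs(pearson(column, target)) >= threshold
    }

# Hypothetical data: "size" tracks the target closely, "noise" does not.
features = {
    "size": [1, 2, 3, 4, 5, 6],
    "noise": [3, 1, 2, 2, 1, 3],
}
target = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]

kept = prune_features(features, target)
print(list(kept))
```

Pearson correlation only measures linear relationships, so a feature with a strong nonlinear link to the output could be pruned by this sketch.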

Over And Under Sampling

Sometimes we want to compare or classify data in which the classes have an uneven number of samples. By taking fewer samples from the larger class (undersampling), one can even out the dataset.

Oversampling creates copies of samples from the smaller class until it has the same number of examples as the larger class. The copies are produced so that the distribution of the smaller class is maintained.
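Both techniques can be sketched with random sampling. The functions below are illustrative helpers written for this example, not a library API; they balance classes by randomly dropping samples from larger classes or randomly duplicating samples from smaller ones.

```python
import random
from collections import Counter

def _group_by_class(samples, labels):
    groups = {}
    for sample, label in zip(samples, labels):
        groups.setdefault(label, []).append(sample)
    return groups

def undersample(samples, labels, seed=0):
    """Randomly drop samples so every class matches the smallest class."""
    rng = random.Random(seed)
    groups = _group_by_class(samples, labels)
    size = min(len(items) for items in groups.values())
    return [(s, label) for label, items in groups.items()
            for s in rng.sample(items, size)]

def oversample(samples, labels, seed=0):
    """Randomly duplicate samples so every class matches the largest class."""
    rng = random.Random(seed)
    groups = _group_by_class(samples, labels)
    size = max(len(items) for items in groups.values())
    balanced = []
    for label, items in groups.items():
        # Draw copies uniformly at random from the existing samples.
        extra = [rng.choice(items) for _ in range(size - len(items))]
        balanced += [(s, label) for s in items + extra]
    return balanced

# Hypothetical imbalanced dataset: 8 samples of class "a", 2 of class "b".
samples = list(range(10))
labels = ["a"] * 8 + ["b"] * 2

under = undersample(samples, labels)
over = oversample(samples, labels)
print(Counter(label for _, label in under))
print(Counter(label for _, label in over))
```

Drawing the copies uniformly from the existing samples is what keeps the smaller class's distribution roughly intact in this sketch.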

Bayesian Statistics

Frequency analysis computes the likelihood of a specific occurrence purely from prior data, that is, from how often it has happened before. New information isn't taken into account, so if there is a specific change in the present, the prior data will not reflect it.

Bayesian Statistics is also based on the concept of probability, but it combines the prior data with new evidence to update the estimate and forecast the future trend.
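The updating step can be illustrated with Bayes' theorem, P(H|E) = P(E|H) · P(H) / P(E). The sketch below uses a made-up scenario (is a die loaded?) with invented probabilities; the function name and all numbers are assumptions for illustration only.

```python
def posterior(prior, likelihood, likelihood_if_not):
    """Bayes' theorem for a yes/no hypothesis H given evidence E.

    prior             -- P(H) before seeing the evidence
    likelihood        -- P(E | H)
    likelihood_if_not -- P(E | not H)
    """
    evidence = likelihood * prior + likelihood_if_not * (1 - prior)  # P(E)
    return likelihood * prior / evidence

# Hypothetical numbers: we start out 10% sure the die is loaded.
# A loaded die shows a six half the time, a fair die one time in six.
belief = 0.1
for roll in range(3):
    # Each observed six is new evidence; yesterday's posterior is today's prior.
    belief = posterior(belief, likelihood=0.5, likelihood_if_not=1 / 6)
    print(f"after {roll + 1} six(es): P(loaded) = {belief:.3f}")
```

A pure frequency estimate would ignore the suspicion that the die is loaded, whereas here each new roll shifts the belief.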

IDEAS CURATED BY

Cameron

@camz
