# Data Science And Statistics

Statistics applies math to the technical analysis of data. Instead of guesstimating, we can use data to get concrete, factual information.

The most widely used statistical concept in data science is statistical features. These include important measurements such as bias, variance, mean, median, and percentiles, all of which are easy to compute in code.
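As a rough illustration of how code-friendly these measurements are, here is a minimal sketch using Python's standard `statistics` module. The sample values are hypothetical and chosen only for illustration; bias is left out because it depends on a model or estimator that this section does not define.

```python
import statistics

# Hypothetical sample values, chosen only for illustration.
data = [2, 4, 4, 5, 7, 9, 10, 12, 15, 20]

mean = statistics.mean(data)            # arithmetic average
median = statistics.median(data)        # middle value of the sorted data
variance = statistics.pvariance(data)   # population variance: spread around the mean

# quantiles(n=4) returns the three cut points that split the data into
# quarters: the 25th, 50th, and 75th percentiles.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

print(f"mean={mean}, median={median}, variance={variance}")
print(f"25th={q1}, 50th={q2}, 75th={q3}")
```

The 50th percentile and the median are the same quantity, which is a quick sanity check when reading the output.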

### Dimensionality Reduction

Dimensionality reduction is the process of reducing the number of dimensions (or feature variables) in a dataset.

If a cube contains 1,000 data points, we can reduce its dimensionality by projecting the 3D data onto a plane and viewing it as a 2D model. We can also remove feature variables...
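The cube example can be sketched in a few lines. This is a minimal illustration, assuming randomly generated points and the simplest possible projection (dropping the z coordinate); the helper name `project_to_2d` is hypothetical, and real projects often choose the projection more carefully.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# 1,000 hypothetical points inside a unit cube, each with (x, y, z) coordinates.
points_3d = [(rng.random(), rng.random(), rng.random()) for _ in range(1000)]

def project_to_2d(points):
    """Drop the z coordinate, i.e. view the cube straight down the z axis."""
    return [(x, y) for x, y, _z in points]

points_2d = project_to_2d(points_3d)

print(len(points_2d), "points,", len(points_2d[0]), "dimensions each")
```

The number of points stays the same; only the number of feature variables per point drops from three to two.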

### Common Probability Distributions

The common probability distributions are:

1. Uniform Distribution: A simple on-or-off distribution. The probability is the same constant value everywhere inside a given range, and anything outside that range is 0.
2. Normal (or Gaussian) Distribution: This distribution has the same standard deviation in all d...
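The on-or-off behaviour of the uniform distribution can be written as a small density function. This is a sketch only; the function name `uniform_pdf` and the range used in the example are assumptions made for illustration.

```python
def uniform_pdf(x, low, high):
    """Density of a uniform distribution on the range [low, high]."""
    if high <= low:
        raise ValueError("high must be greater than low")
    if low <= x <= high:
        return 1.0 / (high - low)   # constant "on" value inside the range
    return 0.0                      # "off" everywhere outside the range

print(uniform_pdf(3.0, 2.0, 6.0))   # inside the range  -> 0.25
print(uniform_pdf(7.0, 2.0, 6.0))   # outside the range -> 0.0
```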

### Probability Distributions

In data science, probability is the chance that something will happen, expressed as a number between 0 and 1 (or as a percentage). A probability of 0 means the event will not occur, while a probability of 1 means we are certain it will happen.
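To make the 0-to-1 scale concrete, here is a small sketch that counts equally likely outcomes of a fair six-sided die. The die example and the helper name `probability` are assumptions made for illustration.

```python
from fractions import Fraction

def probability(favorable, total):
    """Probability as favorable outcomes divided by all equally likely outcomes."""
    if total <= 0 or not 0 <= favorable <= total:
        raise ValueError("need total > 0 and 0 <= favorable <= total")
    return Fraction(favorable, total)

die_faces = [1, 2, 3, 4, 5, 6]

p_even = probability(sum(1 for f in die_faces if f % 2 == 0), len(die_faces))
p_seven = probability(die_faces.count(7), len(die_faces))   # impossible event
p_any = probability(len(die_faces), len(die_faces))         # certain event

print(p_even, p_seven, p_any)   # 1/2 0 1
```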

### Bayesian Statistics

Building on the concept of probability, Bayesian statistics computes and analyzes prior data to forecast future trends. If something specific changes in the present, however, the prior data alone will not reflect that change.

Frequency analysis, therefore, is computing the likelihood of a speci...
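One standard way to combine prior data with new evidence is Bayes' theorem. The sketch below assumes a simple yes/no hypothesis; the function name `bayes_posterior` and all of the numbers are hypothetical, chosen only to show how a prior belief is updated.

```python
def bayes_posterior(prior, likelihood, false_positive_rate):
    """P(hypothesis | evidence) via Bayes' theorem for a yes/no hypothesis."""
    # Total probability of seeing the evidence, whether or not the hypothesis holds.
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    if evidence == 0:
        raise ValueError("the evidence has zero probability under these inputs")
    return likelihood * prior / evidence

# Hypothetical numbers, for illustration only.
prior = 0.01                 # belief before seeing the new evidence
likelihood = 0.95            # P(evidence | hypothesis is true)
false_positive_rate = 0.05   # P(evidence | hypothesis is false)

posterior = bayes_posterior(prior, likelihood, false_positive_rate)
print(f"posterior = {posterior:.3f}")
```

Even with strong evidence, a small prior keeps the posterior well below the likelihood, which shows how much the prior data shapes the result.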

### Over And Under Sampling

Sometimes we want to compare two datasets, or classify a dataset that has an uneven number of samples for its different classes or types. Simply taking fewer samples from the larger class (undersampling) can even out the dataset.

Oversampling is a way to copy datasets to have the same number ...
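Both ideas can be sketched with the standard `random` module. The class labels and sizes below are hypothetical, and random duplication is only the simplest form of oversampling.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical imbalanced dataset: 100 samples of class "a", 20 of class "b".
majority = [("a", i) for i in range(100)]
minority = [("b", i) for i in range(20)]

# Undersampling: keep only as many majority samples as there are minority samples.
undersampled_majority = rng.sample(majority, k=len(minority))

# Oversampling: keep every minority sample and add randomly chosen copies
# until the minority class is as large as the majority class.
extra_copies = rng.choices(minority, k=len(majority) - len(minority))
oversampled_minority = minority + extra_copies

print(len(undersampled_majority), len(minority))    # 20 20
print(len(majority), len(oversampled_minority))     # 100 100
```

Undersampling throws information away, while oversampling by copying adds no new information, so which one fits depends on how much data is available.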

### Data Reading On Statistical Features

A typical data set diagram (box plot) carries a lot of information.

1. If the box is short, the data points are similar to one another; if it is tall, the data covers a wide range and has a lot of variance.
2. A median (the line in the middle of a dataset graph) provides a more accurate reading...
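The link between box height and spread can be checked numerically. This sketch assumes the box spans the 25th to 75th percentiles (the interquartile range, or IQR), which is the usual box plot convention; the helper name `box_summary` and both datasets are hypothetical.

```python
import statistics

def box_summary(values):
    """Return (q1, median, q3, iqr): the box edges, its middle line, and its height."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q1, median, q3, q3 - q1

# Hypothetical datasets with the same median but very different spread.
similar = [48, 49, 50, 50, 50, 51, 52]
spread = [10, 30, 45, 50, 55, 70, 90]

for name, values in (("similar", similar), ("spread", spread)):
    q1, median, q3, iqr = box_summary(values)
    print(f"{name}: median={median}, box height (IQR)={iqr}, "
          f"variance={statistics.pvariance(values):.1f}")
```

Both datasets share the same middle line, but the second one produces a much taller box and a much larger variance.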

