Statistics : Data Distributions II - Deepstash

## Types of data - Discrete Data

When you roll a die or pick a card from a deck, you have a limited number of outcomes possible.

This type of data is called discrete data: it can take only a countable set of distinct values.


## Types of data - Continuous Data

A recorded time or a person’s height can take infinitely many values within a given interval.

This type of data is called Continuous Data, which can have any value within a given range.

That range can be finite or infinite.


## Discrete Data Distributions

• Discrete uniform distribution: All outcomes are equally likely
• Bernoulli distribution: A single trial with two possible outcomes
• Binomial distribution: A sequence of Bernoulli trials
• Poisson distribution: The number of times an event occurs in a fixed interval
• Chi-square distribution: Itself continuous, but used in tests on categorical count data


## Discrete uniform distribution

Uniform distribution refers to a statistical distribution in which all outcomes are equally likely.

Consider rolling a six-sided die. Each of the six numbers 1, 2, 3, 4, 5, and 6 is equally likely to come up on the next roll, so each has a probability of 1/6. This makes the die roll an example of a discrete uniform distribution.

As a result, the uniform distribution graph contains bars of equal height representing each outcome. In our example, the height is a probability of 1/6 (0.166667).
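The die example can be sketched in a few lines of Python. This is only an illustration: the helper name `discrete_uniform_pmf` is made up for this sketch and does not come from any library.

```python
from fractions import Fraction


def discrete_uniform_pmf(outcomes):
    """Map each outcome to the same probability, 1/n."""
    n = len(outcomes)
    return {outcome: Fraction(1, n) for outcome in outcomes}


# A fair six-sided die: every face gets probability 1/6.
die_pmf = discrete_uniform_pmf([1, 2, 3, 4, 5, 6])

for face, prob in die_pmf.items():
    # Print the exact fraction and its decimal value (bar height in the graph).
    print(face, prob, round(float(prob), 6))
```

Using `Fraction` keeps the probabilities exact, so they sum to exactly 1.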


## Bernoulli Distribution

It can be used as a starting point to derive more complex distributions.

Any event with a single trial and only two outcomes follows a Bernoulli distribution. Flipping a coin or choosing between True and False in a quiz are examples of a Bernoulli distribution.

We know the probability of one of the outcomes, p. From p, we can deduce the probability of the other outcome by subtracting it from the total probability (1), written as

q = 1 - p.

For example, for a biased coin with p(Head) = p = 0.3:

p(Tail) = q = 1 - p = 1 - 0.3 = 0.7

Used for binary categorical variables.
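A minimal sketch of the Bernoulli probabilities, reusing the 0.3 figure from the calculation above. The function name `bernoulli_pmf` and the coding of the two outcomes as 1 and 0 are assumptions made for illustration.

```python
def bernoulli_pmf(x, p):
    """Probability of outcome x (1 or 0) when P(x = 1) = p."""
    if x not in (0, 1):
        raise ValueError("a Bernoulli outcome must be 0 or 1")
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    return p if x == 1 else 1 - p


p_head = 0.3                         # probability of the outcome coded as 1
p_tail = bernoulli_pmf(0, p_head)    # q = 1 - p

print(p_head, p_tail)
```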


## Binomial Distribution: A sequence of Bernoulli events

A binomial random variable is the sum of the outcomes of a fixed number of independent Bernoulli trials.

Therefore, the binomial distribution applies to events with binary outcomes, where the probability of success (and hence of failure) is the same in every trial.

An example of a binomial event would be flipping a coin multiple times and counting the number of heads.

Represented as B(n, p), where

n = number of trials

p = probability of success in each trial
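As a rough illustration of B(n, p), the sketch below computes the probability of each possible number of heads in 10 flips of a fair coin. The values n = 10 and p = 0.5 are assumed for the example and are not taken from the text.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n_flips = 10    # assumed number of trials
p_heads = 0.5   # assumed probability of heads per flip

pmf = [binomial_pmf(k, n_flips, p_heads) for k in range(n_flips + 1)]

for k, prob in enumerate(pmf):
    print(k, round(prob, 4))
```

The probabilities over k = 0..n sum to 1, and for a fair coin the distribution is symmetric around n/2.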


## Poisson Distribution

Poisson distribution deals with the frequency with which an event occurs within a specific interval.

Instead of the probability of an event, Poisson distribution requires knowing how often it happens in a particular period or distance.

For example, suppose a cricket chirps two times in 7 seconds on average. We can use the Poisson distribution to determine the likelihood of it chirping five times in 15 seconds.

Represented with the notation Po(λ), where λ is the expected number of events in the interval.

The expected value and the variance of a Poisson distribution are both equal to λ.

X is a discrete random variable (the event count: 0, 1, 2, ...).
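The cricket example can be worked through as follows. This is a sketch assuming the chirp rate stays constant, so the 2-per-7-seconds average scales to λ = (2/7) × 15 ≈ 4.29 expected chirps in 15 seconds. The function name `poisson_pmf` is illustrative.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Po(lam): lam^k * e^(-lam) / k!"""
    return lam**k * exp(-lam) / factorial(k)


rate_per_second = 2 / 7          # two chirps per 7 seconds on average
lam = rate_per_second * 15       # expected chirps in a 15-second window

p_five = poisson_pmf(5, lam)     # probability of exactly five chirps
print(round(lam, 4), round(p_five, 4))
```

The result is roughly 0.17, i.e. about a one-in-six chance of hearing exactly five chirps in 15 seconds.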


## Continuous Data Distributions

• Normal Distribution: Symmetric distribution of values around the mean
• Student’s t distribution: Used in place of the normal distribution for small sample sizes
• Exponential distribution: Model elapsed time between two events
• Weibull Distribution
• Non-normal distributions
• Lognormal distribution
• F distribution


## Normal Distribution

Here, data is symmetrically distributed with no skew.

When plotted, the data follows a bell shape, with most values clustering around a central region and tapering off as they go further away from the center.

Represented as N(µ, σ²), where µ is the mean and σ² is the variance.


## 68-95-99.7 Rule

The normal curve is symmetric about its center.

Therefore, the mean, mode, and median all have the same value, and the data are distributed symmetrically around the mean.

The area under the distribution curve equals 1 (all the probabilities must sum up to 1).

The 68-95-99.7 rule states that, for normally distributed data:

About 68% of the data points fall within one standard deviation of the mean.

About 95% of the data points fall within two standard deviations of the mean.

About 99.7% of the data points fall within three standard deviations of the mean.
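These percentages can be checked numerically. The sketch below uses the standard identity P(|X − µ| ≤ kσ) = erf(k/√2) for a normal variable; the helper name `prob_within` is made up for this example.

```python
from math import erf, sqrt


def prob_within(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return erf(k / sqrt(2))


for k in (1, 2, 3):
    # Prints approximately 68.27, 95.45 and 99.73 percent.
    print(k, round(100 * prob_within(k), 2))
```

The rule's 68, 95, and 99.7 figures are these values rounded.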


## Student’s t Distribution

The t distribution resembles the normal distribution in its bell shape but has heavier tails.

It is used instead of the normal distribution when the sample size is small.

For example, if we estimate a shopkeeper’s apple sales from a full month of daily records, we can use the normal distribution. If we only have a handful of days, i.e., a smaller sample, we can use the t distribution.


## Difference between Student’s t and Normal Distributions

The critical difference between Student’s t distribution and the normal distribution is that, apart from the mean and variance, we must also specify the degrees of freedom.

In statistics, the number of degrees of freedom is the number of values in the final calculation of a statistic that are free to vary.

A Student’s t distribution is represented as t(k), where k is the number of degrees of freedom. The expected value is defined only for k > 1 (for the standard t distribution it is 0), and the variance, k/(k - 2), is finite only for k > 2.
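To make the role of the degrees of freedom concrete, the sketch below compares the density of the standard t distribution with that of the standard normal at a point far from the center. The function names and the comparison point x = 3 are chosen for illustration only.

```python
from math import exp, gamma, pi, sqrt


def t_pdf(x, k):
    """Density of the standard Student's t distribution with k degrees of freedom."""
    coef = gamma((k + 1) / 2) / (sqrt(k * pi) * gamma(k / 2))
    return coef * (1 + x * x / k) ** (-(k + 1) / 2)


def normal_pdf(x):
    """Density of the standard normal distribution N(0, 1)."""
    return exp(-x * x / 2) / sqrt(2 * pi)


x = 3.0  # a point three units from the center, out in the tail
for k in (2, 5, 30):
    print(k, round(t_pdf(x, k), 5))
print("normal", round(normal_pdf(x), 5))
```

At x = 3 the t density is larger than the normal density for every k shown, and it moves toward the normal value as k grows. This is the "heavier tails" behaviour in numbers.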


## Exponential distribution

The exponential distribution is one of the most widely used continuous distributions.

It is used to model the time elapsed between successive events.

For example, in physics, it is often used to model radioactive decay; in engineering, to model the time until a defective part arrives on an assembly line; and in finance, to model the time until the next default in a portfolio of financial assets.

Another common application of the exponential distribution is survival analysis (e.g., the expected life of a device or machine).
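A small sketch of how waiting times are computed with the exponential distribution. The rate of 0.5 events per hour is a hypothetical value, and `exponential_cdf` is an illustrative helper name.

```python
from math import exp


def exponential_cdf(t, rate):
    """P(waiting time <= t) when events occur at a constant average rate."""
    if t < 0:
        return 0.0
    return 1 - exp(-rate * t)


rate = 0.5                  # hypothetical: 0.5 events per hour on average
mean_wait = 1 / rate        # expected waiting time between events (2 hours)

# Probability that the next event arrives within the mean waiting time.
p_within_mean = exponential_cdf(mean_wait, rate)
print(mean_wait, round(p_within_mean, 4))
```

Whatever the rate, the probability of the next event arriving within the mean waiting time is 1 − e⁻¹, about 63%.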


## Weibull Distribution

It is a two-parameter family of curves.

The Weibull distribution is defined for non-negative values, starting at zero. Depending on its shape parameter, it can describe a failure rate that decreases, stays constant, or increases over time.

This data distribution is often used for reliability tests and can help us predict how long it will take for a system to fail.

It models a broad range of random variables, largely in the nature of a time to failure or time between events.


## Weibull Distribution - Parameters

Terms:

α is referred to as the shape parameter, and β is the scale parameter.

When α=1, the Weibull distribution is an exponential distribution with λ=1/β, so the exponential distribution is a special case of both the Weibull distributions and the gamma distributions.
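The special case can be verified numerically. The sketch below assumes the common parameterisation f(x) = (α/β)(x/β)^(α−1) · exp(−(x/β)^α) for x > 0; the function names and the value β = 2 are illustrative.

```python
from math import exp


def weibull_pdf(x, alpha, beta):
    """Weibull density with shape alpha and scale beta (sketch handles x > 0 only)."""
    if x <= 0:
        raise ValueError("this sketch only evaluates the density for x > 0")
    return (alpha / beta) * (x / beta) ** (alpha - 1) * exp(-((x / beta) ** alpha))


def exponential_pdf(x, lam):
    """Exponential density with rate lam, for x > 0."""
    return lam * exp(-lam * x)


beta = 2.0          # assumed scale parameter
lam = 1 / beta      # matching exponential rate

for x in (0.5, 1.0, 2.5):
    # With alpha = 1 the two densities coincide.
    print(x, weibull_pdf(x, 1.0, beta), exponential_pdf(x, lam))
```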


## Non-normal distributions

A non-normal distribution may lack symmetry, may have extreme values, or may have a flatter or steeper “dome” than a typical bell.

There is nothing inherently wrong with non-normal data; some traits simply do not follow a bell curve.

For example, data about coffee and alcohol consumption are rarely bell shaped.


## Lognormal distribution

A lognormal distribution is the continuous probability distribution of a random variable whose logarithm is normally distributed.

Thus, if the random variable X is log-normally distributed, then Y = ln(X) has a normal distribution.

Equivalently, if Y has a normal distribution, then the exponential function of Y, X = exp(Y), has a log-normal distribution.

A log-normally distributed random variable takes only positive real values. It is a convenient and useful model for measurements in the exact and engineering sciences, as well as in medicine, economics, and other fields (e.g., energies, concentrations, lengths).
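The relationship between X and its logarithm can be illustrated with a quick simulation using Python's standard `random.lognormvariate`. The parameters µ = 1.0 and σ = 0.5, the sample size, and the seed are arbitrary choices for this sketch.

```python
import math
import random
import statistics

random.seed(0)            # fixed seed so the run is repeatable

mu, sigma = 1.0, 0.5      # assumed mean and standard deviation of ln(X)

# Draw log-normally distributed values, then take their natural logarithms.
xs = [random.lognormvariate(mu, sigma) for _ in range(10_000)]
logs = [math.log(x) for x in xs]

print(min(xs) > 0)                          # every sample is positive
print(round(statistics.mean(logs), 3))      # close to mu
print(round(statistics.stdev(logs), 3))     # close to sigma
```

The logarithms of the samples have a mean near µ and a standard deviation near σ, which is what normality of ln(X) predicts.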


## F distribution / Snedecor's F distribution

The F distribution, also known as Snedecor's F distribution or the Fisher–Snedecor distribution (after Ronald Fisher and George W. Snedecor), is a continuous probability distribution that arises frequently as the null distribution of a test statistic, most notably in the analysis of variance (ANOVA) and other F-tests.


## Chi-Square distribution

A chi-square distribution has a single parameter k, the degrees of freedom, which determines its shape; different values of k give differently shaped curves.

They’re widely used in hypothesis tests, including the chi-square goodness of fit test and the chi-square test of independence.

Steps in a chi-square hypothesis test:

1. State the null hypothesis H0 and the alternative hypothesis H1
2. Choose the significance level, e.g., α = 0.05
3. Degrees of freedom = n - 1, where n is the number of categories
4. Decision boundary: check the chi-square table for the critical value
5. Calculate the test statistic

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$

f_e = expected frequency

f_o = observed frequency

6 . P
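The first five steps can be sketched for a goodness-of-fit test on a die. The observed counts below are invented for illustration (60 hypothetical rolls), and the critical value 11.070 is the standard chi-square table entry for 5 degrees of freedom at α = 0.05.

```python
def chi_square_statistic(observed, expected):
    """Sum of (f_o - f_e)^2 / f_e over all categories."""
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))


# H0: the die is fair. H1: the die is not fair.
observed = [8, 12, 9, 11, 6, 14]    # hypothetical counts from 60 rolls
expected = [10] * 6                 # 60 rolls / 6 faces under H0

alpha = 0.05                        # significance level
dof = len(observed) - 1             # number of categories minus one
critical_value = 11.070             # chi-square table: dof = 5, alpha = 0.05

stat = chi_square_statistic(observed, expected)
reject_null = stat > critical_value

print(round(stat, 2), dof, reject_null)
```

With these made-up counts the statistic is 4.2, well below the critical value, so the null hypothesis of a fair die is not rejected.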


# CURATOR'S NOTE

A data distribution is a graphical representation of data that was collected from a sample or population. It is used to organize and disseminate large amounts of information in a way that is meaningful and simple for audiences to digest.
