# Problem Solving

Statistics is the use of math to analyze data rigorously. Instead of guesstimating, we use data to get concrete, factual information.

The most widely used statistical concept in data science is statistical features: summary measurements such as bias, variance, mean, median, and percentiles. All of them are easy to compute in code.
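
As a rough illustration of how code-friendly these measurements are, the sketch below computes several of them with Python's standard `statistics` module. The sample values in `data` are made up for this example, and bias is left out because it needs a reference value to compare against.

```python
import statistics

# Hypothetical sample; the single large value (95) acts as an outlier.
data = [12, 15, 11, 14, 13, 95, 12, 16]

mean = statistics.mean(data)          # 23.5, pulled upward by the outlier
median = statistics.median(data)      # 13.5, barely affected by the outlier
variance = statistics.pvariance(data)  # population variance (spread around the mean)

# Quartiles: the 25th, 50th, and 75th percentiles.
q1, q2, q3 = statistics.quantiles(data, n=4)

print(f"mean={mean}, median={median}, variance={variance:.2f}")
print(f"25th={q1}, 50th={q2}, 75th={q3}")
```

Comparing `mean` with `median` here shows how a single extreme value can shift one summary much more than the other.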

@cam751

A box plot, a typical diagram of a dataset, carries a lot of information.

1. If the box is short, the data points are similar to one another; if it is tall, the data has a wide range and high variance.
2. The median (the line inside the box) is often a more reliable reading of the center than the mean, because outliers barely affect it.
3. The bottom edge of the box marks the 25th percentile and the top edge the 75th percentile, so lower regions of the plot correspond to lower percentiles and higher regions to higher ones.
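
The numbers behind a box plot can be sketched in a few lines. The helper `box_summary` and both sample lists below are hypothetical names and values chosen for illustration; the height of the box is taken to be the interquartile range (75th minus 25th percentile).

```python
import statistics

def box_summary(values):
    """Return the numbers a box plot draws for a list of values."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,          # bottom edge of the box (25th percentile)
        "median": median,  # line inside the box
        "q3": q3,          # top edge of the box (75th percentile)
        "max": max(values),
        "iqr": q3 - q1,    # height of the box
    }

tight = [48, 49, 50, 50, 51, 52]    # similar values, so a short box
spread = [10, 30, 50, 50, 70, 90]   # wide range, so a tall box

print(box_summary(tight))
print(box_summary(spread))
```

Both samples share the same median, yet the second box is far taller, which is the visual cue for higher variance.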

In data science, probability is the chance that an event will occur, commonly expressed on a scale from 0 to 1. A probability of 0 means the event will not occur, while 1 means we are certain it will.

The common probability distributions are:

1. Uniform Distribution: It takes a single constant value inside a given range and is 0 everywhere outside it, like a simple on/off switch.
2. Normal (or Gaussian) Distribution: It is symmetric around its mean, so the standard deviation is the same in both directions. The mean tells us the average value of the dataset, and the standard deviation tells us how spread out the data is.
3. Poisson Distribution: It describes counts of events and resembles the Normal distribution, but with added skewness, so the spread of the data differs between directions.
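
The three distributions above can be written out as small functions. This is a minimal sketch using the textbook formulas; the function names and the parameter values in the usage lines are assumptions made for this example.

```python
import math

def uniform_pdf(x, low, high):
    """Constant density inside [low, high], zero outside."""
    return 1.0 / (high - low) if low <= x <= high else 0.0

def normal_pdf(x, mean, std):
    """Bell curve, symmetric around the mean."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))

def poisson_pmf(k, rate):
    """Probability of observing exactly k events at the given average rate."""
    return math.exp(-rate) * rate ** k / math.factorial(k)

print(uniform_pdf(5, 0, 10), uniform_pdf(11, 0, 10))   # inside vs. outside the range
print(normal_pdf(-1, 0, 1), normal_pdf(1, 0, 1))       # equal: symmetric around the mean
print(poisson_pmf(1, 3), poisson_pmf(5, 3))            # unequal: skewed around the rate
```

The last line compares two points equally far from the average rate of 3; the differing probabilities are the skewness the list mentions.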

The process of reduction in the number of dimensions (or feature variables) in datasets is known as Dimensionality Reduction.

If a cube contains 1000 points, we can reduce its dimensionality by projecting the 3D data onto a 2D plane. We can also remove feature variables to reduce the data volume. This is generally done with features that have a low correlation with the output we are trying to predict, and it is called feature pruning.
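
Feature pruning can be sketched as follows. This is one possible approach, not a standard recipe: it scores each feature by its Pearson correlation with a target column and drops the weak ones. The helper names, the toy housing data, and the 0.3 threshold are all assumptions made for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (assumes neither is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def prune_features(features, target, threshold=0.3):
    """Keep features whose absolute correlation with the target meets the threshold."""
    return {
        name: column
        for name, column in features.items()
        if abs(pearson(column, target)) >= threshold
    }

# Hypothetical data: two informative features and one unrelated one.
features = {
    "size": [50, 60, 80, 100, 120],
    "rooms": [2, 2, 3, 4, 5],
    "door_color_code": [3, 1, 2, 1, 3],
}
price = [150, 180, 240, 310, 360]

kept = prune_features(features, price)
print(sorted(kept))
```

Correlation only captures linear relationships, so a feature dropped this way may still carry information in other forms.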

Sometimes we want to compare or classify datasets that have an uneven number of samples for the different classes. Taking fewer samples from the larger class (undersampling) evens out the dataset.

Oversampling does the opposite: it creates copies of examples from the smaller class until it has the same number of examples as the larger class. The copies are made so that the distribution of the smaller class is maintained.
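
Both resampling ideas can be sketched with the standard `random` module. The function names and the toy class lists are hypothetical, and the sketch assumes the first argument is always the larger class.

```python
import random

def undersample(majority, minority, seed=0):
    """Randomly keep only as many majority examples as there are minority examples."""
    rng = random.Random(seed)
    return rng.sample(majority, len(minority)), list(minority)

def oversample(majority, minority, seed=0):
    """Add random copies of minority examples until both classes are the same size."""
    rng = random.Random(seed)
    copies = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return list(majority), list(minority) + copies

majority = list(range(10))    # 10 examples of the larger class
minority = [100, 101, 102]    # 3 examples of the smaller class

small_major, small_minor = undersample(majority, minority)
big_major, big_minor = oversample(majority, minority)
print(len(small_major), len(small_minor))
print(len(big_major), len(big_minor))
```

Undersampling discards data from the larger class, while oversampling repeats data from the smaller one, so the choice depends on how many examples can be spared.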

Bayesian statistics builds on the concept of probability: it starts from prior data and updates the estimate as new evidence arrives. Prior data alone will not reflect a specific change in the present.

Frequency analysis, by contrast, computes the likelihood of a specific occurrence from prior data only; new information is not taken into account.
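
To make the contrast concrete, here is a small sketch with a die. The frequency estimate counts past outcomes only, while the Bayesian step uses Bayes' theorem to revise a belief after new evidence. Every number in it (the past rolls, the 10% prior that the die is loaded, the loaded die landing on 6 half the time) is an assumption chosen for this example.

```python
def bayes_posterior(prior, likelihood, evidence):
    """Bayes' theorem: P(H | E) = P(E | H) * P(H) / P(E)."""
    return likelihood * prior / evidence

# Frequency view: estimate P(six) purely from past rolls.
past_rolls = [1, 6, 3, 2, 4, 5, 6, 1, 2, 3, 4, 5]
frequency_estimate = past_rolls.count(6) / len(past_rolls)

# Bayesian view: how likely is it that the die is loaded, after seeing a six?
prior_loaded = 0.1            # assumed belief before the roll
p_six_if_loaded = 0.5         # assumed behavior of a loaded die
p_six_if_fair = 1 / 6

# Total probability of rolling a six under both hypotheses.
p_six = p_six_if_loaded * prior_loaded + p_six_if_fair * (1 - prior_loaded)

posterior_loaded = bayes_posterior(prior_loaded, p_six_if_loaded, p_six)
print(f"frequency estimate of a six: {frequency_estimate:.3f}")
print(f"belief that the die is loaded: {prior_loaded} before, {posterior_loaded:.2f} after")
```

A single six raises the belief that the die is loaded, and each further roll could be fed back in the same way with the posterior serving as the next prior.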

Hindsight bias is the false belief, formed when we look back on events, that our judgement was better than it actually was. Reality appears more predictable after an event has happened. This is also known as the ‘knew-it-all-along effect’.

This bias makes people less accountable for their decisions, and overconfident in their ability to make those decisions, due to the various mental models that they have developed.

Prior knowledge of an event’s outcome anchors the mind towards a certain kind of interpretation of how the event will unfold.

This leads the individual to interpret the outcome so that it fits their existing expectations. If the outcome is entirely different from the expected result, the mind tries to justify the claimed foresight. The mind is constantly learning and updating previously held knowledge, and hindsight bias makes the process less of a burden.

As is often said, the simplest explanations tend to be the most likely, and the more difficult it is for us to imagine an outcome, the less likely it seems. Any new information then tends to be processed through our prior judgements and speculations, which solidifies our justifications in our minds.

Example: When a couple breaks up, the small problems or quarrels noticed earlier seem to indicate that the breakup was obvious and expected. The brain did not highlight the same problems while the couple was together.

To protect ourselves from hindsight bias, we can discipline ourselves to make explicit decisions based on actual, known facts, in a recorded process.

Having explicit, written documentation can show both us and external observers that the decision was objective, and that it was not based (even unconsciously) on bias and prejudice.

Confirmation bias is the common tendency to seek out and favor information that validates our own beliefs. On most controversial issues, people are either for or against, and they tend to look at the points that support their existing belief patterns.

Daniel Kahneman, a Nobel Prize-winning psychologist, states that although we can be given tools to become aware of human cognitive errors and biases, we are still unable to fix our own.

We form mental models of learning, and see any new information based on our pre-existing belief patterns, assumptions, and education, forming a framework of information in our minds.

The new information could easily be rejected if it does not integrate into the existing framework.

• We must learn to recognize that our thinking is always biased.
• We put our prior beliefs into everything we see, read, or hear, and we make matters worse by reading only our own viewpoint and rejecting anything that does not agree with our existing framework of information.
• We need to cross to the other shore and develop a good understanding of the counter-beliefs and the things we do not expose ourselves to.
• A mild disagreement is fine when you are talking to someone outside your intellectual social circle, as long as you come away with a viewpoint you had not seen before.

Life is always more out of our control than we would prefer it to be. Even with the most meticulous planning, the perfect day only shows up now and then.

If we were to have a perfect day every day, it would quickly become just another normal part of our experience. Then we would need a new fantasy to take its place.

Similar to the desire for the perfect day, an ideal life can mean enforcing a rigid uniformity that does more harm than good.

Chasing utopian dreams never takes us exactly where we want to go, because ideas change, people change, and new technologies develop.

Dictators from history had an ideal world in mind that would last. But their dreams were never realized, and instead left catastrophic destruction behind.

We are unable to plan a perfect life without also fully understanding the complexity of life. Things we think we want now might be different from what we want in the near future.

We believe people have many needs and values that can come together in perfect harmony. We think, under the right conditions, education, technology and political systems, we can completely solve all our problems.

But we have to stop and consider these questions: What if our values and needs contradict each other? What if gaining somewhere means losing somewhere else?

Choosing one way of life means giving up many others. A desire for privacy is at odds with convenience tools like Google and Facebook. Long-term travel will mean being lonely at times.

It is not possible to combine a diversity of forms of life within a single person.

Obsessing over the idea of a perfect life, where you compress yourself into a single focused point, means that you will suffer from tunnel vision, and tunnel vision means that you will miss much of life.

The perfect life is always around the corner, but the decent life is right here already if you can stop for long enough to see it.

Generally, scientists occupy themselves with longstanding research programs that follow on from previous ones.

But this pattern can be disrupted by unexpected breakthroughs that result from novel experimental findings. Anomalous experimental results lead to a surge in publications about possible interpretations and implications.

The question is: how much time should a scientist dedicate to revising research goals instead of pursuing past interests?

Too little time spent brainstorming might lead to inconsequential research directions full of dead ends and stagnation. Too much time spent planning can raise conflicting considerations that end in procrastination. In order to discover something unexpected, it is necessary to take risks.

Since it is unclear in advance which research direction will yield results, scientific progress needs independent explorers. Sometimes a conventional path leads to an unexpected breakthrough, but more often, a traditional path leads to a traditional result.

Following new paths brings fresh opportunities for discovering hidden treasures.