All You Need to Know About Bias in Statistics

The tendency of a measurement process to overestimate or underestimate the value of a population parameter is referred to as bias in statistics. More broadly, the term describes any systematic error or distortion in how data is collected, analyzed, or interpreted.
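One way to make this definition concrete is to treat bias as the average reading minus the true value. The following is a minimal sketch under assumed conditions: the two "scales", the 0.5 offset, and the noise level are hypothetical values chosen only for illustration.

```python
import random
import statistics


def estimate_bias(measure, true_value, n_trials=10_000, seed=0):
    """Approximate bias as the average reading minus the true value."""
    rng = random.Random(seed)
    readings = [measure(true_value, rng) for _ in range(n_trials)]
    return statistics.fmean(readings) - true_value


def miscalibrated_scale(true_weight, rng):
    # Hypothetical scale: reads 0.5 too high on average, plus random noise.
    return true_weight + 0.5 + rng.gauss(0, 1.0)


def calibrated_scale(true_weight, rng):
    # Hypothetical scale: same random noise, but no systematic offset.
    return true_weight + rng.gauss(0, 1.0)


bias_bad = estimate_bias(miscalibrated_scale, true_value=70.0)
bias_good = estimate_bias(calibrated_scale, true_value=70.0)
print(f"miscalibrated scale bias: {bias_bad:+.3f}")
print(f"calibrated scale bias:    {bias_good:+.3f}")
```

Random noise averages out over many readings, while the systematic offset does not. That persistent offset is what the term bias refers to.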

Bias can occur for various reasons, including a failure to respect comparability or consistency, the data collection and measurement procedures used, and the formula used for calculation and aggregation.

Measurement Errors

A measurement error occurs when a recorded response differs from the true value. For example, you could conduct a survey to determine whether or not someone voted for President Obama. Someone may have voted for him, but the wording of the questionnaire confuses them, and they mistakenly respond that they did not. Several factors can cause measurement error, including:

The method by which data is collected
The way the question is phrased
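The effect of such mistaken answers on a survey result can be sketched with simple arithmetic. The function name and the rates below are hypothetical, chosen only to illustrate the idea: if a fraction of true "yes" respondents answer "no" by mistake, the recorded share of "yes" falls below the true share.

```python
def observed_yes_share(true_yes_share, yes_to_no_rate, no_to_yes_rate=0.0):
    """Share of 'yes' answers recorded when some respondents answer wrongly."""
    true_no_share = 1.0 - true_yes_share
    # 'Yes' respondents who answer correctly, plus 'no' respondents who
    # mistakenly answer 'yes'.
    return (true_yes_share * (1.0 - yes_to_no_rate)
            + true_no_share * no_to_yes_rate)


true_share = 0.52  # assumed true share of 'yes' respondents
observed = observed_yes_share(true_share, yes_to_no_rate=0.10)
print(f"true: {true_share:.3f}  observed: {observed:.3f}  "
      f"error: {observed - true_share:+.3f}")
```

With these assumed numbers, the recorded share is 0.52 × 0.90 = 0.468, so a confusing question alone would be enough to turn a majority into an apparent minority.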

Classification of Bias

Bias is commonly divided into two types:

Measurement Bias

Measurement bias can arise at any stage of a survey. Common causes include:

Errors made while recording the data
Leading questions in the survey
False responses from respondents

Non-Representative Sampling Bias

Non-representative sampling bias occurs when a survey sample does not accurately represent the population. This typically happens when the researcher unintentionally works with only a subset of the population, so the sample is unrepresentative of the whole.

Types of Statistical Bias

This section explores some commonly seen types of statistical bias:

Selection Bias

Selection bias occurs when you choose your sample or data incorrectly. Typically, this means working with a subset of your audience rather than the entire population, which makes your sample unrepresentative. There are many reasons for this, but the most common is that people collect and work only with data that is easy to access.

For example, suppose you're conducting a poll to determine how people feel about the current government. You gather responses from people who are easy to reach and quick to reply. Unfortunately, many of them cite their Facebook feed as their primary source of information. Their answers are not a reliable gauge of public opinion, because a feed largely reflects the opinions of their friends rather than those of the public. This is classic selection bias: the data is easy to obtain, but it comes from a specific, unrepresentative subset of the overall population.
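The gap between a convenience sample and a random sample can be sketched with a small simulation. Everything in it is an assumption made for illustration: the population size, the 30 percent share of easy-to-reach people, and the approval rates of 70 and 40 percent in the two groups.

```python
import random


def approval_rate(people):
    """Fraction of people in the list who approve."""
    return sum(p["approves"] for p in people) / len(people)


rng = random.Random(42)

# Hypothetical population: 30% are easy to reach online. Assumed approval
# is 70% in that group and 40% among everyone else.
population = []
for _ in range(100_000):
    online = rng.random() < 0.30
    approves = rng.random() < (0.70 if online else 0.40)
    population.append({"online": online, "approves": approves})

# Convenience sample: drawn only from the easy-to-reach group.
easy_to_reach = [p for p in population if p["online"]]
convenience_sample = rng.sample(easy_to_reach, 1_000)

# Simple random sample: drawn from the whole population.
random_sample = rng.sample(population, 1_000)

print(f"population:         {approval_rate(population):.3f}")
print(f"convenience sample: {approval_rate(convenience_sample):.3f}")
print(f"random sample:      {approval_rate(random_sample):.3f}")
```

Under these assumptions the random sample lands close to the population rate, while the convenience sample mostly reflects the easy-to-reach group. Collecting more convenience data would not close that gap.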

Self-Selection Bias

Self-selection bias occurs when people decide for themselves whether to join a group or take part in a study. Because this is a form of nonprobability sampling, the resulting sample tends to be biased.

Now, look at another example. Assume you're researching the habits of successful entrepreneurs. Genuinely successful people often do not have time to answer unsolicited surveys, so few of them will respond. As a result, the large majority of responses may come from entrepreneurs who believe they are successful but are not. To obtain accurate data, the surveyor would need to select successful entrepreneurs directly, for example through face-to-face interviews, instead of relying on whoever chooses to respond.
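How unequal response rates skew the set of respondents can be worked out directly. This is a sketch with hypothetical numbers: it assumes 20 percent of those surveyed are genuinely successful, that 5 percent of them respond, and that 30 percent of everyone else responds.

```python
def share_among_respondents(group_share, group_response_rate,
                            other_response_rate):
    """Fraction of respondents that belong to the group of interest.

    Assumes at least one of the two response rates is greater than zero.
    """
    responding_group = group_share * group_response_rate
    responding_other = (1.0 - group_share) * other_response_rate
    return responding_group / (responding_group + responding_other)


# Assumed values, for illustration only.
successful_share = share_among_respondents(
    group_share=0.20,           # 20% of those surveyed are successful
    group_response_rate=0.05,   # they rarely have time to respond
    other_response_rate=0.30,   # everyone else responds more often
)
print(f"successful entrepreneurs among respondents: {successful_share:.1%}")
```

With these assumed rates, the successful group makes up a fifth of the people surveyed but only about 4 percent of the responses, so conclusions drawn from the responses describe mostly the other group.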

Recall Bias

Recall bias occurs when a respondent's recollection of events is inaccurate or incomplete. It's especially problematic when it comes to retrospective survey questions.

Let's say you went on vacation three years ago. You may have forgotten the bad things and remember only the good ones. This selective memory does not help when an accurate account is needed, but our brains tend to retain positive memories more readily than negative ones.

Observer Bias

When a researcher subconsciously projects their expectations onto the research, this is known as observer bias. It can take many forms, including influencing participants unintentionally during interviews and surveys, or cherry-picking (focusing on the statistics that support the researcher's hypothesis rather than those that don't).

In 1963, psychologist Robert Rosenthal had two groups of students test rats on mazes. The rats were labeled "bright" or "dull," supposedly according to their ability to complete mazes, even though they were all the same type of standard lab rat.

The study found that students who thought they were handling "bright" rats behaved in ways that increased the rats' chances of completing the mazes, while students who thought they were handling "dull" rats behaved in ways that decreased their chances of completing the mazes.

Because the students' expectations influenced how well the different groups of rats performed, this is an example of observer bias.

Survivorship Bias

Survivorship bias is a type of statistical bias in which the researcher concentrates only on the parts of the data set that have already undergone some sort of pre-selection process and ignores the data points that have been lost during this process because they are not visible anymore.

For example, the case of falling cats is a well-known story about statistical bias. According to a 1987 study, cats that fell from higher floors sustained fewer injuries than cats that fell from lower floors. Terminal velocity was offered as the explanation: a cat falling from a great height reaches its top speed and then has enough time to prepare for the landing.

However, about ten years later, a newspaper column pointed out a flaw: cats that die in a fall are unlikely to be brought to a vet, so they never appear in the data. The chances of dying may well be much higher for cats that fall from higher buildings, but the study counted only the ones that survived. In other words, the original conclusion did not take survivorship bias into account.
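The effect of leaving non-survivors out of the data can be sketched with a toy simulation. The severity scale, the fatal threshold, and the uniform distribution are all hypothetical; they are not taken from the study and serve only to show the mechanism.

```python
import random
import statistics

# Hypothetical threshold: a fall with severity at or above this is fatal.
FATAL_SEVERITY = 10.0


def simulate_falls(max_severity, n_falls, rng):
    """Return (severity of every fall, severity of survivors only)."""
    severities = [rng.uniform(0.0, max_severity) for _ in range(n_falls)]
    survivors = [s for s in severities if s < FATAL_SEVERITY]
    return severities, survivors


rng = random.Random(7)

# Assumed: falls from high floors have severity anywhere between 0 and 20.
all_high, seen_high = simulate_falls(max_severity=20.0, n_falls=10_000, rng=rng)

true_mean = statistics.fmean(all_high)   # every fall, including fatal ones
vet_mean = statistics.fmean(seen_high)   # only the cats that reach a vet

print(f"falls recorded by the vet: {len(seen_high)} of {len(all_high)}")
print(f"mean severity, all falls:      {true_mean:.2f}")
print(f"mean severity, survivors only: {vet_mean:.2f}")
```

Because the most severe cases are filtered out before anyone records them, the average computed from survivors sits well below the average over all falls in this toy setup.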


Conclusion

In this tutorial, you learned about bias in statistics and its different types. You also saw how bias can be introduced into your data, knowingly or unknowingly.

