The Complete Guide to Skewness and Kurtosis

In today's world, data is becoming increasingly important. Entire professions are dedicated to studying, understanding, manipulating, and processing data. Hence, it is important to know about the different types of data and their associated properties.

The normal distribution is one of the most commonly encountered probability distributions. It is defined by a symmetrical, bell-shaped curve. Real-world data, however, often departs from this ideal shape, and skewness and kurtosis are used to measure that departure. In this tutorial, ‘The Complete Guide to Skewness and Kurtosis’, you will explore the different ways in which a distribution can deviate from the normal curve.

What Is a Normal Distribution?

A normal distribution is a continuous probability distribution for a random variable. A random variable is a variable whose value depends on the outcome of a random event. For example, flipping a coin will give you either heads or tails at random. You cannot determine with absolute certainty whether the next outcome will be a head or a tail.

When you plot the probabilities of the outcomes of a random event, you get its probability distribution. The distribution of a random variable that can take on any value within a range is called a continuous probability distribution. The number of values that the variable could take is infinite, and the probabilities form a continuous curve. Hence, instead of assigning a probability to each individual value, you find the probability that the variable lies within a range.
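To make the idea of "probability over a range" concrete, here is a minimal sketch using Python's standard `statistics.NormalDist`. The standard normal variable (mean 0, standard deviation 1) and the helper name `probability_between` are assumptions chosen for illustration; they do not come from the tutorial.

```python
from statistics import NormalDist

# Hypothetical example: a standard normal variable (mean 0, standard deviation 1).
standard_normal = NormalDist(mu=0.0, sigma=1.0)


def probability_between(dist, low, high):
    """Return the probability that a value drawn from `dist` lies in [low, high]."""
    # The cumulative distribution function gives P(X <= x),
    # so the probability of a range is the difference of two CDF values.
    return dist.cdf(high) - dist.cdf(low)


p_within_one_sd = probability_between(standard_normal, -1.0, 1.0)
print(f"P(-1 <= X <= 1) = {p_within_one_sd:.4f}")
```

For a continuous variable, the probability of any single exact value is zero, which is why the range is what matters; here roughly 68% of the probability lies within one standard deviation of the mean.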

When the continuous probability distribution curve is bell-shaped, i.e., it looks like a hill with a well-defined peak, it is said to be a normal distribution. The peak of the curve is at the mean, and the data is symmetrically distributed on either side of it. The mean, median, and mode are equal to each other or lie close to each other.

Figure 1: Normal distribution  
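The claim that the mean, median, and mode coincide can be checked with a short sketch. The parameters (mean 70, standard deviation 10), the seed, and the sample size are illustrative assumptions, not values from the tutorial.

```python
import random
from statistics import NormalDist, mean, median

# Hypothetical normal distribution with mean 70 and standard deviation 10.
scores = NormalDist(mu=70.0, sigma=10.0)

# For the theoretical distribution, all three measures are identical.
print(scores.mean, scores.median, scores.mode)

# For a finite random sample they are only approximately equal.
rng = random.Random(42)  # fixed seed so the run is repeatable
sample = [rng.gauss(70.0, 10.0) for _ in range(10_000)]
sample_mean = mean(sample)
sample_median = median(sample)
print(f"sample mean = {sample_mean:.2f}, sample median = {sample_median:.2f}")
```

In real data the three values rarely match exactly, which is why the text says they are equal or lie close to each other.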

Consider the marks scored in a math test by students in a class. Most students would score close to the average mark. Fewer students would score well below or well above it, and fewer still would fall in the bottom 10% or the top 10%. Some examples of data that roughly follow a normal distribution are:

  1. Blood pressure of people
  2. I.Q. scores
  3. Heights of people

What Is Skewness?

Skewness measures the degree of asymmetry in a distribution, that is, how far the data departs from the symmetry of the normal curve.

Sometimes, a distribution tilts more to one side. This happens when values on one side of the mean extend further than values on the other side, which makes the distribution asymmetrical. It also means that the data is not equally distributed around the mean. Skewness can be of two types:

1. Positively Skewed: In a positively skewed distribution, the values are concentrated towards the left side, and the right tail is spread out. The long right tail pulls the mean towards the right, so the skewness value is positive. The mean, median, and mode themselves do not have to be positive. In this distribution, typically Mean > Median > Mode.

Figure 2: Positively Skewed 

2. Negatively Skewed: In a negatively skewed distribution, the data points are concentrated towards the right-hand side of the distribution, and the left tail is spread out. The long left tail pulls the mean towards the left, so the skewness value is negative. The mean, median, and mode themselves do not have to be negative. In this distribution, typically Mode > Median > Mean.

Figure 3: Negatively Skewed 
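The sign of the skewness can be illustrated with a small sketch. Note the assumption: the function below uses the common moment-based definition of skewness (third central moment divided by the cubed standard deviation), which is a different measure from Pearson's coefficients discussed in this tutorial. The data sets are made up for illustration.

```python
from statistics import fmean, median


def moment_skewness(values):
    """Moment-based skewness (population form): m3 / m2 ** 1.5."""
    n = len(values)
    mu = fmean(values)
    m2 = sum((x - mu) ** 2 for x in values) / n  # second central moment (variance)
    m3 = sum((x - mu) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical data: most values are small, with a long tail to the right.
right_tailed = [1, 2, 2, 2, 3, 3, 4, 5, 8, 15]
# Mirroring the data moves the long tail to the left.
left_tailed = [-x for x in right_tailed]

for name, data in (("right-tailed", right_tailed), ("left-tailed", left_tailed)):
    print(name, "mean:", fmean(data), "median:", median(data),
          "skewness:", round(moment_skewness(data), 3))
```

The right-tailed data has its mean above its median and a positive skewness; the mirrored data has the opposite ordering and a negative skewness of the same magnitude.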

Pearson’s First Coefficient

In a moderately skewed distribution with a single peak, the median usually lies between the mean and the mode, so you can derive a formula that captures the horizontal distance between the mean and the mode.

Pearson’s First Coefficient = (Mean - Mode) / Standard Deviation

Figure 4: Pearson’s First Coefficient 

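The above formula gives you Pearson's first coefficient. Dividing by the standard deviation expresses the difference between the mean and the mode in units of standard deviation, which makes the coefficient independent of the scale of the data and comparable across data sets. The result is not restricted to the range -1 to 1. For moderately skewed distributions, the mode, mean, and median are approximately related as shown below.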

Mode ≈ 3 × Median - 2 × Mean

Figure 5: Mode in terms of mean and median 

Substituting this in Pearson’s first coefficient gives us Pearson’s second coefficient and the formula for skewness:

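Pearson’s Second Coefficient = (Mean - (3 × Median - 2 × Mean)) / Standard Deviation = 3 × (Mean - Median) / Standard Deviation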

Figure 6: Pearson’s Second Coefficient
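Both coefficients can be computed with Python's standard `statistics` module. This is a minimal sketch: the function names and the data set are hypothetical, and the use of the sample standard deviation (`stdev`) rather than the population form is an assumption, since the tutorial does not say which one it intends.

```python
from statistics import mean, median, mode, stdev


def pearson_first(values):
    """Pearson's first coefficient: (mean - mode) / standard deviation."""
    return (mean(values) - mode(values)) / stdev(values)


def pearson_second(values):
    """Pearson's second coefficient: 3 * (mean - median) / standard deviation."""
    return 3 * (mean(values) - median(values)) / stdev(values)


# Hypothetical data with a long right tail: mean 4.5, median 3, mode 2.
data = [1, 2, 2, 2, 3, 3, 4, 5, 8, 15]

print(f"first coefficient  = {pearson_first(data):.2f}")
print(f"second coefficient = {pearson_second(data):.2f}")
```

For this data the two coefficients come out at about 0.6 and 1.07. They differ because the relationship between mode, mean, and median is only approximate; the second coefficient is often preferred when the mode is poorly defined.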

If this value is:

  1. Between -0.5 and 0.5, the distribution is almost symmetrical.
  2. Between -1 and -0.5 (negatively skewed) or between 0.5 and 1 (positively skewed), the data is moderately skewed.
  3. Lower than -1 (negatively skewed) or greater than 1 (positively skewed), the data is highly skewed.
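These rules of thumb can be sketched as a small helper. The function name and the returned labels are hypothetical, and the handling of values that fall exactly on a boundary (0.5 or 1) is an assumption, because the ranges above do not say which side a boundary value belongs to.

```python
def classify_skewness(skewness):
    """Label a skewness value using the rule-of-thumb ranges listed above."""
    magnitude = abs(skewness)
    # Assumption: boundary values (exactly 0.5 or 1) fall into the milder category.
    if magnitude <= 0.5:
        return "approximately symmetrical"
    direction = "positively" if skewness > 0 else "negatively"
    strength = "moderately" if magnitude <= 1 else "highly"
    return f"{strength} {direction} skewed"


for value in (0.2, -0.7, 1.5):
    print(value, "->", classify_skewness(value))
```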

What Is Kurtosis?

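Kurtosis describes how heavy the tails of a distribution are compared with the normal distribution. Heavy tails mean that extreme values (outliers) are more likely, so kurtosis is often used as an indicator of the presence of outliers in your data.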

The data can be light-tailed, and the peak can be flatter, almost like pressing down on the distribution or squishing it. This is called Negative Kurtosis (Platykurtic). If the distribution is heavy-tailed and the peak is sharper, like pulling up the distribution, it is called Positive Kurtosis (Leptokurtic).

Figure 7: (a) Leptokurtic, (b) Normal Distribution, (c) Platykurtic

The kurtosis of a normal distribution is 3, and this value serves as the reference point. A kurtosis greater than three indicates Positive Kurtosis. In this case, the excess kurtosis (kurtosis minus three) ranges from 0 to infinity. A kurtosis less than three indicates Negative Kurtosis. Because kurtosis can never be lower than 1, the excess kurtosis for Negative Kurtosis ranges from -2 to 0. The greater the value of kurtosis, the heavier the tails and, typically, the sharper the peak.

Excess Kurtosis = Kurtosis - 3

Figure 8: Excess Kurtosis
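Kurtosis and excess kurtosis can be computed directly from the data. The sketch below assumes the common moment-based definition (fourth central moment divided by the squared variance, population form); the function names and the three data sets are hypothetical.

```python
from statistics import fmean


def kurtosis(values):
    """Moment-based kurtosis (population form): m4 / m2 ** 2."""
    n = len(values)
    mu = fmean(values)
    m2 = sum((x - mu) ** 2 for x in values) / n  # second central moment (variance)
    m4 = sum((x - mu) ** 4 for x in values) / n  # fourth central moment
    return m4 / m2 ** 2


def excess_kurtosis(values):
    """Kurtosis measured relative to the normal distribution's value of 3."""
    return kurtosis(values) - 3.0


two_point = [-1, 1]                 # no tails at all: the lowest possible kurtosis
flat = list(range(1, 101))          # evenly spread values: platykurtic
heavy = [0] * 98 + [-10, 10]        # mostly identical values plus two outliers: leptokurtic

for name, data in (("two_point", two_point), ("flat", flat), ("heavy", heavy)):
    print(name, "kurtosis:", round(kurtosis(data), 2),
          "excess:", round(excess_kurtosis(data), 2))
```

The two-point data reaches the minimum excess kurtosis of -2, the evenly spread data has a negative excess kurtosis of about -1.2, and the data with two outliers has a large positive excess kurtosis.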

Hence, you can say that Skewness and Kurtosis describe how the shape of your data departs from the normal distribution. Skewness denotes the horizontal pull on the data: it tells you whether, and in which direction, the distribution leans. Kurtosis denotes the vertical pull: it tells you how sharp the peak is and how heavy the tails are.

Conclusion

In this tutorial, ‘The Complete Guide to Skewness and Kurtosis’, you learned the concepts of Skewness and Kurtosis and how to find their mathematical values. You also took a look at how different values of skewness and kurtosis affect the shape of the distribution.

About the Author

Kartik Menon

Kartik is an experienced content strategist and an accomplished technology marketing specialist passionate about designing engaging user experiences with integrated marketing and communication solutions.
