In today's world, data is becoming increasingly important. Entire professions are dedicated to studying, understanding, manipulating, and processing data. It is therefore important to know the different types of data and their associated properties.
One of the most commonly encountered probability distributions is the normal distribution, which is defined by a symmetrical bell-shaped curve. However, real data often departs from this shape. This departure can be measured using skewness and kurtosis. In this tutorial titled ‘The Simplified and Complete Guide to Skewness and Kurtosis’, you will explore the different types of distortion that can occur in a normal curve.
What Is a Normal Distribution?
A normal distribution is a continuous probability distribution for a random variable. A random variable is a variable whose value depends on the outcome of a random event. For example, flipping a coin gives you either heads or tails at random; you cannot determine with certainty whether the next outcome will be a head or a tail.
When you plot the probabilities of the outcomes of a random event, you get its probability distribution. When the random variable can take any value within a range, its distribution is called a continuous probability distribution. The number of possible values is infinite, and the probabilities form a continuous curve. Hence, instead of assigning a probability to each individual value, you work with the probability that the value falls within a given range.
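The idea of assigning probability to a range rather than to a single value can be sketched in Python. This is a minimal illustration using the standard library's `statistics.NormalDist`; the mean of 70 and standard deviation of 10 are assumed values chosen for the example, and `probability_between` is a hypothetical helper name.

```python
from statistics import NormalDist

# Assumed example: test scores following a normal distribution
# with mean 70 and standard deviation 10 (values are illustrative).
scores = NormalDist(mu=70, sigma=10)


def probability_between(dist, low, high):
    """Return P(low <= X <= high) as a difference of two CDF values."""
    return dist.cdf(high) - dist.cdf(low)


within_one_sd = probability_between(scores, 60, 80)
print(f"P(60 <= score <= 80) = {within_one_sd:.4f}")
```

Under these assumptions, roughly 68% of the scores fall within one standard deviation of the mean.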
When the continuous probability distribution curve is bell-shaped, i.e., it looks like a hill with a well-defined peak, it is said to be a normal distribution. The peak of the curve is at the mean, and the data is symmetrically distributed on either side of it. The mean, median, and mode are equal to each other or lie close to each other.
Figure 1: Normal distribution
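As a rough check of the claim that the mean and median of normally distributed data lie close together, the sketch below draws simulated values and compares the two. The mean of 70, the standard deviation of 10, and the sample size are assumptions made for this example only.

```python
import random
from statistics import mean, median

random.seed(42)  # fixed seed so the run is repeatable

# Assumed example: 10,000 simulated values from a normal distribution
# with mean 70 and standard deviation 10.
samples = [random.gauss(70, 10) for _ in range(10_000)]

sample_mean = mean(samples)
sample_median = median(samples)

print(f"mean   = {sample_mean:.2f}")
print(f"median = {sample_median:.2f}")
print(f"gap    = {abs(sample_mean - sample_median):.2f}")
```

The gap between the two values should be small compared with the standard deviation of 10.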
Consider the marks scored in a math test by students in a class. The majority of the students would have scored close to the average mark. Fewer students would have scored noticeably less or noticeably more, and fewer still would be in the bottom 10% and the top 10%. Some examples of quantities that are often treated as approximately normally distributed are:
- Blood pressure of people
- I.Q. scores
- Salaries
Measures of Skewness and Kurtosis
What Is Skewness?
Skewness measures the degree of asymmetry in a distribution, that is, how far the shape of the data departs from the symmetry of the normal curve.
Sometimes a distribution tilts more to one side. This happens when values are more likely to fall on one side of the mean than on the other, which makes the distribution asymmetrical. In other words, the data is not equally distributed around the mean. Skewness can be of two types:
1. Positively Skewed: In a positively skewed distribution, the values are concentrated towards the left side, and the right tail is spread out. The few large values in that tail pull the mean towards the right, past the median and the mode. In this distribution, typically Mean > Median > Mode.
Figure 2: Positively Skewed
2. Negatively Skewed: In a negatively skewed distribution, the data points are concentrated towards the right-hand side of the distribution, and the left tail is spread out. The few small values in that tail pull the mean towards the left, below the median and the mode. In this distribution, typically Mode > Median > Mean.
Figure 3: Negatively Skewed
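The ordering of mean, median, and mode in the two cases can be checked on small samples. The two lists below are hand-made, hypothetical data sets (one is the mirror image of the other), and `summarize` is an illustrative helper, not part of any library.

```python
from statistics import mean, median, mode

# Hypothetical samples chosen by hand to show each shape.
right_tailed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 14]        # long tail to the right
left_tailed = [1, 6, 10, 11, 12, 12, 13, 13, 13, 14]  # long tail to the left


def summarize(data):
    """Return (mean, median, mode) for a list of numbers."""
    return mean(data), median(data), mode(data)


for name, data in [("right-tailed", right_tailed), ("left-tailed", left_tailed)]:
    m, med, mo = summarize(data)
    print(f"{name}: mean={m}, median={med}, mode={mo}")
```

For the right-tailed sample the mean is the largest of the three values, and for the left-tailed sample it is the smallest, which matches the orderings Mean > Median > Mode and Mode > Median > Mean.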
Pearson’s First Coefficient
In a skewed distribution, the median typically lies between the mean and the mode, so you can derive a formula that captures the horizontal distance between the mean and the mode.
Figure 4: Pearson’s First Coefficient
The above formula gives you Pearson's first coefficient. Dividing by the standard deviation expresses the difference between the mean and the mode in units of standard deviation, so the result does not depend on the scale of the data. For moderately skewed data, the mode, mean, and median are approximately related as shown below.
Figure 5: Mode in terms of mean and median
Substituting this in Pearson’s first coefficient gives us Pearson’s second coefficient and the formula for skewness:
Figure 6: Pearson’s Second Coefficient
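For reference, the three steps can be written out in LaTeX. These are the standard textbook forms, which the figures above are assumed to show; the symbols \bar{x} for the mean and s for the standard deviation are notation chosen for this sketch.

```latex
\begin{align}
Sk_1 &= \frac{\bar{x} - \text{Mode}}{s} \\
\text{Mode} &\approx 3\,\text{Median} - 2\,\bar{x} \\
Sk_2 &= \frac{\bar{x} - \left(3\,\text{Median} - 2\,\bar{x}\right)}{s}
      = \frac{3\left(\bar{x} - \text{Median}\right)}{s}
\end{align}
```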
If this value is:
- between -0.5 and 0.5, the distribution is almost symmetrical.
- between -1 and -0.5 (negatively skewed) or between 0.5 and 1 (positively skewed), the data is moderately skewed.
- lower than -1 (negatively skewed) or greater than 1 (positively skewed), the data is highly skewed.
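A minimal Python sketch of this calculation follows. It computes Pearson's second coefficient as 3 × (mean − median) / standard deviation and maps the result to the bands listed above. The function names and the sample data are hypothetical, and the use of the population standard deviation (`pstdev`) is an assumption, since the tutorial does not say which form to use.

```python
from statistics import mean, median, pstdev


def pearson_second_skewness(data):
    """Pearson's second coefficient: 3 * (mean - median) / standard deviation.

    Assumption: the population standard deviation (pstdev) is used.
    """
    return 3 * (mean(data) - median(data)) / pstdev(data)


def describe_skewness(value):
    """Map a skewness value to the bands described in the text."""
    if -0.5 <= value <= 0.5:
        return "almost symmetrical"
    direction = "negatively" if value < 0 else "positively"
    if -1 <= value <= 1:
        return f"moderately {direction} skewed"
    return f"highly {direction} skewed"


# Hypothetical samples for illustration.
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 14]

for name, data in [("symmetric", symmetric), ("right-tailed", right_tailed)]:
    value = pearson_second_skewness(data)
    print(f"{name}: skewness = {value:.2f} ({describe_skewness(value)})")
```

Swapping `pstdev` for `stdev` gives slightly smaller magnitudes on small samples, which can move a borderline value into a different band.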
What Is Kurtosis?
Kurtosis describes how heavy the tails of a distribution are, and so indicates how prone the data is to outliers.
When the distribution is light-tailed and the peak is flatter, almost as if the distribution had been punched or squished down, it has Negative Kurtosis (Platykurtic). When the distribution is heavy-tailed and the peak is steeper, as if the distribution had been pulled up, it has Positive Kurtosis (Leptokurtic).
Figure 7: (a) Leptokurtic, (b) Normal Distribution, (c) Platykurtic
The kurtosis of a normal distribution is 3. A kurtosis greater than three indicates Positive Kurtosis; there is no upper limit, so the value can range from 3 to infinity. A kurtosis less than three indicates Negative Kurtosis; since kurtosis can never be less than 1, the value lies between 1 and 3. Subtracting 3 gives the excess kurtosis, which is 0 for a normal distribution, positive for a leptokurtic distribution, and between -2 and 0 for a platykurtic one. The greater the kurtosis, the heavier the tails and, typically, the sharper the peak.
Figure 8: Excess Kurtosis
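Excess kurtosis can be sketched in Python as the kurtosis minus 3, where kurtosis is taken as the fourth central moment divided by the square of the second central moment. The population (divide-by-n) form of the moments is an assumption made for this example, and the function names and sample data are hypothetical.

```python
from statistics import mean


def kurtosis(data):
    """Fourth standardized moment, using population (divide-by-n) moments."""
    n = len(data)
    centre = mean(data)
    second_moment = sum((x - centre) ** 2 for x in data) / n
    fourth_moment = sum((x - centre) ** 4 for x in data) / n
    return fourth_moment / second_moment ** 2


def excess_kurtosis(data):
    """Kurtosis relative to the normal distribution's value of 3."""
    return kurtosis(data) - 3


# Hypothetical samples for illustration.
flat_data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]       # evenly spread values
outlier_data = [-10, 0, 0, 0, 0, 0, 0, 0, 0, 10]  # two extreme values

print(f"flat data:    excess kurtosis = {excess_kurtosis(flat_data):.2f}")
print(f"outlier data: excess kurtosis = {excess_kurtosis(outlier_data):.2f}")
```

The evenly spread sample gives a negative excess kurtosis, while the sample dominated by two extreme values gives a positive one.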
Hence, you can say that skewness and kurtosis describe how the shape of your data departs from the normal distribution. Skewness denotes the horizontal pull on the data: it tells you in which direction, and how strongly, the data leans away from symmetry. Kurtosis denotes the vertical pull: it tells you how the peak and the tails compare with those of a normal curve.
Conclusion
In this tutorial ‘The Simplified and Complete Guide to Skewness and Kurtosis’, you saw the concepts of skewness and kurtosis and how to find their mathematical values. You also took a look at how different values of skewness and kurtosis affect the distribution.
Statistical concepts like Skewness and Kurtosis are critical concepts applied in the field of Data Analytics.