When you have a huge dataset, it is convenient to work with a few numbers that summarize its important properties instead of inspecting every value. This is possible with a statistical concept called Measures of Central Tendency, which includes the common terms mean, median, and mode.
The most fundamental terms for analyzing data through statistics are mean, median, and mode. In this tutorial on the Measures of Central Tendency, you will look at various terms and try to understand mean, median, and mode with definition, formulae, and solved examples.
Of mean, median, and mode, let's first look at the mean, which comes in several types. We start with the arithmetic mean.
What Is Arithmetic Mean?
In statistics, the arithmetic mean is the average of all the data values you work with. It gives you the value around which your data values are centered.
Generally, when working with data, you may want to know the average data value. The mean incorporates every data value from the dataset in its calculation. It is also the single value that minimizes the sum of squared deviations from the data points, so it keeps the overall error small when one number has to stand in for the whole dataset. The result is a single term that sums up the dataset well.
Figure 1: Arithmetic Mean
To find the mean, all you have to do is add up all the values in your data and then divide the sum by the total number of data values. Consider n terms X_1, X_2, X_3, …, X_n. The mean is the sum of the terms divided by the number of terms.
Figure 2: Arithmetic Mean formula
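Written out for the n terms above, the arithmetic mean can be expressed as follows. The symbol \bar{X} for the mean is a notational assumption made here; the original figure may use a different symbol.

```latex
\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n} = \frac{1}{n} \sum_{i=1}^{n} X_i
```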
Now, you will understand the mean with the help of an example. Consider a class whose students have obtained the following marks out of 50 in mathematics:
Figure 3: Class marks data
You can see that there are 12 data points. So all you have to do is add up each value and divide the result by 12, as shown below:
Figure 4: Class marks mean
Hence, you get the mean as 37. This means that, on average, a student belonging to the above class will score 37 out of 50 in mathematics.
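The same add-and-divide calculation can be sketched in Python. The marks below are hypothetical values, not the ones from the figure; they were chosen only so that 12 marks out of 50 reproduce the mean of 37 quoted above.

```python
from statistics import mean

# Hypothetical marks out of 50 for 12 students (illustrative only, NOT the
# values from the figure); chosen so that the mean comes out as 37.
marks = [28, 32, 35, 35, 35, 36, 38, 39, 40, 41, 42, 43]

# Add up every value, then divide by the number of values.
manual_mean = sum(marks) / len(marks)

# The standard library performs the same calculation.
library_mean = mean(marks)

print(manual_mean, library_mean)
```

Both approaches give the same value, so in practice you would normally reach for `statistics.mean` and keep the manual version only to show what it does.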
Next in this mean, median, and mode tutorial, we move on to the median.
What Is Median?
Median refers to the middle value of your data. To find the median, you first sort the data in either ascending or descending order and then find the numerical value present in the middle of your data.
You can use the median to figure out the point around which your data is centered. It divides the data into two halves, with the same number of data points above and below it.
The median is especially useful when you have skewed data, that is, data with a long tail of extreme values on one side. In this case, the mean wouldn't give you a fair mid-value because it is pulled toward the extreme values in the tail. You can use the middle data point as the central point instead.
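To see how skew affects the two measures, here is a small sketch with hypothetical salary figures (invented for illustration, not taken from this tutorial's data). One very large value drags the mean upward while the median stays put.

```python
from statistics import mean, median

# Hypothetical monthly salaries (illustrative only): most values are close
# together, but one very large value skews the data to the right.
salaries = [30_000, 32_000, 35_000, 36_000, 38_000, 40_000, 250_000]

skewed_mean = mean(salaries)      # pulled upward by the single large value
skewed_median = median(salaries)  # stays at the middle value of the sorted data

print("mean:  ", skewed_mean)
print("median:", skewed_median)
```

Here the mean ends up larger than six of the seven salaries, which is why the median is often the fairer summary for data like this.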
Consider n terms X_1, X_2, X_3, …, X_n, sorted in order. The median sits at position (n + 1)/2. This works directly when you have an odd number of terms, because there is exactly one middle term with the same number of terms above and below it. For an even number of terms, take the two middle terms, at positions n/2 and n/2 + 1, and find their average.
Figure 5: Median Formula
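For n values sorted in ascending order, the two cases can be written as one formula. The notation X_{(k)} for the k-th value after sorting is an assumption introduced here for clarity and may differ from the original figure.

```latex
\text{Median} =
\begin{cases}
X_{\left(\frac{n+1}{2}\right)} & \text{if } n \text{ is odd} \\[6pt]
\dfrac{X_{\left(\frac{n}{2}\right)} + X_{\left(\frac{n}{2}+1\right)}}{2} & \text{if } n \text{ is even}
\end{cases}
```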
Now, use the same example of a class of 12 students and their marks in mathematics and find the median of this data.
Figure 6: Class marks
To find the middle term, you first have to sort the data in ascending or descending order. Only then does the middle position hold the middle value.
Figure 7: Sorted class marks
You can see that we have 12 data points, so use the median formula for even numbers.
Figure 8: Class marks median
So, the median of the marks is 37. This means that half of the marks lie at or below 37 and the other half lie at or above it.
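The sort-then-take-the-middle procedure can be sketched as a small Python function. The helper name `median_manual` and the marks are hypothetical; the marks are illustrative values chosen to give a median of 37, not the data from the figures.

```python
from statistics import median

def median_manual(values):
    """Return the median: sort, then take the middle value (odd count)
    or the average of the two middle values (even count)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

# Hypothetical, unsorted marks out of 50 for 12 students (illustrative only).
marks = [35, 42, 28, 36, 40, 35, 43, 32, 38, 35, 41, 39]

print(median_manual(marks), median(marks))
```

With 12 values, the function averages the 6th and 7th sorted values, which is the even-count case described above; `statistics.median` follows the same rule.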
We now come to the last of the mean, median, and mode trio - mode.
What Is Mode?
The Mode refers to the most frequently occurring value in your data. You find the frequency of occurrence of each number and the number with the highest frequency is your mode. If there are no recurring numbers, then there is no mode in the data.
Using the mode, you can find the most commonly occurring point in your data. This is helpful when you have to find the central tendency of categorical values, like the flavor of the most popular chip sold by a brand. You cannot find the average based on the orders; instead, you choose the chip flavor with the highest orders.
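For categorical data such as the chip-flavor example, finding the mode comes down to counting. The sketch below uses hypothetical order data (the flavor names are invented) together with Python's standard `collections.Counter` and `statistics` helpers.

```python
from collections import Counter
from statistics import mode, multimode

# Hypothetical chip orders (illustrative only).
orders = ["salted", "barbecue", "salted", "cheese", "barbecue", "salted"]

# Count how often each flavor occurs and pick the most frequent one.
counts = Counter(orders)
top_flavor, top_count = counts.most_common(1)[0]
print(top_flavor, top_count)

# statistics.mode also works on non-numeric data.
library_mode = mode(orders)
print(library_mode)

# When several values tie for the highest frequency, multimode lists them all.
tied_modes = multimode([1, 1, 2, 2, 3])
print(tied_modes)
```

Counting works here even though the flavors cannot be added or averaged, which is why the mode is the natural measure for categories.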
Usually, you can count the most frequently occurring values and get your mode. But this only works when the values are discrete. Now, again take the example of class marks.
Figure 9: Class marks
Over here, the value 35 occurs the most frequently and hence is the mode. But what if the values are grouped into class intervals instead of being listed individually? In that case, you estimate the mode with the formula below:
Figure 10: Mode
Mode = l + ((f1 - f0) / (2f1 - f0 - f2)) × h
l = lower limit of modal class
h = size (width) of the class interval
f1 = frequency of modal class
f0 = frequency of class preceding modal class
f2 = frequency of class succeeding modal class
The modal class is simply the class with the highest frequency. Consider the range of frequencies given for the marks obtained by students in a class:
Number of Students
Table 1: Class Marks
In this case, you can see that class 30-40 has the highest frequency, hence it is the modal class. The remaining values are as follows:
l = 30
h = 10 (the width of the class 30-40)
f1 = 5
f0 = 3
f2 = 4
In that case, the mode becomes:
Figure 11: Class marks mode
Mode = 30 + ((5 - 3) / (2 × 5 - 3 - 4)) × 10 = 30 + (2/3) × 10 ≈ 36.67
Hence, the estimated mode of the marks is about 36.67, which lies inside the modal class 30-40.
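The grouped-data calculation can be checked with a short sketch. It takes h to be the class width, which is the standard reading of this formula (10 for the class 30-40), and it guarantees an estimate inside the modal class. The function name `grouped_mode` is a hypothetical helper, and the frequencies are the ones listed above.

```python
def grouped_mode(l, h, f1, f0, f2):
    """Estimate the mode of grouped data by interpolating inside the modal class.

    l  : lower limit of the modal class
    h  : width of the class interval
    f1 : frequency of the modal class
    f0 : frequency of the class preceding the modal class
    f2 : frequency of the class succeeding the modal class
    """
    denominator = 2 * f1 - f0 - f2
    if denominator == 0:
        raise ValueError("formula is undefined when 2*f1 - f0 - f2 equals 0")
    return l + (f1 - f0) / denominator * h

# Modal class 30-40 with the frequencies listed above.
estimate = grouped_mode(l=30, h=10, f1=5, f0=3, f2=4)
print(round(estimate, 2))
```

With these inputs the estimate is about 36.67, which falls between 30 and 40 as an estimate from the modal class should.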
What Is Geometric Mean?
So far, you have only looked at mean, median, and mode, the basic measures of central tendency. But the mean itself comes in several types. Let's look at the other types of mean.
Unlike the arithmetic mean, which adds the numbers, the geometric mean multiplies our data points to find the rate of growth. It is used to calculate population or interest growth.
The geometric mean considers compounding values. You use the geometric mean on data that is not independent of each other and grows over time. Using geometric mean, you can find the average growth rate of values and find out how the data will look over time. For example, you can calculate bacteria growth, the average return of an investment portfolio, etc. using geometric mean.
Consider n terms X_1, X_2, X_3, …, X_n. The geometric mean is obtained by taking the nth root of the product of all the terms.
Figure 12: Geometric Mean
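The nth-root-of-the-product definition can be sketched in Python on compounding data. The growth factors below are hypothetical yearly returns invented for illustration; `math.prod` and `statistics.geometric_mean` both require Python 3.8 or later.

```python
import math
from statistics import geometric_mean

# Hypothetical yearly growth factors for an investment (illustrative only):
# +10 %, -5 % and +20 % correspond to multiplying by 1.10, 0.95 and 1.20.
growth_factors = [1.10, 0.95, 1.20]

# Multiply all the terms, then take the nth root of the product.
product = math.prod(growth_factors)
manual_gm = product ** (1 / len(growth_factors))

# The standard library computes the same quantity.
library_gm = geometric_mean(growth_factors)

# Subtracting 1 turns the average factor into an average growth rate.
average_growth_rate = manual_gm - 1

print(manual_gm, library_gm, average_growth_rate)
```

Growing by the geometric mean every year gives the same final amount as the three actual yearly factors applied in sequence, which is what makes it the right average for compounding data.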
Let’s consider the mathematics marks of a class again.
Figure 13: Class marks
In this case, the geometric mean is as shown below.
Figure 14: Geometric mean of class marks
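This means that the geometric mean of the class marks is 32.201. Because marks are not growth factors, this value is a multiplicative average of the marks rather than a growth rate; the growth interpretation applies only to data that compounds over time.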
What Is Harmonic Mean?
The harmonic mean is used to average rates and ratios. You calculate it by taking the reciprocal of each data point and then finding the arithmetic mean of those reciprocals. You then take the reciprocal of the resulting arithmetic mean to get the harmonic mean.
The arithmetic mean can mislead when your data consists of rates or ratios, such as speeds or prices per unit, which are often expressed as fractions or decimals. For example, if you travel the same distance at two different speeds, the average speed for the whole trip is the harmonic mean of the two speeds, not their arithmetic mean. The harmonic mean is therefore usually used for averaging ratios and rates.
Consider n terms X_1, X_2, X_3, …, X_n. The harmonic mean is the reciprocal of the arithmetic mean of the reciprocals of the terms.
Figure 15: Harmonic mean
Consider a class whose students have obtained the following marks out of 50 in mathematics:
Figure 16: Class marks
The resulting harmonic mean is:
Figure 17: Harmonic mean of class marks
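The reciprocal-of-the-mean-of-reciprocals definition can be sketched on rate data, where the harmonic mean is most commonly applied. The speeds below are hypothetical values for a trip covering the same distance twice and are not taken from the class marks example.

```python
from statistics import harmonic_mean

# Hypothetical speeds in km/h over two legs of equal distance (illustrative only).
speeds = [40, 60]

# Reciprocal of the arithmetic mean of the reciprocals,
# which simplifies to n divided by the sum of the reciprocals.
manual_hm = len(speeds) / sum(1 / s for s in speeds)

# The standard library computes the same quantity.
library_hm = harmonic_mean(speeds)

# The arithmetic mean, shown for comparison.
arithmetic = sum(speeds) / len(speeds)

print(manual_hm, library_hm, arithmetic)
```

The harmonic mean comes out below the arithmetic mean because more time is spent travelling on the slower leg, so the slower speed carries more weight.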
In this tutorial about the Measure of Central Tendency, you got an overview of central tendency terms like mean, median, and mode and different types of mean like harmonic and geometric mean, all with the help of definition, formulae, and solved examples.