A correlation coefficient is a descriptive statistic that summarizes the data and helps you compare results between sample data. It is unit-free, which means that you can compare the coefficients directly. In this tutorial, you will learn what correlation is and the different types of the correlation coefficient.
What Is Correlation?
Correlation refers to the statistical relationship between two variables. It measures the extent to which the two variables are linearly related. For example, the height and weight of a person are related: taller people tend to be heavier than shorter people.
You can apply correlation to a variety of data sets. In some cases, you may be able to predict how things will relate, while in others, the relation will come as a complete surprise. It's important to remember that just because something is correlated doesn't mean it's causal.
There are three types of correlation:
- Positive Correlation: A positive correlation means that this linear relationship is positive, and the two variables increase or decrease in the same direction.
- Negative Correlation: A negative correlation is just the opposite. The relationship line has a negative slope, and the variables change in opposite directions, i.e., one variable decreases while the other increases.
- No Correlation: No correlation means that a change in one variable is not accompanied by any consistent linear change in the other, so the variables have no linear relationship.
What Is a Correlation Coefficient?
- Correlation coefficients give you the measure of the strength of the linear relationship between two variables.
- The letter r denotes the value, and it ranges between -1 and +1. The closer r is to -1 or +1, the stronger the linear relationship.
- If r < 0, it implies negative correlation.
- If r > 0, it implies positive correlation.
- If r = 0, it implies no linear correlation.
- Calculating the correlation coefficient by hand takes time; therefore, the data is usually entered into a calculator, computer, or statistics program to calculate it.
Types of Correlation Coefficient
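The two most commonly used types of correlation coefficients are Pearson's and Spearman's. A third, Kendall's rank correlation, is covered later in this tutorial.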
Pearson’s Product Moment Correlation
The Pearson correlation coefficient measures the strength and direction of the linear relationship between two variables. It is denoted by r.
The correlation coefficient can be calculated by using the formula below:
$$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \, \sum_{i=1}^{n}(y_i - \bar{y})^2}}$$
- $r$ = Coefficient of correlation
- $\bar{x}$ = Mean of x-variable
- $\bar{y}$ = Mean of y-variable
- $x_i$, $y_i$ = Samples of variable x, y
- $n$ = Number of observations
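The Pearson formula divides the sum of the products of the deviations from the means by the square root of the product of the two sums of squared deviations. A minimal sketch in plain Python follows; the function name `pearson_r` and the sample data are assumptions made for illustration, not part of the tutorial.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need the same length, at least 2")
    n = len(x)
    x_bar = sum(x) / n  # mean of the x-variable
    y_bar = sum(y) / n  # mean of the y-variable
    # Numerator: sum of products of deviations from the means.
    numerator = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # Denominator: square root of the product of summed squared deviations.
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    ss_y = sum((yi - y_bar) ** 2 for yi in y)
    denominator = sqrt(ss_x * ss_y)
    if denominator == 0:
        raise ValueError("r is undefined when a variable is constant")
    return numerator / denominator


# Hypothetical sample data, for illustration only.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [52, 58, 69, 75, 88]
print(round(pearson_r(heights_cm, weights_kg), 3))
```

The explicit check for a zero denominator is there because r is not defined when either variable never changes.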
Spearman’s Rank Correlation
Spearman’s rank correlation measures the strength and direction of association between two ranked variables. It measures how monotonic the relationship between two variables is, i.e., how well that relationship can be represented by a monotonic function.
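When there are no tied ranks, it can be calculated by using the formula below:
$$\rho = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$$
- $\rho$ = Spearman rank correlation
- $d_i$ = Difference between the ranks of corresponding variables
- $n$ = Number of observations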
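The common Spearman formula for data without ties, ρ = 1 - 6Σd² / (n(n² - 1)), can be sketched as below. The helper names `ranks` and `spearman_rho` and the sample data are assumptions for illustration. The sketch rejects tied values, because the simple formula is only exact when every rank is distinct.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(x, y):
    """Spearman rank correlation via the no-ties formula."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need the same length, at least 2")
    if len(set(x)) != len(x) or len(set(y)) != len(y):
        raise ValueError("this simple formula assumes no tied values")
    n = len(x)
    # d_i is the difference between the two ranks of observation i.
    sum_d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


# Hypothetical data: y grows with x, but not in a straight line.
x = [1, 2, 3, 4]
y = [1, 4, 9, 16]
print(spearman_rho(x, y))
```

Because only the ranks matter, any strictly increasing relationship gives ρ = 1, even when it is not linear.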
Calculate Correlation Using Excel
You will now see how you can calculate the correlation between two variables using Excel. Here, you have data on the temperature of the day and the units of ice cream sold on that day.
Step 1: On the Data tab, in the Analysis group, click Data Analysis.
Step 2: Select Correlation and click OK.
Step 3: Select the input range and the output range.
Step 4: Click OK. You will get the correlation data.
The result, 0.774, shows that there is a strong positive correlation between the two variables.
Kendall Rank Correlation
So far, you have learned what a correlation coefficient is. Now, let's look at Kendall rank correlation, a non-parametric measure of how strongly two variables depend on each other. Given two samples, a and b, each of size n, the total number of possible pairings of observations is n(n-1)/2. Kendall rank correlation can be calculated as follows:
Kendall Correlation Coefficient Formula
$$\tau = \frac{n_c - n_d}{\tfrac{1}{2}\, n(n-1)}$$
- $n_c$ = Number of concordant pairs
- $n_d$ = Number of discordant pairs
- $n$ = Sample size
When data can be ordered by quantity, Kendall rank correlation is used to check whether two orderings are similar. Unlike other correlation coefficients, which are computed from the observed values themselves, Kendall's coefficient takes pairs of observations and measures the strength of association from the pattern of concordance and discordance between those pairs.
- Concordant: Ordered the same way (consistency). If (x2 - x1) and (y2 - y1) have the same sign, the pair of observations is said to be concordant.
- Discordant: Ordered differently (inconsistency). If (x2 - x1) and (y2 - y1) have opposite signs, the pair of observations is said to be discordant.
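Counting concordant and discordant pairs and applying (nc - nd) / (0.5 * n * (n - 1)) can be sketched as follows. The function name `kendall_tau` and the sample data are assumptions for illustration. Tied pairs, where either difference is zero, are counted as neither concordant nor discordant in this simplified version.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall rank correlation from concordant and discordant pair counts."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need the same length, at least 2")
    n = len(x)
    concordant = 0
    discordant = 0
    # Visit each of the n(n-1)/2 pairs of observations exactly once.
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x2 - x1) * (y2 - y1)
        if sign > 0:    # differences share a sign: concordant
            concordant += 1
        elif sign < 0:  # differences have opposite signs: discordant
            discordant += 1
        # sign == 0 means a tie; it is counted as neither here.
    return (concordant - discordant) / (0.5 * n * (n - 1))


# Hypothetical data: two neighbouring pairs are swapped in y.
x = [1, 2, 3, 4, 5]
y = [1, 3, 2, 5, 4]
print(kendall_tau(x, y))
```

In this data, 8 of the 10 pairs are concordant and 2 are discordant, so the result is (8 - 2) / 10 = 0.6.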
Spearman's rho values are often larger in magnitude than Kendall's tau values for the same data. Kendall's tau is computed from concordant and discordant pairs. Next, let’s see how to find the correlation coefficient.
Calculate the Correlation Coefficient
Before going through the steps, recall what correlation means in statistics. Correlation is a statistical measure, expressed as a number, that describes the size and direction of a relationship between two or more variables. However, a correlation between two variables does not necessarily imply that a change in one variable causes a change in the values of the other. Let’s calculate the Pearson correlation coefficient step by step.
Determine Your Data Sets
Decide on your variables before you start the calculation. Once you are familiar with your data sets, you can plug their values into the formula. Label the two sets of values x and y, for example:
x: (1, 2, 3, 4) and y: (2, 3, 4, 5)
Calculate the Mean of the X and Y Variables
To determine the mean, also known as the average, add the values of each variable together and divide by the number of values in the data set. Using the example, x has four values, so you add 1, 2, 3, and 4 together and divide by 4, which gives a mean of 2.5. Likewise, y has four values, so you add 2, 3, 4, and 5 together and divide by 4, which gives a mean of 3.5.
Subtract the Mean
Subtract the mean of x from each x-value and call the result "a." In the same way, subtract the mean of y from each y-value and call the result "b." For the example data, both the a-values and the b-values are -1.5, -0.5, 0.5, and 1.5.
Multiply and Find The Sum
Next, multiply each a-value by the matching b-value. Once you have performed all of the multiplications, add the products together. This sum is the numerator of the formula.
Determine the Square Root
At this stage, square each a-value and add the squares together, then do the same for the b-values. Multiply the two sums and compute the square root of the product. This is the denominator of the formula.
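Now, divide the number you came up with in step 4 by the number you came up with in step 5, that is, divide the numerator by the denominator. The result is the correlation coefficient. For the example data, r = 1, a perfect positive correlation, because every y-value is exactly its x-value plus 1.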
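The steps can be traced in Python for the example data x: (1, 2, 3, 4) and y: (2, 3, 4, 5). This is a sketch of the hand calculation, with the denominator taken as the square root of the sum of squared a-values times the sum of squared b-values, as in the Pearson formula. The variable names are chosen only for this illustration.

```python
from math import sqrt

# Step 1: the data sets from the example.
x = [1, 2, 3, 4]
y = [2, 3, 4, 5]

# Step 2: the mean of each variable.
mean_x = sum(x) / len(x)  # 2.5
mean_y = sum(y) / len(y)  # 3.5

# Step 3: subtract the mean from each value.
a = [xi - mean_x for xi in x]  # [-1.5, -0.5, 0.5, 1.5]
b = [yi - mean_y for yi in y]  # [-1.5, -0.5, 0.5, 1.5]

# Step 4: multiply matching a- and b-values and sum them (numerator).
numerator = sum(ai * bi for ai, bi in zip(a, b))  # 5.0

# Step 5: square root of (sum of a squared) * (sum of b squared).
denominator = sqrt(sum(ai ** 2 for ai in a) * sum(bi ** 2 for bi in b))  # 5.0

# Step 6: divide the numerator by the denominator.
r = numerator / denominator
print(r)  # 1.0
```

The result of 1.0 reflects that y increases by exactly one unit every time x does, so the points lie on a straight line with a positive slope.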
Limitations of Correlation
1. Correlation is not causation and cannot be taken to imply it. Even when there is a very strong relationship between two variables, we cannot infer that one causes the other.
Imagine, for instance, that there is a link between viewing violent TV shows and engaging in violent conduct as a teenager. It's possible that a third (extraneous) variable, such as growing up in a violent household, causes both the TV viewing and the violent behavior.
2. Correlation does not allow us to go beyond the data that is given. For instance, suppose research revealed a link between the amount of time students spend on their homework (from half an hour to three hours) and the number of G.C.S.E. passes (1 to 6). It would be incorrect to conclude from this that putting in 6 hours of homework would probably result in 12 G.C.S.E. passes, because 6 hours lies outside the range that was observed.
Examples of Correlation
Now, let’s look at some real-life examples of negative and positive correlation.
Example 1: Body Fat and Running Time
An individual's body fat tends to be lower the more time they spend jogging. In other words, there is a negative correlation between the variable body fat and the variable running time. Body fat decreases as running time increases.
Example 2: Exam Results and TV Viewing Time
Exam results typically suffer when a student watches more television. In other words, there is a negative correlation between the variable amount of time spent watching TV and the variable exam grade. Exam results decline as TV viewing time increases.
Example 3: Height Vs Weight
The correlation between a person's height and weight is usually positive. To put it another way, taller people tend to weigh more.
Example 4: Temperature Vs. Sales of Ice Cream
Temperature and overall ice cream sales have a positive correlation. In other words, since more people like to buy ice cream when it's hot outdoors, a company's overall ice cream sales tend to be greater when it's hotter outside.
In this tutorial, you have learned what correlation is. You also explored correlation coefficients and their types, including the Pearson product-moment, Spearman rank, and Kendall rank correlations, along with their formulas.