What Is ANOVA? Understanding the Fundamentals of ANOVA

Statistics is a branch of mathematics that covers the collection, organization, analysis, interpretation, and presentation of data. As an interdisciplinary field, statistics has several concepts with practical applications. Analysis of Variance, also known as ANOVA, is one such concept, and it is the subject of this article.

What Is Analysis of Variance (ANOVA)?

ANOVA tests for differences among population means by comparing the amount of variation between the samples with the amount of variation within each sample. It tests the null hypothesis that the means of two or more populations are equal.

In a regression study, analysts use the ANOVA test to determine the impact of independent variables on the dependent variable.

When Might You Use ANOVA?

As an analyst, you might use Analysis of Variance (ANOVA) to test a particular hypothesis. You'd use ANOVA to compare how your various groups respond, with the null hypothesis being that the means of the groups are equal. If the result is statistically significant, you reject the null hypothesis and conclude that at least one group mean differs from the others.

Now that we have understood what ANOVA is, let’s understand some important terms related to ANOVA.

Important Terms Related to ANOVA

  • Means (Grand and Sample)

A sample mean is the average value for a group, whereas the grand mean is the mean of all observations combined. When every group has the same number of observations, the grand mean also equals the average of the sample means.

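  • F-Statistic

The F-statistic, or F-ratio, is the ratio of the variability between the group means to the variability within the groups. It tells us how far apart the sample means are relative to the spread inside each sample. The lower the F-ratio, the closer the sample means are to one another.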

  • Sum of Squares

The sum of squares is the sum of the squared deviations of data points from a mean, and it measures the dispersion of the data. ANOVA splits the total sum of squares into a between-group part and a within-group part, which are used to compute the value of F.

  • Mean Squared Error (MSE)

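In ANOVA, the Mean Squared Error is the within-group sum of squares divided by its degrees of freedom. It estimates the variance of the observations around their own group mean and forms the denominator of the F-ratio.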

  • Hypothesis

In ANOVA, we have a Null Hypothesis and an Alternative Hypothesis. The Null Hypothesis states that all the population means are equal, so any differences among the sample means are due to chance.

The Alternative Hypothesis states that at least one of the population means differs from the others.

  • Group Variability

In ANOVA, a group is the set of observations that share one level of the independent variable.

  • Between-group variability is the variation of the group means around the grand mean. It is large when the groups differ from one another.
  • Within-group variability is the variation of the individual observations around the mean of their own group.
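The terms above fit together in a handful of formulas. The following is a sketch of the standard one-way ANOVA quantities. The notation is an assumption introduced here and does not come from the article: k groups, n_i observations in group i, N observations in total, x_ij for observation j of group i, \bar{x}_i for the sample mean of group i, and \bar{x} for the grand mean.

```latex
\begin{align*}
H_0 &: \mu_1 = \mu_2 = \dots = \mu_k
  & H_1 &: \text{at least one } \mu_i \text{ differs} \\
SS_{\text{between}} &= \sum_{i=1}^{k} n_i \,(\bar{x}_i - \bar{x})^2
  & SS_{\text{within}} &= \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2 \\
MS_{\text{between}} &= \frac{SS_{\text{between}}}{k - 1}
  & MSE &= \frac{SS_{\text{within}}}{N - k} \\
F &= \frac{MS_{\text{between}}}{MSE}
\end{align*}
```

A large F means the group means are spread out relative to the noise inside each group, which is evidence against the null hypothesis.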

One-Way ANOVA

The most common method of performing an ANOVA test is one-way ANOVA. The one-way ANOVA means that the analysis of variance has one independent variable. 

You can use the one-way ANOVA to see if there are any significant differences between the means of the dependent variable across the groups (levels) of your independent variable. When you know how each group's mean differs from the others, you can judge whether the independent variable is linked to your dependent variable and start to figure out what's driving that behaviour.

The two-way analysis of variance is an extension of the one-way analysis. It has two independent variables (hence the name two-way), which are called factors. The idea is that the dependent variable is influenced by both factors.
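To make the two-factor idea concrete, here is a minimal sketch of a two-way ANOVA in its simplest form, with exactly one observation per combination of factor levels (the "without replication" case). That restriction, the function name `two_way_anova_no_replication`, and the numbers in `hypothetical_table` are all assumptions made for illustration; none of them come from the article.

```python
from statistics import mean


def two_way_anova_no_replication(table):
    """Compute F-ratios for two factors from a rectangular table.

    table[i][j] is the single observation for level i of factor A (rows)
    and level j of factor B (columns).
    """
    n_rows = len(table)
    n_cols = len(table[0])
    all_values = [x for row in table for x in row]
    grand_mean = mean(all_values)

    row_means = [mean(row) for row in table]
    col_means = [mean(col) for col in zip(*table)]

    # Variation explained by each factor, measured around the grand mean.
    ss_a = n_cols * sum((m - grand_mean) ** 2 for m in row_means)
    ss_b = n_rows * sum((m - grand_mean) ** 2 for m in col_means)

    # Whatever the two factors do not explain is treated as error.
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)
    ss_error = ss_total - ss_a - ss_b

    df_a = n_rows - 1
    df_b = n_cols - 1
    df_error = df_a * df_b
    ms_error = ss_error / df_error

    return {
        "ss_a": ss_a,
        "ss_b": ss_b,
        "ss_error": ss_error,
        "f_a": (ss_a / df_a) / ms_error,
        "f_b": (ss_b / df_b) / ms_error,
    }


# Hypothetical data: rows are three levels of factor A,
# columns are two levels of factor B.
hypothetical_table = [
    [10.0, 12.0],
    [14.0, 15.0],
    [20.0, 25.0],
]

result = two_way_anova_no_replication(hypothetical_table)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Each F-ratio would then be compared with its own critical value, using the factor's degrees of freedom and the error degrees of freedom.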

Now that you know the basic concepts of ANOVA, let’s see how you can perform a one-way ANOVA in Excel.

Analysis ToolPak in Excel

In Excel, before you can start analyzing the data, you have to activate the add-in named Analysis ToolPak.

Here are the steps to turn on the Excel add-in:

Step 1: Choose File > Options.

Step 2: In the Options window, select Add-ins.

Step 3: Next to Manage, select Excel Add-ins, then click Go.

Step 4: In the Add-ins window, select the Analysis ToolPak check box, and then click OK.

Example of One-Way ANOVA in Excel

Let’s consider a problem statement. 

Suppose you are a research scientist and you want to perform clinical trials to study the effectiveness of three drugs developed by three healthcare companies to cure a certain disease.

The data represent the time taken to cure the disease for different patients who took Drug A, B, or C. The time is recorded in total hours and minutes.

At the 0.05 level of significance (alpha value), we need to test whether the mean times taken by the three drugs to cure the disease are equal (H0).

1. Click the Data tab, then Data Analysis.

2. Select ANOVA: Single Factor.

3. In the Input Range, enter the range of cells containing the data. Click OK.

You have your ANOVA table ready.
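For readers who want to check the main columns of Excel's single-factor table by hand, the same quantities can be sketched in a few lines of Python. The function name `one_way_anova` and the cure times in `hypothetical_times` are assumptions made for illustration. They are not the worksheet data from this example, so the numbers will not match the summary in this article. The sketch stops at the F value and does not compute the P-value or F-critical columns.

```python
from statistics import mean


def one_way_anova(groups):
    """Return the main entries of a single-factor ANOVA table.

    groups maps a group name to its list of observations.
    """
    all_values = [x for values in groups.values() for x in values]
    grand_mean = mean(all_values)
    k = len(groups)        # number of groups
    n = len(all_values)    # total number of observations

    ss_between = 0.0
    ss_within = 0.0
    for values in groups.values():
        group_mean = mean(values)
        # Spread of this group's mean around the grand mean.
        ss_between += len(values) * (group_mean - grand_mean) ** 2
        # Spread of the observations around their own group mean.
        ss_within += sum((x - group_mean) ** 2 for x in values)

    df_between = k - 1
    df_within = n - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within  # the mean squared error

    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df_between": df_between,
        "df_within": df_within,
        "ms_between": ms_between,
        "ms_within": ms_within,
        "f": ms_between / ms_within,
    }


# Hypothetical cure times in hours, NOT the data used in this article.
hypothetical_times = {
    "Drug A": [4.0, 5.0, 6.0],
    "Drug B": [1.0, 2.0, 3.0],
    "Drug C": [7.0, 8.0, 9.0],
}

anova_table = one_way_anova(hypothetical_times)
print("Source           SS     df   MS       F")
print(f"Between groups   {anova_table['ss_between']:<6.2f} {anova_table['df_between']:<4} "
      f"{anova_table['ms_between']:<8.2f} {anova_table['f']:.2f}")
print(f"Within groups    {anova_table['ss_within']:<6.2f} {anova_table['df_within']:<4} "
      f"{anova_table['ms_within']:<8.2f}")
```

The F value from this table is what gets compared with the F-critical value for the chosen alpha and the two degrees of freedom.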

Summary of the ANOVA

  • The average times taken to cure the disease with the three drugs are approximately 107, 90, and 97 hours.
  • The difference between the largest and smallest means is 17.47 hours.
  • The F value is greater than the F-critical value, so we can reject the null hypothesis at the 0.05 level of significance. This means that the average time taken to cure the disease is not the same for all three drugs.
  • Using pairwise comparisons, we can order the times taken by the three drugs as follows: Time(Drug A) > Time(Drug C) > Time(Drug B).

Conclusion

It is recommended that you take a variety of problem statements and solve them using Analysis of Variance and the techniques discussed in this tutorial. 

About the Author

Simplilearn

Simplilearn is one of the world’s leading providers of online training for Digital Marketing, Cloud Computing, Project Management, Data Science, IT, Software Development, and many other emerging technologies.