If you work with datasets long enough, you will eventually need to deal with statistics. Ask the average person what statistics are, and they’ll probably throw around words like “numbers,” “figures,” and “research.”

Statistics is the science, or a branch of mathematics, that involves collecting, classifying, analyzing, interpreting, and presenting numerical facts and data. It is especially handy when dealing with populations too large for every member to be measured individually. Statistics is crucial for drawing general conclusions about a population from a data sample.

Statistics further breaks down into two types: descriptive and inferential. Today, we look at descriptive statistics, including a definition, the types of descriptive statistics, and the differences between descriptive statistics and inferential statistics.

Descriptive Statistics Defined

Descriptive statistics describe, show, and summarize the basic features of a dataset found in a given study, presented in a summary that describes the data sample and its measurements. They help analysts understand the data better.

Descriptive statistics represent the available data sample and do not include theories, inferences, probabilities, or conclusions. That’s a job for inferential statistics.

Descriptive Statistics Examples

If you want a good example of descriptive statistics, look no further than a student’s grade point average (GPA). A GPA gathers the data points created through a large selection of grades, classes, and exams, averages them together, and presents a general idea of the student’s mean academic performance. Note that the GPA doesn’t predict future performance or present any conclusions. Instead, it provides a straightforward summary of a student’s academic success based on values pulled from data.

Here’s an even simpler example. Take a data set of 2, 3, 4, 5, and 6, whose sum is 20. The data set’s mean is 4, arrived at by dividing the sum by the number of values (20 divided by 5 equals 4).

Analysts often use charts and graphs to present descriptive statistics. If you stood outside of a movie theater, asked 50 members of the audience if they liked the film they saw, then put your findings on a pie chart, that would be descriptive statistics. In this example, descriptive statistics measure the number of yes and no answers and show how many of the people you asked liked or disliked the movie. If you tried to draw any conclusions beyond those 50 people, you would be wandering into inferential statistics territory, but we'll cover that issue later.

Finally, political polling is considered a descriptive statistic, provided it’s just presenting concrete facts (the respondents’ answers), without drawing any conclusions. Polls are relatively straightforward: “Which presidential candidate did you vote for in the recent election?”

Types of Descriptive Statistics

Descriptive statistics break down into several types, characteristics, or measures. Some authors say that there are two types. Others say three or even four. 

Distribution (Also Called Frequency Distribution)

Datasets consist of a distribution of scores or values. Statisticians use graphs and tables to summarize the frequency of every possible value of a variable, rendered in percentages or numbers. For instance, if you held a poll to determine people’s favorite Beatle, you’d set up one column with all possible values (John, Paul, George, and Ringo), and another with the number of votes each one received.

Statisticians depict frequency distributions as either a graph or as a table.
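As a rough sketch of the favorite-Beatle table described above, the snippet below counts responses and reports each value's frequency as both a number and a percentage. The list of votes and the helper name `frequency_table` are invented purely for illustration; they are not data from any real poll.

```python
from collections import Counter

# Hypothetical poll responses, made up for this example only.
votes = ["Paul", "John", "George", "Paul", "Ringo",
         "John", "Paul", "George", "John", "Paul"]


def frequency_table(responses):
    """Return (value, count, percent) rows, most frequent value first."""
    counts = Counter(responses)
    total = len(responses)
    return [(value, count, 100 * count / total)
            for value, count in counts.most_common()]


# Print one row per possible value: name, number of votes, share of votes.
for value, count, percent in frequency_table(votes):
    print(f"{value:<8}{count:>3}{percent:>7.1f}%")
```

The same rows could just as easily feed a bar chart or pie chart; the table is simply the most direct rendering.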

Measures of Central Tendency

Measures of central tendency estimate a dataset's average or center using three methods: mean, mode, and median.

Mean: The mean is also known as “M” and is the most common method for finding averages. You get the mean by adding all the response values together, and dividing the sum by the number of responses, or “N.” For instance, say someone is trying to figure out how many hours a night they sleep, on average, over a week. The data set would be the hour entries (e.g., 6, 8, 7, 10, 8, 4, 9), and the sum of those values is 52. There are seven responses, so N=7. You divide the value sum of 52 by N, or 7, to find M, which in this instance is about 7.43.
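The calculation of M can be sketched in a few lines of Python. The variable names are illustrative choices, and `statistics.mean` from the standard library is shown only as a cross-check on the hand calculation.

```python
import statistics

# Hours slept on each of seven nights (the data set from the text).
sleep_hours = [6, 8, 7, 10, 8, 4, 9]

n = len(sleep_hours)        # N, the number of responses
total = sum(sleep_hours)    # sum of all response values
mean_by_hand = total / n    # M = sum / N

# The standard library computes the same quantity.
mean_from_library = statistics.mean(sleep_hours)

print(f"N = {n}, sum = {total}, M = {mean_by_hand:.2f}")
```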

Mode: The mode is just the most frequent response value. Datasets may have any number of modes, including “zero.” You can find the mode by arranging your dataset from the lowest to the highest value and then looking for the most common response. Sorting our sleep study data gives 4, 6, 7, 8, 8, 9, 10. As you can see, the mode is eight, the only value that appears more than once.
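Because a dataset can have several modes or none, a small helper that returns a list is one way to capture all three cases. The function name `modes` is hypothetical, and treating "every value appears once" as "no mode" is a convention assumed here; some textbooks and libraries instead report every value as a mode in that situation.

```python
from collections import Counter


def modes(values):
    """Return all values tied for the highest count, or [] if nothing repeats."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # assumed convention: no repeated value means no mode
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([4, 6, 7, 8, 8, 9, 10]))  # [8]
print(modes([1, 1, 2, 2, 3]))         # [1, 2]
print(modes([1, 2, 3]))               # []
```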

Median: Finally, we have the median, defined as the value in the precise center of the dataset. Arrange the values in ascending order (like we did for the mode) and look for the number in the set’s middle. In this case, the median is eight, the fourth of the seven sorted values.
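A minimal sketch of the median follows. The text only covers an odd number of values; the even-count branch below uses the standard rule of averaging the two middle values. The function name `median_of` is an illustrative choice.

```python
def median_of(values):
    """Return the middle value of the sorted data."""
    if not values:
        raise ValueError("median_of() needs at least one value")
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]          # odd count: single middle value
    # even count: average the two values that straddle the center
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median_of([6, 8, 7, 10, 8, 4, 9]))  # 8
print(median_of([4, 6, 7, 8]))            # 6.5
```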

Variability (Also Called Dispersion)

Measures of variability give the statistician an idea of how spread out the responses are. The spread has three aspects: range, standard deviation, and variance.

Range: Use range to determine how far apart the most extreme values are. Start by subtracting the dataset’s lowest value from its highest value. Once again, we turn to our sleep study: 4,6,7,8,8,9,10. We subtract four (the lowest) from ten (the highest) and get six. There’s your range.
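In code, the range is a one-line subtraction. The sketch below reuses the sleep data; the variable names are illustrative.

```python
# Hours slept on each of seven nights (the data set from the text).
sleep_hours = [6, 8, 7, 10, 8, 4, 9]

lowest = min(sleep_hours)
highest = max(sleep_hours)
value_range = highest - lowest  # distance between the two extremes

print(f"{highest} - {lowest} = {value_range}")  # 10 - 4 = 6
```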

Standard Deviation: This aspect takes a little more work. The standard deviation (s) is your dataset’s average amount of variability, showing you how far, on average, the scores lie from the mean. The larger your standard deviation, the greater your dataset’s variability. Follow these six steps:

  1. List each score and find the mean.
  2. Find the deviation by subtracting the mean from each score.
  3. Square each deviation.
  4. Total up all the squared deviations.
  5. Divide the sum of the squared deviations by N-1.
  6. Find the result’s square root.
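The six steps above can be sketched as a short function, with one line per step. The function name is hypothetical, and the code keeps the mean unrounded, so its result may differ in the third decimal place from a hand calculation that rounds the mean first.

```python
import math


def sample_standard_deviation(scores):
    """Follow the six steps: mean, deviations, squares, sum, divide, root."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores to divide by N - 1")
    mean = sum(scores) / n                        # step 1: find the mean
    deviations = [x - mean for x in scores]       # step 2: score minus mean
    squared = [d ** 2 for d in deviations]        # step 3: square each one
    sum_of_squares = sum(squared)                 # step 4: total the squares
    variance = sum_of_squares / (n - 1)           # step 5: divide by N - 1
    return math.sqrt(variance)                    # step 6: square root


sleep_hours = [4, 6, 7, 8, 8, 9, 10]
print(f"s = {sample_standard_deviation(sleep_hours):.3f}")
```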

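Raw Number/Data | Deviation from Mean | Deviation Squared
--- | --- | ---
4 | 4 - 7.43 = -3.43 | 11.76
6 | 6 - 7.43 = -1.43 | 2.04
7 | 7 - 7.43 = -0.43 | 0.18
8 | 8 - 7.43 = 0.57 | 0.33
8 | 8 - 7.43 = 0.57 | 0.33
9 | 9 - 7.43 = 1.57 | 2.47
10 | 10 - 7.43 = 2.57 | 6.61
Total | Sum ≈ 0 | Sum of squares ≈ 23.71

The table uses the mean of 52/7, or about 7.43, and rounds each entry to two decimal places; the totals are computed from the unrounded values. The deviations always sum to zero (apart from rounding), which is why they are squared before being totaled.

When you divide the sum of the squared deviations by 6 (N-1), 23.71/6, you get 3.952, and the square root of that result is 1.988. As a result, we now know that a typical score lies about 1.988 hours from the mean.

Variance: Variance reflects the dataset’s degree of spread. The more spread out the data, the larger the variance relative to the mean. You can get the variance by just squaring the standard deviation. Using the above example, we square 1.988 and arrive at 3.952.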
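The relationship between the two measures can be checked with Python's standard `statistics` module, whose `variance` and `stdev` functions both divide by N-1, matching the steps above. This is only a sketch for verification; the variable names are illustrative.

```python
import math
import statistics

# Hours slept on each of seven nights (the data set from the text).
sleep_hours = [4, 6, 7, 8, 8, 9, 10]

sample_sd = statistics.stdev(sleep_hours)           # sample standard deviation
sample_variance = statistics.variance(sleep_hours)  # sample variance (N - 1)

# Squaring the standard deviation recovers the variance.
squares_match = math.isclose(sample_sd ** 2, sample_variance, rel_tol=1e-9)

print(f"s = {sample_sd:.3f}, variance = {sample_variance:.3f}, match = {squares_match}")
```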

Univariate Descriptive Statistics

Univariate descriptive statistics summarize one variable at a time. They are helpful for condensing huge amounts of numerical data and for revealing patterns in the raw data. Patterns discovered in univariate data may be described using central tendency (mean, mode, and median), as well as dispersion: variance, range, quartiles, standard deviation, maximum, and minimum.

When dealing with univariate data, you have several options for presenting it.

  • Frequency Distribution Table
  • Histograms
  • Bar Charts
  • Pie Charts
  • Frequency Polygon
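As a small illustration of summarizing a single variable, the sketch below computes the minimum, quartiles, and maximum for the sleep data and draws a crude text histogram. The helper names are hypothetical. It assumes Python 3.8 or later for `statistics.quantiles`, and note that quartile conventions vary between tools; this uses that function's default method.

```python
import statistics
from collections import Counter


def univariate_summary(values):
    """Return minimum, quartiles, and maximum for one variable."""
    q1, q2, q3 = statistics.quantiles(values, n=4)  # default method
    return {"minimum": min(values), "q1": q1, "median": q2,
            "q3": q3, "maximum": max(values)}


def text_histogram(values):
    """Return one line per distinct value, with one '#' per occurrence."""
    counts = Counter(values)
    return [f"{value:>2} | {'#' * counts[value]}" for value in sorted(counts)]


sleep_hours = [4, 6, 7, 8, 8, 9, 10]
print(univariate_summary(sleep_hours))
print("\n".join(text_histogram(sleep_hours)))
```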

Bivariate Descriptive Statistics

Bivariate statistics examine the connection between two variables. In other words, they investigate how one variable compares to another or how one variable may affect another.

Bivariate descriptive statistics involve studying (comparing) two variables at the same time to see whether there is a link between them. When the data are laid out in a table, by convention the columns represent the independent variable and the rows represent the dependent variable.
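One common way to put a number on the link between two variables is the Pearson correlation coefficient, sketched below. The article does not prescribe this particular measure, so treat it as one illustrative option. The coffee figures are invented for the example and paired with the sleep data only to show the mechanics.

```python
import math


def pearson_correlation(xs, ys):
    """Return Pearson's r, a value between -1 and 1, for paired data."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least two pairs")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of paired deviations, and each variable's squared deviations.
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return co_deviation / math.sqrt(ss_x * ss_y)


hours_slept = [6, 8, 7, 10, 8, 4, 9]   # sleep data from the text
coffee_cups = [3, 2, 2, 1, 2, 4, 1]    # hypothetical values, invented here

r = pearson_correlation(hours_slept, coffee_cups)
print(f"r = {r:.2f}")
```

A value near -1 or 1 indicates a strong linear relationship and a value near 0 a weak one; on its own, a correlation describes the sample and does not establish cause.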

Univariate vs. Bivariate

Univariate | Bivariate
--- | ---
Involves only one variable | Involves two variables
Doesn't deal with relationships or causes | Deals with causes or relationships

The prime purpose of univariate is describing:

  • Dispersion: variance, range, standard deviation, quartiles, maximum, minimum
  • Central tendency: mean, median, and mode
  • Bar graph, pie chart, histogram, box-and-whisker plot, line graph

The prime purpose of bivariate is explaining:

  • Correlations: Comparisons, explanations, causes, relationships
  • Dependent and independent variables
  • Tables where just one variable is dependent on other variables' values
  • Simultaneous analysis of two variables

What is the Main Purpose of Descriptive Statistics?

The prime purpose of descriptive statistics is to convey information regarding a data set. They help reduce a large body of data to a few relevant pieces of information.

What’s the Difference Between Descriptive Statistics and Inferential Statistics?

So, what’s the difference between the two statistical forms? We’ve already touched upon this when we mentioned that descriptive statistics doesn’t infer any conclusions or predictions, which implies that inferential statistics do so.

Inferential statistics takes a random sample from a population and uses it to describe and make inferences about the entire population. For instance, in asking 50 people if they liked the movie they had just seen, inferential statistics would build on that and assume that those results would hold for the rest of the moviegoing population in general.

Therefore, if you stood outside that movie theater and surveyed 50 people who had just seen Rocky 20: Enough Already! and 38 of them disliked it (76 percent), you could extrapolate that roughly 76% of the rest of the movie-watching world will dislike it too, even though you haven’t the means, time, or opportunity to ask all those people.

Simply put: Descriptive statistics give you a clear picture of what your current data shows. Inferential statistics make projections based on that data.
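To make the contrast concrete, the sketch below first computes the descriptive figure for the 50 people surveyed, then attaches a margin of error to the projection about all viewers. The interval uses a common textbook normal approximation with z = 1.96 for roughly 95% confidence. That formula is an assumption brought in for illustration, not something the article describes, and it presumes the 50 people are a simple random sample of moviegoers.

```python
import math

surveyed = 50   # people asked outside the theater
disliked = 38   # of those, how many disliked the film

# Descriptive: summarizes only the people actually surveyed.
sample_share = disliked / surveyed

# Inferential: a hedged statement about all viewers.
# Assumed method: normal approximation for a proportion, z = 1.96 (~95%).
z = 1.96
standard_error = math.sqrt(sample_share * (1 - sample_share) / surveyed)
margin = z * standard_error
interval = (sample_share - margin, sample_share + margin)

print(f"Sample: {sample_share:.0%} disliked the film")
print(f"Projection: roughly {interval[0]:.0%} to {interval[1]:.0%} of all viewers")
```

With only 50 respondents the margin is wide, which is why the extrapolated 76% is best read as an estimate and not an exact figure.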

Why Not Become a Data Scientist?

Whether you like descriptive or inferential statistics, you can find many opportunities in the field of data analytics and data science. Simplilearn’s Professional Certificate Program in Data Science gives you broad exposure to key data science concepts and tools like Python, R, Machine Learning, and more. Hands-on labs and project work in this program bring the ideas to life, with skilled trainers and teaching assistants to guide you along the way.

The boot camp, conducted in partnership with Purdue University and in collaboration with IBM, features the perfect mix of theory, case studies, & extensive hands-on practice. The Economic Times ranked this Data Science certification program at the top of its list.

According to Glassdoor, data scientists earn an annual average of USD 113,309. Payscale shows that a data scientist in India makes a yearly average of ₹817,366. Data science is a great career choice if you’re looking for a challenge in a secure vocation that also pays well!

Check out Simplilearn’s data science courses today and embark on this exciting new opportunity!

About the Author


Simplilearn is one of the world’s leading providers of online training for Digital Marketing, Cloud Computing, Project Management, Data Science, IT, Software Development, and many other emerging technologies.
