How to Aggregate Data Using Group By in SQL

SQL allows the user to store more than 30 types of data in as many columns as required, so sometimes, it becomes difficult to find similar data in these columns. Group By in SQL helps us club together identical rows present in the columns of a table. This is an essential statement in SQL as it provides us with a neat dataset by letting us summarize important data like sales, cost, and salary.

What Is Group By in SQL?

The Group By statement is used to group together any rows of a column with the same value stored in them, based on a function specified in the statement. Generally, these functions are one of the aggregate functions such as MAX() and SUM(). This statement is used with the SELECT command in SQL.

The SQL Group By statement uses the split-apply-combine strategy. 

  • Split: The different groups are split with their values.
  • Apply: The aggregate function is applied to the values of these groups.
  • Combine: The values are combined in a single row.

group_by_strategy-SQL_group_by.

The SUM() aggregate function, which results in the arithmetic sum of the rows’ values, has been applied to the groups in the above illustration. 

To better understand how SQL Order By is used, let’s get an insight into this statement's syntax.

Business Analyst Master's Program

Gain expertise in Business analytics toolsExplore Program
Business Analyst Master's Program

The Syntax for Using Group By in SQL

SELECT column_1, function_name(column_2)

FROM table_name

WHERE [condition]

GROUP BY column_name

ORDER BY column_name;

  • The columns to be retrieved are specified in the SELECT statement and separated by commas. Any of the aggregate functions can be used on one or more than one of the columns being retrieved.
  • The table being used is specified in the table_name parameter of the FROM statement.
  • There is an optional WHERE clause, which can be used to specify any condition according to which the rows are to be selected.

NOTE: The WHERE clause always precedes the ORDER BY statement.

  • The GROUP BY statement lets the database system know that we wish to group the same value rows of the columns specified in this statement's column_names parameter.
  • The optional ORDER BY statement is used to sort the resulting table in ascending order based on the column specified in this statement's column_name parameter.

NOTE: The ORDER BY statement always appears after the GROUP BY statement and is applied to the groups of rows formed.

Aggregate functions are one of the most vital parts of the SQL Group By statement, so let’s see what these are.

Aggregate Functions

The five aggregate functions that we can use with the SQL Order By statement are:

  • AVG(): Calculates the average of the set of values.
  • COUNT(): Returns the count of rows.
  • SUM(): Calculates the arithmetic sum of the set of numeric values.
  • MAX(): From a group of values, returns the maximum value.
  • MIN(): From a group of values, returns the minimum value.

Let’s apply the concepts stated above to the columns of a table to get an even clearer picture of the Group By statement.

Using Group By on a Single Column

Let’s take a sample “Employee_dept” table:

Employee_dept_table

  • In this table, to find the average salary for each department’s employees, we’ll use the following query:

average_salary-SQL_Group_by

We have used the AVG() aggregate function for this purpose.

This will result in the following table:

average_salary_output

There was no ORDER BY clause, so the table has no definite order of rows.

  • Let’s see what happens when we use the ORDER BY clause.

To find the number of employees from each city, this is what is required:

count_city-SQL_Group_by

And it will result in the following table:

count_city_output

As we can see, the groups have been ordered in ascending order, according to the number of employees from each city.

  • The WHERE clause can also be used in the query, and it precedes the GROUP BY statement. Let’s see an example of this.

To find the maximum salary for each department with “Dept_ID” greater than 1003:

group_by_with_where_clause

The WHERE clause will be applied to the rows before they are grouped.

The above query will result in the following:

group_by_with_where_clause_output.

  • Let’s see what happens when we use an aggregate function without the GROUP BY statement.

To find the total salary of all the employees:

without_group_by

As we can see, the count function on “Dept_ID” returns the total number of records in the table, and the sum function on “Salary” returns the arithmetic sum of all the employees’ salaries.

The SQL Group By statement can be applied to multiple columns of a table in a single query. Let's see how this is done.

Introduction To Data Analytics (FREE)

Mastery to Data Analytics Basics is a Click Away!Start Learning
Introduction To Data Analytics (FREE)

Using Group By on Multiple Columns

In the “Employee_dept” table, to find the number of employees from each city in all the  departments, we’ll use the following query:

group_by_on_multiple_columns-SQL_Group_by

As we can see, the GROUP BY is applied to two columns. 

This will result in the following table:

group_by_on_multiple_columns_output

As we can see, each city and department combination is represented in the table, and the number of employees for each of these combinations has been calculated.

Using the Having Clause With Group By

The WHERE clause cannot be used to filter the grouped rows based on a condition. We have the HAVING clause for this purpose. The syntax for using this clause is:

SELECT column_1, function(column_2)

FROM table_name

WHERE [condition]

GROUP BY column_names

HAVING [condition]

ORDER BY column_names;

The HAVING clause always precedes the ORDER BY statement in the query.

In the “Employee_dept” table, we'll use the following query to find the cities with multiple employees.

group_by_with_having-SQL_Group_by

As we can see, a condition has been applied to the group of cities formed by the GROUP BY statement. This will result in the following:

group_by_with_having_output.

Only the cities with multiple employees are returned.

With this, we come to the end of this article about the popular Group By statement.

Gain expertise in the latest Business analytics tools and techniques with the Post Graduate Program in Business Analysis. Enroll now!

Next Steps

Data aggregation or summarization is one of the most important tasks and skills for anyone who manages data. From profit and loss statements to creating a perfect visualization of data, SQL Group By is the tool that allows us to do these kinds of tasks efficiently.

Now that you know how to aggregate and summarize data, it is time for you to start querying, manipulating, and visualizing all kinds of data to move forward in your journey to become an expert in SQL. If you liked this article and want to get certified, check out our Business Analyst Master’s Program as it will help you learn the A-Z of SQL as well.

Do you have any questions for us? Please mention them in the comment section of our “How to Aggregate Data Using Group By in SQL” article, and we’ll have our experts in the field answer them for you right away!

About the Author

SimplilearnSimplilearn

Simplilearn is one of the world’s leading providers of online training for Digital Marketing, Cloud Computing, Project Management, Data Science, IT, Software Development, and many other emerging technologies.

View More
  • Disclaimer
  • PMP, PMI, PMBOK, CAPM, PgMP, PfMP, ACP, PBA, RMP, SP, and OPM3 are registered marks of the Project Management Institute, Inc.