In SQL tables, there may be duplicate rows which often leads to data inconsistency. These problems can be overcome with a primary key, but sometimes when these rules are not followed or an exception occurs, the problem becomes more daunting. It is a better practice to use the relevant keys and constraints to remove the risk of duplicate rows. To clean up duplicate data, you must use some special procedures.

Deleting Duplicate Rows in SQL

In an SQL Server table, duplicate records can be a serious problem. Duplicate data can result in orders being handled several times, incorrect reporting results, and more. In SQL Server, there are many options for dealing with duplicate records in a table, depending on the circumstances. They are:

  • Unique constraints in Table
  • No Unique constraints in Table 

Want a Top Software Development Job? Start Here!

Full Stack Developer - MERN StackExplore Program
Want a Top Software Development Job? Start Here!

Unique Constraints in Table

According to Delete duplicate rows in SQL, a table with a unique index can use the index to identify duplicate data, and then delete the duplicate records. Self-joins, ordering the data by the maximum value, using the RANK function, or using NOT IN logic is used to perform identification.

No Unique Constraints in Table

Under Delete duplicate rows in SQL, it's a little difficult for tables without a special index. The ROW NUMBER() function can be used in connection with a common table expression (CTE) to sort the data and then remove the duplicate records.

Also Read: The Ultimate Guide on SQL Basics

Get the Coding Skills You Need to Succeed

Full Stack Developer - MERN StackExplore Program
Get the Coding Skills You Need to Succeed

SQL Delete Duplicate Rows using Group By and Having Clause

According to Delete Duplicate Rows in SQL, for finding duplicate rows, you need to use the SQL GROUP BY clause. The COUNT function can be used to verify the occurrence of a row using the Group by clause, which groups data according to the given columns.

Code:

Creating and inserting data into the table:

First, you need to create a table as follows:

CREATE TABLE stud(Regno integer, Name text,Marks integer);

/* Create few records in this table */

INSERT INTO stud VALUES(1,'Tom',77);

INSERT INTO stud VALUES(2,'Lucy',78);

INSERT INTO stud VALUES(3,'Frank',89);

INSERT INTO stud VALUES(4,'Jane',98);

INSERT INTO stud VALUES(5,'Robert',78);

INSERT INTO stud VALUES(3,'Frank',89);

INSERT INTO stud VALUES(5,'Robert',78);

INSERT INTO stud VALUES(4,'Jane',98);

COMMIT;

DeletingDuplicateRowsinSQL_1 

Fetching and Identifying the duplicate rows in SQL

/* Display all the records from the table */

SELECT * FROM stud;

Input

DeletingDuplicateRowsinSQL_2

Output

DeletingDuplicateRowsinSQL_3.

The table depicted above consisting of data repetition, that is duplicate data, can be deleted using the group by clause as follows:

Deleting Duplicate Rows From the Table Using Group by and Having Clause

Under SQL, delete duplicate Rows in SQL is done with the Group by and Having clause. It is done as follows:

Code:

select Name,Marks,grade,count(*) as cnt from stud group by Name,Marks,grade having count(*) > 1;

Input:

DeletingDuplicateRowsinSQL_4

Output:

DeletingDuplicateRowsinSQL_5

SQL Delete Duplicate Rows Using Common Table Expressions (CTE)

Common Table Expression

In delete duplicate rows in SQL, the acronym CTE stands for "common table expression." It's a named temporary result set created by a simple query and specified within the scope of a single SELECT, INSERT, UPDATE, or DELETE expression. You can write complex recursive queries using CTE. It has a lot more traction than temporary tables.

Syntax

WITH [CTEName]

As

( Select col1,col2,col3 from [tablename] where [condition]

Select col1,col2,col3 from [CTEName]

Procedure for Removing the Duplicate Rows Using CTE

First, you need to make a table Employ_DB  table script to SQL Server and run it in the required database.

Creating the Table

Create table Employ_DB(emp_no number(10),emp_name varchar(20),emp_address varchar(25), emp_eoj date);

Input:

DeletingDuplicateRowsinSQL_6

Output

DeletingDuplicateRowsinSQL_7

Want a Top Software Development Job? Start Here!

Full Stack Developer - MERN StackExplore Program
Want a Top Software Development Job? Start Here!

Inserting Data Into the Table

After the table has been established, you must now insert some records into it, including some duplicates.

Code:

Insert into Employ_DB values(11,’Mohith’,’tokya’,’12-may-2000’);

Input:

DeletingDuplicateRowsinSQL_8.

Output:

(8) rows inserted

Searching Data From the Table

Next, execute the preceding script and use the following query to search for the data in the table.

To print the records from the Employ_EB in a sorted list, use the field name and address of an employee.

Code:

Select * from Employ_DB order by emp_name,emp_address;

Input:

DeletingDuplicateRowsinSQL_9

Output:

DeletingDuplicateRowsinSQL_10

Also Read: What Is a Schema in SQL and Advantages of Using Schema

Deleting Duplicate Rows in SQL Using CTE

According to Delete Duplicate Rows in SQL, the above contains many of the duplicated records that have been fixed onto the table. For numbering duplicate city records by state, you’ll use CTE's row number () feature. CTE can generate a temporary result set that you can use to remove redundant records from the actual table with a single question.

Code:

With CTE as (Select emp_no,emp_name,row_number()

Over (partition by emp_no order by emp_no) as number_of_employ

From Employ_DB)Select * from CTE order by emp_name,emp_no;

DeletingDuplicateRowsinSQL_11 

Finally, delete the duplicate record using the Common Type expression is as follows:

Code:

With CTE as (Select emp_no,emp_name,row_number()

Over (partition by emp_no order by emp_no) as number_of_employ

From Employ_DB)

Select * from CTE where number of employ >1 order by emp_no;

DeletingDuplicateRowsinSQL_12

According to Delete Duplicate Rows in SQL, in the above table, only two of the records are duplicated based on the emp_no. So, now you are going to delete that duplicate record from the table Employ_DB using the following code.

 

DeletingDuplicateRowsinSQL_14.

Learn 15+ In-Demand Tools and Skills!

Automation Testing Masters ProgramExplore Program
Learn 15+ In-Demand Tools and Skills!

Rank Function to SQL Delete Duplicate Rows

According to Delete Duplicate Rows in SQL, you can also use the SQL RANK feature to get rid of the duplicate rows. Regardless of duplicate rows, the SQL RANK function returns a unique row ID for each row.

You need to use aggregate functions like Max, Min, and AVG to perform calculations on data. Using these functions, you get a single output row. The SQL RANK functions in SQL Server allow you to specify a rank for individual fields based on categorizations. For each participating line, it returns an aggregated value. RANK functions in SQL are often referred to as Window Functions.

Also Read: SQL UNION: The Best Way to Combine SQL Queries

You can implement Rank functions in four ways.

  1. ROW_NUMBER()
  2. RANK()
  3. DENSE_RANK()
  4. NTILE()

According to Delete Duplicate Rows in SQL, the PARTITION BY clause is used with the RANK function in the following query. The PARTITION BY clause divides the data into subsets for the listed columns and assigns a rank to each partition.

 Code:

Select *,

RANK() OVER ( PARTITION BY Animal_id,Animal_name ORDER BY sno DESC )rank

From Animals ) T on A.sno=T.sno

Where T.Rank>1

Alter table Animals

Drop column sno

Select * from Animals

DeletingDuplicateRowsinSQL_15.

DeletingDuplicateRowsinSQL_16

You can see in the screenshot that the row with a rank greater than one needs to be removed. Let's use the following question to get rid of those rows.

DeletingDuplicateRowsinSQL_17. 

Want a Top Software Development Job? Start Here!

Full Stack Developer - MERN StackExplore Program
Want a Top Software Development Job? Start Here!

Use SSIS Package to SQL Delete Duplicate Rows

The SQL Server integration service includes several transformations and operators that aid administrators and developers in reducing manual labor and optimizing tasks. The SSIS package can also be used to delete duplicate rows from a SQL table.

Use Sort Operator in an SSIS Package for Removing Duplicating Rows

A Sort operator can be used to sort the values in a SQL table. You may wonder how data sorting will help you get rid of duplicate rows. Here is how it is.

DeletingDuplicateRowsinSQL_18

DeletingDuplicateRowsinSQL_19.

DeletingDuplicateRowsinSQL_20

DeletingDuplicateRowsinSQL_21

DeletingDuplicateRowsinSQL_22.

To demonstrate this challenge, build an SSIS kit.

First, build a new Integration kit in SQL Server Data Tools. Add an OLE DB source link to the new bundle.

Boost Your Coding Skills. Nail Your Next Interview

Full Stack Developer - MERN StackExplore Program
Boost Your Coding Skills. Nail Your Next Interview

Choose The Right Software Development Program

This table compares various courses offered by Simplilearn, based on several key features and details. The table provides an overview of the courses' duration, skills you will learn, additional benefits, among other important factors, to help learners make an informed decision about which course best suits their needs.

Program Name Full Stack Java Developer Career Bootcamp Automation Testing Masters Program Post Graduate Program in Full Stack Web Development
Geo IN All IN/ROW
University Simplilearn Simplilearn Caltech
Course Duration 11 Months 11 Months 9 Months
Coding Experience Required Basic Knowledge Basic Knowledge Basic Knowledge
Skills You Will Learn 15+ Skills Including Core Java, SQL, AWS, ReactJS, etc. Java, AWS, API Testing, TDD, etc. Java, DevOps, AWS, HTML5, CSS3, etc.
Additional Benefits Interview Preparation
Exclusive Job Portal
200+ Hiring Partners
Structured Guidance
Learn From Experts
Hands-on Training
Caltech CTME Circle Membership
Learn 30+ Tools and Skills
25 CEUs from Caltech CTME
Cost $$ $$ $$$
Explore Program Explore Program Explore Program

Conclusion

In this article, you looked at how to remove duplicate rows in SQL using T-SQL, CTE, and the SSIS kit, among other methods. You are free to use whatever approach makes you feel most at ease. However, it is advised against directly implementing these procedures and packaging on output results. You should conduct your tests in a less demanding setting.

To become an expert in SQL programming language, join our Simplilearn's SQL Certification Training Course. This SQL certification course provides you with everything you need to get started working with SQL databases and incorporating them into your applications. Learn how to organize your database correctly, write effective SQL statements and clauses, and maintain your SQL database for scalability. This course includes comprehensive coverage of SQL basics, comprehensive coverage of all relevant query tools and SQL commands, an industry-recognized course completion certificate, and lifetime access to self-paced learning.

Despite the fact that SQL is an old language, it is still very useful today as companies all over the world collect massive amounts of data. SQL is one of the most in-demand engineering skills, and mastering it will vastly improve your resume.

Database and relationship management, Query tools and SQL commands, Aggregate functions, Group by clause, Tables and joins, Subqueries, Data manipulation, Transaction control, Views, and procedures are among the skills covered.

Start learning for free today's most in-demand skills. This course emphasize the development of good core skills for future career advancement. Experts in the field will teach you. Get access to 300+ job-ready skills in today's most in-demand fields. You can find free guides on various career paths, wages, interview tips, and more on our website.

If you are looking to enhance your software development skills further, we recommend you check Simplilearn's Post Graduate Program in Full Stack Web Development. This course can help you hone the right skills and make you job-ready in no time.

Have any questions for us? Leave them in the comments section of this article, and our experts will get back to you on them, as soon as possible!

Our Software Development Courses Duration And Fees

Software Development Course typically range from a few weeks to several months, with fees varying based on program and institution.

Program NameDurationFees
Caltech Coding Bootcamp

Cohort Starts: 17 Jun, 2024

6 Months$ 8,000
Full Stack Developer - MERN Stack

Cohort Starts: 30 Apr, 2024

6 Months$ 1,449
Automation Test Engineer

Cohort Starts: 1 May, 2024

11 Months$ 1,499
Full Stack Java Developer

Cohort Starts: 14 May, 2024

6 Months$ 1,449