How to Find and Delete Duplicates in SQL

Duplicate records in a database can lead to inaccurate data analysis, increased storage usage, and slower queries. It’s crucial to identify and remove these duplicates to maintain the integrity and performance of your database. In this blog post, we’ll explore three methods to find and delete duplicates in SQL: using GROUP BY, subqueries, and Common Table Expressions (CTE). We’ll also provide the SQL script to create the sample table used in these examples.

The three methods are:

1. Using GROUP BY

2. Using a subquery

3. Using a Common Table Expression (CTE) with a window function

Let’s go through each method one by one, with an explanation of how it works.

Let’s start by creating a sample Employee table and populating it with some data, including duplicates.

  
-- Create Employee Table
CREATE TABLE Employee (
    ID INT PRIMARY KEY,
    employee_name VARCHAR(100),
    city VARCHAR(100)
);

-- Insert Sample Data
INSERT INTO Employee (ID, employee_name, city) VALUES 
(1, 'Amit', 'Delhi'),
(2, 'Rohit', 'Mumbai'),
(3, 'Suman', 'Delhi'),
(4, 'Amit', 'Delhi'), -- Duplicate
(5, 'Rohit', 'Mumbai'), -- Duplicate
(6, 'Neha', 'Pune'),
(7, 'Amit', 'Delhi'); -- Duplicate

With the table and data ready, let’s explore the methods to find and delete duplicates.

Method 1: Using GROUP BY

The GROUP BY clause groups records with identical values in specified columns. By combining it with the HAVING clause, we can find groups with more than one occurrence, indicating duplicates.

SELECT
    employee_name,
    city,
    COUNT(*) AS duplicate_count
FROM Employee
GROUP BY employee_name, city
HAVING COUNT(*) > 1;

Explanation:

The query groups rows by employee_name and city.

COUNT(*) AS duplicate_count counts the occurrences of each group.

The HAVING COUNT(*) > 1 condition filters the results to show only groups with more than one record.

Usage:

This method is useful when you want to identify duplicates without immediately deleting them. It allows you to review which records are duplicated before proceeding with any deletions.
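
To see what this query returns for the sample data, here is a minimal sketch that runs it against an in-memory SQLite database through Python’s built-in sqlite3 module. SQLite, the Python harness, and the helper names make_sample_db and find_duplicate_groups are assumptions made only so the example runs without a database server. The SQL is the query shown above, with an ORDER BY added so the output order is stable.

```python
import sqlite3

# Same rows as the sample INSERT statement in this post.
SAMPLE_ROWS = [
    (1, "Amit", "Delhi"),
    (2, "Rohit", "Mumbai"),
    (3, "Suman", "Delhi"),
    (4, "Amit", "Delhi"),
    (5, "Rohit", "Mumbai"),
    (6, "Neha", "Pune"),
    (7, "Amit", "Delhi"),
]


def make_sample_db():
    """Build the sample Employee table in an in-memory SQLite database (assumed engine)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Employee ("
        "ID INT PRIMARY KEY, employee_name VARCHAR(100), city VARCHAR(100))"
    )
    conn.executemany(
        "INSERT INTO Employee (ID, employee_name, city) VALUES (?, ?, ?)",
        SAMPLE_ROWS,
    )
    conn.commit()
    return conn


def find_duplicate_groups(conn):
    """Return (employee_name, city, duplicate_count) for each duplicated group."""
    return conn.execute(
        """
        SELECT employee_name, city, COUNT(*) AS duplicate_count
        FROM Employee
        GROUP BY employee_name, city
        HAVING COUNT(*) > 1
        ORDER BY employee_name  -- added only to make the output order stable
        """
    ).fetchall()


if __name__ == "__main__":
    for row in find_duplicate_groups(make_sample_db()):
        print(row)
    # prints:
    # ('Amit', 'Delhi', 3)
    # ('Rohit', 'Mumbai', 2)
```

Note that the result lists one row per duplicated group, not the individual duplicate rows, so it tells you what is duplicated but not which IDs to delete.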

Method 2: Using Subquery

This method uses a subquery to identify duplicate records and deletes them, keeping only the first occurrence based on the ID column.

  
-- To view duplicates
SELECT *
FROM Employee
WHERE EXISTS
(
    SELECT 1
    FROM Employee AS e2
    WHERE Employee.employee_name = e2.employee_name
      AND Employee.city = e2.city
      AND Employee.ID > e2.ID
);

-- To delete duplicates
DELETE FROM Employee
WHERE EXISTS
(
    SELECT 1
    FROM Employee AS e2
    WHERE Employee.employee_name = e2.employee_name
      AND Employee.city = e2.city
      AND Employee.ID > e2.ID
);

Explanation:

The subquery checks if a duplicate exists by comparing the current row (Employee) with another row (e2) that has the same employee_name and city.

AND Employee.ID > e2.ID ensures that only the duplicate rows with higher ID values are deleted, preserving the first occurrence.

Usage:

This approach is effective when you want to remove duplicates directly while keeping the earliest record (smallest ID).
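
Because this DELETE is destructive, it can help to count the rows it will match first and run the delete inside a transaction. The following is a minimal sketch of that idea, assuming an in-memory SQLite database driven through Python’s built-in sqlite3 module. The helper names make_sample_db and delete_duplicates are hypothetical, and the EXISTS filter is the one from the queries above.

```python
import sqlite3

# The EXISTS filter from this method, shared by the preview and the delete.
DUPLICATE_FILTER = """
    WHERE EXISTS (
        SELECT 1
        FROM Employee AS e2
        WHERE Employee.employee_name = e2.employee_name
          AND Employee.city = e2.city
          AND Employee.ID > e2.ID
    )
"""


def make_sample_db():
    """Build the sample Employee table in an in-memory SQLite database (assumed engine)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Employee ("
        "ID INT PRIMARY KEY, employee_name VARCHAR(100), city VARCHAR(100))"
    )
    conn.executemany(
        "INSERT INTO Employee (ID, employee_name, city) VALUES (?, ?, ?)",
        [
            (1, "Amit", "Delhi"), (2, "Rohit", "Mumbai"), (3, "Suman", "Delhi"),
            (4, "Amit", "Delhi"), (5, "Rohit", "Mumbai"), (6, "Neha", "Pune"),
            (7, "Amit", "Delhi"),
        ],
    )
    conn.commit()
    return conn


def delete_duplicates(conn):
    """Delete duplicates, keeping the smallest ID per group; return rows deleted."""
    # Preview: how many rows does the filter match?
    expected = conn.execute(
        "SELECT COUNT(*) FROM Employee" + DUPLICATE_FILTER
    ).fetchone()[0]
    # "with conn" commits on success and rolls back if an exception is raised.
    with conn:
        deleted = conn.execute("DELETE FROM Employee" + DUPLICATE_FILTER).rowcount
        if deleted != expected:
            raise RuntimeError(f"expected to delete {expected} rows, got {deleted}")
    return deleted


if __name__ == "__main__":
    conn = make_sample_db()
    print(delete_duplicates(conn))  # prints 3 (IDs 4, 5 and 7 are removed)
    print(conn.execute("SELECT ID FROM Employee ORDER BY ID").fetchall())
    # prints [(1,), (2,), (3,), (6,)]
```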

Method 3: Using Common Table Expression (CTE)

A Common Table Expression (CTE) provides a more readable and flexible approach to handling duplicates using the ROW_NUMBER() function.

  
-- Using a CTE and a window function
-- To view duplicates
WITH cte AS
(
    SELECT
        employee_name,
        city,
        id,
        ROW_NUMBER() OVER (PARTITION BY employee_name, city ORDER BY id) AS Rwn
    FROM Employee
)
SELECT * FROM cte WHERE Rwn > 1;

-- To delete duplicates
WITH cte AS
(
    SELECT
        employee_name,
        city,
        id,
        ROW_NUMBER() OVER (PARTITION BY employee_name, city ORDER BY id) AS Rwn
    FROM Employee
)
DELETE FROM Employee
WHERE ID IN (SELECT id FROM cte WHERE Rwn > 1);

Explanation:

The CTE assigns a row number to each row within a partition defined by employee_name and city.

ROW_NUMBER() generates a sequential number for each row in the partition, ordered by id, so the row with the smallest ID gets Rwn = 1. Ordering by employee_name would not help here: every row in a partition has the same employee_name, so the numbering would be arbitrary.

The main query then selects the rows with a row number greater than one (Rwn > 1), which are the duplicates.

The DELETE statement removes these duplicates, keeping only the first occurrence.

Usage:

This method is ideal when you need control over which row in each group is kept, because the ORDER BY inside OVER() decides which row receives row number 1.
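
The ordering inside the window decides which row of each group survives. As a minimal sketch of that choice, the example below keeps either the smallest or the largest ID per group. It assumes an in-memory SQLite database (version 3.25 or later, which supports window functions) driven through Python’s built-in sqlite3 module. The helper names make_sample_db and dedupe_keep are hypothetical.

```python
import sqlite3


def make_sample_db():
    """Build the sample Employee table in an in-memory SQLite database (assumed engine)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Employee ("
        "ID INT PRIMARY KEY, employee_name VARCHAR(100), city VARCHAR(100))"
    )
    conn.executemany(
        "INSERT INTO Employee (ID, employee_name, city) VALUES (?, ?, ?)",
        [
            (1, "Amit", "Delhi"), (2, "Rohit", "Mumbai"), (3, "Suman", "Delhi"),
            (4, "Amit", "Delhi"), (5, "Rohit", "Mumbai"), (6, "Neha", "Pune"),
            (7, "Amit", "Delhi"),
        ],
    )
    conn.commit()
    return conn


def dedupe_keep(conn, keep="first"):
    """Delete duplicates, keeping the first or last ID per (employee_name, city).

    Returns the IDs that remain in the table.
    """
    # Only these two fixed strings are ever placed into the SQL text.
    direction = {"first": "ASC", "last": "DESC"}[keep]
    sql = f"""
        WITH cte AS (
            SELECT id,
                   ROW_NUMBER() OVER (
                       PARTITION BY employee_name, city
                       ORDER BY id {direction}
                   ) AS Rwn
            FROM Employee
        )
        DELETE FROM Employee
        WHERE ID IN (SELECT id FROM cte WHERE Rwn > 1)
    """
    with conn:  # commit on success, roll back on error
        conn.execute(sql)
    return [row[0] for row in conn.execute("SELECT ID FROM Employee ORDER BY ID")]


if __name__ == "__main__":
    print(dedupe_keep(make_sample_db(), keep="first"))  # prints [1, 2, 3, 6]
    print(dedupe_keep(make_sample_db(), keep="last"))   # prints [3, 5, 6, 7]
```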

Conclusion

Managing duplicates is a common but critical task in SQL database management. The methods outlined above cover different needs: identifying duplicates, reviewing them, or deleting them directly. By using GROUP BY, subqueries, or CTEs, you can ensure that your database remains clean, efficient, and ready for accurate data analysis.

Feel free to choose the method that best suits your scenario, and always remember to back up your data before performing delete operations!
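
One simple way to follow that advice is to copy the table before deleting anything. The sketch below assumes an in-memory SQLite database driven through Python’s built-in sqlite3 module. The backup table name Employee_backup and the helper names are hypothetical. CREATE TABLE ... AS SELECT is the syntax SQLite accepts; SQL Server uses SELECT ... INTO for the same purpose. A table copy is not a substitute for a full database backup.

```python
import sqlite3


def make_sample_db():
    """Build the sample Employee table in an in-memory SQLite database (assumed engine)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Employee ("
        "ID INT PRIMARY KEY, employee_name VARCHAR(100), city VARCHAR(100))"
    )
    conn.executemany(
        "INSERT INTO Employee (ID, employee_name, city) VALUES (?, ?, ?)",
        [
            (1, "Amit", "Delhi"), (2, "Rohit", "Mumbai"), (3, "Suman", "Delhi"),
            (4, "Amit", "Delhi"), (5, "Rohit", "Mumbai"), (6, "Neha", "Pune"),
            (7, "Amit", "Delhi"),
        ],
    )
    conn.commit()
    return conn


def backup_then_dedupe(conn):
    """Copy Employee to Employee_backup, then delete duplicates.

    Returns (rows_in_backup, rows_remaining, rows_deleted).
    """
    # Hypothetical backup table name; the copy holds every row, duplicates included.
    conn.execute("CREATE TABLE Employee_backup AS SELECT * FROM Employee")
    conn.commit()
    with conn:  # commit on success, roll back on error
        deleted = conn.execute(
            """
            DELETE FROM Employee
            WHERE EXISTS (
                SELECT 1
                FROM Employee AS e2
                WHERE Employee.employee_name = e2.employee_name
                  AND Employee.city = e2.city
                  AND Employee.ID > e2.ID
            )
            """
        ).rowcount
    backed_up = conn.execute("SELECT COUNT(*) FROM Employee_backup").fetchone()[0]
    remaining = conn.execute("SELECT COUNT(*) FROM Employee").fetchone()[0]
    return backed_up, remaining, deleted


if __name__ == "__main__":
    print(backup_then_dedupe(make_sample_db()))  # prints (7, 4, 3)
```

If the delete removes more than intended, the missing rows can be restored from Employee_backup.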

Sqlholic
