How to Remove Duplicate Rows in SQL While Keeping One?

Are you tired of dealing with duplicate rows in your SQL database? If you've ever needed to eliminate duplicate records but still retain one unique entry, then you're in the right place. In this article, we explore several ways to delete duplicate rows while keeping exactly one instance of each distinct record.

Published on July 20, 2024

Identifying Duplicate Rows

Before removing duplicate rows, it's crucial to identify which records in your table are duplicates. One common method is to use the GROUP BY clause with the COUNT() function to find rows that share the same values in specific columns. For instance, say we have a table named employees with columns id, name, and email. To identify duplicate email addresses in this table, we can execute the following SQL query:

SELECT email, COUNT(*)
FROM employees
GROUP BY email
HAVING COUNT(*) > 1;

By running the above query, you will receive a list of duplicate email addresses along with the count of occurrences for each duplicated email. This initial step is essential in understanding the extent of duplicate records in your database table.
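To try the query without touching a real database, here is a minimal sketch that loads a few made-up rows into an in-memory SQLite database through Python's built-in sqlite3 module. The sample data and the helper names make_sample_db and find_duplicate_emails are hypothetical; SQLite simply stands in for whatever engine you use.

```python
import sqlite3


def make_sample_db() -> sqlite3.Connection:
    """Hypothetical helper: an in-memory employees table with duplicate emails."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, email TEXT)"
    )
    conn.executemany(
        "INSERT INTO employees (id, name, email) VALUES (?, ?, ?)",
        [
            (1, "Ana", "ana@example.com"),
            (2, "Ben", "ben@example.com"),
            (3, "Ana B.", "ana@example.com"),  # same email as id 1
            (4, "Cy", "cy@example.com"),       # unique, should not be reported
            (5, "Ben", "ben@example.com"),     # same email as id 2
            (6, "Ana", "ana@example.com"),     # third row for ana@example.com
        ],
    )
    return conn


def find_duplicate_emails(conn: sqlite3.Connection):
    # Same GROUP BY / HAVING query as above, ordered for stable output.
    return conn.execute(
        """
        SELECT email, COUNT(*)
        FROM employees
        GROUP BY email
        HAVING COUNT(*) > 1
        ORDER BY email
        """
    ).fetchall()


print(find_duplicate_emails(make_sample_db()))
# [('ana@example.com', 3), ('ben@example.com', 2)]
```

The ORDER BY is only there to make the output predictable; the original query returns the same groups in an unspecified order.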

Deleting Duplicate Rows

Once you have identified the duplicate rows in your SQL table, the next step is to remove the duplicates while keeping one instance of each unique record. There are a few strategies you can employ to achieve this, depending on your specific requirements.

Using a Common Table Expression (CTE)

One effective method to delete duplicate rows in SQL is to combine a Common Table Expression (CTE) with the ROW_NUMBER() window function. ROW_NUMBER() numbers the rows within each group of duplicates, so you can keep the first row in each group and delete the rest.

Here's how you can accomplish this:

WITH CTE AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY email ORDER BY id) AS rn
    FROM employees
)
DELETE FROM CTE
WHERE rn > 1;

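In this query, PARTITION BY email restarts the numbering for each email address, and ORDER BY id makes the row with the lowest id in each group rn = 1. Deleting every row where rn is greater than one removes the duplicates while preserving that lowest-id record. Note that deleting through a CTE like this works in SQL Server; PostgreSQL, MySQL, and SQLite do not accept a CTE as the target of a DELETE, so on those systems you must delete from the base table and match rows by a key such as id.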
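For databases that cannot delete through a CTE, one common pattern keeps the CTE only for ranking and points the DELETE at the real table. The sketch below runs that form against a made-up in-memory SQLite table via Python's sqlite3 module; it assumes SQLite 3.25 or later (the first release with window functions), and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        (1, "Ana", "ana@example.com"),
        (2, "Ben", "ben@example.com"),
        (3, "Ana B.", "ana@example.com"),  # duplicate email, higher id
        (4, "Ben", "ben@example.com"),     # duplicate email, higher id
    ],
)

# The CTE only ranks rows; the DELETE targets the employees table itself and
# matches on the primary key. SQLite and PostgreSQL both accept this form.
conn.execute(
    """
    WITH ranked AS (
        SELECT id,
               ROW_NUMBER() OVER (PARTITION BY email ORDER BY id) AS rn
        FROM employees
    )
    DELETE FROM employees
    WHERE id IN (SELECT id FROM ranked WHERE rn > 1)
    """
)
conn.commit()

print(conn.execute("SELECT id, email FROM employees ORDER BY id").fetchall())
# [(1, 'ana@example.com'), (2, 'ben@example.com')]
```

Changing ORDER BY id to ORDER BY id DESC inside OVER (...) would keep the highest id in each group instead.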

Utilizing Subqueries

Another way to delete duplicate rows while keeping one is to use a subquery. The subquery finds the minimum (or maximum) id for each group of duplicates, and the outer DELETE removes every row whose id is not in that list. For instance, to retain the row with the lowest id for each email address, you can execute the following query:

DELETE FROM employees
WHERE id NOT IN (
    SELECT MIN(id)
    FROM employees
    GROUP BY email
);

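This query keeps the lowest id for every email (unique emails included, since each forms a group of one) and deletes everything else; swap MIN for MAX to keep the row with the highest id instead. MySQL rejects this exact statement with error 1093 ("You can't specify target table 'employees' for update in FROM clause") because the subquery reads from the table being deleted from. A common workaround is to wrap the subquery in a derived table, replacing the inner query with SELECT id FROM (SELECT MIN(id) AS id FROM employees GROUP BY email) AS keep_ids.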
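Because the text mentions keeping either the minimum or the maximum value, here is a small sketch that parameterizes that choice and runs the NOT IN delete against an in-memory SQLite database using Python's sqlite3 module. The function name dedupe_by_email, the helper sample_db, and the sample rows are all hypothetical.

```python
import sqlite3


def dedupe_by_email(conn: sqlite3.Connection, keep: str = "MIN") -> None:
    """Delete duplicate emails, keeping the lowest (MIN) or highest (MAX) id."""
    if keep not in ("MIN", "MAX"):
        raise ValueError("keep must be 'MIN' or 'MAX'")
    # keep is restricted to two known aggregate names above, so formatting it
    # into the SQL string cannot inject arbitrary SQL.
    conn.execute(
        f"""
        DELETE FROM employees
        WHERE id NOT IN (
            SELECT {keep}(id)
            FROM employees
            GROUP BY email
        )
        """
    )
    conn.commit()


def sample_db() -> sqlite3.Connection:
    """Hypothetical sample data: ids 1 and 3 share an email."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
    conn.executemany(
        "INSERT INTO employees VALUES (?, ?, ?)",
        [
            (1, "Ana", "ana@example.com"),
            (2, "Ben", "ben@example.com"),
            (3, "Ana", "ana@example.com"),
            (4, "Cy", "cy@example.com"),
        ],
    )
    return conn


oldest = sample_db()
dedupe_by_email(oldest, keep="MIN")
print(oldest.execute("SELECT id FROM employees ORDER BY id").fetchall())
# [(1,), (2,), (4,)]

newest = sample_db()
dedupe_by_email(newest, keep="MAX")
print(newest.execute("SELECT id FROM employees ORDER BY id").fetchall())
# [(2,), (3,), (4,)]
```

Validating the aggregate name instead of accepting any string keeps the dynamic SQL safe; placeholders cannot be used for function names.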

Leveraging Temporary Tables

In certain scenarios, a temporary table can simplify the process: you copy one row per unique record into the temporary table, empty the original table, and then insert the deduplicated rows back.

Here's how you can utilize temporary tables for this purpose:

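-- Copy one complete row per email: the one with the lowest id.
-- (Selecting name while grouping only by email is rejected by PostgreSQL,
-- SQL Server, and MySQL's default ONLY_FULL_GROUP_BY mode.)
CREATE TEMPORARY TABLE temp_employees AS
SELECT id, name, email
FROM employees
WHERE id IN (
    SELECT MIN(id)
    FROM employees
    GROUP BY email
);

-- Empty the original table.
TRUNCATE TABLE employees;

-- Reload the deduplicated rows, naming the columns explicitly.
INSERT INTO employees (id, name, email)
SELECT id, name, email
FROM temp_employees;

DROP TABLE temp_employees;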

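This approach lets you inspect the deduplicated rows in temp_employees before reloading them. Because the original table is emptied in between, take a backup first. Also note that TRUNCATE is refused on a table referenced by foreign keys from other tables in PostgreSQL (unless CASCADE is used), SQL Server, and MySQL with InnoDB, and that in MySQL it causes an implicit commit, so it cannot be rolled back.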
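The same temporary-table steps can be sketched with Python's sqlite3 module. Two substitutions are assumptions of this sketch, not part of the original method: SQLite has no TRUNCATE, so an unfiltered DELETE empties the table, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        (1, "Ana", "ana@example.com"),
        (2, "Ben", "ben@example.com"),
        (3, "Ana B.", "ana@example.com"),
        (4, "Ben", "ben@example.com"),
    ],
)
conn.commit()

# The DELETE and INSERT below run in one transaction that "with conn" commits
# on success or rolls back on error, so a failed INSERT does not leave
# employees empty.
with conn:
    # Copy one full row per email: the one with the lowest id.
    conn.execute(
        """
        CREATE TEMP TABLE temp_employees AS
        SELECT id, name, email
        FROM employees
        WHERE id IN (SELECT MIN(id) FROM employees GROUP BY email)
        """
    )
    # SQLite has no TRUNCATE; an unfiltered DELETE empties the table instead.
    conn.execute("DELETE FROM employees")
    conn.execute(
        "INSERT INTO employees (id, name, email) "
        "SELECT id, name, email FROM temp_employees"
    )
    conn.execute("DROP TABLE temp_employees")

print(conn.execute("SELECT * FROM employees ORDER BY id").fetchall())
# [(1, 'Ana', 'ana@example.com'), (2, 'Ben', 'ben@example.com')]
```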

Removing duplicate rows while keeping one copy of each record protects data integrity and prevents double counting in queries and reports. Whether you opt for a CTE with ROW_NUMBER(), a subquery, or a temporary table, the key is to choose an approach your database supports and that fits your constraints, such as table size, foreign keys, and whether the operation must be reversible.

The next time duplicate records show up in your database, run the identification query first to see how many rows are affected, then apply one of these strategies to keep exactly one row per unique record.
