Eliminating Duplicates Using SQL DISTINCT Operator

Summary: in this tutorial, you will learn how to use the SQL DISTINCT operator to eliminate duplicate rows in the result set.

SQL DISTINCT clause overview

The result set of a SELECT statement may contain duplicate rows. To eliminate the duplicates, you use the DISTINCT operator as follows:

SELECT DISTINCT select_list
FROM table_name;Code language: SQL (Structured Query Language) (sql)

Notice you can use the DISTINCT operator in the SELECT statement only.

The SELECT statement uses the values of the columns specified after the DISTINCT operator for evaluating the uniqueness of the rows in the result set.

If you specify one column, the database uses the values in the column to evaluate the uniqueness.

If you specify multiple columns, the database engine evaluates the uniqueness of rows based on the combination of values in those columns.

SQL DISTINCT operator examples

Let’s take a look at some examples of using the DISTINCT operator in the SELECTstatement.

1) Using SQL DISTINCT with one column example

The following query gets employee’s city in the employees table:

SELECT city
FROM employees;Code language: SQL (Structured Query Language) (sql)

Output:

+----------+
| city     |
+----------+
| Seattle  |
| Tacoma   |
| Kirkland |
| Redmond  |
| London   |
| London   |
| London   |
| Seattle  |
| London   |
+----------+
9 rows in set (0.00 sec)Code language: JavaScript (javascript)

The result set contains duplicate city i.e., London appears four times, which indicates that some employees located in the same city.

To remove the duplicate cities, you can use the DISTINCT operator as shown in the following query:

SELECT DISTINCT
    city
FROM
    employees;Code language: SQL (Structured Query Language) (sql)

Output:

+----------+
| city     |
+----------+
| Seattle  |
| Tacoma   |
| Kirkland |
| Redmond  |
| London   |
+----------+
5 rows in set (0.00 sec)Code language: JavaScript (javascript)

2) Using the SQL DISTINCT with multiple columns example

To find a list of unique cities and countries from the employees table, you can specify the city and country columns after the DISTINCT operator as shown in the following query:

SELECT DISTINCT
    city, country
FROM
    employees;Code language: SQL (Structured Query Language) (sql)

Output:

+----------+---------+
| city     | country |
+----------+---------+
| Seattle  | USA     |
| Tacoma   | USA     |
| Kirkland | USA     |
| Redmond  | USA     |
| London   | UK      |
+----------+---------+
5 rows in set (0.01 sec)Code language: JavaScript (javascript)

In this example, the DISTINCT uses the combination of values in the city and country columns to determine the uniqueness of rows in the result set.

Using the SQL DISTINCT with NULL

The DISTINCT operator treats NULL duplicate. It means that the two NULLs are the same. Therefore, if the SELECT statement returns NULLs, the DISTINCT returns only one NULL.

The following SELECT statement selects all the regions from the region column of the employees table:

SELECT region FROM employees;Code language: SQL (Structured Query Language) (sql)

Output:

+--------+
| region |
+--------+
| WA     |
| WA     |
| WA     |
| WA     |
| NULL   |
| NULL   |
| NULL   |
| WA     |
| NULL   |
+--------+
9 rows in set (0.00 sec)Code language: PHP (php)

As you can see from the output, the query returns duplicate regions. For example, NULL appears four times.

The following SELECT statement uses the DISTINCT operator to select unique regions from the employees table:

SELECT DISTINCT region
FROM employees;Code language: SQL (Structured Query Language) (sql)

Output:

+--------+
| region |
+--------+
| WA     |
| NULL   |
+--------+
2 rows in set (0.00 sec)Code language: JavaScript (javascript)

The result set now has unique regions.

DISTINCT vs. ALL

It is worth to mention the ALL operator. Unlike the DISTINCT operator, the ALL operator includes all rows that contain duplicate values. The following queries return the same result set.

SELECT ALL select_list
FROM table_name;Code language: SQL (Structured Query Language) (sql)

SELECT select_list
FROM table_name;Code language: SQL (Structured Query Language) (sql)

The SELECT statement uses the ALL operator by default. Therefore, you don’t have to specify it explicitly in the statement.

Summary

Use the DISTINCT operator to select the unique values from one or more columns.

SQL DISTINCT clause overview

SQL DISTINCT operator examples

1) Using SQL DISTINCT with one column example

2) Using the SQL DISTINCT with multiple columns example

Using the SQL DISTINCT with NULL

DISTINCT vs. ALL

Summary

Databases