Summary: in this tutorial, you will learn how to use the SQL DISTINCT operator to eliminate duplicate rows in the result set.
The result set of a SELECT statement may contain duplicate rows. To eliminate the duplicates, you use the DISTINCT operator as follows:

    SELECT DISTINCT
        column_1, column_2
    FROM
        table_name;
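Note that you can use the DISTINCT operator only in the SELECT statement, placed immediately after the SELECT keyword (or inside an aggregate function, as covered later in this tutorial).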
The database engine uses the values of the columns listed after the DISTINCT operator to decide whether a row in the result set is unique. If you specify one column, the engine compares the values in that column only. If you specify multiple columns, the engine compares the combination of values in those columns, so two rows are duplicates only when they match in every listed column.
SQL DISTINCT operator examples
Let’s take a look at some examples of using the DISTINCT operator in the SELECT statement.
SQL DISTINCT one column example
The following query gets the city of each employee in the employees table:

    SELECT city
    FROM employees;
The result set contains duplicate cities. For example, London appears four times, which indicates that several employees are located in the same city.
To remove the duplicate cities, you can use the DISTINCT operator as shown in the following query:

    SELECT DISTINCT
        city
    FROM
        employees;
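To see the effect on concrete data, here is a minimal, self-contained sketch. The table demo_employee_cities and its rows are hypothetical (they are not the tutorial's employees table), chosen so that London appears four times. The multi-row INSERT syntax is assumed to be supported by your database.

```sql
-- Hypothetical demo table and data, for illustration only.
CREATE TABLE demo_employee_cities (
    employee_id INT PRIMARY KEY,
    city        VARCHAR(50)
);

INSERT INTO demo_employee_cities (employee_id, city) VALUES
    (1, 'Seattle'),
    (2, 'Tacoma'),
    (3, 'London'),
    (4, 'London'),
    (5, 'London'),
    (6, 'London');

-- Without DISTINCT: 6 rows, London listed four times.
SELECT city
FROM demo_employee_cities
ORDER BY city;

-- With DISTINCT: 3 rows (London, Seattle, Tacoma).
SELECT DISTINCT city
FROM demo_employee_cities
ORDER BY city;
```

The ORDER BY clause is there only to make the output order predictable; DISTINCT by itself does not guarantee any particular order.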
SQL DISTINCT multiple columns example
To find a list of unique cities and countries of employees, you can specify the city and country columns after the DISTINCT operator as shown in the following query:

    SELECT DISTINCT
        city, country
    FROM
        employees;
The combination of values in the city and country columns is used to determine the uniqueness of rows in the result set.
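A small sketch can make the "combination of values" rule concrete. The table demo_employee_locations and its rows are hypothetical; the data deliberately includes the same city name in two different countries.

```sql
-- Hypothetical demo table and data, for illustration only.
CREATE TABLE demo_employee_locations (
    employee_id INT PRIMARY KEY,
    city        VARCHAR(50),
    country     VARCHAR(50)
);

INSERT INTO demo_employee_locations (employee_id, city, country) VALUES
    (1, 'London',  'UK'),
    (2, 'London',  'UK'),
    (3, 'London',  'Canada'),
    (4, 'Seattle', 'USA');

-- One column: 2 rows (London, Seattle).
SELECT DISTINCT city
FROM demo_employee_locations
ORDER BY city;

-- Two columns: 3 rows, because (London, UK) and (London, Canada)
-- are different combinations:
--   London  | Canada
--   London  | UK
--   Seattle | USA
SELECT DISTINCT city, country
FROM demo_employee_locations
ORDER BY city, country;
```

In other words, DISTINCT applies to the whole selected row, not just to the first column after the keyword.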
SQL DISTINCT with NULL values
The DISTINCT operator treats NULL values as duplicates of each other. Therefore, if the SELECT statement returns several NULL values, only one NULL value remains in the result set when you use the DISTINCT operator.
Let’s take a look at the following example:
    SELECT
        region
    FROM
        employees;
The query returns duplicate regions, and the NULL value appears four times in the result set. Let’s apply the DISTINCT operator to the query as follows:

    SELECT DISTINCT
        region
    FROM
        employees;
Now, we have only one NULL value included in the result set.
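The same behavior can be reproduced on a small scale. This is a sketch on a hypothetical table, demo_employee_regions, whose rows are invented for illustration.

```sql
-- Hypothetical demo table and data, for illustration only.
CREATE TABLE demo_employee_regions (
    employee_id INT PRIMARY KEY,
    region      VARCHAR(50)   -- nullable on purpose
);

INSERT INTO demo_employee_regions (employee_id, region) VALUES
    (1, 'WA'),
    (2, 'WA'),
    (3, NULL),
    (4, NULL),
    (5, NULL),
    (6, NULL);

-- Without DISTINCT: 6 rows ('WA' twice, NULL four times).
SELECT region
FROM demo_employee_regions;

-- With DISTINCT: 2 rows ('WA' and a single NULL).
SELECT DISTINCT region
FROM demo_employee_regions;
```

Note that this grouping of NULL values is specific to duplicate elimination. In an ordinary comparison, NULL = NULL does not evaluate to true.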
SQL DISTINCT with aggregate functions
Besides eliminating duplicates, you can use the DISTINCT operator in conjunction with an aggregate function such as:
- COUNT: COUNT(DISTINCT column) to count distinct values in a result set.
- SUM: SUM(DISTINCT column) to calculate the sum of distinct values.
For example, to count the distinct cities of employees, you use the DISTINCT operator with the COUNT function as shown in the following query:

    SELECT
        COUNT(DISTINCT city)
    FROM
        employees;
To calculate the sum of distinct unit prices of products in the products table, you use the DISTINCT operator with the SUM function as follows:

    SELECT
        SUM(DISTINCT unitprice)
    FROM
        products;
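To compare the plain and DISTINCT forms of these aggregates side by side, here is a minimal sketch. The table demo_products and its prices are hypothetical and only reuse the unitprice column name from the query above.

```sql
-- Hypothetical demo table and data, for illustration only.
CREATE TABLE demo_products (
    product_id INT PRIMARY KEY,
    unitprice  DECIMAL(10, 2)
);

INSERT INTO demo_products (product_id, unitprice) VALUES
    (1, 10.00),
    (2, 10.00),
    (3, 25.00),
    (4, NULL);

SELECT
    COUNT(*)                  AS all_rows,         -- 4: every row
    COUNT(unitprice)          AS priced_rows,      -- 3: NULL is ignored
    COUNT(DISTINCT unitprice) AS distinct_prices,  -- 2: 10 and 25
    SUM(unitprice)            AS total,            -- 45: 10 + 10 + 25
    SUM(DISTINCT unitprice)   AS distinct_total    -- 35: 10 + 25
FROM demo_products;
```

Aggregate functions skip NULL values, so COUNT(DISTINCT unitprice) does not count the NULL price. Also keep in mind that SUM(DISTINCT ...) collapses equal prices that belong to different products, so it answers a different question than a plain SUM.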
DISTINCT vs. ALL
It is worth mentioning the ALL operator in this tutorial. Unlike the DISTINCT operator, the ALL operator keeps every row in the result set, including duplicates. The following queries return the same result set.

    SELECT ALL
        column_1, column_2
    FROM
        table_name;

    SELECT
        column_1, column_2
    FROM
        table_name;
The SELECT statement uses the ALL operator by default, so you don’t have to specify it explicitly in the statement.
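As a quick check of this default, the following sketch runs the three variants against a hypothetical table, demo_employee_countries, with invented rows.

```sql
-- Hypothetical demo table and data, for illustration only.
CREATE TABLE demo_employee_countries (
    employee_id INT PRIMARY KEY,
    country     VARCHAR(50)
);

INSERT INTO demo_employee_countries (employee_id, country) VALUES
    (1, 'UK'),
    (2, 'UK'),
    (3, 'USA');

-- Explicit ALL: 3 rows (UK, UK, USA).
SELECT ALL country
FROM demo_employee_countries
ORDER BY country;

-- No keyword: the same 3 rows, because ALL is the default.
SELECT country
FROM demo_employee_countries
ORDER BY country;

-- DISTINCT: 2 rows (UK, USA).
SELECT DISTINCT country
FROM demo_employee_countries
ORDER BY country;
```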
In this tutorial, we have shown you how to use the DISTINCT operator in the SELECT statement to eliminate duplicate rows in the result set.