Eliminating Duplicates Using SQL DISTINCT Operator

Summary: in this tutorial, you will learn how to use the SQL DISTINCT operator to eliminate duplicate rows in the result set.

The result set of a SELECT statement may contain duplicate rows. To eliminate the duplicates, place the DISTINCT operator immediately after the SELECT keyword, as follows:

    SELECT DISTINCT column1, column2, ...
    FROM table1;

Note that you can use the DISTINCT operator only in the SELECT statement.

The database engine uses the values of the columns listed after the DISTINCT operator to evaluate the uniqueness of each row in the result set. If you specify one column, the database engine uses the values in that column to evaluate uniqueness. If you specify multiple columns, the database engine evaluates the uniqueness of rows based on the combination of values in those columns.

SQL DISTINCT operator examples

Let’s take a look at some examples of using the DISTINCT operator in the SELECT statement.

SQL DISTINCT one column example

The following query retrieves the city of each employee from the employees table:

[Image: SQL DISTINCT - Employee Cities Example]

The result set contains duplicate cities; for example, London appears four times, which indicates that several employees are located in the same city.

To remove the duplicate cities, you can use the DISTINCT operator, as shown in the following query:

[Image: SQL DISTINCT one column]
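Because the original query and result images are not available, here is a minimal runnable sketch of the two queries above. It assumes a small, made-up employees table held in an in-memory SQLite database (accessed through Python's built-in sqlite3 module); the rows are hypothetical and are not the tutorial's sample data. The ORDER BY clause is added only so the output order is predictable.

```python
import sqlite3

# Hypothetical sample data; the real employees table has more columns and rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, city TEXT)")
conn.executemany(
    "INSERT INTO employees (city) VALUES (?)",
    [("London",), ("Seattle",), ("London",), ("Tacoma",), ("Seattle",)],
)

# Without DISTINCT: one row per employee, duplicate cities included.
all_cities = [row[0] for row in conn.execute(
    "SELECT city FROM employees ORDER BY city")]

# With DISTINCT: each city appears only once.
unique_cities = [row[0] for row in conn.execute(
    "SELECT DISTINCT city FROM employees ORDER BY city")]

print(all_cities)     # ['London', 'London', 'Seattle', 'Seattle', 'Tacoma']
print(unique_cities)  # ['London', 'Seattle', 'Tacoma']
```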

SQL DISTINCT multiple columns example

To get a list of the unique city and country combinations of employees, you can specify the city and country columns after the DISTINCT operator, as shown in the following query:

[Image: SQL DISTINCT multiple columns]

The combination of values in the city and country columns is used to determine the uniqueness of rows in the result set.
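A minimal sketch of the multi-column case, again assuming a hypothetical employees table in an in-memory SQLite database. The made-up rows include two cities named London in different countries, so the pair (city, country), not the city alone, decides which rows count as duplicates.

```python
import sqlite3

# Hypothetical sample data for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (city TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO employees (city, country) VALUES (?, ?)",
    [
        ("London", "UK"),
        ("London", "UK"),       # exact duplicate of the previous pair: removed
        ("London", "Canada"),   # same city name, different country: kept
        ("Seattle", "USA"),
    ],
)

# DISTINCT compares the whole (city, country) combination.
rows = conn.execute(
    "SELECT DISTINCT city, country FROM employees ORDER BY city, country"
).fetchall()
print(rows)  # [('London', 'Canada'), ('London', 'UK'), ('Seattle', 'USA')]
```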

SQL DISTINCT with NULL values

The DISTINCT operator treats NULL values as duplicates of each other. Therefore, if the SELECT statement returns NULL values, the DISTINCT operator keeps only one NULL value in the result set.

Let’s take a look at the following example:

[Image: SQL DISTINCT employee regions]

The query returns duplicate regions; for example, the NULL value appears four times in the result set. Let’s apply the DISTINCT operator to the query as follows:

[Image: SQL DISTINCT NULL values]

Now the result set includes only one NULL value.
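To see the NULL behavior in a runnable form, here is a minimal sketch assuming a hypothetical employees table with a nullable region column in SQLite; Python's None stands for SQL NULL. Note that SQLite sorts NULL before other values in ascending order, which is why None comes first in the ordered output.

```python
import sqlite3

# Hypothetical sample data: three employees have no region (NULL).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (region TEXT)")
conn.executemany(
    "INSERT INTO employees (region) VALUES (?)",
    [("WA",), (None,), ("WA",), (None,), (None,)],
)

# Without DISTINCT, every NULL is returned.
regions = [r[0] for r in conn.execute("SELECT region FROM employees")]

# With DISTINCT, all NULLs collapse into a single NULL row.
distinct_regions = [r[0] for r in conn.execute(
    "SELECT DISTINCT region FROM employees ORDER BY region")]

print(regions.count(None))  # 3
print(distinct_regions)     # [None, 'WA']
```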

SQL DISTINCT with aggregate functions

Besides eliminating duplicates, you can use the DISTINCT operator in conjunction with an aggregate function such as:

  • COUNT: COUNT(DISTINCT column) to count the distinct non-NULL values in a column.
  • SUM: SUM(DISTINCT column) to calculate the sum of distinct values.

For example, to count the distinct cities of employees, you use the DISTINCT operator with the COUNT function, as in the following query:

[Image: SQL COUNT DISTINCT]
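Here is a minimal sketch contrasting COUNT(*), COUNT(city), and COUNT(DISTINCT city), assuming a hypothetical employees table in SQLite. One made-up row has a NULL city to show that COUNT(column) and COUNT(DISTINCT column) skip NULLs, while COUNT(*) counts every row.

```python
import sqlite3

# Hypothetical sample data, including one employee with no city.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (city TEXT)")
conn.executemany(
    "INSERT INTO employees (city) VALUES (?)",
    [("London",), ("London",), ("Seattle",), (None,)],
)

total_rows, non_null_cities, distinct_cities = conn.execute(
    "SELECT COUNT(*), COUNT(city), COUNT(DISTINCT city) FROM employees"
).fetchone()
print(total_rows, non_null_cities, distinct_cities)  # 4 3 2
```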

To calculate the sum of distinct unit prices of products in the products table, you use the DISTINCT operator with the SUM function as follows:

[Image: SQL DISTINCT SUM]
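A minimal sketch of SUM versus SUM(DISTINCT ...), assuming a hypothetical products table with a unit_price column in SQLite. The product names and prices are made up; two products share the price 18.0, so that price is added only once when DISTINCT is used.

```python
import sqlite3

# Hypothetical sample data for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product_name TEXT, unit_price REAL)")
conn.executemany(
    "INSERT INTO products (product_name, unit_price) VALUES (?, ?)",
    [("Tea", 18.0), ("Coffee", 19.0), ("Syrup", 10.0), ("Tofu", 18.0)],
)

# SUM adds every price; SUM(DISTINCT ...) adds each distinct price once.
total, distinct_total = conn.execute(
    "SELECT SUM(unit_price), SUM(DISTINCT unit_price) FROM products"
).fetchone()
print(total, distinct_total)  # 65.0 47.0
```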

DISTINCT vs. ALL

It is worth mentioning the ALL operator in this tutorial. Unlike the DISTINCT operator, the ALL operator keeps all rows in the result set, including duplicates. The following queries return the same result set.
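The pair of queries compared here is not preserved in this copy, so the following is a minimal sketch of the comparison, assuming a hypothetical employees table in SQLite: a plain SELECT and a SELECT ALL over the same column, both of which keep the duplicate rows.

```python
import sqlite3

# Hypothetical sample data with one duplicate city.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (city TEXT)")
conn.executemany(
    "INSERT INTO employees (city) VALUES (?)",
    [("London",), ("Seattle",), ("London",)],
)

# Plain SELECT and SELECT ALL both keep duplicates.
default_rows = conn.execute("SELECT city FROM employees ORDER BY city").fetchall()
all_rows = conn.execute("SELECT ALL city FROM employees ORDER BY city").fetchall()

print(default_rows == all_rows)  # True
print(all_rows)                  # [('London',), ('London',), ('Seattle',)]
```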

 

The SELECT statement uses ALL by default, so you don’t have to specify it explicitly.

In this tutorial, we have shown you how to use the DISTINCT operator in the SELECT statement to eliminate duplicate rows from the result set.