SQL Duplicates Count Query for two or more columns

por | 7 febrero, 2023
WITH CTE AS
(
    SELECT fips, cnty_name,
           ROW_NUMBER() OVER (PARTITION BY fips, cnty_name ORDER BY fips, cnty_name) AS DuplicateCount
    FROM zip_to_fips
    WHERE fips IS NOT NULL AND cnty_name IS NOT NULL
)
SELECT fips, cnty_name, COUNT(*) AS DuplicateCount
FROM CTE
WHERE DuplicateCount >= 1
GROUP BY fips, cnty_name
ORDER BY DuplicateCount DESC

This query uses a commonly used query (CTE) to group the results and count duplicates in the zip_to_fips table.

The first part of the query, the CTE, selects the columns «fips», «cnty_name» and adds an additional column «DuplicateCount» that uses the ROW_NUMBER() function to assign a unique number to each row that has equal values in the «fips» and «cnty_name» columns. The PARTITION BY clause indicates the columns by which to group and the ORDER BY clause indicates the order in which to assign numbers.

The second part of the query uses the CTE to select the columns «fips», «cnty_name» and uses the COUNT(*) function to count the number of times each combination of «fips» and «cnty_name» appears. The GROUP BY clause indicates the columns by which to group the result. Finally, the ORDER BY clause indicates the order in which to sort the results, in this case by DuplicateCount in descending order.

The query filters only the rows that have non-null values in the «fips» and «cnty_name» columns using the WHERE fips IS NOT NULL AND cnty_name IS NOT NULL clause. In addition, the query also filters only the results with DuplicateCount greater than 1, i.e., those that have duplicates.