Sunday 20 November 2011

How to retrieve TOP Distinct values in SQL Server

I have recently got a requirement from one of our customers to show recently used distinct items for a particular report. While writing the query, I found out it is not straight forward as there is a little cache “recent items” in the requirement.

This can be achieved by using a simple trick as shown in the below example. To illustrated the scenario, I have create a sample table with data

Sample Data

CREATE TABLE SaleItem ( ItemName NVARCHAR(30),SaleDate DATETIME)

insert into SaleItem SELECT 'Book',GETDATE()
insert into SaleItem SELECT 'Book',GETDATE()
insert into SaleItem SELECT 'Book',GETDATE()
insert into SaleItem SELECT 'Pencil',GETDATE()
insert into SaleItem SELECT 'Pencil',GETDATE()
insert into SaleItem SELECT 'Pencil',GETDATE()
insert into SaleItem SELECT 'Pen',GETDATE()
insert into SaleItem SELECT 'Pen',GETDATE()
insert into SaleItem SELECT 'Pen',GETDATE()

SQL Query

WITH ItemList AS
(
SELECT
IL.ItemName,
IL.SaleDate,
ROW_NUMBER() OVER (ORDER BY IL.SaleDate DESC) as 'RankID'
FROM
SaleItem IL
)
SELECT DISTINCT TOP 30
IL.ItemName,
IL.SaleDate
FROM ItemList IL
WHERE NOT EXISTS (Select 1 FROM ItemList IL2
WHERE ( IL2.ItemName = IL.ItemName )
AND IL.RankID < IL2.RankID)
ORDER BY SaleDate DESC

The trick is assigning RankID to each row and eliminate the duplicate row  by comparing the RankID while selecting the records.