Lets consider a scenario where you store orders in a table for different products and you want to get the top two orders for each product based on the quantity. For doing this we can't rely upon simple group by query. To accomplish this task, we can either use a ranking function with a common table expression or a statement with cross apply. For the rest of the article we will be looking at applying these two techniques (performance stats of each technique is also provided).
As usual lets start by creating our test tables and populating them with test data.
We will create two tables
1. Products (which will hold info about different products)
2. Orders (which will hold info about all the orders placed for products)
Tables are created using the following scripts
CREATE TABLE Products
(
ProductID INT IDENTITY(1,1) PRIMARY KEY
, ProductName VARCHAR(50)
)
CREATE TABLE Orders
(
OrderID INT IDENTITY(1,1) PRIMARY KEY,
ProductID INT NOT NULL FOREIGN KEY REFERENCES Products(ProductID),
Quantity INT NOT NULL
)
Having created our tables, its time to populate them with test data using the following scripts
INSERT INTO Products(ProductName)
VALUES('P1'), ('P2'), ('P3'), ('P4'), ('P5'),('P6'), ('P7'), ('P8'), ('P9'), ('P10')
INSERT INTO Orders(ProductID, Quantity)
SELECT ((ROW_NUMBER() OVER( ORDER BY c1.column_ID)) % 10)+ 1 , ROW_NUMBER() OVER( ORDER BY C2.column_ID)
FROM sys.columns c1, sys.columns c2
Now lets create an index on our Orders table to make our queries run faster
CREATE INDEX IDX_Orders_ProductID_Quantity ON Orders( ProductID ASC, Quantity ASC)
First Method
Lets look at using Rank() function
RANK () : Returns the rank of a function with in the partition of a result set
Syntax :
RANK () OVER ( [
Our ranking query looks like this
;WITH ProductOrders AS
(
SELECT P.ProductName, O.OrderID, O.Quantity, RANK() OVER(PARTITION BY O.ProductID ORDER BY O.Quantity DESC) AS OrderRank
FROM Products P
INNER JOIN Orders O
ON P.ProductID = O.ProductID
)
SELECT *
FROM ProductOrders
WHERE OrderRank < 3
I also managed to capture some stats (io and time) data
(20 row(s) affected)
Table 'Orders'. Scan count 10, logical reads 486, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Products'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 218 ms, elapsed time = 217 ms.
Second Method
CROSS APPLY : The APPLY operator allows you to invoke a table-valued function for each row returned by an outer table expression of a query. CROSS APPLY returns only rows from the outer table that produce a result set from the table-valued function
Query using CROSS APPLY
SELECT P.ProductName, O.OrderID, O.Quantity
FROM Products P
CROSS APPLY ( SELECT TOP 2 *
FROM Orders
WHERE
ProductID = p.ProductID
ORDER BY Quantity DESC
) O
Stats for this query are as follows:
(20 row(s) affected)
Table 'Orders'. Scan count 10, logical reads 30, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Products'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 15 ms, elapsed time = 1 ms.
Both methods described above give us the required output, but using the second method with cross apply seems to be much more efficient.
As usual i hope you find this article interesting. Any comments, questions and suggestion are more than welcome.