Friday, December 31, 2010

Use of Rank and/or Cross Apply to get top n items from an ordered grouping

If you want to get the max or min (or an other accumulating function) out of a grouping, a simple group by clause is more than enough.
Lets consider a scenario where you store orders in a table for different products and you want to get the top two orders for each product based on the quantity. For doing this we can't rely upon simple group by query. To accomplish this task, we can either use a ranking function with a common table expression or a statement with cross apply. For the rest of the article we will be looking at applying these two techniques (performance stats of each technique is also provided).

As usual lets start by creating our test tables and populating them with test data.
We will create two tables
1. Products (which will hold info about different products)
2. Orders (which will hold info about all the orders placed for products)

Tables are created using the following scripts


CREATE TABLE Products
(
ProductID INT IDENTITY(1,1) PRIMARY KEY
, ProductName VARCHAR(50)
)

CREATE TABLE Orders
(
OrderID INT IDENTITY(1,1) PRIMARY KEY,
ProductID INT NOT NULL FOREIGN KEY REFERENCES Products(ProductID),
Quantity INT NOT NULL
)


Having created our tables, its time to populate them with test data using the following scripts

INSERT INTO Products(ProductName)
VALUES('P1'), ('P2'), ('P3'), ('P4'), ('P5'),('P6'), ('P7'), ('P8'), ('P9'), ('P10')

INSERT INTO Orders(ProductID, Quantity)
SELECT ((ROW_NUMBER() OVER( ORDER BY c1.column_ID)) % 10)+ 1 , ROW_NUMBER() OVER( ORDER BY C2.column_ID)
FROM sys.columns c1, sys.columns c2


Now lets create an index on our Orders table to make our queries run faster


CREATE INDEX IDX_Orders_ProductID_Quantity ON Orders( ProductID ASC, Quantity ASC)


First Method
Lets look at using Rank() function

RANK () : Returns the rank of a function with in the partition of a result set

Syntax :
RANK () OVER ( [ ] )

Our ranking query looks like this



;WITH ProductOrders AS
(
SELECT P.ProductName, O.OrderID, O.Quantity, RANK() OVER(PARTITION BY O.ProductID ORDER BY O.Quantity DESC) AS OrderRank
FROM Products P
INNER JOIN Orders O
ON P.ProductID = O.ProductID
)

SELECT *
FROM ProductOrders
WHERE OrderRank < 3


I also managed to capture some stats (io and time) data

(20 row(s) affected)
Table 'Orders'. Scan count 10, logical reads 486, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Products'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 218 ms, elapsed time = 217 ms.



Second Method
CROSS APPLY : The APPLY operator allows you to invoke a table-valued function for each row returned by an outer table expression of a query. CROSS APPLY returns only rows from the outer table that produce a result set from the table-valued function

Query using CROSS APPLY



SELECT P.ProductName, O.OrderID, O.Quantity
FROM Products P
CROSS APPLY ( SELECT TOP 2 *
FROM Orders
WHERE
ProductID = p.ProductID
ORDER BY Quantity DESC
) O




Stats for this query are as follows:


(20 row(s) affected)
Table 'Orders'. Scan count 10, logical reads 30, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Products'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 15 ms, elapsed time = 1 ms.



Both methods described above give us the required output, but using the second method with cross apply seems to be much more efficient.

As usual i hope you find this article interesting. Any comments, questions and suggestion are more than welcome.

Tuesday, December 21, 2010

Navigating through hierarchy using Recursive Common Table Expressions

Its quite often that we find the need to traverse up or down the hierarchy of a tree.
For example :
1. to find everyone working in an organization ( Manager -> Sub Manager -> Employee)
2. to list the products, the components which make up the product right down to the individual parts that make the components.

To accomplish these things, we had to use temporary tables, cursors or loops and some terminating logic. The introduction of recursive CTEs makes it much more efficient and easy.

Before we begin with the usage of CTEs, lets have a brief introduction.
Common Table Expression:
A common table expression (CTE) can be thought of as a temporary result set that is defined within the execution scope of a single SELECT, INSERT, UPDATE, DELETE, or CREATE VIEW statement. A CTE is similar to a derived table in that it is not stored as an object and lasts only for the duration of the query. Unlike a derived table, a CTE can be self-referencing and can be referenced multiple times in the same query.

Basic Structure of a Recursive CTE :

WITH cte_name ( column_name [,...n] )
AS
(
CTE_query_definition –- Anchor member is defined.
UNION ALL
CTE_query_definition –- Recursive member is defined referencing cte_name.
)

-- Statement using the CTE
SELECT *
FROM cte_name

The anchor part of the recursive CTE defines the initial data set. This inital data set forms input for the first recursive iteration, the output of which forms the input for the next recursive iteration. Recursion continues till an empty set is returned.





Now let us create some test tables and populate them with dummy data.



Query to create our test table



CREATE TABLE RecursiveCTETable
(UserID INT PRIMARY KEY,
ManagerID INT)



now lets create an index on ManagerID column to make things faster for retrieval



CREATE INDEX IDX_ManagerID ON RecursiveCTETable(ManagerID)



Here is an easy way of populating our test data






;WITH Populator AS(
select ROW_NUMBER() over(order by c1.object_id) RowNum from sys.columns C1 cross apply sys.columns C2
)

INSERT INTO RecursiveCTETable(UserID, ManagerID)
SELECT RowNum - 1, RowNum/7
FROM Populator

UPDATE RecursiveCTETable
SET ManagerID = NULL
WHERE
UserID = 0




Let us see how things used to be done using temp tables:




CREATE TABLE #OrganizationalHierarchy(ID INT IDENTITY(1,1) PRIMARY KEY, UserID INT, ManagerID INT)
CREATE INDEX [IDX_TempOrganizationalHierarchy_UserID] ON #OrganizationalHierarchy(UserID)
CREATE INDEX [IDX_TempOrganizationalHierarchy_ManagerID] ON #OrganizationalHierarchy(ManagerID)


DECLARE @COUNT INT, @IdentNum INT
SELECT @COUNT = 0, @IdentNum = 0

INSERT INTO #OrganizationalHierarchy(UserID, ManagerID)
SELECT UserID, ManagerID
FROM RecursiveCTETable
WHERE
ManagerID IS NULL

SET @COUNT = @@ROWCOUNT

WHILE(@COUNT > 0)
BEGIN
INSERT INTO #OrganizationalHierarchy(UserID, ManagerID)
SELECT R.UserID, R.ManagerID
FROM #OrganizationalHierarchy H
INNER JOIN RecursiveCTETable R
ON H.UserID = R.ManagerID
WHERE
ID > @IdentNum

SET @COUNT = @@ROWCOUNT
SET @IdentNum = @@IDENTITY - @COUNT
END


SELECT * FROM #OrganizationalHierarchy

DROP TABLE #OrganizationalHierarchy

The above query runs with the following stats




Now lets use the Recursive CTE to solve the same problem


;WITH OrganizationalHierarchy AS(
SELECT UserID, ManagerID
FROM RecursiveCTETable
WHERE
ManagerID IS NULL
UNION ALL
SELECT R.UserID, R.ManagerID
FROM RecursiveCTETable R
INNER JOIN OrganizationalHierarchy H
ON H.UserID = R.ManagerID
)

SELECT * FROM OrganizationalHierarchy


Both the queries provide same results.

Its also evident from the stats below that Recursive CTE's not only provide a simpler code but also are efficient.




Be sure to provide right indexes for the Recursive CTE's to work properly.
Hope this article is helpful and any comments or questions are highly appreciated.