Friday, December 31, 2010

Use of Rank and/or Cross Apply to get top n items from an ordered grouping

If you want to get the max or min (or an other accumulating function) out of a grouping, a simple group by clause is more than enough.
Lets consider a scenario where you store orders in a table for different products and you want to get the top two orders for each product based on the quantity. For doing this we can't rely upon simple group by query. To accomplish this task, we can either use a ranking function with a common table expression or a statement with cross apply. For the rest of the article we will be looking at applying these two techniques (performance stats of each technique is also provided).

As usual lets start by creating our test tables and populating them with test data.
We will create two tables
1. Products (which will hold info about different products)
2. Orders (which will hold info about all the orders placed for products)

Tables are created using the following scripts


CREATE TABLE Products
(
ProductID INT IDENTITY(1,1) PRIMARY KEY
, ProductName VARCHAR(50)
)

CREATE TABLE Orders
(
OrderID INT IDENTITY(1,1) PRIMARY KEY,
ProductID INT NOT NULL FOREIGN KEY REFERENCES Products(ProductID),
Quantity INT NOT NULL
)


Having created our tables, its time to populate them with test data using the following scripts

INSERT INTO Products(ProductName)
VALUES('P1'), ('P2'), ('P3'), ('P4'), ('P5'),('P6'), ('P7'), ('P8'), ('P9'), ('P10')

INSERT INTO Orders(ProductID, Quantity)
SELECT ((ROW_NUMBER() OVER( ORDER BY c1.column_ID)) % 10)+ 1 , ROW_NUMBER() OVER( ORDER BY C2.column_ID)
FROM sys.columns c1, sys.columns c2


Now lets create an index on our Orders table to make our queries run faster


CREATE INDEX IDX_Orders_ProductID_Quantity ON Orders( ProductID ASC, Quantity ASC)


First Method
Lets look at using Rank() function

RANK () : Returns the rank of a function with in the partition of a result set

Syntax :
RANK () OVER ( [ ] )

Our ranking query looks like this



;WITH ProductOrders AS
(
SELECT P.ProductName, O.OrderID, O.Quantity, RANK() OVER(PARTITION BY O.ProductID ORDER BY O.Quantity DESC) AS OrderRank
FROM Products P
INNER JOIN Orders O
ON P.ProductID = O.ProductID
)

SELECT *
FROM ProductOrders
WHERE OrderRank < 3


I also managed to capture some stats (io and time) data

(20 row(s) affected)
Table 'Orders'. Scan count 10, logical reads 486, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Products'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 218 ms, elapsed time = 217 ms.



Second Method
CROSS APPLY : The APPLY operator allows you to invoke a table-valued function for each row returned by an outer table expression of a query. CROSS APPLY returns only rows from the outer table that produce a result set from the table-valued function

Query using CROSS APPLY



SELECT P.ProductName, O.OrderID, O.Quantity
FROM Products P
CROSS APPLY ( SELECT TOP 2 *
FROM Orders
WHERE
ProductID = p.ProductID
ORDER BY Quantity DESC
) O




Stats for this query are as follows:


(20 row(s) affected)
Table 'Orders'. Scan count 10, logical reads 30, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Products'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 15 ms, elapsed time = 1 ms.



Both methods described above give us the required output, but using the second method with cross apply seems to be much more efficient.

As usual i hope you find this article interesting. Any comments, questions and suggestion are more than welcome.

Tuesday, December 21, 2010

Navigating through hierarchy using Recursive Common Table Expressions

Its quite often that we find the need to traverse up or down the hierarchy of a tree.
For example :
1. to find everyone working in an organization ( Manager -> Sub Manager -> Employee)
2. to list the products, the components which make up the product right down to the individual parts that make the components.

To accomplish these things, we had to use temporary tables, cursors or loops and some terminating logic. The introduction of recursive CTEs makes it much more efficient and easy.

Before we begin with the usage of CTEs, lets have a brief introduction.
Common Table Expression:
A common table expression (CTE) can be thought of as a temporary result set that is defined within the execution scope of a single SELECT, INSERT, UPDATE, DELETE, or CREATE VIEW statement. A CTE is similar to a derived table in that it is not stored as an object and lasts only for the duration of the query. Unlike a derived table, a CTE can be self-referencing and can be referenced multiple times in the same query.

Basic Structure of a Recursive CTE :

WITH cte_name ( column_name [,...n] )
AS
(
CTE_query_definition –- Anchor member is defined.
UNION ALL
CTE_query_definition –- Recursive member is defined referencing cte_name.
)

-- Statement using the CTE
SELECT *
FROM cte_name

The anchor part of the recursive CTE defines the initial data set. This inital data set forms input for the first recursive iteration, the output of which forms the input for the next recursive iteration. Recursion continues till an empty set is returned.





Now let us create some test tables and populate them with dummy data.



Query to create our test table



CREATE TABLE RecursiveCTETable
(UserID INT PRIMARY KEY,
ManagerID INT)



now lets create an index on ManagerID column to make things faster for retrieval



CREATE INDEX IDX_ManagerID ON RecursiveCTETable(ManagerID)



Here is an easy way of populating our test data






;WITH Populator AS(
select ROW_NUMBER() over(order by c1.object_id) RowNum from sys.columns C1 cross apply sys.columns C2
)

INSERT INTO RecursiveCTETable(UserID, ManagerID)
SELECT RowNum - 1, RowNum/7
FROM Populator

UPDATE RecursiveCTETable
SET ManagerID = NULL
WHERE
UserID = 0




Let us see how things used to be done using temp tables:




CREATE TABLE #OrganizationalHierarchy(ID INT IDENTITY(1,1) PRIMARY KEY, UserID INT, ManagerID INT)
CREATE INDEX [IDX_TempOrganizationalHierarchy_UserID] ON #OrganizationalHierarchy(UserID)
CREATE INDEX [IDX_TempOrganizationalHierarchy_ManagerID] ON #OrganizationalHierarchy(ManagerID)


DECLARE @COUNT INT, @IdentNum INT
SELECT @COUNT = 0, @IdentNum = 0

INSERT INTO #OrganizationalHierarchy(UserID, ManagerID)
SELECT UserID, ManagerID
FROM RecursiveCTETable
WHERE
ManagerID IS NULL

SET @COUNT = @@ROWCOUNT

WHILE(@COUNT > 0)
BEGIN
INSERT INTO #OrganizationalHierarchy(UserID, ManagerID)
SELECT R.UserID, R.ManagerID
FROM #OrganizationalHierarchy H
INNER JOIN RecursiveCTETable R
ON H.UserID = R.ManagerID
WHERE
ID > @IdentNum

SET @COUNT = @@ROWCOUNT
SET @IdentNum = @@IDENTITY - @COUNT
END


SELECT * FROM #OrganizationalHierarchy

DROP TABLE #OrganizationalHierarchy

The above query runs with the following stats




Now lets use the Recursive CTE to solve the same problem


;WITH OrganizationalHierarchy AS(
SELECT UserID, ManagerID
FROM RecursiveCTETable
WHERE
ManagerID IS NULL
UNION ALL
SELECT R.UserID, R.ManagerID
FROM RecursiveCTETable R
INNER JOIN OrganizationalHierarchy H
ON H.UserID = R.ManagerID
)

SELECT * FROM OrganizationalHierarchy


Both the queries provide same results.

Its also evident from the stats below that Recursive CTE's not only provide a simpler code but also are efficient.




Be sure to provide right indexes for the Recursive CTE's to work properly.
Hope this article is helpful and any comments or questions are highly appreciated.

Friday, November 26, 2010

Practical use of PIVOT and UNPIVOT in an SQL Query

One of my friend's recently asked if there is a better way to write a query which required him to make a join on a single table 10 times. This gave me the opportunity to write an article which demonstrates the use of Pivot and Unpivot.

Problem :

Table1 had 11 columns, one column being the id of the row and the other 10 columns referencing Table2 (which just stored an ID and Description)

We were required to return a single row from Table1 with its ID and for each of the other 10 columns their corresponding description from Table2.

Solution :

Use UNPIVOT to convert all the columns of a single row from Table1 into multiple rows which hold each of the 10 reference columns. Then join it with Table2 to get the description. Finally PIVOT the result of the join to get back our single row with descriptions instead of IDs.



Brief description of PIVOT AND UNPIVOT


PIVOT (simplified definition)
Pivot is used when you want to convert the unique values of a column(spanning accross multiple rows) into multiple columns (column names) of a row in the output. The final output can also contain aggregations on the other columns required in the output.

UNPIVOT
Unpivot performs exactly opposite of Pivot. It converts the values of different columns in a single row to multiple rows in the output

Note:

  1. Pivot/Unpivot can only be used it the compatibility of the database is set to 90 or higher.
  2. Applying Unpivot to Pivot might not result in the same data due to the application of aggregations.

Solution with Example:

In order to demonstrate the use of Pivot and Unpivot we will be creating two tables :

First Table : ValueTable -- this table contains ID and Name

CREATE TABLE dbo.ValueTable(
ID INT IDENTITY(1,1) PRIMARY KEY
,Name VARCHAR(50) NOT NULL
)

Second Table : TableWithReferences -- contains an ID column and 5 other columns which reference the ValueTable's ID column
CREATE TABLE dbo.TableWithReferences(
ID INT IDENTITY(1,1) PRIMARY KEY
,RefCol1 INT FOREIGN KEY REFERENCES ValueTable(ID)
,RefCol2 INT FOREIGN KEY REFERENCES ValueTable(ID)
,RefCol3 INT FOREIGN KEY REFERENCES ValueTable(ID)
,RefCol4 INT FOREIGN KEY REFERENCES ValueTable(ID)
,RefCol5 INT FOREIGN KEY REFERENCES ValueTable(ID)
)


Now lets populate each table with some data to work with

INSERT INTO ValueTable(Name)
VALUES('First'), ('Second'), ('Third'), ('Fourth'), ('Fifth')

INSERT INTO TableWithReferences(RefCol1,RefCol2,RefCol3,RefCol4,RefCol5)
VALUES (1,2,3,4,5), (1,1,3,3,2)


We need to write a query which should give out the same result as the following query

DECLARE @ID AS INT
SET @ID = 2

SELECT TR.ID
,V1.Name AS RefCol1
,V2.Name AS RefCol2
,V3.Name AS RefCol3
,V4.Name AS RefCol4
,V5.Name AS RefCol5
FROM TableWithReferences TR
INNER JOIN ValueTable V1
ON TR.RefCol1 = V1.ID
INNER JOIN ValueTable V2
ON TR.RefCol2 = V2.ID
INNER JOIN ValueTable V3
ON TR.RefCol3 = V3.ID
INNER JOIN ValueTable V4
ON TR.RefCol4 = V4.ID
INNER JOIN ValueTable V5
ON TR.RefCol5 = V5.ID
WHERE
TR.ID = @ID

Output of the query above :

ID RefCol1 RefCol2 RefCol3 RefCol4 RefCol5
2 First First Third Third Second

The problem with the query above is that it requires too many joins. Lets take a look at how we can solve this problem by using Pivot and Unpivot

Now lets cook up some queries to solve this problem
Lets start by writing a query to display the Column name and its reference value in each row.


DECLARE @ID AS INT
SET @ID = 2

SELECT ValueTableID
,ReferenceColumn
FROM (
SELECT RefCol1
,RefCol2
,RefCol3
,RefCol4
,RefCol5
FROM TableWithReferences
WHERE
ID = @ID
)P
UNPIVOT(
ValueTableID
FOR ReferenceColumn IN(RefCol1, RefCol2, RefCol3, RefCol4, RefCol5)
) AS Unpvt

the result of this query is :

ValueTableID ReferenceColumn
------------ -----------------
1 RefCol1
1 RefCol2
3 RefCol3
3 RefCol4
2 RefCol5


Now joining the previous query with Value table will give us the Names instead of IDs

DECLARE @ID AS INT
SET @ID = 2

SELECT ReferenceColumn, Name
FROM
(
SELECT ValueTableID
,ReferenceColumn
FROM (
SELECT RefCol1
,RefCol2
,RefCol3
,RefCol4
,RefCol5
FROM TableWithReferences
WHERE
ID = @ID
)P
UNPIVOT(
ValueTableID
FOR ReferenceColumn IN(RefCol1, RefCol2, RefCol3, RefCol4, RefCol5)
) AS Unpvt
) P
INNER JOIN ValueTable V
ON P.ValueTableID = V.ID

query's output:

ReferenceColumn Name
RefCol1 First
RefCol2 First
RefCol3 Third
RefCol4 Third
RefCol5 Second

Now that we have got the Names from IDs, its time to unpivot the result to get our single row

DECLARE @ID AS INT
SET @ID = 2
SELECT @ID as ID, RefCol1, RefCol2, RefCol3, RefCol4, RefCol5
FROM
(
SELECT ReferenceColumn, Name
FROM
(
SELECT ValueTableID
,ReferenceColumn
FROM (
SELECT RefCol1
,RefCol2
,RefCol3
,RefCol4
,RefCol5
FROM TableWithReferences
WHERE
ID = @ID
)P
UNPIVOT(
ValueTableID
FOR ReferenceColumn IN(RefCol1, RefCol2, RefCol3, RefCol4, RefCol5)
) AS Unpvt
) P
INNER JOIN ValueTable V
ON P.ValueTableID = V.ID
) pv
PIVOT
(MAX(Name) FOR ReferenceColumn IN ([RefCol1], [RefCol2], [RefCol3], [RefCol4], [RefCol5])) P

Output of this query :

ID RefCol1 RefCol2 RefCol3 RefCol4 RefCol5
2 First First Third Third Second


This output is same as the output of the query with lots of joins.

Hope this post was informative and helpful.
Any suggestions or comments are welcome.