Tuesday, December 21, 2010

Navigating through hierarchy using Recursive Common Table Expressions

Its quite often that we find the need to traverse up or down the hierarchy of a tree.
For example :
1. to find everyone working in an organization ( Manager -> Sub Manager -> Employee)
2. to list the products, the components which make up the product right down to the individual parts that make the components.

To accomplish these things, we had to use temporary tables, cursors or loops and some terminating logic. The introduction of recursive CTEs makes it much more efficient and easy.

Before we begin with the usage of CTEs, lets have a brief introduction.
Common Table Expression:
A common table expression (CTE) can be thought of as a temporary result set that is defined within the execution scope of a single SELECT, INSERT, UPDATE, DELETE, or CREATE VIEW statement. A CTE is similar to a derived table in that it is not stored as an object and lasts only for the duration of the query. Unlike a derived table, a CTE can be self-referencing and can be referenced multiple times in the same query.

Basic Structure of a Recursive CTE :

WITH cte_name ( column_name [,...n] )
AS
(
CTE_query_definition –- Anchor member is defined.
UNION ALL
CTE_query_definition –- Recursive member is defined referencing cte_name.
)

-- Statement using the CTE
SELECT *
FROM cte_name

The anchor part of the recursive CTE defines the initial data set. This inital data set forms input for the first recursive iteration, the output of which forms the input for the next recursive iteration. Recursion continues till an empty set is returned.





Now let us create some test tables and populate them with dummy data.



Query to create our test table



CREATE TABLE RecursiveCTETable
(UserID INT PRIMARY KEY,
ManagerID INT)



now lets create an index on ManagerID column to make things faster for retrieval



CREATE INDEX IDX_ManagerID ON RecursiveCTETable(ManagerID)



Here is an easy way of populating our test data






;WITH Populator AS(
select ROW_NUMBER() over(order by c1.object_id) RowNum from sys.columns C1 cross apply sys.columns C2
)

INSERT INTO RecursiveCTETable(UserID, ManagerID)
SELECT RowNum - 1, RowNum/7
FROM Populator

UPDATE RecursiveCTETable
SET ManagerID = NULL
WHERE
UserID = 0




Let us see how things used to be done using temp tables:




CREATE TABLE #OrganizationalHierarchy(ID INT IDENTITY(1,1) PRIMARY KEY, UserID INT, ManagerID INT)
CREATE INDEX [IDX_TempOrganizationalHierarchy_UserID] ON #OrganizationalHierarchy(UserID)
CREATE INDEX [IDX_TempOrganizationalHierarchy_ManagerID] ON #OrganizationalHierarchy(ManagerID)


DECLARE @COUNT INT, @IdentNum INT
SELECT @COUNT = 0, @IdentNum = 0

INSERT INTO #OrganizationalHierarchy(UserID, ManagerID)
SELECT UserID, ManagerID
FROM RecursiveCTETable
WHERE
ManagerID IS NULL

SET @COUNT = @@ROWCOUNT

WHILE(@COUNT > 0)
BEGIN
INSERT INTO #OrganizationalHierarchy(UserID, ManagerID)
SELECT R.UserID, R.ManagerID
FROM #OrganizationalHierarchy H
INNER JOIN RecursiveCTETable R
ON H.UserID = R.ManagerID
WHERE
ID > @IdentNum

SET @COUNT = @@ROWCOUNT
SET @IdentNum = @@IDENTITY - @COUNT
END


SELECT * FROM #OrganizationalHierarchy

DROP TABLE #OrganizationalHierarchy

The above query runs with the following stats




Now lets use the Recursive CTE to solve the same problem


;WITH OrganizationalHierarchy AS(
SELECT UserID, ManagerID
FROM RecursiveCTETable
WHERE
ManagerID IS NULL
UNION ALL
SELECT R.UserID, R.ManagerID
FROM RecursiveCTETable R
INNER JOIN OrganizationalHierarchy H
ON H.UserID = R.ManagerID
)

SELECT * FROM OrganizationalHierarchy


Both the queries provide same results.

Its also evident from the stats below that Recursive CTE's not only provide a simpler code but also are efficient.




Be sure to provide right indexes for the Recursive CTE's to work properly.
Hope this article is helpful and any comments or questions are highly appreciated.

1 comment: