Common Table Expressions (CTE) for SQL Server Developers

What is A CTE?

A CTE is a temporary view that exists only in the scope of running query. A CTE is similar to a view but a CTE is embedded into the query. CTEs can be used in Stored Procedures, User Define Functions (UDFs), Triggers and Views but not in Indexed Views.

Below is the syntax for a CTE :

In the database, you will have a physical table called Sales.Customer and from the above query you are taking that table into a CTE called AllmyCustomers which will be referred in a future query.

Difference between a CTE and Variable Table

Obviously, a CTE is different from a temporary table sice temporary tables save data to the tempdb but the CTE will be kept in the memory. So are is the differences between CTEs and table variables?

  • A table variable’s scope is for the batch, while a CTE’s scope is only for the query.
  • To populate a table variable, you need to execute insert scripts, but with CTEs the data will be populated at time of defining the CTE.

  • If you want to use recursion in table variables, you need to write your own queries but with a CTE it is much easier to write recursive queries.

  • If you are using larger data sets in table variables it will use tempdb to store data physically. However, a CTE always uses memory. This can be an issue in electing for CTEs as you need to ensure that you are not using larger data sets that will drain memory. The following graph shows tempdb write transactions/sec for temp tables, table variables and CTE scenarios.
    You can simply see that a CTE does not write anything to the tempdb, hence it has a higher performance compared with the other two options.

Usages for CTE

There are a few scenarios where using CTEs is optimal:

Recursion

Recursion is typically a difficult development task. Let us look atan example, say we have following data set:

IF EXISTS (SELECT 1 FROM sys.tables WHERE Name = 'Employees')
DROP TABLE Employees
CREATE TABLE Employees
(EmployeeID INT PRIMARY KEY,
Name VARCHAR(50), ReportsTo INT)

The important point to remember is ReportsTo is foreign key constraint to the EmployeeID column or as some of you call it, this table has a self-relationship between two columns of the same data.

We will now populate this with the following sample data:

INSERT INTO Employees
VALUES
(1,'Richard',NULL),
(2,'Stephan',1),
(3,'Clemens',2),
(4,'Simmon',2),
(5,'Anderson',4),
(6,'Cris',1),
(7,'Andrew',5),
(8,'Peter',6)

If you visualize your data, it will looks as below.

Richard is the boss of the company while Stephan and Cris report directly to Richard.

Let us say you want to find out all the employees that come under Stephan who has an employeeid of ‘2’.

WITH EmployeeRecursion(Name,EmployeeID,ReportsTo)
AS
(
 SELECT Name,EmployeeID,ReportsTo
 From Employees WHERE EmployeeID = 2 -- @EmpID
 UNION ALL
 SELECT e.Name,e.EmployeeID,e.ReportsTo
 From Employees e INNER JOIN EmployeeRecursion r
 ON e.ReportsTo = r.EmployeeID
 )
 SELECT R.EmployeeID,R.Name,E.Name Boss
 FROM EmployeeRecursion  R
 INNER JOIN Employees E ON R.ReportsTo = E.EmployeeID

This results in the below output :

Below is the query plan for the above CTE which shows one index scan and two index seeks. This is after adding a non-clustered index to the ReportsTo column

In recursive queries, you can set number of levels you need to stop recursion by applying the MAXRECURSION option.

--Max Recursion
WITH EmployeeRecursion(Name,EmployeeID,ReportsTo)
AS
(
SELECT Name,EmployeeID,ReportsTo
From Employees WHERE EmployeeID = 2
UNION ALL
SELECT e.Name,e.EmployeeID,e.ReportsTo
From Employees e INNER JOIN EmployeeRecursion r
ON e.ReportsTo = r.EmployeeID
)
 SELECT R.EmployeeID,R.Name,E.Name Boss
 FROM EmployeeRecursion  R
 INNER JOIN Employees E ON R.ReportsTo = E.EmployeeID
OPTION (MAXRECURSION 2)

Though this returns the correct values indented, error of level 16 will be generated triggering an exception.

Msg 530, Level 16, State 1, Line 2
The statement terminated. The maximum recursion 2 has been exhausted
before statement completion.

This will prevent users to using the MAXRECUSION option. This issue exists in all SQL Server versions up to SQL Server 2012. You can overcome this by, introducing a level column to the recursion as shown below.

WITH EmployeeRecursion(Name,EmployeeID,ReportsTo,Level)
AS
(
SELECT Name,EmployeeID,ReportsTo,0 Level
 From Employees WHERE EmployeeID = 2
 UNION ALL
 SELECT e.Name,e.EmployeeID,e.ReportsTo, Level + 1
 From Employees e INNER JOIN EmployeeRecursion r
 ON e.ReportsTo = r.EmployeeID
 WHERE Level <2
 )
 SELECT R.EmployeeID,R.Name,E.Name Boss
 FROM EmployeeRecursion  R
 INNER JOIN Employees E ON R.ReportsTo = E.EmployeeID

Finding Duplicates

CTEs can also assist in finding duplicates in our data by improving both performance and readability. Lets through this by an example. First we will create the below table with data

 --Finding Duplicates
IF EXISTS (SELECT 1 FROM sys.tables WHERE Name = 'Products')
DROP TABLE Products
CREATE TABLE Products
(ProductID INT PRIMARY KEY CLUSTERED,
ProductName VARCHAR(50),
Price NUMERIC(9,2)
)
INSERT INTO Products
VALUES
(1,'Mouse',20.15),
(2,'KeyBoard',15.75),
(3,'Monitor',214),
(4,'UPS',88),
(5,'Mouse',20.14),
(6,'Monitor',251),
(7,'Mouse Pad',5),
(8,'Monitor',265)

Now, you are given the task of finding duplicates:

Below is the traditional way of doing it.

SELECT * FROM Products
WHERE ProductID NOT IN
(
SELECT MIN(A.ProductID) FROM Products A
WHERE  Products.ProductName = A.ProductName
);

Using CTEs will eliminate the subquery and improve the performance:

WITH DuplicateProducts
AS
(
SELECT MIN(ProductID) AS ProductID , ProductName
FROM Products
GROUP BY ProductName
HAVING COUNT(1) > 1
)
SELECT P.ProductID,P.ProductName
FROM Products P INNER JOIN DuplicateProducts cte
ON P.ProductName = cte.ProductName
AND P.ProductID >cte.ProductID;

Leave a comment

Your email address will not be published.