Common Table Expressions (CTE) for SQL Server Developers
What is A CTE?
A CTE is a temporary view that exists only in the scope of running query. A CTE is similar to a view but a CTE is embedded into the query. CTEs can be used in Stored Procedures, User Define Functions (UDFs), Triggers and Views but not in Indexed Views.
Below is the syntax for a CTE :
In the database, you will have a physical table called Sales.Customer and from the above query you are taking that table into a CTE called AllmyCustomers which will be referred in a future query.
Difference between a CTE and Variable Table
Obviously, a CTE is different from a temporary table sice temporary tables save data to the tempdb but the CTE will be kept in the memory. So are is the differences between CTEs and table variables?
- A table variable’s scope is for the batch, while a CTE’s scope is only for the query.
- To populate a table variable, you need to execute insert scripts, but with CTEs the data will be populated at time of defining the CTE.
- If you want to use recursion in table variables, you need to write your own queries but with a CTE it is much easier to write recursive queries.
If you are using larger data sets in table variables it will use
tempdb to store data physically. However, a CTE always uses memory. This
can be an issue in electing for CTEs as you need to ensure that you are not using
larger data sets that will drain memory. The following graph shows tempdb write transactions/sec
for temp tables, table variables and CTE scenarios.
You can simply see that a CTE does not write anything to the tempdb, hence it has a higher performance compared with the other two options.
Usages for CTE
There are a few scenarios where using CTEs is optimal:
Recursion is typically a difficult development task. Let us look atan example, say we have following data set:
IF EXISTS (SELECT 1 FROM sys.tables WHERE Name = 'Employees') DROP TABLE Employees CREATE TABLE Employees (EmployeeID INT PRIMARY KEY, Name VARCHAR(50), ReportsTo INT)
The important point to remember is ReportsTo is foreign key constraint to the EmployeeID column or as some of you call it, this table has a self-relationship between two columns of the same data.
We will now populate this with the following sample data:
INSERT INTO Employees VALUES (1,'Richard',NULL), (2,'Stephan',1), (3,'Clemens',2), (4,'Simmon',2), (5,'Anderson',4), (6,'Cris',1), (7,'Andrew',5), (8,'Peter',6)
If you visualize your data, it will looks as below.
Richard is the boss of the company while Stephan and Cris report directly to Richard.
Let us say you want to find out all the employees that come under Stephan who has an employeeid of ‘2’.
WITH EmployeeRecursion(Name,EmployeeID,ReportsTo) AS ( SELECT Name,EmployeeID,ReportsTo From Employees WHERE EmployeeID = 2 -- @EmpID UNION ALL SELECT e.Name,e.EmployeeID,e.ReportsTo From Employees e INNER JOIN EmployeeRecursion r ON e.ReportsTo = r.EmployeeID ) SELECT R.EmployeeID,R.Name,E.Name Boss FROM EmployeeRecursion R INNER JOIN Employees E ON R.ReportsTo = E.EmployeeID
This results in the below output :
Below is the query plan for the above CTE which shows one index scan and two index seeks. This is after adding a non-clustered index to the ReportsTo column
In recursive queries, you can set number of levels you need to stop recursion by applying the MAXRECURSION option.
--Max Recursion WITH EmployeeRecursion(Name,EmployeeID,ReportsTo) AS ( SELECT Name,EmployeeID,ReportsTo From Employees WHERE EmployeeID = 2 UNION ALL SELECT e.Name,e.EmployeeID,e.ReportsTo From Employees e INNER JOIN EmployeeRecursion r ON e.ReportsTo = r.EmployeeID ) SELECT R.EmployeeID,R.Name,E.Name Boss FROM EmployeeRecursion R INNER JOIN Employees E ON R.ReportsTo = E.EmployeeID OPTION (MAXRECURSION 2)
Though this returns the correct values indented, error of level 16 will be generated triggering an exception.
Msg 530, Level 16, State 1, Line 2 The statement terminated. The maximum recursion 2 has been exhausted before statement completion.
This will prevent users to using the MAXRECUSION option. This issue exists in all SQL Server versions up to SQL Server 2012. You can overcome this by, introducing a level column to the recursion as shown below.
WITH EmployeeRecursion(Name,EmployeeID,ReportsTo,Level) AS ( SELECT Name,EmployeeID,ReportsTo,0 Level From Employees WHERE EmployeeID = 2 UNION ALL SELECT e.Name,e.EmployeeID,e.ReportsTo, Level + 1 From Employees e INNER JOIN EmployeeRecursion r ON e.ReportsTo = r.EmployeeID WHERE Level <2 ) SELECT R.EmployeeID,R.Name,E.Name Boss FROM EmployeeRecursion R INNER JOIN Employees E ON R.ReportsTo = E.EmployeeID
CTEs can also assist in finding duplicates in our data by improving both performance and readability. Lets through this by an example. First we will create the below table with data
--Finding Duplicates IF EXISTS (SELECT 1 FROM sys.tables WHERE Name = 'Products') DROP TABLE Products CREATE TABLE Products (ProductID INT PRIMARY KEY CLUSTERED, ProductName VARCHAR(50), Price NUMERIC(9,2) ) INSERT INTO Products VALUES (1,'Mouse',20.15), (2,'KeyBoard',15.75), (3,'Monitor',214), (4,'UPS',88), (5,'Mouse',20.14), (6,'Monitor',251), (7,'Mouse Pad',5), (8,'Monitor',265)
Now, you are given the task of finding duplicates:
Below is the traditional way of doing it.
SELECT * FROM Products WHERE ProductID NOT IN ( SELECT MIN(A.ProductID) FROM Products A WHERE Products.ProductName = A.ProductName );
Using CTEs will eliminate the subquery and improve the performance:
WITH DuplicateProducts AS ( SELECT MIN(ProductID) AS ProductID , ProductName FROM Products GROUP BY ProductName HAVING COUNT(1) > 1 ) SELECT P.ProductID,P.ProductName FROM Products P INNER JOIN DuplicateProducts cte ON P.ProductName = cte.ProductName AND P.ProductID >cte.ProductID;