New T-SQL Features in SQL Server 2012
Paging is a big issue for developers as it is required for many applications but entails many performance problems. Developers have used different workarounds to support paging.
For example, assume we need to display several pages of the HumanResource.Employee object with 5 rows per page. Below is the query you need to execute this in SQL Server 2012.
DECLARE @page_index SMALLINT , @page_Size SMALLINT = 5, @offset INT SET @page_index = 2 SET @offset = ( (@page_index - 1) * @page_Size) SELECT BusinessEntityID, NationalIDNumber, JobTitle FROM HumanResources.Employee ORDER BY BusinessEntityID OFFSET @offset ROWS FETCH NEXT @page_Size ROWS ONLY
In this query you will note that there are two additional syntaxes added to the well-known SELECT query.
OFFSET <offset> ROWS
The number of rows to exclude from the top.
FETCH NEXT <page_size> ROWS ONLY
The number of rows to display.
So in the results you will only receive the data you need.
You must remember that ORDER BY clause is a must, you can have multiple columns for the ORDER BY clause.
SELECT SOH.SalesOrderID ,SOD.SalesOrderDetailID ,P.Name As ProductNam FROM Sales.SalesOrderDetail SOD INNER JOIN Sales.SalesOrderHeader SOH ON SOD.SalesOrderID = SOH.SalesOrderID INNER JOIN Production.Product P ON P.ProductID = SOD.ProductID ORDER BY SOH.SalesOrderID,SOD.SalesOrderDetailID DESC OFFSET 100 ROWS FETCH NEXT 25 ROWS ONLY
The query plan for this will appear as below:
As you can see, the Top operator will be introduced at an almost zero percent cost.
Lag & Lead
This is another T-SQL feature added to SQL Server 2012 addresses an issue in reporting.
Let me describe this feature with a real world example.
Below are the sales amounts for one month.
The above data was generated from the FactInternetSales table in the AdventureWorksDWDenali database. You can find the same script in the 5528EN _02_03.sql file in the code folder.
For reports you need this month’s sales and last month’s sales along with the difference between the two months.
If you were asked to write a query for this, you query would look like:
SELECT t1.Year, t1.Month, t1.SalesAmount as SalesAmountThisMonth, t2.SalesAmount as SalesAmountLastMonth, t1.SalesAmount - t2.SalesAmount SalesDifferent FROM MonthlySales as t1 LEFT OUTER JOIN MonthlySales as t2 ON (t1.Year = t2.Year) AND (t1.Month = t2.Month+1) ORDER BY Year, Month
You need to use LEFT OUTER JOIN as INNER JOIN will not give any rows for the very first month in the data set.
Obviously, there will be a either an index scan or a table scan (depending whether you have an index or not) twice for the table which is costly.
With SQL Server 2012, the new T-SQL function LAG is introduced just to address the above scenario:
SELECT Year, Month, SalesAmount as SalesAmountThisMonth, LAG(SalesAmount, 1, NULL) OVER(ORDER BY Year, Month) as SalesAmountLastMonth, SalesAmount - LAG(SalesAmount, 1, 0) OVER(ORDER BY Year, Month) SalesDifferent FROM MonthlySales
As shown in the above example, the LAG analytical function has three parameters. The first parameter is the expression which is the column in this scenario. The second parameter is the offset, since we are looking for last month this parameter is 1. If you want the last year’s comparative month’s data to be displayed then this parameter should be 12. The third parameter is the default, in case there is no match for the previous month, default value will be displayed.
So the output of the above query will be:
So how does this new syntax compare with the pre-2012 syntax? The simple way to compare them is running both queries in one batch and check the execution plan as shown in the below image. In the batch, the first query is the old style query and second query is the one using the LAG function.
Relative cost wise, the first query (old style LEFT OUTER JOIN) cost is 77% whereas the query which contains LAG cost is 23%. These queries were analyzed for Reads and duration using the SQL Profiler.
Below are the results for both queries executed after clearing the cache.
LEFT OUTER JOIN
Obviously using LAG has a clear performance advantage. However, to achieve better performance you need to cover those columns (in the above scenario, it will be Year and Month).
You can also use the LAG function for grouping. For example, if you want product model sales differenciated by name you could use the below query.
SELECT ModelName, Year, Month, SalesAmount as SalesAmountThisMonth, LAG(SalesAmount, 1, NULL) OVER(PARTITION BY ModelName ORDER BY Year, Month) as SalesAmountLastMonth, SalesAmount - LAG(SalesAmount, 1, 0) OVER(PARTITION BY ModelName ORDER BY Year, Month) SalesDifferent FROM ModelNameWiseMonthlySales
And the result is:
From the above results, you will see that the Sales Amount for the previous month is taken only for the model name.
LEAD is another new function which is similar to LAG but instead shows the future records.
SELECT Year, Month, SalesAmount as SalesAmountThisMonth, LAG(SalesAmount, 1, NULL) OVER(ORDER BY Year, Month) as SalesAmountLastMonth, LEAD(SalesAmount, 1, NULL) OVER(ORDER BY Year, Month) as SalesAmountNextMonth FROM MonthlySales
As you can see from the below screenshot, the above query will give you the previous month’s and next month’s sales values in a single query.
FIRST_VALUE & LAST_VALUE
FIRST_VALUE will return the first value while LAST_VALUE will return the last in an ordered set of values in SQL Server 2012.
Take for example, the below query.
SELECT d.LastName, d.GroupName, e.JobTitle, e.HireDate, FIRST_VALUE(e.HireDate) OVER (PARTITION BY d.GroupName ORDER BY e.JobTitle) AS FirstValue, LAST_VALUE(e.HireDate) OVER (PARTITION BY d.GroupName ORDER BY e.JobTitle) AS LastValue FROM HumanResources.Employee e INNER JOIN HumanResources.vEmployeeDepartmentHistory d ON e.BusinessEntityID = d.BusinessEntityID
This query will join the HumanResources.Employee table and the HumanResources.vEmployeeDepartmentHistory view together via BusinessEntityID. You can see that LastName, JobTitle, GroupName and Hiredate are selected from either the table or the view.
If you closely look at the LAST_VALUE and the FIRST_VALUE function, you will observe that the Partition column and the order by column (Which are HireDate and GroupName respectively) are the same.
FIRST_VALUE will give you the first value of the hiredate for each group when it orders by job title.
The results are given below: