SQL Server T-SQL Tuning -TVF and Scalar Functions

In the first of this series we looked at TSQL tuning for temp tables, table variables and the Union command , we now turn our attention to User Defined Functions – either Table-Valued Functions or Scalar Functions.

A UDF is very convenient for centralising business logic as we can specify a set of business logic in one UDF which references multiple stored procedures and ad-hoc queries. However, they can lead to significant performance degradation due to their demands on the CPU


Table-Valued Functions (TVF)

As a general concept, when using a TVF an inner join with an object without sufficient indexing will result in a TVF scan which is very similar to an Index Scan or a Table Scan.

As an alternative we could create temp tables with the appropriate clustered indexes or non-clustered indexes as below:

  • Create the appropriate temp tables.
  • Create sufficient clustered and non clustered according to your T-SQL Query.
  • Insert all data coming from the TVF into temp tables.
  • Replace each TVF by this temp table and all relevant columns as well.
  • Delete the temp tables at the end of the query.

Note the performance advantage of temp tables over table variables in this scenario since (as noted in the previous article) table variables do not support indexes.

Examples : :

Create a TVF :

use [Workshops]
go
alter FUNCTION Dep_Salaries1
(
@empid int
)
RETURNS @table table
(
Department int,
Salary_Max int,
Salary_Min int
)
AS
BEGIN
declare @Department int = (select S.deptid from Employees s where s.empid=@empid)
insert into @table
SELECT S.deptid , max (Salary) , MIN(Salary)  FROM Employees s inner join Departments T  ON S.deptid =T.deptid group by S.deptid having S.deptid =@Department
RETURN
END
GO

Poor T-SQL Practice1 using a TVF :

alter procedure Unperformant_SP1
@empid  int
as
begin
select T.deptid as department_name , s.* from Dep_Salaries1 (@empid )S inner join Departments T  ON S.Department =T.deptid
end

Replacing TVF using Temp tables :

go
alter procedure Performant_SP1
@empid  int
as
begin
create table #table
(
Department int,
Salary_Max int,
Salary_Min int
)
create clustered index #table_index1 on #table (Department)
insert into #table select * from Dep_Salaries1 (@empid )
select T.deptid as department_name , s.* from #table S inner join Departments T  ON S.Department =T.deptid
end

You should still performance test both approaches with your specific queries and data to determine which the best for your case.

Scalar Functions Within Selected Columns

Scalar functions are very convenient for determining aggregative values ,cumulative values and differentiated values within stored procedures or ad-hoc queries, but they come with a performance hit especially when used with large volumes of data since a scalar function will be executed for each record.


Alternatives to Scalar Functions

Temp Tables

Temp tables can be used, although the situation is a little different from the TVF scenario as we are looking to dispense entirely with the scalar function and instead use their internal T-SQL code directly.


Persisted deterministic computed columns

Persisted deterministic computed column values are not recomputed every time they are selected but are computed one time when they are created. Thus they can add a distinct performance advantage for T-SQL queries as they reduce the processing overhead. This feature can be added by following the below steps

  1. Adding a new computed column to store the results of scalar function.
  2. Enabling the persisted feature of this computed column.
  3. Place the appropriate indexes on the column either as key column or include column.

But bear in your mind that persisted feature has some restrictive limitations such as:

  • Computed columns should not to use any aggregate functions of other records.
  • Computed columns should not use functions that call external system procedures.
  • Computed columns should not use any functions of other fields of other tables.
  • It is generally better to use System provided functions such as Convert , Cast ,Replace etc and not UDFs created by developers since UDFs often contradict the deterministic feature.

This only applies to the persisted feature , but to able to add an index on these computed columns , they should be also Precise columns by making sure they are calculated of precise data types like Int ,Bigint and Datetime and not imprecise columns such as decimal types. If the datatypes are not precise ,you can just add these columns to the ‘include columns’ part of the index and not ‘keys column’ part.

Use Scheduled Update Jobs

If it is not possible to use persisted deterministic computed columns, you can create normal columns and create scheduled update jobs on them to update them with the output of scalar functions and then use these columns in your T-SQL query instead of a scalar function as below:

Create a Scalar Function :

use [Workshops]
go
create FUNCTION Salary_Tax
(
@empid int
)
RETURNS float
AS
BEGIN
declare @salary int = (select (S.salary-100) from Employees s where s.empid=@empid)
RETURN @salary
END
GO

--Poor T-SQL Practice using the Scalar function
Select empid ,dbo.Salary_Tax (empid) as 'SalaryWithTax' from Employees

Replace a Scalar function with a Temp table :

Create Table #temp (Empid int primary key clustered , Salary_Tax float)
Create nonclustered index #temp_Index1 on #temp (Empid ) include (Salary_Tax )
insert into #temp select Empid ,(Salary-100) as salary_Tax from Employees
select * from #temp

Replace a Scalar function using persisted deterministic computed columns :

ALTER TABLE dbo.Employees ADD Salary_Tax  AS Salary-100 PERSISTED
Create nonclustered index Employees_Index1 on Employees (Empid, Salary_Tax )
select empid ,Salary_Tax from Employees

Replace a Scalar function using Scheduled jobs :

ALTER TABLE dbo.Employees ADD Salary_Tax1 float, update_flag bit
ALTER TABLE dbo.Employees ADD CONSTRAINT DF_Employees_update_flag DEFAULT 0 FOR update_flag

Schedule the below DML update by an appropriate frequency according to your workload

Update Employees set Salary_Tax1=Salary-100 WHERE UPDATE_Flag=0

Then you can include the below select query within your stored procedure.

select empid , Salary_Tax1 from Employees

In the next article of this series we will look at optimizing Not In and Not exists in T-SQL Queries



Related Articles :

26 Responses to “SQL Server T-SQL Tuning -TVF and Scalar Functions”

  1. I really enjoy reading this :-)

  2. Hey Shehap, great information in your article…
    I really appreciate, its very usefull…

  3. Its very helpfull !

  4. Good informative article!!

  5. What? No discussion of inline TVFs?

  6. I would like to know what is the best way to learn T-SQL. Can you please recommend something. Thank you for writing your blog.

  7. Very helpful and useful, specially I used some of this techniques ,but without begin aware of the explanation.

  8. –would this work?

    use [Workshops]
    go
    create FUNCTION Dep_Salaries1_ITVF
    (@empid int)
    RETURNS table
    return

    with dep as
    (select
    S.deptid
    from
    Employees s
    where
    s.empid=@empid)
    select
    max(s.Salary) min_salary,
    min(s.Salary) max_salary
    FROM
    Employees s
    inner join Departments T ON S.deptid =T.deptid
    inner join dep on dep.deptid = s.deptId;

    go;

    alter procedure Unperformant_SP1
    @empid int
    as
    begin
    select
    T.deptid as department_name,
    s.*
    from
    Departments T
    outer apply dbo.Dep_Salaries1_ITVF(@empid) s
    end

    go;

    • Thx for your comments …You are definitely right if your case is simple and have just single select statement not multiple statements and complex code logics coz ITFV (Inline TVF) actually represents a parameterized view , thereby my blog here was directed to address such generic MTVF (Multiline TVF) performance issues using the mentioned T-SQL practices which really worked perfectly with me for numerous cases on HUGE TB DBs …You can try them also and let me know your input

      Perhaps I might included simple examples that can be covered easily by ITVF but generally speaking I tried to show out generic T-SQL practices to address any performance issue of TVFs

      In addition, I consider largely DB stress powers so using Temp table might be sometimes much powerful if replacing MTVF or ITVF if they are used within joins with other tables but you need to check that along with your T-SQL queries and I recommend heavily for your to rely any DB stress tool like SQLQueryStress

      Once again …Thx for your follow and comments

  9. There are two major important items missing in this episode.

    1. Using iTVFs (inline versions of table valued functions, also known as parameterised views)

    2. On regular TVFs, index the result table properly with not just a primary key, but if needed also a unique key on other fields, if needed with an identity column added as last in the index for making it unique.

    3. Add WITH SCHEMABINDING to the function definition. I don;t remember what it was for sure, but it certaily eliminates some steps in queryplans (spools maybe?).

    A short note on iTVF, you just return the type TABLE, and then jist as with a view, make the body up with a single select statement (which now can use the parameters given to the iTVF, contrarary to a regular view). Finally you wrap the whole select in a as in:

    create function Dep_Salaries1( @empid as int ) with schemabinding
    returns table
    as
    return
    (
    select
    Department = S.deptid
    , Salary_Max = max( S.Salary )
    , Salary_Min = min( S.Salary )
    from
    Employees as E
    inner join Departments as T on T.deptid = E.deptid /* Is this line even needed in this model? */
    inner join Employees as S on S.deptid = E.deptid
    where
    E.empid = @empid
    group by
    S.deptid
    )
    ;
    )

    As a final general comment to the examples seen in the article. Please always prefix attributes with a table alias (so no S.* and such). Now as a reader I have to guess or assume where Salary is stored. Assuming this is an attribute of Eployees, the whole join to Departments is not needed in the first example!!!

    It also is an odd function feeding an employee ID and returning aggregates over all emplyees in the same department.

  10. Shehap El-Nagar wrote:
    “You can Learn more about T-SQL Programming at http://www.tsql.info/..

    Kindly let me know if any further help is needed

    Good start but needs more “how and why”. For example, the trigger section just mentions syntax. Nothing about the logical tables formed, etc.

  11. There’s a misconception here. Table variables do, in fact, support indexes. The only trouble is that they must be formed as constraints during table creation time which also makes them a bit limited but you can still form a PK and you can still form unique indexes through the use of unique constraits.

    I also would have liked to see some info on using iTVFs (especially as a replacement for scalar functions) but it’s good to see someone think out of the box with Temp Tables and Divide’n'Conquer methods instead of necessarily trying to to everything using an “All-in-One” query.

    Here’s an article on replacing scalar functions with iTVFs.
    http://www.sqlservercentral.com/articles/T-SQL/91724/

    • Thx for your kind words and notes …That is right , you can create clustered index implicitly on table variable by defining a primary key on some column like DECLARE @TABLE TABLE (name INT primary key) and that is the only way to do that and no other way to create clustered or nonclustered indexes explicitly which actually restrict the chances of having indexes on a table variable more than temp tables and that is exactly what I meant for that regard

      Once again thanks for your comments

      • Actually, you can create unique indexes on table variables by assigning unique constraints to them. SQL Server uses an index to enforce uniqueness.

        Try this in a test environment:

        USE tempdb;
        GO
        CREATE TABLE #T (ObjectID INT, IndexID INT);

        INSERT INTO #T (ObjectID, IndexID)
        SELECT OBJECT_ID, index_id
        FROM sys.indexes;
        GO
        DECLARE @T TABLE (
        ID INT IDENTITY PRIMARY KEY,
        Col1 INT NOT NULL UNIQUE);

        SELECT sys.indexes.OBJECT_ID, sys.indexes.index_id
        FROM sys.indexes
        EXCEPT
        SELECT ObjectID, IndexID
        FROM #T;
        GO
        DECLARE @T TABLE (
        ID INT IDENTITY PRIMARY KEY,
        Col1 INT NOT NULL);

        SELECT sys.indexes.OBJECT_ID, sys.indexes.index_id
        FROM sys.indexes
        EXCEPT
        SELECT ObjectID, IndexID
        FROM #T;
        GO
        DROP TABLE #T;

        What you’ll see is that the table variable declared with a UNIQUE constraint generates an index, which shows up in sys.indexes. You’ll see the PK clustered index in there too.

        • That is right , you can create clustered index implicitly on table variable by defining a primary key on some column like DECLARE @TABLE TABLE (name INT primary key) and that is the only way to do that and no other way to create clustered or nonclustered indexes explicitly which actually restrict the chances of having indexes on a table variable and that is exactly what I meant for that regard

  12. I really have to take exception to most of what you have written.

    First, I would rewrite the multistatement TVF as an inline TVF. Multistatement TVF can be a performance problem if used incorrectly.

    Second, your example is really to simplistic, the query itself can easily be tuned just by doign the following:

    select
    t.deptid as department_name,
    max(s.Salary) as MaxDeptSalary,
    min(s.Salary) as MinDeptSalary
    from
    dbo.Departments t
    inner join dbo.Employees s
    on (t.deptid = s.deptid)
    group by
    t.deptid
    having
    t.deptid = (select deptid from Employees where empid = @empid);

    As for scalar UDFs, if they are being used in a SELECT column list, I would rewrite them as inline TVFs that return a single row and use them in a cross apply or outer apply in the FROM clause depending on how they are written as UDFs (i.e., is it possible for the UDF to return a null value).

  13. Only one issue: You can use clustered index on a table variable.

    Outside of that your article is very helpful.

    • Thx for your kind words and notes …That is right , you can create clustered index implicitly on table variable by defining a primary key on some column like DECLARE @TABLE TABLE (name INT primary key) and that is the only way to do that and no other way to create clustered or nonclustered indexes explicitly which actually restrict the chances of having indexes on a table variable and that is exactly what I meant for that regard

      Once again thanks for your comments

  14. Interesting and very helpful article. Thanks for writing this !!!

  15. Your blogs help me a lot … really thankful to you …

  16. Thanks Shehap for the Blog, very helpful !

  17. Its very helpful !Thank you for writing this!!

  18. Very helpful and valuable

  19. I really appreciate, its very usefull

Software Reviews | Book Reviews | FAQs | Tips | Articles | Performance Tuning | Audit | BI | Clustering | Developer | Reporting | DBA | ASP.NET Ado | Views tips | | Developer FAQs | Replication Tips | OS Tips | Misc Tips | Index Tuning Tips | Hints Tips | High Availability Tips | Hardware Tips | ETL Tips | Components Tips | Configuration Tips | App Dev Tips | OLAP Tips | Admin Tips | Software Reviews | Error | Clustering FAQs | Performance Tuning FAQs | DBA FAQs |