Are UDFs Harmful to SQL Server Performance?
Well, not really. But it is vitally important to understand when to use them and when to avoid them, as well as how best to construct them so as not to shoot your foot off.
SQL Server 2000 introduced user-defined functions (UDFs), and they were immediately hailed as a great tool for encapsulating repetitive code, as well as allowing you to perform more complicated processing directly in an SQL expression. On its face, that claim is valid. You can certainly improve readability and maintainability with UDFs. But cleaner code will be cold comfort if your queries bog down and lock up your server.
As almost every article on UDFs will tell you, there are three different flavors of UDF in SQL Server 2000: scalar, tabular, and multistatement tabular. I will concentrate on the dangers of scalar UDFs, as they are the most likely to be used innocently in ways that can compromise performance.
A scalar UDF takes 0 to n parameters and returns a single scalar value. Often, this type of function is used to perform complicated mathematical transformations on input data. Other common uses include encapsulating a very complicated logical test so that a WHERE clause can be shortened and repeated uses can be consolidated into one place. There are various limitations on what types of operations you can perform in the function, but you have the ability to: call other internal functions, UDFs and external stored procedures; use the SELECT statement; and use looping and branching constructs. This is a lot of power, even restricted as it is.
Here’s an example you can try in a SQL Server 2000 database to illustrate how a scalar UDF is created and invoked:
CREATE FUNCTION fn_null(@id int) RETURNS int BEGIN RETURN @id END GO CREATE TABLE #Tmp ( id int NOT NULL PRIMARY KEY, value int NOT NULL ) GO DECLARE @i int SET @i = 1 WHILE (@i < 1000) BEGIN INSERT INTO #Tmp (id, value) VALUES (@i, @i+1000) SET @i = @i + 1 END GO SELECT COUNT(*) FROM #Tmp t1 CROSS JOIN #Tmp t2 WHERE t1.id < t2.id GO SELECT COUNT(*) FROM #Tmp t1 CROSS JOIN #Tmp t2 WHERE dbo.fn_null(t1.id) < dbo.fn_null(t2.id) GO SELECT COUNT(*) FROM #Tmp t1 CROSS JOIN #Tmp t2 WHERE t1.id = 5 GO SELECT COUNT(*) FROM #Tmp t1 CROSS JOIN #Tmp t2 WHERE t1.id = dbo.fn_null(5) GO SELECT COUNT(*) FROM #Tmp t1 CROSS JOIN #Tmp t2 WHERE dbo.fn_null(t1.id) = 5 GO
This large mass of SQL will create a simple UDF, create a temporary table, populate the table, and invoke several sample queries against it. When run with the SQL Server Profiler active, the results are interesting:
|Query||CPU (ms)||Reads||Duration (ms)|
The first query is a simple triangular join and serves as a benchmark. It took about a quarter of a second to execute, which doesn’t seem unreasonable. The second query is passed through two executions of the fn_null() UDF for every one of the million rows returned by the CROSS JOIN. This second query takes around 450 times more CPU power to execute than the first one does, for the exact same resultset. It really isn’t possible to make a UDF that is simpler than this, so it should be quite clear that UDFs that are invoked with nonconstant parameters are a potential source of trouble.
Looking at what happened in more detail: if the optimizer is not confused, you will execute a UDF that appears in the WHERE clause once for every potential record returned by the join that is not eliminated by the sargable predicates in the WHERE clause.
What this means is that if you are able to specify enough sargable predicates to substantially reduce the number of records before the UDFs are evaluated, the UDFs will degrade your performance less. If you cannot (and in the second example query there was no sargable predicate in the WHERE clause) you should be prepared for potentially extreme performance degradation.
Queries 3, 4, and 5 illustrate that the SQL Server optimizer is not completely hopeless, and that a UDF that causes problems when invoked with nonconstant parameters may perform acceptably when invoked with constant parameters. In this case, Query 3 is the benchmark. There is no UDF involved in the WHERE clause, and it performs near-instantaneously. Query 4 introduces the UDF, but with a constant parameter. Performance is the nearly the same as the version without the UDF. For comparison, Query 5 contains the same function, but the parameter is now nonconstant. As you can see, performance is significantly degraded compared to the original version.
Of course, this UDF is about as simple as it gets. If we complicate the UDF a bit such that the optimizer has a tougher time with it, we will see different results:
CREATE FUNCTION fn_null(@id int) RETURNS int BEGIN DECLARE @val int SELECT @val = COUNT(*) FROM Tmp WHERE id != @id RETURN @val + 7 END GO CREATE TABLE Tmp ( id int NOT NULL PRIMARY KEY, value int NOT NULL ) GO DECLARE @i int SET @i = 1 WHILE (@i < 1000) BEGIN INSERT INTO Tmp (id, value) VALUES (@i, @i+1000) SET @i = @i + 1 END GO SELECT COUNT(*) FROM Tmp t1 CROSS JOIN Tmp t2 WHERE t2.value != 1005 GO SELECT COUNT(*) FROM Tmp t1 CROSS JOIN Tmp t2 WHERE t2.value != dbo.fn_null(1005) GO
This example is pretty contrived, and notice that I had to make the table a first-class table rather than a temporary table. This is because one of the restrictions on UDFs is that you cannot access temporary tables. The results of this query are:
|Query||CPU (ms)||Reads||Duration (ms)|
Note that it would be trivial in the case of Query 2 to declare a variable, populate it with the results of your UDF invoked with the constant parameter, and then use the variable in the comparison. This would achieve performance equivalent to Query 1.
Even though the fn_null() function includes a SELECT statement in this case, it is possible that even a function without a SELECT will be unoptimizable and will degrade your performance in a similar manner.
SQL Server user-defined functions are powerful tools, but as with all powerful tools they can be misused. Make sure you examine the performance of your queries closely with the Profiler and the Show Execution Plan option in Query Analyzer, maintain sargability and optimizability, and you will be able to reap the benefits of UDFs without suffering the drawbacks.