Previous to SQL Server 2005, SQL Server supported the concept of computed columns. A computed column is simply a column in a table that is computed from an expression that can include other columns in the table, constants, functions, variables, and most any combination of operators. What many DBAs may not realize is that computed columns are virtual, not physical. In other words, the value for the computed column must be recalculated every time when it is referenced in a query. And in many cases, this is a good thing. Why store a lot of data that either may change a lot, or may not even be used. Isn’t it better to calculate the computed column only when it is needed? In some cases, this is true. But in other cases, might it not be an advantage to actually persist the computed column physically?
One reason why it might be handy to have physically persisted computed columns is that it is sometimes more efficient, resource-wise, to perform the computation once than it is perform it over and over again. Some applications may lend themselves to this type of usage. Another reason why you might want to have a physically persisted computed column is to be able to add an index to it. For example, if you need to query a computed column often, and the computed column is virtual, then SQL Server will have to calculate every computed column in every row, and then perform a scan on all of the results to identify which rows to return. But if there was an index on the computed column, all SQL Server would have to do is to perform a quick index lookup, saving a lot of computing resources.
SQL Server 2005 introduces persisted computed columns, and the ability to index the computed column for those cases where this feature makes sense. So this means that every time any factor that affects the computation of a column is made, such as changing the value of a column that is part of the computation of the computed column, that change is physically made in the database, and if appropriate, the related index is updated. This, as you can imagine, induces some overhead in the database if the table with the persisted computed column is subject to many INSERTs, UPDATEs, or DELETEs. As a DBA, it is your job to weigh the pros and cons of using persisted (and possibly indexed) columns and to determine whether using them or not is beneficial to the overall performance of your application.
To create a persisted column in SQL Server 2005, you use the PERSISTED keyword in either the CREATE TABLE or ALTER TABLE statements. For example:
CREATE TABLE [dbo].[Invoice_Table](
[Invoice#] [int] IDENTITY(1,1) NOT NULL,
[Invoice_Amount] [money] NOT NULL,
[Invoice_Freight] [money] NOT NULL,
[Invoice_Total] AS ([Invoice_Amount]+[Invoice_Freight]) PERSISTED
) ON [PRIMARY]
Now, if I want to add an index to the PERSISTED Invoice_Total column, I do so like this:
CREATE NONCLUSTERED INDEX [Invoice_Total_Idx] ON [dbo].[Invoice_Table]
(
[Invoice_Total] ASC
) ON [PRIMARY]
You can define indexes on computed columns as long as the following requirements are met:
- Ownership requirements: Computed column must have same owner as the table.
- Determinism requirements: Computed column must always return the same value for the same inputs.
- Precision requirements: Computed column expression must be precise.
- Data type requirements: Computed column must not evaluate to text, ntext, image, float, real.
- SET option requirements: The NUMERIC_ROUNDABORT option must be set to OFF, and the following database options must be set to ON: ANSI_NULLS, ANSI_PADDING, ANSI_WARNINGS, ARITHABORT, CONCAT_NULL_YIELDS_NULL, QUOTED_IDENTIFIER. By default, the ARITHABORT database option is set to OFF, so you must turn it on if you want to create persisted computed columns in a particular database.
If you are aware of any of your databases where computed columns are being queried a lot, and you are running SQL Server 2005, you should seriously evaluate whether or not making the computed column persisted, and indexing the column, will benefit the overall performance of your application. If you try this, be sure you first benchmark query performance before and after the change to see if the change helps more than it hurts performance.
]]>