Super Sizing Columns in SQL Server

The MAX Specifier increases the maximum storage capabilities of the VARCHAR and VARINARY data types up to 2^31-1 bytes and up to 2^30-1 bytes for NVARCHAR.  VARCHAR(MAX), NVARCHAR(MAX) and VARBINARY(MAX) are collectively called large-value data types.  The MAX Specifier is in effect, a very large variable length column.  Although the maximum size of the MAX specifier is approximately 2GB, the size is actually the maximum size that SQL Server supports.  This means that in future versions of the product if the maximum size supported increases, the MAX specifier will automatically be able to support the size increase without any modifications.  To use the MAX Specifier the word MAX is used in place of a size when you define a column or variable.  In the following example the LargeColumn in the LargeDataType table is created using the VARCHAR data type and the MAX Specifier as the size.

CREATE TABLE dbo.LargeDataType
 (
  LargeColumn VARCHAR(MAX)
 )

There are two ways in SQL Server that columns that are defined with the MAX Specifier may be stored.  They can be stored either in a page with the other columns in a row or alternatively off-page.  When the MAX Specifier is used for a column, SQL Server uses its own algorithms to determine whether to keep the value in line within the physical record or store the value externally to the base record and to keep track of the value by using a pointer.  SQL Server will store the data as either a VARCHAR, NVARCHAR, VARBINARY or as a Large Object (LOB).  If the length of the column is less than or equal to 8000 bytes, SQL Server will store the data in-page and where it is greater than 8000 bytes SQL Server will check the row size to determine the appropriate storage.  If the row size is less than the size of a page (8060 bytes), the data values will be stored as in-row data whereas if the row size is greater than 8060 bytes, the data values are stored as LOB data with only a 16 byte pointer stored in the row.  It is possible to override this default behaviour by using the new table option called large value type out of row so that columns defined with the MAX Specifier are always stored as a LOB.  The following example illustrates how this option can be enabled by using the system Stored Procedure sp_tabeloption.

EXEC sp_tableoption
‘dbo.LargeDataType’, ‘large value types out of row’, 1

When the option is set to 1 (enabled) the data in columns that have been defined with the MAX specifier will always be stored out of row as a LOB with only a 16-byte text pointer stored in the row.   Text pointers point to the root node of a tree built of internal pointers that map to the pages in which string fragments are actually stored.

The advantage of storing large-value data types in-row with the other columns in a table is that SQL Server can return a single row with only one I/O operation.  If the bulk of SQL Server statements do not return large-value data type columns then the data should be stored out of row.  This allows for a greater number of rows to be stored on a data page allowing a greater number of rows to be returned for each I/O operation.

Unlike TEXT and NTEXT data types that store the data off page, it does not matter where the data is stored for columns defined with the MAX Specifier.  As no matter where the data is stored the column can be treated as a standard variable length data type.  Hence, different operators do not need to be used and all of the standard operators can be used.  This means that in SQL Server, there is now a unified programming model for working with regular types and large objects.  The restrictions that previously existed for the use of TEXT and NTEXT as variables in Stored Procedures and Functions also no longer exists with large-value data types.   

Continues…

Leave a comment

Your email address will not be published.