Data Compression in SQL Server 2008
SQL Server 2008 provides two types of compression namely, Database Backup Compression and Data Compression. We have already discussed the Database Backup Compression feature in my article titled How to configure and Use Database Backup Compression in SQL Server 2008.
Data Compression was first introduced in SQL Server 2005 with Service Pack 2 where Microsoft introduced a new storage format called vardecimal to store decimal and numeric data types as variable length columns. In SQL Server 2008 this concept is applicable even to fixed length data types such as integer, char and float. The main advantage of using Data Compression is that it reduces the storage cost and increases the query performance by reducing I/O. Database Administrator needs to be very careful before using Database Compression as it is more CPU intensive activity. If you are not facing CPU crunch and you have more I/O based activities then it is a good idea to implement data compression, which can give you better query performance.
SQL Server 2008 supports Row Level and Page Level Compression for tables and indexes. Data Compression is supported for the following database objects:
- Heap Table (a table which has no clustered index or non clustered index)
- Clustered Table
- Non Clustered Table
- Indexed View
- Partitioned Tables and Indexes (compression option needs to be configured for each partition)
The important point to be noted is that the compression setting of a table is not automatically applied to its nonclustered indexes. Compression settings on each and every index need to be set separately. Using the CREATE TABLE or CREATE INDEX Data Definition Language (DDL) statement you can create compression on a table or index. In order to change the compression state of a table, index or a partition you need to use the ALTER TABLE or ALTER INDEX Data Definition Language (DDL) statements.
Row Level Compression
Row Level compression extends the vardecimal storage format by storing all fixed-length data types in a variable length storage format. When you are enabling Row Level Compression you are only changing the physical storage format of the data which is associated with a data type. There is absolutely no change required at the application level once a DBA enables Row Level Compression on one or more tables within a database.
Now let us take a small example of a table which has a CHAR (100) column. In the fixed length format, this column will take 100 characters irrespective of the actual value stored in it. For example if you are storing “Row Level Compression” or “Row Level Compression Feature” value, then each value will take 100 character space to store the data. However, when CHAR (100) is stored in a variable length format, then it will only take 21 characters for storing the first value (“Row Level Compression”) and 29 characters for storing the second value (“Row Level Compression Feature”). From the above example you can see that you were able to reduce the storage size of the first value by 79% and for the second value by 71%. This is a significant amount of valuable space which you have managed to save using variable length storage format. Another advantage of using Row Level Compression feature is that it does not take any disk space for zero or null values.
Page Level Compression
In the earlier versions of SQL Server every value will be stored in Page regardless whether it has already appeared in the same column for some other row within a Page. In SQL Server 2008 the redundant or duplicate value will be stored only once within the page and the value will be referred in all other occurrences, this is called Page Level Compression. Page Level Compression is a basically a superset of Row Level Compression and it will only occur when the Page is full to optimize the performance. Page Level Compression also uses Column Prefix and Dictionary Compression along with Row Level Compression. For example, if a table is partitioned using a column prefix, then all the data within a specific partition will have same or similar prefix. Let’s say that the column stores values like “Memory”, “Memorably” and “Memorizer” within a page for different rows, then the SQL Server 2008 Storage Engine will store the value “Memor” only once in the Page and will refer to this value for all the other occurrences.
Different ways to Estimate Savings in Storage Space
In SQL Server 2008 there are two ways to estimate the savings in storage space for tables and indexes. The first method is to use sp_estimate_data_compression_savings inbuilt system stored procedure. And the second method is to use the Data Compression Wizard. The other important thing which DBA needs to see is whether the existing data is fragmented; you will be able to reduce the size of index by rebuilding it instead of using compression feature. However, Data Compression feature is available only in SQL Server 2008 Enterprise Edition and Developer Edition.