The major difference between a small development database and the full-sized production database is the statistical data distribution. This leads to different cost estimates and potentially different execution plans. In this circumstance, the developer is effectively blind to potential performance issues until the actual execution plan can be verified on a full-sized database.
One way to eliminate the execution plan difference is to transfer the statistics from the production database to the smaller development database. Ideally, this capability would be built into SQL Server, and readers are encouraged to submit a request for this feature to email@example.com. The feature is reportedly available in Sybase Adaptive Server Enterprise, so the idea is not new. In the interim, an alternative method is presented.
SQL Server Statistics
SQL Server uses cost-based optimization. The key to cost-based optimization is a method to estimate the rows and pages involved in each step of an execution plan. This is the reason SQL Server generates and maintains distribution statistics. Statistics are generated on index keys and can also be generated for columns that are not indexed. The sysindexes table has an entry for each index and for each statistics collection not associated with an index. Each table has an entry in the sysobjects table with an object id that is unique within the database. The id column in the sysindexes table is the object id that identifies the table. Together, the id and indid columns uniquely identify a row in the sysindexes table. The name column in sysindexes is either the index name or the statistics collection name. Any particular statistics collection can be displayed with the following command:
DBCC SHOW_STATISTICS ( table , target )
The target is either the index name or the statistics collection name. An example of the DBCC SHOW_STATISTICS output for an index-based statistics collection is shown below. The first dataset contains general information, including the date of the last update, total rows, rows sampled, etc. The second dataset contains the overall average distribution for each key in succession. In this example, the lead key column is eventPlannerID, and the second and last key column is ID. The first row shows the overall average distribution information for each distinct value of the first key, and the second row shows the distribution for each distinct value of the first key combined with the second key.
Figure 1. DBCC SHOW_STATISTICS output.
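To find the names that can be passed as the target, the sysindexes and sysobjects tables can be joined on the object id. The following is a minimal sketch, assuming the sysindexes-based catalog described above. The table name EventSchedule and the index name IX_EventSchedule_Planner are hypothetical and stand in for a real table and one of its indexes.

```sql
-- List every index and statistics collection on one user table.
-- EventSchedule is a hypothetical table name.
SELECT  o.name   AS table_name,
        i.indid,
        i.name   AS target_name,
        -- 1 when the row is a statistics collection with no index behind it
        INDEXPROPERTY(i.id, i.name, 'IsStatistics')     AS is_statistics_only,
        -- 1 when the collection was created by AUTO_CREATE_STATISTICS
        INDEXPROPERTY(i.id, i.name, 'IsAutoStatistics') AS is_auto_created
FROM    sysindexes AS i
JOIN    sysobjects AS o ON o.id = i.id
WHERE   o.type = 'U'                  -- user tables only
  AND   o.name = 'EventSchedule'
  AND   i.indid BETWEEN 1 AND 254     -- skip the heap row (0) and the text/image row (255)
ORDER BY i.indid;

-- Display one of the collections returned above.
-- IX_EventSchedule_Planner is a hypothetical index name.
DBCC SHOW_STATISTICS (EventSchedule, IX_EventSchedule_Planner);
```

Any value in the target_name column of the first result can be used as the second argument to DBCC SHOW_STATISTICS.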
Statistics Transfer Process
The process for transferring statistics from one database to another database with the same schema is described as follows:
1) Update statistics on the full-sized database (optional, but recommended).
2) Create a new database on a server that hosts a full-sized version of the source database.
3) Set AUTO_CREATE_STATISTICS and AUTO_UPDATE_STATISTICS off.
4) Create users, data types, tables, constraints, clustered indexes (including primary keys) and all other objects except nonclustered indexes.
5) Create mapping tables that relate table names and user names to their ids in the original database and in the new database, then populate them.
6) Create and populate a table with a copy of the sysindexes table from the original database (optional).
7) Execute sp_configure to allow updates to system tables.
8) Insert statistics collections not associated with indexes into the sysindexes table of the new database.
9) Create all nonclustered indexes.
10) Update the sysindexes entries for statistics related values on all index rows.
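The preparatory steps above (1, 3, 5, and 7) might look roughly like the following. This is a sketch under stated assumptions, not the exact script used for the process: the database names ProdDB and DevStats and the mapping tables UserIdMap and TableIdMap are hypothetical, and both databases are assumed to reside on the same server so that one can query the other's system tables. The direct writes to sysindexes in steps 8 and 10 are not sketched, since the columns involved are not listed here.

```sql
-- Step 1: refresh statistics on the full-sized source database.
USE ProdDB;                          -- hypothetical source database
EXEC sp_updatestats;
GO

-- Step 3: stop the new database from creating or refreshing statistics.
ALTER DATABASE DevStats SET AUTO_CREATE_STATISTICS OFF;   -- hypothetical target database
ALTER DATABASE DevStats SET AUTO_UPDATE_STATISTICS OFF;
GO

-- Step 5: map user ids and table object ids between the two databases.
USE DevStats;
GO

-- User name to uid mapping (UserIdMap is a hypothetical table name).
SELECT  su.name AS user_name,
        su.uid  AS source_uid,
        du.uid  AS target_uid
INTO    dbo.UserIdMap
FROM    ProdDB.dbo.sysusers AS su
JOIN    dbo.sysusers        AS du ON du.name = su.name;

-- Table name to object id mapping, matched on table name and owner
-- (TableIdMap is a hypothetical table name).
SELECT  src.name AS table_name,
        src.id   AS source_id,
        dst.id   AS target_id
INTO    dbo.TableIdMap
FROM    ProdDB.dbo.sysobjects AS src
JOIN    dbo.UserIdMap         AS u   ON u.source_uid = src.uid
JOIN    dbo.sysobjects        AS dst ON dst.name = src.name
                                    AND dst.uid  = u.target_uid
WHERE   src.type = 'U'
  AND   dst.type = 'U';
GO

-- Step 7: allow direct updates to system tables.
EXEC sp_configure 'allow updates', 1;
RECONFIGURE WITH OVERRIDE;
GO
```

Because 'allow updates' is a server-wide setting, it is prudent to set it back to 0 and run RECONFIGURE WITH OVERRIDE again once the sysindexes changes are complete.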