Tips on Optimizing SQL Server Database Design
When designing columns to store formatted data, such as Social Security numbers or phone numbers, you have two choices. You can choose to store the data with no formatting or with formatting. Each choice has its pros and cons. If you store the data with formatting, then your disk space requirements will be slightly higher than if you don’t store the data with formatting. If you store the data without formatting, then you will have to add or remove formatting each time you INSERT or UPDATE a record, which adds to CPU and memory overhead on the server. In other words, the choice you make affects your server’s load.
Which option you should choose in order to maximize performance is not always obvious, but many database designers choose to store formatted data because they feel the savings in CPU time (because formatting isn’t performed on the fly) is more important than saving some space on the disk drive. You will have to closely examine your application to determine, in your particular case, which option will save your application the most overhead. [2000, 2005, 2008] Updated 2-3-2009
If you expect your database to be accessed simultaneously by many different users, consider what would happen to your database’s scalability and performance if you designed your database with just a few tables holding virtually all the data. For example, say you are designing a sales-based application that could potentially have over 10,000 new rows inserted into it every day. And also assume that as many as 1% of these rows would need to be updated or deleted daily. And also assume that as many as 100 people could be accessing the same table at the same time. Can you see the problem this could cause? As a rule of thumb, design your database to minimize potential user contention. This will both boost scalability and performance of your application. [2000, 2005, 2008] Updated 2-3-2009
SQL Server’s data pages are 8K (8192 bytes) in size. Of the 8192 available bytes in each data page, only 8060 bytes are used to store a row. The rest is used for overhead.
So how does this information affect database design and the performance of your application? To optimize SQL Server performance, you should design your rows in such a way as to maximize the number of rows that can fit into a single data page. The more densely rows are stored in data pages, the less I/O SQL Server has to perform when reading data pages from disk, and the more rows it can squeeze into the buffer. The more rows that you can fit into the buffer increases the likelihood that the data you need is in the buffer and not on the disk, saving even more valuable I/O resources.
For example, if you have a row that was 4031 bytes long, then only one row could fit into a data page, and the remaining 4029 bytes left in the page would be empty. (Keep in mind that a row can be up to 8060 bytes, and half of this is 4030 bytes.) This is a great waste of space which can affect the I/O performance of the server.
If you run into a situation like this, try to redesign the row, if possible, so that the row is 4030 bytes or less. This way you would get two rows in each page and I/O performance would be greatly enhanced. This not only applies to this particular example. It can also apply to cases where you are trying to fit three, four, or more records efficiently into a single data page. [2000, 2005, 2008] Updated 2-3-2009
By default, TEXT, NTEXT, and IMAGE data is stored separately from the rest of the data in a table. The table itself (in the appropriate columns) contains a 16-bit pointer that points to separate data pages that contain the TEXT, NTEXT, or IMAGE data. The reason this is done is to enhance performance.
For example, if these datatypes hold large quantities of data, and they were stored in a table with the rest of the data, this would require SQL Server to work harder when accessing the table. The larger the table, the more I/O SQL Server has to perform in order to accomplish its tasks.
But if the TEXT, NTEXT, and IMAGE data is stored separately from the table, then SQL Server uses less I/O to access it, assuming that the TEXT, NTEXT, or IMAGE data is not being accessed in the current operation. On the other hand, if the TEXT, NTEXT, and IMAGE data does need to be accessed, SQL Server has to go to extra work to retrieve it because it has to locate the pointer in the table, and then find the necessary data from the data pages. Assuming the TEXT, NTEXT, and IMAGE data does not have to be accessed frequently, this is a good performance tradeoff.
But in some cases, storing these datatypes away from the table (in their own data pages) is not the most efficient method. This is especially true if the TEXT, NTEXT, or IMAGE data is not large and if the data is frequently accessed. In this case, it is often better to store this data in the table itself, instead of separate data pages.
With SQL Server, you have the option to store smaller TEXT, NTEXT, and IMAGE data directly in a table instead of in separate data pages. If you want to turn this option on, you will have to set the “text in row” table option to permit this. For example:
sp_tableoption ‘tablename’, ‘text in row’, ‘on’
sp_tableoption ‘tablename’, ‘text in row’, ‘size’
Where ‘size’ is a value ranging from 24 to 7,000 bytes.
Using this option in the right circumstances can greatly help your application’s performance. On the other hand, if used incorrectly, it can hurt performance. [2000, 2005, 2008] Updated 2-3-2009
If you have been using the “text in row” option in SQL Server, but have now decided to no longer use it and to turn it off, SQL Server will automatically move any TEXT, NTEXT, and IMAGE data from regular data pages to a new area in the database where other similar data is stored, leaving only 16 byte pointers behind in the original data page. Depending on the amount of data that was stored using this “text in row” option in data pages, it may take a while for SQL Server to move all of the data. You may want to schedule such a task during a time when SQL Server is not too busy. [2000, 2005, 2008] Updated 2-3-2009
Ideally, all tables should have a primary key in order to enforce entity integrity. This of course requires the use of a unique index. One method that is commonly used by database designers is to add an identity column to each of their tables in order to ensure uniqueness. In many cases, this is a good idea. But, if the rows in a table are naturally unique, then the use of an identity column is not needed. In fact, doing so can reduce the performance of your database because it may require the use of unnecessary joins, and because it uses up data pages unnecessarily, which requires SQL Server to perform more I/O to read your table.
So don’t automatically add identify columns to your table. Only add them if there is no column (or columns if you want to create a composite primary key) in your table that can guarantee uniqueness. [2000, 2005, 2008] Updated 2-3-2009
If your plan is to use replication as part of your SQL Server-based application, and your goal is high scalability and performance, then this factor should be considered when the database is first designed. The fact that you will be using replication, and the type of replication you plan to use, can directly affect how your databases should be designed. [2000, 2005, 2008] Updated 2-3-2009
Don’t reinvent the wheel when it comes to database design; in other words, take advantage of SQL Server’s built-in features to enforce data integrity instead of writing your own. These built-in integrity features use much less overhead and perform faster than the ones you can create.
Examples of the built-in SQL Server integrity features you should take advantage of include the CHECK, DEFAULT, NOT NULL, PRIMARY KEY, UNIQUE, and FOREIGN IDENTITY constraints. They also include IDENTITY and TIMESTAMP columns. You can also take advantage of the ability to create your own user-defined data types.
Not only are these built-in data integrity features faster than ones you create, they also enforce these rules completely and consistently. [2000, 2005, 2008] Updated 2-3-2009
The Query Optimizer can use check constraints to help speed up query processing in some cases. For example, if you have a check constraint on a column of the integer data type that specifies that the lowest allowable number is 1,000 and that the highest allowable number is 10,000, the Query Optimizer can use this information to help it make better decisions when optimizing the query and selecting the correct index(es) to use.
Consider adding check constants to all columns where they are applicable. This not only helps to ensure “proper” data is entered into your tables, in can boost overall performance of some queries. [2000, 2005, 2008] Updated 2-3-2009
Generally, most SQL Server-related development is done on a “development” SQL Server box, then moved to a “testing” SQL Server box, and then is eventually moved to a “production” SQL Server box. One step you can take to help ensure that your new application will perform as expected once it goes into production is to perform testing before before moving any code (in the application or the database) from one box to another. Doing so helps to prevent unexpected performance surprises.
For example, you run a query on the development box and notice that it performs at sub-second speed. Because it is fast, you don’t think that it needs performance tuning. But when you move it from the development box to the test box, you notice that the same query now takes 45 seconds. After a little research, you find out that the number of rows hit in each box in the tables is different. In one key table on the development server you discover that is only has 5,000 rows, but the same key table on the test server has 1,600,100 rows. In addition, when you take a look at the query plan for the query in question, you discover that the query is performing a table scan on both servers. So right away, you realize that the query, which appeared to be fast on the development server, is only fast because it only has to scan 5,000 rows. Obviously, the query needs tuning (or an appropriate index added).
Problems such as the one described above can sneak up on you and hurt your application’s performance. That’s why it is very important to do performance testing whenever you move code from one server to another during the development process. [2000, 2005, 2008] Updated 2-3-2009
When you have a choice of using a constraint or a trigger to perform the same task, always choose the constraint. The same goes if you have the option of using either a constraint or a rule, or a constraint or a default. Constraints require less overhead than triggers, rules, and defaults, reducing the load on SQL Server, and boosting overall performance. [2000, 2005, 2008] Updated 2-3-2009
Don’t implement redundant integrity features in your database. For example, if you are using primary key and foreign key constraints to enforce referential integrity, don’t add unnecessary overhead by also adding a trigger that performs the same function. The same goes for using both constraints and defaults or constraints and rules that perform redundant work. While this may sound obvious, it is not uncommon to find these kinds of problem in SQL Server databases. [2000, 2005, 2008] Updated 2-3-2009