Bad logical database design results in bad physical database design, and generally results in poor database performance. So, if it is your responsibility to design a database from scratch, be sure you take the necessary time and effort to get the logical database design right. Once the logical design is right, then you also need to take the time to get the physical design right.
Both the logical and physical design must be right before you can expect to get good performance out of your database. If the logical design is not right before you begin the development of your application, it is too late after the application has been implemented to fix it. No amount of fast, expensive hardware can fix the poor performance caused by poor logical database design. [2000, 2005, 2008] Updated 2-3-2009
*****
One strong indicator of poor database design is the use of a lot of SELECT DISTINCTs or UNIONs (not UNION ALL) in your queries. A properly designed database often negates the need for using either of these statements.
Besides this, using SELECT DISTINCT or UNION uses up a lot of resources, that could be better spent elsewhere speeding up your SQL Server. [2000, 2005, 2008] Updated 2-3-2009
*****
Following standard database normalization recommendations when designing OLTP databases can greatly maximize a database’s performance. Here’s why:
- Helps to reduce the total amount of redundant data in the database. The less data there is, the less work SQL Server has to perform, speeding its performance.
- Helps to reduce the use of NULLS in the database. The use of NULLs in a database can reduce database performance, especially in WHERE clauses.
- Helps to reduce the number of columns in tables, which means that more rows can fit on a single data page, which helps to boost SQL Server read performance.
- Helps to reduce the amount of Transact-SQL code that needs to be written to deal with non-normalized data. The less code there is, the less that has to run, speeding your application’s performance.
- Helps to maximize the use of clustered indexes, the most powerful and useful type of index available to SQL Server. The more data that is separated into multiple tables because of normalization, the more clustered indexes that become available to help speed up data access.
- Helps to reduce the total number of indexes in your database. The less columns tables have, the less need there is for multiple indexes to retrieve it. And the fewer indexes there are, the less negative is the performance effect of INSERTs, UPDATES, and DELETES.
[2000, 2005, 2008] Updated 2-3-2009
*****
If normalizing your OLTP database forces you to create queries with many multiple joins (4 or more), you may want to consider denormalizing some of the tables in order to reduce the number of required joins. Denormalization is the process of selectively taking normalized tables and re-combining the data in them in order to reduce the number of joins needed to produce the necessary query results. Sometimes the addition of a single column of redundant data to a table from another table can reduce a 4-way join into a 2-way join, significantly boosting performance by reducing the time it takes to perform the join.
While denormalization can boost join performance, it can also have negative effects. For example, by adding redundant data to tables, you risk the following problems:
- More data means SQL Server has to read more data pages than otherwise needed, hurting performance.
- Redundant data can lead to data anomalies and bad data.
- In many cases, extra code will have to be written to keep redundant data in separate tables in synch, which adds to database overhead.
As you consider whether to denormalize a database to speed joins, be sure you first consider if you have the proper indexes on the tables to be joined. It is possible that your join performance problem is more of a problem with a lack of appropriate indexes than it is of joining too many tables.
Before you decide to denormalize a properly normalized database, be sure you thoroughly consider all of the implications and test performance both before and after you denormalize to see if your efforts have really bought you anything. [2000, 2005, 2008] Updated 2-3-2009
*****
When you are first designing your application, don’t automatically assume you have to de-normalize your data. Your first goal in the design process should be to normalize your data. Next, you can test your design with realistic data and transactions. At this point, if you see that de-normalization will help, then by all means do so. But don’t assume you need to de-normalize data until you can prove (through testing) that de-normalizing is the right thing to do. [2000, 2005, 2008] Updated 2-3-2009
*****
If a null is stored in a fixed-length column, then the entire column (even though it is null) still takes up the same amount of fixed space as established when the column was created. For example, a CHAR(100) field takes up 100 characters of storage, even if there is a null in the column.
To prevent the above “data explosion” problem, you need to change your table design tactics. You need to make your columns that hold many nulls variable-length instead of fixed-length, assuming the length of the column varies considerably. Of course, if the column length is not great, or if nulls to be stored are few, then using fixed-length data types will work fine. [2000, 2005, 2008] Updated 2-3-2009
*****
One option database designers have is to put all of their data for their application in a single large database, or to place their data in multiple smaller databases. Generally, if the data is highly similar and used the same way (e.g. the database is used only for OLTP, not OLAP), then it is better to use one large database rather than many smaller databases.
For example, let’s say you have customers all over the world and you are designing an OLTP application. Except for the fact that your customers are spread throughout the world, the application works the same for all of them. Under this scenario, it would be more efficient to use one database instead of creating separate smaller databases for each country or region of the world. The biggest reason you would want one database instead of multiple databases is to take advantage of SQL Server’s ability to cache data, such as the data used in lookup tables.
If you use multiple databases instead of a single database, then each database would have to cache the same lookup tables, over and over for each database. The same applies to common stored procedures used in each database. This would present an inefficient use of SQL Server’s memory and could potentially reduce performance. [2000, 2005, 2008] Updated 2-3-2009
*****
If you are designing a database that potentially could be very large, holding millions or billions of rows, consider the option of horizontally partitioning your large tables. Horizontal partitioning divides what would typically be a single table into multiple tables, creating many smaller tables instead of a single, large table. The advantage of this is that is generally is much faster to query a single small table than a single large table.
For example, if you expect to add 10 million rows a year to a transaction table, after five years it will contain 50 million rows. In most cases, you may find that most queries (although not all) queries on the table will be for data from a single year. If this is the case, if you partition the table into a separate table for each year of transactions, then you can significantly reduce the overhead of the most common of queries.
Taking advantage of horizontal partitioning requires early planning as the various queries you develop for your application need to be designed to take advantage of your database’s design. It is much easier to implement horizontal partitioning in the early design stage than it is to implement after your application has been put into production. [2000, 2005, 2008] Updated 2-3-2009