Performance Tuning for SQL Server Developers
How to Optimize SQL Server’s Configuration Settings
Another common misconception about tuning SQL Server is that you must fine-tune its various configuration settings in order to get optimum performance. While there was some truth to this in earlier versions of SQL, this is no longer much of an issue, except on the very largest and busiest of servers.
For the most part, SQL Server is self-tuning. What does this mean? It means that SQL Server observes what is running on itself, and automatically makes internal adjustments which, for the most part, keep SQL Server running as optimally as possible given the tasks at hand and the given hardware.
When you perform performance testing on SQL Server, keep in mind that SQL Server can take some time before it adjusts itself optimally. In other words, the performance you get immediately after starting the SQL Server service, and the performance you get a couple of hours later after a typical workload has been running, can be different. Always perform your testing after SQL Server has had a chance to adjust itself to your workload.
There are 36 SQL Server configuration options that can be changed using either the Enterprise Manager or the sp_configure stored procedure. Unless you have a lot of experience tuning SQL Server, I don’t recommend you change any of SQL Server’s settings. As a novice, you may make a change that could in fact reduce performance. This is because when you change a setting, you are “hard-coding” the setting from then on. SQL Server has the ability to change its setting on the fly, based on the current workload. But once you “hard-code” a setting, you partially remove SQL Server’s ability to self-tune itself.
If after serious consideration you feel that making a change to one or more SQL Server configuration settings can boost performance in your particular environment, then you will want to proceed slowly and cautiously. Before you make the setting change, you will first want to get a good baseline on the SQL Server’s performance, under a typical workload, using a tool such as Performance Monitor (discussed later). Then make only one change at a time. Never make more than one change at a time, because if you do, you won’t know which change, if any of them, made a difference.
Once the one change is made, again measure SQL Server’s performance under the same workload to see if performance was actually boosted. If it wasn’t, which will often be the case, then change back to the default setting. If there was a performance boost, then continue to check to see if the boost in performance continues under other workloads the server experiences over time. Your later testing may show that your change helps some workloads, but hinders others. This is why changing most configuration settings is not recommended.
In any event, if your application is suffering from a performance-related issue, the odds of a configuration change resolving it are quite low.
How to Optimize Your Application’s Design
If you are using an n-tier design for your application, and who isn’t for most large-scale applications these days, SQL Server is just one part of a larger application. And perhaps more important than you realize, how your implement your n-tier design affects your application’s performance more than SQL Server itself. Unfortunately, SQL Server often gets more of the blame for poor performance than the application design, even when it is usually the application’s design that is causing most of the performance problems. What I hope to do here is offer a few suggestions that may aide you in your application design, helping to prevent SQL Server from getting all the blame for poor performance. So let’s start.
One of the first steps you must decide when designing an n-tier application is to select the logical and physical design. Of the two, the physical design is where most of the mistakes are made when it comes to performance. This is because this is where the theory (based on the logical design) has to be implemented in the real world. And just like anything else, you have many choices to make. And many of these choices don’t lend themselves to scalability or high performance.
For example, do you want to implement a physical two-tier implementation with fat clients, a physical two-tier implementation with a fat server, a physical three-tier implementation, an Internet implementation, or some other implementation? Once you decide this question, then you must ask yourself, what development language will be used, what browser, will you use Microsoft Transaction Server (MTS), will you use Microsoft Message Queue Server (MSMQ), and on and on.
Each of these many decisions can and will affect performance and scalability. Because there are so many options, it is again important to test potential designs early in the design stage, using rapid prototyping, to see which implementation will best meet your user’s needs.
More specifically, as you design your physical implementation, try to follow these general recommendations to help ensure scalability and optimal performance in your application:
Perform as many data-centered tasks as possible on SQL Server in the form of stored procedures. Avoid manipulating data at the presentation and business services tiers.
Don’t maintain state (don’t store data from the database) in the business services tier. Maintain state in the database as much as possible
Don’t create complex or deep object hierarchies. The creation and use of complex classes or a large number of objects used to model complex business rules can be resource intensive and reduce the performance and scalability of your application. This is because the memory allocation when creating and freeing these objects is costly.
Consider designing the application to take advantage of database connection pooling and object pooling using Microsoft Transaction Server (MTS). MTS allows both database connections and objects to be pooled, greatly increasing the overall performance and scalability of your application.
If your application runs queries against SQL Server that by nature are long, design the application to be able to run queries asynchronously. This way, one query does not have to wait for the next before it can run. One way to build in this functionality into your n-tier application is to use the Microsoft Message Queue Server (MSMQ).
While following these suggestions won’t guarantee a scalable and fast performing application, they are a good first start.
How to Optimize Your Database’s Design
Like application design, database design is very critical to the scalability and performance of your SQL Server applications. And also like application design, if you don’t do a good job in the first place, it is very hard and expensive to make changes after your application has gone into production. Here are some key things to keep in mind when designing SQL Server databases for scalability and performance.
As always, you will want to test your design as early as possible using realistic data. This means you will need to develop prototype databases with sample data, and test the design using the type of activity you expect to see in the database once production starts.
One of the first design decisions you must make is whether the database will be used for OLTP or OLAP. Notice that I said “or”. One of the biggest mistakes you can make when designing a database is to try to meet the needs of both OLTP and OLAP. These two types of applications are mutually exclusive in you are interested in any sense of high performance and scalability.
OLTP databases are generally highly normalized, helping to reduce the amount of data that has to be stored. The less data you store, the less I/O SQL Server will have to perform, and the faster database access will be. Transactions are also kept as short as possible in order to reduce locking conflicts. And last of all, indexing is generally minimized to reduce the overhead of high levels of INSERTs, UPDATEs, and DELETEs.
OLAP databases, on the other hand, are highly de-normalized. In addition, transactions are not used, and because the database is read-only, record locking is not an issue. And of course, heavy indexing is used in order to meet the wide variety of reporting needs.
As you can see, OLTP and OLAP databases serve two completely different purposes, and it is virtually impossible to design a database to handle both needs. While OLAP database design is out of this book’s scope, I do want to mention a couple of performance-related suggestions in regard to OLTP database design.
When you go through the normalization process when designing your OLTP databases, your initial goal should be to fully normalize it according to the three general principles of normalization. The next step is to perform some preliminary performance testing, especially if you foresee having to perform joins on four or more tables at a time. Be sure to test using realistic sample data.
If performance is acceptable, then don’t worry about having to join four or more tables in a query. But if performance is not acceptable, then you may want to do some selective de-normalization of the tables involved in order to reduce the number of joins used in the query, and to speed performance.
It is much easier to catch a problem in the early database design stage, rather than after the finished application has been rolled out. De-normalization of tables after the application is complete is nearly impossible. One word of warning. Don’t be tempted to de-normalize your database without thorough testing. It is very hard to deduce logically what de-normalization will do to performance. Only through realistic testing can you know for sure if de-normalization will gain you anything in regards to performance.
How to Optimize Your Application Code for SQL Server
At some point during the development process you will have to begin coding your application to work with SQL Server. By this time, the application and database designs should have already been completed and tested for performance and scalability using rapid prototyping techniques.
How your code your application has a significant bearing on performance and scalability, just as the database design and the overall application design affect performance and scalability. Sometimes, something as simple as choosing one coding technique over another can make a significant different. Rarely is there only one way to code a task, but often there is only one way to code a task for optimum performance and scalability.
What I want to do in this section is focus on some essential techniques that can affect the performance of your application and SQL Server.
Since I don’t know what development language you will be using, I am going to assume here that you will be using Microsoft’s ADO (Active Data Objects) object model to access SQL Server from your application. The examples I use here should work for most Visual Basic and ASP developers. So let’s just dive in and look at some specific techniques you should implement in your application code when accessing SQL Server data to help ensure high performance.
Use OLE DB to Access SQL Server
You can access SQL Server data using either ODBC or OLE DB. Which method you use depends on how you specify the connection string when you use ADO to connect to SQL Server. For best performance, always select OLE DB. OLE DB is used natively by SQL Server, and is the most effective way to access any SQL Server data.
Along these same lines, when creating an ADO connection to SQL Server, you can either use a DSN in the connection string, or you can use a DSN-less connection. For optimal performance, use DSN-less connections. Using them prevents the need for the OLE DB driver to look up connection string information in the registry of the client the application code is running on, saving some overhead.
Encapsulate your DML (Data Manipulation Language) in Stored Procedures
ADO allows you three different ways to SELECT, INSERT, UPDATE, or DELETE data in a SQL Server database. You can use ADO’s methods, you can use dynamic SQL, or you can use stored procedures. Let’s take a brief look at each of these.
The easiest way to manipulate data from your application is to use ADO’s various methods, such as rs.AddNew, rs.Update, or rs.Delete. While using these methods is easy to learn and implement, you pay a relatively steep penalty in overhead for using them. ADO’s methods often create slow cursors and generate large amounts of network traffic. If your application is very small, you would never notice the difference. But if your application has much data at all, your application’s performance could suffer greatly.
Another way to manipulate data stored in SQL Server using ADO is to use dynamic SQL (also sometimes referred to as ad hoc queries). Here, what you do is send Transact-SQL in the form of strings from ADO in your application to be run on SQL Server. Using dynamic SQL is generally much faster than using ADO’s methods, although it does not offer the greatest performance. When SQL Server receives the dynamic SQL from your ADO-based application, it has to compile the Transact-SQL code, create a query plan for it, and then execute it. Compiling the code and creating the query plan the first time takes a little overhead. But once the Transact-SQL code has been compiled and a query plan created, it can be reused over and over assuming the Transact-SQL code sent later is nearly identical, which saves overhead.
For optimal performance, you will want to use ADO to called stored procedures on your server to perform all your data manipulation. The advantages of stored procedures are many. Stored procedures are already pre-compiled and optimized, so this step doesn’t have to be repeated every time the stored procedure is run. The first time a stored procedure is run, a query plan is created and stored in SQL Server’s memory, so it can be reused, saving even more time. Another benefit of stored procedures is that they help reduce network traffic and latency. When your application’s ADO code calls a stored procedure on SQL Server, it makes a single network call. Then any required data processing is performed on SQL Server, where data processing is most efficiently performed, and then if appropriate, it will return any results to your application. This greatly reduces network traffic and increases scalability and performance.
While stored procedures handle basic data manipulation like a champ, they can also handle much more very well. Stored procedures can run virtually any Transact-SQL code, and since Transact-SQL code is the most efficient way to manipulate data, all of your application’s data manipulations should be done inside of stored procedures on SQL Server, not in COM components in the business-tier or on the client.
When you use ADO to execute stored procedures on SQL Server, you have two major ways to proceed. You can use ADO to call the Refresh method of the Parameters collection in order to save you a little coding. ADO needs to know what parameters are used by the stored procedure, and the Refresh method can query the stored procedure on SQL Server to find out the parameters. But as you might expect, this produces additional network traffic and overhead. While it takes a little more coding, a more efficient way to call a SQL Server stored procedure is to create the parameters explicitly in your code. This eliminates the extra overhead caused by the Refresh method and speeds up your application.
Encapsulate Your ADO Code in COM Components
As part of creating a scalable and optimized n-tier applications, put the ADO code that accesses SQL Server data into COM components. This is true whether your front end is a Visual Basic application or a web-based ASP application. This gives you all the standard benefits of COM components, such as object pooling using MTS. And for ASP-based applications, it provides greater speed because the ADO code in COM objects is already compiled, unlike ADO code found in ASP pages. How you implement your data manipulation code in COM components should be considered when the application is first designed.
When designing COM objects, make them stateless as possible, avoiding the use of properties. Instead, use methods to perform your data-related tasks. This is especially critical if you use MTS, as any objects that preserve state can significantly reduce MTS’s ability to scale, while at the same time, increasing overhead and hurting performance.
For optimum performance, COM objects should be compiled as in-process DLLs (which is required if they are to run under MTS). You should always employ early binding when referencing COM objects, and create them explicitly, not implicitly.
How to Optimize Your Transact-SQL Code
Transact-SQL, just like any programming language, offers more than one way to perform many tasks. And as you might imagine, some techniques offer better performance than others. In this section you will learn some of the “tricks-of-the-trade” when it comes to writing high performing Transact-SQL code.
Choose the Appropriate Data Types
While you might think that this topic should be under database design, I have decided to discuss it here because Transact-SQL is used to create the physical tables that were designed during the earlier database design stage.
Choosing the appropriate data types can affect how quickly SQL Server can SELECT, INSERT, UPDATE, and DELETE data, and choosing the most optimum data type is not always obvious. Here are some suggestions you should implement when creating physical SQL Server tables to help ensure optimum performance.
Always choose the smallest data type you need to hold the data you need to store in a column. For example, if all you are going to be storing in a column are the numbers 1 through 10, then the TINYINT data type is more appropriate that the INT data type. The same goes for CHAR and VARCHAR data types. Don’t specify more characters for character columns that you need. This allows SQL Server to store more rows in its data and index pages, reducing the amount of I/O needed to read them. Also, it reduces the amount of data moved from the server to the client, reducing network traffic and latency.
If the text data in a column varies greatly in length, use a VARCHAR data type instead of a CHAR data type. Although the VARCHAR data type has slightly more overhead than the CHAR data type, the amount of space saved by using VARCHAR over CHAR on variable length columns can reduce I/O, improving overall SQL Server performance.
Don’t use the NVARCHAR or NCHAR data types unless you need to store 16-bit character (Unicode) data. They take up twice as much space as VARCHAR or CHAR data types, increasing server I/O overhead.
If you need to store large strings of data, and they are less than 8,000 characters, use a VARCHAR data type instead of a TEXT data type. TEXT data types have extra overhead that drag down performance.
If you have a column that is designed to hold only numbers, use a numeric data type, such as INTEGER, instead of a VARCHAR or CHAR data type. Numeric data types generally require less space to hold the same numeric value as does a character data type. This helps to reduce the size of the columns, and can boost performance when the columns is searched (WHERE clause) or joined to another column.
Use Triggers Cautiously
Triggers can be a powerful tool in Transact-SQL, but since they execute every time that a table is INSERTED, UPDATED, or DELETED (depending on how the trigger is created), they can produce a lot of overhead. Here’s some tips on how to optimize trigger performance.
Keep the code in your triggers to the very minimum to reduce overhead. The more code that runs in the trigger, the slower each INSERT, UPDATE, and DELETE that fires it will be.
Don’t use triggers to perform tasks that can be performed using more efficient techniques. For example, don’t use a trigger to enforce referential integrity if SQL Server’s built-referential integrity is available to accomplish your goal. The same goes if you have a choice between using a trigger or a CHECK constraint to enforce rules or defaults. You will generally want to choose a CHECK constraint as they are faster than using triggers when performing the same task.
Try to avoid rolling back triggers because of the overhead involved. Instead of letting the trigger find a problem and rollback a transaction, catch the error before it can get to the trigger (if possible based on your code). Catching an error early (before the trigger fires) consumes fewer server resources than letting the trigger roll back.
Don’t Access More Data Than You Need
While this suggestion may sound obvious, it must not be, because this is a common performance-related issue I find over and over again in many SQL Server-based applications. Here are some ideas on how to minimize the amount of data that is returned to the client.
Don’t return more columns or rows of data to the client than absolutely necessary. This just increases disk I/O on the server and network traffic, both of which hurts performance. In SELECT statements, don’t use SELECT * to return rows, always specify in your SELECT statement exactly which columns are needed to be returned for this particular query, and not a column more. In most cases, be sure to include a WHERE clause to reduce the number or rows sent to only those rows the clients needs to perform the task immediately at hand.
If your application allows users to run queries, but you are unable in your application to easily prevent users from returning hundreds, even thousands of unnecessary rows of data they don’t need, consider using the TOP operator within the SELECT statement. This way, you can limit how may rows are returned, even if the user doesn’t enter any criteria to help reduce the number or rows returned to the client.
Avoid Using Cursors
Transact-SQL is designed to work best on result sets, not on individual records. That’s where cursors come into play. They allow you to process individual records. The only problem with individual record processing is that it is slow. Ideally, for high-performing SQL Server-based applications, cursors should be avoided.
If you need to perform row-by-row operations, try to find another method to perform the task. Some options are to perform row-by-row tasks at the client instead of the server, using tempdb tables at the server, or using a correlated sub-query.
Unfortunately, these are not always possible, and you have to use a cursor. If you find it impossible to avoid using cursors in your applications, then perhaps one of these suggestions will help.
SQL Server offers you several different types of cursors, each with its different performance characteristics. Always select the cursor with the least amount of overhead that has the features you need to accomplish your goals. The most efficient cursor you can choose is the fast forward-only cursor.
When using a server-side cursor, always try to fetch as small a result set as possible. This includes fetching only those rows and columns the client needs immediately. The smaller the cursor, no matter what type of server-side cursor it is, the fewer resources it will use, and performance will benefit.
When you are done using a cursor, don’t just CLOSE it, you must also DEALLOCATE it. Deallocation is required to free up the SQL Server resources used by the cursor. If you only CLOSE the cursor, locks are freed, but SQL Server resources are not. If you don’t DEALLOCATE your cursors, the resources used by the cursor will stay allocated, degrading the performance of your server until they are released.