VLDB Performance Tuning and Optimization

Network connections should be of the highest speed, such as 100Mbs Switched Ethernet, or even 1Gbs Switched Ethernet. [6.5, 7.0, 2000, 2005] Updated 1-3-2005


Generally, DBCC commands are not run on a routine basis on VLDB because of the times involved, often greater than a day. Instead, only run them if you suspect possible hardware-related problems may have occurred. [6.5, 7.0, 2000, 2005] Updated 1-3-2005


Use filegroups to divide up the database into units that are relatively easy to backup and restore. It is virtually impossible to manage VLDB backups and restores without judicial use of filegroups. [7.0, 2000, 2005] Updated 1-3-2005


Queries that join two or more very large tables should ideally have each table in a separate file and on a separate physical array, and if possible, a separate I/O controller. [7.0, 2000, 2005] Updated 1-3-2005


Be sure you have the “automatically grow file” and “auto shrink” options turned off on your VLDB. These both take a large performance hit on VLDB. Instead, perform these tasks manually only when needed. [7.0, 2000, 2005] Updated 1-3-2005


An issue that comes up from time-to-time in the land of SQL is searching through very large databases. Have you ever had to deal with 15 tables of relational data containing 3 or 4 million records (or more) of one type or another joined together in a single query? Well it happens, and performance can be horrible. Two immediate performance tuning options most people perform, assuming they haven’t done so already, is to put their complex SQL queries and joins in a stored procedure and to tune the indexes. While these steps are important, they are often not enough to get the performance your users need when running complex queries against huge tables.

So here is where it gets interesting; where the laws of data normalization break apart. What we are forced to do, assuming we want better performance is to take all of our nice, neat, normalized data and denormalize it into more manageable units.

The solution here is to take the data from all the tables needed to produce the results of your complex queries and dump them into one or more tables that more simplified queries can be run against in order to produce the required results. This obviously reduces the number of required joins and greatly increases the performance of the query.

After reading the above statement, your first question may be how this is even possible? As you can imagine, this recommendation does not fit every situation. If your data is in a transaction-style database and your queries must always access up to the second data, then most likely you can’t take advantage of this recommendation.

On the other hand, if your queries can work with data that is an hour, four hours, or 24 hours old (or whatever time period is appropriate for you), then you can. What you can do is to create the appropriate denormalized tables you need to run your one or more heavy-duty queries, and then use DTS or SSIS to move the data from the production tables into these denormalized tables which will be used by the queries instead. Depending on the workload of your server, these tables can be located on the same server, or even a different server, in order to help spread out the workload. And like any tables, you will need to add the necessary indexes that have been optimized to meet the needs of your heavy-duty queries.

Other options you may want to consider as part of your overall denormalization processes is to put the tables in a separate database and setting it to READ ONLY, and dividing up huge tables into smaller tables based on some common theme, such as by year, quarter, or date. This way, if the searches are based on date, they won’t have to search through as much data, further boosting performance.

Even though you have denormalized your data following the above method, you will still want to take every step you can to still optimize your query. Some of these query optimization steps include avoiding cursors (or at least using the fastest cursor possible), only retrieving the data you actually need right now, and encapsulating your queries in stored procedures.

Deciding how to denormalize your data, and keeping it up to date using DTS/SSIS or other means, can be a complex chore, and is beyond the scope of this tip. Unfortunately, this is often the only way to speed up some data retrieval tasks. While you may not want to use this recommendation for all of your slow data retrieval problems, it will surely meet some of them. [6.5, 7.0, 2000, 2005] Contributed by Chris Donnan, ImpressionOne, Added 1-28-2001


No comments yet... Be the first to leave a reply!