SQL Server Performance Case Studies
False starts can waste a lot of time when performance tuning SQL Server-based applications. Here’s a classic example. As the SQL Server DBA, I got a call from our company’s networking department (LAN, WAN management). They wanted me to run a SQL Server Profiler trace on a Bill of Lading document that was initiated at a remote office. Apparently, a manager at the remote site was complaining that producing a Bill of Lading was taking way too long. Apparently, it took from 6-20 minutes to run and print, which everyone at the remote office thought was too long.
The Bill of Lading document they were running was created in-house by our company’s report development staff. It was created using Crystal Reports, and the data was supplied by a SQL Server I managed.
The user’s from the remote office accessed the Bill of Lading printing functionality from the home office over a T-1 line. The document is initiated and printed at the remote office, but it actually is created at the home office on the Crystal Reports Server using data supplied by the SQL Server.
After the manager at the remote office complained about the document taking too long to run, for some reason the networking department was first contacted about the problem. I guess this was because someone thought that it would be the best place to start tracking down this performance problem. So they started by running their own network traces, but they couldn’t find any problems due to the T-1 line. After exhausting all of their options, they decided to contact me, wanting me to run a Profiler trace. At this point I had not heard about the performance problem, and didn’t know the details.
So without knowing much about the problem, I performed a Profiler trace several different times as different Bill of Ladings were created. Each time the Bill of Lading query ran in SQL Server, it took about 3-4 minutes to complete. This accounted for part of the time it took to run the document, but it wasn’t the 6-20 minutes the remote office said the documents took to run.
So the first thing I did was look at the query that was running on SQL Server. It was a query based on a view, and it was sent to SQL Server from the Crystal Report Server whenever the Bill of Lading report was run. The query itself returned a lot of columns, and it had one sort, but the WHERE clause was very simple with a single equality clause. By itself, this query didn’t look too bad.
But since this query was based on a view, my next step was to review the view. As I mentioned earlier, the report was developed in-house by our report development staff. I don’t get involved in report development, so I rarely ever take the time to review the views they create. I wish I could review every view and query produced by our report development staff, but the produce hundreds of reports a year, and I am only brought in for special problems.
At first glance, the SELECT statement used in the view was huge. It involved two multiple-join SELECT statements with complex WHERE clauses that were combined with a UNION statement. Without getting into much detail, my first assumption was that the view’s complexity was the biggest problem, but it wasn’t.
Because the view was so complex, I didn’t relish spending a lot of time reverse engineering it to see how it worked. So what I did first was to run the query that used the view to see its results. Again, this is the query that is sent from the Crystal Reports Server to SQL Server to return the required rows. After doing this, it was immediately apparent what the problem was. Can you guess what it was?
If you are not familiar with Bill of Lading documents, they are generally short documents (1-2 pages) that describe all of the product orders that are being shipped at a particular time. But when I ran the query, I found out that over 11,000 rows were returned, which of course did not make a lot of sense. At this point I contacted the developer of the report to find out why they were returning over 11,000 rows if they only needed a handful of them.
Here was the developer’s response. First, he explained to me how Crystal Reports Server worked, and I was not familiar with this. He said that when a Bill of Lading document needs to be produced, the person responsible for creating the document specifies what product orders are to appear in this specific bill of lading document. When the bill of lading document runs, several things happen. One of these is that the query I described above is run. What this query does is to return every product order ever made (over 11,000 at this time) to the Crystal Reports Server. Once the Crystal Reports Server has this data, then it runs its own internal query on this data to retrieve only the needed rows for this particular bill of lading (this was taking several minutes). So in other words, every time a bill of lading document was created anywhere in the company, which happens hundreds of times every week, every product order every created (since the system was installed) is returned.
Now, I don’t think it takes a really smart person to figure out that this is not a good thing. Not only does this cause the bill of lading document a long time to be created and printed, it puts an unnecessary load on the SQL Server. Fortunately, our SQL Server has lots of CPU cycles to spare, and this problem had yet to become evident, otherwise I would have noticed a big problem by now.
Now comes the kicker. I asked again to the developer why they returned all of the rows, as I hadn’t heard any good reason in the previous explanation. The answer was because they did not know of a good way to narrow down the number of rows. I suggested that why don’t they use a date in the WHERE clause. He said he wasn’t familiar with how SQL Server used dates, so they left it up to the Seagate Info Server to narrow down the number of rows. And it was this reason that they had written a query that always returned every product order, not just the current ones. So I photocopied several pages from a book on Transact-SQL and gave it to him, which showed him how to use dates in a SQL Server WHERE clause.
If you are familiar with this website at all, you would know that one of the most basic of SQL Server performance tuning tips was broken here. And that is to never return more data that you need. Make sure you have effective WHERE clauses.
As you can imagine, after a WHERE clause was added to the query that returns the Bill of Lading information, it now only takes just moments to process. [7.0, 2000, 2005] Added 4-11-2001
A developer came to me with a stored procedure that was taking forever (over 30 minutes) to run. The SP had a UNION ALL as part of the code, and what was strange is that if you ran each query separately (on either side of the UNION), individually they ran very fast. But when run as a UNION, the query ran forever.
My first instinct was to look at the execution plan of each SELECT statement individually, and then of the entire statement, looking for any obvious issues. There weren’t any as the execution plan looked good.
One thing I did notice was that parallelism was used by SQL Server to run the query. The server has 4 CPUs on it. This was true whether the two separate queries were run individually, of if they were run together as part of a UNION.
Past experience has told me that parallelism and SQL Server don’t always go well together. So my next step was to add this hint to the bottom of the SP:
OPTION (MAXDOP 1)
This query hint is used to tell the Query Optimizer to not use parallelism in this particular query. The number “1″ in the hint specifies that only one CPU is to be used to run the query.
After adding the hint, the query went from taking over 30 minutes to run to less than 2 seconds. Obviously, this query raised a parallelism bug (related to the UNION) which was causing the performance issue, but the hint turned off the parallelism bug, fixing the problem. The lesson to be learned here is that parallel queries aren’t always faster than non-parallel queries. [7.0, 2000, 2005] Added 3-27-2003