An Interview with the Authors of SQL Performance Tuning
And that leaves the SQL code. It’s always important to get it right (though it’s never the only thing to get right). It’s also the one thing that involves transferable knowledge. Over and over, I’ve found that SQL database engines have more in common than people assume. Because practical DBAs and application programmers become specialists, they may fail to realize that. And that’s why we keep on having to explain this: “look, you can get quicker and bigger gains by upgrading hardware or tweaking settings, but what you learn about improving SQL code will work for a longer time, and will work in more than one place.” So it’s what users should know.
TRUDY: Your question reminds me of a fable, called The Tale Of The Vital Organs And The Asshole. One day the important body parts were quarrelling about who was most important. The heart noted that it distributed the vital oxygen everywhere, the lungs mused that the distribution job would be useless if they didn’t supply the oxygen in the first place, the brain interposed that what was vital in a complex organization was leadership (ahem), the liver and limbs and many other parts put forward their own claims. And then … the one body part that everybody had forgotten about, and which was held in least respect, went on strike. Within a short time, all the vital organs conceded that it was the component which had the most significant effect. Moral: it’s what can cause the bottleneck that’s important, not what does the most work.
What do you consider are the top five performance-related mistakes SQL developers make when writing their code?
TRUDY: I can’t think of any mistakes that people make when writing their code. However, here are five errors that occur before or after writing code.
1. Operating with a vague maxim like “indexes are good for queries but bad for updates”. This on-the-one-hand-on-the-other-hand thinking is fuzzy — it shows you need to perform more research before you can decide about any index questions.
2. Using EXPLAIN (Editor’s Note: Query Plans in SQL Server) early. The DBMS can tell you how good your choice is, but it can’t list all the possible choices for you. So really, we think that optimizing has nothing to do with explain, monitoring, and testing. Those are merely essential analysis. To optimize, it’s important to learn a principle or two first.
3. Doubling an estimate. For example, suppose you have a million street addresses and they’re all shorter than 30 characters. So you define the relevant column as VARCHAR(60) to be on the safe side. There are scenarios where this can degrade performance, such as during memory allocation for sort tags.
4. Overusing good but complex things. For example, it’s generally agreed that stored procedures, indexes, and partitions are good. But programmers sometimes end up using them as mantras, like moving something into a stored procedure just to see if it helps.
5. Transferring simple ideas from other contexts. For example, any programmer will, at an early stage, learn a language that has elegant loops in it. What could be more natural than to think up “for each row, if this, do that” — i.e. cursor manipulation from the client — when this is the normal way for a programmer to think?
Are these really “the top five”? Are they really in order? Heck, I don’t know. I’m sure everybody would have an easy time debunking a list like this, and as soon as they do, I’d probably change my mind and agree with them. The fact is, everybody has their own experiences, and I haven’t seen any surveys which will summarize for me what everyone else’s experiences are.
While most of your book focuses on SQL code, you also address the issue of indexes. Can you offer several tips and suggestions on how to get the most out of indexes?
1. Learn what the selectivity is.
2. Learn what clustered indexes are.
3. Learn what covering indexes are.
4. Learn what collations or linguistic sort orders are.
5. Learn what the alternatives to indexes are.
Some developers are tasked with creating SQL code that has to run on multiple database platforms. What SQL performance tuning tips and suggestions do you have for them?
PETER: I hold a minority view — that portable code is good and is becoming more possible. Adherence to the SQL Standard is getting better among the major vendors. (I wrote a survey about that which you can catch at dbazine.com/gulutzan3.html.)
On the back end, there are some accepted ways of doing things for relational databases, so even the “physical” aspects are often the same regardless of the DBMS. (For example, Microsoft SQL Server and IBM DB2 both use ARIES-protocol logging.) However, there are obviously some irritating areas where compatibility is unattainable. In the book, we tried to list all the gotchas that can surprise, say, an InterBase lover who also has to do things for Ingres. But in a general way, if you must be portable: write some “if it’s SQL Server do this but if it’s Oracle do that” code, write stored procedures in a language other than SQL/PSM, and keep track of progress in standard SQL.
By “standard SQL”, by the way, I don’t necessarily mean the common denominators of SQL-92 Entry Level. The SQL:2003 specification is important too, as a roadmap that suggests what will appear in more than one DBMS’s future versions.