An Exclusive Interview with Author Fernando Guerrero On How to Get the Most of ADO.NET and SQL Server
What are the biggest mistakes most developers make that negatively affect performance of their SQL Server-based applications? What can they do to avoid these mistakes?
In my opinion, the biggest mistake that most developers make when interacting with SQL Server is the “SELECT * FROM myTable” syndrome. If there’s no WHERE clause, something doesn’t smell right. And If there’s an “*” you are retrieving too much. Developers should only retrieve the columns they need from the rows they need to use at any given time. Another great problem is the misunderstanding of JOIN strategies. I’m always amazed by the overuse of LEFT OUTER JOIN. Whenever I see one of them, I ask the developer: “Are you really sure this is what you want to retrieve?” And what about badly-formed queries resulting in unexpected CROSS JOIN queries? A good understanding of SQL language would help them tremendously.
Another common problem is the use of costly server side cursors, avoiding efficient set operations; often due to lack of knowledge about the way relational databases work. Whenever I see a stored procedure with a Transact-SQL cursor, I try to remove it by using more efficient plain-vanilla set operations. In the last 9 years working with SQL Server, I found just a handful of cases where using cursors were perhaps the only solution to those particular problems.
Concurrency problems are another common source of “performance” problems. You’d be surprised about how many times a connection is waiting only for locked resources to become available. The user perceives this as a performance problem, when actually the server’s CPU is almost idle, just waiting for blocking locks to release their precious resources. Applying the right isolation level is key in this topic.
Security is another common missing point among developers. “If it doesn’t run as sa, it doesn’t run at all.” Have you ever heard this comment? How many database applications connect to SQL Server as sa and no password? Hackers know this, and they are extremely happy with this common practice.
However, perhaps the biggest mistake overall is lack of understanding about the data, its structure, and the way this particular database system manages data. I did mention earlier that a DBA should understand how ADO.NET works, and for the same reasons, a client side or middle layer programmer should have as much database knowledge as possible.
When using ADO.NET, what are some tips that they can follow in order to get the best performance out of their application and SQL Server?
It is difficult to mention here all possible answers to this question, but I could mention a few important ones.
Use the SQL Server .NET Data Provider to connect to SQL Server. This simple advice will provide a tremendous performance boost. If you need connecting to other data sources, you could be tempted to use a more general provider, such as the OLE DB .NET Data Provider, but what you’ll gain in flexibility you’ll loose in performance.
Try to use the ExecuteNonQuery method as much as possible, with SQLCommand objects, as this is the most efficient way to execute queries from ADO.NET. Use output parameters with SQLCommand objects if you need to retrieve just a few values, or a single data row, instead of using more expensive techniques, such as a SQLDataAdapter, a SQLDataReader, or a strongly typed DataSet.
Consider a DataSet only as a MINI-database, defined as a TINY collection of related data that serves a particular purpose. If any command that fills a dataset doesn’t have a WHERE clause, think twice about it. Data will need to traverse the network from the data server to the client, and there’s always limited bandwidth available. And the more data you retrieve, the more overhead for your database server, and the more concurrency problems you might experience.
“Get late, release early” or “Connects as late as possible, read as fast as possible, and disconnect as soon as possible.”
Can you tell us a little bit about how the new native access to SQL Server, via TDS, benefits performance? Why wasn’t this option available in previous versions of ADO?
Direct TDS access was available in previous data access techniques, in both ODBC and OLE DB. The OLE DB SQL Server Provider and the ODBC SQL Server Driver were both libraries designed to communicate natively with SQL Server via TDS, because this is the way that SQL Server uses natively to communicate with the outside world. The problem was that you needed to program directly the OLE DB Provider for SQL Server (SQLOLEDB) or the ODBC Driver, and they were designed by C++ programmers for C++ programmers. Using RDO or ADO was easier, but it represented an extra layer, and that meant more overhead and lower performance. Still, many programmers prefer an easier interface (ADO) over performance (native SQLOLEDB).
Also, the new provider runs in managed code, so there is better code control, fewer memory leaks, and eventually better error checking.
Another advantage of the new provider is the fact that the data access developer team is part of the same group as the SQL Server developer team, at last, so both development teams have a better understanding on how to optimize access to SQL Server from the client side.
For anyone curious about the different way that the new provider works, I’d recommend using Profiler to trace what actually arrives to SQL Server and try a few common things, with both ADODB and the SQL Server .NET Data Provider, and you will see a more natural translation of commands into Transact-SQL, with not many surprises for a DBA.