How can I determine the maximum value for each data? And which option provides the best performance?

One of the most common requirements is to generate a list of, say, customers along with their last order date. Or, generally speaking, a list of distinct values in one table along with a corresponding maximum value from another table. To solve such problems you basically have three different alternatives. Let’s have a look at the methods first, and then discuss their respective performance. All methods are tested against the Northwind sample database.

Alternative 1
SELECT DISTINCT(CustomerID), OrderDate FROM orders t1
WHERE t1.orderdate=
(SELECT MAX(orderdate) FROM orders t2
WHERE t1.customerid=t2.customerid)
ORDER BY t1.customerid
Alternative 2
SELECT DISTINCT (t1.CustomerID), t1.OrderDate FROM orders t1 INNER JOIN
(SELECT customerid, MAX(orderdate) as Maxdate FROM orders GROUP BY customerid) t2
ON t1.customerid = t2.customerid
AND t1.orderdate = t2.MAXdate
ORDER BY t1.customerid
Alternative 3
SELECT DISTINCT(CustomerID), OrderDate
FROM orders t1
WHERE t1.orderdate IN
(
SELECT top 1 t2.orderdate
FROM orders t2
WHERE t2.customerid = t1.customerid
ORDER BY t2.orderdate DESC
)
ORDER BY t1.customerid All methods generate an identical output, which is not shown here. So which one is preferable? A look at the output of SET STATISTICS IO reveals some very interesting insights. (89 row(s) affected) Table ‘Orders’. Scan count 1, logical reads 21, physical reads 0, read-ahead reads 0. (89 row(s) affected) Table ‘Orders’. Scan count 1, logical reads 21, physical reads 0, read-ahead reads 0. (89 row(s) affected) Table ‘Orders’. Scan count 825, logical reads 130143, physical reads 0, read-ahead reads 0. To say the least, Alternative 3 is not a real option for a high-performance environment. But that’s not really surprising when you take a closer look at what the last method does. For each distinct CustomerID’s in the Orders table, it must scan the Orders table for the latest entry in OrderDate. There are only 89 distinct CustomerID in that table. So, we can rule out Alternative 3. That leaves us with 2 other methods. Is there one method superior to the other? A look at the output of SET STATISTICS IO tells us, both produce equal IO. A look at the execution plan shows that for both methods an identical plan is generated. So, internally, both queries are treated the same way by SQL Server. Since there is no differences between alternative 1 and 2, you might want to use that one which you feel more comfortable with or the one, which “looks better” to you.

]]>

Leave a comment Cancel reply