Fun with Reporting Queries, Part 1 | SQL Server Performance Forums

SQL Server Performance Forum – Threads Archive

Fun with Reporting Queries, Part 1

Greetings folks,<br /><br />I’m working on some reports for a project and I’ve run into a performance wall. The reporting aspect of this project imposes many restrictions on the construction of queries. The major restriction is that filtering must be done in the WHERE clause, not the ON section of the JOIN clause. Here’s the setup:<br /><pre id="code"><font face="courier" size="2" id="code">begin transaction<br />GO<br />create table a (<br /> k1 int not null — Major group<br /> , k2 int not null — Minor group<br /> , k3 int not null — Object<br /> , k4 int not null — Sub-Object<br /> , d1 varchar(20) null — Description<br /> , constraint PK_a primary key clustered (k1, k2, k3, k4)<br />)<br />create table b (<br /> k1 int not null — Major group<br /> , k2 int not null — Minor group<br /> , k3 int not null — Object<br /> , k4 smallint not null — Year<br /> , d1 money null — Price<br /> , constraint PK_b primary key clustered (k1, k2, k3, k4)<br />)<br />insert a values (1, 1, 0, 0, ‘abcdefg’)<br />insert a values (1, 1, 1, 0, ‘hijklmnop’)<br />insert a values (2, 2, 2, 0, ‘qrstuv’)<br />insert b values (1, 1, 0, 2000, $25)<br />insert b values (1, 1, 0, 2001, $50)<br />insert b values (1, 1, 0, 2002, $75)<br />insert b values (2, 2, 2, 2000, $5)<br />GO<br />set showplan_text on<br />GO<br />select *<br />from a<br /> left hash join b<br /> on a.k1 = b.k1<br /> and a.k2 = b.k2<br /> and a.k3 = b.k3<br />where a.k1 = 1<br /> and a.k2 = 1<br />GO<br />set showplan_text off<br />GO<br />rollback transaction<br />GO</font id="code"></pre id="code">The goal is to select <u>all</u> records from table a in a <u>specific</u> minor group and any corresponding records in table b. (I specified a hash join to simulate the join type that would be chosen for a large number of table rows.) Here is the execution plan generated:<pre id="code"><font face="courier" size="2" id="code"> |–Hash Match(Left Outer Join, <br /> HASH<img src=’/community/emoticons/emotion-6.gif’ alt=’:(‘ />[a].[k1], [a].[k2], [a].[k3])=(<b>.[k1], <b>.[k2], <b>.[k3]), <br /> RESIDUAL<img src=’/community/emoticons/emotion-6.gif’ alt=’:(‘ />([a].[k1]=<b>.[k1] AND [a].[k2]=<b>.[k2]) <br /> AND [a].[k3]=<b>.[k3]))<br /> |–Clustered Index Seek(OBJECT<img src=’/community/emoticons/emotion-6.gif’ alt=’:(‘ />[db].[dbo].[a].[PK_a]), <br /> SEEK<img src=’/community/emoticons/emotion-6.gif’ alt=’:(‘ />[a].[k1]=1 AND [a].[k2]=1 AND [a].[k3]=1) <br /> ORDERED FORWARD)<br /> |–Clustered Index Scan(OBJECT<img src=’/community/emoticons/emotion-6.gif’ alt=’:(‘ />[db].[dbo].<b>.[PK_b]))</font id="code"></pre id="code">As you can see, the optimizer chooses to do a table scan on the inner table, even though the WHERE clause specifies values for non-nullable columns in the outer table that are directly joined to non-nullable columns in the inner table. Is there some reason that it must do a scan as opposed to a seek on table b? It seems like "forwarding" the selection criteria to the inner table would be a no-brainer for the optimizer. I’ve spent hours trying to find a solution to this problem, but to no avail. Adding <pre id="code"><font face="courier" size="2" id="code">… and b.k1 = 1 and b.k2 = 1 …</font id="code"></pre id="code"> just makes it into an inner join, which is not what I need. It seems like the only way to get the data in a timely manner is to include the filter criteria in the ON section of the JOIN clause, which is not possible given the restrictions of the reporting context.<br /><br />Any ideas on how to optimize this query without violating the reporting restrictions? I can create views, but table-valued functions and FROM clause subqueries are a no-go.<br /><br />Thanks for any advice you can offer. If there is a solution out there for this problem, then there’s always Part 2…<br /><br /><br />-Hartmut5
]]>