Fundamental index question | SQL Server Performance Forums

SQL Server Performance Forum – Threads Archive

Fundamental index question

Ok, here’s the deal. First let me start by saying that I am, by no means, a SQL Server expert much less an SQL guru. I do, however, have a database with a primary table that has more than 100 Million rows that is growing at a rate of roughly 5 million rows a day. Originally my biggest problem was getting the data into the database fast enough – with considerable tuning I was able to get this down to roughly 60 seconds each hour which works just fine. That was great but I was then left with a write-only [<img src=’/community/emoticons/emotion-4.gif’ alt=’:p‘ />] database as the queries were taking FOREVER to complete but, even worse, the results were totally inconsistent.<br /><br />The primary table has a clustered index with a number of foreign keys. When I write a query that accesses the table using the the index but referencing the names in the foreign keys the performance is terrible. However, when I execute the same query with the foreign keys fully resolved the performance is sub-second. In other words;<br /><br />select * from mytable, fktable1, fktable2, fktable3<br /> where fktable1.name = "fkname1" and fktable2.name = "fkname2" <br /> and fktable3.name = "fkname3"<br /> and mytable.fk1 = fktable1.id and mytable.fk2 = fktable2.id<br /> and mytable.fk3 = fktable3.id<br /><br />totally sucks (looking at the execution path it uses only part of the index (scanning something like 2million rows) when it could reasonably use the entire index (and scan less than a few hundred rows). However, the query<br /><br />select * from mytable<br /> where fk1 = fkvalue1 and fk2 = fkvalue2 and fk3 = fkvalue3<br /><br />totally rocks.<br /><br />I have re-written the queries so that they always use the second form but fundamentally feel that I should, reasonably, be able to achieve the same level of performance from the first. Performance is excellent all of the time but the application logic is much more involved and results in less flexibility.<br /><br />Can anyone explain why I can’t get the performance out of the first query – i.e. why isn’t it actually using the index when all of the values are specified, albeit in the foreign keys? Can it be re-written so that it is fundamentally the same (i.e. using the actual names in the foreign keys)?<br /><br />Any comments, thoughts, suggestions or guidance would be greatly appreciated.<br />
I’ve moved to developers tuning. You will get more answers there (I think…). Luis Martin
Moderator
SQL-Server-Performance.com One of the symptoms of an approaching nervous breakdown is the belief that one’s work is terribly important
Bertrand Russell
All postings are provided “AS IS” with no warranties for accuracy.
Thanks – I hope…
Checkhttp://www.sqlskills.com/blogs/kimberly/PermaLink.aspx?guid=1de5cfcc-493d-4874-aed4-7fb330f9eaf7 for relevant information. Satya SKJ
Moderator
http://www.SQL-Server-Performance.Com/forum
This posting is provided “AS IS” with no rights for the sake of knowledge sharing.
You should make sure that statistics on both lookup tables and main table are up-to-date. I doubt it is not the case with lookup tables (fktable1-fktable3). Do you have pk and indexes on fktables? How many rows do you have there?
Satya, Thanks for the pointer. Right now I don’t have a performance problem so the generalized reading, although interesting and potentially helpful, it’s particularly important right now. Can you point me to a specific posting that addresses my specific question? Mmarovic, Thanks for the reply. I thought that statistics are maintained automatically unless you instruct SQL Server not to – I have not done this. The foreign key tables all have indexes on them and are relatively small (a few hundred rows on all but the largest which I think is about 5K rows). Thanks again..
Satya, Thanks for the pointer. Right now I don’t have a performance problem so the generalized reading, although interesting and potentially helpful, it’s not particularly important right now. Can you point me to a specific posting that addresses my specific question? Mmarovic, Thanks for the reply. I thought that statistics are maintained automatically unless you instruct SQL Server not to – I have not done this. The foreign key tables all have indexes on them and are relatively small (a few hundred rows on all but the largest which I think is about 5K rows). Thanks again..
<blockquote id="quote"><font size="1" face="Verdana, Arial, Helvetica" id="quote">quote:<hr height="1" noshade id="quote">I thought that statistics are maintained automatically unless you instruct SQL Server not to – I have not done this. The foreign key tables all have indexes on them and are relatively small (a few hundred rows on all but the largest which I think is about 5K rows).<hr height="1" noshade id="quote"></font id="quote"></blockquote id="quote">I didn’t say statistics are not maintained I said you should make sure they are [<img src=’/community/emoticons/emotion-2.gif’ alt=’:D‘ />]. <br />If foreign key tables are so small then statistics might not be maintained on main table. If not, then this is strange that so bad execution plan is selected. However it happens from time to time, not as often as in previous mssql server releases, but it still happens. <br /><br />You can fix the execution plan by using index, join and/or force order hints. However, the problem with that approach may be that depending on different query parameters different exectution plan can be optimal.
Performing intermittent update statistics will help to keepup the performance, but perform it during less traffic hours on database. Satya SKJ
Moderator
http://www.SQL-Server-Performance.Com/forum
This posting is provided “AS IS” with no rights for the sake of knowledge sharing.
[quote
You can fix the execution plan by using index, join and/or force order hints. However, the problem with that approach may be that depending on different query parameters different exectution plan can be optimal.
Thanks again. I will check the statistics but suspect that, even if they are somewhat old, they are still valid on the smaller tables. You’re right, I can’t force the use of a specific index since there are three possibilities, all are used, and there’s no way to know programatically which one it should pick. How would "join" affect the executation plan and can you give an example for me to try? Thanks!
Hi ps2. Right now I don’t have enough time to elaborate exactly why it happens, but I have explanation and I’ll post it later here. Be patient, as soon as I have time (tomorrow I guess) I’ll post explanation.
Not a problem, I understand complete. Whenever you get a chance, an explanation would be great – there’s no hurry. I look forward to hearing from you.
Thanks again…
About statistics check for null and old ones. SELECT
RTRIM(object_name(I.id)) tablename,
RTRIM(name) name,
DATALENGTH (statblob) size,
STATS_DATE (I.id, I.indid) last_updated
FROM
sysindexes as I
WHERE
OBJECTPROPERTY(I.id, N’IsUserTable’) = 1 AND
INDEXPROPERTY (I.id , name , ‘IsAutoStatistics’ ) = 1 and
DATALENGTH (statblob) is null order by tablename,
last_updated ASC, size DESC Luis Martin
Moderator
SQL-Server-Performance.com One of the symptoms of an approaching nervous breakdown is the belief that one’s work is terribly important
Bertrand Russell
All postings are provided “AS IS” with no warranties for accuracy.
There are situation where fresh statistics doesn’t help. Consider next scenario: Orders table supporting order management system contains StatusID column which is fk to LookupStatuses table. Possible statuses are: StatusID = 0, StatusName = ‘Submitted’
StatusID = 1, StatusName = ‘Approved’
StatusID = 2, StatusName = ‘Rejected’
StatusID = 3, StatusName = ‘Canceled’
StatusID = 4, StatusName = ‘Re-submitted’
….
StatusID = k, StatusName = <step k has completed>
….
StatusID = N, StatusName = ‘Completed’ Let’s assume there are more then 10 statuses. Statuses ‘Canceled’ and ‘Completed’ are final statuses, which means that more then 99% orders stored in Orders table are either Completed or canceled. Let’s take a look how would mssql server query optimizer resolve next query having up-to date statistics:
Select <something>
from Orders o
join LookupStatuses s on o.StatusID = s.StatusID
join <another lookup table> l on l.<l pk column> = s.<another fk column>
where s.StatusName = ‘Completed’
and <condition on another lookup table> To make it simple let’s say that there are just two possibly efficient execution plans: Plan 1:
Select all rows from another lookup table based on condition in where clause and then use loop join and index on another fk column to select matching rows from Orders table.
For each row from Orders table matching row from another lookup table will be selected (another loop join) to check condition on that table. Plan 2:
Select all rows from LookupStatuses where StatusName = ‘Completed’ and then use loop join and index on StatusID column to select matching rows from Orders table.
For each row from orders table matching row from LookupStatuses will be selected to check condition StatusName = ‘Completed’
To estimate costs for plan 2 there are fresh statistics available so QO knows that there is just one row out of 10+ from LookupStatuses table that matches condition.
QO doesn’t know which StatusID value it is since it uses statistics and doesn’t query the table, so it estimates that less then 10% rows from Orders table will be accessed using that execution plan.
However, estimation is far from reality because 99% percent of Orders rows would be selected. Number of rows in orders table gives the majority of exec plan costs and if condition on another lookup table is estimated to return more then 10% of rows that plan is not going to be used even though it is more efficient one. For the reasons mentioned when you have query: Select <something>
from Orders o
join LookupStatuses s on o.StatusID = s.StatusID
join <another lookup table> l on l.<l pk column> = s.<another fk column>
where s.StatusName = ‘Approved’
and <condition on another lookup table>
The first table filtered may be another lookup table even though starting with LookupStatus table is much more efficient (since less then 1% of rows would be selected from Orders table)
]]>