SQL Server Performance

40 million rows and are increasing

Discussion in 'T-SQL Performance Tuning for Developers' started by sqlServer2k, Aug 11, 2003.

  1. sqlServer2k New Member

    hi,
    I have a table which consists of around 40 million records in the dattabase.when i go for a select count it takes around 3-4 hours to show the results.Can you help me in optmising the query maby be partitioned views etc?
  2. satya Moderator

    How about indexes and optimization of index and data for this table?

    _________
    Satya SKJ
    Moderator
    SQL-Server-Performance.Com
  3. sqlServer2k New Member

    Hi Satya thanks for the early repl

    there is no index on the field --> total_Amt [money]
    The total_Amt is the field....which i am going to search on and male a sum(total_Amt )

    Can you tell me what has to be done next?

  4. satya Moderator

    If most of the queries are executed agains total_Amt then consider having an Index on that field with/without combination to other fields.

    _________
    Satya SKJ
    Moderator
    SQL-Server-Performance.Com
  5. sqlServer2k New Member

    Which index has to be created .Unique/Clustered or non clustered.?And how much in % do you think the performance will increase if i do "select sum(total_Amt) from billing".
    there are around 40 million records in this table
    Please help
  6. satya Moderator

    Non clustered would be fine (I think) and check it yourself from QA using QUERY EXECUTION PLAN for the analaysis.

    _________
    Satya SKJ
    Moderator
    SQL-Server-Performance.Com
  7. sqlServer2k New Member

    When i run a query
    "SELECT COUNT(*) AS Expr1 FROM bill_master"
    it gives me timeout expired.How should i overcome this
  8. satya Moderator

    What are the settings for timeout for QUERY & LOGIN on the server?

    _________
    Satya SKJ
    Moderator
    SQL-Server-Performance.Com
  9. sqlServer2k New Member

    Its 3600 seconds.
  10. bambola New Member

    If you just need a count, and you can live with not the most updates results, you can get the number of rows from sysindexes. This value will be accurate after running update statistics

    SELECT rows
    FROM sysindexes
    WHERE id = object_id('table_name')
    AND indid < 2

    Bambola.
  11. sqlServer2k New Member

    Hey "bambola" that worked pretty fast.Its ok for count now
    Can that be something done pretty fast on the foll query

    SELECT SUM(total_Amt) AS Expr1 FROM bill_Master

    Remeber there are around 40 million records in this table.
  12. gaurav_bindlish New Member

    Yeah, valid reference by Bambola. Just another addition to this... This table is not updated on real time basis. So if you need the accurate figure, run DBCC UPDATEUSAGE.

    The same technique is used by sp_spaceused but aagin there is a chance of the sysindexes table not being updated.

    For the SUM query, the system will go for Table scan as there is no WHERE condition to limit the data.

    Gaurav
    Moderator
    Man thrives, oddly enough, only in the presence of a challenging environment- L. Ron Hubbard
  13. sqlServer2k New Member

    Will creating an clustered index on the total_amt help increasing the
    performance?Because this query takes me hours to show the results.
    Please help its urgent.
  14. gaurav_bindlish New Member

    I don't think this will help as anyway the query has to have the entire table for running.

    Another solution is to create a trigger on the table and store the computed value at some other location. When any record is updated / inserted / deleted, the value will be updated. For the existing data you can calcuate using the old method once.

    Gaurav
    Moderator
    Man thrives, oddly enough, only in the presence of a challenging environment- L. Ron Hubbard
  15. sqlServer2k New Member

    Ok
    let me explain in details the table structure
    Table A consists of an idx_number
    Table B consist of
    1) idx_number and
    2) total_amt

    I will write a query like

    select sum(total_amt) from TABLE B where TABLE B.idx_number=TABLE A.idx_number
    WHERE idx_number='1234'

    So what should i do for indexes ..shall i create non clustered inde x on Table A on
    idx_number and total_amt column?Please help

  16. bambola New Member

    Yes, you should index both columns. idx_number + total_amt. And yes, it will increase the
    performance of your query since it will read all data from index pages and not scan the table.

    Bambola.
  17. gaurav_bindlish New Member

    Valid response by Bambola, creating indexes as stated will increase performance. Sorry about the confusion of data pages Vs Index Pages.

    Gaurav
    Moderator
    Man thrives, oddly enough, only in the presence of a challenging environment- L. Ron Hubbard
  18. sqlServer2k New Member

    ok
    I wil do that any other suggestions like partitoned views etc....I have
    read somewhere on the forum that partitoned views helps you to increase performance on large tables like this one.How do i do that ?does it really helps?
  19. gaurav_bindlish New Member

    If you cann't create indexes on these columns, partitioning data vertically (keeping the most accessed fields in one table and rest in saperate table) may help but that will mean a performance hit on rest of the queries.

    Gaurav
    Moderator
    Man thrives, oddly enough, only in the presence of a challenging environment- L. Ron Hubbard
  20. bambola New Member

    Let me ask you something. What other queries do you run on this table?

    Bambola.
  21. sqlServer2k New Member

    select sum(total_amt) from TABLE B where TABLE B.idx_number=TABLE A.idx_number
    WHERE idx_number='1234'

    This is my main query

    Coming to your question there will be also fields 3 fields to be selected but that will be separate report the fields are like
    FirstServedDate , LastServed_Date etc
  22. sqlServer2k New Member

    ok i am trying to create a clustered index

    But it ask for Order what should i choose "Ascending" or desceding for both the
    fields ?
  23. bambola New Member

    Take into consideration that creating a clustered index means recreating the table! it will take time and space.
    As for the order, I usually go with the default.

    Bambola.
  24. gaurav_bindlish New Member

    If there is any other query that is run on the system, and has ORDER BY clause / TOP clause, use order by as spcified in the query.

    Gaurav
    Moderator
    Man thrives, oddly enough, only in the presence of a challenging environment- L. Ron Hubbard
  25. sqlServer2k New Member

    Sorry actually i am creating a non clustered index not clustered.Now i will check it and see how much time it will take to show records
  26. Twan New Member

    Hi ya,

    I would change the query to
    select sum(total_amt) from TABLE where WHERE idx_number='1234'

    since idx_number is in both of the tables, so a join is not required

    creating an index will give a huge saving, another option is to create a view and index the view

    e.g.

    create view vw_sum_TABLE
    with SCHEMABINDING
    as
    select idx_number, sum( total_amt )
    from TABLE
    group by idx_number
    go

    create unique clustered index iuc_sum_TABLE on vw_sum_TABLE( idx_number )

    You can then query the view rather than the table. This would work best if there are not that many unique idx_numbers in the TABLE table. i.e. lots of records per idx_number

    Cheers
    Twan
  27. sqlServer2k New Member

    Sorry Twan but i am not getting you

    *************************************************************
    I would change the query to
    select sum(total_amt) from TABLE where WHERE idx_number='1234'
    since idx_number is in both of the tables, so a join is not required
    *************************************************************
    without join how is it possible

    TABLE B.idx_number=TABLE A.idx_number
    if there is no join query it willreturn multiple results wont it?

    my query is
    *************************************************************
    select sum(total_amt) from TABLE B where TABLE B.idx_number=TABLE A.idx_number
    WHERE idx_number='1234'
    *************************************************************
  28. bambola New Member

    A join is required if he is needs to get other columns from the other table, or if he needs rows that exists in both tables. In the second case EXISTS is a better choice. Otherwise you are right there's no need for a join.
    As for the clustered view, why creating one and not creating an index on the table?
    I understand creating a clustered view when you already have a clustered index on table and
    need another one. In this case creating a clustered view is like adding another clustered index on a table that already has one. Otherwise I don't really understand the need of it.

    Bambola.
  29. gaurav_bindlish New Member

    Yeah valid point by Twan... So small thing to miss... Can have big performace imppact.

    However, if SQLServer2K needs to verify if idx_number='1234' is present in TABLE A, he has to run another query or create a foreign key constraint on table A referencing Table B idx_number.

    Gaurav
    Moderator
    Man thrives, oddly enough, only in the presence of a challenging environment- L. Ron Hubbard
  30. bambola New Member

    sqlServer2k, Are you joining the table with another one or with itself. Yout select statment is missing the join and I assumed you only meant to show the main things in the query...

    Bambola.
  31. bambola New Member

    I don't see a need for a FK to run this query if we are talking about 2 tables. A simple index should do the job. But before going further let's see first if we are talking about 2 different tables or one.

    Bambola.

  32. sqlServer2k New Member

    Let me clear some points

    The idx_number is basically genrated from Table A
    and is also present as a foreign key to Table B

    Table B is the table which contains 40 milion records
    Table A is the table which contains around 30 lakh records

    So selecting "idx_number from table A is a must"

    **********************************************************************************

    select sum(40m.total_amt) from TABLE B as 40m ,TABLE A as 30k where 40m .idx_number=30k.idx_number
    WHERE 30k.idx_number='1234'
    **********************************************************************************
    Hope this is clear.Please help
  33. bambola New Member

    I didn't get the "lakh"...

    Bambola.
  34. Twan New Member

    as others have mentioned, if the query only retrieves a sum then you won't need to do the join. You won't get multiple values since you are retrieving an agregated value with no group by.

    I'd use the index option since it is much simpler, but it will be ever so slightly slower than using the indexed view, since the view would be precalculated.

    So if you update the table infrequently or in batches, and you run the sum query often, and the sum query needs to be as fast as it could possibly be then options are:
    - create an indexed view
    - use triggers to maintain a sum table
    - index the base table

    However if you do the sum not that often and performance is not that critical (ie. subsecond is ok, no need to be 10ms) then I'd only go as far as adding an index to the table

    Cheers
    Twan
  35. sqlServer2k New Member

    Ok i will reformat it and post it


    The idx_number is basically genrated from Table A
    and is also present as a foreign key to Table B

    Table B is the table which contains 40 million records
    Table A is the table which contains around 30 k records

    So selecting "idx_number from table A is a must"

    **********************************************************************************

    select sum(A1.total_amt) from TABLE B as A1 ,TABLE A as B1 where A1 .idx_number=B1.idx_number
    WHERE B1.idx_number='1234'
    **********************************************************************************
    Hope this is clear.Please help
  36. gaurav_bindlish New Member

    1 lakh is 0.1 million.

    Why do you need the join in the query?
    where 40m .idx_number=30k.idx_number
    Do you want to verify if the record is present in 30K table as well? If the table has foreign key constraint, u don't need to check that.

    Gaurav
    Moderator
    Man thrives, oddly enough, only in the presence of a challenging environment- L. Ron Hubbard
  37. sqlServer2k New Member


    The tables will be updated every quarter
  38. bambola New Member

    ok. So in this case I'll try EXISTS.

    SELECT SUM(A1.total_amt)
    FROM TABLE B A1
    WHERE EXISTS
    (
    SELECT NULL
    FROM TABLE_2
    WHERE A1.idx_number= TABLE_2.idx_number
    AND TABLE_2.idx_number='1234'
    )

    Bambola.
  39. Twan New Member

    How often is the sum query run and how vital is its performance?

    an index on the base table is the best first step. If you find that perforance is still not as good then as part of the quarterly update you could run a query to get the new totals and save them in a table somewhere to index and query against

    select idx_number, sum( total_amt )
    from 40m
    group by idx_number

    Twan
  40. Twan New Member

    I'd stay clear of the exists, since it would imply doing a table scan of the 40m table, an index scan of the 30k table and merge join them together, which doesn't sound very optimal to me (but then I don't have a 40m row table to play with <img src='/community/emoticons/emotion-1.gif' alt=':)' /> )<br /><br />
  41. sqlServer2k New Member

    How can this table scan avoided then Twan?
  42. bambola New Member

    It will scan the small table if index is not selective enough, but it should so a seek on the big table. At least this is what I have seen on my test. But I don't have a 40 mill record table to test it on.

    Bambola.
  43. sqlServer2k New Member

    Ok what i think from our discussion is to create an Non-clustered index on
    Table B with the two fields as total_amt and the idx_number
    and then

    1)write a join query? or
    2)Create partitioned view with for Table B..??

    I am little bit confused

    Please clarify
  44. bambola New Member

    I'd say take it step by step <img src='/community/emoticons/emotion-1.gif' alt=':)' /><br />Create the index than test some queries and according to the execution plan see what works best for you.<br />I'd be curious to know though what did you end up with.<br /><br />Bambola.
  45. sqlServer2k New Member

    Please can you tell me what are the steps

    First one i see is to create a non clustered index on Table B

    then...?
  46. bambola New Member

    First step is to create the index on the big table on idx_number + total_amt. <br />If you don't have an index on the idx_number on the other table, create one too.<br />Then try the different queries and see what works better. We are still here so don't worry <img src='/community/emoticons/emotion-1.gif' alt=':)' /><br /><br />Bambola.
  47. Twan New Member


    in reply to sqlserver2k

    only by creating a covering index for this query will you definitely avoid a table scan of the 40m table. So yas a non-clustered index on idx_number, total_amt would be the best first step.

    then try your queries and you should find a multiple orders of magnitude in performance improvement.

    If this is not sufficient then creating a summary table (or column on the 30K table) would give you further improvments. BUT you'd need to update this any time the 40m table was updated (i.e. quarterly)

    Cheers
    Twan

    NOTE if there was a suitable index for SQL to do a seek then it would have done so in the original query since it knows that idx_number must equal '1234' even if you specified that value for the joined column and not for the 40m table column
  48. sqlServer2k New Member

    Hi,<br /> Thanks.....I have created an non clustered index... on Table B 30 mintues back<br />But i dont think it will be completed right now [<img src='/community/emoticons/emotion-6.gif' alt=':(' />] ....<br />I think it is recreating the table.....or dont know whats happening<br />But i think it will take another 5-6 hours.Maybe we can talk tommorow morning.....<br />since its late night out here and i think it will be completed by morning only<br /> But guys you have helped me a lot.But please can you be there in the forum tommorow morning<br />That will be a GREAT GREAT HELP HELP.<br />I will try to implement each and every solution tommorow till the index is done<br />PLEASE PLEASE CAN YOU BE AVAILABLE TOMMOROW...SAME TIME<br /> OR I HAVE TO WAIT TONIGHT TILLL 5-6 HOURS FOR INDEXING TO BE COMPLETE<br /> HOPE YOU CAN UNDERSTAND<br /><br />Thanks<br /><br />
  49. bambola New Member

    Don't worry, we will be here <img src='/community/emoticons/emotion-1.gif' alt=':)' /><br /><br />Bambola.
  50. bambola New Member

    <blockquote id="quote"><font size="1" face="Verdana, Arial, Helvetica" id="quote">quote:<hr height="1" noshade id="quote"><br />I'd use the index option since it is much simpler, but it will be ever so slightly slower than using the indexed view, since the view would be precalculated.<br /><hr height="1" noshade id="quote"></blockquote id="quote"></font id="quote"><br />Twan - So you meant a clustered view on the results... I didn't understand that <img src='/community/emoticons/emotion-1.gif' alt=':)' /><br /><br />Bambola.<br />
  51. Twan New Member

    Hi Bambola,<br /><br />yep I meant a clustered view (aka indexed or materialised view) depending on your db of choice <img src='/community/emoticons/emotion-1.gif' alt=':)' /><br /><br />This is my first time to this site. It looks pretty cool!
  52. bambola New Member

    > This is my first time to this site.
    Good to have you here!

    > It looks pretty cool!
    It is. Go over the articles and the tips and you will find a lot of interesting things.

    Bambola.
  53. SanetteWessels New Member

    Hi everyone,

    I just 1 have question...why have an index on both the Id and value column? One of our tables has around 320,000,000 rows (Yea scary!!!!) and the Clustered index is only on the ID column, which is also a FK to another table (around 7,000,000 rows). If I do a sum on one of the value columns the result is instant. Here is my query....

    SELECT sum(SalesHistoryQuantity)
    FROM tbl_InventoryHistory A -- 320,000,000 rows
    INNER JOIN tbl_Inventory B -- 7,000,000 rows
    ON A.InventoryID = B.InventoryID
    WHERE A.InventoryID = 365410

    Regards
    Sanette
  54. joechang New Member

    the index on more than one columns applies mostly for nonclustered indexes and where many rows are retrieved,
    try putting an index on some other column in the tbl_Inventory table, preferably with < 100 distinct values, then try the sum(SalesHistoryQuantity) query with both index options
  55. SanetteWessels New Member

    Oops! [<img src='/community/emoticons/emotion-3.gif' alt=':eek:' />)] I think I was misunderstood. We do not have a problem with speed when calculating a sum value on a table with over 320 Million rows. (The result is seen in less than a second!). I was trying to find out why "sqlServer2k" was having a problem doing the same calculation but on a much smaller table than ours. Why not just have 1 Clustered index in the idx_number?<br /><br />SQL Developer<br />JustEnough Software Corporation
  56. preethi Member

    I have a relatively small table (around 1 million records). if clustered index on ID is used, it is slightly slower than using a non clustered index on id + amount


    G.R. Preethiviraj Kulasingham

Share This Page