40 million rows and are increasing | SQL Server Performance Forums

SQL Server Performance Forum – Threads Archive

40 million rows and are increasing

hi,
I have a table which consists of around 40 million records in the dattabase.when i go for a select count it takes around 3-4 hours to show the results.Can you help me in optmising the query maby be partitioned views etc?

How about indexes and optimization of index and data for this table? _________
Satya SKJ
Moderator
SQL-Server-Performance.Com

Hi Satya thanks for the early repl there is no index on the field –> total_Amt [money]
The total_Amt is the field….which i am going to search on and male a sum(total_Amt ) Can you tell me what has to be done next?
If most of the queries are executed agains total_Amt then consider having an Index on that field with/without combination to other fields. _________
Satya SKJ
Moderator
SQL-Server-Performance.Com

Which index has to be created .Unique/Clustered or non clustered.?And how much in % do you think the performance will increase if i do "select sum(total_Amt) from billing".
there are around 40 million records in this table
Please help
Non clustered would be fine (I think) and check it yourself from QA using QUERY EXECUTION PLAN for the analaysis. _________
Satya SKJ
Moderator
SQL-Server-Performance.Com

When i run a query
"SELECT COUNT(*) AS Expr1 FROM bill_master"
it gives me timeout expired.How should i overcome this
What are the settings for timeout for QUERY & LOGIN on the server? _________
Satya SKJ
Moderator
SQL-Server-Performance.Com

Its 3600 seconds.
If you just need a count, and you can live with not the most updates results, you can get the number of rows from sysindexes. This value will be accurate after running update statistics SELECT rows
FROM sysindexes
WHERE id = object_id(‘table_name’)
AND indid < 2 Bambola.
Hey "bambola" that worked pretty fast.Its ok for count now
Can that be something done pretty fast on the foll query SELECT SUM(total_Amt) AS Expr1 FROM bill_Master Remeber there are around 40 million records in this table.
Yeah, valid reference by Bambola. Just another addition to this… This table is not updated on real time basis. So if you need the accurate figure, run DBCC UPDATEUSAGE. The same technique is used by sp_spaceused but aagin there is a chance of the sysindexes table not being updated. For the SUM query, the system will go for Table scan as there is no WHERE condition to limit the data. Gaurav
Moderator
Man thrives, oddly enough, only in the presence of a challenging environment- L. Ron Hubbard

Will creating an clustered index on the total_amt help increasing the
performance?Because this query takes me hours to show the results.
Please help its urgent.
I don’t think this will help as anyway the query has to have the entire table for running. Another solution is to create a trigger on the table and store the computed value at some other location. When any record is updated / inserted / deleted, the value will be updated. For the existing data you can calcuate using the old method once. Gaurav
Moderator
Man thrives, oddly enough, only in the presence of a challenging environment- L. Ron Hubbard

Ok
let me explain in details the table structure
Table A consists of an idx_number
Table B consist of
1) idx_number and
2) total_amt I will write a query like select sum(total_amt) from TABLE B where TABLE B.idx_number=TABLE A.idx_number
WHERE idx_number=’1234′ So what should i do for indexes ..shall i create non clustered inde x on Table A on
idx_number and total_amt column?Please help
Yes, you should index both columns. idx_number + total_amt. And yes, it will increase the
performance of your query since it will read all data from index pages and not scan the table. Bambola.
Valid response by Bambola, creating indexes as stated will increase performance. Sorry about the confusion of data pages Vs Index Pages. Gaurav
Moderator
Man thrives, oddly enough, only in the presence of a challenging environment- L. Ron Hubbard

ok
I wil do that any other suggestions like partitoned views etc….I have
read somewhere on the forum that partitoned views helps you to increase performance on large tables like this one.How do i do that ?does it really helps?
If you cann’t create indexes on these columns, partitioning data vertically (keeping the most accessed fields in one table and rest in saperate table) may help but that will mean a performance hit on rest of the queries. Gaurav
Moderator
Man thrives, oddly enough, only in the presence of a challenging environment- L. Ron Hubbard

Let me ask you something. What other queries do you run on this table? Bambola.
select sum(total_amt) from TABLE B where TABLE B.idx_number=TABLE A.idx_number
WHERE idx_number=’1234′ This is my main query Coming to your question there will be also fields 3 fields to be selected but that will be separate report the fields are like
FirstServedDate , LastServed_Date etc
ok i am trying to create a clustered index But it ask for Order what should i choose "Ascending" or desceding for both the
fields ?
Take into consideration that creating a clustered index means recreating the table! it will take time and space.
As for the order, I usually go with the default. Bambola.
If there is any other query that is run on the system, and has ORDER BY clause / TOP clause, use order by as spcified in the query. Gaurav
Moderator
Man thrives, oddly enough, only in the presence of a challenging environment- L. Ron Hubbard

Sorry actually i am creating a non clustered index not clustered.Now i will check it and see how much time it will take to show records

Hi ya, I would change the query to
select sum(total_amt) from TABLE where WHERE idx_number=’1234′ since idx_number is in both of the tables, so a join is not required creating an index will give a huge saving, another option is to create a view and index the view e.g. create view vw_sum_TABLE
with SCHEMABINDING
as
select idx_number, sum( total_amt )
from TABLE
group by idx_number
go create unique clustered index iuc_sum_TABLE on vw_sum_TABLE( idx_number ) You can then query the view rather than the table. This would work best if there are not that many unique idx_numbers in the TABLE table. i.e. lots of records per idx_number Cheers
Twan
Sorry Twan but i am not getting you *************************************************************
I would change the query to
select sum(total_amt) from TABLE where WHERE idx_number=’1234′
since idx_number is in both of the tables, so a join is not required
*************************************************************
without join how is it possible TABLE B.idx_number=TABLE A.idx_number
if there is no join query it willreturn multiple results wont it? my query is
*************************************************************
select sum(total_amt) from TABLE B where TABLE B.idx_number=TABLE A.idx_number
WHERE idx_number=’1234′
*************************************************************

A join is required if he is needs to get other columns from the other table, or if he needs rows that exists in both tables. In the second case EXISTS is a better choice. Otherwise you are right there’s no need for a join.
As for the clustered view, why creating one and not creating an index on the table?
I understand creating a clustered view when you already have a clustered index on table and
need another one. In this case creating a clustered view is like adding another clustered index on a table that already has one. Otherwise I don’t really understand the need of it. Bambola.
Yeah valid point by Twan… So small thing to miss… Can have big performace imppact. However, if SQLServer2K needs to verify if idx_number=’1234′ is present in TABLE A, he has to run another query or create a foreign key constraint on table A referencing Table B idx_number. Gaurav
Moderator
Man thrives, oddly enough, only in the presence of a challenging environment- L. Ron Hubbard

sqlServer2k, Are you joining the table with another one or with itself. Yout select statment is missing the join and I assumed you only meant to show the main things in the query… Bambola.
I don’t see a need for a FK to run this query if we are talking about 2 tables. A simple index should do the job. But before going further let’s see first if we are talking about 2 different tables or one. Bambola.
Let me clear some points The idx_number is basically genrated from Table A
and is also present as a foreign key to Table B Table B is the table which contains 40 milion records
Table A is the table which contains around 30 lakh records So selecting "idx_number from table A is a must" ********************************************************************************** select sum(40m.total_amt) from TABLE B as 40m ,TABLE A as 30k where 40m .idx_number=30k.idx_number
WHERE 30k.idx_number=’1234′
**********************************************************************************
Hope this is clear.Please help
I didn’t get the "lakh"… Bambola.

as others have mentioned, if the query only retrieves a sum then you won’t need to do the join. You won’t get multiple values since you are retrieving an agregated value with no group by. I’d use the index option since it is much simpler, but it will be ever so slightly slower than using the indexed view, since the view would be precalculated. So if you update the table infrequently or in batches, and you run the sum query often, and the sum query needs to be as fast as it could possibly be then options are:
– create an indexed view
– use triggers to maintain a sum table
– index the base table However if you do the sum not that often and performance is not that critical (ie. subsecond is ok, no need to be 10ms) then I’d only go as far as adding an index to the table Cheers
Twan
Ok i will reformat it and post it
The idx_number is basically genrated from Table A
and is also present as a foreign key to Table B Table B is the table which contains 40 million records
Table A is the table which contains around 30 k records So selecting "idx_number from table A is a must" ********************************************************************************** select sum(A1.total_amt) from TABLE B as A1 ,TABLE A as B1 where A1 .idx_number=B1.idx_number
WHERE B1.idx_number=’1234′
**********************************************************************************
Hope this is clear.Please help
1 lakh is 0.1 million. Why do you need the join in the query?
where 40m .idx_number=30k.idx_number
Do you want to verify if the record is present in 30K table as well? If the table has foreign key constraint, u don’t need to check that. Gaurav
Moderator
Man thrives, oddly enough, only in the presence of a challenging environment- L. Ron Hubbard


The tables will be updated every quarter
ok. So in this case I’ll try EXISTS. SELECT SUM(A1.total_amt)
FROM TABLE B A1
WHERE EXISTS
(
SELECT NULL
FROM TABLE_2
WHERE A1.idx_number= TABLE_2.idx_number
AND TABLE_2.idx_number=’1234′
) Bambola.

How often is the sum query run and how vital is its performance? an index on the base table is the best first step. If you find that perforance is still not as good then as part of the quarterly update you could run a query to get the new totals and save them in a table somewhere to index and query against select idx_number, sum( total_amt )
from 40m
group by idx_number Twan
I’d stay clear of the exists, since it would imply doing a table scan of the 40m table, an index scan of the 30k table and merge join them together, which doesn’t sound very optimal to me (but then I don’t have a 40m row table to play with <img src=’/community/emoticons/emotion-1.gif’ alt=’:)‘ /> )<br /><br />
How can this table scan avoided then Twan?
It will scan the small table if index is not selective enough, but it should so a seek on the big table. At least this is what I have seen on my test. But I don’t have a 40 mill record table to test it on. Bambola.
Ok what i think from our discussion is to create an Non-clustered index on
Table B with the two fields as total_amt and the idx_number
and then 1)write a join query? or
2)Create partitioned view with for Table B..?? I am little bit confused Please clarify
I’d say take it step by step <img src=’/community/emoticons/emotion-1.gif’ alt=’:)‘ /><br />Create the index than test some queries and according to the execution plan see what works best for you.<br />I’d be curious to know though what did you end up with.<br /><br />Bambola.
Please can you tell me what are the steps First one i see is to create a non clustered index on Table B then…?
First step is to create the index on the big table on idx_number + total_amt. <br />If you don’t have an index on the idx_number on the other table, create one too.<br />Then try the different queries and see what works better. We are still here so don’t worry <img src=’/community/emoticons/emotion-1.gif’ alt=’:)‘ /><br /><br />Bambola.

in reply to sqlserver2k only by creating a covering index for this query will you definitely avoid a table scan of the 40m table. So yas a non-clustered index on idx_number, total_amt would be the best first step. then try your queries and you should find a multiple orders of magnitude in performance improvement. If this is not sufficient then creating a summary table (or column on the 30K table) would give you further improvments. BUT you’d need to update this any time the 40m table was updated (i.e. quarterly) Cheers
Twan NOTE if there was a suitable index for SQL to do a seek then it would have done so in the original query since it knows that idx_number must equal ‘1234’ even if you specified that value for the joined column and not for the 40m table column

Hi,<br /> Thanks…..I have created an non clustered index… on Table B 30 mintues back<br />But i dont think it will be completed right now [<img src=’/community/emoticons/emotion-6.gif’ alt=’:(‘ />] ….<br />I think it is recreating the table…..or dont know whats happening<br />But i think it will take another 5-6 hours.Maybe we can talk tommorow morning…..<br />since its late night out here and i think it will be completed by morning only<br /> But guys you have helped me a lot.But please can you be there in the forum tommorow morning<br />That will be a GREAT GREAT HELP HELP.<br />I will try to implement each and every solution tommorow till the index is done<br />PLEASE PLEASE CAN YOU BE AVAILABLE TOMMOROW…SAME TIME<br /> OR I HAVE TO WAIT TONIGHT TILLL 5-6 HOURS FOR INDEXING TO BE COMPLETE<br /> HOPE YOU CAN UNDERSTAND<br /><br />Thanks<br /><br />
Don’t worry, we will be here <img src=’/community/emoticons/emotion-1.gif’ alt=’:)‘ /><br /><br />Bambola.
<blockquote id="quote"><font size="1" face="Verdana, Arial, Helvetica" id="quote">quote:<hr height="1" noshade id="quote"><br />I’d use the index option since it is much simpler, but it will be ever so slightly slower than using the indexed view, since the view would be precalculated.<br /><hr height="1" noshade id="quote"></blockquote id="quote"></font id="quote"><br />Twan – So you meant a clustered view on the results… I didn’t understand that <img src=’/community/emoticons/emotion-1.gif’ alt=’:)‘ /><br /><br />Bambola.<br />
Hi Bambola,<br /><br />yep I meant a clustered view (aka indexed or materialised view) depending on your db of choice <img src=’/community/emoticons/emotion-1.gif’ alt=’:)‘ /><br /><br />This is my first time to this site. It looks pretty cool!
> This is my first time to this site.
Good to have you here! > It looks pretty cool!
It is. Go over the articles and the tips and you will find a lot of interesting things. Bambola.
Hi everyone, I just 1 have question…why have an index on both the Id and value column? One of our tables has around 320,000,000 rows (Yea scary!!!!) and the Clustered index is only on the ID column, which is also a FK to another table (around 7,000,000 rows). If I do a sum on one of the value columns the result is instant. Here is my query…. SELECT sum(SalesHistoryQuantity)
FROM tbl_InventoryHistory A — 320,000,000 rows
INNER JOIN tbl_Inventory B — 7,000,000 rows
ON A.InventoryID = B.InventoryID
WHERE A.InventoryID = 365410 Regards
Sanette

the index on more than one columns applies mostly for nonclustered indexes and where many rows are retrieved,
try putting an index on some other column in the tbl_Inventory table, preferably with < 100 distinct values, then try the sum(SalesHistoryQuantity) query with both index options
Oops! [<img src=’/community/emoticons/emotion-3.gif’ alt=’:eek:‘ />)] I think I was misunderstood. We do not have a problem with speed when calculating a sum value on a table with over 320 Million rows. (The result is seen in less than a second!). I was trying to find out why "sqlServer2k" was having a problem doing the same calculation but on a much smaller table than ours. Why not just have 1 Clustered index in the idx_number?<br /><br />SQL Developer<br />JustEnough Software Corporation
I have a relatively small table (around 1 million records). if clustered index on ID is used, it is slightly slower than using a non clustered index on id + amount
G.R. Preethiviraj Kulasingham

]]>