Please optimize this query | SQL Server Performance Forums

SQL Server Performance Forum – Threads Archive

Please optimize this query

I need to create a report on the traffic distribution to our site.
That means I have to find the number of people from each country. The IP lookup table:

fromIP bigint
toIP bigint
Country varchar(20)

The query I have is:

select country, count(*) from iplookup, traffic
where fromip <= traffic.ip and traffic.ip <= toIP

I have millions of records per day and it takes 7 minutes for 10,000 records. How can I improve the runtime? Note: an IP is represented in number form, e.g. 192.168.0.1 = 192 * 256^3 + 168 * 256^2 + 0 * 256 + 1.
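For example, the conversion can be written in T-SQL like this (just a sketch; PARSENAME splits the dotted string into its octets, and the arithmetic follows the formula above):

SELECT CAST(PARSENAME('192.168.0.1', 4) AS bigint) * 16777216  -- 256^3
     + CAST(PARSENAME('192.168.0.1', 3) AS bigint) * 65536     -- 256^2
     + CAST(PARSENAME('192.168.0.1', 2) AS bigint) * 256
     + CAST(PARSENAME('192.168.0.1', 1) AS bigint)              -- result: 3232235521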
Did you look at the execution plan? Maybe you need an index.
Luis Martin
Moderator
SQL-Server-Performance.com
Although nature commences with reason and ends in experience, it is necessary for us to do the opposite, that is, to commence with experience and from this to proceed to investigate the reason.
Leonardo Da Vinci
Never expect recognition from your children; that will happen after your death.
All postings are provided “AS IS” with no warranties for accuracy.
I tried creating a clustered index on ipfrom and ipto, but it doesn't help.
The Index Tuning Wizard has no indexing suggestions.
Could you post both tables (columns and attributes)? Luis Martin
Moderator
SQL-Server-Performance.com
Although nature commences with reason and ends in experience, it is necessary for us to do the opposite, that is, to commence with experience and from this to proceed to investigate the reason.
Leonardo Da Vinci
Never expect recognition from your children; that will happen after your death.
All postings are provided “AS IS” with no warranties for accuracy.
The table structures:

IPLookup:
FromIP bigint
ToIP bigint
Country varchar(20)

Tracking table:
TrackingID int PK
TrackingIP bigint
TrackingTimestamp datetime
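In DDL form, that is roughly (a sketch based only on the columns listed; nullability and other attributes are not specified):

CREATE TABLE IPLookup (
    FromIP bigint,
    ToIP bigint,
    Country varchar(20)
)

CREATE TABLE Tracking (
    TrackingID int PRIMARY KEY,
    TrackingIP bigint,
    TrackingTimestamp datetime
)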
You can't possibly be running that query against those tables. The column and table names in your select don't match what you just posted. Run your query like this (changing it so it works):

SET STATISTICS IO ON
SET STATISTICS PROFILE ON
SET STATISTICS TIME ON

select country, count(*) from iplookup, traffic
where fromip <= traffic.ip and traffic.ip <= toIP

SET STATISTICS IO OFF
SET STATISTICS PROFILE OFF
SET STATISTICS TIME OFF

Paste both the results and the messages tab output here (minus the actual result set of data), wrapped in [ q u o t e ] [ / q u o t e ] tags without the spaces. Also, how many rows are in both of those tables when you run this query against 10,000 rows? What is the criterion for selecting only 10k rows? Have you tried putting non-clustered indexes on TrackingIP, FromIP, and ToIP? Also, you might not be getting any mileage at all out of your indexes, depending on what the data in those tables looks like, since you are basically using a BETWEEN clause to join these two tables together. Last, run the query like this:

SET STATISTICS IO ON
SET STATISTICS PROFILE ON
SET STATISTICS TIME ON

SELECT
    il.country,
    count(1)
FROM
    IPLookup il
    INNER LOOP JOIN Tracking t ON il.FromIP <= t.TrackingIP
        AND t.TrackingIP <= il.ToIP
GROUP BY
    il.country

SET STATISTICS IO OFF
SET STATISTICS PROFILE OFF
SET STATISTICS TIME OFF

Again, you have to correct either the code you posted or the table structures you posted, so this whole thing makes sense to us.
MeanOldDBA
[email protected] When life gives you a lemon, fire the DBA.
The query can't be faster unless you add a date range restriction in the where clause. If you do so, you also need to make sure you have an appropriate index.
Also, a group by clause is missing from your query.
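For illustration, the restricted query could look something like this (a sketch; the column names follow the structures posted above, and the one-day range is just an example):

select country, count(*)
from iplookup, tracking
where fromip <= tracking.trackingip
  and tracking.trackingip <= toip
  and tracking.trackingtimestamp >= '20060301'
  and tracking.trackingtimestamp < '20060302'
group by country

With an index on the timestamp, SQL Server only has to touch one day's rows instead of every row ever tracked.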
mmarovic, it would be better if you could give more constructive input.
The date range is up to the report requirement. "Need to make sure you have an appropriate index" — what's appropriate? Can you elaborate instead of giving common sense?
"Also, a group by clause is missing from your query." I can't get a run time unless the query has the right syntax.
Why don't you give me a faster query suggestion instead of picking on a typo?
Hi ya, Mirko means that your query as it is will return way too many rows; it should have a group by clause, as in:

select country, count(*) from iplookup, traffic
where fromip <= traffic.ip and traffic.ip <= toIP
group by country

How many rows are in traffic, since this query is going to scan the entire table? I'd put a clustered index on fromip, toip, and a nonclustered index on traffic.ip, as sketched below. Finally, I'd change count(*) to count(country), assuming country is never null. Cheers
Twan
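A minimal sketch of those indexes (the index names are made up for illustration, the table names follow the structures posted earlier, and traffic/Tracking are assumed to be the same table):

CREATE CLUSTERED INDEX CIX_IPLookup_FromIP_ToIP ON IPLookup (FromIP, ToIP)
CREATE NONCLUSTERED INDEX IX_Tracking_TrackingIP ON Tracking (TrackingIP)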
kungfung_lam, I always give constructive input. However, you didn't provide a working query. You can't have an aggregate function (count) and a column in the same query without a group by clause.
That means we don't know what query we are supposed to optimize. Moreover, even if you did just forget the group by, there is no where clause, so it looks like your query returns all records from the big table (unless the lookup table contains only part of the possible ranges). So if you don't add a where clause, no significant improvement is possible. I hope it is clear now. However, I don't like being accused of being "not constructive" when I spent my time trying to help. So this is my last answer to any problem you post on this board.
One more thing I forgot, about indexes (this is really my last answer): what do you think is the appropriate index when searching on a date range? Common sense is enough to find the answer.
1 day to 1 month.
1 day has 73,641 rows.
1 month has 1,965,766 rows. I also mentioned in the first post that I only tested with the first 10K rows of 1 day, and it runs for about 7 minutes:
"I have millions of records per day and it takes 7 minutes for 10,000 records."
So your common sense is not good enough to observe that. Please elaborate on your index suggestion and explain how, if you know databases.
OK, so you have a clustered index on traffic.trackingtimestamp then? Cheers
Twan
Yes, I have a clustered index on the time.
I have another clustered index on fromip and toip, but it doesn't help the runtime.
Hi ya, and does the query you have include the group by too? Could you please post the results of the command after first running "SET SHOWPLAN_TEXT ON"? Cheers
Twan
This is a technical forum, so it is important that we, those trying to help, get as much accurate technical information as possible. The performance problem might not be in the query alone but also in the configuration and the hardware itself. Please provide us with the execution plan so we can determine whether you have an optimal index/execution path. Do you have a baseline? Do you have another server with the same hardware spec? If you do, run a baseline generator on both servers and see if there is a difference. If there is, then the problem might not be in the query itself but in the SQL config or OS config. May the Almighty God bless us all!
www.empoweredinformation.com
If you want help, listen. Don't criticize people trying to help you. If you can't post code that would at least compile, you don't have a performance issue yet. You can't have performance issues when nothing works. MeanOldDBA
[email protected] When life gives you a lemon, fire the DBA.
Here is what I ran:

SET STATISTICS IO ON
SET STATISTICS PROFILE ON
SET STATISTICS TIME ON

select countryshort, count(*) from tbl_IPCountry,
(select top 1000 * from tbl_tracking) tbl_tracking
where ipfrom <= dbo.getIPNumber(tracking_IP) and dbo.getIPNumber(tracking_ip) <= ipTo
group by countryshort

SET STATISTICS IO OFF
SET STATISTICS PROFILE OFF
SET STATISTICS TIME OFF

Result (how can I make it not that nested?): it takes 1 hr 37 mins.

Execution plan (operators only):

|--Compute Scalar(DEFINE:([Expr1004]=Convert([globalagg1006])))
     |--Hash Match(Aggregate, HASH:([tbl_IPCountry].[countrySHORT]), DEFINE:([globalagg1006]=SUM([partialagg1005])))
          |--Nested Loops(Inner Join, OUTER REFERENCES:([Tbl_Tracking].[Tracking_IP]))
               |--Hash Match(Aggregate, HASH:([Tbl_Tracking].[Tracking_IP]), DEFINE:([partialagg1005]=COUNT(*)))
               |    |--Top(1000)
               |         |--Clustered Index Scan(OBJECT:([MEDIAWHIZ_SURVEY_NEW_DB].[dbo].[Tbl_Tracking].[export_friendly_index]))
               |--Filter(WHERE:([dbo].[getIPNumber]([Tbl_Tracking].[Tracking_IP])<=[tbl_IPCountry].[ipTO]))
                    |--Clustered Index Seek(OBJECT:([MEDIAWHIZ_SURVEY_NEW_DB].[dbo].[tbl_IPCountry].[testIP]), SEEK:([tbl_IPCountry].[ipFROM] <= [dbo].[getIPNumber]([Tbl_Tracking].[Tracking_IP])) ORDERED FORWARD)
I think you are being killed by your getIPNumber function call.
I had another customer who used string IPs; when the table was rebuilt to use a binary(4) IP, which has less overhead than a function call, there was a significant improvement. Basically: convert the IP to binary(4) and store it in the table that way,
convert your input IP to binary(4),
then write your query so there is no function call.
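A rough sketch of that conversion, assuming the IP is held as a dotted string (the variable name is made up; PARSENAME just splits the octets):

DECLARE @ip varchar(15)
SET @ip = '192.168.0.1'

-- pack each octet into one byte, giving a 4-byte value (0xC0A80001 here)
SELECT CAST(CAST(PARSENAME(@ip, 4) AS tinyint) AS binary(1))
     + CAST(CAST(PARSENAME(@ip, 3) AS tinyint) AS binary(1))
     + CAST(CAST(PARSENAME(@ip, 2) AS tinyint) AS binary(1))
     + CAST(CAST(PARSENAME(@ip, 1) AS tinyint) AS binary(1)) AS ip_bin

binary(4) values compare in the same order as the underlying addresses, so a BETWEEN against ipfrom/ipto still works once those columns are stored as binary(4) as well.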

Can someone please help with this problem?
As Joe suggested, SQL will perform faster against a number (or binary) than against a varchar. I suggest you time the following statements:

--check how long this statement takes
SELECT TOP 1000 dbo.getIPNumber(tracking_IP)
FROM tbl_tracking
GO

--time this separately
select countryshort, count(*)
from tbl_IPCountry
WHERE 3232235521 between ipfrom and ipTo --hardcoded number for 192.168.0.1
group by countryshort
GO

--time this separately
select countryshort, count(*)
from tbl_IPCountry
WHERE dbo.getIPNumber('192.168.0.1') between ipfrom and ipTo --hardcoded
group by countryshort
GO

Hopefully, this will give you an idea of where the issue is. May the Almighty God bless us all!
www.empoweredinformation.com
SELECT TOP 1000 dbo.getIPNumber(tracking_IP)
FROM tbl_tracking
GO

It takes 0 seconds.

select countryshort, count(*)
from tbl_IPCountry
WHERE dbo.getIpnumber('192.168.0.1') between ipfrom and ipTo --hardcoded
group by countryshort

It takes 1 second.

Turn getIpnumber into a table function. Join to the table function instead of calling it each time in the WHERE clause. This will let you take full advantage of your indexes. That should help you out quite a bit. (It's hard to troubleshoot this without knowing exactly what the getIpnumber function does.)
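As a simpler variant in the same spirit (a sketch only, since the definition of getIpnumber isn't known; the table and column names come from the query posted above), you could compute the IP number once per tracking row in a derived table and join on the resulting column:

select countryshort, count(*)
from tbl_IPCountry c
join (select top 1000 dbo.getIPNumber(tracking_IP) as ipnum
      from tbl_tracking) t
  on t.ipnum between c.ipfrom and c.ipTo
group by countryshort

Whether the optimizer really evaluates the function only once per tracking row depends on the plan it picks, so compare the STATISTICS PROFILE output before and after.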
MeanOldDBA
[email protected] When life gives you a lemon, fire the DBA.
When I said to time the statement, I meant not the time when it starts executing but when it finishes, that is, after all 1000 records are returned. The reason I am asking is that the query has an outer reference to tbl_tracking, which does not use the index that well with the other IP. This means that the 1000 records are constrained by the other conditions, not necessarily by the subquery selecting 1000 records. Can you tell us whether you want to (1) extract the first 1000 records of tbl_tracking and then check whether they meet the condition, or (2) extract the top 1000 records of tbl_tracking that meet the condition? Which one is functionally equivalent:

SELECT countryshort, count(*)
FROM
    tbl_IPCountry TC LEFT JOIN (SELECT DISTINCT TOP 1000 dbo.getIPNumber(tracking_IP) IP
                                FROM tbl_tracking
                               ) AS TA ON (TA.IP between tc.ipFrom and tc.ipTo)
GROUP BY countryshort

OR

SELECT countryshort, count(*)
FROM
    tbl_IPCountry TC JOIN (SELECT DISTINCT TOP 1000 dbo.getIPNumber(tracking_IP) IP
                           FROM tbl_tracking
                          ) AS TA ON (TA.IP between tc.ipFrom and tc.ipTo)
GROUP BY countryshort

To time a statement, wrap it like this:

DECLARE
    @Start DATETIME,
    @End DATETIME
SET @Start = GETDATE()
-- run the query to be timed here
SET @End = GETDATE()
SELECT DATEDIFF(ms, @Start, @End) AS elapsed_ms

May the Almighty God bless us all!
www.empoweredinformation.com