Aggregation On Large Data | SQL Server Performance Forums

SQL Server Performance Forum – Threads Archive

Aggregation On Large Data

Hello Every 1,
this my first post to you, i don’t know if its a problem or its just i can’t handel it
my situation is as follows
i have a database stores IISlog file entries for analysis purposes, the problem is on performing aggreagtion functions on those data,my current sample is 2.500.000,it takes time more than 3 mins to perform simple select and group by query,i’m using index on most important field but resluts are not that good
i wish if any one of you can tell me the right thing to do
thanks in advance
Yours, M.M.Faramawi
3 min aggregating 2.5 million rows can be both as good as possible or it can be improved depending on hardware handling db. The question is: Is it good enough for you? If you need it to be improved, post your query table scripts (create table with constraints and create index scripts) so we can try to optimize it.
Hi,thanks for that fast reply
here is the only one table i have which it looks as the same structre as the IIS Log File
//************The Script************************
if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[IISLogs]’) and OBJECTPROPERTY(id, N’IsUserTable’) = 1)
drop table [dbo].[IISLogs]
GO CREATE TABLE [dbo].[IISLogs] (
[ID] [int] IDENTITY (1, 1) NOT NULL ,
[date] [smalldatetime] NULL ,
[time] [datetime] NULL ,
[c-ip] [varchar] (16) COLLATE Arabic_CI_AS NULL ,
[cs-username] [varchar] (20) COLLATE Arabic_CI_AS NULL ,
[s-sitename] [varchar] (20) COLLATE Arabic_CI_AS NULL ,
[s-computername] [varchar] (20) COLLATE Arabic_CI_AS NULL ,
[s-ip] [varchar] (16) COLLATE Arabic_CI_AS NULL ,
[s-port] [int] NULL ,
[cs-method] [varchar] (100) COLLATE Arabic_CI_AS NULL ,
[cs-uri-stem] [varchar] (100) COLLATE Arabic_CI_AS NULL ,
[cs-uri-query] [varchar] (5000) COLLATE Arabic_CI_AS NULL ,
[sc-status] [int] NULL ,
[sc-win32-status] [varchar] (10) COLLATE Arabic_CI_AS NULL ,
[sc-bytes] [int] NULL ,
[cs-bytes] [int] NULL ,
[time-taken] [int] NULL ,
[cs-version] [varchar] (16) COLLATE Arabic_CI_AS NULL ,
[cs-host] [varchar] (50) COLLATE Arabic_CI_AS NULL ,
[cs(User-Agent)] [varchar] (255) COLLATE Arabic_CI_AS NULL ,
[cs(Cookie)] [varchar] (5000) COLLATE Arabic_CI_AS NULL ,
[cs(Referer)] [varchar] (5000) COLLATE Arabic_CI_AS NULL
) ON [PRIMARY]
GO ALTER TABLE [dbo].[IISLogs] WITH NOCHECK ADD
CONSTRAINT [PK_ROICommander] PRIMARY KEY CLUSTERED
(
[ID]
) ON [PRIMARY]
GO CREATE INDEX [IX_ROICommander_DateTime] ON [dbo].[IISLogs]([date], [time]) ON [PRIMARY]
GO CREATE INDEX [IX_ROICommander_Client] ON [dbo].[IISLogs]([c-ip]) ON [PRIMARY]
GO CREATE INDEX [IX_ROICommander_Page] ON [dbo].[IISLogs]([cs-uri-stem]) ON [PRIMARY]
GO CREATE INDEX [IX_ROICommander_Status] ON [dbo].[IISLogs]([sc-status]) ON [PRIMARY]
GO CREATE INDEX [IX_ROICommander_Server] ON [dbo].[IISLogs]([s-ip], [s-port]) ON [PRIMARY]
GO CREATE INDEX [TimeTaken] ON [dbo].[IISLogs]([time-taken]) ON [PRIMARY]
GO
//********************************* Script End ********************************** and i would like to run a query like the following
SELECT [cs-uri-stem] as Request , COUNT(*) AS [Count],Min([Time-Taken]) as Max([Time-Taken])as MaxTime,AVG([Time-Taken]) as AVGTime
FROM IISLogs
GROUP BY [cs-uri-stem]
order by [count] desc for this query i tried it without aggregation it give me output in 3 sec, but when i added only one part wich was MIN i got results on 6 min,another strange happend i used to COUNT(IP) but this without aggregation gave me resluts in 1.5 min, when i put Count(*) i got those 3 sec i wich if and one can tell whats happedned and caused that time minimization Thanks in Advance
Yours,
M.M.Faramawi
quote:for this query i tried it without aggregation it give me output in 3 sec
Without aggregations query needs only cs-uri-stem values so it used index on this column.
quote:but when i added only one part wich was MIN i got results on 6 min
Now you need values of columns cs-uri-stem and [Time-Taken] so index on these two columns (in that order) will speed-up this query.
quote:another strange happend i used to COUNT(IP) but this without aggregation gave me resluts in 1.5 min
I guess you had count(distinct s-ip) (or it was c-ip?). You don’t have composit index on cs-uri-stem and s-ip so table scan was used.
quote:when i put Count(*) i got those 3 sec i wich if and one can tell whats happedned and caused that time minimization
Count(*) return number of rows for specific grouped by column, so all info are in index on cs-uri-stem.
]]>