Ask help on aggregate-median calculation | SQL Server Performance Forums

SQL Server Performance Forum – Threads Archive

Ask help on aggregate-median calculation

Hi friends, It might not be hard to write a median function for a certain value. The problem is, when it is used as an aggregation function, the performance is getting terribly poor. I would appreciate if somebody could help on this. Following is the problem description and the code I am currently using: Two tables CREATE TABLE [dbo].[bmBranchSku] (
[skuID] [varchar] (20) NOT NULL ,
[branchID] [varchar] (10) NOT NULL ,
[branchSkuLastSalePrice] [decimal](18, 2) NULL
) ON [PRIMARY]
GO CREATE TABLE [dbo].[bmSku] (
[skuID] [varchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
[categoryID] [varchar] (12) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[skuTotalSale] [decimal](18, 2) NULL ,
[skuBmSalePriceMedian] [decimal](18, 2) NULL
) ON [PRIMARY]
GO The purpose is to calculatie the median value of each sku (in bmSku table) and store them to bmSku.skuBmSalePriceMedian field. bmSku table in my database has over 5,000 rows. Because there are 15 branches, the bmBranchSku has about 15*5,000 = 75,000 rows. Here is the T-SQL script I am using(I assume both tables are full of data): update bmSku
set skuBmSalePriceMedian = dbo.fn_bmBranchSku_branchSkuLastSalePrice_median(skuID) CREATE function fn_bmBranchSku_branchSkuLastSalePrice_median(@skuID varchar(20))
returns decimal(18,2) as
begin
declare @factor1 decimal(18,2)
declare @factor2 decimal(18,2) select top 50 percent
@factor1 = branchSkuLastSalePrice
from bmBranchSku
where skuID = @skuID
and branchSkuLastSalePrice > 0
order by branchSkuLastSalePrice asc if @@rowcount % 2 = 1
set @factor2 = @factor1 — when rowcount is odd
else
select top 50 percent
@factor2 = branchSkuLastSalePrice
from bmBranchSku
where skuID = @skuID
and branchSkuLastSalePrice > 0
order by branchSkuLastSalePrice desc return (@factor1 + @factor2) / 2
end I think the reason why it is so slow is because for each sku, the function have to scan the huge bmBranchSku table. Even through I have build index for every field. Thank you very much! Haitao
I don’t know how you define "terribly poor performance", but here’s how I calculate the median.<br /><pre><br />IF OBJECT_ID(‘median’) IS NOT NULL<br />DROP TABLE median<br />GO<br />CREATE TABLE median(<br />col1 INT<br />)<br />GO <br />INSERT INTO median (col1) VALUES (1)<br />INSERT INTO median (col1) VALUES (2)<br />INSERT INTO median (col1) VALUES (3)<br />INSERT INTO median (col1) VALUES (3)<br />INSERT INTO median (col1) VALUES (4)<br />INSERT INTO median (col1) VALUES (7)<br />INSERT INTO median (col1) VALUES (<img src=’/community/emoticons/emotion-11.gif’ alt=’8)’ /><br />INSERT INTO median (col1) VALUES (9)<br /><br />SELECT <br />((SELECT <br />MIN(Col1)<br />FROM<br />(SELECT TOP 50 PERCENT <br />Col1<br />FROM <br />Median<br /> ORDER BY <br />Col1 <br />DESC) a) +<br />(SELECT <br />MAX(Col1)<br />FROM<br /> (SELECT TOP 50 PERCENT <br />Col1<br /> FROM <br />Median<br />ORDER BY <br />Col1) a))/2. AS Median<br /><br />Median <br />—————— <br />3.500000<br /><br />(1 row(s) affected)<br /><br /></pre><br />This is the financial median. And it should work with both odd and even numbers of rows<br /><br />In case you need the statistical median, you can do something like:<br /><pre><br />SELECT <br />MAX(Col1) AS Median<br />FROM<br />(SELECT TOP 50 PERCENT <br />Col1<br />FROM <br />Median<br />ORDER BY <br />Col1) a<br /><br />Median <br />———– <br />3<br /><br />(1 row(s) affected)<br /></pre><br />The number of rows you have is not that huge, So with proper indexes the performance shouldn’t be too bad. Generally I tend to avoid doing statistics on the server. That is by far easier done at the client.<br /><br />———————–<br />–Frank<br /<a target="_blank" href=http://www.insidesql.de>http://www.insidesql.de</a><br />———————–<br />
Hey, thanks Frank,<br /><br />The median calculation took over 4 min based on the amount of data I mentioned, while the other tens(possibly over 100) of queries took less than 4 min. That is how I define ‘poor performance’. I might be too greedy <img src=’/community/emoticons/emotion-1.gif’ alt=’:)‘ /><br /><br />What I really want is an aggregate median function doing job like this<br />——————————-<br /> select skuID, MEDIAN(price)<br /> from branchSku<br /> GROUP BY skuID<br />——————————-<br />So for each median value, the query will focus on the data within the small group rather than scaning the whole table.
Hey Frank, I sovled the problem by changing the code based on your advice!!!!! Fantastic!!!! Thank you very much, Frank!!!! New sql statement is as below select skuID,
((select max(branchSkuLastSalePrice)
from
(select top 50 percent branchSkuLastSalePrice
from bmBranchSku
where skuID = b.skuID and branchSkuLastSalePrice > 0
order by branchSkuLastSalePrice asc
) fristHalf
) +
(
select min(branchSkuLastSalePrice)
from
(select top 50 percent branchSkuLastSalePrice
from bmBranchSku
where skuID = b.skuID and branchSkuLastSalePrice > 0
order by branchSkuLastSalePrice desc
) secondHalf
)) / 2 as median, skuBmSalePriceMedian
from bmSku b Haitao
By the way, the new query took less than 2 sec – doing job as if it is a built-in aggregate function.
Glad to see it works ———————–
–Frank
http://www.insidesql.de
———————–

]]>