Warehouse Query | SQL Server Performance Forums

SQL Server Performance Forum – Threads Archive

Warehouse Query

Hello All, I have a warehouse database which has tables like SalesFact[CustomerId, ProductId, DateId, NetSales, NetUnit] rows(3000000), Customer[CustomerId, CustomerName] rows (40000), Date[DateId, Month, Date, Year] rows(1000), Product[ProductId, ProductName] rows(100000). I need to write a query which shows the top N customer order by Sum(NetSales). If for a month there is no sales (rows) then that month should appear in the output with 0 rows and for a product in the specified date ranges. The output of the query should be like Customer1 ,Product1 ,Jan ,NetSales , NetUnits
Customer1 ,Product1 ,Feb ,NetSales , NetUnits
Customer1 ,Product1 ,Mar ,0 , 0 (if no rows for the Mar sales) …………………. Customer1 ,Product2 ,Jan ,NetSales , NetUnits
Customer1 ,Product2 ,Feb ,NetSales , NetUnits ………………… CustomerN ,ProductN, Mar, NetSales , NetUnits
….. OtherCustomers, ,Jan,NetSales , NetUnits
OtherCustomers, ,Feb,NetSales , NetUnits
OtherCustomers, ,Mar,NetSales , NetUnits
……………………………………….. OtherCustomers are the customer which are not fall under the TOP N condition.
As the rows in each tables are huge I need to write the optimized query. Could anybody suggest me what query and index would be best in this case. Thanks,
Vivek

Number of rows is pretty modest compared to number I deal with every day. It should’t be the problem if you just put indexes on pk and fk columns. If it is the most frequent/time consuming query by far, and you don’t have frequent updates you could add index on SalesFact (CustomerID, NetSales). Besides clasic join between fact and dimension tables you have to include join with:
Select top N CustomerID, Sum(NetSales)
from SalesFact
group by CustomerID
order by 2
Because (top) N is constant in mssql 2000 you will need to use dynamic sql if you need N as variable. I suggest to put your question in dw forum, someone may offer better solution I am not aware of.
]]>