# Evil statistical type problem | SQL Server Performance Forums

SQL Server Performance Forum – Threads Archive

# Evil statistical type problem

I have been trying to solve this problem off and on for over a year and no one can help me with it. I am turning to you in desperation! I need to supply data points for a graph. The data set might be as follows: Job Start Date End Date Dollars
x 01/01/2003 04/30/2003 56,000
y 03/01/2003 07/30/2003 78,000
Z 06/01/2003 10/31/2003 45,000
w 09/01/2003 12/31/2003 62,000 The data points I need to get from my query are the cumulative dollars for the y-axis and the slope change points for the x-axis. So, for instance, the first graph point will be at (1/1/2003, \$0.00). The next date point is at 3/1/2003, the start of the second job. Between Jan 1 and Mar 1, the y-axis value will increase by \$470.50 per day, which is the average spend rate for Job 1: \$56,000 / (4.30.2003 â€“ 1.1.2003) = \$470.59. There are 59 days between Jan1 and Mar 1, so the y-axis at point 2 is \$27,764.71. Between Mar 1 and Apr 30, however, the daily y-increase is increased by the daily average of job Y, which is \$78,000 / (7.30.2003 â€“ 3.1.2003) = \$516.56. This must be added to the \$470.50 per day to come up with a new slope. The solution set I need for the above table is as follows: 01/01/2003 \$ 0.00
03/01/2003 \$ 27,764.71
04/30/2003 \$ 86,993.38
06/01/2003 \$103,523.18
07/30/2003 \$151,467.11
09/01/2003 \$161,236.84
10/31/2003 \$209,743.80
12/31/2003 \$241,000.00 If you can show me how to make a SQL Query return that result from the table above, or point me to someone who can, I would be really grateful. This problem has been hanging around for a long time!
Try this, MiscTest = Your table name. SELECT a.Dt, SUM(CASE
WHEN b.EndDate <= a.Dt THEN b.Dollars
WHEN b.StartDate >= a.Dt THEN 0
ELSE b.Dollars/DATEDIFF(dy, b.StartDate, a.Dt)
END) AS SumDollars
FROM (
SELECT StartDate AS Dt
FROM MiscTest
UNION
SELECT EndDate AS Dt
FROM MiscTest
) AS a
JOIN MiscTest b
ON (b.StartDate <= a.Dt)
GROUP BY a.Dt
ORDER BY a.Dt
That is awfully close, but the totals between the first point and the last point are off. When I ran the query I got the following: 2003-01-01 \$0
2003-03-01 \$949
2003-04-30 \$57,300
2003-06-01 \$56,847
2003-07-30 \$134,762
2003-09-01 \$134,489
2003-10-31 \$180,033
2003-12-31 \$241,000 As you can see, this is a little off from the desired reult set above. It even shows the dollar sum dropping between 7/30 and 9/1, which shouldn’t be possible. It looks like maybe it’s almost there, though. I will be trying to understand what you did, but if you (or anyone else) can figure out where things went awry in the meantime I would appreciate it very much. Thanks for the reply!!
I found the problem, just a math error. I was taking the dollar amount divided by the datediff of the start date and the current date, which should have been the dollar amount divided by datediff of the start date and end date mutiplied by the datediff of the start date and the current date. Corrected query below: SELECT a.Dt, SUM(CASE
WHEN b.EndDate <= a.Dt THEN b.Dollars
WHEN b.StartDate >= a.Dt THEN 0
ELSE (b.Dollars/DATEDIFF(dy, b.StartDate, b.EndDate)) * DATEDIFF(dy, b.StartDate, a.Dt)
END) AS SumDollars
FROM (
SELECT StartDate AS Dt
FROM MiscTest
UNION
SELECT EndDate AS Dt
FROM MiscTest
) AS a
JOIN MiscTest b
ON (b.StartDate <= a.Dt)
GROUP BY a.Dt
ORDER BY a.Dt
Thank you very much! That is working great. In the real app I have to do a subquery to get the dollar amount, which I will try tonight. I really appreciate your help with this.
]]>