# I really need an efficient way to do this...

Discussion in 'General Developer Questions' started by airjrdn, Jul 27, 2005.

1. ### airjrdnNew Member

I'd like some input on potentially more efficient ways of doing this. It seems overly complicated and I'm hoping there's a better way.

In this code, the following represents:
---------------------------------------
Product = Table with products for sale
ProductPrice = Table with Prices for the products
* Note that the same product purchased at different times may have different prices
** Note that the product may have multiple price entries for the same EffDate - newest UpdateDate should be chosen
*** Since products could have been priced when the first entry for an EffDate was in effect, those records must be retained

[other code here]
from Product p
Inner Join ProductPrice pp On p.a = pp.a
and p.b = pp.b
and p.c = pp.c
and p.PurchaseDate >= pp.EffDate-- Price must have been in effect at this purchase date
and pp.EffDate = (SelectMax(EffDate)-- Newest price that was in effect for this product
FromProductPrice pp2
Wherep.a = pp2.a
andp.b = pp2.b
andp.c = pp2.c
andpp2.EffDate <= p.PurchaseDate)
and pp.UpdateDate = (SelectMax(UpdateDate)-- Newest price with that effective date
FromProductPrice pp2
Wherep.a = pp2.a
andp.b = pp2.b
andp.c = pp2.c
andpp2.EffDate = pp.EffDate)

Posting table structures, sample data and expected result would be helpful to give the solution

Failing to plan is Planning to fail
3. ### FrankKalisModerator

USE Northwind
SELECT t1.OrderID, t1.Quantity FROM [order details] t1
WHERE t1.Quantity=
(SELECT MAX(Quantity) FROM [order details] t2
WHERE t1.orderid=t2.orderid)
ORDER BY t1.orderid

SELECT t1.* FROM [order details] t1 INNER JOIN
(SELECT orderid, MAX(Quantity) AS maxdate FROM [order details] GROUP BY orderid) t2
ON t1.orderid = t2.orderid
AND t1.Quantity = t2.maxdate
ORDER BY t1.orderid

are two other alternatives. Are you experienceing any performance issues with your query?

--
Frank Kalis
Microsoft SQL Server MVP
http://www.insidesql.de
Ich unterstÃ¼tze PASS Deutschland e.V. http://www.sqlpass.de)
4. ### airjrdnNew Member

I'm pretty reluctant to post structures and actual code without getting prior approval. I tried as best I could to post very similar code though.

As for performance issues, yes. The process this runs within takes around 12 hours to run. Granted, we have a large # of records, but this is an integral part of the overall process.

We process just under a million records a day. For each of these records, we currently run through this process. For business reasons, we may end up having to run through this same process multiple times per record.

I'm contemplating a process that would pre-build a table, having already determined the most recently updated record per date, then simply doing a 1 to 1 join to that table. While I think this would be the most efficient, I was wondering if someone might have stumbled across something in the past that would be a more efficient way of accomplishing this.

One of the problems may be that your query's resultset contains duplicate lines because of the following line in your JOIN definition: p.PurchaseDate >= pp.EffDate. If you're suppressing the duplicate lines with a DISTINCT clause, this will slow down your query. From what I can tell, you should be able to evaluate this bit in an EXISTS clause, which doesn't produce duplicate rows.

I would also use derived tables to give you the MAX(EffDate) and MAX(UpdateDate) - use GROUP BY on the columns on which you will link the main table to each derived table.
6. ### airjrdnNew Member

They aren't being suppressed with a distinct, the "and pp.UpdateDate = (SelectMax(UpdateDate)" line handles that.<br /><br />The realism is that we will get records today stating that the same product was purchased on multiple dates. The price may be different depending on the date. We need the latest price that was in effect when the product was purchased.<br /><br />Clear as mud isn't it? <img src='/community/emoticons/emotion-1.gif' alt='' /><br />

Okay, let's do this one step at a time.

You need a query that tells you, for each actual purchase date of each product, what the most recent price was at that time.

Assuming that the product is defined by columns a, b and c, this would be my guess:

SELECT Product.a, Product.b, Product.c, Product.PurchaseDate,
--correlated subquery - start
(SELECT TOP 1 pp.Price FROM ProductPrice pp
WHERE pp.a = Product.a AND pp.b = Product.b AND pp.c = Product.c
AND pp.EffDate <= Product.PurchaseDate ORDER BY pp.EffDate DESC, pp.UpdateDate DESC) AS Effective_Price
--correlated subquery - end
FROM Product

Let's look at that subquery to see if I got it right:

SELECT TOP 1 pp.Price FROM ProductPrice pp
WHERE pp.a = Product.a AND pp.b = Product.b AND pp.c = Product.c
AND pp.EffDate <= Product.PurchaseDate
ORDER BY pp.EffDate DESC, pp.UpdateDate DESC

The "TOP 1" clauses should make sure I have only one return value.
The "WHERE" clause makes sure it's
(1) for the same product (a, b and c columns)
(2) for EffDate on or before the PurchaseDate
The "ORDER BY" clause makes sure it's
(1) for the most recent EffDate on or before the PurchaseDate
(2) for the most recently updated record for the most recent EffDate
8. ### airjrdnNew Member

I think you've got it right. Let me put some code together to test it for performance.

Edit...

One thing not accounted for...the updatedate.

There could be two entries w/the same EffectiveDate. We use whichever one was updated (inserted really) most recently.

So it's the newest effective date, and if there's more than one, it's the one with the newest updatedate.

No, the UpdateDate IS being accounted for, as we ORDER BY both EffectiveDate DESC and UpdateDate DESC, in that order.

So if you happen to have two entries with the same EffectiveDate at the top of the list, the one with the most recent UpdateDate comes out on top.

Is it possible to have two entries with the same EffectiveDate and UpdateDate? Then you need to add a third column to the ORDER BY clause to break the tie.
10. ### airjrdnNew Member

No, there shouldn't be a conflict w/the update date. I'll give this a whirl.
11. ### airjrdnNew Member

Sorry for the delay, I just now had a chance to give this a shot.

The table in the original code that's equivalent to ProductPrice here is referenced 45 (yeah, 45) times in the original procedure in the select. Obviously, it wouldn't be more efficient to do the subselect that many times. I tried joining to it as a derived table, but in a derived table, you can't join back out to the outer tables. Meaning this won't work:

SELECT Product.a, Product.b, Product.c, Product.PurchaseDate, PurchPrice.Price
FROM Product
INNER JOIN (SELECT TOP 1 pp.Price FROM ProductPrice pp
WHERE pp.a = Product.a AND pp.b = Product.b AND pp.c = Product.c
AND pp.EffDate <= Product.PurchaseDate ORDER BY pp.EffDate DESC, pp.UpdateDate DESC) PurchPrice

Is there any other way to make it work that you can think of?

12. ### FrankKalisModerator

I might be missing something, but have you noticed the second SELECT I've posted in my first reply?

SELECT t1.* FROM [order details] t1
INNER JOIN
(SELECT orderid, MAX(Quantity) AS maxdate
FROM [order details]
GROUP BY orderid) t2 ON t1.orderid = t2.orderid
AND t1.Quantity = t2.maxdate ORDER BY t1.orderid

Isn't that applicable to what you need?

--
Frank Kalis
Microsoft SQL Server MVP
http://www.insidesql.de
Ich unterstÃ¼tze PASS Deutschland e.V. http://www.sqlpass.de)
13. ### airjrdnNew Member

Trying now...

I believe that in SQL 2005 you will be able to create a "derived table" at the start of your stored procedure, and then access that as if it were a regular table.

Of course the same can already be accomplished with a temp table that holds "most recent Price per EffDate per product" like so:

CREATE TABLE #TMP (a VARCHAR(5), b. VARCHAR(5), c VARCHAR(5), EffDate DATETIME, Price FLOAT)
-- You may want to add some indexing here ...

INSERT INTO #TMP
SELECT pp.a, pp.b, pp.c, pp.EffDate, pp.Price
FROM ProductPrice pp
WHERE NOT EXISTS
(SELECT * FROM ProductPrice pp2
WHERE pp2.a = pp.a AND pp2.b = pp.b AND pp2.c = pp.c
AND pp2.EffDate = pp.EffDate AND pp2.UpdateDate > pp.UpdateDate)

-- (1) Cases where PurchaseDate matches an EffDate
SELECT p.a, p.b, p.c, p.PurchaseDate, t.Price
FROM Product p
INNER JOIN #TMP ON p.a = #TMP.a AND p.b = #TMP.b AND p.c = #TMP.c
AND p.PurchaseDate = #TMP.EffDate
-- (2) cases without matching EffDate: we match on latest EffDate before PurchaseDate
UNION ALL
SELECT p.a, p.b, p.c, p.PurchaseDate, t1.Price
FROM Product p
INNER JOIN #TMP t1 ON p.a = t1.a AND p.b = t1.b AND p.c = t1.c
WHERE NOT EXISTS
(SELECT t2.a FROM #TMP t2
WHERE p.a = t2.a AND p.b = t2.b AND p.c = t2.c AND p.PurchaseDate = t2.PurchaseDate)
AND t1.EffDate =
(SELECT MAX(t3.EffDate) FROM #TMP t3
WHERE p.a = t3.a AND p.b = t3.b AND p.c = t3.c AND t3.EffDate < p.PurchaseDate)
15. ### airjrdnNew Member

That's actually what I spent some time doing late last week.

Before running the select, I pre-built the table with the latest updatedate for each effective date. On a join condition with relatively few rows, it took 23% of the time while the original code took 77% of the time. However, when the join conditions changed, it took 47% while the original took 53%.

I've got an idea I'd like to try for a clustered index, but given the current processes that are running, I can't really try it right now.

FrankKalis - I'm running the version similar to what you posted now.

Not sure what you mean by "when the join conditions changed" - the join statement is static?

If you're comparing the response time of queries in a batch, you should clear out buffers before, inbetween and after to get an accurate timing. This can be done with some DBCC commands that hopefully someone will add to this thread ... and you probably should not do that on a production server.
17. ### airjrdnNew Member

I got permission to post the actual code. Given the formatting, I wasn't sure how to make it look "nice" in here, so I simply pasted it within a code block. Note that even this is a stripped down version, the actual code is updating these fields in a table. I've converted it to a select for testing purposes.

Also, there are of course other tables, etc. being used here, but this should give you a better idea of what I'm working with.

Does this help?

selectCDRArchive.CDRRecordID,
When 2 Then ((AdminRateCode.MsgRateAmt * (CDRArchive.CDRNoSurCostAmt/CDRArchive.BMOU))/CDRArchive.BMOU)
Else 0
When 2 Then (AdminRateCode.MinRateAmt * (CDRArchive.CDRNoSurCostAmt/CDRArchive.BMOU))
Else 0
End
When 2 Then (AdminRateCode.MinRateAmt * (CDRArchive.CDRNoSurCostAmt/CDRArchive.BMOU))
Else 0
End
When 2 Then ((AdminRateCode.MsgRateAmt * (CDRArchive.CDRNoSurCostAmt/CDRArchive.BMOU))/CDRArchive.BMOU)
Else 0
End
Else Null
End,
When 2 Then ((AdminRateCode.MsgConsortiumRateAmt * (CDRArchive.CDRNoSurCostAmt/CDRArchive.BMOU))/CDRArchive.BMOU)
Else 0
When 2 Then (AdminRateCode.MinConsortiumRateAmt * (CDRArchive.CDRNoSurCostAmt/CDRArchive.BMOU))
Else 0
End
When 2 Then (AdminRateCode.MinConsortiumRateAmt * (CDRArchive.CDRNoSurCostAmt/CDRArchive.BMOU))
Else 0
End
When 2 Then ((AdminRateCode.MsgConsortiumRateAmt * (CDRArchive.CDRNoSurCostAmt/CDRArchive.BMOU))/CDRArchive.BMOU)
Else 0
End
Else Null
End,
ContractRateCodeID4 = Case When AdminRateCode.MsgConsortiumRateAmt Is Not Null Or AdminRateCode.MinConsortiumRateAmt Is Not Null
Else Null
End,
CDRStatsUpdateFlag = Left(Cast(AdminRateCode.RateApplyCodeID As Varchar(10)), 1),
intoDefaultDB..QwestRateTest_Orig_Version
FromQwestTieredMonthly.dbo.tbCDRMonthly CDRArchive-- try a clustered index on CarrierID, ProductID, JurisdictionID, ProductTypeCodeID, DACodeID
--FromQwestMonthly.dbo.tbCDRMonthly_200506 CDRArchive
Inner Join CDRRating.dbo.tbCarrierProductCode CarrierProductCode OnCDRArchive.CarrierID = CarrierProductCode.CarrierID
AndCDRArchive.ProductID = CarrierProductCode.ProductID
AndCDRArchive.JurisdictionID = CarrierProductCode.JurisdictionID
AndCDRArchive.ProductTypeCodeID = CarrierProductCode.ProductTypeCodeID
AndCDRArchive.DACodeID = CarrierProductCode.DACodeID
InnerJoin ANPICorporate.dbo.tbState StateCode OnCDRArchive.StateCalling = StateCode.StateCode
Andz.EffectiveDate <= CDRArchive.CallDate)
WhereCarrierProductCode.CarrierID = 12
--WhereCarrierProductCode.CarrierID = 8
FromCDRRating.dbo.tbRateSchedule RS
OnRS.RateScheduleID = AM.RateScheduleID
InnerJoin ANPICorporate.dbo.tbState SC
OnCDRArchive.StateCalling = SC.StateCode
Andz2.EffectiveDate <= CDRArchive.CallDate)
InnerJoin CDRRating.dbo.tbCarrierProductCode CPC
OnAM.CarrierProductCodeID = CPC.CarrierProductCodeID
WhereCPC.CarrierID = CDRArchive.CarrierID
AndCPC.ProductID = CDRArchive.ProductID
AndCPC.JurisdictionID = CDRArchive.JurisdictionID
AndCPC.ProductTypeCodeID = CDRArchive.ProductTypeCodeID
AndCPC.DACodeID = CDRArchive.DACodeID)

quote:Originally posted by airjrdn

Before running the select, I pre-built the table with the latest updatedate for each effective date.
Not sure, but perhaps you read my post before the final edit: I replaced UpdateDate with EffDate in a few places in the script. You need EffDate, not UpdateDate - but of course you can copy the UpdateDate value into the #TMP table as well if you need it for the query further down.
19. ### airjrdnNew Member

Sorry....

Regarding the join conditions changing, there are multiple versions of this proc that are very similar. The only difference being that one version might join on OrigStateCodeID, and another version might join on OrigStateCodeID and OrigOCNCode.

Looking at that query, I don't think the expressions in the SELECT list are too bad: for testing, if you replace them with a handful of column names then the response time should not really improve much.

Again, I don't think you need to double-check on UpdateDate for EffDate, as long as you do that when compiling the aggregate table.

A small issue is that you should delete the DISTINCT keyword from: