LEFT OUTER JOIN *FIRST* Table2 -Possible? | SQL Server Performance Forums

SQL Server Performance Forum – Threads Archive

LEFT OUTER JOIN *FIRST* Table2 -Possible?

I’m trying to do a query that I don’t even know is possible. I’d like to get all of my customers, and the LAST product they purchased in a result set. I don’t want multiple products for each customer in the results because of the size of the result set. Table A is a customer table. It looks something like this:
CustID, CustName
01, Bill Knight
02, Bob Harrington
03, Doyle Brunson Table B is a product table that looks something like this:
CustID, ProductID, ProductName, DatePurchased
01, 01, Pencil, 08/01/04
01, 03, Chalk, 08/03/04
01, 02, Eraser, 08/07/04
02, 03, Chalk, 08/02/04
03, 02, Eraser, 08/04/04
03, 01, Pencil, 08/05/04 I’d like to do a query like the following:
SELECT Distinct CustID, CustName, ProductID, ProductName
FROM CustomerTable C
LEFT OUTER JOIN ***FIRST ROW ONLY OF*** ProductTable P ON C.CustID = P.CustID
(Product Table would be ordered by DateOrdered DESC) The expected result would be a single result set with the following data:
01, Bill Knight, 02, Eraser
02, Bob Harrington, 03, Chalk
03, Doyle Brunson, 01, Pencil As you can see, the result set is a list of the distinct customers with the last product they purchased ONLY. I’ve played with syntax, but there doesn’t seem to be any easy way to do this… Any ideas?
You haven’t mentioned what is Primary key in Table B. If you get an ID for this table, then it would be easy.
ID, CustId, ProductId, …
01, 01, 01, Pencil, 08/01/04
02, 01, 03, Chalk, 08/03/04
03, 01, 02, Eraser, 08/07/04
04, 02, 03, Chalk, 08/02/04
05, 03, 02, Eraser, 08/04/04
06, 03, 01, Pencil, 08/05/04
I will post the SELECT for you.
CanadaDBA
The same product, on the same day, can be purchased by the same customer, so a primary key doesn’t seem necessary to store the data I need. If it is necessary for the query, then assume your ID column is fine. If it matters, the primary key on the customer table can be the CustID.
Here is the code. Probably it could be written easier than this but I am too busy to think. <img src=’/community/emoticons/emotion-1.gif’ alt=’:)‘ /> <br /><pre><br />SELECT Id, A.CustId, CustName, ProductId, ProductName<br /> FROM ProductTable A INNER Join CustTable B ON A.CustId = B.CustId<br /> WHERE Id in (<br />SELECT Max(A.Id)<br /> FROM ProductTable A INNER Join CustTable B ON A.CustId = B.CustId<br /> Group BY A.CustId, CustName )<br /></pre><br /><br /><br />CanadaDBA
That seems like it would work *IF* the IDs were always ordered in the same sequence as the DateOrdered. Luckily, in my particular case, I think it might work for now, so thanks! But, if the DateOrdred can be manually entered by a user (maybe for back orders) then the DateOrdred and the ID will not always be ordered the same. This is especially true if there is any converted data in the product table. Is there a more foolproof method out there that doesn’t need a key in the product table to be ordered in the same sequence as the OrderedDate?
What about playing a little bit around with TOP and/or GROUP BY ? –Frank
http://www.insidesql.de

As far as what I can tell, TOP doesn’t work as an aggregate used with the group by clause. I’ve tried TOP, TOP 1, FIRST, FIRST ROW, FIRST ROW ONLY, and other stuff like that in place of the aggreagate functions (Max) in select list for the group by. I’ve tried playing around with the what order the key words were in, using them like key words, using them as functions, etc. Nothing seems to work, or at least not the way I expected it to… My hope was that something like this was possible: SELECT C.CustId, C.CustName, FIRST ROW ONLY (P.*)
FROM CustTable C
LEFT OUTER JOIN ProductTable P ON C.CustID = P.CustID
GROUP BY C.CustId, C.CustName
ORDER BY C.CustId, C.CustName, P.DateOrdered DESC To me, that seems like a pretty basic concept, but without having a unique ID that’s ordered in a way that directly makes an aggregate function work, I can’t seem to find a way to do this.
Well, you need to use a derived table (in a FROM statement you use a SELECT statement instead of a table name). The derived table in itself uses either a subquery or again a derived table that tells you the last ID for each client. This is where you really need the unique key on ProductTable to be an IDENTITY field so you know which record was entered last. Here’s a test script that shows how it can be done (just the key fields included): CREATE TABLE Customer (CustID INT IDENTITY (1,1) PRIMARY KEY, Name VARCHAR(100))
GO CREATE TABLE ProductTable (PurchaseID INT IDENTITY (1,1) PRIMARY KEY, CustID INT)
GO INSERT INTO Customer (Name) VALUES (‘Jantje’)
INSERT INTO Customer (Name) VALUES (‘Pietje’) INSERT INTO ProductTable (CustID) VALUES (1)
INSERT INTO ProductTable (CustID) VALUES (1)
INSERT INTO ProductTable (CustID) VALUES (1)
INSERT INTO ProductTable (CustID) VALUES (2)
INSERT INTO ProductTable (CustID) VALUES (2) SELECT * FROM Customer
SELECT * FROM ProductTable SELECT Customer.[Name], LastPurchase.PurchaseID
FROM Customer
INNER JOIN
(SELECT LastOneOnly.PurchaseID, LastOneOnly.CustID FROM ProductTable AS LastOneOnly
WHERE LastOneOnly.PurchaseID = (SELECT MAX(PurchaseID) FROM ProductTable WHERE CustID = LastOneOnly.CustID)) AS LastPurchase
ON Customer.CustID = LastPurchase.CustID DROP TABLE Customer
GO
DROP TABLE ProductTable
GO And these are the results of that query:
Jantje – 3
Pietje – 5

quote:Originally posted by IdRatherBeProgramming As far as what I can tell, TOP doesn’t work as an aggregate used with the group by clause. I’ve tried TOP, TOP 1, FIRST, FIRST ROW, FIRST ROW ONLY, and other stuff like that in place of the aggreagate functions (Max) in select list for the group by. I’ve tried playing around with the what order the key words were in, using them like key words, using them as functions, etc. Nothing seems to work, or at least not the way I expected it to… My hope was that something like this was possible: SELECT C.CustId, C.CustName, FIRST ROW ONLY (P.*)
FROM CustTable C
LEFT OUTER JOIN ProductTable P ON C.CustID = P.CustID
GROUP BY C.CustId, C.CustName
ORDER BY C.CustId, C.CustName, P.DateOrdered DESC To me, that seems like a pretty basic concept, but without having a unique ID that’s ordered in a way that directly makes an aggregate function work, I can’t seem to find a way to do this.
Sorry, I was a little bit unprecise. You’re right, although TOP works with GROUP BY, the result is not what you expect or want. but you can use something like Adriaan posted.
–Frank
http://www.insidesql.de

Adriaan, thanks for the attempt, but I believe your answer has the same problem as FarHadr… It assumes that the Identity column on the product table is in the same order as the product purchase date. FarHadr’s answer is the most simple method and I did implement it successfully in my scenario. However, knowing that I may eventually BULK LOAD DATA into this table (from a vendor), the "order date" of a product and the Identity column in the Product table may NOT always be in the same order. If I bulk load Archived data from long ago into my system, they would all get the next IDs available, which would be greater than all current records in the table (even though the "order date" may be prior to the latest record currently in the table). As you see, I cannot assume that the product ID (Identity) column is in the same order as the data I want to use to pull out my "Latest Record". Because after a bulk load, my latest record would have an ID less than anything that was just bulk inserted… The only way I know to do this 100% successfully is using the following logic: Set Results = objConn.Execute("SELECT * FROM CUSTOMERS C LEFT OUTER JOIN PRODUCTS P ON C.CustID = P.CustID ORDER BY C.CustID, P.OrderDate DESC") While Not Results.EOF
Customer = Results.Fields("C.CustomerName")
LastProduct = Results.Fields("P.ProductName") WriteToFile("Customer" & Customer & "’s Last Purchase was: " & LastProduct) CurCustID = Results.Fields("C.CustID")
LastCustID = CurCustID
While CurCustID = LastCustID and Not Results.EOF
Results.MoveNext()
if not Results.EOF then CurCustID = Results.Fields("C.CustID")
Wend
Wend Now I know that code will probably cause an infinite loop or it might skip the last customer, because I didn’t test it and I don’t think I exited it just right, but the theory is sound and in gest, works fine… The problem is, I think that pulling back all of the data to just take the first product and throw away the rest is wasteful on your bandwidth and on the CPU doing the work if you don’t need that data. I’d hope theres a way to get the SQL Server engine to do this work for you…
I would suggest that u add one identity column to your product table.
and use the following querry.
Select P.*,Identity (int,1,1) As UnqID Into TmpProd From Product P Select Distinct C.CustID,C.CustName,Prod.ProductID, Prod.ProductName From Customer C,
(
Select P.* From TmpProd P,
( Select CustID,Max(UnqID) As UnqID From TmpProd Group By CustID ) As X
Where P.CustId=X.CustID And P.UnqID=X.UnqID
) As Prod
Where C.CustId=Prod.CustID I think This Will Solve Your Problem Sandy (DB Developer)

If the IDENTITY order is not the same as the purchase date order, then this will work: CREATE TABLE #Customer (CustID INT IDENTITY (1,1) PRIMARY KEY, Name VARCHAR(100))
GO CREATE TABLE #ProductTable (PurchaseID INT IDENTITY (1,1) PRIMARY KEY, CustID INT, PurchaseDate DATETIME)
GO INSERT INTO #Customer (Name) VALUES (‘Johnson’)
INSERT INTO #Customer (Name) VALUES (‘Williams’) INSERT INTO #ProductTable (CustID, PurchaseDate) VALUES (1, ‘2004-03-01’)
INSERT INTO #ProductTable (CustID, PurchaseDate) VALUES (1, ‘2003-04-05’)
INSERT INTO #ProductTable (CustID, PurchaseDate) VALUES (1, ‘2003-12-22’)
INSERT INTO #ProductTable (CustID, PurchaseDate) VALUES (2, ‘2004-04-02’)
INSERT INTO #ProductTable (CustID, PurchaseDate) VALUES (2, ‘2004-02-03’) SELECT #Customer.[Name], LastPurchase.PurchaseDate
FROM #Customer
INNER JOIN
(SELECT LastOneOnly.CustID, LastOneOnly.PurchaseDate FROM #ProductTable AS LastOneOnly
WHERE LastOneOnly.PurchaseDate = (SELECT MAX(PurchaseDate) FROM #ProductTable WHERE CustID = LastOneOnly.CustID)
GROUP BY LastOneOnly.PurchaseID, LastOneOnly.CustID, LastOneOnly.PurchaseDate) AS LastPurchase
ON #Customer.CustID = LastPurchase.CustID DROP TABLE #Customer
DROP TABLE #ProductTable Results:
Johnson 2004-03-01 00:00:00.000
Williams 2004-04-02 00:00:00.000

Thx Adriaan. That does work and does give me the results I need. I don’t know how efficient that will be on LARGE tables, but it definitely does work, and will work no matter what data I put in my tables… Can you do a SELECT within a SELECT within a SELECT on tables with millions of rows and expect the result to come back in a reasonable time? I’ll keep you all posted if I get this working in my scenario…
If you look at the execution plan (running against permanent tables) SQL does two Clustered Index Scans on ProductTable, at 46% each, and the remaining 8 percent are spent on a Clustered Index Seek on Customer. With millions of rows, SQL will still be using the clustered indexes so I would expect performance to be no problem.
Well, I tried to implement that code in my system and I didn’t get a single row returned for each customer when there were multiple products purchased on the same day, I still got each product ordered on the latest day… PS…I modified adriaan’s code slightly to be closer to what I’m looking to achieve. CREATE TABLE #Customer (CustID INT IDENTITY (1,1) PRIMARY KEY, Name VARCHAR(100))
GO CREATE TABLE #ProductTable (PurchaseID INT IDENTITY (1,1) PRIMARY KEY, CustID INT, ProductName CHAR(20), PurchaseDate DATETIME)
GO INSERT INTO #Customer (Name) VALUES (‘Johnson’)
INSERT INTO #Customer (Name) VALUES (‘Williams’) INSERT INTO #ProductTable (CustID, ProductName, PurchaseDate) VALUES (1, ‘Pencil’, ‘2004-03-01’)
INSERT INTO #ProductTable (CustID, ProductName, PurchaseDate) VALUES (1, ‘Eraser’, ‘2003-04-05’)
INSERT INTO #ProductTable (CustID, ProductName, PurchaseDate) VALUES (1, ‘Chalk’, ‘2003-12-22’)
INSERT INTO #ProductTable (CustID, ProductName, PurchaseDate) VALUES (2, ‘Pencil’, ‘2004-04-02’)
INSERT INTO #ProductTable (CustID, ProductName, PurchaseDate) VALUES (2, ‘Pencil’, ‘2004-04-02’)
INSERT INTO #ProductTable (CustID, ProductName, PurchaseDate) VALUES (2, ‘Eraser’, ‘2004-04-02’)
INSERT INTO #ProductTable (CustID, ProductName, PurchaseDate) VALUES (2, ‘Eraser’, ‘2004-02-03’)
INSERT INTO #ProductTable (CustID, ProductName, PurchaseDate) VALUES (2, ‘Chalk’, ‘2004-04-02’)
INSERT INTO #ProductTable (CustID, ProductName, PurchaseDate) VALUES (2, ‘Chalk’, ‘2004-02-03’) SELECT #Customer.[Name], LastPurchase.ProductName, LastPurchase.PurchaseDate
FROM #Customer
INNER JOIN
(SELECT LastOneOnly.CustID, LastOneOnly.ProductName, LastOneOnly.PurchaseDate FROM #ProductTable AS LastOneOnly
WHERE LastOneOnly.PurchaseDate = (SELECT MAX(PurchaseDate) FROM #ProductTable WHERE CustID = LastOneOnly.CustID)
GROUP BY LastOneOnly.PurchaseID, LastOneOnly.CustID, LastOneOnly.ProductName, LastOneOnly.PurchaseDate) AS LastPurchase
ON #Customer.CustID = LastPurchase.CustID DROP TABLE #Customer
DROP TABLE #ProductTable Results:
Johnson Pencil 2004-03-01 00:00:00.000
Williams Pencil 2004-04-02 00:00:00.000
Williams Pencil 2004-04-02 00:00:00.000
Williams Eraser 2004-04-02 00:00:00.000
Williams Chalk 2004-04-02 00:00:00.000 So, multiple results come back when there are multiple products ordered on the same day… Not that in this specific scenario it make sense do demand that I only get 1, but that’s where this whole thread came from…the need to JOIN the first and only 1 child row to a parent table. You can try putting Top 1 in the 2nd select, but that doesn’t give the results I expected… Any other ideas?
If there is no dependable identity field to tell you what was the last item for a given date for a given customer, then you need either to include the time on the PurchaseDate, or a serial number within each purchase. If you want to order the items by the amounts, then you need to work that into the query the same way as we did the latest PurchaseDate.
Sadly, This is the only way I could get the results to ALWAYS give me the customer and first (latest) product that they ordered, regardless of the order of the Identity field. CREATE TABLE #Customer (CustID INT IDENTITY (1,1) PRIMARY KEY, Name VARCHAR(100))
GO CREATE TABLE #ProductTable (PurchaseID INT IDENTITY (1,1) PRIMARY KEY, CustID INT, ProductName CHAR(20), PurchaseDate DATETIME)
GO INSERT INTO #Customer (Name) VALUES (‘Johnson’)
INSERT INTO #Customer (Name) VALUES (‘Williams’) INSERT INTO #ProductTable (CustID, ProductName, PurchaseDate) VALUES (1, ‘Pencil’, ‘2004-03-01’)
INSERT INTO #ProductTable (CustID, ProductName, PurchaseDate) VALUES (1, ‘Eraser’, ‘2003-04-05’)
INSERT INTO #ProductTable (CustID, ProductName, PurchaseDate) VALUES (1, ‘Chalk’, ‘2003-12-22’)
INSERT INTO #ProductTable (CustID, ProductName, PurchaseDate) VALUES (2, ‘Pencil’, ‘2004-01-02’)
INSERT INTO #ProductTable (CustID, ProductName, PurchaseDate) VALUES (2, ‘Pencil’, ‘2004-03-02’)
INSERT INTO #ProductTable (CustID, ProductName, PurchaseDate) VALUES (2, ‘Eraser’, ‘2004-04-02’)
INSERT INTO #ProductTable (CustID, ProductName, PurchaseDate) VALUES (2, ‘Eraser’, ‘2004-02-03’)
INSERT INTO #ProductTable (CustID, ProductName, PurchaseDate) VALUES (2, ‘Chalk’, ‘2004-05-02’)
INSERT INTO #ProductTable (CustID, ProductName, PurchaseDate) VALUES (2, ‘Chalk’, ‘2004-02-03’) SELECT #Customer.[Name], LastPurchase.ProductName, LastPurchase.PurchaseDate
FROM #Customer
INNER JOIN (
SELECT LastOneOnly.CustID, LastOneOnly.ProductName, LastOneOnly.PurchaseDate
FROM #ProductTable AS LastOneOnly
WHERE LastOneOnly.PurchaseID = (
SELECT PurchaseID
FROM (
SELECT top 1 PurchaseID
FROM #ProductTable
WHERE CustID = LastOneOnly.CustID
ORDER BY PurchaseDate DESC
) as TopProduct
WHERE CustID = LastOneOnly.CustID
)
GROUP BY LastOneOnly.PurchaseID, LastOneOnly.CustID, LastOneOnly.ProductName, LastOneOnly.PurchaseDate
) AS LastPurchase
ON #Customer.CustID = LastPurchase.CustID DROP TABLE #Customer
DROP TABLE #ProductTable There’s got to be a better way!
You could try select
c.custID, c.CustName,
p.ProductID, p.ProductName, p.DatePurchased
from
TableA c
INNER JOIN TableB p ON (c.custID = p.CustID)
where
p.DatePurchased = (select MAX(innerP.DatePurchased) from TableB innerP where innerP.CustID = p.CustID) Depends though, if a customer bought several products on the same date theyd all be retrieved.
But if thats the case then how do you want to distinguish between those products? which one should be listed?

Chappy, Methods to retrieve all products on the latest day have already been described above. The idea is to get all distinct customers and the "FIRST" matching record in a child table only. The "FIRST" in this case is the latest product purchased, if there are multiple products purchased on the same day, then you just need to take the TOP 1 record and join it. I’m trying to find a way to get the "FIRST TOP 1 RECORD" that matches up to a parent table only, without having to pull the entire product table back and looping through it, skipping all subsequent products…
I don’t know what I was smoking when I adjusted the code to use the TOP 1 code in my last post, but I didn’t see it could be simplified slightly… CREATE TABLE #Customer (CustID INT IDENTITY (1,1) PRIMARY KEY, Name VARCHAR(100))
GO CREATE TABLE #ProductTable (PurchaseID INT IDENTITY (1,1) PRIMARY KEY, CustID INT, ProductName CHAR(20), PurchaseDate DATETIME)
GO INSERT INTO #Customer (Name) VALUES (‘Johnson’)
INSERT INTO #Customer (Name) VALUES (‘Williams’) INSERT INTO #ProductTable (CustID, ProductName, PurchaseDate) VALUES (1, ‘Pencil’, ‘2004-03-01’)
INSERT INTO #ProductTable (CustID, ProductName, PurchaseDate) VALUES (1, ‘Eraser’, ‘2003-04-05’)
INSERT INTO #ProductTable (CustID, ProductName, PurchaseDate) VALUES (1, ‘Chalk’, ‘2003-12-22’)
INSERT INTO #ProductTable (CustID, ProductName, PurchaseDate) VALUES (2, ‘Pencil’, ‘2004-01-02’)
INSERT INTO #ProductTable (CustID, ProductName, PurchaseDate) VALUES (2, ‘Pencil’, ‘2004-03-02’)
INSERT INTO #ProductTable (CustID, ProductName, PurchaseDate) VALUES (2, ‘Eraser’, ‘2004-04-02’)
INSERT INTO #ProductTable (CustID, ProductName, PurchaseDate) VALUES (2, ‘Eraser’, ‘2004-02-03’)
INSERT INTO #ProductTable (CustID, ProductName, PurchaseDate) VALUES (2, ‘Chalk’, ‘2004-05-02’)
INSERT INTO #ProductTable (CustID, ProductName, PurchaseDate) VALUES (2, ‘Chalk’, ‘2004-02-03’) SELECT #Customer.[Name], LastPurchase.ProductName, LastPurchase.PurchaseDate
FROM #Customer
INNER JOIN (
SELECT LastOneOnly.CustID, LastOneOnly.ProductName, LastOneOnly.PurchaseDate
FROM #ProductTable AS LastOneOnly
WHERE LastOneOnly.PurchaseID = (
SELECT top 1 PurchaseID
FROM #ProductTable
WHERE CustID = LastOneOnly.CustID
ORDER BY PurchaseDate DESC
)
GROUP BY LastOneOnly.PurchaseID, LastOneOnly.CustID, LastOneOnly.ProductName, LastOneOnly.PurchaseDate
) AS LastPurchase
ON #Customer.CustID = LastPurchase.CustID DROP TABLE #Customer
DROP TABLE #ProductTable Results:
Johnson Pencil 2004-03-01 00:00:00.000
Williams Chalk 2004-05-02 00:00:00.000 It still is a SELECT within a SELECT within a SELECT, but it seems to give the results I’m looking for…

A friend of mine e-mailed me a pretty simple answer I’ve adjusted it to work with our example SQL code. Here it is! CREATE TABLE #Customer (CustID INT IDENTITY (1,1) PRIMARY KEY, Name VARCHAR(100))
GO CREATE TABLE #ProductTable (PurchaseID INT IDENTITY (1,1) PRIMARY KEY, CustID INT, ProductName CHAR(20), PurchaseDate DATETIME)
GO INSERT INTO #Customer (Name) VALUES (‘Johnson’)
INSERT INTO #Customer (Name) VALUES (‘Williams’) INSERT INTO #ProductTable (CustID, ProductName, PurchaseDate) VALUES (1, ‘Pencil’, ‘2004-03-01’)
INSERT INTO #ProductTable (CustID, ProductName, PurchaseDate) VALUES (1, ‘Eraser’, ‘2003-04-05’)
INSERT INTO #ProductTable (CustID, ProductName, PurchaseDate) VALUES (1, ‘Chalk’, ‘2003-12-22’)
INSERT INTO #ProductTable (CustID, ProductName, PurchaseDate) VALUES (2, ‘Pencil’, ‘2004-01-02’)
INSERT INTO #ProductTable (CustID, ProductName, PurchaseDate) VALUES (2, ‘Pencil’, ‘2004-03-02’)
INSERT INTO #ProductTable (CustID, ProductName, PurchaseDate) VALUES (2, ‘Eraser’, ‘2004-02-02’)
INSERT INTO #ProductTable (CustID, ProductName, PurchaseDate) VALUES (2, ‘Eraser’, ‘2004-02-03’)
INSERT INTO #ProductTable (CustID, ProductName, PurchaseDate) VALUES (2, ‘Chalk’, ‘2004-05-02’)
INSERT INTO #ProductTable (CustID, ProductName, PurchaseDate) VALUES (2, ‘Chalk’, ‘2004-02-03’) SELECT C.Name, P.ProductName, P.PurchaseDate
FROM #Customer AS C LEFT JOIN #ProductTable AS P ON C.CustID = P.CustID
WHERE P.PurchaseID = (
SELECT TOP 1 P2.PurchaseID
FROM #ProductTable P2
WHERE P2.CustID = C.CustID
ORDER BY P2.PurchaseDate DESC
); DROP TABLE #Customer
DROP TABLE #ProductTable Results:
Johnson Pencil 2004-03-01 00:00:00.000
Williams Chalk 2004-05-02 00:00:00.000 It looks straight forward to me and it worked with all the tests I’ve run it through. It seems so simple I wonder why I couldn’t come up with it in the first place. Thanks Tony!
]]>