SQL Server Performance

LEFT OUTER JOIN *FIRST* Table2 -Possible?

Discussion in 'General DBA Questions' started by IdRatherBeProgramming, Aug 23, 2004.

  1. I'm trying to do a query that I don't even know is possible. I'd like to get all of my customers, and the LAST product they purchased in a result set. I don't want multiple products for each customer in the results because of the size of the result set.

    Table A is a customer table. It looks something like this:
    CustID, CustName
    01, Bill Knight
    02, Bob Harrington
    03, Doyle Brunson

    Table B is a product table that looks something like this:
    CustID, ProductID, ProductName, DatePurchased
    01, 01, Pencil, 08/01/04
    01, 03, Chalk, 08/03/04
    01, 02, Eraser, 08/07/04
    02, 03, Chalk, 08/02/04
    03, 02, Eraser, 08/04/04
    03, 01, Pencil, 08/05/04

    I'd like to do a query like the following:
    SELECT Distinct CustID, CustName, ProductID, ProductName
    FROM CustomerTable C
    LEFT OUTER JOIN ***FIRST ROW ONLY OF*** ProductTable P ON C.CustID = P.CustID
    (Product Table would be ordered by DateOrdered DESC)

    The expected result would be a single result set with the following data:
    01, Bill Knight, 02, Eraser
    02, Bob Harrington, 03, Chalk
    03, Doyle Brunson, 01, Pencil

    As you can see, the result set is a list of the distinct customers with the last product they purchased ONLY. I've played with syntax, but there doesn't seem to be any easy way to do this...

    Any ideas?
  2. CanadaDBA New Member

    You haven't mentioned what is Primary key in Table B. If you get an ID for this table, then it would be easy.


    ID, CustId, ProductId, ...
    01, 01, 01, Pencil, 08/01/04
    02, 01, 03, Chalk, 08/03/04
    03, 01, 02, Eraser, 08/07/04
    04, 02, 03, Chalk, 08/02/04
    05, 03, 02, Eraser, 08/04/04
    06, 03, 01, Pencil, 08/05/04
    I will post the SELECT for you.


    CanadaDBA
  3. The same product, on the same day, can be purchased by the same customer, so a primary key doesn't seem necessary to store the data I need. If it is necessary for the query, then assume your ID column is fine.

    If it matters, the primary key on the customer table can be the CustID.
  4. CanadaDBA New Member

    Here is the code. Probably it could be written easier than this but I am too busy to think. <img src='/community/emoticons/emotion-1.gif' alt=':)' /> <br /><pre><br />SELECT Id, A.CustId, CustName, ProductId, ProductName<br /> FROM ProductTable A INNER Join CustTable B ON A.CustId = B.CustId<br /> WHERE Id in (<br />SELECT Max(A.Id)<br /> FROM ProductTable A INNER Join CustTable B ON A.CustId = B.CustId<br /> Group BY A.CustId, CustName )<br /></pre><br /><br /><br />CanadaDBA
  5. That seems like it would work *IF* the IDs were always ordered in the same sequence as the DateOrdered.

    Luckily, in my particular case, I think it might work for now, so thanks!

    But, if the DateOrdred can be manually entered by a user (maybe for back orders) then the DateOrdred and the ID will not always be ordered the same. This is especially true if there is any converted data in the product table.

    Is there a more foolproof method out there that doesn't need a key in the product table to be ordered in the same sequence as the OrderedDate?
  6. FrankKalis Moderator

  7. As far as what I can tell, TOP doesn't work as an aggregate used with the group by clause. I've tried TOP, TOP 1, FIRST, FIRST ROW, FIRST ROW ONLY, and other stuff like that in place of the aggreagate functions (Max) in select list for the group by. I've tried playing around with the what order the key words were in, using them like key words, using them as functions, etc. Nothing seems to work, or at least not the way I expected it to...

    My hope was that something like this was possible:

    SELECT C.CustId, C.CustName, FIRST ROW ONLY (P.*)
    FROM CustTable C
    LEFT OUTER JOIN ProductTable P ON C.CustID = P.CustID
    GROUP BY C.CustId, C.CustName
    ORDER BY C.CustId, C.CustName, P.DateOrdered DESC

    To me, that seems like a pretty basic concept, but without having a unique ID that's ordered in a way that directly makes an aggregate function work, I can't seem to find a way to do this.
  8. Adriaan New Member

    Well, you need to use a derived table (in a FROM statement you use a SELECT statement instead of a table name). The derived table in itself uses either a subquery or again a derived table that tells you the last ID for each client. This is where you really need the unique key on ProductTable to be an IDENTITY field so you know which record was entered last.

    Here's a test script that shows how it can be done (just the key fields included):

    CREATE TABLE Customer (CustID INT IDENTITY (1,1) PRIMARY KEY, Name VARCHAR(100))
    GO

    CREATE TABLE ProductTable (PurchaseID INT IDENTITY (1,1) PRIMARY KEY, CustID INT)
    GO

    INSERT INTO Customer (Name) VALUES ('Jantje')
    INSERT INTO Customer (Name) VALUES ('Pietje')

    INSERT INTO ProductTable (CustID) VALUES (1)
    INSERT INTO ProductTable (CustID) VALUES (1)
    INSERT INTO ProductTable (CustID) VALUES (1)
    INSERT INTO ProductTable (CustID) VALUES (2)
    INSERT INTO ProductTable (CustID) VALUES (2)

    SELECT * FROM Customer
    SELECT * FROM ProductTable

    SELECT Customer.[Name], LastPurchase.PurchaseID
    FROM Customer
    INNER JOIN
    (SELECT LastOneOnly.PurchaseID, LastOneOnly.CustID FROM ProductTable AS LastOneOnly
    WHERE LastOneOnly.PurchaseID = (SELECT MAX(PurchaseID) FROM ProductTable WHERE CustID = LastOneOnly.CustID)) AS LastPurchase
    ON Customer.CustID = LastPurchase.CustID

    DROP TABLE Customer
    GO
    DROP TABLE ProductTable
    GO

    And these are the results of that query:
    Jantje - 3
    Pietje - 5
  9. FrankKalis Moderator

    quote:Originally posted by IdRatherBeProgramming

    As far as what I can tell, TOP doesn't work as an aggregate used with the group by clause. I've tried TOP, TOP 1, FIRST, FIRST ROW, FIRST ROW ONLY, and other stuff like that in place of the aggreagate functions (Max) in select list for the group by. I've tried playing around with the what order the key words were in, using them like key words, using them as functions, etc. Nothing seems to work, or at least not the way I expected it to...

    My hope was that something like this was possible:

    SELECT C.CustId, C.CustName, FIRST ROW ONLY (P.*)
    FROM CustTable C
    LEFT OUTER JOIN ProductTable P ON C.CustID = P.CustID
    GROUP BY C.CustId, C.CustName
    ORDER BY C.CustId, C.CustName, P.DateOrdered DESC

    To me, that seems like a pretty basic concept, but without having a unique ID that's ordered in a way that directly makes an aggregate function work, I can't seem to find a way to do this.
    Sorry, I was a little bit unprecise. You're right, although TOP works with GROUP BY, the result is not what you expect or want. but you can use something like Adriaan posted.


    --Frank
    http://www.insidesql.de
  10. Adriaan, thanks for the attempt, but I believe your answer has the same problem as FarHadr...

    It assumes that the Identity column on the product table is in the same order as the product purchase date. FarHadr's answer is the most simple method and I did implement it successfully in my scenario.

    However, knowing that I may eventually BULK LOAD DATA into this table (from a vendor), the "order date" of a product and the Identity column in the Product table may NOT always be in the same order. If I bulk load Archived data from long ago into my system, they would all get the next IDs available, which would be greater than all current records in the table (even though the "order date" may be prior to the latest record currently in the table).

    As you see, I cannot assume that the product ID (Identity) column is in the same order as the data I want to use to pull out my "Latest Record". Because after a bulk load, my latest record would have an ID less than anything that was just bulk inserted...

    The only way I know to do this 100% successfully is using the following logic:

    Set Results = objConn.Execute("SELECT * FROM CUSTOMERS C LEFT OUTER JOIN PRODUCTS P ON C.CustID = P.CustID ORDER BY C.CustID, P.OrderDate DESC")

    While Not Results.EOF
    Customer = Results.Fields("C.CustomerName")
    LastProduct = Results.Fields("P.ProductName")

    WriteToFile("Customer" & Customer & "'s Last Purchase was: " & LastProduct)

    CurCustID = Results.Fields("C.CustID")
    LastCustID = CurCustID
    While CurCustID = LastCustID and Not Results.EOF
    Results.MoveNext()
    if not Results.EOF then CurCustID = Results.Fields("C.CustID")
    Wend
    Wend

    Now I know that code will probably cause an infinite loop or it might skip the last customer, because I didn't test it and I don't think I exited it just right, but the theory is sound and in gest, works fine...

    The problem is, I think that pulling back all of the data to just take the first product and throw away the rest is wasteful on your bandwidth and on the CPU doing the work if you don't need that data.

    I'd hope theres a way to get the SQL Server engine to do this work for you...
  11. sundeip New Member

    I would suggest that u add one identity column to your product table.
    and use the following querry.


    Select P.*,Identity (int,1,1) As UnqID Into TmpProd From Product P

    Select Distinct C.CustID,C.CustName,Prod.ProductID, Prod.ProductName From Customer C,
    (
    Select P.* From TmpProd P,
    ( Select CustID,Max(UnqID) As UnqID From TmpProd Group By CustID ) As X
    Where P.CustId=X.CustID And P.UnqID=X.UnqID
    ) As Prod
    Where C.CustId=Prod.CustID

    I think This Will Solve Your Problem



    Sandy

    (DB Developer)
  12. Adriaan New Member

    If the IDENTITY order is not the same as the purchase date order, then this will work:



    CREATE TABLE #Customer (CustID INT IDENTITY (1,1) PRIMARY KEY, Name VARCHAR(100))
    GO

    CREATE TABLE #ProductTable (PurchaseID INT IDENTITY (1,1) PRIMARY KEY, CustID INT, PurchaseDate DATETIME)
    GO

    INSERT INTO #Customer (Name) VALUES ('Johnson')
    INSERT INTO #Customer (Name) VALUES ('Williams')

    INSERT INTO #ProductTable (CustID, PurchaseDate) VALUES (1, '2004-03-01')
    INSERT INTO #ProductTable (CustID, PurchaseDate) VALUES (1, '2003-04-05')
    INSERT INTO #ProductTable (CustID, PurchaseDate) VALUES (1, '2003-12-22')
    INSERT INTO #ProductTable (CustID, PurchaseDate) VALUES (2, '2004-04-02')
    INSERT INTO #ProductTable (CustID, PurchaseDate) VALUES (2, '2004-02-03')

    SELECT #Customer.[Name], LastPurchase.PurchaseDate
    FROM #Customer
    INNER JOIN
    (SELECT LastOneOnly.CustID, LastOneOnly.PurchaseDate FROM #ProductTable AS LastOneOnly
    WHERE LastOneOnly.PurchaseDate = (SELECT MAX(PurchaseDate) FROM #ProductTable WHERE CustID = LastOneOnly.CustID)
    GROUP BY LastOneOnly.PurchaseID, LastOneOnly.CustID, LastOneOnly.PurchaseDate) AS LastPurchase
    ON #Customer.CustID = LastPurchase.CustID

    DROP TABLE #Customer
    DROP TABLE #ProductTable

    Results:
    Johnson 2004-03-01 00:00:00.000
    Williams 2004-04-02 00:00:00.000
  13. Thx Adriaan. That does work and does give me the results I need. I don't know how efficient that will be on LARGE tables, but it definitely does work, and will work no matter what data I put in my tables...

    Can you do a SELECT within a SELECT within a SELECT on tables with millions of rows and expect the result to come back in a reasonable time? I'll keep you all posted if I get this working in my scenario...
  14. Adriaan New Member

    If you look at the execution plan (running against permanent tables) SQL does two Clustered Index Scans on ProductTable, at 46% each, and the remaining 8 percent are spent on a Clustered Index Seek on Customer. With millions of rows, SQL will still be using the clustered indexes so I would expect performance to be no problem.
  15. Well, I tried to implement that code in my system and I didn't get a single row returned for each customer when there were multiple products purchased on the same day, I still got each product ordered on the latest day...

    PS...I modified adriaan's code slightly to be closer to what I'm looking to achieve.



    CREATE TABLE #Customer (CustID INT IDENTITY (1,1) PRIMARY KEY, Name VARCHAR(100))
    GO

    CREATE TABLE #ProductTable (PurchaseID INT IDENTITY (1,1) PRIMARY KEY, CustID INT, ProductName CHAR(20), PurchaseDate DATETIME)
    GO

    INSERT INTO #Customer (Name) VALUES ('Johnson')
    INSERT INTO #Customer (Name) VALUES ('Williams')

    INSERT INTO #ProductTable (CustID, ProductName, PurchaseDate) VALUES (1, 'Pencil', '2004-03-01')
    INSERT INTO #ProductTable (CustID, ProductName, PurchaseDate) VALUES (1, 'Eraser', '2003-04-05')
    INSERT INTO #ProductTable (CustID, ProductName, PurchaseDate) VALUES (1, 'Chalk', '2003-12-22')
    INSERT INTO #ProductTable (CustID, ProductName, PurchaseDate) VALUES (2, 'Pencil', '2004-04-02')
    INSERT INTO #ProductTable (CustID, ProductName, PurchaseDate) VALUES (2, 'Pencil', '2004-04-02')
    INSERT INTO #ProductTable (CustID, ProductName, PurchaseDate) VALUES (2, 'Eraser', '2004-04-02')
    INSERT INTO #ProductTable (CustID, ProductName, PurchaseDate) VALUES (2, 'Eraser', '2004-02-03')
    INSERT INTO #ProductTable (CustID, ProductName, PurchaseDate) VALUES (2, 'Chalk', '2004-04-02')
    INSERT INTO #ProductTable (CustID, ProductName, PurchaseDate) VALUES (2, 'Chalk', '2004-02-03')

    SELECT #Customer.[Name], LastPurchase.ProductName, LastPurchase.PurchaseDate
    FROM #Customer
    INNER JOIN
    (SELECT LastOneOnly.CustID, LastOneOnly.ProductName, LastOneOnly.PurchaseDate FROM #ProductTable AS LastOneOnly
    WHERE LastOneOnly.PurchaseDate = (SELECT MAX(PurchaseDate) FROM #ProductTable WHERE CustID = LastOneOnly.CustID)
    GROUP BY LastOneOnly.PurchaseID, LastOneOnly.CustID, LastOneOnly.ProductName, LastOneOnly.PurchaseDate) AS LastPurchase
    ON #Customer.CustID = LastPurchase.CustID

    DROP TABLE #Customer
    DROP TABLE #ProductTable

    Results:
    Johnson Pencil 2004-03-01 00:00:00.000
    Williams Pencil 2004-04-02 00:00:00.000
    Williams Pencil 2004-04-02 00:00:00.000
    Williams Eraser 2004-04-02 00:00:00.000
    Williams Chalk 2004-04-02 00:00:00.000

    So, multiple results come back when there are multiple products ordered on the same day...

    Not that in this specific scenario it make sense do demand that I only get 1, but that's where this whole thread came from...the need to JOIN the first and only 1 child row to a parent table. You can try putting Top 1 in the 2nd select, but that doesn't give the results I expected...

    Any other ideas?
  16. Adriaan New Member

    If there is no dependable identity field to tell you what was the last item for a given date for a given customer, then you need either to include the time on the PurchaseDate, or a serial number within each purchase. If you want to order the items by the amounts, then you need to work that into the query the same way as we did the latest PurchaseDate.
  17. Sadly, This is the only way I could get the results to ALWAYS give me the customer and first (latest) product that they ordered, regardless of the order of the Identity field.



    CREATE TABLE #Customer (CustID INT IDENTITY (1,1) PRIMARY KEY, Name VARCHAR(100))
    GO

    CREATE TABLE #ProductTable (PurchaseID INT IDENTITY (1,1) PRIMARY KEY, CustID INT, ProductName CHAR(20), PurchaseDate DATETIME)
    GO

    INSERT INTO #Customer (Name) VALUES ('Johnson')
    INSERT INTO #Customer (Name) VALUES ('Williams')

    INSERT INTO #ProductTable (CustID, ProductName, PurchaseDate) VALUES (1, 'Pencil', '2004-03-01')
    INSERT INTO #ProductTable (CustID, ProductName, PurchaseDate) VALUES (1, 'Eraser', '2003-04-05')
    INSERT INTO #ProductTable (CustID, ProductName, PurchaseDate) VALUES (1, 'Chalk', '2003-12-22')
    INSERT INTO #ProductTable (CustID, ProductName, PurchaseDate) VALUES (2, 'Pencil', '2004-01-02')
    INSERT INTO #ProductTable (CustID, ProductName, PurchaseDate) VALUES (2, 'Pencil', '2004-03-02')
    INSERT INTO #ProductTable (CustID, ProductName, PurchaseDate) VALUES (2, 'Eraser', '2004-04-02')
    INSERT INTO #ProductTable (CustID, ProductName, PurchaseDate) VALUES (2, 'Eraser', '2004-02-03')
    INSERT INTO #ProductTable (CustID, ProductName, PurchaseDate) VALUES (2, 'Chalk', '2004-05-02')
    INSERT INTO #ProductTable (CustID, ProductName, PurchaseDate) VALUES (2, 'Chalk', '2004-02-03')

    SELECT #Customer.[Name], LastPurchase.ProductName, LastPurchase.PurchaseDate
    FROM #Customer
    INNER JOIN (
    SELECT LastOneOnly.CustID, LastOneOnly.ProductName, LastOneOnly.PurchaseDate
    FROM #ProductTable AS LastOneOnly
    WHERE LastOneOnly.PurchaseID = (
    SELECT PurchaseID
    FROM (
    SELECT top 1 PurchaseID
    FROM #ProductTable
    WHERE CustID = LastOneOnly.CustID
    ORDER BY PurchaseDate DESC
    ) as TopProduct
    WHERE CustID = LastOneOnly.CustID
    )
    GROUP BY LastOneOnly.PurchaseID, LastOneOnly.CustID, LastOneOnly.ProductName, LastOneOnly.PurchaseDate
    ) AS LastPurchase
    ON #Customer.CustID = LastPurchase.CustID

    DROP TABLE #Customer
    DROP TABLE #ProductTable

    There's got to be a better way!
  18. Chappy New Member

    You could try



    select
    c.custID, c.CustName,
    p.ProductID, p.ProductName, p.DatePurchased
    from
    TableA c
    INNER JOIN TableB p ON (c.custID = p.CustID)
    where
    p.DatePurchased = (select MAX(innerP.DatePurchased) from TableB innerP where innerP.CustID = p.CustID)

    Depends though, if a customer bought several products on the same date theyd all be retrieved.
    But if thats the case then how do you want to distinguish between those products? which one should be listed?
  19. Chappy, Methods to retrieve all products on the latest day have already been described above. The idea is to get all distinct customers and the "FIRST" matching record in a child table only. The "FIRST" in this case is the latest product purchased, if there are multiple products purchased on the same day, then you just need to take the TOP 1 record and join it.

    I'm trying to find a way to get the "FIRST TOP 1 RECORD" that matches up to a parent table only, without having to pull the entire product table back and looping through it, skipping all subsequent products...
  20. I don't know what I was smoking when I adjusted the code to use the TOP 1 code in my last post, but I didn't see it could be simplified slightly...



    CREATE TABLE #Customer (CustID INT IDENTITY (1,1) PRIMARY KEY, Name VARCHAR(100))
    GO

    CREATE TABLE #ProductTable (PurchaseID INT IDENTITY (1,1) PRIMARY KEY, CustID INT, ProductName CHAR(20), PurchaseDate DATETIME)
    GO

    INSERT INTO #Customer (Name) VALUES ('Johnson')
    INSERT INTO #Customer (Name) VALUES ('Williams')

    INSERT INTO #ProductTable (CustID, ProductName, PurchaseDate) VALUES (1, 'Pencil', '2004-03-01')
    INSERT INTO #ProductTable (CustID, ProductName, PurchaseDate) VALUES (1, 'Eraser', '2003-04-05')
    INSERT INTO #ProductTable (CustID, ProductName, PurchaseDate) VALUES (1, 'Chalk', '2003-12-22')
    INSERT INTO #ProductTable (CustID, ProductName, PurchaseDate) VALUES (2, 'Pencil', '2004-01-02')
    INSERT INTO #ProductTable (CustID, ProductName, PurchaseDate) VALUES (2, 'Pencil', '2004-03-02')
    INSERT INTO #ProductTable (CustID, ProductName, PurchaseDate) VALUES (2, 'Eraser', '2004-04-02')
    INSERT INTO #ProductTable (CustID, ProductName, PurchaseDate) VALUES (2, 'Eraser', '2004-02-03')
    INSERT INTO #ProductTable (CustID, ProductName, PurchaseDate) VALUES (2, 'Chalk', '2004-05-02')
    INSERT INTO #ProductTable (CustID, ProductName, PurchaseDate) VALUES (2, 'Chalk', '2004-02-03')

    SELECT #Customer.[Name], LastPurchase.ProductName, LastPurchase.PurchaseDate
    FROM #Customer
    INNER JOIN (
    SELECT LastOneOnly.CustID, LastOneOnly.ProductName, LastOneOnly.PurchaseDate
    FROM #ProductTable AS LastOneOnly
    WHERE LastOneOnly.PurchaseID = (
    SELECT top 1 PurchaseID
    FROM #ProductTable
    WHERE CustID = LastOneOnly.CustID
    ORDER BY PurchaseDate DESC
    )
    GROUP BY LastOneOnly.PurchaseID, LastOneOnly.CustID, LastOneOnly.ProductName, LastOneOnly.PurchaseDate
    ) AS LastPurchase
    ON #Customer.CustID = LastPurchase.CustID

    DROP TABLE #Customer
    DROP TABLE #ProductTable

    Results:
    Johnson Pencil 2004-03-01 00:00:00.000
    Williams Chalk 2004-05-02 00:00:00.000

    It still is a SELECT within a SELECT within a SELECT, but it seems to give the results I'm looking for...
  21. A friend of mine e-mailed me a pretty simple answer I've adjusted it to work with our example SQL code. Here it is!



    CREATE TABLE #Customer (CustID INT IDENTITY (1,1) PRIMARY KEY, Name VARCHAR(100))
    GO

    CREATE TABLE #ProductTable (PurchaseID INT IDENTITY (1,1) PRIMARY KEY, CustID INT, ProductName CHAR(20), PurchaseDate DATETIME)
    GO

    INSERT INTO #Customer (Name) VALUES ('Johnson')
    INSERT INTO #Customer (Name) VALUES ('Williams')

    INSERT INTO #ProductTable (CustID, ProductName, PurchaseDate) VALUES (1, 'Pencil', '2004-03-01')
    INSERT INTO #ProductTable (CustID, ProductName, PurchaseDate) VALUES (1, 'Eraser', '2003-04-05')
    INSERT INTO #ProductTable (CustID, ProductName, PurchaseDate) VALUES (1, 'Chalk', '2003-12-22')
    INSERT INTO #ProductTable (CustID, ProductName, PurchaseDate) VALUES (2, 'Pencil', '2004-01-02')
    INSERT INTO #ProductTable (CustID, ProductName, PurchaseDate) VALUES (2, 'Pencil', '2004-03-02')
    INSERT INTO #ProductTable (CustID, ProductName, PurchaseDate) VALUES (2, 'Eraser', '2004-02-02')
    INSERT INTO #ProductTable (CustID, ProductName, PurchaseDate) VALUES (2, 'Eraser', '2004-02-03')
    INSERT INTO #ProductTable (CustID, ProductName, PurchaseDate) VALUES (2, 'Chalk', '2004-05-02')
    INSERT INTO #ProductTable (CustID, ProductName, PurchaseDate) VALUES (2, 'Chalk', '2004-02-03')

    SELECT C.Name, P.ProductName, P.PurchaseDate
    FROM #Customer AS C LEFT JOIN #ProductTable AS P ON C.CustID = P.CustID
    WHERE P.PurchaseID = (
    SELECT TOP 1 P2.PurchaseID
    FROM #ProductTable P2
    WHERE P2.CustID = C.CustID
    ORDER BY P2.PurchaseDate DESC
    );

    DROP TABLE #Customer
    DROP TABLE #ProductTable

    Results:
    Johnson Pencil 2004-03-01 00:00:00.000
    Williams Chalk 2004-05-02 00:00:00.000

    It looks straight forward to me and it worked with all the tests I've run it through. It seems so simple I wonder why I couldn't come up with it in the first place. Thanks Tony!

Share This Page