SQL Server Performance Forum – Threads Archive

Cant reference alias from a derived table 🙁

Can anyone please help me with this TSQL problem Ive had for quite a while now.. Pasted below is a description of what im trying to do, sample data, schema definitions, and what the results should look like. The main query fails because I cant access a top level alias from within a derived table (subquery). It is not an option to do it in any other way than a single TSQL statement. Why? Because this SQL is then passed to a stored procedure, which executes it and analyses its result set. Also this query Ive pasted is only a simple example, and could be merely a subquery in itself, of a much larger query. Please ask any questions which might make this clearer to you. Ive tried newsgroups, the web, pretty much everywhere and still havent found a solution. Thanks for any advice…. ———————————- We have three tables. Horses (NEW_H), Race Headers (NEW_RH), and a link table to say horses in race (NEW_HIR). I want to pull out all qualifying horses, and the race they ran in. In order for a horse to qualify for a race, the average of its 2 highest
ratings (ill call this AVGRAT), gained prior to the race, must be higher
than any other horses AVGRAT running in this race.
————— — TEST DATA AND GENERATION SCRIPT STARTS HERE —————— — Race Header table. An entry in this table represents a single race at a
— given datetime. This links to NEW_HIR (Horses in Race), so you can find
— all horses which ran in this race. Primary key is RH_RNo DROP TABLE NEW_RH
CREATE TABLE [NEW_RH] (
[RH_RNo] [int] NULL ,
[RH_CourseID] [int] NULL ,
[RH_Date] [smalldatetime] NULL ,
[RH_Name] [nvarchar] (255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[RH_Value] [float] NULL ,
[RH_NoOfRunners] [int] NULL ,
) ON [PRIMARY]
GO — Horses in Race table.
— Each horse running in each race has an entry here.
— Primary key is (HIR_RNo, HIR_HNo)
DROP TABLE NEW_HIR
CREATE TABLE [NEW_HIR] (
[HIR_RNo] [int] NULL ,
[HIR_HNo] [int] NULL ,
[HIR_JNo] [int] NULL ,
[HIR_TNo] [int] NULL ,
[HIR_PositionNo] [int] NULL ,
[HIR_Rating] [int] NULL ,
[HIR_OddsID] [int] NULL
) ON [PRIMARY]
GO — Horse Names. Links H_No from NEW_HIR to a horse name
— primary key is H_No
DROP TABLE NEW_H
CREATE TABLE [NEW_H] (
[H_No] [int] NULL ,
[H_Name] [nvarchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
) ON [PRIMARY]
GO — TEST DATA Insert into NEW_H (H_No, H_NAME) VALUES (1, ‘Horse 1’)
Insert into NEW_H (H_No, H_NAME) VALUES (2, ‘Horse 2’)
Insert into NEW_H (H_No, H_NAME) VALUES (3, ‘Horse 3’) insert into NEW_RH (RH_RNo, RH_CourseID, RH_Date, RH_Name, RH_Value, RH_NoOfRunners) Values (1, 1, ‘2002-07-07’, ‘Historical Race #1’, 5000, 3)
insert into NEW_RH (RH_RNo, RH_CourseID, RH_Date, RH_Name, RH_Value, RH_NoOfRunners) Values (2, 1, ‘2002-08-08’, ‘Historical Race #2’, 6000, 3)
insert into NEW_RH (RH_RNo, RH_CourseID, RH_Date, RH_Name, RH_Value, RH_NoOfRunners) Values (3, 1, ‘2002-09-09’, ‘Historical Race #3’, 7000, 3)
insert into NEW_RH (RH_RNo, RH_CourseID, RH_Date, RH_Name, RH_Value, RH_NoOfRunners) Values (4, 1, ‘2002-10-10’, ‘SQL TEST Handicap Race’, 10000, 3) — historical race #1
insert into NEW_HIR (HIR_RNo, HIR_HNo, HIR_JNo, HIR_TNo, HIR_PositionNo, HIR_Rating, HIR_OddsID) VALUES (1, 1, 1, 1, 1, 50, 1)
insert into NEW_HIR (HIR_RNo, HIR_HNo, HIR_JNo, HIR_TNo, HIR_PositionNo, HIR_Rating, HIR_OddsID) VALUES (1, 2, 1, 1, 2, 30, 1)
insert into NEW_HIR (HIR_RNo, HIR_HNo, HIR_JNo, HIR_TNo, HIR_PositionNo, HIR_Rating, HIR_OddsID) VALUES (1, 3, 1, 1, 3, 20, 1)
— historical race #2
insert into NEW_HIR (HIR_RNo, HIR_HNo, HIR_JNo, HIR_TNo, HIR_PositionNo, HIR_Rating, HIR_OddsID) VALUES (2, 2, 1, 1, 1, 70, 1)
insert into NEW_HIR (HIR_RNo, HIR_HNo, HIR_JNo, HIR_TNo, HIR_PositionNo, HIR_Rating, HIR_OddsID) VALUES (2, 3, 1, 1, 2, 40, 1)
insert into NEW_HIR (HIR_RNo, HIR_HNo, HIR_JNo, HIR_TNo, HIR_PositionNo, HIR_Rating, HIR_OddsID) VALUES (2, 1, 1, 1, 3, 15, 1)
— historical race #3
insert into NEW_HIR (HIR_RNo, HIR_HNo, HIR_JNo, HIR_TNo, HIR_PositionNo, HIR_Rating, HIR_OddsID) VALUES (3, 2, 1, 1, 1, 80, 1)
insert into NEW_HIR (HIR_RNo, HIR_HNo, HIR_JNo, HIR_TNo, HIR_PositionNo, HIR_Rating, HIR_OddsID) VALUES (3, 3, 1, 1, 2, 25, 1)
insert into NEW_HIR (HIR_RNo, HIR_HNo, HIR_JNo, HIR_TNo, HIR_PositionNo, HIR_Rating, HIR_OddsID) VALUES (3, 1, 1, 1, 3, 60, 1)
— SQL TEST Race
insert into NEW_HIR (HIR_RNo, HIR_HNo, HIR_JNo, HIR_TNo, HIR_PositionNo, HIR_Rating, HIR_OddsID) VALUES (4, 2, 1, 1, null, null, 1)
insert into NEW_HIR (HIR_RNo, HIR_HNo, HIR_JNo, HIR_TNo, HIR_PositionNo, HIR_Rating, HIR_OddsID) VALUES (4, 3, 1, 1, null, null, 1)
insert into NEW_HIR (HIR_RNo, HIR_HNo, HIR_JNo, HIR_TNo, HIR_PositionNo, HIR_Rating, HIR_OddsID) VALUES (4, 1, 1, 1, null, null, 1) — THIS IS THE QUERY WHICH FAILS (BaseHIR Alias is not recognised inside the derived table) select BaseH.H_Name, BaseH.H_No, BaseRH.R_No
from
NEW_H as BaseH, NEW_HIR as BaseHIR, NEW_RH as BaseRH
where
(BaseH.H_No = BaseHIR.HIR_HNo) and
(BaseHIR.HIR_HNo =
(select TOP 1 GroupedAverages.HIR_HNo from
(select AVG(TopThreeRatings.HIR_Rating) as AverageRating, TopThreeRatings.HIR_HNo from
(select TOP 2 AvgHIR.HIR_Rating, AvgHIR.HIR_HNo
from NEW_HIR as AvgHIR
where
AvgHIR.HIR_RNo < BaseHIR.HIR_RNo and
AvgHIR.HIR_Rating <> 0 and
AvgHIR.HIR_HNo IN (select HIR_HNo from NEW_HIR as RunnersHIR where HIR_RNo = BaseHIR.HIR_RNo) and
AvgHIR.HIR_Rating IS NOT NULL
order by AvgHIR.HIR_Rating DESC) TopThreeRatings
group by TopThreeRatings.HIR_HNo) GroupedAverages
order by AverageRating DESC)) and
(BaseRH.RH_RNo = BaseHIR.HIR_RNo)
Order By BaseRH.RH_Date — INFO The fixed query ought to pull out three entries with the sample data
race 2, Horse 1 (AVG=50)
race 3, Horse 2 (AVG=50)
race 4, Horse 2 (AVG=75) race 1 does not appear because no horses have any ratings before race 1 was run.
race 2 is the same as the average of each horses rating from race 1 (only 1 rating each, therefore horse 1 qualifgies with a rating of 50)
race 3 takes average ratings from both race 1 and 2. Horse 2 qualifies with a average of (30+70)/2 = 50
race 4 takes average ratings from races 1, 2 and 3. Horse 2 is pulled out again with the highest average being (70+80)/2 = 75

Btw, please delete this if you feel it is off topic. I realise it is not performance tuning as such, but if a query is failing even to parse, one could say it doesnt really perform very well (hehe)!

Okay, I’ve been working all day, jumping from SQL code to ASP code to analysing various server perfomance data to troubleshooting IIS problems. My brain is already turning into mush then I see this! <img src=’/community/emoticons/emotion-5.gif’ alt=’;-)’ /> I don’t have an immediate answer, although my first impression is that you are trying to be too specific with your nested derived tables. I’ll try look at this tomorrow, but would suggest maybe rethinking the logic a bit. Maybe try to incorporate the HIR_RNo into the derived tables and join on that, rather than having the sub-query so deep in there… ie, try to get rid of the inner IN clause and replace it with a join. I’ll think on it and get back to you.

Hehe yeah, its not the most straight forward of problems even to explain, which is why ive now got a standard script to present when asking the question in various places. Thanks for taking an interest, Im at a dead end with it and its coming close to impacting my project. You could be right about the subquery being better as a join, but im unsure if its possible. The reason I use so many subqueries is because i need to get the top two highest ratings for each horse (prior to the current race), then take an average of them for each horse, and then pick the horse with the highest average. Perhaps a user defined function could be useful, but I was hoping to avoid that because i think performance would not be so good.

I thought id resurrect this thread because the forum has had a lot of new members since I posted it some time ago. I am still totally stumped as to how to rewrite the query detailed above. Could anyone please suggest a way I can do the following (given the schema and sample data in initial post)..
For EACH race (in Table NEW_RH) ..
FOR each horse (in Table NEW_H) in current race (according to NEW_HIR.HIR_RNo = NEW_H.H_No)
Find this horses highest two ratings (NEW_H.H_Rating), gained prior to the date (NEW_RH.RH_Date) of this race
Average these two ratings for this horse
For this race, select the horse with the highest average
My problem is the query I have written appears to be correct, but it does not parse due to the fact I use derived tables. Ive tried to restructure it to use Joins but I am struggling. If anyone has any thoughts please advise, as this is a big sticking point for me now.
Thanks!

Chappy – I’ll try to help with the part of finding the 2 highest rating…
SELECT TOP 2 x.HIR_HNo, AVG(x.MAX_rating) av_MAX_rating
FROM (
— the highest score
SELECT HIR_HNo, MAX(HIR_Rating) MAX_rating
FROM NEW_HIR a
WHERE HIR_Rating IS NOT NULL
GROUP BY HIR_HNo UNION ALL
— the second highest score (not sure what you want to do if they are equal)
SELECT b.HIR_HNo, MAX(b.HIR_Rating) MAX_rating
FROM NEW_HIR a, NEW_HIR b
where a.HIR_HNo = b.HIR_HNo
AND a.HIR_Rating > b.HIR_Rating
AND a.HIR_Rating IS NOT NULL
GROUP BY b.HIR_HNo
) x
GROUP BY HIR_HNo
ORDER BY av_MAX_rating DESC
Let me know if I’m in the right direction… Bambola.

Hi Bambola, Thanks for spending time on this. This certainly looks useful, although this query needs running for every race in the outside loop, and the top average must come from a horse in the current race, and only ratings gained prior to the current races date can form part of the average. I think I will attempt to turn this into a UDF, which takes the current raceID and racedate to do this, and see how it fares. Interesting way in which you found the top 2, I hadnt thought of using it that way. Thanks

I thought about UDF myself. don’t know if I’m getting any closer…
ALTER function AvarageMaxRating()
RETURNS TABLE
AS
RETURN
SELECT TOP 2 x.HIR_HNo HorseNumber, AVG(x.MAX_rating) Av_MAX_rating, RH_Date
FROM (
— the highest score
SELECT HIR_HNo, MAX(HIR_Rating) MAX_rating, b.RH_Date RH_Date
FROM NEW_HIR a
INNER JOIN NEW_RH b ON b.RH_RNo = a.HIR_HNo
WHERE a.HIR_Rating IS NOT NULL
—
GROUP BY HIR_HNo, RH_Date UNION ALL
— the second highest score (not sure what you want to do if they are equal)
SELECT b.HIR_HNo, MAX(b.HIR_Rating) MAX_rating, c.RH_Date RH_Date
FROM NEW_HIR a INNER JOIN NEW_HIR b
ON a.HIR_HNo = b.HIR_HNo
INNER JOIN NEW_RH c
ON c.RH_RNo = a.HIR_HNo
WHERE a.HIR_Rating > b.HIR_Rating
AND a.HIR_Rating IS NOT NULL
GROUP BY b.HIR_HNo, RH_Date
) x
GROUP BY HIR_HNo, RH_Date
ORDER BY av_MAX_rating DESC SELECT a.*, h.H_Name, r.*
FROM dbo.NEW_HIR i
INNER JOIN dbo.NEW_H h
ON h.H_No = i.HIR_HNo
INNER JOIN dbo.NEW_RH r
ON r.RH_RNo = i.HIR_RNo
INNER JOIN dbo.AvarageMaxRating() a
ON a.HorseNumber = i.HIR_HNo AND r.RH_Date = a.RH_Date Bambola.

Thanks for your continued help. This looks promising, but is not quite there… For a given race it needs to get average of two highest ratings (prior to race date) for each horse in that race, and then pick the horse with the highest average. I think this might be achieved by passing in the racedate to the UDF you have designed,
and making sure only ratings prior to this date can form part of the average.
Then ill change the function to only return one horse (the top one) for each race, then I can join this result set onto my main query

Making progress now.. <pre> ALTER function AverageMaxRating(@currentRaceDate datetime, @currentRaceID int) RETURNS TABLE AS RETURN SELECT TOP 1 x.HIR_HNo HorseNumber, AVG(x.MAX_rating) Av_MAX_rating FROM ( – find the highest rating (gained before @currentRaceDate) for this horse SELECT HIR_HNo, MAX(HIR_Rating) MAX_rating FROM NEW_HIR a INNER JOIN NEW_RH b ON b.RH_RNo = a.HIR_RNo WHERE ISNULL(a.HIR_Rating, 0) > 0 and b.RH_DateTime < @currentRaceDate and a.HIR_HNo IN (select HIR_HNo from NEW_HIR d where d.HIR_RNo = @currentRaceID) GROUP BY HIR_HNo UNION ALL – find the 2nd highest rating for this horse (gained before @currentRaceDate) – this is trickier than it looks, because 2nd value could equate to the first, so – find the rating where there exists exactly one rating above or equal to it SELECT b.HIR_HNo, MAX(b.HIR_Rating) MAX_rating FROM NEW_RH c INNER JOIN NEW_HIR b ON (c.RH_RNo = b.HIR_RNo) WHERE ISNULL(b.HIR_Rating, 0) > 0 and b.HIR_HNo IN (select HIR_HNo from NEW_HIR d where d.HIR_RNo = @currentRaceID) and EXISTS (SELECT HIR_HNo from NEW_HIR e INNER JOIN NEW_RH f ON (f.RH_RNo = e.HIR_RNo) where b.HIR_HNo = e.HIR_HNo and b.HIR_Rating <= e.HIR_Rating and b.HIR_RNo <> e.HIR_RNo and f.RH_DateTime < @currentRaceDate) and c.RH_DateTime < @currentRaceDate GROUP BY b.HIR_HNo ) x GROUP BY HIR_HNo ORDER BY av_MAX_rating DESC GO </pre> The UDF idea turned out to be a good one, It allows the inner queries to access the parameters, where previously these were part of the outside query and couldnt be accessed by a derived table. Given a race ID, and the date that race occurred on (I could query to find this based on the given Race ID, but the outer query will have it already to hand anyway. Id need to test to see whether this actually improves performance, but intuition says it ought to), this UDF finds the two highest ratings for each horse in that race, averages them, and spits out the horse who averaged the highest. Jobs a good ‘un <img src=’/community/emoticons/emotion-1.gif’ alt=’

‘ /> Now Ill see if I can adapt it so that it can work off @N averages instead of 2! Many thanks for all your help bambola. I was really struggling to think outside the subquery tactic I was using prior to your suggestion.

]]>