Is there a better way? | SQL Server Performance Forums

SQL Server Performance Forum – Threads Archive

Is there a better way?

hi A bit of background first we have 2 major tables in a database which stores data in blobs, timeseries and timeseriesreference. We are running on MsSqlServer 2000. The table structure for both is fairly simple
timeseries
superkey, int
Savetime, datetime
timeseriesdata, image
schemaversion, int
savedby, varchar[50] this table is indexed on (superkey,Savetime)
timeseriesreference
ParentSuperkey, int
ParentSavetime, datetime
Superkey, int
Savetime, datetime this table is indexed on (parentsuperkey,parentsavetime)
superkey relates to what type of record is stored, there are approximately 700 different types of objects. Approximately 13,000 records are inserted into timeseries and around 2000 references are added daily. The total table sizes are just under 7 million and 1.5 million respectively The issue i’m having is performing a history request on the data.
A typical request could be for a year’s worth of data, but we are only interested in the last record saved daily. Another request could be for the data produced in the last month, but retrieving records with hourly frequency. Also if the primary type of data requested has a referenced type the referenced type object may have to be retrieved. Certain types of object chain more than one type, eg Type 1 references type 2 & 3 and type 2 references type 4. So for every type 1 object returned from the initial query a type 2,3 & 4 will need to be retrieved. For every object returned the timestamp of the next saved object is needed so the user knows what time period this object is valid for. The stored procedure i have written is; CREATE PROCEDURE __GetTimeseriesHistoryInterval
@SuperKey Integer,
@StartDate Datetime,
@EndDate Datetime,
@Interval Float,
@GetRefs bit
AS
–get interval’d objects
declare @CurrentTime as DateTime select
@CurrentTime = @StartDate –Create temporary table
create table #tmp_Dates (dt dateTime, ndt dateTime) –Populate date table
declare @LastTime as DateTime
declare @NextTime as DateTime select
@LastTime = 0 select
@NextTime = @CurrentTime while @CurrentTime < @EndDate + @Interval
begin
–last / next date times
declare @ldt as datetime
declare @ndt as datetime –find last timestamp
exec
FindTimeseriesObjectTimestamp @Superkey,@CurrentTime,@ldt OUTPUT if @LastTime != @ldt
begin
exec
FindNextTimeseriesObjectTimestamp @Superkey,@CurrentTime,@ndt OUTPUT insert into
#tmp_Dates
values
(@ldt,@ndt) select
@LastTime = @ldt
end select
@CurrentTime = @[email protected]
end –get tsObjects
select
Ts.Superkey,
Ts.SaveTime,
Dates.ndt as NextObjectTime,
Ts.SchemaVersion,
Ts.SavedBy,
Ts.TimeSeriesData
into
#Ts_Results
from
Timeseries as Ts
inner join
#tmp_Dates as Dates
on
Ts.savetime=Dates.dt
where
Ts.superkey = @Superkey select SuperKey, SaveTime, NextObjectTime, SchemaVersion, SavedBy, TimeSeriesData from #Ts_Results
–Get References
select
TsRef.Superkey,
TsRef.SaveTime,
TsRef.ParentSavetime
into
#Ts_Refs
from
TimeseriesReference as TsRef
inner join
#Ts_Results as TsRes on TsRef.ParentSuperKey = TsRes.Superkey
and TsRef.ParentSaveTime = TsRes.SaveTime select
SuperKey,
SaveTime,
ParentSavetime
from
#Ts_Refs while (select count(distinct(SaveTime)) from #Ts_Refs)>0 and @GetRefs = 1
begin
truncate table #Ts_Results insert
#Ts_Results
select
Ts1.Superkey,
Ts1.SaveTime,
Ts2.SaveTime as NextObjectTime,
Ts1.SchemaVersion,
Ts1.SavedBy,
Ts1.TimeSeriesData
from
#Ts_Refs as TsRefs
inner join
Timeseries as Ts1
on
TsRefs.SuperKey = Ts1.Superkey
and TsRefs.SaveTime = Ts1.SaveTime
left join
Timeseries as Ts2
on
TsRefs.Superkey = ts2.Superkey
and Ts2.savetime = ( select min(SaveTime) from TimeSeries where SaveTime > Tsrefs.SaveTime and Superkey = Tsrefs.Superkey )
order by
Ts1.Superkey select
Superkey,
SaveTime,
NextObjectTime,
SchemaVersion,
SavedBy,
TimeSeriesData
from
#Ts_Results truncate table #Ts_Refs insert
#Ts_Refs
select
TsRef.Superkey,
TsRef.SaveTime,
TsRef.ParentSavetime
from
TimeseriesReference as TsRef
inner join
#Ts_Results as TsRes on TsRef.ParentSuperKey = TsRes.Superkey
and TsRef.ParentSaveTime = TsRes.SaveTime select
SuperKey,
SaveTime,
ParentSavetime
from
#Ts_Refs
end
GO
the two stored procedures this procedure calls are FindTimeseriesObjectTimestamp and FindNextTimeseriesObjectTimestamp which find the corresponding timestamps around a specified time for a certain type Is there a better way of doing this operation or a way of optimising this procedure?

i am inclined to say yes there is a better way,
but it would require actually doing the work with the actual data, example, don’t insert the image data into the temp table if the average size of this field is large, do a join back to the original table to get this info. so is it worth it to you to hire someone to do this work?
how much time does it take to run this sproc?
is it causing problems as it is now?
I didn’t really read everything, but I found one significant improvement <img src=’/community/emoticons/emotion-1.gif’ alt=’:)‘ /><br />Replace:<pre id="code"><font face="courier" size="2" id="code">(select count(distinct(SaveTime)) from #Ts_Refs)&gt;0</font id="code"></pre id="code">with<pre id="code"><font face="courier" size="2" id="code">exists(select * from #Ts_Refs)</font id="code"></pre id="code">
I think my first attempt would be to prepopulate a temporary table containing a timestamp for each interval you are interested in. I dont know how many intervals are typical for a report so this may or may not be feasible. But then you could add an index to this temp table, and join it onto your main result set, the idea being to get it all back in one hit. This would eliminate the while loop and Id guess the performance gains are potentially large To do this would also require you to recode FindTimeseriesObjectTimestamp so that it is inline, and ditch the ‘exec’. But id recommend this anyway if possible. Additionally, you appear to repeatedly fill and truncate two temporary tables from the results of a single select. Im not sure I understand the need for the temporary table in this case. Why not just do a direct select so results go straight back to client, rather than select into temp table, and then select temp table back to client? This will cut down on the slow disk activity, help to conserve your cache, and maybe even improve concurrency to boot! If you DO need to do it via a temporary table (though I dont see why), also a suggestion is to repcreate the temporary table. You know its schema, so do a CREATE #table and then INSERT INTO #table, rather than a SELECT INTO. This gain will probably not amount to much unless the final while loop iterates a lot of times, but again its good practice to do this anyway Just my thoughts!

Maybe you could post the script for FindTimeseriesObjectTimestamp
I think it could be possible to eliminate the first while loop altogether, without even using the temporary interval table I mentioned in previous reply. This would be a better solution
hi thanks for your replies, i’ll deal with the easiest question first CREATE PROCEDURE FindTimeseriesObjectTimestamp
@SuperKey Integer,
@Timestamp Datetime,
@LastTimestamp Datetime OUTPUT
AS
select @LastTimestamp = max(SaveTime) from TimeSeries where [email protected] and SaveTime <= @Timestamp
if @LastTimestamp = NULL
begin
[email protected] is prior to the start of the data move forwards instead
select @LastTimestamp = min(SaveTime) from TimeSeries where [email protected] and SaveTime >= @Timestamp
end GO CREATE PROCEDURE FindNextTimeseriesObjectTimestamp
@SuperKey Integer,
@Timestamp Datetime,
@NextTimestamp Datetime OUTPUT
AS
select @NextTimestamp = min(SaveTime) from TimeSeries where [email protected] and SaveTime > @Timestamp GO mmarovic, implemented your improvement and for that part of the query the execution cost has sped up 10x, thanks chappy, not sure your suggestion would work with the data (unless i misunderstood it). The reason being if the user requested all the close figures for a certain object (Superkey 71) over a week in feburary ‘2005-02-07 18:00’ to ‘2005-02-11 18:00’. The last valid objects for each day of this week are: 2005-02-07 16:14:18
2005-02-08 16:27:09
2005-02-09 16:16:37
2005-02-10 16:14:18
2005-02-11 16:07:32 So if the timestamps are precalculated, they would all be at 18:00. I’ll move FindTimeseriesobjectTimestamp and findnext… inline though, or at the very least combine them. I think it’ll be possible to remove the #Ts_Results table, definately the first time its used and select it back to the client. The reason i used #Ts_Results at that stage was i could then query against the timeseriesreference table, but i already have the information i join against in the #tmp_Dates table. I’ll try this later and post the results. Unsure wether this would work inside the while loop (but doing some analysis of user requests the parameter @GetRefs is TRUE very infrequently so its not often used. Heres what the bottom part of the SP does, using the same period and interval for a week in feburary, the references returned for the above timestamps:
512005-02-07 16:13:50.000
512005-02-08 16:27:05.000
512005-02-09 16:16:35.000
512005-02-10 16:14:05.000
512005-02-11 16:06:47.000
if @GetRefs is TRUE then the objects that relate to this timestamps would also need to be retrieved from the timeseries table which is the first part of the bottom while loop (insert #Ts_Results…). Then looking up these objects in the references table returns (the bottom part of the while loop (insert #Ts_Refs): 502005-02-07 16:13:50.000
502005-02-08 16:27:05.000
502005-02-09 16:16:35.000
502005-02-10 16:14:04.000
502005-02-11 16:06:47.000 It then loops returning the objects relating to the records in #Ts_Refs. There are no more records so #Ts_refs is empty at the end of the second loop. Typical calls are daily requests over years (Normally over the full data), or shorter periods of time, 1 week 5 minute interval or 1 month 30 minute / 1 hour. Joe Chang, 1. no, budget wouldnt allow at the moment, 2. around 20 seconds to call the SP from for 5 years worth of data, this returns around 40megs worth of data. Slight problems as it causes the sql server to slow down and interfere with other databases on there. Thanks for the help + advice again, i’ll hopefully to get time today to implement some of the suggestion, if i do i’ll repost the modified SP. Tony

]]>