SQL Server Performance

Selecting only most recent row in a related table

Discussion in 'SQL Server 2005 General Developer Questions' started by Heartsbane, Oct 24, 2009.

  1. Heartsbane New Member

    Consider a database with 2 large tables -- Book and BookLog. The Book table contains millions of book records and BookLog contains activity about those books (e.g. acquired, checked out, checked in, damaged, repaired, etc.). Each Book could have 0 to many (thousands) of log entries. Each log entry contains a datetime that indicated when it was recorded.
    I need to create a stored proc that will return all books along with the most recent log entry (if one exists). I can think of a few ways to do this, but all require multiple steps/queries and suffer from performance problems. Anyone have any suggestions for a query / stored proc like this that maximizes performance for large tables?
    I run into this scenario quite often - you've got a master table then a related table with 0 to many rows in it, and want to return a query that shows the last (or first or whatever) related record. I've never found a solution to this type of query that I'm happy with.
  2. davidfarr Member

    My prefered solution to this type of scenario is to create a new column on the BookLog table and write an INSERT trigger on this table that updates this column with an index number of some kind to mark the first or last log entry record for that ISBN reference.
    As a simple example; the column can be a BIT data type with a trigger on the table that updates (or sets the inserted value) of the most recent record to value 1 and also updates the previous record (having the same ISBN number) from value 1 to value 0.
    You could expand on this same idea to create a more elaborate set of index numbers in an INT column, with specific numbers to indicate first or last records, etc. The performance benefit will be that the trigger will only be updating two records per ISBN per INSERT. SELECT queries can then join the tables on the foreign key reference (perhaps ISBN) and the column that is updated by the trigger, to join on the last record, or first record, etc.
    If you want even faster performance, and if you're only interested in the last log record per book, you could set a BooksLog table trigger to update a column in the Books table with the latest Primary Key value from the BookLog table for a specific ISBN. The subsequent SELECT would then only need to join on this one PK value, which can be referenced in both tables, almost like having a second foreign key with a 1 to 1 relationship.
  3. Heartsbane New Member

    Thanks for these suggestions. I like both ideas. Unfortunately in my case I can't (easily) modify the underlying table structures, create triggers, etc. However I will keep these suggestions in mind, and may be able to make a strong enough argument to implement this type of approach.
  4. satya Moderator

    What is the hardware and SQL version you are talking here?
    I bet most of the times the performance optimization can be achieved if the hardware is good enough when there is a little chance of schema changes for normalization.

Share This Page