Site sponsored by: Idera Try Idera’s new SQL admin toolset
SQL Server Performance

  • Home
  • Articles
  • Forums
  • Tips
  • Quiz
  • FAQ's
  • Blogs
  • Software
  • Books
  • About Us
RSS Feeds
Sign in | Join


Article Topics

All Articles
Performance Tuning
Audit
Business Intelligence
Clustering
Reporting Services
Developer
General DBA
ASP.NET / ADO.NET

Write for Us

Share you SQL Server knowledge with others and raise your profile in the community More...
Latest Articles

Capture DDL Changes using Change Data Capture with SQL Server 2008 ...
Business Intelligence in Collaborative Planning, Forecasting and Replenishment
Inside SQL Server Cluster Setup and Troubleshooting Techniques - Part I ...
Configure and Manage Policy Based Management in SQL Server 2008 ...

More     
 
Latest FAQ's

Cannot Start SQL Server Service
Users are able to connect to report manager but not able ...
Errors when SQL Server Snapshot Replication is Running
How to Display Server Name or IP Address in a Reporting ...

More     
   
Latest Software Reviews

Spotlight on ApexSQL Doc 2008
ApexSQL Enforce
Embarcadero Change Manager
SQL Server DBA Dashboard

More     

articles >> performance tuning >> Reduce Aggravating Aggregation: Improve the Performance of ...

Reduce Aggravating Aggregation: Improve the Performance of History or Status Tables

By : Merrill Aldrich
Aug 29, 2007

Page 2 / 2

To avoid changing the application code that creates orderStatus rows, it’s possible to mimic the insert behavior of the original design by adding a trigger to the orderStatusCurrent table. The trigger will, upon insert, move any existing rows for the same orders to the orderStatusHistory table, and then insert the new rows. In this way, the application can continue with the simple model of inserting status rows as needed, but the database engine will automatically separate the latest rows from the older ones.

The trigger code looks like this:

CREATE TRIGGER moveOrderStatusToHistory 
ON orderStatusCurrent
INSTEAD OF INSERT 
AS

INSERT INTO orderStatusHistory( orderid, statusDate, status )
    SELECT existing.orderid, 
        existing.statusDate, 
        existing.status
    FROM orderStatusCurrent existing
    INNER JOIN inserted ON existing.orderid = inserted.orderid

DELETE FROM orderStatusCurrent 
    FROM orderStatusCurrent existing
    INNER JOIN inserted ON existing.orderid = inserted.orderid

INSERT INTO orderStatusCurrent( orderid, statusDate, status )
    SELECT orderid, 
        statusdate, 
        [status]
    FROM inserted 
GO

A side note on triggers: whenever creating trigger code, though it can be challenging, make absolutely certain the code is set based, and can handle multiple rows. It’s a mistake ­ and I have seen it in commercial, production code ­ to assume that a table will only be manipulated one row at a time and create a trigger based on that assumption. Triggers must correctly handle sets of rows.

The trigger has three steps: it first inserts any rows that already exist in the orderStatusCurrent table, for the same orders as the rows being inserted, into the orderStatusHistory table; it then removes those rows from orderStatusCurrent. Finally, it inserts the new, incoming status rows. In effect the new rows replace the old ones, and the old ones get “archived.”

To test the trigger, we can run the same sequence of inserts against the new orderStatusCurrent table, and watch to make sure that a) exactly one row per order remains in the table, and b) the older rows are getting moved to orderStatusHistory correctly:

/* Test against one order */
SELECT * FROM orderStatusCurrent

INSERT INTO orderStatusCurrent( orderid, statusDate, status )
VALUES ( 1000, GETDATE(), 'Fulfillment' )

SELECT * FROM orderStatusCurrent

INSERT INTO orderStatusCurrent( orderid, statusDate, status )
VALUES ( 1000, GETDATE(), 'Stocking' )

SELECT * FROM orderStatusCurrent
SELECT * FROM orderStatusHistory

INSERT INTO orderStatusCurrent( orderid, statusDate, status )
VALUES ( 1000, GETDATE(), 'Packaging' )

SELECT * FROM orderStatusCurrent
SELECT * FROM orderStatusHistory

INSERT INTO orderStatusCurrent( orderid, statusDate, status )
VALUES ( 1000, GETDATE(), 'Shipping' )

SELECT * FROM orderStatusCurrent
SELECT * FROM orderStatusHistory

DELETE FROM orderStatusCurrent
DELETE FROM orderStatusHistory
GO

If that works as expected, then we can run similar insert code to the first example, to populate the status records for all orders with test data:

INSERT INTO orderStatusCurrent ( orderid, statusdate, [status] )
    SELECT orderid, orderdate, 'Fulfillment' FROM orders

INSERT INTO orderStatusCurrent ( orderid, statusdate, [status] )
    SELECT TOP 800000 orderid, DATEADD( day, 5, orderdate ), 'Stocking'
    FROM orders ORDER BY NEWID()

INSERT INTO orderStatusCurrent ( orderid, statusdate, [status] )
    SELECT TOP 600000 orders.orderid, DATEADD( day, 5, orderStatusCurrent.statusdate ), 'Packaging'
    FROM orders INNER JOIN orderStatusCurrent ON orders.orderid = orderStatusCurrent.orderID
        AND orderStatusCurrent.status = 'Stocking'
    ORDER BY NEWID()

INSERT INTO orderStatusCurrent ( orderid, statusdate, [status] )
    SELECT TOP 400000 orders.orderid, DATEADD( day, 2, orderStatusCurrent.statusdate ), 'Shipping'
    FROM orders INNER JOIN orderStatusCurrent ON orders.orderid = orderStatusCurrent.orderID
        AND orderStatusCurrent.status = 'Packaging'
    ORDER BY NEWID()

INSERT INTO orderStatusCurrent ( orderid, statusdate, [status] )
    SELECT TOP 200000 orders.orderid, DATEADD( day, 2, orderStatusCurrent.statusdate ), 'Shipped'
    FROM orders INNER JOIN orderStatusCurrent ON orders.orderid = orderStatusCurrent.orderID
        AND orderStatusCurrent.status = 'Shipping'
    ORDER BY NEWID()

INSERT INTO orderStatusCurrent ( orderid, statusdate, [status] )
    SELECT TOP 100000 orders.orderid, DATEADD( day, 2, orderStatusCurrent.statusdate ), 'Received'
    FROM orders INNER JOIN orderStatusCurrent ON orders.orderid = orderStatusCurrent.orderID
        AND orderStatusCurrent.status = 'Shipped'
    ORDER BY NEWID()

After these inserts, the current status value for each order will be in the target table, and the trigger will have moved the previous status values to the history table. Now we can run some comparisons to see what sort of performance improvement this yields.

First, a comparison of a large query reporting the current status of each order:

/* Select using the separate history table */
SELECT    o.orderID,
    o.customerID,
    o.orderDate,
    o.description,
    os.statusDate,
    os.status
FROM orders o
INNER JOIN orderStatusCurrent os
    ON o.orderID = os.orderID 
OPTION( MAXDOP 1 )

/* Select using the original single status table */
SELECT    o.orderID,
    o.customerID,
    o.orderDate,
    o.description,
    os.statusDate,
    os.status
FROM orders o
INNER JOIN ( 
    SELECT orderid, MAX(statusdate) maxdate FROM orderStatus GROUP BY orderid
) lastStatusDates ON o.orderID = lastStatusDates.orderid
INNER JOIN orderStatus os
    ON o.orderID = os.orderID AND lastStatusDates.maxdate = os.statusDate
OPTION( MAXDOP 1 )

Note: I am using MAXDOP 1 to keep these test cases single-threaded, for a simpler apples-to-apples comparison. Analyzing them with parallelism is outside the scope of this exercise.

The optimizer reports that the first query, with the separate history table, has an estimated cost around 15; the original design shows a cost of 33.4, so we are getting something around twice the performance from splitting the table.

Another sample query involves fetching all orders having a specific status:

SELECT    o.orderID,
    o.customerID,
    o.orderDate,
    o.description,
    os.statusDate,
    os.status
FROM orders o
INNER JOIN orderStatusCurrent os
    ON o.orderID = os.orderID 
WHERE os.status = 'Packaging'
OPTION( MAXDOP 1 )

SELECT    o.orderID,
    o.customerID,
    o.orderDate,
    o.description,
    os.statusDate,
    os.status
FROM orders o
INNER JOIN ( 
    SELECT orderid, MAX(statusdate) maxdate FROM orderStatus GROUP BY orderid
) lastStatusDates ON o.orderID = lastStatusDates.orderid
INNER JOIN orderStatus os
    ON o.orderID = os.orderID AND lastStatusDates.maxdate = os.statusDate
WHERE os.status = 'Packaging'
OPTION( MAXDOP 1 )

Here the difference is even more: the split status tables have a query plan cost of 14, compared to 53 for the original design. Finally, orders for a specific customer:

SELECT    o.orderID,
    o.customerID,
    o.orderDate,
    o.description,
    os.statusDate,
    os.status
FROM orders o
INNER JOIN orderStatusCurrent os
    ON o.orderID = os.orderID 
WHERE o.customerID = 12345
OPTION( MAXDOP 1 )

SELECT    o.orderID,
    o.customerID,
    o.orderDate,
    o.description,
    os.statusDate,
    os.status
FROM orders o
INNER JOIN ( 
    SELECT orderid, MAX(statusdate) maxdate FROM orderStatus GROUP BY orderid
) lastStatusDates ON o.orderID = lastStatusDates.orderid
INNER JOIN orderStatus os
    ON o.orderID = os.orderID AND lastStatusDates.maxdate = os.statusDate
WHERE o.customerID = 12345
OPTION( MAXDOP 1 )

Cost for the original design 0.096; with the status table split, that improves to 0.063.

The one case that becomes slightly tricky is gathering all the status values for a particular order. In the original design, this is simply

SELECT o.orderID,
    o.customerID,
    o.orderDate,
    o.description,
    os.statusDate,
    os.status
FROM orders o
INNER JOIN orderStatus os
    ON o.orderID = os.orderID 
WHERE o.orderID = 12345

But with the new design we are forced to do some sort of union to get both the current and past values, like these two examples:

SELECT o.orderID,
    o.customerID,
    o.orderDate,
    o.description,
    st.statusDate,
    st.status
FROM orders o
INNER JOIN ( select * from orderStatusCurrent os 
    UNION ALL
    select * from orderStatusHistory ) st
    ON o.orderID = st.orderID 
WHERE o.orderID = 12345


SELECT o.orderID,
    o.customerID,
    o.orderDate,
    o.description,
    os.statusDate,
    os.status
FROM orders o
INNER JOIN orderStatusCurrent os
    ON o.orderID = os.orderID 
WHERE o.orderID = 12345
UNION ALL
SELECT o.orderID,
    o.customerID,
    o.orderDate,
    o.description,
    os.statusDate,
    os.status
FROM orders o
INNER JOIN orderStatusHistory os
    ON o.orderID = os.orderID 
WHERE o.orderID = 12345

Here, unsurprisingly, the original design costs about 30% less, but (this did surprise me) of the other two examples, the first is about 25% faster than the second. The optimizer does a very good job at refactoring the derived table containing the union.

There is some cost for this improvement in select performance, as one would expect. Instead of a single insert, a status change for an order record would require two inserts and a delete, moving the old record from one table to the other. So if the speed of inserts is the only concern, this change might not be advisable; however, on many systems (including the one I was working with) the select/reporting traffic outweighs the transactional load, often many times over, so this change would probably help overall.

What I took away from this study was this: for scalability, it’s obviously not wise to mix rows of different kinds in the same table ­ but that rule can apply even when the difference is as subtle as current vs. past values, that otherwise have exactly the same structure.


<< Prev Page         








Home | Peformance Articles | Audit Articles | Business Intelligence Articles | Clustering Articles | Developer Articles | Reporting Services Articles | DBA Articles | ASP.NET / ADO.NET Articles | DBA FAQ's | Developer Peformance FAQ's | DBA Peformance FAQ's | Developer FAQ's | Clustering FAQ's | Error Messages | Audit Tool Reviews | Backup Tool Reviews | Coding Tool Reviews | Compare Tool Reviews | Documentation Tool Reviews | Design Tool Reviews | Monitoring Tool Reviews | Log Tool Reviews | Reporting Tool Reviews | Clustering Tool Reviews | Security Tool Reviews | Change Management Tool Reviews | Remote Access Tool Reviews | Book Reviews | Security Tool Reviews | QDPMA Performance Tuning | ADO.NET / ASP.NET | Administration | Analysis/OLAP Services | Application Development | Configuration | Components | ETL | Hardware | High Availability | Hints | Index | Misc | Operating Systems | Performance Tuning | Replication | T-SQL | Views


              © 1999-2008 by T10 Media. All rights reserved