Best Practices for Versioning Data | SQL Server Performance Forums

SQL Server Performance Forum – Threads Archive

Best Practices for Versioning Data

At the company I work for, the DBA and I (I am a developer) have a disagreement about what is the best way to version data. The DBA believes that it is best to keep prior versions of a row’s data in a seperate, identical table and only the most current version of the data should be stored in the main table. I think that it is better to keep all row data together in the same table and have a version number in that table to use (with the MAX function) to retrieve the most current data. What is the better solution? The DBA argues that using MAX will degrade performance of our queries. I argue that maintaining these duplicate tables will be harder because any changes made to the main table must be reflected to the copy table. I also think that creating and maintaining stored procedures, not to mention the application code, would be more of a headache. But it has been standard practice at this company to handle versioning in this manner and if that is what’s best practice, then I am all for it. But somehow, I think that this isn’t the way it should be done. Can someone point me (us) in the right direction? Also, before anyone asks, there will probably be between 200K-300K rows of data we will need to store with maybe 1/3 of that being the most current version.
Well, storing the data in a separated table has the charming aspect that you keep your production table as small as possible. Keeping archive data in a prod table means you blow up that table, thus forcing SQL Server to carry out more read operations than are necessary to fulfill a user’s request. I’m not really sure if you notice a serious performance hit either way with only 300k rows at all, but personally I would also choose a separated table. Just my $0.02 cents anyway —
Frank Kalis
Microsoft SQL Server MVP
http://www.insidesql.de
Heute schon gebloggt?http://www.insidesql.de/blogs

Whilst its not uncommon to store all historic data in the same table, as well as the current, I do agree with the DBA in that its not really desirable to do a MAX everytime you want to retrieve current data. How often would you need to refer to historic data? Is it used in any reports or routines, or is it simply to keep as an audit trail? I suspect your resistance to this method is because it makes it more difficult to write queries. Im not sure this should be a valid argument, since using the DBS’s method, once the query is written, its written. Whereas using your method, it impacts the database performance permanently. An alternative method is a creation datetime (timestamp) and a current 0/1 tinyint. Then at least you could simply say where current = 1 which would have less impact than doing a MAX.
Also you could place an indexed view on the current data and query that, which would help alleviate the performance problem
]]>