SQL Server Performance

Implementing derived table fields using views

Discussion in 'T-SQL Performance Tuning for Developers' started by Yarik, Oct 4, 2005.

  1. Yarik New Member

    Hello,

    I am not sure that the term "derived table field" would be recognizable, so it is probably better to illustrate my problem at hand using a simplified example.

    Let's say, there is a table named Product and it has fields named Price and SalePrice. This table is used all across the system and is referred to by many views, sprocs, and UDFs. Some of those views/sprocs/UDFs need to calculate a product price discount based on Price and SalePrice fields. The question is: what is the best approach to encapsulate discount calculation in a single place (to avoid repeated SQL code)?

    One option that comes to mind is to create a view that would encapsulate the widely used logic into a calculated field and then use that view everywhere instead of the Product table itself. For example:



    ---------------
    -- Utility View
    ---------------
    create view ProductEx as
    select
    *, -- Or explicit list of all the Product's fields...
    (Price - SalePrice) / SalePrice as Discount
    from
    Product



    ---------------
    -- Calling code
    ---------------
    select
    ...
    ProductEx.Price -- Instead of Product.Price
    ProductEx.Discount -- Instead of complex formula with Product's fields
    ...
    from
    ProductEx -- Instead of Product


    Another option is also to create an utility view encapsulating the logic, but make this view somewhat complimentary to the original table (as opposite to "replacing" the table). For example:



    ---------------
    -- Utility View
    ---------------
    create view ProductEx as
    select
    ID, -- The primary key of the Product table
    (Price - SalePrice) / SalePrice as Discount
    from
    Product



    ---------------
    -- Calling code
    ---------------
    select
    ...
    Product.Price -- As before
    ProductEx.Discount -- Instead of complex formula with Product's fields
    ...
    from
    Product -- As before

    inner join -- Little extra code which
    ProductEx -- still feels like a fair price
    on Product.ID = ProductEx.ID -- for better encapsulation of some logic



    Putting aside the issue of which of these two options provides more maintainable code, I would like to hear an expert opinion on performance implications of these two tricks:

    (1) Are they going to substantially degrade perfomance of the client code (in comparison to the initial version where the encapsulated calculation was repeated all over the place)?

    (2) Would there be any noticeable differences in performance between these two options?


    I'd greatly appreciate any comments and advices!

    Thank you,
    Yarik.

  2. dineshasanka Moderator

  3. Yarik New Member

    Dineshasanka, the very point is that I am trying to avoid doing what you suggest.

    As I said, I simplified the actual case for the sake of shorter problem description. Now, imagine that the calculation is more complex than (A-B)/B. Imagine that I need to do this calculation in several places. I want to eliminate repeated code so that I could change this formula in one place instead of a dozen of places.

  4. Madhivanan Moderator

    If you use Front end application to show these data, then why dont you do these calculations there?

    Madhivanan

    Failing to plan is Planning to fail
  5. benwilson New Member

    Hi, not too sure what the exact solution is, but have you considered that if you do it in a view, it will perform the calculation for every Product every time you call the view before returning the Product/s you are interested in??

    I would probably create a function that takes Price and SalePrice as inputs and returns the value

    'I reject your reality and substitute my own' - Adam Savage
  6. Yarik New Member

    Perhaps, I should have mentioned the following:<br /><br />I am using SQL Server 2000, so I know that I could use a scalar UDF to encapsulate the formula. However, there seem to be only two ways to write such UDF, and both ways are not satidfactory.<br /><br />The first way would be something like this:<br /><br /><pre id="code"><font face="courier" size="2" id="code"><br /> create function Discount(@Price as currency, @SalePrice as currency) <br /> returns double<br /> as<br /> return (@Price - @SalePrice) / @SalePrice<br /></font id="code"></pre id="code"><br /><br />This should not affect performance noticeably (I think <img src='/community/emoticons/emotion-1.gif' alt=':)' />, but it has two serious drawbacks: <br /><br /><ul><br /> <li>It leads to a less convenient client code: the writer of the client code has to remember which fields of the <b>Product</b> table have to be passed to this UDF. Also the client code would be less compact than it could.</li><br /> <li>In case of the real, more complicated formula, it makes client code less maintainable than it could: for example, if the formula changes so that it depends on other fields of the <b>Product</b> table, the parameters of the UDF have to be changed appropriately and, unfortunately, the client code would have to be changed too.</li><br /></ul><br /><br /><br />Another way to write the <b>Discount</b> UDF would be:<br /><br /><pre id="code"><font face="courier" size="2" id="code"><br /> create function Discount(@ProductID as int) <br /> returns double<br /> as<br /> return select (@Price - @SalePrice) / @SalePrice from Product where ID = @ProductID<br /></font id="code"></pre id="code"><br /><br />This variant encapsulates all I want to encapsulate, but its performance really sucks when this function is called for many products...<br /><br />So I am trying to understand whether the views described in my initial post would provide a more viable solution than UDFs...<br />
  7. Yarik New Member

    To Madhivanan: Unfortunately, there are several disparate "front-ends": an Excel add-in, an MS Access application, few standalone VB applications, a couple of Java applications, you name it... They all have to "share" this formula (and not only this formula). This makes the database the easiest place to encapsulate some parts of "business logic" like the formula I am talking about...
  8. benwilson New Member

    Hi Yarik,
    In my opinion, the view would provide worse performance- as i mentioned previously, the entire view would be built and the calculation would be performed for *every* product before the product/s you were interested in were returned to you...personally, I think the second udf you mentioned would be the way to go

    Ben


    quote:Originally posted by Yarik
    So I am trying to understand whether the views described in my initial post would provide a more viable solution than UDFs...


    'I reject your reality and substitute my own' - Adam Savage
  9. Yarik New Member

    quote:
    benwilson wrote:

    Hi, not too sure what the exact solution is, but have you considered that if you do it in a view, it will perform the calculation for every Product every time you call the view before returning the Product/s you are interested in??

    Yes, that's one of the pitfalls that I expected. However, it does not look like there is any problem like that. For example, let's consider the following two simple queries



    select ID from Product where <...condition...>
    select ID from ProductEx where <...condition...>


    where condition severely limits the number of returned records (for example, in case of my database it makes the queries return only 500 records out of total 100,000 records in the Product table). If your theory is right, then the second query should work much slower. But that's not what I see: their performance appears to be identical.
    (Maybe a table with 100K records is not big enough for a general test, but it fine for our purposes as long as it is not going to have more than 500K of records in the foreseeable future.)

    In fact, I tried some more complex experiments replacing Product by ProductEx in more complex views (included so-called nested views), and so far did not notice any significant impact on performance. But I cannot quickly test each and every object in our T-SQL codebase, so I was hoping that someone already went through the similar excercise and can confirm (or not confirm) that the abovementioned tricks with views are "performance-friendly"...

  10. Yarik New Member

    quote:Originally posted by benwilson
    In my opinion, the view would provide worse performance- as i mentioned previously, the entire view would be built and the calculation would be performed for *every* product before the product/s you were interested in were returned to you...personally, I think the second udf you mentioned would be the way to go

    That's not what I see in the following simple experiment:



    select Discount from ProductEx where ...
    select dbo.Discount1(Price,SalePrice) as Discount from Product where ...
    select dbo.Discount2(ID) as Discount from Product where ...


    The first and second queries perform almost identically. In fact, the first query based on "helper-view" even appears to perform a little bit better than the second (in spite of using the very same dbo.Discount1 function internally! -- go figure!?).

    The third query performs approximately two times slower than the first two (and this ratio seems to become only higher as I loosen the condition to return more records).

    As a matter of fact, I used UDFs like Discount2 a lot. They are really handy, indeed. However, their performance really sucks as the number of calls grows...

    Speaking of "handiness", in the last code snippet I took the opportunity to demonstrate how compactness and readability of the client code with "helper-view" trick compares to those of client code using UDF. Even the shortest Discount2 makes code more cluttered...
  11. Yarik New Member

    FWIW, this is an interesting response I received in a Microsoft's newsgroup:


    quote:
    It depends. Are you using calculated values in all( or least most ) times when you select sales and sales price? If that is the case Create Calculated column in the *table* itself( why view again) and of course you can index depeds on the situation.

    if you are not using everytime and only few times you need calculated values write a trigger/procedure with job which updates another table and join the tables when ever discount is called.

    So, a computed column appears to be a very strong contender in addition to the views that I originally had in mind, at least in terms of overall code maintainability. Unfortunately, the BOL's info on computed columns is pretty scarce. In particular, the issue of potential performance impact stays open...
  12. mmarovic Active Member

    quote: To Madhivanan: Unfortunately, there are several disparate "front-ends": an Excel add-in, an MS Access application, few standalone VB applications, a couple of Java applications, you name it... They all have to "share" this formula (and not only this formula). This makes the database the easiest place to encapsulate some parts of "business logic" like the formula I am talking about...
    That's what is middle tier used for.
    quote:In my opinion, the view would provide worse performance- as i mentioned previously, the entire view would be built and the calculation would be performed for *every* product before the product/s you were interested in were returned to you
    I am afraid that's not true. View definition is embeded in query by query optimizer and then execution plan is generated just like any other query (unless it is indexed or partitioned view).

    If you insist on maintaining such code in database, I would then make db api using stored procedures. Calling store procedure can do calculation using udf (@param1*@param2/@param3). Developers need to know only which sp to call from their code. That approach has further advantages from performance, code maintenance and security point of view.
  13. Yarik New Member

    quote:Originally posted by mmarovic


    quote:
    To Madhivanan: Unfortunately, there are several disparate "front-ends"...

    That's what is middle tier used for.
    ...
    If you insist on maintaining such code in database, I would then make db api using stored procedures. Calling store procedure can do calculation using udf (@param1*@param2/@param3). Developers need to know only which sp to call from their code. That approach has further advantages from performance, code maintenance and security point of view.

    Well, a middle tier or an SP-based "API-layer" probably are good things. However, in this case I'm not designing the system from scratch. Also I do not have a luxury of approval for major refactorings (too little resources)... Hopefully, the next version of this internal product will be reshaped significantly. But at this point I'm stuck with another "fruit" of the notorius do-it-right-now-and-think-later approach, so I can make only so little steps to improve the codebase. I'm afraid, I have to insist on maintaining a lot of "business logic" in the database...

    Anyway, dìkuji for your comments, Mirko!
  14. mmarovic Active Member

    Neni zac. [<img src='/community/emoticons/emotion-1.gif' alt=':)' />]<br /><br />Well, you don't have to redesign the whole system, you can just package the code you are working now inside stored procedure. <br />I guess the problem may be that in that case you would have to write all procs, instead of developers making inline sql code using view/computed column you defined.<br />
  15. Yarik New Member

    quote:Originally posted by mmarovic
    Well, you don't have to redesign the whole system, you can just package the code you are working now inside stored procedure.
    I guess the problem may be that in that case you would have to write all procs, instead of developers making inline sql code using view/computed column you defined.

    That's right. I'd have to redo entire T-SQL codebase and, as a consequence, rewrite a lot of the client-side code. Alas, it's more refactoring work than our tiny team can afford right now...
  16. mmarovic Active Member

    I can share the process applied in my former company.

    DB team wrote and regurarly improved document "MSSQL Server coding standards and best practices". Based on that document all sql code had to be implemented using stored procedures. During development phase one db engineer was assign to each project. Developers saved their code using version controll system. There was project subtree for sql code. Db engineer played the role of qa during that phase. He/She checked the code. When sql code was rejected it was always with explanation how it should be done. Developers the had to fix the code. After projects was moved to qa phase, existing "db kit" compiled from all sql code in sql branch was used for releasing db changes on qa db server. In qa phase db administrator and db engineer worked together on tunnig further sql code, designining indexes and so on. After testing was done the same db kit was used for releasing db changes on production db server. After project was done sql standards document was updated based on experience on a new project and new db training was prepared. Attendees were developers and team leaders who worked on that project and new employees.

    *

    I know your team is too small, I just wanted to share how it is possible to have a good sql code without need to write everything yourself.
  17. Adriaan New Member

    For "open" user interfaces like Access, where people are used to the concept of linked tables, are you aware that you can add views as linked tables? Other than that, a calculated column seems the easiest solution to your immediate problem.

    Don't forget to relink those tables after a major design change like adding a column.

Share This Page