SQL Server Performance

Chalenge improving query performance on large tables

Discussion in 'SQL Server 2008 General Developer Questions' started by CASTELLI, Sep 27, 2011.

  1. CASTELLI New Member

    Hi,

    Here's the structure of the db I'm using:
    * Table frs_ArticlesBDFournisseur which contains article main data.One row per article/color. Key is Refrefart (int). Size is over 220000 rows at the momemt. Only index is the PK (clusterred).
    * Table frs_ArticlesBDFournisseurTaille contains all the various sizes and barcode for the corresponding article/color from the previous table. Key is RefREFARTTAILLE (int). Size is over 1200000 rows at the moment. refrefart column also exists in this table to link with the previous table, altough there is no physical constraint linking the 2 tables. Indexes are PK (clustered), refrefart (non clustered) and refrefart,taille,codeean,codeeanfrs (non clustered).

    Here's what I want to do:
    I work at headquarters of franchise shops. I have to collect data from each selling point, match it with our own database (the one discribed above) to normalise the incoming data and complete it with more details. The best way to match an article is to use its barcode (when it exits in the shop data). We pack up to 2 barcodes with every article (callled CodeEan and CodeEanFrs from frs_ArticlesBDFournisseurTaille). I need to complete the information I have with the article size (called taille from frs_ArticlesBDFournisseurTaille), the article's season (called Saison from frs_ArticlesBDFournisseur) and the number of different season the article has been existing in (to know whether it is a one season product or a permanent product).

    Here's the challenge for the query at hand:
    There is no constant data in frs_ArticlesBDFournisseurTaille with which I can do a GROUP BY.
    I need to get the latest product data (in case there is more than one season) and the total number of seasons.

    Here's the query I've designed:

    SELECT M.*, ART_REF, A.ART_PSEUDO, ART_NOM, ART_FEDAS, ART_ANNEECREATION, COU_CODE, COU_NOM, TGF_NOM, MRK_IDREF, MRK_NOM, MRK_NOM_ORI, CBI_EAN, CBI_EANFRS, B.RefRefArt, B.Taille AS sp2kTaille, CASE when B.CodeEan = '' then null else B.CodeEan END AS sp2kEan, CASE when B.CodeEanFrs = '' then null else B.CodeEanFrs END AS sp2kEanFrs, B.Saison, Coalesce(B.Pos, 0) as Permanent
    FROM gin_Mouvements M
    LEFT JOIN gin_Articles A ON M.MOV_ARTID=A.ART_ID
    LEFT JOIN gin_Couleurs C ON M.MOV_COUID=C.COU_ID
    LEFT JOIN gin_Tailles T ON M.MOV_TGFID=T.TGF_ID
    LEFT JOIN gin_Marques N ON A.ART_MRKID=N.MRK_ID
    LEFT JOIN gin_Ean E ON (M.MOV_ARTID=E.CBI_ARTID AND M.MOV_COUID=E.CBI_COUID AND M.MOV_TGFID=E.CBI_TGFID)
    OUTER APPLY (SELECT TOP 1 T.RefRefArt, CodeEan, CodeEANFrs, Taille, Art.Saison,
    Rank() over (order by T.RefREFART ASC) as Pos
    FROM frs_ArticlesBDFournisseurTaille T
    LEFT JOIN frs_ArticlesBDFournisseur Art ON T.RefREFART=Art.RefREFART
    WHERE (CodeEAN = E.CBI_EAN OR CodeEANFrs = CBI_EANFRS
    OR CodeEANFrs = CBI_EAN OR CodeEAN = CBI_EANFRS)
    ORDER BY RefREFART DESC) B

    Query explained:
    All the first part, with all the left join, is to gather the data, relatively small, from the shops. There is no performance issue here.
    I've used an outer apply to avoid having to do a left join and the ON conditions which I couldn't get to work to return at the same time the max(refrefart) and the total number of season; remember I can not make a group by.

    Performances:
    This query runs at a 20row/s at best (nearly 20s to return 500 rows). The data in both tables described at the beginning is bound to increase twice a year by about 10000 new products and 6 times more for the barcodes. I'm afraid the performances will be degrading even more soon.

    Advice ?:
    If anyone can see how to improve the query or rewrite the query differently or improve the db or else ... please let me know.

    Stats:
    You'll find the execution plan attached. As a sum up: it is spending 57% on Nested Loops (Left Outer Join) and 40% on Index Scan (NonClustered). They are thick lines but they are mostly to the right of the diagram.
    The client statistics based on a TOP 500 run of the query is also attached.

    Thank you.

    Attached Files:

  2. FrankKalis Moderator

    Welcome to the forum!
    Looks like the OR clause in the OUTER APPLY is the killer. I would experiement with the indexes on frs_ArticlesBDFournisseurTaille to try to get to a seek instead of a scan. Maybe rearrange the order of the columns from refrefart,taille,codeean,codeeanfrs to codeean,codeeanfrs INCLUDE (refrefart,taille) or maybe split into 2 indexes codeean INCLUDE (refrefart,taille) and codeeanfrs INCLUDE (refrefart,taille). Strictly speaking INCLUDING refrefart should not be necessary as it is indexed anyway, but this is just an inexpensive way to help the optimiser.
  3. CASTELLI New Member

    Hi Frank,

    It worked wonders (2 indexes with the INCLUDE)!
    40625 rows in 1,62s !
    I'm really amazed to see how just an index twick improves the performance by that much.
    I'll have to learn a bit more about this.

    Thanks very much
  4. satya Moderator

    You can simply take such advantage by looking at Activity Monitor from SQL Server Management Studio, as you are uisng 2008 version there is a columne RECENT EXPENSIVE QUERIES where you can select certain query to see the execution plan which in turn specifies if there is any missing index for that query. Then you can add the index to take advantage during next execution.
  5. FrankKalis Moderator

    Glad, I could help. Picking the "right" index works really wonders. INCLUDE() is mostly overlooked in that regard. I learned this myself the hard way. :)
  6. preethi Member

    Is it possible to be more specific on the index improvement you have made.
    I suspected few clustered index scans as the culprits, but just want to verify. (For my learning)

    Apart from duration, there are other things you may have to worry at times: Do you have any of the "hot spot" tables? If so, it is better to introduce temp tables. I have seen them bringing down the locks on those busy tables drastically when you have outer joins.

Share This Page