SQL Server Performance

Trying to create an ideal index for this query...

Discussion in 'SQL Server 2005 T-SQL Performance Tuning' started by mrfoof, Jan 26, 2009.

  1. mrfoof New Member

    I'm in the process of beginning to dig into a production application that was just handed off to me.
    Digging into some of the worst offenders, I've found a few queries that were run relatively often that used UDFs in places they'd be non-SARGable, which meant indexes couldn't be used. Then I realized the indexes that were there were garbage on top.
    After being refactored and turned into a sproc, here's the query in its current state:
    IF EXISTS (SELECT * FROM tempdb..sysobjects WHERE id=OBJECT_ID('tempdb..#TEMP_SUBPOPS_LIST'))
    DROP TABLE #TEMP_SUBPOPS_LIST

    CREATE TABLE #TEMP_SUBPOPS_LIST
    (
    subpop_id INT NOT NULL PRIMARY KEY CLUSTERED
    , subpop_name VARCHAR(2000) NULL
    );

    INSERT INTO
    #TEMP_SUBPOPS_LIST
    (
    subpop_id
    , subpop_name
    )
    SELECT
    [dbo].[data_point].[subpop_id]
    , [dbo].[ufn_get_full_subpop_name]([dbo].[data_point].[subpop_id])
    FROM
    [dbo].[data_point]
    WHERE
    [dbo].[data_point].[assembly_id] = 369
    GROUP BY
    [dbo].[data_point].[subpop_id]


    SELECT
    [dbo].[assembly].[name] AS [Assembly]
    , [dbo].[data_set].[name] AS [Data Set]
    , [dbo].[data_point].[year] AS [Year]
    , [dbo].[metric_type].[name] AS [Metric]
    , #TEMP_SUBPOPS_LIST.[subpop_name] AS [Subpop]
    , [dbo].[location].[name] AS [Location]
    , CASE [dbo].[data_point].[gender]
    WHEN 'M' THEN 'Male'
    ELSE 'Female'
    END AS [Gender]
    , ' ' + [dbo].[age_cohort].[name] AS [Age Cohort]
    , SUM([dbo].[data_point].[pop]) AS [Population]
    , AVG([dbo].[data_point].[rate]) AS [Rate]
    , SUM([dbo].[data_point].[value]) AS [Cases]
    , MAX([dbo].[data_point].[rate_factor]) AS [Rate Factor]
    , MAX([dbo].[data_point].[pop_factor]) AS [Population Factor]
    , [dbo].[age_cohort].[sort_order]
    , CASE [dbo].[data_point].[gender]
    WHEN 'M' THEN 1
    ELSE 2
    END AS [GenderSort]
    FROM
    [dbo].[data_point]
    INNER JOIN [dbo].[assembly]
    ON [dbo].[assembly].[id] = [dbo].[data_point].[assembly_id]
    INNER JOIN [dbo].[data_set]
    ON [dbo].[data_set].[id] = [dbo].[data_point].[data_set_id]
    INNER JOIN [dbo].[metric_type]
    ON [dbo].[metric_type].[id] = [dbo].[data_point].[metric_type_id]
    INNER JOIN #TEMP_SUBPOPS_LIST
    ON #TEMP_SUBPOPS_LIST.subpop_id = [dbo].[data_point].[subpop_id]
    INNER JOIN [dbo].[location]
    ON [dbo].[location].[id] = [dbo].[data_point].[location_id]
    INNER JOIN [dbo].[age_cohort]
    ON [dbo].[age_cohort].[id] = [dbo].[data_point].[age_cohort_id]
    WHERE
    [dbo].[data_point].[assembly_id] = 369
    GROUP BY
    [dbo].[assembly].[name]
    , [dbo].[data_set].[name]
    , [dbo].[data_point].[year]
    , [dbo].[metric_type].[name]
    , #TEMP_SUBPOPS_LIST.[subpop_name]
    , [dbo].[location].[name]
    , CASE [dbo].[data_point].[gender]
    WHEN 'M' THEN 'Male'
    ELSE 'Female'
    END
    , [dbo].[age_cohort].[name]
    , [dbo].[age_cohort].[sort_order]
    , CASE [dbo].[data_point].[gender]
    WHEN 'M' THEN 1
    ELSE 2
    END
    ORDER BY
    [Assembly]
    , [Data Set]
    , [Year]
    , [Metric]
    , [Subpop]
    , [Location]
    , [Gender]
    , [GenderSort]
    , [dbo].[age_cohort].[sort_order]

    IF EXISTS (SELECT * FROM tempdb..sysobjects WHERE id=OBJECT_ID('tempdb..#TEMP_SUBPOPS_LIST'))
    DROP TABLE #TEMP_SUBPOPS_LIST
    The problem is creating an index that'll work despite the subpop_name needing to be in that position in the sorted result set. Without subpops, I can just create the following index and it's quick -- reads were reduced from the original 3.1 million in my worst-case test scenario to about 3600. But with it, gah, you're back up to around 230K. Maybe I'm just not caffeinated this morning, but I can't seem to come up with a good solution to getting this down to a much more manageable number of reads.
    Here's the index I'd use if subpops didn't have to be included, but again, trying to figure out what change to make to get this performing as desired.
    CREATE NONCLUSTERED INDEX [index_data_point_TEST] ON [dbo].[data_point]
    (
    [assembly_id] ASC
    , [data_set_id] ASC
    , [year] ASC
    , [metric_type_id] ASC
    , [location_id] ASC
    , [age_cohort_id] ASC
    )
    INCLUDE
    (
    [gender]
    , [pop]
    , [rate]
    , [value]
    , [rate_factor]
    , [pop_factor]
    )
    Thanks in advance.
  2. preethi Member

    Query performance is highly depending on the data distribution and rows in the tables. It is quite difficult to answer your question without details of the tables. I am firing soem shorts in dark Assuming that data_point is your single large table involved in this query. If so, you can try out a few things:
    • Creating seperate indexes for each column
    • Changing the order ofthe columns. Assmebly_id is a good candidate to be the first column as it is part of WHERE clause.
    • What is the Clustered index on this table. If it is not on assumbly_id you can try creating a clustered index on assemply_Id and check the performance.
    Hope this helps.

Share This Page