SQL Server Performance

How to avoid duplicate rows without using distinct

Discussion in 'T-SQL Performance Tuning for Developers' started by tcshekar, Feb 10, 2006.

  1. tcshekar New Member

    We have a query which returns duplicates in ID column - want to know how to write a query which avoids duplicate in the query results (WITHOUT using DISTINCT)

    Please advice.
  2. FrankKalis Moderator

  3. Adriaan New Member

    Your query probably has JOINs on child tables. If there are multiple rows on the child table with matching ID values, then you get the duplicates in your results.

    You need to give us more details about what's in your query before we can make any further suggestions.

    The simple answer would be to use GROUP BY ID but that is assuming there are no other columns returned by your query. Also, GROUP BY is essentially the same as DISTINCT.
  4. Madhivanan Moderator

    Also post some sample data and the result you want. Why do you not want to use Distinct?

    Madhivanan

    Failing to plan is Planning to fail
  5. tcshekar New Member

    I have only one column returning duplicated IDs. So if we use GROUP BY, is efficient to DISTINCT.

  6. Adriaan New Member

    If you check the execution plans for GROUP BY and DISTINCT versions of the same query, you will see that they are identical.

    So the queries will be equally efficient.
  7. Madhivanan Moderator

    quote:Originally posted by tcshekar

    I have only one column returning duplicated IDs. So if we use GROUP BY, is efficient to DISTINCT.


    Both will have same impact.
    Test this with execution plan


    select distinct col1, col2, col3 from
    (
    select 1 as col1, 345 as col2, 'test' as col3 union all
    select 1 as col1, 345 as col2, 'test' as col3 union all
    select 1 as col1, 345 as col2, 'test' as col3 union all
    select 16 as col1, 35 as col2, 'test' as col3 union all
    select 16 as col1, 35 as col2, 'test' as col3 union all
    select 17 as col1, 5 as col2, 'test' as col3
    ) T


    select distinct col1, col2, col3 from
    (
    select 1 as col1, 345 as col2, 'test' as col3 union all
    select 1 as col1, 345 as col2, 'test' as col3 union all
    select 1 as col1, 345 as col2, 'test' as col3 union all
    select 16 as col1, 35 as col2, 'test' as col3 union all
    select 16 as col1, 35 as col2, 'test' as col3 union all
    select 17 as col1, 5 as col2, 'test' as col3
    ) T group by col1, col2, col3


    Madhivanan

    Failing to plan is Planning to fail
  8. JiriJ New Member

    -- ONLY in SQL 2005 - USING CTE and ROW_NUMBER()

    WITH DUPLICATE(ID, RowNumber)
    AS
    (
    SELECT ID
    ,ROW_NUMBER() OVER (PARTITION BY ID order by ID) AS RowNumber
    FROM MYTABLE
    )
    Select * FROM DUPLICATE WHERE RowNumber = 1

    -- This seems bit overkill for question asked but if we want to DELETE the DUPLICATE ROW(s) then:

    WITH DUPLICATE(ID, RowNumber)
    AS
    (
    SELECT ID
    ,ROW_NUMBER() OVER (PARTITION BY ID order by ID) AS RowNumber
    FROM MYTABLE
    )
    DELETE FROM DUPLICATE WHERE RowNumber >1

    -- will do the task

    --

    Jiri JANECEK
    MSE, MBA, MCSD.NET
  9. tcshekar New Member

    quote:Originally posted by Madhivanan

    Also post some sample data and the result you want. Why do you not want to use Distinct?

    Madhivanan

    Failing to plan is Planning to fail

    members
    member_id (auto)member_typefirstnamecompany_name
    1 E Tim company 1
    2 A Lov company 2
    3 E Jimmy company 3
    4 A Mak company 4
    5 A Chuck company 5

    groups
    group_id (auto)groupname
    10 IT
    11 Finance
    12 Sales
    13 Marketing
    14 Research
    15 HR

    member_groups
    invite_id (auto)member_idgroup_id
    1 2 10
    2 2 11
    3 2 15
    4 4 10
    5 4 15
    6 5 12

    I WANT RESULTS LIKE THE BELOW? Can you please advice.

    member_idcompany_namegroups
    2 company 2IT, Finance, HR
    4 company 4IT, HR
    5 company 5Sales
  10. Madhivanan Moderator

Share This Page