SQL Server Performance

T-SQL: avoid a Cartesian product

Discussion in 'ALL SQL SERVER QUESTIONS' started by Wendy elizabeth, Dec 24, 2012.

  1. Wendy elizabeth New Member

    For a customer, I need to load data from Excel 2010 spreadsheets into a SQL Server 2008 R2 database for a one-time set of ad hoc queries. I will then run queries to look at the data based on the user requirements.

    The problem is that there is a lot of duplicate data across different rows within the same table. When I join the various tables together, I get a Cartesian product.
    I am trying to determine how to run the queries I need without getting the Cartesian product.

    There are 3 tables: 1. a claims table by client number, which has duplicate data within some of the rows; 2. a price table; and 3. an authorization table.

    I am thinking of loading the data into temp tables in a way that the data will not be duplicated. If this is a possibility, can you tell me how to accomplish this goal?

    If you have any other suggestions, can you show me code that accomplishes this goal?
  2. Shehap MVP, MCTS, MCITP SQL Server

    You can use a CTE (common table expression) with the ROW_NUMBER function and a PARTITION BY clause to filter out the duplicated data according to a certain key. You can adapt the example below to your case:

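    WITH Temp AS
    (
        -- Number the rows within each group that shares the same key
        SELECT ROW_NUMBER() OVER (PARTITION BY Duplicated_Column ORDER BY Order_Column) AS rankno, *
        FROM table1
    )
    -- Rows with rankno > 1 are the duplicates
    SELECT * FROM Temp WHERE rankno > 1;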
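
    For the temp-table idea raised in the question, one possible sketch is to keep only the first row of each group (rankno = 1) and write it to a temp table with SELECT ... INTO. The names claims, client_number, claim_date and #claims_dedup are hypothetical; the PARTITION BY list should be whatever columns define a duplicate in your data.

```sql
-- Hypothetical names: claims, client_number, claim_date, #claims_dedup
WITH Ranked AS
(
    SELECT ROW_NUMBER() OVER (PARTITION BY client_number ORDER BY claim_date) AS rankno, *
    FROM claims
)
-- Keep one row per key; the source table is left untouched
SELECT *
INTO #claims_dedup
FROM Ranked
WHERE rankno = 1;
```

    Note that the temp table created this way also carries the rankno column, which can be ignored or dropped afterwards.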

    If you also need to cleanse the data, you can run the query below, which deletes the duplicates and keeps one row per key:

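    WITH Temp AS
    (
        -- Number the rows within each group that shares the same key
        SELECT ROW_NUMBER() OVER (PARTITION BY Duplicated_Column ORDER BY Order_Column) AS rankno, *
        FROM table1
    )
    -- Deleting through the CTE removes the duplicate rows from table1
    DELETE FROM Temp WHERE rankno > 1;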
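
    Before joining, it may also help to check which table still has more than one row per join key, since that is what multiplies the rows in the result. A minimal sketch, assuming a hypothetical price table joined on a hypothetical service_code column:

```sql
-- Hypothetical names: price, service_code
-- Lists every join key that occurs more than once in the table
SELECT service_code, COUNT(*) AS rows_per_key
FROM price
GROUP BY service_code
HAVING COUNT(*) > 1;
```

    If this returns no rows, the table has at most one row per key and should not multiply rows when joined on that column.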

    Please let me know if any further help is needed.
