SQL Server Performance

Slow JOIN performance using OR condition

Discussion in 'SQL Server 2005 T-SQL Performance Tuning' started by Captain DB, Jul 10, 2009.

  1. Captain DB New Member

    I am having a performance issue with a SQL join query. The database has 50K objects and 15000 relationship objects. Here is the query:
    select * from table1 t1, table2 t2, relationships t3, table4 t4 WHERE ((t3.id = t4.id1 AND (t1.id = t3.id) ) OR (t1.id = t4.id2 AND (t3.id = 1))) AND t1.id = t2.id ORDER BY t4.id, d1.id
    The query does not complete even after several minutes.
    I've determined it has something to do with the OR condition. When creating 2 separate queries from the clauses of the OR condition, they run fine. But when doing UNION of those queries, the performance is still very slow. I tried converting the query to use JOIN/ON syntax but the result is the same.
    Table 'table1' has all the object data. It will also have some data from the relationship objects. Table 'relationships' has of course relations between objects. Table 'table4' is a 2 column table, each of BIGINT representing an Object Identifier. The first column has Relationship identifiers. The second column has Object identifiers.
    The goal is to get all the objects and it's relationships based on the objects in table 'table4'. Since objects that are not relationships will not exist in the relationships table, "t3.id = 1" in second clause of 'OR' was added to get back 1 row from join of table1 and relationships, otherwise you would end of up with cartesian product of all 'relationships' rows.
    Table 'table1' has a clustered index on column 'id'. Table 'relationships' has a non-clustered index on 'id' column.
    Any information will be appreciated.
  2. satya Moderator

    Have you looked estimated execution plan of this query under query editor?
  3. TommCatt New Member

    [quote user="Captain DB"]
    The query does not complete even after several minutes.
    [/quote] So? I work with many queries that, on a good day, run 45minutes or more. Do you have any reason to expect the query toexecute in less than "several minutes"?
    You have two join conditions which may be expressed like this(pseudo-code):
    (t1.id = t2.id = t4.id = t3.id) or (t1.id = t2.id = t4.id andt3.id = 1)
    This may be simplified:
    t1.id = t2.id = t4.id and t3.id in (t1.id, 1)
    This is rather strange join criteria, mainly because therelationship between the tables is unclear. From your description, t1is the “object” table. But there is a table named “Relationship”(t3) and a table you state contains relationship info (t4). It lookslike t4 may be an intersection (or cross) table which is used toresolve a many-to-many relationship. Can a row in t1 relate to manyrows in t3 and a row in t3 relate to many rows in t1?
    You also state, quite puzzingly:
    [quote user="Captain DB"]
    ... "t3.id = 1" in second clause of 'OR' was added toget back 1 row from join of table1 and relationships, otherwise youwould end of up with cartesian product of all 'relationships' rows.
    [/quote]
    which is not true. In fact, it will create a Cartesian product inthat you end up with not only all the rows where t1.id = t2.id =t4.id = t3.id but also all the rows where t1.id = t2.id = t4.idrepeated for each row where t3.id = 1. If you do in fact have aperformance problem, it is probably here.
  4. Captain DB New Member

    When I rewrote the query as follows:
    select * from table1 t1, table2 t2, relationships t3, table4 t4 WHERE ( (t1.id = t4.id1) AND ( t1.id = t3.id )) AND t1.id = t2.id
    UNION
    select * from table1 t1, table2 t2, relationships t3, table4 t4 WHERE (( t1.id = t4.id2) AND (t3.id = 1)) AND t1.id = t2.id
    the query returned the correct results in about 3 minutes.
    When I changed the select clause to select certain columns but not including columns from relationships table, the query completed in about 22 seconds. So I believe it is doing some type of table scan from relationships table to get columns from relationships table that are not in the index.
    Table t4 really represents a joined object. So there is one row for every Relationship for a specific object.
    Relationships table has one row for every relationship object whereas t1 has many rows to represent one object. Rows in relationships table can relate to many rows in t1 but a row in t1 can only relate to one row in t3.
    What you said about t3.id =1 concerning cartesian product is true but that is what I want. There is only one row where t3.id=1 so when it does a join for rows in t1 that are not relationship objects I get back the correct number of rows. Relationship rows in t1 also map to only 1 row in relationships table so again I will get back the correct number of rows.

Share This Page