SQL Server Performance

Parallelism (Gather Streams)

Discussion in 'SQL Server 2005 General DBA Questions' started by srik316, Aug 6, 2008.

  1. srik316 New Member

    Hi,
    I'm joining 2 tables (each with over 400000 records). The join is made on 4 columns. Its returning rows more than what am expecting. Each record is present atleast twice even though there is no difference btw two rows... When I saw the execution plan, its showing parallelism (gather streams). What does this mean? It takes more than 3 minutes to execute a select statement.
    Am involved in doing data imports. I'm actually trying to update one table with the values from other table (which has imported data). The update seems to be running forever.
    Anyway to reduce it?
    Thanks.
  2. Adriaan New Member

    Highly unlikely that it's a technical problem.
    If you join two tables, and there are repeating values on the column(s) on which you join, on either side, then you will get "duplicate" rows in the resultset.
    create table #a (i int)
    create table #b (j int)
    insert into #a values (1)
    insert into #a values (1)
    insert into #a values (2)
    insert into #a values (3)
    insert into #b values(1)
    insert into #b values(2)
    insert into #b values(3)
    select * from #a inner join #b on #a.i = #b.j
    drop table #a
    drop table #b
    Results:
    i j
    ----------- -----------
    1 1
    1 1
    2 2
    3 3
    Use a GROUP BY clause or DISTINCT to suppress the duplicates.
  3. srik316 New Member

    Thanks for the reply... just after posting the msg I noticed that I noticed that I missed the group by clause.
    BTW what does Parallelism(gather streams) indicate?
  4. satya Moderator

    BOL specifies:
    The Gather Streams operator is only used in parallel query plans. The Gather Streams operator consumes several input streams and produces a single output stream of records by combining the input streams. The record contents and format are not changed. If this operator is order preserving, all input streams must be ordered.
    If the output is ordered, the Argument column contains an ORDER BY:() predicate and the names of columns being ordered.
    Gather Streams is a logical operator.
  5. venkatesanj@hcl.in New Member

    1. Use distinct statement to avoid the duplicates.
    2. There is no point in thinking over Parallelism for your single query.
    3. To enhance the performance, please check the indexes on the table.
    Thanks and Regards,
    Venkatesan Prabu .J
  6. M Drost New Member

    But how can Parallelism decrease query performance while with the OPTION (MAXDOP 1)
    performance is great? (Does any1 have the logic?)

Share This Page