SQL Server Parallel Execution Plans
Figure 5 below shows a portion of the parallel execution plan for another query. Figure 6a shows the cost detail for the Clustered Index Scan on the Customer table for the non-parallel execution plan and Figure 6b shows the Customer table Clustered Index Scan cost detail for the parallel execution plan.
Figure 5. Parallel execution plan for second example.
Figure 6a. Non-parallel Clustered Index Scan cost details.
Figure 6b. Parallel Clustered Index Scan cost details.
The non-parallel version is fairly simple to interpret. The estimated IO cost is 2.45 and the estimated CPU cost is 0.165 for a total of 2.61, even though the numbers do not add up exactly. The parallel version is more difficult to interpret. The I/O cost is approximately one-half of the non-parallel IO cost, the CPU is only one-quarter, but the total cost in the parallel plan, 2.529, is only slightly less the non-parallel plan total, 2.651.
In other cases of Table or Index Scans, the I/O cost is the same between non-parallel and parallel versions, but the CPU cost on the parallel plan is one-half that of the non-parallel version. In these cases, the total cost of the parallel plan may be less than the equivalent the non-parallel plan, but more the one-half the non-parallel plan cost. It seems unusual that the execution plan suggests that table and index scans do not derive the same degree of benefit from parallel execution as index seek and hash match operations.
In some queries with parallel execution plans, the cost of the some of the parallelism operators, typically Parallel/Repartition Stream, can be expensive compared to the other operations, such that the overall query actually has a higher plan cost the than non-parallel plan. Yet this does not inhibit the use of the parallel execution plan.
Figure 7 shows another aspect of parallel execution plans. The Nation table at the upper right is a small table and is scanned with a non-parallel operation. The results are then distributed to multiple threads with the Parallelism/Distribute Streams operation.
Figure 7. Parallel execution plan with non-parallel component.
TPC-H Query Analysis on SQL Server
A simple method of investigating parallel execution plans is to use the data and code generators from the TPC-H benchmark. The Transaction Processing Council (tpc.org) provides the source code for the dgen program used to generate the dataset for the TPC-H benchmark and the qgen program to generate queries. Most TPC-H benchmarks published recently involved at least 100GB or much larger datasets. However, even the 1GB dataset is sufficient to generate parallel execution plans on SQL Server. By choosing a dataset small enough to fit in memory, the disk system is not likely to be a bottleneck, allowing for simpler test configurations focused on investigating CPU usage.