SQL Server Processor Performance, 2006

| Processor | Freq. (GHz) | Cache | Mem. (GB) | No. of Disks | tpm-C | tpm-C/GB | tpm-C/Disk | Report Date |
|---|---|---|---|---|---|---|---|---|
| PIII Xeon | 0.9 | 2 MB | 8 | 182 | 39,158 | 4,895 | 215 | 09/27/01 |
| Xeon MP | 1.6 | 1 MB | 8 | 238 | 48,911 | 6,114 | 206 | 05/17/02 |
| Xeon MP | 1.6 | 1 MB | 32 | 235 | 61,564 | 1,924 | 262 | 08/23/02 |
| Xeon MP | 2.0 | 2 MB | 32 | 292 | 78,116 | 2,441 | 268 | 04/21/03 |
| Xeon MP | 2.8 | 2 MB | 32 | 296 | 84,595 | 2,644 | 286 | 06/30/03 |
| Xeon MP | 3.0 | 4 MB | 32 | 240 | 95,163 | 2,974 | 397 | 03/01/04 |
| Xeon MP | 3.0 | 4 MB | 32 | 266 | 102,667 | 3,208 | 386 | 03/01/04 |
| Xeon MP | 3.66 | 1 MB | 64 | 434 | 141,504 | 2,211 | 326 | 04/21/05 |
| Xeon 7041 | 3.0 | 2×2 MB | 64 | 458 | 188,761 | 2,949 | 412 | 10/28/05 |
| Opteron | 2.4 | 1 MB | 32 | 303 | 115,110 | 3,597 | 380 | 10/15/04 |
| Opteron | 2.6 | 1 MB | 64 | 403 | 130,623 | 2,041 | 324 | 02/14/05 |
| Opteron | 2.8 | 1 MB | 64 | 407 | 138,845 | 2,169 | 341 | 09/30/05 |
| Opteron DC | 2.2 | 2×1 MB | 64 | 408 | 187,296 | 2,927 | 459 | 04/21/05 |
| Opteron DC | 2.4 | 2×1 MB | 128 | 406 | 206,181 | 1,611 | 508 | 11/04/05 |
| Opteron DC | 2.6 | 2×1 MB | 128 | 406 | 213,986 | 1,672 | 527 | 03/20/06 |
| Tulsa | 3.73 | 16 MB | ? | ? | 320K? | ? | ? | ? |

Table 4: Selected TPC-C results for four socket systems.
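The two ratio columns in Table 4 are simple quotients of the reported tpm-C score. As a minimal sketch, the snippet below recomputes them for three rows copied from the table. The function name `ratios` and the tuple layout are illustrative choices, not part of any TPC tooling.

```python
# Rows copied from Table 4: (processor, memory in GB, number of disks, tpm-C).
rows = [
    ("PIII Xeon 0.9 GHz", 8, 182, 39158),
    ("Xeon MP 3.66 GHz", 64, 434, 141504),
    ("Opteron DC 2.6 GHz", 128, 406, 213986),
]

def ratios(mem_gb, disks, tpmc):
    """Return (tpm-C per GB, tpm-C per disk), rounded to whole numbers."""
    return round(tpmc / mem_gb), round(tpmc / disks)

for name, mem_gb, disks, tpmc in rows:
    per_gb, per_disk = ratios(mem_gb, disks, tpmc)
    print(f"{name}: {per_gb:,} tpm-C/GB, {per_disk:,} tpm-C/disk")
```

A low tpm-C/GB figure suggests that a result leaned on a large memory configuration, which is worth keeping in mind when comparing rows with different memory sizes.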

Figure 12 below shows TPC-C performance for dual socket Intel systems from the Pentium III 1.0 GHz to the new Dual-Core Intel Xeon 5160 (Woodcrest) at 3.0 GHz. The Pentium III 1.0 GHz/256 KB posted a result of 17,335 tpm-C. The Pentium III 1.26 GHz/512 KB reached 22,007. No results were posted for two-way Xeon systems with the 180nm 256 KB cache Willamette core. The next step was a 130nm Xeon 2.2 GHz/512 KB result of 33,768. From here, Northwood based systems (512 KB L2, no L3) progressed to 44K at 3.06 GHz. Then the Gallatin core, with 1 MB or 2 MB L3 cache in addition to the 512 KB L2 cache, was made available in two-way Xeon systems, having previously been available only in four-way Xeon MP systems, and reached 60K. The Prescott derived core with 2 MB cache at 3.6 GHz reached 74K. The 90nm Xeon actually reached 3.8 GHz, but there are no published TPC-C results at that frequency. Beyond this point, the 90nm Prescott/Irwindale core was not advanced further because of thermal limitations.


Figure 12: TPC-C performance for dual processor Intel systems.

The AMD Opteron took the two processor (socket) lead starting with the single core Opteron 2.8 GHz at 76K. This particular Opteron result was achieved with 32 GB memory, compared with 16 GB in the two-socket Xeon 3.6 GHz system. Under normal circumstances, reducing the Opteron 2.8 GHz configuration to 16 GB should reduce performance on the order of 10%, given its performance to memory (tpm-C/GB) ratio. Due to the peculiar memory options for Opteron systems, however, a 16 GB configuration would allow the use of PC3200 DDR versus PC2100 for the 32 GB configuration. So the best possible two-socket Opteron 2.8 GHz single core system with 16 GB probably would have achieved approximately the same result. The dual-core Opteron 2.6 GHz reached 113,628, also with 32 GB.

The dual core Prescott based 90nm processor, Smithfield, was limited by power dissipation to 2.8 GHz, compared with the top single core frequency of 3.8 GHz. This was not sufficient to generate an impressive (read: publishable) performance result. Opteron, with a top single core frequency of 2.8 GHz, was able to fit the thermal envelope with a 2.6 GHz dual core version. On 90nm, the Opteron has comparable performance to the Xeon at single core, with frequencies of 2.8 GHz and 3.6 GHz respectively. In the dual core versions, the Opteron 2.6 GHz has a significant advantage over the Xeon at 2.8 GHz. The 65nm dual core Dempsey (a derivative of Prescott) was able to reach 3.73 GHz, with a result of 125,954, now with 32 GB memory. In theory, a 65nm dual core Opteron at 3.9 GHz should reach 40% higher than the 90nm 2.6 GHz 113K result. The 3.0 GHz Woodcrest on 65nm is at 169,360 with 64 GB memory. A similar dual socket Woodcrest result of 140K with 32 GB memory shows that doubling memory contributed about a 20% performance gain, starting from a performance to memory ratio of approximately 4K tpm-C per GB.
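The memory-doubling arithmetic for the two Woodcrest results can be checked in a few lines. This is a rough sketch: the 32 GB score is quoted only as roughly 140 thousand tpm-C, so 140,000 is used here as an approximation, and the variable names are illustrative.

```python
# Dual socket Woodcrest 3.0 GHz results quoted in the text.
woodcrest_64gb_tpmc = 169360
woodcrest_32gb_tpmc = 140000  # approximation: the text gives only a rounded figure

# Relative gain from doubling memory from 32 GB to 64 GB.
gain = woodcrest_64gb_tpmc / woodcrest_32gb_tpmc - 1

# Performance to memory ratio before and after the doubling.
per_gb_32 = woodcrest_32gb_tpmc / 32
per_gb_64 = woodcrest_64gb_tpmc / 64

print(f"gain from doubling memory: {gain:.0%}")
print(f"tpm-C/GB at 32 GB: {per_gb_32:,.0f}")
print(f"tpm-C/GB at 64 GB: {per_gb_64:,.0f}")
```

The ratio falls as memory doubles because throughput grows far less than memory does, which is the usual pattern for added memory in these configurations.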

Figure 13 shows the TPC-C performance for quad processor Intel systems from 2000 to 2006. The four-way Pentium III Xeon 900 MHz with 2 MB L2 cache result of 39K is a strong indication that the two-way Pentium III 1 GHz/256 KB L2 performance was probably constrained by its small cache. Otherwise, the four-way performance should not be more than double the two-way result. The Xeon MP with the Foster core at 1.6 GHz/1 MB reached 61K. The 130nm Gallatin core, introduced at 2.0 GHz with 2 MB L3 cache, achieved 78K. A later release at 2.8 GHz/2 MB reached 84.7K, and the final version at 3.0 GHz with 4 MB L3 cache reached 102K (March 2004). The AMD Opteron gained the four processor lead starting with the single core 2.2 GHz at 105K (May 2004), continuing to the 2.6 GHz at 130K (February 2005).


Figure 13: TPC-C performance for quad processor Intel systems.
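The cache argument for the Pentium III results rests on one ratio: the four-way score is more than twice the two-way score, even though the four-way part runs at a lower clock. A minimal sketch of that check follows. The clock adjustment assumes performance scales linearly with frequency, which is an illustrative simplification and not a claim from the benchmark reports.

```python
# Scores quoted in the text and in Table 4.
two_way_tpmc = 17335    # 2-way Pentium III 1.0 GHz, 256 KB L2
four_way_tpmc = 39158   # 4-way Pentium III Xeon 0.9 GHz, 2 MB L2

# Raw scaling from two to four processors.
scaling = four_way_tpmc / two_way_tpmc

# Assumption: linear frequency scaling, used only to normalize 0.9 GHz to 1.0 GHz.
clock_adjusted = scaling * (1.0 / 0.9)

print(f"raw four-way / two-way scaling: {scaling:.2f}x")
print(f"clock-adjusted scaling:         {clock_adjusted:.2f}x")
```

Since doubling the processor count cannot by itself more than double throughput, a ratio above 2.0 points to some other difference between the systems, and the larger cache is the obvious candidate.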

The Intel Xeon MP lineup on the 90nm process offered both the desktop core with 1 MB L2 and an MP server-only version with 8 MB L3 cache. The Cranford codename applied to the Prescott core in the Xeon MP form-factor, while the Potomac codename applied to the version with 8 MB L3. The top Cranford frequency was 3.66 GHz, and the top Potomac frequency was 3.33 GHz. The Cranford processor regained the single core four-socket system lead at 141K (April 2005). Interestingly, no four-way result was published for the Potomac core with 8 MB L3 cache. There was speculation that the latency to the L3 cache was excessive and negated its benefits. Potomac did produce respectable eight- and 16-way single core results of 251K and 376K tpm-C, respectively.

The performance gain from Gallatin 3.0 GHz to Cranford 3.66 GHz does warrant some observations. There is a 20% frequency increase and a 40% performance gain. The memory configuration increased from 32 GB to 64 GB. Also significant is the chipset change from the ServerWorks GC-HE, with a single 400 MHz FSB supporting four processors, to the Intel E8500, with two 667 MHz FSBs and two processors per bus. So some of the performance gain must be attributed to the improved memory system. Whether this is from memory bandwidth or memory transaction rate is unclear. The SPEC CPU 2000 integer results for Gallatin 3.0 GHz/4 MB L3 and Cranford 3.66 GHz/1 MB L2 are comparable (1379 and 1317), so it is possible the last of the Gallatin line on the ServerWorks chipset was memory bandwidth limited.
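To see how much of the Gallatin to Cranford gain is left over after accounting for clock speed, the ratios can be worked out from the Table 4 values. This is a sketch under one stated assumption: that performance scales at best linearly with frequency. The variable names are illustrative.

```python
# Four-socket results from Table 4.
gallatin_ghz, gallatin_tpmc = 3.0, 102667    # Xeon MP 3.0 GHz, 4 MB L3, 32 GB
cranford_ghz, cranford_tpmc = 3.66, 141504   # Xeon MP 3.66 GHz, 1 MB L2, 64 GB

freq_ratio = cranford_ghz / gallatin_ghz     # roughly 1.22
perf_ratio = cranford_tpmc / gallatin_tpmc   # roughly 1.38

# Assumption: linear frequency scaling is an upper bound on the clock contribution.
# Whatever remains must come from memory size, chipset, and bus changes.
residual = perf_ratio / freq_ratio

print(f"frequency ratio:   {freq_ratio:.2f}x")
print(f"performance ratio: {perf_ratio:.2f}x")
print(f"residual gain not explained by frequency: {residual:.2f}x")
```

Under that assumption, roughly 13% of the gain is left for the doubled memory and the new chipset to explain, and more than that if frequency scaling is less than linear.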

HP published TPC-C results for the four socket dual core Opteron systems: 2.2 GHz at 188K tpm-C (April 2005), 2.4 GHz at 206K (November 2005), and 2.6 GHz at 214K (March 2006). Paxville, a dual core version of Irwindale with 2 MB L2 cache at 3.0 GHz, with official names Xeon 7040 and 7041, yielded 188K (October 2005), but this was less than the best contemporary four socket dual core Opteron result, and about 12% less than the Opteron 2.6 GHz DC. The Xeon 7041 result was with 64 GB memory. At the tpm-C/GB ratio of 2,949, it is possible that increasing memory to 128 GB might have made up the difference. The Intel E8501 chipset can support 128 GB DDR memory, but all vendors have elected the DDR2 option with a 64 GB maximum memory configuration.

Instead of employing the dual core 65nm Cedar Mill in the Xeon MP line, Intel plans to introduce Tulsa, which features two Prescott derived cores, each with 1 MB L2 cache, and a shared 16 MB L3 cache. Intel presentations claim a 1.7x increase over Paxville, which would be 320K. Presumably this is based on actual measurements with pre-production cores, and not an estimate. If we suppose that 15% of this gain relative to the Xeon 7041 result of 188K is from the frequency difference, and another 15% from a larger memory configuration, this still leaves nearly a 30% gain that is attributed to improved scaling with the 16 MB shared cache. If all of this turns out to be the case, then whatever deficiency existed with the Potomac L3 cache has been resolved in Tulsa.
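The Tulsa estimate above is a multiplicative decomposition, and it is easy to reproduce. The sketch below uses the factors supposed in the text (15% for frequency, 15% for memory); these are suppositions rather than measurements, and the variable names are illustrative.

```python
paxville_tpmc = 188761    # Xeon 7041 four-socket result from Table 4
claimed_speedup = 1.7     # Intel's claimed Tulsa gain over Paxville

# Suppositions from the text, not measured values.
freq_factor = 1.15        # gain attributed to 3.0 GHz -> 3.73 GHz
memory_factor = 1.15      # gain attributed to a larger memory configuration

# Projected Tulsa score if the claim holds.
projected_tpmc = paxville_tpmc * claimed_speedup

# Gains compound, so the factors are divided out rather than subtracted.
residual = claimed_speedup / (freq_factor * memory_factor)

print(f"projected Tulsa tpm-C: {projected_tpmc:,.0f}")
print(f"gain left for the shared L3 cache: {residual - 1:.1%}")
```

Dividing rather than subtracting is what gives "nearly 30%" instead of a flat 40%: two 15% gains compound to about 32%, not 30%.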

From the Pentium III 1.0 GHz/256 KB to the Xeon 3.6 GHz/2 MB, there was approximately a 4x increase in performance in dual processor systems and a 3.5x increase in quad processor systems. The TPC-C benchmark depends on a combination of factors, including processor, memory, and disk I/O. The fact that the results still fall in line with SPEC CPU 2000 Integer indicates that the overall system architecture is being properly scaled with processor performance. With the new multi-core processors launching this year and in 2007, finally unconstrained by thermal limitations, look for performance gains in multi-threaded applications to exceed the pace of Moore’s Law. The traditional doubling of the logic complexity of a processor was only expected to generate a 40% performance gain. An unconstrained dual-core can yield an 80% performance gain over the corresponding single core.

There are not enough published TPC-H results for a thorough comparative analysis, and care should be taken in comparing results. SQL Server 2000 had serious deficiencies in handling the very large queries of the TPC-H benchmark, especially on high-end server systems. SQL Server 2005 made significant improvements and is highly competitive with other big name DBMS products in data warehouse type applications. Some of this is illustrated in Table 5, showing the TPC-H 1000 GB scores for 16-way Itanium 2 systems with various SQL Server versions. The first is a Unisys result with the 1.5 GHz Itanium 2. The next two are Bull results with the slightly faster 1.6 GHz Itanium 2. Nearly all of the performance gain is due to improvements from SQL Server 2000 to SQL Server 2005. SQL Server 2000 had a highly questionable ability to benefit from parallel execution plans, especially beyond four processors. In fact, a parallel execution plan was as likely to degrade performance as to improve it.

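| SQL Server Version | Freq. (GHz) | Cache | Mem. (GB) | No. of Disks | Power | Throughput | QphH | Report Date |
|---|---|---|---|---|---|---|---|---|
| 2000 | 1.5 | 6 MB | 64 | 214 | 7,331 | 3,687 | 5,199 | 10/15/03 |
| 2005 | 1.6 | 6 MB | 64 | 238 | 19,348 | 9,799 | 13,769 | 7/5/05 |
| 2005 + SP1 | 1.6 | 6 MB | 64 | 238 | 23,279 | 12,502 | 17,060 | 11/7/05 |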

Table 5: TPC-H 1000 GB results for 16-way Itanium 2 systems.
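The QphH column in Table 5 is the TPC-H composite metric, defined as the geometric mean of the Power and Throughput scores. The sketch below recomputes it from the rounded table values. Published composites are derived from unrounded components, so the last digit could differ in general. The dictionary layout and the function name `qphh` are illustrative.

```python
import math

# (Power, Throughput, published QphH) copied from Table 5.
results = {
    "2000": (7331, 3687, 5199),
    "2005": (19348, 9799, 13769),
    "2005 + SP1": (23279, 12502, 17060),
}

def qphh(power, throughput):
    """TPC-H composite metric: geometric mean of Power and Throughput."""
    return math.sqrt(power * throughput)

for version, (power, throughput, published) in results.items():
    print(f"SQL Server {version}: computed {qphh(power, throughput):,.0f}, "
          f"published {published:,}")

# Overall gain from SQL Server 2000 to SQL Server 2005 SP1 on similar hardware.
speedup = results["2005 + SP1"][2] / results["2000"][2]
print(f"QphH gain, 2000 -> 2005 + SP1: {speedup:.1f}x")
```

The clock difference between the 1.5 GHz and 1.6 GHz systems is under 7%, so it accounts for very little of the roughly 3.3x gain.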

Continues…
