SQL Server Processor Performance, 2006

This historical view of the progression in processor performance is presented as a guide to upgrading older server systems and as an acquisition strategy for new systems. The focus is on the Intel IA-32 line because Intel designs processors to Moore’s Law, and a common set of performance results is available for this processor line over an extended period. Some discussion is also given of Opteron and Itanium characteristics relevant to SQL Server performance.

It has been over six years since the introduction of the Pentium III processor on the 180nm process. Since that time, the Pentium III line was succeeded in desktop and server systems by the Pentium 4 line and in mobile systems by the Pentium M line. Both the Pentium 4 and Pentium M lines are now being succeeded by the Core 2 line, which is mostly a descendant of the Pentium M with a few genes contributed by the Pentium 4. The 180nm process (or 0.18µm) has been succeeded by three process generations: 130nm, 90nm, and now 65nm.

The implication of Moore’s Law is that for server systems with fast load growth, on the order of 40% year-to-year, a frequent replacement cycle of two years is more effective than a four-to-five-year replacement cycle. This runs counter to common accounting practices, which assume that computer systems should have a five-year life cycle. So it is important for the DBA to make the case to management decision makers that technology practices should be driven by technology developments and not by arbitrary accounting rules.
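
As a rough illustration of the replacement-cycle argument, the sketch below compares how much load accumulates over one cycle with how much faster a replacement system would be. The 40% annual growth figure comes from the text, and the two-year doubling period is the Moore’s Law rate this article works from. The function names are hypothetical and chosen only for this example, and the model ignores everything except compound growth.

```python
# Back-of-the-envelope comparison of replacement cycles.
# Assumptions (illustrative, not measured): load grows 40% per year,
# and the performance available at purchase doubles every two years.


def load_growth(years, annual_growth=0.40):
    """Factor by which load increases over `years` of compound growth."""
    return (1.0 + annual_growth) ** years


def performance_gain(years, doubling_period=2.0):
    """Factor by which a new system outperforms one bought `years` earlier."""
    return 2.0 ** (years / doubling_period)


if __name__ == "__main__":
    for cycle in (2, 4, 5):
        print(
            f"{cycle}-year cycle: load x{load_growth(cycle):.2f}, "
            f"replacement system x{performance_gain(cycle):.2f}"
        )
```

Under these assumptions load roughly doubles (1.96x) over a two-year cycle but grows to about 5.4x over five years, so a system kept for five years needs far more initial headroom than one replaced every two years.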

Intel IA-32 Processors from Coppermine to Merom/Conroe/Woodcrest

Moore’s Law originally stated that processor performance can be doubled every two years. This was adjusted to 18 months in the early years to account for progress in areas not originally considered. But the original rate is probably more accurate in recent history and the near future.

The basis for Moore’s Law is as follows. Every two years, a new manufacturing process is available. From one process generation to the next, linear dimensions are reduced by a factor of 0.72, for an area reduction of approximately one-half (0.72 x 0.72). The goal of a process shrink is to increase transistor switching speed by 1.3x. With other layout improvements, it should be possible to increase the frequency of a processor architecture designed on the previous process by 50%, for a net performance gain of approximately 40%. It should also be possible to design a new microprocessor architecture on the new process with twice as many (logic) transistors as the original. The new processor architecture should be approximately 40% faster than the old architecture on the same manufacturing process, hence the original Moore’s Law.
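
The arithmetic in the paragraph above can be checked with a few lines of Python. This is only a sketch using the figures quoted in the text. Treating the two 40% gains as multiplicative is an assumption of this example (it is how the numbers arrive at roughly 2x), and the constant and function names are hypothetical.

```python
# Scaling arithmetic behind the two-year doubling, using the figures quoted above.

LINEAR_SHRINK = 0.72      # linear dimension scaling per process generation
SHRINK_PERF_GAIN = 1.40   # old architecture moved to the new process (~40% faster)
ARCH_PERF_GAIN = 1.40     # new architecture vs. old on the same process (~40% faster)


def area_scale(linear_shrink=LINEAR_SHRINK):
    """Area scales with the square of the linear dimension."""
    return linear_shrink ** 2


def generation_gain(shrink_gain=SHRINK_PERF_GAIN, arch_gain=ARCH_PERF_GAIN):
    """Combined gain, assuming the two improvements multiply (sketch assumption)."""
    return shrink_gain * arch_gain


if __name__ == "__main__":
    print(f"area per transistor: x{area_scale():.2f}")
    print(f"transistors in the same area: x{1 / area_scale():.2f}")
    print(f"performance per generation: x{generation_gain():.2f}")
```

The area factor comes out at about 0.52 and the combined gain at 1.96, which is where the approximate doubling every two years comes from.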

To properly distinguish between architecture and manufacturing process generations for Intel processors, it is more convenient to use the codenames rather than the official names (see http://www.sandpile.org/ for translation). Official names are driven by marketing strategy, where a single name can apply to more than one processor core. Table 1 shows the desktop and mobile cores by process, official product name, codename, top frequency, launch date, and cache size. Table 2 shows codenames for some Xeon, Xeon MP, and other processors.

The Pentium III processor on the 180nm process, launched in November 1999, has the Coppermine core. The Pentium 4, with the Willamette core, was a new micro-architecture radically different from previous generations, and was also introduced on the 180nm process, in November 2000. Both the Pentium III and Pentium 4 lines were continued on the 130nm process, starting in mid-2001 with the Tualatin core and in January 2002 with Northwood. Both lines featured 256 KB of on-die L2 cache at 180nm and 512 KB at 130nm. The Pentium III brand and architecture ended with the 130nm process.

Process  Processor      Core        Top Freq.  Launch      Notes
180nm    Pentium III    Coppermine  1.0 GHz    Nov. 1999   256 KB L2
180nm    Pentium 4      Willamette  2.0 GHz    Nov. 2000   256 KB L2
130nm    Pentium III    Tualatin    1.4 GHz    July 2001   512 KB L2
130nm    Pentium 4      Northwood   3.4 GHz    Jan. 2002   512 KB L2
130nm    Pentium M      Banias      1.7 GHz    March 2003  1 MB L2
130nm    Pentium 4 EE   Gallatin    3.4 GHz    Feb. 2004   2 MB L3
90nm     Pentium 4      Prescott    3.8 GHz    Feb. 2004   1 MB L2
90nm     Pentium M      Dothan      2.26 GHz   May 2004    2 MB L2
90nm     Pentium 4      Irwindale   3.8 GHz    Feb. 2005   2 MB L2
65nm     Pentium 4      Cedar Mill  3.8 GHz    Jan. 2006   2 MB L2
65nm     Core Duo       Yonah       2.16 GHz   Jan. 2006   2 MB L2
65nm     Core 2         Woodcrest   3.0 GHz    June 2006   4 MB L2

Table 1: Intel processors with codenames, launch date and top frequency.

Process  Processor         Core        Notes
180nm    Pentium III Xeon  Cascades    2 MB L2
180nm    Xeon MP           Foster      1 MB L3
130nm    Xeon MP           Gallatin    2 MB/4 MB L3
90nm     Pentium D         Smithfield  1 MB L2 Dual Core
90nm     Xeon              Nocona      1 MB L2
90nm     Pentium D         Paxville    2 MB L2
90nm     Xeon MP           Cranford    1 MB L2
90nm     Xeon MP           Potomac     1 MB L2 + 8 MB L3
90nm     Xeon 7040         Irwindale?  2 MB L2 Dual Core
65nm     Pentium D         Presler     2 MB L2 Dual Core
65nm     Xeon 50x0         Dempsey     2 MB L2 Dual Core
65nm     Xeon MP           Tulsa       2×1 MB L2 + shared 16 MB L3

Table 2: Additional codenames for derivative processors.

The Willamette/Northwood micro-architecture also ended with the 130nm process, but the Pentium 4 brand was continued with a new architecture derived from the original. The Prescott core launched in February 2004 on the 90nm process with a 1 MB L2 cache. A year later, this was followed by the 90nm Irwindale core, which has the same architecture as Prescott with a bigger 2 MB L2 cache. The final process for the Prescott line is 65nm, with the Cedar Mill core for desktops, Dempsey for two-socket servers, and Tulsa for four-socket servers. Variants of the Prescott core featured two single-core processor dies in one package, under the brand names Pentium D or Pentium Extreme Edition. This offers essentially the same capability as one die with two independent processor cores, with no shared resources or special inter-processor communication capabilities.

The first Pentium 4 architecture, with the Willamette core, implemented 20+ pipeline stages to achieve a much higher frequency than the Pentium III architecture (12+ pipeline stages) on the same process. The second Pentium 4 architecture, with the Prescott core, continued this strategy to 30+ pipeline stages, presumably with the intent of further accelerating the pace of frequency increases.

It was realized that the Willamette architecture did not address the needs of the mobile market, which required the best performance-to-power ratio and good, but not necessarily top, performance. For this purpose the Pentium M line was developed, starting with the Banias core (1 MB L2 cache) on 130nm in March 2003, continuing to the 90nm Dothan core (2 MB L2 cache) in May 2004, and then to the 65nm dual-core Yonah in January 2006. With each process shrink, the processor core received minor enhancements. A major change with Yonah was the shared L2 cache. It is simpler to implement a dedicated L2 cache for each core. However, the theory is that a single shared 2 MB L2 cache, if implemented properly, is better than two separate 1 MB L2 caches, one for each core.

Intel has just introduced the first of the Core 2 processors, with the codenames Merom, Conroe, and Woodcrest for mobile, desktop, and server systems respectively, consolidating the Pentium 4/Pentium D and Pentium M product lines. Woodcrest launched first, and the published performance results are for Woodcrest. That codename is used here, but all three refer to the same processor core. (See David Kanter’s article at http://www.realworldtech.com/ on Core micro-architecture performance.)

Continues…
