Server System Architecture, 2002
Pentium III and Pentium III Xeon Server Systems
Many Intel Pentium III and Pentium III Xeon server systems are based on either the ServerWorks (formerly RCC) LE, HE and HE-SL chipsets, or the Intel (via the acquisition of Corollary) ProFusion chipset. Single processor servers are generally designed with a desktop chipset. Most dual processor Pentium III server systems are designed with either the ServerWorks LE or HE-SL chipset. The ServerWorks HE chipset is popular for 4-way Pentium III Xeon servers. Older systems, designed around the Intel 440BX and 450NX chipsets, were mostly phased out by the end of 2000. The ServerWorks LE and HE chipsets were originally called the ServerSet III family, but that term was later dropped. The Intel ProFusion chipset or a derivative is used in most 8-way Pentium III Xeon server systems.
Figure 2 below shows the ServerWorks LE chipset in a 2-way Pentium III system. The main components of the ServerWorks LE chipset are the NB-LE host (north) bridge and the CSB5 south bridge. The north bridge has interfaces for one Pentium III processor bus, one SDRAM memory channel, one 64-bit 66MHz PCI bus and one 32-bit 33MHz PCI bus. A 32-bit PCI bus connects the north bridge and south bridge. The south bridge has another PCI bus interface for 32-bit PCI slots, and interfaces for IDE, USB and other low bandwidth I/O devices.
The processor bus and memory channel can support either 100MHz or 133MHz operation, but most dual processor systems employ the higher bandwidth 133MHz bus. Some dual processor motherboards were designed for the Pentium III Xeon processors that only supported the 100MHz FSB. Many of the earlier ServerWorks LE systems restricted the 64-bit PCI bus to 33MHz, at which 4-5 slots could be supported. At that time, there were few 66MHz PCI adapters. As 66MHz PCI adapters became more available, many newer LE based systems elected to support two 64-bit 66MHz PCI slots rather than 4-5 64-bit 33MHz slots. It is also possible to support more 64-bit 66MHz slots by using a PCI-PCI bridge chip, but few vendors elected to use this option.
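The trade-off between fewer 66MHz slots and more 33MHz slots is easier to weigh with the peak transfer rate of each bus type in hand. The short Python sketch below computes those peaks as bus width times clock. It assumes the nominal 33MHz and 66MHz clocks and one transfer per clock, and the function name is purely illustrative. These are signaling rates, not sustained throughput.

```python
def pci_peak_mb_per_sec(width_bits: int, clock_mhz: int) -> float:
    """Peak PCI rate in MB/sec, assuming one bus-width transfer per clock."""
    return width_bits / 8 * clock_mhz

# Bus types mentioned for the ServerWorks LE chipset (nominal clocks assumed).
configs = {
    "32-bit 33MHz": (32, 33),
    "64-bit 33MHz": (64, 33),
    "64-bit 66MHz": (64, 66),
}

for name, (width, clock) in configs.items():
    print(f"{name}: {pci_peak_mb_per_sec(width, clock):.0f} MB/sec peak")
```

The peak rate belongs to the bus, not to a slot: all adapters on one bus share it, so two slots on a 64-bit 66MHz bus share twice the peak that 4-5 slots share on a 64-bit 33MHz bus.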
Figure 2. ServerWorks LE chipset.
The ServerWorks LE became very popular in dual processor servers when Intel did not produce a viable successor to the 440BX chipset. The 440BX chipset and its derivatives did not support the 133MHz FSB. ServerWorks also firmly established the practice of employing a chipset specifically designed for server applications in dual processor systems rather than a modified desktop chipset.
Figure 3 below shows a 4-way Pentium III Xeon server with the ServerWorks HE chipset.
Figure 3. Quad processor server with ServerWorks HE chipset.
The ServerWorks HE north bridge has two memory channels, each further multiplexed with two Memory Access Data Path (MADP) devices, which function as repeaters to support four banks of four DIMMs, for a total of 16 DIMMs and a maximum capacity of 16GB. The HE chipset can be implemented with some combinations of either two 64-bit 66MHz PCI busses or five 64-bit 33MHz PCI busses. The most common PCI bus implementation on 4-way servers is triple peer PCI busses: one 64-bit 66MHz, one 64-bit 33MHz and one 32-bit 33MHz.
The 4-way server has a 100MHz front side bus (FSB) compared with the 133MHz bus available in dual processor systems. It is normally desirable to have higher bandwidth to support the additional processors on one bus. However, the Intel Pentium II Xeon and Pentium III Xeon processors with the SC-330 connector (formerly known as Slot 2) have an electrical loading limitation that restricts bus speed in the original physical layout, with four processors and the chipset north bridge on one side of a system board. Dual processor systems can have a 133MHz FSB with any of the SC-242 (formerly known as Slot 1), SC-330 and FC-PGA (PGA-370) form factor options.
Applications that are more memory bandwidth intensive than processor bound might have better performance on dual processor systems with the 133MHz FSB. Applications that tend to be processor bound are better suited for 4-way systems with lower FSB bandwidth. A larger cache can reduce bus traffic and partially compensate for the reduced FSB bandwidth. The 0.18µm Intel Pentium III Xeon processors consist of one product line with 256K L2 cache, intended for 2-way 133MHz FSB systems, and a second product line with 1M or 2M L2 cache, intended for systems supporting up to four processors on one 100MHz FSB. The older 0.25µm Intel Pentium II and Pentium III Xeon processors were only available with the 100MHz FSB.
The two memory channels have a combined bandwidth of 1.6GB/sec to the north bridge when operating at 100MHz. The memory subsystem is 4-way interleaved. A word (64-bit data and 8-bit ECC) can be transferred from a DIMM to a MADP on each of the four busses in one clock cycle, so technically the bandwidth in this phase of a memory access is 3.2GB/sec. In the next phase, one 8-byte word can be transferred in one clock cycle on each memory channel from the MADP to the north bridge, so the memory bandwidth in this phase is 1.6GB/sec. In the third phase, the north bridge can transfer only a single word to the processor in one clock cycle, so the bandwidth here is 800MB/sec. The memory bandwidth can therefore be 800MB/sec, 1.6GB/sec or 3.2GB/sec depending on where it is measured. In a dual processor configuration with the 133MHz FSB, the corresponding figures are roughly 1GB/sec, 2GB/sec and 4GB/sec.
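The three figures follow from the same arithmetic: 8 bytes per clock on each 64-bit path, multiplied by the number of parallel paths at that stage. The Python sketch below reproduces them under the assumptions that only the 64 data bits count as payload (the ECC bits are excluded) and that every path runs at the FSB clock. The function and stage labels are illustrative, not taken from any ServerWorks documentation.

```python
WORD_BYTES = 8  # 64-bit data word; the 8 ECC bits are not counted as payload

def stage_bandwidths_mb(clock_mhz: int) -> dict:
    """Peak MB/sec at each stage of a memory access, one word per path per clock."""
    per_path = WORD_BYTES * clock_mhz  # one 64-bit path at the given clock
    return {
        "DIMM to MADP (4 busses)": 4 * per_path,
        "MADP to north bridge (2 channels)": 2 * per_path,
        "north bridge to processor (1 FSB)": 1 * per_path,
    }

for clock in (100, 133):
    print(f"{clock}MHz:")
    for stage, mb in stage_bandwidths_mb(clock).items():
        print(f"  {stage}: {mb} MB/sec")
```

At 100MHz this gives 3200, 1600 and 800 MB/sec. At a nominal 133MHz it gives 4256, 2128 and 1064 MB/sec, which are the values behind the rounded 4GB/sec, 2GB/sec and 1GB/sec figures.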
The general trend in desktop systems is that the memory bandwidth matches the system bus bandwidth, except for unusual situations. In practice, an exact bandwidth match is not very important. The quoted bandwidth is just a signaling rate. It might be possible to generate burst transactions at the full signaling rate, but sustained usage is usually much lower. Server systems frequently implement 2-way and 4-way interleaved memory to hide the refresh time between successive accesses to DRAM. The claim of 2 or 4 times the system bus bandwidth is true from a certain point of view, but is irrelevant to the actual benefit of interleaved memory. There are other techniques for improving memory system performance in addition to interleaving.
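The benefit of interleaving can be illustrated with a toy model. In the Python sketch below, each bank is unavailable for a fixed number of cycles after it starts an access, consecutive words are spread across banks in round-robin order, and the bus can start at most one access per cycle. The 4-cycle busy time and the function name are assumptions chosen for illustration, not measured values for any chipset.

```python
def cycles_for_sequential_reads(n_reads: int, n_banks: int, bank_busy_cycles: int) -> int:
    """Cycles needed to issue n_reads consecutive word reads in a toy bank model."""
    ready_at = [0] * n_banks  # earliest cycle at which each bank can start again
    clock = 0
    for addr in range(n_reads):
        bank = addr % n_banks  # consecutive words map to consecutive banks
        start = max(clock, ready_at[bank])  # wait if the bank is still busy
        ready_at[bank] = start + bank_busy_cycles
        clock = start + 1  # the bus starts at most one access per cycle
    return clock

for banks in (1, 2, 4):
    total = cycles_for_sequential_reads(n_reads=8, n_banks=banks, bank_busy_cycles=4)
    print(f"{banks}-way: {total} cycles for 8 reads")
```

With these assumed numbers, eight reads take 29 cycles with a single bank, 14 cycles with 2-way interleaving and 8 cycles with 4-way interleaving. The bus never moves more than one word per cycle in any of the three cases. The gain comes entirely from overlapping the idle time of one bank with accesses to the others.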