Server System Architecture, 2002
Pentium III and Pentium III Xeon Server Systems
Many Intel Pentium III and Pentium III Xeon server systems are based on either the ServerWorks (formerly RCC) LE, HE and HE-SL chipsets, or the Intel Profusion chipset (obtained via the acquisition of Corollary). Single processor servers are generally designed with a desktop chipset. Most dual processor Pentium III server systems are designed with either the ServerWorks LE or HE-SL chipset. The ServerWorks HE chipset is popular for 4-way Pentium III Xeon servers. Older systems, designed around the Intel 440BX and 450NX chipsets, were mostly phased out by the end of 2000. The ServerWorks LE and HE chipsets were originally called the ServerSet III family, but that term was later dropped. The Intel Profusion chipset or a derivative is used in most 8-way Pentium III Xeon server systems.
Figure 2 below shows the ServerWorks LE chipset in a 2-way Pentium III system. The main components of the ServerWorks LE chipset are the NB-LE host (north) bridge and the CSB5 south bridge. The north bridge has interfaces for one Pentium III processor bus, one SDRAM memory channel, one 64-bit 66MHz PCI bus and one 32-bit 33MHz PCI bus. A 32-bit PCI bus connects the north bridge and south bridge. The south bridge has another PCI bus interface for 32-bit PCI slots, and interfaces for IDE, USB and other low bandwidth I/O devices.
The processor bus and memory channel can support either 100MHz or 133MHz operation, but most dual processor systems employ the higher bandwidth 133MHz bus. Some dual processor motherboards were designed for the Pentium III Xeon processors that only supported the 100MHz FSB. Many of the earlier ServerWorks LE systems restricted the 64-bit PCI bus to 33MHz, at which 4-5 slots could be supported. At that time, there were few 66MHz PCI adapters. As 66MHz PCI adapters became more available, many newer LE based systems elected to support two 64-bit 66MHz PCI slots rather than 4-5 64-bit 33MHz slots. It is also possible to support more 64-bit 66MHz slots by using a PCI-PCI bridge chip, but few vendors elected to use this option.
Figure 2. ServerWorks LE chipset.
The ServerWorks LE became very popular in dual processor servers when Intel did not produce a viable successor to the 440BX chipset. The 440BX chipset and its derivatives did not support the 133MHz FSB. ServerWorks also firmly established the practice of employing a chipset specifically designed for server applications in dual processor systems rather than a modified desktop chipset.
Figure 3 below shows a 4-way Pentium III Xeon server with the ServerWorks HE chipset.
Figure 3. Quad processor server with ServerWorks HE chipset.
The ServerWorks HE north bridge has two memory channels, each further multiplexed with two Memory Access Data Path (MADP) devices, which function as repeaters to support four banks of four DIMMs, for a total of 16 DIMMs and a maximum capacity of 16GB. The HE chipset can be implemented with some combinations of either two 64-bit 66MHz PCI busses or five 64-bit 33MHz PCI busses. The most common PCI bus implementation on 4-way servers is triple peer PCI busses: one 64-bit 66MHz, one 64-bit 33MHz and one 32-bit 33MHz.
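The DIMM count and capacity quoted above follow from multiplying out the memory topology. The following is a minimal sketch of that arithmetic, not a description of the chipset's actual configuration logic. It assumes one bank per MADP device and 1GB DIMMs (the per-DIMM size is inferred from 16GB across 16 sites rather than stated directly), and the function name is purely illustrative.

```python
def dimm_sites(channels, madp_per_channel, dimms_per_bank):
    """Return the number of DIMM sites, assuming one bank per MADP device."""
    banks = channels * madp_per_channel
    return banks * dimms_per_bank


# HE topology from the text: 2 channels, 2 MADPs per channel, 4 DIMMs per bank.
sites = dimm_sites(channels=2, madp_per_channel=2, dimms_per_bank=4)

# Assumption: 1GB per DIMM, inferred from the 16GB maximum.
max_capacity_gb = sites * 1

print(sites, max_capacity_gb)
```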
The 4-way server has a 100MHz front side bus (FSB) compared with the 133MHz bus available in dual processor systems. It is normally desirable to have higher bandwidth to support the additional processors on one bus. However, the Intel Pentium II Xeon and Pentium III Xeon processors with the SC-330 connector (formerly known as Slot 2) have an electrical loading limitation that restricts bus speed in the original physical layout of four processors and the chipset north bridge on one side of a system board. Dual processor systems can have an FSB at 133MHz with any of the SC-242 (formerly known as Slot 1), SC-330 and FC-PGA (PGA-370) form factor options.
Applications that are more memory bandwidth intensive than processor bound might have better performance on dual processor systems with the 133MHz FSB. Applications that tend to be processor bound are better suited for 4-way systems with lower FSB bandwidth. A larger cache can reduce bus traffic and partially compensate for the reduced FSB bandwidth. The 0.18µm Intel Pentium III Xeon processors consist of one product line with 256K L2 cache, intended for 2-way 133MHz FSB systems, and a second product line with 1M or 2M L2 cache, intended for systems supporting up to four processors on one 100MHz FSB. The older 0.25µm Intel Pentium II Xeon and Pentium III Xeon processors were only available with the 100MHz FSB.
The two memory channels have a combined bandwidth of 1.6GB/sec to the north bridge when operating at 100MHz. The memory subsystem is 4-way interleaved. A word (64-bit data and 8-bit ECC) can be transferred from a DIMM to a MADP on each of the four busses in one clock cycle, so technically the bandwidth in this phase of a memory access is 3.2GB/sec. In the next phase, one 8-byte word can be transferred in one clock cycle on each memory channel from the MADP to the north bridge, so the memory bandwidth in this phase is 1.6GB/sec. In the third phase, the north bridge can transfer only a single word to the processor in one clock cycle, so the bandwidth here is 800MB/sec. The memory bandwidth can therefore be quoted as 800MB/sec, 1.6GB/sec or 3.2GB/sec depending on where it is measured. In a dual processor configuration with the 133MHz FSB, the corresponding figures are roughly 1GB/sec, 2GB/sec and 4GB/sec.
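The three figures come from the same formula applied at different points in the data path: clock rate times 8 bytes per word times the number of parallel paths. A short sketch of that calculation follows. The function names and stage labels are illustrative, and the clocks are taken as the nominal 100 and 133 rather than exact frequencies.

```python
WORD_BYTES = 8  # 64 data bits per transfer; the 8 ECC bits carry no payload


def bandwidth_mb_per_sec(clock_mhz, parallel_paths):
    """Peak rate in MB/sec: one 8-byte word per clock on each parallel path."""
    return clock_mhz * WORD_BYTES * parallel_paths


def stage_bandwidths(clock_mhz):
    """Peak rates at the three measurement points described in the text."""
    return {
        "DIMM to MADP (4 busses)": bandwidth_mb_per_sec(clock_mhz, 4),
        "MADP to north bridge (2 channels)": bandwidth_mb_per_sec(clock_mhz, 2),
        "north bridge to processor (1 FSB)": bandwidth_mb_per_sec(clock_mhz, 1),
    }


for clock in (100, 133):
    print(clock, "MHz:", stage_bandwidths(clock))
```

At 100MHz this gives 3200, 1600 and 800 MB/sec. At a nominal 133MHz it gives 4256, 2128 and 1064 MB/sec, which the text rounds to 4GB/sec, 2GB/sec and 1GB/sec.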
The general trend in desktop systems is that the memory bandwidth matches the system bus bandwidth, except for unusual situations. In practice, an exact bandwidth match is not very important. The bandwidth is just a signaling rate. It might be possible to generate burst transactions at the full signaling rate, but sustained usage is usually much lower. Server systems frequently implement 2-way and 4-way interleaved memory to hide the refresh time between successive memory accesses to DRAM memory. The claim of 2 or 4 times the system bus bandwidth is true from a certain point of view, but is irrelevant to the actual benefit of interleaved memory. There are other techniques for improving memory system performance in addition to interleaving.
The ServerWorks HE-SL brings some of the HE features into a chipset for dual processor Pentium III systems. These include two memory channels for 2-way interleaved operation with 12GB maximum memory, and the use of high speed Inter-Module Buses (IMB) for greatly improved I/O bus bandwidth. The IMB links the north bridge to I/O bridges (CIOB2). Each CIOB2 device can support two 64-bit 66MHz PCI buses. Most HE-SL systems implement only a single CIOB2 with the two 64-bit 66MHz PCI buses, along with the 32-bit 33MHz PCI bus attached to the CSB south bridge.
Figure 4. ServerWorks HE-SL chipset.
The ServerWorks HE-SL represents the new trend in server chipset design. The north bridge links directly with the latency critical memory bus. The I/O busses, which are less sensitive to latency, connect via a bridge chip. This allows the north bridge to connect to the I/O bridge chip on high bandwidth busses while conserving pin-count. The extra bridge chip may add 20-30ns of latency. However, I/O devices generally have much higher latency, so the latency introduced by a bridge chip is negligible.
Figure 5, below, shows the layout of an 8-way Pentium III Xeon server based on the Intel Profusion chipset. The overall system comprises five busses: two processor busses, two memory channels, and one I/O bus. The I/O bus supports four peer PCI busses. It is difficult for a single device to connect all the signals for the five busses and two cache coherency (SRAM) boards, so the Profusion north bridge is implemented in two separate components. The Memory Access Controller (MAC) links the address signals and the Data Interface Buffer (DIB) links the data signals.
The processors are divided into two separate busses, each with up to four processors. The cache coherency filters allow processors on one bus to determine whether a particular 32-byte block of memory is in the cache of one of the processors on the other bus. This reduces the snoop traffic sent to the other bus to maintain cache coherency.
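The idea behind a coherency filter can be illustrated with a toy model. The sketch below is conceptual only and is not the Profusion implementation, which uses SRAM-based filters whose internal organization is not described here. The class and method names are hypothetical. It tracks which 32-byte blocks one processor bus may hold, so that a request from the other bus is forwarded as a snoop only when needed.

```python
LINE_BYTES = 32  # coherency is tracked per 32-byte block, as in the text


class CoherencyFilter:
    """Toy model: remembers which blocks one processor bus may have cached."""

    def __init__(self):
        self.cached_blocks = set()

    def record_fill(self, address):
        """Note that a processor on this bus loaded the block into its cache."""
        self.cached_blocks.add(address // LINE_BYTES)

    def record_eviction(self, address):
        """Note that the block is no longer cached on this bus."""
        self.cached_blocks.discard(address // LINE_BYTES)

    def must_snoop(self, address):
        """True if a request from the other bus needs a snoop on this bus."""
        return (address // LINE_BYTES) in self.cached_blocks


left_bus = CoherencyFilter()
left_bus.record_fill(0x1000)
print(left_bus.must_snoop(0x101F))  # same 32-byte block as 0x1000
print(left_bus.must_snoop(0x1020))  # next block, never cached on this bus
```

Requests that miss in the filter never appear on the other processor bus, which is where the reduction in snoop traffic comes from.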
Figure 5. 8-way Pentium III Xeon system with Intel Profusion chipset.
The Profusion chipset can address up to 32GB of memory on 32 DIMM sites with capacities from 128MB to 1GB. Each memory channel can support up to 16 DIMM sites, so each channel is expanded into four electrically separated busses. The Profusion architecture does not use interleaved memory in the same manner as the ServerSet III design. A four-word (4×8 byte) memory request will access one SDRAM DIMM in the standard X-1-1-1 manner. Memory addresses, however, are interleaved across 32-byte partitions (the size of the processor cache line). Even-numbered partitions map to one memory channel, odd-numbered partitions map to the other memory channel. In both cases, the design is intended to support higher realizable memory access rates than a single memory channel.
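The cache-line interleave described above can be sketched as a simple address mapping. This assumes that "even" and "odd" refer to the index of the 32-byte block rather than to the byte address, and the choice of which channel is numbered 0 is arbitrary. The function name is illustrative.

```python
LINE_BYTES = 32  # processor cache line size given in the text


def memory_channel(address):
    """Return 0 or 1: even-numbered 32-byte blocks map to channel 0."""
    return (address // LINE_BYTES) % 2


for addr in (0, 31, 32, 63, 64):
    print(addr, memory_channel(addr))
```

Consecutive cache lines alternate between the two channels, so a sequential stream of line fills is spread evenly across both.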
The PCI busses are connected to an I/O bus by PCI bus expanders (PB64). The I/O bus has a bandwidth of 800MB/sec and supports a maximum of four PB64 devices. Each PB64 can support a 64-bit bus at either 66MHz or 33MHz. Most eight-way servers implement four 64-bit PCI busses: two capable of 66MHz operation, each with two slots, and two 33MHz busses, one with four slots, the other with two slots and embedded SCSI. The combined bandwidth of the four PCI busses (1.6GB/sec) is greater than that of the I/O bus (800MB/sec), but this should not be an issue. It would take a highly unusual and perhaps contrived circumstance for all busses to be saturated at the same time. Regardless of the peak sustainable I/O bandwidth, it can be beneficial to employ the highest bandwidth PCI adapters available (64-bit 66MHz) to allow individual actions to complete as quickly as possible and with minimum latency.
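The oversubscription figure can be checked with a few lines of arithmetic. This sketch uses the nominal 66 and 33 clock values, so the total comes out slightly below the rounded 1.6GB/sec quoted in the text. The function and variable names are illustrative.

```python
def pci_bandwidth_mb_per_sec(width_bits, clock_mhz):
    """Peak PCI rate in MB/sec: one transfer of width_bits per clock."""
    return width_bits // 8 * clock_mhz


# The common 8-way layout from the text: two 66MHz and two 33MHz 64-bit busses.
busses = [(64, 66), (64, 66), (64, 33), (64, 33)]
total = sum(pci_bandwidth_mb_per_sec(w, c) for w, c in busses)

io_bus = 800  # MB/sec, the I/O bus bandwidth from the text

print(total, total / io_bus)
```

The four busses sum to about twice the I/O bus rate, which only matters if all of them burst at full rate simultaneously.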