Server System Architecture, 2002

Pentium 4 and Xeon Server System

The low end of the entry-level server market is generally built on desktop chipsets. The features of a desktop chipset are not necessarily the best match for servers, but almost nothing has a lower cost structure. Figure 6 below shows the layout of a single-processor Pentium 4 system with the Intel 845E chipset. Desktop chipsets supported dual-processor configurations in the Pentium II and Pentium III generations, but this capability was dropped to meet the more stringent electrical requirements of the 400MHz system data bus on Pentium 4 processors. The two major components of the 845E chipset are the Memory Controller Hub (MCH) and the I/O Controller Hub (ICH). The MCH is the north bridge and the ICH is the south bridge. The MCH interfaces the low-voltage buses (<2V) and the ICH interfaces the older high-voltage buses (5V and 3.3V).

Figure 6. Intel 845E chipset.

The Pentium 4 system data bus is 64 bits wide and operates at either 400MHz or 533MHz for a bandwidth of 3.2GB/sec or 4.2GB/sec. The system data bus is quad-pumped, that is, the data transfer rate is four times the bus clock. The actual bus clock is 100MHz, or 133MHz for some of the more recent versions. For marketing reasons, it is less confusing to quote only the data transfer rate. The address rate is the same as the bus clock of 100 or 133MHz. The 845E memory bus is a 64-bit DDR interface (72-bit with ECC) with a data transfer rate of either 200MHz, for a bandwidth of 1.6GB/sec with the 400MHz front-side bus (FSB), or 266MHz, for 2.1GB/sec with the 533MHz FSB.
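The bandwidth figures above all follow from bus width, bus clock and the number of transfers per clock. A minimal sketch of that arithmetic is below. The function name is made up for illustration, and it assumes decimal units (1MB = 10^6 bytes) and that the nominal 133MHz clock is really 400/3 MHz.

```python
from fractions import Fraction

def bandwidth_mb_per_sec(width_bits, clock_mhz, transfers_per_clock):
    """Peak bandwidth in MB/sec (decimal units: 1 MB = 10**6 bytes)."""
    bytes_per_transfer = Fraction(width_bits, 8)
    return bytes_per_transfer * clock_mhz * transfers_per_clock

# Assumption: the nominal 133MHz clock is treated as exactly 400/3 MHz
# so that the arithmetic stays exact.
CLOCK_100 = Fraction(100)
CLOCK_133 = Fraction(400, 3)

fsb_400 = bandwidth_mb_per_sec(64, CLOCK_100, 4)  # quad-pumped system data bus
fsb_533 = bandwidth_mb_per_sec(64, CLOCK_133, 4)
ddr_200 = bandwidth_mb_per_sec(64, CLOCK_100, 2)  # DDR: two transfers per clock
ddr_266 = bandwidth_mb_per_sec(64, CLOCK_133, 2)

for name, value in [("FSB 400", fsb_400), ("FSB 533", fsb_533),
                    ("DDR 200", ddr_200), ("DDR 266", ddr_266)]:
    print(f"{name}: {float(value):.0f} MB/sec")
```

With these assumptions the 533MHz bus works out to about 4267MB/sec, which the text rounds down to 4.2GB/sec. In both cases a single DDR channel delivers exactly half the bandwidth of the matching system bus.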

The Pentium 4 processor was originally designed to operate with 2 RDRAM channels with 3.2GB/sec combined memory bandwidth. But the high cost of RDRAM in relation to other DRAM types resulted in demand for a chipset supporting SDRAM and DDR memory. The original Intel 845 chipset supported PC100 SDRAM and a follow-on chipset supported DDR. To allow low-cost desktop platforms, the 845 has only a single 64-bit memory channel, rather than the two 64-bit channels that would allow 200/266MHz DDR memory to match the 400/533MHz Pentium 4 FSB in bandwidth. Intel publishes performance results for Pentium 4 systems with the 850 RDRAM-based chipset, but not with any of the 845 SDRAM or DDR chipsets. Presumably the 850 chipset has better performance.
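The channel-matching argument can be checked with a few lines of arithmetic. The sketch below uses the peak figures quoted in this article (in MB/sec). The helper name is hypothetical, and peak bandwidth is of course only a rough proxy for delivered performance.

```python
import math

def channels_to_match(fsb_mb_per_sec, channel_mb_per_sec):
    """Smallest number of memory channels whose combined peak bandwidth
    is at least the front-side bus bandwidth."""
    return math.ceil(fsb_mb_per_sec / channel_mb_per_sec)

# Peak figures in MB/sec: (FSB, one 64-bit DDR channel).
pairings = {
    "400MHz FSB with 200MHz DDR": (3200, 1600),
    "533MHz FSB with 266MHz DDR": (4266, 2133),
}

for label, (fsb, channel) in pairings.items():
    needed = channels_to_match(fsb, channel)
    print(f"{label}: {needed} channel(s) to match, the 845 provides 1 "
          f"({channel / fsb:.0%} of FSB bandwidth)")
```

Either pairing needs two DDR channels to keep up with the system bus, so the single-channel 845 can feed the processor at no more than half the peak FSB rate.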

The 845E Memory Controller Hub has an 8-bit 66MHz quad-pumped point-to-point Hub Interface (HI) to the I/O Controller Hub (ICH) with 266MB/sec bandwidth. The ICH supports one 32-bit 33MHz PCI bus, two IDE/ATA 100 channels, USB ports and other low-bandwidth legacy devices. The 32-bit/33MHz 133MB/sec PCI is somewhat inadequate for a server platform with such a powerful processor. The older generation 440BX/GX chipset was offered with an AGP-to-PCI bridge chip, allowing one 64-bit/66MHz PCI bus to be operated from the AGP port. This might still be possible with the 845E chipset, but no systems offer this capability.

Figure 7 below shows the ServerWorks GC-LE chipset for a dual-processor Xeon system. The Xeon processor is the Pentium 4 core in a 603-pin socket, with multiprocessor support and Hyper-Threading enabled. The memory bandwidth matches the system bus bandwidth, but the more important point is that the memory is 2-way interleaved. The IMB (3.2GB/sec) has more bandwidth than the maximum combined bandwidth of the two PCI-X busses (1GB/sec) on each CIOB-X. This reflects the current signaling capability of a point-to-point link rather than a required design feature.

Figure 7. ServerWorks GC-LE chipset.

Figure 8 below shows the ServerWorks GC-SL chipset, with a single DDR memory channel and a single IMB bus for two PCI-X busses. This allows for a lower-cost system that still has considerable I/O bus bandwidth.

The advances from the ServerWorks HE-SL to the GC-LE chipset are substantial increases in system, memory and I/O bandwidth. The memory subsystem changed from SDRAM to DDR to support twice the data bandwidth for the same data path width. On the I/O side, the IMB bandwidth increased from 1GB/sec to 3.2GB/sec and the 64-bit 66MHz PCI busses changed to 64-bit 100/133MHz PCI-X busses. The GC-SL chipset gives up 2-way interleaved memory for a single DDR channel but still has much better I/O capability than the Intel 845E desktop chipset.

Figure 8. ServerWorks GC-SL chipset.

Figure 9 below shows the Intel E7500 chipset for dual-processor Xeon systems. The E7500 chipset consists of one MCH, up to 3 P64H2s and one ICH. The E7500 MCH interfaces the system bus, 2 72-bit (64-bit data, 8-bit ECC) DDR channels, 3 Hub Interface 2.0 connections and 1 HI 1.5 connection. The HI 2.0 connection is 16 bits wide and has a 66MHz clock with 8X data transfer for a total bandwidth of 1.066GB/sec. Each P64H2 device can support 2 64-bit PCI-X busses.
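As a quick check on the HI 2.0 figure, the bandwidth follows from the width, clock and transfer multiplier given above. The worked equation below assumes the nominal 66MHz clock is 66.67MHz and uses decimal units.

```latex
\[
B_{\mathrm{HI\,2.0}}
  = \frac{16\ \text{bits}}{8\ \text{bits/byte}}
    \times 66.67\ \text{MHz} \times 8\ \text{transfers/clock}
  \approx 1066\ \text{MB/sec}
\]
\[
3 \times B_{\mathrm{HI\,2.0}} \approx 3.2\ \text{GB/sec}
\]
```

The second line is the combined peak bandwidth of the three HI 2.0 connections on the MCH.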

Figure 9. Intel E7500 chipset.

It might seem that the HI 2.0 connection, with 1.066GB/sec bandwidth, is a potential bottleneck when each of the two PCI-X busses behind it is running at 100 or 133MHz with a bandwidth of 800MB/sec or 1GB/sec. This is actually not a significant issue. The older generation 33MHz PCI bus was frequently configured with 4 or more slots. It certainly seems reasonable that a higher frequency bus should support at least the same number of slots, even taking into account the higher bandwidth requirements of individual adapters. However, even the improved signaling capability in PCI-X allows only 2 slots at 100MHz and 1 slot at 133MHz. One solution is to create two electrically independent busses, even if the uplink bandwidth is only that of a single bus. This allows either four 100MHz or two 133MHz PCI-X slots on each P64H2. The full 3GB/sec bandwidth is not even a firm requirement at the system level, as Intel’s own motherboard for the E7500 chipset implements only 2 of the 6 possible PCI-X busses, on just 1 P64H2, plus one 32-bit PCI bus on the ICH.
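To put numbers on the slot-count argument, the sketch below tabulates one P64H2 in each PCI-X mode, using only the figures quoted in this article. The names are illustrative, and the "oversubscription" ratio is simply the sum of the bus peaks divided by the uplink peak, not a measured result.

```python
HI2_UPLINK_MB = 1066        # HI 2.0 bandwidth per P64H2, in MB/sec
BUSES_PER_P64H2 = 2         # each P64H2 drives two PCI-X busses

# Per-bus figures: clock (MHz) -> (bandwidth in MB/sec, slots per bus)
PCIX_MODES = {
    100: (800, 2),
    133: (1000, 1),
}

def p64h2_summary(clock_mhz):
    """Return (slots, downstream MB/sec, oversubscription ratio) for one P64H2."""
    bus_mb, slots_per_bus = PCIX_MODES[clock_mhz]
    slots = BUSES_PER_P64H2 * slots_per_bus
    downstream_mb = BUSES_PER_P64H2 * bus_mb
    return slots, downstream_mb, downstream_mb / HI2_UPLINK_MB

for clock in PCIX_MODES:
    slots, downstream, ratio = p64h2_summary(clock)
    print(f"{clock}MHz: {slots} slots, {downstream} MB/sec of bus bandwidth "
          f"behind a {HI2_UPLINK_MB} MB/sec uplink ({ratio:.2f}x)")
```

The uplink is oversubscribed by less than a factor of two in either mode, and only when both busses are saturated at once. The second bus is there mainly to buy slots, not bandwidth.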

Figure 10 below shows the ServerWorks GC-HE chipset for 4-way Xeon systems. The GC-HE north bridge has a 4-way interleaved DDR memory subsystem and can support three CIOB-X devices for a combined bandwidth of 5GB/sec, even though each IMB bus can signal at 3.2GB/sec. This supports the claim that the full I/O bandwidth is not entirely required.

Figure 10. ServerWorks GC-HE 4-way Xeon system.

There is also the IBM XA-32 chipset for the Xeon processor, which supports 4 Xeon processors in one node and allows four nodes to be linked together. The XA-32 north bridge has a 32MB SRAM cache. This extra cache can provide a 15-20% performance increase. The size of the SRAM cache leads one to ask whether the role of SRAM in memory systems should be reconsidered.

All of the Pentium 4 and Xeon chipsets targeted at server systems employ DDR memory. Some Intel desktop chipsets support RDRAM, but no server systems are designed with RDRAM. The ability to configure a large amount of memory economically is important for server systems. A major advantage of RDRAM is the ability to support high bandwidth in a relatively small memory configuration (128MB), which is not important in server systems. The disadvantage of RDRAM is that its cost per chip is higher than that of DDR. This may not be a major liability in a high-end desktop system with only 256MB of memory, but it is for a server system with multi-GB memory configurations.

The chipsets designed specifically for server systems all support PCI-X up to 133MHz. Technically the ServerWorks GC-LE chipset has more I/O bandwidth than the Intel E7500, but so far there are no clear demonstrations of actual applications that need multi-gigabyte/sec I/O bandwidth. It is not unheard of for new hardware technologies to enable very large boosts in bandwidth. However, it may take some time for the operating system and applications to be properly tuned to take full advantage of the new bandwidth.

Continues…
