SQL Server Processor Performance, 2006
Two other significant innovations in the Opteron architecture are worth noting. The first is simultaneous bi-directional (SBD) point-to-point links between processors and I/O controllers, which in the AMD line is called Hyper-Transport. SBD was also used in the Intel 870 chipset for Itanium 2 processors to link the nodes of a NUMA system. The general idea is to achieve the maximum bandwidth per pin and the lowest latency in off-chip communication. In this area, SBD is better than the shared-bus architecture that the Xeon line carried over from the original Pentium Pro architecture of the mid-1990s. The second is the instruction set architecture enhancement. Simply extending the x86 instruction set from 32 to 64 bits was definitely called for, but by itself did not require great innovation. The very significant, perhaps great, ISA innovation was in figuring out how to make the 64-bit mode support 16 general-purpose registers versus eight in the 32-bit and earlier modes. The principal remaining advantage of RISC architectures over x86 was the larger number of registers. Of course, most RISC architectures have already been nearly obliterated from the computer systems market through strong investment, good execution, and other efforts by the two major x86 companies.
One aspect of the Opteron processor and system architecture is that memory bandwidth scales with the number of processors, because the memory controller is integrated into the processor: adding processors adds memory bandwidth. One of the marketing arguments made for AMD over Intel is that memory bandwidth scales on Opteron while it is a bottleneck on Xeon systems. The first part of the statement is true, and there are definitely applications that can use the additional memory bandwidth of Opteron systems compared with earlier-generation Xeon systems. But unless a particular application actually requires the extra bandwidth, it is not a bottleneck.
There is no definitive evidence that memory bandwidth on the Xeon platforms constrains SQL Server performance. Many have repeated the memory bandwidth argument when discussing Opteron versus Xeon for SQL Server performance without introducing clear evidence, even adding that this leads to better performance and scaling. Such a statement indicates a lack of understanding of the difference between performance and scaling. Scaling refers to the performance trend with increasing processor count, not absolute performance. So if architecture A has a one-processor performance of 1.0, a two-processor performance of 1.7, and a four-processor performance of 2.55 (= 1.7 × 1.5), while architecture B has a one-processor performance of 1.0 (possibly on a different scale), a two-processor performance of 1.8, and a four-processor performance of 3.06 (= 1.8 × 1.7), then B has better scaling than A, while nothing has been said about baseline performance.
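The distinction can be made concrete with a small sketch. The figures below are the illustrative ratios from the text, not measured results; the point is that scaling is a ratio against the one-processor baseline of the same architecture, so it says nothing about absolute performance across architectures.

```python
# Scaling = performance trend with processor count, relative to that
# architecture's own one-processor baseline. Numbers are the text's
# illustrative ratios, not benchmark data.
def scaling_factors(perf_by_procs):
    """Return speedup at each processor count relative to one processor."""
    base = perf_by_procs[1]
    return {n: p / base for n, p in perf_by_procs.items()}

arch_a = {1: 1.0, 2: 1.7, 4: 1.7 * 1.5}  # four-way speedup = 2.55
arch_b = {1: 1.0, 2: 1.8, 4: 1.8 * 1.7}  # four-way speedup = 3.06

for name, arch in (("A", arch_a), ("B", arch_b)):
    s = scaling_factors(arch)
    print(name, {n: round(v, 2) for n, v in s.items()})
# A {1: 1.0, 2: 1.7, 4: 2.55}
# B {1: 1.0, 2: 1.8, 4: 3.06}
```

B scales better at every processor count, yet if A's baseline were twice B's, A would still be the faster system at four processors.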
Consider the following. A two-processor 130nm Xeon 3.06 GHz with 1 MB L3 cache and a 533 MHz FSB (4.3 GB/sec) achieved a TPC-C result of about 52K tpm-C. A four-processor Xeon MP 3.0 GHz with 4 MB L3 cache and a 400 MHz FSB (3.2 GB/sec) achieved about 102K tpm-C. The next generation of four-socket Xeon systems increased the combined FSB bandwidth to 10.6 GB/sec or better. So if there ever was a memory bandwidth bottleneck, it probably ended with the old ServerWorks GC-HE chipset.
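The FSB bandwidth figures above follow directly from the bus clock and the 64-bit (8-byte) width of the Xeon front-side bus. A minimal sketch of the arithmetic, assuming the 10.6 GB/sec combined figure comes from two independent 667 MHz buses (the exact bus configuration is an assumption here, not stated in the text):

```python
# Peak FSB bandwidth = effective bus clock (MHz) x bus width (8 bytes for
# the 64-bit Xeon FSB) x number of independent buses.
def fsb_bandwidth_gb(mhz_effective, width_bytes=8, buses=1):
    return mhz_effective * 1e6 * width_bytes * buses / 1e9

print(fsb_bandwidth_gb(533))           # 4.264  -> the text's ~4.3 GB/sec
print(fsb_bandwidth_gb(400))           # 3.2    -> the 3.2 GB/sec Xeon MP bus
print(fsb_bandwidth_gb(667, buses=2))  # 10.672 -> the ~10.6 GB/sec combined
```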
Given that single-core Xeon 3.6 GHz performance is comparable to single-core Opteron 2.8 GHz in both two- and four-socket systems, there is every reason to believe that the Opteron advantage over Xeon with dual cores exists because the Opteron is only slightly constrained thermally while the Xeon is significantly constrained. So applying the memory bandwidth argument to SQL Server performance is nothing but mindless regurgitation of a marketing argument, unsupported by facts and with nothing to indicate that it has any relevance to SQL Server.
It is possible that the Opteron memory architecture had an advantage in memory transaction rate, meaning the ability to fetch many small blocks. This has to do with the number of independent paths to memory, not necessarily the bandwidth to memory. None of this detracts from the fact that the Opteron processor and system architecture was highly successful, particularly the 90nm dual-core generation, which achieved a respectable performance lead over contemporary Xeons.
On TPC-H, there is not a sufficient range of results for comparison purposes.
System Replacement and Purchasing Strategy Implications
In part, the above discussion was meant to build guidelines for the replacement of existing systems and a strategy for purchasing new systems. Within reason, Intel has built the IA-32 line to Moore's Law, with performance doubling roughly every two years, and presumably AMD intends to remain competitive. The Itanium line fits certain needs. However, its slower succession cycle has rendered Itanium systems less competitive whenever a significant process generation gap develops, as with Montecito at 90nm competing with Woodcrest at 65nm. The theoretical 50% advantage of the Itanium architecture is significantly reduced by the process generation gap.
In addition to Moore's Law, the overall system cost structure is being maintained or improved. That is, a two-socket server system in one generation is likely to be succeeded by the next-generation system at approximately the same or lower cost. Another factor is that, in any given generation, a four-socket system will cost more than twice as much as two two-socket systems; the same holds for an eight-socket system relative to two four-socket systems. Further, a four-socket system should deliver slightly less than twice the performance of the two-socket system when configurations are properly adjusted.
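The price/performance implication can be sketched with hypothetical figures. The prices and performance ratios below are made up for illustration; the text claims only that a four-socket system costs more than twice two two-socket systems while delivering slightly less than twice the performance of one of them.

```python
# Hypothetical figures: a 2-socket box at performance 1.0, a 4-socket box
# at slightly under 2.0, priced at more than twice two 2-socket systems.
two_socket = {"price": 10_000, "perf": 1.0}
four_socket = {"price": 25_000, "perf": 1.9}

options = {
    "two 2-socket systems": (2 * two_socket["price"], 2 * two_socket["perf"]),
    "one 4-socket system":  (four_socket["price"], four_socket["perf"]),
}
for name, (price, perf) in options.items():
    print(f"{name}: ${price:,}, perf {perf:.1f}, $/perf {price / perf:,.0f}")
```

On these assumed numbers the four-socket system costs noticeably more per unit of performance, which is the cost-structure point being made; the single larger system is still the only option when the workload cannot be split across two boxes.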
All of this translates into not buying much more performance headroom than is needed in the near term, and relying on a frequent replacement cycle. For a database application with a 40% annual load growth rate, purchasing 2x headroom is a reasonable strategy. In two years' time the headroom is consumed, at which point the system is replaced by a new system with twice the performance. This is far more effective than purchasing 4x headroom for a four-year replacement cycle. For higher usage growth rates, a one-year replacement cycle might be advisable.
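The two-year figure follows from compound growth: with load growing at rate g per year, a headroom factor h is consumed after log(h)/log(1+g) years. A quick check of the strategy above:

```python
import math

# Years until an annual load growth rate consumes a given headroom factor:
# solve (1 + growth) ** years = headroom for years.
def years_to_consume(headroom, annual_growth):
    return math.log(headroom) / math.log(1 + annual_growth)

print(round(years_to_consume(2.0, 0.40), 2))  # ~2.06 years for 2x headroom
print(round(years_to_consume(4.0, 0.40), 2))  # ~4.12 years for 4x headroom
```

At 40% annual growth, 2x headroom lasts almost exactly two years, matching the replacement cycle described; 4x headroom lasts just over four years but ties up capital in capacity that sits idle for most of that period.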
It is common accounting practice to depreciate computer hardware over five years. However, using this practice to mandate a five-year replacement cycle for a server dedicated to a specific application in a high-growth environment can be impractical. If it is necessary to keep a particular system for five years, then consider rotating the servers running the most critical applications down to less critical functions, as supported by technical analysis.
For data centers where floor space and electrical power are serious considerations, a technology-driven replacement cycle is even more important. Replacing systems on a two-year cycle effectively doubles performance per unit of floor space. Consider that the other option involves significant capital outlay to expand the size of the data center. The new Woodcrest systems and SFF SAS disk drives can also significantly reduce power consumption per system compared with the recent previous generations, in addition to increasing performance per watt.