Processor Performance, 2002
Cache improves performance by reducing the number of long-latency memory accesses. Latency is extremely important in recent-generation microprocessors. Processor performance has more or less followed Moore’s Law at about 40% per year, yet in the last several years memory latency has barely improved. Memory bandwidth has scaled through a combination of bus width and clock speed. DDR memory can support a data rate of 266MHz, where words are transferred 266 million times per second, but the elapsed time between the processor issuing a memory request and the data actually being received is not much better than that of Fast Page Mode memory with a data transfer rate of 33MHz. There are some specialty DRAM memory products with low latency, but all systems implement the standard DRAM memory such as DDR or RDRAM as main memory.
Another contributor to latency is signal propagation delay. Any time a signal needs to move off one silicon chip and propagate on a wire to another chip, the delay is typically 20-25ns. For this reason, on-die cache performs much better than off-die SRAM cache, even if the on-die cache is much smaller than the off-die cache.
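To put the 20-25ns figure in perspective, it can be converted into processor clock cycles. The following is a minimal sketch, not a measurement: the function name and the table of clocks are illustrative, and the clock rates are simply those of the processors discussed in this article.

```python
def delay_in_cycles(delay_ns: float, clock_ghz: float) -> float:
    """Convert a delay in nanoseconds to processor clock cycles.

    One GHz is one cycle per nanosecond, so cycles = ns * GHz.
    """
    return delay_ns * clock_ghz


# Clock rates of the processors compared in this article.
CLOCKS_GHZ = {
    "Pentium III 700MHz": 0.7,
    "Pentium III 1.13GHz": 1.13,
    "Pentium 4 2.0GHz": 2.0,
}

for name, ghz in CLOCKS_GHZ.items():
    low = delay_in_cycles(20.0, ghz)   # lower end of the 20-25ns range
    high = delay_in_cycles(25.0, ghz)  # upper end of the 20-25ns range
    print(f"{name}: {low:.1f} to {high:.1f} cycles per chip crossing")
```

The same fixed delay costs more cycles as the clock rate rises, which is one reason keeping the cache on the die matters more for faster processors.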
(Many server applications use only a small portion of main memory for program code and data structures. A much larger portion of main memory is typically used to buffer information stored on disk drives. One might wonder if it is not time for computer systems to implement main memory in SRAM and implement a separate DRAM buffer memory, which could also be used for virtual memory. The SRAM memory might be only 32-128MB while the DRAM system could be several GB in size.)
Figure 9 shows the Pentium III 1.13GHz performance with 256K and with 512K L2 cache. The 256K cache processor has the 0.18u Coppermine core and the 512K has the 0.13u Tualatin core. There were some other minor enhancements besides the increase in L2 cache from the Coppermine to the Tualatin cores. The 256K result is on the Intel 820 desktop chipset with RDRAM memory, while the 512K result is on a ServerWorks HE-SL SDRAM chipset. Both use the version 5.0 compiler.
Figure 9. Pentium III 1.13GHz, 133MHz FSB with 256K and 512K L2 cache.
Figure 10 compares the Pentium III 700MHz 256K L2 cache performance and the Pentium III Xeon with 2M L2 cache. Both systems have a 100MHz FSB. The 256K result is on the Intel 440BX chipset and uses the version 4.5 compiler. The 2M result is on the ServerWorks HE chipset and version 5.0 compiler.
Figure 10. Pentium III 700MHz, 100MHz FSB, 256K and 2M L2 cache.
Figure 11 compares the Pentium 4 2.0GHz performance with 256K and 512K cache on the same platform and compiler.
Figure 11. Pentium 4 2.0GHz processors with 256K and 512K L2 cache.
Figure 12 shows the relative performance gain for the Pentium III 1.13GHz processor from 256K L2 cache to 512K L2 cache. Some of the performance difference may be attributed to differences between the Intel 820 chipset with RDRAM and the HE-SL chipset with interleaved SDRAM. However, most of the difference is probably due to the difference in cache size.
Figure 13 shows the relative performance gain from the Pentium III 700MHz with 256K L2 cache to the Pentium III Xeon 700MHz with 2M L2 cache. The 256K result used the 4.5 compiler and the 2M result used the 5.0 compiler, so some of the differences are due to the compiler. The EON gain may be mostly due to the difference between the 4.5 and 5.0 compilers. The MCF program shows a much larger performance gain than can be attributed to the compiler.
Figure 12. Pentium III 1.13GHz, 133MHz FSB with 256K and 512K L2 cache.
Figure 13. Pentium III 700MHz, 100MHz FSB with 256K and 2M L2 cache.
Eight of the twelve individual programs show a performance gain of more than 15% from 256K to 512K cache. Two of the eight, VPR and TWOLF, get a good additional performance gain with an even larger 2M cache. MCF does not show much of a performance gain from 256K to 512K, but shows a very large gain at 2M.
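The relative gains quoted here can be reproduced from a pair of benchmark scores. The sketch below assumes the gain is computed as the ratio of the two scores minus one, which the article does not state explicitly. The program names and scores are hypothetical placeholders, not values taken from the figures.

```python
def relative_gain_percent(base_score: float, new_score: float) -> float:
    """Percentage gain of new_score over base_score (higher score is better)."""
    if base_score <= 0:
        raise ValueError("base_score must be positive")
    return (new_score / base_score - 1.0) * 100.0


# Hypothetical (256K score, 512K score) pairs, for illustration only.
hypothetical_scores = {
    "PROGRAM_A": (400.0, 470.0),
    "PROGRAM_B": (350.0, 360.0),
}

THRESHOLD_PERCENT = 15.0  # the cutoff used in the text above

for program, (small_cache, large_cache) in hypothetical_scores.items():
    gain = relative_gain_percent(small_cache, large_cache)
    marker = "above" if gain > THRESHOLD_PERCENT else "at or below"
    print(f"{program}: {gain:.1f}% gain ({marker} {THRESHOLD_PERCENT:.0f}%)")
```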
Figure 14 shows the performance gain on the Pentium 4 2.0GHz processor from 256K to 512K L2 cache. The Pentium 4 gains 11% from 256K to 512K cache, compared to the 22% gained for the same cache size change on the Pentium III.
Figure 14. Pentium 4 2.0GHz performance with 256K and 512K L2 cache.
One possible explanation is that the Pentium 4 processor is less constrained by bus bandwidth than the Pentium III: 3.2GB/sec for the Pentium 4 versus 1GB/sec for the Pentium III. Hence any characteristic that reduces bus utilization will have a larger performance impact on the Pentium III. Otherwise, one would expect the faster processor to benefit more from a larger cache.
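The bus bandwidth figures follow from the transfer rate multiplied by the bus width. The sketch below assumes a 64-bit data bus on both processors, 400 million transfers per second on the Pentium 4 bus, and one transfer per clock on the 133MHz Pentium III bus. These parameters are assumptions stated for illustration and are not given in the text above.

```python
def peak_bandwidth_gb_per_sec(transfers_per_sec: float, bus_width_bits: int) -> float:
    """Peak bandwidth in GB/sec (1 GB = 10**9 bytes) for a given bus."""
    bytes_per_transfer = bus_width_bits / 8
    return transfers_per_sec * bytes_per_transfer / 1e9


# Assumed bus parameters: (transfers per second, data bus width in bits).
buses = {
    "Pentium 4": (400e6, 64),
    "Pentium III, 133MHz FSB": (133e6, 64),
}

for name, (rate, width) in buses.items():
    print(f"{name}: {peak_bandwidth_gb_per_sec(rate, width):.2f} GB/sec peak")
```

These are peak numbers only. Sustained bandwidth is lower, and as noted earlier, higher bandwidth does not by itself reduce memory latency.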