Opteron vs Xeon preference?? | SQL Server Performance Forums

SQL Server Performance Forum – Threads Archive

Opteron vs Xeon preference??

We are starting a hardware refresh and I was wondering what preferences people on this board have. In my experience I have used both Opteron and Xeon, and my preference has leaned towards the AMD Opteron side because of performance and price. But I have found a lot more Xeons out there than Opterons.
Check the following from Joe: Recommended Hardware Configurations for SQL Server
http://www.sql-server-performance.com/forum/topic.asp?TOPIC_ID=15002
MohammedU.
Moderator
SQL-Server-Performance.com
I wonder if it is also valid for 2005. Wait for Joe's answer. To make sure Joe sees this, I've moved it (arbitrarily) to the hardware forum. Luis Martin
Moderator
SQL-Server-Performance.com
It seems like pchee is asking for non-technical preferences. My own suggestion is that DBAs not get emotionally involved in the platform so long as it gets the job done.
Whoever has the lead in any particular area can change in 6 months to 1 year. Right now, dual-core to dual-core,
Intel has a moderate lead in TPC-C, which is more high-volume calls, and AMD has a moderate lead in TPC-H, which is big queries and bandwidth intensive.

At the platform level,
for 2 sockets, I like the ProLiant ML370 for Xeon because it supports 64GB memory, 6 PCI-E slots, and 16 internal disks. AMD can make a perfectly good 2 socket platform, but vendors only offer 2U and 32GB max memory. At 4 sockets, I like the ProLiant DL585 because it has three x8 and four x4 PCI-E slots compared with six x4 for the Intel,
and 128GB max memory compared with 64GB for Intel. The server planning team at Intel could never seem to give the silicon designers clear and intelligent direction with a vision 4-6 years into the future,
resulting in frequent chaos, band-aiding, catch-up, direction changes etc.
The 5000P was well executed, but the 8501 is getting long in the tooth
when do they expect to have a Core 2 in 4 socket?
with >64G memory support, hopefully they will have the sense to jump to 256G considering they have sat at 64G for so long (excluding a meaningless 128G DDR-1 option)
will it support 8-10 PCI-E slots?

Update, from a 2007-Feb-21 Intel slide deck:
Core 2 micro-architecture in the Xeon 7000 series in Q3 '07,
not sure if this is dual core or quad core
I can’t wait,
Quad Independent Bus

I have a friend who is in the server business. He says that the Xeon weakness is the northbridge; it is '80s technology. At 2 sockets AMD and Xeon are comparable, but at 4 sockets with dual core you're beginning to saturate the northbridge, advantage Opteron. He says Intel is going to have a hard time beating AMD on 4 socket, multi core etc. until they come out with newer technology. BUT... I like AMD's price.
your friend probably does not have direct access to performance models and simulations from the silicon architects/designers
and is just regurgitating BS put out by the marketing people, who are paid to put the best possible spin on a situation regardless of the truth, the whole truth, and nothing but the truth. What it comes down to is shared bus vs point-to-point links.
Any given piece of silicon can have a limited number of pins to the outside world at a given cost structure.
Today this is around 1400,
so figuring about 700 are available for comm,
The rest for power and ground.
If you have a 64-bit bus shared with 3 devices,
then maybe you can run the bus at 800MHz,
and any pair of devices gets the full bus bandwidth, but only one pair can use the bus at a time.
With the same number of pins,
one device can have two 32-bit buses,
one to each of the other 2 devices.
You can now drive each bus at 1333MHz,
about 65% faster, so there is more total bandwidth,
but the max bandwidth between 2 devices is reduced. The exciting new technology is point to point signaling,
which today, can run at 2.5Gbit/sec.
I think the reason p-p can run so much faster than a bus between 2 devices has to do with clocking.
The other neat trick is that it is possible to send and receive signals on the same wires simultaneously.
This is why x8 PCI-E can do 2GB/sec in each direction for a total IO BW of 4GB/sec. The more significant advantage of the high speed narrow p-p links over shared bus is that there is less latency in issuing the address of a memory request or write,
not the data bandwidth that marketing people like to tout.
Basically they look for the biggest number,
not the more significant number.
The above is true for many server workloads.
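For anyone who wants to check the arithmetic, here is a rough sketch using the figures quoted above; the widths, clocks, and encoding overhead are the post's example numbers, not measurements of any specific product.

```python
# Back-of-the-envelope bandwidth arithmetic for the shared-bus versus
# point-to-point discussion above; example figures only.

def bus_bw_gb_s(width_bits, clock_mhz):
    """Peak bandwidth of a parallel bus in GB/sec."""
    return width_bits / 8 * clock_mhz * 1e6 / 1e9

# One 64-bit bus at 800MHz shared by 3 devices: any one pair gets the
# whole thing, but every device has to take turns on the same wires.
shared = bus_bw_gb_s(64, 800)               # ~6.4 GB/sec

# Same rough pin budget split into two 32-bit buses, one to each of the
# other two devices, each clocked higher at 1333MHz.
per_link = bus_bw_gb_s(32, 1333)            # ~5.3 GB/sec per pair
aggregate = 2 * per_link                    # ~10.7 GB/sec total

print(f"shared 64-bit @ 800MHz : {shared:.1f} GB/s, all devices contend")
print(f"2x 32-bit @ 1333MHz    : {per_link:.1f} GB/s per pair, {aggregate:.1f} GB/s aggregate")

# PCI-E x8: 8 lanes x 2.5 Gbit/s, 8b/10b encoding leaves 80% for payload,
# and the link is full duplex -- hence 2 GB/s each way, 4 GB/s total.
pcie_x8_per_direction = 8 * 2.5 * 0.8 / 8   # GB/sec in one direction
print(f"PCI-E x8               : {pcie_x8_per_direction:.1f} GB/s each direction")
```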
I do believe FP burns bandwidth, which is why AMD will present an aggregate benchmark including FP. I do not believe that the current Intel system is saturated on the FSB/Northbridge.
I think only the dual core Xeon MP 70X0 with 2M cache per core was saturated in 4-way systems.
The current Tulsa (Xeon 71X0) with 16M shared L3 cache is tops among X86 in 4-socket TPC-C. My recollection was that for a given processor core architecture (not bus),
implementing an integrated memory controller plus point-to-point links to other processors was estimated to yield a 15-25% advantage over the same core architecture on a shared bus.
So if you cannot build an equivalent core,
you will not have the full ~20% advantage of IMC and point-to-point in the final complete system;
in fact, if your core is more than about 20% below the other core,
your net system performance will be lower.
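As a quick sanity check of that argument, treat net system performance as roughly (relative core performance) x (platform factor); the 1.20 factor below is just the midpoint of the 15-25% estimate quoted above, not a measured number.

```python
# Quick sanity check: platform factor versus relative core performance.

def net_relative_perf(core_ratio, platform_factor=1.20):
    """core_ratio: this core's performance relative to the competing core."""
    return core_ratio * platform_factor

for core_ratio in (1.00, 0.90, 0.83, 0.75):
    net = net_relative_perf(core_ratio)
    verdict = "ahead" if net > 1.0 else "behind"
    print(f"core at {core_ratio:.2f}x of rival -> system at {net:.2f}x ({verdict})")

# With a 1.20 factor the break-even is a core about 17% slower (1/1.2);
# anything much weaker than that loses at the system level anyway.
```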
AMD (marketing) likes to talk about the theoretical advantages of this new technology.
Theory is nice, but actually realizing a performance gain means executing many components of the complete system correctly. If what your friend says is true,
then why does the Intel 4 socket have better TPC-C than AMD,
even though the Xeon 7140 is severely underclocked at 3.4GHz?
If the poor leakage characteristics at 90 and 65nm had been solved, the NetBurst architecture probably would have hit 4-5GHz at 90nm and 6-7GHz at 65nm. For 2006 Intel has Core 2 on 65nm (designed in Israel).
Expect Penryn – Core 2 at 45nm this year.
A new architecture (Nehalem on 45nm) next year designed by the Oregon team.
a shrink to 32nm in 2009 (Westmere).
A new architecture (Gesher) in 2010 presumably from the Israel design center.

on the matter of new improved technology
how many of the readers remember the old RISC/CISC/X86 ideological/religious war and conversion effort of the 1980s into the 90s? First, X86 is not really CISC but originated as an inconsequential step child, inadvertently became famous, then had massive surgery to cover up its humble origins. The whole concept of RISC was that much of the silicon in a CISC chip was rarely used, resulting in wasted silicon and more expensive design effort. The theory put forward by RISC advocates was that a simple design would have most of the silicon real estate put to good use on most clock cycles, so it would have competitive performance and relatively lower design effort. The thought in the mid 90s was that on a given manufacturing process, for a processor designed to use a set chip size (400mm^2 typical for a new design),
the RISC chip should outperform the X86 by approx 20%,
most from having 32 GP registers versus 8 for X86,
second most from less silicon spent decoding instructions, making more available to actually executing the instructions. At the time, an X86 design team might have had over 400 people, compared to less than 100 for the contemporary RISC. The argument was so convincing that the powers at Intel decided they had to do something better than RISC. This was not unreasonable, because RISC originated when it was difficult to get the entire processor onto a single silicon chip.
By the mid-90s, there was so much silicon real estate that it was reasonable to propose a better idea to fully utilize all that silicon. The RISC argument was so pervasive in academia that all CS/EE students bought into it and did not want to work on X86, even though it had a very sizeable market. There was one person, Fred I think, who realized many RISC concepts could be applied to X86, and with perfect execution, massive financial resources, and the best manufacturing process, produce a competitive X86, even though it had no natural beauty. As it turned out, the Pentium Pro more or less matched the RISC chips in performance, and later X86 from both Intel and AMD completely clobbered the RISC chips.
IBM Power 5 leads in FP only because of the massive memory bandwidth that is cost prohibitive in the volume market

He is a BIOS programmer for Newisys, and they build servers for HP and Sun, so he does have access to all sorts of performance models and trends. He pretty much knows what is fluff and what is the good stuff. I forwarded what you said to him and am just waiting for a response. The only recent comparison material I've found is from Anandtech: http://www.anandtech.com/IT/showdoc.aspx?i=2745&p=4.
Best NOT to trust those underpaid hardware magazine types.
Especially Tom's, Anandtech and Hot Hardware. As usual, they test 10-15% of a market segment and declare a review finished. Stick with Intel.
What if Intel solved the bus issues this year with 45nm silicon and new FSB chipsets?
Then AMD will be finished.

the more proper comparison between the Intel Xeon 7041 2x3.0GHz/2M
and the Opteron 2.4GHz DC is the published TPC-C
(this is the Xeon MP I mentioned above as being constrained, because it was really meant for 2 socket systems, but was band-aided into the 4-socket until Tulsa was ready):

|Vendor |Sockets|Processor    |Clock |Memory|Disks         |tpmC   |
|Fujitsu|4      |Intel 7041 DC|3.0GHz| 64GB |450-15K+8-15K | 188,761|
|HP     |4      |Opteron 880  |2.4GHz|128GB |394-15K+12-10K| 206,181|

However, I will not complain too much about the Anandtech result.
Large system performance testing is a very tricky matter and requires skills beyond what non-professionals have.
Without disclosure and careful analysis, it is possible to generate any result range on memory difference alone. On top of that, getting maximum performance out of the Xeon is especially tricky, with Hyper-Threading having potentially positive and negative characteristics, negative if disk IO is involved and special precautions are not taken. I am not ready to declare AMD dead;
Tigerton/Caneland comes out this year, but 45nm 4-socket is probably a full year away or more.
Also, while Core 2 delivers spectacular SPEC CPU, the advantage is not as large in TPC;
my guess is the disk IO code is responsible.
My own tests with SQL Server operations in memory show excellent Core 2 characteristics, probably in range with SPEC CPU. I am also inclined to think that so long as AMD can produce a decent processor core, there will be enough Intel missteps to keep AMD in the game.
I just do not think their server planning team can provide a clear and concise vision of product requirements 4-6 years into the future for a silicon design team to build to without last-minute (1-2 years in the silicon world) course changes.
any response?
From what I'm hearing right now, even from my friend, the Xeons are faster at the dual-processor server level, because Intel is pretty much pushing the limit. But they are still sharing the northbridge. With a shared bus architecture they need to pretty much optimize the chip to the limit and add a ton of cache memory. That equals a more expensive chip, more power consumption, and heat (AC cost). At the quad-processor level, the Xeon can handle the smaller queries faster because it can get on and off the northbridge pretty fast, but the Opteron leads on the large block queries (in agreement). The northbridge is saturated.

Quad core technologies: AMD will be coming out with true quad core technology in a couple of months.
Intel's quad core is a dual core with HT. http://www.hypertransport.org/tech/index.cfm?m=1 If you thought PCI-E was fast at 2GB/sec, wait until HTX comes out at 2.6GB/sec, and Intel won't be using it because they can't collect a royalty. It's an open standard. AMD made HyperTransport an open, royalty-free standard so anyone can develop for it and make money without paying a licensing fee. Intel doesn't want to use HyperTransport or HTX; they want to develop their own product, and anyone that uses it must pay a license fee (timeframe late 2007). When HTX comes out it will only be available on AMD Opteron systems.
Intel's quad core is 2 dual-core dies in one package, not HT. Anyway, it sounds as if your friend is just reiterating standard vendor marketing rubbish, focusing on individual elements of technology rather than the complete picture. Think of it this way:
when you are buying a car,
you want horsepower at the wheels, where the rubber meets the road.
the engine generates the horsepower,
and it's delivered to the wheels by the drive train. OK, so we all know the AMD HT drive train is more efficient at transmitting engine horsepower,
but they sat on the K8 core (with incremental improvements) for too long,
ideally, one should have more significant improvements every 2 years,
or at worst 4 years,
so how substantially did AMD improve the core of their new quad core?
ie, single core performance?
since the Core 2 engine is much more powerful than AMD's, it is of little consequence that the drive train is less efficient, so long as the engine makes up the difference and more.
HT and PCI-E are completely different matters
it's not about bandwidth, which seems to be the only thing idiots can talk about. PCI-E was designed for IO,
meaning the protocols are designed for IO.
FSB, HT and SP (later) are designed for CPU-CPU/memory,
meaning the protocols are for memory. There are only a few devices that require direct CPU/memory communications;
HT is very good for this because it is a stable standard
Intel did not like to open up the FSB because it was not meant to be open. Intel does have the SP, which is in the Itanium chipset,
used to link memory controller nodes and IO;
because Itanium never took off, SP is fairly unknown. Next year, Intel will have something that might be called CSI,
which is presumably a serial bi-directional technology (as are SP, HT, and PCI-E);
this will be used by both the Itanium and Core/Xeon lines
but it will be some time later for a common platform that can work with either
I also think that right now AMD would like to be making money;
what's the point of free technology if the company sponsoring it goes bankrupt?
HTX is designed for IO: http://www.hypertransport.org/tech/tech_htx_pci.cfm?m=9 I think you're right about the 2 dual-core dies in one package = quad core; AMD quad core = true quad core. AMD is still profitable; I think the market as a whole is down. AMD is hoping to climb to 30% of the server market share with its quad core chip. Keep in mind that Intel used to have 100% of the x86 server market. As far as your analogy of the power at the wheels goes, please explain. Are you talking about the pipe from the southbridge to the PCI-E? Yeah, I heard that Intel was coming out with something comparable to HT because the northbridge architecture is just stone age.
OK, I see what the point of your power-at-the-wheels analogy is. My mistake, sorry. But I think HTX is going to be the answer to PCI-E.
I am not saying I do not like Opteron or the next gen
I am saying that a lot of what is being talked about is meaningless gibberish
by idiots who do not know to focus on what is important.

True quad core:
at the user level, you care about how many cores you have, total
whether it's 4 on one die or 2 dies with 2 cores each is of no meaningful importance
(except cache-to-cache comms)
that AMD continues to pound on this point shows they are grasping at straws.

On the matter of HTX:
how many HTX slots will there be on your system?
if just 1, then will there be a disk IO controller that can drive the full HTX load?
or will you be in the stupid situation of some disks on HTX and others on PCI-E? I think HTX will be for high-end graphics
and special co-processors, like they had in the good (well, not so good) old days. I just do not see it being viable for storage;
this is no fault of HTX,
but rather due to storage vendors being unwilling to do a massively powerful SAS HBA.
32 SAS ports on the HTX would work for me. Anyway,
we will soon have AMD quad-core quad-socket versus Intel quad-core quad-socket;
then we can see where the rubber meets the road

in looking over the HTX docs and recent AMD news
this is my expectation: in the Q3-Q4 time frame
for 2 socket systems
Xeon 5XXX (Core2) on 45nm versus new Opteron on 65nm
for 4 socket systems
Xeon 7XXX (Core2) on 65nm versus the new Opteron on 65nm. My expectation is that the new Opteron on 65nm will be competitive with Core 2 on 65nm, both running around 3GHz;
it's highly likely that an individual core could run much higher than 3GHz, but the thermal envelope for current systems simply will not support a quad core much beyond 3GHz at 65nm.

New Opteron:
I do like the fact that the new Opteron will have the option of:
4 x16 HT3 links
or 8 x8 links
versus the original 3 links. This allows a 4 socket system to be fully connected with x16 links,
or an 8 socket system to be fully connected with x8 links. The original Opteron in a 4 socket system allows something like the below:
_______M_____M_______
_______|_____|_______
I/O_-_CPU_-_CPU_-_I/O
_______|_____|_______
I/O_-_CPU_-_CPU_-_I/O
_______|_____|_______
_______M_____M_______
(of course, as far as I am aware, no vendors configured more than 2 of the possible 4 I/O controllers?) The main point of the above is that memory access from a CPU could be:
a. 1 hop, CPU to memory
b. 2 hops, CPU to CPU to memory
c. 3 hops, CPU to CPU to CPU to memory
In the new config, it will be either 1 or 2 hops; a third hop is not required because each CPU socket is directly connected to all the other CPU sockets, and the x8 option allows a fully connected 8 socket system (a quick hop-count sketch follows).
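A minimal sketch of that hop counting: the adjacency lists below are an illustrative simplification of the diagram above (square layout with the spare links going to I/O) versus a fully connected 4-socket fabric, not a vendor topology spec.

```python
# Worst-case memory hops = CPU-to-CPU traversals plus the final memory access.
from collections import deque

def worst_case_memory_hops(links):
    """links: adjacency dict of CPU sockets; memory hangs off every socket."""
    worst = 0
    for src in links:
        dist = {src: 0}
        queue = deque([src])
        while queue:                       # plain BFS over the socket graph
            node = queue.popleft()
            for nbr in links[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        worst = max(worst, max(dist.values()) + 1)   # +1 for the memory access
    return worst

# Original 4-socket Opteron: sockets wired in a square (no diagonals, the
# spare links go to I/O), so the far corner is 2 CPU hops away.
square = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}

# New Opteron with 4 x16 links: every socket links directly to every other.
fully_connected = {i: [j for j in range(4) if j != i] for i in range(4)}

print(worst_case_memory_hops(square))           # 3: CPU -> CPU -> CPU -> memory
print(worst_case_memory_hops(fully_connected))  # 2: CPU -> CPU -> memory
```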
8 socket Opteron system
This has me really excited,
with Intel systems, more than 4 sockets required a high NUMA ratio,
typically ~150ns to access local memory
and 250-400ns to access remote node memory. While my tests showed that it is possible to build a SQL Server application that scales well on NUMA systems,
it also required that you know in detail the performance characteristics of each SQL operation on that particular platform architecture
and that you architect your database in a manner so that the SQL cost based optimizer does not use the non-scalable operations as much as possible
and it may also take some very high effort query hints in certain cases. I have never had anyone willing to undergo a complete re-architecture of their db to support good scalability on NUMA platforms.
(typically 10-20 weeks of my time plus the full efforts of your internal development staff for the same period; I think the freeze on features over this period was as painful as my price, but then what's $200K when you are talking about a $1M system?) Corollary built the ProFusion chipset for an 8-socket Pentium III, and HP/Compaq extended it to Xeon (1st gen P4),
but now only IBM and Unisys support Xeon beyond 4 sockets. I think there is reason to hope that a common DB app, with less work (4-10 weeks of my time),
might scale to 8 sockets (quad core per socket) on the new Opteron system.

HTX for IO:
Ok, HTX is great technology for high bandwidth and low latency IO
let me stress the low latency part, because PCI-E can do high bandwidth too. Only really critical components deserve HTX implementation;
IO devices that work fine on PCI-E should stay there.
So very high-end graphics and special co-processors definitely warrant HTX implementation. Now the AMD view on platform implementation seems to be something like the following:
for a 1 socket system:
1 x8 HTX slot
1 x8 HTX bridged to PCI-E
presuming the 1 socket processor option has 1 x16 HTX available. Presumably a 2 socket system could have 2 x8 HTX slots and 2 x8 HTX bridges to PCI-E, with one set on each socket,
and a 4 socket, four of each
where a 2 socket processor option implements 2 x16 HTX ports, one used for the other proc and the second used for IO,
and a 4 socket option implements all 4 x16 HTX ports. Potentially, a very specialized 1-2 socket system might implement one x16 HTX slot, 1 x8 HTX slot and 1 x8 HTX bridge to PCI-E,
basically 2 x16 HTX dedicated to IO in some combination or other. A fully connected 8 socket system would only have 1 x8 HTX available for IO per socket,
so my view is to have 2 bridged to PCI-E
and 6 for HTX IO (a rough link-budget sketch follows).
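A rough sketch of that link budgeting, assuming the 4 x16 (or 8 x8) HT3 links per socket mentioned earlier; the helper function is purely illustrative.

```python
# Leftover links per socket once the coherent fabric is fully connected.

def io_links_left_per_socket(sockets, links_per_socket):
    """Each socket spends (sockets - 1) links on the other sockets when
    fully connected; whatever remains is available for HTX slots/bridges."""
    fabric_links = sockets - 1
    assert fabric_links <= links_per_socket, "cannot fully connect this many sockets"
    return links_per_socket - fabric_links

# 4 sockets fully connected with x16 links: 1 x16 per socket left for IO.
print(io_links_left_per_socket(4, links_per_socket=4), "x16 link(s) per socket for IO")

# 8 sockets fully connected with x8 links: 1 x8 per socket left for IO,
# which is where the "2 bridged to PCI-E and 6 for HTX IO" split comes from.
print(io_links_left_per_socket(8, links_per_socket=8), "x8 link(s) per socket for IO")
```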
HTX for storage:
Now one could argue that high-end storage deserves HTX implementation. I have frequently advocated high sequential transfer capability in storage systems;
it's really something to see a table scan running at 10GB/sec.
Got a query that does a 100GB table scan? No problem
(it is common to see people with an expensive SAN that can only do 160MB/sec or less, for which even a 10GB table scan is a big problem). To match up well with HTX, I think a 32-port SAS adapter is best,
connected to 50-150 disks (some quick arithmetic below).
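Some quick, back-of-the-envelope arithmetic behind the scan-rate and 32-port SAS numbers above; the 3 Gbit/s per SAS port figure is an assumption.

```python
# Scan times at different sequential rates, plus raw SAS aggregate.

def scan_seconds(scan_gb, rate_gb_per_s):
    """Time to sequentially scan scan_gb at rate_gb_per_s."""
    return scan_gb / rate_gb_per_s

for rate_gb_per_s in (10.0, 1.0, 0.160):
    secs = scan_seconds(100, rate_gb_per_s)
    print(f"100GB scan at {rate_gb_per_s * 1000:>6.0f} MB/s: {secs:7.0f} seconds")
# 10 seconds at 10GB/s versus roughly 10 minutes on a 160MB/s SAN.

# 32 SAS ports at an assumed 3 Gbit/s each is ~12 GB/s of raw link bandwidth,
# comfortably above the 4-5 GB/s per adapter suggested above.
sas_aggregate_gb_per_s = 32 * 3 / 8
print(f"32-port SAS raw aggregate: {sas_aggregate_gb_per_s:.0f} GB/s")
```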
For a SAN, it should just be an HTX connection to the SAN, which then bridges down to SAS ports. Another matter is that a high-end system should not have only a single storage adapter, so 2 minimum,
in this case not necessarily for increased bandwidth, but also for redundancy,
ie,
a 2 socket system with 2 HTX ports,
each HTX-SAS adapter connected to the same set of 50-150 disks. Each SAS adapter should be able to drive 4-5GB/sec in sequential disk IO.

The problem:
right now
I just do not see storage (chip) vendors doing this;
this will require a custom design for what they see as a low volume market. On top of this, SAN vendors have no means of reaching 4-5GB/sec in their systems. What it comes down to is:
HTX may be great technology,
the graphics and coprocessor people will definitely adopt it
i just do not see storage vendors as having the guts to implement this technology correctly in the near future (1-2 years)
really, for years they have sold obsolete technology at high prices;
why spend the effort to implement current technology?

My friend said that one of the uses for the HTX technology is for the SAN vendors on their HBAs. If that is true then that's pretty exciting news, but we'll see. If HTX does become common household technology like PCI-Express, PCI-X, … then basically they (the SAN vendors) become the bottleneck. If there is a market for it then they have an incentive to bring it to market.