SQL Server Performance

Opteron vs Xeon preference??

Discussion in 'SQL Server 2005 Performance Tuning for Hardware' started by pchee373, Feb 20, 2007.

  1. pchee373 New Member

    We are beginning a hardware refresh and I was wondering what type of preference people on this board have. In my experience I have used both Opteron and Xeon, and my preference has leaned toward the AMD Opteron side because of performance and price. But I have found a lot more Xeons out there than Opterons.
  2. MohammedU New Member

  3. Luis Martin Moderator

    I wonder if it is valid also for 2005. Wait for Joe's answer. To be sure Joe sees it, I've moved this (arbitrarily) to the hardware forum.



    Luis Martin
    Moderator
    SQL-Server-Performance.com

    All in Love is Fair
    Stevie Wonder


    All postings are provided “AS IS” with no warranties for accuracy.



  4. joechang New Member

    it seems like pchee is asking for non-technical preferences

    my own suggestion is that DBAs not get emotionally involved in the platform if it gets the job done.
    the lead in any particular area can change hands in 6 months to a year

    right now, dual-core to dual-core:
    Intel has a moderate lead in TPC-C, which is more about high-volume calls; AMD has a moderate lead in TPC-H, which is big, bandwidth-intensive queries

    At the platform level:
    for 2 sockets, I like the ProLiant ML370 for Xeon because it supports 64GB memory, 6 PCI-E slots, and 16 internal disks

    AMD can make a perfectly good 2-socket platform, but vendors only offer 2U chassis with 32GB max memory.

    At 4 sockets, I like the ProLiant DL585 because it has 3 x8 and 4 x4 PCI-E slots compared with 6 x4 for the Intel,
    and 128GB max memory compared with 64GB for Intel

    the server planning team at Intel never seemed to give the silicon designers clear, intelligent direction with a vision 4-6 years into the future,
    resulting in frequent chaos, band-aiding, catch-up, direction changes, etc.
    The 5000P was well executed,

    but the 8501 is getting long in the tooth.
    when do they expect to have Core 2 in a 4-socket platform?
    with >64GB memory support; hopefully they will have the sense to jump to 256GB, considering they have sat at 64GB for so long (excluding a meaningless 128GB DDR-1 option)
    will it support 8-10 PCI-E slots?

    Update:
    from a 2007-Feb-21 Intel slide deck,
    Core 2 microarchitecture in the Xeon 7000 series in Q3 '07,
    not sure if this is dual core or quad core.
    I can't wait:
    Quad Independent Bus
  5. pchee373 New Member

    I have a friend who is in the server business. He says that the Xeon's weakness is the northbridge; it is '80s technology. At 2 sockets AMD and Xeon are comparable, but at 4 sockets with dual core you're beginning to saturate the northbridge, advantage Opteron. He says Intel is going to have a hard time beating AMD on 4-socket, multi-core, etc., until they come out with newer technology. BUT... I like AMD's price.
  6. joechang New Member

    your friend probably does not have direct access to performance models and simulations from the silicon architects/designers
    and is just regurgitating BS put out by the marketing people, who are paid to put the best possible spin on a situation regardless of the whole truth and nothing but the whole truth.

    What it comes down to is shared bus versus point-to-point links.
    Any given piece of silicon can have a limited number of pins to the outside world at a given cost structure.
    Today this is around 1400,
    so figure about 700 are available for signaling,
    the rest for power and ground.
    If you have a 64-bit bus shared by 3 devices,
    then maybe you can run the bus at 800MHz,
    and any two devices can use the full bandwidth, but only one pair at a time.
    With the same number of pins,
    one device can instead have two 32-bit buses,
    one to each of the other 2 devices.
    You can now drive each bus at 1333MHz,
    about 50-60% faster, so there is more total bandwidth,
    but the maximum bandwidth between any 2 devices is reduced.
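
    A back-of-the-envelope comparison of the two pin budgets just described (64-bit shared bus at 800MHz versus two 32-bit point-to-point buses at 1333MHz); the arithmetic is mine:

    ```python
    # Rough peak-bandwidth comparison for the pin-budget argument above.

    def bus_bw_gb_per_sec(width_bits, clock_mhz):
        """Peak bandwidth of a parallel bus: width (bits) x effective clock (MHz)."""
        return width_bits / 8 * clock_mhz * 1e6 / 1e9

    shared = bus_bw_gb_per_sec(64, 800)    # one 64-bit bus shared by 3 devices
    narrow = bus_bw_gb_per_sec(32, 1333)   # each of two 32-bit point-to-point buses

    print(f"shared 64-bit @ 800MHz : {shared:.1f} GB/s, but only one pair can use it at a time")
    print(f"two 32-bit @ 1333MHz   : {narrow:.1f} GB/s per link, {2 * narrow:.1f} GB/s aggregate")
    # -> 6.4 GB/s shared vs 5.3 GB/s per link (10.7 GB/s aggregate):
    #    more total bandwidth, but lower peak between any two devices.
    ```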

    The exciting new technology is point-to-point signaling,
    which today can run at 2.5Gbit/sec.
    I think the reason p-p can run so much faster than a bus between 2 devices has to do with clocking.
    The other neat trick is that the link is full duplex, so it can send and receive simultaneously.
    This is why x8 PCI-E can do 2GB/sec in each direction, for a total IO bandwidth of 4GB/sec.
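
    The 2GB/sec figure falls out of the 2.5Gbit/sec lane rate once the 8b/10b encoding overhead of first-generation PCI-E is accounted for; a minimal sketch:

    ```python
    # PCI-E gen-1 payload bandwidth from the 2.5 Gbit/s lane rate.
    lane_rate_gbit = 2.5        # raw signaling rate per lane, per direction
    encoding_eff   = 8 / 10     # 8b/10b encoding: 8 payload bits per 10 line bits
    lanes          = 8          # x8 link

    per_direction_gb = lanes * lane_rate_gbit * encoding_eff / 8   # Gbit -> GByte
    print(f"x8 PCI-E: {per_direction_gb:.1f} GB/s each direction, "
          f"{2 * per_direction_gb:.1f} GB/s total (full duplex)")
    # -> 2.0 GB/s each way, 4.0 GB/s combined
    ```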

    The more significant advantage of the high-speed narrow p-p links over a shared bus is lower latency in issuing the address for a memory read or write,
    not the data bandwidth that marketing people like to tout.
    Basically they look for the biggest number,
    not the most significant number.
    The above is true for many server workloads.
    I do believe FP workloads burn bandwidth, which is why AMD likes to present aggregate benchmarks that include FP.

    I do not believe that the current Intel system is saturated on the FSB/northbridge.
    I think only the dual-core Xeon MP 70X0 with 2M cache per core was saturated in 4-way systems.
    The current Tulsa (Xeon 71X0) with 16M shared L3 cache holds the top x86 result in 4-socket TPC-C.

    My recollection is that for a given processor core architecture (not bus),
    implementing an integrated memory controller plus point-to-point links to the other processors was estimated to yield a 15-25% advantage over the same core architecture on a shared bus.
    So if you cannot build an equivalent core,
    you will not realize the full ~20% advantage of IMC and point-to-point links in the final complete system;
    in fact, if your core is more than 20% behind the other core,
    your net system performance will be lower.

    AMD (marketing) likes to talk about the theoretical advantages of this new technology.
    Theory is nice, but actually realizing performance gain means executing many components of the complete system correctly.

    If what your friend says is true,
    then why does the Intel 4-socket have better TPC-C than AMD,
    even though the Xeon 7140 is severely underclocked at 3.4GHz?
    If the poor leakage characteristics at 90 and 65nm had been solved, the NetBurst architecture probably would have hit 4-5GHz at 90nm and 6-7GHz at 65nm.

    For 2006 Intel has Core 2 on 65nm (designed in Israel).
    Expect Penryn, Core 2 at 45nm, this year;
    a new architecture (Nehalem, on 45nm) next year, designed by the Oregon team;
    a shrink to 32nm in 2009 (Westmere);
    and a new architecture (Gesher) in 2010, presumably from the Israel design center.
  7. joechang New Member

    on the matter of new, improved technology:
    how many of the readers remember the old RISC/CISC/x86 ideological/religious war and conversion effort of the 1980s into the '90s?

    first, x86 is not really CISC; it originated as an inconsequential stepchild, inadvertently became famous, then had massive surgery to cover up its humble origins.

    the whole concept of RISC was that much of the silicon in a CISC chip was rarely used, resulting in wasted silicon and more expensive design effort

    the theory put forward by RISC advocates was that a simple design would have most of the silicon real estate put to good use on most clock cycles, so it would have competitive performance and relatively lower design effort

    the thought in the mid '90s was that on a given manufacturing process, for a set chip size (400mm^2 was typical for a new design),
    the RISC chip should outperform the x86 by approximately 20%:
    most of that from having 32 GP registers versus 8 for x86,
    second most from spending less silicon decoding instructions, leaving more for actually executing them

    at the time, an x86 design team might have had over 400 people, compared to fewer than 100 for a contemporary RISC

    the argument was so convincing that the powers at Intel decided they had to do something better than RISC

    this was not unreasonable, because RISC originated when it was difficult to get an entire processor onto a single silicon chip;
    by the mid-'90s, there was so much silicon real estate that it was reasonable to propose a better idea to fully utilize it all

    the RISC argument was so pervasive in academia that CS/EE students bought into it and did not want to work on x86, even though it had a very sizeable market

    there was one person (Fred, I think) who realized many RISC concepts could be applied to x86, and that with perfect execution, massive financial resources, and the best manufacturing process, Intel could produce a competitive x86, even though it had no natural beauty

    as it turned out, the Pentium Pro more or less matched the RISC chips in performance, and later x86 from both Intel and AMD completely clobbered the RISC chips;
    IBM Power 5 leads in FP only because of massive memory bandwidth that is cost-prohibitive in the volume market
  8. pchee373 New Member

    He is a BIOS programmer for Newisys, and they build servers for HP and Sun, so he does have access to all sorts of performance models and trends. He pretty much knows what is fluff and what is the good stuff. I forwarded what you said to him and am just waiting for a response. The only recent comparison I've found is from Anandtech: http://www.anandtech.com/IT/showdoc.aspx?i=2745&p=4.
  9. bytehd New Member

    Best NOT to trust those underpaid hardware magazine types,
    especially Tom's, Anandtech and Hot Hardware.

    As usual, they test 10-15% of a market segment and declare a review finished.

    Stick with Intel.
    What if Intel solved the bus issues this year with 45nm silicon and new FSB chipsets?
    Then AMD will be finished.
  10. joechang New Member

    the more proper comparison between the Intel Xeon 7041 (2x3.0GHz/2M)
    and the Opteron 2.4GHz DC is the published TPC-C.
    (this is the Xeon MP I mentioned above as being constrained, because it was really meant for 2-socket systems but was band-aided into the 4-socket role until Tulsa was ready)

    |Vendor |Sockets|Processor    |Clock |Memory|Disks (15K/10K rpm)  |tpmC   |
    |Fujitsu|4      |Xeon 7041 DC |3.0GHz| 64GB |450 x 15K + 8 x 15K  |188,761|
    |HP     |4      |Opteron 880  |2.4GHz|128GB |394 x 15K + 12 x 10K |206,181|
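
    For reference, the gap between those two published results, from the tpmC figures above:

    ```python
    # Relative difference between the two published TPC-C results quoted above.
    xeon_7041_tpmc   = 188_761   # Fujitsu, 4 x Xeon 7041 DC 3.0GHz, 64GB
    opteron_880_tpmc = 206_181   # HP, 4 x Opteron 880 2.4GHz, 128GB

    gain = (opteron_880_tpmc / xeon_7041_tpmc - 1) * 100
    print(f"Opteron 880 result is {gain:.1f}% higher (with twice the memory: 128GB vs 64GB)")
    # -> roughly 9% higher tpmC
    ```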

    However, I will not complain too much about the Anandtech result.
    Large-system performance testing is a very tricky matter and requires skills beyond what non-professionals have.
    Without disclosure and careful analysis, it is possible to generate almost any result range on memory differences alone.

    On top of that, getting maximum performance out of the Xeon is especially tricky: Hyper-Threading has both potential positive and negative characteristics, negative if disk IO is involved and special precautions are not taken.

    I am not ready to declare AMD dead.
    Tigerton/Caneland comes out this year, but a 45nm 4-socket is probably a full year away or more.
    Also, while Core 2 delivers spectacular SPEC CPU results, the advantage is not as large in TPC;
    my guess is the disk IO code is responsible.
    My own tests with SQL Server operations in memory show excellent Core 2 characteristics, probably in range with SPEC CPU.

    I am also inclined to think that so long as AMD can produce a decent processor core, there will be enough Intel missteps to keep AMD in the game.
    I just do not think their server planning team can provide a clear and concise vision of product requirements 4-6 years into the future for a silicon design team to build to without last-minute (1-2 years, in silicon terms) course changes.
  11. joechang New Member

    any response?
  12. pchee373 New Member

    From what I'm hearing right now, even from my friend, the Xeons are faster at the dual-processor server level, because Intel is pretty much pushing the limit. But they are still sharing the northbridge. With a shared-bus architecture they need to optimize the chip to the limit and add a ton of cache memory. That equals a more expensive chip, more power consumption, and HEAT (AC cost). At the quad-processor level, the Xeon can handle the smaller queries faster because it can get on and off the northbridge pretty fast, but the Opteron leads on the large block queries (in agreement); the northbridge is saturated.

    QUAD Core technologies

    AMD will be coming out with true quad core technology in a couple of months
    Intel's Quad core is a dual core with HT

    http://www.hypertransport.org/tech/index.cfm?m=1

    If you thought PCI-E was fast at 2GB/sec, wait till HTX comes out at 2.6GB/sec, and Intel won't be using it because they can't collect a royalty. It's an open standard: AMD made HyperTransport open so anyone can develop for it and make money without paying a licensing fee. Intel doesn't want to use HyperTransport or HTX; they want to develop their own product, and anyone that uses it must pay a license fee (timeframe late 2007). When HTX comes out it will only be available on AMD Opteron systems.
  13. joechang New Member

    Intel's quad core is 2 dual-core dies in one package, not dual core with HT.

    anyway, it sounds as if your friend is just reiterating standard vendor marketing rubbish, focusing on individual elements of technology rather than the complete picture.

    think of it this way:
    when you are buying a car,
    you want horsepower at the wheel, where the rubber meets the road.
    the engine generates the horsepower,
    and it's delivered to the wheels by the drive train.

    ok, so we all know the AMD HT drive train is more efficient at transmitting engine horsepower,
    but they sat on the K8 core (with incremental improvements) for too long;
    ideally, one should have significant improvements every 2 years,
    or at worst every 4 years,
    so how substantially did AMD improve the core of their new quad core,
    i.e., single-core performance?
    since the Core 2 engine is much more powerful than AMD's, it does not matter much that the drive train is less efficient, so long as the engine makes up the difference and more.


    HT and PCI-E are completely different matters;
    it's not about bandwidth, which seems to be the only thing idiots can talk about.

    PCI-E was designed for IO,
    meaning the protocols are designed for IO;
    the FSB, HT and SP (see below) are designed for CPU-to-CPU/memory traffic,
    meaning the protocols are designed for memory.

    there are only a few devices that require direct CPU/memory communications.
    HT is very good for this because it is a stable standard;
    Intel did not like to open up the FSB because it was never meant to be opened.

    Intel does have the SP, which is in the Itanium chipset,
    used to link memory controller nodes and IO;
    because Itanium never took off, SP is fairly unknown.

    next year, Intel will have something that might be called CSI,
    which is presumably a serial bi-directional technology (as are SP, HT, and PCI-E);
    this will be used by both the Itanium and Core/Xeon lines,
    but it will be some time later before there is a common platform that can work with either.


    I also think that right now, AMD would like to be making money;
    what's the point of free technology if the company sponsoring it goes bankrupt?
  14. pchee373 New Member

    HTX is designed for IO

    http://www.hypertransport.org/tech/tech_htx_pci.cfm?m=9

    I think you're right about the 2 dual-core dies in one package = quad core.

    AMD quad core = true quad core

    AMD is still profitable; I think the market as a whole is down. AMD is hoping to climb to 30% of the server market share with its quad-core chip. Keep in mind that Intel used to have 100% of the x86 server market.

    As far as your power-at-the-wheel analogy goes, please explain. Are you talking about the pipe from the southbridge to the PCI-E?

    Yeah, I heard that Intel was coming out with something comparable to HT, because the northbridge architecture is just stone age.

  15. pchee373 New Member

    OK, I see what the point of the power-at-the-wheel analogy is. My mistake, sorry. But I think HTX is going to be the answer to PCI-E.
  16. joechang New Member

    I am not saying I do not like Opteron or the next gen;
    I am saying that a lot of what is being talked about is meaningless gibberish
    by idiots who do not know how to focus on what is important.

    true quad core:
    at the user level, you care about how many cores you have in total;
    whether it's 4 on one die or 2 dies with 2 cores each is of no meaningful importance
    (except for cache-to-cache communication).
    that AMD continues to pound on this point shows they are grasping at straws.

    on the matter of HTX:
    how many HTX slots will there be on your system?
    if just 1, will there be a disk IO controller that can drive the full HTX load,
    or will you be in the stupid situation of some disks on HTX and others on PCI-E?

    I think HTX will be for high-end graphics
    and special co-processors, like they had in the good (well, not so good) old days.

    I just do not see it being viable for storage;
    this is no fault of HTX,
    but rather due to storage vendors being unwilling to do a massively powerful SAS HBA.
    32 SAS ports on HTX would work for me.

    anyway,
    we will soon have AMD quad-core quad-socket versus Intel quad-core quad-socket;
    then we can see where the rubber meets the road.
  17. joechang New Member

    in looking over the HTX docs and recent AMD news
    this is my expectation:

    in the Q3-Q4 timeframe:
    for 2-socket systems,
    Xeon 5XXX (Core 2) on 45nm versus the new Opteron on 65nm;
    for 4-socket systems,
    Xeon 7XXX (Core 2) on 65nm versus the new Opteron on 65nm.

    my expectation is the new Opteron on 65nm will be competitive with Core 2 on 65nm, both running around 3GHz;
    it's highly likely that an individual core could run much higher than 3GHz, but the thermal envelope of current systems simply will not support a quad core much beyond 3GHz at 65nm.

    New Opteron
    I do like the fact that the new Opteron will have the option of:
    4 x16 HT3 links
    or 8 x8 links
    versus the original 3 links

    this allows a 4 socket system to be fully connected with x16 links
    or an 8 socket system to be fully connected with x8 links

    the original Opteron in 4 socket allows something like below
    _______M_____M_______
    _______|_____|_______
    I/O_-_CPU_-_CPU_-_I/O
    _______|_____|_______
    I/O_-_CPU_-_CPU_-_I/O
    _______|_____|_______
    _______M_____M_______
    (of course, as far as I am aware, no vendor configured more than 2 of the possible 4 I/O controllers)

    the main point of the above is that memory access from a CPU could be:
    a. 1 hop, CPU to memory
    b. 2 hops, CPU to CPU to memory
    c. 3 hops, CPU to CPU to CPU to memory

    in the new config, it will be either 1 or 2 hops;
    a third hop is not required because each CPU socket is directly connected to all the other CPU sockets (see the quick sketch below)

    the x8 option allows a fully connected 8-socket system
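
    A small sketch of the hop-count point above, treating the original 4-socket layout as a square of CPUs (each linked to its two neighbors, as in the diagram) and the new 4- and 8-socket layouts as fully connected; the topology encoding is my own reading of the diagram:

    ```python
    # Worst-case hops to memory: CPU-to-CPU hops (BFS) plus the final hop to memory.
    from collections import deque

    def max_cpu_hops(links):
        """Longest shortest-path between any two CPUs in an adjacency dict."""
        worst = 0
        for start in links:
            dist = {start: 0}
            q = deque([start])
            while q:
                node = q.popleft()
                for nbr in links[node]:
                    if nbr not in dist:
                        dist[nbr] = dist[node] + 1
                        q.append(nbr)
            worst = max(worst, max(dist.values()))
        return worst

    square_4 = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}           # original 4-socket square
    full_4   = {i: [j for j in range(4) if j != i] for i in range(4)}  # 4 x16 links, fully connected
    full_8   = {i: [j for j in range(8) if j != i] for i in range(8)}  # 8 x8 links, fully connected

    for name, topo in [("old 4-socket square", square_4),
                       ("new 4-socket, fully connected", full_4),
                       ("new 8-socket, fully connected", full_8)]:
        print(f"{name}: worst case {max_cpu_hops(topo) + 1} hops to memory")
    # -> 3 hops on the old layout, 2 hops on the fully connected ones
    ```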

    8-socket Opteron system:
    this has me really excited.
    with Intel systems, more than 4 sockets required a high NUMA ratio,
    typically ~150ns to access local memory
    and 250-400ns to access remote-node memory.
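
    To put that NUMA ratio in perspective, a rough expected-latency estimate, assuming memory pages are spread evenly across 8 nodes and taking 300ns as a mid-range remote latency (both assumptions are mine):

    ```python
    # Expected memory latency on an 8-node NUMA box with uniformly spread pages.
    nodes     = 8
    local_ns  = 150    # quoted local latency
    remote_ns = 300    # assumed mid-point of the quoted 250-400ns range

    p_local = 1 / nodes
    avg_ns  = p_local * local_ns + (1 - p_local) * remote_ns
    print(f"average latency with no locality: {avg_ns:.0f}ns "
          f"({avg_ns / local_ns:.1f}x the local latency)")
    # -> ~281ns, nearly 2x local; this is why data placement matters so much on NUMA
    ```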

    while my tests showed that it is possible to build a SQL Server application that scales well on NUMA systems,
    it also required knowing in detail the performance characteristics of each SQL operation on that particular platform architecture,
    architecting your database so that the SQL cost-based optimizer avoids the non-scalable operations as much as possible,
    and in certain cases some very high-effort query hints.

    I have never had anyone willing to undergo a complete re-architecture of their db to support good scalability on NUMA platforms.
    (typically 10-20 weeks of my time plus the full efforts of your internal development staff for the same period; I think the freeze on features over this period was as painful as my price, but then what's $200K when you are talking about a $1M system?)

    Corollary built the ProFusion chipset for 8-socket Pentium III, and HP/Compaq extended it to the Xeon (1st-gen P4),
    but now only IBM and Unisys support Xeon at more than 4 sockets.

    I think there is reason to hope that a common DB app, with less work (4-10 weeks of my time),
    might scale to 8 sockets (quad core per socket) on the new Opteron system.

    HTX for IO:
    OK, HTX is great technology for high-bandwidth and low-latency IO;
    let me stress the low-latency part, because PCI-E can do high bandwidth too.

    Only really critical components deserve an HTX implementation;
    IO devices that work fine on PCI-E should stay there.
    So very high-end graphics and special co-processors definitely warrant HTX implementations.

    Now the AMD view on platform implementation seems to be something like the following:
    for a 1-socket system:
    1 x8 HTX slot,
    1 x8 HTX bridged to PCI-E,
    presuming the 1-socket processor option has 1 x16 HTX link available.

    presumably a 2-socket system could have 2 x8 HTX slots and 2 x8 HTX bridges to PCI-E, with one set on each socket,
    and a 4-socket, four of each,
    where a 2-socket processor option implements 2 x16 HTX ports, one used for the other proc and the second on each socket used for IO,
    and a 4-socket option implements all 4 x16 HTX ports.

    potentially, a very specialized 1-2 socket system might implement one x16 HTX slot, 1 x8 HTX slot and 1 x8 HTX bridge to PCI-E,
    basically 2 x16 HTX dedicated to IO in some combination or other.

    a fully connected 8-socket system would only have 1 x8 HTX link available for IO per socket (8 in total),
    so my view is to have 2 bridged to PCI-E
    and 6 for HTX IO.

    HTX for storage:
    Now, one could argue that high-end storage deserves an HTX implementation.

    I have frequently advocated high sequential-transfer capability in storage systems;
    it's really something to see a table scan running at 10GB/sec.
    got a query that does a 100GB table scan? no problem.
    (it is common to see people with an expensive SAN that can only do 160MB/sec or less, for which even a 10GB table scan is a big problem)
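
    The scan times implied by those numbers work out as follows (straight arithmetic on the figures above):

    ```python
    # Time to scan a table at two very different sequential rates.
    table_gb  = 100
    fast_gbps = 10          # the 10GB/sec configuration described above
    san_gbps  = 0.160       # the 160MB/sec SAN example

    print(f"100GB scan at 10GB/s  : {table_gb / fast_gbps:.0f} seconds")
    print(f"100GB scan at 160MB/s : {table_gb / san_gbps / 60:.0f} minutes")
    print(f"even a 10GB scan at 160MB/s takes {10 / san_gbps:.0f} seconds")
    # -> 10 seconds versus roughly 10 minutes; the 10GB scan alone takes about a minute
    ```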

    to match up well with HTX, I think a 32-port SAS adapter is best,
    connected to 50-150 disks;
    for a SAN, it should just be an HTX connection to the SAN, which then bridges down to SAS ports.

    another matter is that a high-end system should not have only a single storage adapter, so 2 minimum,
    in this case not just for increased bandwidth, but also for redundancy;
    i.e.,
    a 2-socket system with 2 HTX ports,
    each HTX-SAS adapter connected to the same set of 50-150 disks.

    each SAS adapter should be able to drive 4-5GB/sec in sequential disk IO.
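
    Some rough feasibility arithmetic for such an adapter; the ~300MB/sec-per-SAS-port and ~70MB/sec-per-disk sequential rates are my assumptions, not figures from this thread:

    ```python
    # Can a 32-port SAS adapter plausibly drive 4-5GB/sec of sequential IO?
    ports         = 32
    sas_port_mb_s = 300     # assumed usable rate of a 3Gbit/s SAS port
    disk_seq_mb_s = 70      # assumed sequential rate of one 15K disk

    port_limit_gb = ports * sas_port_mb_s / 1000
    for disks in (50, 100, 150):
        disk_limit_gb = disks * disk_seq_mb_s / 1000
        print(f"{disks:3d} disks: disk-limited to ~{disk_limit_gb:.1f} GB/s "
              f"(adapter ports top out around {port_limit_gb:.1f} GB/s)")
    # -> 50 disks ~3.5GB/s, 100 ~7GB/s, 150 ~10.5GB/s; with ~9.6GB/s of port bandwidth,
    #    4-5GB/s per adapter is within reach of the disks well before the ports saturate
    ```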

    the problem,
    right now,
    is that I just do not see storage (chip) vendors doing this;
    it will require a custom design for what they see as a low-volume market.

    on top of this, SAN vendors have no means of reaching 4-5GB/sec in their systems.

    what it comes down to is:
    HTX may be great technology,
    and the graphics and co-processor people will definitely adopt it;
    I just do not see storage vendors as having the guts to implement this technology correctly in the near future (1-2 years).
    really, for years they have sold obsolete technology at high prices;
    why spend the effort to implement current technology?
  18. pchee373 New Member

    My friend said that one of the uses for the HTX technology is for the SAN vendors on their HBAs. If that is true, then that's pretty exciting news, but we'll see. If HTX does become common household technology like PCI-Express, PCI-X, ... then basically they (the SAN vendors) become the bottleneck. If there is a market for it, then they have an incentive to bring it to market.
