SQL Server Performance

Too Much Ram?

Discussion in 'SQL Server 2005 Performance Tuning for Hardware' started by BenBetter, Jan 27, 2009.

  1. BenBetter New Member

    We are running a processor intensive process, that performs combinatorial operations to combine tables (pure math) as well as Monte Carlo style re-sampling techniques to combine tables. This process is, and seems to have always been, bound by processor capacity. We recently moved to a new, dedicated server and saw something strange.The new server has 30+ GB of RAM and we’ve capped SQL at 27 GB. The more ram SQL uses (easily tracked) the less processor utilization we see, and the less system performance. This process runs for days on end, so our observation window is pretty solid. When we stop the process and reset the memory we get a boost to 100 % performance. Within an hour SQL is using 4 GB or so and our performance has fallen to 80%. Within 6 hours SQL is using 12-15 GB and our performance falls to 65%. 24 hours later we are using all 27 GB and our performance is down to 50%. Over this time % processor usage falls from mid to high 90s down to mid 50s, just like our performance statistics. Is there a connection here? Does managing this obscene amount of active RAM somehow hurt my SQL instance’s performance? I have looked at all of the classic perfmon metrics… Has anyone else ever heard of something like this? What else might be the cause of my performance degradation?
  2. Elisabeth Redei New Member

    Can you pass us information about what version and plattform you are running on? Is this process constantly repeating itself so the workload is exactly the same throughout? Does your workload consist of ad-hoc sql server statements, prepared statements or stored procedures?
    Remember that SQL Server typically commits memory until it hits the roof so with a fair amount of workload and unless you are re-visiting the same data and index pages over and over, it is pretty typical to see the bufferpool (where, among other things, your data and index pages go) for SQL Server gradually grow after service startup (when you say "reset the memory" - do you mean you are recycling the service? ). So what I am trying to say is that it's not necessarily the fact that the amount of committed memory grows that is the cause of the degrading performance.
    Managing RAM of 27GB is not a problem and I wouldn't say it is an obscene amount of memory either :) Only if SQL Server memory needs to be paged out then of course it takes a lot longer to page out 27 rather than 2 GB. The way it works is that SQL Server maintains something called a Free List which varies in size depending on size of the bufferpool (you can see the size in the Buffer Manager:Free pages counter). This freelist is maintained by any worker thread at certain points and by the Lazywriter which wakes up at a regular interval (the CHECKPOINT process on the other just writes dirty pages but doesn't put them on the Free list). So it's not so that SQL Server has to scan the entire bufferpool everytime a new page is needed.
    I assume this is not a pattern you saw on your old server and that you were running the same version of SQL Server on the old server?
    Can you also tell us a little bit about the other Performance counters you have been looking at?
  3. BenBetter New Member

    I am running on Windows 2003 x64 Standard Edition. SQL Server 2005 version 9.0.3068. The process does not degrade on our development server, which is running Windows 2003 x86 Standard, version 9.0.3233. Unfortunately there are too many differences between these two systems. Investigating the differences between the two servers is where we started and we weren't able to find anything. Maybe we weren't looking in the right places though? The process is primarily the combination of probability distributions into new probability distributions. The base distributions are different for every item my company stocks, so while the operations is similar, the tables that are being accessed and created are seldom similar. This is all executed through stored procs, that move through a driver table with a list of items to process until there is nothing left to do (this takes days). The item base is highly diverse, we are running many items per minutes, and the items on the list aren't in any particular order. Meanwhile, the performance steadily declines every time we run, so we have ruled out "coincidence" as a cause of the problem. When I "recycle" the SQL instance (Services-->SQL Server(Instance)-->Right Click-->Restart)my performance will always come back to 100% for a short while. I have been monitoring pages/sec, available memory, the buffer cache hit ratio, % disk time, avg. disk queue length, avg. disk sec/ transfer, current disk/queue length, % processor Time, % Privileged Time, Interrupts/sec, context switches/sec, processes, processor queue length, and threads. Frankly, none of these look out of the ordinary to me, except for high processor utilization for the first few hours. I know there are some more SQL specific counters I can look at...Any ideas?Sorry for the giant block of text, I have been trying to space my posts into paragraphs but the carriage returns do not persist when I hit the post button. -Ben
  4. Elisabeth Redei New Member

    Hi again,
    So one platform is 32-bit and the other one is 64-bit then? I assume there is a difference in the number of processors and clockrate as well? If everything is stored procedures than the procedure cache should not be an issue (one difference is that in 32-bit your proc cache could only grow up to 2GB, possibly 3 GB if the /3GB switch was used in boot.ini; in 64-bit you don't have this limit but there are mechanisms in place that should prevent the proc cache from exploding).
    The way your CPU utilization drops suggests blocking or possibly that you are not getting the same degree of parallelism on the new server.
    I guess the main questions is whether you should focus on finding the difference between these two very different environments or troubleshoot it like any performance related problem.
    Are the databases *identical*? If not I would start with making sure I had an exact copy of the database. Even though there are other big differences between the servers that are more interesting, I would probably also bring up the 64-bit server to the same build.

    Next, compare the output from:
    sp_configure 'show advanced',1
    reconfigure with override
    sp_configure 'show advanced',0
    reconfigure with override
    Then, I would go on and do the following:
    - UPDATE STATISTICS WITH FULLSCAN on all indexes involved (this will take a while if they are big)
    - If absolutely everything is stored procedures, you can use sys.dm_exec_query_stats to do a quick compare of resources used and execution plans for the procedures (but you somehow need to syncronize which part of the process you are looking into. You can use some variation of the script I have attached. I also attached a script that gives you a snapshot of what is currently running (uses sys.dm_exec_requests, a query that runs in parallell will have multiple execution Id's, ecid column)
    - Compare the output from sys.dm_os_wait_stats on both servers. This statistics is accumulated since server restart so you would do like this:
    DBCC SQLPERF (sys.dm_os_wait_stats,'CLEAR')
    -- run your process
    -- run the attached script to get a report on the contents of sys.dm_os_wait_stats
    This will reveal where SQL Server spends most time waiting and it would be interesting to see how that differs between the servers.
    - Check degree of parallelism for a known part of the workload, or take samples at some rate. I have attached a script that will identify procedures that run in parallell (and if you haven't done so already- inspect how the CPU workload is distributed between the processors as well).
    - Although this might be a bit cumbersome considering the workload, I would try to run a SQL Server trace (using the sp_'s rather then the GUI, you can configure the trace in Profiler and then choose to save it as a script). Let me know if this is feasible and I give you some suggestions on what events to include
    - I would then run the same trace on the dev box and compare the results in a tool called Reporter (which let's you compare two different trace results (link to download area here http://sqlcat.com/technicalnotes/ar...oft-sitql-server-using-rml-utilities-9-0.aspx)
    Oh, seems like I cannot attach things, send me a private message and I will email the scripts.

Share This Page