High Buffer Cache Hit Ratio but low Page Life | SQL Server Performance Forums

SQL Server Performance Forum – Threads Archive


Experts! We are attempting to diagnose whether our SQL Server Standard Edition is experiencing a shortage of RAM. It’s on a box with RAM aplenty, limited only by the Std Edition cap. In Perf Mon, the buffer cache hit ratio looks good (99%), but we have some IO backup (as measured by Avg Disk Queue Length bumping 8-10 sometimes), which makes us suspect memory pressure. Page Life Expectancy gets very low when the system is in heavy use: minimum 10 and average maybe 150, measured across an hour, which seems bad. Which should we believe, the cache hit ratio or Page Life Expectancy? Eschew obfuscation, whilst doggedly pursuing the reduction of complexification.
Something about this doesn’t seem right.
Look in PerfMon (or elsewhere) for the number of database pages, that is, specifically the number of 8KB pages used for the buffer cache, not the other uses of memory. The max possible for Std Edition is about 200K pages (1.6GB).
If PLE is 10, that means all 200K pages are replaced every 10 seconds, i.e., 20K pages/sec of physical reads.
If the buffer cache hit ratio is 99%, those physical reads are 1% of your logical reads, so you are hitting 2M pages/sec of logical IO. That is very high, but not out of the realm of possibility for a recent 4-proc Xeon or Opteron. What is more unlikely is that you are doing 20K 8KB disk IOs/sec.
For a table scan, that is 160MB/sec, which is OK; but for random IO it is too high unless you happen to have 50+ disks.
See if the PLE of 10 corresponds to a table scan. At PLE 150, the turnover is 1.3K pages/sec (physical) and 130K logical pages/sec, which is reasonable.
But are you actually doing 1,300 disk reads/sec?
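The arithmetic above can be written out explicitly. A minimal sketch, assuming the rough figures from this thread (a ~200K-page buffer pool, a 99% hit ratio, and PLE values of 10 and 150 seconds):

```python
# Back-of-envelope memory-pressure math from the thread above.
# Assumed inputs: ~200K 8KB buffer pages (the approximate Std Edition
# cap discussed here), a 99% buffer cache hit ratio, and the observed
# PLE values. These are illustrative figures, not measured constants.

PAGE_KB = 8
buffer_pages = 200_000  # ~1.6 GB of 8KB data pages

def physical_pages_per_sec(pages, ple_seconds):
    """If the whole buffer pool turns over once per PLE interval,
    this is the implied physical read rate in pages/sec."""
    return pages / ple_seconds

def logical_pages_per_sec(physical_rate, hit_ratio):
    """With hit ratio h, physical reads are the (1 - h) misses,
    so logical reads = physical / (1 - h)."""
    return physical_rate / (1 - hit_ratio)

phys_at_ple10 = physical_pages_per_sec(buffer_pages, 10)    # 20,000 pages/sec
logical = logical_pages_per_sec(phys_at_ple10, 0.99)        # ~2M logical pages/sec
mb_per_sec = phys_at_ple10 * PAGE_KB / 1024                 # ~156 MB/sec sequential
phys_at_ple150 = physical_pages_per_sec(buffer_pages, 150)  # ~1,333 pages/sec
```

Note the 20K pages/sec at 8KB each works out to about 156MB/sec, close to the 160MB/sec quoted above; plausible for a scan, but far too many IOPS for random reads on a small disk set.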
Hi Joe – I am also puzzled. PerfMon > Buffer Manager > Data Pages (a quick glance just now) averages 176,000; at the same time PLE is only about 80(!); page reads/sec is at this moment on a huge spike of 12,000, but averages about 1,600.
Is there a big table scan going on during the drop in PLE?
If you followed the math above, PLE at 10 probably means you have a table scan doing about 140MB/sec.
176K data pages, a PLE around 150, and a BCH of 99% correspond to about 1,200 disk IOPS at 8KB per IO,
all of which is possible and reasonable for the scenario described.
So there is no inconsistency; disregard what I said about something not seeming right, as I did not consider the table scan scenario.
Unless, that is, you are not doing a table scan.
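The second estimate can be checked against the poster's observed counters the same way. A quick sketch, assuming the 176K data pages and PLE of ~150 reported above:

```python
# Checking the ~1,200 IOPS figure from the observed counters:
# 176K data pages and a PLE of about 150 seconds (both reported
# in this thread), with a 99% buffer cache hit ratio.

data_pages = 176_000
ple_seconds = 150
hit_ratio = 0.99

disk_reads_per_sec = data_pages / ple_seconds            # ~1,173 8KB reads/sec
logical_reads_per_sec = disk_reads_per_sec / (1 - hit_ratio)  # ~117K logical pages/sec
```

That lands near 1,200 physical 8KB reads/sec, consistent with the figure above, and also in the same ballpark as the poster's observed page reads/sec average of 1,600.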
I guess the issue might be the frequency of table scans: there is not a drop in PLE followed by a recovery; PLE is just low during all the hours the system is in use. This is leading me to suspect that we do not have enough RAM even though the Buffer Cache Hit Ratio looks OK, or that we are constantly getting large table scans.
If you are getting table scans where there should not be any, that can be fixed.
If you really do need more memory, your options are SQL EE, SQL 2K5, or Quickshift (www.quickshift.com).
You’ve put your finger on it exactly :). The question is whether we need to get more RAM or whether we can tune and index our way out of this issue.