
SQL Server Performance Forum – Threads Archive

Help with high Page Faults/sec

We are receiving a lot of complaints about poor performance with an application. The only possible issue I see on SQL Server is memory. I ran perfmon for the following counters over a 30-minute window. While perfmon was running, the users said performance dropped during a 10-minute portion of this window. Below are the results.

Counter          Average
Page Faults/sec  6966.941
Page Reads/sec   59.555
Page Writes/sec  0.857
Pages Input/sec  952.435
Pages Output/sec 13.712
Pages/sec        966.146

We have a 2GB pagefile and only about 50MB is being used, which isn't bad; 2-4% paging appears to be about normal. Physical memory is 4GB, with 3GB allocated to SQL Server via the /3GB switch. The OS is Windows Server 2003 SE with SP1, and SQL Server is 2000 EE with SP3a. I'm not sure how to interpret the counters, specifically Page Faults/sec. The count appears very high, yet the page file usage is very small. What are your thoughts? Thanks, Dave
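(For reference, something like the following would show how much memory the instance itself has actually taken, as opposed to what the OS counters suggest. This is only a sketch; the counter names are from master.dbo.sysperfinfo as I remember them in SQL Server 2000 and may differ slightly by build.)

EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
-- Configured floor/ceiling for the buffer pool
EXEC sp_configure 'min server memory (MB)';
EXEC sp_configure 'max server memory (MB)';

-- What SQL Server has actually committed vs. what it wants
SELECT object_name, counter_name, cntr_value
FROM master.dbo.sysperfinfo
WHERE counter_name IN ('Total Server Memory (KB)', 'Target Server Memory(KB)');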

First, don't worry about Page Faults/sec, as that counter includes both hard and soft faults, and you only care about hard faults. Notice that the large majority is input and reads, not writes.
This means you are probably evicting executable binaries, not data structures. If data structures were paged out, they would have to be written to the page file and then read back;
if the paging is on binaries, those can simply be discarded and later re-read from the original executable file. SQL Server by itself should not generate any paging activity;
this should only occur if there are other processes competing for memory.
Of course, this is less likely on a system with 4GB where SQL Server can only use 3GB.
Unless someone turned on AWE memory?
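A quick way to check is sp_configure; 'awe enabled' is an advanced option on SQL Server 2000 Enterprise Edition, so roughly:

EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'awe enabled';   -- a run_value of 1 means AWE is on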
I see high Pages/sec. Maybe you need more memory.
Luis Martin
Moderator
SQL-Server-Performance.com
One of the symptoms of an approaching nervous breakdown is the belief that one's work is terribly important.
Bertrand Russell
All postings are provided “AS IS” with no warranties for accuracy.
Joe & Luis, further investigation revealed that the page faults occur every 15 minutes while a transaction log backup job is running. The backup job copies the T-Log file to a server located at our alternate data center. The job step that copies the file is what appears to be causing the page faults. Perhaps a high number of page faults during a file copy is normal. Any thoughts? Thanks, Dave
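(For context, the job step is along these lines; the database name, paths, and server name below are placeholders, and the copy could just as easily be a CmdExec job step rather than xp_cmdshell.)

-- Step 1: back up the transaction log locally (names/paths hypothetical)
BACKUP LOG MyAppDB
TO DISK = 'D:\Backups\MyAppDB_tlog.trn'
WITH INIT;

-- Step 2: copy the backup file to the alternate data center
EXEC master..xp_cmdshell 'copy D:\Backups\MyAppDB_tlog.trn \\DRSERVER\Backups\';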

One of the TechNet articles describes what happens during a cached file copy:

The Cache Manager incurs a page fault when it attempts to access a file segment that is not resident in the cache. The Windows 2000 Virtual Memory Manager is then invoked to resolve the page fault. VMM determines that the page fault falls within the scope of a mapped file virtual address range. VMM accesses the virtual address descriptor (VAD) associated with the virtual address, which, in turn, points to the file object mapped into that specific 256 KB cached file segment block. VMM then issues a callback to the appropriate filesystem driver, which generates and processes a physical disk I/O request. This disk request copies data into Cache memory, not the application's virtual address space. At this point, the page fault is resolved and the Cache Manager processing resumes. As on a hit, the data in cache is then copied into the application data buffer provided.

Check the Copy Reads/sec and Copy Read Hits % counters to measure file cache activity that uses the Copy interface, which applies to most normal applications. The Redirector service used to resolve networked file requests uses the Fast Copy interface routinely, as does the file server service for smaller sized requests.

Satya SKJ
Contributing Editor & Forums Moderator
http://www.SQL-Server-Performance.Com
This posting is provided “AS IS” with no rights for the sake of knowledge sharing.
Satya, I'm not sure I understand. Are you saying it is normal to see page faults when copying a file? Here are the counters for the 3 minutes the transaction log backup is running.

Counter           Average
Copy Reads/sec    89.799
Copy Read Hits %  98.016
Page Faults/sec   1692.734
Page Reads/sec    43.978
Page Writes/sec   1
Pages Input/sec   703.058
Pages Output/sec  16.004
Pages/sec         719.062
% Usage           5.942

Here are the counters for a one-hour duration.

Counter           Average
Copy Reads/sec    20.839
Copy Read Hits %  99.678
Page Faults/sec   491.420
Page Reads/sec    9.317
Page Writes/sec   1.014
Pages Input/sec   148.879
Pages Output/sec  16.216
Pages/sec         165.095
% Usage           5.927

Thanks, Dave
When you copy any file from a server to another destination (even on the same server), Pages/sec and Page Faults/sec go up.
Looking at the one-hour counters, I don't see any issue.
Luis Martin
Moderator
SQL-Server-Performance.com
One of the symptoms of an approaching nervous breakdown is the belief that one's work is terribly important.
Bertrand Russell
All postings are provided “AS IS” with no warranties for accuracy.
Dave
It could be a memory crunch or a misconfigured disk that is taking a lot of resources while copying the files. The average Pages/sec counter is high compared to a normal situation; you may also want to check with your SysAdmin for any network-related issues in this environment.

Satya SKJ
Contributing Editor & Forums Moderator
http://www.SQL-Server-Performance.Com
This posting is provided “AS IS” with no rights for the sake of knowledge sharing.
I noticed the NIC's throughput would spike up to 95 Mbps on a NIC configured for 100 Mbps. I don't know the danger point for NICs in terms of a threshold that would indicate a bottleneck, but I assume spikes to 95% during the time I see a high number of page faults are an indication the NIC is a possible bottleneck. During the two to three minutes it takes to copy the T-Logs to the remote server, I see the NIC throughput consistently above 75%, i.e. 75 Mbps.

Here is my overall take on the situation. The number of hard page faults per second, from what I've read, should be less than 20 on average (some people say less than 5 or even 2). Since I'm seeing over 900, that indicates a possible memory bottleneck. However, looking at the problem a little closer reveals that the high Page Faults/sec occur during a file copy operation, where the file can be as large as 1GB. Since this operation (T-Log backups) occurs every 15 minutes, odds are most of the data being copied does not exist in physical memory, but rather on disk. For this reason I should expect to see a high reading for Pages/sec.

Looking into the problem further, I decided the copy activity would cause the 1GB file (or a large part of it) to be pulled into physical memory, thereby flushing SQL Server's data out of physical memory and forcing other SQL Server activity to read from disk rather than physical memory while the file copy is occurring. I am still a bit uncertain about this part of my theory because of how memory is split on my server, with the OS assigned roughly 1GB and SQL Server the remaining 3GB. When the copy occurs, is the memory usage being taken from the OS-allocated memory or from the SQL Server-allocated memory (since SQL Server invoked the copy command from the SQL job scheduler)?

Finally, looking at the NIC throughput during the file copy shows the NIC was being hit heavily, further adding to slow server performance. At least this is what I think is happening. Does this reasoning make sense, or am I missing something? Thanks, Dave
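(One way to test whether SQL Server's cache really is being squeezed during the copy would be to sample the instance's own counters just before, during, and just after the copy window. A rough sketch follows; the counter names are from master.dbo.sysperfinfo as I recall them in SQL Server 2000 and may vary slightly by build, and Buffer cache hit ratio needs its matching 'base' counter to turn into a true percentage.)

-- Run a few times around the copy window and compare the samples
SELECT GETDATE() AS sample_time, object_name, counter_name, cntr_value
FROM master.dbo.sysperfinfo
WHERE counter_name IN ('Total Server Memory (KB)',
                       'Buffer cache hit ratio',
                       'Buffer cache hit ratio base',
                       'Page life expectancy');

If Total Server Memory drops or Page life expectancy falls sharply while the copy runs, that would support the theory that the file copy is pushing SQL Server's pages out.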

It sounds reasonable to me.
Luis Martin
Moderator
SQL-Server-Performance.com
Although nature commences with reason and ends in experience, it is necessary for us to do the opposite, that is, to commence with experience and from this to proceed to investigate the reason.
Leonardo Da Vinci
Never expect recognition from your children; that will come after your death.
All postings are provided “AS IS” with no warranties for accuracy.
Any chance of changing those NICs on this server, or simulating the same setup in a testing environment?

Satya SKJ
Contributing Editor & Forums Moderator
http://www.SQL-Server-Performance.Com
This posting is provided “AS IS” with no rights for the sake of knowledge sharing.
Hi ya, the fact that you've set the /3GB switch means that SQL Server can at most use 3GB of its virtual memory space (as opposed to the normal 2GB). It does not imply that SQL Server does take 3GB; that depends on factors such as other processes, load on SQL Server, size of data, etc. It also doesn't limit the ability of the OS or other processes to take more physical RAM than 1GB, as we are talking about how to split the virtual memory of a single process.

I'd open a perfmon session and measure Pages/sec for every process. Set the graph to be a bar graph and look for any bars which are wildly higher than the others; _Total will be one of them, which can be ignored. This will tell you which process is causing the paging.

In terms of the NIC, normally a NIC/network running at 70% utilisation or above will start to see significant slowdowns in network activity. So you may find that the paging isn't causing you any problem, but that it is the network utilisation itself which is causing this. You can use the same trick as above, but looking at network utilisation for each process.

If you find that the large file copies are the problem and you do have to copy large log files/databases from the database server to other storage media, then it is sometimes worthwhile having a separate NIC/network for that purpose, to ensure continued responsiveness to the application. Another option is to compress the backups, either using pkzip after the backup is done or something like SQLZip, which compresses during the backup process. Cheers
Twan
Thanks all for your help. Satya, The NIC is a gigabit NIC that is set to 100/FULL. We will be changing it to 1000/FULL. Dave
Dave, I'm seeing this conversation late in the game, but a couple of comments. Page Faults/sec is not an indicator of a problem unless Pages/sec is high as well; a page fault simply means the VMM did not find the referenced page where it expected it, and most of those are soft faults resolved without any disk I/O. Page Faults/sec and Pages/sec do not directly correlate. The level of Pages/sec you are seeing is a definite problem if sustained for more than a very short period of time, and indicates that you either 1) have too much non-SQL load on the server, or 2) don't have enough memory for the workload you are running. Memory is cheap; if your server will handle more, add it and use the PAE boot.ini switch to go above the 4GB line. On your question about the OS taking memory from SQL Server: set the 'min server memory' setting to the level you want SQL Server to keep. If you are still concerned about SQL Server memory, you can set the 'set working set size' option as well; a rough example is sketched below.
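In sp_configure terms, that would look something like the following. The value is only an example, and 'set working set size' takes effect only after a service restart (it is normally only meaningful with min and max server memory set to the same value).

EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- Give SQL Server a floor it will not release memory below (example value only)
EXEC sp_configure 'min server memory (MB)', 2048;
RECONFIGURE;

-- Optional: pin the working set (requires a restart to take effect)
EXEC sp_configure 'set working set size', 1;
RECONFIGURE;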
If SQL Server is the only application on the server, then I wouldn't recommend playing with the 'set working set size' option; leave the memory setting as dynamic in SQL Server.

Satya SKJ
Contributing Editor & Forums Moderator
http://www.SQL-Server-Performance.Com
This posting is provided “AS IS” with no rights for the sake of knowledge sharing.
Interesting thread, all! Just some additional things I thought I would share. I have frequently seen problems with the /3GB switch and large file operations, including file copies and backups. The file size I was seeing problems with was over 50GB, so it may not apply to a 1GB file, but much of the memory concept is the same. I'm also not sure how much of this was fixed in Windows 2003.
http://support.microsoft.com/?kbid=259837
http://support.microsoft.com/?id=311901
http://support.microsoft.com/?id=304101

Some excerpts: the functions used by all of the copy utilities that are included with Windows NT 4.0 and Windows 2000 (COPY, XCOPY, Explorer, and Robocopy.exe) use buffered input/output (I/O) to transfer data from the source file to the destination file. Whenever buffered I/O is performed on a file, there is a paged pool memory overhead requirement that is proportional to the size of the file... In Windows 2000, the system Page Table Entries space is located in the kernel address space with a 2-gigabyte (GB) limit... A range of 80,000 to 140,000 system Page Table Entries is available in a standard configuration, while only 40,000 Page Table Entries are available when using the /3GB switch. Using the /PAE switch instructs the memory manager to use two PTEs to map every page of memory (in effect, limiting you to 20,000 usable PTEs)...

In any case, I don't suggest using the /3GB switch. If you need the extra memory for SQL Server (and you are already running the Enterprise edition), just add the memory. Bumping up the speed on the gigabit NIC should help too. Ever think about SQL LiteSpeed as well? It could shrink the size of your TransLog backup files before you send them over the network.
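Even without a third-party product, compressing the native backup before the copy step can cut the network transfer time considerably. A very rough sketch of such a job step follows; the compression command line is purely a placeholder, so substitute whatever utility you actually have.

-- Compress the log backup, then copy the smaller file (all names/paths hypothetical)
EXEC master..xp_cmdshell 'zip.exe D:\Backups\MyAppDB_tlog.zip D:\Backups\MyAppDB_tlog.trn';
EXEC master..xp_cmdshell 'copy D:\Backups\MyAppDB_tlog.zip \\DRSERVER\Backups\';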

Thanks for the input. Unfortunately, adding more memory is not an option unless we upgrade the OS to Enterprise Edition. The problem we thought was NIC-related turned out to be a bad switch. Despite that, we are still going to use the gigabit switch instead of the 100 megabit one.

We've noticed performance problems with copying large files from this server to another server; however, copying large files to this server is not an issue. There are some error messages in the event logs pertaining to possible disk problems, but the copy problem occurs on all drives. Our server team is researching the issue. I'll keep in mind what you say about the kernel address space. If the server team resolves the issue with the disk error messages and we still see performance problems with copying large files to another server, I'll remove the /3GB switch and see if performance improves.

The moral of this story is don't install SQL Server EE on Windows SE. I'm kicking myself for being talked into it just to save a few dollars. There's a great misconception regarding the price of Windows EE and SQL Server EE vs. SE, especially with SQL Server. I can't tell you how many people I've spoken with who think SQL Server EE is over $10,000 when in reality they are thinking about per-processor licensing and not CAL licensing. But that's a subject for another day. Thanks again, Dave