How to get more information about the buffer cache - filesystems

I am using KVM, and I want to benchmark nested page tables with a 2 MB (huge page) hypervisor page size. On Linux this can be done with hugetlbfs, where 2 MB pages belong to a filesystem; an application can then mmap from this filesystem and be sure that it is actually using 2 MB pages.
So, when QEMU/KVM is run with the argument -mem-path /path, QEMU will start using 2 MB pages.
I want to make sure that my guest operating system is indeed backed by 2 MB physical huge pages.
I am using 12G physical memory for the guest.
This is probably how control should flow: when the guest uses a page for the first time, it takes a page fault in the guest and another in the hypervisor, and the host (hypervisor) should then map a 2 MB page. This 2 MB page is backed by hugetlbfs, so it should be part of the buffer cache. Am I right?
So, is there a way to get more information about the buffer cache and see how many pages of each filesystem are in it?
This is important for me because I want to benchmark 2 MB pages in the hypervisor against 4 KB pages in the hypervisor, and I don't want page faults in the hypervisor to affect my measurements. So I want to get all the hypervisor pages into physical memory in both cases before I start benchmarking.
Thanks
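One way to check this is through the huge page counters in /proc/meminfo (HugePages_Total, HugePages_Free, HugePages_Rsvd, Hugepagesize), which is where the kernel reports hugetlbfs usage. Below is a minimal sketch in Python, assuming a 12 GiB guest and 2048 kB huge pages as in the question. The helper names are made up for illustration. Comparing the "in use" count before and after the guest has touched its memory should show whether all of it has been faulted in.

```python
import re


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: integer value}."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"^(\S+):\s+(\d+)", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def hugepages_needed(guest_bytes, hugepage_kb):
    """Number of huge pages needed to back guest_bytes, rounded up."""
    page_bytes = hugepage_kb * 1024
    return -(-guest_bytes // page_bytes)


def hugepages_in_use(fields):
    """Huge pages that have been handed out (pool total minus free)."""
    return fields["HugePages_Total"] - fields["HugePages_Free"]


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            info = parse_meminfo(f.read())
    except OSError:
        info = {}  # not a Linux host, or /proc is not mounted
    if "HugePages_Total" in info and "Hugepagesize" in info:
        wanted = hugepages_needed(12 * 1024**3, info["Hugepagesize"])
        print("huge pages in use:", hugepages_in_use(info))
        print("huge pages needed for a 12 GiB guest:", wanted)
    else:
        print("no huge page counters found in /proc/meminfo")
```

QEMU also has a -mem-prealloc option meant to be used together with -mem-path. Whether it behaves as needed here should be checked against the QEMU version in use.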

Related

register large buffer for RDMA in Linux kernel module

I'm a newbie experimenting with a project that uses RDMA (ib_verbs) in a kernel module. I took the example code from krping and am tinkering with it. The system runs on 64-bit CentOS Linux with a custom 3.10 kernel that requires transparent huge pages to be disabled.
I want a large (4 GB and up) RDMA readable/writable space. It doesn't have to be contiguous, as I'll most likely read or write at most 1 MB at a time from the remote party (random access).
Question:
Should I just do a thousand 4 MB kmalloc calls and register each as a DMA region? How bad is it, design-wise, to allocate large chunks of memory using kmalloc instead of vmalloc? I have heard that it should not be done and that large memory should only be obtained via vmalloc, but addresses from vmalloc are not good for DMA.
If not, what would be a good alternative way to have a 4 GB buffer that can be randomly accessed by a remote party?
How does user-space RDMA manage this kind of buffer? I remember that I only had to malloc 4 GB of memory and call ibv_reg_mr, and it was ready to use.
As long as you're not using a memory region that covers the entire physical memory (which isn't recommended for write-enabled MRs), you should use the IB_WR_REG_MR work request to register your memory region. For that, you would use the ib_map_mr_sg function, which accepts a scatterlist and a page size. So basically, you can register an MR that is built from chunks of a fixed size that you choose.
There's a tradeoff here: using small allocation size will allow the kernel to find free memory easier on fragmented systems, but on the other hand it could decrease performance, as it can increase the load on the NIC's IOTLB.
User-space handles large MR registration by calling get_user_pages and using the system's page size (normally 4 KB), though some drivers have optimizations that try to detect larger page sizes internally if the user-space memory happens to align that way.
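To put numbers on the tradeoff, here is a small sketch in Python that counts how many fixed-size chunks a 4 GB region breaks into. The chunk sizes are only examples (the 4 KB page size and the 4 MB kmalloc size come from the discussion above), and the helper name is made up.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def chunk_count(total_bytes, chunk_bytes):
    """Number of fixed-size chunks needed to cover total_bytes, rounded up."""
    return -(-total_bytes // chunk_bytes)


if __name__ == "__main__":
    for chunk in (4 * KIB, 1 * MIB, 4 * MIB):
        count = chunk_count(4 * GIB, chunk)
        print(f"{chunk // KIB:>5} KiB chunks -> {count} entries to map")
```

Fewer, larger chunks mean fewer translation entries for the NIC to cache, but each one is harder to allocate on a fragmented system.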

What is use of extended page table?

Can we show the page table address using a C program?
What is the difference between a page table and an extended page table?
Can we show the page table address using a C program?
Not using a plain-old C program, no you can't. User-mode programs run in virtual memory, which is provided by the kernel, using paging mechanisms. All of this is abstracted away so userspace knows nothing about it.
The Linux kernel does provide a mechanism for userspace to observe the pagetables however, as indicated at this question.
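The linked question is not reproduced here, so as an assumption: one such mechanism is /proc/<pid>/pagemap, which holds one 64-bit entry per virtual page (bit 63 = present, bit 62 = swapped, bits 0-54 = page frame number when present). The sketch below is in Python rather than C for brevity, and the function names are made up. It shows translations only, not the address of the page table itself, and recent kernels report the frame number as zero to unprivileged processes.

```python
import ctypes
import os
import struct

PAGEMAP_ENTRY_BYTES = 8
PFN_MASK = (1 << 55) - 1   # bits 0-54
SWAPPED_BIT = 1 << 62
PRESENT_BIT = 1 << 63


def decode_pagemap_entry(entry):
    """Split a raw 64-bit pagemap entry into its main fields."""
    present = bool(entry & PRESENT_BIT)
    return {
        "present": present,
        "swapped": bool(entry & SWAPPED_BIT),
        "pfn": (entry & PFN_MASK) if present else None,
    }


def read_pagemap_entry(vaddr, pid="self"):
    """Read and decode the pagemap entry covering virtual address vaddr."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    offset = (vaddr // page_size) * PAGEMAP_ENTRY_BYTES
    with open(f"/proc/{pid}/pagemap", "rb") as f:
        f.seek(offset)
        data = f.read(PAGEMAP_ENTRY_BYTES)
    if len(data) != PAGEMAP_ENTRY_BYTES:
        raise OSError("short read from pagemap")
    (entry,) = struct.unpack("=Q", data)
    return decode_pagemap_entry(entry)


if __name__ == "__main__":
    buf = ctypes.create_string_buffer(4096)
    buf[0] = b"x"  # touch the buffer so its page is faulted in
    try:
        print(read_pagemap_entry(ctypes.addressof(buf)))
    except OSError as exc:
        print("pagemap not readable here:", exc)
```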
What is the difference between a page table and an extended page table?
"Extended page tables" are Intel's implementation of Second Level Address Translation (SLAT), also known as nested paging, which is used to more efficiently virtualize the memory of guest VMs.
Basically, guest virtual addresses are first translated to guest physical addresses, which are then translated to host physical addresses. This is all done in hardware (by the MMU) to avoid extra work needing to be done in software by the VMM.
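The two-stage translation can be pictured with a toy model. This is only a sketch: each table is flattened to a single-level dictionary from page number to frame number, and the addresses and table contents are invented for illustration. Real hardware walks multi-level tables, and the guest's own page-table pointers are guest-physical addresses that also have to go through the second stage.

```python
PAGE_SIZE = 4096


def translate(table, addr):
    """Translate addr through a single-level {page number: frame number} table."""
    page, offset = divmod(addr, PAGE_SIZE)
    if page not in table:
        raise KeyError(f"page fault at {addr:#x}")
    return table[page] * PAGE_SIZE + offset


def guest_virtual_to_host_physical(guest_page_table, extended_page_table, gva):
    """Stage 1: guest virtual -> guest physical. Stage 2: guest physical -> host physical."""
    gpa = translate(guest_page_table, gva)
    return translate(extended_page_table, gpa)


if __name__ == "__main__":
    guest_page_table = {0x10: 0x2}      # maintained by the guest OS
    extended_page_table = {0x2: 0x7F}   # maintained by the hypervisor
    hpa = guest_virtual_to_host_physical(
        guest_page_table, extended_page_table, 0x10123
    )
    print(hex(hpa))
```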
1] What is a second-level page table?
Extended page tables are a mechanism to allow each virtual machine to manage its page table, without giving access to the underlying host machine's MMU - Hardware.
Have a quick look at the link below. It should give an idea
http://www.cs.cmu.edu/~dga/15-440/F10/lectures/vm-ucsd.pdf
2] Is it possible to print the page table using a C program?
- It's perfectly possible. There will be an MMU driver in your system, and it sets up the page tables in some part of RAM. You need to know that location.
In conventional operating systems (Linux, Windows, etc.), this memory area is privileged, so applications may not get direct access.
If your platform is an embedded system with a micro-kernel running on it, you will probably be able to access this table.

How can you limit RAM consumption in a process?

How can you limit the physical memory consumption of a C program from within the source code on a Linux 2.6.32 machine?
I need to determine the type of page replacement algorithm the system is using.
The problem is that without limiting the number of pages a process can have in memory, it becomes difficult to analyze the pattern of page faults to determine the page replacement algorithm.
Also, I don't have root access on the machine.
setrlimit(RLIMIT_MEMLOCK, ...).
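A minimal sketch of that call, shown through Python's resource module (a thin wrapper over getrlimit/setrlimit) rather than C. The 64 KiB value and the helper name are arbitrary choices for illustration. Lowering the soft limit does not need root. Note that, per the setrlimit man page, RLIMIT_MEMLOCK bounds how much memory the process may lock into RAM (for example with mlock), which is not the same thing as its resident set, so it may not constrain ordinary page usage the way the question needs.

```python
import resource


def set_soft_limit(which, wanted):
    """Set the soft limit for `which`, clamped to the hard limit.

    Returns the (soft, hard) pair in effect afterwards.
    """
    _, hard = resource.getrlimit(which)
    if hard != resource.RLIM_INFINITY:
        wanted = min(wanted, hard)
    resource.setrlimit(which, (wanted, hard))
    return resource.getrlimit(which)


if __name__ == "__main__":
    soft, hard = set_soft_limit(resource.RLIMIT_MEMLOCK, 64 * 1024)
    print("RLIMIT_MEMLOCK soft:", soft, "hard:", hard)
```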

Can I create a file system accessible from CE 6.0 and my bootloader?

I have a CE 6.0 project on a PXA310 where I need to be able to download OS updates (nk.bin) via Wi-Fi and safely flash the new OS to my device. I'm open to other suggestions about how to do this, but I'm considering saving the nk.bin to my file system in NAND flash, then restarting and have the bootloader locate the file in the file system and flash it to the BINFS partition. Is this possible, and if so, can you give me an outline of what I'd need to do?
One caveat is that this needs to be very robust since the devices are deployed in the field and are not field serviceable. I need to be sure that if the OS flash fails (due to power failure, etc.) that upon reboot the bootloader can try again. That is why I'd like to store the downloaded image in persistent flash and avoid having to re-download the image.
Technically just about anything is possible. For this strategy what you would need is code for your bootloader to mount the NAND flash as a drive and have a FAT driver so that it can traverse that file system and find the image. That is a lot of work if you don't already have it.
The other option is to just store it in flash outside of the file system, at a known address. That's a lot easier from the bootloader's perspective, as all you have to do is map to the address and copy. Of course, it makes the writes more challenging, because then you're doing them from the OS and you have to disable all other flash accesses completely while you write, to prevent corruption caused by two threads sending flash commands to the chip at the same time.
In either case, if you have the space it's a good idea to store a "known-good" image elsewhere too, so that if the new image has a problem (fails a checksum or x number of load attempts fails) then you have a working OS that the bootloader can fall back to.
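The fallback decision can be sketched as follows. This is an illustration in Python, not bootloader code: a real bootloader would be written in C against the flash layout, and the CRC32 checksum, the attempt threshold and all names here are assumptions.

```python
import zlib

MAX_BOOT_ATTEMPTS = 3  # hypothetical threshold for "x number of load attempts"


def image_is_valid(image, expected_crc):
    """Check the image against a checksum stored alongside it (CRC32 assumed)."""
    return (zlib.crc32(image) & 0xFFFFFFFF) == expected_crc


def choose_image(new_image, new_crc, failed_attempts, known_good_image):
    """Pick the new image if it verifies and has not failed too often, else fall back."""
    if failed_attempts < MAX_BOOT_ATTEMPTS and image_is_valid(new_image, new_crc):
        return "new", new_image
    return "known-good", known_good_image


if __name__ == "__main__":
    new = b"new os image"
    good = b"known good image"
    label, _ = choose_image(new, zlib.crc32(new), 0, good)
    print("booting:", label)
```

The attempt counter would need to live in persistent storage so that it survives the power failures the question is worried about.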
Clearly a lot depends on your hardware setup, but we've done this without making the Bootloader support the Flash Filesystem.
In our product, the OS image is loaded from Flash to execute from RAM -- I think most WinCE devices work this way nowadays. So to update the OS we use a special Flash driver which lets an application, running under WinCE, update the OS blocks in the Flash -- then all you need is a hard reboot and the Bootloader loads the new flash image into RAM in order to execute it. We've found this pretty reliable in the field (with some not-very-technical end-users!).
A special Flash driver was needed because the MS Flash Filesystem drivers have no access to the OS image sectors of the Flash, in order to prevent trashing the OS by accident.
You do need to load the NK.BIN into some memory which the OS programming application can read, normally the NAND flash, but if you had enough RAM it could just go into the root of the filestore. Either way, you can delete it once you've finished programming the OS sectors, before the reboot, so it's only a temporary requirement.

Appropriate Windows O/S pagefile size for SQL Server

Does anyone know a good rule of thumb for the appropriate pagefile size for a Windows 2003 server running SQL Server?
With all due respect to Remus (whom I respect greatly), I strongly disagree. If your page file is large enough to support a full dump, it will perform a full dump every time. If you have a very large amount of RAM, this can cause a tiny blip to become a major outage.
You do NOT want your server to have to write out 1 TB of RAM to disk if there is a one-time transient issue. If there is a recurring issue, you can increase the page file to capture a full dump. I would wait to do this until you have been instructed to capture a full dump by PSS (or someone else qualified to analyze one). An extremely small percentage of DBAs know how to analyze a full dump. A mini-dump is sufficient for troubleshooting most issues that pop up anyway.
Plus, if your server is configured to allow a 1 TB full dump and a recurring issue occurs, how much free disk space would you recommend having on hand? You could fill up an entire SAN in a single weekend.
A page file 1.5*RAM was the norm back in the days when you were lucky to have a SQL Server with 3 or 4 GB of RAM. This is not the case any more. I leave the page file at Windows default size and settings on all production servers (except for an SSAS server that is experiencing memory pressure).
And just for clarification, I've worked with servers ranging from 2 GB of RAM to 2 TB of RAM. After more than 11 years, I have only had to increase the paging file to capture a full dump one time.
Regardless of the size of the RAM, you still need a pagefile at least 1.5 times the amount of physical RAM. This is true even if you have a 1 TB RAM machine: you'll need a 1.5 TB pagefile on disk (sounds crazy, but is true).
When a process asks for MEM_COMMIT memory via VirtualAlloc/VirtualAllocEx, the requested size needs to be reserved in the pagefile. This was true in the first Windows NT system and is still true today; see Managing Virtual Memory in Win32:
"When memory is committed, physical pages of memory are allocated and space is reserved in a pagefile."
Barring some extremely odd cases, SQL Server will always ask for MEM_COMMIT pages. And given that SQL Server uses a dynamic memory management policy that reserves as much buffer pool as possible up front (reserves and commits in terms of VAS), SQL Server will request a huge reservation of space in the pagefile at startup. If the pagefile is not properly sized, errors 801/802 will start showing up in SQL Server's ERRORLOG file and in operations.
This always causes some confusion, as administrators erroneously assume that a large RAM eliminates the need for a pagefile. In truth the contrary happens, a large RAM increases the need for pagefile, just because of the inner workings of the Windows NT memory manager. The reserved pagefile is, hopefully, never used.
According to Microsoft, "as the amount of RAM in a computer increases, the need for a page file decreases." The article then goes on to describe how to use Performance Logs to determine how much of the page file is actually being used. Try setting your page file to 1.5X system memory for a start, then do the recommended monitoring and make adjustments from there.
How to determine the appropriate page file size for 64-bit versions of Windows
The bigger the better up to the size of the working set of the application where you will start to get into diminishing returns. You can try to find this by slowly increasing or decreasing the size until you see a significant change in cache hit rates. However, if the cache hit rate is over 90% or so you're probably OK. Generally you should keep an eye on this on a production system to make sure it hasn't outgrown its RAM allocation.
We were recently having some performance issues with one of our SQL Servers that we weren't able to completely narrow down, and we actually used one of our Microsoft support tickets to have them help troubleshoot. The optimal pagefile size to use with SQL Server came up, and Microsoft's recommendation was that it be 1 1/2 times the amount of RAM.
In this case, the normal recommendation of 1.5 times total physical RAM is not the best. This very general recommendation is provided under the assumption that all memory is being used by "normal" processes, which can generally have their least-used pages moved to disk without generating massive performance issues for the application process the memory belongs to.
For servers running SQL Server (generally with very large amounts of RAM), the majority of the physical RAM is committed to the SQL Server process and should be (if configured correctly) locked in physical memory, preventing it from being paged out to the pagefile. SQL Server manages its own memory very carefully with performance in mind, using a large part of the RAM allocated to its process as a data cache to reduce disk I/O. It does not make sense to page out those data cache pages to the pagefile, as the sole purpose of having that data in RAM in the first place is to reduce disk I/O. (Note that the Windows OS also uses available RAM similarly as disk cache to speed up system operation.) Since SQL Server already manages its own memory space, this memory space should not be considered "pageable", and not included in a calculation for pagefile size.
In regard to MEM_COMMIT mentioned by Remus, the terminology is confusing because in the virtual memory parlance, "reserved" never refers to actual allocation, but to preventing use of an address space (not physical space) by another process. Memory available to be "committed" is basically equal to the sum of physical RAM and pagefile size, and doing a MEM_COMMIT just decrements the amount available in the committed pool. It does not allocate a matching page in the pagefile at that time. When a committed memory page is actually written to, that is when the virtual memory system will allocate a physical memory page and possibly bump another memory page from physical RAM to the pagefile. See MSDN's VirtualAlloc function reference.
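As a rough illustration of that arithmetic, here is a Python sketch using the approximation stated above (commit limit = physical RAM + pagefile size). The 1 TB server and the two pagefile sizes are examples drawn from figures mentioned in this thread, and the helper names are made up.

```python
GIB = 1024**3


def commit_limit(ram_bytes, pagefile_bytes):
    """Approximate commit limit: physical RAM plus pagefile size."""
    return ram_bytes + pagefile_bytes


def rule_of_thumb_pagefile(ram_bytes, factor=1.5):
    """Pagefile size under the classic 'factor times RAM' rule."""
    return int(ram_bytes * factor)


if __name__ == "__main__":
    ram = 1024 * GIB  # a 1 TB server
    for pagefile in (4 * GIB, rule_of_thumb_pagefile(ram)):
        limit = commit_limit(ram, pagefile)
        print(f"pagefile {pagefile // GIB} GiB -> commit limit {limit // GIB} GiB")
```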
The Windows OS keeps track of memory pressures between application processes and its own disk cache mechanism and decides when it should bump non-locked memory pages from physical to the pagefile. My understanding is that having a pagefile that is way too large compared to the actual non-locked memory space can result in Windows overzealously paging out application memory to the pagefile, resulting in those applications suffering the consequences of page misses (slow performance).
As long as the server is not running other memory-hungry processes, a pagefile size of 4GB should be plenty. If you have set SQL Server to allow locking pages in memory, you should also consider setting SQL Server's max memory setting so that it leaves some physical RAM available to the OS for itself and other processes.
802 errors in SQL Server indicate that the system cannot commit any more pages for the data cache. Increasing the pagefile size will only help in this situation insofar as Windows is able to page out memory from non-SQL Server processes. Allowing SQL Server memory to grow into the pagefile in this situation might get rid of the error messages, but it is counterproductive, due to the point earlier about the reason for the data cache in the first place.
If you're looking for high performance, you are going to want to avoid paging completely, so the page file size becomes less significant. Invest in as much RAM as feasible for the DB server.
After much research our dedicated SQL Servers running Enterprise x64 on Windows 2003 Enterprise x64 have no page file.
Simply put, the page file is a cache for files that gets managed by the OS, and SQL Server has its own internal memory management system.
The MS article referenced does not qualify that the advice is for the OS running out-of-the-box services such as file sharing.
Having a page file simply burdens the disk I/O because Windows is trying to help, when only the SQL OS can do the job.

Resources