Large PnP driver buffer - c

I'm developing a kernel PnP driver to map my FPGA. I need four 32Mb buffer of contiguous memory as I use a non scatter gather DMA. Right now I have a problem allocating them with WdfCommonBufferCreate as sometime it works, sometimes don't. I don't understand why the allocation fails sporadically as the device memory and devices does not changes.
Is there a way to ensure my buffer will be created? What can cause the sporadic fail?
I also thought removing 128Mb from Windows with Bcdedit and use the space left for my buffer. I have no problem doing that since the driver is for a specific platform in a controlled environnement but I did not found a way to know memory size with Windows Driver API.
Is there a way to know the size of actual memory? Can I actually use the memory left and if yes, how?
Thanks for your help

That's a lot of contiguous memory. The Windows Driver Framework can break up large DMA transactions into a size your driver can handle, if you tell it the maximum amount of scatter/gather descriptors you have through WdfDmaEnablerSetMaximumScatterGatherElements. Just use a smaller fixed number of scatter/gather elements.

Related

Copy large array from userspace to kernel

I'm developing a userspace application and corresponding kernel driver.
I need to pass a huge byte array from userspace to the driver.
copy_from_user has a limit of 16K, but the array is much larger than this.
Is there a way to do it either by copy_from_user or a better solution?

register large buffer for RDMA in Linux kernel module

i'm a newbie experimenting a project using rdma (ib_verbs) in kernel module. I got the example code from krping and tinkering on it. The system run on 64bits Linux Centos with a custom 3.10 Linux kernel that require transparent huge pages disabled.
I want a large (4GB up) of RDMA read/write able space which doesn't have to be contiguous as i'll most likely write/read at most 1MB at a time from remote party (random access).
Question:
Should i just do a thousand times of 4MB kmalloc and register DMA region? How bad it is, design wise for allocating large chuck of memory using kmalloc instead of vmalloc? I heard it should not be done and large memory should only retrieved via vmalloc. But addresses from vmalloc are not good for DMA.
If not then what would be a good alternative way to have a 4GB buffer that can be random access from remote party?
How does user-space rdma manage this kind of buffer? I remembered that i only malloc 4GB of memory and call ibv_reg_mr and it is ready to use.
As long as you're not using a memory that covers the entire physical memory (which isn't recommended for write-enabled MRs), you should use the IB_WR_REG_MR work request to register your memory region. For that, you would use the ib_map_mr_sg function which accepts a scatterlist and a page size. So basically, you can register an MR that is built with chunks of a fixed size that you choose.
There's a tradeoff here: using small allocation size will allow the kernel to find free memory easier on fragmented systems, but on the other hand it could decrease performance, as it can increase the load on the NIC's IOTLB.
User-space handles large MR registration by calling get_user_pages and using the system's page size (normally 4kb). Though some drivers have optimizations to try and detect larger page sizes internally, if the user-space memory happens to align that way.

Driver requires large chunks of contiguous physical memory

I need to modify a network adapter driver to increase its performance for my use, and I need a huge physical memory chunk to be contiguous.
I will need several of these chunks based on the number of ports. Each chunk should be around 64MB.
Currently I am looking at my option to be CMA and bootmem.
Is there any other option for same and I haven't used any of it till date so can someone give me a direction on how to use it? as in are there inbuilt functions to manage this allocated memory or will I have to manage it all in my driver?

Read from any ram address directly?

Im doing some debugging on hardware with a Linux OS.
Now there no way for me to know if any of it works unless I can check the allocated ram that I asked it to write to.
Is there some way that I can check what is in that block or RAM from an external program running in the same OS?
If I could write a little program in C to do that how will I go about it since I cant just go and assign pointers custom addresses ?
Thanks
I think the best way to do what you are asking for is to use a debugger. And you cannot read another programme's memory unless you execute your code in a privileged space (i.e. the kernel), and privileged from the point of view of the CPU. And this because each programme is running in its own virtual memory space (for security concerns) and even the kernel is running in a virtual memory space but it has the privilege to map any physical memory block inside the virtual memory space it is current running. Anyway, I will not explain more in depth how an modern OS manage memory with the underneath hardware, it would be long.
You should really look at using a debugger. Once you environment with your debugger is ready, you should put a break after that memory block allocation so the debugger will stop the programme there and so you can inspect that freshly allocated memory block as you wish. Depending on whether you use an IDE or not, it can be very easy to use a debugger ;)
/dev/mem could come to use. It is a device file that is an image of the physical memory (including non-RAM memory). It's generally used to read/write memory of peripheral devices.
By mmap()ing to it, you could access physical memory.
See this linux documentation project page
memedit is a handy utility to display and change memory content for testing purposes.
It's main purpose is to display SoC hardware registers but it could be used to display RAM. It is based on mmap() mechanism. It could be good starting point to write custom application.

How to use more than 3 GB in a process on 32-bit PAE-enabled Linux app?

PAE (Physical Address Extension) was introduced in CPUs back in 1994. This allows a 32-bit processor to access 64 GB of memory instead of 4 GB. Linux kernels offer support for this starting with 2.3.23. Assume I am booting one of these kernels, and want to write an application in C that will access more than 3 GB of memory (why 3 GB? See this).
How would I go about accessing more than 3 GB of memory? Certainly, I could fork off multiple processes; each one would get access to 3 GB, and could communicate with each other. But that's not a realistic solution for most use cases. What other options are available?
Obviously, the best solution in most cases would be to simply boot in 64-bit mode, but my question is strictly about how to make use of physical memory above 4 GB in an application running on a PAE-enabled 32-bit kernel.
You don't, directly -- as long as you're running on 32-bit, each process will be subject to the VM split that the kernel was built with (2GB, 3GB, or if you have a patched kernel with the 4GB/4GB split, 4GB).
One of the simplest ways to have a process work with more data and still keep it in RAM is to create a shmfs and then put your data in files on that fs, accessing them with the ordinary seek/read/write primitives, or mapping them into memory one at a time with mmap (which is basically equivalent to doing your own paging). But whatever you do it's going to take more work than using the first 3GB.
Or you could fire up as many instances of memcached as needed until all physical memory is mapped. Each memcached instance could make 3GiB available on a 32 bit machine.
Then access memory in chunks via the APIs and language bindings for memcached. Depending on the application, it might be almost as fast as working on a 64-bit platform directly. For some applications you get the added benefit of creating a scalable program. Not many motherboards handle more than 64GiB RAM but with memcached you have easy access to as much RAM as you can pay for.
Edited to note, that this approach of course works in Windows too, or any platform which can run memcached.
PAE is an extension of the hardware's address bus, and some page table modifications to handle that. It doesn't change the fact that a pointer is still 32 bits, limiting you to 4G of address space in a single process. Honestly, in the modern world the proper way to write an application that needs more than 2G (windows) or 3G (linux) of address space is to simply target a 64 bit platform.
On Unix one way to access that more-than 32bit addressable memory in user space by using mmap/munmap if/when you want to access a subset of the memory that you aren't currently using. Kind of like manually paging. Another way (easier) is to implicitly utilize the memory by using different subsets of the memory in multiple processes (if you have a multi-process archeteticture for your code).
The mmap method is essentially the same trick as commodore 128 programmers used to do for bank switching. In these post commodore-64 days, with 64-bit support so readily available, there aren't many good reasons to even think about it;)
I had fun deleting all the hideous PAE code from our product a number of years ago.
You can't have pointers pointing to > 4G of address space, so you'd have to do a lot of tricks.
It should be possible to switch a block of address space between different physical pages by using mmap to map bits of a large file; you can change the mapping at any time by another call to mmap to change the offset into the file (in multiples of the OS page size).
However this is a really nasty technique and should be avoided. What are you planning on using the memory for? Surely there is an easier way?
Obviously, the best solution in most cases would be to simply boot in 64-bit mode, but my question is strictly about how to make use of physical memory above 4 GB in an application running on a PAE-enabled 32-bit kernel.
There's nothing special you need to do. Only the kernel needs to address physical memory, and with PAE, it knows how to address physical memory above 4 GB. The application will use memory above 4 GB automatically and with no issues.

Resources