I have a PCI device which exposes a BAR and a few offsets within the BAR for accessing the device.
At one of the BAR offsets, I need to program the address of a 64KB memory allocation. In my Linux driver, I allocate the 64KB with kmalloc(), which as far as I know returns a virtual address. If that is programmed into the offset, the hardware won't be able to see the buffer. How do I convert this virtual address to a physical one?
When I Google, I see a few links pointing to virt_to_phys(), but a few responses say this doesn't work well with kmalloc(). Any idea how to go about this?
You would normally use pci_resource_start() / pci_resource_end() from within a kernel driver. I assume you are writing a device driver?
I wouldn't map the memory yourself: that's what the kernel functions are for. That way, you're sure it works on all platforms. I assume that 64K block is a memory region provided by the PCI device? If yes, then the above is correct. If no, please give more information.
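As a minimal sketch of that approach (untested, and assuming BAR 0 is the region of interest), mapping a BAR with the kernel helpers looks roughly like this:
#include <linux/pci.h>
#include <linux/io.h>

static void __iomem *regs;

static int my_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
	resource_size_t start, len;
	int err;

	err = pci_enable_device(pdev);
	if (err)
		return err;

	/* Bus address and length of BAR 0, as assigned by the PCI core. */
	start = pci_resource_start(pdev, 0);
	len   = pci_resource_len(pdev, 0);
	dev_info(&pdev->dev, "BAR0 at %pa, %pa bytes\n", &start, &len);

	/* Map the BAR into kernel virtual address space. */
	regs = pci_iomap(pdev, 0, len);
	if (!regs)
		return -ENOMEM;

	/* From here on, device registers are accessed with
	 * readl(regs + off) and writel(val, regs + off). */
	return 0;
}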
Rather than using kmalloc(), use the alloc_pages() function:
struct page *page = alloc_pages(gfp_mask, order);
One page is 4K, so order 4 will allocate 2^4 = 16 pages, which is 16 * 4K = 64K of contiguous memory. Note that alloc_pages() returns a struct page pointer, not a physical address; call page_to_phys() on it to get the physical address to program into the device.
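A minimal sketch of that (untested; order 4 assumes 4K pages):
#include <linux/gfp.h>
#include <linux/io.h>        /* page_to_phys() */

#define BUF_ORDER 4          /* 2^4 pages * 4K = 64K */

static struct page *buf_page;
static phys_addr_t buf_phys;

static int alloc_dev_buffer(void)
{
	buf_page = alloc_pages(GFP_KERNEL, BUF_ORDER);
	if (!buf_page)
		return -ENOMEM;

	buf_phys = page_to_phys(buf_page);   /* what the hardware needs */
	/* CPU access, if needed, goes through page_address(buf_page). */
	return 0;
}
For a buffer the device will DMA into, dma_alloc_coherent() is the more idiomatic choice: it returns both the CPU virtual address and the device-visible bus address in one call and takes care of caching concerns.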
Arch=x86_64
I am working through a DMA solution following the process outlined in this question: Direct Memory Access in Linux
My call to ioremap successfully returns with an address, pt.
In my call to remap_pfn_range I use virt_to_phys(pt) >> PAGE_SHIFT to specify the pfn of the area generated by the ioremap call.
When the userspace application calls mmap and the call to remap_pfn_range is made, the machine crashes. I assume the mapping is off and I am forcing the system to use memory that is already allocated (the screen glitches before exit), but I'm not clear on where the mismatch is occurring. The system has 4GB of RAM, and I reserved 2GB by using the kernel boot option mem=2048M.
I use BUFFER_SIZE=1024u*1024u*1024u and BUFFER_OFFSET=2u*1024u*1024u*1024u.
Putting these into pt = ioremap(BUFFER_SIZE, BUFFER_OFFSET), I believe pt should be a virtual address for the physical memory from the 2GB boundary up to the 3GB boundary. Is this assumption accurate?
When I execute my kernel module but change remap_pfn_range to use vma->vm_pgoff >> PAGE_SHIFT as the target pfn, the code executes with no error and I can read and write to the memory. However, this is not using the reserved physical memory that I intended.
Since everything works when using vma->vm_pgoff >> PAGE_SHIFT, I believe the culprit lies between my ioremap and the remap_pfn_range.
Thanks for any suggestions!
The motivation behind this kernel module is the need for large contiguous buffers for DMA from a PCI device. In this application, recompiling the kernel isn't an option, so I'm trying to accomplish it with a module + hardware.
My call to ioremap successfully returns with an address, pt.
In my call to remap_pfn_range I use virt_to_phys(pt) >> PAGE_SHIFT
to specify the pfn of the area generated by the ioremap call.
This is illegal, because ioremap reserves a virtual region in the vmalloc area, and virt_to_phys() is valid only for the linearly mapped part of memory.
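Since the physical address of the reserved region is already known (BUFFER_OFFSET in the question), a sketch of an mmap handler that skips virt_to_phys() entirely and feeds the physical frame number straight to remap_pfn_range (untested):
#include <linux/fs.h>
#include <linux/mm.h>

#define BUFFER_OFFSET (2ULL * 1024 * 1024 * 1024)   /* reserved RAM at the 2GB mark */

static int my_mmap(struct file *filp, struct vm_area_struct *vma)
{
	unsigned long size = vma->vm_end - vma->vm_start;

	/* pfn of the reserved physical region itself, not of an ioremap cookie. */
	return remap_pfn_range(vma, vma->vm_start,
			       BUFFER_OFFSET >> PAGE_SHIFT,
			       size, vma->vm_page_prot);
}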
Putting these into pt = ioremap(BUFFER_SIZE, BUFFER_OFFSET), I believe pt
should be a virtual address for the physical memory from the
2GB boundary up to the 3GB boundary. Is this assumption accurate?
That is not exactly true. For example, on my machine:
cat /proc/iomem
...
00001000-0009ebff : System RAM
...
00100000-1fffffff : System RAM
...
There may be several memory banks, and memory will not necessarily start at address 0x0 of the physical address space.
This might be useful for you: the Dynamic DMA mapping Guide.
I have a basic query about ioremap, which is used to map device I/O addresses into the kernel's virtual memory.
I would like to know: if the address returned from ioremap is passed to routines like virt_to_phys(), would it return the device I/O address?
Thanks
virt_to_phys() is only valid for virtual addresses within the kernel linear map, since it's just some fast address arithmetic, not a full software table walk. The linear map normally only covers RAM. The virtual address returned by ioremap(), however, will usually (probably always, but I don't have the patience to check every implementation) be a vmalloc address, so if you pass that to virt_to_phys() you'll get nonsense back.
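To make the distinction concrete, here is a tiny sketch (untested; bar_phys and bar_len are placeholders for your device's real BAR values):
#include <linux/slab.h>
#include <linux/io.h>

static void show_the_difference(phys_addr_t bar_phys, unsigned long bar_len)
{
	void *buf;
	void __iomem *regs;
	phys_addr_t ok, bogus = 0;

	/* kmalloc memory lives in the kernel linear map: virt_to_phys() is valid. */
	buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
	if (!buf)
		return;
	ok = virt_to_phys(buf);

	/* ioremap returns a vmalloc-range address: virt_to_phys() on it is
	 * nonsense. The physical address is simply the bar_phys you passed in. */
	regs = ioremap(bar_phys, bar_len);
	if (regs) {
		bogus = virt_to_phys((void __force *)regs);  /* do NOT use this */
		iounmap(regs);
	}

	kfree(buf);
	(void)ok; (void)bogus;
}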
I am trying to make use of a contiguous memory region I reserved by passing the "mem" parameter to Linux when booting.
Now I have the physical address of this space I reserved earlier, and its length, and I wish to make use of this reserved space for DMA purposes in my driver.
Normally I would use dma_alloc_coherent(), and if I were using CMA I would use that, but in this case it's different.
Now, I have read that an acceptable way of mapping a physical space to kernel virtual space is to use ioremap.
And an acceptable way of "taking over" a contiguous space for DMA purposes is to use dma_map_single (mapping it to a bus address).
I'm having trouble combining the two. ioremap works and returns a virtual address. Now, I have read that this is no ordinary virtual address and that I should only use accessor methods to read/write this memory.
The thing is, when I pass this virtual address to dma_map_single, it doesn't report an error, but I suspect that this is wrong.
Am I doing it right? What can I do to make it work like it should?
10x
You are doing it right.
You don't need to allocate the memory, because you already set it aside at boot time, but you do need to use dma_map_single to prevent cache problems. For example, if you do a DMA transfer from memory to the device while the RAM is not synchronized with the L2 cache (the cache holds a newer version), the device will get the wrong data. So you need to map before and unmap after the DMA operation.
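A minimal sketch of that map/unmap pattern around a single transfer (untested; dev is your device's struct device, and buf is assumed to be a CPU virtual address for the memory being transferred):
#include <linux/dma-mapping.h>

static int do_one_dma(struct device *dev, void *buf, size_t len)
{
	dma_addr_t bus = dma_map_single(dev, buf, len, DMA_TO_DEVICE);
	if (dma_mapping_error(dev, bus))
		return -EIO;

	/* Program `bus` into the device and start the transfer here,
	 * then wait for the completion interrupt. */

	dma_unmap_single(dev, bus, len, DMA_TO_DEVICE);
	return 0;
}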
I'm trying to allocate a DMA buffer for an HPC workload. It requires 64GB of buffer space. In between computations, some data is offloaded to a PCIe card. Rather than copy data into a bunch of dinky 4MB buffers given by pci_alloc_consistent, I would like to just create 64 1GB buffers, backed by 1GB HugePages.
Some background info:
kernel version: CentOS 6.4 / 2.6.32-358.el6.x86_64
kernel boot options: hugepagesz=1g hugepages=64 default_hugepagesz=1g
relevant portion of /proc/meminfo:
AnonHugePages: 0 kB
HugePages_Total: 64
HugePages_Free: 64
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 1048576 kB
DirectMap4k: 848 kB
DirectMap2M: 2062336 kB
DirectMap1G: 132120576 kB
I can mount -t hugetlbfs nodev /mnt/hugepages. CONFIG_HUGETLB_PAGE is true. MAP_HUGETLB is defined.
I have read some info on using libhugetlbfs to call get_huge_pages() in user space, but ideally this buffer would be allocated in kernel space. I tried calling do_mmap() with MAP_HUGETLB but it didn't seem to change the number of free hugepages, so I don't think it was actually backing the mmap with huge pages.
So I guess what I'm getting at is: is there any way to map a buffer to a 1GB HugePage in kernel space, or does it have to be done in user space? And does anyone know of any other way I can get an immense (1-64GB) amount of contiguous physical memory available as a kernel buffer?
PROBLEM
Normally if you want to allocate a DMA buffer, or get a physical address, this is done in kernel space, as user code should never have to muck around with physical addresses.
Hugetlbfs only provides user-space mappings to allocate 1GB huge pages and obtain user-space virtual addresses.
No function exists to map a user huge page virtual address to a physical address.
EUREKA
But the function does exist! Buried deep in the 2.6 kernel source code lies this function to get a struct page from a virtual address, marked as "just for testing" and blocked with #if 0:
#if 0 /* This is just for testing */
struct page *
follow_huge_addr(struct mm_struct *mm, unsigned long address, int write)
{
	struct page *page;
	struct vm_area_struct *vma;
	pte_t *pte;
	unsigned long vpfn = address >> PAGE_SHIFT;  /* virtual page frame number */

	/* The address must fall inside a hugetlbfs-backed VMA. */
	vma = find_vma(mm, address);
	if (!vma || !is_vm_hugetlb_page(vma))
		return ERR_PTR(-EINVAL);

	pte = huge_pte_offset(mm, address);

	/* hugetlb should be locked, and hence, prefaulted */
	WARN_ON(!pte || pte_none(*pte));

	/* Head page of the huge page, plus the 4K sub-page offset. */
	page = &pte_page(*pte)[vpfn % (HPAGE_SIZE / PAGE_SIZE)];

	WARN_ON(!PageHead(page));

	return page;
}
#endif
SOLUTION:
Since the function above isn't actually compiled into the kernel, you will need to add it to your driver source. (Note that the in-tree listing doesn't even compile as-is: it uses addr, pte, and vpfn without declaring them. The listing above is lightly fixed up so that it builds.)
USER SIDE WORKFLOW
Allocate 1GB huge pages at boot with the kernel boot options
Call get_huge_pages() with hugetlbfs to get user space pointer (virtual address)
Pass user virtual address (normal pointer cast to unsigned long) to driver ioctl
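A sketch of this user-side half (untested; it uses a raw mmap(MAP_HUGETLB) instead of libhugetlbfs's get_huge_pages(), and /dev/mydevice and MY_IOC_MAP_HUGEPAGE are hypothetical names for your driver's node and ioctl). With default_hugepagesz=1g from the boot options above, MAP_HUGETLB gives 1GB pages:
#include <sys/mman.h>
#include <sys/ioctl.h>
#include <fcntl.h>

#ifndef MAP_HUGETLB
#define MAP_HUGETLB 0x40000          /* x86 value; missing from older libc headers */
#endif

#define MY_IOC_MAP_HUGEPAGE 0x1234   /* hypothetical ioctl number */
#define BUF_SIZE (1UL << 30)         /* one 1GB huge page */

int main(void)
{
	void *buf = mmap(NULL, BUF_SIZE, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
	if (buf == MAP_FAILED)
		return 1;

	int fd = open("/dev/mydevice", O_RDWR);   /* hypothetical device node */
	if (fd < 0)
		return 1;

	/* Hand the user virtual address to the driver. */
	return ioctl(fd, MY_IOC_MAP_HUGEPAGE, (unsigned long)buf);
}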
KERNEL DRIVER WORKFLOW
Accept user virtual address via ioctl
Call follow_huge_addr to get the struct page*
Call page_to_phys on the struct page* to get the physical address
Provide physical address to device for DMA
Call kmap on the struct page* if you also want a kernel virtual pointer
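And a kernel-side sketch of those steps (untested, 2.6-era locking, error handling trimmed; assumes the fixed-up follow_huge_addr above is compiled into your driver):
#include <linux/fs.h>
#include <linux/mm.h>
#include <linux/highmem.h>
#include <linux/err.h>
#include <linux/sched.h>
#include <linux/io.h>

static long my_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
{
	unsigned long uaddr = arg;   /* user virtual address of the huge page */
	struct page *page;
	phys_addr_t phys;
	void *kvaddr;

	down_read(&current->mm->mmap_sem);
	page = follow_huge_addr(current->mm, uaddr, 1);
	up_read(&current->mm->mmap_sem);
	if (IS_ERR(page))
		return PTR_ERR(page);

	phys = page_to_phys(page);   /* on x86/x64, physical == bus address */
	kvaddr = kmap(page);         /* optional kernel virtual pointer */

	/* ... program `phys` into the device for DMA ... */
	(void)kvaddr;
	return 0;
}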
DISCLAIMER
The above steps are being recollected several years later. I have lost access to the original source code. Do your due diligence and make sure I'm not forgetting a step.
The only reason this works is because 1GB huge pages are allocated at boot time and their physical addresses are permanently locked. Don't try to map a non-1GB-hugepage-backed user virtual address into a DMA physical address! You're going to have a bad time!
Test carefully on your system to confirm that your 1GB huge pages are in fact locked in physical memory and that everything is working exactly as expected. This code worked flawlessly on my setup, but there is great danger here if something goes wrong.
This code is only guaranteed to work on x86/x64 architecture (where physical address == bus address), and on kernel version 2.6.XX. There may be an easier way to do this on later kernel versions, or it may be completely impossible now.
This is not commonly done in kernel space, so there are not too many examples.
Just like any other page, huge pages are allocated with alloc_pages, to the tune of:
struct page *p = alloc_pages(GFP_TRANSHUGE, HPAGE_PMD_ORDER);
HPAGE_PMD_ORDER is a macro defining the order of a single huge page in terms of normal pages. The above implies that transparent huge pages are enabled in the kernel.
Then you can proceed to map the obtained page pointer in the usual fashion with kmap().
Disclaimer: I never tried it myself, so you may have to do some experimenting around. One thing to check for is this: HPAGE_PMD_ORDER corresponds to the smaller (PMD-level, typically 2MB) "huge" pages. If you want to use those giant 1GB pages, you will probably need to try a different order, probably PUD_SHIFT - PAGE_SHIFT.
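A sketch of that idea (untested; assumes a kernel new enough to define GFP_TRANSHUGE, and a 64-bit kernel so page_address() works for any RAM):
#include <linux/gfp.h>
#include <linux/huge_mm.h>   /* HPAGE_PMD_ORDER */
#include <linux/mm.h>
#include <linux/io.h>

static void *alloc_one_huge_page(phys_addr_t *phys)
{
	/* One PMD-sized (typically 2MB) physically contiguous huge page. */
	struct page *p = alloc_pages(GFP_TRANSHUGE, HPAGE_PMD_ORDER);
	if (!p)
		return NULL;

	*phys = page_to_phys(p);     /* contiguous, so one base address suffices */
	return page_address(p);      /* kernel virtual address in the linear map */
}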
This function returns the correct kernel-space virtual address when given the physical address of user-space memory allocated from huge pages:
static inline void *phys_to_virt(unsigned long address)
Look for the function in the kernel code; it has been tested with DPDK and a kernel module.
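For illustration, a tiny sketch (untested; hugepage_paddr is a placeholder for the physical address userspace handed over). It works because the huge pages are ordinary RAM covered by the kernel linear map:
#include <linux/string.h>
#include <asm/io.h>   /* phys_to_virt() */

static void touch_hugepage(unsigned long hugepage_paddr)
{
	/* Valid only because the huge page lies in linearly mapped RAM. */
	void *kvaddr = phys_to_virt(hugepage_paddr);

	memset(kvaddr, 0, 4096);   /* plain CPU access through the linear map */
}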
I reserve the memory chunk using a memmap=8G$4G Linux kernel boot parameter.
Do I need to ioremap this memory?
The ioremap documentation says:
ioremap performs a platform specific sequence of operations to make
bus memory CPU accessible via the readb/readw/readl/writeb/
writew/writel functions and the other mmio helpers. The returned
address is not guaranteed to be usable directly as a virtual address.
So if I can't use the returned address of ioremap as a virtual address for directly addressing the memory, then a broader question is: when should we ioremap memory?
Yes, you have to ioremap this region to access it. The kernel does not set up the page directory entries for this memory region, because you instructed the kernel to ignore it.
The addresses returned by ioremap may not be usable directly if you remapped addresses from the I/O-port address space. When you remap addresses from the memory address space, it is OK to use them directly.
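As a sketch (untested; the base and size mirror the memmap=8G$4G parameter from the question):
#include <linux/io.h>

#define RSVD_BASE 0x100000000ULL   /* 4GB: start of the memmap=8G$4G hole */
#define RSVD_SIZE (8ULL << 30)     /* 8GB */

static void __iomem *rsvd;

static int map_reserved(void)
{
	rsvd = ioremap(RSVD_BASE, RSVD_SIZE);
	if (!rsvd)
		return -ENOMEM;

	/* Per the documentation quoted above, access it through the
	 * mmio helpers rather than plain pointer dereferences. */
	writel(0x12345678, rsvd);
	if (readl(rsvd) != 0x12345678)
		return -EIO;
	return 0;
}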
However, please take a look at https://unix.stackexchange.com/questions/37729/how-can-i-reserve-a-block-of-memory-from-the-linux-kernel
In my experience, reserving (or blocking off) memory works as follows:
If you are trying to reserve a particular range of memory, you may have to remap the existing memory map provided by the BIOS.
If your system doesn't let you do that, then you will have to identify which area in the BIOS-provided memory map is free, and only that area can be reserved.