DMA transfer form kernel to user space

DMA transfer form kernel to user space - c

I am trying to realize DMA in Xilinx, using DMA Engine. FPGA guys give me an AXI DMA IP-Core with DMA support, so I need to write a driver, which will transfer data from kernel to user space buffer. Here you can find an example of the driver and app, which works so:
1) process opens /dev/device_file that represents the driver
2) it maps kernel coherent memory, allocated via dma_alloc_coherent
3) fills buffer with some data
4) calls ioctl() to start dma transfer
It works fine, but i have such question - can I transfer user space buffer to kernel space (via read function from file_operations structure), prepare it (shift by PAGE_SIZE or something else) and execute dma operation ? I don't understand how to make user space memory available in DMA operation.

You probably want to implement mmap method of struct file_operations. Consider:
static int
sample_drv_mem_mmap(struct file *filep, struct vm_area_struct *vma)
{
/*
* Set your "dev" pointer here (the one you used
* for dma_alloc_coherent() invocation)
*/
struct device *dev;
/*
* Set DMA address here (the one you obtained with
* dma_alloc_coherent() via its third argument)
*/
dma_addr_t dma_addr;
/* Set your DMA buffer size here */
size_t dma_size;
/* Physical page frame number to be derived from "dma_addr" */
unsigned long pfn;
/* Check the buffer size requested by the user */
if (vma->vm_end - vma->vm_start > dma_size)
return -EINVAL;
/*
* For the sake of simplicity, do not let the user specify an offset;
* you may want to take care of that in later versions of your code
*/
if (vma->vm_pgoff != 0)
return -EINVAL;
pfn = PHYS_PFN(dma_to_phys(dev, dma_addr));
return remap_pfn_range(vma, vma->vm_start, pfn,
vma->vm_end - vma->vm_start,
vma->vm_page_prot);
}
/* ... */
static const struct file_operations sample_drv_fops = {
/* ... */
.mmap = sample_drv_mem_mmap,
/* ... */
};
Long story short, the idea is to convert the DMA (bus) address you have to a kernel physical address and then use remap_pfn_range() to do the actual mapping between the kernel and the userland.
In the user application, one should invoke mmap() to request the mapping (instead of the read / write approach) For more information on that, please refer to man 2 mmap on your system.

Related

Correct way to block read operations until an external event?

I am working on a devices driver for a data acquisition system. There is a pci device that provides input and output data at the same time at regular intervals. And then the linux mod manages the data in circular buffers that are read and written to through file operations.
The data throughput of the system is relatively low it receives just over 750,000 bytes/second and transmits just over 150,000 bytes per second.
There is a small user space utility that writes and reads data in a loop for testing purposes.
Here is a section of the driver code (All the code related to the circular buffers has been omitted for simplicity sake. PCI device initialization is taken care of elsewhere and pci_interupt not the real entry point for the interrupt handler)
#include <linux/sched.h>
#include <linux/wait.h>
static DECLARE_WAIT_QUEUE_HEAD(wq_head);
static ssize_t read(struct file *filp, char __user *buf, size_t count, loff_t *f_pos)
{
DECLARE_WAITQUEUE(wq, current);
if(count == 0)
return 0;
add_wait_queue(&wq_head, &wq);
do
{
set_current_state(TASK_INTERRUPTIBLE);
if(/*There is any data in the receive buffer*/)
{
/*Copy Data from the receive buffer into user space*/
break;
}
schedule();
} while(1);
set_current_state(TASK_RUNNING);
remove_wait_queue(&wq_head, &wq);
return count;
}
static ssize_t write(struct file *filp, const char __user *buf, size_t count, loff_t *f_pos) {
/* Copy data from userspace into the transmit buffer*/
}
/* This procedure get's called in real time roughly once every 5 milliseconds,
It writes 4k to the receiving buffer and reads 1k from the transmit buffer*/
static void pci_interrupt() {
/*Copy data from PCI dma buffer to receiving buffer*/
if(/*There is enough data in the transmit buffer to fill the PCI dma buffer*/) {
/*Copy from the transmit buffer to the PCI device*/
} else {
/*Copy zero's to the PCI device*/
printk(KERN_ALERT DEVICE_NAME ": Data Underflow. Writing 0's'");
}
wake_up_interruptible(&wq_head);
}
The above code works well for long periods of time however every 12-18 hours there is a data underflow error. Resulting in zeros being written.
My first thought is that due to the userspace application not being truly real-time the time delay between it's read and write operations occasionally got too large causing the failure. However I tried changing the size of the reads and writes in userspace and changing the niceness of the userspace application this had no effect on the frequency of the error.
Do to the error's nature I believe there is some form of race condition in the three methods above. I am not sure how linux kernel wait queues work.
Is there a decent alternative to the above method for blocking reads or is there something else that is wrong the could cause this behavior.
System Information:
Linux Version: Ubuntu 16.10
Linux Kernel: linux-4.8.0-lowlatency
Chipset: Intel Celeron N3150/N3160 Quad Core 2.08 GHz SoC
TL;DR: The above code hits underflow errors every 12-18 hours is there a better way to do blocking IO or some race condition in the code.

One standard way used in linux can also be used in your case.
User space test program:
1. open file in blocking mode (default in linux until you specify NONBLOCK flag)
2. call select() to block on file descriptor.
Kernel driver:
1. Register interrupt handler which gets invoked whenever there is data available
2. Handler take lock to protect common buffer between reads/writes and transfer of data
Take a look at these links for source code from ldd3 book test and driver.

Unable to write the complete script onto a device on the serial port

The script file has over 6000 bytes which is copied into a buffer.The contents of the buffer are then written to the device connected to the serial port.However the write function only returns 4608 bytes whereas the buffer contains 6117 bytes.I'm unable to understand why this happens.
{
FILE *ptr;
long numbytes;
int i;
ptr=fopen("compass_script(1).4th","r");//Opening the script file
if(ptr==NULL)
return 1;
fseek(ptr,0,SEEK_END);
numbytes = ftell(ptr);//Number of bytes in the script
printf("number of bytes in the calibration script %ld\n",numbytes);
//Number of bytes in the script is 6117.
fseek(ptr,0,SEEK_SET);
char writebuffer[numbytes];//Creating a buffer to copy the file
if(writebuffer == NULL)
return 1;
int s=fread(writebuffer,sizeof(char),numbytes,ptr);
//Transferring contents into the buffer
perror("fread");
fclose(ptr);
fd = open("/dev/ttyUSB3",O_RDWR | O_NOCTTY | O_NONBLOCK);
//Opening serial port
speed_t baud=B115200;
struct termios serialset;//Setting a baud rate for communication
tcgetattr(fd,&serialset);
cfsetispeed(&serialset,baud);
cfsetospeed(&serialset,baud);
tcsetattr(fd,TCSANOW,&serialset);
long bytesw=0;
tcflush(fd,TCIFLUSH);
printf("\nnumbytes %ld",numbytes);
bytesw=write(fd,writebuffer,numbytes);
//Writing the script into the device connected to the serial port
printf("bytes written%ld\n",bytesw);//Only 4608 bytes are written
close (fd);
return 0;
}

Well, that's the specification. When you write to a file, your process normally is blocked until the whole data is written. And this means your process will run again only when all the data has been written to the disk buffers. This is not true for devices, as the device driver is the responsible of determining how much data is to be written in one pass. This means that, depending on the device driver, you'll get all data driven, only part of it, or even none at all. That simply depends on the device, and how the driver implements its control.
On the floor, device drivers normally have a limited amount of memory to fill buffers and are capable of a limited amount of data to be accepted. There are two policies here, the driver can block the process until more buffer space is available to process it, or it can return with a partial write only.
It's your program resposibility to accept a partial read and continue writing the rest of the buffer, or to pass back the problem to the client module and return only a partial write again. This approach is the most flexible one, and is the one implemented everywhere. Now you have a reason for your partial write, but the ball is on your roof, you have to decide what to do next.
Also, be careful, as you use long for the ftell() function call return value and int for the fwrite() function call... Although your amount of data is not huge and it's not probable that this values cannot be converted to long and int respectively, the return type of both calls is size_t and ssize_t resp. (like the speed_t type you use for the baudrate values) long can be 32bit and size_t a 64bit type.
The best thing you can do is to ensure the whole buffer is written by some code snippet like the next one:
char *p = buffer;
while (numbytes > 0) {
ssize_t n = write(fd, p, numbytes);
if (n < 0) {
perror("write");
/* driver signals some error */
return 1;
}
/* writing 0 bytes is weird, but possible, consider putting
* some code here to cope for that possibility. */
/* n >= 0 */
/* update pointer and numbytes */
p += n;
numbytes -= n;
}
/* if we get here, we have written all numbytes */

delayed write from userspace to kernel space using framebuffer node

I have implemented a linux kernel driver which uses deferred IO mechanism to track the changes in framebuffer node.
static struct fb_deferred_io fb_defio = {
.delay = HZ/2,
.deferred_io = fb_dpy_deferred_io,
};
Per say the registered framebuffer node is /dev/graphics/fb1.
The sample application code to access this node is:
fbfd = open("/dev/graphics/fb1", O_RDWR);
if (!fbfd) {
printf("error\n");
exit(0);
}
screensize = 540*960*4;
/* Map the device to memory */
fbp = (unsigned char *)mmap(0, screensize, PROT_READ | PROT_WRITE, MAP_SHARED,
fbfd, 0);
if ((int)fbp == -1) {
printf("Error: failed to start framebuffer device to memory.");
}
int grey = 0x1;
for(cnt = 0; cnt < screensize; cnt++)
*(fbp + cnt) = grey<<4|grey;
This would fill up entire fb1 node with 1's.
The issue now is at the kernel driver when i try to read the entire buffer I find data mismatch at different locations.
The buffer in kernel is mapped as:
par->buffer = dma_alloc_coherent(dev, roundup((dpyw*dpyh*BPP/8), PAGE_SIZE),(dma_addr_t *) &DmaPhysBuf, GFP_KERNEL);
if (!par->buffer) {
printk(KERN_WARNING "probe: dma_alloc_coherent failed.\n");
goto err_vfree;
}
and finally the buffer is registered through register_framebuffer function.
On reading the source buffer I find that at random locations the data is not been written instead the old data is reflected.
For example:
At buffer location 3964 i was expecting 11111111 but i found FF00FF00.
On running the same application program with value of grey changed to 22222222
At buffer location 3964 i was expecting 22222222 but i found 11111111
It looks like there is some delayed write in the buffer. Is there any solution to this effect, because of partially wrong data my image is getting corrupted.
Please let me know if any more information is required.
Note: Looks like an issue of mapped buffer being cacheable or not. Its a lazy write to copy the data from cache to ram. Need to make sure that the data is copied properly but how still no idea.. :-(

"Deferred io" means that frame buffer memory is not really mapped to a display device. Rather, it's an ordinary memory area shared between user process and kernel driver. Thus it needs to be "synced" for kernel to actually do anything about it:
msync(fbp, screensize, MS_SYNC);
Calling fsync(fbfd) may also work.
You may also try calling ioctl(fbfd, FBIO_WAITFORVSYNC, 0) if your driver supports it. The call will make your application wait until vsync happens and the frame buffer data was definitely transferred to the device.

I was having a similar issue where I was having random artifacts displaying on the screen. Originally, the framebuffer driver was not using dma at all.
I tried the suggestion of using msync(), which improved the situation (artifacts happened less frequently), but it did not completely solve the issue.
After doing some research I came to the conclusion that I need to use dma memory because it is not cached. There is still the issue with mmap because it is mapping the kernel memory to userspace. However, I found that there is already a function in the kernel to handle this.
So, my solution was in my framebuffer driver, set the mmap function:
static int my_fb_mmap(struct fb_info *info, struct vm_area_struct *vma)
{
return dma_mmap_coherent(info->dev, vma, info->screen_base,
info->fix.smem_start, info->fix.smem_len);
}
static struct fb_ops my_fb_ops = {
...
.fb_mmap = my_fb_mmap,
};
And then in the probe function:
struct fb_info *info;
struct my_fb_par *par;
dma_addr_t dma_addr;
char *buf
info = framebuffer_alloc(sizeof(struct my_fb_par), &my_parent->dev);
...
buf = dma_alloc_coherent(info->dev, MY_FB_SIZE, dma_addr, GFP_KERNEL);
...
info->screen_base = buf;
info->fbops = &my_fb_ops;
info->fix = my_fb_fix;
info->fix.smem_start = dma_addr;
info->fix.smem_len = MY_FB_SIZE;
...
par = info->par
...
par->buffer = buf;
Obviously, I've left out the error checking and unwinding, but hopefully I have touched on all of the important parts.
Note: Comments in the kernel source say that dmac_flush_range() is for private use only.

Well eventually i found a better way to solve the issue. The data written through app at mmaped device node is first written in cache which is later written in RAM through delayed write policy. In order to make sure that the data is flushed properly we need to call the flush function in kernel. I used
dmac_flush_range((void *)pSrc, (void *)pSrc + bufSize);
to flush the data completely so that the kernel receives a clean data.

If I have only the physical address of device buffer (PCIe), how can I map this buffer to user-space?

If I have only the physical address of the memory buffer to which is mapped the device buffer via the PCI-Express BAR (Base Address Register), how can I map this buffer to user-space?
For example, how does usually the code should look like in Linux-kernel?
unsigned long long phys_addr = ...; // get device phys addr
unsigned long long size_buff = ...l // get device size buff
// ... mmap(), remap_pfn_range(), Or what should I do now?
On: Linux x86_64
From: https://stackoverflow.com/a/17278263/1558037
ioremap() maps a physical address into a kernel virtual address.
remap_pfn_range() maps physical addresses directly to user space.
From: https://stackoverflow.com/a/9075865/1558037
int remap_pfn_range(struct vm_area_struct *vma, unsigned long virt_addr,
unsigned long pfn, unsigned long size, pgprot_t prot);
remap_pfn_range - remap kernel memory to userspace
May be can I use it so?
unsigned long long phys_addr = ...; // get device phys addr
unsigned long long size_buff = ...l // get device size buff
remap_pfn_range(vma, vma->vm_start, (phys_addr >> PAGE_SHIFT),
size_buff, vma->vm_page_prot);
Question: But, where can I get wma, and what I must pre-do with wma before call to remap_pfn_range()?

Mapping PCI resource is dependent on the architecture.
BARs are already available to userspace with the sysfs files /sys/bus/pci/devices/*/resource*, which support mmap.
This is implemented by the function pci_mmap_resource in drivers/pci/pci-sysfs.c, which ends up calling pci_mmap_page_range.

The Linux kernel, at least, versions 2.6.x use the ioremap() function.
void *vaddr = ioremap (phys_addr, size_addr);
if (vaddr) {
/* do stuff with the memory using vaddr pointer */
iounmap (vaddr);
}
You should make a previous call to request_mem_region() to check if that memory space is already reclaimed by another driver, and politely request that memory to be owned by your code (driver). The complete example should look like this:
void *vaddr;
if (request_mem_region (phys_addr, size_addr, "my_driver")) {
vaddr = ioremap (phys_addr, size_addr);
if (vaddr) {
/* do stuff with the memory */
iounmap (vaddr);
}
release_mem_region (phys_addr, size_addr);
}
You can check your ownership by checking /proc/iomem, which will reflect the address range and the owner of every piece of memory in your system.
UPDATE: I don't really know if this works for 64-bit kernels. It does for 32-bit. If 64-bit kernel don't have these kernel functions, they will have similar ones, I guess.

Linux is not allowing me to access a fixed region of memory

I have some data stored in a FLASH memory that I need to access with C pointers to be able to make a non-Linux graphics driver work (I think this requirement is DMA related, not sure). Calling read works, but I don't want to have intermediate RAM buffers between the FLASH and the non-Linux driver.
However, just creating a pointer and storing the address that I want on it is making Linux emit an exception about invalid access on me.
void *ptr = 0xdeadbeef;
int a = *ptr; // invalid access!
What am I missing here? And could someone point me to a material to make this concepts clear for me?
I'm reading about mmap but I'm not sure that this is what I need.

The problem you have is that linux runs your program in a virtual address space. So every address you use directly in the code (like 0xdeadbeef) is a virtual address that gets translated by the memory management unit into a physical address which is not necessarily the same as your virtual address. This allows easy separation of multiple independent processes and other stuff like paging, etc.
The problem is now, that in your case no physical address is mapped to the virtual address 0xdeadbeef causing the kernel to abort execution.
The call mmap you already found asks the kernel to assign a specific file (from a specific offset) to a virtual address of your process. Note that the returning address of mmap could be a completely different address. So don't make any assumptions about the virtual address you get.
Therefore there are examples with mmap and /dev/mem out there where the offset for the memory device is the physical address. After the kernel was able to assign the file from the offset you gave to a virtual address of your process you can access the memory area asif it were a direct access.
After you don't need the area anymore don't forget to munmap the area. Otherwise you'll cause something similar to a memory leak.
One problem with the /dev/mem method is that the user running the process needs access to this device. This could introduce a security issue (e.g. Samsung recently introduced such a security hole in their hand held devices)
A more secure way is the way described in a article i found (The Userspace I/O HOWTO) as you still have control about the memory areas accessable by the user's process.

You need to access the memory differently. Basically you need to open /dev/mem and use mmap(). (as you suggested). Simple example:
int openMem(unsigned int address, unsigned int size)
{
int mmapFD;
int page_size;
unsigned int page_start_address;
/* Minimum page size for the mmapped region. */
mask = size - 1;
/* Get the page size. */
page_size = (int) sysconf(_SC_PAGE_SIZE);
/* We have to map shared memory to beginning of memory page so adjust
* memory address accordingly. */
page_start_address = address - (address % page_size);
/* Open the file that will be mapped. */
if((mmapFD = open("/dev/mem", (O_RDWR | O_SYNC))) == -1)
{
printf("Opening shared memory device failed\n");
return -1;
}
mmap_base_address = mmap(0, size, (PROT_READ|PROT_WRITE), MAP_SHARED, mmapFD, (off_t)page_start_address & ~mask);
if(mmap_base_address == MAP_FAILED)
{
printf("Mapping memory failed\n");
return -1;
}
return 0;
}
unsigned int *getAddress(unsigned int address)
{
unsigned int log_address;
log_address = (int)((off_t)mmap_base_address + ((off_t)address & mask));
return (unsigned int*)log_address;
}
...
result = openMem(address, 0x10000);
if (result < 0)
return result;
target_address = getValue(address);
*(unsigned int*)target_address = value;
This would set "value" to "address".

You need to call ioremap - something like:
void *myaddr = ioremap(0xdeadbeef, size);
where size is the size of your memory region. You probably want to use a page-aligned address for the first argument, e.g. 0xdeadb000 - but I expect your actual device isn't at "0xdeadbeef" anyways.
Edit: The call to ioremap must be done from a driver!