Linux error from munmap - c

I have a simple question regarding mmap and munmap in Linux : is it possible that mmap succeeds but munmap fails?
Assuming all the parameters are correctly given, for example, see the following code snippet. In what circumstances munmap failed! will be printed??
char *addr = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
... exit if mmap was not successful ...
... do some stuff using mmaped area ...
if( munmap(addr, 4096) == -1 ){
printf("munmap failed!\n");
}

Yes, it can fail. From mmunmap man pages :
Upon successful completion, munmap() shall return 0; otherwise, it shall return -1 and set errno to indicate the error.
The error codes indicates that people do pass invalid parameters. But if you pass pointer you got from mmap(), and correct size, then it will not fail.
Assuming all the parameters are correctly given,
Then it will not fail. That is why most implementations (99%) just don't check the return value of unmmap(). In such case, even if it fails, you can't do anything (other then informing the user).

munmap can fail with EINVAL if it receives an invalid parameter.
It can also fail with ENOMEM if the process's maximum number of mappings would have been exceeded. That can happen if you try to unmap a region in the middle of an existing map, and that results in two smaller maps.
You can see the maximum number of mappings per process with:
sysctl vm.max_map_count
You can increase it with, for example:
sysctl -w vm.max_map_count=131060
Be warned that if you don't check munmap's return, you will not have any clue when that causes a crash or another kind of failure, specially in long-running processes where the error can linger.
An munmap EINVAL, for instance, can point to a memory leak.
Also notice that mmap fails with MAP_FAILED, which is (void*)-1 and not NULL.

Related

copy_to_user fails to copy data to mmap user from kernel?

In the user space programm I am allocating some memory via mmap as the following function call:
void *memory;
int fd;
fd = open(filepath, O_RDWR);
if (fd < 0)
return errno;
memory = mmap(NULL, 4096, PROT_WRITE, MAP_SHARED, fd, 0);
if (memory == MAP_FAILED)
return -1;
//syscall() goes here
In the kernel space in my system call I am trying to copy data to the memory mapped region as follows:
copy_to_user(memory,src,4096);
EDIT: added error checking code to the post for clarification
The copy_to_user() call is repeatedly failing in this case, whereas if I would have done a memory = malloc() it was succeeding always.
Am I getting some permission flags wrong in this case for mmap ?
Does the open succeed? What about mmap? Is the target file big enough? Can you write to the file through the mapping in userspace?
Also, the repeated 4096 is a strong hit your code is wrong. Userspace should pass the expected size instead.

strace has different result of Linux write system call on different buffer

I try to write a data buffer into a file using write() system call on Linux, here is the user space code I wrote.
memset (dataBuffer, 'F', FILESIZE);
fp = open(fileName, O_WRONLY | O_CREAT, 0644);
write (fp, dataBuffer, FILESIZE);
I tried two types of dataBuffer, one is from malloc(), another one is from mmap().
And I use strace to watch what the kernel will do on these two kind of buffer. Most of them are the same, but I saw when doing write(), they look different.
buffer from malloc()
[pid 258] open("/mnt/mtd/mmc/block/DATA10", O_WRONLY|O_CREAT, 0644) = 3
[pid 258] write(3, "FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF"..., 691200) = 691200
buffer from mmap()
[pid 262] open("/mnt/mtd/mmc/block/DATA10", O_WRONLY|O_CREAT, 0644) = 4
[pid 262] write(4, 0x76557000, 691200) = 691200
Like you can see above, the parameters of write() are different, one is "FFFFF..." like I memset before, but another is like a memory address.
Also the first parameter is different, one is 3, another is 4.
And on my system, buffer from malloc() is faster than mmap().
How come they are different? Who made this different?
Thanks.
Update: how do I measure malloc() is faster?
I trace deep inside the write() in kernel, find out the last step of write() is iov_iter_copy_from_user_atomic(), I think this is the actual memory copy operation.
Then I using gettimeofday() to measure how long the iov_iter_copy_from_user_atomic() cost in malloc/mmap buffer.
I think it's checking whether there's a trailing 0 byte at the end of the buffer. If there is, it assumes the data is a string and displays it with quotes. If not, it just shows the address.
I can't think of a reason why the buffer from malloc() would be faster than mmap(). Calling memset() should get the memory of both into RAM, so the write() shouldn't have to wait for anything to be loaded into memory.

Disk write does not work with malloc in C

I did write to disk using C code.
First I tried with malloc and found that write did not work (write returned -1):
fd = open('/dev/sdb', O_DIRECT | O_SYNC | O_RDWR);
void *buff = malloc(512);
lseek(fd, 0, SEEK_SET);
write(fd, buff, 512);
Then I changed the second line with this and it worked:
void *buff;
posix_memalign(&buff,512,512);
However, when I changed the lseek offset to 1: lseek(fd, 1, SEEK_SET);, write did not work again.
First, why didn't malloc work?
Then, I know that in my case, posix_memalign guarantees that start address of memory alignment must be multiple of 512. But, should not memory alignment and write be a separate process? So why I could not write to any offset that I want?
From the Linux man page for open(2):
The O_DIRECT flag may impose alignment restrictions on the length
and address of user-space buffers and the file offset of I/Os.
And:
Under Linux 2.4, transfer sizes, and the alignment of the user
buffer and the file offset must all be multiples of the logical block
size of the filesystem. Under Linux 2.6, alignment to 512-byte
boundaries suffices.
The meaning of O_DIRECT is to "try to minimize cache effects of the I/O to and from this file", and if I understand it correctly it means that the kernel should copy directly from the user-space buffer, thus perhaps requiring stricter alignment of the data.
Maybe the documentation doesn't say, but it's quite possible that write and reads to/from a block device is required to be aligned and for entire blocks to succeed (this would explain why you get failure in your first and last cases but not in the second). If you use linux the documentation of open(2) basically says this:
The O_DIRECT flag may impose alignment restrictions on the length and
address of user-space buffers and the file offset of I/Os. In Linux
alignment restrictions vary by file system and kernel version and
might be absent entirely. However there is cur‐
rently no file system-independent interface for an application to discover these restrictions for a given file or file system. Some
file systems provide their own interfaces for doing so, for example
the XFS_IOC_DIOINFO operation in xfsctl(3).
Your code shows lack of error handling. Every line in the code contains functions that may fail and open, lseek and write also reports the cause of error in errno. So with some kind of error handling it would be:
fd = open('/dev/sdb', O_DIRECT | O_SYNC | O_RDWR);
if( fd == -1 ) {
perror("open failed");
return;
}
void *buff = malloc(512);
if( !buff ) {
printf("malloc failed");
return;
}
if( lseek(fd, 0, SEEK_SET) == (off_t)-1 ) {
perror("lseek failed");
free(buff);
return;
}
if( write(fd, buff, 512) == -1 ) {
perror("write failed");
free(buff);
return;
}
in that case you would at least get a more detailed explaination on what goes wrong. In this case I suspect that you get EIO (Input/output error) from the write call.
Note that the above maybe isn't complete errorhandling as perror and printf themselves can fail (and you might want to do something about that possibility).

Why does fopen/fgets use both mmap and read system calls to access the data?

I have a small example program which simply fopens a file and uses fgets to read it. Using strace, I notice that the first call to fgets runs a mmap system call, and then read system calls are used to actually read the contents of the file. on fclose, the file is munmaped. If I instead open read the file with open/read directly, this obviously does not occur. I'm curious as to what is the purpose of this mmap is, and what it is accomplishing.
On my Linux 2.6.31 based system, when under heavy virtual memory demand these mmaps will sometimes hang for several seconds, and appear to me to be unnecessary.
The example code:
#include <stdlib.h>
#include <stdio.h>
int main ()
{
FILE *f;
if ( NULL == ( f=fopen( "foo.txt","r" )))
{
printf ("Fail to open\n");
}
char buf[256];
fgets(buf,256,f);
fclose(f);
}
And here is the relevant strace output when the above code is run:
open("foo.txt", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=9, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb8039000
read(3, "foo\nbar\n\n"..., 4096) = 9
close(3) = 0
munmap(0xb8039000, 4096) = 0
It's not the file that is mmap'ed - in this case mmap is used anonymously (not on a file), probably to allocate memory for the buffer that the consequent reads will use.
malloc in fact results in such a call to mmap. Similarly, the munmap corresponds to a call to free.
The mmap is not mapping the file; instead it's allocating memory for the stdio FILE buffering. Normally malloc would not use mmap to service such a small allocation, but it seems glibc's stdio implementation is using mmap directly to get the buffer. This is probably to ensure it's page-aligned (though posix_memalign could achieve the same thing) and/or to make sure closing the file returns the buffer memory to the kernel. I question the usefulness of page-aligning the buffer. Presumably it's for performance, but I can't see any way it would help unless the file offset you're reading from is also page-aligned, and even then it seems like a dubious micro-optimization.
from what i have read memory mapping functions are useful while handling large files. now the definition of large is something i have no idea about. but yes for the large files they are significantly faster as compared to the 'buffered' i/o calls.
in the example that you have posted i think the file is opened by the open() function and mmap is used for allocating memory or something else.
from the syntax of mmap function this can be seen clearly:
void *mmap(void *addr, size_t len, int prot, int flags, int fildes, off_t off);
the second last parameter takes the file descriptor which should be non-negative.
while in the stack trace it is -1
Source code of fopen in glibc shows that mmap can be actually used.
https://sourceware.org/git/?p=glibc.git;a=blob;f=libio/iofopen.c;h=965d21cd978f3acb25ca23152993d9cac9f120e3;hb=HEAD#l36

mmap on /proc/pid/mem

Has anybody succeeded in mmap'ing a /proc/pid/mem file with Linux kernel 2.6? I am getting an ENODEV (No such device) error. My call looks like this:
char * map = mmap(NULL, PAGE_SIZE, PROT_READ, MAP_SHARED, mem_fd, offset);
And I have verified by looking at the /proc/pid/maps file while debugging that, when execution reaches this call, offset has the value of the top of the stack minus PAGE_SIZE. I have also verified with ptrace that mmap is setting errno to ENODEV.
See proc_mem_operations in /usr/src/linux/fs/proc/base.c: /proc/.../mem does not support mmap.

Resources