Segfault while using mmap in C for reading binary files - c

I am trying to use mmap in C just to see how exactly it works. Currently I am trying to read a binary file byte by byte using mmap. My code is like this:
#include <unistd.h>
#include <sys/types.h>
#include <sys/mman.h>
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <string.h>
int main(int argc, char *argv[]) {
int fd;
char *data;
for ( int i = 1; i<argc; i++)
{
if(strcmp(argv[i],"-i")==0)
fd = open(argv[i+1],O_RDONLY);
}
data = mmap(NULL, 4000, PROT_READ, MAP_SHARED, fd, 8000);
int i = 0;
char notation = data[i];
// ......
}
My problem occurs when I try notation = data[0]: I get a segfault. I am sure that the first byte in the binary file is a character as well. My for loop checks whether an -i flag was passed on the command line; if so, the next argument should be the file name.

It appears that mmap fails because the offset is not a multiple of page size. You can test this with perror and see that the problem is an invalid argument. If you write:
data = mmap(NULL, 4000, PROT_READ, MAP_SHARED, fd, 8000);
perror("Error");
At least on my OS X the following error is printed:
Error: Invalid argument
Changing offset from 8000 to 4096 or 8192 works. 6144 doesn't, so it has to be a multiple of 4096 on this platform. Incidentally,
printf("%d\n",getpagesize());
prints 4096. You should round your offset down to the nearest multiple of this for mmap and add the remainder to i when accessing the area. Of course, get the page size for your particular platform from that function. It is declared in unistd.h, which you already include.
Here's how to handle the offset correctly and deal with possible errors. It prints the byte at position 8000:
int offset = 8000;
int pageoffset = offset % getpagesize();
data = mmap(NULL, 4000 + pageoffset, PROT_READ, MAP_SHARED, fd, offset - pageoffset);
if ( data == MAP_FAILED ) {
perror ( "mmap" );
exit ( EXIT_FAILURE );
}
i = 0;
printf("%c\n",data [i + pageoffset]);
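As a fuller illustration of the rounding described above, here is a minimal sketch of a helper that returns the byte at an arbitrary file offset. The name byte_at_offset and its error convention (-1 on failure) are assumptions made for this example, not part of the original answer; it also uses sysconf(_SC_PAGESIZE) instead of getpagesize() to stay within POSIX.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns the byte (0..255) stored at `offset`
 * in the file `path`, or -1 on any error or if offset is past EOF. */
int byte_at_offset(const char *path, off_t offset)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page < 1 || offset < 0)
        return -1;

    int fd = open(path, O_RDONLY);
    if (fd == -1) {
        perror("open");
        return -1;
    }

    /* Refuse offsets at or beyond end-of-file; touching such pages
     * could raise SIGBUS. */
    struct stat info;
    if (fstat(fd, &info) == -1 || offset >= info.st_size) {
        close(fd);
        return -1;
    }

    off_t in_page = offset % page;     /* distance past the page boundary */
    off_t aligned = offset - in_page;  /* page-aligned offset for mmap */
    size_t length = (size_t)in_page + 1;

    unsigned char *data = mmap(NULL, length, PROT_READ, MAP_SHARED, fd, aligned);
    close(fd);                         /* the mapping keeps its own reference */
    if (data == MAP_FAILED) {
        perror("mmap");
        return -1;
    }

    int value = data[in_page];
    munmap(data, length);
    return value;
}
```

Calling byte_at_offset("file.bin", 8000) would map from offset 4096 on a system with 4096-byte pages and read index 3904 within that mapping.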

Related

Mapping existing memory (data segment) to another memory segment

As the title suggests, I would like to ask if there is any way for me to map the data segment of my executable to another memory region so that any changes to the second are reflected instantly in the first. One initial thought I had was to use mmap, but unfortunately mmap requires a file descriptor and I do not know of a way to open a file descriptor on my running process's memory. I tried to use shmget/shmat to create a shared memory object on the process data segment (&__data_start), but again I failed (though that might have been a mistake on my end, as I am unfamiliar with the shm API). A similar question I found is this: Linux mapping virtual memory range to existing virtual memory range?, but the replies are not helpful. Any thoughts are welcome.
Thank you in advance.
Some pseudocode would look like this:
extern char __data_start, _end;
char test = 'A';
int main(int argc, char *argv[]){
size_t size = &_end - &__data_start;
char *mirror = malloc(size);
magic_map(&__data_start, mirror, size); //this is the part I need.
printf("%c\n", test); // prints A
int offset = &test - &__data_start;
*(mirror + offset) = 'B';
printf("%c\n", test); // prints B
free(mirror);
return 0;
}
It appears I managed to solve this. To be honest, I don't know whether it will cause problems in the future or what side effects it might have, but here it is (if any issues arise I will try to log them here for future reference).
Solution:
Basically what I did was use the mmap flags MAP_ANONYMOUS and MAP_FIXED.
MAP_ANONYMOUS: With this flag a file descriptor is no longer required (hence the -1 in the call)
MAP_FIXED: With this flag the addr argument is no longer a hint, but it will put the mapping on the address you specify.
MAP_SHARED: With this you have the shared mapping so that any changes are visible to the original mapping.
I have left the munmap call commented out. If munmap executes, it unmaps the data segment (pointed to by &__data_start), so the global and static variables are no longer accessible. When the atexit handlers run after main returns, the program then crashes with a segmentation fault, most likely because they touch the now-unmapped data segment.
Code:
#define _GNU_SOURCE 1 /* must come before any #include to take effect */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/mman.h>
extern char __data_start;
extern char _end;
int test = 10;
int main(int argc, char *argv[])
{
size_t size = 4096;
char *shared = mmap(&__data_start, 4096, PROT_READ | PROT_WRITE, MAP_FIXED | MAP_ANONYMOUS | MAP_SHARED, -1, 0);
if(shared == (void *)-1){
printf("Cant mmap\n");
exit(-1);
}
printf("original: %p, shared: %p\n",&__data_start, shared);
size_t offset = (char *)&test - (char *)&__data_start; /* arithmetic on void * is a GNU extension */
*(shared+offset) = 50;
msync(shared, 4096, MS_SYNC);
printf("test: %d :: %d\n", test, *(shared+offset));
test = 25;
printf("test: %d :: %d\n", test, *(shared+offset));
//munmap(shared, 4096);
}
Output:
original: 0x55c4066eb000, shared: 0x55c4066eb000
test: 50 :: 50
test: 25 :: 25

map a big file and scan through data

I am trying to search for a pattern in a big file using mmap. The file is huge (way more than the physical memory). My worry is that if I use the file size as the second parameter for mmap(), there won't be enough physical memory to satisfy the system call. So I used 0x1000 as the length in the hope that the OS would automatically map the right part of the file as my pointer moves. But the following code snippet gives a segmentation fault.
Any ideas?
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
long fileSize(char *fname) {
struct stat stat_buf;
int rc = stat(fname, &stat_buf);
return rc == 0 ? stat_buf.st_size : -1;
}
int main(int argc, char *argv[]) {
long size = fileSize(argv[1]);
printf("size=%ld\n", size);
int fd = open(argv[1], O_RDONLY);
printf("fd=%d\n", fd);
char *p = mmap(0, 0x1000, PROT_READ, MAP_SHARED, fd, 0);
if (p == MAP_FAILED) {
perror ("mmap");
return 1;
}
long i;
int pktLen;
int *pInt;
for (i=0; i < size; i+=4) {
pInt = (int*)(p+i);
if (pInt[i] == 0x12345678) {
printf("found it at %ld\n", i); break;
}
}
if (i == size) {
printf("didn't find it\n");
}
close(fd);
return 0;
}
Update
It turned out I had a silly bug. The line
if (pInt[i] == 0x12345678)
should have been
if (pInt[0] == 0x12345678)
Use
struct stat info;
long page;
const char *map;
size_t size, mapping;
int fd, result;
page = sysconf(_SC_PAGESIZE);
if (page < 1L) {
fprintf(stderr, "Invalid page size.\n");
exit(EXIT_FAILURE);
}
fd = open(filename, O_RDONLY);
if (fd == -1) {
fprintf(stderr, "%s: Cannot open file: %s.\n", filename, strerror(errno));
exit(EXIT_FAILURE);
}
result = fstat(fd, &info);
if (result == -1) {
fprintf(stderr, "%s: Cannot get file information: %s.\n", filename, strerror(errno));
close(fd);
exit(EXIT_FAILURE);
}
if (info.st_size <= 0) {
fprintf(stderr, "%s: No data.\n", filename);
close(fd);
exit(EXIT_FAILURE);
}
size = info.st_size;
if ((off_t)size != info.st_size) {
fprintf(stderr, "%s: File is too large to map.\n", filename);
close(fd);
exit(EXIT_FAILURE);
}
/* mapping is size rounded up to a multiple of page. */
if (size % (size_t)page)
mapping = size + page - (size % (size_t)page);
else
mapping = size;
map = mmap(NULL, mapping, PROT_READ, MAP_SHARED | MAP_NORESERVE, fd, 0);
if (map == MAP_FAILED) {
fprintf(stderr, "%s: Cannot map file: %s.\n", filename, strerror(errno));
close(fd);
exit(EXIT_FAILURE);
}
if (close(fd)) {
fprintf(stderr, "%s: Unexpected error closing file descriptor.\n", filename);
exit(EXIT_FAILURE);
}
/*
* Use map[0] to map[size-1], but remember that it is not a string,
* and that there is no trailing '\0' at map[size].
*
* Accessing map[size] to map[mapping-1] is not allowed, and may
* generate a SIGBUS signal (and kill the process).
*/
/* The mapping is automatically torn down when the process exits,
* but you can also unmap it with */
munmap((void *)map, mapping); /* cast drops the const qualifier for munmap() */
The important points in the code above:
You'll need to start your code with e.g.
#define _POSIX_C_SOURCE 200809L
#define _BSD_SOURCE
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <string.h>
#include <errno.h>
The _BSD_SOURCE is required for MAP_NORESERVE to be defined, even though it is a GNU/Linux-specific feature. (With glibc 2.20 and later, _BSD_SOURCE is deprecated in favour of _DEFAULT_SOURCE.)
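mapping (length in man 2 mmap) is rounded up here to a multiple of the page size (sysconf(_SC_PAGESIZE)). The kernel maps whole pages in any case; it is the offset argument that is strictly required to be page-aligned.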
MAP_NORESERVE flag tells the kernel that the mapping is backed by the file only, and as such, is allowed to be larger than available RAM + SWAP.
You can (but do not need to) close the file descriptor referring to the mapped file with no issues, because the mapping itself contains a reference in-kernel.
Years ago, on a different forum, I showed a simple program to manipulate a terabyte of data (1 TiB = 1,099,511,627,776 bytes) using this very approach (although it uses a sparse backing file; i.e. mostly implicit zeroes, with less than 250 MB of actual data written to the backing file -- mostly to reduce the amount of disk space needed). Of course, it requires a 64-bit machine running Linux, as the virtual memory on 32-bit machines is limited to 2^32 bytes = 4 GiB (Linux does not support segmented memory models).
The Linux kernel is surprisingly efficient in choosing which pages to keep in RAM, and which pages to evict. Of course, you can make that even more efficient, by telling the kernel which parts of the mapping you are unlikely to access (and therefore can be evicted), by using posix_madvise(address, length, advice) with advice being POSIX_MADV_DONTNEED or POSIX_MADV_WILLNEED. This has the benefit that unlike unmapping the "dontneed" parts, you can, if you need to, re-access that part of the mapping. (If the pages are already evicted, the access to the mapping will just block until the pages are re-loaded to memory. In other words, you can use posix_madvise() to "optimize" eviction logic, without limiting what part of the mapping can be accessed.)
In your case, if you do a linear or semi-linear search over the data using e.g. memmem(), you can use posix_madvise(map, mapping, POSIX_MADV_SEQUENTIAL).
Personally, I'd run the search first without using any posix_madvise() calls, and then see if it makes a significant enough positive difference, using the same data set (and several runs, of course). (You can safely -- with no risk of losing any data -- clear the page cache between test runs using sudo sh -c 'sync ; echo 3 > /proc/sys/vm/drop_caches ; sync', if you wish to exclude the effects of having the large file (mostly) already cached, between timing runs.)
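To make the whole-file approach above concrete, here is a minimal sketch of a search helper that maps the file once, hints sequential access with posix_madvise(), and scans for a byte pattern. The helper name find_in_file and its return convention are assumptions for illustration. It uses memchr() plus memcmp() rather than memmem() so that it needs nothing beyond POSIX, and it omits MAP_NORESERVE for the same reason.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: offset of the first occurrence of `needle`
 * (nlen bytes) in the file `path`, or -1 if absent or on error. */
long long find_in_file(const char *path, const void *needle, size_t nlen)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    struct stat info;
    if (fstat(fd, &info) == -1 || info.st_size <= 0 ||
        nlen == 0 || (off_t)nlen > info.st_size) {
        close(fd);
        return -1;
    }

    size_t size = (size_t)info.st_size;
    if ((off_t)size != info.st_size) {   /* too large for this address space */
        close(fd);
        return -1;
    }

    const unsigned char *map = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                           /* the mapping stays valid */
    if (map == MAP_FAILED)
        return -1;

    /* Advisory only: a failure here does not affect correctness. */
    (void)posix_madvise((void *)map, size, POSIX_MADV_SEQUENTIAL);

    long long found = -1;
    const unsigned char first = *(const unsigned char *)needle;
    const unsigned char *p = map;
    const unsigned char *last = map + (size - nlen); /* last valid start */

    while (p <= last) {
        /* Jump to the next candidate first byte, then compare fully. */
        p = memchr(p, first, (size_t)(last - p) + 1);
        if (p == NULL)
            break;
        if (memcmp(p, needle, nlen) == 0) {
            found = (long long)(p - map);
            break;
        }
        p++;
    }

    munmap((void *)map, size);
    return found;
}
```

Only map[0] through map[size-1] are ever touched, so the scan never reads the padding after end-of-file.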
The SIGSEGV is because you're accessing beyond 0x1000 bytes (in the for loop). You have to mmap() the complete size bytes of the fd.
The concept of demand paging in the virtual memory subsystem helps in exactly this kind of scenario: applications or application data bigger than physical memory. After the mmap(), when you access a (virtual) address that has no physical page mapped to it (a page fault), the kernel finds a physical page that can be used (page replacement).
fd = open(argv[1], O_RDONLY);
ptr = mmap(NULL, file_size, PROT_READ, MAP_PRIVATE, fd, 0);
/* Consume the entire file's data as needed */
munmap(ptr, file_size);
Alternatively, you can put a loop around the mmap()/munmap() calls to scan the file in chunks of PAGE_SIZE or a multiple of PAGE_SIZE. The last argument of mmap(), offset, comes in handy for that.
From man-page :
void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset);
int munmap(void *addr, size_t length);
Pseudo-code :
fd = open(argv[1], O_RDONLY);
last_block_size = file_size % PAGE_SIZE;
num_pages = file_size / PAGE_SIZE + (last_block_size ? 1 : 0);
for (int i = 0; i < num_pages; i++) {
block_size = last_block_size && (i == num_pages - 1) ? last_block_size : PAGE_SIZE;
ptr = mmap(NULL, block_size, PROT_READ, MAP_PRIVATE, fd, i * PAGE_SIZE);
/* Consume the file's data range (ptr, ptr+block_size-1) as needed */
munmap(ptr, block_size);
}
Please use MAP_PRIVATE, as the mapping is probably needed by your process alone. It avoids a few extra steps the kernel has to perform for MAP_SHARED.
Edit : It should have been MAP_PRIVATE in place of MAP_ANON. Changed.
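The windowed loop from the pseudo-code above could look roughly like the following sketch, which searches for a 4-byte-aligned int value one window at a time. The function name find_int_windowed and the pages_per_window parameter are assumptions for this example. Because each window starts on a page boundary, an aligned int never straddles two windows.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: file offset of the first int-aligned occurrence
 * of `target`, or -1 if it is absent or an error occurs. */
long long find_int_windowed(const char *path, int target, size_t pages_per_window)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page < 1 || pages_per_window == 0)
        return -1;

    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    struct stat info;
    if (fstat(fd, &info) == -1) {
        close(fd);
        return -1;
    }

    const off_t window = (off_t)page * (off_t)pages_per_window;
    long long found = -1;

    for (off_t start = 0; start < info.st_size && found == -1; start += window) {
        /* The last window may be shorter than the others. */
        off_t remaining = info.st_size - start;
        size_t len = (size_t)(remaining < window ? remaining : window);

        const char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, start);
        if (p == MAP_FAILED) {
            perror("mmap");
            break;
        }

        for (size_t i = 0; i + sizeof(int) <= len; i += sizeof(int)) {
            int value;
            memcpy(&value, p + i, sizeof value); /* avoids aliasing issues */
            if (value == target) {
                found = (long long)start + (long long)i;
                break;
            }
        }

        munmap((void *)p, len);  /* release this window before the next */
    }

    close(fd);
    return found;
}
```

A larger pages_per_window means fewer mmap()/munmap() calls at the cost of more address space per window; which value works best is something to measure rather than assume.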

copy whole of a file into memory using mmap

I want to copy the whole of a file into memory using mmap in C. I wrote this code:
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <errno.h>
int main(int arg, char *argv[])
{
char c ;
int numOfWs = 0 ;
int numOfPr = 0 ;
int numberOfCharacters ;
int i=0;
int k;
int pageSize = getpagesize();
char *data;
float wsP = 0;
float prP = 0;
int fp = open("2.txt", O_RDWR);
data = mmap((caddr_t)0, pageSize, PROT_READ, MAP_SHARED, fp,pageSize);
printf("%s\n", data);
exit(0);
}
When I execute the code I get a Bus error message.
Next, I want to iterate over this copied file and do something with it.
How can I copy the file correctly?
2 things.
The second parameter of mmap() is the size of the portion of the file you want to make visible in your address space. The last one is the offset in the file at which the mapping starts. This means that, as you have called mmap(), you will see only 1 page (typically 4096 bytes on x86 and ARM) starting at offset 4096 in your file. If your file is no larger than 4096 bytes, that page lies entirely past the end of the file; mmap() itself usually still succeeds, but touching a page wholly beyond end-of-file raises SIGBUS => BUS ERROR. You also didn't check the return value for MAP_FAILED (i.e. (caddr_t)-1), so a failed mmap() would only show up when printf() dereferences the illegal pointer.
Using a memory map with string functions can be difficult. If the file doesn't contain a binary 0, these functions can run past the mapped size of the file and touch unmapped memory => SEGFAULT.
To map a whole file into memory, you have to know the size of the file.
struct stat filestat;
if(fstat(fp, &filestat) !=0) {
perror("stat failed");
exit(1);
}
data = mmap(NULL, filestat.st_size, PROT_READ, MAP_SHARED, fp, 0);
if(data == MAP_FAILED) {
perror("mmap failed");
exit(2);
}
EDIT: The memory map will always be opened with a size that is a multiple of the pagesize. This means that the last page will be filled with 0 up to the next multiple of the pagesize. Programs that use memory-mapped files with string functions (like your printf()) will often work most of the time, but will suddenly crash when mapping a file with a size that is exactly a multiple of the page size (4096, 8192, 12288 etc.). The often-seen advice to pass mmap() a size bigger than the real file size works on Linux but is not portable and is even in violation of POSIX, which explicitly states that mapping beyond the file size is undefined behaviour. The only portable way is to not use string functions on memory maps.
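As an illustration of avoiding string functions on a mapping, here is a minimal sketch that maps a whole file using its fstat() size and copies exactly that many bytes to a stream with fwrite(). The helper name dump_file and its return convention are assumptions for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: writes the whole content of `path` to `out`.
 * Returns the number of bytes written, or -1 on error. */
long dump_file(const char *path, FILE *out)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1) {
        perror("open");
        return -1;
    }

    struct stat filestat;
    if (fstat(fd, &filestat) != 0) {
        perror("fstat");
        close(fd);
        return -1;
    }
    if (filestat.st_size == 0) {   /* mmap() rejects a zero length */
        close(fd);
        return 0;
    }

    size_t size = (size_t)filestat.st_size;
    char *data = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);
    if (data == MAP_FAILED) {
        perror("mmap");
        return -1;
    }

    /* fwrite() is told the exact length, so no terminating 0 is needed
     * and nothing past data[size-1] is read. */
    size_t written = fwrite(data, 1, size, out);
    munmap(data, size);
    return written == size ? (long)written : -1;
}
```

This behaves the same whether or not the file size happens to be an exact multiple of the page size, which is the case where "%s" tends to fail.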
The last parameter of mmap is the offset within the file at which the mapped part of the file starts. It should be 0 in your case:
data = mmap(NULL, pageSize, PROT_READ, MAP_SHARED, fp,0);
If your file is shorter than pageSize, you will not be able to use addresses beyond the end of file. To use the full size, you shall expand the size to pageSize before calling mmap. Use something like:
ftruncate(fp, pageSize);
If you want to write to the memory (file) you shall use flag PROT_WRITE as well. I.e.
data = mmap(NULL, pageSize, PROT_READ|PROT_WRITE, MAP_SHARED, fp,0);
If your file does not contain 0 character (as end of string) and you want to print it as a string, you shall use printf with explicitly specified maximum size:
printf("%.*s\n", pageSize, data);
Also, of course, as pointed out by @Jongware, you should test the result of open for -1 and the result of mmap for MAP_FAILED.

Segmentation fault with Posix-C program using mmap and mapfile

Well I have this program and I get a segmentation fault: 11 (core dumped). After lots of checks I get this when the for loop gets to i=1024 and it tries to mapfile[i]=0. The program is about making a server and a client program that communicates by read/writing in a common file made in the server program. This is the server program and it prints the value inside before and after the change. I would like to see what's going on, if it's a problem with the mapping or just problem with memory of the *mapfile. Thanks!
#include <sys/shm.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <errno.h>
#include <math.h>
int main()
{
int ret, i;
int *mapfile;
system("dd if=/dev/zero of=/tmp/c4 bs=4 count=500");
ret = open("/tmp/c4", O_RDWR | (mode_t)0600);
if (ret == -1)
{
perror("File");
return 0;
}
mapfile = mmap(NULL, 2000, PROT_READ | PROT_WRITE, MAP_SHARED, ret, 0);
for (i=1; i<=2000; i++)
{
mapfile[i] = 0;
}
while(mapfile[0] != 555)
{
mapfile = mmap(NULL, 2000, PROT_READ | PROT_WRITE, MAP_SHARED, ret, 0);
if (mapfile[0] != 0)
{
printf("Readed from file /tmp/c4 (before): %d\n", mapfile[0]);
mapfile[0]=mapfile[0]+5;
printf("Readed from file /tmp/c4 (after) : %d\n", mapfile[0]);
mapfile[0] = 0;
}
sleep(1);
}
ret = munmap(mapfile, 2000);
if (ret == -1)
{
perror("munmap");
return 0;
}
close(ret);
return 0;
}
mapfile = mmap(NULL, 2000, PROT_READ | PROT_WRITE, MAP_SHARED, ret, 0);
for (i=1; i<=2000; i++)
{
mapfile[i] = 0;
}
In this code, you are requesting 2000 units of memory. mmap takes a size_t, meaning that it expects a size in bytes, not a number of elements. As @Mat mentioned, you will need to use the sizeof(int) operator to give mmap the size it actually requires.
The other issue with this code that may cause problems for you down the road is beginning your loop index at i=1 rather than i=0. Starting your index at 0 (and stopping before 2000) ensures that you cover the indices 0 - 1999, which correspond to the memory you are trying to allocate.
Overall, it looks like what you're trying to do is initialize the values of your memory to 0. Perhaps you could do this more easily by relying on a standard library function called memset:
void *memset(void *str, int c, size_t n)
your code then becomes:
mapfile = mmap(NULL, 2000*sizeof(int), PROT_READ | PROT_WRITE, MAP_SHARED, ret, 0);
void *returnedPointer = memset(mapfile, 0, 2000*sizeof(int));
docs for memset can be found here:
http://www.tutorialspoint.com/c_standard_library/c_function_memset.htm
You're requesting 2000 bytes from mmap, but treating the returned value as an array of 2000 ints. That can't work, an int is usually 4 or 8 bytes these days. You'll be writing past the end of the reserved memory in your loop.
Change the mmap calls to use 2000*sizeof(int). And while you're at it, give that 2000 constant a name (e.g. const int num_elems = 2000; near the top) and don't repeat the magic constant all over the place. And once that's done change it to 1024 or 2048 so that the resulting size is a multiple of the page size (if you're not sure of your page size, getconf PAGE_SIZE on the command line).
And also change your dd command to create a large-enough file. It is currently creating a 2000 byte file, you'll need to increase that as well.
And validate the return value of mmap - it can fail, and you should detect that.
Finally, don't continuously remap. You're using MAP_SHARED, so modifications made through other shared mappings of the same file and offset will be visible to your process. (It must really be the same file; if the other process also does a dd, that might not work. Only one process should have the responsibility of creating that file.)
If you do want to remap, you must also unmap each time. Otherwise you're leaking mappings.
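Putting the points above together, a minimal sketch of the server-side setup might look like this: size the file for count ints, map it once, check the result, and zero it with memset. The helper name map_int_array is an assumption for illustration, and it uses ftruncate() in place of the dd command to size the file.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: maps `count` ints backed by the file `path`
 * (created if missing) and zeroes them. Returns NULL on error.
 * Unmap later with munmap(ptr, count * sizeof(int)). */
int *map_int_array(const char *path, size_t count)
{
    size_t bytes = count * sizeof(int);   /* a size in bytes, not elements */

    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd == -1) {
        perror("open");
        return NULL;
    }

    /* Make the file large enough to back the whole mapping. */
    if (ftruncate(fd, (off_t)bytes) == -1) {
        perror("ftruncate");
        close(fd);
        return NULL;
    }

    int *map = mmap(NULL, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                            /* the mapping keeps the file open */
    if (map == MAP_FAILED) {
        perror("mmap");
        return NULL;
    }

    memset(map, 0, bytes);                /* indices 0 .. count-1 */
    return map;
}
```

With this in place the polling loop can read mapfile[0] repeatedly without calling mmap again, since writes made by another process through a MAP_SHARED mapping of the same file are visible here.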

Linux mapping virtual memory range to existing virtual memory range?

In Linux, is there a way (in user space) to map a virtual address range to the physical pages that back an existing virtual address range? The mmap() function only allows one to map files or "new" physical pages. I need to be able to do something like this:
int* addr1 = malloc(SIZE);
int* addr2 = (int *)0x60000; // Assume nothing is allocated here
fancy_map_function(addr1, addr2, SIZE);
assert(*addr1 == *addr2); // Should succeed
assert(addr1 != addr2); // Should succeed
I was curious so I tested the shared memory idea suggested in question comments, and it seems to work:
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <assert.h>
#define SIZE 256
int main (int argc, char ** argv) {
int fd;
int *addr1, *addr2;
fd = shm_open("/example_shm", O_RDWR | O_CREAT, 0777);
ftruncate( fd, SIZE);
addr1 = mmap(0, SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
addr2 = mmap(0, SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
printf("addr1 = %p addr2 = %p\n", addr1, addr2);
*addr1 = 0x12345678;
assert(*addr1 == *addr2); // Should succeed
assert(addr1 != addr2); // Should succeed
return 0;
}
(Obviously real code will want to check the return value of the syscalls for errors and clean up after itself)
If you have the fd for the file mapped at addr1, you can simply mmap it again at addr2.
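A minimal sketch of that idea, with the error checks the earlier example left out: given any file descriptor that supports shared mappings, map it twice and hand back both addresses. The helper name map_twice is an assumption for illustration; the caller is expected to munmap() both pointers when done.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: resizes the file behind `fd` to `size` bytes and
 * maps it twice. On success returns 0 and stores two distinct addresses
 * that refer to the same physical pages; on failure returns -1. */
int map_twice(int fd, size_t size, int **first, int **second)
{
    if (ftruncate(fd, (off_t)size) == -1) {
        perror("ftruncate");
        return -1;
    }

    void *a = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (a == MAP_FAILED) {
        perror("mmap (first)");
        return -1;
    }

    void *b = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (b == MAP_FAILED) {
        perror("mmap (second)");
        munmap(a, size);      /* do not leak the first mapping */
        return -1;
    }

    *first = a;
    *second = b;
    return 0;
}
```

A store through one pointer is then immediately visible through the other, because both mappings are MAP_SHARED views of the same file pages.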
Otherwise, the Linux-specific remap_file_pages can modify the virtual address ⇆ file offset translation within a single VMA, with page-sized granularity, including mapping the same file offset to multiple addresses.
Open /proc/self/mem and mmap the range of virtual addresses you need from it.
