I am trying accessing /dev/mem from user space. Using qemu-system-arm for this purpose.
UART0 is mapped: 0x101f1000 and UARTDR is placed at offset 0x0
$ devmem 0x101f1000 8 0x61
The above writes 'a' on the console.
When i try the achieve the same logic from C code, it fails
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
int main(int argc, char *argv[])
{
int fd;
char ch = 'a';
fd = open("/dev/mem", O_RDWR | O_SYNC);
if (fd < 0) {
perror("open failed");
return -1;
}
if (lseek(fd, 0x101f1000, SEEK_SET) == -1) {
perror("lseek");
}
if (write(fd, &ch, sizeof(ch)) == -1) {
perror("write");
}
close(fd);
return 0;
}
It fails with error:
write: Bad address
Trying to access device registers by using the read and write syscalls on /dev/mem is not a good idea. /dev/mem implements those syscalls mostly for convenience in accessing RAM, and there is no guarantee about whether it will make accesses of the right width for the device if you try to do that on an area of the address space with a device there. For accessing devices you should instead use mmap() and then access the right addresses directly (which gives you more control about the width of the access and exactly which addresses are touched). For an example of this you can look at the source code for devmem itself: https://github.com/hackndev/tools/blob/master/devmem2.c -- at less than 100 lines of code it's very simple and you already know it works correctly for your use case.
[probably not your main problem but still]
lseek: Success
write: Bad address
You only want to use errno (or call perror()) is the previous call failed (and is documented to set errno on failure).
So this
lseek(fd, 0x101f1000, SEEK_SET);
perror("lseek");
should look like
if ((off_t) -1 == lseek(fd, 0x101f1000, SEEK_SET))
{
perror("lseek() failed");
}
Same for the call to write(), BTW.
Related
I'm not sure if this question makes sense, but let's say I have a pointer to some memory:
char *mem;
size_t len;
Is it possible to somehow map the contents of mem to another address as a read-only mapping? i.e. I want to obtain a pointer mem2 such that mem2 != mem and accessing mem2[i] actually reads mem[i] (without doing a copy).
My ultimate goal would be to take non-contiguous chunks of memory and make them appear to be contiguous by mapping them next to each other.
One approach I considered is to use fmemopen and then mmap, but there's no file descriptor associated with the result of fmemopen.
General case - no control over first mapping
/proc/[PID]/pagemap + /dev/mem
The only way I can think of making this work without any copying is by manually opening and checking /proc/[PID]/pagemap to get the Page Frame Number of the physical page corresponding to the page you want to "alias", and then opening and mapping /dev/mem at the corresponding offset. While this would work in theory, it would require root privileges, and is most likely not possible on any reasonable Linux distribution since the kernel is usually configured with CONFIG_STRICT_DEVMEM=y which puts strict restrictions over the usage of /dev/mem. For example on x86 it disallows reading RAM from /dev/mem (only allows reading memory-mapped PCI regions). Note that in order for this to work the page you want to "alias" needs to be locked to keep it in RAM.
In any case, here's an example of how this would work if you were able/willing to do this (I am assuming x86 64bit here):
#include <stdio.h>
#include <errno.h>
#include <limits.h>
#include <sys/mman.h>
#include <unistd.h>
#include <fcntl.h>
/* Get the physical address of an existing virtual memory page and map it. */
int main(void) {
FILE *fp;
char *endp;
unsigned long addr, info, physaddr, val;
long off;
int fd;
void *mem;
void *orig_mem;
// Suppose that this is the existing page you want to "alias"
orig_mem = mmap(NULL, 0x1000, PROT_READ|PROT_WRITE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
if (orig_mem == MAP_FAILED) {
perror("mmap orig_mem failed");
return 1;
}
// Write a dummy value just for testing
*(unsigned long *)orig_mem = 0x1122334455667788UL;
// Lock the page to prevent it from being swapped out
if (mlock(orig_mem, 0x1000)) {
perror("mlock orig_mem failed");
return 1;
}
fp = fopen("/proc/self/pagemap", "rb");
if (!fp) {
perror("Failed to open \"/proc/self/pagemap\"");
return 1;
}
addr = (unsigned long)orig_mem;
off = addr / 0x1000 * 8;
if (fseek(fp, off, SEEK_SET)) {
perror("fseek failed");
return 1;
}
// Get its information from /proc/self/pagemap
if (fread(&info, sizeof(info), 1, fp) != 1) {
perror("fread failed");
return 1;
}
physaddr = (info & ((1UL << 55) - 1)) << 12;
printf("Value: %016lx\n", info);
printf("Physical address: 0x%016lx\n", physaddr);
// Ensure page is in RAM, should be true since it was mlock'd
if (!(info & (1UL << 63))) {
fputs("Page is not in RAM? Strange! Aborting.\n", stderr);
return 1;
}
fd = open("/dev/mem", O_RDONLY);
if (fd == -1) {
perror("open(\"/dev/mem\") failed");
return 1;
}
mem = mmap(NULL, 0x1000, PROT_READ, MAP_PRIVATE|MAP_ANONYMOUS, fd, physaddr);
if (mem == MAP_FAILED) {
perror("Failed to mmap \"/dev/mem\"");
return 1;
}
// Now `mem` is effecively referring to the same physical page that
// `orig_mem` refers to.
// Try reading 8 bytes (note: this will just return 0 if
// CONFIG_STRICT_DEVMEM=y).
val = *(unsigned long *)mem;
printf("Read 8 bytes at physaddr 0x%016lx: %016lx\n", physaddr, val);
return 0;
}
userfaultfd(2)
Other than what I described above, AFAIK there isn't a way to do what you want from userspace without copying. I.E. there is not a way to simply tell the kernel "map this second virtual addresses to the same memory of an existing one". You can however register an userspace handler for page faults through the userfaultfd(2) syscall and ioctl_userfaultfd(2), and I think this is overall your best shot.
The whole mechanism is similar to what the kernel would do with a real memory page, only that the faults are handled by a user-defined userspace handler thread. This is still pretty much an actual copy, but is atomic to the faulting thread and gives you more control. It could potentially also perform better in general since the copying is controlled by you and can therefore be done only if/when needed (i.e. at the first read fault), while in the case of a normal mmap + copy you always do the copying regardless if the page will ever be accessed later or not.
There is a pretty good example program in the manual page for userfaultfd(2) which I linked above, so I'm not going to copy-paste it here. It deals with one or more pages and should give you an idea about the whole API.
Simpler case - control over the first mapping
In the case you do have control over the first mapping which you want to "alias", then you can simply create a shared mapping. What you are looking for is memfd_create(2). You can use it to create an anonymous file which can then be mmaped multiple times with different permissions.
Here's a simple example:
#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/types.h>
int main(void) {
int memfd;
void *mem_ro, *mem_rw;
// Create a memfd
memfd = memfd_create("something", 0);
if (memfd == -1) {
perror("memfd_create failed");
return 1;
}
// Give the file a size, otherwise reading/writing will fail
if (ftruncate(memfd, 0x1000) == -1) {
perror("ftruncate failed");
return 1;
}
// Map the fd as read only and private
mem_ro = mmap(NULL, 0x1000, PROT_READ, MAP_PRIVATE, memfd, 0);
if (mem_ro == MAP_FAILED) {
perror("mmap failed");
return 1;
}
// Map the fd as read/write and shared (shared is needed if we want
// write operations to be propagated to the other mappings)
mem_rw = mmap(NULL, 0x1000, PROT_READ|PROT_WRITE, MAP_SHARED, memfd, 0);
if (mem_rw == MAP_FAILED) {
perror("mmap failed");
return 1;
}
printf("ro mapping # %p\n", mem_ro);
printf("rw mapping # %p\n", mem_rw);
// This write can now be read from both mem_ro and mem_rw
*(char *)mem_rw = 123;
// Test reading
printf("read from ro mapping: %d\n", *(char *)mem_ro);
printf("read from rw mapping: %d\n", *(char *)mem_rw);
return 0;
}
Trying to search a pattern in a big file using mmap. The file is huge (way more than the physical memory). My worry is that if I used the file size as the second parameter for mmap(), there won't be enough physical memory to satisfy the system call. So I used 0x1000 as the length in the hope that OS will automatically map the right part of file as my pointer moves. But the following code snippet gave segmentation fault.
Any ideas?
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
long fileSize(char *fname) {
struct stat stat_buf;
int rc = stat(fname, &stat_buf);
return rc == 0 ? stat_buf.st_size : -1;
}
int main(int argc, char *argv[]) {
long size = fileSize(argv[1]);
printf("size=%ld\n", size);
int fd = open(argv[1], O_RDONLY);
printf("fd=%d\n", fd);
char *p = mmap(0, 0x1000, PROT_READ, MAP_SHARED, fd, 0);
if (p == MAP_FAILED) {
perror ("mmap");
return 1;
}
long i;
int pktLen;
int *pInt;
for (i=0; i < size; i+=4) {
pInt = (int*)(p+i);
if (pInt[i] == 0x12345678) {
printf("found it at %ld\n", i); break;
}
}
if (i == size) {
printf("didn't find it\n");
}
close(fd);
return 0;
}
Update
Turned out I had a silly bug
The line
if (pInt[i] == 0x12345678) should have been if (pInt[0] == 0x12345678)
Use
struct stat info;
long page;
const char *map;
size_t size, mapping;
int fd, result;
page = sysconf(_SC_PAGESIZE);
if (page < 1L) {
fprintf(stderr, "Invalid page size.\n");
exit(EXIT_FAILURE);
}
fd = open(filename, O_RDONLY);
if (fd == -1) {
fprintf(stderr, "%s: Cannot open file: %s.\n", filename, strerror(errno));
exit(EXIT_FAILURE);
}
result = fstat(fd, &info);
if (result == -1) {
fprintf(stderr, "%s: Cannot get file information: %s.\n", filename, strerror(errno));
close(fd);
exit(EXIT_FAILURE);
}
if (info.st_size <= 0) {
fprintf(stderr, "%s: No data.\n", filename);
close(fd);
exit(EXIT_FAILURE);
}
size = info.st_size;
if ((off_t)size != info.st_size) {
fprintf(stderr, "%s: File is too large to map.\n", filename);
close(fd);
exit(EXIT_FAILURE);
}
/* mapping is size rounded up to a multiple of page. */
if (size % (size_t)page)
mapping = size + page - (size % (size_t)page);
else
mapping = size;
map = mmap(NULL, mapping, PROT_READ, MAP_SHARED | MAP_NORESERVE, fd, 0);
if (map == MAP_FAILED) {
fprintf(stderr, "%s: Cannot map file: %s.\n", filename, strerror(errno));
close(fd);
exit(EXIT_FAILURE);
}
if (close(fd)) {
fprintf(stderr, "%s: Unexpected error closing file descriptor.\n", filename);
exit(EXIT_FAILURE);
}
/*
* Use map[0] to map[size-1], but remember that it is not a string,
* and that there is no trailing '\0' at map[size].
*
* Accessing map[size] to map[mapping-1] is not allowed, and may
* generate a SIGBUS signal (and kill the process).
*/
/* The mapping is automatically torn down when the process exits,
* but you can also unmap it with */
munmap(map, mapping);
The important points in the code above:
You'll need to start your code with e.g.
#define _POSIX_C_SOURCE 200809L
#define _BSD_SOURCE
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <string.h>
#include <errno.h>
The _BSD_SOURCE is required for MAP_NORESERVE to be defined, even though it is a GNU/Linux-specific feature.
mapping (length in man 2 mmap) must be a multiple of page size (sysconf(_SC_PAGESIZE)).
MAP_NORESERVE flag tells the kernel that the mapping is backed by the file only, and as such, is allowed to be larger than available RAM + SWAP.
You can (but do not need to) close the file descriptor referring to the mapped file with no issues, because the mapping itself contains a reference in-kernel.
Years ago, on a different forum, I showed a simple program to manipulate a terabyte of data (1 TiB = 1,099,511,627,776 bytes) using this very approach (although it uses a sparse backing file; i.e. mostly implicit zeroes, with less than 250 MB of actual data written to the backing file -- mostly to reduce the amount of disk space needed). Of course, it requires a 64-bit machine running Linux, as the virtual memory on 32-bit machines is limited to 232 = 4 GiB (Linux does not support segmented memory models).
The Linux kernel is surprisingly efficient in choosing which pages to keep in RAM, and which pages to evict. Of course, you can make that even more efficient, by telling the kernel which parts of the mapping you are unlikely to access (and therefore can be evicted), by using posix_madvise(address, length, advice) with advice being POSIX_MADV_DONTNEED or POSIX_MADV_WILLNEED. This has the benefit that unlike unmapping the "dontneed" parts, you can, if you need to, re-access that part of the mapping. (If the pages are already evicted, the access to the mapping will just block until the pages are re-loaded to memory. In other words, you can use posix_madvise() to "optimize" eviction logic, without limiting what part of the mapping can be accessed.)
In your case, if you do a linear or semi-linear search over the data using e.g. memmem(), you can use posix_madvise(map, mapping, POSIX_MADV_SEQUENTIAL).
Personally, I'd run the search first without using any posix_madvise() calls, and then see if it makes a significant enough positive difference, using the same data set (and several runs, of course). (You can safely -- with no risk of losing any data -- clear the page cache between test runs using sudo sh -c 'sync ; echo 3 > /proc/sys/vm/drop_caches ; sync', if you wish to exclude the effects of having the large file (mostly) already cached, between timing runs.)
The SIGSEGV is because you're accessing beyond 0x1000 bytes (in the for loop). You have to mmap() the complete size bytes of the fd.
The concept of demand paging in virtual memory subsystem helps exact same scenarios like yours - applications/application data bigger than the physical memory size. After the mmap(), as and when you access the (virtual) address, if there is no physical page mapped to it (page fault), kernel will find out a physical page that can be used (page replacement).
fd = open(argv[1], O_RDONLY);
ptr = mmap(NULL, file_size, PROT_READ, MAP_PRIVATE, fd, 0);
/* Consume the entire file's data as needed */
munmap(ptr, file_size);
Alternately you can put a loop around the mmap()/munmap() to scan the file in PAGE_SIZE or in multiples of PAGE_SIZE. The last arg of mmap() - offset will come handy for that.
From man-page :
void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset);
int munmap(void *addr, size_t length);
Pseudo-code :
fd = open(argv[1], O_RDONLY);
last_block_size = file_size % PAGE_SIZE;
num_pages = file_size / PAGE_SIZE + (last_block_size ? 1 : 0)
for (int i = 0; i < num_pages; i++) {
block_size = last_block_size && (i == num_pages - 1) ? last_block_size : PAGE_SIZE;
ptr = mmap(NULL, block_size, PROT_READ, MAP_PRIVATE, fd, i * PAGE_SIZE);
/* Consume the file's data range (ptr, ptr+block_size-1) as needed */
munmap(ptr, block_size);
}
Please use MAP_PRIVATE as the mapping might be just needed for your process alone. It just avoids few extra steps by the kernel for the MAP_SHARED.
Edit : It should have been MAP_PRIVATE in place of MAP_ANON. Changed.
Let's say I have the standard "Hello, World! \n" saved to a text file called hello.txt. If I want to change the 'H' to a 'R' or something, can I achieve this with mmap()?
mmap does not exist in the standard C99 (or C11) specification. It is defined in POSIX.
So assuming you have a POSIX system (e.g. Linux), you could first open(2) the file for read & write:
int myfd = open("hello.txt", O_RDWR);
if (myfd<0) { perror("hello.txt open"); exit(EXIT_FAILURE); };
Then you get the size (and other meta-data) of the file with fstat(2):
struct stat mystat = {};
if (fstat(myfd,&mystat)) { perror("fstat"); exit(EXIT_FAILURE); };
Now the size of the file is in mystat.st_size.
off_t myfsz = mystat.st_size;
Now we can call mmap(2) and we need to share the mapping (to be able to write inside the file thru the virtual address space)
void*ad = mmap(NULL, myfsz, PROT_READ|PROT_WRITE, MAP_SHARED,
myfd, 0);
if (ad == MMAP_FAILED) { perror("mmap"); exit(EXIT_FAILURE); };
Then we can overwrite the first byte (and we check that indeed the first byte in that file is H since you promised so):
assert (*(char*ad) == 'H');
((char*)ad) = 'R';
We might call msync(2) to ensure the file is updated right now on the disk. If we don't, it could be updated later.
Notably for very large mappings (notably those much larger than available RAM), we can assist the kernel (and its page cache) with hints given thru madvise(2) or posix_madvise(3)...
Notice that a mapping remains in effect even after a close(2). Use munmap & mprotect or mmap with MAP_FIXED on the same address range to change them.
On Linux, you could use proc(5) to query the address space. So your program could read (e.g. after fopen, using fgets in a loop) the pseudo /proc/self/maps file (or /proc/1234/maps for process of pid 1234).
BTW, mmap is used by dlopen(3); it can be called a lot of times, my manydl.c program demonstrates that on Linux you could have many hundreds of thousands of dlopen-ed shared files (so many hundreds of thousands of memory mappings).
Here's a working example.
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/mman.h>
int main(){
int myFile = open("hello.txt", O_RDWR);
if(myFile < 0){
printf("open error\n");
}
struct stat myStat = {};
if (fstat(myFile, &myStat)){
printf("fstat error\n");
}
off_t size = myStat.st_size;
char *addr;
addr = mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_SHARED, myFile, 0);
if (addr == MAP_FAILED){
printf("mmap error\n");
}
if (addr[0] != 'H'){
printf("Error: first char in file not H");
}
addr[0] = 'J';
return 0;
}
When opening the file /dev/urandom in nonblocking mode it is still blocking when reading. Why is the read call still blocking.
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <errno.h>
int main(int argc, char *argv[])
{
int fd = open("/dev/urandom", O_NONBLOCK);
if (fd == -1) {
printf("Unable to open file\n");
return 1;
}
int flags = fcntl(fd, F_GETFL);
if (flags & O_NONBLOCK) {
printf("non block is set\n");
}
int ret;
char* buf = (char*)malloc(10000000);
ret = read(fd, buf, 10000000);
if (ret == -1) {
printf("Error reading: %s\n", strerror(errno));
} else {
printf("bytes read: %d\n", ret);
}
return 0;
}
The output looks like this:
gcc nonblock.c -o nonblock
./nonblock
non block is set
bytes read: 10000000
Opening any (device) file in nonblocking mode does not mean you never need to wait for it.
O_NONBLOCK just says return EAGAIN if there is no data available.
Obviously, the urandom driver always considers to have data available, but isn't necessarily fast to deliver it.
/dev/urandom is non-blocking by design:
When read, the /dev/random device will only return random bytes
within the estimated number of bits of noise in the entropy pool.
/dev/random should be suitable for uses that need very high quality
randomness such as one-time pad or key generation. When the entropy
pool is empty, reads from /dev/random will block until additional
environmental noise is gathered.
A read from the /dev/urandom device will not block waiting for more
entropy. As a result, if there is not sufficient entropy in the
entropy pool, the returned values are theoretically vulnerable to a
cryptographic attack on the algorithms used by the driver.
If you replace it with /dev/random, your program should produce a different result.
In Linux, it is not possible to open regular files in non blocking mode. You have to use the AIO interface to read from /dev/urandom in non blocking mode.
I've been working with large sparse files on openSUSE 11.2 x86_64. When I try to mmap() a 1TB sparse file, it fails with ENOMEM. I would have thought that the 64 bit address space would be adequate to map in a terabyte, but it seems not. Experimenting further, a 1GB file works fine, but a 2GB file (and anything bigger) fails. I'm guessing there might be a setting somewhere to tweak, but an extensive search turns up nothing.
Here's some sample code that shows the problem - any clues?
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>
int main(int argc, char *argv[]) {
char * filename = argv[1];
int fd;
off_t size = 1UL << 40; // 30 == 1GB, 40 == 1TB
fd = open(filename, O_RDWR | O_CREAT | O_TRUNC, 0666);
ftruncate(fd, size);
printf("Created %ld byte sparse file\n", size);
char * buffer = (char *)mmap(NULL, (size_t)size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
if ( buffer == MAP_FAILED ) {
perror("mmap");
exit(1);
}
printf("Done mmap - returned 0x0%lx\n", (unsigned long)buffer);
strcpy( buffer, "cafebabe" );
printf("Wrote to start\n");
strcpy( buffer + (size - 9), "deadbeef" );
printf("Wrote to end\n");
if ( munmap(buffer, (size_t)size) < 0 ) {
perror("munmap");
exit(1);
}
close(fd);
return 0;
}
The problem was that the per-process virtual memory limit was set to only 1.7GB. ulimit -v 1610612736 set it to 1.5TB and my mmap() call succeeded. Thanks, bmargulies, for the hint to try ulimit -a!
Is there some sort of per-user quota, limiting the amount of memory available to a user process?
My guess is the the kernel is having difficulty allocating the memory that it needs to keep up with this memory mapping. I don't know how swapped out pages are kept up with in the Linux kernel (and I assume that most of the file would be in the swapped out state most of the time), but it may end up needing an entry for each page of memory that the file takes up in a table. Since this file might be mmapped by more than one process the kernel has to keep up with the mapping from the process's point of view, which would map to another point of view, which would map to secondary storage (and include fields for device and location).
This would fit into your addressable space, but might not fit (at least contiguously) within physical memory.
If anyone knows more about how Linux does this I'd be interested to hear about it.