Using mprotect() to set main() as writeable - c

Using mprotect to set main() as writeable correctly works using this code.
https://godbolt.org/z/68vfrTq8z
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>
int main()
{
long page_size = sysconf(_SC_PAGE_SIZE);
printf("page_size = %li\n", page_size);
void* main_address = (void*)&main;
printf("main_address = %p\n",main_address);
void* main_start = (void*)(((uintptr_t)&main) & ~(sysconf(_SC_PAGE_SIZE) - 1));
printf("main_start = %p\n", main_start);
size_t main_size = (void*)&&main_end - (void*)&main;
printf("main_size = %d\n", main_size);
mprotect(main_start, main_size, PROT_READ | PROT_WRITE | PROT_EXEC);
// nop the breakpoint to allow printf() to run!
*((char*)&&breakpoint) = 0x90;
breakpoint:
__asm("int $3");
printf("\nHello, World!\n");
return 0;
main_end:
}
The program outputs
page_size = 4096
main_address = 0x401156
main_start = 0x401000
main_size = 193
Hello, World!
Question
I initially coded it passing main_address to mprotect but it failed with an "Invalid argument" (22) error because it was not aligned to a page boundary! Passing main_start works because it's the aligned address of the program entry point, but what is main_address and
why isn't it initially aligned too?
What's confusing me is that if mprotect(0x401000, 193, ...) is being write enabled, but the main_address is at 0x401156 which is 342 bytes further down in memory, how is the main() still writeable? Does mprotect() modify pages and not a specific byte range?
If pages are being write enabled, does that potentially imply memory portions past the end of main() are also writeable?

Per the POSIX documentation:
The mprotect() function shall change the access protections to be that specified by prot
for those whole pages containing any part of the address space of the process starting at
address addr and continuing for len bytes.
So by that, it will figure out which pages contain the specified bytes (effectively, rounding addr down to a page boundary and addr+len up to a page boundary) and change the permissions on those pages. So it may affect the permissions on bytes before and after the range requested.
However, under errors it says:
The mprotect() function may fail if:
EINVAL The addr argument is not a multiple of the page size as returned by sysconf().
So a non-page aligned addr may fail instead of rounding down, but a non-page aligned len will work. In practice, it depends on the actual OS -- some POSIX systems support unaligned addresses here and others do not.

Related

Mapping existing memory (data segment) to another memory segment

As the title suggests, I would like to ask if there is any way for me to map the data segment of my executable to another memory so that any changes to the second are updated instantly on the first. One initial thought I had was to use mmap, but unfortunately mmap requires a file descriptor and I do not know of a way to somehow open a file descriptor on my running processes memory. I tried to use shmget/shmat in order to create a shared memory object on the process data segment (&__data_start) but again I failed ( even though that might have been a mistake on my end as I am unfamiliar with the shm API). A similar question I found is this: Linux mapping virtual memory range to existing virtual memory range? , but the replies are not helpful.. Any thoughts are welcome.
Thank you in advance.
Some pseudocode would look like this:
extern char __data_start, _end;
char test = 'A';
int main(int argc, char *argv[]){
size_t size = &_end - &__data_start;
char *mirror = malloc(size);
magic_map(&__data_start, mirror, size); //this is the part I need.
printf("%c\n", test) // prints A
int offset = &test - &__data_start;
*(mirror + offset) = 'B';
printf("%c\n", test) // prints B
free(mirror);
return 0;
}
it appears I managed to solve this. To be honest I don't know if it will cause problems in the future and what side effects this might have, but this is it (If any issues arise I will try to log them here for future references).
Solution:
Basically what I did was use the mmap flags MAP_ANONYMOUS and MAP_FIXED.
MAP_ANONYMOUS: With this flag a file descriptor is no longer required (hence the -1 in the call)
MAP_FIXED: With this flag the addr argument is no longer a hint, but it will put the mapping on the address you specify.
MAP_SHARED: With this you have the shared mapping so that any changes are visible to the original mapping.
I have left in a comment the munmap function. This is because if unmap executes we free the data_segment (pointed to by &__data_start) and as a result the global and static variables are corrupted. When at_exit function is called after main returns the program will crash with a segmentation fault. (Because it tries to double free the data segment)
Code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#define _GNU_SOURCE 1
#include <unistd.h>
#include <sys/mman.h>
extern char __data_start;
extern char _end;
int test = 10;
int main(int argc, char *argv[])
{
size_t size = 4096;
char *shared = mmap(&__data_start, 4096, PROT_READ | PROT_WRITE, MAP_FIXED | MAP_ANONYMOUS | MAP_SHARED, -1, 0);
if(shared == (void *)-1){
printf("Cant mmap\n");
exit(-1);
}
printf("original: %p, shared: %p\n",&__data_start, shared);
size_t offset = (void *)&test - (void *)&__data_start;
*(shared+offset) = 50;
msync(shared, 4096, MS_SYNC);
printf("test: %d :: %d\n", test, *(shared+offset));
test = 25;
printf("test: %d :: %d\n", test, *(shared+offset));
//munmap(shared, 4096);
}
Output:
original: 0x55c4066eb000, shared: 0x55c4066eb000
test: 50 :: 50
test: 25 :: 25

How to use malloc with madvise and enable MADV_DONTDUMP option

I'm looking to use madvise and malloc but I have always the same error:
madvise error: Invalid argument
I tried to use the MADV_DONTDUMP to save some space in my binaries but it didn't work.
The page size is 4096.
int main(int argc, char *argv[])
{
void *p_optimize_object;
unsigned int optimize_object_size = 4096*256;
optimize_object_size = ((optimize_object_size / 4096) + 1) * 4096;
printf("optimize_object_size = %d\n", optimize_object_size);
p_optimize_object = malloc(optimize_object_size);
if (madvise(p_optimize_object, optimize_object_size, MADV_DONTDUMP | MADV_SEQUENTIAL) == -1)
{
perror("madvise error");
}
printf("OK\n");
return 0;
}
Here's the command:
$ gcc -g -O3 madvice.c && ./a.out
Output:
madvise error: Invalid argument
You can't and even if you could do it in certain cases with certain flags (and the flags you're trying to use here should be relatively harmless), you shouldn't. madvise operates on memory from lower level allocations than malloc gives you and messing with the memory from malloc will likely break malloc.
If you want some block of memory that you can call madvise on, you should obtain it using mmap.
Your usage of sizeof is wrong; you are allocating only four bytes of memory (sizeof unsigned int), and calling madvise() with a size argument of 1M for the same chunk of memory.
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
int main(int argc, char *argv[])
{
void *p_optimize_object;
unsigned int optimize_object_size = 4096*256;
optimize_object_size = ((optimize_object_size / 4096) + 1) * 4096;
printf("optimize_object_size = %d\n", optimize_object_size);
p_optimize_object = malloc(sizeof(optimize_object_size));
fprintf(stderr, "Allocated %zu bytes\n", sizeof(optimize_object_size));
if (madvise(p_optimize_object, optimize_object_size, MADV_WILLNEED | MADV_SEQUENTIAL) == -1)
{
perror("madvise error");
}
printf("OK\n");
return 0;
}
Output:
optimize_object_size = 1052672
Allocated 4 bytes
madvise error: Invalid argument
OK
UPDATE:
And the other problem is that malloc() can give you non-aligned memory (probably with an alignment of 4,8,16,...), where madvice() wants page-aligned memory:
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
int main(int argc, char *argv[])
{
void *p_optimize_object;
unsigned int optimize_object_size = 4096*256;
int rc;
optimize_object_size = ((optimize_object_size / 4096) + 1) * 4096;
printf("optimize_object_size = %d\n", optimize_object_size);
#if 0
p_optimize_object = malloc(sizeof(optimize_object_size));
fprintf(stderr, "Allocated %zu bytes\n", sizeof(optimize_object_size));
#elif 0
p_optimize_object = malloc(optimize_object_size);
fprintf(stderr, "Allocated %zu bytes\n", optimize_object_size);
#else
rc = posix_memalign (&p_optimize_object, 4096, optimize_object_size);
fprintf(stderr, "Allocated %zu bytes:%d\n", optimize_object_size, rc);
#endif
// if (madvise(p_optimize_object, optimize_object_size, MADV_WILLNEED | MADV_SEQUENTIAL) == -1)
if (madvise(p_optimize_object, optimize_object_size, MADV_WILLNEED | MADV_DONTFORK) == -1)
{
perror("madvise error");
}
printf("OK\n");
return 0;
}
OUTPUT:
$ ./a.out
optimize_object_size = 1052672
Allocated 1052672 bytes:0
OK
And the alignement requerement appears to be linux-specific:
Linux Notes
The current Linux implementation (2.4.0) views this system call more as a command than as advice and hence may return an error when it cannot
do what it usually would do in response to this advice. (See the ERRORS description above.) This is non-standard behavior.
The Linux implementation requires that the address addr be page-aligned, and allows length to be zero. If there are some parts of the speciā€
fied address range that are not mapped, the Linux version of madvise() ignores them and applies the call to the rest (but returns ENOMEM from
the system call, as it should).
Finally:
I tried to use the MADV_DONTDUMP to save some space in my binaries but it didn't work.
Which, of course, doesn't make sense. Malloc or posix_memalign add to your address space, making (at least) the VSIZ of your running program larger. What happens to the this space is completely in the hands of the (kernel) memory manager, driven by your program's references to the particular memory, with maybe a few hints from madvice.
I tried to use the MADV_DONTDUMP to save some space in my binaries but it didn't work.
Read again, and more carefully, the madvise(2) man page.
The address should be page aligned. The result of malloc is generally not page aligned (page size is often 4Kbytes, but see sysconf(3) for SC_PAGESIZE). Use mmap(2) to ask for a page-aligned segment in your virtual address space.
You won't save any space in your binary executable. You'll just save space in your core dump, see core(5). And core dumps should not happen. See signal(7) (read also about segmentation fault and undefined behaviour).
To disable core dumps, consider rather setrlimit(2) with RLIMIT_CORE (or the ulimit -c bash builtin in your terminal running a bash shell).

Why does a read operation on a memory mapped zero byte file lead to SIGBUS?

Here is the example code I wrote.
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
int main()
{
int fd;
long pagesize;
char *data;
if ((fd = open("foo.txt", O_RDONLY)) == -1) {
perror("open");
return 1;
}
pagesize = sysconf(_SC_PAGESIZE);
printf("pagesize: %ld\n", pagesize);
data = mmap(NULL, pagesize, PROT_READ, MAP_SHARED, fd, 0);
printf("data: %p\n", data);
if (data == (void *) -1) {
perror("mmap");
return 1;
}
printf("%d\n", data[0]);
printf("%d\n", data[1]);
printf("%d\n", data[2]);
printf("%d\n", data[4096]);
printf("%d\n", data[4097]);
printf("%d\n", data[4098]);
return 0;
}
If I provide a zero byte foo.txt to this program, it terminates with SIGBUS.
$ > foo.txt && gcc foo.c && ./a.out
pagesize: 4096
data: 0x7f8d882ab000
Bus error
If I provide a one byte foo.txt to this program, then there is no such issue.
$ printf A > foo.txt && gcc foo.c && ./a.out
pagesize: 4096
data: 0x7f5f3b679000
65
0
0
48
56
10
mmap(2) mentions the following.
Use of a mapped region can result in these signals:
SIGSEGV Attempted write into a region mapped as read-only.
SIGBUS Attempted access to a portion of the buffer that does not correspond to the file (for example, beyond the end of the file, including the case where another process has truncated the file).
So if I understand this correctly, even the second test case (1-byte file) should have led to SIGBUS because data[1] and data[2] are trying to access a portion of the buffer (data) that does not correspond to the file.
Can you help me understand why only a zero byte file causes this program to fail with SIGBUS?
You get SIGBUS when accessing past the end of last whole mapped page because the POSIX standard states:
The mmap() function can be used to map a region of memory that is larger than the current size of the object. Memory access within the mapping but beyond the current end of the underlying objects may result in SIGBUS signals being sent to the process.
With a zero-byte file, the entire page you mapped is "beyond the current end of the underlying object". So you get SIGBUS.
You do NOT get a SIGBUS when you go beyond the 4kB page you've mapped because that's not within your mapping. You don't get a SIGBUS accessing your mapping when your file is larger than zero bytes because the entire page gets mapped.
But you would get a SIGBUS if you mapped additional pages past the end of the file, such as mapping two 4kB pages for a 1-byte file. If you access that second 4kB page, you'd get SIGBUS.
A 1-byte file does not lead to the crash because mmap will map memory in multiples of the page size and zero the remainder. From the man page:
A file is mapped in multiples of the page size. For a file that is
not a multiple of the page size, the remaining memory is zeroed when
mapped, and writes to that region are not written out to the file.
The effect of changing the size of the underlying file of a mapping
on the pages that correspond to added or removed regions of the file
is unspecified.

copy whole of a file into memory using mmap

i want to copy whole of a file to memory using mmap in C.i write this code:
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <errno.h>
int main(int arg, char *argv[])
{
char c ;
int numOfWs = 0 ;
int numOfPr = 0 ;
int numberOfCharacters ;
int i=0;
int k;
int pageSize = getpagesize();
char *data;
float wsP = 0;
float prP = 0;
int fp = open("2.txt", O_RDWR);
data = mmap((caddr_t)0, pageSize, PROT_READ, MAP_SHARED, fp,pageSize);
printf("%s\n", data);
exit(0);
}
when i execute the code i get the Bus error message.
next, i want to iterate this copied file and do some thing on it.
how can i copy the file correctly?
2 things.
The second parameter of mmap() is the size of the portion of file you want to make visible in your address space. The last one is the offset in the file from which you want the map. This means that as you have called mmap() you will see only 1 page (on x86 and ARM it's 4096 bytes) starting at offset 4096 in your file. If your file is smaller than 4096 bytes, then there will be no mapping and mmap() will return MAP_FAILED (i.e. (caddr_t)-1). You didn't check the return value of the function so the following printf() dereferences an illegal pointer => BUS ERROR.
Using a memory map with string functions can be difficult. If the file doesn't contain binary 0. It can happen that these functions then try to access past the mapped size of the file and touch unmapped memory => SEGFAULT.
To open a memory for a file, you have to know the size of the file.
struct stat filestat;
if(fstat(fd, &filestat) !=0) {
perror("stat failed");
exit(1);
}
data = mmap(NULL, filestat.st_size, PROT_READ, MAP_SHARED, fp, 0);
if(data == MAP_FAILED) {
perror("mmap failed");
exit(2);
}
EDIT: The memory map will always be opened with a size that is a multiple of the pagesize. This means that the last page will be filled with 0 up to the next multiple of the pagesize. Often programs using memory mapped files with string functions (like your printf()) will work most of the time, but will suddenly crash when mapping a file whith a size exactly a multiple of the page size (4096, 8192, 12288 etc.). The often seen advice to pass to mmap() a size bigger than real file size works on Linux but is not portable and is even in violation of Posix, which explicitly states that mapping beyond the file size is undefined behaviour. The only portable way is to not use string functions on memory maps.
The last parameter of mmap is the offset within the file, where the part of file mapped to memory starts. It shall be 0 in your case
data = mmap(NULL, pageSize, PROT_READ, MAP_SHARED, fp,0);
If your file is shorter than pageSize, you will not be able to use addresses beyond the end of file. To use the full size, you shall expand the size to pageSize before calling mmap. Use something like:
ftruncate(fp, pageSize);
If you want to write to the memory (file) you shall use flag PROT_WRITE as well. I.e.
data = mmap(NULL, pageSize, PROT_READ|PROT_WRITE, MAP_SHARED, fp,0);
If your file does not contain 0 character (as end of string) and you want to print it as a string, you shall use printf with explicitly specified maximum size:
printf("%.*s\n", pageSize, data);
Also, of course, as pointed by #Jongware, you shall test result of open for -1 and mmap for MAP_FAILED.

Using system calls from C, how do I get the utilization of the CPU(s)?

In C on FreeBSD, how does one access the CPU utilization?
I am writing some code to handle HTTP redirects. If the CPU load goes above a threshold on a FReeBSD system, I want to redirect client requests. Looking over the man pages, kvm_getpcpu() seems to be the right answer, but the man pages (that I read) don't document the usage.
Any tips or pointers would be welcome - thanks!
After reading the answers here, I was able to come up with the below. Due to the poor documentation, I'm not 100% sure it is correct, but top seems to agree. Thanks to everyone who answered.
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/sysctl.h>
#include <unistd.h>
#define CP_USER 0
#define CP_NICE 1
#define CP_SYS 2
#define CP_INTR 3
#define CP_IDLE 4
#define CPUSTATES 5
int main()
{
long cur[CPUSTATES], last[CPUSTATES];
size_t cur_sz = sizeof cur;
int state, i;
long sum;
double util;
memset(last, 0, sizeof last);
for (i=0; i<6; i++)
{
if (sysctlbyname("kern.cp_time", &cur, &cur_sz, NULL, 0) < 0)
{
printf ("Error reading kern.cp_times sysctl\n");
return -1;
}
sum = 0;
for (state = 0; state<CPUSTATES; state++)
{
long tmp = cur[state];
cur[state] -= last[state];
last[state] = tmp;
sum += cur[state];
}
util = 100.0L - (100.0L * cur[CP_IDLE] / (sum ? (double) sum : 1.0L));
printf("cpu utilization: %7.3f\n", util);
sleep(1);
}
return 0;
}
From the MAN pages
NAME
kvm_getmaxcpu, kvm_getpcpu -- access per-CPU data
LIBRARY
Kernel Data Access Library (libkvm, -lkvm)
SYNOPSIS
#include <sys/param.h>
#include <sys/pcpu.h>
#include <sys/sysctl.h>
#include <kvm.h>
int
kvm_getmaxcpu(kvm_t *kd);
void *
kvm_getpcpu(kvm_t *kd, int cpu);
DESCRIPTION
The kvm_getmaxcpu() and kvm_getpcpu() functions are used to access the
per-CPU data of active processors in the kernel indicated by kd. The
kvm_getmaxcpu() function returns the maximum number of CPUs supported by
the kernel. The kvm_getpcpu() function returns a buffer holding the per-
CPU data for a single CPU. This buffer is described by the struct pcpu
type. The caller is responsible for releasing the buffer via a call to
free(3) when it is no longer needed. If cpu is not active, then NULL is
returned instead.
CACHING
These functions cache the nlist values for various kernel variables which
are reused in successive calls. You may call either function with kd set
to NULL to clear this cache.
RETURN VALUES
On success, the kvm_getmaxcpu() function returns the maximum number of
CPUs supported by the kernel. If an error occurs, it returns -1 instead.
On success, the kvm_getpcpu() function returns a pointer to an allocated
buffer or NULL. If an error occurs, it returns -1 instead.
If either function encounters an error, then an error message may be
retrieved via kvm_geterr(3.)
EDIT
Here's the kvm_t struct:
struct __kvm {
/*
* a string to be prepended to error messages
* provided for compatibility with sun's interface
* if this value is null, errors are saved in errbuf[]
*/
const char *program;
char *errp; /* XXX this can probably go away */
char errbuf[_POSIX2_LINE_MAX];
#define ISALIVE(kd) ((kd)->vmfd >= 0)
int pmfd; /* physical memory file (or crashdump) */
int vmfd; /* virtual memory file (-1 if crashdump) */
int unused; /* was: swap file (e.g., /dev/drum) */
int nlfd; /* namelist file (e.g., /kernel) */
struct kinfo_proc *procbase;
char *argspc; /* (dynamic) storage for argv strings */
int arglen; /* length of the above */
char **argv; /* (dynamic) storage for argv pointers */
int argc; /* length of above (not actual # present) */
char *argbuf; /* (dynamic) temporary storage */
/*
* Kernel virtual address translation state. This only gets filled
* in for dead kernels; otherwise, the running kernel (i.e. kmem)
* will do the translations for us. It could be big, so we
* only allocate it if necessary.
*/
struct vmstate *vmst;
};
I believe you want to look into 'man sysctl'.
I don't know the exact library, command, or system call; however, if you really get stuck, download the source code to top. It displays per-cpu stats when you use the "-P" flag, and it has to get that information from somewhere.

Resources