Inject shared library into a process - c

I just started to learn injection techniques in Linux and want to write a simple program to inject a shared library into a running process. (The library will simply print a string.) However, after a couple of hours of research, I couldn't find any complete example. Well, I did figure out I probably need to use ptrace() to pause the process and inject the contents, but I am not sure how to load the library into the memory space of the target process and handle the relocation stuff in C code. Does anyone know any good resources or working examples for shared library injection? (Of course, I know there are existing libraries like hotpatch I can use to make injection much easier, but that's not what I want.)
And if anyone can write some pseudo code or give me an example, I will appreciate it. Thanks.
PS: I am not asking about LD_PRELOAD trick.

The "LD_PRELOAD trick" André Puel mentioned in a comment to the original question, is no trick, really. It is the standard method of adding functionality -- or more commonly, interposing existing functionality -- in a dynamically-linked process. It is standard functionality provided by ld.so, the Linux dynamic linker.
The Linux dynamic linker is controlled by environment variables (and configuration files); LD_PRELOAD is simply an environment variable that provides a list of dynamic libraries that should be linked against each process. (You could also add the library to /etc/ld.so.preload, in which case it is automatically loaded for every binary, regardless of the LD_PRELOAD environment variable.)
Here's an example, example.c:
#include <unistd.h>
#include <errno.h>
static void init(void) __attribute__((constructor));
static void wrerr(const char *p)
{
const char *q;
int saved_errno;
if (!p)
return;
q = p;
while (*q)
q++;
if (q == p)
return;
saved_errno = errno;
while (p < q) {
ssize_t n = write(STDERR_FILENO, p, (size_t)(q - p));
if (n > 0)
p += n;
else
if (n != (ssize_t)-1 || errno != EINTR)
break;
}
errno = saved_errno;
}
static void init(void)
{
wrerr("I am loaded and running.\n");
}
Compile it to libexample.so using
gcc -Wall -O2 -fPIC -shared example.c -ldl -Wl,-soname,libexample.so -o libexample.so
If you then run any (dynamically linked) binary with the full path to libexample.so listed in the LD_PRELOAD environment variable, the binary will output "I am loaded and running." to standard error before its normal output. For example,
LD_PRELOAD=$PWD/libexample.so date
will output something like
I am loaded and running.
Mon Jun 23 21:30:00 UTC 2014
Note that the init() function in the example library is automatically executed, because it is marked __attribute__((constructor)); that attribute means the function will be executed prior to main().
My example library may seem funny to you -- no printf() et cetera, wrerr() messing with errno --, but there are very good reasons I wrote it like this.
First, errno is a thread-local variable. If you run some code, initially saving the original errno value, and restoring that value just before returning, the interrupted thread will not see any change in errno. (And because it is thread-local, nobody else will see any change either, unless you try something silly like &errno.) Code that is supposed to run without the rest of the process noticing random effects, better make sure it keeps errno unchanged in this manner!
The wrerr() function itself is a simple function that writes a string safely to standard error. It is async-signal-safe (meaning you can use it in signal handlers, unlike printf() et al.), and other than errno which is kept unchanged, it does not affect the state of the rest of the process in any way. Simply put, it is a safe way to output strings to standard error. It is also simple enough for everybody to understand.
Second, not all processes use standard C I/O. For example, programs compiled in Fortran do not. So, if you try to use standard C I/O, it might work, it might not, or it might even confuse the heck out of the target binary. Using the wrerr() function avoids all that: it will just write the string to standard error, without confusing the rest of the process, no matter what programming language it was written in -- well, as long as that language's runtime does not move or close the standard error file descriptor (STDERR_FILENO == 2).
To load that library dynamically in a running process, you'll need to first attach ptrace to it, then stop it before next entry to a syscall (PTRACE_SYSEMU), to make sure you're somewhere you can safely do the dlopen call.
Check /proc/PID/maps to verify you are within the process' own code, not in shared library code. You can do PTRACE_SYSCALL or PTRACE_SYSEMU to continue to next candidate stopping point. Also, remember to wait() for the child to actually stop after attaching to it, and that you attach to all threads.
While stopped, use PTRACE_GETREGS to get the register state, and PTRACE_PEEKTEXT to copy enough code, so you can replace it with PTRACE_POKETEXT to a position-independent sequence that calls dlopen("/path/to/libexample.so", RTLD_NOW), RTLD_NOW being an integer constant defined for your architecture in /usr/include/.../dlfcn.h, typically 2. Since the pathname is constant string, you can save it (temporarily) over the code; the function call takes a pointer to it, after all.
Have that position-independent sequence you used to rewrite some of the existing code end with a syscall, so that you can run the inserted code using PTRACE_SYSCALL (in a loop, until it ends up at that inserted syscall) without having to single-step it. Then you use PTRACE_POKETEXT to revert the code to its original state, and finally PTRACE_SETREGS to revert the program state to what it initially was.
Consider this trivial program, compiled as say target:
#include <stdio.h>
int main(void)
{
int c;
while (EOF != (c = getc(stdin)))
putc(c, stdout);
return 0;
}
Let's say we're already running that (pid $(ps -o pid= -C target)), and we wish to inject code that prints "Hello, world!" to standard error.
On x86-64, kernel syscalls are done using the syscall instruction (0F 05 in binary; it's a two-byte instruction). So, to execute any syscall you want on behalf of a target process, you need to replace two bytes. (On x86-64 PTRACE_POKETEXT actually transfers a 64-bit word, preferably aligned on a 64-bit boundary.)
Consider the following program, compiled to say agent:
#define _GNU_SOURCE
#include <sys/ptrace.h>
#include <sys/user.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/syscall.h>
#include <string.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h> /* STDERR_FILENO */
int main(int argc, char *argv[])
{
struct user_regs_struct oldregs, regs;
unsigned long pid, addr, save[2];
siginfo_t info;
char dummy;
if (argc != 3 || !strcmp(argv[1], "-h") || !strcmp(argv[1], "--help")) {
fprintf(stderr, "\n");
fprintf(stderr, "Usage: %s [ -h | --help ]\n", argv[0]);
fprintf(stderr, " %s PID ADDRESS\n", argv[0]);
fprintf(stderr, "\n");
return 1;
}
if (sscanf(argv[1], " %lu %c", &pid, &dummy) != 1 || pid < 1UL) {
fprintf(stderr, "%s: Invalid process ID.\n", argv[1]);
return 1;
}
if (sscanf(argv[2], " %lx %c", &addr, &dummy) != 1) {
fprintf(stderr, "%s: Invalid address.\n", argv[2]);
return 1;
}
if (addr & 7) {
fprintf(stderr, "%s: Address is not a multiple of 8.\n", argv[2]);
return 1;
}
/* Attach to the target process. */
if (ptrace(PTRACE_ATTACH, (pid_t)pid, NULL, NULL)) {
fprintf(stderr, "Cannot attach to process %lu: %s.\n", pid, strerror(errno));
return 1;
}
/* Wait for attaching to complete. */
waitid(P_PID, (pid_t)pid, &info, WSTOPPED);
/* Get target process (main thread) register state. */
if (ptrace(PTRACE_GETREGS, (pid_t)pid, NULL, &oldregs)) {
fprintf(stderr, "Cannot get register state from process %lu: %s.\n", pid, strerror(errno));
ptrace(PTRACE_DETACH, (pid_t)pid, NULL, NULL);
return 1;
}
/* Save the 16 bytes at the specified address in the target process. */
save[0] = ptrace(PTRACE_PEEKTEXT, (pid_t)pid, (void *)(addr + 0UL), NULL);
save[1] = ptrace(PTRACE_PEEKTEXT, (pid_t)pid, (void *)(addr + 8UL), NULL);
/* Replace the 16 bytes with 'syscall' (0F 05), followed by the message string. */
if (ptrace(PTRACE_POKETEXT, (pid_t)pid, (void *)(addr + 0UL), (void *)0x2c6f6c6c6548050fULL) ||
ptrace(PTRACE_POKETEXT, (pid_t)pid, (void *)(addr + 8UL), (void *)0x0a21646c726f7720ULL)) {
fprintf(stderr, "Cannot modify process %lu code: %s.\n", pid, strerror(errno));
ptrace(PTRACE_DETACH, (pid_t)pid, NULL, NULL);
return 1;
}
/* Modify process registers, to execute the just inserted code. */
regs = oldregs;
regs.rip = addr;
regs.rax = SYS_write;
regs.rdi = STDERR_FILENO;
regs.rsi = addr + 2UL;
regs.rdx = 14; /* 14 bytes of message, no '\0' at end needed. */
if (ptrace(PTRACE_SETREGS, (pid_t)pid, NULL, &regs)) {
fprintf(stderr, "Cannot set register state from process %lu: %s.\n", pid, strerror(errno));
ptrace(PTRACE_DETACH, (pid_t)pid, NULL, NULL);
return 1;
}
/* Do the syscall. */
if (ptrace(PTRACE_SINGLESTEP, (pid_t)pid, NULL, NULL)) {
fprintf(stderr, "Cannot execute injected code to process %lu: %s.\n", pid, strerror(errno));
ptrace(PTRACE_DETACH, (pid_t)pid, NULL, NULL);
return 1;
}
/* Wait for the client to execute the syscall, and stop. */
waitid(P_PID, (pid_t)pid, &info, WSTOPPED);
/* Revert the 16 bytes we modified. */
if (ptrace(PTRACE_POKETEXT, (pid_t)pid, (void *)(addr + 0UL), (void *)save[0]) ||
ptrace(PTRACE_POKETEXT, (pid_t)pid, (void *)(addr + 8UL), (void *)save[1])) {
fprintf(stderr, "Cannot revert process %lu code modifications: %s.\n", pid, strerror(errno));
ptrace(PTRACE_DETACH, (pid_t)pid, NULL, NULL);
return 1;
}
/* Revert the registers, too, to the old state. */
if (ptrace(PTRACE_SETREGS, (pid_t)pid, NULL, &oldregs)) {
fprintf(stderr, "Cannot reset register state from process %lu: %s.\n", pid, strerror(errno));
ptrace(PTRACE_DETACH, (pid_t)pid, NULL, NULL);
return 1;
}
/* Detach. */
if (ptrace(PTRACE_DETACH, (pid_t)pid, NULL, NULL)) {
fprintf(stderr, "Cannot detach from process %lu: %s.\n", pid, strerror(errno));
return 1;
}
fprintf(stderr, "Done.\n");
return 0;
}
It takes two parameters: the pid of the target process, and the address at which to temporarily place the injected executable code.
The two magic constants, 0x2c6f6c6c6548050fULL and 0x0a21646c726f7720ULL, are simply the native representation on x86-64 for the 16 bytes
0F 05 "Hello, world!\n"
with no string-terminating NUL byte. Note that the string is 14 characters long, and starts two bytes after the original address.
On my machine, running cat /proc/$(ps -o pid= -C target)/maps -- which shows the complete address mapping for the target -- shows that target's code is located at 0x400000 .. 0x401000. objdump -d ./target shows that there is no code after 0x4006ef or so. Therefore, addresses 0x400700 to 0x401000 are reserved for executable code, but do not contain any. The address 0x400700 -- on my machine; may very well differ on yours! -- is therefore a very good address for injecting code into target while it is running.
Running ./agent $(ps -o pid= -C target) 0x400700 injects the necessary syscall code and string to the target binary at 0x400700, executes the injected code, and replaces the injected code with original code. Essentially, it accomplishes the desired task: for target to output "Hello, world!" to standard error.
Note that Ubuntu and some other Linux distributions nowadays allow a process to ptrace only their child processes running as the same user. Since target is not a child of agent, you either need to have superuser privileges (run sudo ./agent $(ps -o pid= -C target) 0x400700), or modify target so that it explicitly allows the ptracing (for example, by adding prctl(PR_SET_PTRACER, PR_SET_PTRACER_ANY); near the start of the program). See man ptrace and man prctl for details.
For longer or more complicated code, use ptrace to cause the target to first execute mmap(NULL, page_aligned_length, PROT_READ | PROT_EXEC, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0), which allocates executable memory for new code. So, on x86-64, you only need to locate one 64-bit word you can replace safely, and then you can PTRACE_POKETEXT the new code for the target to execute. While my example uses the write() syscall, it is a really small change to have it use the mmap() or mmap2() syscall instead.
(On x86-64 in Linux, the syscall number is in rax, and parameters in rdi, rsi, rdx, r10, r8, and r9, reading from left to right, respectively; and return value is also in rax.)
Parsing /proc/PID/maps is very useful -- see /proc/PID/maps under man 5 proc. It provides all the pertinent information on the target process address space. To find out whether there are useful unused code areas, parse objdump -wh /proc/$(ps -o pid= -C target)/exe output; it examines the actual binary of the target process directly. (In fact, you could easily find how much unused code there is at the end of the code mapping, and use that automatically.)
Further questions?

Related

perf_event_open tracepoint with bpf for a specific cpu

I want to write a C program that triggers execution of a bpf program when a syscall
is executed on a specific CPU by any process/thread
So the idea is to do a perf_event_open(pattr, -1, {MY_CPU_NUM}, -1, 0) followed
by ret = ioctl(efd, PERF_EVENT_IOC_SET_BPF, prog_fd);. My BPF program increments a counter in a map, that I am reading.
The specific system call I am using in my example is sys_exit_unlinkat, and I am testing the program with the command taskset --cpu-list {ANY_CPU_OTHER_THAN_MY_CPU_NUMBER} rm -rf {DIRECTORY}.
I expect that if I command to remove directory from a different core than where I placed my perf event, I should not see my counter increment. However, I see my counter increment irrespective of the cpu argument I provide in perf_event_open.
I don't understand why!
I tried looking at what perf record -C XX does: it issues a bunch of perf_event_open calls, including one with PERF_TYPE_TRACEPOINT and arguments similar to mine, and it works correctly, in that it shows output only when rm -rf is executed on MY_CPU_NUM.
Code Snippet:
pattr.type = PERF_TYPE_TRACEPOINT;
pattr.size = sizeof(pattr);
pattr.config =721; //unlinkat // 723; // rmdir
pattr.sample_period = 1;
pattr.wakeup_events = 1;
pattr.disabled = 1;
pattr.exclude_guest = 1;
pattr.sample_type = PERF_SAMPLE_RAW;
efd = perf_event_open(&pattr, -1, 0, -1, 0); // cpu number is zero
if(efd < 0) {
printf("error in efd opening, %s\n", strerror(errno));
exit(1);
}
ret = ioctl(efd, PERF_EVENT_IOC_SET_BPF, prog_fd);
if (ret < 0) {
printf("PERF_EVENT_IOC_SET_BPF error: %s\n", strerror(errno));
exit(-1);
}
ret = ioctl(efd, PERF_EVENT_IOC_ENABLE, 0);
if (ret < 0) {
printf("PERF_EVENT_IOC_ENABLE error: %s\n", strerror(errno));
exit(-1);
}
output of uname -a
Linux zephyr 5.4.0-110-generic in my machine.
EDIT-1:
Okay, I tried some noob debugging by putting the kernel into gdb
and trying to figure out the issue.
So, in the syscall_exit path, perf_syscall_exit (kernel/trace/trace_syscalls.c) is called, which then checks whether there is some perf event associated with the current cpu.
code snippet:
static void perf_syscall_exit(void *ignore, struct pt_regs *regs, long ret)
{
...
syscall_nr = trace_get_syscall_nr(current, regs);
if (syscall_nr < 0 || syscall_nr >= NR_syscalls)
return;
if (!test_bit(syscall_nr, enabled_perf_exit_syscalls))
return;
sys_data = syscall_nr_to_meta(syscall_nr);
if (!sys_data)
return;
head = this_cpu_ptr(sys_data->exit_event->perf_events);
valid_prog_array = bpf_prog_array_valid(sys_data->exit_event);
if (!valid_prog_array && hlist_empty(head)) // <--- WATCH
return;
...
Now, in the above code, see where I commented WATCH. What it checks, I think, is: if there is no valid BPF program array and the event list is empty, return. So, if a program is attached yet the event list for this cpu is empty, then irrespective of whether the cpu has an event attached or not, the early return is not taken and we go ahead executing the BPF program.
So, I checked by installing perf_event without attaching bpf program and I saw that the check passed and we did not go ahead when the rm -rf {DIRECTORY} was executed from a different cpu. And when I executed from the core 0(where event was attached), the check failed and the program proceeded ahead.
So does that mean, that in the kernel, we cannot attach BPF program to an event that is tied to a specific CPU? Is this a kernel bug? or design necessity?

how to get heap start address of a elf binary

I have written a sample program in C that uses libelf to dump the different sections.
However i want to dump the heap & stack segment starting address which is only available when the process is "live" (running).
This is what i do for a binary i read from disk.
elf_fd = open(elf_fname, O_RDWR);
if(elf_fd < 0) {
fprintf(stderr, "Failed to open \"%s\" reason=%s\n", elf_fname, strerror(errno));
return -1;
}
followed by
if(elf_version(EV_CURRENT) == EV_NONE) {
fprintf(stderr, "Failed to initialize libelf\n");
res = -1;
goto done;
}
elf->e = elf_begin(elf->fd, ELF_C_READ, NULL);
if(!elf->e) {
int err = elf_errno();
if (err != 0) {
fprintf(stderr, "Failed to open ELF file code=%d reason=%s\n", err,
elf_errmsg(err));
}
res = -1;
goto done;
}
This works fine when i read the binary image on disk
For run time what i tried is instead of doing this
That is using the elf_fd returned by open
elf_fd = open(elf_fname, O_RDWR);
if(elf_fd < 0) {
fprintf(stderr, "Failed to open \"%s\" reason=%s\n", elf_fname, strerror(errno));
return -1;
}
I instead do this
That is i get a handle from the pid of the current process
elf_fd = pidfd_open(getpid(), 0);
if (elf_fd == -1) {
perror("pidfd_open");
fprintf(stderr, "failed to open self %d\n", elf_fd);
exit(EXIT_FAILURE);
}
It returns me a valid descriptor but when i use this descriptor with
elf->e = elf_begin(elf->fd, ELF_C_READ, NULL);
if(!elf->e) {
int err = elf_errno();
if (err != 0) {
fprintf(stderr, "Failed to open ELF file code=%d reason=%s\n", err,
elf_errmsg(err));
}
}
It says "Invalid descriptor".
Question is how can i get heap & stack base address of a live process from within it
Also, yes, I did try calling sbrk(0) at the very start of main, and that seems to print the heap start address. This may not always be reliable, as there may be no heap without a prior malloc call, but for now it does seem to print it.
Question is how can i get heap & stack base address of a live process from within it
Note that neither heap, nor stack have anything to do with the ELF format, or libelf.
There is no such thing as "heap base address" -- most modern heap allocators will perform multiple mmap calls to obtain memory from the OS, then "dole" it out to various malloc requests.
i did also try at the very start in main call sbrk(0)
"Legacy" malloc used to obtain memory using sbrk(), but few modern ones do. If the malloc you are using does use sbrk, then calling sbrk(0) near the start of main is a usable approximation.
For the main thread stack, you would want to do the same. A good first approximation is taking &argc, and rounding it up to page boundary.
If you want to get a better approximation, you could use the fact that on Linux (and possibly other ELF platforms) the kernel puts specific values on the stack before invoking the entry point. Iterating through the __environ values looking for the highest address will give a better approximation.

Logging compatibly with logrotate

I am writing a Linux daemon that writes a log. I'd like the log to be rotated by logrotate. The program is written in C.
Normally, my program would open the log file when it starts, then write entries as needed and then, finally, close the log file on exit.
What do I need to do differently in order to support log rotation using logrotate? As far as I have understood, my program should be able to reopen the log file each time logrotate has finished its work. The sources that I googled didn't, however, specify what reopening the log file exactly means. Do I need to do something about the old file and can I just create another file with the same name? I'd prefer quite specific instructions, like some simple sample code.
I also understood that there should be a way to tell my program when it is time to do the reopening. My program already has a D-Bus interface and I thought of using that for those notifications.
Note: I don't need instructions on how to configure logrotate. This question is only about how to make my own software compatible with it.
There are several common ways:
you use logrotate and your program should be able to catch a signal (usually SIGHUP) as a request to close and reopen its log file. Then logrotate sends the signal in a postrotate script
you use logrotate and your program is not aware of it, but can be restarted. Then logrotate restarts your program in a postrotate script. Cons: if the start of the program is expensive, this may be suboptimal
you use logrotate and your program is not aware of it, but you pass the copytruncate option to logrotate. Then logrotate copies the file and then truncates it. Cons: because of a race condition, you can lose messages. From the logrotate manpage:
... Note that there is a very small time slice between copying the file and truncating it, so some logging data might be lost...
you use rotatelogs, a utility that ships with the Apache HTTP Server (httpd). Instead of writing directly to a file, your program pipes its logs to rotatelogs. Then rotatelogs manages the different log files. Cons: your program should be able to log to a pipe, or you will need to set up a named FIFO.
But beware, for critical logs, it may be interesting to close the files after each message, because it ensures that everything has reached the disk in case of an application crash.
Although man logrotate examples use the HUP signal, I recommend using USR1 or USR2, as it is common to use HUP for "reload configuration". So, in logrotate configuration file, you'd have for example
/var/log/yourapp/log {
rotate 7
weekly
postrotate
/usr/bin/killall -USR1 yourapp
endscript
}
The tricky bit is to handle the case where the signal arrives in the middle of logging. The fact that none of the locking primitives (other than sem_post(), which does not help here) are async-signal safe makes it an interesting issue.
The easiest way to do it is to use a dedicated thread, waiting in sigwaitinfo(), with the signal blocked in all threads. At exit time, the process sends the signal itself, and joins the dedicated thread. For example,
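#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#define ROTATE_SIGNAL SIGUSR1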
static pthread_t log_thread;
static pthread_mutex_t log_lock = PTHREAD_MUTEX_INITIALIZER;
static char *log_path = NULL;
static FILE *volatile log_file = NULL;
int log(const char *format, ...)
{
va_list args;
int retval;
if (!format)
return -1;
if (!*format)
return 0;
va_start(args, format);
pthread_mutex_lock(&log_lock);
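if (!log_file) {
/* Do not return with the lock held or the va_list still open. */
pthread_mutex_unlock(&log_lock);
va_end(args);
return -1;
}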
retval = vfprintf(log_file, format, args);
pthread_mutex_unlock(&log_lock);
va_end(args);
return retval;
}
void *log_sighandler(void *unused)
{
siginfo_t info;
sigset_t sigs;
int signum;
sigemptyset(&sigs);
sigaddset(&sigs, ROTATE_SIGNAL);
while (1) {
signum = sigwaitinfo(&sigs, &info);
if (signum != ROTATE_SIGNAL)
continue;
/* Sent by this process itself, for exiting? */
if (info.si_pid == getpid())
break;
pthread_mutex_lock(&log_lock);
if (log_file) {
fflush(log_file);
fclose(log_file);
log_file = NULL;
}
if (log_path) {
log_file = fopen(log_path, "a");
}
pthread_mutex_unlock(&log_lock);
}
/* Close time. */
pthread_mutex_lock(&log_lock);
if (log_file) {
fflush(log_file);
fclose(log_file);
log_file = NULL;
}
pthread_mutex_unlock(&log_lock);
return NULL;
}
/* Initialize logging to the specified path.
Returns 0 if successful, errno otherwise. */
int log_init(const char *path)
{
sigset_t sigs;
pthread_attr_t attrs;
int retval;
/* Block the rotate signal in all threads. */
sigemptyset(&sigs);
sigaddset(&sigs, ROTATE_SIGNAL);
pthread_sigmask(SIG_BLOCK, &sigs, NULL);
/* Open the log file. Since this is in the main thread,
before the rotate signal thread, no need to use log_lock. */
if (log_file) {
/* You're using this wrong. */
fflush(log_file);
fclose(log_file);
}
log_file = fopen(path, "a");
if (!log_file)
return errno;
log_path = strdup(path);
/* Create a thread to handle the rotate signal, with a tiny stack. */
pthread_attr_init(&attrs);
pthread_attr_setstacksize(&attrs, 65536);
retval = pthread_create(&log_thread, &attrs, log_sighandler, NULL);
pthread_attr_destroy(&attrs);
if (retval)
return errno = retval;
return 0;
}
void log_done(void)
{
pthread_kill(log_thread, ROTATE_SIGNAL);
pthread_join(log_thread, NULL);
free(log_path);
log_path = NULL;
}
The idea is that in main(), before logging or creating any other threads, you call log_init(path-to-log-file), noting that a copy of the log file path is saved. It sets up the signal mask (inherited by any threads you might create), and creates the helper thread. Before exiting, you call log_done(). To log something to the log file, use log() like you would use printf().
I'd personally also add a timestamp before the vfprintf() line, automatically:
struct timespec ts;
struct tm tm;
if (clock_gettime(CLOCK_REALTIME, &ts) == 0 &&
localtime_r(&(ts.tv_sec), &tm) == &tm)
fprintf(log_file, "%04d-%02d-%02d %02d:%02d:%02d.%03ld: ",
tm.tm_year + 1900, tm.tm_mon + 1, tm.tm_mday,
tm.tm_hour, tm.tm_min, tm.tm_sec,
ts.tv_nsec / 1000000L);
This YYYY-MM-DD HH:MM:SS.sss format has the nice benefit that it is close to a worldwide standard (ISO 8601) and sorts in the correct order.
Normally, my program would open the log file when it starts, then
write entries as needed and then, finally, close the log file on exit.
What do I need to do differently in order to support log rotation
using logrotate?
No, your program should work as if it doesn't know anything about logrotate.
Do I need to do something about the old file and can I just create another file with the same name?
No. There should be only one log file to be opened and written. With the copytruncate option, logrotate checks that file and, if it has become too large, copies the old contents aside and truncates the current log file. Your program can therefore work completely transparently: it doesn't need to know anything about logrotate.

How to find if the underlying Linux Kernel supports Copy on Write?

For example, I am working on an ancient kernel and want to know whether it really implements Copy on Write. Is there a way (preferably programmatically in C) to find out?
No, there isn't a reliable programmatic way to find that out from within a userland process.
The idea behind COW is that it should be fully transparent to the user code. Your code touches the individual pages, a page fault is invoked, the kernel copies the corresponding page and your process is resumed as if nothing had happened.
I casually stumbled upon this rather old question, and I see that other people already pointed out that it does not make much sense to "detect CoW" since Linux already implies CoW.
However I find this question pretty interesting, and while technically one should not be able to detect this kind of kernel mechanism which should be completely transparent to userspace processes, there actually are architecture specific ways (i.e. side-channels) that can be exploited to determine whether Copy on Write happens or not.
On x86 processors that support Restricted Transactional Memory, you can leverage the fact that memory transactions are aborted when an exception such as a page fault occurs. Given a valid address, this information can be used to detect if a page is resident in memory or not (similarly to the use of mincore(2)), or even to detect Copy on Write.
Here's a working example. Note: check that your processor supports RTM by looking at /proc/cpuinfo for the rtm flag, and compile using GCC without optimizations and with the -mrtm flag.
#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>
#include <immintrin.h>
/* Use x86 transactional memory to detect a page fault when trying to write
* at the specified address, assuming it's a valid address.
*/
static int page_dirty(void *page) {
unsigned char *p = page;
if (_xbegin() == _XBEGIN_STARTED) {
*p = 0;
_xend();
/* Transaction successfully ended => no context switch happened to
* copy page into virtual memory of the process => page was dirty.
*/
return 1;
} else {
/* Transaction aborted => page fault happened and context was switched
* to copy page into virtual memory of the process => page wasn't dirty.
*/
return 0;
}
/* Should not happen! */
return -1;
}
int main(void) {
unsigned char *addr;
addr = mmap(NULL, 0x1000, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
if (addr == MAP_FAILED) {
perror("mmap failed");
return 1;
}
// Write to trigger initial page fault and actually reserve memory
*addr = 123;
fprintf(stderr, "Initial state : %d\n", page_dirty(addr));
fputs("----- fork -----\n", stderr);
if (fork()) {
fprintf(stderr, "Parent before : %d\n", page_dirty(addr));
// Read (should NOT trigger Copy on Write)
*addr;
fprintf(stderr, "Parent after R: %d\n", page_dirty(addr));
// Write (should trigger Copy on Write)
*addr = 123;
fprintf(stderr, "Parent after W: %d\n", page_dirty(addr));
} else {
fprintf(stderr, "Child before : %d\n", page_dirty(addr));
// Read (should NOT trigger Copy on Write)
*addr;
fprintf(stderr, "Child after R : %d\n", page_dirty(addr));
// Write (should trigger Copy on Write)
*addr = 123;
fprintf(stderr, "Child after W : %d\n", page_dirty(addr));
}
return 0;
}
Output on my machine:
Initial state : 1
----- fork -----
Parent before : 0
Parent after R: 0
Parent after W: 1
Child before : 0
Child after R : 0
Child after W : 1
As you can see, writing to pages marked as CoW (in this case after fork), causes the transaction to fail because a page fault exception is triggered and causes a transaction abort. The changes are reverted by hardware before the transaction is aborted. After writing to the page, trying to do the same thing again results in the transaction correctly terminating and the function returning 1.
Of course, this should not really be used seriously, but merely be taken as a fun and interesting exercise. Since RTM transactions are aborted for any kind of exception and also for context switch, false negatives are possible (for example if the process is preempted by the kernel right in the middle of the transaction). Keeping the transaction code really short (in the above case just a branch and an assignment *p = 0) is essential. Multiple tests could also be made to avoid false negatives.

Using CreateFileMapping between two programs - C

I have two window form applications written in C, one holds a struct consisting of two integers, another will receive it using the CreateFileMapping.
Although not directly related I want to have three events in place so each of the processes can "speak" to each other, one saying that the first program has something to pass to the second, one saying the first one has closed and another saying the second one has closed.
What would be the best way about doing this exactly? I've looked at the MSDN entry for the CreateFileMapping operation but I'm still not sure as to how it should be done.
I didn't want to start implementing it without having some sort of clear idea as to what I need to do.
Thanks for your time.
A file mapping does not seem like the best way to handle this. It has a lot of overhead for simply sending two integers in one direction. For something like that, I'd consider something like a pipe. A pipe automates most of the other details, so (for example) attempting to read or write a pipe that's been closed on the other end will fail and GetLastError() will return ERROR_BROKEN_PIPE. To get the equivalent of the third event (saying there's something waiting) you work with the pipe in overlapped mode. You can wait on the pipe handle itself (see caveats in the documentation) or use an OVERLAPPED structure, which includes a handle for an event.
In answer to your question of how you WOULD do it if you wanted to use shared memory: you could use a byte in the shared memory to communicate between the two processes. Here is some sample code. You can easily replace the wait loops with semaphores.
//
// SharedMemoryServer.cpp : Defines the entry point for the console application.
//
//#include "stdafx.h"
#include <windows.h>
#include <stdio.h>
#include <conio.h> // getch()
#include <tchar.h>
#include "Aclapi.h" // SE_KERNEL_OBJECT
#define SM_NAME "Global\\SharedMemTest"
#define SIGNAL_NONE 0
#define SIGNAL_WANT_DATA 1
#define SIGNAL_DATA_READY 2
#define BUFF_SIZE 1920*1200*4
struct MySharedData
{
volatile unsigned char Flag; // volatile: polled in busy-wait loops, changed by the other process
unsigned char Buff[BUFF_SIZE];
};
int _tmain(int argc, _TCHAR* argv[])
{
HANDLE hFileMapping = CreateFileMapping (INVALID_HANDLE_VALUE, NULL, PAGE_READWRITE | SEC_COMMIT, 0, sizeof(MySharedData), SM_NAME);
if (hFileMapping == NULL)
printf ("CreateFileMapping failed");
else
{
// Grant anyone access
SetNamedSecurityInfo(SM_NAME, SE_KERNEL_OBJECT, DACL_SECURITY_INFORMATION, 0, 0, (PACL) NULL, NULL);
MySharedData* pSharedData = (MySharedData *) MapViewOfFile(hFileMapping, FILE_MAP_READ | FILE_MAP_WRITE, 0, 0, 0);
printf("Waiting for instructions\n");
while (pSharedData->Flag == SIGNAL_NONE) // Wait to be signaled for data
;
if (pSharedData->Flag == SIGNAL_WANT_DATA)
{
printf("Signal for data received\n");
size_t len = sizeof(pSharedData->Buff);
memset (pSharedData->Buff, 0xFF, len);
pSharedData->Flag = SIGNAL_DATA_READY;
printf("Data ready signal set\n");
// Wait for data to be consumed (busy-waits, with no timeout)
while (pSharedData->Flag != SIGNAL_NONE)
;
printf("Data consumed signal detected\n");
}
}
_getch();
return 0;
}
The client process would be equivalent but the code in the else case following the call to MapViewOfFile() would look something like this:
pSharedData->Flag = SIGNAL_WANT_DATA; // Signal for data
printf("Signal for data set\n");
while (pSharedData->Flag != SIGNAL_DATA_READY)
;
printf("Data ready signal detected\n");
if (pSharedData->Flag == SIGNAL_DATA_READY)
{
// Dump the first 10 bytes
printf ("Data received: %x %x %x %x %x %x %x %x %x %x\n",
pSharedData->Buff[0], pSharedData->Buff[1], pSharedData->Buff[2],
pSharedData->Buff[3], pSharedData->Buff[4], pSharedData->Buff[5],
pSharedData->Buff[6], pSharedData->Buff[7], pSharedData->Buff[8],
pSharedData->Buff[9]);
}
You can use CreateSemaphore and provide a name for the last parameter to create a named semaphore. Processes can share that semaphore (the other process would use OpenSemaphore). One process signals when the data is ready and the other can wait on it.
Having said this, I have to agree with Jerry that using a pipe might be a lot simpler to get it working. On the other hand, using the shared memory approach with semaphores (or events) may translate more simply to other platforms (e.g., Linux) if it becomes necessary to port it.

Resources