futex man page demo result incorrect

futex man page demo result incorrect - c

The futex man page provides a simple demo, but I can't get the result as the page described, the result seems to be deadlock on my machine (linux 5.2.1); the parent process isn't awaken by its child. Is the man page wrong?
Example of output on my machine:
[root#archlinux ~]# ./a.out
Child (12877) 0
Parent (12876) 0
Child (12877) 1
// block here
my system:
[root#archlinux ~]# uname -a
Linux archlinux 5.2.1-arch1-1-ARCH #1 SMP PREEMPT Sun Jul 14 14:52:52 UTC 2019 x86_64 GNU/Linux

Indeed the example in the man page is erroneous, at two places the code deviates from the correct depiction in the corresponding comment.
/* Acquire the futex pointed to by 'futexp': wait for its value to
become 1, and then set the value to 0. */
static void
fwait(int *futexp)
{
int s;
/* atomic_compare_exchange_strong(ptr, oldval, newval)
atomically performs the equivalent of:
if (*ptr == *oldval)
*ptr = newval;
It returns true if the test yielded true and *ptr was updated. */
while (1) {
/* Is the futex available? */
const int zero = 0;
if (atomic_compare_exchange_strong(futexp, &zero, 1))
break; /* Yes */
has to be the other way round:
/* Is the futex available? */
if (atomic_compare_exchange_strong(futexp, &(int){1}, 0))
break; /* Yes */
/* Release the futex pointed to by 'futexp': if the futex currently
has the value 0, set its value to 1 and the wake any futex waiters,
so that if the peer is blocked in fpost(), it can proceed. */
static void
fpost(int *futexp)
{
int s;
/* atomic_compare_exchange_strong() was described in comments above */
const int one = 1;
if (atomic_compare_exchange_strong(futexp, &one, 0)) {
…
has to be the other way round:
if (atomic_compare_exchange_strong(futexp, &(int){0}, 1)) {
…

Related

What is the purpose of uint64_t ps_pledge varibale in OpenBSD kernel code?

I am playing with OpenBSD kernel code, especially this file sys/kern/sched_bsd.c.
void
schedcpu(void *arg)
{
......
......
LIST_FOREACH(p, &allproc, p_list) {
/*
* Increment sleep time (if sleeping). We ignore overflow.
*/
if (p->p_stat == SSLEEP || p->p_stat == SSTOP)
p->p_slptime++;
p->p_pctcpu = (p->p_pctcpu * ccpu) >> FSHIFT;
/*
* If the process has slept the entire second,
* stop recalculating its priority until it wakes up.
*/
if (p->p_slptime > 1)
continue;
SCHED_LOCK(s);
/*
* p_pctcpu is only for diagnostic tools such as ps.
*/
....
....
LIST_FOREACH(TYPE *var, LIST_HEAD *head, LIST_ENTRY NAME);
The macro LIST_FOREACH traverses the list referenced by head in the forward direction, assigning each element in turn to var.
Now, here, p will contain the address of struct proc structure of eery process which is in the file
sys/sys/proc.h
Now, again, this structure contains another struct process *p_p structure, which denotes the properties of every process like its pid, flags, threads etc.
struct proc {
TAILQ_ENTRY(proc) p_runq;
LIST_ENTRY(proc) p_list; /* List of all threads. */
struct process *p_p; /* The process of this thread. */
TAILQ_ENTRY(proc) p_thr_link; /* Threads in a process linkage. */
TAILQ_ENTRY(proc) p_fut_link; /* Threads in a futex linkage. */
struct futex *p_futex; /* Current sleeping futex. */
/* substructures: */
struct filedesc *p_fd; /* copy of p_p->ps_fd */
struct vmspace *p_vmspace; /* copy of p_p->ps_vmspace */
#define p_rlimit p_p->ps_limit->pl_rlimit
....
....
Now, stucture struct process contains uint64_t ps_plegde.
struct process {
/*
* ps_mainproc is the original thread in the process.
* It's only still special for the handling of p_xstat and
* some signal and ptrace behaviors that need to be fixed.
*/
struct proc *ps_mainproc;
struct ucred *ps_ucred; /* Process owner's identity. */
....
....
u_short ps_acflag; /* Accounting flags. */
uint64_t ps_pledge;
uint64_t ps_execpledge;
....
....
Now, I wrote some modification in void schedcpu() function code.
void
schedcpu(void *arg)
{
pid_t pid;
uint64_t pledge_bit;
....
....
LIST_FOREACH(p, &allproc, p_list) {
pid=p->p_p->pid;
pledge_bit=p->p_p->ps_pledge;
if (pledge_bit) {
printf("pid: %10d pledge_bit: %10llu pledge_xbit:%10llx\n",pid,pledge_bit,pledge_bit);
}
/*
* Increment sleep time (if sleeping). We ignore overflow.
*/
if (p->p_stat == SSLEEP || p->p_stat == SSTOP)
p->p_slptime++;
p->p_pctcpu = (p->p_pctcpu * ccpu) >> FS
....
....
Here, Kernel log
pid: 37846 pledge_bit: 393359 pledge_xbit: 6008f
pid: 96037 pledge_bit: 393544 pledge_xbit: 60148
pid: 86032 pledge_bit: 264297 pledge_xbit: 40869
pid: 72264 pledge_bit: 393480 pledge_xbit: 60108
pid: 40102 pledge_bit: 8 pledge_xbit: 8
pid: 841 pledge_bit: 2148162527 pledge_xbit: 800a5bdf
pid: 49970 pledge_bit: 2148096143 pledge_xbit: 8009588f
pid: 68505 pledge_bit: 40 pledge_xbit: 28
pid: 46106 pledge_bit: 72 pledge_xbit: 48
pid: 77690 pledge_bit: 537161 pledge_xbit: 83249
pid: 44005 pledge_bit: 262152 pledge_xbit: 40008
pid: 82731 pledge_bit: 2148096143 pledge_xbit: 8009588f
pid: 71609 pledge_bit: 262472 pledge_xbit: 40148
pid: 54330 pledge_bit: 662063 pledge_xbit: a1a2f
pid: 77764 pledge_bit: 1052776 pledge_xbit: 101068
pid: 699 pledge_bit: 2148096143 pledge_xbit: 8009588f
pid: 84265 pledge_bit: 1052776 pledge_xbit: 101068
....
....
Now, Is that possible to know which process pledge what permissions, from looking at pledge_bit (decimal or hex values) that I got from above output?
I took pledge hex value of dhclient process i.e 0x8009588f, then, I wrote a sample hello world program with pledge("STDIO",NULL); and again I looked at dmesg and got same pledge_bit for hello world i.e 0x8009588f.
Then, this time I looked at dhclient source code and found out that, dhclient code pledged pledge("stdio inet dns route proc", NULL).
But, then, how is it possible to get same pledge hex bit for different pledge parameters?

I have got my answer after reading source code again and again.
So, I just want to contribute my learning so that future developers won't face any problem regarding this issue or confusion.
First understanding:
I wrote hello world like this,
void
main() {
pledge("STDIO", NULL); /* wrong use of pledge call */
printf("Hello world\n");
}
Correction of above code:
void
main() {
if (pledge("stdio", NULL) == -1) {
printf("Error\n");
}
printf("Hello world\n");
}
I forgot to check the return value from pledge().
Second understanding:
dhclient.c code contains pledge() call as:
int
main(int argc, char *argv[])
{
struct ieee80211_nwid nwid;
struct ifreq ifr;
struct stat sb;
const char *tail_path = "/etc/resolv.conf.tail";
....
....
fork_privchld(ifi, socket_fd[0], socket_fd[1]);
....
....
if ((cmd_opts & OPT_FOREGROUND) == 0) {
if (pledge("stdio inet dns route proc", NULL) == -1)
fatal("pledge");
} else {
if (pledge("stdio inet dns route", NULL) == -1)
fatal("pledge");
}
....
....
void
fork_privchld(struct interface_info *ifi, int fd, int fd2)
{
struct pollfd pfd[1];
struct imsgbuf *priv_ibuf;
ssize_t n;
int ioctlfd, routefd, nfds, rslt;
switch (fork()) {
case -1:
fatal("fork");
break;
case 0:
break;
default:
return;
}
....
....
}
Now, I wrote sample hello world code which contains the same parameters as dhclient:
void
main() {
if(pledge("stdio inet proc route dns", NULL) == -1) {
printf("Error\n");
}
while(1) {}
}
Now, I tested this above sample hello world code and got pledge_bit as 0x101068 in hex, which is correct.
But, as I told you earlier in my question, I became confused, when I saw different pledge_bit for dhclient. Because, as you guys know that both have same parameters in their pledge().
Then, how is it possible?
Now, here is the catch,
After continuously looking at dhclient source code, I found one function named as fork_privchld().
This function called before pledging in dhclient, so, it's like there is no pledge for this function as it is called before pledging.
So, just for verification again I wrote sample hello world code, but this time without any pledge() syscall.
void
main() {
printf("hello\n");
}
And, guess what, I got the same pledge_bit as 0x8009588f.
So, after this verification, it is verified that fork_privchld() function doesn't have any pledge bit set because as it is called before pledging in dhclient.
This function creates a [priv] child process for dhclient.
# ps aux|grep dhclient
root 26416 0.0 0.1 608 544 ?? Is 8:36AM 0:00.00 dhclient: em0 [priv] (dhclient)
_dhcp 33480 0.0 0.1 744 716 ?? Isp 8:36AM 0:00.00 dhclient: em0 (dhclient)
And, I don't know why I was only looking at the first process i.e [priv] (dhclient). This process is the one which is created by fork_privchld() function.
That's why this process has 0x8009588f pledge bit (due to called before pledging, so, no pledging at this point).
And, when I checked the second process i.e _dhcp, then I got my expected pledge_bit i.e 0x101068 (due to pledging).
So, I hope things will clear after reading this.
Note: Please, feel free to update me, if anything I forgot or missed.

Can we read and fault-inject another thread's program counter?

Assume that we have a single thread program and we hope to capture the value of program counter (PC) when a predefined interrupt occurs (like a timer interrupt).
It seems easy as you know we just write a specific assembly code using a special keyword __asm__ and pop the value on the top of the stack after making a shift 4 byte.
What about Multithreaded programs ?
How can we get values of all threads from another thread which run in the same process? (It seems extremely incredible to get values from thread which run on a separate core in multi-core processors).
(in multithreaded programs, every thread has its stack and registers too).
I want to implement a saboteur thread.
in order to perform fault injection in the target multi-threaded program, the model of fault is SEU (single error upset) which means that an arbitrary bit in the program counter register modified randomly (bit-flip) causing to violate the right program sequence. therefore, control flow error (CFE) occurs.
Since our target program is a multi-threaded program, we have to perform fault injection on all threads' PC. This is the task of saboteur tread. It should be able to obtain threads' PC to perform fault injection.
assume we have this code,
main ()
{
foo
}
void foo()
{
__asm__{
pop "%eax"
pop "%ebx" // now ebx holds porgram counter value (for main thread)
// her code injection like 00000111 XOR ebx for example
push ...
push ...
};
}
If our program was a multithreaded program.
is it means that we have more than one stack?
when OS perform context switching, it means that the stack and registers of the thread that was running moved to some place in the memory. Does this mean that if we want to get the values of the program counter for those threads, we find them in memory? where? and is it possible during run-time?

When you install a signal handler using sigaction() with SA_SIGINFO in the flags, the second parameter the signal handler gets is a pointer to siginfo_t, and the third parameter is a pointer to an ucontext_t. In Linux, this structure contains, among other things, the set of register values when the kernel interrupted the thread, including program counter.
#define _POSIX_C_SOURCE 200809L
#define _GNU_SOURCE
#include <signal.h>
#include <ucontext.h>
#if defined(__x86_64__)
#define PROGCOUNTER(ctx) (((ucontext *)ctx)->uc_mcontext.greg[REG_RIP])
#elif defined(__i386__)
#define PROGCOUNTER(ctx) (((ucontext *)ctx)->uc_mcontext.greg[REG_EIP])
#else
#error Unsupported architecture.
#endif
void signal_handler(int signum, siginfo_t *info, void *context)
{
const size_t program_counter = PROGCOUNTER(context);
/* Do something ... */
}
As usual, printf() et al. are not async-signal safe, which means it is not safe to use them in a signal handler. If you wish to output the program counter to e.g. standard error, you should not use any of the standard I/O to print to stderr, and instead construct the string to be printed by hand, and use a loop to write() the contents of the string; for example,
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
static void wrerr(const char *p)
{
const int saved_errno = errno;
const char *q = p;
ssize_t n;
/* Nothing to print? */
if (!p || !*p)
return;
/* Find end of q. strlen() is not async-signal safe. */
while (*q) q++;
/* Write data from p to q. */
while (p < q) {
n = write(STDERR_FILENO, p, (size_t)(q - p));
if (n > 0)
p += n;
else
if (n != -1 || errno != EINTR)
break;
}
errno = saved_errno;
}
Note that you'll want to keep the value of errno unchanged in the signal handler, so that if interrupted after a failed library function, the interrupted thread still sees the correct errno value. (It's mostly a debugging issue, and "good form"; some idiots pooh-pooh this as "it does not happen often enough for me to worry about".)
Your program can examine the /proc/self/maps pseudofile (it is not a real file, but something that the kernel generates on the fly when the file is read) to see the memory regions used by the program, to determine whether the program was running a C library function (very common) or something else when the interrupt was delivered.
If you wish to interrupt a specific thread in a multi-threaded program, just use pthread_kill(). Otherwise the signal is delivered to one of the threads that has not blocked the signal, more or less at random.
Here is an example program, that is tested to in x86-64 (AMD64) and x86, when compiled with GCC-4.8.4 using -Wall -O2:
#define _POSIX_C_SOURCE 200809L
#define _GNU_SOURCE
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <errno.h>
#include <signal.h>
#include <ucontext.h>
#include <time.h>
#include <stdio.h>
#if defined(__x86_64__)
#define PROGRAM_COUNTER(mctx) ((mctx).gregs[REG_RIP])
#define STACK_POINTER(mctx) ((mctx).gregs[REG_RSP])
#elif defined(__i386__)
#define PROGRAM_COUNTER(mctx) ((mctx).gregs[REG_EIP])
#define STACK_POINTER(mctx) ((mctx).gregs[REG_ESP])
#else
#error Unsupported hardware architecture.
#endif
#define MAX_SIGNALS 64
#define MCTX(ctx) (((ucontext_t *)ctx)->uc_mcontext)
static void wrerr(const char *p, const char *q)
{
while (p < q) {
ssize_t n = write(STDERR_FILENO, p, (size_t)(q - p));
if (n > 0)
p += n;
else
if (n != -1 || errno != EINTR)
break;
}
}
static const char hexc[16] = "0123456789abcdef";
static inline char *prehex(char *before, size_t value)
{
do {
*(--before) = hexc[value & 15];
value /= (size_t)16;
} while (value);
*(--before) = 'x';
*(--before) = '0';
return before;
}
static volatile sig_atomic_t done = 0;
static void handle_done(int signum)
{
done = signum;
}
static int install_done(const int signum)
{
struct sigaction act;
memset(&act, 0, sizeof act);
sigemptyset(&act.sa_mask);
act.sa_handler = handle_done;
act.sa_flags = 0;
if (sigaction(signum, &act, NULL) == -1)
return errno;
return 0;
}
static size_t jump_target[MAX_SIGNALS] = { 0 };
static size_t jump_stack[MAX_SIGNALS] = { 0 };
static void handle_jump(int signum, siginfo_t *info, void *context)
{
const int saved_errno = errno;
char buffer[128];
char *p = buffer + sizeof buffer;
*(--p) = '\n';
p = prehex(p, STACK_POINTER(MCTX(context)));
*(--p) = ' ';
*(--p) = 'k';
*(--p) = 'c';
*(--p) = 'a';
*(--p) = 't';
*(--p) = 's';
*(--p) = ' ';
*(--p) = ',';
p = prehex(p, PROGRAM_COUNTER(MCTX(context)));
*(--p) = ' ';
*(--p) = '#';
wrerr(p, buffer + sizeof buffer);
if (signum >= 0 && signum < MAX_SIGNALS) {
if (jump_target[signum])
PROGRAM_COUNTER(MCTX(context)) = jump_target[signum];
if (jump_stack[signum])
STACK_POINTER(MCTX(context)) = jump_stack[signum];
}
errno = saved_errno;
}
static int install_jump(const int signum, void *target, size_t stack)
{
struct sigaction act;
if (signum < 0 || signum >= MAX_SIGNALS)
return errno = EINVAL;
jump_target[signum] = (size_t)target;
jump_stack[signum] = (size_t)stack;
memset(&act, 0, sizeof act);
sigemptyset(&act.sa_mask);
act.sa_sigaction = handle_jump;
act.sa_flags = SA_SIGINFO;
if (sigaction(signum, &act, NULL) == -1)
return errno;
return 0;
}
int main(int argc, char *argv[])
{
const struct timespec sec = { .tv_sec = 1, .tv_nsec = 0L };
const int pid = (int)getpid();
ucontext_t ctx;
printf("Run\n");
printf("\tkill -KILL %d\n", pid);
printf("\tkill -TERM %d\n", pid);
printf("\tkill -HUP %d\n", pid);
printf("\tkill -INT %d\n", pid);
printf("or press Ctrl+C to stop this process, or\n");
printf("\tkill -USR1 %d\n", pid);
printf("\tkill -USR2 %d\n", pid);
printf("to send the respective signal to this process.\n");
fflush(stdout);
if (install_done(SIGTERM) ||
install_done(SIGHUP) ||
install_done(SIGINT) ) {
printf("Cannot install signal handlers: %s.\n", strerror(errno));
return EXIT_FAILURE;
}
getcontext(&ctx);
if (install_jump(SIGUSR1, &&usr1_target, STACK_POINTER(MCTX(&ctx))) ||
install_jump(SIGUSR2, &&usr2_target, STACK_POINTER(MCTX(&ctx))) ) {
printf("Cannot install signal handlers: %s.\n", strerror(errno));
return EXIT_FAILURE;
}
/* These are expressions that should evaluate to false, but the compiler
* should not be able to optimize them away. */
if (argv[0][1] == 'A') {
usr1_target:
fputs("USR1\n", stdout);
fflush(stdout);
}
if (argv[0][1] == 'B') {
usr2_target:
fputs("USR2\n", stdout);
fflush(stdout);
}
while (!done) {
putchar('.');
fflush(stdout);
nanosleep(&sec, NULL);
}
fputs("\nAll done.\n", stdout);
fflush(stdout);
return EXIT_SUCCESS;
}
If you save the above as example.c, you can compile it using
gcc -Wall -O2 example.c -o example
and run it
./example
Press Ctrl+C to exit the program. Copy the commands (for sending SIGUSR1 and SIGUSR2 signals), and run them from another window, and you'll see they modify the position for current execution. (The signals cause the program counter/instruction pointer to jump back, into an if clause that should never be executed otherwise.)
There are two sets of signal handlers. handle_done() just sets the done flag. handle_jump() outputs a message to standard error (using low-level I/O), and if specified, updates the program counter (instruction pointer) and stack pointer.
The stack pointer is the tricky part when creating an example program like this. It would be easy if we were satisfied with just crashing the program. However, an example is only useful if it works.
When we arbitrarily change the program counter/instruction pointer, and the interrupt was delivered when in a function call (most C library functions...), the return address is left on the stack. The kernel can deliver the interrupt at any point, so we cannot even assume that the interrupt was delivered when in a function call, either! So, to make sure the test program does not crash, I had to update the program counter/instruction pointer and stack pointer as a pair.
When a jump signal is received, the stack pointer is reset to a value I obtained using getcontext(). This is not guaranteed to be suitable for any jump location; it's just the best I could do for a minimal example. I definitely assume the jump labels are nearby, and not in subscopes where the compiler is likely to mess with the stack, mind you.
It is also important to keep in mind that because we are dealing with details left to the C compiler, we must conform to whatever binary code the compiler produces, not the other way around. For reliable manipulation of a process and its threads, ptrace() is a much better (and honestly, easier) interface. You just set up a parent process, and in the target traced child process, explicitly allow the tracing. I've shown examples here and here (both answers to the same question) on how to start, stop, and single-step individual threads in a target process. The hardest part is understanding the overall scheme, the concepts; the code itself is easier -- and much, much more robust than this signal-handler-context-manipulation way.
For self-introducing register errors (either to program counter/instruction pointer, or to any other register), with the assumption that most of the time that leads to the process crashing, this signal handler context manipulation should be sufficient.

No, it's not possible while a thread is executing. While a thread is executing, the current value of its program counter (EIP) is private to the CPU core it's running on. It's not available in memory anywhere.
It would be possible for an architecture to have special instructions to send inter-processor requests with queries about execution state, but x86 doesn't have this.
However, you can use ptrace system calls to do anything a debugger could; interrupt another thread and modify any of its state (general purpose registers, flags, program counter, etc. etc.) I can't give you an example, I just know that's the system call that debuggers use to modify the saved state of another thread / process. For example, this question asks about modifying another process's RIP using ptrace (for testing code-injection).
I'm not sure it's viable to ptrace one thread from another thread in the same process; your fault injector might work better as a separate process that interferes with the threads of another process.
Anyway, what will happen when you make a ptrace system call to modify something in another thread is that the CPU running your system call will send and inter-processor message to the kernel on the CPU running the other thread, which will interrupt that thread you want to mess with. Its state will be saved into memory by the kernel, where it can be modified by any CPU.
Once the other thread stops running, it isn't strongly associated with any CPU anymore. It will be cheaper to resume it on the CPU that already has hot caches for it, but that isn't guaranteed because that CPU could have started running any other thread once it was no longer busy running the thread you caused to be stopped.
Side note, not relevant to inter-thread fault injection:
Your C function for modifying EIP (foo()) is really ugly, BTW:
First of all, it's MSVC inline asm, so no Linux compiler will accept it (maybe icc?). Second, it only works with -fno-omit-frame-pointer, because it assumes that its inside a function that's pushed %ebp.
It would be so much easier to just write the whole function in asm. In 64bit non-inline asm, you'd just write:
global fault_inject_program_counter
fault_inject_program_counter:
xor qword [rsp], 0b00000111
ret
and assemble that file separately with NASM or YASM, and link the .o with code that calls it. (I'm assuming you'd prefer Intel syntax, since you used MSVC-style asm {} instead GNU C asm("pop ; ... ; "::: ); inline asm.)
an inline asm version might look like:
// this can't possibly work if inlined, or if compiled without `-fno-omit-frame-pointer
__attribute__((noinline)) void foo()
{
__asm__ volatile(
// "pop %eax\n\t"
// "pop %ebx\n\t" // now ebx holds the return address
// here code injection like 00000111 XOR ebx for example
// normal people would just write
"xorl $0b00000111, -4(%esp)\n\t"
// to modify the return value in-place, in a function with a frame pointer.
// push ...
// push ...
);
}

Time interrupt in C

with an ATmega328 I coded a time ago a LED-Matrix.
To achieve this, I used on the ATmega328 a timer interrupt (ISR), which was called every 10ms, so the Matrix didn't flickered.
Now I asked me, if operating systems (Windows, Linux, MacOSX) can do the same.
(I would expect it.)
Can somebody give me more infos, I don't find anything in the WWW.
I'd like to call a function every 1ms (just for learning).

There is two different ways to do it in kernel space or user space,
in kernel space I think you can do it on any OSes at least on Linux and Windows for sure.
From user space there is different real time extension of Linux and Windows,
for example https://rt.wiki.kernel.org/index.php/Main_Page, without usage of this modifications I doubt that you can call some function with 1ms period from user space.

Ok, I compiled now the latest kernel (Ubuntu Linux 15.04).
$ uname -a
Linux comments 4.1.10-rt11 #1 SMP PREEMPT RT Thu Nov 5 14:29:16 CET 2015 x86_64 x86_64 x86_64 GNU/Linux
Now I found this example:
int main(int argc, char* argv[])
{
struct timespec t;
struct sched_param param;
int interval = 50000; /* 50us*/
/* Declare ourself as a real time task */
param.sched_priority = MY_PRIORITY;
if(sched_setscheduler(0, SCHED_FIFO, &param) == -1) {
perror("sched_setscheduler failed");
exit(-1);
}
/* Lock memory */
if(mlockall(MCL_CURRENT|MCL_FUTURE) == -1) {
perror("mlockall failed");
exit(-2);
}
/* Pre-fault our stack */
stack_prefault();
clock_gettime(CLOCK_MONOTONIC ,&t);
/* start after one second */
t.tv_sec++;
while(1) {
/* wait until next shot */
clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &t, NULL);
/* do the stuff */
/* calculate next shot */
t.tv_nsec += interval;
while (t.tv_nsec >= NSEC_PER_SEC) {
t.tv_nsec -= NSEC_PER_SEC;
t.tv_sec++;
}
}
}
But I have more question, how can I determine the priority or how I can find out the priority from other running applications?
Does it works with threads too (libpthread)?

Why sem_open doesn't return the same value for the same name?

I'm running the program below, and as per sem_open's man page :
If a process makes repeated calls to sem_open(), with the same name
argument, the same descriptor is returned for each successful call,
unless sem_unlink() has been called on the semaphore in the interim.
I would expect sem1 and sem2 to be equal but seems like that they are not. The program prints semaphores are not equal to stdout.
Program:
#include <err.h>
#include <semaphore.h>
#include <stdio.h>
int main(int argc, const char * argv[]) {
const char *sem_name = "/sem";
sem_t *sem1, *sem2;
sem1 = sem_open(sem_name, O_CREAT, 0777, 0);
sem2 = sem_open(sem_name, O_CREAT, 0777, 0);
if (sem1 == SEM_FAILED || sem2 == SEM_FAILED) {
printf("SEM_VALUE_MAX is %ud\n", SEM_VALUE_MAX);
err(1, "SEM_FAILED");
}
if (sem1 != sem2) {
printf("semaphores are not equal\n");
return (2);
}
return (0);
}
Some additional information about my environment:
(jalcazar#mac ~)$ uname -a
Darwin mac.local 13.2.0 Darwin Kernel Version 13.2.0: Thu Apr 17 23:03:13 PDT 2014; root:xnu-2422.100.13~1/RELEASE_X86_64 x86_64
Also The Open Group Base Specifications Issue 7 says:
If a process makes multiple successful calls to sem_open() with the
same value for name, the same semaphore address shall be returned for
each such successful call.
I feel like I'm missing something very basic but haven't figured out what is it.
Any hint?
Edit:
A slightly modified version works as expected on Ubuntu 13.04 and FreeBSD 10.0.
It prints semaphores are not equal on OpenBSD 5.5 but since only bbf44dc795572df9c53f06b4ba06c4e51d8660a7502b8a0cd0b2b43081af314f.sem is in /tmp it would make sense to assume that it is the same semaphore.

You can't compare a sem_t* with == to determine if it's the same semaphore. sem1 == sem2 will check if the twp pointers are equal in the sense they both came from the same sem_open() call, not if they refer to the same underlying semaphore. The wording of the OSX man page sounds a bit dishonest in this regard.
There are however no functions or operators you can use to check if two sem_t* refers to the same semaphore object, albeit if you add these lines, it's reasonalbe to exepct they are if the output is 2:
int val = 9;
sem_post(sem1);
sem_post(sem1);
sem_getvalue(sem2, &val);
printf("sem2 value=%d\n", val);
Think of the sem_t* just as a handle that might refer to the same underlying semaphore. This is quite similar to calling open() twice on the same file, you get two different file descriptors back but they still refer to the same file.

Execute/invoke user-space program, and get its pid, from a kernel module

I checked out Kernel APIs, Part 1: Invoking user - space applications from the kernel, and Executing a user-space function from the kernel space - Stack Overflow - and here is a small kernel module, callmodule.c, demonstrating that:
// http://people.ee.ethz.ch/~arkeller/linux/code/usermodehelper.c
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/proc_fs.h>
#include <asm/uaccess.h>
static int __init callmodule_init(void)
{
int ret = 0;
char userprog[] = "/path/to/mytest";
char *argv[] = {userprog, "2", NULL };
char *envp[] = {"HOME=/", "PATH=/sbin:/usr/sbin:/bin:/usr/bin", NULL };
printk("callmodule: init %s\n", userprog);
/* last parameter: 1 -> wait until execution has finished, 0 go ahead without waiting*/
/* returns 0 if usermode process was started successfully, errorvalue otherwise*/
/* no possiblity to get return value of usermode process*/
ret = call_usermodehelper(userprog, argv, envp, UMH_WAIT_EXEC);
if (ret != 0)
printk("error in call to usermodehelper: %i\n", ret);
else
printk("everything all right\n");
return 0;
}
static void __exit callmodule_exit(void)
{
printk("callmodule: exit\n");
}
module_init(callmodule_init);
module_exit(callmodule_exit);
MODULE_LICENSE("GPL");
... with Makefile:
obj-m += callmodule.o
all:
make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules
clean:
make -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean
When I run this via sudo insmod ./callmodule.ko && sudo rmmod callmodule, I get in /var/log/syslog:
Feb 10 00:42:45 mypc kernel: [71455.260355] callmodule: init /path/to/mytest
Feb 10 00:42:45 mypc kernel: [71455.261218] everything all right
Feb 10 00:42:45 mypc kernel: [71455.286131] callmodule: exit
... which apparently means all went fine. (Using Linux 2.6.38-16-generic #67-Ubuntu SMP)
My question is - how can I get the PID of the process instantiated from a kernel module? Is there a similar process other than call_usermodehelper, that will allow me to instantiate a user-space process in kernel space, and obtain its pid?
Note that it may not be possible to use call_usermodehelper and get the instantiated process PID:
Re: call_usermodehelper's pid ? — Linux Kernel Newbies
I want to create a user space process from within
a kernel module, and be able to kill it, send signals
to it, etc...
can I know its pid ?
No, you can't. But since inside the implementation the pid is known,
patch that makes it available would not be too hard (note, that errors
are always negative in kernel and pids are positive, limited to 2**16).
You would have to modify all callers that expect 0 on success though.
I poked around the sources a bit, and it seems ultimately there is a call chain: call_usermodehelper -> call_usermodehelper_setup -> __call_usermodehelper, which looks like:
static void __call_usermodehelper(struct work_struct *work)
{
struct subprocess_info *sub_info =
container_of(work, struct subprocess_info, work);
// ...
if (wait == UMH_WAIT_PROC)
pid = kernel_thread(wait_for_helper, sub_info,
CLONE_FS | CLONE_FILES | SIGCHLD);
else
pid = kernel_thread(____call_usermodehelper, sub_info,
CLONE_VFORK | SIGCHLD);
...
... so a PID of a kernel thread is used, but it is saved nowhere; also, neither work_struct nor subprocess_info have a pid field (task_struct does, but nothing here seems to use task_struct). Recording this pid would require changing the kernel sources - and as I'd like to avoid that, this is the reason why I'm also interested in approaches other than call_usermodehelper ...

A tentative answer from my understanding of the implementation in kmod.c.
If you look at the code of call_usermodehelper you will see it calls call_usermodehelper_setup and then call_usermodehelper_exec.
call_usermodehelper_setup takes as parameter an init function that will be executed just before the do_execve. I believe the value of current when the init function gets executed will get you the task_struct of the user process.
So to get the pid you will need to:
Copy the implementation of call_usermodehelper in your code.
Define an init function that you will pass as parameter to call_usermodehelper_setup.
In the init function retrieves the current task_struct and in turn the PID.

Well, this was tedious... Below is a rather hacky way to achieve this, at least on my platform, as callmodule.c (same Makefile as above can be used). As I cannot believe that this is the way this should be done, more proper answers are still welcome (hopefully, also, with a code example I could test). But at least, it does the job as a kernel module only - without the need to patch the kernel itself - for the 2.6.38 version, which was quite important to me.
Basically, I copied all functions (renamed with a "B" suffix), until the point where the PID is available. Then I use a copy of subprocess_info with an extra field to save it (although that is not strictly necessary: in order not to mess with function signatures in respect to return value, I have to save the pid as a global variable anyway; left it as an exercise). Now, when I run sudo insmod ./callmodule.ko && sudo rmmod callmodule, in /var/log/syslog I get:
Feb 10 18:53:02 mypc kernel: [ 2942.891886] callmodule: > init /path/to/mytest
Feb 10 18:53:02 mypc kernel: [ 2942.891912] callmodule: symbol # 0xc1065b60 is wait_for_helper+0x0/0xb0
Feb 10 18:53:02 mypc kernel: [ 2942.891923] callmodule: symbol # 0xc1065ed0 is ____call_usermodehelper+0x0/0x90
Feb 10 18:53:02 mypc kernel: [ 2942.891932] callmodule:a: pid 0
Feb 10 18:53:02 mypc kernel: [ 2942.891937] callmodule:b: pid 0
Feb 10 18:53:02 mypc kernel: [ 2942.893491] callmodule: : pid 23306
Feb 10 18:53:02 mypc kernel: [ 2942.894474] callmodule:c: pid 23306
Feb 10 18:53:02 mypc kernel: [ 2942.894483] callmodule: everything all right; pid 23306
Feb 10 18:53:02 mypc kernel: [ 2942.894494] callmodule: pid task a: ec401940 c: mytest p: [23306] s: runnable
Feb 10 18:53:02 mypc kernel: [ 2942.894502] callmodule: parent task a: f40aa5e0 c: kworker/u:1 p: [14] s: stopped
Feb 10 18:53:02 mypc kernel: [ 2942.894510] callmodule: - mytest [23306]
Feb 10 18:53:02 mypc kernel: [ 2942.918500] callmodule: < exit
One of the nasty problems here, is that once you start copying functions, at a certain time you come to a point, where kernel functions are used which are not exported, like in this case wait_for_helper. What I did was basically look in /proc/kallsyms (remember sudo!) to get absolute addresses for e.g. wait_for_helper, then hardcoded those in the kernel module as function pointers - seems to work. Another problem is that functions in kernel source refer to enum umh_wait, which cannot be used as argument from the module (those need to simply be converted to use int instead).
So the module starts the user-space process, gets the PID (noting that "What the kernel calls PIDs are actually kernel-level thread ids (often called TIDs) ... What's considered a PID in the POSIX sense of "process", on the other hand, is called a "thread group ID" or "TGID" in the kernel."), gets the corresponding task_struct and its parent, and tries to list all children of the parent and of the spawned process itself. So I can see that kworker/u:1 is typically the parent, and it has no other children than mytest - and since mytest is very simple (in my case, just a single write to disk file), it spawns no threads of its own, so it has no children either.
I encountered a couple of Oopses which required a reboot - I think they are solved now, but just in case, caveat emptor.
Here is the callmodule.c code (with some notes/links at end):
// callmodule.c with pid, url: https://stackoverflow.com/questions/21668727/
#include <linux/module.h>
#include <linux/slab.h> //kzalloc
#include <linux/syscalls.h> // SIGCHLD, ... sys_wait4, ...
#include <linux/kallsyms.h> // kallsyms_lookup, print_symbol
// global variable - to avoid intervening too much in the return of call_usermodehelperB:
static int callmodule_pid;
// >>>>>>>>>>>>>>>>>>>>>>
// modified kernel functions - taken from
// http://lxr.missinglinkelectronics.com/linux+v2.6.38/+save=include/linux/kmod.h
// http://lxr.linux.no/linux+v2.6.38/+save=kernel/kmod.c
// define a modified struct (with extra pid field) here:
struct subprocess_infoB {
struct work_struct work;
struct completion *complete;
char *path;
char **argv;
char **envp;
int wait; //enum umh_wait wait;
int retval;
int (*init)(struct subprocess_info *info);
void (*cleanup)(struct subprocess_info *info);
void *data;
pid_t pid;
};
// forward declare:
struct subprocess_infoB *call_usermodehelper_setupB(char *path, char **argv,
char **envp, gfp_t gfp_mask);
static inline int
call_usermodehelper_fnsB(char *path, char **argv, char **envp,
int wait, //enum umh_wait wait,
int (*init)(struct subprocess_info *info),
void (*cleanup)(struct subprocess_info *), void *data)
{
struct subprocess_info *info;
struct subprocess_infoB *infoB;
gfp_t gfp_mask = (wait == UMH_NO_WAIT) ? GFP_ATOMIC : GFP_KERNEL;
int ret;
populate_rootfs_wait(); // is in linux-headers-2.6.38-16-generic/include/linux/kmod.h
infoB = call_usermodehelper_setupB(path, argv, envp, gfp_mask);
printk(KBUILD_MODNAME ":a: pid %d\n", infoB->pid);
info = (struct subprocess_info *) infoB;
if (info == NULL)
return -ENOMEM;
call_usermodehelper_setfns(info, init, cleanup, data);
printk(KBUILD_MODNAME ":b: pid %d\n", infoB->pid);
// this must be called first, before infoB->pid is populated (by __call_usermodehelperB):
ret = call_usermodehelper_exec(info, wait);
// assign global pid here, so rest of the code has it:
callmodule_pid = infoB->pid;
printk(KBUILD_MODNAME ":c: pid %d\n", callmodule_pid);
return ret;
}
static inline int
call_usermodehelperB(char *path, char **argv, char **envp, int wait) //enum umh_wait wait)
{
return call_usermodehelper_fnsB(path, argv, envp, wait,
NULL, NULL, NULL);
}
/* This is run by khelper thread */
static void __call_usermodehelperB(struct work_struct *work)
{
struct subprocess_infoB *sub_infoB =
container_of(work, struct subprocess_infoB, work);
int wait = sub_infoB->wait; // enum umh_wait wait = sub_info->wait;
pid_t pid;
struct subprocess_info *sub_info;
// hack - declare function pointers, to use for wait_for_helper/____call_usermodehelper
int (*ptrwait_for_helper)(void *data);
int (*ptr____call_usermodehelper)(void *data);
// assign function pointers to verbatim addresses as obtained from /proc/kallsyms
ptrwait_for_helper = (void *)0xc1065b60;
ptr____call_usermodehelper = (void *)0xc1065ed0;
sub_info = (struct subprocess_info *)sub_infoB;
/* CLONE_VFORK: wait until the usermode helper has execve'd
* successfully We need the data structures to stay around
* until that is done. */
if (wait == UMH_WAIT_PROC)
pid = kernel_thread((*ptrwait_for_helper), sub_info, //(wait_for_helper, sub_info,
CLONE_FS | CLONE_FILES | SIGCHLD);
else
pid = kernel_thread((*ptr____call_usermodehelper), sub_info, //(____call_usermodehelper, sub_info,
CLONE_VFORK | SIGCHLD);
printk(KBUILD_MODNAME ": : pid %d\n", pid);
// grab and save the pid here:
sub_infoB->pid = pid;
switch (wait) {
case UMH_NO_WAIT:
call_usermodehelper_freeinfo(sub_info);
break;
case UMH_WAIT_PROC:
if (pid > 0)
break;
/* FALLTHROUGH */
case UMH_WAIT_EXEC:
if (pid < 0)
sub_info->retval = pid;
complete(sub_info->complete);
}
}
/**
* call_usermodehelper_setup - prepare to call a usermode helper
*/
struct subprocess_infoB *call_usermodehelper_setupB(char *path, char **argv,
char **envp, gfp_t gfp_mask)
{
struct subprocess_infoB *sub_infoB;
sub_infoB = kzalloc(sizeof(struct subprocess_infoB), gfp_mask);
if (!sub_infoB)
goto out;
INIT_WORK(&sub_infoB->work, __call_usermodehelperB);
sub_infoB->path = path;
sub_infoB->argv = argv;
sub_infoB->envp = envp;
out:
return sub_infoB;
}
// <<<<<<<<<<<<<<<<<<<<<<
static int __init callmodule_init(void)
{
int ret = 0;
char userprog[] = "/path/to/mytest";
char *argv[] = {userprog, "2", NULL };
char *envp[] = {"HOME=/", "PATH=/sbin:/usr/sbin:/bin:/usr/bin", NULL };
struct task_struct *p;
struct task_struct *par;
struct task_struct *pc;
struct list_head *children_list_head;
struct list_head *cchildren_list_head;
char *state_str;
printk(KBUILD_MODNAME ": > init %s\n", userprog);
/* last parameter: 1 -> wait until execution has finished, 0 go ahead without waiting*/
/* returns 0 if usermode process was started successfully, errorvalue otherwise*/
/* no possiblity to get return value of usermode process*/
// note - only one argument allowed for print_symbol
print_symbol(KBUILD_MODNAME ": symbol # 0xc1065b60 is %s\n", 0xc1065b60); // shows wait_for_helper+0x0/0xb0
print_symbol(KBUILD_MODNAME ": symbol # 0xc1065ed0 is %s\n", 0xc1065ed0); // shows ____call_usermodehelper+0x0/0x90
ret = call_usermodehelperB(userprog, argv, envp, UMH_WAIT_EXEC);
if (ret != 0)
printk(KBUILD_MODNAME ": error in call to usermodehelper: %i\n", ret);
else
printk(KBUILD_MODNAME ": everything all right; pid %d\n", callmodule_pid);
// find the task:
// note: sometimes p may end up being NULL here, causing kernel oops -
// just exit prematurely in that case
rcu_read_lock();
p = pid_task(find_vpid(callmodule_pid), PIDTYPE_PID);
rcu_read_unlock();
if (p == NULL) {
printk(KBUILD_MODNAME ": p is NULL - exiting\n");
return 0;
}
// p->comm should be the command/program name (as per userprog)
// (out here that task is typically in runnable state)
state_str = (p->state==-1)?"unrunnable":((p->state==0)?"runnable":"stopped");
printk(KBUILD_MODNAME ": pid task a: %p c: %s p: [%d] s: %s\n",
p, p->comm, p->pid, state_str);
// find parent task:
// parent task could typically be: c: kworker/u:1 p: [14] s: stopped
par = p->parent;
if (par == NULL) {
printk(KBUILD_MODNAME ": par is NULL - exiting\n");
return 0;
}
state_str = (par->state==-1)?"unrunnable":((par->state==0)?"runnable":"stopped");
printk(KBUILD_MODNAME ": parent task a: %p c: %s p: [%d] s: %s\n",
par, par->comm, par->pid, state_str);
// iterate through parent's (and our task's) child processes:
rcu_read_lock(); // read_lock(&tasklist_lock);
list_for_each(children_list_head, &par->children){
p = list_entry(children_list_head, struct task_struct, sibling);
printk(KBUILD_MODNAME ": - %s [%d] \n", p->comm, p->pid);
// note: trying to print "%p",p here results with oops/segfault:
// printk(KBUILD_MODNAME ": - %s [%d] %p\n", p->comm, p->pid, p);
if (p->pid == callmodule_pid) {
list_for_each(cchildren_list_head, &p->children){
pc = list_entry(cchildren_list_head, struct task_struct, sibling);
printk(KBUILD_MODNAME ": - - %s [%d] \n", pc->comm, pc->pid);
}
}
}
rcu_read_unlock(); //~ read_unlock(&tasklist_lock);
return 0;
}
static void __exit callmodule_exit(void)
{
printk(KBUILD_MODNAME ": < exit\n");
}
module_init(callmodule_init);
module_exit(callmodule_exit);
MODULE_LICENSE("GPL");
/*
NOTES:
// assign function pointers to verbatim addresses as obtained from /proc/kallsyms:
// ( cast to void* to avoid "warning: assignment makes pointer from integer without a cast",
// see also https://stackoverflow.com/questions/3941793/what-is-guaranteed-about-the-size-of-a-function-pointer )
// $ sudo grep 'wait_for_helper\|____call_usermodehelper' /proc/kallsyms
// c1065b60 t wait_for_helper
// c1065ed0 t ____call_usermodehelper
// protos:
// static int wait_for_helper(void *data)
// static int ____call_usermodehelper(void *data)
// see also:
// http://www.linuxforu.com/2012/02/function-pointers-and-callbacks-in-c-an-odyssey/
// from include/linux/kmod.h:
//~ enum umh_wait {
//~ UMH_NO_WAIT = -1, /* don't wait at all * /
//~ UMH_WAIT_EXEC = 0, /* wait for the exec, but not the process * /
//~ UMH_WAIT_PROC = 1, /* wait for the process to complete * /
//~ };
// however, note:
// /usr/src/linux-headers-2.6.38-16-generic/include/linux/kmod.h:
// #define UMH_NO_WAIT 0 ; UMH_WAIT_EXEC 1 ; UMH_WAIT_PROC 2 ; UMH_KILLABLE 4 !
// those defines end up here, regardless of the enum definition above
// (NB: 0,1,2,4 enumeration starts from kmod.h?v=3.4 on lxr.free-electrons.com !)
// also, note, in "generic" include/, prototypes of call_usermodehelper(_fns)
// use int wait, and not enum umh_wait wait ...
// seems these cannot be used from a module, nonetheless:
//~ extern int wait_for_helper(void *data);
//~ extern int ____call_usermodehelper(void *data);
// we probably would have to (via http://www.linuxconsulting.ro/pidwatcher/)
// edit /usr/src/linux/kernel/ksyms.c and add:
//EXPORT_SYMBOL(wait_for_helper);
// but that is kernel re-compilation...
// https://stackoverflow.com/questions/19360298/triggering-user-space-with-kernel
// You should not be using PIDs to identify processes within the kernel. The process can exit and a different process re-use that PID. Instead, you should be using a pointer to the task_struct for the process (rather than storing current->pid at registration time, just store current)
# reports task name from the pid (pid_task(find_get_pid(..)):
http://tuxthink.blogspot.dk/2012/07/module-to-find-task-from-its-pid.html
// find the task:
//~ rcu_read_lock();
// uprobes uses this - but find_task_by_pid is not exported for modules:
//~ p = find_task_by_pid(callmodule_pid);
//~ if (p)
//~ get_task_struct(p);
//~ rcu_read_unlock();
// see: [http://www.gossamer-threads.com/lists/linux/kernel/1260996 find_task_by_pid() problem | Linux | Kernel]
// https://stackoverflow.com/questions/18408766/make-a-system-call-to-get-list-of-processes
// this macro loops through *all* processes; our callmodule_pid should be listed by it
//~ for_each_process(p)
//~ pr_info("%s [%d]\n", p->comm, p->pid);
// [https://lists.debian.org/debian-devel/2008/05/msg00034.html Re: problems for making kernel module]
// note - WARNING: "tasklist_lock" ... undefined; because tasklist_lock removed in 2.6.1*:
// "tasklist_lock protects the kernel internal task list. Modules have no business looking at it";
// https://stackoverflow.com/questions/13002444/list-all-threads-within-the-current-process
// "all methods that loop over the task lists need to be wrapped in rcu_read_lock(); / rcu_read_unlock(); to be correct."
// https://stackoverflow.com/questions/19208487/kernel-module-that-iterates-over-all-tasks-using-depth-first-tree
// https://stackoverflow.com/questions/5728592/how-can-i-get-the-children-process-list-in-kernel-code
// https://stackoverflow.com/questions/1446239/traversing-task-struct-children-in-linux-kernel
// https://stackoverflow.com/questions/8207160/kernel-how-to-iterate-the-children-of-the-current-process
// https://stackoverflow.com/questions/10262017/linux-kernel-list-list-head-init-vs-init-list-head
// https://stackoverflow.com/questions/16230524/explain-list-for-each-entry-and-list-for-each-entry-safe "list_entry is just an alias for container_of"
*/

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight