Let's say we have a multi-threaded Linux x86-64 executable (written in C, for example) with three threads: main, consumer and producer. Some of the functions are intended to be used by only some of the threads. For example, the produce() function should only ever be called by the producer thread. If another thread (such as consumer) calls produce(), I would like to get a fatal error (a SIGABRT or SIGSEGV, for example).
One way to deal with this is to register the producer's thread ID and check, in produce(), that the calling thread's ID is in fact the producer's; if not, call abort(). Unfortunately, that method requires a runtime check on every call, which may be prohibitive if the function is on a hot path.
I'm wondering if there's another way, such as annotating all functions intended for the producer only, moving them to their own section, and removing execute permission on that section for all the other threads. My understanding is that this wouldn't work, since mprotect() sets process-wide permissions. Is that right?
Edit:
@AlanAu asks whether this check has to be done at runtime. It's not a requirement, but my understanding is that for non-trivial programs (ones using function pointers, for example) such a check could only be done at runtime.
Edit2:
I realize using processes would help address this, but as noted in the comments, inter-thread communication is more efficient.
One (rather hacky) way of doing this is to make pointers to these functions and call through the pointers instead of calling the functions directly. Example:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>

void disallowed_call(void)
{
    abort();
}

void testfunc(void)
{
    printf("Hello, world!\n");
}

void childcode(void (*notrestricted)(void), void (*restricted)(void))
{
    printf("Non restricted call:\n");
    notrestricted();
    printf("Restricted call:\n");
    restricted();
}

int main(void)
{
    pid_t pid = fork(); /* fork() returns 0 in the child and the child's PID in the parent */
    if (pid == 0)
    {
        childcode(&testfunc, &testfunc);
    }
    else
    {
        childcode(&testfunc, &disallowed_call);
    }
    return 0;
}
That might be a bit more complicated than you were looking for, but it should work. The runtime check is done only once.
For an assignment, I need to use sched_yield() to synchronize threads. I understand mutexes/condition variables would be much more effective, but I am not allowed to use those.
The only functions we are allowed to use are sched_yield(), pthread_create(), and pthread_join(). We cannot use mutexes, locks, semaphores, or any type of shared variable.
I know sched_yield() is supposed to make the calling thread relinquish the CPU so another thread can run, so it should move the calling thread to the back of the run queue.
The code below is supposed to print 'abc' in order and then the newline after all three threads have executed. I put sched_yield() in loops in functions b() and c() because it wasn't working as I expected, but I'm pretty sure all that does is delay the printing because the loops take time, not because sched_yield() is working.
The server it needs to run on has 16 CPUs. I saw somewhere that sched_yield() may immediately assign the thread to a new CPU.
Essentially I'm unsure of how, using only sched_yield(), to synchronize these threads given everything I could find and troubleshoot with online.
#include <stdio.h>
#include <pthread.h>
#include <stdlib.h>
#include <sched.h>

void* a(void*);
void* b(void*);
void* c(void*);

int main(void) {
    pthread_t a_id, b_id, c_id;
    pthread_create(&a_id, NULL, a, NULL);
    pthread_create(&b_id, NULL, b, NULL);
    pthread_create(&c_id, NULL, c, NULL);
    pthread_join(a_id, NULL);
    pthread_join(b_id, NULL);
    pthread_join(c_id, NULL);
    printf("\n");
    return 0;
}

void* a(void* ret) {
    printf("a");
    return ret;
}

void* b(void* ret) {
    for (int i = 0; i < 10; i++) {
        sched_yield();
    }
    printf("b");
    return ret;
}

void* c(void* ret) {
    for (int i = 0; i < 100; i++) {
        sched_yield();
    }
    printf("c");
    return ret;
}
There are four cases:
a) the scheduler doesn't use multiplexing (e.g. doesn't use "round robin" but uses "highest priority thread that can run does run", or "earliest deadline first", or ...) and sched_yield() does nothing.
b) the scheduler does use multiplexing in theory, but you have more CPUs than threads so the multiplexing doesn't actually happen, and sched_yield() does nothing. Note: with 16 CPUs and 3 threads, this is likely what you'd get for the "default scheduling policy" on an OS like Linux; sched_yield() just does a "Hrm, no other thread I could use this CPU for, so I guess the calling thread can keep using the same CPU!" routine.
c) the scheduler does use multiplexing and there are more threads than CPUs, but to improve performance (avoid task switches) the scheduler designer decided that sched_yield() does nothing.
d) sched_yield() does cause a task switch (yielding the CPU to some other task), but that is not enough to do any kind of synchronization on its own (e.g. you'd need an atomic variable or something similar for the actual synchronization, maybe like "while( atomic_variable_not_set_by_other_thread ) { sched_yield(); }"). Note that with an atomic variable (introduced in C11) it'd work without sched_yield(); the sched_yield() (if it does anything) merely makes busy waiting less awful/wasteful.
"Essentially I'm unsure of how, using only sched_yield(), to synchronize these threads given everything I could find and troubleshoot with online."
That would be because sched_yield() is not well suited to the task. As I wrote in comments, sched_yield() is about scheduling, not synchronization. There is a relationship between the two, in the sense that synchronization events affect which threads are eligible to run, but that goes in the wrong direction for your needs.
You are probably looking at the problem from the wrong end. You need each of your threads to wait to execute until it is its turn, and for them to do that, they need some mechanism to convey information among them about whose turn it is. There are several alternatives for that, but if "only sched_yield()" is taken to mean that no library functions other than sched_yield() may be used for that purpose then a shared variable seems the expected choice. The starting point should therefore be how you could use a shared variable to make the threads take turns in the appropriate order.
Flawed starting point
Here is a naive approach that might spring immediately to mind:
/* FLAWED */
void *b(void *data){
    char *whose_turn = data;
    while (*whose_turn != 'b') {
        // nothing?
    }
    printf("b");
    *whose_turn = 'c';
    return NULL;
}
That is, the thread executes a busy loop, monitoring the shared variable to await it taking a value signifying that the thread should proceed. When it has done its work, the thread modifies the variable to indicate that the next thread may proceed. But there are several problems with that, among them:
1. Supposing that at least one other thread writes to the object designated by *whose_turn, the program contains a data race, and therefore its behavior is undefined. As a practical matter, a thread that once entered the body of the loop in that function might loop infinitely, notwithstanding any action by other threads.
2. Without making additional assumptions about thread scheduling, such as a fairness policy, it is not safe to assume that the thread that will make the needed modification to the shared variable will be scheduled in bounded time.
3. While a thread is executing the loop in that function, it prevents any other thread from executing on the same core, yet it cannot make progress until some other thread takes action. To the extent that we can assume preemptive thread scheduling, this is an efficiency issue and contributory to (2). However, if we assume neither preemptive thread scheduling nor the threads being scheduled each on a separate core then this is an invitation to deadlock.
Possible improvements
The conventional and most appropriate way to do that in a pthreads program is with the use of a mutex and condition variable. Properly implemented, that resolves the data race (issue 1) and it ensures that other threads get a chance to run (issue 3). If that leaves no other threads eligible to run besides the one that will modify the shared variable then it also addresses issue 2, to the extent that the scheduler is assumed to grant any CPU to the process at all.
But you are forbidden to do that, so what else is available? Well, you could make the shared variable _Atomic. That would resolve the data race, and in practice it would likely be sufficient for the wanted thread ordering. In principle, however, it does not resolve issue 3, and as a practical matter, it does not use sched_yield(). Also, all that busy-looping is wasteful.
But wait! You have a clue in that you are told to use sched_yield(). What could that do for you? Suppose you insert a call to sched_yield() in the body of the busy loop:
/* (A bit) better */
void* b(void *data){
    char *whose_turn = data;
    while (*whose_turn != 'b') {
        sched_yield();
    }
    printf("b");
    *whose_turn = 'c';
    return NULL;
}
That resolves issues 2 and 3, explicitly affording the possibility for other threads to run and putting the calling thread at the tail of the scheduler's thread list. Formally, it does not resolve issue 1 because sched_yield() has no documented effect on memory ordering, but in practice, I don't think it can be implemented without a (full) memory barrier. If you are allowed to use atomic objects then combining an atomic shared variable with sched_yield() would tick all three boxes. Even then, however, there would still be a bunch of wasteful busy-looping.
Final remarks
Note well that pthread_join() is a synchronization function, so, as I understand the task, you may not use it to ensure that the main thread's output is printed last.
Note also that I have not spoken to how the main() function would need to be modified to support the approach I have suggested. Changes would be needed for that, and they are left as an exercise.
Let's assume I've got the following main in a C file:
int f();
int main(){
//terminate f() if in infinite loop
return f();
}
and then a separate C file that could potentially hold the following:
int f() {
for(;;) {}
return 0;
}
Is there any way to detect that the function f() is in an infinite loop and terminate its execution from within the main function?
EDIT:
I need this functionality because I am writing a testbench where the function called could potentially have an infinite loop; that's what I am checking for in the end. Therefore, I cannot modify f() in any way. I'm also in a Linux environment.
No, there is no general way to definitively determine whether a function contains an infinite loop (that is the halting problem).
However, we can make a few assumptions to detect a potential infinite loop and exit a program gracefully within the program (e.g. we don't have to press Ctrl+C). This method is common in several testing frameworks used in JS. Basically, we set some arbitrary time limit for a function to complete in. If the function does not complete within that time limit, we assume it will not complete and we throw an error.
In C/C++ you could implement this with pthreads if you're on a Unix system. In Windows, you would use windows.h. I only have experience with pthreads, so I'll show a simple example of how you might get this working using pthreads.
#include <pthread.h>   // Load pthread
#include <stdatomic.h> // atomic_bool, so the flag can be shared between threads without a data race
#include <stdbool.h>   // true/false (you could use an int or char)
#include <stddef.h>    // Defines NULL
#include <stdio.h>     // Defines fprintf()
#include <unistd.h>    // Defines sleep()

static atomic_bool testComplete = false; // Has the test completed?

/**
 * The function being tested.
 */
void f(void) {
    while (true);
}

/**
 * This function handles executing the test. It is the start routine pthread
 * will use, so it must have the signature void *(*)(void *) required by
 * pthread_create(); the argument and the result are unused.
 */
void *runTest(void *ptr) {
    (void)ptr;
    f();
    atomic_store(&testComplete, true);
    return NULL;
}

int main(void) {
    pthread_t testThread;
    pthread_create(&testThread, NULL, runTest, NULL); // Create and start the new thread. It will begin executing runTest() eventually.
    sleep(5); // Give it 5 seconds to complete (this should be adjusted or could even be made dynamic).
    if (atomic_load(&testComplete)) {
        // Test completed successfully.
        pthread_join(testThread, NULL);
        return 0;
    }
    // The test did not finish within the time limit. A single thread cannot be
    // killed on its own: a fatal signal sent with pthread_kill() terminates the
    // whole process. Returning from main() calls exit(), which also ends the
    // stuck thread, so report the failure here and return.
    fprintf(stderr, "f() did not finish within 5 seconds\n");
    return 1;
}
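To compile this, you would run gcc your_filename.c -o output_filename -pthread (-lpthread also links the library, but GCC's -pthread option is meant to be used for both compiling and linking threaded code).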
If you expect the program to run on both Unix and Windows systems, you may want to consider making some unified interface for accessing threads and then adapting the OS-specific interfaces to your interface. It will make things a little simpler, especially when expanding this library.
You could call f() in a different thread and have main time out f() when it reaches a certain limit. However, I don't think this is practical, and you should really work on solving the infinite loop first.
On a POSIX system (Linux, macOS) you can schedule an alarm in the future with setitimer() before calling the function. The SIGALRM signal will be delivered to the process after the specified delay. Make sure that your program has a handler for that signal; register it with sigaction() before starting the timer.
When the signal handler takes control after the signal is raised, you may get out of the offending loop with setjmp() and longjmp().
If you call f() the way you showed (from main), then at that point the main context is in f, not main, and therefore you cannot "check f from main".
What you can try is calling f() from a separate thread and checking whether that thread has finished within the specified time limit. However, I'm not sure about the practicality of this. While I don't know what you really plan to do in that function, in some cases you may stop it at a point where it has done something that requires cleaning up. One example that comes to mind is f() calling malloc() but never reaching the matching free() because you interrupted it first.
Honestly, if there's a certain requirement about the time in which given function has to finish, just put that check within the function itself and return false to indicate it didn't finish successfully.
I want to improve the speed of a certain C application, and for that I will be using threads. What concerns me is whether, within a function being executed in a different thread, I can call another function that is not itself being executed in that thread. Example:
void checks_if_program_can_do_something()
{
}
void does_something()
{
checks_if_program_can_do_something();
}
int main()
{
does_something(); //another thread is used to execute this function
return 1;
}
In this case, the function does_something() calls the function checks_if_program_can_do_something(), but does_something() is being executed in another thread. I hope I made myself clear. Can I also call the function checks_if_program_can_do_something() in other functions using multiple threads?
Yes, but you should take care that the function does not alter state in such a way that other threads would be impacted negatively.
The terms related to this kind of protection are reentrant, which means the function can safely be interrupted partway through and called again before the first call has finished, and thread-safe, which means two threads can call it at the same time without interfering with each other.
These properties require programming approaches that may differ from most people's usual habits, but they handle two important scenarios that must be accounted for when writing threaded code:
The operating system suspends part of your program (for example, while it waits on I/O) or gives its core to a different part of your program.
The scheduler runs two threads of your program at the same time on different cores.
Guidelines for safe programming approaches are plentiful, but I've provided one here for you to get started. Keep in mind that if you use someone else's code in a threaded situation, you also need to verify that their code is written to be reentrant / thread-safe.
I have C code like this
#include <stdio.h>
#include <unistd.h>
#include <signal.h>

void f(int);

int i = 0;
int j = 0;

int main() {
    signal(SIGINT, f);
    while (1) {
        /* do something in variable `i` */
    }
}

void f(int signum) {
    /* do something else on variable `i` */
}
Can it produce a data race? That is, is f executed in parallel with main (even on a multi-core machine), or is main stopped until f finishes its execution?
First of all, according to the man page of signal(), you should not use signal() but sigaction():
The behavior of signal() varies across UNIX versions, and has also varied historically across different versions of Linux. Avoid its use: use sigaction(2) instead. See Portability below.
Even if signal() does behave sanely, though, there can be a race, because main might be interrupted between reading i and storing to it, e.g. in a situation like this:
if ( i > 10 ) {
i += j;
}
void f(int signum) {
i = 0;
}
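If main has already performed the comparison when the signal arrives (or has already loaded i into a register, which the handler's store does not update), main will still execute i += j after f() has reset i. Either the condition no longer holds when the addition happens, or f()'s store is overwritten. That is a race.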
So where does this leave us? Don't modify globals in a signal handler if they are also modified elsewhere, unless you can guarantee that the signal handler cannot interrupt those other modifications (e.g. by blocking the signal around them).
Unless you use the raise() from Standard C or kill() with the value from getpid() as the PID argument, signal events are asynchronous.
In single-threaded code on a multi-core machine, it means that you cannot tell what is happening in the 'do something to variable i' code. For example, that code might have just fetched the value from i and have incremented it, but not yet saved the incremented value. If the signal handler function f() reads i, modifies it in a different way, saves the result and returns, the original code may now write the incremented value of i instead of using the value modified by f().
This is what leads to the many constraints on what you can do in a signal handler. For example, it is not safe to call printf() in a signal handler because it might need to do memory allocation (malloc()) and yet the signal might have arrived while malloc() was modifying its linked lists of available memory. The second call to malloc() might get thoroughly confused.
So, even in a single-threaded program, you have to be aware and very careful about how you modify global variables.
However, in a single-threaded program, there will be no activity from the main loop while the signal is being handled. Indeed, even in a multi-threaded program, the thread that receives (handles) the signal is suspended while the signal handler is running, but other threads are not suspended, so there could be concurrent activity from other threads. If it matters, make sure the access to the variables is properly serialized.
See also:
What is the difference between sigaction() and signal()?
Signal concepts.
Can anyone tell me how to block some specific system calls within a program, please? I am building a system which takes a piece of C source code, compiles it with gcc and runs it. For security reasons, I need to prevent the compiled program from calling some system calls. Is there any way to do it, from the source code level (e.g. stripping the header files of gcc, detecting malicious external calls, ...) to the executable level?
Edited #1: Add details about malicious calls.
Edited #2: My system is a GNU/Linux one.
Edited #3:
I have tried some methods within a few days and here are the conclusions I've got so far:
Scanning the source code does not solve the main problem, since one can always obfuscate a C source file quite well.
"Overriding C symbols" works well for libraries, but for system calls I have not achieved what I wanted. This idea is not dead; however, doing this would definitely cost me a lot of time hacking (gcc and/or ld).
Permission de-escalation works like a charm. I could use fakeroot or a "guest" user to do it. This method is also the easiest to implement.
The other one is Native Client, which I have not tried yet, but I definitely will in the near future due to the overlap between the project and my work.
As others have noted, it's impossible for a program to avoid making system calls; they permeate the C library all over the place.
However you might be able to make some headway with careful use of the LD_PRELOAD mechanism, if your platform supports it (e.g. Linux): you write a shared library with the same symbol names as those in the C library, which are called instead of the intended libc functions. (For example, Electric Fence is built as a shared library on Debian-based systems and intercepts calls to malloc, free et al.)
I suspect you could use this mechanism to trap or argument-check calls to any libc functions you don't like, and perhaps to note those which you consider unconditionally safe. It might then be reasonable to scan the compiled executable for the code corresponding to INT 0x80 to trap any attempts to make raw syscalls (0xcd 0x80, though beware of false positives; on x86-64, the syscall instruction, 0x0f 0x05, would need the same treatment). However, I have only given this a few moments of thought; I could easily have missed something, or this might turn out to be impractical...
You could run the compiled program by forking it from a wrapper and use the Linux ptrace(2) facility to intercept and inspect all system calls invoked by the program.
The following example code shows a wrapper that runs the /usr/bin/w command, prints each system call invoked by the command, and terminates the command if it tries to invoke the write(2) system call.
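#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/ptrace.h>
#include <sys/reg.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>

#define BAD_SYSCALL __NR_write

/* Byte offset of the saved system call number in the tracee's user area. */
#ifdef __x86_64__
#define SYSCALL_NR_OFFSET (8L * ORIG_RAX)
#else
#define SYSCALL_NR_OFFSET (4L * ORIG_EAX)
#endif

int main(void)
{
    pid_t child;
    int status;
    long syscall_nr;

    child = fork();
    if (child == 0) {
        /* In child: ask to be traced, then run the target program. */
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        execl("/usr/bin/w", "w", (char *)NULL);
        perror("execl");   /* only reached if execl() fails */
        _exit(EXIT_FAILURE);
    }

    /* In parent. */
    while (1) {
        waitpid(child, &status, 0);
        /* Abort loop if child has exited. */
        if (WIFEXITED(status) || WIFSIGNALED(status))
            break;

        /* Obtain syscall number from the child's process context.
           Each system call is reported twice: on entry and on exit. */
        syscall_nr = ptrace(PTRACE_PEEKUSER, child, (void *)SYSCALL_NR_OFFSET, NULL);
        printf("Child wants to execute system call %ld: ", syscall_nr);
        if (syscall_nr != BAD_SYSCALL) {
            /* Allow system call; stop again at the next syscall entry or exit. */
            printf("allowed.\n");
            ptrace(PTRACE_SYSCALL, child, NULL, NULL);
        } else {
            /* Terminate child (PTRACE_KILL is deprecated; SIGKILL is reliable). */
            printf("not allowed. Terminating child.\n");
            kill(child, SIGKILL);
        }
    }
    exit(EXIT_SUCCESS);
}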
You can do much more powerful things using ptrace, such as inspect and change a process' address space (e.g., to obtain and modify the parameters passed to a system call).
A good introduction can be found in this Linux Journal article and its follow-up.
You can't.
Even this program:
#include <stdio.h>
int main()
{
printf("Hello, World\n");
return 0;
}
makes at least one system call (to send the string "Hello, World\n" to standard out). System calls are the only way for a program to interact with the outside World. Use the operating system's security model for security.
Edited for this comment:
I meant not all system calls but malicious system calls, e.g. execv() could be used to execute a BASH script which wipes out my data on the disk.
Your operating system already includes mechanisms to stop that sort of thing happening. For instance, in order for a bash script to wipe out your data, the process must already have write access to that data. That means it must have been started by you or root. Your only real option is not to install untrustworthy software.
By the way, depending on your platform, execv is not necessarily a system call. On Linux, it's a C library wrapper for the real system call (execve).
Just to illustrate that this is not possible, the following program:
int main() {
return 0;
}
makes over 20 system calls as reported using strace. The calls include open (twice) which is one of the calls you seem to want to block.
Well, if you just want to block specific calls, why not just grep through the source code before attempting to compile it, and reject programs which use the insecure system calls?
Some projects have a similar idea; you can take a look at NaCl: http://code.google.com/p/nativeclient/