cgo Interacting with C Library that uses Thread Local Storage - c

I'm in the midst of wrapping a C library with cgo to be usable by normal Go code.
My problem is that I'd like to propagate error strings up to the Go API, but the C library in question makes error strings available via thread-local storage; there's a global get_error() call that returns a pointer to thread local character data.
My original plan was to call into C via cgo, check if the call returned an error, and if so, wrap the error string using C.GoString to convert it from a raw character pointer into a Go string. It'd look something like C.GoString(C.get_error()).
The problem that I foresee here is that TLS in C works on the level of native OS threads, but in my understanding, the calling Go code will be coming from one of potentially N goroutines that are multiplexed across some number of underlying native threads in a thread pool managed by the Go scheduler.
What I'm afraid of is running into a situation where I call into the C routine, then after the C routine returns, but before I copy the error string, the Go scheduler decides to swap the current goroutine out for another one. When the original goroutine gets swapped back in, it could potentially be on a different native thread for all I know, but even if it gets swapped back onto the same thread, any goroutines that ran there in the intervening time could've changed the state of the TLS, causing me to load an error string for an unrelated call.
My questions are these:
Is this a reasonable concern? Am I misunderstanding something about the go scheduler, or the way it interacts with cgo, that would cause this to not be an issue?
If this is a reasonable concern, how can I work around it?
cgo somehow manages to propagate errno values back to the calling Go code, which are also stored in TLS, which makes me think there must be a safe way to do this.
I can't think of a way that the C code itself could get preempted by the go scheduler, so should I introduce a wrapper C function and have IT make the necessary call and then conditionally copy the error string before returning back up to goland?
I'm interested in any solution that would allow me to propagate the error strings out to the rest of Go, but I'm hoping to avoid any solution that would require me to serialize accesses around the TLS, as adding a lock just to grab an error string seems greatly unfortunate to me.
Thanks in advance!

What I'm afraid of is running into a situation where I call into the C routine, then after the C routine returns, but before I copy the error string, the Go scheduler decides to swap the current goroutine out for another one. ...
Is this a reasonable concern?
Yes. The cgo "call C code" wrappers lock on to one POSIX / OS thread for the duration of each call, but the thread they lock is not fixed for all time; it does in fact bop around, as it were, to multiple different threads over time, as long as your goroutines are operating normally. (Since Go is cooperatively scheduled in the current implementations, you can, in some circumstances, be careful not to do anything that might let you switch underlying OS threads, but this is probably not a good plan.)
You can use runtime.LockOSThread here, but I think the best plan is otherwise:
how can I work around it?
Grab the error before Go resumes its normal scheduling algorithm (i.e., before unlocking the goroutine from the C / POSIX thread).
cgo somehow manages to propagate errno values ...
It grabs the errno value before unlocking the goroutine from the POSIX thread.
My original plan was to call into C via cgo, check if the call returned an error, and if so, wrap the error string using C.GoString to convert it from a raw character pointer into a Go string. It'd look something like C.GoString(C.get_error()).
If there is a variant of this that takes the error number (rather than fishing it out of a TLS variable), that plan should still work: just make sure that your C routines provide both the return value and the error number.
If not, write your own C wrapper, just as you suggested:
ftype wrapper_for_realfunc(char **errp, arg1type arg1, arg2type arg2) {
ftype ret = realfunc(arg1, arg2);
if IS_ERROR(ret) {
*errp = get_error();
} else {
*errp = NULL;
}
return ret;
}
Now your Go wrapper simply calls the wrapper, which fills in a pointer to C memory with an extra *C.char argument, setting it to nil if there is no error, and setting it to something on which you can use C.GoString if there is an error.
If that's not feasible for some reason, consider using runtime.LockOSThread and its counterpart, runtime.UnlockOSThread.

Related

How to implement a system call that could check if itself has been successfully executed without going to the kernel log?

I have just stepped into the kernel world and would like to add some system calls. My goal is to add a system call that lets me check if it executed (without looking at the kernel log). However, I have been thinking for a long time, but have not yet figured out how to implement it. Could anyone please give me some advice? Or some pseudocodes? Thanks in advance.
My thinking is that we could implement a new system call, in which it writes something into a buffer. Then, another system call reads the content of the buffer to check if the previous system call has written to the buffer. (Somehow like pthread_create and pthread_join) Hence, my implementation consists of 2 system calls in total.
Here is a sketch of my thinking written in pseudocode:
syscall_2(...){
if (syscall_1 executes)
return 0;
if (syscall_1 NOT executes)
return -1;
}
syscall_1(){
do something;
create a buffer;
write something into buffer;
return syscall_2(buffer); // checks what is in buffer
}
My suggestion is that you have the system call itself accept a pointer to a userspace buffer that it overwrites with a specific piece of information.
You will have to learn how to access userspace memory, and more importantly how to verify that you were given a pointer to memory the process has mapped, and has write access to.
Then, once the system call completes, your program that called it can not only check the return code of the system call, you can also examine the memory to see if the system call wrote the correct thing to it.
Normally, system calls inform the caller if they are executed (how it went) so I guess you are interested in knowing which system calls have been executed, and how many times.
From this perspective, I think the best is to implement a device that can be queried (by means of some ioctl call) and let you know statistics about the individual system calls you can be interested on.
For example.... you can implement the number of system calls of type n you have used in some time.... by checking a counter at start time of interval and at end, and then check how many calls (if you implement a counter) you did in between by just subtracting the counter values at both times. You can also do the same to e.g. calculate the average time, by accumulating the time a system call takes to execute at the end of it. If you do this for example in picosecs, you can be sure this will be a good idea that you can publish. In this schema you can also account for the amount of I/O that each system call does, by counting the amount of bytes transferred to/from usermode.... You could implement this as ioctls to some device and then you don't need to add a system call for it.

cryptsetup backend safe with multithreading?

Is there a crypto backend for cryptsetup that either is always thread safe, or can be easily used (or even modified, preferably with minimal effort) in a thread safe manner for simply testing if a key is correct?
Background and what I have tried:
I started by testing if I could modify the source of cryptsetup to simply test multiple keys using pthreads. This crashed, I believe I used gcrypt initially. Eventually I tried all of the backends available in the cryptsetup stable source and found that openssl and nettle seems to avoid crashing.
However, my testing was not very thorough and even though it (nettle specifically) does not crash, it seems that it does not work correctly when using threads. When using a single thread it always works, but increasing the number of threads makes it increasingly likely it will silently never find the correct key.
This is for brute forcing LUKS devices. I am aware the pbkdf slows it down to a crawl. I'm also aware the key space of AES cannot be exhausted even if the KDF was not there. This is just for the fun of making it in a network distributed and multithreaded manner.
I noticed in the source of cryptsetup (libdevmapper.c):
/*
* libdevmapper is not context friendly, switch context on every DM call.
* FIXME: this is not safe if called in parallel but neither is DM lib.
*/
However, it is possible I'm simply not using it correctly.
if(!LUKS_open_key_with_hdr(CRYPT_ANY_SLOT, key, strlen(key), &cd->u.luks1.hdr, &vk, cd)) {
return 0;
}
Each worker thread does this. I only call crypt_init() and crypt_load() once before the worker threads start up and pass them their own separate copy of the struct crypt_device. vk is created locally for each attempt. The keys are simply fetched from a wordlist with access control by a mutex. I found that if each thread calls these functions (crypt_init and crypt_load) every time, it seems to crash more easily.
Is it completely incorrect to try start removing and rewriting the code that uses dmcrypt? In LUKS_endec_template() it attaches a loop device to the crypto device, and creates a dm device which it eventually gives to open(), which it then gives the fd of to read_blockwise(). My idea was to simply skip all of that since I don't really need to use the device except to verify the key. However, simply opening the crypto device directly (and give it to read_blockwise()) does not work.

is sprintf thread safe?

is sprintf thread safe ?
//Global log buffer
char logBuffer[20];
logStatus (char * status, int length)
{
snprintf(logBuffer, 19, status);
printf ("%s\n", logBuffer);
}
The thread safety of this function totally depends upon the thread safety of snprintf/sprintf .
Updates :
thanks for ur answers .
i dont mind, if the actual contents gts messed up. but want to confirm that the sprintf would not cause a memory corruption / buffer overflow going beyond 20 bytes in this case, when multiple threads are trying to write to logBuffer ?
There is no problem using snprintf() in multiple threads. But here you are writing to a shared string buffer, which I assume is shared across threads.
So your use of this function would not be thread safe.
Your question has an incorrect premise. Even if sprintf itself can be safely called from multiple threads at the same time (as I sure hope it can), your code is not protecting your global variable. The standard library can't possibly help you there.
You have several problems with your code.
Your usage of snprintf is very suspicious. Don't use it just to
copy a string. Generally don't pass dynamically allocated strings
with whatever content as format to any of the printf functions.
They interpret the contents and if there is anything in them that
resembles a %-format, you are doomed.
Don't use static buffers as you do. This is certainly neither
thread safe not re-entrant.
Either use printf with an appropriate format directly, or replace
the call by puts.
Then, Linux adheres to the POSIX standard, which requires that the standard IO functions are thread safe.
Regarding your update about not worrying if the logBuffer content get garbled:
I'm not sure why you want to avoid making your function completely thread safe by using a locally allocated buffer or some synchronization mechanism, but if you want to know what POSIX has to say about it, here you go (http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap04.html#tag_04_11):
Applications shall ensure that access to any memory location by more than one thread of control (threads or processes) is restricted such that no thread of control can read or modify a memory location while another thread of control may be modifying it. Such access is restricted using functions that synchronize thread execution and also synchronize memory with respect to other threads. [followed by a list of functions which provide synchronization]
So, POSIX says that your program needs to make sure mutilple threads won't be modifying logBuffer concurrently (or modifying logBuffer in one thread while reading it in another). If you don't hold to that, there's no promise made that the worst that will happen is garbled data in logBuffer. There's simply no promise made at all about what the results will be. I don't know if Linux might document a more specific behavior, but I doubt it does.
"There is no problem using snprintf() in multiple threads."
Not true.
Not true, at least in case of POSIX functions.
All of the standard vararg functions are not mt-safe - this includes all the printf() family (1), but also every other variadic function as well (2)
sprintf() for example is: "MT-Safe locale|AS-Unsafe heap|AC-Unsafe mem" - what means, that it can fail if locale is set asynchronously or if asynchronous cancellation of threads is used. In other words, special attention must be paid when using such functions in MT environment.
va_arg is not mt-safe: |MT-Safe race:ap|AS-Safe|AC-Unsafe corrupt| - what means, that inter-locking is needed.
Additionally, what should be obvious, even totally mt-safe function can be used in unsafe way - what happens for example if two or more threads are operating the same data/memory areas.
It's not thread safe, since the buffer where you sprintf is shared between all threads.
"Do you have a refernce which says that they are not thread safe? When I Google, it seems that they are"
My previous answer to this question has been removed/deleted (why?), so I'll try again, using different approach:
AC (async. cancellation of threads): this is obviously a case when almost all of the "apparently MT-safe" code can fail, simply because the thread is interrupted at a random point of time, so none of synchronization methods are guaranted to work correctly (i.e. any form of mutex can't be really guranteed to work correctly)
Threads can use the same malloc() arena, what means, that if one of the threads will fail (i.e. it'll damage the malloc arena) then all the consecutive calls to malloc() will/can cause critical errors - this of course depends on system configuration - but it also means, that nobody should assume that malformed memory (de)allocations are safe.
Since all of the systems are providing the option to use different local settings, it is obvious, that async. change to the "locale" settings can cause errors...
Regards.

Why thread specific data is required in pthread?

all the threads share memory location. For example a global variable changes in one thread will reflect in another thread. Since each thread has its own stack, the local
variables that are created inside the thread is unique. In this case, why do we need
to go for thread specific data mechanism?. Can't it be achieved by auto storage varibles
inside the thread function ?
Kindly clarify!!!.
BR
Rj
Normal globals are shared between threads. Local variables are specific to a particular invocation of a function. If you want something that (for example) is visible to a number of functions running in the same thread, but unique to that thread, then thread specific data is what you're looking for.
It's not required but it's rather handy. Some functions like rand and strtok use static storage duration information which is likely to be problematic when shared among threads.
Say you have a random number function where you want to maintain a different sequence (hence seed) for each thread. You have two approaches.
You can use something like the kludgy:
int seed;
srand (&seed, time (NULL));
int r = rand_r (void *seed);
where the seed has to be created by the caller and passed in each time.
Or you can use the rather nicer, ISO-compliant:
srand (time (NULL));
int r = rand();
that uses thread-local storage to maintain a thread-specific seed. Similarly with the information used by strtok regarding the locations within the string it's processing.
That way, you don't have to muck about with changing your code between threaded and non-threaded versions.
Now you could create that information in the thread function but how is the rand function going to know about it's address without it being passed down. And what if rand is called 87 stack levels down? That's an awful lot of levels to be transferring a pointer through.
And, even if you do something like:
void pthread_fn (void *unused) {
int seed;
rand_set_seed_location (&seed);
:
}
and rand subsequently uses that value regardless of how deep it is in the stack, that's still a code change from the standard. It may work but so may writing an operating system in COBOL. That doesn't make it a good idea :-)
Yes, the stack is one way of allocating thread-local storage (including handles to heap allocations local to the particular thread).
The best example for thread specific data is the "errno". When a call to some function in c library failed, the errno is set, and you can check it out to find the reason of the failure. If there's no thread specific data, it's impossible to port these functions to multi-thread environment because the errno could be set by other threads before you check it.
As a general rule, most uses of TSD should be avoided in new APIs. If a function needs some information, it should be passed to it.
However, sometimes you need TSD to 'paper over' an API defect. A good example is 'gmtime'. The 'gmtime' function returns a pointer to a structure that is valid until the next call to 'gmtime'. But that would make 'gmtime' awfully hard to use in a multi-threaded program. What if some library called 'gmtime' when you didn't expect it, trashing your structure? One simple workaround is make the structure returned thread-specific. (The long-term solution, of course, is to create a more suitable API such as 'gmtime_r'.)
One case where it's perfectly reasonable to use TSD in new designs is for information that won't be accessed frequently that would clutter the API. For example, if a critical error is discovered, it might be nice to log certain context information from higher-level code (Which client were you serving? What command did they send?). Your choices are basically to pass this context information from function to function to function (which isn't even always possible if some of the functions are outside your control) or to store it in TSD.

C how to handle malloc returning NULL? exit() or abort()

When malloc() fails, which would be the best way to handle the error? If it fails, I want to immediately exit the program, which I would normally do with using exit(). But in this special case, I'm not quite sure if exit() would be the way to go here.
In library code, it's absolutely unacceptable to call exit or abort under any circumstances except when the caller broke the contact of your library's documented interface. If you're writing library code, you should gracefully handle any allocation failures, freeing any memory or other resources acquired in the attempted operation and returning an error condition to the caller. The calling program may then decide to exit, abort, reject whatever command the user gave which required excessive memory, free some unneeded data and try again, or whatever makes sense for the application.
In all cases, if your application is holding data which has not been synchronized to disk and which has some potential value to the user, you should make every effort to ensure that you don't throw away this data on allocation failures. The user will almost surely be very angry. It's best to design your applications so that the "save" function does not require any allocations, but if you can't do that in general, you might instead want to perform frequent auto-save-to-temp-file operations or provide a way of dumping the memory contents to disk in a form that's not the standard file format (which might for example require ugly XML and ZIP libraries, each with their own allocation needs, to write) but instead a more "raw dump" which you application can read and recover from on the next startup.
If malloc() returns NULL it means that the allocation was unsuccessful. It's up to you to deal with this error case. I personally find it excessive to exit your entire process because of a failed allocation. Deal with it some other way.
Use Both?
It depends on whether the core file will be useful. If no one is going to analyze it, then you may as well simply _exit(2) or exit(3).
If the program will sometimes be used locally and you intend to analyze any core files produced, then that's an argument for using abort(3).
You could always choose conditionally, so, with --debug use abort(3) and without it use exit.

Resources