Is sprintf thread safe? - C

Is sprintf thread safe?
// Global log buffer
char logBuffer[20];

void logStatus (char *status, int length)
{
    /* note: status is passed as the format string (see the answers below) */
    snprintf (logBuffer, 19, status);
    printf ("%s\n", logBuffer);
}
The thread safety of this function depends entirely on the thread safety of snprintf/sprintf.
Update:
Thanks for your answers. I don't mind if the actual contents get garbled, but I want to confirm that sprintf would not cause memory corruption or a buffer overflow by writing beyond 20 bytes in this case, when multiple threads are trying to write to logBuffer.

There is no problem using snprintf() in multiple threads. But here you are writing to a string buffer that, I assume, is shared across threads.
So your use of this function would not be thread safe.

Your question has an incorrect premise. Even if sprintf itself can be safely called from multiple threads at the same time (as I sure hope it can), your code is not protecting your global variable. The standard library can't possibly help you there.

You have several problems with your code.
Your usage of snprintf is very suspicious. Don't use it just to copy a string. More generally, don't pass dynamically allocated strings with arbitrary content as the format to any of the printf functions: they interpret the contents, and if anything in them resembles a %-conversion, you are doomed.
Don't use static buffers as you do. This is certainly neither thread safe nor re-entrant.
Either use printf with an appropriate format directly, or replace the call with puts.
Beyond that, Linux adheres to the POSIX standard, which requires that the standard IO functions be thread safe.
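To make that advice concrete, here is a minimal sketch of a safer variant (the function name and the 20-byte limit are taken from the question; the rest is illustrative): status is treated as data rather than as a format string, and the buffer lives on the stack, so concurrent calls cannot corrupt each other.

#include <stdio.h>

void logStatus (const char *status)
{
    char buf[20];
    /* "%s" copies status as data, never interpreting it as a format;
       snprintf also guarantees truncation and NUL termination */
    snprintf (buf, sizeof buf, "%s", status);
    printf ("%s\n", buf);   /* POSIX requires stdio to be thread safe */
}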

Regarding your update about not worrying if the logBuffer contents get garbled:
I'm not sure why you want to avoid making your function completely thread safe by using a locally allocated buffer or some synchronization mechanism, but if you want to know what POSIX has to say about it, here you go (http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap04.html#tag_04_11):
Applications shall ensure that access to any memory location by more than one thread of control (threads or processes) is restricted such that no thread of control can read or modify a memory location while another thread of control may be modifying it. Such access is restricted using functions that synchronize thread execution and also synchronize memory with respect to other threads. [followed by a list of functions which provide synchronization]
So, POSIX says that your program needs to make sure multiple threads won't be modifying logBuffer concurrently (or modifying logBuffer in one thread while reading it in another). If you don't hold to that, there's no promise that the worst that will happen is garbled data in logBuffer. There's simply no promise made at all about what the results will be. I don't know whether Linux documents a more specific behavior, but I doubt it does.
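If the global buffer has to stay, the synchronization functions POSIX lists come down to something like this sketch (the mutex and its placement are illustrative):

#include <pthread.h>
#include <stdio.h>

static char logBuffer[20];
static pthread_mutex_t logLock = PTHREAD_MUTEX_INITIALIZER;

void logStatus (const char *status)
{
    /* the lock serializes every reader and writer of logBuffer */
    pthread_mutex_lock (&logLock);
    snprintf (logBuffer, sizeof logBuffer, "%s", status);
    printf ("%s\n", logBuffer);
    pthread_mutex_unlock (&logLock);
}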

"There is no problem using snprintf() in multiple threads."
Not true, at least in the case of POSIX functions.
None of the standard vararg functions are MT-safe - this includes the whole printf() family (1), but also every other variadic function (2).
(1) sprintf(), for example, is annotated "MT-Safe locale|AS-Unsafe heap|AC-Unsafe mem", which means it can fail if the locale is changed asynchronously or if asynchronous cancellation of threads is used. In other words, special attention must be paid when using such functions in an MT environment.
(2) va_arg is not MT-safe either - "|MT-Safe race:ap|AS-Safe|AC-Unsafe corrupt|" - which means interlocking is needed.
Additionally, and this should be obvious, even a totally MT-safe function can be used in an unsafe way - for example, when two or more threads operate on the same data/memory areas.

It's not thread safe, since the buffer you sprintf into is shared between all threads.

"Do you have a refernce which says that they are not thread safe? When I Google, it seems that they are"
My previous answer to this question has been removed/deleted (why?), so I'll try again, using different approach:
AC (async. cancellation of threads): this is obviously a case when almost all of the "apparently MT-safe" code can fail, simply because the thread is interrupted at a random point of time, so none of synchronization methods are guaranted to work correctly (i.e. any form of mutex can't be really guranteed to work correctly)
Threads can use the same malloc() arena, what means, that if one of the threads will fail (i.e. it'll damage the malloc arena) then all the consecutive calls to malloc() will/can cause critical errors - this of course depends on system configuration - but it also means, that nobody should assume that malformed memory (de)allocations are safe.
Since all of the systems are providing the option to use different local settings, it is obvious, that async. change to the "locale" settings can cause errors...
Regards.

Related

cgo Interacting with C Library that uses Thread Local Storage

I'm in the midst of wrapping a C library with cgo to be usable by normal Go code.
My problem is that I'd like to propagate error strings up to the Go API, but the C library in question makes error strings available via thread-local storage; there's a global get_error() call that returns a pointer to thread local character data.
My original plan was to call into C via cgo, check if the call returned an error, and if so, wrap the error string using C.GoString to convert it from a raw character pointer into a Go string. It'd look something like C.GoString(C.get_error()).
The problem that I foresee here is that TLS in C works on the level of native OS threads, but in my understanding, the calling Go code will be coming from one of potentially N goroutines that are multiplexed across some number of underlying native threads in a thread pool managed by the Go scheduler.
What I'm afraid of is running into a situation where I call into the C routine, then after the C routine returns, but before I copy the error string, the Go scheduler decides to swap the current goroutine out for another one. When the original goroutine gets swapped back in, it could potentially be on a different native thread for all I know, but even if it gets swapped back onto the same thread, any goroutines that ran there in the intervening time could've changed the state of the TLS, causing me to load an error string for an unrelated call.
My questions are these:
Is this a reasonable concern? Am I misunderstanding something about the go scheduler, or the way it interacts with cgo, that would cause this to not be an issue?
If this is a reasonable concern, how can I work around it?
cgo somehow manages to propagate errno values back to the calling Go code, which are also stored in TLS, which makes me think there must be a safe way to do this.
I can't think of a way that the C code itself could get preempted by the Go scheduler, so should I introduce a wrapper C function and have IT make the necessary call and then conditionally copy the error string before returning back up to Go land?
I'm interested in any solution that would allow me to propagate the error strings out to the rest of Go, but I'm hoping to avoid any solution that would require me to serialize accesses around the TLS, as adding a lock just to grab an error string seems greatly unfortunate to me.
Thanks in advance!
What I'm afraid of is running into a situation where I call into the C routine, then after the C routine returns, but before I copy the error string, the Go scheduler decides to swap the current goroutine out for another one. ...
Is this a reasonable concern?
Yes. The cgo "call C code" wrappers lock on to one POSIX / OS thread for the duration of each call, but the thread they lock is not fixed for all time; it does in fact bop around, as it were, to multiple different threads over time, as long as your goroutines are operating normally. (Since Go is cooperatively scheduled in the current implementations, you can, in some circumstances, be careful not to do anything that might let you switch underlying OS threads, but this is probably not a good plan.)
You can use runtime.LockOSThread here, but I think the best plan is otherwise:
how can I work around it?
Grab the error before Go resumes its normal scheduling algorithm (i.e., before unlocking the goroutine from the C / POSIX thread).
cgo somehow manages to propagate errno values ...
It grabs the errno value before unlocking the goroutine from the POSIX thread.
My original plan was to call into C via cgo, check if the call returned an error, and if so, wrap the error string using C.GoString to convert it from a raw character pointer into a Go string. It'd look something like C.GoString(C.get_error()).
If there is a variant of this that takes the error number (rather than fishing it out of a TLS variable), that plan should still work: just make sure that your C routines provide both the return value and the error number.
If not, write your own C wrapper, just as you suggested:
ftype wrapper_for_realfunc(char **errp, arg1type arg1, arg2type arg2) {
    ftype ret = realfunc(arg1, arg2);
    if (IS_ERROR(ret)) {
        /* still on the same OS thread as the realfunc call */
        *errp = get_error();
    } else {
        *errp = NULL;
    }
    return ret;
}
Now your Go wrapper simply calls the wrapper, which fills in a pointer to C memory with an extra *C.char argument, setting it to nil if there is no error, and setting it to something on which you can use C.GoString if there is an error.
If that's not feasible for some reason, consider using runtime.LockOSThread and its counterpart, runtime.UnlockOSThread.

GLib handle out of memory

I have a question concerning GLib.
I would like to use GLib in a server context, but I'm not sure how its memory is managed:
https://developer.gnome.org/glib/stable/glib-Memory-Allocation.html
If any call to allocate memory fails, the application is terminated. This also means that there is no need to check if the call succeeded.
If I look at the source code, when g_malloc fails it calls g_error():
#define g_error(...)
A convenience function/macro to log an error message.
Error messages are always fatal, resulting in a call to abort() to terminate the application. [...]
But in my case, as I'm developing a server application, I don't want the application to exit. I would prefer that, like the traditional malloc function, the GLib functions return NULL or something else to indicate that an error happened.
So, my question is: is there a way to handle out of memory?
Is GLib not recommended for server applications?
Looking at the man page for abort, I see that I could handle the signal, but that would make the management of out-of-memory errors a little bit painful...
The abort() function causes abnormal program termination to occur, unless the signal SIGABRT is being caught and the signal handler does not return.
Thanks for your help!
It's very difficult to recover from lack of memory. The reason is that it can be considered a terminal state, in the sense that lack of memory will persist for some time before it goes away. Even reacting to the lack of memory (like informing the user) might require more memory, for example to build and send a message. A related problem is that some operating systems (Linux, at least) may be over-optimistic about allocating memory. When the kernel realizes memory is missing, it may kill the application, even if your code is handling the failures.
So either you have a much stricter grasp of your whole system than average, or you won't be able to successfully handle out of memory errors, and, in this case, it doesn't matter what the helper library is doing.
If you really want to control memory allocation while still using glib, you have partial ways to do that. Don't use any glib allocation function and use some from other library. Glib provides functions that receive a "free function" when necessary. For example:
https://developer.gnome.org/glib/2.31/glib-Hash-Tables.html#g-hash-table-new-full
The hash table constructor accepts functions for destroying both keys and values. In your case, the data would be allocated using your custom allocation functions, while the hash data structures themselves are allocated with GLib functions.
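A minimal sketch of that split (the key/value strings are illustrative): keys and values come from plain malloc/strdup, so their allocation can fail gracefully; GLib only allocates the table's internal bookkeeping (and that part can still abort on OOM).

#include <glib.h>
#include <stdlib.h>
#include <string.h>

int main (void)
{
    /* free() is invoked on keys and values when entries are removed,
       so they may come from any allocator we choose */
    GHashTable *table = g_hash_table_new_full (g_str_hash, g_str_equal,
                                               free, free);
    char *key = strdup ("client");
    char *val = strdup ("10.0.0.7");
    if (key != NULL && val != NULL) {
        g_hash_table_insert (table, key, val);
    } else {
        free (key);     /* recoverable failure: reject, don't abort */
        free (val);
    }
    g_hash_table_destroy (table);
    return 0;
}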
Alternatively you could use the g_try_* allocators, so you still use the GLib allocator, but it won't abort on error. Again, this only partially solves the problem: internally, GLib will implicitly call functions that may abort, since it assumes allocation never fails.
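For instance (a sketch; the function and its parameters are made up), g_try_malloc returns NULL on failure instead of terminating the process:

#include <glib.h>

gboolean handle_request (gsize payload_size)
{
    guint8 *buf = g_try_malloc (payload_size);
    if (buf == NULL)
        return FALSE;   /* reject this request, keep the server up */
    /* ... process the request using buf ... */
    g_free (buf);
    return TRUE;
}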
About the general question: does it make sense for a server to crash when it's out of memory ? The obvious answer is no, but I can't estimate how theoretical this answer is. I can only expect that the server system be properly sized for its operation and reject as invalid any input that could potentially exceed its capacities, and for that, it doesn't matter which libraries it might use.
I'm probably editorializing a bit here, but the modern tendency to use virtual/logical memory (both names have been used, although "logical" is more distinct) does dramatically complicate knowing when memory is exhausted, although I think one can restore the old, real-(RAM + swap) model (I'll call this the physical model) in Linux with the following in /etc/sysctl.d/10-no-overcommit.conf:
vm.overcommit_memory = 2
vm.overcommit_ratio = 100
This restores the ability to have the philosophy that if a program's malloc just failed, that program has a good chance of having been the actual cause of memory exhaustion, and can then back away from the construction of the current object, freeing memory along the way, possibly grumbling at the user for having asked for something crazy that needed too much RAM, and awaiting the next request. In this model, most OOM conditions resolve almost instantly - the program either copes and presumably returns RAM, or gets killed immediately on the following SEGV when it tries to use the 0 returned by malloc.
With the virtual/logical memory models that Linux tends to default to in 2013, this doesn't work, since a program won't find out that memory is unavailable at malloc time, but only upon attempting to access the memory later, at which point the kernel finally realizes there's nowhere in RAM to put it. This amounts to disaster, since any program on the system can die, rather than the one that ran the host out of RAM. One can understand why some GLib folks don't even care about trying to fix this problem, because with the logical memory model, it can't be fixed.
The original point of logical memory was to allow huge programs using more than half the memory of the host to still be able to fork and exec supporting programs. It was typically enabled only on hosts with that particular usage pattern. Now in 2013 when a home workstation can have 24+ GiB of RAM, there's really no excuse to have logical memory enabled at all 99% of the time. It should probably be disabled by default on hosts with >4 GiB of RAM at boot.
Anyway. So if you want to take the old-school physical model approach, make sure your computer has it enabled, or there's no point to testing your malloc and realloc calls.
If you are in that model, remember that GLib wasn't really guided by the same philosophy (see http://code.google.com/p/chromium/issues/detail?id=51286#c27 for just how madly astray some of them are). Any library based on GLib might be infected with the same attitude as well. However, there may be some interesting things one can do with GLib in the physical memory model by emplacing your own memory handlers with g_mem_set_vtable(), since you might be able to poke around in program globals and reduce usage in a cache or the like to free up space, then retry the underlying malloc. However, that's limited in its own way by not knowing which object was under construction at the point your special handler is invoked.
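Here is a rough sketch of that idea under the physical memory model; drop_caches() is a hypothetical application hook, and the retry loop is only one plausible shape for such a handler:

#include <glib.h>
#include <stdlib.h>

/* hypothetical: frees some application cache, returns TRUE if it did */
extern gboolean drop_caches (void);

static gpointer retrying_malloc (gsize n)
{
    gpointer p = malloc (n);
    while (p == NULL && drop_caches ())   /* shed cache, then retry */
        p = malloc (n);
    return p;
}

static gpointer retrying_realloc (gpointer mem, gsize n)
{
    gpointer p = realloc (mem, n);
    while (p == NULL && drop_caches ())
        p = realloc (mem, n);
    return p;
}

static GMemVTable vtable = {
    retrying_malloc,
    retrying_realloc,
    free,
    NULL, NULL, NULL   /* calloc / try_malloc / try_realloc: defaults */
};

int main (int argc, char **argv)
{
    g_mem_set_vtable (&vtable);   /* must precede any other GLib call */
    /* ... the rest of the program ... */
    return 0;
}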

Are memmove and malloc thread safe?

I have some code that moves bytes in a buffer using memmove(). The buffer is accessed by multiple threads. I get very weird behavior; sometimes the buffer is not what it should be, and I was wondering whether memmove() and/or malloc() are thread safe. I'm working on iOS (in case this is platform dependent).
In an implementation that provides threads, malloc will normally be thread safe (i.e., it will take steps to assure the heap doesn't get corrupted, even if malloc gets called from multiple threads). The exact way it will do that varies: some use a single heap, and internal synchronization to ensure against corruption. Others will use multiple heaps, so different threads can allocate memory simultaneously without collisions.
memmove will normally be just like if you did assignments in your own code -- if you're sharing a buffer across threads, it's your responsibility to synchronize access to that data.
You should be using a mutex (NSLock) as a protective barrier around accessing your buffer. Take a look at Synchronization in Apple's Threading Programming Guide.
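A minimal pthread sketch of that protection (the buffer, sizes, and function are illustrative; on iOS an NSLock would play the same role):

#include <pthread.h>
#include <string.h>

#define BUF_SIZE 1024

static char buffer[BUF_SIZE];
static pthread_mutex_t buffer_lock = PTHREAD_MUTEX_INITIALIZER;

/* every reader and writer must take the same lock; memmove itself
   does nothing to protect the bytes it touches */
void shift_buffer (size_t from, size_t to, size_t len)
{
    pthread_mutex_lock (&buffer_lock);
    memmove (buffer + to, buffer + from, len);
    pthread_mutex_unlock (&buffer_lock);
}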
Malloc may be thread-safe. The standard doesn't require it, but many C compilers are used in systems whose applications require thread safety, and your particular compiler's library may be thread safe, or offer a thread-safe option. I don't know about iOS.
Memmove (or any other kind of block move) is not thread safe, any more than an assignment statement is thread safe.
Since the current C standard does not specify threads, it has nothing to say about thread safety. Whenever you have threads, you're dealing with a system that has placed further requirements, beyond the basic C language standard's requirements, on how the standard library functions behave. I'm not sure what requirements iOS makes, but POSIX and Windows both require malloc to be thread-safe, and I'd find it hard to believe any system designed after the mid 90s would not make this requirement.
Note that the upcoming C1x standard will specify threads, and if the implementation has threads, then malloc will be required to be thread-safe.
No, since the standard C library doesn't have a concept of threads, the functions it defines can't be.

Why thread specific data is required in pthread?

All the threads of a process share its memory; for example, a change to a global variable in one thread will be reflected in another thread. Since each thread has its own stack, the local variables created inside a thread function are unique to it. In that case, why do we need a thread-specific data mechanism? Can't the same thing be achieved with automatic storage variables inside the thread function?
Kindly clarify!
BR,
Rj
Normal globals are shared between threads. Local variables are specific to a particular invocation of a function. If you want something that (for example) is visible to a number of functions running in the same thread, but unique to that thread, then thread specific data is what you're looking for.
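As a sketch of that middle ground (using the common __thread compiler extension, standardized as _Thread_local in C11; all names here are made up): request_id is visible to every function running in the thread without being passed around, yet each thread sees its own copy.

#include <stdio.h>

static __thread int request_id;   /* one copy per thread */

static void log_step (const char *msg)
{
    /* no parameter needed: per-thread, but "global" within the thread */
    printf ("[req %d] %s\n", request_id, msg);
}

void *worker (void *arg)
{
    request_id = *(int *) arg;
    log_step ("started");
    /* ... */
    log_step ("finished");
    return NULL;
}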
It's not required, but it's rather handy. Some functions, like rand and strtok, use static-storage-duration information, which is likely to be problematic when shared among threads.
Say you have a random number function where you want to maintain a different sequence (hence seed) for each thread. You have two approaches.
You can use something like the kludgy:
unsigned int seed = time (NULL);   /* the caller owns the seed */
int r = rand_r (&seed);            /* and passes it in every call */
where the seed has to be created by the caller and passed in each time.
Or you can use the rather nicer, ISO-compliant:
srand (time (NULL));
int r = rand();
that uses thread-local storage to maintain a thread-specific seed. Similarly with the information used by strtok regarding the locations within the string it's processing.
That way, you don't have to muck about with changing your code between threaded and non-threaded versions.
Now you could create that information in the thread function, but how is the rand function going to know its address without it being passed down? And what if rand is called 87 stack levels down? That's an awful lot of levels to transfer a pointer through.
And, even if you do something like:
void *pthread_fn (void *unused) {
    int seed;
    rand_set_seed_location (&seed);
    :
}
and rand subsequently uses that value regardless of how deep it is in the stack, that's still a code change from the standard. It may work but so may writing an operating system in COBOL. That doesn't make it a good idea :-)
Yes, the stack is one way of allocating thread-local storage (including handles to heap allocations local to the particular thread).
The best example of thread-specific data is errno. When a call to some function in the C library fails, errno is set, and you can check it to find the reason for the failure. Without thread-specific data, it would be impossible to port these functions to a multi-threaded environment, because errno could be overwritten by other threads before you check it.
As a general rule, most uses of TSD should be avoided in new APIs. If a function needs some information, it should be passed to it.
However, sometimes you need TSD to 'paper over' an API defect. A good example is 'gmtime'. The 'gmtime' function returns a pointer to a structure that is valid until the next call to 'gmtime'. But that would make 'gmtime' awfully hard to use in a multi-threaded program. What if some library called 'gmtime' when you didn't expect it, trashing your structure? One simple workaround is make the structure returned thread-specific. (The long-term solution, of course, is to create a more suitable API such as 'gmtime_r'.)
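For comparison, a small sketch of the re-entrant variant mentioned above: gmtime_r writes into a caller-supplied struct, so each thread keeps its own result and no thread-specific storage is involved.

#include <stdio.h>
#include <time.h>

void print_utc_date (void)
{
    time_t now = time (NULL);
    struct tm parts;              /* caller-owned: no shared state */
    if (gmtime_r (&now, &parts) != NULL)
        printf ("%04d-%02d-%02d\n", parts.tm_year + 1900,
                parts.tm_mon + 1, parts.tm_mday);
}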
One case where it's perfectly reasonable to use TSD in new designs is for information that won't be accessed frequently that would clutter the API. For example, if a critical error is discovered, it might be nice to log certain context information from higher-level code (Which client were you serving? What command did they send?). Your choices are basically to pass this context information from function to function to function (which isn't even always possible if some of the functions are outside your control) or to store it in TSD.

C : how pthread dataspecific works?

I'm not sure how pthread thread-specific data works. Considering the following code (found on the web): does this mean I can create, for example, 5 threads in main and call func in only some of them (say 2), so that those threads get the data for 'key' set to something (ptr = malloc(OBJECT_SIZE)) while the other threads see the same key but with a NULL value?
static pthread_key_t key;
static pthread_once_t key_once = PTHREAD_ONCE_INIT;

static void
make_key (void)
{
    (void) pthread_key_create (&key, NULL);
}

void
func (void)
{
    void *ptr;

    (void) pthread_once (&key_once, make_key);
    if ((ptr = pthread_getspecific (key)) == NULL) {
        ptr = malloc (OBJECT_SIZE);
        ...
        (void) pthread_setspecific (key, ptr);
    }
    ...
}
Some explanation of how thread-specific data works, and how it may have been implemented in pthreads (in simple terms), would be appreciated!
Your reasoning is correct. These calls are for thread-specific data. They're a way of giving each thread a "global" area where it can store what it needs, but only if it needs it.
The key is shared among all threads, since it's created with pthread_once() the first time it's needed, but the value given to that key is different for each thread (unless it remains set to NULL). By having the value a void* to a memory block, a thread that needs thread-specific data can allocate it and save the address for later use. And threads that don't call a routine that needs thread-specific data never waste memory since it's never allocated for them.
The one area where I have used them is to make a standard C library thread-safe. The strtok() function (as opposed to a thread-safe strtok_r() which was considered an abomination when we were doing this) in an implementation I was involved in used almost this exact same code the first time it was called, to allocate some memory which would be used by strtok() for storing information for subsequent calls. These subsequent calls would retrieve the thread-specific data to continue tokenizing the string without interfering with other threads doing the exact same thing.
It meant users of the library didn't have to worry about cross-talk between threads - they still had to ensure a single thread didn't call the function until the last one had finished but that's the same as with single-threaded code.
It allowed us to give a 'proper' C environment to each thread running in our system without the usual "you have to call these special non-standard re-entrant routines" limitations that other vendors imposed on their users.
As for implementation, from what I remember of DCE user-mode threads (which I think were the precursor to the current pthreads), each thread had a single structure which stored things like instruction pointers, stack pointers, register contents and so on. It was a very simple matter to add one pointer to this structure to achieve very powerful functionality at minimal cost. The pointer pointed to an array (a linked list in some implementations) of key/pointer pairs, so each thread could have multiple keys (e.g., one for strtok(), one for rand()).
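A toy model of that scheme (everything here is made up for illustration; real pthreads adds locking around key creation and runs destructors at thread exit):

#define MAX_KEYS 32

struct tcb {                    /* one per thread, like the DCE TCB */
    /* ... instruction pointer, stack pointer, registers ... */
    void *specific[MAX_KEYS];   /* key -> per-thread value slots */
};

static int next_key = 0;        /* shared by all threads */

int toy_key_create (int *key)
{
    if (next_key >= MAX_KEYS)
        return -1;              /* out of keys */
    *key = next_key++;
    return 0;
}

void *toy_getspecific (struct tcb *self, int key)
{
    return self->specific[key];
}

void toy_setspecific (struct tcb *self, int key, void *value)
{
    self->specific[key] = value;
}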
The answer to your first question is yes. In simple terms, it allows each thread to allocate and save its own data. This is roughly equivalent to each thread simply allocating its own data structure and passing it around; the API saves you the trouble of passing the thread-local structure to all subfunctions, and allows you to look it up on demand instead.
The implementation really doesn't matter all that much (it may vary per-OS), as long as the results are the same.
You can think of it as a two-level hashmap. The key specifies which thread-local "variable" you want to access, and the second level might perform a thread-id lookup to request the per-thread value.
