Using non-thread safe libraries with threads. Say I have a library that makes a connection to a server, and it is not thread safe. Can I initialize the library inside 2 threads?
ie:
thread_1(){
    telnet_lib_t *connection1;
    while(1){
        do_somestuff;
    }
    free_telnet(connection1);
}
thread_2(){
    telnet_lib_t *connection2;
    while(1){
        do_somestuff;
    }
    free_telnet(connection2);
}
Would this work? Now I have 2 independent instances of the library running. So they would not interfere with each other, right?
No, you can't do this. If the library has no global state and its functions are just internally non-thread-safe, you could solve the problem by having the whole library protected by a mutex and only allowing one thread to access it at once (but this might be prohibitive, especially if the library performs any slow or blocking tasks). But if the library fundamentally has a singular global state it uses, and no way to save/restore/swap states, then there's simply no way to use it in multiple threads (or even to use multiple contexts in alternation in a non-threaded program). Such libraries are generally considered trash and should be replaced.
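A minimal sketch of the "protect the whole library with a mutex" option, assuming POSIX threads. `unsafe_lib_call` and its counter are hypothetical stand-ins for a library function with internal, unsynchronized bookkeeping; they are not part of any real library.

```c
#include <pthread.h>

/* Hypothetical stand-in for a non-thread-safe library function:
 * it updates internal static bookkeeping without any locking. */
static long unsafe_lib_counter = 0;

static void unsafe_lib_call(void) {
    unsafe_lib_counter++;              /* read-modify-write, racy on its own */
}

/* One process-wide lock guarding every entry point of the library. */
static pthread_mutex_t lib_lock = PTHREAD_MUTEX_INITIALIZER;

static void safe_lib_call(void) {
    pthread_mutex_lock(&lib_lock);
    unsafe_lib_call();
    pthread_mutex_unlock(&lib_lock);
}

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++)
        safe_lib_call();
    return NULL;
}

/* Runs two threads through the wrapper and returns the final counter. */
long run_two_workers(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return unsafe_lib_counter;
}
```

This only helps if every call into the library goes through the wrapper, and it serializes the threads for the duration of each call.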
It depends on why it isn't threadsafe.
For example, if the library uses some static variable then that would still be shared between two threads.
So, generally, this would be a bad idea. If something isn't thread safe, don't use it from multiple threads. You could instead fork a child process and use it there, which is heavier than threads but safer.
Without knowing more about the specifics of the non-thread-safe library, it's not possible to say it's safe to use as you suggest.
If the library has any global shared resource (e.g. a global variable), the two threads could well step on each other, overwriting that global variable in a manner not intended by the library writer.
The trouble is, no amount of testing can prove with certainty that you will not eventually trigger a conflict.
If you must use the library in parallel, the only safe way I can think of to do that is to use process isolation... create multiple child processes that each load an instance of the library.
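As a rough illustration of process isolation, here is a sketch assuming a POSIX system. `unsafe_compute` and `unsafe_state` are hypothetical stand-ins for a library that relies on a global variable; the child gets its own copy of that state and sends its result back through a pipe.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical non-thread-safe routine relying on a static variable. */
static int unsafe_state = 0;

static int unsafe_compute(int input) {
    unsafe_state = input;
    return unsafe_state * 2;
}

/* Runs unsafe_compute(input) in a child process and returns the result
 * read back through a pipe; returns -1 on failure. */
int compute_in_child(int input) {
    int fds[2];
    if (pipe(fds) != 0) return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {                    /* child: private copy of unsafe_state */
        close(fds[0]);
        int result = unsafe_compute(input);
        ssize_t n = write(fds[1], &result, sizeof result);
        _exit(n == (ssize_t)sizeof result ? 0 : 1);
    }

    close(fds[1]);                     /* parent: wait for the answer */
    int result = -1;
    ssize_t n = read(fds[0], &result, sizeof result);
    close(fds[0]);
    int status;
    waitpid(pid, &status, 0);
    return n == (ssize_t)sizeof result ? result : -1;
}
```

Note that the parent's `unsafe_state` is never touched. If the parent is itself multi-threaded, forking has its own restrictions (the child should stick to async-signal-safe calls until it execs), so a real design would more likely start the worker processes before creating any threads.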
You can do that only if you know that telnet_lib_t and its functions don't work with any global state (i.e. they don't rely on global variables). If you know for sure that the library's state is contained within each instance, then use it; otherwise don't. Even if you don't run into any issues during your testing, that doesn't mean there isn't a problem lurking somewhere.
Can NIF implementations use the regular C/C++ thread locking primitives, or must they use the NIF APIs (enif_mutex_lock(..), enif_mutex_create(..), etc.)?
From the NIF docs:
Threads and concurrency
A NIF is thread-safe without any explicit synchronization as long as it acts as a pure function and only reads the supplied arguments. As soon as you write towards a shared state either through static variables or enif_priv_data you need to supply your own explicit synchronization. This includes terms in process independent environments that are shared between threads. Resource objects will also require synchronization if you treat them as mutable.
So there is nothing forbidding you from doing whatever you want. You can easily write your own mutexes, semaphores and whatnot. And you can do it in C or C++ or Rust.
That said, there is nothing preventing you from breaking everything. If you break anything, you break it in the whole VM. I would try to use the standard Erlang ways of doing things, especially when playing with threads. Those are verified methods, and I haven't found any reason for replacing them with anything else.
I am writing a shared library in C. I know C functions are not thread safe.
My library routines look like this:
struct lib_handle {
....
};
int lib_init(struct lib_handle **handle);
int lib_process(struct lib_handle *handle);
....
....
Every method takes a pointer to lib_handle object. All the state is stored inside this structure. No global variables are used.
I assume that if each thread creates its own lib_handle instances, multiple threads can use the library functions. Since each thread has its own handle, everything should work.
I haven't validated this assumption yet. I am wondering what you guys think about this design, and do you think I can state my library is thread safe given each thread has its own handles?
Any help would be great!
That will make the data/state of the library thread safe.
But you also have to make sure that your library uses threadsafe functions from other libraries, e.g. use strtok_r instead of strtok.
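For example, a small sketch of the strtok_r point, assuming a POSIX libc. `count_tokens` is a made-up helper; the relevant part is that the tokenizer position lives in the caller-supplied `saveptr` rather than in hidden static storage as with strtok.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>

/* Counts comma-separated tokens in buf (which is modified in place).
 * strtok_r keeps its position in saveptr, so two threads tokenizing
 * different buffers do not disturb each other. */
int count_tokens(char *buf) {
    char *saveptr = NULL;
    int count = 0;
    for (char *tok = strtok_r(buf, ",", &saveptr);
         tok != NULL;
         tok = strtok_r(NULL, ",", &saveptr)) {
        count++;
    }
    return count;
}
```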
Threads work in a shared memory space. Unsafe objects are the objects which can be accessed by multiple threads simultaneously. So if you have a separate lib_handle object for each thread, there will be no problems.
If each thread has a private lib_handle object, your library should be fully thread safe; if you let several threads share lib_handle objects, the person using your library can still make a thread-safe program if she uses your library correctly (i.e. your library is not inherently thread-unsafe, which it would be if you used e.g. global variables).
If this mode of operation (shared lib_handle) is of interest, you should clearly separate the functions which only read the state of lib_handle from those which manipulate it. The former need a read lock and the latter need a write lock (the calling scope must handle this).
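A sketch of what caller-side read/write locking around a shared handle might look like, assuming POSIX rwlocks. The accessor names (`lib_get_value`, `lib_set_value`) and the `value` field are invented for illustration; the point is only that the library stays lock-free and the calling code takes the lock.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>

/* Hypothetical library: one read-only function, one mutating function. */
struct lib_handle { int value; };

int lib_get_value(const struct lib_handle *h) { return h->value; }
void lib_set_value(struct lib_handle *h, int v) { h->value = v; }

/* Calling code: one handle shared by several threads, plus its lock. */
static struct lib_handle shared_handle = { 0 };
static pthread_rwlock_t shared_lock = PTHREAD_RWLOCK_INITIALIZER;

/* Many threads may hold the read lock at the same time. */
int caller_read(void) {
    pthread_rwlock_rdlock(&shared_lock);
    int v = lib_get_value(&shared_handle);
    pthread_rwlock_unlock(&shared_lock);
    return v;
}

/* The write lock excludes all readers and other writers. */
void caller_write(int v) {
    pthread_rwlock_wrlock(&shared_lock);
    lib_set_value(&shared_handle, v);
    pthread_rwlock_unlock(&shared_lock);
}
```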
For what it is worth I have used the pattern you describe quite a lot, and like it.
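To make the one-handle-per-thread pattern concrete, here is a hedged sketch assuming POSIX threads. The `processed` field, the function bodies, and `lib_free` are assumptions made for illustration; the original struct contents are elided and stay unknown.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical contents; the real struct body is not shown in the question. */
struct lib_handle {
    long processed;
};

int lib_init(struct lib_handle **handle) {
    *handle = calloc(1, sizeof **handle);
    return *handle ? 0 : -1;
}

int lib_process(struct lib_handle *handle) {
    handle->processed++;               /* touches only this handle's state */
    return 0;
}

void lib_free(struct lib_handle *handle) { free(handle); }

/* Each thread creates, uses, and destroys its own handle. */
static void *worker(void *out) {
    struct lib_handle *h;
    if (lib_init(&h) != 0) return NULL;
    for (int i = 0; i < 1000; i++)
        lib_process(h);
    *(long *)out = h->processed;
    lib_free(h);
    return NULL;
}

/* Returns 1 if both threads saw exactly their own 1000 calls. */
int run_private_handles(void) {
    pthread_t t[2];
    long counts[2] = { 0, 0 };
    for (int i = 0; i < 2; i++)
        pthread_create(&t[i], NULL, worker, &counts[i]);
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);
    return counts[0] == 1000 && counts[1] == 1000;
}
```

No locking is needed here because no handle is ever visible to more than one thread.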
How does one modify a thread's data from outside a thread?
If a thread is running a function that loops for the runtime of the application, how can its data be set or changed?
How does one call functions which modify a specific thread's functions?
Where do these functions belong?
The advantage and disadvantage of threads is that they share the memory space with every other thread in the process. You can use any form of data transfer you would use in a single-threaded application to pass data between segments of your application. However, in a multi-threaded application you must use some type of synchronization to ensure data integrity, and you must use it carefully to avoid deadlocks.
If the "thread's data" you want to modify from outside is in the form of local variables in a function running in the thread, or thread-specific data created with the __thread extension, then the only way you can modify them from outside (modulo code with UB that's technically just trashing memory) is by having the thread take the addresses of its variables and store them somewhere other threads can see them (either in a global variable, or at a location passed in via the thread start function's void * argument).
Also note that, as rerun pointed out, you have to use some method of synchronization if multiple threads are accessing the same data. The portable POSIX synchronization methods are the pthread ones: pthread_mutex_lock etc. (C11 also adds the optional <threads.h> and <stdatomic.h>), but you can also use assembly or compiler intrinsics (like __sync_* in gcc).
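One way this could look, as a sketch assuming POSIX threads: the data the looping thread works on lives in a struct whose address is passed through the start function's void * argument, and both sides take a mutex before touching it. The names `worker_ctl` and `run_and_stop_worker` are made up for the example.

```c
#include <pthread.h>
#include <sched.h>

/* State shared between the controlling thread and the looping worker. */
struct worker_ctl {
    pthread_mutex_t lock;
    int stop;            /* set from outside to end the loop */
    long iterations;     /* updated by the worker */
};

static void *worker_loop(void *arg) {
    struct worker_ctl *ctl = arg;
    for (;;) {
        pthread_mutex_lock(&ctl->lock);
        int stop = ctl->stop;
        if (!stop) ctl->iterations++;
        pthread_mutex_unlock(&ctl->lock);
        if (stop) break;
        sched_yield();   /* give other threads a chance to take the lock */
    }
    return NULL;
}

/* Starts the worker, changes its data from outside, and waits for it.
 * Returns 0 on success. */
int run_and_stop_worker(void) {
    struct worker_ctl ctl;
    pthread_t t;
    pthread_mutex_init(&ctl.lock, NULL);
    ctl.stop = 0;
    ctl.iterations = 0;
    if (pthread_create(&t, NULL, worker_loop, &ctl) != 0) return -1;

    pthread_mutex_lock(&ctl.lock);
    ctl.stop = 1;        /* modify the worker's data under the lock */
    pthread_mutex_unlock(&ctl.lock);

    int rc = pthread_join(t, NULL);
    pthread_mutex_destroy(&ctl.lock);
    return rc;
}
```

The struct must outlive the thread, which is why the function joins before returning.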
I want to write a high performance synchronized generator in C. I want to be able to feed events to it and have multiple threads be able to poll/read asynchronously, such that threads never receive duplicates.
I don't really know that much about how synchronization is typically done. Can someone give me a high level explanation of one or more techniques that I might be able to use?
Thanks!
You need a thread implementation; before C11, C did not have any built-in support for concurrency (and C11's <threads.h> is optional). Threads are thus often implemented as libraries. Such a library will typically provide you with ways to synchronize the execution of multiple threads, ways to protect data, and so on.
The main concept in thread safety is the mutex (though there are different kinds of locks).
It is used to protect your memory from concurrent accesses and race conditions.
A good example of its use would be a linked list. You can't allow two different threads to modify it at the same time. In your example, you could possibly use a linked list to create a queue, and each thread would consume some data from it.
Obviously there are other synchronization mechanisms, but this one is (by far?) the most important.
You could have a look at this page (and referenced pages at the bottom) for more implementation details.
Thread safety becomes a problem when variables are shared between threads. If you don't have any shared variables, it's not a problem. Every event can be read-only and dispatched to listeners randomly.
Thread safety is achieved by using whatever synchronisation primitives the multithreading implementation provides.
Your starting point would probably be a linked list of events and a lock that protects it: every thread takes the lock, consumes one event by adjusting the pointer to the first event, and then releases the lock; appending events also locks the entire list. When the list is empty, the workers exit.
From there, various optimisations are possible:
Caching the pointer to the last event, so appending an event to the list becomes cheaper.
Adding a notification mechanism so worker threads can sleep while the list is empty. Typically, this is achieved with something called a condition variable.
Using multiple lists, so if the first list is locked, the worker can retrieve an event from another list without having to wait for the thread that has currently locked the list.
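The starting point plus the first two optimisations (cached tail pointer, condition variable) could be sketched roughly as follows, assuming POSIX threads. All names are made up for the example, events are reduced to an int id, and `demo_deliver_once` is only a small harness.

```c
#include <pthread.h>
#include <stdlib.h>

struct event { int id; struct event *next; };

struct event_queue {
    pthread_mutex_t lock;
    pthread_cond_t nonempty;
    struct event *head, *tail;  /* tail is cached so appends are O(1) */
    int closed;                 /* set when no more events will arrive */
};

void queue_init(struct event_queue *q) {
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->nonempty, NULL);
    q->head = q->tail = NULL;
    q->closed = 0;
}

/* Appends one event and wakes one sleeping consumer. */
int queue_push(struct event_queue *q, int id) {
    struct event *e = malloc(sizeof *e);
    if (e == NULL) return -1;
    e->id = id;
    e->next = NULL;
    pthread_mutex_lock(&q->lock);
    if (q->tail) q->tail->next = e; else q->head = e;
    q->tail = e;
    pthread_cond_signal(&q->nonempty);
    pthread_mutex_unlock(&q->lock);
    return 0;
}

/* Marks the end of input and wakes every consumer so they can exit. */
void queue_close(struct event_queue *q) {
    pthread_mutex_lock(&q->lock);
    q->closed = 1;
    pthread_cond_broadcast(&q->nonempty);
    pthread_mutex_unlock(&q->lock);
}

/* Sleeps until an event is available. Returns 0 and stores the id,
 * or -1 once the queue is closed and empty. */
int queue_pop(struct event_queue *q, int *id) {
    pthread_mutex_lock(&q->lock);
    while (q->head == NULL && !q->closed)
        pthread_cond_wait(&q->nonempty, &q->lock);
    struct event *e = q->head;
    if (e != NULL) {
        q->head = e->next;
        if (q->head == NULL) q->tail = NULL;
    }
    pthread_mutex_unlock(&q->lock);
    if (e == NULL) return -1;
    *id = e->id;
    free(e);
    return 0;
}

struct consumer { struct event_queue *q; long sum; int count; };

static void *consume(void *arg) {
    struct consumer *c = arg;
    int id;
    while (queue_pop(c->q, &id) == 0) {
        c->sum += id;
        c->count++;
    }
    return NULL;
}

/* Feeds ids 1..n to two consumers; returns 1 if the combined count and
 * sum are consistent with every id being delivered exactly once. */
int demo_deliver_once(int n) {
    struct event_queue q;
    struct consumer c[2];
    pthread_t t[2];
    queue_init(&q);
    for (int i = 0; i < 2; i++) {
        c[i].q = &q; c[i].sum = 0; c[i].count = 0;
        pthread_create(&t[i], NULL, consume, &c[i]);
    }
    for (int id = 1; id <= n; id++)
        queue_push(&q, id);
    queue_close(&q);
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);
    pthread_cond_destroy(&q.nonempty);
    pthread_mutex_destroy(&q.lock);
    return c[0].count + c[1].count == n
        && c[0].sum + c[1].sum == (long)n * (n + 1) / 2;
}
```

Because an event is unlinked from the list while the lock is held, no two consumers can receive the same event. The multiple-lists optimisation is not shown.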
I am trying to implement asynchronous (non-blocking) API calls in an all plain-C application. I know a little about this: I have read about the APM (Asynchronous Programming Model) using 'Delegates'. Basically, what I want to do is call one API, f1(), to do some work that takes a long time (8-10 seconds). So I call f1(), forget about it, and continue doing some other work, e.g. I/O to fetch data for the next call of f1(), or some functionality not dependent on the result of f1().
If anyone has used the APM model of programming, I am looking for a concise explanation of how to implement non-blocking calls.
Is there any other way of implementing asynchronous APIs, or any other library/framework which might help with this?
You basically need to create a multi-threaded (or multi-process) application. The f1() API needs to spawn a thread (or process) to process the data in a separate execution space. When it completes, the f1() routine needs to signal the main process that the execution is done (signal(), message queues, etc).
A popular way to do asynchronous programming in plain C programs is to use an "event loop". There are numerous libraries that you could use. I suggest taking a look at GLib.
Another alternative is to use multiple pre-emptive threads (one for each concurrent operation) and synchronize them with mutexes and condition variables. However, pre-emptive threading in plain C is something I would avoid, especially if you want to write portable programs. It's hard to know which library functions are re-entrant, signal handling in threaded programs is a hassle, and in general C libraries and system functions have been designed for single-threaded use.
If you're planning to run your application only on one platform (like Windows) and the work done with f1() is a relatively simple thing, then threading can be OK.
If the function f1() which you are referring to is not itself implemented in an asynchronous fashion, you will need to wrap it up in its own thread yourself. When doing this, you need to be careful with regard to side effects that may be caused by that particular function being called. Many libraries are not designed in a thread-safe way, and multiple concurrent invocations of functions from such libraries will lead to data corruption. In such cases, you may need to wrap up the functionality in an external worker process. For the heavy lifting that you mention (8-10 seconds), that overhead may be acceptable. If you will only use the external non-threadsafe functions in one thread at a time, you may be safe.
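Wrapping a blocking f1() in its own thread might look like the sketch below, assuming POSIX threads. The body of `f1` here is a trivial placeholder, and `f1_task`, `f1_start`, and `f1_wait` are hypothetical names, not an existing API.

```c
#include <pthread.h>

/* Placeholder for the real, slow f1(); assumed to take and return an int. */
static int f1(int input) { return input * 2; }

struct f1_task {
    pthread_t thread;
    int input;
    int result;
};

static void *f1_trampoline(void *arg) {
    struct f1_task *task = arg;
    task->result = f1(task->input);
    return NULL;
}

/* Starts f1(input) in the background; returns 0 on success.
 * The caller is free to do other work until it calls f1_wait(). */
int f1_start(struct f1_task *task, int input) {
    task->input = input;
    task->result = 0;
    return pthread_create(&task->thread, NULL, f1_trampoline, task);
}

/* Blocks until the background call has finished and returns its result. */
int f1_wait(struct f1_task *task) {
    pthread_join(task->thread, NULL);
    return task->result;
}
```

Reading `result` after `pthread_join` is safe without a lock, since joining orders the worker's writes before the read.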
The problem with using any form of event-loop is that an external function which isn't aware of your loop will never yield control back to your loop. Thus, you will not get to do anything else.
Replace delegates with pointers to functions in C; everything else is basically the same as what you have read.
Well. Basically I've seen 2 types of async API:
Interrupt. You pass the call a callback which is invoked once the call has completed. GIO (part of the previously mentioned GLib) works this way. It is relatively easy to program with, but the callback usually runs in a different thread from the one that made the call (except if it is integrated with the main loop, as in the case of GIO).
Poll. You check if the data is available. The well-known BSD Sockets operate in such a manner. It has an advantage of not necessarily being integrated with the main loop and running callback in a specific thread.
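Both styles can be sketched on top of a plain worker thread, assuming POSIX threads. Everything here (`async_op`, `async_start`, `async_poll`, `async_finish`, `store_result`) is invented for illustration and is not GIO's or the sockets API; the slow work is replaced by a trivial computation.

```c
#include <pthread.h>

typedef void (*done_cb)(int result, void *user_data);

struct async_op {
    pthread_t thread;
    pthread_mutex_t lock;
    int input;
    int done;            /* poll style: set once the result is ready */
    int result;
    done_cb cb;          /* callback style: may be NULL */
    void *user_data;
};

static void *async_run(void *arg) {
    struct async_op *op = arg;
    int r = op->input * 2;             /* stand-in for the slow work */
    pthread_mutex_lock(&op->lock);
    op->result = r;
    op->done = 1;
    pthread_mutex_unlock(&op->lock);
    if (op->cb) op->cb(r, op->user_data);  /* runs on the worker thread */
    return NULL;
}

/* Starts the operation in the background; returns 0 on success. */
int async_start(struct async_op *op, int input, done_cb cb, void *user_data) {
    pthread_mutex_init(&op->lock, NULL);
    op->input = input;
    op->done = 0;
    op->result = 0;
    op->cb = cb;
    op->user_data = user_data;
    return pthread_create(&op->thread, NULL, async_run, op);
}

/* Poll: returns 1 and stores the result if it is ready, 0 otherwise. */
int async_poll(struct async_op *op, int *result) {
    pthread_mutex_lock(&op->lock);
    int done = op->done;
    if (done) *result = op->result;
    pthread_mutex_unlock(&op->lock);
    return done;
}

/* Waits for the worker to exit and releases the lock. */
void async_finish(struct async_op *op) {
    pthread_join(op->thread, NULL);
    pthread_mutex_destroy(&op->lock);
}

/* Example callback: stores the result where user_data points. */
static void store_result(int result, void *user_data) {
    *(int *)user_data = result;
}
```

The callback runs in the worker thread, which is the thread change mentioned above; the poll call runs in whichever thread asks.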
If you program for GNOME or a Gtk+-based environment, I'd like to add that GTask seems to be very nice (potentially nice? I haven't used it). Vala will have better support for GIO-like async calls.