Camel: using unmarshal().tidyMarkup() from multiple threads - apache-camel

It's apparently thread-safe. However, can anyone tell me whether it's locked around a single instance of TidyMarkupDataFormat, or whether separate instances are created for separate threads?
I mean, do we have multiple parsers (one per thread) or a single parser shared among all threads?

I just checked the code of TidyMarkupDataFormat: it creates a parser per unmarshal request.
So I think it should be thread-safe, as no parser is shared between threads.

Related

Comparing two remote threads?

I am implementing an application which executes two programs in lockstep. Each system call is a synchronization point. An application might have more than one thread, so I need to identify each of them unequivocally in order to synchronize the execution of a thread in the first application with the execution of the same thread in the second application.
Is there a way to identify whether two remote threads are executing the same code or function?
Every suggestion is welcome!! :D
Well, it's hard to say without knowing how you plan to do this synchronization. Are the two programs communicating with each other and/or with a third monitoring program?
In any case, there are at least 3 possibilities:
Use an associative container like a map in the two programs (or the third) that matches up
the pthread thread IDs from the two programs (e.g. pthread_self() to get the IDs), or
the Linux thread IDs (e.g. gettid()).
Or you can possibly make use of pthread_setname_np() and pthread_getname_np(). You can use these to give each thread in both programs the same name, and maybe that becomes useful in some messaging scenario. You might also make use of the __FILE__, __LINE__ and __FUNCTION__ (__func__ in C99) macros in conjunction with the thread name if you are messaging.
Those are my (black box) suggestions!
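A minimal sketch of the naming idea, assuming Linux with glibc (where a thread name is limited to 16 bytes including the terminating NUL). The struct `worker_info`, the `worker-N` naming scheme and the helper `demo_named_worker` are hypothetical: the assumption is that both programs start their threads in the same order and derive the same name from the same logical index, then tag synchronization messages with that name plus `__FILE__`, `__LINE__` and `__func__`.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define THREAD_NAME_MAX 16   /* Linux limit, including the terminating NUL */

typedef struct {
    int index;                   /* hypothetical logical id, assigned identically in both programs */
    char name[THREAD_NAME_MAX];  /* name as read back from the thread */
} worker_info;

static void *named_worker(void *arg) {
    worker_info *info = arg;
    char name[THREAD_NAME_MAX];
    snprintf(name, sizeof name, "worker-%d", info->index);
    if (pthread_setname_np(pthread_self(), name) != 0)
        return NULL;
    /* At a synchronization point: read the name back and tag the message with it. */
    if (pthread_getname_np(pthread_self(), info->name, sizeof info->name) != 0)
        return NULL;
    printf("[%s] %s:%d in %s()\n", info->name, __FILE__, __LINE__, __func__);
    return NULL;
}

/* Start one named worker, wait for it, and report whether it saw the expected name. */
static int demo_named_worker(int index, const char *expected) {
    worker_info info = { .index = index, .name = "" };
    pthread_t t;
    int rc = pthread_create(&t, NULL, named_worker, &info);
    assert(rc == 0);
    pthread_join(t, NULL);
    return strcmp(info.name, expected) == 0;
}
```

Because the name is chosen by the program rather than by the kernel, it stays stable across runs and across the two programs, unlike `pthread_self()` values or Linux thread IDs.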

Should I use an access control on a shared structure between threads?

I'm writing a small piece of software in C using the pthread library.
I have a bunch of threads that need write access to a shared structure containing a dynamically allocated char array, but I can guarantee that two different threads will never try to access the same element of this array. My question is: should I use something like a semaphore or a mutex, or isn't it necessary?
If your threads only read information, then no lock is needed.
If your threads modify information other threads don't see, no lock is needed.
If there is a single place that can be modified by one thread and used by others, you need to use a mutex.
In your case the data is not shared between the threads, so no synchronization mechanism is required.
Well I think you answered the question yourself!
The purpose of mutexes is to protect against concurrent access by different threads to shared resources. If you can guarantee by design that your threads will never concurrently access (read or write) the same memory area, then you don't need mutex protection.
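To illustrate the disjoint-access case, here is a sketch in which four threads write to non-overlapping slices of one heap-allocated char array with no mutex. The names (`slice`, `fill_slice`, `fill_disjoint`) are made up for the example. Two assumptions carry the correctness argument: the array is allocated before the threads start and is not reallocated or freed while they run, and the main thread only reads the result after `pthread_join()`, which makes the workers' writes visible to it. Adjacent slices may share a cache line (false sharing), which can cost performance but not correctness.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define NTHREADS 4

/* Each thread gets its own [begin, end) slice of the shared buffer. */
typedef struct {
    char *buf;            /* shared, but the slices never overlap */
    size_t begin, end;
    char fill;
} slice;

static void *fill_slice(void *arg) {
    slice *s = arg;
    for (size_t i = s->begin; i < s->end; i++)
        s->buf[i] = s->fill;   /* no lock: no other thread touches these indices */
    return NULL;
}

/* Allocate once, split into NTHREADS disjoint slices, fill them concurrently.
   The buffer must not be reallocated or freed while the threads run. */
static char *fill_disjoint(size_t n) {
    char *buf = malloc(n + 1);
    assert(buf != NULL);
    pthread_t tids[NTHREADS];
    slice slices[NTHREADS];
    for (int t = 0; t < NTHREADS; t++) {
        slices[t] = (slice){ buf, n * t / NTHREADS, n * (t + 1) / NTHREADS, (char)('a' + t) };
        int rc = pthread_create(&tids[t], NULL, fill_slice, &slices[t]);
        assert(rc == 0);
    }
    for (int t = 0; t < NTHREADS; t++)
        pthread_join(tids[t], NULL);   /* join makes all writes visible to this thread */
    buf[n] = '\0';
    return buf;
}
```

For example, `fill_disjoint(8)` yields `"aabbccdd"`, each pair written by a different thread.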

Linux Scheduling: OS vs "virtual"

How does one implement a multithreaded, single-process model in Linux (Fedora) in C, where a single scheduler thread on a "main" core reads I/O availability (e.g. TCP/IP, UDP), and then a single thread per core (started at init), the "execution thread", parses the data and writes a small amount of information to a shared memory space? (It is my understanding that pthreads share data within a single process.)
I believe my options are:
Pthreads or the Linux OS scheduler
I have a naive model in mind consisting of a certain number of these execution threads plus a single scheduler thread.
What is the best solution, given that I can use this sort of model?
Completing Benoit's answer: to communicate between your master and your worker threads, you could use a condition variable. The workers do something like:
while (true)
{
    pthread_mutex_lock(&workQueueMutex);
    while (queue_empty(&workQueue))   /* queue_empty/queue_pop: your queue's own helpers */
        pthread_cond_wait(&workQueueCond, &workQueueMutex);
    /* if we get here then (a) we have work (b) we hold workQueueMutex */
    work = queue_pop(&workQueue);
    pthread_mutex_unlock(&workQueueMutex);
    /* do work */
}
and the master:
/* I/O received */
pthread_mutex_lock(&workQueueMutex);
queue_push(&workQueue, work);
pthread_cond_signal(&workQueueCond);
pthread_mutex_unlock(&workQueueMutex);
This wakes up one idle worker to process the request immediately. If no worker is idle, the work stays in the queue and is picked up by the next worker that finishes its current job.
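Here is a self-contained version of the same pattern, as a sketch: a linked-list FIFO guarded by one mutex and one condition variable. Everything beyond the pthread calls is hypothetical (`work_queue`, `queue_push`, `queue_pop`, `queue_close`, the integer work items, and the shared `total` the workers add into). The `closed` flag is an addition the fragments above don't have; it gives the master a way to tell workers to exit once the queue drains, which is why `queue_close()` uses `pthread_cond_broadcast()` rather than `pthread_cond_signal()`.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical work item: an int, stored in a singly linked FIFO. */
typedef struct node { int value; struct node *next; } node;

typedef struct {
    node *head, *tail;
    bool closed;              /* set by the master when no more work will arrive */
    pthread_mutex_t mutex;    /* plays the role of workQueueMutex */
    pthread_cond_t cond;      /* plays the role of workQueueCond */
} work_queue;

static void queue_init(work_queue *q) {
    q->head = q->tail = NULL;
    q->closed = false;
    pthread_mutex_init(&q->mutex, NULL);
    pthread_cond_init(&q->cond, NULL);
}

/* Master side: append under the lock, then wake one waiting worker. */
static void queue_push(work_queue *q, int value) {
    node *n = malloc(sizeof *n);
    if (!n) abort();
    n->value = value;
    n->next = NULL;
    pthread_mutex_lock(&q->mutex);
    if (q->tail) q->tail->next = n; else q->head = n;
    q->tail = n;
    pthread_cond_signal(&q->cond);
    pthread_mutex_unlock(&q->mutex);
}

/* Master side: wake every worker so each can see the queue is closed. */
static void queue_close(work_queue *q) {
    pthread_mutex_lock(&q->mutex);
    q->closed = true;
    pthread_cond_broadcast(&q->cond);
    pthread_mutex_unlock(&q->mutex);
}

/* Worker side: block until work arrives; returns false once closed and empty. */
static bool queue_pop(work_queue *q, int *out) {
    pthread_mutex_lock(&q->mutex);
    while (q->head == NULL && !q->closed)
        pthread_cond_wait(&q->cond, &q->mutex);  /* loop also covers spurious wakeups */
    node *n = q->head;
    if (n) {
        q->head = n->next;
        if (!q->head) q->tail = NULL;
    }
    pthread_mutex_unlock(&q->mutex);
    if (!n) return false;
    *out = n->value;
    free(n);
    return true;
}

static long total = 0;                           /* example shared result */
static pthread_mutex_t total_mutex = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg) {
    work_queue *q = arg;
    int v;
    while (queue_pop(q, &v)) {                   /* "do work" runs outside the queue lock */
        pthread_mutex_lock(&total_mutex);
        total += v;
        pthread_mutex_unlock(&total_mutex);
    }
    return NULL;
}

#define MAX_WORKERS 16

/* Start nworkers, push 1..n, close the queue, join; returns the workers' sum. */
static long run_work_queue(int nworkers, int n) {
    work_queue q;
    pthread_t tids[MAX_WORKERS];
    assert(nworkers > 0 && nworkers <= MAX_WORKERS);
    queue_init(&q);
    total = 0;
    for (int i = 0; i < nworkers; i++) {
        int rc = pthread_create(&tids[i], NULL, worker, &q);
        assert(rc == 0);
    }
    for (int i = 1; i <= n; i++)
        queue_push(&q, i);
    queue_close(&q);
    for (int i = 0; i < nworkers; i++)
        pthread_join(tids[i], NULL);
    pthread_mutex_destroy(&q.mutex);
    pthread_cond_destroy(&q.cond);
    return total;
}
```

For instance, `run_work_queue(4, 1000)` starts four workers, pushes 1 to 1000 and returns 500500.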
Modifying the Linux scheduler is tough work; I would just forget about it. Pthreads are usually preferred. If I understand correctly, you want one core dedicated to the control plane and a pool of other cores dedicated to data-plane processing? Then create a pool of threads from your master thread and set up core affinity for these slave threads with pthread_setaffinity_np(...).
Indeed, threads of a process share the same address space, and global variables are accessible by any thread of that process.
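A sketch of the affinity step, assuming Linux with glibc (`pthread_setaffinity_np()` and `CPU_COUNT` are GNU extensions, hence `_GNU_SOURCE`). Here each data-plane worker pins itself; the master could equally call `pthread_setaffinity_np()` with the worker's `pthread_t`. The helpers `pin_self_to_cpu`, `first_allowed_cpu`, `pin_arg` and `demo_pin` are hypothetical, and the demo picks the first CPU the process is allowed to run on rather than assuming CPU 0 is in the allowed set.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/* Pin the calling thread to one CPU; returns 0 on success or an error number. */
static int pin_self_to_cpu(int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof set, &set);
}

/* Lowest-numbered CPU the calling thread may currently run on, or -1. */
static int first_allowed_cpu(void) {
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof set, &set) != 0)
        return -1;
    for (int c = 0; c < CPU_SETSIZE; c++)
        if (CPU_ISSET(c, &set))
            return c;
    return -1;
}

typedef struct {
    int cpu;           /* core assigned by the master */
    int pinned;        /* 1 if pthread_setaffinity_np() succeeded */
    int cpus_in_mask;  /* size of the affinity mask afterwards */
} pin_arg;

/* Hypothetical data-plane worker: pin first, then start processing. */
static void *data_plane_worker(void *p) {
    pin_arg *a = p;
    cpu_set_t set;
    a->pinned = pin_self_to_cpu(a->cpu) == 0;
    CPU_ZERO(&set);
    if (pthread_getaffinity_np(pthread_self(), sizeof set, &set) == 0)
        a->cpus_in_mask = CPU_COUNT(&set);
    /* ... data-plane processing would run here, on a->cpu only ... */
    return NULL;
}

static int demo_pin(void) {
    pin_arg a = { .cpu = first_allowed_cpu() };
    assert(a.cpu >= 0);
    pthread_t t;
    int rc = pthread_create(&t, NULL, data_plane_worker, &a);
    assert(rc == 0);
    pthread_join(t, NULL);
    return a.pinned && a.cpus_in_mask == 1;
}
```

In a real pool the master would hand each worker a different core and keep its own thread on the control-plane core.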
It looks to me like you have a version of the producer-consumer problem with a single consumer aggregating the results of n producers. This is a pretty standard problem, so I definitely think that pthreads are more than enough for you. You don't need to go and mess around with the scheduler.
As one of the answers states, a thread-safe queue like the one described here works nicely for this sort of issue. Your original idea of spawning a bunch of threads is a good one. You seem to be worried that the threads' ability to share global state will cause you problems. I don't think this is an issue if you keep shared state to a minimum and use a sane locking discipline. Sharing state is fine as long as you do so responsibly.
Finally, unless you really know what you're doing, I would advise against manually messing with thread affinity. Just spawn the threads and let the scheduler handle when and on what core a thread runs. The thing to optimize is the number of threads you use. One for each core may not actually be the fastest approach if other threads are running.
Generally speaking, this is more or less exactly what the POSIX select() and Linux-specific epoll functions are for.
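A minimal sketch of the master's readiness check with epoll (Linux-specific), assuming the master then reads the data and hands it to a worker queue. A pipe stands in for a TCP/UDP socket so the example runs without a network; `watch_fd`, `next_ready_fd` and `demo_epoll` are illustrative names.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Register fd with the epoll instance ep for readability notifications. */
static int watch_fd(int ep, int fd) {
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };
    return epoll_ctl(ep, EPOLL_CTL_ADD, fd, &ev);
}

/* One iteration of the master's loop: wait up to timeout_ms for a readable fd.
   Returns that fd, or -1 on timeout or error. A real master would then read
   the data and push it onto the workers' queue. */
static int next_ready_fd(int ep, int timeout_ms) {
    struct epoll_event ev;
    int n = epoll_wait(ep, &ev, 1, timeout_ms);
    return n == 1 ? ev.data.fd : -1;
}

/* Demo with a pipe standing in for a socket. */
static int demo_epoll(void) {
    int p[2];
    int rc = pipe(p);
    assert(rc == 0);
    int ep = epoll_create1(0);
    assert(ep >= 0);
    rc = watch_fd(ep, p[0]);
    assert(rc == 0);
    int before = next_ready_fd(ep, 0);      /* nothing written yet: times out */
    ssize_t w = write(p[1], "x", 1);
    assert(w == 1);
    int after = next_ready_fd(ep, 1000);    /* now the read end is ready */
    int expected = p[0];
    close(ep);
    close(p[0]);
    close(p[1]);
    return before == -1 && after == expected;
}
```

The epoll instance is created once and reused on every iteration; only `epoll_wait()` runs in the loop.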

Can I keep threads alive and give them other workloads?

Suppose I create threads with pthreads; is it possible to send them new things to work on after they have been initialized, so I don't waste resources creating new threads? For instance, I create 3 threads, thread 2 signals completion, and I send it another "task" without killing it and starting a new one. Thanks.
The usual, simple form is an ordinary (work) queue. In principle, you maintain a queue structure, perhaps as a linked list, protected by a mutex. Typically, condition variables are used by the main/producer threads to notify worker threads that new work is available, so they don't have to poll.
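A sketch of a small fixed-size pool that keeps its threads alive and feeds them tasks, assuming tasks can be expressed as a function pointer plus an argument. The names (`pool`, `task`, `pool_submit`, and so on), the three threads and the 64-slot ring buffer are arbitrary choices for the example. Each worker runs a task outside the lock and then goes back to waiting on the condition variable instead of exiting, so all submitted tasks are handled by the same three threads.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* A task is a function pointer plus its argument. */
typedef struct { void (*fn)(void *); void *arg; } task;

#define POOL_THREADS 3
#define QUEUE_CAP 64   /* fixed capacity keeps the sketch short */

typedef struct {
    task items[QUEUE_CAP];          /* ring buffer of pending tasks */
    int head, count;
    bool stopping;
    pthread_mutex_t mutex;
    pthread_cond_t not_empty;
    pthread_t threads[POOL_THREADS];
} pool;

static void *pool_worker(void *p) {
    pool *pl = p;
    for (;;) {
        pthread_mutex_lock(&pl->mutex);
        while (pl->count == 0 && !pl->stopping)
            pthread_cond_wait(&pl->not_empty, &pl->mutex);
        if (pl->count == 0) {               /* stopping and drained: exit */
            pthread_mutex_unlock(&pl->mutex);
            return NULL;
        }
        task t = pl->items[pl->head];
        pl->head = (pl->head + 1) % QUEUE_CAP;
        pl->count--;
        pthread_mutex_unlock(&pl->mutex);
        t.fn(t.arg);                        /* run outside the lock, then wait for more */
    }
}

static void pool_start(pool *pl) {
    pl->head = pl->count = 0;
    pl->stopping = false;
    pthread_mutex_init(&pl->mutex, NULL);
    pthread_cond_init(&pl->not_empty, NULL);
    for (int i = 0; i < POOL_THREADS; i++) {
        int rc = pthread_create(&pl->threads[i], NULL, pool_worker, pl);
        assert(rc == 0);
    }
}

/* Returns false if the queue is full (a real pool might block instead). */
static bool pool_submit(pool *pl, void (*fn)(void *), void *arg) {
    pthread_mutex_lock(&pl->mutex);
    bool ok = pl->count < QUEUE_CAP;
    if (ok) {
        pl->items[(pl->head + pl->count) % QUEUE_CAP] = (task){ fn, arg };
        pl->count++;
        pthread_cond_signal(&pl->not_empty);
    }
    pthread_mutex_unlock(&pl->mutex);
    return ok;
}

/* Let queued tasks finish, then stop and join every worker. */
static void pool_shutdown(pool *pl) {
    pthread_mutex_lock(&pl->mutex);
    pl->stopping = true;
    pthread_cond_broadcast(&pl->not_empty);
    pthread_mutex_unlock(&pl->mutex);
    for (int i = 0; i < POOL_THREADS; i++)
        pthread_join(pl->threads[i], NULL);
    pthread_mutex_destroy(&pl->mutex);
    pthread_cond_destroy(&pl->not_empty);
}

static void square(void *arg) { int *x = arg; *x = *x * *x; }

static int demo_pool(void) {
    pool pl;
    int vals[10];
    pool_start(&pl);
    for (int i = 0; i < 10; i++) {
        vals[i] = i;
        bool ok = pool_submit(&pl, square, &vals[i]);
        assert(ok);
    }
    pool_shutdown(&pl);
    int sum = 0;
    for (int i = 0; i < 10; i++) sum += vals[i];
    return sum;
}
```

`demo_pool()` squares 0 to 9 on the three pooled threads and returns the sum, 285.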
Some previous SO questions that may also be useful are:
How To Use Condition Variable
One producer, Two consumers and usage of pthread_cond_signal & pthread_mutex_lock
pthread conditional variable
Yes, and that is what servers like Apache do to increase their performance. The design pattern is called the Thread pool pattern and there are various implementations (this one for example) using pthreads.
Of course, you might want to keep your implementation as simple as possible, depending on what your goals are.
Of course. For example, you can use the producer-consumer pattern. Here is an example in C#, but it can easily be implemented with pthreads as well.
The search keywords for your question are "thread pooling" or "thread pool". Using these terms, you will find plenty of information on this site and on Google.

Thread-safety in C?

I want to write a high performance synchronized generator in C. I want to be able to feed events to it and have multiple threads be able to poll/read asynchronously, such that threads never receive duplicates.
I don't really know that much about how synchronization is typically done. Can someone give me a high level explanation of one or more techniques that I might be able to use?
Thanks!
You need a thread implementation: C had no built-in support for multithreading before C11 (which added an optional <threads.h>), so threads are often implemented as libraries such as pthreads. Such a library will typically provide ways to synchronize the execution of multiple threads, ways to protect data, and so on.
The main concept in thread safety is the mutex (though there are different kinds of locks).
It is used to protect your memory from concurrent accesses and race conditions.
A good example of its use is a linked list: you can't allow two different threads to modify it at the same time. In your example, you could use a linked list to create a queue, and each thread would consume some data from it.
Obviously there are other synchronization mechanisms, but this one is (by far?) the most important.
You could have a look at this page (and referenced pages at the bottom) for more implementation details.
Thread safety becomes a problem when threads share variables. If you don't have any shared variables, it's not a problem: every event can be read-only and dispatched to listeners at random.
Thread safety is achieved by using whatever synchronisation primitives the multithreading implementation provides.
Your start point would probably be a linked list of events, a lock that protects it, and every thread takes the lock, consumes one event by adjusting the pointer to the first event and then releases the lock; appending events also locks the entire list. When the list is empty, the workers exit.
From there, various optimisations are possible:
Caching the pointer to the last event, so appending an event to the list becomes cheaper.
Adding a notification mechanism so worker threads can sleep while the list is empty. Typically, this is achieved with something called a condition variable.
Using multiple lists, so if the first list is locked, the worker can retrieve an event from another list without having to wait for the thread that has currently locked the list.
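The starting design described above (one list, one lock, workers exit when it is empty, plus the first optimisation of caching the last pointer) might look like this sketch. The names are hypothetical and events are plain integers. Since an event is unlinked while the lock is held, two consumers can never receive the same one; the demo checks that every event is delivered exactly once. Events are all appended before the consumers start: if producers kept appending concurrently, a worker that found the list momentarily empty would exit, which is the gap the condition-variable optimisation closes.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

typedef struct event { int id; struct event *next; } event;

/* The event list: one lock protects both ends; caching `last` makes appends O(1). */
static struct {
    event *first, *last;
    pthread_mutex_t lock;
} events = { NULL, NULL, PTHREAD_MUTEX_INITIALIZER };

static void append_event(int id) {
    event *e = malloc(sizeof *e);
    assert(e != NULL);
    e->id = id;
    e->next = NULL;
    pthread_mutex_lock(&events.lock);
    if (events.last) events.last->next = e; else events.first = e;
    events.last = e;
    pthread_mutex_unlock(&events.lock);
}

/* Unlink the first event under the lock; NULL means the list is empty. */
static event *take_event(void) {
    pthread_mutex_lock(&events.lock);
    event *e = events.first;
    if (e) {
        events.first = e->next;
        if (!events.first) events.last = NULL;
    }
    pthread_mutex_unlock(&events.lock);
    return e;
}

#define NEVENTS 1000
static int seen[NEVENTS];   /* deliveries per event; each slot touched by one consumer */

static void *consumer(void *arg) {
    (void)arg;
    event *e;
    while ((e = take_event()) != NULL) {    /* workers exit when the list is empty */
        seen[e->id]++;
        free(e);
    }
    return NULL;
}

/* Queue NEVENTS events, drain them with nthreads consumers, check for duplicates. */
static int demo_events(int nthreads) {
    pthread_t t[8];
    assert(nthreads > 0 && nthreads <= 8);
    for (int i = 0; i < NEVENTS; i++) {
        seen[i] = 0;
        append_event(i);
    }
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&t[i], NULL, consumer, NULL);
        assert(rc == 0);
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(t[i], NULL);
    for (int i = 0; i < NEVENTS; i++)
        if (seen[i] != 1) return 0;
    return 1;
}
```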
