Custom Attributes for POSIX Threads (C)

Is it possible to add a custom attribute (e.g. a name, a mutex block level, etc.) to a POSIX thread? The idea is to manipulate information attached to a thread's context.

At first I thought of thread-local storage (TLS). But perhaps you want to do this from outside the thread... if so, TLS won't work, since it's only valid for code running inside the thread.
But since you have a unique identifier for every thread (the thread ID), you should be able to use any dictionary-type data structure with that as the key.
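A minimal C sketch of that idea, assuming pthreads and a fixed-size table protected by a mutex; the attribute here is just a name string, and the table size and helper names are made up for illustration:

#include <pthread.h>
#include <stdio.h>

#define MAX_THREADS 64   /* arbitrary capacity, just for the sketch */

struct thread_attr {
    pthread_t tid;
    char name[32];       /* the "custom attribute" attached to the thread */
    int used;
};

static struct thread_attr table[MAX_THREADS];
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Attach a name to any thread, from inside or outside that thread. */
void set_thread_name(pthread_t tid, const char *name)
{
    int slot = -1, free_slot = -1;
    pthread_mutex_lock(&table_lock);
    for (int i = 0; i < MAX_THREADS; i++) {
        if (table[i].used && pthread_equal(table[i].tid, tid)) { slot = i; break; }
        if (free_slot < 0 && !table[i].used) free_slot = i;
    }
    if (slot < 0) slot = free_slot;   /* no existing entry: take a free slot */
    if (slot >= 0) {
        table[slot].tid = tid;
        table[slot].used = 1;
        snprintf(table[slot].name, sizeof table[slot].name, "%s", name);
    }
    pthread_mutex_unlock(&table_lock);
}

/* Look up the name by thread ID; returns NULL if none was set.
   (A real version would copy the name out while still holding the lock.) */
const char *get_thread_name(pthread_t tid)
{
    const char *result = NULL;
    pthread_mutex_lock(&table_lock);
    for (int i = 0; i < MAX_THREADS; i++) {
        if (table[i].used && pthread_equal(table[i].tid, tid)) {
            result = table[i].name;
            break;
        }
    }
    pthread_mutex_unlock(&table_lock);
    return result;
}

A thread could call set_thread_name(pthread_self(), "worker") on itself, or the creating thread could call it with the pthread_t returned by pthread_create(), which is what makes this usable from outside the thread.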


Concurrent modification of DefaultExchange

I have to ensure that the thread that runs a route (and the error handler) is guaranteed to see an exception set on the exchange instance if that instance is shared by many threads and the exception is set by one of those threads.
I have a route step that massages the data in an input stream (jetty endpoint) before proxying to the actual web server. For the stream data manipulation I use stream pipelines (PipedInputStream/PipedOutputStream) which need to run in their own thread per pipeline element. Each pipeline element holds a reference to the exchange and if an error is encountered the exception is set on the exchange.
This appears to work alright. But I don't think it is guaranteed to work as exception is not a volatile member of DefaultExchange. Thus the main thread is not guaranteed to see an exception set on the exchange by a worker thread. The worker threads themselves all use a ReentrantLock I provide as an exchange property (the property map is a ConcurrentHashMap) to synchronize their access to the exchange. So data visibility between the worker threads should not be an issue.
But how to do it for the main thread?
I had a look at the Splitter implementation and how it deals with parallel execution and aggregation. As far as I understand, the Splitter enforces visibility by returning an AtomicReference of an exchange that the main thread has never seen before. The exchange referenced by the atomic reference is treated as read-only after it is set on the atomic reference. So a get() on the atomic reference in the main thread is guaranteed to see the latest version of the referenced exchange instance.
I cannot apply this approach, as it requires the main thread to wait for the workers to finish processing. If I blocked the main thread, my workers would be blocked as well, since the main thread is responsible for consuming the modified input stream contents from the last PipedInputStream element and forwarding it to the web server.
I also could not find a factory mechanism that would allow me to tell Camel to instantiate my own implementation of the Exchange interface (with a volatile member "exception").
(DefaultExchange was also not written with sub-classing in mind, it appears. E.g. DefaultExchange::isFailed() uses the private member exception instead of DefaultExchange::getException() for the answer. But that's a separate issue.)
Does anyone have any other ideas?
(Cross-posted here.)

Will this implementation using TPL work inside a process running on an STA thread?

I want to create an instance of an object from an assembly that implements an interface I define in my Forms app. I would create this object at app startup using Activator.CreateInstance, and keep an application-level reference to it.
At certain points during this application I want to call methods on this object without holding up the main thread using Task.Run(() => IMyObject.DoSomeWork(someList, someList2)). I just want to make a "fire and forget" void method call and I don't need to await or even register callbacks.
Will the fact that the app is running in an STA thread pose an issue? Do I have to worry about leaks or premature collection of objects I instantiate on the main thread and reference inside the task closure? I only intend to read the contents of these lists, not modify them.
No need to worry; as soon as you create the delegate, all the objects it references will be kept alive, at least until the task completes. There's nothing that an STA thread does that changes that.
Threads don't factor into GC at all - except that all stacks for running threads contain root objects. You can cross-reference objects however you want and it won't confuse the GC.

Differences between events and semaphores

I already searched for this subject but couldn't understand it very well. What are the main differences between events and semaphores?
An event generally has only two states, unsignaled or signaled. A semaphore has a count, and is considered unsignaled if the count is zero, and signaled if the count is not zero. In the case of Windows, ReleaseSemaphore() increments a semaphore count, and WaitForSingleObject(...) with a handle of a semaphore will wait (unless the timeout parameter is set to zero) for a non-zero count, then decrement the count before returning.
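A small C illustration of that counting behaviour, using the Win32 calls named above (the initial and maximum counts of 2 are arbitrary, chosen only to show the state changes):

#include <windows.h>
#include <stdio.h>

int main(void)
{
    /* Semaphore with an initial count of 2 and a maximum count of 2. */
    HANDLE sem = CreateSemaphore(NULL, 2, 2, NULL);

    /* Each successful wait decrements the count. */
    WaitForSingleObject(sem, INFINITE);   /* count: 2 -> 1 */
    WaitForSingleObject(sem, INFINITE);   /* count: 1 -> 0, now unsignaled */

    /* A zero-timeout wait on a zero count returns immediately. */
    if (WaitForSingleObject(sem, 0) == WAIT_TIMEOUT)
        printf("count is zero: semaphore is unsignaled\n");

    /* ReleaseSemaphore increments the count, signaling the semaphore again. */
    ReleaseSemaphore(sem, 1, NULL);       /* count: 0 -> 1 */

    CloseHandle(sem);
    return 0;
}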
Do you need to know it in a specific context? That would help make it easier to understand.
Typically a semaphore is a token that must be obtained to execute an action, e.g. a lock on a unit of execution that is protected from concurrent access.
Events are notifications in a message/subscriber pattern.
So they are somewhat related but not really comparable.
A typical confusing/complex scenario you may face is one event triggering two different subscribers that then want simultaneous access to some resource. They should request a semaphore token and release it after use to let the other subscriber have a go.

Why should I use thread-specific data?

Since each thread has its own stack, its private data can be put there. For example, each thread can allocate some heap memory to hold some data structure and use the same interface to manipulate it. So why is thread-specific data helpful?
The only case I can think of is that each thread may have many kinds of private data. If we need to access the private data in any function called within that thread, we need to pass the data as arguments to all those functions, which is tedious and error-prone.
Thread-local storage is a solution for avoiding global state. If data isn't shared across threads but is accessed by several functions, you can make it thread-local. No need to worry about breaking reentrancy. Makes debugging that much easier.
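A minimal pthreads sketch of that idea, using POSIX thread-specific data (pthread_key_create and friends); the per-thread "log prefix" and the function names are made up for illustration:

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

static pthread_key_t log_prefix_key;
static pthread_once_t key_once = PTHREAD_ONCE_INIT;

static void make_key(void)
{
    /* free() is the destructor: called for each thread's value at thread exit. */
    pthread_key_create(&log_prefix_key, free);
}

/* Any function running in the thread can read the per-thread value
   without it being passed down as a parameter. */
static void deep_inside_some_call_chain(void)
{
    const char *prefix = pthread_getspecific(log_prefix_key);
    printf("[%s] doing work\n", prefix ? prefix : "unknown");
}

static void *worker(void *arg)
{
    pthread_once(&key_once, make_key);

    /* Each thread stores its own value under the same key. */
    char *prefix = malloc(32);
    snprintf(prefix, 32, "worker-%ld", (long)(intptr_t)arg);
    pthread_setspecific(log_prefix_key, prefix);

    deep_inside_some_call_chain();   /* sees this thread's prefix only */
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, (void *)(intptr_t)1);
    pthread_create(&t2, NULL, worker, (void *)(intptr_t)2);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}

Nothing is shared across threads, no parameter has to be threaded through the call chain, and there is no global whose value another thread could clobber.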
From a performance point of view, using thread-local data is a way of avoiding false sharing. Say you have two threads, one responsible for writing to a variable x and the other responsible for reading from a variable y. If you defined these as global variables, they could end up on the same cache line. That means a write to x by one thread invalidates that cache line in the other core's cache, and y is dragged along with it, so cache performance degrades even though y itself never changed.
If you used thread-local data, one thread would only store the variable x and the other would only store the variable y, thus avoiding false sharing. Bear in mind, though, that there are other ways to go about this, e.g. cache line padding.
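A rough C sketch of the two layouts being compared; the 64-byte line size is an assumption, and _Thread_local is the C11 spelling (__thread with GCC/Clang):

/* Adjacent globals: x and y may end up on the same cache line, so a write
   to x by one thread can invalidate the line holding y in another core's cache. */
long x;
long y;

/* Option 1: align each variable onto its own cache line
   (64 bytes is assumed here; the real line size is platform-specific). */
struct padded_long { _Alignas(64) long value; };
struct padded_long px, py;

/* Option 2: make the data thread-local, so each thread has its own copy
   and the two variables are never shared between threads at all. */
static _Thread_local long tls_x;
static _Thread_local long tls_y;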
Like the stack, thread-local data is dedicated to each thread; unlike stack data, it persists across function calls (stack data may already be overwritten once the function that owned it has returned).
The alternative would be to use adjacent pieces of global data dedicated to each thread, but that has performance implications where CPU caches are concerned. Since different threads are likely to run on different cores, such "sharing" of a global piece of data may bring undesirable performance degradation: an access from one core may invalidate the cache line held by another, and the latter then contributes more inter-core traffic to keep the caches consistent.
In contrast, working with thread-local data should, conceptually, not involve messing with the caches of other cores.
Think of thread-local storage as another kind of global variable. It's global in the sense that you don't have to pass it around; different code can access it as it pleases (given the declaration, of course). However, each thread has its own separate variable. Normally, globals are extra bad in multithreaded programming because other threads can change the value. If you make it thread-local, only your thread can see it, so it is impossible for another thread to unexpectedly change it.
Another use case is when you are forced to use a (badly designed) API that expects you to use global variables to carry information to callback functions. This is a simple instance of being forced into a global variable, but using thread local storage to make it thread safe.
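A small C sketch of that situation; library_walk() here is a made-up stand-in for the badly designed third-party API, and _Thread_local is the C11 spelling (__thread with GCC/Clang):

#include <stdio.h>

/* Hypothetical badly designed API: the callback gets no user-data pointer. */
typedef void (*walk_cb)(int item);

static void library_walk(walk_cb cb)   /* stand-in for the third-party call */
{
    for (int i = 1; i <= 3; i++)
        cb(i);
}

/* The context the callback needs. Making it thread-local rather than a plain
   global means two threads can call library_walk() at the same time without
   stepping on each other's total. */
static _Thread_local int current_total;

static void sum_cb(int item)
{
    current_total += item;   /* updates this thread's copy only */
}

int main(void)
{
    current_total = 0;
    library_walk(sum_cb);
    printf("total = %d\n", current_total);   /* prints 6 */
    return 0;
}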
Well, I've been writing multithreaded apps for 30-odd years and have never, ever found any need to use TLS. If a thread needs a DB connection that the DB binds to the thread, the thread can open one of its own and keep it on the stack. Since threads cannot be called, only signaled, there is no problem. Every time I've looked at this magic 'TLS', I've realized it's not a solution to my problem.
With my typical message-passing design, where objects are queued in to threads that never terminate, there is just no need for TLS.
With thread-pools it's even more useless.
I can only say that using TLS=bad design. Someone put me right, if you can :)
I've used thread local storage for database connections and sometimes for request / response objects. To give two examples, both from a Java webapp environment, but the principles hold.
A web app might consist of a large spread of code that calls various subsystems. Many of these might need to access the database. In my case, I had written each subsystem that required the db to get a db connection from a db pool, use the connection, and return the connection to the pool. Thread-local storage provided a simpler alternative: when the request is created, fetch a db connection from the pool and store it in thread-local storage. Each subsystem then just uses the db connection from thread-local storage, and when the request is completing, it returns the connection to the db pool. This solution had performance advantages, while also not requiring me to pass the db connection through every level: ie my parameter lists remained shorter.
In the same web app, I decided in one remote subsystem that I actually wanted to see the web Request object. So I either had to refactor to pass this object all the way down, which would have involved a lot of parameter passing and refactoring, or I could simply place the object into thread-local storage and retrieve it when I wanted it.
In both cases, you could argue that I had messed up the design in the first place, and was just using Thread Local storage to save my bacon. You might have a point. But I could also argue that Thread Local made for cleaner code, while remaining thread-safe.
Of course, I had to be very sure that the things I was putting into Thread Local were indeed one-and-only-one per thread. In the case of a web app, the Request object or a database connection fit this description nicely.
I would like to add to the above answers that, as far as I know, performance-wise, allocation on the stack is faster than allocation on the heap.
Regarding passing the local data across calls: if you allocate on the heap, you will need to pass the pointer/reference (I'm a Java guy :) ) to the calls - otherwise, how will you access the memory?
TLS is also good for storing a context used to pass data across calls within a thread (we use it to hold information on the logged-on user across the thread - a sort of session management).
Thread-specific data is used when all the functions of a particular thread need to access one common variable. The variable is local to that particular thread but acts as a global variable for all the functions of that thread.
Let's say we have two threads t1 and t2 of some process, and variable 'a' is thread-specific data for t1. Then t2 has no knowledge of 'a', but all the functions of t1 can access 'a' as a global variable. Any change to 'a' will be seen by all the functions of t1.
With the OOP techniques now available, I find thread-specific data largely irrelevant. Instead of passing a function to the thread, you can pass a functor. The functor class that you pass can hold any thread-specific data that you need.
E.g. sample code with C++11 or Boost would look like the below:
#include <list>
#include <boost/thread.hpp>

class MyClassFunctor
{
private:
    std::list<int> mylist; // thread-specific data

public:
    void operator()()
    {
        // This function is called when the thread runs.
        // It can access the thread-specific data in mylist.
    }
};

// Functor object: holds the function that runs as part of the thread
// as well as any thread-specific data.
MyClassFunctor functorobj;
boost::thread mythread(functorobj);

New Thread for Instance of a Class (C#)

I have a form and several external classes (serial port, file access) that are instantiated by the form.
1) What's the simplest way to run an instance of an external class in its own thread?
2) Is the instance's thread automatically terminated when the form closes?
1) What's the simplest way to run an instance of an external class in its own thread?
Instances of classes do not "run". Methods do.
As such, you may want to look into the APM pattern and the BackgroundWorker class.
2) Is the instance's thread automatically terminated when the form closes?
It depends on how the threads were started. A thread can be a background thread or a foreground thread - the latter prevents the application from terminating.
If it's just a couple of lines of code you want to call asynchronously, probably the best way is ThreadPool.QueueUserWorkItem. See: What's the difference between QueueUserWorkItem() and BeginInvoke(), for performing an asynchronous activity with no return types needed
If you are working in a managed environment, an object that goes out of scope (i.e. is no longer referenced) is cleaned up automatically. That cleanup is taken care of by garbage collection.
If you are using unmanaged resources, it is your responsibility to release them before the object goes out of scope.
Garbage collection periodically runs and collects all the objects that are no longer reachable. If you need to work with large objects, you can try the WeakReference class, which holds a reference to the object while still leaving it eligible for garbage collection.
Read about WeakReference and garbage collection here:
http://www.abhisheksur.com/2010/07/garbage-collection-algorithm-with-use.html
I hope this helps.
