Launching background threads in C with a single statement

I've read quite a bit about the Go language. There's a lot I didn't like about it - enough that I don't want to use it in day-to-day life. However, there is one bit about the language that I really do like: goroutines.
I was thinking of ways of implementing it in C. So far, the best I can find on the internet is
#define go if (!fork()) for(;;exit(0))
That way, you can prefix function calls with go, so that
go printf("Hello, world!\n");
runs concurrently in a child process (the for(;;exit(0)) header makes the child execute the statement once and then exit), as does:
go { printf("Hello, world!\n"); foo(); bar(); baz(); }
But, of course, fork() has speed issues. (On my box, fork() takes 7 times as long as a printf() call, benchmarked using the rdtsc x86 instruction and repeated over several runs to rule out switching between cores or being scheduled out.)
So, my question is, is there a better way of implementing this, so that it's faster?

Goroutines are mostly a fancy name for threads, with some extra functionality for inter-thread communication and such.
I am guessing the part you are interested in is the ability to succinctly run a section of code in a separate thread. Unfortunately there isn't a simple way to do this in C - you would have to write a function enclosing the code you want to run in the background, and use a macro or function that accepts that function and does the necessary magic using pthread_create() or similar.
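A minimal sketch of that pattern (go_spawn and worker are illustrative names, not a standard API):

#include <pthread.h>
#include <stdio.h>

static void *worker(void *arg)
{
    (void)arg;
    printf("Hello, world!\n");
    return NULL;
}

/* Fire-and-forget launcher: create a thread and detach it,
   roughly what the go macro does with fork(). */
static void go_spawn(void *(*fn)(void *), void *arg)
{
    pthread_t t;
    if (pthread_create(&t, NULL, fn, arg) == 0)
        pthread_detach(t);
}

int main(void)
{
    go_spawn(worker, NULL);
    pthread_exit(NULL);   /* exit the main thread but let detached threads finish */
}

Compile with -pthread. The detach gives the fire-and-forget feel of goroutines, but you still had to write worker() by hand - that's the part no standard C macro can do for you.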
Unless someone comes up with a clever way to use macros to create a function on-the-fly? Anyone?
Keep in mind that in all but the most basic threaded application you will need some sort of synchronization, which will make things much less simple.

Take a look at OpenMP. It allows threads to be spawned for code blocks and loop iterations with relatively simple #pragma directives. It has been around for over a decade, and is already available in many compilers (including gcc).
Starting work in a thread should be faster than fork(), but the performance improvement may be obscured by behind-the-scenes thread-pool initialization overhead in simple applications that don't manage many threads.
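To illustrate the directives, here is a minimal sketch that runs two statements on separate threads (compile with gcc -fopenmp):

#include <stdio.h>

int main(void)
{
    /* Each section below is handed to a different thread. */
    #pragma omp parallel sections
    {
        #pragma omp section
        printf("Hello from one thread\n");

        #pragma omp section
        printf("Hello from another thread\n");
    }
    return 0;   /* the parallel region joins before this */
}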

Related

How to get the fastest data processing way: fork or/and multithreading

Imagine that we have a client which keeps sending lots of double values.
Now we are trying to make a server which can receive and process the data from the client.
Here are the facts:
The server can receive a double in a very short time.
The function that processes a double on the server needs more than 3 minutes per double.
We need to make the server process 1000 doubles from the client as fast as possible.
My idea is as follows:
Use a thread pool to create many threads, where each thread can process one double.
All of this is on Linux.
My question:
For now my server is just one process which contains multiple threads. I'm wondering, if I used fork(), would it be faster?
I think using only fork() without multithreading would be a bad idea, but what if I create two processes and each of them contains multiple threads? Could this method be faster?
Btw I have read:
What is the difference between fork and thread?
Forking vs Threading
To a certain degree, this very much depends on the underlying hardware. It also depends on memory constraints, IO throughput, ...
Example: if your CPU has 4 cores, and each one is able to run two threads (and not much else is going on on that system); then you probably would prefer to have a solution with 4 processes; each one running two threads!
Or, when working with fork(), you would fork() 4 times; but within each of the forked processes, you should be distributing your work to two threads.
Long story short, what you really want to do is: to not lock yourself into some corner. You want to create a service (as said, you are building a server, not a client) that has a sound and reasonable design.
And given your requirements, you want to build that application in a way that allows you to configure how many processes and threads it will use. Then you start profiling (meaning: you measure what is going on); maybe you run experiments to find the optimum for a given piece of hardware / OS stack.
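A minimal sketch of that shape - NPROC and NTHREADS are illustrative knobs that would come from your configuration:

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

#define NPROC    4   /* processes to fork */
#define NTHREADS 2   /* threads per process */

static void *worker(void *arg)
{
    printf("pid %d, worker %ld\n", (int)getpid(), (long)arg);
    return NULL;   /* real work would go here */
}

int main(void)
{
    for (int p = 0; p < NPROC; p++) {
        if (fork() == 0) {                    /* child process */
            pthread_t t[NTHREADS];
            for (long i = 0; i < NTHREADS; i++)
                pthread_create(&t[i], NULL, worker, (void *)i);
            for (int i = 0; i < NTHREADS; i++)
                pthread_join(t[i], NULL);
            _exit(0);
        }
    }
    while (wait(NULL) > 0)                    /* parent reaps all children */
        ;
    return 0;
}

Turning NPROC and NTHREADS into runtime parameters is then what lets you profile different splits on real hardware.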
EDIT: I feel tempted to say - welcome to the real world. You are facing the requirement to meet precise "performance goals" for your product. Without such goals, programmer life is pretty easy: most of the time, one just sits down, puts together a reasonable product, and given the power of today's hardware, "things are good enough".
But if things are not good enough, then there is only one way: you have to learn about all those things that play a role here. Starting with questions like: "which system calls in my OS can I use to get the correct number of cores/threads?"
In other words: the days in which you "got away" without knowing about the exact capacity of the hardware you are using ... are over. If you intend to "play this game", then there are no detours: you will have to learn the rules!
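On Linux, for instance, the core-count question has a one-line answer:

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* Number of processors currently online (POSIX). */
    long cores = sysconf(_SC_NPROCESSORS_ONLN);
    printf("online cores: %ld\n", cores);
    return 0;
}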
Finally: the most important thing here is not about processes versus threads. You have to understand that you need to grasp the whole picture. It doesn't help if you tune your server for maximum CPU performance ... only to find that network or IO issues cost you 10x what you gained by looking at the CPU alone. In other words: you have to look at all the pieces in your system; then you need to measure to understand where your bottlenecks are; and then you decide the actions to take!
One good read on that subject is "Release It!" by Michael Nygard. His book is mainly about patterns in the Java world, but he does a great job of explaining what "performance" really means.
fork()ing as such is way slower than kicking off a thread. A thread is much more lightweight (traditionally, although processes have caught up in recent years) than a full OS process, not only in terms of CPU requirements, but also with regards to memory footprint and general OS overhead.
As you are thinking about a pre-arranged pool of threads or processes, setup time would not account for much of the runtime of your program, so you need to look into "what is the cost of interprocess communication" - which is (locally) generally cheaper between threads than between processes (threads do not need to go through the OS to exchange data, only for synchronisation, and in some cases you can even get away without that). But unfortunately you do not state whether there is any need for IPC between worker threads.
Summed up: I cannot see any advantage of using fork(), at least not with regards to efficiency.

About Dijkstra omp

Recently I downloaded some source code from the internet for an OpenMP Dijkstra implementation.
But I found that the parallel runtime is always larger than the single-threaded runtime (whether I use two, four or eight threads).
Since I'm new to OpenMP I really want to figure out what is happening.
This is due to the overhead of setting up the threads. The execution time of the work itself is theoretically the same, but the system has to set up the threads that manage the work (even if there's only one). For little work, or for only one thread, this overhead makes your time-to-solution slower than the serial time-to-solution.
Alternatively, if you see the time increasing dramatically as you increase the thread count, you might only be using 1 core on your computer while tricking it into running 2, 4, 8, etc. threads.
Finally, it's possible that the way you're implementing Dijkstra's algorithm is largely serial. But without looking at your code it is hard to say.
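One way to see the setup overhead for yourself: time a deliberately tiny parallel loop and compare runs with OMP_NUM_THREADS set to 1, 2, 4, ... (a minimal sketch; the workload is trivial on purpose):

#include <omp.h>
#include <stdio.h>

int main(void)
{
    double sum = 0.0;
    double t0 = omp_get_wtime();

    /* So little work that thread startup dominates the measurement. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < 1000; i++)
        sum += i;

    double t1 = omp_get_wtime();
    printf("sum=%.0f, elapsed=%g s\n", sum, t1 - t0);
    return 0;
}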

Understanding pthreads a little more in C

So I only very recently heard about these pthreads, and my understanding of them is very limited so far, but I just wanted to know if they would be able to do what I want before I get really into learning about them.
I have written a program that generates two output pulses from a micro-controller, which happen with different frequencies, periods and duty cycles. At the moment the functions to output the pulses run in a loop, and it works well because the timings I am using are multiples of each other, so stopping one without interrupting the other is not too much hassle.
However, I want it to be a lot more dynamic, so I can change the duty cycles or periods easily without having to write some complicated loop specific to those timings... Below is a quick sketch of what I am trying to achieve, and I hope you can understand it...
So basically my question is: is something like this possible with pthreads in C, i.e. do they run simultaneously, so one could be pulsing on and off while the other is waiting for a delay to finish?
If not is there anything that I could use for this instead?
In general, it's not worth using threads for such functionality on a uC. The cost of the extra stacks etc. for such limited operations is not worth it, tempting as it might be from a simplicity POV.
A hardware timer, an interrupt and a delta-queue of events is probably the best you could do.
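A minimal sketch of the delta-queue idea, assuming a periodic timer tick; the struct and function names are illustrative, and on a real uC dq_tick() would be called from the timer ISR rather than a loop in main():

#include <stdio.h>

/* Each entry stores its delay *relative to the previous entry*,
   so the tick handler only ever touches the head of the list. */
struct event {
    unsigned delta;          /* ticks after the previous event */
    void (*fire)(void);      /* action to run when it expires */
    struct event *next;
};

static struct event *queue = NULL;

/* Insert an event to fire after 'ticks', keeping deltas consistent. */
static void dq_insert(struct event *e, unsigned ticks)
{
    struct event **p = &queue;
    while (*p && (*p)->delta <= ticks) {
        ticks -= (*p)->delta;
        p = &(*p)->next;
    }
    e->delta = ticks;
    e->next = *p;
    if (*p)
        (*p)->delta -= ticks;
    *p = e;
}

/* Called once per timer tick (from the ISR on real hardware). */
static void dq_tick(void)
{
    if (queue && queue->delta > 0)
        queue->delta--;
    while (queue && queue->delta == 0) {
        struct event *e = queue;
        queue = e->next;
        e->fire();
    }
}

static void toggle_a(void) { puts("toggle pulse A"); }
static void toggle_b(void) { puts("toggle pulse B"); }

int main(void)
{
    struct event a = { 0, toggle_a, NULL };
    struct event b = { 0, toggle_b, NULL };
    dq_insert(&a, 3);
    dq_insert(&b, 5);
    for (int t = 0; t < 6; t++)    /* simulate six timer ticks */
        dq_tick();
    return 0;
}

Re-inserting an event from its own fire() callback is how you get a repeating pulse with an arbitrary period.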

Calling convention which only allows one instance of a function at a time

Say I have multiple threads and all threads call the same function at approximately the same time.
Is there a calling convention which would only allow one instance of the function at any time? What I mean is that the function called by the second thread would only start after the function called by the first thread had returned.
Or are these calling conventions compiler specific? I don't have a whole lot of experience using them.
(Skip to the bottom if you don't care about the threading mumbo-jumbo)
As mentioned before, this is not a "calling convention" but a general problem of computing: concurrency. And the particular case where two or more threads can enter a shared zone at the same time, with the outcome depending on timing, is called a race condition (a concept that also extends to/from electronics and other areas).
The hard thing about threading is that computing is normally such a deterministic affair, but when threading gets involved, it adds a degree of uncertainty, which varies per platform/OS.
A one-thread affair guarantees that it does all tasks in the same order, always; but when you have multiple threads, the order depends on how fast each can complete a task and on other applications wanting to use the CPU, so the underlying hardware affects the results.
There's no "sure-fire way to do threading"; there are techniques, tools and libraries to deal with individual cases.
Locking in
The most well-known technique is using semaphores (or locks), and the most well-known of these is the mutex, which only allows one thread at a time to access a shared space, by having a sort of "flag" that is raised once a thread has entered.
if (locked == NO)
{
    /* another thread can also see locked == NO right here... */
    locked = YES;   /* ...so both threads end up "acquiring" the flag */
    // Do ya' thing
    locked = NO;
}
The code above, although it looks like it could work, does not guarantee against cases where both threads pass the if () and then set the variable (which threads can easily do). So there's hardware support for this kind of operation that guarantees only one thread can execute it: the test-and-set operation, which checks and then, if available, sets the variable in a single atomic step (x86 exposes this through instructions such as XCHG).
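In portable C you no longer need assembly for this: C11's atomic_flag is exactly a test-and-set flag. A minimal spinlock sketch:

#include <stdatomic.h>
#include <stdio.h>

static atomic_flag locked = ATOMIC_FLAG_INIT;

static void lock(void)
{
    /* Atomically set the flag and get its old value;
       loop until we are the one who flipped it from clear to set. */
    while (atomic_flag_test_and_set(&locked))
        ;
}

static void unlock(void)
{
    atomic_flag_clear(&locked);
}

int main(void)
{
    lock();
    printf("inside the critical section\n");
    unlock();
    return 0;
}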
On the same vein of locks and semaphores, there's also the read-write lock, which allows multiple readers but only one writer, especially useful for data with low volatility. And there are many other variations, some that limit access to X threads at a time, and whatnot.
But overall, locks are lame, since they basically force serialisation of multi-threading: threads get stuck trying to acquire a lock (or just test it and leave). Kinda defeats the purpose of having multiple threads, doesn't it?
The best solution in terms of threading is to minimise the amount of shared space that threads need to use, possibly eliminating it completely. Maybe use rwlocks when volatility is low; try to have "try and leave" kinds of threads, that test the lock and go do something else if it's taken, etc.
As my OS teacher once said (in Zen-like fashion): "The best kind of locking is the one you can avoid".
Thread Pools
Now, threading is hard, no way around it, that's why there are patterns to deal with such kind of problems, and the Thread Pool Pattern is a popular one, at least in iOS since the introduction of Grand Central Dispatch (GCD).
Instead of having a bunch of threads running amok and getting enqueued all over the place, let's have a set of threads, waiting for tasks in a "pool", and having queues of things to do, ideally, tasks that shouldn't overlap each other.
Now, the thread pool pattern doesn't solve the problems discussed before, but it changes the paradigm to make them easier to deal with, mentally. Instead of having to think about "threads that need to execute such and such", you switch the focus to "tasks that need to be executed", and the matter of which thread is doing it becomes irrelevant.
Again, pools won't solve all your problems, but it will make them easier to understand. And easier to understand may lead to better solutions.
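A toy sketch of the pattern - fixed-size queue, no shutdown handling, every name illustrative:

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

#define NWORKERS 4
#define NTASKS   8

typedef void (*task_fn)(int);

/* One shared queue of tasks; workers sleep on the condition variable. */
static task_fn queue[NTASKS];
static int args[NTASKS];
static int head = 0, tail = 0;
static pthread_mutex_t qlock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t qcond = PTHREAD_COND_INITIALIZER;

static void submit(task_fn fn, int a)
{
    pthread_mutex_lock(&qlock);
    queue[tail] = fn;
    args[tail] = a;
    tail++;                          /* no wraparound: toy capacity only */
    pthread_cond_signal(&qcond);
    pthread_mutex_unlock(&qlock);
}

static void *worker(void *unused)
{
    (void)unused;
    for (;;) {
        pthread_mutex_lock(&qlock);
        while (head == tail)
            pthread_cond_wait(&qcond, &qlock);
        task_fn fn = queue[head];
        int a = args[head];
        head++;
        pthread_mutex_unlock(&qlock);
        fn(a);                       /* run the task outside the lock */
    }
    return NULL;
}

static void say(int n) { printf("task %d done\n", n); }

int main(void)
{
    pthread_t t[NWORKERS];
    for (int i = 0; i < NWORKERS; i++)
        pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < NTASKS; i++)
        submit(say, i);
    sleep(1);        /* crude: give workers time, then exiting kills them */
    return 0;
}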
All the theoretical things mentioned above are already implemented at the POSIX level (semaphore.h, pthread.h, etc.; pthreads has a very nice set of r/w locking functions), so try reading about them.
(Edit: I thought this thread was about Obj-C, not plain C, edited out all the Foundation and GCD stuff)
Calling convention defines how stack & registers are used to implement function calls. Because each thread has its own stack & registers, synchronising threads and calling convention are separate things.
To prevent multiple threads from executing the same code at the same time, you need a mutex. In your example of a function, you'd typically put the mutex lock and unlock inside the function's code, around the statements you don't want your threads to be executing at the same time.
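A minimal sketch of that, with the mutex guarding the function body:

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

/* Only one thread at a time gets past pthread_mutex_lock(). */
static void *critical(void *arg)
{
    pthread_mutex_lock(&m);
    printf("thread %ld is in the function\n", (long)arg);
    pthread_mutex_unlock(&m);
    return NULL;
}

int main(void)
{
    pthread_t t[4];
    for (long i = 0; i < 4; i++)
        pthread_create(&t[i], NULL, critical, (void *)i);
    for (int i = 0; i < 4; i++)
        pthread_join(t[i], NULL);
    return 0;
}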
In general terms: Plain code, including function calls, does not know about threads, the operating system does. By using a mutex you tap into the system that manages the running of threads. More details are just a Google search away.
Note that C11, the new C standard revision, does include multi-threading support. But this does not change the general concept; it simply means that you can use C library functions instead of operating system specific ones.

What is better: Select vs Threads?

In linux.
I want to build an autoclicker that will have an enable/disable function when a key is pressed. Obviously there should be 2 things running in parallel (the clicker itself, and the enable/disable function)
What are the cons and pros of each implementation:
Using a thread which will handle the autoclicking function and another main thread (for the enable/disable etc...)
Or using the syscall select and wait for input/keyboard?
Using select is better for performance, especially when you could have potentially hundreds of simultaneous operations. However it can be difficult to write the code correctly and the style of coding is very different from traditional single threaded programming. For example, you need to avoid calling any blocking methods as it could block your entire application.
Most people find using threads simpler because the majority of the code resembles ordinary single threaded code. The only difficult part is in the few places where you need interthread communication, via mutexes or other synchronization mechanisms.
In your specific case it seems that you will only need a small number of threads, so I'd go for the simpler programming model using threads.
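For reference, a single-threaded sketch of the select() approach for this case - the printf is a stand-in for the actual click, and quitting on 'q' is an assumption:

#include <stdio.h>
#include <sys/select.h>
#include <unistd.h>

int main(void)
{
    for (;;) {
        fd_set readfds;
        struct timeval tv = { 0, 100000 };   /* 100 ms between clicks */

        FD_ZERO(&readfds);
        FD_SET(STDIN_FILENO, &readfds);

        /* Either the keyboard becomes readable, or we time out and click. */
        if (select(STDIN_FILENO + 1, &readfds, NULL, NULL, &tv) > 0) {
            char c;
            if (read(STDIN_FILENO, &c, 1) == 1 && c == 'q')
                break;
        } else {
            printf("click\n");
        }
    }
    return 0;
}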
Given the amount of work you're doing, it probably doesn't matter.
For high performance applications, there is a difference. In those cases you need to handle several thousand connections simultaneously, and handing each new connection off to its own thread doesn't scale.
Creating several thousand threads is expensive, so select() is used for efficiency. In practice, mechanisms such as kqueue or epoll are used instead of plain select() for optimal event notification.
I say it doesn't matter, because you're likely only going to create the thread once and have exactly two threads running for the lifetime of the application.
