Making a process survive a failure in one of its threads - C

I'm writing an app that has many independent threads. While I'm doing quite low-level, dangerous stuff there, threads may fail (SIGSEGV, SIGBUS, SIGFPE), but they should not kill the whole process. Is there a way to do this properly?
Currently I intercept the aforementioned signals and call pthread_exit(NULL) from their signal handler. It seems to work, but since pthread_exit is not an async-signal-safe function, I'm a bit concerned about this solution.
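Roughly, the current setup looks like this (simplified sketch):

    #include <pthread.h>
    #include <signal.h>
    #include <stddef.h>

    static void fatal_signal_handler(int sig)
    {
        (void)sig;
        /* Not async-signal-safe, which is exactly my concern: */
        pthread_exit(NULL);
    }

    static void install_handlers(void)
    {
        struct sigaction sa = {0};
        sa.sa_handler = fatal_signal_handler;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGSEGV, &sa, NULL);
        sigaction(SIGBUS,  &sa, NULL);
        sigaction(SIGFPE,  &sa, NULL);
    }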
I know that splitting this app into multiple processes would solve the problem, but in this case it's not a feasible option.
EDIT: I'm aware of all the Bad Things™ that can happen (I'm experienced in low-level system and kernel programming) due to ignoring SIGSEGV/SIGBUS/SIGFPE, so please try to answer my particular question instead of giving me lessons about reliability.

The PROPER way to do this is to let the whole process die, and start another one. You don't explain WHY this isn't appropriate, but in essence, that's the only way that is completely safe against various nasty corner cases (which may or may not apply in your situation).
I'm not aware of any method that is 100% safe that doesn't involve letting the whole process die. (Note also that sometimes just the act of continuing from these sorts of errors is "undefined behaviour" - it doesn't mean that you are definitely going to fall over, just that it MAY be a problem).
It's of course possible that someone knows of some clever trick that works, but I'm pretty certain that the only 100% guaranteed method is to kill the entire process.

Low-latency code design involves a careful "be aware of the system you run on" type of coding and deployment. That means, for example, that standard IPC mechanisms (say, using SysV msgsnd/msgget to pass messages between processes, or pthread_cond_wait/pthread_cond_signal on the PThreads side) as well as ordinary locking primitives (adaptive mutexes) are to be considered rather slow ... because they involve something that takes thousands of CPU cycles ... namely, context switches.
Instead, use "hot-hot" handoff mechanisms such as the disruptor pattern - both producers and consumers spin in tight loops, permanently polling a single (or at worst a small number of) atomically-updated memory locations that say where the next item-to-be-processed is found and/or mark a processed item complete. Bind all producers / consumers to separate CPU cores so that they will never context switch.
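To make that concrete, here is a minimal single-producer/single-consumer sketch of such a spin-polling handoff using C11 atomics; a real disruptor uses a ring of slots and sequence counters, and all names here are purely illustrative:

    #include <stdatomic.h>
    #include <stdint.h>

    /* One-slot "hot-hot" handoff: producer and consumer spin on the flag,
     * never blocking, never making a system call. */
    typedef struct {
        _Atomic int ready;      /* 0 = slot empty, 1 = slot holds an item */
        uint64_t    payload;    /* the item being handed off              */
    } handoff_slot;

    static void produce(handoff_slot *s, uint64_t item)
    {
        while (atomic_load_explicit(&s->ready, memory_order_acquire) != 0)
            ;                                   /* spin until slot is free */
        s->payload = item;
        atomic_store_explicit(&s->ready, 1, memory_order_release);
    }

    static uint64_t consume(handoff_slot *s)
    {
        while (atomic_load_explicit(&s->ready, memory_order_acquire) != 1)
            ;                                   /* spin until item arrives */
        uint64_t item = s->payload;
        atomic_store_explicit(&s->ready, 0, memory_order_release);
        return item;
    }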
In this type of usecase, whether you use separate threads (and get the memory sharing implicitly by virtue of all threads sharing the same address space) or separate processes (and get the memory sharing explicitly by using shared memory for the data-to-be-processed as well as the queue mgmt "metadata") makes very little difference because TLBs and data caches are "always hot" (you never context switch).
If your "processors" are unstable and/or have no guaranteed completion time, you need to add a "reaper" mechanism anyway to deal with failed / timed out messages, but such garbage collection mechanisms necessarily introduce jitter (latency spikes). That's because you need a system call to determine whether a specific thread or process has exited, and system call latency is a few micros even in best case.
From my point of view, you're trying to mix oil and water here; you're required to use library code not specifically written for use in low-latency deployments / library code not under your control, combined with the requirement to do message dispatch with nanosecond latencies. There is no way to make e.g. pthread_cond_signal() give you nanosecond latency because it must do a system call to wake the target up, and that takes longer.
If your "handler code" relies on the "rich" environment, and a huge amount of "state" is shared between these and the main program ... it sounds a bit like saying "I need to make a steam-driven airplane break the sound barrier"...

Related

Is there a way to have precise timed events in GTK/GLib?

I want to have a function that would run every N milliseconds and I want it to run precisely (relatively speaking; I don't need atomic-clock precision).
From what I can see, the GLib manual says that g_timeout_add() does not guarantee precision and can be delayed due to other events.
Is there any other way to have precise timed events with GTK/GLib? I would rather not use platform-specific code, as I want my program to work on both Windows and Linux with as few platform-related code changes as possible.
How precise is "not atomic clock"? In the end, timing precision is going to be limited by factors like the platform's context-switching behaviour. Unless you're using custom kernels or specialist hardware, there might not be much you can do about that.
g_timeout_add() is doubly problematic, because its operation is tangled up with the GTK event handling mechanism, which was never designed for precision.
In the end, your best bets might be either
Use a conventional, signal-based timer (e.g., from setitimer), or
Spawn a new thread and just usleep() a fixed time between actions.
Both these approaches are problematic in GTK, because it's hard to update the user interface from outside the GTK main context thread. Some fairly complicated locking and inter-thread communication is usually required.
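One relatively lightweight form of that inter-thread communication is to let the timing thread hand the UI work back to the GLib main context with g_idle_add(); a rough sketch, where the callback names and the 10 ms interval are just placeholders:

    #include <glib.h>

    /* Runs in the main context -- safe to touch GTK widgets here. */
    static gboolean update_ui(gpointer data)
    {
        g_print("tick %u\n", GPOINTER_TO_UINT(data));
        return G_SOURCE_REMOVE;
    }

    /* Worker thread: sleeps a fixed time, then posts the update back
     * to the main loop instead of touching the UI directly. */
    static gpointer timer_thread(gpointer data)
    {
        guint tick = 0;
        for (;;) {
            g_usleep(10 * 1000);                     /* 10 ms between actions */
            g_idle_add(update_ui, GUINT_TO_POINTER(tick++));
        }
        return NULL;
    }

    /* ... in main(): g_thread_new("timer", timer_thread, NULL); */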
If practicable -- and I have no idea whether it would be -- I would suggest delegating the timing part to some separate process, and have the GTK application interact with it using, e.g., sockets.
Without more detail, g_usleep would probably be your best bet, but keep in mind that it blocks the current thread, so if you want other tasks to proceed in parallel you'll need to spawn a new thread to run it in.

How to jump out of and resume at arbitrary locations in C code without refactoring

BACKGROUND
I'm integrating micropython into my custom cooperative multitasking OS (no, my company won't change to preemptive).
Micropython uses garbage collection, and this takes much more time than my allotted time slice, even when there's nothing to collect; i.e. I called it twice in a row, timed it, and it still takes a lot of time.
OBVIOUS SOLUTION
Yes, I could refactor the micropython source, but then whenever there's a change . . .
IDEAL SOLUTION
The ideal solution would involve calling some function void pause(&func_in_call_stack) that would jump out, leaving the stack intact, all the way to the function that is at the top of the call stack, say main. And resume would . . . resume.
QUESTION
Is it possible, using C and assembly, to implement pause?
UPDATE
As I wrote this, I realize that the C-based exception handling code nlr_push()/nlr_pop() already does most of what I need.
Your question is about implementing context switching. As we've covered fairly exhaustively in comments, support for context switching is among the key characteristics of any multitasking system, and of a multitasking OS in particular. Inasmuch as you posit no OS support for context switching, you are talking about implementing multitasking for a single-tasking OS.
That you describe the OS as providing some kind of task queue ("to relinquish control, a thread must simply exit its run loop") does not change this, though to some extent we could consider it a question of semantics. I imagine that a typical task for such a system would operate by creating and executing a series of microtasks (the work of the "run loop"), providing a shared, mutable memory context to each. Such a run loop could safely exit and later be reentered, to resume generating microtasks from where it left off.
Dividing tasks into microtasks at boundaries defined by affirmative application action (i.e. your pause()) would depend on capabilities beyond those provided by ISO C. Very likely, however, it could be done with the help of some assembly, plus some kind of framework support. You need at least these things:
A mechanism for recording a task's current execution context -- stack, register contents, and maybe other details. This is inherently system-specific.
A task-associated place to store recorded execution context. There are various ways in which such a thing could be established. Promising alternatives include (i) provided by the OS; (ii) provided by some kind of userland multi-tasking system running on top of the OS; (iii) built into the task by the compiler.
A mechanism for restoring recorded execution context -- this, too, will be system-specific.
If the OS does not provide such features, then you could consider the (now removed) POSIX context system as a model interface for recording and restoring execution context. (See makecontext(), swapcontext(), getcontext(), and setcontext().) You would need to implement those yourself, however, and you might want to wrap them to present a simpler interface to applications. Details will be highly dependent on hardware and underlying OS.
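As an illustration of that model interface (the functions are gone from POSIX but still shipped by e.g. glibc), a minimal pause-and-resume round trip might look like this; the names task, pause_task and so on are purely illustrative:

    #include <stdio.h>
    #include <ucontext.h>

    static ucontext_t main_ctx, task_ctx;
    static char task_stack[64 * 1024];

    static void pause_task(void)            /* the "pause()" from the question */
    {
        swapcontext(&task_ctx, &main_ctx);  /* save task state, jump to main   */
    }

    static void task(void)
    {
        puts("task: part 1");
        pause_task();                       /* give the time slice back        */
        puts("task: part 2");               /* resumed here later              */
    }

    int main(void)
    {
        getcontext(&task_ctx);
        task_ctx.uc_stack.ss_sp   = task_stack;
        task_ctx.uc_stack.ss_size = sizeof task_stack;
        task_ctx.uc_link          = &main_ctx; /* return here when task ends   */
        makecontext(&task_ctx, task, 0);

        swapcontext(&main_ctx, &task_ctx);  /* run until the task pauses       */
        puts("main: doing other work");
        swapcontext(&main_ctx, &task_ctx);  /* resume the task where it paused */
        return 0;
    }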
As an alternative, you might implement transparent multitasking support for such a system by providing compilers that emit specially instrumented code (i.e. even more specially instrumented than you otherwise need). For example, consider compilers that emit bytecode for a VM of your own design. The VMs in which the resulting programs run would naturally track the state of the program running within, and could yield after each sequence of a certain number of opcodes.

How to get the fastest data processing: fork and/or multithreading

Imagine that we have a client which keeps sending lots of double values.
Now we are trying to make a server, which can receive and process the data from the client.
Here is the fact:
The server can receive a double in a very short time.
There is a function to process a double at the server, which needs more than 3 min to process only one double.
We need to make the server process 1000 doubles from the client as fast as possible.
My idea as below:
Use a thread pool to create many threads; each thread can process one double.
All of these are in Linux.
My question:
For now my server is just one process which contains multiple threads. I'm wondering: if I use fork(), would it be faster?
I think using only fork() without multithreading would be a bad idea, but what if I create two processes, each containing multiple threads? Could that be faster?
Btw I have read:
What is the difference between fork and thread?
Forking vs Threading
To a certain degree, this very much depends on the underlying hardware. It also depends on memory constraints, IO throughput, ...
Example: if your CPU has 4 cores, and each one is able to run two threads (and not much else is going on on that system); then you probably would prefer to have a solution with 4 processes; each one running two threads!
Or, when working with fork(), you would fork() 4 times; but within each of the forked processes, you should be distributing your work to two threads.
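A rough sketch of that layout, with the counts hard-coded purely for illustration (in practice you would query the hardware, e.g. via sysconf(_SC_NPROCESSORS_ONLN), and make them configurable):

    #include <pthread.h>
    #include <stdio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    #define NPROC    4          /* illustrative numbers only */
    #define NTHREADS 2

    static void *worker(void *arg)
    {
        /* ... pull doubles from a queue and process them ... */
        printf("pid %d, thread %ld working\n", getpid(), (long)arg);
        return NULL;
    }

    int main(void)
    {
        for (int p = 0; p < NPROC; p++) {
            if (fork() == 0) {                       /* child process         */
                pthread_t t[NTHREADS];
                for (long i = 0; i < NTHREADS; i++)
                    pthread_create(&t[i], NULL, worker, (void *)i);
                for (int i = 0; i < NTHREADS; i++)
                    pthread_join(t[i], NULL);
                _exit(0);
            }
        }
        while (wait(NULL) > 0)                       /* parent reaps children */
            ;
        return 0;
    }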
Long story short, what you really want to do is not lock yourself into some corner. You want to create a service (as said, you are building a server, not a client) that has a sound and reasonable design.
And given your requirements, you want to build that application in a way that allows you to configure how many processes and threads it will be using. And then you start profiling (meaning: you measure what is going on); maybe you do experiments to find the optimum for a given piece of hardware / OS stack.
EDIT: I feel tempted to say - welcome to the real world. You are facing the requirement to meet precise "performance goals" for your product. Without such goals, programmer life is pretty easy: most of the time, one just sits down, puts together a reasonable product, and given the power of today's hardware, "things are good enough".
But if things are not good enough, then there is only one way: you have to learn about all those things that play a role here. Starting with things like "which system calls in my OS can I use to get the correct number of cores/threads?"
In other words: the days in which you "got away" without knowing about the exact capacity of the hardware you are using ... are over. If you intend to "play this game"; then there are no detours: you will have to learn the rules!
Finally: the most important thing here is not about processes versus threads. You have to understand that you need to grasp the whole picture here. It doesn't help if you tune your client for maximum CPU performance ... to then find that network or IO issues cause 10x of "loss" compared to what you gained by looking at CPU only. In other words: you have to look at all the pieces in your system; and then you need to measure to understand where you have bottlenecks. And then you decide the actions to take!
One good reading about that would be "Release It!" by Michael Nygard. Of course his book is mainly about patterns in the Java world; but he does a great job of explaining what "performance" really means.
fork()ing as such is way slower than kicking off a thread. A thread is much more lightweight (traditionally, although processes have caught up in recent years) than a full OS process, not only in terms of CPU requirements, but also with regards to memory footprint and general OS overhead.
As you are thinking about a pre-arranged pool of threads or processes, setup time would not count for much during the runtime of your program, so you need to look into "what is the cost of interprocess communication" - which is (locally) generally cheaper between threads than it is between processes (threads do not need to go through the OS to exchange data, only for synchronisation, and in some cases you can even get away without that). But unfortunately you do not state whether there is any need for IPC between worker threads.
Summed up: I cannot see any advantage of using fork(), at least not with regards to efficiency.

Calling convention which only allows one instance of a function at a time

Say I have multiple threads and all threads call the same function at approximately the same time.
Is there a calling convention which would only allow one instance of the function at any time? What I mean is that the function called by the second thread would only start after the function called by the first thread had returned.
Or are these calling conventions compiler specific? I don't have a whole lot of experience using them.
(Skip to the bottom if you don't care about the threading mumbo-jumbo)
As mentioned before, this is not a "calling convention" but a general problem of computing: concurrency. And the particular case where two or more threads can enter a shared zone at the same time, with the outcome depending on which gets there first, is called a race condition (a concept that also extends to/from electronics and other areas).
The hard thing about threading is that computing is otherwise such a deterministic affair; when threading gets involved, it adds a degree of uncertainty, which varies per platform/OS.
A single-threaded program is guaranteed to do all its tasks in the same order, always; but once you have multiple threads, the order depends on how fast each can complete a task, on other applications wanting to use the CPU, and on the underlying hardware, all of which affect the results.
There's not much of a "sure-fire way to do threading"; rather, there are techniques, tools and libraries to deal with individual cases.
Locking in
The most well known technique is using semaphores (or locks), and the most well known semaphore is the mutex one, which only allows one thread at a time to access a shared space, by having a sort of "flag" that is raised once a thread has entered.
    if (locked == NO)
    {
        locked = YES;
        // Do ya' thing
        locked = NO;
    }
The code above, although it looks like it could work, does not guard against the case where both threads pass the if () and then set the variable (which threads can easily do). So there's hardware support for this kind of operation, guaranteeing that only one thread can execute it: the testAndSet operation, which checks and then, if available, sets the variable in a single indivisible step (x86 has dedicated instructions for this).
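For reference, a minimal sketch of the same idea done correctly with C11's portable wrapper around the hardware test-and-set, atomic_flag (the function name is just an example):

    #include <stdatomic.h>

    static atomic_flag locked = ATOMIC_FLAG_INIT;

    void do_ya_thing(void)
    {
        /* atomic_flag_test_and_set() returns the previous value and sets the
         * flag in one indivisible, hardware-backed step, so only one thread
         * can "win" the flag at a time. */
        while (atomic_flag_test_and_set(&locked))
            ;                           /* spin until we acquire the flag */

        /* Do ya' thing */

        atomic_flag_clear(&locked);     /* release the flag */
    }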
On the same vein of locks and semaphores, there's also the read-write lock, which allows multiple readers and one writer, especially useful for things with low volatility. And there are many other variations, some that limit the number of concurrent threads to some X, and whatnot.
But overall, locks are lame, since they are basically forcing serialisation of multi-threading, where threads actually need to get stuck trying to get a lock (or just testing it and leaving). Kinda defeats the purpose of having multiple threads, doesn't it?
The best solution in terms of threading is to minimise the amount of shared space that threads need to use, possibly eliminating it completely. Maybe use rwlocks when volatility is low, try to have "try and leave" kinds of threads that check whether the lock is available and just leave if it isn't, etc.
As my OS teacher once said (in Zen-like fashion): "The best kind of locking is the one you can avoid".
Thread Pools
Now, threading is hard, no way around it, that's why there are patterns to deal with such kind of problems, and the Thread Pool Pattern is a popular one, at least in iOS since the introduction of Grand Central Dispatch (GCD).
Instead of having a bunch of threads running amok and getting enqueued all over the place, let's have a set of threads, waiting for tasks in a "pool", and having queues of things to do, ideally, tasks that shouldn't overlap each other.
Now, the thread pattern doesn't solve the problems discussed before, but it changes the paradigm to make it easier to deal with, mentally. Instead of having to think about "threads that need to execute such and such", you just switch the focus to "tasks that need to be executed" and the matter of which thread is doing it, becomes irrelevant.
Again, pools won't solve all your problems, but it will make them easier to understand. And easier to understand may lead to better solutions.
All the theoretical things mentioned above are already implemented at the POSIX level (semaphore.h, pthread.h, etc.; pthreads has a very nice set of r/w locking functions) - try reading about them.
(Edit: I thought this thread was about Obj-C, not plain C, edited out all the Foundation and GCD stuff)
Calling convention defines how stack & registers are used to implement function calls. Because each thread has its own stack & registers, synchronising threads and calling convention are separate things.
To prevent multiple threads from executing the same code at the same time, you need a mutex. In your example of a function, you'd typically put the mutex lock and unlock inside the function's code, around the statements you don't want your threads to be executing at the same time.
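For instance, a minimal sketch with a pthread mutex (the function and mutex names are just illustrative):

    #include <pthread.h>

    static pthread_mutex_t fn_lock = PTHREAD_MUTEX_INITIALIZER;

    void only_one_at_a_time(void)
    {
        pthread_mutex_lock(&fn_lock);   /* a second caller blocks here...     */

        /* ... statements that must not run concurrently ... */

        pthread_mutex_unlock(&fn_lock); /* ...until the first caller releases */
    }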
In general terms: Plain code, including function calls, does not know about threads, the operating system does. By using a mutex you tap into the system that manages the running of threads. More details are just a Google search away.
Note that C11, the new C standard revision, does include multi-threading support. But this does not change the general concept; it simply means that you can use C library functions instead of operating system specific ones.

Select in socket programming

Is there any use in using the select() function?
From my (small) experience I tend to believe that threads are enough.
So I wonder: is select() just a didactic tool for people who don't yet know threads?
Consider the following example. You have a moderately busy web server with something like 100K connections. You're not using select or anything like it, so you have one thread per connection, implying 100K threads, which quickly becomes a problem.
Even if you tweak your system until it allows such a monstrosity, most of the threads will just wait on a socket. Wouldn't it be better if there were a mechanism to notify you when a socket becomes interesting?
Put another way, threading and select-like mechanisms are complementary. You just can't use threads to replace the simple thing select does: monitoring file descriptors.
Single-threaded polling is by far simpler to use, implement and (most importantly) understand. Concurrent programming adds a huge intellectual cost to your project: Synchronising data is tricky and error-prone, locking introduces many opportunities for bugs, lock-free data structures cause performance hits, and the program flow becomes hard to visualize mentally (or "serialize" perhaps).
By contrast, single-threaded polling (maybe with epoll/kqueue rather than select) gives you generally very good performance (depending of course on what exactly you're doing in response to data) while remaining straight-forward.
In Linux in particular, you can have timerfds, eventfds, signalfds and inotify-fds, as well as nested epoll-fds, all sitting together in your polling set, giving you a very uniform way of dealing with all sorts of "asynchronous" events. If you eventually need more performance, you have a single point of parallelism by running several pollers concurrently, and much of the data synchronisation is done for you by the kernel, which promises that only one single thread receives a successful poll in the event of readiness.
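For illustration, the skeleton of such a single-threaded epoll loop might look like this (error handling omitted; the file descriptors are assumed to have been created elsewhere):

    #include <sys/epoll.h>

    /* One thread, one epoll set: a listening socket and a timerfd handled
     * in the same loop, no locking required. */
    void poll_loop(int listen_fd, int timer_fd)
    {
        int epfd = epoll_create1(0);

        struct epoll_event ev = { .events = EPOLLIN, .data.fd = listen_fd };
        epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &ev);

        ev.data.fd = timer_fd;
        epoll_ctl(epfd, EPOLL_CTL_ADD, timer_fd, &ev);

        for (;;) {
            struct epoll_event events[64];
            int n = epoll_wait(epfd, events, 64, -1);
            for (int i = 0; i < n; i++) {
                if (events[i].data.fd == listen_fd) {
                    /* accept() the new connection and add it to epfd */
                } else if (events[i].data.fd == timer_fd) {
                    /* read() the expiration count, run the timed work */
                } else {
                    /* read()/write() on a connection that became ready */
                }
            }
        }
    }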
