Library for Dataflow in C

How can I do dataflow (pipes and filters, stream processing, flow based) in C? And not with UNIX pipes.
I recently came across stream.py.
Streams are iterables with a pipelining mechanism to enable data-flow programming and easy parallelization.
The idea is to take the output of a function that turns an iterable into another iterable and plug that as the input of another such function. While you can already do this using function composition, this package provides an elegant notation for it by overloading the >> operator.
I would like to duplicate a simple version of this kind of functionality in C. I particularly like the overloading of the >> operator to avoid function composition mess. Wikipedia points to this hint from a Usenet post in 1990.
Why C? Because I would like to be able to do this on microcontrollers and in C extensions for other high level languages (Max, Pd*, Python).
* (ironic given that Max and Pd were written, in C, specifically for this purpose – I'm looking for something barebones)

I know it's not a good answer, but you should build your own simple dataflow framework.
I've written a prototype DF server (together with a friend of mine), which still has several unimplemented features: it can only pass Integer and Trigger data in messages, and it does not support parallelism. I simply skipped that work: the components' producer ports hold lists of function pointers to consumer ports, set up during initialization, and call them (if the list is not empty). So, when an event fires, the components perform a tree-like walk-through of the dataflow graph. As they work only with Integers and Triggers, it's extremely quick.
I've also written a strange component, which has one consumer and one producer port and simply passes data through, but in another thread. Its consumer routine finishes quickly, as it just stores the data and sets a flag for the producer-side thread. Dirty, but it suits my needs: it detaches long processing from the tree walk.
So, as you may have recognized, it's a low-traffic asynchronous system for quick tasks, where graph size does not matter.
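A minimal sketch of that push mechanism, with all names hypothetical: each producer port holds a list of consumer entry points, filled in at initialization, and firing a value walks the graph.

#include <stddef.h>

typedef void (*consumer_fn)(void *component, int value);

typedef struct {
    consumer_fn fn;        /* consumer port entry point */
    void *component;       /* component the port belongs to */
} binding_t;

typedef struct {
    binding_t targets[4];  /* consumer ports bound at initialization */
    size_t count;
} producer_port_t;

/* Firing an event calls every bound consumer; each consumer may fire
   its own producer ports in turn -- the tree-like walk-through. */
static void fire(producer_port_t *p, int value) {
    for (size_t i = 0; i < p->count; i++)
        p->targets[i].fn(p->targets[i].component, value);
}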
Unfortunately, your problem differs from mine in as many ways as one dataflow system can differ from another: you need a synchronous, parallel, stream-handling solution.
I think the biggest issue in a DF server is the dispatcher. Concurrency, collisions, threads, priority... as I said, I just skipped the problem, not solved it. You should skip it, too. And you should also skip other problems.
Dispatcher
In a synchronous DF architecture, all components must run once per cycle, except in special cases. They have a simple precondition: is the input data available? So you just scan through the components and pass each one whose data is available to a free caller thread. After processing all of them, you will have N remaining components which haven't been processed. Process the list again; after the second pass you will have M remaining. If N == M, the cycle is over.
I think something like this will work as long as the number of components stays below about 100.
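Here is a rough sketch of that scan-until-stable loop, assuming hypothetical component_t, has_input() and run() provided by the framework:

#include <stdbool.h>
#include <stddef.h>

typedef struct component component_t;  /* hypothetical component type */
bool has_input(component_t *c);        /* precondition: input available? */
void run(component_t *c);              /* fire the component once */

/* One synchronous cycle: rescan until the number of unprocessed
   components stops shrinking (N == M), then the cycle is over. */
void run_cycle(component_t *comps[], size_t n) {
    bool fired[n];
    for (size_t i = 0; i < n; i++)
        fired[i] = false;

    size_t remaining = n, prev;
    do {
        prev = remaining;
        remaining = 0;
        for (size_t i = 0; i < n; i++) {
            if (fired[i])
                continue;
            if (has_input(comps[i])) {
                run(comps[i]);   /* or hand off to a free caller thread */
                fired[i] = true;
            } else {
                remaining++;
            }
        }
    } while (remaining > 0 && remaining < prev);
}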
Binding
Yep, the best way of binding is visual programming. Until the editor is finished, config-like code should be used instead, something like:
// disclaimer: not actual code
Component* c1 = new AddComponent();
Component* c2 = new PrintComponent();
c2->format = "The result is %d\n";
bind(c1->result,c2->feed);
It's easy to write and easy to read; what more could you wish for?
Message
You should pass pure raw packets among the components' ports. You need only a list of bindings, each containing a pair of pointers to the producer and consumer ports, plus the processed flag, which the "dispatcher" uses.
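For example, a binding entry could look like this (port_t and the field names are hypothetical):

#include <stdbool.h>

typedef struct port port_t;  /* hypothetical port type */

typedef struct {
    port_t *producer;   /* source port */
    port_t *consumer;   /* destination port */
    bool processed;     /* set once the dispatcher has delivered the packet */
} wire_t;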
Calling issue
The problem is that the producer should not call the consumer port but the component: all component (class) variables and firing logic live in the component. So the producer should either call the component's common entry point directly, passing the consumer port's ID to it, or call the port, which in turn calls some method of the component it belongs to.
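A sketch of the first option, the common entry point (component_t, packet_t and all names here are hypothetical):

typedef struct component component_t;  /* hypothetical component type */
typedef struct packet packet_t;        /* hypothetical raw packet */

/* The component's single entry point: it dispatches internally
   on the ID of the consumer port being fed. */
void component_feed(component_t *c, int port_id, packet_t *pkt);

/* A producer-side reference to a consumer then needs only this: */
typedef struct {
    component_t *target;
    int port_id;
} consumer_ref_t;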
So, if you can live with some restrictions, I say go ahead and write your lite framework. It's a good task, and writing small components and then seeing how smartly they can be wired together into a great app is the ultimate fun.
If you have further questions, feel free to ask; I often scan the "dataflow" keyword here.
Possibly, you can figure out a simpler dataflowish model for your program.

I'm not aware of any library for this purpose. A friend of mine implemented something similar at university as a lab assignment. The main problems of such systems are low performance (really bad if the functions in long pipelines are smallish) and the potential need to implement scheduling (detecting deadlocks and boosting priority to avoid overflowing pipe buffers).
From my experience with similar data processing, error handling is quite burdensome. Since functions in the pipeline intentionally know little of the context (for reusability), they can't produce sensible error messages. One can implement in-line error handling, passing errors down the pipe as data, but that requires special handling all over the place, especially on the output side, since with streams it is not possible to correlate an error with the input that produced it.
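To illustrate the in-line idea, errors can travel through the pipe as tagged items that every stage must check and forward (a sketch; the names are made up):

typedef enum { ITEM_DATA, ITEM_ERROR } item_kind;

typedef struct {
    item_kind kind;
    union {
        int value;       /* payload for ITEM_DATA */
        int error_code;  /* payload for ITEM_ERROR */
    } u;
} item_t;

/* Each filter forwards errors untouched and processes only data: */
item_t square(item_t in) {
    if (in.kind == ITEM_ERROR)
        return in;   /* pass the error downstream */
    in.u.value *= in.u.value;
    return in;
}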
Considering the known performance problems of the approach, it is hard for me to imagine how it would fit microcontrollers. Performance-wise, nothing beats a plain function: one can create a function for every path through the data pipeline.
You could also look at some Petri net implementation (simulator or code generator), as Petri nets are one of the theoretical bases for streams.

This is cool: http://code.google.com/p/libconcurrency/
A lightweight concurrency library for C, featuring symmetric coroutines as the main control flow abstraction. The library is similar to State Threads, but using coroutines instead of green threads. This simplifies inter-procedural calls and largely eliminates the need for mutexes and semaphores for signaling.
Eventually, coroutine calls will also be able to safely migrate between kernel threads, so the achievable scalability is consequently much higher than State Threads, which is purposely single-threaded.
This library was inspired by Douglas W. Jones' "minimal user-level thread package". The pseudo-platform-neutral probing algorithm on the svn trunk is derived from his code.
There is also a safer, more portable coroutine implementation based on stack copying, which was inspired by sigfpe's page on portable continuations in C. Copying is more portable and flexible than stack switching, and making copying competitive with switching is being researched.

Related

Optimal division of functions between source files

My C program has two threads, both of which interact with two external interfaces. There's too much code for one source file, so I'm splitting it in two. What is the right split?
One thread, MtoD, takes a message off an IPC message queue, processes it, and then sends commands to the driver of a physical interface. The other thread, DtoM, receives interrupts from that driver, processes the input, and then posts the results in a message to an IPC queue.
The obvious ways to split the code in two are:
by thread: two source files, MtoD.c and DtoM.c, each holding all the functions of a single thread - but both files will have to deal with both of the interfaces
by interface: two source files, M.c and D.c, each doing all the business related to a certain external interface - but the threads run through both files.
My concerns are
code maintenance. Doing it by thread makes it easier to follow the logic of a thread (no switching between files). But someone who'd write this object-oriented would probably wrap the interface to the IPC queues in one class, which would be in one file, and the driver interface in another, in the other file.
performance. If you have object files M.o and D.o, each will have just one external library to deal with - but they have to call into each other during execution of a thread. Does that incur any overhead (if the linker has made them into one binary)? If you have MtoD.o and DtoM.o, you could declare most functions as static, which might enable some more compiler optimizations. But would they both need links with the external libraries?
Which way is optimal?
That's an interesting one, and you'll probably get BOTH options recommended, simply because both have advantages and disadvantages, and much depends on how one values them.
OK, third option: one thread? If I understand you correctly, you connect an interface to an IPC queue, so what if one thread both reacts to input on either side and sends it out the other side? I don't think you lose much response time this way, if any, and you have it all in one place. If the source is too big, you can look into which classes you could naturally separate, rather than splitting by thread or by interface.
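Assuming both sides expose pollable file descriptors (ipc_fd and drv_fd here are hypothetical, as are the two handler functions), the single-threaded version might look like:

#include <poll.h>

void handle_message(int ipc_fd, int drv_fd);    /* existing M -> D code */
void handle_interrupt(int drv_fd, int ipc_fd);  /* existing D -> M code */

void run_single_thread(int ipc_fd, int drv_fd) {
    struct pollfd fds[2] = {
        { .fd = ipc_fd, .events = POLLIN },
        { .fd = drv_fd, .events = POLLIN },
    };
    for (;;) {
        if (poll(fds, 2, -1) < 0)
            continue;   /* interrupted; real code would inspect errno */
        if (fds[0].revents & POLLIN)
            handle_message(ipc_fd, drv_fd);    /* M -> D direction */
        if (fds[1].revents & POLLIN)
            handle_interrupt(drv_fd, ipc_fd);  /* D -> M direction */
    }
}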

How to jump out of and resume at arbitrary locations in C code without refactoring

BACKGROUND
I'm integrating MicroPython into my custom cooperative multitasking OS (no, my company won't switch to preemptive).
MicroPython uses garbage collection, and this takes much more time than my allotted time slice, even when there's nothing to collect; i.e., I called it twice in a row, timed it, and it still takes A LOT of time.
OBVIOUS SOLUTION
Yes, I could refactor the MicroPython source, but then whenever there's a change . . .
IDEAL SOLUTION
The ideal solution would involve calling some function void pause(&func_in_call_stack) that would jump out, leaving the stack intact, all the way to the function that is at the top of the call stack, say main. And resume would . . . resume.
QUESTION
Is it possible, using C and assembly, to implement pause?
UPDATE
As I wrote this, I realize that the C-based exception handling code nlr_push()/nlr_pop() already does most of what I need.
Your question is about implementing context switching. As we've covered fairly exhaustively in comments, support for context switching is among the key characteristics of any multitasking system, and of a multitasking OS in particular. Inasmuch as you posit no OS support for context switching, you are talking about implementing multitasking for a single-tasking OS.
That you describe the OS as providing some kind of task queue ("to relinquish control, a thread must simply exit its run loop") does not change this, though to some extent we could consider it a question of semantics. I imagine that a typical task for such a system would operate by creating and executing a series of microtasks (the work of the "run loop"), providing a shared, mutable memory context to each. Such a run loop could safely exit and later be reentered, to resume generating microtasks from where it left off.
Dividing tasks into microtasks at boundaries defined by affirmative application action (i.e. your pause()) would depend on capabilities beyond those provided by ISO C. Very likely, however, it could be done with the help of some assembly, plus some kind of framework support. You need at least these things:
A mechanism for recording a task's current execution context -- stack, register contents, and maybe other details. This is inherently system-specific.
A task-associated place to store recorded execution context. There are various ways in which such a thing could be established. Promising alternatives include (i) provided by the OS; (ii) provided by some kind of userland multi-tasking system running on top of the OS; (iii) built into the task by the compiler.
A mechanism for restoring recorded execution context -- this, too, will be system-specific.
If the OS does not provide such features, then you could consider the (now removed) POSIX context system as a model interface for recording and restoring execution context. (See makecontext(), swapcontext(), getcontext(), and setcontext().) You would need to implement those yourself, however, and you might want to wrap them to present a simpler interface to applications. Details will be highly dependent on hardware and underlying OS.
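For a concrete picture, here is a minimal pause/resume sketch using the obsolete <ucontext.h> interface, on a system that still provides it (STACK_SIZE, task_main and the function names are made up for illustration):

#include <stdio.h>
#include <stdlib.h>
#include <ucontext.h>

#define STACK_SIZE (64 * 1024)

static ucontext_t main_ctx, task_ctx;

/* Callable from anywhere inside the task: saves the task's context
   (stack stays intact) and jumps back to main. */
static void pause_task(void) {
    swapcontext(&task_ctx, &main_ctx);
}

static void resume_task(void) {
    swapcontext(&main_ctx, &task_ctx);
}

static void task_main(void) {
    puts("task: step 1");
    pause_task();   /* yield to main; execution resumes here later */
    puts("task: step 2");
}

int main(void) {
    getcontext(&task_ctx);
    task_ctx.uc_stack.ss_sp = malloc(STACK_SIZE);
    task_ctx.uc_stack.ss_size = STACK_SIZE;
    task_ctx.uc_link = &main_ctx;   /* where to go when the task returns */
    makecontext(&task_ctx, task_main, 0);

    resume_task();   /* runs until pause_task() */
    puts("main: task is paused, doing other work");
    resume_task();   /* resumes after the pause */
    return 0;
}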
As an alternative, you might implement transparent multitasking support for such a system by providing compilers that emit specially instrumented code (i.e. even more specially instrumented than you otherwise need). For example, consider compilers that emit bytecode for a VM of your own design. The VMs in which the resulting programs run would naturally track the state of the program running within, and could yield after each sequence of a certain number of opcodes.
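A toy illustration of that idea: the interpreter loop owns all the program state, so it can simply return after a budget of opcodes and be called again later (the opcode set and vm_t are invented for the example):

#include <stdbool.h>
#include <stddef.h>

enum { OP_HALT, OP_INC, OP_DEC };

typedef struct {
    const unsigned char *code;  /* bytecode program */
    size_t pc;                  /* program counter */
    int acc;                    /* accumulator register */
} vm_t;

/* Run at most `budget` opcodes, then return; call again to resume. */
bool vm_step(vm_t *vm, int budget) {
    while (budget-- > 0) {
        switch (vm->code[vm->pc++]) {
        case OP_INC:  vm->acc++; break;
        case OP_DEC:  vm->acc--; break;
        case OP_HALT: return false;  /* program finished */
        }
    }
    return true;   /* budget exhausted; state preserved in vm */
}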

Calling convention which only allows one instance of a function at a time

Say I have multiple threads and all threads call the same function at approximately the same time.
Is there a calling convention which would only allow one instance of the function at any time? What I mean is that the function called by the second thread would only start after the function called by the first thread had returned.
Or are these calling conventions compiler specific? I don't have a whole lot of experience using them.
(Skip to the bottom if you don't care about the threading mumbo-jumbo)
As mentioned before, this is not a "calling convention" but a general problem in computing: concurrency. The particular case where two or more threads can enter a shared zone at the same time, with the outcome depending on their ordering, is called a race condition (the term also extends to/from electronics and other areas).
The hard thing about threading is that computing is such a deterministic affair, but when threading gets involved, it adds a degree of uncertainty, which varies per platform/OS.
A single-threaded program is guaranteed to do all its tasks in the same order, always; but with multiple threads, the order depends on how fast each thread completes its task, on other applications competing for the CPU, and on the underlying hardware.
There's no "sure-fire way to do threading"; there are techniques, tools and libraries to deal with the individual cases.
Locking in
The most well-known technique is using semaphores (or locks), and the most well-known semaphore is the mutex, which allows only one thread at a time to access a shared space by having a sort of "flag" that is raised once a thread has entered.
if (!locked)
{
    locked = true;
    // Do ya' thing
    locked = false;
}
The code above, although it looks like it could work, does not guard against the case where both threads pass the if () and then both set the variable (which threads can easily do). So there is hardware support for this kind of operation that guarantees only one thread can execute it at a time: the test-and-set operation, which checks and then, if clear, sets the variable in a single atomic step (x86, for example, has such instructions in its instruction set).
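In portable C, the same primitive is exposed by C11's atomic_flag; a minimal spinlock sketch, assuming a C11 compiler:

#include <stdatomic.h>

static atomic_flag locked = ATOMIC_FLAG_INIT;

void enter(void) {
    /* atomic_flag_test_and_set atomically sets the flag and returns
       its previous value, so only one thread ever sees "was clear". */
    while (atomic_flag_test_and_set(&locked))
        ;   /* spin until we are the one that set it */
}

void leave(void) {
    atomic_flag_clear(&locked);
}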
In the same vein of locks and semaphores, there's also the read-write lock, which allows multiple readers but only one writer, especially useful for data with low volatility. And there are many other variations, some that limit access to at most X threads at a time, and whatnot.
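With POSIX threads that looks like this (reader and writer are placeholder names):

#include <pthread.h>

static pthread_rwlock_t rw = PTHREAD_RWLOCK_INITIALIZER;

/* Any number of readers may hold the lock at the same time: */
void reader(void) {
    pthread_rwlock_rdlock(&rw);
    /* read the shared data */
    pthread_rwlock_unlock(&rw);
}

/* A writer waits until it holds the lock exclusively: */
void writer(void) {
    pthread_rwlock_wrlock(&rw);
    /* modify the shared data */
    pthread_rwlock_unlock(&rw);
}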
But overall, locks are lame, since they basically force serialisation of multi-threading: threads get stuck waiting to acquire a lock (or just test it and leave). Kinda defeats the purpose of having multiple threads, doesn't it?
The best solution, in threading terms, is to minimise the amount of shared space that threads need to use, possibly eliminating it completely. Maybe use rwlocks when volatility is low, try to have "try and leave" kinds of threads that test the lock and simply move on if it's taken, etc.
As my OS teacher once said (in Zen-like fashion): "The best kind of locking is the one you can avoid".
Thread Pools
Now, threading is hard, no way around it; that's why there are patterns for dealing with these kinds of problems, and the Thread Pool pattern is a popular one, at least on iOS since the introduction of Grand Central Dispatch (GCD).
Instead of having a bunch of threads running amok and getting enqueued all over the place, you have a fixed set of threads waiting for tasks in a "pool", plus queues of things to do, ideally tasks that don't overlap each other.
Now, the thread pool doesn't solve the problems discussed before, but it changes the paradigm to make them easier to deal with, mentally. Instead of thinking about "threads that need to execute such and such", you switch the focus to "tasks that need to be executed", and which thread executes them becomes irrelevant.
Again, pools won't solve all your problems, but they will make them easier to understand. And easier to understand may lead to better solutions.
All the theory mentioned above is already implemented at the POSIX level (semaphore.h, pthread.h, etc.; pthreads has a very nice set of r/w locking functions); try reading about them.
(Edit: I thought this thread was about Obj-C, not plain C, edited out all the Foundation and GCD stuff)
A calling convention defines how the stack and registers are used to implement function calls. Because each thread has its own stack and registers, synchronising threads and calling conventions are separate things.
To prevent multiple threads from executing the same code at the same time, you need a mutex. In your example of a function, you'd typically put the mutex lock and unlock inside the function's code, around the statements you don't want your threads to be executing at the same time.
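A typical shape for that, using a POSIX mutex (do_work stands in for whatever your function actually does):

#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void do_work(void);  /* placeholder for the real body */

void shared_function(void) {
    pthread_mutex_lock(&lock);
    /* critical section: only one thread runs this at a time;
       a second caller blocks here until the first unlocks */
    do_work();
    pthread_mutex_unlock(&lock);
}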
In general terms: plain code, including function calls, does not know about threads; the operating system does. By using a mutex, you tap into the system that manages the running of threads. More details are just a Google search away.
Note that C11, the new C standard revision, does include multi-threading support. But this does not change the general concept; it simply means that you can use C library functions instead of operating system specific ones.
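The C11 equivalent of the pthread sketch above would use <threads.h> (note the header is optional; implementations may define __STDC_NO_THREADS__):

#include <threads.h>

static mtx_t lock;

void init_lock(void) {          /* call once at startup */
    mtx_init(&lock, mtx_plain);
}

void shared_function(void) {
    mtx_lock(&lock);
    /* critical section */
    mtx_unlock(&lock);
}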

Making process survive failure in its thread

I'm writing an app that has many independent threads. While I'm doing quite low-level, dangerous stuff there, threads may fail (SIGSEGV, SIGBUS, SIGFPE), but they should not kill the whole process. Is there a proper way to do this?
Currently I intercept the aforementioned signals, and in their signal handler I call pthread_exit(NULL). It seems to work, but since pthread_exit is not an async-signal-safe function, I'm a bit concerned about this solution.
I know that splitting this app into multiple processes would solve the problem, but in this case that's not a feasible option.
EDIT: I'm aware of all the Bad Things™ that can happen (I'm experienced in low-level system and kernel programming) due to ignoring SIGSEGV/SIGBUS/SIGFPE, so please try to answer my particular question instead of giving me lessons about reliability.
The PROPER way to do this is to let the whole process die and start another one. You don't explain WHY this isn't appropriate, but in essence, that's the only way that is completely safe against various nasty corner cases (which may or may not apply in your situation).
I'm not aware of any method that is 100% safe and doesn't involve letting the whole process die. (Note also that sometimes the mere act of continuing after these sorts of errors is "undefined behaviour"; it doesn't mean that you are definitely going to fall over, just that it MAY be a problem.)
It's of course possible that someone knows of some clever trick that works, but I'm pretty certain that the only 100% guaranteed method is to kill the entire process.
Low-latency code design involves a careful "be aware of the system you run on" style of coding and deployment. It means, for example, that standard IPC mechanisms (say, using SysV msgsnd/msgget to pass messages between processes, or pthread_cond_wait/pthread_cond_signal on the pthreads side) as well as ordinary locking primitives (adaptive mutexes) are to be considered rather slow... because they involve something that takes thousands of CPU cycles: context switches.
Instead, use "hot-hot" handoff mechanisms such as the disruptor pattern: both producers and consumers spin in tight loops, permanently polling a single (or at worst a small number of) atomically updated memory locations that say where the next item to be processed is found and/or mark a processed item complete. Bind all producers/consumers to separate CPU cores so that they never context switch.
In this type of usecase, whether you use separate threads (and get the memory sharing implicitly by virtue of all threads sharing the same address space) or separate processes (and get the memory sharing explicitly by using shared memory for the data-to-be-processed as well as the queue mgmt "metadata") makes very little difference because TLBs and data caches are "always hot" (you never context switch).
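A minimal single-producer/single-consumer ring buffer sketch in C11 illustrates the busy-polling handoff; the size and item type are arbitrary, and there are no system calls on the hot path:

#include <stdatomic.h>
#include <stdint.h>

#define RING_SIZE 1024  /* must be a power of two */

typedef struct {
    _Atomic uint32_t head;   /* next slot the consumer will read */
    _Atomic uint32_t tail;   /* next slot the producer will write */
    void *items[RING_SIZE];
} ring_t;

/* Producer core: spin until a slot is free, then publish. */
void ring_push(ring_t *r, void *item) {
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    while (tail - atomic_load_explicit(&r->head, memory_order_acquire)
           >= RING_SIZE)
        ;   /* poll: queue full */
    r->items[tail & (RING_SIZE - 1)] = item;
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
}

/* Consumer core: spin until an item is published, then take it. */
void *ring_pop(ring_t *r) {
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    while (atomic_load_explicit(&r->tail, memory_order_acquire) == head)
        ;   /* poll: queue empty */
    void *item = r->items[head & (RING_SIZE - 1)];
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return item;
}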
If your "processors" are unstable and/or have no guaranteed completion time, you need to add a "reaper" mechanism anyway to deal with failed/timed-out messages, but such garbage collection mechanisms necessarily introduce jitter (latency spikes). That's because you need a system call to determine whether a specific thread or process has exited, and system call latency is a few microseconds even in the best case.
From my point of view, you're trying to mix oil and water here: you're required to use library code not specifically written for low-latency deployments and not under your control, combined with the requirement to do message dispatch with nanosecond latencies. There is no way to make e.g. pthread_cond_signal() give you nanosecond latency, because it must do a system call to wake the target up, and that takes longer.
If your "handler code" relies on the "rich" environment, and a huge amount of "state" is shared between these and the main program ... it sounds a bit like saying "I need to make a steam-driven airplane break the sound barrier"...

What is better: Select vs Threads?

On Linux.
I want to build an autoclicker with an enable/disable function triggered by a key press. Obviously, two things should run in parallel (the clicker itself and the enable/disable function).
What are the cons and pros of each implementation:
Using a thread which will handle the autoclicking function and another main thread (for the enable/disable etc...)
Or using the select syscall to wait for keyboard input?
Using select is better for performance, especially when you could have hundreds of simultaneous operations. However, it can be difficult to write the code correctly, and the style of coding is very different from traditional single-threaded programming. For example, you need to avoid calling any blocking methods, as one could block your entire application.
Most people find using threads simpler because the majority of the code resembles ordinary single threaded code. The only difficult part is in the few places where you need interthread communication, via mutexes or other synchronization mechanisms.
In your specific case it seems that you will only need a small number of threads, so I'd go for the simpler programming model using threads.
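For comparison, a sketch of the single-threaded select() approach, assuming the keyboard is readable on stdin (in practice you'd put the terminal in raw mode or read an input-event device) and do_click() is a hypothetical function that performs one click:

#include <stdbool.h>
#include <sys/select.h>
#include <unistd.h>

void do_click(void);   /* hypothetical: performs one click */

int main(void) {
    bool enabled = false;
    for (;;) {
        fd_set rfds;
        FD_ZERO(&rfds);
        FD_SET(STDIN_FILENO, &rfds);
        struct timeval tv = { 0, 50 * 1000 };   /* 50 ms click interval */

        int ready = select(STDIN_FILENO + 1, &rfds, NULL, NULL, &tv);
        if (ready > 0) {   /* key pressed: toggle clicking */
            char c;
            if (read(STDIN_FILENO, &c, 1) == 1)
                enabled = !enabled;
        } else if (ready == 0 && enabled) {
            do_click();    /* timeout elapsed while enabled: click */
        }
    }
}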
Given the amount of work you're doing, it probably doesn't matter.
For high-performance applications, there is a difference. In those cases you need to handle several thousand connections simultaneously, and handing each new connection off to its own thread becomes expensive, so select is used for efficiency. In practice, mechanisms such as kqueue or epoll are used for optimal event handling.
I say it doesn't matter, because you're likely only going to create the thread once and have exactly two threads running for the lifetime of the application.
