I'm having a small architecture argument with a coworker at the moment. I was hoping some of you could help settle it by strongly suggesting one approach over another.
We have a DSP and Cortex-M3 coupled together with shared memory. The DSP receives requests from the external world and some of these requests are to execute certain wireless test functionality which can only be done on the CM3. The DSP writes to shared memory, then signals the CM3 via an interrupt. The shared memory indicates what the request is along with any necessary data required to perform the request (channel to tune to, register of RF chip to read, etc).
My preference is to generate, in the interrupt, a unique event ID for each request that can occur. Then, before leaving the interrupt, pass the event on to the state machine's event queue, where it gets handled in the thread devoted to RF activity.
My coworker would instead like to pass a single event ID (generic RF command) to the state machine and have the parsing of the shared memory area occur after receiving this event ID in the state machine. After parsing, then you would know the specific command that you need to act on.
I dislike this approach because you will be doing the parsing of shared memory in whatever state you happen to be in. You can make this a function, but it's still processing that should be state-independent. She doesn't like the idea of parsing shared memory in the interrupt.
Any comments on the better approach? If it helps, we're using the QP framework from Miro Samek for state machine implementation.
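To make my preferred approach concrete, the ISR would do something like this; the shared-memory layout, command values, and queue API are made up for illustration:

#include <stdint.h>

/* One unique event ID per request type. */
typedef enum {
    RF_EVT_TUNE_CHANNEL,
    RF_EVT_READ_RF_REG,
    RF_EVT_UNKNOWN
} rf_signal_t;

typedef struct {
    rf_signal_t sig;
    uint32_t    param;                 /* channel number, register address, ... */
} rf_event_t;

/* Assumed shared-memory layout and queue API, purely illustrative. */
enum { DSP_CMD_TUNE = 1, DSP_CMD_READ_REG = 2 };
extern volatile struct { uint32_t command; uint32_t arg; } *shared_mem;
extern void rf_queue_post_from_isr(const rf_event_t *e);

void DSP_Request_ISR(void)
{
    rf_event_t evt;

    switch (shared_mem->command) {     /* map request -> unique event ID */
    case DSP_CMD_TUNE:     evt.sig = RF_EVT_TUNE_CHANNEL; break;
    case DSP_CMD_READ_REG: evt.sig = RF_EVT_READ_RF_REG;  break;
    default:               evt.sig = RF_EVT_UNKNOWN;      break;
    }
    evt.param = shared_mem->arg;

    rf_queue_post_from_isr(&evt);      /* hand off to the RF thread's queue */
}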
EDIT: moved statechart to ftp://hiddenoaks.asuscomm.com/Statechart.bmp
Here's a compromise:
- pass a single event ID (generic RF command) to the state machine from the interrupt
- create an action_function that "parses" the shared memory and returns a specific command
- guard RF_EVENT transitions in the statechart with [parser_action_func() == RF_CMD_1] etc.
The statechart code generator should be smart enough to execute parser_action_func() only once per RF_EVENT. (Dunno if QP framework is that smart).
This has the same statechart semantics as your "unique event ID for each request," and avoids parsing the shared memory in the interrupt handler.
ADDENDUM
The difference in the statechart is N transitions labeled
----RF_EVT_CMD_1---->
----RF_EVT_CMD_2---->
...
----RF_EVT_CMD_N---->
versus
----RF_EVT[cmd()==CMD_1]---->
----RF_EVT[cmd()==CMD_2]---->
...
----RF_EVT[cmd()==CMD_N]---->
where cmd() is the parsing action function.
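Stripped of any framework, the guarded version boils down to something like this; parse_shared_memory() and the handlers are placeholders, and the cache is what makes cmd() execute only once per RF_EVENT:

#include <stdbool.h>

typedef enum { RF_CMD_1, RF_CMD_2 /* ..., RF_CMD_N */ } rf_cmd_t;

extern rf_cmd_t parse_shared_memory(void);   /* placeholder parser */
extern void handle_cmd_1(void);              /* placeholder actions */
extern void handle_cmd_2(void);

static rf_cmd_t cached_cmd;
static bool     cmd_valid;

static rf_cmd_t cmd(void)            /* the parsing action function */
{
    if (!cmd_valid) {                /* parse at most once per event */
        cached_cmd = parse_shared_memory();
        cmd_valid = true;
    }
    return cached_cmd;
}

void dispatch_rf_event(void)         /* run when RF_EVENT reaches the state */
{
    cmd_valid = false;               /* fresh event: invalidate the cache */
    if (cmd() == RF_CMD_1)           /* [cmd() == RF_CMD_1] guard */
        handle_cmd_1();
    else if (cmd() == RF_CMD_2)      /* [cmd() == RF_CMD_2] guard */
        handle_cmd_2();
}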
I’ve been writing increasingly complex firmware and am starting to notice that my knowledge of design patterns and architecture is a bit lacking. I’m trying to work on developing these skills and am hoping for some input. Note: this is for embedded c for microcontrollers.
I’m working with a concept for a new project as an exercise right now that goes something like this:
We have a battery management module with user I/O:
- The main controller is responsible for I/O (button, LCD, UART debug), detecting events like the charger being plugged in/unplugged, and managing high-level operations.
- The sub-controller is a battery management controller (pretty much a custom PMIC) capable of monitoring battery levels, running charging/discharging firmware, etc.
- The PMIC interfaces with a fuel gauge IC that it uses to read battery information.
- The interfaces between the two controllers, the fuel gauge, and the LCD are all I2C.
Here is a rough system diagram:
Now what I’m trying to do is to come up with a good firmware architecture that will allow for expandability (adding multiple batteries, adding more sensors, changing the LCD (or other) interface from I2C to SPI, etc), and for testing (simulate button presses via UART, replace battery readings with simulated values to test PMIC charge firmware, etc).
What I would normally do is write a custom driver for each peripheral and a firmware module for each block. I would also implement a flagging module with a globally available get/set that would be used throughout the system. For example, my timers would set 100 Hz, 5 Hz, and 1 Hz flags, which the main loop would handle by calling the individual modules at their desired rates. The modules themselves could then set flags for the main loop to handle for events like I2C transaction complete, transaction timed out, temperature exceeded, etc.
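In code, roughly (the module names are placeholders):

#include <stdbool.h>

extern void button_module(void), lcd_module(void), battery_module(void);

static volatile bool tick_100hz, tick_5hz, tick_1hz;

void timer_isr(void)                      /* fires at 100 Hz */
{
    static unsigned count;
    tick_100hz = true;
    ++count;
    if (count % 20 == 0)                  /* 100 Hz / 20 = 5 Hz */
        tick_5hz = true;
    if (count >= 100) {                   /* 100 Hz / 100 = 1 Hz */
        tick_1hz = true;
        count = 0;
    }
}

int main(void)
{
    for (;;) {                            /* main loop drains the flags */
        if (tick_100hz) { tick_100hz = false; button_module(); }
        if (tick_5hz)   { tick_5hz   = false; lcd_module(); }
        if (tick_1hz)   { tick_1hz   = false; battery_module(); }
    }
}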
What I am hoping to get from this is a few suggestions on a better way to architect the system to achieve my goals of scalability, encapsulation and abstraction. It seems like what I’m doing is sort of a pseudo event-driven system but one that’s been hacked together.
In any case here’s my attempt at an architecture diagram:
The concept of an "event bus" is over-complicated. In many cases, the simplest approach is to minimize the number of things that need to happen asynchronously and instead have a "main poll" routine which runs on an "as often as convenient" basis and calls polling routines for each subsystem. It may be helpful to have such a routine in a compilation unit by itself, so that the essence of that file is simply a list of all polling functions used by other subsystems, rather than anything with semantics of its own. If one has a "get button push" routine, one can have a loop within that routine which calls the main poll routine until a button is pushed, there's a keyboard timeout, or something else happens that the caller needs to deal with. That would then allow the main UI to be implemented using code like:
void maybe_do_something_fun(void)
{
    while (1)
    {
        show_message("Do something fun?");
        wait_for_button();
        if (button_hit(YES_BUTTON))
        {
            /* ... do something fun ... */
            return;
        }
        else if (button_hit(NO_BUTTON))
        {
            /* ... do something boring ... */
            return;
        }
    }
}
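A minimal sketch of the supporting routines under that scheme, assuming each subsystem exposes its own poll function (all names illustrative):

/* The essence of this file: just a list of subsystem polls. */
extern void uart_poll(void), button_poll(void), battery_poll(void);

void main_poll(void)
{
    uart_poll();
    button_poll();
    battery_poll();
    /* ...one call per subsystem */
}

/* Placeholder event tests provided by the button subsystem. */
extern int button_event_pending(void), keyboard_timeout(void);

/* Spin on main_poll() until there is a button event, or a timeout,
   for the caller to deal with. */
void wait_for_button(void)
{
    while (!button_event_pending() && !keyboard_timeout())
        main_poll();
}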
This is often much more convenient than trying to have a giant state machine and say that if the code is the STATE_MAYBE_DO_SOMETHING_FUN state and the yes or no button is pushed, it will need to advance to the STATE_START_DOING_SOMETHING_FUN or STATE_START_DOING_SOMETHING_BORING state.
Note that if one uses this approach, one will need to ensure that the worst-case time between calls to main_poll always satisfies the timeliness requirements of the polling operations handled through main_poll. But in cases where that requirement can be met, this approach can be far more convenient and efficient than doing everything necessary to support preemptively scheduled multi-threaded code, along with the locks and other guards needed to make it work reliably.
Let's say I have to work on a resource-stingy platform: the processor my program will be running on is rather low-end (an MCU, an 80486, whatnot), and I try to implement this object-oriented, event-driven programming model.
The source of the events is, of course, interrupts. The processor is rather slow and I don't want to drop any interrupt event (BTW, quite often on MCUs, if I can't respond in time, I mask the relevant interrupt source and stop taking interrupts altogether). So what I'm going to do is have an interrupt event struct detailing everything, then build a FIFO array which contains not these events but references (pointers) to them. If I care, I can make this FIFO a two-way linked list, as long as I watch out for its size, or a memory leak may happen.
So every time an interrupt arrives, I "cram" all its information into the aforementioned event struct with a setter of some sort, push it onto the FIFO, and pop and deal with it later.
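A minimal sketch of that pattern: a static pool of event structs plus a ring of pointers, so nothing is allocated at runtime and no leak is possible (names are illustrative):

#include <stdint.h>
#include <stddef.h>

typedef struct {
    uint8_t  source;                 /* which interrupt raised this */
    uint32_t data;                   /* payload captured in the ISR */
} event_t;

#define FIFO_SIZE 32
static event_t  pool[FIFO_SIZE];     /* static storage, nothing malloc'd */
static event_t *fifo[FIFO_SIZE];     /* the FIFO of pointers, as described */
static volatile size_t head, tail;   /* head: ISR side, tail: main-loop side */

/* Called from an ISR: fill the next pool slot and push a pointer to it.
   With more than one interrupt source pushing, wrap this in a brief
   interrupt-disabled section. */
void event_push(uint8_t source, uint32_t data)
{
    size_t next = (head + 1) % FIFO_SIZE;
    if (next == tail)                /* full: drop (or count) the event */
        return;
    pool[head].source = source;
    pool[head].data   = data;
    fifo[head] = &pool[head];
    head = next;
}

/* Called from the main loop: oldest pending event, or NULL if none. */
event_t *event_pop(void)
{
    event_t *e;
    if (tail == head)
        return NULL;
    e = fifo[tail];
    tail = (tail + 1) % FIFO_SIZE;
    return e;
}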
I believe this is particularly useful when the interrupts are mainly communication-related, for example if I were to implement a low-power, standalone MQTT server.
My question is, is this something completely amateur, something only a "bad" programmer will do, or is it rather by-the-book?
For a project, I have to implement a simple OS, supporting some basic functions, and a virtual machine. This OS will run on the virtual machine, and the virtual machine runs like a normal program in Linux.
Suppose that now is the quantum in which the virtual machine is executed.
How is it possible to receive some extra timer signals in order to divide the virtual machine's execution time into smaller quanta?
How many timers are available in my CPU? (It's more of a general question.)
Can I handle timer signals inside the virtual machine with a user-level interrupt handler?
Any help or guidance would be very appreciated.
Thank you
I suggest you use exactly one interrupt, and organize your timers in either a queue (for few timers, e.g. < 50) or in a heap, which is a quite quick tree that at any time gives you access to the smallest element, i.e. the next timer to be handled.
Thus you have one interrupt, one handler, and many Timers with associated functions that will be called by that single handler.
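A sketch of the sorted-queue variant (a heap would expose the same interface); the hardware-specific reprogramming of the timer is left as a comment:

#include <stdint.h>
#include <stddef.h>

typedef struct sw_timer {
    uint32_t         expiry;          /* absolute tick at which to fire */
    void           (*fn)(void *arg);  /* function called by the handler */
    void            *arg;
    struct sw_timer *next;
} sw_timer;

static sw_timer *timer_list;          /* sorted: soonest expiry first */

void timer_add(sw_timer *t)           /* O(n) insert; fine for < 50 timers */
{
    sw_timer **p = &timer_list;
    while (*p && (*p)->expiry <= t->expiry)
        p = &(*p)->next;
    t->next = *p;
    *p = t;
}

void timer_interrupt(uint32_t now)    /* the single handler, given the tick */
{
    while (timer_list && timer_list->expiry <= now) {
        sw_timer *t = timer_list;
        timer_list = t->next;
        t->fn(t->arg);                /* run the associated function */
    }
    /* reprogram the hardware timer for timer_list->expiry (chip-specific) */
}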
In fact, a normal program also uses (system-level) interrupts, for example when it makes a system call.
At user level, you can use swapcontext/makecontext to simulate a system-level context switch, but when you want to get the time (to calculate the time difference), you have to use a syscall. So you'd better use the system timer directly; it's not a bad idea.
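For the quantum-splitting part of the question, a minimal POSIX sketch: ITIMER_VIRTUAL delivers SIGVTALRM only while the process executes in user mode, which makes it a reasonable "user-level timer interrupt" for the virtual machine:

#include <signal.h>
#include <sys/time.h>

static void quantum_handler(int sig)
{
    (void)sig;
    /* user-level "timer interrupt": e.g. swapcontext() to the next guest
       context (beware: only async-signal-safe work is strictly legal here) */
}

int main(void)
{
    struct sigaction sa;
    struct itimerval it;

    sa.sa_handler = quantum_handler;
    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGVTALRM, &sa, NULL);

    it.it_value.tv_sec     = 0;
    it.it_value.tv_usec    = 10000;        /* first tick after 10 ms */
    it.it_interval.tv_sec  = 0;
    it.it_interval.tv_usec = 10000;        /* then every 10 ms of CPU time */
    setitimer(ITIMER_VIRTUAL, &it, NULL);  /* counts user-mode time only */

    for (;;) {
        /* run the virtual machine; SIGVTALRM divides its time slice */
    }
}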
My system is simple enough that it runs without an OS; I simply use interrupt handlers the way I would use event listeners in a desktop program. In everything I read online, people try to spend as little time as they can in interrupt handlers and give control back to the tasks. But I don't have an OS or a real task system, and I can't really find design information on OS-less targets.
I have basically one interrupt handler that reads a chunk of data from the USB and writes the data to memory, and one interrupt handler that reads the data, sends it out on GPIO, and reschedules itself on a hardware timer.
What's wrong with using the interrupts the way I do, and using the NVIC (I use a Cortex-M3) to manage the work hierarchy?
First of all, in the context of this question, let's refer to the OS as a scheduler.
Now, unlike threads, interrupt service routines are "above" the scheduling scheme.
In other words, the scheduler has no "control" over them.
An ISR enters execution as a result of a HW interrupt, which sets the PC to a different address in the code-section (more precisely, to the interrupt-vector, where you "do a few things" before calling the ISR).
Hence, essentially, the priority of any ISR is higher than the priority of the thread with the highest priority.
So one obvious reason to spend as little time as possible in an ISR, is the "side effect" that ISRs have on the scheduling scheme that you design for your system.
Since your system is purely interrupt-driven (i.e., no scheduler and no threads), this is not an issue.
However, if nested ISRs are not allowed, then interrupts must be disabled from the moment an interrupt occurs and until the corresponding ISR has completed. In that case, if any interrupt occurs while an ISR is in execution, then your program will effectively ignore it.
So the longer you spend inside an ISR, the higher the chances are that you'll "miss out" on an interrupt.
In many desktop programs, events are sent to a queue, and there is some "event loop" that handles this queue. This event loop handles events one by one, so it is not possible for one event to interrupt another. It is also good practice in event-driven programming to keep all event handlers as short as possible, because they are not interruptible.
In bare-metal programming, interrupts are similar to events, but they are not sent to a queue:
- Execution of interrupt handlers is not sequential; a handler can be preempted by an interrupt with higher priority (numerically lower on the Cortex-M3).
- There is no queue of identical interrupts; e.g., you can't detect multiple GPIO interrupts while you are inside that interrupt's handler. This is the reason you should keep all routines as short as possible.
It is possible to implement queues yourself: feed them from interrupts and consume them in your super loop (disabling interrupts around each consume). With this approach you get sequential processing of interrupts. If you keep your handlers short, this is mostly not needed and you can do the work directly in the handlers.
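A sketch of the consuming side on a Cortex-M3, assuming some ISR-fed queue with an event_pop(); the CMSIS intrinsics disable interrupts just long enough to pop:

extern void *event_pop(void);          /* hypothetical ISR-fed queue */
extern void  handle_event(void *e);    /* hypothetical long processing */
/* __disable_irq()/__enable_irq() come from CMSIS via the device header */

void super_loop_step(void)
{
    void *e;

    __disable_irq();                   /* ISRs can't touch the queue now */
    e = event_pop();
    __enable_irq();                    /* keep the critical section tiny */

    if (e)
        handle_event(e);               /* slow work runs with interrupts on */
}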
It is also good practice in OS-based systems to use queues, semaphores, and "interrupt handler tasks" to handle interrupts.
With bare metal it is perfectly fine to design for application-bound or interrupt/event-bound operation, so long as you do your analysis. If you know what events/interrupts are coming at what rate, and you can ensure that you will handle all of them in the desired/designed amount of time, you can certainly take your time in the event/interrupt handler rather than being quick and sending a flag to the foreground task.
The common approach of course is to get in and out fast, saving just enough info to handle the thing in the foreground task. The foreground task has to spin its wheels of course looking for event flags, prioritizing, etc.
You could of course make it more complicated: when the interrupt/event comes, save state and return to the foreground handler in foreground mode rather than interrupt mode.
Now that is all general, but specific to the Cortex-M3, I don't think there are really modes like the big-brother ARMs have. So long as you take a real-time approach, make sure your handlers are deterministic, and do your system engineering to ensure that no situation arises where the events/interrupts stack up such that the response is non-deterministic, too late, too long, or loses data, it is okay.
What you have to ask yourself is whether all events can be serviced in time under all circumstances:
For example:
If your interrupt system were run-to-completion, will the servicing of one interrupt cause unacceptable delay in the servicing of another?
On the other hand, if the interrupt system is priority-based and preemptive, will the servicing of a high priority interrupt unacceptably delay a lower one?
In the latter case, you could use Rate Monotonic Analysis to assign priorities to assure the greatest responsiveness (the shortest-execution-time handlers get the highest priority). In the former, run-to-completion case, your system may lack a degree of determinism, and performance will vary under both event load and code changes.
One approach is to divide the handler into time-critical and non-critical sections: the time-critical code is done in the handler, then a flag is set to prompt the non-critical action to be performed in the "background" non-interrupt context, in a "big-loop" system that simply polls event flags or shared data for work to complete. Often all that is necessary in the interrupt handler is to copy some data or timestamp some event, making the data available for background processing without holding up the processing of new events.
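A sketch of that division, with hypothetical data-acquisition names: the handler grabs the data plus a timestamp and sets a flag; the big loop does the slow part:

#include <stdbool.h>
#include <stdint.h>

extern uint16_t adc_read(void);        /* hypothetical device read */
extern uint32_t tick_count(void);      /* hypothetical timestamp source */
extern void process_sample(uint16_t v, uint32_t t);  /* the slow part */

static volatile bool     sample_ready;
static volatile uint16_t sample;
static volatile uint32_t sample_time;

void adc_isr(void)                     /* time-critical section only */
{
    sample       = adc_read();         /* copy the data... */
    sample_time  = tick_count();       /* ...and timestamp the event */
    sample_ready = true;               /* prompt the background work */
}

void background_poll(void)             /* called from the big loop */
{
    if (sample_ready) {
        sample_ready = false;
        process_sample(sample, sample_time);   /* interrupts stay enabled */
    }
}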
For more sophisticated scheduling, there are a number of simple, low-cost or free RTOS schedulers that provide multi-tasking, synchronisation, IPC and timing services with very small footprints and can run on very low-end hardware. If you have a hardware timer and 10K of code space (sometimes less), you can deploy an RTOS.
Taking your described problem first:
As I interpret it, your goal is to create a device which, receiving commands over USB, drives some GPIO outputs, such as LEDs, relays, etc. For this simple task, your approach seems fine (if the USB layer can work with it adequately).
A prioritization problem exists, though: if you overload the USB side (with data from the other end of the cable), and the interrupt handling it has higher priority than the timer-triggered one handling the GPIO, the GPIO side may miss ticks (as others explained, interrupts can't queue up).
In your case this is about all that needs to be considered.
Some general guidance
For the "spend as little time in the interrupt handler as possible" the rationale is just what others told: an OS may realize a queue, etc., however hardware interrupts offer no such concepts. If the event causing the interrupt happens, the CPU enters your handler. Then until you handle it's source (such as reading a receive holding register in the case of a UART), you lose any further occurrences of that event. After this point, until exiting the handler, you may receive whether the event happened, but not how many times (if the event happened again while the CPU was still processing the handler, the associated interrupt line goes active again, so after you return from the handler, the CPU immediately re-enters it provided nothing higher priority is waiting).
Above I described the general concept observable on 8 bit processors and the AVR 32bit (I have experience with these).
When designing such low-level systems (no OS, one "background" task, and some interrupts), it is fundamental to understand what goes on at each priority level (if you utilize them). In general, you would give the most time-critical tasks the highest priority, taking the most care to serve those fast, while being more relaxed about the lower priority levels.
From another aspect, at design time you can usually plan how the system should react to missed interrupts, since where there are interrupts, missing one will eventually happen anyway. Critical data going across communication lines should carry adequate checksums, an especially critical timer should be sourced from a count register rather than from counting events, and the like.
Another nasty part of interrupts is their asynchronous nature. If you fail to design the related locks properly, they will eventually corrupt something, giving nightmares to the poor soul who has to debug it. The "spend as little time in the interrupt handler as possible" advice also encourages you to keep the interrupt code reasonably short, which means less code to consider for this problem as well. If you have worked with multitasking under an RTOS, you know this part (there are some differences, though: a higher-priority interrupt handler's code does not need protection against a lower-priority handler's).
If you can properly design your architecture around the necessary asynchronous tasks, getting by without an OS (that is, without multitasking) may even prove to be the nicer solution. It takes considerably more thought to design properly, but later there are far fewer locking-related problems. I went through some mid-sized safety-critical projects designed around a single background "task" with very few, small interrupt handlers, and the experience and maintenance demands regarding those (especially the tracing of bugs) were quite satisfactory compared to some other projects in the company built on multitasking concepts.
I wrote a kernel module and used dev_add_pack to get all the incoming packets.
According to the given filter rules, if a packet matches, I forward it to user space.
When I load this kernel module and send UDP traffic using SIPp, the ksoftirqd process appears and starts consuming CPU (I am checking this with the top command).
Is there any way to save CPU?
I guess you use the ETH_P_ALL type to register your packet_type structure with the protocol stack, and I think your packet_type->func is the bottleneck: either it consumes lots of CPU itself, or it breaks the existing protocol stack model and triggers other registered packet_type functions, which consume CPU. So the only way to save CPU is to optimize your packet_type->func. If your function is too complicated, you should consider splitting it into several parts: use the simple part as the packet_type->func, which runs in ksoftirqd context, and move the complicated parts to another kernel thread context (you can create a new thread in your kernel module if needed).
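A sketch of that split, untested and with the filtering helpers left hypothetical: the packet_type handler only enqueues, and a kernel thread does the heavy part:

#include <linux/netdevice.h>
#include <linux/skbuff.h>
#include <linux/kthread.h>
#include <linux/wait.h>

static struct sk_buff_head rx_queue;          /* spinlock-protected skb queue */
static struct task_struct *worker;
static DECLARE_WAIT_QUEUE_HEAD(rx_wait);

extern bool cheap_filter_match(struct sk_buff *skb);      /* hypothetical */
extern void slow_filter_and_forward(struct sk_buff *skb); /* hypothetical */

/* Runs in ksoftirqd context: keep it to a cheap check and an enqueue. */
static int fast_rcv(struct sk_buff *skb, struct net_device *dev,
                    struct packet_type *pt, struct net_device *orig_dev)
{
    if (!cheap_filter_match(skb)) {
        kfree_skb(skb);
        return 0;
    }
    skb_queue_tail(&rx_queue, skb);
    wake_up(&rx_wait);
    return 0;
}

/* Runs in its own kernel thread: the expensive matching and forwarding. */
static int worker_fn(void *unused)
{
    struct sk_buff *skb;

    while (!kthread_should_stop()) {
        wait_event_interruptible(rx_wait,
                !skb_queue_empty(&rx_queue) || kthread_should_stop());
        while ((skb = skb_dequeue(&rx_queue)) != NULL)
            slow_filter_and_forward(skb);
    }
    return 0;
}

/* In module init:
       skb_queue_head_init(&rx_queue);
       worker = kthread_run(worker_fn, NULL, "pkt_worker");
   and register fast_rcv as your packet_type->func via dev_add_pack(). */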