separate threads for data acquisition and plotting - c

I have a program written in C and running on Linux which acquires streaming data from a serial port device every 16 or so ms. This is a time critical piece of code that works fine. Another piece of code plots this data, also in real time, but its timely execution is less important to me than the data acquisition part. That is, I don't want to wait until all the plotting and drawing functions have finished before polling the serial port again. So I was thinking of having a separate thread do the plotting part of the application, or perhaps have the data acquisition part be the separate thread. I really have next to no experience when it comes to low-level programming, so could someone point me in the right direction? The pseudo-code with which I am working looks something like this:
int xyz; // global variable

int main() {
    do_some_preliminary_stuff();

    while (1) {
        poll_serial_port_and_fill_xyz_with_new_position_and_repeat();
    }

    while (1) {
        plot_xyz();
    }

    return 0;
}
Obviously as written, the code will be stuck in the first while loop, so yeah, threads?
Thanks.

Take care! Can your plotting routine keep up, on average, with the rate at which your data arrives on the serial port? If not, what should happen to xyz? Should un-plotted values be overwritten, or something else? If you can't keep up, this question needs to be answered first.
If you can keep up on average, then, since you say you have little experience with low-level (i.e. threaded) programming, you might consider using two processes connected by a shell pipe:
poll_for_serial_data | plot_data
The first process is your while loop, writing the polled data to stdout in some convenient format. The second process reads data from stdin and plots it. This achieves the same end as the multithreaded approach, but it is simpler and easier to write because the OS handles the synchronisation and protection problems for you. And on Linux it's pretty efficient.
If this isn't performant enough for you, it could still act as a model for a multithreaded version.
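As a rough illustration of the producer side of that pipeline, the acquisition program could write one sample per line to stdout; this is a minimal sketch, where read_sample() is a hypothetical stand-in for the real serial-port polling code:

#include <stdio.h>

/* Producer half of "poll_for_serial_data | plot_data".
   read_sample() is a hypothetical stand-in for the serial-port polling code;
   it is assumed to fill x, y, z and return 0 on success. */
int read_sample(int *x, int *y, int *z);

int main(void) {
    int x, y, z;
    while (read_sample(&x, &y, &z) == 0) {
        printf("%d %d %d\n", x, y, z);   /* one sample per line */
        fflush(stdout);                  /* push it down the pipe right away */
    }
    return 0;
}

The plotting process is then just a loop around something like scanf("%d %d %d", &x, &y, &z) followed by the drawing call.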

Yup, that is the way to go. Have a non-main thread be the data acquisition thread, which posts the buffered data to the main/UI thread; the main thread then consumes this data for plotting.
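A minimal sketch of that split, assuming hypothetical poll_serial_port_for_new_position() and plot_xyz(int) functions in place of the ones from the question, with the shared value protected by a mutex:

#include <pthread.h>

static int xyz;                                         /* latest position          */
static pthread_mutex_t xyz_lock = PTHREAD_MUTEX_INITIALIZER;

void do_some_preliminary_stuff(void);                   /* from the question        */
int  poll_serial_port_for_new_position(void);           /* hypothetical stand-in    */
void plot_xyz(int value);                               /* hypothetical stand-in    */

static void *acquire(void *arg) {
    (void)arg;
    for (;;) {                                          /* time-critical loop       */
        int sample = poll_serial_port_for_new_position();
        pthread_mutex_lock(&xyz_lock);
        xyz = sample;                                   /* publish the newest value */
        pthread_mutex_unlock(&xyz_lock);
    }
    return NULL;
}

int main(void) {
    pthread_t t;
    do_some_preliminary_stuff();
    pthread_create(&t, NULL, acquire, NULL);

    for (;;) {                                          /* plotting loop            */
        pthread_mutex_lock(&xyz_lock);
        int latest = xyz;                               /* snapshot under the lock  */
        pthread_mutex_unlock(&xyz_lock);
        plot_xyz(latest);                               /* slow drawing outside it  */
    }
    return 0;
}

In a real program you would probably queue samples rather than share a single value, so that nothing is lost when the plotter falls behind.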

Related

TCP Client/Server with Linux

This may be a very basic question/design but I am struggling with the correct method to handle the system I am going to define here.
I have a system with a single client (PC) that will connect to an embedded Linux board (Raspberry Pi) via TCP/IP. This will be a command/response system where the PC will ask for something and the Raspberry Pi will respond with the result.
For Example:
CMD => Read/Return ADC Channel X
RSP => ADC Channel X Data
For this type of system I have already defined a packet protocol that will allow for this interaction. My problem is how to handle this on the Raspberry Pi. I envision having a single thread handling the TCP connection: placing incoming data into a thread-safe queue and pulling outgoing data from a thread-safe queue. The main thread would then poll the queue periodically looking for data. When data is found, the command would be processed and a response generated. All commands have a response.
The main thread will also be doing other time critical tasks (PID control loop) so it cannot wait for incoming or outgoing data.
My guess is this type of system is fairly common and there is probably a good approach to implementing this type of system. I am very new to Linux programming but I have been programming highly embedded systems (No OS) forever. Just struggling with the correct approach for this type of design.
Note I chose TCP/IP because it handles retrying in case of failure. In my case every command has a response, so UDP could be used if it makes the design easier/more flexible.
Any help is greatly appreciated.
I tend to avoid threads if I can and only use them if I have to, because they make debugging the program harder: they turn a deterministic problem into a non-deterministic one. So my initial approach would be to see if I can do this without a thread and still achieve concurrency. This is possible using select, which will notify your main program when there is something on the socket that needs to be read. Then, when there is something on the socket, it can read the data, process it, and wait for the next event. The problem with this approach is that if the computation on the received data takes longer than the acceptable time to process the next element of data, you could end up with a backlog of unprocessed data on the socket. If this is going to happen, then you can go ahead and run the receive loop in one thread and the work function in another thread, or fork a new process and deal with a copy of the data from the new process.
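A minimal sketch of that select()-based, single-threaded structure, assuming listen_fd is a TCP socket already bound and listening, and handle_command() and run_pid_loop() are hypothetical functions standing in for the question's packet protocol and control loop:

#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

extern int listen_fd;          /* hypothetical: socket already bound and listening   */
void handle_command(int fd);   /* hypothetical: read one command, send its response  */
void run_pid_loop(void);       /* hypothetical: the time-critical control loop       */

void serve_without_threads(void) {
    int client_fd = -1;
    for (;;) {
        fd_set readfds;
        struct timeval tv = { 0, 1000 };   /* 1 ms timeout keeps the PID loop running */

        FD_ZERO(&readfds);
        FD_SET(listen_fd, &readfds);
        int maxfd = listen_fd;
        if (client_fd >= 0) {
            FD_SET(client_fd, &readfds);
            if (client_fd > maxfd)
                maxfd = client_fd;
        }

        if (select(maxfd + 1, &readfds, NULL, NULL, &tv) > 0) {
            if (FD_ISSET(listen_fd, &readfds))                /* the PC connected         */
                client_fd = accept(listen_fd, NULL, NULL);
            if (client_fd >= 0 && FD_ISSET(client_fd, &readfds))
                handle_command(client_fd);                    /* command in, response out */
        }

        run_pid_loop();                                       /* never starved for long   */
    }
}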
The ultra-classic Linux approach is to have a listener program that forks a new copy of itself for each new client. Linux even has a built-in daemon that does that for you (inetd, although that might have changed with all the systemd stuff). That's how sshd, telnetd and ftpd all work. No threads.

Dynamically detect when two processors are ready to communicate in MPI

I have an MPI program where each processor does the following:
Do expensive operation.
If I need to store anything remotely (could be on any other processor), queue a request for it in a buffer and continue.
If the buffer gets full, enter the comm phase.
In the comm phase, the processor with the full buffer should send away some of its buffered information, then return to "expensive operation". Of course, this can't happen until at least two processors have entered the comm phase and can execute MPI commands.
Currently I'm dealing with this by pausing until ALL processors enter the comm phase, then doing something like,
MPI_Allgather(Num_send_local,NTask,MPI_INT,Num_send_global,NTask,MPI_INT,MPI_COMM_WORLD);
where Num_send_local is an array of length NTask containing the number of things to send to each task (so Num_send_global is then NTask*NTask).
This works, but can often result in a lot of wasted resources as processors that could be communicating with one another sit around waiting until everyone is ready to send.
Really what I want to happen is for communication to happen as soon as two processors enter the comm phase, but I'm having trouble implementing it. I've tried the following:
//Tell everyone I'm in the comm phase now
for (i = 0; i < NTask; i++)
{
    if (Task == i)
        continue;
    MPI_Isend(&Num_send_local[i], 1, MPI_INT, i, 0, MPI_COMM_WORLD, &request[i]);
}
MPI_Recv(&local, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
remote_partner = status.MPI_SOURCE;
//Do stuff between Task and remote_partner...
But this runs into problems where remote_partner receives someone else's send request instead of Task's.
I'm sure there's a better way of doing this. Anyone have any ideas?
It all depends on the scale of your application and as always, measuring performance is the key (when you have at least some working version). You could try a master-slave approach, where one process handles distributing work to idle slave processes. Stackoverflow and the greater Internet have lots of resources on implementing a master-slave parallel program.
I think what you're looking for can be found in this rather lengthy example here of a master-slave model. When the work on a slave is finished, the slave sends a result back and the master knows to send another round of work to that slave.
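For reference, the skeleton of such a master-slave scheme looks roughly like the sketch below. It is only an illustration: do_expensive_operation() is a hypothetical stand-in for the real work, the work items are simplified to single integers, and it assumes there are at least as many items as workers.

#include <mpi.h>

int do_expensive_operation(int work_item);   /* hypothetical stand-in */

#define TAG_WORK 1
#define TAG_STOP 2

int main(int argc, char *argv[])
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {                          /* master: feed workers as they become idle */
        int next_item = 0, n_items = 100, result;
        MPI_Status st;
        for (int w = 1; w < size; w++) {      /* prime every worker with one item */
            MPI_Send(&next_item, 1, MPI_INT, w, TAG_WORK, MPI_COMM_WORLD);
            next_item++;
        }
        while (next_item < n_items) {         /* whoever answers first gets the next item */
            MPI_Recv(&result, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
            MPI_Send(&next_item, 1, MPI_INT, st.MPI_SOURCE, TAG_WORK, MPI_COMM_WORLD);
            next_item++;
        }
        for (int w = 1; w < size; w++) {      /* drain last results, then tell workers to stop */
            MPI_Recv(&result, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
            MPI_Send(&next_item, 1, MPI_INT, st.MPI_SOURCE, TAG_STOP, MPI_COMM_WORLD);
        }
    } else {                                  /* worker: compute until told to stop */
        int item, result;
        MPI_Status st;
        for (;;) {
            MPI_Recv(&item, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
            if (st.MPI_TAG == TAG_STOP)
                break;
            result = do_expensive_operation(item);
            MPI_Send(&result, 1, MPI_INT, 0, TAG_WORK, MPI_COMM_WORLD);
        }
    }
    MPI_Finalize();
    return 0;
}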

How to prevent linux soft lockup/unresponsiveness in C without sleep

How would be the correct way to prevent a soft lockup/unresponsiveness in a long running while loop in a C program?
(dmesg is reporting a soft lockup)
Pseudo code is like this:
while (worktodo) {
    worktodo = doWork();
}
My code is of course way more complex, and also includes a printf statement which gets executed once a second to report progress, but the problem is that the program stops responding to Ctrl+C at this point.
Things I've tried which do work (but I want an alternative):
doing printf every loop iteration (don't know why, but the program becomes responsive again that way (???)) - wastes a lot of performance due to unneeded printf calls (each doWork() call does not take very long)
using sleep/usleep/... - also seems like a waste of (processing-)time to me, as the whole program will already be running several hours at full speed
What I'm thinking about is some kind of process_waiting_events() function or the like. Normal signals seem to be working fine, since I can use kill from a different shell to stop the program.
Additional background info: I'm using GWAN and my code is running inside the main.c "maintenance script", which seems to be running in the main thread as far as I can tell.
Thank you very much.
P.S.: Yes I did check all other threads I found regarding soft lockups, but they all seem to ask about why soft lockups occur, while I know the why and want to have a way of preventing them.
P.P.S.: Optimizing the program (making it run shorter) is not really a solution, as I'm processing a 29GB bz2 file which extracts to about 400GB xml, at the speed of about 10-40MB per second on a single thread, so even at max speed I would be bound by I/O and still have it running for several hours.
While the posted answer using threads might possibly be an option, it would in reality just shift the problem to a different thread. My solution in the end was to use
sleep(0)
I also tested sched_yield / pthread_yield, neither of which really helped. Unfortunately I've been unable to find a good resource which documents sleep(0) on Linux, but for Windows the documentation states that using a value of 0 lets the thread yield its remaining part of the current CPU slice.
It turns out that sleep(0) most probably relies on what is called timer slack in Linux - an article about this can be found here: http://lwn.net/Articles/463357/
Another possibility is using nanosleep(&(struct timespec){0}, NULL), which does not seem to necessarily rely on timer slack: the Linux man pages for nanosleep state that if the requested interval is below clock granularity, it will be rounded up to clock granularity, which on Linux depends on CLOCK_MONOTONIC according to the man pages. Thus, a value of 0 nanoseconds is perfectly valid and should always work, as clock granularity can never be 0.
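In the loop from the question that would look something like this (a minimal sketch; doWork() is the work function from the question):

#include <time.h>

int doWork(void);   /* the long-running work function from the question */

void run(void)
{
    int worktodo = 1;
    while (worktodo) {
        worktodo = doWork();
        /* zero-length sleep: a scheduling point on every iteration keeps the
           process responsive to signals such as Ctrl+C */
        nanosleep(&(struct timespec){0}, NULL);
    }
}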
Hope this helps someone else as well ;)
Your scenario is not really a soft lockup; it is a process that is busy doing something.
How about this pseudo code:
void workerThread()
{
    while (workToDo)
    {
        if (threadSignalled)
            break;
        workToDo = DoWork();
    }
}

void sighandler()
{
    signal worker thread to finish;
    waitForWorkerThreadFinished;
}

void main()
{
    InstallSignalHandler;
    CreateSemaphore;
    StartThread;
    waitForWorkerThreadFinished;
}
Clearly a timing issue. Using a signalling mechanism should remove the problem.
The use of printf solves the problem because printf accesses the console, which is an expensive and time-consuming operation that in your case gives enough time for the worker to complete its work.
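A more concrete sketch of that signalling structure, using a flag set from the signal handler and a pthread (DoWork() again stands in for the question's work function):

#include <pthread.h>
#include <signal.h>

int DoWork(void);   /* stands in for the question's work function */

static volatile sig_atomic_t threadSignalled = 0;

static void sighandler(int signum)
{
    (void)signum;
    threadSignalled = 1;            /* ask the worker to stop at its next iteration */
}

static void *workerThread(void *arg)
{
    (void)arg;
    int workToDo = 1;
    while (workToDo && !threadSignalled)
        workToDo = DoWork();
    return NULL;
}

int main(void)
{
    pthread_t worker;
    signal(SIGINT, sighandler);                     /* Ctrl+C now only sets the flag */
    pthread_create(&worker, NULL, workerThread, NULL);
    pthread_join(worker, NULL);                     /* wait for the worker to finish */
    return 0;
}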

Overlapping communications with computations in MPI (mvapich2) for large messages

I have a very simple code, a data decomposition problem in which, in a loop, each process sends two large messages to the ranks before and after itself at each cycle. I run this code in a cluster of SMP nodes (AMD Magny-Cours, 32 cores per node, 8 cores per socket). I have been optimizing this code for a while. I have used pgprof and TAU for profiling, and it looks to me that the bottleneck is the communication. I have tried to overlap the communication with the computations in my code, however it looks like the actual communication only starts when the computations finish :(
I use persistent communication in ready mode (MPI_Rsend_init), and between the MPI_Startall and MPI_Waitall calls the bulk of the computation is done. The code looks like this:
void main(int argc, char *argv[])
{
    some definitions;
    some initializations;

    MPI_Init(&argc, &argv);
    MPI_Rsend_init( channel to the rank before );
    MPI_Rsend_init( channel to the rank after );
    MPI_Recv_init( channel to the rank before );
    MPI_Recv_init( channel to the rank after );

    for (timestep = 0; timestep < Time; timestep++)
    {
        prepare data for send;
        MPI_Startall();
        do computations;
        MPI_Waitall();
        do work on the received data;
    }

    MPI_Finalize();
}
Unfortunately the actual data transfer does not start until the computations are done; I don't understand why. The network uses QDR InfiniBand interconnect and MVAPICH2. Each message is 23 MB (46 MB is sent in total). I tried to change the message passing to eager mode, since the memory in the system is large enough. I use the following flags in my job script:
MV2_SMP_EAGERSIZE=46M
MV2_CPU_BINDING_LEVEL=socket
MV2_CPU_BINDING_POLICY=bunch
This gives me an improvement of about 8%, probably because of better placement of the ranks inside the SMP nodes, but the problem with communication remains. My question is: why can't I effectively overlap the communications with the computations? Is there a flag that I should use and am missing? I know something is wrong, but whatever I have done has not been enough.
Because of the order of the ranks inside the SMP nodes, the actual message size between the nodes is also 46 MB (2x23 MB), and the ranks are arranged in a loop. Can you please help me? To see the flags that other users use I have checked /etc/mvapich2.conf, however it is empty.
Is there any other method that I should use? Do you think one-sided communication would give better performance? I feel there is a flag or something that I'm not aware of.
Thanks a lot.
There is something called progression of operations in MPI. The standard allows non-blocking operations to be progressed to completion only once the proper testing/waiting call has been made:
A nonblocking send start call initiates the send operation, but does not complete it. The send start call can return before the message was copied out of the send buffer. A separate send complete call is needed to complete the communication, i.e., to verify that the data has been copied out of the send buffer. With suitable hardware, the transfer of data out of the sender memory may proceed concurrently with computations done at the sender after the send was initiated and before it completed. Similarly, a nonblocking receive start call initiates the receive operation, but does not complete it. The call can return before a message is stored into the receive buffer. A separate receive complete call is needed to complete the receive operation and verify that the data has been received into the receive buffer. With suitable hardware, the transfer of data into the receiver memory may proceed concurrently with computations done after the receive was initiated and before it completed.
(words in bold are also bolded in the standard text; emphasis added by me)
Although this text comes from the section about non-blocking communication (§3.7 of MPI-3.0; the text is exactly the same in MPI-2.2), it also applies to persistent communication requests.
I haven't used MVAPICH2, but I am able to speak about how things are implemented in Open MPI. Whenever a non-blocking operation is initiated or a persistent communication request is started, the operation is added to a queue of pending operations and is then progressed in one of the two possible ways:
if Open MPI was compiled without an asynchronous progression thread, outstanding operations are progressed on each call to a send/receive or to some of the wait/test operations;
if Open MPI was compiled with an asynchronous progression thread, operations are progressed in the background even if no further communication calls are made.
The default behaviour is not to enable the asynchronous progression thread as doing so increases the latency of the operations somehow.
The MVAPICH site is unreachable at the moment from here, but earlier I saw a mention of asynchronous progress in the features list. That is probably where you should start: search for ways to enable it.
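If asynchronous progression is not available, a common workaround is to drive progression yourself by calling a test routine from time to time inside the computation. This is only a sketch, assuming the four persistent requests from the question are kept in an array and the computation can be split into chunks via a hypothetical compute_chunk():

#include <mpi.h>

void compute_chunk(int i);   /* hypothetical slice of "do computations" */

void timestep_with_manual_progress(MPI_Request requests[4], int nchunks)
{
    MPI_Startall(4, requests);                      /* start all four transfers       */
    for (int i = 0; i < nchunks; i++) {
        compute_chunk(i);                           /* do a slice of the work         */
        int flag;
        MPI_Testall(4, requests, &flag, MPI_STATUSES_IGNORE);   /* let MPI move data  */
    }
    MPI_Waitall(4, requests, MPI_STATUSES_IGNORE);  /* make sure everything completed */
}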
Also note that MV2_SMP_EAGERSIZE controls the shared memory protocol eager message size and does not affect the InfiniBand protocol, i.e. it can only improve the communication between processes that reside on the same cluster node.
By the way, there is no guarantee that the receive operations will be started before the matching ready-mode send operations in the neighbouring ranks, so the ready sends might not function as expected; the ordering in time is very important there, because a ready send is only correct if the matching receive has already been posted.
For MPICH, you can set the MPICH_ASYNC_PROGRESS=1 environment variable when running mpiexec/mpirun. This will spawn a background thread which does the "asynchronous progress" work.
MPICH_ASYNC_PROGRESS - Initiates a spare thread to provide asynchronous progress. This improves progress semantics for all MPI operations including point-to-point, collective, one-sided operations and I/O. Setting this variable would increase the thread-safety level to MPI_THREAD_MULTIPLE. While this improves the progress semantics, it might cause a small amount of performance overhead for regular MPI operations.
from MPICH Environment Variables
I have tested this on my cluster with MPICH-3.1.4, and it worked! I believe MVAPICH will also work.

Reading from the serial port in a multi-threaded program on Linux

I'm writing a program on Linux to interface, through serial, with a piece of hardware. The device sends packets of approximately 30-40 bytes at about 10 Hz. This software module will interface with others and communicate via IPC, so it must perform a specific IPC sleep to allow it to receive the messages that it's subscribed to when it isn't doing anything useful.
Currently my code looks something like:
while (1) {
    IPC_sleep(some_time);
    read_serial();
    process_serial_data();
}
The problem with this is that sometimes the read is performed while only a fraction of the next packet is available at the serial port, which means that it isn't all read until the next time around the loop. For this specific application it is preferable that the data is read as soon as it's available, and that the program doesn't block while reading.
What's the best solution to this problem?
The best solution is not to sleep! What I mean is that a good solution is probably to mix the IPC event and the serial event. select is a good tool for doing this. Then you have to find an IPC mechanism that is select-compatible.
socket-based IPC is select()able
pipe-based IPC is select()able
POSIX message queues are also select()able
And then your loop looks like this
while (1) {
    select(serial_fd | ipc_fd); // of course this is pseudo code
    if (FD_ISSET(serial_fd, &fd_set)) {
        parse_serial(serial_fd, serial_context);
        if (complete_serial_message)
            process_serial_data(serial_context);
    }
    if (FD_ISSET(ipc_fd, &fd_set)) {
        do_ipc();
    }
}
read_serial is replaced with parse_serial, because if you spend all your time waiting for a complete serial packet, then all the benefit of the select is lost. But from your question, it seems you are already doing that, since you mention getting serial data across two different loop iterations.
With the proposed architecture you have good reactivity on both the IPC and the serial side. You read serial data as soon as it is available, yet without blocking the processing of IPC.
Of course this assumes you can change the IPC mechanism. If you can't, perhaps you can make a "bridge process" that interfaces on one side with whatever IPC you are stuck with, and on the other side uses a select()able IPC to communicate with your serial code.
Store what you have received of the message so far in a buffer of some sort.
If you don't want to block while waiting for new data, use something like select() on the serial port to check that more data is available. If not, you can continue doing some processing or whatever needs to be done instead of blocking until there is data to fetch.
When the rest of the data arrives, add to the buffer and check if there is enough to comprise a complete message. If there is, process it and remove it from the buffer.
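A sketch of that buffering, assuming a hypothetical fixed packet size PACKET_LEN, a hypothetical process_packet() handler, and a serial fd that has been opened in non-blocking mode:

#include <string.h>
#include <unistd.h>

#define PACKET_LEN 36                 /* hypothetical fixed packet size */

void process_packet(const unsigned char *pkt);   /* hypothetical handler */

static unsigned char buf[2 * PACKET_LEN];
static size_t buffered = 0;

/* Read whatever is currently available and hand complete packets to the handler. */
void drain_serial(int serial_fd)
{
    ssize_t n = read(serial_fd, buf + buffered, sizeof(buf) - buffered);
    if (n > 0)
        buffered += (size_t)n;

    while (buffered >= PACKET_LEN) {              /* one or more complete packets buffered */
        process_packet(buf);
        memmove(buf, buf + PACKET_LEN, buffered - PACKET_LEN);
        buffered -= PACKET_LEN;
    }
}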
You must cache enough of a message to know whether or not it is complete, or whether it will ever become a complete, valid message.
If it is not valid or won't be in an acceptable timeframe, then you toss it. Otherwise, you keep it and process it.
This is typically called implementing a parser for the device's protocol.
This is the algorithm (blocking) that is needed:
while (!complete_packet(p) && time_taken < timeout)
{
    p += reading_device.read(); // only blocks for t << 1 sec
    time_taken.update();
}
// now you have a complete packet or a timeout.
You can intersperse a callback if you like, or inject relevant portions in your processing loops.
