Suggestion for a callback-oriented library in C

I'm making a small library for controlling various embedded devices using the C language. I'm using UDP sockets to communicate with each of the devices. The devices send me various interesting data, alarms and notifications, and at the same time they send some data that is used internally by the library but may not be interesting to users. So I've implemented a callback approach, where the user can register a callback function for the events they care about on each of the devices. Right now, the overall design of this library is something like this:
I have two threads running.
In one of the threads, there is an infinite while event loop that uses select and non-blocking sockets to maintain the communication with each of the devices.
Basically, every time I receive a packet from any of the devices, I strip off the header, which is 20 bytes of useless information, and add my own header containing DEVICE_ID, REQUEST_TIME (the time the request was sent to retrieve that packet), RETRIEVAL_TIME (the time the packet actually arrived), REQUEST_ID and REQUEST_TYPE (alarm, data, notification, etc.).
This thread (the one with the infinite loop) then puts the packet with the new header into a ring buffer and notifies the other thread (thread #2) to parse this information.
In thread #2, when the notification is received, it locks the buffer, pops the packet and starts parsing it.
Every message contains some information that the user may not be interested in, so I'm providing a callback approach that lets the user act only on the data that is useful to them.
Basically, I'm doing something like this in thread #2:
THREAD #2
wait(data_put_in_buffer_cond)
lock(buffer_mutex)
packet_t* packet = pop_packet_from_buffer(buf);
unlock(buffer_mutex)

/* parse the packet... */
parsed_packet_t* parsed_packet = parse_and_change_endianess(packet->data);

/* the header was put by thread #1 in host byte order, so no parsing necessary */
header_t* header = get_header(packet);

foreach attribute in parsed_packet->attribute_list {
    register_info_t* rinfo = USER_REGISTRED_EVENT_TABLE[header->device_id][attribute.attr_id];
    /* the user registered a callback for this attribute ID on this device ID */
    if (rinfo != NULL) {
        rinfo->callback(packet);
    }
    // Do some other stuff with this attribute...
}

/* thread #1 sets a free callback for the kind of packet it puts in the buffer.
 * This is not a critical section of the buffer, so it is fine without locks.
 */
buffer.free_callback(packet);
free(parsed_packet);
Now, my concern is: what will happen if the callback function that the user implements takes some time to complete, and meanwhile I drop some packets because the ring buffer is in overwriting mode? I've tested my API with 3 to 4 devices and I don't see many drop events even if the callback function waits a decent amount of time. Still, I'm speculating that this approach may not be the best.
Would it be a better design if I used some sort of thread pool to run the user callback functions? In that case I would need to make an explicit copy of the packet before I send it to the user callback. Each packet is about 500 to 700 bytes, and I get around 2 packets per second from each device. Any suggestions or comments on improving the current design or solving this issue would be appreciated.

Getting 500-700 bytes per packet per device is not a problem at all, especially if you only have 3-4 devices. Even if you had, let's say, 100 devices, it should not be a problem. The copy overhead would most probably be negligible. So my suggestion would be: do not try to optimize beforehand; only do so once you are certain that buffer copying is your bottleneck.
About losing packets: as you say in your question, you are already using a ring buffer (I assume that is something like a circular queue, right?). If the queue becomes full, then you just need to make thread #1 wait until there is some available space in the queue. Clearly, more events from different devices may arrive in the meantime, but that should not be a problem. Once you have space again, select will let you know that you have data available from the different devices, so you will just need to process all that data. Of course, in order to have a balanced system, you can set the size of the queue to a value that reduces as much as possible the number of times the queue is full and thread #1 needs to wait.
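For illustration, a minimal sketch of such a bounded queue with POSIX threads could look like this; packet_queue_t, queue_push() and queue_pop() and the capacity are made-up names, not part of your library:

#include <stddef.h>
#include <pthread.h>

#define QUEUE_CAPACITY 64

typedef struct {
    void           *slots[QUEUE_CAPACITY];
    size_t          head, tail, count;
    pthread_mutex_t lock;       /* initialize with pthread_mutex_init() */
    pthread_cond_t  not_full;   /* signaled by the consumer after a pop  */
    pthread_cond_t  not_empty;  /* signaled by the producer after a push */
} packet_queue_t;

/* Called from thread #1: waits instead of overwriting when the queue is full. */
void queue_push(packet_queue_t *q, void *pkt)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == QUEUE_CAPACITY)
        pthread_cond_wait(&q->not_full, &q->lock);
    q->slots[q->tail] = pkt;
    q->tail = (q->tail + 1) % QUEUE_CAPACITY;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Called from thread #2: waits until a packet is available. */
void *queue_pop(packet_queue_t *q)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
        pthread_cond_wait(&q->not_empty, &q->lock);
    void *pkt = q->slots[q->head];
    q->head = (q->head + 1) % QUEUE_CAPACITY;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->lock);
    return pkt;
}

The condition variables replace your "overwriting mode": the producer blocks briefly instead of destroying data, and at 2 packets per second per device the wait should essentially never trigger unless a user callback stalls for a long time.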

Related

Sending large amount of data from ISR using queues in RTOS

I am working on an STM32F401 MCU for audio acquisition and I am trying to send the audio data (384 bytes exactly) from an ISR to a task using queues. The frequency of the ISR is too high and hence I believe some data is dropped due to the queue being full. The audio recorded from running the code is noisy. Is there any easier way to send large amounts of data from an ISR to a task?
The RTOS used is FreeRTOS and the ISR is the DMA callback from the I2S mic peripheral.
The general approach in these cases is:
Down-sample the raw data received in the ISR (e.g., save only 1 out of 4 samples)
Accumulate a certain number of samples before sending them in a message to the task
You can implement a "zero copy" queue by creating a queue of pointers to memory blocks rather than copying the memory itself. Have the audio data written directly to a block (by DMA, for example); then, when it is full, enqueue a pointer to the block and switch to the next available block in the pool. The receiving task can then operate directly on the memory block without needing to copy the data either into or out of the queue - the only thing copied is the pointer.
The receiving task, when done, returns the block to the pool. The pool should have the same number of blocks as the queue length.
To create a memory pool you would start with a static array:
tAudioSample block[QUEUE_LENGTH][BLOCK_SIZE] ;
Then fill a block_pool queue with pointers to each block element - pseudocode:
for( int i = 0; i < QUEUE_LENGTH; i++ )
{
    queue_send( block_pool, block[i] ) ;
}
Then to get an "available" block, you simply take a pointer from the queue, fill it, and then send it to your audio stream queue; the receiver, when finished with the block, posts the pointer back to the block_pool.
Some RTOS provide a fixed block allocator that does exactly what I described above with the block_pool queue. If you are using the CMSIS RTOS API rather than native FreeRTOS API, that provides a memory pool API.
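A minimal sketch of that block-pool pattern with the native FreeRTOS queue API might look like the following; the queue names, the audio_block_ready_from_isr()/audio_task() functions are illustrative, and tAudioSample is simplified to uint8_t here:

#include <stdint.h>
#include "FreeRTOS.h"
#include "queue.h"

#define QUEUE_LENGTH 4
#define BLOCK_SIZE   384            /* bytes of audio per block, as in the question */

static uint8_t block[QUEUE_LENGTH][BLOCK_SIZE];
static QueueHandle_t block_pool;    /* queue of pointers to free blocks   */
static QueueHandle_t audio_stream;  /* queue of pointers to filled blocks */

void pool_init(void)
{
    /* Both queues carry pointers only - the payload is never copied. */
    block_pool   = xQueueCreate(QUEUE_LENGTH, sizeof(uint8_t *));
    audio_stream = xQueueCreate(QUEUE_LENGTH, sizeof(uint8_t *));
    for (int i = 0; i < QUEUE_LENGTH; i++) {
        uint8_t *p = block[i];
        xQueueSend(block_pool, &p, 0);
    }
}

/* Called from the DMA/I2S callback (ISR context) with the block that was just
 * filled: hand it to the task and let the DMA continue into the next free block. */
void audio_block_ready_from_isr(uint8_t *filled)
{
    BaseType_t woken = pdFALSE;
    xQueueSendFromISR(audio_stream, &filled, &woken);
    /* ...fetch the next free block with xQueueReceiveFromISR(block_pool, ...)
       and point the DMA at it here... */
    portYIELD_FROM_ISR(woken);
}

/* Receiving task: work directly on the block, then return it to the pool. */
void audio_task(void *arg)
{
    (void)arg;
    uint8_t *p;
    for (;;) {
        if (xQueueReceive(audio_stream, &p, portMAX_DELAY) == pdTRUE) {
            /* ...process BLOCK_SIZE bytes at p... */
            xQueueSend(block_pool, &p, 0);
        }
    }
}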
However, it sounds like this may be an X-Y problem - you have presented your diagnosis, which may or may not be correct, and decided on a solution which you are then asking for help with. But what if it is the wrong or not the optimal solution? Better to include some code showing how the data is generated and consumed, and provide concrete information such as where this data is coming from, how often the ISR is generated, sample rates, the platform it is running on, the priority and scheduling of the receiving task, and what other tasks are running that might delay it.
On most platforms 384 bytes is not a large amount of data, and the interrupt rate would have to be extraordinarily high, or the receiving task excessively delayed (i.e. not real-time) or doing excessive or non-deterministic work, to cause this problem. It may not be the ISR frequency that is the problem, but rather the performance and schedulability of the receiving task.
It is not clear whether your 384 bytes result in a single interrupt, or in 384 interrupts, or something else entirely.
That is to say that it may be a more holistic design issue rather than simply how to pass data more efficiently - though that can't be a bad thing.
If the thread receiving the data is called at periodic intervals, the queue should be sized sufficiently to hold all data that may be received in that interval. It would probably be a good idea to make sure the queue is large enough to hold data for at least two intervals.
If the thread receiving the data is simply unable to keep up with the incoming data, then one could consider increasing its priority.
There is some overhead processing associated with each push to and pull from the queue, since FreeRTOS will check to determine whether a higher priority task should wake up in response to the action. When writing or reading multiple items to or from the queue at the same time, it may help to suspend the scheduler while the transfer is taking place.
Another solution would be to implement a circular buffer and place it into shared memory. This will basically perform the same function as a queue, but without the extra overhead. You may need to use a mutex to block simultaneous access to the buffer, depending on how the circular buffer is implemented.
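For the common single-writer (ISR) / single-reader (task) case, a minimal circular-buffer sketch along those lines could look like this; the size and names are illustrative. On a single-core target with exactly one producer and one consumer the volatile indices are typically enough, whereas a multi-core or reordering target would need memory barriers or the mutex mentioned above:

#include <stdint.h>

#define RING_SIZE 1024                 /* must be a power of two */

typedef struct {
    uint8_t           data[RING_SIZE];
    volatile uint32_t head;            /* written only by the ISR  */
    volatile uint32_t tail;            /* written only by the task */
} ring_t;

/* Producer side (ISR): returns 0 if the buffer is full. */
static int ring_put(ring_t *r, uint8_t byte)
{
    uint32_t next = (r->head + 1) & (RING_SIZE - 1);
    if (next == r->tail)
        return 0;                      /* full - caller decides whether to drop */
    r->data[r->head] = byte;
    r->head = next;
    return 1;
}

/* Consumer side (task): returns 0 if the buffer is empty. */
static int ring_get(ring_t *r, uint8_t *byte)
{
    if (r->tail == r->head)
        return 0;                      /* empty */
    *byte = r->data[r->tail];
    r->tail = (r->tail + 1) & (RING_SIZE - 1);
    return 1;
}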

Lost in libpcap - how to use setnonblock() / should I use pcap_dispatch() or pcap_next_ex() for realtime?

I'm building a network sniffer that will run on a pfSense box for monitoring an IPsec VPN. I'm compiling under FreeBSD 8.4.
I've chosen to use libpcap in C for the packet capture engine and Redis as the storage system.
There will be hundreds of packets per second to handle, and the system will run all day long.
The goal is to have a webpage showing graphs of network activity, with updates every minute, or every couple of seconds if that's possible.
When my sniffer captures a packet, it'll determine its size, who sent it (a geographical site, in our VPN context), to whom, and when. That information then needs to be stored in the database.
I've done a lot of research, but I'm a little lost with libpcap, specifically with the way I should capture packets.
1) What function should I use to retrieve packets? pcap_loop? pcap_dispatch? Or pcap_next_ex? According to what I read online, loop and dispatch block execution, so pcap_next_ex seems to be the solution.
2) When are we supposed to use pcap_setnonblock? I mean, with which capture function? pcap_loop? So if I use pcap_loop, the execution won't be blocked?
3) Is multi-threading the way to achieve this? One thread running all the time, capturing packets, analyzing them and storing some data in an array, and a second thread firing every minute to empty this array?
The more I think about it, the more I get lost, so please excuse me if I'm unclear and don't hesitate to ask me for precisions.
Any help is welcome.
Edit :
I'm currently trying to implement a worker pool, with the callback function only putting a new job in the job queue. Any help is still welcome. I will post more details later.
1) What function should I use to retrieve packets? pcap_loop? pcap_dispatch? Or pcap_next_ex? According to what I read online, loop and dispatch block execution, so pcap_next_ex seems to be the solution.
All of those functions block waiting for packets if there's no packet ready and you haven't put the pcap_t into non-blocking mode.
pcap_loop() contains a loop that runs indefinitely (or until pcap_breakloop() is called, an error occurs, or, if a count was specified, the count runs out). Thus, it may wait more than one time.
pcap_dispatch() blocks waiting for a batch of packets to arrive if no packets are available, loops processing the batch, and then returns, so it waits at most one time.
pcap_next() and pcap_next_ex() wait for a packet to be available and then return it.
2) When are we supposed to use pcap_setnonblock? I mean, with which capture function? pcap_loop? So if I use pcap_loop, the execution won't be blocked?
No, it won't be, but that means that a call to pcap_loop() might return without processing any packets; you'll have to call it again to process any packets that arrive later. Non-blocking mode with pcap_loop() isn't really useful; you might as well use pcap_dispatch() or pcap_next_ex().
3) Is multi-threading the way to achieve this? One thread running all the time, capturing packets, analyzing them and storing some data in an array, and a second thread firing every minute to empty this array?
(The array would presumably be a queue, with the first thread appending packet data to the end of the queue and the second thread removing packet data from the head of the queue and putting it into the database.)
That would be one possibility, although I'm not sure whether it should be timer-based. Given that many packet capture mechanisms - including BPF, as used by *BSD and OS X - deliver packets in batches, perhaps you should have one loop that does something like
for (;;) {
    pcap_dispatch(p, -1, callback);
    wake up dumper thread;
}
(that's simplified - you might want to check for errors in the return value from pcap_dispatch()).
The callback would add the packet handed to it to the end of the queue.
The dumper thread would pull packets from the head of the queue and add them to the database.
This would not require that the pcap_t be non-blocking. On a single-CPU-core machine, it would rely on the threads being time-sliced by the scheduler.
I'm currently trying to implement a worker pool, with the callback function only putting a new job in the job queue.
Be aware that, once the callback function returns, neither the const struct pcap_pkthdr structure pointed to by its second argument nor the raw packet data pointed to by its third argument will be valid, so if you want to process them in the job, you will have to make copies of them - including a copy of all the packet data you will be processing in that job. You might want to consider doing some processing in the callback routine, even if it's only figuring out what packet data needs to be saved (e.g., IP source and destination address) and copying it, as that might be cheaper than copying the entire packet.
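As an illustration, a callback along those lines might copy just the pcap_pkthdr and the captured bytes into a job of its own before queuing it; struct job and enqueue_job() are hypothetical names for your worker-pool hand-off, not part of libpcap:

#include <pcap.h>
#include <stdlib.h>
#include <string.h>

/* A job structure of our own invention - copy out only what the worker needs,
 * since h and bytes are not valid once the callback returns. */
struct job {
    struct pcap_pkthdr hdr;   /* timestamp and lengths, copied by value */
    u_char *data;             /* private copy of the captured bytes     */
};

void enqueue_job(struct job *j);  /* hypothetical: hand off to the pool */

static void capture_callback(u_char *user, const struct pcap_pkthdr *h,
                             const u_char *bytes)
{
    (void)user;
    struct job *j = malloc(sizeof(*j));
    if (j == NULL)
        return;                       /* or count the drop */
    j->hdr  = *h;                     /* copy the header by value */
    j->data = malloc(h->caplen);
    if (j->data == NULL) {
        free(j);
        return;
    }
    memcpy(j->data, bytes, h->caplen);
    enqueue_job(j);
}

If all you need is source, destination and size, copying only those fields in the callback (as suggested above) is cheaper than duplicating the whole packet.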

One Socket Multiple Threads

I'm coding part of a somewhat complex communication protocol to control multiple medical devices from a single computer terminal. The computer terminal needs to manage about 20 such devices. Every device uses the same communication protocol, called DEP. Now, I've created a loop that multiplexes between the different devices to send requests and receive the patient data associated with a particular device. The structure of this loop, in general, is something like this:
Begin Loop
    Select Device i
        if Device.Socket has Data
            Strip Header
            Copy Data to Queue
        end if
        rem_time = TIMEOUT - (CurrentTime - Device.Session.LastRequestTime)
        if rem_time <= 0
            Send Re-association Request to Device
        else
            Sort Pending Requests According to Time
            Select First Request
            Send the Request
            Set Request Priority to Least
        end if
    end Select
end Loop
I might have made some mistakes in the above pseudo-code, but I hope I've made myself clear about what this loop is trying to do. I have a priority-list structure that selects the device and the pending request for that device, so that all requests and devices are serviced at good, near-optimal intervals.
I forgot to mention: the above loop does not actually parse the received data; it only strips off the header and puts the data in a queue. The data in the queue is parsed in a different thread and recorded in a file or database.
I wish to add a feature so that other computers may also import the data and control the devices attached to the computer terminal remotely. For this, I would need to create a socket that listens for commands inside this infinite loop and sends the data to the other thread where the parsing is performed.
Now, my question to all the concurrency experts is:
Is it a good design to use a single socket for reading and writing in two different threads, where each thread is strictly involved in either reading or writing, not both? Also, I believe a socket is synchronized at the process level, so do I need locks to synchronize reads and writes over one socket from different threads?
There is nothing inherently wrong with having multiple threads handle a single socket; however, there are many good and bad designs based around this one very general idea. If you do not want to rediscover the problems as you code your application, I suggest you search around for designs that best fit your planned particular style of packet handling.
There is also nothing inherently wrong with having a single thread handle a single socket; however, if you put the logic handling on that thread, then you have selected a bad design, as that thread cannot handle requests while it is "working" on the last request.
In your particular code, you might have an issue. If your packets support fragmentation, or even if your algorithm gets a little ahead of the hardware due to timing issues, you might have just part of the packet "received" in the buffer. In that case, your algorithm will fail in two ways.
It will process a partial packet, one which has only the first part of its data.
It will mis-process the subsequent packet, as the information in the buffer will not start with a valid packet header.
Such failures are difficult to conceive and diagnose until they are encountered. Perhaps your library already buffers and splits messages, perhaps not.
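To illustrate the buffering and splitting being described, here is a minimal sketch that accumulates received bytes and only hands off complete messages; the fixed header size, msg_total_len() and handle_message() are hypothetical stand-ins, not part of the DEP protocol:

#include <stdint.h>
#include <string.h>

#define HEADER_LEN 16            /* hypothetical fixed header size */

static uint8_t acc[4096];        /* accumulation buffer            */
static size_t  acc_len = 0;

/* Hypothetical helpers: msg_total_len() reads the total-length field out of a
 * complete header; handle_message() is whatever your parser thread does. */
size_t msg_total_len(const uint8_t *hdr);
void   handle_message(const uint8_t *msg, size_t len);

void on_bytes_received(const uint8_t *buf, size_t n)
{
    if (acc_len + n > sizeof(acc))
        return;                          /* or resync/flush - policy decision */
    memcpy(acc + acc_len, buf, n);
    acc_len += n;

    /* Extract as many complete messages as the buffer currently holds. */
    for (;;) {
        if (acc_len < HEADER_LEN)
            break;                       /* not even a full header yet  */
        size_t total = msg_total_len(acc);
        if (acc_len < total)
            break;                       /* header is here, body is not */
        handle_message(acc, total);
        memmove(acc, acc + total, acc_len - total);
        acc_len -= total;
    }
}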
In short, your design is not dictated by how many threads are accessing your socket: how many threads access your socket is dictated by your design.

Reading from the serial port in a multi-threaded program on Linux

I'm writing a program on Linux to interface, through serial, with a piece of hardware. The device sends packets of approximately 30-40 bytes at about 10 Hz. This software module will interface with others and communicate via IPC, so it must perform a specific IPC sleep to allow it to receive the messages it's subscribed to when it isn't doing anything useful.
Currently my code looks something like:
while(1){
    IPC_sleep(some_time);
    read_serial();
    process_serial_data();
}
The problem with this is that sometimes the read will be performed while only a fraction of the next packet is available at the serial port, which means that it isn't all read until next time around the loop. For the specific application it is preferable that the data is read as soon as it's available, and that the program doesn't block while reading.
What's the best solution to this problem?
The best solution is not to sleep! What I mean is that a good solution is probably to mix the IPC event and the serial event. select is a good tool to do this. Then you have to find an IPC mechanism that is select-compatible.
Socket-based IPC is select()able.
Pipe-based IPC is select()able.
POSIX message queues are also select()able.
And then your loop looks like this
while(1) {
    select(serial_fd | ipc_fd);                 // of course this is pseudo code
    if(FD_ISSET(serial_fd, &fd_set)) {
        parse_serial(serial_fd, serial_context);
        if(complete_serial_message)
            process_serial_data(serial_context);
    }
    if(FD_ISSET(ipc_fd, &fd_set)) {
        do_ipc();
    }
}
read_serial is replaced with parse_serial because, if you spend all your time waiting for a complete serial packet, then all the benefit of the select is lost. But from your question, it seems you are already doing that, since you mention getting serial data across two different loop iterations.
With the proposed architecture you have good reactivity on both the IPC and the serial side. You read serial data as soon as it is available, but without stopping to process IPC.
Of course it assumes you can change the IPC mechanism. If you can't, perhaps you can make a "bridge process" that interfaces on one side with whatever IPC you are stuck with, and on the other side uses a select()able IPC to communicate with your serial code.
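For reference, a fleshed-out version of that loop with a real select() call might look like the sketch below; the descriptor names are placeholders, and parse_serial()/complete_serial_message()/process_serial_data()/do_ipc() stand for your own code:

#include <sys/select.h>

void parse_serial(int fd);            /* accumulate bytes into your packet buffer */
int  complete_serial_message(void);   /* true once a full packet has been seen    */
void process_serial_data(void);
void do_ipc(void);

void event_loop(int serial_fd, int ipc_fd)
{
    for (;;) {
        fd_set readfds;
        FD_ZERO(&readfds);
        FD_SET(serial_fd, &readfds);
        FD_SET(ipc_fd, &readfds);
        int nfds = (serial_fd > ipc_fd ? serial_fd : ipc_fd) + 1;

        if (select(nfds, &readfds, NULL, NULL, NULL) < 0)
            continue;                      /* or check errno and bail out */

        if (FD_ISSET(serial_fd, &readfds)) {
            parse_serial(serial_fd);
            if (complete_serial_message())
                process_serial_data();
        }
        if (FD_ISSET(ipc_fd, &readfds))
            do_ipc();
    }
}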
Store away what you got so far of the message in a buffer of some sort.
If you don't want to block while waiting for new data, use something like select() on the serial port to check that more data is available. If not, you can continue doing some processing or whatever needs to be done instead of blocking until there is data to fetch.
When the rest of the data arrives, add to the buffer and check if there is enough to comprise a complete message. If there is, process it and remove it from the buffer.
You must cache enough of the message to know whether it is, or will become, a complete valid message.
If it is not valid or won't be in an acceptable timeframe, then you toss it. Otherwise, you keep it and process it.
This is typically called implementing a parser for the device's protocol.
This is the algorithm (blocking) that is needed:
while(! complete_packet(p) && time_taken < timeout)
{
    p += reading_device.read(); // only blocks for t << 1 sec.
    time_taken.update();
}
// now you have a complete packet or a timeout.
You can intersperse a callback if you like, or inject relevant portions in your processing loops.

How to make a UDP socket replace old messages (not yet recv()'d) when new arrive?

First, a little bit of context to explain why I am on the "UDP sampling" route:
I would like to sample data produced at a fast rate for an unknown period of time. The data I want to sample is on another machine than the one consuming the data. I have a dedicated Ethernet connection between the two so bandwidth is not an issue. The problem I have is that the machine consuming the data is much slower than the one producing it. An added constraint is that while it's ok that I don't get all the samples (they are just samples), it is mandatory that I get the last one.
My 1st solution was to make the data producer send a UDP datagram for each produced sample and let the data consumer grab whatever samples it can, letting the others be discarded by the socket layer when the UDP socket buffer is full. The problem with this solution is that when new UDP datagrams arrive and the socket buffer is full, it is the new datagrams that get discarded and not the old ones. Therefore I am not guaranteed to have the last one!
My question is: is there a way to make a UDP socket replace old datagrams when new arrive?
The receiver is currently a Linux machine, but that could change in favor of another unix-like OS in the future (windows may be possible as it implements BSD sockets, but less likely)
The ideal solution would use widespread mechanisms (like setsockopt()s) to work.
PS: I thought of other solutions, but they are more complex (they involve heavy modification of the sender), therefore I would first like to have a definite status on the feasibility of what I ask! :)
Updates:
- I know that the OS on the receiving machine can handle the network load + reassembly of the traffic generated by the sender. It's just that its default behaviour is to discard new datagrams when the socket buffer is full. And because of the processing times in the receiving process, I know it will become full whatever I do (wasting half of the memory on a socket buffer is not an option :)).
- I really would like to avoid having a helper process do what the OS could have done at packet-dispatching time, wasting resources just copying messages into a SHM.
- The problem I see with modifying the sender is that the code which I have access to is just a PleaseSendThisData() function, it has no knowledge that it can be the last time it is called before a long time, so I don't see any doable tricks at that end... but I'm open to suggestions! :)
If there is really no way to change the UDP receiving behaviour in a BSD socket, then well... just tell me; I am prepared to accept this terrible truth and will start working on the "helper process" solution when I get back to it :)
Just set the socket to non-blocking, and loop on recv() until it returns < 0 with errno == EAGAIN. Then process the last packet you got, rinse and repeat.
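A minimal sketch of that approach (the buffer size, process_sample() and the error policy are placeholders):

#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

void process_sample(const char *buf, ssize_t len);  /* your own code */

void read_latest(int sock)
{
    char buf[2048];                  /* large enough for one datagram */
    ssize_t len, last_len = -1;

    /* Non-blocking mode; in a real program you would set this once at startup. */
    fcntl(sock, F_SETFL, fcntl(sock, F_GETFL, 0) | O_NONBLOCK);

    /* Drain the queue; each successful recv() overwrites buf, so when the
     * loop ends buf holds the newest datagram that had arrived. */
    while ((len = recv(sock, buf, sizeof(buf), 0)) >= 0)
        last_len = len;

    if (errno != EAGAIN && errno != EWOULDBLOCK)
        return;                      /* a real error - handle as you see fit */

    if (last_len >= 0)
        process_sample(buf, last_len);
}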
I agree with "caf".
Set the socket to a non-blocking mode.
Whenever you receive something on the socket - read in a loop until nothing more is left. Then handle the last read datagram.
Only one note: you should set a large system receive buffer for the socket
int nRcvBufSize = 5*1024*1024; // or whatever you think is ok
setsockopt(sock, SOL_SOCKET, SO_RCVBUF, (char*) &nRcvBufSize, sizeof(nRcvBufSize));
This will be difficult to get completely right just on the listener side, since it could actually miss the last packet while it is still in the network interface chip, which would keep your program from ever having a chance to see it.
The operating system's UDP code would be the best place to try to deal with this since it will get new packets even if it decides to discard them because it already has too many queued up. Then it could make the decision of dropping an old one or dropping a new one, but I don't know how to go about telling it that this is what you would want it to do.
You can try to deal with this on the receiver by having one program or thread that always tries to read in the newest packet and another that always tries to process that newest packet. How to do this would differ based on whether you did it as two separate programs or as two threads.
As threads you would need a mutex (semaphore or something like it) to protect a pointer (or reference) to a structure used to hold 1 UDP payload and whatever else you wanted in there (size, sender IP, sender port, timestamp, etc).
The thread that actually read packets from the socket would store the packet's data in a struct, acquire the mutex protecting that pointer, swap out the current pointer for a pointer to the struct it just made, release the mutex, signal the processor thread that it has something to do, and then clear out the structure that it just got a pointer to and use it to hold the next packet that comes in.
The thread that actually processed packet payloads should wait on the signal from the other thread and/or wake periodically (500 ms or so is probably a good starting point, but you decide), acquire the mutex, swap its pointer to a UDP payload structure with the one that is there, release the mutex, and then, if the structure has any packet data, process it and wait on the next signal. If it did not have any data, it should just go ahead and wait on the next signal.
The processor thread should probably run at a lower priority than the UDP listener so that the listener is less likely to ever miss a packet. When processing the last packet (the one you really care about) the processor will not be interrupted because there are no new packets for the listener to hear.
You could extend this by using a queue rather than just a single pointer as the swapping place for the two threads. The single pointer is just a queue of length 1 and is very easy to process.
You could also extend this by attempting to have the listener thread detect if there are multiple packets waiting and only actually putting the last of those into the queue for the processor thread. How you do this will differ by platform, but if you are using a *nix then this should return 0 for sockets with nothing waiting:
while (keep_doing_this()) {
    ssize_t len = read(udp_socket_fd, my_udp_packet->buf, my_udp_packet->buf_len);
    // this could have been recv or recvfrom
    if (len < 0) {
        error();
    }
    int sz;
    int rc = ioctl(udp_socket_fd, FIONREAD, &sz);
    if (rc < 0) {
        error();
    }
    if (!sz) {
        // There aren't any more packets ready, so queue up the one we got
        my_udp_packet->current_len = len;
        my_udp_packet = swap_udp_packet(my_udp_packet);
        /* swap_udp_packet is code you would have to write to implement what I
           talked about above. */
        tgkill(this_group, processor_thread_tid, SIGUSR1);
    } else if (sz > my_udp_packet->buf_len) {
        /* You could resize the buffer for the packet payload here if it is
           too small. */
    }
}
A udp_packet would have to be allocated for each thread as well as 1 for the swapping pointer. If you use a queue for swapping then you must have enough udp_packets for each position in the queue -- since the pointer is just a queue of length 1 it only needs 1.
If you are using a POSIX system then consider not using a real time signal for the signaling because they queue up. Using a regular signal will allow you to treat being signaled many times the same as being signaled just once until the signal is handled, while real time signals queue up. Waking up periodically to check the queue also allows you to handle the possibility of the last signal arriving just after you have checked to see if you had any new packets but before you call pause to wait on a signal.
Another idea is to have a dedicated reader process that does nothing but loop on the socket and read incoming packets into a circular buffer in shared memory (you'll have to worry about proper write ordering). Something like kfifo. Non-blocking is fine here too. New data overwrites old data. Then the other process(es) would always have access to the latest block at the head of the queue and all the previous chunks not yet overwritten.
Might be too complicated for a simple one-way reader, just an option.
I'm pretty sure that this is a provably insoluble problem closely related to the Two Army Problem.
I can think of a dirty solution: establish a TCP "control" sideband connection which carries the last packet, which is also an "end of transmission" indication. Otherwise you need to use one of the more general pragmatic means noted in Engineering Approaches.
This is an old question, but you are basically wanting to turn the socket queue (FIFO) into a stack (LIFO). It's not possible, unless you want to fiddle with the kernel.
You'll need to move the datagrams from kernel space to user space and then process. Easiest approach would be a loop like this...
Block until there is data on the socket (see select, poll, epoll)
Drain the socket, storing datagrams per your own selection policy
Process the stored datagrams
Repeat
