I'm dealing with pipe communication between threads in a C program.
I have 2 threads:
- thread 1 just manages some events,
- thread 2 communicates with a serial port.
Threads 1 and 2 communicate through a pipe.
When certain conditions occur, the "events manager" thread should send a string through e.g. pipe[1] to the "serial manager" thread, which polls both the serial port and pipe[0].
Then, if a string arrives on pipe[0], it should do its work.
The problem is that thread 1 writes faster than thread 2 reads.
So my question is: how should I properly read from pipe[0]? How do I get a queue? Because if I simply use read in a blocking way, just writing in thread 2:
read(pipe[0], string, sizeof(string)-1)
then thread 2 reads a whole backlog of thread 1's messages in one go.
The only solution I found is to create another pipe that blocks thread 1 (thread 1 starts a blocking read right after writing), so thread 1 waits until thread 2 has done its work (this is also useful because I can get a response from thread 2). But my question is: is this the correct way? My conviction is that I'm missing something about the read function.
"[I]s this the correct way [to read variable-length asynchronous messages over a FIFO, processing them one at a time]?"
No, you do not need to synchronize a single producer sending variable-length messages over a FIFO to a single consumer to process the messages one at a time.
As you documented in your own answer, you could add a record terminator to the messages. You could also implement a simple protocol that describes the length of each message (cf. netstrings).
There's a lot of prior practice you can borrow here. For example, you don't need to read record-terminated messages one character at a time, but could locally buffer partial messages — think of what stdio does to turn bytestreams into lines. The constraint that there's only one consumer makes some things easy.
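For illustration, here is a minimal sketch of that buffering idea, assuming '\0'-terminated messages; the names msg_buf, buf_len and handle_message are hypothetical:

#include <string.h>
#include <unistd.h>

void handle_message(const char *msg);    /* the consumer's per-message work */

static char msg_buf[1024];               /* local buffer for partial messages */
static size_t buf_len = 0;

void drain_fifo(int fd)
{
    ssize_t n = read(fd, msg_buf + buf_len, sizeof(msg_buf) - buf_len);
    if (n <= 0)
        return;                          /* EOF/error handling omitted */
    buf_len += (size_t)n;

    char *start = msg_buf;
    char *nul;
    /* hand off every complete ('\0'-terminated) message, one at a time */
    while ((nul = memchr(start, '\0', buf_len - (size_t)(start - msg_buf)))) {
        handle_message(start);
        start = nul + 1;
    }
    /* keep any trailing partial message for the next read */
    buf_len -= (size_t)(start - msg_buf);
    memmove(msg_buf, start, buf_len);
}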
"[I]s this the correct way [to send variable-length asynchronous messages between threads]?"
It's serviceable but perhaps not ideal.
A message-oriented, queuing channel might be better suited here: message queues or a datagram socket pair.
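For example, a datagram socket pair preserves message boundaries, so each read() returns exactly one message; a minimal sketch (error handling abbreviated):

#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int sv[2];
    /* SOCK_DGRAM (or SOCK_SEQPACKET) preserves message boundaries,
     * unlike a pipe's byte stream */
    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
        return 1;

    write(sv[1], "hello", sizeof "hello");    /* one write = one message */
    write(sv[1], "world", sizeof "world");

    char msg[256];
    ssize_t n = read(sv[0], msg, sizeof msg); /* returns "hello" only */
    printf("got %zd bytes: %s\n", n, msg);
    return 0;
}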
Thank you all for the answers.
I've found a solution; I don't know if it is stylistically correct, but it works very well: use the read function in non-blocking mode. Just configure the pipe in main:
fcntl(pipe_fd[0], F_SETFL, fcntl(pipe_fd[0], F_GETFL) | O_NONBLOCK);
Then make sure thread 1 writes the string plus the terminating '\0' character:
write(pipe_fd[1], string, strlen(string) + 1);
And finally, the read in thread 2 should look like this:
int n_bytes, offset;
int pres;
char ch;
char string[256];

pres = poll(....);
if (pres > 0){
    ...
    ...
    /* if it's pipe_fd[0] ... */
    offset = 0;
    do {
        n_bytes = read(pipe_fd[0], &ch, 1);
        if (n_bytes > 0)
            string[offset++] = ch;
    } while (n_bytes > 0 && ch != '\0' && offset < sizeof(string));
    work_with_message(string);
that's it, tell me what you think ;-)
Related
I am writing a small server that will receive data from multiple sources and process this data. The sources and the data received are significant, but no more than epoll should be able to handle quite well. However, all received data must be parsed and run through a large number of tests, which is time-consuming and will block a single thread despite epoll multiplexing. Basically, the pattern should be as follows: the IO loop receives data and bundles it into a job, sends it to the first available thread in the pool, the bundle is processed by the job, and the result is passed back to the IO loop for writing to a file.
I have decided to go with a single IO thread and N worker threads. The IO thread for accepting TCP connections and reading data is easy to implement using the example provided at:
http://linux.die.net/man/7/epoll
Threads are also usually easy enough to deal with, but I am struggling to combine the epoll IO loop with a thread pool in an elegant manner. I have been unable to find any "best practice" for using epoll with a worker pool online either, though there are quite a few questions on the same topic.
I therefore have some questions I hope someone can help me answer:
Could (and should) eventfd be used as a mechanism for 2-way synchronization between the IO thread and all the workers? For instance, is it a good idea for each worker thread to have its own epoll routine waiting on a shared eventfd (with a struct pointer, containing data/info about the job) i.e. using the eventfd as a job queue somehow? Also perhaps have another eventfd to pass results back into the IO thread from multiple worker threads?
After the IO thread is signaled about more data on a socket, should the actual recv take place on the IO thread, or should the workers recv the data on their own in order not to block the IO thread while parsing data frames etc.? In that case, how can I ensure safety, e.g. in case recv reads 1.5 frames of data in a worker thread and another worker thread receives the last 0.5 frames of data from the same connection?
If the worker thread pool is implemented through mutexes and such, will waiting for locks block the IO thread if N+1 threads are trying to use the same lock?
Are there any good practice patterns for how to build a worker thread pool around epoll with two way communication (i.e. both from IO to workers and back)?
EDIT: Could one possible solution be to update a ring buffer from the IO loop and, after each update, send the ring buffer index to the workers through a pipe shared by all workers (thus handing control of that index to the first worker that reads it off the pipe), let the worker own that index until the end of processing, and then send the index number back to the IO thread through a pipe again, thus giving back control?
My application is Linux-only, so I can use Linux-only functionality in order to achieve this in the most elegant way possible. Cross-platform support is not needed, but performance and thread safety are.
In my tests, one epoll instance per thread outperformed complicated threading models by far. If listener sockets are added to all epoll instances, the workers would simply accept(2), and the winner would be awarded the connection and would process it for its lifetime.
Your workers could look something like this:
struct epoll_event evs[1024];
int nfds, i;

for (;;) {
    nfds = epoll_wait(worker->efd, evs, 1024, -1);
    for (i = 0; i < nfds; i++)
        ((struct socket_context *)evs[i].data.ptr)->handler(
            evs[i].data.ptr,
            evs[i].events);
}
And every file descriptor added to an epoll instance could have a struct socket_context associated with it:
void listener_handler(struct socket_context* ctx, int ev)
{
    /* allocate a context for the new connection (dereferencing an
     * uninitialized pointer here would be undefined behavior) */
    struct socket_context* conn = malloc(sizeof(*conn));

    conn->fd = accept(ctx->fd, NULL, NULL);
    conn->handler = conn_handler;
    /* add to calling worker's epoll instance or implement some form
     * of load balancing */
}
void conn_handler(struct socket_context* ctx, int ev)
{
    /* read all available data and process. if incomplete, stash
     * data in ctx and continue next time handler is called */
}

void dummy_handler(struct socket_context* ctx, int ev)
{
    /* handle exit condition async by adding a pipe with its
     * own handler */
}
I like this strategy because:
very simple design;
all threads are identical;
workers and connections are isolated--no stepping on each other's toes or calling read(2) in the wrong worker;
no locks are required (the kernel gets to worry about synchronization on accept(2));
somewhat naturally load balanced since no busy worker will actively contend on accept(2).
And some notes on epoll:
use edge-triggered mode, non-blocking sockets and always read until EAGAIN;
avoid dup(2) family of calls to spare yourself from some surprises (epoll registers file descriptors, but actually watches file descriptions);
you can epoll_ctl(2) other threads' epoll instances safely;
use a large struct epoll_event buffer for epoll_wait(2) to avoid starvation.
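To make the first of those notes concrete, here is a minimal sketch of the read-until-EAGAIN pattern (reusing the struct socket_context from above; error handling abbreviated):

#include <errno.h>
#include <unistd.h>

void drain_socket(struct socket_context *ctx)
{
    char buf[4096];
    ssize_t n;

    /* with EPOLLET you must read until EAGAIN, or you may never be
     * notified about the remaining data again */
    for (;;) {
        n = read(ctx->fd, buf, sizeof buf);
        if (n > 0) {
            /* process n bytes; stash partial frames in ctx */
        } else if (n == 0) {
            break;   /* peer closed the connection; clean up ctx */
        } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
            break;   /* fully drained; wait for the next event */
        } else if (errno != EINTR) {
            break;   /* real error; clean up ctx */
        }
    }
}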
Some other notes:
use accept4(2) to save a system call;
use one thread per core (1 for each physical core if CPU-bound, or 1 for each logical core if I/O-bound);
poll(2)/select(2) is likely faster if connection count is low.
I hope this helps.
With this model, because we only know the packet size once we have fully received the packet, unfortunately we cannot offload the receive itself to a worker thread. Instead, the best we can do is a dedicated thread that receives the data and passes off pointers to fully received packets.
The data itself is probably best held in a circular buffer; however, we will want a separate buffer for each input source (if we get a partial packet we can continue receiving from other sources without splitting up the data). The remaining question is how to inform the workers when a new packet is ready and to give them a pointer to the data in said packet. Because there is little data here, just some pointers, the most elegant way of doing this would be with POSIX message queues. These provide the ability for multiple senders and multiple receivers to write and read messages, always ensuring every message is received, and by precisely one thread.
You will want a struct resembling the one below for each data source; I shall go through the fields' purposes now.
struct DataSource
{
    int SourceFD;
    char DataBuffer[MAX_PACKET_SIZE * (THREAD_COUNT + 1)];
    char *LatestPacket;
    char *CurrentLocation;
    int SizeLeft;
};
The SourceFD is obviously the file descriptor of the data stream in question. The DataBuffer is where packet contents are held while being processed; it is a circular buffer. The LatestPacket pointer is used to temporarily hold a pointer to the most recent packet, in case we receive a partial packet and move on to another source before passing the packet off. CurrentLocation stores where the latest packet ends, so that we know where to place the next one, or where to carry on in case of a partial receive. SizeLeft is the room left in the buffer; this will be used to tell whether we can fit the packet or need to circle back around to the beginning.
The receiving function will thus effectively:
Copy the contents of the packet into the buffer
Move CurrentLocation to point to the end of the packet
Update SizeLeft to account for the now decreased buffer
If we cannot fit the packet in the end of the buffer we cycle around
If there is no room there either we try again a bit later, going to another source meanwhile
If we had a partial receive, set the LatestPacket pointer to the start of the packet and go to another stream until we get the rest
Send a message using a POSIX message queue to a worker thread so it can process the data. The message will contain a pointer to the DataSource structure so the worker can work on it; it also needs a pointer to the packet it is working on, and its size; these can be calculated when we receive the packet
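A minimal sketch of that hand-off step (assuming a queue created elsewhere with mq_open() whose mq_msgsize is sizeof(struct PacketJob); the PacketJob message type is hypothetical):

#include <mqueue.h>
#include <stddef.h>

/* Hypothetical message: just pointers and a length, so it stays tiny. */
struct PacketJob {
    struct DataSource *source;
    char *packet;
    size_t packet_size;
};

/* IO thread: hand a completed packet to whichever worker dequeues first. */
int post_job(mqd_t q, struct DataSource *src, char *pkt, size_t len)
{
    struct PacketJob job = { src, pkt, len };
    return mq_send(q, (const char *)&job, sizeof job, 0);
}

/* Worker thread: block until a job arrives. */
ssize_t take_job(mqd_t q, struct PacketJob *job)
{
    return mq_receive(q, (char *)job, sizeof *job, NULL);
}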
The worker thread will do its processing using the received pointers and then increase SizeLeft so the receiver thread knows it can carry on filling the buffer. Atomic functions will be needed to work on the size value in the struct so we don't get race conditions on the size property (it can be written by a worker and the IO thread simultaneously, causing lost writes; see my comment below); they are listed here and are simple and extremely useful.
Now, I have given some general background but will address the points given specifically:
Using eventfd as a synchronization mechanism is largely a bad idea: you will find yourself using a fair amount of unneeded CPU time, and it is very hard to perform any synchronization. In particular, if you have multiple threads pick up the same file descriptor you could have major problems. This is in effect a nasty hack that will work sometimes, but is no real substitute for proper synchronization.
It is also a bad idea to try to offload the receive, as explained above. You can get around the issue with complex IPC, but frankly it is unlikely that receiving IO will take enough time to stall your application; your IO is also likely much slower than your CPU, so receiving with multiple threads will gain little (this assumes you do not, say, have several 10-gigabit network cards).
Using mutexes or locks is a silly idea here; it fits much better into lock-free coding, given the low amount of (simultaneously) shared data: you are really just handing off work and data. This will also boost the performance of the receive thread and make your app far more scalable. Using the functions mentioned at http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Atomic-Builtins.html you can do this nicely and easily. If you did do it this way, what you would need is a semaphore: it can be posted every time a packet is received and waited on by each thread that starts a job, dynamically letting more threads in if more packets are ready. That would have far less overhead than a homebrew solution with mutexes.
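For instance, the buffer bookkeeping could look like this minimal sketch, using the legacy __sync builtins from the page linked above (field names follow the DataSource struct earlier; initialization is omitted):

#include <semaphore.h>

sem_t packets_ready;   /* sem_init(&packets_ready, 0, 0) at startup */

/* IO thread, after buffering a full packet: claim the space, wake a worker */
void packet_posted(struct DataSource *src, int len)
{
    __sync_sub_and_fetch(&src->SizeLeft, len);   /* atomic decrement */
    sem_post(&packets_ready);
}

/* Worker thread, before picking up work: block until a packet is ready */
void wait_for_packet(void)
{
    sem_wait(&packets_ready);
}

/* Worker thread, when done with a packet: give the space back atomically */
void packet_done(struct DataSource *src, int len)
{
    __sync_add_and_fetch(&src->SizeLeft, len);
}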
There is not really much difference here from any thread pool: you spawn a lot of threads and have them all block in mq_receive on the data message queue, waiting for messages. When they are done, they send their result back to the main thread, which adds the results message queue to its epoll list. It can then receive results this way; it is simple and very efficient for small data payloads like pointers. This will also use little CPU and not force the main thread to waste time managing workers.
Finally, your edit is fairly sensible, except that, as I have suggested, message queues are far better than pipes here, as they very efficiently signal events, guarantee a full message read, and provide automatic framing.
I hope this helps, however it is late so if I missed anything or you have questions feel free to comment for clarification or more explanation.
I posted the same answer in another post: I want to wait on both a file descriptor and a mutex, what's the recommended way to do this?
==========================================================
This is a very commonly seen problem, especially when you are developing network server-side programs. Most Linux server-side programs' main loop looks like this:
epoll_add(serv_sock);
while(1){
ret = epoll_wait();
foreach(ret as fd){
req = fd.read();
resp = proc(req);
fd.send(resp);
}
}
It is a single-threaded (the main thread), epoll-based server framework. The problem is that it is single-threaded, not multi-threaded. It requires that proc() never block or run for a significant time (say, 10 ms for common cases).
If proc() ever runs for a long time, WE NEED MULTIPLE THREADS, and we execute proc() in a separate thread (the worker thread).
We can submit a task to a worker thread without blocking the main thread, using a mutex-based message queue; it is fast enough.
Then we need a way to obtain the task result from a worker thread. How? If we just check the message queue directly, before or after epoll_wait(), the check will only run after epoll_wait() ends, and epoll_wait() usually blocks for 10 ms (common cases) if none of the file descriptors it waits on are active.
For a server, 10 ms is quite a long time! Can we signal epoll_wait() to end immediately when a task result is generated?
Yes! I will describe how it is done in one of my open source project.
Create a pipe shared by all worker threads, and have epoll wait on that pipe as well. Once a task result is generated, the worker thread writes one byte into the pipe, and epoll_wait() will end at nearly the same time! - A Linux pipe has 5 us to 20 us latency.
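A minimal sketch of that wakeup trick (the result queue itself is assumed to be protected by a mutex elsewhere):

#include <sys/epoll.h>
#include <unistd.h>

int wakeup_pipe[2];   /* created once at startup with pipe(wakeup_pipe) */

/* worker thread: after pushing a result, wake the main loop */
void notify_result_ready(void)
{
    char one = 1;
    write(wakeup_pipe[1], &one, 1);
}

/* main thread: register the read end with epoll once at startup */
void register_wakeup(int epfd)
{
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = wakeup_pipe[0] };
    epoll_ctl(epfd, EPOLL_CTL_ADD, wakeup_pipe[0], &ev);
}

/* main thread: when epoll reports wakeup_pipe[0] readable, drain it,
 * then pop results from the worker queue */
void on_wakeup(void)
{
    char buf[64];
    read(wakeup_pipe[0], buf, sizeof buf);
}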
In my project SSDB (a Redis-protocol-compatible on-disk NoSQL database), I created a SelectableQueue for passing messages between the main thread and worker threads. Just like its name suggests, SelectableQueue has a file descriptor which can be waited on by epoll.
SelectableQueue: https://github.com/ideawu/ssdb/blob/master/src/util/thread.h#L94
Usage in main thread:
epoll_add(serv_sock);
epoll_add(queue->fd());
while(1){
ret = epoll_wait();
foreach(ret as fd){
if(fd is worker_thread){
sock, resp = worker->pop_result();
sock.send(resp);
}
if(fd is client_socket){
req = fd.read();
worker->add_task(fd, req);
}
}
}
Usage in worker thread:
fd, req = queue->pop_task();
resp = proc(req);
queue->add_result(fd, resp);
I have created a FIFO which I can do non-blocking writes into, in this way:
// others, searching for a non-blocking FIFO-writer may copy this ;-)
mkfifo("/tmp/myfifo", S_IRWXU);
int fifo_fd = open("/tmp/myfifo", O_RDWR);
fcntl(fifo_fd, F_SETFL, fcntl(fifo_fd, F_GETFL) | O_NONBLOCK);
// and then in a loop:
LOGI("Writing into fifo.");
if (write(fifo_fd, data, count) < 0) {
LOGE("Failed to write into fifo: %s", strerror(errno));
}
The non-blocking write works perfectly.
On the other side, I open the FIFO for read and do the same fcntl() to make the read() non-blocking.
I now would like to make several (CPU-intensive) calculations on the write side, but ONLY if there is a reader attached.
Therefore I need to find a way, on the write side, to detect whether the FIFO is opened for reading somewhere else.
Does anyone have an idea how to achieve this?
I now would like to make several (cpu-intensive) calculations on the write side, but ONLY if there is a reader attached.
For that you can simply create a socket, and when a consumer connects to it, do some work and write back.
But I think a better solution is to have calculation results ready for consumers before they connect (or open the FIFO). However, you don't want the producer working if the work is not being consumed, so define N, the number of work results you're willing to keep available for consumption, and let the producer (or producers) work and save results in a queue of size N until it is full.
You could implement this with threads: one thread listens for connections, pops from the queue, and writes to the consumer, while one or more producer threads work and push to the queue.
Or you could use POSIX message queues to avoid threading headaches. Create a queue of size N; independent producers (multiple processes written in different languages) can push to the queue until it is full, and multiple independent consumers can pop from it.
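A minimal sketch of that bounded queue (the name /work_results and the sizes are illustrative; link with -lrt on Linux):

#include <fcntl.h>
#include <mqueue.h>
#include <string.h>

#define N 16                 /* results kept ready for consumers */
#define RESULT_SIZE 128

int main(void)
{
    struct mq_attr attr = { .mq_maxmsg = N, .mq_msgsize = RESULT_SIZE };
    mqd_t q = mq_open("/work_results", O_CREAT | O_RDWR, 0644, &attr);
    if (q == (mqd_t)-1)
        return 1;

    /* producer: compute and push; mq_send blocks once N results are queued */
    char result[RESULT_SIZE];
    strcpy(result, "some result");
    mq_send(q, result, sizeof result, 0);

    /* consumer (possibly a different process): pop one result */
    char out[RESULT_SIZE];
    mq_receive(q, out, sizeof out, NULL);

    mq_close(q);
    return 0;
}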
Background
I'm playing around with a FIFO, and every time I try to write on my FIFO it blocks until someone is reading the other side of the fd:
int fd;
char buffer[100] = {0};
char * myfifo = "/tmp/myfifo";

mkfifo(myfifo, 0666);

printf("What would you like to send?\n");
fgets(buffer, 100, stdin);

if((fd = open(myfifo, O_WRONLY)) < 0)
    printf("Couldn't open the FIFO for writing!\n");
else {
    write(fd, buffer, strlen(buffer));
    close(fd);
}
This code works, but it blocks until I read the /tmp/myfifo side and get the data. When I change the code as such:
if((fd = open(myfifo, O_WRONLY | O_NONBLOCK)) < 0)
Then the open fails with error No such device or address, unless I have someone sitting blocked on the "read" side.
Analysis
As per the man page for fifo(7):
The kernel maintains exactly one pipe object for each FIFO special file that is opened by at least one process. The FIFO must be opened on both ends (reading and writing) before data can be passed. Normally, opening the FIFO blocks until the other end is opened also.
So this is expected operation.
Question
Based on my experiments, and what I'm reading... I have to assume a FIFO is a non-queued, non-buffered mechanism and only works when there is a process sitting and waiting for data.
Is there a different communication mechanism that works in a non-blocking, buffered manner, basically a buffered FIFO, or would I have to make my own message storage/notification system for that?
EDIT
I say I'm "playing around" which is actually pretty descriptive here. I'm trying to learn the ins and outs of the various IPC mechanisms (FIFOs, sockets, and pipes). I'm working towards learning to use select() and understand what can be used to wake up sleeping processes which call select. Reason being it's part of a communications driver I'm analyzing for port to a new platform.
I excluded this from the original post as it is sort of irrelevant. I'm just trying to make sure I can understand (at the moment) FIFO's, how to use them, the limitations on them, and other IPC mechanism. Hence my original assumption/questions about "better" versions of FIFO that will store data and can be written to without blocking.
Not only is a FIFO buffered, but that's basically all a FIFO is. A FIFO is little more than a buffer in the kernel.
Discussion: The kernel has a policy that it refuses to write data to the buffer unless a process has the FIFO open for reading. This behavior is similar to pipes and TCP connections, although if there's no reader for a pipe or a TCP connection, the kernel will actually signal the writing process (SIGPIPE), terminating it (unless you install a handler). This behavior allows us to string together commands the way we expect, e.g.,
hexdump file.dat | head
The hexdump program gets killed once head reads a few lines. This is what we want 99% of the time, and hexdump doesn't need any special code to achieve this.
Solutions: It would help if you describe some more context about the problem you are trying to solve.
If you want a client/server system where the server queues messages that can be read by the clients, you can achieve this with Unix domain sockets. Unix domain sockets are similar to FIFOs but more flexible in various ways. (Most database servers prefer Unix domain sockets over other types of IPC.)
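For illustration, a minimal sketch of the Unix-domain-socket approach (the path /tmp/demo.sock and the message are placeholders; error handling abbreviated):

#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

int main(void)
{
    int srv = socket(AF_UNIX, SOCK_STREAM, 0);
    struct sockaddr_un addr;

    memset(&addr, 0, sizeof addr);
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, "/tmp/demo.sock", sizeof addr.sun_path - 1);

    unlink("/tmp/demo.sock");                 /* remove any stale socket */
    bind(srv, (struct sockaddr *)&addr, sizeof addr);
    listen(srv, 8);

    for (;;) {
        int client = accept(srv, NULL, NULL);
        const char msg[] = "queued message\n";
        write(client, msg, sizeof msg - 1);   /* serve one queued message */
        close(client);
    }
}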
If you want to be able to store a persistent queue, where applications can independently enqueue and dequeue messages, then you will need to use ordinary files.
Unfortunately, "playing around with a FIFO" is not much to go on. If you want a good exercise in IPC, try writing a chat server that uses Unix domain sockets or TCP (or both at the same time, which isn't much harder). You can use telnet or nc (netcat) as a client. (Make sure you have the "OpenBSD" version of netcat.)
I have been working on a file download application, where the server continuously waits for new connection requests from clients; when a new connection arrives, the server accepts it and creates a new process to serve the newly connected client. Clients can request to download multiple files from the server. For each file, the client and server sides create a new thread, and the data transfer for each file should be carried out between the matching thread pair on the server and client. I'm using C and pthreads. I have a stable socket connection and successful process creation for each client for now.
For threaded file transfer, I made an attempt as follows.
In the client I'm creating threads which run a method to receive files:
int k;
for (k = 0; k < fNameCounter; k++)
{
    pthread_t thread_id;
    int status = pthread_create(&thread_id, NULL, &receiveFile, fName);
    if (status != 0)
    {
        printf("Thread Creation Failed \n");
        exit(0);
    }
}
Similarly, on the server side, I create the same number of threads as follows:
int k;
for (k = 0; k < fnameCounter; k++)
{
    pthread_t thread_id;
    int status = pthread_create(&thread_id, NULL, &sendFile, fName);
    if (status != 0)
    {
        printf("Thread Creation Failed \n");
        exit(0);
    }
}
The sendFile and receiveFile functions simply write and read the bytes of the files specified by fName (as you can see in pthread_create) over the socket, and at that point I have a major problem:
As far as I can tell, the files' contents may end up different after all threads complete receiving data from the server, since the sendFile and receiveFile functions just read from and write to the same socket.
How can I guarantee that each client thread gets the proper data from the proper server thread, as I illustrate below:
         receive              send
cthread1 ----> a.txt <----- sthread1
cthread2 ----> b.txt <----- sthread2
cthread3 ----> c.txt <----- sthread3
P.S. I'm aware that creating many threads on one socket does not make sense, but it is my homework and I need to do it that way :/
Regards.
The easiest way is to open a new socket for each file.
To complete zvrba's very valid point (one socket, one file):
I think the threads are useless here: you have one network card. You won't get faster by listening to the same resource several times. You're actually likely to have your threads block each other and end up slower.
If you want to give feedback to the user, then you may create one background thread for all the socket IO.
You should handle your several connections in the same thread using select to detect which socket may be read from / written to.
Considering the restriction of only one socket, you might want to add some header info to the bytes being transmitted. This header might contain information about which pair of threads is responsible for that bit of information.
For example, sthread1 will send 100 bytes from file a.txt. You might add one byte to the beginning of that stream, containing the number 1. When receiving that chunk of data, cthread1 needs to check the first byte: if it is 1, then OK, this chunk is for cthread1, so it proceeds with processing. If the header isn't 1, then cthread1 should ignore this chunk and keep waiting, giving the other threads an opportunity to run too.
If you don't identify your chunks of data, it will be impossible to determine which thread should process which chunk.
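A minimal sketch of such framing (the layout, one ID byte plus a 2-byte big-endian length, is just one possible choice; a robust version would also loop on partial writes):

#include <stdint.h>
#include <unistd.h>

/* Hypothetical frame: [1-byte thread ID][2-byte payload length][payload] */
int send_chunk(int sock, uint8_t thread_id, const char *data, uint16_t len)
{
    unsigned char header[3];

    header[0] = thread_id;
    header[1] = (unsigned char)(len >> 8);   /* length, big-endian */
    header[2] = (unsigned char)(len & 0xff);

    if (write(sock, header, sizeof header) != sizeof header)
        return -1;
    return write(sock, data, len) == (ssize_t)len ? 0 : -1;
}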
Please note that this adds a lot of complexity to deal with:
You will have to decide how to assign those identifiers to your pairs of threads
If you use only one byte, keep in mind that the maximum value it can hold is 255.
You might have to use the MSG_PEEK flag in your recv function inside cthread (so you can ignore the chunk if needed). This flag alters the behaviour of recv. I advise you to read this (if you're programming on Windows) or this (if you're on Linux).
I'm running a multi-threaded C program (process?), making use of semaphores & pthreads. The threads keep interacting, blocking, waking & printing prompts on stdout continuously, without any human intervention. I want to be able to exit this process (gracefully, after printing a message & winding down all threads, not via a crude Ctrl+C SIGINT) by pressing a keyboard character like #.
What are my options for getting such an input from the user?
What more relevant information could I provide that will help to solve this problem?
Edit:
All your answers sound interesting, but my primary question remains. How do I get user input, when I don't know which thread is currently executing? Also, semaphore blocking using sem_wait() breaks if signalled via SIGINT, which may cause a deadlock.
There is no difference in reading standard input from threads except if more than one thread is trying to read it at the same time. Most likely your threads are not all calling functions to read standard input all the time, though.
If you regularly need to read input from the user you might want to have one thread that just reads this input and then sets flags or posts events to other threads based on this input.
If the kill character is the only thing you want, or if this is just going to be used for debugging, then what you probably want to do is occasionally poll for new data on standard input. You can do this by setting standard input to non-blocking and trying to read from it occasionally. If the read reports that no data is available (-1 with errno set to EAGAIN), then no keys were pressed. This method has some problems, though. I've never used stdio.h functions on a FILE * after having set the underlying file descriptor (an int) to non-blocking, but I suspect that they may act oddly. You could avoid the stdio functions and use read to sidestep this. There is also an issue I read about once where the block/non-block flag could be changed by another process if you forked and exec-ed a new program that had access to a version of that file descriptor. I'm not sure if this is a problem on all systems. Non-blocking mode can be set or cleared with an fcntl call.
But you could use one of the polling functions with a very small (0) timeout to see if there is data ready. The poll system call is probably the simplest, but there is also select. Various operating systems have other polling functions.
#include <poll.h>
...
/* return 0 if no data is available on stdin.
> 0 if there is data ready
< 0 if there is an error
*/
int poll_stdin(void) {
struct pollfd pfd = { .fd = 0, .events = POLLIN };
/* Since we only ask for POLLIN we assume that that was the only thing that
* the kernel would have put in pfd.revents */
    return poll(&pfd, 1, 0);
}
You can call this function within one of your threads, and as long as it returns 0 you just keep on going. When it returns a positive number, you need to read a character from stdin to see what it was. Note that if you are using the stdio functions on stdin elsewhere, there could actually be other characters already buffered up in front of the new character. poll tells you that the operating system has something new for you, not what C's stdio has.
If you are regularly reading from standard input in other threads then things just get messy. I'm assuming you aren't doing that (because if you are and it works correctly you probably wouldn't be asking this question).
You would have a thread listening for keyboard input, and then it would join() the other threads when receiving # as input.
Another way is to trap SIGINT and use it to handle the shutdown of your application.
The way I would do it is to keep a global int should_die or something, whose value is 0 or 1, and another global int died, which keeps track of the number of threads terminated. should_die and died are both initially zero. You'll also need two semaphores to provide mutual exclusion around the globals.
At a certain point, a thread checks the should_die variable (after acquiring the mutex, of course). If it should die, it acquires the died_mutex, ups the died count, releases the died_mutex, and dies.
The main initial thread periodically wakes up, checks that the number of threads that have died is less than the number of threads, and goes back to sleep. The main thread dies when all the other threads have checked in.
If the main thread doesn't spawn all the threads itself, a small modification would be to have "threads_alive" instead of "died". threads_alive is incremented when a thread forks, and decremented when the thread dies.
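A minimal sketch of that scheme (I use pthread mutexes for the mutual exclusion; the thread count and polling interval are illustrative):

#include <pthread.h>
#include <unistd.h>

#define NUM_THREADS 4

static int should_die = 0, died = 0;
static pthread_mutex_t should_die_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t died_mutex = PTHREAD_MUTEX_INITIALIZER;

/* called periodically from each worker's loop */
static int check_should_die(void)
{
    pthread_mutex_lock(&should_die_mutex);
    int d = should_die;
    pthread_mutex_unlock(&should_die_mutex);

    if (d) {
        pthread_mutex_lock(&died_mutex);
        died++;                       /* check in before dying */
        pthread_mutex_unlock(&died_mutex);
    }
    return d;
}

/* main thread: request shutdown, then poll until everyone has checked in */
static void shutdown_and_wait(void)
{
    pthread_mutex_lock(&should_die_mutex);
    should_die = 1;
    pthread_mutex_unlock(&should_die_mutex);

    for (;;) {
        pthread_mutex_lock(&died_mutex);
        int done = (died == NUM_THREADS);
        pthread_mutex_unlock(&died_mutex);
        if (done)
            break;
        usleep(100 * 1000);           /* wake up periodically */
    }
}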
In general, terminating a multithreaded operation cleanly is a pain in the butt, and besides special cases where you can use things like the semaphore barrier design pattern, this is the best I've heard of. I'd love to hear it if you find a better, cleaner one.
~anjruu
In general, I have threads waiting on a set of events and one of those events is the termination event.
In the main thread, when I have triggered the termination event, I then wait on all the threads having exited.
SIGINT is actually not that difficult to handle and is often used for graceful termination. You need a signal handler and a way to tell all the threads that it's time to stop. One global flag that threads check in their loops and the signal handler sets might do. Same approach works for "on user command" termination, though you need a way to get the input from the terminal - either poll in a dedicated thread, or again, set the terminal to generate a signal for you.
The tricky part is to unblock waiting threads. You have to carefully design the notification protocol of who tells who to stop and what they need to do - put dummy message into a queue, set a flag and signal a cv, etc.
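To make the signal-handling approach concrete, a minimal sketch (the flag name and shutdown message are illustrative):

#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* sig_atomic_t is the only type guaranteed safe to set from a handler */
static volatile sig_atomic_t stop_requested = 0;

static void on_sigint(int sig)
{
    (void)sig;
    stop_requested = 1;    /* only set the flag; do real work outside */
}

int main(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);      /* no SA_RESTART: blocked calls get EINTR */
    sigaction(SIGINT, &sa, NULL);

    /* worker loops would check stop_requested periodically and unwind;
     * here the main thread just waits for the flag */
    while (!stop_requested)
        pause();                   /* returns with errno == EINTR on signal */

    printf("shutting down gracefully\n");
    return 0;
}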