I am creating a C application that will run on an OpenWrt router. Because of its limited resources, I'm a bit worried about the message queue. What if the "reader" application, which takes messages from the queue, crashes while the "writer" keeps sending messages? Should I be worried about the device's memory, or will the message queue clean itself up eventually?
EDIT: I realised that I wasn't clear enough about my task. One application will be sending messages and the other will be reading and processing them.
See the documentation for msgsnd:
The queue capacity is governed by the msg_qbytes field in the associated data structure for the message queue. During queue creation this field is initialized to MSGMNB bytes, but this limit can be modified using msgctl(2).
If insufficient space is available in the queue, then the default behavior of msgsnd() is to block until space becomes available. If IPC_NOWAIT is specified in msgflg, then the call instead fails with the error EAGAIN.
So the sender will block until the receiver removes messages and frees up space, unless you use IPC_NOWAIT, in which case msgsnd() fails with EAGAIN and the sender can check for this error code.
The default maximum queue size (in bytes) is given by the constant MSGMNB. You can print this value to see what it is on your system. To change the maximum size of your queue, use msgctl().
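As a minimal sketch of the non-blocking variant, assuming System V queues on Linux: the hypothetical helper send_nowait() reports a full queue instead of blocking, and print_queue_limits() reads msg_qbytes through msgctl(IPC_STAT). The struct layout and helper names are illustrative only. Keep in mind that a System V queue is a kernel object that outlives the processes using it, so if the reader crashes, the queued messages stay there (bounded by msg_qbytes) until something reads them, the queue is removed with IPC_RMID, or the system reboots.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/msg.h>

/* Hypothetical message layout: System V messages must start with a long mtype > 0. */
struct text_msg {
    long mtype;
    char mtext[64];
};

/* Try to enqueue one message without blocking.
 * Returns 0 on success, 1 if the queue is full (reader slow or gone),
 * -1 on any other error. */
int send_nowait(int qid, const char *text) {
    struct text_msg m = { .mtype = 1 };   /* mtext is zero-filled */
    assert(text != NULL);
    strncpy(m.mtext, text, sizeof m.mtext - 1);
    if (msgsnd(qid, &m, sizeof m.mtext, IPC_NOWAIT) == 0)
        return 0;
    return errno == EAGAIN ? 1 : -1;
}

/* Print the queue's byte limit (msg_qbytes, initialised to MSGMNB) and how
 * many messages are currently queued. */
int print_queue_limits(int qid) {
    struct msqid_ds ds;
    if (msgctl(qid, IPC_STAT, &ds) < 0)
        return -1;
    printf("msg_qbytes=%lu, msg_qnum=%lu\n",
           (unsigned long)ds.msg_qbytes, (unsigned long)ds.msg_qnum);
    return 0;
}
```

A writer that gets 1 back can drop the message, retry later, or log that the reader seems to be gone; which is right depends on the application.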
Recently, I have been developing some asynchronous algorithms in my research. While doing some parallel performance studies, I have become suspicious that I do not properly understand some details of the various non-blocking MPI functions.
I've seen some insightful posts on here, namely:
MPI: blocking vs non-blocking
MPI Non-blocking Irecv didn't receive data?
There are a few things related to non-blocking functionality that I am uncertain about or just want to clarify; I think understanding them will help me improve the performance of my current software.
From the Nonblocking Communication part of the MPI 3.0 standard:
A nonblocking send start call initiates the send operation, but does not complete it. The send start call can return before the message was copied out of the send buffer. A separate send complete call is needed to complete the communication, i.e., to verify that the data has been copied out of the send buffer. With suitable hardware, the transfer of data out of the sender memory may proceed concurrently with computations done at the sender after the send was initiated and before it completed.
...
If the send mode is standard then the send-complete call may return before a matching receive is posted, if the message is buffered. On the other hand, the send-complete may not complete until a matching receive is posted, and the message was copied into the receive buffer.
So as a first set of questions about the MPI_Isend (and similarly MPI_Irecv), it seems as though to ensure a non-blocking send finishes, I need to use some mechanism to check that it is complete because in the worst case, there may not be suitable hardware to transfer the data concurrently, right? So if I never use something like MPI_Test or MPI_Wait following the non-blocking send, the MPI_Isend may never actually get its message out, right?
This question applies to some of my work because I am sending messages via MPI_Isend and not actually testing for completion until I get the expected response message, because I want to avoid the overhead of MPI_Test calls. While this approach has been working, it seems faulty based on my reading.
Further, the second paragraph appears to say that for the standard non-blocking send, MPI_Isend, it may not even begin to send any of its data until the destination process has called a matching receive. Given the availability of MPI_Probe/MPI_Iprobe, does this mean an MPI_Isend call will at least send out some preliminary metadata of the message, such as size, source, and tag, so that the probe functions on the destination process can know a message wants to be sent there and so the destination process can actually post a corresponding receive?
Related is a question about the probe. In the Probe and Cancel section, the standard says that
MPI_IPROBE(source, tag, comm, flag, status) returns flag = true if there is a message that can be received and that matches the pattern specified by the arguments source, tag, and comm. The call matches the same message that would have been received by a call to MPI_RECV(..., source, tag, comm, status) executed at the same point in the program, and returns in status the same value that would have been returned by MPI_RECV(). Otherwise, the call returns flag = false, and leaves status undefined.
Going off of the above passage, it is clear the probing will tell you whether there's an available message you can receive corresponding to the specified source, tag, and comm. My question is, should you assume that the data for the corresponding send from a successful probing has not actually been transferred yet?
It seems reasonable to me now, after reading the standard, that indeed a message the probe is aware of need not be a message that the local process has actually fully received. Given the previous details about the standard non-blocking send, it seems you would need to post a receive after doing the probing to ensure the source non-blocking standard send will complete, because there might be times where the source is sending a large message that MPI does not want to copy into some internal buffer, right? And either way, it seems that posting the receive after a probing is how you ensure that you actually get the full data from the corresponding send to be sent. Is this correct?
This latter question relates to one place in my code where I call MPI_Iprobe and, if it succeeds, call MPI_Recv to get the message. I now think this could be problematic, because I assumed that a successful probe meant the whole message had already been received. That implied the MPI_Recv would run quickly, since the full message would already be in local memory somewhere. I now suspect this assumption was incorrect, and some clarification would be helpful.
The MPI standard does not mandate a progress thread. That means MPI_Isend() might do nothing at all until communications are progressed. Progress happens under the hood in most MPI subroutines; MPI_Test(), MPI_Wait() and MPI_Probe() are the most obvious ones.
I am afraid you are mixing up progress and synchronous sends (e.g. MPI_Ssend()).
MPI_Probe() is a local operation, which means it will not contact the sender to ask whether something was sent, nor progress that send.
Performance-wise, you should avoid unexpected messages as much as possible; that is, a receive should be posted on one end before the message is sent from the other end.
There is a trade-off between performance and portability here:
if you want to write portable code, then you cannot assume there is a MPI progress thread
if you want to optimize your application on a given system, you should give a try to a MPI library that implements a progress thread on the interconnect you are using
Keep in mind that most MPI implementations send small messages in eager mode (this is not mandated by the MPI standard, and you should not rely on it).
It means MPI_Send() will likely return immediately if the message is small enough (and small enough depends among other things on your MPI implementation, how it is tuned or which interconnect is used).
This is probably a really dumb question, but googling isn't working out, so here goes. I am writing a program that uses message queues to send a range of values to different processes. My research indicates that I should use msgsnd() to store a message on the queue and msgrcv() to receive messages. I need to store a start number and an end number in the queue. So my question is: can I store multiple messages in this queue, and if so, how do I go about storing and retrieving them? TIA for all your help.
The concept of a queue means you can put things in: each msgsnd call creates one message in the queue. To receive a message, you call msgrcv. Each receive normally returns only one message, so if you send n messages, you have to receive n messages.
Queues are generally FIFO (first in, first out), so the message created by the first msgsnd will be the first message returned by msgrcv (when msgrcv is called with a msgtyp of 0).
This guarantee is a little weakened if you use message queues that operate over the network (and/or in a cluster). Due to network latency, failover, retries etc., messages can arrive out of order, so it's generally advisable to build each message with all the information needed to process it correctly in those cases.
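A minimal sketch of storing start/end pairs, assuming System V queues: each range is one message, so sending several ranges just means calling msgsnd() several times. The struct range_msg layout and the helper names are hypothetical; the only fixed requirement is that the message begins with a long mtype greater than zero. A msgtyp of 0 in msgrcv() takes the oldest message, while a positive msgtyp selects messages of that type, which is one way to address ranges to a particular process.

```c
#include <assert.h>
#include <sys/ipc.h>
#include <sys/msg.h>

/* Hypothetical message carrying one range of work. */
struct range_msg {
    long mtype;     /* must be > 0; a constant type is used here */
    int start;
    int end;
};

/* Size of everything after mtype, which is what msgsnd/msgrcv expect. */
#define RANGE_PAYLOAD (sizeof(struct range_msg) - sizeof(long))

/* Each call adds exactly one message to the queue. */
int send_range(int qid, int start, int end) {
    struct range_msg m = { .mtype = 1, .start = start, .end = end };
    assert(start <= end);
    return msgsnd(qid, &m, RANGE_PAYLOAD, 0);
}

/* Remove and return the oldest message (msgtyp 0); blocks while empty. */
int recv_range(int qid, int *start, int *end) {
    struct range_msg m;
    if (msgrcv(qid, &m, RANGE_PAYLOAD, 0, 0) < 0)
        return -1;
    *start = m.start;
    *end = m.end;
    return 0;
}
```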
I'm new to threads, so I'd appreciate any advice. I have a server that broadcasts messages to clients. The clients then send replies back to the server. I want to handle every reply in a separate thread. Every reply will have a message id and a thread id. How can I fill some structure with this information from all threads and then read it?
Also, in my code, is it correct to create a thread inside the while loop, or is there some way to create a thread only when I get a reply from a client?
Did I start with correct understanding?
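#include <pthread.h>

void *msg_processing(void *arg);

int main(void) {
    pthread_t cln_thread;
    int arg = 0;
    while (1) {
        /* sendto() on the broadcast IP */
        pthread_create(&cln_thread, NULL, msg_processing, (void *)&arg);
    }
}
void *msg_processing(void *arg) {
    /* recvfrom(): client message with the packet id and thread id */
    /* how do I store this information and read it in main when the program finishes? */
    return NULL;
}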
Thank you
Err... no, just create ONE thread, once only, for receiving datagrams on one socket. In the thread function, receive the datagrams in a while(true) loop. Don't let this receive thread terminate, and don't create any more receive threads. Continually creating/terminating/destroying threads is inefficient, dangerous, unnecessary, error-prone and difficult to debug, and you should try very hard never to do it.
Edit:
One receive thread only, but you don't have to do the processing there. Malloc a 64K buffer, receive data into it, push the buffer pointer onto a producer-consumer queue feeding a pool of threads that do the processing, then loop back and malloc again to create a fresh buffer for the next datagram. The pool threads free the buffers after processing is complete. The receive thread will be back waiting for datagrams quickly while the buffer processing runs concurrently.
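A minimal sketch of that design with POSIX threads, assuming a UDP socket that is already bound: one receive thread runs receive_loop(), and each pool thread runs pool_thread(). All names (struct job, job_queue, process_datagram and so on) are hypothetical, and the queue is the simple unbounded mutex/condition-variable kind described above.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>

#define DGRAM_MAX 65536              /* one 64K buffer per datagram */

/* One received datagram waiting to be processed. */
struct job {
    char *buf;                       /* malloc'd by the receive thread */
    ssize_t len;                     /* bytes actually received */
    struct job *next;
};

/* Unbounded FIFO producer-consumer queue guarded by a mutex. */
struct job_queue {
    struct job *head, *tail;
    pthread_mutex_t lock;
    pthread_cond_t nonempty;
};

void queue_init(struct job_queue *q) {
    q->head = q->tail = NULL;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->nonempty, NULL);
}

void queue_push(struct job_queue *q, struct job *j) {
    assert(j != NULL);
    j->next = NULL;
    pthread_mutex_lock(&q->lock);
    if (q->tail != NULL)
        q->tail->next = j;
    else
        q->head = j;
    q->tail = j;
    pthread_cond_signal(&q->nonempty);
    pthread_mutex_unlock(&q->lock);
}

/* Blocks until a job is available, then removes and returns the oldest one. */
struct job *queue_pop(struct job_queue *q) {
    pthread_mutex_lock(&q->lock);
    while (q->head == NULL)
        pthread_cond_wait(&q->nonempty, &q->lock);
    struct job *j = q->head;
    q->head = j->next;
    if (q->head == NULL)
        q->tail = NULL;
    pthread_mutex_unlock(&q->lock);
    return j;
}

/* Body of the single receive thread: malloc, recvfrom, hand off, repeat. */
void receive_loop(int sock, struct job_queue *q) {
    for (;;) {
        struct job *j = malloc(sizeof *j);
        char *buf = malloc(DGRAM_MAX);
        ssize_t n = -1;
        if (j != NULL && buf != NULL)
            n = recvfrom(sock, buf, DGRAM_MAX, 0, NULL, NULL);
        if (n < 0) {
            int err = errno;
            free(j);
            free(buf);
            if (j != NULL && buf != NULL && err == EINTR)
                continue;
            return;                  /* out of memory or socket error */
        }
        j->buf = buf;
        j->len = n;
        queue_push(q, j);            /* a pool thread owns the buffer from here on */
    }
}

/* Placeholder for application-specific processing (hypothetical). */
static void process_datagram(const char *buf, ssize_t len) { (void)buf; (void)len; }

/* One iteration of a pool thread: take a datagram, process it, free it. */
ssize_t worker_step(struct job_queue *q) {
    struct job *j = queue_pop(q);
    ssize_t len = j->len;
    process_datagram(j->buf, len);
    free(j->buf);
    free(j);
    return len;
}

/* Thread function for each pool thread (pass the queue as the argument). */
void *pool_thread(void *arg) {
    for (;;)
        worker_step(arg);
    return NULL;
}
```

In main you would call queue_init() once, start one receive thread whose thread function calls receive_loop(), and start N threads with pthread_create(..., pool_thread, &q); no further threads are created after startup.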
If you find that datagrams can arrive so rapidly that the processing cannot keep up, memory use will grow unchecked as more and more buffers pile up in the queue. There are a couple of ways around that. You can use a bounded queue that blocks when its capacity is reached. Or you could create x buffers at startup and store them on another producer-consumer 'pool queue' that the receive thread pops from (instead of calling malloc); the processing pool threads then push the 'used' buffers back onto the pool queue for re-use. If the pool runs out, the receive thread will block on the pool until buffers are returned.
I prefer the pooled-buffer approach since it caps memory use across the whole system, avoids continual malloc/free with its fragmentation issues etc., avoids the more complex bounded queues, is easier to tweak (the pool level is easy to change at runtime), and is easier to monitor/debug: I usually dump the pool level (i.e. queue count) to the display once a second with a timer.
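The pooled-buffer variant can be sketched as a fixed-size, blocking free list: the receive thread calls pool_get() instead of malloc(), the processing threads call pool_put() instead of free(), and pool_level() is the number you would print once a second. POOL_SIZE, BUF_SIZE and all function names are assumptions for illustration. The free list here is a stack rather than a FIFO, since the order in which idle buffers are reused does not matter.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define POOL_SIZE 64      /* assumption: caps total buffer memory at 64 x 64K */
#define BUF_SIZE 65536

/* Hypothetical blocking pool (free list) of preallocated datagram buffers. */
struct buf_pool {
    char *bufs[POOL_SIZE];
    int count;            /* buffers currently free: the "pool level" */
    pthread_mutex_t lock;
    pthread_cond_t available;
};

int pool_init(struct buf_pool *p) {
    pthread_mutex_init(&p->lock, NULL);
    pthread_cond_init(&p->available, NULL);
    p->count = 0;
    for (int i = 0; i < POOL_SIZE; i++) {
        char *b = malloc(BUF_SIZE);
        if (b == NULL)
            return -1;
        p->bufs[p->count++] = b;
    }
    return 0;
}

/* Receive thread: use instead of malloc(); blocks while the pool is empty. */
char *pool_get(struct buf_pool *p) {
    pthread_mutex_lock(&p->lock);
    while (p->count == 0)
        pthread_cond_wait(&p->available, &p->lock);
    char *b = p->bufs[--p->count];
    pthread_mutex_unlock(&p->lock);
    return b;
}

/* Processing threads: use instead of free() once a buffer is done with. */
void pool_put(struct buf_pool *p, char *b) {
    pthread_mutex_lock(&p->lock);
    assert(p->count < POOL_SIZE);   /* only buffers from this pool come back */
    p->bufs[p->count++] = b;
    pthread_cond_signal(&p->available);
    pthread_mutex_unlock(&p->lock);
}

/* For monitoring, e.g. printed once a second from a timer. */
int pool_level(struct buf_pool *p) {
    pthread_mutex_lock(&p->lock);
    int n = p->count;
    pthread_mutex_unlock(&p->lock);
    return n;
}
```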
In either case, datagrams may be lost but, if your system is overloaded such that data regularly arrives faster than it can possibly be processed, that's going to be the case anyway no matter how you design it.
One socket is fine, so why complicate matters? :)
You can find a good example of a multithreaded UDP server in Go at the following link:
https://gist.github.com/jtblin/18df559cf14438223f93
The main idea is to make full use of multiple cores, so each thread runs on its own CPU core and reads UDP data into a single buffer for processing.
I am writing a small server that will receive data from multiple sources and process this data. The sources and data received are significant, but no more than epoll should be able to handle quite well. However, all received data must be parsed and run through a large number of tests, which is time-consuming and will block a single thread despite epoll multiplexing. Basically, the pattern should be something like this: the IO loop receives data and bundles it into a job, sends it to the first available thread in the pool, the job is processed, and the result is passed back to the IO loop for writing to file.
I have decided to go for a single IO thread and N worker threads. The IO thread for accepting tcp connections and reading data is easy to implement using the example provided at:
http://linux.die.net/man/7/epoll
Threads are also usually easy enough to deal with, but I am struggling to combine the epoll IO loop with a thread pool in an elegant manner. I am unable to find any "best practice" for using epoll with a worker pool online either, only quite a few questions on the same topic.
I therefore have some questions I hope someone can help me answer:
Could (and should) eventfd be used as a mechanism for 2-way synchronization between the IO thread and all the workers? For instance, is it a good idea for each worker thread to have its own epoll routine waiting on a shared eventfd (with a struct pointer, containing data/info about the job) i.e. using the eventfd as a job queue somehow? Also perhaps have another eventfd to pass results back into the IO thread from multiple worker threads?
After the IO thread is signaled about more data on a socket, should the actual recv take place on the IO thread, or should the workers recv the data themselves in order not to block the IO thread while parsing data frames etc.? In that case, how can I ensure safety, e.g. if recv reads 1.5 frames of data in one worker thread and another worker thread receives the last 0.5 frame of data from the same connection?
If the worker thread pool is implemented through mutexes and such, will waiting for locks block the IO thread if N+1 threads are trying to use the same lock?
Are there any good practice patterns for how to build a worker thread pool around epoll with two way communication (i.e. both from IO to workers and back)?
EDIT: Can one possible solution be to update a ring buffer from the IO-loop, after update send the ring buffer index to the workers through a shared pipe for all workers (thus giving away control of that index to the first worker that reads the index off the pipe), let the worker own that index until end of processing and then send the index number back into the IO-thread through a pipe again, thus giving back control?
My application is Linux-only, so I can use Linux-only functionality in order to achieve this in the most elegant way possible. Cross-platform support is not needed, but performance and thread safety are.
In my tests, one epoll instance per thread outperformed complicated threading models by far. If listener sockets are added to all epoll instances, the workers would simply accept(2) and the winner would be awarded the connection and process it for its lifetime.
Your workers could look something like this:
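struct epoll_event evs[1024];
for (;;) {
    int nfds = epoll_wait(worker->efd, evs, 1024, -1);
    if (nfds < 0) {
        if (errno == EINTR)
            continue;   /* interrupted by a signal: just wait again */
        break;          /* unexpected error */
    }
    for (int i = 0; i < nfds; i++) {
        struct socket_context *ctx = evs[i].data.ptr;
        ctx->handler(ctx, evs[i].events);
    }
}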
And every file descriptor added to an epoll instance could have a struct socket_context associated with it:
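struct socket_context {
    int fd;
    void (*handler)(struct socket_context *ctx, uint32_t ev);
};

void conn_handler(struct socket_context *ctx, uint32_t ev);

void listener_handler(struct socket_context *ctx, uint32_t ev)
{
    /* edge-triggered listener: keep accepting until EAGAIN */
    for (;;) {
        int fd = accept4(ctx->fd, NULL, NULL, SOCK_NONBLOCK | SOCK_CLOEXEC);
        if (fd < 0)
            return;  /* EAGAIN: backlog empty or another worker won */
        struct socket_context *conn = malloc(sizeof *conn);
        if (conn == NULL) { close(fd); continue; }
        conn->fd = fd;
        conn->handler = conn_handler;
        /* add to calling worker's epoll instance or implement some form
         * of load balancing */
    }
}

void conn_handler(struct socket_context *ctx, uint32_t ev)
{
    /* read all available data and process. if incomplete, stash
     * data in ctx and continue next time handler is called */
}

void dummy_handler(struct socket_context *ctx, uint32_t ev)
{
    /* handle exit condition async by adding a pipe with its
     * own handler */
}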
I like this strategy because:
very simple design;
all threads are identical;
workers and connections are isolated: no stepping on each other's toes or calling read(2) in the wrong worker;
no locks are required (the kernel gets to worry about synchronization on accept(2));
somewhat naturally load balanced since no busy worker will actively contend on accept(2).
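A minimal sketch of the per-thread setup, assuming IPv4 TCP on Linux: each hypothetical struct worker owns one epoll instance, and the same non-blocking listening socket is registered, edge-triggered, in every instance. make_listener() and worker_init() are illustrative names; listener_ctx would be the listener's struct socket_context.

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical per-thread state: each worker owns exactly one epoll instance. */
struct worker {
    int efd;
};

/* Non-blocking TCP listener on 127.0.0.1 with a kernel-chosen port. */
int make_listener(uint16_t *port_out) {
    int fd = socket(AF_INET, SOCK_STREAM | SOCK_NONBLOCK | SOCK_CLOEXEC, 0);
    if (fd < 0)
        return -1;
    struct sockaddr_in a = { .sin_family = AF_INET };
    a.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    socklen_t len = sizeof a;
    if (bind(fd, (struct sockaddr *)&a, sizeof a) < 0 || listen(fd, 128) < 0 ||
        getsockname(fd, (struct sockaddr *)&a, &len) < 0) {
        close(fd);
        return -1;
    }
    *port_out = ntohs(a.sin_port);
    return fd;
}

/* Create this worker's epoll instance and register the shared listener in it,
 * edge-triggered. Calling this for every worker puts the same listening socket
 * into all instances; whichever worker wins accept4() owns the connection. */
int worker_init(struct worker *w, int listen_fd, void *listener_ctx) {
    assert(listen_fd >= 0);
    w->efd = epoll_create1(EPOLL_CLOEXEC);
    if (w->efd < 0)
        return -1;
    struct epoll_event ev = { .events = EPOLLIN | EPOLLET, .data.ptr = listener_ctx };
    return epoll_ctl(w->efd, EPOLL_CTL_ADD, listen_fd, &ev);
}
```

When a connection arrives, several instances may report the listener as readable, but only one accept4() call gets the connection; the others see EAGAIN, which is the kernel-side synchronization the answer relies on.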
And some notes on epoll:
use edge-triggered mode, non-blocking sockets and always read until EAGAIN;
avoid dup(2) family of calls to spare yourself from some surprises (epoll registers file descriptors, but actually watches file descriptions);
you can epoll_ctl(2) other threads' epoll instances safely;
use a large struct epoll_event buffer for epoll_wait(2) to avoid starvation.
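The "read until EAGAIN" rule can be sketched as follows, assuming a non-blocking descriptor; set_nonblocking(), read_until_eagain() and the count_bytes() consumer are hypothetical helpers. With edge-triggered epoll, a handler that stops reading early may not be notified again about the bytes it left behind until new data arrives.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Put a descriptor into non-blocking mode (required for edge-triggered use). */
int set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    return flags < 0 ? -1 : fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Example consumer (hypothetical): just counts bytes into *state. */
void count_bytes(const char *data, size_t n, void *state) {
    (void)data;
    *(size_t *)state += n;
}

/* Read everything currently available, handing each chunk to `consume`.
 * Returns 1 when drained (EAGAIN), 0 on end of file, -1 on other errors. */
int read_until_eagain(int fd, void (*consume)(const char *, size_t, void *),
                      void *state) {
    char buf[4096];
    assert(consume != NULL);
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0)
            consume(buf, (size_t)n, state);
        else if (n == 0)
            return 0;                      /* peer closed */
        else if (errno == EAGAIN || errno == EWOULDBLOCK)
            return 1;                      /* nothing left for now */
        else if (errno != EINTR)
            return -1;                     /* real error */
    }
}
```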
Some other notes:
use accept4(2) to save a system call;
use one thread per core (1 for each physical core if CPU-bound, or 1 for each logical core if I/O-bound);
poll(2)/select(2) is likely faster if connection count is low.
I hope this helps.
When using this model, because we only know the packet size once we have fully received the packet, we unfortunately cannot offload the receive itself to a worker thread. Instead, the best we can do is have one thread receive the data and pass off pointers to fully received packets.
The data itself is probably best held in a circular buffer; however, we will want a separate buffer for each input source (so if we get a partial packet, we can continue receiving from other sources without splitting up the data). The remaining question is how to inform the workers when a new packet is ready, and how to give them a pointer to the data in that packet. Because there is little data here, just some pointers, the most elegant way of doing this is with POSIX message queues. These allow multiple senders and multiple receivers to write and read messages, always ensuring that every message is received, and by precisely one thread.
You will want a struct resembling the one below for each data source; I shall go through the purpose of each field now.
struct DataSource
{
    int SourceFD;
    char DataBuffer[MAX_PACKET_SIZE * (THREAD_COUNT + 1)];
    char *LatestPacket;
    char *CurrentLocation;
    int SizeLeft;
};
The SourceFD is obviously the file descriptor of the data stream in question. The DataBuffer is where packet contents are held while being processed; it is a circular buffer. The LatestPacket pointer is used to temporarily hold a pointer to the most recent packet in case we receive a partial packet and move on to another source before passing the packet off. CurrentLocation stores where the latest packet ends, so that we know where to place the next one, or where to carry on in case of a partial receive. SizeLeft is the room left in the buffer; it is used to tell whether we can fit the packet or need to circle back around to the beginning.
The receiving function will thus effectively:
Copy the contents of the packet into the buffer
Move CurrentLocation to point to the end of the packet
Update SizeLeft to account for the now decreased buffer
If we cannot fit the packet in the end of the buffer we cycle around
If there is no room there either we try again a bit later, going to another source meanwhile
If we had a partial receive store the LatestPacket pointer to point to the start of the packet and go to another stream until we get the rest
Send a message using a POSIX message queue to a worker thread so it can process the data. The message will contain a pointer to the DataSource structure so the worker can work on it; it also needs a pointer to the packet it is working on and the packet's size, both of which can be calculated when we receive the packet
The worker thread will do its processing using the received pointers and then increase SizeLeft so the receiver thread knows it can carry on filling the buffer. Atomic functions are needed to update the size value in the struct so we don't get race conditions on it (a worker and the IO thread can write it simultaneously, causing lost writes, see my comment below); they are listed in the GCC atomic builtins documentation linked below, and are simple and extremely useful.
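A minimal sketch of the atomic SizeLeft update using the GCC __sync builtins, assuming (as in the description above) that only the IO thread ever decreases SizeLeft and workers only increase it. struct source_space is a hypothetical cut-down stand-in for DataSource, and reserve_space()/release_space() are illustrative names. Because there is a single decrementer, a check followed by an atomic subtraction is safe: concurrent workers can only make SizeLeft larger in between.

```c
#include <assert.h>

/* Simplified stand-in for the DataSource struct: only the shared counter. */
struct source_space {
    int SizeLeft;                    /* free bytes in the circular buffer */
};

/* IO thread only (assumed to be the sole thread that decreases SizeLeft).
 * Returns 1 if `size` bytes were reserved, 0 if there is not enough room. */
int reserve_space(struct source_space *s, int size) {
    assert(size > 0);
    if (__sync_fetch_and_add(&s->SizeLeft, 0) < size)   /* atomic read */
        return 0;                    /* no room: try another source for now */
    __sync_fetch_and_sub(&s->SizeLeft, size);
    return 1;
}

/* Worker threads: give the space back once the packet has been processed. */
void release_space(struct source_space *s, int size) {
    __sync_fetch_and_add(&s->SizeLeft, size);
}
```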
Now, I have given some general background but will address the points given specifically:
Using eventfd as a synchronization mechanism is largely a bad idea: you will find yourself using a fair amount of unneeded CPU time, and it is very hard to perform any synchronization. Particularly if you have multiple threads pick up the same file descriptor, you could have major problems. This is in effect a nasty hack that will work sometimes but is no real substitute for proper synchronization.
It is also a bad idea to try to offload the receive, as explained above. You can get around the issue with complex IPC, but frankly it is unlikely that receiving IO will take enough time to stall your application; your IO is also likely much slower than your CPU, so receiving with multiple threads will gain little (this assumes you do not have, say, several 10 gigabit network cards).
Using mutexes or locks is a silly idea here; the problem fits much better into lockless coding given the low amount of (simultaneously) shared data, since you are really just handing off work and data. This will also boost the performance of the receive thread and make your app far more scalable. Using the functions mentioned here http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Atomic-Builtins.html you can do this nice and easily. If you did do it with locks, what you would need is a semaphore: it can be unlocked (posted) every time a packet is received and locked (waited on) by each thread that starts a job, dynamically letting more threads in when more packets are ready. That would have far less overhead than a homebrew solution with mutexes.
There is not really much difference here from any other thread pool: you spawn a number of threads, then have them all block in mq_receive on the data message queue to wait for messages. When they are done, they send their results back to the main thread, which has added the results message queue to its epoll list. It can then receive results this way; it is simple and very efficient for small data payloads like pointers. This will also use little CPU and not force the main thread to waste time managing workers.
Finally, your edit is fairly sensible, except that, as I have suggested, message queues are far better than pipes here, as they signal events very efficiently, guarantee a full message read and provide automatic framing.
I hope this helps, however it is late so if I missed anything or you have questions feel free to comment for clarification or more explanation.
I posted the same answer in another post: I want to wait on both a file descriptor and a mutex, what's the recommended way to do this?
==========================================================
This is a very commonly seen problem, especially when you are developing network server-side programs. The main loop of most Linux server-side programs looks like this:
epoll_add(serv_sock);
while(1){
ret = epoll_wait();
foreach(ret as fd){
req = fd.read();
resp = proc(req);
fd.send(resp);
}
}
This is a single-threaded (main thread only), epoll-based server framework. The problem is that, being single-threaded rather than multi-threaded, it requires that proc() never blocks or runs for a significant time (say 10 ms in common cases).
If proc() may ever run for a long time, WE NEED MULTIPLE THREADS, and must execute proc() in a separate thread (the worker thread).
We can submit tasks to the worker thread without blocking the main thread by using a mutex-based message queue; it is fast enough.
Then we need a way to obtain the task result from a worker thread. How? We could just check the message queue directly, before or after epoll_wait(); however, the check can only run after epoll_wait() returns, and if none of the file descriptors it waits on are active, epoll_wait() blocks for its whole timeout (say 10 ms in common cases).
For a server, 10 ms is quite a long time! Can we signal epoll_wait() to end immediately when task result is generated?
Yes! I will describe how it is done in one of my open source projects.
Create a pipe shared by all worker threads, and have epoll wait on that pipe as well. Once a task result is generated, the worker thread writes one byte into the pipe, and epoll_wait() returns almost immediately: a Linux pipe has 5 us to 20 us latency.
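A minimal sketch of the pipe wake-up, assuming Linux (pipe2() and epoll). struct wakeup and its functions are hypothetical and only mimic the idea behind SelectableQueue, not its actual code; the mutex-protected result queue itself is omitted.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical wake-up channel: the read end sits in the main thread's epoll
 * set; workers write one byte after pushing a result onto the result queue. */
struct wakeup {
    int rd, wr;
};

int wakeup_init(struct wakeup *w, int epfd) {
    int fds[2];
    if (pipe2(fds, O_NONBLOCK | O_CLOEXEC) < 0)
        return -1;
    w->rd = fds[0];
    w->wr = fds[1];
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = w->rd };
    return epoll_ctl(epfd, EPOLL_CTL_ADD, w->rd, &ev);
}

/* Worker thread: call right after pushing a result. */
void wakeup_notify(struct wakeup *w) {
    char b = 1;
    ssize_t r = write(w->wr, &b, 1);
    (void)r;  /* EAGAIN means the pipe already holds pending wake-ups: fine */
}

/* Main thread: when epoll reports w->rd readable, drain the pipe first,
 * then pop every available result from the result queue. */
void wakeup_drain(struct wakeup *w) {
    char buf[256];
    assert(w->rd >= 0);
    while (read(w->rd, buf, sizeof buf) > 0)
        ;
}
```

Draining the pipe before popping results means that a result pushed after the drain still leaves a byte in the pipe, so no wake-up is lost; at worst the main thread sees a spurious wake-up with an empty result queue.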
In my project SSDB (a Redis-protocol-compatible on-disk NoSQL database), I created a SelectableQueue for passing messages between the main thread and worker threads. As its name suggests, SelectableQueue has a file descriptor that epoll can wait on.
SelectableQueue: https://github.com/ideawu/ssdb/blob/master/src/util/thread.h#L94
Usage in main thread:
epoll_add(serv_sock);
epoll_add(queue->fd());
while(1){
    ret = epoll_wait();
    foreach(ret as fd){
        if(fd is queue->fd()){
            sock, resp = queue->pop_result();
            sock.send(resp);
        }
        if(fd is client_socket){
            req = fd.read();
            queue->add_task(fd, req);
        }
    }
}
Usage in worker thread:
fd, req = queue->pop_task();
resp = proc(req);
queue->add_result(fd, resp);
I am using msgget() function in my IPC based application. How can I clean up the queue filled up with old message queues?
To delete a queue, use the following command:
msgctl(msgQID, IPC_RMID, NULL);
A workaround is to increase MSGMNI, the system-wide maximum number of message queues (policy dependent; on Linux, this limit can be read and modified via /proc/sys/kernel/msgmni).
You can set the O_NONBLOCK attribute on the message queue by using mq_setattr().
Then empty the queue by reading all of the messages until mq_receive() fails with EAGAIN, which indicates the queue is empty.
Finally, restore the old attributes.
This method is not optimized for run time, but it avoids the need to close and reopen the message queue.
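The drain-in-place approach can be sketched like this, assuming a POSIX queue (mqueue.h, linked with -lrt on glibc older than 2.34); drain_mq() is a hypothetical helper name. mq_setattr() only looks at mq_flags, and mq_receive() needs a buffer at least mq_msgsize bytes long.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <mqueue.h>
#include <stdlib.h>

/* Hypothetical helper: discard every message currently in `q` without closing
 * it. Switches to O_NONBLOCK, reads until EAGAIN, then restores the old
 * attributes. Returns the number of messages discarded, or -1 on error. */
long drain_mq(mqd_t q) {
    struct mq_attr old, nb;
    if (mq_getattr(q, &old) < 0)
        return -1;
    nb = old;
    nb.mq_flags = O_NONBLOCK;              /* mq_setattr() only uses mq_flags */
    if (mq_setattr(q, &nb, NULL) < 0)
        return -1;
    assert(old.mq_msgsize > 0);
    char *buf = malloc((size_t)old.mq_msgsize);   /* must hold mq_msgsize bytes */
    long drained = -1;
    if (buf != NULL) {
        drained = 0;
        while (mq_receive(q, buf, (size_t)old.mq_msgsize, NULL) >= 0)
            drained++;
        if (errno != EAGAIN)
            drained = -1;                  /* stopped for a reason other than "empty" */
        free(buf);
    }
    mq_setattr(q, &old, NULL);             /* restore the previous (blocking) mode */
    return drained;
}
```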
These persistent resource allocation issues (there's a similar one with shared memory) are why the System V APIs are generally considered deprecated. In this case, have you considered using a unix domain socket or FIFO instead of a message queue? Those appear in the filesystem, and can be "cleaned up" when no longer used with tools like rm.