zmq_send() blocks on recv()/poll() - c

I am trying to send a few small (8-byte) messages through an inproc PAIR socket. However, after a few sends zmq_send() blocks. When I stop the debugger I see the following stack trace:
libc.so.6!__GI___poll(struct pollfd * fds, nfds_t nfds, int timeout)
libzmq.so.5!poll(int __timeout, nfds_t __nfds, pollfd * __fds)
libzmq.so.5!zmq::signaler_t::wait(zmq::signaler_t * const this, int timeout_)
libzmq.so.5!zmq::mailbox_t::recv(zmq::mailbox_t * const this, zmq::command_t * cmd_, int timeout_)
libzmq.so.5!zmq::socket_base_t::process_commands(zmq::socket_base_t * const this, int timeout_, bool throttle_)
libzmq.so.5!zmq::socket_base_t::send(zmq::socket_base_t * const this, zmq::msg_t * msg_, int flags_)
libzmq.so.5!s_sendmsg(int flags_, zmq_msg_t * msg_, zmq::socket_base_t * s_)
libzmq.so.5!zmq_send(void * s_, const void * buf_, size_t len_, int flags_)
Why does this happen? What commands is ZMQ trying to process? Why does it call recv()? Is it because of the high-water mark? I suspect it is something else, because I send only a small amount of data and the high-water mark should not be reached yet. And if the high-water mark is the only explanation, how can I measure it?

Generally, this will happen when the socket reaches some sort of limit on outstanding sent data; it will then block waiting for an acknowledgement message allowing it to continue.
The precise behavior depends on the zmq_socket type as well as any other configuration you have done on it (setting the high-water mark, for example).
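If you want to rule the high-water mark in or out, here is a minimal sketch (assuming a socket handle named sock that your code has already created) that reads back ZMQ_SNDHWM and uses ZMQ_DONTWAIT so the send reports EAGAIN instead of blocking:
#include <errno.h>
#include <stdio.h>
#include <zmq.h>

/* Sketch only: 'sock' is an inproc PAIR socket created elsewhere. */
static int send_without_blocking(void *sock, const void *buf, size_t len)
{
    int hwm = 0;
    size_t hwm_len = sizeof(hwm);

    /* Read the current send high-water mark (the default is 1000 messages). */
    if (zmq_getsockopt(sock, ZMQ_SNDHWM, &hwm, &hwm_len) == 0)
        printf("ZMQ_SNDHWM = %d\n", hwm);

    /* Non-blocking send: returns -1 with errno == EAGAIN when the message
     * cannot be queued right now. */
    int rc = zmq_send(sock, buf, len, ZMQ_DONTWAIT);
    if (rc == -1 && errno == EAGAIN)
        fprintf(stderr, "send would block: HWM reached or no peer connected\n");
    return rc;
}
With a PAIR socket, EAGAIN here covers both cases that make a blocking zmq_send() wait: the high-water mark being reached and no peer being connected yet.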

It turned out to be a bug in the logic of my code. After sending some messages, the next ones were sent by a different thread using a different socket. That socket was not connected, so zmq_send() was waiting for the connection to be established. This is why its pipe was NULL. Thanks everybody for the help.

Related

Does listen()'s backlog include the count of SYN-received connections for TCP on Linux?

I read some posts and checked the Linux kernel code, such as inet_listen()->inet_csk_listen_start(), and it seems that the backlog argument of the listen() syscall only affects the accept queue, not the SYN-received queue:
sk->sk_max_ack_backlog = backlog;
I.e., symbolically, accept-queue + syn-received-queue != backlog.
I can't figure out what is happening. This article states:
The maximum allowed length of both the Accept and SYN Queues is taken
from the backlog parameter passed to the listen(2) syscall by the
application.
But there is nothing similar in the man page.
Also, in the case of Linux: is backlog a hint, as mentioned here, or does it really limit the queues?
In the case of the 4.3 kernel you specified, the call chain is something like:
tcp_v4_do_rcv()->tcp_rcv_state_process()->tcp_v4_conn_request()->tcp_conn_request()->inet_csk_reqsk_queue_is_full()
Here we can see the most important details about queues:
/* TW buckets are converted to open requests without
 * limitations, they conserve resources and peer is
 * evidently real one.
 */
if ((sysctl_tcp_syncookies == 2 ||
     inet_csk_reqsk_queue_is_full(sk)) && !isn) {
    want_cookie = tcp_syn_flood_action(sk, skb, rsk_ops->slab_name);
    if (!want_cookie)
        goto drop;
}

/* Accept backlog is full. If we have already queued enough
 * of warm entries in syn queue, drop request. It is better than
 * clogging syn queue with openreqs with exponentially increasing
 * timeout.
 */
if (sk_acceptq_is_full(sk) && inet_csk_reqsk_queue_young(sk) > 1) {
    NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_LISTENOVERFLOWS);
    goto drop;
}
Pay attention to inet_csk_reqsk_queue_is_full():
static inline int inet_csk_reqsk_queue_is_full(const struct sock *sk)
{
    return inet_csk_reqsk_queue_len(sk) >= sk->sk_max_ack_backlog;
}
Finally, it compares the current length of the request (SYN-received) queue, kept in icsk_accept_queue, with sk_max_ack_backlog, which was previously set by inet_csk_listen_start(). So yes, in this case backlog does limit the incoming SYN-received queue.
You can see that both sk_acceptq_is_full() and inet_csk_reqsk_queue_is_full() compare against the same socket's sk_max_ack_backlog, which is set through listen():
static inline bool sk_acceptq_is_full(const struct sock *sk)
{
    return sk->sk_ack_backlog > sk->sk_max_ack_backlog;
}
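As a side note from user space: the backlog you pass to listen() is also silently capped by net.core.somaxconn before it ends up in sk_max_ack_backlog (historically the default cap was 128). A small sketch, with an arbitrary port and no error handling, to see the effective cap on your machine:
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    /* Read the system-wide cap that listen() silently applies to backlog. */
    int somaxconn = -1;
    FILE *f = fopen("/proc/sys/net/core/somaxconn", "r");
    if (f) {
        if (fscanf(f, "%d", &somaxconn) != 1)
            somaxconn = -1;
        fclose(f);
    }

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(5555);   /* arbitrary port for illustration */
    bind(fd, (struct sockaddr *)&addr, sizeof(addr));

    /* Ask for a huge backlog; the kernel clamps it to somaxconn before
     * storing it in sk->sk_max_ack_backlog. */
    listen(fd, 100000);
    printf("requested backlog 100000, effective cap (somaxconn) = %d\n", somaxconn);

    close(fd);
    return 0;
}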

linux rpc order and retransmission

We use RPC (over a UDP socket) in our application, and we noticed that RPC retransmits its messages when they are not received (or confirmed) by the target application.
Does RPC preserve the order of messages? Say we have message 1 and message 2: does it wait for message 1 to be confirmed by the receiver before it sends message 2?
Also, I wasn't able to find how many retries it does by default, and if sending fails after x retries, does it log this somewhere so we can inspect it?
Thanks
The RPC library has a call to control how long to wait before retrying the request:
struct timeval tv = { 1, 0 };   /* e.g. retry after 1 second */
clnt_control(cl, CLSET_TIMEOUT, (char *) &tv);
When you invoke an RPC call, you provide the total timeout:
enum clnt_stat clnt_call(CLIENT *clnt, unsigned long procnum,
xdrproc_t inproc, char *in,
xdrproc_t outproc, char *out,
struct timeval tout);
If you divide tout by the value you set with clnt_control(), you get the number of retries.
The sync/async behavior depends on your application only.
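A hedged sketch of how that calculation might look in practice. The host name and the program/version numbers are placeholders; note also that on many SunRPC implementations the per-retransmission interval for UDP clients is set with CLSET_RETRY_TIMEOUT, while the timeout passed to clnt_call() is the total budget:
#include <stdio.h>
#include <rpc/rpc.h>

#define MYPROG 0x20000001   /* placeholder program number */
#define MYVERS 1            /* placeholder version number */

void show_retry_budget(void)
{
    CLIENT *cl = clnt_create("server-host", MYPROG, MYVERS, "udp");
    if (!cl)
        return;

    /* Retransmission interval: resend the request every 2 seconds
     * if no reply has arrived (UDP transports only). */
    struct timeval retry = { 2, 0 };
    clnt_control(cl, CLSET_RETRY_TIMEOUT, (char *)&retry);

    /* Total timeout handed to clnt_call(): give up after 10 seconds,
     * i.e. roughly 10 / 2 = 5 attempts before RPC_TIMEDOUT. */
    struct timeval total = { 10, 0 };
    printf("about %ld retries\n", (long)(total.tv_sec / retry.tv_sec));

    /* enum clnt_stat st = clnt_call(cl, PROCNUM, (xdrproc_t)xdr_void, NULL,
     *                               (xdrproc_t)xdr_void, NULL, total);
     * Nothing is logged automatically on failure; check the returned
     * clnt_stat (e.g. with clnt_perror()) yourself. */
    clnt_destroy(cl);
}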

Circular buffer using pointers in C

I have a queue structure that I attempted to implement using a circular buffer, which I am using in a networking application. I am looking for some guidance and feedback. First, let me present the relevant code.
typedef struct nwk_packet_type
{
uint8_t dest_address[NRF24_ADDR_LEN];
uint8_t data[32];
uint8_t data_len;
}nwk_packet_t;
/* The circular fifo on which outgoing packets are stored */
nwk_packet_t nwk_send_queue[NWK_QUEUE_SIZE];
nwk_packet_t* send_queue_in; /* pointer to queue head */
nwk_packet_t* send_queue_out; /* pointer to queue tail */
static nwk_packet_t* nwk_tx_pkt_allocate(void)
{
/* Make sure the send queue is not full */
if(send_queue_in == (send_queue_out - 1 + NWK_QUEUE_SIZE) % NWK_QUEUE_SIZE)
return 0;
/* return pointer to the next add and increment the tracker */
return send_queue_in++;//TODO: it's not just ++, it has to be modular by packet size
}
/* External facing function for application layer to send network data */
// simply adds the packet to the network queue if there is space
// returns an appropriate error code if anything goes wrong
uint8_t nwk_send(uint8_t* address, uint8_t* data, uint8_t len)
{
/* First check all the parameters */
if(!address)
return NWK_BAD_ADDRESS;
if(!data)
return NWK_BAD_DATA_PTR;
if(!len || len > 32)
return NWK_BAD_DATA_LEN;
//TODO: PROBABLY NEED TO START BLOCKING HERE
/* Allocate the packet on the queue */
nwk_packet_t* packet;
if(!( packet = nwk_tx_pkt_allocate() ))
return NWK_QUEUE_FULL;
/* Build the packet */
memcpy(packet->dest_address, address, NRF24_ADDR_LEN);
memcpy(packet->data, data, len);
packet->data_len = len;
//TODO: PROBABLY SAFE TO STOP BLOCKING HERE
return NWK_SUCCESS;
}
/* Only called during NWK_IDLE, pushes the next item on the send queue out to the chip's "MAC" layer over SPI */
void nwk_transmit_pkt(void)
{
nwk_packet_t tx_pkt = nwk_send_queue[send_queue_out];
nrf24_send(tx_pkt->data, tx_pkt->data_len);
}
/* The callback for transceiver interrupt when a sent packet is either completed or ran out of retries */
void nwk_tx_result_cb(bool completed)
{
if( (completed) && (nwk_tx_state == NWK_SENDING))
send_queue_out++;//TODO: it's not just ++, it has to be modular by packet size with in the buffer
}
Ok, now for a quick explanation and then my questions. So the basic idea is that I've got this queue for data which is being sent onto the network. The function nwk_send() can be called from anywhere in the application code, which by the way runs on a small pre-emptive task-based operating system (FreeRTOS), so the call can happen from lots of places in the code and be interrupted by the OS tick interrupt.
Now, since that function modifies the pointers into the global queue, I know it needs to block while doing that. Am I correct in my comments in the code about where I should be blocking (i.e. disabling interrupts)? Also, would it be smarter to make a mutex using a global boolean variable or something, rather than just disabling interrupts?
Also, I think there's a second place I should be blocking, when things are being taken off the queue, but I'm not sure where that is exactly. Is it in nwk_transmit_pkt(), where I'm actually copying the data off the queue and into a local RAM variable?
Final question, how do I achieve the modulus operation on my pointers within the arrays? I feel like it should look something like:
send_queue_in = ((send_queue_in + 1) % (NWK_QUEUE_SIZE*sizeof(nwk_packet_t))) + nwk_send_queue;
Any feedback is greatly appreciated, thank you.
Regarding locking, it is best to use an existing mutex primitive from the OS you use. I am not familiar with FreeRTOS, but it should have built-in primitives for locking between interrupt and user context.
For the circular buffer you may use these:
check for empty queue:
send_queue_in == send_queue_out
check for full queue:
(send_queue_in + 1) % NWK_QUEUE_SIZE == send_queue_out
push element [pseudo code]:
if (queue is full)
    return error;
queue[send_queue_in] = new element;
send_queue_in = (send_queue_in + 1) % NWK_QUEUE_SIZE;
pop element [pseudo code]:
if (queue is empty)
    return error;
element = queue[send_queue_out];
send_queue_out = (send_queue_out + 1) % NWK_QUEUE_SIZE;
It looks like you copy the packet data, rather than just referencing it, before sending. This means you only need to hold the lock until the copy is done.
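For illustration, a minimal sketch of that index-based queue in C. The critical-section macros are stand-ins for whatever your RTOS provides (e.g. FreeRTOS taskENTER_CRITICAL()/taskEXIT_CRITICAL()), and the sizes are arbitrary:
#include <stdint.h>

#define NWK_QUEUE_SIZE 8   /* illustrative size */

typedef struct { uint8_t data[32]; uint8_t data_len; } pkt_t;

static pkt_t queue[NWK_QUEUE_SIZE];
static volatile uint8_t q_in;   /* next free slot */
static volatile uint8_t q_out;  /* oldest element */

/* Stand-ins for your RTOS primitives. */
#define ENTER_CRITICAL()   /* disable interrupts / take mutex */
#define EXIT_CRITICAL()    /* re-enable interrupts / give mutex */

static int q_push(const pkt_t *p)
{
    int rc = -1;
    ENTER_CRITICAL();
    if ((uint8_t)((q_in + 1) % NWK_QUEUE_SIZE) != q_out) {   /* not full */
        queue[q_in] = *p;                                    /* copy in */
        q_in = (q_in + 1) % NWK_QUEUE_SIZE;
        rc = 0;
    }
    EXIT_CRITICAL();
    return rc;
}

static int q_pop(pkt_t *p)
{
    int rc = -1;
    ENTER_CRITICAL();
    if (q_in != q_out) {                                     /* not empty */
        *p = queue[q_out];                                   /* copy out */
        q_out = (q_out + 1) % NWK_QUEUE_SIZE;
        rc = 0;
    }
    EXIT_CRITICAL();
    return rc;
}
Using indices instead of raw pointers makes the wrap-around arithmetic (your "modulus on pointers" question) and the full/empty checks much easier to get right.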
Without an overall driver framework to develop with, and when communicating with interrupt-state on a uC, you need to be very careful.
You cannot use OS synchronization primitives to communicate with interrupt state. Attempting to do so will certainly crash your OS, because interrupt handlers cannot block.
Copying the actual bulk data should be avoided.
On an 8-bit uC, I suggest queueing an index onto a buffer array pool, where the number of buffers is <256. That means that only one byte needs to be queued up and so, with an appropriate queue class that stores the value before updating internal byte-size indexes, it is possible to safely communicate buffers into a tx handler without excessive interrupt-disabling.
Access to the pool array should be thread-safe and 'insertion/deletion' should be quick - I have 'succ/pred' byte-fields in each buffer struct, so forming a double-linked list, access protected by a mutex. As well as I/O, I use this pool of buffers for all inter-thread comms.
For tx, get a buffer struct from the pool, fill it with data, push the index onto a tx queue, and disable interrupts only long enough to determine whether the tx interrupt needs 'priming'. If priming is required, shove in a FIFO-full of data before re-enabling interrupts.
When the tx interrupt-handler has sent the buffer, it can push the 'used' index back onto a 'scavenge' queue and signal a semaphore to make a handler thread run. This thread can then take the entry from the scavenge queue and return it to the pool.
This scheme only works if interrupt-handlers do not re-enable higher-priority interrupts using the same buffering scheme.
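A compressed sketch of that index-queue idea, with the buffer count, field names, and helpers all illustrative (full/empty checks and the pool mutex omitted for brevity):
#include <stdint.h>

#define NUM_BUFS 16   /* must stay below 256 so an index fits in one byte */

typedef struct {
    uint8_t data[32];
    uint8_t len;
    uint8_t next;     /* free-list linkage */
} buf_t;

static buf_t   pool[NUM_BUFS];
static uint8_t free_head;                      /* head of the free list */
static volatile uint8_t tx_queue[NUM_BUFS];    /* single-byte indices */
static volatile uint8_t tx_in, tx_out;

static void pool_init(void)
{
    for (uint8_t i = 0; i < NUM_BUFS; i++)
        pool[i].next = (uint8_t)((i + 1 == NUM_BUFS) ? 0xFF : i + 1);
    free_head = 0;
}

/* Grab a buffer from the pool (call with the pool mutex held). */
static uint8_t buf_alloc(void)
{
    uint8_t i = free_head;
    if (i != 0xFF)
        free_head = pool[i].next;
    return i;   /* 0xFF means the pool is exhausted */
}

/* Queue a filled buffer for transmission: only one byte is written, so
 * the window where interrupts must be disabled is tiny. */
static void tx_enqueue(uint8_t idx)
{
    tx_queue[tx_in] = idx;            /* store the value first...  */
    tx_in = (tx_in + 1) % NUM_BUFS;   /* ...then publish the index */
}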

Caching packets captured from pcap

This is a follow-up question to this:
Rebuilding a packet to inject via pcap
What I want to accomplish:
functionA: Capture packets with pcap. Modify source/destination addresses. Recalculate checksums. Inject with pcap.
functionB: Create two threads. Thread 1 sends a magic packet to wake sleeping client. Thread 2 captures packets with pcap and caches the packets into an array of u_char *'s, since pcap puts the packet data serially into "u_char * packet". When both threads terminate, I then change the headers then inject each of the cached packets.
What I need help with:
functionA: I can do everything but calculate checksums. I tried to verify the original checksum by calculating it myself with a function but they never match. However, this issue is not as important because I don't need it to demo my final project. I understand that if IP checksums are incorrect, the receiving computer will discard the packet. But when I demo, so long as my client computer can be shown to have received this incorrect packet, I have proven my overall concept and will not fail. :)
functionB: I guess this is the more important problem. I don't know of an easy way to cache my captured packets. What I'm working on right now is as follows:
functionB creates a pointer to an array that stores u_char * called cachedPackets. So cachedPackets basically points to an array that stores "strings".
It'll be something like this? u_char ** cachedPackets[100], enough array elements for 100 packets.
After this, I start two threads. Thread1 to wake my sleeping client. Thread2 to open another pcap session so no data is lost while client is waking. Thread1 is easy, I've already tested my send magic packet function independently. Thread2 is where I'm screwing up.
Thread2 eventually calls int pcap_loop(pcap_t *p, int cnt, pcap_handler callback, u_char *user).
callback is the function that will be run after each packet is captured. It is where I will be caching the packet into the array.
callback takes parameters ( u_char* user,
const struct pcap_pkthdr* packet_header,
const u_char* packet_data )
user is the same string in the 4th argument of pcap_loop.
So I was thinking, I could sneakily give my callback function the pointer to the array of strings by casting it.
pcap_loop(asdf, asdf, callback, (u_char *)cachedPackets);
Since I don't know how big the incoming packets will be, I'll dynamically allocate enough space in the callback function. I will also keep track of my position in the array with a static int.
this is what the callback looks like:
void cacheCall(u_char * user, const struct pcap_pkthdr * header, const u_char * packet)
static int cacheindex = 0;
u_char ** cachethis = (u_char **)user;
//u_char * cachething = *cachethis;
(*cachethis)[cacheindex] = (u_char *) malloc(header->len); <--- 497
int i = 0;
for(i = 0; i < header->len; i++)
{
(*cachethis)[cacheindex][i] = packet[i]; <-------------------503
}
//memcpy(cachething[cacheindex], packet, header->len);
cacheindex++;
But when I compile, I get:
497: warning: assignment makes integer from pointer without a cast
503: error: subscripted value is neither array nor pointer
That was pretty longwinded, hopefully my knowledge of what I'm doing isn't completely misinformed. Any help would be awesome! :)
u_char ** cachethis;
cachethis is a pointer-to-pointer-to-u_char.
So:
*cachethis
is a pointer-to-u_char, and:
(*cachethis)[i]
is a plain u_char.
So line 497 tries to store a pointer into a u_char, and line 503 tries to subscript a u_char, both of which are invalid.
Looks like what you want is simply:
cachethis[i]
and
cachethis[i][j]
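Putting that together, a corrected callback might look like the sketch below. It assumes the caller passes a plain array of u_char * as the user argument, and it copies header->caplen bytes (the number of bytes pcap actually captured, which may be less than header->len):
#include <pcap.h>
#include <stdlib.h>
#include <string.h>

#define MAX_CACHED 100

void cacheCall(u_char *user, const struct pcap_pkthdr *header,
               const u_char *packet)
{
    static int cacheindex = 0;

    /* 'user' is really the array of u_char * passed to pcap_loop(). */
    u_char **cache = (u_char **)user;

    if (cacheindex >= MAX_CACHED)
        return;   /* cache full, drop the packet */

    cache[cacheindex] = (u_char *)malloc(header->caplen);
    if (!cache[cacheindex])
        return;

    memcpy(cache[cacheindex], packet, header->caplen);
    cacheindex++;
}

/* Caller side:
 *   u_char *cachedPackets[MAX_CACHED] = { 0 };
 *   pcap_loop(handle, -1, cacheCall, (u_char *)cachedPackets);
 */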

C - How to use both aio_read() and aio_write()

I am implementing a game server where I need to both read and write. So I accept an incoming connection and start reading from it using aio_read(), but when I need to send something, I stop reading using aio_cancel() and then use aio_write(). Within the write's callback I resume reading. So I read all the time, but when I need to send something I pause reading.
It works about 20% of the time; in the other cases the call to aio_cancel() fails with "Operation now in progress", and I cannot cancel it (even in an endless while loop). So my queued write operation never happens.
How should I be using these functions? What did I miss?
EDIT:
Used under Linux 2.6.35. Ubuntu 10 - 32 bit.
Example code:
void handle_read(union sigval sigev_value) { /* handle data or disconnection */ }
void handle_write(union sigval sigev_value) { /* free writing buffer memory */ }
void start()
{
const int acceptorSocket = socket(AF_INET, SOCK_STREAM, 0);
struct sockaddr_in addr;
memset(&addr, 0, sizeof(struct sockaddr_in));
addr.sin_family = AF_INET;
addr.sin_addr.s_addr = INADDR_ANY;
addr.sin_port = htons(port);
bind(acceptorSocket, (struct sockaddr*)&addr, sizeof(struct sockaddr_in));
listen(acceptorSocket, SOMAXCONN);
struct sockaddr_in address;
socklen_t addressLen = sizeof(struct sockaddr_in);
for(;;)
{
const int incomingSocket = accept(acceptorSocket, (struct sockaddr*)&address, &addressLen);
if(incomingSocket == -1)
{ /* handle error ... */}
else
{
//say socket to append outcoming messages at writing:
const int currentFlags = fcntl(incomingSocket, F_GETFL, 0);
if(currentFlags < 0) { /* handle error ... */ }
if(fcntl(incomingSocket, F_SETFL, currentFlags | O_APPEND) == -1) { /* handle another error ... */ }
//start reading:
struct aiocb* readingAiocb = new struct aiocb;
memset(readingAiocb, 0, sizeof(struct aiocb));
readingAiocb->aio_nbytes = MY_SOME_BUFFER_SIZE;
readingAiocb->aio_fildes = socketDesc;
readingAiocb->aio_buf = mySomeReadBuffer;
readingAiocb->aio_sigevent.sigev_notify = SIGEV_THREAD;
readingAiocb->aio_sigevent.sigev_value.sival_ptr = (void*)mySomeData;
readingAiocb->aio_sigevent.sigev_notify_function = handle_read;
if(aio_read(readingAiocb) != 0) { /* handle error ... */ }
}
}
}
//called at any time from server side:
send(void* data, const size_t dataLength)
{
//... some thread-safety precautions not needed here ...
const int cancellingResult = aio_cancel(socketDesc, readingAiocb);
if(cancellingResult != AIO_CANCELED)
{
//this one happens ~80% of the time - embracing previous call to permanent while cycle does not help:
if(cancellingResult == AIO_NOTCANCELED)
{
puts(strerror(aio_return(readingAiocb))); // "Operation now in progress"
/* don't know what to do... */
}
}
//otherwise it's okay to send:
else
{
aio_write(...);
}
}
If you wish to have separate AIO queues for reads and writes, so that a write issued later can execute before a read issued earlier, then you can use dup() to create a duplicate of the socket, and use one to issue reads and the other to issue writes.
However, I second the recommendations to avoid AIO entirely and simply use an epoll()-driven event loop with non-blocking sockets. This technique has been shown to scale to high numbers of clients - if you are getting high CPU usage, profile it and find out where that's happening, because the chances are that it's not your event loop that's the culprit.
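For reference, the skeleton of such an epoll() loop is quite small. This sketch trims all error handling and uses an arbitrary port:
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

static void set_nonblocking(int fd)
{
    fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK);
}

int main(void)
{
    int listener = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = INADDR_ANY;
    addr.sin_port = htons(5555);   /* arbitrary port */
    bind(listener, (struct sockaddr *)&addr, sizeof(addr));
    listen(listener, SOMAXCONN);
    set_nonblocking(listener);

    int ep = epoll_create1(0);
    struct epoll_event ev;
    memset(&ev, 0, sizeof(ev));
    ev.events = EPOLLIN;
    ev.data.fd = listener;
    epoll_ctl(ep, EPOLL_CTL_ADD, listener, &ev);

    struct epoll_event events[64];
    for (;;) {
        int n = epoll_wait(ep, events, 64, -1);
        for (int i = 0; i < n; i++) {
            int fd = events[i].data.fd;
            if (fd == listener) {                  /* new connection */
                int client = accept(listener, NULL, NULL);
                set_nonblocking(client);
                memset(&ev, 0, sizeof(ev));
                ev.events = EPOLLIN;               /* add EPOLLOUT only while a write is pending */
                ev.data.fd = client;
                epoll_ctl(ep, EPOLL_CTL_ADD, client, &ev);
            } else {                               /* readable client socket */
                char buf[4096];
                ssize_t r = read(fd, buf, sizeof(buf));
                if (r == 0) { close(fd); continue; }   /* peer closed */
                if (r < 0)  continue;                  /* EAGAIN or error */
                /* parse buf, write() replies directly; if write() returns
                 * EAGAIN, register EPOLLOUT and finish the send later. */
            }
        }
    }
}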
First of all, consider dumping aio. There are lots of other ways to do asynchronous I/O that are not as braindead (yes, aio is braindead). There are lots of alternatives; if you're on Linux you can use libaio (io_submit and friends). aio(7) mentions this.
Back to your question.
I haven't used aio in a long time but here's what I remember. aio_read and aio_write both put requests (aiocb) on some queue. They return immediately even if the requests will complete some time later. It's entirely possible to queue multiple requests without caring what happened to the earlier ones. So, in a nutshell: stop cancelling read requests and keep adding them.
/* populate read_aiocb */
rc = aio_read(&read_aiocb);
/* time passes ... */
/* populate write_aiocb */
rc = aio_write(&write_aiocb);
Later you're free to wait using aio_suspend, poll using aio_error, wait for signals etc.
I see you mention epoll in your comment. You should definitely go for libaio.
Unless I'm mistaken, POSIX AIO (that is, aio_read(), aio_write() and so on) is guaranteed to work only on seekable file descriptors. From the aio_read() manpage:
The data is read starting at the absolute file offset aiocbp->aio_offset, regardless of the current file position. After this request, the value of the current file position is unspecified.
For devices which do not have an associated file position such as network sockets, AFAICS, POSIX AIO is undefined. Perhaps it happens to work on your current setup, but that seems more by accident than by design.
Also, on Linux, POSIX AIO is implemented in glibc with the help of userspace threads.
That is, where possible use non-blocking IO and epoll(). However, epoll() does not work for seekable file descriptors such as regular files (same goes for the classical select()/poll() as well); in that case POSIX AIO is an alternative to rolling your own thread pool.
There should be no reason to stop or cancel an aio read or write request just because you need to make another read or write. If that were the case, it would defeat the whole point of asynchronous reading and writing, since its main purpose is to let you set up a read or write operation and then move on. Since multiple requests can be queued, it would be much better to set up a couple of asynchronous reader/writer pools, where you grab a set of pre-initialized aiocb structures from an "available" pool whenever you need them, and return them to a "finished" pool when they're done and you can access the buffers they point to. While they're in the middle of an asynchronous read or write, they sit in a "busy" pool and aren't touched. That way you won't have to keep creating aiocb structures on the heap every time you need to make a read or write operation, although that's okay to do ... it's just not very efficient if you never plan on going over a certain limit, or plan to have only a certain number of "in-flight" requests.
BTW, keep in mind that with a couple of different in-flight asynchronous requests, your asynchronous read/write handler can itself be interrupted by another read/write event. So you really don't want to be doing a whole lot in your handler. In the scenario described above, your handler would basically move the aiocb struct that triggered it from one pool to the next in the "available" -> "busy" -> "finished" progression. Your main code, after reading from the buffer pointed to by an aiocb structure in the "finished" pool, would then move the structure back to the "available" pool.
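A rough sketch of those three pools, with the slot count and field names purely illustrative (locking between the completion handler and the main code is omitted):
#include <aio.h>
#include <string.h>

#define POOL_SIZE 8
#define BUF_SIZE  4096

enum slot_state { SLOT_AVAILABLE, SLOT_BUSY, SLOT_FINISHED };

struct aio_slot {
    struct aiocb             cb;
    char                     buf[BUF_SIZE];
    volatile enum slot_state state;
};

static struct aio_slot pool[POOL_SIZE];

/* Grab a pre-initialised slot instead of malloc'ing an aiocb each time. */
static struct aio_slot *slot_take(int fd)
{
    for (int i = 0; i < POOL_SIZE; i++) {
        if (pool[i].state == SLOT_AVAILABLE) {
            memset(&pool[i].cb, 0, sizeof(pool[i].cb));
            pool[i].cb.aio_fildes = fd;
            pool[i].cb.aio_buf    = pool[i].buf;
            pool[i].cb.aio_nbytes = BUF_SIZE;
            pool[i].state = SLOT_BUSY;
            return &pool[i];
        }
    }
    return NULL;   /* too many in-flight requests */
}

/* Called from the completion handler: just mark the slot finished.
 * The main loop consumes the buffer and sets the state back to
 * SLOT_AVAILABLE when it is done with it. */
static void slot_finish(struct aio_slot *s)
{
    s->state = SLOT_FINISHED;
}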
