How can I reliably make send(2) do a short send?

send(2) takes a buffer and a buffer length. It can return either an error, or some number of bytes successfully sent up to the size of the buffer length. In some cases, send will send fewer than the number of bytes requested (e.g. https://stackoverflow.com/a/2618755/939259).
Is there a way to consistently trigger a short send in a unit test, other than sending a big message and firing a signal from another thread and hoping to get lucky?

Just roll your own:
#include <sys/types.h>
#include <sys/socket.h>
#include <stdlib.h>

ssize_t mysend(int fd, const void *buff, size_t len, int flags)
{
#if WANT_PARTIAL_SEND
    if (len > 1)
        len = 1 + (size_t)rand() % (len - 1);   /* randomly pass on only part of the buffer */
#endif
    return send(fd, buff, len, flags);
}
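The short send only matters if the caller handles it; here is a minimal sketch of the kind of retry loop this wrapper is meant to exercise (send_all is a hypothetical helper name, not part of the original answer):
/* Retry loop exercised by mysend()'s partial sends. */
ssize_t send_all(int fd, const void *buff, size_t len, int flags)
{
    const char *p = buff;
    size_t remaining = len;

    while (remaining > 0) {
        ssize_t n = mysend(fd, p, remaining, flags);
        if (n < 0)
            return -1;                  /* caller inspects errno */
        p += n;
        remaining -= (size_t)n;
    }
    return (ssize_t)len;
}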

If you pack the code-to-be-tested into a shared library (or a static library with its symbols weakened), then your testing executable (which links against the library) will be able to override send both for itself and for the libraries it links.
Example (overrides write rather than send):
#!/bin/sh -eu
cat > code.c <<EOF
#include <unistd.h>
#include <stdio.h>
ssize_t hw(void)
{
    static const char b[] = "hello world\n";
    return write(1, b, sizeof(b) - 1);
}
void libfunc(void)
{
    puts(__func__);
    hw();
}
EOF
cat > test.c <<'EOF'
#include <stdio.h>
void libfunc(void);
ssize_t hw(void);
#if TEST
ssize_t hw(void)
{
    puts("override");
    return 42;
}
#endif
int main()
{
    libfunc();
    puts("====");
    printf("%zd\n", hw());
}
EOF
gcc code.c -fpic -shared -o libcode.so
gcc test.c $PWD/libcode.so -o real
gcc -DTEST test.c $PWD/libcode.so -o mocked
set -x
./real
./mocked
Example output:
hello world
hello world
libfunc
====
12
libfunc
override
====
override
42
This overshadows the libc implementation of the symbol, and while there are mechanisms for accessing the overridden symbol (namely dlsym(RTLD_NEXT, …) and/or -Wl,--wrap), you shouldn't need to access it in a unit test (if you do need it in other unit tests, it's simplest to just put those other unit tests in a different program).
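For reference, a minimal sketch of the -Wl,--wrap route mentioned above: linking the test binary with gcc ... -Wl,--wrap=send makes the linker route calls to send() through __wrap_send(), with the real libc entry reachable as __real_send(). The halving policy below is only an example, and --wrap rewrites only calls in the objects being linked, not calls made from inside already-built shared libraries.
#include <sys/types.h>
#include <sys/socket.h>

/* Provided by the linker when building with -Wl,--wrap=send. */
ssize_t __real_send(int fd, const void *buf, size_t len, int flags);

/* Every send() call in the wrapped objects lands here. */
ssize_t __wrap_send(int fd, const void *buf, size_t len, int flags)
{
    if (len > 1)
        len /= 2;                       /* force a short send */
    return __real_send(fd, buf, len, flags);
}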

send(2) on a blocking socket by default returns only when all the data has been copied into the send buffer.
The possible ways to force it to send fewer bytes depend highly on the circumstances.
If you
1. can access the socket
2. do not want to alter the behaviour of all calls to send in your linked binary,
then you could set the socket non-blocking. A call to send will then send as many octets as possible; the number of octets sent depends mainly on the amount of free memory in the send buffer of the socket you want to send on.
Thus, if you have
uint8_t my_data[NUM_BYTES_TO_SEND] = {0}; /* We don't care what your buffer actually contains in this example ... */
size_t num_bytes = sizeof(my_data);
send(fd, my_data, num_bytes, 0);
and want send to send fewer than num_bytes, you could try to decrease the send buffer of your socket fd.
Whether this is possible, and how to accomplish it, may depend on your OS.
Under Linux, you could try to shrink the send buffer by setting its size manually with setsockopt(2) via the option SO_SNDBUF, described in the man page socket(7):
uint8_t my_data[NUM_BYTES_TO_SEND] = {0};
size_t num_bytes = sizeof(my_data);
int max_bytes_to_send = (int)num_bytes - 1; /* Force at most 1 byte less than our buffer holds; SO_SNDBUF expects an int */
/* Set the socket non-blocking - you should check status afterwards */
int status = fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK);
/* Reduce the size of the send buffer below the number of bytes you want to send */
setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &max_bytes_to_send, sizeof(max_bytes_to_send));
...
send(fd, my_data, num_bytes, 0);
/* Possibly restore the old socket state */
Possibly you also have to fiddle around with the option SO_SNDBUFFORCE.
Further info:
Setting non-blocking: How do I change a TCP socket to be non-blocking?
Info on SO_SNDBUF etc.: What are SO_SNDBUF and SO_RECVBUF
Anyhow, the best way to go depends on the circumstances.
If you are looking for a reliable way to check code you perhaps cannot even access, but link into your project dynamically, you might go with the other approach suggested here: overshadow the send symbol in your compiled code.
On the other hand, that will impact all calls to send in your code (you could, of course, bypass this problem, e.g. by having your send replacement depend on some flag you can set).
If you can access the socket fd and want only specific send calls to be impacted (as I guess is the case with you, since you talk about unit tests, and checking for sending fewer bytes than expected in all your tests is probably not what you want), then shrinking the send buffer could be the way to go.
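Putting the pieces together, a minimal sketch of shrinking the send buffer around one test send and restoring it afterwards, assuming fd is already connected and set non-blocking as above (note that Linux internally doubles the SO_SNDBUF value you set, so the restore is an approximation):
int old_size;
socklen_t optlen = sizeof(old_size);
getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &old_size, &optlen);

int small_size = (int)num_bytes - 1;            /* smaller than the payload */
setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &small_size, sizeof(small_size));

ssize_t sent = send(fd, my_data, num_bytes, 0); /* may now be short */

setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &old_size, sizeof(old_size));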

Related

Network diagnostics for ZeroMQ Example

I am trying to implement ZeroMQ to get an application on a Raspberry Pi 3 (Raspbian Stretch) to communicate with an application on a separate machine (in this case Windows 7 64bit OS) linked by a wired or WLAN connection.
I have compiled ZeroMQ with the C library interface on both machines (using Cygwin on Windows) and the Hello World example (which I modified slightly to print the pointer values to assure me that the functions were 'working'). Both machines are connected (in this case via a wired Ethernet link and a router) and the connection is good (I link to RPi from PC via Xrdp or SSH OK).
The problem I have is that the client/server ZeroMQ programs don't appear to be 'seeing' each other even though they do appear to work and my question is: What are the first steps I should take to investigate why this is happening? Are there any commandline or GUI tools that can help me find out what's causing the blockage? (like port activity monitors or something?).
I know very little about networking so consider me a novice in all things sockety/servicey in your reply. The source code on the RPi (server) is:
// ZeroMQ Test Server
// Compile with
// gcc -o zserver zserver.c -lzmq
#include <zmq.h>
#include <stdio.h>
#include <unistd.h>
#include <string.h>
#include <assert.h>
int main (void)
{
    void *context = NULL, *responder = NULL;
    int rc = 1;
    // Socket to talk to clients
    context = zmq_ctx_new ();
    printf("Context pointer = %p\n", context);
    responder = zmq_socket (context, ZMQ_REP);
    printf("Responder pointer = %p\n", responder);
    rc = zmq_bind (responder, "tcp://*:5555");
    printf("rc = %d\n", rc);
    assert (rc == 0);
    while (1) {
        char buffer [10];
        zmq_recv (responder, buffer, 10, 0);
        printf ("Received Hello\n");
        sleep (1); // Do some 'work'
        zmq_send (responder, "World", 5, 0);
    }
    return 0;
}
The source code on the PC (Cygwin) client is:
// ZeroMQ Test Client
// Compile with:
// gcc -o zclient zclient.c -L/usr/local/lib -lzmq
#include <zmq.h>
#include <string.h>
#include <stdio.h>
#include <unistd.h>
int main (void)
{
    void *context = NULL, *requester = NULL;
    printf ("Connecting to hello world server\n");
    context = zmq_ctx_new ();
    printf("Context pointer = %p\n", context);
    requester = zmq_socket (context, ZMQ_REQ);
    printf("Requester pointer = %p\n", requester);
    zmq_connect (requester, "tcp://localhost:5555");
    int request_nbr;
    for (request_nbr = 0; request_nbr != 10; request_nbr++) {
        char buffer [10];
        printf ("Sending Hello %d\n", request_nbr);
        zmq_send (requester, "Hello", 5, 0);
        zmq_recv (requester, buffer, 10, 0);
        printf ("Received World %d\n", request_nbr);
    }
    zmq_close (requester);
    zmq_ctx_destroy (context);
    return 0;
}
On the RPi LXTerminal I run the server and get this:
Context pointer = 0xefe308
Responder pointer = 0xf00e08
rc = 0
and on the Cygwin Bash shell I run the client and get this:
Connecting to hello world server
Context pointer = 0x60005ab90
Requester pointer = 0x60005f890
Sending Hello 0
... and there they both hang - one listening, the other sending but neither responding to each other.
Any clue how to start investigating this would be appreciated.
+1 for taking care to release resources with explicit zmq_close() and zmq_ctx_term() calls ...
In case this is the first time to work with ZeroMQ,
one may here enjoy to first look at "ZeroMQ Principles in less than Five Seconds" before diving into further details
Q : What are the first steps I should take to investigate why this is happening?
A Line-of-Sight test as a step zero makes no sense here.
All localhost-placed interfaces are hard to not "see" one another.
Next, test as a first step call { .bind() | .connect() }-methods using an explicit address like tcp://127.0.0.1:56789 ( so as to avoid the expansion of both the *-wildcard and the localhost-symbolic name translations )
Always be ready to read/evaluate the API-provided errno that ZeroMQ keeps reporting about the last ZeroMQ API operation's resulting error-state.
Best read the ZeroMQ native API documentation, which is well maintained from version to version, so as to fully understand the comfort of the API-designed signaling/messaging meta-plane.
Mea Culpa: the LoS is sure not to have been established by the O/P code:
RPi .bind()-s on its local I/F ( and cannot do otherwise )
PC .connect()-s not to that of RPi, but to the PC's local I/F
PC .connect( "tcp://<address_of_RPi>:5555" ) will make it ( use the same IP-address as you use in Xrdp or SSH to connect to the RPi, or read one explicitly from an RPi CLI-terminal after ~$ ip address and use that one in the PC-side client code )
Two disjoint ZeroMQ AccessPoint-s have zero way to communicate, once there is no transport-"wire" from A to B.
// Zero MQ Test Server
// Compile with
// gcc -o zserver zserver.c -lzmq
#include <zmq.h>
#include <stdio.h>
#include <unistd.h>
#include <string.h>
#include <assert.h>
int main (void)
{
    void *context = NULL, *responder = NULL;
    int rc = 1;
    // Socket to talk to clients
    context = zmq_ctx_new ();                  printf("Context pointer = %p\n", context);
    responder = zmq_socket (context, ZMQ_REP); printf("Responder pointer = %p\n", responder);
    rc = zmq_bind (responder, "tcp://*:5555"); printf("rc = %d\n", rc);
    /* ----------------------------------^^^^^^------------RPi interface-----------*/
    assert (rc == 0);
    while (1) {
        char buffer [10];
        zmq_recv (responder, buffer, 10, 0);   printf("Received Hello\n");
        sleep (1); // Do some 'work'
        zmq_send (responder, "World", 5, 0);
    }
    return 0;
}
The source code on the PC (Cygwin) client is:
// ZeroMQ Test Client
// Compile with:
// gcc -o zclient zclient.c -L/usr/local/lib -lzmq
#include <zmq.h>
#include <string.h>
#include <stdio.h>
#include <unistd.h>
int main (void)
{
    void *context = NULL, *requester = NULL;
    printf("Connecting to hello world server\n");
    context = zmq_ctx_new ();                  printf("Context pointer = %p\n", context);
    requester = zmq_socket (context, ZMQ_REQ); printf("Requester pointer = %p\n", requester);
    zmq_connect (requester, "tcp://localhost:5555");
    /*---------------------------------^^^^^^^^^^^^^^---------PC-local-interface------*/
    int request_nbr;
    for (request_nbr = 0; request_nbr != 10; request_nbr++) {
        char buffer [10];                      printf("Sending Hello %d\n", request_nbr);
        zmq_send (requester, "Hello", 5, 0);
        zmq_recv (requester, buffer, 10, 0);   printf("Received World %d\n", request_nbr);
    }
    zmq_close (requester);
    zmq_ctx_destroy (context);
    return 0;
}
May like to also read more on ZeroMQ-related subjects here
Epilogue :
The trouble reported in the O/P is actually masked and remains hidden from being detectable by the API. ZeroMQ permits one AccessPoint to have 0+ transport-class-connections simultaneously, given a proper syntax and other conditions are met.
A call to zmq_connect( requester, "tcp://<address-not-intended-but-correct>:<legal-port>" ) will result in a legally-fair state, and none of the defined and documented cases of possible error-states would get reported, because none of them did actually happen:
EINVAL
The endpoint supplied is invalid.
EPROTONOSUPPORT
The requested transport protocol is not supported.
ENOCOMPATPROTO
The requested transport protocol is not compatible with the socket type.
ETERM
The ØMQ context associated with the specified socket was terminated.
ENOTSOCK
The provided socket was invalid.
EMTHREAD
No I/O thread is available to accomplish the task.
There is some chance to at least somehow-"detect" the trouble by enforcing another sort of exception/error, deferred into the call of { zmq_send() | zmq_recv() } in their non-blocking form, where these may turn into reporting EAGAIN, or possibly EFSM for not having completed the end-to-end re-confirmed ZMTP-protocol handshaking ( no counterparty was, or ever would be, met on the PC-localhost-port by the remote RPi-server-side ). This also requires prior setting of zmq_setsockopt( requester, ZMQ_IMMEDIATE, 1 ) and other configuration details.
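A minimal sketch of that client-side detection, reusing the requester socket from the client code above (ZMQ_IMMEDIATE needs ZeroMQ v4.0+; EAGAIN comes from <errno.h>):
int on = 1;
zmq_setsockopt(requester, ZMQ_IMMEDIATE, &on, sizeof(on));

/* With ZMQ_IMMEDIATE set, nothing is queued towards peers whose
   connection never completed, so a non-blocking send fails fast. */
if (zmq_send(requester, "Hello", 5, ZMQ_DONTWAIT) == -1
    && zmq_errno() == EAGAIN) {
    /* no completed connection - the transport-"wire" is missing */
}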
Next, in ZeroMQ v4.+, there is a chance to inspect a subset of the AccessPoint's internally reported events, using an "inspection-socket" via a rather complex strategy of instantiating int zmq_socket_monitor (void *socket, char *endpoint, int events); attached to the AccessPoint's internals via the inproc:// transport-class ~ here "inproc://myPCsocketAccessPOINT_monitor", like this:
rc = zmq_socket_monitor( responder,                                // AccessPoint to monitor
                         "inproc://myPCsocketAccessPOINT_monitor", // symbolic name
                         ZMQ_ALL_EVENTS                            // scope of Events
                         );
Such created internal monitoring "inspection-socket" may next get zmq_connect()-ed to like:
void *my_end_of_monitor_socket = zmq_socket ( context, ZMQ_PAIR );
rc = zmq_connect( my_end_of_monitor_socket,                // local-end PAIR-socket AccessPoint
                  "inproc://myPCsocketAccessPOINT_monitor" // symbolic name
                  );
and finally, we can use this to read a sequence of events (and act accordingly):
int event = get_monitor_event( my_end_of_monitor_socket, NULL, NULL );
if (event == ZMQ_EVENT_CONNECT_DELAYED) { ...; }
if (event == ... ) { ...; }
using as a tool a trivialised get_monitor_event() like this, which handles some of the internal rules of reading and interpreting the multi-part messages that arrive from the instantiated "internal"-monitor attached to the AccessPoint:
// Read one event off the monitor socket; return value and address
// by reference, if not null, and event number by value. Returns -1
// in case of error.
static int
get_monitor_event ( void *monitor, int *value, char **address )
{
    // First frame in message contains event number and value
    zmq_msg_t msg;
    zmq_msg_init (&msg);
    if (zmq_msg_recv (&msg, monitor, 0) == -1)
        return -1;                              // Interrupted, presumably
    assert (zmq_msg_more (&msg));

    uint8_t *data = (uint8_t *) zmq_msg_data (&msg);
    uint16_t event = *(uint16_t *) (data);
    if (value)
        *value = *(uint32_t *) (data + 2);

    // Second frame in message contains event address
    zmq_msg_init (&msg);
    if (zmq_msg_recv (&msg, monitor, 0) == -1)
        return -1;                              // Interrupted, presumably
    assert (!zmq_msg_more (&msg));

    if (address) {
        uint8_t *data = (uint8_t *) zmq_msg_data (&msg);
        size_t size = zmq_msg_size (&msg);
        *address = (char *) malloc (size + 1);
        memcpy (*address, data, size);
        (*address)[size] = 0;
    }
    return event;
}
What internal-API-events can be monitored ?
As of the state of v4.2 API, there is this set of "internal"-monitor(able) internal-API-events:
ZMQ_EVENT_CONNECTED
The socket has successfully connected to a remote peer. The event value is the file descriptor (FD) of the underlying network socket. Warning: there is no guarantee that the FD is still valid by the time your code receives this event.
ZMQ_EVENT_CONNECT_DELAYED
A connect request on the socket is pending. The event value is unspecified.
ZMQ_EVENT_CONNECT_RETRIED
A connect request failed, and is now being retried. The event value is the reconnect interval in milliseconds. Note that the reconnect interval is recalculated at each retry.
ZMQ_EVENT_LISTENING
The socket was successfully bound to a network interface. The event value is the FD of the underlying network socket. Warning: there is no guarantee that the FD is still valid by the time your code receives this event.
ZMQ_EVENT_BIND_FAILED
The socket could not bind to a given interface. The event value is the errno generated by the system bind call.
ZMQ_EVENT_ACCEPTED
The socket has accepted a connection from a remote peer. The event value is the FD of the underlying network socket. Warning: there is no guarantee that the FD is still valid by the time your code receives this event.
ZMQ_EVENT_ACCEPT_FAILED
The socket has rejected a connection from a remote peer. The event value is the errno generated by the accept call.
ZMQ_EVENT_CLOSED
The socket was closed. The event value is the FD of the (now closed) network socket.
ZMQ_EVENT_CLOSE_FAILED
The socket close failed. The event value is the errno returned by the system call. Note that this event occurs only on IPC transports.
ZMQ_EVENT_DISCONNECTED
The socket was disconnected unexpectedly. The event value is the FD of the underlying network socket. Warning: this socket will be closed.
ZMQ_EVENT_MONITOR_STOPPED
Monitoring on this socket ended.
ZMQ_EVENT_HANDSHAKE_FAILED
The ZMTP security mechanism handshake failed. The event value is unspecified.
NOTE: in DRAFT state, not yet available in stable releases.
ZMQ_EVENT_HANDSHAKE_SUCCEED
NOTE: as new events are added, the catch-all value will start returning them. An application that relies on a strict and fixed sequence of events must not use ZMQ_EVENT_ALL in order to guarantee compatibility with future versions.
Each event is sent as two frames. The first frame contains an event number (16 bits), and an event value (32 bits) that provides additional data according to the event number. The second frame contains a string that specifies the affected TCP or IPC endpoint.
In zmq_connect, you must indicate the IP address of the Raspberry (the machine which executed zmq_bind).
It should have been:
// on PC, remote ip is the raspberry one, the one you use for ssh for instance
rc = zmq_connect(requester, "tcp://<remote ip>:5555");

How to set interrupt with serial on linux?

I want to set up an interrupt for the serial port on Linux, so I am doing it with a signal. The signal handler works, but I don't know how to get the number of characters available. Specifically, I am not sure what to pass as the third parameter of the read() function when the handler is called by the system. So I need a solution that tells me the amount of serial data available.
Thank you all.
PS: My English is not good, so the above may not be clearly expressed.
void serialHandler(int sig)
{
    read(fd, buffer, /* I don't know */);
}
Specifically I am not sure the third parameter in read() function when the handler is called by system
read() is described fully here, and includes the following example:
#include <sys/types.h>
#include <unistd.h>
...
char buf[20];
size_t nbytes;
ssize_t bytes_read;
int fd;
...
nbytes = sizeof(buf);
bytes_read = read(fd, buf, nbytes);
It is common to use a loop construct (for example around code similar to that shown above) while testing the output of read for an exit criterion. In the above (non-looped) implementation, bytes_read contains the number of bytes successfully read. If end-of-file (EOF) is encountered, read returns 0, and if a read error occurs, it returns -1 and sets errno to a nonzero value; in both cases the returned value can differ from the count requested in the nbytes parameter.
Note: As mentioned in the comments, using read() in conjunction with a serial port most likely precludes it ever seeing an EOF condition.
Also, to expand on the comment about using timeouts with read(): a timeout for the read call itself can be implemented with the select() function, as in the sketch below.
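A minimal sketch of such a timed read (read_timeout is a name chosen here for illustration):
#include <sys/select.h>
#include <unistd.h>

/* Wait up to 'seconds' for data on fd, then read at most 'count' bytes.
   Returns the read() result, 0 on timeout, or -1 on error. */
ssize_t read_timeout(int fd, void *buf, size_t count, int seconds)
{
    fd_set rfds;
    struct timeval tv = { .tv_sec = seconds, .tv_usec = 0 };

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);

    int r = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (r <= 0)
        return r;                    /* 0: timed out, -1: select() error */
    return read(fd, buf, count);
}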
There is more information here to help with creating algorithms to read from port.

Linux TCP recv() with MSG_TRUNC - writes to buffer?

I've just encountered a surprising buffer overflow, while trying to use the flag MSG_TRUNC in recv on a TCP socket.
And it seems to only happen with gcc (not clang) and only when compiling with optimization.
According to this link: http://man7.org/linux/man-pages/man7/tcp.7.html
Since version 2.4, Linux supports the use of MSG_TRUNC in the flags argument of recv(2) (and recvmsg(2)). This flag causes the received bytes of data to be discarded, rather than passed back in a caller-supplied buffer. Since Linux 2.4.4, MSG_PEEK also has this effect when used in conjunction with MSG_OOB to receive out-of-band data.
Does this mean that a supplied buffer will not be written to? I expected so, but was surprised.
If you pass a buffer (non-zero pointer) and size bigger than the buffer size, it results in buffer overflow when client sends something bigger than buffer. It doesn't actually seem to write the message to the buffer if the message is small and fits in the buffer (no overflow).
Apparently if you pass a null pointer the problem goes away.
Client is a simple netcat sending a message bigger than 4 characters.
Server code is based on:
http://www.linuxhowtos.org/data/6/server.c
Changed read to recv with MSG_TRUNC, and buffer size to 4 (bzero to 4 as well).
Compiled on Ubuntu 14.04. These compilations work fine (no warnings):
gcc -o server.x server.c
clang -o server.x server.c
clang -O2 -o server.x server.c
This is the buggy (?) compilation, it also gives a warning hinting about the problem:
gcc -O2 -o server.x server.c
Anyway like I mentioned changing the pointer to null fixes the problem, but is this a known issue? Or did I miss something in the man page?
UPDATE:
The buffer overflow happens also with gcc -O1.
Here is the compilation warning:
In function ‘recv’,
inlined from ‘main’ at server.c:47:14:
/usr/include/x86_64-linux-gnu/bits/socket2.h:42:2: warning: call to ‘__recv_chk_warn’ declared with attribute warning: recv called with bigger length than size of destination buffer [enabled by default]
return __recv_chk_warn (__fd, __buf, __n, __bos0 (__buf), __flags);
Here is the buffer overflow:
./server.x 10003
* buffer overflow detected *: ./server.x terminated
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x7338f)[0x7fcbdc44b38f]
/lib/x86_64-linux-gnu/libc.so.6(__fortify_fail+0x5c)[0x7fcbdc4e2c9c]
/lib/x86_64-linux-gnu/libc.so.6(+0x109b60)[0x7fcbdc4e1b60]
/lib/x86_64-linux-gnu/libc.so.6(+0x10a023)[0x7fcbdc4e2023]
./server.x[0x400a6c]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7fcbdc3f9ec5]
./server.x[0x400879]
======= Memory map: ========
00400000-00401000 r-xp 00000000 08:01 17732 > /tmp/server.x
... more messages here
Aborted (core dumped)
And gcc version:
gcc (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4
The buffer and recv call:
char buffer[4];
n = recv(newsockfd,buffer,255,MSG_TRUNC);
And this seems to fix it:
n = recv(newsockfd,NULL,255,MSG_TRUNC);
This will not generate any warnings or errors:
gcc -Wall -Wextra -pedantic -o server.x server.c
And here is the complete code:
/* A simple server in the internet domain using TCP.
   The port number is passed as an argument. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

void error(const char *msg)
{
    perror(msg);
    exit(1);
}

int main(int argc, char *argv[])
{
    int sockfd, newsockfd, portno;
    socklen_t clilen;
    char buffer[4];
    struct sockaddr_in serv_addr, cli_addr;
    int n;

    if (argc < 2) {
        fprintf(stderr, "ERROR, no port provided\n");
        exit(1);
    }
    sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd < 0)
        error("ERROR opening socket");
    bzero((char *) &serv_addr, sizeof(serv_addr));
    portno = atoi(argv[1]);
    serv_addr.sin_family = AF_INET;
    serv_addr.sin_addr.s_addr = INADDR_ANY;
    serv_addr.sin_port = htons(portno);
    if (bind(sockfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr)) < 0)
        error("ERROR on binding");
    listen(sockfd, 5);
    clilen = sizeof(cli_addr);
    newsockfd = accept(sockfd, (struct sockaddr *) &cli_addr, &clilen);
    if (newsockfd < 0)
        error("ERROR on accept");
    bzero(buffer, 4);
    n = recv(newsockfd, buffer, 255, MSG_TRUNC);
    if (n < 0)
        error("ERROR reading from socket");
    printf("Here is the message: %s\n", buffer);
    n = write(newsockfd, "I got your message", 18);
    if (n < 0)
        error("ERROR writing to socket");
    close(newsockfd);
    close(sockfd);
    return 0;
}
UPDATE:
Happens also on Ubuntu 16.04, with gcc version:
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.2) 5.4.0 20160609
I think you have misunderstood.
With datagram sockets, MSG_TRUNC option behaves as described in man 2 recv man page (at Linux man pages online for most accurate and up to date information).
With TCP sockets, the explanation in the man 7 tcp man page is a bit poorly worded. I believed it was not a discard flag, but a truncate (or "throw away the rest") operation. However, the implementation (in particular, the net/ipv4/tcp.c:tcp_recvmsg() function in the Linux kernel, which handles the details for TCP/IPv4 and TCP/IPv6 sockets) indicates otherwise.
There is also a separate MSG_TRUNC socket flag. These are stored in the error queue associated with the socket, and can be read using recvmsg(socketfd, &msg, MSG_ERRQUEUE). It indicates a datagram that was read was longer than the buffer, so some of it was lost (truncated). This is rarely used, because it is really only relevant to datagram sockets, and there are much easier ways to determine overlength datagrams.
Datagram sockets:
With datagram sockets, the messages are separate, and not merged. When read, the unread part of each received datagram is discarded.
If you use
nbytes = recv(socketfd, buffer, buffersize, MSG_TRUNC);
it means that the kernel will copy up to first buffersize bytes of the next datagram, and discard the rest of the datagram if it is longer (as usual), but nbytes will reflect the true length of the datagram.
In other words, with MSG_TRUNC, nbytes may exceed buffersize, even though only up to buffersize bytes are copied to buffer.
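For example, a short sketch of detecting truncation on a datagram socket fd, using the return-value semantics just described:
char buf[512];
ssize_t n = recv(fd, buf, sizeof buf, MSG_TRUNC);
if (n > (ssize_t) sizeof buf) {
    /* the datagram was n bytes long; only sizeof buf bytes were
       copied and the rest was discarded */
}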
TCP sockets in Linux, kernels 2.4 and later, edited:
A TCP connection is stream-like; there are no "messages" or "message boundaries", just a sequence of bytes flowing. (Although, there can be out-of-band data, but that is not pertinent here).
If you use
nbytes = recv(socketfd, buffer, buffersize, MSG_TRUNC);
the kernel will discard up to next buffersize bytes, whatever is already buffered (but will block until at least one byte is buffered, unless the socket is in non-blocking mode or MSG_TRUNC | MSG_DONTWAIT is used instead). The number of bytes discarded is returned in nbytes.
However, both buffer and buffersize should be valid, because a recv() or recvfrom() call goes through the kernel net/socket.c:sys_recvfrom() function, which verifies buffer and buffersize are valid, and if so, populates the internal iterator structure to match, before calling the aforementioned net/ipv4/tcp.c:tcp_recvmsg().
In other words, the recv() with a MSG_TRUNC flag does not actually try to modify buffer. However, the kernel does check if buffer and buffersize are valid, and if not, will cause the recv() syscall to fail with -EFAULT.
When buffer overflow checks are enabled, GCC and glibc recv() does not just return -1 with errno==EFAULT; it instead halts the program, producing the shown backtraces. Some of these checks include mapping the zero page (where the target of a NULL pointer resides in Linux on x86 and x86-64), in which case the access check done by the kernel (before actually trying to read or write to it) succeeds.
To avoid the GCC/glibc wrappers (so that code compiled with e.g. gcc and clang should behave the same), one can use real_recv() instead,
#define _GNU_SOURCE
#include <unistd.h>
#include <sys/syscall.h>
#include <errno.h>

ssize_t real_recv(int fd, void *buf, size_t n, int flags)
{
    /* syscall() already returns -1 and sets errno on failure,
       so its result can be passed through unchanged. */
    return (ssize_t)syscall(SYS_recvfrom, fd, buf, n, flags, NULL, NULL);
}
which calls the syscall directly. Note that this does not include the pthreads cancellation logic; use this only in single-threaded test programs.
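As a usage sketch, the NULL-pointer experiment suggested further below can be run through this wrapper (newsockfd as in the server code):
errno = 0;
ssize_t n = real_recv(newsockfd, NULL, 255, MSG_TRUNC);
if (n == -1 && errno == EFAULT) {
    /* the kernel itself rejected the NULL buffer */
} else {
    /* the kernel's access check passed, e.g. because the zero page
       happens to be mapped */
}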
In summary, with the stated problem regarding MSG_TRUNC flag for recv() when using TCP sockets, there are several factors complicating the full picture:
recv(sockfd, data, size, flags) actually calls the recvfrom(sockfd, data, size, flags, NULL, NULL) syscall (there is no recv syscall in Linux)
With a TCP socket, recv(sockfd, data, size, MSG_TRUNC) acts as if it were to read up to size bytes into data, if (char *)data+0 to (char *)data+size-1 are valid; it just does not copy them into data. The number of bytes thus skipped is returned.
The kernel verifies data (from (char *)data+0 to (char *)data+size-1, inclusive) is readable, first. (I suspect this check is erroneous, and might be turned into a writability check sometime in the future, so do not rely on this being a readability test.)
Buffer overflow checks can detect the -EFAULT result from the kernel, and instead halts the program with some kind of "out of bounds" error message (with a stack trace)
Buffer overflow checks may make NULL pointer seem like valid from the kernel point of view (because the kernel test is for reading, currently), in which case the kernel verification accepts the NULL pointer as valid. (One can verify if this is the case by recompiling without buffer overflow checks, using e.g. the above real_recv(), and seeing if a NULL pointer causes an -EFAULT result then.)
The reason for such a mapping (one that, if allowed by the hardware and kernel structures, merely exists but is neither readable nor writable) is that any access to it generates a SIGBUS signal, which a (library- or compiler-provided) signal handler can catch and use to dump not just a stack trace, but more details about the exact access (address, the code that attempted the access, and so on).
I do believe the kernel access check treats such mappings readable and writable, because there needs to be a read or write attempt for the signal to be generated.
Buffer overflow checks are done by both the compiler and the C library, so different compilers may implement the checks, and the NULL pointer case, differently.
Nota bene: I’m adding this answer here after all this time, as this is still one of the first results on google for recv buffer overflow MSG_TRUNC, and if someone else ends up here, they’ll save themselves a lot of grief, searching and trial-and-error.
The original question is answered well enough already, but the subtlety I wanted to highlight, is the difference between stream and datagram sockets.
A common code pattern is to use recv( socket_, NULL, 0, MSG_DONTWAIT | MSG_PEEK | MSG_TRUNC ) to find out how much data is queued before a read. This works perfectly for stream sockets (TCP and SCTP), but for datagram sockets (UDP, UDP-Lite and DCCP) it will intermittently trip the buffer-overflow detection, though only if the executable is compiled with gcc and with optimisations enabled. Without optimisations it seems to work perfectly, which means it will sail through development QA, only to fail in staging/live.
Finding this was a total PITA. You’re welcome. ;)
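If you need the queued-data size on a datagram socket sock without tripping those checks, here are two sketches based on the semantics described earlier (FIONREAD returning the next datagram's size is Linux-specific behaviour):
#include <sys/ioctl.h>

int pending = 0;
ioctl(sock, FIONREAD, &pending);     /* size of the next datagram (Linux) */

/* Alternatively: peek with a small but valid buffer; MSG_TRUNC makes
   recv() return the full datagram length, MSG_PEEK leaves it queued. */
char probe;
ssize_t full_len = recv(sock, &probe, sizeof probe,
                        MSG_PEEK | MSG_TRUNC | MSG_DONTWAIT);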

Issue in pcap_set_buffer_size()

#include <stdio.h>
#include <stdlib.h>
#include <pcap.h>

#define BUFFER_SIZE 65535

char errbuf[PCAP_ERRBUF_SIZE];

int main(int argc, char **argv)
{
    int d;
    pcap_if_t *alldevsp;
    pcap_t *pkt_handle;

    if ((pcap_findalldevs(&alldevsp, errbuf)) == -1)
    {
        printf("findalldevices: %s\n", errbuf);
        exit(1);
    }
    printf("Available network devices are\n");
    pcap_if_t *temp = alldevsp;
    while ((temp) != NULL)
    {
        printf("%s: %s\n", (temp)->name, (temp)->description);
        (temp) = (temp)->next;
    }
    pcap_freealldevs(alldevsp);

    pkt_handle = pcap_create("wlan1", errbuf);
    if (pkt_handle == NULL)
    {
        printf("create: %s\n", errbuf);
        exit(1);
    }
    if ((pcap_set_rfmon(pkt_handle, 1)) != 0)
    {
        printf("Monitor mode could not be set\n");
        exit(1);
    }
    if ((pcap_set_buffer_size(pkt_handle, BUFFER_SIZE)) != 0)
    {
        printf("ERROR\n");
        exit(1);
    }
    if ((d = (pcap_activate(pkt_handle))) != 0)
    {
        if (d == PCAP_ERROR_RFMON_NOTSUP)
            printf("%d : PCAP_ERROR_RFMON_NOTSUP\n", d);
        if (d == PCAP_WARNING)
            printf("%d : PCAP_WARNING\n", d);
        if (d == PCAP_ERROR)
            printf("%d : PCAP_ERROR\n", d);
        pcap_perror(pkt_handle, "Activate");
        exit(1);
    }
    printf("d=%d\n", d);
    while (1)
    {
        scanf("%d", &d);
        if (d == -1)
            break;
    }
    pcap_close(pkt_handle);
    printf("Bye\n");
    return 0;
}
When I run the above program, compiled with:
gcc -Wall -lpcap sample.c -o sample
I get the following error:
-1 : PCAP_ERROR
Activate: can't mmap rx ring: Invalid argument
However, if I comment out the section of code containing pcap_set_buffer_size() function call, the program works perfectly fine.
So, what is this problem with pcap_set_buffer_size()?
Why is it causing pcap_activate() to fail?
For a recent 64-bit Linux:
Any buffer size equal to or larger than 65616 should do.
For how the value is calculated, please see the implementation of create_ring() in pcap-linux.c from the libpcap sources.
The default is 2*1024*1024 = 2097152.
The default buffer size on Windows is 1000000.
Update:
The buffer size set by pcap_set_buffer_size() refers to the (ring) buffer which stores the already-received packets. The optimal size depends on the use case and on the affordable system resources (non-pageable memory).
Please see the following statement on the receive buffer's size, verbatim from man pcap:
Packets that arrive for a capture are stored in a buffer, so that they do not have to be read by the application as soon as they arrive. On some platforms, the buffer's size can be set; a size that's too small could mean that, if too many packets are being captured and the snapshot length doesn't limit the amount of data that's buffered, packets could be dropped if the buffer fills up before the application can read packets from it, while a size that's too large could use more non-pageable operating system memory than is necessary to prevent packets from being dropped.
Update 1:
In any case, the buffer's size should be at least the snap length set for the handle in use, plus some bytes needed to properly align the buffer itself; otherwise activating the handle fails as described in the original question.
One can retrieve the handle's current snap length using pcap_snapshot(). The default snap length is 65535 bytes.
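Applied to the program in the question, a minimal sketch is to leave some alignment margin above the (default) snap length when choosing the buffer size; 65616 was the smallest working value observed above:
/* 65535 (default snap length) plus margin for alignment, which puts
   the value comfortably above the observed 65616 minimum */
#define BUFFER_SIZE (65535 + 128)

if (pcap_set_buffer_size(pkt_handle, BUFFER_SIZE) != 0)
    printf("ERROR: could not set buffer size\n");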

Flush kernel's TCP buffer for `MSG_MORE`-flagged packets

send()'s man page reveals the MSG_MORE flag which is asserted to act like TCP_CORK. I have a wrapper function around send():
int SocketConnection_Write(SocketConnection *this, void *buf, int len) {
    errno = 0;

    int sent = send(this->fd, buf, len, MSG_NOSIGNAL);

    if (errno == EPIPE || errno == ENOTCONN) {
        throw(exc, &SocketConnection_NotConnectedException);
    } else if (errno == ECONNRESET) {
        throw(exc, &SocketConnection_ConnectionResetException);
    } else if (sent != len) {
        throw(exc, &SocketConnection_LengthMismatchException);
    }

    return sent;
}
Assuming I want to use the kernel buffer, I could go with TCP_CORK, enable whenever it is necessary and then disable it to flush the buffer. But on the other hand, thereby the need for an additional system call arises. Thus, the usage of MSG_MORE seems more appropriate to me. I'd simply change the above send() line to:
int sent = send(this->fd, buf, len, MSG_NOSIGNAL | MSG_MORE);
According to lwn.net, packets will be flushed automatically if they are large enough:
If an application sets that option on a socket, the kernel will not send out short packets. Instead, it will wait until enough data has shown up to fill a maximum-size packet, then send it. When TCP_CORK is turned off, any remaining data will go out on the wire.
But this section only refers to TCP_CORK. Now, what is the proper way to flush MSG_MORE packets?
I can only think of two possibilities:
Call send() with an empty buffer and without MSG_MORE being set
Re-apply the TCP_CORK option as described on this page
Unfortunately the whole topic is very poorly documented and I couldn't find much on the Internet.
I am also wondering how to check that everything works as expected? Obviously running the server through strace is not an option. So the simplest way would be to use netcat and then look at its strace output? Or will the kernel handle traffic transmitted over a loopback interface differently?
I have taken a look at the kernel source and both assumptions seem to be true. The following extracts are from net/ipv4/tcp.c (2.6.33.1).
static inline void tcp_push(struct sock *sk, int flags, int mss_now,
                            int nonagle)
{
    struct tcp_sock *tp = tcp_sk(sk);

    if (tcp_send_head(sk)) {
        struct sk_buff *skb = tcp_write_queue_tail(sk);

        if (!(flags & MSG_MORE) || forced_push(tp))
            tcp_mark_push(tp, skb);
        tcp_mark_urg(tp, flags, skb);
        __tcp_push_pending_frames(sk, mss_now,
                                  (flags & MSG_MORE) ? TCP_NAGLE_CORK : nonagle);
    }
}
Hence, if the flag is not set, the pending frames will definitely be flushed. But this is only the case when the buffer is not empty:
static ssize_t do_tcp_sendpages(struct sock *sk, struct page **pages, int poffset,
                                size_t psize, int flags)
{
    (...)
    ssize_t copied;
    (...)
    copied = 0;
    while (psize > 0) {
        (...)
        if (forced_push(tp)) {
            tcp_mark_push(tp, skb);
            __tcp_push_pending_frames(sk, mss_now, TCP_NAGLE_PUSH);
        } else if (skb == tcp_send_head(sk))
            tcp_push_one(sk, mss_now);
        continue;

wait_for_sndbuf:
        set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
wait_for_memory:
        if (copied)
            tcp_push(sk, flags & ~MSG_MORE, mss_now, TCP_NAGLE_PUSH);

        if ((err = sk_stream_wait_memory(sk, &timeo)) != 0)
            goto do_error;

        mss_now = tcp_send_mss(sk, &size_goal, flags);
    }

out:
    if (copied)
        tcp_push(sk, flags, mss_now, tp->nonagle);
    return copied;

do_error:
    if (copied)
        goto out;
out_err:
    return sk_stream_error(sk, flags, err);
}
The while loop's body will never be executed because psize is not greater than 0. Then, in the out section, there is another chance: tcp_push() gets called, but because copied still has its default value, it will fail as well.
So sending a packet of length 0 will never result in a flush.
The next theory was to re-apply TCP_CORK. Let's take a look at the code first:
static int do_tcp_setsockopt(struct sock *sk, int level,
                             int optname, char __user *optval, unsigned int optlen)
{
    (...)
    switch (optname) {
    (...)
    case TCP_NODELAY:
        if (val) {
            /* TCP_NODELAY is weaker than TCP_CORK, so that
             * this option on corked socket is remembered, but
             * it is not activated until cork is cleared.
             *
             * However, when TCP_NODELAY is set we make
             * an explicit push, which overrides even TCP_CORK
             * for currently queued segments.
             */
            tp->nonagle |= TCP_NAGLE_OFF|TCP_NAGLE_PUSH;
            tcp_push_pending_frames(sk);
        } else {
            tp->nonagle &= ~TCP_NAGLE_OFF;
        }
        break;

    case TCP_CORK:
        /* When set indicates to always queue non-full frames.
         * Later the user clears this option and we transmit
         * any pending partial frames in the queue. This is
         * meant to be used alongside sendfile() to get properly
         * filled frames when the user (for example) must write
         * out headers with a write() call first and then use
         * sendfile to send out the data parts.
         *
         * TCP_CORK can be set together with TCP_NODELAY and it is
         * stronger than TCP_NODELAY.
         */
        if (val) {
            tp->nonagle |= TCP_NAGLE_CORK;
        } else {
            tp->nonagle &= ~TCP_NAGLE_CORK;
            if (tp->nonagle&TCP_NAGLE_OFF)
                tp->nonagle |= TCP_NAGLE_PUSH;
            tcp_push_pending_frames(sk);
        }
        break;
    (...)
As you can see, there are two ways to flush: either set TCP_NODELAY to 1 or set TCP_CORK to 0. Luckily, neither checks whether the flag is already set. Thus, my initial plan to re-apply the TCP_CORK flag can be optimized to just disabling it, even if it is currently not set.
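Translated to userspace, a flush helper can therefore just clear TCP_CORK unconditionally; a minimal sketch:
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Flush any MSG_MORE-held data: per the kernel code above, clearing
   TCP_CORK pushes pending frames even if the option was never set. */
static void tcp_flush(int fd)
{
    int off = 0;
    setsockopt(fd, IPPROTO_TCP, TCP_CORK, &off, sizeof(off));
}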
I hope this helps someone with similar issues.
That's a lot of research... all I can offer is this empirical post note:
Sending a bunch of packets with MSG_MORE set, followed by a packet without MSG_MORE, makes the whole lot go out. It works a treat for something like this:
for (i = 0; i < mg_live.length; i++) {
    // [...]
    if ((n = pth_send(sock, query, len, MSG_MORE | MSG_NOSIGNAL)) < len) {
        printf("error writing to socket (sent %i bytes of %i)\n", n, len);
        exit(1);
    }
}
pth_send(sock, "END\n", 4, MSG_NOSIGNAL);
That is, when you're sending out all the packets at once, and have a clearly defined end... AND you are only using one socket.
If you tried writing to another socket in the middle of the above loop, you may find that Linux releases the previously held packets. At least that appears to be the trouble I'm having right now. But it might be an easy solution for you.
