domain socket fragmentation - c

I am using domain sockets (AF_UNIX) to communicate between two threads for inter process communication. This is chosen to work well with libev: I use it on the recv end of the domain socket. This works very well except that the data I am sending is constant 4864 bytes. I cannot afford to get this data fragmented. I always thought domain sockets won't fragment data, but as it turns out it does. When the communication is at its peak between the threads, I observe the following
Thread 1:
SEND = 4864 actual size = 4864
Thread 2:
READ = 3328 actual size = 4864
Thread 1:
SEND = 4864 actual size = 4864
Thread 2:
READ = 1536 actual size = 4864
As you can see, thread 2 received the data in fragments (3328 + 1536). This is really bad for my application. Is there anyway we can make it not fragment it? I understand that IP_DONTFRAG can be set to only AF_INET family? Can someone suggest an alternative?
Update: sendto code
ssize_t
socket_domain_writer_dgram_send(int *domain_sd, domain_packet_t *pkt) {
struct sockaddr_un remote;
unsigned long len = 0;
ssize_t ret = 0;
memset(&remote, '\0', sizeof(struct sockaddr_un));
remote.sun_family = AF_UNIX;
strncpy(remote.sun_path, DOMAIN_SOCK_PATH, strlen(DOMAIN_SOCK_PATH));
len = strlen(remote.sun_path) + sizeof(remote.sun_family) + 1;
ret = sendto(*domain_sd, pkt, sizeof(*pkt), 0, (struct sockaddr *)&remote, sizeof(struct sockaddr_un));
if (ret == -1) {
bps_log(BPS_LOGGER_RD, ASL_LEVEL_ERR, "Domain writer could not connect send packets", errno);
}
return ret;
}

SOCK_STREAM by definition doesn't preserve message boundaries. Try again with SOCK_DGRAM or SOCK_SEQPACKET:
http://man7.org/linux/man-pages/man7/unix.7.html
On the other hand, consider that you may be passing messages larger than your architecture page size. For example, for amd64, a memory page is 4K. If that's a problem for any reason it might make sense to split the packets in 2.
Note however, that's not a real issue for the packets to arrive fragmented. It's common to have a packet assembler in the receiving end of the socket. What's wrong with implementing it ?

4864 + 3328 = 8192. My guess is that you're transmitting two 4864-byte packets back to back in some cases, and it's filling an 8 KB kernel buffer somewhere. IP_DONTFRAG isn't applicable because IP is not involved here — the "fragmentation" you're seeing is happening via a completely different mechanism.
If all the data you're transmitting consists of packets, you would do well to use a datagram socket (SOCK_DGRAM) instead of a stream. This should make the send() block when the kernel buffer doesn't have sufficient space to store an entire packet, rather than allowing a partial write through, and will make each recv() return exactly one packet, so you don't need to deal with framing.

Related

Passing multiple buffers with iovec in C Linux sockets

I'm writing a linux C client-server programs that communicates with each other with unix domain sockets and passes couple of buffers each time.
I'm using ioverctors but for some reasons the server program only receives the first io vector.
Any idea ?
I attached the relevant code snippets.
Client code:
struct iovec iov[2];
struct msghdr mh;
int rc;
char str1[] = "abc";
char str2[] = "1234";
iov[0].iov_base = (caddr_t)str1;
iov[0].iov_len = sizeof(str1);
iov[1].iov_base = (caddr_t)str2;
iov[1].iov_len = sizeof(str2);
memset(&mh, 0, sizeof(mh));
mh.msg_iov = iov;
mh.msg_iovlen = 2;
n = sendmsg(sockfd, &mh, 0); /* no flags used*/
if (n > 0) {
printf("Sendmsg successfully executed\n");
}
}
Server code:
{
struct sockaddr_un *client_sockaddr = (sockaddr_un *)opq;
struct msghdr msg;
struct iovec io[2];
char buf[16];
char buf2[16];
io[0].iov_base = buf;
io[0].iov_len = sizeof(buf);
io[1].iov_base = buf2;
io[1].iov_len = sizeof(buf2);
msg.msg_iov = io;
msg.msg_iovlen = 2;
int len = recvmsg(sock, &msg, 0);
if (len > 0) {
printf("recv: %s %d %s %d\n", msg.msg_iov[0].iov_base, msg.msg_iov[0].iov_len, msg.msg_iov[1].iov_base, msg.msg_iov[1].iov_len);
}
return 0;
}
The output i'm getting from the server:
recv: abc 16 16
sendmsg(), writev(), pwritev(), and pwritev2() do not operate on multiple buffers, but one discontiguous buffer. They operate exactly as if you'd allocate a large enough temporary buffer, gather the data there, and then do the corresponding syscall on the single temporary buffer.
Their counterparts recvmsg(), readv(), preadv(), and preadv2() similarly do not operate on multiple buffers, only on one discontiguous buffer. They operate exactly as if you'd allocate a large enough temporary buffer, receive data into that buffer, then scatter the data from that buffer to the discontiguous buffer parts.
Unix domain datagram (SOCK_DGRAM) and seqpacket (SOCK_SEQPACKET) sockets preserve message boundaries, but stream sockets (SOCK_STREAM) do not. That is, using a datagram or seqpacket socket you receive each message as it was sent. With a stream socket, message boundaries are lost: two consecutively sent messages can be received as a single message, and you can (at least in theory) receive a partial message now and the rest later.
You can use the Linux-specific sendmmsg() function to send several messages in one call (using the same socket). If you use an Unix domain datagram or seqpacket socket, these will then retain their message boundaries.
Each message is described using a struct mmsghdr. It contains struct msghdr msg_hdr; and unsigned int msg_len;. msg_hdr is the same as you use when sending a single message using e.g. sendmsg(); you can use more than one iovec for each message, but the recipient will receive them concatenated into a single buffer (but can scatter that buffer using e.g. recvmsg()). msg_len will be filled in by the sendmmsg() call: the number of bytes sent for that particular message, similar to the return value of e.g. sendmsg() call when no errors occur.
The return value from the sendmmsg() call is the number of messages sent successfully (which may be fewer than requested!), or -1 if an error occurs (with errno indicating the error as usual). Thus, you'll want to write a helper function or a loop around sendmmsg() to make sure you send all the messages. For portability, I recommend a helper function, because you can then provide another based on a loop around sendmsg() for use when sendmmsg() is not available.
The only real benefit of sendmmsg() is that you need fewer syscalls to send a large number of messages: it boosts efficiency in certain situations, that's all.

SocketCAN select() and write() don't block

I'm testing the CAN interface on an embedded device (SOC / ARM core / Linux) using SocketCAN, and I want to send data as fast as possible for testing, using efficient code.
I can open the CAN device ("can0") as a BSD socket, and send frames with "write". This all works well.
My desktop can obviously generate frames faster than the CAN transmission rate (I'm using 500000 bps). To send efficiently, I tried using a "select" on the socket file descriptor to wait for it to become ready, followed by the "write". However, the "select" seems to return immediately regardless of the state of the send buffer, and "write" also doesn't block. This means that when the buffer fills up, I get an error from "write" (return value -1), and errno is set to 105 ("No buffer space available").
This mean I have to wait an arbitrary amount of time, then try the write again, which seems very inefficient (polling!).
Here's my code (C, edited for brevity):
printf("CAN Data Generator\n");
int skt; // CAN raw socket
struct sockaddr_can addr;
struct canfd_frame frame;
const int WAIT_TIME = 500;
// Create socket:
skt = socket(PF_CAN, SOCK_RAW, CAN_RAW);
// Get the index of the supplied interface name:
unsigned int if_index = if_nametoindex(argv[1]);
// Bind CAN device to socket created above:
addr.can_family = AF_CAN;
addr.can_ifindex = if_index;
bind(skt, (struct sockaddr *)&addr, sizeof(addr));
// Generate example CAN data: 8 bytes; 0x11,0x22,0x33,...
// ...[Omitted]
// Send CAN frames:
fd_set fds;
const struct timeval timeout = { .tv_sec=2, .tv_usec=0 };
struct timeval this_timeout;
int ret;
ssize_t bytes_writ;
while (1)
{
// Use 'select' to wait for socket to be ready for writing:
FD_ZERO(&fds);
FD_SET(skt, &fds);
this_timeout = timeout;
ret = select(skt+1, NULL, &fds, NULL, &this_timeout);
if (ret < 0)
{
printf("'select' error (%d)\n", errno);
return 1;
}
else if (ret == 0)
{
// Timeout waiting for buffer to be free
printf("ERROR - Timeout waiting for buffer to clear.\n");
return 1;
}
else
{
if (FD_ISSET(skt, &fds))
{
// Ready to write!
bytes_writ = write(skt, &frame, CAN_MTU);
if (bytes_writ != CAN_MTU)
{
if (errno == 105)
{
// Buffer full!
printf("X"); fflush(stdout);
usleep(20); // Wait for buffer to clear
}
else
{
printf("FAIL - Error writing CAN frame (%d)\n", errno);
return 1;
}
}
else
{
printf("."); fflush(stdout);
}
}
else
{
printf("-"); fflush(stdout);
}
}
usleep(WAIT_TIME);
}
When I set the per-frame WAIT_TIME to a high value (e.g. 500 uS) so that the buffer never fills, I see this output:
CAN Data Generator
...............................................................................
................................................................................
...etc
Which is good! At 500 uS I get 54% CAN bus utilisation (according to canbusload utility).
However, when I try a delay of 0 to max out my transmission rate, I see:
CAN Data Generator
................................................................................
............................................................X.XX..X.X.X.X.XXX.X.
X.XX..XX.XX.X.XX.X.XX.X.X.X.XX..X.X.X.XX..X.X.X.XX.X.XX...XX.X.X.X.X.XXX.X.XX.X.
X.X.XXX.X.XX.X.X.X.XXX.X.X.X.XX.X.X.X.X.XX..X..X.XX.X..XX.X.X.X.XX.X..X..X..X.X.
.X.X.XX.X.XX.X.X.X.X.X.XX.X.X.XXX.X.X.X.X..XX.....XXX..XX.X.X.X.XXX.X.XX.XX.XX.X
.X.X.XX.XX.XX.X.X.X.X.XX.X.X.X.X.XX.XX.X.XXX...XX.X.X.X.XX..X.XX.X.XX.X.X.X.X.X.
The initial dots "." show the buffer filling up; Once the buffer is full, "X" starts appearing meaning that the "write" call failed with error 105.
Tracing through the logic, this means the "select" must have returned and the "FD_ISSET(skt, &fds)" was true, although the buffer was full! (or did I miss something?).
The SockedCAN docs just say "Writing CAN frames can be done similarly, with the write(2) system call"
This post suggests using "select".
This post suggests that "write" won't block for CAN priority arbitration, but doesn't cover other circumstances.
So is "select" the right way to do it? Should my "write" block? What other options could I use to avoid polling?
After a quick look at canbusload:184, it seems that it computes efficiency (#data/#total bits on the bus).
On the other hand, according to this, max efficiency for CAN bus is around 57% for 8-byte frames, so you seem not to be far away from that 57%... I would say you are indeed flooding the bus.
When setting a 500uS delay, 500kbps bus bitrate, 8-byte frames, it gives you a (control+data) bitrate of 228kbps, which is lower than max bitrate of the CAN bus, so, no bottleneck here.
Also, since in this case only 1 socket is being monitored, you don't need pselect, really. All you can do with pselect and 1 socket can be done without pselect and using write.
(Disclamer: hereinafter, this is just guessing since I cannot test it right now, sorry.)
As of why the behavior of pselect, think that the buffer could have byte semantics, so it tells you there is still room for more bytes (1 at least), not necessarily for more can_frames. So, when returning, pselect does not inform you can send the whole CAN frame. I guess you could solve this by using SIOCOUTQ and the max size of the Rx buffer SO_SNDBUF, but not sure if it works for CAN sockets (the nice thing would be to use SO_SNDLOWAT flags, but it is not changable in Linux's implementation).
So, to answer your questions:
Is "select" the right way to do it?
Well, you can do it both ways, either (p)select or write, since you are only waiting for one file descriptor, there is no real difference.
Should my "write" block? It should if there is no single byte available in the send buffer.
What other options could I use to avoid polling? Maybe by ioctl'ing SIOCOUTQ and getsockopt'ing SO_SNDBUF and substracting... you will need to check this yourself. Alternatively, maybe you could set the send buffer size to a multiple of sizeof(can_frame) and see if it keeps you signaling when less than sizeof(can_frame) are available.
Anyhow, if you are interested in having a more precise timing, you could use a BCM socket. There, you can instruct the kernel to send a specific frame at a specific interval. Once set, the process run in kernel space, without any system call. In this way, user-kernel buffer problem is avoided. I would test different rates until canbusload shows no rise in bus utilization.
select and poll worked for me right with SocketCan. However, carefull configuration is require.
some background:
between user app and the HW, there are 2 buffers:
socket buffer, where its size (in bytes) is controlled by the setsockopt's SO_SNDBUF option
driver's qdisc, where its size (in packets) is controlled by the "ifconfig can0 txqueuelen 5" command.
data path is: user app "write" command --> socket buffer -> driver's qdisc -> HW TX mailbox.
2 flow control points exist along this path:
when there is no free TX mailboxe, driver freeze driver's qdisc (__QUEUE_STATE_DRV_XOFF), to prevent more packets to be dequeued from driver's qdisc into HW. it will be un-freezed when TX mailbox is free (upon TX completion interrupt).
when socket buffer goes above half of its capacity, poll/select blocks, until socket buffer goes beyond half of its capacity.
now, assume that socket buffer has room for 20 packets, while driver's qdisc has room for 5 packets. lets assume also that HW have single TX mailbox.
poll/select let user write up to 10 packets.
those packets are moved down to socket buffer.
5 of those packets continue and fill driver's qdisc.
driver dequeue 1st packet from driver's qdisc, put it into HW TX mailbox and freeze driver's qdisc (=no more dequeue). now there is room for 1 packet in driver's qdisc
6th packet is moved down successfully from socket buffer to driver's qdisc.
7th packet is moved down from socket buffer to driver's qdisc, but since there is no room - it is dropped and error 105 ("No buffer space available") is generated.
what is the solution?
in the above assumptions, lets configure socket buffer for 8 packets. in this case, poll/select will block user app after 4 packets, ensuring that there is room in driver's qdisc for all of those 4 packets.
however, socket buffer is configured to bytes, not to packet. translation should be made as the following: each CAN packet occupy ~704 bytes at socket buffer (most of them for the socket structure). so, to configure socket buffer to 8 packet, the size in bytes should be 8*704:
int size = 8*704;
setsockopt(s, SOL_SOCKET, SO_SNDBUF, &size, sizeof(size));

Sockets - Reading and writing [duplicate]

I'm very new to C++, but I'm trying to learn some basics of TCP socket coding. Anyway, I've been able to send and receive messages, but I want to prefix my packets with the length of the packet (like I did in C# apps I made in the past) so when my window gets the FD_READ command, I have the following code to read just the first two bytes of the packet to use as a short int.
char lengthBuffer[2];
int rec = recv(sck, lengthBuffer, sizeof(lengthBuffer), 0);
short unsigned int toRec = lengthBuffer[1] << 8 | lengthBuffer[0];
What's confusing me is that after a packet comes in the 'rec' variable, which says how many bytes were read is one, not two, and if I make the lengthBuffer three chars instead of two, it reads three bytes, but if it's four, it also reads three (only odd numbers). I can't tell if I'm making some really stupid mistake here, or fundamentally misunderstanding some part of the language or the API. I'm aware that recv doesn't guarantee any number of bytes will be read, but if it's just two, it shouldn't take multiple reads.
Because you cannot assume how much data will be available, you'll need to continuously read from the socket until you have the amount you want. Something like this should work:
ssize_t rec = 0;
do {
int result = recv(sck, &lengthBuffer[rec], sizeof(lengthBuffer) - rec, 0);
if (result == -1) {
// Handle error ...
break;
}
else if (result == 0) {
// Handle disconnect ...
break;
}
else {
rec += result;
}
}
while (rec < sizeof(lengthBuffer));
Streamed sockets:
The sockets are generally used in a streamed way: you'll receive all the data sent, but not necessarily all at once. You may as well receive pieces of data.
Your approach of sending the length is hence valid: once you've received the length, you cann then load a buffer, if needed accross successive reads, until you got everything that you expected. So you have to loop on receives, and define a strategy on how to ahandle extra bytes received.
Datagramme (packet oriented) sockets:
If your application is really packet oriented, you may consider to create a datagramme socket, by requesting linux or windows socket(), the SOCK_DGRAM, or better SOCK_SEQPACKET socket type.
Risk with your binary size data:
Be aware that the way you send and receive your size data appers to be assymetric. You have hence a major risk if the sending and receiving between machine with CPU/architectures that do not use the same endian-ness. You can find here some hints on how to ame your code platform/endian-independent.
TCP socket is a stream based, not packet (I assume you use TCP, as to send length of packet in data does not make any sense in UDP). Amount of bytes you receive at once does not have to much amount was sent. For example you may send 10 bytes, but receiver may receive 1 + 2 + 1 + 7 or whatever combination. Your code has to handle that, be able to receive data partially and react when you get enough data (that's why you send data packet length for example).

Handle packet loss UDP socket programming in C

I want to send a byte array X times until I've reached the buffer size. My client send the byte array until the buffer size is full. But the problem is packet loss and the server only recieves like 8/10 of the buffer size.
Client:
while(writeSize < bufferSize)
{
bytes_sent += sendto(servSocket, package, sizeof(package), 0, (struct sockaddr *)&server, sizeof(server));
writeSize+= sizeof(package);
}
Server:
while(packetSize < bufferSize)
{
bytes_recv += recvfrom(servSocket, package, sizeof(package), 0, (struct sockaddr*)&client, &size);
packetSize += sizeof(package);
printf("Bytes recieved: %d\n", bytes_recv);
}
I can't really come up with a solution with this problem. When the client has sent all the packages, and the server suffers from package loss the while loop won't end, any ideas how I can solve this?
Many thanks
Hmpf. You are using UDP. For this protocol its perfectly ok to throw away packets if there is need to do so. So it's not reliable in terms of "what is sent will arrive". What you have to do in your client is to check wether all packets you need arrived, and if not, talk politely to your server to resend those packets you did not receive. To implement this stuff is not that easy, and you might finally end up with a solution wich might be less performant than just using plain old tcp-sockets, so my recomendation would be in the first instance to switch to a tcp connection.
If you have to use UDP to transfer a largish chunk of data, then design a small application-level protocol that would handle possible packet loss and re-ordering (that's part of what TCP does for you). I would go with something like this:
Datagrams less then MTU (plus IP and UDP headers) in size (say 1024 bytes) to avoid IP fragmentation.
Fixed-length header for each datagram that includes data length and a sequence number, so you can stitch data back together, and detect missed, duplicate, and re-ordered parts.
Acknowledgements from the receiving side of what has been successfully received and put together.
Timeout and retransmission on the sending side when these acks don't come within appropriate time.
Here you go. We started building TCP again ...
Of course TCP would be the better choice but if you only have UDP option i'd wrap the data into a structure with sequence and buf_length field.
It is absolutely NOT an optimal solution... It just take care of sequence number in order to assure the data is totally arrived and with the right order. If not, just stop to receive data (It could do something smarter like ask to the send to resend pieces and rebuild the data despite thw wrong order but it will become really complex)
struct udp_pkt_s {
int seq;
size_t buf_len;
char buf[BUFSIZE];
};
sending side:
struct udp_pkt_s udp_pkt;
udp_pkt.seq = 0;
while(writeSize < bufferSize)
{
// switch to the next package block
// ...
memcpy(udp_pkt.buf, package);
udp_pkt.buf_len = sizeof(package);
udp_pkt.seq++;
bytes_sent += sendto(servSocket, &udp_pkt, sizeof(udp_pkt_s), 0, (struct sockaddr *)&server, sizeof(server));
writeSize += sizeof(package);
}
receiving side:
last_seq = 0;
while(packetSize < bufferSize)
{
bytes_recv += recvfrom(servSocket, &udp_pkt, sizeof(udp_pkt_s), 0, (struct sockaddr*)&client, &size);
// break it if there's a hole
if (udp_pkt.seq != (last_seq + 1))
break;
last_seq = udp_pkt.seq;
packetSize += udp_pkt.buf_len;
printf("Bytes recieved: %d\n", udp_pkt.buf_len);
}
after sendto you can wait for sendto buffer to reach zero, you can use the following api after each sendto call.
void waitForSendBuffer(void)
{
int outstanding = 0;
while(1)
{
ioctl(socketFd_, SIOCOUTQ, &outstanding);
if(outstanding == 0)
break;
printf("going for sleep\n");
usleep(1000);
}
}

Effect of SO_SNDBUF

I am unable to make sense of how and why the following code segments work :
/* Now lets try to set the send buffer size to 5000 bytes */
size = 5000;
err = setsockopt(sockfd, SOL_SOCKET, SO_SNDBUF, &size, sizeof(int));
if (err != 0) {
printf("Unable to set send buffer size, continuing with default size\n");
}
If we check the value of the send buffer, it is indeed correctly set to 5000*2 = 10000.
However, if we try to send more than the send buffer size, it does send all of it. For example:
n = send(sockfd, buf, 30000, 0);
/* Lets check how much us actually sent */
printf("No. of bytes sent is %d\n", n);
This prints out 30000.
How exactly did this work? Didn't the fact that the send buffer size was limited to 10000 have any effect? If it did, what exactly happened? Some kind of fragmentation?
UPDATE: What happens if the socket is in non-blocking mode? I tried the following:
Changing buffer size to 10000 (5000*2) causes 16384 bytes to be sent
Changing buffer size to 20000 (10000*2) causes 30000 bytes to be sent
Once again, why?
The effect of setting SO_SNDBUF option is different for TCP and UDP.
For UDP this sets the limit on the size of the datagram, i.e. anything larger will be discarded.
For TCP this just sets the size of in-kernel buffer for given socket (with some rounding to page boundary and with an upper limit).
Since it looks like you are talking about TCP, the effect you are observing is explained by the socket being in blocking mode, so send(2) blocks until kernel can accept all of your data, and/or the network stack asynchronously de-queueing data and pushing it to the network card, thus freeing space in the buffer.
Also, TCP is a stream protocol, it does not preserve any "message" structure. One send(2) can correspond to multiple recv(2)s on the other side, and the other way around. Treat it as byte-stream.
SO_SNDBUF configures the buffer that the socket implementation uses internally. If your socket is non-blocking you can only send up to the configured size, if your socket is blocking there is no limitation for your call.

Resources