I have 2 Ubuntu 14.04 PCs. One is used as a server and the other one is used as a client. The client sets up a TCP connection to the server, which sends some packets back. Here's the code on the server:
send(sd, pkt, pkt_len, MSG_NOSIGNAL);
The code on the client side is also very simple:
read(sd, buf, buf_size);
If the transmissions on the server are spaced out, I don't see any issue. However, if the server transmits rapidly, things get ugly. Here's an example where the server sends 8 packets back-to-back.
The server code shows that the sizes of these 8 packets are: 752 (bytes), 713, 713, 713, 396, 398, 396, 396
tcpdump on the server captures 4 TX packets: 752 (bytes), 1398, 1398, 929
tcpdump on the client captures 3 RX packets: 752 (bytes), 2796, 929
The client code shows it receives only 2 packets with 3548 bytes and 929 bytes, respectively.
So you can see all the bytes sent by the server are received by the client. However, packets are combined at various points in the transmission path. I guess this is due to TSO, GSO, GRO, etc. However, shouldn't these optimizations re-assemble the packets back to the correct form when the packets are delivered to the receiving application?
How do I get around this issue?
TCP is carefully designed to not only permit but implement exactly what you're seeing. It is a byte-stream protocol. If you want messages you have to implement them yourself via a superimposed application protocol.
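The byte-stream behaviour can be reproduced locally with a small experiment. The sketch below is an illustration only: it assumes a Unix-domain stream socket pair as a stand-in for a TCP connection (both are SOCK_STREAM), and the function name demo_stream_has_no_boundaries is made up for this example. It issues two send() calls and prints how many bytes each recv() returns; the receiver is free to see the two "packets" merged or split, and only the total is guaranteed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Writes two separate "messages" into one end of a stream socket pair and
 * then drains the other end, printing how many bytes each recv() returned.
 * Returns the total number of bytes received, or -1 on error. */
long demo_stream_has_no_boundaries(void)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) {
        perror("socketpair");
        return -1;
    }

    char first[752], second[713];
    memset(first, 'A', sizeof first);
    memset(second, 'B', sizeof second);

    /* Two send() calls, i.e. two "packets" from the sender's point of view */
    if (send(sv[0], first, sizeof first, 0) != (ssize_t) sizeof first ||
        send(sv[0], second, sizeof second, 0) != (ssize_t) sizeof second) {
        perror("send");
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    close(sv[0]); /* lets the reader see end-of-stream after the data */

    char buf[4096];
    long total = 0;
    ssize_t n;
    while ((n = recv(sv[1], buf, sizeof buf, 0)) > 0) {
        /* The sizes printed here need not match the sizes passed to send() */
        printf("recv() returned %zd bytes\n", n);
        total += n;
    }
    close(sv[1]);
    return n < 0 ? -1 : total;
}
```

Whatever sizes the individual recv() calls report, the total is 752 + 713 bytes, delivered in order. That is the only guarantee a stream socket makes.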
How do I get around this issue?
So you're using TCP (a byte-stream-oriented transport mechanism) but you'd like it to have message-oriented behavior. You can't change the way TCP works (it is, by design, allowed to transport bytes in whatever-sized groups it chooses to, as long as the bytes are all received and they are received in the same order). But you can add a layer on top of TCP to simulate packet-oriented behavior.
For example, say you wanted to simulate the transmission of a 1000-byte "packet". Your sending program could first send out a fixed-size (let's say, 4-byte) header that would tell the receiver how many bytes the "packet" will contain:
size_t myPacketSize = 1000; // or whatever the size of your packet is
uint32_t bePacketSize = htonl(myPacketSize); // convert native-endian to big-endian for cross-platform compatibility
if (send(sd, &bePacketSize, sizeof(bePacketSize), 0) != sizeof(bePacketSize))
{
perror("send(header)");
}
.... then right after that you'd send out the packet's payload data:
if (send(sd, packetDataPtr, myPacketSize, 0) != myPacketSize)
{
perror("send(body)");
}
The receiver would need to receive the header/size value, then allocate an array of that size and receive the payload data into it. Since this code has to handle the incoming data correctly no matter how many bytes are returned by each recv() call, it's a little more complex than the sending code:
void HandleReceivedPseudoPacket(const char * packetBytes, uint32_t packetSizeBytes)
{
// Your received-packet-handling code goes here
}
// Parses an incoming TCP stream of header+body data back into pseudo-packets for handling
void ReadPseudoPacketsFromTCPStreamForever(int sd)
{
uint32_t headerBuf; // we'll read each 4-byte header's bytes into here
uint32_t numValidHeaderBytes = 0; // how many bytes in (headerBuf) are currently valid
char * bodyBuf = NULL; // will be allocated as soon as we know how many bytes to allocate
uint32_t bodySize; // How many bytes (bodyBuf) points to
uint32_t numValidBodyBytes = 0; // how many bytes in (bodyBuf) are currently valid
while(1)
{
if (bodyBuf == NULL)
{
// We don't know the bodySize yet, so read in header bytes to find out
int32_t numBytesRead = recv(sd, ((char *)&headerBuf)+numValidHeaderBytes, sizeof(headerBuf)-numValidHeaderBytes, 0);
if (numBytesRead > 0)
{
numValidHeaderBytes += numBytesRead;
if (numValidHeaderBytes == sizeof(headerBuf))
{
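// We've read the entire 4-byte header, so now we can allocate the body buffer
numValidBodyBytes = 0;
bodySize = ntohl(headerBuf); // convert from big-endian to the CPU's native-endian
if (bodySize == 0)
{
// An empty pseudo-packet has no body to wait for, so handle it right away
HandleReceivedPseudoPacket("", 0);
numValidHeaderBytes = 0;
continue;
}
bodyBuf = (char *) malloc(bodySize);
if (bodyBuf == NULL)
{
perror("malloc");
break;
}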
}
}
else if (numBytesRead < 0)
{
perror("recv(header)");
break;
}
else
{
printf("TCP connection was closed while reading header bytes!\n");
break;
}
}
else
{
// If we got here, then we know the bodySize and now we need to read in the body bytes
int32_t numBytesRead = recv(sd, &bodyBuf[numValidBodyBytes], bodySize-numValidBodyBytes, 0);
if (numBytesRead > 0)
{
numValidBodyBytes += numBytesRead;
if (numValidBodyBytes == bodySize)
{
// At this point the pseudo-packet is fully received and ready to be handled
HandleReceivedPseudoPacket(bodyBuf, bodySize);
// Reset our state variables so we'll be ready to receive the next header
free(bodyBuf);
bodyBuf = NULL;
numValidHeaderBytes = 0;
}
}
else if (numBytesRead < 0)
{
perror("recv(body)");
break;
}
else
{
printf("TCP connection was closed while reading body bytes!\n");
break;
}
}
}
// Avoid memory leak if we exited the while loop in the middle of reading a pseudo-packet's body
if (bodyBuf) free(bodyBuf);
}
I am currently trying to send 720 bytes from a Windows application to a custom STM32 device (for testing purposes I am using a Blue Pill, STM32F103xxx). I should point out that I am a total newbie to programming :). On the device side I have 1000-byte buffers for receiving and sending (thanks to STMCube for this). Testing the device with a terminal program (packets smaller than 64 bytes) works. I then reworked one of the Microsoft examples to be able to send more data to the device. The device driver used on Windows is "usbser.sys". In short, my console program does the following:
Calculate a sine wave (360 samples, 16 bits each)
Send them to the USB device as 720 bytes (the COM port protocol works in 8-bit bytes)
My problem is that no more than 64 bytes arrive at the device.
Somewhere I read that the reason for this can be the built-in Windows Rx/Tx buffers (64 bytes long, according to something I found on the internet), so in the code below I inserted:
SetupComm(hCom,1000,1000)
in the hope that this would solve my troubles, but it did not. Below is "my" code. Any ideas how I can fix this?
#include <windows.h>
#include <tchar.h>
#include <stdio.h>
#include <math.h>
#define PI 3.14159265
void PrintCommState(DCB dcb)
{
// Print some of the DCB structure values
_tprintf(TEXT("\nBaudRate = %d, ByteSize = %d, Parity = %d, StopBits = %d\n"),
dcb.BaudRate,
dcb.ByteSize,
dcb.Parity,
dcb.StopBits);
}
int _tmain(int argc, TCHAR* argv[])
{
DCB dcb;
HANDLE hCom;
BOOL fSuccess;
const TCHAR* pcCommPort = TEXT("COM3"); // Most systems have a COM1 port
unsigned __int8 aOutputBuffer[720];// Bytes that will be sent to the device
unsigned __int16 aCalculatedWave[360];// Calculated 16-bit samples, split into bytes below
int iCnt; // temp counter to use everywhere
for (iCnt = 0; iCnt < 360; iCnt = iCnt + 1)
{
aCalculatedWave[iCnt] = (unsigned short)(0xFFFF * sin(iCnt * PI / 180));
if (iCnt > 180) aCalculatedWave[iCnt] = 0 - aCalculatedWave[iCnt];
}
// Split each 16-bit aCalculatedWave sample into two bytes of aOutputBuffer
for (int i = 0, j = 0; i < 720; i += 2, ++j)
{
aOutputBuffer[i] = aCalculatedWave[j] >> 8; // Hi byte
aOutputBuffer[i + 1] = aCalculatedWave[j] & 0xFF; // Lo byte
}
// Open a handle to the specified com port.
hCom = CreateFile(pcCommPort,
GENERIC_READ | GENERIC_WRITE,
0, // must be opened with exclusive-access
NULL, // default security attributes
OPEN_EXISTING, // must use OPEN_EXISTING
0, // not overlapped I/O
NULL); // hTemplate must be NULL for comm devices
if (hCom == INVALID_HANDLE_VALUE)
{
// Handle the error.
printf("CreateFile failed with error %d.\n", GetLastError());
return (1);
}
if (SetupComm(hCom,1000,1000) !=0)
printf("Windows In/Out serial buffers changed to 1000 bytes\n");
else
printf("Buffers not changed with error %d.\n", GetLastError());
// Initialize the DCB structure.
SecureZeroMemory(&dcb, sizeof(DCB));
dcb.DCBlength = sizeof(DCB);
// Build on the current configuration by first retrieving all current
// settings.
fSuccess = GetCommState(hCom, &dcb);
if (!fSuccess)
{
// Handle the error.
printf("GetCommState failed with error %d.\n", GetLastError());
return (2);
}
PrintCommState(dcb); // Output to console
// Fill in some DCB values and set the com state:
// 9,600 bps, 8 data bits, no parity, and 1 stop bit.
dcb.BaudRate = CBR_9600; // baud rate
dcb.ByteSize = 8; // data size, xmit and rcv
dcb.Parity = NOPARITY; // parity bit
dcb.StopBits = ONESTOPBIT; // stop bit
fSuccess = SetCommState(hCom, &dcb);
if (!fSuccess)
{
// Handle the error.
printf("SetCommState failed with error %d.\n", GetLastError());
return (3);
}
// Get the comm config again.
fSuccess = GetCommState(hCom, &dcb);
if (!fSuccess)
{
// Handle the error.
printf("GetCommState failed with error %d.\n", GetLastError());
return (2);
}
PrintCommState(dcb); // Output to console
_tprintf(TEXT("Serial port %s successfully reconfigured.\n"), pcCommPort);
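DWORD dwBytesWritten = 0; // receives the byte count; must not be NULL for non-overlapped I/O
if (WriteFile(hCom, aOutputBuffer, sizeof(aOutputBuffer), &dwBytesWritten, NULL) != 0)
_tprintf(TEXT("%lu bytes successfully written to serial port %s \n"), dwBytesWritten, pcCommPort);
else
_tprintf(TEXT("Failed to write 720 bytes to serial port %s \n"), pcCommPort);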
return (0);
}
USB bulk endpoints implement a stream-based protocol, i.e. an endless stream of bytes. This is in contrast to a message-based protocol. So USB bulk endpoints have no concept of messages, message start or end. This also applies to USB CDC as it is based on bulk endpoints.
At the lower USB level, the stream of bytes is split into packets of at most 64 bytes. As per USB full-speed standard, packets cannot be larger than 64 bytes.
If the host sends small chunks of data that are more than 1ms apart, they will be sent and received in separate packets and it looks as if USB is a message-based protocol. However, for chunks of more than 64 bytes, they are split into smaller packets. And if small chunks are sent with less than 1ms in-between, the host will merge them into bigger packets.
Your design seems to require that data is grouped, e.g. the group of 720 bytes mentioned in the question. If this is a requirement, the grouping must be implemented, e.g. by first sending the size of the group and then the data.
Since larger groups are split into chunks of 64 bytes and the receive callback is called for every packet, the packets must be joined until the full group is available.
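As a rough illustration of joining packets into groups, here is a minimal sketch in plain C. It makes several assumptions that are not part of the question: each group is preceded by a 2-byte big-endian length, groups are at most 1000 bytes (matching the buffer size mentioned in the question), and the names GroupAssembler, assembler_feed, GroupStats and record_group are hypothetical. It does not call any STM32 library functions; feeding it from the CDC receive callback is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_GROUP_SIZE 1000u /* assumed upper bound for one group */

typedef struct {
    uint8_t header[2];            /* big-endian group length */
    size_t  header_fill;          /* header bytes received so far */
    uint8_t body[MAX_GROUP_SIZE]; /* group payload being assembled */
    size_t  body_size;            /* decoded length of the current group */
    size_t  body_fill;            /* payload bytes received so far */
} GroupAssembler;

/* Hypothetical hook, called once for every complete group. */
typedef void (*GroupHandler)(const uint8_t *data, size_t size, void *ctx);

void assembler_reset(GroupAssembler *a)
{
    a->header_fill = 0;
    a->body_size = 0;
    a->body_fill = 0;
}

/* Feed one received chunk (for example one USB packet of up to 64 bytes).
 * Returns 0 on success, or -1 if a header announces a group that is larger
 * than MAX_GROUP_SIZE (the stream is then out of sync). */
int assembler_feed(GroupAssembler *a, const uint8_t *chunk, size_t len,
                   GroupHandler on_group, void *ctx)
{
    size_t i = 0;
    while (i < len) {
        if (a->header_fill < sizeof a->header) {
            /* Still collecting the 2-byte length header */
            a->header[a->header_fill++] = chunk[i++];
            if (a->header_fill < sizeof a->header)
                continue;
            a->body_size = ((size_t) a->header[0] << 8) | a->header[1];
            a->body_fill = 0;
            if (a->body_size > MAX_GROUP_SIZE) {
                a->header_fill = 0;
                return -1;
            }
        } else {
            /* Copy as much payload as this chunk provides, but no more than
             * the current group still needs */
            size_t want = a->body_size - a->body_fill;
            size_t take = (len - i < want) ? len - i : want;
            memcpy(a->body + a->body_fill, chunk + i, take);
            a->body_fill += take;
            i += take;
        }
        if (a->header_fill == sizeof a->header && a->body_fill == a->body_size) {
            on_group(a->body, a->body_size, ctx);
            a->header_fill = 0; /* ready for the next header */
        }
    }
    return 0;
}

/* Example handler: counts groups and remembers the size of the last one. */
typedef struct { size_t groups; size_t last_size; } GroupStats;

void record_group(const uint8_t *data, size_t size, void *ctx)
{
    (void) data;
    GroupStats *stats = ctx;
    stats->groups++;
    stats->last_size = size;
}
```

With this shape, the sender would transmit the two length bytes followed by the 720 data bytes, and the device would pass every received packet to assembler_feed regardless of how the host happened to split or merge them.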
Also note a few problems in your current code (see usbd_cdc_if.c, line 264):
USBD_CDC_SetRxBuffer(&hUsbDeviceFS, &Buf[0]);
USBD_CDC_ReceivePacket(&hUsbDeviceFS);
NewDataFromUsb = *Len;
USBD_CDC_SetRxBuffer sets the buffer for the next packet to be received. If you always use the same buffer – as in this case – it's not needed. The initial setup is sufficient. However, it could be used to set a new buffer if the current packet does not contain a full group.
Despite its name, USBD_CDC_ReceivePacket does not receive a packet. Instead, it gives the OK to receive the next packet. It should only be called once the data in the buffer has been processed and the buffer is ready to receive the next packet. Your current implementation runs the risk that the buffer is overwritten before it is processed, in particular if you send a group of more than 64 bytes, which will likely result in a quick succession of packets.
Note that Windows hasn't been mentioned here. The Windows code seems to be okay. And changing to Winusb.sys will just make your life harder but not get you packets bigger than 64 bytes.
I need to write a TCP server that can handle multiple connections; I followed this guide and wrote up the following program:
static void _handle_requests(char* cmd,int sessionfd){
//TODO: extend
printf("RECEIVED: %s\n",cmd);
if (!strcmp(cmd,BAR)){
barrier_hit(&nodebar,sessionfd);
}else if (!strcmp(cmd, BYE)){
}else if (!strcmp(cmd, HI)){
}
}
void handle_requests(void){
listen(in_sock_fd,QUEUELEN);
fd_set read_set, active_set;
FD_ZERO(&active_set);
FD_SET(in_sock_fd, &active_set);
int numfd = 0;
char cmd[INBUFLEN];
for (;;){
read_set = active_set;
numfd = select(FD_SETSIZE,&read_set,NULL,NULL,NULL);
for (int i = 0;i < FD_SETSIZE; ++i){
if (FD_ISSET(i,&read_set)){
if (i == in_sock_fd){
//new connection
struct sockaddr_in cliaddr;
socklen_t socklen = sizeof cliaddr;
int newfd = accept(in_sock_fd,(struct sockaddr*)&cliaddr, &socklen);
FD_SET(newfd,&active_set);
}else{
//already active connection
read(i,cmd,INBUFLEN);
_handle_requests(cmd,i);
}
}
}
}
}
...and a single client that connect()s to the server and makes two consecutive write() calls on the socket file descriptor.
n = write(sm_sockfd, "hi", 3);
if (n < 0) {
perror("SM: ERROR writing to socket");
return 1;
}
//...later
n = write(sm_sockfd, "barrier", 8);
if (n < 0) {
perror("SM: 'barrier msg' failed");
exit(1);
}
The thing is, the server only picks up the first message ("hi"); afterwards, the select call hangs. Since the write ("barrier") on the client's end succeeded, shouldn't that session file descriptor be ready for reading? Have I made any glaring mistakes?
Thanks; sorry if this is something obvious, I'm completely unfamiliar with C's networking library, and the project is due very soon!
You have a misunderstanding of how TCP sockets work. There is no message boundary in TCP, i.e. if you send first "hi" and then "barrier", you can't expect the corresponding receives to return "hi" and "barrier". It's possible that they return "hibarrier". It's also in theory possible (although very rare) that they would return "h", "i", "b", "a", "r", "r", "i", "e", "r".
You really need to consider how you delimit your messages. One possibility is to send the length of a message as 32-bit integer in network byte order (4 bytes) prior to the message. Then when you receive the message, you first read 4 bytes and then read as many bytes as the message length indicates.
Do note that TCP may return partial reads, so you need to somehow handle those. One possibility is to have a buffer which holds the bytes read, and then append to this buffer when more bytes are read and handle the contents of the buffer when the first four bytes of the buffer (i.e. the message length) indicate that you have the full message.
If you want a sequential packet protocol that preserves packet boundaries, you may want to consider SCTP. However, it's not widely supported by operating system kernels currently so what I would do is the 32-bit length trick to have a packet-oriented layer on top of TCP.
Do this :
int nbrRead = read(i,cmd,INBUFLEN);
and print out the value of nbrRead. You will see that you received everything in one go. TCP is a streaming protocol: if you do 3 or more sequential sends, the chance is very high that you will receive them all at once.
Also make sure that INBUFLEN is large enough; 2048 will be more than enough for your example.
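Once several commands can arrive in one read(), they have to be separated again. The sketch below shows one possible way, assuming (as the write() calls in the question do, with lengths 3 and 8) that every command is sent together with its terminating NUL byte. The names CommandBuffer, command_buffer_append and command_buffer_next are invented for this example, and the capacity of 2048 is simply the value suggested above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CMD_BUF_CAPACITY 2048

typedef struct {
    char   data[CMD_BUF_CAPACITY];
    size_t used; /* number of valid bytes in data */
} CommandBuffer;

/* Append the bytes just returned by read()/recv().
 * Returns 0 on success, -1 if they do not fit. */
int command_buffer_append(CommandBuffer *cb, const char *bytes, size_t n)
{
    if (n > sizeof cb->data - cb->used)
        return -1;
    memcpy(cb->data + cb->used, bytes, n);
    cb->used += n;
    return 0;
}

/* Copy the next complete NUL-terminated command into out and remove it from
 * the buffer. Returns 1 if a command was extracted, 0 if only a partial
 * command (or nothing) is buffered, -1 if out is too small for the command. */
int command_buffer_next(CommandBuffer *cb, char *out, size_t out_size)
{
    char *nul = memchr(cb->data, '\0', cb->used);
    if (nul == NULL)
        return 0;
    size_t cmd_len = (size_t) (nul - cb->data) + 1; /* includes the NUL */
    if (cmd_len > out_size)
        return -1;
    memcpy(out, cb->data, cmd_len);
    /* Shift any bytes of later commands to the front of the buffer */
    memmove(cb->data, cb->data + cmd_len, cb->used - cmd_len);
    cb->used -= cmd_len;
    return 1;
}
```

In a select() loop like the one in the question, each connection would need its own CommandBuffer: append whatever read() returned (using its return value as the byte count), then call command_buffer_next repeatedly and hand each extracted command to the request handler until it returns 0.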
I am designing a game which has master and multiple players. They send and receive data using TCP sockets.
Players transfer character strings between themselves via TCP sockets. The programs run on Red Hat Linux 6.
The character string transferred between players is of the type
char chain[2*hops+10];
The player code on sender side is
len = send(to,chain,sizeof(chain),0);
if (len != sizeof(chain)) {
perror("send");
exit(1);}
The code where player receives the data is like this :
char chain[2*hops+10];
len = recv(current,chain,sizeof(chain),0);
The value of hops is same for both the players.
For hops values up to around 8000 it works fine, but once the hops value crosses some point, the same program stops working. I believe the data is not transferred in one go.
Is there a maximum buffer size for send and recv buffer?
Note: The sockets between them are opened using this code:
s = socket(AF_INET, SOCK_STREAM, 0);
and then the usual connect and bind sockets on both sides.
TCP is a stream-oriented protocol (as implied by SOCK_STREAM). Data that an application sends in chunks of one size is not necessarily received in chunks of the same size. Thus one should read from a socket until enough data to process has been received, then process that data, and repeat:
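while (true) {
unsigned char buffer [4096] = {0};
size_t nbuffer = 0;
while (nbuffer < sizeof buffer) {
/* Append after the bytes already received and never ask for more than still fits */
ssize_t len = recv (sockd, buffer + nbuffer, sizeof buffer - nbuffer, 0);
if (len <= 0)
break; /* 0: peer closed the connection; < 0: error, inspect errno */
nbuffer += (size_t) len;
}
if (nbuffer < sizeof buffer)
break; /* connection closed or failed before a whole chunk arrived */
/* We have a whole chunk, process it: */
;
}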
You can also handle partial sends on the other side as described here, much better than I ever would.
I was wondering if anyone could shed any light on why two separate send() calls would end up in the same recv() buffer when using the loopback address for testing, yet once switched to two remote machines they require two recv() calls instead? I have been looking at the Wireshark captures yet can't seem to make sense of why this is occurring. Perhaps someone could critique my code and tell me where I'm going wrong. The two incoming messages from the server are of a length unknown to the client. By the way, I'm using BSD sockets in C on Ubuntu.
In the example shown below I'm parsing the entire buffer to extract the two separate messages from it, which I'll admit isn't an ideal approach.
-------SERVER SIDE--------
// Send greeting string and receive again until end of stream
ssize_t numBytesSent = send(clntSocket, greeting, greetingStringLen, 0);
if (numBytesSent < 0)
DieWithSystemMessage("send() failed");
//-----------------------------Generate "RANDOM" Message -----------------------
srand(time(NULL)); //seed random number from system clock
size_t randomStringLen = rand() % (RANDOMMSGSIZE-3); //generates random num
// between 0 and 296
char randomMsg [RANDOMMSGSIZE] = "";
// declare and initialize allowable character set for the
const char charSet[] = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
if (randomStringLen) {
--randomStringLen;
for (size_t i = 0; i < randomStringLen; i++) {
int p = rand() % (int) (sizeof charSet - 1);
randomMsg[i] = charSet[p];
}
randomStringLen = strlen(randomMsg);
printf("Random String Size Before newline: %d\n", (int)randomStringLen);
strcat(randomMsg,"\r\n");
}
randomStringLen = strlen(randomMsg);
printf("Random String: %s\n", randomMsg);
//-----------------------------Send "RANDOM" Message ---------------------------
// Send greeting string and receive again until end of stream
numBytesSent = send(clntSocket, randomMsg, randomStringLen, 0);
if (numBytesSent < 0)
DieWithSystemMessage("send() failed");
//------------------------------------------------------------------------------
------CLIENT SIDE-------
//----------------------------- Receive Server Greeting ---------------------------
char buffer[BUFSIZE] = ""; // I/O buffer
// Receive up to the buffer size (minus 1 to leave space for
// a null terminator) bytes from the sender
ssize_t numBytesRcvd = recv(sock, buffer, BUFSIZE - 1, 0);
if (numBytesRcvd < 0)
DieWithSystemMessage("recv() failed");
buffer[numBytesRcvd] = '\0'; //terminate the string after calling recv()
printf("Buffer contains: %s\n",buffer); // Print the buffer
//printf("numBytesRecv: %d\n",(int)numBytesRcvd); // Print the buffer
//------------------------ Extracts the random message from buffer ---------------------------
char *randomMsg = strstr(buffer, "\r\n"); // searches for first occurrence of substring
char randomMessage [BUFSIZE] = "";
strcat(randomMessage, randomMsg+2);
int randomStringLen = strlen(randomMessage)-2;
printf("Random Message: %s\n",randomMessage); // Print the buffer
char byteSize [10];
sprintf(byteSize,"%d", randomStringLen);
printf("ByteSize = %s\n",byteSize);
//----------------------- Send the number of random bytes received -------------------------
size_t byteStringLen = strlen(byteSize); // Determine input length
numBytes = send(sock, byteSize, byteStringLen, 0);
if (numBytes < 0)
DieWithSystemMessage("send() failed");
else if (numBytes != byteStringLen)
DieWithUserMessage("send()", "sent unexpected number of bytes");
shutdown(sock,SHUT_WR); // further sends are disallowed yet receives are still possible
//----------------------------------- Receive Cookie ----------------------------------------
On Unix systems, recv and send are just special cases of read and write that accept additional flags. (Windows also emulates this with Winsock.)
You shouldn't assume that one recv corresponds to one send, because that generally isn't true (just as you can read a file in multiple parts even if it was written with a single write). Instead, if it's important to know where the separate messages begin and end, start each "message" with a header that tells you how long the message is; if it's not important, just read the stream like a normal file.
TCP is a byte-stream protocol, not a message protocol. There is no guarantee that what you write with a single send() will be received via a single recv(). If you need message boundaries you must implement them yourself, e.g. with a length-word prefix, a type-length-value protocol, or a self-describing protocol like XML.
You're experiencing a TCP congestion avoidance optimization commonly referred to as the Nagle algorithm (named after John Nagle, its inventor).
The purpose of this optimization is to reduce the number of small TCP segments circulating over a socket by combining them into larger ones. When you write()/send() on a TCP socket, the kernel may not transmit your data immediately; instead it may hold back small writes until previously sent data has been acknowledged or enough data has accumulated to fill a segment, in case another request follows.
You may disable Nagle's algorithm on a per-socket basis, by setting the TCP_NODELAY option.
It is customary to disable Nagle in latency-sensitive applications (remote control applications, online games, etc.).
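For reference, setting the option looks roughly like this. It is a minimal sketch for a POSIX system; the helper names disable_nagle and nagle_is_disabled are made up for the example, while setsockopt(), getsockopt(), IPPROTO_TCP and TCP_NODELAY are the standard socket API.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdio.h>
#include <sys/socket.h>

/* Disable Nagle's algorithm on a TCP socket.
 * Returns 0 on success, -1 on error. */
int disable_nagle(int sd)
{
    int on = 1;
    if (setsockopt(sd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof on) != 0) {
        perror("setsockopt(TCP_NODELAY)");
        return -1;
    }
    return 0;
}

/* Report whether TCP_NODELAY is currently set on the socket:
 * 1 = set, 0 = not set, -1 = error. */
int nagle_is_disabled(int sd)
{
    int value = 0;
    socklen_t len = sizeof value;
    if (getsockopt(sd, IPPROTO_TCP, TCP_NODELAY, &value, &len) != 0)
        return -1;
    return value != 0;
}
```

Note that TCP_NODELAY only influences when the sender puts small writes on the wire. It does not introduce message boundaries, so a receiver can still get the data of several send() calls from one recv() and must not rely on this option for framing.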
send() returns the number of bytes sent or an error code, but all the examples I have found check only for the error code, not the number of bytes sent.
//typical example
int cnt=send(s,query,strlen(query),0);
if (cnt < 0) return(NULL);
//Hey, what about cnt < strlen(query)?
Q: Does "send()" always return the whole buffer?
A: No, not necessarily.
From Beej's Guide:
* http://beej.us/guide/bgnet/html/multi/syscalls.html#sendrecv
send() returns the number of bytes actually sent out—this might be
less than the number you told it to send! See, sometimes you tell it
to send a whole gob of data and it just can't handle it. It'll fire
off as much of the data as it can, and trust you to send the rest
later. Remember, if the value returned by send() doesn't match the
value in len, it's up to you to send the rest of the string. The good
news is this: if the packet is small (less than 1K or so) it will
probably manage to send the whole thing all in one go. Again, -1 is
returned on error, and errno is set to the error number.
Q: Does "recv()" always read the whole buffer?
A: No, absolutely not. You should never assume the buffer you've received is "the whole message", nor that the data you received comes from one single message.
Here's a good, short explanation. It's for Microsoft/C#, but it's applicable to all sockets I/O, in any language:
http://blogs.msdn.com/b/joncole/archive/2006/03/20/simple-message-framing-sample-for-tcp-socket.aspx
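Following the advice above, a common pattern is to wrap send() and recv() in loops that continue until the requested number of bytes has been transferred. The helpers below are a minimal sketch for blocking sockets; the names send_all and recv_all are chosen for this example and are not part of the sockets API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Keep calling send() until all len bytes have been accepted.
 * Returns 0 on success, -1 on error (errno is set by send()). */
int send_all(int sd, const void *data, size_t len)
{
    const char *p = data;
    while (len > 0) {
        ssize_t n = send(sd, p, len, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue; /* interrupted before anything was sent: retry */
            return -1;
        }
        p += n;            /* skip the part that was accepted */
        len -= (size_t) n; /* and only resend the remainder */
    }
    return 0;
}

/* Keep calling recv() until exactly len bytes have arrived.
 * Returns 1 on success, 0 if the peer closed the connection first,
 * -1 on error (errno is set by recv()). */
int recv_all(int sd, void *data, size_t len)
{
    char *p = data;
    while (len > 0) {
        ssize_t n = recv(sd, p, len, 0);
        if (n == 0)
            return 0;
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t) n;
    }
    return 1;
}
```

With a length-prefixed protocol, the receiver would call recv_all once for the fixed-size length field and once more for the announced number of payload bytes.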
The answer is in another section of man 2 send:
When the message does not fit into the send buffer of the socket,
send() normally blocks, unless the socket has been placed in nonblocking
I/O mode. In nonblocking mode it would fail with the error EAGAIN
or EWOULDBLOCK in this case. The select(2) call may be used to determine
when it is possible to send more data.
Or, alternatively, the POSIX version (man 3p send):
If space is not available at the sending socket to hold the message to
be transmitted, and the socket file descriptor does not have O_NONBLOCK
set, send() shall block until space is available. If space is not
available at the sending socket to hold the message to be transmitted,
and the socket file descriptor does have O_NONBLOCK set, send() shall
fail. The select() and poll() functions can be used to determine when
it is possible to send more data.
So, while a read of partial data is common, a partial send in blocking mode should not happen (barring implementation details).
Nope, it doesn't.
For reference, see the man page for send:
When the message does not fit into the send buffer of the socket, send()
normally blocks, unless the socket has been placed in nonblocking I/O mode.
In nonblocking mode it would fail with the error EAGAIN or EWOULDBLOCK in this
case. The select(2) call may be used to determine when it is possible to send
more data.
I've read through this question and other two related questions:
When a non-blocking send() only transfers partial data, can we assume it would return EWOULDBLOCK the next call?
Blocking sockets: when, exactly, does "send()" return?
I found that not all answers reach a consensus, and one or two answers reach opposite conclusions.
So I spent quite some time searching in books and playing with this code of @Damon, which he posted in a comment on https://stackoverflow.com/a/19400029/5983841 .
I think most answers are wrong, and my conclusion is:
A call to send has these possible outcomes:
1. There is at least one byte available in the send buffer:
1.1 → if send is blocking (the fd is not set as non-blocking and MSG_DONTWAIT is not specified in send), send blocks until there's enough room for the whole buffer to fit, and sends the whole buffer.
1.2 → if send is non-blocking (fd set as non-blocking or MSG_DONTWAIT is specified in send), send returns the number of bytes accepted (possibly fewer than you asked for).
2. The send buffer is completely full at the time you call send:
→ if the socket is blocking, send blocks
→ if the socket is non-blocking, send fails with EWOULDBLOCK/EAGAIN
3. An error occurred (e.g. user pulled network cable, connection reset by peer) → send fails with another error
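The non-blocking outcomes listed above can be told apart from the return value and errno of a single call. The following is a minimal sketch, assuming a platform that supports MSG_DONTWAIT (such as Linux); the SendOutcome enum and the function name try_send are invented for this illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

typedef enum {
    SEND_COMPLETE,    /* every requested byte was accepted */
    SEND_PARTIAL,     /* some, but not all, bytes were accepted */
    SEND_WOULD_BLOCK, /* send buffer full: nothing accepted, retry later */
    SEND_FAILED       /* any other error; errno describes it */
} SendOutcome;

/* Make one non-blocking send attempt and classify the result.
 * *accepted receives the number of bytes taken by the socket. */
SendOutcome try_send(int sd, const char *buf, size_t len, size_t *accepted)
{
    *accepted = 0;
    ssize_t n = send(sd, buf, len, MSG_DONTWAIT);
    if (n < 0) {
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return SEND_WOULD_BLOCK;
        return SEND_FAILED;
    }
    *accepted = (size_t) n;
    return (size_t) n == len ? SEND_COMPLETE : SEND_PARTIAL;
}
```

After SEND_PARTIAL or SEND_WOULD_BLOCK, a caller would typically wait with select() or poll() until the socket is writable again and then retry with the remaining bytes, starting at buf + accepted.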
#1.1 conforms to man 2 send:
When the message does not fit into the send buffer of the socket, send() normally blocks, unless the socket has been placed in nonblocking I/O mode.
A partial recv is easy to understand. As for a partial send (from The Linux Programming Interface):
61.1 Partial Reads and Writes on Stream Sockets
...
A partial write may occur if there is insufficient buffer space to transfer all of the requested bytes and one of the following is true:
A signal handler interrupted the write() call (Section 21.5) after it transferred some of the requested bytes.
The socket was operating in nonblocking mode (O_NONBLOCK), and it was possible to transfer only some of the requested bytes.
An asynchronous error occurred after only some of the requested bytes had been transferred. By an asynchronous error, we mean an error that occurs asynchronously with respect to the application’s use of calls in the sockets API. An asynchronous error can arise, for example, because of a problem with a TCP connection, perhaps resulting from a crash by the peer application.
In all of the above cases, assuming that there was space to transfer at least 1 byte, the write() is successful, and returns the number of bytes that were transferred to the output buffer.
...
(The signal-interruption case rarely happens, and I had difficulty writing code to demonstrate a partial write in this case. I hope someone can help.)
What man 2 send does not make clear enough:
When the message does not fit into the send buffer of the socket, send() normally blocks, unless the socket has been placed in nonblocking I/O mode.
In nonblocking mode it would fail with the error EAGAIN or EWOULDBLOCK in this case.
is that in nonblocking mode it fails only if the buffer is completely full. If there is even 1 byte available in the send buffer, it does not fail but instead returns the number of bytes that were sent, i.e. a partial send. (The author of the book is also the maintainer of the Linux man pages: https://www.kernel.org/doc/man-pages/maintaining.html .)
Proof by code, written by @Damon. I modified 3~5 lines so that the server doesn't consume any packets, in order to demonstrate the behavior.
#include <cstdio>
#include <cstdlib>
#include <unistd.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>
#include <arpa/inet.h>
int create_socket(bool server = false)
{
addrinfo hints = {};
addrinfo* servinfo;
int sockfd = -1;
int rv;
hints.ai_family = AF_INET;
hints.ai_socktype = SOCK_STREAM;
hints.ai_flags = server ? AI_PASSIVE : 0;
if ((rv = getaddrinfo(server ? 0 : "localhost", "12345", &hints, &servinfo)))
{
printf("getaddrinfo failed: %s\n", gai_strerror(rv));
exit(1);
}
for(auto p = servinfo; p; p = p->ai_next)
{
if ((sockfd = socket(p->ai_family, p->ai_socktype, p->ai_protocol)) == -1)
{
perror("socket");
continue;
}
if(server)
{
int yes = 1;
setsockopt(sockfd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(yes));
if(bind(sockfd, p->ai_addr, p->ai_addrlen) == -1)
{
close(sockfd);
perror("bind");
continue;
}
}
else
{
if(connect(sockfd, p->ai_addr, p->ai_addrlen) == -1)
{
close(sockfd);
perror("connect");
continue;
}
else
puts("client: connected");
}
break;
}
freeaddrinfo(servinfo);
return sockfd;
}
void server()
{
int socket = create_socket(true);
if(listen(socket, 5) == -1)
{
perror("listen");
exit(1);
}
puts("server: listening");
int conn = -1;
sockaddr_storage addr;
unsigned int sizeof_addr = sizeof(addr);
for(;;)
{
if((conn = accept(socket, (sockaddr *) &addr, &sizeof_addr)) == -1)
{
perror("accept");
}
else
{
puts("server: accept");
if(!fork()) // actually not necessary, only got 1 client
{
close(socket);
// char *buf = new char[1024*1024];
// read(conn, buf, 1024*1024); // black hole
// server never reads
break;
}
}
}
}
void do_send(int socket, const char* buf, unsigned int size/*, bool nonblock = false */)
{
unsigned int sent = 0;
unsigned int count = 0;
while(sent < size)
{
int n = send(socket, &buf[sent], size - sent, 0);
// int n = send(socket, &buf[sent], size - sent, MSG_DONTWAIT);
if (n == -1)
{
if(errno == EAGAIN)
{
printf(".");
printf("\n");
}
else
{
perror("\nsend");
return;
}
}
else
{
sent += n;
printf(" --> sent a chunk of %u bytes (send no. %u, total sent = %u)\n", n, ++count, sent);
}
}
}
void client()
{
const unsigned int max_size = 64*1024*1024; // sending up to 64MiB in one call
sleep(1); // give server a second to start up
int socket = create_socket();
unsigned int send_buffer_size = 0;
unsigned int len = sizeof(send_buffer_size);
if(getsockopt(socket, SOL_SOCKET, SO_SNDBUF, &send_buffer_size, &len))
perror("getsockopt");
// Linux internally doubles the buffer size, and getsockopt reports the doubled size
printf("send buffer size = %u (doubled, actually %u)\n", send_buffer_size, send_buffer_size/2);
if(socket == -1)
{
puts("no good");
exit(1);
}
char *buf = new char[max_size]; // uninitialized contents, but who cares
for(unsigned int size = 65536; size <= max_size; size += 16384)
{
printf("attempting to send %u bytes\n", size);
do_send(socket, buf, size);
}
puts("all done");
delete[] buf; // memory from new[] must be released with delete[]
}
int main()
{
setvbuf(stdout, NULL, _IONBF, 0);
if(fork() > 0) server(); else client();
return 0;
}
compile and run
g++ -g -Wall -o send-blocking-and-server-never-read code-of-damon.cpp
./send-blocking-and-server-never-read > log1.log 2>&1
log1.log content
server: listening
client: connectedserver: accept
send buffer size = 2626560 (doubled, actually 1313280)
attempting to send 65536 bytes
--> sent a chunk of 65536 bytes (send no. 1, total sent = 65536)
attempting to send 81920 bytes
--> sent a chunk of 81920 bytes (send no. 1, total sent = 81920)
attempting to send 98304 bytes
--> sent a chunk of 98304 bytes (send no. 1, total sent = 98304)
attempting to send 114688 bytes
--> sent a chunk of 114688 bytes (send no. 1, total sent = 114688)
attempting to send 131072 bytes
--> sent a chunk of 131072 bytes (send no. 1, total sent = 131072)
attempting to send 147456 bytes
--> sent a chunk of 147456 bytes (send no. 1, total sent = 147456)
attempting to send 163840 bytes
--> sent a chunk of 163840 bytes (send no. 1, total sent = 163840)
attempting to send 180224 bytes
--> sent a chunk of 180224 bytes (send no. 1, total sent = 180224)
attempting to send 196608 bytes
--> sent a chunk of 196608 bytes (send no. 1, total sent = 196608)
attempting to send 212992 bytes
--> sent a chunk of 212992 bytes (send no. 1, total sent = 212992)
attempting to send 229376 bytes
--> sent a chunk of 229376 bytes (send no. 1, total sent = 229376)
attempting to send 245760 bytes
--> sent a chunk of 245760 bytes (send no. 1, total sent = 245760)
attempting to send 262144 bytes
--> sent a chunk of 262144 bytes (send no. 1, total sent = 262144)
attempting to send 278528 bytes
--> sent a chunk of 278528 bytes (send no. 1, total sent = 278528)
attempting to send 294912 bytes
Then comment out int n = send(socket, &buf[sent], size - sent, 0); and uncomment int n = send(socket, &buf[sent], size - sent, MSG_DONTWAIT);
compile and run again
g++ -g -Wall -o send-nonblocking-and-server-never-read code-of-damon.cpp
./send-nonblocking-and-server-never-read > log2.log 2>&1
log2.log content
server: listening
server: accept
client: connected
send buffer size = 2626560 (doubled, actually 1313280)
attempting to send 65536 bytes
--> sent a chunk of 65536 bytes (send no. 1, total sent = 65536)
attempting to send 81920 bytes
--> sent a chunk of 81920 bytes (send no. 1, total sent = 81920)
attempting to send 98304 bytes
--> sent a chunk of 98304 bytes (send no. 1, total sent = 98304)
attempting to send 114688 bytes
--> sent a chunk of 114688 bytes (send no. 1, total sent = 114688)
attempting to send 131072 bytes
--> sent a chunk of 131072 bytes (send no. 1, total sent = 131072)
attempting to send 147456 bytes
--> sent a chunk of 147456 bytes (send no. 1, total sent = 147456)
attempting to send 163840 bytes
--> sent a chunk of 163840 bytes (send no. 1, total sent = 163840)
attempting to send 180224 bytes
--> sent a chunk of 180224 bytes (send no. 1, total sent = 180224)
attempting to send 196608 bytes
--> sent a chunk of 196608 bytes (send no. 1, total sent = 196608)
attempting to send 212992 bytes
--> sent a chunk of 212992 bytes (send no. 1, total sent = 212992)
attempting to send 229376 bytes
--> sent a chunk of 229376 bytes (send no. 1, total sent = 229376)
attempting to send 245760 bytes
--> sent a chunk of 245760 bytes (send no. 1, total sent = 245760)
attempting to send 262144 bytes
--> sent a chunk of 262144 bytes (send no. 1, total sent = 262144)
attempting to send 278528 bytes
--> sent a chunk of 278528 bytes (send no. 1, total sent = 278528)
attempting to send 294912 bytes
--> sent a chunk of 178145 bytes (send no. 1, total sent = 178145)
.
.
.
.
.
.
// endless .
Compare the last lines of log1.log and log2.log and you can tell that a blocking send blocks when there is not enough buffer space to fit all 294912 bytes, while a non-blocking send performs a partial write. This conforms to conclusion #1.
Special thanks to @user207421, whose differing opinion led me to search further.