C - Socket TCP - Infinite loop with read - c

I have a problem when I want to execute two consecutive times the read function Address to retrieve all the data sent by the server.
I read once then read create an infinity loop.
Code :
char buff[50] = {0};
nbytes = 1;
while (nbytes > 0)
{
nbytes = read(m_socket, buff, sizeof(buff));
}
why read create infinity loop ? is not the "while" the problem.
thank you for your answers.

socket(2) gives you a blocking file descriptor, meaning a system call like read(2) blocks until there's enough some data available to fulfill your request, or end of stream (for TCP) happens (return value 0), or some error happens (return value -1).
This means you never get out of your while loop until you hit an error, or the other side closes the connection.
Edit 0:
#EJP is thankfully here to correct me, as usual - read(2) blocks until any data is available (not the whole thing you requested, as I initially stated above), including an end-of-stream or an error.

Related

How does recv work in socket programming?

I'm trying to understand recv() at a high level. So recv takes data in "chunks" but I'm still not getting how it is precisely handled. Example:
char buffer[1000];
int received= recv(sock, buffer, sizeof(buffer), 0)
Does this mean if I'm receiving a massive file, the buffer, if connected through sock might for example reflect it stored 500 bytes in the received variable, then in a loop receive another 300 bytes, and all 800 bytes of data will be stored in buffer by the end of the loop (lost in the received variable unless accounted for), or does buffer need a pointer to keep track of where it last received the data to store it in then next iteration?
recv has no context. All it knows that it got some address (pointer) to write into and some maximum size - and then it will try this. It will always start writing with the given address. If for example on wish to add data after some previously received data one can simply give the pointer into the location after the previous data instead of the beginning of the buffer. Of course on should adjust the maximum size it is allowed to read to not overflow the buffer.
You asked "How does recv() work?", so it may be worth briefly studying a simpler function that does essentially the same thing - read().
recv() operates in more or less the same way as the read() function. The main difference is that recv() allows you to pass flags in the last argument - but you are not using these flags anyway.
My suggestion would be - before trying to use recv() to read from a network socket - to practice using read() on a plain text file.
Both functions return the number of bytes read - except in the case of an error, in which case they will return -1. You should always check for this scenario - and handle appropriately.
Both functions can also return less than the number of bytes requested. In the case of recv() - and reading from a socket - this may be because the other end has simply not sent all the required data yet. In the case of a reading from a file - with read() - it may be because you have reached the end of the file.
Anyway ...
You will need to keep track of the current offset within your buffer - and update it on each read. So declare a file-scope variable offset.
static off_t offset; static char buffer[1000];
Then - when your 'loop' is running - increment the offset after each read ...
while (1) {
size_t max_len = sizeof(buffer) - offset;
ssize_t count = recv(sock, buffer+offset, max_len, 0);
if (count == -1) {
switch (errno) {
case EAGAIN:
usleep(20000);
break;
default:
perror("Failed to read from socket");
close(sock);
break;
}
}
if (count == 0) {
puts("Looks like connection has been closed.");
break;
}
offset += count;
if (offset >= expected_len) {
puts("Got the expected amount of data. Wrapping up ...");
}
}
Notes:
Using this approach, you will either need to know the expected amount of data before-hand - or use a special delimiter to mark the end of the message
the max_len variable indicates how much space is left in your buffer - and (perhaps needless to say) you should not try to read more bytes than this
the destination for the recv() command is buffer+offset - not buffer.
if recv() returns zero, AFAIK this indicates that the other end has performed an "orderly shutdown".
if recv() returns -1, you really need to check the return code. EAGAIN is non-fatal - and just means you need to try again.

Write to file descriptor and immediately read from it

Today I have encountered some weird looking code that at first glance it's not apparent to me what it does.
send(file_desc,"Input \'y\' to continue.\t",0x18,0);
read(file_desc,buffer,100);
iVar1 = strcmp("y",(char *)buffer);
if (iVar1 == 0) {
// some more code
}
It seems that a text string is being written into the file descriptor. Immediately then after that it reads from that file descriptor into a buffer. And it compares if the text written into the buffer is a "y".
My understanding (please correct me if I am wrong), is that it writes some data which is a text string into the file descriptor, and then the file descriptor acts as a temporary storage location for anything you write to it. And after that it reads that data from the file descriptor into the buffer. It actually is the same file descriptor. It seems as a primitive way of using a file descriptor to copy data from the text string into the buffer. Why not just use a strcpy() instead?
What would be the use case of writing to a file descriptor and then immediately read from it? It seems like a convoluted way to copy data using file descriptors. Or maybe I don't understand this code well enough, what this sequence of a send() and a read() does?
And assuming that this code is instead using the file descriptor to copy the text string "Input \'y\' to continue.\t" into the buffer, why are they comparing it with the string "y"? It should probably be false every single time.
I am assuming that any data written into a file descriptor stays in that file descriptor until it is read from. So here it seems that send() is being used to write the string into, and read() is used to read it back out.
In man send it says:
The only difference between send() and write(2) is the presence of flags. With a zero
flags argument, send() is equivalent to write(2).
why would they use send() instead of write()? This code is just so mind boggling.
Edit: here's the full function where this code is originally from:
void send_read(int file_desc)
{
int are_equal;
undefined2 buffer [8];
char local_28 [32];
/* 0x6e == 110 == 'n' */
buffer[0] = 0x6e;
send(file_desc,"Input \'y\' to continue.\t",0x18,0);
read(file_desc,buffer,100);
are_equal = strcmp("y",(char *)buffer);
if (are_equal == 0) {
FUN_00400a86(file_desc,local_28);
}
else {
close(file_desc);
}
return;
}
The send() and recv() functions are for use with sockets (send: send a message on a socket — recv: receive a message from a connected socket). See also the POSIX description of Sockets in general.
Socket file descriptors are bi-directional — you can read and write on them. You can't read what you wrote, unlike with pipe file descriptors. With pipes, the process writing to the write end of a pipe can read what it wrote from the read end of the pipe — if another process didn't read it first. When a process writes on a socket, that information goes to the peer process and cannot be read by the writer.
send(2) is a system call that can only be used with sockets. A socket is a descriptor that allows you to use it to send data or receive from a remote point (a remote socket) that can be on a different computer or in the same as you are. But it works like a phone line, what you send is received by your parnter and what he/she sends is received by you. read(2) system call can be used by sockets, while send(2) cannot be used by files, so your sample code is mixing calls related to files with calls related to sockets (that's not uncommon, as read(2) and write(2) can both be used with sockets)
The code you post above is erroneous, as it blindly compares the received buffer with strcmp function, assuming that it received a null terminated string. This can be the case, but it also cannot.
Even if the sender (in the other side of the connection) agreed on sending a full message, nul terminated string. The receiver must first get the amount of data received (this is the return value of the read(2) call, which can be:
-! indicating some error on reception. The connection can be reset by the other side, or the other side can have rebooted while you send the data.
0 indicating no more data or end of data (the other side closed the connection) This can happen if the other side has a timeout and you take too much to respond. It closes the connection without sending anything. You just receive nothing.
n some data, less than the buffer size, but including the full packet sent by the peer (and the agreed nul byte it sent with it). This is the only case in which you can safely strcmp the data.
n some data, less than the buffer size, and less than the data transmitted. This can happen due to some data fragmentation of the data in several packets. Then you have to do another read until you have all the data send by your peer. Packet fragmentation is something natural in TCP, for example.
n some data, less than the buffer size, and more than the data transmitted. The sender did another transmit, after the one you receive, and both packets got into the kernel buffer. You have to investigate this case, as you have one full packet, and must save the rest of the received data in the buffer, for later processing, or you'll lose data you have received.
n some data, the full buffer filled, and no space to store the full transmitted data remained. You have filled the buffer and no \0 char came... the packet is larger than the buffer, you run out of buffer space and have to decide what to do (allocate other buffer to receive the rest, discard the data, or whatever you decide to do) This will not happen to you because you expect a packet of 1 or 2 characters, and you have a buffer of 100, but who knows...
At least, and as a minimum safe net, you can do this:
send(file_desc,"Input \'y\' to continue.\t",0x18,0);
int n = read(file_desc,buffer,sizeof buffer - 1); /* one cell reserved for '\0' */
switch (n) {
case -1: /* error */
do_error();
break;
case 0: /* disconnect */
do_disconnect();
break;
default: /* some data */
buffer[n] = '\0'; /* append the null */
break;
}
if (n > 0) {
iVar1 = strcmp("y",(char *)buffer);
if (iVar1 == 0) {
// some more code
}
}
Note:
As you didn't post a complete and verifiable example, I couldn't post a complete and verifiable response.
My apologies for that.

Is it OK to loop over recv / read to read all data from socket

I'm building a multi-client<->server messaging application over TCP.
I created a non blocking server using epoll to multiplex linux file descriptors.
When a fd receives data, I read() /or/ recv() into buf.
I know that I need to either specify a data length* at the start of the transmission, or use a delimiter** at the end of the transmission to segregate the messages.
*using a data length:
char *buffer_ptr = buffer;
do {
switch (recvd_bytes = recv(new_socket, buffer_ptr, rem_bytes, 0)) {
case -1: return SOCKET_ERR;
case 0: return CLOSE_SOCKET;
default: break;
}
buffer_ptr += recvd_bytes;
rem_bytes -= recvd_bytes;
} while (rem_bytes != 0);
**using a delimiter:
void get_all_buf(int sock, std::string & inStr)
{
int n = 1, total = 0, found = 0;
char c;
char temp[1024*1024];
// Keep reading up to a '\n'
while (!found) {
n = recv(sock, &temp[total], sizeof(temp) - total - 1, 0);
if (n == -1) {
/* Error, check 'errno' for more details */
break;
}
total += n;
temp[total] = '\0';
found = (strchr(temp, '\n') != 0);
}
inStr = temp;
}
My question: Is it OK to loop over recv() until one of those conditions is met? What if a client sends a bogus message length or no delimiter or there is packet loss? Wont I be stuck looping recv() in my program forever?
Is it OK to loop over recv() until one of those conditions is met?
Probably not, at least not for production-quality code. As you suggested, the problem with looping until you get the full message is that it leaves your thread at the mercy of the client -- if a client decides to only send part of the message and then wait for a long time (or even forever) without sending the last part, then your thread will be blocked (or looping) indefinitely and unable to serve any other purpose -- usually not what you want.
What if a client sends a bogus message length
Then you're in trouble (although if you've chosen a maximum-message-size you can detect obviously bogus message-lengths that are larger than that size, and defend yourself by e.g. forcibly closing the connection)
or there is packet loss?
If there is a reasonably small amount of packet loss, the TCP layer will automatically retransmit the data, so your program won't notice the difference (other than the message officially "arriving" a bit later than it otherwise would have). If there is really bad packet loss (e.g. someone pulled the Ethernet cable out of the wall for 5 minutes), then the rest of the message might be delayed for several minutes or more (until connectivity recovers, or the TCP layer gives up and closes the TCP connection), trapping your thread in the loop.
So what is the industrial-grade, evil-client-and-awful-network-proof solution to this dilemma, so that your server can remain responsive to other clients even when a particular client is not behaving itself?
The answer is this: don't depend on receiving the entire message all at once. Instead, you need to set up a simple state-machine for each client, such that you can recv() as many (or as few) bytes from that client's TCP socket as it cares to send to you at any particular time, and save those bytes to a local (per-client) buffer that is associated with that client, and then go back to your normal event loop even though you haven't received the entire message yet. Keep careful track of how many valid received-bytes-of-data you currently have on-hand from each client, and after each recv() call has returned, check to see if the associated per-client incoming-data-buffer contains an entire message yet, or not -- if it does, parse the message, act on it, then remove it from the buffer. Lather, rinse, and repeat.

What is the best way to read input of unpredictable and indeterminate (ie no EOF) size from stdin in C?

This must be a stupid question because this should be a very common and simple problem, but I haven't been able to find an answer anywhere, so I'll bite the bullet and ask.
How on earth should I go about reading from the standard input when there is no way of determining the size of the data? Obviously if the data ends in some kind of terminator like a NUL or EOF then this is quite trivial, but my data does not. This is simple IPC: the two programs need to talk back and forth and ending the file streams with EOF would break everything.
I thought this should be fairly simple. Clearly programs talk to each other over pipes all the time without needing any arcane tricks, so I hope there is a simple answer that I'm too stupid to have thought of. Nothing I've tried has worked.
Something obvious like (ignoring necessary realloc's for brevity):
int size = 0, max = 8192;
unsigned char *buf = malloc(max);
while (fread((buf + size), 1, 1, stdin) == 1)
++size;
won't work since fread() blocks and waits for data, so this loop won't terminate. As far as I know nothing in stdio allows nonblocking input, so I didn't even try any such function. Something like this is the best I could come up with:
struct mydata {
unsigned char *data;
int slen; /* size of data */
int mlen; /* maximum allocated size */
};
...
struct mydata *buf = xmalloc(sizeof *buf);
buf->data = xmalloc((buf->mlen = 8192));
buf->slen = 0;
int nread = read(0, buf->data, 1);
if (nread == (-1))
err(1, "read error");
buf->slen += nread;
fcntl(0, F_SETFL, oflags | O_NONBLOCK);
do {
if (buf->slen >= (buf->mlen - 32))
buf->data = xrealloc(buf->data, (buf->mlen *= 2));
nread = read(0, (buf->data + buf->slen), 1);
if (nread > 0)
buf->slen += nread;
} while (nread == 1);
fcntl(0, F_SETFL, oflags);
where oflags is a global variable containing the original flags for stdin (cached at the start of the program, just in case). This dumb way of doing it works as long as all of the data is present immediately, but fails otherwise. Because this sets read() to be non-blocking, it just returns -1 if there is no data. The program communicating with mine generally sends responses whenever it feels like it, and not all at once, so if the data is at all large this exits too early and fails.
How on earth should I go about reading from the standard input when there is no way of determining the size of the data?
There always has to be a way to determinate the size. Otherwise, the program would require infinite memory, and thus impossible to run on a physical computer.
Think about it this way: even in the case of a never-ending stream of data, there must be some chunks or points where you have to process it. For instance, a live-streamed video has to decode a portion of it (e.g. a frame). Or a video game which processes messages one by one, even if the game has undetermined length.
This holds true regardless of the type of I/O you decide to use (blocking/non-blocking, synchronous/asynchronous...). For instance, if you want to use typical blocking synchronous I/O, what you have to do is process the data in a loop: each iteration, you read as much data as is available, and process as much as you can. Whatever you can not process (because you have not received enough yet), you keep for the next iteration. Then, the rest of the loop is the rest of the logic of the program.
In the end, regardless of what you do, you (or someone else, e.g. a library, the operating system, the hardware buffers...) have to buffer incoming data until it can be processed.
Basically, you have two choices -- synchronous or asynchronous -- and both have their advantages and disadvantages.
For synchronous, you need either delimeters or a length field embedded in the record (or fixed length records, but that is pretty inflexible). This works best for synchronous protocols like synchronous rpc or simplex client-server interactions where only one side talks at a time while the other side waits. For ASCII/text based protocols, it is common to use a control-character delimiter like NL/EOL or NUL or CTX to mark the end of messages. Binary protocols more commonly use an embedded length field -- the receiver first reads the length and then reads the full amount of (expected) data.
For asynchronous, you use non-blocking mode. It IS possible to use non-blocking mode with stdio streams, it just requires some care. out-of-data conditions show up to stdio like error conditions, so you need to use ferror and clearerr on the FILE * as appropriate.
It's possible for both to be used -- for example in client-server interactions, the clients may use synchronous (they send a request and wait for a reply) while the server uses asynchronous (to be be robust in the presence of misbehaving clients).
The read api on Linux or the ReadFile Api on windows will immediately return and not wait for the specified number of bytes to fill the buffer (when reading a pipe or socket). Read then reurns the number of bytes read.
This means, when reading from a pipe, you set a buffersize, read as much as returned and the process it. You then read the next bit. The only time you are blocked is if there is no data available at all.
This differs from fread which only returns once the desired number of bytes are returned or the stream determines doing so is impossible (like eof).

Non-block socket consecutive file transfer

The server that Im working on (which is a Unix C multi-threaded non-block socket server) need to receive a file from a client and broadcast it to all the other clients connected to the server.
Everything is working at the exception that Im having a hard time to determine when a file is done transferring... since Im using non-block socket Im having the issue that sometimes during the file transfer recv return -1 (which I was assuming was the end of the file) then the next pass more bytes comes in.
I try to hack the whole thing putting "END" at the end of the stream. However, sometimes when multiple files are sent in a row the "END" is part of the same recv buffer as the beginning of the next file. Or even worst, sometimes I end up with a buffer that finish with EN and the next pass the D comes in.
What would be the best approach to avoid the situations mentioned above, I don't really want that each time I receive some bytes from the socket loop the whole accumulated buffer to check if "END" is part of it then cut appropriately... Im sure there's a better solution to this right?
Thanks in advance!
If recv() returns -1 it is an error and you need to inspect errno. Most probably it was EAGAIN or EWOULDBLOCK, which just means there is no data currently in the socket receive buffer. So you need to re-select().
When recv() returns zero the peer has disconnected the socket and the transfer is complete.
Signaling the end of a file with some byte sequence is not reliable, the file could contain that sequence. First send the file length - 4 bytes or 8 if you allow huge file transfer, use network byte order.
if ((n = read(..., filelen)) > 0) {
filelen -= n;
}
The most simpe case EJP is referring to, the case where you take the closing of the socket by the other end as end-of-file, could look like the following:
{
ssize_t sizeRead = 0;
while (sizeRead = recv(...)) {
if (0 > sizeRead) { /* recv() failed */
if ((EGAGAIN == errno) ¦¦ (EWOULDBLOCK == errno)) { /* retry the recv() on those two kinds of error */
usleep(1) /* optional */
continue;
}
else
break;
}
... /* process the data read ... */
}
if (0 > sizeRead) {
/* There had been an error during recv() */
}
}

Resources