TCP Client - Receive message of unknown / unlimited size - c

I am currently working on a university assignment and am facing a problem I cannot solve on my own. I'm developing a TCP client which connects to a server and receives a message from it.
The client should be able to work with strings of any length and output all received characters until the server closes the connection.
My client works: with a fixed string length I can receive messages from, e.g., djxmmx.net port 17. However, I have no idea how to handle an arbitrary length.
My C knowledge is really poor, which is why I need some suggestions, ideas or tips on how to implement this.
Currently, this is my code for receiving messages:
// receive data from the server
char server_response[512];
recv(client_socket, server_response, sizeof(server_response), 0);

If you're going to work with input of essentially unlimited length, you will need to call recv() several times in a loop to get each succeeding section of the input. If you can deal with each section at a time, and then discard it and move onto the next section, that's one approach. If you are going to need to process all the input in one go, you're going to have to find a way of storing arbitrarily large amounts of data, probably using dynamic memory allocation.
With recv() you will probably want to loop reading content until it returns 0, indicating that the peer has performed an orderly shutdown (see the recv(2) man page), and stop with an error if it returns -1. That might look something like this:
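char server_response[512];
ssize_t bytes_read;
while ((bytes_read = recv(client_socket, server_response,
                          sizeof(server_response), 0)) > 0) {
    /* do something with the bytes_read bytes of data
       in server_response[] (note: they are not NUL-terminated) */
}
if (bytes_read == -1) {
    /* 0 means the server closed the connection; -1 means an error */
    perror("recv");
}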
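If the whole response has to be kept in memory (the dynamic-allocation option mentioned above), one possible sketch grows a heap buffer with realloc() until recv() reports an orderly shutdown. The function name recv_all() and the doubling growth policy are illustrative assumptions, not part of any library.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: read from a connected stream socket until the peer
   performs an orderly shutdown (recv() returns 0). Returns a malloc'd,
   NUL-terminated buffer and stores the byte count in *out_len, or returns
   NULL on error. The caller must free() the result. */
char *recv_all(int sock, size_t *out_len)
{
    size_t cap = 512, len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    for (;;) {
        if (cap - len < 2) {                 /* keep room for data + final NUL */
            char *bigger = realloc(buf, cap * 2);
            if (bigger == NULL) {
                free(buf);
                return NULL;
            }
            buf = bigger;
            cap *= 2;
        }
        ssize_t n = recv(sock, buf + len, cap - len - 1, 0);
        if (n > 0) {
            len += (size_t)n;
        } else if (n == 0) {                 /* orderly shutdown: done */
            break;
        } else if (errno != EINTR) {         /* real error (EINTR: just retry) */
            free(buf);
            return NULL;
        }
    }
    buf[len] = '\0';    /* convenience for text; the data itself may contain NULs */
    *out_len = len;
    return buf;
}
```

Doubling the capacity keeps the number of realloc() calls logarithmic in the message size; if each chunk can be processed and discarded instead, the fixed 512-byte loop above needs no allocation at all.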

Related

tcp send and recv: always in loops?

What are the best practices when sending (or writing) and receiving (or reading) to/from a TCP socket?
Assume usual blocking I/O on sockets. From what I understand:
writing (sending) should be fine without a loop, because it will block if the write buffer of the socket is full, so something like
if ((nbytes_w = write(sock, buf, nb)) < nb)
/* something bad happened : error or interrupted by signal */
should always be correct?
On the other hand, there is no guarantee that one will read a full message, so one should read with
while ((nbytes_r = read(sock, buf, MAX)) > 0) {
/* do something with these bytes */
/* break if encounter specific application protocol end of message flag
or total number of bytes was known from previous message
and/or application protocol header */
}
Am I correct? Or is there some "small message size" or other condition that makes it safe to read outside a loop?
I am confused because I have seen examples of "naked reads", for instance in Tanenbaum-Wetherall:
read(sa, buf, BUF_SIZE); /* read file name in socket */
Yes, you must loop on the receive.
Once a week I answer a question where someone's TCP app stops working for this very reason. The real killer is that they developed the client and server on the same machine, so they use a loopback connection. Almost all the time, a loopback connection will deliver the sent messages in the same blocks as they were sent. This makes it look like the code is correct.
The really big challenge is that this means you need to know, before the loop, how big the message you are going to receive is. Possibilities:
Send a fixed-size length field first (i.e. you know it is, say, 4 bytes).
Have a recognizable end sequence (like the double CRLF at the end of an HTTP request's headers).
Use fixed-size messages.
I would always have a 'pull the next n bytes' function.
Writing should loop too, but that's easy; it's just a matter of looping.
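One possible version of the 'pull the next n bytes' function, together with the matching write loop, is sketched below. The names recv_exact() and send_all() are made up for illustration, and treating EINTR as "retry" assumes blocking sockets.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* "Pull the next n bytes": loop until exactly n bytes have arrived.
   Returns n on success, 0 if the peer closed the connection first,
   or -1 on error (errno is set by recv()). */
ssize_t recv_exact(int sock, void *buf, size_t n)
{
    size_t got = 0;
    while (got < n) {
        ssize_t r = recv(sock, (char *)buf + got, n - got, 0);
        if (r > 0)
            got += (size_t)r;
        else if (r == 0)
            return 0;                 /* connection closed mid-message */
        else if (errno != EINTR)
            return -1;                /* EINTR: interrupted by a signal, retry */
    }
    return (ssize_t)n;
}

/* Write side: send() may accept fewer bytes than requested, so keep
   sending from where the previous call stopped. Returns 0 or -1. */
int send_all(int sock, const void *buf, size_t n)
{
    size_t sent = 0;
    while (sent < n) {
        ssize_t r = send(sock, (const char *)buf + sent, n - sent, 0);
        if (r >= 0)
            sent += (size_t)r;
        else if (errno != EINTR)
            return -1;
    }
    return 0;
}
```

With a 4-byte length field, for example, the receiver calls recv_exact() once for the header and once more for the payload size it announces.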

What is the correct way to use send() on sockets when the full message has not been sent in one go?

I am writing a simple C server that may sometimes not send or receive the full message in one call. I have looked at Beej's guide and the Linux man page, among other resources. I cannot figure out how to send and receive when multiple send and receive calls are necessary. This is what I have tried to do for send:
char buffer[4096];
int client_socket, buffer_len, message_len, position = 0;
....
while (position < message_len) {
position = send(client_socket, buffer, message_len, 0);
}
I am not sure if I should be doing that, or this:
while (position < message_len) {
position = send(client_socket, buffer+position, message_len-position, 0);
}
The docs do not address this and I cannot find a usage example that has send within a while loop. Some C functions can track state between function calls (such as strtok) but I am not sure if send does. What I don't want to do is repeatedly send from the beginning of the message until it completes in one go.
It is necessary that I send files that are up to 50MB at a time and so there will likely be more than one call to send in this scenario.
send() returns the number of bytes sent, or -1 if an error occurred. If you keep track of how many bytes you have sent, you can use that as an offset in the buffer you send from. The length of the message that remains to be sent of course decreases by the same amount.
int bytes_sent_total = 0;
ssize_t bytes_sent_now;   /* send() returns ssize_t, not int */
while (bytes_sent_total < message_len)
{
    bytes_sent_now = send(client_socket, &buffer[bytes_sent_total], message_len - bytes_sent_total, 0);
    if (bytes_sent_now == -1)
    {
        // Handle error (if errno == EINTR, retrying is usually fine)
        break;
    }
    bytes_sent_total += bytes_sent_now;
}
Assuming you're using a stream socket (not specified), it doesn't actually matter how many calls to send() your program makes. The socket library offers the abstraction of sending data as writing to a file. The network layer will divide the data into small packets for sending them through the net.
On the client side, the network layer reassembles the received packets and offers a similar abstraction for the client, so that receiving data is like reading from a file. So you don't have to read the entire buffer in a single call.
For the client side, this introduces a small complication: when to stop reading? Common idioms are:
Knowing beforehand how much data to expect (by protocol design).
Iterating reads of small chunks (say: 1k or so) with a reasonable timeout, stop on timeout.
Prepending the data with a field containing its size.
Closing the socket right after sending the data (that's what HTTP/1.0 does by default).
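The timeout idiom can be sketched with the POSIX SO_RCVTIMEO socket option: once it is set, a blocking recv() fails with EAGAIN or EWOULDBLOCK when no data arrives in time. The helper name and the policy of treating a timeout as "end of data" are assumptions for illustration; a slow network can make this idiom cut a message short, which is why the protocol-based idioms are usually preferable.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/time.h>

/* Hypothetical helper: read into buf (capacity cap) in ~1k chunks until the
   peer closes, the buffer is full, or no data arrives for timeout_ms.
   Returns the number of bytes read, or -1 on a real error. */
ssize_t recv_until_idle(int sock, char *buf, size_t cap, long timeout_ms)
{
    struct timeval tv;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    if (setsockopt(sock, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv) == -1)
        return -1;

    size_t len = 0;
    while (len < cap) {
        size_t chunk = cap - len < 1024 ? cap - len : 1024;
        ssize_t n = recv(sock, buf + len, chunk, 0);
        if (n > 0)
            len += (size_t)n;
        else if (n == 0)
            break;                                 /* peer closed */
        else if (errno == EAGAIN || errno == EWOULDBLOCK)
            break;                                 /* idle for timeout_ms: stop */
        else if (errno != EINTR)
            return -1;
    }
    return (ssize_t)len;
}
```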

TCP client failed to send string to server

I am programming a TCP server and client. I am sending three strings separately, using a separate send() call for each.
But on the receiving end I only get a single string, the first one I sent; the remaining two strings are missing.
Below is the relevant part of my server and client programs.
client.c
char *info = "infolog";
char *size = "filesize";
char *end = "fileend";
send(client, info, strlen(info)+1, 0);
send(client, size, strlen(size)+1, 0);
send(client, end, strlen(end)+1, 0);
server.c
while ((read_size = recv(client, msg, sizeof(msg), 0))) {
printf("Data: %s\n", msg);
memset(msg, 0, sizeof(msg));
}
Actual output:
Data: infolog
Expected output:
Data: infolog
Data: filesize
Data: fileend
Thanks.
Try printing out read_size. You probably have received all the messages already.
Due to Nagle's Algorithm, the sender probably batched up your three send() calls and sent a single packet to the server. While you can disable Nagle's algorithm, I don't think it's a good idea in this case. Your server needs to be able to handle receiving of partial data, and handle receiving more data than it expects.
You might want to look into using an upper-layer protocol for your messages, such as Google Protocol Buffers. Take a look at the techniques page, where they describe how they might do it: build up a protocol buffer, and write its length to the stream before writing the buffer itself. That way the receive side can read the length and then determine how many bytes it needs to read before it has a complete message.
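The length-prefix technique does not require Protocol Buffers. A minimal sketch with a 4-byte big-endian (network byte order) length in front of each payload follows; the frame layout, the MAX_FRAME limit and all function names are assumptions for illustration. It uses recv()'s MSG_WAITALL flag, which asks the kernel to wait for the full amount but can still return early, so it loops anyway.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

#define MAX_FRAME (1u << 20)   /* assumed sanity limit: 1 MiB per message */

/* Write all n bytes, looping over partial send()s (EINTR handling omitted). */
static int write_all(int sock, const void *buf, size_t n)
{
    const char *p = buf;
    while (n > 0) {
        ssize_t r = send(sock, p, n, 0);
        if (r < 0)
            return -1;
        p += r;
        n -= (size_t)r;
    }
    return 0;
}

/* Read exactly n bytes; MSG_WAITALL may still return short on a signal. */
static int read_all(int sock, void *buf, size_t n)
{
    char *p = buf;
    while (n > 0) {
        ssize_t r = recv(sock, p, n, MSG_WAITALL);
        if (r <= 0)
            return -1;             /* error, or peer closed mid-frame */
        p += r;
        n -= (size_t)r;
    }
    return 0;
}

int send_frame(int sock, const void *payload, uint32_t len)
{
    uint32_t hdr = htonl(len);     /* length first, in network byte order */
    if (write_all(sock, &hdr, sizeof hdr) == -1)
        return -1;
    return write_all(sock, payload, len);
}

/* Returns a malloc'd payload (caller frees) and its length in *len, or NULL. */
char *recv_frame(int sock, uint32_t *len)
{
    uint32_t hdr;
    if (read_all(sock, &hdr, sizeof hdr) == -1)
        return NULL;
    *len = ntohl(hdr);
    if (*len > MAX_FRAME)
        return NULL;               /* refuse absurd sizes from the peer */
    char *payload = malloc((size_t)*len + 1);
    if (payload == NULL)
        return NULL;
    if (read_all(sock, payload, *len) == -1) {
        free(payload);
        return NULL;
    }
    payload[*len] = '\0';          /* convenience terminator for text */
    return payload;
}
```

Because the receiver learns each message's size before reading it, the three strings from the question come back as three separate frames no matter how TCP chunked them.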
TCP is not a message protocol but a byte stream protocol.
The three send() calls could be received by a single recv() (or split differently, e.g. across two or five recv() calls).
The application should analyze the input and buffer it to be able to split it into meaningful messages.
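Because the question's client sends each string with strlen()+1 bytes, the terminating NUL already works as a message delimiter. A minimal sketch of buffering the stream and handing out only complete strings follows; struct nul_reader, READER_CAP and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

#define READER_CAP 4096   /* assumed maximum message size, including the NUL */

/* Buffers bytes from a TCP stream and hands out complete NUL-terminated
   strings; a string split across recv() calls waits for its remainder. */
struct nul_reader {
    char buf[READER_CAP];
    size_t start;   /* first byte not yet handed out */
    size_t len;     /* one past the last buffered byte */
};

/* Append freshly received bytes. Returns 0, or -1 if they do not fit. */
int reader_feed(struct nul_reader *r, const char *data, size_t n)
{
    /* Slide any partial message to the front to make room. */
    memmove(r->buf, r->buf + r->start, r->len - r->start);
    r->len -= r->start;
    r->start = 0;
    if (n > sizeof r->buf - r->len)
        return -1;
    memcpy(r->buf + r->len, data, n);
    r->len += n;
    return 0;
}

/* Return the next complete string, or NULL if only a partial one is buffered.
   The pointer stays valid until the next reader_feed() call. */
const char *reader_next(struct nul_reader *r)
{
    char *begin = r->buf + r->start;
    char *nul = memchr(begin, '\0', r->len - r->start);
    if (nul == NULL)
        return NULL;
    r->start = (size_t)(nul - r->buf) + 1;   /* skip past the terminator */
    return begin;
}

/* Example use in the question's server loop. */
void print_messages(int client)
{
    struct nul_reader r;
    char tmp[512];
    ssize_t read_size;
    r.start = r.len = 0;
    while ((read_size = recv(client, tmp, sizeof tmp, 0)) > 0) {
        if (reader_feed(&r, tmp, (size_t)read_size) == -1)
            break;                 /* one message is longer than READER_CAP */
        const char *s;
        while ((s = reader_next(&r)) != NULL)
            printf("Data: %s\n", s);
    }
}
```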
The transmission may split or merge the messages; e.g. the TCP stacks on either end and intermediate network equipment can and will split or merge the "packets".
In practice you'd better have some good conventions about your messages. Either decide that each message is, e.g., newline-terminated, or decide that it starts with a header giving its size.
Look at HTTP, SMTP, IMAP, SCGI or ONC/XDR (documented in RFC5531) as concrete examples. And document your protocol well (at minimum, in long descriptive comments for a homework toy project; more seriously, in a separate public document).

how to correctly use recv() sys call

I am in the process of writing a TCP server using the Berkeley sockets API under Linux. The clients follow a set of specifications: each message sent by a client corresponds to one of the many structs specified by the specs. The server doesn't know which message a client is going to send, or when. Once we get the message we can figure out what it is, but not before. The messages sent by the clients have variable lengths, so we cannot know in advance what message we are going to get. To solve this I have used the following method:
char buf[4096] = { 0 };
if ( recv (connected, buf, sizeof(buf), 0) == -1)
{
printf ("Error in recvng message\n");
exit (-1);
}
That is, I use a default buffer of size 4096 (no message from the client can be larger than this), receive into that buffer, and afterwards check the message type and take the corresponding action as follows:
struct ofp_header *oph;
oph=(struct ofp_header *)buf;
switch (oph->type)
{
case example_pkt:
handle_example_pkt();
break;
}
This works fine, but I just wanted to confirm whether it is an appropriate method or whether there is something better. All help much appreciated.
Thanks.
TCP is stream-based. This means if you use a buffer larger than your message you may receive part of the next message as well.
This means you will need to know the size of the message and incorporate any additional data into the next message. There are two obvious ways to do this:
Modify the protocol to send the size of each message as the first few bytes of the message. Read the size first, then only read that many bytes.
Since you know the size of each message, keep track of how many bytes you read. Process the first message in the buffer, then subtract the size of that message from the remaining bytes in the buffer. Repeat this process until either (a) you don't have enough bytes left to identify the message type, or (b) you don't have enough bytes for a complete message of the detected type. Save any remaining bytes and call recv() again to read more data.
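The second approach can be sketched as a function that walks the buffered bytes, dispatches every complete message and keeps the remainder for the next recv(). It assumes each message starts with an 8-byte header whose bytes 2-3 hold the total message length in network byte order, which is how OpenFlow's ofp_header is laid out (version, type, 16-bit length, 32-bit xid), if that is the protocol in question. HDR_LEN, msg_handler and consume_messages() are hypothetical. Reading the length byte by byte, instead of casting the buffer to a struct pointer, also sidesteps alignment and byte-order problems.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HDR_LEN 8   /* assumed: version(1) type(1) length(2) xid(4) */

typedef void (*msg_handler)(uint8_t type, const uint8_t *msg, size_t len);

/* Dispatch every complete message at the start of buf[0..*buffered), then
   shift the leftover bytes to the front and update *buffered.
   handle may be NULL (messages are then only counted).
   Returns the number of messages consumed, or -1 on a malformed length
   (the connection should then be dropped). */
int consume_messages(uint8_t *buf, size_t *buffered, msg_handler handle)
{
    size_t off = 0;
    int count = 0;
    while (*buffered - off >= HDR_LEN) {              /* (a) header present? */
        size_t msg_len = ((size_t)buf[off + 2] << 8) | buf[off + 3];
        if (msg_len < HDR_LEN)
            return -1;                                /* cannot be valid */
        if (*buffered - off < msg_len)
            break;                                    /* (b) body incomplete */
        if (handle != NULL)
            handle(buf[off + 1], buf + off, msg_len);
        off += msg_len;
        count++;
    }
    memmove(buf, buf + off, *buffered - off);         /* keep the partial tail */
    *buffered -= off;
    return count;
}
```

After each recv() into buf + *buffered, calling consume_messages() handles however many messages the read completed, whether that is zero, one or several.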

C: Using a select call, when I am reading, how do I keep track of the data?

First of all, I've never worked with C before (mostly Java, which is why you'll find me writing some naive C code). I am writing a simple command interpreter in C. I have something like this:
//Initialization code
if (select(fdmax+1, &read_fds, NULL, NULL, NULL) == -1) {
perror("Select dead");
exit(EXIT_FAILURE);
}
....
....
//Loop through connections to see who has the data ready
//If the data is ready
if ((nbytes = recv(i, buf, sizeof(buf), 0)) > 0) {
//Do something with the message in the buffer
}
Now if I'm looking at something like a long paragraph of commands, it is obvious that a 256 byte buffer will not be able to get the entire command. For the time being, I'm using a 2056 byte buffer to get the entire command. But if I want to use the 256 byte buffer, how would I go about doing this? Do I keep track of which client gave me what data and append it to some buffer? I mean, use something like two dimensional arrays and such?
Yes, the usual approach is to have a buffer of "data I've received but not processed" for each client, large enough to hold the biggest protocol message.
You read into that buffer (always keeping track of how much data is currently in the buffer), and after each read, check to see if you have a complete message (or message(s), since you might get two at once!). If you do, you process the message, remove it from the buffer and shift any remaining data up to the start of the buffer.
Something roughly along the lines of:
for (i = 0; i < nclients; i++)
{
    if (!FD_ISSET(client[i].fd, &read_fds))
        continue;
    nbytes = recv(client[i].fd, client[i].buf + client[i].bytes,
                  sizeof(client[i].buf) - client[i].bytes, 0);
    if (nbytes > 0)
    {
        client[i].bytes += nbytes;
        /* pass a pointer so the helpers see the current buffer state */
        while (check_for_message(&client[i]))
        {
            size_t message_len;
            message_len = process_message(&client[i]);
            client[i].bytes -= message_len;
            memmove(client[i].buf, client[i].buf + message_len, client[i].bytes);
        }
    }
    else
    {
        /* Handle client close (nbytes == 0) or error (nbytes == -1) */
    }
}
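check_for_message() and process_message() are left abstract in the loop above. Assuming the interpreter's commands are newline-terminated lines (the question does not define its framing), the two helpers might look like this. struct client and its 256-byte buffer are hypothetical, and both helpers take a pointer to the client's state.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical per-client state: fd plus unprocessed received bytes. */
struct client {
    int fd;
    char buf[256];     /* unprocessed bytes received from this client */
    size_t bytes;      /* how many bytes of buf are in use */
};

/* Is at least one complete, newline-terminated command buffered? */
int check_for_message(const struct client *c)
{
    return memchr(c->buf, '\n', c->bytes) != NULL;
}

/* Handle the first command and return its length including the '\n', so the
   caller can drop it with memmove(). Only call after check_for_message(). */
size_t process_message(struct client *c)
{
    const char *nl = memchr(c->buf, '\n', c->bytes);
    size_t len = (size_t)(nl - c->buf) + 1;
    printf("command from fd %d: %.*s\n", c->fd, (int)(len - 1), c->buf);
    return len;
}
```

One caveat: if buf fills up without a newline, the recv() call in the loop asks for 0 bytes, and on a stream socket that returns 0, which looks like a close. A real server should detect a full buffer explicitly and reject the oversized command.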
By the way, you should check for errno == EINTR if select() returns -1, and just loop around again - that's not a fatal error.
I would keep a structure around for each client. Each structure contains a pointer to a buffer where the command is read in. Maybe you free the buffers when they're not used, or maybe you keep them around. The structure could also contain the client's fd in it as well. Then you just need one array (or list) of clients which you loop over.
The other reason you'd want to do this, besides the fact that 256 bytes might not be enough, is that recv doesn't always fill the buffer. Some of the data might still be in transit over the network.
If you keep around buffers for each client, however, you can run into the "slowloris" attack, where a single client keeps sending little bits of data and takes up all your memory.
It can be a serious pain when you get tons of data like that over a network. There is a constant trade-off between allocating a huge array and doing multiple reads with data moves. You should consider getting a ready-made linked list of buffers, then traverse the linked list as you read the buffers in each node. That way it scales gracefully and you can quickly delete what you've processed. I think that's the best approach, and it's also how Boost.Asio implements buffered reads.
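A minimal sketch of such a chain of buffers, assuming fixed 1 KiB chunks: received data is appended to the tail chunk and consumed data is released from the head, so unread bytes are never shifted toward the front. The type and function names are illustrative and are not Boost.Asio's API.

```c
#include <stdlib.h>
#include <string.h>

#define CHUNK_SIZE 1024

struct chunk {
    struct chunk *next;
    size_t start, end;         /* unread bytes are data[start..end) */
    char data[CHUNK_SIZE];
};

struct chain {
    struct chunk *head, *tail;
    size_t total;              /* unread bytes across all chunks */
};

/* Append n received bytes, allocating new chunks as the tail fills up. */
int chain_append(struct chain *c, const char *src, size_t n)
{
    while (n > 0) {
        if (c->tail == NULL || c->tail->end == CHUNK_SIZE) {
            struct chunk *k = malloc(sizeof *k);
            if (k == NULL)
                return -1;
            k->next = NULL;
            k->start = k->end = 0;
            if (c->tail != NULL)
                c->tail->next = k;
            else
                c->head = k;
            c->tail = k;
        }
        size_t room = CHUNK_SIZE - c->tail->end;
        size_t take = n < room ? n : room;
        memcpy(c->tail->data + c->tail->end, src, take);
        c->tail->end += take;
        c->total += take;
        src += take;
        n -= take;
    }
    return 0;
}

/* Copy up to n unread bytes into dst, freeing chunks that become empty.
   Returns the number of bytes copied. */
size_t chain_consume(struct chain *c, char *dst, size_t n)
{
    size_t copied = 0;
    while (copied < n && c->head != NULL) {
        struct chunk *k = c->head;
        size_t avail = k->end - k->start;
        size_t take = n - copied < avail ? n - copied : avail;
        memcpy(dst + copied, k->data + k->start, take);
        k->start += take;
        copied += take;
        c->total -= take;
        if (k->start == k->end) {       /* fully read: unlink and free */
            c->head = k->next;
            if (c->head == NULL)
                c->tail = NULL;
            free(k);
        }
    }
    return copied;
}
```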
If you're dealing with multiple clients, a common approach is to fork/exec for each connection. Your server would listen for incoming connections, and when one is made it would fork and exec a child version of itself that would then handle the "command interpreter" portion of the problem.
This way you're letting the OS manage the client processes--that is, you don't have to have a data structure in your program to manage them. You will still need to clean up child processes in your server as they terminate.
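A minimal sketch of the fork-per-connection model, leaving out the exec step: the child serves one client and exits, while the parent closes its copy of the descriptor and reaps finished children with waitpid(..., WNOHANG) so they don't linger as zombies. spawn_handler(), reap_children() and the echo_client() stand-in for the command interpreter are hypothetical names.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stand-in for the command interpreter: echo everything back until EOF. */
void echo_client(int fd)
{
    char buf[256];
    ssize_t n;
    while ((n = recv(fd, buf, sizeof buf, 0)) > 0) {
        ssize_t off = 0;
        while (off < n) {                   /* send() may be partial */
            ssize_t w = send(fd, buf + off, (size_t)(n - off), 0);
            if (w < 0)
                return;
            off += w;
        }
    }
}

/* Fork a child to serve client_fd. Returns the child's pid, or -1.
   In a real server the child would also close the listening socket. */
pid_t spawn_handler(int client_fd, void (*handle)(int))
{
    pid_t pid = fork();
    if (pid == 0) {               /* child: serve this one client, then exit */
        handle(client_fd);
        close(client_fd);
        _exit(0);
    }
    if (pid > 0)
        close(client_fd);         /* parent no longer needs its copy */
    return pid;
}

/* Reap every child that has already exited, without blocking.
   Returns how many were reaped. */
int reap_children(void)
{
    int reaped = 0;
    while (waitpid(-1, NULL, WNOHANG) > 0)
        reaped++;
    return reaped;
}
```

Calling reap_children() once per pass of the accept loop (or from a SIGCHLD handler) is one way to do the cleanup the answer mentions.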
As for managing the buffer...How much data do you expect before you post a response? You may need to be prepared to dynamically adjust the size of your buffer.

Resources