I'm writing myself a small server daemon in C, and the basic parts like processing connects, disconnects and receives are already in, but a problem in receiving still persists.
I use "recv" to read 256 bytes at once into a char array, and because it can contain multiple lines of data as one big chunk, I need to be able to split it into separate lines to process each one.
That alone wouldn't be a problem, but a line could be cut off because it didn't fit into the buffer any more, so I also need to be able to tell whether that happened. That isn't too bad either (just check whether the last char is \r or \n), but what do I do if the line was cut off? My code doesn't allow for an easy "just keep reading more data", because I'm using select() to handle multiple requests.
Basically, this is my situation:
//This is the chunk of code run after select(), when a socket
//has readable data
char buf[256] = { 0 };
int nbytes;
if ((nbytes = recv(i, buf, sizeof(buf) - 1, 0)) <= 0)
{
    if (nbytes == 0)
    {
        struct remote_address addr;
        get_peername(i, &addr);
        do_log("[Socket #%d] %s:%d disconnected", i, addr.ip, addr.port);
    }
    else
        do_log("recv(): %s", strerror(errno));
    close(i);
    FD_CLR(i, &clients);
}
else
{
    buf[nbytes] = '\0';  // terminate right after the bytes actually received
    struct remote_address addr;
    get_peername(i, &addr);
    do_log("[Socket #%d] %s:%d (%d bytes): %s", i, addr.ip, addr.port, nbytes, buf);
    // split "buf" here, and process each line
    // but how to be able to get the rest of a possibly cut off line
    // in case it did not fit into the 256 byte buffer?
}
I was thinking about having a higher scoped temporary buffer variable (possibly malloc()'d) to save the current buffer in, if it was too long to fit in at once, but I always feel bad about introducing unnecessarily high scoped variables if there's a better solution :/
I appreciate any pointers (except for the XKCD ones :))!
I guess you need to add another per-stream buffer that holds the incomplete line until the line feed that comes after is received.
I'd use some kind of dynamically expanding buffer like GString to accumulate data.
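A minimal sketch of such a per-connection buffer, assuming newline-terminated lines and one `struct line_buf` per client socket. The struct, the function names and the 1024-byte capacity are all hypothetical; a fixed array stands in for a growable buffer like GString to keep it short.

```c
#include <stddef.h>
#include <string.h>

#define LINE_BUF_CAP 1024  /* hypothetical per-connection capacity */

/* Hypothetical per-connection state: bytes received but not yet
   consumed as a complete line. Keep one of these per client socket. */
struct line_buf {
    char data[LINE_BUF_CAP];
    size_t len;
};

/* Append freshly received bytes. Returns 0 on success, -1 if the
   pending (unterminated) data would overflow the buffer. */
int line_buf_append(struct line_buf *lb, const char *src, size_t n)
{
    if (n > sizeof lb->data - lb->len)
        return -1;
    memcpy(lb->data + lb->len, src, n);
    lb->len += n;
    return 0;
}

/* Copy the next complete line (without its "\n" or "\r\n") into out,
   NUL-terminated, and remove it from the buffer. Returns 1 if a line
   was extracted, 0 if only a partial line is pending, -1 if the line
   does not fit into out. */
int line_buf_next(struct line_buf *lb, char *out, size_t outsz)
{
    char *nl = memchr(lb->data, '\n', lb->len);
    if (nl == NULL)
        return 0;                           /* wait for more data */

    size_t consumed = (size_t)(nl - lb->data) + 1;  /* includes '\n' */
    size_t linelen = consumed - 1;
    if (linelen > 0 && lb->data[linelen - 1] == '\r')
        linelen--;                          /* also strip "\r\n" */
    if (linelen >= outsz)
        return -1;

    memcpy(out, lb->data, linelen);
    out[linelen] = '\0';

    /* Shift the unconsumed tail (possibly a cut-off line) to the front. */
    memmove(lb->data, lb->data + consumed, lb->len - consumed);
    lb->len -= consumed;
    return 1;
}
```

In the select() branch you would call line_buf_append() with the nbytes just received, then loop on line_buf_next() until it returns 0. A cut-off line simply stays in the buffer until the next recv() completes it. Reset `len` to 0 when the client disconnects.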
The other thing that might help would be putting the socket into non-blocking mode using fcntl(). Then you can recv() in a loop until you get -1. Check errno at that point: EAGAIN or EWOULDBLOCK means there is simply no more data for now (those two aren't required to have the same value, so check for both); anything else is a real error.
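A sketch of that approach, assuming a stream socket and a caller-supplied buffer (set_nonblocking() and drain_socket() are hypothetical helper names, not library functions):

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Switch fd to non-blocking mode, keeping its other flags. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Append everything currently readable on a non-blocking socket to buf,
   which already holds *len bytes and has room for cap. Returns 1 if the
   socket has no more data right now (or buf is full), 0 if the peer
   closed the connection, -1 on a real error (errno is set). */
int drain_socket(int fd, char *buf, size_t cap, size_t *len)
{
    while (*len < cap) {
        ssize_t n = recv(fd, buf + *len, cap - *len, 0);
        if (n > 0)
            *len += (size_t)n;
        else if (n == 0)
            return 0;                   /* orderly shutdown by the peer */
        else if (errno == EINTR)
            continue;                   /* interrupted by a signal: retry */
        else if (errno == EAGAIN || errno == EWOULDBLOCK)
            return 1;                   /* drained: nothing more for now */
        else
            return -1;
    }
    return 1;   /* buffer full: consume some data before draining more */
}
```

You would call set_nonblocking() once after accept(), then drain_socket() each time select() reports the descriptor readable, and split whatever has accumulated in buf.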
Final remark: I found that using libev was more fun than using select().
I'm trying to understand recv() at a high level. So recv takes data in "chunks" but I'm still not getting how it is precisely handled. Example:
char buffer[1000];
int received = recv(sock, buffer, sizeof(buffer), 0);
Does this mean that, if I'm receiving a massive file through sock, one call might store 500 bytes (so received is 500), then the next iteration of a loop might receive another 300 bytes, and by the end of the loop all 800 bytes will be stored in buffer (with only the last count kept in received unless I account for it)? Or do I need to pass a pointer that keeps track of where the last data ended, so the next iteration stores its data after it?
recv has no context. All it knows is that it got some address (pointer) to write into and some maximum size, and then it will try to fill that. It always starts writing at the given address. If, for example, one wishes to add data after some previously received data, one can simply pass a pointer to the location after the previous data instead of the beginning of the buffer. Of course, one should also reduce the maximum size it is allowed to read, so as not to overflow the buffer.
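For example, a loop that keeps receiving until it has a given number of bytes just advances the destination pointer and shrinks the allowed size on each call (recv_exact() is a hypothetical name, not a standard function):

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Receive exactly `want` bytes into buf, calling recv() as many times
   as needed. Each call writes at buf + got, i.e. right after the data
   received so far, and asks for at most the space that is left.
   Returns the number of bytes received: `want` on success, fewer if
   the peer closed the connection early, or -1 on error. */
ssize_t recv_exact(int sock, char *buf, size_t want)
{
    size_t got = 0;
    while (got < want) {
        ssize_t n = recv(sock, buf + got, want - got, 0);
        if (n == -1) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal: try again */
            return -1;
        }
        if (n == 0)
            break;          /* connection closed before `want` bytes */
        got += (size_t)n;
    }
    return (ssize_t)got;
}
```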
You asked "How does recv() work?", so it may be worth briefly studying a simpler function that does essentially the same thing - read().
recv() operates in more or less the same way as the read() function. The main difference is that recv() allows you to pass flags in the last argument - but you are not using these flags anyway.
My suggestion would be - before trying to use recv() to read from a network socket - to practice using read() on a plain text file.
Both functions return the number of bytes read, except in the case of an error, in which case they return -1. You should always check for this scenario and handle it appropriately.
Both functions can also return less than the number of bytes requested. In the case of recv(), reading from a socket, this may be because the other end has simply not sent all the required data yet. When reading from a file with read(), it may be because you have reached the end of the file.
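A small practice program along those lines might look like this (count_file_bytes() is a hypothetical name; the tiny buffer is deliberate so that the loop runs many times and you see short reads near the end of the file):

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Read the whole file at `path` through a small buffer, handling the
   same return values you will later see from recv(). Returns the total
   number of bytes read, or -1 on error (errno is set). */
long count_file_bytes(const char *path)
{
    char chunk[16];             /* tiny on purpose: forces many reads */
    long total = 0;
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    for (;;) {
        ssize_t n = read(fd, chunk, sizeof chunk);
        if (n == -1) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            int saved = errno;
            close(fd);
            errno = saved;
            return -1;
        }
        if (n == 0)
            break;              /* end of file */
        total += n;             /* n may be less than sizeof chunk */
    }
    close(fd);
    return total;
}
```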
Anyway ...
You will need to keep track of the current offset within your buffer - and update it on each read. So declare a file-scope variable offset.
static size_t offset;   // bytes stored in buffer so far
static char buffer[1000];
Then - when your 'loop' is running - increment the offset after each read ...
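while (offset < expected_len) {       // expected_len: assumed known in advance
    size_t max_len = sizeof(buffer) - offset;
    if (max_len == 0) {
        puts("Buffer is full but the message isn't complete.");
        break;
    }
    ssize_t count = recv(sock, buffer+offset, max_len, 0);
    if (count == -1) {
        if (errno == EINTR)
            continue;                 // interrupted by a signal: retry
        if (errno == EAGAIN || errno == EWOULDBLOCK) {
            usleep(20000);            // non-blocking socket, no data yet
            continue;
        }
        perror("Failed to read from socket");
        close(sock);
        break;                        // leave the loop, not just a switch
    }
    if (count == 0) {
        puts("Looks like connection has been closed.");
        break;
    }
    offset += count;
}
if (offset >= expected_len) {
    puts("Got the expected amount of data. Wrapping up ...");
}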
Notes:
Using this approach, you will either need to know the expected amount of data beforehand, or use a special delimiter to mark the end of the message.
the max_len variable indicates how much space is left in your buffer - and (perhaps needless to say) you should not try to read more bytes than this
the destination for the recv() command is buffer+offset - not buffer.
if recv() returns zero, AFAIK this indicates that the other end has performed an "orderly shutdown".
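if recv() returns -1, you really need to check errno. EAGAIN (or EWOULDBLOCK) is non-fatal on a non-blocking socket: it just means no data is available yet and you need to try again. EINTR likewise only means the call was interrupted by a signal.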
This must be a stupid question because this should be a very common and simple problem, but I haven't been able to find an answer anywhere, so I'll bite the bullet and ask.
How on earth should I go about reading from the standard input when there is no way of determining the size of the data? Obviously if the data ends in some kind of terminator like a NUL or EOF then this is quite trivial, but my data does not. This is simple IPC: the two programs need to talk back and forth and ending the file streams with EOF would break everything.
I thought this should be fairly simple. Clearly programs talk to each other over pipes all the time without needing any arcane tricks, so I hope there is a simple answer that I'm too stupid to have thought of. Nothing I've tried has worked.
Something obvious like this (ignoring the necessary reallocs for brevity):
int size = 0, max = 8192;
unsigned char *buf = malloc(max);
while (fread((buf + size), 1, 1, stdin) == 1)
    ++size;
won't work since fread() blocks and waits for data, so this loop won't terminate. As far as I know nothing in stdio allows nonblocking input, so I didn't even try any such function. Something like this is the best I could come up with:
struct mydata {
    unsigned char *data;
    int slen; /* size of data */
    int mlen; /* maximum allocated size */
};
...
struct mydata *buf = xmalloc(sizeof *buf);
buf->data = xmalloc((buf->mlen = 8192));
buf->slen = 0;
int nread = read(0, buf->data, 1);
if (nread == (-1))
    err(1, "read error");
buf->slen += nread;
fcntl(0, F_SETFL, oflags | O_NONBLOCK);
do {
    if (buf->slen >= (buf->mlen - 32))
        buf->data = xrealloc(buf->data, (buf->mlen *= 2));
    nread = read(0, (buf->data + buf->slen), 1);
    if (nread > 0)
        buf->slen += nread;
} while (nread == 1);
fcntl(0, F_SETFL, oflags);
where oflags is a global variable containing the original flags for stdin (cached at the start of the program, just in case). This dumb way of doing it works as long as all of the data is present immediately, but fails otherwise. Because this sets read() to be non-blocking, it just returns -1 if there is no data. The program communicating with mine generally sends responses whenever it feels like it, and not all at once, so if the data is at all large this exits too early and fails.
How on earth should I go about reading from the standard input when there is no way of determining the size of the data?
There always has to be a way to determine the size. Otherwise, the program would require infinite memory, which is impossible on a physical computer.
Think about it this way: even in the case of a never-ending stream of data, there must be some chunks or points at which you process it. For instance, a live video stream is decoded one portion (e.g. a frame) at a time, and a video game processes messages one by one, even though the game runs for an undetermined length of time.
This holds true regardless of the type of I/O you decide to use (blocking/non-blocking, synchronous/asynchronous...). For instance, if you want to use typical blocking synchronous I/O, what you have to do is process the data in a loop: each iteration, you read as much data as is available, and process as much as you can. Whatever you can not process (because you have not received enough yet), you keep for the next iteration. Then, the rest of the loop is the rest of the logic of the program.
In the end, regardless of what you do, you (or someone else, e.g. a library, the operating system, the hardware buffers...) have to buffer incoming data until it can be processed.
Basically, you have two choices -- synchronous or asynchronous -- and both have their advantages and disadvantages.
For synchronous, you need either delimiters or a length field embedded in the record (or fixed-length records, but that is pretty inflexible). This works best for synchronous protocols like synchronous RPC, or simplex client-server interactions where only one side talks at a time while the other side waits. For ASCII/text-based protocols, it is common to use a control-character delimiter like NL/EOL, NUL or ETX to mark the end of messages. Binary protocols more commonly use an embedded length field: the receiver first reads the length and then reads the full amount of (expected) data.
For asynchronous, you use non-blocking mode. It IS possible to use non-blocking mode with stdio streams; it just requires some care. Out-of-data conditions show up in stdio like error conditions, so you need to use ferror() and clearerr() on the FILE * as appropriate.
It's possible for both to be used: for example, in client-server interactions the clients may use synchronous I/O (they send a request and wait for a reply) while the server uses asynchronous I/O (to be robust in the presence of misbehaving clients).
The read() API on Linux and the ReadFile API on Windows return immediately, without waiting for the specified number of bytes to fill the buffer, when reading a pipe or socket that has some data available. read() then returns the number of bytes actually read.
This means that when reading from a pipe, you pick a buffer size, read whatever is returned, and then process it. You then read the next bit. The only time you are blocked is when there is no data available at all.
This differs from fread(), which only returns once the requested number of bytes has been read, or the stream determines that doing so is impossible (e.g. EOF).
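To illustrate that read-and-process-what-you-got pattern, here is a sketch that copies a pipe or socket to another descriptor chunk by chunk (pump() is a hypothetical name, and "processing" a chunk here just means writing it out):

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Copy everything from in_fd to out_fd until EOF. Each read() returns
   as soon as *some* data is available (possibly far less than the
   buffer size); that chunk is processed (here: written out) before the
   next read(). Returns the total bytes copied, or -1 on error. */
long pump(int in_fd, int out_fd)
{
    char chunk[4096];
    long total = 0;
    for (;;) {
        ssize_t n = read(in_fd, chunk, sizeof chunk);
        if (n == -1 && errno == EINTR)
            continue;
        if (n == -1)
            return -1;
        if (n == 0)
            return total;                   /* writer closed its end */
        for (ssize_t off = 0; off < n; ) {  /* write() may be partial too */
            ssize_t w = write(out_fd, chunk + off, (size_t)(n - off));
            if (w == -1 && errno == EINTR)
                continue;
            if (w == -1)
                return -1;
            off += w;
        }
        total += n;
    }
}
```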
This is for a Linux system, in C. It involves network programming. It is for a file transfer program.
I've been having this problem where this piece of code works unpredictably. It is either completely successful, or the while loop in the client never ends. I discovered that this is because the fileLength variable would sometimes be a huge (negative or positive) value, which I attributed to some mistake with ntohl. When I put in a print statement, it seemed to work perfectly, without error.
Here is the client code:
//...here includes relevant header files
int main (int argc, char *argv[]) {
    //socket file descriptor
    int sockfd;
    if (argc != 2) {
        fprintf (stderr, "usage: client hostname\n");
        exit(1);
    }
    //...creates socket file descriptor, connects to server
    //create buffer for filename
    char name[256];
    //receive filename into name buffer, bytes received stored in numbytes
    if((numbytes = recv (sockfd, name, 255 * sizeof (char), 0)) == -1) {
        perror ("recv");
        exit(1);
    }
    //Null terminator after the filename
    name[numbytes] = '\0';
    //length of the file to receive from server
    long fl;
    memset(&fl, 0, sizeof fl);
    //receive filelength from server
    if((numbytes = recv (sockfd, &fl, sizeof(long), 0)) == -1) {
        perror ("recv");
        exit(1);
    }
    //convert filelength to host format
    long fileLength = ntohl(fl);
    //check to make sure file does not exist, so that the application will not overwrite existing files
    if (fopen (name, "r") != NULL) {
        fprintf (stderr, "file already present in client directory\n");
        exit(1);
    }
    //open file called name in write mode
    FILE *filefd = fopen (name, "wb");
    //variable stating amount of data received
    long bytesTransferred = 0;
    //Until the file is received, keep receiving
    while (bytesTransferred < fileLength) {
        printf("transferred: %ld\ntotal: %ld\n", bytesTransferred, fileLength);
        //set counter at beginning of unwritten segment
        fseek(filefd, bytesTransferred, SEEK_SET);
        //buffer of 256 bytes; 1 byte for byte-length of segment, 255 bytes of data
        char buf[256];
        //receive segment from server
        if ((numbytes = recv (sockfd, buf, sizeof buf, 0)) == -1) {
            perror ("recv");
            exit(1);
        }
        //first byte of buffer, stating number of bytes of data in received segment
        //converting from char to short requires adding 128, since the char ranges from -128 to 127
        short bufLength = buf[0] + 128;
        //write buffer into file, starting after the first byte of the buffer
        fwrite (buf + 1, 1, bufLength * sizeof (char), filefd);
        //add number of bytes of data received to bytesTransferred
        bytesTransferred += bufLength;
    }
    fclose (filefd);
    close (sockfd);
    return 0;
}
This is the server code:
//...here includes relevant header files
int main (int argc, char *argv[]) {
    if (argc != 2) {
        fprintf (stderr, "usage: server filename\n");
        exit(1);
    }
    //socket file descriptor, file descriptor for specific client connections
    int sockfd, new_fd;
    //...get socket file descriptor for sockfd, bind sockfd to predetermined port, listen for incoming connections
    //...reaps zombie processes
    printf("awaiting connections...\n");
    while(1) {
        //...accepts any incoming connections, gets file descriptor and assigns to new_fd
        if (!fork()) {
            //close socket file descriptor, only need file descriptor for specific client connection
            close (sockfd);
            //open a file for reading
            FILE *filefd = fopen (argv[1], "rb");
            //send filename to client
            if (send (new_fd, argv[1], strlen (argv[1]) * sizeof(char), 0) == -1)
                { perror ("send"); }
            //put counter at end of selected file, and find length
            fseek (filefd, 0, SEEK_END);
            long fileLength = ftell (filefd);
            //convert length to network form and send it to client
            long fl = htonl(fileLength);
            //Are we sure this is sending all the bytes??? TEST
            if (send (new_fd, &fl, sizeof fl, 0) == -1)
                { perror ("send"); }
            //variable stating amount of data unsent
            long len = fileLength;
            //Until file is sent, keep sending
            while(len > 0) {
                printf("remaining: %ld\ntotal: %ld\n", len, fileLength);
                //set counter at beginning of unread segment
                fseek (filefd, fileLength - len, SEEK_SET);
                //length of the segment; 255 unless last segment
                short bufLength;
                if (len > 255) {
                    len -= 255;
                    bufLength = 255;
                } else {
                    bufLength = len;
                    len = 0;
                }
                //buffer of 256 bytes; 1 byte for byte-length of segment, 255 bytes of data
                char buf[256];
                //Set first byte of buffer as the length of the segment
                //converting short to char requires subtracting 128
                buf[0] = bufLength - 128;
                //read file into the buffer starting after the first byte of the buffer
                fread(buf + 1, 1, bufLength * sizeof(char), filefd);
                //Send data to client
                if (send (new_fd, buf, sizeof buf, 0) == -1)
                    { perror ("send"); }
            }
            fclose (filefd);
            close (new_fd);
            exit (0);
        }
        close (new_fd);
    }
    return 0;
}
Note: I've simplified the code a bit, to make it clearer I hope.
Anything beginning with //... represents a bunch of code
You seem to be assuming that each send() will either transfer the full number of bytes specified or will error out, and that each one will pair perfectly with a recv() on the other side, such that the recv() receives exactly the number of bytes sent by the send() (or errors out), no more and no less. Those are not safe assumptions.
You don't show the code by which you set up the network connection. If you're using a datagram-based protocol (e.g. UDP), then you're more likely to get the send/receive boundary matching you expect, but you need to account for the possibility that packets will be lost or corrupted. If you're using a stream-based protocol (e.g. TCP), then you don't have to be too concerned with data loss or corruption, but you have no reason at all to expect boundary-matching behavior.
You need at least three things:
An application-level protocol on top of the network layer. You've got parts of that already, such as how you transfer the file length first to advise the client about how much content to expect, but you need to do the same for all transferred data that is not of a predetermined, fixed length. Alternatively, invent another means to communicate data boundaries.
Every send() / write() that aims to transfer more than one byte must be performed in a loop to accommodate transfers being broken into multiple pieces. The return value tells you how many of the requested bytes were transferred (or at least how many were handed off to the network stack), and if that's fewer than requested you must loop back to try to transfer the rest.
Every recv() / read() that aims to transfer more than one byte must be performed in a loop to accommodate transfers being broken into multiple pieces. I recommend structuring that along the same lines as described for send(), but you also have the option of receiving data until you see a pre-arranged delimiter. The delimiter-based approach is more complicated, however, because it requires additional buffering on the receiving side.
Without those measures, your server and client can easily get out of sync. Among the possible results of that are that the client interprets part of the file name or part of the file content as the file length.
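A sketch of those three points together, assuming a stream socket and a made-up framing of a 4-byte big-endian length followed by the payload (send_all(), recv_all(), send_msg() and recv_msg() are hypothetical names):

```c
#include <arpa/inet.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Keep calling send() until all n bytes have been handed to the kernel. */
static int send_all(int fd, const void *buf, size_t n)
{
    const char *p = buf;
    while (n > 0) {
        ssize_t r = send(fd, p, n, 0);
        if (r == -1) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += r;
        n -= (size_t)r;
    }
    return 0;
}

/* Keep calling recv() until exactly n bytes have arrived. */
static int recv_all(int fd, void *buf, size_t n)
{
    char *p = buf;
    while (n > 0) {
        ssize_t r = recv(fd, p, n, 0);
        if (r == -1 && errno == EINTR)
            continue;
        if (r <= 0)
            return -1;          /* error, or peer closed mid-message */
        p += r;
        n -= (size_t)r;
    }
    return 0;
}

/* Frame: a fixed 4-byte big-endian length, then the payload. */
int send_msg(int fd, const void *payload, uint32_t len)
{
    uint32_t be_len = htonl(len);
    if (send_all(fd, &be_len, sizeof be_len) == -1)
        return -1;
    return send_all(fd, payload, len);
}

/* Returns the payload length, or -1 on error, early close, or a message
   larger than cap (the stream is then out of sync: close it). */
int64_t recv_msg(int fd, void *buf, uint32_t cap)
{
    uint32_t be_len;
    if (recv_all(fd, &be_len, sizeof be_len) == -1)
        return -1;
    uint32_t len = ntohl(be_len);
    if (len > cap)
        return -1;
    if (recv_all(fd, buf, len) == -1)
        return -1;
    return len;
}
```

With something like this, the file name, the file length and each piece of file content can go out as separate messages, and the client cannot mistake one for another however TCP happens to split the bytes.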
Even though you removed it from that code, I'll make an educated guess and assume that you're using TCP or some other stream protocol here. This means that the data the server sends is a stream of bytes, and the amount of data each recv call gets will not necessarily correspond to what each send call sent.
It is equally legal for your first recv call to just get one byte of data, as it is to get the file name, file size and half of the file.
You say
When I put in a print statement,
but you don't say where. I'll make another educated guess here and guess that you did it on the server before sending the file length. And that happened to shake things enough that the data amounts that were sent on the connection just accidentally happened to match what you were expecting on the client.
You need to define a protocol. Maybe start with a length of the filename, then the filename, then the length of the file. Or always send 256 bytes for the filename regardless of how long it is. Or send the file name as a 0-terminated string and try to figure out the data from that. But you can never assume that just because you called send with X bytes that the recv call will get X bytes.
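One way to pin such a header down is to serialize it byte by byte, so that it doesn't depend on sizeof(long), endianness or struct padding. The layout below (2-byte name length, the name without a terminator, 8-byte file size, all big-endian) and the function names are assumptions for illustration only:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header layout (all integers big-endian):
     2 bytes  name length N
     N bytes  file name (no NUL terminator)
     8 bytes  file size                                        */
#define HDR_MAX_NAME 255

/* Serialize into out (needs 2 + N + 8 bytes). Returns the number of
   bytes used, or 0 if the name is too long. */
size_t hdr_encode(unsigned char *out, const char *name, uint64_t size)
{
    size_t n = strlen(name);
    if (n > HDR_MAX_NAME)
        return 0;
    out[0] = (unsigned char)(n >> 8);
    out[1] = (unsigned char)(n & 0xff);
    memcpy(out + 2, name, n);
    for (int i = 0; i < 8; i++)         /* most significant byte first */
        out[2 + n + i] = (unsigned char)(size >> (56 - 8 * i));
    return 2 + n + 8;
}

/* Parse a header from the first len bytes of in. name must hold
   HDR_MAX_NAME + 1 bytes. Returns the bytes consumed, 0 if more data
   is needed, or (size_t)-1 if the header is invalid. */
size_t hdr_decode(const unsigned char *in, size_t len,
                  char *name, uint64_t *size)
{
    if (len < 2)
        return 0;
    size_t n = ((size_t)in[0] << 8) | in[1];
    if (n > HDR_MAX_NAME)
        return (size_t)-1;
    if (len < 2 + n + 8)
        return 0;                       /* keep receiving */
    memcpy(name, in + 2, n);
    name[n] = '\0';
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | in[2 + n + i];
    *size = v;
    return 2 + n + 8;
}
```

The receiver can append incoming bytes to a buffer and call hdr_decode() after each recv(); a return of 0 means "keep receiving", and any bytes after the header are already the start of the file content.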
I believe the issue is actually a compound of everything you and others have said. In the server code you send the name of the file like this:
send (new_fd, argv[1], strlen (argv[1]) * sizeof(char), 0);
and receive it in the client like this:
recv (sockfd, name, 255 * sizeof (char), 0);
This will cause an issue when the filename length is anything less than 255. Since TCP is a stream protocol (as mentioned by @Art), there are no real boundaries between the sends and recvs, which can cause you to receive data in odd places where you are not expecting them.
My recommendation would be to first send the length of the filename, eg:
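// server
uint32_t namelen = htonl(strlen(argv[1]));   // fixed-size 4-byte field (stdint.h)
send (new_fd, &namelen, 4, 0);
send (new_fd, argv[1], strlen (argv[1]), 0);
// client
uint32_t namelen;
recv (sockfd, &namelen, 4, 0);
namelen = ntohl(namelen);
if (namelen >= sizeof name) {     // leave room for the terminator
    fprintf (stderr, "file name too long\n");
    exit(1);
}
recv (sockfd, name, namelen, 0);
name[namelen] = '\0';
// (in real code, each send/recv should loop until all bytes are transferred)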
This will ensure that you are always aware of exactly how long your filename is and makes sure that you aren't accidentally reading your file length from somewhere in the middle of your file (which is what I expect is happening currently).
Edit:
Also, be cautious when you are sending sized numbers. If you use sizeof on types such as long, the sender and receiver may end up sending and receiving different sizes. This is why I hard-coded the sizes in the send and recv for the name length, so that there is no confusion on either side.
Well, after some testing, I discovered that the issue causing the problem did have something to do with htonl(), though I had still read the data incorrectly in the beginning. It wasn't that htonl() wasn't working at all, but that I didn't realize a 'long' has different lengths depending on the system architecture (thanks @tofro). That is to say, a 'long' is 4 bytes on typical 32-bit systems and 8 bytes on 64-bit Linux, while the htonl() function (from arpa/inet.h) works on 4-byte integers. I was using a 64-bit OS, which explains why the value was being fudged. I fixed the issue by using the int32_t type (from stdint.h) to store the file length. So the main issue in this case was not that it was becoming out of sync (I think). But as for everyone's advice about developing an actual protocol, I think I know exactly what you mean, I definitely understand why it's important, and I'm currently working towards it. Thank you all for all your help.
EDIT: Well now that it has been several years, and I know a little more, I know that this explanation doesn't make sense. All that would result from long being larger than I expected (8 bytes rather than 4) is that there's some implicit casting going on. I used sizeof(long) in the original code rather than hardcoding it to assume 4 bytes, so that particular (faulty) assumption of mine shouldn't have produced the bug I saw.
The problem is almost certainly what everyone else said: one call to recv was not getting all of the bytes representing the file length. At the time I doubted this was the real cause of the behaviour I saw, because the file name (of arbitrary length) I was sending through was never partially received (i.e. the client always created a file with the correct filename). Only the file length was messed up. My hypothesis at the time was that recv mostly respected message boundaries, and that while recv could return only part of the data, it was more likely that it was returning all of it and there was another bug in my code. I now know this isn't true at all, and TCP doesn't care.
I'm a little curious as to why I didn't see other unexpected behaviour as well (e.g. the file name being wrong on the receiving end), and I wanted to investigate further, but despite managing to find the files, I can't seem to reproduce the problem now. I suppose I'll never know, but at least I understand the main issue here.
I am implementing both the server and the client side of a simple file download program. The client requests file names from the server with the get command, and the server responds quickly. While the server writes to the socket, the client reads the socket and prints out the buffer. After that, the program stops interpreting my commands unless I press Enter twice (you can see it in the shell output below).
After debugging, I found out that it is because of the buffer size. When the server writes to the socket, everything works properly if I use a small buffer size, but if I use a larger buffer size such as 1024, the problem occurs. How can I get rid of this issue?
#define F_BUFF_SIZE 1024
On server side:
/* ... */
if(!strcmp(buffer, "list\n")) {
    char buff[F_BUFF_SIZE];
    bzero(buff, F_BUFF_SIZE);
    pt_ret = pthread_create(&thread_id, NULL, getfiles, (void*) buff);
    pthread_join(thread_id, NULL);   /* second argument is a void **, not the create result */
    n = write(sock, buff, F_BUFF_SIZE);
    /* ... */
On client side:
/* ... */
char buffer[F_BUFF_SIZE];
bzero(buffer, F_BUFF_SIZE);
n = read(b_sock, buffer, F_BUFF_SIZE - 1);
if (n < 0) {
#ifdef _DEBUG_
    fprintf(stderr, "Error: Could not read from the socket.\n");
#endif
    return 0;
}
fputs(buffer, stdout);
/* ... */
Shell:
Opening socket: OK!
Connecting: OK!
# list
client
project1.mk
cs342.workspace
client.c
project1.project
cs342.workspace.session
server
cs342_wsp.mk
server.c
cs342.tags
# get
# take
get
# take
Unknown command.
...
There is no magic to having a smaller buffer size, this is just exposing that you have an error elsewhere. Joerg's comment is an important point - you need to be reading the same amount of data that you are writing. I'm wondering if there's also an issue with how you populate buff. You need to make sure that you are not overrunning the end of the buffer, or forgetting to add a null terminator to the end of the string.
By the way, it is important to read at most size - 1 bytes so there is room for a null terminator; you're correct to do this, since read won't append a null terminator to a string. You just need to be sure that the other side writes that same amount, because otherwise there can be problems.
You must have some issue along these lines - this would explain why changing the size avoids the problem because problems like this are only exposed when the numbers line up perfectly. Running the program in valgrind may expose the issue (look for invalid write errors).
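If the server keeps doing write(sock, buff, F_BUFF_SIZE), one way to make both sides agree is for the client to read exactly F_BUFF_SIZE bytes too, in a loop, instead of a single read() of F_BUFF_SIZE - 1. A sketch under that assumption (read_fixed_reply() is a hypothetical name; F_BUFF_SIZE is the question's own constant):

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define F_BUFF_SIZE 1024

/* Read exactly F_BUFF_SIZE bytes, matching a server that always writes
   F_BUFF_SIZE bytes. The reply is treated as a NUL-padded string and
   the last byte is forced to '\0' for safety. Returns 0 on success,
   -1 on error or if the connection closes early. */
int read_fixed_reply(int fd, char buf[F_BUFF_SIZE])
{
    size_t got = 0;
    while (got < F_BUFF_SIZE) {
        ssize_t n = read(fd, buf + got, F_BUFF_SIZE - got);
        if (n == -1 && errno == EINTR)
            continue;
        if (n <= 0)
            return -1;
        got += (size_t)n;
    }
    buf[F_BUFF_SIZE - 1] = '\0';
    return 0;
}
```

If the client reads fewer bytes than the server writes, the leftover bytes stay in the socket and are returned by the next read(), which could explain replies that seem to arrive one command late. Sending the real length first would avoid transferring the unused padding at all.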
@ahmet, your useful question led me to do a little research. If you read this paper, you will have a better idea of what you are dealing with, and then you will be able to determine the best buffer size for your situation.
As you may know, values like that should always be part of the application's settings, so don't hard-code them inside the code.
Also, here is some good advice on how to choose a buffer size; hope that helps.
I need to read from an AF_UNIX socket to a buffer using the function read from C, but I don't know the buffer size.
I think the best way is to read N bytes until the read returns 0 (no more writers in the socket). Is this correct? Is there a way to guess the size of the buffer being written on the socket?
I was thinking that a socket is a special file. Opening the file in binary mode and getting the size would help me in knowing the correct size to give to the buffer?
I'm very new to C, so please keep that in mind.
One common way is to use ioctl(..) to query FIONREAD on the socket, which returns how much data is available.
int len = 0;
ioctl(sock, FIONREAD, &len);
if (len > (int)sizeof(buffer))
    len = sizeof(buffer);   // never read more than the buffer can hold
if (len > 0) {
    len = read(sock, buffer, len);
}
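A slightly fuller sketch of the FIONREAD approach that also caps the read at the buffer size. FIONREAD is widely available on Linux and the BSDs but is not part of POSIX; read_available() is a hypothetical name.

```c
#include <stddef.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Read whatever is already queued on sock, at most cap bytes, without
   waiting for more. Returns the bytes read (0 if nothing is queued),
   or -1 on error. */
ssize_t read_available(int sock, char *buf, size_t cap)
{
    int len = 0;
    if (ioctl(sock, FIONREAD, &len) == -1)
        return -1;
    if (len <= 0)
        return 0;           /* nothing queued (or the peer has closed) */
    if ((size_t)len > cap)
        len = (int)cap;     /* never read more than the buffer holds */
    return recv(sock, buf, (size_t)len, 0);
}
```

Keep in mind that the count is only a snapshot: more data may arrive right after the call, and 0 can mean either "nothing yet" or "peer closed", so you still need recv()'s own return value to detect EOF.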
One way to read an unknown amount from the socket while avoiding blocking could be to poll() a non-blocking socket for data.
E.g.
char buffer[1024];
int ptr = 0;
ssize_t rc;
struct pollfd fd = {
    .fd = sock,
    .events = POLLIN
};
// A timeout of 0 means poll() doesn't wait for data to arrive.
while (ptr < (int)sizeof(buffer) && poll(&fd, 1, 0) > 0 && (fd.revents & POLLIN))
{
    rc = read(sock, buffer + ptr, sizeof(buffer) - ptr);
    if ( rc <= 0 )
        break;
    ptr += rc;
}
printf("Read %d bytes from sock.\n", ptr);
I think the best way is to read N bytes until the read returns 0 (no more writers in the socket). Is this correct?
0 means EOF: the other side has closed the connection. If the other side closes the connection once it has sent everything, then yes, that approach is correct.
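For the case where the writer does close its end, a sketch that reads until read() returns 0 while growing a heap buffer (read_until_eof() is a hypothetical name, and the 256-byte starting size is arbitrary):

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Read from fd until read() returns 0 (the writer closed its end),
   doubling a heap buffer whenever it fills up. On success returns the
   buffer (the caller frees it) and stores the byte count in *out_len;
   returns NULL on error. */
char *read_until_eof(int fd, size_t *out_len)
{
    size_t cap = 256, len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;
    for (;;) {
        if (len == cap) {
            char *bigger = realloc(buf, cap * 2);
            if (bigger == NULL) {
                free(buf);
                return NULL;
            }
            buf = bigger;
            cap *= 2;
        }
        ssize_t n = read(fd, buf + len, cap - len);
        if (n == -1 && errno == EINTR)
            continue;
        if (n == -1) {
            free(buf);
            return NULL;
        }
        if (n == 0)
            break;          /* EOF: the other side closed the connection */
        len += (size_t)n;
    }
    *out_len = len;
    return buf;
}
```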
If the connection isn't closed (multiple transfers over the same connection, a chatty protocol), then the case is a bit more complicated, and the behavior generally depends on whether you have a SOCK_STREAM or a SOCK_DGRAM socket.
Datagram sockets are already delimited for you by the OS.
Stream sockets do not delimit messages (all data is an opaque byte stream), so if you need message boundaries you have to implement them at the application level: for example, by defining a size field in the message header structure, or by using a delimiter (e.g. '\n' for single-line text messages). In the first case, you would first read the header, extract the length, and use it to read the rest of the message. In the second case, you read the stream into a partial buffer, search for the delimiter, and extract the message (including the delimiter) from the buffer. You might need to keep the partial buffer around, since depending on the protocol several commands can be received with a single recv()/read().
Is there a way to guess the size of the buffer being written on the socket?
For stream sockets, there is no reliable way, as the other side might still be in the process of writing the data. Imagine the quite normal case where the socket buffer is 32K and 128K is being written: the writing application blocks inside send()/write() while the OS waits for the reading application to read out data and thus free space for the next chunk.
For datagram sockets, one normally knows the size of the message beforehand. Alternatively, one can try (I never did that myself) recvmsg() with MSG_PEEK and, if MSG_TRUNC is set in the returned msghdr.msg_flags, retry with a bigger buffer.
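A sketch of that recvmsg() idea, assuming a datagram socket with a single reader (another reader could take the datagram between the peek and the real receive); recv_datagram() is a hypothetical name:

```c
#include <stddef.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Receive one datagram of unknown size: peek into a buffer and, while
   the kernel reports MSG_TRUNC, double the buffer and peek again. Once
   the peek fits, consume the datagram with a normal recv(). Returns a
   malloc'd buffer (the caller frees it) with its length in *out_len,
   or NULL on error. */
char *recv_datagram(int sock, size_t *out_len)
{
    size_t cap = 64;
    char *buf = NULL;
    for (;;) {
        char *bigger = realloc(buf, cap);
        if (bigger == NULL) {
            free(buf);
            return NULL;
        }
        buf = bigger;

        struct iovec iov = { .iov_base = buf, .iov_len = cap };
        struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1 };
        if (recvmsg(sock, &msg, MSG_PEEK) == -1) {
            free(buf);
            return NULL;
        }
        if (!(msg.msg_flags & MSG_TRUNC)) {
            /* It fits: now remove it from the queue for real. */
            ssize_t n = recv(sock, buf, cap, 0);
            if (n == -1) {
                free(buf);
                return NULL;
            }
            *out_len = (size_t)n;
            return buf;
        }
        cap *= 2;           /* truncated: retry with a bigger buffer */
    }
}
```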
You are correct: if you don't know the size of the input, you can just read one byte at a time and append it to a larger buffer.
read N bytes until the read returns 0
Yes!
One added detail: if the sender doesn't close the connection, the read will just block instead of returning. A non-blocking socket will return -1 (with errno == EAGAIN) when there's nothing to read; that's another case to handle.
Opening the file in binary mode and getting the size would help me in knowing the correct size to give to the buffer?
Nope. Sockets don't have a size. Suppose you sent two messages over the same connection: How long is the file?