I know that read/write C functions from <unistd.h> are not guaranteed to read/write exactly N bytes as requested by size_t nbyte argument (especially for sockets).
How can I read/write a full buffer from/to a file (or socket) descriptor?
That read() and write() do not guarantee to transfer the full number of bytes requested is a feature, not a shortcoming. If that feature gets in your way in a particular application then it is probably better to use the existing facilities of the standard library to deal with it than to roll your own (though I certainly have rolled my own from time to time).
Specifically, if you have a file descriptor on which you want to always transfer exact numbers of bytes then you should consider using fdopen() to wrap it in a stream and then performing I/O with fread() and fwrite(). You might also use setvbuf() to avoid having an intermediary buffer. As a possible bonus, you can then also use other stream functions with that, such as fgets() and fprintf().
Example:
int my_fd = open_some_resource();
// if (my_fd < 0) ...
FILE *my_file = fdopen(my_fd, "r+b");
// if (my_file == NULL) ...
int rval = setvbuf(my_file, NULL, _IONBF, 0);
// if (rval != 0) ...
Note that it is probably best to thereafter use only the stream, not the underlying file descriptor, and that is the main drawback of this approach. On the other hand, you can probably allow the FD to be lost, because closing the stream will also close the underlying FD.
Nothing particularly special is required to make fread() and fwrite() transfer full-buffer units (or fail):
char buffer[BUF_SIZE];
size_t blocks = fread(buffer, BUF_SIZE, 1, my_file);
// if (blocks != 1) ...
// ...
blocks = fwrite(buffer, BUF_SIZE, 1, my_file);
// if (blocks != 1) ...
Do note that you must get the order of the second and third arguments right, however. The second is the transfer unit size, and the third is the number of units to transfer. Partial units will not be transferred unless an error or end-of-file occurs. Specifying the transfer unit as the full number of bytes you want to transfer and asking (therefore) for exactly one unit is what achieves the semantics you ask about.
You use a loop.
For example, with proper error checking:
/** Read a specific number of bytes from a file or socket descriptor
* @param fd Descriptor
* @param dst Buffer to read data into
* @param minbytes Minimum number of bytes to read
* @param maxbytes Maximum number of bytes to read
* @return Exact number of bytes read.
* errno is always set by this call.
* It will be set to zero if an acceptable number of bytes was read,
* and to nonzero otherwise.
* If there was not enough data to read, errno == ENODATA.
*/
size_t read_range(const int fd, void *const dst, const size_t minbytes, const size_t maxbytes)
{
if (fd == -1) {
errno = EBADF;
return 0;
} else
if (!dst || minbytes > maxbytes) {
errno = EINVAL;
return 0;
}
char *buf = (char *)dst;
char *const end = (char *)dst + minbytes;
char *const lim = (char *)dst + maxbytes;
while (buf < end) {
ssize_t n = read(fd, buf, (size_t)(lim - buf));
if (n > 0) {
buf += n;
} else
if (n == 0) {
/* Premature end of input */
errno = ENODATA; /* Example only; use what you deem best */
return (size_t)(buf - (char *)dst);
} else
if (n != -1) {
/* C library or kernel bug */
errno = EIO;
return (size_t)(buf - (char *)dst);
} else {
/* Error, interrupted by signal delivery, or nonblocking I/O would block. */
return (size_t)(buf - (char *)dst);
}
}
/* At least minbytes, up to maxbytes received. */
errno = 0;
return (size_t)(buf - (char *)dst);
}
Some do find it odd that it clears errno to zero on successful calls, but it is perfectly acceptable in both standard and POSIX C.
Here, it means that typical use cases are simple and robust. For example,
struct message msgs[MAX_MSGS];
size_t bytes = read_range(fd, msgs, sizeof msgs[0], sizeof msgs);
if (errno) {
/* Oops, things did not go as we expected. Deal with it.
If bytes > 0, we do have that many bytes in msgs[].
*/
} else {
/* We have bytes bytes in msgs.
bytes >= sizeof msgs[0] and bytes <= sizeof msgs.
*/
}
If you have a pattern where you have fixed or variable sized messages, and a function that consumes them one by one, do not assume that the best option is to try and read exactly one message at a time, because it is not.
This is also why the above example has minbytes and maxbytes instead of a single exactly_this_many_bytes parameter.
A much better pattern is to have a larger buffer, where you memmove() the data only when you have to (because you're running out of room, or because the next message is not sufficiently aligned).
For example, let's say you have a stream socket or file descriptor, where each incoming message consists of a three byte header: the first byte identifies the message type, and the next two bytes (say, less significant byte first) identify the number of data payload bytes associated with the message. This means that the maximum total length of a message is 1+2+65535 = 65538 bytes.
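To make the header layout concrete, here is a minimal sketch that computes the total message length from the three header bytes. The function name message_total_length() is an assumption made for this example only.

```c
#include <stddef.h>

/* Total length (header plus payload) for the 3-byte header described above:
   hdr[0] = message type, hdr[1] = low length byte, hdr[2] = high length byte. */
size_t message_total_length(const unsigned char hdr[3])
{
    size_t payload = (size_t)hdr[1] | ((size_t)hdr[2] << 8);
    return 3 + payload;
}
```

Note the parentheses around the shift: in C, + binds tighter than <<, so leaving them out would shift the whole sum.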
For efficiently receiving the messages, you'll use a dynamically allocated buffer. The buffer size is a software engineering question, and other than that it has to be at least 65538 bytes, its size – and even whether it should grow and shrink dynamically – depends on the situation. So, we'll just assume that we have unsigned char *data; pointing to a buffer of size size_t size; already allocated.
The loop itself could look something like the following:
size_t head = 0; /* Offset to current message */
size_t tail = 0; /* Offset to first unused byte in buffer */
size_t mlen = 0; /* Total length of the current message; 0 is "unknown"*/
while (1) {
/* Message processing loop. */
while (head + 3 <= tail) {
/* Verify we know the total length of the message
that starts at offset head. */
if (!mlen)
mlen = 3 + (size_t)(data[head + 1])
+ ((size_t)(data[head + 2]) << 8); /* parenthesized: + binds tighter than << */
/* If that message is not yet complete, we cannot process it. */
if (head + mlen > tail)
break;
/* type datalen, pointer to data */
handle_message(data[head], mlen - 3, data + head + 3);
/* Skip message in buffer. */
head += mlen;
/* Since we do not know the length of the next message,
or rather, the current message starting at head,
we do need to reset mlen to "unknown", 0. */
mlen = 0;
}
/* At this point, the buffer contains less than one full message.
Whether it is better to always move a partial leftover message
to the beginning of the buffer, or only do so if the buffer
is full, depends on the workload and buffer size.
The following one may look complex, but it is actually simple.
If the current start of the buffer is past the halfway mark,
or there is no more room at the end of the buffer, we do the move.
Only if the current message starts in the initial half, and
when there is room at the end of the buffer, we leave it be.
But first: If we have no data in the buffer, it is always best
to start filling it from the beginning.
*/
if (head >= tail) {
head = 0;
tail = 0;
} else
if (head >= size/2 || tail >= size) {
memmove(data, data + head, tail - head);
tail -= head;
head = 0;
}
/* We do not have a complete message, but there
is room in the buffer (assuming size >= 65538),
we need to now read more data into the buffer. */
ssize_t n = read(sourcefd, data + tail, size - tail);
if (n > 0) {
tail += n;
/* Check if it completed one or more messages. */
continue;
} else
if (n == 0) {
/* End of input. If buffer is empty, that's okay. */
if (head >= tail)
break;
/* Ouch: We have partial message in the buffer,
but there will be no more incoming data! */
ISSUE_WARNING("Discarding %zu byte partial message due to end of input.\n", tail - head);
break;
} else
if (n != -1) {
/* This should not happen. If it does, it is a C library
or kernel bug. We treat it as fatal. */
ISSUE_ERROR("read() returned %zd; dropping connection.\n", n);
break;
} else
if (errno != EINTR) {
/* Everything except EINTR indicates an error to us; we do
assume that sourcefd is blocking (not nonblocking). */
ISSUE_ERROR("read() failed with errno %d (%s); dropping connection.\n", errno, strerror(errno));
break;
}
/* The case n == -1, errno == EINTR usually occurs when a signal
was delivered to a handler using this thread, and that handler
was installed without SA_RESTART. Depending on what kind of
a device or socket sourcefd is, there could be additional cases;
but in general, it just means "something unrelated happened,
but you were to be notified about it, so EINTR you get".
Simply put, EINTR is not really an error, just like
EWOULDBLOCK/EAGAIN is not an error for nonblocking descriptors,
they're just easiest to treat as an "error-like situation" in C.
*/
}
/* close(sourcefd); */
Note how the loop does not actually try to read any specific amount of data? It just reads as much as it can, and processes it as it goes.
Could one read such messages precisely, by first reading exactly the three-byte header, then exactly the data payload? Sure, but that means you make an awful amount of syscalls; at minimum two per message. If the messages are common, you probably do not want to do that because of the syscall overhead.
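For comparison, the "precise" approach could look roughly like the sketch below. It is only an illustration of the two-reads-per-message pattern, not a recommendation; read_exact() and read_one_message() are hypothetical names, and the caller is assumed to supply a payload buffer of at least 65535 bytes.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read exactly n bytes, retrying on EINTR.
   Returns 0 on success, -1 on error or premature end of input. */
static int read_exact(int fd, void *dst, size_t n)
{
    unsigned char *p = dst;
    while (n > 0) {
        ssize_t got = read(fd, p, n);
        if (got > 0) {
            p += got;
            n -= (size_t)got;
        } else if (got == 0) {
            return -1;              /* end of input in the middle of a message */
        } else if (errno != EINTR) {
            return -1;              /* real error; errno was set by read() */
        }
    }
    return 0;
}

/* Read one message: the 3-byte header first, then the payload it announces.
   Returns the payload length (0..65535), or -1 on failure. */
long read_one_message(int fd, unsigned char *type, unsigned char *payload)
{
    unsigned char hdr[3];
    if (read_exact(fd, hdr, sizeof hdr) != 0)
        return -1;
    size_t len = (size_t)hdr[1] | ((size_t)hdr[2] << 8);
    if (read_exact(fd, payload, len) != 0)
        return -1;
    *type = hdr[0];
    return (long)len;
}
```

Every message with a payload costs at least two read() calls here, which is exactly the overhead the buffered loop avoids.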
Could one use the available buffer more carefully, and remove the type and data payload length from the next message in the buffer as soon as possible? Well, that is the sort of question one should discuss with colleagues or developers having written such code before. There are positives (mainly, you save three bytes), and negatives (added code complexity, which always makes code harder to maintain long term, and risks introducing bugs). On a microcontroller with just 128 bytes of buffer for incoming command messages, I probably would do that; but not on a desktop or server that prefers a few hundred kilobytes to a couple of megabytes of buffer for such code (since the memory "waste" is often covered by the smaller number of syscalls especially when processing lots of messages). No quick answers! :)-
Both read and write on success return ssize_t containing amount of bytes read/written. You can use it to construct a loop:
A reliable read():
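ssize_t readall(int fd, void *buff, size_t nbyte) {
    char *p = buff;                 /* arithmetic on void * is not standard C */
    size_t nread = 0;
    while (nread < nbyte) {
        /* res must be signed so that the -1 error return is representable */
        ssize_t res = read(fd, p + nread, nbyte - nread);
        if (res == 0) break;        /* end of input */
        if (res == -1) return -1;   /* error; errno set by read() */
        nread += (size_t)res;
    }
    return (ssize_t)nread;
}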
A reliable write() (almost same):
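ssize_t writeall(int fd, const void *buff, size_t nbyte) {
    const char *p = buff;           /* arithmetic on void * is not standard C */
    size_t nwrote = 0;
    while (nwrote < nbyte) {
        /* res must be signed so that the -1 error return is representable */
        ssize_t res = write(fd, p + nwrote, nbyte - nwrote);
        if (res == 0) break;        /* nothing could be written */
        if (res == -1) return -1;   /* error; errno set by write() */
        nwrote += (size_t)res;
    }
    return (ssize_t)nwrote;
}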
Basically it keeps reading/writing until the total number of bytes transferred equals nbyte (or until end of input or an error).
Please note, this answer uses only <unistd.h> functions, assuming there is a reason to use them. If you can use <stdio.h> too, see the answer by John Bollinger, which uses fdopen and setvbuf and then fread/fwrite. Also, take a look at the answer by Blabbo is Verbose for a read_range function with a lot of features.
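One thing such loops often need in practice is a retry when a signal interrupts the call. The sketch below shows one way to do that for writes; write_fully() is a hypothetical name, and treating a zero-byte write as an I/O error is an assumption of this example rather than anything mandated by POSIX.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Write all nbyte bytes, retrying when a signal interrupts the call.
   Returns 0 on success, -1 on error (errno describes the failure). */
int write_fully(int fd, const void *buff, size_t nbyte)
{
    const char *p = buff;
    while (nbyte > 0) {
        ssize_t res = write(fd, p, nbyte);
        if (res == -1) {
            if (errno == EINTR)
                continue;       /* interrupted before anything was written: retry */
            return -1;
        }
        if (res == 0) {
            errno = EIO;        /* assumption: treat "no progress" as an error */
            return -1;
        }
        p += res;
        nbyte -= (size_t)res;
    }
    return 0;
}
```

The same shape works for read(), with the difference that a return value of 0 there means end of input.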
Here is my code:
FILE *responseStorage = fopen("response", "w");
if(responseStorage == NULL){
//error handling
}
int receivedTotal = 0;
while(1){
ssize_t received = recv(sockfd, buf, BUFSIZ, 0);
if( received == -1 ){
//error handling
}
if( received == 0 ){
//end of stream
break;
}
receivedTotal += received;
if(fwrite(buf, received, 1, responseStorage) != 1){
//error handling
}
memset(buf, '\0', sizeof(buf));
}
fclose(responseStorage);
FILE *responseFile = fopen("response", "r");
char responseArray[receivedTotal];
if(fread(responseArray, receivedTotal, 1, responseFile) == 0){
//error
}
I am calling ssize_t received = recv(sockfd, buf, BUFSIZ, 0); to receive data from the server, adding the number of bytes received to the total with receivedTotal += received;, and writing that data to my file FILE *responseStorage with fwrite(buf, received, 1, responseStorage). At the end of the stream the loop breaks; I then open the response file in r mode, declare an array of size receivedTotal, char responseArray[receivedTotal];, and with fread(responseArray, receivedTotal, 1, responseFile) read the data from the file into responseArray.
Is there a way to write directly to responseArray? I have to validate the response later on, so I need it in an array. I know I would have to dynamically allocate the space for the array with malloc. I want to avoid using receivedTotal and responseStorage.
You're already reading from your socket into buf, so all you have to do is write buf to a dynamically allocated string rather than responseStorage. Like you say, you just have to handle memory space to fit your response.
The inefficient but really easy way to do this is to reallocate storage every time you read. You can allocate storage for the sum of the previous response reads and the new string in buf, then write both strings to newly allocated space. You know the sum of the length of these strings +1 for the null byte, so you don't have to worry much about available allocated space. This is however pretty expensive because the reads get copied over and over again.
The slightly more complex way to do it would be to allocate one, probably fairly large, main response string buffer, keep track of its length and total allocated space, and use strncat to keep concatenating buf until its length exceeds the remaining space in response, +1 (for the terminating nul byte). When there's not enough room, you can call realloc to obtain more memory. realloc is not very efficient (by C standards) because it is likely to need to allocate different space, copy existing data, and then return a new pointer.
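A minimal sketch of that bookkeeping is shown below. The struct response type and response_append() are names invented for this example. It uses memcpy rather than strncat, on the assumption that data coming from recv is not NUL-terminated and may contain zero bytes.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable response buffer: len bytes used out of cap allocated. */
struct response {
    char  *data;
    size_t len;
    size_t cap;
};

/* Append n bytes from chunk, doubling the capacity as needed and keeping the
   data NUL-terminated. Returns 0 on success, -1 if memory runs out (in which
   case the existing contents are left untouched). */
int response_append(struct response *r, const char *chunk, size_t n)
{
    size_t needed = r->len + n + 1;     /* +1 for the terminating nul byte */
    if (needed > r->cap) {
        size_t new_cap = r->cap ? r->cap : 64;
        while (new_cap < needed)
            new_cap *= 2;
        char *tmp = realloc(r->data, new_cap);
        if (tmp == NULL)
            return -1;
        r->data = tmp;
        r->cap = new_cap;
    }
    memcpy(r->data + r->len, chunk, n);
    r->len += n;
    r->data[r->len] = '\0';
    return 0;
}
```

In the receive loop this would be called as response_append(&resp, buf, (size_t)received) after each successful recv, starting from a zero-initialized struct.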
If you wanted to be really clever, you could allocate one big buffer and pass read a pointer to the offset of the next available spot in that buffer. You still might need to grow the buffer, but at least you don't need to copy each chunk into it. The buffer then becomes your response array. This is the implementation I'll demonstrate:
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
int main(int argc, char**argv){
size_t size = 2;
size_t len = 0;
char* fullbuf = malloc(sizeof(char)*size);
while(1) {
ssize_t b = read(STDIN_FILENO, fullbuf + len, sizeof(char) * (size - len-1) );
if( b < 0) {
perror("Couldn't read from stdin"); /* perror() appends ": <message>" itself */
exit(2);
} else if( b == 0 ){
break;
}
if( b + len + 1 >= size) {
// time to allocate more memory
size = size * 2;
fullbuf = realloc(fullbuf, sizeof(char) * size);
if( fullbuf == NULL ){
fprintf(stderr, "Couldn't allocate %zu bytes of memory\n", size); /* size_t needs %zu */
exit(1);
}
}
len += b;
}
fullbuf[len] = '\0'; //terminating null space
printf("%s", fullbuf);
}
For this demonstration I read from stdin instead of a socket, but same idea. I read only as much data as is available in buf. When it's full (but the terminating byte), I double its space. Note that I set fullbuf to the output of realloc - it may be the same address, it may not.
To prove it works, I started at the rather insane 2 byte buffer, and double from there so there's lots of realloc calls. I grabbed 32k of lorem ipsum to use as input.
$ du -h file.txt
32K file.txt
$ shasum -a 256 file.txt
346f2adbd1fdca6bf3b03fb0a4d2fd0030e3363e9a9c5d1e22747e1bcc316e37 file.txt
$ ./t < file.txt | shasum -a 256
346f2adbd1fdca6bf3b03fb0a4d2fd0030e3363e9a9c5d1e22747e1bcc316e37 -
Awesome, if the shasums are the same, that means I outputted file.txt exactly.
On Windows, the most efficient way is to use VirtualAlloc: reserve a very large chunk and initially commit one page (4096 bytes). If BUFSIZ is larger, commit more pages to make it fit. Assign the address to responseArray and read the first packet into it. For subsequent recv calls, keep committing extra pages and advance the recv buffer address, making sure that there are at least BUFSIZ free bytes available. You will have contiguous address space, no fragmentation, no hogging of extra memory, no rewriting, and no writing to disk.
I read in MSDN about the send() and recv() function, and there is one thing that I'm not sure I understand.
If I send a buffer of, say, 256 bytes and receive only the first 5 bytes, will the next call to recv() continue from the 6th byte and get the data from there?
For example:
char buff[256];
memcpy(buff, "hello world", 12);
send(sockfd, buff, 100, 0); // sending 100 bytes
// server side:
char buff[256];
recv(sockfd, buff, 5, 0); // now buff contains "hello"?
recv(sockfd, buff, 5, 0); // now I override the data and buff contains "world"?
thanks!
The correct way to receive into a buffer in a loop from TCP in C is as follows:
char buffer[8192]; // or whatever you like, but best to keep it large
int count = 0;
int total = 0;
while ((count = recv(socket, &buffer[total], sizeof buffer - total, 0)) > 0)
{
total += count;
// At this point the buffer is valid from 0..total-1, if that's enough then process it and break, otherwise continue
}
if (count == -1)
{
perror("recv");
}
else if (count == 0)
{
// EOS on the socket: close it, exit the thread, etc.
}
You have missed the principal detail: what kind of socket is used and what protocol is requested. TCP is a byte stream, so yes: if 256 bytes were sent and you have read only 5, the remaining 251 wait in the socket buffer (assuming the buffer is large enough, which is true on any non-embedded system) and you can get them with the next recv(). With UDP and without MSG_PEEK, the rest of the datagram is lost; if MSG_PEEK is specified, the next recv() returns the datagram from the very beginning. With SCTP or another "sequential packet" protocol, AFAIK, the behavior is the same as with UDP, but I'm unsure about the Windows implementation specifics.
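The difference can be observed with a small experiment. The sketch below assumes a POSIX system and uses a local socketpair() as a stand-in for TCP (SOCK_STREAM) and UDP (SOCK_DGRAM); real network sockets and Winsock may differ in details, and MSG_DONTWAIT is not available on Windows. The function name two_short_reads() is hypothetical.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Sends "hello world" once over a local socket pair of the given type
   (SOCK_STREAM or SOCK_DGRAM), then performs two 5-byte recv() calls.
   The recv() return values are stored in *n1 and *n2.
   Returns 0 if the experiment ran, -1 if the setup failed. */
int two_short_reads(int type, char first[6], char second[6],
                    ssize_t *n1, ssize_t *n2)
{
    static const char msg[] = "hello world";
    int sv[2];

    if (socketpair(AF_UNIX, type, 0, sv) == -1)
        return -1;
    memset(first, 0, 6);
    memset(second, 0, 6);
    if (send(sv[0], msg, strlen(msg), 0) != (ssize_t)strlen(msg)) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    *n1 = recv(sv[1], first, 5, 0);
    /* MSG_DONTWAIT: in the datagram case nothing is left to read, so report
       that with -1 instead of blocking forever. */
    *n2 = recv(sv[1], second, 5, MSG_DONTWAIT);
    close(sv[0]);
    close(sv[1]);
    return 0;
}
```

With SOCK_STREAM the second call continues where the first stopped (it yields " worl", starting at the 6th byte); with SOCK_DGRAM the remainder of the datagram is discarded and the second call finds nothing to read.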
In C I had working code but have no idea why it worked, so I started rewriting it so I could actually understand what is going on.
So far so good! I rewrote and am 90% sure I understand everything that is going on now; the issue however, is that I have no idea how to store the data chunk received by recv (databff) into my pre-allocated buffer (htmlbff).
Consider the following code (note that I stripped this down quite a bit, so it only includes the basics, e.g. no memory reallocation or leak protection, etc...):
#define BUFFERSIZE 4096
#define MAXDATASIZE 256
char *htmlbff, databff[MAXDATASIZE];
int c, i = BUFFERSIZE, q = 0;
if(!(htmlbff = malloc(i)))
{
printf("\nError! Memory allocation failed!");
return 0x00;
}
while((c = recv(sock, databff, MAXDATASIZE, 0)) > 0)
{
/*memory checks stripped out since they are irrelevant for this post*/
/*store data to the appropriate area in htmlbff*/
q += c;
}
So (if I am doing this right, and things are going as I think they are) c is the size of the current data chunk, and q is the total amount of data received so far (q is incremented by c each time the loop repeats). At the moment I am using q for memory handling (in case anybody was wondering) but I believe that it will also have purpose in the solution to this problem.
At any rate the question I am asking is in regards to the second comment. How do I store the data from recv into htmlbff correctly?
Use memcpy() to copy (append) data to htmlbff, but you also need to ensure you do not exceed the size of htmlbff. Either stop receiving data when BUFFERSIZE bytes have been received, or use realloc() to extend htmlbff to contain more data.
For example:
char* htmlbff;
size_t htmlbff_size = BUFFERSIZE;
htmlbff = malloc(htmlbff_size);
if (htmlbff)
{
while((c = recv(sock, databff, MAXDATASIZE, 0)) > 0)
{
if (c + q > htmlbff_size)
{
htmlbff_size *= 2; /* Arbitrary doubling of size. */
char* tmp = realloc(htmlbff, htmlbff_size);
if (tmp)
{
htmlbff = tmp;
}
else
{
/* memory allocation failure. */
free(htmlbff);
htmlbff = 0;
break;
}
}
memcpy(htmlbff + q, databff, c);
q += c;
}
}
Use memcpy, and offset htmlbff by q:
memcpy(htmlbff + q, databff, c);
You can similarly recv directly into htmlbff:
c = recv(sock, htmlbff + q, MAXDATASIZE, 0);
But it's fine to keep a separate buffer, and depending upon your full code, it may make things clearer.
Be sure that you add checks against BUFFERSIZE so that you don't copy past the bounds of htmlbff. You mentioned that you've stripped out realloc handling, so maybe you're already handling this.
Your constant names are a bit confusing; when buffering data I would use BUFFERSIZE to indicate the size of each chunk, i.e. the size of databff.
What I would do is recv() data directly into htmlbff, unless you need to do more processing on it.
Make sure that you realloc() htmlbff when i - q is less than MAXDATASIZE so that there is always room for another recv().
Then you would call recv(sock, htmlbff + q, MAXDATASIZE, 0)
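Put together, receiving directly into the growing buffer might look like the sketch below. It is one possible arrangement under stated assumptions: the function name recv_until_eof() is hypothetical, the buffer starts at BUFFERSIZE bytes and doubles when fewer than MAXDATASIZE bytes are free, and any recv() error (including EINTR) is treated as fatal.

```c
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>

#define BUFFERSIZE  4096
#define MAXDATASIZE 256

/* Receive until the peer closes the connection. Returns a malloc()ed buffer
   (the caller frees it) and stores the byte count in *len; NULL on failure. */
char *recv_until_eof(int sock, size_t *len)
{
    size_t cap = BUFFERSIZE;    /* allocated size, "i" in the question */
    size_t used = 0;            /* bytes received so far, "q" in the question */
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    for (;;) {
        /* Always keep room for one more full-sized recv(). */
        if (cap - used < MAXDATASIZE) {
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        ssize_t c = recv(sock, buf + used, MAXDATASIZE, 0);
        if (c > 0) {
            used += (size_t)c;
        } else if (c == 0) {
            break;              /* orderly shutdown by the peer */
        } else {
            free(buf);          /* assumption: every error is fatal here */
            return NULL;
        }
    }
    *len = used;
    return buf;
}
```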
You need to keep reallocating/expanding the buffer to fit all the data (if the total data read off the socket exceeds BUFFERSIZE). That way, as recv reads data into databff, your htmlbff can grow in memory and each new read can be appended to it.
q and c are like cursors to keep track of where you are up to and how far you have to go.
memcpy(htmlbff+q, databff, c); //Do this in your while loop to append the data
Hi, I have written a server application which accepts a name from the client, which is usually a file name. It opens the file, reads the contents into a buffer, and then transmits the buffer over Ethernet using send(). But the problem arises on the client side, where not all of the bytes are received successfully. I receive only a part of what I send.
For your reference, here's the code snippet for the server side:
Server:
fp = fopen(filename,"r+");
strcpy(str,"");
fseek(fp, 0L, SEEK_END);
size = ftell(fp);
fseek(fp, 0L, SEEK_SET);
fread(str, size, 1,fp);
fclose(fp);
printf("Size of the file is : %d\n",size);
sprintf(filename, "%d", size);
n = send(nsd, filename, strlen(filename), 0);
while(size > 0){
n = send(nsd, str, strlen(str), 0);
printf("%d bytes sent successfully\n",n);
if(n == 0) break;
sentbytes = sentbytes + n;
size = size - sentbytes;
}
Please help me with writing the client app. I am currently confused about how to go about writing it. Shall I place the recv() part in a while(1) loop so that the client keeps running until all the bytes have been received successfully?
EDITED
For starters, you could both read from the file and write to the socket in chunks at the same time.
Since you are transferring data over TCP, remember that data is transferred reliably as a stream and not as messages. So don't make assumptions about how the data is recv'd, except for the order.
Here is how it could be written:
open socket
open file
size_of_file = read_file_size(file);
send(socket, &size_of_file, sizeof(int), ...)
while (not everything is written)
read fixed chunk from file
write as much was read to the socket
cleanup // close file, socket
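The sender outline above could be fleshed out roughly as follows. This is a sketch under assumptions, not the only way to do it: send_all() and send_file() are hypothetical names, the size is sent as a 4-byte integer in network byte order (rather than a raw int) so both ends agree on its layout, and files larger than 4 GiB are simply rejected.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Keep calling send() until all len bytes are out.
   Returns 0 on success, -1 on error (EINTR is not retried in this sketch). */
static int send_all(int sock, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = send(sock, p, len, 0);
        if (n <= 0)
            return -1;
        p += n;                 /* advance past what was actually sent */
        len -= (size_t)n;
    }
    return 0;
}

/* Send the file size as a 4-byte big-endian integer, then the file contents
   in fixed-size chunks. Returns 0 on success, -1 on failure. */
int send_file(int sock, FILE *fp)
{
    if (fseek(fp, 0L, SEEK_END) != 0)
        return -1;
    long size = ftell(fp);
    if (size < 0 || (unsigned long)size > UINT32_MAX)
        return -1;
    if (fseek(fp, 0L, SEEK_SET) != 0)
        return -1;

    uint32_t wire_size = htonl((uint32_t)size);
    if (send_all(sock, &wire_size, sizeof wire_size) != 0)
        return -1;

    char chunk[4096];
    size_t n;
    while ((n = fread(chunk, 1, sizeof chunk, fp)) > 0) {
        if (send_all(sock, chunk, n) != 0)
            return -1;
    }
    return ferror(fp) ? -1 : 0;
}
```

Reading and sending chunk by chunk means the whole file never has to fit in memory at once.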
As for the recv part, I think it is best you send the file size over as an integer and keep reading in a while loop until you have recv'd as many bytes as you are sending from the server.
It's like this:
recv(socket, &size_of_msg, sizeof(int), ...)
while (not everything is read)
read fixed chunk from the socket
cleanup
Well, I see at least one issue with the way you are sending the message over the socket.
First from the man page of fread:
The function fread() reads nmemb elements of data, each size bytes
long, from the stream pointed to by stream, storing them at the loca-
tion given by ptr.
and what you are trying is this:
fread(str, size, 1,fp);
I assume what you meant was
fread(str, 1,size,fp);
Though it should not cause the issue.
But the problem lies here:
n = send(nsd, str, strlen(str), 0);
printf("%d bytes sent successfully\n",n);
if(n == 0) break;
sentbytes = sentbytes + n;
size = size - sentbytes;
Although you decrease 'size' by the number of bytes sent, you never advance str to point at the part of the buffer that still has to be sent. This will only resend the initial bytes of the buffer repeatedly.
str += n; //Assuming str is char*
will solve that issue. (Also note that size should be decreased by n, not by the running total sentbytes.)
Using strlen doesn't seem appropriate. You've read the file, you know how long it is, so why do strlen? Either you'll just get the same result (so it's redundant) or you'll get something else (so it's a bug).
"Shall i place the recv() part in a while(1) loop so that the client keeps running until all the bytes have been received successfully?"
Something like that. Never presume that a recv() call got everything that was sent: TCP/IP breaks messages into packets at a lower level, and recv() will return after reading whatever amount of data has actually been received at that point. You don't have to worry about that directly, except insofar as you need some kind of protocol to indicate how long a message is, so the receiver knows how much to read. For example:
char buffer[4096];
int msgsz = 600, // see below
sofar = 0,
cur;
while (sofar < msgsz) {
cur = recv (
socket_fd,
&buffer[sofar],
msgsz - sofar,
0
);
if (cur == -1) {
// error
break;
} else if (cur == 0) {
// disconnected
break;
}
sofar += cur;
}
WRT msgsz, you would include this somewhere in a fixed-length header, which is read first. A simple version of that might be just 4 bytes containing a uint32_t, i.e., an unsigned 32-bit integer holding the length. You could also use a null-terminated string with a number in it, but that means reading until '\0' is found.
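A minimal sketch of the 4-byte header variant follows. The names recv_all() and recv_message() and the MAX_MESSAGE_SIZE limit are assumptions made for this example, and the length is assumed to be sent in network byte order so that both ends agree regardless of their native endianness.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical sanity limit so a corrupt header cannot trigger a huge allocation. */
#define MAX_MESSAGE_SIZE (16u * 1024u * 1024u)

/* Loop on recv() until exactly len bytes have arrived.
   Returns 0 on success, -1 on error or if the peer disconnects early. */
static int recv_all(int sock, void *buf, size_t len)
{
    char *p = buf;
    while (len > 0) {
        ssize_t cur = recv(sock, p, len, 0);
        if (cur <= 0)
            return -1;
        p += cur;
        len -= (size_t)cur;
    }
    return 0;
}

/* Read the 4-byte length header, then exactly that many payload bytes.
   Returns a malloc()ed payload (the caller frees it) and stores its length
   in *len; returns NULL on failure. */
char *recv_message(int sock, uint32_t *len)
{
    uint32_t wire_len;
    if (recv_all(sock, &wire_len, sizeof wire_len) != 0)
        return NULL;
    uint32_t msgsz = ntohl(wire_len);
    if (msgsz > MAX_MESSAGE_SIZE)
        return NULL;
    char *payload = malloc(msgsz ? msgsz : 1);
    if (payload == NULL)
        return NULL;
    if (recv_all(sock, payload, msgsz) != 0) {
        free(payload);
        return NULL;
    }
    *len = msgsz;
    return payload;
}
```

The sender would write htonl(length) as the first four bytes, followed by the payload itself.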