In C I had working code but had no idea why it worked, so I started rewriting it so I could actually understand what is going on.
So far so good! I rewrote it and am 90% sure I understand everything that is going on now; the issue, however, is that I have no idea how to store the data chunk received by recv (databff) into my pre-allocated buffer (htmlbff).
Consider the following code (note that I stripped this down quite a bit, so it only includes the basics, e.g. no memory reallocation or leak protection, etc...):
#define BUFFERSIZE 4096
#define MAXDATASIZE 256
char *htmlbff, databff[MAXDATASIZE];
int c, i = BUFFERSIZE, q = 0;
if(!(htmlbff = malloc(i)))
{
printf("\nError! Memory allocation failed!");
return 0x00;
}
while((c = recv(sock, databff, MAXDATASIZE, 0)) > 0)
{
/*memory checks stripped out since they are irrelevant for this post*/
/*store data to the appropriate area in htmlbff*/
q += c;
}
So (if I am doing this right, and things are going as I think they are) c is the size of the current data chunk, and q is the total amount of data received so far (q is incremented by c each time the loop repeats). At the moment I am using q for memory handling (in case anybody was wondering) but I believe that it will also have purpose in the solution to this problem.
At any rate the question I am asking is in regards to the second comment. How do I store the data from recv into htmlbff correctly?
Use memcpy() to copy (append) data to htmlbff, but you also need to ensure you do not exceed the size of htmlbff. Either stop receiving data when BUFFERSIZE bytes have been received or use realloc() to extend htmlbff to contain more data.
For example:
char* htmlbff;
size_t htmlbff_size = BUFFERSIZE;
htmlbff = malloc(htmlbff_size);
if (htmlbff)
{
while((c = recv(sock, databff, MAXDATASIZE, 0)) > 0)
{
if (c + q > htmlbff_size)
{
htmlbff_size *= 2; /* Arbitrary doubling of size. */
char* tmp = realloc(htmlbff, htmlbff_size);
if (tmp)
{
htmlbff = tmp;
}
else
{
/* memory allocation failure. */
free(htmlbff);
htmlbff = 0;
break;
}
}
memcpy(htmlbff + q, databff, c);
q += c;
}
}
Use memcpy, and offset htmlbff by q:
memcpy(htmlbff + q, databff, c);
You can similarly recv directly into htmlbff:
c = recv(sock, htmlbff + q, MAXDATASIZE, 0);
But it's fine to keep a separate buffer, and depending upon your full code, it may make things clearer.
Be sure that you add checks against BUFFERSIZE so that you don't copy past the bounds of htmlbff. You mentioned that you've stripped out realloc handling, so maybe you're already handling this.
Your constant names are a bit confusing; when buffering data, I would use BUFFERSIZE to indicate the size of each chunk, i.e. the size of databff.
What I would do is recv() data directly into htmlbff, unless you need to do more processing on it.
Make sure that you realloc() htmlbff when i - q is less than MAXDATASIZE so that there is always room for another recv().
Then you would call recv(sock, htmlbff + q, MAXDATASIZE, 0).
You need to keep reallocating/expanding the buffer to fit all the data (if the total amount of data read off the socket exceeds the current size of htmlbff). That way, as recv reads data into databff, your htmlbff can grow in memory, and each new read can be appended to your overall htmlbff.
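A minimal sketch of that "recv() straight into htmlbff" approach, assuming a blocking stream socket and keeping the question's BUFFERSIZE/MAXDATASIZE constants. The function name recv_into_growing_buffer is hypothetical; cap plays the role of i from the question and q is the running total. The buffer is doubled whenever fewer than MAXDATASIZE bytes are free, so every recv() has room.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/types.h>
#include <sys/socket.h>

#define BUFFERSIZE 4096
#define MAXDATASIZE 256

/* Receive until the peer closes the connection, recv()ing directly into a
   growing heap buffer. Returns the buffer (caller frees it) and stores the
   byte count in *total; returns NULL on allocation or recv() failure. */
char *recv_into_growing_buffer(int sock, size_t *total)
{
    size_t cap = BUFFERSIZE;  /* current size of htmlbff (i in the question) */
    size_t q = 0;             /* bytes stored so far */
    char *htmlbff = malloc(cap);
    if (htmlbff == NULL)
        return NULL;

    for (;;) {
        /* Keep at least MAXDATASIZE bytes free for the next recv(). */
        if (cap - q < MAXDATASIZE) {
            char *tmp = realloc(htmlbff, cap * 2);
            if (tmp == NULL) {
                free(htmlbff);
                return NULL;
            }
            htmlbff = tmp;
            cap *= 2;
        }
        ssize_t c = recv(sock, htmlbff + q, MAXDATASIZE, 0);
        if (c == 0)
            break;            /* peer closed the connection */
        if (c < 0) {
            free(htmlbff);    /* a real client might retry on EINTR */
            return NULL;
        }
        q += (size_t)c;
    }
    *total = q;
    return htmlbff;
}
```

Doubling (rather than growing by a fixed step) keeps the number of realloc() calls logarithmic in the response size.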
q and c are like cursors to keep track of where you are up to and how far you have to go.
memcpy(htmlbff+q, databff, c); //Do this in your while loop to append the data
Related
I know that the read/write C functions from <unistd.h> are not guaranteed to read/write exactly N bytes as requested by the size_t nbyte argument (especially for sockets).
How to read/write full buffer from/to a file(or socket) descriptor?
That read() and write() do not guarantee to transfer the full number of bytes requested is a feature, not a shortcoming. If that feature gets in your way in a particular application then it is probably better to use the existing facilities of the standard library to deal with it than to roll your own (though I certainly have rolled my own from time to time).
Specifically, if you have a file descriptor on which you want to always transfer exact numbers of bytes then you should consider using fdopen() to wrap it in a stream and then performing I/O with fread() and fwrite(). You might also use setvbuf() to avoid having an intermediary buffer. As a possible bonus, you can then also use other stream functions with that, such as fgets() and fprintf().
Example:
int my_fd = open_some_resource();
// if (my_fd < 0) ...
FILE *my_file = fdopen(my_fd, "r+b");
// if (my_file == NULL) ...
int rval = setvbuf(my_file, NULL, _IONBF, 0);
// if (rval != 0) ...
Note that it is probably best to thereafter use only the stream, not the underlying file descriptor, and that is the main drawback of this approach. On the other hand, you can probably allow the FD to be lost, because closing the stream will also close the underlying FD.
Nothing particularly special is required to make fread() and fwrite() transfer full-buffer units (or fail):
char buffer[BUF_SIZE];
size_t blocks = fread(buffer, BUF_SIZE, 1, my_file);
// if (blocks != 1) ...
// ...
blocks = fwrite(buffer, BUF_SIZE, 1, my_file);
// if (blocks != 1) ...
Do note that you must get the order of the second and third arguments right, however. The second is the transfer unit size, and the third is the number of units to transfer. Partial units will not be transferred unless an error or end-of-file occurs. Specifying the transfer unit as the full number of bytes you want to transfer and asking (therefore) for exactly one unit is what achieves the semantics you ask about.
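To see what the argument order changes, here is a small experiment; it is only a sketch, fread_units_from_pipe is a hypothetical helper, and a pipe stands in for the socket or file descriptor. Five bytes are available, and fread() is asked for them in different unit sizes through an unbuffered stream created with fdopen() and setvbuf(), as in the answer.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Write len bytes into a pipe, close the writing end, then read them back
   through a stdio stream with fread(buf, unit, count, f).
   Returns fread()'s result (the number of COMPLETE units transferred),
   or (size_t)-1 if setting up the pipe or stream fails. */
size_t fread_units_from_pipe(const char *data, size_t len,
                             size_t unit, size_t count, char *buf)
{
    int fds[2];
    if (pipe(fds) != 0)
        return (size_t)-1;
    if (write(fds[1], data, len) != (ssize_t)len) {
        close(fds[0]);
        close(fds[1]);
        return (size_t)-1;
    }
    close(fds[1]);                 /* reader sees end-of-file after len bytes */

    FILE *f = fdopen(fds[0], "rb");
    if (f == NULL) {
        close(fds[0]);
        return (size_t)-1;
    }
    setvbuf(f, NULL, _IONBF, 0);   /* no intermediary stdio buffer */
    size_t got = fread(buf, unit, count, f);
    fclose(f);                     /* also closes fds[0] */
    return got;
}
```

With 5 bytes available, fread(buf, 8, 1, f) returns 0 because the single 8-byte unit is incomplete, while fread(buf, 1, 8, f) returns 5 and fread(buf, 5, 1, f) returns 1.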
You use a loop.
For example, with proper error checking:
#include <errno.h>
#include <unistd.h>
/** Read a specific number of bytes from a file or socket descriptor
* @param fd Descriptor
* @param dst Buffer to read data into
* @param minbytes Minimum number of bytes to read
* @param maxbytes Maximum number of bytes to read
* @return Exact number of bytes read.
* errno is always set by this call.
* It will be set to zero if an acceptable number of bytes was read,
* and to nonzero otherwise.
* If there was not enough data to read, errno == ENODATA.
*/
size_t read_range(const int fd, void *const dst, const size_t minbytes, const size_t maxbytes)
{
if (fd == -1) {
errno = EBADF;
return 0;
} else
if (!dst || minbytes > maxbytes) {
errno = EINVAL;
return 0;
}
char *buf = (char *)dst;
char *const end = (char *)dst + minbytes;
char *const lim = (char *)dst + maxbytes;
while (buf < end) {
ssize_t n = read(fd, buf, (size_t)(lim - buf));
if (n > 0) {
buf += n;
} else
if (n == 0) {
/* Premature end of input */
errno = ENODATA; /* Example only; use what you deem best */
return (size_t)(buf - (char *)dst);
} else
if (n != -1) {
/* C library or kernel bug */
errno = EIO;
return (size_t)(buf - (char *)dst);
} else {
/* Error, interrupted by signal delivery, or nonblocking I/O would block. */
return (size_t)(buf - (char *)dst);
}
}
/* At least minbytes, up to maxbytes received. */
errno = 0;
return (size_t)(buf - (char *)dst);
}
Some do find it odd that it clears errno to zero on successful calls, but it is perfectly acceptable in both standard and POSIX C.
Here, it means that typical use cases are simple and robust. For example,
struct message msgs[MAX_MSGS];
size_t bytes = read_range(fd, msgs, sizeof msgs[0], sizeof msgs);
if (errno) {
/* Oops, things did not go as we expected. Deal with it.
If bytes > 0, we do have that many bytes in msgs[].
*/
} else {
/* We have bytes bytes in msgs.
bytes >= sizeof msgs[0] and bytes <= sizeof msgs.
*/
}
If you have a pattern where you have fixed or variable sized messages, and a function that consumes them one by one, do not assume that the best option is to try and read exactly one message at a time, because it is not.
This is also why the above example has minbytes and maxbytes instead of a single exactly_this_many_bytes parameter.
A much better pattern is to have a larger buffer, where you memmove() the data only when you have to (because you're running out of room, or because the next message is not sufficiently aligned).
For example, let's say you have a stream socket or file descriptor, where each incoming message consists of a three byte header: the first byte identifies the message type, and the next two bytes (say, less significant byte first) identify the number of data payload bytes associated with the message. This means that the maximum total length of a message is 1+2+65535 = 65538 bytes.
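The header layout just described can be encoded and decoded as follows. This is a minimal sketch; put_header and message_length are hypothetical helper names, not part of the answer. Note the parentheses around the shift: in C, + binds tighter than <<, so without them the whole sum would be shifted.

```c
#include <stddef.h>

/* Store a message header: one type byte, then the payload length as two
   bytes, less significant byte first. payload_len must be <= 65535. */
void put_header(unsigned char *hdr, unsigned char type, size_t payload_len)
{
    hdr[0] = type;
    hdr[1] = (unsigned char)(payload_len & 0xFF);
    hdr[2] = (unsigned char)((payload_len >> 8) & 0xFF);
}

/* Total message length (3-byte header + payload) for the header at hdr. */
size_t message_length(const unsigned char *hdr)
{
    return 3 + ((size_t)hdr[1] | ((size_t)hdr[2] << 8));
}
```

The largest possible message, a payload length of 65535, yields a total of 65538 bytes, which is why the receive buffer must be at least that large.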
For efficiently receiving the messages, you'll use a dynamically allocated buffer. The buffer size is a software engineering question, and other than that it has to be at least 65538 bytes, its size – and even whether it should grow and shrink dynamically – depends on the situation. So, we'll just assume that we have unsigned char *data; pointing to a buffer of size size_t size; already allocated.
The loop itself could look something like the following:
size_t head = 0; /* Offset to current message */
size_t tail = 0; /* Offset to first unused byte in buffer */
size_t mlen = 0; /* Total length of the current message; 0 is "unknown"*/
while (1) {
/* Message processing loop. */
while (head + 3 <= tail) {
/* Verify we know the total length of the message
that starts at offset head. */
if (!mlen)
mlen = 3 + (size_t)(data[head + 1])
+ ((size_t)(data[head + 2]) << 8);
/* If that message is not yet complete, we cannot process it. */
if (head + mlen > tail)
break;
/* type datalen, pointer to data */
handle_message(data[head], mlen - 3, data + head + 3);
/* Skip message in buffer. */
head += mlen;
/* Since we do not know the length of the next message,
or rather, the current message starting at head,
we do need to reset mlen to "unknown", 0. */
mlen = 0;
}
/* At this point, the buffer contains less than one full message.
Whether it is better to always move a partial leftover message
to the beginning of the buffer, or only do so if the buffer
is full, depends on the workload and buffer size.
The following one may look complex, but it is actually simple.
If the current start of the buffer is past the halfway mark,
or there is no more room at the end of the buffer, we do the move.
Only if the current message starts in the initial half, and
when there is room at the end of the buffer, we leave it be.
But first: If we have no data in the buffer, it is always best
to start filling it from the beginning.
*/
if (head >= tail) {
head = 0;
tail = 0;
} else
if (head >= size/2 || tail >= size) {
memmove(data, data + head, tail - head);
tail -= head;
head = 0;
}
/* We do not have a complete message, but there
is room in the buffer (assuming size >= 65538),
we need to now read more data into the buffer. */
ssize_t n = read(sourcefd, data + tail, size - tail);
if (n > 0) {
tail += n;
/* Check if it completed one or more messages. */
continue;
} else
if (n == 0) {
/* End of input. If buffer is empty, that's okay. */
if (head >= tail)
break;
/* Ouch: We have partial message in the buffer,
but there will be no more incoming data! */
ISSUE_WARNING("Discarding %zu byte partial message due to end of input.\n", tail - head);
break;
} else
if (n != -1) {
/* This should not happen. If it does, it is a C library
or kernel bug. We treat it as fatal. */
ISSUE_ERROR("read() returned %zd; dropping connection.\n", n);
break;
} else
if (errno != EINTR) {
/* Everything except EINTR indicates an error to us; we do
assume that sourcefd is blocking (not nonblocking). */
ISSUE_ERROR("read() failed with errno %d (%s); dropping connection.\n", errno, strerror(errno));
break;
}
/* The case n == -1, errno == EINTR usually occurs when a signal
was delivered to a handler using this thread, and that handler
was installed without SA_RESTART. Depending on what kind of
a device or socket sourcefd is, there could be additional cases;
but in general, it just means "something unrelated happened,
but you were to be notified about it, so EINTR you get".
Simply put, EINTR is not really an error, just like
EWOULDBLOCK/EAGAIN is not an error for nonblocking descriptors,
they're just easiest to treat as an "error-like situation" in C.
*/
}
/* close(sourcefd); */
Note how the loop does not actually try to read any specific amount of data? It just reads as much as it can, and processes it as it goes.
Could one read such messages precisely, by first reading exactly the three-byte header, then exactly the data payload? Sure, but that means you make an awful lot of syscalls; at minimum two per message. If the messages are common, you probably do not want to do that because of the syscall overhead.
Could one use the available buffer more carefully, and remove the type and data payload length from the next message in the buffer as soon as possible? Well, that is the sort of question one should discuss with colleagues or developers who have written such code before. There are positives (mainly, you save three bytes), and negatives (added code complexity, which always makes code harder to maintain long term, and risks introducing bugs). On a microcontroller with just 128 bytes of buffer for incoming command messages, I probably would do that; but not on a desktop or server that prefers a few hundred kilobytes to a couple of megabytes of buffer for such code (since the memory "waste" is often covered by the smaller number of syscalls, especially when processing lots of messages). No quick answers! :)
Both read and write return, on success, an ssize_t containing the number of bytes read/written. You can use it to construct a loop:
A reliable read():
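#include <errno.h>
#include <unistd.h>
ssize_t readall(int fd, void *buff, size_t nbyte) {
char *p = buff; /* arithmetic on void * is not standard C */
size_t nread = 0;
while (nread < nbyte) {
ssize_t res = read(fd, p + nread, nbyte - nread);
if (res == 0) break; /* end of file */
if (res == -1) {
if (errno == EINTR) continue; /* interrupted by a signal: retry */
return -1;
}
nread += (size_t)res;
}
return (ssize_t)nread;
}
A reliable write() (almost the same):
ssize_t writeall(int fd, const void *buff, size_t nbyte) {
const char *p = buff;
size_t nwrote = 0;
while (nwrote < nbyte) {
ssize_t res = write(fd, p + nwrote, nbyte - nwrote);
if (res == 0) break;
if (res == -1) {
if (errno == EINTR) continue; /* interrupted by a signal: retry */
return -1;
}
nwrote += (size_t)res;
}
return (ssize_t)nwrote;
}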
Basically, each loop keeps reading/writing until the total number of bytes transferred equals nbyte, stopping early only at end of file (a 0 return) or on error.
Please note, this answer uses only <unistd.h> functions, assuming there is a reason to use them. If you can use <stdio.h> too, see the answer by John Bollinger, which uses fdopen() and setvbuf() and then fread()/fwrite(). Also, take a look at the answer by Blabbo is Verbose for a read_range function with a lot of features.
Here is my code:
FILE *responseStorage = fopen("response", "w");
if(responseStorage == NULL){
//error handling
}
int receivedTotal = 0;
while(1){
ssize_t received = recv(sockfd, buf, BUFSIZ, 0);
if( received == -1 ){
//error handling
}
if( received == 0 ){
//end of stream
break;
}
receivedTotal += received;
if(fwrite(buf, received, 1, responseStorage) != 1){
//error handling
}
memset(buf, '\0', sizeof(buf));
}
fclose(responseStorage);
FILE *responseFile = fopen("response", "r");
char responseArray[receivedTotal];
if(fread(responseArray, receivedTotal, 1, responseFile) == 0){
//error
}
I am calling ssize_t received = recv(sockfd, buf, BUFSIZ, 0); to receive data from the server, saving how much data I received with receivedTotal += received;, and writing that data with fwrite(buf, received, 1, responseStorage) to my file FILE *responseStorage. At the end of the stream the loop breaks, and I open the response file in "r" mode, make an array of size receivedTotal, char responseArray[receivedTotal];, and with fread(responseArray, receivedTotal, 1, responseFile) read that data from the file into responseArray.
Is there a way to write directly to responseArray? I have to validate the response later on, so I need it in an array. I know I would have to dynamically allocate the space for the array with malloc. I want to avoid using receivedTotal and responseStorage.
You're already reading from your socket into buf, so all you have to do is write buf to a dynamically allocated string rather than to responseStorage. As you say, you just have to manage memory so that your response fits.
The inefficient but really easy way to do this is to reallocate storage every time you read. You can allocate storage for the sum of the previous response reads and the new data in buf, then write both to the newly allocated space. You know the sum of the lengths of these strings, +1 for the null byte, so you don't have to worry much about available allocated space. This is, however, pretty expensive, because the reads get copied over and over again.
The slightly more complex way would be to allocate one fairly large main response buffer, keep track of its length and its total allocated space, and use strncat to keep concatenating buf until its length would exceed the remaining space in the response (+1 for the terminating null byte). When there's not enough room, you can call realloc to obtain more memory. realloc is not very efficient (by C standards) because it may need to allocate a different block, copy the existing data, and then return a new pointer.
If you wanted to be really clever, you could allocate one big buffer and pass read() a pointer to the offset of the next available spot in the buffer. You still might need to grow the buffer, but at least you don't need to copy data out of a temporary one. That big buffer then becomes your response array. This is the implementation I'll demonstrate:
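The "easy but inefficient" approach can be sketched like this; append_chunk is a hypothetical helper name, and it relies on realloc(NULL, n) behaving like malloc(n), so the first call needs no special case. It uses memcpy rather than string functions, so embedded zero bytes in the received data are kept.

```c
#include <stdlib.h>
#include <string.h>

/* Grow *response to exactly *len + n + 1 bytes, append the n-byte chunk,
   and keep the result NUL-terminated. Returns 0 on success, -1 on
   allocation failure (the old buffer is left intact in that case). */
int append_chunk(char **response, size_t *len, const char *chunk, size_t n)
{
    char *tmp = realloc(*response, *len + n + 1);
    if (tmp == NULL)
        return -1;
    memcpy(tmp + *len, chunk, n);
    *len += n;
    tmp[*len] = '\0';
    *response = tmp;
    return 0;
}
```

In the question's loop this would be called as append_chunk(&responseArray, &len, buf, received) in place of the fwrite(), with responseArray starting out as NULL and len as 0.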
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
int main(void){
size_t size = 2; /* deliberately tiny, to force many realloc() calls */
size_t len = 0;
char* fullbuf = malloc(size);
if( fullbuf == NULL ){
perror("malloc");
return 1;
}
while(1) {
/* Read at most the free space, keeping one byte for the terminator. */
ssize_t b = read(STDIN_FILENO, fullbuf + len, size - len - 1);
if( b < 0) {
perror("Couldn't read from stdin");
exit(2);
} else if( b == 0 ){
break; /* end of input */
}
if( b + len + 1 >= size) {
// time to allocate more memory
size = size * 2;
char *tmp = realloc(fullbuf, size);
if( tmp == NULL ){
fprintf(stderr, "Couldn't allocate %zu bytes of memory\n", size);
free(fullbuf);
exit(1);
}
fullbuf = tmp;
}
len += b;
}
fullbuf[len] = '\0'; //terminating null byte
printf("%s", fullbuf);
free(fullbuf);
return 0;
}
For this demonstration I read from stdin instead of a socket, but it's the same idea. I read only as much data as there is free space for in fullbuf. When it's full (except for the byte reserved for the terminator), I double its size. Note that I set fullbuf to the output of realloc; it may be the same address, or it may not.
To prove it works, I started with the rather insane 2-byte buffer and doubled from there, so there are lots of realloc calls. I grabbed 32K of lorem ipsum to use as input.
$ du -h file.txt
32K file.txt
$ shasum -a 256 file.txt
346f2adbd1fdca6bf3b03fb0a4d2fd0030e3363e9a9c5d1e22747e1bcc316e37 file.txt
$ ./t < file.txt | shasum -a 256
346f2adbd1fdca6bf3b03fb0a4d2fd0030e3363e9a9c5d1e22747e1bcc316e37 -
Awesome: since the checksums are the same, the program output file.txt exactly.
On Windows, the most efficient way is to use VirtualAlloc, reserving a very large chunk and initially committing one page (4096 bytes). If BUFSIZ is larger, commit more pages to make it fit. Assign the address to responseArray. Read the first packet into it. For the next recv calls, keep committing extra pages and adjust the recv buffer address, making sure that at least BUFSIZ free bytes are available. You will have contiguous address space, no fragmentation, no hogging of extra memory, no re-copying, and no writing to disk.
I have a TCP server running on a platform which reads bytes from a buffer and sends all the bytes over the network in one send call:
send(sSocket.client_sock, base, frb_size, 0);
frb_size is 2_228_224 bytes in size.
On the receiving end I am trying to buffer the data:
while(1) {
while(total != size) {
r = recv(my_socket->sock, buf, 8192-total, 0);
memcpy(buffer+total, buf, r);
total += r;
}
//some code dealing with manipulating the buffer with SDL
}
Where I have a smaller buffer buf that has a size of 8192. When data is read, memcpy places it into the proper position inside buffer, which is 2_228_224 bytes in size.
My problem is that after the first iteration, all subsequent iterations return r as 0, meaning the socket has been closed, as per the documentation. What is also weird is that no matter what size of buf I use, it always returns the full number of bytes on the first iteration. For example, 8192 bytes would be returned in the above code; if I change the sizes to 65507 it will return 65507, but if I change it to 2_228_224 it will never return the full buffer.
Meanwhile when I do:
while(1) {
r = recv(my_socket->sock, buffer, size, 0);
//some code dealing with manipulating the buffer with SDL
}
Where size is the size of buffer (2_228_224). r never returns 0 when debugging, but it also never has the full number of bytes that make up the sent input.
Am I doing something wrong with the socket API on windows? Is there a way to make Winsock sockets block until all number of bytes are received?
Thanks.
As noted in the comments, using 8192-total isn't right. To read up to size bytes exactly, use something like this:
while(total < size) {
int bytes = size - total;
if (bytes > 8192) bytes = 8192;
r = recv(my_socket->sock, buf, bytes, 0);
if (r <= 0) {
/* handle this! */
break;
}
memcpy(buffer+total, buf, r);
total += r;
}
At each iteration, it attempts to read the number of bytes left out of the total but capped at the size of the input buffer.
It's important to handle r <= 0 and break out of the loop to avoid an infinite loop (if r == 0 repeatedly) or a segfault (if r == -1 repeatedly pushes total negative).
I'm still refining my C coding skills and keep running into issues with properly managing memory; go figure. Anyhow, I'm reading from a socket and I'm fine as long as my total response length from the socket is no bigger than my buffer size. I know this because when I increase the buffer size enough for the incoming data, it works just fine for the larger payloads. Creating a really large "just-in-case" buffer on the stack isn't feasible, obviously, so I want to grow the buffer dynamically on the heap. Here's what I'm doing currently:
raw_response = NULL;
// Receive from the Web server
retcode = recv(server_s, in_buf, BUF_SIZE, 0);
while ((retcode > 0) || (retcode == -1))
{
totalLength += retcode;
if (raw_response == NULL) {
raw_response = (char*)malloc(sizeof(char)*totalLength);
memcpy(raw_response, in_buf, totalLength);
} else {
raw_response = (char*)realloc(raw_response, sizeof(char)*totalLength);
memcpy(raw_response+previousLength, in_buf, retcode);
}
previousLength = retcode;
retcode = recv(server_s, in_buf, BUF_SIZE, 0);
if (retcode == 0 || retcode == -1) {
printf("\n\nNo more data, bailing. Data length was: %lu\n\n", totalLength);
}
}
If the raw_response is NULL, I know I have not received any data yet, so I use malloc. Otherwise, I use realloc so that I don't have to build up a new buffer. Instead I can just append the incoming data. So to get the end of the existing data after the first iteration, I take the address of raw_response and add the previous length to that and append the new data there assuming it's correctly appending on each subsequent call to recv().
The problem is that my final buffer is always corrupted unless I change BUF_SIZE to something larger than my total incoming data size.
Seem like it's probably just something simple I'm overlooking. Any thoughts?
The problem is these lines:
memcpy(raw_response+previousLength, in_buf, retcode);
previousLength = retcode;
Your function will work for the first and second iterations but after that will start corrupting data. I assume you meant to write previousLength += retcode;
There are some other problems with the code which aren't the answer to your question. Firstly, what happens if realloc or malloc fails? You don't check for this in your little sample. Also, you can always just use realloc (which acts like malloc if the pointer is NULL; see this SO question), i.e.:
char *tmp = realloc(raw_response, sizeof(*tmp) * totalLength);
if (tmp == NULL) {
free(raw_response); /* realloc failure leaves the old block allocated */
return -ENOMEM;
}
raw_response = tmp;
memcpy(raw_response + previousLength, in_buf, retcode);
Secondly, you might call memcpy when retcode is -1 (also changing totalLength by -1, which will again cause problems).
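Putting those fixes together, a corrected receive loop might look like the following sketch. receive_response is a hypothetical wrapper, BUF_SIZE = 512 is an assumed value (the question does not give one), and the separate previousLength variable is dropped: the append offset is simply the running total before the current chunk, which is what previousLength += retcode would have computed.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

#define BUF_SIZE 512   /* assumed; the question does not state its value */

/* Read from server_s until the peer closes the connection. On success,
   stores a malloc'd buffer (NULL if nothing arrived) and its length and
   returns 0; on recv() or allocation failure frees everything, returns -1. */
int receive_response(int server_s, char **out, size_t *out_len)
{
    char in_buf[BUF_SIZE];
    char *raw_response = NULL;
    size_t totalLength = 0;
    ssize_t retcode;

    while ((retcode = recv(server_s, in_buf, BUF_SIZE, 0)) > 0) {
        /* realloc(NULL, n) acts like malloc(n): no special first case. */
        char *tmp = realloc(raw_response, totalLength + (size_t)retcode);
        if (tmp == NULL) {
            free(raw_response);
            return -1;
        }
        raw_response = tmp;
        /* Append after everything received so far. */
        memcpy(raw_response + totalLength, in_buf, (size_t)retcode);
        totalLength += (size_t)retcode;   /* only positive counts are added */
    }
    if (retcode < 0) {                    /* recv() failed; errno says why */
        free(raw_response);
        return -1;
    }
    *out = raw_response;
    *out_len = totalLength;
    return 0;
}
```

Because the loop condition is > 0, a -1 from recv() never reaches memcpy or totalLength; it is handled once, after the loop.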
I am developing a client that needs to parse Chunked-type HTTP transfers. I've beat my head against the wall trying to figure out the error with the following, and would appreciate it if someone might be able to catch my error a bit quicker. To sum up the issue: it seems as though, the client does not receive ALL of the chunk, thereby screwing up the rest of the process. Thanks in advance!
while(cflag){
pfile_chunk = malloc(CHUNK_SIZE+1);
memset(pfile_chunk, 0, CHUNK_SIZE);
cPtr = pfile_chunk;
cPtr2 = NULL;
k=0;
while(*(cPtr-1) != '\n'){
k++;
recv(sock, cPtr, 1, 0);
cPtr = pfile_chunk+k;
}
cPtr2 = strchr(pfile_chunk, '\r');
*cPtr2 = '\0';
sscanf(pfile_chunk, "%x", &l);
if(l == 0)
break;
printf("\nServer wants to deliver %ld bytes.\n", l);
pfile_chunk = realloc(pfile_chunk, l+1);
memset(pfile_chunk, 0, l);
recv(sock, pfile_chunk, l, 0);
fputs(pfile_chunk, f);
printf("GOT THIS, SIZE %ld:\n%s\n", strlen(pfile_chunk), pfile_chunk);
//get next \r\n bytes.
recv(sock, NULL, 2, 0);
}
At the very least, you should check the return value of recv to see if you are getting the number of bytes you are expecting to get.
A short read is definitely possible on the network, since the system call will return whatever is available in the socket receive buffer at the time you make the call.
Implement a loop until you have read in your entire chunk, or pass the MSG_WAITALL flag to recv in the last parameter. However, you still need to check for an error from recv.
ssize_t r = recv(sock, pfile_chunk, l, MSG_WAITALL);
if (r < l) {
/* check for errors ... */
} else {
/* got the data */
}
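For the "implement a loop" alternative to MSG_WAITALL, here is a minimal sketch; recv_exact is a hypothetical name, and it assumes a blocking stream socket. It could read both the chunk payload and the trailing CRLF, the latter into a real two-byte buffer rather than NULL.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Read exactly len bytes from a stream socket by looping over recv().
   Returns len on success, a smaller count if the peer closed the
   connection early, or -1 on error. */
ssize_t recv_exact(int sock, void *buf, size_t len)
{
    char *p = buf;
    size_t got = 0;
    while (got < len) {
        ssize_t r = recv(sock, p + got, len - got, 0);
        if (r > 0) {
            got += (size_t)r;          /* short reads are normal: keep going */
        } else if (r == 0) {
            break;                     /* connection closed mid-chunk */
        } else if (errno != EINTR) {
            return -1;                 /* real error */
        }                              /* EINTR: just retry */
    }
    return (ssize_t)got;
}
```

The caller compares the result with the requested length, much like the r < l check above.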
It looks as though your very first dereference for the check in your while loop will access before the beginning of your array, which is likely not to be desired behavior. Hopefully, that memory location usually won't contain \n. That could mess up your read. I expect it probably contains some information to do with your malloc, which is unlikely to be \n, so you might never see a problem from that.
Also, hopefully you can trust the other end of the socket not to send more than CHUNK_SIZE+1 before they give you a \n. Otherwise, it could seg-fault out. Normally, though, I would expect a sender to just send 10 or fewer ASCII numeric characters and a CRLF for a chunk header anyways, but they could theoretically send a bunch of long chunk extension header fields with it.
Apart from that, there's just the more important issue already found by user315052 that you should either tell the recv method to wait for all the data you requested, or check how much data it actually read.
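On parsing the chunk header itself: sscanf's %x expects an unsigned int *, while l is printed with %ld in the question, and chunk extensions may follow the size. A lenient sketch using strtoul is shown below; parse_chunk_size is a hypothetical helper, and it simply stops at the first non-hex character, so anything after ';' is ignored.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Parse the hex chunk size at the start of a chunk header line such as
   "1a;name=value\r\n". Returns 0 and stores the size in *out on success,
   or -1 if the line does not start with a hex digit, the value overflows,
   or an unexpected character follows the digits. */
int parse_chunk_size(const char *line, size_t *out)
{
    /* strtoul would accept leading spaces and a sign; reject those here. */
    if (!isxdigit((unsigned char)line[0]))
        return -1;
    char *end;
    errno = 0;
    unsigned long v = strtoul(line, &end, 16);
    if (errno == ERANGE)
        return -1;
    /* Only a line ending, an extension, or whitespace may follow. */
    if (*end != '\r' && *end != '\n' && *end != ';' && *end != '\0'
        && *end != ' ' && *end != '\t')
        return -1;
    *out = (size_t)v;
    return 0;
}
```

A size of 0 marks the last chunk, matching the question's if(l == 0) break; check.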