Reading Data in C From Socket Until End Character - c

EDIT: It has been proven in the comments that defining the length instead should produce the same results and would not use any significant extra data. If you are looking for a way to send data between machines running your program(s), sending the length is better than reading until a terminating character. BonzaiThePenguin has some very good points you should look at.
But for educational purposes: I never found good example code that does this for standard C sockets that handles situations where the data is not all received in one packet, or multiple separate messages are contained within one packet. Simply calling recv repeatedly will not work in all cases.
This is one of those questions where I've answered it myself below, but I'm not 100% confident in my response.

It isn't 'dangerous to allow the client to specify the size of the message it is sending'. Most of the protocols in the world do that, including HTTP and SSL. It's only dangerous when implementations don't bounds-check messages properly.
The fatal flaw with your suggestion is that it doesn't work for binary data: you have to introduce an escape character so that the terminating character can appear within a message, and then of course you also need to escape the escape. All this adds processing and data copying at both ends.
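To make the bounds-checking point concrete, here is a minimal sketch of validating a length prefix before trusting it. The 4-byte big-endian header, the name decode_length and the MAX_MESSAGE_SIZE limit are assumptions chosen for illustration, not part of any protocol discussed above.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_MESSAGE_SIZE 65536u /* hypothetical limit; pick one that suits your protocol */

/* Decodes a 4-byte big-endian length header.
 * Returns 0 and stores the length in *out on success,
 * or -1 (leaving *out untouched) if the announced length is too large. */
int decode_length(const unsigned char header[4], uint32_t *out)
{
    uint32_t len = ((uint32_t)header[0] << 24) |
                   ((uint32_t)header[1] << 16) |
                   ((uint32_t)header[2] << 8)  |
                    (uint32_t)header[3];
    if (len > MAX_MESSAGE_SIZE)
        return -1; /* refuse to allocate or read an oversized message */
    *out = len;
    return 0;
}
```

The receiver would only allocate or read the message body after this check succeeds, so a hostile client cannot make it reserve an arbitrary amount of memory.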

Here is what I came up with. I cannot guarantee that this is perfect because I am not a professional, so if there are any mistakes, I (and anyone else looking for help) would greatly appreciate it if someone would point them out.
Context: socket is the socket, buffer is the array that stores all network input, line is the array that stores just one message extracted from buffer (which is what the rest of your program uses), length is the length of both inputted arrays, and recvLength is a pointer to an integer stored outside of the function that is meant to be 0 initially and should not be freed or modified by anything else. That is, it should persist across multiple calls to this function on the same socket. This function returns the length of the data outputted in the line array.
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

size_t recv_line(int socket, char* buffer, char* line, size_t length, size_t* recvLength){ //receives until '\4' (EOT character) or '\0' (null character)
    size_t readHead = 0;
    size_t lineIndex = 0;
    char currentChar = 0;
    while (1){
        for (; readHead < *recvLength; readHead = readHead + 1){
            currentChar = buffer[readHead];
            if (currentChar=='\4' || currentChar=='\0'){ //replace with the end character(s) of your choice
                if (DEBUG) printf("Received message===\n%.*s\n===of length %zu\n", (int)lineIndex, line, lineIndex+1); //line is not null-terminated, so print it with an explicit length
                *recvLength -= (readHead + 1); //without the +1, I had an "off by 1" error before!
                memmove(buffer, buffer + readHead + 1, *recvLength); //shift the leftover bytes down; the regions overlap, so memmove rather than memcpy
                return lineIndex+1; //success (the count includes the end character, so it is never 0)
            }
            line[lineIndex] = currentChar;
            lineIndex++;
        }
        if (*recvLength >= length){ //buffer is full and no end character was found
            if (DEBUG) printf("Client tried to overflow the input buffer. Disconnecting client.\n");
            *recvLength = 0;
            return 0;
        }
        ssize_t received = recv(socket, buffer + *recvLength, length - *recvLength, 0); //append after the data already buffered, never past the end
        if (received <= 0){ //error, or the peer closed the connection
            *recvLength = 0;
            return 0;
        }
        *recvLength += (size_t)received;
    }
}
Simple example usage:
int function_listening_to_network_input(int socket){
    char netBuffer[2048];
    char lineBuffer[2048];
    size_t recvLength = 0;
    while (1){
        size_t length = recv_line(socket, netBuffer, lineBuffer, sizeof netBuffer, &recvLength);
        if (length == 0) break; //0 signals failure, so stop reading from this socket
        // handle it…
    }
    return 0;
}
Note that this does not always leave line as a null-terminated string. If you want it to, it's easy to modify.
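As a rough sketch of that modification, the extraction step can be separated from the socket code and made to null-terminate its output. The function name extract_message, its parameters and its return codes are my own choices for illustration; it assumes the caller tracks how many valid bytes are in buffer.

```c
#include <stddef.h>
#include <string.h>

/* Copies the first message (up to, not including, delim) from buffer into
 * line and NUL-terminates it, then shifts the remaining bytes to the front
 * of buffer. *used is the number of valid bytes in buffer.
 * Returns the message length, -1 if no complete message is buffered yet,
 * or -2 if the message plus its terminator does not fit in line. */
long extract_message(char *buffer, size_t *used, char delim,
                     char *line, size_t line_cap)
{
    char *end = memchr(buffer, delim, *used);
    if (end == NULL)
        return -1;                      /* need more data */
    size_t msg_len = (size_t)(end - buffer);
    if (msg_len + 1 > line_cap)
        return -2;                      /* no room for message plus NUL */
    memcpy(line, buffer, msg_len);
    line[msg_len] = '\0';
    *used -= msg_len + 1;               /* drop the message and its delimiter */
    memmove(buffer, end + 1, *used);    /* regions may overlap, so memmove */
    return (long)msg_len;
}
```

A receive loop would call this after every recv, and keep calling it until it returns -1, since one recv may deliver several messages at once.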

Related

Segmentation fault strcat

I got a problem when reading SSL response that causes a segmentation fault. I read the response into a buffer, then append it to a malloced string and memory reset it to 0 till the response is fully read, but when I try this in a multi threaded program, after some operations it gives me segmentation fault. When I remove strcat it doesn't give me segmentation fault even if I run it for hours.
Example:
char* response = malloc(10000);
char buffer[10000] = { 0 };
while(SSL_read(ssl,buffer,sizeof(buffer)) > 0){
strcat(response,buffer);
memset(buffer,0,sizeof(buffer));
}
Errors
free(): invalid next size (normal)
malloc_consolidate(): invalid chunk
I made sure of freeing both of SSL and CTX and close socket and free the malloced string.
There are a few problems with your code:
You are not dealing with C strings, you are dealing with arbitrary byte sequences. SSL_read() reads bytes, not C strings, and you cannot treat them as strings. What you read cannot be assumed to be NUL-terminated (\0), so you should not use strcat, strlen or other similar functions that operate on strings. Zeroing out the entire buffers just to make sure there is a terminator makes little to no sense, as the terminator could very well be found in the middle of the data.
You are reading data continuously in a loop into a fixed size buffer. Your code will overflow the destination buffer (response) very easily.
Not an error, but there isn't really any need for an intermediate buffer to begin with. You are needlessly copying stuff around two times (one with SSL_read() and one with strcat) when you can read directly into response instead. On top of that, the memset() to clear the contents of buffer also adds a third scan of the data, slowing things down even more.
Again, not an error, but SSL_read() returns int and uses that to return the read size. You are not really using it, but you should, as you need to keep track of how much space is left in the buffer. You would, however, be much better off using size_t to avoid unwanted problems with signed math and possible overflows. You can use SSL_read_ex() for this purpose.
Here's a snippet of code that does what you want in a more robust way:
#define CHUNK_SIZE 10000
unsigned char *response = NULL;
size_t size = 0;
size_t space_left = 0;
size_t total_read = 0;
size_t n;
while (1) {
// Allocate more memory if needed.
if (space_left < CHUNK_SIZE) {
unsigned char *tmp = realloc(response, size + CHUNK_SIZE);
if (!tmp) {
// Handle realloc error
break;
}
response = tmp;
size += CHUNK_SIZE;
space_left += CHUNK_SIZE;
}
if (SSL_read_ex(ssl, response + total_read, space_left, &n)) {
total_read += n;
space_left -= n;
} else {
// Handle error
break;
}
}
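The same idea (track the length yourself instead of relying on a terminator) can be tried without OpenSSL. Below is a small sketch of a growable byte buffer; struct byte_buf and byte_buf_append are hypothetical names invented for this example, and overflow of the doubled capacity is not handled.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable byte buffer: the length is tracked explicitly,
 * so embedded zero bytes are preserved. */
struct byte_buf {
    unsigned char *data;
    size_t len;
    size_t cap;
};

/* Appends n bytes from src. Returns 0 on success, or -1 on allocation
 * failure, in which case the buffer is left unchanged. */
int byte_buf_append(struct byte_buf *b, const void *src, size_t n)
{
    if (n == 0)
        return 0;
    if (n > b->cap - b->len) {
        size_t new_cap = b->cap ? b->cap : 64;
        while (new_cap - b->len < n)
            new_cap *= 2;               /* sketch only: no overflow check */
        unsigned char *tmp = realloc(b->data, new_cap);
        if (tmp == NULL)
            return -1;
        b->data = tmp;
        b->cap = new_cap;
    }
    memcpy(b->data + b->len, src, n);
    b->len += n;
    return 0;
}
```

Appending the five bytes "ab\0cd" and then "ef" leaves len at 7, whereas strlen on the same data would report 2, which is exactly why strcat loses or corrupts binary responses.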
You never initialized the contents of response. The arguments to strcat() have to be null-terminated strings.
You should also subtract 1 from the size of the buffer when calling SSL_read(), to ensure there will always be room for its null terminator.
char* response = malloc(10000);
response[0] = '\0';
char buffer[10000] = { 0 };
while(SSL_read(ssl,buffer,sizeof(buffer)-1) > 0){
strcat(response,buffer);
memset(buffer,0,sizeof(buffer));
}

Two times send and recv are not working C

I was running into a problem I couldn't really solve, so I started over.
I had a problem with data encapsulation, or more specifically with the lack of it. So after I figured out that encapsulation is useful, I started rewriting the code.
Now I've run into a different problem. Somehow my send and recv calls are not working the way I want them to.
Here is the part where I send:
char to_send[] = "hello. I am the Data.";
// get size of data
int len = strlen(to_send);
char slen[len];
sprintf(slen,"%d",len);
printf("%s\n",slen);
// send size of data
if(send(comm_fd,slen,len,0)<0){perror("Error on send"); exit(1);}
// send data
if(send(comm_fd,to_send,len,0)<0){perror("Error on send"); exit(1);}
And here is the part where I recv:
// getting size of bytes to recv
char buf[1000];
bzero(buf,1000);
int rec = recv(comm_fd, buf, 100,0);
printf("rec\n: %i",rec);
printf("buf\n: %s\n", buf);
int buffsize;
buffsize = atoi(buf);
bzero(buf,1000);
printf("buffsize: %i\n",buffsize);
// recv the bytes
bzero(buf,1000);
rec = recv(comm_fd, buf, buffsize,0);
printf("rec\n: %i",rec);
printf("%s",buf);
So my problem now is: I can recv the size of the next Data and print it. But the Data itself is not showing up.
Can someone help me? I think I'm doing major things wrong (I'm new to C and to Network programming)
Thanks in advance
Two things with that first send call:
if(send(comm_fd,slen,len,0)<0){perror("Error on send"); exit(1);}
Here you send len number of bytes, but len is the length of to_send and not the length of slen. You will most likely send data from outside the initialized parts of slen, which leads to undefined behavior.
The second problem is that you send the length of to_send as a variable-length string, so the receiver doesn't actually know how much to receive. In your case you could actually (and probably do) receive the length and the string in a single recv call. At least if you're using TCP (streaming) sockets.
Both of these problems can be solved by making slen a fixed-size array, big enough to hold the largest numbers you can think of (ten digits is usually enough), and then send this fixed-length array using sizeof slen .
Perhaps something like this:
// Ten digits, plus string terminator
char slen[10 + 1];
// Prefix length with zeroes, and don't overflow the buffer
snprintf(slen, sizeof(slen), "%010zu", strlen(to_send));
// Send the whole array, including terminator
send(comm_fd, slen, sizeof slen, 0);
Then on the receiving side, you could do
// Ten digits, plus string terminator
char slen[10 + 1];
// Receive the whole string, including terminator
recv(comm_fd, slen, sizeof(slen), 0);
// Convert to a number
size_t len = strtoul(slen, NULL, 10);
// Now receive `len` bytes
Note that I have no error checking, which you should have.
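As one possible shape for that error checking: on a stream socket recv may return fewer bytes than requested, so the fixed-size header should be read in a loop. The sketch below assumes the 11-byte header format used above; recv_all, recv_length and LEN_FIELD are names made up for this example.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/socket.h>

#define LEN_FIELD (10 + 1) /* ten digits plus string terminator, as above */

/* Receives exactly len bytes.
 * Returns 0 on success, -1 on error or if the peer closes early. */
int recv_all(int fd, void *buf, size_t len)
{
    char *p = buf;
    while (len > 0) {
        ssize_t n = recv(fd, p, len, 0);
        if (n <= 0)
            return -1;          /* 0 means the peer closed the connection */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Reads the fixed-size length header and converts it to a number.
 * Returns 0 on success, -1 on a receive failure or malformed header. */
int recv_length(int fd, size_t *out)
{
    char slen[LEN_FIELD];
    if (recv_all(fd, slen, sizeof slen) != 0)
        return -1;
    if (slen[LEN_FIELD - 1] != '\0')
        return -1;              /* header is not terminated */
    char *end;
    unsigned long value = strtoul(slen, &end, 10);
    if (end != slen + LEN_FIELD - 1)
        return -1;              /* something other than ten digits */
    *out = (size_t)value;
    return 0;
}
```

After recv_length succeeds, the payload itself can be read with recv_all(fd, payload, len), once len has been checked against the size of the destination buffer.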

Scanning string with length restriction

Using the standard C library, is there a way to scan a string (containing no whitespace) from standard input only if it fits in a buffer? In the following example I would like scanCount to be 0 if the input string is larger than 32:
char str[32];
int scanCount;
scanCount = scanf("%32s", str);
Edit: I also need file pointer rollback when the input string is too large.
You specified a requirement to only read if the whole data fits your buffer. This requirement makes no sense at all as it doesn't provide any functionality to your program. You can easily achieve the same sort of tasks without it. It also is not how operating systems present files to the user applications.
You can simply create a buffer of any size you see fit and then you can keep the data in the buffer until you can handle it, or you can do magic like actually resizing the buffer to accommodate more incoming data.
You can read any number of characters from a file using the ANSI fread() function:
size_t count;
char buffer[50];
count = fread(buffer, 1, sizeof buffer, stdin);
You can then see how many characters have actually been read by looking at the count variable. You can fill in the final NUL character if it's less than the buffer size, or decide what to do next if the whole buffer has been filled and more data may be available. You could of course read sizeof buffer - 1 instead, to be able to always terminate the string. When the count is smaller than your specified value, feof() and ferror() can be used to see what happened. You can also look at the actual data and check for LF characters to see how many lines you have read.
When using an enlarging buffer, you will need malloc() or just create a NULL pointer that will later be allocated using realloc():
/* Set initial size and offset. */
size_t offset = 0;
size_t size = 0;
char *buffer = NULL;
When you need to change the size of the buffer, you can use realloc():
/* Change the size. */
size = 100;
buffer = realloc(buffer, size);
(The first time it's equivalent to buffer = malloc(size).)
You can then read data into the buffer:
size_t count = fread(buffer + offset, 1, size - offset, stdin);
offset += count; /* offset now marks the end of the valid data */
(The first time it's equivalent to fread(buffer, 1, size, stdin).)
When finished, you should free the buffer:
free(buffer);
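Putting the realloc() and fread() pieces together, a self-contained sketch might look like this. The function name read_all, the initial size of 100 and the doubling strategy are assumptions for the example; it takes a FILE * so it can be used with stdin or any other stream.

```c
#include <stdio.h>
#include <stdlib.h>

/* Reads everything from in into a malloc'd buffer that grows as needed.
 * Stores the byte count in *out_len and returns the buffer (caller frees),
 * or returns NULL on allocation failure or read error. */
char *read_all(FILE *in, size_t *out_len)
{
    size_t size = 0;
    size_t offset = 0;
    char *buffer = NULL;

    for (;;) {
        if (offset == size) {
            /* Buffer is full (or not allocated yet): enlarge it. */
            size_t new_size = size ? size * 2 : 100;
            char *tmp = realloc(buffer, new_size);
            if (tmp == NULL) {
                free(buffer);
                return NULL;
            }
            buffer = tmp;
            size = new_size;
        }
        size_t want = size - offset;
        size_t count = fread(buffer + offset, 1, want, in);
        offset += count;
        if (count < want)
            break;              /* short read: end of file or error */
    }
    if (ferror(in)) {
        free(buffer);
        return NULL;
    }
    *out_len = offset;
    return buffer;
}
```

Assigning the realloc() result to a temporary first means the old buffer is not leaked if the call fails.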
At any time, you still have all the already read data somewhere in a buffer, so you can get back to it at any time, you just decouple the reading and processing, where the above examples are all about reading.
The processing then depends on what you need. You generally need to identify the start and end of the data that you want to extract.
Example start and end, where end means one character after the last one you want, so the arithmetics work better:
size_t start = 0;
size_t end = 10;
Extract the data (using bits of C99):
char data[end - start + 1];
memcpy(data, buffer + start, end - start);
data[end - start] = '\0';
Now you have a NUL-terminated string containing the data you wanted to extract. Sometimes you just assume start = 0 and then want to consume the data from the buffer to make place for new data:
char data[end + 1];
/* copy out the data */
memcpy(data, buffer, end);
data[end] = '\0';
/* move data between end and offset to the beginning */
memmove(buffer, buffer + end, offset - end);
/* adjust the offset accordingly */
offset -= end;
Now you have your data extracted but you still have the buffer ready with the rest of the data you haven't processed yet. This effectively achieves what you wanted: by keeping the data in an intermediate buffer, you're peeking into an arbitrary part of the data received on input and taking out the data only if it fits your expectations, doing whatever else if it doesn't. Of course you should carefully test all return values to check for exceptional conditions and such stuff.
I personally would also turn all indexes in the examples into pointers directly to the memory and adjust the arithmetic accordingly, but not everyone enjoys pointer arithmetic as I do ;). I also tend to prefer the low-level POSIX API over the intermediate layer in the form of the ANSI API. Ready to fix bugs or improve explanations, please comment.
Your comment that you need the file pointer reset on scan failure makes this impossible to do with scanf().
scanf() is basically specified as "fscanf( stdin, ... )", and fscanf() is defined to "[push] back at most one input character onto the input stream" (C99, footnote 242). (I assume this is for the same reason that ungetc() is only required to support one byte of push-back: So that it can be conveniently buffered in memory.)
*scanf() is a poor choice to read uncertain inputs, for the reason described above and several other shortcomings when it comes to recovery-from-error. Generally speaking, if there is any chance that the input might not conform to the expected format, read input into an internal memory buffer first and then parse it from there.
Just read and store one character too many, and test for that.
char str[34]; // 33 characters + NUL terminator
int scanCount = scanf("%33s", str);
if (scanCount > 0 && strlen(str) > 32)
{
scanCount = 0;
}
A stream such as stdin is only guaranteed to allow up to 1 char to be "put back", so scanning 32 or 33 chars and then undoing that is not possible.
If your input supports ftell() and fseek() (available when stdin is redirected from a file), code could do this:
long pos = ftell(input);
char str[32+1];
int scanCount;
scanCount = fscanf(input, "%32s", str);
if (scanCount != 1 || strlen(str) >= 32) {
fseek(input, pos, SEEK_SET);
scanCount = fscanf(input, some_new_format, ....);
}
Otherwise use fgets() to read a maximal line and use sscanf()
char buf[1024];
if (fgets(buf, sizeof buf, stdin) == NULL) Handle_IOError_or_EOF();
char str[32+1];
int scanCount;
scanCount = sscanf(buf, "%32s", str);
if (scanCount != 1 || strlen(str) >= 32) {
scanCount = sscanf(buf, some_new_format, ....);
}
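Once the line is in memory, "rolling back" is free, because nothing is consumed until you decide to accept the token. A minimal sketch of the accept-only-if-it-fits check, without sscanf, could be the following; token_if_fits is a hypothetical helper name, and it treats the same characters as whitespace that %s does in the default locale.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Copies the first whitespace-delimited token of line into out only if it
 * fits, including the NUL terminator. Returns 1 if a token was copied, or
 * 0 if there is no token or it is too long; out is untouched on 0. */
int token_if_fits(const char *line, char *out, size_t out_cap)
{
    while (isspace((unsigned char)*line))
        line++;                         /* skip leading whitespace, like %s */
    size_t len = strcspn(line, " \t\n\v\f\r");
    if (len == 0 || len >= out_cap)
        return 0;
    memcpy(out, line, len);
    out[len] = '\0';
    return 1;
}
```

With char str[32 + 1], a token of up to 32 characters is accepted and anything longer is rejected as a whole, which gives the scanCount of 0 the question asked for.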

Issue reading from sscanf

All this is probably a real simple one but I am missing something and hope you can help. Ok this is my issue as simple as I can put it.
I am returning a buffer from readfile after using a USB device. This all works OK and I can output the buffer fine by using a loop like so:
for (long i=0; i<sizeof(buffer); i++) //for all chars in string
{
unsigned char c = buffer[i];
switch (Format)
{
case 2: //hex
printf("%02x",c);
break;
case 1: //asc
printf("%c",c);
break;
} //end of switch format
}
When I use the text (%c) version I can see the data in the buffer on my screen the way I expected it. However, my issue is when I come to read it using sscanf. I use strstr to search for some key in the buffer and use sscanf to retrieve its data. However, sscanf fails. What could be the problem?
Below is an example of the code I am using to scan the buffer and it works fine with this standalone version. Buffer section in the above code can't be read. Even though I can see it with printf.
#include <stdio.h>
#include <string.h>
#include <windows.h>
int main ()
{
// in my application this comes from the handle and readfile
char buffer[255]="CODE-12345.MEP-12453.PRD-222.CODE-12355" ;
//
int i;
int codes[256];
char *pos = buffer;
size_t current = 0;
//
while ((pos=strstr(pos, "PRD")) != NULL) {
if (sscanf(pos, "PRD - %d", codes+current))
++current;
pos += 4;
}
for (i=0; i<current; i++)
printf("%d\n", codes[i]);
system("pause");
return 0;
}
Thanks
The problem is that your ReadFile is giving you non-printable characters before the data you are interested in, specifically a '\0' at the beginning. Since strings in C are NUL-terminated, all standard string functions assume there is nothing in the buffer.
I don't know what it is exactly that you are reading, but perhaps you are reading a message that contains a header? In such a case you should skip the header first.
Blindly trying to solve the problem, you can skip the bad characters manually, assuming they are all in the beginning.
First of all, let's make sure the buffer is always NUL-terminated:
char buffer[1000 + 1]; // +1 in case it read all 1000 characters
ReadFile(h,buffer,0x224,&read,NULL);
buffer[read] = '\0';
Then, we know that there are read number of bytes filled by ReadFile. We first need to go back from that to find out where the good data start. Then, we need to go further back and find the first place where the data is not interesting. Note that, I am assuming in the end of the message, there are no printable characters. If there are, then this gets more complicated. In such a case, it is better if you write your own strstr that doesn't terminate on '\0', but reads up to a given length.
So instead of
char *pos = buffer;
We do
// strip away the bad part in the end
for (; read > 0; --read)
if (buffer[read - 1] >= ' ' && buffer[read - 1] <= 126)
break;
buffer[read] = '\0';
// find where the good data start
int good_position;
for (good_position = read; good_position > 0; --good_position)
if (buffer[good_position - 1] < ' ' || buffer[good_position - 1] > 126)
break;
char *pos = buffer + good_position;
The rest can remain the same.
Note: I am going from the back of the array, because assuming the beginning is a header, then it may contain data that might be interpreted as printable characters. On the other hand, in the end it may be all zeros or something.
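For the more complicated case mentioned above, where you would write your own strstr that reads up to a given length instead of stopping at '\0', a simple sketch is shown here. The name find_bytes is my own; it is a plain linear search, not an optimized one.

```c
#include <stddef.h>
#include <string.h>

/* Length-bounded substring search: like strstr, but it does not stop at
 * '\0' bytes. Returns a pointer to the first match, or NULL if none. */
const char *find_bytes(const char *hay, size_t hay_len,
                       const char *needle, size_t needle_len)
{
    if (needle_len == 0)
        return hay;
    if (needle_len > hay_len)
        return NULL;
    for (size_t i = 0; i + needle_len <= hay_len; i++) {
        if (memcmp(hay + i, needle, needle_len) == 0)
            return hay + i;
    }
    return NULL;
}
```

You would call it as find_bytes(buffer, read, "PRD", 3), using the byte count that ReadFile reported, and then hand the returned pointer to sscanf as before (the buffer must still be NUL-terminated after the data for sscanf to be safe).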

Cannot read binary video files in GNU/Linux

I'm stuck with an apparently harmless piece of code. I'm trying to read a whole flv video file into a uint8_t array, but for no apparent reason only the first 10 bytes are read.
contents = malloc(size + 1);
if (read(fd, contents, size) < 0)
{
free(contents);
log_message(WARNING, __func__, EMSG_READFILE);
return (NULL);
}
I've tried with fopen and "rb" also, but seems that Glibc ignores that extra 'b' or something. Any clues?
Thanks in advance.
Edit: Maybe it reads a EOF character?
PS. 'size' is a variable containing the actual file size using stat().
It seems the original code correctly reads the entire content.
The problem seems to be in making use of that binary data - printing it out will truncate at the first null, making it appear that only 10 bytes are present.
You can't use any methods intended for strings or character arrays to output binary data, as they will truncate at the first null byte, making it appear the array is shorter than it really is.
Check out some other questions related to viewing hex data:
how do I print an unsigned char as hex in c++ using ostream?
Converting binary data to printable hex
If you want to append this to a string - in what format? hex? base64? Raw bytes won't work.
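If hex is the format you pick, a small sketch of converting the bytes into a printable string follows. The function name to_hex is an assumption for this example, and the caller is responsible for providing an output buffer of at least 2*len + 1 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Writes 2*len lowercase hex digits plus a NUL terminator to out.
 * out must have room for at least 2*len + 1 bytes. */
void to_hex(const uint8_t *data, size_t len, char *out)
{
    static const char digits[] = "0123456789abcdef";
    for (size_t i = 0; i < len; i++) {
        out[2 * i]     = digits[data[i] >> 4];   /* high nibble */
        out[2 * i + 1] = digits[data[i] & 0x0F]; /* low nibble */
    }
    out[2 * len] = '\0';
}
```

Unlike printing with %s, zero bytes in the input simply show up as "00" in the result, so nothing is truncated.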
Here's the original code I posted. A few minor improvements, plus some better diagnostic code:
ssize_t ret, i; /* read() returns ssize_t */
size_t size = 4096; /* Probably needs to be much bigger */
uint8_t *contents;
contents = malloc(size + 1);
if(contents == NULL)
{
    log_message(WARNING, __func__, EMSG_MEMORY);
    return (NULL);
}
ret = read(fd, contents, size);
if(ret < 0)
{
    /* Error reading file */
    free(contents);
    log_message(WARNING, __func__, EMSG_READFILE);
    return (NULL);
}
for(i = 0; i < ret; ++i)
{
    printf("%c", contents[i]);
    /* Alternatively, print in hex: printf("%02X", (unsigned) contents[i]); */
}
Now, is ret really 10? Or do you just get 10 bytes when you try to print the output?
The 'read()' function in the C library doesn't necessarily return the whole read in one shot. In fact, if you're reading very much data at all, it usually doesn't give it to you in a single call.
The solution to this is to call read() in a loop, continuing to ask for more data until you've got it all, or until read returns an error, indicated by a negative return value, or end-of-file, indicated by a zero return value.
Something like the following (untested):
contents = malloc(size + 1);
bytesread = 0;
pos = 0;
while (pos < size && (bytesread = read(fd, contents + pos, size - pos)) > 0)
{
pos += bytesread;
}
if (bytesread < 0)
{
free(contents);
log_message(WARNING, __func__, EMSG_READFILE);
return (NULL);
}
/* Go on to use 'contents' now, since it's been filled. Should probably
check that pos == size to make sure the file was the size you expected. */
Note that most C programmers would do this a little differently, probably making 'pos' a pointer which gets moved along, rather than offsetting from 'contents' each time through the loop. But I thought this approach might be clearer.
On success, read() returns the number of bytes read (which may be less than what you asked for, at which point you should ask for the rest). On EOF it will return 0 and on error it will return -1. There are some errors for which you might want to consider re-issuing the read (e.g. EINTR, which happens when you get a signal during a read).
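A sketch of a read loop that handles both short reads and EINTR is below. The name read_full is hypothetical; whether to retry on other errors such as EAGAIN depends on how the descriptor was opened, and that is not covered here.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Reads up to size bytes, continuing after short reads and retrying when
 * interrupted by a signal. Returns the number of bytes read (less than
 * size only if end-of-file was reached), or -1 on error. */
ssize_t read_full(int fd, void *buf, size_t size)
{
    unsigned char *p = buf;
    size_t pos = 0;
    while (pos < size) {
        ssize_t n = read(fd, p + pos, size - pos);
        if (n == 0)
            break;              /* end of file */
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: try again */
            return -1;
        }
        pos += (size_t)n;
    }
    return (ssize_t)pos;
}
```

Comparing the return value with the size obtained from stat() tells you whether the whole file was actually read.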

Resources