I'm stuck on an apparently harmless piece of code. I'm trying to read a whole FLV video file into a uint8_t array, but for no apparent reason only the first 10 bytes are read.
contents = malloc(size + 1);
if (read(fd, contents, size) < 0)
{
free(contents);
log_message(WARNING, __func__, EMSG_READFILE);
return (NULL);
}
I've also tried fopen with "rb", but it seems that glibc ignores the extra 'b' or something. Any clues?
Thanks in advance.
Edit: Maybe it reads an EOF character?
PS. 'size' is a variable containing the actual file size using stat().
It seems the original code correctly reads the entire content.
The problem seems to be in making use of that binary data - printing it out will truncate at the first null, making it appear that only 10 bytes are present.
You can't use any methods intended for strings or character arrays to output binary data, as they will truncate at the first null byte, making it appear the array is shorter than it really is.
Check out some other questions related to viewing hex data:
how do I print an unsigned char as hex in c++ using ostream?
Converting binary data to printable hex
If you want to append this to a string - in what format? hex? base64? Raw bytes won't work.
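To check what the buffer really holds, print every byte as hex instead of passing the array to a string function. A minimal sketch, assuming a hypothetical helper named hex_dump (not part of the original code) that is given the byte count returned by read():

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Write `len` bytes of `data` to `out` as two-digit hex values,
 * 16 per line. Returns the number of bytes dumped.
 * hex_dump is a hypothetical helper name used for illustration. */
size_t hex_dump(FILE *out, const uint8_t *data, size_t len)
{
    size_t i;

    assert(out != NULL && (data != NULL || len == 0));
    for (i = 0; i < len; ++i) {
        /* End the line after 16 values or after the last byte. */
        fprintf(out, "%02X%c", (unsigned)data[i],
                (i % 16 == 15 || i + 1 == len) ? '\n' : ' ');
    }
    return len;
}
```

Calling hex_dump(stdout, contents, (size_t)ret) after a successful read shows every byte, including the zero bytes that would cut off a %s or strlen-based print.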
Here's the original code I posted. A few minor improvements, plus some better diagnostic code:
int i, ret, size = 4096; /* Probably needs to be much bigger */
uint8_t *contents;
contents = malloc(size + 1);
if(contents == NULL)
{
log_message(WARNING, __func__, EMSG_MEMORY);
return (NULL);
}
ret = read(fd, contents, size);
if(ret < 0)
{
/* Error reading file */
free(contents);
log_message(WARNING, __func__, EMSG_READFILE);
return (NULL);
}
for(i = 0;i < ret;++i)
{
printf("%c", contents[i]);
/* printf("%02X", (unsigned) contents[i]); */ /* Alternatively, print in hex */
}
Now, is ret really 10? Or do you just get 10 bytes when you try to print the output?
The read() function doesn't necessarily return all the data you asked for in one call. In fact, if you're reading a large amount of data, it often doesn't give it to you in a single call.
The solution to this is to call read() in a loop, continuing to ask for more data until you've got it all, or until read returns an error, indicated by a negative return value, or end-of-file, indicated by a zero return value.
Something like the following (untested):
contents = malloc(size + 1);
bytesread = 0;
pos = 0;
while (pos < size && (bytesread = read(fd, contents + pos, size - pos)) > 0)
{
pos += bytesread;
}
if (bytesread < 0)
{
free(contents);
log_message(WARNING, __func__, EMSG_READFILE);
return (NULL);
}
/* Go on to use 'contents' now, since it's been filled. Should probably
check that pos == size to make sure the file was the size you expected. */
Note that most C programmers would do this a little differently, probably making 'pos' a pointer which gets moved along, rather than offsetting from 'contents' each time through the loop. But I thought this approach might be clearer.
On success, read() returns the number of bytes read (which may be less than what you asked for, at which point you should ask for the rest). On EOF it returns 0, and on error it returns -1. There are some errors for which you might want to consider re-issuing the read (e.g. EINTR, which happens when a signal arrives during a read).
I recently started dabbling in C again, a language I'm not particularly proficient at and, in fact, keep forgetting (I mostly code in Python). My idea here is to read data from a hypothetically large file in chunks and then process the data accordingly. For now, I'm simulating this by actually loading the whole file into a buffer of type short with fread. This method will be changed, since it would be a very bad idea for, say, a file that's 1 GB, I'd think. The end goal is to read one chunk, process it, move the cursor, read another chunk and so on.
The file in question is 43 bytes and has the phrase "The quick brown fox jumps over the lazy dog". This size is convenient because it's a prime number, so no matter how many bytes I split it into, there will always be trailing garbage (due to the buffer having leftover space?). Data processing in this case is just printing out the shorts as two chars after byte manipulation (see code below)
#include <stdio.h>
#include <stdlib.h>
#define MAX_BUFF_SIZE 1024
long file_size(FILE *f)
{
if (fseek(f, 0, SEEK_END) != 0) exit(EXIT_FAILURE); // Move cursor to the end
long file_size = ftell(f); // Determine position to get file size
rewind(f);
return file_size;
}
int main(int argc, char* argv[])
{
short buff[MAX_BUFF_SIZE] = {0}; // Initialize to 0 remove trailing garbage
char* filename = argv[1];
FILE* fp = fopen(filename, "r");
if (fp)
{
size_t size = sizeof(buff[0]); // Size in bytes of each chunk. Fixed to 2 bytes
int nmemb = (file_size(fp) + size - 1) / size; // Number of chunks to read from file
// (ceil f_size/size)
printf("Should read at most %d chunks\n", nmemb);
short mask = 0xFF; // Mask to take first or second byte
size_t num_read = fread(buff, size, nmemb, fp);
printf("Read %zu chunks\n\n", num_read); // %zu is the format for size_t. Seems to have read more? Look into.
for (int i=0; i<nmemb; i++) {
char first_byte = buff[i] & mask;
char second_byte = (buff[i] >> 8) & mask; // Identity for 2 bytes. Keep mask for consistency
printf("Chunk %02d: 0x%04x | %c %c\n", // Remember little endian (bytes reversed)
i, buff[i], first_byte, second_byte);
}
fclose(fp);
} else
{
printf("File %s not found\n", filename);
return 1;
}
return 0;
}
Now yesterday, on printing out the last chunk of data I was getting "Chunk 21: 0xffff9567 | g". The last (first?) byte (0x67) is g, and I did expect some trailing garbage, but I don't understand why it was printing out so many bytes when the variable buff has shorts in it. At that point I was just printing the hex as %x, not %04x, and buff was not initialized to 0. Today, I decided to initialize it to 0 and not only did the garbage disappear, but I can't recreate the problem even after leaving buff uninitialized again.
So here are my questions that hopefully aren't too abstract:
Does fread look beyond the file when reading data and does it remove trailing garbage itself, or is it up to us?
Why was printf showing an int when the buffer is a short? (I assume %x is for ints) and why can't I replicate the behaviour even after leaving buff without initialization?
Should I always initialize the buffer to zero to remove trailing garbage? What's the usual approach here?
I hope these aren't too many, or too vague, questions, and that I was clear enough. Like I said, I don't know much about C but find low-mid level programming very interesting, especially when it comes to direct data bit/byte manipulation.
Hope you have a great day!
EDIT 1:
Some of you wisely suggested I use num_read instead of nmemb on the loop, since that's the return value of fread, but that means I'll discard the rest of the file (nmemb is 22 but num_read is 21). Is that the usual approach? Also, thank you for pointing out that %x was casting to unsigned int, hence the 4 bytes instead of 2.
EDIT 2:
For clarification, and since I misspoke in a comment, I'd like to keep the remaining byte (or data), while discarding the rest, which is undefined. I don't know if this is the usual approach, since if I use num_read in the loop, whatever is left over at the end is discarded, data or not. I'm more interested in knowing what the usual approach is: discard leftover data, or remove only what we know is undefined, in this case one of the bytes.
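One common way to keep the trailing byte is to call fread with an element size of 1, so its return value is an exact byte count and nothing past that count is ever looked at. A minimal sketch, assuming hypothetical names (CHUNK_SIZE, chunk_fn, for_each_chunk, print_pairs) that are not in the code above:

```c
#include <assert.h>
#include <stdio.h>

#define CHUNK_SIZE 16  /* hypothetical; kept small and even for illustration */

typedef void (*chunk_fn)(const unsigned char *chunk, size_t len, void *ctx);

/* Read `fp` in chunks of up to CHUNK_SIZE bytes and pass each chunk,
 * with its exact length, to `handle`. Returns the total number of
 * bytes processed, or -1 on a read error. */
long for_each_chunk(FILE *fp, chunk_fn handle, void *ctx)
{
    unsigned char buf[CHUNK_SIZE];
    long total = 0;
    size_t got;

    assert(fp != NULL && handle != NULL);
    while ((got = fread(buf, 1, sizeof buf, fp)) > 0) {
        handle(buf, got, ctx);   /* only buf[0..got-1] is valid */
        total += (long)got;
    }
    return ferror(fp) ? -1L : total;
}

/* Example handler: print bytes in pairs (low byte first, as on a
 * little-endian machine); a lone final byte is printed on its own. */
void print_pairs(const unsigned char *chunk, size_t len, void *ctx)
{
    size_t i;

    (void)ctx;
    for (i = 0; i < len; i += 2) {
        if (i + 1 < len)
            printf("0x%02x%02x | %c %c\n",
                   chunk[i + 1], chunk[i], chunk[i], chunk[i + 1]);
        else
            printf("0x%02x   | %c\n", chunk[i], chunk[i]);
    }
}
```

With the 43-byte file this processes chunks of 16, 16 and 11 bytes, and the final 'g' is handled as a single byte. Because the buffer is unsigned char, there is no sign extension when the values are printed with %x, and no need to zero the buffer first.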
I have a problem which may look as if I copied my homework, but it's not my homework. It was part of a test at university and I want to solve it (as well as others) at home so that I am prepared for the next test.
My goal here is that I understand so that I can solve similar questions on my own. I am familiar with high-level languages but C is one of my weaknesses, this is why I have problems here.
The Question
Given the following method:
int safe_read(int fd, char *buff, size_t len){
do {
errno = 0;
pos += read(fd, buff + pos, len);
if (-1 == len) {
if (ENTER != errno) {
return 0;
}
}
} while(ENTER == errno);
return pos;
}
and the following call:
pos = safe_read(STDIN_FILENO, msg, 225);
Analyse the code and answer the following questions:
1) Does safe-read return the number of bytes read correctly in all cases?
2) If not, how can this be fixed?
Well. For now, I already understood the following:
1)
No, it does not, for the following reasons:
-the caller may set fd to something invalid.
-pos is not properly initialized.
-The variable ENTER is also not initialized.
-if(-1==len) does not make sense, as len is a parameter set by the caller, so the condition is either always true or always false within one call.
-it is also not safe, as it is possible to go beyond the buffer's maximum size (by setting len to a value >= sizeof(buff))
-it does not return the number of characters read in all cases, e.g. when I read len characters the first time and the second read fails. It will then return zero even though len characters have already been read.
2) Here is my fix.
initialize variables
/*
For a better understanding, I write what I understood about what this method is supposed to do:
-reads characters into char* buff.
-returns the number of characters read as int
-fd is a file descriptor of the file to read
-len is the number of bytes to be read
*/
int safe_read(int fd, char *buff, size_t len){
int ENTER=0;
int pos=0;
do {
errno = 0;
pos += read(fd, buff + pos, len);
if (len < 0) {
if (ENTER != errno) {
return 0;
}
}
} while(ENTER == errno);
return pos;
}
Did I understand you correctly? Is my fix correct?
Thank you!
Special thanks to Paul Ogilvie for the help before my edit!
Your code contains many errors and other members wonder if this is homework. But I'll attempt to help you. First your code:
int safe_read(int fd, char *buff, size_t len)
{
int pos= 0;
do{
errno=0;
pos+=read(fd, buff+pos, len);
if(-1==len){
if(ENTER!=errno){
return 0;
}
}
}while(ENTER==errno);
return pos;
}
The variable pos was not defined, and even if it were defined as a global, you would probably have to initialize it to zero.
Then there is your odd variable ENTER, which is not defined and, more importantly, is never set in your code. So it won't change value. What is your intention with this variable?
Then if(-1==len). len is a parameter that doesn't change so either it was -1 or it never will be. Clearly you want to check for an error on read, but this is not the way.
Then whether this is safe: no, it isn't. Assuming that len is the size of buff, then you repeatedly append len characters to buff, so at the second read it will go beyond the buffer's size.
And lastly whether this function will always return the correct number of characters read: no it doesn't. Suppose you read len characters the first time, and the second time it fails. You then return 0 but len characters had already been read.
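Putting these points together, a corrected version could look like the sketch below. It assumes that ENTER was meant to be the errno constant EINTR, and that the function should keep reading until len bytes have arrived or end of file is reached. Neither assumption is confirmed by the question, and the ssize_t return type is a change from the original int:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Read up to `len` bytes from `fd` into `buff`, continuing after short
 * reads and retrying when a signal interrupts the call (EINTR).
 * Returns the number of bytes stored (less than `len` only at end of
 * file), or -1 on a real error with errno set by read(). */
ssize_t safe_read(int fd, char *buff, size_t len)
{
    size_t pos = 0;

    assert(buff != NULL || len == 0);
    while (pos < len) {
        /* Ask only for the space that is left, so buff never overflows. */
        ssize_t n = read(fd, buff + pos, len - pos);

        if (n > 0)
            pos += (size_t)n;
        else if (n == 0)
            break;               /* end of file */
        else if (errno != EINTR)
            return -1;           /* real error */
        /* n == -1 with errno == EINTR: loop and retry */
    }
    return (ssize_t)pos;
}
```

Returning -1 on error discards the count of bytes already read; if the caller needs those bytes, returning pos whenever pos > 0 is an alternative. Note also that on a terminal this loop blocks until len bytes or end of file, which may not be what the exam question intended.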
I'm trying to read characters from a file and count the frequency of a particular word in a file using system calls, but the behavior of one of my read() calls is confusing me. This is the code that I've written:
int counter, seekError,readVal;
counter = 0;
char c[1];
char *string = "word";
readVal = read(fd,c,1);
while (readVal != 0){ // While not the end of the file
if(c[0] == string[0]) { // Match the first character
seekError = lseek(fd,-1,SEEK_CUR); // After we find a matching character, rewind to capture the entire word
char buffer[strlen(string)+1];
buffer[strlen(string)] = '\0';
readVal = read(fd,buffer,strlen(string)); // This read() does not put anything into the buffer
if(strcmp(lowerCase(buffer),string) == 0)
counter++;
lseek(fd,-(strlen(string)-1),SEEK_CUR); // go back to the next character
}
readVal = read(fd,c,1);
}
In all the read calls that I use, I am able to read characters with no problem from my file. However, the readVal = read(fd,buffer,strlen(string)); line never puts anything into buffer, no matter how I try to read the characters. Is there anything going on behind the scenes that would explain this kind of behavior? I've tried running this code on different machines as well, but I still get nothing in buffer at that line.
It shouldn't be necessary to cast -1 into the off_t type. It looks like your real bug is that you didn't include <unistd.h> so lseek wasn't properly declared when you used it. Either that or there's a serious bug in your system's implementation of lseek.
The problem here was that the -1 in the seekError = lseek(fd,-1,SEEK_CUR); line was being interpreted as 4294967295. After casting it into the off_t type, the system interpreted the offset as -1 instead of the large number.
So the corrected line is: seekError = lseek(fd,(off_t)-1,SEEK_CUR);
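For comparison, here is a sketch of the same sliding-window idea with <unistd.h> included, so that lseek is properly declared. The function name count_word_in_file and the 64-byte word limit are assumptions made for illustration. The offset is converted to off_t before it is negated, because negating an unsigned size_t (as in -(strlen(string)-1)) gives a huge positive value rather than a negative one:

```c
#include <assert.h>
#include <ctype.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Count case-insensitive occurrences of `word` in the file at `path`.
 * Matches are substring matches and may overlap.
 * Returns the count, or -1 if the file cannot be opened or read. */
long count_word_in_file(const char *path, const char *word)
{
    char buf[64];                /* hypothetical limit on word length */
    size_t wlen, i;
    long count = 0;
    ssize_t n;
    int fd;

    assert(path != NULL && word != NULL);
    wlen = strlen(word);
    if (wlen == 0 || wlen > sizeof buf)
        return -1;
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    while ((n = read(fd, buf, wlen)) > 0) {
        if ((size_t)n == wlen) {
            for (i = 0; i < wlen; ++i)
                if (tolower((unsigned char)buf[i]) !=
                    tolower((unsigned char)word[i]))
                    break;
            if (i == wlen)
                ++count;
        }
        /* Step back so the next window starts one byte after this one. */
        if (n > 1 && lseek(fd, -(off_t)(n - 1), SEEK_CUR) == (off_t)-1) {
            n = -1;
            break;
        }
    }
    close(fd);
    return n < 0 ? -1 : count;
}
```

This issues one read and one lseek per byte of the file, which is slow; it is meant only to show the seek arithmetic, not as an efficient search.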
I'm trying to use the getdelim function to read an entire text file's contents into a string.
Here is the code I am using:
ssize_t bytesRead = getdelim(&buffer, 0, '\0', fp);
This is failing however, with strerror(errno) saying "Error: Invalid Argument"
I've looked at all the documentation I could and just can't get it working, I've tried getline which does work but I'd like to get this function working preferably.
buffer is NULL initialised as well so it doesn't seem to be that
fp is also not reporting any errors and the file opens perfectly
EDIT: My implementation is based on an answer from this stackoverflow question Easiest way to get file's contents in C
Kervate, please enable compiler warnings (-Wall for gcc), and heed them. They are helpful; why not accept all the help you can get?
As pointed out by WhozCraig and n.m. in comments to your original question, the getdelim() man page shows the correct usage.
If you wanted to read records delimited by the NUL character, you could use
FILE *input; /* Or, say, stdin */
char *buffer = NULL;
size_t size = 0;
ssize_t length;
while (1) {
length = getdelim(&buffer, &size, '\0', input);
if (length == (ssize_t)-1)
break;
/* buffer has length chars, including the trailing '\0' */
}
free(buffer);
buffer = NULL;
size = 0;
if (ferror(input) || !feof(input)) {
/* Error reading input, or some other reason
* that caused an early break out of the loop. */
}
If you want to read the contents of a file into a single character array, then getdelim() is the wrong function.
Instead, use realloc() to dynamically allocate and grow the buffer, appending to it using fread(). To get you started -- this is not complete! -- consider the following code:
FILE *input; /* Handle to the file to read, assumed already open */
char *buffer = NULL;
size_t size = 0;
size_t used = 0;
size_t more;
while (1) {
/* Grow buffer when less than 500 bytes of space. */
if (used + 500 >= size) {
size_t new_size = used + 30000; /* Allocate 30000 bytes more. */
char *new_buffer;
new_buffer = realloc(buffer, new_size);
if (!new_buffer) {
free(buffer); /* Old buffer still exists; release it. */
buffer = NULL;
size = 0;
used = 0;
fprintf(stderr, "Not enough memory to read file.\n");
exit(EXIT_FAILURE);
}
buffer = new_buffer;
size = new_size;
}
/* Try reading more data, as much as fits in buffer. */
more = fread(buffer + used, 1, size - used, input);
if (more == 0)
break; /* Could be end of file, could be error */
used += more;
}
Note that the buffer in this latter snippet is not a string. There is no terminating NUL character, so it's just an array of chars. In fact, if the file contains binary data, the array may contain lots of NULs (\0, zero bytes). Assuming there was no error and all of the file was read (you need to check for that, see the former example), buffer contains used chars read from the file, with enough space allocated for size. If used > 0, then size > used. If used == 0, then size may or may not be zero.
If you want to turn buffer into a string, you need to decide what to do with the possibly embedded \0 bytes -- I recommend either converting them to, e.g., spaces or tabs, or moving the data to skip them altogether -- and then add the string-terminating \0 at the end to make it a valid string.
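As a sketch of the first option, the hypothetical helper below (bytes_to_string is not a standard function) copies the data, replaces each embedded \0 with a space, and appends the terminator:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copy `used` bytes from `data` into a newly allocated, NUL-terminated
 * string, replacing each embedded '\0' with a space.
 * The caller frees the result. Returns NULL if allocation fails. */
char *bytes_to_string(const char *data, size_t used)
{
    char *text;
    size_t i;

    assert(data != NULL || used == 0);
    text = malloc(used + 1);         /* +1 for the terminator */
    if (text == NULL)
        return NULL;
    if (used > 0)
        memcpy(text, data, used);
    for (i = 0; i < used; ++i)
        if (text[i] == '\0')
            text[i] = ' ';
    text[used] = '\0';
    return text;
}
```

With the variables from the snippet above, the call would be bytes_to_string(buffer, used). Copying keeps the original bytes intact; converting in place is also possible if the buffer has at least one spare byte for the terminator.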
EDIT: It has been proven in the comments that defining the length instead should produce the same results and would not use any significant extra data. If you are looking for a way to send data between machines running your program(s), sending the length is better than reading until a terminating character. BonzaiThePenguin has some very good points you should look at.
But for educational purposes: I never found good example code that does this for standard C sockets that handles situations where the data is not all received in one packet, or multiple separate messages are contained within one packet. Simply calling recv repeatedly will not work in all cases.
This is one of those questions where I've answered it myself below, but I'm not 100% confident in my response.
It isn't 'dangerous to allow the client to specify the size of the message it is sending'. Most of the protocols in the world do that, including HTTP and SSL. It's only dangerous when implementations don't bounds-check messages properly.
The fatal flaw with your suggestion is that it doesn't work for binary data: you have to introduce an escape character so that the terminating character can appear within a message, and then of course you also need to escape the escape. All this adds processing and data copying at both ends.
Here is what I came up with. I cannot guarantee that this is perfect because I am not a professional, so if there are any mistakes, I (and anyone else looking for help) would greatly appreciate it if someone would point them out.
Context: socket is the socket, buffer is the array that stores all network input, line is the array that stores just one message extracted from buffer (which is what the rest of your program uses), length is the length of both inputted arrays, and recvLength is a pointer to an integer stored outside of the function that is meant to be 0 initially and should not be freed or modified by anything else. That is, it should persist across multiple calls to this function on the same socket. This function returns the length of the data outputted in the line array.
size_t recv_line(int socket, char* buffer, char* line, size_t length, size_t* recvLength){ //receives until '\4' (EOT character) or '\0' (null character)
size_t readHead = 0;
size_t lineIndex = 0;
char currentChar = 0;
while (1){
for (; readHead < *recvLength; readHead = readHead + 1){
currentChar = buffer[readHead];
if (currentChar=='\4' || currentChar=='\0'){ //replace with the end character(s) of your choice
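if (DEBUG) printf("Received message===\n%.*s\n===of length %zu\n", (int)lineIndex, line, lineIndex+1); //line is not null-terminated, so limit %s to lineIndex bytes
memmove(buffer, buffer + readHead + 1, *recvLength - (readHead + 1)); //shift the remaining bytes down; memmove because the regions overlap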
*recvLength -= (readHead + 1); //without the +1, I had an "off by 1" error before!
return lineIndex+1; //success
}
if (readHead >= length){
if (DEBUG) printf("Client tried to overflow the input buffer. Disconnecting client.\n");
*recvLength = 0;
return 0;
}
line[lineIndex] = currentChar;
lineIndex++;
}
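ssize_t received = recv(socket, buffer + readHead, length - readHead, 0); //never read past the end of buffer
if (received <= 0){ //error or connection closed
*recvLength = 0;
return 0;
}
*recvLength = readHead + (size_t)received; //bytes already scanned plus the new ones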
}
printf("Unknown error in recv_line!\n");
return 0;
}
Simple example usage:
int function_listening_to_network_input(int socket){
char netBuffer[2048];
char lineBuffer[2048];
size_t recvLength = 0;
while (1==1){
size_t length = recv_line(socket, netBuffer, lineBuffer, 2048, &recvLength);
// handle it…
}
return 0;
}
Note that this does not always leave line as a null-terminated string. If you want it to, it's easy to modify.