C using fread to read an unknown amount of data - c

I have a text file called test.txt
Inside it will be a single number, it may be any of the following:
1
2391
32131231
3123121412
I.e. it could be any size of number, from 1 digit up to x digits.
The file will only have 1 thing in it - this number.
I want a bit of code using fread() which will read that number of bytes from the file and put it into an appropriately sized variable.
This is to run on an embedded device; I am concerned about memory usage.
How to solve this problem?

You can simply use:
char buffer[4096];
size_t nbytes = fread(buffer, sizeof(char), sizeof(buffer), fp);
if (nbytes == 0)
...EOF or other error...
else
...process nbytes of data...
Or, in other words, provide yourself with a data space big enough for any valid data and then record how much data was actually read into the string. Note that the string will not be null terminated unless either buffer contained all zeroes before the fread() or the file contained a zero byte. You cannot rely on a local variable being zeroed before use.
It is not clear how you want to create the 'appropriately sized variable'. You might end up using dynamic memory allocation (malloc()) to provide the correct amount of space, and then return that allocated pointer from the function. Remember to check for a null return (out of memory) before using it.

If you want to avoid over-reading, fread is not the right function. You probably want fscanf with a conversion specifier along the lines of %100[0123456789]...

One way to achieve this is to use fseek to move your file stream location to the end of the file:
fseek(file, SEEK_END, SEEK_SET);
and then using ftell to get the position of the cursor in the file — this returns the position in bytes so you can then use this value to allocate a suitably large buffer and then read the file into that buffer.
I have seen warnings saying this may not always be 100% accurate but I've used it in several instances without a problem — I think the issues could be dependant on specific implementations of the functions on certain platforms.

Depending on how clever you need to be with the number conversion... If you do not need to be especially clever and fast, you can read it a character at a time with getc(). So,
- start with a variable initialized to 0.
- Read a character, multiply variable by 10 and add new digit.
- Then repeat until done.
Get a bigger sized variable as needed along the way or start with your largest sized variable and then copy it into the smallest size that fits after you finish.

Related

Using fread() to read a text based file - best practices

Consider this code to read a text based file. This sort of fread() usage was briefly touched upon in the excellent book C Programming: A Modern Approach by K.N. King.
There are other methods of reading text based files, but here I am concerned with fread() only.
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
// Declare file stream pointer.
FILE *fp = fopen("Note.txt", "r");
// fopen() call successful.
if(fp != NULL)
{
// Navigate through to end of the file.
fseek(fp, 0, SEEK_END);
// Calculate the total bytes navigated.
long filesize = ftell(fp);
// Navigate to the beginning of the file so
// it can be read.
rewind(fp);
// Declare array of char with appropriate size.
char content[filesize + 1];
// Set last char of array to contain NULL char.
content[filesize] = '\0';
// Read the file content.
fread(content, filesize, 1, fp);
// Close file stream pointer.
fclose(fp);
// Print file content.
printf("%s\n", content);
}
// fopen() call unsuccessful.
else
{
printf("File could not be read.\n");
}
return 0;
}
There are some problems I have with this method. My opinion is that this is not a safe method of performing fread() since there might be an overflow if we try to read an extremely large string. Is this opinion valid?
To circumvent this issue, we may use a buffer size and keep on reading into a char array of that size. If filesize is less than buffer size, then we simply perform fread() once as described in the above code. Otherwise, We divide the total file size by the buffer size and get a result, whose int portion we will use as the total number of times to iterate a loop where we will invoke fread() each time, appending the read buffer array into a larger string. Now, for the final fread(), which we will perform after the loop, we will have to read exactly (filesize % buffersize) bytes of data into an array of that size and finally append this array into the larger string (Which we would have malloc-ed with filesize + 1 beforehand). I find that if we perform fread() for the last chunk of data using buffersize as its second parameter, then extra garbage data of size (buffersize - chunksize) will be read in and the data might become corrupted. Are my assumptions here correct? Please explain if/ how I have overlooked something.
Also, there is the issue that non-ASCII characters might not have size of 1 byte. In that case I would assume the proper amount is being read, but each byte is being read as a single char, so the text is distorted somehow? How is fread() handling reading of multi-byte chars?
this is not a safe method of performing fread() since there might be an overflow if we try to read an extremely large string. Is this opinion valid?
fread() does not care about strings (null character terminated arrays). It reads data as if it was in multiples of unsigned char*1 with no special concern to the data content if the stream opened in binary mode and perhaps some data processing (e.g. end-of-line, byte-order-mark) in text mode.
Are my assumptions here correct?
Failed assumptions:
Assuming ftell() return value equals the sum of fread() bytes.
The assumption can be false in text mode (as OP opened the file) and fseek() to the end is technical undefined behavior in binary mode.
Assuming not checking the return value of fread() is OK. Use the return value of fread() to know if an error occurred, end-of-file and how many multiples of bytes were read.
Assuming error checking is not required. , ftell(), fread(), fseek() instead of rewind() all deserve error checks. In particular, ftell() readily fails on streams that have no certain end.
Assuming no null characters are read. A text file is not certainly made into one string by reading all and appending a null character. Robust code detects and/or copes with embedded null characters.
Multi-byte: assuming input meets the encoding requirements. Example: robust code detects (and rejects) invalid UTF8 sequences - perhaps after reading the entire file.
Extreme: Assuming a file length <= LONG_MAX, the max value returned from ftell(). Files may be larger.
but each byte is being read as a single char, so the text is distorted somehow? How is fread() handling reading of multi-byte chars?
fread() does not function on multi-byte boundaries, only multiples of unsigned char. A given fread() may end with a portion of a multi-byte and the next fread() will continue from mid-multi-byte.
Instead of of 2 pass approach consider 1 single pass
// Pseudo code
total_read = 0
Allocate buffer, say 4096
forever
if buffer full
double buffer_size (`realloc()`)
u = unused portion of buffer
fread u bytes into unused portion of buffer
total_read += number_just_read
if (number_just_read < u)
quit loop
Resize buffer total_read (+ 1 if appending a '\0')
Alternatively consider the need to read the entire file in before processing the data. I do not know the higher level goal, but often processing data as it arrives makes for less resource impact and faster throughput.
Advanced
Text files may be simple ASCII only, 8-bit code page defined, one of various UTF encodings (byte-order-mark, etc. The last line may or may not end with a '\n'. Robust text processing beyond simple ASCII is non-trivial.
ASCII and UTF-8 are the most common. IMO, handle 1 or both of those and error out on anything that does not meet their requirements.
*1 fread() reads in multiple of bytes as per the 3rd argument, which is 1 in OP's case.
// v --- multiple of 1 byte
fread(content, filesize, 1, fp);

Will assigning a large value for length of char string be an issue?

I am reading a line from a file and I do not know the length it is going to be. I know there are ways to do this with pointers but I am specifically asking for just a plan char string. For Example if I initialize the string like this:
char string[300]; //(or bigger)
Will having large string values like this be a problem?
Any hard coded number is potentially too small to read the contents of a file. It's best to compute the size at run time, allocate memory for the contents, and then read the contents.
See Read file contents with unknown size.
char string[300]; //(or bigger)
I am not sure which of the two issues you are concerned with, so I will try to address both below:
if the string in the file is larger than 300 bytes and you try to "stick" that string in that buffer, without accounting the max length of your array -you will get undefined behaviour because of overwriting the array.
If you are just asking if 300 bytes is too much too allocate - then no, it is not a big deal unless you are on some very restricted device. e.g. In Visual Studio the default stack size (where that array would be stored) is 1 MB if I am not wrong. Benefits of doing so is understandable, e.g. you don't need to concern yourself with freeing it etc.
PS. So if you are sure the buffer size you specify is enough - this can be fine approach as you free yourself from memory management related issues - which you get from pointers and dynamic memory.
Will having large string values like this be a problem?
Absolutely.
If your application must read the entire line from a file before processing it, then you have two options.
1) Allocate buffer large enough to hold the line of maximum allowed length. For example, the SMTP protocol does not allow lines longer than 998 characters. In that case you can allocate a static buffer of length 1001 (998 + \r + \n + \0). Once you have read a line from a file (or from a client, in the example context) which is longer than the maximum length (that is, you have read 1000 characters and the last one is not \n), you can treat it as a fatal (protocol) error and report it.
2) If there are no limitations on the length of the input line, the only thing you can do to ensure your program robustness is allocating buffers dynamically as the input is read. This may involve storing multiple malloc-ed buffers in a linked list, or calling realloc when buffer exhaustion detected (this is how getline function works, although it is not specified in the C standard, only in POSIX.1-2008).
In either case, never use gets to read the line. Call fgets instead.
It all depends on how you read the line. For example:
char string[300];
FILE* fp = fopen(filename, "r");
//Error checking omitted
fgets(string, 300, fp);
Taken from tutorialspoint.com
The C library function char *fgets(char *str, int n, FILE *stream) reads a line from the specified stream and stores it into the string pointed to by str. It stops when either (n-1) characters are read, the newline character is read, or the end-of-file is reached, whichever comes first.
That means that this will read 299 characters from the file at most. This will cause only a logical error (because you might not get all the data you need) that won't cause any undefined behavior.
But, if you do:
char string[300];
int i = 0;
FILE* fp = fopen(filename, "r");
do{
string[i] = fgetc(fp);
i++;
while(string[i] != '\n');
This will cause Segmantation Fault because it will try to write on unallocated memory on lines bigger than 300 characters.

How to save a specific length string from a file and work with it in C

So what I'm trying to do is open a file and read it until the end in blocks that are 256 bytes long each time it is called. My dilemma is using fgets() or fread() to do it.
I was using fgets() initially, because it returns a string of the bytes that were read, which is great because I can store that data and work with it. However, in my particular file that I'm reading, the 256 bytes often happen over a more than 2 lines, which is a problem because fgets() stops reading when it hits a newline character or the end of the file.
I then thought of using fread(), but I don't know how to save the line that I'm referring to with it because fread() returns an int referring to the number of elements successfully read (according to its documentation).
I've searched and thought of solutions for a while now and can't find anything that works with my particular scenario. I would like some guidance on how to go about this issue, how would you go about this in my position?
You can use fread() to read each 256 bytes block and keep a lineCount variable to keep track of the number of new line characters you have encountered so far in the input. Since you have to process the blocks already this wouldn't mean much of an overhead in the processing.
To read a block of 256 chars, which is what I think you are doing, you just need to create a buffer of chars that can hold 256 of them, in other words a char array of size 256.
#define BLOCK_SIZE 256
char block[BLOCK_SIZE];
Then if you check the documentation for fread() it shows the following signature:
Following is the declaration for fread() function.
size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream)
Parameters
ptr -- This is the pointer to a block of memory with a minimum size of size*nmemb bytes.
size -- This is the size in bytes of each element to be read.
nmemb -- This is the number of elements, each one with a size of size bytes.
stream -- This is the pointer to a FILE object that an input stream.
So this means it takes a pointer to the buffer where it will write the read information, the size of each element it's supposed to read, the maximum amount of elements you want it to read and the file pointer. In your case it would be:
int read = fread(block, sizeof(char), BLOCK_SIZE, file);
This will copy the information from the file to the block array, which you can later process and keep track of the lines. The characters that were read by fread are in the block array, so the first char in the last read block would be block[0], the second block[1] and so on. The returned value in read indicates how many elements (in your case chars) were inserted in the array block when you call fread, this number will be equal to BLOCK_SIZE for every call, unless you reach the end of the file or there's an error.
I suggest you read some documentation for a full example, play a little with the code and do some reading on pointers in C to gain a better understanding of how everything works in general. If you still have questions after that, we can take it from there or you can create a new SO question.

Using fread function: size to be read is greater than available for reading

I have a question:
I am using fread to read a file.
typedef struct {
int ID1;
int ID2;
char string[256];
} Reg;
Reg *A = (Reg*) malloc(sizeof(Reg)*size);
size = FILESIZE/sizeof(Reg);
fread (A, sizeof(Reg), size, FILEREAD);
Using a loop, consecutively call this call, to make me read my entire file.
What will happen when I get near the end of the file, and I can not read "size" * sizeof (Reg), or if you are only available to read half this amount, what will happen with my array A. It will be completed? The function will return an error?
Knowing how the file was read by the fread through?
Edi1: Exactly, if the division is not exact, when I read the last bit smaller file size that I'll read things that are not on file, I'm wondering with my vector resize to the amount of bytes that I can read, or develop a dynamic better.
fread returns the number of records it read. Anything beyond that in your buffer may be mangled, do not rely on that data.
fread returns the number of full items actually read, which may be
less than count if an error occurs or if the end of the file is
encountered before reaching count.
The function will not read past the end of the file : the most likely occurrence is that you will get a bunch of full buffers and then a (final) partial buffer read, unless the file size is an exact multiple of your buffer length.
Your logic needs to accommodate this - the file size gives you the expected total number of records so it should not be hard to ignore trailing data in the buffer (after the final fread call) that corresponds to uninitialized records. A 'records remaining to read' counter would be one approach.
fread() returns the number of elements it could read. So you have to check the return value of fread() to find out how many elements in your array are valid.
It will return a short item count or zero if either an error occurred or EOF has is reached. You will have to use feof() ond ferror() in this case to check what condition is met.

fgetc(): Reading and storing a string of unknown length

What I need to do for an assignment is:
open a file (using fopen())
read the name of a student (using fgetc())
store that name in some part of a struct
The problem I have is that I need to read an arbitrary long string into name, and I don't know how to store that string without wasting memory (or writing into non-allocated memory).
EDIT
My first idea was to allocate a 1 byte (char) memory block, then call realloc() if more bytes are needed but this doesn't seem very efficient. Or maybe I could double the array if it is full and then at the end copy the chars into a new block of memory of the exact size.
Don't worry about wasting 100 or 1000 bytes which is likely to be long enough for all names.
I'd probably just put the buffer that you're reading into on the stack.
Do worry about writing over the end of the buffer. i.e. buffer overrun. Program to prevent that!
When you come to store the name into your structure you can malloc a buffer to store the name the exact length you need (don't forget to add an extra byte for the null terminator).
But if you really must store names of any length at all then you could do it with realloc.
i.e. Allocate a buffer with malloc of some size say 50 bytes.
Then when you need more space, use realloc to increase it's length. Increase the length in blocks of say 50 bytes and keep track with an int on how big it is so that you know when you need to grow it again. At some point, you will have to decide how long that buffer is going to be, because it can't grow indefinitely.
You could read the string character by character until you find the end, then rewind to the beginning, allocate a buffer of the right size, and re-read it into that, but unless you are on a tiny embedded system this is probably silly. For one thing, the fgetc, fread, etc functions create buffers in the O/S anyway.
You could allocate a temporary buffer that's large enough, use a length limited read (for safety) into that, and then allocate a buffer of the precise size to copy it into. You probably want to allocate the temporary buffer on the stack rather than via malloc, unless you think it might exceed your available stack space.
If you are writing single threaded code for a tiny system you can allocate a scratch buffer on startup or statically, and re-use it for many purposes - but be really carefully your usage can't overlap!
Given the implementation complexity of most systems, unless you really research how things work it's entirely possible to write memory optimized code that actually takes more memory than doing things the easy way. Variable initializations can be another surprisingly wasteful one.
My suggestion would be to allocate a buffer of sufficient size:
char name_buffer [ 80 ];
Generally, most names (at least common English names) will be less than 80 characters in size. If you feel that you may need more space than that, by all means allocate more.
Keep a counter variable to know how many characters you have already read into your buffer:
int chars_read = 0; /* most compilers will init to 0 for you, but always good to be explicit */
At this point, read character by character with fgetc() until you either hit the end of file marker or read 80 characters (79 really, since you need room for the null terminator). Store each character you've read into your buffer, incrementing your counter variable.
while ( ( chars_read < 80 ) && ( !feof( stdin ) ) ) {
name_buffer [ chars_read ] = fgetc ( stdin );
chars_read++;
}
if ( chars_read < 80 )
name_buffer [ chars_read ] = '\0'; /* terminating null character */
I am assuming here that you are reading from stdin. A more complete example would also check for errors, verify that the character you read from the stream is valid for a person's name (no numbers, for example), etc. If you try to read more data than for which you allocated space, print an error message to the console.
I understand wanting to maintain as small a buffer as possible and only allocate what you need, but part of learning how to program is understanding the trade-offs in code/data size, efficiency, and code readability. You can malloc and realloc, but it makes the code much more complex than necessary, and it introduces places where errors may come in - NULL pointers, array index out-of-bounds errors, etc. For most practical cases, allocate what should suffice for your data requirements plus a small amount of breathing room. If you find that you are encountering a lot of cases where the data exceeds the size of your buffer, adjust your buffer to accommodate it - that is what debugging and test cases are for.

Resources