How to read text from a file into a dynamic array of characters?
I found a way to count the number of characters in a file and create a dynamic array, but I can't figure out how to assign characters to the elements of the array?
FILE *text;
char* Str;
int count = 0;
char c;
text = fopen("text.txt", "r");
while(c = (fgetc(text))!= EOF)
{
count ++;
}
Str = (char*)malloc(count * sizeof(char));
fclose(text);
There is no portable, standard-conforming way in C to know in advance how many bytes may be read from a FILE stream.
First, the stream might not even be seekable - it can be a pipe or a terminal or even a socket connection. On such streams, once you read the input it's gone, never to be read again. You can push back one char value, but that's not enough to be able to know how much data remains to be read, or to reread the entire stream.
And even if the stream is to a file that you can seek on, you can't use fseek()/ftell() in portable, strictly-conforming C code to know how big the file is.
If it's a binary stream, you can not use fseek() to seek to the end of the file - that's explicitly undefined behavior per the C standard:
... A binary stream need not meaningfully support fseek calls with a whence value of SEEK_END.
Footnote 268 even says:
Setting the file position indicator to end-of-file, as with fseek(file, 0, SEEK_END), has undefined behavior for a binary stream ...
So you can't portably use fseek() to find the end of a binary stream.
And you can't use ftell() to get a byte count for a text stream. Per the C standard again:
For a text stream, its file position indicator contains unspecified information, usable by the fseek function for returning the file position indicator for the stream to its position at the time of the ftell call; the difference between two such return values is not necessarily a meaningful measure of the number of characters written or read.
Systems do exist where the value returned from ftell() is nothing like a byte count.
The only portable, conforming way to know how many bytes you can read from a stream is to actually read them, and you can't rely on being able to read them again.
If you want to read the entire stream into memory, you have to continually reallocate memory, or use some other dynamic scheme.
This is a very inefficient but portable and strictly-conforming way to read the entire contents of a stream into memory (all error checking and header files are omitted for algorithm clarity and to keep the vertical scrollbar from appearing - it really needs error checking and will need the proper header files):
// get input stream with `fopen()` or some other manner
FILE *input = ...
size_t count = 0;
char *data = NULL;
for ( ;; )
{
int c = fgetc( input );
if ( c == EOF )
{
break;
}
data = realloc( data, count + 1 );
data[ count ] = c;
count++;
}
// optional - terminate the data with a '\0'
// to treat the data as a C-style string
data = realloc( data, count + 1 );
data[ count ] = '\0';
count++;
That will work no matter what the stream is.
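The same idea can be sketched with error checking and a buffer that doubles in size, so realloc() is called far less often than once per character. This is only one possible shape: the name read_all, the starting capacity of 64 and the doubling factor are assumptions made for illustration, not part of the code above.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads everything remaining on `in` into a
 * malloc'd, '\0'-terminated buffer and stores the byte count
 * (excluding the terminator) in *len.
 * Returns NULL on allocation failure or read error. */
char *read_all(FILE *in, size_t *len)
{
    assert(in != NULL && len != NULL);

    size_t cap = 64;                      /* arbitrary starting capacity */
    size_t used = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    int c;                                /* int, so EOF stays distinguishable */
    while ((c = fgetc(in)) != EOF) {
        if (used + 1 >= cap) {            /* keep one byte free for '\0' */
            size_t new_cap = cap * 2;
            char *tmp = realloc(buf, new_cap);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap = new_cap;
        }
        buf[used++] = (char)c;
    }
    if (ferror(in)) {                     /* EOF was really a read error */
        free(buf);
        return NULL;
    }
    buf[used] = '\0';
    *len = used;
    return buf;
}
```

A caller would pass any open stream, for example `char *s = read_all(stdin, &n);`, and free() the result when done.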
On a POSIX-style system such as Linux, you can use fileno() and fstat() to get the size of a file (again, all error checking and header files are omitted):
char *data = NULL;
FILE *input = ...
int fd = fileno( input );
struct stat sb;
fstat( fd, &sb );
if ( S_ISREG( sb.st_mode ) )
{
// sb.st_size + 1 for C-style string
data = malloc( sb.st_size + 1 ); // assign to the outer data; redeclaring it here would shadow it
data[ sb.st_size ] = '\0';
}
// now if data is not NULL you can read into the buffer data points to
// if data is NULL, see above code to read char-by-char
// this tries to read the entire stream in one call to fread()
// there are a lot of other ways to do this
size_t totalRead = 0;
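while ( data != NULL && totalRead < ( size_t ) sb.st_size )
{
    size_t bytesRead = fread( data + totalRead, 1, sb.st_size - totalRead, input );
    if ( bytesRead == 0 )
    {
        break; // EOF or error: stop instead of looping forever
    }
    totalRead += bytesRead;
}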
The above code should work on Windows, too. You may get some compiler warnings or have to use _fileno(), _fstat() and struct _stat instead.
You may also need to define the S_ISREG() macro on Windows:
#define S_ISREG(m) (((m) & S_IFMT) == S_IFREG)
For a binary file, you can use fseek and ftell to know the size without reading the file, allocate the memory and then read everything:
...
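text = fopen("text.txt", "r");
fseek(text, 0, SEEK_END);
char *ix = Str = malloc(ftell(text));
rewind(text); // go back to the beginning before reading
int ch; // int rather than char, so EOF can be detected
while ((ch = fgetc(text)) != EOF)
{
    *ix++ = (char)ch;
}
count = ix - Str; // get the exact count...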
...
For a text file, on a system that has a multi-byte end of line (like Windows which uses \r\n), this will allocate more bytes than required. You could of course scan the file twice, first time for the size and second for actually reading the characters, but you can also just ignore the additional bytes, or you could realloc:
...
count = ix - Str;
Str = realloc(Str, count);
...
Of course, for a real-world program, you should check the return values of all I/O and allocation functions: fopen, fseek, ftell, malloc and realloc...
To just do what you asked for, you would have to read the whole file again:
...
// go back to the beginning
fseek(text, 0L, SEEK_SET);
// read
size_t readsize = fread(Str, sizeof(char), count, text); // fread returns size_t
if(readsize != (size_t)count) {
printf("woops - something bad happened\n");
}
// do stuff with it
// ...
fclose(text);
But your string is not null terminated this way. That will get you in some trouble if you try to use some common string functions like strlen.
To properly null terminate your string you would have to allocate space for one additional character and set that last one to '\0':
...
// allocate count + 1 (for the null terminator)
Str = (char*)malloc((count + 1) * sizeof(char));
// go back to the beginning
fseek(text, 0L, SEEK_SET);
// read
size_t readsize = fread(Str, sizeof(char), count, text); // fread returns size_t
if(readsize != (size_t)count) {
printf("woops - something bad happened\n");
}
// add null terminator
Str[count] = '\0';
// do stuff with it
// ...
fclose(text);
Now, if you want to know the number of characters in the file without counting them one by one, you could get that number in a more efficient way:
...
text = fopen("text.txt", "r");
// seek to the end of the file
fseek(text, 0L, SEEK_END);
// get your current position in that file
count = ftell(text);
// allocate count + 1 (for the null terminator)
Str = (char*)malloc((count + 1) * sizeof(char));
...
Now put this into a more structured form:
// open file
FILE *text = fopen("text.txt", "r");
// seek to the end of the file
fseek(text, 0L, SEEK_END);
// get your current position in that file
long count = ftell(text); // ftell returns long
// allocate count + 1 (for the null terminator)
char* Str = (char*)malloc((count + 1) * sizeof(char));
// go back to the beginning
fseek(text, 0L, SEEK_SET);
// read
size_t readsize = fread(Str, sizeof(char), count, text); // fread returns size_t
if(readsize != (size_t)count) {
printf("woops - something bad happened\n");
}
fclose(text);
// add null terminator
Str[count] = '\0';
// do stuff with it
// ...
Edit:
As Andrew Henle pointed out not every FILE stream is seekable and you can't even rely on being able to read the file again (or that the file has the same length/content when reading it again). Even though this is the accepted answer, if you don't know in advance what kind of file stream you're dealing with, his solution is definitely the way to go.
I am attempting to read a file into a character array, but when I try to pass in a value for MAXBYTES of 100 (the arguments are FUNCTION FILENAME MAXBYTES), the length of the string array is 7.
FILE * fin = fopen(argv[1], "r");
if (fin == NULL) {
printf("Error opening file \"%s\"\n", argv[1]);
return EXIT_SUCCESS;
}
int readSize;
//get file size
fseek(fin, 0L, SEEK_END);
int fileSize = ftell(fin);
fseek(fin, 0L, SEEK_SET);
if (argc < 3) {
readSize = fileSize;
} else {
readSize = atof(argv[2]);
}
char *p = malloc(fileSize);
fread(p, 1, readSize, fin);
int length = strlen(p);
filedump(p, length);
As you can see, the memory allocation for p is always equal to fileSize. When I use fread, I am trying to read in the 100 bytes (readSize is set to 100 as it should be) and store them in p. However, strlen(p) results in 7 if I pass in that argument. Am I using fread wrong, or is there something else going on?
Thanks
That is the limitation with attempting to read text with fread. There is nothing wrong with doing so, but you must know whether the file contains something other than ASCII characters (such as the nul-character) and you certainly cannot treat any part of the buffer as a string until you manually nul-terminate it at some point.
fread does not guarantee the buffer will contain a nul-terminating character at all -- and it doesn't guarantee that the first character read will not be the nul-character.
Again, there is nothing wrong with reading an entire file into an allocated buffer. That's quite common; you just cannot treat what you have read as a string. That is a further reason why there are character-oriented, formatted, and line-oriented input functions (getchar, fgetc, fscanf, fgets and POSIX getline, to list a few). The formatted and line-oriented functions guarantee a nul-terminated buffer; otherwise, you are on your own to account for what you have read, and to ensure you nul-terminate your buffer before treating it as a string.
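As a rough sketch of the point above, the buffer can be terminated at whatever count fread() actually returned. The helper name read_prefix and its parameters are made up for this illustration. Note that if the data itself contains a nul byte, strlen() will still stop there, so the returned count is the reliable length.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads at most `max` bytes from `in` and
 * terminates the buffer at the number of bytes fread() reported.
 * The caller frees the result. */
char *read_prefix(FILE *in, size_t max, size_t *got)
{
    assert(in != NULL && got != NULL);

    char *p = malloc(max + 1);          /* +1 leaves room for the terminator */
    if (p == NULL)
        return NULL;

    *got = fread(p, 1, max, in);        /* may be less than max */
    p[*got] = '\0';                     /* terminate after the last byte read */
    return p;
}
```

With this, `*got` says how many bytes were read, regardless of what strlen() reports for the buffer.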
I am using this code to read a file:
char* fs_read_line(FILE* file)
{
if (file == NULL) {
return "CFILEIO: Error while reading the file: Invalid File";
}
long threshold = ftell(file);
fseek(file, 0, SEEK_END);
uint8_t* buffer = calloc(ftell(file)-threshold, sizeof(uint8_t));
if(buffer == NULL)
return;
int8_t _;
fseek(file, threshold, SEEK_SET);
uint32_t ct = 0;
while ((_ = (char)(fgetc(file))) != '\n'
&& _ != '\0' && _ != '\r' && _ != EOF) {
buffer[ct++] = _;
}
buffer = realloc(buffer, sizeof *buffer * (ct + 1));
buffer[ct] = '\0';
return buffer;
}
If the file is too big, I get (heap) overflow errors, probably because I initially allocate the buffer with the total number of characters the file contains.
Another way I tried to do this is to realloc the buffer after every iteration, but that's not really the approach I want.
Is there any way to dynamically change the size of the array depending on the current iteration without always using realloc? Or is there a way to determine how long the current line is by using ftell and fseek?
Code does not return a pointer to a string.
There is no null character in the returned buffer, so the calling code lacks the ability to know the length of the allocated memory. This certainly causes the calling code to error.
When re-allocating, add 1.
// buffer = realloc(buffer, ct * sizeof(uint8_t*));
// v--- no star
buffer = realloc(buffer, ct * sizeof(uint8_t ) + 1);
buffer[ct] = '\0';
// or better
size_t ct = 0;
...
buffer = realloc(buffer, sizeof *buffer * (ct + 1));
buffer[ct] = '\0';
Is there any way to dynamically change the size of the allocated memory depending on the current iteration without always using realloc?
Array sizes cannot change. To dynamically change the size of the allocated memory requires realloc(). Note: the amount of needed memory could be determined before a memory allocation call.
or is there a way to determine how long the current line is by using ftell and fseek?
Like this code, you have found an upper bound to the current line's length. ftell and fseek do not locate the end of line.
Code could "seek" to the end of line with fscanf(file, "%*[^\n]"); or 1 beyond with a following fgetc(file).
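One common compromise, sketched here under assumptions, is to keep using realloc() but grow the buffer geometrically, so it is called only a handful of times per line rather than once per character. The name read_line, the initial capacity of 16 and the doubling factor are illustrative choices, not something taken from the question's code.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads one line (without the '\n') from `in`.
 * Returns a malloc'd string, or NULL at end of input or on
 * allocation failure. The caller frees the result. */
char *read_line(FILE *in)
{
    assert(in != NULL);

    size_t cap = 16, len = 0;           /* arbitrary starting capacity */
    char *line = malloc(cap);
    if (line == NULL)
        return NULL;

    int ch;
    while ((ch = fgetc(in)) != EOF && ch != '\n') {
        if (len + 1 == cap) {           /* keep one byte free for '\0' */
            char *tmp = realloc(line, cap * 2);
            if (tmp == NULL) {
                free(line);
                return NULL;
            }
            line = tmp;
            cap *= 2;
        }
        line[len++] = (char)ch;
    }
    if (ch == EOF && len == 0) {        /* nothing left to read */
        free(line);
        return NULL;
    }
    line[len] = '\0';
    return line;
}
```

Calling it in a loop until it returns NULL walks the file line by line without ever needing the file size.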
If your file can't fit in memory it can't fit in memory. You are allocating the memory buffer in advance, but you're making two mistakes that can cause you to allocate more than you need.
You're starting at some arbitrary position in the file, but allocate memory as if you're starting at the beginning of the file. Allocate ftell(file) - threshold bytes.
You are allocating way too much memory. The sizeof(uint8_t *) should be sizeof(uint8_t) instead. You're allocating 4 or 8 times more memory than you should.
Other than that, what's the point of reallocating the buffer after you're done writing to it? The memory overflow has already happened. You should allocate before writing (inside the while loop). I don't see the point of reallocating at all, though, since you're allocating more than enough memory to begin with.
the following code:
cleanly compiles
performs the desired operation
properly handles error conditions
properly declares variable types
properly returns a char* rather than a uint8_t*
leaves open the question: why return 2x the needed buffer length
the error message displayed when the passed in parameter is NULL is not correct. Suggest changing to indicate passed in file pointer was NULL
the OPs posted code fails to check the returned value from each call to fseek() and fails to check the return value from each call to ftell() which it should be doing to assure the operation(s) was successful. I did not add that error checking in my answer so as to not clutter the code, however, it should be performed.
and now, the code:
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
char* fs_read_line(FILE* file);
char* fs_read_line(FILE* file)
{
if ( !file )
{
return "CFILEIO: Error while reading the file: Invalid File";
}
// implied else, valid parameter
long threshold = ftell(file);
fseek(file, 0, SEEK_END);
char* buffer = calloc( (size_t)(ftell(file) - threshold) *2 +1, sizeof(char));
if(buffer == NULL)
return NULL;
// implied else, calloc successful
int ch;
fseek(file, threshold, SEEK_SET);
size_t ct = 0; // must start at 0; an uninitialized index is undefined behavior
while ( (ch = fgetc(file)) != '\n'
&& ch != '\0'
&& ch != '\r'
&& ch != EOF)
{
buffer[ct++] = (char)ch;
}
return buffer;
} // end function: fs_read_line
if i am writing into file with fwrite as follows
char buffer[3]={255,255,255,'\0'};
char buffer2[3]={0,0,0,'\0'};
fwrite(buffer, sizeof(char), sizeof(buffer), outputFile);
fwrite(buffer2, sizeof(char), sizeof(buffer2), outputFile);
what i am trying to understand is the third argument, sizeof(buffer)
my issue is that if the array has an appended '\0' nul character, does fwrite copy the nul character also into the file
also if i used a while loop to write into a file such as
int i=0;
while(i++<100){
fwrite(buffer, sizeof(char), sizeof(buffer), outputFile);
fwrite(buffer2, sizeof(char), sizeof(buffer2), outputFile);
}
what are the potential problems here directly to do with the nul character
also one more question , as i am having a problem with my program
int main (int argc, char *argv[]) {
// check that the types have the size i'm relying on here
assert (sizeof(bits8) == 1);
assert (sizeof(bits16) == 2);
assert (sizeof(bits32) == 4);
FILE *outputFile;
int squareSize;
outputFile = fopen(BMP_FILE, "wb");
assert ((outputFile!=NULL) && "Cannot open file");
writeHeader(outputFile);
printf ("Enter square size (must be a factor of %d): \n", SIZE);
scanf ("%d", &squareSize);
assert (SIZE % squareSize == 0);
char buffer[squareSize*BYTES_PER_PIXEL]; //white
char buffer2[squareSize*BYTES_PER_PIXEL]; //black
initialize(buffer, buffer2, squareSize);
int line=0;
int m=1;
while(line<SIZE){
if(line%squareSize==0&&m==1)
m=0;
else if(line%squareSize==1&&m==0)
m=1;
writeToFile(buffer,buffer2,m,outputFile,squareSize);
line+=squareSize;
printf("\nline is %d inside while loop ",line);
}
fclose(outputFile);
return EXIT_SUCCESS;
}
void writeHeader (FILE *file) {
assert(sizeof (bits8) == 1);
assert(sizeof (bits16) == 2);
assert(sizeof (bits32) == 4);
bits16 magicNumber = MAGIC_NUMBER;
fwrite (&magicNumber, sizeof magicNumber, 1, file);
bits32 fileSize = OFFSET + (SIZE * SIZE * BYTES_PER_PIXEL);
fwrite (&fileSize, sizeof fileSize, 1, file);
bits32 reserved = 0;
fwrite (&reserved, sizeof reserved, 1, file);
bits32 offset = OFFSET;
fwrite (&offset, sizeof offset, 1, file);
bits32 dibHeaderSize = DIB_HEADER_SIZE;
fwrite (&dibHeaderSize, sizeof dibHeaderSize, 1, file);
bits32 width = SIZE;
fwrite (&width, sizeof width, 1, file);
bits32 height = SIZE;
fwrite (&height, sizeof height, 1, file);
bits16 planes = NUMBER_PLANES;
fwrite (&planes, sizeof planes, 1, file);
bits16 bitsPerPixel = BITS_PER_PIXEL;
fwrite (&bitsPerPixel, sizeof bitsPerPixel, 1, file);
bits32 compression = NO_COMPRESSION;
fwrite (&compression, sizeof compression, 1, file);
bits32 imageSize = (SIZE * SIZE * BYTES_PER_PIXEL);
fwrite (&imageSize, sizeof imageSize, 1, file);
bits32 hResolution = PIX_PER_METRE;
fwrite (&hResolution, sizeof hResolution, 1, file);
bits32 vResolution = PIX_PER_METRE;
fwrite (&vResolution, sizeof vResolution, 1, file);
bits32 numColors = NUM_COLORS;
fwrite (&numColors, sizeof numColors, 1, file);
bits32 importantColors = NUM_COLORS;
fwrite (&importantColors, sizeof importantColors, 1, file);
}
void initialize(char *buffer, char*buffer2, int size){
//white for buffer 255,255,255 1 pixel
//black for buffer2 00,00,00 1 pixel
int buf = 255;
int buf2 = 0;
int i = 0;
while(i<size*3){
buffer[i]=buf;
buffer2[i]=buf2;
printf("\nbuffer 1 [i] is %c and i is %d\n",buffer[i],i);
printf("buffer 2 [i] is %c and i is %d\n",buffer2[i],i);
i++;
// printf("\nline ran %d times inside initialize loop ",i);
}
buffer[i]='\0';
buffer2[i]='\0';
printf("%s\n",buffer);
printf("%s",buffer2);
}
void writeToFile(char *buffer,char *buffer2, int m, FILE *file,
int squareSize){
int k = 0;
// printf("\nline ran %d times",line);
if(m==0){
while(k<(SIZE/squareSize)){
fwrite(buffer2, sizeof(char), sizeof(buffer2), file);
k+=1;
fwrite(buffer, sizeof(char), sizeof(buffer), file);
k+=1;
// printf("\nline ran %d times inside first if ",line);
}
}
else if(m==1){
while(k<(SIZE/squareSize)){
fwrite(buffer, sizeof(char), sizeof(buffer), file);
k+=1;
fwrite(buffer2, sizeof(char), sizeof(buffer2), file);
k+=1;
// printf("\nline ran %d times inside second else if",line);
}
}
}
the program is supposed to write to a file that is supposed to be a bmp file
the header write output function works fine
however, i am having a problem with the function initialize and the function writeToFile, which i think has to do with the nul character, because i think fwrite is copying the nul character over as well and causing the bmp file to have the wrong information thrown into it. also, if i do remove the nul character, does fwrite produce problems by not stopping at the specified spot, or does it still copy correctly?
i dont know what the problem is but the program is not writing in the order that i imagined
i have been at it ALL NIGHT and it still does not function correctly i am not sure where the problem is
the program is supposed to write into a 512 by 512 output file which is supposed to print out checkered black and white squares based on the input by user of a square which is a factor of 512
so that if the person chooses the input to be 256 , the program is supposed to divide the 512 by 512 space into 4 square with the first square being black then white, then white then black and etc
if the person chooses 16 as the size of the square in pixels, then i am supposed to divide the space into squares with 16-pixel sides, starting with black then white (across), with the next line above going white then black, all the way to the end
i think the problem is with my write to File function but i am not sure what the problem is, really confusing
hope you can help to give me some suggestions on how to deal with this problem
any help would be highly appreciated so that i can get this over and done with
The second argument of fwrite() is the size of each object, and the third argument is the number of objects. In your case, you tell it to write sizeof(buffer) objects of size 1, but you could just as easily write one object of size sizeof(buffer).
"Nul characters" are irrelevant here, since they terminate strings, and fwrite() explicitly deals with binary objects, not strings. It'll write exactly what you give it to the file.
As an aside, a regular unqualified char can be (and often is) signed, rather than unsigned, so trying to stuff all those 255s into them may not be wise. unsigned char may be better for this.
In the writeToFile() function in your third block of code, this:
fwrite(buffer, sizeof(char), sizeof(buffer), file);
is a simple misunderstanding of how the sizeof operator works. In this context, buffer is not an array, it's a pointer to char that you have passed to the function. sizeof will therefore give you the size of a character pointer, usually 4 or 8 bytes, and that size will obviously be totally unrelated to the size of the buffer it points to. If char pointers are 8 bytes on your system, then sizeof(buffer) in this context will always evaluate to 8, regardless of whether you initially set up a 3 byte array, or a 4 byte array, or a 672 byte array.
To make what you want to do work in a function like this, you'll have to explicitly pass to the writeToFile() function the size of the buffer you created, and use that instead of sizeof.
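A minimal sketch of that fix, assuming a made-up helper name write_bytes and a 12-byte stand-in for the pixel buffer: sizeof is applied where the array is still an array, and the resulting count is passed along explicitly.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: the caller supplies the byte count, because
 * inside this function `buf` is only a pointer and sizeof buf would
 * give the pointer size. Returns the number of bytes written. */
size_t write_bytes(const unsigned char *buf, size_t nbytes, FILE *out)
{
    assert(buf != NULL && out != NULL);
    return fwrite(buf, 1, nbytes, out);
}

/* Writes a 12-byte array to a temporary file through write_bytes().
 * Returns the number of bytes written, or 0 if tmpfile() fails. */
size_t demo_sizeof(void)
{
    unsigned char pixels[12] = {0};      /* e.g. four 3-byte pixels */
    FILE *out = tmpfile();
    if (out == NULL)
        return 0;

    /* sizeof works here: pixels is a real array in this scope. */
    size_t written = write_bytes(pixels, sizeof pixels, out);
    fclose(out);
    return written;
}
```

For a variable-length array such as the question's `char buffer[squareSize*BYTES_PER_PIXEL]`, the same count (`squareSize*BYTES_PER_PIXEL`) would be passed instead of `sizeof pixels`.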
fwrite and fread don't care at all about zero bytes. They simply write or read the number of bytes you ask for.
And note that buffer2 does not just have a '\0' appended but is in fact all '\0's since 0 and '\0' are identical in that context.
You don't need to zero-terminate your buffers since they aren't strings, they're just data and can contain zero-bytes within them.
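To illustrate, here is a small round-trip sketch: six bytes, three of them zero, are written with fwrite() and read back with fread(). The function name roundtrip_zero_bytes and the use of tmpfile() are assumptions made for this example only.

```c
#include <assert.h>
#include <stdio.h>

/* Writes 6 bytes (three of them zero) to a temporary file, then reads
 * them back into `out`, which must have room for 6 bytes.
 * Returns the number of bytes read back, or 0 on failure. */
size_t roundtrip_zero_bytes(unsigned char *out)
{
    assert(out != NULL);

    const unsigned char pixels[6] = {255, 255, 255, 0, 0, 0};
    size_t n = 0;

    FILE *f = tmpfile();
    if (f == NULL)
        return 0;

    if (fwrite(pixels, 1, sizeof pixels, f) == sizeof pixels) {
        rewind(f);                       /* reposition before reading */
        n = fread(out, 1, sizeof pixels, f);
    }
    fclose(f);
    return n;
}
```

The zero bytes come back like any others, since the byte count alone decides how much is transferred.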
Question: does fwrite copy the nul character also into the file?
Answer: The way you are calling it, the answer is "Yes". The third argument to fwrite is the number of objects that you wish to write to the stream.
Solution to your problem
You can change your variables
char buffer[3]={255,255,255,'\0'}; // This should be a compiler error/warning.
char buffer2[3]={0,0,0,'\0'}; // You have 4 items in {} for an array of size 3.
to
char buffer[3]={255,255,255};
char buffer2[3]={0,0,0};
You need a '\0' at the end of an array of char only if you want to treat the array as a null-terminated string. Since you are using the array to store only pixel values, you don't need to have the terminating null character in your array.
The potential pitfall of not having the null character is that most functions that work with strings expect a terminating null character. They won't work with buffer or buffer2. Don't use:
Any of the standard string manipulation functions, such as strlen, strcpy.
printf("%s", buffer);.
and many other functions.
Considering your usage of those variables, I don't think you need to use them.
#define FILENAME "/local/home/..."
FILE *fp;
short *originalUnPacked;
short *unPacked;
int fileSize;
fp = fopen(FILENAME, "r");
fseek (fp , 0 , SEEK_END);
fileSize = ftell (fp);
rewind (fp);
originalUnPacked = (short*) malloc (sizeof(char)*fileSize);
unPacked = (short*) malloc (sizeof(char)*fileSize);
fread(unPacked, 1, fileSize, fp);
fread(originalUnPacked, 1, fileSize, fp);
if( memcmp( unPacked, originalUnPacked, fileSize) == 0)
{
print (" unpacked and original unpacked equal ") // Not happens
}
My little knowledge of C says that the print statement in the last if block should be printed, but it isn't. Any ideas why?
Just to add more clarity and show you the complete code i have added a define statement and two fread statement before the if block.
Few points for your consideration:
1. The return type of ftell is long int, so it is better to declare fileSize as long int (as sizeof(int) <= sizeof(long)).
2. It is a better practice in C not to typecast the return value of malloc. Also you can probably get rid of sizeof(char) when using in malloc.
3. fread advances the file position, so after the first fread call the position has moved forward by fileSize bytes, i.e. to the end of the file. The second fread immediately after it will therefore fail to read anything (assuming the first one succeeded). This is the reason for the behavior you are seeing in your program. You need to reset the file position using rewind before the second call to fread. You can also check the return value of fread, which is the number of items successfully read (bytes here, since the item size is 1). Try something along these lines:
size_t bytes_read;
bytes_read = fread(unPacked, 1, fileSize, fp);
/* some check or print of bytes read successfully if needed */
/* Reset fp if fread succeeded, so the file can be loaded again into originalUnPacked */
rewind(fp);
bytes_read = fread(originalUnPacked, 1, fileSize, fp);
/* some check or print of bytes read successfully if needed */
/* memcmp etc */
4. It may be a good idea to check for the return values of fopen, malloc etc against failure i.e. NULL check in case of fopen & malloc.
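Putting these points together, one possible shape is sketched below. The function name read_twice_and_compare and its return convention are invented for the illustration; it takes an already opened stream and the number of bytes to read.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: reads `size` bytes from the start of `fp` twice
 * and compares the two copies.
 * Returns 1 if they are equal, 0 if not, -1 on allocation or read error. */
int read_twice_and_compare(FILE *fp, size_t size)
{
    assert(fp != NULL);

    unsigned char *a = malloc(size ? size : 1);
    unsigned char *b = malloc(size ? size : 1);
    int result = -1;

    if (a != NULL && b != NULL) {
        rewind(fp);
        size_t n1 = fread(a, 1, size, fp);
        rewind(fp);     /* otherwise the second read continues where the first stopped */
        size_t n2 = fread(b, 1, size, fp);
        if (n1 == size && n2 == size)
            result = (memcmp(a, b, size) == 0);
    }
    free(a);            /* free(NULL) is a no-op, so this is safe either way */
    free(b);
    return result;
}
```

Only when both reads return the full count is the memcmp result meaningful, which is why the counts are checked first.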
Hope this helps!
The memory allocated with malloc is not pre-initialized, so its contents are random and thus almost certainly different for the two allocations.
The expected (probabilistically speaking, "certain") result is exactly what happens.
Did you mean to load the file into both of these buffers before testing with memcmp but forgot to do so?