Length of character array after fread is smaller than expected

Length of character array after fread is smaller than expected - c

I am attempting to read a file into a character array, but when I try to pass in a value for MAXBYTES of 100 (the arguments are FUNCTION FILENAME MAXBYTES), the length of the string array is 7.
FILE * fin = fopen(argv[1], "r");
if (fin == NULL) {
printf("Error opening file \"%s\"\n", argv[1]);
return EXIT_SUCCESS;
}
int readSize;
//get file size
fseek(fin, 0L, SEEK_END);
int fileSize = ftell(fin);
fseek(fin, 0L, SEEK_SET);
if (argc < 3) {
readSize = fileSize;
} else {
readSize = atof(argv[2]);
}
char *p = malloc(fileSize);
fread(p, 1, readSize, fin);
int length = strlen(p);
filedump(p, length);
As you can see, the memory allocation for p is always equal to filesize. When I use fread, I am trying to read in the 100 bytes (readSize is set to 100 as it should be) and store them in p. However, strlen(p) results in 7 during if I pass in that argument. Am I using fread wrong, or is there something else going on?
Thanks

That is the limitation with attempting to read text with fread. There is nothing wrong with doing so, but you must know whether the file contains something other than ASCII characters (such as the nul-character) and you certainly cannot treat any part of the buffer as a string until you manually nul-terminate it at some point.
fread does not guarantee the buffer will contain a nul-terminating character at all -- and it doesn't guarantee that the first character read will not be the nul-character.
Again, there is nothing wrong with reading an entire file into an allocated buffer. That's quite common, you just cannot treat what you have read as a string. That is a further reason why there are character oriented, formatted, and line oriented input functions. (getchar, fgetc, fscanf, fgets and POSIX getline, to list a few). The formatted and line oriented functions guarantee a nul-terminated buffer, otherwise, you are on your own to account for what you have read, and insure you nul-terminate your buffer -- before treating it as a string.

Related

character by character reading from a file in C

How to read text from a file into a dynamic array of characters?
I found a way to count the number of characters in a file and create a dynamic array, but I can't figure out how to assign characters to the elements of the array?
FILE *text;
char* Str;
int count = 0;
char c;
text = fopen("text.txt", "r");
while(c = (fgetc(text))!= EOF)
{
count ++;
}
Str = (char*)malloc(count * sizeof(char));
fclose(text);

There is no portable, standard-conforming way in C to know in advance how may bytes may be read from a FILE stream.
First, the stream might not even be seekable - it can be a pipe or a terminal or even a socket connection. On such streams, once you read the input it's gone, never to be read again. You can push back one char value, but that's not enough to be able to know how much data remains to be read, or to reread the entire stream.
And even if the stream is to a file that you can seek on, you can't use fseek()/ftell() in portable, strictly-conforming C code to know how big the file is.
If it's a binary stream, you can not use fseek() to seek to the end of the file - that's explicitly undefined behavior per the C standard:
... A binary stream need not meaningfully support fseek calls with a whence value of SEEK_END.
Footnote 268 even says:
Setting the file position indicator to end-of-file, as with fseek(file, 0, SEEK_END), has undefined behavior for a binary stream ...
So you can't portably use fseek() in a binary stream.
And you can't use ftell() to get a byte count for a text stream. Per the C standard again:
For a text stream, its file position indicator contains unspecified information, usable by the fseek function for returning the file position indicator for the stream to its position at the time of the ftell call; the difference between two such return values is not necessarily a meaningful measure of the number of characters written or read.
Systems do exist where the value returned from ftell() is nothing like a byte count.
The only portable, conforming way to know how many bytes you can read from a stream is to actually read them, and you can't rely on being able to read them again.
If you want to read the entire stream into memory, you have to continually reallocate memory, or use some other dynamic scheme.
This is a very inefficient but portable and strictly-conforming way to read the entire contents of a stream into memory (all error checking and header files are omitted for algorithm clarity and to keep the vertical scrollbar from appearing - it really needs error checking and will need the proper header files):
// get input stream with `fopen()` or some other manner
FILE *input = ...
size_t count = 0;
char *data = NULL;
for ( ;; )
{
int c = fgetc( input );
if ( c == EOF )
{
break;
}
data = realloc( data, count + 1 );
data[ count ] = c;
count++;
}
// optional - terminate the data with a '\0'
// to treat the data as a C-style string
data = realloc( data, count + 1 );
data[ count ] = '\0';
count++;
That will work no matter what the stream is.
On a POSIX-style system such as Linux, you can use fileno() and fstat() to get the size of a file (again, all error checking and header files are omitted):
char *data = NULL;
FILE *input = ...
int fd = fileno( input );
struct stat sb;
fstat( fd, &sb );
if ( S_ISREG( sb.st_mode ) )
{
// sb.st_size + 1 for C-style string
char *data = malloc( sb.st_size + 1 );
data[ sb.st_size ] = '\0';
}
// now if data is not NULL you can read into the buffer data points to
// if data is NULL, see above code to read char-by-char
// this tries to read the entire stream in one call to fread()
// there are a lot of other ways to do this
size_t totalRead = 0;
while ( totalRead < sb.st_size )
{
size_t bytesRead = fread( data + totalRead, 1, sb.st_size - totalRead, input );
totalRead += bytesRead;
}
The above could should work on Windows, too. You may get some compiler warnings or have to use _fileno(), _fstat() and struct _stat instead, too.*
You may also need to define the S_ISREG() macro on Windows:
#define S_ISREG(m) (((m) & S_IFMT) == S_IFREG)
* that's _fileno(), _fstat(), and struct _stat without the hyperlink underline-munge.

For a binary file, you can use fseek and ftell to know the size without reading the file, allocate the memory and then read everything:
...
text = fopen("text.txt", "r");
fseek(txt, 0, SEEK_END);
char *ix = Str = malloc(ftell(txt);
while(c = (fgetc(text))!= EOF)
{
ix++ = c;
}
count = ix - Str; // get the exact count...
...
For a text file, on a system that has a multi-byte end of line (like Windows which uses \r\n), this will allocate more bytes than required. You could of course scan the file twice, first time for the size and second for actually reading the characters, but you can also just ignore the additional bytes, or you could realloc:
...
count = ix - Str;
Str = realloc(Str, count);
...
Of course for a real world program, you should control the return values of all io and allocation functions: fopen, fseek, fteel, malloc and realloc...

To just do what you asked for, you would have to read the whole file again:
...
// go back to the beginning
fseek(text, 0L, SEEK_SET);
// read
ssize_t readsize = fread(Str, sizeof(char), count, text);
if(readsize != count) {
printf("woops - something bad happened\n");
}
// do stuff with it
// ...
fclose(text);
But your string is not null terminated this way. That will get you in some trouble if you try to use some common string functions like strlen.
To properly null terminate your string you would have to allocate space for one additional character and set that last one to '\0':
...
// allocate count + 1 (for the null terminator)
Str = (char*)malloc((count + 1) * sizeof(char));
// go back to the beginning
fseek(text, 0L, SEEK_SET);
// read
ssize_t readsize = fread(Str, sizeof(char), count, text);
if(readsize != count) {
printf("woops - something bad happened\n");
}
// add null terminator
Str[count] = '\0';
// do stuff with it
// ...
fclose(text);
Now if you want know the number of characters in the file without counting them one by one, you could get that number in a more efficient way:
...
text = fopen("text.txt", "r");
// seek to the end of the file
fseek(text, 0L, SEEK_END);
// get your current position in that file
count = ftell(text)
// allocate count + 1 (for the null terminator)
Str = (char*)malloc((count + 1) * sizeof(char));
...
Now bring this in a more structured form:
// open file
FILE *text = fopen("text.txt", "r");
// seek to the end of the file
fseek(text, 0L, SEEK_END);
// get your current position in that file
ssize_t count = ftell(text)
// allocate count + 1 (for the null terminator)
char* Str = (char*)malloc((count + 1) * sizeof(char));
// go back to the beginning
fseek(text, 0L, SEEK_SET);
// read
ssize_t readsize = fread(Str, sizeof(char), count, text);
if(readsize != count) {
printf("woops - something bad happened\n");
}
fclose(text);
// add null terminator
Str[count] = '\0';
// do stuff with it
// ...
Edit:
As Andrew Henle pointed out not every FILE stream is seekable and you can't even rely on being able to read the file again (or that the file has the same length/content when reading it again). Even though this is the accepted answer, if you don't know in advance what kind of file stream you're dealing with, his solution is definitely the way to go.

How to read a complete file with scanf maybe something like %[^\EOF] without loop in single statement

I want to know if I can read a complete file with single scanf statement. I read it with below code.
#include<stdio.h>
int main()
{
FILE * fp;
char arr[200],fmt[6]="%[^";
fp = fopen("testPrintf.c","r");
fmt[3] = EOF;
fmt[4] = ']';
fmt[5] = '\0';
fscanf(fp,fmt,arr);
printf("%s",arr);
printf("%d",EOF);
return 0;
}
And it resulted into a statement after everything happened
"* * * stack smashing detected * * *: terminated
Aborted (core dumped)"
Interestingly, printf("%s",arr); worked but printf("%d",EOF); is not showing its output.
Can you let me know what has happened when I tried to read upto EOF with scanf?

If you really, really must (ab)use fscanf() into reading the file, then this outlines how you could do it:
open the file
use fseek() and
ftell() to find the size of the file
rewind() (or fseek(fp, 0, SEEK_SET)) to reset the file to the start
allocate a big buffer
create a format string that reads the correct number of bytes into the buffer and records how many characters are read
use the format with fscanf()
add a null terminating byte in the space reserved for it
print the file contents as a big string.
If there are no null bytes in the file, you'll see the file contents printed. If there are null bytes in the file, you'll see the file contents up to the first null byte.
I chose the anodyne name data for the file to be read — there are endless ways you can make that selectable at runtime.
There are a few assumptions made about the size of the file (primarily that the size isn't bigger than can be fitted into a long with signed overflow, and that it isn't empty). It uses the fact that the %c format can accept a length, just like most of the formats can, and it doesn't add a null terminator at the end of the string it reads and it doesn't fuss about whether the characters read are null bytes or anything else — it just reads them. It also uses the fact that you can specify the size of the variable to hold the offset with the %n (or, in this case, the %ln) conversion specification. And finally, it assumes that the file is not shrinking (it will ignore growth if it is growing), and that it is a seekable file, not a FIFO or some other special file type that does not support seeking.
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
const char filename[] = "data";
FILE *fp = fopen(filename, "r");
if (fp == NULL)
{
fprintf(stderr, "Failed to open file %s for reading\n", filename);
exit(EXIT_FAILURE);
}
fseek(fp, 0, SEEK_END);
long length = ftell(fp);
rewind(fp);
char *buffer = malloc(length + 1);
if (buffer == NULL)
{
fprintf(stderr, "Failed to allocate %ld bytes\n", length + 1);
exit(EXIT_FAILURE);
}
char format[32];
snprintf(format, sizeof(format), "%%%ldc%%ln", length);
long nbytes = 0;
if (fscanf(fp, format, buffer, &nbytes) != 1 || nbytes != length)
{
fprintf(stderr, "Failed to read %ld bytes (got %ld)\n", length, nbytes);
exit(EXIT_FAILURE);
}
buffer[length] = '\0';
printf("<<<SOF>>\n%s\n<<EOF>>\n", buffer);
free(buffer);
return(0);
}
This is still an abuse of fscanf() — it would be better to use fread():
if (fread(buffer, sizeof(char), length, fp) != (size_t)length)
{
fprintf(stderr, "Failed to read %ld bytes\n", length);
exit(EXIT_FAILURE);
}
You can then omit the variable format and the code that sets it, and also nbytes. Or you can keep nbytes (maybe as a size_t instead of long) and assign the result of fread() to it, and use the value in the error report, along the lines of the test in the fscanf() variant.
You might get warnings from GCC about a non-literal format string for fscanf(). It's correct, but this isn't dangerous because the programmer is completely in charge of the content of the format string.

Difference between specifications of fread and fgets?

What is the difference between fread and fgets when reading in from a file?
I use the same fwrite statement, however when I use fgets to read in a .txt file it works as intended, but when I use fread() it does not.
I've switched from fgets/fputs to fread/fwrite when reading from and to a file. I've used fopen(rb/wb) to read in binary rather than standard characters. I understand that fread will get /0 Null bytes as well rather than just single lines.
//while (fgets(buff,1023,fpinput) != NULL) //read in from file
while (fread(buff, 1, 1023, fpinput) != 0) // read from file
I expect to read in from a file to a buffer, put the buffer in shared memory, and then have another process read from shared memory and write to a new file.
When I use fgets() it works as intended with .txt files, but when using fread it adds a single line from 300~ characters into the buffer with a new line. Can't for the life of me figure out why.

fgets will stop when encountering a newline. fread does not. So fgets is typically only useful for text files, while fread can be used for both text and binary files.
From the C11 standard:
7.21.7.2 The fgets function
The fgets function reads at most one less than the number of characters specified by n from the stream pointed to by stream into the array pointed to by s. No additional characters are read after a new-line character (which is retained) or after end-of-file. A null character is written immediately after the last character read into the array.
7.21.8.1 The fread function
The fread function reads, into the array pointed to by ptr, up to nmemb elements whose size is specified by size, from the stream pointed to by stream. For each object, size calls are made to the fgetc function and the results stored, in the order read, in an array of unsigned char exactly overlaying the object. The file position indicator for the stream (if defined) is advanced by the number of characters successfully read. If an error occurs, the resulting value of the file position indicator for the stream is indeterminate. If a partial element is read, its value is indeterminate.
This snippet maybe will make things clearer for you. It just copies a file in chunks.
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char ** argv)
{
if(argc != 3) {
printf("Usage: ./a.out src dst\n");
printf("Copies file src to dst\n");
exit(EXIT_SUCCESS);
}
const size_t chunk_size = 1024;
FILE *in, *out;
if(! (in = fopen(argv[1], "rb"))) exit(EXIT_FAILURE);
if(! (out = fopen(argv[2], "wb"))) exit(EXIT_FAILURE);
char * buffer;
if(! (buffer = malloc(chunk_size))) exit(EXIT_FAILURE);
size_t bytes_read;
do {
// fread returns the number of successfully read elements
bytes_read = fread(buffer, 1, chunk_size, in);
/* Insert any modifications you may */
/* want to do here */
// write bytes_read bytes from buffer to output file
if(fwrite(buffer, 1, bytes_read, out) != bytes_read) exit(EXIT_FAILURE);
// When we read less than chunk_size we are either done or an error has
// occured. This error is not handled in this program.
} while(bytes_read == chunk_size);
free(buffer);
fclose(out);
fclose(in);
}
You mentioned in a comment below that you wanted to use this for byteswapping. Well, you can just use the following snippet. Just insert it where indicated in code above.
for(int i=0; i < bytes_read - bytes_read%2; i+=2) {
char tmp = buffer[i];
buffer[i] = buffer[i+1];
buffer[i+1] = tmp;
}

create a binary file in C

I'm currently working on a binary file creation. Here is what I have tried.
Example 1:
#include<stdio.h>
int main() {
/* Create the file */
int a = 5;
FILE *fp = fopen ("file.bin", "wb");
if (fp == NULL)
return -1;
fwrite (&a, sizeof (a), 1, fp);
fclose (fp);
}
return 0;
}
Example 2:
#include <stdio.h>
#include <string.h>
int main()
{
FILE *fp;
char str[256] = {'\0'};
strcpy(str, "3aae71a74243fb7a2bb9b594c9ea3ab4");
fp = fopen("file.bin", "wb");
if(fp == NULL)
return -1;
fwrite(str, sizeof str, 1, fp);
return 0;
}
Example 1 gives the right output in binary form. But Example 2 where I'm passing string doesn't give me right output. It writes the input string which I have given into the file and appends some data(binary form).
I don't understand and I'm unable to figure it out what mistake I'm doing.

The problem is that sizeof str is 256, that is, the entire size of the locally declared character array. However, the data you are storing in it does not require all 256 characters. The result is that the write operation writes all the characters of the string plus whatever garbage happened to be in the character array already. Try the following line as a fix:
fwrite(str, strlen(str), 1, fp);

C strings are null terminated, meaning that anything after the '\0' character must be ignored. If you read the file written by Example 2 into a str[256] and print it out using printf("%s", str), you would get the original string back with no extra characters, because null terminator would be read into the buffer as well, providing proper termination for the string.
The reason you get the extra "garbage" in the output is that fwrite does not interpret str[] array as a C string. It interprets it as a buffer of size 256. Text editors do not interpret null character as a terminator, so random characters from str get written to the file.
If you want the string written to the file to end at the last valid character, use strlen(str) for the size in the call of fwrite.

getting string from file

Why is the output of this code is some random words in memory?
void conc()
{
FILE *source = fopen("c.txt", "r+");
if(!source)
{
printf("Ficheiro não encontrado");
return;
}
short i = 0;
while(fgetc(source) != EOF)
i++;
char tmp_str[i];
fgets(tmp_str, i, source);
fclose(source);
printf("%s", tmp_str);
}
This should give me the content of the file, I think.

Because after you have walked through the file using fgetc(), then the position indicator is at end-of-file. fgets() has nothing to read. You need to reset it so that it points to the beginning using rewind(source);.
By the way, don't loop through the file using fgetc(), that's an extremely ugly solution. use fseek() and ftell() or lseek() instead to get the size of the file:
fseek(source, SEEK_END, 0);
long size = ftell(source);
fseek(source, SEEK_SET, 0); // or rewind(source);
alternative:
off_t size = lseek(source, SEEK_END, 0);
rewind(source);

Use rewind(source); before fgets(tmp_str, i, source);

After your fgetc() - loop you have reached EOF and if you don't fseek( source, 0l, SEEK_SET ) back to the beginning you won't get any more data.
Anyway you should avoid reading the file twice. Use fstat( fileno(source), ... ) instead to determine the file size.

fgetc reads a character from the stream.
fgets reads a string from the stream.
Now in your code you are iterating through the end of the file. So the call to fgets on the stream will simply return NULL and leave the buffer content unchanged. In your case, your buffer is not initialised. This explains the random values you are seeing.
Instead of reading the complete file content with fgetc to get the character count, I recommend using fseek / ftell (see answer from this thread)

Your code is wrong. As was said before:
You should not read file twice
To allocate array dynamically you must use operator new (in c++)
or function malloc (in c)
If you need code to read content of the file, try next (sorry, but I didn't compile it. anyway it should work well):
FILE* source = fopen("c.txt", "r+b");
if(!source){return;}
fseek(source, 0, SEEK_END);
size_t filesize = ftell(source);
fseek(source, 0, SEEK_SET);
char* buf = new char[filesize+1]; // +1 is for '/0'
fread(buf, sizeof(char), filesize, source);
fclose(source);
buf[filesize]=0;
printf("%s", buf);
delete buf;

Each time you call fgetc() you advance the internal file pointer one character further. At the end of your while() loop the file pointer will then be at the end of the file. Subsequent calls intented to read on the file handle will fail with an EOF condition.
The fgets manual says that :
If the end-of-file is encountered while attempting to read a character, the eof indicator is set (feof). If this happens before any characters could be read, the pointer returned is a null pointer (and the contents of str remain unchanged).
The consequence is that tmp_str is left unchanged. The garbage you get back when you call printf is actually a part of the conc() function stack.
A solution to your problem would be rewinding the file pointer with fseek() just before calling fgets().
fseek(source, 0, SEEK_SET);
Then a better way to get the size of your file would be to fseek to the end of the file, and use ftell to get the current position :
long size;
fseek(source, 0, SEEK_END);
size = ftell(source);
This being said, your code still has a problem. When you allocate on the stack (variables local to a function) you have to tell the size of the variable at compile time. Here your compiler allocates a char array of length 0. I suggest you investigate dynamic allocation with malloc of the keyword new if you're coding in C++.
A proper allocation would look like this :
char *tmp_str = malloc(size);
// Here you read the file
free(tmp_str);
An simpler solution could be to preallocate a string large enough to hold your file.
char tmp_str[1024 * 100]; // 100Kb
Then use the size variable we got earlier to check the file will fit in tmp_str before reading.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight

Length of character array after fread is smaller than expected - c

Related

character by character reading from a file in C

How to read a complete file with scanf maybe something like %[^\EOF] without loop in single statement

Difference between specifications of fread and fgets?

create a binary file in C

getting string from file

Categories

Resources