I am trying to read 7M of data from a file, but it is failing. When I googled, I found that there is no limit on how much data can be read.
My code given below is failing with segmentation fault.
char *buf = malloc(7008991);
FILE *fp = fopen("35mb.txt", "rb");
long long i = 0;
long long j = 0;
while(fgets(buf+i, 1024, fp)) {
i+=strlen(buf);
if(i==7008991)break;
}
printf("read done");
printf("ch=%s\n", buf);
Need some help
If you want to read the content of a large file into memory, you may:
1. actually read it
2. mmap it.
I'll cover how to actually read it, assuming binary mode and no text-mode mess.
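FILE* fp;
long flen;
char *buffer;
// Open the file
fp = fopen ("35mb.txt", "rb");
if ( fp == NULL ) return -1; // Fail
// Get file length; there are other ways to do this, like fstat
// TODO: check failure
fseek ( fp, 0, SEEK_END );
flen = ftell ( fp );
fseek ( fp, 0, SEEK_SET );
// Allocate a buffer large enough for the whole file
buffer = malloc ( flen );
if ( buffer == NULL ) { fclose ( fp ); return -1; } // Fail
if ( fread ( buffer, flen, 1, fp ) != 1 ) {
// Fail
}
fclose ( fp );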
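Option 2 from the list above (mmap) is not shown in this answer. Here is a minimal sketch of it, assuming a POSIX system; map_file and dump_file are hypothetical helper names made up for illustration, and the error handling is deliberately basic.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: maps the file at path read-only.
   On success returns the mapping and stores its length in *len;
   the caller releases it with munmap(ptr, *len).
   Returns NULL on failure or for an empty file (mmap rejects length 0). */
void *map_file(const char *path, size_t *len)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1) return NULL;

    struct stat sb;
    if (fstat(fd, &sb) == -1 || sb.st_size == 0) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, (size_t)sb.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                  /* the mapping stays valid after close */
    if (p == MAP_FAILED) return NULL;

    *len = (size_t)sb.st_size;
    return p;
}

/* Example use: copy the mapped bytes to stdout. Returns 0 on success. */
int dump_file(const char *path)
{
    size_t len;
    void *p = map_file(path, &len);
    if (p == NULL) return -1;
    fwrite(p, 1, len, stdout);
    munmap(p, len);
    return 0;
}
```

Note that the mapped region is not NUL terminated, so it should be handled as a pointer plus a length rather than passed to string functions.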
There are a few things that could go wrong here.
Firstly, on this line, memory allocation can fail. malloc can return a NULL pointer, and you should check for this. (You should also check that the file opened without error.)
char *buf = malloc(7008991);
Next, in the loop. Remember that fgets reads one line, regardless of how long that is, up to a maximum of 1024-1 bytes (and appends a null character). Please note that for binary input, using fread is probably more appropriate.
while(fgets(buf+i, 1024, fp)) {
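After that, advancing i by the length of what was just read is the right idea, as you really do not know how long a line is. Note, though, that strlen(buf) measures from the start of the buffer; to count only the newest piece it should be strlen(buf+i).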
i+=strlen(buf);
This line however is probably why you are failing.
if(i==7008991)break;
You are requiring the size to be exactly 7008991 bytes to break. That is rather unlikely unless you are very, very sure about the formatting of your file. This line should probably read if ( i >= 7008991 ) break;
You should probably replace your explicit size with a named constant as well.
Most probably the size of your file is exactly 7008991 bytes. But every time you call fgets you allow it to write up to 1024 bytes, and near the end of the file the buffer no longer has that much room. Suppose you have already read 7008990 bytes: then you should call fgets(buf+i, 1, fp), because your buffer has no more than one byte left.
Another issue is that you want to print the buffer at the end of your program. For this to work your buffer must be NUL terminated. So you need to allocate one more byte than the file size. fgets will automatically append the NUL byte.
Yet another issue is the way you increment your counter: i += strlen(buf) is wrong, because strlen(buf) already measures from the start of the buffer. The correct code is: i = strlen(buf)
All of this assumes there are no NUL bytes in your file. As already explained in comments, it is wiser to use fgets only when dealing with text files. When reading binary files you'd better use fread.
The corrected code would be:
unsigned long FILE_SIZE = 7008991+1;
char *buf = malloc(FILE_SIZE);
FILE *fp = fopen("35mb.txt", "rb");
long long i = 0;
long long j = 0;
while(fgets(buf+i, FILE_SIZE-i, fp)) {
i = strlen(buf);
if(i==7008991)break;
}
printf("read done");
printf("ch=%s\n", buf);
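The two points above (never let fgets write past the end of the buffer, and measure only the piece just read) can also be combined without rescanning the whole buffer on every iteration. This is only a sketch for text without embedded NUL bytes; read_text is a hypothetical name, and on a read error it simply stops.

```c
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads text from fp into buf, whose capacity is
   cap bytes including the terminating NUL. Returns the number of
   characters stored. Assumes the text contains no NUL bytes. */
size_t read_text(FILE *fp, char *buf, size_t cap)
{
    size_t used = 0;
    if (cap == 0) return 0;
    buf[0] = '\0';

    /* fgets needs room for at least one character plus the NUL */
    while (cap - used > 1) {
        size_t room = cap - used;
        int n = room > INT_MAX ? INT_MAX : (int)room;
        if (fgets(buf + used, n, fp) == NULL) break;   /* EOF or error */
        used += strlen(buf + used);   /* length of the new piece only */
    }
    return used;
}
```

With the sizes from the question, the call would look like read_text(fp, buf, 7008991 + 1) after allocating that many bytes.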
How to read text from a file into a dynamic array of characters?
I found a way to count the number of characters in a file and create a dynamic array, but I can't figure out how to assign characters to the elements of the array?
FILE *text;
char* Str;
int count = 0;
char c;
text = fopen("text.txt", "r");
while(c = (fgetc(text))!= EOF)
{
count ++;
}
Str = (char*)malloc(count * sizeof(char));
fclose(text);
There is no portable, standard-conforming way in C to know in advance how many bytes may be read from a FILE stream.
First, the stream might not even be seekable - it can be a pipe or a terminal or even a socket connection. On such streams, once you read the input it's gone, never to be read again. You can push back one char value, but that's not enough to be able to know how much data remains to be read, or to reread the entire stream.
And even if the stream is to a file that you can seek on, you can't use fseek()/ftell() in portable, strictly-conforming C code to know how big the file is.
If it's a binary stream, you can not use fseek() to seek to the end of the file - that's explicitly undefined behavior per the C standard:
... A binary stream need not meaningfully support fseek calls with a whence value of SEEK_END.
Footnote 268 even says:
Setting the file position indicator to end-of-file, as with fseek(file, 0, SEEK_END), has undefined behavior for a binary stream ...
So you can't portably use fseek() in a binary stream.
And you can't use ftell() to get a byte count for a text stream. Per the C standard again:
For a text stream, its file position indicator contains unspecified information, usable by the fseek function for returning the file position indicator for the stream to its position at the time of the ftell call; the difference between two such return values is not necessarily a meaningful measure of the number of characters written or read.
Systems do exist where the value returned from ftell() is nothing like a byte count.
The only portable, conforming way to know how many bytes you can read from a stream is to actually read them, and you can't rely on being able to read them again.
If you want to read the entire stream into memory, you have to continually reallocate memory, or use some other dynamic scheme.
This is a very inefficient but portable and strictly-conforming way to read the entire contents of a stream into memory (all error checking and header files are omitted for algorithm clarity and to keep the vertical scrollbar from appearing - it really needs error checking and will need the proper header files):
// get input stream with `fopen()` or some other manner
FILE *input = ...
size_t count = 0;
char *data = NULL;
for ( ;; )
{
int c = fgetc( input );
if ( c == EOF )
{
break;
}
data = realloc( data, count + 1 );
data[ count ] = c;
count++;
}
// optional - terminate the data with a '\0'
// to treat the data as a C-style string
data = realloc( data, count + 1 );
data[ count ] = '\0';
count++;
That will work no matter what the stream is.
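If the one-realloc-per-character cost matters, the same portable idea can be done with fread and a buffer that doubles when it fills up. This is a sketch rather than a drop-in replacement; slurp is a hypothetical name and the 4096-byte starting size is an arbitrary choice.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads everything remaining on in.
   Returns a NUL-terminated buffer (caller frees it) and stores the
   byte count, not counting the terminator, in *out_len.
   Returns NULL on allocation failure or read error. */
char *slurp(FILE *in, size_t *out_len)
{
    size_t cap = 4096, len = 0;
    char *data = malloc(cap);
    if (data == NULL) return NULL;

    for (;;) {
        /* always keep one byte spare for the terminating '\0' */
        size_t n = fread(data + len, 1, cap - len - 1, in);
        len += n;
        if (len < cap - 1) break;          /* short read: EOF or error */

        if (cap > SIZE_MAX / 2) { free(data); return NULL; }
        char *tmp = realloc(data, cap * 2);
        if (tmp == NULL) { free(data); return NULL; }
        data = tmp;
        cap *= 2;
    }

    if (ferror(in)) { free(data); return NULL; }
    data[len] = '\0';
    *out_len = len;
    return data;
}
```

Like the char-by-char loop, this uses only standard C and does not need the stream to be seekable.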
On a POSIX-style system such as Linux, you can use fileno() and fstat() to get the size of a file (again, all error checking and header files are omitted):
char *data = NULL;
FILE *input = ...
int fd = fileno( input );
struct stat sb;
fstat( fd, &sb );
if ( S_ISREG( sb.st_mode ) )
{
// sb.st_size + 1 for C-style string
// assign to the outer data; redeclaring it here would shadow it
data = malloc( sb.st_size + 1 );
data[ sb.st_size ] = '\0';
}
// now if data is not NULL you can read into the buffer data points to
// if data is NULL, see above code to read char-by-char
// this tries to read the entire stream in one call to fread()
// there are a lot of other ways to do this
size_t totalRead = 0;
while ( totalRead < sb.st_size )
{
size_t bytesRead = fread( data + totalRead, 1, sb.st_size - totalRead, input );
totalRead += bytesRead;
}
The above code should work on Windows, too. You may get some compiler warnings or have to use _fileno(), _fstat() and struct _stat instead.*
You may also need to define the S_ISREG() macro on Windows:
#define S_ISREG(m) (((m) & S_IFMT) == S_IFREG)
* that's _fileno(), _fstat(), and struct _stat without the hyperlink underline-munge.
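For reference, here is roughly what the fileno()/fstat() approach looks like with the omitted checks filled in. It is a sketch assuming a POSIX system and a stream positioned at its start; read_regular is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

/* Hypothetical helper: reads a regular file whose size fstat() reports.
   Returns a NUL-terminated buffer and stores the byte count in *len,
   or returns NULL if input is not a regular file or something fails. */
char *read_regular(FILE *input, size_t *len)
{
    struct stat sb;
    if (fstat(fileno(input), &sb) == -1 || !S_ISREG(sb.st_mode))
        return NULL;            /* caller can fall back to char-by-char */

    size_t size = (size_t)sb.st_size;
    char *data = malloc(size + 1);
    if (data == NULL) return NULL;

    size_t total = 0;
    while (total < size) {
        size_t n = fread(data + total, 1, size - total, input);
        if (n == 0) break;      /* EOF or error: stop instead of spinning */
        total += n;
    }
    if (ferror(input)) { free(data); return NULL; }

    data[total] = '\0';         /* terminate after what was actually read */
    *len = total;
    return data;
}
```

Terminating at total rather than at st_size also covers the case where fewer bytes arrive than fstat() reported, for example a text-mode stream on Windows.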
For a binary file, you can use fseek and ftell to know the size without reading the file, allocate the memory and then read everything:
...
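text = fopen("text.txt", "r");
fseek(text, 0, SEEK_END);
char *ix = Str = malloc(ftell(text));
rewind(text); // go back to the start before reading
// c should be declared as int so that EOF is distinguishable
while((c = fgetc(text)) != EOF)
{
*ix++ = c;
}
count = ix - Str; // get the exact count...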
...
For a text file, on a system that has a multi-byte end of line (like Windows which uses \r\n), this will allocate more bytes than required. You could of course scan the file twice, first time for the size and second for actually reading the characters, but you can also just ignore the additional bytes, or you could realloc:
...
count = ix - Str;
Str = realloc(Str, count);
...
Of course, for a real-world program, you should check the return values of all I/O and allocation functions: fopen, fseek, ftell, malloc and realloc...
To just do what you asked for, you would have to read the whole file again:
...
// go back to the beginning
fseek(text, 0L, SEEK_SET);
// read
ssize_t readsize = fread(Str, sizeof(char), count, text);
if(readsize != count) {
printf("woops - something bad happened\n");
}
// do stuff with it
// ...
fclose(text);
But your string is not null terminated this way. That will get you in some trouble if you try to use some common string functions like strlen.
To properly null terminate your string you would have to allocate space for one additional character and set that last one to '\0':
...
// allocate count + 1 (for the null terminator)
Str = (char*)malloc((count + 1) * sizeof(char));
// go back to the beginning
fseek(text, 0L, SEEK_SET);
// read
ssize_t readsize = fread(Str, sizeof(char), count, text);
if(readsize != count) {
printf("woops - something bad happened\n");
}
// add null terminator
Str[count] = '\0';
// do stuff with it
// ...
fclose(text);
Now, if you want to know the number of characters in the file without counting them one by one, you can get that number in a more efficient way:
...
text = fopen("text.txt", "r");
// seek to the end of the file
fseek(text, 0L, SEEK_END);
// get your current position in that file
count = ftell(text);
// allocate count + 1 (for the null terminator)
Str = (char*)malloc((count + 1) * sizeof(char));
...
Now, putting this together in a more structured form:
// open file
FILE *text = fopen("text.txt", "r");
// seek to the end of the file
fseek(text, 0L, SEEK_END);
// get your current position in that file
ssize_t count = ftell(text);
// allocate count + 1 (for the null terminator)
char* Str = (char*)malloc((count + 1) * sizeof(char));
// go back to the beginning
fseek(text, 0L, SEEK_SET);
// read
ssize_t readsize = fread(Str, sizeof(char), count, text);
if(readsize != count) {
printf("woops - something bad happened\n");
}
fclose(text);
// add null terminator
Str[count] = '\0';
// do stuff with it
// ...
Edit:
As Andrew Henle pointed out, not every FILE stream is seekable, and you can't even rely on being able to read the file again (or on the file having the same length/content when you read it again). Even though this is the accepted answer, if you don't know in advance what kind of file stream you're dealing with, his solution is definitely the way to go.
I want to get a multiline text file's content, and put it back into the file.
However, I have an issue with the file length.
The null terminator (0) that I add is after some strange characters.
Something wrong with my f_length ?
Edit : If I set the line-endings of my file to Unix (LF), I don't have the issue. So it seems that my code is incompatible with Windows line endings. How can I account for Windows text files ?
int main()
{
FILE *fp = NULL;
int f_length;
char *buffer = NULL;
size_t size = 0;
fp = fopen(FILENAME, "r+");
fseek(fp, 0, SEEK_END);
f_length = ftell(fp);
rewind(fp);
buffer = malloc((f_length + 1) * sizeof(*buffer));
fread(buffer, f_length, 1, fp);
buffer[f_length] = 0;
printf("%s\n", buffer);
fp = fopen(FILENAME, "w+");
fputs(buffer, fp);
fclose(fp);
return 0;
}
Use fp = fopen(FILENAME, "rb+"); instead. For text files, newline characters are translated while reading (you've already noticed that in comments). In some cases the translated form is shorter ("\r" or "\n" while the file itself contains "\r\n"), so f_length will be bigger than the amount of data actually read.
Or you can use line-by-line reading functions, they are made for text-mode files.
You are assuming the size of the file on disk is going to be equal to the number of bytes you will read. That is a valid assumption for a clean, binary file. It is not a valid assumption for a text file.
I'd suggest using the return value from fread instead of f_length as it reports the number of objects you actually read after any required read processing. You'll need to adjust your fread parameters to read 1-byte sized objects.
regarding:
buffer[f_length] = 0;
This places the '\0' too far into the buffer, which is why you see garbage characters. It is much better to capture the value returned by the call to fread() and set the '\0' using:
buffer[ <returnedValue> ] = '\0';
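Put together, that could look something like the sketch below. load_text is a hypothetical name, and the code assumes (as is the case on Windows and POSIX systems, though the C standard does not promise it) that ftell() on the text stream gives a value at least as large as the number of characters fread() will deliver.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: loads a text file as a C string.
   The ftell() value is used only to size the buffer; the terminator
   goes at the count fread() actually returned. Caller frees the result. */
char *load_text(const char *path)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL) return NULL;

    if (fseek(fp, 0, SEEK_END) != 0) { fclose(fp); return NULL; }
    long f_length = ftell(fp);
    if (f_length < 0) { fclose(fp); return NULL; }
    rewind(fp);

    char *buffer = malloc((size_t)f_length + 1);
    if (buffer == NULL) { fclose(fp); return NULL; }

    /* element size 1, count f_length: the return value is a byte count */
    size_t got = fread(buffer, 1, (size_t)f_length, fp);
    buffer[got] = '\0';

    fclose(fp);
    return buffer;
}
```

Note the swapped fread arguments compared with the question: with fread(buffer, f_length, 1, fp) the return value is only 0 or 1, which says nothing about how many bytes arrived.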
What is the difference between fread and fgets when reading in from a file?
I use the same fwrite statement, however when I use fgets to read in a .txt file it works as intended, but when I use fread() it does not.
I've switched from fgets/fputs to fread/fwrite when reading from and writing to a file. I've used fopen with rb/wb to read in binary rather than standard characters. I understand that fread will get \0 null bytes as well, rather than just single lines.
//while (fgets(buff,1023,fpinput) != NULL) //read in from file
while (fread(buff, 1, 1023, fpinput) != 0) // read from file
I expect to read in from a file to a buffer, put the buffer in shared memory, and then have another process read from shared memory and write to a new file.
When I use fgets() it works as intended with .txt files, but when using fread it adds a single line of ~300 characters into the buffer with a new line. Can't for the life of me figure out why.
fgets will stop when encountering a newline. fread does not. So fgets is typically only useful for text files, while fread can be used for both text and binary files.
From the C11 standard:
7.21.7.2 The fgets function
The fgets function reads at most one less than the number of characters specified by n from the stream pointed to by stream into the array pointed to by s. No additional characters are read after a new-line character (which is retained) or after end-of-file. A null character is written immediately after the last character read into the array.
7.21.8.1 The fread function
The fread function reads, into the array pointed to by ptr, up to nmemb elements whose size is specified by size, from the stream pointed to by stream. For each object, size calls are made to the fgetc function and the results stored, in the order read, in an array of unsigned char exactly overlaying the object. The file position indicator for the stream (if defined) is advanced by the number of characters successfully read. If an error occurs, the resulting value of the file position indicator for the stream is indeterminate. If a partial element is read, its value is indeterminate.
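To see the difference in practice, here is a tiny sketch with two hypothetical wrappers, one call each. Given a stream containing "one\ntwo\n", the first reports 4 (it stops after the first newline) and the second reports 8 (newlines are just bytes to fread).

```c
#include <stdio.h>
#include <string.h>

/* Length of what one fgets() call delivers: at most one line. */
size_t one_fgets(FILE *fp, char *buf, int cap)
{
    return fgets(buf, cap, fp) ? strlen(buf) : 0;
}

/* Number of bytes one fread() call delivers: it does not stop at
   newlines and does not add a terminating '\0'. */
size_t one_fread(FILE *fp, char *buf, size_t cap)
{
    return fread(buf, 1, cap, fp);
}
```

This also hints at the problem in the question: code written around fgets tends to treat the buffer as a string, while after fread the only reliable length is the return value.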
This snippet may make things clearer for you. It just copies a file in chunks.
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char ** argv)
{
if(argc != 3) {
printf("Usage: ./a.out src dst\n");
printf("Copies file src to dst\n");
exit(EXIT_SUCCESS);
}
const size_t chunk_size = 1024;
FILE *in, *out;
if(! (in = fopen(argv[1], "rb"))) exit(EXIT_FAILURE);
if(! (out = fopen(argv[2], "wb"))) exit(EXIT_FAILURE);
char * buffer;
if(! (buffer = malloc(chunk_size))) exit(EXIT_FAILURE);
size_t bytes_read;
do {
// fread returns the number of successfully read elements
bytes_read = fread(buffer, 1, chunk_size, in);
/* Insert any modifications you may */
/* want to do here */
// write bytes_read bytes from buffer to output file
if(fwrite(buffer, 1, bytes_read, out) != bytes_read) exit(EXIT_FAILURE);
// When we read less than chunk_size we are either done or an error has
// occurred. This error is not handled in this program.
} while(bytes_read == chunk_size);
free(buffer);
fclose(out);
fclose(in);
}
You mentioned in a comment below that you wanted to use this for byteswapping. Well, you can just use the following snippet. Just insert it where indicated in code above.
for(size_t i=0; i < bytes_read - bytes_read%2; i+=2) {
char tmp = buffer[i];
buffer[i] = buffer[i+1];
buffer[i+1] = tmp;
}
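If you'd rather keep the swap out of the copy loop, the same logic fits in a small function. This is only a sketch; swap_byte_pairs is a hypothetical name.

```c
#include <stddef.h>

/* Swaps each adjacent pair of bytes in place; a trailing odd byte
   is left untouched. */
void swap_byte_pairs(unsigned char *buf, size_t n)
{
    for (size_t i = 0; i + 1 < n; i += 2) {
        unsigned char tmp = buf[i];
        buf[i] = buf[i + 1];
        buf[i + 1] = tmp;
    }
}
```

In the copy program it would be called as swap_byte_pairs((unsigned char *)buffer, bytes_read) right after the fread. Keep in mind that an odd bytes_read in the middle of a file would split a pair across two chunks, so an even chunk_size (like the 1024 above) matters.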
I am attempting to read a file into a character array, but when I try to pass in a value for MAXBYTES of 100 (the arguments are FUNCTION FILENAME MAXBYTES), the length of the string array is 7.
FILE * fin = fopen(argv[1], "r");
if (fin == NULL) {
printf("Error opening file \"%s\"\n", argv[1]);
return EXIT_SUCCESS;
}
int readSize;
//get file size
fseek(fin, 0L, SEEK_END);
int fileSize = ftell(fin);
fseek(fin, 0L, SEEK_SET);
if (argc < 3) {
readSize = fileSize;
} else {
readSize = atof(argv[2]);
}
char *p = malloc(fileSize);
fread(p, 1, readSize, fin);
int length = strlen(p);
filedump(p, length);
As you can see, the memory allocation for p is always equal to fileSize. When I use fread, I am trying to read in the 100 bytes (readSize is set to 100 as it should be) and store them in p. However, strlen(p) results in 7 if I pass in that argument. Am I using fread wrong, or is there something else going on?
Thanks
That is the limitation with attempting to read text with fread. There is nothing wrong with doing so, but you must know whether the file contains something other than ASCII characters (such as the nul-character) and you certainly cannot treat any part of the buffer as a string until you manually nul-terminate it at some point.
fread does not guarantee the buffer will contain a nul-terminating character at all -- and it doesn't guarantee that the first character read will not be the nul-character.
Again, there is nothing wrong with reading an entire file into an allocated buffer. That's quite common; you just cannot treat what you have read as a string. That is a further reason why there are character-oriented, formatted, and line-oriented input functions (getchar, fgetc, fscanf, fgets and POSIX getline, to list a few). The formatted and line-oriented functions guarantee a nul-terminated buffer; otherwise, you are on your own to account for what you have read, and to ensure you nul-terminate your buffer before treating it as a string.
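As a rough illustration, here is a sketch that keeps the count fread returns and terminates the buffer there. read_bytes is a hypothetical name, and the caller is assumed to provide max + 1 bytes of space.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads up to max bytes into buf, which must hold
   max + 1 bytes, and nul-terminates after the last byte read.
   Returns the number of bytes read. */
size_t read_bytes(FILE *fp, char *buf, size_t max)
{
    size_t got = fread(buf, 1, max, fp);
    buf[got] = '\0';
    return got;
}
```

If the data is the 7 bytes "abc\0def", read_bytes returns 7 while strlen(buf) gives 3, because strlen stops at the first nul-character wherever it happens to be. The returned count is the length to pass on to something like filedump.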
#define FILENAME "/local/home/..."
FILE *fp;
short *originalUnPacked;
short *unPacked;
int fileSize;
fp = fopen(FILENAME, "r");
fseek (fp , 0 , SEEK_END);
fileSize = ftell (fp);
rewind (fp);
originalUnPacked = (short*) malloc (sizeof(char)*fileSize);
unPacked = (short*) malloc (sizeof(char)*fileSize);
fread(unPacked, 1, fileSize, fp);
fread(originalUnPacked, 1, fileSize, fp);
if( memcmp( unPacked, originalUnPacked, fileSize) == 0)
{
print (" unpacked and original unpacked equal ") // Not happens
}
My little knowledge of C says that the print statement in the last if block should be printed, but it isn't. Any ideas why?
Just to add more clarity and show you the complete code i have added a define statement and two fread statement before the if block.
Few points for your consideration:
1. The return type of ftell is long int, so it is better to declare fileSize as long int (as sizeof(int) <= sizeof(long)).
2. It is a better practice in C not to typecast the return value of malloc. Also you can probably get rid of sizeof(char) when using in malloc.
3. fread advances the file stream thus after the first fread call the file stream pointer has advanced by the size of the file as dictated by fileSize. Thus the second fread immediately after that will fail to read anything (assuming the first one succeeded). This is the reason why you are seeing the behavior mentioned in your program. You need to reset the file stream pointer using rewind before the second call to fread. Also you can check the return value of fread which is the number of bytes successfully read to check how many bytes were actually read successfully. Try something on these lines:
size_t bytes_read;
bytes_read = fread(unPacked, 1, fileSize, fp);
/* some check or print of bytes read successfully if needed */
/* Reset fp so that the second fread can load the file into the memory pointed to by originalUnPacked */
rewind(fp);
bytes_read = fread(originalUnPacked, 1, fileSize, fp);
/* some check or print of bytes read successfully if needed */
/* memcmp etc */
4. It may be a good idea to check for the return values of fopen, malloc etc against failure i.e. NULL check in case of fopen & malloc.
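Putting the four points together, a checked version could look roughly like this. It is a sketch only; reads_match is a hypothetical name, and the file is opened in binary mode so that the size from ftell matches what fread delivers.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: reads the file at path twice and compares the
   two copies. Returns 1 if they match, 0 if they differ, -1 on error. */
int reads_match(const char *path)
{
    int result = -1;
    long fileSize;                      /* ftell returns long, not int */
    size_t size;
    char *a = NULL, *b = NULL;

    FILE *fp = fopen(path, "rb");
    if (fp == NULL) return -1;

    if (fseek(fp, 0, SEEK_END) != 0) goto done;
    fileSize = ftell(fp);
    if (fileSize < 0) goto done;
    rewind(fp);

    size = (size_t)fileSize;
    a = malloc(size ? size : 1);        /* avoid a zero-size request */
    b = malloc(size ? size : 1);
    if (a == NULL || b == NULL) goto done;

    if (fread(a, 1, size, fp) != size) goto done;
    rewind(fp);                         /* without this, the second read gets nothing */
    if (fread(b, 1, size, fp) != size) goto done;

    result = (memcmp(a, b, size) == 0);

done:
    free(a);
    free(b);
    fclose(fp);
    return result;
}
```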
Hope this helps!
The memory allocated with malloc is not initialized, so its contents are indeterminate and thus almost certainly different for the two allocations.
The expected (probabilistically speaking, "certain") result is exactly what happens.
Did you mean to load the file into both of these buffers before testing with memcmp but forgot to do so?