I am attempting to write a BitTorrent client. To parse a torrent file I first need to read it into memory, but I have noticed that fread is not reading the entire file into my buffer. After further investigation, it appears that fread stops reading whenever the symbol shown below is encountered in the file. Calling feof on the FILE* pointer returns 16, indicating that the end of file has been reached. This occurs no matter where the symbol is placed. Can somebody explain why this happens and suggest a solution?
The symbol is highlighted below:
Here is the code that does the read operation:
char *read_file(const char *file, long long *len){
struct stat st;
char *ret = NULL;
FILE *fp;
//store the size/length of the file
if(stat(file, &st)){
return ret;
}
*len = st.st_size;
//open a stream to the specified file
fp = fopen(file, "r");
if(!fp){
return ret;
}
//allocate space in the buffer for the file
ret = (char*)malloc(*len);
if(!ret){
return NULL;
}
//Break down the call to fread into smaller chunks
//to account for a known bug which causes fread to
//behave strangely with large files
//Read the file into the buffer
//fread(ret, 1, *len, fp);
if(*len > 10000){
char *retTemp = NULL;
retTemp = ret;
int remaining = *len;
int read = 0, error = 0;
while(remaining > 1000){
read = fread(retTemp, 1, 1000, fp);
if(read < 1000){
error = feof(fp);
if(error != 0){
printf("Error: %d\n", error);
}
}
retTemp += 1000;
remaining -= 1000;
}
fread(retTemp, 1, remaining, fp);
} else {
fread(ret, 1, *len, fp);
}
//cleanup by closing the file stream
fclose(fp);
return ret;
}
Thank you for your time :)
Your question is oddly relevant as I recently ran into this problem in an application here at work last week!
The ASCII value of this character is decimal 26 (0x1A, \SUB, SUBSTITUTE). This is used to represent the CTRL+Z key sequence or an End-of-File marker.
To get around this on Windows, change your fopen mode (the documentation says: "In [Text] mode, CTRL+Z is interpreted as an end-of-file character on input."):
fp = fopen(file, "rb"); /* b for 'binary', disables Text-mode translations */
You should open the file in binary mode. Some platforms, in text (default) mode, interpret some bytes as being physical end of file markers.
You're opening the file in text rather than raw/binary mode, and the arrow is the character (ASCII 26) that text mode treats as EOF. Specify "rb" rather than just "r" in your fopen call.
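To make the advice in these answers concrete, here is a minimal sketch of reading a whole file in binary mode. The name read_binary_file and the starting capacity of 4096 are my own choices for illustration, not something from the question. It grows the buffer as it reads instead of trusting a size obtained beforehand, and it uses ferror to tell a real error from a normal end of file.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read a whole file in binary mode.
 * On success returns a malloc'd buffer and stores its length in *len;
 * returns NULL on any failure. The caller frees the buffer. */
unsigned char *read_binary_file(const char *path, size_t *len)
{
    assert(path != NULL && len != NULL);

    FILE *fp = fopen(path, "rb");   /* "b": no Ctrl+Z or CR LF handling */
    if (!fp)
        return NULL;

    size_t cap = 4096, used = 0;
    unsigned char *buf = malloc(cap);
    if (!buf) {
        fclose(fp);
        return NULL;
    }

    size_t n;
    while ((n = fread(buf + used, 1, cap - used, fp)) > 0) {
        used += n;
        if (used == cap) {           /* buffer full: double it */
            unsigned char *tmp = realloc(buf, cap * 2);
            if (!tmp) {
                free(buf);
                fclose(fp);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
    }

    if (ferror(fp)) {                /* the loop ended on an error, not EOF */
        free(buf);
        fclose(fp);
        return NULL;
    }

    fclose(fp);
    *len = used;
    return buf;
}
```

With this, a byte with value 26 (0x1A) in the middle of a torrent file is just another byte, and *len tells you how much data you actually got.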
Related
I have some C code that simply reads a line from a txt file. The file has only one line, shown below:
The code snippet to read this line is:
int** readFile(char* filename){
int col=0, row =0;
int i=0;
int* numList[2048];
for(int i = 0; i<2048; i++)
numList[i] = (int*) malloc(6*sizeof(int));
if(NULL == numList){
printf("Memory error!");
}
char * token = NULL;
char currentLine[25] = {'\0'};
FILE* file = fopen(filename, "r");
if(NULL != file){
printf("File is opened successfully\n");
if( NULL != fgets (currentLine, 60, file) )
{
int i = 0;
while (NULL != currentLine[i]){
printf("%d ", currentLine[i]);
i++;
}
}
}
else
{
printf("File I/O Error");
return NULL;
}
fclose(file);
return numList;
}
When this code runs, I get the following output:
I observed something suspicious: as you can see in the first screenshot (content of the txt file), Notepad++ shows CR LF at the end of the line, but in the output the last character is 10, which is LF.
I am probably missing a very basic point, but I can't understand why the CR character is not there.
Needless to say, the platform is Windows and this is a console program.
Thanks&Regards.
You're opening the file in text mode. This mode ensures you can handle text files the same on any platform.
C specifies '\n' as the end of line. In Windows, the end of line is the sequence "\r\n". C (in this case, the standard library implementing stdio) will automatically translate this for you. Reading from a file on Windows in text mode will give you just \n for \r\n.
If you want to see exactly the byte contents of the file, you have to open it in binary mode instead:
FILE* file = fopen(filename, "rb");
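If you want to check what is really stored in the file, a small sketch like the following counts the CR bytes by reading in binary mode. The function name count_cr_bytes is hypothetical, chosen only for this example. For the one-line file in the question I would expect it to report 1, assuming the line really ends in CR LF as Notepad++ shows.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: count carriage-return bytes (value 13) stored in a
 * file. Returns -1 if the file cannot be opened. */
long count_cr_bytes(const char *path)
{
    assert(path != NULL);

    FILE *fp = fopen(path, "rb");   /* binary mode: bytes arrive untranslated */
    if (!fp)
        return -1;

    long count = 0;
    int c;                           /* int, so EOF is distinguishable */
    while ((c = getc(fp)) != EOF) {
        if (c == '\r')
            count++;
    }

    fclose(fp);
    return count;
}
```

Opening the same file with "r" on Windows and counting again should give 0, because the translation has already removed the CR before your code sees it.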
I'm using the fopen with fread for this:
FILE *fp;
if (fopen_s(&fp, filePath, "rb"))
{
printf("Failed to open file\n");
//exit(1);
}
fseek(fp, 0, SEEK_END);
int size = ftell(fp);
rewind(fp);
char buffer = (char)malloc(sizeof(char)*size);
if (!buffer)
{
printf("Failed to malloc\n");
//exit(1);
}
int charsTransferred = fread(buffer, 1, size, fp);
printf("charsTransferred = %d, size = %d\n", charsTransferred, strlen(buffer));
fclose(fp);
I'm not getting the file data in the new file. Here is a comparison between the original file (right) and the one that was sent over the network (left):
Any issues with my fopen calls?
EDIT: I can't do away with the null terminators, because this is a PDF. If i get rid of them the file will corrupt.
Be reassured: the way you're doing the read ensures that you're reading all the data:
- you're using "rb", so even on Windows you're covered against CR+LF conversions
- you're computing the size correctly, using ftell when at the end of the file
- you rewind the file
- you allocate the right amount of memory.
BUT you're not storing the right variable type:
char buffer = (char)malloc(sizeof(char)*size);
should be
char *buffer = malloc(size);
(That's very wrong and you should correct it, but since you successfully print some data, it's not the main issue. Next time, enable and read the compiler warnings. And don't cast the return value of malloc; it's error-prone, especially in your case.)
Now, what confuses you is the display using printf and strlen.
Since the file is binary, it contains a \0 somewhere, so printing the buffer as a string with printf shows only the start of the file. If you want to print the contents, you have to loop and print each character (using charsTransferred as the limit).
The same goes for strlen, which stops at the first \0 character.
The value in charsTransferred is correct.
To display the data, you could use fwrite to stdout (redirect the output or this can crash your terminal because of all the junk chars)
fwrite(buffer, 1, size, stdout);
Or loop and print only if the char is printable (I'd compare ascii codes for instance)
int charsTransferred = fread(buffer, 1, size, fp);
int i;
for (i=0;i<charsTransferred;i++)
{
unsigned char b = buffer[i];
putchar((b >= ' ') && (b < 127) ? b : '-'); // '-' is a char; "-" would be a pointer
if (i % 80 == 79) putchar('\n'); // optional linefeed every 80 characters
}
fflush(stdout);
That code prints dashes for characters outside the standard printable ASCII range, and the real character otherwise.
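Another way to inspect binary data without strlen or %s getting in the way is a hex dump driven by an explicit length. This is only a sketch under my own assumptions: to_hex is a name I made up, and the length you pass would be the value fread returned (charsTransferred in the question).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: write len bytes of data as lowercase hex into out.
 * out must have room for 2 * len + 1 characters. */
void to_hex(const unsigned char *data, size_t len, char *out)
{
    static const char digits[] = "0123456789abcdef";
    assert(data != NULL && out != NULL);

    for (size_t i = 0; i < len; i++) {
        out[2 * i]     = digits[data[i] >> 4];     /* high nibble */
        out[2 * i + 1] = digits[data[i] & 0x0F];   /* low nibble */
    }
    out[2 * len] = '\0';             /* the hex text itself is a C string */
}
```

Because the loop runs to len and never looks for a terminator, a \0 byte inside a PDF simply shows up as "00" in the output.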
I'm making a program to split a file into N smaller parts
of (almost) equal sizes. So here's my code:
FILE * fp = fopen(file,"r");
long aux;
long cursor = 0;
long blockSize = 1024000; //supose each smaller file will have 1 MB
long bytesLimit = blockSize;
for( i = 0 ; i < n ; i++) {
FILE * fp_aux = fopen( outputs[i] , "w"); //outputs is an array of temporary file names
while(cursor < bytesLimit) { //here occurs the infinite loop
fscanf(fp,"%lu\n",&aux);
fprintf(fp_aux,"%lu\n",aux);
cursor = ftell(fp);
}
fclose(fp_aux);
bytesLimit = bytesLimit + blockSize;
}
//here add some more logic to get the remaining content left in the main file
The code works if I want to split the file into two or three parts, but when I try to split it into 10 parts, fscanf locks on reading the same number and stays on an infinite loop there.
My input file has the format "%lu\n" like below:
1231231
4341342
4564565
...
If splitting a file is the focus, then simplify your method. Because your post indicates you are working with a text file, the assumption is that it contains words with punctuation, numbers, linefeeds etc. With this type of content, it can be parsed into lines using fgets()/fputs(). This will allow you to read lines from one large file, tracking accumulated size as you go, and writing lines to several smaller files...
Some simple steps:
1) determine file size of file to be split
2) Set desired small file size.
3) open large file
4) Use fgets/fputs in a loop, opening and closing files to split contents, using accumulated size as split point.
5) Clean up. (fclose files etc.)
Here is an example that will illustrate these steps. This splits a large text file by size, regardless of text content. (I used a text file of about 130K and split it into segments of 5K.)
#include <stdio.h>
#include <string.h>
#define SEGMENT 5000 //approximate target size of small file
long file_size(char *name);//prototype; function definition below
int main(void)
{
int segments=0, i, accum;
FILE *fp1, *fp2;
char filename[260]={"c:\\play\\smallFileName_"};//base name for small files.
char largeFileName[]={"c:\\play\\largeFileName.txt"};//change to your path
char smallFileName[260];
char line[1080];
//largeFileName must be declared before it is used here
long sizeFile = file_size(largeFileName);
segments = sizeFile/SEGMENT + 1;//ensure end of file
fp1 = fopen(largeFileName, "r");
if(fp1)
{
for(i=0;i<segments;i++)
{
accum = 0;
sprintf(smallFileName, "%s%d.txt", filename, i);
fp2 = fopen(smallFileName, "w");
if(fp2)
{
while(accum <= SEGMENT && fgets(line, 1080, fp1)) //test accum first so no line is read and then dropped
{
accum += strlen(line);//track size of growing file
fputs(line, fp2);
}
fclose(fp2);
}
}
fclose(fp1);
}
return 0;
}
long file_size(char *name)
{
FILE *fp = fopen(name, "rb"); //must be binary read to get bytes
long size=-1;
if(fp)
{
fseek (fp, 0, SEEK_END);
size = ftell(fp)+1;
fclose(fp);
}
return size;
}
If the file contains data that doesn't match the unsigned long format, fscanf fails to convert it and leaves the offending input unread, so the file position for fp doesn't advance. The loop then calls fscanf at the same position and fails again, forever. The same thing happens once the end of the file is reached: fscanf returns EOF and cursor never grows.
To prevent this, check the return value of fscanf and leave the loop unless it reports the expected number of conversions (here, 1).
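A minimal sketch of that check follows. The function copy_numbers and its limit parameter are hypothetical, introduced only to show the loop condition; it counts values rather than bytes, so it is not a drop-in replacement for the splitting loop in the question.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: copy unsigned long values, one per line, from in to
 * out until limit values are copied or the input stops matching.
 * Returns the number of values copied. */
long copy_numbers(FILE *in, FILE *out, long limit)
{
    assert(in != NULL && out != NULL);

    long copied = 0;
    unsigned long value;             /* matches the %lu conversion */

    /* fscanf returns 1 only when a value was converted; 0 means bad input
     * and EOF means end of file or a read error. Either one ends the loop. */
    while (copied < limit && fscanf(in, "%lu", &value) == 1) {
        fprintf(out, "%lu\n", value);
        copied++;
    }
    return copied;                   /* caller can inspect feof(in) / ferror(in) */
}
```

After the call, feof(in) tells you whether the input simply ran out; if it is not set and fewer than limit values were copied, the next item in the file was not a number.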
If you want to split a file into several parts with a specified maximum file size of each part, why do you use fscanf(..), ftell(..) and fprintf(..)?
This is not the fastest way to achieve your goal...
I recommend doing it in this way:
1) Open input file
2) As long as there is input data (!feof(..)):
- Open output file (if not already open)
- Read block of input data (fread)
- Write block of data to output file (fwrite)
- Track number of bytes written and close output file if maximum file size is reached
3) Go back to step 2.
4) Clean up
Done this way, the split files will not exceed a specific maximum file size. Additionally, you avoid slow formatted I/O functions like fprintf.
A possible implementation would look like this:
/*
** splitFile
** Splits an existing input file into multiple output files with a specified
** maximum file size.
**
** Return Value:
** Number of created result files, or 0 in case of bad input data or a negative
** value in case of an error during file splitting.
*/
int splitFile(char *fileIn, size_t maxSize)
{
int result = 0;
FILE *fIn;
FILE *fOut;
char buffer[1024 * 16];
size_t size;
size_t read;
size_t written;
if ((fileIn != NULL) && (maxSize > 0))
{
fIn = fopen(fileIn, "rb");
if (fIn != NULL)
{
fOut = NULL;
result = 1; /* we have at least one part */
while (!feof(fIn))
{
/* initialize (next) output file if no output file opened */
if (fOut == NULL)
{
sprintf(buffer, "%s.%03d", fileIn, result);
fOut = fopen(buffer, "wb");
if (fOut == NULL)
{
result *= -1;
break;
}
size = 0;
}
/* calculate size of data to be read from input file in order to not exceed maxSize */
read = sizeof(buffer);
if ((size + read) > maxSize)
{
read = maxSize - size;
}
/* read data from input file */
read = fread(buffer, 1, read, fIn);
if (read == 0)
{
if (ferror(fIn))
{
result *= -1; /* genuine read error */
}
break; /* otherwise end of input: not an error */
}
/* write data to output file */
written = fwrite(buffer, 1, read, fOut);
if (written != read)
{
result *= -1;
break;
}
/* update size counter of current output file */
size += written;
if (size >= maxSize) /* next split? */
{
fclose(fOut);
fOut = NULL;
result++;
}
}
/* clean up */
if (fOut != NULL)
{
fclose(fOut);
}
fclose(fIn);
}
}
return (result);
}
The above code split a test file with a size of 126803945 bytes into 121 1MB parts in about 500ms.
Note that the size of buffer (here: 16KB) affects how fast a file is split: the bigger the buffer, the faster a huge file is split. If you want to use really large buffers (>1MB or so), you have to allocate the buffer on the heap and free it on each call (or use a static buffer if you do not need reentrant code).
I am trying to write a buffer so I can remove a lot of null "00" characters in a file. The characters are useless and are completely random. They are wreaking havoc on the searcher in the program. The code below compiles but just seems to hang when a file is passed to it. Any suggestions would be helpful.
void ReadFile(char *name)
{
FILE *dbg;
char *buffer;
unsigned long fileLen;
//Open file
dbg = fopen(dbg, "w+");
if (!dbg)
{
fprintf(stderr, "Unable to open file %s", name);
return;
}
//Get file length
fseek(dbg, 0, SEEK_END);
fileLen = ftell(dbg);
fseek(dbg, 0, SEEK_SET);
//Allocate memory
buffer = (char *)malloc(fileLen+1);
if (!buffer)
{
fprintf(stderr, "Memory error!");
fclose(dbg);
return;
}
//Read file contents into buffer
fread(buffer, fileLen, 1, dbg);
for(i = fileLen-1; i >= 0 && buffer[i] == 0; i--);
i++;
if (i > 0)
{
fwrite(buffer, 1, i, dbg);
}
fclose(dbg);
//Do what ever with buffer
free(buffer);
}
Change
dbg = fopen(dbg, "w+");
to
dbg = fopen(name, "w+");
Also, if you want to read the file, change it, and then write it, you shouldn't open it with "w+", because that mode truncates the file to zero length before you can read anything. First open the file with "r", read from it, make whatever changes you want, and fclose it; then open it again with "w" and write the modified buffer back, overwriting the old contents. Since your data contains null bytes, prefer the binary variants "rb" and "wb".
You opened a file for writing and then you try to read from it.
Check the return value of fread and all the other calls.
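Putting these answers together, a read-then-rewrite version might look like the sketch below. It only strips the trailing zero bytes, which is what the loop in the question appears to be doing; the name strip_trailing_nuls and the 0 / -1 return convention are my own assumptions. Note that it rewrites the file in place, so a failed write can lose data; a more careful tool would probably write to a temporary file first.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: rewrite the file at path without its trailing 0 bytes.
 * Returns 0 on success, -1 on failure. */
int strip_trailing_nuls(const char *path)
{
    assert(path != NULL);

    /* Pass 1: read everything, in binary mode because of the 0 bytes. */
    FILE *fp = fopen(path, "rb");
    if (!fp)
        return -1;
    if (fseek(fp, 0, SEEK_END) != 0) { fclose(fp); return -1; }
    long size = ftell(fp);
    if (size < 0) { fclose(fp); return -1; }
    rewind(fp);

    unsigned char *buf = malloc((size_t)size + 1);
    if (!buf) { fclose(fp); return -1; }
    size_t got = fread(buf, 1, (size_t)size, fp);
    fclose(fp);
    if (got != (size_t)size) { free(buf); return -1; }

    /* Find the new length: drop 0 bytes from the end. An unsigned index
     * with a "> 0" test avoids the wrap-around of "i >= 0". */
    size_t keep = got;
    while (keep > 0 && buf[keep - 1] == 0)
        keep--;

    /* Pass 2: only now truncate the file and write the kept bytes back. */
    fp = fopen(path, "wb");
    if (!fp) { free(buf); return -1; }
    size_t put = fwrite(buf, 1, keep, fp);
    free(buf);
    if (fclose(fp) != 0 || put != keep)
        return -1;
    return 0;
}
```

Every fopen, fseek, ftell, fread, fwrite and fclose result is checked, which is what would have pointed at the real problem in the original code.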
I have a text file named test.txt
I want to write a C program that can read this file and print the content to the console (assume the file contains only ASCII text).
I don't know how to get the size of my string variable. Like this:
char str[999];
FILE * file;
file = fopen( "test.txt" , "r");
if (file) {
while (fscanf(file, "%s", str)!=EOF)
printf("%s",str);
fclose(file);
}
The size 999 doesn't work because the string returned by fscanf can be larger than that. How can I solve this?
The simplest way is to read a character, and print it right after reading:
int c;
FILE *file;
file = fopen("test.txt", "r");
if (file) {
while ((c = getc(file)) != EOF)
putchar(c);
fclose(file);
}
c is int above, since EOF is a negative number, and a plain char may be unsigned.
If you want to read the file in chunks, but without dynamic memory allocation, you can do:
#define CHUNK 1024 /* read 1024 bytes at a time */
char buf[CHUNK];
FILE *file;
size_t nread;
file = fopen("test.txt", "r");
if (file) {
while ((nread = fread(buf, 1, sizeof buf, file)) > 0)
fwrite(buf, 1, nread, stdout);
if (ferror(file)) {
/* deal with error */
}
fclose(file);
}
The second method above is essentially how you will read a file with a dynamically allocated array:
char *buf = malloc(chunk);
if (buf == NULL) {
/* deal with malloc() failure */
}
/* otherwise do this. Note 'chunk' instead of 'sizeof buf' */
while ((nread = fread(buf, 1, chunk, file)) > 0) {
/* as above */
}
Your method of fscanf() with %s as format loses information about whitespace in the file, so it is not exactly copying a file to stdout.
There are plenty of good answers here about reading it in chunks, I'm just gonna show you a little trick that reads all the content at once to a buffer and prints it.
I'm not saying it's better. It's not, and as Ricardo said, sometimes it can be bad, but I find it's a nice solution for the simple cases.
I sprinkled it with comments because there's a lot going on.
#include <stdio.h>
#include <stdlib.h>
char* ReadFile(char *filename)
{
char *buffer = NULL;
long string_size;
size_t read_size;
// Binary mode: in text mode on Windows, CR LF translation makes
// fread return fewer bytes than ftell reported
FILE *handler = fopen(filename, "rb");
if (handler)
{
// Seek to the end of the file
fseek(handler, 0, SEEK_END);
// Offset of the end of the file, or in other words, filesize
string_size = ftell(handler);
// go back to the start of the file
rewind(handler);
// Allocate a string that can hold it all, plus the terminator
buffer = (string_size >= 0) ? malloc((size_t)string_size + 1) : NULL;
if (buffer)
{
// Read it all in one operation
read_size = fread(buffer, sizeof(char), (size_t)string_size, handler);
// fread doesn't set it so put a \0 in the last position
// and buffer is now officially a string
buffer[string_size] = '\0';
if ((size_t)string_size != read_size)
{
// Something went wrong, throw away the memory and set
// the buffer to NULL
free(buffer);
buffer = NULL;
}
}
// Always remember to close the file.
fclose(handler);
}
return buffer;
}
int main()
{
char *string = ReadFile("yourfile.txt");
if (string)
{
puts(string);
free(string);
}
return 0;
}
Let me know if it's useful or you could learn something from it :)
Instead, just print the characters directly to the console, because the text file may be very large and reading it all at once may require a lot of memory.
#include <stdio.h>
#include <stdlib.h>
int main() {
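FILE *f;
int c; // int, not char, so that EOF can be told apart from real characters
f=fopen("test.txt","r");
if(f==NULL){
return EXIT_FAILURE;
}
while((c=fgetc(f))!=EOF){
printf("%c",c);
}
fclose(f);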
return 0;
}
Use "read()" instead of fscanf:
ssize_t read(int fildes, void *buf, size_t nbyte);
DESCRIPTION
The read() function shall attempt to read nbyte bytes from the file associated with the open file descriptor, fildes, into the buffer pointed to by buf.
Here is an example:
http://cmagical.blogspot.com/2010/01/c-programming-on-unix-implementing-cat.html
Working part from that example:
f=open(argv[1],O_RDONLY);
while ((n=read(f,l,80)) > 0)
write(1,l,n);
An alternate approach is to use getc/putc to read/write 1 char at a time. A lot less efficient. A good example: http://www.eskimo.com/~scs/cclass/notes/sx13.html
You can use fgets and limit the size of the read string.
char *fgets(char *str, int num, FILE *stream);
You can change the while in your code to:
while (fgets(str, sizeof str, file)) printf("%s", str);
Two approaches leap to mind.
First, don't use scanf. Use fgets() which takes a parameter to specify the buffer size, and which leaves any newline characters intact. A simple loop over the file that prints the buffer content should naturally copy the file intact.
Second, use fread() or the common C idiom with fgetc(). These would process the file in fixed-size chunks or a single character at a time.
If you must process the file over white-space delimited strings, then use either fgets or fread to read the file, and something like strtok to split the buffer at whitespace. Don't forget to handle the transition from one buffer to the next, since your target strings are likely to span the buffer boundary.
If there is an external requirement to use scanf to do the reading, then limit the length of the string it might read with a maximum field width in the format specifier. In your case, with a 999 byte buffer, say scanf("%998s", str); which will write at most 998 characters to the buffer, leaving room for the nul terminator. If single strings longer than your buffer are allowed, then you would have to process them in two pieces. If not, you have an opportunity to tell the user about an error politely without creating a buffer overflow security hole.
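To illustrate the field width, here is a small sketch with a deliberately tiny buffer so the effect is easy to see. The function count_pieces and the width of 7 are assumptions made up for the example; with the 999 byte buffer from the question the format would be "%998s".

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: count the pieces fscanf returns when each read is
 * limited to 7 characters. Words longer than that arrive in several pieces. */
long count_pieces(FILE *in)
{
    assert(in != NULL);

    char piece[8];                   /* 7 characters + terminating '\0' */
    long pieces = 0;

    /* %7s stores at most 7 non-whitespace characters, then adds '\0',
     * so piece can never overflow. */
    while (fscanf(in, "%7s", piece) == 1)
        pieces++;
    return pieces;
}
```

A 20 character word comes back as three pieces of 7, 7 and 6 characters, which is the "process them in two pieces" situation described above: your code has to notice that a piece filled the buffer and decide whether to join it with the next one or report an error.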
Regardless, always validate the return values and think about how to handle bad, malicious, or just malformed input.
You can use getline() to read your text file without worrying about large lines:
getline() reads an entire line from stream, storing the address of the buffer containing the text into *lineptr. The buffer is null-terminated and includes the newline character, if one was found.
If *lineptr is set to NULL before the call, then getline() will allocate a buffer for storing the line. This buffer should be freed by the user program even if getline() failed.
bool read_file(const char *filename)
{
FILE *file = fopen(filename, "r");
if (!file)
return false;
char *line = NULL;
size_t linesize = 0;
while (getline(&line, &linesize, file) != -1) {
printf("%s", line); /* getline reuses and grows the buffer; don't free it here */
}
free(line); /* free once, after the loop */
fclose(file);
return true;
}
You can use it like this:
int main(void)
{
if (!read_file("test.txt")) {
printf("Error reading file\n");
exit(EXIT_FAILURE);
}
}
I use this version
char* read(const char* filename){
FILE* f = fopen(filename, "rb");
if (f == NULL){
exit(1);
}
fseek(f, 0L, SEEK_END);
long size = ftell(f)+1;
fclose(f);
f = fopen(filename, "r");
char* content = calloc(size, 1); /* zero-filled, so the result is always terminated */
if (f == NULL || content == NULL){
exit(1);
}
fread(content, 1, size-1, f);
fclose(f);
return (char*) content;
}
You could read the entire file with dynamic memory allocation, but it isn't a good idea, because if the file is too big you could run into memory problems.
So it is better to read short parts of the file and print them.
#include <stdio.h>
#define BLOCK 1000
int main() {
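FILE *f=fopen("teste.txt","r");
size_t size;
char buffer[BLOCK];
if(f==NULL)
return 1;
// ...
// element size 1, count BLOCK: fread then returns the number of bytes read
while((size=fread(buffer,sizeof(char),BLOCK,f))>0)
fwrite(buffer,sizeof(char),size,stdout);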
fclose(f);
// ...
return 0;
}