fread is not reading whole file [duplicate] - c

What translation occurs when writing to a file that was opened in text mode that does not occur in binary mode? Specifically in MS Visual C.
unsigned char buffer[256];
for (int i = 0; i < 256; i++) buffer[i]=i;
int size = 1;
int count = 256;
Binary mode:
FILE *fp_binary = fopen(filename, "wb");
fwrite(buffer, size, count, fp_binary);
Versus text mode:
FILE *fp_text = fopen(filename, "wt");
fwrite(buffer, size, count, fp_text);

I believe that most platforms will ignore the "t" option or the "text-mode" option when dealing with streams. On Windows, however, this is not the case. If you take a look at the description of the fopen() function on MSDN, you will see that specifying the "t" option will have the following effect:
line feeds ('\n') will be translated to "\r\n" sequences on output
carriage return/line feed sequences will be translated to line feeds on input.
If the file is opened in append mode, the end of the file will be examined for a ctrl-z character (character 26) and that character removed, if possible. It will also interpret the presence of that character as being the end of file. This is an unfortunate holdover from the days of CP/M (something about the sins of the parents being visited upon their children up to the 3rd or 4th generation). Contrary to previously stated opinion, the ctrl-z character will not be appended.
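To make the effect of the list above measurable, here is a minimal sketch. The helper name write_and_measure and the file name used with it are made up for illustration; the idea is simply to write with a given mode and then count the bytes on disk through a binary stream.

```c
#include <stdio.h>

/* Hypothetical helper: write n bytes to path using the given fopen mode,
   then reopen the file in binary mode and count the bytes actually on disk.
   Returns the on-disk size, or -1 on error. */
long write_and_measure(const char *path, const char *mode,
                       const unsigned char *data, size_t n)
{
    FILE *fp = fopen(path, mode);
    if (!fp)
        return -1;
    size_t written = fwrite(data, 1, n, fp);
    if (fclose(fp) != 0 || written != n)
        return -1;

    fp = fopen(path, "rb");      /* binary: every byte is counted as-is */
    if (!fp)
        return -1;
    long size = 0;
    while (getc(fp) != EOF)
        size++;
    fclose(fp);
    return size;
}
```

With the question's 256-byte buffer, "wb" should give 256 on any platform. With "wt" under MS Visual C, I would expect 257, because the single 0x0A byte in the buffer gains a 0x0D in front of it.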

In text mode, a newline "\n" may be converted to a carriage return + newline "\r\n".
Usually you'll want to open in binary mode. Trying to read binary data in text mode won't work; the data will be corrupted. You can read text fine in binary mode, though. It just won't do the automatic translation between "\n" and "\r\n".
See fopen

Additionally, when you fopen a file with "rt", the input is terminated at a Ctrl-Z character.
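To find out where a text-mode read on Windows would stop, one option is to scan the file in binary mode for the first Ctrl-Z byte (0x1A). This is only a sketch; first_ctrl_z is a name I made up.

```c
#include <stdio.h>

/* Return the byte offset of the first Ctrl-Z (0x1A) in the file,
   or -1 if there is none or the file cannot be opened. */
long first_ctrl_z(const char *path)
{
    FILE *fp = fopen(path, "rb");   /* binary mode: 0x1A is just data */
    if (!fp)
        return -1;
    long offset = 0;
    int c;
    while ((c = getc(fp)) != EOF) {
        if (c == 0x1A) {
            fclose(fp);
            return offset;
        }
        offset++;
    }
    fclose(fp);
    return -1;
}
```

If this returns a non-negative offset for a file you intend to read with "rt" on Windows, the bytes after that offset will not be seen.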

Another difference shows up when using fseek:
If the stream is open in binary mode, the new position is exactly offset bytes measured from the beginning of the file if origin is SEEK_SET, from the current file position if origin is SEEK_CUR, and from the end of the file if origin is SEEK_END. Some binary streams may not support SEEK_END.
If the stream is open in text mode, the only supported values for offset are zero (which works with any origin) and a value returned by an earlier call to ftell on a stream associated with the same file (which only works with an origin of SEEK_SET).
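A small sketch of the text-mode rule: the only non-zero offset passed to fseek is one that ftell returned earlier on the same stream. The function name reread_from_saved_position is hypothetical, and the file is assumed to contain at least two lines.

```c
#include <stdio.h>
#include <string.h>

/* Read the second line of a text file, seek back to the position saved
   by ftell, and read it again.
   Returns 1 if both reads match, 0 if they differ, -1 on error. */
int reread_from_saved_position(const char *path)
{
    char skip[128], first[128], again[128];
    FILE *fp = fopen(path, "r");            /* text mode */
    if (!fp)
        return -1;

    if (!fgets(skip, sizeof skip, fp)) {    /* consume line 1 */
        fclose(fp);
        return -1;
    }
    long pos = ftell(fp);   /* treat as opaque: only hand it back to fseek */
    if (pos == -1L || !fgets(first, sizeof first, fp)) {
        fclose(fp);
        return -1;
    }
    if (fseek(fp, pos, SEEK_SET) != 0 || !fgets(again, sizeof again, fp)) {
        fclose(fp);
        return -1;
    }
    fclose(fp);
    return strcmp(first, again) == 0;
}
```

Doing arithmetic on pos (for example pos + 10) is what the text-mode rule does not allow; in binary mode that would be fine.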

Even though this question was already answered and clearly explained, I think it would be interesting to show the main issue (translation between \n and \r\n) with a simple code example. Note that I'm not addressing the issue of the Ctrl-Z character at the end of the file.
#include <stdio.h>
#include <string.h>
int main(void) {
    FILE *f;
    char string[] = "A\nB";
    int len = (int)strlen(string);
    printf("As you'd expect string has %d characters... ", len); /* prints 3 */
    f = fopen("test.txt", "w"); /* text mode */
    if (f == NULL) {
        perror("test.txt");
        return 1;
    }
    fwrite(string, 1, len, f); /* on Windows "A\r\nB" is written */
    printf("but %ld bytes were written to file", ftell(f)); /* prints 4 on Windows, 3 on Linux */
    fclose(f);
    return 0;
}
If you execute the program on Windows, you will see the following message printed:
As you'd expect string has 3 characters... but 4 bytes were written to file
Of course, you can also open the file in a text editor such as Notepad++ and see the characters for yourself.
The inverse conversion is performed on Windows when reading the file in text mode.

We had an interesting problem with opening files in text mode where the files had a mixture of line ending characters:
1\n\r
2\n\r
3\n
4\n\r
5\n\r
Our requirement was that we could store our current position in the file (we used fgetpos), close the file, and later reopen the file and seek to that position (we used fsetpos).
However, when a file had a mixture of line endings, this process failed to seek to the same position. In our case (our tool parses C++), we were re-reading parts of the file we'd already seen.
Go with binary - then you can control exactly what is read and written from the file.
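A sketch of the binary-mode approach, assuming the saved position is kept as a plain byte offset from ftell rather than an fpos_t. The helper names consume_bytes and byte_at are made up for illustration.

```c
#include <stdio.h>

/* Read up to `count` bytes from the start of the file in binary mode and
   return the resulting byte offset, or -1 on error. */
long consume_bytes(const char *path, size_t count)
{
    FILE *fp = fopen(path, "rb");
    if (!fp)
        return -1;
    for (size_t i = 0; i < count; i++) {
        if (getc(fp) == EOF)
            break;
    }
    long pos = ftell(fp);   /* in binary mode this is a byte count */
    fclose(fp);
    return pos;
}

/* Reopen the file, seek to a previously saved byte offset and return the
   byte found there, or EOF on error or end of file. */
int byte_at(const char *path, long offset)
{
    FILE *fp = fopen(path, "rb");
    if (!fp)
        return EOF;
    int c = EOF;
    if (fseek(fp, offset, SEEK_SET) == 0)
        c = getc(fp);
    fclose(fp);
    return c;
}
```

Because no translation happens, the offset means the same thing after the file is closed and reopened, whatever mixture of line endings it contains. The price is that your parser has to cope with the \r bytes itself.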

In "w" mode, the file is opened for writing as a text stream. The mode string by itself does not select an encoding such as UTF-8.
In "wb" mode, the file is opened for writing as a binary stream, and the bytes are written exactly as given, whatever encoding (UTF-8, UTF-16LE or others) they happen to be in.

Related

Open non text file without windows line ending

I took over a project that uses the following function to read files:
char *fetchFile(char *filename) {
    char *buffer;
    int len;
    FILE *f = fopen(filename, "rb");
    if(f) {
        if(verbose) {
            fprintf(stdout, "Opened file %s successfully\n", filename);
        }
        fseek(f, 0, SEEK_END);
        len = ftell(f);
        fseek(f, 0, SEEK_SET);
        if(verbose) {
            fprintf(stdout, "Allocating memory for buffer for %s\n", filename);
        }
        buffer = malloc(len + 1);
        if(buffer) {
            /* only touch the buffer if the allocation succeeded */
            fread(buffer, 1, len, f);
            buffer[len] = '\0';
        }
        fclose(f);
    } else {
        fprintf(stderr, "Error reading file %s\n", filename);
        exit(1);
    }
    return buffer;
}
The rb mode is used because sometimes the file can be a spreadsheet, and therefore I want the information as in a text file.
The program runs on a Linux machine, but the files to read come from both Linux and Windows.
I am not sure which approach is best for keeping Windows line endings from messing with my code.
I was thinking of using dos2unix at the start of this function.
I also thought of opening in r mode, but I believe that could potentially mess things up when opening non-text files.
I would like to understand better the differences between using:
dos2unix,
r vs rb mode,
or any other solution that would better fit the problem.
Note: I believe that I understand r vs rb modes, but I would appreciate an explanation of why it is a bad or good solution for this specific situation (I think it wouldn't be good because sometimes it opens spreadsheets, but I am not sure of that).
If my understanding is correct the rb mode is used because sometimes the file can be a spreadsheet and therefore the programs just want the information as in a text file.
You seem uncertain, and though perhaps you do understand correctly, your explanation does not give me any confidence in that.
C knows about two distinct kinds of streams: binary streams and text streams. A binary stream is simply an ordered sequence of bytes, written and / or read as-is without any kind of transformation. On the other hand,
A text stream is an ordered sequence of characters composed into
lines, each line consisting of zero or more characters plus a
terminating new-line character. Whether the last line requires a
terminating new-line character is implementation-defined. Characters
may have to be added, altered, or deleted on input and output to
conform to differing conventions for representing text in the host
environment. Thus, there need not be a one-to-one correspondence
between the characters in a stream and those in the external
representation. [...]
(C2011 7.21.2/2)
For some implementations, such as POSIX-compliant ones, this is a distinction without a difference. For other implementations, such as those targeting Windows, the difference matters. In particular, on Windows, text streams convert on the fly between carriage-return / line-feed pairs in the external representation and newlines (only) in the internal representation.
The b in your fopen() mode specifies that the file should be opened as a binary stream -- that is, no translation will be performed on the bytes read from the file. Whether this is the right thing to do depends on your environment and the application's requirements. This is moot on Linux or another Unix, however, as there is no observable difference between text and binary streams on such systems.
dos2unix converts carriage-return / line-feed pairs in the input file to single line-feed (newline) characters. This will convert a Windows-style text file or one with mixed Windows / Unix line terminators to Unix text file convention. It is irreversible if there are both Windows-style and Unix-style line terminators in the file, and it is furthermore likely to corrupt your file if it is not a text file in the first place.
If your inputs are sometimes binary files then opening in binary mode is appropriate, and conversion via dos2unix probably is not. If that's the case and you also need translation for text-file line terminators, then you first and foremost need a way to distinguish which case applies for any particular file -- for example, by command-line argument or by pre-analyzing the file via libmagic. You then must provide different handling for text files; your main options are
Perform the line terminator conversion in your own code.
Provide separate versions of the fetchFile() function for text and binary files.
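For the first option, a minimal sketch of doing the conversion yourself on a buffer that has already been read in binary mode. The name crlf_to_lf is made up, and the choice to leave a lone '\r' untouched is my assumption about what you would want.

```c
#include <stddef.h>

/* Convert every "\r\n" pair in buf[0..len) to a single "\n", in place.
   A '\r' that is not followed by '\n' is kept as-is.
   Returns the new length of the data. */
size_t crlf_to_lf(char *buf, size_t len)
{
    size_t out = 0;
    for (size_t in = 0; in < len; in++) {
        if (buf[in] == '\r' && in + 1 < len && buf[in + 1] == '\n')
            continue;           /* drop the '\r'; the '\n' is copied next */
        buf[out++] = buf[in];
    }
    return out;
}
```

In a function like fetchFile() this would be called after fread, followed by re-terminating the buffer at the new length, and only for files you have already decided are text.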
The code just copies the contents of a file to an allocated buffer. The UNIX way (YMMV) is to just memory map the file instead of reading it. Much faster.
// untested code
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
void *mapfile(const char *name)
{
    int fd;
    struct stat st;
    if ((fd = open(name, O_RDONLY)) == -1)
        return NULL;
    if (fstat(fd, &st)) {
        close(fd);
        return NULL;
    }
    /* mmap takes the file descriptor before the offset */
    void *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);
    if (p == MAP_FAILED)
        p = NULL;
    return p;
}
Something along these lines will work. Adjust settings if you want to write to the file as well.

How to duplicate an image file? [duplicate]

I am designing an image decoder, and as a first step I tried to just copy the file using C, i.e. open the file and write its contents to a new file. Below is the code that I used.
while((c=getc(fp))!=EOF)
fprintf(fp1,"%c",c);
where fp is the source file and fp1 is the destination file.
The program executes without any error, but the image file(".bmp") is not properly copied. I have observed that the size of the copied file is less and only 20% of the image is visible, all else is black. When I tried with simple text files, the copy was complete.
Do you know what the problem is?
Make sure that the type of the variable c is int, not char. In other words, post more code.
This is because the value of the EOF constant is typically -1, and if you read characters as char-sized values, every byte that is 0xff will look like the EOF constant. With the extra bits of an int, there is room to separate the two.
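A minimal sketch of the int-versus-char point, using a made-up helper count_bytes that walks a file with getc:

```c
#include <stdio.h>

/* Count the bytes in a file using getc. The result of getc is kept in an
   int, so a 0xFF data byte (returned as 255) stays distinct from EOF. */
long count_bytes(const char *path)
{
    FILE *fp = fopen(path, "rb");
    if (!fp)
        return -1;
    long n = 0;
    int c;                      /* int, not char */
    while ((c = getc(fp)) != EOF)
        n++;
    fclose(fp);
    return n;
}
```

If c were declared as char on a platform where char is signed, the loop would stop at the first 0xFF byte, which matches the symptom of a truncated copy.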
Did you open the files in binary mode? What are you passing to fopen?
It's one of the most "popular" C gotchas.
You should use fread and fwrite, copying a block at a time, with both files opened in binary mode:
FILE *fd1 = fopen("source.bmp", "rb");
FILE *fd2 = fopen("destination.bmp", "wb");
if(!fd1 || !fd2) {
    // handle open error
}
size_t l1;
unsigned char buffer[8192];
// Data to be read
while((l1 = fread(buffer, 1, sizeof buffer, fd1)) > 0) {
    size_t l2 = fwrite(buffer, 1, l1, fd2);
    if(l2 < l1) {
        if(ferror(fd2)) {
            // handle error
        } else {
            // handle media full
        }
    }
}
fclose(fd1);
fclose(fd2);
It's substantially faster to read in bigger blocks. Note that fread and fwrite do not bypass text-mode translation by themselves; it is opening the files with "rb" and "wb" that keeps \n from being transformed to \r\n in the output (on Windows and DOS) or \r (on old Macs).

How to save results of a function into text file in C

This function prints the length of each word as a row of '*' characters (a histogram). How can I save the results into a text file? I tried, but the program does not save the results (no errors).
void histogram(FILE *myinput)
{
    FILE *ptr;
    printf("\nsaving results...\n");
    ptr=fopen("results1.txt","wt");
    int j, n = 1, i = 0;
    size_t ln;
    char arr[100][10];
    while(n > 0)
    {
        n = fscanf(myinput, "%s",arr[i]);
        i++;
    }
    n = i;
    for(i = 0; i < n - 1; i++)
    {
        ln=strlen(arr[i]);
        fprintf(ptr,"%s \t",arr[i]);
        for(j=0;j<ln;j++)
            fprintf(ptr, "*");
        fprintf(ptr, "\n");
    }
    fclose(myinput);
    fclose(ptr);
}
I see two ways to take care of this issue:
Open a file in the program and write to it.
If running with command line, change the output location for standard out
$> ./histogram > outfile.txt
Using the '>' will change where standard out will write to. The issue with '>' is that it will truncate a file and then write to the file. This means that if there was any data in that file before, it is gone. Only the new data written by the program will be there.
If you need to keep the data in the file, you can change the standard out to append the file with '>>' as in the following example:
$> ./histogram >> outfile.txt
Also, there does not have to be a space between '>' and the file name. I just do that for preference. It could look like this:
$> ./histogram >outfile.txt
If writing to a file will be a one-time thing, changing standard out is probably the best way to go. If you are going to do it every time, then add it to the code.
You will need to open another FILE. You can do this in the function or pass it in like you did the file being read from.
Use 'fprintf' to write to the file:
int fprintf(FILE *restrict stream, const char *restrict format, ...);
Your program may have these lines added to write to a file:
FILE *myoutput = fopen("output.txt", "w"); // or "a" if you want to append
fprintf(myoutput, "%s \t",arr[i]);
Answer Complete
There may be some other issues as well that I will discuss now.
Your histogram function does not have a return type. C will set it to 'int' automatically and then say that you do not have a return value for the function. From what you have provided, I would add 'void' before the function name.
void histogram(FILE *myinput) {
The second dimension of arr may be too small. The code assumes that no token in the input file exceeds 10 characters, including the null terminator ('\0') at the end of the string. That means a string can hold at most 9 characters; anything longer will overflow the array and potentially corrupt your data.
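One way to guard against that overflow, sketched under the assumption that splitting over-long words is acceptable, is to give fscanf a field width. The names read_words and WORD_SIZE are made up for this example.

```c
#include <stdio.h>

#define WORD_SIZE 10   /* matches the second dimension of arr in the question */

/* Read up to max_words whitespace-separated tokens from `in`.
   Returns the number of tokens stored. */
int read_words(FILE *in, char words[][WORD_SIZE], int max_words)
{
    int count = 0;
    /* "%9s" stores at most 9 characters plus the terminating '\0';
       a longer token is split across consecutive entries */
    while (count < max_words && fscanf(in, "%9s", words[count]) == 1)
        count++;
    return count;
}
```

Checking the return value of fscanf in the loop condition also means the count is exact, so the later loop can run to count rather than n - 1.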
Edit
The above was written before a change to the provided code that now includes a second file and fprintf statements.
I will point to the line that opens the out file:
ptr=fopen("results1.txt","wt");
I am wondering if you mean to put "w+" where the second character is a plus symbol. According to the man page there are six possibilities:
The argument mode points to a string beginning with one of the
following sequences (possibly followed by additional characters, as
described below):
r Open text file for reading. The stream is positioned at the
beginning of the file.
r+ Open for reading and writing. The stream is positioned at the
beginning of the file.
w Truncate file to zero length or create text file for writing.
The stream is positioned at the beginning of the file.
w+ Open for reading and writing. The file is created if it does
not exist, otherwise it is truncated. The stream is
positioned at the beginning of the file.
a Open for appending (writing at end of file). The file is
created if it does not exist. The stream is positioned at the
end of the file.
a+ Open for reading and appending (writing at end of file). The
file is created if it does not exist. The initial file
position for reading is at the beginning of the file, but
output is always appended to the end of the file.
As such, it appears you are attempting to open the file for reading and writing.

Confusion with fwrite in VC2012

I need to write some binary data into file. The format is uint64_t.
#include <stdio.h>
#include <assert.h>
typedef unsigned long long uint64_t;
int main()
{
FILE * file = fopen("data","w");assert(file);
uint64_t a[]={16000550, 1051320,14456018, 4743184,11840752 ,4225032,\
13642264,6059108,563784 ,11823354,3989084 ,15759410,\
13413018 ,1582802,1574952 ,1635384,1102996 ,10511428,\
10239562 ,9472574,2641952 ,1350256,3432142 ,9920,11573360,\
12121180,10255874 ,3198684,7628524,16522766,12908660,\
2681374,9482820 ,6354462,15230702 ,16255676,5813862, \
8174782,7642752,7362790,6089340 ,803928,2669686 ,4225032,\
7603956 ,16551562,15734364 ,14424308,12060282 ,572450,\
18432 ,10276902,8134910 ,10749010,14166126 ,1636942,\
5295788 ,12342876,2151156 ,12322948};
for(int i=0;i<sizeof(a)/sizeof(uint64_t);i++)
{
fwrite((char*)&a[i],sizeof(uint64_t),1,file);
}
fclose(file);
}
I found that the output doesn't match my expectation only when the array is large, so I give 60 uint64_t values in my example.
In testing, I found it outputs 0000 fe20 7c00 0000 for 8134910. There are other errors in the output as well. With GCC it works fine, but with VS2012 it does not.
Based on your feedback in comments, the reason it's different in VS2012 is that the file has been opened by default in "text" mode. In this mode, each \n byte written is expanded to \r\n, which corrupts your data.
The solution is to explicitly open the file in binary mode:
FILE * file = fopen("data","wb");
Quoting from MSDN regarding the t and b characters that may be appended to the mode:
t
Open in text (translated) mode. In this mode, CTRL+Z is interpreted
as an EOF character on input. In files that are opened for
reading/writing by using "a+", fopen checks for a CTRL+Z at the end of
the file and removes it, if possible. This is done because using fseek
and ftell to move within a file that ends with CTRL+Z may cause fseek
to behave incorrectly near the end of the file.
In text mode, carriage return–linefeed combinations are translated
into single linefeeds on input, and linefeed characters are translated
to carriage return–linefeed combinations on output. When a Unicode
stream-I/O function operates in text mode (the default), the source or
destination stream is assumed to be a sequence of multibyte
characters. Therefore, the Unicode stream-input functions convert
multibyte characters to wide characters (as if by a call to the mbtowc
function). For the same reason, the Unicode stream-output functions
convert wide characters to multibyte characters (as if by a call to
the wctomb function).
b
Open in binary (untranslated) mode; translations involving
carriage-return and linefeed characters are suppressed.
If t or b is not given in mode, the default translation mode is
defined by the global variable _fmode.
The MSDN documentation for _fmode says:
The default setting of _fmode is _O_TEXT for text-mode
translation. _O_BINARY is the setting for binary mode.
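To check that binary mode really preserves the data, here is a round-trip sketch. The function name roundtrip_u64 and the 64-entry cap are assumptions for illustration, and it uses uint64_t from <stdint.h> rather than the question's typedef.

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* Write n values to path in binary mode, read them back, and compare.
   Returns 1 if the file holds exactly the same bytes, 0 otherwise. */
int roundtrip_u64(const char *path, const uint64_t *vals, size_t n)
{
    uint64_t back[64];
    if (n > 64)
        return 0;

    FILE *fp = fopen(path, "wb");           /* binary: no \n expansion */
    if (!fp)
        return 0;
    size_t written = fwrite(vals, sizeof vals[0], n, fp);
    if (fclose(fp) != 0 || written != n)
        return 0;

    fp = fopen(path, "rb");
    if (!fp)
        return 0;
    size_t got = fread(back, sizeof back[0], n, fp);
    int at_end = (getc(fp) == EOF);         /* no extra bytes on disk */
    fclose(fp);

    return got == n && at_end && memcmp(vals, back, n * sizeof vals[0]) == 0;
}
```

Values whose bytes include 0x0A (for example 10 or 2570) are the interesting cases, since those are the bytes that text mode on Windows would expand.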
