What is the difference between these blocks of code? I tried to search for "wb" but couldn't find it anywhere. The file containing "wb" is from one of my tutors.
FILE *f = fopen(DB_FILE_NAME, "wb");
if (f == NULL) {
printf("Write error\n");
} else {
/* write n_students elements of the studentlist array */
fwrite(studentlist, sizeof(student_t), n_students, f);
fclose(f);
}
and
FILE *f = fopen(DB_FILE_NAME, "w");
if (f == NULL) {
printf("Write error\n");
} else {
/* write n_students elements of the studentlist array */
fwrite(studentlist, sizeof(student_t), n_students, f);
fclose(f);
}
Specifying "b" in the access mode prevents (some implementations of) the standard library from translating a few characters when reading from or writing to the file.
The most common translation is for end of line: \n is translated to \r\n on Windows.
Absolutely any reference on the fopen() function would have told you this. For instance the manual page which is the common documentation used in Unix-like environments:
The mode string can also include the letter 'b' either as a last character
or as a character between the characters in any of the two-character
strings described above. This is strictly for compatibility with C89 and
has no effect; the 'b' is ignored on all POSIX conforming systems,
including Linux. (Other systems may treat text files and binary files
differently, and adding the 'b' may be a good idea if you do I/O to a
binary file and expect that your program may be ported to non-UNIX
environments.)
So, it stands for binary and is useful to indicate that you intend to treat the contents of the file as not being text.
For your code, binary access seems right. However, directly writing raw struct values is generally a very bad idea, since you don't know the exact internal format used by the compiler and it can change unexpectedly. For files that should be shared and/or accessed "later", this is not the proper way to do it in C. Look into serialization.
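As a rough illustration of what serialization could look like here, the sketch below writes each field explicitly in a fixed byte order instead of dumping the struct's memory. The real student_t is not shown in the question, so student_rec, its two fields, and the function names are all hypothetical.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record; the real student_t from the question is not shown. */
typedef struct {
    uint32_t id;
    char name[32];
} student_rec;

/* Write one record as 4 little-endian id bytes followed by 32 name bytes.
   Returns 0 on success, -1 on a short write. */
int write_student(FILE *f, const student_rec *s)
{
    unsigned char id[4];
    id[0] = (unsigned char)(s->id & 0xFF);
    id[1] = (unsigned char)((s->id >> 8) & 0xFF);
    id[2] = (unsigned char)((s->id >> 16) & 0xFF);
    id[3] = (unsigned char)((s->id >> 24) & 0xFF);
    if (fwrite(id, 1, sizeof id, f) != sizeof id) return -1;
    if (fwrite(s->name, 1, sizeof s->name, f) != sizeof s->name) return -1;
    return 0;
}

/* Read one record previously written by write_student().
   Returns 0 on success, -1 on a short read. */
int read_student(FILE *f, student_rec *s)
{
    unsigned char id[4];
    if (fread(id, 1, sizeof id, f) != sizeof id) return -1;
    if (fread(s->name, 1, sizeof s->name, f) != sizeof s->name) return -1;
    s->id = (uint32_t)id[0] | ((uint32_t)id[1] << 8)
          | ((uint32_t)id[2] << 16) | ((uint32_t)id[3] << 24);
    return 0;
}
```

Each record then occupies exactly 36 bytes on disk regardless of compiler padding or host endianness. The stream should still be opened with "wb" / "rb" so that no byte is translated.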
From the fopen documentation:
With the mode specifiers above the file is open as a text file. In order to open a file as a binary file, a "b" character has to be included in the mode string. This additional "b" character can either be appended at the end of the string (thus making the following compound modes: "rb", "wb", "ab", "r+b", "w+b", "a+b") or be inserted between the letter and the "+" sign for the mixed modes ("rb+", "wb+", "ab+").
Related
What translation occurs when writing to a file that was opened in text mode that does not occur in binary mode? Specifically in MS Visual C.
unsigned char buffer[256];
for (int i = 0; i < 256; i++) buffer[i]=i;
int size = 1;
int count = 256;
Binary mode:
FILE *fp_binary = fopen(filename, "wb");
fwrite(buffer, size, count, fp_binary);
Versus text mode:
FILE *fp_text = fopen(filename, "wt");
fwrite(buffer, size, count, fp_text);
I believe that most platforms will ignore the "t" option or the "text-mode" option when dealing with streams. On Windows, however, this is not the case. If you take a look at the description of the fopen() function on MSDN, you will see that specifying the "t" option will have the following effect:
line feeds ('\n') will be translated to '\r\n' sequences on output
carriage return/line feed sequences will be translated to line feeds on input.
If the file is opened in append mode, the end of the file will be examined for a Ctrl-Z character (character 26) and that character removed, if possible. Text mode will also interpret the presence of that character as the end of the file. This is an unfortunate holdover from the days of CP/M (something about the sins of the parents being visited upon their children up to the 3rd or 4th generation). Contrary to previously stated opinion, the Ctrl-Z character will not be appended.
In text mode, a newline "\n" may be converted to a carriage return + newline "\r\n"
Usually you'll want to open in binary mode. Trying to read binary data in text mode won't work reliably; on platforms that translate, the data will be corrupted. You can read text fine in binary mode though - it just won't do the automatic translation between "\r\n" and "\n".
See fopen
Additionally, when you fopen a file with "rt" the input is terminated on a Ctrl-Z character.
Another difference is when using fseek
If the stream is open in binary mode, the new position is exactly offset bytes measured from the beginning of the file if origin is SEEK_SET, from the current file position if origin is SEEK_CUR, and from the end of the file if origin is SEEK_END. Some binary streams may not support the SEEK_END.
If the stream is open in text mode, the only supported values for offset are zero (which works with any origin) and a value returned by an earlier call to ftell on a stream associated with the same file (which only works with an origin of SEEK_SET).
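A minimal sketch of the text-mode rule just quoted: the value from ftell is treated as an opaque token and only ever passed back to fseek with SEEK_SET. The function name first_char_of_second_line is made up for this illustration.

```c
#include <stdio.h>

/* Read the first line, remember where the second one starts, read to the
   end of the file, then jump back to the remembered position.
   Returns the first character of the second line, or EOF on any failure. */
int first_char_of_second_line(const char *path)
{
    FILE *f = fopen(path, "r");            /* text mode */
    char line[128];
    long pos;
    int c = EOF;

    if (f == NULL) return EOF;
    if (fgets(line, sizeof line, f) != NULL) {
        pos = ftell(f);                    /* opaque token in text mode */
        while (fgets(line, sizeof line, f) != NULL) {
            /* consume the rest of the file */
        }
        /* Portable in text mode: offset came from ftell, origin is SEEK_SET */
        if (pos != -1L && fseek(f, pos, SEEK_SET) == 0)
            c = fgetc(f);
    }
    fclose(f);
    return c;
}
```

Doing arithmetic on pos (for example pos + 10) would only be meaningful on a binary stream.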
Even though this question was already answered and clearly explained, I think it would be interesting to show the main issue (translation between \n and \r\n) with a simple code example. Note that I'm not addressing the issue of the Ctrl-Z character at the end of the file.
#include <stdio.h>
#include <string.h>
int main() {
FILE *f;
char string[] = "A\nB";
int len;
len = strlen(string);
printf("As you'd expect string has %d characters... ", len); /* prints 3*/
f = fopen("test.txt", "w"); /* Text mode */
fwrite(string, 1, len, f); /* On Windows "A\r\nB" is written */
printf ("but %ld bytes were written to file", ftell(f)); /* prints 4 on Windows, 3 on Linux*/
fclose(f);
return 0;
}
If you execute the program on Windows, you will see the following message printed:
As you'd expect string has 3 characters... but 4 bytes were written to file
Of course, you can also open the file with a text editor like Notepad++ and see the characters for yourself.
The inverse conversion is performed on Windows when reading the file in text mode.
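For comparison, here is a sketch of the same experiment with the mode passed in as a parameter, so the binary case can be tried as well. The helper name bytes_after_write is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Write the 3-character string "A\nB" to path using the given fopen mode
   and return the file position afterwards, or -1 on failure. */
long bytes_after_write(const char *path, const char *mode)
{
    const char text[] = "A\nB";
    size_t len = strlen(text);
    FILE *f = fopen(path, mode);
    long pos;

    if (f == NULL) return -1L;
    if (fwrite(text, 1, len, f) != len) {
        fclose(f);
        return -1L;
    }
    pos = ftell(f);
    fclose(f);
    return pos;
}
```

Called with "wb", this should report 3 on both Windows and Linux, because no \r is inserted. Called with "w", it would be expected to reproduce the 4-versus-3 difference described above.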
We had an interesting problem with opening files in text mode where the files had a mixture of line ending characters:
1\n\r
2\n\r
3\n
4\n\r
5\n\r
Our requirement was that we could store our current position in the file (we used fgetpos), close the file, and later reopen it and seek to that position (we used fsetpos).
However, where a file had a mixture of line endings, this process failed to seek back to the same position. In our case (our tool parses C++), we were re-reading parts of the file we'd already seen.
Go with binary - then you can control exactly what is read and written from the file.
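A sketch of the save-and-reopen pattern in binary mode, where a saved position is a plain byte offset and is unaffected by which line endings the file contains. Both function names are invented for this example, and it uses ftell/fseek rather than the fgetpos/fsetpos pair mentioned above.

```c
#include <stdio.h>

/* Return the byte offset just past the first n lines of the file,
   or -1 on failure. The file is opened with "rb", so ftell() counts
   bytes exactly as they are stored. */
long offset_after_lines(const char *path, int n)
{
    FILE *f = fopen(path, "rb");
    long pos;
    int c;

    if (f == NULL) return -1L;
    while (n > 0 && (c = fgetc(f)) != EOF)
        if (c == '\n') n--;
    pos = (n == 0) ? ftell(f) : -1L;
    fclose(f);
    return pos;
}

/* Reopen the file later and return the byte stored at a saved offset,
   or EOF on failure. */
int byte_at(const char *path, long offset)
{
    FILE *f = fopen(path, "rb");
    int c;

    if (f == NULL) return EOF;
    c = (fseek(f, offset, SEEK_SET) == 0) ? fgetc(f) : EOF;
    fclose(f);
    return c;
}
```

The price is that the caller sees any '\r' bytes and has to handle them itself.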
In "w" mode, the file is opened for writing as text.
In "wb" mode, the file is opened for writing as binary, so the bytes are written exactly as given. In standard C neither mode selects a character encoding such as UTF-8 or UTF-16LE; the encoding is whatever bytes the program writes.
I took over a project that use the following function to read files:
char *fetchFile(char *filename) {
char *buffer;
int len;
FILE *f = fopen(filename, "rb");
if(f) {
if(verbose) {
fprintf(stdout, "Opened file %s successfully\n", filename);
}
fseek(f, 0, SEEK_END);
len = ftell(f);
fseek(f, 0, SEEK_SET);
if(verbose) {
fprintf(stdout, "Allocating memory for buffer for %s\n", filename);
}
buffer = malloc(len + 1);
if(buffer) fread (buffer, 1, len, f);
fclose (f);
buffer[len] = '\0';
} else {
fprintf(stderr, "Error reading file %s\n", filename);
exit(1);
}
return buffer;
}
The rb mode is used because sometimes the file can be a spreadsheet and therefore I want the information as in a text file.
The program runs on a linux machine but the files to read come from linux and windows.
I am not sure of what approach is better to not have windows line ending mess with my code.
I was thinking of using dos2unix at the start of this function.
I also thought of opening in r mode, but I believe that could potentially mess things up when opening non-text files.
I would like to understand better the differences between using:
dos2unix,
r vs rb mode,
or any other solution which would fit
better the problem.
Note: I believe that I understand r vs rb modes, but if you could explain why it is a bad or good solution for this specific situation (I think it wouldn't be good because sometimes it opens spreadsheets but I am not sure of that).
If my understanding is correct the rb mode is used because sometimes the file can be a spreadsheet and therefore the programs just want the information as in a text file.
You seem uncertain, and though perhaps you do understand correctly, your explanation does not give me any confidence in that.
C knows about two distinct kinds of streams: binary streams and text streams. A binary stream is simply an ordered sequence of bytes, written and / or read as-is without any kind of transformation. On the other hand,
A text stream is an ordered sequence of characters composed into
lines, each line consisting of zero or more characters plus a
terminating new-line character. Whether the last line requires a
terminating new-line character is implementation-defined. Characters
may have to be added, altered, or deleted on input and output to
conform to differing conventions for representing text in the host
environment. Thus, there need not be a one- to-one correspondence
between the characters in a stream and those in the external
representation. [...]
(C2011 7.21.2/2)
For some implementations, such as POSIX-compliant ones, this is a distinction without a difference. For other implementations, such as those targeting Windows, the difference matters. In particular, on Windows, text streams convert on the fly between carriage-return / line-feed pairs in the external representation and newlines (only) in the internal representation.
The b in your fopen() mode specifies that the file should be opened as a binary stream -- that is, no translation will be performed on the bytes read from the file. Whether this is the right thing to do depends on your environment and the application's requirements. This is moot on Linux or another Unix, however, as there is no observable difference between text and binary streams on such systems.
dos2unix converts carriage-return / line-feed pairs in the input file to single line-feed (newline) characters. This will convert a Windows-style text file or one with mixed Windows / Unix line terminators to Unix text file convention. It is irreversible if there are both Windows-style and Unix-style line terminators in the file, and it is furthermore likely to corrupt your file if it is not a text file in the first place.
If your inputs are sometimes binary files then opening in binary mode is appropriate, and conversion via dos2unix probably is not. If that's the case and you also need translation for text-file line terminators, then you first and foremost need a way to distinguish which case applies for any particular file -- for example, by command-line argument or by pre-analyzing the file via libmagic. You then must provide different handling for text files; your main options are
Perform the line terminator conversion in your own code.
Provide separate versions of the fetchFile() function for text and binary files.
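The first option could look roughly like the sketch below, applied to the buffer after it has been read in binary mode. crlf_to_lf is a hypothetical helper, not part of the original project.

```c
#include <stddef.h>

/* Rewrite buf in place so that every "\r\n" pair becomes a single "\n".
   A '\r' that is not followed by '\n' is kept unchanged.
   Returns the new length; the buffer is not re-terminated here. */
size_t crlf_to_lf(char *buf, size_t len)
{
    size_t r, w = 0;

    for (r = 0; r < len; r++) {
        if (buf[r] == '\r' && r + 1 < len && buf[r + 1] == '\n')
            continue;   /* drop the CR; the LF is copied on the next pass */
        buf[w++] = buf[r];
    }
    return w;
}
```

In fetchFile() this would be called only for files known to be text, for example len = crlf_to_lf(buffer, len); before the terminating '\0' is stored, so that binary inputs stay untouched.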
The code just copies the contents of a file to an allocated buffer. The UNIX way (YMMV) is to just memory map the file instead of reading it. Much faster.
// untested code
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

void* mapfile(const char *name)
{
int fd;
struct stat st;
if ((fd = open(name, O_RDONLY)) == -1)
return NULL;
if (fstat(fd, &st)) {
close(fd);
return NULL;
}
/* mmap takes the file descriptor first, then the offset */
void *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
close(fd);
if (p == (void *)MAP_FAILED)
p = NULL;
return p;
}
Something along these lines will work. Adjust settings if you want to write to the file as well.
I am practicing some practice questions in FILE IO in C. Below is one of the programs.
#include<stdio.h>
#include<stdlib.h>
int main()
{
char fname[]="poem.txt";
FILE *fp;
char ch;
fp = fopen ( fname, "tr");
if (fp == NULL)
{
printf("Unable to open file...\n");
exit(1);
}
while((ch =fgetc(fp)) != EOF)
{
printf("%c",ch);
}
printf("\n");
return 0;
}
As you can see in the statement
fp = fopen ( fname, "tr");
The mode "tr" is not a valid mode (as I understand). I was expecting gcc to give an error (or a warning) while compiling the above program. However, gcc does not give any error (or warning) while compiling it.
However, as expected, when I run the program it exits printing "Unable to open file...", which means fopen() returned NULL because there was an error while opening the file.
-bash-4.1$ ./a.out
Unable to open file...
-bash-4.1$
(The file poem.txt exists so this is because of the invalid mode given to fopen(). I checked by changing the mode to "r" and it works fine displaying the content of "poem.txt")
-bash-4.1$ ./a.out
THis is a poem.
-bash-4.1$
I was expecting gcc to give an error (or warning) message for the invalid mode.
Why does gcc not give any error (or warning) for this?
The compiler doesn't check the contents of the mode string; to the compiler it is just a const char * argument, so only the syntax and the types are checked.
However, at run time, if the code is written like so:
#include<stdio.h>
#include<stdlib.h>
int main()
{
char fname[]="poem.txt";
FILE *fp;
int ch; /* int rather than char, so EOF can be told apart from valid characters */
fp = fopen ( fname, "tr");
if (fp == NULL)
{
perror( "fopen for poem.txt failed");
exit( EXIT_FAILURE );
}
while((ch =fgetc(fp)) != EOF)
{
printf("%c",ch);
}
printf("\n");
return 0;
}
then a proper error message is output:
...$ ./untitled
fopen for poem.txt failed: Invalid argument
This is Undefined Behavior:
Per Annex J.2 "Undefined Behavior", it is UDB if:
—The string pointed to by the mode argument in a call to the fopen function does not exactly match one of the specified character sequences (7.19.5.3).
Although Annex J is informative, looking at §7.19.5.3:
/3 The argument mode points to a string. If the string is one of the following, the file is open in the indicated mode. Otherwise, the behavior is undefined.
Basically, the compiler can blow you off here - a standard library function name (and behavior) can be used outside of the inclusion of a standard header (for example, non-standard extensions, completely user-defined behavior, etc.). The Standard specifies what a conforming library implementation shall include and how it shall behave, but does not require you to use that standard library (or define behavior for a specific implementation explicitly specified as UDB territory: at this point, if your parameter types match it's a legal function call).
A really good lint utility might help you here.
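Short of a lint tool, a runtime check along these lines could reject such a string before it reaches fopen. This is only a sketch: is_c99_fopen_mode is a hypothetical helper, and it deliberately accepts only the exact sequences listed in C99 7.19.5.3, so extensions such as "rt" on Windows or the C11 "x" modes are rejected too.

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if mode is exactly one of the mode strings listed in
   C99 7.19.5.3, and 0 otherwise (including for a null pointer). */
int is_c99_fopen_mode(const char *mode)
{
    static const char *const modes[] = {
        "r", "w", "a",
        "rb", "wb", "ab",
        "r+", "w+", "a+",
        "r+b", "rb+", "w+b", "wb+", "a+b", "ab+"
    };
    size_t i;

    if (mode == NULL) return 0;
    for (i = 0; i < sizeof modes / sizeof modes[0]; i++)
        if (strcmp(mode, modes[i]) == 0) return 1;
    return 0;
}
```

A wrapper around fopen could call this first and fail early with a clear message instead of relying on what the library happens to do with "tr".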
How is the compiler supposed to know what the valid arguments for a function are?
To do it you'd be building too much knowledge in the compiler - it would have to recognize functions and their parameters by name. What if you want to override the function? What if different modes are valid on different platforms?
In Windows programming, "tr" is not a valid mode, although "rt" is. The t means text and the r means read. (If you are using gcc and linking to MS's C runtime then you will be able to use "rt".)
However you still don't see t very often because it is the default and therefore redundant; the other option for this setting is b meaning binary. But MS do seem to explicitly use t in their examples to make it clear that translation is intended.
The behaviour of text mode and binary mode for a stream is implementation-defined, although the intent is that binary mode reads the characters exactly as they appear on disk, and text mode may perform translations relevant to text processing; most famously, converting \r\n in MS text files to \n in your program.
I'm trying to parse data from stdin in binary mode under Win32.
The first thing my code does is to check for a 4byte header at the beginning:
int riff_header;
fread(&riff_header, sizeof(riff_header), 1, ifp);
// 'RIFF' = little-endian
if (riff_header != 0x46464952) {
fprintf(stderr, "wav2msu: Incorrect header: Invalid format or endianness\n");
fprintf(stderr, " Value was: 0x%x\n", riff_header);
return -1;
}
stdin has been switched to binary mode before reading from it:
if (*argv[argc-1] == '-') {
fprintf(stderr, "Reading from stdin.\n");
infile = stdin;
// We need to switch stdin to binary mode, or else we run
// into problems under Windows
freopen(NULL, "rb", stdin);
}
This code works fine under Linux; on Win32 (specifically Windows XP), however, the fread only seems to read a single byte and thus causes the evaluation to fail.
Example:
> ffmeg.exe -i ..\test.mp3 -f wav pipe:1 2> nul |..\foo.exe -o test.bin -
Reading from stdin.
foo: Incorrect header: Invalid format or endianness
Value was: 0x4
What am I doing wrong?
According to the MSDN documentation, it's not permitted to pass NULL for the path parameter of freopen, so the call to freopen is almost certainly failing; have you checked the return value and the value of errno? C89 does not specify the behavior of freopen when path is NULL; C99 does, but the Microsoft C runtime is not (and does not claim to be) C99-compliant.
If you really need to read binary info from stdin, you might have to use platform-specific code and read the raw binary data directly with ReadFile on the file GetStdHandle(STD_INPUT_HANDLE).
At http://pubs.opengroup.org/onlinepubs/009695399/functions/freopen.html I have found the following:
If filename is a null pointer, the freopen() function shall attempt to
change the mode of the stream to that specified by mode, as if the
name of the file currently associated with the stream had been used.
In this case, the file descriptor associated with the stream need not
be closed if the call to freopen() succeeds. It is
implementation-defined which changes of mode are permitted (if any),
and under what circumstances.
Maybe you should check if the change of mode (from text to binary) is allowed by the compiler and libraries you are using. Which compiler are you using?
Update / summary
Using MinGW you can call setmode() to switch the mode of the stdin stream.
You should set the mode to _O_BINARY, which is defined in fcntl.h.
For more information see e.g. http://gnuwin32.sourceforge.net/compile.html
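A minimal sketch of that approach, guarded so that it also compiles on non-Windows systems. It assumes the Microsoft runtime spellings _setmode and _fileno, which are declared in io.h and stdio.h; the wrapper name stdin_to_binary is made up for this example.

```c
#include <stdio.h>
#ifdef _WIN32
#include <fcntl.h>   /* _O_BINARY */
#include <io.h>      /* _setmode */
#endif

/* Put stdin into binary mode where the distinction exists.
   Returns 0 on success and -1 on failure. */
int stdin_to_binary(void)
{
#ifdef _WIN32
    /* _setmode returns the previous mode, or -1 on error */
    return (_setmode(_fileno(stdin), _O_BINARY) == -1) ? -1 : 0;
#else
    return 0;        /* POSIX systems make no text/binary distinction */
#endif
}
```

This would replace the freopen(NULL, "rb", stdin) call and should run before the first read from stdin.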