Binary mode and text mode in file I/O in C

I am a little confused about when to open a file in text mode or binary mode. I read some documentation and examples and noticed that getc()/putc() and fgets()/fputs() are used in text mode as well as in binary mode. Can I open a file in text mode and use fread()/fwrite(), or should I use only binary mode for binary I/O functions like fread()/fwrite()?
To use fseek() and ftell(), which mode should I use, text mode or binary mode?
I am using the C programming language on a Linux distro (Fedora).

On Unix systems (and Linux in particular), there is no difference between binary mode and text mode (the library just ignores the b qualifier), but on other systems there is. On Windows, a line end is stored as the sequence \r\n. In text mode, \r\n is converted to \n on input, and \n is converted to \r\n on output. Binary files are not converted at all, in either direction. Keep in mind that this transformation is not reversible: after reading, you cannot tell whether a \n came from a converted \r\n or was already a lone \n in the file.
Text mode means that your file is text and will be transformed to match the operating system's line-ending convention. If it really is text, this is normally not a problem, but if you do this with a binary file (e.g. a compressed file or a .jpg image), the results will be unpredictable.
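A minimal sketch of how the translation can be observed. The helper name stored_size and the file names in the usage note are hypothetical, chosen only for illustration. It writes a string through the given fopen mode, then reopens the file in binary mode and counts the bytes that were actually stored.

```c
#include <stdio.h>

/* Write `text` to `path` using the given fopen mode, then reopen the file
   in binary mode and count the bytes actually stored.
   Returns the byte count, or -1 on any I/O error. */
long stored_size(const char *path, const char *mode, const char *text)
{
    FILE *f = fopen(path, mode);
    if (f == NULL)
        return -1;
    if (fputs(text, f) == EOF) {
        fclose(f);
        return -1;
    }
    if (fclose(f) != 0)
        return -1;

    f = fopen(path, "rb");      /* binary mode: raw bytes, no translation */
    if (f == NULL)
        return -1;

    long n = 0;
    while (fgetc(f) != EOF)
        n++;
    fclose(f);
    return n;
}
```

Calling stored_size("demo.txt", "w", "a\nb\n") and stored_size("demo.bin", "wb", "a\nb\n") should both report 4 on Linux. On Windows the text-mode call would be expected to report 6, since each \n is stored as \r\n.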

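You can use fread and fwrite in both text and binary mode, although they are more commonly used for binary. You can also use fseek and ftell in both modes (wb (write-binary) and w (write-text)), although on a text stream the C standard only guarantees fseek with an offset of 0, or with SEEK_SET and a value previously returned by ftell on the same stream.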
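As an illustration of the common binary-mode use of fread, fwrite and fseek, here is a small sketch that stores an array of int values and reads one of them back by index. The function names write_ints and read_int_at are made up for this example, and the file holds the ints in the machine's native representation, so it is not portable between different platforms.

```c
#include <stdio.h>

/* Write `count` ints to `path` as raw bytes. Returns 0 on success. */
int write_ints(const char *path, const int *values, size_t count)
{
    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return -1;
    size_t written = fwrite(values, sizeof values[0], count, f);
    if (fclose(f) != 0 || written != count)
        return -1;
    return 0;
}

/* Read the int stored at zero-based `index` into `*out`.
   Returns 0 on success, -1 if the file or the record cannot be read. */
int read_int_at(const char *path, long index, int *out)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    /* In binary mode the byte offset of record i is simply i * record size. */
    int ok = fseek(f, index * (long)sizeof *out, SEEK_SET) == 0
          && fread(out, sizeof *out, 1, f) == 1;
    fclose(f);
    return ok ? 0 : -1;
}
```

After write_ints("ints.bin", v, 4) with v holding {10, 20, 30, 40}, a call to read_int_at("ints.bin", 2, &x) should store 30 in x.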

Related

Reading a file with a wrong access mode in C

In a project I'm writing a file as binary, using fopen(p_full_path, "ab"); before I start writing it.
When I read it, I noticed I was doing this the wrong way: to open the file I used fopen(p_full_path, "r+"); instead of fopen(p_full_path, "rb+");.
The content of the file is correct anyway. So I don't understand whether there is a real difference between the modes r+ and rb+, that is, whether I could read the wrong content because I'm not reading the file as binary.
Reading the documentation of fopen() I didn't find the answer.
The answer to your question depends on which platform you are using.
For example, on POSIX-compliant platforms (such as Linux), there is no difference between binary mode and text mode.
However, on Microsoft Windows, the situation is very different: text files store each line ending as \r\n (carriage return followed by line feed). A program reading in binary mode sees both characters, whereas in text mode it sees only \n. This is because \r\n is automatically converted to \n in text mode, whereas in binary mode, no conversion is performed. Also, in text mode, the character code \x1A is interpreted as the end of the file, whereas in binary mode, this value has no special meaning and is treated as a value like any other.
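One way to check this on a given platform is a small round-trip test along the following lines. This is only a sketch: the function name sample_survives and the sample bytes are assumptions made for illustration. It appends a few bytes with "ab" (as in the question), reads them back with whatever mode is passed in, and reports whether they came back unchanged.

```c
#include <stdio.h>
#include <string.h>

/* Append a fixed byte sample to a fresh file in binary mode, read it back
   using `read_mode`, and compare.
   Returns 1 if the bytes are unchanged, 0 if they differ, -1 on I/O error. */
int sample_survives(const char *path, const char *read_mode)
{
    /* Includes \r\n and \x1A, the bytes that Windows text mode treats specially. */
    static const unsigned char sample[] = { 'A', 0x0D, 0x0A, 0x1A, 'B' };
    unsigned char buf[sizeof sample + 1];   /* one spare byte to catch extra data */
    FILE *f;
    size_t n;

    remove(path);               /* start clean so "ab" appends to an empty file */

    f = fopen(path, "ab");
    if (f == NULL)
        return -1;
    n = fwrite(sample, 1, sizeof sample, f);
    if (fclose(f) != 0 || n != sizeof sample)
        return -1;

    f = fopen(path, read_mode);
    if (f == NULL)
        return -1;
    n = fread(buf, 1, sizeof buf, f);
    fclose(f);

    return n == sizeof sample && memcmp(buf, sample, sizeof sample) == 0;
}
```

On Linux, sample_survives(path, "r+") and sample_survives(path, "rb+") should both return 1. On Windows, the "r+" call would be expected to return 0, because the \r is dropped and reading stops at \x1A.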

File management in C?

I'm practicing file management in C. I saw that there are plenty of ways to open a file with fopen, using letters such as a, r, etc. That's all fine, but I also read that adding b to the mode makes it a binary file. What does that mean? How does it differ from a normal file?
Opening a file in text mode causes the C libraries to do some handling specific to text. For example, new lines are different between Windows and Unix/linux but you can simply write '\n' because C is handling that difference for you.
Opening a file in binary mode doesn't do any of this special handling, it just treats it as raw bytes. There's a bit of a longer explanation of this on the C FAQ
Note that this only matters on Windows; Unix/linux systems don't (need to) differentiate between text and binary modes, though you can include the 'b' flag without them complaining.
If you open a regular file in the binary mode, you'll get all its data as-is and whatever you write into it, will appear in it.
OTOH, if you open a regular file in the text mode, things like ends of lines can get special treatment. For example, the sequence of bytes with values of 13 (CR or '\r') and 10 (LF or '\n') can get truncated to just one byte, 10, when reading or 10 can get expanded into 13 followed by 10 when writing. This treatment is platform-specific (read, compiler/OS-specific).
For text files, this is often unimportant. But if you apply the text mode to a non-text file, you risk data corruption.
Also, reading and writing bytes at arbitrary offsets in files opened in the text mode isn't reliably supported because of that special treatment: the position in the stream no longer has to match the byte offset in the file.
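What does remain portable on a text stream is returning to a position that ftell reported earlier on the same stream. A minimal sketch of that pattern follows; the function name reread_second_line and the 256-character line limit are assumptions made to keep the example short.

```c
#include <stdio.h>

/* Copy the second line of a text file into `out` (at most `cap` bytes,
   including the terminating '\0'). Returns 0 on success, -1 on failure.
   Assumes, for brevity, that every line is shorter than 256 characters. */
int reread_second_line(const char *path, char *out, int cap)
{
    char scratch[256];
    int result = -1;
    FILE *f = fopen(path, "r");                 /* text mode */
    if (f == NULL)
        return -1;

    if (fgets(scratch, sizeof scratch, f) != NULL) {    /* consume line 1 */
        long mark = ftell(f);                   /* remember start of line 2 */

        while (fgets(scratch, sizeof scratch, f) != NULL)
            ;                                   /* read on to end of file */

        /* Seeking with SEEK_SET to a value that ftell returned on this same
           stream is the portable way to move around in a text stream. */
        if (mark != -1L && fseek(f, mark, SEEK_SET) == 0
            && fgets(out, cap, f) != NULL)
            result = 0;
    }
    fclose(f);
    return result;
}
```

The value in mark is treated as an opaque token here: the code never does arithmetic on it, which is what keeps the pattern valid on systems where text-mode positions are not plain byte counts.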
The difference is explained here
A binary file is a series of 1's and 0's, which is the form in which the machine stores and processes all data. This is compact, but not readable by humans.
For this reason, text files are a string of binary values designated to be displayed as more people-friendly characters, which lend themselves to language much better than raw binary. ASCII is an example of one such designation. This reveals the truth of the matter: all files are binary on the lowest level.
But binary lends itself to any application which does not have to be textually legible to us lowly humans =] Example applications where binary is preferred are sound files, images, and compiled programs. The reason binary is preferred to text is that it is more efficient to store an image in a raw binary encoding than to describe it textually (which has to be translated to binary anyway).
There are two types of files: text files and binary files.
Binary files have two features that distinguish them from text files: You can jump instantly to any record in the file, which provides random access as in an array; and you can change the contents of a record anywhere in the file at any time. Binary files also usually have faster read and write times than text files, because a binary image of the record is stored directly from memory to disk (or vice versa). In a text file, everything has to be converted back and forth to text, and this takes time.
b is for working with binary files. However, this has no effect on POSIX compliant operating systems.
From the man page of fopen:
The mode string can also include the letter 'b' either as a last character or as a character between the characters in any of the two-character strings described above. This is strictly for compatibility with C89 and has no effect; the 'b' is ignored on all POSIX conforming systems, including Linux. (Other systems may treat text files and binary files differently, and adding the 'b' may be a good idea if you do I/O to a binary file and expect that your program may be ported to non-UNIX environments.)
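The placement rule for 'b' quoted above can be tried out with a tiny helper like the one below. The name mode_accepted is hypothetical, and the sketch only checks that fopen succeeds on an existing file; it does not inspect how the stream behaves afterwards.

```c
#include <stdio.h>

/* Return 1 if fopen accepts `mode` for the existing file `path`, else 0. */
int mode_accepted(const char *path, const char *mode)
{
    FILE *f = fopen(path, mode);
    if (f == NULL)
        return 0;
    fclose(f);
    return 1;
}
```

For an existing, readable and writable file, "rb", "r+b" and "rb+" should all be accepted, with "r+b" and "rb+" being two spellings of the same mode.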

Why use fopen() mode 'b' (stdio.h) when output can be non-ASCII regardless?

With the C standard library stdio.h, I read that to output ASCII/text data, one should use mode "w" and to output binary data, one should use "wb". But why the difference?
In either case, I'm just outputting a byte (char) array, right? And if I output a non-ASCII byte in ASCII mode, the program still outputs the correct byte.
Some operating systems, most notably Windows, don't guarantee that text-mode streams store bytes exactly the way you pass them in. On Windows, text mode writes \n as \r\n and maps \r\n back to \n when reading. This is fine and transparent when reading and writing text, but it would trash a stream of binary data. Basically, just always give Windows the 'b' flag if you want it to faithfully read and write data to files exactly the way you passed it in.
Certain transformations can take place when outputting in text mode (e.g. outputting carriage-return + new-line when the outputted character is new-line), depending on your platform. Such transformations will not take place when using binary mode.

In C how to write whichever end of line character is appropriate to the OS?

Unix has \n, Mac was \r but is now \n, and DOS/Win32 is \r\n. When creating a text file with C, how do I ensure that whichever end-of-line character(s) is appropriate to the OS gets used?
fprintf(your_file, "\n");
This will be converted to an appropriate EOL by the stdio library on your operating system provided that you opened the file in text mode. In binary mode no conversion takes place.
From Wikipedia:
When writing a file in text mode, '\n' is transparently translated to the native newline sequence used by the system, which may be longer than one character. (Note that a C implementation is allowed not to store newline characters in files. For example, the lines of a text file could be stored as rows of a SQL table or as fixed-length records.) When reading in text mode, the native newline sequence is translated back to '\n'. In binary mode, the second mode of I/O supported by the C library, no translation is performed, and the internal representation of any escape sequence is output directly.
When you open a file in text mode (pass "w" to fopen instead of "wb") any newline characters written to the file will automatically be converted to the appropriate newline sequence for the system. Newline sequences will be translated back to newline characters when you read the file.
This is why it's important to distinguish between text and binary mode; if you're writing in binary mode, C will not tamper with the bytes you write to a file.
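To see which newline sequence actually ended up in a file, the file can be scanned in binary mode, where nothing is translated. The following is a rough sketch under the assumption of an ASCII-based system (so '\r' is 13 and '\n' is 10); the struct eol_counts and the function count_eols are names invented for this example.

```c
#include <stdio.h>

/* Tallies of the three common line-ending styles found in a file. */
struct eol_counts {
    long crlf;  /* "\r\n" pairs (DOS/Windows)  */
    long lf;    /* lone '\n' (Unix)            */
    long cr;    /* lone '\r' (classic Mac OS)  */
};

/* Scan `path` in binary mode and fill in `*c`. Returns 0 on success. */
int count_eols(const char *path, struct eol_counts *c)
{
    FILE *f = fopen(path, "rb");    /* binary: see the bytes as stored */
    if (f == NULL)
        return -1;

    c->crlf = c->lf = c->cr = 0;
    int prev = EOF;
    int ch;
    while ((ch = fgetc(f)) != EOF) {
        if (ch == '\n') {
            if (prev == '\r') {     /* the '\r' just counted was half of a pair */
                c->cr--;
                c->crlf++;
            } else {
                c->lf++;
            }
        } else if (ch == '\r') {
            c->cr++;                /* provisionally a lone '\r' */
        }
        prev = ch;
    }
    fclose(f);
    return 0;
}
```

Running count_eols on a file written in text mode with fprintf(your_file, "\n") would be expected to show only lf counts on Linux and only crlf counts on Windows.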
