Is there any difference between text and binary mode in file access? - c

Is there any difference if I open a file in text mode rather than binary mode? Because I read that UNIX and Linux make no distinction between text and binary files.

There is no difference on Linux, at least on native file systems such as Ext4 (and on most other file systems too, with the usual GNU libc).
Perhaps some bizarre filesystem could have a specific flag to open binary and text files differently, but I know of no such filesystem. Maybe you could code some FUSE filesystem making the distinction, perhaps with some additional hack around fopen inside a bizarrely customized libc.
However, the C99 standard (at least page 271, §7.19.5.3 of the n1256 draft) explicitly mentions text vs. binary mode, so your program will be easier to port to other systems (such as Windows) if it conforms to the standard.
So my point is that you might want to pass a mode string to fopen that differentiates binary from text mode (I admit I don't do that very often). It won't hurt.
The Linux fopen(3) man page explicitly says:
The mode string can also include the letter 'b' either as a last
character or as a character between the characters in any of the two-
character strings described above. This is strictly for
compatibility with C89 and has no effect; the 'b' is ignored on all
POSIX conforming systems, including Linux. (Other systems may treat
text files and binary files differently, and adding the 'b' may be a
good idea if you do I/O to a binary file and expect that your program
may be ported to non-UNIX environments.)
Of course, the open(2) syscall does not have any way of transmitting a mode flag. (You'll probably need some private ioctl(2).)
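As a small illustration (the filename is a placeholder), the following opens a file in binary mode; on Linux it behaves exactly like "r", but it stays correct on Windows:

#include <stdio.h>

int main(void)
{
    /* On Linux both "r" and "rb" behave identically; on Windows only
       the "rb" form suppresses CR/LF translation. */
    FILE *f = fopen("image.dat", "rb");
    if (f == NULL)
        return 1;
    fclose(f);
    return 0;
}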

Related

How can I open a file that has a Chinese Filename in C? [duplicate]

Is there a standard way to do an fopen with a Unicode string file path?

Can I seek a position beyond 2GB in C using the standard library?

I am making a program that reads disk images in C. I am trying to make something portable, so I do not want to use too many OS-specific libraries. I am aware there are many disk images that are very large files but I am unsure how to support these files.
I have read up on fseek and it seems to use a long int, which is not guaranteed to support values over 2^31 - 1. fsetpos seems to support a larger value with fpos_t, but an absolute position cannot be specified. I have also thought about using several relative seeks with fseek, but am unsure if this is portable.
How can I portably support large files in C?
There is no portable way.
On Linux there is the fseeko() and ftello() pair (it needs some defines; check the ftello() man page for _FILE_OFFSET_BITS).
On Windows, I believe you have to use _fseeki64() and _ftelli64().
#ifdef is your friend
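For instance, a minimal sketch of such an #ifdef wrapper, assuming only the functions named above (seek64 is a made-up name):

#if !defined(_WIN32)
#define _POSIX_C_SOURCE 200809L   /* expose fseeko()/ftello() */
#define _FILE_OFFSET_BITS 64      /* make off_t 64-bit; must precede headers */
#endif
#include <stdio.h>

static int seek64(FILE *f, long long offset, int whence)
{
#ifdef _WIN32
    return _fseeki64(f, offset, whence);      /* MSVC/UCRT 64-bit seek */
#else
    return fseeko(f, (off_t)offset, whence);  /* POSIX 64-bit seek */
#endif
}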
pread() works on any POSIX-compliant platform (OS X, Linux, BSD, etc.). It's missing on Windows but there are lots of standard things that Windows gets wrong; this won't be the only thing in your codebase that needs a Windows special case.
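A minimal sketch (the image name and the 5 GiB offset are made up):

#define _POSIX_C_SOURCE 200809L
#define _FILE_OFFSET_BITS 64   /* 64-bit off_t even on 32-bit Linux */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char buf[512];
    int fd = open("disk.img", O_RDONLY);   /* "disk.img" is a placeholder */
    if (fd < 0)
        return 1;
    /* read 512 bytes starting 5 GiB into the image: no seek involved,
       and the offset parameter is 64-bit regardless of sizeof(long) */
    ssize_t n = pread(fd, buf, sizeof buf, (off_t)(5LL << 30));
    printf("read %zd bytes\n", n);
    close(fd);
    return 0;
}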
You can't do it with standard C. Even with relative seeks it's not possible on some architectures.
One approach would be to check the platform at compile time. You can just check the value of LONG_MAX and throw a compile error if it's not large enough. But even that doesn't guarantee that the underlying filesystem supports files larger than 2 or 4 GB.
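For example, the LONG_MAX check is a couple of preprocessor lines:

#include <limits.h>

#if LONG_MAX <= 0x7FFFFFFFL
#error "long is 32-bit here: plain fseek() cannot address offsets past 2 GiB"
#endif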
A better way is to use the pre-processor macros supplied by your compiler to check which operating system your code is being compiled for, and write operating-system-specific code. The operating system should provide a way to check whether the filesystem actually supports files larger than 2 GB or 4 GB.

Is open command suitable for binary file operations

I have an FTP application sending binary files over the TCP sockets.
I have opened the file using open and read the binary files as if they were text files (the program works fine with text files), then sent them over TCP.
But I'm struggling with the output at the other end. I wanted to know whether fopen is better suited for binary files, or whether binary files can be treated as text files.
On Linux, there is no notion of binary vs. text files (contrary to Windows, where the distinction is relevant); it appears only in the C99 standard fopen(3) function, whose man page says:
The mode string can also include the letter 'b' either as a last
character or as a character between the characters in any of the two-
character strings described above. This is strictly for
compatibility with C89 and has no effect; the 'b' is ignored on all
POSIX conforming systems, including Linux. (Other systems may treat
text files and binary files differently, and adding the 'b' may be a
good idea if you do I/O to a binary file and expect that your program
may be ported to non-UNIX environments.)
Of course you can use the open(2) syscall directly (BTW, fopen uses it).
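Here is a minimal sketch of streaming a file over an already-connected socket with the raw syscalls (send_file and sockfd are made-up names; a robust version would also retry short writes):

#include <fcntl.h>
#include <unistd.h>

int send_file(const char *path, int sockfd)
{
    char buf[4096];
    ssize_t n;
    int fd = open(path, O_RDONLY);   /* no text/binary flag exists here */
    if (fd < 0)
        return -1;
    while ((n = read(fd, buf, sizeof buf)) > 0) {
        if (write(sockfd, buf, (size_t)n) != n) {  /* short write: bail */
            close(fd);
            return -1;
        }
    }
    close(fd);
    return n < 0 ? -1 : 0;
}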
However, binary files are much less portable (e.g. because of endianness issues) than textual ones. Read about serialization; you might prefer textual formats and protocols (e.g. JSON) to binary ones.
Regarding FTP on the client side, consider using an existing library such as libcurl.

No O_BINARY and O_TEXT flags in Linux?

When using system-level IO in Linux, I noticed that the compiler recognized the O_RDONLY and O_RDWR flags, but it had no clue whatsoever as to the meaning of the O_BINARY and O_TEXT flags.
Is this a Linux thing?
Linux, and just about every flavor of Unix for that matter, doesn't differentiate between binary and text files. Thus, there are no standard constants with that name. You can manually define the constants to be zero in Linux if you want to include them in your code for portability purposes.
http://unix.derkeiler.com/Newsgroups/comp.unix.programmer/2007-03/msg00147.html
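A sketch of that shim, defining the missing constant to zero as suggested (open_image is a made-up helper):

#include <fcntl.h>

#ifndef O_BINARY
#define O_BINARY 0   /* real flag with Windows/DOS compilers; no-op on POSIX */
#endif

int open_image(const char *path)
{
    return open(path, O_RDONLY | O_BINARY);
}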
It's a *nix thing. *nix operating systems don't do automatic linefeed conversion for I/O on "text" files so O_TEXT and O_BINARY flags wouldn't make sense.
At the level of the C language and its standard library, there's no such thing as O_BINARY and O_TEXT flags. The binary or text mode is selected by adding the b specifier to the mode parameter of the fopen function. The specifier itself is, of course, supported by all C implementations, but on POSIX platforms it has no effect: per the POSIX specification, text mode is the same as binary mode.
Not surprisingly, if you dig deeper into the level of non-standard, platform-specific Unix I/O functions, you'll discover that they have no knowledge of that text/binary distinction whatsoever.
Windows uses \r\n for line endings; Linux (and other Unix-alikes) use just \n. In Windows, reading with O_BINARY gives you the raw data, odd line endings and all, while O_TEXT normalises the line endings, so your C code sees only a single '\n' character.
Under Linux et al., there's no point distinguishing between text and binary, because a line ending is a single character in the data anyway, so the flags are unnecessary.
There isn't a difference at the OS level between binary and text files under Unix; text files just have restricted content. That's also true for Windows, but the conventions C uses for line endings are the same as those of Unix, while Windows uses a CR/LF pair (and an explicit end-of-file marker in some contexts, though the handling of that was not consistent even in the system programs last time I checked), so a mapping is needed to respect the conventions mandated by C.
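A tiny experiment makes the mapping concrete (filenames are placeholders): after running this, the text-mode file is two bytes on Windows, while on Linux both files are a single byte.

#include <stdio.h>

int main(void)
{
    FILE *t = fopen("t.txt", "w");    /* text mode */
    FILE *b = fopen("b.bin", "wb");   /* binary mode */
    if (t == NULL || b == NULL)
        return 1;
    fputc('\n', t);   /* Windows writes "\r\n" (2 bytes); Linux writes 1 */
    fputc('\n', b);   /* 1 byte everywhere */
    fclose(t);
    fclose(b);
    return 0;
}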

Is there a standard way to do an fopen with a Unicode string file path?

Is there a standard way to do an fopen with a Unicode string file path?
No, there's no standard way. There are some differences between operating systems. Here's how different OSs handle non-ASCII filenames.
Linux
Under Linux, a filename is simply a binary string. The convention on most modern distributions is to use UTF-8 for non-ASCII filenames. But in the beginning, it was common to encode filenames as ISO-8859-1. It's basically up to each application to choose an encoding, so you can even have different encodings used on the same filesystem. The LANG environment variable can give you a hint as to what the preferred encoding is. But these days, you can probably assume UTF-8 everywhere.
This is not without problems, though, because a filename containing an invalid UTF-8 sequence is perfectly valid on most Linux filesystems. How would you specify such a filename if you only support UTF-8? Ideally, you should support both UTF-8 and binary filenames.
OS X
The HFS filesystem on OS X uses Unicode (UTF-16) filenames internally. Most C (and POSIX) library functions like fopen accept UTF-8 strings (since they're 8-bit compatible) and convert them internally.
Windows
The Windows API uses UTF-16 for filenames, but fopen uses the current codepage, whatever that is (UTF-8 just became an option). Many C library functions have a non-standard equivalent that accepts UTF-16 (wchar_t on Windows). For example, _wfopen instead of fopen.
In *nix, you simply use the standard fopen (see more information in the reply from TokeMacGuy, or in this forum).
In Windows, you can use _wfopen and pass it a Unicode string (for more information, see MSDN).
As there is no real common way, I would wrap this call in a macro, together with all other system-dependent functions.
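A sketch of such a wrapper, here as a small function rather than a macro, assuming the caller always passes UTF-8 (my_fopen is a made-up name):

#include <stdio.h>

#ifdef _WIN32
#include <windows.h>
static FILE *my_fopen(const char *utf8_path, const char *mode)
{
    wchar_t wpath[MAX_PATH], wmode[16];
    /* convert the UTF-8 arguments to UTF-16 for _wfopen */
    MultiByteToWideChar(CP_UTF8, 0, utf8_path, -1, wpath, MAX_PATH);
    MultiByteToWideChar(CP_UTF8, 0, mode, -1, wmode, 16);
    return _wfopen(wpath, wmode);
}
#else
static FILE *my_fopen(const char *utf8_path, const char *mode)
{
    return fopen(utf8_path, mode);   /* POSIX: the path is just bytes */
}
#endif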
This is a matter of your current locale. On my system, which is Unicode-enabled, file paths will be in Unicode. I'm able to detect this by means of the locale command:
$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
The encoding of file paths is normally set system wide, so if your file path is not in the system's locale, you will need to convert it, perhaps by means of the iconv library.
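For example, a rough sketch of converting an ISO-8859-1 path to UTF-8 with iconv before passing it to fopen (to_utf8 is a made-up helper; error handling is minimal):

#include <iconv.h>
#include <string.h>

/* convert a Latin-1 string to UTF-8; returns 0 on success, -1 on failure */
static int to_utf8(const char *latin1, char *out, size_t outsize)
{
    iconv_t cd = iconv_open("UTF-8", "ISO-8859-1");
    char *in = (char *)latin1;
    char *dst = out;
    size_t inleft = strlen(latin1), outleft = outsize - 1;
    int ok;
    if (cd == (iconv_t)-1)
        return -1;
    ok = iconv(cd, &in, &inleft, &dst, &outleft) != (size_t)-1;
    iconv_close(cd);
    if (!ok)
        return -1;
    *dst = '\0';   /* iconv advanced dst past the converted bytes */
    return 0;
}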
Almost all POSIX platforms use UTF-8 nowadays, and modern Windows also supports UTF-8 as the locale, so you can just use UTF-8 everywhere and open any file without using wide strings on Windows. fopen then just works portably:
setlocale(LC_ALL, "en_us.utf8"); // needs some setup before calling this
fopen("C:\\filê\\wíth\\Ünicode\\name.txt", "w+");
Starting in Windows 10 build 17134 (April 2018 Update), the Universal C Runtime supports using a UTF-8 code page. This means that char strings passed to C runtime functions will expect strings in the UTF-8 encoding. To enable UTF-8 mode, use ".UTF8" as the code page when using setlocale. For example, setlocale(LC_ALL, ".UTF8") will use the current default Windows ANSI code page (ACP) for the locale and UTF-8 for the code page.
...
To use this feature on an OS prior to Windows 10, such as Windows 7, you must use app-local deployment or link statically using version 17134 of the Windows SDK or later. For Windows 10 operating systems prior to 17134, only static linking is supported.
UTF-8 Support
