What is the difference between _LARGEFILE_SOURCE and _FILE_OFFSET_BITS=64?

I understand that -D_FILE_OFFSET_BITS=64 causes off_t to be 64 bits. So what does -D_LARGEFILE_SOURCE do that isn't already done by -D_FILE_OFFSET_BITS=64? What do these definitions do exactly?

The GLIBC Feature test macros documentation states:
_LARGEFILE_SOURCE
If this macro is defined some extra functions are available which rectify a few shortcomings in all previous standards. Specifically, the functions fseeko and ftello are available. Without these functions the difference between the ISO C interface (fseek, ftell) and the low-level POSIX interface (lseek) would lead to problems.
This macro was introduced as part of the Large File Support extension (LFS).
So that macro specifically makes fseeko and ftello available. _FILE_OFFSET_BITS settings alone don't make these functions available.
(Note that if you're using a GNU dialect of C, the default with GCC, you might not need to explicitly define _LARGEFILE_SOURCE. You do if you use -std=c99 for instance.)
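For instance, a minimal sketch (the file name is a placeholder) that relies on these macros to seek past 2GB on a 32-bit system, compiled with e.g. gcc -std=c99:
#define _LARGEFILE_SOURCE        /* expose fseeko/ftello under -std=c99 */
#define _FILE_OFFSET_BITS 64     /* make off_t 64-bit on 32-bit systems */
#include <stdio.h>

int main(void)
{
    FILE *f = fopen("data.bin", "rb");   /* placeholder file name */
    if (!f)
        return 1;
    if (fseeko(f, (off_t)3 << 30, SEEK_SET) != 0) {  /* 3 GiB: beyond a 32-bit offset */
        fclose(f);
        return 1;
    }
    printf("now at %lld\n", (long long)ftello(f));
    fclose(f);
    return 0;
}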

The other answer is wrong, as the documentation for _LARGEFILE_SOURCE is misleading. _FILE_OFFSET_BITS=64 is sufficient to expose the fseeko and ftello functions, and so is defining _POSIX_C_SOURCE to >= 200112L.
From the glibc documentation on _FILE_OFFSET_BITS
If the macro is defined to the value 64, the large file interface replaces the old interface. I.e., the functions are not made available under different names (as they are with _LARGEFILE64_SOURCE). Instead the old function names now reference the new functions, e.g., a call to fseeko now indeed calls fseeko64.
Always define _FILE_OFFSET_BITS=64 to switch over to the 64-bit types on 32-bit glibc-based systems. glibc should really make it the default...
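A quick way to see the effect on a 32-bit glibc system (a sketch; compile once with and once without -D_FILE_OFFSET_BITS=64):
#include <stdio.h>
#include <sys/types.h>

int main(void)
{
    /* prints 8 when built with -D_FILE_OFFSET_BITS=64 on 32-bit glibc,
       4 without it; on 64-bit systems it is 8 either way */
    printf("sizeof(off_t) = %zu\n", sizeof(off_t));
    return 0;
}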

LFS
Large File Support (LFS) provides the extra functionality required to access large files.
large file == on 32-bit architectures, a file larger than 2GB
We can write applications requiring LFS functionality in one of two ways:
Define the _LARGEFILE64_SOURCE feature test macro when compiling our program (use the transitional LFS API).
Define the _FILE_OFFSET_BITS macro with the value 64 when compiling our program.
_LARGEFILE64_SOURCE
Define the _LARGEFILE64_SOURCE feature test macro when compiling our program to use the transitional LFS API.
This API provides functions capable of handling 64-bit file sizes and offsets.
These functions have the same names as their 32-bit counterparts,
but have the suffix 64 appended to the function name.
Among these functions are
fopen64(),
open64(),
lseek64(),
truncate64(),
stat64(),
mmap64(),
setrlimit64().
e.g.:
fd = open64(name, O_CREAT | O_RDWR, mode);
_FILE_OFFSET_BITS
Define the _FILE_OFFSET_BITS macro with the value 64 when compiling our programs.
This automatically converts all of the relevant 32-bit functions and data types into their 64-bit counterparts.
For example,
calls to open() are actually converted into calls to open64(), and
the off_t data type is defined to be 64 bits long.
In other words,
we can recompile an existing program to handle large files without needing to make any changes to the source code.
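As a sketch of that (hypothetical file name), the following unchanged source seeks to a 5 GiB offset; it only works when built with -D_FILE_OFFSET_BITS=64 on a 32-bit system, where open() silently becomes open64():
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    int fd = open("bigfile", O_CREAT | O_RDWR, 0644);  /* becomes open64() under LFS */
    if (fd == -1)
        return 1;
    off_t pos = lseek(fd, (off_t)5 << 30, SEEK_SET);   /* 5 GiB needs a 64-bit off_t */
    close(fd);
    return pos == (off_t)-1;
}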
vs
Using _FILE_OFFSET_BITS is clearly simpler than using _LARGEFILE64_SOURCE (the transitional LFS API).
_LARGEFILE64_SOURCE (transitional LFS API) is now obsolete.
_FILE_OFFSET_BITS is preferred.
reference
The Linux Programming Interface
(most content was directly copied from here)

Related

Does _FILE_OFFSET_BITS work after prior inclusion of stdio.h?

I want to use fseeko, and so have this:
#define _FILE_OFFSET_BITS 64
#include <stdio.h>
Because the source is intended for all sorts of platforms and architectures, there are many surrounding preprocessor switches for each, making it undesirable to copy this into every source file where it is used. However, if putting it in a header, I fear this would require that header to be included first, which may be difficult to enforce.
Testing whether an stdio.h macro is defined or not might work, e.g. SEEK_CUR, but that is ugly in its own ways since there is no standard, self-explanatory STDIO_INCLUDED macro to test for.
Some headers allow multiple inclusion with different switches, like assert.h and NDEBUG. I was wondering if this is also the case according to POSIX standard for _FILE_OFFSET_BITS and other similar switches.
Ugh. Why do my reading skills get a boost right after asking a question in public? From the fseeko page, emphasis mine:
On some architectures, both off_t and long are 32-bit types, but defining _FILE_OFFSET_BITS with the value 64 (before including any header files) will turn off_t into a 64-bit type.
The answer is no.
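One workaround, along the lines the question itself suggests with SEEK_CUR, is a tiny project header that every file must include first (a sketch; the header name is hypothetical, and the SEEK_CUR check is a crude heuristic, not standard):
/* lfs_first.h -- hypothetical; include before anything else */
#ifndef LFS_FIRST_H
#define LFS_FIRST_H

#ifdef SEEK_CUR  /* defined by <stdio.h>; crude check that we really are first */
#error "lfs_first.h must be included before <stdio.h>"
#endif

#define _FILE_OFFSET_BITS 64

#endif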

Which platforms implement flock?

I'm looking at the Ruby MRI code for File#flock. The documentation states that it's "Not available on all platforms.", but doesn't state which. If I should venture a guess, old FAT file systems might not have locking, but I would like to not be guessing.
Digging a bit into the implementation takes me to rb_file_flock(VALUE obj, VALUE operation), which in turn calls rb_thread_flock(void *data). This simply wraps a call to flock from sys/file.h. However, it seems that this implementation may or may not be available:
#ifdef HAVE_SYS_FILE_H
# include <sys/file.h>
#else
int flock(int, int);
#endif
However, I can't figure out where HAVE_SYS_FILE_H is defined (In a build-script perhaps?), so I don't know which platforms would enable it.
So, for my question(s): Which platforms could I expect HAVE_SYS_FILE_H to be defined for. And provided that it is defined and thus sys/file.h available, can I expect file locking to work?
flock is a BSD and Linux extension function:
CONFORMING TO
4.4BSD (the flock() call first appeared in 4.2BSD). A version of flock(), possibly implemented in terms of fcntl(2), appears on most UNIX systems.
The Unix specification does require advisory file locking, in the form of fcntl(F_SETLK):
Record locking shall be supported for regular files, and may be supported for other files.
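So where flock() is missing, the fcntl()-based record locks that the specification mandates can be used instead; a minimal sketch (placeholder file name):
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    int fd = open("lockfile", O_CREAT | O_RDWR, 0644);  /* placeholder name */
    if (fd == -1)
        return 1;

    struct flock fl = {0};
    fl.l_type   = F_WRLCK;    /* exclusive (write) lock */
    fl.l_whence = SEEK_SET;
    fl.l_start  = 0;
    fl.l_len    = 0;          /* 0 == lock the whole file */
    if (fcntl(fd, F_SETLKW, &fl) == -1) {  /* F_SETLKW blocks until granted */
        close(fd);
        return 1;
    }
    /* ... critical section ... */
    close(fd);                /* closing releases the lock */
    return 0;
}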

Cross-platform seeking for large files [duplicate]

I am running into integer overflow using the standard ftell and fseek functions in G++, but I guess I was mistaken because it seems that ftell64 and fseek64 are not available. I have been searching and many websites seem to reference using lseek with the off64_t datatype, but I have not found any examples referencing something equal to fseek. Right now the files that I am reading in are 16GB+ CSV files with the expectation of at least double that.
Without any external libraries what is the most straightforward method for achieving a similar structure as with the fseek/ftell pair? My application right now works using the standard GCC/G++ libraries for 4.x.
fseeko64 is a glibc function (there is no fseek64). To make it available you'll have to define _FILE_OFFSET_BITS=64 before including the system headers. That will more or less define fseeko to actually be fseeko64. Or do it in the compiler arguments, e.g.
gcc -D_FILE_OFFSET_BITS=64 ....
http://www.suse.de/~aj/linux_lfs.html has a great overview of large file support on Linux:
Compile your programs with "gcc -D_FILE_OFFSET_BITS=64". This forces all file access calls to use the 64 bit variants. Several types change also, e.g. off_t becomes off64_t. It's therefore important to always use the correct types and to not use e.g. int instead of off_t. For portability with other platforms you should use getconf LFS_CFLAGS which will return -D_FILE_OFFSET_BITS=64 on Linux platforms but might return something else on e.g. Solaris. For linking, you should use the link flags that are reported via getconf LFS_LDFLAGS. On Linux systems, you do not need special link flags.
Define _LARGEFILE_SOURCE and _LARGEFILE64_SOURCE. With these defines you can use the LFS functions like open64 directly.
Use the O_LARGEFILE flag with open to operate on large files.
If you want to stick to ISO C standard interfaces, use fgetpos() and fsetpos(). However, these functions are only useful for saving a file position and going back to the same position later. They represent the position using the type fpos_t, which is not required to be an integer data type. For example, on a record-based system it could be a struct containing a record number and offset within the record. This may be too limiting.
POSIX defines the functions ftello() and fseeko(), which represent the position using the off_t type. This is required to be an integer type, and the value is a byte offset from the beginning of the file. You can perform arithmetic on it, and can use fseeko() to perform relative seeks. This will work on Linux and other POSIX systems.
In addition, compile with -D_FILE_OFFSET_BITS=64 (Linux/Solaris). This will define off_t to be a 64-bit type (i.e. off64_t) instead of long, and will redefine the functions that use file offsets to be the versions that take 64-bit offsets. This is the default when you are compiling for 64-bit, so is not needed in that case.
fseek64() isn't standard, the compiler docs should tell you where to find it.
Have you tried fgetpos and fsetpos? They're designed for large files and the implementation typically uses a 64-bit type as the base for fpos_t.
Have you tried fseeko() with the _FILE_OFFSET_BITS preprocessor symbol set to 64?
This will give you an fseek()-like interface but with an offset parameter of type off_t instead of long. Setting _FILE_OFFSET_BITS=64 will make off_t a 64-bit type.
The same goes for ftello().
Use fsetpos(3) and fgetpos(3). They use the fpos_t datatype, which I believe is guaranteed to be able to hold at least 64 bits.
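A short sketch of that approach (placeholder file name); note that fpos_t can only be saved and restored, not computed with:
#include <stdio.h>

int main(void)
{
    FILE *f = fopen("big.csv", "rb");   /* placeholder file name */
    if (!f)
        return 1;

    fpos_t mark;
    if (fgetpos(f, &mark) != 0) { fclose(f); return 1; }  /* remember where we are */

    /* ... scan ahead through the file ... */

    if (fsetpos(f, &mark) != 0) { fclose(f); return 1; }  /* jump back to the mark */
    fclose(f);
    return 0;
}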

sigaction System Call

I was looking at the man page of sigaction, and I ended up looking at the following line.
sigaction(): _POSIX_C_SOURCE >= 1 || _XOPEN_SOURCE || _POSIX_SOURCE
What do _POSIX_C_SOURCE, _XOPEN_SOURCE, and _POSIX_SOURCE mean? What should I do with them?
These are feature test macros. Their purpose is to allow your program to inform the system header files which standards you want it to attempt to conform to, and what extensions you want available.
Without any feature test macros defined, implementations vary a lot in what macros, functions, and type definitions they make visible in their headers. A common practice is to make everything visible by default, which is a problem because "everything" is not very specific, and it's very possible that symbol names used in your program might clash with some of the extensions. Even if they don't clash now, there's no way to know if they will in the future. So the standards (like ISO C and POSIX) put strict requirements on the implementation that it not pollute the application's namespace with names not explicitly defined or reserved in the standards. When you use a feature test macro to request a particular standard, you're asking the implementation to ensure that (1) it provides everything defined in this standard, (2) it doesn't pollute your application's namespace by providing anything not defined in that standard.
A correct program should always explicitly use the right feature test macros for the standard(s) it's written to. The easiest way to do this is putting the right -D argument on the compiler command line (CFLAGS). Adding the #define as the first line in each source file also works. Be aware if you do it in source files though:
The feature test macros must be defined at the top before any system header is included.
It's usually a bad idea to use different feature test macros in different translation units.
As an aside, it's not exactly the same as the other feature test macros, but all modern programs should define _FILE_OFFSET_BITS=64 when built on Linux/glibc to request that off_t be 64-bit for large file support.
Here is the man page for feature test macros: http://www.kernel.org/doc/man-pages/online/pages/man7/feature_test_macros.7.html
They will turn on or off some level of standard support in the headers.
E.g. _POSIX_C_SOURCE >= 1 means that POSIX.2-1992 or later should be supported; _XOPEN_SOURCE means POSIX.1, POSIX.2, and XPG4 are enabled; and greater values of the macro (>=500, >=600, >=700) also turn on variants of SUSv2, v3, or v4 (UNIX 98; UNIX 03; or POSIX.1-2008+XSI). And _POSIX_SOURCE is an obsolete way to define _POSIX_C_SOURCE = 1.
They're the things you have to #define to get the prototype, and are known as feature test macros.
For example, the following code will successfully define the prototype for sigaction:
#define _XOPEN_SOURCE
#include <signal.h>
Including signal.h without that #define (or the others) will not define the prototype.
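A slightly fuller sketch (the handler is hypothetical) showing the macro, the include, and an actual sigaction() call:
#define _XOPEN_SOURCE      /* expose the sigaction() prototype */
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

static volatile sig_atomic_t got_sigint = 0;

static void on_sigint(int signo)   /* hypothetical handler */
{
    (void)signo;
    got_sigint = 1;                /* only set a flag: async-signal-safe */
}

int main(void)
{
    struct sigaction sa = {0};
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);      /* block nothing extra during the handler */
    sa.sa_flags = 0;
    if (sigaction(SIGINT, &sa, NULL) == -1)
        return 1;

    while (!got_sigint)
        pause();                   /* wait for Ctrl-C */
    puts("caught SIGINT");
    return 0;
}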
It is a Feature test macro.
Symbols called "feature test macros" are used to control the visibility of symbols that might be included in a header. Implementations, future versions of IEEE Std 1003.1-2001, and other standards may define additional feature test macros.

Macro definitions for headers, where to put them?

When defining macros that headers rely on, such as _FILE_OFFSET_BITS, FUSE_USE_VERSION, _GNU_SOURCE among others, where is the best place to put them?
Some possibilities I've considered include
At the top of any source files that rely on definitions exposed by headers included in that file
Immediately before the include for the relevant header(s)
Define them at the CPPFLAGS level via the compiler (such as -D_FILE_OFFSET_BITS=64), for:
Entire source repo
The whole project
Just the sources that require it
In project headers, which should also include those relevant headers to which the macros apply
Some other place I haven't thought of, but is infinitely superior
A note: Justification by applicability to make, autotools, and other build systems is a factor in my decision.
If the macros affect system headers, they probably ought to go somewhere where they affect every source file that includes those system headers (which includes those that include them indirectly). The most logical place would therefore be on the command line, assuming your build system allows you to set e.g. CPPFLAGS to affect the compilation of every file.
If you use precompiled headers, and have a precompiled header that must therefore be included first in every source file (e.g. stdafx.h for MSVC projects) then you could put them in there too.
For macros that affect self-contained libraries (whether third-party or written by you), I would create a wrapper header that defines the macros and then includes the library header. All uses of the library from your project should then include your wrapper header rather than including the library header directly. This avoids defining macros unnecessarily, and makes it clear that they relate to that library. If there are dependencies between libraries then you might want to make the macros global (in the build system or precompiled header) just to be on the safe side.
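A sketch of such a wrapper, using the FUSE_USE_VERSION macro mentioned in the question (the wrapper's name is hypothetical); the rest of the project includes only the wrapper:
/* fuse_api.h -- hypothetical wrapper around the library header */
#ifndef FUSE_API_H
#define FUSE_API_H

#define FUSE_USE_VERSION 28   /* macro the FUSE header relies on */
#include <fuse.h>             /* the real library header */

#endif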
Well, it depends.
Mostly, I'd define them via the command line - in a Makefile or whatever build system you use.
As for _FILE_OFFSET_BITS I really wouldn't define it explicitly, but rather use getconf LFS_CFLAGS and getconf LFS_LDFLAGS.
I would always put them on the command line via CPPFLAGS for the whole project. If you put them any other place, there's a danger that you might forget to copy them into a new source file or include a system header before including the project header that defines them, and this could lead to extremely nasty bugs (like one file declaring a legacy 32-bit struct stat and passing its address to a function in another file which expects a 64-bit struct stat).
BTW, it's really ridiculous that _FILE_OFFSET_BITS=64 still isn't the default on glibc.
Most projects that I've seen use them did it via -D command line options. They are there because that eases building the source with different compilers and system headers. If you were to build with a system compiler for another system that didn't need them or needed a different set of them then a configure script can easily change the command line arguments that a make file passes to the compiler.
It's probably best to do it for the entire program, because some of the flags affect which version of a function gets brought in or the size/layout of a struct, and mixing those up could cause crazy things if you aren't careful.
They certainly are annoying to keep up with.
For _GNU_SOURCE and the autotools in particular, you could use AC_USE_SYSTEM_EXTENSIONS (citing liberally from the autoconf manual here):
-- Macro: AC_USE_SYSTEM_EXTENSIONS
This macro was introduced in Autoconf 2.60. If possible, enable
extensions to C or Posix on hosts that normally disable the
extensions, typically due to standards-conformance namespace
issues. This should be called before any macros that run the C
compiler. The following preprocessor macros are defined where
appropriate:
_GNU_SOURCE
Enable extensions on GNU/Linux.
__EXTENSIONS__
Enable general extensions on Solaris.
_POSIX_PTHREAD_SEMANTICS
Enable threading extensions on Solaris.
_TANDEM_SOURCE
Enable extensions for the HP NonStop platform.
_ALL_SOURCE
Enable extensions for AIX 3, and for Interix.
_POSIX_SOURCE
Enable Posix functions for Minix.
_POSIX_1_SOURCE
Enable additional Posix functions for Minix.
_MINIX
Identify Minix platform. This particular preprocessor macro
is obsolescent, and may be removed in a future release of
Autoconf.
For _FILE_OFFSET_BITS, you need to call AC_SYS_LARGEFILE and AC_FUNC_FSEEKO:
— Macro: AC_SYS_LARGEFILE
Arrange for 64-bit file offsets, known as large-file support. On some hosts, one must use special compiler options to build programs that can access large files. Append any such options to the output variable CC. Define _FILE_OFFSET_BITS and _LARGE_FILES if necessary.
Large-file support can be disabled by configuring with the --disable-largefile option.
If you use this macro, check that your program works even when off_t is wider than long int, since this is common when large-file support is enabled. For example, it is not correct to print an arbitrary off_t value X with printf("%ld", (long int) X).
The LFS introduced the fseeko and ftello functions to replace their C counterparts fseek and ftell that do not use off_t. Take care to use AC_FUNC_FSEEKO to make their prototypes available when using them and large-file support is enabled.
If you are using autoheader to generate a config.h, you could define the other macros you care about using AC_DEFINE or AC_DEFINE_UNQUOTED:
AC_DEFINE([FUSE_VERSION], [28], [FUSE Version.])
The definition will then get passed to the command line or placed in config.h, if you're using autoheader. The real benefit of AC_DEFINE is that it easily allows preprocessor definitions as a result of configure checks and separates system-specific cruft from the important details.
When writing the .c file, #include "config.h" first, then the interface header (e.g., foo.h for foo.c - this ensures that the header has no missing dependencies), then all other headers.
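For example (hypothetical file names):
/* foo.c */
#include "config.h"   /* autoconf output: feature macros first, before any system header */
#include "foo.h"      /* own interface next: proves foo.h has no missing dependencies */
#include <stdio.h>    /* all other headers afterwards */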
I usually put them as close as practicable to the things that need them, whilst ensuring you don't set them incorrectly.
Related pieces of information should be kept close to make it easier to identify. A classic example is the ability for C to now allow variable definitions anywhere in the code rather than just at the top of a function:
void something (void) {
// 600 lines of code here
int x = fn(y);
// more code here
}
is a lot better than:
void something (void) {
int x;
// 600 lines of code here
x = fn(y);
// more code here
}
since you don't have to go searching for the type of x in the latter case.
By way of example, if you need to compile a single source file multiple times with different values, you have to do it with the compiler:
gcc -Dmydefine=7 -o binary7 source.c
gcc -Dmydefine=9 -o binary9 source.c
However, if every compilation of that file will use 7, it can be moved closer to the place where it's used:
source.c:
#include <stdio.h>
#define mydefine 7
#include "header_that_uses_mydefine.h"
#define mydefine 7
#include "another_header_that_uses_mydefine.h"
Note that I've done it twice so that it's more localised. This isn't a problem since, if you change only one, the compiler will tell you about it, but it ensures that you know those defines are set for the specific headers.
And, if you're certain that you will never include (for example) bitio.h without first setting BITCOUNT to 8, you can even go so far as to create a bitio8.h file containing nothing but:
#define BITCOUNT 8
#include "bitio.h"
and then just include bitio8.h in your source files.
Global, project-wide constants that are target specific are best put in CCFLAGS in your makefile. Constants you use all over the place can go in appropriate header files which are included by any file that uses them.
For example,
// bool.h - a boolean type for C
#ifndef BOOL_H
#define BOOL_H
typedef int bool_t;
#define TRUE 1
#define FALSE 0
#endif
Then, in some other header,
#include "bool.h"
// blah
Using header files is what I recommend because it allows you to have a code base built by make files and other build systems as well as IDE projects such as Visual Studio. This gives you a single point of definition that can be accompanied by comments (I'm a fan of doxygen which allows you to generate macro documentation).
The other benefit with header files is that you can easily write unit tests to verify that only valid combinations of macros are defined.
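As a sketch of such a check, using the bool.h example above, a test file can refuse to compile when the macros are missing or inconsistent:
#include "bool.h"

#if !defined(TRUE) || !defined(FALSE)
#error "bool.h must define TRUE and FALSE"
#elif TRUE == FALSE
#error "TRUE and FALSE must be distinct"
#endif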
