Which platforms implement flock? - c

I'm looking at the Ruby MRI code for File#flock. The documentation states that it's "Not available on all platforms.", but doesn't state which. If I should venture a guess, old FAT file systems might not have locking, but I would like to not be guessing.
Digging a bit into the implementation takes me to rb_file_flock(VALUE obj, VALUE operation), which in turn calls rb_thread_flock(void *data). This simply wraps a call to flock from sys/file.h. However, it seems that this implementation may or may not be available:
#ifdef HAVE_SYS_FILE_H
# include <sys/file.h>
#else
int flock(int, int);
#endif
However, I can't figure out where HAVE_SYS_FILE_H is defined (In a build-script perhaps?), so I don't know which platforms would enable it.
So, my questions: for which platforms can I expect HAVE_SYS_FILE_H to be defined? And provided that it is defined, and sys/file.h is thus available, can I expect file locking to work?

flock is a BSD and Linux extension function:
CONFORMING TO
4.4BSD (the flock() call first appeared in 4.2BSD). A version of flock(), possibly implemented in terms of fcntl(2), appears on most UNIX systems.
The Unix specification, on the other hand, does require advisory file locking, provided through fcntl(F_SETLK):
Record locking shall be supported for regular files, and may be supported for other files.
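For code that cannot rely on flock(), the fcntl() route mentioned above can be sketched roughly as follows. This is a minimal sketch, not what Ruby does internally; the helper names lock_whole_file and unlock_whole_file are made up for illustration, and it assumes a POSIX system where fcntl.h provides struct flock.

```c
#include <fcntl.h>   /* fcntl(), struct flock, F_SETLK, F_RDLCK, F_WRLCK, F_UNLCK */
#include <stdio.h>   /* SEEK_SET */

/* Hypothetical helper: try to take an advisory lock on the whole file.
 * Returns 0 on success, -1 (with errno set) if the lock is held elsewhere
 * or the call fails. Does not block. */
int lock_whole_file(int fd, int exclusive)
{
    struct flock fl = {0};

    fl.l_type = exclusive ? F_WRLCK : F_RDLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;               /* 0 means "through end of file" */
    return fcntl(fd, F_SETLK, &fl);
}

/* Hypothetical helper: release the lock taken above. */
int unlock_whole_file(int fd)
{
    struct flock fl = {0};

    fl.l_type = F_UNLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;
    return fcntl(fd, F_SETLK, &fl);
}
```

Note that an exclusive (write) lock needs a descriptor opened for writing, and that fcntl() locks belong to the process rather than to the open file, so their semantics differ from flock() in some corner cases.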

Related

Cross-platform way to determine if file has been edited?

I am writing a cross-platform (big 3 - Linux, MAC, Windows) backup program, so I need to know if a file has been edited since last time. My plan is to save the last save time in a file and check the real situation of a folder against the data in the file to determine which files need to be backed up or updated.
I would like to avoid methods that require a lot of processing power (like diff, or counting bytes).
In this similar post, people suggested to use fstat(), but that solution would be a last resort for me because I was hoping for a cross-platform solution that can be solved with pure C. As far as I know, fstat is a (2), and in my man page it appears as (1), which (to my understanding) means that it is a system function in Linux and isn't a part of the standard C library. I have searched for fstat on windows, but could only find some android version.
Is there some other way to access file metadata? Is there some other solution to this? I am open to any suggestions and am ok if it sometimes false-flags, as long as it backs up data correctly and doesn't waste resources on backing up everything all the time.
Please help!
Thank you!
fstat is still the way to do this, but on Windows it's called _fstat. You can check for the _MSC_VER macro which will be defined if you're building with MSVC, and if so create a macro alias for fstat.
You can do the same for struct stat which MSVC calls struct _stat:
#ifdef _MSC_VER
#define fstat(fd,buf) _fstat(fd,buf)
typedef struct _stat stat_struct;
#else
typedef struct stat stat_struct;
#endif
Then you can use fstat and pass it an argument of type stat_struct for the second argument.
I have a decently sized cross platform open source application that uses this technique.
My plan is to save the last save time in a file and check the real situation of a folder against the data in the file to determine which files need to be backed up or updated.
Ok.
I was hoping for a cross-platform solution that can be solved with pure C.
If by "pure C" you mean relying on only language features and library functions defined by the C language specification, then I'm afraid you're out of luck. Pure C (in that sense) has no concept of persistent file metadata such as modification timestamps. All functions and data structures dealing with such things are extensions or third-party libraries.
You can rely on standard POSIX facilities (such as fstat()) for both Linux and Mac, but Windows does not provide that. At least, Windows does not provide it exactly. The Microsoft C library does provide some POSIX compatibility functions, but it somewhat maddeningly uses modified names for them. In particular, it offers several flavors of _fstat() (note leading underscore). With a little bit of macro glue, it should not be too hard to make your program use POSIX fstat() on Linux and Mac, and use one of the _fstat() flavors on Windows.
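As a rough sketch of that macro glue, the following reads a file's modification time on either family of systems. The names portable_fstat, portable_fileno, and file_mtime are invented for this example, and it assumes MSVC's _fstat and _fileno on Windows and POSIX fstat and fileno elsewhere.

```c
#include <stdio.h>
#include <time.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical glue macros: pick the MSVC or POSIX spelling. */
#ifdef _MSC_VER
#define portable_fstat(fd, buf) _fstat((fd), (buf))
#define portable_fileno(fp) _fileno(fp)
typedef struct _stat stat_struct;
#else
#define portable_fstat(fd, buf) fstat((fd), (buf))
#define portable_fileno(fp) fileno(fp)
typedef struct stat stat_struct;
#endif

/* Hypothetical helper: store the last modification time of `path`
 * in *mtime_out. Returns 0 on success, -1 on failure. */
int file_mtime(const char *path, time_t *mtime_out)
{
    stat_struct st;
    int rc;
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
        return -1;
    rc = portable_fstat(portable_fileno(fp), &st);
    fclose(fp);
    if (rc != 0)
        return -1;
    *mtime_out = st.st_mtime;
    return 0;
}
```

A backup tool could compare the value returned here against the timestamp it saved on the previous run and copy the file only when the two differ.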

What is the purpose of libc_nonshared.a?

Why does libc_nonshared.a exist? What purpose does it serve? I haven't been able to find a good answer for its existence online.
As far as I can tell it provides certain symbols (stat, lstat, fstat, atexit, etc.). If someone uses one of these functions in their code, it will get linked into the final executable from this archive. These functions are part of the POSIX standard and are pretty common so I don't see why they wouldn't just be put in the shared or static libc.so.6 or libc.a, respectively.
It was a legacy mistake in how glibc implemented extensibility for the definition of struct stat, made before better mechanisms (symbol redirection or versioning) were thought of. The definitions of the stat-family functions in libc_nonshared.a cause the version of the structure to bind at link-time, and the definitions there call the __xstat-family functions in the real shared libc, which take an extra argument indicating the desired structure version. This implementation is non-conforming to the standard, since each shared library ends up getting its own copy of the stat-family functions with their own addresses, breaking the requirement that pointers to the same function evaluate equal.
Here's the problem. Long ago, members of the struct stat structure had different sizes than they have today. In particular:
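The pattern can be illustrated with a toy model. Everything below is hypothetical (lib_xinfo, get_info, and the info structures are invented names, and this is not glibc's actual code); it only shows how a thin wrapper compiled into the caller pins the structure version, while the shared implementation accepts a version argument.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical layout versions, standing in for old and new struct stat. */
#define INFO_VER_OLD 1
#define INFO_VER_NEW 2

struct info_v1 { short uid; long size; };       /* old, narrow fields */
struct info_v2 { int uid; long long size; };    /* current, wider fields */

/* Plays the role of the function in the shared library: it fills in
 * whichever layout the caller names. The path length stands in for
 * real file metadata. */
int lib_xinfo(int ver, const char *path, void *buf)
{
    size_t len = strlen(path);

    if (ver == INFO_VER_OLD) {
        struct info_v1 *p = buf;
        p->uid = 0;
        p->size = (long)len;
        return 0;
    }
    if (ver == INFO_VER_NEW) {
        struct info_v2 *p = buf;
        p->uid = 0;
        p->size = (long long)len;
        return 0;
    }
    return -1;  /* unknown layout version */
}

/* Plays the role of the wrapper in the nonshared archive: it is linked
 * into each program or library, so the version is fixed by the headers
 * that were in use at build time. */
#define INFO_VER INFO_VER_NEW
typedef struct info_v2 info_t;

int get_info(const char *path, info_t *buf)
{
    return lib_xinfo(INFO_VER, path, buf);
}
```

Because get_info is copied into every binary that uses it, two shared objects built this way each carry their own copy at a different address, which is the pointer-equality problem described above.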
uid_t was 2 bytes (though I think this one was fixed in the transition from libc5 to glibc)
gid_t was 2 bytes
off_t was 4 bytes
blkcnt_t was 4 bytes
time_t was 4 bytes
also, timespec wasn't used at all and there was no room for nanosecond precision.
So all of these had to change. The only real solution was to make different versions of the stat() system call and library function and you get the version you compiled against. That is, the .a file matches the header files. These things didn't all change at once, but I think we're done changing them now.
You can't really solve this by a macro because the structure name is the same as the function name; and inline wasn't mandated to exist in the beginning so glibc couldn't demand everybody use it.
I remember there used to be this thing O_LARGEFILE for saying you could handle files bigger than 4GB; otherwise things just wouldn't work. We also used to have to define things like _LARGEFILE_SOURCE and _LARGEFILE64_SOURCE, but it's all handled automatically now. Back in the day, if you weren't ready for large file support yet, you didn't define these and you didn't get the 64-bit version of the stat structure; your program also kept working on older kernel versions lacking the new system calls. I haven't checked; it's possible that 32-bit compilation still doesn't define these automatically, but 64-bit always does.
So you probably think; okay, fine, just don't franken-compile stuff? Just build everything that goes into the final executable with the same glibc version and largefile-choice. Ever use plugins such as browser plugins? Those are pretty much guaranteed to be compiled in different places with different compiler and glibc versions and options; and this didn't require you to upgrade your browser and replace all its plugins at the same time.

How to deal with Unicode paths in a cross-platform C library?

I'm contributing to a C library. It has a function that takes a char* parameter for a file path name. The authors are mostly UNIX developers, and this works fine on unixes where char* mostly means UTF-8. (At least in GCC, the character set is configurable and UTF-8 is the default.)
However, char* means ANSI on Windows, which implies that it is currently impossible to use Unicode path names with this library on Windows, where wchar_t* should be used and only UTF-16 is supported. (A quick search on StackOverflow reveals that the ANSI Windows API functions can not be used with UTF-8.)
The question is, what is the right way to deal with this? We've come up with various ways to do it, but neither of us are Windows experts, so we can't really decide how to do it properly. Our goal is that the users of the library should be able to write cross-platform code that would work on unixes as well as windows.
Under the hood, the library has #ifdefs in place to differentiate between operating systems so that it can use POSIX functions on UNIXes and Win32 APIs on Windows.
So far, we've come up with the following possibilities:
1. Offer a separate windows-only function that accepts a wchar_t*.
2. Require UTF-16 on Windows and #ifdef the library header in such a way that the function would accept wchar_t* on Windows.
3. Add a flag that would tell the function to cast the given char* to wchar_t* and call the widechar Windows APIs.
4. Create a variant of the function that takes a file descriptor (or file handle on Windows) instead of a file path.
5. Always require UTF-8 (even on Windows), and then inside the function, convert UTF-8 to UTF-16 and call the widechar Windows APIs.
The problem with options 1-4 is that they would require the user to consciously take care of portability themselves. Option 5 sounds good, but I'm not sure if this is the right way to go.
I'm also open to other suggestions or ideas that can solve this. :)
Since portability is an important goal for you, I think it is imperative for your function semantics to be precisely defined. Among other things, that means that the arguments' types and meanings don't vary across platforms. So, if you have a function that accepts regular char based paths then it should accept such paths on all systems, and the encoding expected of those paths should be well-defined (which does not necessarily mean "the same"). That rules out options (2) and (3).
Moreover, portability requires the same functions to be usable across all platforms; that rules out (1). Option (4) could be ok if a stream- and/or file descriptor-based approach were the only one provided by your library, but it yields portability only with respect to those functions, not with respect to the path-based ones. (And note that stream (FILE *) APIs are defined by C, whereas file descriptors are a POSIX concept, not native to C. In principle, therefore, streams are more portable than file descriptors.)
(5) could work, but it places stronger constraints than you actually need. It is not essential for the function to define the encoding expected (though it can); it suffices for it to define how that encoding is determined.
Additionally, you could add wchar_t-based functions that work everywhere (as opposed to Windows-only). Those might be more convenient for Windows users. Similar to alternative (4), however, that provides portability only with respect to those functions. Supposing that you don't want to drop the char-based ones, you would need to pair this alternative with some variation on (5).
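For reference, a variation on (5) might look roughly like the sketch below. The names fopen_utf8 and utf8_to_wide are invented for illustration; on Windows it assumes the Win32 MultiByteToWideChar function and the Microsoft C runtime's _wfopen, and on other systems it simply passes the bytes through to fopen on the assumption that the file system accepts them as given.

```c
#include <stdio.h>

#ifdef _WIN32
#include <windows.h>
#include <stdlib.h>

/* Hypothetical helper: convert a NUL-terminated UTF-8 string to a newly
 * allocated UTF-16 string. Returns NULL on invalid input or allocation
 * failure; the caller frees the result. */
static wchar_t *utf8_to_wide(const char *s)
{
    int n = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, s, -1, NULL, 0);
    wchar_t *w;

    if (n <= 0)
        return NULL;
    w = malloc((size_t)n * sizeof *w);
    if (w != NULL &&
        MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, s, -1, w, n) <= 0) {
        free(w);
        w = NULL;
    }
    return w;
}
#endif

/* Hypothetical wrapper: the path is documented as UTF-8 on every platform. */
FILE *fopen_utf8(const char *path, const char *mode)
{
#ifdef _WIN32
    wchar_t *wpath = utf8_to_wide(path);
    wchar_t *wmode = utf8_to_wide(mode);
    FILE *fp = (wpath != NULL && wmode != NULL) ? _wfopen(wpath, wmode) : NULL;

    free(wpath);
    free(wmode);
    return fp;
#else
    return fopen(path, mode);
#endif
}
```

The key design point is that the public signature and the documented encoding stay the same everywhere; only the body differs per platform.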

Is there a way to test whether thread safe functions are available in the C standard library?

In regards to the thread safe functions in newer versions of the C standard library, is there a cross-platform way to tell if these are available via pre-processor definition? I am referring to functions such as localtime_r().
If there is not a standard way, what is the reliable way in GCC? [EDIT] Or posix systems with unistd.h?
There is no standard way to test that, which means there is no way to test it across all platforms. Tools like autoconf will create a tiny C program that calls this function and then try to compile and link it. If this works, the function apparently exists; if not, it may not exist (or the compiler options are wrong and the appropriate CFLAGS need to be set).
So you have basically 6 options:
Require them to exist. Your code can only work on platforms where they exist; period. If they don't exist, compilation will fail, but that is not your problem, since the platform violates your minimum requirements.
Avoid using them. If you use the non-thread safe ones, maybe protected by a global lock (e.g. a mutex), it doesn't matter if they exist or not. Of course your code will then only work on platforms with POSIX mutexes, however, if a platform has no POSIX mutexes, it won't have POSIX threads either and if it has no POSIX threads (and I guess you are probably using POSIX threads w/o supporting any alternative), why would you have to worry about thread-safety in the first place?
Decide at runtime. Depending on the platform, either do a "weak link", so you can test at runtime if the function was found or not (a pointer to the function will point to NULL if it wasn't) or alternatively resolve the symbol dynamically using something like dlsym() (which is also not really portable, but widely supported in the Linux/UNIX world). However, in that case you need a fallback if the function is not found at runtime.
Use a tool like autoconf, some other tool with similar functionality, or your own configuration script to determine this prior to start of compilation (and maybe set preprocessor macros depending on result). In that case you will also need a fallback solution.
Limit usage to well known platforms. Whether this function is available on a certain platform is usually known (and once it is available, it won't go away in the future). Most platforms expose preprocessor macros to test what kind of platform that is and sometimes even which version. E.g. if you know that GNU/Linux, Android, Free/Open/NetBSD, Solaris, iOS and MacOS X all offer this function, test if you are compiling for one of these platforms and if yes, use it. If the code is compiled for another platform (or if you cannot determine what platform that is), it may or may not offer this function, but since you cannot say for sure, better be safe and use the fallback.
Let the user decide. Either always use the fallback, unless the user has signaled support or do it the other way round (which makes probably more sense), always assume it is there and in case compilation fails, offer a way the user can force your code into "compatibility mode", by somehow specifying that thread-safe-functions are not available (e.g. by setting an environment variable or by using a different make target). Of course this is the least convenient method for the (poor) user.
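Several of these options end up needing the same thing: a wrapper that uses localtime_r() when a macro says it is available and otherwise falls back to localtime() behind a global lock. A minimal sketch, assuming the macro is named HAVE_LOCALTIME_R (the name autoconf conventionally generates) and that POSIX threads are available for the fallback; portable_localtime is a made-up name:

```c
#include <time.h>

#ifndef HAVE_LOCALTIME_R
#include <pthread.h>
/* One global lock serializes every fallback call to localtime(). */
static pthread_mutex_t localtime_lock = PTHREAD_MUTEX_INITIALIZER;
#endif

/* Hypothetical wrapper with the same contract as localtime_r():
 * fills *result and returns it, or returns NULL on failure. */
struct tm *portable_localtime(const time_t *t, struct tm *result)
{
#ifdef HAVE_LOCALTIME_R
    return localtime_r(t, result);
#else
    struct tm *tmp;

    pthread_mutex_lock(&localtime_lock);
    tmp = localtime(t);         /* returns a pointer to shared static storage */
    if (tmp != NULL)
        *result = *tmp;         /* copy it out while still holding the lock */
    pthread_mutex_unlock(&localtime_lock);
    return tmp != NULL ? result : NULL;
#endif
}
```

The fallback is only safe if every caller in the program goes through this wrapper; any code that calls localtime() directly bypasses the lock.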

Difference between statvfs() and statfs() system calls?

Why do the statfs() and statvfs() calls both exist when they're so similar?
Under what circumstances would I prefer one over the other?
Err, "historical reasons".
Originally 4.4BSD defined a statfs() call. Linux later implemented a slightly different call with the same name. Posix standardized it between all freenix and Unix versions by defining statvfs().
statfs() is OS-specific
statvfs() is posix-conforming
As they all return slightly different structures, later ones to come along can't replace the first.
In general you should use statvfs(), the Posix one. Be careful about "use Posix" advice, though, as in some cases (pty, for example) the BSD (or whatever) one is more portable in practice.
If you just want file system capacity and usage information, the other answers are correct: prefer statvfs because it is standard POSIX and handles large file sizes better. statfs is BSD- and Linux-specific, with different structures on each. (Linux 2.6 added new statfs64 and fstatfs64 system calls that use an expanded structure to handle larger sizes.) However, statfs is still useful on Linux for determining the file system type (assuming you're okay with writing Linux-specific code).
statfs() is deprecated in favor of statvfs(), which deals considerably better with large file support. statfs() is known to do odd things for sizes that exceed the value of an unsigned long.
As far as I can tell (and remember), statvfs() has been around since Redhat 7.3, just after being introduced as a POSIX replacement. You'll likely find it on (most) modern systems.
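For the common case of checking free space, a statvfs() call might look like the sketch below. The helper name fs_free_bytes is invented for this example, and it assumes a POSIX system that provides sys/statvfs.h.

```c
#include <sys/statvfs.h>

/* Hypothetical helper: store the number of bytes available to
 * unprivileged users on the file system containing `path`.
 * Returns 0 on success, -1 on failure (errno is set by statvfs). */
int fs_free_bytes(const char *path, unsigned long long *free_out)
{
    struct statvfs vfs;

    if (statvfs(path, &vfs) != 0)
        return -1;
    /* f_bavail counts blocks in units of f_frsize (the fragment size). */
    *free_out = (unsigned long long)vfs.f_bavail * vfs.f_frsize;
    return 0;
}
```

The cast to unsigned long long before multiplying keeps the product from overflowing on systems where the block-count type is only 32 bits wide.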

Resources