What is the purpose of libc_nonshared.a? - c

Why does libc_nonshared.a exist? What purpose does it serve? I haven't been able to find a good answer for its existence online.
As far as I can tell it provides certain symbols (stat, lstat, fstat, atexit, etc.). If someone uses one of these functions in their code, it will get linked into the final executable from this archive. These functions are part of the POSIX standard and are pretty common so I don't see why they wouldn't just be put in the shared or static libc.so.6 or libc.a, respectively.

It was a legacy mistake in glibc's implementing extensibility for the definition of struct stat before better mechanisms (symbol redirection or versioning) were thought of. The definitions of the stat-family functions in libc_nonshared.a cause the version of the structure to bind at link-time, and the definitions there call the __xstat-family functions in the real shared libc, which take an extra argument indicating the desired structure version. This implementation is non-conforming to the standard since each shared library ends up gettings its own copy of the stat-family functions with their own addresses, breaking the requirement that pointers to the same function evaluate equal.

Here's the problem. Long ago, members of the struct stat structure had different sizes than they had today. In particular:
uid_t was 2 bytes (though I think this one was fixed in the transition from libc5 to glibc)
gid_t was 2 bytes
off_t was 4 bytes
blkcnt_t was 4 bytes
time_t was 4 bytes
also, timespec wasn't used at all and there was no room for nanosecond precision.
So all of these had to change. The only real solution was to make different versions of the stat() system call and library function and you get the version you compiled against. That is, the .a file matches the header files. These things didn't all change at once, but I think we're done changing them now.
You can't really solve this by a macro because the structure name is the same as the function name; and inline wasn't mandated to exist in the beginning so glibc couldn't demand everybody use it.
I remember there used to be this thing O_LARGEFILE for saying you could handle files bigger than 4GB; otherwise things just wouldn't work. We also used to have to define things like _LARGEFILE_SOURCE and _LARGEFILE64_SOURCE but it's all handled automatically now. Back in the day, if you weren't ready for large file support yet, you didn't define these and you didn't get the 64-bit version of the stat structure; and also worked on older kernel versions lacking the new system calls. I haven't checked; it's possible that 32-bit compilation still doesn't define these automatically, but 64-bit always does.
So you probably think; okay, fine, just don't franken-compile stuff? Just build everything that goes into the final executable with the same glibc version and largefile-choice. Ever use plugins such as browser plugins? Those are pretty much guaranteed to be compiled in different places with different compiler and glibc versions and options; and this didn't require you to upgrade your browser and replace all its plugins at the same time.

Related

Cross-platform way to determine if file has been edited?

I am writing a cross-platform (big 3 - Linux, MAC, Windows) backup program, so I need to know if a file has been edited since last time. My plan is to save the last save time in a file and check the real situation of a folder against the data in the file to determine which files need to be backed up or updated.
I would like to avoid methods that require a lot of processing power (like diff, or counting bytes).
In this similar post, people suggested to use fstat(), but that solution would be a last resort for me because I was hoping for a cross-platform solution that can be solved with pure C. As far as I know, fstat is a (2), and in my man page it appears as (1), which (to my understanding) means that it is a system function in Linux and isn't a part of the standard C library. I have searched for fstat on windows, but could only find some android version.
Is there some other way to access file metadata? Is there some other solution to this? I am open to any suggestions and am ok if it sometimes false-flags, as long as it backs up data correctly and doesn't waste resources on backing up everything all the time.
Please help!
Thank you!
fstat is still the way to do this, but on Windows it's called _fstat. You can check for the _MSC_VER macro which will be defined if you're building with MSVC, and if so create a macro alias for fstat.
You can do the same for struct stat which MSVC calls struct _stat:
#ifdef _MSC_VER
#define fstat(fd,buf) _fstat(fd,buf)
typedef struct _stat stat_struct;
#else
typedef struct stat stat_struct;
#endif
Then you can use fstat and pass it an argument of type stat_struct for the second argument.
I have a decently sized cross platform open source application that uses this technique.
My plan is to save the last save time in a file and check the real situation of a folder against the data in the file to determine which files need to be backed up or updated.
Ok.
I was hoping for a cross-platform solution that can be solved with pure C.
If by "pure C" you mean relying on only language features and library functions defined by the C language specification, then I'm afraid you're out of luck. Pure C (in that sense) has no concept of persistent file metadata such as modification timestamps. All functions and data structures dealing with such things are extensions or third-party libraries.
You can rely on standard POSIX facilities (such as fstat()) for both Linux and Mac, but Windows does not provide that. At least, Windows does not provide it exactly. The Microsoft C library does provide some POSIX compatibility functions, but it somewhat maddeningly uses modified names for them. In particular, it offers several flavors of _fstat() (note leading underscore). With a little bit of macro glue, it should not be too hard to make your program use POSIX fstat() on Linux and Mac, and use one of the _fstat() flavors on Windows.

How is the type sf_count_t in sndfile.h defined in libsndfile?

I am trying to work with Nyquist (a music programming platform, see: https://www.cs.cmu.edu/~music/nyquist/ or https://www.audacityteam.org/about/nyquist/) as a standalone program and it utilizes libsndfile (a library for reading and writing sound, see: http://www.mega-nerd.com/libsndfile/). I am doing this on an i686 GNU/Linux machine (Gentoo).
After successful set up and launching the program without errors, I tried to generate sound via one of the examples, "(play (osc 60))", and was met with this error:
*** Fatal error : sizeof (off_t) != sizeof (sf_count_t)
*** This means that libsndfile was not configured correctly.
Investigating this further (and emailing the author) has proved somewhat helpful, but the solution is still far from my grasp. The author recommended looking at /usr/include/sndfile.h to see how sf_count_t is defined, and (this portion of) my file is identical to his:
/* The following typedef is system specific and is defined when libsndfile is
** compiled. sf_count_t will be a 64 bit value when the underlying OS allows
** 64 bit file offsets.
** On windows, we need to allow the same header file to be compiler by both GCC
** and the Microsoft compiler.
*/
#if (defined (_MSCVER) || defined (_MSC_VER))
typedef __int64 sf_count_t ;
#define SF_COUNT_MAX 0x7fffffffffffffffi64
#else
typedef int64_t sf_count_t ;
#define SF_COUNT_MAX 0x7FFFFFFFFFFFFFFFLL
#endif
In the above the author notes there is no option for a "32 bit offset". I'm not sure how I would proceed. Here is the particular file the author of Nyquist recommend I investigate: https://github.com/erikd/libsndfile/blob/master/src/sndfile.h.in , and here is the entire source tree: https://github.com/erikd/libsndfile
Here are some relevant snippets from the authors email reply:
"I'm guessing sf_count_t must be showing up as 32-bit and you want
libsndfile to use 64-bit file offsets. I use nyquist/nylsf which is a
local copy of libsndfile sources -- it's more work keeping them up to
date (and so they probably aren't) but it's a lot easier to build and
test when you have a consistent library."
"I use CMake and nyquist/CMakeLists.txt to build nyquist."
"It may be that one 32-bit machines, the default sf_count_t is 32
bits, but I don't think Nyquist supports this option."
And here is the source code for Nyquist: http://svn.code.sf.net/p/nyquist/code/trunk/nyquist/
This problem is difficult for me to solve because it's composed of an niche use case of relatively obscure software. This also makes the support outlook for the problem a bit worrisome. I know a little C++, but I am far from confident in my ability to solve this. Thanks for reading and happy holidays to all. If you have any suggestions, even in terms of formatting or editing, please do not hesitate!
If you look at the sources for the bundled libsndfile in nyquist, i.e. nylsf, then you see that sndfile.h is provided directly. It defines sf_count_t as a 64-bit integer.
The libsndfile sources however do not have this file, rather they have a sndfile.h.in. This is an input file for autoconf, which is a tool that will generate the proper header file from this template. It has currently the following definition for sf_count_t for linux systems (and had it since a while):
typedef #TYPEOF_SF_COUNT_T# sf_count_t ;
The #TYPEOF_SF_COUNT_T# would be replaced by autoconf to generate a header with a working type for sf_count_t for the system that is going to be build for. The header file provided by nyquist is therefore already configured (presumably for the system of the author).
off_t is a type specified by the POSIX standard and defined in the system's libc. Its size on a system using the GNU C library is 32bit if the system is 32bit.
This causes the sanity check in question to fail, because the sizes of sf_count_t and off_t don't match. The error message is also correct, as we are using an unfittingly configured sndfile.h for the build.
As I see it you have the following options:
Ask the nyquist author to provide the unconfigured sndfile.h.in and to use autoconf to configure this file at build time.
Do not use the bundled libsndfile and link against the system's one. (This requires some knowledge and work to change the build scripts and header files, maybe additional unexpected issues)
If you are using the GNU C library (glibc): The preprocessor macro _FILE_OFFSET_BITS can be set to 64 to force the size of off_t and the rest of the file interface to use the 64bit versions even on 32bit systems.
This may or may not work depending on whether your system supports it and it is not a clean solution as there may be additional misconfiguration of libsndfile going unnoticed. This flag could also introduce other interface changes that the code relies on, causing further build or runtime errors/vulnerabilities.
Nonetheless, I think the syntax for cmake would be to add:
add_compile_definitions(_FILE_OFFSET_BITS=64)
or depending on cmake version:
add_definitions(-D_FILE_OFFSET_BITS=64)
in the appropriate CMakeLists.txt.
Actually the README in nyquist/nylsf explains how the files for it were generated. You may try to obtain the source code of the same libsndfile version it is based on and repeat the steps given to produce an nylsf configured to your system. It may cause less further problems than 2. and 3. because there wouldn't be any version/interface changes introduced.

How to circumvent dlopen() caching?

According to its man page, dlopen() will not load the same library twice:
If the same shared object is loaded again with dlopen(), the same
object handle is returned. The dynamic linker maintains reference
counts for object handles, so a dynamically loaded shared object is
not deallocated until dlclose() has been called on it as many times
as dlopen() has succeeded on it. Any initialization returns (see
below) are called just once. However, a subsequent dlopen() call
that loads the same shared object with RTLD_NOW may force symbol
resolution for a shared object earlier loaded with RTLD_LAZY.
(emphasis mine).
But what actually determines the identity of shared objects? I tried to look into the code, but did not come very far. Is it:
some form of normalized path name (e.g. realpath?)
the inode ?
the contents of the libray?
I am pretty sure that I can rule out this last point, since an actual filesystem copy yields two different handles.
To explain the motivation behind this question: I am working with some code that has static global variables. I need multiple instances of that code to run in a thread-safe manner. My current approach is to compile and link said code into a dynamic library and load that library multiple times. With some linker magic, it appears to create several copies of the globals and resolve access in each library to its own copies. The only problem is that my prototype copies the generated library n times for n concurrent uses. This is not only somewhat ugly but I also suspect that it might break on a different platform.
So what is the exact behaviour of dlopen() according to the POSIX standard?
edit: Because it came up in a comment and an answer, no refactoring the code is definitely not an option. It would involve months or even years of work and potentially sacrifice all benefits of using the code in the first place. There exists an ongoing research project that might solve this problem in a much cleaner way, but it is actual research and might fail. I need a solution now.
edit2: Because people still seem to not believe the usecase is actually valid. I am working on a pure functional language, that shall be embedded into a larger C/C++ application. Because I need a prototype with a garbage collector, a proven typechecker, and reasonable performance ASAP, I used OCaml as intermediate code. Right now, I am compiling a source module into an OCaml module, link the generated object code (including startup etc.) into a shared library with the OCaml runtime and dlopen() that shared library. Every .so has its own copy of the runtime, including several global variabels (e.g. the pointer to the young generation) and that is, or rather should be, totally fine. The library exposes exactly two functions: An initializer and a single export that does whatever the original module is intended to do. No symbols of the OCaml runtime are exported/shared. when I load the library, its internal symbols are relocated as expected, the only issue I have right now is that I actually need to copy the .so file for each instance of the job at runtime.
Regarding thread-local-storage: That is actually an interesting idea, as the modification to the runtime is indeed rather simple. But the problem is the machine code generated by the OCaml compiler, as it cannot emit loading instructions for tls symbols (yet?).
POSIX says:
Only a single copy of an object file is brought into the address space, even if dlopen() is invoked multiple times in reference to the file, and even if different pathnames are used to reference the file.
So the answer is "inode". Copying the library file "should work", but hard links won't. Except. Since they will expose the same global symbols and when that happens all (portability) bets are off. You're in the middle of weakly defined behavior that has evolved through bug fixes rather than good design.
Don't dig deeper when you're in a hole. The approach to add additional horrible hacks to make a fundamentally broken library work just leads to additional breakage. Just spend a few hours to fix the library to not use globals instead of spending days to hack around dynamic linking (which will be unportable at best).

Structure definition in header file for a library and compilation differences

I have a code which is compiled into a library (dll, static library and so). I want the user of this library to use some struct to pass some data as parameters for the library function. I thought about declaring the struct in the API header file.
Is it safe to do so, considering compilation with different compilers, with respect to structure alignment or other things I didn't think about?
Will it require the usage of the same compiler (and flags) for both the library and its user?
Few notes:
I considered giving the user a pointer and set all the struct via functions in the library, but this will make the API really not comfortable to use.
This question is about C, although it would be nice to know if there's a difference in c++.
If it's a regular/static library, the library and application should be compiled using the same compiler. There're a few reasons for this that I can think of:
Different compilers (as in different brands or compilers for different platforms) normally don't understand each other's object and library formats.
You don't want to compile different parts of the same program using different types (e.g. signed vs unsigned char), type sizes (e.g. long = 32 vs 64 bits), alignment and packing and probably some other things, all of which are allowed by the C standard to vary. Mixing and matching those things is usually a bad thing.
You may, however, often use slightly different versions of the same compiler to compile the library and the application using it. Usually, it's OK. Sometimes there're changes that break the code, though.
You may implement some "initialization" function in that header file (declared as static inline) that would ensure that types, type sizes, alignment and packing are the same as expected by the compiled library. The application using this library would have to call this function prior to using any other part of the library. If things aren't the same as expected, the function must fail and cause program termination, possibly with some good textual description of the failure. This won't solve completely the problem of having somewhat incompatible compilers, but it can prevent silent and mysterious malfunctions. Some things can be checked with the preprocessor's #if and #ifdef directives and cause compilation errors with #error.
In addition, structure packing problems can be relieved by inserting explicit padding bytes into structure declarations and forcing tight packing (by e.g. using #pragma pack, which is supported by many compilers). That way if type sizes are the same, it won't matter what the default packing is.
You can apply the same to DLLs as well, but you should really expect that the calling application has been compiled with a different compiler and not depend on the compilers being the same.
All Windows APIs throw structs around like crazy so obviously this is something that is done every day and it works. Of course it doesn't mean that your concerns are not valid :)
I would suggest making your structure's fields have explicit width types (int32_t etc) and maybe specify explicitly that that the packing in a way which would break on any compiler but yours, i.e.
#if defined(_MSC_VER)
#pragma pack(0)
#elif defined ... handle gcc
#else
FAIL // fail compilation on unsupported platform
#endif

Change library load order at run time (like LD_PRELOAD but during execution)

How do I change the library a function loads from during run time?
For example, say I want to replace the standard printf function with something new, I can write my own version and compile it into a shared library, then put "LD_PRELOAD=/my/library.so" in the environment before running my executable.
But let's say that instead, I want to change that linkage from within the program itself. Surely that must be possible... right?
EDIT
And no, the following doesn't work (but if you can tell me how to MAKE it work, then that would be sufficient).
void* mylib = dlopen("/path/to/library.so",RTLD_NOW);
printf = dlsym(mylib,"printf");
AFAIK, that is not possible. The general rule is that if the same symbol appears in two libraries, ld.so will favor the library that was loaded first. LD_PRELOAD works by making sure the specified libraries are loaded before any implicitly loaded libraries.
So once execution has started, all implicitly loaded libraries will have been loaded and therefore it's too late to load your library before them.
There is no clean solution but it is possible. I see two options:
Overwrite printf function prolog with jump to your replacement function.
It is quite popular solution for function hooking in MS Windows. You can find examples of function hooking by code rewriting in Google.
Rewrite ELF relocation/linkage tables.
See this article on codeproject that does almost exactly what you are asking but only in a scope of dlopen()'ed modules. In your case you want to also edit your main (typically non-PIC) module. I didn't try it, but maybe its as simple as calling provided code with:
void* handle = dlopen(NULL, RTLD_LAZY);
void* original;
original = elf_hook(argv[0], LIBRARY_ADDRESS_BY_HANDLE(handle), printf, my_printf);
If that fails you'll have to read source of your dynamic linker to figure out what needs to be adapted.
It should be said that trying to replace functions from the libc in your application has undefined behavior as per ISO C/POSIX, regardless of whether you do it statically or dynamically. It may work (and largely will work on GNU/Linux), but it's unwise to rely on it working. If you just want to use the name "printf" but have it do something nonstandard in your program, the best way to do this is to #undef printf and #define printf my_printf AFTER including any system headers. This way you don't interfere with any internal use of the function by libraries you're using...and your implementation of my_printf can even call the system printf if/when it needs to.
On the other hand, if your goal is to interfere with what libraries are doing, somewhere down the line you're probably going to run into compatibility issues. A better approach would probably be figuring out why the library won't do what you want without redefining the functions it uses, patching it, and submitting patches upstream if they're appropriate.
You can't change that. In general *NIX linking concept (or rather lack of concept) symbol is picked from first object where it is found. (Except for oddball AIX which works more like OS/2 by default.)
Programmatically you can always try dlsym(RTLD_DEFAULT) and dlsym(RTLD_NEXT). man dlsym for more. Though it gets out of hand quite quickly. Why is rarely used.
there is an environment variable LD_LIBRARY_PATH where the linker searches for shred libraries, prepend your path to LD_LIBRARY_PATH, i hope that would work
Store the dlsym() result in a lookup table (array, hash table, etc). Then #undef print and #define print to use your lookup table version.

Resources