Where is the definition of the POSIX function "stat" on Linux?

On Windows, stat and pretty much all other C/POSIX functions Windows supplies are defined in msvcrt.dll, which is the C runtime library.
On Linux, I know a lot of POSIX C functions are system calls. I also know that when linking a program, you can't have undefined references. I have searched all .so files in /lib and /usr/lib for the symbol stat or a "mangled/prefixed" form but have not found anything. This is the command I used:
objdump -T /lib/*.so* /usr/lib/*.so* | grep "stat"
It didn't turn up the stat I was looking for.
So my question becomes: where is it, and where are any other "system calls" defined?

On my Linux machine, I can find the stat (weak) symbol and __stat (non-weak) in /usr/lib/libc.a.

You might make Linux kernel system calls without even using the libc (but this is probably bad practice). The Linux Assembly HOWTO explains (in its chapters 5 & 6) how to do that (on 32-bit x86 Linux, at least).
But I think it is a bad idea. Going through the libc is good practice, might even be faster (e.g. because of the vDSO), and is more portable.
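To make the vDSO point concrete, this sketch reads CLOCK_MONOTONIC once through the libc wrapper and once through a raw syscall(2). It assumes 64-bit Linux with glibc, where the kernel's struct timespec matches glibc's; whether the wrapper actually avoids entering the kernel depends on the architecture and kernel. The helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Read CLOCK_MONOTONIC through the libc wrapper. On x86-64 and several other
   architectures glibc can answer this from the vDSO without a kernel entry. */
static int mono_via_libc(struct timespec *ts)
{
    return clock_gettime(CLOCK_MONOTONIC, ts);
}

/* Read the same clock with a real system call, bypassing the vDSO.
   Assumes a 64-bit target, where the kernel's struct timespec matches glibc's. */
static int mono_via_syscall(struct timespec *ts)
{
    return (int)syscall(SYS_clock_gettime, (long)CLOCK_MONOTONIC, ts);
}

/* Returns nonzero if timestamp a is not later than timestamp b. */
static int ts_not_after(const struct timespec *a, const struct timespec *b)
{
    return a->tv_sec < b->tv_sec ||
           (a->tv_sec == b->tv_sec && a->tv_nsec <= b->tv_nsec);
}
```

Running a program that uses both under strace(1) typically shows only the syscall version, since the vDSO path never enters the kernel.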

First of all, stat is ambiguous: there is a stat syscall, and there is a function stat, callable from user space, that invokes the syscall. That last function is (on my system at least) defined in /usr/include/sys/stat.h (that's right, in the header file). It actually has several definitions (all one-liners that call a different function, e.g. __fxstat), one of which is chosen depending on the compiler, the system, and so on.
Anyhow, stat (like other syscall wrappers) is just a function that calls into the kernel (sometimes with a fair amount of orchestration). That is why I was initially confused about what you meant. I hope, though, that I could help despite my unhelpful first comment.
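From the caller's side none of this wrapping is visible: you include <sys/stat.h> and call stat(). A minimal sketch; describe is a hypothetical helper name.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Print a few fields of struct stat for path using the libc stat() wrapper.
   Returns 0 on success, -1 on failure (errno is set by stat()). */
static int describe(const char *path)
{
    struct stat st;
    if (stat(path, &st) == -1) {
        perror(path);
        return -1;
    }
    printf("%s: %s, size %lld bytes, inode %llu\n",
           path,
           S_ISDIR(st.st_mode) ? "directory" : "not a directory",
           (long long)st.st_size,
           (unsigned long long)st.st_ino);
    return 0;
}
```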

You can call it directly with syscall(2):
#include <sys/stat.h>
#include <sys/syscall.h>
#include <unistd.h>
...
struct stat buf;
syscall(SYS_stat, path, &buf);
See also the Linux syscall reference: http://syscalls.kernelgrok.com/
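To check that the raw call and the libc wrapper really see the same thing, the following sketch compares the two. It assumes a 64-bit Linux target where the kernel's struct stat has the same layout as glibc's (true on x86-64, not guaranteed elsewhere; on 32-bit x86, for example, the old stat syscall's structure can differ from glibc's). SYS_stat also does not exist on newer architectures such as aarch64, which only have newfstatat, hence the #ifdef. raw_stat and raw_matches_libc are made-up names.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Raw stat via syscall(2). Assumes the kernel's struct stat has the same
   layout as glibc's, which holds on x86-64 but not on every architecture. */
static long raw_stat(const char *path, struct stat *buf)
{
#ifdef SYS_stat
    return syscall(SYS_stat, path, buf);
#else
    /* Ports such as aarch64 have no stat syscall, only newfstatat. */
    return syscall(SYS_newfstatat, (long)AT_FDCWD, path, buf, 0L);
#endif
}

/* Returns 1 if the raw syscall and the libc wrapper agree on inode and size. */
static int raw_matches_libc(const char *path)
{
    struct stat a, b;
    if (raw_stat(path, &a) != 0 || stat(path, &b) != 0)
        return 0;
    return a.st_ino == b.st_ino && a.st_size == b.st_size;
}
```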

Related

What is the purpose of libc_nonshared.a?

Why does libc_nonshared.a exist? What purpose does it serve? I haven't been able to find a good answer for its existence online.
As far as I can tell it provides certain symbols (stat, lstat, fstat, atexit, etc.). If someone uses one of these functions in their code, it will get linked into the final executable from this archive. These functions are part of the POSIX standard and are pretty common, so I don't see why they wouldn't just be put in the shared libc.so.6 or the static libc.a.
It was a legacy mistake in glibc's implementation of extensibility for the definition of struct stat, made before better mechanisms (symbol redirection or versioning) were thought of. The definitions of the stat-family functions in libc_nonshared.a cause the version of the structure to bind at link time, and they call the __xstat-family functions in the real shared libc, which take an extra argument indicating the desired structure version. This implementation does not conform to the standard, since each shared library ends up getting its own copy of the stat-family functions with their own addresses, breaking the requirement that pointers to the same function compare equal.
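The versioning trick can be imitated in a few lines of plain C. This is a hypothetical illustration, not glibc's code: my_xstat plays the role of __xstat in the shared library, my_stat plays the role of the libc_nonshared.a stub, and the two struct layouts are invented to mirror the narrow 16-bit uid and 32-bit size fields of old layouts.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/stat.h>

/* Hypothetical names throughout; this mimics the idea, not glibc's code. */
#define MYSTAT_VER_OLD 1   /* layout compiled into old binaries */
#define MYSTAT_VER_NEW 2   /* layout in current headers */

struct mystat_old { uint16_t uid; int32_t size; };  /* narrow fields */
struct mystat_new { uint32_t uid; int64_t size; };  /* widened fields */

/* "Shared library" entry point: the caller states which layout it expects,
   so one function can serve old and new binaries alike. */
int my_xstat(int ver, const char *path, void *buf)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;                      /* errno already set by stat() */
    switch (ver) {
    case MYSTAT_VER_OLD: {
        struct mystat_old *o = buf;
        o->uid = (uint16_t)st.st_uid;   /* truncates, as old layouts did */
        o->size = (int32_t)st.st_size;
        return 0;
    }
    case MYSTAT_VER_NEW: {
        struct mystat_new *n = buf;
        n->uid = (uint32_t)st.st_uid;
        n->size = (int64_t)st.st_size;
        return 0;
    }
    default:
        errno = EINVAL;                 /* unknown layout version */
        return -1;
    }
}

/* "libc_nonshared.a" stub: compiled against the current header, so the
   layout version is fixed at link time and simply forwarded. */
int my_stat(const char *path, struct mystat_new *buf)
{
    return my_xstat(MYSTAT_VER_NEW, path, buf);
}
```

The catch described above is visible here too: every shared object that links its own copy of my_stat gets a distinct function with a distinct address.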
Here's the problem. Long ago, members of struct stat had different sizes than they do today. In particular:
uid_t was 2 bytes (though I think this one was fixed in the transition from libc5 to glibc)
gid_t was 2 bytes
off_t was 4 bytes
blkcnt_t was 4 bytes
time_t was 4 bytes
Also, struct timespec wasn't used at all, so there was no room for nanosecond precision.
So all of these had to change. The only real solution was to make different versions of the stat() system call and library function, so that you get the version you compiled against; that is, the .a file matches the header files. These things didn't all change at once, but I think we're done changing them now.
You can't really solve this with a macro, because the structure name is the same as the function name; and inline functions weren't guaranteed to exist in the beginning (they were only standardized in C99), so glibc couldn't demand that everybody use them.
I remember there used to be this thing, O_LARGEFILE, for saying you could handle files bigger than 2 GB (the limit of a signed 32-bit off_t); otherwise things just wouldn't work. We also used to have to define things like _LARGEFILE_SOURCE and _LARGEFILE64_SOURCE, but it's all handled automatically now. Back in the day, if you weren't ready for large file support yet, you didn't define these and you didn't get the 64-bit version of the stat structure; your program also kept working on older kernel versions lacking the new system calls. I haven't checked; it's possible that 32-bit compilation still doesn't define these automatically, but 64-bit always does.
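On glibc the large-file choice survives today as the _FILE_OFFSET_BITS feature-test macro. A minimal sketch, assuming glibc: on 64-bit targets off_t is 64 bits regardless, while on 32-bit targets the define switches off_t and struct stat (and, via symbol redirection, stat itself) to the 64-bit variants. glibc 2.34 and later similarly offer _TIME_BITS=64 for a 64-bit time_t on 32-bit targets. report_layout is a hypothetical helper name.

```c
/* Must come before any system header to take effect. */
#define _FILE_OFFSET_BITS 64
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Report the sizes the headers chose for this translation unit. */
static void report_layout(void)
{
    printf("sizeof(off_t)       = %zu\n", sizeof(off_t));
    printf("sizeof(struct stat) = %zu\n", sizeof(struct stat));
}
```

Compiling the same file with and without the define on a 32-bit target is a quick way to see the two struct stat layouts side by side.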
So you probably think: okay, fine, just don't franken-compile stuff. Just build everything that goes into the final executable with the same glibc version and large-file choice. Ever use plugins, such as browser plugins? Those are pretty much guaranteed to be compiled in different places with different compiler and glibc versions and options, and this scheme meant you didn't have to upgrade your browser and replace all its plugins at the same time.

Linux bare system calls, not glibc

I'm reading an article that explains how to call bare syscalls without passing through glibc. To call chmod and exit, use:
#include <linux/unistd.h>
_syscall2(int,chmod,char*,f,int,m)
_syscall1(int,exit,int,r)
My gcc complains about them. What are they for, and how do they work?
$ gcc --version
gcc (Ubuntu 7.4.0-1ubuntu1~18.04) 7.4.0
$ gcc e.c
e.c:2:15: error: unknown type name ‘setresuid’; did you mean ‘__NR_setresuid’?
_syscall3(int,setresuid,int,r,int,e,int,s)
^~~~~~~~~
__NR_setresuid
e.c:2:29: error: unknown type name ‘r’
_syscall3(int,setresuid,int,r,int,e,int,s)
^
e.c:2:35: error: unknown type name ‘e’
_syscall3(int,setresuid,int,r,int,e,int,s)
^
e.c:2:41: error: unknown type name ‘s’
_syscall3(int,setresuid,int,r,int,e,int,s)
^
Your article is probably obsolete: the _syscallN macros were removed from the kernel headers exported to user space long ago.
If you code in C, there is no reason to avoid using the syscalls(2) (notice the plural) as documented. Also be aware of vdso(7). You could use some other C standard library than glibc (e.g. musl-libc, dietlibc, etc.) and you might (though that is not recommended) statically link it.
You might use syscall(2) (notice the singular) instead, but I see no reason to do that for calls glibc already wraps: e.g. just use read(2) or mmap(2) directly rather than through syscall.
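For comparison, here is roughly what the question's _syscall3(int,setresuid,...) line turns into today: a plain function around syscall(2). A minimal sketch, assuming a 64-bit Linux target (on 32-bit x86 the 32-bit-uid variant is SYS_setresuid32); raw_setresuid is a made-up name.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Modern replacement for the obsolete _syscall3(int,setresuid,...) macro.
   syscall(2) issues the call and, on failure, returns -1 and sets errno.
   Arguments are widened to long because syscall() is variadic. */
static int raw_setresuid(uid_t r, uid_t e, uid_t s)
{
    return (int)syscall(SYS_setresuid, (long)r, (long)e, (long)s);
}
```

Passing (uid_t)-1 for all three leaves the IDs unchanged, so that call is harmless. Note that the glibc setresuid() wrapper does more than the raw syscall: Linux credentials are per-thread, and glibc signals every thread of the process so they all change together, whereas the raw call only affects the calling thread.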
The Assembly HOWTO might be an interesting read (beware: it might be too 32-bit-centric; most Linux PCs today are 64-bit x86-64).
See also osdev.org
BTW, some old Unixes (e.g. Solaris) had a libsys providing just the syscalls, with their libc linked to it. I would like a libsys too! But on current Linux systems it does not really matter, since almost every process (running some dynamically linked ELF executable) has, via ld-linux.so(8), mmap(2)-ed several segments and sections of your libc.so.6; for details, read Drepper's How To Write Shared Libraries (it also explains in detail how shared libraries actually work). Also use pmap(1) on some running process (e.g. pmap $$ in a shell).
Some rare syscalls (e.g. userfaultfd(2), as of Q2 2019) have no glibc wrapper. They are an exception, because most system calls are wrapped by your libc (the wrapping usually just deals with setting errno(3) on failure). Also be aware of strace(1).
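When a syscall has no wrapper, syscall(2) is the usual route. gettid(2) is a classic case: glibc only added a gettid() wrapper in version 2.30 (released in 2019), so older code did the following. my_gettid is a hypothetical name chosen to avoid clashing with the newer glibc declaration.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Thread ID of the caller, fetched through syscall(2) as one would for any
   system call that the C library does not wrap. */
static pid_t my_gettid(void)
{
    return (pid_t)syscall(SYS_gettid);
}
```

In a single-threaded process the thread ID equals the process ID; in other threads it differs.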
And you should also read Operating Systems: Three Easy Pieces (a freely downloadable book explaining the role of, and reason for, system calls).

Different ways to invoke system calls

In some code, I can see system calls invoked in a strange way; take sched_yield as an example:
#define __NR_sys_sched_yield __NR_sched_yield
inline _syscall0(void, sys_sched_yield);
And then we can use sys_sched_yield().
I'm curious what the difference is between using sched_yield directly and this way.
In src/include/asm/unistd, _syscall0 is defined:
#define _syscall0(type,name) \
type name(void) \
{ \
long __res; \
__asm__ volatile ("int $0x80" \
: "=a" (__res) \
: "0" (__NR_##name)); \
__syscall_return(type,__res); \
}
Presumably that's for systems where sched_yield might not be available.
As for differences, sched_yield returns -1 on error and sets errno, while this implementation presumably returns the raw value from the kernel. I can't tell for sure, since you haven't provided the definition of _syscall0, which must be a macro.
This is linux which uses glibc. BSD has a sched_yield but has its own libc.
This isn't strange. The _syscall0 macro issues an assembler int 0x80 instruction with the syscall number in the eax register [32-bit x86; rax on x86-64]. This is the standard syscall interface for 32-bit Linux. Under the hood, all Linux glibc functions that are syscall wrappers do this [or use the more modern sysenter/sysexit x86 instruction pair, or the syscall instruction on x86-64].
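A rough x86-64 counterpart of the 32-bit _syscall0 macro quoted in the question uses the syscall instruction instead of int $0x80. This is a sketch, not glibc's code: it assumes GCC-style inline assembly, handles only zero-argument calls, and returns the raw kernel value without touching errno. On other architectures it falls back to syscall(2), which is not raw.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/* Zero-argument raw system call. On x86-64 the number goes in rax, the
   result comes back in rax, and the syscall instruction clobbers rcx and
   r11. A result in -4095..-1 is a negated errno value, returned as-is. */
static long raw_syscall0(long nr)
{
#if defined(__x86_64__)
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(nr)
                      : "rcx", "r11", "memory");
    return ret;
#else
    /* Fallback so the sketch still builds elsewhere; syscall(2) converts
       errors to -1/errno, so this path is not "raw". */
    return syscall(nr);
#endif
}
```

For example, raw_syscall0(SYS_getpid) returns the process ID; a failing call would yield a negated errno value rather than -1.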
glibc has a tendency to "usurp" syscalls in its wrapper functions and add stuff around them. For example, when you call fork, it [ultimately] calls __libc_fork which does a huge amount of extra stuff related to threads and file closure, etc.
Generally, glibc makes good choices. But, sometimes highly experienced linux application programmers want the raw syscall behavior, particularly if they're writing system utilities or programs/libraries that must have close interaction with the kernel, device drivers, or device hardware.
Actually, __libc_fork doesn't invoke the fork syscall; it invokes the clone syscall, which is a [harder to use] superset of fork. But the plain old fork syscall still exists. So, if you want that, you need the macro stuff, and I'll bet there's a sys_fork definition somewhere.
On the other hand, glibc might implement sched_yield à la POSIX as a no-op returning -1 and setting errno to ENOSYS. I just checked the latest glibc source and couldn't find the "real" implementation, except for mach. It probably does do the real thing; I just couldn't find it (on Linux, glibc generates many thin wrappers like this one from a syscalls.list table rather than from a .c file, which makes them easy to miss).
Sometimes Linux has a syscall, but glibc doesn't want to support it, or considers it too dangerous for an application programmer, so it leaves out the wrapper function. So the macros are a way to "end around" glibc.
If glibc did implement sched_yield as a no-op, the probable reason, POSIX aside, would be that they consider it "bad" and would probably tell you to use nanosleep instead. I've used both and they are not the same, depending on your use case and desired effect.
Sometimes you need to do the raw, inline syscall. For example, the ELF loader [every system that supports ELF binaries must have one, and Linux's is ld-linux.so] is invoked by the kernel to load an ELF binary. It must operate before glibc's shared library is available, because it is what actually links that library in; so the ELF loader must have some built-in syscalls for open and read.
Also, most systems have a syscall library function that takes a variable number of arguments. You could implement:
#include <sys/syscall.h>
#include <unistd.h>
#define my_sched_yield() syscall(__NR_sched_yield)
#define my_read(_fd,_buf,_len) syscall(__NR_read,_fd,_buf,_len)
This function handles the kernel's syscall return value/error and sets errno. That's what the __syscall_return macro had to do.
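The error handling that __syscall_return (and syscall(2)) performs can be sketched in portable C. This assumes the Linux convention that a raw return value in the range -4095..-1 is a negated errno code; to_libc_convention is a made-up name, not a glibc or kernel function.

```c
#include <errno.h>

/* Convert a raw kernel return value to the libc convention: values in
   -4095..-1 are negated error codes, so store the code in errno and return
   -1; anything else is a successful result and is passed through. */
static long to_libc_convention(long raw)
{
    /* Compare as unsigned so large "negative-looking" successful results,
       such as high mmap addresses, are not mistaken for errors. */
    if ((unsigned long)raw >= (unsigned long)-4095L) {
        errno = (int)-raw;
        return -1;
    }
    return raw;
}
```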
The __NR_* prefix is what the Linux kernel headers use (glibc's <sys/syscall.h> also provides SYS_* aliases); other systems have AUE_* or SYS_*.

access a POSIX function using dlopen

POSIX 2008 introduces several file system functions that rely on a directory descriptor when determining the path to a file (I'm speaking about the -at functions, such as openat, renameat, symlinkat, etc.). I doubt that all POSIX platforms support them (well, at least the most recent versions seem to), and I'm looking for a way to determine whether a platform supports such functions. Of course one may use autoconf and friends for compile-time detection, but I'm looking for a way to find out dynamically whether an implementation supports the -at functions.
The first thing that comes to my mind is a dlopen()/dlsym()/dlclose() combo; at least I've successfully loaded the necessary symbols from the /usr/libc.so.6 shared library. However, libc may be (or is?) named differently on various platforms. Is there a list of standard locations to find libc? At least on Linux, /lib/libc.so appears to be not a symbolic link to the shared library but an ld script. Maybe there is some other way to examine at run time whether a POSIX function is supported? Thanks in advance!
#define _GNU_SOURCE 1
#include <dlfcn.h>
#include <stdio.h>
int main ()
{
void * funcaddr = dlsym(RTLD_DEFAULT, "symlinkat");
/* -----------------------^ magic! */
printf ("funcaddr = %p\n", funcaddr);
}
Output:
funcaddr = 0x7fb62e44c2c0
Magic explanation: your program is already linked with libc, no need to load it again.
Note that this is actually a GNU libc feature, as hinted by _GNU_SOURCE. POSIX reserves RTLD_DEFAULT "for future use", and then proceeds to define it exactly like GNU libc does. So strictly speaking it is not guaranteed to work on all POSIX systems.
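Building on that answer, a run-time probe for the -at functions can be wrapped in a small helper. A minimal sketch, assuming glibc (RTLD_DEFAULT needs _GNU_SOURCE there) and a dynamically linked program; have_function is a hypothetical name. A found symbol only shows that the library exports it: glibc, for instance, ships stubs for unimplemented functions that fail with ENOSYS, so actually calling the function remains the definitive test.

```c
#define _GNU_SOURCE 1
#include <dlfcn.h>
#include <stddef.h>

/* Returns 1 if a symbol with this name can be resolved in the running
   process (the program plus every shared library already loaded), else 0.
   No dlopen() is needed: libc is already part of the global scope. */
static int have_function(const char *name)
{
    return dlsym(RTLD_DEFAULT, name) != NULL;
}
```

On glibc older than 2.34, link with -ldl; since 2.34 the dl* functions live in libc itself.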

c and LD_PRELOAD. open and open64 calls intercepted, but not stat64

I've done a little shared library that tries to intercept open, open64, stat and stat64 sys calls.
When I export LD_PRELOAD and run oracle's sqlplus, I can see the traces of the open and open64 calls, but no traces of the stat and stat64 calls.
The shared library is a single c file with all the definitions of the sys calls in it.
Why are some syscalls intercepted and others not?
thanks for your help.
Because the GNU libc implements open() and open64() as you'd expect (i.e. they're just dynamically linked symbols that you can hook into with LD_PRELOAD), but does something special with stat() and stat64().
If you look at the symbols exported by libc (e.g. with nm -D /lib/libc.so.6), you'll see that it doesn't actually provide the symbols stat or stat64!
Calls to these functions are wrapped - either at compile time (if possible) by inline functions in <sys/stat.h>, or (failing that) statically linked definitions provided by libc_nonshared.a.
The actual dynamically linked functions which are called are __xstat() or __xstat64(); and these take an additional first argument, an integer, which is a version number indicating the layout of struct stat that is expected by the caller. Try hooking these instead.
(The point of all this is to allow the dynamically linked libc to support binaries which use various incompatible layouts of struct stat and definitions of bits in mode_t; if you look in /usr/include/sys/stat.h you'll find a comment to this effect. fstat(), fstat64(), lstat(), lstat64() and mknod() are also affected in the same way.)
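That answer describes glibc before version 2.33. Starting with glibc 2.33, stat, lstat, fstat and their 64-bit variants are exported from libc.so.6 as ordinary functions, and binaries built against it call them directly, so a plain stat hook works for those binaries while older ones still go through __xstat. A minimal sketch of such a hook, assuming glibc 2.33 or later and a 64-bit build (a program built with _FILE_OFFSET_BITS=64 on a 32-bit target would call stat64 instead); the logging and the hooked_calls counter are illustrative additions.

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>

/* Counts calls that went through this hook (handy for testing). */
static unsigned hooked_calls;

/* Interposed stat(). Loaded with LD_PRELOAD, it sees calls from programs
   that reach a real "stat" symbol, as binaries built against glibc 2.33 or
   later do; older binaries call __xstat and need a hook for that instead. */
int stat(const char *restrict path, struct stat *restrict buf)
{
    static int (*real_stat)(const char *restrict, struct stat *restrict);
    if (!real_stat)
        real_stat = (int (*)(const char *restrict, struct stat *restrict))
                        dlsym(RTLD_NEXT, "stat");   /* next definition: libc's */
    hooked_calls++;
    fprintf(stderr, "[hook] stat(\"%s\")\n", path);
    if (!real_stat) {
        errno = ENOSYS;                             /* no underlying stat found */
        return -1;
    }
    return real_stat(path, buf);
}
```

Build it as a shared object, e.g. gcc -shared -fPIC -o stathook.so stathook.c -ldl, and run LD_PRELOAD=./stathook.so ./your_program. Some tools call fstatat or statx instead of stat, so they would need hooks of their own.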

Resources