Linux bare system calls, not glibc - c

I'm reading an article that explains how to call bare syscalls without passing through glibc. To call chmod and exit, use:
#include <linux/unistd.h>
_syscall2(int,chmod,char*,f,int,m)
_syscall1(int,exit,int,r)
My gcc complains about them. What are their use, how do they work?
$ gcc --version
gcc (Ubuntu 7.4.0-1ubuntu1~18.04) 7.4.0
$ gcc e.c
e.c:2:15: error: unknown type name ‘setresuid’; did you mean ‘__NR_setresuid’?
_syscall3(int,setresuid,int,r,int,e,int,s)
^~~~~~~~~
__NR_setresuid
e.c:2:29: error: unknown type name ‘r’
_syscall3(int,setresuid,int,r,int,e,int,s)
^
e.c:2:35: error: unknown type name ‘e’
_syscall3(int,setresuid,int,r,int,e,int,s)
^
e.c:2:41: error: unknown type name ‘s’
_syscall3(int,setresuid,int,r,int,e,int,s)
^

Your article is probably obsolete.
If you code in C, there is no reason to avoid using the syscalls(2) (notice the plural) as documented. Be also aware of the vdso(7). You could use some other C standard library than the glibc (e.g. musl-libc, dietlibc, etc...) and you might (but that is not recommended) statically link it.
You might use syscall(2) (notice the singular) instead. I see no reason to do that, e.g. use read(2) or mmap(2) without syscall.
The Assembly HowTo might be an interesting read (beware, it might be too 32-bit centric; most Linux PCs today are 64-bit x86-64).
See also osdev.org
BTW, some old Unixes (e.g. Solaris) had a libsys providing just the syscalls, and their libc linked to it. I would like a libsys too! But on current Linux systems, it does not really matter, since almost every process (running some dynamically linked ELF executable) is mmap(2)-ing, after ld-linux.so(8), several segments and sections of your libc.so.6; for details, read Drepper's How to write a shared library (since it also explains in detail how shared libraries actually work). Use also pmap(1) on some running process (e.g. pmap $$ in a shell).
Some rare syscalls (e.g. userfaultfd(2) today 2Q2019) are not known by the glibc. They are an exception, because most system calls are wrapped by your libc (the wrapping usually just deals with errno(3) setting on failure). Be aware of strace(1).
And you also should read Operating Systems: Three Easy Pieces (it is a freely downloadable book, explaining the role of, and reason for, system calls)
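Added example (not part of the original question or answer): the question's obsolete _syscall2/_syscall1 macros replaced with syscall(2), next to the same call issued with no libc code on x86-64, which shows that the wrapper's job is the errno conversion. The names raw_chmod and bare_chmod and the scratch paths are assumptions made for the example.

```c
/* Added example; raw_chmod/bare_chmod are illustrative names. */
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>        /* AT_FDCWD */
#include <stdio.h>
#include <stdlib.h>       /* mkstemp */
#include <sys/stat.h>
#include <sys/syscall.h>  /* SYS_* numbers */
#include <unistd.h>       /* syscall() */

/* Via syscall(2): libc loads the registers, traps, and converts a negative
 * kernel return into -1 plus errno. fchmodat is used because newer ABIs
 * (e.g. aarch64) have no SYS_chmod. */
static int raw_chmod(const char *path, mode_t mode)
{
    return (int)syscall(SYS_fchmodat, AT_FDCWD, path, mode);
}

/* No libc code at all (x86-64): the kernel's value comes back untouched,
 * so a failure is -errno and errno itself is never written. */
static long bare_chmod(const char *path, mode_t mode)
{
#if defined(__x86_64__)
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "0"((long)SYS_fchmodat), "D"((long)AT_FDCWD),
                        "S"(path), "d"((long)mode)
                      : "rcx", "r11", "memory");
    return ret;
#else  /* elsewhere: reproduce the kernel convention on top of syscall(2) */
    long ret = syscall(SYS_fchmodat, AT_FDCWD, path, mode);
    return ret < 0 ? -errno : ret;
#endif
}

int main(void)
{
    char path[] = "/tmp/bare_syscall_XXXXXX";  /* constructed scratch file */
    int fd = mkstemp(path);
    if (fd < 0) return 1;
    close(fd);
    printf("syscall(2): %d, bare: %ld, missing file: %ld (-ENOENT = %d)\n",
           raw_chmod(path, 0640), bare_chmod(path, 0600),
           bare_chmod("/nonexistent/constructed", 0600), -ENOENT);
    unlink(path);
    fflush(stdout);        /* a raw exit skips stdio's exit-time flush */
    syscall(SYS_exit, 0);  /* stands in for _syscall1(int,exit,int,r) */
}
```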

Related

How can I set a compiler warning (GNU GCC) when overwriting a weak function

Library functions have the weak attribute set by default (see [1]) and could be "overwritten" with functions having the same signature by accident.
For example printf internally calls fputc and I could easily declare one of my functions int fputc(int, FILE *).
If that happens, I would like to receive a compiler warning.
Is there a way to tell the compiler to warn me in case of overwriting a weak function?
[1] https://gcc.gnu.org/onlinedocs/gcc-3.2/gcc/Function-Attributes.html
(I am guessing you are on Linux, and compiling and linking your application as usual, in particular with the libc.so dynamically linked)
Library functions have the weak attribute set by default
This is not always true; on my system fputc is not a weak symbol:
% nm -D /lib/x86_64-linux-gnu/libc-2.21.so|grep fputc
000000000006fdf0 T fputc
0000000000071ea0 T fputc_unlocked
(if it was weak, the T would be a W, and indeed write is weak)
BTW, redefining your own fputc (or malloc) is legitimate (and could be useful, but is very tricky), provided it keeps a semantic conforming to the standard. More generally weak symbols are expected to be redefinable (but this is tricky).
Is there a way to tell the compiler to warn me in case of overwriting a weak function?
No (the compiler cannot warn you reliably).
Since the only thing which could give you some warning is not the compiler (which does not know which particular libc would be used at runtime, you might upgrade your libc.so after compilation) but the linker, and more precisely the dynamic linker, that is ld-linux(8). And the warnings could reliably only be given at runtime (because the libc.so might be different at build time and at run time). Perhaps you want LD_DYNAMIC_WEAK.
If you are ready to spend weeks working on a solution, you might consider using GCC MELT with your own MELT extension and customize a recent GCC to emit a warning when a weak symbol from the libc available at compile time (which might not be the same libc dynamically linked at runtime, so such a check has limited usefulness) is redefined.
Perhaps you might use some LD_PRELOAD trick.
Also, if you linked statically your application, the linker could give you diagnostics if you redefine a libc function.
Read also Drepper's How to Write a Shared Library & Levine's Linkers & loaders book.

Is there a reliable way to know what libraries could be dlopen()ed in an elf binary?

Basically, I want to get a list of libraries a binary might load.
The unreliable way I came up with that seems to work (with possible false-positives):
comm -13 <(ldd elf_file | sed 's|\s*\([^ ]*\)\s.*|\1|'| sort -u) <(strings -a elf_file | egrep '^(|.*/)lib[^:/]*\.so(|\.[0-9]+)$' | sort -u)
This is not reliable. But it gives useful information, even if the binary was stripped.
Is there a reliable way to get this information without possible false-positives?
EDIT: More context.
Firefox is transitioning from using gstreamer to using ffmpeg.
I was wondering what versions of libavcodec.so will work.
libxul.so uses dlopen() for many optional features.
And the library names are hard-coded. So, the above command helps
in this case.
I also have a general interest in package management and binary dependencies.
I know you can get direct dependencies with readelf -d, dependencies of
dependencies with ldd. And I was wondering about optional dependencies, hence the question.
ldd tells you the libraries your binary has been linked against. These are not those that the program could open with dlopen.
The signature for dlopen is
void *dlopen(const char *filename, int flag);
So you could, still unreliably, run strings on the binary, but this could still fail if the library name is not a static string, but built or read from somewhere during program execution -- and this last situation means that the answer to your question is "no"... Not reliably. (The name of the library file could be read from the network, from a Unix socket, or even uncompressed on the fly, for example. Anything is possible! -- although I wouldn't recommend any of these ideas myself...)
edit: also, as John Bollinger mentioned, the library names could be read from a config file.
edit: you could also try substituting the dlopen system call with one of yours (this is done by the Boehm garbage collector with malloc, for example), so it would open the library, but also log its name somewhere. But if the program didn't open a specific library during execution, you still won't know about it.
(I am focusing on Linux; I guess that most of my answer fits for every POSIX system; but on MacOSX dlopen wants .dylib dynamic library files, not .so shared objects)
A program could even emit some C code in some temporary file /tmp/foo1234.c, fork a compilation of that /tmp/foo1234.c into a shared library /tmp/foo1234.so by some gcc -O -shared -fPIC /tmp/foo1234.c -o /tmp/foo1234.so command -generated and executed at runtime of your program-, perhaps remove the /tmp/foo1234.c file -since it is not needed any more-, and dlopen that /tmp/foo1234.so (and perhaps even remove /tmp/foo1234.so after dlopen), all that in the same process. My GCC MELT plugin for gcc does exactly this, and so does Bigloo, and the GCCJIT library is doing something close.
So in general, your quest is impossible and even has no sense.
Is there a reliable way to get this information without possible false-positives?
No, there is no reliable way to get such information without false positives (you could prove that equivalent to the halting problem, or to some other undecidable problem). See also Rice's theorem.
In practice, most dlopen happens on plugins provided by some configuration. They might not be exactly named as such in a configuration file (e.g. some Foo programs might have a convention like a plugin named bar in some foo.conf configuration file is provided by foo-bar.so plugin).
However, you might find some heuristic approximation. Most programs doing some dlopen have some plugin convention requesting some particular symbol names in the plugin. You could search for shared objects defining these names. Of course you'll get false positives.
For example, the zsh shell accepts plugins called zsh modules. The example module shows that enables_, boot_, features_ etc... functions are expected in zsh modules. You could use nm -D to find *.so files providing these (hence finding the plugins likely to be perhaps loadable by zsh)
(I am not convinced that such an approach is worthwhile; in fact you should usually know which plugins are useful on your system by which applications)
BTW, you could use strace(1) on the execution of some command to understand the syscalls it is doing, hence the plugins it is loading. You might also use ltrace(1), or pmap(1) (on some given process), or simply -for a process 1234- use cat /proc/1234/maps to understand its virtual address space, hence the plugins it has already loaded. See proc(5).
Notice that strace, ltrace, pmap exist on Linux, but many POSIX systems have similar programs.
Also, a program could generate some machine code at runtime and execute it (SBCL does that at every REPL interaction!). Your program could also use some JIT techniques (e.g. with libjit, llvm, asmjit, GCCJIT or with hand-written code...) to do likewise. So plugin-like behavior can happen without dlopen (and you might mimic dlopen with mmap calls and some ELF relocation processing).
Addenda:
If you are installing firefox from its packaged version (e.g. the iceweasel package on Debian), its package is likely to handle the dependencies
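
Added example (not part of the original answers): the /proc/1234/maps approach mentioned above, as a small parser that lists each shared object a process has mapped so far, including one that was unlinked after dlopen. The function names and the SAMPLE_MAPS excerpt are constructed for the example; like the answer says, it only shows libraries already loaded, not ones the program might load later.

```c
#define _GNU_SOURCE   /* added example; names and sample data are constructed */
#include <stdio.h>
#include <string.h>

#define MAX_LIBS 64

/* Constructed proc(5) maps excerpt: start-end perms offset dev inode path */
static const char SAMPLE_MAPS[] =
    "55d0c0a00000-55d0c0a02000 r-xp 00000000 08:01 131 /usr/bin/demo\n"
    "7f10a0000000-7f10a0021000 rw-p 00000000 00:00 0 \n"
    "7f10a2000000-7f10a2001000 r-xp 00000000 08:01 977 /tmp/foo1234.so (deleted)\n"
    "7f10a4000000-7f10a41b0000 r-xp 00000000 08:01 412 /lib/x86_64-linux-gnu/libc.so.6\n"
    "7f10a41b0000-7f10a41b6000 rw-p 001b0000 08:01 412 /lib/x86_64-linux-gnu/libc.so.6\n"
    "7f10a6000000-7f10a6010000 r-xp 00000000 08:01 655 /usr/lib/demo/plugins/foo-bar.so\n";

/* Basename looks like libfoo.so, libfoo.so.6 or "foo.so (deleted)"? */
static int is_shared_object(const char *path)
{
    const char *base = strrchr(path, '/');
    const char *p = (path[0] == '/' && base) ? strstr(base, ".so") : NULL;
    return p && (p[3] == '\0' || p[3] == '.' || p[3] == ' ');
}

/* Collect each distinct shared object mapped in `maps`; returns the count. */
static int loaded_libs(FILE *maps, char out[][256], int max)
{
    char line[512], path[256];
    int n = 0;
    while (fgets(line, sizeof line, maps)) {
        /* skip address, perms, offset, dev, inode; keep the rest of the line */
        if (sscanf(line, "%*s %*s %*s %*s %*s %255[^\n]", path) != 1)
            continue;                  /* anonymous mapping, no path */
        int i = 0;
        while (i < n && strcmp(out[i], path) != 0)
            i++;
        if (i == n && n < max && is_shared_object(path))
            strcpy(out[n++], path);
    }
    return n;
}

int main(int argc, char **argv)
{
    char libs[MAX_LIBS][256];
    FILE *f = argc > 1 ? fopen(argv[1], "r")   /* e.g. /proc/1234/maps */
                       : fmemopen((void *)SAMPLE_MAPS, sizeof SAMPLE_MAPS - 1, "r");
    int n = f ? loaded_libs(f, libs, MAX_LIBS) : 0;
    for (int i = 0; i < n; i++) puts(libs[i]);
    return 0;
}
```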

Definition of usleep in C

I am looking for the definition of usleep().
I found the declaration in /usr/include/unistd.h. But there it's declared as
extern int usleep (__useconds_t __useconds);
Where to find the definition of the function? (Please mention if any way is there other than grep, so that for other library functions also I can follow.)
I'm using gcc version 4.8.3 in Fedora 21 with Linux Kernel 4.1
As with most library functions, you'll find no definition, as it's already compiled in the standard library that is linked automatically to your object modules by the compiler. You can probably find the full sources of your libc in the source packages provided by your Linux distribution, as well as on the glibc site (but keep in mind that some distributions add their patches).
As for usleep, you probably won't find anything interesting, it's just a wrapper around nanosleep, which in turn is just a syscall (so you'll just find some register setup followed by a sysenter instruction - the juicy stuff happens in kernel mode). edit actually, as nanosleep is a cancellable syscall it's a bit more complicated, but the point stands

What is GLIBC? What is it used for?

I was searching for the source code of the C standard libraries. What I mean with it is, for example, how are cos, abs, printf, scanf, fopen, and all the other standard C functions written, I mean to see their source code.
So while searching for this, I came across with GLIBC, but I don't know what it actually is. It is GNU C Library, and it contains some source codes, but what are they actually, are they the source code of the standard functions or are they something else? And what is it used for?
It's the implementation of the Standard C library described in the C standards, plus some extra useful stuff which is not strictly standard but used frequently.
Its main contents are:
1) C library described in the ANSI, C99 and C11 standards. It includes macros, symbols, function implementations etc. (printf(), malloc() etc.)
2) POSIX standard library. The "userland" glue of system calls. (open(), read() etc. Actually glibc does not "implement" system calls. The kernel does it. But glibc provides the userland interface to the services provided by the kernel so that a user application can use a system call just like an ordinary function.)
3) Also some nonstandard but useful stuff.
"use the force, read the source"
$ git clone git://sourceware.org/git/glibc.git
(I was recently pretty enlightened when I looked through malloc.c in glibc)
There are several implementations of the standard. Glibc is the implementation that most Linuxes use, but there are others. Glibc also contains (as Aftnix states) the glue functions which set up the scene for jumps into the kernel (also known as system calls). So many of glibc's 'functions' don't do the actual work but only delegate to the kernel.
To read the source of Glibc, just google for it. There are myriad sites which carry it, and also several variations.
Windows uses Microsoft's own implementation, which I believe is called MSVCR.DLL. I doubt that you will find the source code to that library anywhere. Also note that some functions which a Linux hacker might think of as 'standard', simply don't exist on Windows (notably fork). The reverse is also true.
Other systems will have their own libc.
The glibc package contains standard libraries which are used by multiple programs on the system. In order to save disk space and memory, as well as to make upgrading easier, common system code is kept in one place and shared between programs. This particular package contains the most important sets of shared libraries: the standard C library and the standard math library. Without these two libraries, a Linux system will not function. The glibc package also contains national language (locale) support.
Yes, it's the implementation of standard library functions.
More specifically, it is the implementation for all GNU systems and in almost all *NIX systems that use the Linux kernel.
Here are a few "hands-on" points of view:
it implements the POSIX C API on top of the Linux kernel: What is the meaning of "POSIX"?
it contains several assembly hand-optimized versions of ANSI C functions for several different architectures, e.g. strlen:
sysdeps/x86_64/strlen.S
sysdeps/aarch64/strlen.S
how to modify its source, recompile and use it to understand it better: How to compile my own glibc C standard library from source and use it?
how to GDB step debug it with QEMU and Buildroot: https://github.com/cirosantilli/linux-kernel-module-cheat/tree/9693c23fe6b2ae1409010a1a29ff0c1b7bd4b39e#gdbserver-libc

Where is the definition of the POSIX function "stat" on Linux?

On Windows, stat and pretty much all other C/POSIX functions Windows supplies are defined in msvcrt.dll, which is the C runtime library.
On Linux, I know a lot of POSIX C functions are system calls. I also know when linking a program, you can't have undefined references. I have searched all so files in /lib and /usr/lib for the symbol stat or "mangled/prefixed" form but have not found anything. This is the command I used:
objdump -T /lib/*.so* /usr/lib/*.so* | grep "stat"
It didn't turn up the stat I was looking for.
So my question becomes: where is it, and any other "system calls" defined?
On my Linux machine, I can find the stat (weak) symbol and __stat (non-weak) in /usr/lib/libc.a
You might make Linux kernel system calls without even using the libc (but this is probably a bad practice). The Linux Assembly Howto explains (in its chapters 5 & 6) how to do that (on x86 Linux 32 bits at least).
But I think it is a bad idea. Going thru the libc is good practice, and might even be faster (because e.g. of VDSO), and is more portable.
First of all stat is ambiguous; there's a stat syscall and there is a function stat that can be called from user space which calls the syscall. That last function is (on my system at least) defined in /usr/include/sys/stat.h (that's right, it's in the header file). It actually has several definitions (all one liners that call a different function, like e.g. __fxstat) of which one is chosen depending on compiler and system and whatnots.
Anyhow, stat (and other syscalls) are just wrappers that call the kernel (usually with a lot of orchestration). That is why I was initially confused about what you meant. I hope though, I could help despite my unhelpful first comment.
You can call it with syscall(2)
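#include <sys/stat.h>    /* struct stat */
#include <sys/syscall.h> /* SYS_stat */
#include <unistd.h>      /* declares syscall() */
...
syscall(SYS_stat, path, buf);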
see also Linux syscall reference: http://syscalls.kernelgrok.com/
