Why would one call stat64 explicitly? [closed] - c

Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 2 years ago.
Env: Ubuntu 18.04, Linux Kernel 5.3
I'm debugging some binary with gdb. Here is what I found when catching stat system call:
(gdb) bt
#0 0x00007f2d8ecae775 in __GI___xstat (vers=vers@entry=1, name=name@entry=0x7f2d882d7d60 "/etc/app/cfg", buf=buf@entry=0x7f2d8f3a14f0) at ../sysdeps/unix/sysv/linux/wordsize-64/xstat.c:35
#1 0x00007f2d592294e4 in stat64 (__statbuf=0x7f2d8f3a14f0, __path=0x7f2d882d7d60 "/etc/app/cfg") at /usr/include/x86_64-linux-gnu/sys/stat.h:500
#2 0x00007f2d6fac1990 in ?? ()
#3 0x00007f2d8f3a15c8 in ?? ()
#4 0x00007f2d8f3a1620 in ?? ()
#5 0x00007f2d6fabbcb3 in ?? ()
#6 0x00000007170a2ae8 in ?? ()
#7 0x00007f2d8f3a15d0 in ?? ()
#8 0x0000000000000000 in ?? ()
The line #1 0x00007f2d592294e4 in stat64 (__statbuf=0x7f2d8f3a14f0, __path=0x7f2d882d7d60 "/etc/app/cfg") at /usr/include/x86_64-linux-gnu/sys/stat.h:500 got me confused.
I have no idea why one would use stat64 explicitly. First of all, it requires _GNU_SOURCE to be defined. Secondly, to my knowledge glibc's stat already handles all the kernel-specific 32/64-bit differences.
Besides, both stat and stat64 use the same stat system call on my kernel.

The most likely explanation is that the program did a #define _FILE_OFFSET_BITS 64 before including any system headers. This causes calls to plain stat to be remapped to stat64, open to open64, etc. Nowadays all applications should do this.
However, there is a reason to use stat64 etc. directly. In a library whose public interfaces logically should involve off_t or any of the other types that are changed by defining _FILE_OFFSET_BITS, you can’t use that define or any of those types in your interface headers, because then your own ABI will depend on the setting of that macro, which is controlled by the library user, not you. Instead you have to define _LARGEFILE64_SOURCE and use the explicitly sized types (off64_t, etc.) and functions (stat64, etc.) in your interface headers. In principle, .c and .h files that aren't exposed to external macro defines can still use _FILE_OFFSET_BITS and the ordinary functions, but in practice it’s easier to enforce a style rule that all of the library’s code must use only the explicitly sized types and functions.

Related

How to find the available POSIX system calls api list for my Linux? [closed]

Closed 3 years ago.
I'm working on Ubuntu 16.04 and I need to use these functions:
int spawnv( mode, path, argv );
int spawnve( mode, path, argv, envp );
int spawnvp( mode, file, argv );
int spawnvpe( mode, file, argv, envp );
I know that they are compiler dependent, so how can I find the system calls supported by my compiler/system? Or how can I find the multitasking API for process-related system calls?
I tried typing man spawn and pressing Tab, but nothing appears.
The Wikipedia page on Spawn (computing) indicates that the spawn*() functions you reference are from DOS/Windows. They don't have direct analogues in Unix — although they were originally derived from Unix (fork() and exec*()) and adapted to DOS/Windows.
There are no direct analogues to those functions in POSIX. Arguably the nearest approach is posix_spawn() and its multitude of support functions (see the 'SEE ALSO' section on that page for links to the other functions).
I didn't find any similar functions in Linux, even when looking at:
https://man7.org/linux/man-pages/man3
https://man7.org/linux/man-pages/man2
https://linux.die.net/man/2
https://linux.die.net/man/3
Similar functionality can probably be written using fork() (sometimes), exec*() and waitpid() or one of its relatives (sometimes), but it might not be as easy as all that. It depends in part on how exact and complete the emulation functionality has to be.

How to extract all functions out of a compiled ELF file, even functions that have no symbol [closed]

Closed 4 years ago.
IDA can do this: a function with no symbol is named 'sub_address'.
How can I do this at runtime?
In short, you disassemble all exported functions and look for call instructions. For each call instruction you take the address operand and mark it as another function, and disassemble it too. Ditto recursively.
Such functions, found from call operands, are what IDA calls sub_XXXX.
How to extract all functions out of a compiled ELF file, even functions that have no symbol
You don't define what counts as a function for you (and you really should).
Notice that if the compiler has inlined a function, it does not appear in the ELF file, even if of course it exists in the source code (the entire program could have been built with link-time optimization, e.g. g++ -flto -O2 at both compile and link time; then you would have many inlined functions, including several which are not marked inline in the source code).
The original source code could have been compiled with visibility tricks.
The software build might have used some code obfuscation techniques.
If some function is called only indirectly (think of a virtual method in C++, always called through some vtable; or think of some static function whose address is put into some function pointer variable or struct field) then you practically cannot detect it, since to reliably do that on the binary executable requires a precise analysis of all the possible (function pointer) values of some register or memory location (and that is undecidable, see Rice's theorem).
A program can also load a plugin at runtime (e.g. using dlopen) and call functions in it. It could also generate some machine code when running (e.g. with the help of GNU lightning, asmjit, libgccjit, etc...) and call such a generated function.
So in general you cannot achieve your goal (especially if you assume that your "adversary", the software writer, uses clever techniques to make that function extraction difficult). In general, decompilation is impossible (if you want it to be precise and complete).
However, arrowd's answer proposes a crude and incomplete approximation. You need to decide if that is enough (and even IDA gives approximate results).
Lastly, in some legal systems, decompilation or reverse engineering of a binary executable is forbidden (even when technically possible); check the EULA or contract (or law) related to your binary software and your situation. You really should verify that what you are trying to do is legal (and it might not be, and in some cases you could risk jail).
BTW, all these reasons are why I prefer to always use free software, whose source code is published and can be studied and improved. I am willingly avoiding proprietary software.

C, using F commands in functions such as open() [closed]

Closed 9 years ago.
Kind of a stupid question, but I'm seeing if it's possible.
With the f commands you have, for example, "w", which means write to the file and create a new file if it doesn't exist.
With the O commands it's a bit more complex; my research shows it's: O_WRONLY|O_TRUNC|O_CREAT
The O commands seem like a bit too much to remember as opposed to the f commands. Is it "good coding", and is it even possible, to use "w" as the flag in the open function?
The fopen function is defined in the ISO C standards (C99 7.19.5.3 for example). It is implemented in the C runtime library. As such all compliant C implementations, across all platforms, have to implement it.
The open function is defined in POSIX. It is specific to unix-like platforms.
When you call fopen on a unix-like, POSIX compliant platform, the runtime library inspects the arguments you are supplying and translates them into the corresponding arguments to the POSIX open function.
If you called fopen on another system, such as Windows, the runtime library would be calling a Windows-specific function to open the file - perhaps OpenFile.
You can't just supply the fopen arguments directly to POSIX open, or to Windows OpenFile for that matter. Those functions don't understand them.
In terms of "good programming" and which of these layers to use: in general, you should avoid directly calling OS system calls (such as the POSIX ones or the Windows ones) unless you specifically need to for some reason. The reason is that your program will be more portable if you don't. It will be possible to compile it for any platform which has a compliant C compiler.
On the other hand, if you need some capability or option which is not available in the C Runtime libraries, then you should use the OS system calls.

hard to understand this macro [closed]

Closed 8 years ago.
#define __HAVE_ARCH_STRCPY
What's the meaning of __HAVE_ARCH? I'm not a native speaker and I failed to find its meaning with Google... (maybe this question is quite silly)
Defining a __HAVE_ARCH_XXXX preprocessor token lets other parts of the kernel test whether the current architecture supplies its own version of strcpy, memset, etc. If the token is not defined, the generic C implementation in the kernel's own library code is used instead. You'll notice that on some platforms the token is defined and an implementation of the function is then defined as an inline function right alongside it, because on those platforms the architecture-specific version replaces the generic one. On other platforms, the function is defined in some other code module and is simply declared extern just after the preprocessor token.
Keep in mind that the kernel itself in Linux does not have access to the standard libc library, so these functions have to be defined separately from what you would typically use in a user-land application that is linked against libc. Thus it's important to define what standard functions are present, and which ones are not, as it may vary from platform-to-platform.
"This architecture has strcpy()".

Is there a downside to using -Bsymbolic-functions?

I recently discovered the linker option "-Bsymbolic-functions" in GNU ld:
-Bsymbolic
When creating a shared library, bind references to global symbols to the
definition within the shared library, if any. Normally, it is possible
for a program linked against a shared library to override the definition
within the shared library.
This option is only meaningful on ELF platforms which support shared libraries.
-Bsymbolic-functions
When creating a shared library, bind references to global function symbols
to the definition within the shared library, if any.
This option is only meaningful on ELF platforms which support shared libraries.
This seems to be the inverse of the GCC option -fvisibility=hidden, in that instead of preventing the export of the referenced function to other shared objects, it prevents library-internal references to that function from being bound to an exported function of a different shared object. I read that -Bsymbolic-functions will prevent the creation of PLT entries for the functions, which is a nice side effect.
But I was wondering whether there is perhaps a finer-grained control over this, like overwriting -Bsymbolic for individual function definitions of a library.
Should I be aware of any pitfalls of using -Bsymbolic-functions? I plan to only use that, because the -Bsymbolic will break exceptions, I think (it will make it so that references to typeinfo objects are not unified, I think).
Thanks!
Answering my own question because I just earned a Tumbleweed badge for it... and I found out subsequently
But I was wondering whether there is perhaps a finer-grained control over this, like overwriting -Bsymbolic for individual function definitions of a library.
Yes, there is the option --dynamic-list which does exactly that
Should I be aware of any pitfalls of using -Bsymbolic-functions? I plan to only use that, because the -Bsymbolic will break exceptions, I think (it will make it so that references to typeinfo objects are not unified, I think).
I looked more into it, and it seems there is no issue. The libstdc++ library apparently does it or at least did consider it and they only had to add --dynamic-list-cpp-new to still have operator new unified (to prevent issues with multiple allocator / deallocators mixing up in a program but I would argue such programs are broken anyway). Ubuntu uses it or used it by default, and it seems it causes conflicts with some packages. But overall it should work nicely I expect.
I recently discussed this with one of the toolchain experts at SUSE. Here are his remarks:
"-Bsymbolic-functions is a thing from an old world which doesn't
exist anymore. It completely bypasses everything about what ELF can
provide, including visibility. When you're using it, everything is bound
locally. IOW: don't use it :)
No one should use -Bsymbolic-functions, it's a too big hammer for
most purposes."
How does -Bsymbolic-functions relate to library versioning (--version-script)?
"-Bsymbolic-functions overrides anything, from linker
scripts, from GCC attributes or anywhere, about symbol visibilities or anything. It makes everything bind local, always, irrespective of
anything else that you might have added on command lines, or extra files,
or object files. (And yes, --dynamic-list= was a mis-guided attempt to
fix some of that and make -Bsymbolic* somewhat more friendly). So, yes, it takes precedence over linker script. It's a big hammer :)
"
"To be extra precise: -Bsymbolic-functions is not quite the same as linker
script global/local, which is probably a reason why people still use it
sometimes. While -Bsymbolic-functions does bind references to definitions
locally (like local: in linker scripts), it also keeps them exported
(like the global: ones). In ELF speak that would be somewhat like
PROTECTED visibility. Unfortunately that can't be expressed in a symbol
version script right now, only via GCCs __attribute__(visibility). So,
when people try to get the speed advantage of local binding (fewer symbol
lookups at library load time), while still exporting all their functions
from the shared lib, they unfortunately often end up first finding that
-Bsymbolic-functions "does what I want", without realizing that it creates
problems down the line."
Well, you could say it is a "hardening" option, as it ensures that your calls to in-library functions really end up there. But one issue that I found involves some projects' test suites.
For example, the libvirt test suite wants to call into the just-built libvirt0.so but also mock some of the calls made from there.
Because -Bsymbolic-functions is used in the build, the test breaks: the original function is called instead of the mocked one.
Example backtraces
Good case:
#0 virHostCPUGetThreadsPerSubcore (arch=VIR_ARCH_PPC64) at ../../../tests/virhostcpumock.c:30
#1 0x00007ffff7c1e4c4 in virHostCPUGetInfoPopulateLinux (cpuinfo=<optimized out>, arch=VIR_ARCH_PPC64, cpus=0x7fffffffdf38, mhz=<optimized out>, nodes=0x7fffffffdf40, sockets=0x7fffffffdf44, cores=0x7fffffffdf48, threads=0x7fffffffdf4c)
at ../../../src/util/virhostcpu.c:661
#2 0x0000555555557e6f in linuxTestCompareFiles (outputfile=0x55555558f150 "/build/libvirt-OUKR8i/libvirt-4.10.0/tests/virhostcpudata/linux-ppc64-subcores2.expected", arch=VIR_ARCH_PPC64,
cpuinfofile=0x5555555a3f10 "/build/libvirt-OUKR8i/libvirt-4.10.0/tests/virhostcpudata/linux-ppc64-subcores2.cpuinfo") at ../../../tests/virhostcputest.c:44
#3 linuxTestHostCPU (opaque=<optimized out>) at ../../../tests/virhostcputest.c:189
#4 0x000055555555914d in virTestRun (title=0x55555555c0a1 "subcores2", body=0x555555557cc0 <linuxTestHostCPU>, data=0x7fffffffe0c0) at ../../../tests/testutils.c:176
#5 0x000055555555781a in mymain () at ../../../tests/virhostcputest.c:263
#6 0x0000555555559df4 in virTestMain (argc=1, argv=0x7fffffffe2c8, func=0x5555555577b0 <mymain>) at ../../../tests/testutils.c:1114
#7 0x00007ffff79bb09b in __libc_start_main (main=0x5555555576a0 <main>, argc=1, argv=0x7fffffffe2c8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe2b8) at ../csu/libc-start.c:308
#8 0x00005555555576ea in _start () at ../../../tests/virhostcputest.c:278
Bad case:
#0 virHostCPUGetThreadsPerSubcore (arch=arch@entry=VIR_ARCH_PPC64) at ../../../src/util/virhostcpu.c:1119
#1 0x00007ffff7c27e04 in virHostCPUGetInfoPopulateLinux (cpuinfo=<optimized out>, arch=VIR_ARCH_PPC64, cpus=0x7fffffffdea8, mhz=<optimized out>, nodes=0x7fffffffdeb0, sockets=0x7fffffffdeb4, cores=0x7fffffffdeb8, threads=0x7fffffffdebc)
at ../../../src/util/virhostcpu.c:661
#2 0x0000555555557e6f in linuxTestCompareFiles (outputfile=0x5555555a5c30 "/build/libvirt-4biJ7f/libvirt-4.10.0/tests/virhostcpudata/linux-ppc64-subcores2.expected", arch=VIR_ARCH_PPC64,
cpuinfofile=0x55555558fd20 "/build/libvirt-4biJ7f/libvirt-4.10.0/tests/virhostcpudata/linux-ppc64-subcores2.cpuinfo") at ../../../tests/virhostcputest.c:44
#3 linuxTestHostCPU (opaque=<optimized out>) at ../../../tests/virhostcputest.c:189
#4 0x000055555555914d in virTestRun (title=0x55555555c0a1 "subcores2", body=0x555555557cc0 <linuxTestHostCPU>, data=0x7fffffffe030) at ../../../tests/testutils.c:176
#5 0x000055555555781a in mymain () at ../../../tests/virhostcputest.c:263
#6 0x0000555555559df4 in virTestMain (argc=1, argv=0x7fffffffe238, func=0x5555555577b0 <mymain>) at ../../../tests/testutils.c:1114
#7 0x00007ffff79b009b in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6
#8 0x00005555555576ea in _start () at ../../../tests/virhostcputest.c:278
Compare the source for virHostCPUGetThreadsPerSubcore in those two and you will see the difference.
Other cases I have seen:
static variables becoming multiple instances in singularity
segfaulting tests in sssd
autofs issues with global variables
Since the original question was about potential drawbacks, I thought it worth mentioning this fairly common category of related issues as well.
There are cases with side effects. A documented one:
https://bugs.launchpad.net/ubuntu/+source/xfe/+bug/644645
I would also like to figure out more about it, because I have such a case right now.
Building glibc with -Bsymbolic-functions is not recommended either. Here is the result I got:
Core was generated by `/home/lano1106/dev/packages/glibc/repos/core-i686/src/glibc-build/elf/ld-linux .'.
Program terminated with signal 11, Segmentation fault.
#0 0x400a3e90 in _int_free ()
from /home/lano1106/dev/packages/glibc/repos/core-i686/src/glibc-build/libc.so.6
(gdb) where
#0 0x400a3e90 in _int_free ()
from /home/lano1106/dev/packages/glibc/repos/core-i686/src/glibc-build/libc.so.6
#1 0x4016b94b in __libc_dlsym ()
from /home/lano1106/dev/packages/glibc/repos/core-i686/src/glibc-build/libc.so.6
#2 0x4004c2c7 in __gconv_find_shlib ()
from /home/lano1106/dev/packages/glibc/repos/core-i686/src/glibc-build/libc.so.6
#3 0x40042320 in find_derivation ()
from /home/lano1106/dev/packages/glibc/repos/core-i686/src/glibc-build/libc.so.6
#4 0x40042889 in __gconv_find_transform ()
from /home/lano1106/dev/packages/glibc/repos/core-i686/src/glibc-build/libc.so.6
#5 0x400d6f00 in __wcsmbs_load_conv ()
from /home/lano1106/dev/packages/glibc/repos/core-i686/src/glibc-build/libc.so.6
#6 0x400c86f6 in mbrtowc ()
from /home/lano1106/dev/packages/glibc/repos/core-i686/src/glibc-build/libc.so.6
#7 0x08048914 in ?? ()
#8 0x00000000 in ?? ()

Resources