I was wondering if there is any way to statically analyze binaries in Linux and get all potential call flows/control flows.
Essentially, I am after static analysis (with potential aliasing) similar to what LLVM and similar compilers provide, but starting from the binary (i.e. I do not need to recompile, etc.).
Thanks
My host application has taken over ownership of, e.g., a FILE object which came from a dynamic library. Can I call fclose() on this object safely even though my host application and the dynamic library are compiled with different versions of clang / gcc?
Background
On Windows (with different VS runtimes) it would be illegal, and I first have to extract the fclose() function from the runtime library used by the dynamic library, since every runtime has its own pools and internal structures for file and memory objects.
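A sketch of what that workaround looks like on the Windows side (the CRT DLL name below is only a placeholder; the real name depends on which runtime the dynamic library was linked against):

    #include <windows.h>
    #include <stdio.h>

    typedef int (__cdecl *fclose_fn)(FILE *);

    /* Fetch fclose() from the runtime the dynamic library actually
     * uses. "msvcr120.dll" is a hypothetical placeholder; it must
     * match the CRT that allocated the FILE object.                 */
    static fclose_fn get_their_fclose(void)
    {
        HMODULE crt = GetModuleHandleA("msvcr120.dll");
        if (crt == NULL)
            return NULL;
        return (fclose_fn)GetProcAddress(crt, "fclose");
    }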
Does this restriction apply for Linux and macOS as well?
The issue is not whether your application and the dynamic libraries were compiled with different versions of clang and/or gcc. The issue is whether, ultimately, there's one underlying C library that manipulates one kind of FILE * object and has one, compatible implementation of fclose().
Under macOS and Linux, at least, the answer to all these questions is likely to be "yes". In my experience it's hard to get two different, incompatible C libraries into the mix; you'd have to really work at it.
Addendum: I suppose I should admit, however, that my experience may be getting dated. In my experience, on any Unix-like system, there's exactly one C library, generally /lib/libc.{a,so}. But I gather that "modern" compilers tend to access their own compiler- and version-specific libraries off in special places, meaning that the scenario you're worried about could be a problem. To me, this way lies madness, but then again, more and more of the world seems to be embracing dependency hell rather than trying to eliminate it.
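To make the common Linux case concrete, here is a minimal sketch (file and function names are invented for illustration) in which the call is safe because both sides resolve fopen() and fclose() to the single shared libc.so.6 at run time:

    /* --- liblog.c --- hypothetical shared library.
     * Build: gcc -fPIC -shared liblog.c -o liblog.so                */
    #include <stdio.h>

    FILE *log_open(const char *path)
    {
        return fopen(path, "w");   /* FILE allocated by libc */
    }

    /* --- main.c --- host application.
     * Build: gcc main.c -L. -llog -o app                            */
    #include <stdio.h>

    FILE *log_open(const char *path);

    int main(void)
    {
        FILE *f = log_open("app.log");
        if (f == NULL)
            return 1;
        fputs("hello\n", f);
        /* Safe here: the library's fopen() and this fclose() are the
         * same libc implementation, so one runtime both creates and
         * destroys the FILE object.                                  */
        fclose(f);
        return 0;
    }

You can confirm the shared dependency with ldd: both the application and the library should list the same libc.so.6.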
It is not generally safe to use a library designed for one compiler with code compiled by a different compiler. A compiler may generate code that implements the nominal functions in the standard library using internal routines or interfaces, and those routines or interfaces may be different or missing in the library designed for another compiler.
Nor is it safe to take any pointer to some internal data structure from one library and use it with another library.
If the sources are just compiled with different versions of one compiler (e.g., two releases of clang), not different compilers (e.g., Apple clang versus GCC), the compiler might offer some guarantee about library compatibility. You would have to check its documentation. Or, if the compiler is intended to use the library provided with the operating system, that could work. Again, you would have to check its documentation.
On Linux, if both your code and the other library dynamically link to the same library (such as libc.so.6), both will get the same version and implementation of that library at runtime. You can check which libraries a given dynamic library links to with ldd.
If you were linking to a library that statically linked in a supporting library, you would need to be careful to pass to or from it only structures created by that same copy of the supporting library. But this is more likely to come up in practice with libc++ and libstdc++ than with libc.
So, don't statically link your library to another and then pass a data structure that requires client code to separately link to the same library.
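One common way to sidestep this entire class of problem (a sketch of a general pattern, not something from the answer above; the names are hypothetical) is to have the library that allocates an object also own its destruction, so the client never hands the object to its own runtime:

    /* The library both creates and destroys its objects, so it does
     * not matter which runtime the client was compiled against.     */
    #include <stdio.h>
    #include <stdlib.h>

    typedef struct logger {
        FILE *fp;
    } logger;

    logger *logger_create(const char *path)
    {
        logger *lg = malloc(sizeof *lg);
        if (lg == NULL)
            return NULL;
        lg->fp = fopen(path, "w");
        if (lg->fp == NULL) {
            free(lg);
            return NULL;
        }
        return lg;
    }

    void logger_destroy(logger *lg)
    {
        if (lg == NULL)
            return;
        fclose(lg->fp);   /* the library's runtime frees its own FILE */
        free(lg);
    }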
Does there exist a static analysis tool (C/C++) which analyzes code without being able to compile it?
(The reason I ask is that my code may use some functions from an external SDK.)
Most static analysis tools (e.g. Frama-C) don't compile C code, but they often require its preprocessed form. So they require the availability of the header files used by your code. Often, they fork the compiler just to get the preprocessed form (i.e. gcc -C -E).
Notice that these tools usually don't need or care about the binary form of the libraries you are using, only their header files.
However, I believe that extending a compiler to add much more static analysis abilities is a plus, since the analyzer can take advantage of all the work done (and the infrastructure provided) by the compiler. This is the main motivation for my (free software, obsolete in 2019) GCC MELT tool (which you can use to extend GCC to do some particular static analysis).
A few static analyzers - e.g. coccinelle - are able to handle unpreprocessed C code (using macros). But then, they need some way to understand the macros your code is using (otherwise they cannot check much: a macro invocation can expand to many thousands of statements!).
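To illustrate (a made-up example): a single innocuous-looking invocation can hide a whole block of control flow, which an analyzer working on unpreprocessed source never sees:

    #include <stdio.h>

    /* One macro invocation expands to a branch, an error message and
     * an assignment; without expansion an analyzer sees only a single
     * opaque "statement".                                            */
    #define CHECKED_DIV(dst, a, b)                        \
        do {                                              \
            if ((b) == 0) {                               \
                fprintf(stderr, "division by zero\n");    \
                (dst) = 0;                                \
            } else {                                      \
                (dst) = (a) / (b);                        \
            }                                             \
        } while (0)

    int main(void)
    {
        int r;
        CHECKED_DIV(r, 10, 2);   /* expands to the whole block above */
        printf("%d\n", r);
        return 0;
    }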
N.B. all the analyzers mentioned above are free software.
I have been using this for many years: FlexeLint
Is it possible to write code in C, statically build it into a binary like an ELF/PE, then remove its header and all unnecessary metadata so as to create a raw binary, and finally wrap that raw binary in another OS's format (ELF -> PE, or PE -> ELF)?
Have you done this before?
Is it possible?
What are the issues and concerns?
How would this be possible?
And if not, why not?!
What are my pitfalls in understanding a static build?
Doesn't static building remove any need for third-party and standard libraries, as well as OS libs and headers?
Why can't we remove the metadata of, for example, an ELF and add the metadata and other specs needed for a PE?
Note:
I said cross-OS, not cross-hardware.
[Read this after reading the answers below!]
As you can see, the best answer so far just says to keep going and learn about cross-platform development issues!!! How crazy is this?! Thanks to philosophy!!!
I would say that it's possible, but the process would be hampered by many, many details.
ABI compatibility
The first thing to think of is Application Binary Interface (ABI) compatibility. Unless you're able to call your functions the same way, the code is broken. So I guess (though I can't check at the moment) that compiling code with gcc on Linux/OS X and MinGW gcc on Windows should give the same binary code as long as no external functions are called. The problem here is that executable metadata may rely on some ABI assumptions.
Standard libraries
That seems to be the largest hurdle. Partly because of the C preprocessor, which can inline some procedures on some platforms while leaving them to be resolved at run time on others. Also, cross-platform dynamic interoperation with standard libraries is close to impossible, though theoretically one can imagine code that uses a limited subset of the C standard library exposed through the same ABI on different platforms.
A static build mostly eliminates problems of interaction with other user-space code, but there is still the huge issue of interfacing with the kernel: it's int $0x80 calls on x86 Linux, with a platform-specific set of syscall numbers that does not map to Windows in any direct way.
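As a sketch of what such a kernel interface looks like (32-bit x86 Linux; build with gcc -m32), here is a raw write to stdout. Nothing in a Windows process would interpret this trap the same way:

    /* Raw Linux system call on 32-bit x86: write is syscall number 4,
     * invoked via the int $0x80 trap with arguments in registers.
     * Windows uses different syscall numbers, registers and traps.   */
    long raw_write(int fd, const void *buf, unsigned long count)
    {
        long ret;
        __asm__ volatile ("int $0x80"
                          : "=a" (ret)                      /* result in eax  */
                          : "a" (4),                        /* __NR_write = 4 */
                            "b" (fd), "c" (buf), "d" (count)
                          : "memory");
        return ret;
    }

    int main(void)
    {
        raw_write(1, "hello\n", 6);
        return 0;
    }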
OS-specific register use
As far as I know, Windows uses the %fs register for storing some OS-wide exception-handling state, so a binary compiled on Linux should avoid clobbering it. There might be other similar issues. Also, C++ exceptions on Windows are mostly implemented on top of OS exceptions.
Virtual addresses
Again, AFAIK Windows DLLs have a predefined base address they are expected to be loaded at in the virtual address space of a process, whereas Linux uses position-independent code for shared libraries. So there might be issues with overlapping areas of the executable and the ported code, unless the ported position-dependent code is recompiled to be position-independent.
So, while theoretically possible, such a transformation would be very fragile in real situations, and it's impossible to transplant the whole statically built code as-is - some parts may be transferred intact, but the rest must be relinked against system-specific code that interfaces with the kernel properly.
P.S. I think Wine is a good example of running binary code on a quite different system. It tricks a Windows program to think it's running in Windows environment and uses the same machine code - most of the time that works well (if a program does not use private system low-level routines or unavailable libraries).
When I run a program (in Linux), does it all get loaded into physical memory? If so, does using shared libraries instead of static libraries help in terms of caching? In general, when should I use shared libraries and when should I use static libraries? My code is written in C or C++, if that matters.
This article covers some decent ground on what you want. This article goes much deeper into the advantages of shared libraries.
SO has also covered this topic in depth:
Difference between static and shared libraries?
When to use dynamic vs. static libraries
Almost all the above-mentioned articles are biased toward shared libraries. Wikipedia tries to rescue static libraries :)
From the wiki:

There are several advantages to statically linking libraries with an executable instead of dynamically linking them. The most significant is that the application can be certain that all its libraries are present and that they are the correct version. This avoids dependency problems. Usually, static linking will result in a significant performance improvement.

Static linking can also allow the application to be contained in a single executable file, simplifying distribution and installation.

With static linking, it is enough to include those parts of the library that are directly and indirectly referenced by the target executable (or target library). With dynamic libraries, the entire library is loaded, as it is not known in advance which functions will be invoked by applications. Whether this advantage is significant in practice depends on the structure of the library.
Shared libraries are used mostly when you have functionality that could be used and "shared" across different programs. In that case, you will have a single point where all the programs will get their methods. However, this creates a dependency problem since now your compiled programs are dependent on that specific version of the library.
Static libraries are used mostly when you don't want to have dependency issues and don't want your program to care which X or Y libraries are installed on your target system.
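To make the mechanics concrete, here is a minimal sketch (library and file names are made up): the same source can be packaged either way, and the difference only shows up at link and load time.

    /* mylib.c - trivial example library.
     *
     * Static:  gcc -c mylib.c
     *          ar rcs libmy.a mylib.o
     *          gcc main.c -L. -lmy -o app_static
     *          (the code of add() is copied into app_static)
     *
     * Shared:  gcc -fPIC -shared mylib.c -o libmy.so
     *          gcc main.c -L. -lmy -o app_shared
     *          (app_shared only records a dependency on libmy.so,
     *           which the dynamic loader resolves at run time)
     */
    int add(int a, int b)
    {
        return a + b;
    }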
So, which one to use? For that you should answer the following questions:
Will your program be used on different platforms or Linux distributions? (e.g. Red Hat, Debian, SLES11-SP1)
Do you have replicated code that is being used by different binaries?
Do you envision that in the future other programs could benefit from your work?
I think this is a case by case decision, and it is not a one size fits all kind of answer.
We have a reusable library which gets delivered to multiple products. Most of the products run on VxWorks and use the gcc compiler. But each of them is on a different architecture, like PPC or MIPS, and within PPC itself there are more types, like 8531, 8620, etc.
Currently, I am building static libs for each of these boards separately and providing them. Is there any way a common library can be built which can be used across all these different architectures?
Also, I currently try to ensure that the compiler options are the same as those of the products. Is that necessary? Is there any information available on the internet that classifies which options are important to keep the same between static libraries and applications?
No, there is no other way - you must build the libraries (static or not) for each platform.
As you probably already know, a static library is really just a container storing a bunch of object files. Each object file contains binary code specific to the platform that the library was built for (read: a different set of assembly instructions).
Yes, keeping the compiler options the same when you build a library and the binary (program) that uses it is a very good practice. This way you avoid some potentially very nasty problems. Some optimization options are binary-incompatible (e.g., you may compile a function in a library with an optimization that causes it to return (or expect) data in a register, while your main program expects the function to return it by address on the stack - big trouble).
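A concrete instance of the same class of problem, assuming GCC (this one uses struct layout rather than register returns, but the failure mode is analogous): ABI-affecting options such as -fpack-struct change how a type is laid out, so a library and an application built with different settings silently disagree about the very same struct.

    /* layout.c - compile once normally and once with -fpack-struct,
     * and compare the output: the two builds disagree on the size of
     * struct msg and on the offset of its fields. A struct passed
     * between a library and a program built with different settings
     * would be misread on one side.                                  */
    #include <stdio.h>
    #include <stddef.h>

    struct msg {
        char tag;     /* 1 byte                                       */
        int  value;   /* offset 4 by default, offset 1 when packed    */
    };

    int main(void)
    {
        printf("sizeof = %zu, offsetof(value) = %zu\n",
               sizeof(struct msg), offsetof(struct msg, value));
        return 0;
    }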
It depends on each option: platform and architecture options must be the same, obviously.
Other options, like optimization, debugging, and profiling, can be different.
Imagine that a library may be provided by an external developer; you don't really know how he compiled it, only its platform and architecture requirements.
Also, currently I try to ensure that compiler options are same as that of the products. Is it necessary?
Necessary - no. In fact, most libraries can be considered to be standalone and not tied to any particular product (i.e. they are usable from many products). As such, product-specific flags just don't belong in the library, and vice versa (library-implementation-specific flags are not supposed to appear when compiling the products' objects).