Does there exist a static analysis tool (C/C++) which analyzes code without being able to compile it?
(The reason I ask is that my code may use some functions from an external SDK.)
Most static analysis tools (e.g. Frama-C) don't compile C code, but they often require its preprocessed form, so they need the header files used by your code to be available. Often they fork the compiler just to get the preprocessed form (e.g. gcc -C -E).
Notice that these tools usually don't need or care about the binary form of the libraries you are using, only their header files.
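For instance, if the SDK ships only as a binary plus documentation, a small hand-written stub header is often enough for such an analyzer (a sketch; all names below are hypothetical):

/* sdk_stub.h - hand-written declarations matching the SDK's documented
   API, enough for the analyzer to preprocess and type-check the calls;
   the SDK's binary form is never needed */
int sdk_init(void);
int sdk_send(const char *buf, unsigned long len);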
However, I believe that extending a compiler with richer static analysis abilities is a plus, since the analyzer can take advantage of all the work done (and the infrastructure provided) by the compiler. This was the main motivation for my GCC MELT tool (free software, obsolete since 2019), which you can use to extend GCC to do some particular static analysis.
A few static analyzers (e.g. Coccinelle) are able to handle unpreprocessed C code (with macros). But then they need some way to understand the macros your code is using; otherwise they cannot check much, since a single macro invocation can expand to many thousands of statements!
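For instance, even a modest made-up macro hides several statements and some control flow that the analyzer must reconstruct:

/* a made-up example: one macro 'call' expands to several statements */
#define LOG_AND_RETURN(code) do { \
    fprintf(stderr, "error %d at %s:%d\n", (code), __FILE__, __LINE__); \
    return (code); \
} while (0)

Running the preprocessor yourself (e.g. gcc -C -E file.c) shows the expanded form that most such analyzers actually work on.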
N.B. all the analyzers mentioned above are free software.
I have been using this for many years: FlexeLint
Related
Let's assume a complex project (in C/C++): is there a way to know which source files are used to create a specific binary, without compiling the project itself?
I know I could just read the Makefile and try to follow the dependency chain that way, but this is not very scalable, and it can be hard when multiple Makefiles and/or implicit rules are used.
Thanks a lot for your help.
PS: To answer the first comments: I'm looking for a method which does not need a valid build environment (so compiling, even as a dry run, is not an option).
is there a way to know which source files are used to create a specific binary, without compiling the project itself
If you compile with GCC (or perhaps Clang), you can use preprocessor options like -M to generate the dependencies and keep them in some textual file, in a format accepted by the GNU make or ninja build automation tools. This works well on Linux distributions like Debian.
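For instance (a sketch, with hypothetical file names main.c and foo.h):

gcc -M main.c

prints a make-style dependency rule such as

main.o: main.c foo.h /usr/include/stdio.h ...

and the -MD or -MMD variants write such rules into .d files as a side effect of normal compilation, so the dependency information stays up to date at each build.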
You could also be interested in other build tools, including omake, and in package managers like opam, urpmi, etc.
You could also get in touch with the Software Heritage team.
If you use GCC, you could write your own GCC plugin to maintain these dependencies in your database.
Finally, be aware of Rice's theorem, and think about crazy examples (in C++) like:
// __TIME__ expands to a string literal such as "13:37:42" (which cannot
// legally be indexed inside an #if), so this declaration depends on the
// wall-clock time at which the file is compiled!
constexpr int something = (__TIME__[0] == '1') ? 0 : 1;
So my current intuition is that what you wish for is impossible in general. But I could have misunderstood your question.
Refer to some C standard like n1570, or to some C++ standard like n3337.
Study the behavior of tools like GNU autoconf.
Think of programs generating C or C++ code, like GNU bison, my manydl.c, bismon, SWIG, RefPerSys, ANTLR... Notice that GCC has many C++ code generators (notably gengtype) and is definitely "a complex project coded in C++".
See also Linux From Scratch.
I need to extract the source code of a function from an existing C library (the library is open source). The problem is that the functions are created using macros in header files, and when I write a test project and link the library to it, the debugger points me to that header file on the 'go to definition' action. I have the source code of the library, and I guess I need to build it together with my test code (maybe this is not correct, I am not sure). Any advice on how to proceed, or what to use? Thank you.
I need to extract the source code of a function from an existing C library (the library is open source).
Several C compilers are themselves open source. Both GCC and Clang are (and so is tinycc), so you could legally improve them (but that could take months of work).
In addition, recent GCC versions (e.g., as of July 2020, GCC 10) accept plugins. Your GCC plugin could work on some internal GCC representations (e.g. GIMPLE, GENERIC), so it will know about functions, even those obtained by macro expansion.
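As a minimal sketch (details such as plugin events, headers and build flags vary between GCC versions), a plugin can print the name of every function the front end parses, including functions that only exist after macro expansion:

/* funcnames.cc - a minimal GCC plugin sketch; build it with something like:
     g++ -shared -fPIC -fno-rtti \
         -I`gcc -print-file-name=plugin`/include funcnames.cc -o funcnames.so
   then use it with:
     gcc -fplugin=./funcnames.so -c yourfile.c                             */
#include <gcc-plugin.h>     /* pulls in GCC's own system headers */
#include <tree.h>
#include <plugin-version.h>

int plugin_is_GPL_compatible;   /* mandatory license marker */

/* callback invoked with each parsed function declaration */
static void on_function(void *event_data, void *user_data)
{
  tree fndecl = (tree) event_data;
  if (fndecl && DECL_NAME(fndecl))
    fprintf(stderr, "parsed function: %s\n",
            IDENTIFIER_POINTER(DECL_NAME(fndecl)));
}

int plugin_init(struct plugin_name_args *info,
                struct plugin_gcc_version *version)
{
  if (!plugin_default_version_check(version, &gcc_version))
    return 1;   /* plugin was built for a different GCC release */
  register_callback(info->base_name, PLUGIN_PRE_GENERICIZE,
                    on_function, NULL);
  return 0;
}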
You could also consider using some open source static program analyzers, such as Frama-C or the Clang static analyzer.
PS. Take into account open source license issues (legal ones). I am not a lawyer (and you might need to ask one if you mix various software under different open source licenses).
Where is the code in the GCC source code that actually constructs the assembly for the different architectures?
I am wondering how many different assembly languages it compiles to, and how it actually does this (by taking a look at the source code).
Is it in the gcc repo somewhere, or in another repo? I have started to dig around but haven't found anything.
https://github.com/gcc-mirror/gcc
For example, here is some of the assembly generating code in V8:
https://github.com/v8/v8-git-mirror/tree/master/src/x64
Is there anything equivalent for GCC?
I am wondering because it's a mystery to me how GCC does this, and it would be a great way to learn how compilers are actually implemented down to the assembly level.
The .md (machine description) files in the GCC source contain the information used to generate assembly. GCC contains several specialized C/C++ code generators, and some of them translate the .md files into code emitting assembly.
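For a taste, a (simplified, hypothetical) integer-add pattern in the .md syntax looks roughly like this; generator programs such as genrecog and genemit translate thousands of these patterns into the C code that matches and emits assembly:

(define_insn "addsi3"
  [(set (match_operand:SI 0 "register_operand" "=r")
        (plus:SI (match_operand:SI 1 "register_operand" "r")
                 (match_operand:SI 2 "register_operand" "r")))]
  ""
  "add\t%0,%1,%2")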
GCC is a very complex program. The documentation of GCC MELT (an obsolete project) contains several interesting links and slides, notably referring to the Indian GCC Resource Center.
Most of the optimizations in GCC happen in the middle end (which is mostly independent of the source language and of the target system), notably in the many passes working on the Gimple representations.
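You can watch those passes at work on your own code. Compiling any file with

gcc -O2 -fdump-tree-all -c foo.c

(foo.c being whatever file you like) leaves one dump file per Gimple pass next to the object file, e.g. foo.c.*.gimple and foo.c.*.optimized.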
The GCC repo was long an SVN repository (GCC has since moved to git).
See also this answer, notably the pictures inside it.
The actual source code for GCC is most accessible from here:
https://gcc.gnu.org/svn.html
The software is accessible via SVN (Subversion), a source code control system. It is preinstalled on many versions of Linux/UNIX, but if it's not on your platform, you can install the Subversion kit and then fetch the source using the following command:
svn checkout svn://gcc.gnu.org/svn/gcc/trunk SomeLocalDir
GCC is complex, and it would take significant experience to understand how the application actually compiles to different architectures.
In a nutshell, GCC has three major components: front-end, middle-end and back-end processing. The front end parses the source language (C, C++, Objective-C, etc.) and deconstructs the code into a portable intermediate construct, which is then passed down the pipeline for compilation to the target environment.
The middle end performs code analysis and optimisation, attempting to reshape the code to generate the best possible output at the end of the full process. Technically, optimisation can occur at any part of the process as patterns are discovered during analysis.
The back end lowers the intermediate representation towards the target (still not final executable code). Based on what the expected output is designed to be, this "pseudo-code" is optimised for the target's registers, bit-sizes, endianness, and so on. The final code is then generated during the assembly phase, which converts the back end's output into machine-executable instructions.
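An easy way to observe all of this before diving into the GCC sources (square.c is just a made-up example file):

/* square.c */
int square(int x) { return x * x; }

gcc -S -O2 square.c
(writes the generated assembly to square.s)

gcc -O2 -fdump-rtl-all -c square.c
(dumps every back-end RTL pass into numbered files, showing how the code is lowered step by step towards the final instructions)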
It's important to note that the compiler has many options to deal with output formats, so you can create output for many classes of architecture, usually out of the box. For cross-compiling and target compiler options, try checking out this link:
https://gcc.gnu.org/install/configure.html
I need to verify something about which I have doubts. If a shared library (.dll) is written in C, to the C99 standard, and compiled with one compiler (say MinGW), then in my experience it is binary compatible and hence usable from any other compiler (say MS Visual Studio). I say "in my experience" because I have tried it successfully more than once. But I need to verify whether this is a rule.
In addition, if it is indeed so, I would like to ask why libraries written completely in C, like OpenCV for example, don't provide compiled binaries for every different OS. I know that the obvious reason would be to set all the compile-time parameters, but other than that there is none, right?
EDIT: I am adding an additional question which I see as a logical extension of the original. Isn't this how one would go about creating a closed-source library? Since the option of giving out the source goes out the window there, giving out binaries is the only choice. And in that case, providing binaries for as many architectures as possible is the desired result, with C being the obvious choice for the best portability between systems and compilers. Right?
In the specific case of C compilers (MSVC and GCC/MinGW) in the Windows world, you are correct in the assumption of binary compatibility. One can link a C-interface DLL compiled by GCC to a program in Visual Studio. This is the way C99 projects like ffmpeg allow developers to write applications with Visual Studio. One only needs to create an import library from the DLL, with lib.exe found in the Microsoft toolchain. Or vice versa: using mingw.org's pexports or, better, mingw-w64's gendef tool, one can create a GCC import library for an MSVC-produced DLL.
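Assuming a hypothetical foo.dll, the round trip looks like this (a sketch):

gendef foo.dll
(writes foo.def, the list of exported symbols)

dlltool -d foo.def -l libfoo.a
(creates the import library for GCC/MinGW)

lib /def:foo.def /machine:x86 /out:foo.lib
(creates the import library for MSVC, from a Visual Studio command prompt; pick the right /machine for your target)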
This handy interoperability breaks down when you enter the C++ interface world, where the ABIs of MSVC and GCC are different and incompatible. It may work, it may not; no guarantees are made, and no effort is (currently) being made to change that. Also, debugging info is obviously different, until someone writes a debug information generator/writer for GCC that is compatible with MSVC's debugger (along with gdb support, of course).
I don't think C99 specifically changes anything about function declarations or the way arguments are handled in symbol definitions, so there should be no problem here either.
Note that, as Vijay said, there is still the architecture difference, so an x86 library can't be used when linking an AMD64 program.
To also answer your additional question about closed-source binaries and distributing a version for all available compilers/architectures:
This is exactly the way you would create a closed-source binary. In addition to the import library, it is also very important to hide exports from the DLL, making the DLL itself useless for linking (if you don't want client code to use private functions in the library; see for example the output of dumpbin /exports on an MS Office DLL, lots of hidden stuff there). You can achieve the same thing with GCC (I believe, having never used or tried it) using things like __attribute__((visibility("hidden"))), etc.
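With GCC, the usual recipe is the visibility machinery, typically combined with building the whole library with -fvisibility=hidden so that nothing is exported unless explicitly marked (a sketch; all names are made up):

/* explib.h - a sketch of export control */
#ifdef _WIN32
#define EXPLIB_API __declspec(dllexport)   /* exported from the DLL */
#else
#define EXPLIB_API __attribute__((visibility("default")))
#endif

EXPLIB_API int explib_open(const char *name);  /* part of the public interface */
int explib_helper(int x);  /* unmarked: hidden when built with -fvisibility=hidden */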
Some compiler specific points:
MSVC comes with several different runtime libraries, selected through /MT, /MD and their debug variants /MTd and /MDd (the old single-threaded /ML variants were dropped in newer versions). On top of this, you would have to provide a build for each version of Visual Studio (including Service Packs) to assure compatibility. But that is closed-source binary and Windows for you...
GCC does not have this problem; MinGW always links to msvcrt.dll provided by Windows (since Windows 98), equivalent to /MD (and maybe there's also a debug library equivalent to /MDd). But there are two versions of MinGW (mingw.org and mingw-w64) which do not guarantee binary compatibility with each other. The latter is more complete, as it provides 64-bit options as well as 32-bit, and a more complete header/library set (including a substantial part of DirectX and the DDK).
The general rule is that IF your OS/CPU combination has a standard ABI, and IF that ABI is powerful enough for your language, most compilers will follow that ABI and as a result will be binary compatible, allowing you to link libraries (shared or static) compiled with different compilers into programs compiled with other compilers just fine.
The problem is that most ABIs are fairly weak -- they're designed around low-level languages like C and FORTRAN and date back to the days before object-oriented languages like C++. So they tend to lack support for things like function overloading, user-defined operators, exceptions, global constructors and destructors, virtual functions, inheritance, and such that are needed by C++.
This lack was recognized when C++ was designed, which is why C++ has extern "C" -- which causes the compiler to limit itself to the standard ABI for certain functions, while disabling all the extra C++ features that the ABIs generally don't support.
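The standard idiom is a header that pins the interface down to the plain C ABI, usable from both languages (a generic sketch; the names are made up):

/* mylib.h - sketch: a library interface restricted to the C ABI */
#ifdef __cplusplus
extern "C" {   /* no name mangling, overloading or exceptions across this boundary */
#endif

int mylib_add(int a, int b);

#ifdef __cplusplus
}
#endif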
A shared library or DLL compiled for a particular architecture can be linked to applications compiled by other compilers that target the same architecture. (By architecture, I mean a processor/OS combination.) But it is not practical for a library developer to compile against all possible architectures. Moreover, when a library is distributed in source form, users can build binaries optimized to their specific requirements.
I implemented semaphores on Linux last year, but for that I had to use -lpthread.
Now, while implementing the log10() function in C, I searched the Internet and saw that I have to use -lm.
I want to know why these kinds of command-line arguments are necessary on Linux. And is this rule compiler-specific?
(With the Turbo C compiler on Windows, I never used these kinds of arguments.)
You are instructing the compiler to look for certain libraries and use them to produce the final executable.
When you were writing your threading code, you used threading primitives. These threading primitives are implemented in a library called pthread; -lpthread tells the linker to use the pthread library. Without this switch, the linker cannot produce a valid executable, as it is missing the implementation of the threading code.
On the file system, the libraries can be found in /usr/lib and /lib (among others). When you look in these directories, you will see files starting with the lib prefix, for example libpthreadxxxxxx. You will have to do your own research to figure out what the xxxx means.
The development cycle using Unix-style tools is very granular on the surface. When you use heavyweight IDEs (read: Visual Studio for C++), the IDE implicitly links against loads of standard libraries, so often you do not need to supply the names of the libraries you commonly use. However, when you start doing more advanced programming, you will probably have to install and configure your IDE to use external code libraries. If you were to use threading primitives in Visual Studio, you most likely would not have to tell the compiler where to look for them; Microsoft considers this a common library, and every new project implicitly links against it.
A little discussion on GCC
GCC is a very diverse compiler producing code for various different usage scenarios. As such, its developers try to be neutral and do not make assumptions. For example, pthread is one particular implementation of threading primitives. Even though it is now, on Linux at least, the de facto standard, it is not the only one; other Unix implementations have had different ones. When such choices exist, it is not fair for the compiler developers to implicitly link against one of the libraries. They do, however, implicitly link against standard libraries: for example, g++ is just a wrapper command around the internal compiler code; it is a C++ front end, so it implicitly links against an implementation of the C++ standard library. Similarly, the C front end links against the standard C library.
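You can see this neutrality directly. Given any small C++ source file, say hello.cc:

g++ hello.cc
(works: the C++ front end implicitly links against libstdc++)

gcc hello.cc
(compiles the C++ code, but fails at link time with undefined references to std:: symbols)

gcc hello.cc -lstdc++
(naming the library explicitly makes it work again)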
People often do not want to use a certain standard library implementation, and instead might want to use another one; in such cases you have to explicitly inform the compiler to use the implementation you provide. With g++ this is a surface-level matter of command-line flags; in Visual Studio you would generally have to tinker a lot to make such changes, since it is not an anticipated use case anymore.
Wikipedia will provide you with more information.
The -l option tells gcc which libraries must be used for linking. -lpthread stands for "use the pthread library", and -lm stands for "use the m library", which is the math library. These options are specific to gcc, not to Linux.
Because by default, gcc only links the C library (libc), which contains the well-known functions printf, scanf, and many more.
log10 lives in a different library called libm, and thus you need to explicitly tell gcc to link against that library, with -lm. The same logic applies to -lpthread.
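A minimal illustration (uses_log10.c is a made-up example):

/* uses_log10.c */
#include <math.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    /* a runtime value, so GCC cannot constant-fold the call away */
    printf("%f\n", log10((double)argc + 99.0));
    return 0;
}

gcc uses_log10.c
(fails with: undefined reference to `log10')

gcc uses_log10.c -lm
(links fine)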
This is purely a backwards, harmful practice. Separating parts of the standard library into separate .so files does nothing but increase load time and memory usage. Good luck getting anyone to change it though... Just accept that you have to do it (and that POSIX specifically allows, but does not require, that an implementation require -lm for using the math functions and -lpthread for using threads, etc.) and move on to more important things.
Or, go pester Drepper about it on the glibc bug tracker/mailing list. He won't change his mind, but if you enjoy flamewars you can get some kicks...