Compilation map - c

Assume a complex project (in C/C++). Is there a way to find out which source files are used to build a specific binary, without compiling the project itself?
I know I could just read the Makefile and follow the dependency chain by hand, but that does not scale, and it gets hard when multiple Makefiles and/or implicit rules are involved.
Thanks a lot for your help
PS: To clarify the first comments, I'm looking for a method which does not need to have a valid build environment (e.g. so compiling, even as a dry-run, is not an option).

is there a solution to know which sources files are responsible/used for the creation of a specific binary without compiling the project itself
If you compile with GCC (or perhaps Clang), you could use preprocessor options like -M to generate the dependencies and keep them in a text file, in a format accepted by build automation tools such as GNU make or ninja. This works well on Linux distributions like Debian.
You could also be interested in other builders, including omake, and package managers like opam, urpmi, etc.
You could also get in touch with the Software Heritage team.
If you use GCC, you could write your own GCC plugin to maintain these dependencies in your database.
Finally, be aware of Rice's theorem, and think about crazy examples (in C++) like
#if __TIME__[0]=='1'
int something=0;
#else
constexpr int something=1;
#endif
So my current intuition is that your wish is impossible. I could have misunderstood it.
Refer to some C standard like n1570, or to some C++ standard like n3337.
Study the behavior of tools like GNU autoconf.
Think of programs generating C or C++ code like GNU bison, my manydl.c, bismon, SWIG, RefPerSys, ANTLR .... Notice that GCC has many C++ code generators (notably gengtype) and is definitely "a complex project coded in C++".
See also linuxfromscratch.

Related

How do I read the source code for a C library in CLion

OS: Deepin 20 (based on Debian 10)
CLion: 2020.1.2
GCC: gcc (Uos 8.3.0.3-3+rebuild) 8.3.0
Make: 4.2.1 x86_64-pc-linux-gnu
Cmake: 3.18.1
I am a newcomer who has just started learning C. When writing C code in CLion, I can jump to a declaration with Ctrl + mouse click.
That takes me to the declaration in the header file. For example, for printf I land in stdio.h, where line 332 reads extern int printf (const char *__restrict __format, ...);
But when I want to see the implementation of this function, I can't. According to "Navigate in the code structure", Ctrl+Alt+Home should switch to it, but the IDE reports "No related file".
How can I get to the source code of a function I call? I want to learn from the good experience of others by looking at the implementation logic in their libraries.
Thank you for your review. I would really appreciate it if you could help me.
Even if most of GNU/Linux software is open source, it is not installed (in source code form) by default on your computer.
Regarding C programming, see Modern C (and the C11 standard n1570) and read the documentation of your C compiler (perhaps GCC or Clang, or simpler ones like nwcc or tinycc), your linker (probably binutils), your build automation tool (e.g. GNU make or ninja or cmake). Enable all warnings and DWARF debug info, so if using GCC compile with at least gcc -Wall -Wextra -g; then improve your C code to get no warnings. Once you have debugged your C source code (using GDB and perhaps valgrind), add optimization flags such as -O2. Order of arguments to gcc matters!
Consider, for some tasks, generating some of your C code (perhaps some #include-d header file) with tools like GNU bison, ANTLR, SWIG, RPCGEN, AWK, GUILE, GPP, GNU m4, GNU autoconf - or your own program or script.
I want to learn from the good experiences of others by looking at their implementation logic in their libraries
You need to fetch the source code from elsewhere.
For example, see GNU libc or musl-libc, and the Linux kernel (and others: GTK, PostgreSQL, sqlite, GUILE, etc., including many open source programs mentioned in this answer), and also look on websites like github, gitlab, sourceforge.
Read also Advanced Linux Programming and syscalls(2). See also http://linuxfromscratch.org/
In 2020, a recent GCC compiler happens to handle specially calls to printf when asked to optimize. See the softwareheritage and Frama-C projects.
In some cases, consider accepting plugins in your program with dlopen(3) and dlsym(3) (see also elf(5) and How to Write Shared Libraries). You might even generate some code at runtime with libraries like libgccjit (or generate C code at runtime, then compile it as a plugin, and load it; such an approach is called metaprogramming and is related to partial evaluation; see also the blog of the late J.Pitrat for more insights).
Of course, you need tools to navigate in source code. Consider using GNU emacs combined with GNU grep for that, or some other source navigator. For large programs of millions of source code lines, consider writing your own GCC plugin to understand them.
Use also tools like strace(1) and GDB to understand the dynamic behavior of programs.
Expect several months of full time work to explore all this.
You could also be interested in ACM conference papers.
For your own source code, consider using some version control tool such as git. Of course read its documentation. And use LibreOffice, Lout or LaTeX, Markdown (perhaps combined with inkscape or diagrams for figures) to write the documentation of your software.
In some cases, you might consider generating parts of the documentation from parts of your source code (e.g. using literate programming techniques like nuweb or documentation generators like doxygen).

How can I compile ANSI C99-based MEX code delivered with Linux makefiles under Win64 MATLAB?

It seems I've got a real problem here due to my lack of any knowledge about Linux systems:
I have downloaded some open source code, which
is written in C
uses complex.h, so I assume it is ANSI C99
comes with makefiles designed for compilation under Linux systems
provides interfaces to IDL, MATLAB, Python etc.
I am familiar with compiling C/MEX files under Windows-based MATLAB environments, but in this case I don't even know where to start. The project is distributed across several folders and consists of dozens of source and header files. And, to begin with, the Visual Studio 2010 compiler I've used to compile MEX files until now does not comply with the C99 standard, i.e. it does not recognize the complex.h header.
Any help towards getting this project compiled would be highly appreciated. In particular, I have the following questions:
1) Is there any way to automatically extract the compilation information from the MEX files and carry it over to the Windows environment?
2) Is there a free compiler that can compile C99 code and is also easy to integrate with MATLAB?
I have done this (moved in-house legacy code inc. mex files to Win64). I can't recommend the experience.
You will have to recompile, no way around it.
Supported compilers for mex depend on your MATLAB version
This File Exchange entry for using Pelles C may be a starting point (if it works with your version of MATLAB).
I am guessing that there is a main makefile which then works through the makefiles in the subdirectories - have a read through the instructions for compiling under Linux, it will give you some idea of what's going on and may also discuss what to do if you want to change compiler. Once you've found a compatible compiler, the next stage is to understand what the makefiles are doing and edit them accordingly (change paths, compiler, compiler flags, etc.)
Then, from memory (it was a while ago), you get to enjoy a magical mystery tour through increasingly obscure compiler errors. Document everything because if you do get it working, you won't be in a mood to do this twice.
MATLAB R2016b on Windows now supports the MinGW compiler. I'm successfully using this to compile code written primarily for Linux/gcc. I installed this from the Add-On menu in MATLAB (search MinGW).
For my case, I'm building with the legacy code tool. The only thing I needed to do differently than normal was to tell the compiler to support c99 via a compiler flag. This does the trick:
legacy_code('compile', def, {'CFLAGS=-std=c99'})
I had trouble getting the flag command just right (I had some extra quotes that apparently broke things), and asked The MathWorks, so credit is due to their support team for this.
If you are using mex, I would expect to do something very similar.
I would guess that the makefiles are irrelevant for your application; you will need to tell the mex or legacy_code function about all of the files necessary to build the whole application or link against pre-built libraries (which it sounds like you don't have).
I hope this helps!

Should static analysis tool compile code

Is there a static analysis tool (for C/C++) that can analyze code without being able to compile it?
(The reason I ask is my code may have some functions from external SDK)
Most static analysis tools (e.g. Frama-C) don't compile C code, but they often require its preprocessed form, so they need the header files used by your code to be available. Often, they fork the compiler just to get the preprocessed form (i.e. gcc -C -E).
Notice that these tools usually don't need or care about the binary form of the libraries you are using, only their header files.
However, I believe that extending a compiler to add much more static analysis abilities is a plus, since the analyzer can take advantage of all the work done (and the infrastructure provided) by the compiler. This is the main motivation for my (free software, obsolete in 2019) GCC MELT tool (which you can use to extend GCC to do some particular static analysis).
A few static analyzers (e.g. Coccinelle) are able to handle unpreprocessed C code (using macros). But then, they need some way to understand the macros which your code is using (otherwise they cannot check much: a macro invocation can expand to many thousands of statements!).
N.B. all the analyzers mentioned above are free software.
I have been using this for many years: FlexeLint

C code preprocessing in Perl

I am working on a C code parser in Perl.
At the moment I need to pre-process the code.
Implementing the pre-processing myself seems to be a lot of work, so I am looking for a script or library that will let me pre-process the file.
I found the following possibilities:
Text::CPP
Filter::CPP
Both of these require cpp which I don't have on my Windows machine. Are there any other options?
I'm not sure I understand your needs, but you are right that implementing this yourself is probably a poor choice. I was recently looking for alternative C preprocessors as well.
The Text::CPP module should only require a compiler to compile itself. If you can find a precompiled version, it should work for you.
The JCPP Java C Preprocessor by the same author could probably be made to work. You'd likely have to process externally and then load the result.
Filepp is an older Perl program that claims CPP compatibility. There is a precompiled Windows binary to download.
There is a brand new Lua C-Preprocessor LCPP that might be something you could work with. Probably best as a standalone, but you might be able to use Inline::Lua.
SWIG comes with its own preprocessor implementation. I presume this would be available for Windows.
What else? The Boost Wave Preprocessor might work well and is available for Windows.
The MSVC Compiler can preprocess to a file.
Still, the easiest and best long term solution may be to just install CPP. It comes as part of GCC, which you can get from Cygwin or MinGW.

Why in Linux compiler we have to give additional arguments while compiling and running C programs?

I implemented semaphores on Linux last year, and for that I had to use -lpthread.
Now, while using the log10() function in C, I searched the Internet and saw that I have to use -lm.
I want to know why these kinds of command-line arguments are necessary on Linux. Is this rule compiler-specific?
(With the Turbo C compiler on Windows, I never used these kinds of arguments.)
You are instructing the compiler driver to look for certain libraries and use them when linking the final executable.
When you were doing your threading code, you used threading primitives. These primitives are implemented in a library called pthread, and -lpthread tells the linker to use that library. Without this switch, the linker cannot produce a valid executable, because the implementation of the threading code is missing.
On the file system, the libraries can be found in /usr/lib and /lib (among others). When you look in these directories, you will see files that start with the lib prefix, for example libpthreadxxxxxx. You will have to do your own research to figure out what the xxxx means.
The development cycle using Unix-style tools is very granular on the surface. When you use heavyweight IDEs (read: Visual Studio for C++), the IDE implicitly links against loads of standard libraries, so often you do not need to supply the names of the libraries you commonly use. However, when you start doing more advanced programming, you will probably have to configure your IDE to use external code libraries. If you were to use threading primitives in Visual Studio, you most likely would not have to tell the compiler where to look for them: Microsoft considers this a common library, and every new project implicitly links against it.
A little discussion on GCC
GCC is a very versatile compiler that produces code for many different usage scenarios. As such, its developers try to be neutral and do not make assumptions. For example, pthread is one particular implementation of threading primitives. Even though it is now the de facto standard, at least on Linux, it is not the only one; other Unix implementations have had different ones. When such choices exist, it is not fair for the compiler developers to implicitly link against libraries. They do, however, implicitly link against standard libraries: for example, G++ is just a wrapper command around the internal compiler code; it is a C++ front end, so it implicitly links against an implementation of the C++ standard library. Similarly, the C front end links against the standard C library.
People often do not want to use a certain standard library implementation and might want to use another one instead; in such cases you have to explicitly tell the compiler to use the implementation you provide. Such use cases are very granular and are surface-level issues with G++. In Visual Studio, you would generally have to tinker a lot to make such changes, since it is not an anticipated use case.
Wikipedia will provide you with more information.
The -l option tells gcc which libraries must be used for linking. -lpthread stands for "use the pthread library", and -lm stands for "use the m library", which is the math library. These options relate to gcc, not Linux.
This is because, by default, gcc only links the C library (libc), which contains the well-known functions printf, scanf, and many more.
log10 lives in a different library called libm, so you need to explicitly tell gcc to link that library with -lm. The same logic applies to -lpthread.
This is purely a backwards, harmful practice. Separating parts of the standard library into separate .so files does nothing but increase load time and memory usage. Good luck getting anyone to change it though... Just accept that you have to do it (and that POSIX specifically allows, but does not require, that an implementation require -lm for using the math functions and -lpthread for using threads, etc.) and move on to more important things.
Or, go pester Drepper about it on the glibc bug tracker/mailing list. He won't change his mind, but if you enjoy flamewars you can get some kicks...

Resources