Needs debugging symbols for __alignof__ - c

I'm debugging the glibc library. I've built it with -g3 -O flags. I can print most macros, but not this one. I'm debugging malloc(), and there are a lot of macros that use __alignof__. But I can't find its definition anywhere in glibc source code. Here is an example:
(gdb) p MALLOC_ALIGN_MASK
No symbol "__alignof__" in current context.
And also I got the same problem with __builtin_offsetof. But this one is an built in macro. So the 2 cases are a bit different. Solving this problem will speed up my debugging a bit.

You won't get any debugging information. Since __alignof__ is, like sizeof, only known at compile-time. See alignof from <stdalign.h>
Even by recompiling GCC itself, you won't get it (there is no debugging information available). __alignof__ is processed at compile time (so __alignof__ (double) is replaced by 8 during compilation, for x86-64 ABI).
You could guess by yourself the expanded value of MALLOC_ALIGN_MASK.
You could define a const int my_malloc_align_mask = MALLOC_ALIGN_MASK; and use p my_malloc_align_mask in the debugger.
I'm debugging the glibc library.
This is weird. You should trust the glibc library to behave as documented (yes, beware of undefined behavior).

GDB only has a very approximate implementation of C and C++. It does not use the same C and C++ parser as GCC, so some things are missing, including this GCC extension. GDB recognizes _Alignof, but it is not exactly the same as __alignof__. But it will work in this case, so you could change the glibc sources to use it.
LLDB uses the Clang parsers and therefore does not suffer from this particular problem, but will not help you here because apparently, the debugger does not recognize the DWARF data generated by the -g3 option, so the macro information is missing from the executable.

Related

Why would gcc change the order of functions in a binary?

Many questions about forcing the order of functions in a binary to match the order of the source file
For example, this post, that post and others
I can't understand why would gcc want to change their order in the first place?
What could be gained from that?
Moreover, why is toplevel-reorder default value is true?
GCC can change the order of functions, because the C standard (e.g. n1570 or newer) allows to do that.
There is no obligation for GCC to compile a C function into a single function in the sense of the ELF format. See elf(5) on Linux
In practice (with optimizations enabled: try compiling foo.c with gcc -Wall -fverbose-asm -O3 foo.c then look into the emitted foo.s assembler file), the GCC compiler is building intermediate representations like GIMPLE. A big lot of optimizations are transforming GIMPLE to better GIMPLE.
Once the GIMPLE representation is "good enough", the compiler is transforming it to RTL
On Linux systems, you could use dladdr(3) to find the nearest ELF function to a given address. You can also use backtrace(3) to inspect your call stack at runtime.
GCC can even remove functions entirely, in particular static functions whose calls would be inline expanded (even without any inline keyword).
I tend to believe that if you compile and link your entire program with gcc -O3 -flto -fwhole-program some non static but unused functions can be removed too....
And you can always write your own GCC plugin to change the order of functions.
If you want to guess how GCC works: download and study its source code (since it is free software) and compile it on your machine, invoke it with GCC developer options, ask questions on GCC mailing lists...
See also the bismon static source code analyzer (some work in progress which could interest you), and the DECODER project. You can contact me by email about both. You could also contribute to RefPerSys and use it to generate GCC plugins (in C++ form).
What could be gained from that?
Optimization. If the compiler thinks some code is like to be used a lot it may put that code in a different region than code which is not expected to execute often (or is an error path, where performance is not as important). And code which is likely to execute after or temporally near some other code should be placed nearby, so it is more likely to be in cache when needed.
__attribute__((hot)) and __attribute__((cold)) exist for some of the same reasons.
why is toplevel-reorder default value is true?
Because 99% of developers are not bothered by this default, and it makes programs faster. The 1% of developers who need to care about ordering use the attributes, profile-guided optimization or other features which are likely to conflict with no-toplevel-reorder anyway.

what gcc compiler options can I use for gfortran

I studied Option Summary for gfortran but found no compiler option to detect integer overflow. Then I found the GCC (GNU Compiler Collection) flag option -fsanitize=signed-integer-overflow here and used it when invoking gfortran. It works--integer overflow can be detected at run time!
So what does -fsanitize=signed-integer-overflow do here? Just adding to the machine code generated by gfortran some machine-level pieces that check integer overflow?
What is the relation between GCC (GNU Compiler Collection) flag options and gfortran compiler options ? What gcc compiler options can I use for gfortran, g++ etc ?
There is the GCC - GNU Compiler Collection. It shares the common backend and middleend and has frontends for different languages. For example frontends for C, C++ and Fortran which are usually invoked by commands gcc, g++ and gfortran.
It is actually more complicated, you can call gcc on a Fortran source and gfortran on a C source and it will work almost the same with the exceptions of libraries being linked (there are some other fine points). The appropriate frontend will be called based on the file extension or the language requested.
You can look almost all GCC (not just gcc) flags for all of the mentioned frontends. There are certain flags which are language specific. Normally you will get a warning like
gfortran -fcheck=all source.c
cc1: warning: command line option ‘-fcheck=all’ is valid for Fortran but not for C
but the file will compile fine, the option is just ignored and you will get a warning about that. Notice it is a C file and it is compiled by the gfortran command just fine.
The sanitization options are AFAIK not that language specific and work for multiple languages implemented in GCC, maybe with some exceptions for some obviously language specific checks. Especially -fsanitize=signed-integer-overflow which you ask about works perfectly fine for both C and C++. Signed integer overwlow is undefined behaviour in C and C++ and it is not allowed by the Fortran standard (which effectively means the same, Fortran just uses different words).
This isn't a terribly precise answer to your question, but an aha! moment, when learning about compilers, is learning that gcc (the GNU Compiler Collection), like llvm, is an example of a three-stage compiler.
The ‘front end’ parses the syntax of whichever language you're interested, and spits out an Abstract Syntax Tree (AST), which represents your program in a language-independent way.
Then the ‘middle end’ (terrible name, but ‘the clever bit’) reorganises that AST into another AST which is semantically equivalent but easier to turn into machine code.
Then the ‘back end’ turns that reorganised AST into assembler for one-or-other processor, possibly doing platform-specific micro-optimisations along the way.
That's why the (huge number of) gcc/llvm options are unexpectedly common to (apparently wildly) different languages. A few of the options are specific to C, or Fortran, or Objective-C, or whatever, but the majority of them (probably) are concerned with the middle and last bits, and so are common to all of the languages that gcc/llvm supports.
Thus the various options are specific to stage 1, 2 or 3, but may not be conveniently labelled as such; with this in mind, however, you might reasonably intuit what is and isn't relevant to the particular language you're interested in.
(It's for this sort of reason that I will dogmatically claim that CC++FortranJavaPerlPython is essentially a single language, with only trivial syntactical and library minutiae to distinguish between dialects).

What does gcc -D_REENTRANT really do?

I am writing Java bindings for a C library, and therefore working with JNI. Oracle specifies, reasonably, that native libraries for use with Java should be compiled with multithread-aware compilers.
The JNI docs give the specific example that for gcc, this multithread-awareness requirement should be met by defining one of the macros _REENTRANT or _POSIX_C_SOURCE. That seems odd to me. _REENTRANT and _POSIX_C_SOURCE are feature-test macros. GCC and POSIX documentation describe their effects in terms of defining symbols and making declarations visible, just as I would expect for any feature-test macro.
If I do not need the additional symbols or functions, then do these macros in fact do anything useful for me? Does one or both cause gcc to generate different code than it otherwise would? Do they maybe cause my code's calls to standard library functions to be linked to different implementations? Or is Oracle just talking out of its nether regions?
Edit:
Additionally, it occurs to me that reentrancy is a separate consideration from threading. Non-reentrancy can be an issue even for single-threaded programs, so Oracle's suggestion that defining _REENTRANT makes gcc multithread-aware now seems even more dubious.
The Oracle recommendation was written for Solaris, not for Linux.
On Solaris, if you compiled a .so without _REENTRANT and ended up loaded by a multi-threaded application then very bad things could happen (e.g. random data corruption of libc internals). This was because without the define you ended up with unlocked variants of some routines by default.
This was the case when I first read this documentation, which was maybe 15 years ago, the mention of the -mt flag for the sun studio compiler was added after I last read this document in any detail.
This is no longer the case - You always get the same routine now whether or not you compile with the _REENTRANT flag; it's now only a feature macro, and not a behaviour macro.

C/C++ Compiler listing what's defined

This question : Is there a way to tell whether code is now being compiled as part of a PCH? lead me to thinking about this.
Is there a way, in perhaps only certain compilers, of getting a C/C++ compiler to dump out the defines that it's currently using?
Edit: I know this is technically a pre-processor issue but let's add that within the term compiler.
Yes. In GCC
g++ -E -dM <file>
I would bet it is possible in nearly all compilers.
Boost Wave (a preprocessor library that happens to include a command line driver) includes a tracing capability to trace macro expansions. It's probably a bit more than you're asking for though -- it doesn't just display the final result, but essentially every step of expanding a macro (even a very complex one).
The clang preprocessor is somewhat similar. It's also basically a library that happens to include a command line driver. The preprocessor defines a macro_iterator type and macro_begin/macro_end of that type, that will let you walk the preprocessor symbol table and do pretty much whatever you want with it (including printing out the symbols, of course).

Performance difference between C program executables created by gcc and g++ compilers

Lets say I have written a program in C and compiled it with both gcc (as C) and g++ (as C++), which compiled executable will run faster: the one created by gcc or by g++? I think using the g++ compiler will make the executable slow, but I'm not sure about it.
Let me clarify my question again because of confusion about gcc:
Let's say I compile program a.c like this in the terminal:
gcc a.c
g++ a.c
Which a.out executable will run faster?
Firstly: the question (and some of the other answers) seem to be based on the faulty premise that C is a strict subset of C++, which is not in fact the case. Compiling C as C++ is not the same as compiling it as C: it can change the meaning of your program!
C will mostly compile as C++, and will mostly give the same results, but there are some things that are explicitly defined to give different behaviour.
Here's a simple example - if this is your a.c:
#include <stdio.h>
int main(void)
{
printf("%d\n", sizeof('x'));
return 0;
}
then compiling as C will give one result:
$ gcc a.c
$ ./a.out
4
and compiling as C++ will give a different result (unless you're using an unusual platform where int and char are the same size):
$ g++ a.c
$ ./a.out
1
because the C specification defines a character literal to have type int, and the C++ specification defines it to have type char.
Secondly: gcc and g++ are not "the same compiler". The same back end code is used, but the C and C++ front ends are different pieces of code (gcc/c-*.c and gcc/cp/*.c in the gcc source).
Even if you stick to the parts of the language that are defined to do the same thing, there is no guarantee that the C++ front end will parse the code in exactly the same way as the C front end (i.e. giving exactly the same input to the back end), and hence no guarantee that the generated code will be identical. So it is certainly possible that one might happen to generate faster code than the other in some cases - although I would imagine that you'd need complex code to have any chance of finding a difference, as most of the optimisation and code generation magic happens in the common back end of the compiler; and the difference could be either way round.
I think they they will both produce the same machine code, and therefore the same speed on your computer.
If you want to find out, you could compile the assembly for both and compare the two, but I'm betting that they create the same assembly, and therefore the same machine code.
Profile it and try it out. I'm certain it will depend on the actual code, even if it would require potentially a really weird case to get any different bytecode. Though if you don't have extern C {} around your C code, and or works fine in C, I'm not sure how "compiling it as though it were C++" could provide any speed, unless the particular compiler optimizations in g++ just happen to be a bit better for your particular situation...
The machine code generated should be identical. The g++ version of a.out will probably link in a couple of extra support libraries. This will make the startup time of a.out be slower by a few system calls.
There is not really any practical difference though. The Linux linker will not become noticeably slower until you reach 20-40 linked libraries and thousands of symbols to resolve.
The gcc and g++ executables are just frontends, they are not the actual compilers. They both run the actual C or C++ compilers (and ld, ar, whatever is needed to produce the output you asked for) based on the file extensions. So you'll get the exact same result. G++ is commonly used for C++ because it links with the standard C++ library (iostreams etc.).
If you want to compile C code as C++, either change the file extension, or do something like this:
gcc test.c -otest -x c++
http://gcc.gnu.org/onlinedocs/gcc-3.3.6/gcc/G_002b_002b-and-GCC.html
GCC is a compiler collection. It is mainly used for compilation of C,C++,Ada,Java and many more programming languages.
G++ is a part of gnu compiler collection(gcc).
I mean gcc includes g++ as well. When we use gcc for compilation of C++ it uses g++. The output files will be different because the G++ compiler uses its own run time library.
Edit: Okay, to clarify things, because we have a bit of confusion in naming here. GCC is the GNU Compiler Collection. It can compile Ada, C++, C, and a billion and a half other languages. It is a "backend" to the various languages "front end" compilers like GNAT. Go read the link i made at the top of the page from GCC.GNU.Org.
GCC can also refer to the GNU C Compiler. This will compile C++ code if given the -lstdc++ command, but normally will choke and die because it's not pulling in the C++ libraries.
G++, the GNU C++ Compiler, like the GNU C Compiler is a front end to the GNU Compiler Collection. It's difference between the C Compiler is that it automatically includes those libraries and makes a few other small tweaks, because it's assuming it's going to be fed C++ code to compile.
This is where the confusion comes from. Does this clarify things a bit?

Resources