Value optimized out in GDB: Can gdb handle decoding it automatically? - c

1) First, I want to know: how do I decode such variables?
I know the usual solutions to this problem: remove the optimization flag, make the variable volatile, and so on. I don't want to do any of that. Is there any solution that doesn't require compiling the source again? The problem is that whenever I make any change it takes ages to compile, so I don't want to rebuild with different optimization flags. I did try changing the optimization flag once, and the program crashed just because of the change in compilation flags, for reasons I can't fathom.
Also, I am not able to find documentation on interpreting the various registers shown by "info reg". I was expecting to see some variable whose value I knew, but "info reg" shows me all different values. I am missing something here. The architecture I am working on is x86_64.
2) I want to know: what restrictions prevent gdb from decoding such register variables? Or has this problem already been tackled by someone? I have read in many places that by going through the assembly code you can find out which variable is in which register. If that's true, why can't it be built into gdb? Please point me to relevant pages if there are solutions to this problem.

If you don't have the source, or can't recompile it with debug info and no optimizations (i.e. 3rd-party code), the best you can do is disassemble the code and try to determine how the variables are stored.
In gdb the disassemble command will dump the assembly for a given function:
disassemble <function name>
Or if symbols have been stripped
disassemble <address>
where <address> is the entry point to the function.
You may also have to inspect where the function is called to determine the calling conventions used.
Once you've figured out the structure of the functions and the variable layout (stack variables or registers), you can step through each instruction with nexti and stepi while debugging, and watch how the variables' values change by dumping the contents of the registers or memory locations.
I don't know of any good primers or tutorials myself, but this question and its answers may be of use to you. Personally, I find myself referencing the Intel manuals the most; they can be downloaded as PDFs from Intel's website. I don't have a link handy at the moment; if someone else does, perhaps they can update my answer.
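Putting those pieces together, a session might look like this (the function name, addresses, and register values below are invented for illustration):

```
(gdb) disassemble compute_sum
Dump of assembler code for function compute_sum:
   0x0000000000401130 <+0>:  mov    %edi,%eax
   0x0000000000401132 <+2>:  add    %esi,%eax
   0x0000000000401134 <+4>:  ret
(gdb) break *0x401132
(gdb) run
(gdb) info registers rdi rsi rax
rdi            0x7      7
rsi            0x3      3
rax            0x7      7
(gdb) stepi
(gdb) info registers rax
rax            0xa      10
```

Here the disassembly reveals that the two arguments live in rdi and rsi (per the x86_64 calling convention) and the return value is built up in rax, even though no debug info names them.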

Have you looked at compiling your code unoptimized?
Try one of these in your gcc options:
-Og
Optimize debugging experience. -Og enables optimizations that do not interfere with debugging. It should be the optimization level of choice for the standard edit-compile-debug cycle, offering a reasonable level of optimization while maintaining fast compilation and a good debugging experience.
-O0
Reduce compilation time and make debugging produce the expected results. This is the default.

Related

What files need to be modified to compile for a custom architecture of an existing cpu with gcc?

I've been looking at examples of C code compiled for some lesser-known processors (like the ZPU) using the gcc cross compiler.
Most of the working examples I see assume a certain architecture (memory map and set of peripherals) and simply give you a recipe to compile for it, and it works.
However, I can find very little information on what needs to be modified if you use the same CPU with a different memory map and set of peripherals.
From what I've read, there are two main files that I need to make sure are done "right": the linker script, and crt0.o (which, if I need to modify it, means recompiling crt0.S, which is assembler). On this last one especially, I find very little information on what it is actually supposed to do (other than setting up reset there is no clear explanation, and I'm talking conceptually, not about a specific processor, although something specific would also be useful).
Can anyone tell me what the relationship is between the C files of the program (bare-metal development), crt0.S (especially why it is needed), and a working linker script?
PS: Answers of the form "read this book" are welcome and I would love them.
PS: I realize this kind of question is usually vague and closed quickly, but I don't know where else to turn, so I ask for a bit of leniency.
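Conceptually, crt0.S only has to establish the C runtime environment before main can run: set the stack pointer, copy initialized data from ROM to RAM, zero .bss, and call main. The linker script is what pins those sections to your board's memory map and defines the symbols crt0.S relies on. As a hedged sketch (the addresses, sizes, and symbol names below are invented, not tied to any particular CPU):

```
/* hypothetical.ld -- a made-up memory map for illustration */
MEMORY
{
  rom (rx)  : ORIGIN = 0x00000000, LENGTH = 64K
  ram (rwx) : ORIGIN = 0x20000000, LENGTH = 16K
}
SECTIONS
{
  .text : { *(.text*) *(.rodata*) } > rom
  .data : { __data_start = .; *(.data*); __data_end = .; } > ram AT > rom
  .bss  : { __bss_start = .; *(.bss*); __bss_end = .; } > ram
  __data_load = LOADADDR(.data);
  __stack_top = ORIGIN(ram) + LENGTH(ram);
}
```

A crt0.S written against this script would load __stack_top into the stack pointer, copy __data_end - __data_start bytes from __data_load into RAM, clear __bss_start through __bss_end, and then jump to main. Changing the memory map then usually means changing only the MEMORY block here, not crt0.S itself.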

Speed up compiled programs using runtime information like for example JVM does it?

Java programs can outperform programs in compiled languages like C in specific tasks. This is because the JVM has runtime information and does JIT compilation when necessary (I guess).
(example: http://benchmarksgame.alioth.debian.org/u32/performance.php?test=chameneosredux)
Is there anything like this for a compiled language?
(I am interested in C first of all.)
After compiling the source, the developer runs it and tries to mimic a typical workload.
A tool gathers information about the run, and then, according to this data, the program is recompiled.
gcc has -fprofile-arcs
from the manpage:
-fprofile-arcs
Add code so that program flow arcs are instrumented. During execution the
program records how many times each branch and call is executed and how many
times it is taken or returns. When the compiled program exits it saves this
data to a file called auxname.gcda for each source file. The data may be
used for profile-directed optimizations (-fbranch-probabilities), or for
test coverage analysis (-ftest-coverage).
I don't think the JVM has ever really beaten well-optimized C code.
But to do something like that for C, you are looking for profile-guided optimization, where the compiler uses runtime information from a previous run to recompile the program.
Yes, there are some tools like this; the technique is known as "profile-guided optimization".
There are a number of such optimizations. The important ones reduce backing-store paging and improve the use of your code caches. Many modern processors have one code cache, maybe a second level of code cache or a second unified data-and-code cache, and maybe a third level of cache.
The simplest thing to do is to move all of your most-frequently used functions to one place in the executable file, say at the beginning. More sophisticated is for less-frequently-taken branches to be moved into some completely different part of the file.
Some instruction set architectures such as PowerPC have branch prediction bits in their machine code. Profiler-guided optimization tries to set these more advantageously.
Apple used to provide this for the Macintosh Programmer's Workshop, on Classic Mac OS, with a tool called "MrPlus". I think GCC can do it. I expect LLVM can, but I don't know how.

Examples showing how switching to a modern C compiler can help discover bugs?

I am preparing a note to convince people that switching from GCC2 to GCC4 (as a C compiler) is a good idea.
In particular, I think it can reveal existing bugs. I would like to give examples, but as a Java programmer my experience of such situations is limited. One example is return-type checking, I guess.
What are other convincing examples showing that switching to a modern compiler can help discover bugs that exist in C code?
Well, here are some gcc options which are very useful for discovering bugs:
-finstrument-functions - helps to build a function-call stack tracer, especially on architectures where the built-in __builtin_return_address() is limited to the current function. A stack tracer, together with the linker's symbol map file generated with the -Map linker option, is an indispensable tool for detecting memory leaks (suppose you develop an embedded system on which Valgrind can't be run, etc.).
-fstack-protector-all is very useful for detecting code that writes bytes outside a buffer's bounds, so this option detects buffer-overflow bugs.
Errr... just those two options come to mind. Possibly there are more which I don't know of...
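As a sketch of the kind of bug the second option catches (overflow.c and its contents are a made-up example; the out-of-bounds strcpy is deliberate):

```shell
cat > overflow.c <<'EOF'
#include <string.h>
/* deliberately writes far past the end of an 8-byte buffer */
static void smash(const char *s) { char buf[8]; strcpy(buf, s); }
int main(void) { smash("this string is much longer than eight bytes"); return 0; }
EOF
gcc -fstack-protector-all overflow.c -o overflow
./overflow   # the canary check aborts the program instead of letting it return normally
```

Without the flag, the same program may silently corrupt its stack and misbehave much later, far from the actual bug.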
I assume these people have a particular piece of code they're using gcc2 with. The best thing to do might be to just take that code and compile it in gcc4 with all possible warnings turned on and compare the difference.
Some other differences between gcc2 and gcc4 are likely to be:
Better compile times (gcc4 is probably faster)
Much better code run times (gcc4 is better at optimizing, and has knowledge of CPU architectures that did not exist when gcc2 came out).
Better warning/error messages
I'm sure there are some interesting new GNU C extensions in gcc4

MIPS, ELF and partial linking

I have a big software project with a complicated build process, which works like this:
Compile individual source files.
Partially link object files for each module together into another .o using ld -r.
Hide private symbols in each module using objcopy -G.
Partially link module objects together, again using ld -r.
Link modules together into a shared object.
Step 3 is required to allow module-private global variables that aren't exported to the rest of the project.
This all works fine with ARM and IA32. Unfortunately, now I have to make things work on mips (specifically, mipsel-linux-gnu for Android). And the MIPS shared object ABI is significantly more complex than on the other platforms and it's not working.
What's happening is that step 5 is failing with this error:
CALL16 reloc at 0x1234 not against global symbol
This seems to be because the compiler generates CALL16 relocations to call functions in another compilation unit, but CALL16 only allows you to call global symbols --- and because of step 3, some of the symbols that we're trying to call aren't global any more.
At this point I can see several possible options:
persuade the linker to resolve the CALL16 relocations to normal intra-compilation-unit PC-relative calls at step 2.
ditto, but at step 4 or 5.
tell the compiler not to generate CALL16 relocations for inter-compilation-unit function calls.
other.
Disabling step 3 is, I'm afraid, not an option due to external requirements.
What I'd really, really like to do is to generate absolute code which gets patched at load time to the right addresses; it's smaller, much faster, and vastly simpler, and we don't need to share the library between processes. Unfortunately it appears that Android's dlopen() doesn't seem to support this.
Currently I'm out of my depth. Anyone have any suggestions?
This is gcc 4.4.5 (from Emdebian), binutils 2.20.1. Target BFD is elf32-tradlittlemips. Host OS is Linux, and I'm cross-compiling for Android.
Addendum
I am also getting warnings like this from step 4.
$MODULE.o: Can't find matching LO16 reloc against `$SYMBOLNAME' for R_MIPS_GOT16 at 0x18 in section `.text.$SYMBOLNAME'
Looking at the disassembly of the input to step 4, I can see that the compiler's generated code like this:
50: 8f9e0000 lw s8,0(gp)
50: R_MIPS_GOT16 $SYMBOLNAME
54: 8fd9001c lw t9,28(s8)
58: 0320f809 jalr t9
5c: 00a02021 move a0,a1
Doesn't GOT16 fix up to the high half of an address, and should be followed with a LO16 for the low half? But the code looks like it's trying to do a GOT indirection. This puzzles me. I've no idea if this is related to my earlier problem, or is a different problem, or is not a problem at all...
Update
Apparently MIPS simply does not support hidden global symbols!
We've gotten around it by mangling the names of the symbols that are supposed to be hidden so that nobody can tell what they are. This is pushing the external requirements quite a lot, but I sold management on it by pointing out that it was the only way to get a shippable product.
That's totally gruesome (and involves some deeply disgusting makefile work to do), so I'd rather like a better solution, if anyone has one...
I'm not sure about the specific GOT issues you are having. There are a lot of bugs and issues with the GOT and LO16/HI16 handling in binutils. I think most have been fixed in the version you're using, unless you are targeting MIPS16 (which you don't seem to be doing). LO16 is really only necessary there; beyond MIPS16 you're pulling the full 26-bit offset out of the GOT, since you have 32-bit registers. LO16 isn't needed, but is still formally required by some ABIs/APIs; it was fudged to be at most a warning (you may try removing -Werror at that phase if you are using it). Honestly, I only understand the very basics of that part; on the rest of your situation I have some recommendations, if not an answer (it's hard to be sure given the complexity of your setup).
In MIPS (and most assemblies I'm familiar with) you have the basic three levels of visibility: local, global, and weak. In addition you have comm for shared objects. GNU, of course, likes to make things more complicated and adds more: gas provides protected, hidden, and internal (minimally; it is hard to keep up with all the extensions). With all of this, the manual fiddling with visibility in your steps seems unnecessary.
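For reference, gas's hidden visibility can also be requested at the C source level via GCC's visibility attribute (a GNU extension; the function below is a made-up example), as an alternative to post-processing with objcopy -G (subject, of course, to the same MIPS ABI limitations discussed here):

```c
/* module.c: the symbol is emitted with STV_HIDDEN visibility, so it
   is callable from other objects linked into the same shared object,
   but it is not exported from the final library. */
__attribute__((visibility("hidden")))
int module_private_helper(int x)
{
    /* trivial body, just for illustration */
    return x * 2;
}
```

This moves the "which symbols are module-private" decision into the source, so no separate symbol-hiding pass has to rewrite the object files.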
If you can remove the intermediate global-ness of the variables, it should remove the need to make them un-global, which can only simplify any GOT issues you run into later.
The overall problem is a bit confusing. I'm not sure what you mean by hidden global symbols; it's a bit of a contradiction (of course, portability and specific projects bring crazy problems and restrictions). You seem to want cross-assembly-unit symbols at one stage, but not at a later stage. Without using GNU extensions (something best avoided in my book), you may want to replace the globals in steps 1-2 with comm and/or weak globals. You could always use preprocessor trickery to avoid having multiple sub-units at that stage (ugly, but that's portable code at this level).
You really have a setup of 1) sub-modules, 2) sub-modules -> modules, 3-5) modules -> shared library. Simplifying that can't hurt. You could always interpose at 2) or 3-5) a C-level interface just to find out what assembly GCC will produce for your architectures, and use that as a basis for breaking visibility up into clean interfaces.
I wish I could give you a tailor-made solution, but that's pretty much impossible without your full project to work from. I can reassure you that while the MIPS situation (especially the toolchains) has issues, the visibility options (especially if you are using gas, libbfd, and gcc) are the same.
Your binutils is too old. Some changesets in 2.23 may resolve your problem, like "hide symbols without PLT nor GOT references".

Optimized code on Unix?

What is the best and easiest method to debug optimized code on Unix which is written in C?
Sometimes we also don't have the code for building an unoptimized library.
This is a very good question. I had similar difficulties in the past when I had to integrate 3rd-party tools into my application. From my experience, you need to have at least meaningful call stacks in the associated symbol files. This is merely a list of addresses and associated function names. These are usually stripped away, and from the binary alone you won't get them... If you have these symbol files you can load them when starting gdb, or afterwards by adding them. If not, you are stuck at the assembly level...
One weird behavior: even if you have the source code, execution will jump back and forth at places you would not expect (statements may be reordered for better performance), variables may not exist anymore (optimized away!), and setting breakpoints in inlined functions is pointless (they are not there, but are part of the place where they are inlined). So even with source code, watch out for these pitfalls.
I forgot to mention, the symbol files usually have the extension .gdb, but it can be different...
This question is not unlike "what is the best way to fix a passenger car?"
The best way to debug optimized code on UNIX depends on exactly which UNIX you have, what tools you have available, and what kind of problem you are trying to debug.
Debugging a crash in malloc is very different from debugging an unresolved symbol at runtime.
For general debugging techniques, I recommend this book.
Several things will make it easier to debug at the "assembly level":
You should know the calling convention for your platform, so you can tell what values are being passed in and returned, where to find the this pointer, which registers are "caller saved" and which are "callee saved", etc.
You should know your OS "calling convention" -- what a system call looks like, which register a syscall number goes into, the first parameter, etc.
You should "master" the debugger: know how to find threads, how to stop individual threads, how to set a conditional breakpoint on an individual instruction, single-step, step into or skip over function calls, etc.
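For example, a conditional breakpoint on an individual instruction might look like this in gdb (the address, thread number, and register are invented for illustration):

```
(gdb) info threads
(gdb) thread 2
(gdb) break *0x4005d0 if $rdi == 0
Breakpoint 1 at 0x4005d0
(gdb) continue
(gdb) stepi
```

The `break *ADDRESS if CONDITION` form stops only when the register-level condition holds, which is often the only way to filter events when no variable names survive optimization.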
It often helps to debug a working program and a broken program "in parallel". If version 1.1 works and version 1.2 doesn't, where do they diverge with respect to a particular API? Start both programs under debugger, set breakpoints on the same set of functions, run both programs and observe differences in which breakpoints are hit, and what parameters are passed.
Write small code samples against the same interfaces (whatever is in the relevant header), and call your samples instead of the optimized code, as a simulation, to narrow down the scope of the code you are debugging. Furthermore, you are able to do error injection in your samples.
