Is it possible to "jump"/"skip" in GDB debugger? - c

Is it possible to jump to some location/address in the code/executable while debugging in GDB ?
Let say I have something similar to the following
int main()
{
caller_f1() {
f1(); // breakpoint
f2() } // want to skip f2() and jump
caller_f2() { // jump to this this location ??
f1();
f2(); }
}

To resume execution at a new address, use jump (short form: j):
jump LINENUM
jump *ADDRESS
The GDB manual suggests using tbreak (temporary breakpoint) before jumping.
The linenum can be any linespec expression, like +1 for the next line.
See #gospes's answer on a related question for a handy skip macro that does exactly that.
Using jump is only "safe" in un-optimized code (-O0), and even then only within the current function. It only modifies the program counter; it doesn't change any other registers or memory.
Only gcc -O0 compiles each source statement (or line?) into an independent block of instructions that loads variable values from memory and stores results. This lets you modify variable values with a debugger at any breakpoint, and makes jumping between lines in the machine code work like jumping between lines in the C source.
This is part of why -O0 makes such slow code: not only does the compiler not spend time optimizing, it is required to make slow code that spills/reloads everything after every statement to support asynchronous modification of variables and even program-counter. (Store/reload latency is about 5 cycles on a typical x86, so a 1 cycle add takes 6 cycles in -O0 builds).
gcc's manual suggests using -Og for the usual edit-compile-debug cycle, but even that light level of optimization will break jump and async modification of variables. If you don't want to do that while debugging, it's a good choice, especially for projects where -O0 runs so slowly that it's a problem.
To set program-counter / instruction-pointer to a new address without resuming, you can also use this:
set $pc = 0x4005a5
Copy/paste addresses from the disassembly window (layout asm / layout reg).
This is equivalent to tbreak + jump, but you can't use line numbers, only instruction addresses. (And you don't get a warning + confirmation-request for jumping outside the current function).
Then you can stepi from there. $pc is a generic gdb name for whatever the register is really called in the target architecture. e.g. RIP in x86-64. (See also the bottom of the x86 tag wiki for asm debugging tips for gdb.)

There seems to be a jump command which is exactly what you are looking for:
http://idlebox.net/2010/apidocs/gdb-7.0.zip/gdb_18.html#SEC163
Updated link:
http://web.archive.org/web/20140101193811/http://idlebox.net/2010/apidocs/gdb-7.0.zip/gdb_18.html#SEC163

Related

How to count how many times a global variable was used (read and write)?

I'm trying to optimize a C code project.
I would like to count how many times a global variable was used (read or write) in order to place it at the most suitable memory type.
For example, to store commonly used variables at the fast access memory type.
Data cache is disabled for determenistic reasons.
Is there a way to count how many times a variable was used without inserting counters or adding extra code? for example, using the assembly code?
The code is written in C.
In my possession:
A) (.map) file, generated by the GCC compiler from which I extracts the global variables names, addresses and sizes.
B) The assembly code of the project generated using the GCC compiler -S flag.
Thanks a lot,
GDB has something called watchpoints: https://sourceware.org/gdb/onlinedocs/gdb/Set-Watchpoints.html
Set a watchpoint for an expression. GDB will break when the expression
expr is written into by the program and its value changes. The
simplest (and the most popular) use of this command is to watch the
value of a single variable:
(gdb) watch foo
awatch [-l|-location] expr [thread thread-id] [mask maskvalue]
Set a watchpoint that will break when expr is either read from or written
into by the program.
Commands can be attached to watchpoints: https://sourceware.org/gdb/onlinedocs/gdb/Break-Commands.html#Break-Commands
You can give any breakpoint (or watchpoint or catchpoint) a series of commands to execute when your program stops due to that breakpoint… For example, here is how you could use breakpoint commands to print
the value of x at entry to foo whenever x is positive.
break foo if x>0
commands
silent
printf "x is %d\n",x
cont
end
The command should typically increment a variable or print "read/write" to a file, but you can really add other stuff too such as a backtrace. Unsure about the best way for outward communication using gdb. Maybe it is good enough for you to run it in interactive mode.
You can do this, using Visual Studio (or an other IDE): search for all places where your variable is used in the source code, put a conditional breakpoint, logging some information, attach to process, and launch the features which use that variable. You can count the instances in the output window.
I think, what you need is automatic instrumentation and/or profiling. GCC can actually do profile-guided optimization for you. As well as other types of instrumentation, the documentation even mentions a hook for implementing your own custom instrumentation.
There are several performance analysis tools out there such as perf and gprof profilers.
Also, execution inside a virtual machine could (at least in theory) do what you are after. valgrind comes to mind. I think, valgrind actually knows about all memory accesses. I'd look for ways to obtain this informaiton (and then corellate that with the map files).
I don't know if any of the above tools solves exactly your problem, but you definitely could use, say, perf (if it's available for your platform) to see in what areas of code significant time is spent. Then probably there are either a lot of expensive memory accesses, or just intensive computations, you can figure out which is the case by staring at the code.
Note that the compiler already allocates frequently accessed variables to registers, so the kind of information you are after won't give you an accurate picture. I.e. while some variable might be accessed a lot, cache-allocating it might not improve things much if its value already lives on a register most of the time.
Also consider that optimization affects your program greatly on the assembly level. So any performance statistics such as memory accesses counters would be different with and without optimization. And what should be of interest to you is the optimized case. On the other hand, restoring the information about what location corresponds to which variable is harder with the optimized program if feasible at all.

How to be certain that functions in my cpu emulator are not running in parallel?

I've been writing an arcade game emulator in C using SDL and have run into a weird problem with emulating systems with multiple cpus. I've got something like this:
int done = 0;
int outputDebug = 0;
FILE *fp = fopen("debug_output.txt", "w");
while (!done)
{
// Run one instruction on cpu #1
emulate(cpu1);
if (outputDebug)
fprintf(fp, "%d %d \n", cpu1->variable1, cpu1->variable2);
// Run one instruction on cpu #2
emulate(cpu2);
// pressing a key in here toggles outputDebug
checkInput();
}
Now here's what's weird...if I comment out the if statement which should have absolutely no bearing on the emulate functions, some bugs go away, and other bugs surface. If I add in some other inconsequential code, like a call to SDL_GetTicks() for example, I'll get yet a different set of bugs. I've reproduced this several times. It is always exactly the same bugs for each variation.
The conclusion I've come to is that the two calls to the emulate function must be getting run in parallel and having other code in there causes them to get out of sync to a greater or lesser degree which breaks the emulation in different ways. I'm using the GNU C compiler and not passing any optimization arguments...I tried passing -O0 even though that's the default, and everything behaves the same way.
I'm on Windows 7, using the GNU compiler as mentioned, and SDL is the only external library I'm using.
What's going on here and how do I fix it?
I'm guessing that somewhere in your emulate functions, you're referencing uninitialised memory - eg :
void somefunction(char *somearg)
{
static char *myvar;
if (somecondition) {
myvar = somearg;
}
// Now do something with myvar
If somecondition is true first time through, then myvar would be set to an address in the process' memory space.
If somecondition is subsequently false, then myvar will be left still pointing to that same address - which is still in the processes' memory space (so wouldn't necessarily cause a failure) but not what you're expecting (hence the 'bug's).
The left-over contents of that particular address in memory could well have changed depending on the calling stack - eg. it might well be pointing to an address that happened to have been used by that "fprintf" (hence one "set of bugs"). If outputDebug is false, then that memory address might last have been used by some other function (hence a different "set of bugs").
Of course, this is all speculation (since the code you've presented looks fine), but is just an example of the sort of things that could cause the symptoms you've described, and what to possibly look for.

How does including assembly inline with C code work?

I've seen code for Arduino and other hardware that have assembly inline with C, something along the lines of:
asm("movl %ecx %eax"); /* moves the contents of ecx to eax */
__asm__("movb %bh (%eax)"); /*moves the byte from bh to the memory pointed by eax */
How does this actually Work? I realize every compiler is different, but what are the common reasons this is done, and how could someone take advantage of this?
The inline assembler code goes right into the complete assembled code untouched and in one piece. You do this when you really need absolutely full control over your instruction sequence, or maybe when you can't afford to let an optimizer have its way with your code. Maybe you need every clock tick. Maybe you need every single branch of your code to take the exact same number of clock ticks, and you pad with NOPs to make this happen.
In any case, lots of reasons why someone may want to do this, but you really need to know what you're doing. These chunks of code will be pretty opaque to your compiler, and its likely you won't get any warnings if you're doing something bad.
Usually the compiler will just insert the assembler instructions right into its generated assembler output. And it will do this with no regard for the consequences.
For example, in this code the optimiser is performing copy propagation, whereby it sees that y=x, then z=y. So it replaces z=y with z=x, hoping that this will allow it to perform further optimisations. Howver, it doesn't spot that I've messed with the value of x in the mean time.
char x=6;
char y,z;
y=x; // y becomes 6
_asm
rrncf x, 1 // x becomes 3. Optimiser doesn't see this happen!
_endasm
z=y; // z should become 6, but actually gets
// the value of x, which is 3
To get around this, you can essentially tell the optimiser not to perform this optimisation for this variable.
volatile char x=6; // Tell the compiler that this variable could change
// all by itself, and any time, and therefore don't
// optimise with it.
char y,z;
y=x; // y becomes 6
_asm
rrncf x, 1 // x becomes 3. Optimiser doesn't see this happen!
_endasm
z=y; // z correctly gets the value of y, which is 6
Historically, C compilers generated assembly code, which would then be translated to machine code by an assembler. Inline assembly arises as a simple feature — in the intermediate assembly code, at that point, inject some user-picked code. Some compilers directly generate machine code, in which case they contain an assembler or call an external assembler to generate the machine code for the inline assembly snippets.
The most common use for assembly code is to use specialized processor instructions that the compiler isn't able to generate. For example, disabling interrupts for a critical section, controlling processor features (cache, MMU, MPU, power management, querying CPU capabilities, …), accessing coprocessors and hardware peripherals (e.g. inb/outb instructions on x86), etc. You'll rarely find asm("movl %ecx %eax"), because that affects general-purpose registers that the C code around it is also using, but something like asm("mcr p15, 0, 0, c7, c10, 5") has its use (data memory barrier on ARM). The OSDev wiki has several examples with code snippets.
Assembly code is also useful to implement features that break C's flow control model. A common example is context switching between threads (whether cooperative or preemptive, whether in the same address space or not) requiring assembly code to save and restore register values.
Assembly code is also useful to hand-optimize small bits of code for memory or speed. As compilers are getting smarter, this is rarely relevant at the application level nowadays, but it's still relevant in much of the embedded world.
There are two ways to combine assembly with C: with inline assembly, or by linking assembly modules with C modules. Linking is arguably cleaner but not always applicable: sometimes you need that one instruction in the middle of a function (e.g. for register saving on a context switch, a function call would clobber the registers), or you don't want to pay the cost of a function call.
Most C compilers support inline assembly, but the syntax varies. It is typically introduced by the keyword asm, _asm, __asm or __asm__. In addition to the assembly code itself, the inline assembly construct may contain additional code that allows you to pass values between assembly and C (for example, requesting that the value of a local variable is copied to a register on entry), or to declare that the assembly code clobbers or preserves certain registers.
asm("") and __asm__ are both valid usage. Basically, you can use __asm__ if the keyword asm conflicts with something in your program. If you have more than one instructions, you can write one per line in double quotes, and also suffix a ’\n’ and ’\t’ to the instruction. This is because gcc sends each instruction as a string to as(GAS) and by using the newline/tab you can send correctly formatted lines to the assembler. The code snippet in your question is basic inline.
In basic inline assembly, there is only instructions. In extended assembly, you can also specify the operands. It allows you to specify the input registers, output registers and a list of clobbered registers. It is not mandatory to specify the registers to use, you can leave that to GCC and that probably fits into GCC’s optimization scheme better. An example for the extended asm is:
__asm__ ("movl %eax, %ebx\n\t"
"movl $56, %esi\n\t"
"movl %ecx, $label(%edx,%ebx,$4)\n\t"
"movb %ah, (%ebx)");
Notice that the '\n\t' at the end of each line except the last, and each line is enclosed in quotes. This is because gcc sends each as instruction to as as a string as I mentioned before. The newline/tab combination is required so that the lines are fed to as according to the correct format.

call stack unwinding in ARM cortex m3

I would like to create a debugging tool which will help me debug better my application.
I'm working bare-bones (without an OS). using IAR embedded workbench on Atmel's SAM3.
I have a Watchdog timer, which calls a specific IRQ in case of timeout (This will be replaced with a software reset on release).
In the IRQ handler, I want to print out (UART) the stack trace, of where exactly the Watchdog timeout occurred.
I looked in the web, and I didn't find any implementation of that functionality.
Anyone has an idea on how to approach this kind of thing ?
EDIT: OK, I managed to grab the return address from the stack, so I know exactly where the WDT timeout occurred.
Unwinding the whole stack is not simple as it first appears, because each function pushes different amount of local variables into the stack.
The code I end up with is this (for others, who may find it usefull)
void WDT_IrqHandler( void )
{
uint32_t * WDT_Address;
Wdt *pWdt = WDT ;
volatile uint32_t dummy ;
WDT_Address = (uint32_t *) __get_MSP() + 16 ;
LogFatal ("Watchdog Timer timeout,The Return Address is %#X", *WDT_Address);
/* Clear status bit to acknowledge interrupt */
dummy = pWdt->WDT_SR ;
}
ARM defines a pair of sections, .ARM.exidx and .ARM.extbl, that contain enough information to unwind the stack without debug symbols. These sections exist for exception handling but you can use them to perform a backtrace as well. Add -funwind-tables to force GCC to include these sections.
To do this with ARM, you will need to tell your compiler to generate stack frames. For instance with gcc, check the option -mapcs-frame. It may not be the one you need, but this will be a start.
If you do not have this, it will be nearly impossible to "unroll" the stack, because you will need for each function the exact stack usage depending on parameters and local variables.
If you are looking for some exemple code, you can check dump_stack() in Linux kernel sources, and find back the related piece of code executed for ARM.
It should be pretty straight forward to follow execution. Not programmatically in your isr...
We know from the ARM ARM that on a Cortex-M3 it pushes xPSR,
ReturnAddress, LR (R14), R12, R3, R2, R1, and R0 on the stack. mangles the lr so it can detect a return from interrupt then calls the entry point listed in the vector table. if you implement your isr in asm to control the stack, you can have a simple loop that disables the interrupt source (turns off the wdt, whatever, this is going to take some time) then goes into a loop to dump a portion of the stack.
From that dump you will see the lr/return address, the function/instruction that was interrupted, from a disassembly of your program you can then see what the compiler has placed on the stack for each function, subtract that off at each stage and go as far back as you like or as far back as you have printed the stack contents.
You could also make a copy of the stack in ram and dissect it later rather than doing such things in an isr (the copy still takes too much time but is less intrusive than waiting on the uart).
If all you are after is the address of the instruction that was interrupted, that is the most trivial task, just read that from the stack, it will be at a known place, and print it out.
Did I hear my name? :)
You will probably need a tiny bit of inline assembly. Just figure out the format of the stack frames, and which register holds the ordinary1 stack pointer, and transfer the relevant values into C variables from which you can format strings for output to the UART.
It shouldn't be too tricky, but of course (being rather low-level) you need to pay attention to the details.
1As in "non-exception"; not sure if the ARM has different stacks for ordinary code and exceptions, actually.
Your watchdog timer can fire at any point, even when the stack does not contain enough information to unwind (e.g. stack space has been allocated for register spill, but the registers not copied yet).
For properly optimized code, you need debug info, period. All you can do from a watchdog timer is a register and stack dump in a format that is machine readable enough to allow conversion into a core dump for gdb.

c statements to assembly

for a assigment i need to translate some C declarations to assembly using only AVR directives.
I'm wondering if anyone could give me some advice on learning how to do this.
For example:
translate 'char c;' and 'char* d;' to assembly statements
Please note, this is the first week im learning assembly,
Any help/advice would be appreciated
First, char c; and char* d; are declarations not statements.
What you can do is dump the assembly output of your C program with the avr-gcc option -S:
# Dump assembly to stdout
avr-gcc -mmcu=your_avr_mcu -S -c source.c -o -
Then you can reuse the relevant assembly output parts to create inline assembler statements.
Look here on how to write inline assember with avr-gcc:
http://www.nongnu.org/avr-libc/user-manual/inline_asm.html
Without a compiler that you can disassemble from (avr-gcc is easy to come by), it may be difficult to try to understand what happens when a high level language is compiled.
you are simply declaring that you want a variable or an address when you use declarations like that. that doesnt necessarily, immediately, place something in assembly. often dead code and other things are removed from code by the compiler. Other times it is not until the very end of the compile process that you know where your variable may end up. Sometimes a char only ever lives in a register, so for a short period of time in the program that variable has a home. Sometimes there is a longer life the variable has to have a home the whole time the program is running and there are not enough registers to keep it in one forever so it will get a memory location allocated to it.
Likewise a pointer is an address which also lives in registers or in memory locations.
If you dont have a compiler where you can experiment with adding C code and seeing what happens. And even if you do you need to get the instruction set documentation for the desired processor family.
http://www.atmel.com/Images/doc0856.pdf
Look at the add operation for example add rd,rr, and it shows you that d and r are both between 0 and 31 so you could have add r10,r23. And looking at the operation that means that r10 = r10 + r23. If you had char variables that you wanted to add together in C this is one of the instructions the compiler might use.
There are two lds instructions a single 16 bit word version and one that takes two 16 bit words. (usually the assembler chooses that for you). rd is between 0 and 31 and k is an address in the memory space. If you have a global variable declared it will likely be accessed using lds or sts. So K is a pointer, a fixed pointer. Your char * in C can turn into a fixed number at compile time depending on what you do with that pointer in your code. If it is a dynamic thing then look at the flavors of ld and st, that use register pairs. So you might see a char * pointer turn into a pair of registers or a pair of memory locations that hold the pointer itself then it might use the x, y, or z register pairs, see ld and st, and maybe an adiw to find an offset to that pointer before using ld or st.
I have a simulator http://github.com/dwelch67/avriss. but it needs work, not fully debugged (unless you want to learn the instruction set through examining a simulator and its potentinal bugs). simavr and some others are out there that you can use to watch your code execute. http://gitorious.org/simavr

Resources