I am using GDB to log the executed assembly instructions.
Here is the GDB script I made:
log.gdb
set confirm off
tbreak start_trigger
r
set logging overwrite on
set logging on
set height 0
set style enabled off
while (1)
x/i $pc
stepi
end
quit
And I ran gdb using
$ gdb results-mte/aha-compress.elf -x log.gdb -batch
This works well and writes gdb.txt, but it is really slow. Is there any way to make it faster?
Is there any way to make it faster?
Yes: don't do that.
Think about how single-stepping works. On a processor which supports single-step in hardware, GDB has to
enable single-stepping
resume inferior
wait for OS to deliver SIGCHLD
query inferior for current registers ($pc mostly) via ptrace
decode and print current instruction
... repeat for each instruction. This is expected to be about 1000-10000 times slower than native execution.
The usual solutions use some tracing mechanism; e.g. an intel_pt trace would make this only slightly slower than full native speed.
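As a rough sketch of what that looks like on x86 hardware (this assumes a CPU with Intel PT support and a GDB built against libipt, and it does not apply to the QEMU setup discussed below):

(gdb) record btrace pt             # start recording with Intel Processor Trace
(gdb) continue                     # runs at close to native speed
(gdb) record instruction-history   # dump the recorded instruction stream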
I'm running this GDB inside Fedora RISC-V on QEMU.
Now you are emulating GDB itself, which adds another factor of 10 or more of slowdown.
What you probably want to do is ask QEMU to record the instructions it executes.
Typing "qemu trace instructions" into Google produces this post (among others).
Related
I use __asm__ __volatile__ ("bkpt #0"); in my code. GDB stops with signal SIGTRAP. OK, but I want the code to run further.
In GDB I use 'continue' and 'skip', but I still stay on the same instruction.
How do I skip the program's bkpt in GDB?
You need to do a step in GDB.
If you use Eclipse, use F6 (step over).
This works with OpenOCD, the SEGGER GDB server, and the STM32 GDB server.
Step over works, but step in works as well.
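If a plain step does not move past the BKPT, one common workaround is to advance the program counter over it by hand. A minimal sketch, assuming the usual 16-bit Thumb encoding of BKPT (so the instruction is 2 bytes long):

(gdb) stepi              # step a single instruction over the BKPT
(gdb) set $pc = $pc + 2  # or: skip the 2-byte BKPT encoding manually
(gdb) continue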
I ported a gdbstub for an OS I'm working on which runs on x86_64. The host which is running gdb is connected to the target that has the stub and the OS over serial. I have an int3 instruction in the source code to force the OS to jump into the stub's code which it does. The problem is if I try to step to the next instruction using nexti the stub stops responding and the host keeps timing out.
Looking at the packets that the host is sending I see this:
Sending packet: $Me1dc20,1:cc#6c...Ack
Timed out.
Timed out.
Timed out.
Ignoring packet error, continuing...
which means that the host is telling the stub to write cc (which is the opcode for int3) to memory location 0xe1dc20. I looked into that memory location and found this:
(gdb) x/16i 0xe1dc20
0xe1dc20 <_Unwind_DebugHook>: retq
0xe1dc21: data16 nopw %cs:0x0(%rax,%rax,1)
0xe1dc2c: nopl 0x0(%rax)
This function is part of GCC's code, here: https://github.com/gcc-mirror/gcc/blob/master/libgcc/unwind-dw2.c, but it is not used anywhere in the source file that I am debugging.
Obviously it is causing me trouble, so I disabled the memory-writing functionality in my stub so that it no longer responds to the memory-write commands $M and $X. When I did, I was able to execute nexti and step in GDB without issues. The stub uses RFLAGS.TF for single-step control.
The question is: why is GDB trying to set a breakpoint in a function that I am not using anywhere, and how do I prevent it from doing so? I thought about adding an if statement in the stub to ignore writes to this memory location, but is there a less intrusive way of doing it?
The _Unwind_DebugHook symbol exists as a place for GDB (or any other debugger) to place a breakpoint and so catch exceptions. GDB will look for this symbol (in the debug info) and, if it exists, place a breakpoint there.
These breakpoints will always get inserted, even when doing something as simple as a stepi, just in case - you might be about to step to that address.
One problem I see with the remote trace is that GDB will be expecting an OK packet to indicate that the write succeeded; this is why you're seeing the timeout messages.
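Two things that may help, sketched on the assumption that your stub otherwise follows the standard remote protocol: on the host side you can list GDB's internal breakpoints (they show up with negative numbers, including the one at _Unwind_DebugHook) with

(gdb) maint info breakpoints

and, rather than staying silent, the stub can answer any $M/$X write it refuses with an error reply such as $E01#a6, so GDB abandons the write immediately instead of timing out.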
Let's say you have a pointer to a function whose source you do not have and which is "untrusted" because it might read/write to a disallowed memory region.
Before it executes each assembly instruction, you want to verify that it doesn't access disallowed memory regions.
The OS is (almost) bare-metal i.e. a custom RTOS (so no Linux or QNX).
This is functionality that needs to be enabled not only during development but also during normal runtime.
Ideally, it'd run something like this:
void (*fptr)(int);
fptr = &someFunction; // untrusted, don't have source
// enable interrupts for each assembly instruction
_EN_INT();
// call the function
fptr(42);
// every time the PC advances, some other code runs which verifies that any loads/stores executed do not access a disallowed memory range
// disable interrupts for each assembly instruction
_DIS_INT();
QUESTION
Is it possible to call that function and pause execution after every assembly instruction?
The OS is (almost) bare-metal i.e. a custom RTOS (so no Linux or QNX).
My answer assumes that you can modify the "OS" the way you need it...
Cortex MK20DX256VLH7
This seems to be a Cortex M4 CPU.
how to single-step code on-target with no jtag, breakpoints
From the doc, it doesn't say whether you NEED an external debugger to resume execution.
If the CPU is really stopped, you'll definitely need an external signal (e.g. from a debugger).
However most CPUs support software debugging. This means that an interrupt service routine is executed whenever a breakpoint is hit. To continue execution you simply return from the interrupt service routine.
I don't know about the Cortex M4 but for the Cortex M3 you'll have to set some special registers to enable that feature. Whenever a "BKPT" instruction is hit then interrupt #12 (*) is executed.
For code in RAM you simply write a BKPT instruction (0xBExx, e.g. 0xBEBE) to the address where you want to set your breakpoint. (Before writing, you read out the original value so you can restore it later on.)
For code in Flash the M3 has a "Flash patching unit" which allows you to specify up to three addresses which shall be read out as 0xBExx (0xBEBE ?) even if other data is stored there. This allows you to set up to 3 breakpoints in Flash.
Interesting for you: The register controlling the debug features in the M3 (named "DEMCR") also has a bit named "MON_STEP":
If you set this bit in interrupt handler #12 then exactly one instruction is executed after returning from the interrupt handler and interrupt #12 is triggered again. The use case for this feature is - of course - single-stepping code!
To stop single-stepping you'll have to clear the MON_STEP bit again...
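As a rough sketch of the mechanism described above: the register address and bit positions below are from the ARMv7-M architecture documentation, while the CMSIS-style handler name DebugMon_Handler and the surrounding structure are assumptions, not code from this answer.

#include <stdint.h>

#define DEMCR            (*(volatile uint32_t *)0xE000EDFCu)
#define DEMCR_MON_EN     (1u << 16)   /* route debug events to the monitor exception */
#define DEMCR_MON_STEP   (1u << 18)   /* single-step when leaving the monitor handler */

/* Enable monitor-mode debugging so a BKPT traps into the handler below
   instead of halting the core (which would need an external debugger). */
static void monitor_debug_enable(void)
{
    DEMCR |= DEMCR_MON_EN;
}

/* Exception #12 (vector table offset 4*12). With MON_STEP set, exactly one
   instruction executes after returning, then this handler runs again. */
void DebugMon_Handler(void)
{
    /* inspect the stacked PC / registers here and check any load/store */
    DEMCR |= DEMCR_MON_STEP;   /* keep single-stepping */
    /* clear MON_STEP instead when you want to run at full speed again */
}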
Important 1:
I don't know if the MK20DX256VLH7 really has all these features. However, because it is a Cortex M4 chip, and the M4 should have nearly all features of the M3, these features should be present...
Important 2:
Implementing single-stepping and debugging is not done quickly. Assembly language knowledge will be very helpful and you'll need a lot of time...
From the doc, ...
You will not only need the documentation for the MK20DX256VLH7 from NXP but you'll also need the Cortex M4 documentation from ARM.
(*) Offset 4*12 in the vector table is meant here (which is named "IRQ(-4)" in some ARM documents); not IRQ12.
Yes, the ARM emulator/interpreter sounds exactly like what I want. Is there a free one?
QEMU is open source; most of it is GPLv2 (https://wiki.qemu.org/License). You'd probably need to modify it a lot, because it's designed for use as a stand-alone wrapper for a whole Unix process (qemu-user) or a whole machine (qemu-system).
I googled, and there's also http://www.unicorn-engine.org/ which is designed to be used as part of a larger program (written in C with bindings for calling from various languages). It's also GPLv2 (not LGPL), so you can use it if the rest of your code is also Free software.
It's actually based on the CPU-emulation code from QEMU; they stripped out all the device / BIOS emulation stuff to make a flexible library for just emulating CPUs.
Presumably you could configure some memory protections for it and set up the starting machine state, and let it run your function (with a return address that leads to some code that hands control back to your main code?)
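A minimal sketch of that idea using Unicorn's C API; the addresses, sizes, register setup and the two-byte "bx lr" stand-in for the untrusted function are made-up placeholders, while the uc_* calls are from Unicorn's public API:

#include <stdbool.h>
#include <stdint.h>
#include <unicorn/unicorn.h>

#define CODE_BASE 0x00000000u            /* where the untrusted code is mapped */
#define RAM_BASE  0x20000000u            /* the only region it may read/write */
#define RAM_SIZE  (64 * 1024)

/* Called for every load/store to mapped memory: enforce the allowed window. */
static void on_mem_access(uc_engine *uc, uc_mem_type type, uint64_t addr,
                          int size, int64_t value, void *user)
{
    if (addr < RAM_BASE || addr + (uint64_t)size > RAM_BASE + RAM_SIZE)
        uc_emu_stop(uc);                 /* access outside the allowed range */
}

/* Called for accesses to unmapped or protected memory: false aborts emulation. */
static bool on_mem_invalid(uc_engine *uc, uc_mem_type type, uint64_t addr,
                           int size, int64_t value, void *user)
{
    return false;
}

int main(void)
{
    static const uint8_t code[] = { 0x70, 0x47 };  /* Thumb "bx lr" placeholder */
    uint32_t lr = (CODE_BASE + sizeof(code)) | 1u; /* "return address" that ends emulation */
    uc_engine *uc;
    uc_hook h_access, h_invalid;

    if (uc_open(UC_ARCH_ARM, UC_MODE_THUMB, &uc) != UC_ERR_OK)
        return 1;
    uc_mem_map(uc, CODE_BASE, 4 * 1024, UC_PROT_READ | UC_PROT_EXEC);
    uc_mem_map(uc, RAM_BASE, RAM_SIZE, UC_PROT_READ | UC_PROT_WRITE);
    uc_mem_write(uc, CODE_BASE, code, sizeof(code));
    uc_reg_write(uc, UC_ARM_REG_LR, &lr);

    uc_hook_add(uc, &h_access, UC_HOOK_MEM_READ | UC_HOOK_MEM_WRITE,
                on_mem_access, NULL, 1, 0);
    uc_hook_add(uc, &h_invalid, UC_HOOK_MEM_INVALID,
                on_mem_invalid, NULL, 1, 0);

    /* low bit set selects Thumb state for the entry point */
    uc_emu_start(uc, CODE_BASE | 1u, CODE_BASE + sizeof(code), 0, 0);
    uc_close(uc);
    return 0;
}

The uc_mem_map permissions plus the invalid-access hook provide the "disallowed memory" check, and the LR value makes the untrusted function return straight to the address where emulation stops, which stands in for handing control back to the caller.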
My goal is to let my own kernel start an application CPU. It uses the same mechanism as the Linux kernel:
Send an asserting, level-triggered INIT IPI
Wait...
Send a de-asserting, level-triggered INIT IPI
Wait...
Send up to two STARTUP IPIs with vector number (0x40000 >> 12) (the entry code for the application processor lies there)
Currently I'm just interested in making it work with QEMU. Unfortunately, instead of jumping to 0x40000, the application CPU jumps to 0x0 with the CS register set to 0x4000 (I checked with GDB).
The Intel MultiProcessor Specification (B.4.2) explains that the behavior I noticed is valid if the target processor is halted immediately after RESET or INIT. But shouldn't this also apply to the code of the Linux kernel? It sends the STARTUP IPI after the INIT IPI. Or do I misunderstand the specification?
What can I do to have the application processor jump to 0x000VV000 and not to 0x0 with the CS register set to 0xVV00? I really can't see where Linux does something that changes the behavior.
It seems that I really misunderstood the specification: since the application CPU is started in real mode, 0x000VV000 is equivalent to 0xVV00:0x0000. It is not possible to represent the address in the 16-bit IP register alone, so a base in the code segment register is required.
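Worked through with the startup vector used above: the vector is 0x40000 >> 12 = 0x40, so the AP comes up with CS = 0x4000 and IP = 0x0000, and in real mode that is physical address 0x4000 * 16 + 0x0000 = 0x40000, which is exactly the "jump to 0x0 with CS set to 0x4000" observed earlier.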
Additionally, debugging real-mode code with GDB is comparatively complicated because it does not respect the segment offset. When you need to see the disassembled code of the trampoline at the current position, you have to calculate the physical location yourself:
x/20i $eip+0xVV000
This makes GDB print the next 20 instructions at 0xVV00:$eip.
I'm debugging a piece of (embedded) software. I've set a breakpoint on a function, and for some reason, once I've reached that breakpoint and continue, I always come back to the function (which is an initialisation function that should only be called once). When I remove the breakpoint and continue, GDB tells me:
Program received signal SIGTRAP, Trace/breakpoint trap.
Since I was working with breakpoints, I'm assuming I fell into a "breakpoint trap". What is a breakpoint trap?
Breakpoint trap just means the processor has hit a breakpoint. There are two possibilities for why this is happening. Most likely, your initialization code is being hit because your CPU is resetting and hitting the breakpoint again. The other possibility would be that the code where you set the breakpoint is actually run in places other than initialization. Sometimes with aggressive compiler optimization it can be hard to tell exactly which code your breakpoint maps to and which execution paths can get there.
The other possibility I can think of:
1. Your process is running more than one thread, say x and y.
2. Thread y hits the breakpoint, but you have attached GDB to thread x.
This case is also reported as a Trace/breakpoint trap.
I got this problem running a Linux project in Visual Studio 2015 and debugging remotely. My solution: Project properties -> Configuration Properties -> Debugging -> Debugging mode, and change the value from "gdbserver" to "gdb".
If you use VBAT as a backup supply and your backup voltage drops below 1.65 V, then you get the same problem after connecting to a power supply.
In this case you have to disconnect all power supplies and reconnect with the correct voltage level. Then the debugging problem goes away.
I was stuck with the same problem, and in my case the solution was to decrease the SWD frequency. (I have some soldered wiring between the MCU and the host that is not very reliable.) I changed 4000 kHz to 100 kHz and the problem was gone.