Call stack doesn't give a clue - C

We have a 32-bit C Win32 application that occasionally hangs.
At the time of a hang (non-responsive state), the customer sent us a dump.
When I opened the dump in WinDbg, the call stack shows the following:
wow64win!NtUserMessageCall+0xa
wow64win!whNT32NtUserMessageCallCB+0x32
wow64win!Wow64DoMessageThunk+0x8b
wow64win!whNtUserMessageCall+0x12e
wow64!Wow64SystemServiceEx+0xd7
wow64cpu!TurboDispatchJumpAddressEnd+0x2d
wow64!RunCpuSimulation+0xa
wow64!Wow64LdrpInitialize+0x42a
ntdll!LdrpInitializeProcess+0x17e3
ntdll! ?? ::FNODOBFM::`string'+0x28ff0
ntdll!LdrInitializeThunk+0xe
I am not getting any clue because the call stack does not point to our code.
PS:
The hang situation is not specific to 64-bit systems.

You need to switch to the 32-bit context in WinDbg with !wow64exts.sw; that prints the real stack trace for a WOW64 target.
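A minimal sketch of the session (the extension and command names are standard WinDbg; exact output will vary):

.load wow64exts
!wow64exts.sw
~*k

!wow64exts.sw toggles the debugger between the 64-bit and the 32-bit view of the WOW64 target; after switching, ~*k dumps the 32-bit stacks of all threads, which should finally show frames from your own module.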

gdb commands gdbstub to write to _Unwind_DebugHook memory location

I ported a gdbstub for an OS I'm working on which runs on x86_64. The host which is running gdb is connected to the target that has the stub and the OS over serial. I have an int3 instruction in the source code to force the OS to jump into the stub's code which it does. The problem is if I try to step to the next instruction using nexti the stub stops responding and the host keeps timing out.
Looking at the packets that the host is sending I see this:
Sending packet: $Me1dc20,1:cc#6c...Ack
Timed out.
Timed out.
Timed out.
Ignoring packet error, continuing...
which means that the host is telling the stub to write cc (which is the opcode for int3) to memory location 0xe1dc20. I looked into that memory location and found this:
(gdb) x/16i 0xe1dc20
0xe1dc20 <_Unwind_DebugHook>: retq
0xe1dc21: data16 nopw %cs:0x0(%rax,%rax,1)
0xe1dc2c: nopl 0x0(%rax)
This function is part of gcc's code here https://github.com/gcc-mirror/gcc/blob/master/libgcc/unwind-dw2.c but it is not used anywhere in the source file that I am debugging.
Now obviously it is causing me trouble, so I disabled the memory-writing functionality in my stub so that it no longer responds to the memory-write commands $M and $X; after that I was able to execute nexti and step in gdb without issues. The stub uses RFLAGS.TF for flow control.
The question is why is gdb trying to set a breakpoint in a function that I am not using anywhere and how do I prevent it from doing so? I thought about adding an if statement in the stub to ignore writes to this memory location but is there a less intrusive way of doing it?
The _Unwind_DebugHook symbol exists as a place for GDB (or any other debugger) to place a breakpoint and so catch exceptions. GDB will look for this symbol in the debug info and, if it exists, place a breakpoint there.
These breakpoints will always get inserted, even when doing something as simple as a stepi, just in case - you might be about to step to that address.
One problem I see with the remote trace is that GDB will be expecting an OK packet to indicate that the write succeeded; that is why you're seeing the timeout messages.
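To illustrate that last point, a stub that accepts the $M write only has to acknowledge the packet and then send an OK reply. A rough sketch of the reply path, where put_debug_char() is an assumed serial-output helper rather than part of any particular stub:

extern void put_debug_char(char c);   /* assumed: writes one byte to the serial link */

/* Send a reply packet: '$' payload '#' two hex digits (sum of payload bytes mod 256). */
static void put_packet(const char *payload)
{
    unsigned char sum = 0;
    put_debug_char('$');
    for (const char *p = payload; *p; p++) {
        sum += (unsigned char)*p;
        put_debug_char(*p);
    }
    put_debug_char('#');
    put_debug_char("0123456789abcdef"[sum >> 4]);
    put_debug_char("0123456789abcdef"[sum & 0xf]);
}

/* ... after copying the bytes from an 'M addr,len:data' packet into memory ... */
put_packet("OK");   /* without this, GDB retries the write and eventually times out */

With that in place, letting GDB write the 0xcc byte to _Unwind_DebugHook should be harmless; the hook is only reached when an exception is actually being unwound.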

How to make GDB stepi faster?

I am using GDB to log the executed assembly instructions.
Here is the GDB script I made:
log.gdb
set confirm off
tbreak start_trigger
r
set logging overwrite on
set logging on
set height 0
set style enabled off
while (1)
x/i $pc
stepi
end
quit
And I ran gdb using
$ gdb results-mte/aha-compress.elf -x log.gdb -batch
This works well and writes gdb.txt, but it is really slow. Is there any way to make it faster?
Is there any way to make it faster?
Yes: don't do that.
Think about how single-stepping works. On a processor which supports single-step in hardware, GDB has to
enable single-stepping
resume inferior
wait for OS to deliver SIGCHLD
query inferior for current registers ($pc mostly) via ptrace
decode and print current instruction
... repeat for each instruction. This is expected to be about 1000-10000 times slower than native execution.
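To see where the time goes, here is a rough sketch of that loop as a native debugger would run it on x86-64 Linux (a sketch only; error handling is omitted and the register access would differ on other architectures). Every iteration costs at least two ptrace system calls, a reschedule of the inferior, and a wait:

#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>
#include <sys/wait.h>

void trace_loop(pid_t pid)
{
    int status;
    struct user_regs_struct regs;

    for (;;) {
        ptrace(PTRACE_SINGLESTEP, pid, 0, 0);   /* set the trap flag and resume the inferior */
        waitpid(pid, &status, 0);               /* block until it stops again (SIGTRAP) */
        if (WIFEXITED(status))
            break;
        ptrace(PTRACE_GETREGS, pid, 0, &regs);  /* fetch the registers, mainly $pc */
        printf("%llx\n", (unsigned long long)regs.rip);
        /* GDB additionally reads memory at rip and disassembles it here */
    }
}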
The usual solution is to use some tracing mechanism; e.g. an intel_pt trace would make this only slightly slower than full native speed.
I'm running this GDB inside Fedora RISC-V on QEMU.
Now you are emulating the GDB itself, adding another factor of 10 or more slowdown.
What you probably want to do is ask QEMU to record the instructions it executes.
Typing "qemu trace instructions" into Google produces this post (among others).
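For example, QEMU's built-in logging can produce such a trace (flag names should be checked against your QEMU version's -d help; the machine and kernel arguments below are placeholders):

qemu-system-riscv64 -machine virt -kernel <your image> \
    -singlestep -d in_asm,exec,nochain -D insn.log

in_asm logs each guest block as it is translated, exec logs each execution of a block, and -singlestep plus nochain keep the blocks to one instruction each, so the log approximates a full instruction trace. Recent QEMU releases also ship an execlog TCG plugin under contrib/plugins that writes a per-instruction log directly.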

How to correctly use a startup-ipi to start an application processor?

My goal is to let my own kernel start an application CPU. It uses the same mechanism as the Linux kernel:
Send asserting and level triggered init-IPI
Wait...
Send deasserting and level triggered init-IPI
Wait...
Send up to two startup-IPIs with vector number (0x40000 >> 12) (the entry code for the application processor lies there)
Currently I'm just interested in making it work with QEMU. Unfortunately, instead of jumping to 0x40000, the application CPU jumps to 0x0 with the CS register set to 0x4000 (I checked with gdb).
The Intel MultiProcessor Specification (B.4.2) explains that the behavior I noticed is valid if the target processor is halted immediately after RESET or INIT. But shouldn't this also apply to the code of the Linux kernel? It sends the startup-IPI after the init-IPI. Or do I misunderstand the specification?
What can I do to have the application processor jump to 0x000VV000 and not to 0x0 with the CS register set to 0xVV00? I really can't see where Linux does something that changes the behavior.
It seems that I really misunderstood the specification: since the application CPU starts in real mode, 0x000VV000 is equivalent to 0xVV00:0x0000 (physical address = segment * 16 + offset). The address cannot be represented in the 16-bit IP register alone, so a code-segment base is required.
Additionally, debugging real-mode code with gdb is comparatively awkward because it does not take the segment base into account. To see the disassembled code of the trampoline at the current position, you have to compute the physical location yourself:
x/20i $eip+0xVV000
This makes gdb print the next 20 instructions at 0xVV00:$eip.
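For reference, here is a rough sketch of the INIT/SIPI sequence from the question, using the xAPIC's memory-mapped interrupt command register. The field encodings follow the Intel SDM; the APIC base address, the identity mapping, and the delay_us() helper are assumptions for the sketch:

#include <stdint.h>

#define LAPIC_BASE  0xFEE00000u                 /* default xAPIC MMIO base, assumed identity-mapped */
#define ICR_LOW     (LAPIC_BASE + 0x300u)
#define ICR_HIGH    (LAPIC_BASE + 0x310u)

extern void delay_us(unsigned us);              /* assumed timing helper */

static void lapic_write(uint32_t reg, uint32_t val)
{
    *(volatile uint32_t *)(uintptr_t)reg = val;
}

static void start_ap(uint8_t apic_id, uint32_t entry)
{
    uint8_t vector = (uint8_t)(entry >> 12);    /* entry must be 4 KiB aligned, below 1 MiB; 0x40000 -> 0x40 */

    lapic_write(ICR_HIGH, (uint32_t)apic_id << 24);
    lapic_write(ICR_LOW, 0x0000C500u);          /* INIT, level-triggered, assert */
    delay_us(200);
    lapic_write(ICR_HIGH, (uint32_t)apic_id << 24);
    lapic_write(ICR_LOW, 0x00008500u);          /* INIT, level-triggered, de-assert */
    delay_us(10000);

    for (int i = 0; i < 2; i++) {               /* up to two STARTUP IPIs */
        lapic_write(ICR_HIGH, (uint32_t)apic_id << 24);
        lapic_write(ICR_LOW, 0x00000600u | vector);
        delay_us(200);
    }
}

The AP then starts in real mode at vector:0x0000, i.e. CS = 0xVV00 and IP = 0x0000, which is exactly the physical address 0x000VV000 discussed above.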

Using KSEG2 with wired TLBs in kernel mode

We have some code running in KUSEG, and we need more than the 2 GB of memory that KUSEG provides. We tried to map some more physical memory into KSEG2 (since we run in kernel mode) by setting up wired TLBs. When I wrote a test application to access and write to the KSEG2 space (address 0xC0000000), it throws a TLBS exception complaining that there is a TLB miss. I have double-checked that the TLBs are set up correctly.
Am I missing something here? Has anyone used MIPS KSEG2 in kernel mode?
Thanks a lot in advance.
Vamsi.
On the chip we were using, the KSEG2 address needs to have the high-order 32 bits set to 1. Programming the virtual address as 0xFFFFFFFFC0000000 solved the problem.
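In other words, on a 64-bit MIPS core the 32-bit kernel segments live at sign-extended addresses, so the upper 32 bits of the KSEG2 base must all be 1. A tiny illustration in plain C (not tied to any particular toolchain):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    int64_t kseg2 = (int64_t)(int32_t)0xC0000000u;     /* sign-extend the 32-bit KSEG2 base */
    printf("%#llx\n", (unsigned long long)kseg2);      /* prints 0xffffffffc0000000 */
    return 0;
}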

Program received signal SIGTRAP, Trace/breakpoint trap

I'm debugging a piece of (embedded) software. I've set a breakpoint on a function, and for some reason, once I've reached that breakpoint and continue, I always come back to the same function (an initialisation function which should only be called once). When I remove the breakpoint and continue, GDB tells me:
Program received signal SIGTRAP, Trace/breakpoint trap.
Since I was working with breakpoints, I'm assuming I fell in a "breakpoint trap". What is a breakpoint trap?
Breakpoint trap just means the processor has hit a breakpoint. There are two possibilities for why this is happening. Most likely, your initialization code is being hit because your CPU is resetting and hitting the breakpoint again. The other possibility would be that the code where you set the breakpoint is actually run in places other than initialization. Sometimes with aggressive compiler optimization it can be hard to tell exactly which code your breakpoint maps to and which execution paths can get there.
Another possibility I can think of:
1. Your process is running more than one thread, say two threads x and y.
2. Thread y hits the breakpoint, but you have attached gdb to thread x.
This case also shows up as a Trace/breakpoint trap.
I got this problem running a Linux project in Visual Studio 2015 and debugging remotely. My solution: go to Project Properties -> Configuration Properties -> Debugging -> Debugging Mode and change the value from "gdbserver" to "gdb".
If you use VBAT as a backup supply and your backup voltage drops below 1.65 V, you get the same problem after connecting to a power supply.
In this case you have to disconnect all power supplies and reconnect with the correct voltage level. Then the problem with debugging goes away.
I was stuck with the same problem, and in my case the solution was to decrease the SWD frequency.
(I have some soldered connections between the MCU and the host that are not very reliable.)
I changed 4000 kHz to 100 kHz and the problem was gone.

Resources