embedded newlib-nano printf causes hardfault - c

I compile the "same" code on 2 targets (one Freescale, one STM32, both Cortex-M4). I use --specs=nano.specs and I have implemented the _write function as an empty stub; on the STM32 target this causes the whole printf machinery to be dropped from the binary even with -O0 (seen in the map file). This is fine and I would like to reproduce it on the Freescale target.
But on the Freescale target (with the same compile flags) printf causes a hardfault. Yet if I step through instruction by instruction with the debugger, printf goes through the library without faulting. A plain breakpoint inside printf is sometimes not hit, and running from any location within printf also causes a hardfault (so a peripheral issue is unlikely).
So far I have checked that the stack and heap do not overlap, plus some other far-fetched disassembly inspection.
Why isn't printf optimized away on the Freescale target?
What can cause the library code to hardfault?
Why is it OK when single-stepping at the assembly level?
EDIT:
Using arm-none-eabi-gcc 5.4.1 for both MCUs, with the same libraries.
I do not want to remove printf; this is only a first step toward being able to use it or not.
The vector table has default weak handlers for all ISRs, so it should be OK.
From the register dump, the faulting instruction appears to be at address 4 (the reset vector), so the new question is: why does the chip reset?

When ARM applications seem to work properly until printf is used, the most common problem is stack misalignment. Put a breakpoint at the entry point of the function that calls printf and inspect the stack pointer. If the pointer isn't double-word (8-byte) aligned, as the AAPCS requires at public interfaces, you've found your problem.

The common reason for crashing in printf with newlib is an incorrectly set up heap (malloc support), especially if you are using an RTOS (e.g. FreeRTOS). Since 2019, NXP (formerly Freescale) has included my solution in MCUXpresso. You can find the code and a detailed explanation here: https://github.com/DRNadler/FreeRTOS_helpers

Related

Why does gcc produce different assembly for user-level and kernel-level code?

I am trying to learn the function-call conventions of the ARM architecture, and I compiled the same code as a user-mode app and as a loadable kernel module. In the attached picture you can see the disassembly of the same function in the two modes. I am curious about the reason for this difference.
You have compiled the code with wildly different options. The first is ARM (32-bit only) and the second is Thumb2 (mixed 16/32-bit); see the hex opcodes at the side. Thumb2 uses the first 8 registers in a compact way (16-bit encodings), so the call interface is different; e.g., the frame pointer is r7 versus r11. This is why you are seeing different call sequences for the same code.
Also, the first has profiling enabled (which is why __gnu_mcount_nc is inserted).
It really has nothing to do with 'kernel' versus 'user' code. It is possible to compile user code with options similar to those the kernel uses. There are many gcc command-line options that affect the 'call interface' (search for AAPCS and the gcc ARM options help for more information).
Related: ARM Link and frame pointer

How can I figure out the full call stack with the current IP and BP registers?

I am doing a simple experiment on Ubuntu 16.04.1 LTS x86_64 with GCC 5.4.
The experiment is to get the full call stack of a running C program.
What I have done is:
Using ptrace's PTRACE_ATTACH and PTRACE_GETREGS to suspend a running C program and get its current IP and BP.
Using PTRACE_PEEKDATA to get the data at [BP] and [BP+4] (or [BP+8] for a 64-bit target), which gives me the calling function's BP and the return address.
Because the BPs are a chain, I should be able to get a sequence of return addresses. After that, by analyzing the address sequence against the listing file or DWARF data, I should finally be able to reconstruct the full call stack, something like 'main --> funcA --> funcB --> funcC ...'.
My problem is that this works fine if the call stack is entirely inside my test program's code, i.e., when every function is written by me. However, if the test program is stopped in a CRT or system API function, such as scanf or sleep, the BP chain no longer works.
I checked the disassembly and noticed that CRT and system API functions do not establish a stack frame with 'push ebp' / 'mov ebp, esp' the way my functions do, so it is no wonder the approach fails. But I cannot explain why GDB can still work properly in such cases! There must be many things I do not know about a Linux C program's call stack.
Could you point out my mistake or misunderstanding, or suggest some articles/links for me to read? Thank you very much.
Because the BPs are a chain
They are not. A frame-pointer chain used to be maintained on i386, but for a few years now GCC has defaulted to -fomit-frame-pointer in optimized builds even on i386. On x86_64, keeping the frame pointer was never the default in optimized code.
this works fine if the call stack is entirely inside my test program's code.
This will only work if you compile without optimization (or with optimization plus -fno-omit-frame-pointer).
I cannot explain why GDB can still work properly in such cases
GDB (and libunwind) use DWARF unwind info, which you can examine with readelf -wf a.out.

ARM THUMB mode issue on Cortex A15

We are using a Cortex-A15 with kernel 3.8.
If I compile
arm-gcc-4.7.3 test.c -o test_thumb -mthumb
then whether I set or unset CONFIG_ARM_THUMB in the kernel, my Thumb (user-space) binary always runs,
so I could not understand the behavior.
OK, so I can't see a good reason for what you're attempting ... so I'll assume you are asking out of pure curiosity.
It is not possible (in the processor) to disable decoding of Thumb instructions or switching to Thumb state. The CONFIG_ARM_THUMB option is about making the use of Thumb code in applications safe with regard to how the operating system behaves. In theory, disabling it could mean that in certain situations a Thumb program would not work properly; it does not actively prevent Thumb code from executing.
In practice, the main effect it ever had was with OABI, which used a value embedded in the SWI (now SVC) instruction to identify which system call was being requested.
I think OABI is not even supported by the latest versions of GCC/binutils...
Any 4.7 toolchain is highly likely to be EABI.

Value optimized out in GDB: Can gdb handle decoding it automatically?

1) First, I want to know: how can I decode such variables?
I know the usual solutions to this problem (remove the optimization flag, make the variable volatile), but I don't want to do that. Is there any solution that avoids compiling the source again? The problem is that whenever I make any change it takes ages to compile, so I don't want to rebuild with different optimization flags; also, the one time I did change the optimization flag, the program crashed just because of the change in compilation flags, for reasons I can't fathom.
Also, I am not able to find documentation on interpreting the registers shown by "info reg". I was expecting to see some variable (whose value I knew), but "info reg" shows me entirely different values, so I am missing something here. The architecture I am working on is x86_64.
2) I want to know what restrictions gdb faces in decoding such register variables, or whether this problem has already been tackled by someone. I have read in many places that by going through the assembly code you can find out which variable is in which register. If that's true, why can't it be built into gdb? Please point me to relevant pages if there are solutions to this problem.
If you don't have the source built with debug info and no optimizations (e.g., 3rd-party code), the best you can do is disassemble the code and try to determine how the variables are stored.
In gdb the disassemble instruction will dump the assembly for the given function:
disassemble <function name>
Or if symbols have been stripped
disassemble <address>
where <address> is the entry point to the function.
You may also have to inspect where the function is called to determine the calling conventions used.
Once you've figured out the structure of the functions and the variable layout (stack variables or registers), you can step through each instruction with nexti and stepi while debugging, and watch how the variables change by dumping the relevant registers or memory locations.
I don't know any good primers or tutorials myself, but this question and its answers may be of use to you. Personally, I find myself referencing the Intel manuals the most. They can be downloaded as PDFs from Intel's website; I don't have a link handy at the moment, but if someone else does perhaps they can update my answer.
Have you looked at compiling your code unoptimized?
Try one of these in your gcc options:
-Og
Optimize debugging experience. -Og enables optimizations that do not interfere with debugging. It should be the optimization level of choice for the standard edit-compile-debug cycle, offering a reasonable level of optimization while maintaining fast compilation and a good debugging experience.
-O0
Reduce compilation time and make debugging produce the expected results. This is the default.

Programmatically calling debugger in GCC

Is it possible to programmatically break into debugger from GCC?
For example I want something like:
#define STOP_EXECUTION_HERE ???
which when put on some code line will force debugger stop there.
Is it possible at all?
I found a solution, but I can't use it because my embedded ARM system doesn't have signal.h.
(However, I can use inline assembly.)
What you are trying to do is called a software breakpoint.
It is very hard to say precisely without knowing how you actually debug. I assume your embedded system runs a gdb stub. There are various ways this can be supported:
Use the dedicated BKPT instruction
This could be the standard way on your system and debugger to support software breakpoints.
Feed an invalid instruction to the CPU
The gdb stub could have installed its own UNDEF handler. If you go this route you must be aware of the current CPU state (ARM or Thumb), because the instruction size differs. Examples of undefined instructions:
ARM: 0xE7F123F4
Thumb: 0xDE56
At runtime the state is indicated by the T bit in the CPSR (interworking branch target addresses carry it in bit 0). But the easier way is to know how you compiled the object file where you placed the software breakpoint.
Use the SWI instruction
We did this when using RealView ICE. Most likely it does not apply to you if you run an OS on your embedded system; SWI is usually used by the OS to implement system calls.
Normally, the best way to do this is via a library function supplied with your device's toolchain.
I can't test it right now, but a general solution would be to insert the ARM BKPT instruction. It takes an immediate argument, which your debugger might interpret, potentially with odd results.
You could run your application from GDB and call e.g. abort() in the code. This will stop your application at that point. I'm not sure whether it's possible to continue after that, though.
