How to resolve undefined instruction error during Gem5 ARM fs simulation

I am currently trying to run a program compiled for arm64 on Gem5. I am using the sve/beta1 branch of Gem5 and Linux kernel 4.15, and the program uses glibc (it is statically linked).
To run Gem5 I am using the following command:
./build/ARM/gem5.opt configs/example/arm/fs_bigLITTLE.py --arm-sve-vl=8 --cpu-type=atomic --big-cpus=2 --little-cpus=2 --kernel=/dist/m5/system/binaries/linux4_15 --dtb=/dist/m5/system/binaries/armv8_gem5_v1_big_little_2_2.dtb --disk=/dist/m5/system/disks/linaro-minimal-aarch64.img
I am successfully booting the linux distro and the binary starts as well. However, after a while, I get the following error message:
[13602.881469] Program_Binary[1059]: undefined instruction: pc=000000006e018621
[13602.881484] Code: d503201f d11b43ff a9007bfd 910003fd (d50320ff)
I am not completely sure which instruction is causing this, but I assume it is the instruction (d11b43ff), which according to the ARM reference manual is an MSR instruction. Does anyone have an idea as to how I could resolve this issue?

Applying the changes of commits 260b0fc, 33b311d, 6efe7e1 and fcc379d of the public/gem5 branch to the sve/beta1 branch fixed this issue for both FS and SE simulation.

In general, there is only one solution: to go and implement the missing instruction.
Newer gem5 versions actually print the binary opcode of the unimplemented instruction in the error message, which you can then feed to a disassembler to determine which instruction it is (see "Using objdump for ARM architecture: Disassembling to ARM"). Before that, you would have to find the opcode with objdump based on the PC address first.
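As a rough sketch of the opcode lookup described above: the arm64 Linux kernel wraps the faulting word of the "Code:" line in parentheses, so it can be pulled out and matched against a few fixed A64 encoding patterns before reaching for objdump. The function names below (faulting_opcode, classify_a64, hint_number) are hypothetical helpers, not part of gem5 or the kernel, and the masks cover only HINT, MRS, MSR (register) and SUB (immediate); everything else is reported as "other".

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Extract the parenthesized word from a kernel "Code:" line.
 * Returns 1 and stores the opcode on success, 0 if no "(hex)" is found.
 * Hypothetical helper for illustration only. */
int faulting_opcode(const char *code_line, uint32_t *opcode)
{
    const char *open = strchr(code_line, '(');
    if (open == NULL)
        return 0;

    char *end = NULL;
    unsigned long value = strtoul(open + 1, &end, 16);
    if (end == open + 1 || *end != ')')
        return 0;

    *opcode = (uint32_t)value;
    return 1;
}

/* Coarse A64 classification using fixed encoding bits only.
 * This is not a disassembler: it recognizes just four patterns. */
const char *classify_a64(uint32_t op)
{
    if ((op & 0xFFFFF01Fu) == 0xD503201Fu)
        return "HINT";              /* NOP, YIELD, ... share this space */
    if ((op & 0xFFF00000u) == 0xD5300000u)
        return "MRS";               /* read a system register */
    if ((op & 0xFFF00000u) == 0xD5100000u)
        return "MSR (register)";    /* write a system register */
    if ((op & 0x7F800000u) == 0x51000000u)
        return "SUB (immediate)";
    return "other";
}

/* For an opcode in the HINT space, return the 7-bit hint number (CRm:op2). */
unsigned hint_number(uint32_t op)
{
    return (op >> 5) & 0x7Fu;
}
```

Under these masks, the parenthesized word d50320ff from the log in the question falls in the HINT space with hint number 7, and d11b43ff matches SUB (immediate) rather than MSR. objdump remains the authoritative check.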
In this particular case, since you are on a branch, you should first produce a minimal example that uses the instruction (with se.py if possible, because it is easier) and see if it was fixed in master.
As mentioned at "How to compile and run an executable in gem5 syscall emulation mode with se.py?", however, there has been an MRS glibc pre-main fix in the past few months, at commit 260b0fc5381a47c681e7ead8e4f13aad45069665, which has not yet gone into sve/beta1. Can you try to cherry-pick it and see what happens?

Related

Why does gcc produce different assembly for user-level and kernel-level code

I am trying to learn the function call conventions of the ARM architecture, and I compiled the same code as a user-mode app and as a loadable kernel module. In the attached picture you can see the disassembly of the same function in the two modes. I am curious about the reason for this difference.
You have compiled the code with wildly different options. The first is ARM (32-bit only) and the second is Thumb2 (mixed 16/32-bit); see the hex opcodes at the side. Thumb2 uses the first 8 registers (r0-r7) in a compact way (16-bit encodings), so the call interface is different; i.e., fp is r7 in Thumb2 versus r11 in ARM. This is why you are seeing different call sequences for the same code.
Also, the first has profiling enabled (which is why __gnu_mcount_nc is inserted).
It really has nothing to do with 'kernel' versus 'user' code. It is possible to compile user code with options similar to those the kernel uses. There are many gcc command line options which affect the 'call interface' (search AAPCS for more information and see the gcc ARM options help).
Related: ARM Link and frame pointer

Embedded newlib-nano printf causes hardfault

I compile the "same" code on 2 targets (one Freescale, one STM32, both with a Cortex-M4). I use --specs=nano.specs and I have implemented the _write function as an empty function, and this causes the whole printf to be optimized away by GCC's -Wno-unused-function even with -O0 on the STM32 target (seen in the map file). This is fine and I would like to reproduce that on the Freescale target.
But on the Freescale target (with the same compile flags) the printf causes a hardfault. If I go step by step with the debugger (assembly stepping), however, the printf goes through the library without hardfaulting. A simple breakpoint is sometimes not hit, and running from any location in printf causes a hardfault too (so it is unlikely that it is a peripheral issue).
So far I have checked that the stack and heap are not overlapping, and looked at some other far-fetched disassembly.
Why isn't the printf optimized away on the Freescale target?
What can cause the library code to hardfault?
Why is it OK when stepping through the assembly in the debugger?
EDIT:
Using arm-none-eabi-gcc 5.4.1 for both MCUs with the same libraries.
I do not want to remove printf; this is only a first step to be able to use them or not.
The vector table has default weak vectors for all ISRs, so it should be OK.
Using the register dump, it seems that the faulty instruction is at address 4 (reset vector), so the new question is now: why does the chip reset?
When ARM applications seem to work properly until printf is used, the most common problem is stack misalignment. Put a breakpoint at the entry point of the function that calls printf and inspect the stack pointer. If the pointer isn't double-word aligned, you've found your problem.
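The alignment test described above can be sketched in C as follows. This is a minimal illustration, assuming the 8-byte stack alignment that AAPCS requires at public interfaces; the function names sp_is_doubleword_aligned and align_stack_top are hypothetical, and the stack pointer value is passed in as a plain integer (for example, copied from the debugger's register view).

```c
#include <stdint.h>

/* AAPCS requires SP to be a multiple of 8 at public interfaces. */
#define AAPCS_STACK_ALIGN 8u

/* Return nonzero if the given stack pointer value is double-word aligned. */
int sp_is_doubleword_aligned(uintptr_t sp)
{
    return (sp % AAPCS_STACK_ALIGN) == 0;
}

/* Round an initial stack top down to the nearest double-word boundary,
 * e.g. when fixing a linker-script or startup-code value. */
uintptr_t align_stack_top(uintptr_t top)
{
    return top & ~(uintptr_t)(AAPCS_STACK_ALIGN - 1u);
}
```

For instance, a stack pointer of 0x20000FFC fails the check, and rounding it down gives 0x20000FF8.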
A common reason for crashing in printf with newlib is incorrectly set up free storage, especially if you are using an RTOS (e.g. FreeRTOS). Since 2019 NXP (formerly Freescale) includes my solution in MCUXpresso. You can find the code and a detailed explanation here: https://github.com/DRNadler/FreeRTOS_helpers

Compiling PowerPC binary with gcc and restricting usable registers

I have a PowerPC device running some software and I'd like to modify this software by inserting some code of my own.
I can easily write my own assembler code, put it somewhere in an unused region in RAM, and replace any instruction in the "official" code by b 0x80001234, where 0x80001234 is the RAM address where my own code extension is loaded.
However, when I compile C code with powerpc-eabi-gcc, gcc assumes it is compiling a complete program and not only "code parts" to be inserted into a running program.
This leads to a problem: the main program uses some of the CPU's registers to store data, and when I just copy my extension into it, it will clobber their previous contents.
For example, if the main program I want to insert code into uses register 5 and register 8 in that code block, the program will crash if my own code writes to r5 or r8. Then I need to convert the compiled binary back to assembler code, edit it to use registers other than r5 and r8, and then assemble that ASM source again.
What I'm now searching for is an option to the ppc-gcc which tells it "never ever use the PPC registers r5 and r8 while creating the bytecode".
Is this possible, or do I need to continue crawling through the ASM code on my own, replacing all the "used" registers with other registers?
You should think of another approach to solve this problem.
There is a gcc extension to reserve a register as a global variable:
register int *foo asm ("r12");
Please note that if you use this extension, your program no longer conforms to the ABI of the operating system you are working on. This means that you cannot call any library functions without risking program crashes or overwritten variables.

ARM THUMB mode issue on Cortex A15

We are using a Cortex-A15 and kernel 3.8.
I compile with:
arm-gcc-4.7.3 test.c -o test_thumb -mthumb
Whether I set or unset CONFIG_ARM_THUMB in the kernel, my Thumb (user space) binary always runs,
so I cannot understand the behavior.
Ok, so, I can't see a good reason to do what you're attempting to do ... so I'll assume you are asking out of pure curiosity.
It is not possible (in the processor) to disable decoding Thumb instructions or switching to Thumb state. The CONFIG_ARM_THUMB option is about making the use of Thumb code in applications safe with regard to how the operating system acts. This means, on the theoretical level, that having it disabled could mean that in certain situations the program would not work properly, not that it would actively prevent Thumb code from executing.
In practice, the main effect it ever had was with OABI, which used an embedded value in the SWI (now SVC) instruction to identify which system call it was requesting.
I think OABI is not even supported by latest versions of GCC/binutils...
Any 4.7 toolchain is highly likely to be EABI.

Programmatically calling debugger in GCC

Is it possible to programmatically break into the debugger from GCC?
For example, I want something like:
#define STOP_EXECUTION_HERE ???
which, when put on some code line, will force the debugger to stop there.
Is it possible at all?
I found some solution, but i can't use it because on my embedded ARM system I don't have signal.h.
(However I can use inline assembly).
What you are trying to do is called a software breakpoint.
It is very hard to say precisely without knowing how you actually debug. I assume your embedded system runs a gdbstub. There are various ways this can be supported:
Use the dedicated BKPT instruction
This could be the standard way for your system and debugger to support software breakpoints.
Feed an invalid instruction to the CPU
The gdbstub could have installed its own UNDEF handler. If you go this route, you must be aware of the current CPU state (ARM or Thumb), because the instruction size will be different. Examples of undefined instructions:
ARM: 0xE7F123F4
THUMB: 0xDE56
At run time, the CPU state is given by the T bit of the CPSR (a return address or function pointer with its lowest bit set also indicates Thumb). But the easier way is to know how you compiled the object file in which you placed the software breakpoint.
Use the SWI instruction
We did so when we used RealView ICE. Most likely this does not apply to you if you run some OS on your embedded system, since SWI is usually used by the OS to implement system calls.
Normally, the best way to do this is via a library function supplied with your device's toolchain.
I can't test it out, but a general solution might be to insert an ARM BKPT instruction. It takes an immediate argument, which your debugger might interpret, with potentially odd results.
You could run your application from GDB, and in the code call e.g. abort. This will stop your application at that point. I'm not sure if it's possible to continue after that though.
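As a hedged illustration of the BKPT approach mentioned in the answers above, the macro from the question could be filled in with GCC inline assembly roughly like this. The immediate #0 is an arbitrary choice (how it is interpreted depends on your debugger), the AArch64 and generic fallback branches are assumptions added so that the snippet builds on other targets, and stop_if is a hypothetical helper.

```c
/* Software breakpoint macro; a sketch, not a toolchain-provided API. */
#if defined(__arm__) || defined(__thumb__)
/* 32-bit ARM or Thumb: dedicated breakpoint instruction. */
#define STOP_EXECUTION_HERE() __asm__ volatile ("bkpt #0")
#elif defined(__aarch64__)
/* Assumption: AArch64 equivalent, not discussed in the thread. */
#define STOP_EXECUTION_HERE() __asm__ volatile ("brk #0")
#else
/* Assumption: fallback for non-ARM hosts, using a GCC builtin. */
#define STOP_EXECUTION_HERE() __builtin_trap()
#endif

/* Break into the debugger only when the condition is true.
 * Returns the condition so the call can sit inside an expression. */
int stop_if(int condition)
{
    if (condition)
        STOP_EXECUTION_HERE();
    return condition;
}
```

A call such as stop_if(ptr == 0) then stops only on the interesting case. Without a debugger or stub attached, executing the breakpoint instruction typically raises a fault instead, so such calls are best compiled out of release builds.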
