is there any use of __attribute__ ((interrupt)) for riscv compilers? - c

We can read here that the interrupt attribute keyword is used for ARM, AVR, CR16, Epiphany, M32C, M32R/D, m68k, MeP, MIPS, RL78, RX and Xstormy16.
Does it have any impact on RISC-V compilation using riscv32-***-elf-gcc compilers?

There is a separate page for RISC-V which claims it works. You can find it here. Also you could probably verify it by compiling code with and without the attribute set.
I don't have a riscv32 toolchain installed, but I managed to verify it using the riscv64 toolchain. You should reproduce the same steps using the riscv32 toolchain to make sure it works.
Using a simple test.c file:
__attribute__((interrupt))
void test() {}
Compiling it with riscv64-linux-gnu-gcc -c -o test.o test.c and disassembling with riscv64-linux-gnu-objdump -D -j.text test.o, we can see that it generates an mret instruction at the end of the function:
0: 1141 addi sp,sp,-16
2: e422 sd s0,8(sp)
4: 0800 addi s0,sp,16
6: 0001 nop
8: 6422 ld s0,8(sp)
a: 0141 addi sp,sp,16
c: 30200073 mret
After removing the interrupt attribute the instruction changes to regular ret. According to this SO answer this seems like correct behaviour.

Normally, an interrupt handler requires a different entry/exit sequence than a normal function. The differences lie in saving all registers used in the interrupt (a normal function call preserves only some registers) and in the return instruction, which is normally different (e.g. on ARM it has to change the processor mode of operation; this is probably also true of RISC-V processors).
The interrupt attribute informs the compiler of the routine properties, so it can generate the correct code for it.
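As a minimal sketch of how such a handler might be declared, the following uses the optional mode argument described in the GCC manual for RISC-V ("machine" is the default when no argument is given). The names ISR_MACHINE, tick_count and timer_isr are hypothetical, and the macro expands to nothing on other targets so the file still builds there.

```c
#include <stdint.h>

/* Hypothetical helper macro: apply the RISC-V attribute only when
 * compiling for RISC-V, so the same file also builds on a host PC. */
#if defined(__riscv)
#define ISR_MACHINE __attribute__((interrupt("machine")))
#else
#define ISR_MACHINE /* expands to nothing on non-RISC-V targets */
#endif

/* Hypothetical counter updated by the handler. */
volatile uint32_t tick_count = 0;

/* On RISC-V the compiler saves every register the handler uses and
 * ends the function with mret instead of ret. */
ISR_MACHINE void timer_isr(void)
{
    tick_count = tick_count + 1;
}
```

On a RISC-V target the handler should only be entered through the trap vector, never called directly, because it returns with mret. On any other target the macro is empty and timer_isr is an ordinary function.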

Related

Converting C to nasm assembly in 16 bit [duplicate]

I am writing real mode function, which should be normal function with stackframes and so, but it should use %sp instead of %esp. Is there some way to do it?
GCC 5.2.0 (and possibly earlier versions) supports 16-bit code generation with the -m16 flag. However, the code will almost certainly rely on 32-bit processor features (such as 32-bit wide registers), so you should check the generated assembly carefully.
From the man pages:
The -m16 option is the same as -m32, except for that it outputs the
".code16gcc" assembly directive at the beginning of the assembly output
so that the binary can run in 16-bit mode.
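A small function like the one below can be used to check this yourself. The name add_wide is hypothetical, and the sketch assumes an x86 GCC that accepts -m16. Compiling it with gcc -m16 -O1 -S and reading the resulting .s file should show 32-bit registers in use, since unsigned int stays 32 bits wide under -m16 just as under -m32.

```c
/* Hypothetical example function for inspecting -m16 output.
 * No headers are included, so it compiles even without 32-bit
 * system headers installed. */
unsigned int add_wide(unsigned int a, unsigned int b)
{
    /* The sum needs more than 16 bits for large inputs, so the
     * compiler has to use 32-bit operations for it. */
    return a + b;
}
```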
First, GCC can build 16-bit code: the Linux kernel passes through real mode on its way to protected mode, so it has to build some C code as 16-bit.
The -m16 option is supported by GCC >= 4.9 and clang >= 3.5.
GCC itself ignores asm(".code16"); you can see this in the -S output, where it is simply passed through between #APP and #NO_APP.
The Linux kernel does the trick by passing a header, code16gcc.h (which contains only .code16gcc), directly in the GCC command-line parameters.
See "Build 16-bit code with -m16 where possible", and also the Linux kernel build Makefile.
If you put asm(".code16gcc") directly in the source (see Writing 16-bit Code), the result is not real 16-bit code: the call, ret, enter, leave, push, pop, pusha, popa, pushf, and popf instructions default to 32-bit size.
GCC does not produce 8086 code. The GNU AS directive .code16gcc can be used to assemble the output of GCC to run in a 16-bit mode: put asm(".code16gcc") at the start of your C source. Your program will be limited to 64 KiB.
On modern GCC versions you can pass the -m16 argument to gcc which will produce code to run in a 16-bit mode. It still requires a 386 or later.
As far as I know, GCC does not support generation of code for 16-bit x86. For legacy bootloaders and similar purposes, you should write a small stub in assembly language to put the cpu in 32-bit mode and pass off execution to 32-bit code. For other purposes you really shouldn't be writing 16-bit code.

Is it possible in practice to compile millions of small functions into a static binary?

I've created a static library with about 2 million small functions, but I'm having trouble linking it to my main function, using GCC (tested 4.8.5 or 7.3.0) under Linux x86_64.
The linker complains about relocation truncations, very much like those in this question.
I've already tried using -mcmodel=large, but as the answer to that same question says, I would
"need a crt1.o that can handle full 64-bit addresses". I've then tried compiling one, following this answer, but recent glibc won't compile under -mcmodel=large, even if libgcc does, which accomplishes nothing.
I've also tried adding the flags -fPIC and/or -fPIE to no avail. The best I get is this sole error:
ld: failed to convert GOTPCREL relocation; relink with --no-relax
and adding that flag also doesn't help.
I've searched around the Internet for hours, but most posts are very old and I can't find a way to do this.
I'm aware this is not a common thing to try, but I think it should be possible to do this. I'm working in an HPC environment, so memory or time constraints are not the issue here.
Has anyone been successful in accomplishing something similar with a recent compiler and toolchain?
Either don't use the standard library or patch it. As of version 2.34, Glibc doesn't support the large code model. (See also Glibc mailing list and Redhat Bugzilla)
Explanation
Let's examine the Glibc source code to understand why recompiling with -mcmodel=large accomplished nothing. It replaced the relocations originating from C files. But Glibc contained hardcoded 32-bit relocations in raw Assembly files, such as in start.S (sysdeps/x86_64/start.S).
call *__libc_start_main@GOTPCREL(%rip)
start.S emitted R_X86_64_GOTPCREL for __libc_start_main, which used relative addressing. x86_64 CALL instruction didn't support relative jumps by more than 32-bit displacement, see AMD64 Manual 3. So, ld couldn't offset the relocation R_X86_64_GOTPCREL because the code size surpassed 2GB.
Adding -fPIC didn't help due to the same ISA constraints. For position-independent code, the compiler still generated relative jumps.
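The 2GB limit comes from the displacement being a signed 32-bit value relative to the address of the next instruction. A minimal sketch of that range check follows; fits_disp32 is a hypothetical helper name, and both addresses are assumed to be plain virtual addresses.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true when `target` can be reached from `next_ip` (the address
 * of the instruction following the one being relocated) with a signed
 * 32-bit displacement, as RIP-relative addressing requires. */
bool fits_disp32(uint64_t next_ip, uint64_t target)
{
    /* Unsigned subtraction wraps around; the cast reinterprets the
     * result as a signed distance. */
    int64_t disp = (int64_t)(target - next_ip);
    return disp >= INT32_MIN && disp <= INT32_MAX;
}
```

When the check fails for a symbol, the linker has no value it can store in the 32-bit field, which is what a relocation truncation error reports.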
Patching
In short, you have to replace 32-bit relocations in the Assembly code. See System V Application Binary Interface AMD64 Architecture Processor Supplement for more info about implementing 64-bit relocations. See also this for a more in-depth explanation of code models.
Why don't 32-bit relocations suffice for the large code model? Because we can't rely on other symbols being in a range of 2GB. All calls must become absolute. Contrast with the small PIC code model, where the compiler generates relative jumps whenever possible.
Let's look closely at the R_X86_64_GOTPCREL relocation. It contains the 32-bit difference between RIP and the symbol's GOT entry address. It has a 64-bit substitute — R_X86_64_GOTPCREL64, but I couldn't find a way to use it in Assembly.
So, to replace the GOTPCREL, we have to compute the symbol entry GOT base offset and the GOT address itself. We can calculate the GOT location once in the function prologue because it doesn't change.
First, let's get the GOT base (code lifted wholesale from the ABI Supplement). The _GLOBAL_OFFSET_TABLE_ symbol gives the offset of the GOT relative to the current position:
leaq 1f(%rip), %r11
1: movabs $_GLOBAL_OFFSET_TABLE_, %r15
leaq (%r11, %r15), %r15
With the GOT base residing on the %r15 register, now we have to find the symbol's GOT entry offset. The R_X86_64_GOT64 relocation specifies exactly this. With this, we can rewrite the call to __libc_start_main as:
movabs $__libc_start_main@GOT, %r11
call *(%r11, %r15)
We replaced R_X86_64_GOTPCREL with _GLOBAL_OFFSET_TABLE_ and R_X86_64_GOT64. Replace the others in the same vein.
N.B.: Replace R_X86_64_GOT64 with R_X86_64_PLTOFF64 for functions from dynamically linked executables.
Testing
Verify the patch correctness using the following test that requires the large code model. It doesn't contain a million small functions, having one huge function and one small function instead.
Your compiler must support the large code model. If you use GCC, you'll need to build it from the source with the flag -mcmodel=large. Startup files shouldn't contain 32-bit relocations.
The foo function takes more than 2GB, rendering 32-bit relocations unusable. Thus, the test will fail with the overflow error if compiled without -mcmodel=large. Also, add flags -O0 -fPIC -static, link with gold.
extern int foo();
extern int bar();
int foo(){
    bar();
    // Call sys_exit
    asm( "mov $0x3c, %%rax \n"
         "xor %%rdi, %%rdi \n"
         "syscall \n"
         ".zero 1 << 32 \n"      // pad the function with 4GB of zeros
         : : : "rax", "rdi");    // clobber list matches the registers written above
    return 0;
}
int bar(){
    return 0;
}
int __libc_start_main(){
    foo();
    return 0;
}
int main(){
    return 0;
}
N.B. I used patched Glibc startup files without the standard library itself, so I had to define both __libc_start_main and main.

How to resolve undefined instruction error during Gem5 ARM fs simulation

I am currently trying to run a program compiled for arm64 on Gem5. I am using the sve/beta1 branch of Gem5, linux kernel 4.15 and the program makes use of glibc (it's statically linked).
To run Gem5 I am using the following command:
./build/ARM/gem5.opt configs/example/arm/fs_bigLITTLE.py --arm-sve-vl=8 --cpu-type=atomic --big-cpus=2 --little-cpus=2 --kernel=/dist/m5/system/binaries/linux4_15 --dtb=/dist/m5/system/binaries/armv8_gem5_v1_big_little_2_2.dtb --disk=/dist/m5/system/disks/linaro-minimal-aarch64.img
I am successfully booting the linux distro and the binary starts as well. However, after a while, I get the following error message:
[13602.881469] Program_Binary[1059]: undefined instruction: pc=000000006e018621
[13602.881484] Code: d503201f d11b43ff a9007bfd 910003fd (d50320ff)
I am not completely sure which instruction is causing this but I assume it is the instruction (d11b43ff) which according to the ARM reference manual is a msr instruction. Anyone have an idea as to how I could resolve this issue?
Applying the changes of commits 260b0fc, 33b311d, 6efe7e1 and fcc379d of the public/gem5 branch to the sve/beta1 branch fixed this issue for both FS and SE simulation.
In general, there is only one solution: to go and implement the missing instruction.
Newer gem5 actually prints the binary opcode of the unimplemented instruction in the error message, which you can then feed to a disassembler to determine which instruction it is: Using objdump for ARM architecture: Disassembling to ARM. Before that, you would have to find the opcode with objdump based on the PC address first.
In this particular case, since you are on a branch, you should first produce a minimal example (with se.py if possible, because it is easier) that uses the instruction and see if it was fixed in master.
As mentioned at "How to compile and run an executable in gem5 syscall emulation mode with se.py?", however, there has been an MRS glibc pre-main fix in the past few months at commit 260b0fc5381a47c681e7ead8e4f13aad45069665 which has not yet gone into sve/beta1. Can you try to cherry-pick it and see what happens?
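To get the opcode out of a kernel log line like the one in the question, a small helper can pull out the word in parentheses. This is only a sketch: marked_opcode is a hypothetical name, and it assumes the kernel marks the instruction at the faulting PC by wrapping it in parentheses on the "Code:" line.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Extracts the parenthesised hex word from a kernel "Code:" line.
 * Returns 1 and stores the word in *opcode on success,
 * or 0 if no well-formed "(hex)" group is found. */
int marked_opcode(const char *code_line, uint32_t *opcode)
{
    const char *open = strchr(code_line, '(');
    if (open == NULL)
        return 0;

    char *end = NULL;
    unsigned long value = strtoul(open + 1, &end, 16);

    /* Require at least one hex digit followed by the closing parenthesis. */
    if (end == open + 1 || *end != ')')
        return 0;

    *opcode = (uint32_t)value;
    return 1;
}
```

Applied to the "Code:" line from the question, this yields d50320ff, which is the word to hand to a disassembler.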

embedded newlib-nano printf causes hardfault

I compile the "same" code on 2 targets (one Freescale, one STM32 both with cortex M4). I use --specs=nano.specs and I have implemented the _write function as an empty function and this causes the whole printf to be optimized away by GCC's -Wno-unused-function even with -O0 on the STM32 target (seen in map). This is fine and I would like to reproduce that on Freescale target.
But on the Freescale target (with the same compile flags) printf causes a hardfault. If I step through with the debugger (assembly stepping), however, printf goes through the library without hardfaulting. A simple breakpoint is sometimes not hit, and running from any location inside printf causes a hardfault too (so it is unlikely to be a peripheral issue).
So far I checked that stack and heap are not overlapping and some other far-fetched disassembly.
Why isn't the printf optimized away on freescale target ?
What can cause the library code to hardfault ?
Why is it OK when doing assembly step by step debug ?
EDIT:
Using arm-none-eabi-gcc 5.4.1 for both MCU with same libraries.
I do not want to remove printf, this is only a first step to be able to use them or not.
Vector table has default weak vectors for all ISR so it should be OK
Using the register dump it seems that the faulty instruction is at address 4 (reset vector) so the new question is now: why does the chip reset ?
When ARM applications seem to work properly until printf is used, the most common problem is stack misalignment. Put a breakpoint at the entry point of the function that calls printf and inspect the stack pointer. If the pointer isn't double-word aligned, you've found your problem.
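The alignment check itself is simple arithmetic: on ARM a double word is 8 bytes, so the low three bits of the stack pointer must be zero. A minimal sketch, with is_doubleword_aligned as a hypothetical helper name and the stack pointer value read from the debugger:

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true when the given stack pointer value is 8-byte
 * (double-word) aligned, i.e. its low three bits are all zero. */
bool is_doubleword_aligned(uintptr_t sp)
{
    return (sp & 0x7u) == 0;
}
```

For example, a stack pointer of 0x20001000 passes the check, while 0x20001004 is only word-aligned and fails it.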
The common reason for crashing in printf with newlib is incorrectly set up free storage, especially if you are using an RTOS (e.g. FreeRTOS). Since 2019 NXP (formerly Freescale) includes my solution in MCUXpresso. You can find code and detailed explanation here: https://github.com/DRNadler/FreeRTOS_helpers

Is there a way to use gcc to convert C to MIPS?

I completed a C to MIPS conversion for a class, and I want to check it against the assembly. I have heard that there is a way of configuring gcc so that it can convert C code to the MIPS architecture rather than the x86 architecture (my computer uses an Intel i5 processor) and prints the output.
Running the terminal in Ubuntu (which comes with gcc), what command do I use to configure gcc to convert to MIPS? Is there anything I need to install as well?
EDIT:
Let me clarify. Please read this.
I'm not looking for which compiler to use, or people saying "well you could cross-compile, but instead you should use this other thing that has no instructions on how to set up."
If you're going to post that, at least refer me to instructions. GCC came with Ubuntu. I don't have experience on how to install compilers and it's not easy finding online tutorials for anything other than GCC. Then there's the case of cross-compiling I need to know about as well. Thank you.
GCC can produce assembly code for a large number of architectures, including MIPS. But what architecture a given GCC instance targets is decided when GCC itself is compiled. The precompiled binary you will find in an Ubuntu system knows about x86 (possibly both 32-bit and 64-bit modes) but not MIPS.
Compiling GCC with a target architecture distinct from the architecture on which GCC itself will be running is known as preparing a cross-compilation toolchain. This is doable but requires quite a bit of documentation-reading and patience; you usually need to first build a cross-assembler and cross-linker (GNU binutils), then build the cross-GCC itself.
I recommend using buildroot. This is a set of scripts and makefiles designed to help with the production of a complete cross-compilation toolchain and utilities. At the end of the day, you will get a complete OS and development tools for a target system. This includes the cross-compiler you are after.
Another quite different solution is to use QEMU. This is an emulator for various processors and systems, including MIPS systems. You can use it to run a virtual machine with a MIPS processor, and, within that machine, install an operating system for MIPS, e.g. Debian, a Linux distribution. This way, you get a native GCC (a GCC running on a MIPS system and producing code for MIPS).
The QEMU way might be a tad simpler; using cross-compilation requires some understanding of some hairy details. Either way, you will need about 1 GB of free disk space.
It's not a configuration thing, you need a version of GCC that cross-compiles to MIPS. This requires a special GCC build and is quite hairy to set up (building GCC is not for the faint of heart).
I'd recommend using LCC for this. It's way easier to do cross-compilation with LCC than it is with GCC, and building LCC is a matter of seconds on current machines.
For a one-time use for a small program or couple functions, you don't need to install anything locally.
Use Matt Godbolt's compiler explorer site, https://godbolt.org/, which has GCC and clang for various ISAs including MIPS and x86-64, and some other compilers.
Note that the compiler explorer by default filters directives so you can just see the instructions, leaving out stuff like alignment, sections, .globl, and so on. (For a function with no global / static data, this is actually fine, especially when you just want to use a compiler to make an example for you. The default section is .text anyway, if you don't use any directives.)
Most people that want MIPS asm for homework are using SPIM or MARS, usually without branch-delay slots. (Unlike real MIPS, so you need to tweak the compiler to not take advantage of the next instruction after a branch running unconditionally, even when it's taken.) For GCC, the option is -fno-delayed-branch - that will fill every delay slot with a NOP, so the code will still run on a real MIPS. You can just manually remove all the NOPs.
There may be other tweaks needed, like MARS may require you to use jr $31 instead of j $31, Tweak mips-gcc output to work with MARS. And of course I/O code will have to be implemented using MARS's toy system calls, not jal calls to standard library functions like printf or std::ostream::operator<<. You can usefully compile (and hand-tweak) asm for manipulating data, like multiplying integers or summing or reversing an array, though.
Unfortunately GCC doesn't have an option to use register names like $a0 instead of $r. For PowerPC there's -mregnames to use r1 instead of 1, but no similar option for MIPS to use "more symbolic" reg names.
int maybe_square(int num) {
    if (num>0)
        return num;
    return num * num;
}
On Godbolt with GCC 5.4 -xc -O3 -march=mips32r2 -Wall -fverbose-asm -fno-delayed-branch
-xc compiles as C, not C++, because I find that more convenient than flipping between the C and C++ languages in the dropdown and having the site erase my source code.
-fverbose-asm comments the asm with C variable names for the destination and sources. (In optimized code that's often an invented temporary, but not always.)
-O3 enables full optimization, because the default -O0 debug mode is a horrible mess for humans to read. Always use at least -Og if you want to look at the code by hand and see how it implements the source. How to remove "noise" from GCC/clang assembly output?. You might also use -fno-unroll-loops, and -fno-tree-vectorize if compiling for an ISA with SIMD instructions.
This uses mul instead of the classic MIPS mult + mflo, thanks to the -march= option to tell GCC we're compiling for a later MIPS ISA, not whatever the default baseline is. (Perhaps MIPS I aka R2000, -march=mips1)
See also the GCC manual's section on MIPS target options.
# gcc 5.4 -O3
maybe_square:
blez $4,$L5
nop
move $2,$4 # D.1492, num # retval = num
j $31 # jr $ra = return
nop
$L5:
mul $2,$4,$4 # D.1492, num, num # retval = num * num
j $31 # jr $ra = return
nop
Or with clang, use -target mips to tell it to compile for MIPS. You can do this on your desktop; unlike GCC, clang is normally built with multiple back-ends enabled.
From the same Godbolt link, clang 10.1 -xc -O3 -target mips -Wall -fverbose-asm -fomit-frame-pointer. The default target is apparently MIPS32 or something like that for clang. Also, clang defaults to enabling frame pointers for MIPS, making the asm noisy.
Note that it chose to make branchless asm, doing if-conversion into a conditional-move to select between the original input and the mul result. Unfortunately clang doesn't support -fno-delayed-branch; maybe it has another name for the same option, or maybe there's no hope.
maybe_square:
slti $1, $4, 1
addiu $2, $zero, 1
movn $2, $4, $1 # conditional move based on $1
jr $ra
mul $2, $2, $4 # in the branch delay slot
In this case we can simply put the mul before the jr, but in other cases converting to no-branch-delay asm is not totally trivial. e.g. branch on a loop counter before decrementing it can't be undone by putting the decrement first; that would change the meaning.
Register names:
Compilers use register numbers, not bothering with names. For human use, you will often want to translate back. Many places online have MIPS register tables that show how $4..$7 are $a0..$a3, $8 .. $15 are $t0 .. $t7, etc. For example this one.
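The translation back to names can also be done with a small lookup table. The sketch below uses the conventional o32 register names; mips_reg_name and the table name are hypothetical, and register 30 is listed as s8 (it is also commonly called fp).

```c
#include <stddef.h>

/* Conventional MIPS o32 names for registers $0..$31. */
static const char *const mips_o32_names[32] = {
    "zero", "at", "v0", "v1", "a0", "a1", "a2", "a3",
    "t0",   "t1", "t2", "t3", "t4", "t5", "t6", "t7",
    "s0",   "s1", "s2", "s3", "s4", "s5", "s6", "s7",
    "t8",   "t9", "k0", "k1", "gp", "sp", "s8", "ra"
};

/* Returns the symbolic name for a register number,
 * or NULL when the number is out of range. */
const char *mips_reg_name(unsigned number)
{
    return number < 32 ? mips_o32_names[number] : NULL;
}
```

With this table, $4 in the compiler output maps to a0, $2 to v0 and $31 to ra.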
You should install a cross-compiler from the Ubuntu repositories. GCC MIPS C cross-compilers are available in the repositories. Pick according to your needs:
gcc-mips-linux-gnu - 32-bit big-endian.
gcc-mipsel-linux-gnu - 32-bit little-endian.
gcc-mips64-linux-gnuabi64 - 64-bit big-endian.
gcc-mips64el-linux-gnuabi64 - 64-bit little-endian.
etc.
(Note for users of Ubuntu 20.10 (Groovy Gorilla) or later, and Debian users: if you usually like to install your regular compilers using the build-essential package, you would be interested to know of the existence of crossbuild-essential-mips, crossbuild-essential-mipsel, crossbuild-essential-mips64el, etc.)
In the following examples, I will assume that you chose the 32-bit little-endian version (sudo apt-get install gcc-mipsel-linux-gnu). The commands for other MIPS versions are similar.
To deal with MIPS instead of the native architecture of your system, use the mipsel-linux-gnu-gcc command instead of gcc. For example, mipsel-linux-gnu-gcc -fverbose-asm -S myprog.c produces a file myprog.s containing MIPS assembly.
Another way to see the MIPS assembly: run mipsel-linux-gnu-gcc -g -c myprog.c to produce an object file myprog.o that contains debugging information. Then view the disassembly of the object file using mipsel-linux-gnu-objdump -d -S myprog.o. For example, if myprog.c is this:
#include <stdio.h>
int main()
{
    int a = 1;
    int b = 2;
    printf("The answer is: %d\n", a + b);
    return 0;
}
And if it is compiled using mipsel-linux-gnu-gcc -g -c myprog.c, then mipsel-linux-gnu-objdump -d -S myprog.o will show something like this:
myprog.o: file format elf32-tradlittlemips
Disassembly of section .text:
00000000 <main>:
#include <stdio.h>
int main() {
0: 27bdffd8 addiu sp,sp,-40
4: afbf0024 sw ra,36(sp)
8: afbe0020 sw s8,32(sp)
c: 03a0f025 move s8,sp
10: 3c1c0000 lui gp,0x0
14: 279c0000 addiu gp,gp,0
18: afbc0010 sw gp,16(sp)
int a = 1;
1c: 24020001 li v0,1
20: afc20018 sw v0,24(s8)
int b = 2;
24: 24020002 li v0,2
28: afc2001c sw v0,28(s8)
printf("The answer is: %d\n", a + b);
2c: 8fc30018 lw v1,24(s8)
30: 8fc2001c lw v0,28(s8)
34: 00621021 addu v0,v1,v0
38: 00402825 move a1,v0
3c: 3c020000 lui v0,0x0
40: 24440000 addiu a0,v0,0
44: 8f820000 lw v0,0(gp)
48: 0040c825 move t9,v0
4c: 0320f809 jalr t9
50: 00000000 nop
54: 8fdc0010 lw gp,16(s8)
return 0;
58: 00001025 move v0,zero
}
5c: 03c0e825 move sp,s8
60: 8fbf0024 lw ra,36(sp)
64: 8fbe0020 lw s8,32(sp)
68: 27bd0028 addiu sp,sp,40
6c: 03e00008 jr ra
70: 00000000 nop
...
You would need to download the source to binutils and gcc-core and compile with something like ../configure --target=mips .... You may need to choose a specific MIPS target. Then you could use mips-gcc -S.
You can cross-compile the GCC so that it generates MIPS code instead of x86. That's a nice learning experience.
If you want quick results you can also get a prebuilt GCC with MIPS support. One is the CodeSourcery Lite Toolchain. It is free, comes for a lot of architectures (including MIPS) and they have ready to use binaries for Linux and Windows.
http://www.codesourcery.com/sgpp/lite/mips/portal/subscription?#template=lite
You should compile your own version of gcc which is able to cross-compile. Of course this ain't easy, so you could look for a different approach.. for example this SDK.
