How to verify the VFPv4 feature in an ARM toolchain

I have a pre-compiled ARM toolchain for a Cortex-A15. I want to check whether it generates correct VFPv4 instructions. Does anybody have any ideas?

We can look it up in the GCC release notes, which state:
...
GCC now supports VFPv4-based FPUs and FPUs with single-precision-only VFP.
...
We can also verify it manually. According to the ARM Architecture Reference Manual, VFPv4 added (at least) the Vector Fused Multiply Accumulate / Subtract instructions:
void test_vfp4() {
asm("VFMA.F32 q1, q2, q3");
}
Compile this with the -mfpu=neon-vfpv4 switch (otherwise my toolchain reports Error: selected processor does not support ARM mode 'vfma.f32 q1,q2,q3'):
gcc -mfpu=neon-vfpv4 -O2 -marm -c vfpv4.c
and dump the object file with
arm-linux-gnueabihf-objdump -S vfpv4.o
which should list:
00000000 <test_vfp4>:
0: f2042c56 vfma.f32 q1, q2, q3
4: e12fff1e bx lr
However, I don't know how you can use this from C, since I couldn't find any intrinsic listed for these fused instructions or think of any other way.
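On the C-level question, a hedged pointer: compilers following ARM's C Language Extensions (ACLE), including newer GCC releases than some older pre-built toolchains ship, declare vfmaq_f32(a, b, c), which computes a + b * c with a single rounding, in arm_neon.h when __ARM_FEATURE_FMA is defined, as it should be with -mfpu=neon-vfpv4. Whether your particular toolchain provides it is an assumption to check. The sketch below uses the intrinsic when it is available and falls back to plain C otherwise; fma4 and HAVE_VFMA are hypothetical names. Disassembling the NEON build with objdump as above should show a vfma.f32 on q registers.

```c
/* Use the fused multiply-add intrinsic only when the compiler advertises
   both NEON and FMA support (e.g. -mfpu=neon-vfpv4). */
#if (defined(__ARM_NEON) || defined(__ARM_NEON__)) && defined(__ARM_FEATURE_FMA)
#include <arm_neon.h>
#define HAVE_VFMA 1   /* hypothetical feature macro for this sketch */
#endif

/* acc[i] += b[i] * c[i] for four floats (hypothetical helper). */
void fma4(float acc[4], const float b[4], const float c[4])
{
#ifdef HAVE_VFMA
    /* vfmaq_f32(a, b, c) = a + b * c, expected to map to vfma.f32 qN, qM, qK */
    float32x4_t r = vfmaq_f32(vld1q_f32(acc), vld1q_f32(b), vld1q_f32(c));
    vst1q_f32(acc, r);
#else
    /* Portable fallback; the compiler may or may not contract this into an FMA. */
    for (int i = 0; i < 4; i++)
        acc[i] += b[i] * c[i];
#endif
}
```

The standard fmaf() from math.h is another route that a VFPv4-aware compiler can lower to a scalar vfma.f32, but that is worth confirming with objdump on your toolchain.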

Related

Is there any use of __attribute__ ((interrupt)) for RISC-V compilers?

We can read here that the interrupt attribute keyword is used for ARM, AVR, CR16, Epiphany, M32C, M32R/D, m68k, MeP, MIPS, RL78, RX and Xstormy16.
Does it have any impact on RISC-V compilation with riscv32-***-elf-gcc compilers?
There is a separate page for RISC-V which says it is supported. You can find it here. You could also verify it by compiling code with and without the attribute.
I don't have a riscv32 toolchain installed, but I managed to verify it using the riscv64 toolchain. You should repeat the same steps with the riscv32 toolchain to make sure it works there.
Using a simple test.c file:
__attribute__((interrupt))
void test() {}
Compiling it with riscv64-linux-gnu-gcc -c -o test.o test.c and disassembling it with riscv64-linux-gnu-objdump -D -j.text test.o, we can see that it generates an mret instruction at the end of the function:
0: 1141 addi sp,sp,-16
2: e422 sd s0,8(sp)
4: 0800 addi s0,sp,16
6: 0001 nop
8: 6422 ld s0,8(sp)
a: 0141 addi sp,sp,16
c: 30200073 mret
After removing the interrupt attribute, the instruction changes to a regular ret. According to this SO answer, this is the correct behaviour.
Normally, an interrupt handler requires a different entry/exit sequence than a normal function. The differences are mainly in register saving (an interrupt handler must preserve every register it uses, whereas a normal function only preserves the callee-saved ones) and in the return instruction (on ARM it has to restore the processor mode; on RISC-V, mret similarly returns from the machine-mode trap and restores the previous privilege mode).
The interrupt attribute informs the compiler of the routine properties, so it can generate the correct code for it.
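A minimal handler sketch to compare with the test above. The GCC documentation for RISC-V lists an optional argument to the attribute ("user", "supervisor" or "machine", defaulting to "machine"). The ISR macro, timer_isr and tick_count are hypothetical names, and the #else branch exists only so the same file also compiles on a non-RISC-V host. On RISC-V, since the handler needs a temporary register to increment the counter, you should see that register saved and restored around the body and an mret at the end.

```c
#include <stdint.h>

/* Shared with the main program; volatile because the ISR changes it
   asynchronously. */
volatile uint32_t tick_count;

#if defined(__riscv)
/* Machine-mode handler: the compiler saves every register it touches
   (including caller-saved ones) and returns with mret instead of ret. */
#define ISR __attribute__((interrupt("machine")))
#else
/* Hypothetical fallback so the sketch builds on a host machine. */
#define ISR
#endif

/* Hypothetical timer interrupt handler. */
ISR void timer_isr(void)
{
    tick_count++;
}
```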

Using a library in a bare-metal program for ARM

Can someone help me out, please? I do not know if the answer is general, or specific to the board and software versions I am working with. I am outside my usual areas here, and do not even know the right question to ask.
EDITs added at the bottom
What I currently want, is to create a program that will run standalone (bare metal; no OS) on a A20-OLinuXino-Micro-4GB board, that needs to use (at least) some standard math library calls. Eventually, I will want to load it into NAND, and run it on powerup, but for now I am trying to manually load it (loady) from the U-Boot (github.com/linux-sunxi/u-boot-sunxi/wiki) serial 'console', after booting from an SD card. Standalone is needed, because the linux distro level access to the hardware GPIO ports is not very flexible, when working with more than one bit (port in a port group) at a time, and quite slow. Too slow for the target application, and I did not really want to try modifying / adding a kernel module just to see if that would be fast enough.
Are there some standard gcc / ld flags needed to create a bare metal standalone program, and include some library routines? Beyond -ffreestanding and -static? Is there some special glue code needed? Is there something else I have not even thought of?
I found and looked over Beagleboard bare metal programming (stackoverflow.com/questions/6870712/beagleboard-bare-metal-programming). The answer there is good info, but it is assembler and does not reference any library. Application hangs when calling printf to uart with bare metal raspberry pi might show a cause for the problem. The (currently) bottom answer points to problems with VFP, and I have already run into problems with soft/hard floating point options. That answer shows some assembler code, but I am missing details about how to add a wrapper/glue to combine it with C code. My assembler coding is rusty, but would adding equivalent code at the start of hello_world (at least before the call to sin()) likely get things working? Maybe by adding it into the libstubs code.
I am using another A20 board for the main development environment.
$ gcc --version
gcc (Debian 4.6.3-14) 4.6.3
Copyright (C) 2011 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ ld.bfd --version
GNU ld (GNU Binutils for Debian) 2.22
Copyright 2011 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License version 3 or (at your option) a later version.
This program has absolutely no warranty.
$ uname -a
Linux a20-OLinuXino 3.4.67+ #6 SMP PREEMPT Fri Nov 1 17:32:40 EET 2013 armv7l GNU/Linux
I have been able to create bootable U-Boot images for the board on SD cards from the repo, either building directly from the linux-sunxi distro that was supplied with the board, or by cross-compiling from a Fedora 21 machine. Same for the standalone hello_world program that came in the examples for U-boot, which can be loaded and run from the U-Boot console.
However, reducing the sample program to bare minimum, then adding code that needs math.h, -lm and -lc fails (in various iterations) with 'software interrupt' or 'undefined operation' type errors. The original sample program was being linked with -lgcc, but a little checking showed that nothing was actually being included from the library. The identical binary was created without the library, so the question might be 'what does it take to use any library with a bare metal program?'
sun7i# go 0x48000000
## Starting application at 0x48000000 ...
Hello math World
undefined instruction
pc : [<48000010>] lr : [<4800000c>]
sp : 7fb66da0 ip : 7fb672c0 fp : 00000000
r10: 00000002 r9 : 7fb66f0c r8 : 7fb67778
r7 : 7ffbbaf8 r6 : 00000001 r5 : 7fb6777c r4 : 48000000
r3 : 00000083 r2 : 7ffbc7fc r1 : 0000000a r0 : 00000011
Flags: nZCv IRQs off FIQs off Mode SVC_32
Resetting CPU ...
To get that far, I had to tweak build options, to specify hardware floating point, since that is how the base libraries were compiled.
Here are the corresponding source and build script files
hello_world.c
#include <common.h>
#include <math.h>
int hello_world (void)
{
double tst;
tst = 0.33333333333;
printf ("Hello math World\n");
tst = sin(0.5);
// printf ("sin test %d : %d\n", (int)tst, (int)(1000 * tst));
return (0);
}
build script
#! /bin/bash
UBOOT="/home/olimex/u-boot-sunxi"
SRC="$UBOOT/examples/standalone"
#INCLS="-nostdinc -isystem /usr/lib/gcc/arm-linux-gnueabihf/4.6/include -I$UBOOT/include -I$UBOOT/arch/arm/include"
INCLS="-I$UBOOT/include -I$UBOOT/arch/arm/include"
#-v
GCCOPTS="\
-D__KERNEL__ -DCONFIG_SYS_TEXT_BASE=0x4a000000\
-Wall -Wstrict-prototypes -Wno-format-security\
-fno-builtin -ffreestanding -Os -fno-stack-protector\
-g -fstack-usage -Wno-format-nonliteral -fno-toplevel-reorder\
-DCONFIG_ARM -D__ARM__ -marm -mno-thumb-interwork\
-mabi=aapcs-linux -mword-relocations -march=armv7-a\
-ffunction-sections -fdata-sections -fno-common -ffixed-r9\
-mhard-float -pipe"
# -msoft-float -pipe
OBJS="hello_world.o libstubs.o"
LDOPTS="--verbose -g -Ttext 0x48000000"
#--verbose
#LIBS="-static -L/usr/lib/gcc/arm-linux-gnueabihf/4.6 -lm -lc"
LIBS="-static -lm -lc"
#-lgcc
gcc -Wp,-MD,stubs.o.d $INCLS $GCCOPTS -D"KBUILD_STR(s)=#s"\
-D"KBUILD_BASENAME=KBUILD_STR(stubs)"\
-D"KBUILD_MODNAME=KBUILD_STR(stubs)"\
-c -o stubs.o $SRC/stubs.c
ld.bfd -r -o libstubs.o stubs.o
gcc -Wp,-MD,hello_world.o.d $INCLS $GCCOPTS -D"KBUILD_STR(s)=#s"\
-D"KBUILD_BASENAME=KBUILD_STR(hello_world)"\
-D"KBUILD_MODNAME=KBUILD_STR(hello_world)"\
-c -o hello_world.o hello_world.c
ld.bfd $LDOPTS -o hello_world -e hello_world $OBJS $LIBS
objcopy -O binary hello_world hello_world.bin
EDITS added:
The application that this is to be part of needs both fairly high-speed GPIO and some math functions. It should only need sin() and maybe sqrt(). My previous testing of the GPIO got the toggling of a single pin (port in a port group) up to 8MHz. The application's constraints require a full cycle time in the 10µs (100kHz) range, which includes reading all pins from a single port and writing a few pins on other ports, synchronized with the timing limitations of the attached ADC chip (3 ADC reads). I have bare metal code that is doing (simulating) that process in about 2.1µs. Now I need to add in the math to process the values, the output of which will set some more outputs. Future planned improvements include using SIMD for the math, and dedicating the second core to the math while the first does the GPIO and 'feeds' the calculations.
The needed math code / logic has already been written into a simulation program using very standard (c99) code. I just need to port it into the bare metal program. Need to get 'math' to work first.
As a first step, I suggest reading this excellent paper on bare-metal programming with ARM and GNU: http://www.state-machine.com/arm/Building_bare-metal_ARM_with_GNU.pdf.
Then, I would make sure you avoid any syscalls to the Linux kernel (which you don't have, but which your compiler and C library will try to make), e.g. avoid returning a value from main(), which should never return in a bare-metal program anyway.
Finally, I would either use newlib or, if you only need a small subset of what the libraries have to offer, write a custom implementation.
Keep in mind you are using an Allwinner SoC, which is not the best for bare-metal documentation, but you can find the TRM here: http://www.soselectronic.com/a_info/resource/c/20_UM-V1.020130322.pdf. I would check whether the libraries (if you decide to use them) or your code need some special silicon hardware to be initialized (interconnect fabric, clock and power domains, etc.).
I strongly suggest, if you just need sin() and similar, that you write your own.
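Following that last suggestion, here is a minimal sketch of freestanding sin() and sqrt() replacements that need nothing from libc or libm. my_sin, my_sqrt and MY_PI are hypothetical names. my_sin reduces the argument to [-pi/2, pi/2] and evaluates the Taylor series up to the x^15 term, which is accurate to roughly 1e-11 on that range but is not bit-identical to libm; it assumes a moderate, finite argument. my_sqrt uses Newton's iteration starting from a guess above the root. Both are plain double arithmetic, so with -mhard-float they should compile to VFP instructions rather than library calls.

```c
#define MY_PI 3.14159265358979323846

/* Hypothetical freestanding sin(). Assumes finite |x| < 1e9 so the
   conversion to long cannot overflow. */
double my_sin(double x)
{
    /* Remove whole periods: x ends up in (-2*pi, 2*pi). */
    long k = (long)(x / (2.0 * MY_PI));
    x -= (double)k * 2.0 * MY_PI;

    /* Bring x into [-pi, pi]. */
    if (x > MY_PI)
        x -= 2.0 * MY_PI;
    if (x < -MY_PI)
        x += 2.0 * MY_PI;

    /* Fold into [-pi/2, pi/2] using sin(pi - x) = sin(x)
       and sin(-pi - x) = sin(x). */
    if (x > MY_PI / 2.0)
        x = MY_PI - x;
    if (x < -MY_PI / 2.0)
        x = -MY_PI - x;

    /* Taylor series x - x^3/3! + ... - x^15/15!, evaluated in nested
       (Horner) form from the innermost factor outwards. */
    double x2 = x * x;
    double p = 1.0 - x2 / 210.0;   /* 14 * 15 */
    p = 1.0 - x2 / 156.0 * p;      /* 12 * 13 */
    p = 1.0 - x2 / 110.0 * p;      /* 10 * 11 */
    p = 1.0 - x2 / 72.0 * p;       /*  8 *  9 */
    p = 1.0 - x2 / 42.0 * p;       /*  6 *  7 */
    p = 1.0 - x2 / 20.0 * p;       /*  4 *  5 */
    p = 1.0 - x2 / 6.0 * p;        /*  2 *  3 */
    return x * p;
}

/* Hypothetical freestanding sqrt() via Newton's method.
   Returns 0.0 for v <= 0 (no NaN handling in this sketch). */
double my_sqrt(double v)
{
    if (v <= 0.0)
        return 0.0;

    /* Start at or above the root so the iterates decrease monotonically. */
    double g = v > 1.0 ? v : 1.0;
    for (int i = 0; i < 1100; i++) {
        double next = 0.5 * (g + v / g);
        if (next >= g)   /* stopped decreasing: converged */
            break;
        g = next;
    }
    return g;
}
```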

Wrong result with log10 math function in armv6 on Raspberry Pi

I have this very simple code:
#include <stdio.h>
#include <math.h>
int main()
{
long v = 35;
double app = (double)v;
app /= 100;
app = log10(app);
printf("Calculated log10 %lf\n", app);
return 0;
}
This code works perfectly on x86, but doesn't work on ARM, where the result is 0.00000. Any ideas?
Other info:
Operating system: linux 3.2.27
I built the ARM toolchain with ct-ng: arm-unknown-linux-gnueabi-
libc version 2.13
Output of gcc -v:
Using built-in specs.
COLLECT_GCC=arm-unknown-linux-gnueabi-gcc
COLLECT_LTO_WRAPPER=/opt/x-tools/arm-unknown-linux-gnueabi/libexec/gcc/arm-unknown-linux-gnueabi/4.5.1/lto-wrapper
Target: arm-unknown-linux-gnueabi
Configured with: /home/mirko/misc/rasppi-ct-ng-files/.build/src/gcc-4.5.1/configure --build=x86_64-build_unknown-linux-gnu --host=x86_64-build_unknown-linux-gnu --target=arm-unknown-linux-gnueabi --prefix=/opt/x-tools/arm-unknown-linux-gnueabi --with-sysroot=/opt/x-tools/arm-unknown-linux-gnueabi/arm-unknown-linux-gnueabi//sys-root --enable-languages=c --disable-multilib --with-pkgversion=crosstool-NG-1.9.3 --enable-__cxa_atexit --disable-libmudflap --disable-libgomp --disable-libssp --with-host-libstdcxx='-static-libgcc -Wl,-Bstatic,-lstdc++,-Bdynamic -lm' --with-gmp=/home/mirko/misc/rasppi-ct-ng-files/.build/arm-unknown-linux-gnueabi/build/static --with-mpfr=/home/mirko/misc/rasppi-ct-ng-files/.build/arm-unknown-linux-gnueabi/build/static --with-mpc=/home/mirko/misc/rasppi-ct-ng-files/.build/arm-unknown-linux-gnueabi/build/static --with-ppl=/home/mirko/misc/rasppi-ct-ng-files/.build/arm-unknown-linux-gnueabi/build/static --with-cloog=/home/mirko/misc/rasppi-ct-ng-files/.build/arm-unknown-linux-gnueabi/build/static --with-libelf=/home/mirko/misc/rasppi-ct-ng-files/.build/arm-unknown-linux-gnueabi/build/static --enable-threads=posix --enable-target-optspace --with-local-prefix=/opt/x-tools/arm-unknown-linux-gnueabi/arm-unknown-linux-gnueabi//sys-root --disable-nls --enable-symvers=gnu --enable-c99 --enable-long-long
Thread model: posix
gcc version 4.5.1 (crosstool-NG-1.9.3)
Floating-point support on ARM Linux distributions is not trivial. Because of that, you should use a toolchain that matches your system (both the operating system and the hardware) and use the right compile switches.
The first thing you need to understand is ARM's calling convention, which answers the question "how are arguments passed when you call a function?". ARM, being a RISC (load/store) architecture, can only compute on registers: there are no data-processing instructions that operate on memory directly. If you need to change a value in memory, you first load it into a register, modify it, and then store it back to memory.
When you call a function you may need to pass arguments to it. You could put them on the stack (memory), but since ARM can only work with registers, the first thing the function would probably do is load them back into registers. To avoid this waste, the ARM calling convention passes arguments in registers. However, since ARM has a limited number of registers, the convention uses only the first four registers (r0-r3) for the first four (word-sized) arguments; the remaining ones are still passed on the stack.
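To see the four-register rule in practice, you can compile a function with more than four integer arguments for ARM (e.g. with -O2 -S) and read the assembly. sum6 is a hypothetical example: under the AAPCS, a..d should arrive in r0-r3, while e and f should be found on the stack (at [sp] and [sp, #4] on entry). The exact instruction sequence depends on the compiler version and flags.

```c
/* Hypothetical example with six word-sized arguments:
   a, b, c, d -> r0, r1, r2, r3
   e, f       -> stack, loaded by the callee */
int sum6(int a, int b, int c, int d, int e, int f)
{
    return a + b + c + d + e + f;   /* result returned in r0 */
}
```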
The second thing is that early ARM cores didn't have any floating-point hardware, so operations were implemented in software. (This is still supported via gcc's -mfloat-abi=soft.)
We can easily demonstrate what this means via following snippet.
float pi2(float a) {
return a * 3.14f;
}
Compiling this with -c -O3 -mfloat-abi=soft and dumping it with objdump gives us
00000000 <pi2>:
0: f24f 51c3 movw r1, #62915 ; 0xf5c3
4: b508 push {r3, lr}
6: f2c4 0148 movt r1, #16456 ; 0x4048
a: f7ff fffe bl 0 <__aeabi_fmul>
e: bd08 pop {r3, pc}
As you can see (actually it is not visible :) ), pi2 gets its parameter in r0, loads the 3.14f constant into r1 (with movw/movt) and calls __aeabi_fmul to multiply the two and return the result in r0. Since __aeabi_fmul uses the same calling convention, r0 never has to be touched explicitly: all our function does is populate r1 and delegate to __aeabi_fmul.
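The value assembled in r1 (0xf5c3 in the low half, 0x4048 in the high half, i.e. the .word 0x4048f5c3 that the later listings keep in a literal pool) is simply the IEEE-754 single-precision encoding of 3.14f. A small sketch to check this on any machine; float_bits is a hypothetical helper that copies the bytes with memcpy to avoid strict-aliasing problems:

```c
#include <stdint.h>
#include <string.h>

/* Return the raw IEEE-754 bit pattern of a float (hypothetical helper). */
uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);   /* well-defined type punning */
    return u;
}
```

For example, float_bits(3.14f) gives 0x4048f5c3, matching the movw/movt pair above.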
When floating-point hardware support was added to ARM (again, because of the architecture style), it came with its own set of registers (s0, s1, ...).
If we compile the same snippet with -c -O3 -mfloat-abi=softfp and dump it, we get
00000000 <pi2>:
0: eddf 7a04 vldr s15, [pc, #16] ; 14 <pi2+0x14>
4: ee07 0a10 vmov s14, r0
8: ee27 7a27 vmul.f32 s14, s14, s15
c: ee17 0a10 vmov r0, s14
10: 4770 bx lr
12: bf00 nop
14: 4048f5c3 .word 0x4048f5c3
As you can see, the compiler now doesn't create a call to __aeabi_fmul; instead it moves the argument from r0 to s14, loads 3.14 into s15 and emits a vmul.f32 instruction. After the multiplication it moves the result from s14 back to r0, since any caller of this function expects it there because of the calling convention.
Now if you think of pi2 as a library function provided to you by some third party, you can see that the soft and softfp implementations do the same thing from the caller's point of view, so you can use them interchangeably. If the system provides them for you, you don't need to care whether your app runs on a system with hardware floating-point support or not. This was a good way to keep old software running on new hardware.
However, while keeping compatibility, this approach introduces the overhead of moving values between ARM registers and FP registers. This obviously affects performance, and it is addressed by a new calling convention, called hard by gcc. This convention states that floating-point arguments are passed in the floating-point registers (s0-s15, or d0-d7 for doubles), independently of the integer arguments in r0-r3, and that floating-point values are returned in s0 (d0 for doubles).
Again if we compile our snippet with -c -O3 -mfloat-abi=hard and dump we get
00000000 <pi2>:
0: eddf 7a02 vldr s15, [pc, #8] ; c <pi2+0xc>
4: ee20 0a27 vmul.f32 s0, s0, s15
8: 4770 bx lr
a: bf00 nop
c: 4048f5c3 .word 0x4048f5c3
You can see there are no registers being moved around. The argument to pi2 is passed in s0; the compiler generates code to load 3.14 into s15 and uses vmul.f32 s0, s0, s15 to get the result we want in s0.
The big problem with this new convention is that while it improves the code produced by the compiler, it completely kills compatibility. You can't expect an application built with the hard convention to work with libraries built for soft/softfp, and an application built for softfp won't work with libraries built for hard.
For more information on calling conventions you should check ARM's website.
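Because such a mismatch between an application and the shared libraries it loads may not be reported at all (a plausible explanation for a wrong log10 result like the one in the question, if the target's system libraries use a different ABI than the toolchain), it can help to check which ABI a translation unit was actually compiled for. GCC predefines __ARM_PCS_VFP for the hard-float ABI and __SOFTFP__ for -mfloat-abi=soft; float_abi_name is a hypothetical helper. For existing binaries, readelf -A should show a Tag_ABI_VFP_args attribute when VFP registers are used for arguments.

```c
/* Report the floating-point ABI this file was compiled with, based on
   GCC/ACLE predefined macros (hypothetical helper). */
const char *float_abi_name(void)
{
#if defined(__arm__) && defined(__ARM_PCS_VFP)
    return "hard";            /* FP arguments/results in s/d registers */
#elif defined(__arm__) && defined(__SOFTFP__)
    return "soft";            /* FP done via library calls like __aeabi_fmul */
#elif defined(__arm__)
    return "softfp";          /* FP instructions, but FP arguments in r0-r3 */
#else
    return "not 32-bit ARM";  /* e.g. building this sketch on an x86 host */
#endif
}
```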

Is there a way to use gcc to convert C to MIPS?

I completed a C to MIPS conversion for a class, and I want to check it against the assembly. I have heard that there is a way of configuring gcc so that it can convert C code to the MIPS architecture rather than the x86 architecture (my computer uses an Intel i5 processor) and prints the output.
Running the terminal in Ubuntu (which comes with gcc), what command do I use to configure gcc to convert to MIPS? Is there anything I need to install as well?
EDIT:
Let me clarify. Please read this.
I'm not looking for which compiler to use, or people saying "well you could cross-compile, but instead you should use this other thing that has no instructions on how to set up."
If you're going to post that, at least refer me to instructions. GCC came with Ubuntu. I don't have experience on how to install compilers and it's not easy finding online tutorials for anything other than GCC. Then there's the case of cross-compiling I need to know about as well. Thank you.
GCC can produce assembly code for a large number of architectures, including MIPS. But which architecture a given GCC instance targets is decided when GCC itself is compiled. The precompiled binary you will find on an Ubuntu system knows about x86 (possibly both 32-bit and 64-bit modes) but not MIPS.
Compiling GCC with a target architecture distinct from the architecture on which GCC itself will be running is known as preparing a cross-compilation toolchain. This is doable but requires quite a bit of documentation-reading and patience; you usually need to first build a cross-assembler and cross-linker (GNU binutils), then build the cross-GCC itself.
I recommend using buildroot. This is a set of scripts and makefiles designed to help with the production of a complete cross-compilation toolchain and utilities. At the end of the day, you will get a complete OS and development tools for a target system. This includes the cross-compiler you are after.
Another quite different solution is to use QEMU. This is an emulator for various processors and systems, including MIPS systems. You can use it to run a virtual machine with a MIPS processor, and, within that machine, install an operating system for MIPS, e.g. Debian, a Linux distribution. This way, you get a native GCC (a GCC running on a MIPS system and producing code for MIPS).
The QEMU way might be a tad simpler; using cross-compilation requires some understanding of some hairy details. Either way, you will need about 1 GB of free disk space.
It's not a configuration thing, you need a version of GCC that cross-compiles to MIPS. This requires a special GCC build and is quite hairy to set up (building GCC is not for the faint of heart).
I'd recommend using LCC for this. It's way easier to do cross-compilation with LCC than it is with GCC, and building LCC is a matter of seconds on current machines.
For a one-time use for a small program or couple functions, you don't need to install anything locally.
Use Matt Godbolt's compiler explorer site, https://godbolt.org/, which has GCC and clang for various ISAs including MIPS and x86-64, and some other compilers.
Note that the compiler explorer by default filters directives so you can just see the instructions, leaving out stuff like alignment, sections, .globl, and so on. (For a function with no global / static data, this is actually fine, especially when you just want to use a compiler to make an example for you. The default section is .text anyway, if you don't use any directives.)
Most people that want MIPS asm for homework are using SPIM or MARS, usually without branch-delay slots. (Unlike real MIPS, so you need to tweak the compiler to not take advantage of the next instruction after a branch running unconditionally, even when it's taken.) For GCC, the option is -fno-delayed-branch - that will fill every delay slot with a NOP, so the code will still run on a real MIPS. You can just manually remove all the NOPs.
There may be other tweaks needed; for example, MARS may require you to use jr $31 instead of j $31 (see Tweak mips-gcc output to work with MARS). And of course I/O code will have to be implemented using MARS's toy system calls, not jal calls to standard library functions like printf or std::ostream::operator<<. You can usefully compile (and hand-tweak) asm for manipulating data, like multiplying integers or summing or reversing an array, though.
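As an example of the kind of data-manipulation function that is worth feeding to the compiler for a MARS/SPIM exercise, here is a hypothetical reverse_array. It uses only loads, stores and integer arithmetic, with no library calls, so compiling it on Godbolt with flags like those used below (-O3 -march=mips32r2 -fno-delayed-branch) should give asm you can adapt by hand, although tweaks like the ones just mentioned may still be needed.

```c
/* Reverse n ints in place by swapping from both ends (hypothetical
   example; no calls, so the asm is easy to port to MARS). */
void reverse_array(int *a, int n)
{
    int i = 0;
    int j = n - 1;
    while (i < j) {
        int tmp = a[i];   /* lw */
        a[i] = a[j];      /* lw + sw */
        a[j] = tmp;       /* sw */
        i++;
        j--;
    }
}
```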
Unfortunately GCC doesn't have an option to use register names like $a0 instead of $r. For PowerPC there's -mregnames to use r1 instead of 1, but no similar option for MIPS to use "more symbolic" reg names.
int maybe_square(int num) {
if (num>0)
return num;
return num * num;
}
On Godbolt with GCC 5.4 -xc -O3 -march=mips32r2 -Wall -fverbose-asm -fno-delayed-branch
-xc compiles as C, not C++, because I find that more convenient than flipping between the C and C++ languages in the dropdown and having the site erase my source code.
-fverbose-asm comments the asm with C variable names for the destination and sources. (In optimized code that's often an invented temporary, but not always.)
-O3 enables full optimization, because the default -O0 debug mode is a horrible mess for humans to read. Always use at least -Og if you want to look at the code by hand and see how it implements the source (see How to remove "noise" from GCC/clang assembly output?). You might also use -fno-unroll-loops, and -fno-tree-vectorize if compiling for an ISA with SIMD instructions.
This uses mul instead of the classic MIPS mult + mflo, thanks to the -march= option to tell GCC we're compiling for a later MIPS ISA, not whatever the default baseline is. (Perhaps MIPS I aka R2000, -march=mips1)
See also the GCC manual's section on MIPS target options.
# gcc 5.4 -O3
maybe_square:
blez $4,$L5
nop
move $2,$4 # D.1492, num # retval = num
j $31 # jr $ra = return
nop
$L5:
mul $2,$4,$4 # D.1492, num, num # retval = num * num
j $31 # jr $ra = return
nop
Or with clang, use -target mips to tell it to compile for MIPS. You can do this on your desktop; unlike GCC, clang is normally built with multiple back-ends enabled.
From the same Godbolt link, clang 10.1 -xc -O3 -target mips -Wall -fverbose-asm -fomit-frame-pointer. The default target is apparently MIPS32 or something like that for clang. Also, clang defaults to enabling frame pointers for MIPS, making the asm noisy.
Note that it chose to make branchless asm, doing if-conversion into a conditional-move to select between the original input and the mul result. Unfortunately clang doesn't support -fno-delayed-branch; maybe it has another name for the same option, or maybe there's no hope.
maybe_square:
slti $1, $4, 1
addiu $2, $zero, 1
movn $2, $4, $1 # conditional move based on $1
jr $ra
mul $2, $2, $4 # in the branch delay slot
In this case we can simply put the mul before the jr, but in other cases converting to no-branch-delay asm is not totally trivial. e.g. branch on a loop counter before decrementing it can't be undone by putting the decrement first; that would change the meaning.
Register names:
Compilers use register numbers, not bothering with names. For human use, you will often want to translate back. Many places online have MIPS register tables that show how $4..$7 are $a0..$a3, $8 .. $15 are $t0 .. $t7, etc. For example this one.
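If you translate a lot of compiler output by hand, a lookup table like the following can map register numbers back to names. It assumes the o32 ABI (the n32/n64 ABIs rename some registers, e.g. $8-$11 become $a4-$a7); mips_reg_name is a hypothetical helper. $30 is listed as $fp, which objdump prints as s8.

```c
#include <stddef.h>

/* Conventional o32 names for MIPS general-purpose registers $0..$31. */
static const char *const mips_gpr_names[32] = {
    "$zero", "$at", "$v0", "$v1", "$a0", "$a1", "$a2", "$a3",
    "$t0",   "$t1", "$t2", "$t3", "$t4", "$t5", "$t6", "$t7",
    "$s0",   "$s1", "$s2", "$s3", "$s4", "$s5", "$s6", "$s7",
    "$t8",   "$t9", "$k0", "$k1", "$gp", "$sp", "$fp", "$ra"
};

/* Symbolic name for register number n, or NULL if n is out of range
   (hypothetical helper). */
const char *mips_reg_name(unsigned n)
{
    return n < 32 ? mips_gpr_names[n] : NULL;
}
```

For example, mips_reg_name(4) is "$a0" and mips_reg_name(31) is "$ra", so the j $31 in the GCC output above is jr $ra.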
You should install a cross-compiler from the Ubuntu repositories. GCC MIPS C cross-compilers are available in the repositories. Pick according to your needs:
gcc-mips-linux-gnu - 32-bit big-endian.
gcc-mipsel-linux-gnu - 32-bit little-endian.
gcc-mips64-linux-gnuabi64 - 64-bit big-endian.
gcc-mips64el-linux-gnuabi64 - 64-bit little-endian.
etc.
(Note for users of Ubuntu 20.10 (Groovy Gorilla) or later, and Debian users: if you usually like to install your regular compilers using the build-essential package, you would be interested to know of the existence of crossbuild-essential-mips, crossbuild-essential-mipsel, crossbuild-essential-mips64el, etc.)
In the following examples, I will assume that you chose the 32-bit little-endian version (sudo apt-get install gcc-mipsel-linux-gnu). The commands for other MIPS versions are similar.
To deal with MIPS instead of the native architecture of your system, use the mipsel-linux-gnu-gcc command instead of gcc. For example, mipsel-linux-gnu-gcc -fverbose-asm -S myprog.c produces a file myprog.s containing MIPS assembly.
Another way to see the MIPS assembly: run mipsel-linux-gnu-gcc -g -c myprog.c to produce an object file myprog.o that contains debugging information. Then view the disassembly of the object file using mipsel-linux-gnu-objdump -d -S myprog.o. For example, if myprog.c is this:
#include <stdio.h>
int main()
{
int a = 1;
int b = 2;
printf("The answer is: %d\n", a + b);
return 0;
}
And if it is compiled using mipsel-linux-gnu-gcc -g -c myprog.c, then mipsel-linux-gnu-objdump -d -S myprog.o will show something like this:
myprog.o: file format elf32-tradlittlemips
Disassembly of section .text:
00000000 <main>:
#include <stdio.h>
int main() {
0: 27bdffd8 addiu sp,sp,-40
4: afbf0024 sw ra,36(sp)
8: afbe0020 sw s8,32(sp)
c: 03a0f025 move s8,sp
10: 3c1c0000 lui gp,0x0
14: 279c0000 addiu gp,gp,0
18: afbc0010 sw gp,16(sp)
int a = 1;
1c: 24020001 li v0,1
20: afc20018 sw v0,24(s8)
int b = 2;
24: 24020002 li v0,2
28: afc2001c sw v0,28(s8)
printf("The answer is: %d\n", a + b);
2c: 8fc30018 lw v1,24(s8)
30: 8fc2001c lw v0,28(s8)
34: 00621021 addu v0,v1,v0
38: 00402825 move a1,v0
3c: 3c020000 lui v0,0x0
40: 24440000 addiu a0,v0,0
44: 8f820000 lw v0,0(gp)
48: 0040c825 move t9,v0
4c: 0320f809 jalr t9
50: 00000000 nop
54: 8fdc0010 lw gp,16(s8)
return 0;
58: 00001025 move v0,zero
}
5c: 03c0e825 move sp,s8
60: 8fbf0024 lw ra,36(sp)
64: 8fbe0020 lw s8,32(sp)
68: 27bd0028 addiu sp,sp,40
6c: 03e00008 jr ra
70: 00000000 nop
...
You would need to download the source to binutils and gcc-core and compile with something like ../configure --target=mips .... You may need to choose a specific MIPS target. Then you could use mips-gcc -S.
You can cross-compile the GCC so that it generates MIPS code instead of x86. That's a nice learning experience.
If you want quick results you can also get a prebuilt GCC with MIPS support. One is the CodeSourcery Lite Toolchain. It is free, comes for a lot of architectures (including MIPS) and they have ready to use binaries for Linux and Windows.
http://www.codesourcery.com/sgpp/lite/mips/portal/subscription?#template=lite
You should compile your own version of gcc that is able to cross-compile. Of course this isn't easy, so you could look for a different approach, for example this SDK.

ARM Cortex-A8: How to make use of both NEON and vfpv3

I'm using Cortex-A8 processor and I'm not understanding how to use the -mfpu flag.
On the Cortex-A8 there are both VFPv3 and NEON co-processors. Previously I didn't know how to use NEON, so I was only using
gcc -marm -mfloat-abi=softfp -mfpu=vfpv3
Now I have understood how SIMD processors run and I have written certain code using NEON intrinsics. To use neon co-processor now my -mfpu flag has to change to -mfpu=neon, so my compiler command line looks like this
gcc -marm -mfloat-abi=softfp -mfpu=neon
Now, does this mean that VFPv3 is no longer used? I have lots of code that doesn't use NEON; do those parts no longer use VFPv3?
If both neon and vfpv3 are still used then I have no issues, but if only one of them is used how can I make use of both?
NEON implies having the traditional VFP support too. VFP can be used for "normal" (non-vector) floating-point calculations. Also, NEON does not support double-precision FP so only VFP instructions can be used for that.
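What you can do is add -S to gcc's command line and check the assembly. Instructions starting with V (e.g. vld1.32, vmla.f32) are NEON instructions, and those starting with F (fldd, fmacd) are VFP. (ARM docs now prefer the V prefix even for VFP instructions, and newer GCC versions emit it too, e.g. vmul.f32 s14, s14, s15. In that case, tell them apart by their operands: scalar operations on s registers and .f64 operations on d registers are VFP, while operations on q registers, vector .f32/.i32 operations on d registers, and vld1/vst1 are NEON.)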
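To illustrate, here is a sketch of a file that mixes both kinds of code. When built with -mfpu=neon, sum_f32 uses NEON intrinsics (vld1q_f32, vaddq_f32), while mean_f64 is ordinary double-precision C which, on ARMv7, can only be compiled to VFP instructions. The -S output should then show q-register NEON instructions in the first function and .f64 VFP instructions in the second. The function names are hypothetical, and the #if guards let the file build on non-ARM hosts too.

```c
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Sum n floats: 4-lane NEON adds for the bulk when available,
   scalar code (VFP on ARM) for the tail and for non-NEON builds. */
float sum_f32(const float *x, int n)
{
    int i = 0;
    float total = 0.0f;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    float32x4_t acc = vdupq_n_f32(0.0f);
    for (; i + 4 <= n; i += 4)
        acc = vaddq_f32(acc, vld1q_f32(x + i));
    total = vgetq_lane_f32(acc, 0) + vgetq_lane_f32(acc, 1)
          + vgetq_lane_f32(acc, 2) + vgetq_lane_f32(acc, 3);
#endif
    for (; i < n; i++)
        total += x[i];
    return total;
}

/* Double-precision mean: ARMv7 NEON has no f64 lanes, so with
   -mfpu=neon this is compiled to VFP instructions (vadd.f64 etc.). */
double mean_f64(const double *x, int n)
{
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += x[i];
    return n > 0 ? s / n : 0.0;
}
```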
