I'm very new to gdb. I wrote a very simple hello world program:
#include <stdio.h>
int main() {
printf("Hello world\n");
return 0;
}
I compiled it with -g to add debugging symbols:
gcc -g -o hello hello.c
I'm not sure what to do next, since I'm not familiar with gdb. I'd like to use gdb to inspect the assembly code; that's what I was told to do on IRC.
First, start the program so that it stops exactly at the beginning of the main function.
(gdb) start
Switch to the assembly layout to see the assembly instructions interactively in a separate window.
(gdb) layout asm
Use the stepi or nexti commands to step through the program one instruction at a time. You will see the current instruction pointer move in the assembly window as you step over the instructions of your program.
printf is pretty much the last function you would want to use to learn assembly. Library calls would come later, and you don't need library or system calls at all to learn an instruction set. Stepping through printf in a debugger will also lead you into a rat's nest of system calls. Try something like this instead, particularly if you want to learn assembly language from this exercise:
unsigned int fun ( unsigned int a, unsigned int b )
{
return(a^b^3);
}
gcc -O2 -c so.c -o so.o
objdump -D so.o
Disassembly of section .text:
0000000000000000 <fun>:
0: 89 f0 mov %esi,%eax
2: 83 f0 03 xor $0x3,%eax
5: 31 f8 xor %edi,%eax
7: c3 retq
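To see how the three x86 instructions above implement fun, here is a minimal C sketch that re-expresses each instruction as one statement on a variable named after its register. The name fun_as_x86 is hypothetical, and the sketch assumes the System V x86-64 convention visible in the listing (a arrives in %edi, b in %esi, the result is returned in %eax).

```c
/* Same function as above: result = a ^ b ^ 3. */
unsigned int fun(unsigned int a, unsigned int b)
{
    return (a ^ b ^ 3);
}

/* Hypothetical helper: one C statement per instruction in the
   objdump listing, with variables named after the x86 registers. */
unsigned int fun_as_x86(unsigned int edi, unsigned int esi)
{
    unsigned int eax;
    eax = esi;    /* mov  %esi,%eax  (AT&T order: source, destination) */
    eax ^= 0x3;   /* xor  $0x3,%eax                                    */
    eax ^= edi;   /* xor  %edi,%eax                                    */
    return eax;   /* retq: the caller reads the result from %eax       */
}
```

Because XOR is associative and commutative, the compiler is free to apply the constant to b before combining it with a; the Thumb listing that follows does the same thing.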
I highly recommend you avoid x86 as your first instruction set. Try something cleaner...
arm-none-eabi-gcc -O2 -c so.c -o so.o
arm-none-eabi-gcc -O2 -c -mthumb so.c -o so.o
arm-none-eabi-objdump -D so.o
00000000 <fun>:
0: 2303 movs r3, #3
2: 4059 eors r1, r3
4: 4048 eors r0, r1
6: 4770 bx lr
msp430-gcc -O2 -c so.c -o so.o
msp430-objdump -D so.o
00000000 <fun>:
0: 3f e0 03 00 xor #3, r15 ;#0x0003
4: 0f ee xor r14, r15
6: 30 41 ret
I am dead serious about this one being the first instruction set. msp430 is close to it, but this one makes the most sense. Unfortunately, the gnu assembler syntax doesn't match the books, and, also unfortunately, the world thought in octal then and we think in hex now...
pdp11-aout-gcc -O2 -c so.c -o so.o
pdp11-aout-objdump -D so.o
00000000 <_fun>:
0: 1166 mov r5, -(sp)
2: 1185 mov sp, r5
4: 15c0 0003 mov $3, r0
8: 1d41 0006 mov 6(r5), r1
c: 7840 xor r1, r0
e: 1d41 0004 mov 4(r5), r1
12: 7840 xor r1, r0
14: 1585 mov (sp)+, r5
16: 0087 rts pc
Nice simulators or hardware exist for all of these; it's best to learn in a simulator rather than on real hardware...
Most of the instruction sets I know, I learned by writing a disassembler; arm and thumb fall into this category because they have fixed instruction lengths (if you avoid the thumb2 extensions). Or just write a simulator; msp430 and pdp11 fall into this category. Either of the latter is an afternoon project; either of the former is a long-weekend project. You will know each instruction set better than the average person, even better than some who have been programming in it for a while.
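As a taste of the "write a simulator" exercise, here is a minimal C sketch that executes only the four Thumb encodings from the arm-none-eabi-objdump listing above (MOVS with an 8-bit immediate, EORS between low registers, and BX). The names cpu_t, thumb_run, fun_code and run_fun are hypothetical; condition flags, memory, and every other instruction are deliberately left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy CPU: register file only; flags and memory are not modelled. */
typedef struct {
    uint32_t r[16];   /* r0-r15, with r14 = lr */
} cpu_t;

/* Runs code[] from the first halfword. Returns 0 when "bx lr" is
   reached, or -1 on an encoding this sketch does not handle. */
int thumb_run(cpu_t *cpu, const uint16_t *code, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        uint16_t insn = code[i];
        if ((insn & 0xF800u) == 0x2000u) {
            /* MOVS Rd, #imm8: 00100 ddd iiiiiiii */
            cpu->r[(insn >> 8) & 7u] = insn & 0xFFu;
        } else if ((insn & 0xFFC0u) == 0x4040u) {
            /* EORS Rdn, Rm: 0100000001 mmm ddd */
            cpu->r[insn & 7u] ^= cpu->r[(insn >> 3) & 7u];
        } else if ((insn & 0xFF87u) == 0x4700u) {
            /* BX Rm: 010001110 mmmm 000; only "bx lr" is treated as a return */
            return ((insn >> 3) & 0xFu) == 14u ? 0 : -1;
        } else {
            return -1;
        }
    }
    return -1;   /* fell off the end without returning */
}

/* The four halfwords of fun() from the Thumb listing above. */
static const uint16_t fun_code[] = { 0x2303, 0x4059, 0x4048, 0x4770 };

/* Calls fun(a, b) on the toy CPU: arguments in r0/r1, result in r0.
   The status is ignored because fun_code is known to end in bx lr. */
uint32_t run_fun(uint32_t a, uint32_t b)
{
    cpu_t cpu = {{0}};
    cpu.r[0] = a;
    cpu.r[1] = b;
    (void)thumb_run(&cpu, fun_code, sizeof fun_code / sizeof fun_code[0]);
    return cpu.r[0];
}
```

Extending it is mostly a matter of adding more mask/value pairs from the architecture manual, which is exactly how you end up learning the encodings.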
If you insist on x86 (I strongly urge you away from this), use an 8086/8088 simulator like pcemu and stick to the original instruction set, using nasm or a86 or whatever as needed. It was not a nice instruction set even back then, but the original makes more sense than today's. bitsavers has nice searchable scans of the original intel documents; that is the best place to start.
arm docs are at arm (look for the architectural reference manual for armv5, as I think they call it now). For msp430, just look at wikipedia; the instruction set is there. For pdp11, google it, and figure out the syntax by going from C to machine code to disassembly.
If you really, really want to have fun, get the amber core from opencores. It is an arm2/3 (almost all of its instructions are the same as in armv4 and later), and you can use the gnu tools with it. Use verilator to build and simulate it and see a working processor from the inside. Understand that, just as giving 100 programmers the same programming task gets you anywhere from 1 to 100 different solutions, giving 100 engineers an instruction set to implement gets you anywhere from 1 to 100 different implementations. Arm itself has redesigned its cores for the same instruction sets several times over, not to mention the few legal clones.
Recommended order: pdp11, msp430, thumb, arm, then mips, and, if you still feel you need to, disassemble some x86. PIC12/14 is simple and educational (it should take you about half an hour to an hour to write a simulator for it). 6502, z80, 8051, 6800 and a number of others are, like x86, historically educational to read the documentation for, but not necessary to write programs for. If you start with a good one, each subsequent instruction set is that much easier from the second one on. They are more alike than different, but you do get to see different things, like how to do things without flags in mips, etc. I have left out several other instruction sets that are either still available in silicon or are interesting for various reasons.
Another approach is to install clang/llvm and take a quick or longer look at every instruction set that llc can produce (compile to bitcode, then use llc as the backend for whatever instruction set you want). As above, taking the same code and seeing what it looks like in different instruction sets, at least with that compiler and its settings, is very educational and helps you get a feel for how to break programming tasks down into these atomic steps.
Related
I have an empty program in LLVM IR:
define i32 @main(i32 %argc, i8** %argv) nounwind {
entry:
ret i32 0
}
I'm cross-compiling it on Intel x86-64 Windows for ARM Linux using ELLCC, with the following command:
ecc++ hw.ll -o hw.o -target arm-linux-engeabihf
It completes without errors and generates an ELF binary.
When I take the binary to a Raspberry Pi Model B+ (running Raspbian), I get only the following error:
Illegal instruction
I don't know how to tell what's wrong from the disassembled code. I tried other ARM Linux targets but the behavior was the same. What's wrong?
The exact same file builds, links and runs fine for other targets like i386-linux-eng, x86_64-w64-mingw32, etc (that I could test on), again using the ELLCC toolchain.
Assuming the library and startup code isn't at fault, this is what the disassembly of main itself looks like:
.text:00010188 e24dd008 sub sp, sp, #8
.text:0001018c e3002000 movw r2, #0
.text:00010190 e58d0004 str r0, [sp, #4]
.text:00010194 e1a00002 mov r0, r2
.text:00010198 e58d1000 str r1, [sp]
.text:0001019c e28dd008 add sp, sp, #8
.text:000101a0 e12fff1e bx lr
I'd guess it's choking on the movw at 0x0001018c. The movw/movt encodings, which can handle full 16-bit immediate values, first appeared in the ARMv6T2 version of the architecture; the ARM1176 in the original Pi models predates that, supporting only the original ARMv6*.
You need to tell the compiler to generate code appropriate to the thing you're running on - I don't know ELLCC, but I'd guess from this it's fairly modern and up-to-date and thus defaulting to something newer like ARMv6T2 or ARMv7. Otherwise, it's akin to generating code for a Pentium and hoping it works on an 80486 - you might be lucky, you might not. That said, there's no good reason it should have chosen that encoding in the first place - it's not as if 0 can't be encoded in a 'classic' mov instruction...
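To see why movw was never needed for 0, recall that a classic ARM-state data-processing instruction such as mov can encode any 32-bit constant of the form "an 8-bit value rotated right by an even amount", while movw takes any 16-bit value. Here is a minimal C sketch of both checks; the function names are hypothetical, and it ignores the mvn trick (encoding the bitwise inverse) that assemblers also use.

```c
#include <stdbool.h>
#include <stdint.h>

/* Rotate left; n is taken modulo 32. */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31u;
    return n ? (x << n) | (x >> (32u - n)) : x;
}

/* Hypothetical check: true if value fits a classic ARM-state "mov rd, #imm",
   i.e. value == imm8 ROR (2 * rot) for some imm8 <= 0xFF and rot in 0..15.
   Undoing the rotation (rotating left) must leave at most 8 bits. */
bool arm_classic_imm_ok(uint32_t value)
{
    for (unsigned rot = 0; rot < 32u; rot += 2u) {
        if (rotl32(value, rot) <= 0xFFu)
            return true;
    }
    return false;
}

/* Hypothetical check: true if value fits movw's 16-bit immediate
   (available from ARMv6T2 onwards). */
bool arm_movw_imm_ok(uint32_t value)
{
    return value <= 0xFFFFu;
}
```

For example, 0xF000000F passes the classic check because the rotation wraps around bit 0, while 0x101 fails it and needs movw on cores that have it.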
The decadent option, however, would be to consider this a perfect excuse to replace the Pi with a Pi 2 - the Cortex-A7s in that are nice capable ARMv7 cores ;)
* Lies for clarity. I think the 1176 might actually be v6K, but that's irrelevant here. I'm not sure anything actually exists as plain ARMv6, and all the various architecture extensions are frankly a hideous mess.
I'm currently having a weird issue when trying to run a C program that calls a very simple ARM assembly function. Here's my C code:
#include <stdio.h>
#include <stdlib.h>
extern void getNumber(int* pointer);
int main()
{
int* pointer = malloc(sizeof(int));
getNumber(pointer);
printf("%d\n", *pointer);
return 0;
}
And here's my assembly code:
.section .text
.align 4
.arm
.global getNumber
.type getNumber STT_FUNC
getNumber:
mov r1, #0
str r1, [r0]
bx lr
So far so good. However, if I add a line with mov r7, #0 at the top of getNumber, the program segfaults when trying to access pointer. After inspecting it with gdb I noticed now the pointer itself is stored at a very low address, such as 0xa.
Now, I did a bit of research and apparently r7 is the frame pointer for THUMB code (according to this). However, I'm clearly stating I don't want to use THUMB instructions in the .arm line in my assembly code. Why on earth is it failing?
I'm compiling both the .c and .s files using arm-linux-gnueabihf-gcc, and I'm running the program on a Cortex-A8 based board running Arch Linux.
Edit: The program runs fine if I compile using the -fomit-frame-pointer flag. However, I still want to know why it is using r7 as the frame pointer.
Edit 2: It's still failing even if I use .code 32 instead of .arm.
The ARM Procedure Call Standard specifies the following:
A subroutine must preserve the contents of the registers r4-r8, r10, r11 and SP (and r9 in PCS variants that designate r9 as v6).
So your assembly language subroutine must save & restore r7 if it uses it.
You might avoid the problem in your small test program by not compiling for Thumb mode, but you're only avoiding it by accident. Anything that links to your assembly routine is entitled to expect that r7 will be preserved.
You're crashing the program because you are corrupting the frame pointer, as you mentioned. There is no deep rhyme or reason to the convention; ARM simply reserves certain registers for certain things, much like esp is the stack pointer on x86.
Here's a pretty good reference for registers to avoid:
http://msdn.microsoft.com/en-us/library/ms253599(v=vs.80).aspx
I finally got it: doing $ arm-linux-gnueabihf-gcc -v showed me the default options my compiler is using. Among those is: --with-mode=thumb.
Compiling with -marm fixed it. Now it's working as intended!
Edit: Upon reading the comments here I realize I was mistaken. I should've saved/restored r7 so it wouldn't screw up the rest of my program. Good thing I learned this now with a toy project and not while working on something real!
I've spent a great deal of time reading the LLVM source tree. It is quite an impressive piece of engineering!
Anyhow, I have been trying to convert some MachO Arm Binaries that I have into the LLVM bitcode for basic static analysis. Mainly, I'd like to create backwards static slices on certain calls depending on which registers are used. Additionally, I am trying to do forward propagation of obvious constants (for instance, loading a function name from the symbol table and passing to a register).
At this point, I have been able to dump a file and parse it in native ARM assembly using this command line:
bash-3.2$ llvm-objdump -d ~/code/osx/HelloWorldThin -triple=thumb -mattr=+thumb2,+32bit,+v7,+v6t2,+thumb-mode,+neon
/Users/steve/code/osx/HelloWorldThin: file format Mach-O arm
Disassembly of section __TEXT,__text:
_main:
2fd4: f0 b5 push {r4, r5, r6, r7, lr}
2fd6: 03 af add r7, sp, #12
2fd8: 4d f8 04 8d str r8, [sp, #-4]!
2fdc: 0d 46 mov r5, r1
2fde: 06 46 mov r6, r0
2fe0: 00 f0 fe ef blx #4092
...snipped...
This is great, as it saves me a bunch of time writing a parser!
After looking through MachODump.cpp, I see that these are lowered to MCInst, which from the way I understand it, is just a parsed opcode with parameters.
So my questions are:
1) Is there a way to convert from ARM to LLVM (for optimization passes, etc)? There is no need to emit back to ARM, only a need to have an analysis result.
1.5) I notice all the analysis operations operate on Instruction instead of MCInst, is there a way to type promote and provide the required information?
2) Is there a way to emulate/simulate ARM or LLVM instructions? I ask because things like slicing and constant propagation need dataflow analysis in order to determine what contents are in memory and registers.
Operations like this, require tracking the way data is loaded and stored from memory, along with registers. Can LLVM understand the side effects of these instructions for analysis?
__text:000032DE LDR R1, [R0] ; "viewDidLoad"
__text:000032E0 MOV R0, SP
__text:000032E2 BLX _objc_msgSendSuper2
3) If it seems like I have a fundamental misunderstanding of something going on in LLVM, I'd love any feedback.
Thanks and let me know if I can provide any more information about my problem.
For the purpose of static analysis of ARM binaries, it is better to translate the semantics of each ARM instruction directly to LLVM IR and apply data-flow analysis on the latter. For example, an ADD rd, rd, rm in ARM can be translated to the LLVM IR %rd2 = add i32 %rd1, %rm1.
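The per-instruction translation can be sketched in a few lines: keep an SSA version counter per ARM register and emit one LLVM IR instruction per ARM instruction. This minimal C sketch handles only a register-form ADD; the lifter struct, the %rN_V naming scheme and the function names are hypothetical, and flags, conditional execution and the barrel shifter are ignored.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical lifter state: one SSA version counter per ARM core
   register r0-r15. Version 0 stands for the value on function entry. */
struct lifter {
    unsigned ver[16];
};

void lifter_init(struct lifter *l)
{
    memset(l->ver, 0, sizeof l->ver);
}

/* Emits LLVM IR text for "ADD rd, rn, rm" into out (size n).
   Sources are read at their current versions before rd gets a new one,
   so "ADD r0, r0, r1" becomes "%r0_1 = add i32 %r0_0, %r1_0".
   Returns the snprintf result. */
int lift_add(struct lifter *l, unsigned rd, unsigned rn, unsigned rm,
             char *out, size_t n)
{
    unsigned vn = l->ver[rn & 15u];
    unsigned vm = l->ver[rm & 15u];
    unsigned vd = ++l->ver[rd & 15u];
    return snprintf(out, n, "%%r%u_%u = add i32 %%r%u_%u, %%r%u_%u",
                    rd & 15u, vd, rn & 15u, vn, rm & 15u, vm);
}
```

Because every definition gets a fresh name, the emitted IR is already in SSA form, which is what LLVM's data-flow machinery expects.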
Decompilation of ARM machine code to C (for the purpose of recompiling it back to LLVM IR) is both cumbersome and unnecessary. Note that the focus of decompilers like IDA Pro is on binary understanding and not on recompilation per se. Therefore, you would have a hard time recompiling the software, and an even harder time linking your analysis results back to the original binary.
The following links might be useful:
Fracture is an open source project attempting to directly translate ARM binaries to LLVM IR.
LLBT: is a research project that implemented ARM translation to LLVM IR. Their goal, however, is on static binary rewriting rather than binary analysis.
Note that you need a robust disassembler if you are considering analyzing stripped binaries. objdump can produce too many disassembly errors on binaries without symbols.
I'm in the early phases of a research project where we develop a processor description language that can make describing instruction semantics in LLVM IR easier. I'll update this answer when we have more results.
For (1) - not within the framework of LLVM. There's no "decompiler" in there. You're free to use an external decompiler that translates machine code into C, and then compile that into LLVM IR with clang. YMMV with regards to the quality of such a translation, of course.
(1.5) If I understand what you're asking, then no. Instruction and MCInst are quite different animals, very far apart in their abstraction levels. Read this: http://eli.thegreenplace.net/2012/11/24/life-of-an-instruction-in-llvm/
(2) Yes, LLVM has an interpreter you can use from the lli tool. It directly "emulates" LLVM IR without lowering it.
I have this very simple code:
#include <stdio.h>
#include <math.h>
int main()
{
long v = 35;
double app = (double)v;
app /= 100;
app = log10(app);
printf("Calculated log10 %lf\n", app);
return 0;
}
This code works perfectly on x86, but doesn't work on arm, on which the result is 0.00000. Some ideas?
Other info:
Operating system: linux 3.2.27
I build arm toolchain with ct-ng: arm-unknown-linux-gnueabi-
libc version 2.13
Output of gcc -v:
Using built-in specs.
COLLECT_GCC=arm-unknown-linux-gnueabi-gcc
COLLECT_LTO_WRAPPER=/opt/x-tools/arm-unknown-linux-gnueabi/libexec/gcc/arm-unknown-linux-gnueabi/4.5.1/lto-wrapper
Target: arm-unknown-linux-gnueabi
Configured with: /home/mirko/misc/rasppi-ct-ng-files/.build/src/gcc-4.5.1/configure --build=x86_64-build_unknown-linux-gnu --host=x86_64-build_unknown-linux-gnu --target=arm-unknown-linux-gnueabi --prefix=/opt/x-tools/arm-unknown-linux-gnueabi --with-sysroot=/opt/x-tools/arm-unknown-linux-gnueabi/arm-unknown-linux-gnueabi//sys-root --enable-languages=c --disable-multilib --with-pkgversion=crosstool-NG-1.9.3 --enable-__cxa_atexit --disable-libmudflap --disable-libgomp --disable-libssp --with-host-libstdcxx='-static-libgcc -Wl,-Bstatic,-lstdc++,-Bdynamic -lm' --with-gmp=/home/mirko/misc/rasppi-ct-ng-files/.build/arm-unknown-linux-gnueabi/build/static --with-mpfr=/home/mirko/misc/rasppi-ct-ng-files/.build/arm-unknown-linux-gnueabi/build/static --with-mpc=/home/mirko/misc/rasppi-ct-ng-files/.build/arm-unknown-linux-gnueabi/build/static --with-ppl=/home/mirko/misc/rasppi-ct-ng-files/.build/arm-unknown-linux-gnueabi/build/static --with-cloog=/home/mirko/misc/rasppi-ct-ng-files/.build/arm-unknown-linux-gnueabi/build/static --with-libelf=/home/mirko/misc/rasppi-ct-ng-files/.build/arm-unknown-linux-gnueabi/build/static --enable-threads=posix --enable-target-optspace --with-local-prefix=/opt/x-tools/arm-unknown-linux-gnueabi/arm-unknown-linux-gnueabi//sys-root --disable-nls --enable-symvers=gnu --enable-c99 --enable-long-long
Thread model: posix
gcc version 4.5.1 (crosstool-NG-1.9.3)
Floating-point support on ARM Linux distributions is not trivial. Because of that, you should use a toolchain that matches your system, meaning both its operating system and its hardware, and use the right compiler switches.
First, you need to understand ARM's calling convention, which is about how arguments are passed when you call a function. ARM, being a RISC architecture, can only compute on registers; apart from loads and stores, there are no instructions that manipulate memory directly. If you need to change a value in memory, you first need to load it into a register, modify it, and then store it back to memory.
When you call a function, you may need to pass arguments to it. You could put the arguments on the stack (memory), but since ARM can only compute on registers, the first thing your function would probably do is load them back into registers. To avoid this waste, the ARM calling convention uses registers to pass arguments. However, since ARM has a limited number of registers, the calling convention uses only the first four registers (r0-r3) for the first four arguments; the remaining ones must still be passed on the stack.
Second, early ARM cores didn't have any floating-point support; floating-point operations were implemented in software. (This is what is still supported via gcc's -mfloat-abi=soft.)
We can easily demonstrate what this means with the following snippet.
float pi2(float a) {
return a * 3.14f;
}
Compiling this with -c -O3 -mfloat-abi=soft and objdumping the result gives us
00000000 <pi2>:
0: f24f 51c3 movw r1, #62915 ; 0xf5c3
4: b508 push {r3, lr}
6: f2c4 0148 movt r1, #16456 ; 0x4048
a: f7ff fffe bl 0 <__aeabi_fmul>
e: bd08 pop {r3, pc}
As you can see (actually, it is not visible :) ), pi2 gets its parameter in r0, loads the 3.14 constant into r1, and uses __aeabi_fmul to multiply the two and return the result in r0. Since __aeabi_fmul uses the same calling convention, the details about r0 are not visible. All our function does is populate r1 and delegate the work to __aeabi_fmul.
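The two halves loaded by movw/movt above (62915 = 0xf5c3 and 16456 = 0x4048) are simply the IEEE 754 single-precision bit pattern of 3.14f, the same 0x4048f5c3 word that appears in the later listings. A minimal C sketch to confirm this; float_bits and movw_movt_halves are hypothetical helpers, and the sketch assumes float is the 32-bit IEEE 754 format, as it is on ARM and x86.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterprets the bits of a float without
   violating strict-aliasing rules. */
uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Hypothetical helper: splits a 32-bit constant the way movw/movt load it.
   movw writes the low 16 bits (clearing the top half), then movt writes
   the high 16 bits. */
void movw_movt_halves(uint32_t v, uint16_t *lo, uint16_t *hi)
{
    *lo = (uint16_t)(v & 0xFFFFu);
    *hi = (uint16_t)(v >> 16);
}
```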
When hardware floating-point support was added to ARM, it came (again because of the architecture style) with its own set of registers (s0, s1, ...).
If we compile the same snippet with -c -O3 -mfloat-abi=softfp and dump it, we get
00000000 <pi2>:
0: eddf 7a04 vldr s15, [pc, #16] ; 14 <pi2+0x14>
4: ee07 0a10 vmov s14, r0
8: ee27 7a27 vmul.f32 s14, s14, s15
c: ee17 0a10 vmov r0, s14
10: 4770 bx lr
12: bf00 nop
14: 4048f5c3 .word 0x4048f5c3
As you can see, the compiler now doesn't emit a call to __aeabi_fmul; instead, it emits a vmul.f32 instruction after moving the argument from r0 to s14 and loading 3.14 into s15. After the multiplication, it moves the result from s14 back to r0, since any caller of this function expects it there because of the calling convention.
Now, if you think of pi2 as a library provided to you by some third party, you can see that the soft and softfp implementations do the same thing for you and can be used interchangeably. If the system provides them for you, you don't need to care whether your app runs on a system with hardware floating-point support or not. This was quite good for keeping old software running on new hardware.
However, while it keeps compatibility, this approach introduces the overhead of moving values between ARM registers and FP registers. This obviously affects performance, and it is addressed by a new calling convention, called hard by gcc. This convention states that if your function has floating-point arguments, they can be passed in floating-point registers alongside the normal ones, and floating-point values can be returned in the floating-point register s0.
Again, if we compile our snippet with -c -O3 -mfloat-abi=hard and dump it, we get
00000000 <pi2>:
0: eddf 7a02 vldr s15, [pc, #8] ; c <pi2+0xc>
4: ee20 0a27 vmul.f32 s0, s0, s15
8: 4770 bx lr
a: bf00 nop
c: 4048f5c3 .word 0x4048f5c3
You can see that no values are moved between registers. The argument to pi2 is passed in s0, the compiler emits code to load 3.14 into s15, and vmul.f32 s0, s0, s15 leaves the result we want in s0.
The big problem with this new convention is that, while it improves the code produced by the compiler, it completely kills compatibility. You can't expect an application built with the hard convention to work with libraries built for soft/softfp, and an application built for softfp won't work with libraries built for hard.
For more information on calling conventions you should check ARM's website.
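To make the incompatibility concrete, here is a toy C model, not real ARM code: a struct stands in for the register file, pi2_softfp and pi2_hard (hypothetical names) mimic the two listings above, and call_as_softfp (also hypothetical) plays a caller built for soft/softfp. When the conventions match, the caller gets 2 * 3.14; when a softfp caller calls the hard version, it reads r0, which the callee never wrote, and silently gets its own argument back. On real hardware the stale register could hold anything, so a symptom like the 0.00000 in the question is plausible, but this model does not prove that was the cause.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical toy register file: core registers r0-r3, VFP registers s0-s15. */
struct regs {
    uint32_t r[4];
    float    s[16];
};

/* Mimics pi2 built for soft/softfp: argument and result in r0. */
static void pi2_softfp(struct regs *cpu)
{
    float a;
    memcpy(&a, &cpu->r[0], sizeof a);   /* vmov s14, r0 */
    a *= 3.14f;                         /* vmul.f32     */
    memcpy(&cpu->r[0], &a, sizeof a);   /* vmov r0, s14 */
}

/* Mimics pi2 built for hard: argument and result in s0. */
static void pi2_hard(struct regs *cpu)
{
    cpu->s[0] *= 3.14f;                 /* vmul.f32 s0, s0, s15 */
}

/* A soft/softfp caller: places x in r0 and reads the result from r0. */
float call_as_softfp(void (*callee)(struct regs *), float x)
{
    struct regs cpu = {{0}, {0}};
    float result;
    memcpy(&cpu.r[0], &x, sizeof x);
    callee(&cpu);
    memcpy(&result, &cpu.r[0], sizeof result);
    return result;
}
```

Nothing in the model fails loudly, which mirrors the real situation: the linker accepts the mismatched objects unless ABI attributes are checked, and the program simply computes with the wrong register.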
I completed a C-to-MIPS conversion for a class, and I want to check it against the compiler's assembly. I have heard that there is a way of configuring gcc so that it converts C code to the MIPS architecture rather than the x86 architecture (my computer uses an Intel i5 processor) and prints the output.
Running the terminal in Ubuntu (which comes with gcc), what command do I use to configure gcc to convert to MIPS? Is there anything I need to install as well?
EDIT:
Let me clarify. Please read this.
I'm not looking for which compiler to use, or people saying "well you could cross-compile, but instead you should use this other thing that has no instructions on how to set up."
If you're going to post that, at least refer me to instructions. GCC came with Ubuntu. I don't have experience on how to install compilers and it's not easy finding online tutorials for anything other than GCC. Then there's the case of cross-compiling I need to know about as well. Thank you.
GCC can produce assembly code for a large number of architectures, including MIPS. But which architecture a given GCC instance targets is decided when GCC itself is compiled. The precompiled binary you will find on an Ubuntu system knows about x86 (possibly both 32-bit and 64-bit modes) but not MIPS.
Compiling GCC with a target architecture distinct from the architecture on which GCC itself will be running is known as preparing a cross-compilation toolchain. This is doable but requires quite a bit of documentation-reading and patience; you usually need to first build a cross-assembler and cross-linker (GNU binutils), then build the cross-GCC itself.
I recommend using buildroot. This is a set of scripts and makefiles designed to help with the production of a complete cross-compilation toolchain and utilities. At the end of the day, you will get a complete OS and development tools for a target system. This includes the cross-compiler you are after.
Another quite different solution is to use QEMU. This is an emulator for various processors and systems, including MIPS systems. You can use it to run a virtual machine with a MIPS processor, and, within that machine, install an operating system for MIPS, e.g. Debian, a Linux distribution. This way, you get a native GCC (a GCC running on a MIPS system and producing code for MIPS).
The QEMU way might be a tad simpler; using cross-compilation requires some understanding of some hairy details. Either way, you will need about 1 GB of free disk space.
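Whichever route you take, one quick way to confirm which architecture a particular compiler binary targets is to compile a file that tests the compiler's predefined macros (gcc -dumpmachine also prints the target triple without any code). A minimal C sketch; target_arch is a hypothetical name, while __mips__, __MIPSEL__, __x86_64__, __i386__, __aarch64__ and __arm__ are macros GCC and clang predefine for those targets.

```c
/* Hypothetical helper: reports the architecture this translation unit
   was compiled for, based on compiler-predefined macros. */
const char *target_arch(void)
{
#if defined(__mips__) && defined(__MIPSEL__)
    return "mips (little-endian)";
#elif defined(__mips__)
    return "mips (big-endian)";
#elif defined(__x86_64__)
    return "x86-64";
#elif defined(__i386__)
    return "x86 (32-bit)";
#elif defined(__aarch64__)
    return "aarch64";
#elif defined(__arm__)
    return "arm";
#else
    return "unknown";
#endif
}
```

Called from main and printed, this would report an x86 string when built with Ubuntu's stock gcc, and a MIPS string when built with a MIPS cross-compiler.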
It's not a configuration thing, you need a version of GCC that cross-compiles to MIPS. This requires a special GCC build and is quite hairy to set up (building GCC is not for the faint of heart).
I'd recommend using LCC for this. It's way easier to do cross-compilation with LCC than it is with GCC, and building LCC is a matter of seconds on current machines.
For a one-time use for a small program or couple functions, you don't need to install anything locally.
Use Matt Godbolt's compiler explorer site, https://godbolt.org/, which has GCC and clang for various ISAs including MIPS and x86-64, and some other compilers.
Note that the compiler explorer by default filters directives so you can just see the instructions, leaving out stuff like alignment, sections, .globl, and so on. (For a function with no global / static data, this is actually fine, especially when you just want to use a compiler to make an example for you. The default section is .text anyway, if you don't use any directives.)
Most people that want MIPS asm for homework are using SPIM or MARS, usually without branch-delay slots. (Unlike real MIPS, so you need to tweak the compiler to not take advantage of the next instruction after a branch running unconditionally, even when it's taken.) For GCC, the option is -fno-delayed-branch - that will fill every delay slot with a NOP, so the code will still run on a real MIPS. You can just manually remove all the NOPs.
There may be other tweaks needed; for example, MARS may require you to use jr $31 instead of j $31 (see Tweak mips-gcc output to work with MARS). And of course, I/O code will have to be implemented using MARS's toy system calls, not jal calls to standard library functions like printf or std::ostream::operator<<. You can still usefully compile (and hand-tweak) asm for manipulating data, like multiplying integers or summing or reversing an array.
Unfortunately, GCC doesn't have an option to use register names like $a0 instead of $4. For PowerPC there's -mregnames to use r1 instead of 1, but there is no similar option for MIPS to use "more symbolic" register names.
int maybe_square(int num) {
if (num>0)
return num;
return num * num;
}
On Godbolt with GCC 5.4 -xc -O3 -march=mips32r2 -Wall -fverbose-asm -fno-delayed-branch
-xc compiles as C, not C++, because I find that more convenient than flipping between the C and C++ languages in the dropdown and having the site erase my source code.
-fverbose-asm comments the asm with C variable names for the destination and sources. (In optimized code that's often an invented temporary, but not always.)
-O3 enables full optimization, because the default -O0 debug mode is a horrible mess for humans to read. Always use at least -Og if you want to look at the code by hand and see how it implements the source (see How to remove "noise" from GCC/clang assembly output?). You might also use -fno-unroll-loops, and -fno-tree-vectorize if compiling for an ISA with SIMD instructions.
This uses mul instead of the classic MIPS mult + mflo, thanks to the -march= option to tell GCC we're compiling for a later MIPS ISA, not whatever the default baseline is. (Perhaps MIPS I aka R2000, -march=mips1)
See also the GCC manual's section on MIPS target options.
# gcc 5.4 -O3
square:
blez $4,$L5
nop
move $2,$4 # D.1492, num # retval = num
j $31 # jr $ra = return
nop
$L5:
mul $2,$4,$4 # D.1492, num, num # retval = num * num
j $31 # jr $ra = return
nop
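For comparison, the classic MIPS I sequence mult $4,$4 followed by mflo $2 writes the full 64-bit product into the special HI/LO registers and then copies the low half into a general register; the MIPS32 three-operand mul used above produces that low 32 bits directly. A minimal C sketch of the HI/LO model (the struct and function names are hypothetical):

```c
#include <stdint.h>

/* Hypothetical model of the special multiply result registers of classic MIPS. */
struct hilo {
    uint32_t hi;
    uint32_t lo;
};

/* mult rs, rt: signed 32x32 -> 64-bit product into HI:LO. */
struct hilo mips_mult(int32_t rs, int32_t rt)
{
    int64_t product = (int64_t)rs * (int64_t)rt;
    struct hilo r;
    r.hi = (uint32_t)((uint64_t)product >> 32);
    r.lo = (uint32_t)(uint64_t)product;
    return r;
}

/* mflo rd: copy LO into a general-purpose register. */
uint32_t mips_mflo(struct hilo r)
{
    return r.lo;
}
```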
Or with clang, use -target mips to tell it to compile for MIPS. You can do this on your desktop; unlike GCC, clang is normally built with multiple back-ends enabled.
From the same Godbolt link, clang 10.1 -xc -O3 -target mips -Wall -fverbose-asm -fomit-frame-pointer. The default target is apparently MIPS32 or something like that for clang. Also, clang defaults to enabling frame pointers for MIPS, making the asm noisy.
Note that it chose to make branchless asm, doing if-conversion into a conditional-move to select between the original input and the mul result. Unfortunately clang doesn't support -fno-delayed-branch; maybe it has another name for the same option, or maybe there's no hope.
maybe_square:
slti $1, $4, 1
addiu $2, $zero, 1
movn $2, $4, $1 # conditional move based on $1
jr $ra
mul $2, $2, $4 # in the branch delay slot
In this case we can simply put the mul before the jr, but in other cases converting to no-branch-delay asm is not totally trivial. e.g. branch on a loop counter before decrementing it can't be undone by putting the decrement first; that would change the meaning.
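The branchless clang output can be read back as C. This sketch transcribes it line by line (maybe_square_branchless is a hypothetical name) so it can be checked against the original maybe_square:

```c
/* Original function from this answer. */
int maybe_square(int num)
{
    if (num > 0)
        return num;
    return num * num;
}

/* Hypothetical line-by-line transcription of clang's branchless MIPS output. */
int maybe_square_branchless(int num)
{
    int at = (num < 1);       /* slti  $1, $4, 1                          */
    int v0 = 1;               /* addiu $2, $zero, 1                       */
    v0 = at ? num : v0;       /* movn  $2, $4, $1: move if $1 is nonzero  */
    return v0 * num;          /* mul   $2, $2, $4 (in the jr delay slot)  */
}
```

For positive num the selected factor is 1, so the product is num itself; otherwise it is num * num, matching the branchy version.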
Register names:
Compilers use register numbers, not bothering with names. For human use, you will often want to translate back. Many places online have MIPS register tables that show how $4..$7 are $a0..$a3, $8 .. $15 are $t0 .. $t7, etc. For example this one.
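A minimal C sketch of such a table, usable in both directions. It assumes the o32 ABI names that the 32-bit listings here use (the 64-bit ABIs rename some registers), and the function names are hypothetical. objdump prints $30 as s8; fp is accepted as an alias.

```c
#include <string.h>

/* Conventional o32 names for $0..$31, as printed by objdump. */
static const char *const mips_names[32] = {
    "zero", "at", "v0", "v1", "a0", "a1", "a2", "a3",
    "t0",   "t1", "t2", "t3", "t4", "t5", "t6", "t7",
    "s0",   "s1", "s2", "s3", "s4", "s5", "s6", "s7",
    "t8",   "t9", "k0", "k1", "gp", "sp", "s8", "ra"
};

/* Hypothetical helper: $n -> name, or NULL if n is out of range. */
const char *mips_reg_name(unsigned n)
{
    return n < 32u ? mips_names[n] : NULL;
}

/* Hypothetical helper: name (without '$') -> number, or -1 if unknown. */
int mips_reg_number(const char *name)
{
    if (strcmp(name, "fp") == 0)
        return 30;                    /* alias for s8 */
    for (int i = 0; i < 32; i++) {
        if (strcmp(name, mips_names[i]) == 0)
            return i;
    }
    return -1;
}
```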
You should install a cross-compiler from the Ubuntu repositories. GCC MIPS C cross-compilers are available in the repositories. Pick according to your needs:
gcc-mips-linux-gnu - 32-bit big-endian.
gcc-mipsel-linux-gnu - 32-bit little-endian.
gcc-mips64-linux-gnuabi64 - 64-bit big-endian.
gcc-mips64el-linux-gnuabi64 - 64-bit little-endian.
etc.
(Note for users of Ubuntu 20.10 (Groovy Gorilla) or later, and Debian users: if you usually like to install your regular compilers using the build-essential package, you would be interested to know of the existence of crossbuild-essential-mips, crossbuild-essential-mipsel, crossbuild-essential-mips64el, etc.)
In the following examples, I will assume that you chose the 32-bit little-endian version (sudo apt-get install gcc-mipsel-linux-gnu). The commands for other MIPS versions are similar.
To deal with MIPS instead of the native architecture of your system, use the mipsel-linux-gnu-gcc command instead of gcc. For example, mipsel-linux-gnu-gcc -fverbose-asm -S myprog.c produces a file myprog.s containing MIPS assembly.
Another way to see the MIPS assembly: run mipsel-linux-gnu-gcc -g -c myprog.c to produce an object file myprog.o that contains debugging information. Then view the disassembly of the object file using mipsel-linux-gnu-objdump -d -S myprog.o. For example, if myprog.c is this:
#include <stdio.h>
int main()
{
int a = 1;
int b = 2;
printf("The answer is: %d\n", a + b);
return 0;
}
And if it is compiled using mipsel-linux-gnu-gcc -g -c myprog.c, then mipsel-linux-gnu-objdump -d -S myprog.o will show something like this:
myprog.o: file format elf32-tradlittlemips
Disassembly of section .text:
00000000 <main>:
#include <stdio.h>
int main() {
0: 27bdffd8 addiu sp,sp,-40
4: afbf0024 sw ra,36(sp)
8: afbe0020 sw s8,32(sp)
c: 03a0f025 move s8,sp
10: 3c1c0000 lui gp,0x0
14: 279c0000 addiu gp,gp,0
18: afbc0010 sw gp,16(sp)
int a = 1;
1c: 24020001 li v0,1
20: afc20018 sw v0,24(s8)
int b = 2;
24: 24020002 li v0,2
28: afc2001c sw v0,28(s8)
printf("The answer is: %d\n", a + b);
2c: 8fc30018 lw v1,24(s8)
30: 8fc2001c lw v0,28(s8)
34: 00621021 addu v0,v1,v0
38: 00402825 move a1,v0
3c: 3c020000 lui v0,0x0
40: 24440000 addiu a0,v0,0
44: 8f820000 lw v0,0(gp)
48: 0040c825 move t9,v0
4c: 0320f809 jalr t9
50: 00000000 nop
54: 8fdc0010 lw gp,16(s8)
return 0;
58: 00001025 move v0,zero
}
5c: 03c0e825 move sp,s8
60: 8fbf0024 lw ra,36(sp)
64: 8fbe0020 lw s8,32(sp)
68: 27bd0028 addiu sp,sp,40
6c: 03e00008 jr ra
70: 00000000 nop
...
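In this unlinked object file the address immediates are still zero (lui gp,0x0 / addiu gp,gp,0 and lui v0,0x0 / addiu a0,v0,0) because they are placeholders that the linker fills in through relocations. Such a lui/addiu pair builds a 32-bit value from a high half and a sign-extended low half, so the high half must be rounded up when bit 15 of the value is set. A minimal C sketch of that split (split_hi_lo and lui_addiu are hypothetical names):

```c
#include <stdint.h>

/* Hypothetical helper: splits addr into the immediates for
   "lui rt, hi" + "addiu rt, rt, lo". addiu sign-extends lo,
   so hi is rounded up by 0x8000 to compensate. */
void split_hi_lo(uint32_t addr, uint16_t *hi, int16_t *lo)
{
    int32_t low = (int32_t)(addr & 0xFFFFu);
    if (low >= 0x8000)
        low -= 0x10000;               /* sign-extend the low 16 bits */
    *hi = (uint16_t)((addr + 0x8000u) >> 16);
    *lo = (int16_t)low;
}

/* Hypothetical helper: what the two instructions compute at run time. */
uint32_t lui_addiu(uint16_t hi, int16_t lo)
{
    uint32_t rt = (uint32_t)hi << 16;     /* lui   rt, hi     */
    return rt + (uint32_t)(int32_t)lo;    /* addiu rt, rt, lo */
}
```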
You would need to download the sources for binutils and gcc-core and configure them with something like ../configure --target=mips ... You may need to choose a specific MIPS target. Then you could use mips-gcc -S.
You can build GCC as a cross-compiler so that it generates MIPS code instead of x86. That's a nice learning experience.
If you want quick results you can also get a prebuilt GCC with MIPS support. One is the CodeSourcery Lite Toolchain. It is free, comes for a lot of architectures (including MIPS) and they have ready to use binaries for Linux and Windows.
http://www.codesourcery.com/sgpp/lite/mips/portal/subscription?#template=lite
You should compile your own version of gcc that is able to cross-compile. Of course, this isn't easy, so you could look for a different approach, for example this SDK.