Calling MIPS from C

I'd like to call assembly (specifically MIPS) code from my C program and call the C back from the assembly.
I've decided on GNU GCC as my compiler (and I'm guessing I also need an emulator?).
I'm on an x86 Win 7 machine.
There are some things about how this can or should work that are very unclear to me.
MIPS is a load-store architecture with 32 registers, but will the C code continue to use a register-memory architecture because I'm on x86?
Now that I want to call MIPS assembly instead of x86 assembly, can I still use asm()?
If MIPS uses more registers than the C code, will I be able to access those registers from my C code?
Can anyone help me out with this, perhaps by pointing out where I could learn this bit of sorcery?
Thanks
Disclaimer: I am working on a verification of self modifying code project for credit in school, and this code is going to be used as an example, but I am not getting any credit for this code.

The most common MIPS calling convention is described here. The easiest way to write a C-callable assembly routine is to write a skeleton for the routine in C, and then copy the assembly code output from the compiler into your assembly source (use gcc's -S option). Say you want to call an assembler function defined in C as int foo(int a, int b). You would write a simple version of that function in C. For example, put the following into foo.c:
int foo(int a, int b) {
    return a+b; // some simple code to access all arguments and the return value
}
Then compile that function with a MIPS cross compiler (see below), passing the -S and -O0 options to gcc. This produces a text output file, foo.s, containing MIPS assembler source that shows how the arguments of foo are accessed and where to put the return value. Copy that file into your own assembler source and add the assembler instructions you need to compute foo.
Calling C from assembly is straightforward once you have calling in the other direction figured out.
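A slightly larger skeleton makes the compiler output more informative. The sketch below is only an illustration: foo5 and helper are hypothetical names, and the register notes in the comments assume the common o32 convention, where the first four integer arguments travel in $a0-$a3, further arguments go on the stack, and the result comes back in $v0. Because foo5 itself calls another C function, its generated assembly also shows the call sequence your own assembly would use to call back into C.

```c
/* skeleton.c (hypothetical file name): compile with a MIPS cross gcc
   using -S -O0 and read the resulting skeleton.s. */

int helper(int x);   /* a C function that the assembly routine calls back */

int foo5(int a, int b, int c, int d, int e)
{
    /* Touch every argument and make one outgoing call, so the generated
       assembly shows: register arguments (assumed $a0-$a3 under o32),
       a stack argument (e), a call sequence, and the return value. */
    return helper(a + b + c + d) + e;
}

int helper(int x)
{
    return 2 * x;
}
```

With -O0 the compiler keeps every load and store, which makes it easy to match each argument to its register or stack slot before replacing the body with hand-written instructions.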
You can download a free MIPS gcc cross compiler tool chain from Mentor Graphics (formerly Codesourcery).
You can download a free, fully functional (it boots and runs Linux) MIPS simulator from here. Don't use SPIM or MARS, since they do not completely model the MIPS architecture.

Related

Why is the assembly code generate by hello world in C not having a .code segment nor model tiny like x86 assembly does?

I'm learning assembly for 80x86 this semester. A typical asm file I write looks something like
.model tiny
.486
.data
#initializations
.code
.startup
#actual code
.exit
end
I was expecting a similar format when I created a .s file for a simple hello world. But I don't see any of the segments with their proper names and it's all very different. I compile using g++ -S -O0 hello.c
Why is the assembly for c so different than the assembly they make us write in class? Is the assembly I'm learning used by a different programming language? If I want to get the assembly version (that I'm used to) of hello world from some higher-level code, how do I do that?
The code does not match your command line. That is neither C (file name) nor C++ code (command line). That is assembly language.
Assembly language varies by tool (masm, tasm, nasm, gas, etc.) and is not expected to be compatible or standard in any way. This is not just about Intel vs. AT&T syntax; it covers all of the code, and it applies to all targets, not just x86. It is easily seen with ARM and others.
You should try to use the assembler directly rather than a C or C++ compiler, because going through the compiler creates yet another assembly language: although gcc passes assembly source on to gas, it can first run it through the C preprocessor, and source written that way is incompatible with the GNU assembler used on its own.
x86 is the last assembly language/instruction set you want to learn, if ever. If you are going to learn it, starting with the 8086/88 is IMO the preferred way; it is much more understandable despite the nuances. Since this appears to be a class, you are stuck with this ISA and cannot choose a better first (or second, or third...) instruction set.
Very much within the x86 world, but also for any other target, expect the language to be incompatible between tools; if code happens to work, or mostly work, across tools, that is a bonus. Likewise, there is no reason to assume that any tool has a "masm compatible" or similar mode. Intel vs. AT&T is only a fraction of the language problem, and picking one is in no way expected to make the code port between tools.
The bottom line: rewrite the code in the assembly language of the assembler you are using.
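As a rough illustration of why compiler output has no .model or .code lines: gcc emits GNU assembler directives instead, and picks a section for each object itself. The sketch below is an assumption-based example (the names counter, scratch and greeting are hypothetical, and the section names in the comments are what gcc typically uses on ELF targets; other platforms use other names). Compiling it with -S and searching the output for section directives shows where each piece ends up.

```c
/* sections.c (hypothetical file name): compile with gcc -S -O0 and look
   for the section directives in sections.s. */

int counter = 42;    /* initialized global: typically placed in .data */

int scratch[16];     /* zero-initialized global: typically placed in .bss */

const char *greeting(void)
{
    /* The function body typically goes to .text; the string literal
       typically goes to a read-only data section such as .rodata. */
    return "hello, world";
}
```

The directives play the same role as .data and .code in the classroom dialect, but they belong to gas, not to the assembler used in class.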

How do I get a full assembly code from C file?

I'm currently trying to figure out the way to produce equivalent assembly code from corresponding C source file.
I've been using the C language for several years, but have little experience with assembly language.
I was able to output the assembly code using the -S option in gcc. However, the resulting assembly code contained call instructions that jump to other functions such as _exp. This is not what I wanted; I need fully functional assembly code in a single file, with no dependency on other code.
Is it possible to achieve what I'm looking for?
To better describe the problem, I'm showing you my code here:
#include <math.h>
float sigmoid(float i){
return 1/(1+exp(-i));
}
The platform I am working on is Windows 10 64-bit, the compiler I'm using is cl.exe from MSbuild.
My initial objective was to see, at a lowest level possible, how computers calculate mathematical functions. The level where I decided to observe the calculation process is assembly code, and the mathematical function I've chosen was sigmoid defined as above.
_exp is the standard math library function double exp(double); apparently you're on a platform that prepends a leading underscore to C symbol names.
Given a .s that calls some library functions, build it the same way you would a .c file that calls library functions:
gcc foo.S -o foo -lm
You'll get a dynamic executable by default.
But if you really want all the code in one file with no external dependencies, you can link your .c into a static executable and disassemble that.
gcc -O3 -march=native foo.c -o foo -static -lm
objdump -drwC -Mintel foo > foo.s
There's no guarantee that the _exp implementation in libm.a (static library) is identical to the one you'd get in libm.so or libm.dll or whatever, because it's a different file. This is especially true for a function like memcpy where dynamic-linker tricks are often used to select an optimal version (for your CPU) at run-time.
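If the goal is only to watch the arithmetic in one file, another option is to avoid the library call altogether. The sketch below is an illustration under stated assumptions, not how any real libm computes exp: exp_approx is a hypothetical helper that uses simple range reduction plus a Taylor series, and it trades accuracy and speed for readability. On a target with hardware floating point, compiling it with -S should give assembly with no call to _exp.

```c
/* Illustrative only: a dependency-free exp approximation.
   This is NOT the algorithm used by a production math library. */
static double exp_approx(double x)
{
    /* Clamp so the result stays finite and the loop below terminates. */
    if (x > 700.0)  x = 700.0;
    if (x < -700.0) x = -700.0;

    /* Range reduction: e^x = (e^(x / 2^k))^(2^k); halve x until it is small. */
    int k = 0;
    while (x > 0.5 || x < -0.5) {
        x *= 0.5;
        ++k;
    }

    /* Taylor series around 0: 1 + x + x^2/2! + ... (12 terms after the 1). */
    double term = 1.0, sum = 1.0;
    for (int n = 1; n <= 12; ++n) {
        term *= x / n;
        sum += term;
    }

    /* Undo the reduction by squaring the result k times. */
    while (k-- > 0)
        sum *= sum;
    return sum;
}

float sigmoid(float i)
{
    return (float)(1.0 / (1.0 + exp_approx(-(double)i)));
}
```

The static link and disassembly route above remains the way to see what the real library does; this version only makes the overall shape of such a computation visible.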
It is not possible in general. There are exceptions, sure: I could craft one, which means other folks can too, but it would not be an interesting program.
Normally your C program, starting from your main() entry point, is only a percentage of the code. There is a bootstrap that contains the actual entry point the operating system uses to launch your program. It does some things that prepare your virtual memory space so that your program can run, such as zeroing .bss. That bootstrap often is, or should be, written in assembly language (otherwise you get a chicken-and-egg problem), but it is not an assembly language file you will see unless you go find the sources for the C library. You will usually get it as an object that is part of the toolchain, along with other compiler libraries.
Then, if you make any C library calls or write code that results in a compiler library call (a divide on a platform that doesn't support divide, floating point on a platform that doesn't have floating point, etc.), that is another object that came from some other C or assembly source in the library or compiler sources. It is not something you will see during the compile/assemble/link (the chain in toolchain) process.
So, except for specifically crafted trivial programs or tools specifically crafted for this purpose (likely for specific baremetal platforms), you will not see your whole program turn into one big assembly source file before it gets assembled and then linked.
If the platform is not baremetal, there is of course also the operating system layer, which you certainly will not see as part of your source code. The C library calls that need the system ultimately have a place where they call into it, all compiled to object/lib form before you use them, and the assembly sources for the operating system side are part of some other source tree and build process somewhere else.

How to write inline Assembly with Turbo C 2.01?

I want to write some inline assembly in a DOS program which is compiled using Turbo C 2.01. When I write
asm {
nop
}
the compiler claims that in-line assembly is not allowed in function ....
Any ideas?
See the Turbo C user manual page 430:
Inline assembly not allowed
Your source file contains inline assembly language statements and you are compiling it from within the
Integrated Environment. You must use the TCC command to compile this
source file.
I believe that you need also to pass the -B option to TCC (page 455).
Alternatively you can use __emit__ (page 103) for relatively simple code entered as machine code rather than assembler mnemonics.
It seems an odd restriction to not allow inline assembly in the IDE. You might consider "upgrading" to Turbo C++ 3.0 which I believe does allow it. I would imagine that TC++ will compile C code when presented with a .c file, or that the IDE can be set to compile C explicitly. There's a manual for that too.
Turbo C converts C code directly into machine code without using an assembler phase, and thus cannot include assembly language source within a program. What it can do, however, is use the __emit__ directive to insert machine code. The cleanest way to use that is probably to use a separate assembler (or perhaps DEBUG) to process the code of interest by itself into a COM file, and then enter the byte values therein into an __emit__ directive. Parameters are stored in ascending order left to right, starting at either BP+4 (in tiny, small, or compact model) or BP+6 (medium, large, or huge). Local variables are stored at addresses below BP.
When using Turbo Pascal, it's possible to use a handy program called "inline assembler" to convert assembly-language source into a Turbo Pascal literal-code directive. Turbo Pascal's directive is formatted differently from C's (I like Pascal's better) and can accommodate labels in ways Turbo C's cannot. Still, using __emit__ may have far less impact on build times than trying to use inline assembly code.

How does C code call assembly code (e.g. optimized strlen)?

I always read things about how certain functions within the C programming language are optimized by being written in assembly. Let me apologize if that sentence sounds a little misguided.
So, I'll put it clearly: How is it that when you call some functions like strlen on UNIX/C systems, the actual function you're calling is written in assembly? Can you write assembly right into C programs somehow or is it an external call situation? Is it part of the C standard to be able to do this, or is it an operating system specific thing?
The C standard dictates what each library function must do rather than how it is implemented.
Almost all known implementations of C are compiled into machine language. It is up to the implementers of the C compiler/library how they choose to implement functions like strlen. They could choose to implement it in C and compile it to an object, or they could choose to write it in assembly and assemble it to an object. Or they could implement it some other way. It doesn't matter so long as you get the right effect and result when you call strlen.
Now, as it happens, many C toolsets do allow you to write inline assembly, but that is absolutely not part of the standard. Any such facilities have to be provided as extensions to the C standard.
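To make the point about implementation freedom concrete, here is a minimal sketch of a strlen written in portable C. The name my_strlen is hypothetical, and this is not taken from any particular C library. A library could ship this version, a hand-written assembly version, or something else entirely; as long as the result is the same, calling code cannot tell the difference.

```c
#include <stddef.h>

/* Hypothetical stand-in for a library strlen, written in plain C.
   An assembly implementation would expose the same interface. */
size_t my_strlen(const char *s)
{
    const char *p = s;
    while (*p != '\0')   /* walk forward to the terminating NUL byte */
        ++p;
    return (size_t)(p - s);
}
```

Callers only see the prototype, so swapping this object file for one assembled from hand-written code requires no change on their side.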
At the end of the road compiled programs and programs in assembly are all machine language, so they can call each other. The way this is done is by having the assembly code use the same calling conventions (way to prepare for a call, prepare parameters and such) as the program written in C. An overview of popular calling conventions for x86 processors can be found here.
Many (most?) C compilers do happen to support inline assembly, though it's not part of the standard. That said, there's no strict need for a compiler to support any such thing.
First, recognize that assembly is mostly just human (semi-)readable machine code, and that C ends up as machine code anyway.
"Calling" a C function just generates a set of instructions that prepare registers, the stack, and/or some other machine-dependent mechanism according to some established calling convention, and then jumps to the start of the called function.
A block of assembly code can conform to the appropriate calling convention, and thus generate a blob of machine code that another blob of machine code that was originally written in C is able to call. The reverse is, of course, also possible.
The details of the calling convention, the assembly process, and the linking process (to link the assembly-generated object file with the C-generated object file) may all vary wildly between platforms, compilers, and linkers. A good assembly tutorial for your platform of choice will probably cover such details.
I happen to like the x86-centric PC Assembly Tutorial, which specifically addresses interfacing assembly and C code.
When C code is compiled by gcc, it is first compiled to assembler instructions, which are then assembled into a binary, machine-executable file. You can see the generated assembler instructions by specifying -S, as in gcc file.c -S.
Hand-written assembler code simply skips the first stage (C-to-assembler compilation); once assembled, it is indistinguishable from code compiled from C.
One way to do it is to use inline assembler. That means you can write assembler code directly into your C code. The specific syntax is compiler-specific. For example, see GCC syntax and MS Visual C++ syntax.
You can write inline assembly in your C code. The syntax for this is highly compiler specific, but the asm keyword is usually used. Look into inline assembly for more information.
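As a small sketch of what compiler-specific inline assembly can look like, the example below uses GCC's extended asm syntax (also accepted by Clang). The function name add_asm is hypothetical, the instruction is x86-only, and a plain C fallback is used everywhere else, so none of this is portable C.

```c
/* Adds two integers, using GCC-style inline assembly on x86 targets. */
int add_asm(int a, int b)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    /* AT&T operand order: "addl %1, %0" computes a = a + b.
       "+r"(a) : a lives in a register and is both read and written.
       "r"(b)  : b is a read-only register input.
       "cc"    : the instruction modifies the condition flags. */
    __asm__("addl %1, %0" : "+r"(a) : "r"(b) : "cc");
    return a;
#else
    return a + b;   /* fallback for other compilers and architectures */
#endif
}
```

The constraint strings tell the compiler which registers to allocate around the instruction text, which is why the same feature is spelled very differently in other compilers.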

Using another assembler (MASM, NASM, TASM, etc.) with GCC

I've been looking through questions on here and the internet for a while now and I cannot seem to find out whether or not it is possible to do inline assembly with GCC using something other than GAS. I am trying to find if I can avoid using not only GAS's AT&T syntax (though, I know how to use Intel syntax with GAS) but the extended asm format. While this is not for a project or anything other than my own curiosity, I would really appreciate any help I can get (this is actually my first question here because I could not find an answer about it)! Also, if this makes any difference, I'm currently using DevC++ (for C code, not C++) on Windows.
Thanks,
Tom
You can link the output from an assembler (a ".o" or ".obj" file) with your C or C++ program. Put your assembler code in a text file. Your IDE or makefile will assemble it just as it would compile any C source file. The only tricky bit is learning how to interface between the two different systems.
You cannot use another inline assembly syntax with GCC. Inline assembly is implemented by GCC literally including the assembly you write inline with its own (textual) assembly output, which it then sends to gas to be assembled. Since GCC doesn't know how to change the format of its own output to feed it to another assembler, you can't change the inline assembly syntax, either.

Resources