merging assembly and C in mplab


I want to use a procedure written in assembly for a PIC in my C code in MPLAB X. Is there a way to do this? I have searched the internet but can't find anything helpful.

If you are using a 16-bit PIC, see 8.3 MIXING ASSEMBLY LANGUAGE AND C VARIABLES AND FUNCTIONS in MPLAB® C30 User’s Guide.
EXAMPLE 8-2: CALLING AN ASSEMBLY FUNCTION IN C
/*
** file: call1.c
*/
extern int asmFunction(int, int);
int x;
void main(void)
{
    x = asmFunction(0x100, 0x200);
}
The assembly-language function sums its two parameters and returns the result.
;
; file: call2.s
;
.global _asmFunction
_asmFunction:
add w0,w1,w0
return
.end
Parameter passing in C is detailed in Section 4.12.2 “Return Value”. In the preceding
example, the two integer arguments are passed in the W0 and W1 registers. The
integer return result is transferred via register W0. More complicated parameter lists
may require different registers and care should be taken in the hand-written assembly
to follow the guidelines.

Related

Specifying registers for function arguments?

Some compilers, such as old gcc or egcs, apply ABI-breaking optimizations to static functions within a single file, such as passing arguments or returning results in arbitrary registers.
Consider some source code like:
// Original foobar.c
// This example targets MIPS o32 ABI.
// Shared subroutine
// Compiler decided to use $16, $17 to pass a0 and a1 to minimize stack usage and move between registers.
static void __bar(int a0, int a1) {
// Something very complicated
}
// ...
void foo(int a0, int a1) {
// ...
/*
This call was compiled to something like:
ori $16, $0, 0x1
jal __bar
ori $17, $0, 0x1
*/
__bar(1, 1);
// ...
}
// ...
Suppose someone wants to restore or reimplement foobar.c from the compiled assembly without access to the original source.
One would probably decompile or rewrite some parts first, say starting from foo() or other standard functions.
However, to test the correctness of the implementation, one must deal with calls to routines that use a non-standard ABI.
A simple workaround is to use the global register variables provided by gcc / clang:
// Restoration of foobar.c
// void __bar(int asm("s0"), int asm("s1"))
// External function in assembly, say foobar.s, compiled from the original foobar.c.
void __bar();
volatile register int s0 asm ("s0"); // $16 = s0
volatile register int s1 asm ("s1"); // $17 = s1
// ...
void foo(int a0, int a1) {
// ...
// __bar(1, 1);
s0 = 1; s1 = 1;
__bar();
// ...
}
// ...
The questions are:
Do gcc / clang support custom calling conventions for specific functions?
Is there a more elegant way to deal with non-standard ABI calls?

Do gcc / clang support custom calling conventions for specific functions?
The best you can do is opt in to one of the specific supported calling conventions, e.g. one of these for x86. If the static function in question does not conform to any of them, then you're stuck.
Is there a more elegant way to deal with non-standard ABI calls?
Nothing truly elegant. If none of the supported calling conventions apply, you're stuck with either:
1. Reversing & rebuilding the whole thing (so it can compile as normal without relying on original binaries), or at least enough of it that you're replacing ABI conforming functions and all their dependencies completely, or
2. Calling it from assembly, explicitly passing the arguments per the non-standard calling conventions of the compiled function.
#2 is the basis for the most elegant solution, which is basically to write a wrapper function in assembly that receives the arguments and returns the values according to the ABI, and otherwise does nothing but rearrange them to pass to the non-standard function it wraps (and possibly fix up the return value if it's not returning according to normal rules). You write the wrapper(s) once, and now the rest of your code can be written in C, calling the wrapper functions which adhere to the ABI and being blissfully unaware of the weirdness under the covers.
Similarly, if you're trying to replace the existing non-standard function with another, you'd write the non-conforming wrapper in assembly, then write your replacement function in plain C and have the wrapper call it, and swap in your wrapper in your hacked together mix of the original binary and the new code.

c library x86/x64 assembler

Is there a C library for assembling an x86/x64 assembly string to opcodes?
Example code:
/* size_t assemble(char *string, int asm_flavor, char *out, size_t max_size); */
unsigned char bytes[32];
size_t size = assemble("xor eax, eax\n"
                       "inc eax\n"
                       "ret",
                       asm_x64, (char *)bytes, sizeof bytes);
for (size_t i = 0; i < size; i++) {
    printf("%02X ", bytes[i]);
}
/* Output: 31 C0 FF C0 C3 (in 64-bit mode 0x40 is a REX prefix, so inc eax encodes as FF C0) */
I have looked at asmpure; however, it needs modifications to run on non-Windows machines.
I actually need both an assembler and a disassembler. Is there a library that provides both?

There is a library that is seemingly a ghost; its existence is widely unknown:
XED (X86 Encoder Decoder)
Intel wrote it: https://software.intel.com/sites/landingpage/pintool/docs/71313/Xed/html/
It can be downloaded with Pin: https://software.intel.com/en-us/articles/pintool-downloads

Sure - you can use llvm. Strictly speaking, it's C++, but there are C interfaces. It will handle both the assembling and disassembling you're trying to do, too.

Here you go:
http://www.gnu.org/software/lightning/manual/lightning.html
GNU Lightning is a C library which is designed to do exactly what you want. It uses a portable assembly language, though, rather than an x86-specific one. The portable assembly is compiled at run time to a machine-specific one in a very straightforward manner.
As an added bonus, it is much smaller and simpler to start using than LLVM (which is rather big and cumbersome).

You might want libyasm (the backend YASM uses). You can use the frontends as examples (most particularly, YASM's driver).

I'm using fasm.dll: http://board.flatassembler.net/topic.php?t=6239
Don't forget to write "use32" at the beginning of code if it's not in PE format.

Keystone seems like a great choice now; however, it didn't exist when I asked this question.

Write the assembly into its own file, and then call it from your C program using extern. You have to do a little bit of makefile trickery, but otherwise it's not so bad.
Your assembly code has to follow C conventions, so it should look like
global _myfunc
_myfunc: push ebp ; create new stack frame for procedure
mov ebp,esp ;
sub esp,0x40 ; 64 bytes of local stack space
mov ebx,[ebp+8] ; first parameter to function
; some more code
leave ; return to C program's frame
ret ; exit
To get at the contents of C variables, or to declare variables which C can access, you need only declare the names as GLOBAL or EXTERN. (Again, the names require leading underscores.) Thus, a C variable declared as int i can be accessed from assembler as
extern _i
mov eax,[_i]
And to declare your own integer variable which C programs can access as extern int j, you do this (making sure you are assembling in the _DATA segment, if necessary):
global _j
_j dd 0
Your C code should look like
extern void myfunc(int a); /* implemented in assembly as _myfunc */
int main(void)
{
    int a = 42; /* example argument */
    myfunc(a);
    return 0;
}
Compile the files, then link them using
gcc mycfile.o myasmfile.o

How to access C variable for inline assembly manipulation?

Given this code:
#include <stdio.h>
int main(int argc, char **argv)
{
int x = 1;
printf("Hello x = %d\n", x);
}
I'd like to access and manipulate the variable x in inline assembly. Ideally, I want to change its value using inline assembly. I'm using the GNU assembler with AT&T syntax.

In GNU C inline asm, with x86 AT&T syntax:
(But https://gcc.gnu.org/wiki/DontUseInlineAsm if you can avoid it).
// this example doesn't really need volatile: the result is the same every time
asm volatile("movl $0, %[some]"
: [some] "=r" (x)
);
After this, x contains 0.
Note that you should generally avoid mov as the first or last instruction of an asm statement. Don't copy from %[some] to a hard-coded register like %%eax, just use %[some] as a register, letting the compiler do register allocation.
See https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html and https://stackoverflow.com/tags/inline-assembly/info for more docs and guides.
Not all compilers support GNU syntax.
For example, for MSVC you do this:
__asm mov x, 0
and x will have the value 0 after this statement.
Please specify the compiler you would want to use.
Also note, doing this will restrict your program to compile with only a specific compiler-assembler combination, and will be targeted only towards a particular architecture.
In most cases, you'll get as good or better results from using pure C and intrinsics, not inline asm.

asm("mov $0, %0" : "=r" (x)); -- this may get you on the right track. Specify register use as much as possible for performance and efficiency. However, as Aniket points out, this is highly architecture dependent and requires gcc.

Is it possible to access 32-bit registers in C?

Is it possible to access 32-bit registers in C? If it is, how? And if not, is there any way to embed assembly code in C? I'm using the MinGW compiler, by the way.
Thanks in advance!

If you want to only read the register, you can simply:
register int ecx asm("ecx");
Obviously it's tied to instantiation.
Another way is using inline assembly. For example:
asm("movl %%ecx, %0;" : "=r" (value) : );
This stores the ecx value into the variable value. I've already posted a similar answer here.

Which registers do you want to access?
General purpose registers normally can not be accessed from C. You can declare register variables in a function, but that does not specify which specific registers are used. Further, most compilers ignore the register keyword and optimize the register usage automatically.
In embedded systems, it is often necessary to access peripheral registers (such as timers, DMA controllers, I/O pins). Such registers are usually memory-mapped, so they can be accessed from C...
by defining a pointer:
volatile unsigned int *control_register_ptr = (volatile unsigned int *) 0x00000178;
or by using the preprocessor:
#define control_register (*(volatile unsigned int *) 0x00000178)
Or you can use an assembly routine.
For using Assembly language, there are (at least) three possibilities:
A separate .asm source file that is linked with the program. The assembly routines are called from C like normal functions. This is probably the most common method and it has the advantage that hw-dependent functions are separated from the application code.
In-line assembly
Intrinsic functions that execute individual assembly language instructions. This has the advantage that the assembly language instruction can directly access any C variables.

You can embed assembly in C
http://en.wikipedia.org/wiki/Inline_assembler
example from wikipedia
#include <errno.h> /* declares errno; it should not be declared by hand */
int funcname(int arg1, int *arg2, int arg3)
{
    int res;
    __asm__ volatile(
        "int $0x80"        /* make the request to the OS */
        : "=a" (res),      /* return result in eax ("a") */
          "+b" (arg1),     /* pass arg1 in ebx ("b") */
          "+c" (arg2),     /* pass arg2 in ecx ("c") */
          "+d" (arg3)      /* pass arg3 in edx ("d") */
        : "a" (128)        /* pass system call number in eax ("a") */
        : "memory", "cc"); /* announce to the compiler that the memory and condition codes have been modified */
    /* The operating system will return a negative value on error;
     * wrappers return -1 on error and set the errno global variable */
    if (-125 <= res && res < 0) {
        errno = -res;
        res = -1;
    }
    return res;
}

I don't think you can do them directly. You can do inline assembly with code like:
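asm volatile (
    "movl $0, %%ebx;"
    "movl $1, %%eax;"
    : /* no outputs */
    : /* no inputs */
    : "eax", "ebx" /* tell the compiler these registers are overwritten */
);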

If you are on a 32-bit processor and using an adequate compiler, then yes. The exact means depends on the particular system and compiler you are programming for, and of course this will make your code about as unportable as can be.
In your case using MinGW, you should look at GCC's inline assembly syntax.

You can, of course. "MinGW" (gcc) allows inline assembly (as other compilers do), as other answers already show. Which assembly language you write depends on the CPU of your system (prob. 99.99% that it is x86). However, this makes your program non-portable to other processors (not that bad if you know what you are doing and why).
The relevant page talking about assembly for gcc is here and here, and if you want, also here. Don't forget that it can't be specific since it is architecture-dependent (gcc can compile for several cpus)

There is generally no need to access the CPU registers from a program written in a high-level language: high-level languages like C, Pascal, etc. were invented precisely to abstract away the underlying machine and make programs more machine-independent.
I suspect you are trying to accomplish something but don't know the conventional way to do it.
Many register accesses are hidden in higher-level constructs or in system or library calls, which lets you avoid coding the "dirty part". Tell us more about what you want to do and we may be able to suggest an alternative.

Help me understand this C code (*(void(*) ()) scode) ()

Source: http://milw0rm.org/papers/145
#include <stdio.h>
#include <stdlib.h>
int main()
{
char scode[]="\x31\xc0\xb0\x01\x31\xdb\xcd\x80";
(*(void(*) ()) scode) ();
}
This paper is a tutorial about shellcode on the Linux platform; however, it does not explain how the statement "(*(void(*) ()) scode) ();" works. I looked in "The C Programming Language, 2nd ed." by Brian W. Kernighan and Dennis M. Ritchie but found no answer. Can someone point me in the right direction, maybe a website or another C reference book where I can find an answer?

It's machine code (compiled assembly instructions) in scode; the code casts it to a callable void function pointer and calls it. GMan demonstrated an equivalent, clearer approach:
typedef void(*void_function)(void);
int main()
{
char scode[]="\x31\xc0\xb0\x01\x31\xdb\xcd\x80";
void_function f = (void_function)scode;
f(); //or (*f)();
}
scode contains x86 machine code which disassembles into (thanks Michael Berg)
31 c0 xor %eax,%eax
b0 01 mov $0x1,%al
31 db xor %ebx,%ebx
cd 80 int $0x80
This is the code for a system call in Linux (interrupt 0x80). According to the system call table, this is calling the sys_exit() system call (eax=1) with parameter 0 (in ebx). This causes the process to exit immediately, as if it called _exit(0).
Jonathan Leffler pointed out that this is most commonly used to call shellcode, "a small piece of code used as the payload in the exploitation of a software vulnerability." Thus, modern OSes take measures to prevent this.
If the stack is non-executable, this code will fail horribly. The shell code is loaded into a local variable in the stack, and then we jump to that location. If the stack is non-executable, then a CPU fault of some kind will occur as soon as the CPU tries to execute the code, and control will be shifted into the kernel's interrupt handlers. The kernel will then kill the process in an abnormal fashion. One case where the stack might be non-executable would be if you're running on a CPU that supports Physical Address Extensions, and you have the NX (non-executable) bit set in your page tables.
There may also be instruction cache issues on some CPUs -- if the instruction cache hasn't been flushed, the CPU may read stale data (instead of the shell code we explicitly loaded into the stack) and start executing random instructions.

In C:
(some_type) some_var
casts some_var to be of type some_type.
In your code sample "void(*) ()" is the some_type: a pointer to a function that returns nothing. (Before C23, the empty parentheses leave the parameters unspecified; write (void) to say explicitly that the function takes no arguments.)
"(void(*) ()) scode" casts scode to be a function pointer.
"(*(void(*) ()) scode)" dereferences that function pointer.
And the final () calls the function defined in scode.
And the bytes in scode disassemble to the following i386 assembly:
31 c0 xor %eax,%eax
b0 01 mov $0x1,%al
31 db xor %ebx,%ebx
cd 80 int $0x80

What this code does is store some machine code (the bytes in scode), convert the address of that code into a function pointer of type void function (), and then call it.
In C/C++, this function's type definition is expressed:
typedef void (* basicFunctionPtr) (void);

A typedef helps:
// function that takes and returns nothing
typedef void(*generic_function)(void);
// cast to function
generic_function f = (generic_function)scode;
// call
(*f)();
// same thing written differently:
// call
f();

scode is an address. (void(*)()) casts scode to a pointer to a function returning void and accepting no parameters. The leading * dereferences the function pointer, and the trailing () calls it with no arguments.

To learn a lot more about shell-coding technique, look at the book:
The Shellcoder's Handbook, 2nd Edn
There are several other similar books as well - I think this is the best, but could be persuaded otherwise. You can also find numerous related resources with Google and "shellcoder's handbook" (or your search engine of choice, no doubt).

The character array contains executable code and the cast is a function cast.
(void(*) ()) means "cast to a pointer to a function that produces void", i.e. nothing. The () after the name is the function call operator.

The characters encoded in scode are the char/byte representations of some compiled assembly code. The code you have posted takes that assembly, encoded as characters for simplicity, and then calls that string as a function.
The assembly seems to translate out to:
xor %eax,%eax
mov $0x1,%al
xor %ebx,%ebx
int $0x80
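That would not create a shell in Linux, though: these four instructions invoke the exit system call (eax = 1) with status 0 (ebx = 0), so the process simply exits.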