Assembly: analyzing a system() call in C

So I made a very simple C program to study how C works on the inside. It has just one line in main(), not counting return 0:
system("cls");
If I use OllyDbg to analyze this program, it shows something like this (the text after the semicolons is comments generated by OllyDbg):
MOV DWORD PTR SS:[ESP],test_1.004030EC ; ||ASCII "cls"
CALL <JMP.&msvcrt.system> ; |\system
Can someone explain what this means, and if I want to change the "cls" called in the system() to another command, where is the "cls" stored? And how do I modify it?

You are using a 32-bit Windows system, with its corresponding ABI (the conventions that govern how functions are called).
MOV DWORD PTR SS:[ESP],test_1.004030EC
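This has the same effect as a push 4030ech instruction (except that ESP is not changed, because the compiler reserved the argument space in advance): it simply stores the address of the string cls at the top of the stack.
This is how parameters are passed to functions, and it tells us that the string cls is at address 4030ech.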
CALL <JMP.&msvcrt.system> ; |\system
This is the call to the system function from the CRT.
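The JMP in the name comes from how calls to DLL functions are linked by default with Windows toolchains: the call goes to a small stub that jumps through the import table to the real system function in msvcrt.dll.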
So those two lines are simply passing the address of the string to the system function.
If you want to modify it, you need to check whether it is in a writable section (I think it is not) by inspecting the PE sections; your debugger may have a tool for that. Or you could just try the following anyway:
Inspect the memory at 4030ech; you will see the string. Try editing it (how to do this depends on the debugger).
Note: I use the TASM notation for hex numbers, i.e. 123h means 0x123 in C notation.
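If you would rather change the command in the source than patch the binary, one option is to keep it in a writable array instead of a string literal. This is a minimal sketch assuming a hosted C implementation; `command`, `set_command`, `get_command` and `run_command` are hypothetical names, not part of the original program:

```c
#include <stdlib.h>   /* system */
#include <string.h>   /* strlen, memcpy */

/* A string literal such as "cls" usually ends up in a read-only section
   (.rdata in a PE file), so writing to it may fault. A non-const array
   with static storage is normally placed in a writable data section. */
static char command[16] = "cls";

/* Replace the stored command. Returns 0 on success, -1 if it does not fit. */
int set_command(const char *new_cmd)
{
    size_t len = strlen(new_cmd);
    if (len >= sizeof command)          /* keep room for the '\0' */
        return -1;
    memcpy(command, new_cmd, len + 1);  /* copy the text and its terminator */
    return 0;
}

/* Read back the current command text. */
const char *get_command(void)
{
    return command;
}

/* Pass the buffer's address to system(), as the MOV [ESP] / CALL pair does. */
int run_command(void)
{
    return system(command);
}
```

Calling set_command("dir") before run_command() would then run dir instead of cls. Because command is a non-const array, it would typically land in .data rather than .rdata, so patching it in a debugger should not require changing section permissions.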

Related

Getting stdout pointer in assembly

I want to call the C function 'fputc', so I need a FILE pointer (in my case, stdout). I know that I can use the putc equivalent, but I'm curious whether it is even possible to get a pointer to stdout in asm. In C or C++ I would write something like 'stdout' or '&_iob[1]'. Is it possible to get this working without replacing 'fputc' with 'putc'? (The target architecture is Windows on x86-32.)
Example code (I want to print 'A'):
push <the value I need>
push 0x41
call _fputc
add esp, 8
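You call __iob_func from MSVCRT.dll. It returns a pointer to the _iob array, and stdout is its second element (&_iob[1]), so add sizeof(FILE) to the returned pointer. In the 32-bit msvcrt.dll that is 0x20 (FILE has eight 4-byte fields); 0x30 is the size in the 64-bit DLL.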
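For comparison, here is the same call written in C, where the compiler supplies the FILE pointer for you. It is a sketch: `put_A` is a hypothetical helper, and the comment restates the cdecl argument order used in the question's asm (arguments pushed right to left, caller removes them).

```c
#include <stdio.h>

/* C equivalent of the question's asm:
       push <FILE *stream>   ; last argument is pushed first (cdecl)
       push 0x41             ; first argument, the character 'A'
       call _fputc
       add  esp, 8           ; caller removes both 4-byte arguments */
int put_A(FILE *stream)
{
    return fputc(0x41, stream);   /* returns 'A' on success, EOF on error */
}
```

put_A(stdout) prints A; in C you never need to compute stdout's address from the CRT's internal _iob array yourself.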

GCC Inline-Assembly Error: "Operand size mismatch for 'int'"

First, if somebody knows a function in the Standard C Library that prints a string given the number of characters to print, instead of stopping at a binary zero, please tell me!
Otherwise, I have this problem:
void printStringWithLength(char *str_ptr, int n_chars){
asm("mov 4, %rax");//Function number (write)
asm("mov 1, %rbx");//File descriptor (stdout)
asm("mov $str_ptr, %rcx");
asm("mov $n_chars, %rdx");
asm("int 0x80");
return;
}
GCC tells the following error to the "int" instruction:
"Error: operand size mismatch for 'int'"
Can somebody tell me the issue?
There are a number of issues with your code. Let me go over them step by step.
First of all, the int $0x80 system call interface is for 32-bit code only. You should not use it in 64-bit code, as it only accepts 32-bit arguments. In 64-bit code, use the syscall interface. The system calls are similar, but the call numbers differ (for example, write is 4 with int $0x80 but 1 with syscall).
Second, in AT&T assembly syntax, immediates must be prefixed with a dollar sign. So it's mov $4, %rax, not mov 4, %rax. The latter would attempt to move the content of address 4 to rax which is clearly not what you want.
Third, you can't just refer to the names of automatic variables in inline assembly. You have to tell the compiler what variables you want to use using extended assembly if you need any. For example, in your code, you could do:
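asm volatile("mov $1, %%eax; mov $1, %%edi; mov %0, %%rsi; mov %1, %%edx; syscall"
:: "r"(str_ptr), "r"(n_chars) : "rdi", "rsi", "rdx", "rax", "rcx", "r11", "memory");
// write is syscall 1 on x86-64; the pointer needs a 64-bit register; syscall also clobbers RCX and R11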
Fourth, gcc is an optimizing compiler. By default it assumes that inline assembly statements are like pure functions, that the outputs are a pure function of the explicit inputs. If the output(s) are unused, the asm statement can be optimized away, or hoisted out of loops if run with the same inputs.
But a system call like write has a side-effect you need the compiler to keep, so it's not pure. You need the asm statement to run the same number of times and in the same order as the C abstract machine would. asm volatile will make this happen. (An asm statement with no outputs is implicitly volatile, but it's good practice to make it explicit when the side effect is the main purpose of the asm statement. Plus, we do want to use an output operand to tell the compiler that RAX is modified, as well as being an input, which we couldn't do with a clobber.)
You do always need to accurately describe your asm's inputs, outputs, and clobbers to the compiler using Extended inline assembly syntax. Otherwise you'll step on the compiler's toes (it assumes registers are unchanged unless they're outputs or clobbers). (Related: How can I indicate that the memory *pointed* to by an inline ASM argument may be used? shows that a pointer input operand alone does not imply that the pointed-to memory is also an input. Use a dummy "m" input or a "memory" clobber to force all reachable memory to be in sync.)
You should simplify your code by not writing your own mov instructions to put data into registers but rather letting the compiler do this. For example, your assembly becomes:
ssize_t retval;
asm volatile ("syscall" // note only 1 instruction in the template
: "=a"(retval) // RAX gets the return value
: "a"(SYS_write), "D"(STDOUT_FILENO), "S"(str_ptr), "d"((size_t)n_chars) // RDX must hold the full 64-bit length
: "memory", "rcx", "r11" // syscall destroys RCX and R11
);
where SYS_write is defined in <sys/syscall.h> and STDOUT_FILENO in <unistd.h>. I am not going to explain all the details of extended inline assembly here; using inline assembly is usually a bad idea anyway. Read the documentation if you are interested (https://stackoverflow.com/tags/inline-assembly/info).
Fifth, you should avoid using inline assembly when you can. For example, to do system calls, use the syscall function from unistd.h:
syscall(SYS_write, STDOUT_FILENO, str_ptr, (size_t)n_chars);
This does the right thing, but it doesn't inline into your code, so use wrapper macros from musl, for example, if you really want to inline a syscall instead of calling a libc function.
Sixth, always check whether the system call you want is already available as a C library function. In this case it is (the POSIX write function), so you should just write
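As an illustration of that approach, here is an inline wrapper modeled on musl's x86-64 `__syscall3` (same constraints, different name). It assumes GCC or Clang on x86-64 Linux; `raw_syscall3` is a hypothetical name, and unlike libc's `syscall()` it returns `-errno` instead of setting `errno`:

```c
#if !defined(__x86_64__) || !defined(__linux__)
#error "This sketch assumes x86-64 Linux"
#endif

#include <sys/syscall.h>  /* SYS_write */
#include <unistd.h>       /* STDOUT_FILENO */

/* Raw three-argument system call, following the pattern of musl's x86-64
   __syscall3. Returns the kernel's result directly: >= 0 on success,
   -errno on failure (errno itself is not touched). */
static inline long raw_syscall3(long n, long a1, long a2, long a3)
{
    unsigned long ret;
    __asm__ __volatile__ ("syscall"
                          : "=a"(ret)                          /* RAX: result */
                          : "a"(n), "D"(a1), "S"(a2), "d"(a3)  /* RAX, RDI, RSI, RDX */
                          : "rcx", "r11", "memory");           /* clobbered by syscall */
    return (long)ret;
}

/* The question's function, using the wrapper instead of int 0x80. */
long printStringWithLength(const char *str_ptr, int n_chars)
{
    return raw_syscall3(SYS_write, STDOUT_FILENO, (long)str_ptr, (long)n_chars);
}
```

Because the template is the single syscall instruction, the compiler loads the argument registers itself and can inline the whole thing at each call site.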
write(STDOUT_FILENO, str_ptr, n_chars);
and avoid all of this altogether.
Seventh, if you prefer to use stdio, use fwrite instead:
fwrite(str_ptr, 1, n_chars, stdout);
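Since fwrite takes an explicit byte count, it also answers the original question about printing a string that is not NUL-terminated. A small sketch with a hypothetical `print_with_length` helper that also checks for a short write:

```c
#include <stdio.h>

/* Print exactly n_chars bytes of str_ptr to stream; no '\0' is needed.
   Returns 0 on success, -1 for a negative length or a short write. */
int print_with_length(FILE *stream, const char *str_ptr, int n_chars)
{
    if (n_chars < 0)
        return -1;
    size_t want = (size_t)n_chars;
    return fwrite(str_ptr, 1, want, stream) == want ? 0 : -1;
}
```

print_with_length(stdout, str_ptr, n_chars) matches the question's signature apart from the explicit stream argument.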
There are so many things wrong with your code (and so little reason to use inline asm for it) that it's not worth trying to actually correct all of them. Instead, use the write(2) system call the normal way, via the POSIX function / libc wrapper as documented in the man page, or use ISO C <stdio.h> fwrite(3).
#include <unistd.h>
static inline
void printStringWithLength(const char *str_ptr, int n_chars){
write(1, str_ptr, n_chars);
// TODO: check error return value
}
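One way to handle the TODO is to loop, because write(2) may write fewer bytes than requested (for example on pipes, or when interrupted by a signal). A sketch assuming a POSIX system; `write_all` is a hypothetical name:

```c
#include <errno.h>
#include <unistd.h>

/* Write all n_chars bytes, retrying after partial writes and EINTR.
   Returns 0 on success, -1 on error with errno set by write(). */
int write_all(int fd, const char *str_ptr, size_t n_chars)
{
    while (n_chars > 0) {
        ssize_t n = write(fd, str_ptr, n_chars);
        if (n < 0) {
            if (errno == EINTR)   /* interrupted before writing anything: retry */
                continue;
            return -1;
        }
        str_ptr += n;             /* skip the bytes already written */
        n_chars -= (size_t)n;
    }
    return 0;
}
```

printStringWithLength could then call write_all(1, str_ptr, (size_t)n_chars) and report a failure to its caller.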
Why your code doesn't assemble:
In AT&T syntax, immediates always need a $ decorator. Your code will assemble if you use asm("int $0x80").
The assembler is complaining about 0x80, which is a memory reference to the absolute address 0x80. There is no form of int that takes the interrupt vector as anything other than an immediate. I'm not sure exactly why it complains about the size, since memory references don't have an implied size in AT&T syntax.
That will get it to assemble, at which point you'll get linker errors:
In function `printStringWithLength':
5 : <source>:5: undefined reference to `str_ptr'
6 : <source>:6: undefined reference to `n_chars'
collect2: error: ld returned 1 exit status
(from the Godbolt compiler explorer)
mov $str_ptr, %rcx
means to mov-immediate the address of the symbol str_ptr into %rcx. In AT&T syntax, you don't have to declare external symbols before using them, so unknown names are assumed to be global / static labels. If you had a global variable called str_ptr, that instruction would reference its address (which is a link-time constant, so can be used as an immediate).
As others have said, this is completely the wrong way to go about things with GNU C inline asm. See the inline-assembly tag wiki for more links to guides.
Also, you're using the wrong ABI. int $0x80 is the x86 32-bit system call ABI, so it doesn't work with 64-bit pointers. See "What are the calling conventions for UNIX & Linux system calls on x86-64".
See also the x86 tag wiki.

What does `PUSH 0xFFFFFFFF` mean in a function prologue?

I'm trying to understand assembly code through a book called "Reverse Engineering for Beginners" [LINK]. There was a piece of Win32 assembly code I didn't quite understand.
main:
push 0xFFFFFFFF
call MessageBeep
xor eax,eax
retn
What does the first PUSH instruction do? Why is it pushing 0xFFFFFFFF onto the stack but never popping it back? What is the significance of 0xFFFFFFFF?
Thanks in advance.
You are looking at the equivalent code for
int main() {
MessageBeep(0xffffffff);
return 0;
}
The assembly code doesn't actually contain any prologue or epilogue: this function needs no stack frame and doesn't clobber any call-preserved register; it just has to perform a function call and return 0 (which is put in eax at the end). It may receive arguments it doesn't use, as long as it uses the cdecl calling convention (where the caller is responsible for cleaning up the arguments).
MessageBeep, like almost all Win32 APIs, uses the stdcall calling convention (you'll find it hidden behind the WINAPI macro in the C declarations), which means that the called function is responsible for removing its parameters from the stack.
Your code pushes 0xFFFFFFFF as the only argument to MessageBeep and calls it. MessageBeep does its thing and, at the end, ensures that all its arguments are popped from the stack before returning (there's a special form of the ret instruction for this, ret imm16; here ret 4). When your code regains control, the stack is as it was before you pushed the argument. As for the value itself, 0xFFFFFFFF is the argument MessageBeep documents as a simple beep.

How are variable names stored in memory in C?

In C, let's say you have a variable called variable_name. Let's say it's located at 0xaaaaaaaa, and at that memory address, you have the integer 123. So in other words, variable_name contains 123.
I'm looking for clarification around the phrasing "variable_name is located at 0xaaaaaaaa". How does the compiler recognize that the string "variable_name" is associated with that particular memory address? Is the string "variable_name" stored somewhere in memory? Does the compiler just substitute variable_name for 0xaaaaaaaa whenever it sees it, and if so, wouldn't it have to use memory in order to make that substitution?
Variable names don't exist anymore after the compiler runs (barring special cases like exported globals in shared libraries or debug symbols). The entire act of compilation is intended to take those symbolic names and algorithms represented by your source code and turn them into native machine instructions. So yes, if you have a global variable_name, and compiler and linker decide to put it at 0xaaaaaaaa, then wherever it is used in the code, it will just be accessed via that address.
So to answer your literal questions:
How does the compiler recognize that the string "variable_name" is associated with that particular memory address?
The toolchain (compiler and linker) works together to assign a memory location for the variable. It's the compiler's job to keep track of all the references, and the linker puts in the right addresses later.
Is the string "variable_name" stored somewhere in memory?
Only while the compiler is running.
Does the compiler just substitute variable_name for 0xaaaaaaaa whenever it sees it, and if so, wouldn't it have to use memory in order to make that substitution?
Yes, that's pretty much what happens, except it's a two-stage job with the linker. And yes, it uses memory, but it's the compiler's memory, not anything at runtime for your program.
An example might help you understand. Let's try out this program:
int x = 12;
int main(void)
{
return x;
}
Pretty straightforward, right? OK. Let's take this program, and compile it and look at the disassembly:
$ cc -Wall -Werror -Wextra -O3 example.c -o example
$ otool -tV example
example:
(__TEXT,__text) section
_main:
0000000100000f60 pushq %rbp
0000000100000f61 movq %rsp,%rbp
0000000100000f64 movl 0x00000096(%rip),%eax
0000000100000f6a popq %rbp
0000000100000f6b ret
See that movl line? It's grabbing the global variable (in an instruction-pointer relative way, in this case). No more mention of x.
Now let's make it a bit more complicated and add a local variable:
int x = 12;
int main(void)
{
volatile int y = 4;
return x + y;
}
The disassembly for this program is:
(__TEXT,__text) section
_main:
0000000100000f60 pushq %rbp
0000000100000f61 movq %rsp,%rbp
0000000100000f64 movl $0x00000004,0xfc(%rbp)
0000000100000f6b movl 0x0000008f(%rip),%eax
0000000100000f71 addl 0xfc(%rbp),%eax
0000000100000f74 popq %rbp
0000000100000f75 ret
Now there are two movl instructions and an addl instruction. You can see that the first movl is initializing y, which it's decided will be on the stack (base pointer - 4). Then the next movl gets the global x into a register eax, and the addl adds y to that value. But as you can see, the literal x and y strings don't exist anymore. They were conveniences for you, the programmer, but the computer certainly doesn't care about them at execution time.
A C compiler first creates a symbol table, which stores the relationship between each variable name and where it's located in memory. When compiling, it uses this table to replace all instances of the variable with a specific memory location, as others have stated. You can find much more on this on the Wikipedia page for symbol tables.
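To make the idea concrete, here is a toy symbol table of the kind a compiler might keep while compiling one function: it maps each local's name to an offset from the frame base pointer, like y at 0xfc(%rbp) (that is, -4) in the disassembly above. It is a hypothetical, heavily simplified sketch (fixed capacity, no alignment, linear search), not how any particular compiler implements it:

```c
#include <string.h>

#define MAX_SYMBOLS 16

/* One entry: a name and its offset from the frame base pointer. */
struct symbol {
    const char *name;
    int offset;
};

struct symtab {
    struct symbol syms[MAX_SYMBOLS];
    int count;
    int frame_size;   /* bytes reserved for locals so far */
};

/* Reserve `size` bytes (size > 0) for `name` below the base pointer.
   Returns the new offset, or 0 if the table is full. */
int symtab_declare(struct symtab *t, const char *name, int size)
{
    if (t->count == MAX_SYMBOLS)
        return 0;
    t->frame_size += size;
    t->syms[t->count].name = name;
    t->syms[t->count].offset = -t->frame_size;
    t->count++;
    return -t->frame_size;
}

/* What the compiler substitutes for `name`: its offset, or 0 if unknown. */
int symtab_lookup(const struct symtab *t, const char *name)
{
    for (int i = 0; i < t->count; i++)
        if (strcmp(t->syms[i].name, name) == 0)
            return t->syms[i].offset;
    return 0;
}
```

The name strings live only in this table, that is, in the compiler's memory; the emitted code contains just the offsets.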
All variables are substituted by the compiler. First they are replaced with references, and later the linker replaces those references with addresses.
In other words, the variable names are no longer available once the compiler (and, for global symbols, the linker) has run.
This is what's called an implementation detail. While what you describe is the case in all compilers I've ever used, it's not required to be the case. A C compiler could put every variable in a hash table and look them up at runtime (or something like that), and in fact early JavaScript interpreters did exactly that (now they do just-in-time compilation, which produces something much closer to raw machine code).
Specifically for common compilers like VC++, GCC, and LLVM: the compiler will generally assign a variable to a location in memory. Variables of global or static scope get a fixed address that doesn't change while the program is running, while variables within a function get a stack address, that is, an address relative to the current stack pointer, which changes every time a function is called. (This is an oversimplification.) Stack addresses become invalid as soon as the function returns, but have the benefit of having effectively zero overhead to use.
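A quick way to see the difference between fixed and stack addresses is to take the address of a local in several nested calls. In this sketch (hypothetical names), each live call has its own frame, so every `local` gets a distinct address, while the global's address stays the same:

```c
#include <stdint.h>

int global_value = 12;   /* static storage: one fixed address for the whole run */

/* Store the address of `local` for each of `depth` nested calls into out[].
   Returns depth + (depth - 1) + ... + 1. */
int record_local_addresses(uintptr_t *out, int depth)
{
    volatile int local = depth;   /* lives in this call's stack frame */
    out[0] = (uintptr_t)&local;
    int inner = depth > 1 ? record_local_addresses(out + 1, depth - 1) : 0;
    return local + inner;         /* reading local after the call keeps the frame live */
}

uintptr_t global_address(void)
{
    return (uintptr_t)&global_value;
}
```

The exact addresses vary between runs (for example with address space layout randomization), and which direction they move depends on the platform's stack layout.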
Once a variable has an address assigned to it, there is no further need for the name of the variable, so it is discarded. Depending on the kind of name, the name may be discarded at preprocess time (for macro names), compile time (for static and local variables/functions), and link time (for global variables/functions.) If a symbol is exported (made visible to other programs so they can access it), the name will usually remain somewhere in a "symbol table" which does take up a trivial amount of memory and disk space.
Does the compiler just substitute variable_name for 0xaaaaaaaa whenever it sees it
Yes.
and if so, wouldn't it have to use memory in order to make that substitution?
Yes, but that is the compiler's memory, used while it compiles your code. Why would it matter to your program at run time?

Sparc Procedure Call Conventions

I would like to do some "inline" assembly programming on SPARC, and I am wondering how to do that with register passing.
It's best to explain my issue with a small example:
int main()
{
int a = 5;
int b = 6;
int res;
asm_addition(a,b);
printf("Result: %d\n", res);
return(0);
}
// My assembler addition
.global asm_addition
.align 4
add rs1, rs2, rd
restore
Does anyone know which registers I have to use so that the values a and b will be added? And which register do I need to specify for rd so that the result is printed out by the last printf statement following the assembly routine?
Thanks so much for some input!
The calling convention might depend on the OS; I presume Solaris. Search for the System V Application Binary Interface SPARC Processor Supplement; the PDF is easy to find.
Full inline assembler documentation is buried somewhere in the SunStudio PDFs and not so easy to find. Officially it is also accessible via man -s 1 inline, though on my system I have to open the file manually. In the man page, look for "Coding Conventions for SPARC Systems".
On Solaris, parameters are passed in registers %o0 to %o5, then on the stack. If the called function is a leaf function (i.e. it doesn't call another function), the register window is not moved forward and the function accesses them directly via %o0 to %o5. If the register window is moved (with save), the function can access the parameters via the %i0 to %i5 registers. The return value goes back the same way: via %i0 in the callee, which becomes %o0 in the caller (a leaf function writes it directly to %o0).
Floating-point parameters are handled via the FP registers, but for those you will have to read the document Dummy00001 pointed to.
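Applied to the question's routine, a leaf version needs no save/restore: the arguments are already in %o0 and %o1, and the sum goes back in %o0. The sketch below assumes 32-bit SPARC with GCC and GNU as syntax (not the Sun Studio inline templates), and it has not been tested on SPARC hardware; on other targets it falls back to plain C so it still builds:

```c
/* int asm_addition(int a, int b): a leaf routine, so no save/restore.
   Arguments arrive in %o0 and %o1; the result is returned in %o0. */
#if defined(__sparc__) && !defined(__arch64__)
__asm__(
    "    .pushsection .text\n"
    "    .global asm_addition\n"
    "    .align 4\n"
    "asm_addition:\n"
    "    retl\n"                  /* return to the caller (%o7 + 8) ... */
    "    add %o0, %o1, %o0\n"     /* ... after this delay-slot add executes */
    "    .popsection\n");
int asm_addition(int a, int b);
#else
/* Portable fallback so the sketch still compiles and runs elsewhere. */
int asm_addition(int a, int b)
{
    return a + b;
}
#endif
```

In the question's main, the call then becomes res = asm_addition(a, b); so that printf prints 11.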
