What does `PUSH 0xFFFFFFFF` mean in a function prologue? - c

I'm trying to understand assembly code through a book called "Reverse Engineering for Beginners" [LINK]. There was a piece of Win32 assembly code I didn't quite understand.
main:
push 0xFFFFFFFF
call MessageBeep
xor eax,eax
retn
What does the first PUSH instruction do? Why is it pushing 0xFFFFFFFF to the stack, but never popping it back again? What is the significance of 0xFFFFFFFF?
Thanks in advance.

You are looking at the equivalent code for
int main() {
MessageBeep(0xffffffff);
return 0;
}
The assembly code doesn't actually contain any prologue or epilogue: since this function doesn't use the stack or clobber any preserved register, it just has to perform a function call and return 0 (which is put in eax at the end). It may receive arguments it doesn't use, as long as it uses the cdecl calling convention (where the caller is responsible for cleaning up the arguments).
MessageBeep, like almost all Win32 APIs, uses the stdcall calling convention (you'll find it in the C declarations hidden behind the WINAPI macro), which means that the called function is responsible for removing its parameters from the stack.
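Your code pushes 0xFFFFFFFF as the only argument to MessageBeep and calls it. The value itself is simply MessageBeep's uType parameter; according to the Win32 documentation, 0xFFFFFFFF requests a simple beep (generated with the speaker if no sound card is available). MessageBeep does its work and, at the end, ensures that all its arguments are popped from the stack before returning (there's a special form of the ret instruction for this, ret imm16; here ret 4 pops the return address and then discards the 4-byte argument). When your code regains control, the stack is as it was before you pushed the arguments.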

Related

Is _start() a function?

It stands to reason that, for executable code to be called a function, it should conform to the function calling convention of the platform it's running on.
However, _start() does not; for example in this reference implementation there is no return address on the stack:
.section .text
.global _start
_start:
# Set up end of the stack frame linked list.
movq $0, %rbp
pushq %rbp # rip=0
pushq %rbp # rbp=0
movq %rsp, %rbp
# We need those in a moment when we call main.
pushq %rsi
pushq %rdi
# Prepare signals, memory allocation, stdio and such.
call initialize_standard_library
# Run the global constructors.
call _init
# Restore argc and argv.
popq %rdi
popq %rsi
# Run main
call main
# Terminate the process with the exit code.
movl %eax, %edi
call exit
.size _start, . - _start
Yet it's called a function in a myriad of sources. A number of questions and answers on StackOverflow also refer to it as a function.
Is a function simply a group of instructions identified by the address of the entry point, or must it conform to the calling convention? The C standard does not seem to define the concept of a function, and neither do the GCC and Clang docs. What is the authoritative source that defines this concept?
As for the lack of a return making a piece of code not a function: even a function written in C does not have to contain a return instruction:
int call_fn(int(*fn)()) {
return fn();
}
With optimizations enabled, this function compiles down to a single jmp instruction: https://godbolt.org/z/nxT9qTvaf
call_fn(int (*)()): # #call_fn(int (*)())
jmp rdi # TAILCALL
In general, I don't think the C or the C++ standard says anything about code written in assembly. A common calling convention makes direct calls into functions written in other languages possible, but you can still call functions that use other calling conventions by going through a trampoline.
It stands to reason that, for executable code to be called a function, it should conform to the function calling convention of the platform it's running on.
"Function" is the primary idea here; "calling convention" is subsidiary to that. As such, I think a more supportable claim would be that for every function, there is a convention for calling it.
Interoperability considerations lead to standardization of calling conventions, but there is no One True calling convention, not even on a per-platform basis. Even subject to the influence of interoperability, there are platforms that support multiple standard calling conventions. In any case the existence of standard calling conventions does not necessarily relegate code with other conventions for entry and exit to non-function-hood.
Is a function simply a group of instructions identified by the address to the entry point, or must it conform to the calling convention?
This is a question of the definition of "function". There is room for variation on this, and in practice, different definitions apply in different contexts. For example, the question refers to the C language specification, but this speaks to the meaning of "function" in the context of C source code, not assembly or machine code.
In practice, in various languages and contexts, there are
functions with identifiers and functions without;
functions that return a value and functions that don't;
functions with a single entry point and functions with multiple entry points;
functions with a single exit point and functions with multiple exit points;
functions that always return to the caller, functions that usually return, functions that occasionally return, and functions that never return;
a wide variety of patterns for how functions receive data to operate on, how they return data to their caller (if they do so), and what invariants they do and do not ensure;
and other dimensions of variation, too.
Thus, no, I do not accept in any universal sense that a piece of code needs to conform to a particular calling convention to be called a "function", and I also do not accept "a group of instructions identified by the address to the entry point" as a satisfactory universal definition.
Is _start() a function?
A _start() function such as is provided by GCC / Glibc satisfies some relevant definitions of the term. I have no problem with calling it a "function".
There seems to be this idea going around in the newer programming models that all running code is inside functions; but in the beginning this was not so, and if we look at the old languages we can observe this.
Drawing from Lisp:
(format t "Hello, World!")
This is "Hello, World!" in Common Lisp, and it is not a function in any normal sense. For comparison, here it is as a function:
(defun hello ()
(format t "Hello, World!"))
(hello)
And from near the other root of all programming languages, here is Fortran (source):
      PROGRAM FUNDEM
C     Declarations for main program
      REAL A,B,C
      REAL AV, AVSQ1, AVSQ2
      REAL AVRAGE
C     Enter the data
      DATA A,B,C/5.0,2.0,3.0/
C     Calculate the average of the numbers
      AV = AVRAGE(A,B,C)
      AVSQ1 = AVRAGE(A,B,C) **2
      AVSQ2 = AVRAGE(A**2,B**2,C**2)
      PRINT *,'Statistical Analysis'
      PRINT *,'The average of the numbers is:',AV
      PRINT *,'The average squared of the numbers: ',AVSQ1
      PRINT *,'The average of the squares is: ', AVSQ2
      END
      REAL FUNCTION AVRAGE(X,Y,Z)
      REAL X,Y,Z,SUM
      SUM = X + Y + Z
      AVRAGE = SUM /3.0
      RETURN
      END
Yup, that's top-level statements and a function definition. Fortran has three things here: the PROGRAM, SUBROUTINEs, and FUNCTIONs.
And again, we can do the same kind of example in QuickBasic:
CALL Hello
Sub Hello()
Print "Hello, World"
End Sub
QuickBasic was kind of funny; you never even tried to name the entry point, and whatever .OBJ file was first in the build script was where the entry point was.
There's a general recurring theme here. In all of these, the top level isn't very function-like. The compiler would add stuff to the beginning of the entry point for you so that runtime initialization worked correctly.
Now what happened in C? C took a different path. The initialization routines were written in their own file that calls main(), and the compiler just compiles main() as it would any other function; it has no capacity for emitting code that runs at top level. Thus, the entry point (traditionally called _start, though it doesn't have to be) is not and cannot be written in C.
Don't get me wrong here: if you were to compile any of these on a Unix platform today and look at the resulting .o files, you would see that modern compilers emit a main() function with the top-level code in it. This is because of the preeminence of the C runtime and not because of any need for it to be a function. Had the other languages carried the definitions of the system calls around in their runtimes like they used to, this would not be necessary.
Thus the process entry point is not a function.
We can take this argument one step further. Suppose (and I have seen news articles reference something like this) we had a full native Java compiler that emitted .o files and linked against .so files providing the Java runtime; we could then ask, "Is _start a class method?" The answer isn't no. The answer is that the question makes no sense, because you can't get a valid Java reference to the symbol. The same silly thing happens in C; we just need to pick a different platform. In the DOS far memory model, _start is exported as PROC NEAR, but a C declaration void _start() expects a PROC FAR. The emitted link-time fixup is of the wrong size, and trying to take the address of _start results in undefined behavior.
You are mixing fields. You can't apply "text specification" to oranges.
the C standard does not seem to define the concept of a function
C is a language. In the C language, the text like the following:
void func();
is a function declaration of a function func.
Is _start() a function?
The text you posted is not in the C language. There are no function declarations or definitions in it.
As you stated, the term function is not defined in the C standard. I would assume that the English language understanding of the term "function" applies here, as to any other word in the C standard.
I see in Merriam-Webster that a "function" is a computer subroutine, where a subroutine is a sequence of computer instructions for performing a specified task that can be used repeatedly.
Clearly, _start is a function - it is a sequence of instructions to be executed repeatedly, it is executed on a computer, and it also operates on variables in the form of registers.
The text you posted represents the function _start in the form of a text using assembly language. It is not possible to represent the function _start in the C programming language.
(It is also not possible to express oranges, yet they exist in the real world. My point is, you can take any other word in the C standard, like, I don't know, "international", and ask "Are oranges international?". Applying C standard and "language-lawyer" tag to abstract contexts is not going to give you answers. Bottom line is that the C standard is a specification - it tells what happens when, it is not a dictionary.)
Is a function simply a group of instructions identified by the address to the entry point, or must it conform to the calling convention?
See Merriam-Webster function.
What is the authoritative source that defines this concept?
I googled it: "There is no official agency that makes rules for the English language".
The C standard is created by http://www.open-std.org/jtc1/sc22/wg14/ .

Register usage in ARM assembly function which is called by a C function

The C function call convention for ARM says:
Caller will pass the first 4 parameters in r0-r3.
Caller will pass any extra parameters on stack.
Caller will get the return value from r0.
I am handcrafting an assembly function called by C. The prototype is equivalent to this:
void s(void);
Suppose a C function c() calls s().
Since s() has no parameters and no return value, I believe r0-r3 will not be touched by the compiler when it generates the calling sequence for c() to call s().
Suppose s() will use r0-r12 to complete its function. It is also possible that c() will use those registers.
I am not sure whether I have to explicitly save and restore all the registers touched in s(), say r0-r12. Such memory operations cost some time.
Or at least I don't have to do that for r0-r3?
From Procedure Call Standard for the Arm Architecture, section 6.1.1 (page 19):
A subroutine must preserve the contents of the registers r4-r8, r10, r11 and SP (and r9 in PCS variants that designate r9 as v6)
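So yes, since r0-r3 are scratch registers, you do not need to save them before using them in s(); the same goes for r12 (IP), which the standard also treats as a scratch register. You do, however, have to save and restore any of the preserved registers listed above that s() modifies (and lr, if s() itself calls other functions).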
Assuming that the compiler is compliant with the ARM ABI, then declaring s() like this:
extern void s(void);
should suffice, and the compiler should not emit code that relies on previous values of r0-r3 in the c() function after the call to s() (i.e. c() should save r0-r3 if needed before calling s() and restore them after), since that would break the ABI compliance.
Generally when mixing C and asm, you can never make any assumptions about what registers the C code uses, save for those guaranteed to get stacked by the calling convention. Stack all other registers before using them and then pop them later. All of this depends on what assumptions the compiler makes and doesn't make internally upon calling your assembler function.
Some good info here: Mixing C, C++, and Assembly Language

MASM and C jump to function

I have a pointer to a __stdcall function in C, and, in both x86 and x64 assembly, I'd like to have an asm function that I can use to jump to that function.
For example take the windows API function MessageBoxW
void *fn = (void *)GetProcAddress(GetModuleHandleA("user32.dll"), "MessageBoxW"); /* MessageBoxW is exported by user32.dll; the export is looked up by name */
Then in C I'll have a call to the ASM, like
void foo()
{
MessageBoxW_asmstub(NULL, L"test", L"test", MB_OK); /* wide strings for the W API; MB_OK is 0 */
}
Assume fn is global. Then, in assembly, I'd like to have a function that just forwards to MessageBoxW rather than calling it. In other words, I want MessageBoxW to clean up the arguments passed to MessageBoxW_asmstub and then return to foo.
jump (fn) ?
I don't know how to do this.
Assuming that MessageBoxW_asmstub is declared to the C compiler with the correct calling convention (i.e. __stdcall for x86; for x64 there is thankfully only one calling convention), then, as the comment from Ross Ridge said, this is as simple as jumping to the target function, which will then return directly to the caller. Since you have an indirect reference (fn holds a pointer to the target), the jump has to go through that pointer: either load it into a register first, or use x86's memory-indirect form of jmp, which reads the target address from memory (e.g. jmp qword ptr [fn] on x64). If you load it into a register, you can use any volatile register in the calling convention, e.g. for x64 you might use something along the lines of:
extern fn:qword
MessageBoxW_asmstub:
mov rax, fn
jmp rax
BTW, if you use a debugger to step through calls to delay-loaded DLL imports, you'll probably see a similar pattern used in the linker-generated stub functions.

Assembly analyzing system() function called in C

So I made a very simple C program to study how C works on the inside. It has just 1 line in the main() excluding return 0:
system("cls");
If I use OllyDbg to analyze this program, it shows something like this (the text after the semicolons is comments generated by OllyDbg):
MOV DWORD PTR SS:[ESP],test_1.004030EC ; ||ASCII "cls"
CALL <JMP.&msvcrt.system> ; |\system
Can someone explain what this means, and if I want to change the "cls" called in the system() to another command, where is the "cls" stored? And how do I modify it?
You are using a 32-bit Windows system, with its corresponding ABI (the conventions followed when functions are called).
MOV DWORD PTR SS:[ESP],test_1.004030EC
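This stores the address of the string cls at the top of the stack. For passing the argument it has the same effect as a push 4030ech instruction, except that it doesn't move ESP: the compiler reserved space for outgoing arguments earlier (typically with a sub esp, ... in the prologue) and writes the argument directly into that slot.
This is the way parameters are passed to functions, and it tells us that the string cls is at address 4030ech.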
CALL <JMP.&msvcrt.system> ; |\system
This is the call to the system function from the CRT.
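The JMP in the name is due to how calls into DLLs are linked by default: the call goes to a small stub that jumps through the import address table entry for msvcrt's system, rather than directly to the function.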
So those two lines are simply passing the address of the string to the system function.
If you want to modify it, you need to check whether it is in a writable section (it probably isn't) by inspecting the PE sections; your debugger may have a tool for that. Or you could simply try the following:
Inspect the memory at 4030ech; you will see the string. Try editing it (how you do this is debugger-dependent).
Note: I use the TASM notation for hex numbers, i.e. 123h means 0x123 in C notation.

Who is responsible for cleanup?

I wish to know which one is responsible for cleaning up the stack.
Suppose you have a function fun(int x, int y, float z, char c), let's say called like this:
var = fun(x, y, z, c);
When fun gets called, it goes onto the stack along with its parameters. Then, when the function returns, who is responsible for cleaning up the stack: the function itself, or the caller (the code where "var" will hold the return value)?
One more thing, can anyone explain the concepts of calling conventions?
You referred to the answer yourself: calling conventions.
A calling convention is similar to a contract. It decides the following things:
Who is responsible for cleaning up the parameters.
How and in which order the parameters are passed to the called function.
Where the return value is stored.
There are many different calling conventions, depending on the platform and the programming environment. Two common calling conventions on the x86 platforms are:
stdcall
The parameters are passed onto the stack from right to left. The called function cleans up the stack.
cdecl
The parameters are passed onto the stack from right to left. The calling function cleans up the stack.
In both cases the return value is in the EAX register (or ST0 for floating point values)
Many programming languages for the x86 platform allow you to specify the calling convention, for example:
Delphi
function MyFunc(x: Integer): Integer; stdcall;
Microsoft C/C++
int __stdcall myFunc(int x)
Some usage notes:
When creating a simple application it's rarely necessary to change or to know about the calling convention, but there are two typical cases where you need to concern yourself with calling conventions:
When calling external libraries, Win32 API for example: You have to use compatible calling conventions, otherwise the stack might get corrupted.
When creating inline assembler code: You have to know in which registers and where on the stack you find the variables.
For further details I recommend these Wikipedia articles:
Calling convention
x86 calling conventions
Calling convention refers to who does the cleanup of the stack: caller or callee.
Calling conventions can differ in:
where parameters and return values are placed (in registers; on the call stack; a mix of both);
the order in which parameters (or parts of a single parameter) are passed;
how the task of setting up and cleaning up a function call is divided between the caller and the callee;
sometimes also which registers may be used directly by the callee without being preserved.
Architectures almost always have more than one possible calling convention.
By the time that line is complete, var will hold the value returned by fun(), and any memory on the stack used by fun will be gone: "push", "pop", all tidy.
Calling conventions: everything that the compiler organises so that fun can do its work. Consider those parameters x, y, z. What order do they get pushed onto the stack (indeed do they get passed via the stack)? Doesn't matter so long as the caller and callee agree! It's a convention.
