Why is the compiler generating a push/pop instruction pair? - c

I compiled the code below with the VC++ 2010 compiler:
__declspec(dllexport)
unsigned int __cdecl __mm_getcsr(void) { return _mm_getcsr(); }
and the generated code was:
push ECX
stmxcsr [ESP]
mov EAX, [ESP]
pop ECX
retn
Why is there a push ECX/pop ECX instruction pair?

The compiler is making room on the stack to store the MXCSR. It could have equally well done this:
sub esp,4
stmxcsr [ESP]
mov EAX, [ESP]
add esp,4
retn
But "push ecx" is probably shorter or faster.

The push here is used to allocate 4 bytes of temporary space. [ESP] would normally point to the pushed return address, which we cannot overwrite.
ECX will be overwritten here, however, ECX is a probably a volatile register in the ABI you're targeting, so functions don't have to preserve ECX.
The reason a push/pop is used here is a space (and possibly speed) optimization.

It creates an top-of-stack entry that ESP now refers to as the target for the stmxcsr instruction. Then the result is stored in EAX for the return.

Related

Why the first actual parameter printing as a output in C

#include <stdio.h>
int add(int a, int b)
{
if (a > b)
return a * b;
}
int main(void)
{
printf("%d", add(3, 7));
return 0;
}
Output:
3
In the above code, I am calling the function inside the print. In the function, the if condition is not true, so it won't execute. Then why I am getting 3 as output? I tried changing the first parameter to some other value, but it's printing the same when the if condition is not satisfied.
What happens here is called undefined behaviour.
When (a <= b), you don't return any value (and your compiler probably told you so). But if you use the return value of the function anyway, even if the function doesn't return anything, that value is garbage. In your case it is 3, but with another compiler or with other compiler flags it could be something else.
If your compiler didn't warn you, add the corresponding compiler flags. If your compiler is gcc or clang, use the -Wall compiler flags.
Jabberwocky is right: this is undefined behavior. You should turn your compiler warnings on and listen to them.
However, I think it can still be interesting to see what the compiler was thinking. And we have a tool to do just that: Godbolt Compiler Explorer.
We can plug your C program into Godbolt and see what assembly instructions it outputs. Here's the direct Godbolt link, and here's the assembly that it produces.
add:
push rbp
mov rbp, rsp
mov DWORD PTR [rbp-4], edi
mov DWORD PTR [rbp-8], esi
mov eax, DWORD PTR [rbp-4]
cmp eax, DWORD PTR [rbp-8]
jle .L2
mov eax, DWORD PTR [rbp-4]
imul eax, DWORD PTR [rbp-8]
jmp .L1
.L2:
.L1:
pop rbp
ret
.LC0:
.string "%d"
main:
push rbp
mov rbp, rsp
mov esi, 7
mov edi, 3
call add
mov esi, eax
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call printf
mov eax, 0
pop rbp
ret
Again, to be perfectly clear, what you've done is undefined behavior. With different compiler flags or a different compiler version or even just a compiler that happens to feel like doing things differently on a particular day, you will get different behavior. What I'm studying here is the assembly output by gcc 12.2 on Godbolt with optimizations disabled, and I am not representing this as standard or well-defined behavior.
This engine is using the System V AMD64 calling convention, common on Linux machines. In System V, the first two integer or pointer arguments are passed in the rdi and rsi registers, and integer values are returned in rax. Since everything we work with here is either an int or a char*, this is good enough for us. Note that the compiler seems to have been smart enough to figure out that it only needs edi, esi, and eax, the lower half-words of each of these registers, so I'll start using edi, esi, and eax from this point on.
Our main function works fine. It does everything we'd expect. Our two function calls are here.
mov esi, 7
mov edi, 3
call add
mov esi, eax
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call printf
To call add, we put 3 in the edi register and 7 in the esi register and then we make the call. We get the return value back from add in eax, and we move it to esi (since it will be the second argument to printf). We put the address of the static memory containing "%d" in edi (the first argument), and then we call printf. This is all normal. main knows that add was declared to return an integer, so it has the right to assume that, after calling add, there will be something useful in eax.
Now let's look at add.
add:
push rbp
mov rbp, rsp
mov DWORD PTR [rbp-4], edi
mov DWORD PTR [rbp-8], esi
mov eax, DWORD PTR [rbp-4]
cmp eax, DWORD PTR [rbp-8]
jle .L2
mov eax, DWORD PTR [rbp-4]
imul eax, DWORD PTR [rbp-8]
jmp .L1
.L2:
.L1:
pop rbp
ret
The rbp and rsp shenanigans are standard function call fare and aren't specific to add. First, we load our two arguments onto the call stack as local variables. Now here's where the undefined behavior comes in. Remember that I said eax is the return value of our function. Whatever happens to be in eax when the function returns is the returned value.
We want to compare a and b. To do that, we need a to be in a register (lots of assembly instructions require their left-hand argument to be a register, while the right-hand can be a register, reference, immediate, or just about anything). So we load a into eax. Then we compare the value in eax to the value b on the call stack. If a > b, then the jle does nothing. We go down to the next two lines, which are the inside of your if statement. They correctly set eax and return a value.
However, if a <= b, then the jle instruction jumps to the end of the function without doing anything else to eax. Since the last thing in eax happened to be a (because we happened to use eax as our comparison register in cmp), that's what gets returned from our function.
But this really is just random. It's what the compiler happened to have put in that register previously. If I turn optimizations up (with -O3), then gcc inlines the whole function call and ends up printing out 0 rather than a. I don't know exactly what sequence of optimizations led to this conclusion, but since they started out by hinging on undefined behavior, the compiler is free to make what assumptions it chooses.

Does function parameter names has a place in memory in C?

I dont think function parameter names are treated like variables and they doesnt get stored in memory. But I dont get how we can use these parameters in functions as variables if they dont have any place in memory. Can anyone explain me whats going on with function parameters and if they have place or not in memory
All variables are either allocated somewhere or optimized away in case the compiler found them unnecessary. Function parameters are variables and are almost certainly stored either in a CPU register or on the stack, if they are used by the program.
The only time when they might not get allocated is when the function is inlined - when the whole function call is optimized away and the function code is instead injected in the caller-side machine code. In such cases the original variables used by the caller might be used instead.
Function parameter names however are not stored anywhere in the final executable, just like any other identifier isn't stored there either. Names of variables, functions etc only exist in the source code, for the benefit of the programmer alone.
Although your title asks about “function parameter names,” it appears your question is about function parameters, which are different.
Commonly, arguments are passed to functions by putting them in processor registers or on the hardware stack. Each computing platform has some specification of which arguments should be passed where. For example, the first few small arguments (such as int values) may be passed in certain processor registers, while more or larger arguments may be put on the stack.
To the called function, these are parameters. The called function uses them from the processor registers or the stack.
I'm writing this answer assuming you know what the stack and CPU registers are. If you don't, I'd suggest you look them up before seeing this answer.
I dont think function parameter names are treated like variables and they doesnt get stored in memory.
At the assembly level, function parameter names don't really exist. But for function parameters, it depends on the assembly generated based on the compiler's level of optimization. Consider this simple function:
int foo(int a, int b)
{
return a + b;
}
Using Compiler Explorer, I checked the generated disassembly of x64 GCC 10.2. On -O0, it looks like this:
foo:
push rbp
mov rbp, rsp
mov DWORD PTR [rbp-4], edi
mov DWORD PTR [rbp-8], esi
mov edx, DWORD PTR [rbp-4]
mov eax, DWORD PTR [rbp-8]
add eax, edx
pop rbp
ret
These two lines:
mov DWORD PTR [rbp-4], edi
mov DWORD PTR [rbp-8], esi
interestingly show that the arguments are passed to edi and esi for a and b respectively, and then moved into the stack, presumably in case the registers need to be used elsewhere in the function. The rest of the function uses the space in the stack as opposed to the registers:
mov edx, DWORD PTR [rbp-4]
mov eax, DWORD PTR [rbp-8]
add eax, edx
(In case you didn't know, eax/rax generally holds the value for functions and edx in this case just serves as a general-purpose register, so these three lines are besaically eax = a; edx = b; eax += edx).
Ok, so that makes sense. The arguments are passed to registers and copied to the stack, where they are used for the rest of the function. What about -O1?
foo:
lea eax, [rdi+rsi]
ret
Now that is a lot shorter. Here, eax gets the value of rdi + rsi and the function ends. All the copying to the stack is completely skipped and the registers are used directly. So yes, in this case, the memory is never used.
EDIT
After writing this answer, I went and checked the generated assembly with the -m32 option and noticed that arguments were always pushed to the stack before the function was called. Assembly generated from -O0 looks like this:
foo:
push ebp
mov ebp, esp
mov edx, DWORD PTR [ebp+8]
mov eax, DWORD PTR [ebp+12]
add eax, edx
pop ebp
ret
Here, since the arguments are passed to the stack before the function is called, they don't have be copied from the registers to the stack (because they're already there). So the function is shorter, and amount of registers used is reduced. However, on higher levels of optimization, the function ends up becoming longer because of this:
foo:
mov eax, DWORD PTR [esp+8]
add eax, DWORD PTR [esp+4]
ret
So with -m32 set, parameters are always placed in memory.

Stack cleanup not working (__stdcall MASM function)

there's something weird going on here. Visual Studio is letting me know the ESP value was not properly saved but I cannot see any mistakes in the code (32-bit, windows, __stdcall)
MASM code:
.MODE FLAT, STDCALL
...
memcpy PROC dest : DWORD, source : DWORD, size : DWORD
MOV EDI, [ESP+04H]
MOV ESI, [ESP+08H]
MOV ECX, [ESP+0CH]
AGAIN_:
LODSB
STOSB
LOOP AGAIN_
RETN 0CH
memcpy ENDP
I am passing 12 bytes (0xC) to the stack then cleaning it up. I have confirmed by looking at the symbols the functions symbol goes like "memcpy#12", so its indeed finding the proper symbol
this is the C prototype:
extern void __stdcall * _memcpy(void*,void*,unsigned __int32);
Compiling in 32-bit. The function copies the memory (I can see in the debugger), but the stack cleanup appears not to be working
EDIT:
MASM code:
__MyMemcpy PROC _dest : DWORD, _source : DWORD, _size : DWORD
MOV EDI, DWORD PTR [ESP + 04H]
MOV ESI, DWORD PTR [ESP + 08H]
MOV ECX, DWORD PTR [ESP + 0CH]
PUSH ESI
PUSH EDI
__AGAIN:
LODSB
STOSB
LOOP __AGAIN
POP EDI
POP ESI
RETN 0CH
__MyMemcpy ENDP
C code:
extern void __stdcall __MyMemcpy(void*, void*, int);
typedef struct {
void(__stdcall*MemCpy)(void*,void*,int);
}MemFunc;
int initmemfunc(MemFunc*f){
f->MemCpy=__MyMemcpy
}
when I call it like this I get the error:
MemFunc mf={0};
initmemfunc(&mf);
mf.MemCpy(dest,src,size);
when I call it like this I dont:
__MyMemcpy(dest,src,size)
Since you have provided an update to your question and comments suggesting you disable prologue and epilogue code generation for functions created with the MASM PROC directive I suspect your code looks something like this:
.MODEL FLAT, STDCALL
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
.CODE
__MyMemcpy PROC _dest : DWORD, _source : DWORD, _size : DWORD
MOV EDI, DWORD PTR [ESP + 04H]
MOV ESI, DWORD PTR [ESP + 08H]
MOV ECX, DWORD PTR [ESP + 0CH]
PUSH ESI
PUSH EDI
__AGAIN:
LODSB
STOSB
LOOP __AGAIN
POP EDI
POP ESI
RETN 0CH
__MyMemcpy ENDP
END
A note about this code: beware that if your source and destination buffers overlap this can cause problems. If the buffers don't overlap then what you are doing should work. You can avoid this by marking the pointers __restrict. __restrict is an MSVC/C++ extension that will act as a hint to the compiler that the argument doesn't overlap with another. This can allow the compiler to potentially warn of this situation since your assembly code is unsafe for that situation. Your prototypes could have been written as:
extern void __stdcall __MyMemcpy( void* __restrict, void* __restrict, int);
typedef struct {
void(__stdcall* MemCpy)(void* __restrict, void* __restrict, int);
}MemFunc;
You are using PROC but not taking advantage of any of the underlying power it affords (or obscures). You have disabled PROLOGUE and EPILOGUE generation with the OPTION directive. You properly use RET 0Ch to have the 12 bytes of arguments cleaned from the stack.
From a perspective of the STDCALL calling convention your code is correct as it pertains to stack usage. There is a serious issue in that the Microsoft Windows STDCALL calling convention requires the caller to preserve all the registers it uses except EAX, ECX, and EDX. You clobber EDI and ESI and both need to be saved before you use them. In your code you save them after their contents are destroyed. You have to push both ESI and EDI on the stack first. This will require you adding 8 to the offsets relative to ESP. Your code should have looked like this:
__MyMemcpy PROC _dest : DWORD, _source : DWORD, _size : DWORD
PUSH EDI ; Save registers first
PUSH ESI
MOV EDI, DWORD PTR [ESP + 0CH] ; Arguments are offset by an additional 8 bytes
MOV ESI, DWORD PTR [ESP + 10H]
MOV ECX, DWORD PTR [ESP + 14H]
__AGAIN:
LODSB
STOSB
LOOP __AGAIN
POP ESI ; Restore the caller (non-volatile) registers
POP EDI
RETN 0CH
__MyMemcpy ENDP
You asked the question why it appears you are getting an error about ESP or a stack issue. I assume you are getting an error similar to this:
This could be a result of either ESP being incorrect when mixing STDCALL and CDECL calling conventions or it can arise out of the value of the saved ESP being clobbered by the function. It appears in your case it is the latter.
I wrote a small C++ project with this code that has similar behaviour to your C program:
#include <iostream>
extern "C" void __stdcall __MyMemcpy( void* __restrict, void* __restrict, int);
typedef struct {
void(__stdcall* MemCpy)(void* __restrict, void* __restrict, int);
}MemFunc;
int initmemfunc(MemFunc* f) {
f->MemCpy = __MyMemcpy;
return 0;
}
char buf1[] = "Testing";
char buf2[200];
int main()
{
MemFunc mf = { 0 };
initmemfunc(&mf);
mf.MemCpy(buf2, buf1, strlen(buf1));
std::cout << "Hello World!\n" << buf2;
}
When I use code like yours that doesn't properly save ESI and EDI I discovered this in the generated assembly code displayed in the Visual Studio C/C++ debugger:
I have annotated the important parts. The compiler has generated C runtime checks (these can be disabled, but they will just hide the problem and not fix it) including a check of ESP across a STDCALL function call. Unfortunately it relies on saving the original value of ESP (before pushing parameters) into the register ESI. As a result a runtime check is made after the call to __MyMemcpy to see if ESP and ESI are still the same value. If they aren't you get the warning about ESP not being saved correctly.
Since your code incorrectly clobbers ESI (and EDI) the check fails. I have annotated the debug output to hopefully provide a better explanation.
You can avoid the use of a LODSB/STOSB loop to copy data. There is an instruction that just this very operation (REP MOVSB) that copies ECX bytes pointed to by ESI and copies them to EDI. A version of your code could have been written as:
__MyMemcpy PROC _dest : DWORD, _source : DWORD, _size : DWORD
PUSH EDI ; Save registers first
PUSH ESI
MOV EDI, DWORD PTR [ESP + 0CH] ; Arguments are offset by an additional 8 bytes
MOV ESI, DWORD PTR [ESP + 10H]
MOV ECX, DWORD PTR [ESP + 14H]
REP MOVSB
POP ESI ; Restore the caller (non-volatile) registers
POP EDI
RETN 0CH
__MyMemcpy ENDP
If you were to use the power of PROC to save the registers ESI and EDI you could list them with the USES directive. You can also reference the argument locations on the stack by name. You can also have MASM generate the proper EPILOGUE sequence for the calling convention by simply using ret. This will clean the up the stack appropriately and in the case of STDCALL return by removing the specified number of bytes from the stack (ie ret 0ch) in this case since there are 3 4-byte arguments.
The downside is that you do have to generate the PROLOGUE and EPILOGUE code that can make things more inefficient:
.MODEL FLAT, STDCALL
.CODE
__MyMemcpy PROC USES ESI EDI dest : DWORD, source : DWORD, size : DWORD
MOV EDI, dest
MOV ESI, source
MOV ECX, size
REP MOVSB ; Use instead of LODSB/STOSB+Loop
RET
__MyMemcpy ENDP
END
The assembler would generate this code for you:
PUBLIC __MyMemcpy#12
__MyMemcpy#12:
push ebp
mov ebp,esp ; Function prologue generate by PROC
push esi ; USES caused assembler to push EDI/ESI on stack
push edi
mov edi,dword ptr [ebp+8]
mov esi,dword ptr [ebp+0Ch]
mov ecx,dword ptr [ebp+10h]
rep movs byte ptr es:[edi],byte ptr [esi]
; MASM generated this from the simple RET instruction to restore registers,
; clean up stack and return back to caller per the STDCALL calling convention
pop edi ; Assembler
pop esi
leave
ret 0Ch
Some may rightly argue that having the assembler obscure all this work makes the code potentially harder to understand for someone who doesn't realize the special processing MASM can do with a PROC declared function. This may result in harder to maintain code for someone else that is unfamiliar with MASM's nuances in the future. If you don't understand what MASM may generate, then sticking to coding the body of the function yourself is probably a safer bet. As you have found that also involves turning PROLOGUE and EPILOGUE code generation off.
The reason why the stack is corrupted is that MASM "secretly" inserts the prologue code to your function. When I added the option to disable that, the function works for me now.
You can see this, when you switch to assembly mode while still in the C code and then step into your function. It seems that VS doesn't swtich to assembly mode when already in the assembly source.
.586
.MODEL FLAT,STDCALL
OPTION PROLOGUE:NONE
.CODE
mymemcpy PROC dest:DWORD, src:DWORD, sz:DWORD
MOV EDI, [ESP+04H]
MOV ESI, [ESP+08H]
MOV ECX, [ESP+0CH]
AGAIN_:
LODSB
STOSB
LOOP AGAIN_
RETN 0CH
mymemcpy ENDP
END

Dive into the assembly

Function in c:
PHPAPI char *php_pcre_replace(char *regex, int regex_len,
char *subject, int subject_len,
zval *replace_val, int is_callable_replace,
int *result_len, int limit, int *replace_count TSRMLS_DC)
{
pcre_cache_entry *pce; /* Compiled regular expression */
/* Compile regex or get it from cache. */
if ((pce = pcre_get_compiled_regex_cache(regex, regex_len TSRMLS_CC)) == NULL) {
return NULL;
}
....
}
Its assembly:
php5ts!php_pcre_replace:
1015db70 8b442408 mov eax,dword ptr [esp+8]
1015db74 8b4c2404 mov ecx,dword ptr [esp+4]
1015db78 56 push esi
1015db79 8b74242c mov esi,dword ptr [esp+2Ch]
1015db7d 56 push esi
1015db7e 50 push eax
1015db7f 51 push ecx
1015db80 e8cbeaffff call php5ts!pcre_get_compiled_regex_cache (1015c650)
1015db85 83c40c add esp,0Ch
1015db88 85c0 test eax,eax
1015db8a 7502 jne php5ts!php_pcre_replace+0x1e (1015db8e)
php5ts!php_pcre_replace+0x1c:
1015db8c 5e pop esi
1015db8d c3 ret
The c function call pcre_get_compiled_regex_cache(regex, regex_len TSRMLS_CC) corresponds to 1015db7d~1015db80 which pushes the 3 parameters to the stack and call it.
But my doubt is,among so many registers,how does the compiler decide to use eax,ecx and esi(this is special,as it's restored before using,why?) as the intermediate to carry to the stack?
There must be some hidden indication in c that tells the compiler to do it this way,right?
No, there is no hidden indication.
This is a typical strategy for generating 80x86 instructions used by many compiler implementations, C and otherwise. For example, the 1980s Intel Fortran-77 compiler, when optimization was turned on, did the same thing.
That is uses eax and ecx preferentially is probably an artifact of avoiding use of esi and edi since those registers cannot directly be used to load byte operands.
Why not ebx and edx? Well, those are preferred by many code generators for holding intermediate pointers in evaluating complex structure evaluation, which is to say, there isn't much reason at all. The compiler just looked for two available registers to use and overwrote them to buffer the values.
Why not reuse eax like this?:
push esi
mov eax,dword ptr [esp+2Ch]
push eax
mov eax,dword ptr [esp+8]
push eax
mov eax,dword ptr [esp+4]
push eax
Because that causes pipeline stalls waiting for eax to complete previous memory cycles, in 80x86s since the 80586 (maybe 80486—it's too long ago to be sure off the top of my head).
The x86 architecture is a strange beast. Each register, though promoted as being "general purpose" by Intel, has its quirks (cx/ecx is tied to the loop instruction for example, and eax:edx is tied to the multiply instruction). That combined with the peculiar ways to optimize execution to avoid cache misses and pipeline stalls often leads to inscrutable generated code by a code generator which factors all that in.

Accessing parameters passed on the stack in an ASM function

I am writing an assembly function to be called from C that will call the sidt machine instruction with a memory address passed to the C function. From my analysis of the code produced by MSVC10, I have constructed the following code (YASM syntax):
SECTION .data
SECTION .text
GLOBAL _sidtLoad
_sidtLoad:
push ebp
mov ebp,esp
sub esp,0C0h
push ebx
push esi
push edi
lea edi,[ebp-0C0h]
mov ecx,30h
mov eax,0CCCCCCCCh
sidt [ebp+8]
pop edi
pop esi
pop ebx
add esp,0C0h
cmp ebp,esp
mov esp,ebp
pop ebp
ret
Here is the C function signature:
void sidtLoad (void* mem);
As far as I can see everything should work, I have even checked the memory address passed to the function and have seen that it matches the address stored at [ebp+8] (the bytes are reversed, which I presume is a result of endianness and should be handled by the machine). I have tried other arguments for the sidt instruction, such as [ebp+12], [ebp+4], [ebp-8] etc but no luck.
P.S I am writing this in an external assembly module in order to get around the lack of inline assembly when targeting x64 using MSVC10. However, this particular assembly function/program is being run as x86, just so I can get to grips with the whole process.
P.P.S I am not using the __sidt intrinsic as the Windows Driver Kit (WDK) doesn't seem to support it. Or at least I can't get it to work with it!
Your code will write the IDT into the memory address ebp+8, which is not what you want. Instead, you need something like:
mov eax, [ebp+8]
sidt [eax]
compile:
int sidLoad_fake(void * mem) {
return (int)mem;
}
to assembly, and then look at where it pulls the value from to put in the return register.
I think that the first few arguments on x86_64 are passed in registers.

Resources