How can one access the first parameter of a function using the stack frame?

There's a series of problems in SPOJ about creating a function in a single line with some constraints. I've already solved the easy, medium and hard ones, but for the impossible one I keep getting Wrong Answer.
To sum it up, the problem asks you to fill in the code of a return statement such that if x is 1, the return value is 2; for any other x it should return 3. The constraint is that the letter 'x' can't be used and no other code can be added; one can only write that return statement. Clearly, to solve this, one must resort to a hack.
So I've used gcc's built-in way to get the stack frame, and then decremented the pointer to get a pointer to the first parameter. Other than that, the statement is just a normal comparison.
On my machine it works fine, but on the cluster (Intel Pentium G860) used by the online judge it doesn't, probably due to a different calling convention. I'm not sure I understood the processor's ABI (in particular whether the stack frame pointer is saved on the stack or only in a register), or even whether I'm reading the correct ABI.
The question is: what would be the correct way to get the first parameter of a function using the stack?
My code is (it must be formatted this way, otherwise it's not accepted):
#include <stdio.h>
int count(int x){
return (*(((int*)__builtin_frame_address(0))-1) == 1) ? 2 : 3;
}
int main(i){
for(i=1;i%1000001;i++)
printf("%d %d\n",i,count(i));
return 0;
}

The question is: what would be the correct way to get the first parameter of a function using the stack?
There is no portable way to do this. You must assume a specific compiler, its settings and ABI, along with the calling convention.
The gcc compiler is likely to "lay down" an int local variable at offset -0x4 (assuming that sizeof(int) == 4). You can observe this with the most basic definition of count:
4 {
0x00000000004004c4 <+0>: push %rbp
0x00000000004004c5 <+1>: mov %rsp,%rbp
0x00000000004004c8 <+4>: mov %edi,-0x4(%rbp)
5 return x == 1 ? 2 : 3;
0x00000000004004cb <+7>: cmpl $0x1,-0x4(%rbp)
0x00000000004004cf <+11>: jne 0x4004d8 <count+20>
0x00000000004004d1 <+13>: mov $0x2,%eax
0x00000000004004d6 <+18>: jmp 0x4004dd <count+25>
0x00000000004004d8 <+20>: mov $0x3,%eax
6 }
0x00000000004004dd <+25>: leaveq
0x00000000004004de <+26>: retq
You may also see that the %edi register holds the first parameter. This is the case for the AMD64 System V ABI (%edi is also not preserved between calls).
Now, with that knowledge, you might write something like:
int count(int x)
{
return *((int*)(__builtin_frame_address(0) - sizeof(int))) == 1 ? 2 : 3;
}
which can be obfuscated as:
return *((int*)(__builtin_frame_address(0)-sizeof(int)))==1?2:3;
However, the catch is that an optimizing compiler may notice that x is never referenced inside count and simply skip spilling it to the stack. For example, it produces the following assembly with the -O flag:
4 {
0x00000000004004c4 <+0>: push %rbp
0x00000000004004c5 <+1>: mov %rsp,%rbp
5 return *((int*)(__builtin_frame_address(0)-sizeof(int)))==1?2:3;
0x00000000004004c8 <+4>: cmpl $0x1,-0x4(%rbp)
0x00000000004004cc <+8>: setne %al
0x00000000004004cf <+11>: movzbl %al,%eax
0x00000000004004d2 <+14>: add $0x2,%eax
6 }
0x00000000004004d5 <+17>: leaveq
0x00000000004004d6 <+18>: retq
As you can see, the mov %edi,-0x4(%rbp) instruction is now missing, so the only way1 is to access the value of x from the %edi register:
int count(int x)
{
return ({register int edi asm("edi");edi==1?2:3;});
}
but this method can't really be "obfuscated", as whitespace is needed to declare the variable that holds the value of %edi.
1) Not necessarily. Even if the compiler decides to skip the mov from register to stack, there is still a way to "force" it to do so, with asm("mov %edi,-0x4(%rbp)"); inline assembly. Beware though: the compiler may take its revenge, sooner or later.
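To make the footnote concrete, here is a sketch that combines both tricks (it assumes the AMD64 System V calling convention, an %rbp-based frame, and that -0x4(%rbp) is the slot gcc would have used for the spill - none of which is guaranteed by the C standard; it also adds a statement, which the original puzzle forbids):
int count(int x)
{
    /* Force the spill the optimizer skipped, then read the value back
       through the frame pointer. Fragile by design: it hard-codes gcc's
       usual frame layout for this function. */
    asm("mov %edi,-0x4(%rbp)");
    return *(((int*)__builtin_frame_address(0)) - 1) == 1 ? 2 : 3;
}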

The C standard does NOT require a stack in any implementation, so strictly speaking your problem doesn't make sense.
In the context of gcc, the behavior is different on x86 and x86-64 (and other targets).
On x86, parameters reside on the stack, but on x86-64 the first 6 integer parameters (including implicit ones) are passed in registers, so the hack as written can't work there.
If you want to hack the code, you need to specify the platform you want to run on; otherwise there is no point in answering your question.
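For illustration, on 32-bit x86 with the cdecl convention the hack looks different, because the first argument lives above the saved frame pointer and the return address rather than in a register. A sketch (assuming gcc, -m32 and a standard %ebp-based frame; nothing about it is portable either):
int count(int x)
{
    /* __builtin_frame_address(0) points at the saved %ebp; the return
       address sits at +4 and the first argument at +8, i.e. two ints up. */
    return *(((int*)__builtin_frame_address(0)) + 2) == 1 ? 2 : 3;
}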

Related

Assembly: Purpose of loading the effective address before a call to a function?

Source C Code:
#include <stdio.h>

int main()
{
    int i;
    for (i = 0; i < 10; i++)
    {
        printf("Hello World!\n");
    }
}
Dump of Intel syntax x86 assembler code for function main:
1. 0x000055555555463a <+0>: push rbp
2. 0x000055555555463b <+1>: mov rbp,rsp
3. 0x000055555555463e <+4>: sub rsp,0x10
4. 0x0000555555554642 <+8>: mov DWORD PTR [rbp-0x4],0x0
5. 0x0000555555554649 <+15>: jmp 0x55555555465b <main+33>
6. 0x000055555555464b <+17>: lea rdi,[rip+0xa2] # 0x5555555546f4
7. 0x0000555555554652 <+24>: call 0x555555554510 <puts#plt>
8. 0x0000555555554657 <+29>: add DWORD PTR [rbp-0x4],0x1
9. 0x000055555555465b <+33>: cmp DWORD PTR [rbp-0x4],0x9
10. 0x000055555555465f <+37>: jle 0x55555555464b <main+17>
11. 0x0000555555554661 <+39>: mov eax,0x0
12. 0x0000555555554666 <+44>: leave
13. 0x0000555555554667 <+45>: ret
I'm currently working through "Hacking: The Art of Exploitation", 2nd Edition, by Jon Erickson, and I'm just starting to tackle assembly.
I have a few questions about the translation of the provided C code to Assembly, but I am mainly wondering about my first question.
1st Question: What is the purpose of line 6 (lea rdi,[rip+0xa2])?
My current working theory is that this is used to save where the next instruction will jump to, in order to track what is going on. I believe this line corresponds to the printf call in the source C code.
So essentially, it's loading the effective address of rip+0xa2 (0x5555555546f4) into the register rdi, simply to track where it will jump to for the printf function?
2nd Question: What is the purpose of line 11 (mov eax,0x0)?
I do not see a prior use of the register EAX and am not sure why it needs to be set to 0.
The LEA puts a pointer to the string literal into a register, as the first arg for puts. The search terms you're looking for are "calling convention" and/or ABI (and also RIP-relative addressing). See also: Why is the address of static variables relative to the Instruction Pointer?
The small offset between code and data (only +0xa2) is because the .rodata section gets linked into the same ELF segment as .text, and your program is tiny. (Newer gcc + ld versions will put it in a separate page so it can be non-executable.)
The compiler can't use a shorter, more efficient mov edi, address in position-independent code, which your Linux PIE executable is. It would do that with gcc -fno-pie -no-pie.
mov eax,0 implements the implicit return 0 at the end of main that C99 and C++ guarantee. EAX is the return-value register in all the mainstream x86 calling conventions.
If you don't use gcc -O2 or higher, you won't get peephole optimizations like xor-zeroing (xor eax,eax).
This:
lea rdi,[rip+0xa2]
Is a typical position independent LEA, putting the string address into a register (instead of loading from that memory address).
Your executable is position independent, meaning that it can be loaded at runtime at any address. Therefore, the real address of the argument to be passed to puts() needs to be calculated at runtime every single time, since the base address of the program could be different each time. Also, puts() is used instead of printf() because the compiler optimized the call since there is no need to format anything.
In this case, the binary was most probably loaded with the base address 0x555555554000. The string to use is stored in your binary at offset 0x6f4. Since the next instruction is at offset 0x652, you know that, no matter where the binary is loaded in memory, the address you want will be rip + (0x6f4 - 0x652) = rip + 0xa2, which is what you see above. See this answer of mine for another example.
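You can redo that displacement arithmetic yourself; a throwaway sketch using the addresses assumed above:
#include <stdio.h>

int main(void)
{
    unsigned long next_insn = 0x555555554652;  /* the call, i.e. rip while the lea executes */
    unsigned long string    = 0x5555555546f4;  /* where the "Hello World!" literal lives    */
    printf("displacement = %#lx\n", string - next_insn);  /* prints 0xa2 */
    return 0;
}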
The purpose of:
mov eax,0x0
Is to set the return value of main(). In Intel x86, the calling convention is to return values in the rax register (eax if the value is 32 bits, which is true in this case since main returns an int). See the table entry for x86-64 at the end of this page.
Even if you don't add an explicit return statement, main() is a special function, and the compiler will add a default return 0 for you.
If you add some debug data and symbols to the build, everything will be easier to follow. The code is also easier to read if you enable some optimizations.
There is a very useful tool, godbolt; here is your example: https://godbolt.org/z/9sRFmU
On the asm listing there you can clearly see that that line loads the address of the string literal, which is then printed by the function.
EAX is considered volatile (caller-saved), and main returns zero by default; that's why it is zeroed.
The calling convention is explained here: https://en.wikipedia.org/wiki/X86_calling_conventions
Here you have more interesting cases https://godbolt.org/z/M4MeGk

Interpreting this line in Assembly language?

Below are the first few lines of a disassembled C program that I am trying to reverse engineer back into its C code, for the purpose of better learning assembly language. At the beginning of this code I see it makes room on the stack and immediately executes
0x000000000040054e <+8>: mov %fs:0x28,%rax
I am confused about what this line does and what in the corresponding C program might be producing it. The only time I have seen this line so far is when a different method within a C program is called, but this time it is not followed by any callq instruction, so I am not so sure... Any ideas what else could be in this C program to be generating this instruction?
0x0000000000400546 <+0>: push %rbp
0x0000000000400547 <+1>: mov %rsp,%rbp
0x000000000040054a <+4>: sub $0x40,%rsp
0x000000000040054e <+8>: mov %fs:0x28,%rax
0x0000000000400557 <+17>: mov %rax,-0x8(%rbp)
0x000000000040055b <+21>: xor %eax,%eax
0x000000000040055d <+23>: movl $0x17,-0x30(%rbp)
...
I know this is to provide some form of stack protection against buffer overflow attacks; I just need to know what C code would prompt this protection if not a separate method.
As you say, this is code used to defend against buffer overflows. The compiler generates this "stack canary check" for functions that have local variables that might be buffers that could be overflowed. Note the instructions immediately above and below the line you are asking about:
sub $0x40, %rsp
mov %fs:0x28, %rax
mov %rax, -0x8(%rbp)
xor %eax, %eax
The sub allocates 64 bytes of space on the stack, which is enough room for at least one small array. Then a secret value is copied from %fs:0x28 to the top of that space, just below the previous frame pointer and the return address, and then it is erased from the register file.
The body of the function does something with arrays; if it writes sufficiently far past the end of an array, it will overwrite the secret value. At the end of the function, there will be code along the lines of
mov -0x8(%rbp), %rax
xor %fs:0x28, %rax
jne 1f
mov %rbp, %rsp
pop %rbp
ret
1:
call __stack_chk_fail # does not return
This verifies that the secret value is unchanged, and crashes the program if it has changed. The idea is that someone trying to exploit a simple buffer overflow vulnerability, like you have when you use gets, won't be able to change the return address without also modifying the secret value.
The compiler has several different heuristics, selectable with command line options, for deciding when it is necessary to generate stack-canary protection code.
You can't write C code corresponding to this assembly language yourself, because it uses the unusual %fs:nnnn addressing mode; the stack-canary code intentionally uses an addressing mode that no other code generation relies on, to make it as difficult as possible for the adversary to learn the secret value.
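As for what C code prompts the protection in the first place: typically any function with a local array (or other address-taken, buffer-like object, depending on the heuristic). A minimal sketch, assuming gcc with -fstack-protector or -fstack-protector-strong (which many distributions enable by default):
#include <string.h>

void greet(const char *name)
{
    char buf[32];              /* a local char array is the classic trigger */
    strcpy(buf, "Hello, ");
    strcat(buf, name);         /* if name is long enough the write runs past
                                  buf, eventually clobbers the canary, and
                                  __stack_chk_fail fires when greet returns */
}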

Why does the compiler generate such code when initializing a volatile array?

I have the following program that enables the alignment check (AC) bit in the x86 processor flags register in order to trap unaligned memory accesses. Then the program declares two volatile variables:
#include <assert.h>
int main(void)
{
#ifndef NOASM
__asm__(
"pushf\n"
"orl $(1<<18),(%esp)\n"
"popf\n"
);
#endif
volatile unsigned char foo[] = { 1, 2, 3, 4, 5, 6 };
volatile unsigned int bar = 0xaa;
return 0;
}
If I compile this, the code generated initially does the obvious things like setting up the stack and creating the array of chars by moving the values 1, 2, 3, 4, 5, 6 onto the stack:
/tmp ➤ gcc test3.c -m32
/tmp ➤ gdb ./a.out
(gdb) disassemble main
0x0804843d <+0>: push %ebp
0x0804843e <+1>: mov %esp,%ebp
0x08048440 <+3>: and $0xfffffff0,%esp
0x08048443 <+6>: sub $0x20,%esp
0x08048446 <+9>: mov %gs:0x14,%eax
0x0804844c <+15>: mov %eax,0x1c(%esp)
0x08048450 <+19>: xor %eax,%eax
0x08048452 <+21>: pushf
0x08048453 <+22>: orl $0x40000,(%esp)
0x0804845a <+29>: popf
0x0804845b <+30>: movb $0x1,0x16(%esp)
0x08048460 <+35>: movb $0x2,0x17(%esp)
0x08048465 <+40>: movb $0x3,0x18(%esp)
0x0804846a <+45>: movb $0x4,0x19(%esp)
0x0804846f <+50>: movb $0x5,0x1a(%esp)
0x08048474 <+55>: movb $0x6,0x1b(%esp)
0x08048479 <+60>: mov 0x16(%esp),%eax
0x0804847d <+64>: mov %eax,0x10(%esp)
0x08048481 <+68>: movzwl 0x1a(%esp),%eax
0x08048486 <+73>: mov %ax,0x14(%esp)
0x0804848b <+78>: movl $0xaa,0xc(%esp)
0x08048493 <+86>: mov $0x0,%eax
0x08048498 <+91>: mov 0x1c(%esp),%edx
0x0804849c <+95>: xor %gs:0x14,%edx
0x080484a3 <+102>: je 0x80484aa <main+109>
0x080484a5 <+104>: call 0x8048310 <__stack_chk_fail#plt>
0x080484aa <+109>: leave
0x080484ab <+110>: ret
However at main+60 it does something strange: it copies the array of 6 bytes to another part of the stack, moving the data through registers one 4-byte word at a time. But the source bytes start at offset 0x16, which is not 4-byte aligned, so the program crashes when attempting to perform the mov.
So I've two questions:
Why is the compiler emitting code to copy the array to another part of the stack? I assumed volatile would skip every optimization and always perform memory accesses. Maybe volatile vars are required to always be accessed as whole words, and so the compiler would always use temporary registers to read/write whole words?
Why does the compiler not put the char array at an aligned address if it later intends to perform these mov instructions? I understand that x86 normally handles unaligned accesses safely, and on modern processors they do not even carry a performance penalty; however, in all other instances I see the compiler trying to avoid generating unaligned accesses, since they are, AFAIK, undefined behavior in C. My guess is that, since it later provides a properly aligned pointer for the copied array on the stack, it just does not care about the alignment of data used only for initialization, in a way which is invisible to the C program?
If my hypotheses above are right, it means that I cannot expect an x86 compiler to always generate aligned accesses, even if the C code itself never attempts to perform unaligned accesses, and so setting the AC flag is not a practical way to detect the parts of the code where unaligned accesses are performed.
EDIT: After further research I can answer most of this myself. In an attempt to make progress, I added an option in Redis to set the AC flag and otherwise run normally. I found that this approach is not viable: the process immediately crashes inside libc: __mempcpy_sse2 () at ../sysdeps/x86_64/memcpy.S:83. I assume that the whole x86 software stack simply does not really care about misalignment since it is handled very well by this architecture. Thus it is not practical to run with the AC flag set.
So the answer to question 2 above is that, like the rest of the software stack, the compiler is free to do as it pleases, and relocate things on the stack without caring about alignment, so long as the behavior is correct from the perspective of the C program.
The only question left to answer is: why, with volatile, is a copy made in a different part of the stack? My best guess is that the compiler is attempting to access variables declared volatile as whole words even during initialization (imagine if this address were mapped to an I/O port), but I'm not sure.
You're compiling without optimization, so the compiler generates straightforward code without worrying about how inefficient it is. It first creates the initializer { 1, 2, 3, 4, 5, 6 } in temporary space on the stack, and then copies that into the space it allocated for foo.
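In C-level terms, the instructions from main+60 onward behave roughly like the following paraphrase (placed inside main after foo's declaration, with <string.h> included; tmp, word and tail are invented names for the scratch storage - this is a description of what gcc -O0 emitted, not code you would write):
unsigned char tmp[6] = { 1, 2, 3, 4, 5, 6 };  /* the six movb stores into 0x16..0x1b(%esp) */
unsigned int word;
unsigned short tail;
memcpy(&word, tmp, 4);              /* mov 0x16(%esp),%eax - the unaligned load that trips AC */
memcpy((void *)&foo[0], &word, 4);  /* mov %eax,0x10(%esp) - aligned store into foo           */
memcpy(&tail, tmp + 4, 2);          /* movzwl 0x1a(%esp),%eax                                 */
memcpy((void *)&foo[4], &tail, 2);  /* mov %ax,0x14(%esp)                                     */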
The compiler populates the array in a working storage area, one byte at a time, which is not atomic. It then moves the entire array to its final resting place with whole-word mov/movzwl instructions (the atomicity is implicit when the target address is naturally aligned).
The write has to be atomic because the compiler must assume (due to the volatile keyword) that the array can be accessed at any time by anyone else.
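The whole-object point is easier to see on a simpler variable. A sketch (assuming x86 and gcc; the name is illustrative): volatile forces the store to happen, and because the int is naturally aligned it is typically emitted as one 4-byte mov rather than several smaller stores:
volatile unsigned int bar2;

void init_bar2(void)
{
    bar2 = 0xaa;   /* typically a single movl $0xaa, bar2 - not four byte-sized stores */
}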

Assembler debug of undefined expression

I'm trying to get a better understanding of how compilers produce code for undefined expressions e.g. for the following code:
int main()
{
int i = 5;
i = i++;
return 0;
}
This is the assembler code generated by gcc 4.8.2 (Optimisation is off -O0 and I’ve inserted my own line numbers for reference purposes):
(gdb) disassemble main
Dump of assembler code for function main:
(1) 0x0000000000000000 <+0>: push %rbp
(2) 0x0000000000000001 <+1>: mov %rsp,%rbp
(3) 0x0000000000000004 <+4>: movl $0x5,-0x4(%rbp)
(4) 0x000000000000000b <+11>: mov -0x4(%rbp),%eax
(5) 0x000000000000000e <+14>: lea 0x1(%rax),%edx
(6) 0x0000000000000011 <+17>: mov %edx,-0x4(%rbp)
(7) 0x0000000000000014 <+20>: mov %eax,-0x4(%rbp)
(8) 0x0000000000000017 <+23>: mov $0x0,%eax
(9) 0x000000000000001c <+28>: pop %rbp
(10) 0x000000000000001d <+29>: retq
End of assembler dump.
Execution of this code results in the value of i remaining at 5 (verified with a printf() statement), i.e. i doesn't appear to ever be incremented. I understand that different compilers will evaluate/compile undefined expressions in different ways, and this may just be the way that gcc does it, i.e. I could get a different result with a different compiler.
With respect to the assembler code, as I understand:
Lines 1-2 - setting up the stack/base pointers etc. (ignored).
Lines 3-4 - this is how the value 5 is assigned to i.
Can anyone explain what is happening on lines 5-6? It looks as if i will ultimately be reassigned the value 5 (line 7), but is the increment operation (required for the post-increment i++) simply abandoned/skipped by the compiler in this case?
These three lines contain your answer:
lea 0x1(%rax),%edx
mov %edx,-0x4(%rbp)
mov %eax,-0x4(%rbp)
The increment operation isn't skipped. lea is the increment, taking the value from %rax and storing the incremented value in %edx. %edx is stored but then overwritten by the next line which uses the original value from %eax.
The key to understanding this code is to know how lea works. It stands for "load effective address", so while it looks like a pointer dereference, it actually just does the math needed to compute the final address of [whatever], and then keeps the address instead of the value at that address. This means it can be used for any mathematical expression that can be expressed efficiently using addressing modes, as an alternative to arithmetic opcodes. It's frequently used as a way to fold a multiply and an add into a single instruction for this reason. In particular, in this case it's used to increment the value and move the result to a different register in one instruction, where inc would instead overwrite it in place.
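As an aside, here is what the multiply-and-add use looks like from the C side (a sketch; gcc and clang commonly turn an expression shaped like this into a single lea at -O1 and above, though nothing forces them to):
int scaled(int base, int i)
{
    /* commonly compiled to: lea 0x3(%rdi,%rsi,4),%eax */
    return base + i * 4 + 3;
}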
Lines 5-6 are the i++. The lea 0x1(%rax),%edx computes i + 1 and mov %edx,-0x4(%rbp) writes that back to i. However, line 7, the mov %eax,-0x4(%rbp), writes the original value back into i. The code looks like:
(4) eax = i
(5) edx = i + 1
(6) i = edx
(7) i = eax

Why do some compilers optimize if(a>0) but not if(*(&a)>0)?

Let's say I have declared in the global scope:
const int a = 0x93191;
And in the main function I have the following condition:
if(a>0)
do_something
An awkward thing I have noticed is that the RVDS compiler will drop the if statement and there is no branch/jmp in the object file.
But if I write:
if(*(&a)>0)
do_something
The if (cmp and branch) will be in the compiled object file.
In contrast, GCC does optimize both with -O1, -O2 or -O3:
#include <stdio.h>
const int a = 3333;
int main()
{
if (a >333)
printf("first\n");
return 0;
}
compiled with -O3:
(gdb) disassemble main
Dump of assembler code for function main:
0x0000000100000f10 <main+0>: push %rbp
0x0000000100000f11 <main+1>: mov %rsp,%rbp
0x0000000100000f14 <main+4>: lea 0x3d(%rip),%rdi # 0x100000f58
0x0000000100000f1b <main+11>: callq 0x100000f2a <dyld_stub_puts>
0x0000000100000f20 <main+16>: xor %eax,%eax
0x0000000100000f22 <main+18>: pop %rbp
0x0000000100000f23 <main+19>: retq
End of assembler dump.
And for
#include <stdio.h>
const int a = 3333;
int main()
{
if (*(&a) >333)
printf("first\n");
return 0;
}
will give:
(gdb) disassemble main
Dump of assembler code for function main:
0x0000000100000f10 <main+0>: push %rbp
0x0000000100000f11 <main+1>: mov %rsp,%rbp
0x0000000100000f14 <main+4>: lea 0x3d(%rip),%rdi # 0x100000f58
0x0000000100000f1b <main+11>: callq 0x100000f2a <dyld_stub_puts>
0x0000000100000f20 <main+16>: xor %eax,%eax
0x0000000100000f22 <main+18>: pop %rbp
0x0000000100000f23 <main+19>: retq
End of assembler dump.
So GCC treats both the same (as it should) and RVDS doesn't?
I tried to examine the effect of using volatile: RVDS still dropped the if(a>333), but gcc didn't:
#include <stdio.h>
volatile const int a = 3333;
int main()
{
if (a >333)
printf("first\n");
return 0;
}
(gdb) disassemble main
Dump of assembler code for function main:
0x0000000100000f10 <main+0>: push %rbp
0x0000000100000f11 <main+1>: mov %rsp,%rbp
0x0000000100000f14 <main+4>: cmpl $0x14e,0x12a(%rip) # 0x100001048 <a>
0x0000000100000f1e <main+14>: jl 0x100000f2c <main+28>
0x0000000100000f20 <main+16>: lea 0x39(%rip),%rdi # 0x100000f60
0x0000000100000f27 <main+23>: callq 0x100000f36 <dyld_stub_puts>
0x0000000100000f2c <main+28>: xor %eax,%eax
0x0000000100000f2e <main+30>: pop %rbp
0x0000000100000f2f <main+31>: retq
End of assembler dump.
Probably there are some bugs in the RVDS compiler version I used.
The amount of analysis a compiler will do to decide "can I figure out what the actual value is here?" is not unbounded. If you write a sufficiently complex expression, the compiler will simply say "I don't know what the value is" and generate code to compute it.
It is perfectly possible for a compiler to figure out that the value is not going to change, but it's also possible that some compilers "give up" along the way - it may also depend on where in the compilation chain this analysis is done.
This is probably a fairly typical example of the "as-if" rule - the compiler is allowed to perform any optimisation that produces a result "as if" the original code had been executed.
Having said all that, this case should be fairly trivial (and, as per the comments, the compiler should consider *(&a) the same as a), so it does seem strange that it doesn't get rid of the comparison.
Optimizations are implementation details of the compilers. It takes time and effort to implement them, and compiler writers usually focus on the common uses of the language (i.e. the return on investment of optimizing code patterns that are highly infrequent is close to nothing).
That being said, there is an important difference between the two pieces of code: in the first case a is not odr-used, only used as an rvalue, which means it can be treated as a compile-time constant. That is, when a is used directly (no address-of, no references bound to it) compilers immediately substitute the value in. The value must be known to the compiler without accessing the variable, since it could be used in contexts where constant expressions are required (e.g. defining the size of an array).
In the second case a is odr-used: the address is taken and the value at that location is read. The compiler must produce code that does those steps before handing the result to the optimizer. The optimizer in turn can detect that it is a constant and replace the whole operation with the value, but this is a bit more involved than the previous case, where the compiler itself filled the value in.
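A small illustration of that difference (a sketch; the comments describe what compilers typically do, not what they are required to do):
const int a = 0x93191;

/* only the value is used: the front end can substitute 0x93191 directly */
int direct(void)   { return a > 0; }

/* the address is taken: naive code generation emits a load from a's storage,
   and it is then the optimizer's job to fold the comparison away */
int via_addr(void) { return *(&a) > 0; }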
