How does an ELF pad its `short`s? - c

I know that function arguments are padded to target word size, but with what?
Specifically in the context of the x86 Linux GNU toolchain, what do these functions return?
int iMysteryMeat(short x)
{
return *((int *)&x);
}
unsigned uMysteryMeat(unsigned short x)
{
return *((unsigned *)&x);
}
The question is whether, when hand-coding a function in assembly, it is necessary to sterilze "small" arguments by masking or sign-extending them before using them in "large" contexts (andl, imull).
I'd also be interested whether there are any more general or cross-platform standards for this case.

This depends on the ABI. The ABI needs to specify the choice that small arguments are extended by the caller, or by the callee (and how). Unfortunately, this part of the ABI is often underspecified, leading to different choices by different compilers. So to prevent incompatibility between code compiled with different legacy compilers, most modern compilers (I know in particular about gcc on i386) err on the side of caution and do both.
int a(short x) {
return x;
}
int b(int x);
int c(short x) {
b(x);
}
gcc -m32 -O3 -S tmp.c -o tmp.s
_a:
pushl %ebp
movl %esp, %ebp
movswl 8(%ebp),%eax
leave
ret
_c:
pushl %ebp
movl %esp, %ebp
movswl 8(%ebp),%eax
movl %eax, 8(%ebp)
leave
jmp _b
Note that a does not assume any extension rule about its argument, but extends it itself. Similarly, c makes sure to extend its argument before passing it to b (via a tail call).

int iMysteryMeat(short x)
{
return *((int *)&x);
}
This is undefined behavior in C, this violates aliasing rules and may also violate alignment requirements. In short don't do this.

Although Keith's answer is in line with the spirit of my question, per Alex's request I thought I'd try this out for myself.
Interestingly, in this case the more literal answer to my example is "garbage".
#include <stdio.h>
int iMysteryMeat(short x)
{
return *((int *)&x);
}
unsigned uMysteryMeat(unsigned short x)
{
return *((unsigned *)&x);
}
int main()
{
printf("iMeat: 0x%08x\n", iMysteryMeat(-23));
printf("uMeat: 0x%08x\n", uMysteryMeat(-23));
return 0;
}
gcc -m32 -S meat.c
iMysteryMeat:
pushl %ebp
movl %esp, %ebp
subl $4, %esp
movl 8(%ebp), %eax
movw %ax, -4(%ebp)
leal -4(%ebp), %eax
movl (%eax), %eax
leave
ret
uMysteryMeat:
pushl %ebp
movl %esp, %ebp
subl $4, %esp
movl 8(%ebp), %eax
movw %ax, -4(%ebp)
leal -4(%ebp), %eax
movl (%eax), %eax
leave
ret
./a.out
iMeat: 0x0804ffe9
uMeat: 0x0043ffe9
As you can see, not only is the usual sign-extension protocol overrided (i.e. compare with Keith's a()), it actually moves x into uninitialized stack space with movw, rendering the top half of the return value garbage no matter what main() gives it.
So, again, as ouah said, never ever do this in C, and in assembly (or in general, really), always sterilize your inputs.

Related

x86 calling convention: should arguments passed by stack be read-only?

It seems state-of-art compilers treat arguments passed by stack as read-only. Note that in the x86 calling convention, the caller pushes arguments onto the stack and the callee uses the arguments in the stack. For example, the following C code:
extern int goo(int *x);
int foo(int x, int y) {
goo(&x);
return x;
}
is compiled by clang -O3 -c g.c -S -m32 in OS X 10.10 into:
.section __TEXT,__text,regular,pure_instructions
.macosx_version_min 10, 10
.globl _foo
.align 4, 0x90
_foo: ## #foo
## BB#0:
pushl %ebp
movl %esp, %ebp
subl $8, %esp
movl 8(%ebp), %eax
movl %eax, -4(%ebp)
leal -4(%ebp), %eax
movl %eax, (%esp)
calll _goo
movl -4(%ebp), %eax
addl $8, %esp
popl %ebp
retl
.subsections_via_symbols
Here, the parameter x(8(%ebp)) is first loaded into %eax; and then stored in -4(%ebp); and the address -4(%ebp) is stored in %eax; and %eax is passed to the function goo.
I wonder why Clang generates code that copy the value stored in 8(%ebp) to -4(%ebp), rather than just passing the address 8(%ebp) to the function goo. It would save memory operations and result in a better performance. I observed a similar behaviour in GCC too (under OS X). To be more specific, I wonder why compilers do not generate:
.section __TEXT,__text,regular,pure_instructions
.macosx_version_min 10, 10
.globl _foo
.align 4, 0x90
_foo: ## #foo
## BB#0:
pushl %ebp
movl %esp, %ebp
subl $8, %esp
leal 8(%ebp), %eax
movl %eax, (%esp)
calll _goo
movl 8(%ebp), %eax
addl $8, %esp
popl %ebp
retl
.subsections_via_symbols
I searched for documents if the x86 calling convention demands the passed arguments to be read-only, but I couldn't find anything on the issue. Does anybody have any thought on this issue?
The rules for C are that parameters must be passed by value. A compiler converts from one language (with one set of rules) to a different language (potentially with a completely different set of rules). The only limitation is that the behaviour remains the same. The rules of the C language do not apply to the target language (e.g. assembly).
What this means is that if a compiler feels like generating assembly language where parameters are passed by reference and are not passed by value; then this is perfectly legal (as long as the behaviour remains the same).
The real limitation has nothing to do with C at all. The real limitation is linking. So that different object files can be linked together, standards are needed to ensure that whatever the caller in one object file expects matches whatever the callee in another object file provides. This is what's known as the ABI. In some cases (e.g. 64-bit 80x86) there are multiple different ABIs for the exact same architecture.
You can even invent your own ABI that's radically different (and implement your own tools that support your own radically different ABI) and that's perfectly legal as far as the C standards go; even if your ABI requires "pass by reference" for everything (as long as the behaviour remains the same).
Actually, I just compiled this function using GCC:
int foo(int x)
{
goo(&x);
return x;
}
And it generated this code:
_foo:
pushl %ebp
movl %esp, %ebp
subl $24, %esp
leal 8(%ebp), %eax
movl %eax, (%esp)
call _goo
movl 8(%ebp), %eax
leave
ret
This is using GCC 4.9.2 (on 32-bit cygwin if it matters), no optimizations. So in fact, GCC did exactly what you thought it should do and used the argument directly from where the caller pushed it on the stack.
The C programming language mandates that arguments are passed by value. So any modification of an argument (like an x++; as the first statement of your foo) is local to the function and does not propagate to the caller.
Hence, a general calling convention should require copying of arguments at every call site. Calling conventions should be general enough for unknown calls, e.g. thru a function pointer!
Of course, if you pass an address to some memory zone, the called function is free to dereference that pointer, e.g. as in
int goo(int *x) {
static int count;
*x = count++;
return count % 3;
}
BTW, you might use link-time optimizations (by compiling and linking with clang -flto -O2 or gcc -flto -O2) to perhaps enable the compiler to improve or inline some calls between translation units.
Notice that both Clang/LLVM and GCC are free software compilers. Feel free to propose an improvement patch to them if you want to (but since both are very complex pieces of software, you'll need to work some months to make that patch).
NB. When looking into produced assembly code, pass -fverbose-asm to your compiler!

Can't set stack boundary gcc

My c code:
#include <stdio.h>
foo()
{
char buffer[8];
}
main()
{
foo();
return 0;
}
I compile it using gcc -ggdb -mpreferred-stack-boundary=2 -o bar bar.c
When I load it using GDB ./bar I see that inside the foo function the code is:
sub $0x0c,$esp
Why is this happening?
I want to buffer to take 8 bytes in the stack so it should be sub $0x8,$esp!
Why can't I set stack boundary to 4 bytes?
Help!
I can't reproduce exactly what you are seeing, but on my 4.8.2 version of gcc, the option does affect the amount of stack used with this code (make sure "buffer" is used to avoid it being optimised away, and fix the warnings for no return type/argument types):
#include <stdio.h>
void foo(void)
{
char buffer[8];
buffer[0] = 'a';
buffer[1] = '\n';
buffer[2] = 0;
printf("my first program! %s\n", buffer);
}
int main()
{
foo();
return 0;
}
Compiled with -mpreferred-stack-boundary=2 and -mpreferred-stack-boundary=4, and the difference between the generated assembler is notable:
$ diff -u stb-2.s stb-4.s
--- stb-2.s 2014-04-10 09:00:39.546038191 +0100
+++ stb-4.s 2014-04-10 09:00:58.895108979 +0100
## -15,11 +15,11 ##
.cfi_offset 5, -8
movl %esp, %ebp
.cfi_def_cfa_register 5
- subl $16, %esp
- movb $97, -8(%ebp)
- movb $10, -7(%ebp)
- movb $0, -6(%ebp)
- leal -8(%ebp), %eax
+ subl $40, %esp
+ movb $97, -16(%ebp)
+ movb $10, -15(%ebp)
+ movb $0, -14(%ebp)
+ leal -16(%ebp), %eax
movl %eax, 4(%esp)
movl $.LC0, (%esp)
.LEHB0:
## -67,9 +67,10 ##
.cfi_offset 5, -8
movl %esp, %ebp
.cfi_def_cfa_register 5
+ andl $-16, %esp
call _Z3foov
movl $0, %eax
- popl %ebp
+ leave
.cfi_restore 5
.cfi_def_cfa 4, 4
ret
So, at least in gcc 4.8.2. for x86-32, the option has an effect.
Of course, the default according to the docs is -mpreferred-stack-boundary=2, so maybe that's why you can't see any difference from "without" (Although in my experiments, it seems that it's -mpreferred-stack-boundary=4). [Moment passes] Ah, the default has been changed over time, so the 4.4.2 docs online says 2, my info gcc for 4.8.2 says 4, which explains the difference.
As to why your code is allocating twelve bytes of stack-space - look at how printf is called:
movl $.LC0, (%esp)
call printf
If the compiler can, it will pre-allocate argument space for function calls at the start of the function, rather than use push $.LC0 as it would be in this case. It's not much difference, but it saves at least one instruction for cleanup at the other side of printf (and it makes it MUCH easier to deal with stack-relative offsets within the produced code, since the compiler doesn't have to keep track of where the current stack-pointer is - it's always at a constant place after the prologue code at the beginning of the function, all the way to the end of the function). Since the space is ultimately required anyway, there's no point in "saving 4 bytes".

stack layout for process

why is the output 44 instead of 12, &x,eip,ebp,&j should be the stack layout for the code
given below i guess, if so then it must have 12 as output, but i am getting it 44?
so help me understand the concept,altough the base pointer is relocated for every instant of execution, the relative must remain unchanged and should not it be 12?
int main() {
int x = 9;
// printf("%p is the address of x\n",&x);
fun(&x );
printf("%x is the address of x\n", (&x));
x = 3;
printf("%d x is \n",x);
return 0;
}
int fun(unsigned int *ptr) {
int j;
printf("the difference is %u\n",((unsigned int)ptr -(unsigned int) &j));
printf("the address of j is %x\n",&j);
return 0;
}
You're assuming that the compiler has packed everything (instead of putting things on specific alignment boundaries), and you're also assuming that the compiler hasn't inlined the function, or made any other optimisations or transformations.
In summary, you cannot make any assumptions about this sort of thing, or rely on any particular behaviour.
No, it should not be 12. It should not be anything. The ISO standard has very little to say about how things are arranged on the stack. An implementation has a great deal of leeway in moving things around and inserting padding for efficiency.
If you pass it through your compiler with an option to generate he assembler code (such as with gcc -S), it will become evident as to what's happening here.
fun:
pushl %ebp
movl %esp, %ebp
subl $40, %esp ;; this is quite large (see below).
movl 8(%ebp), %edx
leal -12(%ebp), %eax
movl %edx, %ecx
subl %eax, %ecx
movl %ecx, %eax
movl %eax, 4(%esp)
movl $.LC2, (%esp)
call printf
leal -12(%ebp), %eax
movl %eax, 4(%esp)
movl $.LC3, (%esp)
call printf
movl $0, %eax
leave
ret
It appears that gcc is pre-generating the next stack frame (the one going down to printf) as part of the prolog for the fun function. That's in this case. Your compiler may be doing something totally different. But the bottom line is: the implementation can do what it likes as long as it doesn't violate the standard.
That's the code from optimisation level 0, by the way, and it gives a difference of 48. When I use gcc's insane optimisation level 3, I get a difference of 4. Again, perfectly acceptable, gcc usually does some pretty impressive optimisations at that level.

Can/do C compilers optimize out adress-of in inline functions?

Let's say I have following code:
int f() {
int foo = 0;
int bar = 0;
foo++;
bar++;
// many more repeated operations in actual code
foo++;
bar++;
return foo+bar;
}
Abstracting repeated code into a separate functions, we get
static void change_locals(int *foo_p, int *bar_p) {
*foo_p++;
*bar_p++;
}
int f() {
int foo = 0;
int bar = 0;
change_locals(&foo, &bar);
change_locals(&foo, &bar);
return foo+bar;
}
I'd expect the compiler to inline the change_locals function, and optimize things like *(&foo)++ in the resulting code to foo++.
If I remember correctly, taking address of a local variable usually prevents some optimizations (e.g. it can't be stored in registers), but does this apply when no pointer arithmetic is done on the address and it doesn't escape from the function? With a larger change_locals, would it make a difference if it was declared inline (__inline in MSVC)?
I am particularly interested in behavior of GCC and MSVC compilers.
inline (and all its cousins _inline, __inline...) are ignored by gcc. It might inline anything it decides is an advantage, except at lower optimization levels.
The code procedure by gcc -O3 for x86 is:
.text
.p2align 4,,15
.globl f
.type f, #function
f:
pushl %ebp
xorl %eax, %eax
movl %esp, %ebp
popl %ebp
ret
.ident "GCC: (GNU) 4.4.4 20100630 (Red Hat 4.4.4-10)"
It returns zero because *ptr++ doesn't do what you think. Correcting the increments to:
(*foo_p)++;
(*bar_p)++;
results in
.text
.p2align 4,,15
.globl f
.type f, #function
f:
pushl %ebp
movl $4, %eax
movl %esp, %ebp
popl %ebp
ret
So it directly returns 4. Not only did it inline them, but it optimized the calculations away.
Vc++ from vs 2005 provides similar code, but it also created unreachable code for change_locals(). I used the command line
/O2 /FD /EHsc /MD /FA /c /TP
If I remember correctly, taking
address of a local variable usually
prevents some optimizations (e.g. it
can't be stored in registers), but
does this apply when no pointer
arithmetic is done on the address and
it doesn't escape from the function?
The general answer is that if the compiler can ensure that no one else will change a value behind its back, it can safely be placed in a register.
Think of this as though the compiler first performs inlining, then transforms all those *&foo (which results from the inlining) to simply foo before deciding if they should be placed in registers on in memory on the stack.
With a larger change_locals, would it
make a difference if it was declared
inline (__inline in MSVC)?
Again, generally speaking, whether or not a compiler decides to inline something is done using heuristics. If you explicitly specify that you want something to be inlines, the compiler will probably weight this into its decision process.
I've tested gcc 4.5, MSC and IntelC using this:
#include <stdio.h>
void change_locals(int *foo_p, int *bar_p) {
(*foo_p)++;
(*bar_p)++;
}
int main() {
int foo = printf("");
int bar = printf("");
change_locals(&foo, &bar);
change_locals(&foo, &bar);
printf( "%i\n", foo+bar );
}
And all of them did inline/optimize the foo+bar value, but also did
generate the code for change_locals() (but didn't use it).
Unfortunately, there's still no guarantee that they'd do the same for
any kind of such a "local function".
gcc:
__Z13change_localsPiS_:
pushl %ebp
movl %esp, %ebp
movl 8(%ebp), %edx
movl 12(%ebp), %eax
incl (%edx)
incl (%eax)
leave
ret
_main:
pushl %ebp
movl %esp, %ebp
andl $-16, %esp
pushl %ebx
subl $28, %esp
call ___main
movl $LC0, (%esp)
call _printf
movl %eax, %ebx
movl $LC0, (%esp)
call _printf
leal 4(%ebx,%eax), %eax
movl %eax, 4(%esp)
movl $LC1, (%esp)
call _printf
xorl %eax, %eax
addl $28, %esp
popl %ebx
leave
ret

Decoding equivalent assembly code of C code

Wanting to see the output of the compiler (in assembly) for some C code, I wrote a simple program in C and generated its assembly file using gcc.
The code is this:
#include <stdio.h>
int main()
{
int i = 0;
if ( i == 0 )
{
printf("testing\n");
}
return 0;
}
The generated assembly for it is here (only the main function):
_main:
pushl %ebpz
movl %esp, %ebp
subl $24, %esp
andl $-16, %esp
movl $0, %eax
addl $15, %eax
addl $15, %eax
shrl $4, %eax
sall $4, %eax
movl %eax, -8(%ebp)
movl -8(%ebp), %eax
call __alloca
call ___main
movl $0, -4(%ebp)
cmpl $0, -4(%ebp)
jne L2
movl $LC0, (%esp)
call _printf
L2:
movl $0, %eax
leave
ret
I am at an absolute loss to correlate the C code and assembly code. All that the code has to do is store 0 in a register and compare it with a constant 0 and take suitable action. But what is going on in the assembly?
Since main is special you can often get better results by doing this type of thing in another function (preferably in it's own file with no main). For example:
void foo(int x) {
if (x == 0) {
printf("testing\n");
}
}
would probably be much more clear as assembly. Doing this would also allow you to compile with optimizations and still observe the conditional behavior. If you were to compile your original program with any optimization level above 0 it would probably do away with the comparison since the compiler could go ahead and calculate the result of that. With this code part of the comparison is hidden from the compiler (in the parameter x) so the compiler can't do this optimization.
What the extra stuff actually is
_main:
pushl %ebpz
movl %esp, %ebp
subl $24, %esp
andl $-16, %esp
This is setting up a stack frame for the current function. In x86 a stack frame is the area between the stack pointer's value (SP, ESP, or RSP for 16, 32, or 64 bit) and the base pointer's value (BP, EBP, or RBP). This is supposedly where local variables live, but not really, and explicit stack frames are optional in most cases. The use of alloca and/or variable length arrays would require their use, though.
This particular stack frame construction is different than for non-main functions because it also makes sure that the stack is 16 byte aligned. The subtraction from ESP increases the stack size by more than enough to hold local variables and the andl effectively subtracts from 0 to 15 from it, making it 16 byte aligned. This alignment seems excessive except that it would force the stack to also start out cache aligned as well as word aligned.
movl $0, %eax
addl $15, %eax
addl $15, %eax
shrl $4, %eax
sall $4, %eax
movl %eax, -8(%ebp)
movl -8(%ebp), %eax
call __alloca
call ___main
I don't know what all this does. alloca increases the stack frame size by altering the value of the stack pointer.
movl $0, -4(%ebp)
cmpl $0, -4(%ebp)
jne L2
movl $LC0, (%esp)
call _printf
L2:
movl $0, %eax
I think you know what this does. If not, the movl just befrore the call is moving the address of your string into the top location of the stack so that it may be retrived by printf. It must be passed on the stack so that printf can use it's address to infer the addresses of printf's other arguments (if any, which there aren't in this case).
leave
This instruction removes the stack frame talked about earlier. It is essentially movl %ebp, %esp followed by popl %ebp. There is also an enter instruction which can be used to construct stack frames, but gcc didn't use it. When stack frames aren't explicitly used, EBP may be used as a general puropose register and instead of leave the compiler would just add the stack frame size to the stack pointer, which would decrease the stack size by the frame size.
ret
I don't need to explain this.
When you compile with optimizations
I'm sure you will recompile all fo this with different optimization levels, so I will point out something that may happen that you will probably find odd. I have observed gcc replacing printf and fprintf with puts and fputs, respectively, when the format string did not contain any % and there were no additional parameters passed. This is because (for many reasons) it is much cheaper to call puts and fputs and in the end you still get what you wanted printed.
Don't worry about the preamble/postamble - the part you're interested in is:
movl $0, -4(%ebp)
cmpl $0, -4(%ebp)
jne L2
movl $LC0, (%esp)
call _printf
L2:
It should be pretty self-evident as to how this correlates with the original C code.
The first part is some initialization code, which does not make any sense in the case of your simple example. This code would be removed with an optimization flag.
The last part can be mapped to C code:
movl $0, -4(%ebp) // put 0 into variable i (located at -4(%ebp))
cmpl $0, -4(%ebp) // compare variable i with value 0
jne L2 // if they are not equal, skip to after the printf call
movl $LC0, (%esp) // put the address of "testing\n" at the top of the stack
call _printf // do call printf
L2:
movl $0, %eax // return 0 (calling convention: %eax has the return code)
Well, much of it is the overhead associated with the function. main() is just a function like any other, so it has to store the return address on the stack at the start, set up the return value at the end, etc.
I would recommend using GCC to generate mixed source code and assembler which will show you the assembler generated for each sourc eline.
If you want to see the C code together with the assembly it was converted to, use a command line like this:
gcc -c -g -Wa,-a,-ad [other GCC options] foo.c > foo.lst
See http://www.delorie.com/djgpp/v2faq/faq8_20.html
On linux, just use gcc. On Windows down load Cygwin http://www.cygwin.com/
Edit - see also this question Using GCC to produce readable assembly?
and http://oprofile.sourceforge.net/doc/opannotate.html
You need some knowledge about Assembly Language to understand assembly garneted by C compiler.
This tutorial might be helpful
See here more information. You can generate the assembly code with C comments for better understanding.
gcc -g -Wa,-adhls your_c_file.c > you_asm_file.s
This should help you a little.

Resources