I was putting together a C riddle for a couple of my friends when a friend drew my attention to the fact that the following snippet (which happens to be part of the riddle I'd been writing) ran differently when compiled and run on OSX
#include <stdio.h>
#include <string.h>
int main()
{
int a = 10;
volatile int b = 20;
volatile int c = 30;
int data[3];
memcpy(&data, &a, sizeof(data));
printf("%d %d %d\n", data[0], data[1], data[2]);
}
What you'd expect the output to be is 10 20 30, which happens to be the case under Linux, but when the code is built under OSX you'd get 10 followed by two random numbers. After some debugging and looking at the compiler-generated assembly I came to the conclusion that this is due to how the stack is built. I am by no means an assembly expert, but the assembly code generated on Linux seems pretty straightforward to understand while the one generated on OSX threw me off a little. Perhaps I could use some help from here.
This is the code that was generated on Linux:
.file "code.c"
.section .text.unlikely,"ax",#progbits
.LCOLDB0:
.section .text.startup,"ax",#progbits
.LHOTB0:
.p2align 4,,15
.globl main
.type main, #function
main:
.LFB23:
.cfi_startproc
movl $10, -12(%rsp)
xorl %eax, %eax
movl $20, -8(%rsp)
movl $30, -4(%rsp)
ret
.cfi_endproc
.LFE23:
.size main, .-main
.section .text.unlikely
.LCOLDE0:
.section .text.startup
.LHOTE0:
.ident "GCC: (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609"
.section .note.GNU-stack,"",#progbits
And this is the code that was generated on OSX:
.section __TEXT,__text,regular,pure_instructions
.macosx_version_min 10, 12
.globl _main
.p2align 4, 0x90
_main: ## #main
.cfi_startproc
## BB#0:
pushq %rbp
Ltmp0:
.cfi_def_cfa_offset 16
Ltmp1:
.cfi_offset %rbp, -16
movq %rsp, %rbp
Ltmp2:
.cfi_def_cfa_register %rbp
subq $16, %rsp
movl $20, -8(%rbp)
movl $30, -4(%rbp)
leaq L_.str(%rip), %rdi
movl $10, %esi
xorl %eax, %eax
callq _printf
xorl %eax, %eax
addq $16, %rsp
popq %rbp
retq
.cfi_endproc
.section __TEXT,__cstring,cstring_literals
L_.str: ## #.str
.asciz "%d %d %d\n"
.subsections_via_symbols
I'm really only interested in two questions here.
Why is this happening?
Are there any get-arounds to this issue?
I know this is not a practical way to utilize the stack as I'm a professional C developer, which is really the only reason I found this problem interesting to invest some of my time into.
Accessing memory past the end of a declared variable is undefined behaviour - there is no guarantee as to what will happen when you try to do that. Because of how the compiler generated the assembly under Linux, you happened to get the 3 variables directly in a row on the stack, however that behaviour is just a coincidence - the compiler could legally add extra data in between the variables on the stack or really do anything - the result is not defined by the language standard. So in answer to your first question, it's happening because what you're doing is not part of the language by design. In answer to your second, there's no way to reliably get the same result from multiple compilers because the compilers are not programmed to reliably reproduce undefined behaviour.
undefined behavior. You don't expect to copy 10, 20 ,30. You hope not to seg-fault.
There is nothing to guarantee that a,b, and c are sequential memory addresses, which is your naive assumption. On Linux, the compiler happened to make them sequential. You can't even rely on gcc always doing that.
You already know that the behavior is undefined. A good reason for the behavior to be different on OS/X and Linux is these systems use a different compiler, that generates different code:
When you run gcc in Linux, you invoke the installed version the Gnu C compiler.
When you run gcc in your version of OS/X, you most likely invoke the installed version of clang.
Try gcc --version on both systems and amaze your friends.
Related
I've been trying to get familiar with assembly on mac, and from what I can tell, the documentation is really sparse, and most books on the subject are for windows or linux. I thought I would be able to translate from linux to mac pretty easily, however this (linux)
.file "simple.c"
.text
.globl simple
.type simple, #function
simple:
pushl %ebp
movl %esp, %ebp
movl 8(%ebp), %edx
movl 12(%ebp), %eax
addl (%edx), %eax
movl %eax, (%edx)
popl %ebp
ret
.size simple, .-simple
.ident "GCC: (Ubuntu 4.3.2-1ubuntu11) 4.3.2"
.section .note.GNU-stack,"",#progbits
seems pretty different from this (mac)
.section __TEXT,__text,regular,pure_instructions
.globl _simple
.align 4, 0x90
_simple: ## #simple
.cfi_startproc
## BB#0:
pushq %rbp
Ltmp2:
.cfi_def_cfa_offset 16
Ltmp3:
.cfi_offset %rbp, -16
movq %rsp, %rbp
Ltmp4:
.cfi_def_cfa_register %rbp
addl (%rdi), %esi
movl %esi, (%rdi)
movl %esi, %eax
popq %rbp
ret
.cfi_endproc
.subsections_via_symbols
The "normal" (for lack of a better word) instructions and registers such as pushq %rbp don't worry me. But the "weird" ones like .cfi_startproc and Ltmp2: which are smack dab in the middle of the machine instructions don't make any sense.
I have no idea where to go to find out what these are and what they mean. I'm about to pull my hair out as I've been trying to find a good resource for beginners for months. Any suggestions?
To begin with, you're comparing 32-bit x86 assembly with 64-bit x86-64. While the OS X Mach-O ABI supports 32-bit IA32, I suspect you want the x86-64 SysV ABI. (Thankfully, the x86-64.org site seems to be up again). The Mach-O x86-64 model is essentially a variant of the ELF / SysV ABI, so the differences are relatively minor for user-space code, even with different assemblers.
The .cfi directives are DWARF debugging directives that you don't strictly need for assembly - they are used for call frame information, etc. Here are some minimal examples:
ELF x64-64 assembler:
.text
.p2align 4
.globl my_function
.type my_function,#function
my_function:
...
.L__some_address:
.size my_function,[.-my_function]
Mach-O x86-64 assembler:
.text
.p2align 4
.globl _my_function
_my_function:
...
L__some_address:
Short of writing an asm tutorial, the main differences between the assemblers are: leading underscores for Mach-O functions names, .L vs L for labels (destinations). The assembler with OS X understands the '.p2align' directive. .align 4, 0x90 essentially does the same thing.
Not all the directives in compiler-generated code are essential for the assembler to generate valid object code. They are required to generate stack frame (debugging) and exception handling data. Refer to the links for more information.
Obviously the Linux code is 32-bit Linux code. Note that 64-bit Linux can run both 32- and 64-bit code!
The Mac code is definitely 64-bit code.
This is the main difference.
The ".cfi_xxx" lines are only information used for the Mac specific file format.
I'm trying to learn how to understand assembly code so I've been studying the assembly output of GCC for some stupid programs. One of them was nothing but int i = 0;, the code of which I more or less fully understand now (the biggest struggle was understanding the GAS directives strewn about). Anyway, I went a step forward and added printf("%d\n", i); to see if I could understand that and suddenly the code is much more chaotic.
.file "helloworld.c"
.text
.section .rodata.str1.1,"aMS",#progbits,1
.LC0:
.string "%d\n"
.section .text.startup,"ax",#progbits
.p2align 4
.globl main
.type main, #function
main:
subq $8, %rsp
xorl %edx, %edx
leaq .LC0(%rip), %rsi
xorl %eax, %eax
movl $1, %edi
call __printf_chk#PLT
xorl %eax, %eax
addq $8, %rsp
ret
.size main, .-main
.ident "GCC: (Gentoo 10.2.0-r3 p4) 10.2.0"
.section .note.GNU-stack,"",#progbits
I'm compiling this with gcc -S -O3 -fno-asynchronous-unwind-tables to remove the .cfi directives, however -O2 produces the same code so -O3 is overkill. My understanding of assembly is quite limited but it seems to me like the compiler is doing a lot of unneccessary stuff here. Why subtract and then add 8 to rsp? Why is it performing so many xors? There's only one variable. What is movl $1, %edi doing? I thought maybe the compiler was doing something stupid in an attempt to optimize but as I said, it's not optimizing beyond -O2, also it performs all of these operations even at -O1. To be honest I don't understand the unoptimized code at all so I assume it's inefficient.
The only thing that comes to mind is that the call to printf uses these registers, otherwise they are unused and serve no purpose. Is that actually the case? If so, how is it possible to tell?
Thanks in advance. I'm reading a book on compiler design at the moment and I've read most of the GCC manual (I read the whole chapter on optimization) and I've read some introductory x86_64 asm material, if somebody could point me toward some other resources (besides the Intel x86 manual) for learning more I would also appreciate that.
For the compiler that you are using it looks like printf(...) is mapped to __printf_chk(1, ...)
To understand the code, you need to understand the parameter passing conventions for the platform (part of the ABI). Once you know that up to 4 params are passed in %rdi, %rsi, %rdx, %rcx, you can understand most of what is going on:
subq $8, %rsp ; allocate 8 bytes of stack
xorl %edx, %edx ; i = 0 ; put it in the 3rd parameter for __printf_chk
leaq .LC0(%rip), %rsi ; 2nd parameter for __printf_chk. The: "%d\n"
xorl %eax, %eax ; 0 variadic fp params
movl $1, %edi ; 1st parameter for __printf_chk
call __printf_chk#PLT ; call the runtime loader wrapper for __printf_chk
xorl %eax, %eax ; return 0 from main
addq $8, %rsp ; deallocate 8 bytes of stack.
ret
Nate points out in the comments that section 3.5.7 in the ABI explains the %eax = 0 (no floating point variadic parameters.)
I have compiled a program main.c with about two lines of code to see what directives gcc / gas add to the unoptimized assembly file, using:
gcc -o main.s main.c -S
I can look up the concise description of each directive on the gas directive page, but was hoping someone could give a bit more context to some of these directives and what its practical usage is (for example, in gdb or the linker or wherever downstream). Here is the full assembly file with the items in question below:
.file "main.c"
.text
.globl main
.type main, #function
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movl $4, -8(%rbp)
movl $6, -4(%rbp)
movl -8(%rbp), %edx
movl -4(%rbp), %eax
addl %edx, %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0"
.section .note.GNU-stack,"",#progbits
.file: it seems this is halfway-obsolete based on This statement may go away in future: it is only recognized to be compatible with old as programs.. But given that it is still there, where or how is this currently being used?
.ident: it seems like this gives the same thing as doing gcc --version. Is this used at all beyond giving helper information on the 'gcc' that was used to issue the command, or how is this used?
.section .note...: I have seen .section .text, .section .bss, .section .text, ...but I've never come across a .note, and doing a ctrl-f to search for note doesn't give anything on this page. What is this line doing with the three arguments? And the #progbits ?
.size: given that the directives take up no space, this is giving us the length of the first statement within main -- pushq %rbp minus the last statement ret, which is the length of the main function. But again, what usage is this? Also, it says on the as page that It is only permitted inside .def/.endef pairs., but this isn't inside those pairs, right?
.section .text.startup,"ax",#progbits -- what is text.startup, the ax looks like it means allocatable+executable, but what or where is the text.startup ?
I'm trying to understand how does the compiler "sees" the i+1 part from expression i=i+1. I understand that i=3 means putting the value 3 in the location memory of variable i.
My guess about the i=i+1 is that the compiler expects a value from the right side of the "=" operator, so it gets the value from the location memory of variable i (which is 3, after the assignment) and add 1 to it, and the final result of the "i+1" expression(3+1=4) is stored back into the location memory of variable i, as a value. Is that correct?
And if it is, it means that any variable/combination of variables and literals present on the right side of an "=" operator will always be replaced with the value stored in them and those value can be added/substracted/etc with the values from other variables/literals (as in the x+1 expression), whilst the final result of those calculations will also be literal values (ex: 5, literal strings, etc), and will also be stored like values in a single variable on the left side of the "=" operator.
I'm also curious how this code is seen in assembly, and what are the main operations of this incrementation of i ( i = i+1);
#include <stdio.h>
int main()
{
int i = 3;
i = i + 1; // i should have the value of 4 stored back in it;
return 0;
}
This is not answerable for the general case. It depends on the target platform. If you want to inspect the assembly, you can do so with the -S parameter with gcc. When I did that to your code, it gave me this:
/tmp$ cat main.s
.file "main.c"
.text
.globl main
.type main, #function
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movl $3, -4(%rbp)
addl $1, -4(%rbp)
movl $0, %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Debian 9.2.1-8) 9.2.1 20190909"
.section .note.GNU-stack,"",#progbits
A brief little explanation of what is happening here. First we push the value of the stackpointer. This is so that we can jump back later.
.cfi_startproc
pushq %rbp
Then we set up the stack frame with this code. It corresponds to declaring variables.
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
Then we have this. Comments are mine.
movl $3, -4(%rbp) # i = 3;
addl $1, -4(%rbp) # i = i + 1;
Lastly, we return from the main function
movl $0, %eax # Store 0 in the "return register"
popq %rbp # Restore stackpointer
.cfi_def_cfa 7, 8
ret # return
Note that there is not a 1-1 relationship between lines. Not even for very simple lines.
Please also note that C imposes requirement on the observable behavior of the program and not on the generated assembly. So for instance, a compiler might remove the whole body for the main function because the variable i is not used in an observable way. And it will if you use optimization. When I recompiled your code with -O3 I got this instead:
/tmp/$ cat main.s
.file "main.c"
.text
.section .text.startup,"ax",#progbits
.p2align 4
.globl main
.type main, #function
main:
.LFB11:
.cfi_startproc
xorl %eax, %eax
ret
.cfi_endproc
.LFE11:
.size main, .-main
.ident "GCC: (Debian 9.2.1-8) 9.2.1 20190909"
.section .note.GNU-stack,"",#progbits
Notice how much that got removed from main. It can be interesting that movl $0, %eax has changed to xorl %eax, %eax. If you think about it, it's pretty obvious that this is a "set zero" operation. One could reasonably argue why anyone would write stuff like that. Well, the optimizer does certainly not optimize for readability. There are a few reasons why it is better. You can read about them here: What is the best way to set a register to zero in x86 assembly: xor, mov or and?
I am trying to learn how to use ptrace library for tracing all system calls and their arguments. I am stuck in getting the arguments passed to system call.
I went through many online resources and SO questions and figured out that on 64 bit machine the arguments are stored in registers rax(sys call number), rdi, rsi, rdx, r10, r8, r9
in the same order. Check this website .
Just to confirm this I wrote a simple C program as follows
#include<stdio.h>
#include<fcntl.h>
int main() {
printf("some print data");
open("/tmp/sprintf.c", O_RDWR);
}
and generated assembly code for this using gcc -S t.c but assembly code generated is as below
.file "t.c"
.section .rodata
.LC0:
.string "some print data"
.LC1:
.string "/tmp/sprintf.c"
.text
.globl main
.type main, #function
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movl $.LC0, %edi
movl $0, %eax
call printf
movl $2, %esi
movl $.LC1, %edi
movl $0, %eax
call open
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4"
.section .note.GNU-stack,"",#progbits
As you can see this code is storing parameters on esi and edi instead.
Why is happening?
Also please guide me on what is the best way to access these passed arguments from these registers/memory location from a C code? How can I figure out if the contents of register is the argument itself or is it a memory location where actual argument is stored?
Thanks!
this code is storing parameters on esi and edi
32-bit instructions are smaller, thus preferred when possible. See also Why do most x64 instructions zero the upper part of a 32 bit register.
How can I figure out if the contents of register is the argument itself or is it a memory location where actual argument is stored?
The AMD64 SystemV calling convention never implicitly replaces a function arg with a hidden pointer. Integer / pointer args in the C prototype always go in the arg-passing registers directly.
structs / unions passed by value go in one or more registers, or on the stack.
The full details are documented in the ABI. See more links in the x86 tag wiki. http://www.x86-64.org/documentation.html is down right now, so I linked the current revision on github.