I am trying to figure out, how does compiler pad space between each struct members. In this example:
struct s{
int a,b,c;
};
struct s get(int a){
struct s foo = {.a=a,.b=a+1,.c=a+2};
return foo;
}
is compiled with cc -S a.c:
.file "a.c"
.text
.globl get
.type get, #function
get:
.LFB0:
pushq %rbp
movq %rsp, %rbp
movl %edi, -36(%rbp)
movl -36(%rbp), %eax
movl %eax, -24(%rbp)
movl -36(%rbp), %eax
addl $1, %eax
movl %eax, -20(%rbp)
movl -36(%rbp), %eax
addl $2, %eax
movl %eax, -16(%rbp)
movq -24(%rbp), %rax
movq %rax, -12(%rbp)
movl -16(%rbp), %eax
movl %eax, -4(%rbp)
movq -12(%rbp), %rax
movl -4(%rbp), %ecx
movq %rcx, %rdx
popq %rbp
ret
.LFE0:
.size get, .-get
.ident "GCC: (Debian 8.3.0-6) 8.3.0"
.section .note.GNU-stack,"",#progbits
No optimization is used. The question is why is there -36(%rbp) used as first member "reference", when they are arranged sequentially in
.a == -24(%rbp)
.b == -20(%rbp)
.c == -16(%rbp)
There is no need to make room with -36(%rbp) which compiler uses here. Is it intentionally (as a room or compiler uses the -36(%rbp) as a "reference" to the first member)?
Also, at the end,
movq -24(%rbp), %rax #take first member
movq %rax, -12(%rbp) #place it randomly
movl -16(%rbp), %eax #take third member
movl %eax, -4(%rbp) #place it randomly
Does not make sense, it is not sequential with the initial struct, and the first and third member of the struct are copied randomly in the space the function get had allocated.
What is the convention for structs?
The code you observe is a jumble of three different things: the actual layout of a struct s, the ABI specification of how to return structs from functions, and the anti-optimizations inserted by many compilers in their default mode (equivalent to -O0) to ensure that unsophisticated debuggers can find and change the values of variables while stopped at any breakpoint (see Why does clang produce inefficient asm with -O0 (for this simple floating point sum)? for more about this).
You can cut out the second of these factors by having get write into a struct s * argument, instead of returning a struct by value, and the third by compiling with gcc -O2 -S instead of just gcc -S. (Also try -Og and -O1; the complex optimizations applied at -O2 can be confusing, too.) For instance:
$ cat test.c
struct s {
int a,b,c;
};
void get(int a, struct s *s)
{
s->a = a;
s->b = a+1;
s->c = a+2;
}
$ gcc -O2 -S test.c
$ cat test.s
.file "test.c"
.text
.p2align 4
.globl get
.type get, #function
get:
.LFB0:
.cfi_startproc
leal 1(%rdi), %eax
movl %edi, (%rsi)
addl $2, %edi
movl %eax, 4(%rsi)
movl %edi, 8(%rsi)
ret
.cfi_endproc
.LFE0:
.size get, .-get
.ident "GCC: (Debian 9.3.0-13) 9.3.0"
.section .note.GNU-stack,"",#progbits
From this assembly language it should be clearer that a is at offset 0 within struct s, b is at offset 4, and c at offset 8.
Struct layout is specified by the "psABI" (processor-specific application binary interface) for each CPU architecture. You can read the psABI specs for x86 at https://github.com/hjl-tools/x86-psABI/wiki/X86-psABI. These also explain how structs are returned from functions. It's also important to know that the layout of a stack frame is only partially specified by the psABI. Some of the "random" offsets in your assembly dump are, in fact, arbitrarily chosen by the compiler.
Related
Let say I have the (x << n) | (x >> (-n & 63)) expression.
There is nothing conditional in it.
So, to my understanding, it will be executed in constant time.
Indeed, when I compile the following code using gcc -O3 -S:
#include <stdint.h>
// rotate left x by n places assuming n < 64
uint64_t rotl64(uint64_t x, uint8_t n) {
return (x << n) | (x >> (-n & 63));
}
I get, on linux/amd64, the following output (which executes in constant time):
.file "test.c"
.text
.p2align 4
.globl rotl64
.type rotl64, #function
rotl64:
.LFB0:
.cfi_startproc
movq %rdi, %rax
movl %esi, %ecx
rolq %cl, %rax
ret
.cfi_endproc
.LFE0:
.size rotl64, .-rotl64
.ident "GCC: (Alpine 9.3.0) 9.3.0"
.section .note.GNU-stack,"",#progbits
However, on linux/386 I get an output that contains conditional jumps:
.file "test.c"
.text
.p2align 4
.globl rotl64
.type rotl64, #function
rotl64:
.LFB0:
.cfi_startproc
pushl %edi
.cfi_def_cfa_offset 8
.cfi_offset 7, -8
pushl %esi
.cfi_def_cfa_offset 12
.cfi_offset 6, -12
movl 12(%esp), %eax
movl 16(%esp), %edx
movzbl 20(%esp), %ecx
movl %eax, %esi
movl %edx, %edi
shldl %esi, %edi
sall %cl, %esi
testb $32, %cl
je .L4
movl %esi, %edi
xorl %esi, %esi
.L4:
negl %ecx
andl $63, %ecx
shrdl %edx, %eax
shrl %cl, %edx
testb $32, %cl
je .L5
movl %edx, %eax
xorl %edx, %edx
.L5:
orl %esi, %eax
orl %edi, %edx
popl %esi
.cfi_restore 6
.cfi_def_cfa_offset 8
popl %edi
.cfi_restore 7
.cfi_def_cfa_offset 4
ret
.cfi_endproc
.LFE0:
.size rotl64, .-rotl64
.ident "GCC: (Alpine 9.3.0) 9.3.0"
.section .note.GNU-stack,"",#progbits
From what I understand, here the 64 bits operations have to be emulated, hence the need of conditional jumps.
Does GCC provide a builtin function that indicates if an expression will be compiled with no jumps?
If it isn't the case, how can I know if an expression will be executed in constant time?
Is this a problem for timing sensitive applications like security?
Does GCC provide a builtin function that indicates if an expression will be compiled with no jumps?
No.
If it isn't the case, how can I know if an expression will be executed in constant time?
By looking at the generated assembly code.
Is this a problem for timing sensitive applications like security?
Yes. That's why in these cases don't trust the compilers (and porters/package builders changing compiler settings) and rather implement it in assembly.
There are some constant time functions in general libc's, like in OpenBSD and FreeBSD. Like timingsafe_bcmp and timingsafe_memcmp, which are written in pure C, but their authors trust their packagers not to be like Debian or Ubuntu, who are assumed to break it.
Many other such functions are in the various security libraries itself, but even then you can safely assume that they are broken. For sure in OpenSSL and libsodium in many cases.
No such a function does not exist.
And unless you are writing the compiler (you're not) you should not really care about the actual machine code being generated. The compiler is free to optimize that code anyway it sees fit (as long as it is correct) depending on the options you pass in. And with -O3 you should get the fastest code, even with jumps.
If there were a function like you suggested, you're code would be tied to a single version of a single compiler with a particular set of optimization options. In other words: bye bye portability.
This simple c:
#include <stdio.h>
#include <string.h>
int *add(int a, int b){
int ar[1];
int result = a+b;
memcpy(ar, &result, sizeof(int));
return ar;
}
int main(){
int a = add(1,2)[0];
printf("%i\n",a);
}
is compiled into this:
.text
.globl add
.type add, #function
add:
pushq %rbp #
movq %rsp, %rbp #,
movl %edi, -20(%rbp) # a, a
movl %esi, -24(%rbp) # b, b
# a.c:5: int result = a+b;
movl -20(%rbp), %edx # a, tmp91
movl -24(%rbp), %eax # b, tmp92
addl %edx, %eax # tmp91, _1
# a.c:5: int result = a+b;
movl %eax, -8(%rbp) # _1, result
# a.c:6: memcpy(ar, &result, sizeof(int)); ---I SEE NO CALL INSTRUCTION---
movl -8(%rbp), %eax # MEM[(char * {ref-all})&result], _6
movl %eax, -4(%rbp) # _6, MEM[(char * {ref-all})&ar]
# a.c:7: return ar;
movl $0, %eax #--THE FUNCTION SHOULD RETURN ADDRESS OF ARRAY, NOT 0. OTHERWISE command terminated
# lea -4(%rbp), %rax #--ONLY THIS IS CORRECT, NOT `0`
# a.c:8: }
popq %rbp #
ret
.size add, .-add
.section .rodata
.LC0:
.string "%i\n"
.text
.globl main
.type main, #function
main:
pushq %rbp #
movq %rsp, %rbp #,
subq $16, %rsp #,
# a.c:11: int a = add(1,2)[0];
movl $2, %esi #,
movl $1, %edi #,
call add #
# a.c:11: int a = add(1,2)[0];
movl (%rax), %eax # *_1, tmp90
movl %eax, -4(%rbp) # tmp90, a
# a.c:12: printf("%i\n",a);
movl -4(%rbp), %eax # a, tmp91
movl %eax, %esi # tmp91,
leaq .LC0(%rip), %rdi #,
movl $0, %eax #,
call printf#PLT #
movl $0, %eax #, _6
# a.c:13: }
leave
ret
.size main, .-main
.ident "GCC: (Debian 8.3.0-6) 8.3.0"
.section .note.GNU-stack,"",#progbits
Every function from stdlib like printf or puts are called from GOT (i.e. %rip register holds the address of GOT). But not memcpy, it is like "assembly inline instructions" instead of regular call address. So is memcpy even a symbol? If so, why is it not as argument to call? Is memcpy in GOT table? If so, what is a offset from GOT to that symbol?
So first off, you have a bug:
$ cc -O2 -S test.c
test.c: In function ‘add’:
test.c:7:12: warning: function returns address of local variable
Returning the address of a local variable has undefined behavior, if and only if the caller uses that value; this is why your compiler generated code that returned a null pointer, which will crash the program if used but be harmless otherwise. In fact, my copy of GCC generates only this for add:
add:
xorl %eax, %eax
ret
because that treatment of the return value makes the other operations in add be dead code.
(The "only if used" restriction is also why my compiler generates a warning, not a hard error.)
Now, if I modify your program to have well-defined behavior, e.g.
#include <stdio.h>
#include <string.h>
void add(int *sum, int a, int b)
{
int result = a+b;
memcpy(sum, &result, sizeof(int));
}
int main(void)
{
int a;
add(&a, 1, 2);
printf("%i\n",a);
return 0;
}
then I do indeed see assembly code in which the memcpy call has been replaced by inline code:
add:
addl %edx, %esi
movl %esi, (%rdi)
ret
This is a feature of many modern C compilers: they know what some of the C library's functions do, and can inline them when that makes sense. (You can see that in this case the generated code is both smaller and faster than it would have been with an actual call to memcpy.)
GCC lets me turn this feature off with a command-line option:
$ gcc -O2 -ffreestanding test.c
$ sed -ne '/^add:/,/cfi_endproc/{; /^\.LF[BE]/d; /\.cfi_/d; p; }' test.s
add:
subq $24, %rsp
addl %edx, %esi
movl $4, %edx
movl %esi, 12(%rsp)
leaq 12(%rsp), %rsi
call memcpy#PLT
addq $24, %rsp
ret
In this mode, the call to memcpy in add is treated the same as the call to printf in main. Your compiler may have similar options.
I have the following C program
int main() {
char string[] = "Hello, world.\r\n";
__asm__ volatile ("syscall;" :: "a" (1), "D" (0), "S" ((unsigned long) string), "d" (sizeof(string) - 1)); }
which I want to run under Linux with with x86 64 bit. I call the syscall for "write" with 0 as fd argument because this is stdout.
If I compile under gcc with -O3, it does not work. A look into the assembly code
.file "test_for_o3.c"
.text
.section .text.startup,"ax",#progbits
.p2align 4,,15
.globl main
.type main, #function
main:
.LFB0:
.cfi_startproc
subq $40, %rsp
.cfi_def_cfa_offset 48
xorl %edi, %edi
movl $15, %edx
movq %fs:40, %rax
movq %rax, 24(%rsp)
xorl %eax, %eax
movq %rsp, %rsi
movl $1, %eax
#APP
# 5 "test_for_o3.c" 1
syscall;
# 0 "" 2
#NO_APP
movq 24(%rsp), %rcx
xorq %fs:40, %rcx
jne .L5
xorl %eax, %eax
addq $40, %rsp
.cfi_remember_state
.cfi_def_cfa_offset 8
ret
.L5:
.cfi_restore_state
call __stack_chk_fail#PLT
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0"
.section .note.GNU-stack,"",#progbits
tells us that gcc has simply not put the string data into the assembly code. Instead, if I declare "string" as "volatile", it works fine.
However, the idea of "volatile" is just to use it for variables that can change their values by (from the view of the executing function) unexpected events, isn't it? "volatile" can make code much slower, hence it should be avoided if possible.
As I would suppose, gcc must assume that the content of "string" must not be ignored because the pointer "string" is used as an input parameter in the inline assembly (and gcc has no idea what the inline assembly code will do with it).
If this is "allowed" behaviour of gcc, where can I read more about all the formal constraints I have to be aware of when writing code for -O3?
A second question would be what the "volatile" statement along with the inline assembly directive does exactly. I just got used to mark all inline assembly directives with "volatile" because it had not worked otherwise, in some situations.
I am doing several experiments with x86 asm trying to see how common language constructs map into assembly. In my current experiment, I am trying to see specifically how C language pointers map to register-indirect addressing. I have written a fairly hello-world like pointer program:
#include <stdio.h>
int
main (void)
{
int value = 5;
int *int_val = &value;
printf ("The value we have is %d\n", *int_val);
return 0;
}
and compiled it to the following asm using: gcc -o pointer.s -fno-asynchronous-unwind-tables pointer.c:[1][2]
.file "pointer.c"
.section .rodata
.LC0:
.string "The value we have is %d\n"
.text
.globl main
.type main, #function
main:
;------- function prologue
pushq %rbp
movq %rsp, %rbp
;---------------------------------
subq $32, %rsp
movq %fs:40, %rax
movq %rax, -8(%rbp)
xorl %eax, %eax
;----------------------------------
movl $5, -20(%rbp) ; This is where the value 5 is stored in `value` (automatic allocation)
;----------------------------------
leaq -20(%rbp), %rax ;; (GUESS) If I have understood correctly, this is where the address of `value` is
;; extracted, and stored into %rax
;----------------------------------
movq %rax, -16(%rbp) ;;
movq -16(%rbp), %rax ;; Why do I have two times the same instructions, with reversed operands???
;----------------------------------
movl (%rax), %eax
movl %eax, %esi
movl $.LC0, %edi
movl $0, %eax
call printf
;----------------------------------
movl $0, %eax
movq -8(%rbp), %rdx
xorq %fs:40, %rdx
je .L3
call __stack_chk_fail
.L3:
leave
ret
.size main, .-main
.ident "GCC: (Ubuntu 4.9.1-16ubuntu6) 4.9.1"
.section .note.GNU-stack,"",#progbits
My issue is that I don't understand why it contains the instruction movq two times, with reversed operands. Could someone explain it to me?
[1]: I want to avoid having my asm code interspersed with cfi directives when I don't need them at all.
[2]: My environment is Ubuntu 14.10, gcc 4.9.1 (modified by ubuntu), and Gnu assembler (GNU Binutils for Ubuntu) 2.24.90.20141014, configured to target x86_64-linux-gnu
Maybe it will be clearer if you reorganize your blocks:
;----------------------------------
leaq -20(%rbp), %rax ; &value
movq %rax, -16(%rbp) ; int_val
;----------------------------------
movq -16(%rbp), %rax ; int_val
movl (%rax), %eax ; *int_val
movl %eax, %esi ; printf-argument
movl $.LC0, %edi ; printf-argument (format-string)
movl $0, %eax ; no floating-point numbers
call printf
;----------------------------------
The first block performs int *int_val = &value;, the second block performs printf .... Without optimization, the blocks are independent.
Since you're not doing any optimization, gcc creates very simple-minded code that does each statement in the program one at a time without looking at any other statement. So in your example, it stores a value into the variable int_val, and then the very next instruction reads that variable again as part of the next statement. In both cases, it is using %rax as the temporary to hold value, as that's the first register generally used for things.
I am trying to understand how to embed assembly language in C (using gcc on x86_64 architecture). I wrote this program to increment the value of a single variable. But I am getting garbage value as output. And ideas why?
#include <stdio.h>
int main(void) {
int x;
x = 4;
asm("incl %0": "=r"(x): "r0"(x));
printf("%d", x);
return 0;
}
Thanks
Update The program is giving expected result on gcc 4.8.3 but not on gcc 4.6.3. I am pasting the assembly output of the non-working code:
.file "abc.c"
.section .rodata
.LC0:
.string "%d"
.text
.globl main
.type main, #function
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
pushq %rbx
subq $24, %rsp
movl $4, -20(%rbp)
movl -20(%rbp), %eax
incl %edx
movl %edx, %ebx
.cfi_offset 3, -24
movl %ebx, -20(%rbp)
movl $.LC0, %eax
movl -20(%rbp), %edx
movl %edx, %esi
movq %rax, %rdi
movl $0, %eax
call printf
movl $0, %eax
addq $24, %rsp
popq %rbx
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3"
.section .note.GNU-stack,"",#progbits
You don't need to say x twice; once is sufficient:
asm("incl %0": "+r"(x));
The +r says that the value will be input and output.
Your way, with separate inputs and output registers, requires that you take the input from %1, add one, and write the output to %0, but you can't do that with incl.
The reason it works on some compilers is because GCC is free to allocate both %0 and %1 to the same register, and appears to have done so in those cases, but it does not have to. Incidentally, if you want to prevent GCC allocating an input and output to the same register (say, if you want to initialize the output before using the input to calculate a final output), you need to use the & modifier.
The documentation for the modifiers is here.