Context:
Linux 64.
I would like a way to tell gcc to keep structures intact when generating assembly with gcc -O0 -S -g myprog.c
By that, I mean: instead of referencing a structure by address, I would like it to be referenced by label. That would make the output easier to parse without reading the source code again.
So, for example:
struct mystruct{
int32_t a;
char * b;
};
would become something like:
label_mystruct:
-4(label_mystruct)
-12(label_mystruct)
and for example, referenced by:
add $56, -4(label_mystruct)
Currently, it is referenced like
.globl _main
_main:
LFB13:
LM157:
pushq %rbp #
LCFI27:
movq %rsp, %rbp#,
LCFI28:
subq $80, %rsp#,
movl %edi,-68(%rbp) # argc, argc,
movq %rsi,-80(%rbp) # argv, argv
Next line is the culprit:
movq -56(%rbp), %rdx # list, D.3781
movq -16(%rbp), %rax # arr, D.3780
movq %rdx, %rsi # D.3781,
movq %rax, %rdi # D.3780,
call _myaddhu #
I would like it to be
label_mystruct:
-4(label_mystruct)
-12(label_mystruct)
.globl _main
_main:
LFB13:
LM157:
pushq %rbp #
LCFI27:
movq %rsp, %rbp#,
LCFI28:
subq $80, %rsp#,
movl %edi,-68(%rbp) # argc, argc,
movq %rsi,-80(%rbp) # argv, argv
Now it is fine:
movq label_mystruct, %rdx # list, D.3781
movq -16(%rbp), %rax # arr, D.3780
movq %rdx, %rsi # D.3781,
movq %rax, %rdi # D.3780,
call _myaddhu #
Question:
Is that possible with gcc and without using external tools?
I think it's not possible, because of the way GCC lays out the code.
The problem is that the struct is stored on the stack, and you cannot really have a label referring to something on the stack. If the struct were not on the stack (for example, if it were a global variable), it would have a label referring to it.
What you do have is the debugging info that GCC generates, which describes where each piece of data is placed while specific code is running. In your example it would in essence say that "when executing this code, -56(%rbp) points to mystruct".
On the other hand if you would write assembler code by hand you could certainly have symbolic references to a variable. You could for example do:
#define MYSTRUCT -56(%rbp)
...
movq MYSTRUCT, %rdx
However, MYSTRUCT is expanded by the preprocessor, so the symbol is lost when the code is assembled. It would be of no help if GCC did this (except maybe that the assembly generated by -S would be more readable); in addition, GCC does not run the assembly it generates through the preprocessor anyway.
You get that if you put your struct into static storage. This of course alters the meaning of the code. For example, this code
struct {
int a, b;
} test;
void settest(int a, int b) {
test.a = a;
test.b = b;
}
compiles to (cleaned up):
settest:
movl %edi, test(%rip)
movl %esi, test+4(%rip)
ret
.comm test,8,4
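To see the difference in the compiler's own output, here is a minimal sketch that touches the same struct layout once in static storage and once on the stack. The names in_static_storage, bump_static and bump_local are made up for illustration; mystruct is the struct from the question. Compiled with gcc -O0 -S on x86-64, the static object should be addressed through its symbol, while the local one should appear only as an offset from %rbp (the exact offsets depend on the compiler version).

```c
#include <stddef.h>
#include <stdint.h>

struct mystruct {
    int32_t a;
    char *b;
};

/* Static storage: the object has a symbol the assembler can reference. */
static struct mystruct in_static_storage;

/* Assumption: on x86-64 this becomes a symbol-relative access,
   something like in_static_storage(%rip). */
int32_t bump_static(void)
{
    in_static_storage.a += 56;
    return in_static_storage.a;
}

/* Automatic storage: the object lives in the stack frame and is
   addressed relative to the frame pointer, with no label of its own. */
int32_t bump_local(void)
{
    struct mystruct on_stack = { 0, NULL };
    on_stack.a += 56;
    return on_stack.a;
}
```

Comparing the two functions in the generated .s file shows why a label is only available for the first one.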
You could also try to pass the option -fverbose-asm to gcc which instructs gcc to add some annotations that might make the assembly easier to read.
There may be a very simple solution to this problem, but it has been bothering me for a while, so I have to ask.
In our embedded projects, it seems common to have simple get/set functions for many variables in separate C files. Those functions are then called from many other C files. When I look at the assembly listing, those function calls are never replaced with move instructions. A faster way would be to just declare the monitored variables as global variables to avoid unnecessary function calls.
Let's say you have a file.c which has variables that need to be monitored in another C-file main.c. For example, debugging variables, hardware registers, adc-values, etc. Is there a compiler optimization that replaces simple get/set functions with assembly move instructions thus avoiding unnecessary overhead caused by function calls?
file.h
#ifndef FILE_H
#define FILE_H
#include <stdint.h>
int32_t get_signal(void);
void set_signal(int32_t x);
#endif
file.c
#include "file.h"
#include <stdint.h>
static volatile int32_t *signal = SOME_HARDWARE_ADDRESS;
int32_t get_signal(void)
{
return *signal;
}
void set_signal(int32_t x)
{
*signal = x;
}
main.c
#include "file.h"
#include <stdio.h>
int main(int argc, char *args[])
{
// Do something with the variable
for (int i = 0; i < 10; i++)
{
printf("signal = %d\n", get_signal());
}
return 0;
}
If I compile the above code with gcc -Wall -save-temps main.c file.c -o main.exe, it gives the following assembly listing for main.c. You can always see the call get_signal, even when compiling with the -O3 flag, which seems silly as we are only reading a memory address. Why bother calling such a simple function?
The same applies to the simple set function. It is always called, even though it only writes to one memory location and does nothing else.
main.s
main:
pushq %rbp
.seh_pushreg %rbp
movq %rsp, %rbp
.seh_setframe %rbp, 0
subq $48, %rsp
.seh_stackalloc 48
.seh_endprologue
movl %ecx, 16(%rbp)
movq %rdx, 24(%rbp)
call __main
movl $0, -4(%rbp)
jmp .L4
.L5:
call get_signal
movl %eax, %edx
leaq .LC0(%rip), %rcx
call printf
addl $1, -4(%rbp)
.L4:
cmpl $9, -4(%rbp)
jle .L5
movl $0, %eax
addq $48, %rsp
popq %rbp
ret
UPDATED 2023-02-13
The question was closed with several links to answers about inlining and link-time optimization. I don't think the same question has been answered before, or at least the solution is not obvious for my get function. What is there to inline if a function just returns a value and does nothing else?
Anyway, as suggested, one solution is to add the compiler flags -O2 -flto, which correctly replaces the call get_signal instruction with a move instruction, giving the following partial output:
main:
subq $40, %rsp
.seh_stackalloc 40
.seh_endprologue
call __main
movl tmp.0(%rip), %edx
movl $10, %eax
.p2align 4,,10
.p2align 3
.L4:
movl signal(%rip), %ecx
addl %ecx, %edx
subl $1, %eax
jne .L4
leaq .LC0(%rip), %rcx
movl %edx, tmp.0(%rip)
call printf.constprop.0
xorl %eax, %eax
addq $40, %rsp
ret
.seh_endproc
Thank you.
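For completeness, a common alternative that does not depend on -flto is to make the accessor bodies visible to every caller by defining them as static inline in the header. The sketch below is a single-file approximation: signal_storage is a hypothetical stand-in for SOME_HARDWARE_ADDRESS, and the comments mark which part would live in file.h and which in file.c. With optimization enabled, a compiler can typically replace such calls with a plain load or store, though it is not required to.

```c
#include <stdint.h>

/* --- would live in file.h --- */

/* Hypothetical stand-in for the hardware register in the question. */
extern volatile int32_t signal_storage;

/* The body is visible to every caller, so no cross-file call is needed. */
static inline int32_t get_signal(void)
{
    return signal_storage;
}

static inline void set_signal(int32_t x)
{
    signal_storage = x;
}

/* --- would live in file.c --- */

volatile int32_t signal_storage;
```

Because the object is volatile, each get_signal() still performs a real read, which matches the repeated load inside the loop of the -flto listing.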
I'm experimenting with disassembling clang binaries of simple C programs (compiled with -O0), and I'm confused about a certain instruction that gets generated.
Here are two empty main functions with standard arguments, one of which returns a value and the other does not:
// return_void.c
void main(int argc, char** argv)
{
}
// return_0.c
int main(int argc, char** argv)
{
return 0;
}
Now, when I disassemble the binaries, they look reasonably different, but there's one line that I don't understand:
return_void.bin:
(__TEXT,__text) section
_main:
0000000000000000 pushq %rbp
0000000000000001 movq %rsp, %rbp
0000000000000004 movl %edi, -0x4(%rbp)
0000000000000007 movq %rsi, -0x10(%rbp)
000000000000000b popq %rbp
000000000000000c retq
return_0.bin:
(__TEXT,__text) section
_main:
0000000100000f80 pushq %rbp
0000000100000f81 movq %rsp, %rbp
0000000100000f84 xorl %eax, %eax # We return with EAX, so we clean it to return 0
0000000100000f86 movl $0x0, -0x4(%rbp) # What does this mean?
0000000100000f8d movl %edi, -0x8(%rbp)
0000000100000f90 movq %rsi, -0x10(%rbp)
0000000100000f94 popq %rbp
0000000100000f95 retq
It only gets generated when the function is not void, so I thought that it might be another way to return 0, but when I changed the returned constant, this line didn't change at all:
// return_1.c
int main(int argc, char** argv)
{
return 1;
}
empty_return_1.bin:
(__TEXT,__text) section
_main:
0000000100000f80 pushq %rbp
0000000100000f81 movq %rsp, %rbp
0000000100000f84 movl $0x1, %eax # Return value modified
0000000100000f89 movl $0x0, -0x4(%rbp) # This value is not modified
0000000100000f90 movl %edi, -0x8(%rbp)
0000000100000f93 movq %rsi, -0x10(%rbp)
0000000100000f97 popq %rbp
0000000100000f98 retq
Why is this line getting generated and what is its purpose?
The purpose of that area is revealed by the following code
#include <stdio.h>   /* printf */
#include <stdlib.h>  /* rand */
int main(int argc, char** argv)
{
if (rand() == 42)
return 1;
printf("Hello World!\n");
return 0;
}
At the start it does
movl $0, -4(%rbp)
then the early return looks as follows
callq rand
cmpl $42, %eax
jne .LBB0_2
movl $1, -4(%rbp)
jmp .LBB0_3
and then at the end it does
.LBB0_3:
movl -4(%rbp), %eax
addq $32, %rsp
popq %rbp
retq
So, this area is indeed reserved to store the function return value. It doesn't appear to be terribly necessary and it is not used in optimized code, but in -O0 mode that's the way it works.
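Written back as C, the -O0 lowering above corresponds roughly to the following sketch. The function name main_like and the parameter r (standing in for the result of rand(), so the behavior is deterministic) are made up for illustration, and the comments map each statement to the instructions quoted above; this is only a model of what the listings show, not clang's actual intermediate representation.

```c
#include <stdio.h>

/* Hypothetical model of main at -O0: one hidden return slot,
   and every return statement becomes "store to slot, jump to epilogue". */
int main_like(int r)
{
    int retval = 0;          /* movl $0, -4(%rbp)  (initial store for main) */

    if (r == 42) {
        retval = 1;          /* movl $1, -4(%rbp) */
        goto epilogue;       /* jmp .LBB0_3 */
    }

    printf("Hello World!\n");
    retval = 0;              /* the final "return 0" */

epilogue:
    return retval;           /* movl -4(%rbp), %eax ; retq */
}
```

With a single return statement, as in the question, the slot is initialized but never read back, which is why the store looks pointless there.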
clang is making space on the stack for the arguments (registers edi and rsi) and puts the value 0 on the stack, too, for some reason. I assume that clang compiles your code to an SSA-representation like this:
int main(int argc, char** argv)
{
int a;
a = 0;
return a;
}
This would explain why a stack slot is allocated. If clang does constant propagation, too, this would explain why eax is zeroed out instead of being loaded from -4(%rbp). In general, don't think too much about dubious constructs in unoptimized assembly. After all, you forbade the compiler from removing useless code.
movl $0x0,-0x4(%rbp)
This instruction stores 0 at %rbp - 4. It seems that clang allocates a hidden local variable for an implicit return value from main.
From the clang mailing list:
Yes. We allocate an implicit local variable to hold the return value;
return statements then just initialize the return slot and jump to the
epilogue, where the slot is loaded and returned. We don't use a phi
because the control flow for getting to the epilogue is not
necessarily as simple as a simple branch, due to cleanups in local
scopes (like C++ destructors).
Implicit return values like main's are handled with an implicit store
in the prologue.
Source: http://lists.cs.uiuc.edu/pipermail/cfe-dev/2012-February/019767.html
According to the standard (for hosted environments), 5.1.2.2.1, main is required to return an int result. So do not expect defined behavior if you violate this.
Furthermore, main is actually not required to explicitly return 0; this is implicitly returned if it reaches the end of the function. (Note that this applies only to main, which also does not have a prototype.)
I have a program goo.c
void foo(double);
#include <stdio.h>
void foo(int x){
printf ("in foo.c:: x= %d\n",x);
}
which is called by foo.c
int main(){
double x=3.0;
foo(x);
}
I compile and run
gcc foo.c goo.c
./a.out
Guess what? I get "x= 1" as the result. Then I found that the prototype of foo should have been void foo(int). Apparently, my double input value 3.0 has to be converted to an int. But if I try to see the value of (int) 3.0 with this test program:
int main(){
double x=3.0;
printf ("%d", ((int) x));
}
I get 3 as output, which makes the earlier "x= 1" even harder to understand. Any idea? For information, my gcc is run with the ANSI C standard. Thanks.
[EDIT] If I use gcc -S as suggested by JS1,
I get goo.s
.file "goo.c"
.text
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
subq $32, %rsp
movabsq $4613937818241073152, %rax
movq %rax, -8(%rbp)
movq -8(%rbp), %rax
movq %rax, -24(%rbp)
movsd -24(%rbp), %xmm0
call foo
leave
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu 4.8.2-19ubuntu1) 4.8.2"
.section .note.GNU-stack,"",@progbits
and foo.s
.file "foo.c"
.section .rodata
.LC0:
.string "in foo.c:: x= %d\n"
.text
.globl foo
.type foo, @function
foo:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
subq $16, %rsp
movl %edi, -4(%rbp)
movl -4(%rbp), %eax
movl %eax, %esi
movl $.LC0, %edi
movl $0, %eax
call printf
leave
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size foo, .-foo
.ident "GCC: (Ubuntu 4.8.2-19ubuntu1) 4.8.2"
.section .note.GNU-stack,"",@progbits
Can anyone who knows how to read assembly help figure out the source of the problem?
Understanding why you get '1' requires a bit of ASM and x86-64 ABI
knowledge. First of all, goo.c and foo.c are two separate compilation
units. The only thing that foo.c knows about the foo function is
the bogus prototype.
The bogus prototype is as follows: void foo(double);. It's a function
that takes only a single double argument. The x86-64 ABI mandates that
doubles are passed in the xmm registers. (The exact phrasing
is 'If the class is SSE, the next available vector register is used,
the registers are taken in the order from %xmm0 to %xmm7.')
That means that when the compiler sets up the arguments to call the
foo() function, it's going to pass the argument via %xmm0. In
simplified asm what happens is:
mov 3.0, %xmm0
call foo
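The large constant loaded by movabsq in the question's listing for main is simply the bit pattern of the double 3.0, which is then moved into %xmm0. A small sketch to check this, assuming an 8-byte IEEE 754 double as on x86-64 (the helper name double_bits is made up for illustration):

```c
#include <stdint.h>
#include <string.h>

/* Return the raw bits of a double without violating aliasing rules.
   Assumes sizeof(double) == sizeof(uint64_t), as on x86-64. */
uint64_t double_bits(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return bits;
}
```

On such a platform, double_bits(3.0) is 4613937818241073152, that is 0x4008000000000000, the same value that appears in the movabsq instruction.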
Now, foo(), for its part, believes it's going to receive an int. The
x86-64 ABI says: 'If the class is INTEGER, the next available register
of the sequence %rdi, %rsi, %rdx, %rcx, %r8 and %r9 is used.' The first
argument is supposed to be passed via %rdi. That means that foo()
will do something like:
mov %rdi, %rsi
mov 0xabcd, %rdi // 0xabcd being the address of the "%d" string
call printf
So you're going to end up printing whatever was in %rdi on entry to foo(), and not what is in %xmm0.
But why 1? You'll get an idea by issuing the following commands:
./a.out a
./a.out a b
./a.out a b c
See a pattern? Let's go back to the simplified assembly:
main:
mov 3.0, %xmm0
call foo
ret
foo:
mov %rdi, %rsi
mov 0xabcd, %rdi // 0xabcd being the address of the "%d" string
call printf
ret
As you can see, nothing sets %rdi before foo() is reached,
where it's passed on to printf. That means the 1 was passed to main
in the first place. Now, in the question, main is given the following
prototype: int main(). But main is actually called as if it had
the following prototype: int main (int argc, char *argv[],
char *envp[]). The first argument, stored in %rdi, is therefore
argc. That's why the program was printing 1.
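The usual way to avoid this class of bug is to keep a single correct prototype in a header that both the caller and the definition include, so the compiler can check the definition against it and convert arguments at the call site. The following is a single-file sketch of that idea; the header name in the comment, the last_seen variable and call_foo_with_double are hypothetical and exist only so the effect can be checked.

```c
#include <stdio.h>

/* Would live in a shared header (hypothetically "foo.h"), included by
   both the file that defines foo and the file that calls it. */
void foo(int x);

/* Hypothetical helper state so the received value can be inspected. */
static int last_seen;

void foo(int x)
{
    last_seen = x;
    printf("in foo.c:: x= %d\n", x);
}

int call_foo_with_double(void)
{
    double x = 3.0;
    foo(x);              /* correct prototype in scope: x is converted to int */
    return last_seen;
}
```

With the matching prototype visible, the caller converts 3.0 to the int 3 and passes it in %edi, so foo prints 3 instead of whatever happened to be in that register.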
I am doing some extended assembly optimization on GNU C code running on 64-bit Linux. I wanted to print debugging messages from within the assembly code, and that's how I came across the following. I am hoping someone can explain what I am supposed to do in this situation.
Take a look at this sample function:
void test(int a, int b, int c, int d){
__asm__ volatile (
"movq $0, %%rax\n\t"
"pushq %%rax\n\t"
"popq %%rax\n\t"
:
:"m" (a)
:"cc", "%rax"
);
}
Since the four arguments to the function are of class INTEGER, they will be passed in registers and then stored on the stack. The strange thing to me is how gcc actually does it:
test:
pushq %rbp
movq %rsp, %rbp
movl %edi, -4(%rbp)
movl %esi, -8(%rbp)
movl %edx, -12(%rbp)
movl %ecx, -16(%rbp)
movq $0, %rax
pushq %rax
popq %rax
popq %rbp
ret
The passed arguments are stored on the stack, but the stack pointer is not decremented. Thus, when I do pushq %rax, the values of a and b are overwritten.
What I am wondering: is there a way to ask gcc to properly set up the local stack? Am I simply not supposed to use push and pop in function calls?
The x86-64 ABI provides a 128-byte red zone below the stack pointer, and the compiler decided to use it. You can turn that off with the -mno-red-zone option.
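If changing the compiler flags is not an option, another approach sometimes used is to step over the red zone inside the asm block itself before pushing, and to step back afterwards. The sketch below assumes GCC-style extended asm on x86-64; the function name push_pop_roundtrip is made up, and on other targets the function simply returns its argument so the file still compiles.

```c
/* Push a value and pop it back without touching the 128-byte red zone. */
unsigned long long push_pop_roundtrip(unsigned long long v)
{
#if defined(__x86_64__) && defined(__GNUC__)
    unsigned long long out;
    __asm__ volatile (
        "leaq -128(%%rsp), %%rsp\n\t"  /* move below the red zone; lea leaves flags alone */
        "pushq %1\n\t"                 /* now the push cannot overwrite locals */
        "popq %0\n\t"
        "leaq 128(%%rsp), %%rsp\n\t"   /* restore the original stack pointer */
        : "=r" (out)
        : "r" (v)                      /* register operand, so no %rsp-relative address */
        : "memory"
    );
    return out;
#else
    return v;                          /* fallback for other targets: no asm */
#endif
}
```

The input is constrained to a register on purpose: a memory operand could be addressed relative to %rsp, which would be wrong while the stack pointer is temporarily moved.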
I wanted to look into how certain C/C++ features were translated into assembly and I created the following file:
struct foo {
int x;
char y[0];
};
char *bar(struct foo *f)
{
return f->y;
}
I then compiled this with gcc -S (and also tried with g++ -S) but when I looked at the assembly code, I was disappointed to find a trivial redundancy in the bar function that I thought gcc should be able to optimize away:
_bar:
Leh_func_begin1:
pushq %rbp
Ltmp0:
movq %rsp, %rbp
Ltmp1:
movq %rdi, -8(%rbp)
movq -8(%rbp), %rax
movabsq $4, %rcx
addq %rcx, %rax
movq %rax, -24(%rbp)
movq -24(%rbp), %rax
movq %rax, -16(%rbp)
movq -16(%rbp), %rax
popq %rbp
ret
Leh_func_end1:
Among other things, the lines
movq %rax, -24(%rbp)
movq -24(%rbp), %rax
movq %rax, -16(%rbp)
movq -16(%rbp), %rax
seem pointlessly redundant. Is there any reason gcc (and possibly other compilers) cannot/does not optimize this away?
I thought gcc should be able to optimize away.
From the gcc manual:
Without any optimization option, the compiler's goal is to reduce the cost of compilation and to make debugging produce the expected results.
In other words, it doesn't optimize unless you ask it to. When I turn on optimizations using the -O3 flag, gcc 4.4.6 produces much more efficient code:
bar:
.LFB0:
.cfi_startproc
leaq 4(%rdi), %rax
ret
.cfi_endproc
For more details, see Options That Control Optimization in the manual.
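The optimized leaq 4(%rdi), %rax is just "pointer plus the offset of y". A small sketch that spells this out in C; it uses the standard C99 flexible array member y[] instead of the GNU y[0] form from the question, and bar_by_offset is a made-up name for illustration.

```c
#include <stddef.h>

struct foo {
    int x;
    char y[];   /* C99 flexible array member; the question uses GNU's y[0] */
};

/* Same function as in the question. */
char *bar(struct foo *f)
{
    return f->y;
}

/* What the optimized code computes: base address plus offsetof(y). */
char *bar_by_offset(struct foo *f)
{
    return (char *)f + offsetof(struct foo, y);
}
```

On a typical x86-64 target, where int is 4 bytes, offsetof(struct foo, y) is 4, which is the constant in the leaq instruction.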
The code the compiler generates without optimization is typically a straight instruction-by-instruction translation, and the instructions are not those of the program but those of an intermediate representation in which redundancy may have been introduced.
If you expect assembly without such redundant instructions, use gcc -O -S
The kind of optimization you were expecting is called peephole optimization. Compilers usually have plenty of these, because unlike more global optimizations, they are cheap to apply and (generally) do not risk making the code worse—if applied towards the end of the compilation, at least.
In this blog post, I provide an example where both GCC and Clang may go as far as generating shorter 32-bit instructions when the integer type in the source code is 64-bit but only the lowest 32 bits of the result matter.