How to tell if a function argument is an immediate? - c

I have an inline function definition that wraps an inline assembly. I wish to choose
different inline assembly implementation based on the fact whether or not the argument
is known in build time or not.
My question is how to ask in C code or inline assembly whether an address value is an known at build time and therefore fit to be an immediate value. If you are thinking __builtin_constant_p - please do read ahead.
Here is a bit of code that illustrates my intent:. I am trying to find a way to implement "is_immediate".
static char arr[5];
void __attribute__((always_inline)) do_something(char * buf)
{
if(is_immediate(buf) {
// Argument is constant, can use immediate form
asm volatile ("insn1 %0" : : "i"(buf));
} else {
// Argument is computed at runtime, use a register
unsigned long tmp = (unsigned long)buf + 1;
asm volatile("insn2 %0" : : "r"(tmp));
}
int main(void)
{
do_something(&arr);
}
At first impression __builtin_constant_p() seems like it is exactly the right bit of magic needed, except it does not work.
The reason is does not work is that while the address of the array will be known after the linker has placed the array in memory and chosen an address for it (and so it does fit the immediate constraint of the inline assembly), it is not known at compile time before the link.
So, what I am looking for is a way to ask - "is this variable fit to be an immediate value?" rather then "is this a constant expression?".

If I have a look on what is produced by my x86 compiler (where this insn doesn't make sense after all), it is here:
static char arr[5];
void __attribute__((always_inline)) do_something(char * buf)
{
// Argument is constant, can use immediate form
asm volatile ("insn1 %0" : : "r"(buf));
unsigned long tmp = (unsigned long)buf;
asm volatile("insn2 %0" : : "r"(tmp));
}
int main(void)
{
char brr[5];
do_something(arr);
do_something(brr);
}
comes to
.file "inl.c"
.text
.globl do_something
.type do_something, #function
do_something:
pushl %ebp
movl %esp, %ebp
movl 8(%ebp), %eax
#APP
# 8 "inl.c" 1
insn1 %eax
# 0 "" 2
# 12 "inl.c" 1
insn2 %eax
# 0 "" 2
#NO_APP
popl %ebp
ret
.size do_something, .-do_something
.globl main
.type main, #function
main:
pushl %ebp
movl $arr, %eax
movl %esp, %ebp
subl $16, %esp
#APP
# 8 "inl.c" 1
insn1 %eax
# 0 "" 2
# 12 "inl.c" 1
insn2 %eax
# 0 "" 2
#NO_APP
leal -5(%ebp), %eax
#APP
# 8 "inl.c" 1
insn1 %eax
# 0 "" 2
# 12 "inl.c" 1
insn2 %eax
# 0 "" 2
#NO_APP
leave
ret
.size main, .-main
.local arr
.comm arr,5,4
.ident "GCC: (SUSE Linux) 4.5.1 20101208 [gcc-4_5-branch revision 167585]"
.section .comment.SUSE.OPTs,"MS",#progbits,1
.string "OSpwg"
.section .note.GNU-stack,"",#progbits
so I think it could make sense to be immediate as $arr is probably immediate (while leal -5(%ebp), %eax isn't). Probably the said optimization just isn't supported.
But on the other hand,
movl $arr, %eax
insn1 %eax
is not soo much worse than
insn1 $arr
so that it is still acceptable, IMHO.

Related

Why does a simple C "hello world" program not work if gcc -O3 without "volatile"?

I have the following C program
int main() {
char string[] = "Hello, world.\r\n";
__asm__ volatile ("syscall;" :: "a" (1), "D" (0), "S" ((unsigned long) string), "d" (sizeof(string) - 1)); }
which I want to run under Linux with with x86 64 bit. I call the syscall for "write" with 0 as fd argument because this is stdout.
If I compile under gcc with -O3, it does not work. A look into the assembly code
.file "test_for_o3.c"
.text
.section .text.startup,"ax",#progbits
.p2align 4,,15
.globl main
.type main, #function
main:
.LFB0:
.cfi_startproc
subq $40, %rsp
.cfi_def_cfa_offset 48
xorl %edi, %edi
movl $15, %edx
movq %fs:40, %rax
movq %rax, 24(%rsp)
xorl %eax, %eax
movq %rsp, %rsi
movl $1, %eax
#APP
# 5 "test_for_o3.c" 1
syscall;
# 0 "" 2
#NO_APP
movq 24(%rsp), %rcx
xorq %fs:40, %rcx
jne .L5
xorl %eax, %eax
addq $40, %rsp
.cfi_remember_state
.cfi_def_cfa_offset 8
ret
.L5:
.cfi_restore_state
call __stack_chk_fail#PLT
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0"
.section .note.GNU-stack,"",#progbits
tells us that gcc has simply not put the string data into the assembly code. Instead, if I declare "string" as "volatile", it works fine.
However, the idea of "volatile" is just to use it for variables that can change their values by (from the view of the executing function) unexpected events, isn't it? "volatile" can make code much slower, hence it should be avoided if possible.
As I would suppose, gcc must assume that the content of "string" must not be ignored because the pointer "string" is used as an input parameter in the inline assembly (and gcc has no idea what the inline assembly code will do with it).
If this is "allowed" behaviour of gcc, where can I read more about all the formal constraints I have to be aware of when writing code for -O3?
A second question would be what the "volatile" statement along with the inline assembly directive does exactly. I just got used to mark all inline assembly directives with "volatile" because it had not worked otherwise, in some situations.

How to understand atomic test_and_set in assembly level?

Hi I'm new to the gcc's built-in atomic functions. And I'm using the test-and-set one. You may find the reference here
Here's the question:
I've done this code:
#define bool int
#define true 1
#define false 0
int main() {
bool lock = true;
bool val = __sync_lock_test_and_set(&lock, true);
return 0;
}
What I intend to do is to check the assembly instruction of __sync_lock_test_and_set. I've used:
gcc -S [filename].c
And the result is:
.file "test_and_set.c"
.file "test_and_set.c"
.text
.globl main
.type main, #function
main:
.LFB0:
.cfi_startproc
pushl %ebp
.cfi_def_cfa_offset 8
.cfi_offset 5, -8
movl %esp, %ebp
.cfi_def_cfa_register 5
subl $16, %esp
movl $1, -8(%ebp)
movl $1, %eax
xchgl -8(%ebp), %eax
movl %eax, -4(%ebp)
movl $0, %eax
leave
.cfi_restore 5
.cfi_def_cfa 4, 4
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (GNU) 4.8.1"
However, I can't find where the test_and_set instruction is...
As you can see, I'm using gcc-4.8.1, and the environment is MAC OSX 10.10(I'm sure that this gcc is not what Apple provides. I've compiled this by myself)
Thanks!
movl $1, -8(%ebp) # lock = true
movl $1, %eax # true argument
xchgl -8(%ebp), %eax # the test-and-set
It is an atomic exchange, which returns the previous value (that's the test part) and writes 1 into the variable (the set part). This is used to implement mutexes. After the operation the lock will be held by somebody - either the original owner or your code that has just acquired it. That's why it's safe to write a value of 1. The original value is returned, so you can distinguish between those two events. If the original value was 0 then you got the lock and can proceed, otherwise you need to wait because somebody else has it.

Need help understanding GCC assembly code

For my homework assignment I am supposed to convert this C code
#define UPPER 15
const int lower = 12;
int sum = 0;
int main(void) {
int i;
for (i = lower; i < UPPER; i++) {
sum += i;
}
return sum;
}
into gcc assembly. I already compiled it to first study the code before doing it per hand (obviously translating by hand is going to look much differently). This is the assembler code I received:
.file "upper.c"
.globl lower
.section .rodata
.align 4
.type lower, #object
.size lower, 4
lower:
.long 12
.globl sum
.bss
.align 4
.type sum, #object
.size sum, 4
sum:
.zero 4
.text
.globl main
.type main, #function
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movl $12, -4(%rbp)
jmp .L2
.L3:
movl sum(%rip), %edx
movl -4(%rbp), %eax
addl %edx, %eax
movl %eax, sum(%rip)
addl $1, -4(%rbp)
.L2:
cmpl $14, -4(%rbp)
jle .L3
movl sum(%rip), %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (SUSE Linux) 4.8.1 20130909 [gcc-4_8-branch revision 202388]"
.section .note.GNU-stack,"",#progbits
Now I was wondering if someone could give me a few examples like
where the constructors i, lower, upper and sum are located it in code
where some of the expressions i = lower or i < UPPER are located
where the for-loop starts
and such things so I can then get an idea of how the assembler code is constructed. Thank you!
If I understood correctly you question, here is the answers:
Q: where the constructors i, lower, upper and sum are located it in code?
lower is located inside .rodata section (readonly data section). It's value is initialized by linux loader during program loading stage to the value .long 12. lower constructor is a linux loader. It just loads lower value from binary image.
.globl lower
.section .rodata
.align 4
.type lower, #object
.size lower, 4
lower:
.long 12
sum is located inside .bss section (data segment containing statically-allocated variables). It's value is initialized by _init function what gets called when program execution begins. It's value is zero (.zero 4). Every variable located inside .bss section has zero as initial value (link to wiki's article for .bss).
.globl sum
.bss
.align 4
.type sum, #object
.size sum, 4
sum:
.zero 4
upper is a constant. The compiler did not put it's declaration into assembly. There is a reference to upper-1 (as $14) here:
.L2:
cmpl $14, -4(%rbp)
i is a on stack temporary variable. It's value is accessed using addresses relative %rbp (%rbp is a pointer to current function stack frame). The is no explicit declaration of i into assembly. There is no explicit stack reservation for i (no instruction like sub $0x8,%rsp at main preamble), I think, because main doesn't call another functions. Here is code for i initialization (note compiler knows that lower initial value is $12 and removes access to lower during i initialization):
movl $12, -4(%rbp)
Q: where some of the expressions i = lower or i < UPPER are located
i = lower:
movl $12, -4(%rbp)
jmp .L2
i < UPPER:
.L2:
cmpl $14, -4(%rbp)
jle .L3
i++:
addl $1, -4(%rbp)
sum += i;:
movl sum(%rip), %edx
movl -4(%rbp), %eax
addl %edx, %eax
movl %eax, sum(%rip)
return sum; (%eax register is used to hold function return value - more about this: X86 calling conventions):
jle .L3
movl sum(%rip), %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
Q: where the for-loop starts
it start here:
movl $12, -4(%rbp)
jmp .L2

Can I list the local variables in a stack by using some system functions in C language

The C function backtrace just returns a series of functions calls for the programn, but i want to list all the locals variables in my programn, just like the info locals in gdb.Any idea if this can be done? Thanks
Generally, no. You should move away from thinking about a "stack" as some sort of god given factum. A call stack is merely a common implementation technique for C. It has no intrinsic meaning or required semantics. Automatic variables ("local variables", as you say) have to behave in a certain way, and sometimes that means that they are written onto the call stack. However, it is entirely conceivable that local variables are never realized in memory at all -- they may instead only ever be stored in a processor register, or eliminated entirely if an equivalent program can be formulated without them.
So, no, there is no language-intrinsic mechanism for enumerating local variables. As you say, the debugger can do so to some extent (depending on debug symbols being present and subject to optimizations); perhaps you can find a library that can process debug symbols from within a running program.
If this is just for occasional debugging, then you can invoke the debugger. However, since the debugger itself will freeze your program, you need an intermediary to capture the output. You can, for example, use system, and redirect the output to a file, then read the file afterwards. In the example below, the file gdbcmds.txt contains the line info locals.
char buf[512];
FILE *gdb;
snprintf(buf, sizeof(buf), "gdb -batch -x gdbcmds.txt -p %d > gdbout.txt",
(int)getpid());
system(buf);
gdb = fopen("gdbout.txt", "r");
while (fgets(buf, sizeof(buf), gdb) != 0) {
printf("%s", buf);
}
fclose(gdb);
First, note that backtrace is not a standard C library function, but a GNU-specific extension.
In general, it's difficult to impossible retrieve local variable information from compiled code, especially if it was compiled without debugging or with optimization enabled. If debugging isn't turned on, variable names and types are generally not preserved in the resulting machine code.
For example, take the following ridiculously simple code:
#include <stdio.h>
#include <math.h>
int main(void)
{
int x = 1, y = 2, z;
z = 2 * y - x;
printf("x = %d, y = %d, z = %d\n", x, y, z);
return 0;
}
Here's the resulting machine code, no debugging or optimization:
.file "varinfo.c"
.version "01.01"
gcc2_compiled.:
.section .rodata
.LC0:
.string "x = %d, y = %d, z = %d\n"
.text
.align 4
.globl main
.type main,#function
main:
pushl %ebp
movl %esp, %ebp
subl $24, %esp
movl $1, -4(%ebp)
movl $2, -8(%ebp)
movl -8(%ebp), %eax
movl %eax, %eax
sall $1, %eax
subl -4(%ebp), %eax
movl %eax, -12(%ebp)
pushl -12(%ebp)
pushl -8(%ebp)
pushl -4(%ebp)
pushl $.LC0
call printf
addl $16, %esp
movl $0, %eax
leave
ret
.Lfe1:
.size main,.Lfe1-main
.ident "GCC: (GNU) 2.96 20000731 (Red Hat Linux 7.2 2.96-112.7.2)"
x, y, and z are referred to through -4(%ebp), -8(%ebp), and -12(%ebp) respectively. There's nothing to indicate that they're integers other than the instructions used to perform the arithmetic.
It's even better with optimization (-O1) turned on:
.file "varinfo.c"
.version "01.01"
gcc2_compiled.:
.section .rodata.str1.1,"ams",#progbits,1
.LC0:
.string "x = %d, y = %d, z = %d\n"
.text
.align 4
.globl main
.type main,#function
main:
pushl %ebp
movl %esp, %ebp
subl $8, %esp
pushl $3
pushl $2
pushl $1
pushl $.LC0
call printf
movl $0, %eax
leave
ret
.Lfe1:
.size main,.Lfe1-main
.ident "GCC: (GNU) 2.96 20000731 (Red Hat Linux 7.2 2.96-112.7.2)"
In this case, the compiler was able to do some static analysis and compute the value z at compile time; there's no need to set aside any memory for any of the variables at all, because the compiler already knows what those values have to be.

Can/do C compilers optimize out adress-of in inline functions?

Let's say I have following code:
int f() {
int foo = 0;
int bar = 0;
foo++;
bar++;
// many more repeated operations in actual code
foo++;
bar++;
return foo+bar;
}
Abstracting repeated code into a separate functions, we get
static void change_locals(int *foo_p, int *bar_p) {
*foo_p++;
*bar_p++;
}
int f() {
int foo = 0;
int bar = 0;
change_locals(&foo, &bar);
change_locals(&foo, &bar);
return foo+bar;
}
I'd expect the compiler to inline the change_locals function, and optimize things like *(&foo)++ in the resulting code to foo++.
If I remember correctly, taking address of a local variable usually prevents some optimizations (e.g. it can't be stored in registers), but does this apply when no pointer arithmetic is done on the address and it doesn't escape from the function? With a larger change_locals, would it make a difference if it was declared inline (__inline in MSVC)?
I am particularly interested in behavior of GCC and MSVC compilers.
inline (and all its cousins _inline, __inline...) are ignored by gcc. It might inline anything it decides is an advantage, except at lower optimization levels.
The code procedure by gcc -O3 for x86 is:
.text
.p2align 4,,15
.globl f
.type f, #function
f:
pushl %ebp
xorl %eax, %eax
movl %esp, %ebp
popl %ebp
ret
.ident "GCC: (GNU) 4.4.4 20100630 (Red Hat 4.4.4-10)"
It returns zero because *ptr++ doesn't do what you think. Correcting the increments to:
(*foo_p)++;
(*bar_p)++;
results in
.text
.p2align 4,,15
.globl f
.type f, #function
f:
pushl %ebp
movl $4, %eax
movl %esp, %ebp
popl %ebp
ret
So it directly returns 4. Not only did it inline them, but it optimized the calculations away.
Vc++ from vs 2005 provides similar code, but it also created unreachable code for change_locals(). I used the command line
/O2 /FD /EHsc /MD /FA /c /TP
If I remember correctly, taking
address of a local variable usually
prevents some optimizations (e.g. it
can't be stored in registers), but
does this apply when no pointer
arithmetic is done on the address and
it doesn't escape from the function?
The general answer is that if the compiler can ensure that no one else will change a value behind its back, it can safely be placed in a register.
Think of this as though the compiler first performs inlining, then transforms all those *&foo (which results from the inlining) to simply foo before deciding if they should be placed in registers on in memory on the stack.
With a larger change_locals, would it
make a difference if it was declared
inline (__inline in MSVC)?
Again, generally speaking, whether or not a compiler decides to inline something is done using heuristics. If you explicitly specify that you want something to be inlines, the compiler will probably weight this into its decision process.
I've tested gcc 4.5, MSC and IntelC using this:
#include <stdio.h>
void change_locals(int *foo_p, int *bar_p) {
(*foo_p)++;
(*bar_p)++;
}
int main() {
int foo = printf("");
int bar = printf("");
change_locals(&foo, &bar);
change_locals(&foo, &bar);
printf( "%i\n", foo+bar );
}
And all of them did inline/optimize the foo+bar value, but also did
generate the code for change_locals() (but didn't use it).
Unfortunately, there's still no guarantee that they'd do the same for
any kind of such a "local function".
gcc:
__Z13change_localsPiS_:
pushl %ebp
movl %esp, %ebp
movl 8(%ebp), %edx
movl 12(%ebp), %eax
incl (%edx)
incl (%eax)
leave
ret
_main:
pushl %ebp
movl %esp, %ebp
andl $-16, %esp
pushl %ebx
subl $28, %esp
call ___main
movl $LC0, (%esp)
call _printf
movl %eax, %ebx
movl $LC0, (%esp)
call _printf
leal 4(%ebx,%eax), %eax
movl %eax, 4(%esp)
movl $LC1, (%esp)
call _printf
xorl %eax, %eax
addl $28, %esp
popl %ebx
leave
ret

Resources