literal constant vs variable in math library - c

So, I know that in C you need to link the code to the math library, libm, to be able to use its functions. Today, while I was trying to demonstrate this to a friend, and explain why you need to do this, I came across the following situation that I do not understand.
Consider the following code:
#include <math.h>
#include <stdio.h>
/* #define VARIABLE */
int main(void)
{
#ifdef VARIABLE
    double a = 2.0;
    double b = sqrt(a);
    printf("b = %lf\n", b);
#else
    double b = sqrt(2.0);
    printf("b = %lf\n", b);
#endif
    return 0;
}
If VARIABLE is defined, you need to link against libm as you would normally expect; otherwise you get the usual main.c:(.text+0x29): undefined reference to `sqrt' linking error, indicating that the linker cannot find the definition of the function sqrt. I was surprised to see that if I comment out #define VARIABLE, the code builds and runs fine without libm, and the result is correct!
Why is it that I need to link to libm when variables are used but I don't need to do so when literal constants are used? How does the compiler find the definition of sqrt when the library is not linked? I'm using gcc 4.4.5 under linux.

GCC can do constant folding for several standard-library functions. Obviously, if the function is folded at compile time, there is no need for a run-time function call, so no need to link to libm. You could confirm this by taking a look at the assembly that the compiler produces (using objdump or similar).
I guess these optimizations are only triggered when the argument is a constant expression.

As everyone mentions, yes it has to do with constant folding.
With optimizations off, GCC only seems to fold the call when the literal form sqrt(2.0) is used. Here's the evidence:
Case 1: With the variable.
.file "main.c"
.section .rodata
.LC1:
.string "b = %lf\n"
.text
.globl main
.type main, @function
main:
pushl %ebp
movl %esp, %ebp
andl $-16, %esp
subl $32, %esp
fldl .LC0
fstpl 24(%esp)
fldl 24(%esp)
fsqrt
fucom %st(0)
fnstsw %ax
sahf
jp .L5
je .L2
fstp %st(0)
jmp .L4
.L5:
fstp %st(0)
.L4:
fldl 24(%esp)
fstpl (%esp)
call sqrt
.L2:
fstpl 16(%esp)
movl $.LC1, %eax
fldl 16(%esp)
fstpl 4(%esp)
movl %eax, (%esp)
call printf
movl $0, %eax
leave
ret
.size main, .-main
.section .rodata
.align 8
.LC0:
.long 0
.long 1073741824
.ident "GCC: (Ubuntu 4.4.3-4ubuntu5) 4.4.3"
.section .note.GNU-stack,"",@progbits
You can see that it emits a call to the sqrt function. So you'll get a linker error if you don't link the math library.
Case 2: With the Literal.
.file "main.c"
.section .rodata
.LC1:
.string "b = %lf\n"
.text
.globl main
.type main, @function
main:
pushl %ebp
movl %esp, %ebp
andl $-16, %esp
subl $32, %esp
fldl .LC0
fstpl 24(%esp)
movl $.LC1, %eax
fldl 24(%esp)
fstpl 4(%esp)
movl %eax, (%esp)
call printf
movl $0, %eax
leave
ret
.size main, .-main
.section .rodata
.align 8
.LC0:
.long 1719614413
.long 1073127582
.ident "GCC: (Ubuntu 4.4.3-4ubuntu5) 4.4.3"
.section .note.GNU-stack,"",@progbits
There's no call to sqrt. Hence no linker error.
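As a cross-check on the two listings, the .long pairs stored at .LC0 can be decoded by hand. The sketch below assumes IEEE-754 binary64 doubles and a little-endian target such as x86, where the first .long is the low word; double_from_longs is a hypothetical helper name, not something GCC provides.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: double is a 64-bit IEEE-754 value, as on x86. */
_Static_assert(sizeof(double) == sizeof(uint64_t), "expected 64-bit double");

/* Hypothetical helper: rebuild a double from the two 32-bit words that
   the compiler emitted at .LC0. The first .long is taken as the low
   half and the second as the high half (little-endian layout). */
double double_from_longs(uint32_t low, uint32_t high)
{
    uint64_t bits = ((uint64_t)high << 32) | low;
    double value;
    memcpy(&value, &bits, sizeof value);  /* copy the bit pattern as-is */
    return value;
}
```

Under those assumptions, double_from_longs(0, 1073741824) is 2.0, the operand stored in Case 1, while double_from_longs(1719614413, 1073127582) is roughly 1.41421356, which suggests that in Case 2 the compiler stored the already-computed value of sqrt(2.0).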
With optimizations on, GCC will do constant propagation in both cases. So no linker error in either case.
$ gcc main.c -save-temps
main.o: In function `main':
main.c:(.text+0x30): undefined reference to `sqrt'
collect2: ld returned 1 exit status
$ gcc main.c -save-temps -O2
$

I think GCC uses its builtin. I compiled your code with -fno-builtin-sqrt and got the expected linker error.
The ISO C90 functions ... sin, sprintf, sqrt ... are all
recognized as built-in functions unless -fno-builtin is specified

That's because gcc is clever enough to figure out that the square root of the constant 2 is also a constant, so it just generates code like:
mov register, whatever-the-square-root-of-2-is
Hence no need to do a square root calculation at run time, gcc has already done it at compile time.
This is akin to a benchmarking program which does bucketloads of calculations then does nothing with the result:
int main (void) {
    // do something rather strenuous
    return 0;
}
You're likely (at high optimisation levels) to see all the do something rather strenuous code optimised out of existence.
The gcc docs have a whole page dedicated to these built-ins, and the relevant section of that page for sqrt and others is:
The ISO C90 functions abort, abs, acos, asin, atan2, atan, calloc, ceil, cosh, cos, exit, exp, fabs, floor, fmod, fprintf, fputs, frexp, fscanf, isalnum, isalpha, iscntrl, isdigit, isgraph, islower, isprint, ispunct, isspace, isupper, isxdigit, tolower, toupper, labs, ldexp, log10, log, malloc, memchr, memcmp, memcpy, memset, modf, pow, printf, putchar, puts, scanf, sinh, sin, snprintf, sprintf, sqrt, sscanf, strcat, strchr, strcmp, strcpy, strcspn, strlen, strncat, strncmp, strncpy, strpbrk, strrchr, strspn, strstr, tanh, tan, vfprintf, vprintf and vsprintf are all recognized as built-in functions unless -fno-builtin is specified (or -fno-builtin-function is specified for an individual function).
So, quite a lot, really :-)
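The same mechanism applies to other functions in that list. As a small sketch (literal_length is a hypothetical name), GCC will usually replace the strlen call below with the constant 5 while built-ins are enabled. Comparing the output of gcc -S with and without -fno-builtin-strlen should show the difference, though the exact assembly depends on the GCC version and flags.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical example: strlen of a string literal is a candidate for
   compile-time folding, just like sqrt(2.0) in the question. */
size_t literal_length(void)
{
    return strlen("hello");
}
```

Unlike sqrt, strlen lives in the C library that is linked by default, so disabling the built-in changes the generated code but should not cause a linker error.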

Related

Trying to understand the gcc assembly output for printf()

I'm trying to learn how to understand assembly code so I've been studying the assembly output of GCC for some stupid programs. One of them was nothing but int i = 0;, the code of which I more or less fully understand now (the biggest struggle was understanding the GAS directives strewn about). Anyway, I went a step forward and added printf("%d\n", i); to see if I could understand that and suddenly the code is much more chaotic.
.file "helloworld.c"
.text
.section .rodata.str1.1,"aMS",@progbits,1
.LC0:
.string "%d\n"
.section .text.startup,"ax",@progbits
.p2align 4
.globl main
.type main, @function
main:
subq $8, %rsp
xorl %edx, %edx
leaq .LC0(%rip), %rsi
xorl %eax, %eax
movl $1, %edi
call __printf_chk@PLT
xorl %eax, %eax
addq $8, %rsp
ret
.size main, .-main
.ident "GCC: (Gentoo 10.2.0-r3 p4) 10.2.0"
.section .note.GNU-stack,"",@progbits
I'm compiling this with gcc -S -O3 -fno-asynchronous-unwind-tables to remove the .cfi directives, however -O2 produces the same code so -O3 is overkill. My understanding of assembly is quite limited but it seems to me like the compiler is doing a lot of unnecessary stuff here. Why subtract and then add 8 to rsp? Why is it performing so many xors? There's only one variable. What is movl $1, %edi doing? I thought maybe the compiler was doing something stupid in an attempt to optimize but as I said, it's not optimizing beyond -O2, also it performs all of these operations even at -O1. To be honest I don't understand the unoptimized code at all so I assume it's inefficient.
The only thing that comes to mind is that the call to printf uses these registers, otherwise they are unused and serve no purpose. Is that actually the case? If so, how is it possible to tell?
Thanks in advance. I'm reading a book on compiler design at the moment and I've read most of the GCC manual (I read the whole chapter on optimization) and I've read some introductory x86_64 asm material, if somebody could point me toward some other resources (besides the Intel x86 manual) for learning more I would also appreciate that.
For the compiler that you are using it looks like printf(...) is mapped to __printf_chk(1, ...)
To understand the code, you need to understand the parameter-passing conventions for the platform (part of the ABI). Once you know that, under the x86-64 System V ABI, the first six integer or pointer arguments are passed in %rdi, %rsi, %rdx, %rcx, %r8 and %r9, you can understand most of what is going on:
subq $8, %rsp ; allocate 8 bytes of stack
xorl %edx, %edx ; i = 0 ; put it in the 3rd parameter for __printf_chk
leaq .LC0(%rip), %rsi ; 2nd parameter for __printf_chk. The: "%d\n"
xorl %eax, %eax ; 0 variadic fp params
movl $1, %edi ; 1st parameter for __printf_chk
call __printf_chk@PLT ; call the runtime loader wrapper for __printf_chk
xorl %eax, %eax ; return 0 from main
addq $8, %rsp ; deallocate 8 bytes of stack.
ret
Nate points out in the comments that section 3.5.7 in the ABI explains the %eax = 0 (no floating point variadic parameters.)

memcopying data off the stack in C

I was putting together a C riddle for a couple of my friends when a friend drew my attention to the fact that the following snippet (which happens to be part of the riddle I'd been writing) ran differently when compiled and run on OSX
#include <stdio.h>
#include <string.h>
int main()
{
    int a = 10;
    volatile int b = 20;
    volatile int c = 30;
    int data[3];
    memcpy(&data, &a, sizeof(data));
    printf("%d %d %d\n", data[0], data[1], data[2]);
}
What you'd expect the output to be is 10 20 30, which happens to be the case under Linux, but when the code is built under OSX you'd get 10 followed by two random numbers. After some debugging and looking at the compiler-generated assembly I came to the conclusion that this is due to how the stack is built. I am by no means an assembly expert, but the assembly code generated on Linux seems pretty straightforward to understand while the one generated on OSX threw me off a little. Perhaps I could use some help from here.
This is the code that was generated on Linux:
.file "code.c"
.section .text.unlikely,"ax",@progbits
.LCOLDB0:
.section .text.startup,"ax",@progbits
.LHOTB0:
.p2align 4,,15
.globl main
.type main, @function
main:
.LFB23:
.cfi_startproc
movl $10, -12(%rsp)
xorl %eax, %eax
movl $20, -8(%rsp)
movl $30, -4(%rsp)
ret
.cfi_endproc
.LFE23:
.size main, .-main
.section .text.unlikely
.LCOLDE0:
.section .text.startup
.LHOTE0:
.ident "GCC: (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609"
.section .note.GNU-stack,"",@progbits
And this is the code that was generated on OSX:
.section __TEXT,__text,regular,pure_instructions
.macosx_version_min 10, 12
.globl _main
.p2align 4, 0x90
_main: ## @main
.cfi_startproc
## BB#0:
pushq %rbp
Ltmp0:
.cfi_def_cfa_offset 16
Ltmp1:
.cfi_offset %rbp, -16
movq %rsp, %rbp
Ltmp2:
.cfi_def_cfa_register %rbp
subq $16, %rsp
movl $20, -8(%rbp)
movl $30, -4(%rbp)
leaq L_.str(%rip), %rdi
movl $10, %esi
xorl %eax, %eax
callq _printf
xorl %eax, %eax
addq $16, %rsp
popq %rbp
retq
.cfi_endproc
.section __TEXT,__cstring,cstring_literals
L_.str: ## @.str
.asciz "%d %d %d\n"
.subsections_via_symbols
I'm really only interested in two questions here.
Why is this happening?
Are there any get-arounds to this issue?
I know this is not a practical way to utilize the stack as I'm a professional C developer, which is really the only reason I found this problem interesting to invest some of my time into.
Accessing memory past the end of a declared variable is undefined behaviour - there is no guarantee as to what will happen when you try to do that. Because of how the compiler generated the assembly under Linux, you happened to get the 3 variables directly in a row on the stack, however that behaviour is just a coincidence - the compiler could legally add extra data in between the variables on the stack or really do anything - the result is not defined by the language standard. So in answer to your first question, it's happening because what you're doing is not part of the language by design. In answer to your second, there's no way to reliably get the same result from multiple compilers because the compilers are not programmed to reliably reproduce undefined behaviour.
Undefined behavior. You don't expect to copy 10, 20, 30. You hope not to seg-fault.
There is nothing to guarantee that a, b, and c are at sequential memory addresses, which is your naive assumption. On Linux, the compiler happened to make them sequential. You can't even rely on gcc always doing that.
You already know that the behavior is undefined. A good reason for the behavior to be different on OS X and Linux is that these systems use different compilers, which generate different code:
When you run gcc in Linux, you invoke the installed version of the GNU C compiler.
When you run gcc in your version of OS X, you most likely invoke the installed version of clang.
Try gcc --version on both systems and amaze your friends.
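On the second question, here is a sketch of one portable alternative: if the three values must sit next to each other, keep them in a single object. Array elements are guaranteed to be contiguous, so the copy below never reads outside the source object. The function name copy_three is hypothetical and only serves the illustration.

```c
#include <string.h>

/* Hypothetical helper: the three values live in one array, so memcpy
   reads exactly sizeof source bytes from a single object and the
   result no longer depends on how the compiler lays out the stack. */
void copy_three(int data[3])
{
    const int source[3] = {10, 20, 30};
    memcpy(data, source, sizeof source);
}
```

Calling copy_three on an int[3] and printing the elements should give 10 20 30 with any conforming compiler, which is the output the riddle was hoping for.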

What's the difference between puts and printf in C compiled into Assembly language

This is my C program using puts():
#include <stdio.h>
int main(void){
    puts("testing");
}
After using gcc -S -o sample.s sample.c to compile it into assembly, this is what I got:
.file "sample.c"
.section .rodata
.LC0:
.string "testing"
.text
.globl main
.type main, @function
main:
pushl %ebp
movl %esp, %ebp
andl $-16, %esp
subl $16, %esp
movl $.LC0, (%esp)
call puts
leave
ret
.size main, .-main
.ident "GCC: (GNU) 4.4.5 20110214 (Red Hat 4.4.5-6)"
.section .note.GNU-stack,"",@progbits
I did the same thing again, this time using printf() instead of puts(), and this is what I got:
.file "sample.c"
.section .rodata
.LC0:
.string "testing"
.text
.globl main
.type main, @function
main:
pushl %ebp
movl %esp, %ebp
andl $-16, %esp
subl $16, %esp
movl $.LC0, %eax //this is the difference
movl %eax, (%esp)
call printf
leave
ret
.size main, .-main
.ident "GCC: (GNU) 4.4.5 20110214 (Red Hat 4.4.5-6)"
.section .note.GNU-stack,"",@progbits
Here is what I don't understand: the printf() version moves $.LC0 to %eax and then moves %eax to (%esp), while the puts() version moves $.LC0 directly to (%esp).
I don't know why that is.
The big difference between the two functions, at the assembly level, is that puts() takes exactly one argument (a pointer to the string to display), whereas printf() takes one fixed argument (a pointer to the format string) followed by an arbitrary number of arguments on the stack (printf() is a variadic function).
Note that there is absolutely no check on the number of arguments; how many are read depends only on the number of conversion specifiers (introduced by the % character) encountered in the format string. For example, format string exploitation relies on this property to interactively explore the contents of a process's stack.
So, basically, the difference is that puts() has only one argument and printf() is a variadic function.
If you want to better understand this difference, try to compile:
#include <stdio.h>
int main(void) {
    printf("testing %d", 10);
}
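To make the "driven by the format string" point concrete, here is a minimal sketch of a printf-like variadic function written with <stdarg.h>. The name sum_by_format is hypothetical, and the function only understands %d; it illustrates the mechanism and is not how printf itself is implemented.

```c
#include <stdarg.h>

/* Hypothetical printf-like function: adds up one int for every "%d"
   found in fmt. Like printf, it trusts the format string completely;
   passing fewer ints than there are "%d" markers is undefined
   behaviour, because nothing records how many arguments were passed. */
int sum_by_format(const char *fmt, ...)
{
    va_list args;
    int total = 0;

    va_start(args, fmt);
    for (const char *p = fmt; *p != '\0'; ++p) {
        if (p[0] == '%' && p[1] == 'd') {
            total += va_arg(args, int);  /* fetch the next argument */
            ++p;                         /* skip the 'd' */
        }
    }
    va_end(args);
    return total;
}
```

For instance, sum_by_format("%d + %d", 3, 4) reads two arguments and returns 7, while a format string without any %d reads none.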

GCC and the Multiply Instruction

I am using GCC in 32-bit mode on a Windows 7 machine under cygwin. I have the following function:
unsigned f1(unsigned x, unsigned y)
{
    return x*y;
}
I want the code to do an unsigned multiply and as such I would expect it to generate the mul instruction, not the imul instruction. I compile the program with the following command:
gcc -m32 -S t4.c
The generated assembly code is:
.file "t4.c"
.text
.globl _f1
.def _f1; .scl 2; .type 32; .endef
_f1:
pushl %ebp
movl %esp, %ebp
movl 8(%ebp), %eax
imull 12(%ebp), %eax
popl %ebp
ret
.ident "GCC: (GNU) 4.8.2"
I believe that the generated code has the wrong multiply instruction in it but I find it hard to believe that GCC has such a simple bug. Please comment.
The compiler relies on the "as-if" rule: No standard conforming program can detect a difference between what this program does and what the program should do, since the lowest 32 bits of the result are the same for both instructions.
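The claim about the lowest 32 bits can be checked with a short sketch. The two hypothetical helpers below compute the product once with the inputs read as unsigned values and once with the same bit patterns read as two's-complement signed values, using 64-bit arithmetic so that nothing overflows, and then keep only the low 32 bits.

```c
#include <stdint.h>

/* Low 32 bits of the product when the inputs are read as unsigned,
   which is what a mul-style multiply would leave in the low register. */
uint32_t low32_unsigned(uint32_t x, uint32_t y)
{
    return (uint32_t)((uint64_t)x * (uint64_t)y);
}

/* Low 32 bits of the product when the same bit patterns are read as
   two's-complement signed values, as an imul-style multiply would.
   The sign reinterpretation is done by hand to stay portable. */
uint32_t low32_signed(uint32_t x, uint32_t y)
{
    int64_t sx = (x & 0x80000000u) ? (int64_t)x - 4294967296LL : (int64_t)x;
    int64_t sy = (y & 0x80000000u) ? (int64_t)y - 4294967296LL : (int64_t)y;
    return (uint32_t)(uint64_t)(sx * sy);  /* conversion wraps modulo 2^32 */
}
```

Both helpers agree for every input pair, for example 0xFFFFFFFF times 2 yields 0xFFFFFFFE either way. Only the upper half of the full 64-bit product differs between the two interpretations, and f1 never returns that half.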

Editing ASM result of an operation in C when compiling in GCC

My friend and I got a computer architecture project and we don't really know how to approach it. I hope you could at least point us in the right direction so we know what to look for. As our professor isn't really good at explaining what we really need to do and the subject is rather vague, we'll start from the beginning.
Our task is to somehow "edit" GCC to treat some operations differently. For example, when you add two char arguments in a .c program it uses addb. We need to change it to use, for example, 16bit registers (addl), without passing extra parameters during compilation (just regular gcc p.c -o p). Why, or whether it will work, doesn't really matter at this point.
We'd like to know how we could change something inside GCC, and where we can even start looking, as I can't find any information about similar tasks besides making plugins/extensions. Is there anything we could read about something like this, or anything we could use?
In C, 'char' variables are normally added together as integers, so the C compiler will already use addl, except when it can see that using a smaller or faster form makes no difference to the result.
For example this C code
unsigned char a, b, c;
int i;
void func1(void) { a = b + c; }
void func2(void) { i = b + c; }
GCC produces this assembly:
.file "xq.c"
.text
.p2align 4,,15
.globl func1
.type func1, @function
func1:
movzbl c, %eax
addb b, %al
movb %al, a
ret
.size func1, .-func1
.p2align 4,,15
.globl func2
.type func2, @function
func2:
movzbl b, %edx
movzbl c, %eax
addl %edx, %eax
movl %eax, i
ret
.size func2, .-func2
.comm i,4,4
.comm c,1,4
.comm b,1,4
.comm a,1,4
.ident "GCC: (Debian 4.7.2-5) 4.7.2"
.section .note.GNU-stack,"",@progbits
Note that the first function uses addb but the second uses addl because the high bits of the result will be discarded in the first function when the result is stored.
This version of GCC is generating i686 code, so the integers are 32bit (addl). Depending on exactly what you want, you may need to make the result a short, or actually get a compiler version that outputs 16bit 8086 code.
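The effect of the promotion can also be observed from C itself. In this sketch (hypothetical function names, assuming the usual 8-bit unsigned char), the addition is carried out in int in both functions, and the result only wraps when it is converted back to unsigned char.

```c
/* b + c is evaluated in int after integer promotion. Storing the sum
   into an unsigned char keeps only the low 8 bits (assuming CHAR_BIT
   is 8), which is why the compiler is free to use a byte-sized add. */
unsigned char add_to_uchar(unsigned char b, unsigned char c)
{
    return (unsigned char)(b + c);
}

/* Same expression, but the full int result is kept, so the compiler
   needs the wider addition to produce the high bits. */
int add_to_int(unsigned char b, unsigned char c)
{
    return b + c;
}
```

With b = 200 and c = 100, add_to_int returns 300 while add_to_uchar returns 44 (300 modulo 256), mirroring the difference between func2 and func1 above.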
