Error with -mno-sse flag and gettimeofday() in C

A simple C program which uses gettimeofday() works fine when compiled without any flags (gcc 4.5.1), but doesn't give the expected output when compiled with the flag -mno-sse.
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>   /* for gettimeofday() and struct timeval */

int main()
{
    struct timeval s, e;
    float time;
    int i;
    gettimeofday(&s, NULL);
    for (i = 0; i < 10000; i++);   /* empty delay loop */
    gettimeofday(&e, NULL);
    time = e.tv_sec - s.tv_sec + e.tv_usec - s.tv_usec;
    printf("%f\n", time);
    return 0;
}
I have CFLAGS=-march=native -mtune=native
Could someone explain why this happens?
The program normally prints a correct value, but prints "0" when compiled with -mno-sse.

The flag -mno-sse causes floating point arguments to be passed on the stack, whereas the usual x86_64 ABI specifies that they should be passed via SSE registers.
Since printf() in your C library was compiled without -mno-sse, it is expecting floating point arguments to be passed in accordance with the ABI. This is why your code fails. It has nothing to do with gettimeofday().
If you wish to use printf() from your code compiled with -mno-sse and pass it floating point arguments, you will need to recompile your C library with that option and link against that version.
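If you want to see the difference for yourself, a minimal sketch like the one below (the file and output names are just illustrative) can be compiled twice with gcc -S and the assembly diffed; in the normal build the double should reach printf via an SSE register such as %xmm0, while with -mno-sse it is passed differently (newer GCC versions may instead reject the call outright):
/* float_abi_check.c -- illustrative only; names are made up.
 * Compile twice and compare the assembly:
 *   gcc -S -O0 float_abi_check.c -o with_sse.s
 *   gcc -S -O0 -mno-sse float_abi_check.c -o no_sse.s
 */
#include <stdio.h>

int main(void)
{
    double d = 1.5;
    printf("%f\n", d);   /* check how d is handed to printf in each .s file */
    return 0;
}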

It appears that you are using a loop which does nothing in order to observe a time difference. The problem is that the compiler may optimize this loop away entirely. The issue may not be with -mno-sse itself; it may be that the flag enables an optimization that removes the loop, giving you the same time on every run.
I would recommend putting something in that loop which can't be optimized out (such as incrementing a number which you print out at the end). See if you still get the same behavior. If not, I'd recommend looking at the generated assembler (gcc -S) and seeing what the code difference is.
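For example, one way to keep the loop alive is to accumulate into a volatile variable (a sketch, not taken from the question; note that printing a plain counter may still let the compiler fold the whole loop into a constant):
#include <stdio.h>
#include <sys/time.h>

int main(void)
{
    struct timeval s, e;
    volatile long sink = 0;   /* volatile forces the loop to actually run */
    int i;

    gettimeofday(&s, NULL);
    for (i = 0; i < 10000; i++)
        sink += i;
    gettimeofday(&e, NULL);

    long usec = (e.tv_sec - s.tv_sec) * 1000000L + (e.tv_usec - s.tv_usec);
    printf("%ld us, sink=%ld\n", usec, sink);
    return 0;
}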

The struct timeval fields tv_sec and tv_usec are usually longs.
Redeclaring the variable "time" as a long solved the issue.
The following link addresses it:
http://gcc.gnu.org/ml/gcc-patches/2006-10/msg00525.html
Working code:
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>   /* for gettimeofday() and struct timeval */

int main()
{
    struct timeval s, e;
    long time;
    int i;
    gettimeofday(&s, NULL);
    for (i = 0; i < 10000; i++);
    gettimeofday(&e, NULL);
    time = e.tv_sec - s.tv_sec + e.tv_usec - s.tv_usec;
    printf("%ld\n", time);
    return 0;
}
Thanks for the prompt replies. Hope this helps.

What do you mean by "doesn't give output"?
0 (zero) is a perfectly reasonable output to expect.
Edit: Try compiling to assembler (gcc -S ...) and see the differences between the normal and the no-sse version.

Related

Why is no warning given for the wrong use of __attribute__((pure)) in GCC?

I am trying to understand pure functions, and have been reading through the Wikipedia article on that topic. I wrote the minimal sample program as follows:
#include <stdio.h>

static int a = 1;

static __attribute__((pure)) int pure_function(int x, int y)
{
    return x + y;
}

static __attribute__((pure)) int impure_function(int x, int y)
{
    a++;
    return x + y;
}

int main(void)
{
    printf("pure_function(0, 0) = %d\n", pure_function(0, 0));
    printf("impure_function(0, 0) = %d\n", impure_function(0, 0));
    return 0;
}
I compiled this program with gcc -O2 -Wall -Wextra, expecting that an error, or at least a warning, should have been issued for decorating impure_function() with __attribute__((pure)). However, I received no warnings or errors, and the program also ran without issues.
Isn't marking impure_function() with __attribute__((pure)) incorrect? If so, why does it compile without any errors or warnings, even with the -Wextra and -Wall flags?
Thanks in advance!
Doing this is incorrect and you are responsible for using the attribute correctly.
Look at this example:
static __attribute__((pure)) int impure_function(int x, int y)
{
    extern int a;
    a++;
    return x + y;
}

void caller()
{
    impure_function(1, 1);
}
Code generated by GCC (with -O1) for the function caller is:
caller():
ret
As you can see, the impure_function call was completely removed because the compiler treats it as "pure".
GCC can also mark a function as "pure" internally, automatically, if it can see its definition:
static __attribute__((noinline)) int pure_function(int x, int y)
{
    return x + y;
}

void caller()
{
    pure_function(1, 1);
}
Generated code:
caller():
ret
So there is no point in using this attribute on functions whose definitions are visible to the compiler. It is supposed to be used when the definition is not available, for example when the function is defined in another shared library. That means that when the attribute is used in its proper place, the compiler cannot perform a sanity check anyway. Implementing a warning is thus not very useful (although not meaningless).
I don't think there is anything stopping GCC developers from implementing such a warning, except the time that must be spent.
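For reference, a sketch of the intended use described above, where the attribute sits on a declaration whose definition lives elsewhere (lookup_weight is a hypothetical function from some other library):
/* other_library.h (hypothetical) -- the definition is not visible here,
 * so the attribute is the only purity information the compiler gets. */
extern int lookup_weight(int key) __attribute__((pure));

int twice_the_weight(int key)
{
    /* With the pure hint the compiler may call lookup_weight() once
     * and reuse the result; without it, it must emit two calls. */
    return lookup_weight(key) + lookup_weight(key);
}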
Declaring a function pure is a hint for the optimizing compiler. GCC probably doesn't care about pure functions when you pass just -O0 (the default). So if f is pure (and defined outside of your translation unit, e.g. in some outside library), GCC might optimize y = f(x) + f(x); into something like
{
    int tmp = f(x); /// tmp is a fresh variable, not appearing elsewhere
    y = tmp + tmp;
}
but if f is not pure (which is the usual case: think of f calling printf or malloc), such an optimization is forbidden.
Standard math functions like sin or sqrt are pure (except for IEEE rounding mode craziness, see http://floating-point-gui.de/ and Fluctuat for more), and they are complex enough to compute to make such optimizations worthwhile.
You might compile your code with gcc -O2 -Wall -fdump-tree-all to guess what is happening inside the compiler. You could add the -fverbose-asm -S flags to get a generated *.s assembler file.
You could also read the Bismon draft report (notably its section §1.4). It might give some intuitions related to your question.
In your particular case, I am guessing that gcc is inlining your calls; and then purity matters less.
If you have time to spend, you might consider writing your own GCC plugin to emit such a warning. You'll spend months writing it! These old slides might still be useful to you, even if the details are obsolete.
At the theoretical level, be aware of Rice's theorem. A consequence of it is that perfect optimization of pure functions is probably impossible.
Be aware of the GCC Resource Center, located in Bombay.

Interleaved usleep() functions being executed together. Is this compiler optimization?

I have a code that does something similar to the following repeatedly over a loop:
$ cat test.c
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>

int main()
{
    char arr[6] = {'h','e','l','l','o','!'};
    for (int x = 0; x < 6; x++) {
        printf("%c", arr[x]);
        usleep(1000000);
        printf("%c", arr[x]);
        usleep(1000000);
    }
    printf("\n");
    return 0;
}
I see that the printf() calls execute one after the other WITHOUT any delay (from usleep), and then the program sleeps for the total usleep time before the next iteration. It seems like all the usleep() calls happen together at the end.
I tried the -O0 flag in gcc because I suspected this was the effect of compiler optimization, but I guess -O0 does not disable whatever optimization category this case falls under (if my guess about the compiler being the reason for this behavior is correct at all).
I am trying to understand the reason for this behavior and how to achieve the desired behavior from my program.
Note: I know it might be possible to replace usleep() with some compute-heavy function call that takes an equivalent amount of time, but that is not the solution I am looking for.
You are using usleep() wrong. Use sleep(1) instead.
From man usleep:
EINVAL usec is greater than or equal to 1000000. (On systems where that is considered an error.)
Once you fix that, you should call fflush(stdout) after each printf() to avoid another surprise with output buffering: stdout is typically line-buffered, so single characters are not displayed until a newline is printed or the buffer is flushed.
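A corrected sketch of the loop, assuming a one-second delay is what was intended:
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char arr[6] = {'h', 'e', 'l', 'l', 'o', '!'};
    for (int x = 0; x < 6; x++) {
        printf("%c", arr[x]);
        fflush(stdout);   /* force the character out despite line buffering */
        sleep(1);         /* usleep(1000000) may fail with EINVAL */
        printf("%c", arr[x]);
        fflush(stdout);
        sleep(1);
    }
    printf("\n");
    return 0;
}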

difference between time() and gettimeofday() and why does one cause seg fault

I'm trying to measure the amount of time for a system call, and I tried using time(0) and gettimeofday() in this program, but whenever I use gettimeofday() it seg faults. I suppose I can just use time(0) but I'd like to know why this is happening. And I know you guys can just look at it and see the problem. Please don't yell at me!
I want to get the time but not save it anywhere.
I've tried every combination of code I can think of but I pasted the simplest version here. I'm new to C and Linux. I look at the .stackdump file but it's pretty meaningless to me.
GetRDTSC is in util.h and it does rdtsc(), as one might expect. Now it's set to 10 iterations but later the loop will run 1000 times, without printf.
#include <stdio.h>
#include <time.h>
#include "util.h"

int main() {
    int i;
    uint64_t cycles[10];
    for (i = 0; i < 10; ++i) {
        // get initial cycles
        uint64_t init = GetRDTSC();
        gettimeofday(); // <== time(0) will work here without a seg fault.
        // get cycles after
        uint64_t after = GetRDTSC();
        // save cycles for each operation in an array
        cycles[i] = after - init;
        printf("%i\n", (int)(cycles[i]));
    }
}
The short version
gettimeofday() requires a pointer to a struct timeval to fill with time data.
So, for example, you'd do something like this:
#include <sys/time.h>
#include <stdio.h>

int main() {
    struct timeval tv;
    gettimeofday(&tv, NULL); // timezone argument should be NULL
    printf("%ld seconds\n", (long)tv.tv_sec);
    return 0;
}
The long version
The real problem is that gcc is automatically linking in the vDSO on your system, which contains a symbol for the gettimeofday syscall. Consider this program (entire file):
int main() {
    gettimeofday();
    return 0;
}
By default, gcc will compile this without warning. If you check the symbols it's linked against, you'll see:
ternus#event-horizon ~> gcc -o foo foo.c
ternus#event-horizon ~> ldd foo
linux-vdso.so.1 => (0x00007ffff33fe000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f56a5255000)
/lib64/ld-linux-x86-64.so.2 (0x00007f56a562b000)
You just happen to be using a function that has a defined symbol, but without the prototype, there's no way to tell how many arguments it's supposed to take.
If you compile it with -Wall, you'll see:
ternus#event-horizon ~> gcc -Wall -o foo foo.c
foo.c: In function ‘main’:
foo.c:2:3: warning: implicit declaration of function ‘gettimeofday’ [-Wimplicit-function-declaration]
Of course, it'll segfault when you try to run it. Interestingly, it'll segfault in kernel space (this is on MacOS):
cternus#astarael ~/foo> gcc -o foo -g foo.c
cternus#astarael ~/foo> gdb foo
GNU gdb 6.3.50-20050815 (Apple version gdb-1822) (Sun Aug 5 03:00:42 UTC 2012)
[etc]
(gdb) run
Starting program: /Users/cternus/foo/foo
Reading symbols for shared libraries +.............................. done
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x0000000000000001
0x00007fff87eeab73 in __commpage_gettimeofday ()
Now consider this program (again, no header files):
typedef struct {
    long tv_sec;
    long tv_usec;
} timeval;

int main() {
    timeval tv;
    gettimeofday(&tv, 0);
    return 0;
}
This will compile and run just fine -- no segfault. You've provided it with the memory location it expects, even though there's still no gettimeofday prototype provided.
More information:
Can anyone understand how gettimeofday works?
Is there a faster equivalent of gettimeofday?
The POSIX gettimeofday specification

Why is clang optimizing out my array even when using -O0 flag?

I am trying to debug the following C Program using GDB:
// Program to generate a user specified number of
// fibonacci numbers using variable length arrays
// Chapter 7 Program 8 2013-07-14
#include <stdio.h>

int main(void)
{
    int i, numFibs;
    printf("How many fibonacci numbers do you want (between 1 and 75)?\n");
    scanf("%i", &numFibs);
    if (numFibs < 1 || numFibs > 75)
    {
        printf("Between 1 and 75 remember?\n");
        return 1;
    }
    unsigned long long int fibonacci[numFibs];
    fibonacci[0] = 0; // by definition
    fibonacci[1] = 1; // by definition
    for (i = 2; i < numFibs; i++)
        fibonacci[i] = fibonacci[i-2] + fibonacci[i-1];
    for (i = 0; i < numFibs; i++)
        printf("%llu ", fibonacci[i]);
    printf("\n");
    return 0;
}
The issue I am having is when trying to compile the code using:
clang -ggdb3 -O0 -Wall -Werror 7_8_FibonacciVarLengthArrays.c
When I run gdb on the resulting a.out file and step through the program execution, any time after the fibonacci[] array is declared, if I type:
info locals
the result says fibonacci <value optimized out> (until after the first iteration of my for loop), after which fibonacci holds the address 0xbffff128 for the rest of the program (but dereferencing that address does not appear to yield any meaningful data).
I am just confused why clang appears to be optimizing out this array when the -O0 flag is used?
I can use gcc to compile this code and the value displays as expected when using GDB....
Any thoughts?
Thank you.
You don't mention which version of clang you are using. I tried it with both 3.2 and a recent SVN install (3.4).
The code generated by the two versions looks pretty similar to me, but the debugging information is different. Clang 3.2 (which comes with a default Ubuntu 13.04 install) produces an error when I try to examine fibonacci in gdb:
fibonacci = <error reading variable fibonacci (DWARF-2 expression error: DW_OP_reg operations must be used either alone or in conjunction with DW_OP_piece or DW_OP_bit_piece.)>
In the code compiled with clang 3.4, it all works fine. In neither case is the array "optimized out"; it's clearly allocated on the stack.
So I suspect the oddity that you're seeing has more to do with the emission of debugging information than with the actual code.
gdb does not yet support debugging stack allocated variable-length arrays. See https://sourceware.org/gdb/wiki/VariableLengthArray
Use a compile time constant or malloc to allocate fibonacci so that it will be visible to gdb.
See also GDB reports "no symbol in current context" upon array initialization
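A sketch of the malloc-based suggestion above, so gdb can show the array (error handling kept minimal):
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int i, numFibs;
    printf("How many fibonacci numbers do you want (between 1 and 75)?\n");
    if (scanf("%i", &numFibs) != 1 || numFibs < 1 || numFibs > 75) {
        printf("Between 1 and 75 remember?\n");
        return 1;
    }
    /* heap allocation instead of a VLA, so gdb can inspect it */
    unsigned long long *fibonacci = malloc(numFibs * sizeof *fibonacci);
    if (fibonacci == NULL)
        return 1;
    fibonacci[0] = 0;
    if (numFibs > 1)
        fibonacci[1] = 1;
    for (i = 2; i < numFibs; i++)
        fibonacci[i] = fibonacci[i-2] + fibonacci[i-1];
    for (i = 0; i < numFibs; i++)
        printf("%llu ", fibonacci[i]);
    printf("\n");
    free(fibonacci);
    return 0;
}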
clang is not "optimizing out" the array at all! The array is declared as a variable-length array on the stack, so it has to be explicitly allocated (using techniques similar to those used by alloca()) when its declaration is reached. The starting address of the array is unknown until that process is complete.

glibc - force function call (no inline expansion)

I have a question regarding glibc function calls. Is there a flag to tell gcc not to inline a certain glibc function, e.g. memcpy?
I've tried -fno-builtin-memcpy and other flags, but they didn't work. The goal is that the actual glibc memcpy function is called and no inlined code is emitted (since the glibc version at compile time differs from the one at runtime). It's for testing purposes only; normally I won't do that.
Any solutions?
UPDATE:
Just to make it clearer: in the past memcpy worked even with overlapping areas. This has since changed, and I can see the change when compiling against different glibc versions. So now I want to test whether my old code (which uses memcpy where memmove should have been used) works correctly or not on a system with a newer glibc (e.g. 2.14). But to do that, I have to make sure that the new memcpy is actually called and not inlined code.
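The kind of test I have in mind looks roughly like this (just a sketch; the overlapping memcpy is deliberately undefined behaviour, which is exactly what I want to observe against different glibc versions):
#include <stdio.h>
#include <string.h>

int main(void)
{
    char a[16] = "abcdefgh";
    char b[16] = "abcdefgh";

    memcpy(a + 2, a, 6);    /* overlapping regions: undefined behaviour */
    memmove(b + 2, b, 6);   /* overlapping regions: well-defined */

    printf("memcpy : %s\n", a);
    printf("memmove: %s\n", b);
    return 0;
}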
Best regards
This may not be exactly what you're looking for, but it seems to force gcc to generate an indirect call to memcpy():
#include <stdio.h>
#include <string.h>
#include <time.h>

// void *memcpy(void *dest, const void *src, size_t n)

unsigned int x = 0xdeadbeef;
unsigned int y;

int main(void) {
    void *(*memcpy_ptr)(void *, const void *, size_t) = memcpy;
    if (time(NULL) == 1) {
        memcpy_ptr = NULL;
    }
    memcpy_ptr(&y, &x, sizeof y);
    printf("y = 0x%x\n", y);
    return 0;
}
The generated assembly (gcc, Ubuntu, x86) includes a call *%edx instruction.
Without the if (time(NULL) == 1) test (which should never succeed, but the compiler doesn't know that), gcc -O3 is clever enough to recognize that the indirect call always calls memcpy(), which can then be replaced by a movl instruction.
Note that the compiler could recognize that if memcpy_ptr == NULL then the behavior is undefined, and again replace the indirect call with a direct call, and then with a movl instruction. gcc 4.5.2 with -O3 doesn't appear to be that clever. If a later version of gcc is, you could replace the memcpy_ptr = NULL with an assignment of some actual function that behaves differently than memcpy().
In theory:
gcc -fno-inline -fno-builtin-inline ...
But then you said -fno-builtin-memcpy didn't stop the compiler from inlining it, so there's no obvious reason why this should work any better.
#undef memcpy
#define memcpy your_memcpy_replacement
Put this somewhere near the top, but after the #include lines, obviously.
And mark your_memcpy_replacement with __attribute__((noinline)).
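A sketch of that approach (your_memcpy_replacement here is just a placeholder name; whether the inner memcpy call really ends up as an out-of-line glibc call may still depend on flags such as -fno-builtin-memcpy):
#include <string.h>

/* Define the wrapper before the macro so it still calls the real memcpy. */
__attribute__((noinline))
void *your_memcpy_replacement(void *dest, const void *src, size_t n)
{
    return memcpy(dest, src, n);
}

#undef memcpy
#define memcpy your_memcpy_replacement

/* From here on, every memcpy(...) in this translation unit goes through
 * the non-inlined wrapper. */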
