I know in some languages the following:
a += b
is more efficient than:
a = a + b
because it removes the need for creating a temporary variable. Is this the case in C? Is it more efficient to use += (and therefore also -=, *=, etc.)?
So here's a definitive answer...
$ cat junk1.c
#include <stdio.h>
int main()
{
long a, s = 0;
for (a = 0; a < 1000000000; a++)
{
s = s + a * a;
}
printf("Final sum: %ld\n", s);
}
michael@isolde:~/junk$ cat junk2.c
#include <stdio.h>
int main()
{
long a, s = 0;
for (a = 0; a < 1000000000; a++)
{
s += a * a;
}
printf("Final sum: %ld\n", s);
}
michael@isolde:~/junk$ for a in *.c ; do gcc -O3 -o ${a%.c} $a ; done
michael@isolde:~/junk$ time ./junk1
Final sum: 3338615082255021824
real 0m2.188s
user 0m2.120s
sys 0m0.000s
michael@isolde:~/junk$ time ./junk2
Final sum: 3338615082255021824
real 0m2.179s
user 0m2.120s
sys 0m0.000s
...for my computer and my compiler running on my operating system. Your results may or may not vary. On my system, however, the time is identical: user time 2.120s.
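One caveat about the benchmark above: the true sum of squares below 1000000000 is far larger than a 64-bit long can hold, so the printed value is the product of signed overflow, which C leaves undefined. A minimal sketch of the same comparison using unsigned arithmetic, where wraparound is well defined, might look like this (the function names are made up for illustration):

```c
#include <stdint.h>

/* Hypothetical helper: sum of squares below n using "s = s + ...". */
uint64_t sum_squares_plain(uint64_t n)
{
    uint64_t s = 0;
    for (uint64_t a = 0; a < n; a++) {
        s = s + a * a;   /* unsigned overflow wraps modulo 2^64 */
    }
    return s;
}

/* Same loop using the compound assignment "s += ...". */
uint64_t sum_squares_compound(uint64_t n)
{
    uint64_t s = 0;
    for (uint64_t a = 0; a < n; a++) {
        s += a * a;
    }
    return s;
}
```

Both functions return the same value for any n; only the timing, if anything, could differ.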
Now just to show you how impressive modern compilers can be, you'll note that I used the expression a * a in the assignment. This is because of this little problem:
$ cat junk.c
#include <stdio.h>
int main()
{
long a, s = 0;
for (a = 0; a < 1000000000; a++)
{
s = s + a;
}
printf("Final sum: %ld\n", s);
}
michael@isolde:~/junk$ gcc -O3 -S junk.c
michael@isolde:~/junk$ cat junk.s
.file "junk.c"
.section .rodata.str1.1,"aMS",@progbits,1
.LC0:
.string "Final sum: %ld\n"
.text
.p2align 4,,15
.globl main
.type main, @function
main:
.LFB22:
.cfi_startproc
movabsq $499999999500000000, %rdx
movl $.LC0, %esi
movl $1, %edi
xorl %eax, %eax
jmp __printf_chk
.cfi_endproc
.LFE22:
.size main, .-main
.ident "GCC: (Ubuntu 4.4.3-4ubuntu5) 4.4.3"
.section .note.GNU-stack,"",@progbits
The compiler figured out my loop, computed the cumulative sum at compile time, and embedded it as a constant, which it then printed, skipping any kind of looping construct entirely. In the face of optimizers that clever, do you really think you're going to find any meaningful edge in distinguishing between s = s + a and s += a?!
This is a compiler specific question really, but I expect all modern compilers would give the same result. Using Visual Studio 2008:
int main() {
int a = 10;
int b = 30;
a = a + b;
int c = 10;
int d = 50;
c += d;
}
The line a = a + b has disassembly
0014139C mov eax,dword ptr [a]
0014139F add eax,dword ptr [b]
001413A2 mov dword ptr [a],eax
The line c += d has disassembly
001413B3 mov eax,dword ptr [c]
001413B6 add eax,dword ptr [d]
001413B9 mov dword ptr [c],eax
Which is the same. They are compiled into the same code.
It depends on what a is.
a += b in C is by definition equivalent to a = a + b, except that from the abstract point of view a is evaluated only once in the former variant. If a is a "pure" value, i.e. if evaluating it once vs. evaluating it many times makes no difference to the program's behavior, then a += b is strictly equivalent to a = a + b in all regards, including efficiency.
In other words, in situations when you actually have a free choice between a += b and a = a + b (meaning that you know that they do the same thing) they will generally have exactly the same efficiency. Some compilers might have difficulties when a stands for a function call (for one example; probably not what you meant), but when a is a non-volatile variable the machine code generated for both expressions will be the same.
For another example, if a is a volatile variable, then a += b and a = a + b have different behavior and, therefore, different efficiency. However, since they are not equivalent, your question simply does not apply in such cases.
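To see what "evaluated only once" means in practice, consider a case where evaluating a has a side effect. The following is a minimal sketch with hypothetical names (next_slot, bump_compound, bump_expanded), not code from the question:

```c
/* Hypothetical example: an lvalue whose evaluation has a side effect. */
static int slots[4];
static int calls;          /* counts how often next_slot() runs */

int *next_slot(void)
{
    calls++;
    return &slots[0];
}

/* Compound form: next_slot() is evaluated once. */
int bump_compound(void)
{
    calls = 0;
    *next_slot() += 1;
    return calls;
}

/* Expanded form: next_slot() appears, and is evaluated, twice. */
int bump_expanded(void)
{
    calls = 0;
    *next_slot() = *next_slot() + 1;
    return calls;
}
```

Here the two forms are not interchangeable: the compound form calls the function once, the expanded form twice.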
In the simple cases shown in the question, there is no significant difference. Where the assignment operator scores is when you have an expression such as:
s[i]->m[j1].k = s[i]->m[jl].k + 23; // Was that a typo?
vs:
s[i]->m[j1].k += 23;
Two benefits - and I'm not counting less typing. There's no question about whether there was a typo when the first and second expressions differ; and the compiler doesn't evaluate the complex expression twice. The chances are that won't make much difference these days (optimizing compilers are a lot better than they used to be), but you could have still more complex expressions (evaluating a function defined in another translation unit, for example, as part of the subscripting) where the compiler may not be able to avoid evaluating the expression twice:
s[i]->m[somefunc(j1)].k = s[i]->m[somefunc(j1)].k + 23;
s[i]->m[somefunc(j1)].k += 23;
Also, you can write (if you're brave):
s[i++]->m[j1++].k += 23;
But you cannot write:
s[i++]->m[j1++].k = s[i]->m[j1].k + 23;
s[i]->m[j1].k = s[i++]->m[j1++].k + 23;
(or any other permutation) because i and j1 are both read and modified without an intervening sequence point, so the behavior is undefined.
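If the combined form feels too brave, one way to get the same effect with every step explicitly sequenced is to take a pointer to the target first. This is only a sketch; struct item and add_and_advance are hypothetical names, and the data layout is simplified compared with the expressions above:

```c
struct item { int k; };

/* Adds 23 to items[*i].k and then advances *i, one step per statement. */
void add_and_advance(struct item *items, int *i)
{
    int *slot = &items[*i].k;  /* evaluate the target exactly once */
    *slot = *slot + 23;        /* same effect as: items[*i].k += 23 */
    (*i)++;                    /* the increment is its own statement */
}
```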
a += b
is more efficient than
a = a + b
because the former takes you 6 keystrokes and the latter takes you 9 keystrokes.
With modern hardware, even if the compiler is stupid and uses slower code for one than the other, the total time saved over the lifetime of the program may possibly be less than the time it takes you to type the three extra key strokes.
However, as others have said, the compiler almost certainly produces exactly the same code so the former is more efficient.
Even if you factor in readability, most C programmers probably mentally parse the former more quickly than the latter because it is such a common pattern.
In virtually all cases, the two produce identical results.
Other than with a truly ancient or incompetently written compiler there should be no difference, as long as a and b are normal variables so the two produce equivalent results.
If you were dealing with C++ rather than C, operator overloading would allow there to be more substantial differences though.
I have seen various answers here that describe strange behavior of the pow function in C.
But I have something different to ask here.
In the code below I have initialized int x = pow(10,2) and int y = pow(10,n) (with int n = 2).
In the first case, printing the result shows 100; in the other case it comes out as 99.
I know that pow returns a double and that it gets truncated when stored in an int, but I want to ask why the outputs are different.
CODE1
#include<stdio.h>
#include<math.h>
int main()
{
int n = 2;
int x;
int y;
x = pow(10,2); //Printing Gives Output 100
y = pow(10,n); //Printing Gives Output 99
printf("%d %d" , x , y);
}
Output : 100 99
Why are the two outputs different?
My gcc version is 4.9.2
Update :
Code 2
int main()
{
int n = 2;
int x;
int y;
x = pow(10,2); //Printing Gives Output 100
y = pow(10,n); //Printing Gives Output 99
double k = pow(10,2);
double l = pow(10,n);
printf("%d %d\n" , x , y);
printf("%f %f\n" , k , l);
}
Output : 100 99
100.000000 100.000000
Update 2 Assembly Instructions FOR CODE1
Generated Assembly Instructions GCC 4.9.2 using gcc -S -masm=intel :
.LC1:
.ascii "%d %d\0"
.text
.globl _main
.def _main; .scl 2; .type 32; .endef
_main:
push ebp
mov ebp, esp
and esp, -16
sub esp, 48
call ___main
mov DWORD PTR [esp+44], 2
mov DWORD PTR [esp+40], 100 //Concerned Line
fild DWORD PTR [esp+44]
fstp QWORD PTR [esp+8]
fld QWORD PTR LC0
fstp QWORD PTR [esp]
call _pow //Concerned Line
fnstcw WORD PTR [esp+30]
movzx eax, WORD PTR [esp+30]
mov ah, 12
mov WORD PTR [esp+28], ax
fldcw WORD PTR [esp+28]
fistp DWORD PTR [esp+36]
fldcw WORD PTR [esp+30]
mov eax, DWORD PTR [esp+36]
mov DWORD PTR [esp+8], eax
mov eax, DWORD PTR [esp+40]
mov DWORD PTR [esp+4], eax
mov DWORD PTR [esp], OFFSET FLAT:LC1
call _printf
leave
ret
.section .rdata,"dr"
.align 8
LC0:
.long 0
.long 1076101120
.ident "GCC: (tdm-1) 4.9.2"
.def _pow; .scl 2; .type 32; .endef
.def _printf; .scl 2; .type 32; .endef
I know that pow returns double and it gets truncated on storing in int, but I want to ask why the output comes to be different.
You must first, if you haven't already, divest yourself of the idea that floating-point numbers are in any way sensible or predictable. double only approximates real numbers and almost anything you do with a double is likely to be an approximation to the actual result.
That said, as you have realized, pow(10, n) resulted in a value like 99.99999999999997, which is an approximation accurate to 15 significant figures. And then you told it to truncate to the largest integer less than that, so it threw away most of those.
(Aside: there is rarely a good reason to convert a double to an int. Usually you should either format it for display with something like printf("%.0f", x), which does rounding correctly, or use the floor function, which can handle floating-point numbers that may be out of the range of an int. If neither of those suits your purpose, like in currency or date calculations, possibly you should not be using floating point numbers at all.)
There are two weird things going on here. First, why is pow(10, n) inaccurate? 10, 2, and 100 are all precisely representable as double. The best answer I can offer is that the C standard library you are using has a bug. (The compiler and the standard library, which I assume are gcc and glibc, are developed on different release schedules and by different teams. If pow is returning inaccurate results, that is probably a bug in glibc, not gcc.)
In the comments on your question, amdn found a glibc bug to do with FP rounding that might be related and another Q&A that goes into more detail about why this happens and how it's not a violation of the C standard. chux's answer also addresses this. (C doesn't require implementation of IEEE 754, but even if it did, pow isn't required to use correct rounding.) I will still call this a glibc bug, because it's an undesirable property.
(It's also conceivable, though unlikely, that your processor's FPU is wrong.)
Second, why is pow(10, n) different from pow(10, 2)? This one is far easier. gcc optimizes away function calls for which the result can be calculated at compile time, so pow(10, 2) is almost certainly being optimized to 100.0. If you look at the generated assembly code, you will find only one call to pow.
The GCC manual, section 6.59 describes which standard library functions may be treated in this way (follow the link for the full list):
The remaining functions are provided for optimization purposes.
With the exception of built-ins that have library equivalents such as the standard C library functions discussed below, or that expand to library calls, GCC built-in functions are always expanded inline and thus do not have corresponding entry points and their address cannot be obtained. Attempting to use them in an expression other than a function call results in a compile-time error.
[...]
The ISO C90 functions abort, abs, acos, asin, atan2, atan, calloc, ceil, cosh, cos, exit, exp, fabs, floor, fmod, fprintf, fputs, frexp, fscanf, isalnum, isalpha, iscntrl, isdigit, isgraph, islower, isprint, ispunct, isspace, isupper, isxdigit, tolower, toupper, labs, ldexp, log10, log, malloc, memchr, memcmp, memcpy, memset, modf, pow, printf, putchar, puts, scanf, sinh, sin, snprintf, sprintf, sqrt, sscanf, strcat, strchr, strcmp, strcpy, strcspn, strlen, strncat, strncmp, strncpy, strpbrk, strrchr, strspn, strstr, tanh, tan, vfprintf, vprintf and vsprintf are all recognized as built-in functions unless -fno-builtin is specified (or -fno-builtin-function is specified for an individual function).
So it would seem you can disable this behavior with -fno-builtin-pow.
Why are the outputs different? (in the updated, appended code)
We do not know that the values are that different.
When comparing the textual output of int/double, be sure to print the double with sufficient precision to see whether it is 100.000000 or just near 100.000000, or print it in hex to remove all doubt.
printf("%d %d\n" , x , y);
// printf("%f %f\n" , k , l);
// Is it the FP number just less than 100?
printf("%.17e %.17e\n" , k , l); // maybe 9.99999999999999858e+01
printf("%a %a\n" , k , l); // maybe 0x1.8ffffffffffff0000p+6
Why are the outputs different? (in the original code)
C does not specify the accuracy of most <math.h> functions. The following are all compliant results.
// Higher quality functions return 100.0
pow(10,2) --> 100.0
// Lower quality and/or faster one may return nearby results
pow(10,2) --> 100.0000000000000142...
pow(10,2) --> 99.9999999999999857...
Assigning a floating point (FP) number to an int simply drops the fraction, regardless of how close the fraction is to 1.0.
When converting FP to an integer, better to control the conversion and round to cope with minor computational differences.
// long int lround(double x);
long i = lround(pow(10.0,2.0));
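When both the base and the exponent are integers, another option (not what pow does, just an alternative sketch) is to avoid floating point entirely with repeated squaring. The helper name ipow is hypothetical, and the sketch assumes the result fits in a long; it does not check for overflow:

```c
/* Integer power by repeated squaring. Assumes the result fits in a
   long; there is no overflow check in this sketch. */
long ipow(long base, unsigned exp)
{
    long result = 1;
    while (exp > 0) {
        if (exp & 1u) {
            result *= base;
        }
        exp >>= 1;
        if (exp > 0) {
            base *= base;   /* square only if another bit remains */
        }
    }
    return result;
}
```

Because no double is involved, there is nothing to truncate when the result is stored in an integer.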
You're not the first to find this. Here's a discussion from 2013:
pow() cast to integer, unexpected result
I'm speculating that the assembly code produced by the tcc guys is causing the second value to be rounded down after calculating a result that is REALLY close to 100.
Like mikijov said in that historic post, looks like the bug has been fixed.
As others have mentioned, Code 2 returns 99 due to floating point truncation. The reason why Code 1 returns a different and correct answer is because of a libc optimization.
When the power is a small positive integer, it is more efficient to perform the operation as repeated multiplication. The simpler path removes roundoff. Since this is inlined you don't see function calls being made.
You've fooled it into thinking that the inputs are real and so it gives an approximate answer, which happens to be slightly under 100, e.g. 99.999999 that is then truncated to 99.
Example:
a : ++i;
b : i++;
c : i += 1;
d : i = i + 1;
Assuming all of a, b, c and d are called at exactly the same time, which one of them will be performed first?
Using gcc 5.2 to compile this program:
#include<stdio.h>
int main()
{
int i = 0;
++i;
i++;
i += 1;
i = i + 1;
return 0;
}
It gives this ASM:
main:
push rbp
mov rbp, rsp
mov DWORD PTR [rbp-4], 0
add DWORD PTR [rbp-4], 1 #++i
add DWORD PTR [rbp-4], 1 #i++
add DWORD PTR [rbp-4], 1 #i += 1
add DWORD PTR [rbp-4], 1 #i = i + 1
mov eax, 0
pop rbp
ret
Which means that with gcc 5.2 they all execute at exactly the same speed.
It seems to be the same for versions from 4.4.7 to 5.2.
In this particular example all four expressions have the exact same externally observable result so a competent compiler should generate the exact same code for them.
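The four forms only stop being interchangeable when the value of the expression itself is used. A small sketch (the function names are illustrative only):

```c
/* As standalone statements, all four forms leave i one larger. */
int after_all_four(int i)
{
    ++i;
    i++;
    i += 1;
    i = i + 1;
    return i;            /* i has grown by 4 */
}

/* The value of the expression differs between post- and pre-increment. */
int value_of_post(int i) { return i++; }  /* yields the old value */
int value_of_pre(int i)  { return ++i; }  /* yields the new value */
```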
The compiler doesn't slavishly read the code and generate a few instructions for each statement, the compiler reasons about what the result of the code should be according to the standard and generates the code needed for a whole program to behave as required. Therefore asking performance questions about single statements is almost always meaningless. Let me show an example:
void foo(unsigned int a, unsigned int b) { unsigned int i = a * b; }
void bar(unsigned int a, unsigned int b) { unsigned int i = a + b; }
Which one is faster? Function foo or bar? Many would say "of course multiplication is slower", but most likely the answer is: both are equally fast because a very simple dead store optimization will see that nothing uses i, so there's no need to compute it, so the compiler can optimize the functions down to nothing. Let's try it:
$ cat > foo.c
void foo(unsigned int a, unsigned int b) { unsigned int i = a * b; }
void bar(unsigned int a, unsigned int b) { unsigned int i = a + b; }
$ cc -S -fomit-frame-pointer -O2 foo.c
$ cat foo.s
[... I edited out irrelevant spam to make this more readable ...]
_foo: ## @foo
retq
_bar: ## @bar
retq
The only instruction in both functions is retq which just returns from the function.
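To compare the two operations at all, the result has to be observable, for example by returning it. A sketch of that variant follows (mul_used and add_used are made-up names); whether any timing difference is then measurable still depends on the compiler, the call site and the hardware:

```c
/* Returning the value makes the computation observable, so the
   compiler can no longer discard it as a dead store. */
unsigned int mul_used(unsigned int a, unsigned int b) { return a * b; }
unsigned int add_used(unsigned int a, unsigned int b) { return a + b; }
```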
Modern compilers are smart enough to optimize all of the four cases to improve the performance.
Note that in the last expression, i = i + 1, the operand i is written twice in the source, but for a plain variable that makes no difference to the generated code.
Operator precedence does not decide this either: precedence governs how an expression is parsed, not how quickly a statement runs. The increment operators in a and b are unary while c and d use binary operators, and c is just shorthand for d, so those two take the same time. Pre-increment and post-increment differ only in the value the expression yields; when that value is discarded, as it is here, neither is faster than the other.
I hope this answer helps.
In the Linux kernel code there is a macro used to test a bit (Linux version 2.6.2):
#define test_bit(nr, addr) \
(__builtin_constant_p((nr)) \
? constant_test_bit((nr), (addr)) \
: variable_test_bit((nr), (addr)))
where constant_test_bit and variable_test_bit are defined as:
static inline int constant_test_bit(int nr, const volatile unsigned long *addr )
{
return ((1UL << (nr & 31)) & (addr[nr >> 5])) != 0;
}
static __inline__ int variable_test_bit(int nr, const volatile unsigned long *addr)
{
int oldbit;
__asm__ __volatile__(
"btl %2,%1\n\tsbbl %0,%0"
:"=r" (oldbit)
:"m" (ADDR),"Ir" (nr));
return oldbit;
}
I understand that __builtin_constant_p is used to detect whether a variable is compile time constant or unknown. My question is: Is there any performance difference between these two functions when the argument is a compile time constant or not? Why use the C version when it is and use the assembly version when it's not?
UPDATE: The following main function is used to test the performance:
constant, call constant_test_bit:
int main(void) {
unsigned long i, j = 21;
unsigned long cnt = 0;
srand(111);
//j = rand() % 31;
for (i = 1; i < (1 << 30); i++) {
j = (j + 1) % 28;
if (constant_test_bit(j, &i))
cnt++;
}
if (__builtin_constant_p(j))
printf("j is a compile time constant\n");
return 0;
}
This correctly outputs the sentence j is a...
For the other situations just uncomment the line which assigns a "random" number to j and change the function name accordingly. When that line is uncommented the output will be empty, and this is expected.
I use gcc test.c -O1 to compile, and here is the result:
constant, constant_test_bit:
$ time ./a.out
j is compile time constant
real 0m0.454s
user 0m0.450s
sys 0m0.000s
constant, variable_test_bit (the time ./a.out line is omitted here and below):
j is compile time constant
real 0m0.885s
user 0m0.883s
sys 0m0.000s
variable, constant_test_bit:
real 0m0.485s
user 0m0.477s
sys 0m0.007s
variable, variable_test_bit:
real 0m3.471s
user 0m3.467s
sys 0m0.000s
I ran each version several times, and the results above are typical. It seems the constant_test_bit function is always faster than the variable_test_bit function, no matter whether the parameter is a compile time constant or not... For the last two results (when j is not constant) the variable version is even dramatically slower than the constant one.
Am I missing something here?
Using godbolt we can do an experiment with constant_test_bit. The following two test functions are compiled with gcc using the -O3 flag:
// Non constant expression test case
int func1(unsigned long i, unsigned long j)
{
int x = constant_test_bit(j, &i) ;
return x ;
}
// constant expression test case
int func2(unsigned long i)
{
int x = constant_test_bit(21, &i) ;
return x ;
}
We see the optimizer is able to optimize the constant expression case to the following:
shrq $21, %rax
andl $1, %eax
while the non-constant expression case ends up as follows:
sarl $5, %eax
andl $31, %ecx
cltq
leaq -8(%rsp,%rax,8), %rax
movq (%rax), %rax
shrq %cl, %rax
andl $1, %eax
So the optimizer is able to produce much better code for the constant expression case. We can also see that the non-constant case for constant_test_bit is pretty bad compared to the hand-rolled assembly in variable_test_bit, and the implementer must believe the constant expression case for constant_test_bit ends up being better than:
btl %edi,8(%rsp)
sbbl %esi,%esi
for most cases.
As to why your test case seems to show a different conclusion: your test case is flawed. I have not been able to suss out all the issues, but if we look at this case using constant_test_bit with a non-constant expression, we can see the optimizer is able to move all the work outside the loop and reduce the work related to constant_test_bit inside the loop to:
movq (%rax), %rdi
even with an older gcc version, but this case may not be relevant to the cases test_bit is being used in. There may be more specific cases where this kind of optimization won't be possible.
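For reference, the dispatch pattern from the question can be sketched in portable C apart from the __builtin_constant_p call itself, which is a GCC extension (Clang also provides it). The names bit_is_set_const, bit_is_set_var, BIT_IS_SET and BITS_PER_WORD are hypothetical, both branches here are plain C rather than the kernel's inline assembly, and the word size is computed instead of hard-coding 32 bits:

```c
#include <limits.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Path intended for a constant bit number: shift and index can be folded. */
static inline int bit_is_set_const(unsigned long nr, const unsigned long *addr)
{
    return ((1UL << (nr % BITS_PER_WORD)) & addr[nr / BITS_PER_WORD]) != 0;
}

/* Path intended for a run-time bit number (the kernel uses btl/sbbl here). */
static inline int bit_is_set_var(unsigned long nr, const unsigned long *addr)
{
    unsigned long word = addr[nr / BITS_PER_WORD];
    return (int)((word >> (nr % BITS_PER_WORD)) & 1UL);
}

/* Chooses a path at compile time based on whether nr is a constant. */
#define BIT_IS_SET(nr, addr)                 \
    (__builtin_constant_p(nr)                \
         ? bit_is_set_const((nr), (addr))    \
         : bit_is_set_var((nr), (addr)))
```

Both paths return the same answer; the split only exists so that each case can get the code sequence that suits it best.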
In gcc-strict-aliasing-and-casting-through-a-union I asked whether anyone had encountered problems with union punning through pointers. So far, the answer seems to be No.
This question is broader: Do you have any horror stories about gcc and strict-aliasing?
Background: Quoting from AndreyT's answer in c99-strict-aliasing-rules-in-c-gcc:
"Strict aliasing rules are rooted in parts of the standard that were present in C and C++ since the beginning of [standardized] times. The clause that prohibits accessing object of one type through a lvalue of another type is present in C89/90 (6.3) as well as in C++98 (3.10/15). ... It is just that not all compilers wanted (or dared) to enforce it or rely on it."
Well, gcc is now daring to do so, with its -fstrict-aliasing switch. And this has caused some problems. See, for example, the excellent article http://davmac.wordpress.com/2009/10/ about a Mysql bug, and the equally excellent discussion in http://cellperformance.beyond3d.com/articles/2006/06/understanding-strict-aliasing.html.
Some other less-relevant links:
performance-impact-of-fno-strict-aliasing
strict-aliasing
when-is-char-safe-for-strict-pointer-aliasing
how-to-detect-strict-aliasing-at-compile-time
So to repeat, do you have a horror story of your own? Problems not indicated by -Wstrict-aliasing would, of course, be preferred. And other C compilers are also welcome.
Added June 2nd: The first link in Michael Burr's answer, which does indeed qualify as a horror story, is perhaps a bit dated (from 2003). I did a quick test, but the problem has apparently gone away.
Source:
#include <string.h>
struct iw_event { /* dummy! */
int len;
};
char *iwe_stream_add_event(
char *stream, /* Stream of events */
char *ends, /* End of stream */
struct iw_event *iwe, /* Payload */
int event_len) /* Real size of payload */
{
/* Check if it's possible */
if ((stream + event_len) < ends) {
iwe->len = event_len;
memcpy(stream, (char *) iwe, event_len);
stream += event_len;
}
return stream;
}
The specific complaint is:
Some users have complained that when the [above] code is compiled without the -fno-strict-aliasing, the order of the write and memcpy is inverted (which means a bogus len is mem-copied into the stream).
Compiled code, using gcc 4.3.4 on CYGWIN with -O3 (please correct me if I am wrong--my assembler is a bit rusty!):
_iwe_stream_add_event:
pushl %ebp
movl %esp, %ebp
pushl %ebx
subl $20, %esp
movl 8(%ebp), %eax # stream --> %eax
movl 20(%ebp), %edx # event_len --> %edx
leal (%eax,%edx), %ebx # sum --> %ebx
cmpl 12(%ebp), %ebx # compare sum with ends
jae L2
movl 16(%ebp), %ecx # iwe --> %ecx
movl %edx, (%ecx) # event_len --> iwe->len (!!)
movl %edx, 8(%esp) # event_len --> stack
movl %ecx, 4(%esp) # iwe --> stack
movl %eax, (%esp) # stream --> stack
call _memcpy
movl %ebx, %eax # sum --> retval
L2:
addl $20, %esp
popl %ebx
leave
ret
And for the second link in Michael's answer,
*(unsigned short *)&a = 4;
gcc will usually (always?) give a warning. But I believe a valid solution to this (for gcc) is to use:
#define CAST(type, x) (((union {typeof(x) src; type dst;}*)&(x))->dst)
// ...
CAST(unsigned short, a) = 4;
I've asked SO whether this is OK in gcc-strict-aliasing-and-casting-through-a-union, but so far nobody disagrees.
No horror story of my own, but here are some quotes from Linus Torvalds (sorry if these are already in one of the linked references in the question):
http://lkml.org/lkml/2003/2/26/158:
Date Wed, 26 Feb 2003 09:22:15 -0800
Subject Re: Invalid compilation without -fno-strict-aliasing
From Jean Tourrilhes <>
On Wed, Feb 26, 2003 at 04:38:10PM +0100, Horst von Brand wrote:
Jean Tourrilhes <> said:
It looks like a compiler bug to me...
Some users have complained that when the following code is
compiled without the -fno-strict-aliasing, the order of the write and
memcpy is inverted (which mean a bogus len is mem-copied into the
stream).
Code (from linux/include/net/iw_handler.h) :
static inline char *
iwe_stream_add_event(char * stream, /* Stream of events */
char * ends, /* End of stream */
struct iw_event *iwe, /* Payload */
int event_len) /* Real size of payload */
{
/* Check if it's possible */
if((stream + event_len) < ends) {
iwe->len = event_len;
memcpy(stream, (char *) iwe, event_len);
stream += event_len;
}
return stream;
}
IMHO, the compiler should have enough context to know that the
reordering is dangerous. Any suggestion to make this simple code more
bullet proof is welcomed.
The compiler is free to assume char *stream and struct iw_event *iwe point
to separate areas of memory, due to strict aliasing.
Which is true and which is not the problem I'm complaining about.
(Note with hindsight: this code is fine, but Linux's implementation of memcpy was a macro that cast to long * to copy in larger chunks. With a correctly-defined memcpy, gcc -fstrict-aliasing isn't allowed to break this code. But it means you need inline asm to define a kernel memcpy if your compiler doesn't know how to turn a byte-copy loop into efficient asm, which was the case for gcc before gcc7.)
And Linus Torvalds's comment on the above:
Jean Tourrilhes wrote:
>
It looks like a compiler bug to me...
Why do you think the kernel uses "-fno-strict-aliasing"?
The gcc people are more interested in trying to find out what can be
allowed by the c99 specs than about making things actually work. The
aliasing code in particular is not even worth enabling, it's just not
possible to sanely tell gcc when some things can alias.
Some users have complained that when the following code is
compiled without the -fno-strict-aliasing, the order of the write and
memcpy is inverted (which mean a bogus len is mem-copied into the
stream).
The "problem" is that we inline the memcpy(), at which point gcc won't
care about the fact that it can alias, so they'll just re-order
everything and claim it's out own fault. Even though there is no sane
way for us to even tell gcc about it.
I tried to get a sane way a few years ago, and the gcc developers really
didn't care about the real world in this area. I'd be surprised if that
had changed, judging by the replies I have already seen.
I'm not going to bother to fight it.
Linus
http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg01647.html:
Type-based aliasing is stupid. It's so incredibly stupid that it's not even funny. It's broken. And gcc took the broken notion, and made it more so by making it a "by-the-letter-of-the-law" thing that makes no sense.
...
I know for a fact that gcc would re-order write accesses that were clearly to (statically) the same address. Gcc would suddenly think that
unsigned long a;
a = 5;
*(unsigned short *)&a = 4;
could be re-ordered to set it to 4 first (because clearly they don't alias - by reading the standard), and then because now the assignment of 'a=5' was later, the assignment of 4 could be elided entirely! And if somebody complains that the compiler is insane, the compiler people would say "nyaah, nyaah, the standards people said we can do this", with absolutely no introspection to ask whether it made any SENSE.
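For comparison, one way to express that kind of partial overwrite so that it stays well defined with or without -fstrict-aliasing is to copy bytes with memcpy, which is allowed to access any object. This is a swapped-in technique, not what the quoted code does, and the helper names are hypothetical:

```c
#include <string.h>

/* Overwrite the first sizeof(unsigned short) bytes of *dst with v.
   Which bits of the long those bytes hold depends on endianness. */
void store_u16_prefix(unsigned long *dst, unsigned short v)
{
    memcpy(dst, &v, sizeof v);
}

/* Read the same leading bytes back as an unsigned short. */
unsigned short load_u16_prefix(const unsigned long *src)
{
    unsigned short v;
    memcpy(&v, src, sizeof v);
    return v;
}
```

With this form the compiler has to treat the copy as a possible write to the long, so it cannot reorder or drop it relative to a = 5.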
SWIG generates code that depends on strict aliasing being off, which can cause all sorts of problems.
SWIGEXPORT jlong JNICALL Java_com_mylibJNI_make_1mystruct_1_1SWIG_12(
JNIEnv *jenv, jclass jcls, jint jarg1, jint jarg2) {
jlong jresult = 0 ;
int arg1 ;
int arg2 ;
my_struct_t *result = 0 ;
(void)jenv;
(void)jcls;
arg1 = (int)jarg1;
arg2 = (int)jarg2;
result = (my_struct_t *)make_my_struct(arg1,arg2);
*(my_struct_t **)&jresult = result; /* <<<< horror*/
return jresult;
}
gcc, aliasing, and 2-D variable-length arrays: The following sample code copies a 2x2 matrix:
#include <stdio.h>
static void copy(int n, int a[][n], int b[][n]) {
int i, j;
for (i = 0; i < 2; i++) // 'n' not used in this example
for (j = 0; j < 2; j++) // 'n' hard-coded to 2 for simplicity
b[i][j] = a[i][j];
}
int main(int argc, char *argv[]) {
int a[2][2] = {{1, 2},{3, 4}};
int b[2][2];
copy(2, a, b);
printf("%d %d %d %d\n", b[0][0], b[0][1], b[1][0], b[1][1]);
return 0;
}
With gcc 4.1.2 on CentOS, I get:
$ gcc -O1 test.c && a.out
1 2 3 4
$ gcc -O2 test.c && a.out
10235717 -1075970308 -1075970456 11452404 (random)
I don't know whether this is generally known, and I don't know whether this is a bug or a feature. I can't duplicate the problem with gcc 4.3.4 on Cygwin, so it may have been fixed. Some work-arounds:
Use __attribute__((noinline)) for copy().
Use the gcc switch -fno-strict-aliasing.
Change the third parameter of copy() from b[][n] to b[][2].
Don't use -O2 or -O3.
Further notes:
This is an answer, after a year and a day, to my own question (and I'm a bit surprised there are only two other answers).
I lost several hours with this on my actual code, a Kalman filter. Seemingly small changes would have drastic effects, perhaps because of changing gcc's automatic inlining (this is a guess; I'm still uncertain). But it probably doesn't qualify as a horror story.
Yes, I know you wouldn't write copy() like this. (And, as an aside, I was slightly surprised to see gcc did not unroll the double-loop.)
No gcc warning switches, including -Wstrict-aliasing=, did anything here.
1-D variable-length arrays seem to be OK.
Update: The above does not really answer the OP's question, since he (i.e. I) was asking about cases where strict aliasing 'legitimately' broke your code, whereas the above just seems to be a garden-variety compiler bug.
I reported it to GCC Bugzilla, but they weren't interested in the old 4.1.2, even though (I believe) it is the key to the $1-billion RHEL5. It doesn't occur in 4.2.4 up.
And I have a slightly simpler example of a similar bug, with only one matrix. The code:
#include <stdio.h>
static void zero(int n, int a[][n]) {
int i, j;
for (i = 0; i < n; i++)
for (j = 0; j < n; j++)
a[i][j] = 0;
}
int main(void) {
int a[2][2] = {{1, 2},{3, 4}};
zero(2, a);
printf("%d\n", a[1][1]);
return 0;
}
produces the results:
gcc -O1 test.c && a.out
0
gcc -O1 -fstrict-aliasing test.c && a.out
4
It seems it is the combination -fstrict-aliasing with -finline which causes the bug.
Here is mine:
http://forum.openscad.org/CGAL-3-6-1-causing-errors-but-CGAL-3-6-0-OK-tt2050.html
It caused certain shapes in a CAD program to be drawn incorrectly. Thank goodness for the project leaders' work on creating a regression test suite.
The bug only manifested itself on certain platforms, with older versions of GCC and older versions of certain libraries, and then only with -O2 turned on. -fno-strict-aliasing solved it.
The Common Initial Sequence rule of C used to be interpreted as making it
possible to write a function which could work on the leading portion of a
wide variety of structure types, provided they start with elements of matching
types. Under C99, the rule was changed so that it only applied if the structure
types involved were members of the same union whose complete declaration was visible at the point of use.
The authors of gcc insist that the language in question is only applicable if
the accesses are performed through the union type, notwithstanding the facts
that:
1. There would be no reason to specify that the complete declaration must be
visible if accesses had to be performed through the union type.
2. Although the CIS rule was described in terms of unions, its primary
usefulness lay in what it implied about the way in which structs were
laid out and accessed. If S1 and S2 were structures that shared a CIS,
there would be no way that a function that accepted a pointer to an S1
and an S2 from an outside source could comply with C89's CIS rules
without allowing the same behavior to be useful with pointers to
structures that weren't actually inside a union object; specifying CIS
support for structures would thus have been redundant given that it was
already specified for unions.
The following code returns 10, under gcc 4.4.4. Is anything wrong with the union method or gcc 4.4.4?
int main()
{
int v = 10;
union vv {
int v;
short q;
} *s = (union vv *)&v;
s->v = 1;
return v;
}
I'm building my project with GCC's -Wconversion warning flag. (gcc (Debian 4.3.2-1.1) 4.3.2) on a 64bit GNU/Linux OS/Hardware. I'm finding it useful in identifying where I've mixed types or lost clarity as to which types should be used.
It's not so helpful in most of the other situations which activate its warnings, and I'm asking how I am meant to deal with these:
enum { A = 45, B, C }; /* fine */
char a = A; /* huh? seems to not warn about A being int. */
char b = a + 1; /* warning converting from int to char */
char c = B - 2; /* huh? ignores this *blatant* int too.*/
char d = (a > b ? b : c); /* warning converting from int to char */
Due to the unexpected results of the above tests (cases a and c), I'm also asking for these differences to be explained.
Edit: And is it over-engineering to cast all these with (char) to prevent the warning?
Edit2: Some extra cases (following on from above cases):
a += A; /* warning converting from int to char */
a++; /* ok */
a += (char)1; /* warning converting from int to char */
Aside from that, what I'm asking is subjective and I'd like to hear how other people deal with the conversion warnings in cases like these when you consider that some developers advocate removing all warnings.
YAE:
One possible solution is to just use ints instead of chars, right? Well, actually, not only does it require more memory, it is slower too, as can be demonstrated by the following code. The maths expressions are just there to get the warnings when built with -Wconversion. I assumed the version using char variables would run slower than the one using ints due to the conversions, but on my (64bit dual core II) system the int version is slower.
#include <stdio.h>
#ifdef USE_INT
typedef int var;
#else
typedef char var;
#endif
int main()
{
var start = 10;
var end = 100;
var n = 5;
int b = 100000000;
while (b > 0) {
n = (start - 5) + (n - (n % 3 ? 1 : 3));
if (n >= end) {
n -= (end + 7);
n += start + 2;
}
b--;
}
return 0;
}
Pass -DUSE_INT to gcc to build the int version of the above snippet.
When you say /* int */ do you mean it's giving you a warning about that? I'm not seeing any warnings at all in this code with gcc 4.0.1 or 4.2.1 with -Wconversion. The compiler is converting these enums into constants. Since everything is known at compile time, there is no reason to generate a warning. The compiler can optimize out all the uncertainty (the following is Intel with 4.2.1):
movb $45, -1(%rbp) # a = 45
movzbl -1(%rbp), %eax
incl %eax
movb %al, -2(%rbp) # b = 45 + 1
movb $44, -3(%rbp) # c = 44 (the math is done at compile time)
movzbl -1(%rbp), %eax
cmpb -2(%rbp), %al
jle L2
movzbl -2(%rbp), %eax
movb %al, -17(%rbp)
jmp L4
L2:
movzbl -3(%rbp), %eax
movb %al, -17(%rbp)
L4:
movzbl -17(%rbp), %eax
movb %al, -4(%rbp) # d = (a > b ? b : c)
This is without turning on optimizations. With optimizations, it will calculate b and d for you at compile time and hardcode their final values (if it actually needs them for anything). The point is that gcc has already worked out that there can't be a problem here because all the possible values fit in a char.
EDIT: Let me amend this somewhat. There is a possible error in the assignment of b, and the compiler will never catch it, even if it's certain. For example, if b=a+250;, then this will be certain to overflow b but gcc will not issue a warning. It's because the assignment to a is legal, a is a char, and it's your problem (not the compiler's) to make sure that math doesn't overflow at runtime.
Maybe the compiler can already see that all the values fit into a char so it doesn't bother warning. I'd expect the enum to be resolved right at the beginning of the compilation.