I'm building my project with GCC's -Wconversion warning flag (gcc (Debian 4.3.2-1.1) 4.3.2, on a 64-bit GNU/Linux system). I'm finding it useful for identifying where I've mixed types or lost clarity about which types should be used.
It's not so helpful in most of the other situations that activate its warnings, and I'm asking how I'm meant to deal with these:
enum { A = 45, B, C };    /* fine */
char a = A;               /* huh? seems not to warn about A being an int */
char b = a + 1;           /* warning: converting from int to char */
char c = B - 2;           /* huh? ignores this *blatant* int too */
char d = (a > b ? b : c); /* warning: converting from int to char */
Due to the unexpected results of the above tests (cases a and c), I'm also asking for these differences to be explained.
Edit: And is it over-engineering to cast all these with (char) to prevent the warning?
Edit2: Some extra cases (following on from above cases):
a += A; /* warning converting from int to char */
a++; /* ok */
a += (char)1; /* warning converting from int to char */
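For what it's worth, casting the whole right-hand expression (rather than one operand) does silence these, presumably because the usual arithmetic conversions promote the char operands back to int before the operation:
char b = (char)(a + 1);          /* cast the result: no warning */
char d = (char)(a > b ? b : c);  /* likewise */
a = (char)(a + A);               /* += leaves nowhere to cast the result,
                                    so spell it out */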
Aside from that, what I'm asking is subjective, and I'd like to hear how other people deal with conversion warnings in cases like these, given that some developers advocate removing all warnings.
YAE:
One possible solution is to just use ints instead of chars, right? Well, actually, not only does that require more memory, it is slower too, as can be demonstrated by the following code. The maths expressions are just there to trigger the warnings when built with -Wconversion. I assumed the version using char variables would run slower than the one using ints due to the conversions, but on my (64bit dual core II) system the int version is slower.
#include <stdio.h>

#ifdef USE_INT
typedef int var;
#else
typedef char var;
#endif

int main()
{
    var start = 10;
    var end = 100;
    var n = 5;
    int b = 100000000;

    while (b > 0) {
        n = (start - 5) + (n - (n % 3 ? 1 : 3));
        if (n >= end) {
            n -= (end + 7);
            n += start + 2;
        }
        b--;
    }
    return 0;
}
Pass -DUSE_INT to gcc to build the int version of the above snippet.
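For example, the two builds might look like this (file name illustrative):
$ gcc -Wconversion -DUSE_INT -o bench-int bench.c    # int version
$ gcc -Wconversion -o bench-char bench.c             # char version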
When you say /* int */ do you mean it's giving you a warning about that? I'm not seeing any warnings at all in this code with gcc 4.0.1 or 4.2.1 with -Wconversion. The compiler is converting these enums into constants. Since everything is known at compile time, there is no reason to generate a warning. The compiler can optimize out all the uncertainty (the following is Intel with 4.2.1):
movb $45, -1(%rbp) # a = 45
movzbl -1(%rbp), %eax
incl %eax
movb %al, -2(%rbp) # b = 45 + 1
movb $44, -3(%rbp) # c = 44 (the math is done at compile time)
movzbl -1(%rbp), %eax
cmpb -2(%rbp), %al
jle L2
movzbl -2(%rbp), %eax
movb %al, -17(%rbp)
jmp L4
L2:
movzbl -3(%rbp), %eax
movb %al, -17(%rbp)
L4:
movzbl -17(%rbp), %eax
movb %al, -4(%rbp) # d = (a > b ? b : c)
This is without turning on optimizations. With optimizations, it will calculate b and d for you at compile time and hardcode their final values (if it actually needs them for anything). The point is that gcc has already worked out that there can't be a problem here because all the possible values fit in a char.
EDIT: Let me amend this somewhat. There is a possible error in the assignment to b, and the compiler will never catch it, even when it is certain. For example, b = a + 250; is certain to overflow b, but gcc will not issue a warning. It's because the assignment is legal: a is a char, and it's your problem (not the compiler's) to make sure the math doesn't overflow at runtime.
Maybe the compiler can already see that all the values fit into a char so it doesn't bother warning. I'd expect the enum to be resolved right at the beginning of the compilation.
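That matches how -Wconversion treats constants, as far as I know: a constant conversion is only flagged when the value actually changes. A sketch (my example, not from the question):
enum { A = 45, BIG = 300 };
char ok  = A;    /* value-preserving: no warning expected */
char bad = BIG;  /* 300 can't be represented in a signed 8-bit char;
                    gcc flags this (via -Woverflow) */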
The Effective Type rule in C99 and C11 provides that storage with no declared type may be written with any type, and that storing a value of a non-character type will set the Effective Type of the storage accordingly.
Setting aside the fact that INT_MAX might be less than 123456789, would the following code's use of the Effective Type rule be strictly conforming?
#include <stdlib.h>
#include <stdio.h>
/* Performs some calculations using int, then float, then int.
   If both results are desired:  do_test(intbuff, floatbuff, 1);
   For int only:                 do_test(intbuff, intbuff, 1);
   For float only:               do_test(floatbuff, floatbuff, 0);
   The latter two usages require storage with no declared type.
*/
void do_test(void *p1, void *p2, int leave_as_int)
{
    *(int*)p1 = 1234000000;
    float f = *(int*)p1;
    *(float*)p2 = f*2 - 1234000000.0f;
    if (leave_as_int)
    {
        int i = *(float*)p2;
        *(int*)p1 = i + 567890;
    }
}
void (*volatile test)(void *p1, void *p2, int leave_as_int) = do_test;
int main(void)
{
    int iresult;
    float fresult;
    void *p = malloc(sizeof(int) + sizeof(float));
    if (p)
    {
        test(p,p,1);
        iresult = *(int*)p;
        test(p,p,0);
        fresult = *(float*)p;
        free(p);
        printf("%10d %15.2f\n", iresult, fresult);
    }
    return 0;
}
From my reading of the Standard, all three usages of the function described in the comment should be strictly conforming (except for the integer-range issue). The code should thus output 1234567890 1234000000.00. GCC 7.2, however, outputs 1234056789 1157904.00. I think that when leave_as_int is 0, it's storing 1234000000 to *p1 after it stores 1234000000.0f to *p2, but I see nothing in the Standard that would authorize such behavior. Am I missing anything, or is gcc non-conforming?
Yes, this is a gcc bug. I've filed it (with a simplified testcase) as https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82697 .
The generated machine code unconditionally writes to both pointers:
do_test:
cmpl $1, %edx
movl $0x4e931ab1, (%rsi)
sbbl %eax, %eax
andl $-567890, %eax
addl $1234567890, %eax
movl %eax, (%rdi)
ret
This is a GCC bug because all stores are expected to change the dynamic type of the memory accessed. I do not think this behavior is mandated by the standard; it is a GCC extension.
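For reference, a minimal sketch of the Effective Type rule being discussed (my illustration, not from the bug report): malloc'd storage has no declared type, and each non-character store (re)sets its effective type.
#include <stdlib.h>

int main(void)
{
    void *p = malloc(sizeof(int) > sizeof(float) ? sizeof(int) : sizeof(float));
    if (!p)
        return 1;
    *(int *)p = 42;          /* effective type of *p is now int */
    int i = *(int *)p;       /* valid: read with the effective type */
    *(float *)p = 1.0f;      /* this store changes the effective type to float */
    float f = *(float *)p;   /* valid again */
    free(p);
    return i + (int)f - 43;  /* 0 */
}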
In Linux kernel code there is a macro used to test a bit (Linux version 2.6.2):
#define test_bit(nr, addr) \
(__builtin_constant_p((nr)) \
? constant_test_bit((nr), (addr)) \
: variable_test_bit((nr), (addr)))
where constant_test_bit and variable_test_bit are defined as:
static inline int constant_test_bit(int nr, const volatile unsigned long *addr)
{
    return ((1UL << (nr & 31)) & (addr[nr >> 5])) != 0;
}

static __inline__ int variable_test_bit(int nr, const volatile unsigned long *addr)
{
    int oldbit;

    __asm__ __volatile__(
        "btl %2,%1\n\tsbbl %0,%0"
        :"=r" (oldbit)
        :"m" (ADDR),"Ir" (nr));  /* ADDR is a kernel macro expanding to
                                    (*(volatile long *) addr) */
    return oldbit;
}
I understand that __builtin_constant_p is used to detect whether a variable is compile time constant or unknown. My question is: Is there any performance difference between these two functions when the argument is a compile time constant or not? Why use the C version when it is and use the assembly version when it's not?
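(For background: __builtin_constant_p evaluates to 1 only when gcc can prove its argument is a compile-time constant, and to 0 otherwise. A standalone sketch of the same dispatch pattern, with made-up names:)
#include <stdio.h>

/* Pick a compile-time path for constants, a run-time path otherwise.
   Both branches compute the same thing here. */
#define is_even(x) \
    (__builtin_constant_p(x) ? ((x) % 2 == 0) : (((x) & 1) == 0))

int main(void)
{
    printf("%d\n", is_even(4));      /* constant argument: foldable branch */
    int n = 0;
    if (scanf("%d", &n) == 1)
        printf("%d\n", is_even(n));  /* run-time argument: second branch */
    return 0;
}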
UPDATE: The following main function is used to test the performance:
constant, call constant_test_bit:
int main(void) {
    unsigned long i, j = 21;
    unsigned long cnt = 0;
    srand(111);
    //j = rand() % 31;
    for (i = 1; i < (1 << 30); i++) {
        j = (j + 1) % 28;
        if (constant_test_bit(j, &i))
            cnt++;
    }
    if (__builtin_constant_p(j))
        printf("j is a compile time constant\n");
    return 0;
}
This correctly outputs the sentence j is a...
For the other situations, just uncomment the line that assigns a "random" number to j and change the function name accordingly. When that line is uncommented the output will be empty, and this is expected.
I use gcc test.c -O1 to compile, and here is the result:
constant, constant_test_bit:
$ time ./a.out
j is compile time constant
real 0m0.454s
user 0m0.450s
sys 0m0.000s
constant, variable_test_bit (omitting time ./a.out; same for the following):
j is compile time constant
real 0m0.885s
user 0m0.883s
sys 0m0.000s
variable, constant_test_bit:
real 0m0.485s
user 0m0.477s
sys 0m0.007s
variable, variable_test_bit:
real 0m3.471s
user 0m3.467s
sys 0m0.000s
I ran each version several times, and the above results are typical values. It seems the constant_test_bit function is always faster than the variable_test_bit function, no matter whether the parameter is a compile-time constant or not... For the last two results (when j is not constant) the variable version is even dramatically slower than the constant one.
Am I missing something here?
Using godbolt we can do an experiment with constant_test_bit; the following two test functions are compiled by gcc with the -O3 flag:
// Non-constant expression test case
int func1(unsigned long i, unsigned long j)
{
    int x = constant_test_bit(j, &i);
    return x;
}

// Constant expression test case
int func2(unsigned long i)
{
    int x = constant_test_bit(21, &i);
    return x;
}
We see the optimizer is able to optimize the constant expression case to the following:
shrq $21, %rax
andl $1, %eax
while the non-constant expression case ends up as follows:
sarl $5, %eax
andl $31, %ecx
cltq
leaq -8(%rsp,%rax,8), %rax
movq (%rax), %rax
shrq %cl, %rax
andl $1, %eax
So the optimizer is able to produce much better code for the constant-expression case, and the non-constant case for constant_test_bit is pretty bad compared to the hand-rolled assembly in variable_test_bit. The implementer must believe that the constant-expression case for constant_test_bit ends up being better than:
btl %edi,8(%rsp)
sbbl %esi,%esi
for most cases.
As to why your test case seems to show a different conclusion: the test case is flawed. I have not been able to suss out all the issues, but if we look at this case using constant_test_bit with a non-constant expression, we can see the optimizer is able to move all the work outside the loop and reduce the work related to constant_test_bit inside the loop to:
movq (%rax), %rdi
even with an older gcc version, but this case may not be relevant to the cases test_bit is being used in. There may be more specific cases where this kind of optimization won't be possible.
Consider the following hypothetical type:
typedef struct Stack {
    unsigned long len;
    void **elements;
} Stack;
And the following hypothetical macros for dealing with the type (purely for enhanced readability.) In these macros I am assuming that the given argument has type (Stack *) instead of merely Stack (I can't be bothered to type out a _Generic expression here.)
#define stackNull(stack) (!stack->len)
#define stackHasItems(stack) (stack->len)
Why do I not simply use !stackNull(x) to check whether a stack has items? I thought that this would be slightly less efficient (read: not noticeable at all really, but I thought it was interesting) than simply checking stack->len, because it would lead to double negation. In the following case:
int thingy = !!31337;
printf("%d\n", thingy);
if (thingy)
    doSomethingImportant(thingy);
The string "1\n" would be printed, and it would be impossible to optimize the conditional away (well, actually, only impossible if the thingy variable didn't have a constant initializer or was modified before the test, but let's say for this instance that 31337 is not a constant), because (!!x) is guaranteed to be either 0 or 1.
But I'm wondering if compilers will optimize something like the following
int thingy = wellOkaySoImNotAConstantThingyAnyMore();
if (!!thingy)
    doSomethingFarLessImportant();

Will this be optimized to actually just use (thingy) in the if statement, as if the if statement had been written as

if (thingy)
    doSomethingFarLessImportant();
If so, does it expand to (!!!!!thingy) and so on? (however this is a slightly different question, as this can be optimized in any case, !thingy is !!!!!thingy no matter what, just like -(-(-(1))) = -1.)
In the question title I said "compilers", by that I mean that I am asking if any compiler does this, however I am particularly interested in how GCC will behave in this instance as it is my compiler of choice.
This seems like a pretty reasonable optimization and a quick test using godbolt with this code (see it live):
#include <stdio.h>

void func( int x )
{
    if( !!x )
    {
        printf( "first\n" );
    }

    if( !!!x )
    {
        printf( "second\n" );
    }
}

int main()
{
    int x = 0;
    scanf( "%d", &x );
    func( x );
}
seems to indicate gcc does this well; it generates the following:
func:
testl %edi, %edi # x
jne .L4 #,
movl $.LC1, %edi #,
jmp puts #
.L4:
movl $.LC0, %edi #,
jmp puts #
We can see from the first line:
testl %edi, %edi # x
that it just uses x without doing any operations on it. Also notice the optimizer is clever enough to combine both tests into one, since if the first condition is true the other must be false.
Note I used printf and scanf for side effects to prevent the optimizer from optimizing all the code away.
I know in some languages the following:
a += b
is more efficient than:
a = a + b
because it removes the need for creating a temporary variable. Is this the case in C? Is it more efficient to use += (and, therefore, also -=, *=, etc.)?
So here's a definitive answer...
$ cat junk1.c
#include <stdio.h>

int main()
{
    long a, s = 0;
    for (a = 0; a < 1000000000; a++)
    {
        s = s + a * a;
    }
    printf("Final sum: %ld\n", s);
}
michael@isolde:~/junk$ cat junk2.c
#include <stdio.h>

int main()
{
    long a, s = 0;
    for (a = 0; a < 1000000000; a++)
    {
        s += a * a;
    }
    printf("Final sum: %ld\n", s);
}
michael@isolde:~/junk$ for a in *.c ; do gcc -O3 -o ${a%.c} $a ; done
michael@isolde:~/junk$ time ./junk1
Final sum: 3338615082255021824
real 0m2.188s
user 0m2.120s
sys 0m0.000s
michael@isolde:~/junk$ time ./junk2
Final sum: 3338615082255021824
real 0m2.179s
user 0m2.120s
sys 0m0.000s
...for my computer and my compiler running on my operating system. Your results may or may not vary. On my system, however, the time is identical: user time 2.120s.
Now just to show you how impressive modern compilers can be, you'll note that I used the expression a * a in the assignment. This is because of this little problem:
$ cat junk.c
#include <stdio.h>

int main()
{
    long a, s = 0;
    for (a = 0; a < 1000000000; a++)
    {
        s = s + a;
    }
    printf("Final sum: %ld\n", s);
}
michael@isolde:~/junk$ gcc -O3 -S junk.c
michael@isolde:~/junk$ cat junk.s
	.file	"junk.c"
	.section	.rodata.str1.1,"aMS",@progbits,1
.LC0:
	.string	"Final sum: %ld\n"
	.text
	.p2align 4,,15
.globl main
	.type	main, @function
main:
.LFB22:
	.cfi_startproc
	movabsq	$499999999500000000, %rdx
	movl	$.LC0, %esi
	movl	$1, %edi
	xorl	%eax, %eax
	jmp	__printf_chk
	.cfi_endproc
.LFE22:
	.size	main, .-main
	.ident	"GCC: (Ubuntu 4.4.3-4ubuntu5) 4.4.3"
	.section	.note.GNU-stack,"",@progbits
The compiler figured out my loop and reduced it to the cumulative sum, which it embedded as a constant and simply printed, skipping any kind of looping construct entirely. In the face of optimizers that clever, do you really think you're going to find any meaningful edge in distinguishing between s = s + a and s += a?!
This is a compiler specific question really, but I expect all modern compilers would give the same result. Using Visual Studio 2008:
int main() {
    int a = 10;
    int b = 30;
    a = a + b;

    int c = 10;
    int d = 50;
    c += d;
}
The line a = a + b has disassembly
0014139C mov eax,dword ptr [a]
0014139F add eax,dword ptr [b]
001413A2 mov dword ptr [a],eax
The line c += d has disassembly
001413B3 mov eax,dword ptr [c]
001413B6 add eax,dword ptr [d]
001413B9 mov dword ptr [c],eax
Which is the same. They are compiled into the same code.
It depends on what a is.
a += b in C is by definition equivalent to a = a + b, except that, from the abstract point of view, a is evaluated only once in the former variant. If a is a "pure" value, i.e. if evaluating it once vs. evaluating it many times makes no impact on the program's behavior, then a += b is strictly equivalent to a = a + b in all regards, including efficiency.
In other words, in situations when you actually have a free choice between a += b and a = a + b (meaning that you know that they do the same thing) they will generally have exactly the same efficiency. Some compilers might have difficulties when a stands for a function call (for one example; probably not what you meant), but when a is a non-volatile variable the machine code generated for both expressions will be the same.
For another example, if a is a volatile variable, then a += b and a = a + b have different behavior and, therefore, different efficiency. However, since they are not equivalent, your question simply does not apply in such cases.
In the simple cases shown in the question, there is no significant difference. Where the assignment operator scores is when you have an expression such as:
s[i]->m[j1].k = s[i]->m[jl].k + 23; // Was that a typo?
vs:
s[i]->m[j1].k += 23;
Two benefits - and I'm not counting less typing. There's no question about whether there was a typo when the first and second expressions differ; and the compiler doesn't evaluate the complex expression twice. The chances are that won't make much difference these days (optimizing compilers are a lot better than they used to be), but you could have still more complex expressions (evaluating a function defined in another translation unit, for example, as part of the subscripting) where the compiler may not be able to avoid evaluating the expression twice:
s[i]->m[somefunc(j1)].k = s[i]->m[somefunc(j1)].k + 23;
s[i]->m[somefunc(j1)].k += 23;
Also, you can write (if you're brave):
s[i++]->m[j1++].k += 23;
But you cannot write:
s[i++]->m[j1++].k = s[i]->m[j1].k + 23;
s[i]->m[j1].k = s[i++]->m[j1++].k + 23;
(or any other permutation) because the order of evaluation is not defined.
a += b
is more efficient than
a = a + b
because the former takes you 6 keystrokes and the latter takes you 9 keystrokes.
With modern hardware, even if the compiler is stupid and uses slower code for one than the other, the total time saved over the lifetime of the program may possibly be less than the time it takes you to type the three extra key strokes.
However, as others have said, the compiler almost certainly produces exactly the same code so the former is more efficient.
Even if you factor in readability, most C programmers probably mentally parse the former more quickly than the latter because it is such a common pattern.
In virtually all cases, the two produce identical results.
Other than with a truly ancient or incompetently written compiler there should be no difference, as long as a and b are normal variables.
If you were dealing with C++ rather than C, operator overloading would allow there to be more substantial differences though.
In gcc-strict-aliasing-and-casting-through-a-union I asked whether anyone had encountered problems with union punning through pointers. So far, the answer seems to be No.
This question is broader: Do you have any horror stories about gcc and strict-aliasing?
Background: Quoting from AndreyT's answer in c99-strict-aliasing-rules-in-c-gcc:
"Strict aliasing rules are rooted in parts of the standard that were present in C and C++ since the beginning of [standardized] times. The clause that prohibits accessing object of one type through a lvalue of another type is present in C89/90 (6.3) as well as in C++98 (3.10/15). ... It is just that not all compilers wanted (or dared) to enforce it or rely on it."
Well, gcc is now daring to do so, with its -fstrict-aliasing switch. And this has caused some problems. See, for example, the excellent article http://davmac.wordpress.com/2009/10/ about a Mysql bug, and the equally excellent discussion in http://cellperformance.beyond3d.com/articles/2006/06/understanding-strict-aliasing.html.
Some other less-relevant links:
performance-impact-of-fno-strict-aliasing
strict-aliasing
when-is-char-safe-for-strict-pointer-aliasing
how-to-detect-strict-aliasing-at-compile-time
So to repeat, do you have a horror story of your own? Problems not indicated by -Wstrict-aliasing would, of course, be preferred. And other C compilers are also welcome.
Added June 2nd: The first link in Michael Burr's answer, which does indeed qualify as a horror story, is perhaps a bit dated (from 2003). I did a quick test, and the problem has apparently gone away.
Source:
#include <string.h>

struct iw_event {  /* dummy! */
    int len;
};

char *iwe_stream_add_event(
    char *stream,          /* Stream of events */
    char *ends,            /* End of stream */
    struct iw_event *iwe,  /* Payload */
    int event_len)         /* Real size of payload */
{
    /* Check if it's possible */
    if ((stream + event_len) < ends) {
        iwe->len = event_len;
        memcpy(stream, (char *) iwe, event_len);
        stream += event_len;
    }
    return stream;
}
The specific complaint is:
Some users have complained that when the [above] code is compiled without the -fno-strict-aliasing, the order of the write and memcpy is inverted (which means a bogus len is mem-copied into the stream).
Compiled code, using gcc 4.3.4 on Cygwin with -O3 (please correct me if I am wrong--my assembler is a bit rusty!):
_iwe_stream_add_event:
pushl %ebp
movl %esp, %ebp
pushl %ebx
subl $20, %esp
movl 8(%ebp), %eax # stream --> %eax
movl 20(%ebp), %edx # event_len --> %edx
leal (%eax,%edx), %ebx # sum --> %ebx
cmpl 12(%ebp), %ebx # compare sum with ends
jae L2
movl 16(%ebp), %ecx # iwe --> %ecx
movl %edx, (%ecx) # event_len --> iwe->len (!!)
movl %edx, 8(%esp) # event_len --> stack
movl %ecx, 4(%esp) # iwe --> stack
movl %eax, (%esp) # stream --> stack
call _memcpy
movl %ebx, %eax # sum --> retval
L2:
addl $20, %esp
popl %ebx
leave
ret
And for the second link in Michael's answer,
*(unsigned short *)&a = 4;
gcc will usually (always?) give a warning. But I believe a valid solution to this (for gcc) is to use:
#define CAST(type, x) (((union {typeof(x) src; type dst;}*)&(x))->dst)
// ...
CAST(unsigned short, a) = 4;
I've asked SO whether this is OK in gcc-strict-aliasing-and-casting-through-a-union, but so far nobody disagrees.
No horror story of my own, but here are some quotes from Linus Torvalds (sorry if these are already in one of the linked references in the question):
http://lkml.org/lkml/2003/2/26/158:
Date Wed, 26 Feb 2003 09:22:15 -0800
Subject Re: Invalid compilation without -fno-strict-aliasing
From Jean Tourrilhes <>
On Wed, Feb 26, 2003 at 04:38:10PM +0100, Horst von Brand wrote:
Jean Tourrilhes <> said:
It looks like a compiler bug to me...
Some users have complained that when the following code is
compiled without the -fno-strict-aliasing, the order of the write and
memcpy is inverted (which mean a bogus len is mem-copied into the
stream).
Code (from linux/include/net/iw_handler.h) :
static inline char *
iwe_stream_add_event(char * stream, /* Stream of events */
char * ends, /* End of stream */
struct iw_event *iwe, /* Payload */
int event_len) /* Real size of payload */
{
/* Check if it's possible */
if((stream + event_len) < ends) {
iwe->len = event_len;
memcpy(stream, (char *) iwe, event_len);
stream += event_len;
}
return stream;
}
IMHO, the compiler should have enough context to know that the
reordering is dangerous. Any suggestion to make this simple code more
bullet proof is welcomed.
The compiler is free to assume char *stream and struct iw_event *iwe point
to separate areas of memory, due to strict aliasing.
Which is true and which is not the problem I'm complaining about.
(Note with hindsight: this code is fine, but Linux's implementation of memcpy was a macro that cast to long * to copy in larger chunks. With a correctly-defined memcpy, gcc -fstrict-aliasing isn't allowed to break this code. But it means you need inline asm to define a kernel memcpy if your compiler doesn't know how to turn a byte-copy loop into efficient asm, which was the case for gcc before gcc7.)
And Linus Torvald's comment on the above:
Jean Tourrilhes wrote:
>
It looks like a compiler bug to me...
Why do you think the kernel uses "-fno-strict-aliasing"?
The gcc people are more interested in trying to find out what can be
allowed by the c99 specs than about making things actually work. The
aliasing code in particular is not even worth enabling, it's just not
possible to sanely tell gcc when some things can alias.
Some users have complained that when the following code is
compiled without the -fno-strict-aliasing, the order of the write and
memcpy is inverted (which mean a bogus len is mem-copied into the
stream).
The "problem" is that we inline the memcpy(), at which point gcc won't
care about the fact that it can alias, so they'll just re-order
everything and claim it's our own fault. Even though there is no sane
way for us to even tell gcc about it.
I tried to get a sane way a few years ago, and the gcc developers really
didn't care about the real world in this area. I'd be surprised if that
had changed, judging by the replies I have already seen.
I'm not going to bother to fight it.
Linus
http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg01647.html:
Type-based aliasing is stupid. It's so incredibly stupid that it's not even funny. It's broken. And gcc took the broken notion, and made it more so by making it a "by-the-letter-of-the-law" thing that makes no sense.
...
I know for a fact that gcc would re-order write accesses that were clearly to (statically) the same address. Gcc would suddenly think that
unsigned long a;
a = 5;
*(unsigned short *)&a = 4;
could be re-ordered to set it to 4 first (because clearly they don't alias - by reading the standard), and then because now the assignment of 'a=5' was later, the assignment of 4 could be elided entirely! And if somebody complains that the compiler is insane, the compiler people would say "nyaah, nyaah, the standards people said we can do this", with absolutely no introspection to ask whether it made any SENSE.
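For reference, the well-defined alternatives to the punning store in that example are a byte copy or a union access (my sketch, not from the quoted thread):
#include <string.h>

unsigned long defined_version(void)
{
    unsigned long a = 5;
    unsigned short four = 4;
    memcpy(&a, &four, sizeof four);  /* defined: rewrites the low-address
                                        bytes of a, no aliasing issue */
    return a;
}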
SWIG generates code that depends on strict aliasing being off, which can cause all sorts of problems.
SWIGEXPORT jlong JNICALL Java_com_mylibJNI_make_1mystruct_1_1SWIG_12(
    JNIEnv *jenv, jclass jcls, jint jarg1, jint jarg2) {
    jlong jresult = 0;
    int arg1;
    int arg2;
    my_struct_t *result = 0;

    (void)jenv;
    (void)jcls;
    arg1 = (int)jarg1;
    arg2 = (int)jarg2;
    result = (my_struct_t *)make_my_struct(arg1, arg2);
    *(my_struct_t **)&jresult = result;  /* <<<< horror */
    return jresult;
}
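The usual repair for that line (a sketch against the generated code above; it needs <string.h>) is a byte copy instead of the punning store:
/* instead of:  *(my_struct_t **)&jresult = result; */
memcpy(&jresult, &result, sizeof(result));  /* defined even under strict
                                               aliasing; assumes
                                               sizeof(jlong) >= sizeof(result) */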
gcc, aliasing, and 2-D variable-length arrays: The following sample code copies a 2x2 matrix:
#include <stdio.h>

static void copy(int n, int a[][n], int b[][n]) {
    int i, j;
    for (i = 0; i < 2; i++)      // 'n' not used in this example
        for (j = 0; j < 2; j++)  // 'n' hard-coded to 2 for simplicity
            b[i][j] = a[i][j];
}

int main(int argc, char *argv[]) {
    int a[2][2] = {{1, 2},{3, 4}};
    int b[2][2];
    copy(2, a, b);
    printf("%d %d %d %d\n", b[0][0], b[0][1], b[1][0], b[1][1]);
    return 0;
}
With gcc 4.1.2 on CentOS, I get:
$ gcc -O1 test.c && a.out
1 2 3 4
$ gcc -O2 test.c && a.out
10235717 -1075970308 -1075970456 11452404 (random)
I don't know whether this is generally known, and I don't know whether this a bug or a feature. I can't duplicate the problem with gcc 4.3.4 on Cygwin, so it may have been fixed. Some work-arounds:
Use __attribute__((noinline)) for copy() (see the sketch after this list).
Use the gcc switch -fno-strict-aliasing.
Change the third parameter of copy() from b[][n] to b[][2].
Don't use -O2 or -O3.
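A sketch of the first workaround applied to the copy() above:
static __attribute__((noinline))
void copy(int n, int a[][n], int b[][n]) {
    int i, j;
    for (i = 0; i < 2; i++)
        for (j = 0; j < 2; j++)
            b[i][j] = a[i][j];
}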
Further notes:
This is an answer, after a year and a day, to my own question (and I'm a bit surprised there are only two other answers).
I lost several hours with this on my actual code, a Kalman filter. Seemingly small changes would have drastic effects, perhaps because of changing gcc's automatic inlining (this is a guess; I'm still uncertain). But it probably doesn't qualify as a horror story.
Yes, I know you wouldn't write copy() like this. (And, as an aside, I was slightly surprised to see gcc did not unroll the double-loop.)
No gcc warning switches, including -Wstrict-aliasing=, did anything here.
1-D variable-length arrays seem to be OK.
Update: The above does not really answer the OP's question, since he (i.e. I) was asking about cases where strict aliasing 'legitimately' broke your code, whereas the above just seems to be a garden-variety compiler bug.
I reported it to GCC Bugzilla, but they weren't interested in the old 4.1.2, even though (I believe) it is the key to the $1-billion RHEL5. It doesn't occur in 4.2.4 and up.
And I have a slightly simpler example of a similar bug, with only one matrix. The code:
#include <stdio.h>

static void zero(int n, int a[][n]) {
    int i, j;
    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            a[i][j] = 0;
}

int main(void) {
    int a[2][2] = {{1, 2},{3, 4}};
    zero(2, a);
    printf("%d\n", a[1][1]);
    return 0;
}
produces the results:
gcc -O1 test.c && a.out
0
gcc -O1 -fstrict-aliasing test.c && a.out
4
It seems it is the combination -fstrict-aliasing with -finline which causes the bug.
Here is mine:
http://forum.openscad.org/CGAL-3-6-1-causing-errors-but-CGAL-3-6-0-OK-tt2050.html
It caused certain shapes in a CAD program to be drawn incorrectly. Thank goodness for the project leaders' work on creating a regression test suite.
The bug only manifested itself on certain platforms, with older versions of GCC and older versions of certain libraries, and then only with -O2 turned on. -fno-strict-aliasing solved it.
The Common Initial Sequence rule of C used to be interpreted as making it possible to write a function which could work on the leading portion of a wide variety of structure types, provided they start with elements of matching types. Under C99, the rule was changed so that it only applied if the structure types involved were members of the same union whose complete declaration was visible at the point of use.
The authors of gcc insist that the language in question is only applicable if the accesses are performed through the union type, notwithstanding the facts that:
There would be no reason to specify that the complete declaration must be visible if accesses had to be performed through the union type.
Although the CIS rule was described in terms of unions, its primary usefulness lay in what it implied about the way in which structs were laid out and accessed. If S1 and S2 were structures that shared a CIS, there would be no way that a function that accepted a pointer to an S1 and an S2 from an outside source could comply with C89's CIS rules without allowing the same behavior to be useful with pointers to structures that weren't actually inside a union object; specifying CIS support for structures would thus have been redundant given that it was already specified for unions.
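For concreteness, the idiom in question looks something like this (illustrative names, my sketch):
struct s1 { int type; int a; };
struct s2 { int type; double d; };
union u { struct s1 m1; struct s2 m2; };  /* complete union declaration
                                             visible, as C99 requires */

/* Inspect the common initial member without knowing which struct is live. */
static int get_type(const union u *p) { return p->m1.type; }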
The following code returns 10, under gcc 4.4.4. Is anything wrong with the union method or gcc 4.4.4?
int main()
{
    int v = 10;
    union vv {
        int v;
        short q;
    } *s = (union vv *)&v;
    s->v = 1;
    return v;
}