Is there a way to determine a member offset at compile time? - c

I'm finding I'm spending a lot of time trying to determine member offsets of structures while debugging. I was wondering if there was a quick way to determine the offset of a member within a large structure at compile time. (Note: I know several ways of creating compile time assertions that an offset is in a given range, and I can do a binary search for the correct value, but I'm looking for something more efficient). I'm using a fairly recent version of gcc to compile C code.

offsetof is what you want for this and it compile time. The C99 draft standard section 7.17 Common definitions paragraph 3 says:
offsetof(type, member-designator)
which expands to an integer constant expression that has type size_t, the value of
which is the offset in bytes [...]
the man pages linked above has the following sample code, which demonstrates it's usage:
struct s {
int i;
char c;
double d;
char a[];
};
/* Output is compiler dependent */
printf("offsets: i=%ld; c=%ld; d=%ld a=%ld\n",
(long) offsetof(struct s, i),
(long) offsetof(struct s, c),
(long) offsetof(struct s, d),
(long) offsetof(struct s, a));
printf("sizeof(struct s)=%ld\n", (long) sizeof(struct s));
sample output, which may vary:
offsets: i=0; c=4; d=8 a=16
sizeof(struct s)=16
For reference constant expressions are covered in section 6.6 Constant expressions and paragraph 2 says:
A constant expression can be evaluated during translation rather than runtime, and
accordingly may be used in any place that a constant may be.
Update
After clarification from the OP I came up with a new methods using -O0 -fverbose-asm -S and grep, code is as follows:
#include <stddef.h>
struct s {
int i;
char c;
double d;
char a[];
};
int main()
{
int offsetArray[4] = { offsetof(struct s, i ), offsetof( struct s, c ), offsetof(struct s, d ), offsetof(struct s, a ) } ;
return 0 ;
}
build using:
gcc -x c -std=c99 -O0 -fverbose-asm -S main.cpp && cat main.s | grep offsetArray
sample output (live example):
movl $0, -16(%rbp) #, offsetArray
movl $4, -12(%rbp) #, offsetArray
movl $8, -8(%rbp) #, offsetArray
movl $16, -4(%rbp) #, offsetArray

Ok, answering my own question here: Note: I'm looking to determine the offset at compile time, that is, I don't want to have to run the code (I can also compile just the files I need, and not the whole system): The following can be cut and paste for those who are interested:
#include <stddef.h>
#define offsetof_ct(structname, membername) \
void ___offset_##membername ## ___(void) { \
volatile char dummy[10000 + offsetof(structname, membername) ]; \
dummy[0]=dummy[0]; \
}
struct x {
int a[100];
int b[20];
int c[30];
};
offsetof_ct(struct x,a);
offsetof_ct(struct x,b);
offsetof_ct(struct x,c);
And then run:
~/tmp> gcc tst.c -Wframe-larger-than=1000
tst.c: In function ‘___offset_a___’:
tst.c:16:1: warning: the frame size of 10000 bytes is larger than 1000 bytes
tst.c: In function ‘___offset_b___’:
tst.c:17:1: warning: the frame size of 10400 bytes is larger than 1000 bytes
tst.c: In function ‘___offset_c___’:
tst.c:18:1: warning: the frame size of 10480 bytes is larger than 1000 bytes
/usr/lib/gcc/x86_64-redhat-linux/4.5.1/../../../../lib64/crt1.o: In function `_start':
(.text+0x20): undefined reference to `main'
collect2: ld returned 1 exit status
I then subtract 10000 to get the offset. Note: I tried adding #pragma GCC diagnostic warning "-Wframe-larger-than=1000" into the file, but it didn't like it, so it'll have to be specified on the command line.
John

This should do:
#define OFFSETOF(T, m) \
(size_t) (((char *) &(((T*) NULL)->m)) - ((char *) ((T*) NULL)))
Call it like this:
size_t off = OFFSETOF(struct s, c);

Related

Is this use of the Effective Type rule strictly conforming?

The Effective Type rule in C99 and C11 provides that storage with no declared type may be written with any type and, that storing a value of a non-character type will set the Effective Type of the storage accordingly.
Setting aside the fact that INT_MAX might be less than 123456789, would the following code's use of the Effective Type rule be strictly conforming?
#include <stdlib.h>
#include <stdio.h>
/* Performs some calculations using using int, then float,
then int.
If both results are desired, do_test(intbuff, floatbuff, 1);
For int only, do_test(intbuff, intbuff, 1);
For float only, do_test(floatbuff, float_buff, 0);
The latter two usages require storage with no declared type.
*/
void do_test(void *p1, void *p2, int leave_as_int)
{
*(int*)p1 = 1234000000;
float f = *(int*)p1;
*(float*)p2 = f*2-1234000000.0f;
if (leave_as_int)
{
int i = *(float*)p2;
*(int*)p1 = i+567890;
}
}
void (*volatile test)(void *p1, void *p2, int leave_as_int) = do_test;
int main(void)
{
int iresult;
float fresult;
void *p = malloc(sizeof(int) + sizeof(float));
if (p)
{
test(p,p,1);
iresult = *(int*)p;
test(p,p,0);
fresult = *(float*)p;
free(p);
printf("%10d %15.2f\n", iresult,fresult);
}
return 0;
}
From my reading of the Standard, all three usages of the function described in the comment should be strictly conforming (except for the integer-range issue). The code should thus output 1234567890 1234000000.00. GCC 7.2, however, outputs 1234056789 1157904.00. I think that when leave_as_int is 0, it's storing 123400000 to *p1 after it stores 123400000.0f to *p2, but I see nothing in the Standard that would authorize such behavior. Am I missing anything, or is gcc non-conforming?
Yes, this is a gcc bug. I've filed it (with a simplified testcase) as https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82697 .
The generated machine code unconditionally writes to both pointers:
do_test:
cmpl $1, %edx
movl $0x4e931ab1, (%rsi)
sbbl %eax, %eax
andl $-567890, %eax
addl $1234567890, %eax
movl %eax, (%rdi)
ret
This is a GCC bug because all stores are expected to change the dynamic type of the memory accessed. I do not think this behavior is mandated by the standard; it is a GCC extension.
Can you file a GCC bug?

incompatible return type from struct function - C

When I attempt to run this code as it is, I receive the compiler message "error: incompatible types in return". I marked the location of the error in my code. If I take the line out, then the compiler is happy.
The problem is I want to return a value representing invalid input to the function (which in this case is calling f2(2).) I only want a struct returned with data if the function is called without using 2 as a parameter.
I feel the only two ways to go is to either:
make the function return a struct pointer instead of a dead-on struct but then my caller function will look funny as I have to change y.b to y->b and the operation may be slower due to the extra step of fetching data in memory.
Allocate extra memory, zero-byte fill it, and set the return value to the struct in that location in memory. (example: return x[nnn]; instead of return x[0];). This approach will use more memory and some processing to zero-byte fill it.
Ultimately, I'm looking for a solution that will be fastest and cleanest (in terms of code) in the long run. If I have to be stuck with using -> to address members of elements then I guess that's the way to go.
Does anyone have a solution that uses the least cpu power?
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
typedef struct{
long a;
char b;
}ab;
static char dat[500];
ab f2(int set){
ab* x=(ab*)dat;
if (set==2){return NULL;}
if (set==1){
x->a=1;
x->b='1';
x++;
x->a=2;
x->b='2';
x=(ab*)dat;
}
return x[0];
}
int main(){
ab y;
y=f2(1);
printf("%c",y.b);
y.b='D';
y=f2(0);
printf("%c",y.b);
return 0;
}
If you care about speed, it is implementation specific.
Notice that the Linux x86-64 ABI defines that a struct of two (exactly) scalar members (that is, integers, doubles, or pointers, -which all fit in a single machine register- but not struct etc... which are aggregate data) is returned thru two registers (without going thru the stack), and that is quite fast.
BTW
if (set==2){return NULL;} //wrong
is obviously wrong. You could code:
if (set==2) return (aa){0,0};
Also,
ab* x=(ab*)dat; // suspicious
looks suspicious to me (since you return x[0]; later). You are not guaranteed that dat is suitably aligned (e.g. to 8 or 16 bytes), and on some platforms (notably x86-64) if dat is misaligned you are at least losing performance (actually, it is undefined behavior).
BTW, I would suggest to always return with instructions like return (aa){l,c}; (where l is an expression convertible to long and c is an expression convertible to char); this is probably the easiest to read, and will be optimized to load the two return registers.
Of course if you care about performance, for benchmarking purposes, you should enable optimizations (and warnings), e.g. compile with gcc -Wall -Wextra -O2 -march=native if using GCC; on my system (Linux/x86-64 with GCC 5.2) the small function
ab give_both(long x, char y)
{ return (ab){x,y}; }
is compiled (with gcc -O2 -march=native -fverbose-asm -S) into:
.globl give_both
.type give_both, #function
give_both:
.LFB0:
.file 1 "ab.c"
.loc 1 7 0
.cfi_startproc
.LVL0:
.loc 1 7 0
xorl %edx, %edx # D.2139
movq %rdi, %rax # x, x
movb %sil, %dl # y, D.2139
ret
.cfi_endproc
you see that all the code is using registers, and no memory is used at all..
I would use the return value as an error code, and the caller passes in a pointer to his struct such as:
int f2(int set, ab *outAb); // returns -1 if invalid input and 0 otherwise

Which is faster: Increment or equation with addition arithmetic

Example:
a : ++i;
b : i++;
c : i += 1;
d : i = i + 1;
Assuming each of them abcd are called completely simultaneous, which one of them will be performed first ?
Using gcc 5.2 to compile this program:
#include<stdio.h>
int main()
{
int i = 0;
++i;
i++;
i += 1;
i = i + 1;
return 0;
}
It gives this ASM:
main:
push rbp
mov rbp, rsp
mov DWORD PTR [rbp-4], 0
add DWORD PTR [rbp-4], 1 #++i
add DWORD PTR [rbp-4], 1 #i++
add DWORD PTR [rbp-4], 1 #i += 1
add DWORD PTR [rbp-4], 1 #i = i + 1
mov eax, 0
pop rbp
ret
Which means that with gcc 5.2 it's the exact same speed of execution.
It's seems to be the very same for version from 4.4.7 to 5.2.
In this particular example all four expressions have the exact same externally observable result so a competent compiler should generate the exact same code for them.
The compiler doesn't slavishly read the code and generate a few instructions for each statement, the compiler reasons about what the result of the code should be according to the standard and generates the code needed for a whole program to behave as required. Therefore asking performance questions about single statements is almost always meaningless. Let me show an example:
void foo(unsigned int a, unsigned int b) { unsigned int i = a * b; }
void bar(unsigned int a, unsigned int b) { unsigned int i = a + b; }
Which one is faster? Function foo or bar? Many would say "of course multiplication is slower", but most likely the answer is: both are equally fast because a very simple dead store optimization will see that nothing uses i, so there's no need to compute it, so the compiler can optimize the functions down to nothing. Let's try it:
$ cat > foo.c
void foo(unsigned int a, unsigned int b) { unsigned int i = a * b; }
void bar(unsigned int a, unsigned int b) { unsigned int i = a + b; }
$ cc -S -fomit-frame-pointer -O2 foo.c
$ cat foo.s
[... I edited out irrelevant spam to make this more readable ...]
_foo: ## #foo
retq
_bar: ## #bar
retq
The only instruction in both functions is retq which just returns from the function.
Modern compilers are smart enough to optimize all of the four cases to improve the performance.
You should note that in the last expression i = i+1, i will be evaluated twice.
In Programming, Unary operators are having higher priority than the other operators. Unary Operators are executed before the execution of the other operators. Pre and Post Increment operators are the examples of the Unary operators while c and d are binary operators, hence executed later.Also c is just the short-hand notation for d, hence both take the same time and from a and b, a is executed earlier than b as post increment is faster than pre increment.
I hope this answer helps.

Does struct with a single member have the same performance as a member type?

Does struct with a single member have the same performance as a member type (memory usage and speed)?
Example:
This code is a struct with a single member:
struct my_int
{
int value;
};
is the performance of my_int same as int ?
Agree with #harper overall, but watch out for the following:
A classic difference is seen with a "unstructured" array and an array in a structure.
char s1[1000];
// vs
typedef struct {
char s2[1000];
} s_T;
s_T s3;
When calling functions ...
void f1(char s[1000]);
void f2(s_T s);
void f3(s_T *s);
// Significant performance difference is not expected.
// In both, only an address is passed.
f1(s1);
f1(s3.s2);
// Significant performance difference is expected.
// In the second case, a copy of the entire structure is passed.
// This style of parameter passing is usually frowned upon.
f1(s1);
f2(s3);
// Significant performance difference is not expected.
// In both, only an address is passed.
f1(s1);
f3(&s3);
In some cases, the ABI may have specific rules for returning structures and passing them to functions. For example, given
struct S { int m; };
struct S f(int a, struct S b);
int g(int a, S b);
calling f or g may, for example, pass a in a register, and pass b on the stack. Similarly, calling g may use a register for the return value, whereas calling f may require the caller to set up a location where f will store its result.
The performance differences of this should normally be negligible, but one case where it could make a significant difference is when this difference enables or disables tail recursion.
Suppose g is implemented as int g(int a, struct S b) { return g(a, b).m; }. Now, on an implementation where f's result is returned the same way as g's, this may compile to (actual output from clang)
.file "test.c"
.text
.globl g
.align 16, 0x90
.type g,#function
g: # #g
.cfi_startproc
# BB#0:
jmp f # TAILCALL
.Ltmp0:
.size g, .Ltmp0-g
.cfi_endproc
.section ".note.GNU-stack","",#progbits
However, on other implementations, such a tail call is not possible, so if you want to achieve the same results for a deeply recursive function, you really need to give f and g the same return type or you risk a stack overflow. (I'm aware that tail calls are not mandated.)
This doesn't mean int is faster than S, nor does it mean that S is faster than int, though. The memory use would be similar regardless of whether int or S is used, so long as the same one is consistently used.
If the compiler has any penalty on using structs instead of single variables is strictly compiler and compiler options dependent.
But there are no reasons why the compiler should make any differences when your struct contains only one member. There should be additional code necessary to access the member nor to derefence any pointer to such an struct. If you don't have this oversimplified structure with one member deferencing might cost one addtional CPU instruction depending on the used CPU.
A minimal example w/ GCC 10.2.0 -O3 gives exactly the same output i.e. no overhead introduced by struct:
diff -u0 <(
gcc -S -o /dev/stdout -x c -std=gnu17 -O3 -Wall -Wextra - <<EOF
// Out of the box
void OOTB(int *n){
*n+=999;
}
EOF
) <(
gcc -S -o /dev/stdout -x c -std=gnu17 -O3 -Wall -Wextra - <<EOF
// One member struct
typedef struct { int inq_n; } inq;
void OMST(inq *n){
n->inq_n+=999;
}
EOF
)
--- /dev/fd/63 [...]
+++ /dev/fd/62 [...]
## -4,3 +4,3 ##
- .globl OOTB
- .type OOTB, #function
-OOTB:
+ .globl OMST
+ .type OMST, #function
+OMST:
## -13 +13 ##
- .size OOTB, .-OOTB
+ .size OMST, .-OMST
Not sure about more realistic/complex situations.

gcc, strict-aliasing, and horror stories [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 3 years ago.
Improve this question
In gcc-strict-aliasing-and-casting-through-a-union I asked whether anyone had encountered problems with union punning through pointers. So far, the answer seems to be No.
This question is broader: Do you have any horror stories about gcc and strict-aliasing?
Background: Quoting from AndreyT's answer in c99-strict-aliasing-rules-in-c-gcc:
"Strict aliasing rules are rooted in parts of the standard that were present in C and C++ since the beginning of [standardized] times. The clause that prohibits accessing object of one type through a lvalue of another type is present in C89/90 (6.3) as well as in C++98 (3.10/15). ... It is just that not all compilers wanted (or dared) to enforce it or rely on it."
Well, gcc is now daring to do so, with its -fstrict-aliasing switch. And this has caused some problems. See, for example, the excellent article http://davmac.wordpress.com/2009/10/ about a Mysql bug, and the equally excellent discussion in http://cellperformance.beyond3d.com/articles/2006/06/understanding-strict-aliasing.html.
Some other less-relevant links:
performance-impact-of-fno-strict-aliasing
strict-aliasing
when-is-char-safe-for-strict-pointer-aliasing
how-to-detect-strict-aliasing-at-compile-time
So to repeat, do you have a horror story of your own? Problems not indicated by -Wstrict-aliasing would, of course, be preferred. And other C compilers are also welcome.
Added June 2nd: The first link in Michael Burr's answer, which does indeed qualify as a horror story, is perhaps a bit dated (from 2003). I did a quick test, but the problem has apparently gone away.
Source:
#include <string.h>
struct iw_event { /* dummy! */
int len;
};
char *iwe_stream_add_event(
char *stream, /* Stream of events */
char *ends, /* End of stream */
struct iw_event *iwe, /* Payload */
int event_len) /* Real size of payload */
{
/* Check if it's possible */
if ((stream + event_len) < ends) {
iwe->len = event_len;
memcpy(stream, (char *) iwe, event_len);
stream += event_len;
}
return stream;
}
The specific complaint is:
Some users have complained that when the [above] code is compiled without the -fno-strict-aliasing, the order of the write and memcpy is inverted (which means a bogus len is mem-copied into the stream).
Compiled code, using gcc 4.3.4 on CYGWIN wih -O3 (please correct me if I am wrong--my assembler is a bit rusty!):
_iwe_stream_add_event:
pushl %ebp
movl %esp, %ebp
pushl %ebx
subl $20, %esp
movl 8(%ebp), %eax # stream --> %eax
movl 20(%ebp), %edx # event_len --> %edx
leal (%eax,%edx), %ebx # sum --> %ebx
cmpl 12(%ebp), %ebx # compare sum with ends
jae L2
movl 16(%ebp), %ecx # iwe --> %ecx
movl %edx, (%ecx) # event_len --> iwe->len (!!)
movl %edx, 8(%esp) # event_len --> stack
movl %ecx, 4(%esp) # iwe --> stack
movl %eax, (%esp) # stream --> stack
call _memcpy
movl %ebx, %eax # sum --> retval
L2:
addl $20, %esp
popl %ebx
leave
ret
And for the second link in Michael's answer,
*(unsigned short *)&a = 4;
gcc will usually (always?) give a warning. But I believe a valid solution to this (for gcc) is to use:
#define CAST(type, x) (((union {typeof(x) src; type dst;}*)&(x))->dst)
// ...
CAST(unsigned short, a) = 4;
I've asked SO whether this is OK in gcc-strict-aliasing-and-casting-through-a-union, but so far nobody disagrees.
No horror story of my own, but here are some quotes from Linus Torvalds (sorry if these are already in one of the linked references in the question):
http://lkml.org/lkml/2003/2/26/158:
Date Wed, 26 Feb 2003 09:22:15 -0800
Subject Re: Invalid compilation without -fno-strict-aliasing
From Jean Tourrilhes <>
On Wed, Feb 26, 2003 at 04:38:10PM +0100, Horst von Brand wrote:
Jean Tourrilhes <> said:
It looks like a compiler bug to me...
Some users have complained that when the following code is
compiled without the -fno-strict-aliasing, the order of the write and
memcpy is inverted (which mean a bogus len is mem-copied into the
stream).
Code (from linux/include/net/iw_handler.h) :
static inline char *
iwe_stream_add_event(char * stream, /* Stream of events */
char * ends, /* End of stream */
struct iw_event *iwe, /* Payload */
int event_len) /* Real size of payload */
{
/* Check if it's possible */
if((stream + event_len) < ends) {
iwe->len = event_len;
memcpy(stream, (char *) iwe, event_len);
stream += event_len;
}
return stream;
}
IMHO, the compiler should have enough context to know that the
reordering is dangerous. Any suggestion to make this simple code more
bullet proof is welcomed.
The compiler is free to assume char *stream and struct iw_event *iwe point
to separate areas of memory, due to strict aliasing.
Which is true and which is not the problem I'm complaining about.
(Note with hindsight: this code is fine, but Linux's implementation of memcpy was a macro that cast to long * to copy in larger chunks. With a correctly-defined memcpy, gcc -fstrict-aliasing isn't allowed to break this code. But it means you need inline asm to define a kernel memcpy if your compiler doesn't know how turn a byte-copy loop into efficient asm, which was the case for gcc before gcc7)
And Linus Torvald's comment on the above:
Jean Tourrilhes wrote:
>
It looks like a compiler bug to me...
Why do you think the kernel uses "-fno-strict-aliasing"?
The gcc people are more interested in trying to find out what can be
allowed by the c99 specs than about making things actually work. The
aliasing code in particular is not even worth enabling, it's just not
possible to sanely tell gcc when some things can alias.
Some users have complained that when the following code is
compiled without the -fno-strict-aliasing, the order of the write and
memcpy is inverted (which mean a bogus len is mem-copied into the
stream).
The "problem" is that we inline the memcpy(), at which point gcc won't
care about the fact that it can alias, so they'll just re-order
everything and claim it's out own fault. Even though there is no sane
way for us to even tell gcc about it.
I tried to get a sane way a few years ago, and the gcc developers really
didn't care about the real world in this area. I'd be surprised if that
had changed, judging by the replies I have already seen.
I'm not going to bother to fight it.
Linus
http://www.mail-archive.com/linux-btrfs#vger.kernel.org/msg01647.html:
Type-based aliasing is stupid. It's so incredibly stupid that it's not even funny. It's broken. And gcc took the broken notion, and made it more so by making it a "by-the-letter-of-the-law" thing that makes no sense.
...
I know for a fact that gcc would re-order write accesses that were clearly to (statically) the same address. Gcc would suddenly think that
unsigned long a;
a = 5;
*(unsigned short *)&a = 4;
could be re-ordered to set it to 4 first (because clearly they don't alias - by reading the standard), and then because now the assignment of 'a=5' was later, the assignment of 4 could be elided entirely! And if somebody complains that the compiler is insane, the compiler people would say "nyaah, nyaah, the standards people said we can do this", with absolutely no introspection to ask whether it made any SENSE.
SWIG generates code that depends on strict aliasing being off, which can cause all sorts of problems.
SWIGEXPORT jlong JNICALL Java_com_mylibJNI_make_1mystruct_1_1SWIG_12(
JNIEnv *jenv, jclass jcls, jint jarg1, jint jarg2) {
jlong jresult = 0 ;
int arg1 ;
int arg2 ;
my_struct_t *result = 0 ;
(void)jenv;
(void)jcls;
arg1 = (int)jarg1;
arg2 = (int)jarg2;
result = (my_struct_t *)make_my_struct(arg1,arg2);
*(my_struct_t **)&jresult = result; /* <<<< horror*/
return jresult;
}
gcc, aliasing, and 2-D variable-length arrays: The following sample code copies a 2x2 matrix:
#include <stdio.h>
static void copy(int n, int a[][n], int b[][n]) {
int i, j;
for (i = 0; i < 2; i++) // 'n' not used in this example
for (j = 0; j < 2; j++) // 'n' hard-coded to 2 for simplicity
b[i][j] = a[i][j];
}
int main(int argc, char *argv[]) {
int a[2][2] = {{1, 2},{3, 4}};
int b[2][2];
copy(2, a, b);
printf("%d %d %d %d\n", b[0][0], b[0][1], b[1][0], b[1][1]);
return 0;
}
With gcc 4.1.2 on CentOS, I get:
$ gcc -O1 test.c && a.out
1 2 3 4
$ gcc -O2 test.c && a.out
10235717 -1075970308 -1075970456 11452404 (random)
I don't know whether this is generally known, and I don't know whether this a bug or a feature. I can't duplicate the problem with gcc 4.3.4 on Cygwin, so it may have been fixed. Some work-arounds:
Use __attribute__((noinline)) for copy().
Use the gcc switch -fno-strict-aliasing.
Change the third parameter of copy() from b[][n] to b[][2].
Don't use -O2 or -O3.
Further notes:
This is an answer, after a year and a day, to my own question (and I'm a bit surprised there are only two other answers).
I lost several hours with this on my actual code, a Kalman filter. Seemingly small changes would have drastic effects, perhaps because of changing gcc's automatic inlining (this is a guess; I'm still uncertain). But it probably doesn't qualify as a horror story.
Yes, I know you wouldn't write copy() like this. (And, as an aside, I was slightly surprised to see gcc did not unroll the double-loop.)
No gcc warning switches, include -Wstrict-aliasing=, did anything here.
1-D variable-length arrays seem to be OK.
Update: The above does not really answer the OP's question, since he (i.e. I) was asking about cases where strict aliasing 'legitimately' broke your code, whereas the above just seems to be a garden-variety compiler bug.
I reported it to GCC Bugzilla, but they weren't interested in the old 4.1.2, even though (I believe) it is the key to the $1-billion RHEL5. It doesn't occur in 4.2.4 up.
And I have a slightly simpler example of a similar bug, with only one matrix. The code:
static void zero(int n, int a[][n]) {
int i, j;
for (i = 0; i < n; i++)
for (j = 0; j < n; j++)
a[i][j] = 0;
}
int main(void) {
int a[2][2] = {{1, 2},{3, 4}};
zero(2, a);
printf("%d\n", a[1][1]);
return 0;
}
produces the results:
gcc -O1 test.c && a.out
0
gcc -O1 -fstrict-aliasing test.c && a.out
4
It seems it is the combination -fstrict-aliasing with -finline which causes the bug.
here is mine:
http://forum.openscad.org/CGAL-3-6-1-causing-errors-but-CGAL-3-6-0-OK-tt2050.html
it caused certain shapes in a CAD program to be drawn incorrectly. thank goodness for the project's leaders work on creating a regression test suite.
the bug only manifested itself on certain platforms, with older versions of GCC and older versions of certain libraries. and then only with -O2 turned on. -fno-strict-aliasing solved it.
The Common Initial Sequence rule of C used to be interpreted as making it
possible to write a function which could work on the leading portion of a
wide variety of structure types, provided they start with elements of matching
types. Under C99, the rule was changed so that it only applied if the structure
types involved were members of the same union whose complete declaration was visible at the point of use.
The authors of gcc insist that the language in question is only applicable if
the accesses are performed through the union type, notwithstanding the facts
that:
There would be no reason to specify that the complete declaration must be visible if accesses had to be performed through the union type.
Although the CIS rule was described in terms of unions, its primary
usefulness lay in what it implied about the way in which structs were
laid out and accessed. If S1 and S2 were structures that shared a CIS,
there would be no way that a function that accepted a pointer to an S1
and an S2 from an outside source could comply with C89's CIS rules
without allowing the same behavior to be useful with pointers to
structures that weren't actually inside a union object; specifying CIS
support for structures would thus have been redundant given that it was
already specified for unions.
The following code returns 10, under gcc 4.4.4. Is anything wrong with the union method or gcc 4.4.4?
int main()
{
int v = 10;
union vv {
int v;
short q;
} *s = (union vv *)&v;
s->v = 1;
return v;
}

Resources