Religious arguments aside:
Option1:
if (pointer[i] == NULL) ...
Option2:
if (!pointer[i]) ...
In C is option1 functionally equivalent to option2?
Does the latter resolve quicker due to the absence of a comparison?
I prefer the explicit style (first version). It makes it obvious that there is a pointer involved and not an integer or something else, but it's just a matter of style.
From a performance point of view, it should make no difference.
Equivalent. It says so in the language standard. And people have the damndest religious preferences!
I like the second, other people like the first.
Actually, I prefer a third kind to the first:
if (NULL == ptr) {
...
}
Because then I:
can't accidentally type a single = (assigning to NULL won't compile)
won't overlook the == NULL and misread the condition as its opposite when it spans multiple lines (see the sketch below)
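A minimal sketch of what those two points mean in practice (hypothetical function, for illustration only):
#include <stddef.h>

void check(int *ptr)
{
    /* Typo: a single '=' assigns NULL to ptr and the if tests the result.
       This compiles (gcc only warns with -Wall/-Wparentheses). */
    if (ptr = NULL) {
        /* never reached */
    }

    /* With the constant on the left the same typo is a hard error,
       because NULL cannot be assigned to: */
    /* if (NULL = ptr) {}   -- does not compile */
    if (NULL == ptr) {
        /* ptr is null */
    }
}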
Functionally they are equivalent.
Even if a null pointer is not represented by all-zero bits, if (!ptr) still compares ptr against the null pointer.
The following is incorrect. It's still here because there are many comments referring to it:
Do not compare a pointer with literal zero, however. It will work almost everywhere but is undefined behavior IIRC.
It is often useful to assume that compiler writers have at least a minimum of intelligence. Your compiler is not written by concussed ducklings. It is written by human beings, with years of programming experience, and years spent studying compiler theory. This doesn't mean that your compiler is perfect, and always knows best, but it does mean that it is perfectly capable of handling trivial automatic optimizations.
If the two forms are equivalent, then why wouldn't the compiler just translate one into the other to ensure both are equally efficient?
If if (pointer[i] == NULL) was slower than if (!pointer[i]), wouldn't the compiler just change it into the second, more efficient form?
So no, assuming they are equivalent, they are equally efficient.
As for the first part of the question, yes, they are equivalent. The language standard actually states this explicitly somewhere -- a pointer evaluates to true if it is non-NULL, and false if it is NULL, so the two are exactly identical.
Almost certainly no difference in performance. I prefer the implicit style of the second, though.
NULL is defined in one of the standard header files, typically as:
#define NULL ((void*)0)
So either way, you are comparing against zero, and the compiler should optimize both the same way. Every processor has some "optimization" or opcode for comparing with zero.
Early optimization is bad. Micro-optimization is also bad; unless you are trying to squeeze every last bit of Hz from your CPU, there is no point in doing it. As people have already shown, the compiler will optimize most of your code away anyway.
It's best to make your code as concise and readable as possible. If this is more readable
if (!ptr)
than this
if (NULL==ptr)
then use it. As long as everyone who will be reading your code agrees.
Personally I use the fully spelled-out form (NULL==ptr) so it is clear what I am checking for. It might be longer to type, but I can easily read it. I'd think the ! in !ptr would be easy to miss if reading too quickly.
It really depends on the compiler. I'd be surprised if most modern C compilers didn't generate virtually identical code for the specific scenario you describe.
Get your compiler to generate an assembly listing for each of those scenarios and you can answer your own question (for your particular compiler :)).
And even if they are different, the performance difference will probably be irrelevant in practical applications.
Turn on compiler optimization and they're basically the same. I tested this on gcc 4.3.3:
#include <stdio.h>

int main (int argc, char** argv) {
    char c = getchar();
    int x = (c == 'x');
    if(x == NULL)
        putchar('y');
    return 0;
}
vs
#include <stdio.h>

int main (int argc, char** argv) {
    char c = getchar();
    int x = (c == 'x');
    if(!x)
        putchar('y');
    return 0;
}
gcc -O -o test1 test1.c
gcc -O -o test2 test2.c
diff test1 test2
produced no output :)
I did an assembly dump and found the difference between the two versions:
@@ -11,8 +11,7 @@
pushl %ecx
subl $20, %esp
movzbl -9(%ebp), %eax
- movsbl %al,%eax
- testl %eax, %eax
+ testb %al, %al
It looks like the latter actually generates one instruction and the first generates two, but this is pretty unscientific.
This is gcc, no optimizations:
test1.c:
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[])
{
    char *pointer[5];
    if(pointer[0] == NULL) {
        exit(1);
    }
    exit(0);
}
test2.c: Change pointer[0] == NULL to !pointer[0]
gcc -S test1.c, gcc -S test2.c, diff -u test1.s test2.s
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    char pointer[5];
    /* This is nonsense: you are comparing a char to a pointer constant */
    if(pointer[0] == NULL) {
        exit(1);
    }
    ...
}
=> ...
movzbl 9(%ebp), %eax # your code compares a 1 byte value to a signed 4 bytes one
movsbl %al,%eax # Will result in sign extension...
testl %eax, %eax
...
Beware: gcc should have emitted a warning here; if it did not, compile with the -Wall flag.
Also, you should always compile with gcc optimizations turned on.
BTW, precede your variable with the volatile keyword to prevent gcc from optimizing it away...
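For instance, applied to the test1.c above, a hedged sketch of that volatile suggestion (the qualifier sits on the array elements so the load of pointer[0] cannot be discarded):
#include <stdlib.h>

int main(int argc, char *argv[])
{
    /* volatile on the elements keeps gcc from optimizing the
       (otherwise unused) load of pointer[0] away */
    char * volatile pointer[5];
    if (pointer[0] == NULL) {
        exit(1);
    }
    exit(0);
}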
Always mention your compiler build version :)
Consider:
#include <stdio.h>

char toUpper(char);

int main(void)
{
    char ch, ch2;
    printf("lowercase input: ");
    ch = getchar();
    ch2 = toUpper(ch);
    printf("%c ==> %c\n", ch, ch2);
    return 0;
}

char toUpper(char c)
{
    if(c>='a' && c<='z')
        c = c - 32;
}
In the toUpper function, the return type is char, but there isn't any return statement in toUpper(). I compiled the source code with gcc (GCC) 4.5.1 20100924 (Red Hat 4.5.1-4) on Fedora 14.
Of course, a warning is issued: "warning: control reaches end of non-void function", but it still appears to work.
What has happened in that code during compile with gcc?
When the C program was compiled into assembly language, your toUpper function ended up like this, perhaps:
_toUpper:
LFB4:
pushq %rbp
LCFI3:
movq %rsp, %rbp
LCFI4:
movb %dil, -4(%rbp)
cmpb $96, -4(%rbp)
jle L8
cmpb $122, -4(%rbp)
jg L8
movzbl -4(%rbp), %eax
subl $32, %eax
movb %al, -4(%rbp)
L8:
leave
ret
The subtraction of 32 was carried out in the %eax register. And in the x86 calling convention, that is the register in which the return value is expected to be! So... you got lucky.
But please pay attention to the warnings. They are there for a reason!
It depends on the Application Binary Interface and which registers are used for the computation.
E.g. on x86, the return value is passed back in EAX, so gcc is most likely using that register to hold the result of the calculation as well.
Essentially, c is pushed into the spot that should later be filled with the return value; since it's not overwritten by use of return, it ends up as the value returned.
Note that relying on this (in C, or any other language where this isn't an explicit language feature, like Perl), is a Bad Idea™. In the extreme.
One missing thing that's important to understand is that it's rarely a diagnosable error to omit a return statement. Consider this function:
int f(int x)
{
if (x!=42) return x*x;
}
As long as you never call it with an argument of 42, a program containing this function is perfectly valid C and does not invoke any undefined behavior, despite the fact that it would invoke UB if you called f(42) and subsequently attempted to use the return value.
As such, while it's possible for a compiler to provide warning heuristics for missing return statements, it's impossible to do so without false positives or false negatives. This is a consequence of the impossibility of solving the halting problem.
I can't tell you the specifics of your platform as I don't know it, but there is a general answer to the behaviour you see.
When some function that has a return is compiled, the compiler will use a convention on how to return that data. It could be a machine register, or a defined memory location such as via a stack or whatever (though generally machine registers are used). The compiled code may also use that location (register or otherwise) while doing the work of the function.
If the function doesn't return anything, then the compiler will not generate code that explicitly fills that location with a return value. However, like I said above, it may use that location during the function. When you write code that reads the return value (ch2 = toUpper(ch);), the compiler will write code that uses its convention on how to retrieve that return value from the conventional location. As far as the caller code is concerned, it will just read the value from the location, even if nothing was explicitly written there. Hence you get a value.
Now look at Ray's example. The compiler used the EAX register to store the results of the upper casing operation. It just so happens, this is probably the location that return values are written to. On the calling side, ch2 is loaded with the value that's in EAX - hence a phantom return. This is only true of the x86 range of processors, as on other architectures the compiler may use a completely different scheme in deciding how the convention should be organised.
However, good compilers will try to optimise according to a set of local conditions, knowledge of the code, rules, and heuristics. So an important thing to note is that it is just luck that this works. The compiler could optimise differently and not do this - you should not rely on the behaviour.
You should keep in mind that such code may crash depending on the compiler. For example, Clang generates a ud2 instruction at the end of such function and your app will crash at run time.
There are no local variables, so the value on the top of the stack at the end of the function will be the parameter c. The value at the top of the stack upon exiting is the return value. So whatever c holds, that's the return value.
I have tried a small program:
#include <stdio.h>

int f1() {
}

int main() {
    printf("TEST: <%d>\n", f1());
    printf("TEST: <%d>\n", f1());
    printf("TEST: <%d>\n", f1());
    printf("TEST: <%d>\n", f1());
    printf("TEST: <%d>\n", f1());
}
Result:
TEST: <1>
TEST: <10>
TEST: <11>
TEST: <11>
TEST: <11>
I have used the MinGW-GCC compiler, so there might be differences.
You could just play around and try, e.g., a char function.
As long as you don't use the result value, it will still work fine.
#include <stdio.h>

char f1() {
}

int main() {
    f1();
}
But I would still recommend either declaring the function void or returning an explicit value.
Your function seems to need a return:
char toUpper(char c)
{
    if(c>='a' && c<='z')
        c = c - 32;
    return c;
}
When I attempt to run this code as it is, I receive the compiler message "error: incompatible types in return". I marked the location of the error in my code. If I take the line out, then the compiler is happy.
The problem is that I want to return a value representing invalid input to the function (which in this case means calling f2(2)). I only want a struct returned with data if the function is called without using 2 as a parameter.
I feel the only two ways to go are to either:
Make the function return a struct pointer instead of the struct itself, but then my caller function will look funny as I have to change y.b to y->b, and the operation may be slower due to the extra step of fetching the data from memory.
Allocate extra memory, zero-byte fill it, and set the return value to the struct in that location in memory. (example: return x[nnn]; instead of return x[0];). This approach will use more memory and some processing to zero-byte fill it.
Ultimately, I'm looking for a solution that will be fastest and cleanest (in terms of code) in the long run. If I have to be stuck with using -> to address members of elements then I guess that's the way to go.
Does anyone have a solution that uses the least cpu power?
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct{
    long a;
    char b;
}ab;

static char dat[500];

ab f2(int set){
    ab* x=(ab*)dat;
    if (set==2){return NULL;} /* <-- error: incompatible types in return */
    if (set==1){
        x->a=1;
        x->b='1';
        x++;
        x->a=2;
        x->b='2';
        x=(ab*)dat;
    }
    return x[0];
}

int main(){
    ab y;
    y=f2(1);
    printf("%c",y.b);
    y.b='D';
    y=f2(0);
    printf("%c",y.b);
    return 0;
}
If you care about speed, it is implementation specific.
Notice that the Linux x86-64 ABI specifies that a struct of exactly two scalar members (that is, integers, doubles, or pointers - each of which fits in a single machine register - but not nested structs or other aggregate data) is returned through two registers (without going through the stack), and that is quite fast.
BTW
if (set==2){return NULL;} //wrong
is obviously wrong. You could code:
if (set==2) return (ab){0,0};
Also,
ab* x=(ab*)dat; // suspicious
looks suspicious to me (since you return x[0]; later). You are not guaranteed that dat is suitably aligned (e.g. to 8 or 16 bytes), and on some platforms (notably x86-64) if dat is misaligned you are at least losing performance (actually, it is undefined behavior).
BTW, I would suggest always returning with instructions like return (ab){l,c}; (where l is an expression convertible to long and c is an expression convertible to char); this is probably the easiest to read, and will be optimized to load the two return registers.
Of course if you care about performance, for benchmarking purposes, you should enable optimizations (and warnings), e.g. compile with gcc -Wall -Wextra -O2 -march=native if using GCC; on my system (Linux/x86-64 with GCC 5.2) the small function
ab give_both(long x, char y)
{ return (ab){x,y}; }
is compiled (with gcc -O2 -march=native -fverbose-asm -S) into:
.globl give_both
.type give_both, #function
give_both:
.LFB0:
.file 1 "ab.c"
.loc 1 7 0
.cfi_startproc
.LVL0:
.loc 1 7 0
xorl %edx, %edx # D.2139
movq %rdi, %rax # x, x
movb %sil, %dl # y, D.2139
ret
.cfi_endproc
You can see that all the code uses registers, and no memory is touched at all.
I would use the return value as an error code, and the caller passes in a pointer to his struct such as:
int f2(int set, ab *outAb); // returns -1 if invalid input and 0 otherwise
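A hedged sketch of what that might look like for the code above (the body is illustrative, not the original):
#include <stdio.h>
#include <string.h>

typedef struct {
    long a;
    char b;
} ab;

static char dat[500];

/* Returns -1 on invalid input, 0 otherwise; the result is written
   through outAb instead of being returned by value. */
static int f2(int set, ab *outAb)
{
    ab *x = (ab *)dat;

    if (set == 2 || outAb == NULL)
        return -1;              /* invalid input: nothing is written */

    if (set == 1) {
        x[0].a = 1; x[0].b = '1';
        x[1].a = 2; x[1].b = '2';
    }
    *outAb = x[0];              /* caller keeps the convenient y.b syntax */
    return 0;
}

int main(void)
{
    ab y;
    if (f2(1, &y) == 0)
        printf("%c", y.b);
    if (f2(2, &y) != 0)
        printf("!");            /* invalid input detected via return code */
    return 0;
}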
I had been struggling for weeks with a poor-performing translator I had written.
On the following simple benchmark
#include <stdio.h>

int main()
{
    int x;
    char buf[2048];
    FILE *test = fopen("test.out", "wb");
    setvbuf(test, buf, _IOFBF, sizeof buf);
    for(x=0;x<1024*1024; x++)
        fprintf(test, "%04d", x);
    fclose(test);
    return 0;
}
we see the following result
bash-3.1$ gcc -O2 -static test.c -o test
bash-3.1$ time ./test
real 0m0.334s
user 0m0.015s
sys 0m0.016s
As you can see, the moment the "-std=c99" flag is added in, performance comes crashing down:
bash-3.1$ gcc -O2 -static -std=c99 test.c -o test
bash-3.1$ time ./test
real 0m2.477s
user 0m0.015s
sys 0m0.000s
The compiler I'm using is gcc 4.6.2 mingw32.
The file generated is about 12 MB, so this is a difference of about 21 MB/s between the two.
Running diff shows that the generated files are identical.
I assumed this has something to do with file locking in fprintf, of which the program makes heavy use, but I haven't been able to find a way to switch that off in the C99 version.
I tried flockfile on the stream I use at the beginning of the program, and a corresponding funlockfile at the end, but was greeted with compiler errors about implicit declarations, and linker errors claiming undefined references to those functions.
Could there be another explanation for this problem, and more importantly, is there any way to use C99 on windows without paying such an enormous performance price?
Edit:
After looking at the code generated by these options, it looks like in the slow versions, mingw sticks in the following:
_fprintf:
LFB0:
.cfi_startproc
subl $28, %esp
.cfi_def_cfa_offset 32
leal 40(%esp), %eax
movl %eax, 8(%esp)
movl 36(%esp), %eax
movl %eax, 4(%esp)
movl 32(%esp), %eax
movl %eax, (%esp)
call ___mingw_vfprintf
addl $28, %esp
.cfi_def_cfa_offset 4
ret
.cfi_endproc
In the fast version, this simply does not exist; otherwise, both are exactly the same. __mingw_vfprintf appears to be the slowpoke here, but I have no idea what behavior it needs to emulate that makes it so slow.
After some digging in the source code, I have found why the MinGW function is so terribly slow:
At the beginning of a [v,f,s]printf in MinGW, there is some innocent-looking initialization code:
__pformat_t stream = {
    dest,                                        /* output goes to here        */
    flags &= PFORMAT_TO_FILE | PFORMAT_NOLIMIT,  /* only these valid initially */
    PFORMAT_IGNORE,                              /* no field width yet         */
    PFORMAT_IGNORE,                              /* nor any precision spec     */
    PFORMAT_RPINIT,                              /* radix point uninitialised  */
    (wchar_t)(0),                                /* leave it unspecified       */
    0,                                           /* zero output char count     */
    max,                                         /* establish output limit     */
    PFORMAT_MINEXP                               /* exponent chars preferred   */
};
However, PFORMAT_MINEXP is not what it appears to be:
#ifdef _WIN32
# define PFORMAT_MINEXP __pformat_exponent_digits()
# ifndef _TWO_DIGIT_EXPONENT
# define _get_output_format() 0
# define _TWO_DIGIT_EXPONENT 1
# endif
static __inline__ __attribute__((__always_inline__))
int __pformat_exponent_digits( void )
{
    char *exponent_digits = getenv( "PRINTF_EXPONENT_DIGITS" );
    return ((exponent_digits != NULL) && ((unsigned)(*exponent_digits - '0') < 3))
           || (_get_output_format() & _TWO_DIGIT_EXPONENT)
        ? 2
        : 3
        ;
}
This winds up getting called every time I want to print, and getenv on windows must not be very quick. Replacing that define with a 2 brings the runtime back to where it should be.
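For illustration, a hedged sketch of the kind of fix (not the actual MinGW patch): look the variable up once and cache the result, instead of calling getenv() on every *printf call.
#include <stdlib.h>

static int pformat_exponent_digits_cached(void)
{
    /* -1 means "not looked up yet"; a real fix would also need to
       consider thread safety. */
    static int cached = -1;
    if (cached < 0) {
        const char *e = getenv("PRINTF_EXPONENT_DIGITS");
        cached = ((e != NULL) && ((unsigned)(*e - '0') < 3)) ? 2 : 3;
    }
    return cached;
}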
So, the answer comes down to this: when using -std=c99 or any ANSI-compliant mode, MinGW switches the CRT runtime with its own. Normally, this wouldn't be an issue, but the MinGW lib had a bug which slowed its formatting functions down far beyond anything imaginable.
Using -std=c99 disables all GNU extensions.
With GNU extensions and optimization, your fprintf(test, "B") is probably replaced by a fputc('B', test).
Note this answer is obsolete, see https://stackoverflow.com/a/13973562/611560 and https://stackoverflow.com/a/13973933/611560
After some consideration of your assembler, it looks like the slow version is using the *printf() implementation of MinGW, based undoubtedly on the GCC one, while the fast version is using the Microsoft implementation from msvcrt.dll.
Now, the MS one is notable for lacking a lot of features that the GCC one does implement. Some of these are GNU extensions, but some others are for C99 conformance. And since you are using -std=c99 you are requesting the conformance.
But why so slow? Well, one factor is simplicity: the MS version is far simpler, so it is expected to run faster, even in the trivial cases. Another factor is that you are running under Windows, so it is expected that the MS version is more efficient than one copied from the Unix world.
Does it explain a factor of x10? Probably not...
Another thing you can try:
Replace fprintf() with sprintf(), printing into a memory buffer without touching the file at all. Then you can try doing fwrite() without printfing. That way you can guess if the loss is in the formatting of the data or in the writing to the FILE.
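A rough sketch of that split, reusing the benchmark above (illustrative; the fixed string in the fwrite loop is an assumption):
#include <stdio.h>

int main()
{
    int x, total = 0;
    char line[16];
    char buf[2048];
    FILE *test = fopen("test.out", "wb");
    setvbuf(test, buf, _IOFBF, sizeof buf);

    /* Pass 1: formatting only, no file I/O (total keeps the calls live). */
    for (x = 0; x < 1024*1024; x++)
        total += sprintf(line, "%04d", x);

    /* Pass 2: file I/O only, no formatting. */
    for (x = 0; x < 1024*1024; x++)
        fwrite("0000", 1, 4, test);

    fclose(test);
    printf("%d characters formatted\n", total);
    return 0;
}
Timing each pass separately (or commenting one out) shows whether the formatting or the writing dominates.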
Since MinGW32 3.15, compliant printf functions are available to use instead of those found in Microsoft C runtime (CRT).
The new printf functions are used when compiling in strict ANSI, POSIX and/or C99 modes.
For more information see the mingw32 changelog
You can use __msvcrt_fprintf() to use the fast (non compliant) function.
In my SAX xml parsing callback (XCode 4, LLVM), I am doing a lot of calls to
this type of code:
static const char* kFoo = "Bar";

void SaxCallBack(char* sax_string,.....)
{
    if ( strcmp(sax_string, kFoo, strlen(kFoo) ) == 0)
    {
    }
}
Is it safe to assume that strlen(kFoo) is optimized by the compiler?
(The Apple sample code had pre-calculated strlen(kFoo), but I think this is error prone for large numbers of constant strings.)
Edit: Motivation for optimizing: parsing my SVG map on an iPod touch 2G takes 5 seconds (!) using NSXMLParser. So, I want to switch to libxml2 and optimize the string comparisons.
If by "LLVM" you mean clang, then yes, you can count on clang -O to optimize the strlen away. Here is what the code for your function looks like:
_SaxCallBack:
Leh_func_begin1:
pushq %rbp
Ltmp0:
movq %rsp, %rbp
Ltmp1:
leaq L_.str1(%rip), %rsi
movl $3, %edx
callq _strncmp
...
I changed the strcmp into strncmp, but the third argument has indeed been replaced by the immediate $3.
Note that gcc 4.2.1 -O3 does not optimize this strlen call, and that you can only expect it to work in the precise conditions of your question (especially, the string and the call to strlen must be in the same file).
Don't write things like:
static const char* kFoo = "Bar";
You've created a variable named kFoo that points to constant data. The compiler might be able to detect that this variable does not change and optimize it out, but if not, you've bloated your program's data segment.
Also don't write things like:
static const char *const kFoo = "Bar";
Now your variable kFoo is const-qualified and non-modifiable, but if it's used in position independent code (shared libraries etc.), the contents will still vary at runtime and thus it will add startup and memory cost to your program. Instead, use:
static const char kFoo[] = "Bar";
or even:
#define kFoo "Bar"
In general you can't count on it. However, you could use 'sizeof' and apply it to a string literal. Of course, this means that you can't define 'kFoo' the way it was originally defined.
The following should work on all compilers and on all optimization levels.
#define kFoo "..."
... strcmp(... sizeof(kFoo))
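Spelled out, a small sketch of that approach (sizeof here is a compile-time constant: "Bar" occupies 4 bytes including the terminating NUL, hence the -1 for a prefix comparison):
#include <string.h>

#define kFoo "Bar"

/* Hypothetical callback, for illustration only. */
void SaxCallBack(const char *sax_string)
{
    /* sizeof(kFoo) - 1 == strlen("Bar"), computed at compile time */
    if (strncmp(sax_string, kFoo, sizeof(kFoo) - 1) == 0) {
        /* sax_string starts with "Bar" */
    }
}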
Follow-up question:
Have you tested the following?
static std::string const kFoo = "BAR";

void SaxCallBack(char* sax_string,.....)
{
    if ( sax_string == kFoo)
    {
    }
}
It's a net win in readability, but I have no idea about the performance cost.
As an alternative, if you must dispatch by yourself, I have found that using a state-machine like approach (with stack) is much better readability-wise, and might also win performance-wise (instead of having a large number of tags to switch on you only have the tags that can be met right now).
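For what it's worth, a very rough sketch of such a state-machine approach (element names, states, and the fixed-depth stack are all made up for illustration):
#include <string.h>

/* Only the tags that are legal in the current state need to be compared. */
enum state { IN_DOCUMENT, IN_SVG, IN_PATH };

struct parser {
    enum state stack[64];   /* nesting of states; bounds checks omitted */
    int depth;
};

static void start_element(struct parser *p, const char *name)
{
    switch (p->stack[p->depth]) {
    case IN_DOCUMENT:
        if (strcmp(name, "svg") == 0)
            p->stack[++p->depth] = IN_SVG;
        break;
    case IN_SVG:
        if (strcmp(name, "path") == 0)
            p->stack[++p->depth] = IN_PATH;
        break;
    case IN_PATH:
        /* children of <path>, if any */
        break;
    }
}

static void end_element(struct parser *p, const char *name)
{
    (void)name;
    if (p->depth > 0)
        p->depth--;          /* pop back to the enclosing state */
}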
In gcc-strict-aliasing-and-casting-through-a-union I asked whether anyone had encountered problems with union punning through pointers. So far, the answer seems to be No.
This question is broader: Do you have any horror stories about gcc and strict-aliasing?
Background: Quoting from AndreyT's answer in c99-strict-aliasing-rules-in-c-gcc:
"Strict aliasing rules are rooted in parts of the standard that were present in C and C++ since the beginning of [standardized] times. The clause that prohibits accessing object of one type through a lvalue of another type is present in C89/90 (6.3) as well as in C++98 (3.10/15). ... It is just that not all compilers wanted (or dared) to enforce it or rely on it."
Well, gcc is now daring to do so, with its -fstrict-aliasing switch. And this has caused some problems. See, for example, the excellent article http://davmac.wordpress.com/2009/10/ about a Mysql bug, and the equally excellent discussion in http://cellperformance.beyond3d.com/articles/2006/06/understanding-strict-aliasing.html.
Some other less-relevant links:
performance-impact-of-fno-strict-aliasing
strict-aliasing
when-is-char-safe-for-strict-pointer-aliasing
how-to-detect-strict-aliasing-at-compile-time
So to repeat, do you have a horror story of your own? Problems not indicated by -Wstrict-aliasing would, of course, be preferred. And other C compilers are also welcome.
Added June 2nd: The first link in Michael Burr's answer, which does indeed qualify as a horror story, is perhaps a bit dated (from 2003). I did a quick test, but the problem has apparently gone away.
Source:
#include <string.h>

struct iw_event {          /* dummy! */
    int len;
};

char *iwe_stream_add_event(
    char *stream,          /* Stream of events */
    char *ends,            /* End of stream */
    struct iw_event *iwe,  /* Payload */
    int event_len)         /* Real size of payload */
{
    /* Check if it's possible */
    if ((stream + event_len) < ends) {
        iwe->len = event_len;
        memcpy(stream, (char *) iwe, event_len);
        stream += event_len;
    }
    return stream;
}
The specific complaint is:
Some users have complained that when the [above] code is compiled without the -fno-strict-aliasing, the order of the write and memcpy is inverted (which means a bogus len is mem-copied into the stream).
Compiled code, using gcc 4.3.4 on CYGWIN with -O3 (please correct me if I am wrong--my assembler is a bit rusty!):
_iwe_stream_add_event:
pushl %ebp
movl %esp, %ebp
pushl %ebx
subl $20, %esp
movl 8(%ebp), %eax # stream --> %eax
movl 20(%ebp), %edx # event_len --> %edx
leal (%eax,%edx), %ebx # sum --> %ebx
cmpl 12(%ebp), %ebx # compare sum with ends
jae L2
movl 16(%ebp), %ecx # iwe --> %ecx
movl %edx, (%ecx) # event_len --> iwe->len (!!)
movl %edx, 8(%esp) # event_len --> stack
movl %ecx, 4(%esp) # iwe --> stack
movl %eax, (%esp) # stream --> stack
call _memcpy
movl %ebx, %eax # sum --> retval
L2:
addl $20, %esp
popl %ebx
leave
ret
And for the second link in Michael's answer,
*(unsigned short *)&a = 4;
gcc will usually (always?) give a warning. But I believe a valid solution to this (for gcc) is to use:
#define CAST(type, x) (((union {typeof(x) src; type dst;}*)&(x))->dst)
// ...
CAST(unsigned short, a) = 4;
I've asked SO whether this is OK in gcc-strict-aliasing-and-casting-through-a-union, but so far nobody disagrees.
No horror story of my own, but here are some quotes from Linus Torvalds (sorry if these are already in one of the linked references in the question):
http://lkml.org/lkml/2003/2/26/158:
Date Wed, 26 Feb 2003 09:22:15 -0800
Subject Re: Invalid compilation without -fno-strict-aliasing
From Jean Tourrilhes <>
On Wed, Feb 26, 2003 at 04:38:10PM +0100, Horst von Brand wrote:
Jean Tourrilhes <> said:
It looks like a compiler bug to me...
Some users have complained that when the following code is
compiled without the -fno-strict-aliasing, the order of the write and
memcpy is inverted (which mean a bogus len is mem-copied into the
stream).
Code (from linux/include/net/iw_handler.h) :
static inline char *
iwe_stream_add_event(char * stream, /* Stream of events */
char * ends, /* End of stream */
struct iw_event *iwe, /* Payload */
int event_len) /* Real size of payload */
{
/* Check if it's possible */
if((stream + event_len) < ends) {
iwe->len = event_len;
memcpy(stream, (char *) iwe, event_len);
stream += event_len;
}
return stream;
}
IMHO, the compiler should have enough context to know that the
reordering is dangerous. Any suggestion to make this simple code more
bullet proof is welcomed.
The compiler is free to assume char *stream and struct iw_event *iwe point
to separate areas of memory, due to strict aliasing.
Which is true and which is not the problem I'm complaining about.
(Note with hindsight: this code is fine, but Linux's implementation of memcpy was a macro that cast to long * to copy in larger chunks. With a correctly-defined memcpy, gcc -fstrict-aliasing isn't allowed to break this code. But it means you need inline asm to define a kernel memcpy if your compiler doesn't know how to turn a byte-copy loop into efficient asm, which was the case for gcc before gcc7.)
And Linus Torvald's comment on the above:
Jean Tourrilhes wrote:
>
It looks like a compiler bug to me...
Why do you think the kernel uses "-fno-strict-aliasing"?
The gcc people are more interested in trying to find out what can be
allowed by the c99 specs than about making things actually work. The
aliasing code in particular is not even worth enabling, it's just not
possible to sanely tell gcc when some things can alias.
Some users have complained that when the following code is
compiled without the -fno-strict-aliasing, the order of the write and
memcpy is inverted (which mean a bogus len is mem-copied into the
stream).
The "problem" is that we inline the memcpy(), at which point gcc won't
care about the fact that it can alias, so they'll just re-order
everything and claim it's out own fault. Even though there is no sane
way for us to even tell gcc about it.
I tried to get a sane way a few years ago, and the gcc developers really
didn't care about the real world in this area. I'd be surprised if that
had changed, judging by the replies I have already seen.
I'm not going to bother to fight it.
Linus
http://www.mail-archive.com/linux-btrfs#vger.kernel.org/msg01647.html:
Type-based aliasing is stupid. It's so incredibly stupid that it's not even funny. It's broken. And gcc took the broken notion, and made it more so by making it a "by-the-letter-of-the-law" thing that makes no sense.
...
I know for a fact that gcc would re-order write accesses that were clearly to (statically) the same address. Gcc would suddenly think that
unsigned long a;
a = 5;
*(unsigned short *)&a = 4;
could be re-ordered to set it to 4 first (because clearly they don't alias - by reading the standard), and then because now the assignment of 'a=5' was later, the assignment of 4 could be elided entirely! And if somebody complains that the compiler is insane, the compiler people would say "nyaah, nyaah, the standards people said we can do this", with absolutely no introspection to ask whether it made any SENSE.
SWIG generates code that depends on strict aliasing being off, which can cause all sorts of problems.
SWIGEXPORT jlong JNICALL Java_com_mylibJNI_make_1mystruct_1_1SWIG_12(
        JNIEnv *jenv, jclass jcls, jint jarg1, jint jarg2) {
    jlong jresult = 0 ;
    int arg1 ;
    int arg2 ;
    my_struct_t *result = 0 ;
    (void)jenv;
    (void)jcls;
    arg1 = (int)jarg1;
    arg2 = (int)jarg2;
    result = (my_struct_t *)make_my_struct(arg1,arg2);
    *(my_struct_t **)&jresult = result;   /* <<<< horror */
    return jresult;
}
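For comparison, a hedged sketch of two aliasing-safe ways to pack the pointer into the 64-bit return value (illustrative only; jlong is assumed to be a 64-bit integer type):
#include <stdint.h>
#include <string.h>

typedef struct my_struct my_struct_t;

/* Option 1: a plain integer conversion; no type punning at all. */
static int64_t pack_pointer(my_struct_t *result)
{
    return (int64_t)(intptr_t)result;
}

/* Option 2: copy the object representation; memcpy is always a legal
   way to move bytes between differently-typed objects, and it is a
   byte-for-byte equivalent of the original store. */
static int64_t pack_pointer_memcpy(my_struct_t *result)
{
    int64_t jresult = 0;
    memcpy(&jresult, &result, sizeof result);
    return jresult;
}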
gcc, aliasing, and 2-D variable-length arrays: The following sample code copies a 2x2 matrix:
#include <stdio.h>

static void copy(int n, int a[][n], int b[][n]) {
    int i, j;
    for (i = 0; i < 2; i++)       // 'n' not used in this example
        for (j = 0; j < 2; j++)   // 'n' hard-coded to 2 for simplicity
            b[i][j] = a[i][j];
}

int main(int argc, char *argv[]) {
    int a[2][2] = {{1, 2},{3, 4}};
    int b[2][2];
    copy(2, a, b);
    printf("%d %d %d %d\n", b[0][0], b[0][1], b[1][0], b[1][1]);
    return 0;
}
With gcc 4.1.2 on CentOS, I get:
$ gcc -O1 test.c && a.out
1 2 3 4
$ gcc -O2 test.c && a.out
10235717 -1075970308 -1075970456 11452404 (random)
I don't know whether this is generally known, and I don't know whether this a bug or a feature. I can't duplicate the problem with gcc 4.3.4 on Cygwin, so it may have been fixed. Some work-arounds:
Use __attribute__((noinline)) for copy().
Use the gcc switch -fno-strict-aliasing.
Change the third parameter of copy() from b[][n] to b[][2].
Don't use -O2 or -O3.
Further notes:
This is an answer, after a year and a day, to my own question (and I'm a bit surprised there are only two other answers).
I lost several hours with this on my actual code, a Kalman filter. Seemingly small changes would have drastic effects, perhaps because of changing gcc's automatic inlining (this is a guess; I'm still uncertain). But it probably doesn't qualify as a horror story.
Yes, I know you wouldn't write copy() like this. (And, as an aside, I was slightly surprised to see gcc did not unroll the double-loop.)
No gcc warning switches, including -Wstrict-aliasing=, did anything here.
1-D variable-length arrays seem to be OK.
Update: The above does not really answer the OP's question, since he (i.e. I) was asking about cases where strict aliasing 'legitimately' broke your code, whereas the above just seems to be a garden-variety compiler bug.
I reported it to GCC Bugzilla, but they weren't interested in the old 4.1.2, even though (I believe) it is the key to the $1-billion RHEL5. It doesn't occur in 4.2.4 and up.
And I have a slightly simpler example of a similar bug, with only one matrix. The code:
#include <stdio.h>

static void zero(int n, int a[][n]) {
    int i, j;
    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            a[i][j] = 0;
}

int main(void) {
    int a[2][2] = {{1, 2},{3, 4}};
    zero(2, a);
    printf("%d\n", a[1][1]);
    return 0;
}
produces the results:
gcc -O1 test.c && a.out
0
gcc -O1 -fstrict-aliasing test.c && a.out
4
It seems it is the combination of -fstrict-aliasing with -finline that causes the bug.
Here is mine:
http://forum.openscad.org/CGAL-3-6-1-causing-errors-but-CGAL-3-6-0-OK-tt2050.html
It caused certain shapes in a CAD program to be drawn incorrectly. Thank goodness for the project leaders' work on creating a regression test suite.
The bug only manifested itself on certain platforms, with older versions of GCC and older versions of certain libraries, and then only with -O2 turned on. -fno-strict-aliasing solved it.
The Common Initial Sequence rule of C used to be interpreted as making it
possible to write a function which could work on the leading portion of a
wide variety of structure types, provided they start with elements of matching
types. Under C99, the rule was changed so that it only applied if the structure
types involved were members of the same union whose complete declaration was visible at the point of use.
The authors of gcc insist that the language in question is only applicable if
the accesses are performed through the union type, notwithstanding the facts
that:
There would be no reason to specify that the complete declaration must be visible if accesses had to be performed through the union type.
Although the CIS rule was described in terms of unions, its primary
usefulness lay in what it implied about the way in which structs were
laid out and accessed. If S1 and S2 were structures that shared a CIS,
there would be no way that a function that accepted a pointer to an S1
and an S2 from an outside source could comply with C89's CIS rules
without allowing the same behavior to be useful with pointers to
structures that weren't actually inside a union object; specifying CIS
support for structures would thus have been redundant given that it was
already specified for unions.
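To make the dispute concrete, here is a small sketch (made-up struct names) of the union-mediated form that everyone agrees is allowed; the disagreement is over whether a function taking plain struct pointers may do the same inspection when the union declaration is merely visible:
#include <stdio.h>

/* Two structs sharing a common initial sequence (the int tag). */
struct point2d { int tag; int x, y; };
struct point3d { int tag; int x, y, z; };

union any_point { struct point2d p2; struct point3d p3; };

static int get_tag(const union any_point *p)
{
    return p->p2.tag;   /* inspect the common initial member */
}

int main(void)
{
    struct point3d q = { 3, 1, 2, 5 };
    union any_point u;
    u.p3 = q;
    printf("%d\n", get_tag(&u));   /* prints 3 */
    return 0;
}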
The following code returns 10, under gcc 4.4.4. Is anything wrong with the union method or gcc 4.4.4?
int main()
{
    int v = 10;
    union vv {
        int v;
        short q;
    } *s = (union vv *)&v;
    s->v = 1;
    return v;
}