gcc, strict-aliasing, and horror stories [closed] - c

In gcc-strict-aliasing-and-casting-through-a-union I asked whether anyone had encountered problems with union punning through pointers. So far, the answer seems to be No.
This question is broader: Do you have any horror stories about gcc and strict-aliasing?
Background: Quoting from AndreyT's answer in c99-strict-aliasing-rules-in-c-gcc:
"Strict aliasing rules are rooted in parts of the standard that were present in C and C++ since the beginning of [standardized] times. The clause that prohibits accessing object of one type through a lvalue of another type is present in C89/90 (6.3) as well as in C++98 (3.10/15). ... It is just that not all compilers wanted (or dared) to enforce it or rely on it."
Well, gcc is now daring to do so, with its -fstrict-aliasing switch. And this has caused some problems. See, for example, the excellent article http://davmac.wordpress.com/2009/10/ about a Mysql bug, and the equally excellent discussion in http://cellperformance.beyond3d.com/articles/2006/06/understanding-strict-aliasing.html.
Some other less-relevant links:
performance-impact-of-fno-strict-aliasing
strict-aliasing
when-is-char-safe-for-strict-pointer-aliasing
how-to-detect-strict-aliasing-at-compile-time
So to repeat, do you have a horror story of your own? Problems not indicated by -Wstrict-aliasing would, of course, be preferred. And other C compilers are also welcome.
Added June 2nd: The first link in Michael Burr's answer, which does indeed qualify as a horror story, is perhaps a bit dated (from 2003). I did a quick test, and the problem has apparently gone away.
Source:
#include <string.h>

struct iw_event {    /* dummy! */
    int len;
};

char *iwe_stream_add_event(
    char *stream,            /* Stream of events */
    char *ends,              /* End of stream */
    struct iw_event *iwe,    /* Payload */
    int event_len)           /* Real size of payload */
{
    /* Check if it's possible */
    if ((stream + event_len) < ends) {
        iwe->len = event_len;
        memcpy(stream, (char *) iwe, event_len);
        stream += event_len;
    }
    return stream;
}
The specific complaint is:
Some users have complained that when the [above] code is compiled without the -fno-strict-aliasing, the order of the write and memcpy is inverted (which means a bogus len is mem-copied into the stream).
Compiled code, using gcc 4.3.4 on CYGWIN with -O3 (please correct me if I am wrong--my assembler is a bit rusty!):
_iwe_stream_add_event:
pushl %ebp
movl %esp, %ebp
pushl %ebx
subl $20, %esp
movl 8(%ebp), %eax # stream --> %eax
movl 20(%ebp), %edx # event_len --> %edx
leal (%eax,%edx), %ebx # sum --> %ebx
cmpl 12(%ebp), %ebx # compare sum with ends
jae L2
movl 16(%ebp), %ecx # iwe --> %ecx
movl %edx, (%ecx) # event_len --> iwe->len (!!)
movl %edx, 8(%esp) # event_len --> stack
movl %ecx, 4(%esp) # iwe --> stack
movl %eax, (%esp) # stream --> stack
call _memcpy
movl %ebx, %eax # sum --> retval
L2:
addl $20, %esp
popl %ebx
leave
ret
And for the second link in Michael's answer,
*(unsigned short *)&a = 4;
gcc will usually (always?) give a warning. But I believe a valid solution to this (for gcc) is to use:
#define CAST(type, x) (((union {typeof(x) src; type dst;}*)&(x))->dst)
// ...
CAST(unsigned short, a) = 4;
I've asked SO whether this is OK in gcc-strict-aliasing-and-casting-through-a-union, but so far nobody disagrees.

No horror story of my own, but here are some quotes from Linus Torvalds (sorry if these are already in one of the linked references in the question):
http://lkml.org/lkml/2003/2/26/158:
Date Wed, 26 Feb 2003 09:22:15 -0800
Subject Re: Invalid compilation without -fno-strict-aliasing
From Jean Tourrilhes <>
On Wed, Feb 26, 2003 at 04:38:10PM +0100, Horst von Brand wrote:
Jean Tourrilhes <> said:
It looks like a compiler bug to me...
Some users have complained that when the following code is
compiled without the -fno-strict-aliasing, the order of the write and
memcpy is inverted (which mean a bogus len is mem-copied into the
stream).
Code (from linux/include/net/iw_handler.h) :
static inline char *
iwe_stream_add_event(char * stream,          /* Stream of events */
                     char * ends,            /* End of stream */
                     struct iw_event *iwe,   /* Payload */
                     int event_len)          /* Real size of payload */
{
    /* Check if it's possible */
    if((stream + event_len) < ends) {
        iwe->len = event_len;
        memcpy(stream, (char *) iwe, event_len);
        stream += event_len;
    }
    return stream;
}
IMHO, the compiler should have enough context to know that the
reordering is dangerous. Any suggestion to make this simple code more
bullet proof is welcomed.
The compiler is free to assume char *stream and struct iw_event *iwe point
to separate areas of memory, due to strict aliasing.
Which is true and which is not the problem I'm complaining about.
(Note with hindsight: this code is fine, but Linux's implementation of memcpy was a macro that cast to long * to copy in larger chunks. With a correctly-defined memcpy, gcc -fstrict-aliasing isn't allowed to break this code. But it means you need inline asm to define a kernel memcpy if your compiler doesn't know how to turn a byte-copy loop into efficient asm, which was the case for gcc before gcc7.)
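For illustration only, a word-at-a-time memcpy macro of the general shape described above might look roughly like this (a hedged sketch, not the actual Linux source; the casts to long * are exactly the accesses that collide with strict aliasing):

/* Hypothetical sketch: copy whole longs first, then the tail bytes.
   Reading and writing the buffers through long * lvalues is the
   aliasing violation that lets gcc reorder the copy against the
   iwe->len store in the caller. */
#define WORD_MEMCPY(dst, src, n) do {            \
        long *_dl = (long *)(dst);               \
        const long *_sl = (const long *)(src);   \
        unsigned long _n = (n);                  \
        while (_n >= sizeof(long)) {             \
            *_dl++ = *_sl++;                     \
            _n -= sizeof(long);                  \
        }                                        \
        {                                        \
            char *_dc = (char *)_dl;             \
            const char *_sc = (const char *)_sl; \
            while (_n--)                         \
                *_dc++ = *_sc++;                 \
        }                                        \
    } while (0)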
And Linus Torvalds' comment on the above:
Jean Tourrilhes wrote:
>
It looks like a compiler bug to me...
Why do you think the kernel uses "-fno-strict-aliasing"?
The gcc people are more interested in trying to find out what can be
allowed by the c99 specs than about making things actually work. The
aliasing code in particular is not even worth enabling, it's just not
possible to sanely tell gcc when some things can alias.
Some users have complained that when the following code is
compiled without the -fno-strict-aliasing, the order of the write and
memcpy is inverted (which mean a bogus len is mem-copied into the
stream).
The "problem" is that we inline the memcpy(), at which point gcc won't
care about the fact that it can alias, so they'll just re-order
everything and claim it's out own fault. Even though there is no sane
way for us to even tell gcc about it.
I tried to get a sane way a few years ago, and the gcc developers really
didn't care about the real world in this area. I'd be surprised if that
had changed, judging by the replies I have already seen.
I'm not going to bother to fight it.
Linus
http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg01647.html:
Type-based aliasing is stupid. It's so incredibly stupid that it's not even funny. It's broken. And gcc took the broken notion, and made it more so by making it a "by-the-letter-of-the-law" thing that makes no sense.
...
I know for a fact that gcc would re-order write accesses that were clearly to (statically) the same address. Gcc would suddenly think that
unsigned long a;
a = 5;
*(unsigned short *)&a = 4;
could be re-ordered to set it to 4 first (because clearly they don't alias - by reading the standard), and then because now the assignment of 'a=5' was later, the assignment of 4 could be elided entirely! And if somebody complains that the compiler is insane, the compiler people would say "nyaah, nyaah, the standards people said we can do this", with absolutely no introspection to ask whether it made any SENSE.

SWIG generates code that depends on strict aliasing being off, which can cause all sorts of problems.
SWIGEXPORT jlong JNICALL Java_com_mylibJNI_make_1mystruct_1_1SWIG_12(
    JNIEnv *jenv, jclass jcls, jint jarg1, jint jarg2) {
  jlong jresult = 0 ;
  int arg1 ;
  int arg2 ;
  my_struct_t *result = 0 ;

  (void)jenv;
  (void)jcls;
  arg1 = (int)jarg1;
  arg2 = (int)jarg2;
  result = (my_struct_t *)make_my_struct(arg1,arg2);
  *(my_struct_t **)&jresult = result;   /* <<<< horror */
  return jresult;
}
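For comparison, a strict-aliasing-safe way to smuggle the pointer into the jlong would be a plain memcpy (a sketch, not what SWIG actually emits; it assumes <string.h> is available and that jlong is at least pointer-sized):

  /* Aliasing-safe alternative to the cast above: copy the pointer's bytes
     into jresult instead of writing through an incompatible lvalue. */
  jlong jresult = 0;
  my_struct_t *result = (my_struct_t *)make_my_struct(arg1, arg2);
  memcpy(&jresult, &result, sizeof(result));  /* needs sizeof(jlong) >= sizeof(result) */
  return jresult;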

gcc, aliasing, and 2-D variable-length arrays: The following sample code copies a 2x2 matrix:
#include <stdio.h>

static void copy(int n, int a[][n], int b[][n]) {
    int i, j;
    for (i = 0; i < 2; i++)       // 'n' not used in this example
        for (j = 0; j < 2; j++)   // 'n' hard-coded to 2 for simplicity
            b[i][j] = a[i][j];
}

int main(int argc, char *argv[]) {
    int a[2][2] = {{1, 2},{3, 4}};
    int b[2][2];
    copy(2, a, b);
    printf("%d %d %d %d\n", b[0][0], b[0][1], b[1][0], b[1][1]);
    return 0;
}
With gcc 4.1.2 on CentOS, I get:
$ gcc -O1 test.c && a.out
1 2 3 4
$ gcc -O2 test.c && a.out
10235717 -1075970308 -1075970456 11452404 (random)
I don't know whether this is generally known, and I don't know whether this a bug or a feature. I can't duplicate the problem with gcc 4.3.4 on Cygwin, so it may have been fixed. Some work-arounds:
Use __attribute__((noinline)) for copy() (see the sketch after this list).
Use the gcc switch -fno-strict-aliasing.
Change the third parameter of copy() from b[][n] to b[][2].
Don't use -O2 or -O3.
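Applying the first work-around, for example, looks like this (a sketch, assuming gcc's attribute syntax; the body of copy() is unchanged):

/* Mark copy() noinline so gcc does not inline it into main(),
   which avoids the mis-compile described above. */
static void __attribute__((noinline))
copy(int n, int a[][n], int b[][n]) {
    int i, j;
    for (i = 0; i < 2; i++)
        for (j = 0; j < 2; j++)
            b[i][j] = a[i][j];
}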
Further notes:
This is an answer, after a year and a day, to my own question (and I'm a bit surprised there are only two other answers).
I lost several hours with this on my actual code, a Kalman filter. Seemingly small changes would have drastic effects, perhaps because of changing gcc's automatic inlining (this is a guess; I'm still uncertain). But it probably doesn't qualify as a horror story.
Yes, I know you wouldn't write copy() like this. (And, as an aside, I was slightly surprised to see gcc did not unroll the double-loop.)
No gcc warning switches, including -Wstrict-aliasing=, did anything here.
1-D variable-length arrays seem to be OK.
Update: The above does not really answer the OP's question, since he (i.e. I) was asking about cases where strict aliasing 'legitimately' broke your code, whereas the above just seems to be a garden-variety compiler bug.
I reported it to GCC Bugzilla, but they weren't interested in the old 4.1.2, even though (I believe) it is the key to the $1-billion RHEL5. It doesn't occur in 4.2.4 and up.
And I have a slightly simpler example of a similar bug, with only one matrix. The code:
#include <stdio.h>

static void zero(int n, int a[][n]) {
    int i, j;
    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            a[i][j] = 0;
}

int main(void) {
    int a[2][2] = {{1, 2},{3, 4}};
    zero(2, a);
    printf("%d\n", a[1][1]);
    return 0;
}
produces the results:
gcc -O1 test.c && a.out
0
gcc -O1 -fstrict-aliasing test.c && a.out
4
It seems to be the combination of -fstrict-aliasing with -finline that causes the bug.

Here is mine:
http://forum.openscad.org/CGAL-3-6-1-causing-errors-but-CGAL-3-6-0-OK-tt2050.html
It caused certain shapes in a CAD program to be drawn incorrectly. Thank goodness for the project leaders' work on creating a regression test suite.
The bug only manifested itself on certain platforms, with older versions of GCC and older versions of certain libraries, and then only with -O2 turned on. -fno-strict-aliasing solved it.

The Common Initial Sequence rule of C used to be interpreted as making it
possible to write a function which could work on the leading portion of a
wide variety of structure types, provided they start with elements of matching
types. Under C99, the rule was changed so that it only applied if the structure
types involved were members of the same union whose complete declaration was visible at the point of use.
The authors of gcc insist that the language in question is only applicable if
the accesses are performed through the union type, notwithstanding the facts
that:
There would be no reason to specify that the complete declaration must be visible if accesses had to be performed through the union type.
Although the CIS rule was described in terms of unions, its primary
usefulness lay in what it implied about the way in which structs were
laid out and accessed. If S1 and S2 were structures that shared a CIS,
there would be no way that a function that accepted a pointer to an S1
and an S2 from an outside source could comply with C89's CIS rules
without allowing the same behavior to be useful with pointers to
structures that weren't actually inside a union object; specifying CIS
support for structures would thus have been redundant given that it was
already specified for unions.
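To make the dispute concrete, here is a small sketch of the idiom in question (the struct names and values are invented for illustration):

#include <stdio.h>

/* Two structure types sharing a common initial sequence (an int tag). */
struct msg_a { int tag; double payload; };
struct msg_b { int tag; char text[16]; };

/* C99 ties the CIS guarantee to a union whose complete declaration is
   visible where the inspection happens: */
union any_msg { struct msg_a a; struct msg_b b; };

/* A function meant to work on the leading portion of either structure.
   Under the older reading of the CIS rule this was considered portable;
   gcc's interpretation only promises it when the access goes through the
   union type itself. */
static int get_tag(const struct msg_a *m) { return m->tag; }

int main(void) {
    struct msg_b b = { 7, "hello" };
    printf("%d\n", get_tag((const struct msg_a *)&b));
    return 0;
}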

The following code returns 10, under gcc 4.4.4. Is anything wrong with the union method or gcc 4.4.4?
int main()
{
    int v = 10;
    union vv {
        int v;
        short q;
    } *s = (union vv *)&v;
    s->v = 1;
    return v;
}
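For contrast, the same store expressed with memcpy is unquestionably well-defined, and gcc normally compiles it to a single move (a sketch; it says nothing about what 4.4.4 does with the union version above):

#include <string.h>

int main(void)
{
    int v = 10;
    int one = 1;
    /* Copy the new value's bytes over v instead of writing through a
       pointer to a union member; -fstrict-aliasing cannot break this. */
    memcpy(&v, &one, sizeof v);
    return v;   /* returns 1 */
}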

Related

Absolute worst case stack size based on automatic variables

In a C99 program, under the (theoretical) assumption that I'm not using variable-length arrays, and each of my automatic variables can only exist once at a time in the whole stack (by forbidding circular function calls and explicit recursion), if I sum up all the space they are consuming, could I declare that this is the maximal stack size that can ever happen?
A bit of context here: I told a friend that I wrote a program not using dynamic memory allocation ("malloc") and allocate all memory static (by modeling all my state variables in a struct, which I then declared global). He then told me that if I'm using automatic variables, I still make use of dynamic memory. I argued that my automatic variables are not state variables but control variables, so my program is still to be considered static. We then discussed that there has to be a way to make a statement about the absolute worst-case behaviour about my program, so I came up with the above question.
Bonus question: If the assumptions above hold, I could simply declare all automatic variables static and would end up with a "truly" static program?
Even if array sizes are constant, a C implementation could allocate arrays and even structures dynamically. I'm not aware of any that do (anyone?), and it would appear quite unhelpful. But the C Standard doesn't make such guarantees.
There is also (almost certainly) some further overhead in the stack frame (the data added to the stack on call and released on return).
You would need to declare all your functions as taking no parameters and returning void to ensure no program variables end up on the stack. Finally, the 'return address' of where execution of a function is to continue after return is pushed onto the stack (at least logically).
So having moved all parameters, automatic variables, and return values to your 'state' struct, there will still be something going onto the stack - probably.
I say probably because I'm aware of a (non-standard) embedded C compiler that forbids recursion and can determine the maximum size of the stack by examining the call tree of the whole program, identifying the call chain that reaches the peak stack size.
You could achieve this with a monstrous pile of goto statements (some conditional, where a function is logically called from two places) or by duplicating code.
It's often important in embedded code on devices with tiny memory to avoid any dynamic memory allocation and know that any 'stack-space' will never overflow.
I'm happy this is a theoretical discussion. What you suggest is a mad way to write code and would throw away most of the (ultimately limited) services C provides as infrastructure for procedural coding (pretty much the call stack).
Footnote: See the comment below about the 8-bit PIC architecture.
Bonus question: If the assumptions above hold, I could simply declare
all automatic variables static and would end up with a "truly" static
program?
No. This would change the function of the program. static variables are initialized only once.
Compare these two functions:
int canReturn0Or1(void)
{
    static unsigned a = 0;
    a++;
    if (a > 1)
    {
        return 1;
    }
    return 0;
}

int willAlwaysReturn0(void)
{
    unsigned a = 0;
    a++;
    if (a > 1)
    {
        return 1;
    }
    return 0;
}
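A small usage sketch (assuming both functions above are defined in the same file, above main) makes the difference visible:

#include <stdio.h>

int main(void)
{
    printf("%d", canReturn0Or1());       /* 0: the static a is now 1 */
    printf("%d\n", canReturn0Or1());     /* 1: the static a is now 2 */
    printf("%d", willAlwaysReturn0());   /* 0: the automatic a restarts at 0 */
    printf("%d\n", willAlwaysReturn0()); /* 0 */
    return 0;
}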
In a C99 program, under the (theoretical) assumption that I'm not using variable-length arrays, and each of my automatic variables can only exist once at a time in the whole stack (by forbidding circular function calls and explicit recursion), if I sum up all the space they are consuming, could I declare that this is the maximal stack size that can ever happen?
No, because of function pointers... Read n1570 (the C11 draft standard).
Consider the following code, where rand(3) is some pseudo-random number generator (it could also be some input from a sensor):
#include <stdlib.h>   /* for rand() and NULL */

typedef int foosig(int);

int foo(int x) {
    foosig *fptr = (x > rand()) ? &foo : NULL;
    if (fptr)
        return (*fptr)(x);
    else
        return x + rand();
}
An optimizing compiler (such as some recent GCC suitably invoked with enough optimizations) would make a tail-recursive call for (*fptr)(x). Some other compiler won't.
Depending on how you compile that code, it would use a bounded stack or could produce a stack overflow. With some ABI and calling conventions, both the argument and the result could go thru a processor register and won't consume any stack space.
Experiment with a recent GCC (e.g. on Linux/x86-64, some GCC 10 in 2020) invoked as gcc -O2 -fverbose-asm -S foo.c then look inside foo.s. Change the -O2 to a -O0.
Observe that the naive recursive factorial function could be compiled into some iterative machine code with a good enough C compiler and optimizer. In practice GCC 10 on Linux compiling the below code:
int fact(int n)
{
    if (n < 1) return 1;
    else return n * fact(n - 1);
}
as gcc -O3 -fverbose-asm tmp/fact.c -S -o tmp/fact.s produces the following assembler code:
.type fact, @function
fact:
.LFB0:
.cfi_startproc
endbr64
# tmp/fact.c:3: if (n<1) return 1;
movl $1, %eax #, <retval>
testl %edi, %edi # n
jle .L1 #,
.p2align 4,,10
.p2align 3
.L2:
imull %edi, %eax # n, <retval>
subl $1, %edi #, n
jne .L2 #,
.L1:
# tmp/fact.c:5: }
ret
.cfi_endproc
.LFE0:
.size fact, .-fact
.ident "GCC: (Ubuntu 10.2.0-5ubuntu1~20.04) 10.2.0"
And you can observe that the call stack is not increasing above.
If you have serious and documented arguments against GCC, please submit a bug report.
BTW, you could write your own GCC plugin which would randomly choose whether to apply such an optimization. I believe it stays conforming to the C standard.
The above optimization is essential for many compilers generating C code, such as Chicken/Scheme or Bigloo.
A related theorem is Rice's theorem. See also this draft report funded by the CHARIOT project.
See also the Compcert project.

incompatible return type from struct function - C

When I attempt to run this code as it is, I receive the compiler message "error: incompatible types in return". I marked the location of the error in my code. If I take the line out, then the compiler is happy.
The problem is I want to return a value representing invalid input to the function (which in this case is calling f2(2).) I only want a struct returned with data if the function is called without using 2 as a parameter.
I feel the only two ways to go is to either:
make the function return a struct pointer instead of a struct by value, but then my caller function will look funny as I have to change y.b to y->b, and the operation may be slower due to the extra step of fetching data in memory.
Allocate extra memory, zero-byte fill it, and set the return value to the struct in that location in memory. (example: return x[nnn]; instead of return x[0];). This approach will use more memory and some processing to zero-byte fill it.
Ultimately, I'm looking for a solution that will be fastest and cleanest (in terms of code) in the long run. If I have to be stuck with using -> to address members of elements then I guess that's the way to go.
Does anyone have a solution that uses the least cpu power?
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    long a;
    char b;
} ab;

static char dat[500];

ab f2(int set) {
    ab *x = (ab *)dat;
    if (set == 2) { return NULL; }   /* <-- error: incompatible types in return */
    if (set == 1) {
        x->a = 1;
        x->b = '1';
        x++;
        x->a = 2;
        x->b = '2';
        x = (ab *)dat;
    }
    return x[0];
}

int main() {
    ab y;
    y = f2(1);
    printf("%c", y.b);
    y.b = 'D';
    y = f2(0);
    printf("%c", y.b);
    return 0;
}
If you care about speed, it is implementation specific.
Notice that the Linux x86-64 ABI defines that a struct of exactly two scalar members (that is, integers, doubles, or pointers, which all fit in a single machine register, but not structs etc., which are aggregate data) is returned through two registers (without going through the stack), and that is quite fast.
BTW
if (set==2){return NULL;} //wrong
is obviously wrong. You could code:
if (set==2) return (ab){0,0};
Also,
ab* x=(ab*)dat; // suspicious
looks suspicious to me (since you return x[0]; later). You are not guaranteed that dat is suitably aligned (e.g. to 8 or 16 bytes), and on some platforms (notably x86-64) if dat is misaligned you are at least losing performance (actually, it is undefined behavior).
BTW, I would suggest always returning with instructions like return (ab){l,c}; (where l is an expression convertible to long and c is an expression convertible to char); this is probably the easiest to read, and will be optimized to load the two return registers.
Of course if you care about performance, for benchmarking purposes, you should enable optimizations (and warnings), e.g. compile with gcc -Wall -Wextra -O2 -march=native if using GCC; on my system (Linux/x86-64 with GCC 5.2) the small function
ab give_both(long x, char y)
{ return (ab){x,y}; }
is compiled (with gcc -O2 -march=native -fverbose-asm -S) into:
.globl give_both
.type give_both, @function
give_both:
.LFB0:
.file 1 "ab.c"
.loc 1 7 0
.cfi_startproc
.LVL0:
.loc 1 7 0
xorl %edx, %edx # D.2139
movq %rdi, %rax # x, x
movb %sil, %dl # y, D.2139
ret
.cfi_endproc
You see that all the code is using registers, and no memory is used at all.
I would use the return value as an error code, and the caller passes in a pointer to his struct such as:
int f2(int set, ab *outAb); // returns -1 if invalid input and 0 otherwise
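A sketch of how that could look end to end (only the signature above comes from the suggestion; the body and the values here are assumptions):

#include <stdio.h>

typedef struct {
    long a;
    char b;
} ab;

/* Returns -1 if the input is invalid, 0 otherwise; *outAb is written
   only on success, so no sentinel struct value is needed. */
static int f2(int set, ab *outAb)
{
    if (set == 2)
        return -1;
    outAb->a = 1;
    outAb->b = (set == 1) ? '1' : 'D';
    return 0;
}

int main(void)
{
    ab y;
    if (f2(1, &y) == 0)
        printf("%c", y.b);   /* prints '1' */
    if (f2(2, &y) != 0)
        printf("E");         /* invalid input reported via the return code */
    return 0;
}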

Massive fprintf speed difference without "-std=c99"

I had been struggling for weeks with a poor-performing translator I had written.
On the following simple benchmark
#include <stdio.h>

int main()
{
    int x;
    char buf[2048];
    FILE *test = fopen("test.out", "wb");
    setvbuf(test, buf, _IOFBF, sizeof buf);
    for (x = 0; x < 1024*1024; x++)
        fprintf(test, "%04d", x);
    fclose(test);
    return 0;
}
we see the following result
bash-3.1$ gcc -O2 -static test.c -o test
bash-3.1$ time ./test
real 0m0.334s
user 0m0.015s
sys 0m0.016s
As you can see, the moment the "-std=c99" flag is added in, performance comes crashing down:
bash-3.1$ gcc -O2 -static -std=c99 test.c -o test
bash-3.1$ time ./test
real 0m2.477s
user 0m0.015s
sys 0m0.000s
The compiler I'm using is gcc 4.6.2 mingw32.
The file generated is about 12M, so this amounts to a difference of about 21 MB/s between the two.
Running diff shows that the generated files are identical.
I assumed this has something to do with file locking in fprintf, of which the program makes heavy use, but I haven't been able to find a way to switch that off in the C99 version.
I tried flockfile on the stream I use at the beginning of the program, and a corresponding funlockfile at the end, but was greeted with compiler errors about implicit declarations, and linker errors claiming undefined references to those functions.
Could there be another explanation for this problem, and more importantly, is there any way to use C99 on windows without paying such an enormous performance price?
Edit:
After looking at the code generated by these options, it looks like in the slow versions, mingw sticks in the following:
_fprintf:
LFB0:
.cfi_startproc
subl $28, %esp
.cfi_def_cfa_offset 32
leal 40(%esp), %eax
movl %eax, 8(%esp)
movl 36(%esp), %eax
movl %eax, 4(%esp)
movl 32(%esp), %eax
movl %eax, (%esp)
call ___mingw_vfprintf
addl $28, %esp
.cfi_def_cfa_offset 4
ret
.cfi_endproc
In the fast version, this simply does not exist; otherwise, both are exactly the same. __mingw_vfprintf seems to be the slowpoke here, but I have no idea what behavior it needs to emulate that makes it so slow.
After some digging in the source code, I have found why the MinGW function is so terribly slow:
At the beginning of a [v,f,s]printf in MinGW, there is some innocent-looking initialization code:
__pformat_t stream = {
    dest,                                        /* output goes to here        */
    flags &= PFORMAT_TO_FILE | PFORMAT_NOLIMIT,  /* only these valid initially */
    PFORMAT_IGNORE,                              /* no field width yet         */
    PFORMAT_IGNORE,                              /* nor any precision spec     */
    PFORMAT_RPINIT,                              /* radix point uninitialised  */
    (wchar_t)(0),                                /* leave it unspecified       */
    0,                                           /* zero output char count     */
    max,                                         /* establish output limit     */
    PFORMAT_MINEXP                               /* exponent chars preferred   */
};
However, PFORMAT_MINEXP is not what it appears to be:
#ifdef _WIN32
# define PFORMAT_MINEXP    __pformat_exponent_digits()
# ifndef _TWO_DIGIT_EXPONENT
#  define _get_output_format() 0
#  define _TWO_DIGIT_EXPONENT  1
# endif
static __inline__ __attribute__((__always_inline__))
int __pformat_exponent_digits( void )
{
    char *exponent_digits = getenv( "PRINTF_EXPONENT_DIGITS" );
    return ((exponent_digits != NULL) && ((unsigned)(*exponent_digits - '0') < 3))
        || (_get_output_format() & _TWO_DIGIT_EXPONENT)
        ? 2
        : 3
        ;
}
This winds up getting called every time I want to print, and getenv on windows must not be very quick. Replacing that define with a 2 brings the runtime back to where it should be.
So, the answer comes down to this: when using -std=c99 or any ANSI-compliant mode, MinGW switches the CRT runtime with its own. Normally, this wouldn't be an issue, but the MinGW lib had a bug which slowed its formatting functions down far beyond anything imaginable.
Using -std=c99 disables all GNU extensions.
With GNU extensions and optimization, a call like fprintf(test, "B") is probably replaced by fputc('B', test).
Note this answer is obsolete, see https://stackoverflow.com/a/13973562/611560 and https://stackoverflow.com/a/13973933/611560
After some consideration of your assembler, it looks like the slow version is using the *printf() implementation of MinGW, based undoubtedly on the GCC one, while the fast version is using the Microsoft implementation from msvcrt.dll.
Now, the MS one is notable for lacking a lot of features that the GCC one does implement. Some of these are GNU extensions, but some others are for C99 conformance. And since you are using -std=c99 you are requesting the conformance.
But why so slow? Well, one factor is simplicity: the MS version is far simpler, so it is expected that it will run faster, even in the trivial cases. Another factor is that you are running under Windows, so it is expected that the MS version be more efficient than one copied from the Unix world.
Does it explain a factor of x10? Probably not...
Another thing you can try:
Replace fprintf() with sprintf(), printing into a memory buffer without touching the file at all. Then you can try doing fwrite() without printfing. That way you can guess if the loss is in the formatting of the data or in the writing to the FILE.
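A minimal sketch of that experiment, separating the formatting cost (sprintf into a buffer) from the raw output cost (fwrite); the loop count mirrors the benchmark above and the buffer size is arbitrary:

#include <stdio.h>

int main(void)
{
    int x;
    char line[32];
    FILE *test = fopen("test.out", "wb");
    if (!test)
        return 1;
    for (x = 0; x < 1024*1024; x++) {
        int len = sprintf(line, "%04d", x);  /* formatting only */
        fwrite(line, 1, (size_t)len, test);  /* writing only */
    }
    fclose(test);
    return 0;
}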
Since MinGW32 3.15, compliant printf functions are available to use instead of those found in Microsoft C runtime (CRT).
The new printf functions are used when compiling in strict ANSI, POSIX and/or C99 modes.
For more information see the mingw32 changelog
You can use __msvcrt_fprintf() to use the fast (non compliant) function.

Mixing variables and integer constants in the Boost Preprocessor

I am using BOOST_PP to do compile-time computations in the preprocessor. I am focusing on an application where code size is extremely important to me. (So please don't say the compiler should or usually does that; I need to control what is performed at compile time and what code is generated.) However, I want to be able to use the same name of the macro/function for both integer constants and variables. As a trivial example, I can have
#define TWICE(n) BOOST_PP_MUL(n,2)
//.....
// somewhere else in code
int a = TWICE(5);
This does what I want it to, evaluating to
int a = 10;
during compile time.
However, I also want it to be used at
int b = 5;
int a = TWICE(b);
This should be preprocessed to
int b = 5;
int a = 5 * 2;
Of course, I can do so by using the traditional macros like
#define TWICE(n) n * 2
But then it doesn't do what I want it to do for integer constants (evaluating them during compile time).
So, my question is, is there a trick to check whether the argument is a literal or a variable, and then use different definitions? I.e., something like this:
#define TWICE(n) BOOST_PP_IF( _IS_CONSTANT(n), \
BOOST_PP_MUL(n,2), \
n * 2 )
edit:
So what I am really after is some way to check if something is a constant available at compile time, and hence a good argument for the BOOST_PP_ functions. I realize that this is different from what most people expect from a preprocessor and general programming recommendations. But there is no wrong way of programming, so please don't hate on the question if you disagree with its philosophy. There is a reason the BOOST_PP library exists, and this question is in the same spirit. It might just not be possible though.
You're trying to do something that is better left to the optimizations of the compiler.
int main (void) {
    int b = 5;
    int a = b * 2;
    return a; // return it so we use a and it's not optimized away
}
gcc -O3 -S t.c
.file "t.c"
.text
.p2align 4,,15
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
movl $10, %eax
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Debian 4.5.0-6) 4.5.1 20100617 (prerelease)"
.section .note.GNU-stack,"",@progbits
Optimizing compiler will optimize.
EDIT: I'm aware that you don't want to hear that the compiler "should" or "usually" does that. However, what you are trying to do is not something that is meant to be done in the C preprocessor; the people who designed the C language and the C preprocessor designed it to work with the preprocessing token as its basic atom. The CPP is, in many ways, "dumb". This is not a bad thing (in fact, this is in many cases what makes it so useful), but at the end of the day, it is a preprocessor. It preprocesses source files before they are parsed. It preprocesses source files before semantic analysis occurs. It preprocesses source files before the validity of a given source file is checked. I understand that you do not want to hear that this is something that the parser and semantic analyzer should handle or usually does. However, this is the reality of the situation. If you want to design code that is incredibly small, then you should rely on your compiler to do its job instead of trying to create preprocessor constructs to do the work. Think of it this way: thousands of hours of work went into your compiler, so try to reuse as much of that work as possible!
Not quite a direct approach, however:
struct operation {
    template<int N>
    struct compile {
        static const int value = N;
    };
    static int runtime(int N) { return N; }
};
operation::compile<5>::value;
operation::runtime(5);
alternatively
operation<5>();
operation(5);
There is no real chance of mixing the two levels (preprocessor and evaluation of variables). From what I understand from your question, b should be a symbolic constant?
I think you should use the traditional one
#define TWICE(n) ((n) * 2)
but then instead of initializing variables with expressions, you should initialize them with compile-time constants. The only thing I see that forces evaluation at compile time and gives symbolic constants in C is integral enumeration constants. These are defined to have type int and are evaluated at compile time.
enum { bInit = 5 };
int b = bInit;
enum { aInit = TWICE(bInit) };
int a = aInit;
And generally you should not be too thrifty with const (as for your b), and you should check the produced assembler with -S.

C: Comparison to NULL

Religious arguments aside:
Option1:
if (pointer[i] == NULL) ...
Option2:
if (!pointer[i]) ...
In C is option1 functionally equivalent to option2?
Does the latter resolve quicker due to the absence of a comparison?
I prefer the explicit style (first version). It makes it obvious that there is a pointer involved and not an integer or something else but it's just a matter of style.
From a performance point of view, it should make no difference.
Equivalent. It says so in the language standard. And people have the damndest religious preferences!
I like the second, other people like the first.
Actually, I prefer a third kind to the first:
if (NULL == ptr) {
...
}
Because then I:
won't be able to miss and just type one =
won't miss the == NULL and mistake it for the opposite if the condition is long (multiple lines)
Functionally they are equivalent.
Even if a NULL pointer is not "0" (all zero bits), if (!ptr) compares with the NULL pointer.
The following is incorrect. It's still here because there are many comments referring to it:
Do not compare a pointer with literal zero, however. It will work almost everywhere but is undefined behavior IIRC.
It is often useful to assume that compiler writers have at least a minimum of intelligence. Your compiler is not written by concussed ducklings. It is written by human beings, with years of programming experience, and years spent studying compiler theory. This doesn't mean that your compiler is perfect, and always knows best, but it does mean that it is perfectly capable of handling trivial automatic optimizations.
If the two forms are equivalent, then why wouldn't the compiler just translate one into the other to ensure both are equally efficient?
If if (pointer[i] == NULL) was slower than if (!pointer[i]), wouldn't the compiler just change it into the second, more efficient form?
So no, assuming they are equivalent, they are equally efficient.
As for the first part of the question, yes, they are equivalent. The language standard actually states this explicitly somewhere -- a pointer evaluates to true if it is non-NULL, and false if it is NULL, so the two are exactly identical.
Almost certainly no difference in performance. I prefer the implicit style of the second, though.
NULL should be declared in one of the standard header files as such:
#define NULL ((void*)0)
So either way, you are comparing against zero, and the compiler should optimize both the same way. Every processor has some "optimization" or opcode for comparing with zero.
Early optimization is bad. Micro-optimization is also bad; unless you are trying to squeeze every last bit of Hz from your CPU, there is no point in doing it. As people have already shown, the compiler will optimize most of your code away anyways.
It's best to make your code as concise and readable as possible. If this is more readable
if (!ptr)
than this
if (NULL==ptr)
then use it. As long as everyone who will be reading your code agrees.
Personally I use the fully defined value (NULL==ptr) so it is clear what I am checking for. It might be longer to type, but I can easily read it. With !ptr, I'd think the ! would be easy to miss if reading too quickly.
It really depends on the compiler. I'd be surprised if most modern C compilers didn't generate virtually identical code for the specific scenario you describe.
Get your compiler to generate an assembly listing for each of those scenarios and you can answer your own question (for your particular compiler :)).
And even if they are different, the performance difference will probably be irrelevant in practical applications.
Turn on compiler optimization and they're basically the same.
I tested this on gcc 4.3.3:
#include <stdio.h>

int main (int argc, char** argv) {
    char c = getchar();
    int x = (c == 'x');
    if (x == NULL)
        putchar('y');
    return 0;
}
vs
#include <stdio.h>

int main (int argc, char** argv) {
    char c = getchar();
    int x = (c == 'x');
    if (!x)
        putchar('y');
    return 0;
}
gcc -O -o test1 test1.c
gcc -O -o test2 test2.c
diff test1 test2
produced no output :)
I did an assembly dump, and found the difference between the two versions:
@@ -11,8 +11,7 @@
pushl %ecx
subl $20, %esp
movzbl -9(%ebp), %eax
- movsbl %al,%eax
- testl %eax, %eax
+ testb %al, %al
It looks like the latter actually generates one instruction and the first generates two, but this is pretty unscientific.
This is gcc, no optimizations:
test1.c:
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    char *pointer[5];
    if (pointer[0] == NULL) {
        exit(1);
    }
    exit(0);
}
test2.c: Change pointer[0] == NULL to !pointer[0]
gcc -S test1.c, gcc -S test2.c, diff -u test1.s test2.s
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    char pointer[5];
    /* This is nonsense: you are comparing a pointer to a value */
    if (pointer[0] == NULL) {
        exit(1);
    }
    ...
}
=> ...
movzbl 9(%ebp), %eax  # your code compares a 1 byte value to a signed 4 bytes one
movsbl %al,%eax       # Will result in sign extension...
testl %eax, %eax
...
Beware, gcc should have emitted a warning; if that's not the case, compile with the -Wall flag on.
Though, you should always compile gcc code with optimizations turned on.
BTW, precede your variable with the volatile keyword in order to prevent gcc from ignoring it...
Always mention your compiler build version :)
