It seems to me that it would work perfectly well to do tail-recursion optimization in both C and C++, yet while debugging I never seem to see a call stack that indicates this optimization. That is kind of good, because the stack tells me how deep the recursion is. However, the optimization would be kind of nice as well.
Do any C++ compilers do this optimization? Why? Why not?
How do I go about telling the compiler to do it?
For MSVC: /O2 or /Ox
For GCC: -O2 or -O3
How about checking if the compiler has done this in a certain case?
For MSVC, enable PDB output to be able to trace the optimized code, then inspect it
For GCC..?
I'd still take suggestions for how to determine if a certain function is optimized like this by the compiler (even though I find it reassuring that Konrad tells me to assume it)
It is always possible to check if the compiler does this at all by making an infinite recursion and checking if it results in an infinite loop or a stack overflow (I did this with GCC and found out that -O2 is sufficient), but I want to be able to check a certain function that I know will terminate anyway. I'd love to have an easy way of checking this :)
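For reference, a minimal sketch of such a test (the helper function here is just my own illustration):
// Build with and without optimization (e.g. gcc -O0 vs. gcc -O2):
// if the compiler applies tail-call optimization this loops forever,
// otherwise the stack overflows almost immediately.
int spin(int n)
{
    static volatile int ticks;
    ticks = n;          // a side effect, so the loop cannot be optimized away
    return spin(n + 1); // tail position: nothing runs after this call
}

int main(void)
{
    return spin(0);
}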
After some testing, I discovered that destructors ruin the possibility of making this optimization. It can sometimes be worth it to change the scoping of certain variables and temporaries to make sure they go out of scope before the return-statement starts.
If any destructor needs to be run after the tail-call, the tail-call optimization can not be done.
All current mainstream compilers perform tail call optimisation fairly well (and have done for more than a decade), even for mutually recursive calls such as:
int bar(int, int);

int foo(int n, int acc) {
    return (n == 0) ? acc : bar(n - 1, acc + 2);
}

int bar(int n, int acc) {
    return (n == 0) ? acc : foo(n - 1, acc + 1);
}
Letting the compiler do the optimisation is straightforward: Just switch on optimisation for speed:
For MSVC, use /O2 or /Ox.
For GCC, Clang and ICC, use -O3.
An easy way to check whether the compiler performed the optimisation is to make a call that would otherwise result in a stack overflow, or to look at the assembly output.
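For the assembly route, a sketch (the exact output varies by compiler and target):
// Compile with e.g. "gcc -O2 -S tail.c" and inspect tail.s:
int sum(int n, int acc)
{
    return (n == 0) ? acc : sum(n - 1, acc + n); // tail call
}
// If the body still contains "call sum", the optimisation was not applied;
// if the recursion became a backwards jmp (i.e. a plain loop), it was.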
As an interesting historical note, tail call optimisation for C was added to GCC in the course of a diploma thesis by Mark Probst. The thesis describes some interesting caveats in the implementation and is worth reading.
As well as the obvious (compilers don't do this sort of optimization unless you ask for it), there is a complexity about tail-call optimization in C++: destructors.
Given something like:
int fn(int j, int i)
{
    if (i <= 0) return j;
    Funky cls(j, i);
    return fn(j, i - 1);
}
The compiler can't (in general) tail-call optimize this because it needs to call the destructor of cls after the recursive call returns.
Sometimes the compiler can see that the destructor has no externally visible side effects (so it can be done early), but often it can't.
A particularly common form of this is where Funky is actually a std::vector or similar.
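As noted in the question's edit above, moving the object into a narrower scope so its destructor runs before the return can restore the optimization. A sketch, reusing Funky from above:
int fn(int j, int i)
{
    if (i <= 0) return j;
    {
        Funky cls(j, i);
        // ... work with cls ...
    } // cls is destroyed here, before the recursive call
    return fn(j, i - 1); // now a genuine tail call
}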
gcc 4.3.2 completely inlines this function (a crappy/trivial atoi() implementation) into main(). Optimization level is -O1. I notice that if I play around with it (even changing it from static to extern), the tail recursion goes away pretty fast, so I wouldn't depend on it for program correctness.
#include <stdio.h>
static int atoi(const char *str, int n)
{
    if (str == 0 || *str == 0)
        return n;
    return atoi(str + 1, n * 10 + *str - '0');
}
int main(int argc, char **argv)
{
    for (int i = 1; i != argc; ++i)
        printf("%s -> %d\n", argv[i], atoi(argv[i], 0));
    return 0;
}
Most compilers don't do any kind of optimisation in a debug build.
If using VC, try a release build with PDB info turned on - this will let you trace through the optimised app and you should hopefully see what you want then. Note, however, that debugging and tracing an optimised build will jump you around all over the place, and often you cannot inspect variables directly as they only ever end up in registers or get optimised away entirely. It's an "interesting" experience...
As Greg mentions, compilers won't do it in debug mode. It's ok for debug builds to be slower than a prod build, but they shouldn't crash more often: and if you depend on a tail call optimization, they may do exactly that. Because of this it is often best to rewrite the tail call as a normal loop. :-(
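For example, a tail-recursive accumulator and its loop equivalent (a sketch of my own):
// Tail-recursive: safe only if the optimizer turns it into a loop.
int sum_rec(int n, int acc)
{
    return (n == 0) ? acc : sum_rec(n - 1, acc + n);
}

// Hand-rewritten loop: constant stack usage in debug and release builds.
int sum_loop(int n, int acc)
{
    while (n != 0) {
        acc += n;
        n -= 1;
    }
    return acc;
}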
I am trying to understand pure functions, and have been reading through the Wikipedia article on that topic. I wrote the minimal sample program as follows:
#include <stdio.h>
static int a = 1;
static __attribute__((pure)) int pure_function(int x, int y)
{
    return x + y;
}

static __attribute__((pure)) int impure_function(int x, int y)
{
    a++;
    return x + y;
}

int main(void)
{
    printf("pure_function(0, 0) = %d\n", pure_function(0, 0));
    printf("impure_function(0, 0) = %d\n", impure_function(0, 0));
    return 0;
}
I compiled this program with gcc -O2 -Wall -Wextra, expecting an error, or at least a warning, to be issued for decorating impure_function() with __attribute__((pure)). However, I received no warnings or errors, and the program also ran without issues.
Isn't marking impure_function() with __attribute__((pure)) incorrect? If so, why does it compile without any errors or warnings, even with the -Wextra and -Wall flags?
Thanks in advance!
Doing this is incorrect and you are responsible for using the attribute correctly.
Look at this example:
static __attribute__((pure)) int impure_function(int x, int y)
{
    extern int a;
    a++;
    return x + y;
}

void caller()
{
    impure_function(1, 1);
}
Code generated by GCC (with -O1) for the function caller is:
caller():
        ret
As you can see, the impure_function call was completely removed because the compiler treats it as "pure".
GCC can automatically mark a function as "pure" internally if it sees its definition:
static __attribute__((noinline)) int pure_function(int x, int y)
{
    return x + y;
}

void caller()
{
    pure_function(1, 1);
}
Generated code:
caller():
        ret
So there is no point in using this attribute on functions that are visible to the compiler. It is meant to be used when the definition is not available, for example when the function is defined in another DLL. That means that in the places where it is used properly, the compiler cannot perform a sanity check anyway, so implementing a warning is not very useful (although not meaningless).
I don't think there is anything stopping the GCC developers from implementing such a warning, except the time that must be spent.
A pure function is a hint for the optimizing compiler; GCC probably doesn't take pure functions into account when you pass just -O0 (the default). So if f is pure (and defined outside of your translation unit, e.g. in some outside library), the GCC compiler might optimize y = f(x) + f(x); into something like
{
    int tmp = f(x); // tmp is a fresh variable, not appearing elsewhere
    y = tmp + tmp;
}
but if f is not pure (which is the usual case: think of f calling printf or malloc), such an optimization is forbidden.
Standard math functions like sin or sqrt are pure (except for IEEE rounding mode craziness, see http://floating-point-gui.de/ and Fluctuat for more), and they are complex enough to compute to make such optimizations worthwhile.
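For instance, a small sketch of my own of the kind of rewrite this enables (whether it actually happens depends on flags such as -fno-math-errno or -ffast-math, since the library sin may set errno):
#include <math.h>

double g(double x)
{
    // If the compiler may treat sin as pure, it can compute it once:
    // double tmp = sin(x); return tmp + tmp;
    return sin(x) + sin(x);
}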
You might compile your code with gcc -O2 -Wall -fdump-tree-all to guess what is happening inside the compiler. You could add the -fverbose-asm -S flags to get a generated *.s assembler file.
You could also read the Bismon draft report (notably its section §1.4). It might give some intuitions related to your question.
In your particular case, I am guessing that gcc is inlining your calls; and then purity matters less.
If you have time to spend, you might consider writing your own GCC plugin to make such a warning. You'll spend months in writing it! These old slides might still be useful to you, even if the details are obsolete.
At the theoretical level, be aware of Rice's theorem. A consequence of it is that deciding whether an arbitrary function is pure is undecidable, so perfect optimization of pure functions is impossible in general.
Be aware of the GCC Resource Center, located in Bombay.
I'm writing my own test-runner for my current project. One feature (that's probably quite common with test-runners) is that every testcase is executed in a child process, so the test-runner can properly detect and report a crashing testcase.
I want to also test the test-runner itself, therefore one testcase has to force a crash. I know "crashing" is not covered by the C standard and just might happen as a result of undefined behavior. So this question is more about the behavior of real-world implementations.
My first attempt was to just dereference a null-pointer:
int c = *((int *)0);
This worked in a debug build on GNU/Linux and Windows, but failed to crash in a release build because the unused variable c was optimized out, so I added
printf("%d", c); // to prevent optimizing away the crash
and thought I was settled. However, trying my code with clang instead of gcc revealed a surprise during compilation:
[CC] obj/x86_64-pc-linux-gnu/release/src/test/test/test_s.o
src/test/test/test.c:34:13: warning: indirection of non-volatile null pointer
will be deleted, not trap [-Wnull-dereference]
int c = *((int *)0);
^~~~~~~~~~~
src/test/test/test.c:34:13: note: consider using __builtin_trap() or qualifying
pointer with 'volatile'
1 warning generated.
And indeed, the clang-compiled testcase didn't crash.
So, I followed the advice of the warning and now my testcase looks like this:
PT_TESTMETHOD(test_expected_crash)
{
    PT_Test_expectCrash();
    // crash intentionally
    int *volatile nptr = 0;
    int c = *nptr;
    printf("%d", c); // to prevent optimizing away the crash
}
This solved my immediate problem, the testcase "works" (aka crashes) with both gcc and clang.
I guess that because dereferencing the null pointer is undefined behavior, clang is free to compile my first code into something that doesn't crash. The volatile qualifier removes the compiler's ability to be sure at compile time that this really will dereference null, so it can no longer delete the access.
Now my questions are:
Does this final code guarantee the null dereference actually happens at runtime?
Is dereferencing null indeed a fairly portable way for crashing on most platforms?
I wouldn't rely on that method as being robust if I were you.
Can't you use abort(), which is part of the C standard and is guaranteed to cause an abnormal program termination event?
The answer referring to abort() was great, I really didn't think of that, and it's indeed a perfectly portable way of forcing an abnormal program termination.
Trying it with my code, I came across the fact that msvcrt (Microsoft's C runtime) implements abort() in a particularly chatty way; it outputs the following to stderr:
This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.
That's not so nice, at least it unnecessarily clutters the output of a complete test run. So I had a look at __builtin_trap() that's also referenced in clang's warning. It turns out this gives me exactly what I was looking for:
LLVM code generator translates __builtin_trap() to a trap instruction if it is supported by the target ISA. Otherwise, the builtin is translated into a call to abort.
It's also available in gcc starting with version 4.2.4:
This function causes the program to exit abnormally. GCC implements this function by using a target-dependent mechanism (such as intentionally executing an illegal instruction) or by calling abort.
As this does something similar to a real crash, I prefer it over a simple abort(). As a fallback, it's still an option to try your own illegal operation like the null pointer dereference, but with a call to abort() added in case the program somehow makes it there without crashing.
So, all in all, the solution looks like this, testing for a minimum GCC version and using the much more handy __has_builtin() macro provided by clang:
#undef HAVE_BUILTIN_TRAP
#ifdef __has_builtin
# if __has_builtin(__builtin_trap)
#  define HAVE_BUILTIN_TRAP
# endif
#elif defined(__GNUC__)
# define GCC_VERSION (__GNUC__ * 10000 \
     + __GNUC_MINOR__ * 100 + __GNUC_PATCHLEVEL__)
# if GCC_VERSION > 40203
#  define HAVE_BUILTIN_TRAP
# endif
#endif

#ifdef HAVE_BUILTIN_TRAP
# define crashMe() __builtin_trap()
#else
# include <stdio.h>
# include <stdlib.h> /* for abort() */
# define crashMe() do { \
     int *volatile iptr = 0; \
     int i = *iptr; \
     printf("%d", i); \
     abort(); } while (0)
#endif
// [...]
PT_TESTMETHOD(test_expected_crash)
{
    PT_Test_expectCrash();
    // crash intentionally
    crashMe();
}
You can write to memory instead of reading from it:
*((int *)0) = 0;
No, dereferencing a NULL pointer is not a portable way of crashing a program. It is undefined behavior, which means just that, you have no guarantees what will happen.
As it happens, for the most part, under any of the three main OSes used today on desktop computers, those being macOS, Linux and Windows NT (*), dereferencing a NULL pointer will immediately crash your program.
That said: "The worst possible result of undefined behavior is for it to do what you were expecting."
I purposely put a star beside Windows NT, because under Windows 95/98/ME, I can craft a program that has the following source:
int main()
{
    int *pointer = NULL;
    int i = *pointer;
    return 0;
}
that will run without crashing. Compile it as a TINY mode .COM file under 16-bit DOS, and you'll be just fine.
Ditto running the same source with just about any C compiler under CP/M.
Ditto running that on some embedded systems. I've not tested it on an Arduino, but I would not want to bet either way on the outcome. I do know for certain that were a C compiler available for the 8051 systems I cut my teeth on, that program would run fine on those.
The program below should work. It might cause some collateral damage, though.
#include <string.h>

void crashme(char *str)
{
    char *omg;
    for (omg = strtok(str, ""); omg; omg = strtok(NULL, "")) {
        strcat(omg, "wtf");
    }
    *omg = 0; // always NUL-terminate a NULL string !!!
}
int main(void)
{
    char buff[20];
    // crashme( "WTF" ); // works!
    // crashme( NULL ); // works, too
    crashme(buff); // Maybe a bit too slow ...
    return 0;
}
int a[5] = {0};
VS
typedef struct
{
    int a[5];
} ArrStruct;

ArrStruct arrStruct;
int it, sizeA;

sizeA = sizeof(arrStruct.a) / sizeof(int);
for (it = 0; it < sizeA; ++it)
    arrStruct.a[it] = 0;
Does initializing with a for loop take more execution time? If so, why?
It depends upon the compiler and the optimization flags.
On recent GCC (e.g. 4.8 or 4.9) with gcc -O3 (or probably even -O1 or -O2) it should not matter, since the same code would be emitted (GCC even has an optimization that transforms such a loop into a builtin memset, which is optimized further).
On some compilers, it could happen that int a[5] = {0}; is faster, because the compiler might emit e.g. vector instructions (or on x86 a rep stosw) to clear the array.
The best thing is to examine the generated (GIMPLE representation and) assembler code (e.g. with gcc -fdump-tree-gimple -O3 -fverbose-asm -mtune=native -S) and to benchmark. In most cases it does not matter. Be sure to enable optimizations when compiling.
Generally, don't worry about such micro-optimization; a good optimizing compiler does it better than you have time to.
It depends on the scope of the variables. For a static or global variable, the first initialization
int a[5]={0};
may be done at compile time, while the loop is run at, well, run time. Thus there is no "execution" associated with the former.
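A small sketch of my own of the difference:
static int s[5] = {0}; /* static storage: typically ends up in zero-filled
                          .bss set up by the loader; no code runs for it */

void f(void)
{
    int a[5] = {0};    /* automatic storage: the compiler emits stores
                          (or a memset call) on every entry to f */
    (void)a;           /* silence unused-variable warnings */
}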
You may find the discussion of this question (and in particular this answer) interesting.
I have a programming problem for which I would like to declare a 256x256 array in C. Unfortunately, each time I try to declare an array of that size (of integers) and run my program, it terminates unexpectedly. Any suggestions? I haven't tried dynamic memory allocation since I cannot seem to understand how it works with multi-dimensional arrays (feel free to guide me through it, though; I am new to C). Another interesting thing to note is that I can declare a 248x248 array in C without any problems, but no larger.
dims = 256;
int majormatrix[dims][dims];
Compiled with:
gcc -msse2 -O3 -march=pentium4 -malign-double -funroll-loops -pipe -fomit-frame-pointer -W -Wall -o "SkyFall.exe" "SkyFall.c"
I am using SciTE 323 (not sure how to check GCC version).
There are three places where you can allocate an array in C:
In the automatic memory (commonly referred to as "on the stack")
In the dynamic memory (malloc/free), or
In the static memory (static keyword / global space).
Only the automatic memory has somewhat severe constraints on the amount of allocation (that is, in addition to the limits set by the operating system); dynamic and static allocations could potentially grab nearly as much space as is made available to your process by the operating system.
The simplest way to see if this is the case is to move the declaration outside your function. This would move your array to static memory. If crashes continue, they have nothing to do with the size of your array.
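Since the question mentions not being sure how dynamic allocation works with multi-dimensional arrays, here is a minimal sketch using a pointer to an array of 256 ints (the cast is needed only in C++; it is harmless in C):
#include <stdlib.h>

int main(void)
{
    /* one contiguous 256x256 block on the heap;
       majormatrix[i][j] indexes it like a normal 2D array */
    int (*majormatrix)[256] = (int (*)[256])malloc(256 * sizeof *majormatrix);
    if (majormatrix == NULL)
        return 1;
    majormatrix[255][255] = 42;
    free(majormatrix);
    return 0;
}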
Unless you're running a very old machine/compiler, there's no reason that should be too large. It seems to me the problem is elsewhere. Try the following code and tell me if it works:
#include <stdio.h>
int main()
{
    int ints[256][256], i, j;
    i = j = 0;
    while (i < 256) {
        while (j < 256) {
            ints[i][j] = i * j;
            j++;
        }
        i++;
        j = 0;
    }
    printf("Made it :) \n");
    return 0;
}
You can't assume that "terminates unexpectedly" is necessarily caused directly by "declaring a 256x256 array".
SUGGESTION:
1) Boil your code down to a simple, standalone example
2) Run it in the debugger
3) When it "terminates unexpectedly", use the debugger to get a "stack traceback" - you must identify the specific line that's failing
4) You should also look for a specific error message (if possible)
5) Post your code, the error message and your traceback
6) Be sure to tell us what platform (e.g. Centos Linux 5.5) and compiler (e.g. gcc 4.2.1) you're using, too.
I have a question regarding glibc function calls. Is there a flag to tell gcc not to inline a certain glibc function, e.g. memcpy?
I've tried -fno-builtin-memcpy and other flags, but they didn't work. The goal is that the actual glibc memcpy function is called and no inlined code is emitted (since the glibc version at compile time differs from the one at runtime). It's for testing purposes only; normally I won't do that.
Any solutions?
UPDATE:
Just to make it clearer: in the past, memcpy worked even with overlapping areas. This has changed in the meantime, and I can see these changes when compiling with different glibc versions. So now I want to test whether my old code (using memcpy where memmove should have been used) works correctly on a system with a newer glibc (e.g. 2.14). But to do that, I have to make sure that the new memcpy is called and not inlined code.
Best regards
This may not be exactly what you're looking for, but it seems to force gcc to generate an indirect call to memcpy():
#include <stdio.h>
#include <string.h>
#include <time.h>
// void *memcpy(void *dest, const void *src, size_t n)
unsigned int x = 0xdeadbeef;
unsigned int y;
int main(void) {
    void *(*memcpy_ptr)(void *, const void *, size_t) = memcpy;
    if (time(NULL) == 1) {
        memcpy_ptr = NULL;
    }
    memcpy_ptr(&y, &x, sizeof y);
    printf("y = 0x%x\n", y);
    return 0;
}
The generated assembly (gcc, Ubuntu, x86) includes a call *%edx instruction.
Without the if (time(NULL) == 1) test (which should never succeed, but the compiler doesn't know that), gcc -O3 is clever enough to recognize that the indirect call always calls memcpy(), which can then be replaced by a movl instruction.
Note that the compiler could recognize that if memcpy_ptr == NULL then the behavior is undefined, and again replace the indirect call with a direct call, and then with a movl instruction. gcc 4.5.2 with -O3 doesn't appear to be that clever. If a later version of gcc is, you could replace the memcpy_ptr = NULL with an assignment of some actual function that behaves differently than memcpy().
In theory:
gcc -fno-inline -fno-builtin ...
But then you said -fno-builtin-memcpy didn't stop the compiler from inlining it, so there's no obvious reason why this should work any better.
#undef memcpy
#define memcpy your_memcpy_replacement
Somewhere at the top, but after the #includes, obviously.
And mark your_memcpy_replacement with __attribute__((noinline)).
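A sketch of that approach (your_memcpy_replacement is this answer's placeholder name; compile the wrapper in its own translation unit, or with -fno-builtin-memcpy, so that the inner call really reaches the glibc function):
#include <string.h>

// wrapper.c -- build with: gcc -c -fno-builtin-memcpy wrapper.c
__attribute__((noinline))
void *your_memcpy_replacement(void *dest, const void *src, size_t n)
{
    return memcpy(dest, src, n); // intended to be a genuine call into glibc
}

// In the code under test, somewhere after the #includes:
// #undef memcpy
// #define memcpy your_memcpy_replacement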