Today I was reading about pure functions and got confused about their use:
A function is said to be pure if it returns the same value for the same inputs and has no observable side effects.
e.g. strlen() is a pure function while rand() is an impure one.
#include <stdio.h>
__attribute__ ((pure)) int fun(int i)
{
    return i*i;
}
int main(void)
{
    int i=10;
    printf("%d",fun(i));//outputs 100
    return 0;
}
http://ideone.com/33XJU
The above program behaves the same way with or without the pure declaration.
What are the benefits of declaring a function as pure (if there is no change in output)?
pure lets the compiler know that it can make certain optimisations about the function: imagine a bit of code like
for (int i = 0; i < 1000; i++)
{
printf("%d", fun(10));
}
With a pure function, the compiler can know that it needs to evaluate fun(10) once and once only, rather than 1000 times. For a complex function, that's a big win.
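To make the loop case concrete, here is a rough sketch of the rewrite the optimizer is permitted to perform, done by hand. The names square_of, sum_in_loop and sum_hoisted are made up for illustration, the attribute syntax is the GCC/Clang one, and whether a given compiler really hoists the call depends on its version and optimization level.

```c
/* Hypothetical pure function: the result depends only on the argument. */
__attribute__((pure)) static int square_of(int i)
{
    return i * i;
}

/* As written in the source: the call sits inside the loop. */
long sum_in_loop(int n)
{
    long total = 0;
    for (int i = 0; i < n; i++)
        total += square_of(10);
    return total;
}

/* What the compiler may turn it into: one call, result reused. */
long sum_hoisted(int n)
{
    long total = 0;
    const int once = square_of(10);   /* evaluated a single time */
    for (int i = 0; i < n; i++)
        total += once;
    return total;
}
```

Both functions return the same value for every n; the second simply spells out the single evaluation that the pure attribute makes legal.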
When you say a function is 'pure' you are guaranteeing that it has no externally visible side-effects (and as a comment says, if you lie, bad things can happen). Knowing that a function is 'pure' has benefits for the compiler, which can use this knowledge to do certain optimizations.
Here is what the GCC documentation says about the pure attribute:
pure
Many functions have no effects except the return value and their return
value depends only on the parameters and/or global variables.
Such a function can be subject to common subexpression elimination and
loop optimization just as an arithmetic operator would be. These
functions should be declared with the attribute pure. For example,
int square (int) __attribute__ ((pure));
Philip's answer already shows how knowing a function is 'pure' can help with loop optimizations.
Here is one for common sub-expression elimination (given foo is pure):
a = foo (99) * x + y;
b = foo (99) * x + z;
Can become:
_tmp = foo (99) * x;
a = _tmp + y;
b = _tmp + z;
In addition to possible run-time benefits, a pure function is much easier to reason about when reading code. Furthermore, it's much easier to test a pure function since you know that the return value only depends on the values of the parameters.
A non-pure function
int foo(int x, int y) // possible side-effects
is like an extension of a pure function
int bar(int x, int y) // guaranteed no side-effects
in which you have, besides the explicit function arguments x, y,
the rest of the universe (or anything your computer can communicate with) as an implicit potential input. Likewise, besides the explicit integer return value, anything your computer can write to is implicitly part of the return value.
It should be clear why it is much easier to reason about a pure function than a non-pure one.
Just as an add-on, I would like to mention that C++11 codifies things somewhat using the constexpr keyword. Example:
#include <iostream>
#include <cstring>
constexpr unsigned static_strlen(const char * str, unsigned offset = 0) {
return (*str == '\0') ? offset : static_strlen(str + 1, offset + 1);
}
constexpr const char * str = "asdfjkl;";
constexpr unsigned len = static_strlen(str); //MUST be evaluated at compile time
//so, for example, this: int arr[len]; is legal, as len is a constant.
int main() {
std::cout << len << std::endl << std::strlen(str) << std::endl;
return 0;
}
The restrictions on the usage of constexpr make it so that the function is provably pure. This way, the compiler can more aggressively optimize (just make sure you use tail recursion, please!) and evaluate the function at compile time instead of run time.
So, to answer your question: if you're using C++ (I know you said C, but they are related), writing a pure function in the correct style allows the compiler to do all sorts of cool things with the function :-)
In general, pure functions have three advantages over impure functions that the compiler can exploit:
Caching
Let's say you have a pure function f that is called 100000 times with the same arguments. Since it is deterministic and depends only on its parameters, the compiler can calculate its value once and reuse it wherever necessary.
Parallelism
Pure functions don't write to any shared memory (and, in the strict sense, don't read mutable shared state either), and therefore can run in separate threads without any unexpected consequences.
Passing By Reference
A function f(struct t) receives its argument t by value. If f is declared pure, the compiler can instead pass t by reference, since it is guaranteed that f will not change t, and gain performance by skipping the copy.
In addition to the compile-time considerations, pure functions can be tested fairly easily: just call them.
No need to construct objects or mock connections to DBs / file system.
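As a small illustration of the testing point, compare an impure function that reads hidden state with a pure one that takes everything as a parameter. The names gross_price_impure, gross_price and tax_rate_percent are invented for this sketch.

```c
/* Impure: the result depends on hidden, mutable state. */
static int tax_rate_percent = 20;

int gross_price_impure(int net)
{
    return net + net * tax_rate_percent / 100;
}

/* Pure: everything the result depends on is a parameter. */
int gross_price(int net, int rate_percent)
{
    return net + net * rate_percent / 100;
}
```

A test for gross_price is a single call with known inputs, such as checking that gross_price(100, 20) yields 120. A test for gross_price_impure first has to put tax_rate_percent into a known state.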
I'm learning C and programming in general, and I don't know when to return a value and when to use void.
Is there any rule for when to use one over the other?
Is there any difference between these two cases? I know that the first case is
working with a local copy of the int (n), and the second with the original value.
#include <stdio.h>
int case_one(int n)
{
return n + 2;
}
void case_two(int *n)
{
*n = *n + 2;
}
int main(int argc, char *argv[])
{
int n = 5;
n = case_one(n);
printf("%i\n", n);
n = 5;
case_two(&n);
printf("%i\n", n);
return 0;
}
There is one more reason to use an out parameter instead of a return value: error handling. In C, the (int) return value of a function call usually indicates whether the operation succeeded, with a nonzero value representing an error.
Example:
#include <stdio.h>
int extract_ip(const char *str, int out[4]) {
    /* parsing omitted in this stub; it always reports failure */
    (void)str;
    (void)out;
    return -1;
}
int main(void) {
    int out[4];
    int rv = extract_ip("test", out);
    if (rv != 0) {
        printf("Error: %d\n", rv);
    }
    return 0;
}
This approach is used in the POSIX socket API, for example.
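For a version of the idea that actually parses something, here is a minimal sketch built on sscanf. The name parse_ipv4 and the exact validation rules are assumptions made for illustration, not part of any library.

```c
#include <stdio.h>

/* Returns 0 on success and fills out[0..3]; returns -1 on malformed input.
   parse_ipv4 is an illustrative name, not a standard function. */
int parse_ipv4(const char *str, int out[4])
{
    int part[4];
    char extra;   /* catches trailing garbage such as "1.2.3.4x" */

    if (sscanf(str, "%d.%d.%d.%d%c",
               &part[0], &part[1], &part[2], &part[3], &extra) != 4)
        return -1;
    for (int i = 0; i < 4; i++)
        if (part[i] < 0 || part[i] > 255)
            return -1;
    for (int i = 0; i < 4; i++)
        out[i] = part[i];   /* the output is written only on success */
    return 0;
}
```

The return value carries only the status, so the caller can write if (parse_ipv4(s, ip) != 0) and handle the error before touching ip.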
It very much depends on what you want to do, but basically you should use the former unless you have a good reason to use the latter.
Think of the implications of the choices. First, let's think about the way we provide input to a function. Quite often you provide a literal constant, a compile-time constant, or temporary data as input:
foo(1);
const int a = 2;
foo(a);
int x = 5;
int y = 5;
foo(x + y);
In all of the above cases the source of initial value is not a viable location for storing the result.
Next, let's think about how we may want to store the result. Foremost we may often want to use the result elsewhere. It may be inconvenient to use the same variable to store and then pass input, and to store output. But furthermore, often we would like to use the result immediately. We invoke the function as a part of larger expression:
x = foo(1) + foo(2);
Rewriting the preceding line in a manner that would use a pointer would require much unnecessary code - that is time and complication that we certainly don't want, when it's not really buying anything.
So when do we actually want to use a pointer? C functions are pass-by-value: whenever we pass anything, a copy is created. We can then work on that copy, and returning it requires copying again. If we know that all we want to do is manipulate a certain data set in place, we can pass a pointer as a handle, and all that is copied is the few bytes that store the address.
So the former, prevalent way to create functions is flexible and leads to concise usage. The latter is useful for manipulation of objects in place.
Obviously sometimes our input actually is an address, and that's a trivial case for using pointers as function parameters.
I'm learning C and programming in general, and I don't know when to return a value and when to use void.
There is no definitive rule, and it is a matter of opinion. Notice that you might re-code your case_one as the following:
// we take the convention that the first argument would be ...
// a pointer to the "result"
void case_one_proc(int *pres, int n) {
*pres = n+2;
}
then a code like
int i = j+3; /// could be any arbitrary expression initializing i
int r = case_one(i);
is equivalent to
int i = j+3; // the same expression initializing i
int r;
case_one_proc(&r, i); //we pass the address of the "result" as first argument
Hence, you can guess that you might replace any whole C program with an equivalent program having only void returning functions (that is, only procedures). Of course, you may have to introduce supplementary variables like r above.
So you see that you might even avoid any value returning function. However, that would not be convenient (for human developers to code and to read other code) and might not be efficient.
(actually you could even make a complex C program which transforms the text of any C program -given by their several translation units- into another equivalent C program without any value returning function)
Notice that at the most elementary machine code level (at least on real processors like x86 and ARM), everything is an instruction, and expressions don't exist anymore! And your favorite C compiler transforms your program (and, in practice, every C program) into such machine code.
If you want more theory about such whole-program transformations, read about A-normal forms and about continuations (and CPS transformations)
Is there any rule to apply when to use one over the another ?
The rule is to be pragmatic, and favor first the readability and understandability of your source code. As a rule of thumb, any pure function implementing a mathematical function (like your case_one, which mathematically is a translation) is better coded as returning some result. Conversely, any program function which has mostly side effects is often coded as returning void. For cases in between, use your common sense, and look at existing practice -their source code- in existing free software projects (e.g. on github). Often a side effecting procedure might return some error code (or some success flag). Read documentation of printf & scanf for good examples.
Notice also that many ABIs and calling conventions pass the result (and some arguments) in processor registers, which is faster than passing through memory.
Notice also that C has only call by value.
Some programming languages have only functions returning one value (sometimes ignored, or uninteresting), and have just expressions, without any notion of statement or instruction. Read about functional programming. An excellent example of a programming language with only value returning functions is Scheme, and SICP is an excellent introductory book (freely available) to programming that I strongly recommend.
The first approach is preferred, where you return a value. The second can be used when multiple values are computed by the function.
For example, the function strtol has this prototype:
long strtol(const char *s, char **endp, int base);
strtol attempts to interpret the initial portion of the string pointed to by s as the representation of a long integer expressed in base base. It returns the converted value with a return statement, and stores a pointer to the character that follows the number in s into *endp. Note however that this standard function should have returned 3 values: the converted value, the updated pointer and a success indicator.
There are other ways to return multiple values:
returning a structure.
updating a structure to which you receive a pointer.
C offers some flexibility. Sometimes different methods are equivalent and choosing one is mostly a matter of conventions, personal style or local practice, but for the example you give, the first option is definitely preferred.
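As a sketch of the "returning a structure" option, the wrapper below bundles the three things strtol ideally would have returned. The struct parse_result and the function parse_long are hypothetical names; only strtol, errno and ERANGE come from the standard library.

```c
#include <errno.h>
#include <stdlib.h>

struct parse_result {
    long value;        /* converted number, meaningful only if ok != 0 */
    const char *rest;  /* first character after the number */
    int ok;            /* 1 on success, 0 on failure */
};

struct parse_result parse_long(const char *s, int base)
{
    struct parse_result r = { 0, s, 0 };
    char *end;

    errno = 0;
    long v = strtol(s, &end, base);
    if (end != s && errno != ERANGE) {   /* digits consumed, no overflow */
        r.value = v;
        r.rest = end;
        r.ok = 1;
    }
    return r;
}
```

The caller gets all three results from one expression, for example struct parse_result r = parse_long("42abc", 10);, at the cost of copying a small struct.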
For example:
int f1() {
return 3;
}
void f2(int *num) {
*num = 3;
}
int n1, n2;
n1 = f1();
f2(&n2);
With f1, we can return a value and do "variable=f1()"
But the same can be done with a void function that updates the value of that variable given its address without having to do "variable=f1()".
So, does this mean that we can actually just use void functions for everything? Or is there something that a void function cannot do to replace another int function/(type) function?
The main problem with making everything a void function (which in some people's lexicon is called a "routine") is that you can't chain them easily:
f(g(x))
becomes, if you really want to chain it:
int gout;
f((g(x, &gout), gout))
Which is painful.
Yes, you could use void return types for everything and rely exclusively on returning via modified parameters. In fact, you could avoid using functions entirely and put everything in your main function.
As with any other feature of the language, return values give you particular advantages, and it's up to you to decide if you want them. Here are some advantages of return values off the top of my head:
Returned values can be assigned to const variables, which can make your code easier to reason about
Certain types of optimisation can be applied by the compiler for returned values (this is more applicable to C++ RVO but may also apply to C's structs; I'm not sure)
Code which uses returned values is often easier to read, especially when the functions are mathematical (e.g. imagine having to declare all the temporaries manually for a large mathematical operation using sin/cos/etc. if they required the output to be via parameters). Compare:
double x = A*sin(a) + B*cos(b);
with
double tmpA, tmpB;
sin(&tmpA, a);
cos(&tmpB, b);
double x = A * tmpA + B * tmpB;
or to use a similar structure as John Zwinck suggested in his answer:
double tmpA, tmpB;
double x = A * (sin(&tmpA, a), tmpA) + B * (cos(&tmpB, b), tmpB);
It is guaranteed that the value will be set no matter what happens inside the function, as this is enforced by the compiler (except in some very special cases such as longjmp)
You do not need to worry about checking if the assigned value is used or not; you can return the value and if the requester doesn't need it, they can ignore it (compare this to needing NULL-checks everywhere in your alternative method)
Of course there are also disadvantages:
You only get a single return value, so if your function logically returns multiple types of data (and they can't logically be combined into a single struct), returning via parameters may be better
Large objects may introduce performance penalties due to the need to copy them (which is why RVO was introduced in C++, which makes this much less of an issue)
So, does this mean that we can actually just use void functions for everything?
Indeed. And as it turns out, doing so is a fairly common coding style. But rather than void, such styles usually state that the return value should always be reserved for error codes.
In practice, you usually won't be able to stick to such a style consistently. There are some special cases where not using the return value becomes inconvenient.
For example, when writing callback functions of the kind used by the standard C generic functions bsearch or qsort. They expect a callback of the format
int compare (const void *p1, const void *p2);
where the function returns less than zero, more than zero or zero. Design-wise it is important to keep the parameters passed as read-only; you wouldn't want your generic search algorithm to suddenly start modifying the searched contents. So while there is no reason in theory why these kinds of functions couldn't have a void return type too, in practice it would make the code uglier and harder to read.
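For reference, a callback of that shape might look like the sketch below. compare_ints and sort_ints are illustrative names; qsort and its callback signature are the standard ones from <stdlib.h>.

```c
#include <stdlib.h>

/* Returns <0, 0 or >0; the pointed-to elements are only read. */
static int compare_ints(const void *p1, const void *p2)
{
    int a = *(const int *)p1;
    int b = *(const int *)p2;
    return (a > b) - (a < b);   /* avoids the overflow risk of a - b */
}

void sort_ints(int *values, size_t count)
{
    qsort(values, count, sizeof values[0], compare_ints);
}
```

Here the return value is the natural channel for the result, and the const-qualified parameters document that the comparison does not modify what it compares.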
Of course you could; but that does not make it a good idea.
It may not always be convenient or lead to easy-to-comprehend code. A function returning void cannot be used directly as an operand in an expression. For example, while you could write:
if( f1() == 3 )
{
...
}
for f2() you would have to write:
f2( &answer ) ;
if( answer == 3 )
{
...
}
Another issue is one of access control - by passing a pointer to the function you are giving that function indirect access to the caller's data, which is fine so long as the function is well behaved and does not overrun. A pointer may refer to a single object or an array of objects - the function taking that pointer has to impose appropriate rules, so it is intrinsically less safe.
I have a function which returns an integer value. Now I want to write a macro which calls this function, gets the return value, prepends a string to it and returns the resultant string.
I have tried this:
#define TEST(x) is_enabled(x)
I call this macro in the main function as:
int ret = 0;
ret = TEST(2);
printf("PORT-%d\n", ret);
This works perfectly. However I want the macro to return the string PORT-x, where, x is the return value of the called function. How can I do this?
EDIT :
I also tried writing it into multiple lines as:
#define TEST(x)\
{\
is_enabled(x);\
}
And called it in the main function as:
printf("PORT-%d\n", TEST(2));
But this gives a compile time error:
error: expected expression before ‘{’ token
Use a function, not a macro. There is no good reason to use a macro here.
You can solve it by using sprintf(3), in conjunction with malloc or a buffer. See "Creating C formatted strings (not printing them)" or the man pages for details.
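A function-based version with a caller-supplied buffer could look roughly like this. port_label is a made-up name, and snprintf is used instead of sprintf so that the write can never exceed the buffer.

```c
#include <stdio.h>

/* Writes "PORT-<n>" into buf (at most size bytes, terminated when
   size > 0) and returns buf so the call can be used inline. */
char *port_label(char *buf, size_t size, int port)
{
    snprintf(buf, size, "PORT-%d", port);
    return buf;
}
```

With the question's is_enabled function, the call site would then read something like puts(port_label(buf, sizeof buf, is_enabled(2))); with char buf[16]; declared beforehand.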
About your edit: You don't need to use braces {} in a macro, and they are causing your error as preprocessing would translate it to something like
printf("format%d", {
is_enabled(x);
});
To better understand macros, run gcc or clang with -E flag, or try to read this article: http://en.wikipedia.org/wiki/C_preprocessor
That's a bit of a pain since you need to ensure there's storage for the string. In all honesty, macros nowadays could be reserved for conditional compilation only.
Constants are better done with enumerated types, and macro functions are generally better as inline functions (with the knowledge that inline is a suggestion to the compiler, not a demand).
If you insist on using a macro, the storage could be done with static storage though that has problems with threads if you're using them, and delayed/multiple use of the returned string.
You could also dynamically allocate the string but then you have to free it when done, and handle out-of-memory conditions.
Perhaps the easiest way is to demand the macro user provide their own storage, along the lines of:
#include <stdio.h>
#define TEST2_STR(b,p) (sprintf(b,"PORT-%d",p),b)
int main (void) {
char buff[20];
puts (TEST2_STR(buff, 42));
return 0;
}
which outputs:
PORT-42
In case the macro seems a little confusing, it makes use of the comma operator, in which the expression (a, b) evaluates both a and b, and has a result of b.
In this case, it evaluates the sprintf (which populates the buffer) then "returns" the buffer. And, even if you think you've never seen that before, you're probably wrong:
for (i = 0, j = 9; i < 10; i++, j--)
xyzzy[i] = plugh[j];
Despite most people thinking that's a feature of for, it's very much a different construct that can be used in many different places:
int i, j, k;
i = 7, j = 4, k = 42;
while (puts("Hello, world"),sleep(1),1);
(and so on).
I often see instances in which using a macro is better than using a function.
Could someone explain to me, with an example, the disadvantage of a macro compared to a function?
Macros are error-prone because they rely on textual substitution and do not perform type-checking. For example, this macro:
#define square(a) a * a
works fine when used with an integer:
square(5) --> 5 * 5 --> 25
but does very strange things when used with expressions:
square(1 + 2) --> 1 + 2 * 1 + 2 --> 1 + 2 + 2 --> 5
square(x++) --> x++ * x++ --> modifies x twice with no sequence point in between (undefined behavior)
Putting parentheses around arguments helps but doesn't completely eliminate these problems.
When macros contain multiple statements, you can get in trouble with control-flow constructs:
#define swap(x, y) t = x; x = y; y = t;
if (x < y) swap(x, y); -->
if (x < y) t = x; x = y; y = t; --> if (x < y) { t = x; } x = y; y = t;
The usual strategy for fixing this is to put the statements inside a "do { ... } while (0)" loop.
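A sketch of that idiom, using an int-only swap so that the temporary can live inside the macro (SWAP_INT and order_pair are illustrative names):

```c
/* Works for int lvalues only; the do/while makes the body one statement. */
#define SWAP_INT(x, y) do { int tmp_ = (x); (x) = (y); (y) = tmp_; } while (0)

/* Puts the smaller value in *lo and the larger in *hi. */
void order_pair(int *lo, int *hi)
{
    if (*lo > *hi)
        SWAP_INT(*lo, *hi);   /* the whole swap is guarded by the if */
    else
        (void)0;              /* the else still pairs with the if above */
}
```

Because the expansion is a single statement that needs a trailing semicolon, the macro behaves like a function call inside if/else, which the three-statement version above does not.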
If you have two structures that happen to contain a field with the same name but different semantics, the same macro might work on both, with strange results:
struct shirt
{
int numButtons;
};
struct webpage
{
int numButtons;
};
#define num_button_holes(shirt) ((shirt).numButtons * 4)
struct webpage page;
page.numButtons = 2;
num_button_holes(page) -> 8
Finally, macros can be difficult to debug: they can produce weird syntax errors or runtime errors that you have to expand to understand (e.g. with gcc -E), and debuggers cannot step through macros. For example:
#define print(x, y) printf(x y) /* accidentally forgot comma */
print("foo %s", "bar") /* expands to printf("foo %s" "bar"): the format becomes "foo %sbar" and %s has no argument */
Inline functions and constants help to avoid many of these problems with macros, but aren't always applicable. Where macros are deliberately used to specify polymorphic behavior, unintentional polymorphism may be difficult to avoid. C++ has a number of features such as templates to help create complex polymorphic constructs in a typesafe way without the use of macros; see Stroustrup's The C++ Programming Language for details.
Macro features:
Macros are preprocessed
No type checking
Code size increases with each use
Arguments with side effects may be evaluated more than once
Execution can be faster (no call overhead)
Before compilation, the macro name is replaced by the macro body
Useful where a small piece of code appears many times
Errors surface only in the expanded code, not in the macro definition
Function features:
Functions are compiled
Type checking is done
Code size stays the same regardless of the number of calls
Arguments are evaluated exactly once
Execution can be slower (call overhead, unless the call is inlined)
During a function call, a transfer of control takes place
Useful where a large piece of code appears many times
The compiler checks the function body for errors
Side-effects are a big one. Here's a typical case:
#define min(a, b) (a < b ? a : b)
min(x++, y)
gets expanded to:
(x++ < y ? x++ : y)
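x gets incremented twice whenever x < y. (This particular expansion is well-defined, because ?: has a sequence point after its first operand, but it is almost never what the caller intended.)
Writing multi-line macros is also a pain: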
#define foo(a,b,c) \
a += 10; \
b += 10; \
c += 10;
They require a \ at the end of each line.
Macros can't "return" anything unless you make it a single expression:
int foo(int *a, int *b){
side_effect0();
side_effect1();
return a[0] + b[0];
}
Can't do that in a macro unless you use GCC's statement expressions. (EDIT: You can use a comma operator though... overlooked that... But it might still be less readable.)
Order of Operations: (courtesy of #ouah)
#define min(a,b) (a < b ? a : b)
min(x & 0xFF, 42)
gets expanded to:
(x & 0xFF < 42 ? x & 0xFF : 42)
But & has lower precedence than <. So 0xFF < 42 gets evaluated first.
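The usual remedy is to parenthesize every parameter in the macro body. The sketch below uses a hypothetical helper, low_byte_capped, to show the parenthesized MIN giving the intended grouping.

```c
/* Each parameter is wrapped, so operators in the arguments cannot
   regroup with the < or ?: of the macro body. */
#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Returns the low byte of x, capped at 42. */
unsigned low_byte_capped(unsigned x)
{
    return MIN(x & 0xFFu, 42u);
}
```

This fixes the precedence problem only; an argument with side effects is still evaluated more than once.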
When in doubt, use functions (or inline functions).
Most answers here explain the problems with macros rather than taking the simplistic view that macros are evil because silly accidents are possible. You can be aware of the pitfalls and learn to avoid them, then use macros only when there is a good reason to.
There are certain exceptional cases where there are advantages to using macros, these include:
Generic functions: as noted below, you can have a macro that can be used on different types of input arguments.
A variable number of arguments can map to different functions instead of using C's va_args, e.g. https://stackoverflow.com/a/24837037/432509.
They can optionally include local info, such as debug strings (__FILE__, __LINE__, __func__), check for pre/post conditions, assert on failure, or even use static asserts so the code won't compile on improper use (mostly useful for debug builds).
Inspecting input args: you can do tests on input args such as checking their type or sizeof, checking that struct members are present before casting (can be useful for polymorphic types), or checking that an array meets some length condition. See: https://stackoverflow.com/a/29926435/432509
While it's noted that functions do type checking, C will coerce values too (ints/floats for example). In rare cases this may be problematic. It's possible to write macros which are more exacting than a function about their input args. See: https://stackoverflow.com/a/25988779/432509
Their use as wrappers to functions: in some cases you may want to avoid repeating yourself, e.g. func(FOO, "FOO");, where you could define a macro that expands the string for you: func_wrapper(FOO);
When you want to manipulate variables in the caller's local scope, passing a pointer to a pointer normally works just fine, but in some cases it's still less trouble to use a macro (assignments to multiple variables for per-pixel operations are an example where you might prefer a macro over a function, though it still depends a lot on the context, since inline functions may be an option).
Admittedly, some of these rely on compiler extensions which aren't standard C. This means you may end up with less portable code, or have to ifdef them in, so they're only taken advantage of when the compiler supports them.
Avoiding multiple argument instantiation
Noting this since it's one of the most common causes of errors in macros (passing in x++ for example, where a macro may increment multiple times).
It's possible to write macros that avoid side effects from multiple instantiation of arguments.
C11 Generic
If you'd like a square macro that works with various types and have C11 support, you could do this...
/* static inline avoids link errors when a call is not actually inlined */
static inline float _square_fl(float a) { return a * a; }
static inline double _square_dbl(double a) { return a * a; }
static inline int _square_i(int a) { return a * a; }
static inline unsigned int _square_ui(unsigned int a) { return a * a; }
static inline short _square_s(short a) { return a * a; }
static inline unsigned short _square_us(unsigned short a) { return a * a; }
/* ... long, char ... etc */
#define square(a) \
_Generic((a), \
float: _square_fl(a), \
double: _square_dbl(a), \
int: _square_i(a), \
unsigned int: _square_ui(a), \
short: _square_s(a), \
unsigned short: _square_us(a))
Statement expressions
This is a compiler extension supported by GCC, Clang, EKOPath & Intel C++ (but not MSVC);
#define square(a_) __extension__ ({ \
typeof(a_) a = (a_); \
(a * a); })
So the disadvantage with macros is you need to know to use these to begin with, and that they aren't supported as widely.
One benefit is, in this case, you can use the same square function for many different types.
Example 1:
#define SQUARE(x) ((x)*(x))
int main() {
int x = 2;
int y = SQUARE(x++); // Undefined behavior even though it doesn't look
// like it here
return 0;
}
whereas:
int square(int x) {
return x * x;
}
int main() {
int x = 2;
int y = square(x++); // fine
return 0;
}
Example 2:
struct foo {
int bar;
};
#define GET_BAR(f) ((f)->bar)
int main() {
struct foo f;
int a = GET_BAR(&f); // fine
int b = GET_BAR(&a); // error, but the message won't make much sense unless you
// know what the macro does
return 0;
}
Compared to:
struct foo {
int bar;
};
int get_bar(struct foo *f) {
return f->bar;
}
int main() {
struct foo f;
int a = get_bar(&f); // fine
int b = get_bar(&a); // error, but compiler complains about passing int* where
// struct foo* should be given
return 0;
}
No type checking of parameters and code is repeated which can lead to code bloat. The macro syntax can also lead to any number of weird edge cases where semi-colons or order of precedence can get in the way. Here's a link that demonstrates some macro evil
One drawback to macros is that debuggers read source code, which does not have expanded macros, so running a debugger in a macro is not necessarily useful. Needless to say, you cannot set a breakpoint inside a macro like you can with functions.
Functions do type checking. This gives you an extra layer of safety.
Adding to this answer..
Macros are substituted directly into the program by the preprocessor (since they basically are preprocessor directives). So a macro used in many places usually takes more code space than the corresponding function. On the other hand, a function call costs time to make and to return from, and this overhead can be avoided by using macros.
Macros also have some special tools that can help with program portability across platforms.
Unlike functions, macros don't need a data type assigned to their arguments.
Overall they are a useful tool in programming. And both macroinstructions and functions can be used depending on the circumstances.
I did not notice, in the answers above, one advantage of functions over macros that I think is very important:
Functions can be passed as arguments, macros cannot.
Concrete example: You want to write an alternate version of the standard 'strpbrk' function that will accept, rather than an explicit list of characters to search for within another string, a (pointer to a) function that will return 0 until a character is found that passes some test (user-defined). One reason you might want to do this is so that you can exploit other standard library functions: instead of providing an explicit string full of punctuation, you could pass ctype.h's 'ispunct' instead, etc. If 'ispunct' was implemented only as a macro, this wouldn't work.
There are lots of other examples. For example, if your comparison is accomplished by macro rather than function, you can't pass it to stdlib.h's 'qsort'.
An analogous situation in Python is 'print' in version 2 vs. version 3 (non-passable statement vs. passable function).
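A rough sketch of the strpbrk variant described above, taking a predicate instead of a character list. find_first_matching is a hypothetical name; ispunct and isspace are the standard <ctype.h> functions.

```c
#include <ctype.h>
#include <stddef.h>

/* Like strpbrk, but the "set" is a predicate such as ispunct or isdigit.
   Returns a pointer to the first matching character, or NULL if none. */
char *find_first_matching(const char *s, int (*pred)(int))
{
    for (; *s != '\0'; s++) {
        /* <ctype.h> predicates require an unsigned char value (or EOF) */
        if (pred((unsigned char)*s))
            return (char *)s;
    }
    return NULL;
}
```

A call such as find_first_matching(text, ispunct) works because the library provides ispunct as a real function, even where it also defines a macro of the same name.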
If you pass a function call as an argument to a macro, it is evaluated every time the argument appears in the expansion.
For example, if you call one of the most popular macros:
#define MIN(a,b) ((a)<(b) ? (a) : (b))
like that
int min = MIN(functionThatTakeLongTime(1),functionThatTakeLongTime(2));
functionThatTakeLongTime will be evaluated 3 times (both calls for the comparison, then the selected one again for the result), which can significantly hurt performance.
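Counting the calls makes the cost visible. In this sketch, slow_identity is a stand-in for the expensive function and only increments a counter; all names are invented for illustration.

```c
#define MIN(a, b) ((a) < (b) ? (a) : (b))

static int call_count;

/* Stand-in for an expensive function; it only counts its calls. */
static int slow_identity(int x)
{
    call_count++;
    return x;
}

/* Calls passed straight to the macro: returns how many calls were made. */
int calls_with_macro(void)
{
    call_count = 0;
    int m = MIN(slow_identity(1), slow_identity(2));
    (void)m;
    return call_count;   /* two calls to compare, one more for the result */
}

/* Results stored in locals first: each call happens exactly once. */
int calls_with_locals(void)
{
    call_count = 0;
    int a = slow_identity(1);
    int b = slow_identity(2);
    int m = MIN(a, b);
    (void)m;
    return call_count;
}
```

calls_with_macro reports three calls and calls_with_locals reports two, so storing the results in locals (or using a real function for the minimum) removes the repeated evaluation.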
I'm trying to understand when and when not to use the restrict keyword in C and in what situations it provides a tangible benefit.
After reading, "Demystifying The Restrict Keyword", ( which provides some rules of thumb on usage ), I get the impression that when a function is passed pointers, it has to account for the possibility that the data pointed to might overlap (alias) with any other arguments being passed into the function. Given a function:
void foo(int *a, int *b, int *c, int n) {
for (int i = 0; i<n; ++i) {
b[i] = b[i] + c[i];
a[i] = a[i] + b[i] * c[i];
}
}
the compiler has to reload c in the second expression, because maybe b and c point to the same location. It also has to wait for b to be stored before it can load a for the same reason. It then has to wait for a to be stored and must reload b and c at the beginning of the next loop. If you call the function like this:
int a[N];
foo(a, a, a, N);
then you can see why the compiler has to do this. Using restrict effectively tells the compiler that you will never do this, so that it can drop the redundant load of c and load a before b is stored.
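For comparison, this is roughly what the same loop looks like with the promise spelled out. foo_restrict is a name chosen for this sketch; the body is unchanged, and c is additionally marked const because it is only read.

```c
/* Same loop, with the caller's no-overlap promise made explicit (C99). */
void foo_restrict(int *restrict a, int *restrict b,
                  const int *restrict c, int n)
{
    for (int i = 0; i < n; ++i) {
        b[i] = b[i] + c[i];
        a[i] = a[i] + b[i] * c[i];
    }
}
```

With three distinct arrays the result is identical to the unqualified version. A call like foo_restrict(a, a, a, N) would break the promise and has undefined behavior, and the compiler is not required to diagnose it.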
In a different SO post, Nils Pipenbrinck, provides a working example of this scenario demonstrating the performance benefit.
So far I've gathered that it's a good idea to use restrict on pointers you pass into functions which won't be inlined. Apparently if the code is inlined the compiler can figure out that the pointers don't overlap.
Now here's where things start getting fuzzy for me.
In Ulrich Drepper's paper, "What every programmer should know about memory" he makes the statement that, "unless restrict is used, all pointer accesses are potential sources of aliasing," and he gives a specific code example of a submatrix matrix multiply where he uses restrict.
However, when I compile his example code either with or without restrict I get identical binaries in both cases. I'm using gcc version 4.2.4 (Ubuntu 4.2.4-1ubuntu4)
The thing I can't figure out in the following code is whether it needs to be rewritten to make more extensive use of restrict, or if the alias analysis in GCC is just so good that it's able to figure out that none of the arguments alias each other. For purely educational purposes, how can I make using or not using restrict matter in this code - and why?
For restrict compiled with:
gcc -DCLS=$(getconf LEVEL1_DCACHE_LINESIZE) -DUSE_RESTRICT -Wextra -std=c99 -O3 matrixMul.c -o matrixMul
Just remove -DUSE_RESTRICT to not use restrict.
#include <stdlib.h>
#include <stdio.h>
#include <emmintrin.h>
#ifdef USE_RESTRICT
#else
#define restrict
#endif
#define N 1000
double _res[N][N] __attribute__ ((aligned (64)));
double _mul1[N][N] __attribute__ ((aligned (64)))
= { [0 ... (N-1)]
= { [0 ... (N-1)] = 1.1f }};
double _mul2[N][N] __attribute__ ((aligned (64)))
= { [0 ... (N-1)]
= { [0 ... (N-1)] = 2.2f }};
#define SM (CLS / sizeof (double))
void mm(double (* restrict res)[N], double (* restrict mul1)[N],
double (* restrict mul2)[N]) __attribute__ ((noinline));
void mm(double (* restrict res)[N], double (* restrict mul1)[N],
double (* restrict mul2)[N])
{
int i, i2, j, j2, k, k2;
double *restrict rres;
double *restrict rmul1;
double *restrict rmul2;
for (i = 0; i < N; i += SM)
for (j = 0; j < N; j += SM)
for (k = 0; k < N; k += SM)
for (i2 = 0, rres = &res[i][j],
rmul1 = &mul1[i][k]; i2 < SM;
++i2, rres += N, rmul1 += N)
for (k2 = 0, rmul2 = &mul2[k][j];
k2 < SM; ++k2, rmul2 += N)
for (j2 = 0; j2 < SM; ++j2)
rres[j2] += rmul1[k2] * rmul2[j2];
}
int main (void)
{
mm(_res, _mul1, _mul2);
return 0;
}
It is a hint to the code optimizer. Using restrict assures the optimizer that it can keep a pointer variable in a CPU register without having to flush each update of the pointer value to memory just so that an alias is updated as well.
Whether or not it takes advantage of it depends heavily on implementation details of the optimizer and the CPU. Code optimizers already are heavily invested in detecting non-aliasing since it is such an important optimization. It should have no trouble detecting that in your code.
Also, GCC 4.0.0-4.4 has a regression bug that causes the restrict keyword to be ignored. This bug was reported as fixed in 4.5 (I lost the bug number though).
(I don't know if using this keyword gives you a significant advantage, actually. It's very easy for a programmer to err with this qualifier as there is no enforcement, so an optimizer cannot be certain that the programmer doesn't "lie".)
When you know that a pointer A is the only pointer to some region of memory, that is, it doesn't have aliases (that is, any other pointer B will necessarily be unequal to A, B != A), you can tell this fact to the optimizer by qualifying the type of A with the "restrict" keyword.
I have written about this here: http://mathdev.org/node/23 and tried to show that some restricted pointers are in fact "linear" (as mentioned in that post).
It's worth noting that recent versions of clang are capable of generating code with a run-time check for aliasing and two code paths: one for cases where there is potential aliasing, and the other for cases where it is obvious there is no chance of it.
This clearly depends on the extents of data pointed to being conspicuous to the compiler - as they would be in the example above.
I believe the prime justification is for programs making heavy use of the STL, and particularly <algorithm>, where it is either difficult or impossible to introduce the __restrict qualifier.
Of course, this all comes at the expense of code size, but it removes a great deal of potential for obscure bugs that could result from pointers declared as __restrict not being quite as non-overlapping as the developer thought.
I would be surprised if GCC hadn't also got this optimisation.
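The two-path idea can be written out by hand. The following is only a sketch of the general technique, not what clang actually emits; the function names are hypothetical, and comparing addresses through uintptr_t assumes a flat address space:

```c
#include <stddef.h>
#include <stdint.h>

/* Fast path: the caller guarantees dst and src do not overlap. */
void add_noalias(float *restrict dst, const float *restrict src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] += src[i];
}

/* Safe path: no assumption about overlap. */
void add_may_alias(float *dst, const float *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] += src[i];
}

/* Hypothetical dispatcher: checks the extents at run time and picks a path.
 * Assumes a flat address space so that integer comparison is meaningful. */
void add_arrays(float *dst, const float *src, size_t n)
{
    uintptr_t d = (uintptr_t)dst;
    uintptr_t s = (uintptr_t)src;
    size_t bytes = n * sizeof *dst;
    int disjoint = (d + bytes <= s) || (s + bytes <= d);

    if (disjoint)
        add_noalias(dst, src, n);   /* restrict promise holds */
    else
        add_may_alias(dst, src, n); /* overlapping ranges, stay conservative */
}
```

The check only works because the extents (n elements from each pointer) are known at the call, which is the same condition the compiler needs.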
Maybe the optimisations done here don't rely on the pointers not being aliased? Unless you preload multiple mul2 elements before writing the result to res2, I don't see any aliasing problem.
In the first piece of code you show, it is quite clear what kind of aliasing problem can occur.
Here it is not so clear.
Rereading Drepper's article, he does not specifically say that restrict would solve anything. There is even this passage:
"In theory the restrict keyword introduced into the C language in the 1999 revision should solve the problem. Compilers have not caught up yet, though. The reason is mainly that too much incorrect code exists which would mislead the compiler and cause it to generate incorrect object code."
In this code, the memory access optimisations have already been done within the algorithm. The remaining optimisation seems to be done in the vectorized code presented in the appendix. So for the code presented here, I guess there is no difference, because no optimisation relying on restrict is done. Every pointer access is a potential source of aliasing, but not every optimisation relies on the absence of aliasing.
Premature optimization being the root of all evil, the use of the restrict keyword should be limited to the cases you are actively studying and optimizing, not applied wherever it could be used.
If there is a difference at all, moving mm to a separate DSO (such that gcc can no longer know everything about the calling code) will be the way to demonstrate it.
Are you running on 32 or 64-bit Ubuntu? If 32-bit, then you need to add -march=core2 -mfpmath=sse (or whatever your processor architecture is), otherwise it doesn't use SSE. Secondly, in order to enable vectorization with GCC 4.2, you need to add the -ftree-vectorize option (as of 4.3 or 4.4 this is included as default in -O3). It might also be necessary to add -ffast-math (or another option providing relaxed floating point semantics) in order to allow the compiler to reorder floating point operations.
Also, add the -ftree-vectorizer-verbose=1 option to see whether it manages to vectorize the loop or not; that's an easy way to check the effect of adding the restrict keyword.
The problem with your example code is that the compiler will just inline the call and see that there is no aliasing ever possible in your example. I suggest you remove the main() function and compile it using -c.
The following C99 code can show you that the output of the program depends on restrict :
__attribute__((noinline))
int process(const int * restrict const a, int * const b) {
*b /= (*a + 1) ;
return *a + *b ;
}
int main(void) {
int data[2] = {1, 2};
return process(&data[0], &data[0]);
}
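The program terminates with exit code 1 when the restrict qualifier is used and 0 without it. With restrict, the compiler may assume that the write through b does not change *a, so it reuses the old value 1; without restrict, it reloads *a and gets 0. Passing the same address for both arguments breaks the restrict promise, so the behaviour is technically undefined and 1 is simply what GCC produces here.
The compilation is done with gcc -std=c99 -Wall -pedantic -O3 main.c.
The flag -O1 does the job too.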
restrict is useful when, for example, a loop condition is read through one pointer while the loop body writes through another: it tells the compiler that the loop condition remains unchanged even though the other pointer's target has been updated, because under restrict the write cannot touch it.
And so on.
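A minimal sketch of the loop-condition case (fill and its parameters are hypothetical names chosen for illustration): both pointers have the same element type, so without restrict the compiler would have to assume that a store to dst[i] might change *count.

```c
/* Hypothetical example: because dst and count are both restrict-qualified,
 * stores through dst cannot legally modify *count, so the compiler is free
 * to read *count once before the loop instead of on every iteration. */
void fill(int *restrict dst, const int *restrict count, int value)
{
    for (int i = 0; i < *count; i++)
        dst[i] = value;
}
```

Whether a given compiler actually hoists the load depends on the optimization level; comparing the assembly with and without restrict is the way to check.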