I want to make several constants in C with #define to speed up computation. Two of them are not simply trivial numbers: one is a right shift, the other is a power. math.h gives the function pow() for doubles, whereas I need powers of integers, so I wrote my own function, ipow, so I wouldn't need to cast every time.
My question is this: one of the #define constants I want to make is a power, say ipow(M, T), where M and T are also #define constants. ipow is a function in the actual code, and this actually seems to slow things down when I run the code (is it running ipow every time the constant is mentioned?). However, when I use the built-in pow function and just do (int)pow(M,T), the code speeds up. I'm confused as to why this is, since the ipow and pow functions themselves are just as fast.
On a more general note, can I define constants with #define using functions from the actual code? The above example has me confused about whether this speeds things up or actually slows things down.
(int)pow(M,T) is faster than using your function ipow because, even if they do the same work, ipow carries the overhead of a function call (pushing arguments, etc.).
Also, yes, if you #define it this way, ipow / pow / whatever is called every time; the preprocessor has no idea what it is doing, it is basically doing string replacement. Your constant is therefore simply replaced by the text ipow(M,T), and so it is calculated every time you need the constant.
Finally, for your case, a solution might be to use a global variable instead of a #define constant for your constant. This way, you can compute it once at the beginning of your program, and then use it later (without any more computations of it).
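A minimal sketch of that approach (names and values here are made up, and ipow is assumed to be your existing function):
#define M 3
#define T 20

long ipow(long base, long exp);   /* your existing function */

static long m_pow_t;              /* replaces the #define'd power */

int main(void)
{
    m_pow_t = ipow(M, T);         /* computed exactly once, at startup */
    /* ... the rest of the program reads m_pow_t instead of calling ipow ... */
    return 0;
}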
You don't need C++ to do metaprogramming. If you have a C99-compatible C compiler and preprocessor you can use P99 with something like the following:
#include "p99_for.h"
#define P00_POW(NAME, I, REC, Y) (NAME) * (REC)
#define IPOW(N, X) (P99_FOR(X, N, P00_POW, P00_NAM, P99_REP(N,)))
For example, IPOW(4, A) is then expanded to ((A) * ((A) * ((A) * (A)))). The only things that you should watch are:
N must be (or expand to) a plain decimal constant with no suffix such as U or L
X should not have side effects, since it is evaluated several times
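Tying this back to the question, a sketch with made-up values for M and T (it assumes both expand to plain decimal constants, as required above):
#define M 3
#define T 4
/* IPOW takes the exponent first: this expands to ((3) * ((3) * ((3) * (3)))),
   a constant expression the compiler folds to 81 at compile time */
#define M_TO_THE_T IPOW(T, M)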
Yes, ipow is getting run every time the constant is mentioned. The C preprocessor is simply replacing all mentions of the constant with what you #define'd it as.
EDIT:
If you really want to compute these integers at compile time, you could try using Template Metaprogramming. This requires C++, however.
I don't think this is possible with the C preprocessor, because it doesn't support recursion.
(You can use template metaprogramming if you are using C++.)
I suspect that (int)pow(M,T) is faster than using (int)ipow(M,T) because the compiler has special knowledge of the pow() function (as an intrinsic). I wouldn't be surprised if given constant arguments that it elides the function call altogether when pow() is used.
However, since it has no special knowledge of ipow(), it doesn't do the same, and ends up actually calling the function.
You should be able to verify whether or not this is happening by looking at the assembly generated in a debugger or by having the compiler create an assembly listing. If that's what's happening, and your ipow() function is nothing more than a call to pow() (casting the result to an int), you might be able to convince your compiler to perform the same optimization for ipow() by making it an inline function.
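For example, assuming ipow() really is just a wrapper around pow() as described (a sketch; your real signature may differ):
#include <math.h>

/* With the wrapper declared inline, a compiler that folds pow() with constant
   arguments has a chance to fold this call too. */
static inline int ipow(double base, double exp)
{
    return (int)pow(base, exp);
}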
Your ipow isn't faster, since it's just a plain call to a function.
Also, I'm aware that compilers have special optimisations for standard C library routines and math functions.
Most likely the compiler is capable of recognising that the parameters are constant and calculating the value of the #define at compile time.
Internally such calls may be replaced with something like this when the exponent is a constant:
#define pow2(x) ( (x) * (x) )
#define pow3(x) ( (x) * (x) * (x) )
#define pow4(x) ( pow2(x) * pow2(x) )
#define pow5(x) ( pow4(x) * (x) )
#define pow6(x) ( pow3(x) * pow3(x) )
...
The only workaround is to use C++ template metaprogramming to get better run-time performance.
template<class T, T base, T exp>
struct ipow
{
    static const T value = base * ipow<T, base, exp - 1>::value;
};

template<class T, T base>
struct ipow<T, base, 0>
{
    static const T value = 1;
};
you would use the above struct as follows:
ipow<size_t, M, T>::value
The C preprocessor will not evaluate a function call to a C function such as ipow or pow at compile time, it merely does text replacement.
The preprocessor does have a concept of function-like macros, however these are not so much 'evaluated' as text-replaced. It would be tempting to think you could write a recursive function-like macro to self-multiply a constant to raise it to a power, but in fact you can't: due to the non-evaluation of macro bodies, you won't actually get continually recursive calculation when the macro refers to itself.
For your shift operation, a #define involving constants and the shift operator will get text-replaced by the preprocessor, but the constant expression will get evaluated during compilation, so this is efficient. In fact it's very common in hardware interfaces, e.g. #define UART_RXD_READY ( 1 << 11 ) or something like that.
Related
Is there a way in C to programmatically determine that variable's value was computed at compile time or at run time?
Example:
const double a = 2.0;
const double b = 3.0;
double c1 = a / b; // done at compile time (constant folding / propagation)
double c2 = *(volatile double*)&a / *(volatile double*)&b; // done at run time
compute_time_t c1_ct = compute_time(c1);
compute_time_t c2_ct = compute_time(c2);
assert(c1_ct == COMPILE_TIME);
assert(c2_ct == RUN_TIME);
In C (as in, defined by the language standard), no, there is no way.
There are however compiler-specific ways using which you can get really close to achieving what you want. The most famous, as #Nate Eldredge notes in the comments, is the builtin function __builtin_constant_p() available in GCC and Clang.
Here's the relevant excerpt from the GCC doc:
Built-in Function: int __builtin_constant_p (exp)
You can use the built-in function __builtin_constant_p to determine if a value is known to be constant at compile time and hence that GCC can perform constant-folding on expressions involving that value. The argument of the function is the value to test. The function returns the integer 1 if the argument is known to be a compile-time constant and 0 if it is not known to be a compile-time constant. A return of 0 does not indicate that the value is not a constant, but merely that GCC cannot prove it is a constant with the specified value of the -O option.
Note that this function does not guarantee to detect all compile-time constants, but only the ones that GCC is able to prove as such. Different optimization levels might change the result returned by this function.
This built-in function is widely used in glibc for optimization purposes, and usually the result is only trusted when it's 1, assuming a non-constant otherwise:
void somefunc(int x) {
    if (__builtin_constant_p(x)) {
        // Perform optimized operation knowing x is a compile-time constant.
    } else {
        // Assume x is not a compile-time constant.
    }
}
Using your own example:
const double a = 2.0;
const double b = 3.0;
double c1 = a / b; // done at compile time (constant folding / propagation)
double c2 = *(volatile double*)&a / *(volatile double*)&b; // done at run time
assert(__builtin_constant_p(c1));
assert(!__builtin_constant_p(c2));
You ask,
Is there a way in C to programmatically determine that variable's value was computed at compile time or at run time?
No, there is no way to encode such a determination into the source of a strictly conforming C program.
Certainly C does not require values to be tagged systematically in a way that distinguishes among them based on when they were computed, and no C implementation I have ever heard of or imagined does that, so such a determination cannot be based on the values of the expressions of interest. Furthermore, all C function arguments are passed by value, so the hypothetical compute_time() cannot be implemented as a function because values are all it would have to work with.
compute_time() also cannot be a macro, because macros can work only with (preprocessing) tokens, for example the identifiers c1 and c2 in your example code. Those are opaque to the preprocessor; it knows nothing about values attributed to them when they are evaluated as expressions according to C semantics.
And there is no operator that serves the purpose.
Standard C provides no other alternatives, so if the question is about the C language and not any particular implementation of it then that's the end of the story. Moreover, although it is conceivable that a given C implementation would provide your compute_time() or a functional equivalent as an extension, I am unaware of any that do. (However, see #MarcoBonelli's answer, for an example of a similar, but not identical, extension.)
I need to write a macro which traps any invalid index i for an array of length n. Here is what I got so far:
#define TRAP(i, n) (((unsigned int) (i) < (n))? (i): (abort(), 0))
The problem with this definition, however, is that the index expression i is evaluated twice; in the expression a[TRAP(f(), n)], for instance, f may have a side effect or take a long time to execute. I cannot introduce a temporary variable since the macro needs to expand to an expression. Also, defining TRAP as an ordinary function implies a run-time overhead and makes it harder for the compiler to optimize away the trap.
Is there a way to rewrite TRAP so that i is evaluated only once?
Edit: I'm using ANSI C89
You can evaluate once, and use the result, by doing something like this:
#define TRAP2(i, n) ({unsigned int _i = (i); _i < (n)? _i: (abort(), 0);})
This is a gcc specific solution, that will compile when used as the RHS of an assignment. It defines a (very) local variable, which might hide a prior definition of another variable, but that doesn't matter, as long as you don't try to use the prior version in the macro. But as people say, why do this in the first place?
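For example (a hedged usage sketch, with f, a and n as in the question):
/* f() is evaluated exactly once; abort() fires if the index is out of range */
unsigned int idx = TRAP2(f(), n);
x = a[idx];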
Use the macro TRAP when the index expression doesn't contain a function call and use a (non-macro) function trap when it does. This way the function call overhead only occurs in the rarer latter case.
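A sketch of such a trap function (assuming the same semantics as the macro; names are illustrative):
#include <stdlib.h>   /* abort() */

/* Non-macro variant: the index argument is evaluated exactly once. */
static unsigned int trap(unsigned int i, unsigned int n)
{
    if (i >= n)
        abort();
    return i;
}

/* index expression has side effects: use the function */
x = a[trap(f(), n)];
/* plain index: the macro keeps the best chance of being optimised away */
y = a[TRAP(j, n)];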
I would like to switch processing in a library routine based on whether a parameter exceeds a function of a system limit or not, for example whether (input - 1) <= sqrt (LONG_MAX)
As I see it, I have three choices of implementing this in C:
evaluate the function in each library call. Expensive, though some compilers can probably optimise out math.h function calls with constant parameters
define the result of the function call as a preprocessor macro. Looking at glibc limits.h this would require two #defines based on the __WORDSIZE value. I don't think this would be portable
create a global variable that is set to the result of the function in an initialiser routine. This requires the library user to always run an init routine before any other library routines
I do not really like any of these approaches. A compromise between 1 and 3 would be to run the init internally if not run previously. This spares the user the need to do it and reduces the runtime overhead to one boolean value check.
Is there some more elegant solution possible?
"Elegant" is not really a well defined term, you would have been better off specifying something more measurable, like "speed".
If speed is indeed the goal, and the system parameter is one that doesn't change at runtime, you can have a portable solution like:
#undef SQRT_LM
#if LONG_MAX == 64
#define SQRT_LM 8
#endif
#if LONG_MAX == 256
#define SQRT_LM 16
#endif
: : :
#ifndef SQRT_LM
#error Weird LONG_MAX value, please adjust code above.
#endif
Then your code can simply use SQRT_LM as a constant value.
The 1/3 combo, along the lines of:
void doSomething(int x) {
    static long sqrt_lm = -1;
    if (sqrt_lm == -1)
        sqrt_lm = sqrt(LONG_MAX);
    // Now can use sqrt_lm freely
}
is not really as efficient as forcing the user to explicitly call an init function, since the above code still has to perform the if on every call.
But, as stated, it really depends on what you mean by "elegant". I tend to optimise for readability first and only worry about performance if it becomes a serious issue.
Use a static variable in the function:
void foo(int input)
{
    static const long limit = __builtin_sqrt(LONG_MAX);
    assert(input < limit);
}
So limit is computed only once rather than on every call. This requires that the initializer be a constant expression, which is why I use GCC's __builtin_sqrt(); regular sqrt() will be rejected (by GCC, at least).
Isn't (input - 1) <= sqrt(LONG_MAX) the same as input <= sqrt(LONG_MAX) + 1, which just looks like a simple compare of a value with a constant?
How can I use #define to say that one value is the sum of two other values? Would it be allowed and good practice in C to do something like this?
#define VALUE_A 2
#define VALUE_B 2
#define SUM_A_B (VALUE_A + VALUE_B)
If not, what should I do to achieve this functionality?
The Linux and GCC header files do it routinely, if that's a vote of confidence. e.g.:
$ grep -r 'define.*+' /usr/include/
...
/usr/include/linux/fdreg.h:#define FD_STATUS (4 + FD_IOPORT )
...
/usr/include/linux/elf.h:#define PT_GNU_STACK (PT_LOOS + 0x474e551)
...
/usr/include/i386-linux-gnu/asm/unistd_32.h:#define __NR_timer_settime (__NR_timer_create+1)
etc.
If you just need this for integer constants (type int), you may use an enumeration for this kind of constant:
enum { SUM_A_B = (VALUE_A + VALUE_B), };
Possible advantages:
The sum is only evaluated once by the compiler. This is not a big deal for modern compilers if it is only such a simple sum, but it could make a small difference when you are using more complicated expressions.
Even nowadays, compiler errors and debugging information aren't that good for values coming from the preprocessor. Enumeration constants usually can be traced well.
A disadvantage is that the value is not accessible to the preprocessor itself, so you can't use it in #if/#else constructs. But you could at least still define it as
#define SUM_A_B SUM_A_B
So #ifdef/#else constructs would still work.
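Putting the pieces together, a small sketch using the constants from the question:
#define VALUE_A 2
#define VALUE_B 2

enum { SUM_A_B = (VALUE_A + VALUE_B) };   /* real int constant, evaluated once */
#define SUM_A_B SUM_A_B                   /* lets #ifdef SUM_A_B succeed */

#ifdef SUM_A_B
/* code that is only compiled when the constant exists */
#endif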
I have the following code
void Fun2()
{
    if (X <= A)
        X = ceil(M*1.0/A*X);
    else
        X = M*1.0/(M-A)*(M-X);
}
I want to program it in a fast manner using C99, taking into account the following comments.
X and A are 32-bit variables and I declare them as uint64_t, while M is a static const uint64_t.
This function is called by another function, and the value of A is changed to a new value every n calls.
The optimization needed is in execution time; the CPU is a Core i3 and the OS is Windows 7.
The math model I want to implement is
F = ceil(M/A*X) if X <= A
F = floor(M/(M-A)*(M-X)) if X > A
For clarity and to avoid confusion, my previous post was:
I have the following code
void Fun2()
{
    if (X0 <= A)
        X0 = ceil(Max1*X0);
    else
        X0 = Max2*(Max-X0);
}
I want to program it in a fast manner using C99, taking into account the following comments.
X0, A, Max1, and Max2 are 32-bit variables and I declare them as uint64_t, while Max is a static const uint64_t.
This function is called by another function, and the values of Max1, A, and Max2 are changed to random values every n calls.
I work on Windows 7 with Code::Blocks.
Thanks
It is completely pointless and impossible to optimize code like this without a specific target in mind. In order to do so, you need the following knowledge:
Which CPU is used.
Which OS is used (if any).
In-depth knowledge of the above, to the point where you know as much about the system as, or more than, the people who wrote the optimizer for the given compiler port.
What kind of optimization that is most important: execution speed, RAM usage or program size.
The only kind of optimization you can do without knowing the above is on the algorithm level. There are no such algorithms in the code posted.
Thus your question cannot be answered by anyone until more information is provided.
If "fast manner" means fast execution, your first change is to declare this function as an inline one, a feature of C99.
inline void Fun2()
{
...
...
}
I recall that GNU CC has some interesting macros that may help optimize this code as well. I don't think this is C99-compliant, but it is always interesting to note. I mean: your function has an if statement. If you know in advance what probability each branch has of being taken, you can do things like:
if (likely(X0<=A)).....
If it's probable that X0 is less than or equal to A. Or:
if (unlikely(X0<=A)).....
If it's not probable that X0 is less than or equal to A.
With that information, the compiler will optimize the comparison and jump so that the most probable branch is executed with no jumps, which makes it faster on architectures with no branch prediction.
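likely and unlikely are not provided by the compiler directly; they are conventionally defined (for example in the Linux kernel sources) on top of GCC's __builtin_expect, roughly like this:
/* GCC/Clang only: hint the expected truth value of a condition */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)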
Another thing that may improve speed is to use the ?: ternary operator, as both branches assign a value to the same variable, something like this:
inline void Fun2()
{
    X0 = (X0 <= A) ? Max1*X0 : Max2*(Max-X0);
}
BTW: why use ceil()? ceil() operates on doubles and rounds a number up to the nearest integer not less than it. If X0 and Max1 are integer numbers, there won't be a fractional part in the result, so ceil() won't have any effect.
I think one thing that can be improved is not to use floating point. Your code mostly deals with integers, so you want to stick to integer arithmetic.
The only floating-point number is Max1. If it's always whole, it can be an integer. If not, you may be able to replace it with two integers: Max1*X0 -> X0 * Max1_num / Max1_den. If you calculate the numerator/denominator once and use them many times, this can speed things up.
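A sketch of that idea with the hypothetical names above; Max1 presumably corresponds to something like M/A (or Max/A) in the original code, so the pair could simply be those two values:
/* recomputed only when A changes (every n calls) */
uint64_t Max1_num = M;        /* numerator   of Max1 */
uint64_t Max1_den = A;        /* denominator of Max1 */

/* hot path: pure integer arithmetic; "+ Max1_den - 1" rounds up like ceil() */
X0 = (X0 * Max1_num + Max1_den - 1) / Max1_den;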
I'd transform the math model to
Ceil (M*(X-0) / (A-0)) when X <= A
Floor (M*(X-M) / (A-M)) when X > A
with
Ceil (A / B) = Floor((A + (B-1)) / B)
which, substituted into the first, gives:
((M * (X - m0) + c ) / ( A - m0))
where
c = A-1; m0 = 0, when X <= A
c = 0; m0 = M, when X > A
Everything will be performed in integer arithmetic, but it'll be quite tough to calculate the reciprocals in advance.
It may still be possible to use some form of DDA to avoid calculating the division between iterations.
Using the temporary constants c and m0 simply unifies the pipeline for both branches, as the next step is in pursuit of parallelism.
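As a sketch of what the unified computation could look like in plain integer arithmetic (an interpretation of the formulas above, untested; the branch conditions follow the original code):
#include <stdint.h>

/* X <= A : ceil(M*X / A)          (c = A-1, m0 = 0)
   X >  A : floor(M*(M-X) / (M-A)) (c = 0,   m0 = M), rewritten to stay unsigned */
static uint64_t fun2_step(uint64_t X, uint64_t A, uint64_t M)
{
    if (X <= A)
        return (M * X + (A - 1)) / A;
    else
        return (M * (M - X)) / (M - A);
}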