I have code that works mainly with single-precision floating point numbers. Calls to transcendental functions occur fairly often. Right now, I'm using sin(), cos(), sqrt(), etc--functions that accept and return double. When it's compiled for x87 only, I know there's no difference between single and double precision. I read in Agner Fog's optimization guide however that software versions of these function utilizing SSE instructions are faster for single-precision floating point numbers.
My question is whether the compiler would automatically use the faster function when it encounters something like:
float x = 1.23;
float y = sin(x);
Or does rounding rule preclude such an optimization?
It'd be easy enough to just do a search-and-replace and see whether there's any performance gain. Trouble is that I also need pointers to these functions. In MSVC, sinf(), cosf(), and friends are inline functions. Using them would therefore requires a bit of gymnastics. Before making the effort, I would like to know whether it's worthwhile.
Besides MSVC, I'm also targeting gcc.
There is really no need to have the cast when calling sin. In fact it would be counterproductive if you'd use the <tgmath.h> header that comes with C99. That provides you type generic macros that would chose the right function according to your argument, not for the target type unfortunately. So if you'd use that header (not sure if this is available for MS)
float x = 1.23;
float y = sin(x);
would automatically use sinf under the hood.
Related
I have a long formula, like the following:
float a = sin(b)*cos(c)+sin(c+d)*sin(d)....
Is there a way to use s instead of sin in C, to shorten the formula, without affecting the running time?
There are at least three options for using s for sin:
Use a preprocessor macro:
#define s(x) (sin(x))
#define c(x) (cos(x))
float a = s(b)*c(c)+s(c+d)*c(d)....
#undef c
#undef s
Note that the macros definitions are immediately removed with #undef to prevent them from affecting subsequent code. Also, you should be aware of the basics of preprocessor macro substitution, noting the fact that the first c in c(c) will be expanded but the second c will not since the function-like macro c(x) is expanded only where c is followed by (.
This solution will have no effect on run time.
Use an inline function:
static inline double s(double x) { return sin(x); }
static inline double c(double x) { return cos(x); }
With a good compiler, this will have no effect on run time, since the compiler should replace a call to s or c with a direct call to sin or cos, having the same result as the original code. Unfortunately, in this case, the c function will conflict with the c object you show in your sample code. You will need to change one of the names.
Use function pointers:
static double (* const s)(double) = sin;
static double (* const c)(double) = cos;
With a good compiler, this also will have no effect on run time, although I suspect a few more compilers might fail to optimize code using this solution than than previous solution. Again, you will have the name conflict with c. Note that using function pointers creates a direct call to the sin and cos functions, bypassing any macros that the C implementation might have defined for them. (C implementations are allowed to implement library function using macros as well as functions, and they might do so to support optimizations or certain features. With a good quality compiler, this is usually a minor concern; optimization of a direct call still should be good.)
if I use define, does it affect runtime?
define works by doing text-based substitution at compile time. If you #define s(x) sin(x) then the C pre-processor will rewrite all the s(x) into sin(x) before the compiler gets a chance to look at it.
BTW, this kind of low-level text-munging is exactly why define can be dangerous to use for more complex expressions. For example, one classic pitfall is that if you do something like #define times(x, y) x*y then times(1+1,2) rewrites to 1+1*2, which evaluates to 3 instead of the expected 4. For more complex expressions like it is often a good idea to use inlineable functions instead.
Don't do this.
Mathematicians have been abbreviating the trigonometric functions to sin, cos, tan, sinh, cosh, and tanh for many many years now. Even though mathematicians (like me) like to use their favourite and often idiosyncratic notation so puffing up any paper by a number of pages, these have emerged as pretty standard. Even LaTeX has commands like \sin, \cos, and \tan.
The Japanese immortalised the abbreviations when releasing scientific calculators in the 1970s (the shorthand can fit easily on a button), and the C standard library adopted them.
If you deviate from this then your code immediately becomes difficult to read. This can be particularly pernicious with mathematical code where you can't immediately see the effects of a bad implementation.
But if you must, then a simple
static double(*const s)(double) = sin;
will suffice.
I can' figure out what is the advantage of using
#define CRANDOM() (random() / 2.33);
instead of
float CRANDOM() {
return random() / 2.33;
}
By using a #define macro you are forcing the body of the macro to be inserted inline.
When using a function there will1 be a function call (and therefor a jump to the address of the function (among other things)), which will slow down performance somewhat.
The former will most often be faster, even though the size of the executable will grow for each use of the #defined macro.
greenend.org.uk - Inline Functions In C
1 a compiler might be smart enough to optimize away the function call, and inline the function - effectively making it the same as using a macro. But for the sake of simplicitly we will disregard this in this post.
It makes sure that the call to CRANDOM is inlined, even if the compiler doesn't support inlining.
Firstly, that #define is incorrect due to the semi-colon at the end, and the compiler would baulk at:
float f = CRANDOM() * 2;
Secondly, I personally try and avoid using the preprocessor beyond separating platform-independent sections in cross-platform code, and of course code reserved exclusively for DEBUG or non-DEBUG builds.
nightcracker correctly states it will always be "effectively" inline, but given you can re-write the function to be inline itself, I see no advantage to using the preprocessor version unless the C-compiler in question does not inline.
The former is old style. The only advantage of the former is that if you have a compiler following the old C90 standard, the macro will work as inlining. On a modern C compiler you should always write:
inline float CRANDOM() {
return random() / 2.33f;
}
where the inline keyword is optional.
(Note that float literals must have a f at the end, otherwise you force the calculation to be performed on double, which you then implicitly round into a float.)
Calling a function involves a little bit of overhead -- pushing the return address onto the machine stack and branching, in this case. By using a macro, you can avoid this overhead. A long time ago, this was important; these days, many compilers will insert the body of a tiny function like this inline, anyway. In general, trying to fool the compiler into emitting faster code is a fool's game; you often end up creating something slower.
I have an question about performance of my code.
Let's say I have a struct in C for a point:
typedef struct _CPoint
{
float x, y;
} CPoint;
and a function where I use the struct.
float distance(CPoint p1, CPoint p2)
{
return sqrt(pow((p2.x-p1.x),2)+pow((p2.y-p1.y),2));
}
I was wondering if it would be a smart idea to replace this function for a #define,
#define distance(p1, p2)(sqrt(pow((p2.x-p1.x),2)+pow((p2.y-p1.y),2)));
I think it will be faster because there will be no function overhead, and I'm wondering if I should use this approach for all other functions in my program to increase the performance. So my question is:
Should I replace all my functions with #define to increase the performance of my code?
No. You should never make the decision between a macro and a function based on a perceived performance difference. You should evaluate it soley based on the merits of functions over macros. In general choose functions.
Macros have a lot of hidden downsides that can bite you. Case in point, your translation to a macro here is incorrect (or at least not semantics preserving with the original function). The argument to the macro distance gets evaluated 2 times each. Imagine I made the following call
distance(GetPointA(), GetPointB());
In the macro version this actually results in 4 function calls because each argument is evaluated twice. Had distance been left as a function it would only result in 3 function calls (distance and each argument). Note: I'm ignoring the impact of sqrt and pow in the above calculations as they're the same in both versions.
There are three things:
normal functions like your distance above
inline functions
preprocessor macros
While functions guarantee some kind of type safety, they also incur a performance loss due to the fact that a stack frame needs to be used at each function call. code from inline functions is copied at the call site so that penalty is not paid -- however, your code size will increase. Macros provide no type safety and also involve textual substitution.
Choosing from all three, I'd usually use inline functions. Macros only when they are very short and very useful in this form (like hlist_for_each from the Linux kernel)
Jared's right, and in this specific case, the cycles spent in the pow calls and the sqrt call would be in the range of 2 orders of magnitude more than the cycles spent in the call to distance.
Sometimes people assume that small code equals small time. Not so.
I'd recommend an inline function rather than a macro. It'll give you any possible performance benefits of a macro, without the ugliness. (Macros have some gotchas that make them very iffy as a general replacement for functions. In particular, macro args are evaluated every time they're used, while function args are evaluated once each before the "call".)
inline float distance(CPoint p1, CPoint p2)
{
float dx = p2.x - p1.x;
float dy = p2.y - p1.y;
return sqrt(dx*dx + dy*dy);
}
(Note i also replaced pow(dx, 2) with dx * dx. The two are equivalent, and multiplication is more likely to be efficient. Some compilers might try to optimize away the call to pow...but guess what they replace it with.)
If using a fairly mature compiler it propaby will do this for you on assembly level if optimisation is swtiched on.
For gcc the -O3 or (for "small" functions) even the -O2 option will do this.
For details on this you might consider reading here http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html for "-finline*" options.
I have a function in C that accepts and returns a double (and uses several doubles internally). Is there a good way to make a second version of the function, just like it except with float in place of double? Also constants like DBL_EPSILON should be updated.
I suppose I could do this with the preprocessor, but that seems awkward (and probably difficult to debug if there's a compile error). What do best practices recommend? I can't imagine I'm the only one who's had to deal with this.
Edit: I forgot, this is stackoverflow so I can't just ask a question, I have to justify myself. I have code which is very sensitive to precision in this case; the cost of using doubles rather than floats is 200% to 300%. Up until now I only needed a double version -- when I needed it I wanted as much precision as possible, regardless of the time needed (in that application it was a tiny percentage). But now I've found a use that is sensitive to speed and doesn't benefit from the extra precision. I cringed at my first thought, which was to copy the entire function and replace the types. Then I thought that a better approach would be known to the experts at SO so I posted here.
don't know about "best practices", but the preprocessor definitely was the first thing to jump to my mind. it's similar to templates in C++.
[edit: and the Jesus Ramos answer mentions the different letters on functions with different types in libraries, and indeed you would probably want to do this]
you create a separate source file with your functions, everywhere you have a double change it to FLOATING_POINT_TYPE (just as an example) and then include your source file twice from another file. (or whatever method you choose you just need to be able to ultimately process the file twice, once with each data type as your define.) [also to determine the character appended to distinguish different versions of the function, define FLOATING_POINT_TYPE_CHAR]
#define FLOATING_POINT_TYPE double
#define FLOATING_POINT_TYPE_CHAR d
#include "my_fp_file.c"
#undef FLOATING_POINT_TYPE_CHAR
#undef FLOATING_POINT_TYPE
#define FLOATING_POINT_TYPE float
#define FLOATING_POINT_TYPE_CHAR f
#include "my_fp_file.c"
#undef FLOATING_POINT_TYPE
#undef FLOATING_POINT_TYPE_CHAR
then you can also use a similar strategy for your prototypes in your headers.
but, so in your header file you would need something something like:
#define MY_FP_FUNC(funcname, typechar) \
funcname##typechar
and for your function definitions/prototypes:
FLOATING_POINT_TYPE
MY_FP_FUNC(DoTheMath, FLOATING_POINT_TYPE_CHAR)
(
FLOATING_POINT_TYPE Value1,
FLOATING_POINT_TYPE Value2
);
and so forth.
i'll definitely leave it to someone else to talk about best practices :)
BTW for an example of this kind of strategy in a mature piece of software you can check out FFTW (fftw.org), although it's a bit more complicated than the example i think it uses basically the same strategy.
Don't bother.
Except for a few specific hardware implementations, there is no advantage to having a float version of a double function. Most IEEE 754 hardware performs all calculations in 64- or 80-bit arithmetic internally, and truncates the results to the desired precision on storing.
It is completely fine to return a double to be used or stored as a float. Creating a float version of the same logic is not likely to run any faster or be more suitable for much of anything. The only exception coming to mind would be GPU-optimized algorithms which do not support 64+ bit operations.
As you can see from most standard librarys and such methods aren't really overridden just new methods are created. For example:
void my_function(double d1, double d2);
void my_functionf(float f1, float f2);
A lot of them have different last letters in the method to indicate that it is sort of like a method override for different types. This also applies for return types such as the function atoi, atol, atof.... etc.
Alternatively wrap your function in a macro that adds the type as an argument such as
#define myfunction(arg1, arg2, type) ....
This way it's much easier as you can now just wrap everything with your type avoiding copy pasting the function and you can always check type.
In this case I would say the best practice would be writing a custom codegen tool, which will take 'generic' code and create new version of double and float each time before compilation.
Could someone please help and tell me how to include IEEE mathematical functions in MSVC++6? I tried both and , but I still get these errors:
error C2065: 'ilogbf' : undeclared identifier
error C2065: 'scalbnf' : undeclared identifier
Edit 3: Hopefully this will be my final edit. I have come to realize that I haven't properly addressed this question at all. I am going to leave my answer in place as a cautionary tale, and because it may have some educational value. But I understand why I have zero upvotes, and in fact I am going to upvote Andy Ross' answer because I think his is much more relevant (although incomplete at least at the time of writing). It seems to me my mistake was to take the Man definitions I found for ilogbf() a little superficially. It's a function that takes the integer part of the log of a float, how hard can that be to implement ? It turns out what the function is really about is IEEE floating point representation, in particular the exponent (as opposed to the mantissa) part of that representation. I should definitely have realized that before attempting to answer the question! An interesting point to me is how a function can possibly find the exponent part of a float, as I thought a fundamental rule of C is that floats are promoted to doubles as part of a function call. But that's a whole separate discussion of course.
--- End of edit 3, start of cautionary tale ---
A little googling suggests these are defined in some flavors of Unix, but maybe are not in any Posix or ANSI standard and so not provided with the MSVC libraries. If the functions aren't in the library they won't be declared in math.h. Obviously if the compiler can't see declarations for these external symbols it won't be happy and you'll get errors like the ones you list.
The obvious work around is to create your own versions of these functions, using math functions that are provided. eg
#include <math.h>
int ilogbf( float f )
{
double d1 = (double)f;
double d2 = log(d1);
int ret = (int)d2;
return ret;
}
Edit: This isn't quite right. Apparently, this function should use log to the base 2, rather than natural logs, so that the returned value is actually a binary exponent. It should also take the absolute value of its parameter, so that it will work for negative numbers as well. I will work up an improved version, if you ask me in a comment, otherwise I'm tempted to leave that as an exercise for the reader :-)
The essence of my answer, i.e. that ANSI C doesn't require this function and that MSVC doesn't include it, is apparently correct.
Edit 2: Okay I've weakened and provided an improved version without being asked. Here it is;
#include <math.h>
int ilogbf( float f )
{
double d1 = (double)f;
if( d1 < 0 )
d1 = -d1;
double d2 = log(d1) / log(2); // log2(x) = ln(x)/ln(2)
int ret = (int)d2;
return ret;
}
These are C99 functions, not IEEE754-1985. Microsoft seems to have decided that their market doesn't care about C99 support, so they haven't bothered to provide them. This is a shame, but unless more of you (developers) complain, there's no reason to expect that the situation will change.
The brand new 754 standard, IEEE754-2008, requires these functions (Clause 5.3.3, "logBFormat operations"), but that version of the standard won't be widely adopted for several more years; even if it does reach wide adoption, Microsoft hasn't seen fit to provide these functions for the ten years they've been in C99 so why would they bother to provide them just because they're in the IEEE754 standard?
edit: note that scalb and logb are defined in the IEEE754-1985 Appendix "Recommended Functions and Predicates", but said appendix is explicitly "not a part of" said standard.
If you know you're on an IEEE system (and these days, you do), these functions aren't needed: just inspect the bits directly by unioning the double with a uint64_t. Presumably you're using these functions in the interest of efficiency in the first place (otherwise you'd be using more natural operations like log() or exp()), so spending a little effort on matching your code to the floating point representation is probably worthwhile.