Sometimes I find it's easier to understand code (for your future self or for others) by being explicit about arithmetic, e.g. writing 1+2+3 if you're adding three values that come from elsewhere, rather than a single magic +6.
Is this inefficient, or would a compiler optimize/reduce it appropriately? I'm thinking about C, but is this something to worry about in general?
Yes. All competent C compilers will perform constant folding optimizations where possible, replacing constant mathematical expressions with their results. In most compilers, this type of optimization is applied even when optimizations are otherwise disabled (e.g., -O0). Here's an example.
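The two functions below (purely illustrative, not from the question) generate the same code; the compiler replaces 1 + 2 + 3 with 6 before code generation, even at -O0 on gcc and clang:

/* Illustrative: the constant expression is folded by the front end,
   so both functions simply return 6. */
int spelled_out(void) {
    return 1 + 2 + 3;
}

int magic_number(void) {
    return 6;
}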
This behavior is not restricted to C; most other compiled languages will perform this type of optimization as well. Interpreted languages typically do not, as the benefits are less dramatic there, and some of them have semantics that make constant folding an unsafe optimization (e.g., allowing basic operations to be overridden on builtin types).
What are the reasons behind a C compiler ignoring register declarations? I understand that this declaration is essentially meaningless for modern compilers since they store values in registers when appropriate. I'm taking a Computer Architecture class so it's important that I understand why this is the case for older compilers.
"A register declaration advises the compiler that the variable in
question will be heavily used. The idea is that register variables
are to be placed in machine registers, which may result in smaller and
faster programs. But compilers are free to ignore the advice." ~ The C Programming Language by Brian W. Kernighan and Dennis M. Ritchie
Thank you!
Historical C compilers might only be "smart" (complex) enough to look at one C statement at a time, like modern TinyCC. Or they might parse a whole function to find out how many variables there are, then come back and still do code-gen only one statement at a time. For examples of how simplistic and naive old compilers were, see Why do C to Z80 compilers produce poor code? - some of the examples shown could have been optimized for the special simple case, but weren't. (This is despite Z80 and 6502 being quite poor C compiler targets, because, unlike on the PDP-11, "a pointer" isn't something you can just keep in a register.)
With optimization enabled, modern compilers do have enough RAM (and compile-time) available to use more complex algorithms to map out register allocation for the whole function and make good decisions anyway, after inlining. (e.g. transform the program logic into an SSA form.) See also https://en.wikipedia.org/wiki/Register_allocation
The register keyword becomes pointless; the compiler can already notice when the address of a variable isn't taken (something the register keyword disallows). Or when it can optimize away the address-taking and keep a variable in a register anyway.
TL;DR: Modern compilers no longer need hand-holding to fully apply the as-if rule in the ways the register keyword hinted at.
They basically always keep everything in registers except when forced to spill it back to memory, unless you disable optimization for fully consistent debugging. (So you can change variables with a debugger when stopped at a breakpoint, or even jump between statements.)
Fun fact: Why does clang produce inefficient asm with -O0 (for this simple floating point sum)? shows an example where register float makes modern GCC and clang create more efficient asm with optimization disabled.
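To make the point concrete, here is a minimal sketch (function and variable names are just for illustration): with optimization enabled, a modern compiler keeps sum in a register whether or not the register keyword is present, because its address is never taken.

/* Illustrative only: "register" changes nothing in the optimized output. */
long sum_array(const int *a, int n) {
    register long sum = 0;   /* hint the compiler is free to ignore */
    for (int i = 0; i < n; i++)
        sum += a[i];         /* sum's address is never taken, so it can
                                live in a register for the whole loop */
    return sum;
}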
I was doing some research on strict aliasing and how to handle it and found this commit on DPDK.
To fix strict aliasing (according to the comments), they are casting the void* parameters src and dst into uintptr_t, and then using the cast versions.
In my understanding, this should have no effect on the strict aliasing rule, since the rule itself says nothing about casting to uintptr_t.
Would a cast to uintptr_t really help strict-aliasing? Or would this just fix some possible warnings from GCC?
Would a cast to uintptr_t really help strict-aliasing?
No, it would not.
Or would this just fix some possible warnings from GCC?
"Fix" in the sense of disguising the strict-aliasing violations well enough that the compiler does not diagnose them, yes, it might. And presumably it indeed did so for whoever made that change.
This is pernicious, because now, not only may the compiler do something unwanted with the code, but you cannot even prevent it from doing so by passing it the -fno-strict-aliasing option (or whatever similar option a different compiler might provide). Worse, it might work fine with the compiler used today, but break months or years later when you upgrade to a new version or when you switch to a different C implementation.
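For a sense of what such a "fix" looks like, here is a hypothetical sketch (not the actual DPDK code): laundering the pointers through uintptr_t may silence a -Wstrict-aliasing warning, but the access is still made through a possibly incompatible type, so the violation remains.

#include <stdint.h>
#include <string.h>

static void copy64(void *dst, const void *src)
{
    uintptr_t d = (uintptr_t)dst;   /* hides the types from the warning... */
    uintptr_t s = (uintptr_t)src;
    *(uint64_t *)d = *(const uint64_t *)s;   /* ...but this is still an
                                                aliasing violation if the
                                                objects aren't uint64_t */

    /* A well-defined alternative is memcpy, which compilers lower to a
       single load/store when the size is a compile-time constant: */
    /* memcpy(dst, src, sizeof(uint64_t)); */
}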
The "strict aliasing rules" specify situations where even implementations that are not intended to be suitable for low-level programming must allow for the possibility of aliasing between seemingly-unrelated objects. Compilers which are designed to be suitable for low-level programming are allowed to, and will, extend the language by behaving meaningfully--typically processing constructs "in a documented fashion characteristic of the environment" in more circumstances than mandated by the Standard, especially in the presence of constructs that would generally be useless otherwise.
Relatively few programs that aren't intending to access storage in low-level fashion will perform integer-to-pointer conversions. Thus, implementations that treat such conversions as an indication that they should avoid making any assumptions about the pointers formed thereby will be able to usefully process a wider range of programs than those which don't, without having to give up many opportunities for genuinely-useful optimizations. While it would be better to have the Standard specify a syntax for the purpose of erasing any evidence of pointer provenance, conversions through integer types presently work for almost all compilers other than clang.
To my understanding the difference between a macro and a function is that a macro call will be replaced by the instructions in the definition, while a function does the whole push, branch and pop thing. Is this right, or have I understood something wrong?
Additionally, if this is right, it would mean that macros take more space but are faster (because they lack the push, branch and pop instructions), wouldn't it?
What you wrote about the performance implications is correct if the C compiler is not optimizing. But optimizing compilers can inline functions just as if they were macros, so an inlined function call runs at the same speed as a macro, with no push/pop overhead. To trigger inlining, enable optimization in your compiler settings (e.g. gcc -O2) and put your functions in the .h file as static inline.
Please note that sometimes inlining/macros are faster and sometimes a real function call is faster, depending on the code and the compiler. If the function body is very short (and most of it will be optimized away), inlining is usually faster than a function call.
Another important difference is that macros can take arguments of different types, and the macro definition can make sense for multiple types (but the compiler won't do type checking for you, so you may get undesired behavior or a cryptic error message if you use a macro with the wrong argument type). This kind of polymorphism is hard to mimic with functions in C (but easy in C++ with function overloading and function templates).
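A small sketch of the two styles this answer compares (names invented for illustration); with -O2 both compile to the same code, but only the function version is type checked:

/* macro version: no type checking, argument may be evaluated twice */
#define SQUARE_MACRO(x) ((x) * (x))

/* function version: type checked; with optimization enabled it is
   inlined just like the macro */
static inline int square_func(int x) {
    return x * x;
}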
This might have been right in the 1980s, but modern compilers are much better.
Functions don't always push and pop the stack, especially if they're leaf functions or have tail calls. Also, functions are often inlined, and can be inlined even if they are defined in other translation units (this is called link-time optimization).
But you're right that in general, when optimizations are turned off, a macro will be inlined and a function won't be. Either version may take more space; it depends on the particulars of the macro/function.
A function uses space in two ways: the body uses space, and the function call uses space. If the function body is very small, it may actually save space to inline it.
Yes, your understanding is right. But you should also note that there is no type checking with macros, and they can lead to side effects. You should also be very careful to parenthesize macro arguments.
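A sketch of those two pitfalls (macro names invented for illustration):

#define BAD_DOUBLE(x)  x + x                      /* missing parentheses */
#define MAX(a, b)      ((a) > (b) ? (a) : (b))    /* parenthesized, but still
                                                     evaluates an argument twice */

int demo(int i)
{
    int y = 3 * BAD_DOUBLE(2);   /* expands to 3 * 2 + 2 == 8, not 12 */
    int z = MAX(i++, 0);         /* side effect: i++ may happen twice */
    return y + z;
}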
Your understanding is half correct. The point is that macros are resolved before compilation. You should think of them as sophisticated text replacement tools (that's oversimplifying it, but is mostly what it comes down to).
So the difference is when in the build process your code is used.
This is orthogonal to the question of what the compiler really does with it when it creates the final binary code. It is more or less free to do whatever it thinks is correct to produce the intended behaviour. In C++, you can only hint at your preference with the inline keyword. The compiler is free to ignore that hint.
Again, this is orthogonal to the whole preprocessor business. Nothing stops you from writing macros which result in C++ code using the inline keyword, after all. Likewise, nobody stops you from writing macros which result in a lot of recursive C++ functions which the compiler will probably not be able to inline even if it wanted to.
The conclusion is that your question is wrong. It's a general question of having binaries with a lot of inlined functions vs. binaries with a lot of real function calls. Macros are just one technique you can use to influence the tradeoff in one way or the other, and you will ask yourself the same general question without macros.
The assumption that inlining a function will always trade space for speed is wrong. Inlining the wrong (i.e. too big) functions can even have a negative impact on speed. As is always the case with such optimisations, do not guess but measure.
You should read the FAQ on this: "Do inline functions improve performance?"
The main reason for this is an attempt to write a perfectly portable C library. After a few weeks I ended up with constants, which are unfortunately not very flexible (using constants to define other constants isn't possible).
Thanks for any advice or criticism.
What you ask is impossible. As stated before me, any standards-compliant implementation of C will have limits.h correctly defined. If it's incorrect for whatever reason, blame the compiler vendor. Any "dynamic" discovery of the true limits wouldn't be possible at compile time, especially if you're cross-compiling for an embedded system, where the target architecture might have smaller integers than the compiling system.
To dynamically discover the limits, you would have to do it at run time by bit shifting, multiplying, or adding until an overflow is encountered, but then you have a variable in memory rather than a constant, which would be significantly slower. (This wouldn't be reliable anyway, since different architectures use different bit-level representations, and arithmetic sometimes gets funky around the limits, especially with signed types and abstract number representations such as floats.)
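As a sketch, the run-time approach only works cleanly for unsigned types (for signed types the same trick would run into undefined behaviour on overflow):

/* Illustrative only: find UINT_MAX at run time by setting bits one at a
   time until the shifted bit falls off the top of the type. */
unsigned int find_uint_max(void)
{
    unsigned int bit = 1u, max = 0u;
    while (bit != 0u) {
        max |= bit;
        bit <<= 1;       /* well-defined for unsigned types */
    }
    return max;          /* equals UINT_MAX from <limits.h> */
}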
Just use the standard types and limits as found in stdint.h and limits.h, or try to avoid pushing the limits altogether.
First thing that comes to my mind: have you considered using stdint.h? Thanks to that your library will be portable across C99-compliant compilers.
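A small usage sketch of those headers, assuming a C99 compiler:

#include <stdint.h>
#include <inttypes.h>
#include <limits.h>
#include <stdio.h>

int main(void)
{
    int32_t  a = INT32_MAX;    /* exactly 32 bits wherever int32_t exists */
    uint64_t b = UINT64_MAX;
    printf("int has %d bits, a = %" PRId32 ", b = %" PRIu64 "\n",
           (int)(sizeof(int) * CHAR_BIT), a, b);
    return 0;
}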
I currently have code that looks like
while (very_long_loop) {
    ...
    y1 = getSomeValue();
    ...
    x1 = y1*cos(PI/2);
    x2 = y2*cos(SOME_CONSTANT);
    ...
    outputValues(x1, x2, ...);
}
the obvious optimization would be to compute the cosines ahead-of-time. I could do this by filling an array with the values but I was wondering would it be possible to make the compiler compute these at compile-time?
Edit: I know that C doesn't have compile-time evaluation but I was hoping there would had been some weird and ugly way to do this with macros.
If you're lucky, you won't have to do anything: Modern compilers do constant propagation for functions in the same translation unit and intrinsic functions (which most likely will include the math functions).
Look at the assembly to check if that's the case for your compiler and increase the optimization levels if necessary.
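One way to check, assuming gcc or clang on a typical desktop setup: compile a tiny file with -O2 -S and look for a call to cos in the output; with a literal constant argument the call is usually folded away.

/* check_fold.c -- "gcc -O2 -S check_fold.c" and inspect check_fold.s;
   on typical gcc builds the cos() call is replaced by the folded value. */
#include <math.h>

double scaled(double y)
{
    return y * cos(0.5);   /* constant argument, candidate for folding */
}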
Nope. A pre-computed lookup table would be the only way. In fact, Cosine (and Sine) might even be implemented that way in your libraries.
Profile first, Optimise Later.
No, unfortunately.
I would recommend writing a little program (or script) that generates a list of these values (which you can then #include into the correct place), that is run as part of your build process.
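A tiny sketch of such a generator (file name and constant values are hypothetical; its output would be redirected to a header and #included by the real program):

/* gen_cos_table.c -- run at build time, e.g.
     gcc gen_cos_table.c -lm -o gen_cos_table && ./gen_cos_table > cos_table.h */
#include <stdio.h>
#include <math.h>

int main(void)
{
    const double pi = acos(-1.0);
    printf("/* Generated file -- do not edit. */\n");
    printf("static const double cos_pi_2 = %.17g;\n", cos(pi / 2));
    printf("static const double cos_some_constant = %.17g; /* placeholder argument */\n",
           cos(0.75));
    return 0;
}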
By the way: cos(pi/2) = 0!
You assume that computing cos is more expensive than a table access. Perhaps this is not true on your architecture. Thus you should do some testing (profiling), as always with optimization ideas.
Instead of precomputing these values, it is possible to use global variables to hold the values, which would be computed once on program startup.
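A minimal sketch of that approach (names invented for illustration; the second argument is a stand-in for SOME_CONSTANT):

#include <math.h>

static double cos_pi_2;            /* filled in once, before the main loop */
static double cos_some_constant;

static void init_cos_values(void)
{
    cos_pi_2          = cos(acos(-1.0) / 2);
    cos_some_constant = cos(0.75);  /* placeholder for SOME_CONSTANT */
}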
No, C doesn't have the concept of compile-time evaluation of functions, and not even of symbolic constants if they are of type double. The only way to have them as an immediate operand would be to precompute them and then define them in macros. This is the way the C library does it for pi, for example.
If you check the code and the compiler is not hoisting the constant values out of the loop, then do so yourself.
If the arguments to the trig functions are constant, as in your sample code, then either pre-compute them yourself or make them static variables so they are only computed once. If they vary between calls but are constant within the loop, then move them outside the loop. If they vary between iterations of the loop, then a look-up table may be faster; and if lower accuracy is acceptable, implementing your own trig functions that halt the calculation at that lower accuracy is also an option.
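In terms of the question's fragment, hoisting looks roughly like this (PI, SOME_CONSTANT, getSomeValue and outputValues are taken from the question and assumed to be defined elsewhere):

const double k1 = cos(PI/2);           /* computed once, before the loop */
const double k2 = cos(SOME_CONSTANT);

while (very_long_loop) {
    ...
    y1 = getSomeValue();
    ...
    x1 = y1*k1;
    x2 = y2*k2;
    ...
    outputValues(x1, x2, ...);
}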
I am struck with awe by Christoph's answer above.
So nothing needs to be done in this case, where gcc has some knowledge about the math functions. But if you have a function (maybe implemented by you) which cannot be evaluated by your C compiler, or if your C compiler is not so clever (or you need to fill complicated data structures, or for some other reason), you can use some higher-level language to act as a macro processor. In the past I have used eRuby for this purpose, but ePerl should work very well too and is another obvious, readily available and more or less comfortable choice.
You can specify make rules for transforming files with the extension .eruby (or .eperl or whatever) into files with that extension stripped off, so that, for example, if you write module.c.eruby or module.h.eruby, then make automatically knows how to generate module.c or module.h, respectively, and keeps them up to date. In your make rule you can easily add generation of a comment that warns against editing the file directly.
If you are using Windows or something similar, then I am out of my depth in explaining how to get your favorite IDE to run this transformation for you automatically. But I believe it should be possible, or you could just run make outside of your IDE whenever you need to change those .eruby (or whatever) files.
By the way, I have seen eLua implemented in an incredibly small number of lines of code to use Lua as a macro language. Of course any other scripting language with support for regular expressions and flexible layout rules should work as well (but Python is ill-suited for this purpose because of its strict whitespace rules).