Is it safe to compile programs using cryptography with -Ofast? - c

I'm building a toy cracking program for self-teaching purposes in C. I want the brute forcing to run as fast as possible, and one of the considerations there is naturally compiler optimizations. Presumably, cryptographic implementations would break or have their results thrown off by forgoing floating point precision, but I tested enabling -Ofast (on gcc) with my current program and the final hash output from a long series of cryptographic functions remains the same as with just -O3.
I understand though that this isn't necessarily conclusive as there's a lot that can be going on under the hood with modern compilers, so my question is, will enabling -Ofast on my crypto cracking script potentially throw off the results of my crypto functions?

-Ofast does this:
Disregard strict standards compliance. -Ofast enables all -O3
optimizations. It also enables optimizations that are not valid for
all standard-compliant programs. It turns on -ffast-math and the
Fortran-specific -fstack-arrays, unless -fmax-stack-var-size is
specified, and -fno-protect-parens.
-ffast-math turns on a bunch of other flags, but none of them matter unless you're using floating-point arithmetic, which no hash function I'm aware of does.
-fstack-arrays and -fno-protect-parens don't do anything at all unless you're using Fortran.
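To make the "no floating point" point concrete, here is a minimal sketch. It uses 32-bit FNV-1a, which is a toy, non-cryptographic hash chosen only because it is short; the function names fnv1a32 and rotl32 are my own. Real hashes such as SHA-256 are built from the same kinds of operations (XOR, add, multiply, rotate on unsigned integers), and none of those are touched by -ffast-math.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a: a toy, NON-cryptographic hash, used here only as a
 * stand-in for integer-only hash code. No floating-point operation
 * appears anywhere, so -ffast-math has nothing to act on. */
uint32_t fnv1a32(const unsigned char *data, size_t len)
{
    assert(data != NULL || len == 0);
    uint32_t hash = 2166136261u;       /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        hash ^= data[i];
        hash *= 16777619u;             /* FNV prime; wraps modulo 2^32 */
    }
    return hash;
}

/* Rotate left, another staple of hash rounds; also pure integer math. */
uint32_t rotl32(uint32_t x, unsigned r)
{
    r &= 31u;
    return (x << r) | (x >> ((32u - r) & 31u));
}
```

Unsigned overflow is defined to wrap in C, so the result should be identical at -O0, -O3 and -Ofast. If your own code does use float or double somewhere (say, for a progress estimate), that part is what -Ofast could change.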


Forcing automatic vectorization with GCC

Here is my very simple question. With ICC I know it is possible to use #pragma simd to force vectorization of loops that the compiler chooses not to vectorize. Is there something analogous in GCC? Or is there any plan to add this feature in a future release?
Quite related, what about forcing vectorization with Graphite?
As long as gcc is allowed to use SSE/SSE2/etc instructions, the compiler will in general produce vector instructions when it realizes that it's "worthwhile". Like most things in compilers, this requires some luck/planning/care from the programmer to avoid the compiler thinking "maybe this isn't safe" or "this is too complicated, I can't figure out what's going on". But quite often, it's successful if you are using a reasonably modern version of gcc (4.x versions should all do this).
You can make the compiler use SSE or SSE2 instructions by adding -msse or -msse2 (etc. for later SSE extensions). -msse2 is default in x86-64.
I'm not aware of any way that you can FORCE this, however. The compiler will either do this because it's happy that it's a good solution, or it won't.
Sorry, can't answer about Graphite.
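For what it's worth, newer GCC releases (4.9 and later, as far as I know) accept the OpenMP pragma #pragma omp simd when you compile with -fopenmp-simd. It is the closest analog to ICC's pragma, but treat it as a strong request rather than a guarantee. A minimal sketch, where the function name saxpy and its signature are just my choice for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* y[i] += a * x[i].
 * `restrict` promises the arrays do not overlap, which removes the
 * aliasing doubt that often stops the vectorizer. The OpenMP pragma is
 * honored with -fopenmp-simd (or -fopenmp) and ignored otherwise, so the
 * function computes the same thing either way. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    assert(n == 0 || (x != NULL && y != NULL));
    #pragma omp simd
    for (size_t i = 0; i < n; i++) {
        y[i] += a * x[i];
    }
}
```

Compiling with something like gcc -O3 -fopenmp-simd -fopt-info-vec-optimized should print a note for each loop that was vectorized, and -fopt-info-vec-missed explains the ones that were not. GCC also has #pragma GCC ivdep, which only asserts that there are no loop-carried dependencies.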

Secure gcc optimization options for numerics

Which gcc compiler options may be safely used for numerical programming?
The easy way to turn on optimizations for gcc is to add -O# to the compiler options. It is tempting to say -O3. However, I know that -O3 includes optimizations that are unsafe in the sense that results of numerical computations may differ once this option is included. Small changes in the result may be insignificant if the algorithm is stable. On the other hand, precision can be an issue for certain math operations, so math optimization can have significant impact.
I find it inconvenient to take compiler dependent issues into account in the process of debugging. I.e. I don't want to wonder whether minor changes in the code will lead to strongly different behavior because the compiler changed its optimizations internally.
Which options are safe to add if I want deterministic--and hence controllable--behavior in my code? Which are almost safe, that is, which options induce only minor uncertainties compared to performance benefits?
I think of options like: -finline -finline-limit=2000 which inlines functions even if they are long.
It is not true that -O3 includes numerically unsafe optimizations. According to the manual, -O3 includes the following optimization passes in comparison to -O2:
-finline-functions, -funswitch-loops, -fpredictive-commoning, -fgcse-after-reload, -ftree-vectorize and -fipa-cp-clone
You might be referring to -ffast-math, turned on by default with -Ofast, but not with -O3:
-ffast-math Sets -fno-math-errno, -funsafe-math-optimizations, -ffinite-math-only, -fno-rounding-math, -fno-signaling-nans and -fcx-limited-range. This option causes the preprocessor macro __FAST_MATH__ to be defined.
This option is not turned on by any -O option besides -Ofast since it
can result in incorrect output for programs that depend on an exact
implementation of IEEE or ISO rules/specifications for math functions.
It may, however, yield faster code for programs that do not require
the guarantees of these specifications.
In other words, all of -O, -O2, and -O3 are safe for numeric programming.
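As an illustration of the kind of code -ffast-math can break (and -O3 cannot), here is a sketch of compensated (Kahan) summation. The function names are mine. The correction term is algebraically zero, so a compiler that is allowed to reassociate floating-point expressions may simplify it away and silently turn this back into a naive sum.

```c
#include <assert.h>
#include <stddef.h>

/* Plain left-to-right summation. */
double naive_sum(const double *v, size_t n)
{
    assert(v != NULL || n == 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += v[i];
    }
    return sum;
}

/* Kahan (compensated) summation: c tracks the low-order bits that each
 * addition loses and feeds them back in on the next iteration. */
double kahan_sum(const double *v, size_t n)
{
    assert(v != NULL || n == 0);
    double sum = 0.0;
    double c = 0.0;
    for (size_t i = 0; i < n; i++) {
        double y = v[i] - c;
        double t = sum + y;
        /* Algebraically (t - sum) - y == 0. With strict IEEE semantics it
         * is the rounding error; under -ffast-math the compiler is
         * permitted to fold it to zero. */
        c = (t - sum) - y;
        sum = t;
    }
    return sum;
}
```

With an input of 1.0 followed by ten copies of 1e-16, the naive sum stays at exactly 1.0 while the Kahan sum comes out slightly above 1.0 when compiled with -O2 or -O3 on x86-64. Whether a particular GCC version actually removes the compensation under -Ofast is something to check on your own build.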

Enabling strict floating point mode in GCC

I haven't yet written a program to see whether GCC needs an option for this. When I do, I'd like to know how to enable a strict floating point mode that gives reproducible results between runs and computers. Thanks.
Compiling with -msse2 on an Intel/AMD processor that supports it will get you almost there (on 32-bit targets add -mfpmath=sse, since x87 is still the default there for floating-point math). Do not let any library put the FPU in FTZ/DAZ mode, and you will be mostly set (processor bugs notwithstanding).
For other architectures, the answer would be different. Those architectures that do not offer any convenient way to get exact IEEE 754 semantics (for instance, pre-SSE2 IA32 CPUs) would require the use of a floating-point emulation library to get the result you want, at a very high performance penalty.
If your target architecture supports the fmadd (multiplication and addition without intermediate rounding) instruction, make sure your compiler does not use it when you have explicit multiplications and additions in the source code. Note that -ffast-math is not the only trigger: outside strict ISO modes GCC defaults to -ffp-contract=fast and may fuse a*b + c on its own, so pass -ffp-contract=off (or compile with a strict -std= such as -std=c99) to rule it out.
If you use -ffloat-store and always store intermediate values to variables or apply (explicit) casts to the desired type/precision, you should be at least 90% to your goal, and maybe more. I'd welcome comments on whether there are cases this approach still misses. Note that I claim this works even without any SSE options.
You can also use GCC's option -mpc64 on i386 / ia32 target to force double precision computation even on x87 FPU. See GCC manual.
You can also modify the x87 FPU behavior at runtime; see "Deterministic cross-platform floating point arithmetics" and also "An Introduction to GCC".
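Whichever flags you end up with, it can help to have the program check its own floating-point environment at startup. The following is only a sketch under my own assumptions: the helper names are invented, __FAST_MATH__ is the macro GCC defines under -ffast-math, and FLT_EVAL_METHOD comes from C99's float.h.

```c
#include <assert.h>
#include <float.h>

/* 1 if this translation unit was compiled with -ffast-math (or -Ofast),
 * which makes GCC define __FAST_MATH__; 0 otherwise. */
int fast_math_enabled(void)
{
#ifdef __FAST_MATH__
    return 1;
#else
    return 0;
#endif
}

/* FLT_EVAL_METHOD: 0 = operations are evaluated in their declared type,
 * 2 = evaluated in long double (typical of x87 code), -1 = indeterminable. */
int eval_method(void)
{
    return (int)FLT_EVAL_METHOD;
}

/* 1 if a subnormal result survives, 0 if it is flushed to zero (FTZ).
 * volatile keeps the division from being folded at compile time. */
int subnormals_preserved(void)
{
    volatile double smallest_normal = DBL_MIN;
    volatile double half = smallest_normal / 2.0;
    return half != 0.0;
}

/* Hypothetical fail-fast guard to call at the top of main(). */
void require_strict_fp(void)
{
    assert(!fast_math_enabled());
    assert(eval_method() == 0);
    assert(subnormals_preserved());
}
```

This does not prove the results are reproducible across machines; it only catches the common ways of losing strict behavior (fast-math, x87 excess precision, flush-to-zero) that the answers above mention.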

Disable vectorized looping in FORTRAN?

Is it possible to bypass loop vectorization in FORTRAN? I'm writing to F77 standards for a particular project, but the GNU gfortran compiles up through modern FORTRANs, such as F95. Does anyone know if certain FORTRAN standards avoided loop vectorization or if there are any flags/options in gfortran to turn this off?
UPDATE: So, I think the final solution to my specific problem has to "DO" with the FORTRAN DO loops not allowing the updating of the iteration variable. Mention of this can be found in @High Performance Mark's reply on this related thread: "Loop vectorization and how to avoid it".
[Into the FORT, RAN the noobs for shelter.]
The Fortran standards are generally silent on how the language is to be implemented, leaving that to the compiler writers who are in a better position to determine the best, or good (and bad) options for implementation of the language's various features on whatever chip architecture(s) they are writing for.
What do you mean when you write that you want to bypass loop vectorisation? And why, in the next sentence, do you suggest that this would be unavailable to FORTRAN77 programs? It is perfectly normal for a compiler for a modern CPU to generate vector instructions if the CPU is capable of executing them. This is true whatever version of the language the program is written in.
If you really don't want to generate vector instructions then you'll have to examine the gfortran documentation carefully -- it's not a compiler I use so I can't point you to specific options or flags. You might want to look at its capabilities for architecture-specific code generation, paying particular attention to SSE level.
You might be able to coerce the compiler into not vectorising loops if all your loops are explicit (so no whole-array operations) and if you make your code hard to vectorise in other ways (dependencies between loop iterations for example). But a good modern compiler, without interference, is going to try its damnedest to vectorise loops for your own good.
It seems rather perverse to me to try to force the compiler to go against its nature. Perhaps you could explain in more detail why you want to do that?
As High Performance Mark wrote, the compiler is free to select machine instructions to implement your source code as long as the results follow the rules of the language. You should not be able to observe any difference in the output values as a result of loop vectorization ... your code should just run faster. So why do you care?
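To show what "dependencies between loop iterations" means in practice, here is a small sketch. It is written in C rather than Fortran only to keep it short; the same reasoning applies to DO loops, and the function names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Independent iterations: out[i] depends only on in[i], so a vectorizing
 * compiler is free to compute several elements per instruction. */
void scale(size_t n, double k, const double *in, double *out)
{
    assert(n == 0 || (in != NULL && out != NULL));
    for (size_t i = 0; i < n; i++) {
        out[i] = k * in[i];
    }
}

/* Loop-carried dependency: a[i] needs the a[i-1] produced by the previous
 * iteration, so the iterations cannot simply run side by side. Compilers
 * typically leave a loop like this scalar. */
void prefix_sum(size_t n, double *a)
{
    assert(n == 0 || a != NULL);
    for (size_t i = 1; i < n; i++) {
        a[i] += a[i - 1];
    }
}
```

Writing code like prefix_sum on purpose just to defeat the vectorizer would be the perverse route. If the goal is only to switch the pass off, GCC's -fno-tree-vectorize flag is the more direct tool.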
Sometimes differences can be observed across optimization levels, e.g., on some architectures registers have extra precision.
The place to look for these sorts of compiler optimizations is the gcc manual. They are located there since they are common across the gcc compiler suite.
With most modern compilers, the command-line option -O0 should turn off all optimisations, including loop vectorisation.
I have sometimes found that this causes bugs to apparently disappear. However usually this means that there is something wrong with my code so if this sort of thing is happening to you then you have almost certainly written a buggy program.
It is theoretically possible, but much less likely, that there is a bug in the compiler. You can easily check this by compiling your code with another Fortran compiler (e.g. gfortran or g95).
gfortran doesn't auto-vectorize unless you have set -O3 or -ftree-vectorize (GCC 12 and later also vectorize at -O2, with a conservative cost model). So it's easy to avoid vectorization: stay below that level or pass -fno-tree-vectorize. You will probably need to read (skim) the gcc manual as well as the gfortran one.
Auto-vectorization has been a well-known feature of Fortran compilers for over 35 years, and even the Fortran 77 definition of DO loops was set with this in mind (and also in view of some known non-portable abuses of F66 standard). You could not count on turning off vectorization as a way of making incorrect code work, although it might expose symptoms of incorrect code.

safe, fast CFLAGS for mex functions in matlab

I am converting a number of low-level operations from native matlab code into C/mex code, with great speedups. (These low-level operations can be done vectorized in .m code, but I think I get memory hits b/c of large data. whatever.) I have noticed that compiling the mex code with different CFLAGS can cause mild improvements. For example CFLAGS = -O3 -ffast-math does indeed give some speedups, at the cost of mild numerical inaccuracy.
My question: what are the "best" CFLAGS to use, without incurring too many other side effects? It seems that, at the very least,
CFLAGS = -O3 -fno-math-errno -fno-unsafe-math-optimizations -fno-trapping-math -fno-signaling-nans are all OK. I'm not sure about -funroll-loops.
Also, how would you optimize the set of CFLAGS used, semi-automatically, without going nuts?
If you know the target CPU, or are at least willing to guarantee a "minimum", you should definitely look into -mcpu and -march.
The performance gain can be significant.
Whatever ATLAS uses on your machine is probably a good starting point. I don't know that ATLAS automatically optimizes specific compiler flags, but the developers have probably spent a fair amount of time doing so by hand.
