I am trying to understand cross compilation. Many cross compilers mention gnueabihf. I was able to understand what EABI stands for, but I couldn't find anything about the hf suffix. What does it mean?
Thank you!
hf means Hard Float.
When you build with a hard-float toolchain, the compiler generates code that uses the hardware Floating Point Unit (FPU) directly when the program runs.
Your program will run faster, but only if it actually uses floating-point arithmetic!
Be careful not to use a hard-float toolchain if your CPU has no FPU, or the program will not run.
EDIT:
Also, if you use Soft Float, every floating-point operation is computed in software instead.
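A minimal illustration, assuming a typical ARM cross-toolchain is installed (the toolchain triplets are assumptions; -mfloat-abi is a standard GCC option for ARM):

/* fpu_demo.c - the code generated for this function differs between ABIs */
float scale(float x)
{
    return x * 2.5f;   /* hard float: done in FPU registers; soft float: a library call */
}

/* Example invocations:
 *   arm-linux-gnueabihf-gcc -c fpu_demo.c                    hard-float ABI, uses the FPU
 *   arm-linux-gnueabi-gcc -mfloat-abi=soft -c fpu_demo.c     everything computed in software
 */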
I'm trying to implement support for double and float, and the corresponding basic arithmetic, on a CPU without an FPU.
I know that it is possible on all AVR ATmega controllers, and an ATmega also has no FPU. So here comes the question: how does it work? Are there any suggestions for literature or links with explanations and examples?
In the best case I would like to support code like this:
double twice(double x)
{
    return x * x;
}
Many thanks in advance,
Alex
Here are AVR-related links with explanations and examples for implementing soft double:
You will find one double floating-point library here.
Another one can be found in the last message here.
Double is very slow, so if speed is a concern, you might opt for fixed-point math. Just read my messages in this thread:
This post may be interesting for you: Floating point calculations in a processor with no FPU
As stated:
Your compiler may provide support, or you may need to roll your own.
There are freely-available implementations, too.
If it's for an ATmega, you probably don't have to write anything yourself. The libraries that are already available are most likely optimized much further than you could manage yourself. If you need more performance, you could consider converting from floating point to fixed point; you should consider this anyway. If you can get the job done in fixed point, stay away from floating point.
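For example, a minimal Q16.16 fixed-point sketch (the type and helper names are made up for illustration):

#include <stdint.h>

typedef int32_t q16_t;                 /* 16 integer bits, 16 fractional bits */
#define Q16_ONE (1L << 16)

static inline q16_t q16_from_int(int x)        { return (q16_t)(x * Q16_ONE); }
static inline q16_t q16_add(q16_t a, q16_t b)  { return a + b; }

static inline q16_t q16_mul(q16_t a, q16_t b)
{
    /* widen to 64 bits so the fractional part survives the multiply */
    return (q16_t)(((int64_t)a * b) >> 16);
}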
AVR uses AVR Libc, which you can download and examine.
There is a math library, but that is not what you are looking for. That contains the standard functions defined in math.h.
Instead, you need functions that perform multiplication and things like that. These are also in Libc, under fplib, and are written in assembly language. But the user doesn't call them directly; instead, when the compiler comes across a multiplication involving floats, it inserts a call to the correct function.
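Conceptually, the lowering looks something like this (the name __muldf3 is libgcc's documented soft-float double multiply; avr-gcc's own routines may be named differently):

/* what you write: */
double square(double x)
{
    return x * x;
}

/* roughly what the compiler emits on a soft-float target: */
extern double __muldf3(double, double);   /* software double multiply from the runtime library */

double square_lowered(double x)
{
    return __muldf3(x, x);
}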
You can browse through the AVR fplib to get an idea of what to do, but you are going to have to write your own assembly or bit-twiddling code for your processor.
You need to find out what standard your processor and language are using for floating point. IEEE, perhaps? And you'll also need to know if the system is little-endian or big-endian.
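If it is IEEE 754, the first step of any soft-float routine is pulling the value apart into sign, exponent and mantissa - something like this sketch for single precision:

#include <stdint.h>
#include <string.h>

struct sf_parts {
    uint32_t sign;      /* 0 or 1 */
    int32_t  exponent;  /* biased exponent, 0..255 */
    uint32_t mantissa;  /* 23 fraction bits, implicit leading 1 not yet added */
};

static struct sf_parts unpack_float(float f)
{
    uint32_t bits;
    struct sf_parts p;

    memcpy(&bits, &f, sizeof bits);    /* reinterpret the bits without aliasing trouble */
    p.sign     = bits >> 31;
    p.exponent = (int32_t)((bits >> 23) & 0xFF);
    p.mantissa = bits & 0x7FFFFF;
    return p;
}

/* a soft multiply would then add the implicit bit, multiply the mantissas,
 * add the exponents (minus the bias of 127), renormalise and repack */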
I am assuming your system doesn't have a C compiler already. Otherwise, all the floating-point operations would already be implemented, and that twice() function (which actually computes a square) would work just fine as it is.
I'm working with the MIRACL crypto library by CertiVox.
Following the instructions in fastgf2m.txt, I've been able to get everything to compile. However, at run time the benchmark program (bmark.exe) halts when evaluating curves over GF(2^m) with the error, "This is not a point on the curve!"
I am able to get everything to work without the optimization, but I'm unsure where the problem lies. I haven't modified any curve parameters and have followed the instructions in the distribution. I'm compiling on 64-bit Windows 8.1, on an Intel i7-3520M.
If anyone has any advice on how to correct this, it would be greatly appreciated.
Thanks!!
The method outlined in fastgf2m.txt is for generating unrolled code associated with a fixed m value determined at compile time. The bmark program changes m at runtime, and so the unrolled code will often not be correct in this case. The documentation could be clearer on this point.
Also make sure your processor supports the PCLMULQDQ instruction - many older processors do not.
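A quick way to check at run time, for example (assuming GCC or Clang on x86):

#include <stdio.h>

int main(void)
{
    /* __builtin_cpu_supports is a GCC/Clang builtin; "pclmul" is the feature
     * name for the PCLMULQDQ (carry-less multiply) instruction */
    if (__builtin_cpu_supports("pclmul"))
        puts("PCLMULQDQ available");
    else
        puts("PCLMULQDQ not available");
    return 0;
}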
It might be better to test the method on the ecsgen2/ecssign2/ecsver2 programs, which implement ECDSA over GF(2^283), for example.
I am writing a massively parallel GPU application using CUDA. I have been optimizing it by hand. I received a 20% performance increase with __fdividef(x, y), and according to the CUDA C Programming Guide (section C.2.1), using similar functions for multiplication and addition is also beneficial.
The function is stated as this: __fmul_[rn,rz,ru,rd](x,y).
__fdividef(x,y) was not stated with the arguments in brackets. I was wondering, what are those brackets?
If I run the simple code:
int t = __fmul_(5,4);
I get a compiler error saying that __fmul_ is undefined. I have the CUDA runtime included, so I don't think it is a setup problem; rather, it is something to do with those square brackets. How do I correctly use this function? Thank you.
EDIT: I should clarify, the compiler is the CUDA-compiler NVCC.
You must specify a rounding mode as a suffix of the function name. There is no function __fmul_; the available functions are __fmul_rn, __fmul_rz, __fmul_ru and __fmul_rd.
The CUDA Programming Guide explains the suffixes:
_rd: round down.
_rn: round to nearest even.
_ru: round up.
_rz: round towards zero.
See CUDA's Single Precision Intrinsics documentation for details on these functions.
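For example, in a .cu file compiled with nvcc (the kernel and variable names here are made up):

__global__ void multiply(const float *a, const float *b, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        /* single-precision multiply, rounded to nearest even; the suffix is part
         * of the name, so __fmul_rz, __fmul_ru and __fmul_rd also exist */
        out[i] = __fmul_rn(a[i], b[i]);
    }
}

Note also that these intrinsics are device functions, so they can only be called from device (GPU) code, not from ordinary host code.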
1) I've got many constants in my C algorithm.
2) My code works both in floating point and fixed point.
Right now, these constants are initialized by a function, float2fixed, which does nothing in the floating-point build, while in the fixed-point build it computes their fixed-point representation. For instance, 0.5f stays 0.5f when working in floating point, whereas it becomes 32768 (via the pow() routine) when working in fixed point with a Qx.16 representation.
That's easy to maintain, but it actually takes a lot of time to compute these constants in fixed point (pow is a floating-point function). In C++, I'd use some metaprogramming so the compiler computes these values at compile time and there's no hit at run time. But in C, that's not possible. Or is it? Does anybody know of such a trick? Is any compiler clever enough to do that?
Looking forward to any answers.
A
Rather than using (unsigned)(x*pow(2,16)) to do your fixed point conversion, write it as (unsigned)(0.5f * (1 << 16))
This should be acceptable as a compile-time constant expression, since it involves only built-in operators.
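For instance, a macro along these lines (the macro name is made up) lets the compiler do the conversion for every constant:

#include <stdint.h>

#define FRACT_BITS 16
/* evaluated entirely at compile time: only a floating constant, a shift and a cast */
#define Q16(x) ((int32_t)((x) * (double)(1L << FRACT_BITS) + ((x) >= 0 ? 0.5 : -0.5)))

static const int32_t half_q16 = Q16(0.5f);   /* 32768, with no run-time pow() call */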
When using fixed point, you can write a program that takes your floating-point values and converts them into correct, constant initializers for the fixed-point type, so you effectively add a step to the compilation that generates the fixed-point values.
One advantage of this is that you can then define and declare your constants with const, so that they won't change at run time - whereas with the initialization functions, of course, the values have to be modifiable because they are calculated once.
I mean write a simple program that can scan for formulaic lines that might read:
const double somename = 3.14159;
it would read that and generate:
const fixedpoint_t somename = { ...whatever is needed... };
You design the operation to make it easy to manage for both notations - so maybe your converter always reads the file and sometimes rewrites it.
datafile.c: datafile.constants converter
	converter datafile.constants > datafile.c
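A rough sketch of such a converter (this is not an existing tool, just an illustration; it assumes a Q16.16 target and exactly the "const double name = value;" pattern shown above):

#include <stdio.h>

int main(void)
{
    char line[256], name[128];
    double value;

    while (fgets(line, sizeof line, stdin)) {
        if (sscanf(line, "const double %127[^ =] = %lf", name, &value) == 2)
            printf("const int32_t %s = %ld; /* Q16.16 of %g */\n",
                   name, (long)(value * 65536.0 + (value >= 0 ? 0.5 : -0.5)), value);
        else
            fputs(line, stdout);   /* copy everything that isn't a constant through unchanged */
    }
    return 0;
}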
In plain C, there's not much you can do. You need to do the conversion at some point, and the compiler doesn't give you any way to call interesting user-provided functions at compile time. Theoretically, you could try to coax the preprocessor into doing it for you, but that's the quick road to total insanity (i.e. you'd have to implement pow() in macros, which is pretty hideous).
Some options I can think of:
Maintain a persistent cache on disk. At least then it'd only be slow once, though you still have to load it, make sure it's not corrupt, etc.
As mentioned in another comment, use template metaprogramming anyway and compile with a C++ compiler. Most C works just fine (arguably better) with a C++ compiler.
Hmm, I guess that's about all I can think of. Good luck.
Recent versions of GCC (around 4.3) added the ability to use GMP and MPFR to do some compile-time optimisations by evaluating more complex functions that are constant. That approach leaves your code simple and portable, and trusts the compiler to do the heavy lifting.
Of course, there are limits to what it can do, and it would be hard to know if it's optimizing a given instance without going and looking at the assembly. But it might be worth checking out. Here's a link to the description in the changelog
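A small example of what this enables (whether a given call is actually folded depends on the GCC version and on the math builtins not being disabled):

#include <math.h>

long fixed_half(void)
{
    /* with a recent GCC, this pow() call is typically evaluated at compile time
     * via MPFR, so no run-time call to pow() appears in the object code */
    return (long)(0.5 * pow(2.0, 16.0));
}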
I have an application that was developed for Linux x86 32 bits. There are lots of floating-point operations and a lot of tests depending on the results. Now we are porting it to x86_64, but the test results are different in this architecture. We don't want to keep a separate set of results for each architecture.
According to the article An Introduction to GCC - for the GNU compilers gcc and g++, the problem is that GCC on x86_64 assumes -mfpmath=sse, while on x86 it assumes -mfpmath=387. The 387 FPU uses 80-bit internal precision for all operations and only converts the result to the given floating-point type (float, double or long double), while SSE uses the type of the operands to determine its internal precision.
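A small illustration of the kind of divergence involved - compiled once with -mfpmath=387 and once with -mfpmath=sse, the two builds may print different values (whether they actually do depends on the compiler and optimisation level):

#include <stdio.h>

int main(void)
{
    double a = 1e16, b = 1.0, c = -1e16;
    /* with -mfpmath=387 the intermediate (a + b) may be held in 80-bit precision;
     * with -mfpmath=sse it is rounded to double immediately */
    double r = (a + b) + c;
    printf("%.17g\n", r);
    return 0;
}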
I can force -mfpmath=387 when compiling my own code and all my operations work correctly, but whenever I call some library function (sin, cos, atan2, etc.) the results are wrong again. I assume it's because libm was compiled without the fpmath override.
I tried to build libm myself (glibc) using 387 emulation, but it caused a lot of crashes all around (don't know if I did something wrong).
Is there a way to force all code in a process to use the 387 emulation in x86_64? Or maybe some library that returns the same values as libm does on both architectures? Any suggestions?
Regarding the question of whether I need the 80-bit precision: for an individual operation the difference is really small and makes no difference. When compounding a lot of operations, though, the error propagates, and the difference in the final result is not so small any more and does make a difference. So I guess I need the 80-bit precision.
I'd say you need to fix your tests. You're generally setting yourself up for disappointment if you assume floating point math to be accurate. Instead of testing for exact equality, test whether it's close enough to the expected result. What you've found isn't a bug, after all, so if your tests report errors, the tests are wrong. ;)
As you've found out, every library you rely on is going to assume SSE floating point, so unless you plan to compile everything manually, now and forever, just so you can set the FP mode to x87, you're better off dealing with the problem now and accepting that FP math is not 100% accurate and will not, in general, yield the same result on two different platforms. (I believe AMD CPUs yield slightly different results in x87 math as well.)
Do you absolutely need 80-bit precision? (If so, there obviously aren't many alternatives, other than to compile everything yourself to use 80-bit FP.)
Otherwise, adjust your tests to perform comparisons and equality tests within some small epsilon. If the difference is smaller than that epsilon, the values are considered equal.
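For example (the tolerances here are placeholders you would tune for your own data):

#include <math.h>
#include <stdbool.h>

static bool nearly_equal(double a, double b, double rel_eps, double abs_eps)
{
    double diff = fabs(a - b);
    if (diff <= abs_eps)                              /* covers values close to zero */
        return true;
    return diff <= rel_eps * fmax(fabs(a), fabs(b));
}

/* e.g. in a test:  assert(nearly_equal(result, expected, 1e-12, 1e-15)); */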
80-bit precision is actually dangerous. The problem is that it is only preserved as long as the variable stays in a CPU register; whenever it is forced out to RAM, it is truncated to the declared type's precision. So a variable can change its value even though nothing happened to it in the code.
If you want long double precision, use long double for all of your floating point variables, rather than expecting float or double to have extra magic precision. This is really a no-brainer.
SSE floating point and 387 floating point use entirely different instructions, so there's no way to convince SSE FP instructions to use the 387. Probably the best way to deal with this is to resign your test suite to getting slightly different results, and not to depend on results being the same to the last bit.