I'm trying to implement support for double and float, and the corresponding basic arithmetic, on a CPU without an FPU.
I know that it is possible on all AVR ATmega controllers, and an ATmega has no FPU either. So the question is: how does it work? Are there any suggestions for literature, or links with explanations and examples?
In the best case, I would like to support code like this:
double twice(double x)
{
    return x * x;
}
Many thanks in advance,
Alex
Here are some AVR-related links, with explanations and examples, for implementing soft double:
You will find one double-precision floating-point library here.
Another one can be found in the last message here.
Double is very slow, so if speed is a concern, you might opt for fixed-point math. Just read my messages in this thread:
This post may be interesting for you: Floating point calculations in a processor with no FPU
As stated:
Your compiler may provide support, or you may need to roll your own.
There are freely-available implementations, too.
If it's for an ATmega, you probably don't have to write anything yourself. The libraries that are already available are likely optimized much further than you could manage on your own. If you need more performance, you could consider converting the floating-point values to fixed point. In fact, you should consider this anyway: if you can get the job done in fixed point, stay away from floating point.
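For illustration, here is a minimal sketch of what fixed-point arithmetic can look like in C, assuming a hypothetical Q16.16 format (the names q16_t and q16_mul are invented for this example):

#include <stdint.h>

typedef int32_t q16_t;             /* Q16.16: low 16 bits are the fraction */
#define Q16_ONE ((q16_t)1 << 16)   /* the value 1.0 in Q16.16 */

/* Addition and subtraction are plain integer operations; multiplication
   needs a wider intermediate, then a shift to drop the extra fraction bits. */
q16_t q16_mul(q16_t a, q16_t b)
{
    return (q16_t)(((int64_t)a * b) >> 16);
}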
AVR uses avr-libc, which you can download and examine.
There is a math library, but that is not what you are looking for: it contains the standard functions declared in math.h.
Instead, you need functions that perform multiplication and the like. These are also in avr-libc, under fplib, and they are written in assembly language. The user doesn't call them directly; instead, when the compiler comes across a multiplication involving floats, it inserts a call to the correct function.
You can browse through the AVR fplib to get an idea of what to do, but you are going to have to write your own assembly or bit-twiddling code for your processor.
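To make the mechanism concrete: __mulsf3 is libgcc's standard name for the single-precision soft-float multiply routine (on AVR it is backed by the assembly in fplib). A sketch of what the compiler effectively turns a float multiplication into:

/* "z = x * y" on an FPU-less target effectively becomes a call to
   libgcc's __mulsf3; the compiler emits the call for you, and user
   code never invokes it directly. */
extern float __mulsf3(float a, float b);

float square(float x)
{
    return __mulsf3(x, x);
}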
You need to find out which floating-point standard your processor and language use - IEEE 754, perhaps? You'll also need to know whether the system is little-endian or big-endian.
I am assuming your system doesn't have a C compiler already; otherwise, all the floating-point operations would already be implemented, and that twice() function (actually a square) would work just fine as it is.
Related
I wrote an ANSI C compiler for a friend's custom 16-bit stack-based CPU several years ago, but I never got around to implementing all the data types. Now I would like to finish the job, so I'm wondering if there are any math libraries out there that I can use to fill the gaps. I can handle the 16-bit integer data types, since they are native to the CPU, and therefore I have all the math routines (i.e. +, -, *, /, %) done for them. However, since his CPU does not handle floating point, I have to implement floats/doubles myself, and I also have to implement the 8-bit and 32-bit data types (both integer and float/double). I'm pretty sure this has been done and redone many times, and since I'm not particularly looking forward to reinventing the wheel, I would appreciate it if someone would point me at a library that can help me out.
Now, I was looking at GMP, but it seems to be overkill (the library must be huge, and I'm not sure my custom compiler could handle it), and it takes numbers in the form of strings, which would be wasteful for obvious reasons. For example:
mpz_t x, y, result;
mpz_init_set_str(x, "7612058254738945", 10);  /* parse from a decimal string */
mpz_init_set_str(y, "9263591128439081", 10);
mpz_init(result);
mpz_mul(result, x, y);
This seems simple enough, and I like the API... but I would rather pass in an array than a string. For example, if I wanted to multiply two 32-bit longs together, I would like to pass in two arrays of size two, where each array contains two 16-bit values that together represent a 32-bit long, and have the library place the output into an output array. If I needed floating point, then I should be able to specify the precision as well.
This may seem like asking for too much but I'm asking in the hopes that someone has seen something like this.
Many thanks in advance!
Let's break the answer into parts.
8-bit arithmetic
This one is very easy. In fact, C already covers this under the term "integer promotion": if you have 8-bit data and you want to perform an operation on it, you simply pad it with zeros (or ones, if it is signed and negative) to make it 16 bits wide, then proceed with the normal 16-bit operation.
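A minimal sketch of that promotion in C (the compiler does this for you; the explicit casts just make the widening visible):

#include <stdint.h>

/* Widen the 8-bit operands to the native 16-bit word, then use the
   normal 16-bit operation. 0xFF sign-extends to 0xFFFF, i.e. -1. */
int16_t add8(int8_t a, int8_t b)
{
    int16_t wa = (int16_t)a;   /* padded with the sign bit */
    int16_t wb = (int16_t)b;
    return wa + wb;            /* plain native 16-bit add */
}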
32-bit arithmetic
Note: as far as the standard is concerned, you don't really need to have 32-bit integers.
This could be a bit tricky, but it is still not worth using a library for. For each operation, take a look at how you learned to do it in elementary school in base 10, and then do the same in base 2^16 with 2-digit numbers (each digit being one 16-bit integer). Once you understand the analogy with simple base-10 math (and hence the algorithms), you need to implement them in your CPU's assembly.
This basically means loading the most significant 16 bits into one register and the least significant 16 bits into another, then following the algorithm for each operation, making use of the carry, overflow, and other flags along the way.
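As an example, here is the addition algorithm written in C rather than assembly, treating each 32-bit number as two base-2^16 digits; on the real CPU, the explicit carry test would simply be the carry flag:

#include <stdint.h>

/* (ahi:alo) + (bhi:blo) -> (rhi:rlo), schoolbook style in base 2^16. */
void add32(uint16_t alo, uint16_t ahi,
           uint16_t blo, uint16_t bhi,
           uint16_t *rlo, uint16_t *rhi)
{
    uint16_t lo = (uint16_t)(alo + blo);
    uint16_t carry = lo < alo;            /* wrapped around: carry out */
    *rlo = lo;
    *rhi = (uint16_t)(ahi + bhi + carry);
}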
Floating point arithmetic
Note: as far as the standard is concerned, you don't really need to conform to IEEE 754.
There are various libraries already written for software-emulated floating point. You may find this GCC wiki page interesting:
GNU libc has a third implementation, soft-fp. (Variants of this are also used for Linux kernel math emulation on some targets.) soft-fp is used in glibc on PowerPC --without-fp to provide the same soft-float functions as in libgcc. It is also used on Alpha, SPARC and PowerPC to provide some ABI-specified floating-point functions (which in turn may get used by GCC); on PowerPC these are IEEE quad functions, not IBM long double ones.
Performance measurements with EEMBC indicate that soft-fp (as speeded up somewhat using ideas from ieeelib) is about 10-15% faster than fp-bit and ieeelib about 1% faster than soft-fp, testing on IBM PowerPC 405 and 440. These are geometric mean measurements across EEMBC; some tests are several times faster with soft-fp than with fp-bit if they make heavy use of floating point, while others don't make significant use of floating point. Depending on the particular test, either soft-fp or ieeelib may be faster; for example, soft-fp is somewhat faster on Whetstone.
One answer could be to take a look at the source code for glibc and see if you could salvage what you need.
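To give a feel for what such a library does internally, here is a greatly simplified single-precision multiply operating on raw IEEE 754 bit patterns. It handles normal numbers only and truncates rather than rounds; real libraries like soft-fp and SoftFloat also handle zeros, infinities, NaNs, subnormals, and correct rounding:

#include <stdint.h>

/* Multiply two floats given as raw IEEE 754 bit patterns. Sketch only. */
uint32_t fmul_bits(uint32_t a, uint32_t b)
{
    uint32_t sign = (a ^ b) & 0x80000000u;            /* XOR of sign bits */
    int32_t  ea = (int32_t)((a >> 23) & 0xFF) - 127;  /* unbiased exponents */
    int32_t  eb = (int32_t)((b >> 23) & 0xFF) - 127;
    uint32_t ma = (a & 0x7FFFFFu) | 0x800000u;        /* implicit leading 1 */
    uint32_t mb = (b & 0x7FFFFFu) | 0x800000u;

    uint64_t m = (uint64_t)ma * mb;                   /* 48-bit product */
    int32_t  e = ea + eb;
    if (m & (1ULL << 47)) { m >>= 1; e++; }           /* renormalize to [1,2) */

    return sign | ((uint32_t)(e + 127) << 23) | ((uint32_t)(m >> 23) & 0x7FFFFFu);
}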
I am writing a massively parallel GPU application using CUDA, and I have been optimizing it by hand. I got a 20% performance increase with __fdividef(x, y), and according to the CUDA C Programming Guide (section C.2.1), using similar functions for multiplication and addition is also beneficial.
The function is documented as __fmul_[rn,rz,ru,rd](x,y).
__fdividef(x,y) was not written with those brackets. I was wondering: what are the brackets for?
If I run the simple code:
int t = __fmul_(5,4);
I get a compiler error saying __fmul_ is undefined. I have the CUDA runtime included, so I don't think it's a setup problem; rather, it's something to do with those square brackets. How do I use this function correctly? Thank you.
EDIT: I should clarify, the compiler is the CUDA-compiler NVCC.
The brackets mean you must pick one rounding-mode suffix: there is no function __fmul_; the available functions are __fmul_rn, __fmul_rz, __fmul_ru, and __fmul_rd.
The CUDA Programming Guide explains the suffixes:
_rd: round down.
_rn: round to nearest even.
_ru: round up.
_rz: round towards zero.
See CUDA's Single Precision Intrinsics documentation for details on these functions.
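A minimal sketch of correct usage: these intrinsics are device functions, so they can only be called from device code, not from host code like the int t = __fmul_(5,4); line above (the kernel name and launch configuration here are invented):

__global__ void mul_rn(const float *x, const float *y, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = __fmul_rn(x[i], y[i]);   /* multiply, round to nearest even */
}

/* launched from the host as, e.g.:
   mul_rn<<<(n + 255) / 256, 256>>>(x, y, out, n);  */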
Just curious how these are implemented. I can't see where I would start. Do they work directly on the float's/double's bits?
Also, where can I find the source code of the functions from math.h? All I find are either headers with prototypes or files with functions that just call other functions from somewhere else.
EDIT: Part of the message was lost after editing the title. What I meant in particular were the ceil() and floor() functions.
If you're interested in seeing source code for algorithms for this kind of thing, then fdlibm - the "Freely Distributable libm", originally from Sun, and the reference implementation for Java's math libraries - might be a good place to start. (For casual browsing, it's certainly a better place to start than GNU libc, where the pieces are scattered around various subdirectories - math/, sysdeps/ieee754/, etc.)
fdlibm assumes that it's working with an IEEE 754 format double, and if you look at the implementations - for example, the core of the implementation of log() - you'll see that they use all sorts of clever tricks, often using a mixture of both standard double arithmetic, and knowledge of the bit representation of a double.
(And if you're interested in algorithms for supporting basic IEEE 754 floating point arithmetic, such as might be used for processors without hardware floating point support, take a look at John R. Hauser's SoftFloat.)
As for your edit: in general, ceil() and floor() might well be implemented in hardware; for example, on x86, GCC (with optimisations enabled) generates code using the frndint instruction with appropriate fiddling of the FPU control word to set the rounding mode. But fdlibm's pure software implementations (s_ceil.c, s_floor.c) do work using the bit representation directly.
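If you just want the idea rather than fdlibm's full bit-twiddling, floor() can be sketched as integer truncation plus a fix-up for negatives; this ignores NaN and values outside the long long range, which the real implementations must handle:

/* Truncation rounds toward zero; for negative non-integers that is
   one too high, so step down. Sketch only. */
double floor_sketch(double x)
{
    double t = (double)(long long)x;
    return (t > x) ? t - 1.0 : t;
}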
math.h is part of the Standard C Library.
If you are interested in source code, the GNU C Library (glibc) is available to inspect.
EDIT TO ADD:
As others have said, math functions are typically implemented at the hardware level.
Basic operations like addition and division are almost always implemented by machine instructions. The exceptions are mostly small processors, like the 8048 family, which use a library to implement the operations for which there is no simple machine-instruction sequence.
Math functions like sin(), sqrt(), log(), etc. are almost always implemented in the runtime library. A few rare CPUs, like the Cray, have a square root instruction.
Tell us which particular implementation (gcc, MSVC, etc./Mac, Linux, etc.) you are using and someone will direct you precisely where to look.
On many platforms (such as any modern x86-compatible), many of the maths functions are implemented directly in the floating-point hardware (see for example http://en.wikipedia.org/wiki/X86_instruction_listings#x87_floating-point_instructions). Not all of them are used, though (as I learnt through comments to other answers here). But, for instance, the sqrt library function is often implemented directly in terms of the SSE hardware instruction.
For some ideas on how the underlying algorithms work, you might try reading Numerical Recipes, which is available as PDFs somewhere.
A lot of it is done in hardware on the processor these days.
The chip I cut my teeth on didn't even have a multiply instruction (the Z80).
We had to approximate things using Taylor series.
About halfway down this page, you can see how sin and cos are approximated.
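For instance, a truncated Taylor series for sine looks like this in C (on a Z80 it would have been fixed point rather than double, and you would first reduce the argument into [-pi, pi], since the truncated series degrades away from zero):

/* sin(x) ~ x - x^3/3! + x^5/5! - ...; each term is derived from the
   previous one, avoiding explicit powers and factorials. */
double sin_taylor(double x)
{
    double term = x, sum = x;
    for (int n = 1; n < 8; n++) {
        term *= -(x * x) / ((2 * n) * (2 * n + 1));
        sum += term;
    }
    return sum;
}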
While modern CPUs do have hardware implementations of common transcendental functions like sin and cos, they are rarely used as-is, whether for reasons of portability, speed, or accuracy. Instead, software approximation algorithms are used.
I am having some trouble with IEEE floating point rules preventing compiler optimizations that seem obvious. For example,
char foo(float x) {
    if (x == x)
        return 1;
    else
        return 0;
}
cannot be optimized to just return 1 because NaN == NaN is false. Okay, fine, I guess.
However, I want to write code such that the optimizer can actually fix things up for me. Are there mathematical identities that hold for all floats? For example, I would be willing to write !(x - x) if it meant the compiler could assume it held all the time (though that also isn't the case).
I see some reference to such identities on the web, for example here, but I haven't found any organized information, including in a light scan of the IEEE 754 standard.
It'd also be fine if I could get the optimizer to assume isnormal(x) without generating additional code (in gcc or clang).
Clearly I'm not actually going to write (x == x) in my source code, but I have a function that's designed for inlining. The function may be declared as foo(float x, float y), but often x is 0, or y is 0, or x and y are both z, etc. The floats represent on-screen geometric coordinates. These are all cases where, if I were coding by hand without the function, I'd never distinguish between 0 and (x - x); I'd just hand-optimize the stupid stuff away. So I really don't care about the IEEE rules in what the compiler does after inlining my function, and I'd just as soon have the compiler ignore them. Rounding differences are also not very important, since we're basically doing on-screen drawing.
I don't think -ffast-math is an option for me, because the function appears in a header file, and it is not appropriate that the .c files that use the function compile with -ffast-math.
Another reference that might be of some use to you is a really nice article on floating-point optimization in Game Programming Gems, volume 2, by Yossarian King. You can read the article here. It discusses the IEEE format in quite some detail, takes implementations and architecture into account, and provides many optimization tricks.
I think that you are always going to struggle to make computer floating-point arithmetic behave like mathematical real-number arithmetic, and I suggest that you don't try. I suggest that you are making a type error in trying to compare two floating-point numbers for equality. Since floating-point numbers are, in the overwhelming majority, approximations, you should accept this and use approximate equality as your test.
Computer integers exist for equality testing of numerical values.
Well, that's what I think; you go ahead and fight the machine (well, all the machines, actually) if you wish.
Now, to answer some parts of your question:
-- for every mathematical identity you are familiar with from real-number arithmetic, there are counterexamples in the domain of floating-point numbers, whether IEEE or otherwise;
-- 'clever' programming almost always makes it more difficult for a compiler to optimise code than straightforward programming;
-- it seems that you are doing some graphics programming: in the end, the coordinates of points in your conceptual space are going to be mapped to pixels on a screen; pixels always have integer coordinates; your translation from conceptual space to screen space defines your approximate-equality function (sketched below).
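In other words, something like this hypothetical helper (same_pixel and PIXELS_PER_UNIT are names invented for the sketch):

#include <math.h>

#define PIXELS_PER_UNIT 1.0f   /* your conceptual-to-screen scale factor */

/* Two coordinates are "equal" exactly when they land on the same pixel. */
int same_pixel(float a, float b)
{
    return lroundf(a * PIXELS_PER_UNIT) == lroundf(b * PIXELS_PER_UNIT);
}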
Regards
Mark
If you can assume that floating-point numbers used in this module will not be Inf/NaN, you can compile it with -ffinite-math-only (in GCC). This may "improve" the codegen for examples like the one you posted.
You could compare for bitwise equality. Although you might get bitten by some values that are equivalent but bitwise different, it will catch all the cases of true equality that you mentioned. I am not sure the compiler will recognize what you are doing and remove it when inlining (which I believe is what you are after), but that can easily be checked.
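A sketch of that bitwise comparison (using memcmp to avoid strict-aliasing trouble; note that it treats +0.0f and -0.0f as different, and two NaNs with identical bit patterns as equal, unlike the == operator):

#include <string.h>

int float_bits_equal(float a, float b)
{
    return memcmp(&a, &b, sizeof a) == 0;
}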
What happened when you tried it the obvious way and profiled it, or examined the generated asm?
If the function is inlined with values known at the call site, the optimizer has this information available. For example: foo(0, y).
You may be surprised at the work you don't have to do, but at the very least profiling or looking at what the compiler actually does with the code will give you more information and help you figure out where to proceed next.
That said, if you know certain things that the optimizer can't figure out itself, you can write multiple versions of the function, and specify the one you want to call. This is something of a hassle, but at least with inline functions they will all be specified together in one header. It's also quite a bit easier than the next step, which is using inline asm to do exactly what you want.
1) I've got many constants in my C algorithm.
2) my code works both in floating-point and fixed-point.
Right now, these constants are initialized by a function, float2fixed: in floating-point mode it does nothing, while in fixed-point mode it computes their fixed-point representation. For instance, 0.5f stays 0.5f when working in floating point, whereas it becomes 32768 (computed via the pow() routine) when working in fixed point with a Qx.16 representation.
That's easy to maintain, but actually computing these constants in fixed-point mode takes a lot of time (pow() is a floating-point function). In C++, I'd use some metaprogramming so the compiler computes these values at compile time, with no hit at run time. But in C, that's not possible. Or is it? Does anybody know of such a trick? Is any compiler clever enough to do that?
Looking forward to any answers.
A
Rather than using (unsigned)(x * pow(2, 16)) to do your fixed-point conversion, write it as (unsigned)(0.5f * (1 << 16)).
This should be acceptable as a compile-time constant expression, since it involves only built-in operators.
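Generalizing, you could wrap the conversion in a macro; since the body is an arithmetic constant expression, it works even in static initializers and is folded at compile time (Q16 is a name invented for this sketch, and it rounds rather than truncates):

#include <stdint.h>

/* Convert a floating-point literal to Q16.16 at compile time. */
#define Q16(x) ((int32_t)((x) * (1 << 16) + ((x) >= 0 ? 0.5 : -0.5)))

static const int32_t HALF_Q16 = Q16(0.5);   /* == 32768, no run-time cost */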
When using fixed point, you can write a program that takes your floating-point values and converts them into correct, constant initializers for the fixed-point type, effectively adding a compilation step that generates the fixed-point values.
One advantage of this is that you can then define and declare your constants with const, so that they won't change at run time - whereas with the initialization functions, of course, the values have to be modifiable because they are calculated once at startup.
I mean, write a simple program that can scan for formulaic lines such as:
const double somename = 3.14159;
it would read that and generate:
const fixedpoint_t somename = { ...whatever is needed... };
You design the operation to make it easy to manage for both notations - so maybe your converter always reads the file and sometimes rewrites it.
datafile.c: datafile.constants converter
	converter datafile.constants > datafile.c
In plain C, there's not much you can do. You need to do the conversion at some point, and the compiler doesn't give you any way to call interesting user-provided functions at compile time. Theoretically, you could try to coax the preprocessor into doing it for you, but that's the quick road to total insanity (i.e. you'd have to implement pow() in macros, which is pretty hideous).
Some options I can think of:
Maintain a persistent cache on disk. At least then it'd only be slow once, though you still have to load it, make sure it's not corrupt, etc.
As mentioned in another comment, use template metaprogramming anyway and compile with a C++ compiler. Most C works just fine (arguably better) with a C++ compiler.
Hmm, I guess that's about all I can think of. Good luck.
Recent versions of GCC (around 4.3) added the ability to use GMP and MPFR to do some compile-time optimizations by evaluating more complex functions whose arguments are constant. That approach leaves your code simple and portable, and trusts the compiler to do the heavy lifting.
Of course, there are limits to what it can do, and it's hard to know whether it's optimizing a given instance without looking at the assembly. But it might be worth checking out. Here's a link to the description in the changelog.