How to avoid FPU when given float numbers? - c

Well, this is not at all an optimization question.
I am writing a (for now) simple Linux kernel module in which I need to find the average of some positions. These positions are stored as floating point (i.e. float) variables. (I am the author of the whole thing, so I can change that, but I'd rather keep the precission of float and not get involved in that if I can avoid it).
Now, these position values are stored (or at least used to) in the kernel simply for storage. One user application writes these data (through shared memory (I am using RTAI, so yes I have shared memory between kernel and user spaces)) and others read from it. I assume read and write from float variables would not use the FPU so this is safe.
By safe, I mean avoiding FPU in the kernel, not to mention some systems may not even have an FPU. I am not going to use kernel_fpu_begin/end, as that likely breaks the real-time-ness of my tasks.
Now in my kernel module, I really don't need much precision (since the positions are averaged anyway), but I would need it up to say 0.001. My question is, how can I portably turn a floating point number to an integer (1000 times the original number) without using the FPU?
I thought about manually extracting the number from the float's bit-pattern, but I'm not sure if it's a good idea as I am not sure how endian-ness affects it, or even if floating points in all architectures are standard.

If you want to tell gcc to use a software floating point library there's apparently a switch for that, albeit perhaps not turnkey in the standard environment:
Using software floating point on x86 linux
In fact, this article suggests that linux kernel and its modules are already compiled with -msoft-float:
http://www.linuxsmiths.com/blog/?p=253
That said, #PaulR's suggestion seems most sensible. And if you offer an API which does whatever conversions you like then I don't see why it's any uglier than anything else.

The SoftFloat software package has the function float32_to_int32 that does exactly what you want (it implements IEEE 754 in software).
In the end it will be useful to have some sort of floating point support in a kernel anyway (be it hardware or software), so including this in your project would most likely be a wise decision. It's not too big either.

Really, I think you should just change your module's API to use data that's already in integer format, if possible. Having floating point types in a kernel-user interface is just a bad idea when you're not allowed to use floating point in kernelspace.
With that said, if you're using single-precision float, it's essentially ALWAYS going to be IEEE 754 single precision, and the endianness should match the integer endianness. As far as I know this is true for all archs Linux supports. With that in mind, just treat them as unsigned 32-bit integers and extract the bits to scale them. I would scale by 1024 rather than 1000 if possible; doing that is really easy. Just start with the mantissa bits (bits 0-22), "or" on bit 23, then right shift if the exponent (after subtracting the bias of 127) is less than 23 and left shift if it's greater than 23. You'll need to handle the cases where the right shift amount is greater than 32 (which C wouldn't allow; you have to just special-case the zero result) or where the left shift is sufficiently large to overflow (in which case you'll probably want to clamp the output).
If you happen to know your values won't exceed a particular range, of course, you might be able to eliminate some of these checks. In fact, if your values never exceed 1 and you can pick the scaling, you could pick it to be 2^23 and then you could just use ((float_bits & 0x7fffff)|0x800000) directly as the value when the exponent is zero, and otherwise right-shift.

You can use rational numbers instead of floats. The operations (multiplication, addition) can be implemented without loss in accuracy too.
If you really only need 1/1000 precision, you can just store x*1000 as a long integer.

Related

Standard guarantees for using floating point arithmetic to represent integer operations

I am working on some code to be run on a very heterogeneous cluster. The program performs interval arithmetic using 3, 4, or 5 32 bit words (unsigned ints) to represent high precision boundaries for the intervals. It seems to me that representing some words in floating point in some situations may produce a speedup. So, my question is two parts:
1) Are there any guarantees in the C11 standard as to what range of integers will be represented exactly, and what range of input pairs would have their products represented exactly? One multiplication error could entirely change the results.
2) Is this even a reasonable approach? It seems that the separation of floating point and integer processing within the processor would allow data to be running through both pipelines simultaneously, improving throughput. I don't know much about hardware though, so I'm not sure that the pipelines for integers and floating points actually are all that separate, or, if they are, if they can be used simultaneously.
I understand that the effectiveness of this sort of thing is platform dependent, but right now I am concerned about the reliability of the approach. If it is reliable, I can benchmark it and see, but I am having trouble proving reliability. Secondly, perhaps this sort of approach shows little promise, and if so I would like to know so I can focus elsewhere.
Thanks!
I don't know about the Standard, but it seems that you can assume all your processors are using the normal IEEE floating point format. In this case, it's pretty easy to determine whether your calculations are correct. The first integer not representable by the 32-bit float format is 16777217 (224+1), so if all your intermediate results are less than that (in absolute value), float will be fine.
The reverse is also true: if any intermediate result is greater than 224 (in absolute value) and odd, float representation will alter it, which is unacceptable for you.
If you are worried specifically about multiplications, look at how the multiplicands are limited. If one is limited by 211, and the other by 213, you will be fine (just barely). If, for example, both are limited by 216, there almost certainly is a problem. To prove it, find a test case that causes their product to exceed 224 and be odd.
All that you need to know to which limits you may go and still have integer precision should be available to you through the macros defined in <float.h>. There you have the exact description of the floating point types, FLT_RADIX for the radix, FLT_MANT_DIG for the number of the digits, etc.
As you say, whether or not such an approach is efficient will depend on the platform. You should be aware that this is much dependent of the particular processor you'd have, not only the processor family. From one Intel or AMD processor variant to another there could already be sensible differences. So you'd basically benchmark all possibilities and have code that decides on program startup which variant to use.

Looking for Ansi C89 arbitrary precision math library

I wrote an Ansi C compiler for a friend's custom 16-bit stack-based CPU several years ago but I never got around to implementing all the data types. Now I would like to finish the job so I'm wondering if there are any math libraries out there that I can use to fill the gaps. I can handle 16-bit integer data types since they are native to the CPU and therefore I have all the math routines (ie. +, -, *, /, %) done for them. However, since his CPU does not handle floating point then I have to implement floats/doubles myself. I also have to implement the 8-bit and 32-bit data types (bother integer and floats/doubles). I'm pretty sure this has been done and redone many times and since I'm not particularly looking forward to recreating the wheel I would appreciate it if someone would point me at a library that can help me out.
Now I was looking at GMP but it seems to be overkill (library must be absolutely huge, not sure my custom compiler would be able to handle it) and it takes numbers in the form of strings which would be wasteful for obvious reasons. For example :
mpz_set_str(x, "7612058254738945", 10);
mpz_set_str(y, "9263591128439081", 10);
mpz_mul(result, x, y);
This seems simple enough, I like the api... but I would rather pass in an array rather than a string. For example, if I wanted to multiply two 32-bit longs together I would like to be able to pass it two arrays of size two where each array contains two 16-bit values that actually represent a 32-bit long and have the library place the output into an output array. If I needed floating point then I should be able to specify the precision as well.
This may seem like asking for too much but I'm asking in the hopes that someone has seen something like this.
Many thanks in advance!
Let's divide the answer.
8-bit arithmetic
This one is very easy. In fact, C already talks about this under the term "integer promotion". This means that if you have 8-bit data and you want to do an operation on them, you simply pad them with zero (or one if signed and negative) to make them 16-bit. Then you proceed with the normal 16-bit operation.
32-bit arithmetic
Note: so long as the standard is concerned, you don't really need to have 32-bit integers.
This could be a bit tricky, but it is still not worth using a library for. For each operation, you would need to take a look at how you learned to do them in elementary school in base 10, and then do the same in base 216 for 2 digit numbers (each digit being one 16-bit integer). Once you understand the analogy with simple base 10 math (and hence the algorithms), you would need to implement them in assembly of your CPU.
This basically means loading the most significant 16 bit on one register, and the least significant in another register. Then follow the algorithm for each operation and perform it. You would most likely need to get help from overflow and other flags.
Floating point arithmetic
Note: so long as the standard is concerned, you don't really need to conform to IEEE 754.
There are various libraries already written for software emulated floating points. You may find this gcc wiki page interesting:
GNU libc has a third implementation, soft-fp. (Variants of this are also used for Linux kernel math emulation on some targets.) soft-fp is used in glibc on PowerPC --without-fp to provide the same soft-float functions as in libgcc. It is also used on Alpha, SPARC and PowerPC to provide some ABI-specified floating-point functions (which in turn may get used by GCC); on PowerPC these are IEEE quad functions, not IBM long double ones.
Performance measurements with EEMBC indicate that soft-fp (as speeded up somewhat using ideas from ieeelib) is about 10-15% faster than fp-bit and ieeelib about 1% faster than soft-fp, testing on IBM PowerPC 405 and 440. These are geometric mean measurements across EEMBC; some tests are several times faster with soft-fp than with fp-bit if they make heavy use of floating point, while others don't make significant use of floating point. Depending on the particular test, either soft-fp or ieeelib may be faster; for example, soft-fp is somewhat faster on Whetstone.
One answer could be to take a look at the source code for glibc and see if you could salvage what you need.

How to do floating point calculations with integers

I have a coprocessor attached to the main processor. Some floating point calculations needs to be done in the coprocessor, but it does not support hardware floating point instructions, and emulation is too slow.
Now one way is to have the main processor to scale the floating point values so that they can be represented as integers, send them to the co processor, who performs some calculations, and scale back those values on return. However, that wouldn't work most of the time, as the numbers would eventually become too big or small to be out of range of those integers. So my question is, what is the fastest way of doing this properly.
You are saying emulation is too slow. I guess you mean emulation of floating point. The only remaining alternative if scaled integers are not sufficient, is fixed point math but it's not exactly fast either, even though it's much faster than emulated float.
Also, you are never going to escape the fact that with both scaled integers, and fixed point math, you are going to get less dynamic range than with floating point.
However, if your range is known in advance, the fixed point math implementation can be tuned for the range you need.
Here is an article on fixed point. The gist of the trick is deciding how to split the variable, how many bits for the low and high part of the number.
A full implementation of fixed point for C can be found here. (BSD license.) There are others.
In addition to #Amigable Clark Kant's suggestion, Anthony Williams' fixed point math library provides a C++ fixed class that can be use almost interchangeably with float or double and on ARM gives a 5x performance improvement over software floating point. It includes a complete fixed point version of the standard math library including trig and log functions etc. using the CORDIC algorithm.

efficient disk storage of decimal numbers in C (C89)

I am writing functions that serialize/deserialize a large data structure for efficient reloading later on. There is a particular set of decimal numbers for which precision is not a huge deal, and I would like to store them in 4 bytes of binary data.
For most, reading the bytes into a buffer and using memcpy to place them into a float is sufficient, and is the most common solution I've found. However, this is not portable, as floats on the systems this software is meant for are not guaranteed to be 4 bytes in size.
What I would like is something very portable (which is one of the reasons I'm limited to C89). I'm not wedded to 4 byte storage, but it is an attractive option to me. I am pretty wholly against storing the numbers as strings. I'm familiar with endianness issues, and such things are already taken into account.
What I am looking for, therefore, is a system-independent way to store and retrieve floating point numbers in a small amount of binary data (preferably around 4 bytes). I, in my foolishness, imagined this would be the easiest part of this task, since it seems like such a common problem, but popular search engines and various reference books have provided no material assistance.
You could store them in 32 bit IEEE float format (or a very close approximation to it, for instance you might what to restrict denorms and NaNs). Then have each platform adjust as necessary to coerce its own float type to that format and back.
Of course there will be some loss of accuracy, but that's inevitable anyway if you're transferring float values of difference precisions from one system to another.
It should be possible to write portable code to find the closest IEEE value to a native float value, and vice-versa, if that's required. You wouldn't really want to use it, though, because it would probably be far less efficient than code that takes advantage of knowing the float format. In the common case where the platform uses an IEEE representation it's a no-op or a simple narrowing/widening conversion. Even in the worst case you're likely to encounter, as long as it's a binary fraction you basically just have to extract the sign, exponent and significand bits and do the right thing with them (discard bits from the significand if it's too big, adjust the bias and possibly the width of the exponent, do the right thing with underflow and overflow).
If you want to avoid losing accuracy in the case where the file is saved and then reloaded on the same system (but that system doesn't use 32bit IEEE), you could look at storing some data indicating the format in the file (size of each value, number of bits of significand and exponent), then store each value at native precision, so that it only gets rounded if it's ever loaded onto a less-precise system. I don't know whether ASN.1 has a standard to encode floating-point values along these lines, but it's the kind of complicated trickery I'd expect from it.
Check this out:http://steve.hollasch.net/cgindex/coding/portfloat.html
They give a routine which is portable and doesnt add too much overhead.

Fortran/C Interlanguage problems: results differ in the 14th digit

I have to use C and Fortran together to do some simulations. In their course I use the same memory in both programming language parts, by defining a pointer in C to access memory allocated by Fortran.
The datatype of the problematic variable is
real(kind=8)
for Fortran, and
double
for C. The results of the same calculations now differ in the respective programming languages, and I need to directly compare them and get a zero. All calculations are done only with the above accuracies. The difference is always in the 13-14th digit.
What would be a good way to resolve this? Any compiler-flags? Just cut-off after some digits?
Many thanks!
Floating point is not perfectly accurate. Ever. Even cos(x) == cos(y) can be false if x == y.
So when doing your comparisons, take this into account, and allow the values to differ by some small epsilon value.
This is a problem with the inaccuracy with floating point numbers - they will be inaccurate and a certain place. You usually compare them either by rounding them to a digit that you know will be in the accurate area, or by providing an epsilon of appropiate value (small enough to not impact further calculations, and big enough to take care of the inaccuracy while comparing).
One thing you might check is to be sure that the FPU control word is the same in both cases. If it is set to 53-bit precision in one case and 64-bit in the other, it would likely produce different results. You can use the instructions fstcw and fldcw to read and load the control word value. Nonetheless, as others have mentioned, you should not depend on the accuracy being identical even if you can make it work in one situation.
Perfect portability is very difficult to achieve in floating point operations. Changing the order of the machine instructions might change the rounding. One compiler might keep values in registers, while another copy it to memory, which can change the precision. Currently the Fortran and C languages allow a certain amount of latitude. The IEEE module of Fortran 2008, when implemented, will allow requiring more specific and therefore more portable floating point computations.
Since you are compiling for an x86 architecture, it's likely that one of the compilers is maintaining intermediate values in floating point registers, which are 80 bits as opposed to the 64 bits of a C double.
For GCC, you can supply the -ffloat-store option to inhibit this optimisation. You may also need to change the code to explicitly store some intermediate results in double variables. Some experimentation is likely in order.

Resources