What is the best way to fmod a decimal number? - c

My problem is getting the most accurate result from a mod calculation. I'm using the remainder for another rounding calculation, so I do need an accurate result.
double a = 0.12345678...; /* may have many more digits */
double b = fmod(a, 0.01);
The result b may be inaccurate because of the binary representation of these values.
Do I have to consider using float to increase the accuracy?
Or should I just shift the digits from after the decimal point into the integer part, e.g.
double a = 12345678.0;
thanks

First, any serious implementation of fmod will return the floating-point value nearest to the exact remainder, as if the division were performed with infinite precision.
(NOTE: rephrased thanks to @EricPostpischil)
Though, by then it's already too late: the binary floating-point representation of 0.01 does not represent 1/100 exactly, as you already seem to know.
Let's examine how the error cumulates.
You want to know the remainder of a division, say a % b = c.
You have inexact representations a1 and b1, and you know an error bound for these representations: a1=a+ea1, abs(ea1) < ea, b1=b+eb1, abs(eb1) < eb.
What can you say about a1 % b1 = c1 (the exact operation), c1=c+ec1 that is about error bound abs(ec1) < ec?
a = q * b + c.
a1 = q1 * b1 + c1.
a+ea1 = (q+eq1)*(b+eb1) + (c+ec1).
ea1 = eq1*(b+eb1) + q*eb1 + ec1.
ec1 = ea1 - q*eb1 - eq1*(b+eb1).
ec >= max( ea , abs(q)*eb , eq*abs(b) , eq*eb).
ec <= ea + abs(q)*eb + eq*abs(b) + eq*eb.
You can control ea and abs(q)*eb by increasing precision of representation (single, double, extended, quadruple, arbitrary precision...).
But the important term in these bounds is eq*abs(b), because if the quotient can be off by one, then the error bound is ec > b!
And of course the quotient can be off by one; such cases are extremely easy to construct.
Take c=0 with a1 a representation of a rounded down (ea1<0), or b1 a representation of b rounded up (eb1>0), and you're done: you get eq1 = -1 even for a small quotient and high precision.
Don't think that carefully controlling the rounding modes so as to obtain ea1 > 0 (rounded up) and eb1 <= 0 (rounded down) would protect you in all cases, since we can construct the inverse case where
b - smallValue < c < b
Don't try remainder, a variant of fmod that rounds the quotient rather than truncating it; that just moves the problem to near-perfect ties (when the exact quotient a/b is close to a multiple of 1/2).
With a careful analysis of error bounds, you could produce an estimate of ec and identify the bad cases of a potentially incorrectly rounded quotient q (when a1/b1 is near a whole integer), or abs(q)*eb reaching 1, or ea >= b.
In the bad cases, you could arrange to raise an exception and restart, producing a1 and b1 with increased precision, but in the edge case c=0 there is no guarantee of convergence, even with arbitrary precision.
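To see the off-by-one quotient in practice, and the scaled-integer idea from the question, here is a minimal sketch (my own illustration, assuming IEEE-754 double): the double nearest 0.1 is slightly larger than 1/10, so the truncated quotient of 1.0/0.1 is 9 rather than 10 and fmod(1.0, 0.1) comes out just under 0.1 instead of 0; working on exactly representable scaled integers avoids that.
#include <stdio.h>
#include <math.h>

int main(void)
{
    /* quotient off by one: remainder is almost 0.1 instead of 0 */
    printf("fmod(1.0, 0.1) = %.17g\n", fmod(1.0, 0.1));

    /* scaled-integer idea from the question: 0.12345678 scaled by 1e8
       becomes 12345678.0 and the 0.01 step becomes 1000000.0; both are
       exact integers in double, so fmod is exact (345678 here), and the
       result can be scaled back down afterwards */
    double a_scaled    = 12345678.0;
    double step_scaled = 1000000.0;
    double r_scaled    = fmod(a_scaled, step_scaled);
    printf("scaled remainder = %.0f -> %.8f\n", r_scaled, r_scaled / 1e8);
    return 0;
}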

If I understand your question correctly, you want the result of fmod in double. As Pascal Cuoq described in the comments, the fmod prototype is double fmod(double x, double y); you can do it like this:
#include <stdio.h>
#include <math.h>

int main()
{
    double a = 12.1649232848373633242;
    double b = 1.234;
    double c;

    setbuf(stdout, NULL);
    c = fmod(a, b);
    printf("%.13f", c); /* .13 in the format specifier is the number of decimal places to print */
    return 0;
}

Related

Underflow error in floating point arithmetic in C

I am new to C, and my task is to create a function
f(x) = sqrt[(x^2)+1]-1
that can handle very large numbers and very small numbers. I am submitting my script on an online interface that checks my answers.
For very large numbers I simplify the expression to:
f(x) = x-1
By just using the highest power. This was the correct answer.
The same logic does not work for smaller numbers. For small numbers (on the order of 1e-7), they are very quickly truncated to zero, even before they are squared. I suspect that this has to do with floating point precision in C. In my textbook, it says that the float type has smallest possible value of 1.17549e-38, with 6 digit precision. So although 1e-7 is much larger than 1.17e-38, it has a higher precision, and is therefore rounded to zero. This is my guess, correct me if I'm wrong.
As a solution, I am thinking that I should convert x to a long double when x < 1e-6. However when I do this, I still get the same error. Any ideas? Let me know if I can clarify. Code below:
#include <math.h>
#include <stdio.h>

double feval(double x) {
    /* Insert your code here */
    if (x > 1e299)
    {
        return x - 1;
    }
    if (x < 1e-6)
    {
        long double g;
        g = x;
        printf("x = %Lf\n", g);
        long double a;
        a = pow(x, 2);
        printf("x squared = %Lf\n", a);
        return sqrt(g*g + 1.) - 1.;
    }
    else
    {
        printf("x = %f\n", x);
        printf("Used third \n");
        return sqrt(pow(x, 2) + 1.) - 1;
    }
}

int main(void)
{
    double x;
    printf("Input: ");
    scanf("%lf", &x);
    double b;
    b = feval(x);
    printf("%f\n", b);
    return 0;
}
For small inputs, you're getting truncation error when you do 1+x^2. If x=1e-7f, x*x will happily fit into a 32 bit floating point number (with a little bit of error due to the fact that 1e-7 does not have an exact floating point representation), but x*x will be so much smaller than 1 that floating point precision will not be sufficient to represent 1+x*x.
It would be more appropriate to do a Taylor expansion of sqrt(1+x^2), which to lowest order would be
sqrt(1+x^2) = 1 + 0.5*x^2 + O(x^4)
Then, you could write your result as
sqrt(1+x^2)-1 = 0.5*x^2 + O(x^4),
avoiding the scenario where you add a very small number to 1.
As a side note, you should not use pow for integer powers. For x^2, you should just do x*x. Arbitrary integer powers are a little trickier to do efficiently; the GNU scientific library for example has a function for efficiently computing arbitrary integer powers.
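A minimal sketch of that expansion in code (my own illustration, not from the original answer; the function name and the cutoff 1e-8 are assumptions, chosen so that the x^4 term is far beyond double precision):
#include <math.h>

/* For very small |x|, sqrt(x*x+1)-1 ~= 0.5*x*x with error O(x^4). */
double feval_small(double x)
{
    if (fabs(x) < 1e-8)
        return 0.5 * x * x;          /* Taylor approximation near 0 */
    return sqrt(x * x + 1.0) - 1.0;  /* naive formula otherwise */
}
Note the naive branch still loses accuracy to cancellation for moderately small x (say 1e-5), so the algebraic rewrites shown in the answers below are more robust across the whole range.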
There are two issues here when implementing this in the naive way: overflow or underflow in intermediate computation when computing x * x, and subtractive cancellation during the final subtraction of 1. The second issue is an accuracy issue.
ISO C has a standard math function hypot (x, y) that performs the computation sqrt (x * x + y * y) accurately while avoiding underflow and overflow in intermediate computation. A common approach to fix issues with subtractive cancellation is to transform the computation algebraically such that it is transformed into multiplications and / or divisions.
Combining these two fixes leads to the following implementation for float argument. It has an error of less than 3 ulps across all possible inputs according to my testing.
/* Compute sqrt(x*x+1)-1 accurately and without spurious overflow or underflow */
float func (float x)
{
    return (x / (1.0f + hypotf (x, 1.0f))) * x;
}
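A quick usage sketch (my own, with illustrative inputs): for a tiny argument the result is close to 0.5*x*x, and for a huge one close to x, with no spurious underflow or overflow from an intermediate x*x.
#include <stdio.h>
#include <math.h>

float func(float x) /* as above */
{
    return (x / (1.0f + hypotf(x, 1.0f))) * x;
}

int main(void)
{
    printf("%g\n", func(1e-10f)); /* about 5e-21, not flushed to 0 */
    printf("%g\n", func(1e20f));  /* about 1e20, not inf */
    return 0;
}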
A trick that is often useful in these cases is based on the identity
(a+1)*(a-1) = a*a-1
In this case
sqrt(x*x+1)-1 = (sqrt(x*x+1)-1)*(sqrt(x*x+1)+1) / (sqrt(x*x+1)+1)
             = (x*x+1-1) / (sqrt(x*x+1)+1)
             = x*x / (sqrt(x*x+1)+1)
The last formula can be used as an implementation. For very small x, sqrt(x*x+1)+1 will be close to 2 (for small enough x it will be exactly 2), but we don't lose precision in evaluating it.
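A minimal sketch of that last formula as C (my own illustration): unlike the hypotf version above it does not guard x*x against overflow or underflow at the extremes, but it avoids the cancellation entirely.
#include <math.h>

/* sqrt(x*x+1)-1 rewritten as x*x/(sqrt(x*x+1)+1): no subtraction of
   nearly equal quantities, so small x no longer collapses to 0 */
double f_rewritten(double x)
{
    return (x * x) / (sqrt(x * x + 1.0) + 1.0);
}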
The problem isn't with running into the minimum value, but with the precision.
As you said yourself, float on your machine has about 7 digits of precision. So let's take x = 1e-7, so that x^2 = 1e-14. That's still well within the range of float, no problems there. But now add 1. The exact answer would be 1.00000000000001. But if we only have 7 digits of precision, this gets rounded to 1.0000000, i.e. exactly 1. So you end up computing sqrt(1.0)-1 which is exactly 0.
One approach would be to use the linear approximation of sqrt around x=1 that sqrt(x) ~ 1+0.5*(x-1). That would lead to the approximation f(x) ~ 0.5*x^2.

Float data type uncertainty

I am doing a numerical analysis of some math software I developed. I want to identify the uncertainty of my result. With f() being my method and x an input value, I want to express my result as f(x) +/- y, where y is the uncertainty. My f() method performs multiple operations between float variables. To study the error propagation that occurs in f(), I have to apply the statistical propagation-of-uncertainty formulas, and in order to do so I have to know the uncertainty of a float variable.
I do understand the architecture of a float variable as specified in the IEEE 754 standard and the rounding error converting a decimal value to float inherent to the latter.
From what I understood of the literature, the FLT_EPSILON macro in http://www.cplusplus.com/reference/cfloat/
defines my y value but this quick test proves it wrong:
float f1 = 1.234567f;
float f2 = 1.234567f + 1.192092897e-7f;
float f3 = 1.234567f + 1.192092896e-7f;
printf("Inicial:\t%f\n", f1);
printf("Inicial:\t%f\n", f2);
printf("Inicial:\t%f\n\n", f3);
Output:
Inicial: 1.234567
Inicial: 1.234567
Inicial: 1.234567
When the expected output should be:
Inicial: 1.234567
Inicial: 1.234568 <---
Inicial: 1.234567
What is that I am wrong about?
Should not the float value of x + FLT_EPSILON and x - FLT_EPSILON be the same?
EDIT: My question is being R the float value of x, what is the y value that x + y || x - y equals the same R float value?
Propagation of uncertainty is from the field of statistics and refers to how uncertainties in inputs affect mathematical functions of them. The analysis of errors that occur in computational arithmetic is numerical analysis.
FLT_EPSILON is not a measure of uncertainty or error in floating-point results. It is the distance between 1 and the next value representable in the float type. Hence, it is the size of steps between representable numbers at the magnitude of 1.
When you convert a decimal numeral to floating-point, the rounding error that results may have a magnitude of up to ½ the step size when the common round-to-nearest mode is used. The reason the bound is ½ the step size is that for any number x (within the finite domain of the floating-point format), there is a representable value within ½ the step size (inclusive). This is because, if there is a representable number more than ½ the step size in one direction, there is a representable number less than ½ the step size in the other direction.
The step size varies with the magnitudes of the numbers. With binary floating-point, it doubles at 2, and again at 4, then 8, and so on. Below 1, it halves, and again at ½, ¼, and so on.
When you perform floating-point arithmetic operations, the rounding that occurs in the computation may compound or cancel previous errors. There is no general formula for the final error.
The two numerals used in your sample code, 1.192092897e-7f and 1.192092896e-7f, are so close together that they convert to the same float value, 2^-23. That is why there is no difference in your f2 and f3.
There is a difference between f1 and f2, but you did not print enough digits to display it.
You ask “Should not the float value of x + FLT_EPSILON and x - FLT_EPSILON be the same?”, but your code does not contain x - FLT_EPSILON.
Re: “My question is being R the float value of x, what is the y value that x + y || x - y equals the same R float value?” This is trivially satisfied by y = 0. Did you mean to ask what is the largest value of y that satisfies the condition? That is a bit complicated.
The step size for a number x is called the ULP of x, which we may consider as a function ULP(x). ULP stands for Unit of Least Precision. It is the place value of the least digit in the floating-point representation of x. It is not a constant; it is a function of x.
For most values representable in a floating-point format, the largest y that satisfies your condition is ½ ULP(x) if the least digit in the floating-point representation of x is even and, if the digit is odd, it is just under ½ ULP(x). This complication arises from the rule that the results of arithmetic are rounded to the nearest representable value and, in case of a tie, the value with the even low digit is chosen. Thus, adding ½ ULP(x) to x will yield a tie that will round to x if the low digit is even, but will not round to x if the low digit is odd.
However, for x that are on the boundary where the ULP changes, the largest y that satisfies your condition is ¼ ULP(x). This is because, just below x (in magnitude), the step size changes, and the next number lower than x is half of x’s step size away instead of the usual full step size. So you can only go halfway toward that value before changing the result of the subtraction, so the most y can be is ¼ ULP(x).
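As an illustration of the step size (my own addition, not part of the answer): nextafterf from <math.h> gives the adjacent representable float, so subtracting shows ULP(x) directly, and at 1.234567 it is the same as at 1.0 because both lie in the same binade.
#include <stdio.h>
#include <math.h>
#include <float.h>

int main(void)
{
    float x = 1.234567f;
    /* distance from x to the next representable float above it: ULP(x) */
    float step = nextafterf(x, INFINITY) - x;
    printf("FLT_EPSILON = %g\n", (double)FLT_EPSILON);
    printf("ULP at 1.0  = %g\n", (double)(nextafterf(1.0f, INFINITY) - 1.0f));
    printf("ULP at %g   = %g\n", (double)x, (double)step);
    return 0;
}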
Float is a 32 bit IEEE 754 single precision floating point number: 1 bit for the sign, 8 bits for the exponent, and 23 bits (plus one implicit bit) for the significand, i.e. float has about 7 decimal digits of precision.
Increase the number of digits printf prints to see more, but after about 7 significant digits it's just noise:
#include <stdio.h>

int main(void) {
    float f1 = 1.234567f;
    float f2 = 1.234567f + 1.192092897e-7f;
    float f3 = 1.234567f + 1.192092896e-7f;
    printf("Inicial:\t%.16f\n", f1);
    printf("Inicial:\t%.16f\n", f2);
    printf("Inicial:\t%.16f\n\n", f3);
    return 0;
}
Output:
Inicial: 1.2345670461654663
Inicial: 1.2345671653747559
Inicial: 1.2345671653747559
float f1 = 1.234567f;
float f2 = f1 + 1.192092897e-7f;
float f3 = f1 + 1.192092896e-7f;
printf("Inicial:\t%.20f\n", f1);
printf("Inicial:\t%.20f\n", f2);
printf("Inicial:\t%.20f\n\n", f3);
Output:
Inicial: 1.23456704616546630000
Inicial: 1.23456716537475590000
Inicial: 1.23456716537475590000
No, your expectation is wrong.
In the first printf call, you're printing the variable f1, to which nothing has been added; it is just 1.234567f.

Comparing two numbers without comparison operators

As part of a program that I am writing for an assignment, I need to compare two numbers. Essentially, the program computes the eccentricity of an ellipse given its two axes and it has to compare the value of the calculated eccentricity to the (given) eccentricity of the Moon's orbit around the Earth, and Earth's orbit around the Sun. If the calculated eccentricity is greater than the given eccentricity, then this needs to be represented by a value of 1, otherwise, a value of 0. All of these values are floating-point, specifically, long double.
The constraints of the assignment do not allow me to use comparison operators (like >) or any sort of logic (!x or if-else). However, I am allowed to use the pow and sqrt functions from the math.h library. I am also allowed to use arithmetic operations as well as the modulo operation.
I know that I can take advantage of integer division to truncate the decimal if the denominator is greater than the numerator, i.e.:
int x = eccentricity / MOON_ORBIT_ECCENTRICITY;
... will be 0 if MOON_ORBIT_ECCENTRICITY is greater than eccentricity. However, if this relationship is inverted, then the value of x could be any non-zero integer. In such a case, the desired result is 1.
The first and most intuitive (and naïve) solution was:
int y = (x / x);
This will return 1 if x is non-zero. However, if x is 0, then my program crashes due to division by zero. In fact, I keep running into the problem of dividing by zero. This also happens in the case of:
int y = (x + 1) % x;
Does anyone have an idea of how to solve this? This seems so frustratingly easy.
@lurker's comment above is a good approach to handle eccentricity as restricted by the OP.
So as not to copy that, consider the following not-so-serious alternative:
#include <stdio.h>
#include <string.h>

// Return e1 > e2
int Eccentricity_Compare(long double e1, long double e2) {
    char buf[20];
    // sprintf prints a number beginning with
    //   '+' if e2 >= e1
    //   '-' otherwise
    sprintf(buf, "%+Le", e2 - e1); // reversed subtraction so the == case gives '+'
    const char *pm = "+-";
    char *p = strchr(pm, buf[0]);
    return (int) (p - pm);
}
Wink, wink: OP said nothing about <stdio.h> functions.
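A quick usage sketch (my own addition; it assumes the includes and function above, and the sample values are roughly the Moon's and Earth's orbital eccentricities):
int main(void)
{
    // first argument larger -> 1, otherwise 0
    printf("%d\n", Eccentricity_Compare(0.0549L, 0.0167L)); // prints 1
    printf("%d\n", Eccentricity_Compare(0.0167L, 0.0549L)); // prints 0
    return 0;
}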

accuracy of sqrt of integers

I have a loop like this:
for(uint64_t i=0; i*i<n; i++) {
This requires doing a multiplication every iteration. If I could calculate the sqrt before the loop then I could avoid this.
unsigned cut = sqrt(n);
for(uint64_t i=0; i<cut; i++) {
In my case it's okay if the sqrt function rounds up to the next integer but it's not okay if it rounds down.
My question is: is the sqrt function accurate enough to do this for all cases?
Edit: Let me list some cases. If n is a perfect square so that n = y^2 my question would be - is cut=sqrt(n)>=y for all n? If cut=y-1 then there is a problem. E.g. if n = 120 and cut = 10 it's okay but if n=121 (11^2) and cut is still 10 then it won't work.
My first concern was the fractional part of float only has 23 bits and double 52 so they can't store all the digits of some 32-bit or 64-bit integers. However, I don't think this is a problem. Let's assume we want the sqrt of some number y but we can't store all the digits of y. If we let the fraction of y we can store be x we can write y = x + dx then we want to make sure that whatever dx we choose does not move us to the next integer.
sqrt(x+dx) < sqrt(x) + 1 //solve
dx < 2*sqrt(x) + 1
// e.g for x = 100 dx < 21
// sqrt(100+20) < sqrt(100) + 1
Float can store 23 bits so we let y = 2^23 + 2^9. This is more than sufficient since 2^9 < 2*sqrt(2^23) + 1. It's easy to show this for double as well with 64-bit integers. So although they can't store all the digits as long as the sqrt of what they can store is accurate then the sqrt(fraction) should be sufficient. Now let's look at what happens for integers close to INT_MAX and the sqrt:
unsigned xi = -1-1;
printf("%u %u\n", xi, (unsigned)(float)xi); //4294967294 4294967295
printf("%u %u\n", (unsigned)sqrt(xi), (unsigned)sqrtf(xi)); //65535 65536
Since float can't store all the digits of 2^32-2 and double can, they get different results for the sqrt. But the float version of the sqrt is one integer larger. This is what I want. For 64-bit integers, as long as the sqrt of the double always rounds up it's okay.
First, integer multiplication is really quite cheap. So long as you have more than a few cycles of work per loop iteration and one spare execute slot, it should be entirely hidden by reorder on most non-tiny processors.
If you did have a processor with dramatically slow integer multiply, a truly clever compiler might transform your loop to:
for (uint64_t i = 0, j = 0; j < n; j += 2*i+1, i++)
replacing the multiply with an lea or a shift and two adds.
Those notes aside, let’s look at your question as stated. No, you can’t just use i < sqrt(n). Counter-example: n = 0x20000000000000. Assuming adherence to IEEE-754, you will have cut = 0x5a82799, and cut*cut is 0x1ffffff8eff971.
However, a basic floating-point error analysis shows that the error in computing sqrt(n) (before conversion to integer) is bounded by 3/4 of an ULP. So you can safely use:
uint32_t cut = sqrt(n) + 1;
and you’ll perform at most one extra loop iteration, which is probably acceptable. If you want to be totally precise, instead use:
uint32_t cut = sqrt(n);
cut += (uint64_t)cut*cut < n;
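A small self-check of that adjustment (my own sketch, assuming IEEE-754 double), using the counter-example value from above:
#include <stdio.h>
#include <stdint.h>
#include <math.h>

int main(void)
{
    uint64_t n = 0x20000000000000;  /* 2^53, the counter-example above */
    uint32_t cut = sqrt(n);         /* truncates to 0x5a82799 here */
    printf("cut = 0x%x, cut*cut < n: %d\n", cut, (uint64_t)cut * cut < n);
    cut += (uint64_t)cut * cut < n; /* bump by one when the square falls short */
    printf("adjusted cut = 0x%x, cut*cut >= n: %d\n", cut, (uint64_t)cut * cut >= n);
    return 0;
}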
Edit: z boson clarifies that for his purposes, this only matters when n is an exact square (otherwise, getting a value of cut that is “too small by one” is acceptable). In that case, there is no need for the adjustment and one can safely just use:
uint32_t cut = sqrt(n);
Why is this true? It’s pretty simple to see, actually. Converting n to double introduces a perturbation:
double_n = n*(1 + e)
which satisfies |e| < 2^-53. The mathematical square root of this value can be expanded as follows:
square_root(double_n) = square_root(n)*square_root(1+e)
Now, since n is assumed to be a perfect square with at most 64 bits, square_root(n) is an exact integer with at most 32 bits, and is the mathematically precise value that we hope to compute. To analyze the square_root(1+e) term, use a Taylor series about 1:
square_root(1+e) = 1 + e/2 + O(e^2)
= 1 + d with |d| <~ 2^-54
Thus, the mathematically exact value square_root(double_n) is less than half an ULP away from[1] the desired exact answer, and necessarily rounds to that value.
[1] I’m being fast and loose here in my abuse of relative error estimates, where the relative size of an ULP actually varies across a binade — I’m trying to give a bit of the flavor of the proof without getting too bogged down in details. This can all be made perfectly rigorous, it just gets to be a bit wordy for Stack Overflow.
All of my answer is useless if you have access to IEEE 754 double precision floating point, since Stephen Canon demonstrated both
a simple way to avoid imul in the loop
a simple way to compute the ceiling of the sqrt
Otherwise, if for some reason you have a non IEEE 754 compliant platform, or only single precision, you could get the integer part of square root with a simple Newton-Raphson loop. For example in Squeak Smalltalk we have this method in Integer:
sqrtFloor
    "Return the integer part of the square root of self"
    | guess delta |
    guess := 1 bitShift: (self highBit + 1) // 2.
    [ delta := (guess squared - self) // (guess + guess).
      delta = 0 ] whileFalse: [
        guess := guess - delta ].
    ^guess - 1
where // is the operator for the (floored) quotient of integer division.
The final guard guess*guess <= self ifTrue: [^guess]. can be avoided if the initial guess is fed in excess of the exact solution, as is the case here.
Initializing with approximate float sqrt was not an option because integers are arbitrarily large and might overflow
But here, you could seed the initial guess with floating point sqrt approximation, and my bet is that the exact solution will be found in very few loops. In C that would be:
#include <stdint.h>
#include <math.h>

uint32_t sqrtFloor(uint64_t n)
{
    int64_t diff;
    int64_t delta;
    uint64_t guess = sqrt(n); /* implicit conversions here... */
    /* cast to signed so the division below is a signed division
       (mixing int64_t with the unsigned guess+guess would otherwise
       turn a negative diff into a huge unsigned value) */
    while ((delta = (diff = (int64_t)(guess*guess - n)) / (int64_t)(guess + guess)) != 0)
        guess -= delta;
    return guess - (diff > 0);
}
That's a few integer multiplications and divisions, but outside the main loop.
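A quick usage sketch for the C version (my own check values; 121 is a perfect square, 120 is not):
#include <stdio.h>

/* sqrtFloor as defined above */
int main(void)
{
    printf("%u\n", (unsigned)sqrtFloor(121)); /* 11 */
    printf("%u\n", (unsigned)sqrtFloor(120)); /* 10 */
    printf("%u\n", (unsigned)sqrtFloor(2));   /* 1  */
    return 0;
}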
What you are looking for is a way to calculate a rational upper bound of the square root of a natural number. A continued fraction is what you need; see Wikipedia.
For x > 0, there is
sqrt(x) = 1 + (x-1)/(1 + sqrt(x)).
To make the notation more compact, rewriting the above formula by substituting it into itself gives the continued fraction
sqrt(x) = 1 + (x-1)/(2 + (x-1)/(2 + (x-1)/(2 + ...))).
Truncating the continued fraction by removing the tail term (x-1)/2 at each recursion depth, one gets a sequence of approximations of sqrt(x):
r1 = 1 + (x-1)/2, r2 = 1 + (x-1)/(2 + (x-1)/2), r3 = 1 + (x-1)/(2 + (x-1)/(2 + (x-1)/2)), ...
Upper bounds appear at odd indices, and they get tighter. When the distance between an upper bound and its neighboring lower bound is less than 1, that approximation is what you need. Using that value as the value of cut (here cut must be a floating-point number) solves the problem.
For very large numbers, rational numbers should be used, so no precision is lost in the conversions between integers and floating point numbers.
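A minimal sketch of that iteration (my own illustration, in plain double rather than the rational arithmetic the answer recommends), showing how the truncations alternate between lower and upper bounds of sqrt(x):
#include <stdio.h>

int main(void)
{
    double x = 2.0;  /* illustrative input */
    double r = 1.0;  /* r0 = 1, a lower bound of sqrt(x) for x > 1 */
    for (int k = 1; k <= 8; k++) {
        r = 1.0 + (x - 1.0) / (1.0 + r);  /* next truncation of the continued fraction */
        printf("r%d = %.15f (%s bound)\n", k, r, (k % 2) ? "upper" : "lower");
    }
    return 0;
}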

What operation turns floating point numbers into a "group"?

Might anyone be familiar with tricks and techniques to coerce the set of valid floating point numbers to be a group under a multiplication-based operation?
That is, given any two floating point numbers ("double a,b"), what sequence of operations, including multiply, will turn this into another valid floating point number? (A valid floating point number is anything 1-normalized, excluding NaN, denormals and -0.0).
To put this rough code:
double a = drand();
while ( forever )
{
double b = drand();
a = GROUP_OPERATION(a,b);
//invariant - a is a valid floating point number
}
Just multiply by itself doesn't work, because of NaNs. Ideally this would be a straight-line approach (avoiding "if above X, divide by Y" formulations).
If this can't work for all valid floating point numbers, is there a subset for which such an operation is available?
(The model I'm looking for is akin to integer multiplication in C - no matter what two integers get multiplied together, you always get an integer back).
(The model I'm looking for is akin to integer multiplication in C - no matter what two integers get multiplied together, you always get an integer back).
Integers modulo 2^N do not form a group - what integer multiplied by 2 gives 1? For integers to be a group under multiplication, you have to be modulo a prime number. (eg Z mod 7, 2*4 = 1, so 2 and 4 are each other's inverses)
For floating point values, simple multiplication or addition saturates to +/- Infinity, and there are no values which are the inverses of infinity, so either the set is not closed, or it lacks invertibility.
If on the other hand you want something similar to integer multiplication modulo a power of 2, then multiplication will do - there are elements without an inverse, so it's not a group, but it is closed - you always get a floating point value back. For subsets of floats which are a true group, see lakshmanaraj's answer.
Floating point numbers are backed by bits. That means that you can use the integer arithmetic on the integer representation of your floating point values and you will get a group.
Not sure this is very useful though.
#include <assert.h>

/* You have to find the integer type whose size corresponds to your double */
typedef double float_t;
typedef long long int_t;

float_t group_operation(float_t a, float_t b)
{
    int_t *ia, *ib, c;
    assert(sizeof(float_t) == sizeof(int_t));
    ia = (int_t *)&a;  /* reinterpret the bits (note: this breaks strict-aliasing rules) */
    ib = (int_t *)&b;
    c = *ia * *ib;
    return (float_t)c;
}
Floating point numbers never form a group in the sense you are talking about, because of rounding errors. Consider any of those horrible examples from numerical analysis class, like the fact that 0.1 can't be represented exactly in binary.
But then even computational ints don't form a group in that sense, since they're not closed under multiplication either. (Proof: compute the result of while true do x = x*x. At some point you'll exceed the word size, run out of resources for a BIGNUM, or something.)
Update for @UnderAchievementAward:
-- added here so I can get formatting, unlike comments
Since I start with floating point (instead of "real" numbers), can't I avoid any of the 0.1 representational issues? The "x = x*x" problem is why additional operations are needed to keep the result in the valid range.
Okay, but then you're going to run into a situation where there will exist some x, y with 0 ≤ x, y < max where x*y < 0. Or something equally non-intuitive.
The point is that you can certainly define a collection of operations that will look like a group on a finite representation set, but it's going to do weird things if you try to use it as the normal arithmetic operations.
If the group operation is multiplication, then:
if n is the highest bit, then r1 = 1/power(2, n-1) is the smallest value that you can operate on, and the set
[r1, 2*r1, 4*r1, 8*r1, ..., 1] union [-r1, -2*r1, -4*r1, ..., -1] union [0] will be the group that you are expecting.
For integers, [1, 0, -1] is the group.
If the group operation can be anything else,
then to form a set of n valid group elements, take
A(r) = cos(2*Pi*r/n) for r = 0 to n-1
and the group operation is
COS(COSINV(A1) + COSINV(A2)).
I don't know whether you want this.....
Or if you want an infinite set as a valid group, then the simple answer is:
GROUP OPERATION = AVG(A1, A2) = (A1+A2)/2
or there exists some function F which has FINV as its inverse, and then use FINV((F(A1)+F(A2))/2).
Examples of F are log, inverse, square, etc.
double a = drand();
while ( forever )
{
    double b = drand();
    a = (a+b)/2;
    //invariant - a is a valid floating point number
}
Or if you want an infinite set in digital (floating point) format as a valid group, then:
let L be the lowest float number and H be the highest float number;
then GROUP OPERATION = AVG(A1, A2, L, H) = (A1+A2+L+H)/4.
This operation will always stay within L & H for all positive numbers.
For practical purposes you can take L as four times the lowest value and H as the highest value divided by 4.
double l = (0.000...0001) * 4;  /* lowest positive value, digits elided */
double h = (0xFFF...F) / 4;     /* highest value, digits elided */
double a = abs(drand()) / 4;
while ( forever )
{
    double b = abs(drand()) / 4;
    a = (a+b+l+h)/4;
    //invariant - a is a valid floating point number
}
This yields a subset of all positive float numbers divided by 4.
The integers don't form a group under multiplication -- 0 doesn't have
an inverse.
