I have to use an algorithm which expects a matrix of integers as input. The input I have is real valued, therefore I want to convert the input to integers before passing it to the algorithm.
I thought of scaling the input by a large constant and then rounding it to integers. This looks like a good solution, but how does one decide a good constant to be used, especially since the range of the float input could vary from case to case? Any other ideas are also welcome.
Probably the best general answer to this question is to find out what is the maximum integer value that your algorithm can accept as an element in the matrix without causing overflow in the algorithm itself. Once you have this maximum value, find the maximum floating point value in your input data, then scale your inputs by the ratio of these two maximum values and round to the nearest integer (avoid truncation).
In practice you may not be able to do this, because you cannot determine the maximum integer value that the algorithm can accept without overflowing. Perhaps you don't know the details of the algorithm, or it depends in a complicated way on all of the input values. If this is the case, you'll just have to pick an arbitrary maximum input value that seems to work well enough.
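If you do know (or have picked) a maximum safe integer value, the whole conversion is only a few lines. Here is a minimal C++ sketch of the scale-by-the-ratio-of-maxima approach described above; the container type and the name max_safe_int are illustrative assumptions, not part of the original question:

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scale a real-valued matrix to integers so that the largest magnitude in the
// input maps to max_safe_int, the largest element the downstream algorithm is
// assumed to tolerate.
std::vector<std::vector<int32_t>>
to_integer_matrix(const std::vector<std::vector<double>>& in, int32_t max_safe_int)
{
    // Find the largest absolute value in the input.
    double max_abs = 0.0;
    for (const auto& row : in)
        for (double v : row)
            max_abs = std::max(max_abs, std::fabs(v));

    // Scale so that max_abs maps to max_safe_int, then round (do not truncate).
    const double scale = (max_abs > 0.0) ? max_safe_int / max_abs : 1.0;

    std::vector<std::vector<int32_t>> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i)
        for (double v : in[i])
            out[i].push_back(static_cast<int32_t>(std::lround(v * scale)));
    return out;
}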
First normalize your input to the [0,1) range, then scale it in the usual way:
f(x) = (range_max_exclusive - range_min_inclusive) * x + range_min_inclusive
After that, cast f(x) to an integer (or round it if you prefer). That way you can handle situations where the real values lie in [0,1) or in [0,n) with n > 1.
In general, your favourite library already contains matrix operations, so you can implement this technique easily and with better performance than a hand-rolled implementation.
EDIT: Scaling down and then scaling back up is sure to lose some precision. I still favour it because a normalization operation generally comes with the library. You can also do it without the downscaling step:
f(x) = (range_max_exclusive - range_min_inclusive) / max_element * x + range_min_inclusive
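For concreteness, here is a minimal C++ sketch of the normalize-then-scale mapping; the function name and the choice to normalize by the largest element are illustrative assumptions, and in practice a matrix library's own normalization routine would replace the hand-written loop:

#include <algorithm>
#include <cmath>
#include <vector>

// Normalize by the largest element, map into [range_min_inclusive,
// range_max_exclusive), and round to integers. Note that the largest element
// maps exactly onto range_max_exclusive, so clamp if the upper bound must
// stay strictly exclusive.
std::vector<int> scale_to_integers(const std::vector<double>& in,
                                   int range_min_inclusive,
                                   int range_max_exclusive)
{
    const double max_element = *std::max_element(in.begin(), in.end());

    std::vector<int> out;
    out.reserve(in.size());
    for (double x : in) {
        const double normalized = (max_element != 0.0) ? x / max_element : 0.0;
        const double f = (range_max_exclusive - range_min_inclusive) * normalized
                         + range_min_inclusive;
        out.push_back(static_cast<int>(std::lround(f)));
    }
    return out;
}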
Related
I am asked to normalize a probability distribution P = A(x^2)(e^-x) from 0 to infinity by finding the value of A. I know algorithms for calculating the numerical value of an integral, but how do I deal with one of the limits being infinity?
The only way I have been able to solve this problem with some accuracy (I got full accuracy, in fact) is by doing some math first, in order to obtain the Taylor series that represents the integral.
I have been looking for my sample code, but I can't find it. I'll edit my post if I get a working solution.
The basic idea is to calculate all the derivatives of the function exp(-(x*x)) and use the coefficients to derive the series for its integral (by dividing each coefficient by one more than the exponent of x in its term). That gives you the Taylor series of the integral (I recommend you use the unnormalized version described above to get simple numeric coefficients, then adjust the result by multiplying by the proper constants). You'll get a Taylor series with good convergence, giving you values at full precision. (Direct numerical integration requires a lot of subdivisions, and you cannot divide an unbounded interval into a finite number of finite intervals.)
I'll edit this post if I find the code I wrote (so stay online, and don't change the channel :) ).
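In the meantime, here is a minimal C++ sketch of the term-by-term integration described above, for the integral of exp(-t*t) from 0 to a finite x; for an unbounded upper limit you would truncate at a point where the integrand has decayed to nothing, since the series itself can only be evaluated at finite x. The function name and tolerance are assumptions:

#include <cmath>
#include <cstdio>

// Taylor-series approximation of F(x) = the integral of exp(-t*t) from 0 to x,
// obtained by integrating the series for exp(-t*t) term by term:
//   exp(-t*t) = sum over n of (-1)^n * t^(2n) / n!
//   F(x)      = sum over n of (-1)^n * x^(2n+1) / ((2n+1) * n!)
// For large x the alternating terms grow huge before they shrink, so this
// direct summation is only practical for moderate x.
double integral_exp_minus_t2(double x, double tol = 1e-15) {
    double term = x;    // the n = 0 term
    double sum  = 0.0;
    for (int n = 0; std::fabs(term) > tol && n < 200; ++n) {
        sum += term;
        // ratio of consecutive terms of the integrated series
        term *= -x * x / (n + 1) * (2 * n + 1) / (2 * n + 3);
    }
    return sum;
}

int main() {
    // Sanity check against the closed form (sqrt(pi) / 2) * erf(x).
    const double pi = std::acos(-1.0);
    const double x  = 1.5;
    std::printf("series: %.15f\n", integral_exp_minus_t2(x));
    std::printf("erf:    %.15f\n", std::sqrt(pi) / 2.0 * std::erf(x));
}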
For instance, I have a floating point number 0.02344489282. I want to be able to make sure that every float I have is kept to two decimal places: 0.02. It will be inexact, I'm sure, but all the floats in my code should have anything after two decimal places truncated. I have seen other related posts on Stack Overflow, but they deal with outputting the decimal to two places.
Goal: to optimize memory consumption at the expense of accuracy, where a 5-15% loss of accuracy is acceptable.
Practical example: I am running a Kalman filter. Instead of the exact noise and measurement values, I try to work with approximate values by shortening the bit width of the variables. Then I'll measure the difference in accuracy between the original script and the modified one, and how much energy and memory is saved.
Two possible solutions:
Use integers representing units of 1/100.
Use floating point, but only use integer multiples of 0.25 (i.e. numbers ending in .25, .50, .75, or .00), since these are the only values with at most two decimal places that binary floating point can represent exactly.
Since option 2 is almost certainly not what you actually want, go for option 1 (a minimal sketch follows below).
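A minimal C++ sketch of option 1 (the function names are made up; rounding rather than truncating is a deliberate choice, explained in the comment):

#include <cmath>
#include <cstdint>
#include <cstdio>

// Keep each value as an integer count of hundredths. Rounding (llround) is
// used rather than plain truncation, because e.g. 0.29 * 100 is
// 28.999999999999996 in binary and would truncate to 28.
int64_t to_hundredths(double x) {
    return std::llround(x * 100.0);
}

double as_double(int64_t hundredths) {
    return hundredths / 100.0;   // for display only; keep the math in integers
}

int main() {
    int64_t h = to_hundredths(0.02344489282);                       // 2
    std::printf("%lld -> %.2f\n", (long long)h, as_double(h));      // 2 -> 0.02
}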
I'm running a statistical bootstrap with 10k permutations, which I'm trying to compare against an observed value. The observed value is supposed to be identical to the max of the 10k permutations. The way I am measuring this is by attempting to find its percentile.
All results of the 10k permutations (10,000 random numbers) are stored in an array, which I sort using:
my @sorted = sort {$a <=> $b} @permutednumbers;
When I then compare the observed value $truevalue, I'm getting an inaccurate comparison. These are stored as floating point numbers. The bootstrapping procedure uses the same formula for generating the random number so it should be absolutely identical, but when comparing the same value, it becomes inaccurate. I'm testing this with:
if ($sorted[$#sorted] == $truevalue) {
    print "sorted: $sorted[$#sorted] is eq truevalue:$truevalue\n";
} elsif ($sorted[$#sorted] > $truevalue) {
    print "sorted: $sorted[$#sorted] is gt truevalue:$truevalue\n";
} elsif ($sorted[$#sorted] < $truevalue) {
    print "sorted: $sorted[$#sorted] is lt truevalue:$truevalue, totalpermvalues: $totalpermvalues\n";
}
output:
sorted: 0.937864522389543 is gt truevalue:0.937864522389543
So I get that floating point numbers aren't printed with complete accuracy, but I always assumed that internally the computer stores the correct numbers. Is that not a correct assumption? Of course I can fix this quickly by changing them into integers of some sort, but is this something that I should be doing automatically all the time? Are floating point numbers just dangerous to use? Those exact values should be identical given that they are outputs of identical inputs, which is what is confusing me...
If this matters, the values are individually calculated using the linear_interpolate function in the Math::Interpolate package, but the inputs are identical.
If I understand correctly, you are wondering why == is returning false and > is returning true for what appear to be identical numbers. Obviously, the numbers are not actually identical. You can see this by printing more digits.
printf "sorted: %.80e is gt truevalue:%.80e\n", $sorted[$#sorted], $truevalue;
No, sort will not change values. One has to assume that there is a difference in the way these two values have been produced.
It is most certainly possible to use == with floating point numbers (FPNs), returning true if a pair of 64-bit quantities is identical. But one has to be very careful when asking the question "Are these two FPNs equal?"
A (relatively small but still considerable) quantity of integers and rational numbers can be represented accurately in a FPN. For these (and only for these), questions such as "Is the FPN a equal to 1.5?" (written as $a==1.5) may make sense but only if you are confident about the genesis of the value in $a. - Don't take this lightly: will both of the following statements print "1"?
print 0.12345678901234567 == 1.2345678901234567E-1,"\n";
print 0.12345678901234567 == 12.345678901234567E-2,"\n";
An FPN is not only the representative of the one value x it encodes exactly. It also stands in for an interval of real numbers, including rational, irrational and transcendental (and even integer) numbers "a little greater and a little smaller" than x. You can quantify "a little": it is roughly 1e-16 for x == 1.0 (the spacing of doubles near 1.0 is about 2.2e-16), and it shrinks or grows with x accordingly. So, for instance, 1 + 1e-17 will be 1.0 on your computer. You can input this number, but the FPN will be 1.0 all the same. Asking whether an FPN that is the result of some computation equals 1 + 1e-17 doesn't make sense, since you cannot even tell the computer that value.
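A trivial demonstration of that last point (this example is mine, not from the original post):

#include <cstdio>

int main() {
    // 1e-17 is far below the spacing of doubles around 1.0 (about 2.2e-16),
    // so the addition rounds straight back to 1.0 and the comparison is true.
    std::printf("%d\n", static_cast<int>(1.0 + 1e-17 == 1.0));   // prints 1
}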
The solution isn't difficult. Instead of asking for equality you have to ask "Is the FPN a in an interval [p,q] around x?" Determining p and q should be given a little thought, as a suitable choice of these values primarily depends on x. The usual formula is something like
abs( $a - $expect ) <= $expect*PRECISION
where PRECISION could be, for instance, 1e-12. (The value to use here may depend on the algorithm you use for computing $a, or on your needs, or both.)
Finally: due to the mathematical properties of FP machine instructions, the usual arithmetic laws of associativity and distributivity are not guaranteed. The effect of truncation in addition or subtraction may, for instance, cause heavy distortion in the result. A typical example to illustrate this: compute some Taylor series twice, once adding terms in decreasing order until the terms become smaller than a given limit, and once using the same terms but adding them in increasing order.
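Here is a minimal C++ illustration of that ordering effect. It uses the series for pi^2/6 (the sum of 1/k^2) in single precision rather than a Taylor series, but the mechanism is the same: when the large terms are added first, the small ones are rounded away:

#include <cstdio>

int main() {
    // Sum 1/k^2 for k = 1..N in single precision, in two different orders.
    const int N = 1000000;
    float forward = 0.0f, backward = 0.0f;

    for (int k = 1; k <= N; ++k)        // large terms first
        forward += 1.0f / (static_cast<float>(k) * k);
    for (int k = N; k >= 1; --k)        // small terms first
        backward += 1.0f / (static_cast<float>(k) * k);

    // Mathematically both sums are identical (they approach pi^2 / 6), but the
    // printed results differ because each addition rounds to float precision.
    std::printf("forward : %.7f\nbackward: %.7f\n", forward, backward);
}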
I know that when I would like to check if double == double I should write:
bool AreSame(double a, double b)
{
return fabs(a - b) < EPSILON;
}
But what when I would like to check if a > b or b > a ?
There is no general solution for comparing floating-point numbers that contain errors from previous operations. The code that must be used is application-specific. So, to get a proper answer, you must describe your situation more specifically. For example, if you are sorting numbers in a list or other data structure, you should not use any tolerance for comparison.
Usually, if your program needs to compare two numbers for order but cannot do so because it has only approximations of those numbers, then you should redesign the program rather than try to allow numbers to be ordered incorrectly.
The underlying problem is that performing a correct computation using incorrect data is in general impossible. If you want to compute some function of two exact mathematical values x and y, but the only data you have is some incorrectly computed values x' and y', it is generally impossible to compute the exactly correct result. For example, suppose you want to know what the sum x+y is, but you only know that x' is 3 and y' is 4, and you do not know what the true, exact x and y are. Then you cannot compute x+y.
If you know that x' and y' are approximately x and y, then you can compute an approximation of x+y by adding x' and y'. This works when the function being computed has a reasonable derivative: slightly changing the inputs of a function with a reasonable derivative slightly changes its outputs. This fails when the function you want to compute has a discontinuity or a large derivative. For example, if you want to compute the square root of x (in the real domain) using an approximation x', but x' might be negative due to previous rounding errors, then computing sqrt(x') may produce an exception. Similarly, comparing for inequality or order is a discontinuous function: a slight change in inputs can change the answer completely.
The common bad advice is to compare with a “tolerance”. This method trades false negatives (incorrect rejections of numbers that would satisfy the comparison if the true mathematical values were compared) for false positives (incorrect acceptance of numbers that would not satisfy the comparison).
Whether or not an application can tolerate false acceptance depends on the application. Therefore, there is no general solution.
The level of tolerance to set, and even the nature by which it is calculated, depend on the data, the errors, and the previous calculations. So, even when it is acceptable to compare with a tolerance, the amount of tolerance to use and how to calculate it depends on the application. There is no general solution.
The analogous comparisons are:
a > b - EPSILON
and
b > a - EPSILON
I am assuming that EPSILON is some small positive number.
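Putting the three checks side by side might look like the sketch below. EPSILON and the use of an absolute (rather than relative) tolerance are assumptions you must tune to your data; note that a > b - EPSILON means "greater than, or approximately equal to", while a > b + EPSILON means "greater by more than the tolerance":

#include <cmath>

constexpr double EPSILON = 1e-9;   // assumed value; tune it to your data and algorithm

bool AreSame(double a, double b) {
    return std::fabs(a - b) < EPSILON;
}

// True when a is greater than b or approximately equal to it (the form used above).
bool GreaterOrApproxEqual(double a, double b) {
    return a > b - EPSILON;
}

// True only when a exceeds b by more than the tolerance.
bool DefinitelyGreater(double a, double b) {
    return a > b + EPSILON;
}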
I inherited a project that uses SQL Server 200x, wherein a column whose value is always treated as a percentage in the problem domain is stored as a whole number rather than as its decimal-fraction equivalent. For example, 70% (literally 0.7) is stored as 70, 100% as 100, etc. Aside from the need to remember to * 0.01 on retrieved values and * 100 before persisting values, it doesn't seem to be a problem in and of itself. It does make my head explode though... so is there a good reason for it that I'm missing? Are there compelling reasons to fix it, given that there is a fair amount of code written to work with the pseudo-percentages?
There are a few cases where greater than 100% occurs, but I don't see why the value wouldn't just be stored as 1.05, for example, in those cases.
EDIT: Head feeling better, and slightly smarter. Thanks for all the insights.
There are actually four good reasons I can think of that you might want to store—and calculate with—whole-number percentage values rather than floating-point equivalents:
Depending on the data types chosen, the integer value may take up less space.
Depending on the data type, the floating-point value may lose precision (remember that not all languages have a data type equivalent to SQL Server's decimal type).
If the value will be input from or output to the user very frequently, it may be more convenient to keep it in the more user-friendly format (the decision is between converting when you display and converting when you calculate ... but see the next point).
If the principal values are also integers, then
principal * integerPercentage / 100
which uses all-integer arithmetic, is usually faster than its floating-point equivalent (and likely significantly faster in the case of a floating-point type equivalent to T-SQL's decimal type).
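For illustration, a minimal all-integer C++ version of that calculation (the names are made up; multiplying before dividing keeps the truncation below one unit):

#include <cstdint>
#include <cstdio>

// All-integer percentage calculation: multiply first, then divide by 100.
int64_t apply_percentage(int64_t principal, int64_t integerPercentage) {
    return principal * integerPercentage / 100;
}

int main() {
    // 70% of 1250, computed entirely in integer arithmetic.
    std::printf("%lld\n", static_cast<long long>(apply_percentage(1250, 70)));   // 875
}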
If it's a byte field then it takes up less room in the db than floating point numbers do, but unless you have millions and millions of records, you'll hardly see a difference.
Since floating-point values can't reliably be compared for equality, an integer may have been used to make the SQL simpler.
For example
(0.3==3*.1)
is usually False.
However
abs( 0.3 - 3*.1 )
is a tiny number (5.55e-17). But it's a pain to have to do everything with (column - SomeValue) BETWEEN -0.0001 AND 0.0001 or ABS(column - SomeValue) < 0.0001. You'd rather just write column = SomeValue in your WHERE clause.
Floating point numbers are prone to rounding errors and, therefore, can act "funny" in comparisons. If you always want to deal with the value as a fixed decimal, you could either choose a decimal type, say decimal(5,2), or do the convert-and-store-as-int thing that your db does. I'd probably go the decimal route, even though the int would take up less space.
A good guess is that anything you do with integers (storing, calculating, stuffing into an edit box for a user, etc.) is marginally easier and more efficient than doing the same with floating point numbers. And the rounding issues aren't so obvious when you look at the data.
If these are numbers that end users are likely to see and interact with, percentages are easier to understand than decimals.
This is one of those situations where a notation aid can help; in the program, be consistent in using a prefix (Hungarian) or postfix to specify values that are percentages vs. those that are decimal. If you can extend a naming convention to the database fields themselves, so much the better.
And to add to the data storage issue, if you can use integer arithmetic for whatever processing you are doing, the performance is much better than when doing floating point arithmetic... so storing the percentages as integer values may allow the processing logic to utilize integer arithmetic.
If you're actually using them as a coefficient (or expect users of the database to do this sort of thing in reports), there's a case for storing them as a coefficient - particularly if there's a reason to do calculations involving more than one.
However, if you do this you should be consistent - either all percentages or all coefficients.