Is there a good reason for storing percentages that are less than 1 as numbers greater than 1? - database

I inherited a project that uses SQL Server 200x, wherein a column that stores a value that is always considered as a percentage in the problem domain is stored as its greater than 1 decimal equivalent. For example, 70% (0.7, literally) is stored as 70, 100% as 100, etc. Aside from the need to remember to * 0.01 on retrieved values and * 100 before persisting values, it doesn't seem to be a problem in and of itself. It does make my head explode though... so is there a good reason for it that I'm missing? Are there compelling reasons to fix it, given that there is a fair amount of code written to work with the pseudo-percentages?
There are a few cases where greater than 100% occurs, but I don't see why the value wouldn't just be stored as 1.05, for example, in those cases.
EDIT: Head feeling better, and slightly smarter. Thanks for all the insights.

There are actually four good reasons I can think of that you might want to store—and calculate with—whole-number percentage values rather than floating-point equivalents:
Depending on the data types chosen, the integer value may take up less space.
Depending on the data type, the floating-point value may lose precision (remember that not all languages have a data type equivalent to SQL Server's decimal type).
If the value will be input from or output to the user very frequently, it may be more convenient to keep it in a more user-friendly format (decision between convert when you display and convert when you calculate ... but see the next point).
If the principal values are also integers, then
principal * integerPercentage / 100
which uses all integer arithmetic is usually faster than its floating-point equivalent (likely significantly faster when the alternative is a type equivalent to T-SQL's decimal type).
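A minimal sketch of the integer-only calculation from the last point, written in Python. The function name and the choice to hold amounts as whole cents are assumptions made for illustration, not details of the project in the question.

```python
def apply_percentage(principal_cents: int, percent: int) -> int:
    """Return percent% of principal_cents using integer arithmetic only.

    Multiplying before dividing keeps the intermediate value exact.
    Adding 50 before the floor division rounds halves up (non-negative inputs assumed).
    """
    return (principal_cents * percent + 50) // 100


# 70% of 199.99, with the amount held as 19999 cents
print(apply_percentage(19_999, 70))
# A percentage above 100 needs no special handling
print(apply_percentage(1_000, 105))
```

The stored value 70 is used directly; no conversion to 0.7 and back is needed.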

If it's a byte field then it takes up less room in the db than a floating-point number, but unless you have millions and millions of records, you'll hardly see a difference.

Since floating-point values can't reliably be compared for equality, an integer may have been used to make the SQL simpler.
For example
(0.3==3*.1)
is usually False.
However
abs( 0.3 - 3*.1 )
is a tiny number (5.55e-17). But it's a pain to have to do everything with (column-SomeValue) BETWEEN -0.0001 AND 0.0001 or ABS(column-SomeValue) < 0.0001. You'd rather do column = SomeValue in your WHERE clause.
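The comparison problem above can be reproduced in a few lines of Python, used here only as a stand-in for what the database does with binary floats. The 0.0001 tolerance mirrors the one in the answer and is otherwise arbitrary.

```python
import math

total = 3 * 0.1                  # binary floating point
print(total == 0.3)              # False
print(abs(0.3 - total))          # a tiny number, about 5.55e-17

# Tolerance-based comparison, the analogue of ABS(column - SomeValue) < 0.0001
close_enough = math.isclose(total, 0.3, abs_tol=1e-4)
print(close_enough)              # True

# With whole-number percentages, plain equality works
exact_match = (3 * 10 == 30)
print(exact_match)               # True
```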

Floating point numbers are prone to rounding errors and, therefore, can act "funny" in comparisons. If you always want to deal with it as fixed decimal, you could either choose a decimal type, say decimal(5,2), or do the convert and store as int thing that your db does. I'd probably go the decimal route, even though the int would take up less space.

A good guess is that anything you do with integers (storing, calculating, stuffing into an edit box for a user, etc.) is marginally easier and more efficient than doing the same with floating point numbers. And the rounding issues aren't so obvious when you look at the data.

If these are numbers that end users are likely to see and interact with, percentages are easier to understand than decimals.
This is one of those situations where a notation aid can help; in the program, be consistent in using a prefix (Hungarian) or postfix to specify values that are percentages vs. those that are decimal. If you can extend a naming convention to the database fields themselves, so much the better.

And to add to the data storage issue, if you can use integer arithmetic for whatever processing you are doing, the performance is much better than when doing floating point arithmetic. So storing the percentages as integer values may allow the processing logic to utilize integer arithmetic.

If you're actually using them as a coefficient (or expect users of the database to do this sort of thing in reports), there's a case for storing them as a coefficient - particularly if there's a reason to do calculations involving more than one.
However, if you do this you should be consistent - either all percentages or all coefficients.

Related

How to only allow floats to two decimal places

For instance, I have a floating point number 0.02344489282. I want to be able to make sure that every float that I have is limited to two decimal places: 0.02. It will be inexact, I'm sure, but all the floats in my code should be truncated after two decimal places. I have seen other related posts on Stack Overflow but they deal with outputting the decimal to two places.
Goal: to optimize memory consumption at the expense of accuracy. But the accuracy can be downgraded to 5-15%.
Practical example: I am executing a Kalman filter. Instead of exact values of noise and actual values, I try to find the approximate values by shortening the bit width of variables. Then I'll find the difference of accuracy of the former script and the latter script and how much of energy and memory is saved.
Two possible solutions:
Use integers representing units of 1/100.
Use floating point, but only use integer multiples of 0.25 (i.e. numbers ending in .25, .50, .75, or .00) since these are the only floats which have only two decimal places.
Since option 2 is almost certainly not what you actually want, go for 1.
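A short Python check of both options, as a sketch. `Fraction(float)` returns the exact value a float holds, so it shows which two-place decimals survive in binary; the helper `to_hundredths` is a hypothetical name for option 1.

```python
from fractions import Fraction

# Which n/100 are stored exactly as binary floats?
exact_hundredths = [n for n in range(100) if Fraction(n / 100) == Fraction(n, 100)]
print(exact_hundredths)  # [0, 25, 50, 75]


def to_hundredths(x: float) -> int:
    """Option 1: keep the value as an integer count of 1/100 units (truncating)."""
    return int(x * 100)


print(to_hundredths(0.02344489282))  # 2, meaning 0.02
```

If the inputs are supposed to be exact hundredths already, rounding rather than truncating is the safer conversion, because a value such as 0.29 is stored slightly below 0.29.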

Real to Float conversion with no loss of data

I had a table with two columns for coordinates stored in. These columns were REAL datatype, and I noticed that from my application it was only showing 5 decimals for coordinates, and positions were not accurate enough.
I decided to change datatype to FLOAT, so I could use more decimals. It was for my pleasant surprise that when I changed the column data type, the decimals suddenly appeared without me having to store all the coordinates again.
Can anyone tell me why this happens? What happens with the decimal precision of the REAL datatype? Isn't the data rounded and truncated when inserted? Why did the precision come up with no loss of data when I changed the datatype?
You want to use a Decimal data-type.
Floating point values are calculated from a value and an exponent. This allows you to store huge number representations in small amounts of memory. This also means that you don't always get exactly the number you're looking for, just very very close. This is why when you compare floating point values, you compare them within a certain tolerance.
It was for my pleasant surprise that when I changed the column data type, the decimals suddenly appeared without me having to store all the coordinates again.
Be careful, this doesn't mean that the value that was filled in is the accurate value of what you're looking for. If you truncated your original calculation, you need to get those numbers again without cutting off any precision. The values that it autofills when you convert from Real to Float aren't the rest of what you truncated, they are entirely new values which result from adding more precision to the calculation used to populate your Real value.
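The effect can be imitated outside the database. The sketch below assumes REAL behaves like a 4-byte IEEE 754 single and FLOAT like an 8-byte double (the SQL Server defaults); `as_real` and the sample coordinate are hypothetical.

```python
import struct


def as_real(x: float) -> float:
    """Round-trip x through a 4-byte single, roughly what storing it in a REAL column does."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


original = 40.7               # hypothetical coordinate as originally computed
widened = as_real(original)   # the value visible after REAL -> FLOAT

print(repr(original))
print(repr(widened))          # more digits appear, but they are not the original's
print(abs(widened - original))
```

Widening shows the full binary value that the single already held. The extra digits are an artifact of that stored value, not recovered input.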
Here is a good thread that explains the difference in data-types in SQL:
Difference between numeric, float and decimal in SQL Server
Another helpful link:
Bad habits to kick : choosing the wrong data type

How to convert floating point input to integers and preserve maximum precision?

I have to use an algorithm which expects a matrix of integers as input. The input I have is real valued, therefore I want to convert the input to integers before passing it to the algorithm.
I thought of scaling the input by a large constant and then rounding it to integers. This looks like a good solution but how does one decide a good constant to be used, especially since the range of float input could vary from case to case? Any other ideas are also welcome.
Probably the best general answer to this question is to find out what is the maximum integer value that your algorithm can accept as an element in the matrix without causing overflow in the algorithm itself. Once you have this maximum value, find the maximum floating point value in your input data, then scale your inputs by the ratio of these two maximum values and round to the nearest integer (avoid truncation).
In practice you probably cannot do this because you probably cannot determine what is the maximum integer value that the algorithm can accept without overflowing. Perhaps you don't know the details of the algorithm, or it depends in a complicated way on all of the input values. If this is the case, you'll just have to pick an arbitrary maximum input value that seems to work well enough.
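A minimal sketch of the scale-then-round approach described above. A flat list stands in for the matrix, and `max_int` is whatever ceiling you have decided the algorithm can tolerate; both the function name and that parameter are assumptions for illustration.

```python
def scale_to_int(values, max_int):
    """Scale floats so the largest magnitude maps to max_int, then round to nearest.

    Returns the integer list and the factor used, so results can be scaled back.
    """
    peak = max(abs(v) for v in values)
    if peak == 0:
        return [0 for _ in values], 1.0
    factor = max_int / peak
    return [round(v * factor) for v in values], factor


scaled, factor = scale_to_int([0.5, -1.25, 2.5], max_int=1000)
print(scaled)   # [200, -500, 1000]
print(factor)   # 400.0
```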
First normalize your input to [0,1) range, then use a common way to scale them:
f(x) = range_max_exclusive * x + range_min_inclusive
After that, cast f(x) (or round if you wish) to integer. In that way you can handle situations such as real values are in range [0,1) or [0,n) where n>1.
In general, your favourite library contains matrix operations with which you can implement this technique easily, and with better performance than a hand-written implementation.
EDIT: Scaling down and then scaling up is sure to lose some precision. I favor it because a normalization operation generally comes with the library. You can also do it without downscaling:
f(x) = range_max_exclusive / max_element * x + range_min_inclusive

"Round half up" on floating point values

We are stuck with a database that (unfortunately) uses floats instead of decimal values. This makes rounding a bit difficult. Consider the following example (SQL Server T-SQL):
SELECT ROUND(6.925e0, 2) --> returns 6.92
ROUND does round half up, but since floating point numbers cannot accurately represent decimal numbers, the "wrong" result (from the point of view of the end-user) is displayed. I understand why this happens.
I already came up with two possible solutions (both returning a float, which is, unfortunately, also a requirement):
Convert to a decimal data type before rounding: SELECT CONVERT(float, ROUND(CONVERT(decimal(29,14), 6.925e0), 2))
Multiply until the third digit is on the left-hand side of the decimal point (i.e. accurately represented), and then do the rounding: SELECT ROUND(6.925e0 * 1000, -1) / 1000
Which one should I choose? Is there some better solution? (Unfortunately, we cannot change the field types in the database due to some legacy applications accessing the same DB.)
Is there a well-established best practice solution for this (common?) problem?
(Obviously, the common technique "rounding twice" will not help here since 6.925 is already rounded to three decimal places -- as far as this is possible in a float.)
Your first solution seems safer, and also seems like a conceptually closer fit to the problem: convert as soon as possible from float to decimal, do all relevant calculations within the decimal type, and then do a last minute conversion back to float before writing to the DB.
Edit: You'll likely still need to do an extra round (e.g. to 3 decimal places, or whatever's appropriate for your application) immediately after retrieving the float value and converting to decimal, to make sure that you end up with the decimal value that was actually intended. 6.925e0 converted to decimal would again be likely (assuming that the decimal format has > 16 digits of precision) to give something that's very close to, but not exactly equal to, 6.925; an extra round would take care of this.
The second solution doesn't look reliable to me: what if the stored value for 6.925e0 happens to be, due to the usual binary floating-point issues, a tiny amount too small? Then after multiplication by 1000, the result may still be a touch under 6925, so that the rounding step rounds down instead of up. If you know your value always has at most 3 digits after the point, you could fix this by doing an extra round after multiplying by 1000, something like ROUND(ROUND(x * 1000, 0), -1).
(Disclaimer: while I have plenty of experience dealing with float and decimal issues in other contexts, I know next to nothing about SQL.)
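The "convert to decimal, do an extra round, then round" idea can be sketched with Python's decimal module standing in for T-SQL's decimal type. It assumes, as the answer does, that the intended value has at most three decimal places.

```python
from decimal import Decimal, ROUND_HALF_UP

stored = 6.925                      # binary double, as read from the float column
exact = Decimal(stored)             # the exact value held, slightly below 6.925

# Rounding the raw value straight to 2 places goes the "wrong" way
naive = exact.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

# Extra round to 3 places first recovers the intended decimal value
cleaned = exact.quantize(Decimal("0.001"), rounding=ROUND_HALF_UP)
fixed = cleaned.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

print(naive)          # 6.92
print(fixed)          # 6.93
print(float(fixed))   # converted back to float at the last moment
```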
Old question, but I am surprised that the normal practice is not mentioned here, so I just add it.
Normally, you would add a small amount that you know is much smaller than the accuracy of the numbers you are working with, e.g. like this:
SELECT ROUND(6.925e0 + 1e-7, 2)
Of course the added amount must be larger than the precision of the floating point type that is used.
Use an arbitrary-precision format such as DECIMAL. That way you can leave it to the language to get it right (or wrong as the case may be).
I managed to round the float column correctly using the following command:
SELECT CONVERT(float, ROUND(ROUND(CONVERT(decimal(38,14),float_column_name),3),2))

Use float or decimal for accounting application dollar amount?

We are rewriting our legacy accounting system in VB.NET and SQL Server. We brought in a new team of .NET/ SQL Programmers to do the rewrite. Most of the system is already completed with the dollar amounts using floats. The legacy system language, I programmed in, did not have a float, so I probably would have used a decimal.
What is your recommendation?
Should the float or decimal data type be used for dollar amounts?
What are some of the pros and cons for either?
One con mentioned in our daily scrum was you have to be careful when you calculate an amount that returns a result that is over two decimal positions. It sounds like you will have to round the amount to two decimal positions.
Another con is all displays and printed amounts have to have a format statement that shows two decimal positions. I noticed a few times where this was not done and the amounts did not look correct. (i.e. 10.2 or 10.2546)
A pro is the float-only approach takes up eight bytes on disk where the decimal would take up nine bytes (decimal 12,2).
Should Float or Decimal data type be used for dollar amounts?
The answer is easy. Never floats. NEVER!
According to the original IEEE 754 standard, floats were always binary; only the revised standard (IEEE 754-2008, drafted as IEEE 754r) defined decimal formats. Many binary fractions can never exactly equal a given decimal fraction.
Any binary number can be written as m/2^n (m, n positive integers), any decimal number as m/(2^n*5^n).
As binaries lack the prime factor 5, all binary numbers can be exactly represented by decimals, but not vice versa.
0.3 = 3/(2^1 * 5^1) = 0.3
Bracketing 0.3 between neighbouring binary fractions at each resolution:
1/4: [0.25, 0.5]
1/8: [0.25, 0.375]
1/16: [0.25, 0.3125]
1/32: [0.28125, 0.3125]
So you end up with a number either higher or lower than the given decimal number. Always.
Why does that matter? Rounding.
Normal rounding means 0..4 down, 5..9 up. So it does matter whether the result is either 0.049999999999.... or 0.0500000000... You may know that it means 5 cents, but the computer does not know that and rounds 0.04999... down (wrong) and 0.05000... up (right).
Given that the result of a floating point computation always contains small error terms, the decision is pure luck. It gets hopeless if you want decimal round-to-even handling with binary numbers.
Unconvinced? You insist that in your account system everything is perfectly ok?
Assets and liabilities equal? Ok, then take each of the given formatted numbers of each entry, parse them and sum them with an independent decimal system!
Compare that with the formatted sum. Oops, there is something wrong, isn't there?
For that calculation, extreme accuracy and fidelity was required (we used Oracle's FLOAT) so we could record the "billionth's of a penny" being accrued.
It doesn't help against this error. Because all people automatically assume that the computer sums right, and practically no one checks independently.
This photo answers:
This is another situation: man from Northampton got a letter stating his home would be seized if he didn't pay up zero dollars and zero cents!
First you should read What Every Computer Scientist Should Know About Floating Point Arithmetic. Then you should really consider using some type of fixed point / arbitrary-precision number package (e.g., Java's BigDecimal or Python's decimal module). Otherwise, you'll be in for a world of hurt. Then figure out if using the native SQL decimal type is enough.
Floats and doubles exist(ed) to expose the fast x87 floating-point coprocessor that is now pretty much obsolete. Don't use them if you care about the accuracy of the computations and/or don't fully compensate for their limitations.
Just as an additional warning, SQL Server and the .NET framework use different default algorithms for rounding. Make sure you check out the MidpointRounding parameter of Math.Round(). The .NET framework uses banker's rounding by default and SQL Server uses Symmetric Algorithmic Rounding. Check out the Wikipedia article here.
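To see how the two rounding rules diverge, here is a small Python sketch. It uses the decimal module's `ROUND_HALF_EVEN` (banker's rounding) and `ROUND_HALF_UP` (ties away from zero) as stand-ins for the two behaviours; it does not call .NET or SQL Server, and `round_cents` is a hypothetical helper.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

CENT = Decimal("0.01")


def round_cents(text: str, mode: str) -> Decimal:
    """Round a decimal string to two places with the given decimal rounding mode."""
    return Decimal(text).quantize(CENT, rounding=mode)


for text in ("2.345", "2.355"):
    print(text, round_cents(text, ROUND_HALF_EVEN), round_cents(text, ROUND_HALF_UP))
# 2.345 2.34 2.35   <- the two rules disagree on this tie
# 2.355 2.36 2.36   <- and agree on this one
```

The rules differ only on exact ties, which is why the discrepancy tends to surface late, as a one-cent mismatch between application totals and database totals.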
Ask your accountants! They will frown upon you for using float. Like David Singer said, use float only if you don't care for accuracy. Although I would always be against it when it comes to money.
A float is not acceptable in accounting software. Use decimal with four decimal places.
A bit of background here....
No number system can handle all real numbers accurately. All have their limitations, and this includes both the standard IEEE floating point and signed decimal. The IEEE floating point is more accurate per bit used, but that doesn't matter here.
Financial numbers are based on centuries of paper-and-pen practice, with associated conventions. They are reasonably accurate, but, more importantly, they're reproducible. Two accountants working with various numbers and rates should come up with the same number. Any room for discrepancy is room for fraud.
Therefore, for financial calculations, the right answer is whatever gives the same answer as a CPA who's good at arithmetic. This is decimal arithmetic, not IEEE floating point.
Floating-point types produce unexpected inexact values.
For instance you can't store 1/3 as a decimal, it would be 0.3333333333... (and so on)
Floats are actually stored as a binary value and a power of 2 exponent.
So 1.5 is stored as 3 x 2 to the -1 (or 3/2)
Using these base-2 exponents creates some odd approximations, for instance:
Convert 1.1 to a float and then convert it back again, your result will be something like: 1.0999999999989
This is because 1.1 has no finite binary representation. A value such as 154811237190861 x 2^-47 comes close to 1.1 but is not exactly equal to it, and the same is true of whatever value a float or double actually stores.
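The "binary value times a power of 2" storage can be inspected directly in Python, whose `float` is a 64-bit double. This is only an illustration of the representation, not of any particular database type.

```python
from decimal import Decimal

# 1.5 really is 3 x 2^-1
print((1.5).as_integer_ratio())        # (3, 2)

# 1.1 is stored as some integer over a power of two, so it cannot be exactly 11/10
num, den = (1.1).as_integer_ratio()
print(den & (den - 1) == 0)            # True: the denominator is a power of two
print(Decimal(1.1))                    # the exact stored value, slightly above 1.1
```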
More about this issue on my blog, but basically, for storage, you're better off with decimals.
On Microsoft SQL server you have the money data type - this is usually best for financial storage. It is accurate to 4 decimal positions.
For calculations you have more of a problem - the inaccuracy is a tiny fraction, but put it into a power function and it quickly becomes significant.
However decimals aren't very good for any sort of maths - there's no native support for decimal powers, for instance.
I'd recommend using 64-bit integers that store the whole thing in cents.
Use SQL Server's decimal type.
Do not use money or float.
money uses four decimal places and is faster than using decimal, but suffers from some obvious and some not so obvious problems with rounding (see this connect issue).
The only reason to use Float for money is if you don't care about accurate answers.
Floats are not exact representations, precision issues are possible, for example when adding very large and very small values. That's why decimal types are recommended for currency, even though the precision issue may be sufficiently rare.
To clarify, the decimal 12,2 type will store its 12 digits exactly, whereas the float will not as it uses a binary representation internally. For example, 0.01 cannot be represented exactly by a floating point number - the closest single-precision representation is approximately 0.0099999998
For a banking system I helped develop, I was responsible for the "interest accrual" part of the system. Each day, my code calculated how much interest had been accrued (earnt) on the balance that day.
For that calculation, extreme accuracy and fidelity was required (we used Oracle's FLOAT) so we could record the "billionth's of a penny" being accrued.
When it came to "capitalising" the interest (i.e. paying the interest back into your account) the amount was rounded to the penny. The data type for the account balances was two decimal places. (In fact it was more complicated as it was a multi-currency system that could work in many decimal places - but we always rounded to the "penny" of that currency). Yes - there were "fractions" of loss and gain, but when the computer's figures were actualised (money paid out or paid in) it was always REAL money values.
This satisfied the accountants, auditors and testers.
So, check with your customers. They will tell you their banking/accounting rules and practices.
Even better than using decimals is using just plain old integers (or maybe some kind of bigint). This way you always have the highest accuracy possible, but the precision can be specified. For example the number 100 could mean 1.00, which is formatted like this:
int cents = num % 100;
int dollars = (num - cents) / 100;
printf("%d.%02d", dollars, cents);
If you like to have more precision, you can change the 100 to a bigger value, like: 10 ^ n, where n is the number of decimals.
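The same integer-cents idea, sketched in Python with explicit handling of negative amounts (a plain `%` and `/` split misformats them). The function name is hypothetical.

```python
def format_cents(amount_cents: int) -> str:
    """Format an integer number of cents as a decimal string, e.g. 100 -> '1.00'."""
    sign = "-" if amount_cents < 0 else ""
    dollars, cents = divmod(abs(amount_cents), 100)
    return f"{sign}{dollars}.{cents:02d}"


print(format_cents(100))    # 1.00
print(format_cents(-150))   # -1.50
print(format_cents(5))      # 0.05
```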
Another thing you should be aware of in accounting systems is that no one should have direct access to the tables. This means all access to the accounting system must be through stored procedures.
This is to prevent fraud, not just SQL injection attacks. An internal user who wants to commit fraud should not have the ability to directly change data in the database tables, ever. This is a critical internal control on your system.
Do you really want some disgruntled employee to go to the backend of your database and have it start writing them checks? Or hide that they approved an expense to an unauthorized vendor when they don't have approval authority? Only two people in your whole organization should be able to directly access data in your financial database, your database administrator (DBA) and his backup. If you have many DBAs, only two of them should have this access.
I mention this because if your programmers used float in an accounting system, likely they are completely unfamiliar with the idea of internal controls and did not consider them in their programming effort.
I had been using SQL's money type for storing monetary values. Recently, I've had to work with a number of online payment systems and have noticed that some of them use integers for storing monetary values. In my current and new projects I've started using integers and I'm pretty content with this solution.
Out of the 100 fractions n/100, where n is a natural number such that 0 <= n and n < 100, only four can be represented as floating point numbers. Take a look at the output of this C program:
#include <stdio.h>
int main(void)
{
    printf("Mapping 100 numbers between 0 and 1 ");
    printf("to their hexadecimal exponential form (HEF).\n");
    printf("Most of them do not equal their HEFs. That means ");
    printf("that their representations as floats ");
    printf("differ from their actual values.\n");
    double f = 0.01;
    int i;
    for (i = 0; i < 100; i++) {
        printf("%1.2f -> %a\n",f*i,f*i);
    }
    printf("Printing 128 'float-compatible' numbers ");
    printf("together with their HEFs for comparison.\n");
    f = 0x1p-7; // == 0.0078125
    for (i = 0; i < 0x80; i++) {
        printf("%1.7f -> %a\n",f*i,f*i);
    }
    return 0;
}
You can always write something like a Money type for .NET.
Take a look at this article: A Money type for the CLR. The author did excellent work in my opinion.
Whatever you do, you need to be careful of rounding errors. Calculate using a greater degree of precision than you display in.
Have you considered using the money-data type to store dollar-amounts?
Regarding the con that decimal takes up one more byte, I would say don't care about it. In 1 million rows you will only use 1 more MB and storage is very cheap these days.
You will probably want to use some form of fixed point representation for currency values. You will also want to investigate banker's rounding (also known as "round half to even"). It avoids the bias that exists in the usual "round half up" method.
Always use Decimal. Float will give you inaccurate values due to rounding issues.
Floating point numbers can only represent fractions that are a sum of negative powers of the base - for binary floating point, of course, that's two.
Among the two-place decimal fractions from 0.00 to 0.99, only four are representable precisely in binary floating point: 0, 0.25, 0.5 and 0.75. Everything else is an approximation, in the same way that 0.3333... is an approximation for 1/3 in decimal arithmetic.
Floating point is a good choice for computations where the scale of the result is what is important. It's a bad choice where you're trying to be accurate to some number of decimal places.
This is an excellent article describing when to use float and decimal. Float stores an approximate value and decimal stores an exact value.
In summary, exact values like money should use decimal, and approximate values like scientific measurements should use float.
Here is an interesting example that shows that both float and decimal are capable of losing precision. When adding a number that is not an integer and then subtracting that same number float results in losing precision while decimal does not:
DECLARE @Float1 float, @Float2 float, @Float3 float, @Float4 float;
SET @Float1 = 54;
SET @Float2 = 3.1;
SET @Float3 = 0 + @Float1 + @Float2;
SELECT @Float3 - @Float1 - @Float2 AS "Should be 0";
Should be 0
----------------------
1.13797860024079E-15
When multiplying a non integer and dividing by that same number, decimals lose precision while floats do not.
DECLARE @Fixed1 decimal(8,4), @Fixed2 decimal(8,4), @Fixed3 decimal(8,4);
SET @Fixed1 = 54;
SET @Fixed2 = 0.03;
SET @Fixed3 = 1 * @Fixed1 / @Fixed2;
SELECT @Fixed3 / @Fixed1 * @Fixed2 AS "Should be 1";
Should be 1
---------------------------------------
0.99999999999999900
Your accountants will want to control how you round. Using float means that you'll be constantly rounding, usually with a FORMAT() type statement, which isn't the way you want to do it (use floor / ceiling instead).
You have currency datatypes (money, smallmoney), which should be used instead of float or real. Storing decimal (12,2) will eliminate your roundings, but will also eliminate them during intermediate steps - which really isn't what you'll want at all in a financial application.

Resources