Check for precision loss when converting string to float


I have a string representing a rational number.
I want to convert the string to a float with strtof(nptr, &endptr).
The problem is that a string such as "1.0000000000000000000001" will be converted to 1.0 without raising any flags (IIRC).
Therefore my question: How does one catch this precision loss?

How does one catch this precision loss?
One doesn't, at least not with anything in the standard library; none of the strto* conversion functions will tell you if the value cannot be represented exactly.
I know that's not terribly helpful, but it means you'll have to go outside the standard library. You'll either have to write your own conversion routines that somehow keep track of precision loss (I have no idea how you would implement this), use an arbitrary-precision library like GMP, or implement your own binary-coded decimal (BCD) type and hand-write an API to assign, compare, and manipulate BCD values.
C just doesn't give you the tools needed to do that kind of analysis.
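One partial workaround, sketched here under stated assumptions rather than taken from any library: for plain decimal input (digits with an optional point, no sign, no exponent), you can test exactness yourself. A string with k fractional digits has the value N/10^k = (N/5^k)/2^k, so it is a binary fraction only if 5^k divides N; the remaining integer, with its factors of two removed, must then fit in FLT_MANT_DIG bits. The function name decimal_is_exact_float is hypothetical, and the sketch ignores overflow and underflow of the float exponent range, so it only answers the "are digits lost?" question.

```c
#include <float.h>
#include <string.h>

/* Divide the decimal digit array d[0..len) (most significant first)
   by a small divisor in place; return the remainder. */
static unsigned div_small(unsigned char *d, size_t len, unsigned divisor)
{
    unsigned rem = 0;
    size_t i;
    for (i = 0; i < len; i++) {
        unsigned cur = rem * 10 + d[i];
        d[i] = (unsigned char)(cur / divisor);
        rem = cur % divisor;
    }
    return rem;
}

/* Remove leading zero digits; return the new length. */
static size_t strip_leading_zeros(unsigned char *d, size_t len)
{
    size_t z = 0;
    while (z < len && d[z] == 0)
        z++;
    memmove(d, d + z, len - z);
    return len - z;
}

/* Hypothetical helper: 1 if s ("123.456", no sign, no exponent) is exactly
   representable as a float, 0 if not, -1 on bad or overlong input.
   Assumption: the exponent range is ignored (no overflow/underflow check). */
int decimal_is_exact_float(const char *s)
{
    unsigned char digits[256];
    size_t len = 0, frac = 0, i;
    int seen_point = 0;
    unsigned long long q = 0;

    for (; *s; s++) {
        if (*s == '.' && !seen_point) { seen_point = 1; continue; }
        if (*s < '0' || *s > '9' || len == sizeof digits) return -1;
        digits[len++] = (unsigned char)(*s - '0');
        if (seen_point) frac++;
    }
    if (len == 0) return -1;

    /* N / 10^frac is a binary fraction only if 5^frac divides N. */
    for (i = 0; i < frac; i++)
        if (div_small(digits, len, 5) != 0)
            return 0;
    len = strip_leading_zeros(digits, len);
    if (len == 0) return 1;                 /* the value is zero */

    /* Factors of two only move the exponent, so strip them. */
    while (digits[len - 1] % 2 == 0) {
        div_small(digits, len, 2);
        len = strip_leading_zeros(digits, len);
    }

    /* The remaining odd integer must fit in the float significand. */
    if (len > 19) return 0;                 /* too big for unsigned long long */
    for (i = 0; i < len; i++)
        q = q * 10 + digits[i];
    return q < (1ULL << FLT_MANT_DIG);
}
```

With this, "0.375" (3/8) is reported as exact, while "0.1" and the question's "1.0000000000000000000001" are not. Swapping FLT_MANT_DIG for DBL_MANT_DIG gives the same test for double.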

Related

How to print a float on stdout without printf()?

I'm in an environment that has no printf() or any equivalent, so I'm writing it myself. But I have no idea how to perform such a conversion for float types. I tried to see how gcc does it, but it's really hard to understand.

Floating-point formatting is very easy to get wrong. Writing a simplistic implementation that works for "most" numbers is deceptively easy, but it is likely to break on very large numbers, very small numbers, and numbers close to zero, not to mention IEEE 754 subnormals, infinities and NaN. It might also get the trailing digits wrong, failing to produce a string representation that reproduces the float bit for bit.
Fortunately, there are libraries out there that implement the work of formatting floating-point numbers, either for education, for embedded systems, or to improve on some aspect of standard-library formatting. If possible, I recommend that you incorporate David Gay's dtoa library, which has been extensively tested in Python and elsewhere.

You can take a look at the musl libc implementation. musl is a lightweight libc.
The fmt_fp function defined in src/stdio/vfprintf.c converts a floating-point value to a string for fprintf conversion specifiers such as f.
If you search on the internet with keyword ftoa, you will find some other implementations of functions converting a float to a string.
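To illustrate the kind of simplistic implementation the first answer warns about, here is a sketch that needs no C library at all. naive_ftoa and its helpers are hypothetical names, not the musl or dtoa API. It handles NaN, infinities and the sign, but it rejects magnitudes of 1e9 and above and does not guarantee a correctly rounded last digit, so dtoa or musl's fmt_fp remain the better choice for real use.

```c
#include <stddef.h>

/* Append the NUL-terminated string s at p; return the new end. */
static char *put_str(char *p, const char *s)
{
    while (*s)
        *p++ = *s++;
    *p = '\0';
    return p;
}

/* Append v in decimal, zero-padded to at least min_digits digits. */
static char *put_uint(char *p, unsigned long long v, int min_digits)
{
    char tmp[24];
    int n = 0;
    do {
        tmp[n++] = (char)('0' + v % 10);
        v /= 10;
    } while (v != 0 || n < min_digits);
    while (n > 0)
        *p++ = tmp[--n];
    *p = '\0';
    return p;
}

/* Hypothetical naive formatter: write x with `decimals` (0..9) digits after
   the point into buf (at least 32 bytes). Returns NULL for |x| >= 1e9,
   where the scaled value would no longer fit. Rounds half up on the
   scaled double, so the last digit is not always correctly rounded. */
char *naive_ftoa(double x, char *buf, int decimals)
{
    char *p = buf;
    unsigned long long scale = 1, scaled;
    int i;

    if (x != x) {                 /* NaN is the only value unequal to itself */
        put_str(p, "nan");
        return buf;
    }
    if (x < 0) {
        *p++ = '-';
        x = -x;
    }
    if (x - x != 0) {             /* inf - inf is NaN; finite x - x is 0 */
        put_str(p, "inf");
        return buf;
    }
    if (x >= 1e9)
        return NULL;
    for (i = 0; i < decimals; i++)
        scale *= 10;
    scaled = (unsigned long long)(x * (double)scale + 0.5);
    p = put_uint(p, scaled / scale, 1);           /* integer part */
    if (decimals > 0) {
        *p++ = '.';
        put_uint(p, scaled % scale, decimals);    /* zero-padded fraction */
    }
    return buf;
}
```

Note that negative zero prints as "0" here, and that digits beyond about 15 to 17 significant places are meaningless because the scaled value is computed in double.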

Storing floating point numbers in a file

For a C application that I am implementing, I need to be able to read and write a set of configuration values to a file. These values are floating point numbers. In the future it is possible that another application (could be written in C++, Python, Perl, etc...) will use this same data, so these configuration values need to be stored in a well defined format that is compiler and machine independent.
Byte order conversion functions (ntoh/hton) can be used to handle endianness, but what is the best way to get around the different meanings of a "float" value? Is there a common method for storing floats? Rounding and truncation are not a problem, as long as they are well defined.

There are probably two main options:
Store in text format. Here you would standardise on a particular format using a well-defined decimal separator and scientific notation, e.g. 6.66e42.
Store in binary format using the IEEE754 standard. Use either the 4 or 8 byte data type. And as you noted, you'd have to settle on an endianness convention.
A text format is probably more portable because there are machines that do not natively understand IEEE754. That said, such machines are rare in these times.
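For the binary option, a sketch that writes a float as four big-endian bytes. It assumes the native float already is IEEE 754 binary32 and has the same byte order as uint32_t (true on mainstream platforms, but still an assumption); the compile-time size check makes the build fail if float is not 4 bytes. Because the bytes are shifted out of an integer, the file byte order does not depend on the host, so no ntoh/hton call is needed. pack_float_be and unpack_float_be are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

/* Compile-time check: the array size is -1 (an error) unless float is 4 bytes. */
typedef char float_must_be_4_bytes[sizeof(float) == 4 ? 1 : -1];

/* Store f as 4 big-endian bytes (assumes float is IEEE 754 binary32). */
void pack_float_be(float f, unsigned char out[4])
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* reinterpret without aliasing issues */
    out[0] = (unsigned char)(bits >> 24);
    out[1] = (unsigned char)(bits >> 16);
    out[2] = (unsigned char)(bits >> 8);
    out[3] = (unsigned char)bits;
}

/* Inverse of pack_float_be. */
float unpack_float_be(const unsigned char in[4])
{
    uint32_t bits = ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16)
                  | ((uint32_t)in[2] << 8) | (uint32_t)in[3];
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

For example, 1.0f is stored as the bytes 3F 80 00 00.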

The C formatted input/output functions have a format specifier for this, %a. It formats a floating-point number in a hexadecimal floating-point format, [-]0xh.hhhhp±d. That is, it has a “-” sign if needed, hexadecimal digits for the fraction part, including a radix point, a “p” (for “power”) to start the exponent and a signed exponent of two (in decimal).
As long as your C implementation uses binary floating-point (or any floating-point such that its FLT_RADIX is a power of two), conversion with the %a format should be exact.
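A minimal sketch of writing and reading values with %a. The function names are hypothetical, and the code needs C99 (snprintf with %a, and strtod accepting hexadecimal input). The radix character still follows the current locale, so this assumes the "C" locale. Python can read the same strings with float.fromhex.

```c
#include <stdio.h>
#include <stdlib.h>

/* Write value as a hexadecimal floating-point string such as "0x1.8p+1".
   Returns 0 on success, -1 if buf was too small. */
int write_hexfloat(char *buf, size_t size, double value)
{
    int n = snprintf(buf, size, "%a", value);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}

/* Parse a string produced by write_hexfloat.
   Returns 0 on success, -1 if no number could be parsed. */
int read_hexfloat(const char *buf, double *out)
{
    char *end;
    *out = strtod(buf, &end);
    return end == buf ? -1 : 0;
}
```

Because the hexadecimal digits encode the significand bits directly, the value read back compares equal to the value written.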

IEEE 754, or ISO/IEC/IEEE 60559:2011, is the standard for floating point used by most languages.
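For C, IEC 60559 support is specified in Annex F (IEC 60559 floating-point arithmetic), which first appeared in C99 and is retained in C11. It is optional: an implementation that conforms to it defines the macro __STDC_IEC_559__.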

For small amounts of data, such as configuration values, go with text, not binary. If you want, use structured text of some form, such as JSON or XML. Decide how many digits to write to represent a floating-point number according to your requirements.
As the range of required portability (across languages, operating systems, time, space, etc) increases so the force of the argument in favour of text becomes stronger.
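A minimal sketch of the text approach in C, with hypothetical function names. Seventeen significant digits are enough for any IEEE 754 double to survive a text round trip (nine for a float; C11 names these DBL_DECIMAL_DIG and FLT_DECIMAL_DIG), provided printf and strtod round correctly, which the C standard does not strictly require. The decimal point also follows the current locale, so this assumes the "C" locale.

```c
#include <stdio.h>
#include <stdlib.h>

/* Write value as decimal text with 17 significant digits.
   Returns 0 on success, -1 if buf was too small. */
int write_double_text(char *buf, size_t size, double value)
{
    int n = snprintf(buf, size, "%.17g", value);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}

/* Read back a value written by write_double_text. */
double read_double_text(const char *buf)
{
    return strtod(buf, NULL);
}
```

Writing fewer digits (say %.6g) gives friendlier files, but then values such as 0.1 + 0.2 no longer reload bit for bit, which may or may not matter for configuration data.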

Writing my own float parser

I am trying to write a parser in C and part of its job is to convert a series of characters into a double. Up to now I have been using strtod but I find it to be quite dangerous and it won't handle cases where the number is at the end of the buffer, which is not null terminated.
I thought I'd write my own. If I have a string representation of a number of the form a.b, would I be naive to think that I can just calculate (double)a + ((double)b / (double)10^n), where n is the number of digits in b?
For example, 23.4563:
a = 23
b = 4563
final answer: 23 + (4563/10000)
Or would that produce inaccurate results with regard to the IEEE format of floats?
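For reference, the approach described in the question can be written as the sketch below (naive_parse is a hypothetical name; it accepts only digits with an optional point). Every multiply, divide and add rounds, so the result can be off in the last bit compared with a correctly rounded conversion, and b loses precision once the fraction has more digits than a double holds exactly. The answers to this question explain why fixing that is harder than it looks.

```c
/* The question's a + b / 10^n method, for input like "23.4563".
   Each floating-point step rounds, so this is not correctly rounded. */
double naive_parse(const char *s)
{
    double a = 0.0, b = 0.0, scale = 1.0;

    for (; *s >= '0' && *s <= '9'; s++)     /* integer part a */
        a = a * 10.0 + (*s - '0');
    if (*s == '.')
        for (s++; *s >= '0' && *s <= '9'; s++) {
            b = b * 10.0 + (*s - '0');      /* fraction digits b */
            scale *= 10.0;                  /* 10^n */
        }
    return a + b / scale;
}
```

Values whose parts are exact in binary, such as "23.25", come out right; comparing the function against strtod over many random inputs is a quick way to see where the two disagree.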

It is hard to read floating-point numerals accurately, in the sense that there are various problems that must be carefully addressed, and many people fail to do so. However, it is a solved problem. To start, see How to read floating point numbers accurately, June 1990, by William D. Clinger.
I agree with Roddy, you are likely better off copying the data into a buffer and using existing library functions. (However, you should check that your C implementation provides correctly rounded conversion of floating-point numerals. The C standard does not require it, and some implementations do not provide it.)
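Copying the data into a buffer, as suggested, could look like the following sketch. parse_double_n and its 63-character limit are assumptions for illustration: a numeral longer than the local buffer is truncated, which could silently change its value, so a real parser should reject or handle that case. Correct rounding still depends on the C library's strtod.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parse a double from the first len bytes of buf,
   which need not be NUL-terminated. Returns the number of characters
   consumed, or 0 if no number was found or strtod reported ERANGE
   (some libraries also set ERANGE on underflow). */
size_t parse_double_n(const char *buf, size_t len, double *out)
{
    char tmp[64];
    char *end;

    if (len >= sizeof tmp)
        len = sizeof tmp - 1;   /* sketch limit: longer numerals get truncated */
    memcpy(tmp, buf, len);
    tmp[len] = '\0';            /* strtod needs a terminated string */

    errno = 0;
    *out = strtod(tmp, &end);
    if (end == tmp || errno == ERANGE)
        return 0;
    return (size_t)(end - tmp);
}
```

The return value tells the caller how far to advance in the original buffer, so a number sitting at the very end of an unterminated buffer is handled like any other.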

You may be interested in this answer of mine to a somewhat related question.
The parser in that answer converts decimal floating point numbers (represented as strings) into IEEE-754 floats and doubles with proper rounding.
As far as I remember, about the only issue in the code is that it may not handle the cases when the exponent part is too big (doesn't fit into an integer) and should amount to returning either an error or INF.
Otherwise, it should give you a good idea of what to do (if you have any idea at all of what you're doing:).

As already said, it's difficult, you need extra precision, etc...
But if you have restricted inputs and want to know whether you can still correctly convert these restricted decimals to binary with a semi-naive algorithm and standard IEEE 754 operations, you might be interested in my answer to
How to manually parse a floating point number from a string

efficient disk storage of decimal numbers in C (C89)

I am writing functions that serialize/deserialize a large data structure for efficient reloading later on. There is a particular set of decimal numbers for which precision is not a huge deal, and I would like to store them in 4 bytes of binary data.
For most, reading the bytes into a buffer and using memcpy to place them into a float is sufficient, and is the most common solution I've found. However, this is not portable, as floats on the systems this software is meant for are not guaranteed to be 4 bytes in size.
What I would like is something very portable (which is one of the reasons I'm limited to C89). I'm not wedded to 4 byte storage, but it is an attractive option to me. I am pretty wholly against storing the numbers as strings. I'm familiar with endianness issues, and such things are already taken into account.
What I am looking for, therefore, is a system-independent way to store and retrieve floating point numbers in a small amount of binary data (preferably around 4 bytes). I, in my foolishness, imagined this would be the easiest part of this task, since it seems like such a common problem, but popular search engines and various reference books have provided no material assistance.

You could store them in 32-bit IEEE float format (or a very close approximation to it; for instance, you might want to restrict denormals and NaNs). Then have each platform adjust as necessary to coerce its own float type to that format and back.
Of course there will be some loss of accuracy, but that's inevitable anyway if you're transferring float values of different precisions from one system to another.
It should be possible to write portable code to find the closest IEEE value to a native float value, and vice-versa, if that's required. You wouldn't really want to use it, though, because it would probably be far less efficient than code that takes advantage of knowing the float format. In the common case where the platform uses an IEEE representation it's a no-op or a simple narrowing/widening conversion. Even in the worst case you're likely to encounter, as long as it's a binary fraction you basically just have to extract the sign, exponent and significand bits and do the right thing with them (discard bits from the significand if it's too big, adjust the bias and possibly the width of the exponent, do the right thing with underflow and overflow).
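A minimal sketch of such portable code, using only C89 <math.h> (frexp, ldexp, HUGE_VAL), so it does not depend on the native float layout. encode_ieee32 and decode_ieee32 are hypothetical names. Assumptions: the input is finite; values below the binary32 normal range are flushed to zero and values above it become infinity; halfway cases round away from zero rather than IEEE's ties-to-even; NaN is not handled and negative zero encodes as +0. The 32 bits are returned in an unsigned long, which C89 guarantees is at least 32 bits wide; writing them out in a fixed byte order is a separate step.

```c
#include <math.h>

/* Encode finite x as IEEE 754 binary32 bits. */
unsigned long encode_ieee32(double x)
{
    unsigned long sign = 0, mant;
    int exp;
    double frac;

    if (x < 0) { sign = 0x80000000UL; x = -x; }
    if (x == 0) return sign;
    frac = frexp(x, &exp);                 /* x = frac * 2^exp, 0.5 <= frac < 1 */
    mant = (unsigned long)(ldexp(frac, 24) + 0.5);  /* 24-bit significand, rounded */
    if (mant == 0x1000000UL) { mant >>= 1; exp++; } /* rounding carried over */
    exp += 126;                            /* biased exponent for 1.m * 2^(exp-1) */
    if (exp <= 0) return sign;                     /* underflow: flush to zero */
    if (exp >= 255) return sign | 0x7F800000UL;    /* overflow: infinity */
    return sign | ((unsigned long)exp << 23) | (mant & 0x7FFFFFUL);
}

/* Decode binary32 bits into a native double. */
double decode_ieee32(unsigned long bits)
{
    int exp = (int)((bits >> 23) & 0xFF);
    double mant = (double)(bits & 0x7FFFFFUL);
    double v;

    if (exp == 0)
        v = ldexp(mant, -149);                  /* zero or subnormal */
    else if (exp == 255)
        v = HUGE_VAL;                           /* infinity (NaN not distinguished) */
    else
        v = ldexp(mant + 8388608.0, exp - 150); /* restore the implicit 2^23 bit */
    return (bits & 0x80000000UL) ? -v : v;
}
```

On an IEEE platform you would normally replace both functions with a plain memcpy, as the answer says; the sketch is only worth its cost where the native format differs.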
If you want to avoid losing accuracy in the case where the file is saved and then reloaded on the same system (but that system doesn't use 32-bit IEEE), you could look at storing some data indicating the format in the file (size of each value, number of bits of significand and exponent), then store each value at native precision, so that it only gets rounded if it's ever loaded onto a less-precise system. I don't know whether ASN.1 has a standard to encode floating-point values along these lines, but it's the kind of complicated trickery I'd expect from it.

Check this out: http://steve.hollasch.net/cgindex/coding/portfloat.html
They give a routine which is portable and doesn't add too much overhead.

casting doubles to integers in order to gain speed

In Redis (http://code.google.com/p/redis) there are scores associated with elements, in order to keep these elements sorted. These scores are doubles, even though many users actually sort by integers (for instance Unix times).
When the database is saved we need to write these doubles to disk. This is what is used currently:
snprintf((char*)buf+1,sizeof(buf)-1,"%.17g",val);
Additionally, infinity and not-a-number conditions are checked so that they can also be represented in the final database file.
Unfortunately, converting a double into its string representation is pretty slow, whereas we have a function in Redis that converts an integer into a string much faster. So my idea was to check whether a double could be cast to an integer without loss of data and, if so, use that function to turn the integer into a string.
For this to provide a good speedup, the test for integer "equivalence" must of course be fast. So I used a trick that is probably undefined behavior but that worked very well in practice. Something like this:
double x = /* ... some value ... */;
if (x == (double)((long long)x))
    use_the_fast_integer_function((long long)x);
else
    use_the_slow_snprintf(x);
In my reasoning, the double cast above converts the double into a long long and then back into a double. If the range fits and there is no fractional part, the number will survive the conversion and will be exactly the same as the initial number.
As I wanted to make sure this will not break things in some system, I joined #c on freenode and I got a lot of insults ;) So I'm now trying here.
Is there a standard way to do what I'm trying to do without going outside ANSI C? Otherwise, is the above code supposed to work on all the POSIX systems that Redis currently targets, that is, the architectures where Linux / Mac OS X / *BSD / Solaris run nowadays?
What I can add to make the code saner is an explicit check of the double's range before attempting the cast at all.
Thank you for any help.

Perhaps some old-fashioned fixed-point math could help you out. If you converted your double to a fixed-point value, you would still get decimal precision, and converting to a string is as easy as with ints, plus a single shift.
Another thought would be to roll your own snprintf() function. Conversion from double to int is natively supported by many FPUs, so that should be lightning fast. Converting that to a string is simple as well.
Just a few random ideas for you.

The problem with doing that is that the comparisons won't work out the way you'd expect. Just because one floating point value is less than another doesn't mean that its representation as an integer will be less than the other's. Also, I see you comparing one of the (former) double values for equality. Due to rounding and representation errors in the low-order bits, you almost never want to do that.
If you are just looking for some kind of key to do something like hashing on, it would probably work out fine. If you actually care about which values are really greater or lesser, it's a bad idea.

I don't see a problem with the casts, as long as x is within the range of long long. Maybe you should check out the modf() function which separates a double into its integral and fractional part. You can then add checks against (double)LLONG_MIN and (double)LLONG_MAX for the integral part to make sure. Though there may be difficulties with the precision of double.
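A sketch of the modf() approach with the range checks this answer describes; the function name double_to_exact_ll is hypothetical. (double)LLONG_MIN is exactly -2^63, while (double)LLONG_MAX typically rounds up to 2^63, which is why the upper bound is exclusive; if it rounded down instead, the test would merely be slightly conservative.

```c
#include <limits.h>
#include <math.h>

/* Hypothetical helper: if x is integral and fits in long long, store it in
   *out and return 1; otherwise return 0. NaN fails the fraction test
   (NaN != 0.0 is true); infinities have a zero fractional part but fail
   the range test. */
int double_to_exact_ll(double x, long long *out)
{
    double ipart;

    if (modf(x, &ipart) != 0.0)
        return 0;                        /* non-zero fractional part, or NaN */
    if (ipart < (double)LLONG_MIN || ipart >= (double)LLONG_MAX)
        return 0;                        /* outside long long's range */
    *out = (long long)ipart;             /* in range, so the cast is defined */
    return 1;
}
```

Whether this beats the original cast-and-compare depends on the compiler and CPU, which is exactly why measuring first, as suggested, is worthwhile.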
But before doing anything of this, have you made sure it actually is a bottleneck by measuring its performance? And is the percentage of integer values high enough that it would really make a difference?

Your test is perfectly fine (assuming you have already handled infinities and NaNs separately by this point, and have checked that x is within the range of long long), and it's probably one of the very few occasions when you really do want to compare floats for equality. The range check matters: converting a double whose integral part does not fit in long long is undefined behaviour in C (C99/C11 6.3.1.4), not merely an implementation-defined result, so an out-of-range x must never reach the cast.
The only fly in the ointment is that negative zero will end up as positive zero (because negative zero compares equal to positive zero).
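Putting these points together, a sketch of the fast-path test with the range check done before the cast and negative zero routed to the slow path. The function name can_use_integer_path is hypothetical, signbit() and NAN are C99, and the bounds are written as the literals -2^63 and 2^63 because converting LLONG_MAX to double may round.

```c
#include <math.h>

/* Hypothetical helper: 1 if x can be written through the integer routine
   without changing its value, 0 if it must go through snprintf. */
int can_use_integer_path(double x)
{
    /* -2^63 equals LLONG_MIN; 2^63 is one past LLONG_MAX. The negated
       comparison also rejects NaN, since comparisons with NaN are false,
       and rejects both infinities. */
    if (!(x >= -9223372036854775808.0 && x < 9223372036854775808.0))
        return 0;
    if (x == 0.0 && signbit(x))
        return 0;                        /* keep -0 on the slow path */
    return x == (double)(long long)x;    /* in range, so the cast is defined */
}
```

The common case (an integral Unix time, say) costs two comparisons, a sign test that only runs for zero, and one round-trip cast.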
