Double precision lost when parsing a CSV file in C

I'm trying to read in a file in C with the following format:
6.43706064058,4.15417249035
3.43706064058,1.15417249035
...
I'm able to parse out the two doubles, but when I print out what I've parsed, I notice that I only get up to 6 decimal places. Here is my code:
long double d1;
long double d2;
fscanf(file, "%Lf,%Lf", &d1, &d2);
printf("x:%Lf, y:%Lf", d1, d2);
Output:
x:6.437061, y:4.154172
...
Where am I losing the precision? Is it possible that it's being read in correctly, but my printf statement isn't showing all the precision?

Is it possible that it's being read in correctly, but my printf statement isn't showing all the precision?
That's exactly what's happening. From the printf(3) man page:
... the number of digits after the
decimal-point character is equal to the precision specification.
If the precision is missing, it is taken as 6 ...
Tell printf to show more precision by changing your format string:
printf("x:%.11Lf, y:%.11Lf", d1, d2);

The default %f format only prints 6 places after the decimal point, which gives you much less precision than the actual floating point value (unless the exponent is large) and possibly no precision at all (if the exponent is more than slightly negative). Unless you know all your values are bounded away from zero (e.g. all greater than 1), you really need to use the %g format (which can switch to exponential notation as needed) or the %e format (which always uses exponential notation) to print floating point values in a way that preserves their precision.
You also need to use sufficiently many digits. For IEEE double precision, 17 significant decimal digits are sufficient, so %.17g would be the preferred format. For long double, it depends on the type used on your particular implementation. Thankfully, C offers a macro, DECIMAL_DIG (in <float.h>), that gives the number of digits needed for the widest floating-point type the implementation supports. So you would use:
printf("%.*Lg", DECIMAL_DIG, x);
or similar. Note that this will print more places than were originally present in your input file. If you know your input always has a particular number of places, you could perhaps just hard-code that instead of using DECIMAL_DIG to get a more uniform output.

The first number, 6.43706064058, is 13 characters long, including the decimal point, so it is tempting to write
printf("x:%13Lf, y:%13Lf", d1, d2);
allowing 13 spaces for each number. But a number placed directly after the % is the minimum field width, not the precision. That call still prints only 6 decimal places, padded with spaces to 13 characters:
x:     6.437061, y:     4.154172
To get all of the digits you must also give a precision, which goes after a period:
printf("x:%13.11Lf, y:%13.11Lf", d1, d2);
and that will print:
x:6.43706064058, y:4.15417249035
The width makes room for the number; the precision decides how many decimal places are shown.
Hope that helped!

Related

How does printf for float does not print the correct value for floating point [duplicate]

I have been trying to write my own printf and now I'm stuck at %f.
The problem is that I don't know what printf does in the background. When I give it a float like f = 1.4769996 it prints 1.477000,
but when I give it f = 1.4759995 it prints 1.475999:
float f = 1.4769996;
printf("%f\n", f); // 1.477000
f = 1.4759995;
printf("%f\n", f); // 1.475999
My guess was that printf sees the 5 at the end and rounds up, but that doesn't explain the second example.
What is the logic behind this floating-point behavior?
Your C implementation likely uses the IEEE-754 binary32 and binary64 formats for float and double. Given this, float f = 1.4769996; results in setting f to 1.47699964046478271484375, and f = 1.4759995; results in setting f to 1.47599947452545166015625.
Then it is easy to see that rounding 1.47699964046478271484375 to six digits after the decimal point results in 1.477000 (because the next digit is 6, so we round up), and rounding 1.47599947452545166015625 to six digits after the decimal point results in 1.475999 (because the next digit is 4, so we round down).
When working with floating-point numbers, it is important to understand that each floating-point value represents one number exactly (unless it is a Not a Number [NaN] encoding). When you write 1.4769996 in source code, it is converted to a value representable in double. When you assign it to a float, it is converted to a value representable in float. Operations on the floating-point object behave as if the object has exactly the value it represents, not as if its value is the numeral you wrote in source code.
To provide some further details, the C standard requires (in C 2018 7.21.6.1 13) that formatting with f be correctly rounded if the number of digits requested is at most DECIMAL_DIG. DECIMAL_DIG is the number of decimal digits in the widest floating-point format the implementation supports such that converting any number in that format to a numeral with DECIMAL_DIG significant decimal digits and back to the floating-point format yields the original value (5.2.4.2.2 12). DECIMAL_DIG must be at least 10. If more than DECIMAL_DIG digits are requested, the C standard allows some leeway in rounding. However, high-quality C implementations will round correctly as specified by IEEE-754 (to the nearest number with the requested number of digits, with ties favoring an even low digit).
If you are trying to write your own printf, and if you are stuck on %f, there are three or four things you need to know:
When a "varargs" function like printf is called, arguments of type float are always implicitly promoted to type double. So when you've seen %f in the format string, and you're using va_arg() to pluck the next argument from the list, you'll want to pluck an argument of type double, not float. (This also means that you have just one case to handle, not two. Inside printf, you don't have to worry about handling type float at all.)
Printing the whole-number part of a double is easy; it's more or less the same problem as printing an int, which I'm guessing you've already figured out, if you've got %d working. And to do a straightforward, simpleminded job of printing the fractional part, it usually works pretty well to just repeatedly multiply by 10. That is, if you're trying to print 123.456, and you've already got the 123 part taken care of, you can then proceed to print the rest by taking the fractional part 0.456, multiplying by 10 to get 4.56 then truncating to get 4, then taking the new fractional part 0.56 and repeating.
There is no such number as 1.4769996. (There's no such number as the 123.456 I was just using, either.) When we write numbers like 1.4769996 and 123.456 we're thinking about decimal fractions, but most computers (including the one you're using) use binary fractions internally, and you can't represent decimal fractions like 1.4769996 and 123.456 exactly in binary, so the actual numbers are always a little bit different than you expect, which is why you often get slight "roundoff error", or extra 999's at the end when you expected 000.
Doing a proper job on this stuff is really, really hard. If you're trying to write your own printf, and you've gotten to %f, and if you can get it working pretty well most of the time, consider yourself lucky, and call it a day. Don't get bogged down on the last digit -- or if you're bound and determined to get the last digit right in every case (which is certainly a noble goal), do some research and set aside some time, because you're going to be working at it for a while.

How can I round a float to a given decimal precision

I need to convert a floating-point number with system precision to one with a specified precision (e.g. 3 decimal places) for the printed output. The fprintf function will not suffice for this as it will not correctly round some numbers. All the other solutions I've tried fail in that they all reintroduce undesired precision when I convert back to a float. For example:
float xf_round1_f(float input, int prec) {
printf("%f\t",input);
int trunc = round(input * pow(10, prec));
printf("%f\t",(float)trunc);
input=(float)trunc / pow(10, prec);
printf("%f\n",input);
return (input);
}
This function prints the input, the truncated integer and the output to each line, and the result looks like this for some numbers supposed to be truncated to 3 decimal places:
49.975002 49975.000000 49.974998
49.980000 49980.000000 49.980000
49.985001 49985.000000 49.985001
49.990002 49990.000000 49.990002
49.995003 49995.000000 49.994999
50.000000 50000.000000 50.000000
You can see that the second step works as intended - even when "trunc" is cast to float for printing - but as soon as I convert it back to a float the precision returns. The 1st and 5th rows illustrate problem cases.
Surely there must be a way of resolving this - even if the 1st row result remained 49.975002 a formatted print would give the desired effect, but in this case there is a real problem.
Any solutions?
Binary floating-point cannot represent most decimal numerals exactly. Each binary floating-point number is formed by multiplying an integer by a power of two. For the common implementation of float, IEEE-754 32-bit binary floating-point, that integer must be in (–2^24, 2^24). There is no integer x and integer y such that x•2^y exactly equals 49.975. Therefore, when you divide 49975 by 1000, the result must be an approximation.
If you merely need to format a number for output, you can do this with the usual fprintf format specifiers. If you need to compute exactly with such numbers, you may be able to do it by scaling them to representable values and doing the arithmetic either in floating-point or in integer arithmetic, depending on your needs.
Edit: it appears you may only care about the printed results. printf is generally smart enough to do proper rounding to the number of digits you specify. If you give a format of "%.3f" you will probably get what you need.
If your only problem is with the cases that are below the desired number, you can easily fix it by making everything higher than the desired number instead. Unfortunately this increases the absolute error of the answer; even a result that was exact before, such as 50.000 is now off.
Simply add this line to the end of the function:
input=nextafterf(input, input*1.0001);
See it in action at http://ideone.com/iHNTzs
49.975002 49975.000000 49.974998 49.975002
49.980000 49980.000000 49.980000 49.980003
49.985001 49985.000000 49.985001 49.985004
49.990002 49990.000000 49.990002 49.990005
49.995003 49995.000000 49.994999 49.995003
50.000000 50000.000000 50.000000 50.000004
If you require exact representation of all decimal fractions with three digits after the decimal point, you can work in thousandths. Use an integer data type to represent one thousand times the actual number for all intermediate results.
Fixed point numbers. That is where you keep the actual numbers in a wide precision integer format, for example long or long long. And you also keep the number of decimal places. And then you will also need methods to scale the fixed point number by the decimal places. And some way to convert to/from strings.
The reason you are having trouble is that 1/10 is not exactly representable as a sum of fractional powers of 2 (1/2, 1/4, 1/8, etc). This is the same reason that 1/3 is a repeating decimal in base 10 (0.33333...).

How do printf and scanf handle floating point precision formats?

Consider the following snippet of code:
float val1 = 214.20;
double val2 = 214.20;
printf("float : %f, %4.6f, %4.2f \n", val1, val1, val1);
printf("double: %f, %4.6f, %4.2f \n", val2, val2, val2);
Which outputs:
float : 214.199997, 214.199997, 214.20 | <- the correct value I wanted
double: 214.200000, 214.200000, 214.20 |
I understand that 214.20 has an infinite binary representation. The first two elements of the first line show an approximation of the intended value, but the last one seems to have no approximation at all, and this led me to the following question:
How do the scanf, fscanf, printf, fprintf (etc.) functions treat the precision formats?
With no precision provided, printf printed out an approximated value, but with %4.2f it gave the correct result. Can you explain to me the algorithm used by these functions to handle precision?
The thing is, 214.20 cannot be expressed exactly with binary representation. Few decimal numbers can. So an approximation is stored. Now when you use printf, the binary representation is turned into a decimal representation, but it again cannot be expressed exactly and is only approximated.
As you noticed, you can give a precision to printf to tell it how to round the decimal approximation. And if you don't give it a precision then a precision of 6 is assumed (see the man page for details).
If you use %.40f for the float and %.40lf for the double in your example above, you will get these results:
214.1999969482421875000000000000000000000000
214.1999999999999886313162278383970260620117
They are different because with double, there are more bits to better approximate 214.20. But as you can see, they are still very odd when represented in decimal.
I recommend to read the Wikipedia article on floating point numbers for more insights about how floating point numbers work. An excellent read is also What Every Computer Scientist Should Know About Floating-Point Arithmetic
Since you asked about scanf, one thing you should note is that POSIX requires printf and a subsequent scanf (or strtod) to reconstruct the original value exactly as long as sufficiently significant digits (at least DECIMAL_DIG, I believe) were printed. Plain C of course makes no such requirement; the C standard pretty much allows floating point operations to give whatever result the implementor likes as long as they document it. However, if your intent is to store floating point numbers in text files to be read back later, you might be better off using the C99 %a specifier to print them in hex. This way they'll be exact and there's no confusion about whether the serialize/deserialize process loses precision.
Keep in mind that when I said "reconstruct the original value", I mean the actual value which was held in the variable/expression passed to printf, not the original decimal you wrote in the source file which got rounded to the best binary representation by the compiler.
scanf will round the input value to the nearest exactly representable floating point value. As DarkDust's answer illustrates, in single precision the closest exactly representable value is below the exact value, and in double precision the closest exactly representable value is above the exact value.

C: How long can a double be when printed through printf()

I need to specify the exact length of a string to be printed from a double value, but I don't want to restrict the output any more than is necessary.
What is the maximum length that a 6-digit precision double will have when formatted by printf()?
Specifically, what value should I give to X in printf("%X.6lg",doubleValue); to ensure that no value gets truncated?
The reason I need to be specific about the length is that I'm defining an MPI derived datatype made up of lots of string representations of double values and must know their exact length in order to divide regions of the file between MPI processes.
I hope that's clear. Thanks in advance for answering.
Use printf("%.6g", doubleValue) for a double or printf("%.6Lg", longDoubleValue) for a long double.
Note that leaving off the leading digits (the "width") in the format specifier makes no demands on the length of the integral portion of the value. Also note that the lowercase "l" is the long modifier for integer conversions; on a floating-point conversion such as %g it has no effect, so %lg behaves like %g. The uppercase "L" specifies a long double value. Also note that if you don't want this to be potentially changed to scientific notation (if it is a shorter representation), then you would use "f" instead of "g". See a printf reference here.
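The maximum exponent of an IEEE double is 1023, so the largest double is (1 + 1/2 + 1/4 + 1/8 + ...) * 2^1023, which is about 1.8e308. In plain decimal notation (%f) that is 309 digits before the decimal point, or about 317 characters once you add a sign, the decimal point and six decimals.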
Why not use the "e" format specifier?
There's <float.h> containing many useful values, among them DECIMAL_DIG, which is for a long double however.
The same file will most likely tell you that a double on your platform has more than 6 digits of precision...
PS: Also note Demi's answer above. He points out various flaws in your printf() that escaped me.

What Comes After The %?

I've searched for this a little but I have not gotten a particularly straight answer. In C (and I guess C++), how do you determine what comes after the % when using printf? For example:
double radius = 1.0;
double area = 0.0;
area = calculateArea( radius );
printf( "%10.1f %10.2f\n", radius, area );
I took this example straight from a book that I have on the C language. This does not make sense to me at all. Where do you come up with 10.1f and 10.2f? Could someone please explain this?
http://en.wikipedia.org/wiki/Printf#printf_format_placeholders is Wikipedia's reference for format placeholders in printf. http://www.cplusplus.com/reference/clibrary/cstdio/printf.html is also helpful
Basically in a simple form it's %[width].[precision][type]. Width allows you to make sure that the value being printed is at least a certain length (useful for tables etc). Precision allows you to specify the precision a number is printed to (eg. decimal places etc) and the type informs C/C++ what kind of variable you've given it (character, integer, double etc).
Hope this helps
UPDATE:
To clarify using your examples:
printf( "%10.1f %10.2f\n", radius, area );
%10.1f (referring to the first argument: radius) means make it at least 10 characters long (ie. pad with spaces), and print it as a float with one decimal place.
%10.2f (referring to the second argument: area) means make it at least 10 characters long (as above) and print with two decimal places.
man 3 printf
on a Linux system will give you all the information you need. You can also find these manual pages online, for example at http://linux.die.net/man/3/printf
10.1f means floating point with 1 place after the decimal point and the 10 places before the decimal point. If the number has less than 10 digits, it's padded with spaces. 10.2f is the same, but with 2 places after the decimal point.
On every system I've seen, from Unix to Rails Migrations, this is not the case. @robintw expresses it best:
Basically in a simple form it's %[width].[precision][type].
That is, not "10 places before the decimal point," but "10 places, both before and after, and including the decimal point."
10.1f means floating point with 10 characters wide with 1 place after the decimal point.
If the printed number has fewer than 10 characters, it's padded with spaces.
10.2f is the same, but with 2 places after the decimal point.
You have these basic types:
%d - integer
%x - hex integer
%s - string
%c - char (only one)
%f - floating point (double; float arguments are promoted to double)
%d - signed int (decimal)
%i - signed int (integer) (same as decimal).
%u - unsigned int
%ld - long (signed) int
%lu - long unsigned int
%lld - long long (signed) int
%llu - long long unsigned int
Edit: there are several others listed in @Eli's response (man 3 printf).
10.1f means you want to display a float with 1 decimal and the displayed number should be 10 characters long.
In short, those values after the % tell printf how to interpret (or output) all of the variables coming later. In your example, radius is interpreted as a float (this is the 'f'), and the 10.1 gives information about the field width and how many decimal places to use when printing it out.
See this link for more details about all of the modifiers you can use with printf.
Man pages contain the information you want. To read what you have above:
printf( "%10.2f", 1.5 )
This will print:
      1.50
Whereas:
printf("%.2f", 1.5 )
Prints:
1.50
Note the justification of both.
Similarly:
printf("%10.1f", 1.5 )
Would print:
       1.5
Any number after the . is the precision you want printed. Any number before the . is the minimum field width; shorter output is padded with spaces on the left, so it ends up right-justified.
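One issue that hasn't been raised by others is whether a double needs a different format specifier from a float. For printf it does not: float arguments are promoted to double before the call, so %f, %e and %g all take a double, and C99 accepts %lf as a synonym for %f. The distinction matters for scanf, where the arguments are pointers to objects of different sizes:
%f - float * (scanf), double (printf)
%lf - double * (scanf), same as %f in printf
%Lf - long double * (scanf), long double (printf)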
