getfloat returning 23.7999 - c

#define MAXBUF 1000
int buf[MAXBUF];
int buffered = 0;
int bufp = 0;
int getch()
{
if(bufp > 0) {
if(!--bufp)
buffered = 0;
return buf[bufp];
}
else {
buffered = 0;
return getchar();
}
}
void ungetch(int c)
{
buf[bufp++] = c;
buffered = 1;
}
int getfloat(float *pn)
{
int c, sign, sawsign;
float power = 1.0;
while(isspace(c=getch()))
;
if(!isdigit(c) && c!= '+' && c!= '-' && c != '.') {
ungetch(c);
return 0;
}
sign = (c == '-') ? -1 : 1;
if(sawsign = (c == '-' || c == '+'))
c = getch();
if(c != '.' && !isdigit(c)) {
ungetch(c);
if(sawsign)
ungetch((sign == -1) ? '-' : '+');
return 0;
}
for(*pn = 0.0; isdigit(c); c = getch())
*pn = 10.0 * *pn + (float)(c - '0');
if(c == '.')
while(isdigit(c = getch())) {
*pn = 10.0 * *pn + (float)(c - '0');
power *= 10.0;
}
*pn *= sign;
*pn /= power;
ungetch(c);
return c;
}
It always returns 23.7999 when I enter 23.8, and I have no idea why. Can anybody tell me why?

Numbers are represented in base 2, and base-2 floating-point values cannot represent every base-10 decimal value exactly. What you enter as 23.8 gets converted into its closest equivalent base-2 value, which is not exactly 23.8. When you print this approximate value out, it gets printed as 23.7999.
You are also using float, which is the smallest floating-point type, and has only 24 bits of precision (roughly 7 decimal digits). If you switch to double, the number of bits of precision more than doubles (typically from 24 to 53), so the difference between a decimal value such as 23.8 and its double representation is much smaller. This may allow a printing routine to perform the rounding better so that you see 23.8 with double. However, the actual value in the variable is still not exactly 23.8.
As general advice, unless you have a huge number of floating-point values (making memory usage your primary concern), it is best to use double whenever you need a floating-point type. You don't get rid of all odd behavior but you're going to see less of it than with float.
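To make the difference concrete, here is a minimal sketch (not the asker's code) that measures how far the nearest float is from a given decimal value and formats the value both ways. The function names float_rounding_gap and format_both are made up for this illustration, and the sketch assumes float and double are the usual IEEE-754 32-bit and 64-bit types.

```c
#include <stdio.h>

/* Gap between x and the float nearest to x (x itself is already a double). */
double float_rounding_gap(double x)
{
    double as_float = (double)(float)x;   /* round to float, then widen back */
    return as_float > x ? as_float - x : x - as_float;
}

/* Write x with 10 significant digits, once via float and once via double. */
void format_both(double x, char *via_float, char *via_double, size_t len)
{
    snprintf(via_float, len, "%.10g", (double)(float)x);
    snprintf(via_double, len, "%.10g", x);
}
```

Under those assumptions, calling format_both(23.8, ...) should give 23.79999924 for the float path and 23.8 for the double path, while a value such as 0.5, which is exactly representable in binary, has a gap of zero.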

Because most decimal fractions cannot be represented exactly in binary floating point.

23.8 can't be represented exactly given the limited accuracy of IEEE 754 floats.

Related

Concatenating binary numbers

I am trying to code a program that will take a floating point number in base 10 and convert its fractional part in base 2. In the following code, I am intending to call my converting function into a printf, and format the output; the issue I have lies in my fra_binary() where I can't figure out the best way to return an integer made of the result of the conversion at each turn respectively (concatenation). Here is what I have done now (the code is not optimized because I am still working on it) :
#include <stdio.h>
#include <math.h>
int fra_binary(double fract) ;
int main()
{
long double n ;
double fract, deci ;
printf("base 10 :\n") ;
scanf("%Lf", &n) ;
fract = modf(n, &deci) ;
int d = deci ;
printf("base 2: %d.%d\n", d, fra_binary(fract)) ;
return(0) ;
}
int fra_binary(double F)
{
double fl ;
double decimal ;
int array[30] ;
for (int i = 0 ; i < 30 ; i++) {
fl = F * 2 ;
F = modf(fl, &decimal) ;
array[i] = decimal ;
if (F == 0) break ;
}
return array[0] ;
}
Obviously this returns partly the desired output, because I would need the whole array concatenated as one int or char to display the series of 1 and 0s I need. So at each turn, I want to use the decimal part of the number I work on as the binary number to concatenate (1 + 0 = 10 and not 1). How would I go about it?
Hope this makes sense!
return array[0] ; is only the first value of int array[30] set in fra_binary(). Code discards all but the first calculation of the loop for (int i = 0 ; i < 30 ; i++).
convert its fractional part in base 2
OP's loop idea is a good starting point. Yet int array[30] is insufficient to encode the fractional portion of all double into a "binary".
can't figure out the best way to return an integer
Returning an int will be insufficient. Instead consider using a string - or manage an integer array in a likewise fashion.
Use defines from <float.h> to drive the buffer requirements.
#include <stdio.h>
#include <math.h>
#include <float.h>
char *fra_binary(char *dest, double x) {
_Static_assert(FLT_RADIX == 2, "Unexpected FP base");
double deci;
double fract = modf(x, &deci);
fract = fabs(fract);
char *s = dest;
do {
double d;
fract = modf(fract * 2.0, &d);
*s++ = "01"[(int) d];
} while (fract);
*s = '\0';
// For debug
printf("%*.*g --> %.0f and .", DBL_DECIMAL_DIG + 8, DBL_DECIMAL_DIG, x,
deci);
return dest;
}
int main(void) {
// Perhaps 53 - -1021 + 1
char fraction_string[DBL_MANT_DIG - DBL_MIN_EXP + 1];
puts(fra_binary(fraction_string, -0.0));
puts(fra_binary(fraction_string, 1.0));
puts(fra_binary(fraction_string, acos(-1))); // machine pi
puts(fra_binary(fraction_string, -0.1));
puts(fra_binary(fraction_string, DBL_MAX));
puts(fra_binary(fraction_string, DBL_MIN));
puts(fra_binary(fraction_string, DBL_TRUE_MIN));
}
Output
-0 --> -0 and .0
1 --> 1 and .0
3.1415926535897931 --> 3 and .001001000011111101101010100010001000010110100011
-0.10000000000000001 --> -0 and .0001100110011001100110011001100110011001100110011001101
1.7976931348623157e+308 --> 179769313486231570814527423731704356798070600000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 and .0
2.2250738585072014e-308 --> 0 and .00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001
4.9406564584124654e-324 --> 0 and .000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001
Also unclear why input is long double, yet processing is with double. Recommend using just one FP type.
Note that your algorithm finds out the binary representation of the fraction most significant bit first.
One way to convert the fractional part to a binary string, would be to supply the function with a string and a string length, and have the function fill it with up to that many binary digits:
/* This function returns the number of chars needed in dst
to describe the fractional part of value in binary,
not including the trailing NUL ('\0').
Returns zero in case of an error (non-finite value).
*/
size_t fractional_bits(char *dst, size_t len, double value)
{
double fraction, integral;
size_t i = 0;
if (!isfinite(value))
return 0;
if (value > 0.0)
fraction = modf(value, &integral);
else
if (value < 0.0)
fraction = modf(-value, &integral);
else {
/* Zero fraction. */
if (len > 1) {
dst[0] = '0';
dst[1] = '\0';
} else
if (len > 0)
dst[0] = '\0';
/* One binary digit was needed for exact representation. */
return 1;
}
while (fraction > 0.0) {
fraction = fraction * 2.0;
if (fraction >= 1.0) {
fraction = fraction - 1.0;
if (i < len)
dst[i] = '1';
} else
if (i < len)
dst[i] = '0';
i++;
}
if (i < len)
dst[i] = '\0';
else
if (len > 0)
dst[len - 1] = '\0';
return i;
}
The above function works very much like snprintf(), except that it takes only the double whose fractional bits are to be stored as a string of binary digits (0 or 1), and returns 0 in case of an error (non-finite double value).
Another option is to use an unsigned integer type to hold the bits. For example, if your code is intended to work on architectures where double is an IEEE-754 Binary64 type or similar, the mantissa has up to 53 bits of precision, and a uint64_t would suffice.
Here is an example of that:
uint64_t fractional_bits(const double val, size_t bits)
{
double fraction, integral;
uint64_t result = 0;
if (bits < 1 || bits > 64) {
errno = EINVAL;
return 0;
}
if (!isfinite(val)) {
errno = EDOM;
return 0;
}
if (val > 0.0)
fraction = modf(val, &integral);
else
if (val < 0.0)
fraction = modf(-val, &integral);
else {
errno = 0;
return 0;
}
while (bits-->0) {
result = result << 1;
fraction = fraction * 2.0;
if (fraction >= 1.0) {
fraction = fraction - 1.0;
result = result + 1;
}
}
errno = 0;
return result;
}
The return value is the binary representation of the fractional part: fractional_part ≈ result / 2^bits, where bits is between 1 and 64, inclusive.
In order for the caller to detect an error, the function clears errno to zero if no error occurred. If an error does occur, the function returns zero with errno set to EDOM if the value is not finite, or to EINVAL if bits is less than 1 or greater than 64.
You can combine the two approaches, if you implement an arbitrary-size unsigned integer type, or a bitmap type.
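As a small illustration of how the integer form relates to the string form, the sketch below renders the low bits of such a result as '0' and '1' characters, most significant bit first. The helper name bits_to_string is hypothetical; it is not part of either function above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write the low `bits` bits of `result` into dst,
   most significant bit first. dst needs room for bits + 1 chars. */
char *bits_to_string(uint64_t result, size_t bits, char *dst)
{
    assert(bits >= 1 && bits <= 64);
    for (size_t i = 0; i < bits; i++) {
        size_t shift = bits - 1 - i;              /* position of this digit */
        dst[i] = ((result >> shift) & 1u) ? '1' : '0';
    }
    dst[bits] = '\0';
    return dst;
}
```

For example, the fraction 0.625 with bits = 3 yields the integer 5 from the uint64_t version above, and bits_to_string(5, 3, buf) writes "101", i.e. binary .101.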

K&R book exercise 4-2

I'm studying K&R book. I'm currently at chapter 4. I was reading the atof() function on page 71. Function atof(s) converts string to its double precision floating point equivalent.
The code of atof() is as follows:
//atof: convert string s to double
double atof2(char s[])
{
double val, power;
int i, sign;
for (i = 0; isspace(s[i]); ++i) //skip white space
;
sign = (s[i] == '-') ? -1: 1;
if (s[i] == '+' || s[i] == '-')
++i;
for (val = 0.0; isdigit(s[i]); i++)
val = 10.0 * val + (s[i] - '0');
if (s[i] == '.')
++i;
for (power = 1.0; isdigit(s[i]); i++) {
val = 10.0 * val + (s[i] - '0');
power *= 10.0;
}
return sign * val / power;
}
My question is about the variable power. What do we need it for?
I do understand the use of the variable val, but I'm not sure about power. Why do we divide val by power?
The variable power records, as a power of ten, how many digits followed the decimal point, so that dividing by it puts the decimal point back in the right place.
Take the string -12.83 as an example. The first for loop skips white space; there is none, so i stays at 0.
sign becomes -1 because s[i] = s[0] = '-'.
The next two loops accumulate the digits into val as if the string were the integer 1283; the '.' between them is skipped by the if statement.
After both loops val is 1283. The last loop ran twice, so power is 100.0 (10 * 1.0 after the first iteration, 10 * 10.0 after the second).
To get the floating-point value, val is divided by power and multiplied by sign.
So the function returns -1 * 1283 / 100, which is -12.83.
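The walk-through can be checked with a variant of the routine that also hands val and power back to the caller. This is only a sketch: the name atof_trace and the two out-parameters are additions for illustration and are not in the book.

```c
#include <ctype.h>

/* Same steps as the K&R routine, but exposes val and power to the caller.
   The out-parameters are an addition for illustration only. */
double atof_trace(const char *s, double *val_out, double *power_out)
{
    double val = 0.0, power = 1.0;
    int i = 0, sign;

    while (isspace((unsigned char)s[i]))      /* skip white space */
        i++;
    sign = (s[i] == '-') ? -1 : 1;
    if (s[i] == '+' || s[i] == '-')
        i++;
    for (; isdigit((unsigned char)s[i]); i++)
        val = 10.0 * val + (s[i] - '0');
    if (s[i] == '.')
        i++;
    for (; isdigit((unsigned char)s[i]); i++) {
        val = 10.0 * val + (s[i] - '0');
        power *= 10.0;                        /* one factor of 10 per fractional digit */
    }
    *val_out = val;
    *power_out = power;
    return sign * val / power;
}
```

Calling atof_trace("-12.83", &v, &p) leaves v at 1283 and p at 100, and the return value is -12.83 up to the usual binary rounding.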

Writing IEEE 754-1985 double as ASCII on a limited 16 bytes string

This is a follow-up to my original post. But I'll repeat it for clarity:
As per DICOM standard, a type of floating point can be stored using a Value Representation of Decimal String. See Table 6.2-1. DICOM Value Representations:
Decimal String: A string of characters representing either a fixed
point number or a floating point number. A fixed point number shall
contain only the characters 0-9 with an optional leading "+" or "-"
and an optional "." to mark the decimal point. A floating point number
shall be conveyed as defined in ANSI X3.9, with an "E" or "e" to
indicate the start of the exponent. Decimal Strings may be padded with
leading or trailing spaces. Embedded spaces are not allowed.
"0"-"9", "+", "-", "E", "e", "." and the SPACE character of Default
Character Repertoire. 16 bytes maximum
The standard is saying that the textual representation is fixed point vs. floating point. The standard only refers to how the values are represented within the DICOM data set itself. As such there is no requirement to load a fixed point textual representation into a fixed-point variable.
So now that it is clear that the DICOM standard implicitly recommends double (IEEE 754-1985) for representing a Value Representation of type Decimal String (maximum of 16 significant digits), my question is: how do I use the standard C I/O library to convert this binary representation from memory back into ASCII in this size-limited string?
From random sources on the internet, this is non-trivial, but a generally accepted solution is either:
printf("%1.16e\n", d); // Round-trippable double, always with an exponent
or
printf("%.17g\n", d); // Round-trippable double, shortest possible
Of course both expressions are invalid in my case, since they can produce output much longer than my limited maximum of 16 bytes. So what is the solution for minimizing the loss in precision when writing out an arbitrary double value to a limited 16-byte string?
Edit: if this is not clear, I am required to follow the standard. I cannot use hex/uuencode encoding.
Edit 2: I am running the comparison using travis-ci see: here
So far the suggested codes are:
Serge Ballesta
chux
Mark Dickinson
chux
Results I see over here are:
compute1.c leads to a total sum error of: 0.0095729050923877828
compute2.c leads to a total sum error of: 0.21764383725715469
compute3.c leads to a total sum error of: 4.050031792674619
compute4.c leads to a total sum error of: 0.001287056579548422
So compute4.c leads to the best possible precision (0.001287056579548422 < 4.050031792674619), but triple (x3) the overall execution time (only tested in debug mode using time command).
It is trickier than first thought.
Given the various corner cases, it seems best to try at a high precision and then work down as needed.
Any negative number prints the same as a positive number with 1 less precision due to the '-'.
'+' sign not needed at the beginning of the string nor after the 'e'.
'.' not needed.
Dangerous to use anything other than sprintf() to do the mathematical part given so many corner cases. Given various rounding modes, FLT_EVAL_METHOD, etc., leave the heavy coding to well established functions.
When an attempt is too long by more than 1 character, iterations can be saved. E.g. If an attempt, with precision 14, resulted with a width of 20, no need to try precision 13 and 12, just go to 11.
Scaling of the exponent due to the removal of the '.' must be done after sprintf() to avoid 1) injecting computational error and 2) decrementing a double to below its minimum exponent.
Maximum relative error is less than 1 part in 2,000,000,000 as with -1.00000000049999e-200. Average relative error about 1 part in 50,000,000,000.
14 digit precision, the highest, occurs with numbers like 12345678901234e1 so start with 16-2 digits.
static size_t shrink(char *fp_buffer) {
int lead, expo;
long long mant;
int n0, n1;
int n = sscanf(fp_buffer, "%d.%n%lld%ne%d", &lead, &n0, &mant, &n1, &expo);
assert(n == 3);
return sprintf(fp_buffer, "%d%0*llde%d", lead, n1 - n0, mant,
expo - (n1 - n0));
}
int x16printf(char *dest, size_t width, double value) {
if (!isfinite(value)) return 1;
if (width < 5) return 2;
if (signbit(value)) {
value = -value;
strcpy(dest++, "-");
width--;
}
int precision = width - 2;
while (precision > 0) {
char buffer[width + 10];
// %.*e prints 1 digit, '.' and then `precision - 1` digits
snprintf(buffer, sizeof buffer, "%.*e", precision - 1, value);
size_t n = shrink(buffer);
if (n <= width) {
strcpy(dest, buffer);
return 0;
}
if (n > width + 1) precision -= n - width - 1;
else precision--;
}
return 3;
}
Test code
double rand_double(void) {
union {
double d;
unsigned char uc[sizeof(double)];
} u;
do {
for (size_t i = 0; i < sizeof(double); i++) {
u.uc[i] = rand();
}
} while (!isfinite(u.d));
return u.d;
}
void x16printf_test(double value) {
printf("%-27.*e", 17, value);
char buf[16+1];
buf[0] = 0;
int y = x16printf(buf, sizeof buf - 1, value);
printf(" %d\n", y);
printf("'%s'\n", buf);
}
int main(void) {
for (int i = 0; i < 10; i++)
x16printf_test(rand_double());
}
Output
-1.55736829786841915e+118 0
'-15573682979e108'
-3.06117209691283956e+125 0
'-30611720969e115'
8.05005611774356367e+175 0
'805005611774e164'
-1.06083057094522472e+132 0
'-10608305709e122'
3.39265065244054607e-209 0
'33926506524e-219'
-2.36818580315246204e-244 0
'-2368185803e-253'
7.91188576978592497e+301 0
'791188576979e290'
-1.40513111051994779e-53 0
'-14051311105e-63'
-1.37897140950449389e-14 0
'-13789714095e-24'
-2.15869805640288206e+125 0
'-21586980564e115'
For finite floating point values the printf() format specifier "%e" well matches
"A floating point number shall be ... with an "E" or "e" to indicate the start of the exponent"
[−]d.ddd...ddde±dd
The sign is present with negative numbers and likely -0.0. The exponent is at least 2 digits.
If we assume DBL_MAX < 1e1000, (safe for IEEE 754-1985 double), then the below works in all cases: 1 optional sign, 1 lead digit, '.', 8 digits, 'e', sign, up to 3 digits.
(Note: the "16 bytes maximum" does not seem to refer to C string null character termination. Adjust by 1 if needed.)
// Room for 16 printable characters.
char buf[16+1];
int n = snprintf(buf, sizeof buf, "%.*e", 8, x);
assert(n >= 0 && n < sizeof buf);
puts(buf);
But this reserves room for the optional sign and 2 to 3 exponent digits.
The trick is the boundary, due to rounding, of when a number uses 2 or uses 3 exponent digits is fuzzy. Even testing for negative numbers, the -0.0 is an issue.
[Edit] Also needed test for very small numbers.
Candidate:
// Room for 16 printable characters.
char buf[16+1];
assert(isfinite(x)); // for now, only address finite numbers
int precision = 8+1+1;
if (signbit(x)) precision--; // Or simply `if (x <= 0.0) precision--;`
if (fabs(x) >= 9.99999999e99) precision--; // some refinement possible here.
else if (fabs(x) <= 1.0e-99) precision--;
int n = snprintf(buf, sizeof buf, "%.*e", precision, x);
assert(n >= 0 && n < sizeof buf);
puts(buf);
Additional concerns:
Some compilers print at least 3 exponent digits.
The maximum number of significant decimal digits needed for an IEEE 754-1985 double depends on the definition of need, but is likely about 15-17. See: Printf width specifier to maintain precision of floating-point value
Candidate 2: One time test for too long an output
// Room for N printable characters.
#define N 16
char buf[N+1];
assert(isfinite(x)); // for now, only address finite numbers
int precision = N - 2 - 4; // 1.xxxxxxxxxxe-dd
if (signbit(x)) precision--;
int n = snprintf(buf, sizeof buf, "%.*e", precision, x);
if (n >= sizeof buf) {
n = snprintf(buf, sizeof buf, "%.*e", precision - (n - sizeof buf) - 1, x);
}
assert(n >= 0 && n < sizeof buf);
puts(buf);
C library formatter has no direct format for your requirement. At a simple level, if you can accept the waste of characters of the standard %g format (e20 is written e+020: 2 chars wasted), you can:
generate the output for the %.17g format
if it is longer than 16 characters, compute the precision that would lead to 16
generate the output for that format.
Code could look like:
void encode(double f, char *buf) {
char line[40];
char format[8];
int prec;
int l;
l = sprintf(line, "%.17g", f);
if (l > 16) {
prec = 33 - strlen(line);
l = sprintf(line, "%.*g", prec, f);
while(l > 16) {
/* putc('.', stdout);*/
prec -=1;
l = sprintf(line, "%.*g", prec, f);
}
}
strcpy(buf, line);
}
If you really try to be optimal (meaning write e30 instead of e+030), you could try to use %1.16e format and post-process the output. Rationale (for positive numbers):
the %1.16e format allows you to separate the mantissa and the exponent (base 10)
if the exponent is between size-2 (included) and size (excluded): just correctly round the mantissa to the int part and display it
if the exponent is between 0 and size-2 (both included): display the rounded mantissa with the dot correctly placed
if the exponent is between -1 and -3 (both included): start with a dot, add any needed zeros and fill with the rounded mantissa
else use a e format with minimal size for the exponent part and fill with the rounded mantissa
Corner cases:
for negative numbers, put a starting - and add the display for the opposite number and size-1
rounding : if first rejected digit is >=5, increase preceding number and iterate if it was a 9. Process 9.9999999999... as a special case rounded to 10
Possible code:
void clean(char *mant) {
char *ix = mant + strlen(mant) - 1;
while(('0' == *ix) && (ix > mant)) {
*ix-- = '\0';
}
if ('.' == *ix) {
*ix = '\0';
}
}
int add1(char *buf, int n) {
if (n < 0) return 1;
if (buf[n] == '9') {
buf[n] = '0';
return add1(buf, n-1);
}
else {
buf[n] += 1;
}
return 0;
}
int doround(char *buf, unsigned int n) {
char c;
if (n >= strlen(buf)) return 0;
c = buf[n];
buf[n] = 0;
if ((c >= '5') && (c <= '9')) return add1(buf, n-1);
return 0;
}
int roundat(char *buf, unsigned int i, int iexp) {
if (doround(buf, i) != 0) {
iexp += 1;
switch(iexp) {
case -2:
strcpy(buf, ".01");
break;
case -1:
strcpy(buf, ".1");
break;
case 0:
strcpy(buf, "1.");
break;
case 1:
strcpy(buf, "10");
break;
case 2:
strcpy(buf, "100");
break;
default:
sprintf(buf, "1e%d", iexp);
}
return 1;
}
return 0;
}
void encode(double f, char *buf, int size) {
char line[40];
char *mant = line + 1;
int iexp, lexp, i;
char exp[6];
if (f < 0) {
f = -f;
size -= 1;
*buf++ = '-';
}
sprintf(line, "%1.16e", f);
if (line[0] == '-') {
f = -f;
size -= 1;
*buf++ = '-';
sprintf(line, "%1.16e", f);
}
*mant = line[0];
i = strcspn(mant, "eE");
mant[i] = '\0';
iexp = strtol(mant + i + 1, NULL, 10);
lexp = sprintf(exp, "e%d", iexp);
if ((iexp >= size) || (iexp < -3)) {
i = roundat(mant, size - 1 -lexp, iexp);
if(i == 1) {
strcpy(buf, mant);
return;
}
buf[0] = mant[0];
buf[1] = '.';
strncpy(buf + i + 2, mant + 1, size - 2 - lexp);
buf[size-lexp] = 0;
clean(buf);
strcat(buf, exp);
}
else if (iexp >= size - 2) {
roundat(mant, iexp + 1, iexp);
strcpy(buf, mant);
}
else if (iexp >= 0) {
i = roundat(mant, size - 1, iexp);
if (i == 1) {
strcpy(buf, mant);
return;
}
strncpy(buf, mant, iexp + 1);
buf[iexp + 1] = '.';
strncpy(buf + iexp + 2, mant + iexp + 1, size - iexp - 1);
buf[size] = 0;
clean(buf);
}
else {
int j;
i = roundat(mant, size + 1 + iexp, iexp);
if (i == 1) {
strcpy(buf, mant);
return;
}
buf[0] = '.';
for(j=0; j< -1 - iexp; j++) {
buf[j+1] = '0';
}
if ((i == 1) && (iexp != -1)) {
buf[-iexp] = '1';
buf++;
}
strncpy(buf - iexp, mant, size + 1 + iexp);
buf[size] = 0;
clean(buf);
}
}
I think your best option is to use printf("%.17g\n", d); to generate an initial answer and then trim it. The simplest way to trim it is to drop digits from the end of the mantissa until it fits. This actually works very well but will not minimize the error because you are truncating instead of rounding to nearest.
A better solution would be to examine the digits to be removed, treating them as an n-digit number between 0.0 and 1.0, so '49' would be 0.49. If their value is less than 0.5 then just remove them. If their value is greater than 0.50 then increment the printed value in its decimal form. That is, add one to the last digit, with wrap-around and carry as needed. Any trailing zeroes that are created should be trimmed.
The only time this becomes a problem is if the carry propagates all the way to the first digit and overflows it from 9 to zero. This might be impossible, but I don't know for sure. In this case (+9.99999e17) the answer would be +1e18, so as long as you have tests for that case you should be fine.
So, print the number, split it into sign/mantissa strings and an exponent integer, and string manipulate them to get your result.
Printing in decimal cannot work because for some numbers a 17 digit mantissa is needed which uses up all of your space without printing the exponent. To be more precise, printing a double in decimal sometimes requires more than 16 characters to guarantee accurate round-tripping.
Instead you should print the underlying binary representation using hexadecimal. This will use exactly 16 bytes, assuming that a null-terminator isn't needed.
If you want to print the results using fewer than 16 bytes then you can basically uuencode it. That is, use more than 16 digits so that you can squeeze more bits into each digit. If you use 64 different characters (six bits) then a 64-bit double can be printed in eleven characters. Not very readable, but tradeoffs must be made.
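The hexadecimal idea from this answer can be sketched as follows. It assumes double is a 64-bit IEEE-754 value (only the size is checked at compile time), and the function names double_to_hex16 and hex16_to_double are invented for this example. Note that the digits a-f are outside the character repertoire quoted in the question, so this shows the exact round trip rather than a conforming Decimal String.

```c
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Assumption: double and uint64_t have the same size (IEEE-754 binary64). */
_Static_assert(sizeof(double) == sizeof(uint64_t), "double must be 64 bits");

/* Write the 64 bits of d as exactly 16 hex digits plus a terminating NUL. */
void double_to_hex16(double d, char out[17])
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);           /* well-defined way to read the bits */
    snprintf(out, 17, "%016" PRIx64, bits);
}

/* Inverse of double_to_hex16(); returns 0.0 if s holds no hex digits. */
double hex16_to_double(const char *s)
{
    uint64_t bits = 0;
    double d;
    sscanf(s, "%16" SCNx64, &bits);
    memcpy(&d, &bits, sizeof d);
    return d;
}
```

Because the bits are copied rather than converted, the round trip is exact: on such a platform 1.0 becomes 3ff0000000000000 and decodes back to exactly 1.0.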

convert number from base n to an integer

So I'm hoping to get a little guidance on this one. I have a function that takes a radix(base) and then using getchar() will get the number to convert from the given radix to an integer representation.
The only argument given is the radix number, then getchar() gets the number representation via the command line.
So if I pass
str2int 16
input a number: 3c
It should output (16^1*3) + (16^0*12) = 48 + 12 = 60.
I fully understand the math, and different ways of converting bases, but don't know how to go about coding something up. The math is always MUCH easier than the code, at least to me.
Another way to compute would be:
(702) base 15 = 15*7 + 0 = 105; 15*105 + 2 = 1577
I don't know how to express this in C only using getchar()? Is it possible to do without using the math function?
Keep getting one char at a time until not a digit or no more are needed.
unsigned shparkison(unsigned base) {
unsigned sum = 0;
int ch;
while ((ch = getchar()) != EOF) {
// one could instead look up the toupper(value) in an array "0123...ABC...Z";
// Following assumes ASCII
if (isdigit(ch)) ch -= '0';
else if (islower(ch)) ch -= 'a' - 10;
else if (isupper(ch)) ch -= 'A' - 10;
else {
break; // Not a digit
}
if (ch >= base) {
break; // Digit too high
}
unsigned sum_old = sum;
sum *= base;
sum += ch;
if (sum < sum_old) {
sum = sum_old;
break; // Overflow
}
}
ungetc(ch, stdin);
return sum;
}
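The same multiply-and-add step (the "15*7 + 0, then 15*105 + 2" scheme from the question) can be tried without stdin by reading digits from a string. This is a sketch only: horner_value is a made-up name, it assumes letters are contiguous as in ASCII, and it does no overflow checking.

```c
#include <assert.h>
#include <ctype.h>

/* Returns the value of `digits` in `base` (2..36); stops at the first
   character that is not a valid digit for that base. No overflow check. */
unsigned long horner_value(const char *digits, unsigned base)
{
    unsigned long sum = 0;

    assert(base >= 2 && base <= 36);
    for (; *digits != '\0'; digits++) {
        unsigned char ch = (unsigned char)*digits;
        unsigned d;

        if (isdigit(ch))      d = ch - '0';
        else if (islower(ch)) d = ch - 'a' + 10;   /* assumes contiguous letters */
        else if (isupper(ch)) d = ch - 'A' + 10;
        else break;                                /* not a digit */
        if (d >= base)
            break;                                 /* digit too high for this base */
        sum = sum * base + d;                      /* Horner step */
    }
    return sum;
}
```

With this, horner_value("702", 15) gives 1577 and horner_value("3c", 16) gives 60, matching the two examples in the question.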

Max Float Fraction

Platform: Linux 3.2.0 (Debian 7.0)
Compiler: GCC 4.7.2 (Debian 4.7.2-5)
I am trying to convert character strings into floats. I am aware that there are already functions that do this. I am only doing this for practice. My function works well with simple numbers like 123.123, 1234, -678.8. But when I try to convert the string .99999999 into a float I end up with 1. Which is obviously a problem. I do not know if this is because .99999999 cannot be expressed by a float or if I am doing something incorrectly. The question I am asking is how can I calculate the maximum fraction that a float can express. How do I, for lack of a better term, know when a float is about to overflow?
Here is what I have so far.
#include <stdio.h>
#include <stdlib.h>
#include <float.h>
int cstrtof(const char *cstr, float *f);
int cstrtof(const char *cstr, float *f)
{
unsigned short int i = 0;
unsigned short int bool_fraction = 0;
float tmp_f = 0;
if(cstr[0] == '\000') return -1;
else if(cstr[0] == '-') i = 1;
for(; cstr[i] != '\000'; i++)
{
printf("tmp_f = %f\n", tmp_f);
if(cstr[i] >= '0' && cstr[i] <= '9')
{
if(tmp_f > (FLT_MAX - (cstr[i] - '0')) / 10) return -2;
else tmp_f = tmp_f * 10 + (cstr[i] - '0');
}
else if(cstr[i] == '.') bool_fraction = i+1;
else return i+1;
}
printf("tmp_f = %f\nbool_fraction = %i\n", tmp_f, bool_fraction);
if(bool_fraction)
{
for(bool_fraction--; bool_fraction < i-1; bool_fraction++, tmp_f /= 10)
{
printf("tmp_f = %f\n", tmp_f);
}
}
printf("tmp_f = %f\nbool_fraction = %i\n", tmp_f, bool_fraction);
if(cstr[0] == '-') *f = tmp_f*-1;
else *f = tmp_f;
return 0;
}
int main(int argc, char *argv[])
{
float f = 0;
int return_value = 0;
return_value = cstrtof(argv[1], &f);
if(return_value == 0)
{
printf("f = %.11f\n", f);
}
else if(return_value == -1)
{
printf("ERROR Empty String\n");
}
else if(return_value == -2)
{
printf("ERROR Data Type Overflow\n");
}
else
{
printf("ERROR Invalid character '%c'\n", argv[1][return_value-1]);
}
return 0;
}
Also cstrtof() is based on the following function.
int cstrtoslli(const char *cstr, signed long long int *slli)
{
unsigned short int i = 0;
signed long long int tmp_slli = 0;
if(cstr[0] == '\000') return -1;
else if(cstr[0] == '-') i = 1;
else if(cstr[0] == '0') return -2;
for(; cstr[i] != '\000'; i++)
{
if(cstr[i] >= '0' && cstr[i] <= '9')
{
//LLONG_MAX is defined in limits.h
if(tmp_slli > (LLONG_MAX - (cstr[i] - '0')) / 10) return -3;
else tmp_slli = tmp_slli * 10 + (cstr[i] - '0');
}
else return i+1;
}
if(cstr[0] == '-') *slli = tmp_slli*-1;
else *slli = tmp_slli;
return 0;
}
The largest representable float value less than 1 is returned by nexttowardf(1, -INFINITY).
This will generally have a different fraction part than, for example, the largest representable float value less than 2, which is nexttowardf(2, -INFINITY). This is because numbers of different magnitudes generally have different numbers of bits available for the fraction part (because some of the bits are used for the integer part). Large numbers have zero bits for the fraction part.
When float is an IEEE-754 32-bit binary floating-point value, which is common in modern implementations, the largest float below 1 is 0.999999940395355224609375. When the routines that convert decimal numerals to float are good quality and the rounding mode is to-nearest (the common default), then the point where numbers switch from rounding to 0.999999940395355224609375 to rounding to 1 is halfway between those two values (and the exact midpoint will round to 1).
Properly converting decimal numerals to binary floating-point is complicated. It is a solved problem and there are academic papers about it, but you should generally rely on existing library code, if it is doing the job properly. Doing it correctly yourself will require a significant investment of time.
The question I am asking is how can I calculate the maximum fraction that a float can express. How do I, for lack of a better term, know when a float is about to overflow?
You are looking for:
#include <math.h>
… nextafterf(1.0f, 0.0f) …
But you should familiarize yourself with C99's hexadecimal notation for floats, and then you can write the constant directly : 0x1.fffffep-1.
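A short sketch of both points, assuming float is an IEEE-754 32-bit binary type and the default round-to-nearest mode. The function names are made up for this example, and the first one uses FLT_EPSILON from <float.h> in place of nextafterf(): the spacing of floats just below 1 is half of FLT_EPSILON.

```c
#include <float.h>

/* Largest float below 1, assuming a binary float type (FLT_RADIX == 2). */
float largest_float_below_one(void)
{
    return 1.0f - FLT_EPSILON / 2;   /* 1 - 2^-24 when float is IEEE-754 binary32 */
}

/* Nonzero when x rounds to exactly 1.0f on conversion to float. */
int rounds_to_one(double x)
{
    return (float)x == 1.0f;
}
```

Under those assumptions largest_float_below_one() equals 0x1.fffffep-1f, and rounds_to_one(0.99999999) is true because 0.99999999 lies above the midpoint between that value and 1. This is why .99999999 comes out as 1 even from a correct converter, whereas 0.9999999 (seven nines) stays below 1.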

Resources