What is ilogb() in C?

What does ilogb() do? I looked it up on Google but I didn't understand it.
#include <stdio.h>
#include <math.h>

int main()
{
    float num;
    printf("%f", (float)ilogb(125));
    return 0;
}
Output:
6.000000
Process returned 0 (0x0) execution time : 0.764 s
Press any key to continue.
Why does it return 6?

What does ilogb() do?
Why does it return 6?
When FLT_RADIX, defined in <float.h>, has value 2, ilogb() returns the position of the most significant 1 in the binary representation of the argument.
For example: 125 is "0b1111101"; the most significant 1 is at position 6; ilogb(125) returns 6 as your code printed.
Or 0.25 is "0b0.01"; ilogb(0.25) returns -2.

ilogb() is a logarithm base 2 operation, but it's meant to be used specifically on floating point numbers, and returns an integer. To understand what it means to take the exponent part of a floating point number you need to know about floating point numbers.
A floating point number isn't just one number, but a combination of three. Even though you write -333.6, inside the computer it looks more like (-1)*(3.336)*(10^2), scientific notation, except that floats use a 2 in place of the 10. So you have one bit for the sign, a few bits for the exponent, and a few for the actual digits (the mantissa). This is what the number 125 looks like as a single-precision floating point number, which is what C uses for float:
0 10000101 11110100000000000000000 //notice the first digit of the mantissa is missing because it's always 1
(sign = 0) (biased exponent = 10000101, i.e. 133 = 6 + 127) (mantissa = 1.111101)
In binary, this is (+1)*(1.111101)*(10^110) with the exponent unbiased; in decimal it is (+1)*(1.953125)*(2^6) = 125.
Back to ilogb(): what this does is return the unbiased exponent of the number, in this case binary 110, which is 6.
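A minimal sketch to experiment with, printing the result as the int that ilogb() actually returns rather than casting it to float:

#include <stdio.h>
#include <math.h>

int main(void)
{
    /* ilogb() takes a floating-point argument and returns the unbiased
       binary exponent as a plain int, so print it with %d. */
    printf("ilogb(125)  = %d\n", ilogb(125.0)); /* 6, since 2^6 <= 125 < 2^7 */
    printf("ilogb(0.25) = %d\n", ilogb(0.25));  /* -2, since 0.25 == 2^-2    */
    printf("ilogb(1.0)  = %d\n", ilogb(1.0));   /* 0, since 1.0 == 2^0       */
    return 0;
}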

Related

How are results rounded in floating-point arithmetic?

I wrote this code that simply sums a list of n numbers, to practice with floating point arithmetic, and I don't understand this:
I am working with float, which means I have 7 digits of precision. Therefore, if I do the operation 10002*10002=100040004, the result in the float data type will be 100040000.000000, since I lose any digit beyond the 7th (the program still knows the exponent, as seen here).
If the input in this program is
3
10000
10001
10002
You will see, however, that when this program computes 30003*30003 (which is 900180009) it prints 30003*30003=900180032.000000.
I understand this 32 appears because I am working with float, and my goal is not to make the program more precise but to understand why this is happening. Why is it 900180032.000000 and not 900180000.000000? Why does this decimal noise (32) appear in 30003*30003 and not in 10002*10002, even when the magnitudes of the numbers are the same? Thank you for your time.
#include <stdio.h>
#include <math.h>
#define MAX_SIZE 200

int main()
{
    int numbers[MAX_SIZE];
    int i, N;
    float sum = 0;
    float sumb = 0;
    float sumc = 0;

    printf("introduce n");
    scanf("%d", &N);
    printf("write %d numbers:\n", N);
    for (i = 0; i < N; i++)
    {
        scanf("%d", &numbers[i]);
    }

    int r = 0;
    while (r < N) {
        sum = sum + numbers[r];
        sumb = sumb + (numbers[r] * numbers[r]);
        printf("sum is %f\n", sum);
        printf("sumb is %f\n", sumb);
        r++;
    }
    sumc = (sum * sum);
    printf("sumc is %f\n", sumc);
}
As explained below, the computed result of multiplying 10,002 by 10,002 must be a multiple of eight, and the computed result of multiplying 30,003 by 30,003 must be a multiple of 64, due to the magnitudes of the numbers and the number of bits available for representing them. Although your question asks about “decimal noise,” there are no decimal digits involved here. The results are entirely due to rounding to multiples of powers of two. (Your C implementation appears to use the common IEEE 754 format for binary floating-point.)
When you multiply 10,002 by 10,002, the computed result must be a multiple of eight. I will explain why below. The mathematical result is 100,040,004. The nearest multiples of eight are 100,040,000 and 100,040,008. They are equally far from the exact result, and the rule used to break ties chooses the even multiple (100,040,000 is eight times 12,505,000, an even number, while 100,040,008 is eight times 12,505,001, an odd number).
Many C implementations use IEEE 754 32-bit basic binary floating-point for float. In this format, a number is represented as an integer M multiplied by a power of two, 2^e. The integer M must be less than 2^24 in magnitude. The exponent e may be from −149 to 104. These limits come from the numbers of bits used to represent the integer and the exponent.
So all float values in this format have the value M • 2^e for some M and some e. There are no decimal digits in the format, just an integer multiplied by a power of two.
Consider the number 100,040,004. The biggest M we can use is 16,777,215 (2^24−1). That is not big enough that we can write 100,040,004 as M • 2^0. So we must increase the exponent. Even with 2^2, the biggest we can get is 16,777,215 • 2^2 = 67,108,860. So we must use 2^3. And that is why the computed result must be a multiple of eight, in this case.
So, to produce a result for 10,002•10,002 in float, the computer uses 12,505,000 • 2^3, which is 100,040,000.
In 30,003•30,003, the result must be a multiple of 64. The exact result is 900,180,009. 2^5 is not enough because 16,777,215 • 2^5 is 536,870,880. So we need 2^6, which is 64. The two nearest multiples of 64 are 900,179,968 and 900,180,032. In this case, the latter is closer (23 away versus 41 away), so it is chosen.
(While I have described the format as an integer times a power of two, it can also be described as a binary numeral with one binary digit before the radix point and 23 binary digits after it, with the exponent range adjusted to compensate. These are mathematically equivalent. The IEEE 754 standard uses the latter description. Textbooks may use the former description because it makes analyzing some of the numerical properties easier.)
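You can reproduce both roundings directly; a minimal sketch, assuming 32-bit int and IEEE 754 float as described above:

#include <stdio.h>

int main(void)
{
    /* Both exact products need more than 24 significant bits, so the
       conversion to float must round to a nearby multiple of a power of two. */
    printf("%.1f\n", (float)(10002 * 10002)); /* 100040000.0: a tie between multiples
                                                 of 8, broken by rounding to even   */
    printf("%.1f\n", (float)(30003 * 30003)); /* 900180032.0: nearest multiple of 64 */
    return 0;
}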
Floating point arithmetic is done in binary, not in decimal.
Floats actually have 24 binary bits of precision in the significand: 23 explicitly stored bits plus one implied leading 1 bit. This converts to approximately 7 decimal digits of precision.
The number you're looking at, 900180032, is already 9 digits long, so it makes sense that the last two digits (the 32) might be wrong. The rounding, like the arithmetic, is done in binary, so the reason for the difference in rounding can only be seen by breaking things down into binary.
900180032 = 110101101001111010100001000000
900180000 = 110101101001111010100000100000
If you count from the first 1 to the last 1 in each of those numbers, that is how many significant bits it takes to store the number. 900180032 spans only 24 significant bits, which fits in a float (the leading 1 is implicit, so just 23 need to be stored explicitly), while 900180000 spans 25 significant bits, which makes 900180000 an impossible number to store as a float. 900180032 is the closest number to the correct answer, 900180009, that a float can store.
In the other example
100040000 = 101111101100111110101000000
100040004 = 101111101100111110101000100
The correct answer, 100040004, spans 25 significant bits, too many for a float. The nearest representable number is 100040000, which spans only 21 significant bits. (100040008, spanning 24 bits, is equally close; the tie is broken by rounding to even, as described in the answer above.)
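Here is a small sketch of that counting rule (my own helper, assuming unsigned is at least 32 bits):

#include <stdio.h>

/* Count the span from the highest set bit to the lowest set bit;
   a value fits in a float exactly only if this span is at most 24. */
static int significant_bits(unsigned v)
{
    int count = 0;
    if (v == 0)
        return 0;
    while (!(v & 1u))  /* strip trailing zeros (the power-of-two factor) */
        v >>= 1;
    while (v) {        /* count the remaining bits */
        v >>= 1;
        count++;
    }
    return count;
}

int main(void)
{
    printf("%d\n", significant_bits(900180032u)); /* 24 -> representable     */
    printf("%d\n", significant_bits(900180000u)); /* 25 -> not representable */
    printf("%d\n", significant_bits(100040000u)); /* 21 -> representable     */
    printf("%d\n", significant_bits(100040004u)); /* 25 -> not representable */
    return 0;
}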
For more on how floating-point arithmetic works, try here: http://steve.hollasch.net/cgindex/coding/ieeefloat.html

Using floorf to reduce the number of decimals

I would like to use the first five digits of a number for computation.
For example,
A floating point number: 4.23654897E-05
I wish to use 4.2365E-05. I tried the following:
#include <math.h>
#include <stdio.h>

float num = 4.23654897E-05;

int main(){
    float rounded_down = floorf(num * 10000) / 10000;
    printf("%f", rounded_down);
    return 0;
}
The output is 0.000000. The desired output is 4.2365E-05.
In short, say 52 bits are allocated for storing the mantissa. Is there a way to reduce the number of bits being allocated?
Any suggestions on how this can be done?
A number x that is positive and within the normal range can be rounded down approximately to five significant digits with:
double l = pow(10, floor(log10(x)) - 4);
double y = l * floor(x / l);
This is useful only for tinkering with floating-point arithmetic as a learning tool. The exact mathematical result is generally not exactly representable, because binary floating-point cannot represent most decimal values exactly. Additionally, rounding errors can occur in the pow, /, and * operations that may cause the result to differ slightly from the true mathematical result of rounding x to five significant digits. Also, poor implementations of log10 or pow can cause the result to differ from the true mathematical result.
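A complete program around that snippet might look like this (the printf format is mine, added only to show the rounded value in scientific notation; the caveats above still apply):

#include <math.h>
#include <stdio.h>

int main(void)
{
    double x = 4.23654897E-05;
    /* Scale so the fifth significant digit lands at the units place,
       round down, then undo the scaling. */
    double l = pow(10, floor(log10(x)) - 4);
    double y = l * floor(x / l);
    printf("%.4E\n", y); /* expected: 4.2365E-05 */
    return 0;
}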
I'd go:
printf("%.6f", num);
Or you can try using snprintf(), declared in stdio.h:
float num = 4.23654897E-05;
char output[50];
snprintf(output, 50, "%f", num);
printf("%s", output);
The result is expected. The multiplication by 10000 yields 0.42365..., and the largest integer not greater than that is 0. So the result is 0. Rounding for display can be done with the %f format specifier, which prints the result to a given number of places after the decimal point.
If you check the return value of floorf you will see that it returns, if no errors occur, "the largest integer value not greater than arg, that is ⌊arg⌋", where arg is the passed argument.
Without using floorf you can use the %e (or %E) format specifier to print it accordingly:
printf("%.4E",num);
which outputs:
4.2365E-05
After David's comment:
Your way of doing things is right, but the number you multiplied by is wrong. The thing is, 4.2365E-05 is 0.000042365... If you multiply it by 10000 you get 0.42365... floorf returns a float in this case; store it in a variable and you will be good to go. The rounded value will be in that variable. But you will see that the rounded-down value will be 0, which is what you got.
float rounded_down = floorf(num * 10000) / 10000;
This will hold the value rounded down to 4 digits after the decimal point (not in exponent notation with E or e). Don't confuse the value with the format specifier used to represent it.
What you need to do in order to get the result you want is move the significant digits to the left of the decimal point before flooring. To do that, multiply by a larger number (1e7, 1e8, or 1e9 as appropriate), as in the sketch below.
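A sketch of that scaling (1e9 is the factor this particular value needs to move five significant digits left of the point; the usual representability caveats apply):

#include <math.h>
#include <stdio.h>

int main(void)
{
    float num = 4.23654897E-05f;
    /* 1e9 moves the first five significant digits left of the decimal
       point: 4.23654897e-05 * 1e9 = 42365.4897... */
    float rounded_down = floorf(num * 1e9f) / 1e9f;
    printf("%.4E\n", rounded_down); /* 4.2365E-05 */
    return 0;
}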
I would like to use the first five digits of a number for computation.
In general, floating point numbers are encoded using binary and OP wants to use 5 significant decimal digits. This is problematic as numbers like 4.23654897E-05 and 4.2365E-05 are not exactly representable as a float/double. The best we can do is get close.
The floor*() approach has problems with 1) negative numbers (trunc() should have been used) and 2) values near x.99995, where rounding may change the number of digits. I strongly recommend against it here, as solutions employing it fail many corner cases.
The multiply by a power of 10, round, divide by the same power of 10 approach suffers from 1) the power-of-10 calculation (1e5 in this case) not being exact, 2) rounding errors in the multiplication, and 3) overflow potential. The multiplication errors show up in cases where the product is close to xxxxx.5; often this intermediate calculation is done using wider double math, so the corner cases are rare. Rounding with a cast to (some_int_type) is also bad: it has limited range and truncates, instead of using the better round() or rint().
An approach that gets close to OP's goal: print to 5 significant digits using %e and convert back. Not highly efficient, yet handles all cases well.
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    float num = 4.23654897E-05f;
    //          sign  d   .  dddd  e  sign expo  \0
    #define N (1   + 1 + 1 + 4  + 1 + 1  + 4  + 1)
    char buf[N * 2]; // Use a generous buffer - I like 2x what I think is needed.
    // OP wants 5 significant digits so print 4 digits after the decimal point.
    sprintf(buf, "%.4e", num);
    float rounded = (float)atof(buf);
    printf("%.5e %s\n", rounded, buf);
}
Output
4.23650e-05 4.2365e-05
Why 5 in %.5e: a typical float will print up to 6 significant decimal digits as expected (research FLT_DIG), so 5 digits after the decimal point are printed. The exact value of rounded in this case was about 4.236500171...e-05, as 4.2365e-05 is not exactly representable as a float.

data type: float, long conversion in C

I was reading C primer plus, in chapter 3, data type, the author says:
If you take the bit pattern that represents the float number 256.0 and interpret it as a long value, you get 1132462080.
I don't understand how the conversion works. Can someone helps me with this? Thanks.
256.0 is 1.0 * 2^8, right?
Now, look at the format (stealing it from @bash.d):
31 0
| |
SEEEEEEEEMMMMMMMMMMMMMMMMMMMMMMM //S - SIGN , E - EXPONENT, M - MANTISSA
The number is positive, so 0 goes into S.
The exponent, 8, goes into EEEEEEEE but before it goes there you need to add 127 to it as required by the format, so 135 goes there.
Now, of 1.0 only what's to the right of the point is actually stored in MMMMMMMMMMMMMMMMMMMMMMM, so 0 goes there. The 1. is implied for most numbers represented in the format and isn't actually stored in the format.
The idea here is that the absolute values of all nonzero numbers can be transformed into
1.0...1.111(1) * 10^(some integer) (all numbers are binary)
or, nearly equivalently,
1.0...1.999(9) * 2^(some integer) (all numbers are decimal)
and that's what I did at the top of my answer. The transformation is done by repeated division or multiplication of the number by 2 until you get the mantissa into the decimal range [1.0, 2.0) (which is [1.0, 10.0) in binary). Since there's always this 1 in a non-zero number, why store it? And so it's not stored, which gives you another free M bit.
So you end up with:
(0 << 31) + ((8 + 127) << 23) + 0 = 1132462080
The format is described here.
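If you want to reproduce the number from the book, here is a minimal sketch (it assumes IEEE 754 float; memcpy is used for the reinterpretation because casting pointers between float and integer types is undefined behavior):

#include <stdio.h>
#include <string.h>
#include <stdint.h>

int main(void)
{
    float f = 256.0f;
    uint32_t bits;

    /* Copy the raw bytes instead of casting pointers, which would
       violate C's aliasing rules. */
    memcpy(&bits, &f, sizeof bits);
    printf("%u\n", (unsigned)bits); /* 1132462080, i.e. (0 << 31) | (135 << 23) | 0 */
    return 0;
}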
What's important from that quote is that integer/long and floats are saved in a different format in memory, so that you cannot simply pick up a bit of memory that has a float in it and say that now it's an int and get a correct value.
The specifics of how each data type is stored in memory can be found by searching for the IEEE standard, but that probably isn't the point of the quote. What it tries to tell you is that floats and integers are saved using different patterns, and you cannot simply use a float number as an int or vice versa.
While integer and long values are usually represented using two's complement, float values have a special encoding, because you cannot express a fractional value using plain integer bits alone.
A 32-bit float number contains a sign bit, a mantissa and an exponent. Together these determine what value the float has.
See here for an article.
EDIT
So, this is what a float encoded by IEEE 754 looks like (32-bit)
31 0
| |
SEEEEEEEEMMMMMMMMMMMMMMMMMMMMMMM //S - SIGN , E - EXPONENT, M - MANTISSA
I don't know the pattern for 256.0, but the long value will be purely interpreted as
31 0
| |
BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB // B - BIT
So there is no "conversion", but a different interpretation.

Why does a C floating-point type modify the actual input of 125.1 to 125.099998 on output?

I wrote the following program:
#include <stdio.h>

int main(void)
{
    float f;
    printf("\nInput a floating-point no.: ");
    scanf("%f", &f);
    printf("\nOutput: %f\n", f);
    return 0;
}
I am on Ubuntu and used GCC to compile the above program. Here is my sample run and output I want to inquire about:
Input a floating-point no.: 125.1
Output: 125.099998
Why does the precision change?
Because the number 125.1 is impossible to represent exactly with floating-point numbers. This happens in most programming languages. Use e.g. printf("%.1f", f); if you want to print the number with one decimal, but be warned: the number itself is not exactly equal to 125.1.
Thank you all for your answers. Although almost all of you helped me look in the right direction I could not understand the exact reason for this behavior. So I did a bit of research in addition to reading the pages you guys pointed me to. Here is my understanding for this behavior:
Single Precision Floating Point numbers typically use 4 bytes for storage on x86/x86-64 architectures. However not all 32 bits (4 bytes = 32 bits) are used to store the magnitude of the number.
For storing as a single precision floating type, the input stream is formatted in the following notation (somewhat similar to scientific notation):
(-1)^s x 1.m x 2^(e-127), where
s = sign of the number, range:{0,1} - takes up 1 bit
m = mantissa (fractional portion) of the number - takes up 23 bits
e = exponent of the number offset by 127, range:{0,..,255} - takes up 8 bits
and then stored in memory as
0th byte 1st byte 2nd byte 3rd byte
mmmmmmmm mmmmmmmm emmmmmmm seeeeeee
Therefore the decimal number 125.1 is first converted to binary form but limited to 24 bits so that the mantissa is represented by no more than 23 bits. After conversion to binary form:
125.1 = 1111101.00011001100110011
NOTE: 0.1 in decimal requires infinitely many bits in binary, but the computer limits the fraction to 17 bits here so that the complete significand does not exceed 24 bits.
Now converting it into the specified notation we get:
125.1 = 1.111101 00011001100110011 x 2^6
= (-1)^0 x 1.111101 00011001100110011 x 2^(133-127)
which implies
s = 0
m = 11110100011001100110011
e = 133 = 10000101
Therefore, 125.1 will be stored in memory as:
0th byte 1st byte 2nd byte 3rd byte
mmmmmmmm mmmmmmmm emmmmmmm seeeeeee
00110011 00110011 11111010 01000010
On being passed to the printf() function, the output stream is generated by converting the binary form back to decimal form. The bytes are stored in memory in reverse order (little-endian) relative to the notation above and hence are read back in this order:
3rd byte 2nd byte 1st byte 0th byte
seeeeeee emmmmmmm mmmmmmmm mmmmmmmm
01000010 11111010 00110011 00110011
Next, it is converted back into the notation described above:
(-1)^0 x 1.111101 00011001100110011 x 2^(133-127)
On simplifying the above representation further:
= 1.111101 00011001100110011 x 2^6
= 1111101.00011001100110011
And finally converting it to decimal:
= 125.0999984741210938
but printf's %f conversion prints six digits after the decimal point by default, therefore the answer is displayed rounded to 125.099998.
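A short sketch to confirm both views of the value (the %.16f precision is chosen only to expose the full stored value):

#include <stdio.h>

int main(void)
{
    float f = 125.1f;     /* stored as the nearest representable float */
    printf("%f\n", f);    /* default 6 digits: 125.099998              */
    printf("%.16f\n", f); /* full stored value: 125.0999984741210938   */
    return 0;
}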
Think about a fixed point representation first.
2^3=8 2^2=4 2^1=2 2^0=1 2^-1=1/2 2^-2=1/4 2^-3=1/8 2^-4=1/16
If we want to represent a fraction then we set the bits to the right of the point, so 5.5 is represented as 01011000.
But if we want to represent 5.6, there is not an exact fractional representation. The closest we can get is 01011001 == 5.5625
2^-1 + 2^-4 = 1/2 + 1/16 = 0.5625
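As a small sketch of that fixed-point idea (4 integer bits and 4 fraction bits packed into one byte; 0x59 is the 01011001 pattern above):

#include <stdio.h>

int main(void)
{
    /* 4 integer bits and 4 fraction bits: value = raw / 2^4 */
    unsigned char raw = 0x59;   /* 0101.1001, the closest to 5.6 */
    printf("%g\n", raw / 16.0); /* prints 5.5625 */
    return 0;
}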
Because it's the closest representation of 125.1. Remember that single-precision floating point numbers are just 32 bits.
If I tell you to write 1/3 down as a decimal number, you realize there are numbers which have no finite representation. In decimal, .1 is the exact representation of 1/10, so this problem does not appear there, BUT that is only true in decimal representation. In binary representation, .1 is one of those numbers that require infinite digits. Since your number must be cut somewhere, something is lost.
No floating-point number has unlimited precision; they all have limited accuracy. When converting from a number in text to a float (with scanf or otherwise), you're in another world with different kinds of numbers, and precision may be lost. The same thing goes when converting from a float to a string: you decide how many digits you want. You can't know "how many digits there are" in a float before converting to text or another format that can keep that information. This all has to do with how floats are stored:
significant_digits * base^exponent
The normal type used for floating point in C is double, not float. Your float is implicitly cast to a double, and because the float is less precise, the difference to the closest representable number to 125.1 is more apparent (and printf's default precision is tailored for use with doubles). Try this instead:
#include <stdio.h>

int main(void)
{
    double f;
    printf("\nInput a floating-point no.: ");
    scanf("%lf", &f);
    printf("\nOutput: %f\n", f);
    return 0;
}

Finding the smallest integer that can not be represented as an IEEE-754 32 bit float [duplicate]

Possible Duplicate:
Which is the first integer that an IEEE 754 float is incapable of representing exactly?
Firstly, this IS a homework question, just to clear this up immediately. I'm not looking for a spoon fed solution of course, just maybe a little pointer to the right direction.
So, my task is to find the smallest positive integer that can not be represented as an IEEE-754 float (32 bit). I know that testing for equality on something like "5 == 5.00000000001" will fail, so I thought I'd simply loop over all the numbers and test for that in this fashion:
#include <stdio.h>

int main(int argc, char **argv)
{
    unsigned int i; /* Loop counter. No need to initialize here. */

    /* Header output */
    printf("IEEE floating point rounding failure detection\n\n");

    /* Main program processing */
    /* Loop over every integer number */
    for (i = 0;; ++i)
    {
        float result = (float)i;

        /* TODO: Break condition for integer wrapping */

        /* Test integer representation against the IEEE-754 representation */
        if (result != i)
            break; /* Break the loop here */
    }

    /* Result output */
    printf("The smallest integer that can not be precisely represented as IEEE-754"
           " is:\n\t%d", i);

    return 0;
}
This failed. Then I tried subtracting the integer i from the floating-point result that holds i, hoping to get something like "0.000000002" that I could detect, but that failed too.
Can someone point me out a property of floating points that I can rely on to get the desired break condition?
-------------------- Update below ---------------
Thanks for help on this one! I learned multiple things here:
My original thought was indeed correct and determined the result on the machine it was intended to be run on (Solaris 10, 32 bit), yet failed to work on my Linux systems (64 bit and 32 bit).
The changes that Hans Passant suggested made the program also work on my systems; there seem to be some platform differences going on here that I didn't expect.
Thanks to everyone!
The problem is that your equality test is a floating-point test. The i variable will be converted to float first, and that of course produces the same float. Convert the float back to int to get an integer equality test:
float result = (float)i;
int truncated = (int)result;
if (truncated != i) break;
If it starts with the digits 16 then you found the right one. Convert it to hex and explain why that was the one that failed for a grade bonus.
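Folding that fix back into the loop, a minimal sketch (unsigned arithmetic is used so the loop cannot invoke signed overflow; on an IEEE 754 system it stops at the value hinted at above):

#include <stdio.h>

int main(void)
{
    /* Round-trip each integer through float; the first value that does
       not survive the trip has no exact float representation. */
    for (unsigned int i = 1;; ++i) {
        float result = (float)i;
        unsigned int truncated = (unsigned int)result;
        if (truncated != i) {
            printf("First non-representable integer: %u (0x%X)\n", i, i);
            break;
        }
    }
    return 0;
}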
I think you should reason about the representation of floating-point numbers as (base, sign, significand, exponent).
Here is an excerpt from Wikipedia that can give you a clue:
A given format comprises:
* Finite numbers, which may be either base 2 (binary) or base 10 (decimal). Each finite number is most simply described by three integers: s = a sign (zero or one), c = a significand (or 'coefficient'), q = an exponent. The numerical value of a finite number is
(−1)^s × c × b^q
where b is the base (2 or 10). For example, if the sign is 1 (indicating negative), the significand is 12345, the exponent is −3, and the base is 10, then the value of the number is −12.345.
That would be FLT_MAX+1. See float.h.
Edit: or actually not. Check the modf() function in math.h
