Why floating-points number's significant numbers is 7 or 6 - c

I see this in Wikipedia log 224 = 7.22.
I have no idea why we should calculate 2^24 and why we should take log10......I really really need your help.

why floating-points number's significant numbers is 7 or 6 (?)
Consider some thoughts employing the Pigeonhole principle:
binary32 float can encode about 232 different numbers exactly. The numbers one can write in text like 42.0, 1.0, 3.1415623... are infinite, even if we restrict ourselves to a range like -1038 ... +1038. Any time code has a textual value like 0.1f, it is encoded to a nearby float, which may not be the exact same text value. The question is: how many digits can we code and still maintain distinctive float?
For the various powers-of-2 range, 223 (8,388,608) values are normally linearly encoded.
Example: In the range [1.0 ... 2.0), 223 (8,388,608) values are linearly encoded.
In the range [233 or 8,589,934,592 ... 234 or 17,179,869,184), again, 223 (8,388,608) values are linearly encoded: 1024.0 apart from each other. In the sub range [9,000,000,000 and 10,000,000,000), there are about 976,562 different values.
Put this together ...
As text, the range [1.000_000 ... 2.000_000), using 1 lead digit and 6 trailing ones, there are 1,000,000 different values. Per #3, In the same range, with 8,388,608 different float exist, allowing each textual value to map to a different float. In this range we can use 7 digits.
As text, the range [9,000,000 × 103 and 10,000,000 × 103), using 1 lead digit and 6 trailing ones, there are 1,000,000 different values. Per #4, In the same range, there are less than 1,000,000 different float values. Thus some decimal textual values will convert to the same float. In this range we can use 6, not 7, digits for distinctive conversions.
The worse case for typical float is 6 significant digits. To find the limit for your float:
#include <float.h>
printf("FLT_DIG = %d\n", FLT_DIG); // this commonly prints 6
... no idea why we should calculate 2^24 and why we should take log10
224 is a generalization as with common float and its 24 bits of binary precision, that corresponds to fanciful decimal system with 7.22... digits. We take log10 to compare the binary float to decimal text.
224 == 107.22...
Yet we should not take 224. Let us look into how FLT_DIG is defined from C11dr §5.2.4.2.2 11:
number of decimal digits, q, such that any floating-point number with q decimal digits can be rounded into a floating-point number with p radix b digits and back again without change to the q decimal digits,
p log10 b ............. if b is a power of 10
⎣(p − 1) log10 _b_⎦.. otherwise
Notice "log10 224" is same as "24 log10 2".
As a float, the values are distributed linearly between powers of 2 as shown in #2,3,4.
As text, values are distributed linearly between powers of 10 like a 7 significant digit values of [1.000000 ... 9.999999]*10some_exponent.
The transition of these 2 groups happen at different values. 1,2,4,8,16,32... versus 1,10,100, ... In determining the worst case, we subtract 1 from the 24 bits to account for the mis-alignment.
⎣(p − 1) log10 _b_⎦ --> floor((24 − 1) log10(2)) --> floor(6.923...) --> 6.
Had our float used base 10, 100, or 1000, rather than very common 2, the transition of these 2 groups happen at same values and we would not subtract one.

An IEEE 754 single-precision float has a 24-bit mantissa. This means it has 24 binary bits' worth of precision.
But we might be interested in knowing how many decimal digits worth of precision it has.
One way of computing this is to consider how many 24-bit binary numbers there are. The answer, of course, is 224. So these binary numbers go from 0 to 16777215.
How many decimal digits is that? Well, log10 gives you the number of decimal digits. log10(224) is 7.2, or a little more than 7 decimal digits.
And look at that: 16777215 has 8 digits, but the leading digit is just 1, so in fact it's only a little more than 7 digits.
(Of course this doesn't mean we can represent only numbers from 0 to 16777215! It means we can represent numbers from 0 to 16777215 exactly. But we've also got the exponent to play with. We can represent numbers from 0 to 1677721.5 more or less exactly to one place past the decimal, numbers from 0 to 167772.15 more or less exactly to two decimal points, etc. And we can represent numbers from 0 to 167772150, or 0 to 1677721500, but progressively less exactly -- always with ~7 digits' worth of precision, meaning that we start losing precision in the low-order digits to the left of the decimal point.)
The other way of doing this is to note that log10(2) is about 0.3. This means that 1 bit corresponds to about 0.3 decimal digits. So 24 bits corresponds to 24 × 0.3 = 7.2.
(Actually, IEEE 754 single-precision floating point explicitly stores only 23 bits, not 24. But there's an implicit leading 1 bit in there, so we do get the effect of 24 bits.)

Let's start a little smaller. With 10 bits (or 10 base-2 digits), you can represent the numbers 0 upto 1023. So you can represent up to 4 digits for some values, but 3 digits for most others (the ones below 1000).
To find out how many base-10 (decimal) digits can be represented by a bunch of base-2 digits (bits), you can use the log10() of the maximum representable value, i.e. log10(2^10) = log10(2) * 10 = 3.01....
The above means you can represent all 3 digit — or smaller — values and a few 4 digits ones. Well, that is easily verified: 0-999 have at most 3 digits, and 1000-1023 have 4.
Now take 24 bits. In 24 bits you can store log10(2^24) = 24 * log(2) base-10 digits. But because the top bit is always the same, you can in fact only store log10(2^23) = log10(8388608) = 6.92. This means you can represent most 7 digits numbers, but not all. Some of the numbers you can represent faithfully can only have 6 digits.
The truth is a bit more complicated though, because exponents play role too, and some of the many possible larger values can be represented too, so 6.92 may not be the exact value. But it gets close, and can nicely serve as a rule of thumb, and that is why they say that single precision can represent 6 to 7 digits.

Related

what is the difference between number,number(3),number(3,2) [duplicate]

This question already has answers here:
What is the default Precision and Scale for a Number in Oracle?
(6 answers)
Closed 7 years ago.
i know basic difference between them ,i wanted to know
if i do not specify precision and scale and define the datatype as number
what are the default values for precision and scale assigned?
create table a(id number);
create table b(id number(3));
both above queries creates a table with column and number datatype but what is the difference from
1)performance point of view
2)How it is handled internally by database
3)Is there any advantage of specifying number as datatype over number(3)
Answer from Oracle perspective...
NUMBER Datatype
The NUMBER datatype stores zero as well as positive and negative fixed numbers with absolute values from 1.0 x 10-130 to (but not including) 1.0 x 10126. If you specify an arithmetic expression whose value has an absolute value greater than or equal to 1.0 x 10126, then Oracle returns an error. Each NUMBER value requires from 1 to 22 bytes.
Specify a fixed-point number using the following form:
NUMBER(p,s)
where:
p is the precision, or the total number of significant decimal digits, where the most significant digit is the left-most nonzero digit, and the least significant digit is the right-most known digit. Oracle guarantees the portability of numbers with precision of up to 20 base-100 digits, which is equivalent to 39 or 40 decimal digits depending on the position of the decimal point.
s is the scale, or the number of digits from the decimal point to the least significant digit. The scale can range from -84 to 127.
Positive scale is the number of significant digits to the right of the decimal point to and including the least significant digit.
Negative scale is the number of significant digits to the left of the decimal point, to but not including the least significant digit. For negative scale the least significant digit is on the left side of the decimal point, because the actual data is rounded to the specified number of places to the left of the decimal point. For example, a specification of (10,-2) means to round to hundreds.
Scale can be greater than precision, most commonly when e notation is used. When scale is greater than precision, the precision specifies the maximum number of significant digits to the right of the decimal point. For example, a column defined as NUMBER(4,5) requires a zero for the first digit after the decimal point and rounds all values past the fifth digit after the decimal point.
It is good practice to specify the scale and precision of a fixed-point number column for extra integrity checking on input. Specifying scale and precision does not force all values to a fixed length. If a value exceeds the precision, then Oracle returns an error. If a value exceeds the scale, then Oracle rounds it.
Specify an integer using the following form:
NUMBER(p)
This represents a fixed-point number with precision p and scale 0 and is equivalent to NUMBER(p,0).
Specify a floating-point number using the following form:
NUMBER
The absence of precision and scale designators specifies the maximum range and precision for an Oracle number.
And
NUMBER[(precision [, scale]])
Number having precision p and scale s. The precision p can range from 1 to 38. The scale s can range from -84 to 127

Floating point conversion - Binary -> decimal

Here's the number I'm working on
1 01110 001 = ____
1 sign bit, 5 exp bits, 3 fraction bits
bias = 15
Here's my current process, hopefully you can tell me where I'm missing something
Convert binary exponent to decimal
01110 = 14
Subtract bias
14 - 15 = -1
Multiply fraction bits by result
0.001 * 2^-1 = 0.0001
Convert to decimal
.0001 = 1/16
The sign bit is 1 so my result is -1/16, however the given answer is -9/16. Would anyone mind explaining where the extra 8 in the fraction is coming from?
You seem to have the correct concept, including an understanding of the excess-N representation, but you're missing a crucial point.
The 3 bits used to encode the fractional part of the magnitude are 001, but there is an implicit 1. preceding the fraction bits, so the full magnitude is actually 1.001, which can be represented as an improper fraction as 1+1/8 => 9/8.
2^(-1) is the same as 1/(2^1), or 1/2.
9/8 * 1/2 = 9/16. Take the sign bit into account, and you arrive at the answer -9/16.
For normalized floating point representation, the Mantissa (fractional bits) = 1 + f. This is sometimes called an implied leading 1 representation. This is a trick for getting an additional bit of precision for free since we can always adjust the exponent E so that significant M is in the range 1<=M < 2 ...
You are almost correct but must take into consideration the implied 1. If it is denormalized (meaning the exponent bits are all 0s) you do not add an implied 1.
I would solve this problem as such...
1 01110 001
bias = 2^(k-1) -1 = 14
Exponent = e - bias
14 - 15 = -1
Take the fractional bits ->> 001
Add the implied 1 ->> 1.001
Shift it by the exponent, which is -1. Becomes .1001
Count up the values, 1(1/2) + 0(1/4) + 0(1/8) + 1(1/16) = 9/16
With the a negative sign bit it becomes -9/16
hope that helps!

Number of bits assigned for double data type

How many bits out of 64 is assigned to integer part and fractional part in double. Or is there any rule to specify it?
Note: I know I already replied with a comment. This is for my own benefit as much as the OPs; I always learn something new when I try to explain it.
Floating-point values (regardless of precision) are represented as follows:
sign * significand * βexp
where sign is 1 or -1, β is the base, exp is an integer exponent, and significand is a fraction. In this case, β is 2. For example, the real value 3.0 can be represented as 1.102 * 21, or 0.112 * 22, or even 0.0112 * 23.
Remember that a binary number is a sum of powers of 2, with powers decreasing from the left. For example, 1012 is equivalent to 1 * 22 + 0 * 21 + 1 * 20, which gives us the value 5. You can extend that past the radix point by using negative powers of 2, so 101.112 is equivalent to
1 * 22 + 0 * 21 + 1 * 20 + 1 * 2-1 + 1 * 2-2
which gives us the decimal value 5.75. A floating-point number is normalized such that there's a single non-zero digit prior to the radix point, so instead of writing 5.75 as 101.112, we'd write it as 1.01112 * 22
How is this encoded in a 32-bit or 64-bit binary format? The exact format depends on the platform; most modern platforms use the IEEE-754 specification (which also specifies the algorithms for floating-point arithmetic, as well as special values as infinity and Not A Number (NaN)), however some older platforms may use their own proprietary format (such as VAX G and H extended-precision floats). I think x86 also has a proprietary 80-bit format for intermediate calculations.
The general layout looks something like the following:
seeeeeeee...ffffffff....
where s represents the sign bit, e represents bits devoted to the exponent, and f represents bits devoted to the significand or fraction. The IEEE-754 32-bit single-precision layout is
seeeeeeeefffffffffffffffffffffff
This gives us an 8-bit exponent (which can represent the values -126 through 127) and a 22-bit significand (giving us roughly 6 to 7 significant decimal digits). A 0 in the sign bit represents a positive value, 1 represents negative. The exponent is encoded such that 000000012 represents -126, 011111112 represents 0, and 111111102 represents 127 (000000002 is reserved for representing 0 and "denormalized" numbers, while 111111112 is reserved for representing infinity and NaN). This format also assumes a hidden leading fraction bit that's always set to 1. Thus, our value 5.75, which we represent as 1.01112 * 22, would be encoded in a 32-bit single-precision float as
01000000101110000000000000000000
|| || |
|| |+----------+----------+
|| | |
|+--+---+ +------------ significand (1.0111, hidden leading bit)
| |
| +---------------------------- exponent (2)
+-------------------------------- sign (0, positive)
The IEEE-754 double-precision float uses 11 bits for the exponent (-1022 through 1023) and 52 bits for the significand. I'm not going to bother writing that out (this post is turning into a novel as it is).
Floating-point numbers have a greater range than integers because of the exponent; the exponent 127 only takes 8 bits to encode, but 2127 represents a 38-digit decimal number. The more bits in the exponent, the greater the range of values that can be represented. The precision (the number of significant digits) is determined by the number of bits in the significand. The more bits in the significand, the more significant digits you can represent.
Most real values cannot be represented exactly as a floating-point number; you cannot squeeze an infinite number of values into a finite number of bits. Thus, there are gaps between representable floating point values, and most values will be approximations. To illustrate the problem, let's look at an 8-bit "quarter-precision" format:
seeeefff
This gives us an exponent between -7 and 8 (we're not going to worry about special values like infinity and NaN) and a 3-bit significand with a hidden leading bit. The larger our exponent gets, the wider the gap between representable values gets. Here's a table showing the issue. The left column is the significand; each additional column shows the values we can represent for the given exponent:
sig -1 0 1 2 3 4 5
--- ---- ----- ----- ----- ----- ----- ----
000 0.5 1 2 4 8 16 32
001 0.5625 1.125 2.25 4.5 9 18 36
010 0.625 1.25 2.5 5 10 20 40
011 0.6875 1.375 2.75 5.5 11 22 44
100 0.75 1.5 3 6 12 24 48
101 0.8125 1.625 3.25 6.5 13 26 52
110 0.875 1.75 3.5 7 14 28 56
111 0.9375 1.875 3.75 7.5 15 30 60
Note that as we move towards larger values, the gap between representable values gets larger. We can represent 8 values between 0.5 and 1.0, with a gap of 0.0625 between each. We can represent 8 values between 1.0 and 2.0, with a gap of 0.125 between each. We can represent 8 values between 2.0 and 4.0, with a gap of 0.25 in between each. And so on. Note that we can represent all the positive integers up to 16, but we cannot represent the value 17 in this format; we simply don't have enough bits in the significand to do so. If we add the values 8 and 9 in this format, we'll get 16 as a result, which is a rounding error. If that result is used in any other computation, that rounding error will be compounded.
Note that some values cannot be represented exactly no matter how many bits you have in the significand. Just like 1/3 gives us the non-terminating decimal fraction 0.333333..., 1/10 gives us the non-terminating binary fraction 1.10011001100.... We would need an infinite number of bits in the significand to represent that value.
a double on a 64 bit machine, has one sign bit, 11 exponent bits and 52 fractional bits.
think (1 sign bit) * (52 fractional bits) ^ (11 exponent bits)

Problematic understanding of IEEE 754 [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 8 years ago.
Improve this question
First of all i woild like to point out that i am not native speaker and i really need some terms used more commonly.
And the second thing i would like to mention is that i am not a math genious. I am really trying to understand everything about programming.. but ieee-754 makes me think that it'll never happan.. its full of mathematical terms i don't understand..
What is precision? What is it used for? What is mantissa and what is mantissa used for? How to determine the range of float/double by their size? What is ± symbol (Plus-minus) used for? (i believe its positive/negative choice but what does that have to do with everything?),
Isn't there any brief and clean explanation you guys could provide me with?
I spent 600 years of trying to understand wikipedia. I failed tremendously.
What is precision?
It refers to how closely a binary floating point representation can represent a real value. Real values have infinite precision and infinite range. Digital values have finite range and precision. In practice a single-precision IEEE-754 can represent real values of a precision of 6 significant figures (decimal), while double-precision is good for 15 significant figures.
The practical effect of this for example is that a single precision value: 123456000.00 cannot be distinguished from say 123456001.00, but equally a value 0.00123456 can be represented.
What is it used for?
Precision is not used for anything other than to define a characteristic of a particular floating point representation.
What is mantissa and what is mantissa used for?
The term is not mentioned in the English language Wikipedia article, and is imprecise - in mathematics in general it has a different meaning that that used here.
The correct term is significand. For a decimal value 0.00123456 for example the significand is is 123456. 123456000.00 has exactly the same significand. Each of these values has the same significand but a different exponent. The exponent is a scaling factor which determines where the decimal point is (hence floating point).
Of course IEEE754 is a binary floating point representation not decimal, but for the same of explanation of the terms it is perhaps easier to use decimal.
How to determine the range of float/double by their size?
By the size alone you cannot; you need to know how many bits are assigned to the significand and how many bits are assigned to the exponent. In C however the range is defined by the macros FLT_MIN, FLT_MAX, DBL_MIN and DBL_MAX in the float.h header. Other characteristics of the implementations floating point representation are described there also.
Note that a specific compiler may not in fact use IEEE754, however that is the format used by most hardware FPU implementations, and the compiler will naturally follow that. For targets with no FPU (small embedded processors typically), other formats may be used.
What is ± symbol (Plus-minus) used for?
It simply means that the value given may be both positive or negative. It may refer to a specific value, or it may indicate a range. So ±n may refer to two discrete values -n or +n, or it may mean a range -n to +n. Context is everything! In this article it refers to discrete values +0, -0, +∞ and -∞.
There are 3 different components: sign, exponent, mantissa
Assuming that the exponent has only 2 Bits, 4 combinations are possible:
binary decimal
00 0
01 1
10 2
11 3
The represented floating-point value is 2exponent:
binary exponent-value
00 2^0 = 1
01 2^1 = 2
10 2^2 = 4
11 2^3 = 8
The range of the floating point value, results from the exponent. 2 bits => maximum value = 8.
The mantissa divide the range from a given exponent to the next higher exponent.
For example the exponent is 2 and the mantissa has one bit, then there are two values possible:
exponent-value mantissa-binary represented floating-point value
2 0 2
2 1 3
The represented floating-point value is 2exponent × (1 + m1×2-1 + m2×2-2 + m3×2-3 + …).
Here an example with a 3 bit mantissa:
exponent-value mantissa-binary represented floating-point value
2 000 2 * (1 ) = 2
2 001 2 * (1 + 2^-3) = 2,25
2 010 2 * (1 + 2^-2 ) = 2,5
2 011 2 * (1 + 2^-2 + 2^-3) = 2,75
2 100 2 * (1 + 2^-1 ) = 3
and so on…
The sign has only just one Bit:
0 -> positive value
1 -> negative value
In IEEE-754 a 32 bit floating-point data type has an 8 bit exponent (with a range from 2-127 to 2128) and a 23 bit mantissa.
1 10000010 01101000000000000000000
- 130 1,40625
The represented floating-point value for this is:
-1 × 2(130 – 127) × (1 + 2-2 + 2-3 + 2-5) = -11,25
Try it: http://www.h-schmidt.net/FloatConverter/IEEE754.html

How much range Oracle Number(18) can store?

I was looking at documentation that a number type in oracle db can store range from 10 raise to -130 to 10 raise to 126.
Was wondering how many positive numbers a field NUMBER(18) can store?
Integer numbers with up to 18 digits (Integers between -10^18+1 and 10^18-1)
According to Oracle documentation, the NUMBER datatype stores fixed and floating-point numbers. Optionally, you can also specify a precision (total number of digits) and scale (number of digits to the right of the decimal point):
NUMBER (precision, scale)
If no scale is specified, the scale is zero.
In your case, NUMBER(18), you specified a precision of 18 digits and did not specify any scale so 0 is used (no numbers after the decimal point).

Resources