I'm adding the ability to print a decimal fixed-point number as hexadecimal in my general-purpose library and realized I wasn't 100% sure how I should represent the fraction part of the number. A quick Google search suggests I should:
Multiply by 16
Convert the integer part to hex and add it to the buffer
Get rid of the integer part
Repeat
As suggested here https://bytes.com/topic/c/answers/219928-how-convert-float-hex
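For reference, that digit-by-digit approach looks roughly like the following in C. This is a minimal sketch of my own (the name print_frac_hex is mine, not from the linked thread), assuming a non-negative binary floating-point input:

#include <stdio.h>
#include <math.h>

/* Sketch: print up to max_digits hex digits of the fractional part of x (x >= 0). */
static void print_frac_hex(double x, int max_digits)
{
    double frac = x - floor(x);                 /* keep only the fraction    */
    for (int i = 0; i < max_digits && frac != 0.0; i++) {
        frac *= 16.0;                           /* shift one hex digit up    */
        int digit = (int)frac;                  /* integer part = next digit */
        putchar("0123456789ABCDEF"[digit]);
        frac -= digit;                          /* discard the integer part  */
    }
}

Calling print_frac_hex(0.7, 8) prints B3333333, which is the repeating pattern mentioned below.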
This method is for floating-point (IEEE 754 binary formats) and it works fine for that. However, I tried to adapt it to my decimal fixed-point (scaled by 8) format, and after testing this approach on paper I noticed that for some fractions (e.g. .7) it produces a repeating pattern of .B3333... and so on.
To me this looks very undesirable. I also wonder whether this would cause a loss of precision if I were to read this back from a string into my fixed-point format.
Is there any reason why someone wouldn't print the fraction part like any other two's-complement hexadecimal number? I.e., where 17535.564453 is printed as 447F.89CE5.
While this is targeted at decimal fixed-point, I'm looking for a solution that can also be used with other real-number formats such as IEEE 754 binary.
Perhaps there's another alternative to these two methods. Any ideas?
Although the question asks about fixed-point, the C standard has some useful information in its specification of the %a format for floating-point. C 2018 7.21.6.1 8 says:
… if the [user-requested] precision is missing and FLT_RADIX is not a power of 2, then the precision is sufficient to distinguish [footnote 285] values of type double, except that trailing zeros may be omitted;…
Footnote 285 says:
The precision p is sufficient to distinguish values of the source type if 16^(p−1) > b^n, where b is FLT_RADIX and n is the number of base-b digits in the significand of the source type…
To see this intuitively, visualize the decimal fixed-point numbers on the real number line from 0 to 1. For each such number x, visualize a segment starting halfway toward the previous fixed-point number and ending halfway toward the next fixed-point number. All the points in that segment are closer to x than they are to the previous or next numbers, except for the endpoints. Now, consider where all the single-hexadecimal-digit numbers j/16 are. They lie in some of those segments. But, if there are 100 segments (from two-digit decimal numbers), most of the segments do not contain one of those single-hexadecimal-digit numbers. If you increase the number of hexadecimal digits, p, until 16^(p−1) > b^n, then the spacing between the hexadecimal numbers is less than the width of the segments, and every segment contains a hexadecimal number.
This shows that using p hexadecimal digits is sufficient to distinguish numbers made with n base-b digits. (This is sufficient, but it may be one more than necessary.) This means all the information needed to recover the original decimal number is present, and avoiding any loss of accuracy in recovering the original decimal number is a matter of programming the conversion from hexadecimal to decimal correctly.
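As a concrete illustration of the footnote's condition (my own numbers, not from the standard): with b = 10 and n = 6 fraction digits, the smallest p with 16^(p−1) > 10^6 is p = 6, since 16^5 = 1,048,576. A tiny C sketch that computes this:

#include <stdio.h>

int main(void)
{
    /* Find the smallest p with 16^(p-1) > b^n, here b = 10 and n = 6. */
    double target = 1e6;        /* b^n */
    double pow16 = 1.0;         /* 16^(p-1), starting at p = 1 */
    int p = 1;
    while (pow16 <= target) {
        pow16 *= 16.0;
        p++;
    }
    printf("p = %d\n", p);      /* prints p = 6 */
    return 0;
}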
Printing the fraction “like any other hexadecimal number” is inadequate if leading zeroes are not accounted for. The decimal numbers “3.7” and “3.007” are different, so the fraction part cannot be formatted merely as “7”. If a convention is adopted to convert the decimal part *including trailing zeros* to hexadecimal, then this could work. For example, if the decimal fixed-point number has four decimal digits after the decimal point, then treating the fraction parts of 3.7 and 3.007 as 7000 and 0070 and converting those to hexadecimal will preserve the required information. When converting back, one would convert the hexadecimal to decimal, format it in four digits, and insert it into the decimal fixed-point number. This could be a suitable solution where speed is desired, but it will not be a good representation for human use.
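A minimal sketch of that convention, assuming a hypothetical format with an integer part and exactly four decimal fraction digits (the struct and field names here are mine, not an established format):

#include <stdio.h>

/* Hypothetical fixed-point value: integer_part + frac_digits / 10000. */
struct fixed4 {
    unsigned integer_part;
    unsigned frac_digits;    /* 0..9999, the fraction scaled by 10^4 */
};

static void print_fixed_hex(struct fixed4 v)
{
    /* 3.7   -> frac_digits 7000 -> "3.1B58"
       3.007 -> frac_digits   70 -> "3.46"
       Decoding converts the hex field back to decimal and re-pads it to
       four decimal digits, e.g. 0x46 = 70 -> ".0070".                   */
    printf("%X.%X\n", v.integer_part, v.frac_digits);
}

For example, print_fixed_hex((struct fixed4){3, 70}) prints 3.46, which decodes back to 3.0070.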
Of course, if one merely wishes to preserve the information in the number so that it can be transmitted or stored and later recovered, one might as well simply transmit the bits representing the number with whatever conversion is easiest to compute, such as formatting all the raw bits as hexadecimal.
Related
So I just made a float-binary-to-float-decimal converter. Now I'm attempting to do the opposite. My goal is to take a decimal floating-point value and convert it to the IEEE 754 format. I've seen the methodology done on paper, but when trying to implement it in code I'm running into a lot of issues when trying to manipulate the program's input, which is a char array. I'd show my code but it's extremely wrong and extremely bulky. In short, my method was to:
1. find the sign bit (negative/non-negative)
2. separate the whole number and fraction
3. find the whole number's binary equivalent
4. find the fractional part's binary pattern (multiplying the fraction by 2 repeatedly)
5. recombine the whole and fractional parts
6. find the exponent associated with the scientific-notation form of the combined value
7. add 127 (the bias) to the exponent to find the "exponent number"
8. convert the exponent number to binary
9. finally, combine all these different values together as such:
sign bit char -> exponent bits array -> whole number binary array from [1] to [n] -> fractional part's binary pattern array
which should theoretically give you the IEEE format. I'm running into a lot of errors when attempting to use this method, mainly with string manipulation, memory errors, etc. So my question is: is there a simpler way of accomplishing this rather than having multiple binary char arrays and then combining them? Is it possible to atof() the initial input and work my way down from there? Any tips on making this process easier would be much appreciated.
Examples:
Input (from command line)
./file 250
output
11000011011110100000000000000000
Input
./file -0.78
output
10111111010001111010111000010100
Note: I'm to output 32 bits with a bias of 127.
I've used this successfully in production code:
// endian issues??? -- byte order only matters if the raw bytes are
// serialized; reading Rx_PWR0.u as an integer on the same machine is fine.
union {
    float f;
    unsigned u;   // assumes unsigned is 32 bits, the same size as float
} Rx_PWR0;
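For completeness, here is a minimal sketch (mine, not part of the answer above) of how that union can be combined with atof(), as the question suggests, to print the 32 IEEE 754 bits, assuming float and unsigned are both 32 bits wide:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    if (argc < 2)
        return 1;

    union {
        float f;
        unsigned u;
    } bits;

    bits.f = (float)atof(argv[1]);        /* e.g. ./file -0.78 */
    for (int i = 31; i >= 0; i--)         /* sign, then exponent, then mantissa */
        putchar(((bits.u >> i) & 1u) ? '1' : '0');
    putchar('\n');
    return 0;
}

On an IEEE 754 platform, running it with -0.78 should print 10111111010001111010111000010100, matching the second example above.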
As per the DICOM standard, a floating-point type can be stored using a Value Representation of Decimal String. See Table 6.2-1, DICOM Value Representations:
Decimal String: A string of characters representing either a fixed
point number or a floating point number. A fixed point number shall
contain only the characters 0-9 with an optional leading "+" or "-"
and an optional "." to mark the decimal point. A floating point number
shall be conveyed as defined in ANSI X3.9, with an "E" or "e" to
indicate the start of the exponent. Decimal Strings may be padded with
leading or trailing spaces. Embedded spaces are not allowed.
Character Repertoire: "0"-"9", "+", "-", "E", "e", "." and the SPACE character of Default Character Repertoire.
Length of Value: 16 bytes maximum
So I would be tempted to simply use a 64-bit double (IEEE 754-1985) to represent the value in memory in my C code, based on the fact that the input is stored in at most 16 bytes.
Could someone with a little bit more knowledge of X3.9-1978 confirm that this is the best possible representation (compared to arbitrary precision, float and/or long double)? By best, I mean the representation where round-trip read/write will be visually lossless. I should be able to read such an ASCII floating-point representation from disk, put it into memory, and write it back to disk (as specified above) with maximum accuracy compared to the original values (= machine epsilon when possible). The actual implementation details of how to represent a double as ASCII with only 16 bytes of storage are outside the scope of this question; see here for details.
This is heavily based on Hans Passant's and Mark Dickinson's comments.
Using any binary floating-point type to represent decimal values is generally a bad idea, because binary floating-point types cannot exactly represent most decimal values. In particular, never use them for processing exact monetary values.
But here, the DICOM spec sets the limit at 16 characters, while the precision of a double is about 15-16 decimal digits (ref.). As soon as your decimal string contains a sign (+/-), a dot (.) or an exponent part (e/E), you will have at most 15 decimal digits and a round trip should be correct. The only problems should occur when you have 16 digits. The example provided by Mark Dickinson is: the 16-character strings 9999999999999997 and 9999999999999996 would both map to the same IEEE 754 binary64 value.
TL;DR: Hans Passant gave a nice summary: "16 bytes maximum" [is] exactly as many accurate significant digits as you can store in a double. This DICOM spec was written to let you use double. So just use it.
Disclaimer: all values acceptable in IEEE 754 will be correctly processed, but beware: 1e1024 is an acceptable value for a DICOM Decimal String, yet it is not representable in a double (which is limited to about 1e308).
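To make the 16-digit caveat concrete, here is a small hedged test (assuming an IEEE 754 binary64 double, which is what essentially every current platform provides):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* Doubles near 1e16 are spaced 2 apart, so these two 16-character
       strings parse to the same binary64 value. */
    double a = strtod("9999999999999997", NULL);
    double b = strtod("9999999999999996", NULL);
    printf("%.17g\n%.17g\nequal: %d\n", a, b, a == b);   /* equal: 1 */
    return 0;
}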
There are many questions (and answers) on this subject, but I am too thick to figure it out. In C, for a floating-point value of a given type, say double:
double x;
scanf("%lf", &x);
Is there a generic way to calculate an upper bound (as small as possible) for the error between the decimal fraction string passed to scanf and the internal representation of what is now in x?
If I understand correctly, there is sometimes going to be an error, and it will increase as the absolute value of the decimal fraction increases (in other words, 0.1 will be a bit off, but 100000000.1 will be off by much more).
This aspect of the C standard is slightly under-specified, but you can expect the conversion from decimal to double to be within one Unit in the Last Place of the original.
You seem to be looking for a bound on the absolute error of the conversion. With the above assumption, you can compute such a bound as a double as DBL_EPSILON * x. DBL_EPSILON is typically 2^-52.
A tighter bound on the error that can have been made during the conversion can be computed as follows:
double va = fabs(x);                           /* needs <math.h> for fabs, nextafter, INFINITY */
double error = nextafter(va, INFINITY) - va;   /* the gap to the next double above |x|, i.e. one ULP */
The best conversion functions guarantee conversion to half a ULP in default round-to-nearest mode. If you are using conversion functions with this guarantee, you can divide the bound I offer by two.
The above applies when the original number represented in decimal is 0 or when its absolute value is between DBL_MIN (approx. 2*10^-308) and DBL_MAX (approx. 2*10^308). If the non-null decimal number's absolute value is lower than DBL_MIN, then the absolute error is only bounded by DBL_MIN * DBL_EPSILON. If the absolute value is higher than DBL_MAX, you are likely to get infinity as the result of the conversion.
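Putting the pieces together, here is a hedged sketch of a bound computation along the lines described above (the helper name conversion_error_bound is mine):

#include <float.h>
#include <math.h>
#include <stdio.h>

/* Upper bound on the absolute error of converting a decimal string to the
   double x, assuming the conversion is correct to within one ULP. */
static double conversion_error_bound(double x)
{
    double va = fabs(x);
    if (va < DBL_MIN)                     /* zero or subnormal range     */
        return DBL_MIN * DBL_EPSILON;     /* smallest positive subnormal */
    return nextafter(va, INFINITY) - va;  /* one ULP of |x|              */
}

int main(void)
{
    double x;
    if (scanf("%lf", &x) == 1)
        printf("x = %.17g, error bound = %.3g\n", x, conversion_error_bound(x));
    return 0;
}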
You can't think of this in terms of base 10; the error is in base 2, which won't necessarily point to a specific decimal place in base 10.
You have two underlying issues with your question. First, scanf takes an ASCII string and converts it to a binary number; that is one piece of software which uses a number of C libraries. I have seen, for example, compile-time parsing vs. runtime parsing give different conversion results on the same system. So in terms of error, if you want an exact number, convert it yourself and place that binary number in the register/variable; otherwise accept what you get with the conversion and understand there may be rounding or clipping in the conversion that you didn't expect (which results in an accuracy issue: you didn't get the number you expected).
The second and real problem Pascal already answered: you only have a fixed number of binary places. In terms of decimal, if you had 3 decimal places the number 1.2345 would either have to be represented as 1.234 or 1.235. Same for binary: if you have 3 bits of mantissa, then 1.0011 is either 1.001 or 1.010 depending on rounding. The mantissa length for IEEE floating-point numbers is well documented; you can simply google to find how many binary places you have for each precision.
#include <stdio.h>

int main(void)
{
    float x = 3.4e2;
    printf("%f", x);
    return 0;
}
Output:
340.000000 // It's ok.
But if I write x=3.1234e2 the output is 312.339996, and if x=3.12345678e2 the output is 312.345673.
Why are the outputs like this? I think if I write x=3.1234e2 the output should be 312.340000, but the actual output is 312.339996 with the GCC compiler.
Not all fractional numbers have an exact binary equivalent, so they are rounded to the nearest representable value.
Simplified example,
if you have 3 bits for the fraction, you can have:
0
0.125
0.25
0.375
...
0.5 has an exact representation, but 0.1 will be shown as 0.125.
Of course the real differences are much smaller.
Floating-point numbers are normally represented as binary fractions times a power of two, for efficiency. This is about as accurate as base-10 representation, except that there are decimal fractions that cannot be exactly represented as binary fractions. They are, instead, represented as approximations.
Moreover, a float is normally 32 bits long, which means that it doesn't have all that many significant digits. You can see in your examples that they're accurate to about 8 significant digits.
You are, however, printing the numbers to slightly beyond their significance, and therefore you're seeing the difference. Look at your printf format string documentation to see how to print fewer digits.
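To illustrate, here is a small sketch (the exact digits shown assume IEEE 754 single precision, where the float nearest to 312.34 is 312.339996337890625):

#include <stdio.h>

int main(void)
{
    float x = 3.1234e2f;
    printf("%f\n", x);      /* 312.339996  (default: 6 decimals)           */
    printf("%.4f\n", x);    /* 312.3400    (fewer digits, rounds nicely)   */
    printf("%.10f\n", x);   /* 312.3399963379 (past the float's precision) */
    return 0;
}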
You may need to represent decimal numbers exactly; this often happens in financial applications. In that case, you need to use a special library to represent numbers, or simply calculate everything as integers (such as representing amounts as cents rather than as dollars and fractions of a dollar).
The standard reference is What Every Computer Scientist Should Know About Floating-Point Arithmetic, but it looks like that would be very advanced for you. Alternatively, you could Google floating-point formats (particularly IEEE standard formats) or look them up on Wikipedia, if you wanted the details.
After all of this time, I've never thought to ask this question; I understand this came from C++, but what was the reasoning behind it:
Specify decimal numbers as you normally would
Specify octal numbers by a leading 0
Specify hexadecimal numbers by a leading 0x
Why 0? Why 0x? Is there a natural progression for base-32?
C, the ancestor of C++ and Java, was originally developed by Dennis Ritchie on PDP-8s in the early 70s. Those machines had a 12-bit address space, so pointers (addresses) were 12 bits long and most conveniently represented in code by four 3-bit octal digits (first addressable word would be 0000 octal, last addressable word 7777 octal).
Octal does not map well to 8 bit bytes because each octal digit represents three bits, so there will always be excess bits representable in the octal notation. An all-TRUE-bits byte (1111 1111) is 377 in octal, but FF in hex.
Hex is easier for most people to convert to and from binary in their heads, since binary numbers are usually expressed in blocks of eight (because that's the size of a byte) and eight is exactly two Hex digits, but Hex notation would have been clunky and misleading in Dennis' time (implying the ability to address 16 bits). Programmers need to think in binary when working with hardware (for which each bit typically represents a physical wire) and when working with bit-wise logic (for which each bit has a programmer-defined meaning).
I imagine Dennis added the 0 prefix as the simplest possible variation on everyday decimal numbers, and easiest for those early parsers to distinguish.
I believe Hex notation 0x__ was added to C slightly later. The compiler parse tree to distinguish 1-9 (first digit of a decimal constant), 0 (first [insignificant] digit of an octal constant), and 0x (indicating a hex constant to follow in subsequent digits) from each other is considerably more complicated than just using a leading 0 as the indicator to switch from parsing subsequent digits as octal rather than decimal.
Why did Dennis design this way? Contemporary programmers don't appreciate that those early computers were often controlled by toggling instructions to the CPU by physically flipping switches on the CPUs front panel, or with a punch card or paper tape; all environments where saving a few steps or instructions represented savings of significant manual labor. Also, memory was limited and expensive, so saving even a few instructions had a high value.
In summary:
0 for octal because it was efficiently parseable and octal was user-friendly on PDP-8s (at least for address manipulation)
0x for hex probably because it was a natural and backward-compatible extension on the octal prefix standard and still relatively efficient to parse.
The zero prefix for octal, and 0x for hex, are from the early days of Unix.
The reason for octal's existence dates to when there was hardware with 6-bit bytes, which made octal the natural choice. Each octal digit represents 3 bits, so a 6-bit byte is two octal digits. The same goes for hex, from 8-bit bytes, where a hex digit is 4 bits and thus a byte is two hex digits. Using octal for 8-bit bytes requires 3 octal digits, of which the first can only have the values 0, 1, 2 and 3 (the first digit is really 'tetral', not octal).
There is no reason to go to base32 unless somebody develops a system in which bytes are ten bits long, so a ten-bit byte could be represented as two 5-bit "nybbles".
“New” numerals had to start with a digit, to work with existing syntax.
Established practice had variable names and other identifiers starting with a letter (or a few other symbols, perhaps underscore or dollar sign). So “a”, “abc”, and “a04” are all names. Numbers started with a digit. So “3” and “3e5” are numbers.
When you add new things to a programming language, you seek to make them fit into the existing syntax, grammar, and semantics, and you try to make existing code continue working. So, you would not want to change the syntax to make “x34” a hexadecimal number or “o34” an octal number.
So, how do you fit octal numerals into this syntax? Somebody realized that, except for “0”, there is no need for numerals beginning with “0”. Nobody needs to write “0123” for 123. So we use a leading zero to denote octal numerals.
What about hexadecimal numerals? You could use a suffix, so that “34x” means 34 in base 16. However, then the parser has to read all the way to the end of the numeral before it knows how to interpret the digits (unless it encounters one of the “a” to “f” digits, which would of course indicate hexadecimal). It is “easier” on the parser to know that the numeral is hexadecimal early. But you still have to start with a digit, and the zero trick has already been used, so we need something else. “x” was picked, and now we have “0x” for hexadecimal.
(The above is based on my understanding of parsing and some general history about language development, not on knowledge of specific decisions made by compiler developers or language committees.)
I dunno ...
0 is for 0ctal
0x is for, well, we've already used 0 to mean octal and there's an x in hexadecimal so bung that in there too
as for natural progression, best look to the latest programming languages which can affix subscripts such as
123_27 (interpret _ to mean subscript)
and so on
?
Mark
Is there a natural progression for base-32?
This is part of why Ada uses the form 16# to introduce hex constants, 8# for octal, 2# for binary, etc.
I wouldn't concern myself too much over needing space for "future growth" in basing though. This isn't like RAM or addressing space where you need an order of magnitude more every generation.
In fact, studies have shown that octal and hex are pretty much the sweet spot for human-readable representations that are binary-compatible. If you go any lower than octal, it starts to require a ridiculous number of digits to represent larger numbers. If you go any higher than hex, the math tables get ridiculously large. Hex is actually a bit too much already, but octal has the problem that it doesn't evenly fit in a byte.
There is a standard encoding for Base32. It is very similar to Base64. But it isn't very convenient to read. Hex is used because 2 hex digits can be used to represent 1 8-bit byte. And octal was used primarily for older systems that used 12-bit bytes. It made for a more compact representation of data when compared to displaying raw registers as binary.
It should also be noted that some languages use o### for octal and x## or h## for hex, as well as many other variations.
I think 0x actually came from the UNIX/Linux world and was picked up by C/C++ and other languages. But I don't know the exact reason or true origin.