Best IEEE 754-1985 representation for X3.9-1978 based standard - c

As per DICOM standard, a type of floating point can be stored using a Value Representation of Decimal String. See Table 6.2-1. DICOM Value Representations:
Decimal String: A string of characters representing either a fixed
point number or a floating point number. A fixed point number shall
contain only the characters 0-9 with an optional leading "+" or "-"
and an optional "." to mark the decimal point. A floating point number
shall be conveyed as defined in ANSI X3.9, with an "E" or "e" to
indicate the start of the exponent. Decimal Strings may be padded with
leading or trailing spaces. Embedded spaces are not allowed.
"0"-"9", "+", "-", "E", "e", "." and the SPACE character of Default
Character Repertoire. 16 bytes maximum
So I would be tempted to simply use 64 bits double (IEEE 754-1985) to represent the value in memory in my C code, based on the fact that input is stored on a maximum of 16 bytes.
Could someone with a little bif more knowledge of X3.9-1978 confirms that this is the best possible representation (compared to arbitrary-precision, float and/or long double) ? By best, I mean the representation were round-trip read/write will be visually lossless. I should be able to read such ASCII floating point representation from disk, put it into memory, and write it back to disk (as specified above) with maximum accuracy compared to the original values (= machine epsilon when possible). The actual implementation details on how to represent a double as ASCII with only 16 bytes of storage is outside the scope of this question, see here for details..

This is heavily based on Hans Passant's and Mark Dickinson's comments.
Using any floating point type to represent decimal values is generally a bad idea because binary floating point types cannot exactly represent decimal values. Typically never use them for processing exact monetary values.
But here, DICOM spec sets the limit to 16 characters, when the precision of a double is about 15-16 decimal digits (ref.). As soon as your decimal string contains one sign (+/-), a dot (.) or an exponent part (e/E), you will have at most 15 decimal digits and a round trip should be correct. The only problems should occur when you have 16 digits. The example provided by Mark Dickinson is: the 16-character strings 9999999999999997 and 9999999999999996 would both map to the same IEEE 754 binary64 float value.
TL/DR: Hans Passant gave a nice abstract: "16 bytes maximum" [is] exactly as many accurate significant digits you can store in a double. This DICOM spec was written to let you use double. So just use it
Disclaimer: All values acceptable in IEEE 754 will be correctly processed, but beware, 1e1024 will be an acceptable value for a DICOM Decimal String, but it in not representable in a double (limited at about 1e308).

Related

Is there a C equivalent of C#'s decimal type?

In relation to: Convert Decimal to Double
Now, I got to many questions relating to C#'s floating-point type called decimal and I've seen its differences with both float and double, which got me thinking if there is an equivalent to this type in C.
In the question specified, I got an answer I want to convert to C:
double trans = trackBar1.Value / 5000.0;
double trans = trackBar1.Value / 5000d;
Of course, the only change is the second line gone, but with the thing about the decimal type, I want to know it's C equivalent.
Question: What is the C equivalent of C#'s decimal?
C2X will standardize decimal floating point as _DecimalN: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2573.pdf
In addition, GCC implements decimal floating point as an extension; it currently supports 32-bit, 64-bit, and 128-bit decimal floats.
Edit: much of what I said below is just plain wrong, as pointed out by phuclv in the comments. Even then, I think there's valuable information to be gained in reading that answer, so I'll leave it unedited below.
So in short: yes, there is support for Decimal floating-point values and arithmetic in the standard C language. Just check out phuclv's comment and S.S. Anne's answer.
In the C programming language, as others have commented, there's no such thing as a Decimal type, nor are there types implemented like it. The simplest type that is close to it would be double, which is implemented, most commonly, as an IEEE-754 compliant 64-bit floating-point type. It contains a 1-bit sign, an 11-bit exponent and a 52-bit mantissa/fraction. The following image represents it quite well(from wikipedia):
So you have the following format:
A more detailed explanation can be read here, but you can see that the exponent part is a power of two, which means that there will be imprecisions when dealing with division and multiplication by ten. A simple explanation is because division by anything that isn't a power of two is sure to repeat digits indefinitely in base 2. Example: 1/10 = 0.1(in base 10) = 0.00011001100110011...(in base 2). And, because computers can't store an unlimited amount of zeroes, your operations will have to be truncated/approximated.
In the case of C#'s Decimal, from the documentation:
The binary representation of a Decimal number consists of a 1-bit sign, a 96-bit integer number, and a scaling factor used to divide the integer number and specify what portion of it is a decimal fraction.
This last part is important, because instead of being a multiplication by a power of two, it is a division by a power of ten. So you have the following format:
Which, as you can clearly see, is a completely different implementation from above!
For instance, if you wanted to divide by a power of 10, you could do that exactly, because that just involves increasing the exponent part(N). You have to be aware of the limitation of the numbers that can be represented by Decimal, though, which is at most a measly 7.922816251426434e+28, whereas double can go up to 1.79769e+308.
Given that there are no equivalents (yet) in C to Decimal, you may wonder "what do I do?". Well, it depends. First off, is it really important for you to use a Decimal type? Can't you use a double? To answer that question, it's helpful to know why that type was created in the first place. Again, from Microsoft's documentation:
The Decimal value type is appropriate for financial calculations that require large numbers of significant integral and fractional digits and no round-off errors
And, just at the next phrase:
The Decimal type does not eliminate the need for rounding. Rather, it minimizes errors due to rounding
So you shouldn't think of Decimal as having "infinite precision", just as being a more appropriate type for calculations that generally need to be made in the decimal system(such as financial ones, as stated above).
If you still want a Decimal data type in C, you'd have to work in developing a library to support addition, subtraction, multiplication, etc --- Because C doesn't support operator overloading. Also, it still wouldn't have hardware support(e.g. from the x64 instruction set), so all of your operations would be slower than those of double, for example. Finally, if you still want something that supports a Decimal in other languages(in your final question), you may look into Decimal TR in C++.
As other pointed out, there's nothing in C standard(s) such as .NET's decimal, but, if you're working on Windows and have the Windows SDK, it's defined:
DECIMAL structure (wtypes.h)
Represents a decimal data type that provides a sign and scale for a
number (as in coordinates.)
Decimal variables are stored as 96-bit (12-byte) unsigned integers
scaled by a variable power of 10. The power of 10 scaling factor
specifies the number of digits to the right of the decimal point, and
ranges from 0 to 28.
typedef struct tagDEC {
USHORT wReserved;
union {
struct {
BYTE scale;
BYTE sign;
} DUMMYSTRUCTNAME;
USHORT signscale;
} DUMMYUNIONNAME;
ULONG Hi32;
union {
struct {
ULONG Lo32;
ULONG Mid32;
} DUMMYSTRUCTNAME2;
ULONGLONG Lo64;
} DUMMYUNIONNAME2;
} DECIMAL;
DECIMAL is used to represent an exact numeric value with a fixed precision and fixed scale.
The origin of this type is Windows' COM/OLE automation (introduced for VB/VBA/Macros, etc. so, it predates .NET, which has very good COM automation support), documented here officially: [MS-OAUT]: OLE Automation Protocol, 2.2.26 DECIMAL
It's also one of the VARIANT type (VT_DECIMAL). In x86 architecture, it's size fits right in the VARIANT (16 bytes).
Decimal type in C# is used is used with precision of 28-29 digits and it has size of 16 bytes.There is not even a close equivalent in C to C#.In Java there is a BigDecimal data type that is closest to C# decimal data type.C# decimal gives you numbers like:
+/- someInteger / 10 ^ someExponent
where someInteger is a 96 bit unsigned integer and someExponent is an integer between 0 and 28.
Is Java's BigDecimal the closest data type corresponding to C#'s Decimal?

Print decimal fixed-point/float formats as hexadecimal [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 3 years ago.
Improve this question
Adding the ability to print a decimal fixed-point number as hexadecimal in my general purpose library and realized i wasn't %100 sure how i should represent the fraction part of the number. A quick google search suggests i should:
Multiply by 16
Convert the integer part to hex and add it to the buffer
Get rid of the integer part
Repeat
As suggested here https://bytes.com/topic/c/answers/219928-how-convert-float-hex
This method is for floating-point (ieee 754 binary formats) and it works for that just fine. However, i tried to adopt this to my decimal fixed-point (scaled by 8) format, and after testing this approach on paper i noticed for some fractions (i.e. .7), this causes a repeating pattern of .B3333... and so on.
To me this looks very undesirable. I also wonder if this would case a loss in precision if i was to try to read this from a string into my fixed-point format.
Is there any reason why someone wouldn't print the fraction part like any other 2s complement hexadecimal number? i.e where 17535.564453 is printed as 447F.89CE5
While this is targeted at decimal fixed-point, I'm looking for a solution that can also be used by other real number formats such as ieee 754 binary.
Perhaps theres another alternative to these 2 methods. Any ideas?
Although the question asks about fixed-point, the C standard has some useful information in its information for the %a format for floating-point. C 2018 7.21.6.1 8 says:
… if the [user-requested] precision is missing and FLT_RADIX is not a power of 2, then the precision is sufficient to distinguish285) values of type double, except that trailing zeros may be omitted;…
Footnote 285 says:
The precision p is sufficient to distinguish values of the source type if 16p−1 > bn where b is FLT_RADIX and n is the number of base-b digits in the significand of the source type…
To see this intuitively, visualize the decimal fixed-point numbers on the real number line from 0 to 1. For each such number x, visualize a segment starting halfway toward the previous fixed-point number and ending halfway toward the next fixed-point number. All the points in that segment are closer to x than they are to the previous or next numbers, except for the endpoints. Now, consider where all the single-hexadecimal-digit numbers j/16 are. They lie in some of those segments. But, if there are 100 segments (from two-digit decimal numbers), most of the segments do not contain one of those single-hexadecimal-digit numbers. If you increase the number of hexadecimal digits, p, until 16p−1 > bn, then the spacing between the hexadecimal numbers is less than the width of the segments, and every segment contains a hexadecimal number.
This shows that using p hexadecimal digits is sufficient to distinguish numbers made with bn decimal digits. (This is sufficient, but it may be one more than necessary.) This means all the information needed to recover the original decimal number is present, and avoiding any loss of accuracy in recovering the original decimal number is a matter of programming the conversion from hexadecimal to decimal correctly.
Printing the fraction “like any other hexadecimal number” is inadequate if leading zeroes are not accounted for. The decimal numbers “3.7” and “3.007” are different, so the fraction part cannot be formatted merely as “7”. If a convention is adopted to convert the decimal part **including trailing zeros* to hexadecimal, then this could work. For example, if the decimal fixed-point number has four decimal digits after the decimal point, then treating the fraction parts of 3.7 and 3.007 as 7000 and 0070 and converting those to hexadecimal will preserve the required information. When converting back, one would convert the hexadecimal to decimal, format it in four digits, and insert it into the decimal fixed-point number. This could be a suitable solution where speed is desired, but it will not be a good representation for human use.
Of course, if one merely wishes to preserve the information in the number so that it can be transmitted or stored and later recovered, one might as well simply transmit the bits representing the number with whatever conversion is easiest to compute, such as formatting all the raw bits as hexadecimal.

How are floating point numbers stored in memory?

I've read that they're stored in the form of mantissa and exponent
I've read this document but I could not understand anything.
To understand how they are stored, you must first understand what they are and what kind of values they are intended to handle.
Unlike integers, a floating-point value is intended to represent extremely small values as well as extremely large. For normal 32-bit floating-point values, this corresponds to values in the range from 1.175494351 * 10^-38 to 3.40282347 * 10^+38.
Clearly, using only 32 bits, it's not possible to store every digit in such numbers.
When it comes to the representation, you can see all normal floating-point numbers as a value in the range 1.0 to (almost) 2.0, scaled with a power of two. So:
1.0 is simply 1.0 * 2^0,
2.0 is 1.0 * 2^1, and
-5.0 is -1.25 * 2^2.
So, what is needed to encode this, as efficiently as possible? What do we really need?
The sign of the expression.
The exponent
The value in the range 1.0 to (almost) 2.0. This is known as the "mantissa" or the significand.
This is encoded as follows, according to the IEEE-754 floating-point standard.
The sign is a single bit.
The exponent is stored as an unsigned integer, for 32-bits floating-point values, this field is 8 bits. 1 represents the smallest exponent and "all ones - 1" the largest. (0 and "all ones" are used to encode special values, see below.) A value in the middle (127, in the 32-bit case) represents zero, this is also known as the bias.
When looking at the mantissa (the value between 1.0 and (almost) 2.0), one sees that all possible values start with a "1" (both in the decimal and binary representation). This means that it's no point in storing it. The rest of the binary digits are stored in an integer field, in the 32-bit case this field is 23 bits.
In addition to the normal floating-point values, there are a number of special values:
Zero is encoded with both exponent and mantissa as zero. The sign bit is used to represent "plus zero" and "minus zero". A minus zero is useful when the result of an operation is extremely small, but it's still important to know from which direction the operation came from.
plus and minus infinity -- represented using an "all ones" exponent and a zero mantissa field.
Not a Number (NaN) -- represented using an "all ones" exponent and a non-zero mantissa.
Denormalized numbers -- numbers smaller than the smallest normal number. Represented using a zero exponent field and a non-zero mantissa. The special thing with these numbers is that the precision (i.e. the number of digits a value can contain) will drop the smaller the value becomes, simply because there is not room for them in the mantissa.
Finally, the following is a handful of concrete examples (all values are in hex):
1.0 : 3f800000
-1234.0 : c49a4000
100000000000000000000000.0: 65a96816
In layman's terms, it's essentially scientific notation in binary. The formal standard (with details) is IEEE 754.
typedef struct {
unsigned int mantissa_low:32;
unsigned int mantissa_high:20;
unsigned int exponent:11;
unsigned int sign:1;
} tDoubleStruct;
double a = 1.2;
tDoubleStruct* b = reinterpret_cast<tDoubleStruct*>(&a);
Is an example how memory is set up if compiler uses IEEE 754 double precision which is the default for a C double on little endian systems (e.g. Intel x86).
Here it is in C based binary form and better read
wikipedia about double precision to understand it.
There are a number of different floating-point formats. Most of them share a few common characteristics: a sign bit, some bits dedicated to storing an exponent, and some bits dedicated to storing the significand (also called the mantissa).
The IEEE floating-point standard attempts to define a single format (or rather set of formats of a few sizes) that can be implemented on a variety of systems. It also defines the available operations and their semantics. It's caught on quite well, and most systems you're likely to encounter probably use IEEE floating-point. But other formats are still in use, as well as not-quite-complete IEEE implementations. The C standard provides optional support for IEEE, but doesn't mandate it.
The mantissa represents the most significant bits of the number.
The exponent represents how many shifts are to be performed on the mantissa in order to get the actual value of the number.
Encoding specifies how are represented sign of mantissa and sign of exponent (basically whether shifting to the left or to the right).
The document you refer to specifies IEEE encoding, the most widely used.
I have found the article you referenced quite illegible (and I DO know a little how IEEE floats work). I suggest you try with the Wiki version of the explanation. It's quite clear and has various examples:
http://en.wikipedia.org/wiki/Single_precision and http://en.wikipedia.org/wiki/Double_precision
It is implementation defined, although IEEE-754 is the most common by far.
To be sure that IEEE-754 is used:
in C, use #ifdef __STDC_IEC_559__
in C++, use the std::numeric_limits<float>::is_iec559 constants
I've written some guides on IEEE-754 at:
In Java, what does NaN mean?
What is a subnormal floating point number?

Why does the value of this float change from what it was set to?

Why is this C program giving the "wrong" output?
#include<stdio.h>
void main()
{
float f = 12345.054321;
printf("%f", f);
getch();
}
Output:
12345.054688
But the output should be, 12345.054321.
I am using VC++ in VS2008.
It's giving the "wrong" answer simply because not all real values are representable by floats (or doubles, for that matter). What you'll get is an approximation based on the underlying encoding.
In order to represent every real value, even between 1.0x10-100 and 1.1x10-100 (a truly minuscule range), you still require an infinite number of bits.
Single-precision IEEE754 values have only 32 bits available (some of which are tasked to other things such as exponent and NaN/Inf representations) and cannot therefore give you infinite precision. They actually have 23 bits available giving precision of about 224 (there's an extra implicit bit) or just over 7 decimal digits (log10(224) is roughly 7.2).
I enclose the word "wrong" in quotes because it's not actually wrong. What's wrong is your understanding about how computers represent numbers (don't be offended though, you're not alone in this misapprehension).
Head on over to http://www.h-schmidt.net/FloatApplet/IEEE754.html and type your number into the "Decimal representation" box to see this in action.
If you want a more accurate number, use doubles instead of floats - these have double the number of bits available for representing values (assuming your C implementation is using IEEE754 single and double precision data types for float and double respectively).
If you want arbitrary precision, you'll need to use a "bignum" library like GMP although that's somewhat slower than native types so make sure you understand the trade-offs.
The decimal number 12345.054321 cannot be represented accurately as a float on your platform. The result that you are seeing is a decimal approximation to the closest number that can be represented as a float.
floats are about convenience and speed, and use a binary representation - if you care about precision use a decimal type.
To understand the problem, read What Every Computer Scientist Should Know About Floating-Point Arithmetic:
http://docs.sun.com/source/806-3568/ncg_goldberg.html
For a solution, see the Decimal Arithmetic FAQ:
http://speleotrove.com/decimal/decifaq.html
It's all to do with precision. Your number cannot be stored accurately in a float.
Single-precision floating point values can only represent about eight to nine significant (decimal) digits. Beyond that point, you're seeing quantization error.

Where did the octal/hex notations come from? [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 10 years ago.
Improve this question
After all of this time, I've never thought to ask this question; I understand this came from c++, but what was the reasoning behind it:
Specify decimal numbers as you
normally would
Specify octal numbers by a leading 0
Specify hexadecimal numbers by a leading 0x
Why 0? Why 0x? Is there a natural progression for base-32?
C, the ancestor of C++ and Java, was originally developed by Dennis Richie on PDP-8s in the early 70s. Those machines had a 12-bit address space, so pointers (addresses) were 12 bits long and most conveniently represented in code by four 3-bit octal digits (first addressable word would be 0000octal, last addressable word 7777octal).
Octal does not map well to 8 bit bytes because each octal digit represents three bits, so there will always be excess bits representable in the octal notation. An all-TRUE-bits byte (1111 1111) is 377 in octal, but FF in hex.
Hex is easier for most people to convert to and from binary in their heads, since binary numbers are usually expressed in blocks of eight (because that's the size of a byte) and eight is exactly two Hex digits, but Hex notation would have been clunky and misleading in Dennis' time (implying the ability to address 16 bits). Programmers need to think in binary when working with hardware (for which each bit typically represents a physical wire) and when working with bit-wise logic (for which each bit has a programmer-defined meaning).
I imagine Dennis added the 0 prefix as the simplest possible variation on everyday decimal numbers, and easiest for those early parsers to distinguish.
I believe Hex notation 0x__ was added to C slightly later. The compiler parse tree to distinguish 1-9 (first digit of a decimal constant), 0 (first [insignificant] digit of an octal constant), and 0x (indicating a hex constant to follow in subsequent digits) from each other is considerably more complicated than just using a leading 0 as the indicator to switch from parsing subsequent digits as octal rather than decimal.
Why did Dennis design this way? Contemporary programmers don't appreciate that those early computers were often controlled by toggling instructions to the CPU by physically flipping switches on the CPUs front panel, or with a punch card or paper tape; all environments where saving a few steps or instructions represented savings of significant manual labor. Also, memory was limited and expensive, so saving even a few instructions had a high value.
In summary:
0 for octal because it was efficiently parseable and octal was user-friendly on PDP-8s (at least for address manipulation)
0x for hex probably because it was a natural and backward-compatible extension on the octal prefix standard and still relatively efficient to parse.
The zero prefix for octal, and 0x for hex, are from the early days of Unix.
The reason for octal's existence dates to when there was hardware with 6-bit bytes, which made octal the natural choice. Each octal digit represents 3 bits, so a 6-bit byte is two octal digits. The same goes for hex, from 8-bit bytes, where a hex digit is 4 bits and thus a byte is two hex digits. Using octal for 8-bit bytes requires 3 octal digits, of which the first can only have the values 0, 1, 2 and 3 (the first digit is really 'tetral', not octal).
There is no reason to go to base32 unless somebody develops a system in which bytes are ten bits long, so a ten-bit byte could be represented as two 5-bit "nybbles".
“New” numerals had to start with a digit, to work with existing syntax.
Established practice had variable names and other identifiers starting with a letter (or a few other symbols, perhaps underscore or dollar sign). So “a”, “abc”, and “a04” are all names. Numbers started with a digit. So “3” and “3e5” are numbers.
When you add new things to a programming language, you seek to make them fit into the existing syntax, grammar, and semantics, and you try to make existing code continue working. So, you would not want to change the syntax to make “x34” a hexadecimal number or “o34” an octal number.
So, how do you fit octal numerals into this syntax? Somebody realized that, except for “0”, there is no need for numerals beginning with “0”. Nobody needs to write “0123” for 123. So we use a leading zero to denote octal numerals.
What about hexadecimal numerals? You could use a suffix, so that “34x” means 3416. However, then the parser has to read all the way to the end of the numeral before it knows how to interpret the digits (unless it encounters one of the “a” to “f” digits, which would of course indicate hexadecimal). It is “easier” on the parser to know that the numeral is hexadecimal early. But you still have to start with a digit, and the zero trick has already been used, so we need something else. “x” was picked, and now we have “0x” for hexadecimal.
(The above is based on my understanding of parsing and some general history about language development, not on knowledge of specific decisions made by compiler developers or language committees.)
I dunno ...
0 is for 0ctal
0x is for, well, we've already used 0 to mean octal and there's an x in hexadecimal so bung that in there too
as for natural progression, best look to the latest programming languages which can affix subscripts such as
123_27 (interpret _ to mean subscript)
and so on
?
Mark
Is there a natural progression for base-32?
This is part of why Ada uses the form 16# to introduce hex constants, 8# for octal, 2# for binary, etc.
I wouldn't concern myself too much over needing space for "future growth" in basing though. This isn't like RAM or addressing space where you need an order of magnitude more every generation.
In fact, studies have shown that octal and hex are pretty much the sweet spot for human-readable representations that are binary-compatible. If you go any lower than octal, it starts to require a rediculous number of digits to represent larger numbers. If you go any higher than hex, the math tables get rediculously large. Hex is actually a bit too much already, but Octal has the problem that it doesn't evenly fit in a byte.
There is a standard encoding for Base32. It is very similar to Base64. But it isn't very convenient to read. Hex is used because 2 hex digits can be used to represent 1 8-bit byte. And octal was used primarily for older systems that used 12-bit bytes. It made for a more compact representation of data when compared to displaying raw registers as binary.
It should also be noted that some languages use o### for octal and x## or h## for hex, as well as, many other variations.
I think it 0x actually came for the UNIX/Linux world and was picked-up by C/C++ and other languages. But I don't know the exact reason or true origin.

Resources