I understand that the two's complement is used to represent a negative number but there is also the method of using sign magnitude. Is sign magnitude still used to represent negative numbers? If not where was it previously used then? And how is a machine that interprets negative numbers using two's complement able to communicate and understand another machine that uses the sign magnitude instead?
Yes, sign magnitude is frequently used today, though not where you might expect. For example, IEEE floating point uses a single "sign" bit to denote positive or negative. (As a result, IEEE floating point numbers can be -0.) Sign magnitude is not commonly used today for integers, however.
Communication between two machines using different number representations only presents a problem if they both try to use their native encoding format. If a common format is defined for exchanging information, there is no problem. For example, a machine that uses two's complement can easily construct a number using sign magnitude encoding (and vice versa). These days, different machines are more likely to communicate using the ASCII representation of the number (eg. in JSON or XML), or to use a completely different binary encoding (eg. ASN.1, zigzag encoding, etc).
Related
As far as I know, representing a fraction in C relies on floats and doubles which are in floating point representation.
Assume I'm trying to represent 1.5 which is a fixed point number (only one digit to the right of the radix point). Is there a way to represent such number in C or even assembly using a fixed point data type?
Are there even any fixed point instructions on x86 (or other architectures) which would operate on such type?
Every integral type can be used as a fixed point type. A favorite of mine is to use int64_t with an implied 8 digit shift, e.g. you store 1.5 as 150000000 (1.5e8). You'll have to analyze your use case to decide on an underlying type and how many digits to shift (that is, assuming you use base-10 scaling, which most people do). But 64 bits scaled by 10^8 is a pretty reasonable starting point with a broad range of uses.
While some C compilers offer special fixed-point types as an extension (not part of the standard C language), there's really very little use for them. Fixed point is just integers, interpreted with a different unit. For example, fixed point currency in typical cent denominations is just using integers that represent cents instead of dollars (or whatever the whole currency unit is) for your unit. Likewise, you can think of 8-bit RGB as having units of 1/256 or 1/255 "full intensity".
Adding and subtracting fixed point values with the same unit is just adding and subtracting integers. This is just like arithmetic with units in the physical sciences. The only value in having the language track that they're "fixed point" would be ensuring that you can only add/subtract values with matching units.
For multiplication and division, the result will not have the same units as the operands so you have to either treat the result as a different fixed-point type, or renormalize. For example if you multiply two values representing 1/16 units, the result will have 1/256 units. You can then either scale the value down by a factor of 16 (rounding in whatever way is appropriate) to get back to a value with 1/16 units.
If the issue here is representing decimal values as fixed point, there's probably a library for this for C, you could try a web search. You could create your own BCD fixed point library in assembly, using the BCD related instructions, AAA (adjusts after addition), AAS (adjusts after subtraction) and AAM (adjusts after multiplication). However, it seems these instructions are invalid in X86 X64 (64 bit) mode, so you'll need to use a 32 bit program, which should be runnable on a 64 bit OS.
Financial institutions in the USA and other countries are required by law to perform decimal based math on currency values, to avoid decimal -> binary -> decimal conversion issues.
I have read:
"Like an unsigned int, but offset by −(2^(n−1) − 1), where n is the number of bits in the numeral. Aside:
Technically we could choose any bias we please, but the choice presented here is extraordinarily common." - http://inst.eecs.berkeley.edu/~cs61c/sp14/disc/00/Disc0.pdf
However, I don't get what the point is. Can someone explain this to me with examples? Also, when should I use it, given other options like one's compliment, sign and mag, and two's compliment?
Biased notation is a way of storing a range of values that doesn't start with zero.
Put simply, you take an existing representation that goes from zero to N, and then add a bias B to each number so it now goes from B to N+B.
Floating-point exponents are stored with a bias to keep the dynamic range of the type "centered" on 1.
Excess-three encoding is a technique for simplifying decimal arithmetic using a bias of three.
Two's complement notation could be considered as biased notation with a bias of INT_MIN and the most-significant bit flipped.
A "representation" is a way of encoding information so that it easy to extract details or inferences from the encoded information.
Most modern CPUs "represent" numbers using "twos complement notation". They do this because it is easy to design digital circuits that can do what amounts to arithmetic on these values quickly (add, subtract, multiply, divide, ...). Twos complement also has the nice property that one can interpret the most significant bit as either a power-of-two (giving "unsigned numbers") or as a sign bit (giving signed numbers) without changing essentially any of the hardware used to implement the arithmetic.
Older machines used other bases, e.g, quite common in the 60s were machines that represented numbers as sets of binary-coded-decimal digits stuck in 4-bit addressable nibbles (the IBM 1620 and 1401 are examples of this). So, you can represent that same concept or value different ways.
A bias just means that whatever representation you chose (for numbers), you have added a constant bias to that value. Presumably that is done to enable something to be done more effectively. I can't speak to "−(2^(n−1) − 1)" being "an extraordinaly common (bias)"; I do lots of assembly and C coding and pretty don't find a need to "bias" values.
However, there is a common example. Modern CPUs largely implement IEEE floating point, which stores floating point numbers with sign, exponent, mantissa. The exponent is is power of two, symmetric around zero, but biased by 2^(N-1) if I recall correctly, for an N-bit exponent.
This bias allows floating point values with the same sign to be compared for equal/less/greater by using the standard machine twos-complement instructions rather than a special floating point instruction, which means that sometimes use of actual floating point compares can be avoided. (See http://www.cygnus-software.com/papers/comparingfloats/comparingfloats.htm for dark corner details). [Thanks to #PotatoSwatter for noting
the inaccuracy of my initial answer here, and making me go dig this out.]
For a C application that I am implementing, I need to be able to read and write a set of configuration values to a file. These values are floating point numbers. In the future it is possible that another application (could be written in C++, Python, Perl, etc...) will use this same data, so these configuration values need to be stored in a well defined format that is compiler and machine independent.
Byte order conversion functions (ntoh/hton) can be used to handle the Endianness, however what is the best way to get around the different meanings of "float" value? Is there are common method for storing floats? Rounding and truncating is not a problem, just as long as it is defined.
There are probably two main options:
Store in text format. Here you would standardise on a particular format using a well-defined decimal separator and use scientific notation, i.e. 6.66e42.
Store in binary format using the IEEE754 standard. Use either the 4 or 8 byte data type. And as you noted, you'd have to settle on an endianness convention.
A text format is probably more portable because there are machines that do not natively understand IEEE754. That said, such machines are rare in these times.
The C formatted input/output functions have a format specifier for this, %a. It formats a floating-point number in a hexadecimal floating-point format, [-]0xh.hhhhp±d. That is, it has a “-” sign if needed, hexadecimal digits for the fraction part, including a radix point, a “p” (for “power”) to start the exponent and a signed exponent of two (in decimal).
As long as your C implementation uses binary floating-point (or any floating-point such that its FLT_RADIX is a power of two), conversion with the %a format should be exact.
IEEE 754, or ISO/IEC/IEEE 60559:2011, is the standard for floating point used by most languages.
For C, it's officially taken by standard in C11. (C11 Annex F IEC 60559 floating-point arithmetic)
For small amounts of data, such as configuration values, go with text not binary. If you want, go for structured text of some form, such as JSON, XML. Do decide on how many digits to write to represent a floating-point number according to your requirements.
As the range of required portability (across languages, operating systems, time, space, etc) increases so the force of the argument in favour of text becomes stronger.
The latter representation looks more natural to understand. Why do most languages choose the former one? I guess there must be some unique and thus advantageous characteristics in the Two's complement which make data operations easier.
Languages don't specify the number format; the hardware does. Ask Intel why they designed their ALU to do 2's complement
The answer will be because the mathematical operations are more regular in 2s complement; positive and negative numbers need to be handled differently in 1s complement, which means double the hardware/microcode needed for basic math in the CPU.
From Wikipedia
The two's-complement system has the advantage that the fundamental arithmetical operations of addition, subtraction and multiplication are identical, regardless of whether the inputs and outputs are interpreted as unsigned binary numbers or two's complement (provided that overflow is ignored). This property makes the system both simpler to implement and capable of easily handling higher precision arithmetic. Also, zero has only a single representation, obviating the subtleties associated with negative zero, which exists in ones'-complement systems.
I've read that they're stored in the form of mantissa and exponent
I've read this document but I could not understand anything.
To understand how they are stored, you must first understand what they are and what kind of values they are intended to handle.
Unlike integers, a floating-point value is intended to represent extremely small values as well as extremely large. For normal 32-bit floating-point values, this corresponds to values in the range from 1.175494351 * 10^-38 to 3.40282347 * 10^+38.
Clearly, using only 32 bits, it's not possible to store every digit in such numbers.
When it comes to the representation, you can see all normal floating-point numbers as a value in the range 1.0 to (almost) 2.0, scaled with a power of two. So:
1.0 is simply 1.0 * 2^0,
2.0 is 1.0 * 2^1, and
-5.0 is -1.25 * 2^2.
So, what is needed to encode this, as efficiently as possible? What do we really need?
The sign of the expression.
The exponent
The value in the range 1.0 to (almost) 2.0. This is known as the "mantissa" or the significand.
This is encoded as follows, according to the IEEE-754 floating-point standard.
The sign is a single bit.
The exponent is stored as an unsigned integer, for 32-bits floating-point values, this field is 8 bits. 1 represents the smallest exponent and "all ones - 1" the largest. (0 and "all ones" are used to encode special values, see below.) A value in the middle (127, in the 32-bit case) represents zero, this is also known as the bias.
When looking at the mantissa (the value between 1.0 and (almost) 2.0), one sees that all possible values start with a "1" (both in the decimal and binary representation). This means that it's no point in storing it. The rest of the binary digits are stored in an integer field, in the 32-bit case this field is 23 bits.
In addition to the normal floating-point values, there are a number of special values:
Zero is encoded with both exponent and mantissa as zero. The sign bit is used to represent "plus zero" and "minus zero". A minus zero is useful when the result of an operation is extremely small, but it's still important to know from which direction the operation came from.
plus and minus infinity -- represented using an "all ones" exponent and a zero mantissa field.
Not a Number (NaN) -- represented using an "all ones" exponent and a non-zero mantissa.
Denormalized numbers -- numbers smaller than the smallest normal number. Represented using a zero exponent field and a non-zero mantissa. The special thing with these numbers is that the precision (i.e. the number of digits a value can contain) will drop the smaller the value becomes, simply because there is not room for them in the mantissa.
Finally, the following is a handful of concrete examples (all values are in hex):
1.0 : 3f800000
-1234.0 : c49a4000
100000000000000000000000.0: 65a96816
In layman's terms, it's essentially scientific notation in binary. The formal standard (with details) is IEEE 754.
typedef struct {
unsigned int mantissa_low:32;
unsigned int mantissa_high:20;
unsigned int exponent:11;
unsigned int sign:1;
} tDoubleStruct;
double a = 1.2;
tDoubleStruct* b = reinterpret_cast<tDoubleStruct*>(&a);
Is an example how memory is set up if compiler uses IEEE 754 double precision which is the default for a C double on little endian systems (e.g. Intel x86).
Here it is in C based binary form and better read
wikipedia about double precision to understand it.
There are a number of different floating-point formats. Most of them share a few common characteristics: a sign bit, some bits dedicated to storing an exponent, and some bits dedicated to storing the significand (also called the mantissa).
The IEEE floating-point standard attempts to define a single format (or rather set of formats of a few sizes) that can be implemented on a variety of systems. It also defines the available operations and their semantics. It's caught on quite well, and most systems you're likely to encounter probably use IEEE floating-point. But other formats are still in use, as well as not-quite-complete IEEE implementations. The C standard provides optional support for IEEE, but doesn't mandate it.
The mantissa represents the most significant bits of the number.
The exponent represents how many shifts are to be performed on the mantissa in order to get the actual value of the number.
Encoding specifies how are represented sign of mantissa and sign of exponent (basically whether shifting to the left or to the right).
The document you refer to specifies IEEE encoding, the most widely used.
I have found the article you referenced quite illegible (and I DO know a little how IEEE floats work). I suggest you try with the Wiki version of the explanation. It's quite clear and has various examples:
http://en.wikipedia.org/wiki/Single_precision and http://en.wikipedia.org/wiki/Double_precision
It is implementation defined, although IEEE-754 is the most common by far.
To be sure that IEEE-754 is used:
in C, use #ifdef __STDC_IEC_559__
in C++, use the std::numeric_limits<float>::is_iec559 constants
I've written some guides on IEEE-754 at:
In Java, what does NaN mean?
What is a subnormal floating point number?