What is Biased Notation?

I have read:
"Like an unsigned int, but offset by −(2^(n−1) − 1), where n is the number of bits in the numeral. Aside:
Technically we could choose any bias we please, but the choice presented here is extraordinarily common." - http://inst.eecs.berkeley.edu/~cs61c/sp14/disc/00/Disc0.pdf
However, I don't get what the point is. Can someone explain this to me with examples? Also, when should I use it, given other options like one's complement, sign and magnitude, and two's complement?

Biased notation is a way of storing a range of values that doesn't start with zero.
Put simply, you take an existing representation that goes from zero to N, and then add a bias B to each number so it now goes from B to N+B.
Floating-point exponents are stored with a bias to keep the dynamic range of the type "centered" on 1.
Excess-three encoding is a technique for simplifying decimal arithmetic using a bias of three.
Two's complement notation could be considered as biased notation with a bias of INT_MIN and the most-significant bit flipped.
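A minimal sketch of the idea (it assumes the common 8-bit exponent field with a bias of 127, i.e. 2^(8−1) − 1, as in IEEE single precision; the function names are just for illustration):
#include <stdio.h>

/* The stored field runs 0..255; the exponents it denotes run -127..+128. */
#define BIAS 127

static unsigned encode_exponent(int actual)   { return (unsigned)(actual + BIAS); }
static int      decode_exponent(unsigned raw) { return (int)raw - BIAS; }

int main(void) {
    printf("actual -126 -> stored %u\n", encode_exponent(-126)); /* 1   */
    printf("actual    0 -> stored %u\n", encode_exponent(0));    /* 127 */
    printf("stored  254 -> actual %d\n", decode_exponent(254));  /* 127 */
    return 0;
}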

A "representation" is a way of encoding information so that it easy to extract details or inferences from the encoded information.
Most modern CPUs "represent" numbers using "twos complement notation". They do this because it is easy to design digital circuits that can do what amounts to arithmetic on these values quickly (add, subtract, multiply, divide, ...). Twos complement also has the nice property that one can interpret the most significant bit as either a power-of-two (giving "unsigned numbers") or as a sign bit (giving signed numbers) without changing essentially any of the hardware used to implement the arithmetic.
Older machines used other representations; quite common in the 60s, for example, were machines that represented numbers as sets of binary-coded-decimal digits packed into 4-bit addressable nibbles (the IBM 1620 and 1401 are examples of this). So you can represent the same concept or value in different ways.
A bias just means that whatever representation you chose (for numbers), you have added a constant bias to that value. Presumably that is done to enable something to be done more effectively. I can't speak to "−(2^(n−1) − 1)" being "an extraordinarily common" bias; I do lots of assembly and C coding and pretty much never find a need to "bias" values.
However, there is a common example. Modern CPUs largely implement IEEE floating point, which stores floating-point numbers with sign, exponent and mantissa. The exponent is a power of two, roughly symmetric around zero, and it is stored with a bias of 2^(N−1) − 1 for an N-bit exponent field (127 in the single-precision case).
This bias allows floating-point values with the same sign to be compared for equal/less/greater using the standard twos-complement integer instructions rather than special floating-point instructions, which means the use of actual floating-point compares can sometimes be avoided. (See http://www.cygnus-software.com/papers/comparingfloats/comparingfloats.htm for the dark-corner details.) [Thanks to @PotatoSwatter for noting the inaccuracy of my initial answer here, and making me go dig this out.]
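A rough illustration of that property (a sketch, not production code; it assumes IEEE single precision and only handles non-negative finite values):
#include <stdio.h>
#include <string.h>
#include <stdint.h>

/* Copy the bit pattern of a float into a signed 32-bit integer. */
static int32_t float_bits(float f) {
    int32_t i;
    memcpy(&i, &f, sizeof i);
    return i;
}

int main(void) {
    float a = 1.5f, b = 2.25f;
    /* Because the exponent is stored biased (and above the mantissa bits),
       ordering of the bit patterns matches ordering of the values
       for non-negative floats. */
    printf("%d\n", float_bits(a) < float_bits(b));  /* 1 */
    printf("%d\n", a < b);                          /* 1 */
    return 0;
}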

Related

Why does adding 1.0/3.0 three times work just as mathematically expected?

I am aware that real numbers cannot be represented exactly in binary (even with so-called double precision) in most cases. For example, 1.0/3.0 is approximated by 0x3fd5555555555555, which actually represents 0.33333333333333331483.... If we perform (1.0/3.0)+(1.0/3.0), then we obtain 0x3fe5555555555555 (so 0.66666666666666662965...), just as expected in the sense of computer arithmetic.
However, when I tried to perform (1.0/3.0)+(1.0/3.0)+(1.0/3.0) by writing the following code
#include <stdio.h>
#include <string.h>

int main(void) {
    double result = 1.0/3.0;
    result += 1.0/3.0;
    result += 1.0/3.0;
    /* Copy the bits into an integer first; passing a double straight to %llx is undefined behavior. */
    unsigned long long bits;
    memcpy(&bits, &result, sizeof bits);
    printf("%016llx\n", bits);
}
and compiling it with the GNU C compiler, the resulting program printed 0x3ff0000000000000 (which represents exactly 1). This result confused me, because I initially expected 0x3fefffffffffffff (I did not expect the rounding errors to cancel each other out, because both (1.0/3.0) and ((1.0/3.0)+(1.0/3.0)) are smaller than the actual values when represented in binary), and I still have not figured out what happened.
I would be grateful if you let me know possible reasons for this result.
There is no need to consider the 80-bit representation - the results are the same in Java, which requires, except for some irrelevant edge cases, the same behavior as IEEE 754 64-bit binary arithmetic for its doubles.
The exact value of 1.0/3.0 is 0.333333333333333314829616256247390992939472198486328125
As long as all numbers involved are in the normal range, multiplying or dividing by a power of two is exact. It only changes the exponent, not the significand. In particular, adding 1.0/3.0 to itself is exact, so the result of the first addition is 0.66666666666666662965923251249478198587894439697265625
The second addition does involve rounding. The exact sum is 0.99999999999999988897769753748434595763683319091796875, which is bracketed by the representable numbers 0.999999999999999944488848768742172978818416595458984375 and 1.0. The exact value is exactly halfway between the bracketing numbers, so a single bit has to be dropped and the tie is broken by rounding to the candidate whose least significant bit is zero (round half to even). The least significant bit of 1.0 is a zero, so 1.0 is the rounded result of the addition.
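That analysis can be checked directly (a small sketch; %a prints the exact hexadecimal significand, so no decimal-printing artifacts get in the way):
#include <stdio.h>

int main(void) {
    double third = 1.0/3.0;
    double two_thirds = third + third;   /* exact: adding a value to itself only changes the exponent */
    double sum = two_thirds + third;     /* ties to even, giving exactly 1.0 */
    printf("third      = %a\n", third);      /* 0x1.5555555555555p-2 */
    printf("two_thirds = %a\n", two_thirds); /* 0x1.5555555555555p-1 */
    printf("sum        = %a\n", sum);        /* 0x1p+0 */
    return 0;
}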
That is a good rounding question. If I remember correctly, the arithmetic coprocessor uses 80 bits: 64 precision bits and 15 for the exponent. That means that internally the operation uses more bits than you can display. And in the end the coprocessor actually rounds its (more accurate) internal representation to give a 64-bit-only value. And as the first bit dropped is 1 and not 0, the result is rounded upward, giving 1.
But I must admit I am just guessing here...
But if you try to do the operation by hand, it is immediately apparent that the addition sets all precision bits to 1 (adding 5555...5 and 555...5 shifted by 1), plus the first bit to drop, which is also 1. So by hand a normal human being would also round upward, giving 1, so it is no surprise that the arithmetic unit is able to do the correct rounding too.

Converting 32-bit number to 16 bits or less

On my mbed LPC1768 I have an ADC on a pin which, when polled, returns a 16-bit short number normalised to a floating-point value between 0 and 1. Document here.
Because it converts it to a floating-point number, does that mean it's 32 bits? The number I have is given to six decimal places. Data Types here
I'm running Autocorrelation and I want to reduce the time it takes to complete the analysis.
Is it correct that the floating-point numbers are 32 bits long, and if so, is it correct that multiplying two 32-bit floating-point numbers will take a lot longer than multiplying two 16-bit short (non-decimal) values together?
I am working with C to program the mbed.
Cheers.
I should be able to comment on this quite accurately. I used to do DSP processing work where we would "integerize" code, which effectively meant we'd take a signal/audio/video algorithm and replace all the floating-point logic with fixed-point arithmetic (i.e. Qm.n notation, etc.).
On most modern systems, you'll usually get better performance using integer arithmetic, compared to floating point arithmetic, at the expense of more complicated code you have to write.
The chip you are using (a Cortex-M3) doesn't have a dedicated hardware FPU: floating-point operations are emulated in software, so they are going to be expensive (take a lot of time).
In your case, you could just read the 16-bit value via read_u16(), shift it right by 4 bits, and you're done. If you're working with audio data, you might consider looking into companding algorithms (a-law, u-law), which will give better subjective performance than simply chopping off the 4 LSBs to get a 12-bit number from a 16-bit number.
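A rough sketch of that idea in plain C (raw_a and raw_b are hypothetical stand-ins for two samples returned by read_u16(); the shift and the integer multiply are the parts that matter):
#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint16_t raw_a = 0xABC0, raw_b = 0x1230;  /* hypothetical samples as returned by read_u16() */

    /* The ADC is 12-bit, so drop the 4 padding LSBs. */
    uint16_t a = raw_a >> 4;
    uint16_t b = raw_b >> 4;

    /* Pure integer multiply with a 32-bit result: no FPU needed,
       and two 12-bit operands cannot overflow 32 bits. */
    uint32_t product = (uint32_t)a * b;

    printf("%lu\n", (unsigned long)product);
    return 0;
}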
Yes, a float on that system is 32-bit, and is likely represented in IEEE 754 format. Multiplying a pair of 32-bit values versus a pair of 16-bit values may very well take the same amount of time, depending on the chip in use and the presence of an FPU and ALU. On your chip, multiplying two floats will be horrendously expensive in terms of time. Also, if you multiply two 32-bit integers, they could potentially overflow, so that is one potential reason to go with floating-point logic if you don't want to implement a fixed-point algorithm.
It is correct to assume that multiplying two 32-bit floating-point numbers will take longer than multiplying two 16-bit short values if special hardware (a floating-point unit) is not present in the processor.

Is sign magnitude used to represent negative numbers?

I understand that two's complement is used to represent negative numbers, but there is also the method of using sign magnitude. Is sign magnitude still used to represent negative numbers? If not, where was it previously used? And how is a machine that interprets negative numbers using two's complement able to communicate with and understand another machine that uses sign magnitude instead?
Yes, sign magnitude is frequently used today, though not where you might expect. For example, IEEE floating point uses a single "sign" bit to denote positive or negative. (As a result, IEEE floating point numbers can be -0.) Sign magnitude is not commonly used today for integers, however.
Communication between two machines using different number representations only presents a problem if they both try to use their native encoding format. If a common format is defined for exchanging information, there is no problem. For example, a machine that uses two's complement can easily construct a number using sign-magnitude encoding (and vice versa). These days, different machines are more likely to communicate using the ASCII representation of the number (e.g. in JSON or XML), or to use a completely different binary encoding (e.g. ASN.1, ZigZag encoding, etc.).
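A minimal sketch of such a conversion (assuming 32-bit values, and ignoring INT_MIN, which has no sign-magnitude counterpart):
#include <stdint.h>
#include <stdio.h>

/* Encode an ordinary int (two's complement on virtually all machines) as
   32-bit sign magnitude: top bit = sign, lower 31 bits = magnitude. */
static uint32_t to_sign_magnitude(int32_t v) {
    if (v >= 0)
        return (uint32_t)v;
    return 0x80000000u | (uint32_t)(-(int64_t)v);
}

static int32_t from_sign_magnitude(uint32_t sm) {
    int32_t magnitude = (int32_t)(sm & 0x7FFFFFFFu);
    return (sm & 0x80000000u) ? -magnitude : magnitude;
}

int main(void) {
    printf("%08x\n", (unsigned)to_sign_magnitude(-5));          /* 80000005 */
    printf("%d\n", from_sign_magnitude(to_sign_magnitude(-5))); /* -5 */
    return 0;
}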

Why is two's complement used more widely than ones' complement for representing signed numbers?

The latter representation looks more natural to understand. Why do most languages choose the former one? I guess there must be some unique and thus advantageous characteristics in the Two's complement which make data operations easier.
Languages don't specify the number format; the hardware does. Ask Intel why they designed their ALU to do two's complement.
The answer will be that the mathematical operations are more regular in two's complement; positive and negative numbers need to be handled differently in ones' complement, which means double the hardware/microcode needed for basic math in the CPU.
From Wikipedia
The two's-complement system has the advantage that the fundamental arithmetical operations of addition, subtraction and multiplication are identical, regardless of whether the inputs and outputs are interpreted as unsigned binary numbers or two's complement (provided that overflow is ignored). This property makes the system both simpler to implement and capable of easily handling higher precision arithmetic. Also, zero has only a single representation, obviating the subtleties associated with negative zero, which exists in ones'-complement systems.
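A quick way to see the "identical operations" property in code (a sketch; the same 8-bit addition serves both the unsigned and the signed reading of the bits):
#include <stdio.h>
#include <stdint.h>

int main(void) {
    /* 0xFB is 251 when read as unsigned, -5 when read as two's complement. */
    uint8_t a = 0xFB, b = 0x07;
    uint8_t sum = (uint8_t)(a + b);   /* the adder does the same thing under either interpretation */

    printf("bits: %02x\n", (unsigned)sum);             /* 02 */
    printf("unsigned view: %u\n", (unsigned)sum);      /* 2, i.e. (251 + 7) mod 256 */
    printf("signed view: %d\n", (int)(int8_t)sum);     /* 2, i.e. -5 + 7 */
    return 0;
}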

How are floating point numbers stored in memory?

I've read that they're stored in the form of mantissa and exponent.
I've read this document but I could not understand anything.
To understand how they are stored, you must first understand what they are and what kind of values they are intended to handle.
Unlike integers, a floating-point value is intended to represent extremely small values as well as extremely large. For normal 32-bit floating-point values, this corresponds to values in the range from 1.175494351 * 10^-38 to 3.40282347 * 10^+38.
Clearly, using only 32 bits, it's not possible to store every digit in such numbers.
When it comes to the representation, you can see all normal floating-point numbers as a value in the range 1.0 to (almost) 2.0, scaled with a power of two. So:
1.0 is simply 1.0 * 2^0,
2.0 is 1.0 * 2^1, and
-5.0 is -1.25 * 2^2.
So, what is needed to encode this, as efficiently as possible? What do we really need?
The sign of the expression.
The exponent.
The value in the range 1.0 to (almost) 2.0. This is known as the "mantissa" or the significand.
This is encoded as follows, according to the IEEE-754 floating-point standard.
The sign is a single bit.
The exponent is stored as an unsigned integer; for 32-bit floating-point values this field is 8 bits. 1 represents the smallest exponent and "all ones minus one" the largest. (0 and "all ones" are used to encode special values; see below.) A stored value in the middle (127, in the 32-bit case) represents an exponent of zero; this offset is also known as the bias.
When looking at the mantissa (the value between 1.0 and (almost) 2.0), one sees that all possible values start with a "1" (both in the decimal and the binary representation). This means there is no point in storing that leading bit. The rest of the binary digits are stored in an integer field; in the 32-bit case this field is 23 bits.
In addition to the normal floating-point values, there are a number of special values:
Zero is encoded with both exponent and mantissa as zero. The sign bit is used to represent "plus zero" and "minus zero". A minus zero is useful when the result of an operation is extremely small, but it is still important to know from which direction the operation came.
Plus and minus infinity -- represented using an "all ones" exponent and a zero mantissa field.
Not a Number (NaN) -- represented using an "all ones" exponent and a non-zero mantissa.
Denormalized numbers -- numbers smaller than the smallest normal number. Represented using a zero exponent field and a non-zero mantissa. The special thing with these numbers is that the precision (i.e. the number of digits a value can contain) drops as the value gets smaller, simply because there is no longer room for it in the mantissa.
Finally, the following is a handful of concrete examples (all values are in hex):
1.0 : 3f800000
-1234.0 : c49a4000
100000000000000000000000.0: 65a96816
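Those examples can be checked with a few lines of C (a sketch; it assumes 32-bit IEEE single-precision floats and simply copies the bits into an integer for printing):
#include <stdio.h>
#include <string.h>
#include <stdint.h>

/* Print a float's bit pattern together with its sign, exponent and mantissa fields. */
static void show(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    printf("%-12g -> %08x (sign %u, exponent %3u, mantissa %06x)\n",
           f, (unsigned)bits, (unsigned)(bits >> 31),
           (unsigned)((bits >> 23) & 0xFF), (unsigned)(bits & 0x7FFFFF));
}

int main(void) {
    show(1.0f);      /* 3f800000 */
    show(-1234.0f);  /* c49a4000 */
    show(1e23f);     /* 65a96816 */
    return 0;
}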
In layman's terms, it's essentially scientific notation in binary. The formal standard (with details) is IEEE 754.
typedef struct {
    unsigned int mantissa_low:32;   /* low 32 bits of the 52-bit mantissa */
    unsigned int mantissa_high:20;  /* high 20 bits of the mantissa */
    unsigned int exponent:11;       /* biased exponent (bias 1023) */
    unsigned int sign:1;            /* sign bit */
} tDoubleStruct;

double a = 1.2;
tDoubleStruct* b = reinterpret_cast<tDoubleStruct*>(&a);
This is an example of how the memory is set up when the compiler uses IEEE 754 double precision, which is the default for a C double on little-endian systems (e.g. Intel x86). Here it is expressed as C bit-fields; the Wikipedia article about double precision is a better read for understanding the details.
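For instance, continuing the snippet above inside a main() (a sketch; bit-field layout is implementation-defined, so the fields are only expected to line up like this with a typical little-endian compiler):
printf("sign=%u exponent=%u mantissa_high=%05x mantissa_low=%08x\n",
       (unsigned)b->sign, (unsigned)b->exponent,
       (unsigned)b->mantissa_high, (unsigned)b->mantissa_low);
/* For a = 1.2 this prints: sign=0 exponent=1023 mantissa_high=33333 mantissa_low=33333333 */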
There are a number of different floating-point formats. Most of them share a few common characteristics: a sign bit, some bits dedicated to storing an exponent, and some bits dedicated to storing the significand (also called the mantissa).
The IEEE floating-point standard attempts to define a single format (or rather set of formats of a few sizes) that can be implemented on a variety of systems. It also defines the available operations and their semantics. It's caught on quite well, and most systems you're likely to encounter probably use IEEE floating-point. But other formats are still in use, as well as not-quite-complete IEEE implementations. The C standard provides optional support for IEEE, but doesn't mandate it.
The mantissa represents the most significant bits of the number.
The exponent represents how many shifts are to be performed on the mantissa in order to get the actual value of the number.
The encoding specifies how the sign of the mantissa and the sign of the exponent are represented (basically, whether the shift is to the left or to the right).
The document you refer to specifies IEEE encoding, the most widely used.
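For instance, the standard frexp/ldexp functions expose exactly that mantissa-and-shift view of a value (a small sketch; 6.0 is just an arbitrary example):
#include <stdio.h>
#include <math.h>

int main(void) {
    int exponent;
    double mantissa = frexp(6.0, &exponent);   /* splits 6.0 into 0.75 * 2^3 */
    printf("%g * 2^%d\n", mantissa, exponent); /* 0.75 * 2^3 */
    printf("%g\n", ldexp(mantissa, exponent)); /* 6, i.e. the mantissa shifted back */
    return 0;
}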
I have found the article you referenced quite illegible (and I DO know a little about how IEEE floats work). I suggest you try the Wiki version of the explanation instead. It's quite clear and has various examples:
http://en.wikipedia.org/wiki/Single_precision and http://en.wikipedia.org/wiki/Double_precision
It is implementation defined, although IEEE-754 is the most common by far.
To be sure that IEEE-754 is used:
in C, use #ifdef __STDC_IEC_559__ (a minimal check is sketched after this list)
in C++, use the std::numeric_limits<float>::is_iec559 constant
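A tiny version of the C check (a sketch):
#include <stdio.h>

int main(void) {
#ifdef __STDC_IEC_559__
    puts("this implementation guarantees IEEE 754 (IEC 559) floating point");
#else
    puts("no IEEE 754 guarantee from this implementation");
#endif
    return 0;
}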
I've written some guides on IEEE-754 at:
In Java, what does NaN mean?
What is a subnormal floating point number?
