Integer memory representation - C

As has been said many times, negative numbers are represented in two's complement, while unsigned types don't spend a bit on the sign convention. An integer type can be either signed or unsigned. How does the computer figure out which encoding scheme to use for a given integer?

Some operations (such as addition) work identically on both signed and unsigned integers.
But that's not the case for all operations. When right-shifting, we shift in zeroes for unsigned integers, and we shift in the sign bit for signed integers.
In these cases, the processor provides the means to achieve both operations. The processor may offer two different instructions, or two variants of one.
But whatever the case, there is no decision making on the processor's part. The processor just executes the instructions selected by the compiler. It's up to the compiler to emit instructions that achieve the desired result based on the type of the values involved.
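For instance, here is a minimal sketch (assuming a typical 32-bit, two's-complement target) of the same bit pattern shifted right through a signed and an unsigned type; the compiler selects an arithmetic or a logical shift instruction purely from the declared type:

#include <stdio.h>

int main(void)
{
    int s = -16;                   /* bit pattern ...11110000 on a two's-complement machine */
    unsigned int u = 0xFFFFFFF0u;  /* same low 32 bits, but an unsigned type */

    /* For the signed operand the compiler typically emits an arithmetic shift
       (the result for negative values is implementation-defined); for the
       unsigned operand it emits a logical shift. */
    printf("signed:   -16 >> 2 = %d\n", s >> 2);  /* commonly -4: sign bit shifted in */
    printf("unsigned:   u >> 2 = %X\n", u >> 2);  /* 3FFFFFFC: zeroes shifted in */
    return 0;
}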

Related

What are ways to do arithmetic with values whose binary encoding is wider than the hardware can support?

If I wanted to do arithmetic with values greater than 2^32 and did not want to use longs, how would I do that?
I can think of a scheme where I implement numbers (to whatever number of bits I wish) by using multiple variables to implement a single number:
int upper32;
int lower32;
The above two variables together can represent a 64-bit value (if lower32 overflows, I increment upper32 by one). This would require some overhead.
What are some better implementations?
If you want to deal with integers smaller than 2^64, then you can use long long or unsigned long long, which are 64-bit integers.
If you want to do arithmetic with larger numbers, take a look at the GMP library, or write your own multiple-precision arithmetic library (it's very simple).
Also, take a look at this
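To illustrate the two-variable scheme from the question, here is a minimal sketch (the struct u64pair and add_u64pair names are made up for the example) that adds two 64-bit values held as unsigned 32-bit halves. Unsigned halves are used because unsigned overflow wraps with defined behaviour, which is what makes the carry test legal:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical 64-bit value built from two 32-bit halves. */
struct u64pair {
    uint32_t upper32;
    uint32_t lower32;
};

/* Add b to a, propagating any carry out of the low half into the high half. */
static struct u64pair add_u64pair(struct u64pair a, struct u64pair b)
{
    struct u64pair r;
    r.lower32 = a.lower32 + b.lower32;
    /* The low-half addition overflowed iff the wrapped sum is smaller
       than one of its operands. */
    r.upper32 = a.upper32 + b.upper32 + (r.lower32 < a.lower32 ? 1u : 0u);
    return r;
}

int main(void)
{
    struct u64pair a = { 0x00000000u, 0xFFFFFFFFu };  /* 2^32 - 1 */
    struct u64pair b = { 0x00000000u, 0x00000001u };  /* 1        */
    struct u64pair s = add_u64pair(a, b);
    printf("%08" PRIX32 "%08" PRIX32 "\n", s.upper32, s.lower32);  /* 0000000100000000 */
    return 0;
}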

Can stdint's int8_t exist on an architecture that does not have 8-bit bytes?

Apparently there are architectures that don't have 8-bit bytes.
It would seem that such architectures would preclude the existence of an int8_t type (defined in stdint.h), since C, from my understanding, cannot create data types smaller than CHAR_BIT bits.
That said, the IEEE stdint.h definition seems to require that such a type exist (along with the other exact-width types), only allowing the 64-bit ones to be absent on architectures that do not support them.
Am I missing something?
EDIT: As @JasonD points out in the comments below, the linked page states at the end:
As a consequence of adding int8_t, the following are true:
A byte is exactly 8 bits.
{CHAR_BIT} has the value 8, {SCHAR_MAX} has the value 127, {SCHAR_MIN} has the value -128, and {UCHAR_MAX} has the value 255.
In other words, the linked IEEE page does not apply to architectures with byte lengths other than 8. This is in line with POSIX, which requires an 8-bit char.
-- Before edit --
The explanation is in a note on the page you linked to:
The "width" of an integer type is the number of bits used to store its value in a pure binary system; the actual type may use more bits than that (for example, a 28-bit type could be stored in 32 bits of actual storage)
Just because an architecture doesn't handle 8-bit bytes natively, doesn't preclude an exact 8-bit integral type. The arithmetic could be handled using shifts and masks of wider registers to 'emulate' 8-bit arithmetic.
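A rough sketch of that emulation idea (the add8 helper is hypothetical, not from any library): perform the operation in whatever width the machine handles natively, then mask the result back down to 8 value bits:

#include <stdio.h>

/* Emulate 8-bit unsigned addition on a machine whose natural word is wider:
   do the arithmetic in the wide type and mask the result to 8 bits. */
static unsigned add8(unsigned a, unsigned b)
{
    return (a + b) & 0xFFu;   /* keep only the low 8 bits */
}

int main(void)
{
    printf("%u\n", add8(200u, 100u));  /* 300 mod 256 = 44 */
    return 0;
}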

Is plain char usually/always unsigned on non-twos-complement systems?

Obviously the standard says nothing about this, but I'm interested more from a practical/historical standpoint: did systems with non-twos-complement arithmetic use a plain char type that's unsigned? Otherwise you have potentially all sorts of weirdness, like two representations for the null terminator, and the inability to represent all "byte" values in char. Do/did systems this weird really exist?
The null character used to terminate strings could never have two representations. It's defined like so (even in C90):
A byte with all bits set to 0, called the null character, shall exist in the basic execution character set
So a 'negative zero' on a one's-complement machine wouldn't do.
That said, I really don't know much of anything about non-two's complement C implementations. I used a one's-complement machine way back when in university, but don't remember much about it (and even if I cared about the standard back then, it was before it existed).
It's true, for the first 10 or 20 years of commercially produced computers (the 1950's and 60's) there were, apparently, some disagreements on how to represent negative numbers in binary. There were actually three contenders:
Two's complement, which not only won the war but also drove the others to extinction
One's complement, -x == ~x
Sign-magnitude, -x = x ^ 0x80000000
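As a concrete illustration of the three encodings just listed, here is a throwaway sketch that builds the 8-bit patterns of -5 under each scheme (the host itself is assumed to be two's complement; the one's-complement and sign-magnitude patterns are simply constructed by hand):

#include <stdio.h>

int main(void)
{
    unsigned char x = 5;
    /* Bit patterns an 8-bit byte would hold for -5 under each scheme. */
    unsigned char twos = (unsigned char)(~x + 1u);    /* 0xFB : two's complement */
    unsigned char ones = (unsigned char)~x;           /* 0xFA : one's complement */
    unsigned char smag = (unsigned char)(x | 0x80u);  /* 0x85 : sign-magnitude   */
    printf("%02X %02X %02X\n", twos, ones, smag);     /* FB FA 85 */
    return 0;
}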
I think the last important one's-complement machine was probably the CDC 6600, at the time the fastest machine on earth and the immediate predecessor of the first supercomputer. [1]
Unfortunately, your question cannot really be answered, not because no one here knows the answer :-) but because the choice never had to be made. And this was actually for two reasons:
Two's complement took over simultaneously with byte machines. Byte addressing hit the world with the twos-complement IBM System/360. Previous machines had no bytes, only complete words had addresses. Sometimes programmers would pack characters inside these words and sometimes they would just use the whole word. (Word length varied from 12 bits to 60.)
C was not invented until a decade after the byte machines and two's complement transition. Item #1 happened in the 1960's, C first appeared on small machines in the 1970's and did not take over the world until the 1980's.
So there simply never was a time when a machine had signed bytes, a C compiler, and something other than a twos-complement data format. The idea of null-terminated strings was probably a repeatedly-invented design pattern thought up by one assembly language programmer after another, but I don't know that it was specified by a compiler until the C era.
In any case, the first actually standardized C ("C89") simply specifies "a byte or code of value zero is appended" and it is clear from the context that they were trying to be number-format independent. So, "+0" is a theoretical answer, but it may never really have existed in practice.
[1] The 6600 was one of the most important machines historically, and not just because it was fast. Designed by Seymour Cray himself, it introduced out-of-order execution and various other elements later collectively called "RISC". Although others tried to claim credit, Seymour Cray is the real inventor of the RISC architecture. There is no dispute that he invented the supercomputer. It's actually hard to name a past "supercomputer" that he didn't design.
I believe it would be almost but not quite possible for a system to have a one's-complement 'char' type, but there are four problems which cannot all be resolved:
Every data type must be representable as a sequence of char, such that if all the char values comprising two objects compare identical, the objects in question will be identical.
Every data type must likewise be representable as a sequence of 'unsigned char'.
The unsigned char values into which any data type can be decomposed must form a group whose order is a power of two.
I don't believe the standard permits a one's-complement machine to special-case the value that would be negative zero and make it behave as something else.
It might be possible to have a standards-compliant machine with a one's-complement or sign-magnitude "char" type if the only way to get a negative zero would be by overlaying some other data type, and if negative zero compared unequal to positive zero. I'm not sure if that could be standards-compliant or not.
EDIT
BTW, if requirement #2 were relaxed, I wonder what the exact requirements would be when overlaying other data types onto 'char'? Among other things, while the standard makes it abundantly clear that one must be able to perform assignments and comparisons on any 'char' values that may result from overlaying another variable onto a 'char', I don't know that it imposes any requirement that all such values must behave as an arithmetic group. For example, I wonder what the legality would be of a machine in which every memory location was physically stored as 66 bits, with the top two bits indicating whether the value was a 64-bit integer, a 32-bit memory handle plus a 32-bit offset, or a 64-bit double-precision floating-point number? Since the standard allows implementations to do anything they like when an arithmetic computation exceeds the range of a signed type, that would suggest that signed types do not necessarily have to behave as a group.
For most signed types, there's no requirement that the type be unable to represent any numbers outside the range specified in limits.h; if limits.h specifies that the minimum "int" is -32767, then it would be perfectly legitimate for an implementation to in fact allow a value of -32768 since any program that tried to do so would invoke Undefined Behavior. The key question would probably be whether it would be legitimate for a 'char' value resulting from the overlay of some other type to yield a value outside the range specified in limits.h. I wonder what the standard says?

Signed vs Unsigned operations in C

Very simple question:
I have a program doing lots and lots of mathematical computations over ints and long longs. To fit in an extra bit, I made the long longs unsigned, since I only dealt with positive numbers, and could now get a few more values.
Oddly enough, this gave me a 15% performance boost, which I confirmed came simply from making all the long longs unsigned.
Is this possible? Are mathematical operations really faster with unsigned numbers? I remember reading that there would be no difference, and the compiler automatically picks out the fastest way to go whether signed or unsigned. Is this 15% boost really from making the vars unsigned, or could it be something else affected in my code?
And, if it really is from making the vars unsigned, should I aim to make everything (even ints) unsigned, as I never need negative numbers, and every second is important if I can save it.
In some operations, signed integers are faster, in others, unsigned are faster:
In C, signed integer overflow is undefined, so the compiler can assume signed operations never wrap. It will take advantage of this in loop optimization, for example. Comparisons can be optimized away similarly. (This can also lead to subtle bugs if you don't expect it.)
On the other hand, unsigned integers do not have this assumption. However, not having to deal with a sign is a big advantage for some operations, for example: division. Unsigned division by a constant power of two is a simple shift, but (depending on your rounding rules) there's a conditional off-by-1 for negative numbers.
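A small sketch of that division difference (assuming a 32-bit two's-complement int): C's division truncates toward zero, but an arithmetic shift rounds toward minus infinity, so the compiler needs a fix-up for negative signed operands:

#include <stdio.h>

int main(void)
{
    unsigned int u = 37;
    int          s = -37;

    /* Unsigned: the compiler can turn u / 8 into u >> 3 directly. */
    printf("%u\n", u / 8u);   /* 4 */

    /* Signed: -37 / 8 must be -4 (truncation toward zero), but an arithmetic
       shift of -37 by 3 gives -5 (rounding toward minus infinity), so the
       compiler has to emit an adjustment for negative values. */
    printf("%d\n", s / 8);    /* -4 */
    printf("%d\n", s >> 3);   /* commonly -5 (implementation-defined) */
    return 0;
}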
Personally, I make a habit of only using unsigned integers unless I really, really do have a value which needs to be signed. It's not so much for performance as correctness.
You may see the effect magnified with long long, which (I'm guessing) is 64 bits in your case. The CPU usually doesn't have single instructions to deal with these types (in 32-bit mode), so the slight added complexity for signed operations will be more noticeable.
On a 32-bit processor, 64-bit integer operations are emulated; using unsigned instead of signed means the emulation library doesn't have to do extra work to propagate carry bits etc.
There are three cases where a compiler cares whether a variable is signed or unsigned:
When the variable is converted to a longer type
When the comparison operators (greater-than, etc.) are applied
When overflows might occur
On some machines, conversion of signed variables to longer types requires extra code; on other machines, a conversion may be performed as part of a 'load' or 'move' instruction.
Some machines (mainly small embedded microcontrollers) require more instructions to perform a signed-versus-signed comparison than unsigned-versus-unsigned, but most machines have a full array of both signed and unsigned compare instructions available.
When overflows occur with unsigned types, the compiler may have to add code to ensure that the defined behavior actually occurs. No such code is required for signed types, because anything that might happen in the absence of such code would be permitted by the standard.
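As an illustration of the first of these cases (conversion to a longer type), this sketch (assuming a 32-bit int) widens the same bit pattern as a signed and as an unsigned value; the signed conversion requires sign extension, while the unsigned one just zero-fills the new upper bits:

#include <stdio.h>

int main(void)
{
    int          si = -1;           /* bit pattern 0xFFFFFFFF on a 32-bit int */
    unsigned int ui = 0xFFFFFFFFu;  /* same bit pattern, unsigned type        */

    /* Widening a signed value copies the sign bit into the new upper bits
       (sign extension); widening an unsigned value fills them with zeroes. */
    long long          sw = si;     /* -1         */
    unsigned long long uw = ui;     /* 4294967295 */

    printf("%lld %llu\n", sw, uw);
    return 0;
}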
The compiler doesn't pick whether a value is signed or unsigned. But yes, in theory, unsigned-with-unsigned arithmetic is faster than signed-with-signed. If you really want to slow things down, mix signed with unsigned. And even worse: mix floats with integers.
It depends on the processor, of course.

Bit shifts in C

If the bit pattern corresponding to a signed integer is shifted to the right, then:
1. The vacant bit will be filled by the sign bit
2. The vacant bit will be filled by 0
3. The outcome is implementation-dependent
4. None of the above
The answer to this question is the 3rd option. Can anybody explain this?
Also, give some basic idea of the theory behind the left shift and right shift operators in C, e.g.
what fills the vacant bit when either operation is performed. I checked and noticed that left shifting fills the vacant bit with 0 and right shifting fills it with 1. Please clarify the logic...
I'd have to check the spec for the question of what precisely is implementation dependent.
However, every implementation I've used in (mumble) years of embedded systems projects has been sensible:
Left shifts always shift in a 0 at the low bit. No other value makes sense.
Right shifts depend on the data type. A right shift of a signed integer duplicates the high bit as it shifts the rest to the right. This is called an "arithmetic shift", and has the nice property (in twos complement arithmetic, at least) that it divides the value by two while preserving the sign of the original number.
A right shift of an unsigned integer shifts a 0 into the high bit, and is usually known as a "logical shift".
It makes sense for an implementation to provide both kinds of shifts because both are useful, and using signed/unsigned to select which is meant is a sensible choice.
Edit: At least one thing that absolutely is implementation dependent is that the C standard does not (completely) specify the underlying implementation of integers and their storage. For instance, it is possible to build a compliant C compiler for a machine that uses one's complement arithmetic. It would also be possible (I think) to build a compliant compiler for a machine whose native storage was signed magnitude BCD. (Nope, I was wrong, see below.)
In practice, the world is pretty much settled on two's complement binary for the CPU and some of the pedantry is mooted.
So part of the question really is: how do you define the meaning of the << and >> operators in a way that is stable regardless of the underlying arithmetic system used.
IIRC, the definition of n<<1 is effectively n*2, and n>>1 is effectively n/2, with a natural extension to shifts by more than 1 (but not more than 31... there be undefined dragons there...) and with the notion that the >> operator will preserve the sign if operating on a signed value.
Edit 2: Pete Kirkham points out in his fine answer that the C standard does specifically disallow the scary case of a BCD representation of integers, whether it is signed magnitude or ten's complement. I'm sure that is a good thing, even if Knuth did use an (optionally) BCD machine for his sample code in early editions of The Art of Computer Programming.
In those rare use cases where BCD is the right answer, then storing them in an unsigned long (8 digits ten's complement) or an unsigned 64-bit integer (room for 16 digits ten's complement or 15 digits plus sign and flags) and using a carefully crafted arithmetic library to manipulate them makes sense.
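As a toy illustration of packing decimal digits into an unsigned 64-bit integer, here is a sketch (the bcd_set_digit / bcd_get_digit helpers are made up for the example, not part of any library):

#include <stdint.h>
#include <stdio.h>

/* Pack one decimal digit into a 4-bit nibble of a uint64_t (room for 16 digits). */
static uint64_t bcd_set_digit(uint64_t bcd, unsigned pos, unsigned digit)
{
    uint64_t mask = (uint64_t)0xF << (4 * pos);
    return (bcd & ~mask) | ((uint64_t)(digit & 0xF) << (4 * pos));
}

/* Extract the decimal digit stored at nibble position pos. */
static unsigned bcd_get_digit(uint64_t bcd, unsigned pos)
{
    return (unsigned)((bcd >> (4 * pos)) & 0xF);
}

int main(void)
{
    uint64_t n = 0;
    n = bcd_set_digit(n, 0, 9);   /* least significant digit */
    n = bcd_set_digit(n, 1, 4);
    n = bcd_set_digit(n, 2, 1);   /* n now encodes 149 as 0x149 */
    printf("%u%u%u\n", bcd_get_digit(n, 2), bcd_get_digit(n, 1), bcd_get_digit(n, 0));
    return 0;
}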
In practice, of course, C implementations map the operators as directly as the standard allows onto the native machine instructions of the CPU. The folk who wrote the standard were very mindful of the existence of many ways to implement even simple things like the representation of an integral value, and the C standard reflects that by allowing just enough implementation-defined behavior to let the operators be efficiently implemented on each machine.
The alternative leads swiftly to a world where all math operations are completely specified, and cannot be efficiently implemented on any machine.
C does not guarantee that there is a sign bit, or anything else about the bit-level representation of integers; that is why.
For two's complement, you will typically see the sign bit being shifted in, but that is up to the implementation.
ISO C99 requires a sign bit somewhere in the representation, but gives the option between various complement or sign-and-magnitude schemes and allows padding bits, all of which affect the operation of >>.
Section 6.2.6.2 (Integer types)
For signed integer types, the bits of the object representation shall be divided into three groups: value bits, padding bits, and the sign bit. There need not be any padding bits; there shall be exactly one sign bit.
and
Section 6.5.7 (Bitwise shift operators)
The result of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has an unsigned type or if E1 has a signed type and a nonnegative value, the value of the result is the integral part of the quotient of E1 / 2^E2. If E1 has a signed type and a negative value, the resulting value is implementation-defined.
It doesn't specify which of one's complement, two's complement or sign and magnitude is used, nor whether the sign bit is to the left or right of the value bits, or where any padding is, all of which would affect the output of the >> operator on signed negatives.
In answer to RBerteig's query, C99 precludes a BCD representation of integers:
Section 6.2.6.2 (Integer types)
If there are N value bits, each bit shall represent a different power of 2 between 1 and 2^(N−1), so that objects of that type shall be capable of representing values from 0 to 2^N − 1 using a pure binary representation; this shall be known as the value representation.
The C language implementations tend to map bit shifting operations directly onto the corresponding machine code instructions. Since different hardware architectures have historically done different things, the C specification tends to leave things implementation defined so that C implementations can take advantage of whatever the hardware offers.
The outcome is implementation dependent. However, in practice, every single x86, PPC, and MIPS compiler I have ever worked with has followed this rule for shifting right:
If the operand is a signed integer, the vacant bit is filled with the sign bit (really the most significant bit).
If the operand is an unsigned integer, the vacant bit is filled with zero.
As RBerteig says, this is so that for signed integers, n >> 1 = n/2 (rounded down) for both positive and negative n, and for unsigned integers, n >> 1 = n/2 even for n > 2^31 (on a 32-bit architecture).
The corresponding hardware instructions are arithmetic (sign-extending) and logical (not sign-extending) shift; the compiler chooses between them based on whether the operand is signed or unsigned.
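If you want the logical (zero-filling) shift applied to the bits of a signed value, the usual idiom is to convert to the corresponding unsigned type first, since the signedness of the operand is what selects the instruction. A small sketch, assuming a 32-bit two's-complement int:

#include <stdio.h>

int main(void)
{
    int n = -8;

    /* Converting to unsigned first makes the compiler emit a logical shift
       of the same bit pattern; shifting the signed value directly gives the
       (implementation-defined, commonly arithmetic) signed behaviour. */
    unsigned int logical    = (unsigned int)n >> 1;  /* 0x7FFFFFFC */
    int          arithmetic = n >> 1;                /* commonly -4 */

    printf("%X %d\n", logical, arithmetic);
    return 0;
}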
