I was reading a book on C where one section says: "The bitwise operations are typically used with unsigned types."
Question: why?
Simply because it is not immediately clear what bit operations on the sign bit of a signed number should mean.
Unsigned types have no special bits; everything works straightforwardly.
Signed types have a special sign bit and can interpret it with three different encodings to represent negative values (ones' complement, two's complement, or sign and magnitude).
From the programming language's point of view, unsigned does not merely mean that the value cannot be negative. It means that the most significant bit of the number is not used for determining whether the value is negative or positive.
So, for an 8-bit value the bitwise operators consider all 8 bits, thus working on the range 0..255 (while the arithmetic operators can treat the most significant bit as the sign indicator, thus working on the range -128 to +127).
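A minimal sketch of that distinction (the value 0xF5 is an arbitrary choice; the printed signed value assumes a two's complement machine, as on virtually all current hardware):

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint8_t u = 0xF5;           /* bit pattern 1111 0101 */
    int8_t  s = (int8_t)0xF5;   /* same pattern; the conversion is implementation-defined */

    /* On a two's complement machine this prints "unsigned: 245, signed: -11":
     * the same 8 bits, read with and without a sign bit. */
    printf("unsigned: %u, signed: %d\n", (unsigned)u, (int)s);
    return 0;
}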
Unsigned operands do not have any special bits for sign representation.
For signed operands, the special sign bit may interfere with the operations we want to perform.
Bitwise operations are typically used with unsigned types or signed types with non-negative values.
The C Standard supports 3 different representations for negative values: applying bitwise operators to such values has, at best, implementation-defined behavior.
For example -1 is represented:
as |1000|0000|0000|0001| on a 16-bit system with sign magnitude.
as |1111|1111|1111|1110| on a 16-bit system with one's complement.
and more commonly as |1111|1111|1111|1111| on a 16-bit system with two's complement.
As a consequence, -1 & 3 can have 3 different values (demonstrated in the snippet after this list):
1 on sign-magnitude systems,
2 on ones' complement systems,
3 on two's complement systems.
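A one-line check, hedged accordingly: on today's (two's complement) machines this prints 3, while a ones' complement machine would print 2 and a sign-magnitude machine 1:

#include <stdio.h>

int main(void) {
    printf("%d\n", -1 & 3);  /* 3 on two's complement hardware */
    return 0;
}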
Other operations may behave in even more unexpected ways on negative numbers, such as << and >>, especially on the DS9K (the mythical DeathStation 9000).
Conversely, the bitwise operations are fully defined on unsigned types, which are mandated to have binary representation (with some restrictions on the right operand of shift operators).
Related
I am going through some old C code with Lint and stumbled upon this line:
int16_t max = -0;
The Lint message is: "Constant expression evaluates to 0 in operation '-'".
Is there any reason why someone would use -0?
In the C specification (6.2.6.2 Integer types), it states the following (emphasis mine):
For signed integer types, the bits of the object representation shall be divided into three groups: value bits, padding bits, and the sign bit. There need not be any padding bits; there shall be exactly one sign bit. Each bit that is a value bit shall have the same value as the same bit in the object representation of the corresponding unsigned type (if there are M value bits in the signed type and N in the unsigned type, then M ≤ N). If the sign bit is zero, it shall not affect the resulting value. If the sign bit is one, the value shall be modified in one of the following ways:
the corresponding value with sign bit 0 is negated (sign and magnitude);
the sign bit has the value -(2^M) (two's complement);
the sign bit has the value -(2^M - 1) (ones' complement).
Which of these applies is implementation-defined, as is whether the
value with sign bit 1 and all value bits zero (for the first two), or
with sign bit and all value bits 1 (for one’s complement), is a trap
representation or a normal value. In the case of sign and magnitude
and one’s complement, if this representation is a normal value it is
called a negative zero.
In other words, C supports three different representations for signed integers and two of them have the concept of signed zero, which makes a distinction between a positive and a negative zero.
So, my explanation is that perhaps the author of your code snippet was trying to produce a negative zero value. But, as pointed out in Jens Gustedt's answer, this expression cannot actually produce a negative zero, which means the author may have made a wrong assumption there.
No, I can't see any reason for this. Others have mentioned that it is possible to have platforms with "negative zero", but such a negative zero can never be produced by this expression, so this is useless.
The corresponding paragraph in the C standard is 6.2.6.2 p3, emphasis is mine:
If the implementation supports negative zeros, they shall be generated
only by:
— the &, |, ^, ~, <<, and >> operators with operands that produce such a value;
— the +, -, *, /, and % operators where one operand is a negative zero and the result is zero;
— compound assignment operators based on the above cases.
To produce a negative zero on such a platform you could use ~INT_MAX, for example, but that would not be a zero for other representations, so the code wouldn't be very portable.
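To make that concrete, a hedged sketch (what ~INT_MAX yields depends entirely on the representation):

#include <limits.h>
#include <stdio.h>

int main(void) {
    /* Flips every bit of INT_MAX (0111...1), giving the pattern 1000...0:
     * a negative zero on sign-magnitude (if it isn't a trap representation),
     * but INT_MIN on two's complement and ones' complement machines. */
    int maybe_negative_zero = ~INT_MAX;
    printf("%d\n", maybe_negative_zero);
    return 0;
}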
That would matter on an architecture whose CPU uses ones' complement numbers.
Ones' complement is a way to represent negative numbers that has a -0. The concept is still in use for floating point, whose sign-magnitude representation also has a negative zero.
For unsigned numbers, the maximal value is represented by the same all-1s bit pattern that ones' complement uses for -0.
We are accustomed to two's complement numbers, which have one more negative number. Though ones' complement has some trouble around zero, two's complement has the same trouble around INT_MIN: negating INT_MIN overflows and, on typical machines, yields INT_MIN again.
The code above probably intended unsigned numbers:
uint16_t max = (uint16_t) -1;
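If the intent really was "all bits set", <stdint.h> already provides a named constant, which may read more clearly than the cast (a matter of taste rather than correctness):

#include <stdint.h>

uint16_t max = UINT16_MAX;  /* 0xFFFF, same value as (uint16_t)-1 */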
No reason for it with C99 or later.
int16_t is an exact-width integer type.
The typedef name intN_t designates a signed integer type with width N, no padding bits, and a two’s complement representation. Thus, int8_t denotes such a signed integer type with a width of exactly 8 bits. C11dr §7.20.1.1 1
There is no signed zero with two's complement, so the code is equivalent to
int16_t max = 0;
IMO, the hoped-for result of int16_t max = -0;, on a non-two's-complement platform, was to initialize max to -0, to flag an array of length 0 or one that contained only elements with the value -0.
As per the C standard, the value representation of an integer type is implementation-defined. So 5 might not be represented as 00000000000000000000000000000101, nor -1 as 11111111111111111111111111111111, as we usually assume on a 32-bit two's complement machine. So even though the operators ~, << and >> are well defined, the bit patterns they work on are implementation-defined. The only defined bit pattern I could find was "§5.2.1/3 A byte with all bits set to 0, called the null character, shall exist in the basic execution character set; it is used to terminate a character string."
So my question is: is there an implementation-independent way of converting integer types to a bit pattern?
We can always start with a null character and do enough bit operations on it to get it to a desired value, but I find that too cumbersome. I also realise that practically all implementations use a two's complement representation, but I want to know how to do it in a pure C standard way. Personally I find this topic quite intriguing because of device-driver programming, where all code written to date assumes a particular implementation.
In general, it's not that hard to accommodate unusual platforms in most cases (if you don't want to simply assume 8-bit char, two's complement, no padding, no trap representations, and truncating unsigned-to-signed conversion); the standard mostly gives enough guarantees (though a few macros to inspect certain implementation details would be helpful).
As far as a strictly conforming program can observe (outside bit-fields), 5 is always encoded as 00...0101. This is not necessarily the physical representation (whatever this should mean), but what is observable by portable code. A machine using Gray code internally, for example, would have to emulate a "pure binary notation" for bitwise operators and shifts.
For negative values of signed types, different encodings are allowed, which leads to different (but well-defined for every case) results when re-interpreting as the corresponding unsigned type. For example, strictly conforming code must distinguish between (unsigned)n and *(unsigned *)&n for a signed integer n: They are equal for two's complement without padding bits, but different for the other encodings if n is negative.
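A hedged sketch of that distinction (both unsigned values are printed; they coincide on a two's complement machine without padding bits):

#include <stdio.h>

int main(void) {
    int n = -1;

    /* Conversion is defined by value: always UINT_MAX, i.e. -1 modulo 2^N. */
    unsigned by_value = (unsigned)n;

    /* Type punning reinterprets the object representation: equal to the
     * above on two's complement without padding, different on ones'
     * complement or sign-magnitude machines. */
    unsigned by_representation = *(unsigned *)&n;

    printf("%u %u\n", by_value, by_representation);
    return 0;
}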
Further, padding bits may exist, and signed integer types may have more padding bits than their corresponding unsigned counterparts (but not the other way round, type-punning from signed to unsigned is always valid). sizeof cannot be used to get the number of non-padding bits, so e.g. to get an unsigned value where only the sign-bit (of the corresponding signed type) is set, something like this must be used:
#define TYPE_PUN(to, from, x) ( *(to *)&(from){ (x) } )

unsigned sign_bit = TYPE_PUN(unsigned, int, INT_MIN)
                  & TYPE_PUN(unsigned, int, -1)
                  & ~1u;
(there are probably nicer ways) instead of
unsigned sign_bit = 1u << sizeof sign_bit * CHAR_BIT - 1;
as this may shift by more than the width. (I don't know of a constant expression giving the width, but sign_bit from above can be right-shifted until it's 0 to determine it, Gcc can constant-fold that.) Padding bits can be inspected by memcpying into an unsigned char array, though they may appear to "wobble": Reading the same padding bit twice may give different results.
If you want the bit pattern (without padding bits) of a signed integer, least significant bit first:
#include <stdio.h>
#include <limits.h>

int print_bits_u(unsigned n) {
    for (; n; n >>= 1) {
        putchar(n & 1 ? '1' : '0'); /* n & 1 never traps */
    }
    return 0;
}

int print_bits(int n) {
    return print_bits_u(*(unsigned *)&n & INT_MAX);
    /* This masks padding bits if int has more of them than unsigned int.
     * Note that INT_MAX is promoted to unsigned int here. */
}

int print_bits_2scomp(int n) {
    return print_bits_u(n); /* int-to-unsigned conversion: value modulo 2^N */
}
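A quick way to exercise these (a hypothetical main; note that print_bits_u prints nothing at all for 0, since the loop body never runs):

int main(void) {
    print_bits(5);          /* prints "101": bits 1, 0, 1, LSB first */
    putchar('\n');
    print_bits_2scomp(-2);  /* prints "0111...1" on two's complement, LSB first */
    putchar('\n');
    return 0;
}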
print_bits gives different results for negative numbers depending on the representation used (it gives the raw bit pattern), while print_bits_2scomp gives the two's complement representation (possibly with a greater width than a signed int has, if unsigned int has fewer padding bits).
Care must be taken not to generate trap representations when using bitwise operators and when type-punning from unsigned to signed; see below for how these can potentially be generated (as an example, *(int *)&sign_bit can trap with two's complement, and -1 | 1 can trap with ones' complement).
Unsigned-to-signed integer conversion (if the converted value isn't representable in the target type) is always implementation-defined. I would expect non-two's-complement machines to be the ones more likely to differ from the common definition, though technically it could also become an issue on two's complement implementations.
From C11 (n1570) 6.2.6.2:
(1) For unsigned integer types other than unsigned char, the bits of the object representation shall be divided into two groups: value bits and padding bits (there need not be any of the latter). If there are N value bits, each bit shall represent a different power of 2 between 1 and 2^(N-1), so that objects of that type shall be capable of representing values from 0 to 2^N - 1 using a pure binary representation; this shall be known as the value representation. The values of any padding bits are unspecified.
(2) For signed integer types, the bits of the object representation shall be divided into three groups: value bits, padding bits, and the sign bit. There need not be any padding bits; signed char shall not have any padding bits. There shall be exactly one sign bit. Each bit that is a value bit shall have the same value as the same bit in the object representation of the corresponding unsigned type (if there are M value bits in the signed
type and N in the unsigned type, then M≤N ). If the sign bit is zero, it shall not affect the resulting value. If the sign bit is one, the value shall be modified in one of the following ways:
the corresponding value with sign bit 0 is negated (sign and magnitude);
the sign bit has the value -(2^M) (two's complement);
the sign bit has the value -(2^M - 1) (ones' complement).
Which of these applies is implementation-defined, as is whether the value with sign bit 1 and all value bits zero (for the first two), or with sign bit and all value bits 1 (for ones' complement), is a trap representation or a normal value. In the case of sign and magnitude and ones' complement, if this representation is a normal value it is called a negative zero.
To add to mafso's excellent answer, there's a part of the ANSI C rationale which talks about this:
The Committee has explicitly restricted the C language to binary architectures, on the grounds that this stricture was implicit in any case:
Bit-fields are specified by a number of bits, with no mention of “invalid integer” representation. The only reasonable encoding for such bit-fields is binary.
The integer formats for printf suggest no provision for “invalid integer” values, implying that any result of bitwise manipulation produces an integer result which can be printed by printf.
All methods of specifying integer constants — decimal, hex, and octal — specify an integer value. No method independent of integers is defined for specifying “bit-string constants.” Only a binary encoding provides a complete one-to-one mapping between bit strings and integer values.
The restriction to binary numeration systems rules out such curiosities as Gray code and makes
possible arithmetic definitions of the bitwise operators on unsigned types.
The relevant part of the standard might be this quote:
3.1.2.5 Types
[...]
The type char, the signed and unsigned integer types, and the
enumerated types are collectively called integral types. The
representations of integral types shall define values by use of a pure
binary numeration system.
If you want to get the bit pattern of a given int, then the bitwise operators are your friends. If you want to convert an int to its two's complement representation, then the arithmetic operators are your friends. The two representations can be different, as this is implementation-defined:
Std Draft 2011. 6.5/4. Some operators (the unary operator ~, and the
binary operators <<, >>, &, ^, and |, collectively described as
bitwise operators) are required to have operands that have integer
type. These operators yield values that depend on the internal
representations of integers, and have implementation-defined and
undefined aspects for signed types.
So it means that i<<1 will effectively shift the bit pattern by one position to the left, but the value produced can be different from i*2 (even for small values of i).
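The divergence is confined to negative operands under non-two's-complement encodings (and left-shifting negative values is undefined since C99 anyway), so a portable demonstration has to stick to unsigned values, where shifting really is multiplication modulo 2^N:

#include <stdio.h>

int main(void) {
    unsigned u = 5;
    /* For unsigned operands the shift is defined purely in terms of values:
     * u << 1 equals u * 2 reduced modulo 2^N. Prints "10 10". */
    printf("%u %u\n", u << 1, u * 2u);
    return 0;
}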
Consider the expression x >> y, where x is a signed int whose leftmost bit is 1. Does the result depend on the machine?
I have tried it for a signed int whose leftmost bit is 0 and got the same result everywhere, but I don't know about the given case.
There are no unambiguous "leftmost" or "rightmost" bits (that depends on convention), only most significant and least significant bits. The sign bit on a two's complement machine is the most significant bit.
>> uses zero extension if the shifted operand is unsigned.
Positive signed values behave like positive unsigned values. The >> of a negative quantity, however, is implementation defined, but wherever I have used it, negative numbers have been sign-extended.
Also, left-shifting something to the sign bit of a signed quantity is undefined behaviour, so for most portable programs it is best to use bitshift tricks only to unsigned values.
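A common defensive idiom, sketched here under the assumption of a 32-bit int and unsigned: do the shift in unsigned arithmetic, where it is fully defined:

#include <stdio.h>

int main(void) {
    /* 1 << 31 would shift a bit into the sign bit of a 32-bit int: undefined
     * behaviour. The same shift on an unsigned operand is well defined. */
    unsigned bit31 = 1u << 31;
    printf("%u\n", bit31);  /* prints 2147483648 with a 32-bit unsigned */
    return 0;
}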
Related
So, let's say I have a signed integer (a couple of examples):
-1101363339 = 10111110 01011010 10000111 01110101 in binary.
-2147463094 = 10000000 00000000 01010000 01001010 in binary.
-20552 = 11111111 11111111 10101111 10111000 in binary.
Now, -1101363339 >> 31, for example, should equal 1, right? But on my computer I am getting -1. Regardless of which negative integer I pick, if x is a negative number, x >> 31 == -1. Why? Clearly, in binary it should be 1.
Per C99 6.5.7 Bitwise shift operators:
If E1 has a signed type and a negative value, the resulting value is implementation-defined.
where E1 is the left-hand side of the shift expression. So it depends on your compiler what you'll get.
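If the 0-or-1 answer the question expects is what's wanted, one route (a sketch assuming a 32-bit int on a two's complement machine) is to do the shift on the unsigned type, where it is a defined logical shift:

#include <stdio.h>

int main(void) {
    int x = -1101363339;
    /* Reinterpret by value as unsigned, then shift: the top bit comes out
     * as 1 for any x whose most significant bit is set. Prints 1. */
    unsigned top = (unsigned)x >> 31;
    printf("%u\n", top);
    return 0;
}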
In most languages, shifting a signed value to the right does an arithmetic shift, meaning it preserves the most significant bit. Therefore in your case you have all 1s in binary, which is -1 in decimal. If you use an unsigned int you will get the result you are looking for.
Per C 2011 6.5.7 Bitwise shift operators:
The result of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has an unsigned type or if E1 has a signed type and a nonnegative value, the value of the result is the integral part of the quotient of E1 / 2^E2. If E1 has a signed type and a negative value, the resulting value is implementation-defined.
Basically, the right-shift of a negative signed integer is implementation defined but most implementations choose to do it as an arithmetic shift.
The behavior you are seeing is called an arithmetic shift, which is when right shifting extends the sign bit. This means that the MSBs will carry the same value as the original sign bit. In other words, a negative number will always stay negative after a right shift operation.
Note that this behavior is implementation defined and cannot be guaranteed with a different compiler.
What you are seeing is an arithmetic shift, in contrast to the logical shift you were expecting; i.e., the compiler, instead of "brutally" shifting the bits, is propagating the sign bit, thus dividing by 2^N.
When talking about unsigned ints and positive ints, a right shift is a very simple operation: the bits are shifted to the right by N places (inserting 0s on the left), regardless of their meaning. In such cases, the operation is equivalent to dividing by 2^N (and the C standard actually defines it like that).
The distinction comes up when talking about negative numbers. Several representations of negative numbers exist, although currently two's complement is most commonly used for integers.
The problem with a "brutal" bitwise shift here is, for starters, that one of the bits is used in some way to express the sign; thus, shifting the binary digits without regard to the representation of negative integers can give unexpected results.
For example, in two's complement the most significant bit is 1 for negative numbers and 0 for positive numbers; applying a bitwise shift (with zeroes inserted on the left) to a negative number would, among other things, make it positive, not resulting in the (usually expected) division by 2^N.
So arithmetic shift is introduced; negative numbers represented in two's complement have an interesting property: the division-by-2^N behavior of the shift is preserved if, instead of inserting zeroes from the left, you insert bits that have the same value as the original sign bit.
In this way, signed divisions by 2^N can be performed with just a bit of extra logic in the shift, without having to resort to a fully-fledged division routine.
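One caveat worth spelling out, since it explains part of that "extra logic": an arithmetic shift rounds toward negative infinity, whereas C's signed division truncates toward zero, so the two disagree on odd negative values. A small demo, assuming the implementation defines >> on negative values as an arithmetic shift (as typical machines do):

#include <stdio.h>

int main(void) {
    int n = -3;
    /* With an arithmetic shift, -3 >> 1 is -2 (rounds toward -infinity),
     * while -3 / 2 is -1 (C division truncates toward zero). */
    printf("%d %d\n", n >> 1, n / 2);
    return 0;
}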
Now, is arithmetic shift guaranteed for signed integers? In some languages, yes¹; but in C it's not like that: the behavior of the shift operators when dealing with negative integers is left as an implementation-defined detail.
As often happens, this is due to different hardware support for the operation; C is used on vastly different platforms, and, especially in the past, there was quite a difference in the "cost" of operations depending on the platform.
For example, if the processor does not provide an arithmetic right shift instruction, the compiler would be forced to emit a much slower instruction sequence, such as some kind of DIV, which could be a problem in an inner loop on slower processors. For these reasons, the C standard leaves it up to the implementor to do the most appropriate thing for the current platform.
In your case, your implementation probably chose arithmetic shift because you are running on an x86 processor, that uses 2's complement arithmetic and provides both bitwise and arithmetic shift as single CPU instructions.
¹ Actually, languages like Java even have separate arithmetic (>>) and logical (>>>) shift operators; this is mainly because they do not have unsigned types in which to, e.g., store bitfields.
I know that the behavior of >> on signed integer can be implementation dependent (specifically, when the left operand is negative).
What about the others: ~, <<, &, ^, |?
When their operands are signed integers of built-in type (short, int, long, long long), are the results guaranteed to be the same (in terms of bit content) as if their type is unsigned?
For negative operands, << has undefined behavior and the result of >> is implementation-defined (usually as "arithmetic" right shift). << and >> are conceptually not bitwise operators. They're arithmetic operators equivalent to multiplication or division by the appropriate power of two for the operands on which they're well-defined.
As for the genuine bitwise operators ^, ~, |, and &, they operate on the bit representation of the value in the (possibly promoted) type of the operand. Their results are well-defined for each possible choice of signed representation (two's complement, ones' complement, or sign-magnitude), but in the latter two cases it's possible that the result will be a trap representation, if the implementation treats the "negative zero" representation as a trap. Personally, I almost always use unsigned expressions with bitwise operators so that the result is 100% well-defined in terms of values rather than representations.
Finally, note that this answer as written may only apply to C. C and C++ are very different languages and while I don't know C++ well, I understand it may differ in some of these areas from C...
A left shift << of a negative value has undefined behaviour;
A right shift >> of a negative value gives an implementation-defined result;
The result of the &, | and ^ operators is defined in terms of the bitwise representation of the values. Three possibilities are allowed for the representation of negative numbers in C: two's complement, ones' complement and sign-magnitude. The method used by the implementation will determine the numerical result when these operators are used on negative values.
Note that the value with sign bit 1 and all value bits zero (for two's complement and sign-magnitude), or with sign bit and all value bits 1 (for ones’ complement) is explicitly allowed to be a trap representation, and in this case if you use arguments to these operators that would generate such a value the behaviour is undefined.
The bit content will be the same, but the resulting values will still be implementation dependent.
You really shouldn't see the values as signed or unsigned when using bitwise operations, because that is working on a different level.
Using unsigned types saves you from some of this trouble.
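A sketch of that unsigned-first habit: convert by value to unsigned before masking, so the expression has one meaning on every implementation:

#include <stdio.h>

int main(void) {
    int n = -1;
    /* (unsigned)n is fully defined (value modulo 2^N), so the mask below
     * yields 255 on every conforming implementation, regardless of how
     * negative values are represented. */
    unsigned low_byte = (unsigned)n & 0xFFu;
    printf("%u\n", low_byte);
    return 0;
}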
The C89 Standard defined the behavior of left-shifting signed numbers based upon bit positions. If neither signed nor unsigned types have padding bits, the required behavior for unsigned types, combined with the requirement that positive signed types share the same representation as unsigned types, would imply that the sign bit is immediately to the left of the most significant value bit.
Thus, in C89, -1<<1 would be -2 on two's-complement implementations which don't have padding bits and -3 on ones'-complement implementations which don't have padding bits. If there are any sign-magnitude implementations without padding bits, -1<<1 would equal 2 on those.
The C99 Standard changed left-shifts of negative values to Undefined Behavior, but nothing in the rationale gives any clue as to why (or even mentions the change at all). The behavior required by C89 may have been less than ideal on some ones'-complement implementations, so it would have made sense to allow those implementations the freedom to select something better. I've seen no evidence to suggest that the authors of the Standard didn't intend that quality two's-complement implementations should continue to provide the same behavior mandated by C89, but unfortunately they didn't actually say so.