When are bitwise operations undefined in C? [duplicate]

Bitwise operators (~, &, | and ^) operate on the bitwise representation of their promoted operands. Can such operations cause undefined behavior?
For example, the ~ operator is defined this way in the C Standard:
6.5.3.3 Unary arithmetic operators
The result of the ~ operator is the bitwise complement of its (promoted) operand (that is, each bit in the result is set if and only if the corresponding bit in the converted operand is not set). The integer promotions are performed on the operand, and the result has the promoted type. If the promoted type is an unsigned type, the expression ~E is equivalent to the maximum value representable in that type minus E.
On all architectures, ~0 produces a bit pattern with the sign bit set to 1 and all value bits set to 1. On a one's complement architecture, this representation corresponds to a negative zero. Can this bit pattern be a trap representation?
Are there other examples of undefined behavior involving simple bitwise operators for more common architectures?

For one's complement systems, the standard explicitly lists the possibility of trap values on implementations that do not support negative zeros in signed integers (C11 6.2.6.2p4):
If the implementation does not support negative zeros, the behavior of the &, |, ^, ~, <<, and >> operators with operands that would produce such a value is undefined.
Then again, one's complement systems are not exactly common; GCC, for example, doesn't support any.
C11 also implies that the implementation-defined and undefined aspects apply only to signed types (C11 6.5p4).
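As a minimal sketch of the practical consequence (assuming a C11 compiler), doing the complement on an unsigned operand avoids the negative-zero question entirely, because for unsigned types ~E is defined by value as the maximum value minus E. The helper name complement_unsigned is hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* For unsigned operands, ~E equals UINT_MAX - E (C11 6.5.3.3p4),
   so the result does not depend on any signed representation. */
static_assert(~0u == UINT_MAX, "~0u is the maximum unsigned value");

/* Hypothetical helper: complement computed entirely in unsigned int. */
unsigned int complement_unsigned(unsigned int e)
{
    return ~e;
}
```

Calling complement_unsigned(0u) yields UINT_MAX on every conforming implementation, whereas ~0 in int may be a negative zero on a one's complement machine.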

Related

C bitwise-shift: right operand considered for implicit type-conversion?

gcc 4.8.4 warns about 1u << 63ul (assuming 64-bit long and 32-bit int) and computes 0. Is that correct, i.e. is 1u really not converted to 1ul before shifting?
ISO/IEC 9899:201x, 6.3.1.8 (Usual arithmetic conversions): "Many operators that expect operands of arithmetic type cause conversions"; 6.5.7 (Bitwise shift operators): "The integer promotions are performed on each of the operands...".
But I am unable to reach a conclusion. Which are those "many operators"? As I understand it, "integer promotion" does not pertain to types wider than int (am I correct?), but the standard does not explicitly state that the right operand of a bitwise shift is ignored for the implicit type conversion.
Each operation documents this separately. For example, n1548 §6.5.5 "Multiplicative operators" ¶3
The usual arithmetic conversions are performed on the operands.
This phrase is omitted from §6.5.7 "Bitwise shift operators". Instead it says:
The integer promotions are performed on each of the operands. The type of the result is the type of the promoted left operand. …
Since the section on bitwise shift operators says nothing about "usual arithmetic conversions", that conversion does not happen.
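The difference can be checked at compile time. The following is a sketch assuming a C11 compiler (for _Generic and static_assert); the macro TYPE_CODE and the function name are hypothetical and exist only for illustration.

```c
#include <assert.h>

/* Hypothetical macro: maps the type of an expression to a small code. */
#define TYPE_CODE(x) _Generic((x), \
    int:           1,              \
    unsigned int:  2,              \
    long:          3,              \
    unsigned long: 4,              \
    default:       0)

/* The unsigned long right operand does not widen the result:
   a shift has the type of the promoted left operand. */
static_assert(TYPE_CODE(1u << 3ul) == 2, "shift result is unsigned int");

/* By contrast, * applies the usual arithmetic conversions, so both
   operands are brought to the common type unsigned long. */
static_assert(TYPE_CODE(1u * 3ul) == 4, "product is unsigned long");

/* Runtime view of the same fact. */
int shift_result_type_code(void)
{
    return TYPE_CODE(1u << 3ul);
}
```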
The "usual arithmetic conversions" include conversions to floating point types and how to determine a common type for two operands.
Bitshifts do not operate on floating point types, only integers (constraint, 6.5.7p2). Unlike other binary operators that take only integers (e.g. bitwise AND), the two operands are not directly combined to form the result, so there is no requirement for a common type. Thus, each operand is promoted independently (from your citation: "The integer promotions are performed on each of the operands").
Reading the whole paragraph 6.5.7p3 makes it clear:
The integer promotions are performed on each of the operands. The type of the result is that of the promoted left operand. If the value of the right operand is negative or is greater than or equal to the width of the promoted left operand, the behavior is undefined.
Note the second sentence, which makes clear that the result type is determined solely by the left operand. From that and the last sentence it follows that int or unsigned int is more than sufficient for the right operand on all current implementations. The smallest permitted value of INT_MAX is 32767, far more than the number of bits in any standard type of any implementation, even if we consider future wider integers with 1024 or more bits.
Also note that your code invokes undefined behaviour, unless your platform has an unsigned int with at least 64 bits (last sentence).
The compiler correctly warns that your code invokes undefined behaviour. The warning is not required, but you should be glad the compiler issues it. Treat it seriously! Any behaviour of the program is correct once you invoke undefined behaviour.
If you want a 64-bit type, use uint64_t. For a constant, use the macro UINT64_C(1), which generates an integer constant with at least 64 bits (uint_least64_t). Both are provided by stdint.h.
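A minimal sketch of that fix follows. The helper name bit64 is hypothetical, and the assert documents the precondition from 6.5.7p3 rather than being something the standard requires.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns a 64-bit value with only bit n set.
   The left operand has type uint_least64_t, so counts 0..63 are valid. */
uint64_t bit64(unsigned int n)
{
    assert(n < 64);            /* a larger count would be undefined */
    return UINT64_C(1) << n;   /* the shift is done in at least 64 bits */
}
```

With this, bit64(63) produces the top bit of a 64-bit value, where 1u << 63 is undefined on a platform with a 32-bit unsigned int.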
About your other question:
From 6.3.1.1p2:
If an int can represent all values of the original type (as restricted by the width, for a bit-field), the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions. All other types are unchanged by the integer promotions.
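The quoted rule can be observed with a small compile-time check. This is a sketch assuming a C11 compiler and a platform where int can represent every unsigned char value (true wherever int is wider than char); the macro and function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical macro: maps the type of an expression to a small code. */
#define PROMOTED_CODE(x) _Generic((x), \
    int:           1,                  \
    unsigned int:  2,                  \
    long:          3,                  \
    unsigned long: 4,                  \
    default:       0)

/* Unary + performs the integer promotions on its operand. */
static_assert(PROMOTED_CODE(+(unsigned char)1) == 1, "unsigned char -> int");
static_assert(PROMOTED_CODE(+(signed char)1) == 1, "signed char -> int");

/* Types at least as wide as int are left unchanged. */
static_assert(PROMOTED_CODE(+1u) == 2, "unsigned int unchanged");
static_assert(PROMOTED_CODE(+1L) == 3, "long unchanged");
static_assert(PROMOTED_CODE(+1UL) == 4, "unsigned long unchanged");

/* Runtime view: the type code of a promoted unsigned char. */
int promoted_uchar_code(void)
{
    unsigned char c = 1;
    return PROMOTED_CODE(+c);
}
```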

Are bitwise operations portable?

Suppose we have the following code:
int j = -1 & 0xFF;
The resulting value in j could be one of the following based on the underlying representation:
System             Value
Two's complement   0xFF
One's complement   0xFE
Sign/Magnitude     0x01
But are the &, |, and ^ operators in C always defined in terms of two's complement (thus making j always be equal to 0xFF), or are they defined in terms of the underlying representation of the system?
They're defined in terms of the actual bit representation. From the C11 final draft:
The result of the binary & operator is the bitwise AND of the operands (that is, each bit in the result is set if and only if each of the corresponding bits in the converted operands is set).
...
The result of the ^ operator is the bitwise exclusive OR of the operands (that is, each bit in the result is set if and only if exactly one of the corresponding bits in the converted operands is set).
...
The result of the | operator is the bitwise inclusive OR of the operands (that is, each bit in the result is set if and only if at least one of the corresponding bits in the converted operands is set).
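If a representation-independent result is wanted, one option is to convert to an unsigned type before masking. The sketch below relies on the fact that conversion to unsigned is defined by value (modulo UINT_MAX + 1), not by bit pattern; the helper name low_byte is hypothetical.

```c
#include <assert.h>

/* (unsigned int)-1 is UINT_MAX on every implementation, whatever the
   signed representation, so the mask sees all value bits set. */
static_assert(((unsigned int)-1 & 0xFFu) == 0xFFu, "portable low byte of -1");

/* Hypothetical helper: low 8 bits of v, computed in unsigned arithmetic. */
unsigned int low_byte(int v)
{
    return (unsigned int)v & 0xFFu;
}
```

Here low_byte(-1) is 0xFF regardless of whether the machine uses two's complement, one's complement, or sign/magnitude.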

Use of small integer with bits operator in C

Related to a previous question, I can't understand some rules of MISRA C 2004.
In ISO C99 draft 2007, in 6.5 section §4 :
Some operators (the unary operator ~, and the binary operators <<, >>, &, ^, and |, collectively described as bitwise operators) are required to have operands that have integer type. These operators yield values that depend on the internal representations of integers, and have implementation-defined and undefined aspects for signed types.
Ok, using a signed integer with bitwise operators can produce undefined behaviour (and makes no sense).
A good solution is to use explicit conversion to a wider unsigned integer type in order to by-pass integral promotion, and then not use signed value with bitwise operators (see associated answers of my previous question).
But in MISRA C 2004, use of small unsigned integers with bitwise operators is possible (rule 10.5 for example). Why, if integral promotion leads to use signed values with bitwise operators? I think I don't understand some things.
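The rules don't contradict each other and you don't need to widen the type. You can immediately cast the result of a bitwise operation on a small integer back to its own type.
A small integer operand is still promoted to int before the shift, but once the result of ~ has been cast back to the small unsigned type, the promoted value is non-negative and lies within the range of that type, so no signed artefacts are shifted in.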
This is from their example:
uint8_t port = 0x5aU;
uint8_t result_8;
uint16_t result_16;
result_8 = (~port) >> 4; /* not compliant */
result_8 = ((uint8_t)(~port)) >> 4; /* compliant */
result_16 = ((uint16_t)(~(uint16_t)port)) >> 4; /* compliant */
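To make the compliant 8-bit form concrete, here is a small sketch wrapping it in a function. The name complement_high_nibble is hypothetical and not part of the MISRA example; the values in the comments assume a two's complement int.

```c
#include <stdint.h>

/* Hypothetical helper around the compliant 8-bit expression. */
uint8_t complement_high_nibble(uint8_t port)
{
    /* ~port is evaluated in int after promotion, e.g. ~0x5a is -91.
       The inner cast reduces it modulo 256 to 0xa5, so the value that
       gets shifted is non-negative and fits in 8 bits. */
    return (uint8_t)((uint8_t)(~port) >> 4);
}
```

For port = 0x5aU this returns 0x0a. The non-compliant form instead right-shifts a negative int, whose result is implementation-defined.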

Are the results of bitwise operations on signed integers defined?

I know that the behavior of >> on signed integer can be implementation dependent (specifically, when the left operand is negative).
What about the others: ~, <<, &, ^, |?
When their operands are signed integers of built-in type (short, int, long, long long), are the results guaranteed to be the same (in terms of bit content) as if their type is unsigned?
For negative operands, << has undefined behavior and the result of >> is implementation-defined (usually as "arithmetic" right shift). << and >> are conceptually not bitwise operators. They're arithmetic operators equivalent to multiplication or division by the appropriate power of two for the operands on which they're well-defined.
As for the genuine bitwise operators ^, ~, |, and &, they operate on the bit representation of the value in the (possibly promoted) type of the operand. Their results are well-defined for each possible choice of signed representation (twos complement, ones complement, or sign-magnitude) but in the latter two cases it's possible that the result will be a trap representation if the implementation treats the "negative zero" representation as a trap. Personally, I almost always use unsigned expressions with bitwise operators so that the result is 100% well-defined in terms of values rather than representations.
Finally, note that this answer as written may only apply to C. C and C++ are very different languages and while I don't know C++ well, I understand it may differ in some of these areas from C...
A left shift << of a negative value has undefined behaviour;
A right shift >> of a negative value gives an implementation-defined result;
The result of the &, | and ^ operators is defined in terms of the bitwise representation of the values. Three possibilities are allowed for the representation of negative numbers in C: two's complement, ones' complement and sign-magnitude. The method used by the implementation will determine the numerical result when these operators are used on negative values.
Note that the value with sign bit 1 and all value bits zero (for two's complement and sign-magnitude), or with sign bit and all value bits 1 (for ones’ complement) is explicitly allowed to be a trap representation, and in this case if you use arguments to these operators that would generate such a value the behaviour is undefined.
The bit content will be the same, but the resulting values will still be implementation dependent.
You really shouldn't see the values as signed or unsigned when using bitwise operations, because that is working on a different level.
Using unsigned types saves you from some of this trouble.
The C89 Standard defined the behavior of left-shifting signed numbers based upon bit positions. If neither signed nor unsigned types have padding bits, the required behavior for unsigned types, combined with the requirement that positive signed types share the same representation as unsigned types, would imply that the sign bit is immediately to the left of the most significant value bit.
Thus, in C89, -1<<1 would be -2 on two's-complement implementations which don't have padding bits and -3 on ones'-complement implementations which don't have padding bits. If there are any sign-magnitude implementations without padding bits, -1<<1 would equal 2 on those.
The C99 Standard changed left-shifts of negative values to Undefined Behavior, but nothing in the rationale gives any clue as to why (or even mentions the change at all). The behavior required by C89 may have been less than ideal in some ones'-complement implementations, and so it would have made sense to allow those implementations the freedom to select something better. I've seen no evidence to suggest that the authors of the Standard didn't intend that quality two's-complement implementations should continue to provide the same behavior mandated by C89, but unfortunately they didn't actually say so.
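One way to get a well-defined left shift of a possibly negative value in C99 and later is to shift in an unsigned type. This is a sketch, not something the Standard prescribes; the helper name shl_bits32 is hypothetical. It returns the unsigned result because converting an out-of-range value back to int32_t would be implementation-defined.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: converts v to uint32_t (defined modulo 2^32)
   and shifts left there, where the operation is fully defined. */
uint32_t shl_bits32(int32_t v, unsigned int n)
{
    assert(n < 32);            /* a count >= the width would be undefined */
    return (uint32_t)v << n;
}
```

For example, shl_bits32(-1, 1) is 0xFFFFFFFE, which is the two's-complement bit pattern of -2, the value C89 specified for -1<<1 on such implementations.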
