Related to a previous question, I can't understand some rules of MISRA C 2004.
In ISO C99 draft 2007, in 6.5 section §4 :
Some operators (the unary operator ~, and the binary operators <<, >>, &, ^, and |, collectively described as bitwise operators) are required to have operands that have integer type. These operators yield values that depend on the internal representations of integers, and have implementation-defined and undefined aspects for signed types.
OK, so using a signed integer with bitwise operators can produce implementation-defined or undefined behaviour (and rarely makes sense).
A good solution is to use an explicit conversion to a wider unsigned integer type in order to bypass integral promotion, and thereby avoid using signed values with bitwise operators (see the answers to my previous question).
But in MISRA C 2004, the use of small unsigned integers with bitwise operators is permitted (rule 10.5, for example). Why is this allowed, if integral promotion means the operands of bitwise operators end up as signed values? I think I'm missing something.
The rules don't contradict each other, and you don't need to widen the type: you can immediately cast the result of an operation on a small integer back to its type.
Note that a small integer operand is promoted to int before a shift (6.5.7p3 performs the integer promotions on each operand); MISRA deals with this by requiring the result to be cast straight back to the underlying small type, which discards the extra bits the promotion introduced.
This is from their example:
uint8_t port = 0x5aU;
uint8_t result_8;
uint16_t result_16;
result_8 = (~port) >> 4; /* not compliant */
result_8 = ((uint8_t)(~port)) >> 4; /* compliant */
result_16 = ((uint16_t)(~(uint16_t)port)) >> 4; /* compliant */
Normally, C requires that a binary operator's operands are promoted to the type of the higher-ranking operand. This can be exploited to avoid filling code with verbose casts, for example:
if (x-48U<10) ...
y = x+0ULL << 40;
etc.
However, I've found that, at least with gcc, this behavior does not work for bitshifts. I.e.
int x = 1;
unsigned long long y = x << 32ULL;
I would expect the type of the right-hand operand to cause the left-hand operand to be promoted to unsigned long long so that the shift succeeds. But instead, gcc prints a warning:
warning: left shift count >= width of type
Is gcc broken, or does the standard make some exception to the type promotion rules for bitshifts?
The so-called usual arithmetic conversions apply to many binary operators, but not all of them. For example they do not apply to the bit shift operators, &&, ||, comma operator, and assignment operators. This is the rule for the bit shift operators:
6.5.7 Bitwise shift operators, Semantics, paragraph 3:
The integer promotions are performed on each of the operands. The type of the result is that of the promoted left operand. If the value of the right operand is negative or is greater than or equal to the width of the promoted left operand, the behavior is undefined.
The trouble really is that promotion only goes up to whatever your platform defines as an int. As other answers have stated, the bit-shift operators promote the left operand to an int; on your platform, however, an int is a 32-bit value, and the integer promotions will not promote it to a long long (64 bits).
Bitwise operators (~, &, | and ^) operate on the bitwise representation of their promoted operands. Can such operations cause undefined behavior?
For example, the ~ operator is defined this way in the C Standard:
6.5.3.3 Unary arithmetic operators
The result of the ~ operator is the bitwise complement of its (promoted) operand (that is, each bit in the result is set if and only if the corresponding bit in the converted operand is not set). The integer promotions are performed on the operand, and the result has the promoted type. If the promoted type is an unsigned type, the expression ~E is equivalent to the maximum value representable in that type minus E.
On all architectures, ~0 produces a bit pattern with the sign bit set to 1 and all value bits set to 1. On a one's complement architecture, this representation corresponds to a negative zero. Can this bit pattern be a trap representation?
Are there other examples of undefined behavior involving simple bitwise operators for more common architectures?
For one's complement systems, there's explicitly listed the possibility of trap values for those that do not support negative zeros in signed integers (C11 6.2.6.2p4):
If the implementation does not support negative zeros, the behavior of the &, |, ^, ~, <<, and >> operators with operands that would produce such a value is undefined.
Then again, one's complement systems are not exactly common; GCC, for example, doesn't support any!
C11 does imply that the implementation-defined and undefined aspects are just allowed for signed types (C11 6.5p4).
gcc 4.8.4 warns about 1u << 63ul (assuming 64 bit long and 32 bit int) and computes 0. Rightfully so (no promotion from 1u to 1ul before shifting)?
ISO/IEC 9899:201x, 6.3.1.8 (Usual arithmetic conversions): "Many operators that expect operands of arithmetic type cause conversions"; 6.5.7 (Bitwise shift operators): "The integer promotions are performed on each of the operands...".
But I am unable to draw a conclusion. Which are those "many operators"? As I understand it, "integer promotion" does not apply to types wider than int (am I correct?), but the standard does not explicitly state that the right operand of a bitwise shift is not taken into account for implicit type conversion.
Each operation documents this separately. For example, n1548 §6.5.5 "Multiplicative operators" ¶3
The usual arithmetic conversions are performed on the operands.
This phrase is omitted from §6.5.7 "Bitwise shift operators". Instead it says:
The integer promotions are performed on each of the operands. The type of the result is the type of the promoted left operand. …
Since the section on bitwise shift operators says nothing about "usual arithmetic conversions", that conversion does not happen.
The "usual arithmetic conversions" include conversions to floating point types and how to determine a common type for two operands.
Bitshifts do not operate on floating point types, only integers (a constraint, 6.5.7p2). Unlike other binary operators that take only integers (e.g. bitwise AND), the two operands are not directly combined to form the result; there is no requirement for a common type for the operation. Thus each operand is promoted independently (from your citation: "The integer promotions are performed on each of the operands").
Reading the whole paragraph 6.5.7p3 makes it clear:
The integer promotions are performed on each of the operands. The type of the result is that of the promoted left operand. If the value of the right operand is negative or is greater than or equal to the width of the promoted left operand, the behavior is undefined.
Note the second sentence, which makes clear that the result type is determined solely by the left operand. From that and the last sentence it follows that int or unsigned int for the right operand is more than sufficient for all current implementations: the minimum required value of INT_MAX is 32767, far more than the number of bits in any standard type of any implementation, even if we consider future wider integers with 1024 or more bits.
Also note that your code invokes undefined behaviour, unless your platform has an unsigned int with at least 64 bits (last sentence).
The compiler correctly warns about your code invoking undefined behaviour. The warning is not required, but you should be glad it does. Treat it seriously! Any behaviour of the program is correct if you invoke undefined behaviour.
If you want a 64-bit type, use uint64_t. For a constant, use the macro UINT64_C(1), which expands to an integer constant of at least 64 bits (of type uint_least64_t). Both are provided by stdint.h.
About your other question:
From 6.3.1.1p2:
If an int can represent all values of the original type (as restricted by the width, for a bit-field), the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions. All other types are unchanged by the integer promotions.
I can't seem to find the relevant parts in the C standard fully defining the behavior of the unary minus operator with unsigned operands.
The 2003 C++ standard (yes, C++, bear with me for a few lines) says in 5.3.1c7: The negative of an unsigned quantity is computed by subtracting its value from 2^n, where n is the number of bits in the promoted operand.
The 1999 C standard, however, includes no such explicit statement and doesn't clearly define the unary - behavior in either 6.5.3.3c1,3 or 6.5c4. The latter says Some operators (the unary operator ~, and the binary operators <<, >>, &, ^, and |, ...) ... yield values that depend on the internal representations of integers, and have implementation-defined and undefined aspects for signed types., which excludes the unary minus, so things seem to remain vague.
This earlier question refers to the K&R ANSI C book, section A.7.4.5 that says The negative of an unsigned quantity is computed by subtracting the promoted value from the largest value of the promoted type and adding one.
What would be the 1999 C standard equivalent to the above quote from the book?
6.2.5c9 says: A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type.
Is that it? Or is there something else I'm missing?
Yes, 6.2.5c9 is exactly the paragraph that you looked for.
The behavior of the unary minus operator on unsigned operands has nothing to do with whether a machine uses two's-complement arithmetic for signed numbers. Instead, given unsigned int x, y; the statement y = -x; will cause y to receive whatever value it must hold to make x + y equal zero. If x is zero, y will likewise be zero. For any other value of x, y will be UINT_MAX-x+1, in which case the arithmetic value of x+y will be UINT_MAX+1, which, when assigned to an unsigned integer, will have UINT_MAX+1 subtracted from it, yielding zero.
In every implementation I know of, a negative is calculated as two's complement...
int a = 12;
int b = -a;
int c = ~a + 1;
assert(b == c);
...so there is really no physical difference between negative signed and "negative" unsigned integers - the only difference is in how they are interpreted.
So in this example...
unsigned a = 12;
unsigned b = -a;
int c = -a;
...the b and c are going to contain the exact same bits. The only difference is that b is interpreted as 2^32-12 (or 2^64-12), while c is interpreted as "normal" -12.
So a negative is calculated in exactly the same way regardless of signedness, and casting between unsigned and signed is a no-op at the bit level (it can never cause an overflow in the sense that bits would need to be cut off).
This is late, but anyway...
C states (in a rather roundabout way, as mentioned in other answers already) that

- any unsigned type is a binary representation with a type-specific number of bits;
- all arithmetic operations on unsigned types are done modulo 2^N, 'mod' being the mathematical modulus and 'N' the number of bits used to represent the type.
The unary minus operator applied to an unsigned type behaves as if the value had been promoted to the next bigger signed type, then negated, and then converted back to unsigned and truncated to the source type. (This is a slight simplification, because integer promotion happens for all types that have fewer bits than int, but it comes close enough, I think.)
Some compilers do indeed give warnings when applying the unary minus to an unsigned type, but this is merely for the benefit of the programmer. IMHO the construct is well-defined and portable.
But if in doubt, just don't use the unary minus: write '0u - x' instead of '-x', and everything will be fine. Any decent code generator will create just a negate instruction from this, unless optimization is fully disabled.