using bit shifting variables inside if statement - error or not - c

Suppose we have some variables x and y, and the following if statement which involves bit shifting:
if (x<<y)
I've read some posts which also deal with the issue of using bit shifting with variables (of some type) and inside if statement, but unfortunately I haven't been able to reach a unequivocal conclusion whether it is an error or not.
I assume that if it is an error, then it's a semantic error or a run-time error .
But is it necessarily en error ?

If x is of an unsigned integer type that is at least as large as unsigned int, and y is less than the number of bits in x's type, then the above partial statement will test whether bits in x that aren't in the top y are set. The C89 Standard would require that implementations behave likewise if x is of a signed type or a small unsigned type, with the caveat that setting the top bit of a small signed type is regarded as setting all bits beyond. The C99 and later standards, however, wouldn't require that implementations usefully process any situation in which x is non-zero but the expression x<<y would yield zero, unless x is an unsigned integer type at least as large as unsigned int.

It's not a syntactic error. if expects a parenthesized expression. (int_x<<int_y) satisfies that. The shift expression may cause a runtime error, but only if the particular values of int_x and int_y invoke undefined behavior (see 6.5.7 for when that might happen).

Related

Bitwise operation results in unexpected variable size

Context
We are porting C code that was originally compiled using an 8-bit C compiler for the PIC microcontroller. A common idiom that was used in order to prevent unsigned global variables (for example, error counters) from rolling over back to zero is the following:
if(~counter) counter++;
The bitwise operator here inverts all the bits and the statement is only true if counter is less than the maximum value. Importantly, this works regardless of the variable size.
Problem
We are now targeting a 32-bit ARM processor using GCC. We've noticed that the same code produces different results. So far as we can tell, it looks like the bitwise complement operation returns a value that is a different size than we would expect. To reproduce this, we compile, in GCC:
uint8_t i = 0;
int sz;
sz = sizeof(i);
printf("Size of variable: %d\n", sz); // Size of variable: 1
sz = sizeof(~i);
printf("Size of result: %d\n", sz); // Size of result: 4
In the first line of output, we get what we would expect: i is 1 byte. However, the bitwise complement of i is actually four bytes which causes a problem because comparisons with this now will not give the expected results. For example, if doing (where i is a properly-initialized uint8_t):
if(~i) i++;
we will see i "wrap around" from 0xFF back to 0x00. This behaviour is different in GCC compared with when it used to work as we intended in the previous compiler and 8-bit PIC microcontroller.
We are aware that we can resolve this by casting like so:
if((uint8_t)~i) i++;
or, by
if(i < 0xFF) i++;
however in both of these workarounds, the size of the variable must be known and is error-prone for the software developer. These kinds of upper bounds checks occur throughout the codebase. There are multiple sizes of variables (eg., uint16_t and unsigned char etc.) and changing these in an otherwise working codebase is not something we're looking forward to.
Question
Is our understanding of the problem correct, and are there options available to resolving this that do not require re-visiting each case where we've used this idiom? Is our assumption correct, that an operation like bitwise complement should return a result that is the same size as the operand? It seems like this would break, depending on processor architectures. I feel like I'm taking crazy pills and that C should be a bit more portable than this. Again, our understanding of this could be wrong.
On the surface this might not seem like a huge issue but this previously-working idiom is used in hundreds of locations and we're eager to understand this before proceeding with expensive changes.
Note: There is a seemingly similar but not exact duplicate question here: Bitwise operation on char gives 32 bit result
I didn't see the actual crux of the issue discussed there, namely, the result size of a bitwise complement being different than what's passed into the operator.
What you are seeing is the result of integer promotions. In most cases where an integer value is used in an expression, if the type of the value is smaller than int the value is promoted to int. This is documented in section 6.3.1.1p2 of the C standard:
The following may be used in an expression wherever an intor
unsigned int may be used
An object or expression with an integer type (other than intor unsigned int) whose integer conversion rank is less
than or equal to the rank of int and unsigned int.
A bit-field of type _Bool, int ,signed int, orunsigned int`.
If an int can represent all values of the original type (as
restricted by the width, for a bit-field), the value is
converted to an int; otherwise, it is converted to an
unsigned int. These are called the integer promotions. All
other types are unchanged by the integer promotions.
So if a variable has type uint8_t and the value 255, using any operator other than a cast or assignment on it will first convert it to type int with the value 255 before performing the operation. This is why sizeof(~i) gives you 4 instead of 1.
Section 6.5.3.3 describes that integer promotions apply to the ~ operator:
The result of the ~ operator is the bitwise complement of its
(promoted) operand (that is, each bit in the result is set if and only
if the corresponding bit in the converted operand is not set). The
integer promotions are performed on the operand, and the
result has the promoted type. If the promoted type is an unsigned
type, the expression ~E is equivalent to the maximum value
representable in that type minus E.
So assuming a 32 bit int, if counter has the 8 bit value 0xff it is converted to the 32 bit value 0x000000ff, and applying ~ to it gives you 0xffffff00.
Probably the simplest way to handle this is without having to know the type is to check if the value is 0 after incrementing, and if so decrement it.
if (!++counter) counter--;
The wraparound of unsigned integers works in both directions, so decrementing a value of 0 gives you the largest positive value.
in sizeof(i); you request the size of the variable i, so 1
in sizeof(~i); you request the size of the type of the expression, which is an int, in your case 4
To use
if(~i)
to know if i does not value 255 (in your case with an the uint8_t) is not very readable, just do
if (i != 255)
and you will have a portable and readable code
There are multiple sizes of variables (eg., uint16_t and unsigned char etc.)
To manage any size of unsigned :
if (i != (((uintmax_t) 2 << (sizeof(i)*CHAR_BIT-1)) - 1))
The expression is constant, so computed at compile time.
#include <limits.h> for CHAR_BIT and #include <stdint.h> for uintmax_t
Here are several options for implementing “Add 1 to x but clamp at the maximum representable value,” given that x is some unsigned integer type:
Add one if and only if x is less than the maximum value representable in its type:
x += x < Maximum(x);
See the following item for the definition of Maximum. This method
stands a good chance of being optimized by a compiler to efficient
instructions such as a compare, some form of conditional set or move,
and an add.
Compare to the largest value of the type:
if (x < ((uintmax_t) 2u << sizeof x * CHAR_BIT - 1) - 1) ++x
(This calculates 2N, where N is the number of bits in x, by shifting 2 by N−1 bits. We do this instead of shifting 1 N bits because a shift by the number of bits in a type is not defined by the C standard. The CHAR_BIT macro may be unfamiliar to some; it is the number of bits in a byte, so sizeof x * CHAR_BIT is the number of bits in the type of x.)
This can be wrapped in a macro as desired for aesthetics and clarity:
#define Maximum(x) (((uintmax_t) 2u << sizeof (x) * CHAR_BIT - 1) - 1)
if (x < Maximum(x)) ++x;
Increment x and correct if it wraps to zero, using an if:
if (!++x) --x; // !++x is true if ++x wraps to zero.
Increment x and correct if it wraps to zero, using an expression:
++x; x -= !x;
This is is nominally branchless (sometimes beneficial for performance), but a compiler may implement it the same as above, using a branch if needed but possibly with unconditional instructions if the target architecture has suitable instructions.
A branchless option, using the above macro, is:
x += 1 - x/Maximum(x);
If x is the maximum of its type, this evaluates to x += 1-1. Otherwise, it is x += 1-0. However, division is somewhat slow on many architectures. A compiler may optimize this to instructions without division, depending on the compiler and the target architecture.
Before stdint.h the variable sizes can vary from compiler to compiler and the actual variable types in C are still int, long, etc and are still defined by the compiler author as to their size. Not some standard nor target specific assumptions. The author(s) then need to create stdint.h to map the two worlds, that is the purpose of stdint.h to map the uint_this that to int, long, short.
If you are porting code from another compiler and it uses char, short, int, long then you have to go through each type and do the port yourself, there is no way around it. And either you end up with the right size for the variable, the declaration changes but the code as written works....
if(~counter) counter++;
or...supply the mask or typecast directly
if((~counter)&0xFF) counter++;
if((uint_8)(~counter)) counter++;
At the end of the day if you want this code to work you have to port it to the new platform. Your choice as to how. Yes, you have to spend the time hit each case and do it right, otherwise you are going to keep coming back to this code which is even more expensive.
If you isolate the variable types on the code before porting and what size the variable types are, then isolate the variables that do this (should be easy to grep) and change their declarations using stdint.h definitions which hopefully won't change in the future, and you would be surprised but the wrong headers are used sometimes so even put checks in so you can sleep better at night
if(sizeof(uint_8)!=1) return(FAIL);
And while that style of coding works (if(~counter) counter++;), for portability desires now and in the future it is best to use a mask to specifically limit the size (and not rely on the declaration), do this when the code is written in the first place or just finish the port and then you won't have to re-port it again some other day. Or to make the code more readable then do the if x<0xFF then or x!=0xFF or something like that then the compiler can optimize it into the same code it would for any of these solutions, just makes it more readable and less risky...
Depends on how important the product is or how many times you want send out patches/updates or roll a truck or walk to the lab to fix the thing as to whether you try to find a quick solution or just touch the affected lines of code. if it is only a hundred or few that is not that big of a port.
6.5.3.3 Unary arithmetic operators
...
4 The result of the ~ operator is the bitwise complement of its (promoted) operand (that is,
each bit in the result is set if and only if the corresponding bit in the converted operand is
not set). The integer promotions are performed on the operand, and the result has the
promoted type. If the promoted type is an unsigned type, the expression ~E is equivalent
to the maximum value representable in that type minus E.
C 2011 Online Draft
The issue is that the operand of ~ is being promoted to int before the operator is applied.
Unfortunately, I don't think there's an easy way out of this. Writing
if ( counter + 1 ) counter++;
won't help because promotions apply there as well. The only thing I can suggest is creating some symbolic constants for the maximum value you want that object to represent and testing against that:
#define MAX_COUNTER 255
...
if ( counter < MAX_COUNTER-1 ) counter++;

MISRA warning 12.4: integer conversion resulted in truncation (negation operation)

In a huge macro I have in a program aimed for a 16-bit processor, the following code (simplified) appears several times:
typedef unsigned short int uint16_t;
uint16_t var;
var = ~0xFFFF;
MISRA complains with the warning 12.4: integer conversion resulted in truncation. The tool used to get this is Coverity.
I have checked the forum but I really need a solution (instead of changing the negation by the actual value) as this line is inside a macro with varying parameters.
I have tried many things and here is the final attempt which fails also:
var = (uint16_t)((~(uint16_t)(0xFFFFu))&(uint16_t)0xFFFFu);
(the value 0xFFFF is just an example. In the actual code, the value is a variable which can take whatever value (but 16 bits))
Do you have any other idea please? Thanks.
EDIT:
I have tried then to use 32bits value and the result is the same with the following code:
typedef unsigned int uint32_t;
uint32_t var;
var = (uint32_t)(~(uint32_t)(0xFFFF0000u));
Summary:
Assuming you are using a static analyser for MISRA-C:2012, you should have gotten warnings for violations against rule 10.3 and 7.2.
Rule 12.4 is only concerned with wrap-around of unsigned integer constants, which can only occur with the binary + and - operators. It seems irrelevant here.
The warning text doesn't seem to make sense for neither MISRA-C:2004 12.4 nor MISRA-C:2012 12.4. Possibly, the tool is displaying the wrong warning.
There is however a MISRA:2012 rule 10.3 that forbids to assign a value to a variable that is of a smaller type than intended in the expression.
To use MISRA terms, the essential type of ~0xFFFF is unsigned, because the hex literal is of type unsigned int. On your system, unsigned int is apparently larger than uint16_t (int is a "greater ranked" integer type than short in the standard 6.3.1.1, even if they are of the same size). That is, uint16_t is of a narrower essential type than unsigned int, so your code does not conform to rule 10.3. This is what your tool should have reported.
The actual technical issue, which is hidden behind the MISRA terms, is that the ~ operator is dangerous because it comes with an implicit integer promotion. Which in turn causes code like for example
uint8_t x=0xFF;
~x << n; // BAD, always a bug
to invoke undefined behavior when the value 0xFFFFFF00 is left shifted.
It is therefore always good practice to cast the result of the ~ operator to the correct, intended type. There was even an explicit rule about this in MISRA 2004, which has now merged into the "essential type" rules.
In addition, MISRA (7.2) states that all integer constants should have an u or U suffix.
MISRA-C:2012 compliant code would look like this:
uint16_t var;
var = (uint16_t)~0xFFFFu;
or overly pedantic:
var = (uint16_t)~(uint16_t)0xFFFFu;
When the compiler looks at the right side, first it sees the literal 0xFFFF. It is automatically promoted to an integer which is (obvious from the warning) 32-bit in your system. Now we can imagine that value as 0x0000FFFF (whole 32-bit). When the compiler does the ~ operation on it, it becomes 0xFFFF0000 (whole 32-bit). When you write var = ~0xFFFF; the compiler in fact sees var = 0xFFFF0000; just before the assign operation. And of course a truncation happens during this assignment...

Initializing bit-fields

When you write
struct {
unsigned a:3, b:2;
} x = {10, 11};
is x.b guaranteed to be 3 by ANSI C (C89)? I have read and reread the standard, but can't seem to find exactly that case.
For example, "result that cannot be represented by the
resulting unsigned integer type is reduced modulo the number that is
one greater than the largest value that can be represented by the
resulting unsigned integer type." speaks about computation, not about initialization. And moreover, bit-field is not really a type.
Also, (when speaking about unsigned t:4) "contains values in the range [0,15]", but it doesn't necessarily mean that initializer must be reduced modulo 16 to be mapped to [0,15].
Struct initialization is really painstakingly detailedly described, but I really can't seem to find exactly that behavior. (Of course compilers do exactly that. And IBM documentation says " when you assign a value that is out of range to a bit field, the low-order bit pattern is preserved and the appropriate bits are assigned.", but I'd like to know if ANSI C standardizes that.
"ANSI C"/C89 has been obsolete for 25 years. Therefore, my answer cites the current C standard ISO 9899:2011, also known as C11.
Pretty much everything related to bit-fields in the C standard is poorly defined. Typically, you will not find anything explicitly addressing the behavior of bit fields, but their behavior is rather specified implicitly, "between the lines". This is why you should avoid using bit fields.
However, I believe that this specific case is well-defined: it should work like any other integer initialization.
The detailed struct initialization rules you mention (6.7.9) show how the literal 11 in the initializer list is related to the variable b. Nothing strange with that. What then applies is "simple assignment", the same thing that would happen as if you wrote x.b = 11;.
When doing any kind of assignment or initialization in C, the right operand is converted to the type of the left operand. This is specified by C11 6.5.16:
In simple assignment (=), the value of the right operand is converted
to the type of the assignment expression and replaces the value stored
in the object designated by the left operand.
In your case, the literal 11 of type int is converted to a bit field of unsigned int:2.
Therefore, the rule you are looking for should be found in the chapter dealing with conversions (C11 6.3). What applies is what you already cited in your question, C11 6.3.1.3:
...if the new type is unsigned, the value is converted by repeatedly
adding or subtracting one more than the maximum value that can be
represented in the new type until the value is in the range of the new
type.
The maximum value of an unsigned int:2 is 3. One more than the maximum value is 3+1=4. The compiler should repeatedly subtract this from the value 11:
11 - (3+1) = 7 does not fit, subtract once more:
7 - (3+1) = 3 does fit, store value 3
But then of course, this is the very same thing as taking the 2 least significant bits of the decimal value 11 and storing them in the bit field.
WRT "speaks about computation, not about initialization", the C89 standard explicitly applies the rules of assignment and conversion to initialization. It also says:
A bit-field is interpreted as an integral type consisting of the specified number of bits.
Given those, while a compiler warning would clearly be in order, it seems that throwing away upper-order bits is guaranteed by the standard.

Signed integers' undefined behavior and Apple Secure Coding Guide

Apple Secure Coding Guide says the following (page 27):
Also, any bits that overflow past the length of an integer variable (whether signed or unsigned) are dropped.
However, regards to signed integer overflow C standard (89) says:
An example of undefined behavior is the behavior on integer overflow.
and
If an exception occurs during the evaluation of an expression (that is, if the result is not mathematically defined or not representable), the behavior is undefined.
Is the Coding Guide wrong? Is there something here that I don't get? I am not convinced myself that Apple Secure Coding Guide could get this wrong.
Here is a second opinion, from a static analyzer described as detecting undefined behavior:
int x;
int main(){
x = 0x7fffffff + 1;
}
The analyzer is run so:
$ frama-c -val -machdep x86_32 t.c
And it produces:
[kernel] preprocessing with "gcc -C -E -I. t.c"
[value] Analyzing a complete application starting at main
...
t.c:4:[kernel] warning: signed overflow. assert 0x7fffffff+1 ≤ 2147483647;
...
[value] Values at end of function main:
NON TERMINATING FUNCTION
This means that the program t.c contains undefined behavior, and that no execution of it ever terminates without causing undefined behavior.
Let's take this example:
1 << 32
If we assume 32-bit int, C clearly says it is undefined behavior. Period.
But any implementation can define this undefined behavior.
gcc for example says (while not very explicit in defining the behavior):
GCC does not use the latitude given in C99 only to treat certain aspects of signed '<<' as undefined, but this is subject to change.
http://gcc.gnu.org/onlinedocs/gcc/Integers-implementation.html
I don't know for clang but I suspect that as for gcc, the evaluation of an expression like 1 << 32 would give no surprise (that is, evaluate to 0).
But even if it is defined on implementations running in Apple operating systems, a portable program should not make use of expressions that invoke undefined behavior in the C language.
EDIT: I thought the Apple sentence was dealing only with bitwise << operator. It looks like it's more general and in that case for C language, they are utterly wrong.
The two statements are not mutually incompatible.
The standard does not define what behaviour each implementation is required to provide (so different implementations can do different things and still be standard conformant).
Apple is allowed to define the behaviour of its implementation.
You as a programmer would be well advised to treat the behaviour as undefined since your code may need to be moved to other platforms where the behaviour is different, and perhaps because Apple could, in theory, change its mind in the future and still conform to the standard.
Consider the code
void test(int mode)
{
int32_t a = 0x12345678;
int32_t b = mode ? a*0x10000 : a*0x10000LL;
return b;
}
If this method is invoked with a mode value of zero, the code will compute the long long value 0x0000123456780000 and store it into a. The behavior of this is fully defined by the C standard: if bit 31 of the result is clear, it will lop off all but the bottom 32 bits and store the resulting (positive) integer into a. If bit 31 were set and the result were being stored to a 32-bit int rather than a variable of type int32_t, the implementation would have some latitude, but implementations are only allowed to define int32_t if they would perform such narrowing conversions according to the rules of two's-complement math.
If this method were invoked with a non-zero mode value, then the numerical computation would yield a result outside the range of the temporary expression value, and as such would cause Undefined Behavior. While the rules dictate what should happen if a calculation performed on a longer type is stored into a shorter one, they do not indicate what should happen if calculations don't fit in the type with which they are performed. A rather nasty gap in the standard (which should IMHO be plugged) occurs with:
uint16_t multiply(uint16_t x, uint16_t y)
{
return x*y;
}
For all combinations of x and y values where the Standard says anything about what this function should do, the Standard requires that it compute and return the product mod 65536. If the Standard were to mandate that for all combinations of x and y values 0-65535 this method must return the arithmetical value of (x*y) mod 65536, it would be mandating behavior with which 99.99% of standards-compliant compilers would already be in conformance. Unfortunately, on machines where int is 32 bits, the Standard presently imposes no requirements with regard to this function's behavior in cases where the arithmetical product would be larger than 2147483647. Even though any portion of the intermediate result beyond the bottom 16 bits will ignored, the code will try to evaluate the result using a 32-bit signed integer type; the Standard imposes no requirements on what should happen if a compiler recognizes that the product will overflow that type.

How is {int i=999; char c=i;} different from {char c=999;}?

My friend says he read it on some page on SO that they are different,but how could the two be possibly different?
Case 1
int i=999;
char c=i;
Case 2
char c=999;
In first case,we are initializing the integer i to 999,then initializing c with i,which is in fact 999.In the second case, we initialize c directly with 999.The truncation and loss of information aside, how on earth are these two cases different?
EDIT
Here's the link that I was talking of
why no overflow warning when converting int to char
One member commenting there says --It's not the same thing. The first is an assignment, the second is an initialization
So isn't it a lot more than only a question of optimization by the compiler?
They have the same semantics.
The constant 999 is of type int.
int i=999;
char c=i;
i created as an object of type int and initialized with the int value 999, with the obvious semantics.
c is created as an object of type char, and initialized with the value of i, which happens to be 999. That value is implicitly converted from int to char.
The signedness of plain char is implementation-defined.
If plain char is an unsigned type, the result of the conversion is well defined. The value is reduced modulo CHAR_MAX+1. For a typical implementation with 8-bit bytes (CHAR_BIT==8), CHAR_MAX+1 will be 256, and the value stored will be 999 % 256, or 231.
If plain char is a signed type, and 999 exceeds CHAR_MAX, the conversion yields an implementation-defined result (or, starting with C99, raises an implementation-defined signal, but I know of no implementations that do that). Typically, for a 2's-complement system with CHAR_BIT==8, the result will be -25.
char c=999;
c is created as an object of type char. Its initial value is the int value 999 converted to char -- by exactly the same rules I described above.
If CHAR_MAX >= 999 (which can happen only if CHAR_BIT, the number of bits in a byte, is at least 10), then the conversion is trivial. There are C implementations for DSPs (digital signal processors) with CHAR_BIT set to, for example, 32. It's not something you're likely to run across on most systems.
You may be more likely to get a warning in the second case, since it's converting a constant expression; in the first case, the compiler might not keep track of the expected value of i. But a sufficiently clever compiler could warn about both, and a sufficiently naive (but still fully conforming) compiler could warn about neither.
As I said above, the result of converting a value to a signed type, when the source value doesn't fit in the target type, is implementation-defined. I suppose it's conceivable that an implementation could define different rules for constant and non-constant expressions. That would be a perverse choice, though; I'm not sure even the DS9K does that.
As for the referenced comment "The first is an assignment, the second is an initialization", that's incorrect. Both are initializations; there is no assignment in either code snippet. There is a difference in that one is an initialization with a constant value, and the other is not. Which implies, incidentally, that the second snippet could appear at file scope, outside any function, while the first could not.
Any optimizing compiler will just make the int i = 999 local variable disappear and assign the truncated value directly to c in both cases. (Assuming that you are not using i anywhere else)
It depends on your compiler and optimization settings. Take a look at the actual assembly listing to see how different they are. For GCC and reasonable optimizations, the two blocks of code are probably equivalent.
Aside from the fact that the first also defines an object iof type int, the semantics are identical.
i,which is in fact 999
No, i is a variable. Semantically, it doesn't have a value at the point of the initialization of c ... the value won't be known until runtime (even though we can clearly see what it will be, and so can an optimizing compiler). But in case 2 you're assigning 999 to a char, which doesn't fit, so the compiler issues a warning.

Resources