Inconsistent left logical shift behavior [duplicate] - c

This question already has answers here:
What's bad about shifting a 32-bit variable 32 bits?
(5 answers)
Closed 7 years ago.
I am developing a simple C app on a CentOS Linux machine my university owns, and I am getting very strange, inconsistent behavior with the << operator.
Basically, I am attempting to shift 0xffffffff left by shiftNum bits, where shiftNum is derived from a variable n:
int shiftNum = (32 + (~n + 1));
int shiftedBits = (0xffffffff << shiftNum);
This has the effect of shifting 0xffffffff left 32 - n times and works as expected. However, when n = 0 and shiftNum = 32, I get some very strange behaviour: instead of the expected 0x00000000 I get 0xffffffff.
For example, this code:
int n = 0;
int shiftNum = (32 + (~n + 1));
int shiftedBits = (0xffffffff << shiftNum );
printf("n: %d\n",n);
printf("shiftNum: 0x%08x\n",shiftNum);
printf("shiftedBits: 0x%08x\n",shiftedBits);
int thirtyTwo = 32;
printf("ThirtyTwo: 0x%08x\n",thirtyTwo);
printf("Test: 0x%08x\n", (0xffffffff << thirtyTwo));
Outputs:
n: 0
shiftNum: 0x00000020
shiftedBits: 0xffffffff
ThirtyTwo: 0x00000020
Test: 0x00000000
Honestly, I have no idea what is going on; I suspect some crazy low-level detail. Even more strangely, the operation (0xffffffff << (shiftNum - 1)) << 1 outputs 0x00000000.
Does anyone have any clue what's going on?

If you invoke undefined behaviour, the results are unspecified and anything is valid.
When n is 0, 32 + (~n + 1) is 32 (on a two's complement CPU). If sizeof(shiftNum) == 4 (or sizeof(shiftNum) * CHAR_BIT == 32, which usually has the same result), then you are only allowed to shift by values 0..31; anything else is undefined behaviour.
ISO/IEC 9899:2011 §6.5.7 Bitwise shift operators:
If the value of the right operand is negative or is
greater than or equal to the width of the promoted left operand, the behavior is undefined.
The result, therefore, is correct, even if you get a different answer each time you run the code, or recompile the program, or anything else.
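One defined-behavior workaround, in line with the questioner's own observation, is to split the shift into two steps so that no single shift count ever reaches 32. A minimal sketch (not from the original thread), assuming a 32-bit unsigned int and shiftNum in the range 1..32:
#include <stdio.h>
int main(void)
{
    for (int n = 0; n <= 4; n++) {
        int shiftNum = 32 - n;
        /* Shift by (shiftNum - 1) and then by 1 more: each count stays in 0..31,
           so the behavior is defined even when shiftNum == 32. */
        unsigned int shiftedBits = (0xffffffffu << (shiftNum - 1)) << 1;
        printf("n: %d shiftedBits: 0x%08x\n", n, shiftedBits);
    }
    return 0;
}
For n = 0 this now prints 0x00000000 as expected; an alternative is an explicit check such as shiftNum >= 32 ? 0 : 0xffffffffu << shiftNum.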


Why does gcc not add tmin + tmin correctly? [duplicate]

This question already has an answer here:
Where in the C99 standard does it say that signed integer overflow is undefined behavior?
(1 answer)
Closed 3 years ago.
I've been playing around with bitwise operations and two's complement, when I discovered this oddity.
#include <stdio.h>
int main ()
{
int tmin = 0x80000000;
printf("tmin + tmin: 0x%x\n", tmin + tmin);
printf("!(tmin + tmin): 0x%x\n", !(tmin + tmin));
}
The code above results in the following output
tmin + tmin: 0x0
!(tmin + tmin): 0x0
Why does this happen?
0x80000000 in binary is
0b10000000000000000000000000000000
When you add two 0x80000000s together,
|<- 32bits ->|
0b10000000000000000000000000000000
+ 0b10000000000000000000000000000000
------------------------------------
0b100000000000000000000000000000000
|<- 32bits ->|
However, int on your machine seems to be 32 bits wide, so only the lower 32 bits are preserved, which means the leading 1 of your result is silently discarded. This is called integer overflow.
Also note that in C, signed (as opposed to unsigned, i.e. unsigned int) integer overflow is actually undefined behavior, which is why !(tmin + tmin) gives 0x0 instead of 0x1. See this blog post for an example where a variable is both true and false due to another kind of undefined behavior, namely an uninitialized variable.
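For contrast, here is a minimal sketch (not from the answer) showing that the same addition is well defined on unsigned operands, where the wraparound happens modulo 2^32 on a machine with 32-bit unsigned int, so the ! test then behaves as expected:
#include <stdio.h>
int main(void)
{
    unsigned int tmin = 0x80000000u;
    /* Unsigned addition wraps around by definition, so this is exactly 0. */
    printf("tmin + tmin: 0x%x\n", tmin + tmin);
    /* ...and the logical NOT of 0 is reliably 1. */
    printf("!(tmin + tmin): 0x%x\n", !(tmin + tmin));
    return 0;
}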

Is this bit operator code a side effect (term used in the K&R C book) or a machine-dependent processing instruction?

Here are two pieces of code that appear to do the same thing, but they do not. Running the two and comparing the traced output is confusing, because it appears that the processing in the 1st code is machine dependent.
Please read the two codes.
Code 1:
unsigned char c=(((~0 << 3) >> 4) << 1);
printf("%d", c);
Output: 254
Code 2:
unsigned char c=(~0 << 3);
c >>= 4;
c <<= 1;
printf("%d", c);
Output: 30
The output of the two code samples is different.
It is not only this code (the 1st one) that is confusing: any code that applies several bitwise shift operators in a single expression gives me unexpected results, while the 2nd code behaves correctly.
Please run this code on your machine and verify the output above,
AND / OR
explain why the two outputs are not the same,
OR
tell me whether the lesson is simply that we should not apply multiple bitwise shift operators in one expression.
Thanks
~0 << 3 is always a bug; neither example is correct.
0 is of type int which is signed.
~0 will convert the binary contents to all ones: 0xFF...FF.
When you left shift data into the sign bit of a signed integer, you invoke undefined behavior. Same thing if you left shift a negative integer.
Conclusion: neither example has deterministic output and both can crash or print garbage.
First, ~0 << 3 invokes undefined behavior because ~0 is a signed integer value with all bits set to 1 and you subsequently left shift into the sign bit.
Changing this to ~0u << 3 prevents UB but prints the same result, so the question is why.
So first we have this:
~0u
Which has type unsigned int. This is at least 16 bits wide, so (illustrating with a 16-bit unsigned int) the value is:
0xffff
Then this:
~0u << 3
Gives you:
0xfff8
Then this:
((~0u << 3) >> 4)
Gives you:
0x0fff
And this:
(((~0u << 3) >> 4) << 1)
Gives you:
0x1ffe
Assigning this value to an unsigned char effectively trims it down to the low-order byte:
0xfe
So it prints 254.
Now in the second case you start with this:
unsigned char c = (~0 << 3);
From above, this assigns 0xfff8 to c which gets truncated to 0xf8. Then >> 4 gives you 0x0f and << 1 gives you 0x1e which is 30.
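To make the difference concrete, here is a small sketch (not part of the original answers) that prints every intermediate value, using ~0u to avoid the undefined behavior discussed above; the first chain stays at full unsigned int width until the final store, while the second truncates to unsigned char after every step:
#include <stdio.h>
int main(void)
{
    /* Variant 1: the whole expression is evaluated at unsigned int width. */
    unsigned int step1 = ~0u << 3;   /* 0xfffffff8 with a 32-bit unsigned int */
    unsigned int step2 = step1 >> 4; /* 0x0fffffff */
    unsigned int step3 = step2 << 1; /* 0x1ffffffe */
    unsigned char c1 = step3;        /* low byte 0xfe, i.e. 254 */

    /* Variant 2: the value is truncated to unsigned char between each shift. */
    unsigned char c2 = ~0u << 3;     /* 0xf8 */
    c2 >>= 4;                        /* 0x0f */
    c2 <<= 1;                        /* 0x1e, i.e. 30 */

    printf("steps: 0x%08x 0x%08x 0x%08x\n", step1, step2, step3);
    printf("c1 = %d, c2 = %d\n", c1, c2);
    return 0;
}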
I compiled (with x86-64 gcc 9.1) these two lines:
int main() {
unsigned char e=(~0 << 1);
unsigned char d=(((~0 << 3) >> 4) << 1);
}
And I got the following assembly output:
main:
push rbp
mov rbp, rsp
mov BYTE PTR [rbp-1], -2
mov BYTE PTR [rbp-2], -2
mov eax, 0
pop rbp
ret
As you can see, both assignments compile down to storing the same constant -2 (i.e. 0xfe): mov BYTE PTR [rbp-1], -2 and mov BYTE PTR [rbp-2], -2. So it seems the compiler constant-folds your first expression at compile time.
Thanks to Thomas Jager for his answer (given in a comment on the question).
The solution is simple.
In the 1st code, the bit manipulation is performed on a signed int operand.
Because of this, the two's complement bit pattern keeps changing as the shifts are applied, and the right shift sign-extends. Only after all the shifts is the result converted and assigned to the unsigned variable c by keeping the low-order byte. Hence the result is finally 254.
The question asks to explain why the two outputs are different.
We already know that the 2nd code works correctly.
Hence I explain only why the 1st code behaves the way it does.
1st code:
unsigned char c=(((~0 << 3) >> 4) << 1);
printf("%d", c);
The tracing of the 1st code (showing only the low 8 bits of the int value, with its sign bit noted) is as follows:
Step 1: ~0 -----> -1 ----(binary form)----> 11111111 (sign bit 1, i.e. negative)
Step 2: (sign bit 1) 11111111 << 3 ----shifting left----> (sign bit 1) 11111000
Step 3 ***: (sign bit 1) 11111000 >> 4 ----shifting right----> (sign bit 1) 11111111
*** The leftmost bits of the result are 1 because of sign extension:
    the sign bit is 1, so the arithmetic right shift fills the vacated
    leftmost bits with 1s instead of 0s while shifting right.
Step 4: (sign bit 1) 11111111 << 1 ----shifting left----> (sign bit 1) 11111110
Step 5: the two's complement value is assigned to the unsigned char c,
        which keeps only its low 8 bits: 11111110.
Step 6: Result: 11111110 ---decimal equivalent---> 254
I have only expanded on his answer.
Thanks to everyone for the effort put into answering this question.
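The crucial step above is the arithmetic right shift in step 3. A tiny sketch (not from the thread) that isolates it; note that right-shifting a negative signed value is implementation-defined in C, and the sign-extending behavior shown is merely what common compilers do:
#include <stdio.h>
int main(void)
{
    int s = -8;                       /* bit pattern ...11111000 in two's complement */
    unsigned int u = (unsigned int)s; /* same bit pattern, but no sign */

    /* Signed right shift of a negative value: commonly arithmetic, copies the sign bit. */
    printf("s >> 4 = %d (0x%08x)\n", s >> 4, (unsigned int)(s >> 4));

    /* Unsigned right shift: always fills with zero bits. */
    printf("u >> 4 = 0x%08x\n", u >> 4);
    return 0;
}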

Shifting the same constant by another yields two different answers? [duplicate]

This question already has answers here:
Unexpected output when executing left-shift by 32 bits
(2 answers)
undefined behavior when left operand is negative
(3 answers)
Closed 8 years ago.
I'm debugging some code and came across some behavior I cannot explain.
I am trying to shift the number -1 to the left 32 times to produce a zero in this particular case.
int n = 0;
int negOne = ~0;
int negativeN = ( (~n) + 1 );
int toShift = (32 + negativeN); //32 - n
/*HELP!!! These produce two different answers*/
printf("%d << %d = %d \n",negOne, toShift, negOne << toShift);
printf("-1 << 32 = %d \n", -1 << 32) ;
Here is what the console outputs:
-1 << 32 = -1
-1 << 32 = 0
I am not sure why the left shift is behaving differently in each of these cases.
It's undefined behavior because your shift count is at least as big as the number of bits in an int, which means that the result can't be predicted.
When you shift a number by an amount equal to or greater than its width in bits, the result is not predictable! It is simply undefined behavior.
If you compile your program with warnings enabled, you will get a warning for this shift.
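If the intent is "shift -1 left by 32 - n and get 0 when n is 0", one option (a sketch, not from the answers) is to guard the out-of-range count explicitly and do the shift on an unsigned value, assuming a 32-bit unsigned int:
#include <stdio.h>

/* All-ones value shifted left by count bits, with count >= 32 defined to give 0. */
static unsigned int shift_all_ones(int count)
{
    if (count < 0 || count >= 32)
        return 0u;
    return 0xffffffffu << count;   /* count is now guaranteed to be 0..31 */
}

int main(void)
{
    printf("0x%08x\n", shift_all_ones(32)); /* 0x00000000 */
    printf("0x%08x\n", shift_all_ones(0));  /* 0xffffffff */
    return 0;
}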

How to detect in C whether your machine is 32-bits

So I am revising for an exam and I got stuck in this problem:
2.67 ◆◆
You are given the task of writing a procedure int_size_is_32() that yields 1
when run on a machine for which an int is 32 bits, and yields 0 otherwise. You are
not allowed to use the sizeof operator. Here is a first attempt:
1 /* The following code does not run properly on some machines */
2 int bad_int_size_is_32() {
3 /* Set most significant bit (msb) of 32-bit machine */
4 int set_msb = 1 << 31;
5 /* Shift past msb of 32-bit word */
6 int beyond_msb = 1 << 32;
7
8 /* set_msb is nonzero when word size >= 32
9 beyond_msb is zero when word size <= 32 */
10 return set_msb && !beyond_msb;
11 }
When compiled and run on a 32-bit Sun SPARC, however, this procedure returns 0. The following compiler message gives us an indication of the problem: warning: left shift count >= width of type
A. In what way does our code fail to comply with the C standard?
B. Modify the code to run properly on any machine for which data type int is
at least 32 bits.
C. Modify the code to run properly on any machine for which data type int is
at least 16 bits.
__________ MY ANSWERS:
A: When we shift by 31 in line 4, we overflow, because according to the unsigned integer standard, the maximum unsigned integer we can represent is 2^31 - 1
B: In line 4 1<<30
C: In line 4 1<<14 and in line 6 1<<16
Am I right? And if not, why not? Thank you!
__________ Second tentative answer:
B: In line 4 (1<<31)>>1 and in line 6: int beyond_msb = set_msb+1; I think I might be right this time :)
A: When we shift by 31 in line 4, we overflow, because according to the unsigned integer standard, the maximum unsigned integer we can represent is 2^31 - 1
The error is on line 6, not line 4. The compiler message explains exactly why: shifting by a number of bits greater than or equal to the width of the type is undefined behavior.
B: In line 4 1<<30
C: In line 4 1<<14 and in line 6 1<<16
Both of those changes will cause the error to not appear, but will also make the function give incorrect results. You will need to understand how the function works (and how it doesn't work) before you fix it.
First of all, shifting by 30 will not cause any problem, since the most you can shift by is the word size w - 1.
So when w = 32 you can shift by up to 31.
The problem occurs when you shift by 32 bits, since the LSB would move to the 33rd bit, which is out of bounds.
So the problem is in line 6, not line 4.
For B:
0xffffffff + 1
If int is 32 bits this will result in 0, otherwise in some nonzero number.
There is absolutely no way to test the size of signed types in C at runtime. This is because overflow is undefined behavior; you cannot tell if overflow has happened. If you use unsigned int, you can just count how many times you can double a value that starts at 1 before the result becomes zero.
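A minimal sketch of that unsigned counting approach (my illustration, not part of the original answer); unsigned wraparound is well defined, so the loop terminates with the width in bits:
/* Returns 1 if unsigned int is exactly 32 bits wide
   (on typical machines int then has the same width). */
int int_size_is_32(void)
{
    unsigned int x = 1u;
    int width = 0;

    /* Doubling shifts the single set bit left; once it falls off the top,
       the value wraps to 0 and width holds the number of bits. */
    while (x != 0) {
        x *= 2;
        width++;
    }
    return width == 32;
}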
If you want to do the test at compile-time instead of runtime, this will work:
struct { int x:N; };
where N is replaced by successively larger values. The compiler is required to accept the program as long as N is no larger than the width of int, and reject it with a diagnostic/error when N is larger.
You should be able to comply with the C standard by breaking up the left shifts.
B -
Replace Line 6 with
int beyond_msb = (1 << 31) << 1;
C -
Replace Line 4 with
int set_msb = ((1 << 15) << 15) << 1 ;
Replace Line 6 with
int beyond_msb = ((1 << 15) << 15) << 2;
Also, as an extension to the question, the following should satisfy both B and C and stay safe at runtime: shift left one bit at a time until the value reverts back to all zeroes.
int int_size_is_32() {
  //initialise our test variable (unsigned, so the shift never runs into the sign bit)
  unsigned int x = 1;
  //count for checking purposes
  int count = 0;
  //keep shifting left 1 bit until the 1-bit has been pushed off the top of the value
  while ( x != 0 ) {
    x <<= 1; //shift left
    count++;
  }
  return (count == 32);
}

C macro to create a bit mask -- possible? And have I found a GCC bug?

I am somewhat curious about creating a macro to generate a bit mask for a device register, up to 64 bits, such that BIT_MASK(31) produces 0xffffffff.
However, several C examples do not work as I thought, as I get 0x7fffffff instead. It is as if the compiler is assuming I want signed output, not unsigned. So I tried 32, and noticed that the value wraps back around to 0. This is because the C standard states that if the shift value is greater than or equal to the number of bits in the operand to be shifted, then the result is undefined. That makes sense.
But, given the following program, bits2.c:
#include <stdio.h>
#include <stdlib.h>   /* for atoi */
#define BIT_MASK(foo) ((unsigned int)(1 << foo) - 1)
int main()
{
unsigned int foo;
char *s = "32";
foo = atoi(s);
printf("%d %.8x\n", foo, BIT_MASK(foo));
foo = 32;
printf("%d %.8x\n", foo, BIT_MASK(foo));
return (0);
}
If I compile with gcc -O2 bits2.c -o bits2, and run it on a Linux/x86_64 machine, I get the following:
32 00000000
32 ffffffff
If I take the same code and compile it on a Linux/MIPS (big-endian) machine, I get this:
32 00000000
32 00000000
On the x86_64 machine, if I use gcc -O0 bits2.c -o bits2, then I get:
32 00000000
32 00000000
If I tweak BIT_MASK to ((unsigned int)(1UL << foo) - 1), then the output is 32 00000000 for both forms, regardless of gcc's optimization level.
So it appears that on x86_64, gcc is optimizing something incorrectly OR the undefined nature of left-shifting 32 bits on a 32-bit number is being determined by the hardware of each platform.
Given all of the above, is it possible to programmatically create a C macro that creates a bit mask from either a single bit or a range of bits?
I.e.:
BIT_MASK(6) = 0x40
BIT_FIELD_MASK(8, 12) = 0x1f00
Assume BIT_MASK and BIT_FIELD_MASK operate from a 0-index (0-31). BIT_FIELD_MASK is to create a mask from a bit range, i.e., 8:12.
Here is a version of the macro which will work for arbitrary positive inputs. (Negative inputs still invoke undefined behavior...)
#include <limits.h>
/* A mask with x least-significant bits set, possibly 0 or >=32 */
#define BIT_MASK(x) \
    (((x) >= sizeof(unsigned) * CHAR_BIT) ? \
     (unsigned) -1 : (1U << (x)) - 1)
Of course, this is a somewhat dangerous macro as it evaluates its argument twice. This is a good opportunity to use a static inline if you use GCC or target C99 in general.
static inline unsigned bit_mask(int x)
{
return (x >= sizeof(unsigned) * CHAR_BIT) ?
(unsigned) -1 : (1U << x) - 1;
}
As Mysticial noted, shifting a 32-bit integer by 32 or more bits is undefined behavior, and the result you observe in practice depends on the implementation. Here are three different hardware behaviors for such shifts:
On x86, only examine the low 5 bits of the shift amount, so x << 32 == x.
On PowerPC, only examine the low 6 bits of the shift amount, so x << 32 == 0 but x << 64 == x.
On Cell SPUs, examine all bits, so x << y == 0 for all y >= 32.
However, compilers are free to do whatever they want if you shift a 32-bit operand 32 bits or more, and they are even free to behave inconsistently (or make demons fly out your nose).
Implementing BIT_FIELD_MASK:
This will set bit a through bit b (inclusive), as long as 0 <= a <= 31 and 0 <= b <= 31.
#define BIT_FIELD_MASK(a, b) (((unsigned) -1 >> (31 - (b))) & ~((1U << (a)) - 1))
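A quick sanity check of this field mask against the example from the question (my sketch, assuming a 32-bit unsigned int):
#include <stdio.h>

#define BIT_FIELD_MASK(a, b) (((unsigned) -1 >> (31 - (b))) & ~((1U << (a)) - 1))

int main(void)
{
    printf("0x%08x\n", BIT_FIELD_MASK(8, 12)); /* 0x00001f00, matching the question */
    printf("0x%08x\n", BIT_FIELD_MASK(0, 31)); /* 0xffffffff, the full-width mask */
    return 0;
}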
Shifting by more than or equal to the size of the integer type is undefined behavior.
So no, it's not a GCC bug.
In this case, the literal 1 is of type int which is 32-bits in both systems that you used. So shifting by 32 will invoke this undefined behavior.
In the first case, the compiler is not able to resolve the shift amount to 32, so it likely just issues the normal shift instruction (which on x86 uses only the bottom 5 bits). So you get:
(unsigned int)(1 << 0) - 1
which is zero.
In the second case, GCC is able to resolve the shift amount to 32. Since it is undefined behavior, it (apparently) just replaces the result of the shift with 0:
(unsigned int)(0) - 1
so you get ffffffff.
So this is a case of where GCC is using undefined behavior as an opportunity to optimize.
(Though personally, I'd prefer that it emits a warning instead.)
Related: Why does integer overflow on x86 with GCC cause an infinite loop?
Assuming you have a working mask for n bits, e.g.
// set the first n bits to 1, rest to 0
#define BITMASK1(n) ((1ULL << (n)) - 1ULL)
you can make a range bitmask by shifting again:
// set bits [k+1, n] to 1, rest to 0
#define BITMASK(n, k) ((BITMASK1(n) >> (k)) << (k))
The type of the result is unsigned long long int in any case.
As discussed, BITMASK1 is UB unless n is small. The general version requires a conditional and evaluates the argument twice:
#define BITMASK1(n) (((n) < sizeof(1ULL) * CHAR_BIT ? (1ULL << (n)) : 0) - 1ULL)
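A quick check of the general version (my sketch; the printed values assume a 64-bit unsigned long long):
#include <stdio.h>
#include <limits.h>

#define BITMASK1(n) (((n) < sizeof(1ULL) * CHAR_BIT ? (1ULL << (n)) : 0) - 1ULL)

int main(void)
{
    printf("%llx\n", BITMASK1(3));  /* 7 */
    printf("%llx\n", BITMASK1(64)); /* ffffffffffffffff */
    printf("%llx\n", BITMASK1(0));  /* 0 */
    return 0;
}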
#define BIT_MASK(foo) ((~ 0ULL) >> (64-foo))
I'm a bit paranoid about this. I think this assumes that unsigned long long is exactly 64 bits. But it's a start and it works up to 64 bits.
Maybe this is correct:
#define BIT_MASK(foo) ((~ 0ULL) >> (sizeof(0ULL)*8-foo))
A "traditional" formula (1ul<<n)-1 has different behavior on different compilers/processors for n=8*sizeof(1ul). Most commonly it overflows for n=32. Any added conditionals will evaluate n multiple times. Going 64-bits (1ull<<n)-1 is an option, but problem migrates to n=64.
My go-to formula is:
#define BIT_MASK(n) (~( ((~0ull) << ((n)-1)) << 1 ))
It does not overflow for n=64 and evaluates n only once.
As a downside, it will compile to 2 shift instructions if n is a variable. Also, n cannot be 0 (the result will be compiler/processor-specific), but that is a rare case for all of the uses I have(*), and it can be dealt with by adding a guarding "if" statement only where necessary (or, even better, an "assert" checking both the upper and lower bounds).
(*) - usually data comes from a file or pipe, and size is in bytes. If size is zero, then there's no data, so code should do nothing anyway.
What about:
#define BIT_MASK(n) (~(((~0ULL) >> (n)) << (n)))
This works on all endianness systems; using -1 to invert all bits doesn't work on a big-endian system.
Since you need to avoid shifting by as many bits as there are in the type (whether that's unsigned long or unsigned long long), you have to be more devious in the masking when dealing with the full width of the type. One way is to sneak up on it:
#define BIT_MASK(n) (((n) == CHAR_BIT * sizeof(unsigned long long)) ? \
((((1ULL << (n-1)) - 1) << 1) | 1) : \
((1ULL << (n )) - 1))
For a constant n such as 64, the compiler evaluates the expression and generates only the case that is used. For a runtime variable n, this fails just as badly as before if n is greater than the number of bits in unsigned long long (or is negative), but works OK without overflow for values of n in the range 0..(CHAR_BIT * sizeof(unsigned long long)).
Note that CHAR_BIT is defined in <limits.h>.
@iva2k's answer avoids branching and is correct when the length is 64 bits. Working on that, you can also do this:
#define BIT_MASK(length) (~(((unsigned long long) -2) << ((length) - 1)))
gcc would generate exactly the same code anyway, though.
