Bizarre right bitshift inconsistency - c

I've been working with bits in C (running on ubuntu). In using two different ways to right shift an integer, I got oddly different outputs:
#include <stdio.h>
int main(){
int x = 0xfffffffe;
int a = x >> 16;
int b = 0xfffffffe >> 16;
printf("%X\n%X\n", a, b);
return 0;
}
I would think the output would be the same for each: FFFF, because the right four hex places (16 bits) are being rightshifted away. Instead, the output is:
FFFFFFFF
FFFF
What explains this behaviour?

When you say:
int x = 0xfffffffe;
That sets x to -2 because the maximum value an int can hold here is 0x7FFFFFFF and it wraps around during conversion. When you bit-shift the negative number it gets weird.
If you change those values to unsigned int it all works out.
#include <stdio.h>
int main(){
unsigned int x = 0xfffffffe;
unsigned int a = x >> 16;
unsigned int b = 0xfffffffe >> 16;
printf("%X\n%X\n", a, b);
return 0;
}

The behaviour you see here has to do with shifting on signed or unsigned integers which give different results.
Shifts on unsigned integers are logical. On the contrary, shift on signed integers are arithmetic. EDIT: In C, it's implementation defined but generally the case.
Consequently,
int x = 0xfffffffe;
int a = x >> 16;
this part performs an arithmetic shift because x is signed. And because x is actually negative (-2 in two's complement), x is sign extended, so '1's are appended which results in 0xFFFFFFFF.
On the contrary,
int b = 0xfffffffe >> 16;
0xfffffffe is a litteral interpreted as an unsigned integer. Therefore a logical shift of 16 results in 0x0000FFFF as expected.

Related

Extract k bits from any side of hex notation

int X = 0x1234ABCD;
int Y = 0xcdba4321;
// a) print the lower 10 bits of X in hex notation
int output1 = X & 0xFF;
printf("%X\n", output1);
// b) print the upper 12 bits of Y in hex notation
int output2 = Y >> 20;
printf("%X\n", output2);
I want to print the lower 10 bits of X in hex notation; since each character in hex is 4 bits, FF = 8 bits, would it be right to & with 0x2FF to get the lower 10 bits in hex notation.
Also, would shifting right by 20 drop all 20 bits at the end, and keep the upper 12 bits only?
I want to print the lower 10 bits of X in hex notation; since each character in hex is 4 bits, FF = 8 bits, would it be right to & with 0x2FF to get the lower 10 bits in hex notation.
No, that would be incorrect. You'd want to use 0x3FF to get the low 10 bits. (0x2FF in binary is: 1011111111). If you're a little uncertain with hex values, an easier way to do that these days is via binary constants instead, e.g.
// mask lowest ten bits in hex
int output1 = X & 0x3FF;
// mask lowest ten bits in binary
int output1 = X & 0b1111111111;
Also, would shifting right by 20 drop all 20 bits at the end, and keep the upper 12 bits only?
In the case of LEFT shift, zeros will be shifted in from the right, and the higher bits will be dropped.
In the case of RIGHT shift, it depends on the sign of the data type you are shifting.
// unsigned right shift
unsigned U = 0x80000000;
U = U >> 20;
printf("%x\n", U); // prints: 800
// signed right shift
int S = 0x80000000;
S = S >> 20;
printf("%x\n", S); // prints: fffff800
Signed right-shift typically shifts the highest bit in from the left. Unsigned right-shift always shifts in zero.
As an aside: IIRC the C standard is a little vague wrt to signed integer shifts. I believe it is theoretically possible to have a hardware platform that shifts in zeros for signed right shift (i.e. micro-controllers). Most of your typical platforms (Intel/Arm) will shift in the highest bit though.
Assuming 32 bit int, then you have the following problems:
0xcdba4321 is too large to fit inside an int. The hex constant itself will actually be unsigned int in this specific case, because of an oddball type rule in C. From there you force an implicit conversion to int, likely ending up with a negative number.
Y >> 20 right shifts a negative number, which is non-portable behavior. It can either shift in ones (arithmetic shift) or zeroes (logical shift), depending on compiler. Whereas right shifting unsigned types is well-defined and always results in logical shift.
& 0xFF masks out 8 bits, not 10.
%X expects an unsigned int, not an int.
The root of all your problems is "sloppy typing" - that is, writing int all over the place when you actually need a more suitable type. You should start using the portable types from stdint.h instead, in this case uint32_t. Also make a habit of always ending you hex constants with a u or U suffix.
A fixed program:
#include <stdio.h>
#include <stdint.h>
int main (void)
{
uint32_t X = 0x1234ABCDu;
uint32_t Y = 0xcdba4321u;
printf("%X\n", X & 0x3FFu);
printf("%X\n", Y >> (32-12));
}
The 0x3FFu mask can also be written as ( (1u<<10) - 1).
(Strictly speaking you need to printf the stdint.h types using specifiers from inttypes.h but lets not confuse the answer by introducing those at the same time.)
Lots of high-value answers to this question.
Here's more info that might spark curiosity...
int main() {
uint32_t X;
X = 0x1234ABCDu; // your first hex number
printf( "%X\n", X );
X &= ((1u<<12)-1)<<20; // mask 12 bits, shifting mask left
printf( "%X\n", X );
X = 0x1234ABCDu; // your first hex number
X &= ~0u^(~0u>>12);
printf( "%X\n", X );
X = 0x0234ABCDu; // Note leading 0 printed in two styles
printf( "%X %08X\n", X, X );
return 0;
}
1234ABCD
12300000
12300000
234ABCD 0234ABCD
print the upper 12 bits of Y in hex notation
To handle this when the width of int is not known, first determine the width with code like sizeof(unsigned)*CHAR_BIT. (C specifies it must be at least 16-bit.)
Best to use unsigned or mask the shifted result with an unsigned.
#include <limits.h>
int output2 = Y;
printf("%X\n", (unsigned) output2 >> (sizeof(unsigned)*CHAR_BIT - 12));
// or
printf("%X\n", (output2 >> (sizeof output2 * CHAR_BIT - 12)) & 0x3FFu);
Rare non-2's complement encoded int needs additional code - not shown.
Very rare padded int needs other bit width detection - not shown.

Convert signed int of variable bit size

I have a number of bits (the number of bits can change) in an unsigned int (uint32_t). For example (12 bits in the example):
uint32_t a = 0xF9C;
The bits represent a signed int of that length.
In this case the number in decimal should be -100.
I want to store the variable in a signed variable and gets is actual value.
If I just use:
int32_t b = (int32_t)a;
it will be just the value 3996, since it gets casted to (0x00000F9C) but it actually needs to be (0xFFFFFF9C)
I know one way to do it:
union test
{
signed temp :12;
};
union test x;
x.temp = a;
int32_t result = (int32_t) x.temp;
now i get the correct value -100
But is there a better way to do it?
My solution is not very flexbile, as I mentioned the number of bits can vary (anything between 1-64bits).
But is there a better way to do it?
Well, depends on what you mean by "better". The example below shows a more flexible way of doing it as the size of the bit field isn't fixed. If your use case requires different bit sizes, you could consider it a "better" way.
unsigned sign_extend(unsigned x, unsigned num_bits)
{
unsigned f = ~((1 << (num_bits-1)) - 1);
if (x & f) x = x | f;
return x;
}
int main(void)
{
int x = sign_extend(0xf9c, 12);
printf("%d\n", x);
int y = sign_extend(0x79c, 12);
printf("%d\n", y);
}
Output:
-100
1948
A branch free way to sign extend a bitfield (Henry S. Warren Jr., CACM v20 n6 June 1977) is this:
// value i of bit-length len is a bitfield to sign extend
// i is right aligned and zero-filled to the left
sext = 1 << (len - 1);
i = (i ^ sext) - sext;
UPDATE based on #Lundin's comment
Here's tested code (prints -100):
#include <stdio.h>
#include <stdint.h>
int32_t sign_extend (uint32_t x, int32_t len)
{
int32_t i = (x & ((1u << len) - 1)); // or just x if you know there are no extraneous bits
int32_t sext = 1 << (len - 1);
return (i ^ sext) - sext;
}
int main(void)
{
printf("%d\n", sign_extend(0xF9C, 12));
return 0;
}
This relies on the implementation defined behavior of sign extension when right-shifting signed negative integers. First you shift your unsigned integer all the way left until the sign bit is becoming MSB, then you cast it to signed integer and shift back:
#include <stdio.h>
#include <stdint.h>
#define NUMBER_OF_BITS 12
int main(void) {
uint32_t x = 0xF9C;
int32_t y = (int32_t)(x << (32-NUMBER_OF_BITS)) >> (32-NUMBER_OF_BITS);
printf("%d\n", y);
return 0;
}
This is a solution to your problem:
int32_t sign_extend(uint32_t x, uint32_t bit_size)
{
// The expression (0xffffffff << bit_size) will fill the upper bits to sign extend the number.
// The expression (-(x >> (bit_size-1))) is a mask that will zero the previous expression in case the number was positive (to avoid having an if statemet).
return (0xffffffff << bit_size) & (-(x >> (bit_size-1))) | x;
}
int main()
{
printf("%d\n", sign_extend(0xf9c, 12)); // -100
printf("%d\n", sign_extend(0x7ff, 12)); // 2047
return 0;
}
The sane, portable and effective way to do this is simply to mask out the data part, then fill up everything else with 0xFF... to get proper 2's complement representation. You need to know is how many bits that are the data part.
We can mask out the data with (1u << data_length) - 1.
In this case with data_length = 8, the data mask becomes 0xFF. Lets call this data_mask.
Thus the data part of the number is a & data_mask.
The rest of the number needs to be filled with zeroes. That is, everything not part of the data mask. Simply do ~data_mask to achieve that.
C code: a = (a & data_mask) | ~data_mask. Now a is proper 32 bit 2's complement.
Example:
#include <stdio.h>
#include <inttypes.h>
int main(void)
{
const uint32_t data_length = 8;
const uint32_t data_mask = (1u << data_length) - 1;
uint32_t a = 0xF9C;
a = (a & data_mask) | ~data_mask;
printf("%"PRIX32 "\t%"PRIi32, a, (int32_t)a);
}
Output:
FFFFFF9C -100
This relies on int being 32 bits 2's complement but is otherwise fully portable.

Bit-shift not applying to a variable declaration-assignment one-liner

I'm seeing strange behavior when I try to apply a right bit-shift within a variable declaration/assignment:
unsigned int i = ~0 >> 1;
The result I'm getting is 0xffffffff, as if the >> 1 simply wasn't there. It seems to be something about the ~0, because if I instead do:
unsigned int i = 0xffffffff >> 1;
I get 0x7fffffff as expected. I thought I might be tripping over an operator precedence issue, so tried:
unsigned int i = (~0) >> 1;
but it made no difference. I could just perform the shift in a separate statement, like
unsigned int i = ~0;
i >>= 1;
but I'd like to know what's going on.
update Thanks merlin2011 for pointing me towards an answer. Turns out it was performing an arithmetic shift because it was interpreting ~0 as a signed (negative) value. The simplest fix seems to be:
unsigned int i = ~0u >> 1;
Now I'm wondering why 0xffffffff wasn't also interpreted as a signed value.
It is how c compiler works for signed value. The base literal for number in C is int (in 32-bit machine, it is 32-bit signed int)
You may want to change it to:
unsigned int i = ~(unsigned int)0 >> 1;
The reason is because for the signed value, the compiler would treat the operator >> as an arithmetic shift (or signed shift).
Or, more shortly (pointed out by M.M),
unsigned int i = ~0u >> 1;
Test:
printf("%x", i);
Result:
In unsigned int i = ~0;, ~0 is seen as a signed integer (the compiler should warn about that).
Try this instead:
unsigned int i = (unsigned int)~0 >> 1;

bit shifting in C, unexpected result

when i pass n = 0x0, i get 0xffffffff on the screen
which i expect should be 0x00000000
as i shift the word by 32 bits
(Just Ignore the x! I didn't use it inside the function.)
void logicalShift(int x, int n) {
int y = 32;
int mask = 0xffffffff;
printf("mask %x", mask << (y-n));
}
One of the interesting point is
void logicalShift(int x, int n) {
int y = 32;
int mask = 0xffffffff;
printf("mask %x", mask << 32);
}
this will output what i expected. Do i miss out anything?
Thank you!
Im running on ubuntu
A shift left of 32 bits on a 32 bit value has undefined results. You can only shift 0 to 31 bits.
See also here: Why doesn't left bit-shift, "<<", for 32-bit integers work as expected when used more than 32 times?
Here is the relevant quote from the C11 draft §6.5.7.3;
If the value of the right operand is negative or is greater than or
equal to the width of the promoted left operand, the behavior is
undefined.
In other words, the result is undefined, and the compiler is free to generate any result.

C standard on negative zero (1's complement and signed magnitude)

All of these functions gives the expected result on my machine. Do they all work on other platforms?
More specifically, if x has the bit representation 0xffffffff on 1's complement machines or 0x80000000 on signed magnitude machines what does the standard says about the representation of (unsigned)x ?
Also, I think the (unsigned) cast in v2, v2a, v3, v4 is redundant. Is this correct?
Assume sizeof(int) = 4 and CHAR_BIT = 8
int logicalrightshift_v1 (int x, int n) {
return (unsigned)x >> n;
}
int logicalrightshift_v2 (int x, int n) {
int msb = 0x4000000 << 1;
return ((x & 0x7fffffff) >> n) | (x & msb ? (unsigned)0x80000000 >> n : 0);
}
int logicalrightshift_v2a (int x, int n) {
return ((x & 0x7fffffff) >> n) | (x & (unsigned)0x80000000 ? (unsigned)0x80000000 >> n : 0);
}
int logicalrightshift_v3 (int x, int n) {
return ((x & 0x7fffffff) >> n) | (x < 0 ? (unsigned)0x80000000 >> n : 0);
}
int logicalrightshift_v4 (int x, int n) {
return ((x & 0x7fffffff) >> n) | (((unsigned)x & 0x80000000) >> n);
}
int logicalrightshift_v5 (int x, int n) {
unsigned y;
*(int *)&y = x;
y >>= n;
*(unsigned *)&x = y;
return x;
}
int logicalrightshift_v6 (int x, int n) {
unsigned y;
memcpy (&y, &x, sizeof (x));
y >>= n;
memcpy (&x, &y, sizeof (x));
return x;
}
If x has the bit representation 0xffffffff on 1's
complement machines or 0x80000000 on signed magnitude machines what
does the standard says about the representation of (unsigned)x ?
The conversion to unsigned is specified in terms of values, not representations. If you convert -1 to unsigned, you always get UINT_MAX (so if your unsigned is 32 bits, you always get 4294967295). This happens regardless of the representation of signed numbers that your implementation uses.
Likewise, if you convert -0 to unsigned then you always get 0. -0 is numerically equal to 0.
Note that a ones complement or sign-magnitude implementation is not required to support negative zeroes; if it does not, then accessing such a representation causes the program to have undefined behaviour.
Going through your functions one-by-one:
int logicalrightshift_v1(int x, int n)
{
return (unsigned)x >> n;
}
The result of this function for negative values of x will depend on UINT_MAX, and will further be implementation-defined if (unsigned)x >> n is not within the range of int. For example, logicalrightshift_v1(-1, 1) will return the value UINT_MAX / 2 regardless of what representation the machine uses for signed numbers.
int logicalrightshift_v2(int x, int n)
{
int msb = 0x4000000 << 1;
return ((x & 0x7fffffff) >> n) | (x & msb ? (unsigned)0x80000000 >> n : 0);
}
Almost everything about this is could be implementation-defined. Assuming that you are attempting to create a value in msb with 1 in the sign bit and zeroes in the value bits, you cannot do this portably by use of shifts - you can use ~INT_MAX, but this is allowed to have undefined behaviour on a sign-magnitude machine that does not allow negative zeroes, and is allowed to give an implementation-defined result on two's complement machines.
The types of 0x7fffffff and 0x80000000 will depend on the ranges of the various types, which will affect how other values in this expression are promoted.
int logicalrightshift_v2a(int x, int n)
{
return ((x & 0x7fffffff) >> n) | (x & (unsigned)0x80000000 ? (unsigned)0x80000000 >> n : 0);
}
If you create an unsigned value that is not in the range of int (for example, given a 32bit int, values > 0x7fffffff) then the implicit conversion in the return statement produces an implementation-defined value. The same applies to v3 and v4.
int logicalrightshift_v5(int x, int n)
{
unsigned y;
*(int *)&y = x;
y >>= n;
*(unsigned *)&x = y;
return x;
}
This is still implementation defined, because it is unspecified whether the sign bit in the representation of int corresponds to a value bit or a padding bit in the representation of unsigned. If it corresponds to a padding bit it could be a trap representation, in which case the behaviour is undefined.
int logicalrightshift_v6(int x, int n)
{
unsigned y;
memcpy (&y, &x, sizeof (x));
y >>= n;
memcpy (&x, &y, sizeof (x));
return x;
}
The same comments applying to v5 apply to this.
Also, I think the (unsigned) cast in v2, v2a, v3, v4 is redundant. Is
this correct?
It depends. As a hex constant, 0x80000000 will have type int if that value is within the range of int; otherwise unsigned if that value is within the range of unsigned; otherwise long if that value is within the range of long; otherwise unsigned long (because that value is within the minimum allowed range of unsigned long).
If you wish to ensure that it has unsigned type, then suffix the constant with a U, to 0x80000000U.
Summary:
Converting a number greater than INT_MAX to int gives an implementation-defined result (or indeed, allows an implementation-defined signal to be raised).
Converting an out-of-range number to unsigned is done by repeated addition or subtraction of UINT_MAX + 1, which means it depends on the mathematical value, not the representation.
Inspecting a negative int representation as unsigned is not portable (positive int representations are OK, though).
Generating a negative zero through use of bitwise operators and trying to use the resulting value is not portable.
If you want "logical shifts", then you should be using unsigned types everywhere. The signed types are designed for dealing with algorithms where the value is what matters, not the representation.
If you follow the standard to the word, none of these are guaranteed to be the same on all platforms.
In v5, you violate strict-aliasing, which is undefined behavior.
In v2 - v4, you have signed right-shift, which is implementation defined. (see comments for more details)
In v1, you have signed to unsigned cast, which is implementation defined when the number is out of range.
EDIT:
v6 might actually work given the following assumptions:
'int' is either 2's or 1's complement.
unsigned and int are exactly the same size (in both bytes and bits, and are densely packed).
The endian of unsigned matches that of int.
The padding and bit-layout is the same: (See caf's comment for more details.)

Resources