Why does left shifting by 24 bits change the value of an unsigned long in C?

I expect 0b11010010 << 24 to have the same value as 0b11010010000000000000000000000000.
I tested it in C, and 0b11010010 << 24 doesn't work as expected when the result is stored in an unsigned long.
Does anyone know why unsigned long behaves like this?
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
int main(){
unsigned long a = 0b11010010000000000000000000000000;
unsigned long b = 0b11010010 << 24;
bool isTheSame1 = a == b;
printf("isTheSame1 %d \n",isTheSame1);
bool isTheSame2 = 0b11010010000000000000000000000000 == (0b11010010 << 24);
printf("isTheSame2 %d",isTheSame2);
}
isTheSame1 should be 1, but it prints 0, as follows:
isTheSame1 0
isTheSame2 1
Compiled and executed with gcc main.c && ./a.out
gcc --version
Apple clang version 14.0.0 (clang-1400.0.29.202)
Target: x86_64-apple-darwin22.2.0
Thread model: posix
Updated
As Allan Wind pointed out, I added the UL suffix and now it works as expected.
unsigned long a = 0b11010010000000000000000000000000UL;
unsigned long b = 0b11010010UL << 24;
bool isTheSame1 = a == b;
printf("isTheSame1 %d \n",isTheSame1);
bool isTheSame2 = 0b11010010000000000000000000000000UL == (0b11010010UL << 24);
printf("isTheSame2 %d",isTheSame2);

The constant 0b11010010 has type int, which is signed. Assuming an int is 32 bits, the expression 0b11010010 << 24 will shift a "1" bit into the sign bit. Doing so triggers undefined behavior, which is why you're getting strange results.
Add the UL suffix to the constant to give it type unsigned long, then the shift will work as expected.
unsigned long b = 0b11010010UL << 24;

You are doing a left shift of a signed value (see the good answer from dbush).
In the absence of suffixes, integer constants have type int and floating-point constants have type double:
b = 0b11010010 ; /* type int */
b = 1.0; /* type double */
If you want b in your example to be unsigned long, use a suffix:
b = 0b11010010UL; /* type unsigned long */
or a cast:
b = (unsigned long)0b11010010; /* type unsigned long */
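If you want to check which type an unsuffixed constant actually gets, C11's _Generic can report it. A minimal sketch, not part of the original answer (the TYPE_NAME macro is only illustrative):
#include <stdio.h>

/* Illustrative C11 sketch: print the type selected for a few constants. */
#define TYPE_NAME(x) _Generic((x), \
    int: "int", \
    unsigned int: "unsigned int", \
    long: "long", \
    unsigned long: "unsigned long", \
    double: "double", \
    default: "other")

int main(void) {
    printf("%s\n", TYPE_NAME(0b11010010));                /* int */
    printf("%s\n", TYPE_NAME(1.0));                       /* double */
    printf("%s\n", TYPE_NAME(0b11010010UL));              /* unsigned long */
    printf("%s\n", TYPE_NAME((unsigned long)0b11010010)); /* unsigned long */
}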

With a 32-bit (or smaller) int, 0b11010010 << 24 is undefined behavior (UB). It attempts to shift into the sign bit.
When int is 32-bit (common), this often results in a negative value corresponding to the bit pattern 11010010-00000000-00000000-00000000.
When a negative value is saved as an unsigned long, ULONG_MAX + 1 is added to it. With a 64-bit unsigned long the value has the bit pattern:
11111111-11111111-11111111-11111111-11010010-00000000-00000000-00000000
This large unsigned long is not equal to 0b11010010000000000000000000000000UL, hence the output "isTheSame1 0".
Had OP's long been 32-bit, it "might" have worked as OP had intended - yet unfortunately still relying on UB.
Appending an L
32-bit long: 0b11010010L << 24 suffers the same UB problem as above - yet might have "worked".
64-bit long: 0b11010010L is a long and 0b11010010L << 24 becomes the value 0b11010010000000000000000000000000, the same value as a.
Appending a U
32-bit unsigned: 0b11010010U << 24 becomes the value 0b11010010000000000000000000000000, the same value as a.
16-bit unsigned: 0b11010010U << 24 is undefined behavior as the shift is too great. Often the UB results in the same as 0b11010010U << (24-16), yet this is not reliably done.
Appending a UL
32 or 64-bit unsigned long: 0b11010010UL << 24 becomes the value 0b11010010000000000000000000000000, the same value as a.
Since the left-hand side of the = below is unsigned long, it is better for the right-hand-side constant to be unsigned long as well.
unsigned long b = 0b11010010 << 24; // Original
unsigned long b = 0b11010010UL << 24; // Better
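To make these cases concrete, here is a small hedged sketch assuming the asker's platform (32-bit int, 64-bit unsigned long); the unsuffixed shift is left out because it is undefined behavior:
#include <stdio.h>

int main(void) {
    unsigned long a   = 0b11010010000000000000000000000000UL;
    unsigned long bu  = 0b11010010U << 24;  /* shift done in unsigned int: well defined */
    unsigned long bul = 0b11010010UL << 24; /* shift done in unsigned long: well defined */

    printf("a   = %lx\n", a);
    printf("bu  = %lx, equal: %d\n", bu, a == bu);    /* equal: 1 */
    printf("bul = %lx, equal: %d\n", bul, a == bul);  /* equal: 1 */
}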

Related

A C left shift anomaly in unsigned long long ints

When this code:
// Print out the powers of 2 from 1 to 63.
#include <stdio.h>
int main() {
unsigned long long n = 64;
for (unsigned long long i = 1; i < n; i++) {
unsigned long long po2 = (1 << i);
printf("%llu) %llu/%0llX\n", i, po2, po2);
}
printf("Size of unsigned long long int: %ld.\n", sizeof(unsigned long long int));
}
is run, left shift values up to 30 are correct:
1) 2/2
2) 4/4
3) 8/8
...
28) 268435456/10000000
29) 536870912/20000000
30) 1073741824/40000000
However, once i == 31 (a shift into the 32nd bit), the results are not correct:
31) 18446744071562067968/FFFFFFFF80000000
32) 1/1
33) 2/2
34) 4/4
35) 8/8
36) 16/10
37) 32/20
...
61) 536870912/20000000
62) 1073741824/40000000
63) 18446744071562067968/FFFFFFFF80000000
Size of unsigned long long int: 8.
This is run on a 64 bit machine, and the C Standard states:
The integer promotions are performed on each of the operands. The type
of the result is that of the promoted left operand. If the value of
the right operand is negative or is greater than or equal to the width
of the promoted left operand, the behavior is undefined.
Note that the variables i and n in the above code are both 64-bit integers, yet they are treated as if they were 32-bit integers!
Look at the << operation in the following line:
unsigned long long po2 = (1 << i);
What is its left operand? Well, that is the int literal 1, and an int will not undergo promotion in that context¹. So the type of the result will be int, as specified in the extract from the Standard that you cited, and that int result will be converted to the required unsigned long long type … but after the overflow (undefined behaviour) has already happened.
To fix the issue, make that literal an unsigned long long, using the uLL suffix:
unsigned long long po2 = (1uLL << i);
¹ Some clarity on the "context" from this cppreference page (bold emphasis mine):
Shift Operators
…
First, integer promotions are performed, individually, on each operand (Note: this is unlike
other binary arithmetic operators, which all perform usual arithmetic
conversions). The type of the result is the type of lhs after
promotion.
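One way to see the promotion rule in action (a hedged sketch, not part of the original answer) is to check the size of the shift expression itself; sizeof does not evaluate its operand, so even an oversized shift count is harmless here:
#include <stdio.h>

int main(void) {
    unsigned long long i = 40;

    /* The result type of << is the promoted type of the LEFT operand,
       so 1 << i is still an int even though i is unsigned long long. */
    printf("sizeof(1 << i)    = %zu\n", sizeof(1 << i));    /* sizeof(int), typically 4 */
    printf("sizeof(1ULL << i) = %zu\n", sizeof(1ULL << i)); /* sizeof(unsigned long long), typically 8 */
}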
Adrian Mole is correct. The revised code:
// Print out the powers of 2 from 1 to 63.
#include <stdio.h>
int main() {
unsigned long long n = 64,
one = 1;
for (unsigned long long i = 1; i < n; i++) {
unsigned long long po2 = (one << i);
printf("%llu) %llu/%0llX\n", i, po2, po2);
}
printf("Size of unsigned long long int: %ld.\n", sizeof(unsigned long long int));
}
works.
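Equivalently, a hedged variant of the same program that applies the uLL suffix directly, instead of introducing a separate variable, also works:
// Print out the powers of 2 from 1 to 63.
#include <stdio.h>
int main() {
    unsigned long long n = 64;
    for (unsigned long long i = 1; i < n; i++) {
        unsigned long long po2 = (1ULL << i); /* left operand is now 64 bits wide */
        printf("%llu) %llu/%0llX\n", i, po2, po2);
    }
    printf("Size of unsigned long long int: %zu.\n", sizeof(unsigned long long int));
}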

How to make a hexadecimal number from bytes?

I want to compose the number 0xAAEFCDAB from individual bytes. Everything goes well for the first three bytes, but for some reason an extra 4 bytes are added at the last step. What am I doing wrong?
#include <stdio.h>
int main(void) {
unsigned long int a = 0;
a = a | ((0xAB) << 0);
printf("%lX\n", a);
a = a | ((0xCD) << 8);
printf("%lX\n", a);
a = a | ((0xEF) << 16);
printf("%lX\n", a);
a = a | ((0xAA) << 24);
printf("%lX\n", a);
return 0;
}
Output:
Constants in C are actually typed, which might not be obvious at first, and the default type for a constant is int, which is a signed 32-bit integer (it depends on the platform, but it probably is in your case).
In signed numbers the highest bit describes the sign of the number: 1 means negative and 0 means positive (for more details you can read about two's complement).
When you perform the operation 0xAA << 24 it results in a 32-bit signed value of 0xAA000000, which is equal to 10101010 00000000 00000000 00000000 in binary. As you can see, the highest bit is set to 1, which means that the entire 32-bit signed number is actually negative.
In order to perform the | OR operation between a (which is a 64-bit unsigned number) and a 32-bit signed number, some type conversions must be performed. The size promotion is performed first, and the 32-bit signed value of 0xAA000000 is promoted to a 64-bit signed value of 0xFFFFFFFFAA000000, according to the rules of the two's complement system. This is a 64-bit signed number which has the same numerical value as the 32-bit signed one before conversion.
Afterwards, type conversion is performed from 64-bit signed to 64-bit unsigned value in order to OR the value with a. This fills the top bits with ones and results in the value you see on the screen.
In order to force your constants to be a different type than 32-bit signed int you may use suffixes such as u and l. In your case, a ul suffix should work best, indicating a 64-bit unsigned value. Your lines of code which OR constants with your a variable would then look similar to this:
a = a | ((0xAAul) << 24);
Alternatively, if you want to limit yourself to 4 bytes only, a 32-bit unsigned int is enough to hold them. In that case, I suggest you change your a variable type to unsigned int and use the u suffix for your constants. Do not forget to change the printf formats to reflect the type change. The resulting code looks like this:
#include <stdio.h>
int main(void) {
unsigned int a = 0;
a = a | ((0xABu) << 0);
printf("%X\n", a);
a = a | ((0xCDu) << 8);
printf("%X\n", a);
a = a | ((0xEFu) << 16);
printf("%X\n", a);
a = a | ((0xAAu) << 24);
printf("%X\n", a);
return 0;
}
My last suggestion is to not use the default int and long types when portability and size in bits are important to you. These types are not guaranteed to have the same amount of bits on all platforms. Instead use types defined in the <stdint.h> header file, in your case probably either a uint64_t or uint32_t. These two are guaranteed to be unsigned integers (their signed counterparts omit the 'u': int64_t and int32_t) while being 64-bit and 32-bit in size respectively on all platforms. For Pros and Cons of using them instead of traditional int and long types I refer you to this Stack Overflow answer.
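For illustration, here is a hedged sketch of the same composition using uint64_t together with the UINT64_C macro from <stdint.h>, which gives each constant an unsigned type at least 64 bits wide:
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint64_t a = 0;
    /* Each constant is already an unsigned 64-bit value, so no shift
       ever reaches a sign bit. */
    a |= UINT64_C(0xAB) << 0;
    a |= UINT64_C(0xCD) << 8;
    a |= UINT64_C(0xEF) << 16;
    a |= UINT64_C(0xAA) << 24;
    printf("%" PRIX64 "\n", a); /* AAEFCDAB */
    return 0;
}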
a = a | ((0xAA) << 24);
((0xAA) << 24) is a negative number (it is an int); it is then sign-extended to the size of unsigned long, which adds the 0xFFFFFFFF at the beginning.
You need to tell the compiler that you want an unsigned number.
a = a | ((0xAAU) << 24);
int main(void) {
unsigned long int a = 0;
a = a | ((0xAB) << 0);
printf("%lX\n", a);
a = a | ((0xCD) << 8);
printf("%lX\n", a);
a = a | ((0xEF) << 16);
printf("%lX\n", a);
a = a | ((0xAAUL) << 24);
printf("%lX\n", a);
printf("%d\n", ((0xAA) << 24));
return 0;
}
https://gcc.godbolt.org/z/fjv19bKGc
0xAA has type int, and shifting it left by 24 moves its high bit (0xAA = 10101010b) into the sign bit, producing a negative value. When that negative int is converted to the width of a for the OR, it is sign extended to 0xFFFFFFFFAA000000.
You need to cast 0xAA to an unsigned value (or give it an unsigned suffix) before bit shifting it, so it gets zero extended instead.
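For example, a minimal sketch of that fix applied to the question's last OR:
a = a | ((unsigned long)0xAA << 24); /* shift happens on an unsigned type, so no sign extension */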

Shifting on Integer Constants shows warning. How to clear this?

Reference: Suffix in Integer Constants
unsigned long long y = 1 << 33;
Results in warning:
left shift count >= width of type [-Wshift-count-overflow]
Two Questions need to be cleared from the above context:
The unsigned long long type has 64 bits, so why can't we do a left shift in it?
How does shifting work on the int constant 1?
In C, 1 is an int, which is 32 bits on most platforms. When you try to shift it 33 bits before storing its value in an unsigned long long, that's not going to end well. You can fix this in two ways:
Use 1ULL instead, which is an unsigned long long constant:
unsigned long long y = 1ULL << 33;
Assign the value, then shift it:
unsigned long long y = 1;
y <<= 33;
Both are valid, but I'd suggest the first one since it's shorter and you can make y const.
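A quick hedged check of the first fix (assuming unsigned long long is 64 bits):
#include <stdio.h>

int main(void) {
    unsigned long long y = 1ULL << 33;
    printf("%llu\n", y); /* 8589934592, i.e. 2 to the 33rd power */
}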

store 2 signed shorts in one unsigned int

This is given:
signed short a, b;
a = -16;
b = 340;
Now I want to store these 2 signed shorts in one unsigned int and later retrieve these 2 signed shorts again. I tried this but the resulting shorts are not the same:
unsigned int c = a << 16 | b;
signed short ar, br;
ar = c >> 16;
br = c & 0xFFFF;
OP almost had it right
#include <assert.h>
#include <limits.h>
unsigned ab_to_c(signed short a, signed short b) {
assert(SHRT_MAX == 32767);
assert(UINT_MAX == 4294967295);
// unsigned int c = a << 16 | b; fails as `b` gets sign extended before the `|`.
// *1u ensures the shift of `a` is done as `unsigned` to avoid UB
// of shifting into the sign bit.
unsigned c = (a*1u << 16) | (b & 0xFFFF);
return c;
}
void c_to_ab(unsigned c, signed short *a, signed short *b) {
*a = c >> 16;
*b = c & 0xFFFF;
}
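A hedged round-trip check of these helpers, appended after the definitions above and using the question's values:
#include <stdio.h>

int main(void) {
    signed short a = -16, b = 340, ar, br;
    unsigned c = ab_to_c(a, b);           /* pack */
    c_to_ab(c, &ar, &br);                 /* unpack */
    printf("ar: %hd, br: %hd\n", ar, br); /* expected: ar: -16, br: 340 */
}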
Since a has a negative value,
unsigned int c = a << 16 | b;
results in undefined behavior.
From the C99 standard (emphasis mine):
6.5.7 Bitwise shift operators
4 The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. If E1 has an unsigned type, the value of the result is E1 × 2^E2, reduced modulo one more than the maximum value representable in the result type. If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.
You can explicitly cast the signed short to unsigned short to get predictable behavior.
#include <stdio.h>
int main()
{
signed short a, b;
a = -16;
b = 340;
unsigned int c = (unsigned short)a << 16 | (unsigned short)b;
signed short ar, br;
ar = c >> 16;
br = c & 0xFFFF;
printf("ar: %hd, br: %hd\n", ar, br);
}
Output:
ar: -16, br: 340
This is really weird; I've compiled your code and it works for me.
Perhaps this is undefined behavior, I'm not sure. However, if I were you
I'd add casts to explicitly avoid any bit loss that may or may not be caused by relying on two's complement or on the compiler's implicit conversions....
In my opinion, what's probably happening is that you are shifting all the bits out of
a... try this
unsigned int c = ((unsigned int) a) << 16 | b;
This is because you are using an unsigned int, which is usually 32 bits, and a negative signed short, which is usually 16 bits.
When you put a short with a negative value into an unsigned int, that "negative" bit is going to be interpreted as part of a positive number.
And so you get a vastly different number in the unsigned int.
Storing two positive numbers would solve this problem... but you might need to store a negative one.
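One hedged sketch of how to handle the negative case: mask each value to its low 16 bits before combining, so the sign extension of a negative short cannot spill into the other half:
#include <stdio.h>

int main(void) {
    signed short a = -16, b = 340;

    /* Keep only the low 16 bits of each value before placing it. */
    unsigned int c = (((unsigned int)a & 0xFFFFu) << 16) | ((unsigned int)b & 0xFFFFu);

    signed short ar = (signed short)(c >> 16);     /* -16 again (implementation-defined conversion) */
    signed short br = (signed short)(c & 0xFFFFu); /* 340 again */
    printf("ar: %hd, br: %hd\n", ar, br);
}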
Not sure if this way of doing it is good for portability or otherwise, but it is what I use...
#ifndef STDIO_H
#define STDIO_H
#include <stdio.h>
#endif
#ifndef STDINT_H
#define STDINT_H
#include <stdint.h>
#endif
#ifndef BOOLEAN_TE
#define BOOLEAN_TE
typedef enum {false, true} bool;
#endif
#ifndef UINT32_WIDTH
#define UINT32_WIDTH 32 // defined in stdint.h, inttypes.h even in libc.h... undefined ??
#endif
typedef struct{
struct{ // anonymous struct
uint32_t x;
uint32_t y;
};}ts_point;
typedef struct{
struct{ // anonymous struct
uint32_t line;
uint32_t column;
};}ts_position;
bool is_little_endian()
{
uint8_t n = 1;
return *(char *)&n == 1;
}
int main(void)
{
uint32_t x, y;
uint64_t packed;
ts_point *point;
ts_position *position;
x = -12;
y = 3457254;
printf("at start: x = %i | y = %i\n", x, y);
if (is_little_endian()){
packed = (uint64_t)y << UINT32_WIDTH | (uint64_t)x;
}else{
packed = (uint64_t)x << UINT32_WIDTH | (uint64_t)y;
}
printf("packed: position = %llu\n", packed);
point = (ts_point*)&packed;
printf("unpacked: x = %i | y = %i\n", point->x, point->y); // access via pointer
position = (ts_position*)&packed;
printf("unpacked: line = %i | column = %i\n", position->line, position->column);
return 0;
}
I like the way I do it as it offers lots of readability and can be applied in many ways, i.e. 02x32, 04x16, 08x08, etc. I'm new at C, so feel free to critique my code and my way of doing things... thanks

Concatenate two 32bit numbers to get a 64bit result

I need to concatenate two hexadecimal numbers, 32 bits each, to get a final result of 64 bits.
I tried the following code but didn't get a good result:
unsigned long a,b;
unsigned long long c;
c = (unsigned long long) (a << 32 | b);
Can anybody help me please?
Thanks.
Use proper fixed size types and be careful about type promotion and operator precedence, e.g.
#include <stdint.h>
uint32_t a, b;
uint64_t c;
c = ((uint64_t)a << 32) | b;
You need to cast a to unsigned long long before shifting it:
unsigned long long c = ((unsigned long long)a << 32 | b);
Shortest form is:
c = a+0ULL<<32|b;
The third line should be changed to
((unsigned long long)a) << 32 | ((unsigned long long) b)
What your current code is doing is taking the 32-bit variable a and shifting it 32 bits to the left (making its value 0, because all of its bits are shifted out), then or-ing it with the 32-bit variable b.
What the changed version does is cast the 32-bit variable a to 64 bits, shift it 32 bits to the left, cast the 32-bit variable b to 64 bits, then or the two 64-bit values together. The result is naturally 64 bits.
I would imagine that this would do the trick:
typedef unsigned long U64 ; // your unsigned 64-bit int typedef here
typedef unsigned int U32 ; // your unsigned 32-bit int typedef here
U64 join( U32 a , U32 b )
{
U64 result = ((U64)a) << 32
| ((U64)b)
;
return result ;
}
I'll leave it to you to divine the appropriate typedefs for U64 and U32.
