C: unsigned long int not behaving as expected

This code is not behaving as expected. It simply tries to set bit 31 in an unsigned long int.
int main() {
printf("sizeof(unsigned long int) is %zu bytes\n", sizeof(unsigned long int));
unsigned long int v = 1 << 30;
printf("v is (%lx)\n", v);
v = 1 << 31;
printf("v is (%lx)\n", v);
}
Here is the output:
sizeof(unsigned long int) is 8 bytes
v is (40000000)
v is (ffffffff80000000)
Can anyone explain this? Maybe a problem with the printf formatting?

In v = 1 << 31;, 1 is not an unsigned long int. It is an int. Shifting it by 31 bits overflows the int type (in your C implementation).
To get an unsigned long int with a 1 in bit 31, you should shift an unsigned long int by 31 bits: v = (unsigned long int) 1 << 31; or v = 1ul << 31.
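The fix above can be wrapped in a small helper. This is only a sketch: the name `bit_ul` is made up for illustration, and it assumes `n` is smaller than the width of `unsigned long`, because shifting by the width or more is undefined.

```c
/* Hypothetical helper: returns an unsigned long with only bit n set.
 * The operand 1UL already has type unsigned long, so the shift is carried
 * out in that type rather than in int.
 * Precondition (assumed): n is less than the width of unsigned long. */
unsigned long bit_ul(unsigned n)
{
    return 1UL << n;
}
```

With this helper, `printf("v is (%lx)\n", bit_ul(31));` prints `v is (80000000)`.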

Related

Why left shift 24 bits changed the value of unsigned long in C?

I expect 0b11010010 << 24 should be the same value as 0b11010010000000000000000000000000.
I tested it in C: 0b11010010 << 24 does not give the expected value when the result is stored in an unsigned long.
Does anyone know why C's unsigned long behaves like this?
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
int main(){
unsigned long a = 0b11010010000000000000000000000000;
unsigned long b = 0b11010010 << 24;
bool isTheSame1 = a == b;
printf("isTheSame1 %d \n",isTheSame1);
bool isTheSame2 = 0b11010010000000000000000000000000 == (0b11010010 << 24);
printf("isTheSame2 %d",isTheSame2);
}
isTheSame1 should be 1, but it prints 0, as follows:
isTheSame1 0
isTheSame2 1
Compiled and executed by gcc main.c && ./a.out
gcc --version
Apple clang version 14.0.0 (clang-1400.0.29.202)
Target: x86_64-apple-darwin22.2.0
Thread model: posix
Updated
As Allan Wind pointed out, I added UL suffix and now it works as expected.
unsigned long a = 0b11010010000000000000000000000000UL;
unsigned long b = 0b11010010UL << 24;
bool isTheSame1 = a == b;
printf("isTheSame1 %d \n",isTheSame1);
bool isTheSame2 = 0b11010010000000000000000000000000UL == (0b11010010UL << 24);
printf("isTheSame2 %d",isTheSame2);
The constant 0b11010010 has type int which is signed. Assuming an int is 32 bits, the expression 0b11010010 << 24 will shift a "1" bit into the sign bit. Doing so triggers undefined behavior which is why you're getting strange results.
Add the UL suffix to the constant to give it type unsigned long, then the shift will work as expected.
unsigned long b = 0b11010010UL << 24;
You are doing a left shift of a signed value (see the good answer from @dbush).
Without a suffix, an integer constant has type int (when its value fits) and a floating constant has type double:
b = 0b11010010 ; /* type int */
b = 1.0; /* type double */
If you want b in your example to be an unsigned long, use a suffix:
b = 0b11010010UL; /* type unsigned long */
or a cast:
b = (unsigned long)0b11010010; /* type unsigned long */
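The types claimed above can be checked at compile time with C11 `_Generic`. The sketch below is illustrative only: `KIND_OF`, `enum kind` and `kind_name` are invented names, and the constant is written in hex (`0xD2`) because binary literals are an extension before C23.

```c
/* Hypothetical classification of the types discussed in this answer. */
enum kind { KIND_INT, KIND_UINT, KIND_LONG, KIND_ULONG, KIND_DOUBLE, KIND_OTHER };

/* Maps an expression to the kind of its type; the expression is not evaluated. */
#define KIND_OF(x) _Generic((x),      \
    int:           KIND_INT,          \
    unsigned int:  KIND_UINT,         \
    long:          KIND_LONG,         \
    unsigned long: KIND_ULONG,        \
    double:        KIND_DOUBLE,       \
    default:       KIND_OTHER)

/* Readable name for printing a kind. */
const char *kind_name(enum kind k)
{
    switch (k) {
    case KIND_INT:    return "int";
    case KIND_UINT:   return "unsigned int";
    case KIND_LONG:   return "long";
    case KIND_ULONG:  return "unsigned long";
    case KIND_DOUBLE: return "double";
    default:          return "other";
    }
}
```

For example, `kind_name(KIND_OF(0xD2 << 8))` yields "int", while `kind_name(KIND_OF(0xD2UL << 24))` yields "unsigned long": the type of a shift follows its left operand.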
With 32-bit (or smaller) int, 0b11010010 << 24 is undefined behavior (UB). It attempts to shift into the sign bit.
When int is 32-bit (common), this often results in a negative value corresponding to the bit pattern 11010010-00000000-00000000-00000000.
When a negative value is saved as an unsigned long, ULONG_MAX + 1 is added to it. With a 64-bit unsigned long the value has the bit pattern:
11111111-11111111-11111111-11111111-11010010-00000000-00000000-00000000
This large unsigned long is not equal to 0b11010010000000000000000000000000UL, hence the output "isTheSame1 0".
Had OP's long been 32-bit, it "might" have worked as OP had intended - yet unfortunately still relying on UB.
Appending an L
32-bit long: 0b11010010L << 24 suffers the same UB problem as above - yet might have "worked".
64-bit long: 0b11010010L has type long, and 0b11010010L << 24 becomes the value 0b11010010000000000000000000000000, the same value as a.
Appending a U
32-bit unsigned: 0b11010010U << 24 becomes the value 0b11010010000000000000000000000000, the same value as a.
16-bit unsigned: 0b11010010U << 24 is undefined behavior as the shift is too great. Often the UB results in the same as 0b11010010U << (24-16), yet this is not reliably done.
Appending a UL
32 or 64-bit unsigned long: 0b11010010UL << 24 becomes the value 0b11010010000000000000000000000000, the same value as a.
Since the left hand side of the = of the below is unsigned long, better for the right hand side constant to be unsigned long.
unsigned long b = 0b11010010 << 24; // Original
unsigned long b = 0b11010010UL << 24; // Better
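The rule that a negative value gains ULONG_MAX + 1 when converted to unsigned long can be tried in isolation. A minimal sketch, with `int_to_ulong` as a made-up name:

```c
#include <limits.h>

/* Hypothetical helper: the int -> unsigned long conversion made explicit.
 * For a negative v the result is v + ULONG_MAX + 1 (C11 6.3.1.3p2);
 * non-negative values are unchanged. */
unsigned long int_to_ulong(int v)
{
    return (unsigned long)v;
}
```

Assuming a 32-bit int and a 64-bit unsigned long, `int_to_ulong(INT_MIN)` is 0xFFFFFFFF80000000: the upper 32 bits are all ones, which is the same effect that makes b differ from a in the question.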

Using bitwise operators to change all bits of the most significant byte to 1

Let's suppose I have an unsigned int x = 0x87654321. How can I use bitwise operators to set every bit of the most significant byte (the leftmost 8 bits) of this number to 1?
So, instead of 0x87654321, I would have 0xFF654321?
As an unsigned in C may be 32 bits, 16 bits or other sizes, it is best to write code that does not assume the width.
The value UINT_MAX has all value bits set.
A "byte" in C is CHAR_BIT wide - usually 8.
UINT_MAX ^ (UINT_MAX >> CHAR_BIT) or ~(UINT_MAX >> CHAR_BIT) is the desired mask.
#include <limits.h>
#include <stdio.h>
#define UPPER_BYTE_MASK (UINT_MAX ^ (UINT_MAX >> CHAR_BIT))
// or, equivalently (only one definition may be active at a time):
// #define UPPER_BYTE_MASK (~(UINT_MAX >> CHAR_BIT))
int main() {
unsigned value = 0x87654321;
printf("%X\n", value | UPPER_BYTE_MASK);
}
#include <limits.h>
#include <stdio.h>
// Sets every bit of the most significant byte of x, whatever the size of its type.
#define MSB1(x) ((x) | (((1ULL << CHAR_BIT) - 1)<< ((sizeof(x) - 1) * CHAR_BIT)))
int main(void)
{
// Initialized so that no indeterminate value is read.
char x = 0;
short y = 0;
int z = 0;
long q = 0;
long long l = 0;
printf("0x%llx\n", (unsigned long long)MSB1(x));
printf("0x%llx\n", (unsigned long long)MSB1(y));
printf("0x%llx\n", (unsigned long long)MSB1(z));
printf("0x%llx\n", (unsigned long long)MSB1(q));
printf("0x%llx\n", (unsigned long long)MSB1(l));
l = MSB1(l);
}
If you know the size of the integer, you can simply use something like
x |= 0xFF000000;
If not, you'll need to calculate the mask. One way:
x |= UINT_MAX - ( UINT_MAX >> 8 );
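When the width is fixed, the same idea extends to any byte position. The following is a sketch under that assumption: it uses `uint32_t` rather than `unsigned int`, and `set_byte` is a made-up name.

```c
#include <stdint.h>

/* Hypothetical helper: sets all 8 bits of byte `index` of a 32-bit value,
 * where index 0 is the least significant byte and index 3 the most
 * significant. Precondition (assumed): index < 4. */
uint32_t set_byte(uint32_t x, unsigned index)
{
    /* The mask is built from an unsigned operand, so the shift cannot
     * overflow a signed int. */
    return x | ((uint32_t)0xFFu << (8u * index));
}
```

`set_byte(0x87654321u, 3)` gives 0xFF654321, the value asked for in the question.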

A C left shift anomaly in unsigned long long ints

When this code:
// Print out the powers of 2 from 1 to 63.
#include <stdio.h>
int main() {
unsigned long long n = 64;
for (unsigned long long i = 1; i < n; i++) {
unsigned long long po2 = (1 << i);
printf("%llu) %llu/%0llX\n", i, po2, po2);
}
printf("Size of unsigned long long int: %zu.\n", sizeof(unsigned long long int));
}
is run, left shift values up to 30 are correct:
1) 2/2
2) 4/4
3) 8/8
...
28) 268435456/10000000
29) 536870912/20000000
30) 1073741824/40000000
However, once i == 31 (a 31-bit left shift, which reaches the sign bit of a 32-bit int), the results are not correct:
31) 18446744071562067968/FFFFFFFF80000000
32) 1/1
33) 2/2
34) 4/4
35) 8/8
36) 16/10
37) 32/20
...
61) 536870912/20000000
62) 1073741824/40000000
63) 18446744071562067968/FFFFFFFF80000000
Size of unsigned long long int: 8.
This is run on a 64 bit machine, and the C Standard states:
The integer promotions are performed on each of the operands. The type
of the result is that of the promoted left operand. If the value of
the right operand is negative or is greater than or equal to the width
of the promoted left operand, the behavior is undefined.
Note that the variables, i and n, in the above code are both 64 bit integers, yet they are treated as if they were 32 bit integers!
Look at the << operation in the following line:
unsigned long long po2 = (1 << i);
What is its left operand? Well, that is the int literal, 1, and an int type will not undergo promotion in that context [1]. So, the type of the result will be int, as specified in the extract from the Standard that you cited, and that int result will be converted to the required unsigned long long type … but after the overflow (undefined behaviour) has already happened.
To fix the issue, make that literal an unsigned long long, using the uLL suffix:
unsigned long long po2 = (1uLL << i);
[1] Some clarity on the "context" from this cppreference page (bold emphasis mine):
Shift Operators
…
First, integer promotions are performed, individually, on each operand (Note: this is unlike
other binary arithmetic operators, which all perform usual arithmetic
conversions). The type of the result is the type of lhs after
promotion.
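The rule quoted above, that the result type of a shift is the promoted type of the left operand only, can be checked with C11 `_Generic`. A sketch with made-up names (`IS_INT`, `IS_ULL` and the three functions are illustrative):

```c
#include <stdbool.h>

/* True when expression E has exactly the named type; E is not evaluated. */
#define IS_INT(E) _Generic((E), int: true, default: false)
#define IS_ULL(E) _Generic((E), unsigned long long: true, default: false)

/* 1 << i keeps the type of the left operand, int, even though i is wider. */
bool int_shift_stays_int(void)
{
    unsigned long long i = 3;
    return IS_INT(1 << i);
}

/* 1ULL << i has type unsigned long long. */
bool ull_shift_is_ull(void)
{
    unsigned long long i = 3;
    return IS_ULL(1ULL << i);
}

/* By contrast, + applies the usual arithmetic conversions to both operands. */
bool addition_is_ull(void)
{
    unsigned long long i = 3;
    return IS_ULL(1 + i);
}
```

All three functions return true, which is the difference between shifts and the other binary arithmetic operators that the quote describes.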
Adrian Mole is correct. The revised code:
// Print out the powers of 2 from 1 to 63.
#include <stdio.h>
int main() {
unsigned long long n = 64,
one = 1;
for (unsigned long long i = 1; i < n; i++) {
unsigned long long po2 = (one << i);
printf("%llu) %llu/%0llX\n", i, po2, po2);
}
printf("Size of unsigned long long int: %zu.\n", sizeof(unsigned long long int));
}
works.

When R.H.S have negative int and unsigned int outside the range of int in arithmetic operation

I apologize for the title since I had to somehow find a unique one.
Consider the code below:
#include<stdio.h>
int main(void)
{
int b = 2147483648; // To show the maximum value of int type here is 2147483647
printf("%d\n",b);
unsigned int a = 2147483650;
unsigned int c = a+(-1);
printf("%u\n",c);
}
The output of the above program when run on a 64 bit OS with gcc compiler is:
-2147483648
2147483649
Please see my understanding of the case:
unsigned int a is outside the range of the signed int type. On the R.H.S., (-1) will be converted to unsigned int since the operands are of different types. The result of converting -1 to unsigned int is:
-1 + (UINT_MAX + 1) = UINT_MAX = 4294967295.
Now the R.H.S. will be:
UINT_MAX + 2147483650
Now this looks like it is outside the range of unsigned int. I do not know how to proceed from here and it looks like even if I proceed with this explanation I will not reach the empirical output.
Please give a proper explanation.
PS: To know how int b = 2147483648 became -2147483648 is not my intention. I just added that line in the code so it is pretty clear that 2147483650
is outside the range of int.
2147483648 is not a 32-bit int, it is just above INT_MAX whose value is 2147483647 on such platforms.
int b = 2147483648; is implementation defined. On your platform, it seems to perform 32-bit wrap around, which is typical of two's complement architectures but not guaranteed by the C Standard.
As a consequence printf("%d\n", b); outputs -2147483648.
The rest of the code is perfectly defined on systems with a 32-bit int, and the output 2147483649 is correct and expected. The fact that the OS is 64-bit plays a very subtle role in the evaluation steps but is mostly irrelevant to the actual result, which is fully defined by the C Standard.
Here are the steps:
unsigned int a = 2147483650; no surprise here, a is an unsigned int and its initializer is either an int, a long int or a long long int depending on which of these types has at least 32 value bits. On Windows and 32-bit linux, it would be long long int whereas on 64-bit linux it would be long int. The value is converted to unsigned int upon storing into the variable; since 2147483650 fits in 32 bits, it is stored unchanged.
You can verify these steps by adding this code:
printf("sizeof(2147483650) -> %d\n", (int)sizeof(2147483650));
printf(" sizeof(a) -> %d\n", (int)sizeof(a));
The second definition unsigned int c = a+(-1); undergoes the same steps:
c is defined as an unsigned int and its initializer is truncated to 32 bits when stored into c. The initializer is an addition:
the first term is an unsigned int with value 2147483650U.
the second term is a parenthesized expression with the unary negation of an int with value 1. Hence it is an int with value -1 as you correctly analyzed.
the second term is converted to unsigned int: conversion is performed modulo 2^32, hence the value is 4294967295U.
the addition is then performed using unsigned arithmetic, which is specified as taking place modulo 2^N, where N is the width of the unsigned int type, hence the result is an unsigned int with value 2147483649U (6442450945 modulo 2^32)
This unsigned int value is stored into c and prints correctly with printf("%u\n", c); as 2147483649.
If the expression had been instead 2147483650 + (-1), the computation would have taken place in 64 bits signed arithmetics, with type long int or long long int depending on the architecture, with a result of 2147483649. This value would then be truncated to 32-bits when stored into c, hence the same value for c as 2147483649.
Note that the above steps do not depend on the actual representation of negative values. They are fully defined for all architectures, only the width of type int matters.
You can verify these steps with extra code. Here is a complete instrumented program to illustrate these steps:
#include <limits.h>
#include <stdio.h>
int main(void) {
printf("\n");
printf(" sizeof(int) -> %d\n", (int)sizeof(int));
printf(" sizeof(unsigned int) -> %d\n", (int)sizeof(unsigned int));
printf(" sizeof(long int) -> %d\n", (int)sizeof(long int));
printf(" sizeof(long long int) -> %d\n", (int)sizeof(long long int));
printf("\n");
int b = 2147483647; // To show the maximum value of int type here is 2147483647
printf(" int b = 2147483647;\n");
printf(" b -> %d\n",b);
printf(" sizeof(b) -> %d\n", (int)sizeof(b));
printf(" sizeof(2147483647) -> %d\n", (int)sizeof(2147483647));
printf(" sizeof(2147483648) -> %d\n", (int)sizeof(2147483648));
printf(" sizeof(2147483648U) -> %d\n", (int)sizeof(2147483648U));
printf("\n");
unsigned int a = 2147483650;
printf(" unsigned int a = 2147483650;\n");
printf(" a -> %u\n", a);
printf(" sizeof(2147483650U) -> %d\n", (int)sizeof(2147483650U));
printf(" sizeof(2147483650) -> %d\n", (int)sizeof(2147483650));
printf("\n");
unsigned int c = a+(-1);
printf(" unsigned int c = a+(-1);\n");
printf(" c -> %u\n", c);
printf(" sizeof(c) -> %d\n", (int)sizeof(c));
printf(" a+(-1) -> %u\n", a+(-1));
printf(" sizeof(a+(-1)) -> %d\n", (int)sizeof(a+(-1)));
#if LONG_MAX == 2147483647
printf(" 2147483650+(-1) -> %lld\n", 2147483650+(-1));
#else
printf(" 2147483650+(-1) -> %ld\n", 2147483650+(-1));
#endif
printf(" sizeof(2147483650+(-1)) -> %d\n", (int)sizeof(2147483650+(-1)));
printf(" 2147483650U+(-1) -> %u\n", 2147483650U+(-1));
printf("sizeof(2147483650U+(-1)) -> %d\n", (int)sizeof(2147483650U+(-1)));
printf("\n");
return 0;
}
Output:
sizeof(int) -> 4
sizeof(unsigned int) -> 4
sizeof(long int) -> 8
sizeof(long long int) -> 8
int b = 2147483647;
b -> 2147483647
sizeof(b) -> 4
sizeof(2147483647) -> 4
sizeof(2147483648) -> 8
sizeof(2147483648U) -> 4
unsigned int a = 2147483650;
a -> 2147483650
sizeof(2147483650U) -> 4
sizeof(2147483650) -> 8
unsigned int c = a+(-1);
c -> 2147483649
sizeof(c) -> 4
a+(-1) -> 2147483649
sizeof(a+(-1)) -> 4
2147483650+(-1) -> 2147483649
sizeof(2147483650+(-1)) -> 8
2147483650U+(-1) -> 2147483649
sizeof(2147483650U+(-1)) -> 4
int b = 2147483648;
printf("%d\n",b);
// -2147483648
Conversion of an integer (any signed or unsigned) that is outside the range of the target signed type:
... either the result is implementation-defined or an implementation-defined signal is raised. C11 §6.3.1.3 3
In your case with the signed integer 2147483648, the implementation-defined behavior appears to map the lowest 32-bits of the source 2147483648 to your int's 32-bits. This may not be the result with another compiler.
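Code that must not depend on the implementation-defined result can test the range before converting. A minimal sketch, with `to_int_checked` as a made-up name:

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: stores v in *out and returns true only when v is
 * representable as an int; otherwise returns false and leaves *out alone,
 * so no out-of-range conversion ever takes place. */
bool to_int_checked(long long v, int *out)
{
    if (v < INT_MIN || v > INT_MAX) {
        return false;
    }
    *out = (int)v;
    return true;
}
```

With a 32-bit int, `to_int_checked(2147483648, &b)` returns false instead of silently producing -2147483648.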
In a+(-1), the int value -1 is converted to unsigned, which yields -1 + UINT_MAX + 1, that is, UINT_MAX, so the expression is the same as a + UINT_MAX. The addition overflows the unsigned range, yet unsigned overflow wraps around. So the sum is 2147483649 before the assignment. With the code below, no conversion has an implementation-defined result. The only conversions are int -1 to unsigned (well defined, as shown) and long 2147483650 (or long long 2147483650) to unsigned 2147483650 (in range).
unsigned int a = 2147483650;
unsigned int c = a+(-1);
printf("%u\n",c);
// 2147483649
Look at it like this
  2147483650      0x80000002
+         -1     +0xFFFFFFFF
  ----------      ----------
  2147483649      0x80000001
Where does the 0xFFFFFFFF come from? Well, 0 is 0x00000000, and if you subtract 1 from that you get 0xFFFFFFFF because unsigned arithmetic is well-defined to "wrap".
Or taking your decimal version further, 0 - 1 is UINT_MAX because unsigned int wraps, and so does the sum.
your value      2147483650
UINT_MAX      + 4294967295
                ----------
                6442450945
modulo 2^32   % 4294967296
                ----------
                2147483649
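The same arithmetic can be spelled out in code. This sketch assumes a 32-bit unsigned type, written as `uint32_t` to make the width explicit; `add_mod32` is a made-up name. It performs the sum in 64 bits and reduces it modulo 2^32, which is what the 32-bit unsigned addition does implicitly.

```c
#include <stdint.h>

/* Hypothetical helper: adds a signed offset to a 32-bit unsigned value the
 * long way round, mirroring the worked example. */
uint32_t add_mod32(uint32_t a, int32_t b)
{
    uint64_t b_converted = (uint32_t)b;            /* -1 becomes 4294967295 */
    uint64_t wide_sum = (uint64_t)a + b_converted; /* no wrap in 64 bits    */
    return (uint32_t)(wide_sum % UINT64_C(4294967296)); /* reduce mod 2^32  */
}
```

`add_mod32(2147483650u, -1)` returns 2147483649, matching both the decimal working above and the program's output.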

store 2 signed shorts in one unsigned int

This is given:
signed short a, b;
a = -16;
b = 340;
Now I want to store these 2 signed shorts in one unsigned int and later retrieve these 2 signed shorts again. I tried this but the resulting shorts are not the same:
unsigned int c = a << 16 | b;
signed short ar, br;
ar = c >> 16;
br = c & 0xFFFF;
OP almost had it right
#include <assert.h>
#include <limits.h>
unsigned ab_to_c(signed short a, signed short b) {
assert(SHRT_MAX == 32767);
assert(UINT_MAX == 4294967295);
// unsigned int c = a << 16 | b; fails as `b` gets sign extended before the `|`.
// *1u ensures the shift of `a` is done as `unsigned` to avoid UB
// of shifting into the sign bit.
unsigned c = (a*1u << 16) | (b & 0xFFFF);
return c;
}
void c_to_ab(unsigned c, signed short *a, signed short *b) {
*a = c >> 16;
*b = c & 0xFFFF;
}
Since a has a negative value,
unsigned int c = a << 16 | b;
results in undefined behavior.
From the C99 standard (emphasis mine):
6.5.7 Bitwise shift operators
4 The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. If E1 has an unsigned type, the value of the result is E1 × 2^E2, reduced modulo one more than the maximum value representable in the result type. If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.
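You can explicitly cast each signed short to unsigned short to get predictable behavior. The high half must also be converted to unsigned int before the shift; otherwise the unsigned short is promoted to int, and the shift can still overflow when int is 32 bits wide.
#include <stdio.h>
int main()
{
signed short a, b;
a = -16;
b = 340;
unsigned int c = (unsigned int)(unsigned short)a << 16 | (unsigned short)b;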
signed short ar, br;
ar = c >> 16;
br = c & 0xFFFF;
printf("ar: %hd, br: %hd\n", ar, br);
}
Output:
ar: -16, br: 340
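Converting an out-of-range value such as 65520 back to signed short is implementation-defined, although it gives -16 on common platforms. A sketch that avoids depending on that, assuming the fixed-width types from <stdint.h> are available; all function names here are made up:

```c
#include <stdint.h>

/* Packs two 16-bit signed values into one 32-bit unsigned value. Converting
 * to uint16_t wraps modulo 65536, which is well defined, and the shift is
 * done on an unsigned 32-bit operand. */
uint32_t pack_i16(int16_t hi, int16_t lo)
{
    return (uint32_t)(uint16_t)hi << 16 | (uint16_t)lo;
}

/* Maps a 16-bit pattern back to its signed value using arithmetic only,
 * so no out-of-range conversion to a signed type occurs. */
static int16_t u16_to_i16(uint16_t u)
{
    return (int16_t)(u <= INT16_MAX ? (int32_t)u : (int32_t)u - 65536);
}

int16_t unpack_hi(uint32_t packed)
{
    return u16_to_i16((uint16_t)(packed >> 16));
}

int16_t unpack_lo(uint32_t packed)
{
    return u16_to_i16((uint16_t)(packed & 0xFFFFu));
}
```

For the values in the question, `pack_i16(-16, 340)` is 0xFFF00154, and the two unpack functions return -16 and 340 again.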
This is strange: I compiled your code and it works for me.
It may be undefined behavior, I'm not sure. If I were you,
I'd add casts so that no bits are lost to two's complement sign handling or implicit conversions.
My guess is that you are shifting out all the bits of
a. Try this:
unsigned int c = ((unsigned int) a) << 16 | b;
This is because you are using an unsigned int, which is usually 32 bits, and a negative signed short, which is usually 16 bits.
When a short with a negative value is converted to unsigned int, it is sign-extended, so the sign bit and all the bits above it become part of a large positive number.
And so you get a vastly different number in the unsigned int.
Storing two non-negative numbers would avoid the problem, but you might need to store a negative one.
I'm not sure whether this approach is good for portability or otherwise, but I use:
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#ifndef UINT32_WIDTH
#define UINT32_WIDTH 32 // <stdint.h> only provides this macro from C23 on
#endif
typedef struct{
int32_t x;
int32_t y;
}ts_point;
typedef struct{
int32_t line;
int32_t column;
}ts_position;
bool is_little_endian(void)
{
uint16_t n = 1; // must be wider than one byte, or the test is always true
return *(unsigned char *)&n == 1;
}
int main(void)
{
int32_t x, y;
uint64_t packed;
ts_point point;
ts_position position;
x = -12;
y = 3457254;
printf("at start: x = %" PRId32 " | y = %" PRId32 "\n", x, y);
// Go through uint32_t so a negative value is not sign-extended to 64 bits.
if (is_little_endian()){
packed = (uint64_t)(uint32_t)y << UINT32_WIDTH | (uint32_t)x;
}else{
packed = (uint64_t)(uint32_t)x << UINT32_WIDTH | (uint32_t)y;
}
printf("packed: position = %" PRIu64 "\n", packed);
// Copy the bytes instead of casting the pointer, which would break strict aliasing.
memcpy(&point, &packed, sizeof point);
printf("unpacked: x = %" PRId32 " | y = %" PRId32 "\n", point.x, point.y);
memcpy(&position, &packed, sizeof position);
printf("unpacked: line = %" PRId32 " | column = %" PRId32 "\n", position.line, position.column);
return 0;
}
I like this approach because it is readable and can be applied in many ways, e.g. 2x32, 4x16, 8x8, etc. I'm new to C, so feel free to critique my code and my approach. Thanks.
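The 4x16 layout mentioned above can also be handled with shifts alone. Shifts operate on values rather than on memory layout, so no endianness check is needed. This is only a sketch: `set_lane16` and `get_lane16` are made-up names, and lane 0 is taken to be the least significant 16 bits.

```c
#include <stdint.h>

/* Hypothetical helper: returns `packed` with 16-bit lane `lane` replaced by
 * `value`. Precondition (assumed): lane < 4. */
uint64_t set_lane16(uint64_t packed, unsigned lane, uint16_t value)
{
    unsigned shift = 16u * lane;
    uint64_t cleared = packed & ~((uint64_t)0xFFFFu << shift); /* zero the lane */
    return cleared | ((uint64_t)value << shift);               /* insert value  */
}

/* Hypothetical helper: reads 16-bit lane `lane`. Precondition: lane < 4. */
uint16_t get_lane16(uint64_t packed, unsigned lane)
{
    return (uint16_t)(packed >> (16u * lane));
}
```

Signed values can be stored by converting them to uint16_t first, as in the packing of two shorts discussed earlier in this thread.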
