A C left shift anomaly in unsigned long long ints - c

When this code:
// Print out the powers of 2 from 1 to 63.
#include <stdio.h>
int main() {
unsigned long long n = 64;
for (unsigned long long i = 1; i < n; i++) {
unsigned long long po2 = (1 << i);
printf("%llu) %llu/%0llX\n", i, po2, po2);
}
printf("Size of unsigned long long int: %ld.\n", sizeof(unsigned long long int));
}
is run, left shift values up to 30 are correct:
1) 2/2
2) 4/4
3) 8/8
...
28) 268435456/10000000
29) 536870912/20000000
30) 1073741824/40000000
However, once i == 31 (a 32 bit left shift), the results are not correct:
31) 18446744071562067968/FFFFFFFF80000000
32) 1/1
33) 2/2
34) 4/4
35) 8/8
36) 16/10
37) 32/20
...
61) 536870912/20000000
62) 1073741824/40000000
63) 18446744071562067968/FFFFFFFF80000000
Size of unsigned long long int: 8.
This is run on a 64 bit machine, and the C Standard states:
The integer promotions are performed on each of the operands. The type
of the result is that of the promoted left operand. If the value of
the right operand is negative or is greater than or equal to the width
of the promoted left operand, the behavior is undefined.
Note that the variables, i and n, in the above code are both 64 bit integers, yet they are treated as if they were 32 bit integers!

Look at the << operation in the following line:
unsigned long long po2 = (1 << i);
What is its left operand? Well, that is the int literal, 1, and an int type will not undergo promotion, in that context1. So, the type of the result will be int, as specified in the extract from the Standard that you cited, and that int result will be converted to the required unsigned long long type … but after the overflow (undefined behaviour) has already happened.
To fix the issue, make that literal an unsigned long long, using the uLL suffix:
unsigned long long po2 = (1uLL << i);
1 Some clarity on the "context" from this cppreference page (bold emphasis mine):
Shift Operators
…
First, integer promotions are performed, individually, on each operand (Note: this is unlike
other binary arithmetic operators, which all perform usual arithmetic
conversions). The type of the result is the type of lhs after
promotion.

Adrian Mole is correct. The revised code:
// Print out the powers of 2 from 1 to 63.
#include <stdio.h>
int main() {
unsigned long long n = 64,
one = 1;
for (unsigned long long i = 1; i < n; i++) {
unsigned long long po2 = (one << i);
printf("%llu) %llu/%0llX\n", i, po2, po2);
}
printf("Size of unsigned long long int: %ld.\n", sizeof(unsigned long long int));
}
works.

Related

Why left shift 24 bits changed the value of unsigned long in C?

I expect 0b11010010 << 24 should be the same value as 0b11010010000000000000000000000000.
I tested it in C, 0b11010010 << 24 doesn't work as expected if we saved it in c unsigned long.
Does anyone know how C unsigned long works like this?
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
int main(){
unsigned long a = 0b11010010000000000000000000000000;
unsigned long b = 0b11010010 << 24;
bool isTheSame1 = a == b;
printf("isTheSame1 %d \n",isTheSame1);
bool isTheSame2 = 0b11010010000000000000000000000000 == (0b11010010 << 24);
printf("isTheSame2 %d",isTheSame2);
}
isTheSame1 should be 1 but it prints 0 as following
isTheSame1 0
isTheSame2 1
Compiled and executed by gcc main.c && ./a.out
gcc --version
Apple clang version 14.0.0 (clang-1400.0.29.202)
Target: x86_64-apple-darwin22.2.0
Thread model: posix
Updated
As Allan Wind pointed out, I added UL suffix and now it works as expected.
unsigned long a = 0b11010010000000000000000000000000UL;
unsigned long b = 0b11010010UL << 24;
bool isTheSame1 = a == b;
printf("isTheSame1 %d \n",isTheSame1);
bool isTheSame2 = 0b11010010000000000000000000000000UL == (0b11010010UL << 24);
printf("isTheSame2 %d",isTheSame2);
The constant 0b11010010 has type int which is signed. Assuming an int is 32 bits, the expression 0b11010010 << 24 will shift a "1" bit into the sign bit. Doing so triggers undefined behavior which is why you're getting strange results.
Add the UL suffix to the constant to give it type unsigned long, then the shift will work as expected.
unsigned long b = 0b11010010UL << 24;
You are doing a left shift of a signed value (see good answer of #dbush)
In absence of suffixes numbers have int or double types
b = 0b11010010 ; /* type int */
b = 1.0; /* type double */
If you want want b in your example as unsigned long use a suffix:
b = 0b11010010UL; /* type unsigned long */
or a cast:
b = (unsigned long)0b11010010; /* type unsigned long */
With 32-bit (or smaller) int, 0b11010010 << 24 is undefined behaver (UB). It attempts to shift into the sign bit.
When int is 32-bit (common), this often results in a negative value corresponding to the bit pattern 11010010-00000000-00000000-00000000.
When a negative value is saved as an unsigned long, ULONG_MAX + 1 is added to it. With a 64-bit unsigned long the value has the bit pattern:
11111111-11111111-11111111-11111111-11010010-00000000-00000000-00000000
This large unsigned long in not equal to 0b11010010000000000000000000000000UL and so the output of "isTheSame1 0".
Had OP's long been 32-bit, it "might" have worked as OP had intended - yet unfortunately still replying on UB.
Appending an L
32-bit unsigned long: 0b11010010 << 24 suffers the same UB problem as above - yet might have "worked".
64-bit unsigned long: 0b11010010L is also long and 0b11010010L << 24 becomes the value 0b11010010000000000000000000000000, the same value as a.
Appending an U
32-bit unsigned: 0b11010010U << 24 becomes the value 0b11010010000000000000000000000000, the same value as a.
16-bit unsigned: 0b11010010U << 24 is undefined behavior as the shift is too great. Often the UB results in the same as 0b11010010U << (24-16), yet this is not reliably done.
Appending an UL
32 or 64-bit unsigned long: 0b11010010UL << 24 becomes the value 0b11010010000000000000000000000000, the same value as a.
Since the left hand side of the = of the below is unsigned long, better for the right hand side constant to be unsigned long.
unsigned long b = 0b11010010 << 24; // Original
unsigned long b = 0b11010010UL << 24; // Better

Shifting on Integer Constants shows warning. How to clear this?

Reference: Suffix in Integer Constants
unsigned long long y = 1 << 33;
Results in warning:
left shift count >= width of type [-Wshift-count-overflow]
Two Questions need to be cleared from the above context:
unsigned long long type has 64-bit, why cant we do left shift in it?
how shifting works in int constants('1')?
In C language, 1 is an int which is 32 bits on most platforms. When you try to shift it 33 bits before storing its value in an unsigned long long, that's not going to end well. You can fix this in 2 ways:
Use 1ULL instead, which is an unsigned long long constant:
unsigned long long y = 1ULL << 33;
Assign the value, then shift it:
unsigned long long y = 1;
y <<= 33;
Both are valid, but I'd suggest the first one since it's shorter and you can make y const.

C: unsigned long int not behaving as expected [duplicate]

This question already has answers here:
Weird result after assigning 2^31 to a signed and unsigned 32-bit integer variable
(3 answers)
Closed 3 years ago.
This code is not behaving as expected. It simply tries to set bit 31 in an unsigned long int.
int main() {
printf("sizeof(unsigned long int) is %ld bytes\n", sizeof(unsigned long int));
unsigned long int v = 1 << 30;
printf("v is (%lx)\n", v);
v = 1 << 31;
printf("v is (%lx)\n", v);
}
Here is the output:
sizeof(unsigned long int) is 8 bytes
v is (40000000)
v is (ffffffff80000000)
Can anyone explain this? Maybe a problem with the printf formatting?
In v = 1 << 31;, 1 is not an unsigned long int. It is an int. Shifting it by 31 bits overflows the int type (in your C implementation).
To get an unsigned long int with a 1 in bit 31, you should shift an unsigned long int by 31 bits: v = (unsigned long int) 1 << 31; or v = 1ul << 31.

Iterate bits from left to right for any number

I am trying to implement Modular Exponentiation (square and multiply left to right) algorithm in c.
In order to iterate the bits from left to right, I can use masking which is explained in this link
In this example mask used is 0x80 which can work only for a number with max 8 bits.
In order to make it work for any number of bits, I need to assign mask dynamically but this makes it a bit complicated.
Is there any other solution by which it can be done.
Thanks in advance!
-------------EDIT-----------------------
long long base = 23;
long long exponent = 297;
long long mod = 327;
long long result = 1;
unsigned int mask;
for (mask = 0x80; mask != 0; mask >>= 1) {
result = (result * result) % mod; // Square
if (exponent & mask) {
result = (base * result) % mod; // Mul
}
}
As in this example, it will not work if I will use mask 0x80 but if I use 0x100 then it works fine.
Selecting the mask value at run time seems to be an overhead.
If you want to iterate over all bits, you first have to know how many bits there are in your type.
This is a surprisingly complicated matter:
sizeof gives you the number of bytes, but a byte can have more than 8 bits.
limits.h gives you CHAR_BIT to know the number of bits in a byte, but even if you multiply this by the sizeof your type, the result could still be wrong because unsigned types are allowed to contain padding bits that are not part of the number representation, while sizeof returns the storage size in bytes, which includes these padding bits.
Fortunately, this answer has an ingenious macro that can calculate the number of actual value bits based on the maximum value of the respective type:
#define IMAX_BITS(m) ((m) /((m)%0x3fffffffL+1) /0x3fffffffL %0x3fffffffL *30 \
+ (m)%0x3fffffffL /((m)%31+1)/31%31*5 + 4-12/((m)%31+3))
The maximum value of an unsigned type is surprisingly easy to get: just cast -1 to your unsigned type.
So, all in all, your code could look like this, including the macro above:
#define UNSIGNED_BITS IMAX_BITS((unsigned)-1)
// [...]
unsigned int mask;
for (mask = 1 << (UNSIGNED_BITS-1); mask != 0; mask >>= 1) {
// [...]
}
Note that applying this complicated macro has no runtime drawback at all, it's a compile-time constant.
Your algorithm seems unnecessarily complicated: bits from the exponent can be tested from the least significant to the most significant in a way that does not depend on the integer type nor its maximum value. Here is a simple implementation that does not need any special case for any size integers:
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char **argv) {
unsigned long long base = (argc > 1) ? strtoull(argv[1], NULL, 0) : 23;
unsigned long long exponent = (argc > 2) ? strtoull(argv[2], NULL, 0) : 297;
unsigned long long mod = (argc > 3) ? strtoull(argv[3], NULL, 0) : 327;
unsigned long long y = exponent;
unsigned long long x = base;
unsigned long long result = 1;
for (;;) {
if (y & 1) {
result = result * x % mod;
}
if ((y >>= 1) == 0)
break;
x = x * x % mod;
}
printf("expmod(%llu, %llu, %llu) = %llu\n", base, exponent, mod, result);
return 0;
}
Without any command line arguments, it produces: expmod(23, 297, 327) = 185. You can try other numbers by passing the base, exponent and modulo as command line arguments.
EDIT:
If you must scan the bits in exponent from most significant to least significant, mask should be defined as the same type as exponent and initialized this way if the type is unsigned:
unsigned long long exponent = 297;
unsigned long long mask = 0;
mask = ~mask - (~mask >> 1);
If the type is signed, for complete portability, you must use the definition for its maximum value from <limits.h>. Note however that it would be more efficient to use the unsigned type.
long long exponent = 297;
long long mask = LLONG_MAX - (LLONG_MAX >> 1);
The loop will waste time running through all the most significant 0 bits, so a simpler loop could be used first to skip these bits:
while (mask > exponent) {
mask >>= 1;
}

Can I use CHAR_BIT as the basis for determining the number of bits in other types?

For example, does the following code make no assumptions that might be incorrect on certain systems?
// Number of bits in an unsigned long long int:
const int ULLONG_BIT = sizeof(unsigned long long int) * CHAR_BIT;
I agree with PSkocik's comment to the original question. C11 6.2.6 says CHAR_BIT * sizeof (type) yields the number of bits in the object representation of type type, but some of them may be padding bits.
I suspect that your best bet for a "no-assumptions" code is to simply check the value of ULLONG_MAX (or ~0ULL or (unsigned long long)(-1LL), which should all evaluate to the same value):
#include <limits.h>
static inline int ullong_bit(void)
{
unsigned long long m = ULLONG_MAX;
int n = 0, i = 0;
while (m) {
n += m & 1;
i ++;
m >>= 1;
}
if (n == i)
return i;
else
return i-1;
}
If the binary pattern for the value is all ones, then the number of bits an unsigned long long can hold is the same as the number of binary digits in the value.
Otherwise, the most significant bit cannot really be used, because the maximum value in binary contains zeros.

Resources