I Google'd so hard before I wrote this question. I am an "ok" C & C++ programmer, but not expert level. Everything that I read tells me that unsigned integer overflow is safe in C. However, signed integer overflow is "undefined behaviour" (UB). Oh, dreaded UB!
Related: Why is unsigned integer overflow defined behavior but signed integer overflow isn't?
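For reference, a minimal sketch of that rule as I understand it: the unsigned increment is guaranteed to wrap to 0, while the signed increment has to be guarded to stay defined.

#include <limits.h>
#include <stdio.h>

int main(void)
{
    unsigned int u = UINT_MAX;
    u = u + 1u;                    /* defined: wraps around to 0 */
    printf("UINT_MAX + 1 wraps to %u\n", u);

    int i = INT_MAX;
    /* i = i + 1; would be undefined behaviour in plain C, so guard it */
    if (i < INT_MAX)
        i = i + 1;
    printf("i stays at %d\n", i);
    return 0;
}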
Win32 API:
LONG InterlockedIncrement(
[in, out] LONG volatile *Addend
)
Ref: https://learn.microsoft.com/en-us/windows/win32/api/winnt/nf-winnt-interlockedincrement
To be clear, LONG is defined as signed 32-bit int.
By inspection (not definition/docs), this Win32 API appears to support signed integer overflow.
Example code:
#include <stdio.h>
#include <windows.h>
#include <limits.h>
int main(int argc, char **argv)
{
printf("INT_MAX: %d\n", INT_MAX);
LONG zz = INT_MAX;
// Be careful about: 1 + 2,147,483,647 -> -2,147,483,648
const LONG zzz = InterlockedIncrement (&zz);
printf("INT_MAX+1: %d and %d\n", ((LONG) 1) + INT_MAX, zzz);
return 0;
}
When I compile on Cygwin with gcc, then run, I see:
INT_MAX: 2147483647
INT_MAX+1: -2147483648 and -2147483648
I am confused.
Do I misunderstand the rules of signed integer overflow for C?
Does Windows have special rules?
Is this argument purely theoretical in 2022 for real-world systems, now that everything is 2's complement?
Related
What is the size of long long integer in a 32-bit computer?
#include<stdio.h>
int main()
{
    unsigned long long var = 0LL;
    printf("%d",sizeof(var));
    return 0;
}
What is the size of long long integer in a 32-bit computer?
The type of computer is irrelevant. The value of the long long variable/object is irrelevant. The issues are the compiler and the C spec.
C requires long long to represent a minimum range of [-9223372036854775807 ... 9223372036854775807]; that takes at least 64 bits (see C11dr §5.2.4.2.1 1).
A 32-bit computer will likely use a 64-bit long long. A future compiler may use a 128-bit long long.
When printing, use a matching print specifier "%zu" for the return type of sizeof, which is size_t.
#include <stdio.h>
#include <limits.h>
int main(void) {
    printf("size: %zu\n", sizeof(long long));
    printf("bit size: %zu\n", sizeof(long long) * CHAR_BIT);  /* CHAR_BIT comes from <limits.h> */
}
printf("%d", sizeof(var)); is not valid code: "%d" expects an int, but sizeof(var) has type size_t, and a mismatched format specifier is undefined behaviour. Ensure your compiler warnings are fully enabled.
So... the modulo operation doesn't seem to work on a 64-bit value of all ones.
Here is my C code to set up the edge case:
#include <stdio.h>
int main(int argc, char *argv[]) {
    long long max_ll = 0xFFFFFFFFFFFFFFFF;
    long long large_ll = 0x0FFFFFFFFFFFFFFF;
    long long mask_ll = 0x00000F0000000000;
    printf("\n64-bit numbers:\n");
    printf("0x%016llX\n", max_ll % mask_ll);
    printf("0x%016llX\n", large_ll % mask_ll);

    long max_l = 0xFFFFFFFF;
    long large_l = 0x0FFFFFFF;
    long mask_l = 0x00000F00;
    printf("\n32-bit numbers:\n");
    printf("0x%08lX\n", max_l % mask_l);
    printf("0x%08lX\n", large_l % mask_l);
    return 0;
}
The output shows this:
64-bit numbers:
0xFFFFFFFFFFFFFFFF
0x000000FFFFFFFFFF
32-bit numbers:
0xFFFFFFFF
0x000000FF
What is going on here?
Why doesn't modulo work on a 64-bit value of all ones, but it will on a 32-bit value of all ones?
Is this a bug with the Intel CPU? Or with C somehow? Or is it something else?
More Info
I'm on a Windows 10 machine with an Intel i5-4570S CPU. I used the cl compiler from Visual Studio 2015.
I also verified this result using the Windows Calculator app (Version 10.1601.49020.0) by switching to Programmer mode. If you take 0xFFFF FFFF FFFF FFFF modulo anything, it just returns itself.
Specifying unsigned vs signed didn't seem to make any difference.
Please enlighten me :) I actually did have a use case for this operation... so it's not purely academic.
Your program causes undefined behaviour by using the wrong format specifier.
%llX may only be used for unsigned long long. If you use the right specifier, %lld, then the apparent mystery will go away:
#include <stdio.h>
int main(int argc, char* argv[])
{
    long long max_ll = 0xFFFFFFFFFFFFFFFF;
    long long mask_ll = 0x00000F0000000000;
    printf("%lld %% %lld = %lld\n", max_ll, mask_ll, max_ll % mask_ll);
}
Output:
-1 % 16492674416640 = -1
In ISO C the definition of the % operator is such that (a/b)*b + a%b == a. Also, for negative numbers, / follows "truncation towards zero".
So -1 / 16492674416640 is 0, therefore -1 % 16492674416640 must be -1 to make the above formula work.
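Here is a small sketch (not part of the original program) that checks this identity for the exact values involved:

#include <stdio.h>

int main(void)
{
    long long a = -1;                  /* what max_ll really holds */
    long long b = 16492674416640LL;    /* 0x00000F0000000000 */

    printf("a / b = %lld\n", a / b);   /* 0: division truncates toward zero */
    printf("a %% b = %lld\n", a % b);  /* -1 */
    printf("identity holds: %d\n", (a / b) * b + a % b == a);   /* prints 1 */
    return 0;
}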
As discussed in comments, the following line:
long long max_ll = 0xFFFFFFFFFFFFFFFF;
causes implementation-defined behaviour (assuming that your system has long long as a 64-bit type). The constant 0xFFFFFFFFFFFFFFFF has type unsigned long long, and it is out of range for long long whose maximum permitted value is 0x7FFFFFFFFFFFFFFF.
When an out-of-range assignment is made to a signed type, the behaviour is implementation-defined, which means the compiler documentation must say what happens.
Typically, this will be defined as generating the value that is in range of long long and has the same representation as the unsigned long long constant. In 2's complement, (long long)-1 has the same representation as the unsigned long long value 0xFFFFFFFFFFFFFFFF, which explains why you ended up with max_ll holding the value -1.
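A minimal sketch of that conversion; the -1 result is what 2's complement implementations typically document, not something the language guarantees:

#include <stdio.h>

int main(void)
{
    unsigned long long u = 0xFFFFFFFFFFFFFFFFULL;
    long long s = (long long)u;   /* out of range: implementation-defined result */

    printf("%lld\n", s);          /* typically -1 on 2's complement systems */
    printf("%lld\n", -1LL);       /* the portable way to spell that value */
    return 0;
}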
Actually it does make a difference whether the values are defined as signed or unsigned:
#include <stdio.h>
#include <limits.h>
int main(void) {
#if ULLONG_MAX == 0xFFFFFFFFFFFFFFFF
    long long max_ll = 0xFFFFFFFFFFFFFFFF; // converts to -1LL
    long long large_ll = 0x0FFFFFFFFFFFFFFF;
    long long mask_ll = 0x00000F0000000000;
    printf("\n" "signed 64-bit numbers:\n");
    printf("0x%016llX\n", max_ll % mask_ll);
    printf("0x%016llX\n", large_ll % mask_ll);

    unsigned long long max_ull = 0xFFFFFFFFFFFFFFFF;
    unsigned long long large_ull = 0x0FFFFFFFFFFFFFFF;
    unsigned long long mask_ull = 0x00000F0000000000;
    printf("\n" "unsigned 64-bit numbers:\n");
    printf("0x%016llX\n", max_ull % mask_ull);
    printf("0x%016llX\n", large_ull % mask_ull);
#endif
#if UINT_MAX == 0xFFFFFFFF
    int max_l = 0xFFFFFFFF; // converts to -1
    int large_l = 0x0FFFFFFF;
    int mask_l = 0x00000F00;
    printf("\n" "signed 32-bit numbers:\n");
    printf("0x%08X\n", max_l % mask_l);
    printf("0x%08X\n", large_l % mask_l);

    unsigned int max_ul = 0xFFFFFFFF;
    unsigned int large_ul = 0x0FFFFFFF;
    unsigned int mask_ul = 0x00000F00;
    printf("\n" "unsigned 32-bit numbers:\n");
    printf("0x%08X\n", max_ul % mask_ul);
    printf("0x%08X\n", large_ul % mask_ul);
#endif
    return 0;
}
Produces this output:
signed 64-bit numbers:
0xFFFFFFFFFFFFFFFF
0x000000FFFFFFFFFF
unsigned 64-bit numbers:
0x000000FFFFFFFFFF
0x000000FFFFFFFFFF
signed 32-bit numbers:
0xFFFFFFFF
0x000000FF
unsigned 32-bit numbers:
0x000000FF
0x000000FF
The 64-bit hex constant 0xFFFFFFFFFFFFFFFF has the value -1 when stored into a long long. This is actually implementation-defined, because it is an out-of-range conversion to a signed type, but on Intel processors with current compilers the conversion just keeps the same bit pattern.
Note that you are not using the fixed-size integers defined in <stdint.h>: int64_t, uint64_t, int32_t and uint32_t. The standard specifies long long as having at least 64 bits, and on Intel x86_64 it does; long has at least 32 bits, but for the same processor its size differs between environments: 32 bits on Windows 10 (even in 64-bit mode) and 64 bits on macOS and 64-bit Linux. This is why you observe surprising behavior in the long case, where unsigned and signed may produce the same result: they don't on Windows, but they do on Linux and macOS because the computation is done in 64 bits and these values are just positive numbers.
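As a sketch (not the original program), the same computation written with the fixed-width types, which keep the sizes identical across Windows, Linux and macOS; the expected outputs are noted in comments:

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

int main(void)
{
    uint64_t max_u64  = UINT64_C(0xFFFFFFFFFFFFFFFF);
    uint64_t mask_u64 = UINT64_C(0x00000F0000000000);
    uint32_t max_u32  = UINT32_C(0xFFFFFFFF);
    uint32_t mask_u32 = UINT32_C(0x00000F00);

    printf("0x%016" PRIX64 "\n", max_u64 % mask_u64);   /* 0x000000FFFFFFFFFF */
    printf("0x%08" PRIX32 "\n", max_u32 % mask_u32);    /* 0x000000FF */
    return 0;
}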
Also note that LLONG_MIN / -1 and LLONG_MIN % -1 both invoke undefined behavior because of signed arithmetic overflow, and this one is not silently ignored on Intel PCs: it usually raises an uncaught exception that terminates the program, just like 1 / 0 and 1 % 0.
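One way to guard against those two cases, sketched here with a hypothetical safe_divide helper (the name and interface are mine, not a library function):

#include <stdio.h>
#include <limits.h>

/* Returns 1 and stores the quotient if the division is defined,
   0 for division by zero or for LLONG_MIN / -1 (whose result overflows). */
static int safe_divide(long long num, long long den, long long *quot)
{
    if (den == 0)
        return 0;
    if (num == LLONG_MIN && den == -1)
        return 0;
    *quot = num / den;
    return 1;
}

int main(void)
{
    long long q;
    if (safe_divide(LLONG_MIN, -1, &q))
        printf("%lld\n", q);
    else
        puts("division would be undefined");
    return 0;
}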
Try putting unsigned before your long long. As a signed number, your 0xFF...FF is actually -1 on most platforms.
Also, in your code, the "32-bit" numbers are declared as long, which is not guaranteed to be 32 bits; on some platforms it is 64 bits.
This question already has answers here: (-2147483648 > 0) returns true in C++? (4 answers)
Closed 8 years ago.
AFAIK this is a standard "idiom"
# define INT_MIN (-INT_MAX - 1)
# define INT_MAX 2147483647
Question: Why is the definition of INT_MIN not as -2147483648?
Because 2147483648 does not fit in an int, it is of type long (on common systems with 32-bit int and 64-bit long; on systems where long is also 32 bits it is of type long long). So -2147483648 is of type long, not int.
Remember that in C, an unsuffixed decimal integer constant has the first of the types int, long, or long long in which its value can be represented.
Also, in C, -2147483648 is not an integer constant; 2147483648 is an integer constant, and -2147483648 is an expression formed with the unary operator - applied to the integer constant 2147483648.
EDIT: if you are not convinced that -2147483648 is not of type int (some people in the comments still seem to doubt it), you can try printing this:
printf("%zu %zu\n", sizeof INT_MIN, sizeof -2147483648);
You will most likely end up with:
4 8
on common 32 and 64-bit systems.
Also, to follow up on a comment: I'm talking about the recent C Standards; use the c99 or c11 dialect to test this. The c89 rules for decimal integer constants are different: -2147483648 is of type unsigned long in c89. Indeed, in c89 (it is different in c99, see above), an unsuffixed decimal integer constant has the first of the types int, long, or unsigned long in which it can be represented.
EDIT2: @WhozCraig added another example (in C++) to show that -2147483648 is not of type int.
The following example, though in C++, drives home this point. It was compiled with a 32-bit architecture g++. Note the type info gathered from the passed parameter deduction:
#include <iostream>
#include <climits>
template<typename T>
void foo(T value)
{
    std::cout << __PRETTY_FUNCTION__ << '\n';
    std::cout << value << '\n';
}
int main()
{
    foo(-2147483648);
    foo(INT_MIN);
    return 0;
}
Output
void foo(T) [T = long long]
-2147483648
void foo(T) [T = int]
-2147483648
I'm trying to do, what I imagined to be, a fairly basic task. I have two unsigned char variables and I'm trying to combine them into a single signed int. The problem here is that the unsigned chars start as signed chars, so I have to cast them to unsigned first.
I've done this task in three IDE's; MPLAB (as this is an embedded application), MATLAB, and now trying to do it in visual studio. Visual is the only one having problems with the casting.
For example, say the two signed chars are -5 and 94. In MPLAB I first cast the two chars to unsigned chars:
unsigned char a = (unsigned char)-5;
unsigned char b = (unsigned char)94;
This gives me 251 and 94 respectively. I then want to do some bitshifting and concat:
int c = (int)((((unsigned int) a) << 8) | (unsigned int) b);
In MPLAB and MATLAB this gives me the right signed value of -1186. However, the exact same code in visual refuses to output results as a signed value, only unsigned (64350). This has been checked by both debugging and stepping through the code and printing the results:
printf("%d\n", c);
What am I doing wrong? This is driving me insane. The application is an electronic device that collects sensor data, then stores it on an SD card for later decoding using a program written in C. I technically could do all the calculations in MPLAB and then store those on the SDCARD, but I refuse to let Microsoft win.
I understand my method of casting is very unoptimised and you could probably do it in one line, but having had this problem for a couple of days now I've tried to break the steps down as much as possible.
Any help is most appreciated!
The problem is that an int on most systems is 32 bits. If you concatenate two 8-bit quantities and store the result into a 32-bit quantity, you will get a positive integer because you are not setting the sign bit, which is the most significant bit. More specifically, you are only populating the lower 16 bits of a 32-bit integer, which will naturally be interpreted as a positive number.
You can fix this by explicitly using a 16-bit signed int.
#include <stdio.h>
#include <stdint.h>
int main() {
    unsigned char a = (unsigned char)-5;
    unsigned char b = (unsigned char)94;
    int16_t c = (int16_t)((((unsigned int) a) << 8) | (unsigned int) b);
    printf("%d\n", c);
}
Note that I am on a Linux system, but <stdint.h> and int16_t are standard C99, and recent versions of Visual Studio provide them as well, so this should work there without modification.
This is the correct behavior of standard C. The 16-bit quantity 0xFB5E is well within range of a 32-bit int, so converting it from unsigned to signed preserves the value; the language does not treat its highest bit as a sign bit and does not sign-extend it into the wider signed type.
You can fix your problem by casting a to a signed char, like this:
unsigned char a = (unsigned char)-5;
unsigned char b = (unsigned char)94;
int c = (signed char)a << 8 | b;   // strictly, shifting a negative value is undefined in ISO C; (signed char)a * 256 + b gives the same result without that caveat
printf("%d\n", c); // Prints -1186
Now that a is treated as signed, the language propagates its top bit into the sign bit of the 32-bit int, making the result negative.
Demo on ideone.
Converting an out-of-range unsigned value to a signed value causes implementation-defined behaviour, which means that the compiler must document what it does in this situation; and different compilers can do different things.
In C99 there is also a provision that the conversion may instead raise an implementation-defined signal (terminating the program if you don't have a signal handler). In C89 the result was likewise implementation-defined; C99 merely added the signal option.
Is there some reason you can't go:
signed char x = -5;
signed char y = 94;
int c = x * 256 + y;
?
BTW, if you are OK with implementation-defined behaviour and your system has a 16-bit type, then with #include <stdint.h> you can just go:
int c = (int16_t)(x * 256 + y);
To explain: C deals in values. In math, 251 * 256 + 94 is a positive number, and C is no exception to that. The bit-shift operators are just multiplication and division by powers of 2 in disguise. If you want your value to be reduced (mod 65536), you have to specifically request that.
If you also think in terms of values rather than representations, you don't have to worry about things like sign bits and sign extension.
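For example, a sketch of that value-oriented approach: reduce mod 65536 explicitly and map the upper half of the range back to negative values by hand, with no reliance on casts to narrower signed types:

#include <stdio.h>

int main(void)
{
    signed char x = -5;
    signed char y = 94;

    long v = ((long)(unsigned char)x * 256 + (unsigned char)y) % 65536;
    if (v >= 32768)       /* values 32768..65535 stand for negatives */
        v -= 65536;

    printf("%ld\n", v);   /* -1186 */
    return 0;
}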
Why do I get -1 when I print the following?
unsigned long long int largestIntegerInC = 18446744073709551615LL;
printf ("largestIntegerInC = %d\n", largestIntegerInC);
I know I should use llu instead of d, but why do I get -1 instead of 18446744073709551615LL?
Is it because of overflow?
In C (99), LLONG_MAX, the maximum value of the long long int type, is guaranteed to be at least 9223372036854775807. The maximum value of an unsigned long long int is guaranteed to be at least 18446744073709551615, which is 2^64 − 1 (0xffffffffffffffff).
So, initialization should be:
unsigned long long int largestIntegerInC = 18446744073709551615ULL;
(Note the ULL.) Since largestIntegerInC is of type unsigned long long int, you should print it with the right format specifier, which is "%llu":
$ cat test.c
#include <stdio.h>
int main(void)
{
    unsigned long long int largestIntegerInC = 18446744073709551615ULL;
    /* good */
    printf("%llu\n", largestIntegerInC);
    /* bad */
    printf("%d\n", largestIntegerInC);
    return 0;
}
$ gcc -std=c99 -pedantic test.c
test.c: In function ‘main’:
test.c:9: warning: format ‘%d’ expects type ‘int’, but argument 2 has type ‘long long unsigned int’
The second printf() above is wrong, it can print anything. You are using "%d", which means printf() is expecting an int, but gets a unsigned long long int, which is (most likely) not the same size as int. The reason you are getting -1 as your output is due to (bad) luck, and the fact that on your machine, numbers are represented using two's complement representation.
To see how this can be bad, let's run the following program:
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
int main(int argc, char *argv[])
{
    const char *fmt;
    unsigned long long int x = ULLONG_MAX;
    unsigned long long int y = 42;
    int i = -1;

    if (argc != 2) {
        fprintf(stderr, "Need format string\n");
        return EXIT_FAILURE;
    }

    fmt = argv[1];
    printf(fmt, x, y, i);
    putchar('\n');
    return 0;
}
On my Macbook, running the program with "%d %d %d" gives me -1 -1 42, and on a Linux machine, the same program with the same format gives me -1 42 -1. Oops.
In fact, if you are trying to store the largest unsigned long long int value in your largestIntegerInC variable, you should include limits.h and use ULLONG_MAX. Or you can simply assign -1 to your variable:
#include <limits.h>
#include <stdio.h>
int main(void)
{
    unsigned long long int largestIntegerInC = ULLONG_MAX;
    unsigned long long int next = -1;
    if (next == largestIntegerInC) puts("OK");
    return 0;
}
In the above program, both largestIntegerInC and next contain the largest possible value for unsigned long long int type.
It's because you're passing a number with all the bits set to 1. When interpreted as a two's complement signed number, that works out to -1. In this case, it's probably only looking at 32 of those one bits instead of all 64, but that doesn't make any real difference.
In two's complement arithmetic, the signed value -1 is the same as the largest unsigned value.
Consider the bit patterns for negative numbers in two's complement (I'm using 8 bit integers, but the pattern applies regardless of the size):
0 - 0x00
-1 - 0xFF
-2 - 0xFE
-3 - 0xFD
So, you can see that negative 1 has the bit pattern of all 1's which is also the bit pattern for the largest unsigned value.
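A quick sketch that prints those 8-bit patterns (the casts only pick out the low byte):

#include <stdio.h>

int main(void)
{
    for (int i = 0; i >= -3; i--)
        printf("%2d -> 0x%02X\n", i, (unsigned)(unsigned char)i);
    return 0;
}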
You used a format for a signed 32-bit number, so you got -1. printf() can't tell internally how big the number you passed in is, so it just pulls the first 32 bits from the varargs list and uses them as the value to be printed out. Since you gave a signed format, it prints it that way, and 0xffffffff is the two's complement representation of -1.
You can (and should) see why in the compiler warnings. If not, try setting the highest warning level. With VS I get this warning: warning C4245: 'initializing' : conversion from '__int64' to 'unsigned __int64', signed/unsigned mismatch.
No, there is no overflow. It's because it isn't printing the entire value:
18446744073709551615 is the same as 0xFFFFFFFFFFFFFFFF. When printf %d processes that, it grabs only 32 bits (or 64 bits if it's a 64-bit CPU) for conversion, and those are the signed value -1.
If the printf conversion had been %u instead, it would show either 4294967295 (32 bits) or 18446744073709551615 (64 bits).
An overflow is when a value increases to the point where it won't fit in the storage allocated. In this case, the value is allocated just fine, but isn't being completely retrieved.