I have written code like the following:
int a = -1;
unsigned int b = 0xffffffff;
if (a == b)
printf("a == b\n");
else
printf("a != b\n");
printf("a = %x b = %x\n", a, b);
return 0;
And the result is as follows:
It shows that a and b are equal. So I want to know how the computer makes this judgement.
When a signed int a and an unsigned int b are operands of the same arithmetic or comparison operator, a is implicitly converted to unsigned int (the usual arithmetic conversions). Converting -1 to a 32-bit unsigned int yields 0xffffffff, so a and b compare equal.
The machine representation of your two values a and b is the same bit pattern (on your particular computer and implementation), so the a == b test is true.
BTW, you should enable all warnings and debug info when compiling (e.g. compile with gcc -Wall -Wextra -g if using GCC...). You'll probably get some warnings, because you have probably hit some undefined behavior. And you could run your code step by step in your debugger (e.g. gdb) and query the values (and their machine representations).
This is a complete rewrite of the question. Hopefully it is clearer now.
I want to implement in C a function that performs addition of signed ints with wrapping in case of overflow.
I want to target mainly the x86-64 architecture, but of course the more portable the implementation is the better. I'm also concerned mostly about producing decent assembly code through gcc, clang, icc, and whatever is used on Windows.
The goal is twofold:
write correct C code that doesn't fall into the undefined behavior blackhole;
write code that gets compiled to decent machine code.
By decent machine code I mean a single leal or a single addl instruction on machines which natively support the operation.
I'm able to satisfy either of the two requisites, but not both.
Attempt 1
The first implementation that comes to mind is
int add_wrap(int x, int y) {
return (unsigned) x + (unsigned) y;
}
This seems to work with gcc, clang and icc. However, as far as I know, the C standard doesn't specify the cast from unsigned int to signed int, leaving freedom to the implementations (see also here).
Otherwise, if the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
I believe most (all?) major compilers do the expected conversion from unsigned to int, meaning that they take the correct representative modulo 2^N, where N is the number of bits, but it's not mandated by the standard so it cannot be relied upon (stupid C standard hits again). Also, while this is the simplest thing to do on two's complement machines, it is impossible on ones' complement machines, because there is a residue class which is not representable: 2^(N-1).
Attempt 2
According to the clang docs, one can use __builtin_add_overflow like this
int add_wrap(int x, int y) {
int res;
__builtin_add_overflow(x, y, &res);
return res;
}
and this should do the trick with clang, because the docs clearly say
If possible, the result will be equal to mathematically-correct result and the builtin will return 0. Otherwise, the builtin will return 1 and the result will be equal to the unique value that is equivalent to the mathematically-correct result modulo two raised to the k power, where k is the number of bits in the result type.
The problem is that in the GCC docs they say
These built-in functions promote the first two operands into infinite precision signed type and perform addition on those promoted operands. The result is then cast to the type the third pointer argument points to and stored there.
As far as I know, casting from long int to int is implementation-defined, so I don't see any guarantee that this will result in the wrapping behavior.
As you can see [here][godbolt], GCC will also generate the expected code, but I wanted to be sure that this is not by chance and is indeed part of the specification of __builtin_add_overflow.
icc also seems to produce something reasonable.
This produces decent assembly, but relies on intrinsics, so it's not really standard compliant C.
Attempt 3
Follow the suggestions of those pedantic guys from SEI CERT C Coding Standard.
In their CERT INT32-C recommendation they explain how to check in advance for potential overflow. Here is what comes out following their advice:
#include <limits.h>
int add_wrap(int x, int y) {
if ((x > 0) && (y > INT_MAX - x))
return (x + INT_MIN) + (y + INT_MIN);
else if ((x < 0) && (y < INT_MIN - x))
return (x - INT_MIN) + (y - INT_MIN);
else
return x + y;
}
The code performs the correct checks and compiles to leal with gcc, but not with clang or icc.
The whole CERT INT32-C recommendation is complete garbage, because it tries to transform C into a "safe" language by forcing the programmers to perform checks that should be part of the definition of the language in the first place. And in doing so it forces also the programmer to write code which the compiler can no longer optimize, so what is the reason to use C anymore?!
Edit
The contrast is between compatibility and decency of the assembly generated.
For instance, with both gcc and clang the two following functions which are supposed to do the same get compiled to different assembly.
f is bad in both cases, g is good in both cases (addl+jo or addl+cmovnol). I don't know if jo is better than cmovnol, but the function g is consistently better than f.
#include <limits.h>
signed int f(signed int si_a, signed int si_b) {
signed int sum;
if (((si_b > 0) && (si_a > (INT_MAX - si_b))) ||
((si_b < 0) && (si_a < (INT_MIN - si_b)))) {
return 0;
} else {
return si_a + si_b;
}
}
signed int g(signed int si_a, signed int si_b) {
signed int sum;
if (__builtin_add_overflow(si_a, si_b, &sum)) {
return 0;
} else {
return sum;
}
}
A bit like @Andrew's answer without the memcpy().
Use a union to negate the need for memcpy(). With C2x, we are sure that int is 2's complement.
int add_wrap(int x, int y) {
union {
unsigned un;
int in;
} u = {.un = (unsigned) x + (unsigned) y};
return u.in;
}
For those who like 1-liners, use a compound literal.
int add_wrap2(int x, int y) {
return ( union { unsigned un; int in; }) {.un = (unsigned) x + (unsigned) y}.in;
}
I'm not so sure because of the rules for casting from unsigned to signed
You quoted exactly the right rule. If you convert an unsigned value to a signed type that cannot represent it, then the result is implementation-defined or an implementation-defined signal is raised. In simple words, what happens is described by your compiler's documentation.
For example, the GCC 9.2.0 documentation has the following to say about the implementation-defined behavior of integers:
The result of, or the signal raised by, converting an integer to a signed integer type when the value cannot be represented in an object of that type (C90 6.2.1.2, C99 and C11 6.3.1.3).
For conversion to a type of width N, the value is reduced modulo 2^N to be within range of the type; no signal is raised.
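If relying on the compiler's documented behavior is not acceptable, the out-of-range conversion can be avoided by doing the reduction by hand. The following is only a sketch: the names to_int_wrapped and add_wrap_portable are hypothetical, and it assumes that UINT_MAX == 2 * INT_MAX + 1 (two's complement int without padding bits).

```c
#include <limits.h>

/* Maps an unsigned value to the int congruent to it modulo 2^N.
   Assumption: UINT_MAX == 2 * (unsigned)INT_MAX + 1. */
static int to_int_wrapped(unsigned u)
{
    if (u <= (unsigned)INT_MAX)
        return (int)u;  /* value fits, so the conversion is exact */
    /* Here u >= 2^(N-1). Remove that offset in unsigned arithmetic,
       then add INT_MIN back in signed arithmetic; the intermediate
       value is in [0, INT_MAX], so nothing overflows. */
    return (int)(u - (unsigned)INT_MIN) + INT_MIN;
}

int add_wrap_portable(int x, int y)
{
    /* Unsigned addition wraps modulo 2^N by definition. */
    return to_int_wrapped((unsigned)x + (unsigned)y);
}
```

Every conversion in this version is either value-preserving or signed-to-unsigned, both of which the standard fully defines. Whether a given compiler folds it into a single add is worth checking in the generated assembly.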
I had to do something similar; however, I was working with known-width types from stdint.h and needed to handle wrapping 32-bit signed integer operations. The implementation below relies on the exact-width stdint types being required to use 2's complement; strictly speaking, the final cast to int32_t is still an implementation-defined conversion when the masked value exceeds INT32_MAX. I was trying to emulate the behaviour in Java, so I had some Java code generate a bunch of test cases and have tested on clang, gcc and MSVC.
#include <stdint.h>
inline int32_t add_wrap_i32(int32_t a, int32_t b)
{
const int64_t a_widened = a;
const int64_t b_widened = b;
const int64_t sum = a_widened + b_widened;
return (int32_t)(sum & INT64_C(0xFFFFFFFF));
}
inline int32_t sub_wrap_i32(int32_t a, int32_t b)
{
const int64_t a_widened = a;
const int64_t b_widened = b;
const int64_t difference = a_widened - b_widened;
return (int32_t)(difference & INT64_C(0xFFFFFFFF));
}
inline int32_t mul_wrap_i32(int32_t a, int32_t b)
{
const int64_t a_widened = a;
const int64_t b_widened = b;
const int64_t product = a_widened * b_widened;
return (int32_t)(product & INT64_C(0xFFFFFFFF));
}
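For comparison, here is a sketch of the same wrapping multiplication done in uint32_t instead of int64_t, with the bit pattern copied back through memcpy so that no out-of-range signed conversion takes place. The names bits_to_i32 and mul_wrap_u32 are hypothetical, and the sketch assumes int is not wider than 32 bits (otherwise uint32_t operands would be promoted to signed int before multiplying).

```c
#include <stdint.h>
#include <string.h>

/* Reinterprets the 32 bits of u as an int32_t. */
static int32_t bits_to_i32(uint32_t u)
{
    int32_t r;
    memcpy(&r, &u, sizeof r);
    return r;
}

/* Assumption: int is at most 32 bits wide, so the multiplication
   is carried out in unsigned arithmetic and wraps modulo 2^32. */
int32_t mul_wrap_u32(int32_t a, int32_t b)
{
    return bits_to_i32((uint32_t)a * (uint32_t)b);
}
```

This avoids the 64-bit intermediate, which may matter on targets where int64_t arithmetic is expensive.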
It seems ridiculous, but I think that the recommended method is to use memcpy. Apparently all modern compilers optimize the memcpy away and it ends up doing just what you're hoping in the first place -- preserving the bit pattern from the unsigned addition.
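#include <string.h>
int add_wrap(int a, int b)
{
    /* Unsigned addition wraps modulo 2^N and is always defined. */
    unsigned u = (unsigned)a + (unsigned)b;
    int result;
    /* Copy the bit pattern instead of converting the value. */
    memcpy(&result, &u, sizeof(result));
    return result;
}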
On x86 clang with optimization, this is a single instruction if the destination is a register.
Take for example int a=INT_MAX-1; and int b=INT_MAX-1; and assume that int is 32-bit and a function
int product(int a,int b)
{
return a*b;
}
Now here the product a*b overflows resulting in undefined behavior from the standard:
If an exceptional condition occurs during the evaluation of an
expression (that is, if the result is not mathematically defined or
not in the range of representable values for its type), the behavior
is undefined.
However if we have instead
int product(int a,int b)
{
long long x = (long long)a*b;
return x;
}
Then assuming this answer is correct and applies to long long as well by the standard the result is implementation-defined.
I'm thinking that undefined behavior can cause anything, including a crash, so it's better to avoid it at all costs, and hence the second version is preferable. But I'm not quite sure if my reasoning is okay.
Question: Is second version preferable or is the first one or are they equally preferable?
Both of the options are bad because they do not produce the desired result. IMHO it is a moot point trying to rank them in badness order.
My advice would be to fix the function to be well-defined for all use cases.
If you (the programmer) will never (ever!) pass values to the product() function that will cause undefined behavior, then the first version, why not.
The second version returns the sizeof(int)*CHAR_BIT least significant bits of the result (this is implementation-defined behavior) and may still overflow on architectures where LLONG_MAX == INT_MAX. It may also take ages to execute on an 8-bit processor with poor support for long long multiplication. Unless you are really interested only in the least significant bits of the product, you should probably handle the overflow when converting long long to int with something like if (x > INT_MAX) return INT_MAX;.
The preferable version is the one in which no undefined behavior exists. If you aren't sure whether multiplying a and b will result in undefined behavior, you should check for it and prepare for such a case.
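#include <assert.h>
#include <limits.h>
int product(int a, int b)
{
    /* Nonzero when a * b would fall outside [INT_MIN, INT_MAX].
       Dividing only by a nonzero operand avoids division by zero. */
    int overflow = 0;
    if (a > 0)
        overflow = b > 0 ? a > INT_MAX / b : b < INT_MIN / a;
    else if (a < 0)
        overflow = b > 0 ? a < INT_MIN / b : b < INT_MAX / a;
    assert(!overflow);
    if (overflow)
        return INT_MAX;
    return a * b;
}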
or in GNUC:
int product(int a, int b) {
int c;
if (__builtin_smul_overflow(a, b, &c)) {
assert(0);
return INT_MAX;
}
return c;
}
I believe that a slightly tweaked second version might be interesting for you:
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
int product(int a, int b)
{
long long x = (long long)a * b;
if (x < INT_MIN || x > INT_MAX)
{
fprintf(stderr, "Error in product(): Result out of range of int\n");
abort();
}
return x;
}
This function takes two ints, computes their product as a long long and checks whether
the result is in the range of int. If it is, we can return it from the function without any bad consequences. If it is not, we can print an error message and abort, or do error handling of a different kind.
EDIT 1: But this code still expects that (long long)a * b does not overflow, which is not guaranteed when, e.g., sizeof(long long) == sizeof(int). In such a case, an overflow check should be added to make sure this does not happen. The (6.54) Integer Overflow Builtins could be interesting for you if you don't mind using GCC-dependent code. If you want to stay in C without any extensions, there are methods to detect multiplication overflow as well, see this StackOverflow answer: https://stackoverflow.com/a/1815371/1003701
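One way to make the "long long is wide enough" assumption explicit is a compile-time check. The sketch below is only an illustration: product_or and its fallback parameter are hypothetical names, and the _Static_assert (C11) is a deliberately conservative test that long long can hold every product of two ints.

```c
#include <limits.h>

/* Refuse to compile where long long cannot hold every int * int
   product. The condition is conservative: it demands roughly twice
   the range that is strictly needed. */
_Static_assert(LLONG_MAX / INT_MAX / 2 >= INT_MAX,
               "long long is too narrow to hold every int * int product");

/* Returns a * b, or fallback when the product does not fit in int. */
int product_or(int a, int b, int fallback)
{
    long long x = (long long)a * b;  /* cannot overflow, see the assert */
    if (x < INT_MIN || x > INT_MAX)
        return fallback;
    return (int)x;                   /* in range, conversion is exact */
}
```

On a platform where the assertion fails, the build stops instead of silently computing a wrong range check, and one of the division-based checks would be needed instead.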
I have a C code snippet as below:
int32_t A = 5;
uint32_t B = 8;
if ( A >= B )
{
printf("Test");
}
When I build this, I receive a remark/warning: "comparison between signed and unsigned operands". Can anyone address this issue?
Everything is ok while A is positive and B is less than 2^31.
But, if A is less than 0, then unexpected behavior occurs.
A = -1, in memory it will be saved as 0xFFFFFFFF.
B = 5, in memory it will be saved as 0x00000005.
When you do
if (A < B) {
//Something, you are expecting to be here
}
The compiler will compare them as unsigned 32-bit integers, and your if effectively becomes:
if (0xFFFFFFFF < 0x00000005) {
//Do something, it will fail.
}
The compiler warns you about this possible problem.
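If negative values of A are possible, one option is to compare in a wider signed type that can hold every value of both operands. This is just a sketch with hypothetical helper names (i32_lt_u32, i32_ge_u32); int64_t covers the full range of both int32_t and uint32_t, so both conversions preserve the value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Both operands are widened to int64_t, which represents every
   int32_t and every uint32_t value exactly. */
bool i32_lt_u32(int32_t a, uint32_t b)
{
    return (int64_t)a < (int64_t)b;
}

bool i32_ge_u32(int32_t a, uint32_t b)
{
    return (int64_t)a >= (int64_t)b;
}
```

With A = -1 and B = 5, i32_lt_u32 returns true, which is what the mathematical values suggest, whereas the plain mixed comparison does not.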
Good, very good! You are reading and paying attention to your compiler warnings.
In your code:
int32_t A = 5;
uint32_t B = 8;
if ( A >= B )
{
printf("Test");
}
You have A as a signed int32_t with a range of -2147483648 to 2147483647, and B as an unsigned uint32_t with a range of 0 to 4294967295. The compiler generates the warning because the mixed comparison does not behave like a comparison of the mathematical values for part of those ranges. Mathematically, A can never be greater than or equal to B when B lies in 2147483648 to 4294967295, so that whole swath of B values should yield FALSE. However, A is converted to uint32_t before the comparison, so a negative A turns into a value of at least 2147483648 and the test can come out TRUE.
Another great example would be if ( A < B ), which mathematically is TRUE for all values of A from -2147483648 to -1 because an unsigned value can never be less than zero, yet after the conversion it is FALSE whenever the converted A is not smaller than B.
The compiler warnings are there to warn that testing with these types may not provide valid comparisons for certain ranges of numbers -- that you might not have anticipated.
In the real world, if you know A is only holding values from 0 - 900, then you can simply tell the compiler that 1) you understand the warning and by your cast will 2) guarantee the values will provide valid tests, e.g.
int32_t A = 5;
uint32_t B = 8;
if (A >= 0) {
    if ((uint32_t)A >= B)
        printf("Test");
}
else {
    /* handle error */
}
If you cannot make the guarantees for 1) & 2), then it is time to go rewrite the code in a way you are not faced with the warning.
Two good things happened here. You had compiler warnings enabled, and you took the time to read and understand what the compiler was telling you. This will come up time and time again. Now you know how to approach a determination of what can/should be done.
I am writing a program in C to calculate the range of different data types. Please look at the following code:
#include <stdio.h>
main()
{
int a;
long b;
for (a = 0; a <= 0; --a)
;
++a;
printf("INT_MIN: %d\n", a);
for (a = 0; a >= 0; ++a)
;
--a;
printf("INT_MAX: %d\n", a);
for (b = 0; b <= 0; --b)
;
++b;
printf("LONG_MIN: %d\n", b);
for (b = 0; b >= 0; ++b)
;
--b;
printf("LONG_MAX: %d\n", b);
}
The output was:
INT_MIN: -32768
INT_MAX: 32767
LONG_MIN: 0
LONG_MAX: -1
The program took a long pause to print the long values. I also put a printf inside the third loop to test the program (not mentioned here). I found that b did not exit the loop even when it became positive.
I used the same method of calculation. Why did it work for int but not for long?
You are using the wrong format specifier. Since b is of type long, use
printf("LONG_MIN: %ld\n", b);
In fact, if you enabled all warnings, the compiler probably would warn you, e.g:
t.c:19:30: warning: format specifies type 'int' but the argument has type 'long' [-Wformat]
printf("LONG_MIN: %d\n", b);
In C it is undefined behaviour to decrement a signed integer beyond its minimum value (and similarly for incrementing above the maximum value). Your program could do literally anything.
For example, gcc compiles your program to an infinite loop with no output.
The proper approach is:
#include <limits.h>
#include <stdio.h>
int main()
{
printf("INT_MIN = %d\n", INT_MIN);
// etc.
}
In
printf("LONG_MIN: %d\n", b);
the format specifier is %d, which works for integers (int). It should be changed to %ld to print long integers (long), and the same applies to
printf("LONG_MAX: %d\n", b);
These statements should be
printf("LONG_MIN: %ld\n", b);
and
printf("LONG_MAX: %ld\n", b);
This approach may not work with all compilers (e.g. gcc), and an easier approach would be to use limits.h.
Also check Integer Limits.
As already stated, the code you provided invokes undefined behavior. Thus it could calculate what you want or launch nuclear missiles ...
The reason for the undefined behavior is the signed integer overflow that you are provoking in order to "test" the range of the data types.
If you just want to know the range of int, long and friends, then limits.h is the place to look. But if you really want ...
[..] to calculate the range [..]
... for whatever reason, then you could do so with the unsigned variant of the respective type (though see the note at the end), and calculate the maximum like so:
unsigned long calculate_max_ulong(void) {
unsigned long follow = 0;
unsigned long lead = 1;
while (lead != 0) {
++lead;
++follow;
}
return follow;
}
This only results in an unsigned integer wrap (from the max value to 0), which is not classified as undefined behavior. With the result from above, you can get the minimum and maximum of the corresponding signed type like so:
assert(sizeof(long) == sizeof(unsigned long));
unsigned long umax_h = calculate_max_ulong() / 2u;
long max = umax_h;
long min = - max - 1;
Note: this assumes two's complement for the signed type and that the unsigned type has exactly one more value bit than the signed type. See §6.2.6.2/2 (N1570, for example) for further information.
I have the following program
#include <stdio.h>
#include <stdlib.h>
#include <inttypes.h>
int main(void) {
uint16_t o = 100;
uint32_t i1 = 30;
uint32_t i2 = 20;
o = (uint16_t) (o - (i1 - i2)); /*Case A*/
o -= (uint16_t) (i1 - i2); /*Case B*/
(void)o;
return 0;
}
Case A compiles with no errors.
Case B causes the following error
[error: conversion to ‘uint16_t’ from ‘int’ may alter its value [-Werror=conversion]]
The warning options I'm using are:
-Werror -Werror=strict-prototypes -pedantic-errors -Wconversion -pedantic -Wall -Wextra -Wno-unused-function
I'm using GCC 4.9.2 on 64-bit Ubuntu 15.04.
Why do I get this error in Case B but not in Case A?
PS:
I ran the same example with the clang compiler and both cases compiled fine.
Integer promotion is a strange thing. Basically, all integer values of types smaller than int are promoted to int so they can be operated on efficiently, and then converted back to the smaller size when stored. This is mandated by the C standard.
So, Case A really looks like this:
o = (uint16_t) ((int)o - ((uint32_t)i1 - (uint32_t)i2));
(Note that uint32_t does not fit in int, so needs no promotion.)
And, Case B really looks like this:
o = (int)o - (int)(uint16_t) ((uint32_t)i1 - (uint32_t)i2);
The main difference is that Case A has an explicit cast, whereas Case B has an implicit conversion.
From the GCC manual:
-Wconversion
Warn for implicit conversions that may alter a value. ....
So, only Case B gets a warning.
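The resulting types can be checked with C11 _Generic. This is only a sketch: the KIND macro and the function names are hypothetical, and it assumes a typical platform where int is 32 bits and uint32_t is a typedef for unsigned int. The last function shows one way to keep Case B's narrowing explicit so that -Wconversion has nothing to complain about.

```c
#include <stdint.h>

/* 1 if the expression has type int, 2 if unsigned int, 0 otherwise.
   _Generic inspects the type only; the expression is not evaluated. */
#define KIND(expr) _Generic((expr), int: 1, unsigned int: 2, default: 0)

/* Case A: o is promoted to int, then converted to uint32_t. */
int kind_case_a(uint16_t o, uint32_t i1, uint32_t i2)
{
    return KIND(o - (i1 - i2));
}

/* Case B: both operands are uint16_t, so both are promoted to int. */
int kind_case_b(uint16_t o, uint32_t i1, uint32_t i2)
{
    return KIND(o - (uint16_t)(i1 - i2));
}

/* Case B with the narrowing of the int result made explicit. */
uint16_t case_b_fixed(uint16_t o, uint32_t i1, uint32_t i2)
{
    return (uint16_t)(o - (uint16_t)(i1 - i2));
}
```

Under the stated assumptions, kind_case_a yields 2 (unsigned int) and kind_case_b yields 1 (int), which is why only Case B ends with an implicit int to uint16_t conversion.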
Your case B is equivalent to:
o = o - (uint16_t) (i1 - i2); /*Case B*/
The result is an int which may not fit in uint16_t, so, per your extreme warning options, it produces a warning (and thus an error since you're treating warnings as errors).