John Regehr's blog post A Guide to Undefined Behavior in C and C++, Part 1 contains the following "safe" function for "performing integer division without executing undefined behavior":
int32_t safe_div_int32_t (int32_t a, int32_t b) {
    if ((b == 0) || ((a == INT32_MIN) && (b == -1))) {
        report_integer_math_error();
        return 0;
    } else {
        return a / b;
    }
}
I'm wondering what is wrong with the division (a/b) when a = INT32_MIN and b = -1. Is it undefined? If so, why?
I think it's because the absolute value of INT32_MIN is 1 larger than INT32_MAX. So INT32_MIN / -1 actually equals INT32_MAX + 1, which would overflow.
So for 32-bit integers, there are 4,294,967,296 values.
There are 2,147,483,648 values for negative numbers (-2,147,483,648 to -1).
There is 1 value for zero (0).
There are 2,147,483,647 values for positive numbers (1 to 2,147,483,647) because 0 took 1 value away from the positive numbers.
This is because int32_t is represented using two's complement, and numbers with N bits in two's complement range from -2^(N-1) to 2^(N-1) - 1. Therefore, with N = 32, when you carry out the division you get -2^31 / -1 = 2^31. Notice that the result is larger than 2^31 - 1, meaning you get an overflow!
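As a sketch of the guard in practice, here is a variant of the function above that reports failure through a hypothetical bool out-parameter instead of calling report_integer_math_error(), plus a helper that computes the exact quotient in int64_t, where 2^31 does fit. Both function names are made up for illustration:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical variant of safe_div_int32_t: failure is reported through *ok
   instead of calling report_integer_math_error(). */
int32_t safe_div_checked(int32_t a, int32_t b, bool *ok)
{
    /* INT32_MIN == -INT32_MAX - 1, so the exact quotient INT32_MIN / -1 is
       INT32_MAX + 1, which int32_t cannot hold. */
    if (b == 0 || (a == INT32_MIN && b == -1)) {
        *ok = false;
        return 0;
    }
    *ok = true;
    return a / b;   /* truncates toward zero */
}

/* The exact quotient, computed in a type wide enough to hold it.
   The caller must ensure b != 0. */
int64_t exact_quotient(int32_t a, int32_t b)
{
    return (int64_t)a / b;
}
```

For example, exact_quotient(INT32_MIN, -1) is 2147483648, one more than INT32_MAX, which is exactly the value the guard refuses to produce in int32_t.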
The other posters are correct about the causes of the overflow. On some machines (ARM's SDIV instruction, for example), the implication of the overflow is that INT_MIN / -1 => INT_MIN, while on x86 the IDIV instruction raises a divide-error exception instead (SIGFPE on Unix-like systems). The same wraparound to INT_MIN typically happens when multiplying by -1. This is an unexpected and possibly dangerous result. I've seen a fixed-point motor controller go out of control because it didn't check for this condition.
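Negation hits the same edge case, since -INT32_MIN would also be INT32_MAX + 1. A minimal sketch of a checked negation (the helper name is made up for illustration):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: negate x only when the result is representable.
   Returns false (and leaves *out untouched) for INT32_MIN. */
bool checked_negate_int32(int32_t x, int32_t *out)
{
    if (x == INT32_MIN) {   /* -INT32_MIN would be INT32_MAX + 1 */
        return false;
    }
    *out = -x;
    return true;
}
```

Every other int32_t value, including INT32_MAX, negates safely.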
Because INT32_MIN is defined as (-INT32_MAX - 1) = -(INT32_MAX + 1), dividing it by -1 would give (INT32_MAX + 1), so there is an integer overflow. I must say, that is a nice way to check for overflows. Thoughtfully written code. +1 to the developer.
I have one double, and one int64_t. I want to know if they hold exactly the same value, and if converting one type into the other does not lose any information.
My current implementation is the following:
int int64EqualsDouble(int64_t i, double d) {
    return (d >= INT64_MIN)
        && (d < INT64_MAX)
        && (round(d) == d)
        && (i == (int64_t)d);
}
My question is: is this implementation correct? And if not, what would be a correct answer? To be correct, it must leave no false positive, and no false negative.
Some sample inputs:
int64EqualsDouble(0, 0.0) should return 1
int64EqualsDouble(1, 1.0) should return 1
int64EqualsDouble(0x3FFFFFFFFFFFFFFF, (double)0x3FFFFFFFFFFFFFFF) should return 0, because 2^62 - 1 can be exactly represented with int64_t, but not with double.
int64EqualsDouble(0x4000000000000000, (double)0x4000000000000000) should return 1, because 2^62 can be exactly represented in both int64_t and double.
int64EqualsDouble(INT64_MAX, (double)INT64_MAX) should return 0, because INT64_MAX cannot be exactly represented as a double.
int64EqualsDouble(..., 1.0e100) should return 0, because 1.0e100 cannot be exactly represented as an int64_t.
Yes, your solution works correctly on platforms that use something resembling binary IEEE 754 double precision for the double type, because int64_t is represented in two's complement by definition (C99 7.18.1.1:1). It is basically the same as this one.
Under these conditions:
d < INT64_MAX is correct because it is equivalent to d < (double)INT64_MAX, and in the conversion to double the number INT64_MAX, equal to 0x7fffffffffffffff, rounds up to 2^63. Thus you want d to be strictly less than the resulting double to avoid triggering UB when executing (int64_t)d.
On the other hand, INT64_MIN, being -0x8000000000000000, is exactly representable, meaning that a double that is equal to (double)INT64_MIN can be equal to some int64_t and should not be excluded (and such a double can be converted to int64_t without triggering undefined behavior).
It goes without saying that since we have specifically used the assumptions of two's complement for integers and binary floating-point, this reasoning does not guarantee the correctness of the code on platforms that differ. Take a platform with binary 64-bit floating-point and a 64-bit ones' complement integer type T. On that platform T_MIN is -0x7fffffffffffffff. The conversion of that number to double rounds down, resulting in -0x1.0p63. On that platform, using your program as written, passing -0x1.0p63 for d makes the first three conditions true, resulting in undefined behavior in (T)d, because overflow in the conversion from floating-point to integer is undefined behavior.
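A quick way to check the question's sample inputs is sketched below, assuming IEEE 754 binary64 double. It keeps the question's two range tests but, as a variant chosen for this sketch, replaces round(d) == d with a round trip through int64_t, which is safe only once the range tests have passed (and avoids linking the math library). The function name is hypothetical:

```c
#include <stdint.h>

/* Variant of the question's int64EqualsDouble, assuming IEEE 754 binary64
   double. The two range tests are the question's own; round(d) == d is
   replaced (a choice made for this sketch) by a round trip through int64_t,
   which is only evaluated after the range tests make the conversion defined. */
int int64EqualsDoubleNoRound(int64_t i, double d)
{
    /* INT64_MIN converts exactly to -2^63; INT64_MAX rounds up to 2^63.
       Both comparisons are false for NaN, so NaN is rejected here too. */
    if (!(d >= INT64_MIN && d < INT64_MAX)) {
        return 0;
    }
    int64_t t = (int64_t)d;           /* truncates toward zero */
    return (double)t == d && t == i;  /* d is whole, and equals i */
}
```

With the six sample inputs from the question, in order, this returns 1, 1, 0, 1, 0, 0; a pair such as (INT64_MIN, (double)INT64_MIN) returns 1, as argued above.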
If you have access to full IEEE 754 features, there is a shorter solution:
#include <fenv.h>
…
#pragma STDC FENV_ACCESS ON
feclearexcept(FE_INEXACT), f == i && !fetestexcept(FE_INEXACT)
This solution takes advantage of the conversion from integer to floating-point setting the INEXACT flag iff the conversion is inexact (that is, if i is not representable exactly as a double).
The INEXACT flag remains unset and f is equal to (double)i if and only if f and i represent the same mathematical value in their respective types.
This approach requires the compiler to have been warned that the code accesses the FPU's state, normally with #pragma STDC FENV_ACCESS ON, but that pragma is typically not supported and you have to use a compilation flag instead.
OP's code has a dependency that can be avoided.
For a successful compare, d must be a whole number, and round(d) == d takes care of that. Even if d is a NaN, it fails that test.
d must be mathematically in the range [INT64_MIN ... INT64_MAX], and if the conditions properly ensure that, then the final i == (int64_t)d completes the test.
So the question comes down to comparing INT64 limits with the double d.
Let us assume FLT_RADIX == 2, but not necessarily IEEE 754 binary64.
d >= INT64_MIN is not a problem as -INT64_MIN is a power of 2 and exactly converts to a double of the same value, so the >= is exact.
Code would like to do the mathematical d <= INT64_MAX, but that may not work, and so it is a problem. INT64_MAX is a "power of 2 - 1" and may not convert exactly; that depends on whether the precision of double exceeds 63 bits, rendering the compare unclear. A solution is to halve the comparison: d/2 suffers no precision loss, and INT64_MAX/2 + 1 converts exactly to a double power of 2:
d/2 < (INT64_MAX/2 + 1)
[Edit]
// or simply
d < ((double)(INT64_MAX/2 + 1))*2
Thus, if code does not want to rely on double having less precision than uint64_t (an assumption that would likely fail for long double), a more portable solution would be:
int int64EqualsDouble(int64_t i, double d) {
    return (d >= INT64_MIN)
        && (d < ((double)(INT64_MAX/2 + 1))*2)  // (d/2 < (INT64_MAX/2 + 1))
        && (round(d) == d)
        && (i == (int64_t)d);
}
Note: No rounding mode issues.
[Edit] Deeper limit explanation
Ensuring, mathematically, INT64_MIN <= d <= INT64_MAX can be restated as INT64_MIN <= d < (INT64_MAX + 1), as we are dealing with whole numbers. Since INT64_MAX + 1 overflows when evaluated in code (undefined behavior), (double)(INT64_MAX + 1) cannot be used; an alternative is ((double)(INT64_MAX/2 + 1))*2. This can be extended, for rare machines whose double uses a higher power-of-2 radix, to ((double)(INT64_MAX/FLT_RADIX + 1))*FLT_RADIX. Because the comparison limits are exact powers of 2, conversion to double suffers no precision loss, and (d >= lo_limit) && (d < hi_limit) is exact regardless of the precision of the floating point. Note that a rare floating point with FLT_RADIX == 10 is still a problem.
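A minimal sketch of the exact range test described above, assuming FLT_RADIX == 2 (the function name is hypothetical). The preprocessor check makes the assumption explicit instead of silently relying on it:

```c
#include <float.h>
#include <stdbool.h>
#include <stdint.h>

#if FLT_RADIX != 2
#error "this sketch assumes FLT_RADIX == 2"
#endif

/* Mathematically INT64_MIN <= d <= INT64_MAX, tested exactly: both limits
   are powers of two, so converting them to double loses nothing regardless
   of how many bits of precision double has. */
bool in_int64_range(double d)
{
    const double lo = (double)INT64_MIN;                  /* -2^63 */
    const double hi = ((double)(INT64_MAX / 2 + 1)) * 2;  /*  2^63 */
    return d >= lo && d < hi;   /* every comparison with NaN is false */
}
```

On a binary64 platform the largest accepted value is 2^63 - 1024, the biggest double below 2^63.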
In addition to Pascal Cuoq's elaborate answer, and given the extra context you give in comments, I would add a test for negative zeros. You should preserve negative zeros unless you have good reasons not to. You need a specific test to avoid converting them to (int64_t)0. With your current proposal, negative zeros will pass your test, get stored as int64_t and read back as positive zeros.
I am not sure what the most efficient way to test them is; maybe this:
int int64EqualsDouble(int64_t i, double d) {
    return (d >= INT64_MIN)
        && (d < INT64_MAX)
        && (round(d) == d)
        && (i == (int64_t)d)
        && (!signbit(d) || d != 0.0);  /* reject negative zero */
}
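The reason a dedicated test is needed: -0.0 compares equal to 0.0, and converting it to int64_t and back silently yields +0.0. A small sketch using the signbit macro from <math.h> (both helper names are illustrative):

```c
#include <math.h>
#include <stdbool.h>
#include <stdint.h>

/* True only for negative zero. d == 0.0 alone cannot tell -0.0 from +0.0,
   so the sign bit is inspected with signbit(). */
bool is_negative_zero(double d)
{
    return d == 0.0 && signbit(d);
}

/* Store d as int64_t and read it back, as the question's use case does.
   Only meaningful for d already known to be in range and whole. */
double roundtrip_through_int64(double d)
{
    return (double)(int64_t)d;
}
```

Passing -0.0 through roundtrip_through_int64 returns a zero whose sign bit is clear, which is exactly the information loss the extra condition guards against.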
For example, suppose there are three variables of type long; we add a and b and get s:
long a, b, s;
...
s = a + b;
Now what does ((s^a) < 0 && (s^b) < 0) mean?
I saw a check like this in the source code of Python:
if (PyInt_CheckExact(v) && PyInt_CheckExact(w)) {
    /* INLINE: int + int */
    register long a, b, i;
    a = PyInt_AS_LONG(v);
    b = PyInt_AS_LONG(w);
    i = a + b;
    if ((i^a) < 0 && (i^b) < 0)
        goto slow_iadd;
    x = PyInt_FromLong(i);
}
This code is wrong.
Assuming the usual 2's-complement rules of bitwise XOR for signed integers, then
(s^a) < 0
is the case if s and a have their sign bits set to opposite values. Thus,
((s^a) < 0 && (s^b) < 0)
indicates that s has sign different from both a and b, which must then have equal sign (pretending 0 is positive). If you added two integers of equal sign and got a result of different sign, there must have been an overflow, so this is an overflow check.
If we assume that signed overflow wraps, then s has opposite sign from a and b exactly when overflow has occurred. However, signed overflow is undefined behavior. Computing s is already wrong; we need to check whether the overflow would occur without actually performing the operation.
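One conventional way to check before adding, sketched here for long with a hypothetical function name, is to compare against the limits first so that the overflowing addition is never evaluated:

```c
#include <limits.h>
#include <stdbool.h>

/* Returns true if a + b would overflow long. Only subtractions that are
   known to stay in range are performed, so there is no undefined behavior. */
bool add_would_overflow(long a, long b)
{
    if (b > 0) {
        return a > LONG_MAX - b;  /* LONG_MAX - b cannot overflow for b > 0 */
    }
    if (b < 0) {
        return a < LONG_MIN - b;  /* LONG_MIN - b cannot overflow for b < 0 */
    }
    return false;                 /* adding zero never overflows */
}
```

Only when this returns false is it safe to compute s = a + b.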
Python isn't supposed to do this. You can see what it's supposed to do in the Python 2 source code for int.__add__:
/* casts in the line below avoid undefined behaviour on overflow */
x = (long)((unsigned long)a + b);
if ((x^a) >= 0 || (x^b) >= 0)
It's supposed to cast to unsigned to get defined overflow behavior. The cast-to-unsigned fix was introduced in 5 different places as a result of issue 7406 on the Python bug tracker, but it looks like they missed a spot, or perhaps INPLACE_ADD was changed since then. I've left a message on the tracker.
I don't see how this is possible.
If (s^a) < 0, then the sign bit of either s or a must be a 1, so either s or a (but not both) must be negative. Same for s and b. So either s is negative and both a and b are positive, or s is positive and both a and b are negative. Both situations seem impossible.
Unless you count integer overflow/underflow, of course.
int a = 31;
int b = 1;
while (a)
{
    b = (2 * b) + 1;
    a--;
}
printf("count:%d \n", b);
It prints the right number when a is smaller than 31. Starting from 31, it just prints -1, and I don't understand why. How can I fix it?
The integer overflows and will become negative.
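To fix this, you can change the int variable b to long (note that long is only 32 bits on some platforms, such as 64-bit Windows, where you would need long long):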
long b = 1;
In the two's complement representation of type int, if its size is 4 bytes, the sign bit corresponds to 2^31.
Thus, when you compute 2 * b + 1 while b is equal to INT_MAX, the sign bit gets set and the result becomes -1 (strictly speaking the overflow is undefined behavior, but this is what typically happens).
That is, while the sign bit is not set you get positive numbers 1, 3, 7, 15 and so on. As soon as the sign bit is set you get the negative number -1, whose internal representation has all bits set, including the sign bit.
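To see the bit patterns without relying on undefined behavior, the same recurrence can be run in uint32_t, where wraparound is well defined. This is an illustrative sketch with a hypothetical function name; reinterpreting the final all-ones value as int32_t gives -1 on common two's-complement compilers (that conversion is implementation-defined):

```c
#include <stdint.h>

/* Runs b = 2*b + 1 for the given number of iterations in uint32_t, where
   overflow wraps modulo 2^32 instead of being undefined. */
uint32_t series_bits_u32(int iterations)
{
    uint32_t b = 1;
    while (iterations--) {
        b = 2u * b + 1u;
    }
    return b;
}
```

After 30 iterations the value is 0x7FFFFFFF (INT_MAX); after 31 it is 0xFFFFFFFF, the all-ones pattern described above.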
The while loop is terminated when a becomes 0, since the condition
while (a) ...
is equivalent to
while (a != 0)...
It will happen after the loop is executed 31 times with the following expression in it:
b = (2 * b) + 1;
while the initial value of b is 1. It generates the series 1, 3, 7, 15, ..., 2^(k+1) - 1, where k is the iteration number (starting from 0 for the initial value). So for k = 31 the value would be 2^32 - 1. 2^32 overflows the 4-byte integer storage type, which results in undefined behavior. In practice, many compilers handle the overflow by simply discarding the bits that do not fit, so the truncated 2^32 becomes 0, and 0 - 1 = -1, which is your result. But again, no standard guarantees that you will get this result, so you should never rely on it.
To fix it, you can use a bigger storage type, like long, and use %ld for printf.
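A sketch of the suggested fix, using int64_t rather than long since long is only 32 bits on some platforms (for example 64-bit Windows); the result would then be printed with PRId64 from <inttypes.h> instead of %ld. The function name is hypothetical:

```c
#include <stdint.h>

/* The loop from the question with a 64-bit accumulator, so that the value
   after 31 iterations, 2^32 - 1, fits without overflow. */
int64_t series_after(int iterations)
{
    int64_t b = 1;
    while (iterations) {
        b = (2 * b) + 1;
        iterations--;
    }
    return b;
}
```

series_after(31) returns 4294967295, and the widened type keeps working up to 62 iterations.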
@EugeneSh.'s answer is OK, but the series doesn't start at 1 but at 3. And to print all values of b, the printf needs to be inside the while loop. If the intention is to start at 1, the printf needs to be the first line inside the while loop.
I'm just starting to learn C at school, I'm trying to get a hold of the basic concepts.
Our homework has a question,
for every int x: x+1 > x
Determine whether true or false, give reasoning if true and counterexample if false.
I'm confused because we were taught that the type int is 32 bits, which basically means the integer is stored in binary format. Is x+1 adding 1 to the decimal value of x?
x + 1 > x
evaluates to 1 for every int value except INT_MAX, where INT_MAX + 1 overflows; therefore the x + 1 > x expression has undefined behavior for x == INT_MAX.
This actually means a compiler has the right to optimize out the expression:
x + 1 > x
by
1
As INT_MAX + 1 is undefined behavior, the compiler has the right to say that for this specific > expression INT_MAX + 1 is > INT_MAX.
As the x + 1 > x expression is undefined behavior for x == INT_MAX, it is also not safe to assume x + 1 > x can be false (0).
Note that if x was declared as an unsigned int instead of an int the situation is completely different. unsigned int operands never overflow (they wrap around): UINT_MAX + 1 == 0 and therefore x + 1 > x is 0 for x == UINT_MAX and 1 for all the other x values.
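The distinction can be made concrete with two small sketches (names are hypothetical): for int, test for INT_MAX before adding, so the overflowing addition is never evaluated; for unsigned int, the expression is well defined as written:

```c
#include <limits.h>
#include <stdbool.h>

/* UB-free form of "x + 1 > x" for int. x + 1 is evaluated only when it is
   representable; for INT_MAX this sketch chooses to return false, since the
   original expression has no defined value there. */
bool int_succ_greater(int x)
{
    return x != INT_MAX && x + 1 > x;
}

/* For unsigned int the addition wraps, so the expression is well defined
   and is false exactly when x == UINT_MAX. */
bool uint_succ_greater(unsigned x)
{
    return x + 1u > x;
}
```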
Modern compilers (like gcc) usually take the opportunity to optimize this expression and replace it with 1.
For the record, there were some serious security issues in well-known server programs using code like:
if (ptr + offset < ptr)
The code was meant to trigger a safety condition but the compiler would optimize out the if statement (by replacing the expression with 0) and it allowed an attacker to gain privilege escalation in the server program (by opening the possibility of an exploitable buffer overflow if I remember correctly).
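The usual remedy is to compare the offset against the space remaining instead of forming an out-of-range pointer. A sketch, assuming the caller knows where the buffer ends (the function name is hypothetical):

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true if ptr + offset stays within [ptr, end]. The check never
   computes ptr + offset itself, because forming a pointer past the end of
   an array is already undefined behavior. Assumes ptr <= end and both point
   into (or one past) the same array. */
bool offset_in_bounds(const char *ptr, const char *end, size_t offset)
{
    return offset <= (size_t)(end - ptr);
}
```

Because no out-of-range pointer is ever formed, the compiler has no license to delete this check.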
Note that the 32-bit number range is [-2147483648, 2147483647], that is, [-2^31, 2^31 - 1].
So the expression x + 1 > x is true for x in [-2147483648, 2147483646].
But not for 2147483647: adding 1 to 2147483647 in a 32-bit number overflows. Many implementations wrap x + 1 to -2147483648, but the behavior is actually
undefined in the C standard.
So,
x + 1 > x is true for x in [-2147483648, 2147483646] only.
x + 1 > x for x = 2147483647 is undefined: the result may be true or false depending on the compiler. If a compiler computes x + 1 as -2147483648, the result will be false.
I don't want to hand you the answer, so I'll reply with a question that should get you on the right track.
What is x + 1 when x is the largest possible value that can be stored in a 32-bit signed integer? (2,147,483,647)
Yes, x + 1 adds 1 to the decimal value of x.
This will be true almost all of the time. But if you add 1 to INT_MAX (which is 2^15 - 1 or greater), you might flip the sign. Think about the decimal values of 01111111 versus 10000000. (Obviously not 32 bits, but the ideas hold.)
Look up two's complement if you're confused about why it flips. It's a pretty clever implementation of integers that makes addition easy.
EDIT: INT_MAX + 1 is undefined behavior; it doesn't necessarily become INT_MIN. But since x + 1 is not necessarily > x when x == INT_MAX, the answer is clearly false!
Let us say we have x and y, and both are signed integers in C. How do we find the most accurate mean value of the two?
I would prefer a solution that does not take advantage of any machine/compiler/toolchain specific workings.
The best I have come up with is: (a / 2) + (b / 2) + !!(a % 2) * !!(b % 2). Is there a solution that is more accurate? Faster? Simpler?
What if we know if one is larger than the other a priori?
Thanks.
Editor's Note: Please note that the OP expects answers that are not subject to integer overflow when input values are close to the maximum absolute bounds of the C int type. This was not stated in the original question, but is important when giving an answer.
Added after the accepted answer (4 years later):
I would expect the function int average_int(int a, int b) to:
1. Work over the entire range of [INT_MIN..INT_MAX] for all combinations of a and b.
2. Have the same result as (a+b)/2, as if using wider math.
When a wider type int2x exists (a signed integer type twice as wide as int), @Santiago Alessandri's approach works well.
int avgSS(int a, int b) {
    return (int) ( ((int2x) a + b) / 2);
}
Otherwise, a variation on @AProgrammer's answer:
Note: wider math is not needed.
int avgC(int a, int b) {
    if ((a < 0) == (b < 0)) {  // a,b same sign
        return a/2 + b/2 + (a%2 + b%2)/2;
    }
    return (a+b)/2;
}
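To sanity-check avgC, it can be compared against a wider-math reference. This sketch assumes long long is wider than int; avg_wide is a hypothetical helper name:

```c
#include <limits.h>

/* avgC as given above: halve first when a and b have the same sign (so
   nothing can overflow); otherwise a + b cannot overflow at all. */
int avgC(int a, int b)
{
    if ((a < 0) == (b < 0)) {   /* same sign, zero counted as positive */
        return a / 2 + b / 2 + (a % 2 + b % 2) / 2;
    }
    return (a + b) / 2;
}

/* Reference result via wider math; assumes long long is wider than int. */
int avg_wide(int a, int b)
{
    return (int)(((long long)a + b) / 2);
}
```

The two agree on small inputs as well as at the extremes, e.g. avgC(INT_MAX, INT_MAX) == INT_MAX.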
A solution with more tests, but without %
All below solutions "worked" to within 1 of (a+b)/2 when overflow did not occur, but I was hoping to find one that matched (a+b)/2 for all int.
@Santiago Alessandri's solution works as long as the range of int is narrower than the range of long long, which is usually the case.
((long long)a + (long long)b) / 2
@AProgrammer's accepted answer fails about 1/4 of the time to match (a+b)/2. Example inputs: a == 1, b == -2.
a/2 + b/2 + (a%2 + b%2)/2
@Guy Sirton's solution fails about 1/8 of the time to match (a+b)/2. Example inputs: a == 1, b == 0.
int sgeq = ((a<0)==(b<0));
int avg = ((!sgeq)*(a+b)+sgeq*(b-a))/2 + sgeq*a;
@R..'s solution fails about 1/4 of the time to match (a+b)/2. Example inputs: a == 1, b == 1.
return (a-(a|b)+b)/2+(a|b)/2;
@MatthewD's now-deleted solution fails about 5/6 of the time to match (a+b)/2. Example inputs: a == 1, b == -2.
unsigned diff;
signed mean;
if (a > b) {
    diff = a - b;
    mean = b + (diff >> 1);
} else {
    diff = b - a;
    mean = a + (diff >> 1);
}
If (a^b)<=0 you can just use (a+b)/2 without fear of overflow.
Otherwise, try (a-(a|b)+b)/2+(a|b)/2. -(a|b) is at least as large in magnitude as both a and b and has the opposite sign, so this avoids the overflow.
I did this quickly off the top of my head so there might be some stupid errors. Note that there are no machine-specific hacks here. All behavior is completely determined by the C standard and the fact that it requires twos-complement, ones-complement, or sign-magnitude representation of signed values and specifies that the bitwise operators work on the bit-by-bit representation. Nope, the relative magnitude of a|b depends on the representation...
Edit: You could also use a+(b-a)/2 when they have the same sign. Note that this will give a bias towards a. You can reverse it and get a bias towards b. My solution above, on the other hand, gives bias towards zero if I'm not mistaken.
Another try: One standard approach is (a&b)+(a^b)/2. In twos complement it works regardless of the signs, but I believe it also works in ones complement or sign-magnitude if a and b have the same sign. Care to check it?
Edit: version fixed by @chux - Reinstate Monica:
if ((a < 0) == (b < 0)) {  // a,b same sign
    return a/2 + b/2 + (a%2 + b%2)/2;
} else {
    return (a+b)/2;
}
Original answer (I'd have deleted it if it hadn't been accepted).
a/2 + b/2 + (a%2 + b%2)/2
Seems the simplest one fitting the bill of no assumption on implementation characteristics (it depends on C99, which specifies the result of / as "truncated toward 0", while it was implementation-dependent in C90).
It has the advantage of having no test (and thus no costly jumps) and all divisions/remainder are by 2 so the use of bit twiddling techniques by the compiler is possible.
For unsigned integers, the average is the floor of (x+y)/2. But the same fails for signed integers: the formula fails for integers whose sum is an odd negative number, as their floor is one less than their average.
You can read up more at Hacker's Delight in section 2.5
The code to calculate average of 2 signed integers without overflow is
int t = (a & b) + ((a ^ b) >> 1);
unsigned t_u = (unsigned)t;
int avg = t + ( (t_u >> 31 ) & (a ^ b) );
I have checked its correctness using the Z3 SMT solver.
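Wrapped into a function, the formula looks like this. The sketch assumes int32_t and an arithmetic right shift of negative values (implementation-defined in C, but what mainstream compilers do); the name avg_hd is hypothetical:

```c
#include <stdint.h>

/* Average of a and b rounded toward zero, without overflow.
   t = (a & b) + ((a ^ b) >> 1) is floor((a + b) / 2); when t is negative and
   a + b is odd (low bit of a ^ b set), adding 1 turns the floor into
   truncation toward zero, matching (a + b) / 2 computed in wider math. */
int32_t avg_hd(int32_t a, int32_t b)
{
    int32_t t = (a & b) + ((a ^ b) >> 1);
    uint32_t t_u = (uint32_t)t;
    return t + (int32_t)((t_u >> 31) & (uint32_t)(a ^ b));
}
```

For example, avg_hd(INT32_MIN, INT32_MAX) is 0, the same as (-1) / 2 in C.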
Just a few observations that may help:
"Most accurate" isn't necessarily unique with integers. E.g. for 1 and 4, 2 and 3 are an equally "most accurate" answer. Mathematically (not C integers):
(a+b)/2 = a+(b-a)/2 = b+(a-b)/2
Let's try breaking this down:
If sign(a) != sign(b), then a+b will not overflow. This case can be determined by comparing the most significant bit in a two's complement representation.
If sign(a)==sign(b) then if a is greater than b, (a-b) will not overflow. Otherwise (b-a) will not overflow. EDIT: Actually neither will overflow.
What are you trying to optimize exactly? Different processor architectures may have different optimal solutions. For example, in your code, replacing the multiplication with an AND may improve performance. Also, on a two's complement architecture you can simply use (a & b & 1).
I'm just going to throw some code out, not looking too fast but perhaps someone can use and improve:
int sgeq = ((a<0)==(b<0));
int avg = ((!sgeq)*(a+b)+sgeq*(b-a))/2 + sgeq*a;
I would do this: convert both to long long (64-bit signed integers) and add them up, which won't overflow, and then divide the result by 2:
((long long)a + (long long)b) / 2
If you want the decimal part, store it as a double.
It is important to note that the result will fit in a 32 bit integer.
If you are using the highest-rank integer, then you can use:
((double)a + (double)b) / 2
This answer works for any number of integers (note that the code is C#, not C):
int[] array = { 1, 2, 3, 4, 5, 6, 7, 8, 9 };
decimal avg = 0;
for (int i = 0; i < array.Length; i++) {
    avg = (array[i] - avg) / (i+1) + avg;
}
For this test, avg is expected to be 5.0.
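For readers following along in C, here is a rough equivalent of the running-mean loop, using double in place of C#'s decimal (so rounding behavior differs from the original); the function name is hypothetical:

```c
#include <stddef.h>

/* Incremental mean: each step moves avg toward the new element by 1/(i+1)
   of the difference, so no potentially overflowing sum is accumulated.
   Returns 0.0 for an empty array. */
double running_mean(const int *values, size_t n)
{
    double avg = 0.0;
    for (size_t i = 0; i < n; i++) {
        avg += (values[i] - avg) / (double)(i + 1);
    }
    return avg;
}
```

With the array { 1, 2, ..., 9 } from the C# test, every intermediate step happens to be exact in binary floating point, so the result is exactly 5.0.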