Let us say we have a and b and both are signed integers in C, how do we find the most accurate mean value between the two?
I would prefer a solution that does not take advantage of any machine/compiler/toolchain specific workings.
The best I have come up with is: (a / 2) + (b / 2) + !!(a % 2) * !!(b % 2). Is there a solution that is more accurate? Faster? Simpler?
What if we know if one is larger than the other a priori?
Thanks.
D
Editor's Note: Please note that the OP expects answers that are not subject to integer overflow when input values are close to the maximum absolute bounds of the C int type. This was not stated in the original question, but is important when giving an answer.
Added after the accepted answer (4 years later)
I would expect the function int average_int(int a, int b) to:
1. Work over the entire range of [INT_MIN..INT_MAX] for all combinations of a and b.
2. Have the same result as (a+b)/2, as if using wider math.
When int2x exists, #Santiago Alessandri approach works well.
int avgSS(int a, int b) {
return (int) ( ((int2x) a + b) / 2);
}
Otherwise a variation on #AProgrammer:
Note: wider math is not needed.
int avgC(int a, int b) {
if ((a < 0) == (b < 0)) { // a,b same sign
return a/2 + b/2 + (a%2 + b%2)/2;
}
return (a+b)/2;
}
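One way to gain confidence in avgC is to compare it against a wider-math reference at the extremes of int. The following is only a sketch: it assumes long long is wider than int, and avg_ref and count_mismatches are hypothetical helper names that do not appear in the answer.

```c
#include <limits.h>

/* Variation from the answer above: no wider type needed. */
int avgC(int a, int b) {
    if ((a < 0) == (b < 0)) {            /* a, b same sign: the halves cannot overflow */
        return a / 2 + b / 2 + (a % 2 + b % 2) / 2;
    }
    return (a + b) / 2;                  /* opposite signs: a + b cannot overflow */
}

/* Reference using wider math; assumes long long is wider than int. */
int avg_ref(int a, int b) {
    return (int)(((long long)a + b) / 2);
}

/* Compare both functions on every pair of edge values; returns the number of mismatches. */
int count_mismatches(void) {
    static const int edge[] = { INT_MIN, INT_MIN + 1, -3, -2, -1, 0, 1, 2, 3,
                                INT_MAX - 1, INT_MAX };
    const int n = (int)(sizeof edge / sizeof edge[0]);
    int bad = 0;
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            if (avgC(edge[i], edge[j]) != avg_ref(edge[i], edge[j]))
                bad++;
    return bad;
}
```

A spot check on edge values is not a proof, but it exercises the cases where a + b would overflow.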
A solution with more tests, but without %
All below solutions "worked" to within 1 of (a+b)/2 when overflow did not occur, but I was hoping to find one that matched (a+b)/2 for all int.
#Santiago Alessandri Solution works as long as the range of int is narrower than the range of long long - which is usually the case.
((long long)a + (long long)b) / 2
#AProgrammer, the accepted answer, fails about 1/4 of the time to match (a+b)/2. Example inputs like a == 1, b == -2
a/2 + b/2 + (a%2 + b%2)/2
#Guy Sirton, Solution fails about 1/8 of the time to match (a+b)/2. Example inputs like a == 1, b == 0
int sgeq = ((a<0)==(b<0));
int avg = ((!sgeq)*(a+b)+sgeq*(b-a))/2 + sgeq*a;
#R.., Solution fails about 1/4 of the time to match (a+b)/2. Example inputs like a == 1, b == 1
return (a-(a|b)+b)/2+(a|b)/2;
#MatthewD, now deleted solution fails about 5/6 of the time to match (a+b)/2. Example inputs like a == 1, b == -2
unsigned diff;
signed mean;
if (a > b) {
diff = a - b;
mean = b + (diff >> 1);
} else {
diff = b - a;
mean = a + (diff >> 1);
}
If (a^b)<=0 you can just use (a+b)/2 without fear of overflow.
Otherwise, try (a-(a|b)+b)/2+(a|b)/2. -(a|b) is at least as large in magnitude as both a and b and has the opposite sign, so this avoids the overflow.
I did this quickly off the top of my head so there might be some stupid errors. Note that there are no machine-specific hacks here. All behavior is completely determined by the C standard and the fact that it requires twos-complement, ones-complement, or sign-magnitude representation of signed values and specifies that the bitwise operators work on the bit-by-bit representation. Nope, the relative magnitude of a|b depends on the representation...
Edit: You could also use a+(b-a)/2 when they have the same sign. Note that this will give a bias towards a. You can reverse it and get a bias towards b. My solution above, on the other hand, gives bias towards zero if I'm not mistaken.
Another try: One standard approach is (a&b)+(a^b)/2. In twos complement it works regardless of the signs, but I believe it also works in ones complement or sign-magnitude if a and b have the same sign. Care to check it?
Edit: version fixed by #chux - Reinstate Monica:
if ((a < 0) == (b < 0)) { // a,b same sign
return a/2 + b/2 + (a%2 + b%2)/2;
} else {
return (a+b)/2;
}
Original answer (I'd have deleted it if it hadn't been accepted).
a/2 + b/2 + (a%2 + b%2)/2
Seems the simplest one fitting the bill of no assumption on implementation characteristics (it does depend on C99, which specifies the result of / as "truncated toward 0", whereas it was implementation-defined in C90).
It has the advantage of having no test (and thus no costly jumps) and all divisions/remainder are by 2 so the use of bit twiddling techniques by the compiler is possible.
For unsigned integers the average is the floor of (x+y)/2. But the same fails for signed integers. This formula fails for integers whose sum is an odd -ve number as their floor is one less than their average.
You can read up more at Hacker's Delight in section 2.5
The code to calculate average of 2 signed integers without overflow is
int t = (a & b) + ((a ^ b) >> 1);
unsigned t_u = (unsigned)t;
int avg = t + ( (t_u >> 31 ) & (a ^ b) );
I have checked its correctness using the Z3 SMT solver.
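Wrapped in a function, the formula above might look like the sketch below. It assumes two's complement and that >> on a negative int is an arithmetic shift (implementation-defined in C); the name avg_hd is hypothetical, and the sign-bit position is derived from the width of int instead of hard-coding 31.

```c
#include <limits.h>

/* Average of two ints, rounded toward zero, without overflow.
   Assumptions: two's complement, arithmetic >> on negative ints, no padding bits. */
int avg_hd(int a, int b) {
    const int sign_pos = (int)(sizeof(int) * CHAR_BIT) - 1;   /* 31 for a 32-bit int */
    int t = (a & b) + ((a ^ b) >> 1);                         /* floor((a + b) / 2) */
    unsigned t_u = (unsigned)t;
    /* Add 1 when the floor is negative and a + b is odd, so the result rounds toward zero. */
    return t + (int)((t_u >> sign_pos) & (unsigned)(a ^ b));
}
```

The first line computes the floor of the average; the correction term only fires for a negative result with an odd sum, which is exactly where floor and truncation differ.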
Just a few observations that may help:
"Most accurate" isn't necessarily unique with integers. E.g. for 1 and 4, 2 and 3 are an equally "most accurate" answer. Mathematically (not C integers):
(a+b)/2 = a+(b-a)/2 = b+(a-b)/2
Let's try breaking this down:
If sign(a)!=sign(b) then a+b will not overflow. This case can be determined by comparing the most significant bit in a two's complement representation.
If sign(a)==sign(b) then if a is greater than b, (a-b) will not overflow. Otherwise (b-a) will not overflow. EDIT: Actually neither will overflow.
What are you trying to optimize exactly? Different processor architectures may have different optimal solutions. For example, in your code replacing the multiplication with an AND may improve performance. Also in a two's complement architecture you can simply (a & b & 1).
I'm just going to throw some code out, not looking too fast but perhaps someone can use and improve:
int sgeq = ((a<0)==(b<0));
int avg = ((!sgeq)*(a+b)+sgeq*(b-a))/2 + sgeq*a;
I would do this: convert both to long long (64-bit signed integers) and add them up, which won't overflow, then divide the result by 2:
((long long)a + (long long)b) / 2
If you want the decimal part, store it as a double.
It is important to note that the result will fit in a 32 bit integer.
If you are using the highest-rank integer, then you can use:
((double)a + (double)b) / 2
This answer fits to any number of integers:
int[] array = { 1, 2, 3, 4, 5, 6, 7, 8, 9 };
decimal avg = 0;
for (int i = 0; i < array.Length; i++){
avg = (array[i] - avg) / (i+1) + avg;
}
expects avg == 5.0 for this test
Say I want to multiply x by (3/8). So I can get the result using shift operation as follows (The result should round toward zero):
int Test(int x) {
int value = (x << 1) + x;
value = value >> 3;
value = value + ((x >> 31) & 1);
return value;
}
So I'll have 4 in Test(11) and -3 in Test(-9). The problem is, because I am doing the multiplication first, I'll have an overflow at some ranges and in those cases I won't get the correct value:
Test(0x80000000) // returns -268435455, but it should be -805306368
How can I fix this?
How can I fix this? (overflow at some ranges)
Divide by 8 first.
For each multiple of 8, the result increases by 3, exactly. So all that remains is to figure out 3/8 of numbers -7 to 7, which OP's test() can handle. Simplifications possible.
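int Times3_8(int x) {
  int div8 = x/8;                      // whole multiples of 8: each contributes exactly 3
  int value = div8*3 + Test(x%8);      // remainder is in -7..7, so Test() cannot overflow
  return value;
}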
int foo(int x)
{
return x/8*3 + x%8*3/8;
}
http://ideone.com/2wGtpl
Inspired by chux's answer: the key is to divide by 8 first (to sacrifice precision for range) and use a second term to handle the quantization error (correct for the error in a smaller range).
One solution would be to handle the high and low halves differently. For the high half of x, shift right 3 first, then multiply by 3. For the low half, multiply by 3 and then shift right by 3. Then add the two results together. That should work for the positive case. For negative numbers, you will need to tweak this a bit.
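A minimal sketch of the high/low split for the non-negative case, using an unsigned 32-bit type to sidestep the sign question entirely. The function name times3_8_halves and the choice of a 16-bit split are assumptions made for illustration; the negative-number tweak mentioned above is not covered.

```c
#include <stdint.h>

/* floor(3 * x / 8) for an unsigned 32-bit value, without intermediate overflow. */
uint32_t times3_8_halves(uint32_t x) {
    uint32_t hi = x & 0xFFFF0000u;   /* multiple of 65536, so shifting right by 3 is exact */
    uint32_t lo = x & 0x0000FFFFu;   /* at most 65535, so lo * 3 cannot overflow */
    return (hi >> 3) * 3 + ((lo * 3) >> 3);
}
```

Because the high part divides by 8 exactly, all rounding happens in the low part, where multiplying first is safe.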
Let's suppose we have normally distributed random int values from function:
unsigned int myrand();
The commonest way to shrink its range to [0, A] (int A) is to do as follows:
(double)myrand() / UINT_MAX * A
Now I need to do the same for values in range of __int64:
unsigned __int64 max64;
unsigned __int64 r64 = myrand();
r64 <<= 32;
r64 |= myrand();
r64 = normalize(r64, max64);
The problem is to normalize return range by some __int64 because it could not be placed in double. I wouldn't like to use various libraries for big numbers due to performance reasons. Is there a way to shrink return range quickly and easily while saving normal distribution of values?
The method that you give
(double)myrand() / UINT_MAX * A
is already broken. For example, if A = 1 and you want integers in the range [0, 1] you will only ever get a value of 1 if myrand () returned UINT_MAX. If you meant the range [0, A), that is only the value 0, then it is still broken because it will in that case return a value outside the range. No matter what, you are introducing a bias.
If you want A+1 different values from 0 to A inclusive, and 2^32 ≤ A < 2^64, you proceed as follows:
Step 1: Calculate a 64 bit random number R as you did. If A is one less than a power of two, you return R shifted by the right amount.
Step 2: Find how many different random values would be mapped to the same output value. Mathematically, that number is floor (2^64 / (A + 1)). 2^64 is too large, but that is no problem because it is equal to 1 + floor ((2^64 - (A + 1)) / (A + 1)), calculated in C or C++ as D = 1 + (- (A + 1)) / (A + 1) if A has type uint64_t.
Step 3: Find how many different random values should be mapped by calculating N = D * (A + 1). If R >= N then go back to Step 1.
Step 4: Return R / D.
No floating point arithmetic needed. The result is totally unbiased. If A < 2^32 you fall back to the 32 bit version (or you use the 64 bit version as well, but it calls myrand() twice as often as needed).
Of course you calculate D and N only once unless A changes.
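The four steps can be sketched in C roughly as follows. This is an illustration under assumptions, not the answerer's code: next64 is a stand-in generator (splitmix64) so the example is self-contained, where the question would combine two myrand() calls; uniform_0_to_A is a hypothetical name; and the power-of-two case masks the low bits instead of shifting.

```c
#include <stdint.h>

/* Stand-in 64-bit source (splitmix64); replace with two combined myrand() calls. */
static uint64_t rng_state = 0x0123456789ABCDEFull;
uint64_t next64(void) {
    uint64_t z = (rng_state += 0x9E3779B97F4A7C15ull);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ull;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBull;
    return z ^ (z >> 31);
}

/* Unbiased value in [0, A] by rejection sampling. */
uint64_t uniform_0_to_A(uint64_t A) {
    uint64_t range = A + 1;                 /* wraps to 0 when A == UINT64_MAX */
    if ((range & A) == 0)                   /* Step 1 shortcut: range is a power of two (or 2^64) */
        return next64() & A;                /* mask here; the answer shifts instead */
    uint64_t D = 1 + (0 - range) / range;   /* Step 2: floor(2^64 / range) */
    uint64_t N = D * range;                 /* Step 3: count of accepted raw values */
    uint64_t R;
    do {
        R = next64();                       /* retry while R falls in the biased tail */
    } while (R >= N);
    return R / D;                           /* Step 4 */
}
```

In real use D and N would be cached and recomputed only when A changes, as noted above. The power-of-two shortcut is also needed for correctness here, since otherwise D * range would wrap to 0.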
Maybe you can use "long double" if it is available in your platform.
A buddy of mine had these puzzles and this is one that is eluding me. Here is the problem, you are given a number and you want to return that number times 3 and divided by 16 rounding towards 0. Should be easy. The catch? You can only use the ! ~ & ^ | + << >> operators and of them only a combination of 12.
int mult(int x){
//some code here...
return y;
}
My attempt at it has been:
int hold = x + x + x;
int hold1 = 8;
hold1 = hold1 & hold;
hold1 = hold1 >> 3;
hold = hold >> 4;
hold = hold + hold1;
return hold;
But that doesn't seem to be working. I think I have a problem of losing bits but I can't seem to come up with a way of saving them. Another perspective would be nice. Just to add, you also can only use variables of type int and no loops, if statements or function calls may be used.
Right now I have the number 0xfffffff. It is supposed to return 0x2ffffff but it is returning 0x3000000.
For this question you need to worry about the lost bits before your division (obviously).
Essentially, if it is negative then you want to add 15 after you multiply by 3. A simple if statement (using your operators) should suffice.
I am not going to give you the code but a step by step would look like,
x = x*3
get the sign and store it in variable foo.
have another variable hold x + 15;
Set up an if statement so that if x is negative it uses that added 15 and if not then it uses the regular number (times 3 which we did above).
Then divide by 16 which you already showed you know how to do. Good luck!
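For readers who do want to see the steps spelled out, here is one possible sketch using only the permitted operators. It assumes a 32-bit int, an arithmetic right shift on negative values, and that 3 * x does not overflow; the name mult3div16 is hypothetical.

```c
/* x * 3 / 16, rounded toward zero.
   Assumes 32-bit int, arithmetic >>, and that 3 * x fits in an int. */
int mult3div16(int x) {
    int t = x + x + x;            /* x * 3 */
    int bias = (t >> 31) & 15;    /* 15 if t is negative, else 0 */
    return (t + bias) >> 4;       /* adding the bias makes the shift round toward zero */
}
```

This uses six operators, and the "if statement" becomes a mask built from the sign bit. For the question's input 0xfffffff it yields 0x2ffffff.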
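This seems to work (as long as no overflow occurs), although note that >> rounds toward negative infinity, like the Math.floor comparison below, rather than toward zero: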
((num<<2)+~num+1)>>4
Try this JavaScript code, run in console:
for (var num = -128; num <= 128; ++num) {
var a = Math.floor(num * 3 / 16);
var b = ((num<<2)+~num+1)>>4;
console.log(
"Input:", num,
"Regular math:", a,
"Bit math:", b,
"Equal: ", a===b
);
}
The Maths
When you divide a positive integer n by 16, you get a positive integer quotient k and a remainder c < 16:
(n/16) = k + (c/16).
(Or simply apply the Euclidean algorithm.) The question asks for multiplication by 3/16, so multiply by 3
(n/16) * 3 = 3k + (c/16) * 3.
The number k is an integer, so the part 3k is still a whole number. However, int arithmetic rounds down, so the second term may lose precision if you divide first. Since c < 16, you can safely multiply first without overflowing (3c is at most 45, which fits in 7 bits including the sign). So the algorithm design can be
(3n/16) = 3k + (3c/16).
The design
The integer k is simply n/16 rounded down towards 0. So k can be found by applying a single AND operation. Two further operations will give 3k. Operation count: 3.
The remainder c can also be found using an AND operation (with the missing bits). Multiplication by 3 uses two more operations. A shift finishes the division. Operation count: 4.
Adding them together gives you the final answer.
Total operation count: 8.
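A minimal sketch of this design for non-negative n. Note one swap: it obtains k with a right shift by 4 instead of the AND described above, so the operation count differs; the name mul3div16_nonneg is hypothetical.

```c
/* floor(3 * n / 16) for n >= 0, computed as 3k + (3c / 16) with n = 16k + c. */
int mul3div16_nonneg(int n) {
    int k = n >> 4;                            /* quotient n / 16 */
    int c = n & 15;                            /* remainder, 0..15 */
    return (k + k + k) + ((c + c + c) >> 4);   /* 3c <= 45, so no overflow */
}
```

Because the division happens before the multiplication by 3, this works over the whole non-negative range of int, including inputs where 3 * n would overflow.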
Negatives
The above algorithm uses shift operations. It may not work well on negatives. However, assuming two's complement, the sign of n is stored in a sign bit. It can be removed before applying the algorithm and reapplied on the answer.
To find and store the sign of n, a single AND is sufficient.
To remove this sign, OR can be used.
Apply the above algorithm.
To restore the sign bit, Use a final OR operation on the algorithm output with the stored sign bit.
This brings the final operation count up to 11.
What you can do is first divide by 4, then add the result 3 times, then divide by 4 again.
3*x/16=(x/4+x/4+x/4)/4
with this logic the program can be
main()
{
int x=0xefffffff;
int y;
printf("%x",x);
y=x&(0x80000000);
y=y>>31;
x=(y&(~x+1))+(~y&(x));
x=x>>2;
x=x&(0x3fffffff);
x=x+x+x;
x=x>>2;
x=x&(0x3fffffff);
x=(y&(~x+1))+(~y&(x));
printf("\n%x %d",x,x);
}
AND with 0x3fffffff to make the MSBs zero. It'll even convert numbers to positive.
This uses the 2's complement of negative numbers. With direct methods of division there will be a loss of bit accuracy for negative numbers, so use this workaround of converting a negative number to a positive one and then performing the division operations.
Note that the C99 standard states in section 6.5.7 that right shifts of negative signed integers invoke implementation-defined behavior. Under the provisions that int is comprised of 32 bits and that right shifting of signed integers maps to an arithmetic shift instruction, the following code works for all int inputs. A fully portable solution that also fulfills the requirements set out in the question may be possible, but I cannot think of one right now.
My basic idea is to split the number into high and low bits to prevent intermediate overflow. The high bits are divided by 16 first (this is an exact operation), then multiplied by three. The low bits are first multiplied by three, then divided by 16. Since arithmetic right shift rounds towards negative infinity instead of towards zero like integer division, a correction needs to be applied to the right shift for negative numbers. For a right shift by N, one needs to add 2^N - 1 prior to the shift if the number to be shifted is negative.
#include <stdio.h>
#include <stdlib.h>
int ref (int a)
{
long long int t = ((long long int)a * 3) / 16;
return (int)t;
}
int main (void)
{
int a, t, r, c, res;
a = 0;
do {
t = a >> 4; /* high order bits */
r = a & 0xf; /* low order bits */
c = (a >> 31) & 15; /* shift correction. Portable alternative: (a < 0) ? 15 : 0 */
res = t + t + t + ((r + r + r + c) >> 4);
if (res != ref(a)) {
printf ("!!!! error a=%08x res=%08x ref=%08x\n", a, res, ref(a));
return EXIT_FAILURE;
}
a++;
} while (a);
return EXIT_SUCCESS;
}
It's an embedded platform; that's why there are such restrictions.
original equation: 0.02035*c*c - 2.4038*c
Did this:
int32_t val = 112; // this value is arbitrary
int32_t result = (val*((val * 0x535A8) - 0x2675F70));
result = result>>24;
The precision is still poor. When we multiply val*0x535A8, is there a way we can further improve the precision by rounding up, but without using any float, double, or division?
The problem is not precision. You're using plenty of bits.
I suspect the problem is that you're comparing two different methods of converting to int. The first is a cast of a double, the second is a truncation by right-shifting.
Converting floating point to integer simply drops the fractional part, leading to a round towards zero; right-shifting does a round down or floor. For positive numbers there's no difference, but for negative numbers the two methods will be 1 off from each other. See an example at http://ideone.com/rkckuy and some background reading at Wikipedia.
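The difference between the two roundings can be seen in a small sketch. It assumes that >> on a negative int32_t is an arithmetic shift (implementation-defined in C) and that n is between 1 and 30; the three function names are hypothetical.

```c
#include <stdint.h>

/* Plain shift: rounds toward negative infinity (floor). */
int32_t shift_floor(int32_t x, int n) {
    return x >> n;
}

/* Add 2^n - 1 to negative inputs first: rounds toward zero, like a cast or /. */
int32_t shift_trunc(int32_t x, int n) {
    int32_t bias = (x < 0) ? (((int32_t)1 << n) - 1) : 0;
    return (x + bias) >> n;
}

/* Add half the divisor first: rounds to nearest. */
int32_t shift_round(int32_t x, int n) {
    return (x + ((int32_t)1 << (n - 1))) >> n;
}
```

For example, with x = -3 and n = 1, shift_floor gives -2 while shift_trunc gives -1, which is the off-by-one seen when comparing against a double cast.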
Your original code is easy to fix:
int32_t result = (val*((val * 0x535A8) - 0x2675F70));
if (result < 0)
result += 0xffffff;
result = result>>24;
See the results at http://ideone.com/D0pNPF
You might also just decide that the right shift result is OK as is. The conversion error isn't greater than it is for the other method, just different.
Edit: If you want to do rounding instead of truncation the answer is even easier.
int32_t result = (val*((val * 0x535A8) - 0x2675F70));
result = (result + (1L << 23)) >> 24;
I'm going to join in with some of the others in suggesting that you use a constant expression to replace those magic constants with something that documents how they were derived.
static const int32_t a = (int32_t)(0.02035 * (1L << 24) + 0.5);
static const int32_t b = (int32_t)(2.4038 * (1L << 24) + 0.5);
int32_t result = (val*((val * a) - b));
How about just scaling your constants by 100000, giving 2035 and 240380? The largest intermediate value you then get is 2035*120*120 = 29304000, which is far below the 2^31 limit. So maybe there is no need to do real bit-tweaking here.
As noted by Joe Hass, your problem is that you shift your precision bits into the dustbin.
Whether shifting your decimals by 2 or by 10 to the left does actually not matter. Just pretend your decimal point is not behind the last bit but at the shifted position. If you keep computing with the result, shifting by 2 is likely easier to handle. If you just want to output the result, shift by powers of ten as proposed above, convert the digits and insert the decimal point 5 characters from the right.
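A minimal sketch of the power-of-ten variant, assuming a scale of 100000 (five decimal digits, matching "5 characters from the right") and 1 <= c <= 120. The name f_scaled is hypothetical.

```c
#include <stdint.h>

/* 0.02035*c*c - 2.4038*c in units of 1/100000, for 1 <= c <= 120.
   Written as (A*c - B)*c; the largest magnitude is about 7.1 million,
   far below the int32_t limit. */
int32_t f_scaled(int32_t c) {
    return (2035 * c - 240380) * c;
}
```

For c = 54 this yields -7046460, that is -70.46460 once the decimal point is inserted five digits from the right.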
Givens:
Lets assume 1 <= c <= 120,
original equation: 0.02035*c*c - 2.4038*c
then -70.98586 < f(c) < 4.585
--> -71 <= result <= 5
rounding f(c) to nearest int32_t.
Arguments A = 0.02035 and B = 2.4038
A & B may change a bit with subsequent compiles, but not at run-time.
Allow the coder to input values like 0.02035 & 2.4038. The key component, shown here and by others, is to scale factors like 0.02035 by some power of 2, evaluate the equation (simplified into the form (A*c - B)*c) and then scale the result back.
Important features:
1 When determining A and B, ensure the compile-time floating point multiplication and final conversion occur via a round and not a truncation. With positive values, the + 0.5 achieves that. Without a rounded answer, UD_A*UD_Scaling could end up just under a whole number and truncate away 0.999999 when converting to the int32_t.
2 Instead of doing expensive division at run-time, we do >> (right shift). By adding half the divisor (as suggested by #Joe Hass), before the division, we get a nicely rounded answer. It is important not to code in / here as some_signed_int / 4 and some_signed_int >> 2 do not round the same way. With 2's complement, >> truncates toward INT_MIN whereas / truncates toward 0.
#define UD_A (0.02035)
#define UD_B (2.4038)
#define UD_Shift (24)
#define UD_Scaling ((int32_t) 1 << UD_Shift)
#define UD_ScA ((int32_t) (UD_A*UD_Scaling + 0.5))
#define UD_ScB ((int32_t) (UD_B*UD_Scaling + 0.5))
for (int32_t val = 1; val <= 120; val++) {
  // use the scaled integer constants; >> is not defined for the double-valued UD_A/UD_B
  int32_t result = ((UD_ScA*val - UD_ScB)*val + UD_Scaling/2) >> UD_Shift;
  printf("%" PRId32 ", %" PRId32 "\n", val, result);
}
Example differences:
val, OP equation, OP code, This code
1, -2.38345, -3, -2
54, -70.46460, -71, -70
120, 4.58400, 4, 5
This is a new answer. My old +1 answer deleted.
If your input uses at most 7 bits and you have 32 bits available, then your best bet is to shift everything by as many bits as possible and work with that:
int32_t result;
result = (val * (int32_t)(0.02035 * 0x1000000)) - (int32_t)(2.4038 * 0x1000000);
result >>= 8; // make room for another 7 bit multiplication
result *= val;
result >>= 16;
Constant conversion will be done by an optimising compiler at compile time.
Is there a way to remove the following if-statement to check if the value is below 0?
int a = 100;
int b = 200;
int c = a - b;
if (c < 0)
{
c += 3600;
}
The value of c should lie between 0 and 3600. Both a and b are signed. The value of a also should lie between 0 and 3600. (Yes, it is a counting value in 0.1 degrees.) The value gets reset by an interrupt to 3600, but if that interrupt comes too late it underflows, which is not a problem, but the software should still be able to handle it. Which it does.
We do this if (c < 0) check at quite some places where we are calculating positions. (Calculating a new position etc.)
I was used to Python's modulo operator taking the sign of the divisor, whereas our compiler (C89) uses the sign of the dividend.
Is there some way to do this calculation differently?
example results:
a - b = c
100 - 200 = 3500
200 - 100 = 100
Good question! How about this?
c += 3600 * (c < 0);
This is one way we preserve branch predictor slots.
What about this (assuming 32-bit ints):
c += 3600 & (c >> 31);
c >> 31 sets all bits to the original MSB, which is 1 for negative numbers and 0 for others in two's complement.
Negative number shift right is formally implementation-defined according to C standard documents, however it's almost always implemented with MSB copying (common processors can do it in a single instruction).
This will surely result in no branches, unlike (c < 0) which might be implemented with branch in some cases.
Why are you worried about the branch? [Reason explained in comments to the question.]
The alternative is something like:
((a - b) + 3600) % 3600
This assumes a and b are in the range 0..3600 already; if they're not under control, the more general solution is the one Drew McGowen suggests:
((a - b) % 3600 + 3600) % 3600
The branch miss has to be very expensive to make that much calculation worthwhile.
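For comparison, the modulo form and the multiply form can be put side by side. This is only a sketch; the function names wrap_diff_mod and wrap_diff_mul are hypothetical.

```c
/* Difference a - b of two angles in tenths of a degree, wrapped into 0..3599.
   The double % gives the same result whether % follows the sign of the
   dividend or of the divisor. */
int wrap_diff_mod(int a, int b) {
    return ((a - b) % 3600 + 3600) % 3600;
}

/* Branch-free multiply variant; valid when a and b are already in 0..3600. */
int wrap_diff_mul(int a, int b) {
    int c = a - b;
    return c + 3600 * (c < 0);   /* (c < 0) evaluates to 0 or 1 */
}
```

Both return 3500 for 100 - 200 and 100 for 200 - 100. They differ only at the boundary: the modulo form maps a difference of exactly 3600 to 0, while the multiply form leaves it at 3600.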
#skjaidev showed how to do it without branching. Here's how to automatically avoid multiplication as well when ints are twos-complement:
#if ((3600 & -0) == 0) && ((3600 & -1) == 3600)
c += 3600 & -(c < 0);
#else
c += 3600 * (c < 0);
#endif
What you want to do is modular arithmetic. Your 2's complement machine already does this with integer math. So, by mapping your values into 2's complement arithmetic, you can get the modulo operation for free.
The trick is to represent your angle as a fraction of 360 degrees between 0 and 1-epsilon. Of course, then your constant angles would have to be represented similarly, but that shouldn't be hard; it's just a bit of math we can hide in a conversion function (er, macro).
The value in this idea is that if you add or subtract angles, you'll get a value whose fraction part you want, and whose integer part you want to throw away. If we represent the fraction as a 32 bit fixed point number with the binary point at 2^32 (e.g., to the left of what is normally considered to be a sign bit), any overflows of the fraction simply fall off the top of the 32 bit value for free. So, you do all integer math, and "overflow" removal happens for free.
So I'd rewrite your code (preserving the idea of degrees times 10):
typedef uint32_t angle; // from <stdint.h>; angle*3600/(2^32) represents tenths of a degree
#define angle_scale_factor 1193046.47111111 // = 2^32/3600
#define make_angle(tenths) ((angle)(((tenths) % 3600) * angle_scale_factor))
#define make_degrees(a) ((a) / (angle_scale_factor * 10)) // produces a floating-point number of degrees
...
angle a = make_angle(100); // compiler presumably does compile-time math to compute 119304647
angle b = make_angle(200); // = 238609294
angle c = a - b; // unsigned subtraction wraps modulo 2^32, which computes 4175662649
#if 0 // no need for this at all; other solutions execute real code to do something here
if (c < 0) // this can't happen
{ c += 3600; } // this is the wrong representation for our variant
#endif
// speed doesn't matter here, we're doing output:
printf("final angle = %4.2f\n", make_degrees(c)); // should print 350.00
I have not compiled and run this code.
Changes to make this degrees times 100 or times 1 are pretty easy; modify the angle_scale_factor. If you have a 16 bit machine, switching to 16 bits is similarly easy; if you have 32 bits, and you still want to only do 16 bit math, you will need to mask the value to be printed to 16 bits.
This solution has one other nice property: you've documented which variables are angles (and have funny representations). OP's original code just called them ints, but that's not what they represent; a future maintainer will get surprised by the original code, especially if he finds the subtraction isolated from the variables.