Is there a way to remove the following if-statement to check if the value is below 0?
int a = 100;
int b = 200;
int c = a - b;
if (c < 0)
{
c += 3600;
}
The value of c should lie between 0 and 3600. Both a and b are signed. The value of a should also lie between 0 and 3600 (yes, it is a counting value in 0.1 degrees). The value gets reset to 3600 by an interrupt, but if that interrupt comes too late the value underflows. That is not a problem in itself, but the software should still be able to handle it, which it does.
We do this if (c < 0) check in quite a few places where we calculate positions (calculating a new position, etc.).
I am used to Python's modulo operator, whose result takes the sign of the divisor, whereas our compiler (C89) uses the sign of the dividend.
Is there some way to do this calculation differently?
example results:
a - b = c
100 - 200 = 3500
200 - 100 = 100
Good question! How about this?
c += 3600 * (c < 0);
This is one way we preserve branch predictor slots.
What about this (assuming 32-bit ints):
c += 3600 & (c >> 31);
c >> 31 sets all bits to the original MSB, which is 1 for negative numbers and 0 for others in two's complement.
Right-shifting a negative number is formally implementation-defined according to the C standard; however, it is almost always implemented by copying the MSB (common processors can do it in a single instruction).
This will surely result in no branches, unlike (c < 0), which might be implemented with a branch in some cases.
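A minimal sketch of the shift-and-mask idea wrapped in a function. It assumes two's complement and an arithmetic right shift for negative values (implementation-defined, as noted above). The name wrap_tenths is made up for illustration, and the shift width is derived from sizeof(int) instead of hard-coding 31.

```c
#include <limits.h>

/* Hypothetical helper: wraps a difference in the range -3600..3599
 * into 0..3599 without a branch.
 * Assumes two's complement and that >> on a negative int copies the
 * sign bit (implementation-defined in C). */
int wrap_tenths(int c)
{
    /* All ones when c is negative, all zeros otherwise. */
    int mask = c >> (int)(sizeof(int) * CHAR_BIT - 1);
    return c + (3600 & mask);
}
```

With the example values from the question, wrap_tenths(100 - 200) gives 3500 and wrap_tenths(200 - 100) gives 100.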
Why are you worried about the branch? [Reason explained in comments to the question.]
The alternative is something like:
((a - b) + 3600) % 3600
This assumes a and b are in the range 0..3600 already; if they're not under control, the more general solution is the one Drew McGowen suggests:
((a - b) % 3600 + 3600) % 3600
The branch miss has to be very expensive to make that much calculation worthwhile.
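For comparison, here is a small sketch that puts the branch, the multiply and the double-modulo variants side by side. The function names are invented for this example. The double-modulo form gives the same result whether % follows the sign of the dividend or of the divisor, but it maps a difference of exactly 3600 to 0, where the original if-statement would leave 3600 untouched.

```c
/* Hypothetical helpers comparing the variants discussed above. */

/* Original form with a branch. */
int wrap_branch(int a, int b)
{
    int c = a - b;
    if (c < 0)
        c += 3600;
    return c;
}

/* Branch replaced by a multiply with the comparison result (0 or 1). */
int wrap_mul(int a, int b)
{
    int c = a - b;
    c += 3600 * (c < 0);
    return c;
}

/* General form: works for any a and b whose difference fits in an int. */
int wrap_mod(int a, int b)
{
    return ((a - b) % 3600 + 3600) % 3600;
}
```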
#skjaidev showed how to do it without branching. Here's how to automatically avoid multiplication as well when ints are twos-complement:
#if ((3600 & -0) == 0) && ((3600 & -1) == 3600)
c += 3600 & -(c < 0);
#else
c += 3600 * (c < 0);
#endif
What you want to do is modular arithmetic. Your 2's complement machine already does this with integer math. So, by mapping your values into 2's complement arithmetic, you can get the modulo operation for free.
The trick is to represent your angle as a fraction of 360 degrees between 0 and 1-epsilon. Of course, your constant angles would then have to be represented similarly, but that shouldn't be hard; it's just a bit of math we can hide in a conversion function (er, macro).
The value in this idea is that if you add or subtract angles, you'll get a value whose fraction part you want, and whose integer part you want to throw away. If we represent the fraction as a 32 bit fixed point number with the binary point at 2^32 (i.e., to the left of what is normally considered to be a sign bit), any overflows of the fraction simply fall off the top of the 32 bit value for free. So, you do all integer math, and "overflow" removal happens for free.
So I'd rewrite your code (preserving the idea of degrees times 10):
typedef uint32_t angle; // from <stdint.h>; angle*3600/(2^32) represents degrees times 10
#define angle_scale_factor 1193046.47111111 // = 2^32/3600
#define make_angle(degrees) ((angle)(((degrees) % 3600) * angle_scale_factor))
#define make_degrees(angle) ((angle) / (angle_scale_factor * 10)) // produces a floating-point number
...
angle a = make_angle(100); // compiler presumably does compile-time math to compute 119304647
angle b = make_angle(200); // = 238609294
angle c = a - b; // compiler should generate integer subtract, which computes 4175662649
#if 0 // no need for this at all; other solutions execute real code to do something here
if (c < 0) // this can't happen
{ c += 3600; } // this is the wrong representation for our variant
#endif
// speed doesn't matter here, we're doing output:
printf("final angle = %4.2f\n", make_degrees(c)); // should print 350.00
I have not compiled and run this code.
Changes to make this degrees times 100 or times 1 are pretty easy; modify the angle_scale_factor. If you have a 16 bit machine, switching to 16 bits is similarly easy; if you have 32 bits, and you still want to only do 16 bit math, you will need to mask the value to be printed to 16 bits.
This solution has one other nice property: you've documented which variables are angles (and have funny representations). OP's original code just called them ints, but that's not what they represent; a future maintainer will get surprised by the original code, especially if they find the subtraction isolated from the variables.
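As a rough, integer-only sketch of the same representation: the helper names below are hypothetical, and the code assumes <stdint.h> with a 64-bit type is available for the conversions (the subtraction itself is plain 32-bit math). Converting back rounds to nearest, because the scaled values are truncated when they are created.

```c
#include <stdint.h>

typedef uint32_t angle_t;  /* hypothetical name: a full circle equals 2^32 */

/* Tenths of a degree (0..3599) to binary angle. */
angle_t angle_from_tenths(uint32_t tenths)
{
    return (angle_t)((((uint64_t)(tenths % 3600u)) << 32) / 3600u);
}

/* Difference of two angles; wraps modulo a full circle for free. */
angle_t angle_sub(angle_t a, angle_t b)
{
    return (angle_t)(a - b);
}

/* Binary angle back to tenths of a degree, rounded to nearest. */
uint32_t angle_to_tenths(angle_t a)
{
    uint32_t t = (uint32_t)(((uint64_t)a * 3600u + (UINT64_C(1) << 31)) >> 32);
    return t == 3600u ? 0u : t;  /* rounding can land on a full turn */
}
```

With the values from the question, subtracting the angle for 200 from the angle for 100 and converting back gives 3500, and the reverse subtraction gives 100.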
I have a loop like this:
for(uint64_t i=0; i*i<n; i++) {
This requires doing a multiplication every iteration. If I could calculate the sqrt before the loop then I could avoid this.
unsigned cut = sqrt(n);
for(uint64_t i=0; i<cut; i++) {
In my case it's okay if the sqrt function rounds up to the next integer but it's not okay if it rounds down.
My question is: is the sqrt function accurate enough to do this for all cases?
Edit: Let me list some cases. If n is a perfect square so that n = y^2 my question would be - is cut=sqrt(n)>=y for all n? If cut=y-1 then there is a problem. E.g. if n = 120 and cut = 10 it's okay but if n=121 (11^2) and cut is still 10 then it won't work.
My first concern was the fractional part of float only has 23 bits and double 52 so they can't store all the digits of some 32-bit or 64-bit integers. However, I don't think this is a problem. Let's assume we want the sqrt of some number y but we can't store all the digits of y. If we let the fraction of y we can store be x we can write y = x + dx then we want to make sure that whatever dx we choose does not move us to the next integer.
sqrt(x+dx) < sqrt(x) + 1 //solve
dx < 2*sqrt(x) + 1
// e.g for x = 100 dx < 21
// sqrt(100+20) < sqrt(100) + 1
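Spelling out the "solve" step above, under the assumption that x >= 0 and dx >= 0 so that both sides can be squared without changing the inequality:

```latex
\begin{aligned}
\sqrt{x + dx} &< \sqrt{x} + 1 \\
x + dx &< x + 2\sqrt{x} + 1 \\
dx &< 2\sqrt{x} + 1
\end{aligned}
```

For x = 100 the bound is dx < 21, and indeed sqrt(120) is about 10.954, which is below 11.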
Float can store 23 bits so we let y = 2^23 + 2^9. This is more than sufficient since 2^9 < 2*sqrt(2^23) + 1. It's easy to show this for double as well with 64-bit integers. So although they can't store all the digits, as long as the sqrt of what they can store is accurate then the sqrt(fraction) should be sufficient. Now let's look at what happens for integers close to UINT_MAX and the sqrt:
unsigned xi = -1-1;
printf("%u %u\n", xi, (unsigned)(float)xi); //4294967294 4294967295
printf("%u %u\n", (unsigned)sqrt(xi), (unsigned)sqrtf(xi)); //65535 65536
Since float can't store all the digits of 2^32-2 and double can, they get different results for the sqrt. But the float version of the sqrt is one integer larger. This is what I want. For 64-bit integers, as long as the sqrt of the double always rounds up, it's okay.
First, integer multiplication is really quite cheap. So long as you have more than a few cycles of work per loop iteration and one spare execute slot, it should be entirely hidden by reorder on most non-tiny processors.
If you did have a processor with dramatically slow integer multiply, a truly clever compiler might transform your loop to:
for (uint64_t i = 0, j = 0; j < n; j += 2*i+1, i++)
replacing the multiply with an lea or a shift and two adds.
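The transformation relies on the identity (i+1)^2 = i^2 + 2*i + 1, so a second variable can track i*i with additions only. A small sketch, with function names invented here, that counts the iterations both ways so the two loops can be compared:

```c
#include <stdint.h>

/* Counts the i >= 0 with i*i < n, multiplying on every iteration. */
uint64_t count_mul(uint64_t n)
{
    uint64_t i;
    for (i = 0; i * i < n; i++) {
    }
    return i;
}

/* Same count, but j tracks i*i using (i+1)^2 = i^2 + 2*i + 1. */
uint64_t count_add(uint64_t n)
{
    uint64_t i, j;
    for (i = 0, j = 0; j < n; j += 2 * i + 1, i++) {
    }
    return i;
}
```

Both functions return 11 for n = 120 and for n = 121, and 12 for n = 122.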
Those notes aside, let’s look at your question as stated. No, you can’t just use i < sqrt(n). Counter-example: n = 0x20000000000000. Assuming adherence to IEEE-754, you will have cut = 0x5a82799, and cut*cut is 0x1ffffff8eff971.
However, a basic floating-point error analysis shows that the error in computing sqrt(n) (before conversion to integer) is bounded by 3/4 of an ULP. So you can safely use:
uint32_t cut = sqrt(n) + 1;
and you’ll perform at most one extra loop iteration, which is probably acceptable. If you want to be totally precise, instead use:
uint32_t cut = sqrt(n);
cut += (uint64_t)cut*cut < n;
Edit: z boson clarifies that for his purposes, this only matters when n is an exact square (otherwise, getting a value of cut that is “too small by one” is acceptable). In that case, there is no need for the adjustment and one can safely just use:
uint32_t cut = sqrt(n);
Why is this true? It’s pretty simple to see, actually. Converting n to double introduces a perturbation:
double_n = n*(1 + e)
which satisfies |e| < 2^-53. The mathematical square root of this value can be expanded as follows:
square_root(double_n) = square_root(n)*square_root(1+e)
Now, since n is assumed to be a perfect square with at most 64 bits, square_root(n) is an exact integer with at most 32 bits, and is the mathematically precise value that we hope to compute. To analyze the square_root(1+e) term, use a Taylor series about 1:
square_root(1+e) = 1 + e/2 + O(e^2)
= 1 + d with |d| <~ 2^-54
Thus, the mathematically exact value square_root(double_n) is less than half an ULP away from[1] the desired exact answer, and necessarily rounds to that value.
[1] I’m being fast and loose here in my abuse of relative error estimates, where the relative size of an ULP actually varies across a binade — I’m trying to give a bit of the flavor of the proof without getting too bogged down in details. This can all be made perfectly rigorous, it just gets to be a bit wordy for Stack Overflow.
My whole answer is moot if you have access to IEEE 754 double precision floating point, since Stephen Canon demonstrated both
a simple way to avoid imul in loop
a simple way to compute the ceiling sqrt
Otherwise, if for some reason you have a non IEEE 754 compliant platform, or only single precision, you could get the integer part of square root with a simple Newton-Raphson loop. For example in Squeak Smalltalk we have this method in Integer:
sqrtFloor
"Return the integer part of the square root of self"
| guess delta |
guess := 1 bitShift: (self highBit + 1) // 2.
[
delta := (guess squared - self) // (guess + guess).
delta = 0 ] whileFalse: [
guess := guess - delta ].
^guess - 1
Where // is the operator for the quotient of integer division.
The final guard guess*guess <= self ifTrue: [^guess]. can be avoided if the initial guess is fed in excess of the exact solution, as is the case here.
Initializing with an approximate float sqrt was not an option because integers are arbitrarily large and might overflow.
But here, you could seed the initial guess with floating point sqrt approximation, and my bet is that the exact solution will be found in very few loops. In C that would be:
uint32_t sqrtFloor(uint64_t n)
{
int64_t diff;
int64_t delta;
uint64_t guess=sqrt(n); /* implicit conversions here... */
if (guess == 0) return 0; /* n == 0: avoid dividing by zero below */
while( (delta = (diff=guess*guess-n) / (guess+guess)) != 0 )
guess -= delta;
return guess-(diff>0);
}
That's a few integer multiplications and divisions, but outside the main loop.
What you are looking for is a way to calculate a rational upper bound of the square root of a natural number. A continued fraction is what you need; see Wikipedia.
For x>0, there is
$$\sqrt{x} = 1 + \cfrac{x-1}{2 + \cfrac{x-1}{2 + \cfrac{x-1}{2 + \cdots}}}.$$
To make the notation more compact, rewriting the above formula as
Truncate the continued fraction by removing the tail term (x-1)/2's at each recursion depth, one gets a sequence of approximations of sqrt(x) as below:
Upper bounds appear at lines with odd line numbers, and get tighter. When the distance between an upper bound and its neighboring lower bound is less than 1, that approximation is what you need. Using that value as the value of cut (here cut must be a floating-point number) solves the problem.
For very large numbers, rational numbers should be used, so no precision is lost during conversion between integer and floating point numbers.
It's an embedded platform; that's why there are such restrictions.
original equation: 0.02035*c*c - 2.4038*c
Did this:
int32_t val = 112; // this value is arbitrary
int32_t result = (val*((val * 0x535A8) - 0x2675F70));
result = result>>24;
The precision is still poor. When we multiply val*0x535A8, is there a way to further improve the precision by rounding up, but without using any float, double, or division?
The problem is not precision. You're using plenty of bits.
I suspect the problem is that you're comparing two different methods of converting to int. The first is a cast of a double, the second is a truncation by right-shifting.
Converting floating point to integer simply drops the fractional part, leading to a round towards zero; right-shifting does a round down or floor. For positive numbers there's no difference, but for negative numbers the two methods will be 1 off from each other. See an example at http://ideone.com/rkckuy and some background reading at Wikipedia.
Your original code is easy to fix:
int32_t result = (val*((val * 0x535A8) - 0x2675F70));
if (result < 0)
result += 0xffffff;
result = result>>24;
See the results at http://ideone.com/D0pNPF
You might also just decide that the right shift result is OK as is. The conversion error isn't greater than it is for the other method, just different.
Edit: If you want to do rounding instead of truncation the answer is even easier.
int32_t result = (val*((val * 0x535A8) - 0x2675F70));
result = (result + (1L << 23)) >> 24;
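To make the three conversions easy to compare, here is a sketch that evaluates the question's scaled expression and then reduces it by flooring, by truncating toward zero, and by rounding to nearest. The macro and function names are made up for this example. It assumes val stays in 1..120 so the product fits in 32 bits, and that >> on a negative value is an arithmetic shift.

```c
#include <stdint.h>

#define COEF_A 0x535A8   /* about 0.02035 * 2^24, taken from the question */
#define COEF_B 0x2675F70 /* about 2.4038  * 2^24, taken from the question */

/* The scaled polynomial value, still carrying 24 fraction bits. */
int32_t scaled(int32_t val)
{
    return val * ((val * COEF_A) - COEF_B);
}

/* Floor: plain right shift. */
int32_t eval_floor(int32_t val)
{
    return scaled(val) >> 24;
}

/* Truncate toward zero, like a cast from double. */
int32_t eval_trunc(int32_t val)
{
    int32_t r = scaled(val);
    if (r < 0)
        r += 0xffffff;
    return r >> 24;
}

/* Round to nearest: add half of 2^24 before shifting. */
int32_t eval_round(int32_t val)
{
    return (scaled(val) + (INT32_C(1) << 23)) >> 24;
}
```

For val = 1 the exact value is about -2.38, so the floor is -3 while truncation and rounding both give -2. For val = 120 the exact value is about 4.58, so only the rounded version gives 5.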
I'm going to join in with some of the others in suggesting that you use a constant expression to replace those magic constants with something that documents how they were derived.
static const int32_t a = (int32_t)(0.02035 * (1L << 24) + 0.5);
static const int32_t b = (int32_t)(2.4038 * (1L << 24) + 0.5);
int32_t result = (val*((val * a) - b));
How about just scaling your constants by 100000? The largest value you then get is 2035*120*120 = 29304000 (before subtracting 240380*120), which is far below the 2^31 limit. So maybe there is no need to do real bit-tweaking here.
As noted by Joe Hass, your problem is that you shift your precision bits into the dustbin.
Whether you shift your decimals to the left by a power of 2 or by a power of 10 does not actually matter. Just pretend your decimal point is not behind the last bit but at the shifted position. If you keep computing with the result, shifting by a power of 2 is likely easier to handle. If you just want to output the result, shift by powers of ten as proposed above, convert the digits and insert the decimal point 5 characters from the right.
Givens:
Lets assume 1 <= c <= 120,
original equation: 0.02035*c*c - 2.4038*c
then -70.98586 < f(c) < 4.585
--> -71 <= result <= 5
rounding f(c) to nearest int32_t.
Arguments A = 0.02035 and B = 2.4038
A & B may change a bit with subsequent compiles, but not at run-time.
Allow the coder to input values like 0.02035 & 2.4038. The key component shown here and by others is to scale factors like 0.02035 by some power of 2, do the equation (simplified into the form (A*c - B)*c) and then scale the result back.
Important features:
1 When determining A and B, ensure the compile-time floating point multiplication and final conversion occur via a round and not a truncation. With positive values, the + 0.5 achieves that. Without a rounded answer, UD_A*UD_Scaling could end up just under a whole number and truncate away 0.999999 when converting to int32_t.
2 Instead of doing expensive division at run-time, we do >> (right shift). By adding half the divisor (as suggested by #Joe Hass) before the division, we get a nicely rounded answer. It is important not to code in / here, as some_signed_int / 4 and some_signed_int >> 2 do not round the same way. With 2's complement, >> truncates toward INT_MIN whereas / truncates toward 0.
#define UD_A (0.02035)
#define UD_B (2.4038)
#define UD_Shift (24)
#define UD_Scaling ((int32_t) 1 << UD_Shift)
#define UD_ScA ((int32_t) (UD_A*UD_Scaling + 0.5))
#define UD_ScB ((int32_t) (UD_B*UD_Scaling + 0.5))
for (int32_t val = 1; val <= 120; val++) {
int32_t result = ((UD_ScA*val - UD_ScB)*val + UD_Scaling/2) >> UD_Shift;
printf("%" PRId32 " %" PRId32 "\n", val, result);
}
Example differences:
val, OP equation, OP code, This code
1, -2.38345, -3, -2
54, -70.46460, -71, -70
120, 4.58400, 4, 5
This is a new answer. My old +1 answer deleted.
If your input uses at most 7 bits and you have 32 bits available, then your best bet is to shift everything by as many bits as possible and work with that:
int32_t result;
result = (val * (int32_t)(0.02035 * 0x1000000)) - (int32_t)(2.4038 * 0x1000000);
result >>= 8; // make room for another 7 bit multiplication
result *= val;
result >>= 16;
Constant conversion will be done by an optimising compiler at compile time.
I'm working with a microchip that doesn't have room for floating point precision; however, I need to account for fractional values in some equations. So far I've had good luck using the old *100 -> /100 method like so:
increment = (short int)(((value1 - value2)*100 / totalSteps));
// later in the code I loop through the number of totalSteps
// adding back the increment to arrive at the total I want at the precise
// time I need it.
newValue = oldValue + (increment / 100);
This works great for values from 0-255 divided by a totalSteps of up to 300. After 300, the fractional values to the right of the decimal place become important, because they add up over time of course.
I'm curious if anyone has a better way to save decimal accuracy within an integer paradigm? I tried using *1000 /1000, but that didn't work at all.
Thank you in advance.
Fractions with integers is called fixed point math.
Try Googling "fixed point".
Fixed point tips and tricks are out of the scope of an SO answer...
Example: 5 tap FIR filter
// C is the filter coefficients using 2.8 fixed precision.
// 2 MSB (of 10) is for integer part and 8 LSB (of 10) is the fraction part.
// Actual fraction precision here is 1/256.
int FIR_5(int* in, // input samples
int inPrec, // sample fraction precision
int* c, // filter coefficients
int cPrec) // coefficients fraction precision
{
const int coefHalf = (cPrec > 0) ? 1 << (cPrec - 1) : 0; // value of 0.5 using cPrec
int sum = 0;
for ( int i = 0; i < 5; ++i )
{
sum += in[i] * c[i];
}
// sum's precision is X.N. where N = inPrec + cPrec;
// return to original precision (inPrec)
sum = (sum + coefHalf) >> cPrec; // adding coefHalf for rounding
return sum;
}
int main()
{
const int filterPrec = 8;
int C[5] = { 8, 16, 208, 16, 8 }; // 1.0 == 256 in 2.8 fixed point. Filter value are 8/256, 16/256, 208/256, etc.
int W[5] = { 10, 203, 40, 50, 72}; // A sampling window (example)
int res = FIR_5(W, 0, C, filterPrec);
return 0;
}
Notes:
In the above example:
the samples are integers (no fraction)
the coefs have fractions of 8 bit.
8 bit fractions mean that each change of 1 is treated as 1/256. 1 << 8 == 256.
A useful notation is Y.Xu or Y.Xs, where Y is how many bits are allocated for the integer part and X for the fraction. u/s denote unsigned/signed.
When multiplying 2 fixed point numbers, their precisions (number of fraction bits) are added to each other.
Example: A is 0.8u, B is 0.2u. C=A*B. C is 0.10u
when dividing, use a shift operation to lower the result precision. Amount of shifting is up to you. Before lowering precision it's better to add a half to lower the error.
Example: A=129 in 0.8u which is a little over 0.5 (129/256). We want the integer part so we right shift it by 8. Before that we want to add a half which is 128 (1<<7). So A = (A + 128) >> 8 --> 1.
Without adding a half you'll get a larger error in the final result.
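The "add a half, then shift" step can be sketched as a tiny helper, together with a multiply of two 0.8u values whose 0.16u product is reduced back to 0.8u. Both function names are made up for this example.

```c
#include <stdint.h>

/* Drop frac_bits fraction bits from v, rounding to nearest. */
uint32_t round_shift(uint32_t v, unsigned frac_bits)
{
    uint32_t half = frac_bits ? (UINT32_C(1) << (frac_bits - 1)) : 0;
    return (v + half) >> frac_bits;
}

/* Multiply two 0.8u values; the raw product is 0.16u, reduced to 0.8u. */
uint32_t mul_0_8u(uint32_t a, uint32_t b)
{
    return round_shift(a * b, 8);
}
```

round_shift(129, 8) gives 1, matching the example above, whereas a plain 129 >> 8 gives 0. mul_0_8u(128, 128) gives 64, that is 0.5 * 0.5 = 0.25 in 0.8u.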
Don't use this approach.
New paradigm: Do not accumulate using FP math or fixed point math. Do your accumulation and other equations with integer math. Anytime you need to get some scaled value, divide by your scale factor (100), but do the "add up" part with the raw, unscaled values.
Here's a quick attempt at a precise rational (Bresenham-esque) version of the interpolation if you truly cannot afford to directly interpolate at each step.
div_t frac_step = div(target - source, num_steps);
if(frac_step.rem < 0) {
// Annoying special case to deal with rounding towards zero:
// convert to a floored quotient with a non-negative remainder.
// Alternatively check for the error term slipping to < -num_steps as well
frac_step.rem += num_steps;
--frac_step.quot;
}
unsigned int error = 0;
do {
// Add the integer term plus an accumulated fraction
error += frac_step.rem;
if(error >= num_steps) {
// Time to carry
error -= num_steps;
++source;
}
source += frac_step.quot;
} while(--num_steps);
A major drawback compared to the fixed-point solution is that the fractional term gets rounded off between iterations if you are using the function to continually walk towards a moving target at differing step lengths.
Oh, and for the record your original code does not seem to be properly accumulating the fractions when stepping, e.g. a 1/100 increment will always be truncated to 0 in the addition no matter how many times the step is taken. Instead you really want to add the increment to a higher-precision fixed-point accumulator and then divide it by 100 (or preferably right shift to divide by a power-of-two) each iteration in order to compute the integer "position".
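A minimal sketch of such an accumulator, using 16 fraction bits and a right shift instead of the divide by 100. The function name and the choice of 16.16 are assumptions made for illustration. It expects start and end in 0..255 and a step count well below 32768, so that the truncation error of the increment stays under half a unit.

```c
#include <stdint.h>

#define FRAC_BITS 16  /* hypothetical choice: 16.16 fixed point */

/* Returns the integer position after steps_taken of total_steps steps
 * on the way from start to end (both assumed to be in 0..255). */
int32_t position_after(int32_t start, int32_t end,
                       int32_t total_steps, int32_t steps_taken)
{
    /* Per-step increment, kept with 16 fraction bits. */
    int32_t inc = ((end - start) * (INT32_C(1) << FRAC_BITS)) / total_steps;
    /* The accumulator starts at start + 0.5 so the final shift rounds. */
    int32_t acc = start * (INT32_C(1) << FRAC_BITS)
                + (INT32_C(1) << (FRAC_BITS - 1));
    int32_t i;

    for (i = 0; i < steps_taken; i++)
        acc += inc;           /* fractions accumulate here */

    return acc >> FRAC_BITS;  /* integer part only at the point of use */
}
```

Going from 10 to 11 in 400 steps, the *100 method computes an increment of 100/400 = 0 and never moves, while this version reaches 11 at the last step.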
Do take care with the different integer types and ranges required in your calculations. A multiplication by 1000 will overflow a 16-bit integer unless one term is a long. Go through your calculations and keep track of input ranges and the headroom at each step, then select your integer types to match.
Maybe you can simulate floating point behaviour by storing
values according to the IEEE 754 specification.
So you store the mantissa, exponent, and sign as unsigned int values.
For calculations you then use bitwise addition of the mantissa and exponent, and so on.
Multiplication and division can be replaced by bitwise addition operations.
I think it is a lot of programming stuff to emulate that, but it should work.
Your choice of type is the problem: short int is likely to be 16 bits wide. That's why large multipliers don't work - you're limited to +/-32767. Use a 32 bit long int, assuming that your compiler supports it. What chip is it, by the way, and what compiler?
Let us say we have x and y and both are signed integers in C, how do we find the most accurate mean value between the two?
I would prefer a solution that does not take advantage of any machine/compiler/toolchain specific workings.
The best I have come up with is: (a / 2) + (b / 2) + !!(a % 2) * !!(b % 2). Is there a solution that is more accurate? Faster? Simpler?
What if we know if one is larger than the other a priori?
Thanks.
D
Editor's Note: Please note that the OP expects answers that are not subject to integer overflow when input values are close to the maximum absolute bounds of the C int type. This was not stated in the original question, but is important when giving an answer.
After accept answer (4 yr)
I would expect the function int average_int(int a, int b) to:
1. Work over the entire range of [INT_MIN..INT_MAX] for all combinations of a and b.
2. Have the same result as (a+b)/2, as if using wider math.
When int2x exists, #Santiago Alessandri's approach works well.
int avgSS(int a, int b) {
return (int) ( ((int2x) a + b) / 2);
}
Otherwise a variation on #AProgrammer:
Note: wider math is not needed.
int avgC(int a, int b) {
if ((a < 0) == (b < 0)) { // a,b same sign
return a/2 + b/2 + (a%2 + b%2)/2;
}
return (a+b)/2;
}
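One way to gain some confidence in the variant above is to compare it against wider math on a small grid plus the extreme values. The sketch below does that. The function names are made up, it assumes long long is wider than int, and it relies on C99's truncating / and %.

```c
#include <limits.h>

/* Same-sign operands: halve first so nothing overflows.
 * Opposite signs: a + b cannot overflow. */
int avg_no_wide(int a, int b)
{
    if ((a < 0) == (b < 0))
        return a / 2 + b / 2 + (a % 2 + b % 2) / 2;
    return (a + b) / 2;
}

/* Reference using wider math; assumes long long is wider than int. */
int avg_wide(int a, int b)
{
    return (int)(((long long)a + b) / 2);
}

/* Returns 1 if both agree on a small grid plus the extreme values. */
int avg_agrees(void)
{
    static const int edge[] = {
        INT_MIN, INT_MIN + 1, -1, 0, 1, INT_MAX - 1, INT_MAX
    };
    int i, j;

    for (i = -50; i <= 50; i++)
        for (j = -50; j <= 50; j++)
            if (avg_no_wide(i, j) != avg_wide(i, j))
                return 0;

    for (i = 0; i < 7; i++)
        for (j = 0; j < 7; j++)
            if (avg_no_wide(edge[i], edge[j]) != avg_wide(edge[i], edge[j]))
                return 0;

    return 1;
}
```

This is only a spot check, not a proof over the whole int range.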
A solution with more tests, but without %
All below solutions "worked" to within 1 of (a+b)/2 when overflow did not occur, but I was hoping to find one that matched (a+b)/2 for all int.
#Santiago Alessandri Solution works as long as the range of int is narrower than the range of long long - which is usually the case.
((long long)a + (long long)b) / 2
#AProgrammer, the accepted answer, fails about 1/4 of the time to match (a+b)/2. Example inputs like a == 1, b == -2
a/2 + b/2 + (a%2 + b%2)/2
#Guy Sirton, Solution fails about 1/8 of the time to match (a+b)/2. Example inputs like a == 1, b == 0
int sgeq = ((a<0)==(b<0));
int avg = ((!sgeq)*(a+b)+sgeq*(b-a))/2 + sgeq*a;
#R.., Solution fails about 1/4 of the time to match (a+b)/2. Example inputs like a == 1, b == 1
return (a-(a|b)+b)/2+(a|b)/2;
#MatthewD, now deleted solution fails about 5/6 of the time to match (a+b)/2. Example inputs like a == 1, b == -2
unsigned diff;
signed mean;
if (a > b) {
diff = a - b;
mean = b + (diff >> 1);
} else {
diff = b - a;
mean = a + (diff >> 1);
}
If (a^b)<=0 you can just use (a+b)/2 without fear of overflow.
Otherwise, try (a-(a|b)+b)/2+(a|b)/2. -(a|b) is at least as large in magnitude as both a and b and has the opposite sign, so this avoids the overflow.
I did this quickly off the top of my head so there might be some stupid errors. Note that there are no machine-specific hacks here. All behavior is completely determined by the C standard and the fact that it requires twos-complement, ones-complement, or sign-magnitude representation of signed values and specifies that the bitwise operators work on the bit-by-bit representation. Nope, the relative magnitude of a|b depends on the representation...
Edit: You could also use a+(b-a)/2 when they have the same sign. Note that this will give a bias towards a. You can reverse it and get a bias towards b. My solution above, on the other hand, gives bias towards zero if I'm not mistaken.
Another try: One standard approach is (a&b)+(a^b)/2. In twos complement it works regardless of the signs, but I believe it also works in ones complement or sign-magnitude if a and b have the same sign. Care to check it?
Edit: version fixed by #chux - Reinstate Monica:
if ((a < 0) == (b < 0)) { // a,b same sign
return a/2 + b/2 + (a%2 + b%2)/2;
} else {
return (a+b)/2;
}
Original answer (I'd have deleted it if it hadn't been accepted).
a/2 + b/2 + (a%2 + b%2)/2
Seems the simplest one fitting the bill of no assumption on implementation characteristics (it has a dependency on C99, which specifies the result of / as "truncated toward 0", while it was implementation-dependent for C90).
It has the advantage of having no test (and thus no costly jumps) and all divisions/remainder are by 2 so the use of bit twiddling techniques by the compiler is possible.
For unsigned integers the average is the floor of (x+y)/2. But the same fails for signed integers. This formula fails for integers whose sum is an odd negative number, as their floor is one less than their average.
You can read up more at Hacker's Delight in section 2.5
The code to calculate average of 2 signed integers without overflow is
int t = (a & b) + ((a ^ b) >> 1);
unsigned t_u = (unsigned)t;
int avg = t + ( (t_u >> 31 ) & (a ^ b) );
I have checked its correctness using the Z3 SMT solver.
Just a few observations that may help:
"Most accurate" isn't necessarily unique with integers. E.g. for 1 and 4, 2 and 3 are an equally "most accurate" answer. Mathematically (not C integers):
(a+b)/2 = a+(b-a)/2 = b+(a-b)/2
Let's try breaking this down:
If sign(a)!=sign(b) then a+b will not overflow. This case can be determined by comparing the most significant bit in a two's complement representation.
If sign(a)==sign(b) then if a is greater than b, (a-b) will not overflow. Otherwise (b-a) will not overflow. EDIT: Actually neither will overflow.
What are you trying to optimize exactly? Different processor architectures may have different optimal solutions. For example, in your code replacing the multiplication with an AND may improve performance. Also in a two's complement architecture you can simply (a & b & 1).
I'm just going to throw some code out, not looking too fast but perhaps someone can use and improve:
int sgeq = ((a<0)==(b<0));
int avg = ((!sgeq)*(a+b)+sgeq*(b-a))/2 + sgeq*a;
I would do this: convert both to long long (64-bit signed integers) and add them up, which won't overflow, and then divide the result by 2:
((long long)a + (long long)b) / 2
If you want the decimal part, store it as a double.
It is important to note that the result will fit in a 32 bit integer.
If you are using the highest-rank integer, then you can use:
((double)a + (double)b) / 2
This answer fits to any number of integers:
int[] array = { 1, 2, 3, 4, 5, 6, 7, 8, 9 };
decimal avg = 0;
for (int i = 0; i < array.Length; i++){
avg = (array[i] - avg) / (i+1) + avg;
}
expects avg == 5.0 for this test
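The snippet above is C#; a comparable sketch in C, using a double accumulator, could look like the following. The function name is made up. The update avg += (x - avg) / k never forms the full sum, which is what keeps it clear of integer overflow.

```c
#include <stddef.h>

/* Incremental mean: avg_k = avg_(k-1) + (x_k - avg_(k-1)) / k. */
double running_mean(const int *values, size_t count)
{
    double avg = 0.0;
    size_t i;

    for (i = 0; i < count; i++)
        avg += ((double)values[i] - avg) / (double)(i + 1);

    return avg;
}
```

For the values 1 through 9 this returns 5.0, up to floating-point rounding.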
I have to minimize the cost of calculating a modulus in C.
Say I have a number x, and n is the number that will divide x.
when n == 65536 (which happens to be 2^16):
mod = x % n (11 assembly instructions as produced by GCC)
or
mod = x & 0xffff which is equal to mod = x & 65535 (4 assembly instructions)
so, GCC doesn't optimize it to this extent.
In my case n is not 2^(int) but is the largest prime less than 2^16, which is 65521.
As I showed for n == 2^16, bit-wise operations can optimize the computation. What bit-wise operations can I perform when n == 65521 to calculate the modulus?
First, make sure you're looking at optimized code before drawing conclusions about what GCC is producing (and make sure this particular expression really needs to be optimized). Finally - don't count instructions to draw your conclusions; it may be that an 11 instruction sequence might be expected to perform better than a shorter sequence that includes a div instruction.
Also, you can't conclude that because x mod 65536 can be calculated with a simple bit mask that any mod operation can be implemented that way. Consider how easy dividing by 10 in decimal is as opposed to dividing by an arbitrary number.
With all that out of the way, you may be able to use some of the 'magic number' techniques from Henry Warren's Hacker's Delight book:
Archive of http://www.hackersdelight.org/
Archive of http://www.hackersdelight.org/magic.htm
There was an added chapter on the website that contained "two methods of computing the remainder of division without computing the quotient!", which you may find of some use. The 1st technique applies only to a limited set of divisors, so it won't work for your particular instance. I haven't actually read the online chapter, so I don't know exactly how applicable the other technique might be for you.
x mod 65536 is only equivalent to x & 0xffff if x is unsigned - for signed x, it gives the wrong result for negative numbers. For unsigned x, gcc does indeed optimise x % 65536 to a bitwise and with 65535 (even on -O0, in my tests).
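A two-function sketch of the difference, assuming a two's complement int of at least 32 bits and C99 semantics for % (the function names are made up):

```c
/* Remainder as the % operator computes it (sign follows the dividend). */
int rem_65536(int x)
{
    return x % 65536;
}

/* Bit mask; matches the remainder only for non-negative x. */
int mask_65536(int x)
{
    return x & 0xffff;
}
```

For x = -1 the remainder is -1 but the mask gives 65535. For x = 70000 both give 4464.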
Because 65521 is not a power of 2, x mod 65521 can't be calculated so simply. gcc 4.3.2 on -O3 calculates it using x - (x / 65521) * 65521; the integer division by a constant is done using integer multiplication by a related constant.
If you don't have to fully reduce your integers modulo 65521, then you can use the fact that 65521 is close to 2**16. I.e. if x is an unsigned int you want to reduce then you can do the following:
unsigned int low = x &0xffff;
unsigned int hi = (x >> 16);
x = low + 15 * hi;
This uses that 2**16 % 65521 == 15. Note that this is not a full reduction. I.e. starting with a 32-bit input, you only are guaranteed that the result is at most 20 bits and that it is of course congruent to the input modulo 65521.
This trick can be used in applications where there are many operations that have to be reduced modulo the same constant, and where intermediary results do not have to be the smallest element in its residue class.
E.g. one application is the implementation of Adler-32, which uses the modulus 65521. This hash function does a lot of operations modulo 65521. To implement it efficiently one would only do modular reductions after a carefully computed number of additions. A reduction shown as above is enough and only the computation of the hash will need a full modulo operation.
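As a sketch of how the partial step can be finished off when a fully reduced value is eventually needed: two partial reductions bring any 32-bit input below 2 * 65521, after which one conditional subtraction is enough. The function names are invented for this example and are not taken from any Adler-32 implementation.

```c
#include <stdint.h>

/* Partial reduction modulo 65521 using 2^16 = 15 (mod 65521).
 * The result is congruent to x but may still exceed 65520. */
uint32_t reduce_partial(uint32_t x)
{
    uint32_t low = x & 0xffffu;
    uint32_t hi  = x >> 16;
    return low + 15u * hi;
}

/* Full reduction built from the partial step. */
uint32_t reduce_full(uint32_t x)
{
    x = reduce_partial(x);  /* now below 2^20 */
    x = reduce_partial(x);  /* now at most 65535 + 15*15 */
    if (x >= 65521u)
        x -= 65521u;
    return x;
}
```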
The bitwise operation only works well if the divisor is of the form 2^n. In the general case, there is no such bit-wise operation.
If the constant with which you want to take the modulo is known at compile time
and you have a decent compiler (e.g. gcc), it is usually best to let the compiler
work its magic. Just declare the modulo const.
If you don't know the constant at compile time, but you are going to take - say -
a billion modulos with the same number, then use this http://libdivide.com/
When dealing with powers of 2, this approach can be considered (mostly C flavored):
.
.
#define THE_DIVISOR 0x8U /* The modulo value (POWER OF 2). */
.
.
uint8 CheckIfModulo(const sint32 TheDividend)
{
uint8 RetVal = 1; /* TheDividend is not a multiple of THE_DIVISOR. */
if (0 == (TheDividend & (THE_DIVISOR - 1)))
{
/* code if modulo is satisfied */
RetVal = 0; /* TheDividend IS a multiple of THE_DIVISOR. */
}
else
{
/* code if modulo is NOT satisfied */
}
return RetVal;
}
If x is an increasing index, and the increment i is known to be less than n (e.g. when iterating over a circular array of length n), avoid the modulus completely.
A loop going
x += i; if (x >= n) x -= n;
is way faster than
x = (x + i) % n;
which you unfortunately find in many text books...
If you really need an expression (e.g. because you are using it in a for statement), you can use the ugly but efficient
x = x + (x+i < n ? i : i-n);
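A small sketch that checks the three forms against each other, under the stated precondition 0 <= x < n and 0 <= i < n. The function names are made up, and this only tests equivalence; it says nothing about speed, which depends on the target.

```c
/* All three assume 0 <= x < n and 0 <= i < n. */

int advance_mod(int x, int i, int n)
{
    return (x + i) % n;
}

int advance_cmp(int x, int i, int n)
{
    x += i;
    if (x >= n)
        x -= n;
    return x;
}

int advance_expr(int x, int i, int n)
{
    return x + (x + i < n ? i : i - n);
}

/* Returns 1 if the three forms agree for every valid x and i. */
int advance_agrees(int n)
{
    int x, i;

    for (x = 0; x < n; x++)
        for (i = 0; i < n; i++)
            if (advance_mod(x, i, n) != advance_cmp(x, i, n) ||
                advance_mod(x, i, n) != advance_expr(x, i, n))
                return 0;

    return 1;
}
```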
idiv — Integer Division
The idiv instruction divides the contents of the 64 bit integer EDX:EAX (constructed by viewing EDX as the most significant four bytes and EAX as the least significant four bytes) by the specified operand value. The quotient result of the division is stored into EAX, while the remainder is placed in EDX.
source: http://www.cs.virginia.edu/~evans/cs216/guides/x86.html