Let's suppose we have noramlly distributed random int values from function:
unsigned int myrand();
The commonest way to shrink its range to [0, A] (int A) is to do as follows:
(double)rand() / UINT_MAX * A
Now I need to do the same for values in range of __int64:
unsigned __int64 max64;
unsigned __int64 r64 = myrand();
r64 <<= 32;
r64 |= myrand();
r64 = normalize(r64, max64);
The problem is to normalize return range by some __int64 because it could not be placed in double. I wouldn't like to use various libraries for big numbers due to performance reasons. Is there a way to shrink return range quickly and easily while saving normal distribution of values?
The method that you give
(double)myrand() / UINT_MAX * A
is already broken. For example, if A = 1 and you want integers in the range [0, 1] you will only ever get a value of 1 if myrand () returned UINT_MAX. If you meant the range [0, A), that is only the value 0, then it is still broken because it will in that case return a value outside the range. No matter what, you are introducing a bias.
If you want A+1 different values from 0 to A inclusive, and 2^32 ≤ A < 2^64, you proceed as follows:
Step 1: Calculate a 64 bit random number R as you did. If A is one less than a power of two, you return R shifted by the right amount.
Step 2: Find how many different random values would be mapped to the same output value. Mathematically, that number is floor (2^64 / (A + 1)). 2^64 is too large, but that is no problem because it is equal to 1 + floor ((2^64 - (A + 1)) / (A + 1)), calculated in C or C++ as D = 1 + (- (A + 1)) / (A + 1) if A has type uint64_t.
Step 3: Find how many different random values should be mapped by calculating N = D * (A + 1). If R >= N then go back to Step 1.
Step 4: Return R / D.
No floating point arithmetic needed. The result is totally unbiased. If A < 2^32 you fall back to the 32 bit version (or you use the 64 bit version as well, but it calls myrandom () twice as often as needed).
Of course you calculate D and N only once unless A changes.
Maybe you can use "long double" if it is available in your platform.
Related
I want to create a big integer from string representation and to do that efficiently I need an upper bound on the number of digits in the target base to avoid reallocating memory.
Example:
A 640 bit number has 640 digits in base 2, but only ten digits in base 2^64, so I will have to allocate ten 64 bit integers to hold the result.
The function I am currently using is:
int get_num_digits_in_different_base(int n_digits, double src_base, double dst_base){
return ceil(n_digits*log(src_base)/log(dst_base));
}
Where src_base is in {2, ..., 10 + 26} and dst_base is in {2^8, 2^16, 2^32, 2^64}.
I am not sure if the result will always be correctly rounded though. log2 would be easier to reason about, but I read that older versions of Microsoft Visual C++ do not support that function. It could be emulated like log2(x) = log(x)/log(2) but now I am back where I started.
GMP probably implements a function to do base conversion, but I may not read the source or else I might get GPL cancer so I can not do that.
I imagine speed is of some concern, or else you could just try the floating point-based estimate and adjust if it turned out to be too small. In that case, one can sacrifice tightness of the estimate for speed.
In the following, let dst_base be 2^w, src_base be b, and n_digits be n.
Let k(b,w)=max {j | b^j < 2^w}. This represents the largest power of b that is guaranteed to fit within a w-wide binary (non-negative) integer. Because of the relatively small number of source and destination bases, these values can be precomputed and looked-up in a table, but mathematically k(b,w)=[w log 2/log b] (where [.] denotes the integer part.)
For a given n let m=ceil( n / k(b,w) ). Then the maximum number of dst_base digits required to hold a number less than b^n is:
ceil(log (b^n-1)/log (2^w)) ≤ ceil(log (b^n) / log (2^w) )
≤ ceil( m . log (b^k(b,w)) / log (2^w) ) ≤ m.
In short, if you precalculate the k(b,w) values, you can quickly get an upper bound (which is not tight!) by dividing n by k, rounding up.
I'm not sure about float point rounding in this case, but it is relatively easy to implement this using only integers, as log2 is a classic bit manipulation pattern and integer division can be easily rounded up. The following code is equivalent to yours, but using integers:
// Returns log2(x) rounded up using bit manipulation (not most efficient way)
unsigned int log2(unsigned int x)
{
unsigned int y = 0;
--x;
while (x) {
y++;
x >>= 1;
}
return y;
}
// Returns ceil(a/b) using integer division
unsigned int roundup(unsigned int a, unsigned int b)
{
return (a + b - 1) / b;
}
unsigned int get_num_digits_in_different_base(unsigned int n_digits, unsigned int src_base, unsigned int log2_dst_base)
{
return roundup(n_digits * log2(src_base), log2_dst_base);
}
Please, note that:
This function return different results compared to yours! However, in every case I looked, both were still correct (the smaller value was more accurate, but your requirement is just an upper bound).
The integer version I wrote receives log2_dst_base instead of dst_base to avoid overflow for 2^64.
log2 can be made more efficient using lookup tables.
I've used unsigned int instead of int.
A buddy of mine had these puzzles and this is one that is eluding me. Here is the problem, you are given a number and you want to return that number times 3 and divided by 16 rounding towards 0. Should be easy. The catch? You can only use the ! ~ & ^ | + << >> operators and of them only a combination of 12.
int mult(int x){
//some code here...
return y;
}
My attempt at it has been:
int hold = x + x + x;
int hold1 = 8;
hold1 = hold1 & hold;
hold1 = hold1 >> 3;
hold = hold >> 4;
hold = hold + hold1;
return hold;
But that doesn't seem to be working. I think I have a problem of losing bits but I can't seem to come up with a way of saving them. Another perspective would be nice. Just to add, you also can only use variables of type int and no loops, if statements or function calls may be used.
Right now I have the number 0xfffffff. It is supposed to return 0x2ffffff but it is returning 0x3000000.
For this question you need to worry about the lost bits before your division (obviously).
Essentially, if it is negative then you want to add 15 after you multiply by 3. A simple if statement (using your operators) should suffice.
I am not going to give you the code but a step by step would look like,
x = x*3
get the sign and store it in variable foo.
have another variable hold x + 15;
Set up an if statement so that if x is negative it uses that added 15 and if not then it uses the regular number (times 3 which we did above).
Then divide by 16 which you already showed you know how to do. Good luck!
This seems to work (as long as no overflow occurs):
((num<<2)+~num+1)>>4
Try this JavaScript code, run in console:
for (var num = -128; num <= 128; ++num) {
var a = Math.floor(num * 3 / 16);
var b = ((num<<2)+~num+1)>>4;
console.log(
"Input:", num,
"Regular math:", a,
"Bit math:", b,
"Equal: ", a===b
);
}
The Maths
When you divide a positive integer n by 16, you get a positive integer quotient k and a remainder c < 16:
(n/16) = k + (c/16).
(Or simply apply the Euclidan algorithm.) The question asks for multiplication by 3/16, so multiply by 3
(n/16) * 3 = 3k + (c/16) * 3.
The number k is an integer, so the part 3k is still a whole number. However, int arithmetic rounds down, so the second term may lose precision if you divide first, And since c < 16, you can safely multiply first without overflowing (assuming sizeof(int) >= 7). So the algorithm design can be
(3n/16) = 3k + (3c/16).
The design
The integer k is simply n/16 rounded down towards 0. So k can be found by applying a single AND operation. Two further operations will give 3k. Operation count: 3.
The remainder c can also be found using an AND operation (with the missing bits). Multiplication by 3 uses two more operations. And shifts finishes the division. Operation count: 4.
Add them together gives you the final answer.
Total operation count: 8.
Negatives
The above algorithm uses shift operations. It may not work well on negatives. However, assuming two's complement, the sign of n is stored in a sign bit. It can be removed beforing applying the algorithm and reapplied on the answer.
To find and store the sign of n, a single AND is sufficient.
To remove this sign, OR can be used.
Apply the above algorithm.
To restore the sign bit, Use a final OR operation on the algorithm output with the stored sign bit.
This brings the final operation count up to 11.
what you can do is first divide by 4 then add 3 times then again devide by 4.
3*x/16=(x/4+x/4+x/4)/4
with this logic the program can be
main()
{
int x=0xefffffff;
int y;
printf("%x",x);
y=x&(0x80000000);
y=y>>31;
x=(y&(~x+1))+(~y&(x));
x=x>>2;
x=x&(0x3fffffff);
x=x+x+x;
x=x>>2;
x=x&(0x3fffffff);
x=(y&(~x+1))+(~y&(x));
printf("\n%x %d",x,x);
}
AND with 0x3fffffff to make msb's zero. it'l even convert numbers to positive.
This uses 2's complement of negative numbers. with direct methods to divide there will be loss of bit accuracy for negative numbers. so use this work arround of converting -ve to +ve number then perform division operations.
Note that the C99 standard states in section section 6.5.7 that right shifts of signed negative integer invokes implementation-defined behavior. Under the provisions that int is comprised of 32 bits and that right shifting of signed integers maps to an arithmetic shift instruction, the following code works for all int inputs. A fully portable solution that also fulfills the requirements set out in the question may be possible, but I cannot think of one right now.
My basic idea is to split the number into high and low bits to prevent intermediate overflow. The high bits are divided by 16 first (this is an exact operation), then multiplied by three. The low bits are first multiplied by three, then divided by 16. Since arithmetic right shift rounds towards negative infinity instead of towards zero like integer division, a correction needs to be applied to the right shift for negative numbers. For a right shift by N, one needs to add 2N-1 prior to the shift if the number to be shifted is negative.
#include <stdio.h>
#include <stdlib.h>
int ref (int a)
{
long long int t = ((long long int)a * 3) / 16;
return (int)t;
}
int main (void)
{
int a, t, r, c, res;
a = 0;
do {
t = a >> 4; /* high order bits */
r = a & 0xf; /* low order bits */
c = (a >> 31) & 15; /* shift correction. Portable alternative: (a < 0) ? 15 : 0 */
res = t + t + t + ((r + r + r + c) >> 4);
if (res != ref(a)) {
printf ("!!!! error a=%08x res=%08x ref=%08x\n", a, res, ref(a));
return EXIT_FAILURE;
}
a++;
} while (a);
return EXIT_SUCCESS;
}
Its an embedded platform thats why such restrictions.
original equation: 0.02035*c*c - 2.4038*c
Did this:
int32_t val = 112; // this value is arbitrary
int32_t result = (val*((val * 0x535A8) - 0x2675F70));
result = result>>24;
The precision is still poor. When we multiply val*0x535A8 Is there a way we can further improve the precision by rounding up, but without using any float, double, or division.
The problem is not precision. You're using plenty of bits.
I suspect the problem is that you're comparing two different methods of converting to int. The first is a cast of a double, the second is a truncation by right-shifting.
Converting floating point to integer simply drops the fractional part, leading to a round towards zero; right-shifting does a round down or floor. For positive numbers there's no difference, but for negative numbers the two methods will be 1 off from each other. See an example at http://ideone.com/rkckuy and some background reading at Wikipedia.
Your original code is easy to fix:
int32_t result = (val*((val * 0x535A8) - 0x2675F70));
if (result < 0)
result += 0xffffff;
result = result>>24;
See the results at http://ideone.com/D0pNPF
You might also just decide that the right shift result is OK as is. The conversion error isn't greater than it is for the other method, just different.
Edit: If you want to do rounding instead of truncation the answer is even easier.
int32_t result = (val*((val * 0x535A8) - 0x2675F70));
result = (result + (1L << 23)) >> 24;
I'm going to join in with some of the others in suggesting that you use a constant expression to replace those magic constants with something that documents how they were derived.
static const int32_t a = (int32_t)(0.02035 * (1L << 24) + 0.5);
static const int32_t b = (int32_t)(2.4038 * (1L << 24) + 0.5);
int32_t result = (val*((val * a) - b));
How about just scaling your constants by 10000. The maximum number you then get is 2035*120*120 - 24038*120 = 26419440, which is far below the 2^31 limit. So maybe there is no need to do real bit-tweaking here.
As noted by Joe Hass, your problem is that you shift your precision bits into the dustbin.
Whether shifting your decimals by 2 or by 10 to the left does actually not matter. Just pretend your decimal point is not behind the last bit but at the shifted position. If you keep computing with the result, shifting by 2 is likely easier to handle. If you just want to output the result, shift by powers of ten as proposed above, convert the digits and insert the decimal point 5 characters from the right.
Givens:
Lets assume 1 <= c <= 120,
original equation: 0.02035*c*c - 2.4038*c
then -70.98586 < f(c) < 4.585
--> -71 <= result <= 5
rounding f(c) to nearest int32_t.
Arguments A = 0.02035 and B = 2.4038
A & B may change a bit with subsequent compiles, but not at run-time.
Allow coder to input values like 0.02035 & 2.4038. The key components shown here and by others it to scale the factors like 0.02035 to by some power-of-2, do the equation (simplified into the form (A*c - B)*c) and the scale the result back.
Important features:
1 When determining A and B, insure the compile time floating point multiplication and final conversion occurs via a round and not a truncation. With positive values, the + 0.5 achieves that. Without a rounded answer UD_A*UD_Scaling could end up just under a whole number and truncate away 0.999999 when converting to the int32_t
2 Instead of doing expensive division at run-time, we do >> (right shift). By adding half the divisor (as suggested by #Joe Hass), before the division, we get a nicely rounded answer. It is important not to code in / here as some_signed_int / 4 and some_signed_int >> 2 do not round the same way. With 2's complement, >> truncates toward INT_MIN whereas / truncates toward 0.
#define UD_A (0.02035)
#define UD_B (2.4038)
#define UD_Shift (24)
#define UD_Scaling ((int32_t) 1 << UD_Shift)
#define UD_ScA ((int32_t) (UD_A*UD_Scaling + 0.5))
#define UD_ScB ((int32_t) (UD_B*UD_Scaling + 0.5))
for (int32_t val = 1; val <= 120; val++) {
int32_t result = ((UD_A*val - UD_B)*val + UD_Scaling/2) >> UD_Shift;
printf("%" PRId32 "%" PRId32 "\n", val, result);
}
Example differences:
val, OP equation, OP code, This code
1, -2.38345, -3, -2
54, -70.46460, -71, -70
120, 4.58400, 4, 5
This is a new answer. My old +1 answer deleted.
If you r input uses max 7 bits and you have 32 bit available then your best bet is to shift everything by as many bits as possible and work with that:
int32_t result;
result = (val * (int32_t)(0.02035 * 0x1000000)) - (int32_t)(2.4038 * 0x1000000);
result >>= 8; // make room for another 7 bit multiplication
result *= val;
result >>= 16;
Constant conversion will be done by an optimising compiler at compile time.
A define sentence is:
#define _INTSIZEOF(n) ( (sizeof(n) + sizeof(int) - 1) & ~(sizeof(int) - 1) )
I have been told the purpose is bit alignment.
I wonder how it works, thx in advance.
The above macro simply aligns the size of n to the nearest greater-or-equal sizeof(int) boundary.
The basic algorithm for aligning value a to the nearest greater-or-equal arbitrary boundary b is to
Divide a by b rounding up, and then
Multiply the quotient by b again.
In the domain of unsigned (or just positive) values the first step is achieved by the following popular trick
q = (a + b - 1) / b
// where `/` is ordinary C-style integer division (rounding down)
// Now `q` is `a` divided by `b` rounded up
Combining this with the second step we get the following
aligned_a = (a + b - 1) / b * b
In aligned_a you get the desired aligned value.
Applying this algorithm to the problem at hand one would arrive at the following implementation of _INTSIZEOF macro
#define _INTSIZEOF(n)\
( (sizeof(n) + sizeof(int) - 1) / sizeof(int) * sizeof(int) )
This is already good enough.
However, if you know in advance that the alignment boundary is a power of 2, you can "optimize" the calculations by replacing the divide+multiply sequence with a simple bitwise operation
aligned_a = (a + b - 1) & ~(b - 1)
That is exactly what's done in the above original implementation of _INTSIZEOF macro.
This "optimization" might probably make sense with some compilers (although I would expect a modern compiler to be able to figure it out by itself). However, considering that the above _INTSIZEOF(n) macro is apparently intended to serve as a compile-time expression (it does not depend on any run-time values, barring VLA objects/types passed as n), there's not much point in optimizing it that way.
Here's a hint:
A common method to do ceil(a/b) is:
(a + (b-1)) / b
b * ( (a + b - 1) / b ) = (a + b - 1) & ~(b - 1)
To see why the above holds, consider this:
Part I (why q = (a + b - 1) / b produces the number we are looking for):
... note that we want q to be the number of b's that are in a, but rounded up (i.e., if after integer division, there is a remainder, then that remainder should be rounded up to b and hence q incremented by 1).
there exists Q and R such that a = Qb + R, and hence a + b - 1 = Qb + b - 1 + R. If we perform integer division on a + b - 1 by b we would get Q + (b-1+R)/b. The 2nd part of this will be zero if R is zero and 1 if R is not zero (note R is guaranteed to be less than b).
Part II (the macro):
now if b is a power of two, then integer division of a + b - 1 by b is simply a right shift of the exponent of b (i.e., b = 2^n, then shift right by n places).
in addition, multiplication by b is a left shift (shift left by n places)
hence combined, all we are doing is clearing the rightmost n bits to zero, and this is accomplished by masking: ~(b-1) gives us 1111...111000...0 where the number of 1s is equal to n (b = 2^n)
Let us say we have x and y and both are signed integers in C, how do we find the most accurate mean value between the two?
I would prefer a solution that does not take advantage of any machine/compiler/toolchain specific workings.
The best I have come up with is:(a / 2) + (b / 2) + !!(a % 2) * !!(b %2) Is there a solution that is more accurate? Faster? Simpler?
What if we know if one is larger than the other a priori?
Thanks.
D
Editor's Note: Please note that the OP expects answers that are not subject to integer overflow when input values are close to the maximum absolute bounds of the C int type. This was not stated in the original question, but is important when giving an answer.
After accept answer (4 yr)
I would expect the function int average_int(int a, int b) to:
1. Work over the entire range of [INT_MIN..INT_MAX] for all combinations of a and b.
2. Have the same result as (a+b)/2, as if using wider math.
When int2x exists, #Santiago Alessandri approach works well.
int avgSS(int a, int b) {
return (int) ( ((int2x) a + b) / 2);
}
Otherwise a variation on #AProgrammer:
Note: wider math is not needed.
int avgC(int a, int b) {
if ((a < 0) == (b < 0)) { // a,b same sign
return a/2 + b/2 + (a%2 + b%2)/2;
}
return (a+b)/2;
}
A solution with more tests, but without %
All below solutions "worked" to within 1 of (a+b)/2 when overflow did not occur, but I was hoping to find one that matched (a+b)/2 for all int.
#Santiago Alessandri Solution works as long as the range of int is narrower than the range of long long - which is usually the case.
((long long)a + (long long)b) / 2
#AProgrammer, the accepted answer, fails about 1/4 of the time to match (a+b)/2. Example inputs like a == 1, b == -2
a/2 + b/2 + (a%2 + b%2)/2
#Guy Sirton, Solution fails about 1/8 of the time to match (a+b)/2. Example inputs like a == 1, b == 0
int sgeq = ((a<0)==(b<0));
int avg = ((!sgeq)*(a+b)+sgeq*(b-a))/2 + sgeq*a;
#R.., Solution fails about 1/4 of the time to match (a+b)/2. Example inputs like a == 1, b == 1
return (a-(a|b)+b)/2+(a|b)/2;
#MatthewD, now deleted solution fails about 5/6 of the time to match (a+b)/2. Example inputs like a == 1, b == -2
unsigned diff;
signed mean;
if (a > b) {
diff = a - b;
mean = b + (diff >> 1);
} else {
diff = b - a;
mean = a + (diff >> 1);
}
If (a^b)<=0 you can just use (a+b)/2 without fear of overflow.
Otherwise, try (a-(a|b)+b)/2+(a|b)/2. -(a|b) is at least as large in magnitude as both a and b and has the opposite sign, so this avoids the overflow.
I did this quickly off the top of my head so there might be some stupid errors. Note that there are no machine-specific hacks here. All behavior is completely determined by the C standard and the fact that it requires twos-complement, ones-complement, or sign-magnitude representation of signed values and specifies that the bitwise operators work on the bit-by-bit representation. Nope, the relative magnitude of a|b depends on the representation...
Edit: You could also use a+(b-a)/2 when they have the same sign. Note that this will give a bias towards a. You can reverse it and get a bias towards b. My solution above, on the other hand, gives bias towards zero if I'm not mistaken.
Another try: One standard approach is (a&b)+(a^b)/2. In twos complement it works regardless of the signs, but I believe it also works in ones complement or sign-magnitude if a and b have the same sign. Care to check it?
Edit: version fixed by #chux - Reinstate Monica:
if ((a < 0) == (b < 0)) { // a,b same sign
return a/2 + b/2 + (a%2 + b%2)/2;
} else {
return (a+b)/2;
}
Original answer (I'd have deleted it if it hadn't been accepted).
a/2 + b/2 + (a%2 + b%2)/2
Seems the simplest one fitting the bill of no assumption on implementation characteristics (it has a dependency on C99 which specifying the result of / as "truncated toward 0" while it was implementation dependent for C90).
It has the advantage of having no test (and thus no costly jumps) and all divisions/remainder are by 2 so the use of bit twiddling techniques by the compiler is possible.
For unsigned integers the average is the floor of (x+y)/2. But the same fails for signed integers. This formula fails for integers whose sum is an odd -ve number as their floor is one less than their average.
You can read up more at Hacker's Delight in section 2.5
The code to calculate average of 2 signed integers without overflow is
int t = (a & b) + ((a ^ b) >> 1)
unsigned t_u = (unsigned)t
int avg = t + ( (t_u >> 31 ) & (a ^ b) )
I have checked it's correctness using Z3 SMT solver
Just a few observations that may help:
"Most accurate" isn't necessarily unique with integers. E.g. for 1 and 4, 2 and 3 are an equally "most accurate" answer. Mathematically (not C integers):
(a+b)/2 = a+(b-a)/2 = b+(a-b)/2
Let's try breaking this down:
If sign(a)!=sign(b) then a+b will will not overflow. This case can be determined by comparing the most significant bit in a two's complement representation.
If sign(a)==sign(b) then if a is greater than b, (a-b) will not overflow. Otherwise (b-a) will not overflow. EDIT: Actually neither will overflow.
What are you trying to optimize exactly? Different processor architectures may have different optimal solutions. For example, in your code replacing the multiplication with an AND may improve performance. Also in a two's complement architecture you can simply (a & b & 1).
I'm just going to throw some code out, not looking too fast but perhaps someone can use and improve:
int sgeq = ((a<0)==(b<0));
int avg = ((!sgeq)*(a+b)+sgeq*(b-a))/2 + sgeq*a
I would do this, convert both to long long(64 bit signed integers) add them up, this won't overflow and then divide the result by 2:
((long long)a + (long long)b) / 2
If you want the decimal part, store it as a double.
It is important to note that the result will fit in a 32 bit integer.
If you are using the highest-rank integer, then you can use:
((double)a + (double)b) / 2
This answer fits to any number of integers:
int[] array = { 1, 2, 3, 4, 5, 6, 7, 8, 9 };
decimal avg = 0;
for (int i = 0; i < array.Length; i++){
avg = (array[i] - avg) / (i+1) + avg;
}
expects avg == 5.0 for this test