Consider two coordinate systems, one for the objects themselves and one for the chunks the objects are contained in. Let's consider a chunk size of 4, meaning that the object at coordinate 0 is in chunk 0, the object at coordinate 3 is also in chunk 0, but the object at coordinate 8 is in chunk 2, and the object at coordinate -4 is in chunk -1.
Calculating the chunk number for an object with a positive position is easy: object.number/chunk_size
But I cannot find a formula that calculates the correct chunk position for objects at negative positions:
-4/4 = -1 is correct, but -2/4 = 0 is not the required result; subtracting 1 makes -2/4 - 1 = -1 correct, but then -4/4 - 1 = -2 is incorrect ...
Is there a sweet, short way to calculate each position, or do I need to check 2 conditions:
chunkx = objectx > 0 ?
    objectx / chunksize :
    (objectx % chunksize == 0 ?
        objectx / chunksize :
        objectx / chunksize - 1);
Alternative:
chunkx = objectx > 0 || objectx % chunksize == 0 ?
    objectx / chunksize :
    objectx / chunksize - 1;
On a side note: calculating the position of an object within the chunk is:
internalx = objectx - chunkx * chunksize
for both positive and negative (-4 -> 0 ; -2 -> 2; 1 -> 1; 4 -> 0)
Is there a more elegant way to calculate this that I am blatantly overlooking here?
If you can afford to convert your numbers to floating point and have a cheap floor function you can use floor(-1.0/4.0) to get -1 as you wish, but the conversion to floating point may be more expensive than the branch.
Another option is to work with positive numbers only by adding a large enough number (multiple of chunk size) to your object coordinate, and subtracting that number divided by the chunk size from your chunk coordinate. This may be cheaper than a branch.
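A minimal sketch of that bias trick in C, assuming coordinates never drop below -BIAS (the constants here are arbitrary illustrations):

#include <assert.h>

#define CHUNKSIZE 4
#define BIAS (1024 * CHUNKSIZE) /* any multiple of CHUNKSIZE beyond the most negative coordinate */

int chunk_of(int objectx) {
    assert(objectx >= -BIAS);             /* the trick only holds above the bias */
    return (objectx + BIAS) / CHUNKSIZE   /* division of a non-negative value truncates safely */
         - BIAS / CHUNKSIZE;              /* undo the shift, measured in chunks */
}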
For your second question, if your chunk size happens to be a power of 2 as in your example, you can use a binary AND: (-1 & (chunksize - 1)) == 3.
If your chunk size is a power of two you can do:
chunkx = (objectx & ~(chunksize - 1)) / chunksize;
If the chunk size is also constant the compiler can probably turn that into a trivial AND and shift. E.g.,
chunkx = (objectx & ~3) >> 2;
For the general case, I don't think you can eliminate the branch, but you can eliminate the slow modulo operation by offsetting the number before division:
chunkx = ((objectx >= 0) ? (objectx) : (objectx - (chunksize - 1))) / chunksize;
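Putting the pieces together, a small self-contained C sketch (two's complement assumed) of floor-style chunk and in-chunk coordinates:

#include <stdio.h>

#define CHUNKSIZE 4

int chunk_of(int objectx) {
    /* offset negatives so truncating division rounds toward minus infinity */
    return ((objectx >= 0) ? objectx : objectx - (CHUNKSIZE - 1)) / CHUNKSIZE;
}

int internal_of(int objectx) {
    return objectx - chunk_of(objectx) * CHUNKSIZE; /* always in [0, CHUNKSIZE) */
}

int main(void) {
    int tests[] = { -4, -2, 1, 4, 8 };
    for (int i = 0; i < 5; i++)
        printf("%d -> chunk %d, internal %d\n",
               tests[i], chunk_of(tests[i]), internal_of(tests[i]));
    return 0; /* prints chunks -1, -1, 0, 1, 2 and internals 0, 2, 1, 0, 0 */
}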
Related
I'd like to implement big number arithmetic operations modulo P, with P = 2^256 - 2^32 - 977. Note that P is fixed so any optimization can be hardcoded.
I'm using an array of 8 u32 to represent a u256:
struct fe {
    uint32_t b[8]; // 256 = 8 x 32
};
Now a simple version of the addition could look like this
void fe_add(struct fe *x, struct fe *y, struct fe *res) {
    int carry = 0;
    for (int i = 0; i < 8; ++i) {
        uint32_t tmp = x->b[i] + y->b[i] + carry;
        /* wrap-around check: with a carry-in, tmp == x->b[i] also means overflow */
        carry = carry ? (tmp <= x->b[i]) : (tmp < x->b[i]);
        res->b[i] = tmp;
    }
}
Now to support (x + y) % P, I could use this definition and define -, *, and / over struct fe:
// (x + y) % P = (x + y) - (P * int((x + y) / P))
fe_add(&x, &y, &t1); // t1 = x + y
fe_div(&t1, &P, &t2); // t2 = (x + y) / P
fe_mult(&P, &t2, &t3); // t3 = P * ((x + y) / P)
fe_sub(&t1, &t3, &res); // res = x + y - (P * ((x + y) / P))
What would be a better way to implement (x + y) % P directly during the addition, knowing that P won't change?
As Eric wrote in a comment, you should pay attention to the carry. After your loop is done, you may have some carry from the highest position. If in the end carry is not zero, then it has to be one. Then its value is 2^256, corresponding to index 8. Since
2^256 ≡ 2^32 + 977 (mod P)
you may account for this carry by adding 2^32 + 977 to your result so far. You can probably do so in an optimized manner (i.e. not re-using the same add loop), since you know the one term to be mostly zeros so you can stop after the first (least significant) two “digits” are added as soon as the carry has become zero. (I'm using the term “digit” for each of your u32 array members.)
What do you do if that addition again produces a non-zero carry at the high end? As Eric noted, when each of your inputs is less than P, the sum will be less than 2P, so subtracting P once (which is what the shift from 2^256 to 2^32 + 977 does) will make it less than P. So no need to worry; you can stop the loop when the carry becomes zero, no matter the digit count.
And what if the resulting sum is bigger than P but less than 2^256, so you don't get any carry? To also cover this situation, you can compare the result against P and subtract P unless it's smaller. Subtraction is a lot easier than division. You can skip this check if you took the code path for the non-zero carry. You can also optimize that check somewhat, aborting early if any of the six highest “digits” (indices 2 through 7) is less than 2^32 - 1. Only if they all equal 2^32 - 1 do you need some minor comparisons and computations to do the actual subtraction in the lowest two “digits” before clearing all the higher “digits”.
In Python-like pseudo-code and glossing over the details of how to detect overflow or underflow happening in the line before:
def fe_add(x, y, res):
    carry = 0
    for i in 0 .. 7:
        res[i] = x[i] + y[i] + carry
        carry = 1 if overflow else 0
    # So far this is what you had.

    if carry != 0:
        # If carry == 1: add 2^32 + 977 instead.
        res[0] += 977
        res[1] += 1 + (1 if overflow else 0)
        carry = 1 if overflow else 0
        i = 2
        while carry != 0:
            res[i] += 1
            carry = 1 if overflow else 0
            i += 1
    else:
        # Compare res against P.
        for i in 7 .. 2:
            if res[i] != 2^32 - 1:
                return
        if res[1] == 2^32 - 1 or (res[1] == 2^32 - 2 and res[0] >= 2^32 - 977):
            # If res >= P, subtract P.
            res[0] -= 2^32 - 977
            res[1] -= 2^32 - 2 + (1 if underflow else 0)
            for i in 2 .. 7:
                res[i] = 0
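For reference, here is a compact, self-contained C sketch of the same logic (fe_add_mod_p is an illustrative name, not the poster's API), using a 64-bit accumulator so the overflow checks become explicit shifts; it assumes both inputs are already reduced below P:

#include <stdint.h>

struct fe { uint32_t b[8]; };

/* res = (x + y) mod P, P = 2^256 - 2^32 - 977; assumes x, y < P */
static void fe_add_mod_p(const struct fe *x, const struct fe *y, struct fe *res) {
    uint64_t acc = 0;
    for (int i = 0; i < 8; ++i) {
        acc += (uint64_t)x->b[i] + y->b[i]; /* digit sum plus carry-in */
        res->b[i] = (uint32_t)acc;
        acc >>= 32;                         /* carry out */
    }
    int reduce = (int)acc;                  /* carry out of digit 7: sum >= 2^256 */
    if (!reduce) {                          /* no carry: check res >= P digit by digit */
        reduce = 1;
        for (int i = 7; i >= 2; --i)
            if (res->b[i] != UINT32_MAX) { reduce = 0; break; }
        if (reduce)
            reduce = res->b[1] == UINT32_MAX ||
                     (res->b[1] == UINT32_MAX - 1 && res->b[0] >= (uint32_t)-977);
    }
    if (reduce) {                           /* res -= P, i.e. res += 2^32 + 977 (mod 2^256) */
        acc = (uint64_t)res->b[0] + 977;
        res->b[0] = (uint32_t)acc;
        acc = (acc >> 32) + res->b[1] + 1;  /* the 2^32 term lands in digit 1 */
        res->b[1] = (uint32_t)acc;
        acc >>= 32;
        for (int i = 2; acc && i < 8; ++i) { /* propagate; a final wrap is the -2^256 */
            acc += res->b[i];
            res->b[i] = (uint32_t)acc;
            acc >>= 32;
        }
    }
}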
There is an alternative. Instead of using numbers from the range [0 .. P-1] to represent your elements of the modulo group, you might also choose to use [2^32 + 977 .. 2^256 - 1] instead. That would simplify some operations but complicate others. Additions in particular would be simpler, because just handling the non-zero carry situation discussed above would be enough. Comparing whether a number is ≡ 0 (mod P) would be more complicated, for example. And it might also confuse some code contributors.

As usual with changes that might improve performance, tests are best suited to tell whether one or the other solution performs better in practice. But perhaps you might want to design your API so that you can swap these implementation details without any code using them noticing. This could mean e.g. not relying on zero initialization to initialize a zero element of that data type, but having a function for it instead.
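For instance (the function name is purely illustrative), hiding the zero element behind a constructor keeps the representation swappable:

/* With the [0 .. P-1] representation this stores all-zero digits; with the
   shifted [2^32 + 977 .. 2^256 - 1] representation it would store 2^32 + 977. */
void fe_set_zero(struct fe *x) {
    for (int i = 0; i < 8; ++i)
        x->b[i] = 0;
}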
I have a memory region, which is divided in blocks of predefined BLOCKSIZE size. Given a memory chunk, defined by its offset OFFSET in bytes and SIZE in bytes, how to efficiently calculate a number of blocks, containing this memory chunk?
For example, let's say the BLOCKSIZE=8. Then a memory chunk with OFFSET=0 and SIZE=16 will take 2 blocks, but a chunk with OFFSET=4 and SIZE=16 will take 3 blocks.
I can write a formula like this (using integer arithmetics in C):
numberOfBlocks = (OFFSET + SIZE - 1) / BLOCKSIZE - (OFFSET / BLOCKSIZE) + 1;
This calculation will take 2 divisions and 4 additions. Can we do better, provided that the BLOCKSIZE is a power of 2 and OFFSET >= 0 and SIZE > 0?
UPDATE: I understand that division can be replaced by shifting in this case.
Can we do better, provided that the BLOCKSIZE is a power of 2?
I don't think so. Your (corrected) formula is basically (index of the first block after the chunk) - (index of the first block containing any part of the chunk). You could formulate it differently -- say, as the sum of a base number of blocks plus an adjustment for certain layouts that require one extra block -- but that actually increases the number of operations needed by a couple:
numberOfBlocks = (SIZE + BLOCKSIZE - 1) / BLOCKSIZE
               + ((OFFSET % BLOCKSIZE) + ((SIZE - 1) % BLOCKSIZE)) / BLOCKSIZE;
I don't see any way around performing (at least) two integer divisions (or equivalent bit shifts), because any approach to the calculation requires computing two block counts. These two computations cannot be combined, because each one requires a separate remainder truncation.
That BLOCKSIZE is a power of two may help you choose more efficient operations, but it does not help reduce the number of operations required. However, you could reduce the number of operations slightly if you could rely on SIZE to be a multiple of BLOCKSIZE. In that case, you could do this:
numberOfBlocks = SIZE / BLOCKSIZE + ((OFFSET % BLOCKSIZE) ? 1 : 0);
Alternatively, if it would be sufficient to compute an upper bound on the number of blocks covered, then you could do this:
numberOfBlocksBound = SIZE / BLOCKSIZE + 2;
or slightly tighter in many cases, but more costly to compute:
numberOfBlocksBound = (SIZE + BLOCKSIZE - 1) / BLOCKSIZE + 1;
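If it helps, here is what the power-of-two case looks like with shifts (blocksSpanned and blockSizeLog2 are illustrative names; requires SIZE > 0):

#include <stddef.h>

/* number of blocks touched, with BLOCKSIZE = 1 << blockSizeLog2 */
static size_t blocksSpanned(size_t offset, size_t size, unsigned blockSizeLog2) {
    size_t first = offset >> blockSizeLog2;              /* first block touched */
    size_t last = (offset + size - 1) >> blockSizeLog2;  /* last block touched */
    return last - first + 1;
}

With blockSizeLog2 = 3 (BLOCKSIZE = 8), blocksSpanned(0, 16, 3) yields 2 and blocksSpanned(4, 16, 3) yields 3, matching the examples in the question.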
Let's suppose we have uniformly distributed random unsigned int values from the function:
unsigned int myrand();
The most common way to shrink its range to [0, A] (for some int A) is as follows:
(double)rand() / UINT_MAX * A
Now I need to do the same for values in range of __int64:
unsigned __int64 max64;
unsigned __int64 r64 = myrand();
r64 <<= 32;
r64 |= myrand();
r64 = normalize(r64, max64);
The problem is normalizing the return range by some __int64, because it cannot be represented exactly in a double. I would rather not use big-number libraries, for performance reasons. Is there a way to shrink the return range quickly and easily while preserving the uniform distribution of the values?
The method that you give
(double)myrand() / UINT_MAX * A
is already broken. For example, if A = 1 and you want integers in the range [0, 1] you will only ever get a value of 1 if myrand () returned UINT_MAX. If you meant the range [0, A), that is only the value 0, then it is still broken because it will in that case return a value outside the range. No matter what, you are introducing a bias.
If you want A+1 different values from 0 to A inclusive, and 2^32 ≤ A < 2^64, you proceed as follows:
Step 1: Calculate a 64 bit random number R as you did. If A is one less than a power of two, you return R shifted by the right amount.
Step 2: Find how many different random values would be mapped to the same output value. Mathematically, that number is floor (2^64 / (A + 1)). 2^64 is too large, but that is no problem because it is equal to 1 + floor ((2^64 - (A + 1)) / (A + 1)), calculated in C or C++ as D = 1 + (- (A + 1)) / (A + 1) if A has type uint64_t.
Step 3: Find how many different random values should be mapped by calculating N = D * (A + 1). If R >= N then go back to Step 1.
Step 4: Return R / D.
No floating point arithmetic needed. The result is totally unbiased. If A < 2^32 you fall back to the 32 bit version (or you use the 64 bit version as well, but it calls myrand() twice as often as needed).
Of course you calculate D and N only once unless A changes.
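A sketch of those steps in C, using uint64_t in place of the MSVC-specific unsigned __int64 (myrand() is the 32-bit generator from the question; this assumes A < 2^64 - 1 so A + 1 does not wrap):

#include <stdint.h>

extern unsigned int myrand(void); /* uniform over [0, UINT_MAX] */

/* uniform value in [0, a], for 2^32 <= a < 2^64 - 1 */
uint64_t rand_to(uint64_t a) {
    uint64_t d = 1 + (-(a + 1)) / (a + 1); /* Step 2: floor(2^64 / (a + 1)) */
    uint64_t n = d * (a + 1);              /* Step 3: usable raw values; 0 means all 2^64 */
    uint64_t r;
    do {
        r = ((uint64_t)myrand() << 32) | myrand(); /* Step 1 */
    } while (n != 0 && r >= n);            /* reject the biased tail */
    return r / d;                          /* Step 4 */
}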
Maybe you can use "long double" if it is available on your platform.
I'm working on a cryptographic exercise, and I'm trying to calculate (2^n - 1) mod p where p is a prime number.
What would be the best approach to do this? I'm working with C, so 2^n - 1 becomes too large to hold when n is large.
I came across the equation (a*b) mod p = (a * (b mod p)) mod p, but I'm not sure this applies in this case, as 2^n - 1 may be prime (or I'm not sure how to factorise this).
Help much appreciated.
A couple tips to help you come up with a better way:
Don't use (a*b) mod p = (a * (b mod p)) mod p to compute 2^n - 1 mod p; use it to compute 2^n mod p and then subtract 1 afterward.
Fermat's little theorem can be useful here. That way, the exponent you actually have to deal with won't exceed p.
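For example, with p = 7: Fermat gives 2^6 ≡ 1 (mod 7), so 2^10 mod 7 = 2^(10 mod 6) mod 7 = 2^4 mod 7 = 2, and therefore (2^10 - 1) mod 7 = 1.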
You mention in the comments that n and p are 9 or 10 digits, or something. If you restrict them to 32 bit (unsigned long) values, you can find 2^n mod p with a simple (binary) modular exponentiation:
unsigned long long u = 1, w = 2;

while (n != 0)
{
    if ((n & 0x1) != 0)
        u = (u * w) % p; /* (mul-rdx) */
    if ((n >>= 1) != 0)
        w = (w * w) % p; /* (sqr-rdx) */
}

r = (unsigned long) u;
And, since (2^n - 1) mod p = (r - 1) mod p:
r = (r == 0) ? (p - 1) : (r - 1);
If 2^n mod p = 0 - which doesn't actually occur if p > 2 is prime - but we might as well consider the general case - then (2^n - 1) mod p = -1 mod p.
Since the 'common residue' or 'remainder' (mod p) is in [0, p - 1], we add a suitable multiple of p so that it is in this range.
Otherwise, the result of 2^n mod p was in [1, p - 1], and subtracting 1 keeps it in this range. It's probably better expressed as:
if (r == 0)
    r = p - 1; /* -1 mod p */
else
    r = r - 1;
To take the modulus you must somehow have 2^n - 1 available, or else you will move in a different direction of algorithms, interesting but separate. So I recommend you use a big-int concept: make a structure and represent a big value in smaller values, e.g.

struct bigint {
    int lowerbits;
    int upperbits;
};

Decomposition of the expression is also a possible approach, e.g. (2^n - 1) % p with 2^n = 2^(n-4) * 2^4; decompose it and handle the parts separately, which becomes quite algorithmic.
To compute 2^n - 1 mod p, you can use exponentiation by squaring after first removing any multiple of (p - 1) from n (since a^{p-1} = 1 mod p). In pseudo-code:
n = n % (p - 1)
result = 1
pow = 2
while n {
    if n % 2 {
        result = (result * pow) % p
    }
    pow = (pow * pow) % p
    n /= 2
}
result = (result + p - 1) % p
I came across the answer that I am posting here while solving one of the mathematical problems on HackerRank, and it worked for all the test cases given there.
If you restrict n and p to 64 bit (unsigned long) values, then here is the mathematical approach:
2^n - 1 can be written as 1*[ (2^n - 1)/(2 - 1) ]
If you look at this carefully, this is the sum of the GP 1 + 2 + 4 + .. + 2^(n-1)
And voila, we know that (a+b)%m = ( (a%m) + (b%m) )%m
If you are unsure whether the above relation holds for addition, you can google it or check this link: http://www.inf.ed.ac.uk/teaching/courses/dmmr/slides/13-14/Ch4.pdf
So now we can apply the above relation to our GP, and you have your answer!
That is,
(2^n - 1)%p is equivalent to ( 1 + 2 + 4 + .. + 2^(n-1) )%p and now apply the given relation.
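A small C sketch of that term-by-term GP summation (the function name is illustrative; it assumes p < 2^63 so the doubling cannot overflow, and it loops n times, so it is far slower than the O(log n) exponentiation shown above):

#include <stdint.h>

/* (2^n - 1) mod p via the GP 1 + 2 + 4 + ... + 2^(n-1) */
uint64_t mersenne_mod(uint64_t n, uint64_t p) {
    uint64_t result = 0, term = 1 % p;
    for (uint64_t i = 0; i < n; ++i) {
        result = (result + term) % p; /* (a+b)%m = ((a%m)+(b%m))%m */
        term = (term * 2) % p;        /* next power of two, reduced */
    }
    return result;
}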
First, focus on 2^n mod p because you can always subtract one at the end.
Consider the powers of two. This is a sequence of numbers produced by repeatedly multiplying by two.
Consider the modulo operation. If the number is written in base p, you're just grabbing the last digit. Higher digits can be thrown away.
So at some point(s) in the sequence, you get a two-digit number (a 1 in the p's place), and your task is really just to get rid of the first digit (subtract p) when that happens.
Stopping here conceptually, the brute-force approach would be something like this:
uint64_t exp2modp( uint64_t n, uint64_t p ) {
    uint64_t ret = 1;
    uint64_t limit = (p + 1) / 2; // subtract p exactly when 2 * ret >= p
    n %= p - 1; // Apply Fermat's Little Theorem: 2^(p-1) = 1 (mod p) for prime p > 2.
    while ( n -- ) {
        if ( ret >= limit ) { // doubling would reach or exceed p
            ret *= 2;
            ret -= p;
        } else {
            ret *= 2;
        }
    }
    return ret; // assumes p < 2^63 so ret * 2 cannot wrap
}
Unfortunately, this still takes forever for large n and p, and I can't think of any better number theory offhand.
If you have a multiplication facility which can compute (p-1)^2 without overflow, then you can use an analogous algorithm using repeated squaring with a modulo after each square operation, and then take the product of the series of square residuals, again with a modulo after each multiplication.
step 1. x = (1 << n) - 1 (shift 1 left n times, then subtract 1)
step 2. result = x & p (bitwise AND of x and p)
(Note: a bitwise AND only computes a remainder when the mask is one less than a power of two, so this shortcut does not apply to a general prime p, and 1 << n overflows once n reaches the word size.)
I have an array of size 4, 9, 16, or 25 (according to the input), and the numbers in the array run from 0 up to one less than the size (if the array size is 9, the biggest element is 8).
I would like an algorithm that generates some sort of checksum for the array, so that I can compare two arrays for equality without looping through them and checking each element one by one.
Where can I get this sort of information? I need something that is as simple as possible. Thank you.
edit: just to be clear on what I want:
-All the numbers in the array are distinct, so [0,1,1,2] is not valid because there is a repeated element (1)
-The position of the numbers matter, so [0,1,2,3] is not the same as [3,2,1,0]
-The array will contain the number 0, so this should also be taken into consideration.
EDIT:
Okay I tried to implement the Fletcher's algorithm here:
http://en.wikipedia.org/wiki/Fletcher%27s_checksum#Straightforward
int fletcher(int array[], int size){
    int i;
    int sum1=0;
    int sum2=0;

    for(i=0;i<size;i++){
        sum1=(sum1+array[i])%255;
        sum2=(sum2+sum1)%255;
    }

    return (sum2 << 8) | sum1;
}
to be honest I have no idea what the return line does, but unfortunately the algorithm does not work.
For arrays [2,1,3,0] and [1,3,2,0] I get the same checksum.
EDIT2:
okay here's another one, the Adler checksum
http://en.wikipedia.org/wiki/Adler-32#Example_implementation
#define MOD 65521

unsigned long adler(int array[], int size){
    int i;
    unsigned long a=1;
    unsigned long b=0;

    for(i=0;i<size;i++){
        a=(a+array[i])%MOD;
        b=(b+a)%MOD;
    }

    return (b <<16) | a;
}
This also does not work.
Arrays [2,0,3,1] and [1,3,0,2] generate same checksum.
I'm losing hope here, any ideas?
Let's take the case of your array of 25 integers. You explain that it can contain any permutation of the unique integers 0 to 24. According to this page, there are 25! (25 factorial) possible permutations, that is 15511210043330985984000000. Far more than a 32-bit integer can contain.
The conclusion is that you will have collisions, no matter how hard you try.
Now, here is a simple algorithm that accounts for position:
unsigned int checksum(const int array[], int size) {
    unsigned int c = 0;
    for(int i = 0; i < size; i++) {
        c += (unsigned int) array[i];
        c = c << 3 | c >> (32 - 3); // rotate a little
        c ^= 0xFFFFFFFF;            // invert just for fun
    }
    return c;
}
I think what you want is in the answer of the following thread:
Fast permutation -> number -> permutation mapping algorithms
You just take the number your permutation is mapped to and use that as your checksum. As there is exactly one checksum per permutation, there can't be a smaller checksum that is collision-free.
How about a weighted-sum checksum? Let's take the example [0,1,2,3]. First pick a seed and a limit; let's pick 7 as the seed and 10000007 as the limit.
a[4] = {0, 1, 2, 3}
limit = 10000007, seed = 7
result = 0
result = ((result + a[0]) * seed) % limit = ((0 + 0) * 7) % 10000007 = 0
result = ((result + a[1]) * seed) % limit = ((0 + 1) * 7) % 10000007 = 7
result = ((result + a[2]) * seed) % limit = ((7 + 2) * 7) % 10000007 = 63
result = ((result + a[3]) * seed) % limit = ((63 + 3) * 7) % 10000007 = 462
Your checksum is 462 for that [0, 1, 2, 3].
The reference is http://www.codeabbey.com/index/wiki/checksum
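In C, that scheme might look like this (the function name is illustrative; with elements below 25 and this limit, the intermediate products stay below 2^31, so plain long arithmetic suffices):

long weighted_checksum(const int array[], int size) {
    const long seed = 7, limit = 10000007;
    long result = 0;
    for (int i = 0; i < size; i++)
        result = ((result + array[i]) * seed) % limit; /* fold each element in */
    return result;
}

For {0, 1, 2, 3} it returns 462, matching the worked example.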
For an array of N unique integers from 1 to N, just adding up the elements will always give N*(N+1)/2. Therefore the only difference is in the ordering. If by "checksum" you imply that you tolerate some collisions, then one way is to sum the differences between consecutive numbers. For example, the delta checksum for {1,2,3,4} is 1+1+1 = 3, but the delta checksum for {4,3,2,1} is (-1)+(-1)+(-1) = -3.
No requirements were given for collision rates or computational complexity, but if the above doesn't suit, then I recommend a position-dependent checksum, sketched below.
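For concreteness, a minimal sketch of the delta checksum described above (note that the sum telescopes to last minus first, so it collides often, which is why a stronger position-dependent checksum may be preferable):

/* order-sensitive but weak: the sum telescopes to array[size-1] - array[0] */
int delta_checksum(const int array[], int size) {
    int sum = 0;
    for (int i = 1; i < size; i++)
        sum += array[i] - array[i - 1];
    return sum;
}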
From what I understand, your array contains a permutation of the numbers from 0 to N-1. One checksum which will be useful is the rank of the array in its lexicographic ordering. What does that mean? Given 0, 1, 2, you have the possible permutations:
1: 0, 1, 2
2: 0, 2, 1
3: 1, 0, 2
4: 1, 2, 0
5: 2, 0, 1
6: 2, 1, 0
The checksum will be the first number (the rank), computed when you create the array. There are solutions proposed in
Find the index of a given permutation in the list of permutations in lexicographic order
which can be helpful, although it seems the best algorithm there was of quadratic complexity. To improve it toward linear complexity you should cache the values of the factorials beforehand.
The advantage? ZERO collisions.
EDIT: Computation
The value is like the evaluation of a polynomial, where a factorial is used for each monomial instead of a power. So the function is
f(x0, ..., xn-1) = x0 * 0! + x1 * 1! + x2 * 2! + ... + xn-1 * (n-1)!
The idea is to use each value to select a sub-range of permutations; with enough values you pinpoint a unique permutation.
Now for the implementation (like that of a polynomial):
precompute 0! to (n-1)! at the beginning of the program
each time you set an array, use f(elements) to compute its checksum
compare two arrays in O(1) using their checksums
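Here is a quadratic-time C sketch of that ranking (illustrative name; the coefficients fed into f are the Lehmer-code digits, i.e. the count of smaller elements to the right of each position, which is what makes the rank unique). Note that for n = 25 the rank exceeds 64 bits, since 25! is about 1.55e25, so a wider integer type would be needed there:

#include <stdint.h>

/* lexicographic rank (0-based) of a permutation of {0, ..., n-1}; n <= 20 fits in uint64_t */
uint64_t perm_rank(const int perm[], int n) {
    uint64_t fact[21];
    fact[0] = 1;
    for (int i = 1; i <= n; i++)    /* precompute the factorials, as suggested */
        fact[i] = fact[i - 1] * (uint64_t)i;

    uint64_t rank = 0;
    for (int i = 0; i < n; i++) {
        int smaller = 0;            /* Lehmer digit for position i */
        for (int j = i + 1; j < n; j++)
            if (perm[j] < perm[i])
                smaller++;
        rank += (uint64_t)smaller * fact[n - 1 - i];
    }
    return rank;
}

For {0, 1, 2} through {2, 1, 0} this yields 0 through 5, matching the six permutations listed above (offset by one from the 1-based numbering there).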