I'm trying to reduce the information density of an RGB pixel in a 2D barcode for scanning.
I currently have a 3-byte array where information is encoded in multiples of bound until 256 is hit, at which point it loops over to the next byte. For instance, if bound was 60:
0,0,0 -> 60,0,0 -> 120,0,0 -> 180,0,0 -> 240,0,0 -> (loop over) 0,60,0 -> ...
I suspect there is a very simple method to do this, but I have only been able to implement it with a for loop:
#include <stdio.h>
#include <stdint.h>

void multiples(uint32_t data){
    uint32_t a[3] = {0, 0, 0};
    uint8_t bound = 32;
    for(int i = 0; i < data; i++){
        a[0] += bound;
        a[1] += bound*(a[0] - a[0]%256)/256;
        a[2] += bound*(a[1] - a[1]%256)/256;
        a[0] %= 256;
        a[1] %= 256;
    }
    printf("%u,%u,%u\n", a[0], a[1], a[2]);
}
How should this be done optimally?
Your example does not match your code or your explanation. But assuming that your code is correct, I think we can optimize it. Since you are repeatedly adding bound to a[0], the cumulative value of a[0] (say, acc[0]) will be
acc[0] = data * bound
So a[0] can be computed directly as acc[0] % 256. The cumulative value of a[1] grows by bound whenever acc[0] crosses a multiple of 256. Therefore,
acc[1] = (acc[0]/256)*bound
So a[1] will be acc[1] % 256. Similarly, the cumulative a[2] grows by bound each time acc[1] crosses a multiple of 256:
acc[2] = (acc[1]/256)*bound
I think the requirement must be for a[2] to be less than 256 as well, in which case a[2] should be acc[2] % 256. But you can keep it equal to acc[2], as in your code.
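Put together, a direct version built from those formulas (a sketch that keeps your bound of 32 and, like your loop, leaves a[2] unreduced) could look like this:

#include <stdio.h>
#include <stdint.h>

void multiples_direct(uint32_t data) {
    const uint32_t bound = 32;
    uint32_t acc0 = data * bound;            /* total ever added to a[0]; must fit in 32 bits */
    uint32_t acc1 = (acc0 / 256) * bound;    /* bound added once per 256 crossed by acc0 */
    uint32_t acc2 = (acc1 / 256) * bound;    /* bound added once per 256 crossed by acc1 */
    printf("%u,%u,%u\n", acc0 % 256, acc1 % 256, acc2);
}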
I'm trying to come up with an equation that relates the index of a value within a 3D array, to the index of the same array but reshaped into a column vector.
Consider the following.
A = randi([1,10],3,2,2);
A2 = reshape(A,3*2*2,1);
A and A2 have the same number of elements, but the arrangement of the elements is different for each array. If I lay out a possible example for A and A2 here it is clear geometrically how each index lines up.
A(:,:,1) = [9 10; 10 7; 2 1]
A(:,:,2) = [3 10; 6 2; 10 10]
A2 = [9; 10; 2; 10; 7; 1; 3; 6; 10; 10; 2; 10]
Let's say n = 1:1:3*2*2; this is an array the same length as A2 that numbers each of the elements. The value A(2,2,2) = 2 has indices [i,j,k] = [2,2,2]. I would like an equation relating i, j, k, and n.
I've looked into the built-in functions ind2sub and sub2ind but it seems that I inadvertently shaped my i, j, and k coordinates (which correspond with real x, y, and z points) differently than how MATLAB does. This makes it difficult for me to change everything now, and is why I need an equation.
The conversion between a 3D index and a linear (1D) index is given by:
n = i + (j-1)*M + (k-1)*M*N
where M and N are the sizes of the first two dimensions (here M = 3 and N = 2).
The reverse can be obtained recursively as:
k = floor((n-1)/(M*N)) +1
n = n - (k-1)*M*N
j = floor((n-1)/M) + 1
i = n - (j-1)*M
I haven't tested it, but I think it will give you what you are expecting.
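If it helps to see the formulas in code, here is a small C sketch (the function names sub2lin and lin2sub are mine), using M = 3, N = 2 from the example above; it maps (2,2,2) to n = 11, and A2(11) is indeed 2:

#include <stdio.h>

/* 1-based, column-major conversion, matching MATLAB's memory layout.
   M and N are the first two dimensions of A (here M = 3, N = 2). */
int sub2lin(int i, int j, int k, int M, int N) {
    return i + (j - 1) * M + (k - 1) * M * N;
}

void lin2sub(int n, int M, int N, int *i, int *j, int *k) {
    *k = (n - 1) / (M * N) + 1;
    n -= (*k - 1) * M * N;
    *j = (n - 1) / M + 1;
    *i = n - (*j - 1) * M;
}

int main(void) {
    int i, j, k;
    int n = sub2lin(2, 2, 2, 3, 2);   /* expect 11 */
    lin2sub(n, 3, 2, &i, &j, &k);     /* expect (2,2,2) back */
    printf("n = %d, (i,j,k) = (%d,%d,%d)\n", n, i, j, k);
    return 0;
}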
I can't wrap my head around the idea of an array of pointers. The problem is that I'm trying to iterate through a list of pointers (or at least get the second value from the pointer array). I understand that an integer is 4 bytes long (assuming I'm on 32-bit). What I'm trying to do is take the first address, which points to a[0], and add 4 bytes to it, which in my opinion should result in a[1]. However, this behaves as if I'm just adding the value to the index, i.e. f[0] + 4 -> f[5].
And I don't quite understand why.
#include "stdio.h"
int main()
{
int a[6] = {10,2,3,4,20, 42};
int *f[6];
for(int i = 0; i < sizeof(a)/sizeof(int); i++) f[i] = &a[i];
for(int i = 0; i < sizeof(a)/sizeof(int); i++) printf("Current pointer points to %i\n", *(*f+i));
printf("The is %i", *(f[0]+sizeof(int)));
return 1;
}
Pointer arithmetic takes into account the size of the pointed-to type.
f[0] + 4 will multiply the 4 by the size of the integer type, i.e. advance the pointer by 4 * sizeof(int) bytes.
Here's an online disassembler: https://godbolt.org/.
When I type the code f[0] + 4, the disassembly appears as
add QWORD PTR [rbp-8], 16
Meaning the compiler has multiplied the 4 by the size of an int (32 bits = 4 bytes) to make 16.
An array names a contiguous chunk of RAM, and in most expressions it decays to a pointer to its first element. int a[6] = {10,2,3,4,20, 42}; actually creates a chunk containing [0x0000000A, 0x00000002, 0x00000003, 0x00000004, 0x00000014, 0x0000002A], and a refers to where that chunk starts.
Using an index a[n] basically means go to the position of a (start of the array), then advance by n*sizeof(int) bytes.
a[0] means Go to position of a, then don't jump
a[1] means Go to position of a, then jump 1 time the size of an integer
a[2] means Go to position of a, then jump 2 times the size of an integer
Supposing a is at the address 0xF00D0000 and you're on a 32-bit machine:
&a[0] // 0xF00D0000
&a[1] // 0xF00D0004
&a[2] // 0xF00D0008
&a[32] // 0xF00D0080
I hope this makes sense.
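To tie this back to the code in the question, here is a minimal sketch of the difference between adding an element count and adding sizeof(int):

#include <stdio.h>

int main(void)
{
    int a[6] = {10, 2, 3, 4, 20, 42};
    int *f[6];
    for (int i = 0; i < 6; i++) f[i] = &a[i];

    printf("%d\n", *(f[0] + 1));            /* offset in elements: a[1] == 2 */
    printf("%d\n", *(f[0] + sizeof(int)));  /* jumps sizeof(int) elements: a[4] == 20 */
    return 0;
}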
Given an array with n+2 elements, all elements in the array are in the range 1 to n and all elements occur only once except two elements which occur twice.
Find those 2 repeating numbers. For example, if the array is [4, 2, 4, 5, 2, 3, 1], then n is 5 and there are n+2 = 7 elements, with every element occurring only once except 2 and 4.
So my question is how to solve the above problem using XOR operation. I have seen the solution on other websites but I'm not able to understand it. Please consider the following example:
arr[] = {2, 4, 7, 9, 2, 4}
XOR every element. xor = 2^4^7^9^2^4 = 14 (1110)
Get a number which has only one set bit of the xor. Since we can easily get the rightmost set bit, let us use it.
set_bit_no = xor & ~(xor-1) = (1110) & ~(1101) = 0010. Now set_bit_no has only the rightmost set bit of xor set.
Now divide the elements into two sets based on that bit, XOR the elements of each set, and we get the non-repeating elements 7 and 9.
Yes, you can solve it with XORs. This answer expands on Paulo Almeida's great comment.
The algorithm works as follows:
Since we know that the array contains every element in the range [1 .. n], we start by XORing every element in the array together and then XOR the result with every element in the range [1 .. n]. Because of the XOR properties, the unique elements cancel out and the result is the XOR of the duplicated elements (because the duplicate elements have been XORed 3 times in total, whereas all the others were XORed twice and canceled out). This is stored in xor_dups.
Next, find a bit in xor_dups that is a 1. Again, due to XOR's properties, a bit set to 1 in xor_dups means that that bit differs between the binary representations of the two duplicate numbers. Any bit that is a 1 can be picked for the next step; my implementation chooses the least significant one. This is stored in diff_bit.
Now, split the array elements into two groups: one group contains the numbers that have a 0 bit on the position of the 1-bit that we picked from xor_dups. The other group contains the numbers that have a 1-bit instead. Since this bit is different in the numbers we're looking for, they can't both be in the same group. Furthermore, both occurrences of each number go to the same group.
So now we're almost done. Consider the group for the elements with the 0-bit. XOR them all together, then XOR the result with all the elements in the range [1..n] that have a 0-bit on that position, and the result is the duplicate number of that group (because there's only one number repeated inside each group, all the non-repeated numbers canceled out because each one was XORed twice except for the repeated number which was XORed three times).
Rinse, repeat: for the group with the 1-bit, XOR them all together, then XOR the result with all the elements in the range [1..n] that have a 1-bit on that position, and the result is the other duplicate number.
Here's an implementation in C:
#include <assert.h>
#include <stddef.h>   /* size_t */

void find_two_repeating(int arr[], size_t arr_len, int *a, int *b) {
    assert(arr_len > 3);

    size_t n = arr_len - 2;
    size_t i;
    int v;

    /* XOR of everything in the array and of 1..n: unique values cancel out,
       leaving the XOR of the two duplicated values. */
    int xor_dups = 0;
    for (i = 0; i < arr_len; i++)
        xor_dups ^= arr[i];
    for (v = 1; v <= (int)n; v++)
        xor_dups ^= v;

    /* Lowest bit in which the two duplicates differ. */
    int diff_bit = xor_dups & -xor_dups;

    /* Split by that bit; each group contains exactly one duplicate. */
    *a = 0;
    *b = 0;
    for (i = 0; i < arr_len; i++)
        if (arr[i] & diff_bit)
            *a ^= arr[i];
        else
            *b ^= arr[i];
    for (v = 1; v <= (int)n; v++)
        if (v & diff_bit)
            *a ^= v;
        else
            *b ^= v;
}
arr_len is the total length of the array arr (the value of n+2), and the repeated entries are stored in *a and *b (these are so-called output parameters).
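For completeness, a quick driver using the array from the question, compiled together with the function above (main and the output format are mine):

#include <stdio.h>

int main(void) {
    int arr[] = {4, 2, 4, 5, 2, 3, 1};   /* n = 5; the duplicates are 4 and 2 */
    int a, b;
    find_two_repeating(arr, sizeof arr / sizeof arr[0], &a, &b);
    printf("duplicates: %d and %d\n", a, b);
    return 0;
}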
I have an array of size 4, 9, 16 or 25 (according to the input). The numbers in the array run from 0 up to size - 1 (if the array size is 9, the biggest element in the array is 8),
and I would like some algorithm that generates a sort of checksum for the array, so that I can check that two arrays are equal without looping through the whole array and comparing each element one by one.
Where can I get this sort of information? I need something that is as simple as possible. Thank you.
edit: just to be clear on what I want:
-All the numbers in the array are distinct, so [0,1,1,2] is not valid because there is a repeated element (1)
-The position of the numbers matter, so [0,1,2,3] is not the same as [3,2,1,0]
-The array will contain the number 0, so this should also be taken into consideration.
EDIT:
Okay, I tried to implement Fletcher's checksum from here:
http://en.wikipedia.org/wiki/Fletcher%27s_checksum#Straightforward
int fletcher(int array[], int size){
    int i;
    int sum1 = 0;
    int sum2 = 0;
    for(i = 0; i < size; i++){
        sum1 = (sum1 + array[i]) % 255;
        sum2 = (sum2 + sum1) % 255;
    }
    return (sum2 << 8) | sum1;
}
To be honest I have no idea what the return line does, but unfortunately the algorithm does not work.
For arrays [2,1,3,0] and [1,3,2,0] I get the same checksum.
EDIT2:
Okay, here's another one, the Adler checksum:
http://en.wikipedia.org/wiki/Adler-32#Example_implementation
#define MOD 65521

unsigned long adler(int array[], int size){
    int i;
    unsigned long a = 1;
    unsigned long b = 0;
    for(i = 0; i < size; i++){
        a = (a + array[i]) % MOD;
        b = (b + a) % MOD;
    }
    return (b << 16) | a;
}
This also does not work.
Arrays [2,0,3,1] and [1,3,0,2] generate the same checksum.
I'm losing hope here, any ideas?
Let's take the case of your array of 25 integers. You explain that it can contain any permutation of the unique integers 0 to 24. According to this page, there are 25! (25 factorial) possible permutations, that is 15511210043330985984000000. Far more than a 32-bit integer can hold.
The conclusion is that you will have collisions, no matter how hard you try.
Now, here is a simple algorithm that accounts for position:
unsigned int checksum(int array[], int size) {
    unsigned int c = 0;
    for(int i = 0; i < size; i++) {
        c += (unsigned int)array[i];
        c = c << 3 | c >> (32 - 3); // rotate a little
        c ^= 0xFFFFFFFF;            // invert just for fun
    }
    return c;
}
I think what you want is in the answer of the following thread:
Fast permutation -> number -> permutation mapping algorithms
You just take the number your permutation is mapped to and use that as your checksum. As there is exactly one checksum per permutation, there can't be a smaller checksum that is collision-free.
How about a weighted-sum checksum? Let's take [0,1,2,3] as an example. First pick a seed and a limit; let's pick 7 as the seed and 10000007 as the limit.
a[4] = {0, 1, 2, 3}
limit = 10000007, seed = 7
result = 0
result = ((result + a[0]) * seed) % limit = ((0 + 0) * 7) % 10000007 = 0
result = ((result + a[1]) * seed) % limit = ((0 + 1) * 7) % 10000007 = 7
result = ((result + a[2]) * seed) % limit = ((7 + 2) * 7) % 10000007 = 63
result = ((result + a[3]) * seed) % limit = ((63 + 3) * 7) % 10000007 = 462
Your checksum is 462 for that [0, 1, 2, 3].
The reference is http://www.codeabbey.com/index/wiki/checksum
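In C, that scheme would be something like the following sketch (the function name is mine); for [0,1,2,3] it returns 462, as worked out above:

unsigned long weighted_checksum(const int array[], int size) {
    const unsigned long seed = 7, limit = 10000007;
    unsigned long result = 0;
    for (int i = 0; i < size; i++)
        result = ((result + (unsigned long)array[i]) * seed) % limit;   /* weight by position via repeated multiplication */
    return result;
}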
For an array of N unique integers from 1 to N, just adding up the elements will always be N*(N+1)/2. Therefore the only difference is in the ordering. If by "checksum" you imply that you tolerate some collisions, then one way is to sum the differences between consecutive numbers. So for example, the delta checksum for {1,2,3,4} is 1+1+1=3, but the delta checksum for {4,3,2,1} is -1+-1+-1=-3.
No requirements were given for collision rates or computational complexity, but if the above doesn't suit, then I recommend a position-dependent checksum.
From what I understand, your array contains a permutation of the numbers from 0 to N-1. One checksum which will be useful is the rank of the array in its lexicographic ordering. What does that mean? Given 0, 1, 2,
you have the possible permutations:
1: 0, 1, 2
2: 0, 2, 1
3: 1, 0, 2
4: 1, 2, 0
5: 2, 0, 1
6: 2, 1, 0
The checksum will be that leading number (the rank), computed when you create the array. There are solutions proposed in
Find the index of a given permutation in the list of permutations in lexicographic order
which can be helpful, although it seems the best algorithm was of quadratic complexity. To improve it to linear complexity you should cache the values of the factorials beforehand.
The advantage? ZERO collision.
EDIT: Computation
The value is like the evaluation of a polynomial, where a factorial is used for each monomial instead of a power. So the function is
f(x0, ..., xn-1) = x0 * (0!) + x1 * (1!) + x2 * (2!) + ... + xn-1 * ((n-1)!)
The idea is to use each value to narrow down a sub-range of permutations; with enough values you pinpoint a unique permutation.
Now for the implementation (like that of a polynomial):
- Precompute 0! through (n-1)! at the beginning of the program.
- Each time you set an array, use f(elements) to compute its checksum.
- Compare two arrays in O(1) using their checksums.
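If you want to see the ranking written out, here is a sketch (mine, not taken from the linked thread) of the usual Lehmer-code computation of that rank, where the coefficient for each position is the count of smaller elements to its right. Note that for n > 20 the rank no longer fits in 64 bits, so the 25-element case would need a wider integer type:

unsigned long long perm_rank(const int a[], int n) {
    /* factorials 0! .. (n-1)!, precomputed once (exact only up to 20!) */
    unsigned long long fact[26];
    fact[0] = 1;
    for (int i = 1; i < n; i++)
        fact[i] = fact[i - 1] * (unsigned long long)i;

    unsigned long long rank = 0;
    for (int i = 0; i < n; i++) {
        unsigned long long smaller = 0;   /* elements to the right of a[i] that are smaller */
        for (int j = i + 1; j < n; j++)
            if (a[j] < a[i])
                smaller++;
        rank += smaller * fact[n - 1 - i];
    }
    return rank;   /* 0-based rank: 0 for 0,1,...,n-1 and n!-1 for the reversed permutation */
}

This version is quadratic for clarity; the linked thread discusses faster variants.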
I have a fitness function that scores the values in an int array based on data that lies in a 4D array. The profiler says this function is using 80% of the CPU time (it needs to be called several million times). I can't seem to optimize it further (if that's even possible). Here is the function:
unsigned int lookup_array[26][26][26][26]; /* lookup_array is a global variable */
unsigned int get_i_score(unsigned int *input) {
    register unsigned int i, score = 0;

    for(i = len - 3; i--; )
        score += lookup_array[input[i]][input[i + 1]][input[i + 2]][input[i + 3]];

    return(score);
}
I've tried to flatten the array to a single dimension but there was no improvement in performance. This is running on an IA32 CPU. Any CPU specific optimizations are also helpful.
Thanks
What is the range of the array items? If you can change the array base type to unsigned short or unsigned char, you might get fewer cache misses because a larger portion of the array fits into the cache.
Most of your time probably goes into cache misses. If you can optimize those away, you can get a big performance boost.
Remember that C/C++ arrays are stored in row-major order, so arrange your data so that addresses referenced closely in time also reside close together in memory. For example, it may make sense to store sub-results in a temporary array, so you can then process exactly one row of sequentially located elements. That way the processor cache will keep the row resident across iterations and fewer memory operations will be required. However, you might need to modularize your lookup_array function, maybe even split it into four (by the number of dimensions in your array).
The problem is definitely related to the size of the matrix. You cannot optimize it by declaring it as a single flat array, because that is what the compiler already does automatically.
Everything depends on the order in which you access the data, namely on the content of the input array.
The only thing you can do is work on locality: read this one; it should give you some inspiration.
By the way, I suggest you replace the input array with four parameters: it will be more intuitive and less error-prone.
Good luck
A few suggestions to improve performance:
Parallelise. This is a very easy reduction to program in OpenMP or MPI (see the sketch after this list).
Reorder data to improve locality. Try sorting input first, for example.
Use streaming processing instructions if the compiler is not already doing so.
About reordering: it would be possible if you flattened the array and used linear coordinates instead.
Another point: compare the theoretical peak performance of your processor (integer operations) with the performance you're actually getting (do a quick count of the generated assembly instructions, multiply by the length of the input, etc.) and see whether there's room for a significant improvement.
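On the parallelisation point, a minimal OpenMP sketch of that reduction (assuming the same global lookup_array declared in the question and that len is passed in; compile with -fopenmp):

unsigned int get_i_score_omp(const unsigned int *input, unsigned int len) {
    unsigned int score = 0;
    /* each lookup is independent, so the sum is a straightforward reduction */
    #pragma omp parallel for reduction(+:score)
    for (unsigned int i = 0; i < len - 3; i++)
        score += lookup_array[input[i]][input[i + 1]][input[i + 2]][input[i + 3]];
    return score;
}

Whether the thread startup cost pays off for a single call depends on len; for short inputs it may be better to parallelise across calls instead.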
I have a couple of suggestions:
unsigned int lookup_array[26][26][26][26]; /* lookup_array is a global variable */
unsigned int get_i_score(unsigned int *input, unsigned int len) {
    register unsigned int i, score = 0;
    unsigned int *a = input;
    unsigned int *b = input + 1;
    unsigned int *c = input + 2;
    unsigned int *d = input + 3;

    for(i = 0; i < (len - 3); i++, a++, b++, c++, d++)
        score += lookup_array[*a][*b][*c][*d];

    return(score);
}
Or try
for(i = 0; i < (len - 3); i++, a = b, b = c, c = d, d++)
    score += lookup_array[*a][*b][*c][*d];
Also, given that there are only 26 values, why are you putting the input array in terms of unsigned ints? If it were char *input, you'd be using 1/4 as much memory and therefore using 1/4 of the memory bandwidth. Obviously the types of a through d have to match. Similarly, if the score values don't need to be unsigned ints, make the array smaller by using chars or uint16_t.
You might be able to squeeze a bit more out by unrolling the loop in some variation of Duff's device.
Multidimensional arrays often force the compiler to emit one or more multiply operations, which may be slow on some CPUs. A common workaround is to transform the N-dimensional array into an array of pointers to elements of (N-1) dimensions. With a 4-dimensional array this is quite annoying (26 pointers to 26*26 pointers to 26*26*26 rows...), but I suggest you try it and compare the results. It is not guaranteed to be faster: compilers are quite smart at optimizing array accesses, while a chain of indirect accesses has a higher probability of invalidating the cache.
Bye
If lookup_array is mostly zeroes, it could definitely be replaced with a hash table lookup on a smaller array. An inline lookup function could calculate the flat offset of the 4 dimensions ([5,6,7,8] = (5*26*26*26)+(6*26*26)+(7*26)+8 = 92126). The hash key could just be the lower few bits of the offset (depending on how sparse the array is expected to be). If the offset exists in the hash table, use its value; if it doesn't exist, the contribution is 0...
The loop could also be unrolled using something like this if the input has arbitrary length. Only about len accesses of input are needed (instead of around len * 4 in the original loop).
/* score and len are assumed to be declared earlier in the function;
   x1..x4 rotate through the last four inputs read. */
register int j, x1, x2, x3, x4;
register unsigned int *p;

p = input;
x1 = *p++;
x2 = *p++;
x3 = *p++;

for (j = (len - 3) / 20; j--; ) {
    x4 = *p++; score += lookup_array[x1][x2][x3][x4];
    x1 = *p++; score += lookup_array[x2][x3][x4][x1];
    x2 = *p++; score += lookup_array[x3][x4][x1][x2];
    x3 = *p++; score += lookup_array[x4][x1][x2][x3];
    x4 = *p++; score += lookup_array[x1][x2][x3][x4];
    x1 = *p++; score += lookup_array[x2][x3][x4][x1];
    x2 = *p++; score += lookup_array[x3][x4][x1][x2];
    x3 = *p++; score += lookup_array[x4][x1][x2][x3];
    x4 = *p++; score += lookup_array[x1][x2][x3][x4];
    x1 = *p++; score += lookup_array[x2][x3][x4][x1];
    x2 = *p++; score += lookup_array[x3][x4][x1][x2];
    x3 = *p++; score += lookup_array[x4][x1][x2][x3];
    x4 = *p++; score += lookup_array[x1][x2][x3][x4];
    x1 = *p++; score += lookup_array[x2][x3][x4][x1];
    x2 = *p++; score += lookup_array[x3][x4][x1][x2];
    x3 = *p++; score += lookup_array[x4][x1][x2][x3];
    x4 = *p++; score += lookup_array[x1][x2][x3][x4];
    x1 = *p++; score += lookup_array[x2][x3][x4][x1];
    x2 = *p++; score += lookup_array[x3][x4][x1][x2];
    x3 = *p++; score += lookup_array[x4][x1][x2][x3];
    /* that's 20 iterations, add more if you like */
}

for (j = (len - 3) % 20; j--; ) {
    x4 = *p++;
    score += lookup_array[x1][x2][x3][x4];
    x1 = x2;
    x2 = x3;
    x3 = x4;
}
If you convert it to a flat array of size 26*26*26*26, you only need to look up the input array once per loop iteration:
unsigned int get_i_score(unsigned int *input)
{
    /* lookup_array is now the flattened unsigned int lookup_array[26*26*26*26] */
    unsigned int i = len - 3, score = 0, index;

    /* pack the last three elements into the low base-26 "digits" */
    index = input[i] * 26 * 26 +
            input[i + 1] * 26 +
            input[i + 2];

    while (i-- > 0)
    {
        index += input[i] * 26 * 26 * 26;   /* add the new leading digit */
        score += lookup_array[index];
        index /= 26;                        /* drop the trailing digit */
    }
    return score;
}
The additional cost is a multiplication and a division. Whether it ends up being faster in practice - you'll have to test.
(By the way, the register keyword is often ignored by modern compilers - it's usually better to leave register allocation up to the optimiser).
Does the content of the array change much? Perhaps it would be faster to pre-calculate the score, and then modify that pre-calculated score every time the array changes? Similar to how you can materialize a view in SQL using triggers.
Maybe you can eliminate some accesses to the input array by using local variables.
unsigned int lookup_array[26][26][26][26]; /* lookup_array is a global variable */
unsigned int get_i_score(unsigned int *input, unsigned int len) {
    unsigned int i, score, a, b, c, d;

    score = 0;
    a = input[len - 3];   /* preload the last three elements */
    b = input[len - 2];
    c = input[len - 1];
    d = 0;                /* overwritten before its first use below */

    for (i = len - 3; i-- > 0; ) {
        d = c, c = b, b = a, a = input[i];
        score += lookup_array[a][b][c][d];
    }
    return score;
}
Moving around registers may be faster than accessing memory, although this kind of memory should remain in the innermost cache anyway.