In an N×N matrix, what would be the best way of obtaining the sum of the 8 elements surrounding a given element?
We've been doing it the brute-force way, just checking with a lot of if statements, but I was wondering if there is a cleverer way of doing this.
The problem we face is the borders of the matrix, since we can't find anything that looks more elegant than the original pile of if (i > 0 && j > 0) { ... } checks.
Assuming the matrix has been initialized and you only want the sums for elements whose eight neighbours all exist, you can save time by restricting the double for loop to exactly those elements.
For an N x N matrix, the following covers all elements satisfying that condition:
for (i = 1; i < N - 1; i++)
{
    for (j = 1; j < N - 1; j++)
    {
        // YOUR CODE
    }
}
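As an illustration, a minimal sketch of the neighbour sum itself; the names a (input matrix) and sums (output matrix) are assumptions, not from the original:

// For each interior cell, add up the 8 cells surrounding a[i][j].
for (int i = 1; i < N - 1; i++)
{
    for (int j = 1; j < N - 1; j++)
    {
        int s = 0;
        for (int di = -1; di <= 1; di++)
            for (int dj = -1; dj <= 1; dj++)
                if (di != 0 || dj != 0)      // skip the centre cell
                    s += a[i + di][j + dj];
        sums[i][j] = s;
    }
}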
Related
I'm trying to figure out a suitable way to apply a row-wise permutation of a matrix using SIMD intrinsics (mainly AVX/AVX2 and AVX-512).
The problem is basically calculating R = PX, where P is a (sparse) permutation matrix with only one nonzero element per column. This allows one to represent P as a vector p, where p[i] is the row index of the nonzero value for column i. The code below shows a simple loop to achieve this:
// R and X are 2D matrices with shape (m, n), same size
for (size_t i = 0; i < m; ++i) {
    for (size_t j = 0; j < n; ++j) {
        R[p[i]][j] += X[i][j];
    }
}
I assume it all boils down to gather, but before spending a long time implementing various approaches, I would love to know what you folks think about this and which approach is most suitable for tackling it.
Isn't it strange that none of the compilers use avx-512 for this?
https://godbolt.org/z/ox9nfjh8d
Why is it that gcc doesn't do register blocking? I see clang does a better job; is this common?
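One possible approach, sketched under stated assumptions: since p moves whole rows, each row of X can simply be streamed to row p[i] of R with plain vector loads and stores, so no gather is strictly needed. The sketch assumes float data, n a multiple of 8 (AVX2), and a zero-initialized R so the += reduces to a store (each destination row is written exactly once because P is a permutation):

#include <immintrin.h>
#include <stddef.h>

// Copy row i of X to row p[i] of R, 8 floats at a time.
void permute_rows_avx2(float *R, const float *X, const size_t *p,
                       size_t m, size_t n)
{
    for (size_t i = 0; i < m; ++i) {
        const float *src = X + i * n;
        float *dst = R + p[i] * n;
        for (size_t j = 0; j < n; j += 8)
            _mm256_storeu_ps(dst + j, _mm256_loadu_ps(src + j));
    }
}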
I have a 2D matrix A[row][col]. Let's say it contains only boolean values.
I want to iterate through every element. For each iteration, it will scan the 8 surrounding elements of that element and count the number of True values. Let's ignore the counting process itself.
// scanning element A[i][j]
counter = countHowManyTrue(A[i+1][j], A[i-1][j], A[i][j+1], A[i][j-1]... and so on)
Except for the first row, the last row, the first column, and the last column, every other element has 8 surrounding elements. So I have to write if-else statements to check, like this:
for (i = 0; i < m; i++){
    for (j = 0; j < n; j++){
        if (i == 0 && j == 0){ // first row, first col
            // do sth
        }
        if (i == m - 1 && j == 0){ // last row, first col
            // do sth
        }
    }
}
and I have to repeat the if-else statements many times to cover all the edge cases, which is very time-consuming.
Is there a better way to do this?
One of the simplest ways to do this is to add sentinel rows and columns: a row of all zeros above and below the real data, and a column of zeros to its left and right. Then you don't need to special-case the borders.
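A minimal sketch of that idea, assuming the boolean matrix is stored as int and copied into a padded array B of size (m+2) x (n+2), so that A[i][j] lives at B[i+1][j+1]:

// Build B: a copy of A surrounded by a one-cell border of zeros.
int B[m + 2][n + 2];
for (int r = 0; r < m + 2; r++)
    for (int c = 0; c < n + 2; c++)
        B[r][c] = 0;
for (int r = 0; r < m; r++)
    for (int c = 0; c < n; c++)
        B[r + 1][c + 1] = A[r][c];

// Count true neighbours of A[i][j]; the border absorbs all edge reads.
int count = 0;
for (int di = -1; di <= 1; di++)
    for (int dj = -1; dj <= 1; dj++)
        if (di != 0 || dj != 0)
            count += B[i + 1 + di][j + 1 + dj];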
As others have mentioned, you can pad the matrix with a border of zeroes so there are no edge cases.
If space is not an issue, there's another speedup: instead of looking at the 8 neighbours of each cell, we can get the sum of the neighbour values in fewer operations with some pre-calculation.
Let each matrix cell (after adding the zero borders) hold the sum of all elements in the rectangle from corner (0,0) to that cell.
It's simple to do:
// let's assume the matrix was of size n*m before being padded with a border of zeroes twice
for (int i = 2; i <= n + 2; i++)
    for (int j = 2; j <= m + 2; j++)
        matrix[i][j] += matrix[i-1][j] + matrix[i][j-1] - matrix[i-1][j-1];
Getting the sum of a cell's neighbours is now just arithmetic. Say we want the neighbours of cell (x, y): the 3x3 block around it sums to matrix[x+1][y+1] - matrix[x-2][y+1] - matrix[x+1][y-2] + matrix[x-2][y-2], and subtracting the original centre value, which is matrix[x][y] - matrix[x-1][y] - matrix[x][y-1] + matrix[x-1][y-1], leaves exactly the 8 neighbours.
Now it takes only 7 additions/subtractions per cell, compared to 8 originally.
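As a sketch, the lookup could be wrapped in a helper (assuming the in-place prefix sums above, with the real data at indices 2..n+1 and 2..m+1):

// Sum of the 8 neighbours of cell (x, y), via the prefix-sum matrix.
int neighbourSum(int x, int y)
{
    int block  = matrix[x+1][y+1] - matrix[x-2][y+1]
               - matrix[x+1][y-2] + matrix[x-2][y-2]; // whole 3x3 block
    int centre = matrix[x][y] - matrix[x-1][y]
               - matrix[x][y-1] + matrix[x-1][y-1];   // original centre value
    return block - centre;
}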
I recently came across a question in a competitive programming contest: given an array of integers, find the indices of a pair of array elements with the least LCM.
I know there's a naive double-loop O(n^2) solution, but as expected it got a time-limit-exceeded verdict. I've heard that dynamic programming is used to optimise brute-force approaches, but I can't see how to divide this problem into subproblems so that there's an optimal substructure.
Can I get any direction for approaching this problem with DP? Or any better approach? Thanks.
(Assuming positive numbers.)
The smallest LCM is likely to stem from the smallest elements, so to prune the values tried, one can sort the array first.
import java.util.Arrays;

// lcm via gcd, written to avoid the intermediate overflow of vi*vj
static int gcd(int a, int b) { return b == 0 ? a : gcd(b, a % b); }
static int lcm(int a, int b) { return a / gcd(a, b) * b; }

int[] v = { ... };
int minLCM = Integer.MAX_VALUE;
int bestVi = -1;
int bestVj = -1;
Arrays.sort(v);
for (int i = 0; i < v.length; ++i) {
    int vi = v[i];
    // prune: lcm(vi, vj) >= vj, so stop once v[j] >= minLCM
    for (int j = i + 1; j < v.length && v[j] < minLCM; ++j) {
        int vj = v[j];
        int lcm = lcm(vi, vj);
        if (lcm < minLCM) {
            minLCM = lcm;
            bestVi = vi;
            bestVj = vj;
        }
    }
}
For pruning:
lcm(vi, vj) >= vi
lcm(vi, vj) >= vj
lcm(vi, vj) <= vi*vj
This pruning can be done in the for-j loop only, as vj >= vi.
Better pruning could be done if, instead of a number, you had its list of (at most log₂ value) prime factors.
As factorisation has its own cost, one might just try the small primes (2, 3, 5, 7), for instance.
Better looping can also be achieved by replacing both nested loops so that the smallest pairs (vi, vj) come first. In the code above, (v0, v100) comes before (v1, v2). Instead, loop over an increasing i+j:
i j
0 1
0 2
0 3
1 2
0 4
1 3
0 5
1 4
2 3
...
(The math for the loop counters using diagonals is a nice puzzle; a sketch follows below.)
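A minimal sketch of that diagonal iteration in C, assuming a sorted array v of length n; pairs (i, j) with i < j are visited in order of increasing i + j:

for (int s = 1; s <= 2 * n - 3; s++)               // s = i + j
{
    int iMin = s - (n - 1) > 0 ? s - (n - 1) : 0;  // keep j = s - i < n
    int iMax = (s - 1) / 2;                        // enforce i < j
    for (int i = iMin; i <= iMax; i++)
    {
        int j = s - i;
        // consider the pair (v[i], v[j]) here
    }
}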
Though still O(n²), this might work.
Sometimes the programming language also matters, and in these contexts one often sees C submissions. In such a case, using Ruby might be counterproductive.
I am trying to improve my knowledge of algorithms, and I was wondering if someone could give me a good explanation of how to easily calculate running time.
boolean hasDuplicate(int[] array) {
    for (int i = 0; i < array.length; i++) {
        for (int j = 0; j < array.length; j++) {
            if (array[i] == array[j] && i != j) {
                return true;
            }
        }
    }
    return false;
}
So it says:

"This array takes O(n²) running time because each element has to be compared with n elements (where n is the length of the array). Therefore, if we double the input size, we quadruple the running time."
Question:
Let's say the array was [1,2,3]. If we double it to [1,2,3,4,5,6], how does that quadruple the running time? Shouldn't it just double the running time too?
The if statement executes array.length * array.length times. That's O(N^2) if N denotes the array length.
Here's one way to think about this - your for loop iterates over all possible pairs of indices into the array (do you see why?)
Let's suppose you have an array of length n. There are n² possible pairs of indices into that array (do you see why?). If you double the size of the array to 2n, then the number of possible pairs of indices is (2n)² = 4n². Notice that this is four times the original number, meaning that there are four times more pairs to consider. Therefore, since the runtime of your code is proportional to the number of pairs of indices in the array, the runtime should go up by a factor of four.
More generally, any quadratic-time algorithm should take roughly four times as long to finish when you double the size of the input.
Hope this helps!
The loops each iterate over the array n times. Consider the outer loop:
for (int i = 0; i < array.length; i++) {
    for (int j = 0; j < array.length; j++) {
        if (array[i] == array[j] && i != j) {
            return true;
        }
    }
}
The variable i will have values from 0 to n-1. n denotes the length of the array in your case.
For each value of i, the variable j will have values from 0 to n-1.
Say, n = 5. Then
i = 0 j <-- 0,1,2,3,4
i = 1 j <-- 0,1,2,3,4
i = 2 j <-- 0,1,2,3,4
i = 3 j <-- 0,1,2,3,4
i = 4 j <-- 0,1,2,3,4
You can see that the index changes 5 x 5 = 25 times. This is equal to n-squared times.
The way I learned it is this: as an experiment, change the program to not use comparisons directly. Instead, write a function compareTo that does the comparison, but also tracks how many comparisons have been done. Then do the following:
1. Run the sort with an O(n^2) algorithm on K, 2K, 3K, and 4K elements. Log the number of elements sorted and the number of comparisons for each run.
2. Repeat 1, but with an O(n log n) sort.
3. Graph the results.
You should see that, indeed, you get a quadratic curve for the first, and a curve that falls somewhere between a line and a quadratic for the second.
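As a sketch of that experiment in C (the counting comparator and the use of qsort are illustrative assumptions, not part of the original answer):

#include <stdio.h>
#include <stdlib.h>

// A comparator that also counts how many times it has been called.
static long comparisons = 0;

static int countingCompare(const void *a, const void *b)
{
    comparisons++;
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

int main(void)
{
    enum { K = 1000 };               // repeat with 2K, 3K, 4K and graph
    int data[K];
    for (int i = 0; i < K; i++)
        data[i] = rand();

    qsort(data, K, sizeof data[0], countingCompare);
    printf("n = %d, comparisons = %ld\n", K, comparisons);
    return 0;
}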
Given your example, for [1,2,3] input
if (array[i] == array[j] && i != j) {
return true;
}
this part of the code is executed 3*3 = 9 times, but when the input is [1,2,3,4,5,6] it is executed 6*6 = 36 times. This is how the runtime quadruples (36/9 = 4).
In general, when the number of elements in the input is N, this if statement is executed N*N times, which makes the runtime O(N^2).
I'm working on a demo that requires a lot of vector math, and in profiling, I've found that it spends the most time finding the distances between given vectors.
Right now, it loops through an array of X^2 vectors and finds the distance between each pair, meaning it runs the distance function X^4 times, even though (I think) there are only (X^2)/2 unique distances.
It works something like this: (pseudo c)
#define MATRIX_WIDTH 8
typedef float vec2_t[2];

vec2_t matrix[MATRIX_WIDTH * MATRIX_WIDTH];
...
for (int i = 0; i < MATRIX_WIDTH; i++)
{
    for (int j = 0; j < MATRIX_WIDTH; j++)
    {
        float xd, yd;
        float distance;
        for (int k = 0; k < MATRIX_WIDTH; k++)
        {
            for (int l = 0; l < MATRIX_WIDTH; l++)
            {
                int index_a = (i * MATRIX_WIDTH) + j;
                int index_b = (k * MATRIX_WIDTH) + l;
                xd = matrix[index_a][0] - matrix[index_b][0];
                yd = matrix[index_a][1] - matrix[index_b][1];
                distance = sqrtf(powf(xd, 2) + powf(yd, 2));
            }
        }
        // More code that uses the distances between each vector
    }
}
What I'd like to do is create and populate an array of (X^2) / 2 distances without redundancy, then reference that array when I finally need it. However, I'm drawing a blank on how to index this array in a way that would work. A hash table would do it, but I think it's much too complicated and slow for a problem that seems like it could be solved by a clever indexing method.
EDIT: This is for a flocking simulation.
Performance ideas:
a) if possible, work with the squared distance to avoid the root calculation
b) never use pow for constant integer powers; use xd*xd instead
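A minimal sketch combining both ideas (the helper name dist2 is an assumption, reusing the question's vec2_t):

// Squared distance between two 2D points: enough for comparing distances,
// and avoids both sqrtf and powf.
static inline float dist2(const vec2_t a, const vec2_t b)
{
    float xd = a[0] - b[0];
    float yd = a[1] - b[1];
    return xd * xd + yd * yd;  // take sqrtf only when the true distance is needed
}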
I would consider changing your algorithm; O(n^4) is really bad. When dealing with interactions in physics (also O(n^4) for distances in a 2D field), one would implement B-trees etc. and neglect particle interactions with low impact. But it will depend on what the "more code that uses the distances..." really does.
Just some quick considerations: the number of unique distances is 0.5*n*(n+1) with n = w*h.
If you write down where unique distances occur, you will see that both inner loops can be reduced by starting at i and j.
Additionally, if you only need to access those distances via the matrix index, you can set up a 4D distance matrix.
If memory is limited, we can save nearly 50%, as mentioned above, with a lookup function that accesses a triangular matrix, as Code-Guru said. We would probably precalculate the line indices to avoid summing up on every access:
float distanceArray[(H*W + 1) * H*W / 2]; // triangular matrix, diagonal included
int lineIndices[H*W];                     // lineIndices[j] = j*(j+1)/2, see below

float searchDistance(int i, int j)
{
    return i < j ? distanceArray[i + lineIndices[j]]
                 : distanceArray[j + lineIndices[i]];
}
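The precalculation mentioned above could look like this (a sketch, assuming the triangular layout with the diagonal included):

// lineIndices[j] is the start offset of row j within the triangular array.
for (int j = 0; j < H*W; j++)
    lineIndices[j] = j * (j + 1) / 2;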