cache speed related issue - c

Suppose I have a matrix of 0s and 1s in which one entry is marked red and its top-left (diagonal) neighbour is marked green.
For the entry in the red position, I just want to know whether the green entry is 1 or 0. If it is 0, I put 0 in the same (red) position of a new matrix.
So if the red entry is at Matrix[4,3], then position [4,3] of the new matrix gets 1 when the green entry (the top-left neighbour of red) is 1, and 0 when the green entry is 0.
I have to do this for pretty much every entry in the matrix, except the entries that have no top-left neighbour (for example, the entries in the first column), which can be skipped. Will it make a difference in terms of speed (cache access, reading, writing, etc.) if I do the following in C (note that C is row-major)?
Code 1:
for (i = 0; i < N; i++) {
    for (j = 0; j < N; j++) {
        if (i > 0 && j > 0) {
            row = j - 1;
            col = i - 1; // oldMatrix is the matrix in the attached image
*** newMatrix[j][i] += oldMatrix[row][col]; // newMatrix receives the top-left neighbour's value
        }
    }
}
Code 2:
for (i = 0; i < N; i++) {
    for (j = 0; j < N; j++) {
        if (i > 0 && j > 0) {
            row = j - 1;
            col = i - 1; // oldMatrix is the matrix in the attached image
            if (oldMatrix[row][col] == 1) {
                newMatrix[j][i] = 1;
            } else {
                newMatrix[j][i] = 0;
            }
        }
    }
}
is it better to have this:
*** newMatrix[j][i] += oldMatrix[row][col]
or have this instead:
if (oldMatrix[row][col] == 1) {
    newMatrix[j][i] = 1;
} else {
    newMatrix[j][i] = 0;
}
The first one (the line marked ***) always has to read the value of oldMatrix[row][col] and then write to newMatrix[j][i], no matter what.
The second one reads oldMatrix[row][col] for the "if" check and then just assigns (that is, writes) a constant 1 or 0 to newMatrix[j][i], without copying oldMatrix[row][col] into newMatrix[j][i].
Which one is better in terms of speed, considering cache performance (C is row-major), and why?
Thank you.
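For reference, here is a minimal sketch of the same top-left copy with the loops ordered so that the inner loop walks along a row, which is the contiguous direction in row-major C. The names N, copy_top_left, old_m and new_m are assumptions made for illustration, and the sketch takes the intent to be new_m[r][c] = old_m[r-1][c-1]. It does not claim to settle which of the two variants above is faster; that would need measuring.

```c
#include <assert.h>

enum { N = 4 }; /* hypothetical size, for illustration only */

/* For every cell that has a top-left neighbour, copy that neighbour's value.
   The inner loop varies the last index, so both arrays are touched
   sequentially in memory (C stores 2D arrays row by row).
   Cells in the first row and first column are left untouched. */
static void copy_top_left(int old_m[N][N], int new_m[N][N])
{
    /* the sketch assumes two distinct matrices, as in the question */
    assert(old_m != new_m);

    for (int r = 1; r < N; r++) {
        for (int c = 1; c < N; c++) {
            new_m[r][c] = old_m[r - 1][c - 1];
        }
    }
}
```

Starting both loops at 1 also removes the `i > 0 && j > 0` test from the loop body.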

Related

Rotate a matrix by one element: is there an easier/simpler implementation?

I wrote this code to rotate the matrix elements by one position.
I wanted to do this on my own, but I am stuck. Could someone please find the problem in my code and suggest an easier implementation, if there is one?
#include<iostream>
using namespace std;
void rotate(int a[][10],int r,int c)
{
int b[10][10];
// Copying the input matrix 2d array to temp array
for (int x = 0; x < r; x++)
{
for (int y = 0; y < c; y++)
{
b[x][y] = a[x][y];
}
}
//Rotation process
int i = 0;
int j = 0;
int flag = 0;
while (flag == 0)
{
if (i == 0 && j < c-1)
{
b[i][j+1] = a[i][j];
j++;
}
else if (j == c-1 && i < r-1)
{
b[i+1][j] = a[i][j];
i++;
}
else if (i == r-1 && j <= c-1 && j > 0)
{
b[i][j-1]=a[i][j];
j--;
}
else if (j == 0 && i <= r-1)
{
if (i==0 && j==0)
{
//to break the loop
flag = 1;
}
b[i-1][j] = a[i][j];
i--;
}
}
for (int k = 0; k < r; k++)
{
for (int l = 0; l < c; l++)
{
cout<<"\t"<<b[k][l];
}
cout<<endl;
}
}
int main()
{
int a[10][10],row,col;
cout<<"Enter no of rows : ";
cin>>row;
cout<<"Enter no of columns : ";
cin>>col;
// Getting array elements
cout<<"Enter "<<row<<"X"<<col<<" matrix elements : ";
for (int i = 0; i < row; i++)
{
for (int j = 0; j < col; j++)
{
cin>>a[i][j];
}
}
rotate(a,row,col);
cout<<endl;
system("pause");
return 0;
}
This code is not working. Could someone help me find what is wrong with it, or suggest another way?
Thank you.
If I were solving this, I might use 4 independent loops and a single variable to hold the value to carry forward (can be read from the original or matrix copy). While perhaps more repetitive than some approaches, it also removes needing to track 'which direction' state or delta movement variables.
In pseudo-code, it might look like this:
carry = m[0][0]
x, y = 1, 0  # start at (1,0) so we end on (0,0)
# go right from (1,0) to (cols-1,0)
while x < cols:
    temp = m[x][y]   # hold value of this cell
    m[x][y] = carry  # replace it with the carried-over value
    carry = temp     # and forward the previous value as the next carry
    x += 1           # update position
# reset back to valid index (avoid additional check in loop)
x -= 1
# do same for other directions around
# the movement is thus:
#   (1,0)           -> (cols-1,0)       NW -> NE
#   (cols-1,0)      -> (cols-1,rows-1)  NE -> SE
#   (cols-1,rows-1) -> (0,rows-1)       SE -> SW
#   (0,rows-1)      -> (0,0)            SW -> NW
A 1xN or Nx1 matrix might need additional consideration depending on expectations.
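The four-loop idea above could be sketched in C roughly as follows. This is only an illustration: rotate_ring and swap_carry are hypothetical names, the matrix is assumed to be stored as a flat row-major array, and the sketch simply leaves 1xN and Nx1 matrices unchanged.

```c
#include <assert.h>

/* Store `carry` in *cell and return the value that was there before. */
static int swap_carry(int *cell, int carry)
{
    int old = *cell;
    *cell = carry;
    return old;
}

/* Hypothetical helper: rotate the outer ring of a rows x cols matrix
   (flat, row-major) clockwise by one position, using four independent loops. */
static void rotate_ring(int *m, int rows, int cols)
{
    assert(m != 0);
    if (rows < 2 || cols < 2)
        return; /* 1xN and Nx1 are left unchanged in this sketch */

    int carry = m[0];
    for (int c = 1; c < cols; c++)        /* top edge, left to right */
        carry = swap_carry(&m[0 * cols + c], carry);
    for (int r = 1; r < rows; r++)        /* right edge, top to bottom */
        carry = swap_carry(&m[r * cols + (cols - 1)], carry);
    for (int c = cols - 2; c >= 0; c--)   /* bottom edge, right to left */
        carry = swap_carry(&m[(rows - 1) * cols + c], carry);
    for (int r = rows - 2; r >= 0; r--)   /* left edge, bottom to top */
        carry = swap_carry(&m[r * cols + 0], carry);
}
```

Because the last loop ends on cell (0,0), the value carried all the way around the ring lands there without a separate final assignment.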
Another approach is to use delta variables to move. Think of a little turtle that walks straight and turns right when walking into a wall. The terminal condition is once again set as (0,0) which can be checked at the end of the logic - in this case when the turtle attempts to walk North into a wall, we know it was from (0,0) and the path is completed.
This approach feels less repetitive while maintaining simple state transitions.
carry = m[0][0]
x, y = 1, 0    # start at (1,0)
dx, dy = 1, 0  # and facing East
while true:
    # could also use prev_x and prev_y instead of a carry
    temp = m[x][y]
    m[x][y] = carry
    carry = temp
    # move / walk
    x += dx
    y += dy
    # turn right when running into a wall
    # at most one bound can be violated at a time
    # after turning, step once in the new direction so that the
    # corner cell is not processed a second time
    if x >= cols:
        dx, dy = 0, 1   # face South (was facing East)
        x = cols - 1
        y += 1
    else if y >= rows:
        dx, dy = -1, 0  # face West (was facing South)
        y = rows - 1
        x -= 1
    else if x < 0:
        dx, dy = 0, -1  # face North (was facing West)
        x = 0
        y -= 1
    else if y < 0:
        # at (0,0) walking North - finished!
        break
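A C sketch of the turtle walk, under a few assumptions: rotate_ring_turtle is a hypothetical name, the matrix is a flat row-major array, x is the column and y the row, and matrices with fewer than two rows or columns are left unchanged. After each turn the turtle also takes one step in the new direction, so that a corner cell is visited only once.

```c
#include <assert.h>

/* Hypothetical helper: rotate the outer ring of a rows x cols matrix
   (flat, row-major) clockwise by one, walking along the border and
   turning right at each wall. */
static void rotate_ring_turtle(int *m, int rows, int cols)
{
    assert(m != 0);
    if (rows < 2 || cols < 2)
        return; /* 1xN and Nx1 are left unchanged in this sketch */

    int carry = m[0];
    int x = 1, y = 0;   /* start at (1,0); x = column, y = row */
    int dx = 1, dy = 0; /* facing East */

    for (;;) {
        /* drop the carried value here and pick up the old one */
        int temp = m[y * cols + x];
        m[y * cols + x] = carry;
        carry = temp;

        x += dx;
        y += dy;

        if (x >= cols) {        /* hit the East wall: face South */
            dx = 0; dy = 1;
            x = cols - 1; y += 1;
        } else if (y >= rows) { /* hit the South wall: face West */
            dx = -1; dy = 0;
            y = rows - 1; x -= 1;
        } else if (x < 0) {     /* hit the West wall: face North */
            dx = 0; dy = -1;
            x = 0; y -= 1;
        } else if (y < 0) {     /* walked North out of (0,0): done */
            break;
        }
    }
}
```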

How to find all possible 5 dots alignments in Join Five game

I'm trying to implement the Join Five game. It is a game where, given a grid and a starting configuration of dots, you have to add dots in free crossings, so that each dot that you add forms a 5-dot line with those already in the grid. Two lines may only have 1 dot in common (they may cross or touch end to end)
My game grid is an int array that contains 0 or 1. 1 if there is a dot, 0 if there isn't.
I'm doing fairly well with the implementation, but I'd like to display all the possible moves.
I wrote a very long and ugly function, available here: https://pastebin.com/tw9RdNgi (it was too long to include in this post, sorry).
Here is a code snippet:
if(jeu->plat[i][j] == 0) // if we're on a empty spot
{
    for(k = 0; k < lineSize; k++) // for each direction
    {
        //NORTH
        if(jeu->plat[i-1-k][j] == 1) // if there is a dot north
        {
            n++; // we count it
        }
        else
        {
            break; //we change direction
        }
    } //
This code is repeated 7 more times, once per direction, and if n (or the corresponding counter) reaches 4, x and y are recorded as a possible move.
It doesn't even handle all the cases: if the free spot lies between 2 dots on one side and 2 on the other, it is not counted. The same goes for 3 and 1, and 1 and 3.
I don't think the way I started is the best one. I'm pretty sure there is an easier and more efficient way, but I can't figure it out.
So my question is: could somebody help me figure out how to find all the possible 5-dot alignments, or tell me if there is a better way of doing it?
OK, the problem is more difficult than it appears, and a lot of code is required. Everything would have been simpler if you had posted all of the code necessary to run it, that is, a Minimal, Complete, and Verifiable Example. Anyway, I resorted to putting together a structure for the problem that makes it possible to test it.
The piece which answers your question is the following one:
typedef struct board {
    int side_;
    char **dots_;
} board;
void board_set_possible_moves(board *b)
{
    /* Directions
       0 1 2
       7   3
       6 5 4 */
    static int dr[8] = { -1,-1,-1, 0, 1, 1, 1, 0 };
    static int dc[8] = { -1, 0, 1, 1, 1, 0,-1,-1 };
    int side_ = b->side_;
    char **dots_ = b->dots_;
    for (int r = 0; r < side_; ++r) {
        for (int c = 0; c < side_; ++c) {
            // The place already has a dot
            if (dots_[r][c] == 1)
                continue;
            // Count up to 4 dots in the 8 directions from current position
            int ndots[8] = { 0 };
            for (int d = 0; d < 8; ++d) {
                for (int i = 1; i <= 4; ++i) {
                    int nr = r + dr[d] * i;
                    int nc = c + dc[d] * i;
                    if (nr < 0 || nc < 0 || nr >= side_ || nc >= side_ || dots_[nr][nc] != 1)
                        break;
                    ++ndots[d];
                }
            }
            // Decide if the position is a valid one
            for (int d = 0; d < 4; ++d) {
                if (ndots[d] + ndots[d + 4] >= 4)
                    dots_[r][c] = 2;
            }
        }
    }
}
Note that I defined a square board with a pointer to pointers to chars, one per place. If there is a 0 in one of the places, then there is no dot and the place is not a valid move; if there is a 1, then there is a dot; if there is a 2, then the place has no dot, but it is a valid move. Valid here means that there are at least 4 dots aligned with the current one.
You can model the directions with a number from 0 to 7 (start from NW, move clockwise). Each direction has an associated movement expressed as dr and dc. Moving in every direction I count how many dots are there (up to 4, and stopping as soon as I find a non dot), and later I can sum opposite directions to obtain the total number of aligned points.
Of course, these moves are not necessarily valid: the definition of the lines already drawn is missing, so we cannot check against them.
Here you can find a test for the function.
#include <stdio.h>
#include <stdlib.h>
board *board_init(board *b, int side) {
b->side_ = side;
b->dots_ = malloc(side * sizeof(char*));
b->dots_[0] = calloc(side*side, 1);
for (int r = 1; r < side; ++r) {
b->dots_[r] = b->dots_[r - 1] + side;
}
return b;
}
board *board_free(board *b) {
free(b->dots_[0]);
free(b->dots_);
return b;
}
void board_cross(board *b) {
board_init(b, 18);
for (int i = 0; i < 4; ++i) {
b->dots_[4][7 + i] = 1;
b->dots_[7][4 + i] = 1;
b->dots_[7][10 + i] = 1;
b->dots_[10][4 + i] = 1;
b->dots_[10][10 + i] = 1;
b->dots_[13][7 + i] = 1;
b->dots_[4 + i][7] = 1;
b->dots_[4 + i][10] = 1;
b->dots_[7 + i][4] = 1;
b->dots_[7 + i][13] = 1;
b->dots_[10 + i][7] = 1;
b->dots_[10 + i][10] = 1;
}
}
void board_print(const board *b, FILE *f)
{
int side_ = b->side_;
char **dots_ = b->dots_;
for (int r = 0; r < side_; ++r) {
for (int c = 0; c < side_; ++c) {
static char map[] = " oX";
fprintf(f, "%c%s", map[dots_[r][c]], c == side_ - 1 ? "" : " - ");
}
fprintf(f, "\n");
if (r < side_ - 1) {
for (int c = 0; c < side_; ++c) {
fprintf(f, "|%s", c == side_ - 1 ? "" : " ");
}
fprintf(f, "\n");
}
}
}
int main(void)
{
board b;
board_cross(&b);
board_set_possible_moves(&b);
board_print(&b, stdout);
board_free(&b);
return 0;
}

2D convolution with a kernel which is not center originated

I want to do a 2D convolution of an image with a Gaussian kernel that is not centred at the origin, given by the equation:
h(x-x', y-y') = exp(-((x-x')^2 + (y-y')^2) / (2*sigma))
Let's say the centre of the kernel is (1,1) instead of (0,0). How should I change the following code for generating the kernel and for the convolution?
int krowhalf = krow / 2, kcolhalf = kcol / 2;
int sigma = 1;
// sum is for normalization
float sum = 0.0;
// generate kernel
for (int x = -krowhalf; x <= krowhalf; x++)
{
    for (int y = -kcolhalf; y <= kcolhalf; y++)
    {
        r = sqrtl((x-1)*(x-1) + (y-1)*(y-1));
        gKernel[x + krowhalf][y + kcolhalf] = exp(-(r*r)/(2*sigma));
        sum += gKernel[x + krowhalf][y + kcolhalf];
    }
}
// normalize the kernel
for (int i = 0; i < krow; ++i)
    for (int j = 0; j < kcol; ++j)
        gKernel[i][j] /= sum;
float **convolve2D(float** in, float** out, int h, int v, float **kernel, int kCols, int kRows)
{
    int kCenterX = kCols / 2;
    int kCenterY = kRows / 2;
    int i,j,m,mm,n,nn,ii,jj;
    for(i=0; i < h; ++i) // rows
    {
        for(j=0; j < v; ++j) // columns
        {
            for(m=0; m < kRows; ++m) // kernel rows
            {
                mm = kRows - 1 - m; // row index of flipped kernel
                for(n=0; n < kCols; ++n) // kernel columns
                {
                    nn = kCols - 1 - n; // column index of flipped kernel
                    // index of input signal, used for checking boundary
                    ii = i + (m - kCenterY);
                    jj = j + (n - kCenterX);
                    // ignore input samples which are out of bound
                    if( ii >= 0 && ii < h && jj >= 0 && jj < v )
                        //out[i][j] += in[ii][jj] * (kernel[mm+nn*29]);
                        out[i][j] += in[ii][jj] * (kernel[mm][nn]);
                }
            }
        }
    }
    return out; // the function is declared to return float**
}
Since you're using the convolution operator, you have two choices:
1. Use its spatial invariance (shift invariance) property.
To do so, just filter the image with the regular, centred kernel (best done with either conv2 or imfilter) and then shift the result.
Mind the boundary condition you want to employ (see the imfilter properties).
2. Calculate the shifted result directly.
You can do this with loops as you suggested, or, more easily, create a non-symmetric kernel and still use imfilter or conv2.
Sample Code (MATLAB)
clear();
mInputImage = imread('3.png');
mInputImage = double(mInputImage) / 255;
mConvolutionKernel = zeros(3, 3);
mConvolutionKernel(2, 2) = 1;
mOutputImage01 = conv2(mConvolutionKernel, mInputImage);
mConvolutionKernelShifted = [mConvolutionKernel, zeros(3, 150)];
mOutputImage02 = conv2(mConvolutionKernelShifted, mInputImage);
figure();
imshow(mOutputImage01);
figure();
imshow(mOutputImage02);
The tricky part is knowing how to crop the second image to the same axes as the first.
Then you'll have a shifted image.
You can use any Kernel and any function which applies convolution.
Enjoy.
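For the loop-based route in C, one way to express a kernel that is not centre-originated is to pass the origin explicitly. The sketch below is an illustration only: convolve2d_origin, originRow and originCol are hypothetical names, the arrays are assumed to be flat and row-major, and samples outside the image are treated as zero. It assumes that "centre at (1,1)" means which kernel element is taken as the origin.

```c
#include <assert.h>

/* Hypothetical helper, not from the original post.
   out[i][j] = sum over (m, n) of
               kernel[m][n] * in[i - (m - originRow)][j - (n - originCol)]
   Input samples that fall outside the image are skipped (zero padding). */
static void convolve2d_origin(const float *in, float *out, int rows, int cols,
                              const float *kernel, int kRows, int kCols,
                              int originRow, int originCol)
{
    assert(originRow >= 0 && originRow < kRows);
    assert(originCol >= 0 && originCol < kCols);

    for (int i = 0; i < rows; ++i) {
        for (int j = 0; j < cols; ++j) {
            float acc = 0.0f;
            for (int m = 0; m < kRows; ++m) {
                for (int n = 0; n < kCols; ++n) {
                    /* offsets are measured from the chosen origin,
                       not from the geometric centre of the kernel */
                    int ii = i - (m - originRow);
                    int jj = j - (n - originCol);
                    if (ii >= 0 && ii < rows && jj >= 0 && jj < cols)
                        acc += in[ii * cols + jj] * kernel[m * kCols + n];
                }
            }
            out[i * cols + j] = acc;
        }
    }
}
```

With a 3x3 kernel, an origin of (1,1) gives the usual centred convolution, while an origin of (0,0) gives the same result shifted by one pixel down and to the right, which matches the "convolve, then shift" option described above.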

Using two Arrays in C/Gameboy programming

For a game in Gameboy programming, I am using four arrays called top, oldTop, bottom and oldBottom:
struct Point { int x, y; };
struct Rect { struct Point xx, yy; };
Rect top[size], oldTop[size];
Rect bottom[size], oldBottom[i];
where Rect is a struct made of two struct Points, the top-left and the bottom-right corner points.
The idea of the game is to have blocks of random height extending top-down from the ceiling and bottom-up from the floor.
It is similar to the classic copter game. In my infinite while loop, I shift all of the rectangles left by one pixel using the following code:
while (1)
{
for (int i = 0; i < size; i++)
{
//in Struct Rect, xx is the top-left corner point, and yy is the bottom right
top[i].xx.x--;
top[i].yy.x--;
bottom[i].xx.x--;
bottom[i].yy.x--;
if (top[i].xx.x < 0)
{
top[i].xx.x += 240;
top[i].yy.x += 240;
}
if (bottom[i].xx.x < 0)
{
bottom[i].xx.x += 240;
bottom[i].yy.x += 240;
}
}
for (int i = 0; i < size; i++)
{
drawRect(oldTop[i], colorBlack);
drawRect(oldBottom[i], colorBlack);
}
/*call delay function that wait for Vertical Blank*/
for(int i = 0; i < size; i++)
{
drawRect(top[i], colorGreen);
drawRect(bottom[i], colorGreen);
oldTop[i] = top[i];
oldBottom[i] = bottom[i];
}
}
The drawRect method uses DMA to draw the rectangle.
With this code, the display should look like this (mock-up drawn in Paint): [image]
But the result I get is: [image]
What is odd is that if I don't draw the bottom row at all, then the top row draws fine. The result only messes up when I draw both. This is really weird because I think that the code should be working fine, and the code is not very complicated. Is there a specific reason this is happening, and is there a way to remedy this?
Thanks.
The code that I use to draw the rectangle looks like this:
void drawRect(int row, int col, int width, int height){
    int i;
    for (i = 0; i < height; i++)
    {
        DMA[3].src = &color;
        DMA[3].dst = videoBuffer + (row + i) * 240 + col;
        DMA[3].cnt = DMA_ON | DMA_FIXED_SOURCE | width;
    }
}
Here's a debugging SSCCE (Short, Self-Contained, Correct Example) based on your code. There are assertions in this code that fire; it runs, but is known not to be correct. I've renamed bottom to btm and oldBottom to oldBtm so that the names are symmetric; it makes the code layout more systematic (but is otherwise immaterial).
#include <assert.h>
#include <stdio.h>
typedef struct Point { int x, y; } Point;
typedef struct Rect { struct Point xx, yy; } Rect;
enum { size = 2 };
typedef enum { colourGreen = 0, colourBlack = 1 } Colour;
/*ARGSUSED*/
static void drawRect(Rect r, Colour c)
{
printf(" (%3d)(%3d)", r.xx.x, r.yy.x);
}
int main(void)
{
Rect top[size], oldTop[size];
Rect btm[size], oldBtm[size];
int counter = 0;
for (int i = 0; i < size; i++)
{
top[i].xx.x = 240 - 4 * i;
top[i].xx.y = 0 + 10 + i;
top[i].yy.x = 240 - 14 * i;
top[i].yy.y = 0 + 20 + i;
btm[i].xx.x = 0 + 72 * i;
btm[i].xx.y = 0 + 10 * i;
btm[i].yy.x = 0 + 12 * i;
btm[i].yy.y = 0 + 20 * i;
oldTop[i] = top[i];
oldBtm[i] = btm[i];
}
while (1)
{
if (counter++ > 480) // Limit amount of output!
break;
for (int i = 0; i < size; i++)
{
//in Struct Rect, xx is the top-left corner point, and yy is the bottom right
top[i].xx.x--;
top[i].yy.x--;
btm[i].xx.x--;
btm[i].yy.x--;
if (top[i].xx.x < 0)
{
top[i].xx.x += 240;
top[i].yy.x += 240;
}
if (btm[i].xx.x < 0)
{
btm[i].xx.x += 240;
btm[i].yy.x += 240;
}
}
for (int i = 0; i < size; i++)
{
assert(top[i].xx.x >= 0 && top[i].yy.x >= 0);
assert(btm[i].xx.x >= 0 && btm[i].yy.x >= 0);
}
for (int i = 0; i < size; i++)
{
drawRect(oldTop[i], colourBlack);
drawRect(oldBtm[i], colourBlack);
}
/*call delay function that wait for Vertical Blank*/
for(int i = 0; i < size; i++)
{
drawRect(top[i], colourGreen);
drawRect(btm[i], colourGreen);
oldTop[i] = top[i];
oldBtm[i] = btm[i];
}
putchar('\n');
}
return(0);
}
As noted in a late comment, one big difference between this and your code is that oldBottom in your code is declared as:
Rect top[size], oldTop[size];
Rect bottom[size], oldBottom[i];
using the size i instead of size. This probably accounts for array overwriting issues you see.
There's a second problem though; the assertions in the loop in the middle fire:
(240)(240) ( 0)( 0) (236)(226) ( 72)( 12) (239)(239) (239)(239) (235)(225) ( 71)( 11)
(239)(239) (239)(239) (235)(225) ( 71)( 11) (238)(238) (238)(238) (234)(224) ( 70)( 10)
(238)(238) (238)(238) (234)(224) ( 70)( 10) (237)(237) (237)(237) (233)(223) ( 69)( 9)
(237)(237) (237)(237) (233)(223) ( 69)( 9) (236)(236) (236)(236) (232)(222) ( 68)( 8)
(236)(236) (236)(236) (232)(222) ( 68)( 8) (235)(235) (235)(235) (231)(221) ( 67)( 7)
(235)(235) (235)(235) (231)(221) ( 67)( 7) (234)(234) (234)(234) (230)(220) ( 66)( 6)
(234)(234) (234)(234) (230)(220) ( 66)( 6) (233)(233) (233)(233) (229)(219) ( 65)( 5)
(233)(233) (233)(233) (229)(219) ( 65)( 5) (232)(232) (232)(232) (228)(218) ( 64)( 4)
(232)(232) (232)(232) (228)(218) ( 64)( 4) (231)(231) (231)(231) (227)(217) ( 63)( 3)
(231)(231) (231)(231) (227)(217) ( 63)( 3) (230)(230) (230)(230) (226)(216) ( 62)( 2)
(230)(230) (230)(230) (226)(216) ( 62)( 2) (229)(229) (229)(229) (225)(215) ( 61)( 1)
(229)(229) (229)(229) (225)(215) ( 61)( 1) (228)(228) (228)(228) (224)(214) ( 60)( 0)
Assertion failed: (btm[i].xx.x >= 0 && btm[i].yy.x >= 0), function main, file video.c, line 63.
I think your 'not negative' checks should be revised to:
if (top[i].xx.x < 0)
top[i].xx.x += 240;
if (top[i].yy.x < 0)
top[i].yy.x += 240;
if (btm[i].xx.x < 0)
btm[i].xx.x += 240;
if (btm[i].yy.x < 0)
btm[i].yy.x += 240;
This stops anything going negative. However, it is perfectly plausible that you should simply be checking on the bottom-right x-coordinate (instead of the top-left coordinate) using the original block. Or the wraparound may need to be more complex altogether. That's for you to decipher. But I think that the odd displays occur because you were providing negative values where you didn't intend to and weren't supposed to.
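As an illustration of the alternative mentioned above (checking the bottom-right x-coordinate and wrapping the block as a whole), here is a minimal sketch. scroll_left and SCREEN_WIDTH are hypothetical names, and the sketch assumes that a rectangle should wrap only once it has left the screen completely. Note that the top-left x can still be negative while the block is partly off-screen, so the drawing code would have to clip in that case.

```c
#include <assert.h>

typedef struct Point { int x, y; } Point;
typedef struct Rect { Point xx, yy; } Rect; /* xx = top-left, yy = bottom-right */

enum { SCREEN_WIDTH = 240 };

/* Hypothetical helper: move a rectangle one pixel to the left. Once it is
   completely off the left edge (bottom-right x < 0), move both corners by
   the screen width, so the rectangle keeps its width. */
static void scroll_left(Rect *r)
{
    assert(r != 0);
    r->xx.x--;
    r->yy.x--;
    if (r->yy.x < 0) {
        r->xx.x += SCREEN_WIDTH;
        r->yy.x += SCREEN_WIDTH;
    }
}
```

Moving both corners together keeps yy.x - xx.x constant, which the original pair of wraparound checks keyed only on xx.x did not guarantee to keep non-negative.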
The key points to note here are:
When you're debugging an algorithm, you don't have to use the normal display mechanisms.
When you're debugging, reduce loop sizes where you can (size == 2).
Printing just the relevant information (here, the x-coordinates) helped reduce the output.
Putting the counter code to limit the amount of output simplifies things.
If things are going wrong, look for patterns in what is going wrong early.
I had various versions of the drawRect() function before I got to the design shown, which works well on a wide screen (eg 120x65) terminal window.

Finding largest square submatrix of ones in a given square matrix of 0's and 1's?

The following was given as an interview question:
Write a function that outputs the size of the largest square submatrix consisting solely of ones in a square matrix of ones and zeros.
Example 1:
0 1
0 0
Output: 1
Example 2:
0 0 0
0 1 1
0 1 1
Output: 2
Example 3:
1 1 1
1 1 1
1 1 1
Output: 3
I was hoping for an efficient solution to this problem if at all possible.
Use Search and then Dynamic Programming.
First idea of implementation:
Start search on row r=1.
Find longest sequence of ones in that row, and assign this length to x.
Try to find a square matrix of ones with side=x starting at row r. If successful, max=x. If not, decrease x and repeat this step if x>1. If nothing found, max could be 0 or 1.
Increase r, and repeat.
Then improve your algorithm (stop if remaining rows are less than current max, and so on).
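Here is an implementation in C# using dynamic programming. It visits every cell once, so it is linear in the number of cells, that is, O(n^2) for an n-by-n matrix. Basically, while reading every cell of the matrix you build a second matrix whose entry at (i, j) is the side of the biggest all-ones square whose bottom-right corner is that cell (including the cell itself).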
public static int LargestSquareMatrixOfOne(int[,] original_mat)
{
    int rows = original_mat.GetLength(0);
    int cols = original_mat.GetLength(1);
    int[,] AccumulatedMatrix = new int[rows, cols];
    int biggestSize = 0; // stays 0 if the matrix contains no ones
    for (int i = 0; i < rows; i++)
    {
        for (int j = 0; j < cols; j++)
        {
            if (original_mat[i, j] != 1)
            {
                AccumulatedMatrix[i, j] = 0;
            }
            else if (i == 0 || j == 0)
            {
                // first row or column: a one can only form a 1x1 square
                AccumulatedMatrix[i, j] = 1;
            }
            else
            {
                // limited by the squares ending above, to the left, and diagonally
                AccumulatedMatrix[i, j] = Math.Min(AccumulatedMatrix[i - 1, j - 1],
                    Math.Min(AccumulatedMatrix[i - 1, j], AccumulatedMatrix[i, j - 1])) + 1;
            }
            if (AccumulatedMatrix[i, j] > biggestSize)
            {
                biggestSize = AccumulatedMatrix[i, j];
            }
        }
    }
    return biggestSize;
}
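The same recurrence only ever looks at the current and the previous row, so the full second matrix is not strictly needed. A minimal C sketch of that variant follows. largest_square_of_ones is a hypothetical name, the matrix is assumed to be a flat row-major n x n array of 0s and 1s, and the -1 return value on allocation failure is a choice made for this sketch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: side length of the largest all-ones square in an
   n x n matrix of 0/1 values (flat, row-major). Keeps two rows of the
   dynamic-programming table instead of the whole table.
   Returns -1 if memory cannot be allocated. */
static int largest_square_of_ones(const int *mat, int n)
{
    assert(n >= 0);
    if (n == 0)
        return 0;

    int *prev = calloc((size_t)n, sizeof *prev);
    int *curr = calloc((size_t)n, sizeof *curr);
    if (!prev || !curr) {
        free(prev);
        free(curr);
        return -1;
    }

    int best = 0;
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            if (mat[i * n + j] != 1) {
                curr[j] = 0;
            } else if (i == 0 || j == 0) {
                curr[j] = 1; /* border cell: at most a 1x1 square */
            } else {
                /* smallest of the diagonal, upper and left neighbours, plus one */
                int m = prev[j - 1];
                if (prev[j] < m) m = prev[j];
                if (curr[j - 1] < m) m = curr[j - 1];
                curr[j] = m + 1;
            }
            if (curr[j] > best)
                best = curr[j];
        }
        /* the row just filled becomes the previous row */
        int *tmp = prev;
        prev = curr;
        curr = tmp;
    }

    free(prev);
    free(curr);
    return best;
}
```

This keeps the running time proportional to the number of cells while using memory proportional to one row.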
