Loop interchange versus loop tiling - C

Which of these optimizations is better and in what situation? Why?
Intuitively, I am getting the feeling that loop tiling will in general
be a better optimization.
What about for the below example?
Assume a cache which can only store about 20 elements in it at any time.
Original Loop:
for(int i = 0; i < 10; i++)
{
for(int j = 0; j < 1000; j++)
{
a[i] += a[i]*b[j];
}
}
Loop Interchange:
for(int i = 0; i < 1000; i++)
{
for(int j = 0; j < 10; j++)
{
a[j] += a[j]*b[i];
}
}
Loop Tiling:
for(int k = 0; k < 1000; k += 20)
{
for(int i = 0; i < 10; i++)
{
for(int j = k; j < (k + 20 < 1000 ? k + 20 : 1000); j++) /* standard C has no min() */
{
a[i] += a[i]*b[j];
}
}
}

The first two versions in your question both walk through memory sequentially, so they behave about the same. Things would really change in the following two cases:
CASE 1:
for(int i = 0; i < 10; i++)
{
for(int j = 0; j < 1000; j++)
{
b[i] += a[j][i];
}
}
Here you are accessing the matrix "a" as follows: a[0][0], a[1][0], a[2][0], .... In C, a 2D array is stored in memory row by row: a[0][0], a[0][1], a[0][2], ... (first column of the first row, followed by the second column of the first row, and so on). Imagine your cache could only store 5 elements and your matrix is 6x6. The first access brings the "pack" of elements a[0][0] to a[0][4] into the cache, but the second access misses: a[1][0] is not in the cache! The hardware then brings the pack a[1][0] to a[1][4] into the cache. Then you make the third access, a[2][0], which is out of the cache again. Another cache miss!
As you can conclude, case 1 is not a good solution to the problem. It causes lots of cache misses, which we can avoid by interchanging the loops:
CASE 2:
for(int j = 0; j < 1000; j++)
{
for(int i = 0; i < 10; i++)
{
b[i] += a[j][i];
}
}
Here, as you can see, we access the matrix in the order in which it is stored in memory. Consequently, it is much better (faster) than case 1.
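About the third version you posted: loop tiling, like loop unrolling, is an optimization that compilers can sometimes apply automatically. Most optimizing compilers unroll loops in at least some cases, whereas automatic tiling usually requires a dedicated loop-nest optimizer, so it is worth checking what your compiler actually does. Here's a very interesting post on Stack Overflow explaining these two techniques.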
Hope it helps! (Sorry about my English, I'm not a native speaker.)

Related

Nested loop when inner loop index unequal outer loop index

Nested loop logic to skip the inner loop iteration when its index equals the outer loop index.
Used an if statement within the inner loop to the effect of:
for (i=0;i<N;i++)
for (j=0;j<N;j++)
if (j!=i)
... some code
I believe this gives me the expected results but is there a less CPU consuming method that I may not be aware of?
Since i < N always holds inside the outer loop, the index j == i always falls within the inner loop's range, so you can split the inner loop into 2 separate for loops to reduce the number of tests:
for (i = 0; i < N; i++) {
for (j = 0; j < i; j++) {
... some code
}
/* here we have j == i, skip this one */
j++;
for (; j < N; j++) {
... same code
}
}
This results in more code but half as many tests on j. Note however that if N is a constant, the compiler might unroll the original inner loop more efficiently. Careful benchmarking is the only way to determine if this solution is worth the effort for your problem, compiler and architecture.
For completeness, this code can be simplified as:
for (i = 0; i < N; i++) {
for (j = 0; j < i; j++) {
... some code
}
/* here we have j == i, skip this one */
while (++j < N) {
... same code
}
}

C 2D Arrays removing certain predetermined rows by shifting the ones below it

This is my first post here.
An assignment in my online C course asked me to remove each row of a real (not dynamically allocated; pointers are not used) matrix whose average is greater than the average of the whole matrix. The rows should be "removed" by shifting the ones below them up by one position.
I have set up a matrix with the following code:
int matrix[100][100];
Now, my idea was to create a regular 1D array which stores the indexes of the rows to-be-removed.
This is how I did it:
k = 0;
for (i = 0; i < no_of_rows; i++) {
average_sum_of_row = 0;
for (j = 0; j < no_of_columns; j++) {
average_sum_of_row += matrix[i][j];
}
average_sum_of_row = average_sum_of_row / no_of_columns;
if (average_sum_of_row > average_sum_of_matrix) {
indexes_of_rows_to_remove[k] = i;
k++;
l++;
}
}
Which works just fine! I get an array whose elements are the indexes of the rows which need to be removed. However, when I use it in the following code:
m = 0;
for (i = 0; i < V; i++) {
if (indexes_of_rows_to_remove[m] == i) {
for (k = i; k < no_of_rows - 1; k++) {
for (j = 0; j < no_of_columns; j++) {
matrix[k][j] = matrix[k + 1][j];
}
}
i--;
no_of_rows--;
}
m++;
}
It does not work. What I used is my existing code of removing a row by shifting the ones below it up and decreasing the number of rows by one, but this simply doesn't work and I don't know why.
I tried using a separate integer (m) to go through all elements of the array of indexes, but for some reason it does not work.
Thanks all!
You can use this algorithm, which skips the rows to be deleted:
k = 0
For i in number of rows:
If i not to be deleted:
matrix[k] = matrix[i] # copy the whole row here
k++
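The algorithm you are trying to implement is complicated and very inefficient: every removal shifts all the rows below it, so the same rows can be moved many times. It is also where the bug comes from: m advances on every iteration instead of only after a match, and after each shift the remaining rows have moved up one position, so the saved indexes no longer refer to the same rows.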

Does calculation or condition checking yield better performance?

This question contains a JavaScript example but it could possibly be relevant for other languages as well.
I have a 2D binary array (values are set to 1 and 0 only). I want to perform an action that toggles all values, meaning turn every 0 into 1 and every 1 into 0.
Which is a better way to do it:
1)
for(var i = 0; i < rowsNum; i++)
{
for(var j = 0; j < colNum; j++)
{
if(arr[i][j] == 0)
{
arr[i][j] = 1;
}
else
{
arr[i][j] = 0;
}
}
}
or
2)
for(var i = 0; i < rowsNum; i++)
{
for(var j = 0; j < colNum; j++)
{
arr[i][j] = 1 - arr[i][j];
}
}
I would like to know if there's a generic method which is best for most cases. Also, specifically regarding JS, is there a better way to do it than these 2 methods?
I would go for the second way of doing it, or I would use the xor operation, like this:
for (var i = 0; i < rows; i++) {
for (var j = 0; j < cols; j++) {
arr[i][j] ^= 1;
}
}
The thing is, if statements usually translate into branch instructions, which can be slow due to branch mispredictions. However, the performance gain in an example like this will barely show, and if it makes the code less readable, then it's not worth it. Always optimize last, and only if it's absolutely necessary.

Was told my C program was "hard coded" and I don't understand why

I turned in my assignment for my online C programming class and was docked heavily because my program was "hard coded", and I can't see how it could be considered "hard coded" since I ask for user input. The following was my code:
#include <stdio.h>
#include <stdlib.h>
#define IMAX 3
#define JMAX 4
int main()
{
float a[IMAX][JMAX];
float avgrow[5];
float avgcol[5];
int i,j;
char c;
printf ("This program will allow you to enter numbers for 3 rows and 4 columns from left to right then filling down, and take the averages of the rows and columns and list them next to the row and under the columns. You may use decimals but only 2 will display in the results. Press enter!\n");
scanf ("%c",&c);
printf("Enter 12 numbers here for your rows and columns:\n");
for(i = 0; i < IMAX; i++)
{
for(j = 0; j < JMAX; j++)
{
scanf("%f",&a[i][j]);
}
}
for(j = 0; j < JMAX; j++)
{
avgrow[0] = (a[0][0]+a[0][1]+a[0][2]+a[0][3])/JMAX;
avgrow[1] = (a[1][0]+a[1][1]+a[1][2]+a[1][3])/JMAX;
avgrow[2] = (a[2][0]+a[2][1]+a[2][2]+a[2][3])/JMAX;
}
for(i=0; i < IMAX; i++)
{
avgcol[0] = (a[0][0]+a[1][0]+a[2][0])/IMAX;
avgcol[1] = (a[0][1]+a[1][1]+a[2][1])/IMAX;
avgcol[2] = (a[0][2]+a[1][2]+a[2][2])/IMAX;
avgcol[3] = (a[0][3]+a[1][3]+a[2][3])/IMAX;
}
printf(" Column1 Column2 Column3 Column4 Row Average\n\n");
printf("%8.2f\t%8.2f\t%8.2f\t%8.2f\t%8.2f\n",a[0][0],a[0][1],a[0][2],a[0][3],avgrow[0]);
printf("%8.2f\t%8.2f\t%8.2f\t%8.2f\t%8.2f\n",a[1][0],a[1][1],a[1][2],a[1][3],avgrow[1]);
printf("%8.2f\t%8.2f\t%8.2f\t%8.2f\t%8.2f\n",a[2][0],a[2][1],a[2][2],a[2][3],avgrow[2]);
printf("\n");
printf("%8.2f\t%8.2f\t%8.2f\t%8.2f\t\n",avgcol[0],avgcol[1],avgcol[2],avgcol[3]);
return 0;
}
All it was supposed to do was make a 2-D array with 3 rows and 4 columns, then take the average of each row and display it next to the row in a table, then take the average of each column and display it beneath the column.
This was his comments on my assignment "Well, you got the correct answers, but when dealing with a 2-D array, you should use nested for loops. Not one for loop and then a lot of "hard coding" values into the program."
Any help deciphering this would be appreciated, as I thought I was finally understanding programming until this.
First of all it's not meaningful to talk about a program being hard coded or not. Rather one would talk about specific values being hard coded. What this means is that you wrote their values directly into the code rather than putting them in a constant or variable that can easily be changed.
In this case the values you hard-coded are the number of rows and the number of columns. You do have constants for these, but you don't use them consistently. That is, if you changed your constants to turn the array into a 5x5 array, your code would break, because parts of the code would still act as if it were a 3x4 array.
Specifically there are two loops in your code where you're accessing the indices [0][0] through [2][3] by spelling out each index in that range specifically rather than using a loop. This means that if you change IMAX and JMAX, it will still use those same indices, which aren't correct any more.
So your array indices are hard-coded and changing the array dimensions breaks your program.
for(j = 0; j < JMAX; j++)
{
avgrow[0] = (a[0][0]+a[0][1]+a[0][2]+a[0][3])/JMAX;
avgrow[1] = (a[1][0]+a[1][1]+a[1][2]+a[1][3])/JMAX;
avgrow[2] = (a[2][0]+a[2][1]+a[2][2]+a[2][3])/JMAX;
}
for(i=0; i < IMAX; i++)
{
avgcol[0] = (a[0][0]+a[1][0]+a[2][0])/IMAX;
avgcol[1] = (a[0][1]+a[1][1]+a[2][1])/IMAX;
avgcol[2] = (a[0][2]+a[1][2]+a[2][2])/IMAX;
avgcol[3] = (a[0][3]+a[1][3]+a[2][3])/IMAX;
}
Notice the copying/pasting of nearly identical code? That's often a sign of hardcoding -- the presence of constants in the code's text or structure. How do you change the 3 and the 4? They're "hard" -- built into the code.
The proof it's a problem -- you have:
#define IMAX 3
#define JMAX 4
But if you actually change those, the code will break. Look at this line of code:
avgrow[0] = (a[0][0]+a[0][1]+a[0][2]+a[0][3])/JMAX;
That's an average if, and only if, JMAX is 4. The code was built with the understanding that JMAX had to be 4 -- JMAX was "hard coded" to 4.
Looking at the following code:
for(j = 0; j < JMAX; j++)
{
avgrow[0] = (a[0][0]+a[0][1]+a[0][2]+a[0][3])/JMAX;
avgrow[1] = (a[1][0]+a[1][1]+a[1][2]+a[1][3])/JMAX;
avgrow[2] = (a[2][0]+a[2][1]+a[2][2]+a[2][3])/JMAX;
}
This code assumes that a always has 3 rows and 4 columns, regardless of how a was actually declared. If you changed JMAX to 2, for example, then your code above would break because a would have dimension 3x2, and you'd be attempting to access elements outside of the array bounds.
What your instructor was looking for was something along these lines:
for(i = 0; i < IMAX; i++)
{
float sum = 0.0;
for (j = 0; j < JMAX; j++)
{
sum += a[i][j];
}
avgrow[i] = sum/JMAX; /* average of row i */
}
This code makes no assumptions about the dimensions of a beyond what is specified by IMAX and JMAX.
Note also that your declarations for avgrow and avgcol are hard-coded to 5, when they should also be based on IMAX and JMAX:
float avgrow[IMAX];
float avgcol[JMAX];
float avgrow[5];
float avgcol[5];
for(j = 0; j < JMAX; j++)
{
avgrow[0] = (a[0][0]+a[0][1]+a[0][2]+a[0][3])/JMAX;
avgrow[1] = (a[1][0]+a[1][1]+a[1][2]+a[1][3])/JMAX;
avgrow[2] = (a[2][0]+a[2][1]+a[2][2]+a[2][3])/JMAX;
}
for(i=0; i < IMAX; i++)
{
avgcol[0] = (a[0][0]+a[1][0]+a[2][0])/IMAX;
avgcol[1] = (a[0][1]+a[1][1]+a[2][1])/IMAX;
avgcol[2] = (a[0][2]+a[1][2]+a[2][2])/IMAX;
avgcol[3] = (a[0][3]+a[1][3]+a[2][3])/IMAX;
}
printf("%8.2f\t%8.2f\t%8.2f\t%8.2f\t%8.2f\n",a[0][0],a[0][1],a[0][2],a[0][3],avgrow[0]);
printf("%8.2f\t%8.2f\t%8.2f\t%8.2f\t%8.2f\n",a[1][0],a[1][1],a[1][2],a[1][3],avgrow[1]);
printf("%8.2f\t%8.2f\t%8.2f\t%8.2f\t%8.2f\n",a[2][0],a[2][1],a[2][2],a[2][3],avgrow[2]);
printf("\n");
printf("%8.2f\t%8.2f\t%8.2f\t%8.2f\t\n",avgcol[0],avgcol[1],avgcol[2],avgcol[3]);
All of these steps are hard coding, because you explicitly spell out the indices. If IMAX and JMAX change, you have to manually change or add avgcol[index] terms, both when taking the averages and in the output.
Your program should not depend on specific indices; it should depend only on the values of IMAX and JMAX.
For reference, you can view a version of your code in which I have removed the hard coding:
http://ideone.com/mXymKS
This code could still be simplified a great deal, though.
You have used explicit integer values for the array indices of avgrow and avgcol. To avoid hard-coding, use a loop with an integer variable as the index, e.g.,
for(int k=0; k<IMAX; ++k)
and then assign values to avgrow[k] (and loop up to JMAX in the same way for avgcol).
Your professor expected you to be able to modify the number of rows and columns in your program easily. The disadvantage of the program you wrote is that changing either of those two parameters requires you to change the whole program. You can achieve more flexibility, for instance, like this:
#include <stdio.h>
#include <stdlib.h>
#define IMAX 3
#define JMAX 4
int main()
{
float a[IMAX][JMAX];
float avgrow[IMAX] = {0};
float avgcol[JMAX] = {0};
printf ("This program will allow you to enter numbers for %d rows"
" and %d columns from left to right then filling down, and"
" take the averages of the rows and columns and list them"
" next to the row and under the columns. You may use "
"decimals but only 2 will display in the results. Press"
" enter!\n", IMAX, JMAX);
char c;
scanf ("%c",&c);
printf("Enter %d numbers here for your rows and columns:\n", IMAX * JMAX);
for(int i = 0; i < IMAX; i++) {
for(int j = 0; j < JMAX; j++) {
scanf("%f",&a[i][j]);
}
}
for(int i = 0; i < IMAX; i++) {
for(int j = 0; j < JMAX; j++) {
avgrow[i] += a[i][j];
}
avgrow[i] /= JMAX;
}
for(int j = 0; j < JMAX; j++) {
for(int i = 0; i < IMAX; i++) {
avgcol[j] += a[i][j];
}
avgcol[j] /= IMAX;
}
for(int j = 0; j < JMAX; j++) { /* one header label per column */
printf("Column%d\t", j + 1);
}
printf("Row-Average\n\n");
for(int i = 0; i < IMAX; i++) { /* one output line per row */
for (int j = 0; j < JMAX; j++) {
printf("%8.2f\t", a[i][j]);
}
printf("%8.2f\n", avgrow[i]);
}
for(int j = 0; j < JMAX; j++) { /* column averages under the columns */
printf("%8.2f\t", avgcol[j]);
}
printf("\n");
return 0;
}
In addition to the rich answers that already exist, I'd like to point out a way to acquire the data and compute the averages without much repetition:
So define your average arrays like this:
float avgrow[IMAX] ={0};
float avgcol[JMAX] ={0};
Then, in the same loop where you read the user's entries with scanf, you can compute the averages at the same time:
printf("Enter %d numbers here for your rows and columns:\n", IMAX*JMAX);
for(i = 0; i < IMAX; i++)
{
for(j = 0; j < JMAX; j++)
{
scanf("%f",&a[i][j]);
avgrow[i] += a[i][j]/JMAX;
avgcol[j] += a[i][j]/IMAX;
}
}
The next step is just to print everything out, and let that be automated too :)
for(i=1; i<= JMAX; i++) printf("Column%d\t\t", i);
printf("Row Average\n");
for(i=0; i<IMAX; i++)
{
for(j=0; j<JMAX; j++)
{
printf("%8.2f\t", a[i][j]);
}
printf("%8.2f\n", avgrow[i]);
}
for(i=0; i<JMAX; i++)
printf("%8.2f\t", avgcol[i]);
In the end, you have code that computes row and column averages for any size; try changing IMAX or JMAX.

What is the best way to loop through a 2D sub-array of a 2D array?

If I have a 2D array, it is trivial to loop through the entire array, a row or a column by using for loops. However, occasionally, I need to traverse an arbitrary 2D sub-array.
A great example would be sudoku in which I might store an entire grid in a 2D array but then need to analyse each individual block of 9 squares. Currently, I would do something like the following:
for(i = 0; i < 9; i += 3) {
for(j = 0; j < 9; j += 3) {
for(k = 0; k < 3; k++) {
for(m = 0; m < 3; m++) {
block[m][k] = grid[j + m][i + k];
}
}
//At this point in each iteration of i/j we will have a 2D array in block
//which we can then iterate over using more for loops.
}
}
Is there a better way to iterate over arbitrary sub-arrays especially when they occur in a regular pattern such as above?
For large arrays, the performance of this loop structure would be poor. Consider the innermost loop:
for(m = 0; m < 3; m++) {
block[m][k] = grid[j + m][i + k];
}
C is "row-major" ordered, which means that consecutive iterations of this loop access block one full row apart rather than at adjacent addresses. For arrays too large to fit in the cache, that pattern can cause a cache miss on nearly every iteration. (For a 9x9 grid, everything fits in the cache, so in practice the effect is negligible.)
There's a similar issue for grid. Your nested loop order fixes i before varying j, yet you use j to index the rows of grid. This again is not contiguous, and for large arrays it can miss the cache on nearly every iteration.
So a rule of thumb when dealing with nested loops and multidimensional arrays is to place the loop indices and array indices in the same order. For your code, that's
for(j = 0; j < 9; j += 3) {
for(m = 0; m < 3; m++) {
for(i = 0; i < 9; i += 3) {
for(k = 0; k < 3; k++) {
block[m][k] = grid[j + m][i + k];
}
}
// make sure you access everything so that order doesn't change
// your program's semantics
}
}
Well, in the case of sudoku, couldn't you just store nine 3x3 arrays? Then you don't need to bother with sub-arrays. If you start moving to much larger grids than sudoku, you would improve cache performance this way as well.
Ignoring that, your code above works fine.
Imagine you have a 2D array a[n][m]. To loop over a q x r sub-array whose upper-left corner is at position (x, y), use:
for(int i = x; i < n && i < x + q; ++i)
for(int j = y; j < m && j < y + r; ++j)
{
///
}
For your sudoku example, you could do this:
for(int i = 0; i < 3; ++i)
for(int j = 0; j < 3; ++j)
for(int locali = 0; locali < 3; ++locali)
for(int localj = 0; localj < 3; ++localj)
{
// the (locali, localj) element of the bigger (i, j) 3x3 square is
// a[3*i + locali][3*j + localj]
}
