Double sum optimization - arrays

Recently I got this question in one of my interviews, which I unfortunately skipped, but I'm very curious to get the answer. Can you help me?
int sum = 0;
int num = 100000000;
for (int i = 0; i < num; i++){
for (int j = 0; j < num; j++ ){
sum += m_DataX[i] * m_DataX[j];
}
}
EDITED: Also I would like to see if it is possible to optimize if we have the following expression for sum:
sum += m_DataX[i] * m_DataY[j];

Simply put, the result is the square of the sum of the numbers.
Why?
Suppose the array is |1|2|3|.
Then the code produces
1*1 + 1*2 + 1*3
2*1 + 2*2 + 2*3
3*1 + 3*2 + 3*3
That is,
(1*1 + 1*2 + 1*3) + (2*1 + 2*2 + 2*3) + (3*1 + 3*2 + 3*3)
=>1(1+2+3) + 2(1+2+3) + 3(1+2+3)
=>(1+2+3)*(1+2+3)
Therefore, the code will be
int tempSum = 0;
for (int i = 0; i < num ; i ++){
tempSum+=m_DataX [i];
}
sum=tempSum*tempSum;
Update:
What if we have sum += m_DataX[i]*m_DataY[j]?
Suppose the two arrays are |1|2|3| and |4|5|6|.
Then the code produces
1*4 + 1*5 + 1*6
2*4 + 2*5 + 2*6
3*4 + 3*5 + 3*6
=> 1*4 + 2*4 + 3*4 + 1*5 + 2*5 + 3*5 + 1*6 + 2*6 + 3*6
=> (1+2+3)*(4+5+6)
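To check the factoring above, here is a minimal sketch that computes the double sum both ways. The function names are made up for illustration, and using long long for the accumulators is my assumption: with 100000000 elements an int sum can easily overflow. Passing the same array for both arguments gives the first case (square of the sum).

```c
#include <assert.h>
#include <stddef.h>

/* Brute-force O(n^2) reference: sum of x[i] * y[j] over all pairs (i, j). */
long long pair_sum_naive(const int *x, const int *y, size_t n)
{
    long long sum = 0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            sum += (long long)x[i] * y[j];
    return sum;
}

/* O(n) version: (sum of x) * (sum of y). Pass x twice for the squared case. */
long long pair_sum_fast(const int *x, const int *y, size_t n)
{
    assert(x != NULL && y != NULL);
    long long sx = 0, sy = 0;
    for (size_t i = 0; i < n; i++) {
        sx += x[i];
        sy += y[i];
    }
    return sx * sy;
}
```

For |1|2|3| and |4|5|6| both functions return 90, and for |1|2|3| with itself both return 36.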

First, declare i and j outside the for loops. Then sum all the elements and square that sum; that will be your result.

For the second case:
int tempSumX = 0;
int tempSumY = 0;
for (int i = 0; i < num; i++) {
tempSumX += m_DataX[i];
tempSumY += m_DataY[i];
}
sum = tempSumX * tempSumY;

Related

Remove trailing '+' sign in this output resulting from double for loops in C

In the code attached, how do I modify it to remove the trailing '+' signs?
#include <stdio.h>
#include <stdlib.h>
int main(void) {
int i,j,sum;
sum=1;
for(i=2; i<=10; i++) {
for(j=1; j<(i+1); j++) {
sum = sum + 1;
printf("%d + ",j);
}
printf(" = %d", sum);
printf("\n");
}
return EXIT_SUCCESS;
}
Here is the output:
1 + 2 + = 3
1 + 2 + 3 + = 6
1 + 2 + 3 + 4 + = 10
1 + 2 + 3 + 4 + 5 + = 15
1 + 2 + 3 + 4 + 5 + 6 + = 21
1 + 2 + 3 + 4 + 5 + 6 + 7 + = 28
1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + = 36
1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + = 45
1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + = 55
For example, you can do it the following way:
for(j=1; j<(i+1); j++) {
sum = sum + 1;
if ( j != 1 ) printf( " + " );
printf("%d",j);
}
You can't 'remove' output; you have to avoid generating it.
One way is to use:
for (int i = 2; i <= 10; i++)
{
int sum = 0;
const char *pad = "";
for (int j = 1; j <= i; j++)
{
sum += j;
printf("%s%d", pad, j);
pad = " + ";
}
printf(" = %d\n", sum);
}
Note that this recalculates sum more directly, setting it to zero before the inner loop. It also minimizes the scope of the variables.
You can set and print the initial value in the outer loop. In my opinion this also makes the code more readable.
Furthermore, you can use j as the addend instead of adding 1 to sum:
for (int i = 2; i <= 10; i++) {
int sum = 1;
printf("%d", sum);
for (int j = 2; j<(i + 1); j++) {
sum += j;
printf(" + %d", j);
}
printf(" = %d", sum);
printf("\n");
}

Loop unrolling doesn't work with remaining elements

I have a typical algorithm for matrix multiplication. I am trying to apply and understand loop unrolling, but I run into a problem when I unroll k times and the matrix size isn't a multiple of k (I get very large numbers as a result). That means I don't understand how to handle the remaining elements after unrolling. Here is what I have:
void Mult_Matx(unsigned long* a, unsigned long* b, unsigned long*c, long n)
{
long i = 0, j = 0, k = 0;
unsigned long sum, sum1, sum2, sum3, sum4, sum5, sum6, sum7;
for (i = 0; i < n; i++)
{
long in = i * n;
for (j = 0; j < n; j++)
{
sum = sum1 = sum2 = sum3 = sum4 = sum5 = sum6 = sum7 = 0;
for (k = 0; k < n; k += 8)
{
sum = sum + a[in + k] * b[k * n + j];
sum1 = sum1 + a[in + (k + 1)] * b[(k + 1) * n + j];
sum2 = sum2 + a[in + (k + 2)] * b[(k + 2) * n + j];
sum3 = sum3 + a[in + (k + 3)] * b[(k + 3) * n + j];
sum4 = sum4 + a[in + (k + 4)] * b[(k + 4) * n + j];
sum5 = sum5 + a[in + (k + 5)] * b[(k + 5) * n + j];
sum6 = sum6 + a[in + (k + 6)] * b[(k + 6) * n + j];
sum7 = sum7 + a[in + (k + 7)] * b[(k + 7) * n + j];
}
if (n % 8 != 0)
{
for (k = 8 * (n / 8); k < n; k++)
{
sum = sum + a[in + k] * b[k * n + j];
}
}
c[in + j] = sum + sum1 + sum2 + sum3 + sum4 + sum5 + sum6 + sum7;
}
}
}
Let's say the size, n, is 12. When I unroll 4 times, this code works, because it never enters the remainder loop. But I lose track of what's going on when it does! If anyone can point out where I am going wrong, I'd really appreciate it. I am new to this and having a hard time figuring it out.
A generic way of unrolling a loop of this shape:
for(int i=0; i<N; i++)
...
is
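int i;
for(i=0; i<=N-L; i+=L)
...
for(; i<N; i++)
...
or if you want to keep the index variable in the scope of the loops:
for(int i=0; i<=N-L; i+=L)
...
for(int i=L*(N/L); i<N; i++)
...
Here, I'm using the fact that integer division rounds down. L is the number of steps you do per iteration of the first loop. Note that in the second form the first loop's condition must be i<=N-L rather than i<N-L: otherwise, when N is a multiple of L, the last full block is skipped, because the remainder loop starts at L*(N/L), which equals N.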
Example:
const int N=22;
const int L=6;
int i;
for(i=0; i<=N-L; i+=L)
{
printf("%d\n", i);
printf("%d\n", i+1);
printf("%d\n", i+2);
printf("%d\n", i+3);
printf("%d\n", i+4);
printf("%d\n", i+5);
}
for(; i<N; i++)
printf("%d\n", i);
But I recommend taking a look at Duff's device. However, I do suspect that it's not always a good thing to use. The reason is that modulo is a pretty expensive operation.
The condition if (n % 8 != 0) should not be needed. The for header should take care of that if written properly.
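Applied to the matrix code from the question, the pattern might look like the sketch below. The problem in the original is the header for (k = 0; k < n; k += 8): with n = 12 the second iteration starts at k = 8 and reads indices 8 through 15, past the end of the row, and the remainder loop then adds elements 8 through 11 a second time. Here the unrolled loop only runs while a full block still fits. I unrolled by 4 instead of 8 just to keep it short, and the function name is made up.

```c
#include <assert.h>

/* c = a * b for n x n row-major matrices, inner loop unrolled by 4. */
void mult_matx_unrolled(const unsigned long *a, const unsigned long *b,
                        unsigned long *c, long n)
{
    assert(n >= 0);
    for (long i = 0; i < n; i++) {
        long in = i * n;
        for (long j = 0; j < n; j++) {
            unsigned long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
            long k;
            /* Unrolled part: runs only while a full block of 4 fits. */
            for (k = 0; k + 3 < n; k += 4) {
                s0 += a[in + k]     * b[k * n + j];
                s1 += a[in + k + 1] * b[(k + 1) * n + j];
                s2 += a[in + k + 2] * b[(k + 2) * n + j];
                s3 += a[in + k + 3] * b[(k + 3) * n + j];
            }
            /* Remainder: at most 3 leftover elements, continuing from k. */
            for (; k < n; k++)
                s0 += a[in + k] * b[k * n + j];
            c[in + j] = s0 + s1 + s2 + s3;
        }
    }
}
```

Because k is declared outside both loops, the remainder loop simply continues where the unrolled loop stopped, so no n % 4 test is needed.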

optimizing for loop addition [duplicate]

I have an assignment to optimize a for loop so the compiler compiles code that runs faster. The objective is to get the code to run in 5 seconds or less, with the original run time being around 23 seconds. The original code looks like this:
#include <stdio.h>
#include <stdlib.h>
#define N_TIMES 600000
#define ARRAY_SIZE 10000
int main(void)
{
double *array = calloc(ARRAY_SIZE, sizeof(double));
double sum = 0;
int i;
printf("CS201 - Asgmt 4 - I. Forgot\n");
for (i = 0; i < N_TIMES; i++) {
int j;
for (j = 0; j < ARRAY_SIZE; j++) {
sum += array[j];
}
}
return 0;
}
My first thought was to do loop unrolling on the inner for loop, which got it down to 5.7 seconds. That loop looked like this:
for (j = 0; j < ARRAY_SIZE - 11; j+= 12) {
sum = sum + (array[j] + array[j+1] + array[j+2] + array[j+3] + array[j+4] + array[j+5] + array[j+6] + array[j+7] + array[j+8] + array[j+9] + array[j+10] + array[j+11]);
}
After unrolling to 12 array elements per iteration, the performance stopped improving, so my next thought was to try to introduce some parallelism. I did this:
sum = sum + (array[j] + array[j+1] + array[j+2] + array[j+3] + array[j+4] + array[j+5]);
sum1 = sum1 + (array[j+6] + array[j+7] + array[j+8] + array[j+9] + array[j+10] + array[j+11]);
That actually ended up slowing down the code, and each additional variable slowed it down further. I'm not sure whether parallelism doesn't work here or whether I'm implementing it wrong, but it didn't help, so now I'm not sure how else I can optimize it to get below 5 seconds.
EDIT: I forgot to mention I can't make any changes to the outer loop, only the inner loop.
EDIT2: This is the part of the code I'm trying to optimize for my assignment:
for (j = 0; j < ARRAY_SIZE; j++) {
sum += array[j];
}
I'm using the gcc compiler with the flags
gcc -m32 -std=gnu11 -Wall -g a04.c -o a04
All compiler optimizations are turned off.
Since j and i don't depend on one another, I think you can just do:
for (j = 0; j < ARRAY_SIZE; j++) {
sum += array[j];
}
sum *= N_TIMES;
You can move the declaration of variable 'j' out of the loop like so:
int j;
for (i = 0; i < N_TIMES; i++) {
//int j; <-- Move this line out of the loop
for (j = 0; j < ARRAY_SIZE - 11; j+= 12) {
sum = sum + (array[j] + array[j+1] + array[j+2] + array[j+3] + array[j+4] + array[j+5] + array[j+6] + array[j+7] + array[j+8] + array[j+9] + array[j+10] + array[j+11]);
}
}
You don't need to declare a new variable 'j' each time the loop runs.
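One detail about the unrolled loops above: ARRAY_SIZE is 10000, which is not a multiple of 12, so a loop with j < ARRAY_SIZE - 11 and j += 12 stops after index 9995 and never adds the last 4 elements. That is harmless here only because calloc zeroes the array. Below is a rough sketch of unrolling with separate accumulators plus a remainder loop. The helper name and the choice of 4 accumulators are mine, and I have not measured whether it helps with optimizations turned off. Also keep in mind that splitting a floating-point sum across accumulators changes the order of the additions, so for arbitrary data the result can differ slightly in the last bits.

```c
#include <assert.h>
#include <stddef.h>

/* Sum n doubles using 4 independent accumulators and a remainder loop. */
double sum_unrolled(const double *array, size_t n)
{
    assert(array != NULL || n == 0);
    double s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t j;
    /* Unrolled part: runs only while a full block of 4 fits. */
    for (j = 0; j + 3 < n; j += 4) {
        s0 += array[j];
        s1 += array[j + 1];
        s2 += array[j + 2];
        s3 += array[j + 3];
    }
    /* Remainder: at most 3 leftover elements. */
    for (; j < n; j++)
        s0 += array[j];
    return (s0 + s1) + (s2 + s3);
}
```

The four accumulators do not depend on each other, which is what gives the CPU a chance to overlap the additions.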

Speeding up reading and writing to a 2D Array

I have a grid where each element holds 9 values. Every time-step grabs some values from neighbouring elements, does some trivial calculations, and then writes back new values to the same addresses.
On my machine this program runs in about 3 minutes. However, an alternative program that simply reads values from neighbouring elements, and then writes back the values (i.e. the below program just without intermediate calculations) runs in only 50 seconds.
Assuming the intermediate calculations don't take very long to compute (I may be wrong about that), how can I speed up the former program to achieve a similar performance to the latter? The issue is most likely to do with caching, but all changes I've attempted have either had no effect on the performance, or made the performance worse.
What I've attempted so far
Swapping the grid array from a structure of arrays (grid[9][256*256]) to an array of structures (grid[256*256][9]) seemed to have negligible impact on performance.
Hoisting the a[9] array to be outside the loop also didn't seem to affect the performance.
I've run the code through a profiler, which tells me that CPU performance is poor whenever I access elements from the grid.
Simplified program:
#include <stdio.h>
#include <stdlib.h>
int main() {
double **grid = (double**)malloc(9*sizeof(double*));
for(int i = 0; i < 9; i++)
grid[i] = (double*)malloc(256*256*sizeof(double));
// double **grid = (double**)malloc(256*256*sizeof(double*));
// for(int i = 0; i < 256*256; i++)
// grid[i] = (double*)malloc(9*sizeof(double));
double res = 0.0;
for (int tt = 0; tt < 80000; tt++) {
for (int ii = 0; ii < 256; ii++) {
for (int jj = 0; jj < 256; jj++) {
int up = (ii + 1) % 256;
int rt = (jj + 1) % 256;
int dn = (ii == 0) ? 255 : (ii - 1);
int lf = (jj == 0) ? 255 : (jj - 1);
double sum = grid[0][ii*256 + jj] + grid[1][ii*256 + lf]
+ grid[2][dn*256 + jj] + grid[3][ii*256 + rt]
+ grid[4][up*256 + jj] + grid[5][dn*256 + lf]
+ grid[6][dn*256 + rt] + grid[7][up*256 + rt]
+ grid[8][up*256 + lf];
double odd = ( grid[1][ii*256 + jj] + grid[3][up*256 + lf]
+ grid[5][dn*256 + rt] + grid[7][up*256 + rt]
) / sum;
double even = ( grid[0][ii*256 + jj] + grid[2][up*256 + lf]
+ grid[4][dn*256 + rt] + grid[6][dn*256 + lf]
+ grid[8][ii*256 + lf]
) / sum;
double hypot = odd*odd + even*even;
double a[9];
a[1] = ( odd ) * hypot;
a[2] = ( even ) * hypot;
a[3] = ( - odd ) * hypot;
a[4] = ( - even ) * hypot;
a[5] = ( odd + even ) * hypot;
a[6] = ( - odd + even ) * hypot;
a[7] = ( - odd - even ) * hypot;
a[8] = ( odd - even ) * hypot;
sum = 0.0;
sum += ( grid[0][ii*256 + jj] = hypot * grid[0][ii*256 + jj] );
sum += ( grid[1][ii*256 + lf] = a[3] * grid[3][ii*256 + rt] );
sum += ( grid[2][dn*256 + jj] = a[4] * grid[4][up*256 + jj] );
sum += ( grid[3][ii*256 + rt] = a[1] * grid[1][ii*256 + lf] );
sum += ( grid[4][up*256 + jj] = a[2] * grid[2][dn*256 + jj] );
sum += ( grid[5][dn*256 + lf] = a[7] * grid[7][up*256 + rt] );
sum += ( grid[6][dn*256 + rt] = a[8] * grid[8][up*256 + lf] );
sum += ( grid[7][up*256 + rt] = a[5] * grid[5][dn*256 + lf] );
sum += ( grid[8][up*256 + lf] = a[6] * grid[6][dn*256 + rt] );
res += sum;
}
}
}
printf("%f", res);
return 0;
}
Vim command to swap the index ordering of all 2D arrays in the program:
:%s/\[\([^\]]\+\)\]\[\([^\]]\+\)]/\[\2\]\[\1\]/g

Sum of the elements on the right side of the second bisector in C

I'm trying to make a function in C that calculates the sum of the elements on the right side of a 2 dimensional square matrix. (only the 1 elements in the below comments)
So far I've got this but it's incorrect since it calculates the sum of the elements for the whole matrix:
#define N 5
int a[N][N] ={{0,0,0,0,1},
{0,0,0,1,1},
{0,0,1,1,1},
{0,1,1,1,1},
{1,1,1,1,1}};
/*
{0,0,0,0,1},
{0,0,0,1,1},
{0,0,1,1,1},
{0,1,1,1,1},
{1,1,1,1,1},
sum =
a[0][4] +
a[1][3] + a[1][4] +
a[2][2] + a[2][3] + a[2][4] +
a[3][1] + a[3][2] + a[3][3] + a[3][4] +
a[4][0] + a[4][1] + a[4][2] + a[4][3] + a[4][4]
*/
int sumSndBisRight(int a[N][N]) {
int i, j, sum = 0, k = N - 1;
for( i = 0;i < N;i++)
for( j = (N - 1);j >= 0;j--)
sum += a[i][j];
return sum;
}
int main(void) {
int sum;
sum = sumSndBisRight(a);
printf("%d", sum);
}
Thanks in advance for your help.
Change
for( j = (N - 1);j >= 0;j--)
to
for( j = N-1-i; j < N; j++)
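With that change, the whole function might look like the sketch below (the function name is made up, everything else follows the question). In row i the second bisector sits at column N-1-i, so the inner loop starts there and runs to the end of the row.

```c
#include <assert.h>
#include <stddef.h>

#define N 5

/* Sum the elements on and to the right of the second (anti-)diagonal. */
int sum_right_of_second_bisector(int a[N][N])
{
    assert(a != NULL);
    int sum = 0;
    for (int i = 0; i < N; i++)
        for (int j = N - 1 - i; j < N; j++)  /* row i starts at column N-1-i */
            sum += a[i][j];
    return sum;
}
```

For the 5x5 matrix in the question this returns 15, since it visits 1 + 2 + 3 + 4 + 5 elements and each of them is 1.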
