Array increment doesn't give the correct value - C

This is a very strange problem. I cannot see any difference between code1 and code2. However, there must be one, because they produce different results (notice f0 and f0A, which acts as a buffer):
code1 :
for (k = 0; k < 6; k++) {
r1 = i0 + 6 * k;
f0 = 0.0F;
for (r2 = 0; r2 < 6; r2++) {
f0 += (float)b_a[i0 + 6 * r2] * p_est[r2 + 6 * k];
}
a[r1] = f0;
}
code2:
float f0A[6] = {0};
for (k = 0; k < 6; k++) {
r1 = i0 + 6 * k;
for (r2 = 0; r2 < 6; r2++) {
f0A[r2] += (float)b_a[i0 + 6 * r2] * p_est[r2 + 6 * k];
}
}
for (r2 = 0; r2 < 6; r2++) {
r1 = i0 + 6 * r2;
a[r1] = f0A[r2];
}

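In code1, f0 is reset to 0 for every k and accumulates over the inner index r2, so a[r1] receives the sum over r2 for that k.
In code2, the buffer is indexed by r2, the inner index, so f0A[r2] accumulates over k instead. Each slot ends up holding a different sum from the one f0 held, and the final loop copies those values into a, so a[r1] is not given the correct value. Indexing the buffer by k (f0A[k] += ... in the inner loop, then a[i0 + 6 * k] = f0A[k]) reproduces code1.
That is the difference.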
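A minimal sketch of a buffered version that matches code1. The original post does not show the declarations, so the element type of b_a (int here), the 6x6 sizes, and both function names are assumptions made for illustration:

```c
/* code1: a scalar accumulator that is reset for every output element k. */
void row_product_scalar(const int *b_a, const float *p_est, float *a, int i0)
{
    for (int k = 0; k < 6; k++) {
        float f0 = 0.0F;
        for (int r2 = 0; r2 < 6; r2++) {
            f0 += (float)b_a[i0 + 6 * r2] * p_est[r2 + 6 * k];
        }
        a[i0 + 6 * k] = f0;
    }
}

/* Buffered variant: one slot per k, so each slot sums over r2 just as f0 did. */
void row_product_buffered(const int *b_a, const float *p_est, float *a, int i0)
{
    float f0A[6] = {0};
    for (int k = 0; k < 6; k++) {
        for (int r2 = 0; r2 < 6; r2++) {
            f0A[k] += (float)b_a[i0 + 6 * r2] * p_est[r2 + 6 * k];
        }
    }
    for (int k = 0; k < 6; k++) {
        a[i0 + 6 * k] = f0A[k];
    }
}
```

Both functions add the same products in the same order, so they should write identical values into a for any i0 from 0 to 5.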

Related

Minimum sum partition of an array

Problem Statement:
Given an array, the task is to divide it into two sets S1 and S2 such that the absolute difference between their sums is minimum.
Sample Inputs,
[1,6,5,11] => 1. The 2 subsets are {1,5,6} and {11} with sums being 12 and 11. Hence answer is 1.
[36,7,46,40] => 23. The 2 subsets are {7,46} and {36,40} with sums being 53 and 76. Hence answer is 23.
Constraints
1 <= size of array <= 50
1 <= a[i] <= 50
My Effort:
int someFunction(int n, int *arr) {
qsort(arr, n, sizeof(int), compare);// sorted it for simplicity
int i, j;
int dp[55][3000]; // sum of the array won't go beyond 3000 and size of array is less than or equal to 50(for the rows)
// initialize
for (i = 0; i < 55; ++i) {
for (j = 0; j < 3000; ++j)
dp[i][j] = 0;
}
int sum = 0;
for (i = 0; i < n; ++i)
sum += arr[i];
for (i = 0; i < n; ++i) {
for (j = 0; j <= sum; ++j) {
dp[i + 1][j + 1] = max(dp[i + 1][j], dp[i][j + 1]);
if (j >= arr[i])
dp[i + 1][j + 1] = max(dp[i + 1][j + 1], arr[i] + dp[i][j + 1 - arr[i]]);
}
}
for (i = 0; i < n; ++i) {
for (j = 0; j <= sum; ++j)
printf("%d ", dp[i + 1][j + 1]);
printf("\n");
}
return 0;// irrelevant for now as I am yet to understand what to do next to get the minimum.
}
OUTPUT
Let's say for input [1,5,6,11], I am getting the dp array output as below.
0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
0 1 1 1 1 5 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6
0 1 1 1 1 5 6 7 7 7 7 11 12 12 12 12 12 12 12 12 12 12 12 12
0 1 1 1 1 5 6 7 7 7 7 11 12 12 12 12 16 17 18 18 18 18 22 23
Now, how to decide the 2 subsets to get the minimum?
P.S - I have already seen this link but explanation is not good enough for a DP beginner like me.
You have to solve the subset sum problem for SumValue = OverallSum / 2.
Note that you don't need to solve an optimization problem (which is what the max operation in your code does).
Just fill a linear table (a 1D array A) of size SumValue + 1 with the reachable sums, scan A backward from the last cell to find the largest index M with a non-zero entry, and calculate the final result as abs(OverallSum - M - M).
To start, set the 0-th entry to 1.
Then, for every source array item D[i], scan array A from the end to the beginning:
A[0] = 1;
for (i = 0; i < D.Length(); i++)
{
for (j = SumValue; j >= D[i]; j--)
{
if (A[j - D[i]] == 1)
// we can compose sum j from D[i] and previously made sum
A[j] = 1;
}
}
For example, for D = [1,6,5,11] you have SumValue = 12 (the overall sum 23 halved and rounded up), so make array A[13] and calculate the possible sums.
A array after filling: [1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1]
Working Python code:
def besthalf(d):
    s = sum(d)
    half = s // 2
    a = [1] + [0] * half
    for v in d:
        for j in range(half, v - 1, -1):
            if a[j - v] == 1:
                a[j] = 1
    m = 0  # stays 0 if no non-empty subset fits into half
    for j in range(half, 0, -1):
        if a[j] == 1:
            m = j
            break
    return s - 2 * m
print(besthalf([1,5,6,11]))
print(besthalf([1,1,1,50]))
>>1
>>47
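Since the question is in C, here is a rough C sketch of the same 1D-table idea. The function name best_half and the -1 return on allocation failure are choices made for this example, and it assumes all elements are positive, as the constraints state:

```c
#include <stdlib.h>

/* Minimum |sum(S1) - sum(S2)| over all two-way splits of d[0..n-1].
   Assumes every d[i] is positive. Returns -1 if the table cannot be allocated. */
int best_half(const int *d, int n)
{
    int total = 0;
    for (int i = 0; i < n; i++) {
        total += d[i];
    }
    int half = total / 2;

    /* reach[j] == 1 means some subset sums to exactly j. */
    unsigned char *reach = calloc((size_t)half + 1, 1);
    if (reach == NULL) {
        return -1;
    }
    reach[0] = 1; /* the empty subset */

    for (int i = 0; i < n; i++) {
        /* Scan backward so that each item is used at most once. */
        for (int j = half; j >= d[i]; j--) {
            if (reach[j - d[i]]) {
                reach[j] = 1;
            }
        }
    }

    int m = 0; /* largest reachable sum not exceeding half */
    for (int j = half; j > 0; j--) {
        if (reach[j]) {
            m = j;
            break;
        }
    }
    free(reach);
    return total - 2 * m;
}
```

With the constraints from the question (at most 50 items of value at most 50) the table never needs more than 1251 bytes, so a fixed-size local array would work just as well as calloc.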
I'll convert this problem to the subset sum problem.
Let's take the array int[] A = { 10,20,15,5,25,33 };
It should be divided into {25, 20, 10} and {33, 15, 5}, and the answer is 55-53=2.
Notation: SUM == sum of the whole array
sum1 == sum of subset1
sum2 == sum of subset2
Step 1: get the sum of the whole array, SUM=108
Step 2: whichever way we divide the array into two parts, one thing always holds:
sum1 + sum2 = SUM
Step 3: to get the minimum difference,
sum1 and sum2 should be as near to SUM/2 as possible (for example, sum1=54 and sum2=54 gives diff=0)
Step 4: let's try combinations
sum1 = 54 AND sum2 = 54 (not possible to divide like this)
sum1 = 55 AND sum2 = 53 (possible and our solution, should break here)
sum1 = 56 AND sum2 = 52
sum1 = 57 AND sum2 = 51 .......so on
pseudo code
SUM=Array.sum();
sum1 = SUM/2;
sum2 = SUM-sum1;
while(true){
if(subSetSuMProblem(A,sum1) && subSetSuMProblem(A,sum2)){
print "possible"
break;
}
else{
sum1++;
sum2--;
}
}
Java code for the same
import java.util.ArrayList;
import java.util.List;
public class MinimumSumSubsetPrint {
public static void main(String[] args) {
int[] A = {10, 20, 15, 5, 25, 32};
int sum = 0;
for (int i = 0; i < A.length; i++) {
sum += A[i];
}
subsetSumDynamic(A, sum);
}
private static boolean subsetSumDynamic(int[] A, int sum) {
int n = A.length;
boolean[][] T = new boolean[n + 1][sum + 1];
// sum2[0][0]=true;
for (int i = 0; i <= n; i++) {
T[i][0] = true;
}
for (int i = 1; i <= n; i++) {
for (int j = 1; j <= sum; j++) {
if (A[i - 1] > j) {
T[i][j] = T[i - 1][j];
} else {
T[i][j] = T[i - 1][j] || T[i - 1][j - A[i - 1]];
}
}
}
int sum1 = sum / 2;
int sum2 = sum - sum1;
while (true) {
if (T[n][sum1] && T[n][sum2]) {
printSubsets(T, sum1, n, A);
printSubsets(T, sum2, n, A);
break;
} else {
sum1 = sum1 - 1;
sum2 = sum - sum1;
System.out.println(sum1 + ":" + sum2);
}
}
return T[n][sum];
}
private static void printSubsets(boolean[][] T, int sum, int n, int[] A) {
List<Integer> sumvals = new ArrayList<Integer>();
int i = n;
int j = sum;
while (i > 0 && j > 0) {
if (T[i][j] == T[i - 1][j]) {
i--;
} else {
sumvals.add(A[i - 1]);
j = j - A[i - 1];
i--;
}
}
System.out.println();
for (int p : sumvals) {
System.out.print(p + " ");
}
System.out.println();
}
}
Working Java code, if anyone is interested. The idea is the same as in @MBo's answer.
class Solution{
public int minDifference(int arr[]) {
int sum = 0;
for(int x : arr) sum += x;
int half = (sum >> 1) + (sum & 1);
boolean[] sums = new boolean[half + 1];
sums[0] = true;
for(int i = 0; i < arr.length; ++i){
if(arr[i] > half) continue;
for(int j = half; j >= arr[i]; --j){
if(sums[j - arr[i]]){
sums[j] = true;
}
}
}
for(int i = sums.length - 1; i >= 1; --i){
if(sums[i]) return Math.abs((sum - i) - i);
}
return sum; // for arrays like [2] or [100] etc
}
}

3-D Loop comparison in 7-pt Stencil

I carry out a 7-pt stencil update on two 3-D domains. The first one is 258x130x258 and the second one is 130x258x258. Both of them have the same number of elements being updated. In C they are represented as contiguous arrays: a1[258][130][258] and x1[130][258][258]. Simply stated, their x-dimension and y-dimension are exchanged, but the z-dimension (the fastest-changing index) is equal.
Loop 1:
for(i = 1; i <= 256 ; i++)
for(j = 1; j <= 128 ; j++)
for(k = 1; k <= 256; k++)
a1[i][j][k] = alpha * b1[i][j][k] + (Omega_6) *(b1[i-1][j][k] + b1[i+1][j][k] +
b1[i][j-1][k] + b1[i][j+1][k] +
b1[i][j][k-1] + b1[i][j][k+1] +
c1[i][j][k] * H);
Loop 2:
for(i = 1; i <= 128 ; i++)
for(j = 1; j <= 256 ; j++)
for(k = 1; k <= 256; k++)
x1[i][j][k] = alpha * y1[i][j][k] + (Omega_6) *(y1[i-1][j][k] + y1[i+1][j][k] +
y1[i][j-1][k] + y1[i][j+1][k] +
y1[i][j][k-1] + y1[i][j][k+1] +
z1[i][j][k] * H);
a1, b1, c1 all have the same dimensions, and x1, y1, z1 all have the same dimensions. alpha and Omega_6 are constants. Loop 1 runs 0.5 seconds faster than Loop 2. Why does this happen?

Why does this C code yield a double free or corruption? [duplicate]

Why does this code for computing the inner product of two vectors yield a double free or corruption error when compiled with:
ejspeiro@Eduardo-Alienware-14:~/Dropbox/HPC-Practices$ gcc --version
gcc (Ubuntu 4.8.4-2ubuntu1~14.04) 4.8.4
The code comes from this reference.
// Computation of the inner product of vectors aa and bb.
#include <stdio.h>
#include <stdlib.h>
int main() {
size_t nn = 100000000;
size_t total_mem_array = nn*sizeof(double);
double *aa;
double *bb;
double ss = 0.0;
aa = (double *) malloc(total_mem_array);
bb = (double *) malloc(total_mem_array);
int ii = 0;
for (ii = 0; ii < nn; ++ii) {
aa[ii] = 1.0;
bb[ii] = 1.0;
}
double sum1 = 0.0;
double sum2 = 0.0;
for (ii = 0; ii < nn/2 - 1; ++ii) {
sum1 += (*(aa + 0))*(*(bb + 0));
sum2 += (*(aa + 1))*(*(bb + 1));
aa += 2;
bb += 2;
}
ss = sum1 + sum2;
free(aa);
free(bb);
return 0;
}
The error is caused because the value passed to free() is not the same value returned by malloc(), as you increment aa and bb.
To correct it you could, for example, define two additional pointer variables that are used only for memory management, i.e. allocation and deallocation. Once memory is acquired by them, assign it to aa and bb.
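A minimal sketch of that suggestion. The names aa_base and bb_base and the wrapper function inner_product_ones are hypothetical, and unlike the code in the question this loop visits every pair of elements (it assumes nn is even):

```c
#include <stdlib.h>

/* Inner product of two all-ones vectors of length nn.
   The pointers returned by malloc are kept untouched in aa_base and bb_base,
   and only the working copies aa and bb are advanced.
   Returns -1.0 if an allocation fails. */
double inner_product_ones(size_t nn)
{
    double *aa_base = malloc(nn * sizeof *aa_base);
    double *bb_base = malloc(nn * sizeof *bb_base);
    if (aa_base == NULL || bb_base == NULL) {
        free(aa_base); /* free(NULL) is a no-op */
        free(bb_base);
        return -1.0;
    }
    for (size_t ii = 0; ii < nn; ++ii) {
        aa_base[ii] = 1.0;
        bb_base[ii] = 1.0;
    }

    double *aa = aa_base; /* working copies that may be incremented */
    double *bb = bb_base;
    double sum1 = 0.0;
    double sum2 = 0.0;
    for (size_t ii = 0; ii + 1 < nn; ii += 2) {
        sum1 += aa[0] * bb[0];
        sum2 += aa[1] * bb[1];
        aa += 2;
        bb += 2;
    }

    free(aa_base); /* the exact values malloc returned */
    free(bb_base);
    return sum1 + sum2;
}
```

Because free() receives the same addresses that malloc() returned, the allocator's bookkeeping stays intact no matter how far aa and bb have moved.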
You can simplify:
for (ii = 0; ii < nn/2 - 1; ++ii) {
sum1 += (*(aa + 0))*(*(bb + 0));
sum2 += (*(aa + 1))*(*(bb + 1));
aa += 2;
bb += 2;
}
to:
for (ii = 0; ii < nn/2 - 1; ++ii) {
sum1 += aa[ii * 2] * bb[ii * 2];
sum2 += aa[ii * 2 + 1] * bb[ii * 2 + 1];
}
which has the dual benefit of not incrementing your pointers (the cause of your problem) and making your code a whole lot clearer.

Loop Unrolling Multidimensional Array

I recently tried unrolling the inner i and j loops within this multi-dimensional array, but the filter->get(i,j) always messes up the texture of the image. Can anyone assist me with unrolling the i and j loop? Thanks.
My attempt:
double
applyFilter(struct Filter *filter, cs1300bmp *input, cs1300bmp *output)
{
long long cycStart, cycStop;
cycStart = rdtscll();
output -> width = input -> width;
output -> height = input -> height;
int a = filter -> getDivisor();
int n = filter -> getSize();
for (int plane = 0; plane < 3; plane++){
for(int row = 1; row < (input -> height) - 1 ; row = row + 1) {
for(int col = 1; col < (input -> width) - 1; col = col + 1) {
int value = 0;
int val1, val2;
for (int j = 0; j < n; j++) {
for (int i = 0; i < n; i+=2) {
val1 = val1 + input -> color[plane][row + i - 1][col + j - 1]
* filter -> get(i, j);
val2 = val2 + input -> color[plane][row + i][col + j -1] * filter->get(i+1,j);
}
}
value = (val1 + val2) / a;
if ( value < 0 ) { value = 0; }
if ( value > 255 ) { value = 255; }
output -> color[plane][row][col] = value;
}
}
}
cycStop = rdtscll();
double diff = cycStop - cycStart;
double diffPerPixel = diff / (output -> width * output -> height);
fprintf(stderr, "Took %f cycles to process, or %f cycles per pixel\n",
diff, diff / (output -> width * output -> height));
return diffPerPixel;
}
Original:
int a = filter -> getDivisor();
int n = filter -> getSize();
for (int plane = 0; plane < 3; plane++){
for(int row = 1; row < (input -> height) - 1 ; row = row + 1) {
for(int col = 1; col < (input -> width) - 1; col = col + 1) {
int value = 0;
for (int j = 0; j < n; j++) {
for (int i = 0; i < n; i++) {
value = value + input -> color[plane][row + i - 1][col + j - 1]
* filter -> get(i, j);
}
}
value = value / a;
if ( value < 0 ) { value = 0; }
if ( value > 255 ) { value = 255; }
output -> color[plane][row][col] = value;
Try replacing the inner loop with:
int value = 0;
int val1 = 0, val2 = 0;
for (int j = 0; j < n; j++) {
    int i;
    for (i = 0; i + 1 < n; i+=2) {
        val1 += input->color[plane][row+i-1][col+j-1] * filter->get(i,j);
        val2 += input->color[plane][row+i ][col+j-1] * filter->get(i+1,j);
    }
    if (i < n) // one row is left over when n is odd
        val1 += input->color[plane][row+i-1][col+j-1] * filter->get(i,j);
}
value = (val1 + val2) / a;
Your method is only correct if n is a multiple of 2. Otherwise the last pass of the unrolled loop calls filter->get(i+1,j) with i+1 equal to n, which is one row past the end of the filter.
ADDED:
First of all, I just realized that you forgot to initialize val1 and val2 which is probably the main reason for your problems.
Second, it seems to me, that your code was written specifically for filter sizes of 3:
For smaller filters you don't access the borders at all.
For bigger ones, you access positions outside of the picture, as e.g.
[row + i - 1] becomes bigger than or equal to input->height.
If you only want to use filters of size 3, then I would simply unroll the inner loops completely. Otherwise check the boundaries for the row and col values.
Now, for loop unrolling, I would recommend doing a Google search, as you can find many examples of how to do it properly. One can be found on the Wikipedia page.
In your case, the simplest solution would be:
int value = 0;
int val1=0, val2=0;
for (int j = 0; j < n; j++) {
for (int i = 0; i < n-1; i+=2) {
val1 = val1 + input->color[plane][row+i-1][col+j-1] * filter->get(i ,j);
val2 = val2 + input->color[plane][row+i ][col+j-1] * filter->get(i+1,j);
}
if (n%2 !=0) {
val1 = val1 + input->color[plane][row+n-2][col+j-1] * filter->get(n-1,j);
}
}
value = (val1 + val2) / a;
In case you want to unroll the loop even more, the more generic way would be (e.g. for 4):
int value = 0;
int val1=0, val2=0, val3=0, val4=0;
for (int j = 0; j < n; j++) {
    for (int i = 0; i < n-3; i+=4) {
        val1 = val1 + input->color[plane][row+i-1][col+j-1] * filter->get(i ,j);
        val2 = val2 + input->color[plane][row+i ][col+j-1] * filter->get(i+1,j);
        val3 = val3 + input->color[plane][row+i+1][col+j-1] * filter->get(i+2,j);
        val4 = val4 + input->color[plane][row+i+2][col+j-1] * filter->get(i+3,j);
    }
    // Leftover rows when n is not a multiple of 4; the cases fall through on purpose.
    switch (n % 4) {
        case 3: val1+=input->color[plane][row+n-4][col+j-1] * filter->get(n-3,j);
        case 2: val1+=input->color[plane][row+n-3][col+j-1] * filter->get(n-2,j);
        case 1: val1+=input->color[plane][row+n-2][col+j-1] * filter->get(n-1,j);
    }
}
value = (val1 + val2 + val3 + val4) / a;
NOTE:
Please be aware that, depending on the size of your filter, the compiler and compiler options used, and your system, the solutions above might not speed up your code and might even slow it down. You should also be aware that the compiler can usually do loop unrolling for you (e.g. with the -funroll-loops option in gcc) if it makes sense.
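As a self-contained illustration of unrolling by two with a remainder step, here is a minimal C sketch. It does not use the cs1300bmp or Filter types from the question: the flat arrays img and filt, the stride parameter, and both function names are assumptions made for the example.

```c
/* Weighted sum over an n x n window, straightforward version.
   img points at the window's top-left pixel, stride is the image row length,
   and filt holds the n x n filter coefficients row by row. */
int window_sum_plain(const int *img, int stride, const int *filt, int n)
{
    int value = 0;
    for (int j = 0; j < n; j++) {
        for (int i = 0; i < n; i++) {
            value += img[i * stride + j] * filt[i * n + j];
        }
    }
    return value;
}

/* Same sum with the i loop unrolled by two, plus a remainder step for odd n. */
int window_sum_unrolled2(const int *img, int stride, const int *filt, int n)
{
    int val1 = 0;
    int val2 = 0;
    for (int j = 0; j < n; j++) {
        int i;
        /* Stop while a full pair (i, i + 1) is still inside the filter. */
        for (i = 0; i + 1 < n; i += 2) {
            val1 += img[i * stride + j] * filt[i * n + j];
            val2 += img[(i + 1) * stride + j] * filt[(i + 1) * n + j];
        }
        if (i < n) { /* one row left over when n is odd */
            val1 += img[i * stride + j] * filt[i * n + j];
        }
    }
    return val1 + val2;
}
```

Comparing the two functions for several filter sizes, odd and even, is a cheap way to catch the off-by-one mistakes that unrolling tends to introduce.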

Optimization of C code

For an assignment in a course called High Performance Computing, I am required to optimize the following code fragment:
int foobar(int a, int b, int N)
{
int i, j, k, x, y;
x = 0;
y = 0;
k = 256;
for (i = 0; i <= N; i++) {
for (j = i + 1; j <= N; j++) {
x = x + 4*(2*i+j)*(i+2*k);
if (i > j){
y = y + 8*(i-j);
}else{
y = y + 8*(j-i);
}
}
}
return x;
}
Using some recommendations, I managed to optimize the code (or at least I think so), such as:
Constant Propagation
Algebraic Simplification
Copy Propagation
Common Subexpression Elimination
Dead Code Elimination
Loop Invariant Removal
bitwise shifts instead of multiplication as they are less expensive.
Here's my code:
int foobar(int a, int b, int N) {
int i, j, x, y, t;
x = 0;
y = 0;
for (i = 0; i <= N; i++) {
t = i + 512;
for (j = i + 1; j <= N; j++) {
x = x + ((i<<3) + (j<<2))*t;
}
}
return x;
}
According to my instructor, well-optimized code should have fewer, or less costly, instructions at the assembly language level, and should therefore run in less time than the original code. That is, the execution time is calculated as:
execution time = instruction count * cycles per instruction
When I generate assembly code using the command: gcc -o code_opt.s -S foobar.c,
the generated code has many more lines than the original despite the optimizations I made, and the run time is lower, but not by as much as I expected. What am I doing wrong?
I have not pasted the assembly code, as both listings are very long. I am calling the function "foobar" in main and measuring the execution time using the time command in Linux:
int main () {
int a,b,N;
scanf ("%d %d %d",&a,&b,&N);
printf ("%d\n",foobar (a,b,N));
return 0;
}
Initially:
for (i = 0; i <= N; i++) {
for (j = i + 1; j <= N; j++) {
x = x + 4*(2*i+j)*(i+2*k);
if (i > j){
y = y + 8*(i-j);
}else{
y = y + 8*(j-i);
}
}
}
Removing y calculations:
for (i = 0; i <= N; i++) {
for (j = i + 1; j <= N; j++) {
x = x + 4*(2*i+j)*(i+2*k);
}
}
Splitting i, j, k:
for (i = 0; i <= N; i++) {
for (j = i + 1; j <= N; j++) {
x = x + 8*i*i + 16*i*k ; // multiple of 1 (no j)
x = x + (4*i + 8*k)*j ; // multiple of j
}
}
Moving them externally (and removing the loop that runs N-i times):
for (i = 0; i <= N; i++) {
x = x + (8*i*i + 16*i*k) * (N-i) ;
x = x + (4*i + 8*k) * ((N*N+N)/2 - (i*i+i)/2) ;
}
Rewriting (collecting powers of i; the fractions such as -1/2 are exact math here, not C integer division):
for (i = 0; i <= N; i++) {
x = x + ( 8*k*(N*N+N)/2 ) ;
x = x + i * ( 16*k*N + 4*(N*N+N)/2 + 8*k*(-1/2) ) ;
x = x + i*i * ( 8*N + 16*k*(-1) + 4*(-1/2) + 8*k*(-1/2) );
x = x + i*i*i * ( 8*(-1) + 4*(-1/2) ) ;
}
Rewriting - recalculating:
for (i = 0; i <= N; i++) {
x = x + 4*k*(N*N+N) ; // multiple of 1
x = x + i * ( 16*k*N + 2*(N*N+N) - 4*k ) ; // multiple of i
x = x + i*i * ( 8*N - 20*k - 2 ) ; // multiple of i^2
x = x + i*i*i * ( -10 ) ; // multiple of i^3
}
Another move to external (and removal of the i loop):
x = x + ( 4*k*(N*N+N) ) * (N+1) ;
x = x + ( 16*k*N + 2*(N*N+N) - 4*k ) * ((N*(N+1))/2) ;
x = x + ( 8*N - 20*k - 2 ) * ((N*(N+1)*(2*N+1))/6);
x = x + (-10) * ((N*N*(N+1)*(N+1))/4) ;
Both of the loop removals above use the summation formulas:
Sum(1, i = 0..n) = n + 1
Sum(i^1, i = 0..n) = n(n + 1)/2
Sum(i^2, i = 0..n) = n(n + 1)(2n + 1)/6
Sum(i^3, i = 0..n) = n^2(n + 1)^2/4
y does not affect the final result of the code - removed:
int foobar(int a, int b, int N)
{
int i, j, k, x, y;
x = 0;
//y = 0;
k = 256;
for (i = 0; i <= N; i++) {
for (j = i + 1; j <= N; j++) {
x = x + 4*(2*i+j)*(i+2*k);
//if (i > j){
// y = y + 8*(i-j);
//}else{
// y = y + 8*(j-i);
//}
}
}
return x;
}
k is simply a constant:
int foobar(int a, int b, int N)
{
int i, j, x;
x = 0;
for (i = 0; i <= N; i++) {
for (j = i + 1; j <= N; j++) {
x = x + 4*(2*i+j)*(i+2*256);
}
}
return x;
}
The inner expression can be transformed to: x += 8*i*i + 4096*i + 4*i*j + 2048*j. Use math to push all of them to the outer loop: x += 8*i*i*(N-i) + 4096*i*(N-i) + 2*i*(N-i)*(N+i+1) + 1024*(N-i)*(N+i+1).
You can expand the above expression and apply the sum of squares and sum of cubes formulas to obtain a closed-form expression, which should run faster than the doubly nested loop. I leave it as an exercise for you. As a result, i and j will also be removed.
a and b should also be removed if possible - since a and b are supplied as argument but never used in your code.
Sum of squares and sum of cubes formulas:
Sum(x^2, x = 1..n) = n(n + 1)(2n + 1)/6
Sum(x^3, x = 1..n) = n^2(n + 1)^2/4
This function is equivalent with the following formula, which contains only 4 integer multiplications, and 1 integer division:
x = N * (N + 1) * (N * (7 * N + 8187) - 2050) / 6;
To get this, I simply typed the sum calculated by your nested loops into Wolfram Alpha:
sum (sum (8*i*i+4096*i+4*i*j+2048*j), j=i+1..N), i=0..N
Here is the direct link to the solution. Think before coding. Sometimes your brain can optimize code better than any compiler.
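One way to gain confidence in such a formula is to compare it with the original loops for small N. The sketch below does that. The function names are made up for this example, and it uses long long rather than the int of the question so that the comparison is not disturbed by overflow:

```c
/* Reference: the doubly nested loop from the question with k = 256 folded in. */
long long foobar_loops(long long N)
{
    long long x = 0;
    for (long long i = 0; i <= N; i++) {
        for (long long j = i + 1; j <= N; j++) {
            x += 4 * (2 * i + j) * (i + 512);
        }
    }
    return x;
}

/* The closed form quoted above. */
long long foobar_closed(long long N)
{
    if (N < 0) {
        return 0; /* the loops never execute for negative N */
    }
    return N * (N + 1) * (N * (7 * N + 8187) - 2050) / 6;
}
```

For N = 1 both give 2048 and for N = 2 both give 14352. The guard for negative N is needed because the polynomial is only derived for N >= 0.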
Briefly scanning the first routine, the first thing you notice is that expressions involving "y" are completely unused and can be eliminated (as you did). This further permits eliminating the if/else (as you did).
What remains is the two for loops and the messy expression. Factoring out the pieces of that expression that do not depend on j is the next step. You removed one such expression, but (i<<3) (ie, i * 8) remains in the inner loop, and can be removed.
Pascal's answer reminded me that you can use a loop stride optimization. First move (i<<3) * t out of the inner loop (call it i1), then calculate, when initializing the loop, a value j1 that equals (i<<2) * t. On each iteration increment j1 by 4 * t (which is a pre-calculated constant). Replace your inner expression with x = x + i1 + j1;.
One suspects that there may be some way to combine the two loops into one, with a stride, but I'm not seeing it offhand.
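A rough sketch of the stride idea described in this answer. The names i1, j1, and step follow the description loosely, and the function name foobar_strided is invented here. Note one assumption: j1 is started at ((i + 1) << 2) * t, the value of the j term for the first j, so that it can be added before it is incremented.

```c
/* The optimized double loop with the multiplications moved out of the inner loop. */
int foobar_strided(int N)
{
    int x = 0;
    for (int i = 0; i <= N; i++) {
        int t = i + 512;
        int i1 = (i << 3) * t;       /* invariant part: does not depend on j */
        int step = 4 * t;            /* (j << 2) * t grows by this much per iteration */
        int j1 = ((i + 1) << 2) * t; /* value of (j << 2) * t at j = i + 1 */
        for (int j = i + 1; j <= N; j++) {
            x += i1 + j1;            /* same as ((i << 3) + (j << 2)) * t */
            j1 += step;
        }
    }
    return x;
}
```

The inner loop now contains only additions. Whether that is faster than what the compiler already produces from the multiplying version is something to measure rather than assume.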
A few other things I can see. You don't need y, so you can remove its declaration and initialisation.
Also, the values passed in for a and b aren't actually used, so you could use these as local variables instead of x and t.
Also, rather than adding i to 512 each time through you can note that t starts at 512 and increments by 1 each iteration.
int foobar(int a, int b, int N) {
int i, j;
a = 0;
b = 512;
for (i = 0; i <= N; i++, b++) {
for (j = i + 1; j <= N; j++) {
a = a + ((i<<3) + (j<<2))*b;
}
}
return a;
}
Once you get to this point you can also observe that, aside from initialising j, i and j are each only used in a single multiple: i<<3 and j<<2. We can code this directly in the loop logic, thus:
int foobar(int a, int b, int N) {
int i, j, iLimit, jLimit;
a = 0;
b = 512;
iLimit = N << 3;
jLimit = N << 2;
for (i = 0; i <= iLimit; i+=8) {
for (j = (i >> 1) + 4; j <= jLimit; j+=4) {
a = a + (i + j)*b;
}
b++;
}
return a;
}
OK... so here is my solution, along with inline comments to explain what I did and how.
int foobar(int N)
{ // We eliminate unused arguments
int x = 0, i = 0, i2 = 0, j, k, z;
// We only iterate up to N on the outer loop, since the
// last iteration doesn't do anything useful. Also we keep
// track of '2*i' (which is used throughout the code) by a
// second variable 'i2' which we increment by two in every
// iteration, essentially converting multiplication into addition.
while(i < N)
{
// We hoist the calculation '4 * (i+2*k)' out of the loop
// since k is a literal constant and 'i' is a constant during
// the inner loop. We could convert the multiplication by 2
// into a left shift, but hey, let's not go *crazy*!
//
// (4 * (i+2*k)) <=>
// (4 * i) + (4 * 2 * k) <=>
// (2 * i2) + (8 * k) <=>
// (2 * i2) + (8 * 256) <=>
// (2 * i2) + 2048
k = (2 * i2) + 2048;
// We have now converted the expression:
// x = x + 4*(2*i+j)*(i+2*k);
//
// into the expression:
// x = x + (i2 + j) * k;
//
// Counterintuitively we now *expand* the formula into:
// x = x + (i2 * k) + (j * k);
//
// Now observe that (i2 * k) is a constant inside the inner
// loop which we can calculate only once here. Also observe
// that it is simply added into x a total of (N - i) times, so
// we take advantage of the abelian nature of addition
// to hoist it completely out of the loop
x = x + (i2 * k) * (N - i);
// Observe that inside this loop we calculate (j * k) repeatedly,
// and that j is just an increasing counter. So now instead of
// doing numerous multiplications, let's break the operation into
// two parts: a multiplication, which we hoist out of the inner
// loop and additions which we continue performing in the inner
// loop.
z = i * k;
for (j = i + 1; j <= N; j++)
{
z = z + k;
x = x + z;
}
i++;
i2 += 2;
}
return x;
}
The code, without any of the explanations boils down to this:
int foobar(int N)
{
int x = 0, i = 0, i2 = 0, j, k, z;
while(i < N)
{
k = (2 * i2) + 2048;
x = x + (i2 * k) * (N - i);
z = i * k;
for (j = i + 1; j <= N; j++)
{
z = z + k;
x = x + z;
}
i++;
i2 += 2;
}
return x;
}
I hope this helps.
int foobar(int N) // drop the unused arguments
{
int i, j, x=0; // remove unneeded variables and operations to save stack space and machine cycles
for (i = N; i--; ) // count down so the loop test is a plain compare with zero (assumes N >= 0)
for (j = N+1; --j>i; )
x += (((i<<1)+j)*(i+512)<<2); // use a shift instead of a multiply to save machine cycles
return x;
}
