This is the question:
Here is the expected answer:
Here is my code:
#include <stdio.h>
int main() {
long trial, n, p, A, B, tp[10000] = {}, min;
scanf("%li", &trial);
for (int i = 0; i < trial; i++) {
scanf("%li %li %li", &p, &A, &B);
for (int m = 0, n = p; m <= p, n >= 0; m++, n--) {
tp[m] = ((A * m * m) + (B * n * n));
}
min = tp[0];
for (int j = 1; j <= p; j++) {
if (tp[j] < min) {
min = tp[j];
}
}
printf("%li\n", min);
}
}
What can I do to get the right answer?
There are some problems in your code:
the empty initializer = {} is only standard as of C23 and is rejected by many older C implementations. The array does not need initializing anyway.
the test m <= p, n >= 0 is somewhat incorrect: only the second part n >= 0 is tested, the comma operator ignores the result of its left operand. This happens to suffice, but you should just write:
for (int m = 0, n = p; m <= p; m++, n--)
the array tp only has 10000 elements, which is not enough for up to 100000 (10^5) cases as stated in the problem description. You probably have a buffer overflow for some test cases. Note that you can simplify the code and remove the array: you just need a single loop and keep track of the best price:
#include <stdio.h>
int main() {
long trials = 0;
scanf("%li", &trials);
while (trials --> 0) {
long p = 0, A = 0, B = 0;
scanf("%li %li %li", &p, &A, &B);
long min = B * p * p;
for (long m = 1; m <= p; m++) {
long price = A * m * m + B * (p - m) * (p - m);
if (min > price)
min = price;
}
printf("%li\n", min);
}
return 0;
}
The problem can be solved directly using calculus, reducing the time complexity from O(p) to O(1) per trial, which might be required to fit within the time limit, although 100000 iterations should easily run within 1 second.
The problem consists in finding the minimum of the quadratic function f(x) = A*x^2 + B*(p-x)^2
Expanding: f(x) = (A+B)*x^2 - 2*B*p*x + B*p^2
First derivative: f'(x) = 2*(A+B)*x - 2*B*p
Assuming A and B are positive, the minimum is obtained for x = B*p / (A+B)
Since only integer values are allowed, the solution is obtained by testing at most 2 numbers: B * p / (A + B) (integer division) and B * p / (A + B) + 1
Here is an analytical solution:
#include <stdio.h>
int main() {
long trials = 0;
scanf("%li", &trials);
while (trials --> 0) {
long p = 0, A = 0, B = 0;
scanf("%li %li %li", &p, &A, &B);
// assuming A, B and p are positive
long min = 0;
if (A + B != 0) {
long m = B * p / (A + B);
if (m < 0) { // cannot happen if A, B and p are positive
min = B * p * p;
} else
if (m >= p) {
min = A * p * p;
} else {
long price1 = A * m * m + B * (p - m) * (p - m);
long price2 = A * (m + 1) * (m + 1) +
B * (p - m - 1) * (p - m - 1);
min = price1 < price2 ? price1 : price2;
}
}
printf("%li\n", min);
}
return 0;
}
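As a sanity check on the reasoning above, the brute-force loop and the closed-form version can be compared over a small range of inputs. This is only a sketch: the function names min_price_brute, min_price_fast and min_price_agree are made up for illustration, and the inputs are assumed to be positive and small enough that nothing overflows a long.

```c
/* Brute force: try every split m of p (hypothetical helper name). */
long min_price_brute(long p, long A, long B) {
    long best = B * p * p;                 /* price for m = 0 */
    for (long m = 1; m <= p; m++) {
        long price = A * m * m + B * (p - m) * (p - m);
        if (price < best)
            best = price;
    }
    return best;
}

/* Closed form: only the two integers around B*p/(A+B) are tested.
   Assumes A, B and p are positive. */
long min_price_fast(long p, long A, long B) {
    if (A + B == 0)
        return 0;
    long m = B * p / (A + B);              /* floor of the real minimum */
    if (m >= p)
        return A * p * p;
    long price1 = A * m * m + B * (p - m) * (p - m);
    long price2 = A * (m + 1) * (m + 1) + B * (p - m - 1) * (p - m - 1);
    return price1 < price2 ? price1 : price2;
}

/* Returns 1 if both functions agree for all 1 <= p, A, B <= limit. */
int min_price_agree(long limit) {
    for (long p = 1; p <= limit; p++)
        for (long A = 1; A <= limit; A++)
            for (long B = 1; B <= limit; B++)
                if (min_price_brute(p, A, B) != min_price_fast(p, A, B))
                    return 0;
    return 1;
}
```

Because f is convex, its integer minimum sits at the floor or the ceiling of B*p/(A+B), which is why two candidates are enough; min_price_agree(30) exercises that claim on 27000 input triples.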
I tried to solve it my own way, but instead of giving me consecutive odd numbers, it gave me the cube itself. How can I make the program print the consecutive odd numbers whose sum equals n^3 (the cube of n), like the examples shown above? Thanks in advance, and please explain how you did it. Please use C when giving a more viable solution.
#include <stdio.h>
#include <math.h>
int main()
{
int n, sum;
printf("Value of n= ");
scanf("%d",&n);
for(int n=1; n<=20; n++)
{
if(n%2!=0);
}
printf("%d", sum=pow(n,3));
return 0;
}
Instead of thinking about it like that, I tried a few numbers and found the solution in math. I guess this is not what you wanted, but it does work and consists of only odd numbers.
1^3=1
2^3=3+5
3^3=7+9+11
4^3=13+15+17+19
5^3=21+23+25+27+29
...
x^3=[x*(x-1)+1]+[x*(x-1)+3]+...+[x*(x-1)+2x-1]
For a math proof, define it as an arithmetic series with n = 1..x:
An=x^2-x+1+2(n-1)=x^2-x+1+2n-2=x^2-x+2n-1
Sx=(A1+Ax)*x/2=[(x^2-x+1)+(x^2-x+2x-1)]*x/2=(2x^2)*x/2=x^3
Also, because x(x-1) is always even, the first term x(x-1)+1 is odd, and so are all the others.
To write it as code:
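#include <stdio.h>

/* Prints the n consecutive odd numbers that sum to n^3 */
void printOddNumbers(int n)
{
    int a1 = n * (n - 1) + 1; /* first term of the run */
    for(int i = 0; i < n; i++)
    {
        /* print "+" only between terms, rather than erasing a trailing one with \b */
        printf(i ? "+%d" : "%d", a1 + 2 * i);
    }
    printf("=%d^3=%d\n", n, n * n * n);
}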
So the output will look like: 13+15+17+19=4^3=64.
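The proof can also be checked numerically. The following is a small sketch (the name odd_run_sum is made up for illustration) that adds up the n odd numbers starting at n*(n-1)+1, so the result can be compared with n^3.

```c
/* Sum of the n consecutive odd numbers starting at n*(n-1)+1
   (hypothetical helper, not part of the answer above). */
long odd_run_sum(int n) {
    long first = (long)n * (n - 1) + 1;   /* odd, because n*(n-1) is even */
    long sum = 0;
    for (int i = 0; i < n; i++)
        sum += first + 2L * i;            /* first, first+2, ..., first+2(n-1) */
    return sum;
}
```

For every n from 1 to 20 (the range used in the question), odd_run_sum(n) should equal n * n * n.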
So, you have an input n, the number of consecutive odd numbers, which should add up to a given sum. With k as the first of those numbers, this is how it looks:
sum = k + (k + 2) + ... + (k + 2n - 2) =
= n * k + 2 * (1 + 2 + ... + n - 1) =
= n * k + 2 (n * (n - 1) / 2) =
= n * k + n * (n - 1) =
= n * (n + k - 1)
n is known, k is an unknown odd number. So, for sum = 27 and n = 3 this would mean
3 * (3 + k - 1) = 27
3 + k - 1 = 9
k + 2 = 9
k = 7
7 + 9 + 11 = 27
For sum = 125, n = 5:
5 * (5 + k - 1) = 125
5 + k - 1 = 25
k = 21
21 + 23 + 25 + 27 + 29 = 125
So, the implementation would look like this:
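/* Returns the first (smallest) of n consecutive odd numbers adding up to sum, or 0 if there is none */
int getK(int n, int sum) {
    int k = (sum / n) - n + 1;   /* solved from sum = n * (n + k - 1) */
    int currentSum = k;
    /* add the remaining terms k + 2, k + 4, ..., k + 2 * (n - 1) */
    for (int i = 1; i < n; i++) currentSum += 2 * i + k;
    return ((currentSum == sum) && (k % 2)) ? k : 0;
}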
Explanation: We return the smallest number of the set when the problem is solvable. If k is not odd, we return 0 as a sign of error. Also, if the sum does not add up, the problem is unsolvable and we return 0.
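To show how the function might be used, here is a minimal sketch. getK is repeated so the block compiles on its own; printRun is a hypothetical helper added only for illustration.

```c
#include <stdio.h>

/* Same logic as getK above, repeated so this sketch is self-contained. */
int getK(int n, int sum) {
    int k = (sum / n) - n + 1;
    int currentSum = k;
    for (int i = 1; i < n; i++) currentSum += 2 * i + k;
    return ((currentSum == sum) && (k % 2)) ? k : 0;
}

/* Hypothetical helper: prints the run as "a + b + ... = sum".
   Returns 1 on success, 0 if no such run exists. */
int printRun(int n, int sum) {
    int k = getK(n, sum);
    if (k == 0) {
        printf("no run of %d consecutive odd numbers adds up to %d\n", n, sum);
        return 0;
    }
    for (int i = 0; i < n; i++) {
        if (i) printf(" + ");
        printf("%d", k + 2 * i);
    }
    printf(" = %d\n", sum);
    return 1;
}
```

With the worked examples from above, getK(3, 27) gives 7 and getK(5, 125) gives 21, while something like getK(3, 28) gives 0 because 28 cannot be split that way.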
I'm trying to convert a 128-bit unsigned integer stored as an array of 4 unsigned ints to the decimal string representation in C:
unsigned int src[] = { 0x12345678, 0x90abcdef, 0xfedcba90, 0x8765421 };
printf("%s", some_func(src)); // gives "53072739890371098123344"
(The input and output examples above are completely fictional; I have no idea what that input would produce.)
If I was going to hex, binary or octal, this would be a simple matter of masks and bit shifts to peel off the least significant characters. However, it seems to me that I need to do base-10 division. Unfortunately, I can't remember how to do that across multiple ints, and the system I'm using doesn't support data types larger than 32 bits, so using a 128-bit type is not possible. Using a different language is also out, and I'd rather avoid a big number library just for this one operation.
Division is not necessary:
#include <string.h>
#include <stdio.h>
typedef unsigned long uint32;
/* N[0] - contains least significant bits, N[3] - most significant */
char* Bin128ToDec(const uint32 N[4])
{
// log10(x) = log2(x) / log2(10) ~= log2(x) / 3.322
static char s[128 / 3 + 1 + 1];
uint32 n[4];
char* p = s;
int i;
memset(s, '0', sizeof(s) - 1);
s[sizeof(s) - 1] = '\0';
memcpy(n, N, sizeof(n));
for (i = 0; i < 128; i++)
{
int j, carry;
carry = (n[3] >= 0x80000000);
// Shift n[] left, doubling it
n[3] = ((n[3] << 1) & 0xFFFFFFFF) + (n[2] >= 0x80000000);
n[2] = ((n[2] << 1) & 0xFFFFFFFF) + (n[1] >= 0x80000000);
n[1] = ((n[1] << 1) & 0xFFFFFFFF) + (n[0] >= 0x80000000);
n[0] = ((n[0] << 1) & 0xFFFFFFFF);
// Add s[] to itself in decimal, doubling it
for (j = sizeof(s) - 2; j >= 0; j--)
{
s[j] += s[j] - '0' + carry;
carry = (s[j] > '9');
if (carry)
{
s[j] -= 10;
}
}
}
while ((p[0] == '0') && (p < &s[sizeof(s) - 2]))
{
p++;
}
return p;
}
int main(void)
{
static const uint32 testData[][4] =
{
{ 0, 0, 0, 0 },
{ 1048576, 0, 0, 0 },
{ 0xFFFFFFFF, 0, 0, 0 },
{ 0, 1, 0, 0 },
{ 0x12345678, 0x90abcdef, 0xfedcba90, 0x8765421 }
};
printf("%s\n", Bin128ToDec(testData[0]));
printf("%s\n", Bin128ToDec(testData[1]));
printf("%s\n", Bin128ToDec(testData[2]));
printf("%s\n", Bin128ToDec(testData[3]));
printf("%s\n", Bin128ToDec(testData[4]));
return 0;
}
Output:
0
1048576
4294967295
4294967296
11248221411398543556294285637029484152
Straightforward division base 2^32, prints decimal digits in reverse order, uses 64-bit arithmetic, complexity O(n) where n is the number of decimal digits in the representation:
#include <stdio.h>
unsigned int a [] = { 0x12345678, 0x12345678, 0x12345678, 0x12345678 };
/* 24197857161011715162171839636988778104 */
int
main ()
{
unsigned long long d, r;
do
{
r = a [0];
d = r / 10;
r = ((r - d * 10) << 32) + a [1];
a [0] = d;
d = r / 10;
r = ((r - d * 10) << 32) + a [2];
a [1] = d;
d = r / 10;
r = ((r - d * 10) << 32) + a [3];
a [2] = d;
d = r / 10;
r = r - d * 10;
a [3] = d;
printf ("%u\n", (unsigned int) r);
}
while (a[0] || a[1] || a[2] || a[3]);
return 0;
}
EDIT: Corrected the loop so it displays a 0 if the array a contains only zeros.
Also, the array is read left to right, a[0] is most-significant, a[3] is least significant digits.
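Since the loop above emits the least significant digit first, the digits usually have to be collected and reversed before printing. The following is a sketch of that idea using the same divide-by-10 scheme; the function name u128_to_dec and the 40-byte buffer convention are assumptions made for this example, and like the code above it relies on a 64-bit type being available.

```c
#include <stdint.h>

/* a[0] is the most significant word, a[3] the least significant.
   The array is destroyed (divided down to zero).
   out must have room for 39 digits plus the terminator. */
void u128_to_dec(uint32_t a[4], char out[40]) {
    char tmp[40];
    int len = 0;
    do {
        uint64_t r = 0;                        /* running remainder */
        for (int i = 0; i < 4; i++) {
            uint64_t cur = (r << 32) | a[i];   /* remainder * 2^32 + next word */
            a[i] = (uint32_t)(cur / 10);
            r = cur % 10;
        }
        tmp[len++] = (char)('0' + r);          /* least significant digit first */
    } while (a[0] || a[1] || a[2] || a[3]);
    for (int i = 0; i < len; i++)              /* reverse into the output */
        out[i] = tmp[len - 1 - i];
    out[len] = '\0';
}
```

The do/while form keeps the behaviour mentioned in the EDIT: an all-zero input still produces the single digit 0.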
A slow but simple approach is to print the digits from most significant to least significant using subtraction. Basically you need a function for checking if x >= y and another for computing x -= y when that is the case.
Then you can start counting how many times you can subtract 10^38 (and this will be most significant digit), then how many times you can subtract 10^37 ... down to how many times you can subtract 1.
The following is a full implementation of this approach:
#include <stdio.h>
typedef unsigned ui128[4];
int ge128(ui128 a, ui128 b)
{
int i = 3;
while (i >= 0 && a[i] == b[i])
--i;
return i < 0 ? 1 : a[i] >= b[i];
}
void sub128(ui128 a, ui128 b)
{
int i = 0;
int borrow = 0;
while (i < 4)
{
int next_borrow = (borrow && a[i] <= b[i]) || (!borrow && a[i] < b[i]);
a[i] -= b[i] + borrow;
borrow = next_borrow;
i += 1;
}
}
ui128 deci128[] = {{1u,0u,0u,0u},
{10u,0u,0u,0u},
{100u,0u,0u,0u},
{1000u,0u,0u,0u},
{10000u,0u,0u,0u},
{100000u,0u,0u,0u},
{1000000u,0u,0u,0u},
{10000000u,0u,0u,0u},
{100000000u,0u,0u,0u},
{1000000000u,0u,0u,0u},
{1410065408u,2u,0u,0u},
{1215752192u,23u,0u,0u},
{3567587328u,232u,0u,0u},
{1316134912u,2328u,0u,0u},
{276447232u,23283u,0u,0u},
{2764472320u,232830u,0u,0u},
{1874919424u,2328306u,0u,0u},
{1569325056u,23283064u,0u,0u},
{2808348672u,232830643u,0u,0u},
{2313682944u,2328306436u,0u,0u},
{1661992960u,1808227885u,5u,0u},
{3735027712u,902409669u,54u,0u},
{2990538752u,434162106u,542u,0u},
{4135583744u,46653770u,5421u,0u},
{2701131776u,466537709u,54210u,0u},
{1241513984u,370409800u,542101u,0u},
{3825205248u,3704098002u,5421010u,0u},
{3892314112u,2681241660u,54210108u,0u},
{268435456u,1042612833u,542101086u,0u},
{2684354560u,1836193738u,1126043566u,1u},
{1073741824u,1182068202u,2670501072u,12u},
{2147483648u,3230747430u,935206946u,126u},
{0u,2242703233u,762134875u,1262u},
{0u,952195850u,3326381459u,12621u},
{0u,932023908u,3199043520u,126217u},
{0u,730304488u,1925664130u,1262177u},
{0u,3008077584u,2076772117u,12621774u},
{0u,16004768u,3587851993u,126217744u},
{0u,160047680u,1518781562u,1262177448u}};
void print128(ui128 x)
{
int i = 38;
int z = 0;
while (i >= 0)
{
int c = 0;
while (ge128(x, deci128[i]))
{
c++; sub128(x, deci128[i]);
}
if (i==0 || z || c > 0)
{
z = 1; putchar('0' + c);
}
--i;
}
}
int main(int argc, const char *argv[])
{
ui128 test = { 0x12345678, 0x90abcdef, 0xfedcba90, 0x8765421 };
print128(test);
return 0;
}
That number in the problem text in decimal becomes
11248221411398543556294285637029484152
and Python agrees this is the correct value (this of course doesn't mean the code is correct!!! ;-) )
Same thing, but with 32-bit integer arithmetic:
#include <stdio.h>
unsigned short a [] = {
0x0876, 0x5421,
0xfedc, 0xba90,
0x90ab, 0xcdef,
0x1234, 0x5678
};
int
main ()
{
unsigned int d, r;
do
{
r = a [0];
d = r / 10;
r = ((r - d * 10) << 16) + a [1];
a [0] = d;
d = r / 10;
r = ((r - d * 10) << 16) + a [2];
a [1] = d;
d = r / 10;
r = ((r - d * 10) << 16) + a [3];
a [2] = d;
d = r / 10;
r = ((r - d * 10) << 16) + a [4];
a [3] = d;
d = r / 10;
r = ((r - d * 10) << 16) + a [5];
a [4] = d;
d = r / 10;
r = ((r - d * 10) << 16) + a [6];
a [5] = d;
d = r / 10;
r = ((r - d * 10) << 16) + a [7];
a [6] = d;
d = r / 10;
r = r - d * 10;
a [7] = d;
printf ("%u\n", r);
}
while (a[0] || a[1] || a[2] || a[3] || a [4] || a [5] || a[6] || a[7]);
return 0;
}
You actually don't need to implement long division. You need to implement multiplication by a power of two, and addition. You have four uint32_t values. First convert each of them to a string. Multiply them by (2^32)^3, (2^32)^2, (2^32)^1, and (2^32)^0 respectively, then add them together. You don't need to do the base conversion, you just need to handle putting the four pieces together. You'll obviously need to make sure the strings can handle a number up to UINT32_MAX*(2^32)^3.
Supposing you have a fast 32-bit multiplication and division the result can be computed 4 digits at a time by implementing a bigint division/modulo 10000 and then using (s)printf for output of digit groups.
This approach is also trivial to extend to higher (or even variable) precision...
#include <stdio.h>
typedef unsigned long bigint[4];
void print_bigint(bigint src)
{
unsigned long int x[8]; // expanded version (16 bit per element)
int result[12]; // 4 digits per element
int done = 0; // did we finish?
int i = 0; // digit group counter
/* expand to 16-bit per element */
x[0] = src[0] & 65535;
x[1] = src[0] >> 16;
x[2] = src[1] & 65535;
x[3] = src[1] >> 16;
x[4] = src[2] & 65535;
x[5] = src[2] >> 16;
x[6] = src[3] & 65535;
x[7] = src[3] >> 16;
while (!done)
{
done = 1;
{
unsigned long carry = 0;
int j;
for (j=7; j>=0; j--)
{
unsigned long d = (carry << 16) + x[j];
x[j] = d / 10000;
carry = d - x[j] * 10000;
if (x[j]) done = 0;
}
result[i++] = carry;
}
}
printf ("%i", result[--i]);
while (i > 0)
{
printf("%04i", result[--i]);
}
}
int main(int argc, const char *argv[])
{
bigint tests[] = { { 0, 0, 0, 0 },
{ 0xFFFFFFFFUL, 0, 0, 0 },
{ 0, 1, 0, 0 },
{ 0x12345678UL, 0x90abcdefUL, 0xfedcba90UL, 0x8765421UL } };
{
int i;
for (i=0; i<4; i++)
{
print_bigint(tests[i]);
printf("\n");
}
}
return 0;
}
@Alexey Frunze's method is easy but it's very slow. You should use @chill's 32-bit integer method above. Another easy method without any multiplication or division is double dabble. This may be slower than chill's algorithm but much faster than Alexey's. After running it you'll have a packed BCD representation of the decimal number.
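A minimal sketch of double dabble for this problem could look as follows. The assumptions are labeled here and in the comments: the name double_dabble128 is made up, n[0] is taken to be the least significant word, and for readability each decimal digit is kept in its own byte instead of being packed two per byte. Only shifts, comparisons and small additions are used.

```c
#include <stdint.h>

#define DD_DIGITS 39   /* 2^128 - 1 has 39 decimal digits */

/* n[0] is assumed to hold the least significant 32 bits.
   out must have room for DD_DIGITS characters plus the terminator. */
void double_dabble128(const uint32_t n[4], char out[DD_DIGITS + 1]) {
    uint8_t bcd[DD_DIGITS] = {0};          /* bcd[0] = least significant digit */

    for (int bit = 127; bit >= 0; bit--) {
        /* "dabble": any digit >= 5 gets +3 so that doubling carries in base 10 */
        for (int d = 0; d < DD_DIGITS; d++)
            if (bcd[d] >= 5)
                bcd[d] += 3;

        /* "double": shift the whole digit string left by one bit,
           bringing in the next input bit (most significant first) */
        unsigned carry = (n[bit / 32] >> (bit % 32)) & 1u;
        for (int d = 0; d < DD_DIGITS; d++) {
            unsigned v = ((unsigned)bcd[d] << 1) | carry;
            carry = v >> 4;                /* bit that falls out of the nibble */
            bcd[d] = (uint8_t)(v & 0xF);
        }
    }

    /* skip leading zeros, then emit the most significant digit first */
    int top = DD_DIGITS - 1;
    while (top > 0 && bcd[top] == 0)
        top--;
    int len = 0;
    for (int d = top; d >= 0; d--)
        out[len++] = (char)('0' + bcd[d]);
    out[len] = '\0';
}
```

The inner work is 128 passes over 39 digits, which is fixed and small; packing two digits per byte would halve the storage at the cost of some masking.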
On GitHub there is an open-source project (C++) which provides classes for the data types uint256_t and uint128_t.
https://github.com/calccrypto/uint256_t
No, I'm not affiliated with that project, but I used it for such a purpose, and I guess it could be useful for others as well.
For an assignment in a course called High Performance Computing, I am required to optimize the following code fragment:
int foobar(int a, int b, int N)
{
int i, j, k, x, y;
x = 0;
y = 0;
k = 256;
for (i = 0; i <= N; i++) {
for (j = i + 1; j <= N; j++) {
x = x + 4*(2*i+j)*(i+2*k);
if (i > j){
y = y + 8*(i-j);
}else{
y = y + 8*(j-i);
}
}
}
return x;
}
Using some recommendations, I managed to optimize the code (or at least I think so), such as:
Constant Propagation
Algebraic Simplification
Copy Propagation
Common Subexpression Elimination
Dead Code Elimination
Loop Invariant Removal
bitwise shifts instead of multiplication as they are less expensive.
Here's my code:
int foobar(int a, int b, int N) {
int i, j, x, y, t;
x = 0;
y = 0;
for (i = 0; i <= N; i++) {
t = i + 512;
for (j = i + 1; j <= N; j++) {
x = x + ((i<<3) + (j<<2))*t;
}
}
return x;
}
According to my instructor, well-optimized code should have fewer, or less costly, instructions at the assembly-language level, and should therefore run in less time than the original code. That is, the calculation is:
execution time = instruction count * cycles per instruction
When I generate assembly code using the command: gcc -o code_opt.s -S foobar.c,
the generated code has many more lines than the original despite the optimizations I made, and the run time is lower, but not by much compared with the original code. What am I doing wrong?
I am not pasting the assembly code because both listings are very long. I call the function "foobar" from main and measure the execution time using the time command in Linux:
int main () {
int a,b,N;
scanf ("%d %d %d",&a,&b,&N);
printf ("%d\n",foobar (a,b,N));
return 0;
}
Initially:
for (i = 0; i <= N; i++) {
for (j = i + 1; j <= N; j++) {
x = x + 4*(2*i+j)*(i+2*k);
if (i > j){
y = y + 8*(i-j);
}else{
y = y + 8*(j-i);
}
}
}
Removing y calculations:
for (i = 0; i <= N; i++) {
for (j = i + 1; j <= N; j++) {
x = x + 4*(2*i+j)*(i+2*k);
}
}
Splitting i, j, k:
for (i = 0; i <= N; i++) {
for (j = i + 1; j <= N; j++) {
x = x + 8*i*i + 16*i*k ; // multiple of 1 (no j)
x = x + (4*i + 8*k)*j ; // multiple of j
}
}
Moving them externally (and removing the loop that runs N-i times):
for (i = 0; i <= N; i++) {
x = x + (8*i*i + 16*i*k) * (N-i) ;
x = x + (4*i + 8*k) * ((N*N+N)/2 - (i*i+i)/2) ;
}
Rewriting (this is algebra, not C: -1/2 means the fraction, not integer division):
for (i = 0; i <= N; i++) {
x = x + ( 8*k*(N*N+N)/2 ) ;
x = x + i * ( 16*k*N + 4*(N*N+N)/2 + 8*k*(-1/2) ) ;
x = x + i*i * ( 8*N + 16*k*(-1) + 4*(-1/2) + 8*k*(-1/2) );
x = x + i*i*i * ( 8*(-1) + 4*(-1/2) ) ;
}
Rewriting - recalculating:
for (i = 0; i <= N; i++) {
x = x + 4*k*(N*N+N) ; // multiple of 1
x = x + i * ( 16*k*N + 2*(N*N+N) - 4*k ) ; // multiple of i
x = x + i*i * ( 8*N - 20*k - 2 ) ; // multiple of i^2
x = x + i*i*i * ( -10 ) ; // multiple of i^3
}
Another move to external (and removal of the i loop):
x = x + ( 4*k*(N*N+N) ) * (N+1) ;
x = x + ( 16*k*N + 2*(N*N+N) - 4*k ) * ((N*(N+1))/2) ;
x = x + ( 8*N - 20*k - 2 ) * ((N*(N+1)*(2*N+1))/6);
x = x + (-10) * ((N*N*(N+1)*(N+1))/4) ;
Both the above loop removals use the summation formulas:
Sum(1, i = 0..n) = n+1
Sum(i, i = 0..n) = n(n + 1)/2
Sum(i^2, i = 0..n) = n(n + 1)(2n + 1)/6
Sum(i^3, i = 0..n) = n^2(n + 1)^2/4
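Putting the four sums together gives a loop-free function that can be checked against the original double loop. This is a sketch under a few assumptions: the names foobar_loops and foobar_sums are made up, and long long is used instead of int so that the comparison is not disturbed by overflow (the original int version overflows for moderately large N).

```c
/* Reference: the double loop from the question, with k = 256. */
long long foobar_loops(int N) {
    long long x = 0, k = 256;
    for (long long i = 0; i <= N; i++)
        for (long long j = i + 1; j <= N; j++)
            x += 4 * (2 * i + j) * (i + 2 * k);
    return x;
}

/* Loop-free version built from the four summation formulas. */
long long foobar_sums(int N) {
    long long n = N, k = 256;
    if (n < 0)
        return 0;                                   /* the loops never run */
    long long s0 = n + 1;                           /* Sum(1)   */
    long long s1 = n * (n + 1) / 2;                 /* Sum(i)   */
    long long s2 = n * (n + 1) * (2 * n + 1) / 6;   /* Sum(i^2) */
    long long s3 = n * n * (n + 1) * (n + 1) / 4;   /* Sum(i^3) */
    return 4 * k * (n * n + n) * s0
         + (16 * k * n + 2 * (n * n + n) - 4 * k) * s1
         + (8 * n - 20 * k - 2) * s2
         - 10 * s3;
}
```

All three divisions are exact for integer n, so no rounding is introduced. For N = 1 both functions give 2048 and for N = 2 both give 14352.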
y does not affect the final result of the code - removed:
int foobar(int a, int b, int N)
{
int i, j, k, x, y;
x = 0;
//y = 0;
k = 256;
for (i = 0; i <= N; i++) {
for (j = i + 1; j <= N; j++) {
x = x + 4*(2*i+j)*(i+2*k);
//if (i > j){
// y = y + 8*(i-j);
//}else{
// y = y + 8*(j-i);
//}
}
}
return x;
}
k is simply a constant:
int foobar(int a, int b, int N)
{
int i, j, x;
x = 0;
for (i = 0; i <= N; i++) {
for (j = i + 1; j <= N; j++) {
x = x + 4*(2*i+j)*(i+2*256);
}
}
return x;
}
The inner expression can be transformed to: x += 8*i*i + 4096*i + 4*i*j + 2048*j. Use math to push all of them to the outer loop: x += 8*i*i*(N-i) + 4096*i*(N-i) + 2*i*(N-i)*(N+i+1) + 1024*(N-i)*(N+i+1).
You can expand the above expression, and apply the sum of squares and sum of cubes formulas to obtain a closed-form expression, which should run faster than the doubly nested loop. I leave it as an exercise to you. As a result, i and j will also be removed.
a and b should also be removed if possible, since a and b are supplied as arguments but never used in your code.
Sum of squares and sum of cubes formula:
Sum(x^2, x = 1..n) = n(n + 1)(2n + 1)/6
Sum(x^3, x = 1..n) = n^2(n + 1)^2/4
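The intermediate step, with only the outer loop left, can be sketched like this. The function name foobar_outer is made up for illustration, and long long is an assumption chosen to keep the arithmetic from overflowing in tests.

```c
/* Outer loop only: the inner loop over j is replaced by its sum.
   For a fixed i, j runs over i+1..N, so there are (N - i) terms and
   the sum of j is (N - i) * (N + i + 1) / 2. */
long long foobar_outer(int N) {
    long long x = 0, n = N;
    for (long long i = 0; i <= n; i++) {
        long long cnt = n - i;               /* number of j values      */
        long long twice = cnt * (n + i + 1); /* 2 * (sum of j values)   */
        x += 8 * i * i * cnt                 /* 8*i*i     per j         */
           + 4096 * i * cnt                  /* 4096*i    per j         */
           + 2 * i * twice                   /* 4*i*j     summed over j */
           + 1024 * twice;                   /* 2048*j    summed over j */
    }
    return x;
}
```

This runs in O(N) instead of O(N^2); applying the sum formulas to the remaining loop removes it as well.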
This function is equivalent to the following formula, which contains only 4 integer multiplications and 1 integer division:
x = N * (N + 1) * (N * (7 * N + 8187) - 2050) / 6;
To get this, I simply typed the sum calculated by your nested loops into Wolfram Alpha:
sum (sum (8*i*i+4096*i+4*i*j+2048*j), j=i+1..N), i=0..N
Here is the direct link to the solution. Think before coding. Sometimes your brain can optimize code better than any compiler.
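Wrapped in a function, the formula might look like the sketch below. The name foobar_closed, the long long type and the guard for negative N are assumptions added here; the original returns int, and for N of a few hundred the exact sum no longer fits in a 32-bit int, so results would differ once the original overflows.

```c
/* Closed form of the double loop (with k = 256). */
long long foobar_closed(long long N) {
    if (N < 0)
        return 0;   /* the original loops do not execute for negative N */
    return N * (N + 1) * (N * (7 * N + 8187) - 2050) / 6;
}
```

For N = 1 this gives 2048, which matches the single iteration i = 0, j = 1 of the original loops: 4 * (0 + 1) * (0 + 512).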
Briefly scanning the first routine, the first thing you notice is that expressions involving "y" are completely unused and can be eliminated (as you did). This further permits eliminating the if/else (as you did).
What remains is the two for loops and the messy expression. Factoring out the pieces of that expression that do not depend on j is the next step. You removed one such expression, but (i<<3) (ie, i * 8) remains in the inner loop, and can be removed.
Pascal's answer reminded me that you can use a loop stride optimization. First move (i<<3) * t out of the inner loop (call it i1), then calculate, when initializing the loop, a value j1 that equals (i<<2) * t. On each iteration increment j1 by 4 * t (which is a pre-calculated constant). Replace your inner expression with x = x + i1 + j1;.
One suspects that there may be some way to combine the two loops into one, with a stride, but I'm not seeing it offhand.
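The stride idea described above can be sketched as follows. The function name foobar_stride and the variable step are made up; i1, j1 and t follow the text. One deliberate difference: j1 starts at the value for the first j, which is i + 1, because in this sketch it is used before it is incremented.

```c
/* Strength reduction: the inner loop contains only additions. */
int foobar_stride(int N) {
    int x = 0;
    for (int i = 0; i <= N; i++) {
        int t = i + 512;
        int i1 = (i << 3) * t;          /* 8*i*t, invariant in the inner loop */
        int step = t << 2;              /* 4*t, the stride for j1             */
        int j1 = ((i + 1) << 2) * t;    /* 4*j*t for the first j = i + 1      */
        for (int j = i + 1; j <= N; j++) {
            x += i1 + j1;               /* same as (8*i + 4*j) * t            */
            j1 += step;                 /* advance to the next j              */
        }
    }
    return x;
}
```

An optimizing compiler will often do this transformation itself, so whether it helps in practice depends on the flags used; at the default optimization level it should at least reduce the number of multiplications executed.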
A few other things I can see. You don't need y, so you can remove its declaration and initialisation.
Also, the values passed in for a and b aren't actually used, so you could use these as local variables instead of x and t.
Also, rather than adding i to 512 each time through you can note that t starts at 512 and increments by 1 each iteration.
int foobar(int a, int b, int N) {
int i, j;
a = 0;
b = 512;
for (i = 0; i <= N; i++, b++) {
for (j = i + 1; j <= N; j++) {
a = a + ((i<<3) + (j<<2))*b;
}
}
return a;
}
Once you get to this point you can also observe that, aside from initialising j, i and j are only used in a single multiple each: i<<3 and j<<2. We can code this directly in the loop logic, thus:
int foobar(int a, int b, int N) {
int i, j, iLimit, jLimit;
a = 0;
b = 512;
iLimit = N << 3;
jLimit = N << 2;
for (i = 0; i <= iLimit; i+=8) {
for (j = (i >> 1) + 4; j <= jLimit; j+=4) { // parentheses needed: + binds tighter than >>
a = a + (i + j)*b;
}
b++;
}
return a;
}
OK... so here is my solution, along with inline comments to explain what I did and how.
int foobar(int N)
{ // We eliminate unused arguments
int x = 0, i = 0, i2 = 0, j, k, z;
// We only iterate up to N on the outer loop, since the
// last iteration doesn't do anything useful. Also we keep
// track of '2*i' (which is used throughout the code) by a
// second variable 'i2' which we increment by two in every
// iteration, essentially converting multiplication into addition.
while(i < N)
{
// We hoist the calculation '4 * (i+2*k)' out of the loop
// since k is a literal constant and 'i' is a constant during
// the inner loop. We could convert the multiplication by 2
// into a left shift, but hey, let's not go *crazy*!
//
// (4 * (i+2*k)) <=>
// (4 * i) + (4 * 2 * k) <=>
// (2 * i2) + (8 * k) <=>
// (2 * i2) + (8 * 256) <=>
// (2 * i2) + 2048
k = (2 * i2) + 2048;
// We have now converted the expression:
// x = x + 4*(2*i+j)*(i+2*k);
//
// into the expression:
// x = x + (i2 + j) * k;
//
// Counterintuitively we now *expand* the formula into:
// x = x + (i2 * k) + (j * k);
//
// Now observe that (i2 * k) is a constant inside the inner
// loop which we can calculate only once here. Also observe
// that it is simply added into x a total of (N - i) times, so
// we take advantage of the abelian nature of addition
// to hoist it completely out of the loop
x = x + (i2 * k) * (N - i);
// Observe that inside this loop we calculate (j * k) repeatedly,
// and that j is just an increasing counter. So now instead of
// doing numerous multiplications, let's break the operation into
// two parts: a multiplication, which we hoist out of the inner
// loop and additions which we continue performing in the inner
// loop.
z = i * k;
for (j = i + 1; j <= N; j++)
{
z = z + k;
x = x + z;
}
i++;
i2 += 2;
}
return x;
}
The code, without any of the explanations boils down to this:
int foobar(int N)
{
int x = 0, i = 0, i2 = 0, j, k, z;
while(i < N)
{
k = (2 * i2) + 2048;
x = x + (i2 * k) * (N - i);
z = i * k;
for (j = i + 1; j <= N; j++)
{
z = z + k;
x = x + z;
}
i++;
i2 += 2;
}
return x;
}
I hope this helps.
int foobar(int N) // the unused arguments a and b are dropped
{
    int i, j, x=0; // unused variables and operations removed to save stack space and machine cycles
    for (i = N; i--; ) // count down so the loop test is just a comparison with zero (assumes N >= 0)
        for (j = N+1; --j>i; )
            x += (((i<<1)+j)*(i+512)<<2); // shifts instead of multiplications by 2 and 4
    return x;
}