Fastest way to compute maximal n s.t. n over k <= x - c

I'm looking for a fast way to compute the maximal n s.t. n over k <= x for given k and x.
In my context n \leq n' for some known constant n', let's say 1000. k is either 1, 2, or 3, and x is chosen at random from 0 ... n' over k.
My current approach is to compute the binomial coefficient iteratively, starting from a_0 = k over k = 1. The next coefficient a_1 = k+1 over k can be computed as a_1 = a_0 * (k+1) / 1, and so on.
The current C code looks like this
#include <stdint.h>
#include <stdio.h>
uint32_t max_bc(const uint32_t a, const uint32_t n, const uint32_t k) {
uint32_t tmp = 1;
int ctr = 0;
uint32_t c = k, d = 1;
while(tmp <= a && ctr < n) {
c += 1;
tmp = tmp*c/d;
ctr += 1;
d += 1;
}
return ctr + k - 1;
}
int main() {
const uint32_t n = 10, w = 2;
for (uint32_t a = 0; a < 10 /*bc(n, w)*/; a++) {
const uint32_t b = max_bc(a, n, w);
printf("%u %u\n", a, b);
}
}
which outputs
0 1
1 2
2 2
3 3
4 3
5 3
6 4
7 4
8 4
9 4
So I'm looking for a bit trick or something similar to get around the while loop and speed up my application, because the while loop gets executed n-k times in the worst case. Precomputation is not an option, because this code is part of a bigger algorithm which uses a lot of memory.
Thanks to @Aleksei
This is my solution:
template<typename T, const uint32_t k>
inline T opt_max_bc(const T a, const uint32_t n) {
if constexpr(k == 1) {
return n - k - a;
}
if constexpr (k == 2) {
const uint32_t t = __builtin_floor((double)(__builtin_sqrt(8 * a + 1) + 1)/2.);
return n - t - 1;
}
if constexpr (k == 3) {
if (a == 1)
return n-k-1;
float x = a;
float t1 = sqrtf(729.f * x * x);
float t2 = cbrtf(3.f * t1 + 81.f * x);
float t3 = t2 / 2.09f;
float ctr2 = t3;
int ctr = int(ctr2);
return n - ctr - k;
}
if constexpr (k == 4) {
const float x = a;
const float t1 = __builtin_floorf(__builtin_sqrtf(24.f * x + 1.f));
const float t2 = __builtin_floorf(__builtin_sqrtf(4.f * t1 + 5.f));
uint32_t ctr = (t2 + 3.f)/ 2.f - 3;
return n - ctr - k;
}
// will never happen
return -1;
}

If k is really limited to just 1, 2 or 3, you can use different methods depending on k:
k == 1: C(n, 1) = n <= x, so the answer is x.
k == 2: C(n, 2) = n * (n - 1) / 2 <= x. You can solve the equation n * (n - 1) / 2 = x; the positive solution is n = 1/2 (sqrt(8x + 1) + 1), so the answer to the initial question should be floor( 1/2 (sqrt(8x + 1) + 1) ).
k == 3: C(n, 3) = n(n-1)(n-2)/6 <= x. There is no nice solution, but the formula for the number of combinations is straightforward, so you can use a binary search to find the answer.
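The three cases above can be sketched in C as follows. This is only an illustration under stated assumptions: the names max_n_direct, choose3 and isqrt64 are made up for this sketch, n_max plays the role of n' and is assumed to be at least k, and an integer square root is swapped in for the floating-point sqrt so that the k == 2 case cannot be thrown off by rounding.

```c
#include <stdint.h>

/* Integer square root: largest r with r * r <= v (v is assumed to be below 2^62). */
static uint64_t isqrt64(uint64_t v)
{
    uint64_t r = 0;
    for (uint64_t bit = 1ull << 31; bit != 0; bit >>= 1) {
        uint64_t t = r | bit;
        if (t * t <= v)
            r = t;
    }
    return r;
}

/* C(n, 3) in 64-bit arithmetic; valid for n >= 2. */
static uint64_t choose3(uint64_t n)
{
    return n * (n - 1) * (n - 2) / 6;
}

/* Hypothetical helper: largest n <= n_max with C(n, k) <= x, for k in {1, 2, 3}. */
uint32_t max_n_direct(uint32_t x, uint32_t n_max, uint32_t k)
{
    if (k == 1)                       /* C(n, 1) = n */
        return x < n_max ? x : n_max;
    if (k == 2) {                     /* n = floor((sqrt(8x + 1) + 1) / 2) */
        uint64_t n = (isqrt64(8ull * x + 1) + 1) / 2;
        return n < n_max ? (uint32_t)n : n_max;
    }
    /* k == 3: binary search; C(2, 3) = 0 <= x always holds, so lo = 2 is valid. */
    uint32_t lo = 2, hi = n_max;
    while (lo < hi) {
        uint32_t mid = lo + (hi - lo + 1) / 2;
        if (choose3(mid) <= x)
            lo = mid;
        else
            hi = mid - 1;
    }
    return lo;
}
```

For k == 2 and x = 0 ... 9 this reproduces the column 1 2 2 3 3 3 4 4 4 4 printed by the loop-based version in the question.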


I tried attempting a question in hacker earth and it passed for most of the tested inputs. The answer was partially accepted [closed]

This is the question:
Here is the expected answer:
Here is my code:
#include <stdio.h>
int main() {
long trial, n, p, A, B, tp[10000] = {}, min;
scanf("%li", &trial);
for (int i = 0; i < trial; i++) {
scanf("%li %li %li", &p, &A, &B);
for (int m = 0, n = p; m <= p, n >= 0; m++, n--) {
tp[m] = ((A * m * m) + (B * n * n));
}
min = tp[0];
for (int j = 1; j <= p; j++) {
if (tp[j] < min) {
min = tp[j];
}
}
printf("%li\n", min);
}
}
What can I do to get the right answer?
There are some problems in your code:
the initializer = {} is valid only from C23 onward (and in C++); older C implementations may reject it. The array does not need initializing anyway.
the test m <= p, n >= 0 is somewhat incorrect: only the second part n >= 0 is tested, the comma operator ignores the result of its left operand. This happens to suffice, but you should just write:
for (int m = 0, n = p; m <= p; m++, n--)
the array tp only has 10000 elements, which is not enough for up to 100000 (10^5) cases as stated in the problem description. You probably have a buffer overflow for some test cases. Note that you can simplify the code and remove the array: you just need a single loop and keep track of the best price:
#include <stdio.h>
int main() {
long trials = 0;
scanf("%li", &trials);
while (trials --> 0) {
long p = 0, A = 0, B = 0;
scanf("%li %li %li", &p, &A, &B);
long min = B * p * p;
for (long m = 1; m <= p; m++) {
long price = A * m * m + B * (p - m) * (p - m);
if (min > price)
min = price;
}
printf("%li\n", min);
}
return 0;
}
The problem can be solved directly using calculus, reducing the time complexity from O(p) to O(1), which might be required to fit within the time limit of 1 second, although 100000 iterations should easily complete in 1 second.
The problem consists in finding the minimum of the quadratic function f(x) = A*x^2 + B*(p-x)^2
Expanding: f(x) = (A+B)*x^2 - 2*B*p*x + B*p^2
First derivative: f'(x) = 2*(A+B)*x - 2*B*p
Assuming A and B are positive, the minimum is obtained for x = B*p / (A+B)
Since x must be an integer, the solution is obtained by testing at most 2 numbers: B * p / (A + B) (rounded down by integer division) and B * p / (A + B) + 1
Here is an analytical solution:
#include <stdio.h>
int main() {
long trials = 0;
scanf("%li", &trials);
while (trials --> 0) {
long p = 0, A = 0, B = 0;
scanf("%li %li %li", &p, &A, &B);
// assuming A, B and p are positive
long min = 0;
if (A + B != 0) {
long m = B * p / (A + B);
if (m < 0) { // cannot happen if A, B and p are positive
min = B * p * p;
} else
if (m >= p) {
min = A * p * p;
} else {
long price1 = A * m * m + B * (p - m) * (p - m);
long price2 = A * (m + 1) * (m + 1) +
B * (p - m - 1) * (p - m - 1);
min = price1 < price2 ? price1 : price2;
}
}
printf("%li\n", min);
}
return 0;
}

I need to determine the successive odd numbers whose sum is equal to n^3, for n = 1, ..., 20 (as an example 1^3 = 1; 2^3 = 3 + 5; 3^3 = 7 + 9 + 11)

I tried to solve it my own way, but instead of printing consecutive odd numbers, my program prints the cube itself. How can I make it print the consecutive odd numbers whose sum equals n^3 (a cubic number), as in the examples above? Thanks in advance, and please explain how you did it. Please use C when giving a more viable solution.
#include <stdio.h>
#include <math.h>
int main()
{
int n, sum;
printf("Value of n= ");
scanf("%d",&n);
for(int n=1; n<=20; n++)
{
if(n%2!=0);
}
printf("%d", sum=pow(n,3));
return 0;
}
Instead of thinking about it like that, I tried a few numbers and found the solution in math. I guess this is not what you wanted, but it does work and consists of only odd numbers.
1^3=1
2^3=3+5
3^3=7+9+11
4^3=13+15+17+19
5^3=21+23+25+27+29
...
x^3=[x*(x-1)+1]+[x*(x-1)+3]+...+[x*(x-1)+2x-1]
for math proof, define it as a series:
An=x^2-x+1+2(n-1)=x^2-x+1+2n-2=x^2-x+2n-1
Sx=(A1+Ax)*x/2=[(x^2-x+1)+(x^2-x+2x-1)]*x/2=(2x^2)*x/2=x^3
Also, because n(n-1) is always even, the numbers must be odd.
to write it as code:
void printOddNumbers(int n)
{
int a1 = n * (n - 1) + 1;
for(int i = 0; i < n; i++)
{
printf("%d+", a1 + 2 * i);
}
printf("\b=%d^3=%d\n", n, n * n * n);
}
So the output will look like: 13+15+17+19=4^3=64.
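To check the formula for every n from 1 to 20, as the question asks, a small test helper can add up the terms and compare against n^3. This is only a sketch; the names first_odd_term and cube_decomposition_ok are hypothetical and not part of the answer above.

```c
/* First of the n consecutive odd numbers, per the formula x*(x-1)+1 above. */
long first_odd_term(int n)
{
    return (long)n * (n - 1) + 1;
}

/* Returns 1 if the n consecutive odd numbers starting at first_odd_term(n)
   are all odd and sum to n^3, and 0 otherwise. */
int cube_decomposition_ok(int n)
{
    long sum = 0;
    for (int i = 0; i < n; i++) {
        long term = first_odd_term(n) + 2L * i;
        if (term % 2 == 0)
            return 0;          /* an even term would break the claim */
        sum += term;
    }
    return sum == (long)n * n * n;
}
```

Calling cube_decomposition_ok(n) in a loop for n = 1 ... 20 exercises the same range as the original exercise.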
So, you have an input, n, which is the number of consecutive odd numbers that should add up to a given sum. This is how it looks:
sum = k + (k + 2) + ... + (k + 2n - 2) =
= n * k + 2 * (1 + 2 + ... + n - 1) =
= n * k + 2 (n * (n - 1) / 2) =
= n * k + n * (n - 1) =
= n * (n + k - 1)
n is known, k is an unknown odd number. So, for sum = 27 and n = 3 this would mean
3 * (3 + k - 1) = 27
3 + k - 1 = 9
k + 2 = 9
k = 7
7 + 9 + 11 = 27
For sum = 125, n = 5:
5 * (5 + k - 1) = 125
5 + k - 1 = 25
k = 21
21 + 23 + 25 + 27 + 29 = 125
So, the implementation would look like this:
int getK(int n, int sum) {
// From sum = n * (n + k - 1) it follows that k = sum / n - n + 1
int k = (sum / n) - n + 1;
int currentSum = k;
// Add the remaining n - 1 terms: k + 2, k + 4, ..., k + 2 * (n - 1)
for (int i = 1; i < n; i++) currentSum += 2 * i + k;
return ((currentSum == sum) && (k % 2)) ? k : 0;
}
Explanation: We return the smallest number of the set when the problem is solvable. If k is not odd, we return 0 as a sign of error. Also, if the sum does not add up, the problem is unsolvable and we return 0.

128-bit integer to string [duplicate]

I'm trying to convert a 128-bit unsigned integer stored as an array of 4 unsigned ints to the decimal string representation in C:
unsigned int src[] = { 0x12345678, 0x90abcdef, 0xfedcba90, 0x8765421 };
printf("%s", some_func(src)); // gives "53072739890371098123344"
(The input and output examples above are completely fictional; I have no idea what that input would produce.)
If I was going to hex, binary or octal, this would be a simple matter of masks and bit shifts to peel off the least significant characters. However, it seems to me that I need to do base-10 division. Unfortunately, I can't remember how to do that across multiple ints, and the system I'm using doesn't support data types larger than 32 bits, so using a 128-bit type is not possible. Using a different language is also out, and I'd rather avoid a big number library just for this one operation.
Division is not necessary:
#include <string.h>
#include <stdio.h>
typedef unsigned long uint32;
/* N[0] - contains least significant bits, N[3] - most significant */
char* Bin128ToDec(const uint32 N[4])
{
// log10(x) = log2(x) / log2(10) ~= log2(x) / 3.322
static char s[128 / 3 + 1 + 1];
uint32 n[4];
char* p = s;
int i;
memset(s, '0', sizeof(s) - 1);
s[sizeof(s) - 1] = '\0';
memcpy(n, N, sizeof(n));
for (i = 0; i < 128; i++)
{
int j, carry;
carry = (n[3] >= 0x80000000);
// Shift n[] left, doubling it
n[3] = ((n[3] << 1) & 0xFFFFFFFF) + (n[2] >= 0x80000000);
n[2] = ((n[2] << 1) & 0xFFFFFFFF) + (n[1] >= 0x80000000);
n[1] = ((n[1] << 1) & 0xFFFFFFFF) + (n[0] >= 0x80000000);
n[0] = ((n[0] << 1) & 0xFFFFFFFF);
// Add s[] to itself in decimal, doubling it
for (j = sizeof(s) - 2; j >= 0; j--)
{
s[j] += s[j] - '0' + carry;
carry = (s[j] > '9');
if (carry)
{
s[j] -= 10;
}
}
}
while ((p[0] == '0') && (p < &s[sizeof(s) - 2]))
{
p++;
}
return p;
}
int main(void)
{
static const uint32 testData[][4] =
{
{ 0, 0, 0, 0 },
{ 1048576, 0, 0, 0 },
{ 0xFFFFFFFF, 0, 0, 0 },
{ 0, 1, 0, 0 },
{ 0x12345678, 0x90abcdef, 0xfedcba90, 0x8765421 }
};
printf("%s\n", Bin128ToDec(testData[0]));
printf("%s\n", Bin128ToDec(testData[1]));
printf("%s\n", Bin128ToDec(testData[2]));
printf("%s\n", Bin128ToDec(testData[3]));
printf("%s\n", Bin128ToDec(testData[4]));
return 0;
}
Output:
0
1048576
4294967295
4294967296
11248221411398543556294285637029484152
Straightforward division base 2^32, prints decimal digits in reverse order, uses 64-bit arithmetic, complexity O(n) where n is the number of decimal digits in the representation:
#include <stdio.h>
unsigned int a [] = { 0x12345678, 0x12345678, 0x12345678, 0x12345678 };
/* 24197857161011715162171839636988778104 */
int
main ()
{
unsigned long long d, r;
do
{
r = a [0];
d = r / 10;
r = ((r - d * 10) << 32) + a [1];
a [0] = d;
d = r / 10;
r = ((r - d * 10) << 32) + a [2];
a [1] = d;
d = r / 10;
r = ((r - d * 10) << 32) + a [3];
a [2] = d;
d = r / 10;
r = r - d * 10;
a [3] = d;
printf ("%u\n", (unsigned int) r);
}
while (a[0] || a[1] || a[2] || a[3]);
return 0;
}
EDIT: Corrected the loop so it displays a 0 if the array a contains only zeros.
Also, the array is read left to right, a[0] is most-significant, a[3] is least significant digits.
A slow but simple approach is to just print the digits from most significant to least significant using subtraction. Basically you need a function for checking if x >= y and another for computing x -= y when that is the case.
Then you can start counting how many times you can subtract 10^38 (and this will be most significant digit), then how many times you can subtract 10^37 ... down to how many times you can subtract 1.
The following is a full implementation of this approach:
#include <stdio.h>
typedef unsigned ui128[4];
int ge128(ui128 a, ui128 b)
{
int i = 3;
while (i >= 0 && a[i] == b[i])
--i;
return i < 0 ? 1 : a[i] >= b[i];
}
void sub128(ui128 a, ui128 b)
{
int i = 0;
int borrow = 0;
while (i < 4)
{
int next_borrow = (borrow && a[i] <= b[i]) || (!borrow && a[i] < b[i]);
a[i] -= b[i] + borrow;
borrow = next_borrow;
i += 1;
}
}
ui128 deci128[] = {{1u,0u,0u,0u},
{10u,0u,0u,0u},
{100u,0u,0u,0u},
{1000u,0u,0u,0u},
{10000u,0u,0u,0u},
{100000u,0u,0u,0u},
{1000000u,0u,0u,0u},
{10000000u,0u,0u,0u},
{100000000u,0u,0u,0u},
{1000000000u,0u,0u,0u},
{1410065408u,2u,0u,0u},
{1215752192u,23u,0u,0u},
{3567587328u,232u,0u,0u},
{1316134912u,2328u,0u,0u},
{276447232u,23283u,0u,0u},
{2764472320u,232830u,0u,0u},
{1874919424u,2328306u,0u,0u},
{1569325056u,23283064u,0u,0u},
{2808348672u,232830643u,0u,0u},
{2313682944u,2328306436u,0u,0u},
{1661992960u,1808227885u,5u,0u},
{3735027712u,902409669u,54u,0u},
{2990538752u,434162106u,542u,0u},
{4135583744u,46653770u,5421u,0u},
{2701131776u,466537709u,54210u,0u},
{1241513984u,370409800u,542101u,0u},
{3825205248u,3704098002u,5421010u,0u},
{3892314112u,2681241660u,54210108u,0u},
{268435456u,1042612833u,542101086u,0u},
{2684354560u,1836193738u,1126043566u,1u},
{1073741824u,1182068202u,2670501072u,12u},
{2147483648u,3230747430u,935206946u,126u},
{0u,2242703233u,762134875u,1262u},
{0u,952195850u,3326381459u,12621u},
{0u,932023908u,3199043520u,126217u},
{0u,730304488u,1925664130u,1262177u},
{0u,3008077584u,2076772117u,12621774u},
{0u,16004768u,3587851993u,126217744u},
{0u,160047680u,1518781562u,1262177448u}};
void print128(ui128 x)
{
int i = 38;
int z = 0;
while (i >= 0)
{
int c = 0;
while (ge128(x, deci128[i]))
{
c++; sub128(x, deci128[i]);
}
if (i==0 || z || c > 0)
{
z = 1; putchar('0' + c);
}
--i;
}
}
int main(int argc, const char *argv[])
{
ui128 test = { 0x12345678, 0x90abcdef, 0xfedcba90, 0x8765421 };
print128(test);
return 0;
}
That number in the problem text in decimal becomes
11248221411398543556294285637029484152
and Python agrees this is the correct value (this of course doesn't mean the code is correct!!! ;-) )
Same thing, but with 32-bit integer arithmetic:
#include <stdio.h>
unsigned short a [] = {
0x0876, 0x5421,
0xfedc, 0xba90,
0x90ab, 0xcdef,
0x1234, 0x5678
};
int
main ()
{
unsigned int d, r;
do
{
r = a [0];
d = r / 10;
r = ((r - d * 10) << 16) + a [1];
a [0] = d;
d = r / 10;
r = ((r - d * 10) << 16) + a [2];
a [1] = d;
d = r / 10;
r = ((r - d * 10) << 16) + a [3];
a [2] = d;
d = r / 10;
r = ((r - d * 10) << 16) + a [4];
a [3] = d;
d = r / 10;
r = ((r - d * 10) << 16) + a [5];
a [4] = d;
d = r / 10;
r = ((r - d * 10) << 16) + a [6];
a [5] = d;
d = r / 10;
r = ((r - d * 10) << 16) + a [7];
a [6] = d;
d = r / 10;
r = r - d * 10;
a [7] = d;
printf ("%u\n", r);
}
while (a[0] || a[1] || a[2] || a[3] || a [4] || a [5] || a[6] || a[7]);
return 0;
}
You actually don't need to implement long division. You need to implement multiplication by a power of two, and addition. You have four uint32_t values. First convert each of them to a string. Multiply them by (2^32)^3, (2^32)^2, (2^32)^1, and (2^32)^0 respectively, then add them together. You don't need to do the base conversion, you just need to handle putting the four pieces together. You'll obviously need to make sure the strings can handle a number up to UINT32_MAX*(2^32)^3.
Supposing you have a fast 32-bit multiplication and division the result can be computed 4 digits at a time by implementing a bigint division/modulo 10000 and then using (s)printf for output of digit groups.
This approach is also trivial to extend to higher (or even variable) precision...
#include <stdio.h>
typedef unsigned long bigint[4];
void print_bigint(bigint src)
{
unsigned long int x[8]; // expanded version (16 bit per element)
int result[12]; // 4 digits per element
int done = 0; // did we finish?
int i = 0; // digit group counter
/* expand to 16-bit per element */
x[0] = src[0] & 65535;
x[1] = src[0] >> 16;
x[2] = src[1] & 65535;
x[3] = src[1] >> 16;
x[4] = src[2] & 65535;
x[5] = src[2] >> 16;
x[6] = src[3] & 65535;
x[7] = src[3] >> 16;
while (!done)
{
done = 1;
{
unsigned long carry = 0;
int j;
for (j=7; j>=0; j--)
{
unsigned long d = (carry << 16) + x[j];
x[j] = d / 10000;
carry = d - x[j] * 10000;
if (x[j]) done = 0;
}
result[i++] = carry;
}
}
printf ("%i", result[--i]);
while (i > 0)
{
printf("%04i", result[--i]);
}
}
int main(int argc, const char *argv[])
{
bigint tests[] = { { 0, 0, 0, 0 },
{ 0xFFFFFFFFUL, 0, 0, 0 },
{ 0, 1, 0, 0 },
{ 0x12345678UL, 0x90abcdefUL, 0xfedcba90UL, 0x8765421UL } };
{
int i;
for (i=0; i<4; i++)
{
print_bigint(tests[i]);
printf("\n");
}
}
return 0;
}
@Alexey Frunze's method is easy, but it's very slow. You should use @chill's 32-bit integer method above. Another easy method without any multiplication or division is double dabble. It may be slower than chill's algorithm, but it is much faster than Alexey's. After running it, you'll have a packed BCD representation of the decimal number.
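A minimal sketch of double dabble for the 128-bit case could look like this. It is an illustration, not the code of any answer here: the function name u128_to_dec_dabble is made up, the input is assumed to have n[0] as the least significant word, and for readability it keeps one decimal digit per byte (unpacked BCD) instead of the packed BCD mentioned above.

```c
#include <stdint.h>

/* Hypothetical helper: writes the decimal form of a 128-bit value into out.
   n[0] holds the least significant 32 bits; out needs room for 40 bytes
   (39 digits plus the terminating NUL). */
void u128_to_dec_dabble(const uint32_t n[4], char out[40])
{
    uint8_t bcd[39] = {0};              /* bcd[0] is the most significant digit */

    for (int bit = 127; bit >= 0; bit--) {
        /* "Dabble": add 3 to every digit that is 5 or more. */
        for (int d = 0; d < 39; d++)
            if (bcd[d] >= 5)
                bcd[d] += 3;

        /* "Double": shift all digits left by one bit, pulling in the next input bit. */
        unsigned carry = (n[bit / 32] >> (bit % 32)) & 1u;
        for (int d = 38; d >= 0; d--) {
            unsigned v = ((unsigned)bcd[d] << 1) | carry;
            carry = v >> 4;             /* overflow of the 4-bit digit */
            bcd[d] = (uint8_t)(v & 0xF);
        }
    }

    /* Skip leading zeros but always keep at least one digit. */
    int start = 0;
    while (start < 38 && bcd[start] == 0)
        start++;

    int len = 0;
    for (int d = start; d < 39; d++)
        out[len++] = (char)('0' + bcd[d]);
    out[len] = '\0';
}
```

The whole conversion uses only comparisons, additions of 3, and one-bit shifts, which is why it suits targets without fast division.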
On GitHub there is an open-source C++ project which provides classes for the datatypes uint256_t and uint128_t.
https://github.com/calccrypto/uint256_t
No, I'm not affiliated with that project, but I was using it for this purpose, and I guess it could be useful for others as well.

Optimization of C code

For an assignment in a course called High Performance Computing, I am required to optimize the following code fragment:
int foobar(int a, int b, int N)
{
int i, j, k, x, y;
x = 0;
y = 0;
k = 256;
for (i = 0; i <= N; i++) {
for (j = i + 1; j <= N; j++) {
x = x + 4*(2*i+j)*(i+2*k);
if (i > j){
y = y + 8*(i-j);
}else{
y = y + 8*(j-i);
}
}
}
return x;
}
Using some recommendations, I managed to optimize the code (or at least I think so), such as:
Constant Propagation
Algebraic Simplification
Copy Propagation
Common Subexpression Elimination
Dead Code Elimination
Loop Invariant Removal
bitwise shifts instead of multiplication as they are less expensive.
Here's my code:
int foobar(int a, int b, int N) {
int i, j, x, y, t;
x = 0;
y = 0;
for (i = 0; i <= N; i++) {
t = i + 512;
for (j = i + 1; j <= N; j++) {
x = x + ((i<<3) + (j<<2))*t;
}
}
return x;
}
According to my instructor, well-optimized code should have fewer or less costly instructions at the assembly-language level, and should therefore run in less time than the original code, since execution time is calculated as:
execution time = instruction count * cycles per instruction
When I generate assembly code using the command: gcc -o code_opt.s -S foobar.c,
the generated code has many more lines than the original, despite the optimizations I made, and the run time is lower, but not by much compared to the original code. What am I doing wrong?
I have not pasted the assembly code, as both listings are very long. I'm calling the function "foobar" from main and measuring the execution time with the time command in Linux:
int main () {
int a,b,N;
scanf ("%d %d %d",&a,&b,&N);
printf ("%d\n",foobar (a,b,N));
return 0;
}
Initially:
for (i = 0; i <= N; i++) {
for (j = i + 1; j <= N; j++) {
x = x + 4*(2*i+j)*(i+2*k);
if (i > j){
y = y + 8*(i-j);
}else{
y = y + 8*(j-i);
}
}
}
Removing y calculations:
for (i = 0; i <= N; i++) {
for (j = i + 1; j <= N; j++) {
x = x + 4*(2*i+j)*(i+2*k);
}
}
Splitting i, j, k:
for (i = 0; i <= N; i++) {
for (j = i + 1; j <= N; j++) {
x = x + 8*i*i + 16*i*k ; // multiple of 1 (no j)
x = x + (4*i + 8*k)*j ; // multiple of j
}
}
Moving them externally (and removing the loop that runs N-i times):
for (i = 0; i <= N; i++) {
x = x + (8*i*i + 16*i*k) * (N-i) ;
x = x + (4*i + 8*k) * ((N*N+N)/2 - (i*i+i)/2) ;
}
Rewriting:
for (i = 0; i <= N; i++) {
x = x + ( 8*k*(N*N+N)/2 ) ;
x = x + i * ( 16*k*N + 4*(N*N+N)/2 + 8*k*(-1/2) ) ;
x = x + i*i * ( 8*N + 16*k*(-1) + 4*(-1/2) + 8*k*(-1/2) );
x = x + i*i*i * ( 8*(-1) + 4*(-1/2) ) ;
}
Rewriting - recalculating:
for (i = 0; i <= N; i++) {
x = x + 4*k*(N*N+N) ; // multiple of 1
x = x + i * ( 16*k*N + 2*(N*N+N) - 4*k ) ; // multiple of i
x = x + i*i * ( 8*N - 20*k - 2 ) ; // multiple of i^2
x = x + i*i*i * ( -10 ) ; // multiple of i^3
}
Another move to external (and removal of the i loop):
x = x + ( 4*k*(N*N+N) ) * (N+1) ;
x = x + ( 16*k*N + 2*(N*N+N) - 4*k ) * ((N*(N+1))/2) ;
x = x + ( 8*N - 20*k - 2 ) * ((N*(N+1)*(2*N+1))/6);
x = x + (-10) * ((N*N*(N+1)*(N+1))/4) ;
Both the above loop removals use the summation formulas:
Sum(1, i = 0..n) = n+1
Sum(i^1, i = 0..n) = n(n + 1)/2
Sum(i^2, i = 0..n) = n(n + 1)(2n + 1)/6
Sum(i^3, i = 0..n) = n^2(n + 1)^2/4
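Putting the final four lines together gives a loop-free function. The sketch below keeps a plain double loop next to it for comparison; the names foobar_loop and foobar_sums are made up here, and long long is used instead of int only to make overflow less likely for larger N.

```c
/* Reference: the original double loop with the y computation removed and k = 256. */
long long foobar_loop(int N)
{
    long long x = 0;
    for (int i = 0; i <= N; i++)
        for (int j = i + 1; j <= N; j++)
            x += 4LL * (2 * i + j) * (i + 2 * 256);
    return x;
}

/* Loop-free version built from the four summation formulas. */
long long foobar_sums(int N)
{
    if (N < 0)
        return 0;                                    /* the loops would not run at all */

    const long long k = 256, n = N;
    const long long s0 = n + 1;                      /* Sum(1)   */
    const long long s1 = n * (n + 1) / 2;            /* Sum(i)   */
    const long long s2 = n * (n + 1) * (2 * n + 1) / 6;   /* Sum(i^2) */
    const long long s3 = n * n * (n + 1) * (n + 1) / 4;   /* Sum(i^3) */

    return 4 * k * (n * n + n) * s0
         + (16 * k * n + 2 * (n * n + n) - 4 * k) * s1
         + (8 * n - 20 * k - 2) * s2
         - 10 * s3;
}
```

For example, both functions give 2048 for N = 1 and 14352 for N = 2.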
y does not affect the final result of the code - removed:
int foobar(int a, int b, int N)
{
int i, j, k, x, y;
x = 0;
//y = 0;
k = 256;
for (i = 0; i <= N; i++) {
for (j = i + 1; j <= N; j++) {
x = x + 4*(2*i+j)*(i+2*k);
//if (i > j){
// y = y + 8*(i-j);
//}else{
// y = y + 8*(j-i);
//}
}
}
return x;
}
k is simply a constant:
int foobar(int a, int b, int N)
{
int i, j, x;
x = 0;
for (i = 0; i <= N; i++) {
for (j = i + 1; j <= N; j++) {
x = x + 4*(2*i+j)*(i+2*256);
}
}
return x;
}
The inner expression can be transformed to: x += 8*i*i + 4096*i + 4*i*j + 2048*j. Use math to push all of them to the outer loop: x += 8*i*i*(N-i) + 4096*i*(N-i) + 2*i*(N-i)*(N+i+1) + 1024*(N-i)*(N+i+1).
You can expand the above expression and apply the sum-of-squares and sum-of-cubes formulas to obtain a closed-form expression, which should run faster than the doubly nested loop. I leave it as an exercise to you. As a result, i and j will also be removed.
a and b should also be removed if possible - since a and b are supplied as argument but never used in your code.
Sum of squares and sum of cubes formula:
Sum(x^2, x = 1..n) = n(n + 1)(2n + 1)/6
Sum(x^3, x = 1..n) = n^2(n + 1)^2/4
This function is equivalent with the following formula, which contains only 4 integer multiplications, and 1 integer division:
x = N * (N + 1) * (N * (7 * N + 8187) - 2050) / 6;
To get this, I simply typed the sum calculated by your nested loops into Wolfram Alpha:
sum (sum (8*i*i+4096*i+4*i*j+2048*j), j=i+1..N), i=0..N
Here is the direct link to the solution. Think before coding. Sometimes your brain can optimize code better than any compiler.
Briefly scanning the first routine, the first thing you notice is that expressions involving "y" are completely unused and can be eliminated (as you did). This further permits eliminating the if/else (as you did).
What remains is the two for loops and the messy expression. Factoring out the pieces of that expression that do not depend on j is the next step. You removed one such expression, but (i<<3) (ie, i * 8) remains in the inner loop, and can be removed.
Pascal's answer reminded me that you can use a loop stride optimization. First move (i<<3) * t out of the inner loop (call it i1), then calculate, when initializing the loop, a value j1 that equals (i<<2) * t. On each iteration increment j1 by 4 * t (which is a pre-calculated constant). Replace your inner expression with x = x + i1 + j1;.
One suspects that there may be some way to combine the two loops into one, with a stride, but I'm not seeing it offhand.
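One way to read the stride idea is sketched below. The function name foobar_stride and the variable step are assumptions of this sketch; i1, j1 and t follow the description above, with j1 advanced before it is used so that it always equals 4 * j * t.

```c
/* Strength-reduced inner loop: no multiplication inside the j loop. */
int foobar_stride(int N)
{
    int x = 0;
    for (int i = 0; i <= N; i++) {
        int t = i + 512;
        int i1 = (i << 3) * t;       /* 8 * i * t, constant for the whole inner loop */
        int step = t << 2;           /* 4 * t, the stride of j1 */
        int j1 = (i << 2) * t;       /* 4 * i * t; becomes 4 * j * t after each step */
        for (int j = i + 1; j <= N; j++) {
            j1 += step;
            x += i1 + j1;            /* same as ((i<<3) + (j<<2)) * t */
        }
    }
    return x;
}
```

Whether this beats the multiply-based loop depends on the compiler and target; modern optimizers often perform this transformation on their own.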
A few other things I can see. You don't need y, so you can remove its declaration and initialisation.
Also, the values passed in for a and b aren't actually used, so you could use these as local variables instead of x and t.
Also, rather than adding i to 512 each time through you can note that t starts at 512 and increments by 1 each iteration.
int foobar(int a, int b, int N) {
int i, j;
a = 0;
b = 512;
for (i = 0; i <= N; i++, b++) {
for (j = i + 1; j <= N; j++) {
a = a + ((i<<3) + (j<<2))*b;
}
}
return a;
}
Once you get to this point you can also observe that, aside from initialising j, i and j are each only used in a single multiple: i<<3 and j<<2. We can code this directly in the loop logic, thus:
int foobar(int a, int b, int N) {
int i, j, iLimit, jLimit;
a = 0;
b = 512;
iLimit = N << 3;
jLimit = N << 2;
for (i = 0; i <= iLimit; i+=8) {
for (j = (i >> 1) + 4; j <= jLimit; j+=4) {
a = a + (i + j)*b;
}
b++;
}
return a;
}
OK... so here is my solution, along with inline comments to explain what I did and how.
int foobar(int N)
{ // We eliminate unused arguments
int x = 0, i = 0, i2 = 0, j, k, z;
// We only iterate up to N on the outer loop, since the
// last iteration doesn't do anything useful. Also we keep
// track of '2*i' (which is used throughout the code) by a
// second variable 'i2' which we increment by two in every
// iteration, essentially converting multiplication into addition.
while(i < N)
{
// We hoist the calculation '4 * (i+2*k)' out of the loop
// since k is a literal constant and 'i' is a constant during
// the inner loop. We could convert the multiplication by 2
// into a left shift, but hey, let's not go *crazy*!
//
// (4 * (i+2*k)) <=>
// (4 * i) + (4 * 2 * k) <=>
// (2 * i2) + (8 * k) <=>
// (2 * i2) + (8 * 256) <=>
// (2 * i2) + 2048
k = (2 * i2) + 2048;
// We have now converted the expression:
// x = x + 4*(2*i+j)*(i+2*k);
//
// into the expression:
// x = x + (i2 + j) * k;
//
// Counterintuitively we now *expand* the formula into:
// x = x + (i2 * k) + (j * k);
//
// Now observe that (i2 * k) is a constant inside the inner
// loop which we can calculate only once here. Also observe
// that it is simply added into x a total of (N - i) times, so
// we take advantage of the abelian nature of addition
// to hoist it completely out of the loop
x = x + (i2 * k) * (N - i);
// Observe that inside this loop we calculate (j * k) repeatedly,
// and that j is just an increasing counter. So now instead of
// doing numerous multiplications, let's break the operation into
// two parts: a multiplication, which we hoist out of the inner
// loop and additions which we continue performing in the inner
// loop.
z = i * k;
for (j = i + 1; j <= N; j++)
{
z = z + k;
x = x + z;
}
i++;
i2 += 2;
}
return x;
}
The code, without any of the explanations boils down to this:
int foobar(int N)
{
int x = 0, i = 0, i2 = 0, j, k, z;
while(i < N)
{
k = (2 * i2) + 2048;
x = x + (i2 * k) * (N - i);
z = i * k;
for (j = i + 1; j <= N; j++)
{
z = z + k;
x = x + z;
}
i++;
i2 += 2;
}
return x;
}
I hope this helps.
int foobar(int N) // drop the unused arguments a and b
{
int i, j, x=0; // remove unused variables and operations to save stack space and machine cycles
for (i = N; i--; ) // count down so the loop test is a plain comparison with zero
for (j = N+1; --j>i; )
x += (((i<<1)+j)*(i+512)<<2); // use shifts instead of multiplications to save machine cycles
return x;
}

Euler 160 : Find the non trivial 5 digits of the factorial

Given a number n, find the 5 digits before the trailing zeros of n!.
9! = 362880, so f(9) = 36288
10! = 3628800, so f(10) = 36288
20! = 2432902008176640000, so f(20) = 17664
Find f(1,000,000,000,000).
For this I have computed f(10^6) and then f(10^12) = (f(10^6))^(10^6). To compute f(n), I compute the factorial while removing every factor 5 and a corresponding factor 2, so that all the trailing zeros are removed.
But I am getting a wrong answer.
Is there a problem in approach or some silly mistake ?
Code for reference
long long po(long long n, long long m, long long mod) {
if (m == 0) return 1;
if (m == 1) return n % mod;
long long r = po(n, m / 2, mod) % mod;
if (m % 2 == 0) return (r * r) % mod;
return (((r * r) % mod) * n) % mod;
}
void foo() {
unsigned long long i, res = 1, m = 1000000 , c = 0, j, res1 = 1, mod;
mod = ceil(pow(10, 9));
cout << mod << endl;
long long a = 0, a2 = 0, a5 = 0;
for (i = 1 ; i <= m; i++) {
j = i;
while (j % 10 == 0)
j /= 10;
while (j % 2 == 0) {
j /= 2;
a2++;
}
while (j % 5 == 0) {
j /= 5;
a5++;
}
res = (res * j ) % mod;
}
a = a2 - a5;
for (i = 1; i <= a; i++)
res = (res * 2) % mod;
for (i = 1; i <= 1000000; i++) {
res1 = (res1 * res) % mod;
}
cout << res1 << endl;
}
Your equality f(10^12) = (f(10^6))^(10^6) is wrong. f() is based on factorials, not powers.
Your assumptions are erroneous:
f(10^12) is not the same as f(10^6)^(10^6).
In order to get the low-order nonzero digits of the factorial, it does not suffice to remove all multiples of 10, 2 and 5 from the multiplicands. Removing the multiples of 10 is a good idea; for 5 and 2, you should remove the factor 2 or 5 only if the other multiplicand is a multiple of 5 or 2 respectively.
You should simplify the code and compute modulo some power of 10, but 10^9 seems too high as 10^9 * 10^12 will overflow 64-bit type unsigned long long.
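For small n, the definition of f can be checked by brute force before attempting anything clever. The sketch below is an assumption-laden illustration (the name last5_nonzero_factorial is made up): it strips every factor 2 and 5 from the multiplicands, counts them, and multiplies the surplus factors of 2 back in, so that exactly the factors of 10 are removed. It works modulo 10^5, which keeps every intermediate product far below the 64-bit limit, and it is far too slow for n = 10^12.

```c
#include <stdint.h>

/* Hypothetical brute-force helper: the last five digits of n! before its
   trailing zeros, returned as a number (leading zeros are not preserved). */
uint32_t last5_nonzero_factorial(uint32_t n)
{
    const uint64_t mod = 100000;
    uint64_t res = 1;
    uint64_t twos = 0, fives = 0;

    for (uint32_t i = 2; i <= n; i++) {
        uint32_t j = i;
        while (j % 2 == 0) { j /= 2; twos++; }
        while (j % 5 == 0) { j /= 5; fives++; }
        res = res * (j % mod) % mod;   /* res < 10^5, so no overflow */
    }

    /* n! always contains at least as many factors 2 as factors 5. */
    for (uint64_t e = twos - fives; e > 0; e--)
        res = res * 2 % mod;

    return (uint32_t)res;
}
```

It reproduces the three values quoted in the question: 36288 for n = 9 and n = 10, and 17664 for n = 20.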
