Optimizing calculating combination and avoiding overflows - c

I am solving a programming problem and am stuck at calculating nCr efficiently while avoiding overflow. I have made the following trivial simplification, but I am curious whether any more sophisticated simplifications are available.
n!/((n-k)!*k!) = n*(n-1)*...*(max(n-k, k)+1) / (min(n-k, k))!
Is any further simplification possible, for example by treating even and odd k separately? That is just one idea.
Any comment is appreciated.

I found an interesting solution here: http://blog.plover.com/math/choose.html
unsigned choose(unsigned n, unsigned k) {
unsigned r = 1;
unsigned d;
if (k > n) return 0;
for (d=1; d <= k; d++) {
r *= n--;
r /= d;
}
return r;
}
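This avoids overflow (or at least limits the problem) by performing multiplication and division alternately. Each division is exact: after the d-th iteration, r equals C(n, d) for the original n, which is an integer. The intermediate product r * n can still overflow before it is divided.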
E.g. for n = 8, k = 4:
result = 1;
result *= 8;
result /= 1;
result *= 7;
result /= 2;
result *= 6;
result /= 3;
result *= 5;
result /= 4;
done
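The multiply-then-divide loop above can still overflow in the intermediate product even when the final answer fits. One possible refinement, sketched here under the assumption that a 64-bit result is wanted, is to cancel common factors with a gcd before multiplying, so that overflow can only happen when the result itself does not fit. The names gcd_u64 and choose_u64 are made up for this illustration and are not from the linked blog post.

```c
#include <stdint.h>

/* Greatest common divisor (Euclid's algorithm). */
static uint64_t gcd_u64(uint64_t a, uint64_t b) {
    while (b != 0) {
        uint64_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* n choose k. Returns 0 when k > n, and also (as an error signal chosen
   for this sketch) when the result does not fit in 64 bits. */
uint64_t choose_u64(unsigned n, unsigned k) {
    if (k > n) return 0;
    if (k > n - k) k = n - k;            /* C(n, k) == C(n, n - k) */
    uint64_t r = 1;
    for (unsigned d = 1; d <= k; d++) {
        uint64_t num = n - k + d;        /* next numerator factor */
        uint64_t den = d;                /* next denominator factor */
        uint64_t g = gcd_u64(num, den);
        num /= g;
        den /= g;
        r /= den;                        /* exact: den is coprime to num, so it divides r */
        if (r > UINT64_MAX / num) return 0;  /* the result itself would overflow */
        r *= num;
    }
    return r;
}
```

For example, choose_u64(8, 4) walks through the products 5, 15, 35, 70 and returns 70.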

I had to solve this problem, too. What I did was use the fact that there are the same number of multiplications as divisions and bundled them together, taking one multiplication and one division at a time. It comes out as an integer at the end, but I use double for the intermediate terms and then round to the nearest integer at the end.
// Return the number of combinations of 'n choose k'
unsigned int binomial(unsigned int n, unsigned int k) {
unsigned int higher_idx;
unsigned int lower_idx;
if(k > n-k) {
higher_idx = k;
lower_idx = n - k;
} else {
higher_idx = n - k;
lower_idx = k;
}
double product = 1.0;
double factor;
unsigned int idx;
for(idx=n; idx>higher_idx; idx--) {
factor = (double)idx / (double)(lower_idx - (n - idx)); // C-style cast; double(...) is C++ only
product *= factor;
}
return (unsigned int)(product + 0.5);
}


Failed to reuse variable in C

I'm trying to code a program that can tell apart real and fake credit card numbers using Luhn's algorithm in C, which is
Multiply every other digit by 2, starting with the number’s
second-to-last digit, and then add those products’ digits together.
Add the sum to the sum of the digits that weren’t multiplied by 2.
If the total’s last digit is 0 (or, put more formally, if the total
modulo 10 is congruent to 0), the number is valid!
Then I coded something like this (I already declared all the functions at the top and included all the necessary libraries)
//Luhn's Algorithm
int luhn(long z)
{
int c;
return c = (sumall(z)-sumodd(z)) * 2 + sumaodd(z);
}
//sum of digits in odd position starting from the end
int sumodd(long x)
{
int a;
while(x)
{
a = a + x % 10;
x /= 100;
}
return a;
}
//sum of all digits
int sumall(long y)
{
int b;
while(y)
{
b = b + y % 10;
y /= 10;
}
return b;
}
But somehow it always gives out the wrong answer even though there's no error or bug detected. I came to notice that it works fine when my variable z stands alone, but when it's used multiple times in the same line of code with different functions, their values get messed up (in function luhn). I'm writing this to ask for any fix I can make to make my code run correctly as I intended.
I'd appreciate any help as I'm very new to this, and I'm not a native English speaker so I may have messed up some technical terms, but I hope you'd be able to understand my concerns.
sumall is wrong.
It should be sumeven from:
Add the sum to the sum of the digits that weren’t multiplied by 2.
Your sumall is summing all digits instead of the non-odd (i.e. even) digits.
You should do the * 2 inside sumodd as it should not be applied to the other [even] sum. And, it should be applied to the individual digits [vs the total sum].
Let's start with a proper definition from https://en.wikipedia.org/wiki/Luhn_algorithm
The check digit is computed as follows:
If the number already contains the check digit, drop that digit to form the "payload." The check digit is most often the last digit.
With the payload, start from the rightmost digit. Moving left, double the value of every second digit (including the rightmost digit).
Sum the digits of the resulting value in each position (using the original value where a digit did not get doubled in the previous step).
The check digit is calculated as (10 − (s mod 10)) mod 10, where s is the sum from the previous step.
Note that if we have a credit card of 9x where x is the check digit, then the payload is 9.
The correct [odd] sum for that digit is: 9 * 2 --> 18 --> 1 + 8 --> 9
But, sumodd(9x) * 2 --> 9 * 2 --> 18
Here's what I came up with:
// digsum -- calculate sum of digits
static inline int
digsum(int digcur)
{
int sum = 0;
for (; digcur != 0; digcur /= 10)
sum += digcur % 10;
return sum;
}
// luhn -- luhn's algorithm using digits array
int
luhn(long z)
{
char digits[16] = { 0 };
// get check digit and remove from "payload"
int check_expected = z % 10;
z /= 10;
// split into digits (we use little-endian)
int digcnt = 0;
for (digcnt = 0; z != 0; ++digcnt, z /= 10)
digits[digcnt] = z % 10;
int sum = 0;
for (int digidx = 0; digidx < digcnt; ++digidx) {
int digcur = digits[digidx];
if ((digidx & 1) == 0)
sum += digsum(digcur * 2);
else
sum += digcur;
}
int check_actual = (10 - (sum % 10)) % 10; // final % 10 maps 10 to 0 when sum is a multiple of 10
return (check_actual == check_expected);
}
// luhn -- luhn's algorithm using long directly
int
luhn2(long z)
{
// get check digit and remove from "payload"
int check_expected = z % 10;
z /= 10;
int sum = 0;
for (int digidx = 0; z != 0; ++digidx, z /= 10) {
int digcur = z % 10;
if ((digidx & 1) == 0)
sum += digsum(digcur * 2);
else
sum += digcur;
}
int check_actual = (10 - (sum % 10)) % 10; // final % 10 maps 10 to 0 when sum is a multiple of 10
return (check_actual == check_expected);
}
You've invoked undefined behavior by not initializing a few local variables in your functions, for instance you can remove your undefined behaviour in sumodd() by initializing a to zero like so:
//sum of digits in odd position starting from the end
int sumodd(long x)
{
int a = 0; //Initialize
while(x)
{
a += x % 10; //You can "a += b" instead of "a = a + b"
x /= 100;
}
return a;
}
It's also important to note that long is only required to be a minimum of 4-bytes wide, so it is not guaranteed to be wide enough to represent a decimal-16-digit-integer. Using long long solves this problem.
Alternatively you may find this problem much easier to solve by treating your credit card number as a char[] instead of an integer type altogether, for instance if we assume a 16-digit credit card number:
int luhn(long long z){
char number[16]; //Convert CC number to array of digits and store them here
for(int c = 0; c < 16; ++c){
number[c] = z % 10; //Last digit is at number[0], first digit is at number[15]
z /= 10;
}
int sum = 0;
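for(int c = 0; c < 16; c += 2){
int doubled = number[c + 1] * 2; //Double every second digit, counting from the right
if(doubled > 9) doubled -= 9; //Same as adding the two digits of the product
sum += number[c] + doubled; //Sum the even digits and the doubled odd digits
}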
return sum;
}
...and you could skip the long long to char[] translation part altogether if you treat the credit card number as an array of digits in the whole program
This expression:
(sumall(z)-sumodd(z)) * 2 + sumall(z);
Should be:
((sumall(z)-sumodd(z)) * 2 + sumodd(z))%10;
Based on your own definition.
But how about:
(sumall(z) * 2 - sumodd(z))%10
If you're trying to be smart and base off sumall(). You don't need to call anything twice.
Also you don't initialise your local variables. You must assign variables values before using them in C.
Also you don't need the local variable c in the luhn() function. It's harmless but unnecessary.
As others mention, in a real-world application we can't recommend enough that such 'codes' are held in a character array. The amount of grief caused by people using integer types to represent digit-sequence 'codes' and identifiers is vast. Unless a variable represents a numerical quantity of something, don't represent it as an arithmetic type. More issues have been caused in my career by that error than by people trying to use double to represent monetary amounts.
#include <stdio.h>
//sum of digits in odd position starting from the end
int sumodd(long x)
{
int a=0;
while(x)
{
a = a + x % 10;
x /= 100;
}
return a;
}
//sum of all digits
int sumall(long y)
{
int b=0;
while(y)
{
b = b + y % 10;
y /= 10;
}
return b;
}
//Luhn's Algorithm
int luhn(long z)
{
return (sumall(z)*2-sumodd(z))%10;
}
int check_luhn(long y,int expect){
int result=luhn(y);
if(result==expect){
return 0;
}
return 1;
}
int check_sumodd(long y,int expect){
int result=sumodd(y);
if(result==expect){
return 0;
}
return 1;
}
int check_sumall(long y,int expect){
int result=sumall(y);
if(result==expect){
return 0;
}
return 1;
}
int main(void) {
int errors=0;
errors+=check_sumall(1,1);
errors+=check_sumall(12,3);
errors+=check_sumall(123456789L,45);
errors+=check_sumall(4273391,4+2+7+3+3+9+1);
errors+=check_sumodd(1,1);
errors+=check_sumodd(91,1);
errors+=check_sumodd(791,8);
errors+=check_sumodd(1213191,1+1+1+1);
errors+=check_sumodd(4273391,15);
errors+=check_luhn(1234567890,((9+7+5+3+1)*2+(0+8+6+4+2))%10);
errors+=check_luhn(9264567897,((9+7+5+6+9)*2+(7+8+6+4+2))%10);
if(errors!=0){
printf("*ERRORS*\n");
}else{
printf("Success\n");
}
return 0;
}
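The advice above about keeping such codes in a character array can be sketched as follows. This is only an illustration, not the answerer's code: the function name luhn_valid is hypothetical, and it implements the standard Luhn check (double every second digit from the right, add the digits of each product, total must be a multiple of 10) rather than the OP's sumall/sumodd formula.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Returns true if the string consists only of digits and passes the
   Luhn check. Empty strings and non-digit characters give false. */
bool luhn_valid(const char *s) {
    size_t len = strlen(s);
    if (len == 0) return false;
    int sum = 0;
    bool dbl = false;                 /* the rightmost digit is not doubled */
    for (size_t i = len; i-- > 0; ) { /* walk from the last character to the first */
        if (!isdigit((unsigned char)s[i])) return false;
        int d = s[i] - '0';
        if (dbl) {
            d *= 2;
            if (d > 9) d -= 9;        /* same as adding the two digits of d */
        }
        sum += d;
        dbl = !dbl;
    }
    return sum % 10 == 0;
}
```

Because the input is a string, leading zeros survive and the length is not limited by the width of long.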

How to get the result 2^100 * 3^3 in modulo 1000000007

I have a question: how do I get the result of (2^100)*(3^5) modulo 10^9 + 7? The program asks the user to input the powers a (for 2^a) and b (for 3^b), and then outputs the result of 2^a * 3^b.
I tried reducing each big number modulo m and then multiplying the residues, but it doesn't work for 2^100 * 3^5.
#include <stdio.h>
int main()
{
long long int testcase,b,c,N,a;
long long int pow2,pow3 = 1;
long long int m = 1000000007;
// input the power
scanf("%lld",&a); getchar();
scanf("%lld",&b); getchar();
// power of 2 (2^a)
for(int i = 1; i <= a; i++){
pow2 = pow2 * 2;
}
// power of 3 (3^b)
for(int j = 1; j <= b; j++){
pow3 = pow3 * 3;
}
// convert the big numbers into modulo
long long int i = 1;
i = (1*pow2) % m ;
long long int j = 1;
j = (1*pow3) % m;
// the result of first modulo times second modulo
printf("%lld\n", i*j);
// doesnt work for 2^100 * 3^5
return 0;
}
For a = 2 and b = 5 it gives the output 972 (which is correct).
For a = 100 and b = 3 it gives 0.
Firstly, pow2 is uninitialized and therefore the behaviour is undefined. If initialized to 1, then the problem is that 2^100 does not fit in the long long int. The best fix is to take the modulo as often as possible.
// power of 2 (2^a)
for(int i = 1; i <= a; i++){
pow2 *= 2;
pow2 %= m;
}
// power of 3 (3^b)
for(int j = 1; j <= b; j++){
pow3 *= 3;
pow3 %= m;
}
Notice that this is still suboptimal - it is possible to calculate much larger powers by using exponentiation by squaring.
Finally you must note that the last product must be mod 1000000007 too, otherwise the result is larger than expected:
printf("%lld\n", i * j % m);
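The exponentiation by squaring mentioned above could look roughly like this. It is a sketch under the assumption that the modulus fits in 32 bits, so that the product of two residues fits in a uint64_t; the names pow_mod and pow2_pow3_mod are invented for this example.

```c
#include <stdint.h>

/* (base^exp) mod m by repeated squaring.
   Assumes m > 0 and m < 2^32 so that products of residues do not overflow. */
uint64_t pow_mod(uint64_t base, uint64_t exp, uint64_t m) {
    uint64_t result = 1 % m;          /* 1 % m also handles m == 1 */
    base %= m;
    while (exp > 0) {
        if (exp & 1) {
            result = result * base % m;   /* use this bit of the exponent */
        }
        base = base * base % m;           /* square for the next bit */
        exp >>= 1;
    }
    return result;
}

/* 2^a * 3^b modulo 1000000007, reducing after every multiplication. */
uint64_t pow2_pow3_mod(uint64_t a, uint64_t b) {
    const uint64_t m = 1000000007ULL;
    return pow_mod(2, a, m) * pow_mod(3, b, m) % m;
}
```

The loop runs once per bit of the exponent, so a = 10^18 needs about 60 iterations instead of 10^18.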

Read from the standard input a natural number, n. Find the greatest perfect square that is less than or equal to n

#include <stdio.h>
#include <stdlib.h>
int main() {
int i, j, n, maxi = 0;
printf("\n Introduce the number:\n");
scanf("%d", &n);
for (j = 1; j <= n; j++)
{
i = 0;
while (i < j) {
i++;
if (j == i * i) {
if (j > maxi) {
maxi = j;
printf("%d", maxi);
}
}
}
}
return 0;
}
I have to find the greatest perfect square less than or equal to a number n. I succeeded in finding all the perfect squares up to n, but because the program prints each perfect square as soon as it finds one, I couldn't think of a way to compare all the perfect squares that were found (or at least that's what I think the problem is), so I would appreciate some help. I already know that you could also solve this problem with a simpler method (like the one below); if you have any other ideas on how to solve it, I'd like to hear them.
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
int main()
{
int n,j;
printf("\n Your number:\n");
scanf("%d",&n);
j=(int)sqrt(n);
printf("%d",j*j);
return 0;
}
You only need a single loop here. Check if i*i <= n. If so, set sq to i*i and increment i:
int n, i = 1, sq = 1;
printf("\n Introduce the number:\n");
scanf("%d", &n);
while (i*i <= n) {
sq = i*i;
i++;
}
printf("sq=%d\n", sq);
"Find the greatest perfect square that is less than or equal to n"
For n>=0, this is akin to finding the integer square root of n.
unsigned greatest_perfect_square(unsigned x) {
unsigned root = usqrt(x);
return root * root;
}
"if you have any other ideas on how to solve it I'd like to hear them."
The order of complexity to find the square root is O(bit-width-of-type-n). e.g. 16 iterations.
#include <limits.h>
unsigned usqrt(unsigned x) {
unsigned y = 0;
unsigned xShifted = 0;
const unsigned MSBit = UINT_MAX - UINT_MAX/2;
// This constant relies on no padding and bit width even
const unsigned TwoBitCount_N = sizeof(x) * CHAR_BIT / 2;
for (unsigned TwoBitCount = TwoBitCount_N; TwoBitCount > 0; TwoBitCount--) {
// Shift `xShifted` 2 places left while shifting in the 2 MSbits of x
xShifted <<= 1;
if (x & MSBit) {
xShifted |= 1;
}
x <<= 1;
xShifted <<= 1;
if (x & MSBit) {
xShifted |= 1;
}
x <<= 1;
// Shift the answer 1 bit left
y <<= 1;
// Form test value as y*2 + 1
unsigned Test = (y << 1) | 1;
// If xShifted big enough ...
if (xShifted >= Test) {
xShifted -= Test;
// Increment answer
y |= 1;
}
}
return y;
}
OP's method is far far slower. Even the inner loop takes O(sqrt(n)) time.
Note:
OP's code: j == i * i is subject to overflow and leads to the incorrect answer when j is larger.
j/i == i performs a like test without overflow.
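The division-based test in the note above can be turned into a complete single-loop search. The following is a minimal sketch with a made-up function name, greatest_square_le; it compares i + 1 against n / (i + 1), so no square is computed until it is known to fit in unsigned.

```c
/* Largest perfect square <= n.
   For positive integers, (i+1)*(i+1) <= n is equivalent to
   i+1 <= n / (i+1) with integer division, which cannot overflow. */
unsigned greatest_square_le(unsigned n) {
    unsigned i = 0;
    while (i + 1 <= n / (i + 1)) {
        i++;
    }
    return i * i;   /* i*i <= n, so this product fits */
}
```

This is still O(sqrt(n)) like the other simple loops, so it only addresses the overflow, not the speed.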
@Jonathan Leffler suggested a Newton-Raphson approximation approach. Some lightly tested code below works quite fast, often taking only a few iterations.
I suspect this is O(log(bit-width-of-type-n)) for the main part, yet of course still O(bit-width-of-type-n) for bit_width().
Both of the functions could be improved.
unsigned bit_width(unsigned x) {
unsigned width = 0;
while (x) {
x /= 2;
width++;
}
return width;
}
unsigned usqrt_NR(unsigned x) {
if (x == 0) {
return 0;
}
unsigned y = 1u << bit_width(x)/2;
unsigned y_previous;
unsigned diff;
unsigned diff1count = 0;
do {
y_previous = y;
y = (y + x/y)/2;
diff = y_previous < y ? y - y_previous : y_previous - y;
if (diff == 1) diff1count++;
} while (diff > 1 || (diff == 1 && diff1count <= 1));
y = (y_previous + y)/2;
return y;
}
This minimizes the number of multiplications: it looks for the first square which is larger than n, meaning that the perfect square immediately before was the solution.
for (i = 1; i <= n; i++) {
if (i*i > n) {
break;
}
}
i--;
// i*i is your answer
On some platforms it might be useful to exploit the fact that (i+1)*(i+1) = i*i + 2*i + 1, or in other words, if you already have i^2, (i+1)^2 is obtained by adding i to it twice, and incrementing by 1; and at the beginning, 0^2 is 0 to prime the cycle.
for (i = 0, sq = 0; i < n; i++) {
sq += i; // Or on some platforms sq += i<<1 instead of two sums
sq += i; // Some compilers will auto-optimize "sq += 2*i" for the platform
sq++; // Or even sq += ((2*i)|1) as adding 1 to even numbers is OR'ing 1
if (sq > n) {
break;
}
// if sq is declared as signed integer, a possible overflow will
// show it as being negative. This way we can still get a "correct" result
// with i the smallest root that does not overflow.
// In 16-bit arithmetic this is 181, root of 32761; next square would be
// 33124 which cannot be represented in signed 16-bit space.
if (sq < 0) {
break;
}
}
// (i*i) is your answer

For loop with unsigned int

I have a logical problem in my code, maybe it is caused by overflowing but I can't solve this on my own, so I would be thankful if anyone can help me.
In the following piece of code, I have implemented the function taylor_log(), which computes "n" iterations of the Taylor polynomial. In the void function I am looking for the number of iterations (*limit) that is enough to compute a logarithm with the desired accuracy compared to the log function from <math.h>.
The thing is that sometimes UINT_MAX iterations are not enough to reach the desired accuracy, and at that point I want to let the user know that the number of iterations needed is higher than UINT_MAX. But my code doesn't work, for example for x = 1e+280, eps = 623. It just counts and counts and never gives a result.
[Image: Taylor polynomial]
double taylor_log(double x, unsigned int n){
double f_sum = 1.0;
double sum = 0.0;
for (unsigned int i = 1; i <= n; i++)
{
f_sum *= (x - 1) / x;
sum += f_sum / i;
}
return sum;
}
void guessIt(double x, double eps, unsigned int *limit){
*limit = 10;
double real_log = log(x);
double t_log = taylor_log(x, *limit);
while(myabs(real_log - t_log) > eps)
{
if (*limit == UINT_MAX)
{
*limit = 0;
break;
}
if (*limit >= UINT_MAX/2)
{
*limit = UINT_MAX;
t_log = taylor_log(x, *limit);
}
else
{
*limit = (*limit) *2;
t_log = taylor_log(x, *limit);
}
}
}
EDIT: Ok guys, thanks for your reactions so far. I have changed my code to this:
if (*limit == UINT_MAX-1)
{
*limit = 0;
break;
}
if (*limit >= UINT_MAX/2)
{
*limit = UINT_MAX-1;
t_log = taylor_log(x, *limit);
}
but it still doesn't work correctly. I added a printf at the beginning of the taylor_log() function to see the value of "n", and it is (..., 671088640, 1342177280, 2684354560, 5, 4, 3, 2, 2, 1, 2013265920, ...). I don't understand it.
This code below assigns the limit to UINT_MAX
if (*limit >= UINT_MAX/2)
{
*limit = UINT_MAX;
t_log = taylor_log(x, *limit);
}
And your for loop is defined like this:
for (unsigned int i = 1; i <= n; i++)
i will ALWAYS be less than or equal to UINT_MAX because there is never going to be a value of i that is greater than UINT_MAX. Because that's the largest value i could ever be. So there is certainly overflow and your loop exit condition is never met. i rolls over to zero and the process repeats indefinitely.
You should change your loop condition to i < n or change your limit to UINT_MAX - 1.
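The wraparound described above is easier to see with a narrower type. The sketch below uses unsigned char as a stand-in for unsigned int (an assumption made only so the boundary case runs instantly), and count_1_to_n is a hypothetical name. Writing the loop as `i <= n` with n equal to UCHAR_MAX would never terminate, because i wraps to 0 after reaching the maximum; testing `i == n` before the increment avoids that.

```c
#include <limits.h>

/* Runs a loop body for i = 1..n and returns how many times it ran.
   The exit test happens before i is incremented, so i never has to
   hold a value larger than n, even when n is the maximum of its type. */
unsigned long count_1_to_n(unsigned char n) {
    unsigned long iterations = 0;
    if (n == 0) {
        return 0;                 /* nothing to do for an empty range */
    }
    for (unsigned char i = 1; ; i++) {
        iterations++;             /* stands in for the real loop body */
        if (i == n) {
            break;                /* stop before i can wrap around to 0 */
        }
    }
    return iterations;
}
```

The same shape works for unsigned int and UINT_MAX; it is just impractical to run that many iterations as a demonstration.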
[Edit]
OP coded correctly but must ensure a limited range (0.5 < x < 2.0 ?)
Below is a version that decides for itself when to stop. The iteration count climbs steeply as x approaches 0.5 or 2.0, reaching into the millions, hence the alternative coded further below.
double taylor_logA(double x) {
double f_sum = 1.0;
double sum = 0.0;
for (unsigned int i = 1; ; i++) {
f_sum *= (x - 1) / x;
double sum_before = sum;
sum += f_sum / i;
if (sum_before == sum) {
printf("%u\n", i); // i is unsigned, so use %u
break;
}
}
return sum;
}
Alternative implementation of the series: Ref
Sample alternative - it converges faster.
double taylor_log2(double x, unsigned int n) {
double f_sum = 1.0;
double sum = 0.0;
for (unsigned int i = 1; i <= n; i++) {
f_sum *= (x - 1) / 1; // / 1 (or remove)
if (i & 1) sum += f_sum / i;
else sum -= f_sum / i; // subtract even terms
}
return sum;
}
A reasonable number of terms will converge as needed.
Alternatively, continue until terms are too small (maybe 50 or so)
double taylor_log3(double x) {
double f_sum = 1.0;
double sum = 0.0;
for (unsigned int i = 1; ; i++) {
double sum_before = sum;
f_sum *= x - 1;
if (i & 1) sum += f_sum / i;
else sum -= f_sum / i;
if (sum_before == sum) {
printf("%u\n", i); // i is unsigned, so use %u
break;
}
}
return sum;
}
Other improvements are possible; for example, see More efficient series.
First, using std::numeric_limits<unsigned int>::max() will make your code more C++-ish than C-ish. Second, you can use the integral type unsigned long long and std::numeric_limits<unsigned long long>::max() for the limit, which is pretty much the limit for an integral type. If you want a higher limit, you may use long double. Floating-point types also allow you to use infinity with std::numeric_limits<double>::infinity(); note that infinity works with double, float, and long double.
If neither of these types provide you the precision you need, look at boost::multiprecision
First of all, the Taylor series for the logarithm function only converges for values of 0 < x < 2, so it's quite possible that the eps precision is never hit.
Secondly, are you sure that it loops forever, instead of hitting the *limit >= UINT_MAX/2 after a very long time?
OP is using the series well outside its usable range of 0.5 < x < 2.0 with calls like taylor_log(1e280, n).
Even within the range, x values near the limits of 0.5 and 2.0 converge very slowly needing millions+ of iterations. A precise log() will not result. Best to use the 2x range about 1.0.
Create a wrapper function to call the original function in its sweet range of sqrt(2)/2 < x < sqrt(2). Converges, worst case, with about 40 iterations.
#define SQRT_0_5 0.70710678118654752440084436210485
#define LN2 0.69314718055994530941723212145818
// Valid over the range (0...DBL_MAX]
double taylor_logB(double x, unsigned int n) {
int expo;
double signif = frexp(x, &expo);
if (signif < SQRT_0_5) {
signif *= 2;
expo--;
}
double y = taylor_log(signif,n);
y += expo*LN2;
return y;
}

Given 2^n, find n using logarithm

Given an integer (2^n) which is a power of 2, I want to find n, the exponent, using logarithms. The formula to find the exponent is: log(number) / log(2). Following is the code snippet:
unsigned long int a;
double apower;
apower = log((double)a) / log((double)2);
I found that value of 'apower' is wrong at some large value of a, I do not know the value, as my code fails, after I submit it. Why is it so? Is there some casting issue?
Following is the entire snippet :
int count = 0;
unsigned long int a,b;
double apower,bpower;
apower = log((double)a) / log((double)2);
bpower = log((double)b) / log((double)2);
count = abs(apower - bpower);
printf("%d\n",count);
Values of a and b will always be powers of 2, so apower and bpower must have 00 in the decimal places. That is why count is an int (%d).
I just want to know the behavior of Logarithm.
I am only answering half of your question, because it is not necessary to use logs to solve this. An easy way is to use this:
unsigned long long a = 0x8000000000000000ULL;
int n = 0;
while (a >>= 1) n++;
printf("%d\n", n);
Output:
63
Converting to logs and dividing may cause loss of significance, in which case you should use round. You use the word "submit", so it was an online challenge that failed? What exactly did you print? (in this case) 63.000000? That is what the default format of %f would give.
Why not take advantage of the fact that the log2 is stored in the exponent field of a double? :)
unsigned long long a = 0x8000000000000000ULL;
union {
double d;
unsigned long long l;
} u;
u.d = a;
int apower = (u.l >> 52) - 1023;
printf("%d\n", apower);
Output:
63
This assumes that unsigned long longs and doubles are 64 bit and that the input is > 0.
When using double math, the log result or quotient may not be exactly the mathematical result but 1 (or 2) next representable double away.
Calculating log() only returns an exact mathematical result for log(1); all other mathematical results are irrational, whereas every double is rational.
This may result in an answer like 29.999999..., which saved as an int is 29.
Recommend using integer math instead
int mylog2(unsigned long x) {
int y = 0;
#if (ULONG_MAX>>16 > 1lu<<16)
if (x >= 1lu<<32) { x >>= 32; y += 32; }
#endif
if (x >= 1lu<<16) { x >>= 16; y += 16; }
if (x >= 1lu<<8) { x >>= 8; y += 8; }
if (x >= 1lu<<4) { x >>= 4; y += 4; }
if (x >= 1lu<<2) { x >>= 2; y += 2; }
if (x >= 1lu<<1) { x >>= 1; y += 1; }
return y;
}
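Applied to the original question, the integer approach gives the exponent difference directly, with no floating point involved. The following is a small sketch; exponent_of_pow2 and exponent_distance are names invented here, and returning -1 for inputs that are not powers of two is a choice made for this example rather than something the question specifies.

```c
/* Exponent n such that x == 2^n, found by shifting.
   Returns -1 if x is zero or not a power of two. */
int exponent_of_pow2(unsigned long x) {
    if (x == 0 || (x & (x - 1)) != 0) {
        return -1;                /* a power of two has exactly one bit set */
    }
    int n = 0;
    while (x >>= 1) {
        n++;
    }
    return n;
}

/* |log2(a) - log2(b)| for powers of two a and b, or -1 on invalid input. */
int exponent_distance(unsigned long a, unsigned long b) {
    int ea = exponent_of_pow2(a);
    int eb = exponent_of_pow2(b);
    if (ea < 0 || eb < 0) {
        return -1;
    }
    return ea > eb ? ea - eb : eb - ea;
}
```

Since every step is exact integer arithmetic, there is no 29.999999... to truncate.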
