How to use modulo efficiently in C?

I'm doing a (for myself) very complex task where I have to calculate the largest possible number of sequences for a given number n of segments.
I found out that the Catalan number represents these sequences, and I got it to work for n <= 32. The results should be calculated mod 1,000,000,007. The problem I have is that q and p get too big for a long long int, and I can't just take them mod 1,000,000,007 before dividing q by p, because I would get a different result.
My question is: is there a really efficient way to solve my problem, or do I have to think about storing the values differently?
My limitations are the following:
- stdio.h/iostream only
- only Integers
- n <= 20,000,000
- n >= 2
#include <stdio.h>

long long cat(long long q, long long p, long long n);

int main() {
    long long n = 0;
    long long val;
    scanf("%lld", &n);
    val = cat(1, 1, n / 2);
    printf("%lld", val);
    return 0;
}

long long cat(long long q, long long p, long long n) {
    if (n == 0) {
        return (q / p) % 1000000007;
    } else {
        q *= 4 * n - 2;
    }
    p *= (n + 1);
    return cat(q, p, n - 1);
}

To solve this efficiently, you'll want to use modular arithmetic, with modular inverses substituting for division.
It's simple to prove that, in the absence of overflow, (a * b) % c == ((a % c) * b) % c. If we were just multiplying, we could take results mod 1000000007 at every step and always stay within the bounds of a 64-bit integer. The problem is division. (a / b) % c does not necessarily equal ((a % c) / b) % c.
To solve the problem with division, we use modular inverses. For integers a and c with c prime and a % c != 0, we can always find an integer b such that a * b % c == 1. This means we can use multiplication as division. For any integer d divisible by a, (d * b) % c == (d / a) % c. This means that ((d % c) * b) % c == (d / a) % c, so we can reduce intermediate results mod c without screwing up our ability to divide.
The number we want to calculate is of the form (x1 * x2 * x3 * ...) / (y1 * y2 * y3 * ...) % 1000000007. We can instead compute x = x1 % 1000000007 * x2 % 1000000007 * x3 % 1000000007 ... and y = y1 % 1000000007 * y2 % 1000000007 * y3 % 1000000007 ..., then compute the modular inverse z of y using the extended Euclidean algorithm and return (x * z) % 1000000007.
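For illustration, a minimal sketch of that scheme in C (my own code, not the answer author's): since 1000000007 is prime, the modular inverse can also be obtained as y^(1000000005) mod 1000000007 by Fermat's little theorem, which avoids coding the extended Euclidean algorithm. The loop also replaces the question's recursion, since a depth of up to 10,000,000 calls would risk overflowing the stack.

#include <stdio.h>

#define MOD 1000000007ULL

/* (base ** exp) % MOD by repeated squaring */
static unsigned long long pow_mod(unsigned long long base, unsigned long long exp) {
    unsigned long long result = 1;
    base %= MOD;
    while (exp > 0) {
        if (exp & 1)
            result = result * base % MOD;
        base = base * base % MOD;
        exp >>= 1;
    }
    return result;
}

/* n-th Catalan number mod MOD: prod(4k - 2) / prod(k + 1) over k = 1..n */
static unsigned long long catalan_mod(unsigned long long n) {
    unsigned long long q = 1, p = 1, k;
    for (k = 1; k <= n; k++) {
        q = q * ((4 * k - 2) % MOD) % MOD;   /* numerator, reduced each step */
        p = p * ((k + 1) % MOD) % MOD;       /* denominator, reduced each step */
    }
    return q * pow_mod(p, MOD - 2) % MOD;    /* divide via the modular inverse */
}

int main(void) {
    unsigned long long n;
    if (scanf("%llu", &n) == 1)
        printf("%llu\n", catalan_mod(n / 2));
    return 0;
}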

If you're using gcc or clang and a 64-bit target, there exists a __int128 type. This gives you extra bits to work with, but obviously only to a point.
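A small sketch (mine, assuming gcc or clang on a 64-bit target) of how the extra width helps with modular products:

/* (a * b) % m without 64-bit overflow, via the gcc/clang __int128 extension */
static unsigned long long mul_mod(unsigned long long a, unsigned long long b,
                                  unsigned long long m) {
    return (unsigned long long)((unsigned __int128)a * b % m);
}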
Most likely the easiest way to deal with this kind of issue is to use a "bignum" library, i.e. a library that deals with representing and doing arithmetic on arbitrarily large numbers. The arguably most popular open source example is libgmp - you should be able to get your algorithm going quite easily with that. It's also tuned to high performance standards.
Obviously you can reimplement this yourself, by representing your numbers as e.g. arrays of integers of a certain size. You'll have to implement algorithms for doing basic arithmetic such as +, -, *, /, % yourself. If you want to do this as a learning experience that's fine, but there's no shame in using libgmp if you just want to focus on the algorithm you're trying to implement.

Related

How to efficiently verify whether pow(a, b) % b == a in C (without overflow)

I'd like to verify whether
pow(a, b) % b == a
is true in C, with 2 ≤ b ≤ 32768 (2^15) and 2 ≤ a ≤ b, with a and b being integers.
However, directly computing pow(a, b) % b with b being a large number quickly overflows in C. What would be a trick/efficient way of verifying whether this condition holds?
This question is based on finding a witness for Fermat's little theorem, which states that if this condition is false, b is not prime.
Also, I am limited in the time it may take: it can't be too slow (near or over 2 seconds). The biggest Carmichael number below 32768 (a number b that's not prime but still satisfies pow(a, b) % b == a for all 2 <= a <= b) is 29341. Thus the method for checking pow(a, b) % b == a with 2 <= a <= 29341 shouldn't be too slow.
You can use the Exponentiation by squaring method.
The idea is the following:
Decompose the exponent b in binary form and decompose the product accordingly.
Notice that we always reduce % b, which is below 32768, so every intermediate result fits in a 32-bit number.
So the C code is:
/*
 * this function computes (num ** pow) % mod
 */
int pow_mod(int num, int pow, int mod)
{
    int res = 1;
    while (pow > 0)
    {
        if (pow & 1)
        {
            res = (res * num) % mod;
        }
        pow /= 2;
        num = (num * num) % mod;
    }
    return res;
}
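The check from the question can then be a simple loop; this usage sketch (my addition, with a hypothetical fermat_test() wrapper) returns whether b passes the test for every base:

/* 1 if pow(a, b) % b == a holds for all 2 <= a < b, else 0 */
int fermat_test(int b)
{
    for (int a = 2; a < b; a++)
        if (pow_mod(a, b, b) != a)  /* a < b, so a % b == a */
            return 0;
    return 1;
}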
You are doing modular arithmetic in Z/bZ.
Note that, in a quotient ring, the n-th power of the class of an element is the class of the n-th power of the element, so we have the following result:
(a^b) mod b = ((((a mod b) * a) mod b) * a) mod b [...] (b times)
So, you do not need a big integer library.
You can simply write a C program using the following algorithm (pseudo-code):
- declare your variables a and b as integers;
- use a temporary variable temp that is initialized with a;
- loop b - 1 times, computing (temp * a) mod b at each step to get the new temp value;
- compare the result with a.
With this formula, you can see that temp always stays below b < 32768, so the intermediate product temp * a stays below 2^30 and a plain int is enough to store temp.
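A minimal C sketch of those steps (my illustration, not part of the original answer):

/* checks pow(a, b) % b == a using b - 1 modular multiplications */
int fermat_holds(int a, int b)
{
    int temp = a % b;               /* temp holds a^1 mod b */
    for (int i = 1; i < b; i++)     /* b - 1 further steps give a^b mod b */
        temp = (temp * a) % b;
    return temp == a % b;
}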

How to calculate nPr mod m, when m is a prime number?

I need to find out the value of nPr%m.
This is the approach I used.
Find n! % m and (n - r)! % m, and divide them.
However, for certain cases (n - r)! % m is greater than n! % m, so the integer quotient comes out as 0.
What do I need to do then?
This is more a math question than a programming question, but anyway.
Note that
n! / (n - r)! = n * (n - 1) * ... * (n - r + 1)
Now for multiplication,
(a * b * c) % m = (((a * b) % m) * c) % m
i.e. rather than taking the entire product mod m, you can reduce the intermediate result of any multiplication of two factors mod m.
I won't provide the full code here, but hopefully this will be enough for you to figure it out.
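Still, for reference, a minimal sketch of that product (my code, not the answer author's; it assumes m is small enough that (m - 1)^2 fits in an unsigned long long):

/* nPr % m = (n * (n-1) * ... * (n-r+1)) % m, reducing after every multiply */
unsigned long long perm_mod(unsigned long long n, unsigned long long r,
                            unsigned long long m)
{
    unsigned long long result = 1 % m;
    for (unsigned long long k = 0; k < r; k++)
        result = result * ((n - k) % m) % m;
    return result;
}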

bitwise division by multiples of 2

I found many posts about bitwise division and I completely understand most bitwise usage, but I can't figure out one specific division. I want to divide a given number (let's say 100) by all possible multiples of 2 (attention: I don't want to divide only by powers of 2, but by any multiple of 2!)
For example: 100/2, 100/4, 100/6, 100/8, 100/10 ... 100/100
Also, I know that because of using unsigned int the answers will be truncated (for example 100/52 = 1, not 1.92), but it doesn't really matter, because I can either skip those answers or print them, no problem. My concern is mostly how I can divide by 6 or 10, etc. (multiples of 2). There is no need for it to be done in C, because I can manage to transform any code you give me from Java to C.
Following the math shown for the accepted solution to the division by 3 question, you can derive a recurrence for the division algorithm:
To compute (int)(X / Y):
Let k be such that 2^k >= Y and 2^(k-1) < Y (note that 2^k = (1 << k)).
Let d = 2^k - Y.
Then, if A = (int)(X / 2^k) and B = X % 2^k,
X = (1 << k) * A + B
  = (1 << k) * A - Y * A + Y * A + B
  = d * A + Y * A + B
  = Y * A + (d * A + B)
Thus,
X / Y = A + (d * A + B) / Y
In other words, if S(X, Y) := X / Y, then S(X, Y) = A + S(d * A + B, Y).
This recurrence can be implemented with a simple loop. The stopping condition for the loop is when the numerator falls below 2k. The function divu implements the recurrence, using only bitwise operators and using unsigned types. Helper functions for the math operations are left unimplemented, but shouldn't be too hard (the linked answer provides a full add implementation already). The rs() function is for "right-shift", which does sign extension on the unsigned input. The function div is the actual API for int, and checks for divide by zero and negative y before delegating to divu. negate does 2's complement negation.
static unsigned divu (unsigned x, unsigned y) {
    unsigned k = 0;
    unsigned pow2 = 0;
    unsigned mask = 0;
    unsigned diff = 0;
    unsigned sum = 0;
    while ((1 << k) < y) k = add(k, 1);
    pow2 = (1 << k);
    mask = sub(pow2, 1);
    diff = sub(pow2, y);
    while (x >= pow2) {
        sum = add(sum, rs(x, k));
        x = add(mul(diff, rs(x, k)), (x & mask));
    }
    if (x >= y) sum = add(sum, 1);
    return sum;
}

int div (int x, int y) {
    assert(y);
    if (y > 0) return divu(x, y);
    return negate(divu(x, negate(y)));
}
This implementation depends on signed int using 2's complement. For maximal portability, div should convert negative arguments to 2's complement before calling divu. Then, it should convert the result from divu back from 2's complement to the native signed representation.
The following code works for positive numbers. When the dividend or the divisor or both are negative, have flags to change the sign of the answer appropriately.
int divi(long long m, long long n)
{
    if (m == 0 || n == 0 || m < n)
        return 0;
    long long a, b;
    int f = 0;
    a = n; b = 1;
    while (a <= m)          /* double until (n << i) exceeds m */
    {
        b = b << 1;
        a = a << 1;
        f = 1;
    }
    if (f)                  /* step back to the largest (n << i) <= m */
    {
        b = b >> 1;
        a = a >> 1;
    }
    b = b + divi(m - a, n); /* subtract and recurse on the remainder */
    return b;
}
Use the operator / for integer division as much as you can.
For instance, when you want to divide 100 by 6 or 10 you should write 100/6 or 100/10.
When you mention bitwise division, do you (1) mean an implementation of operator /, or (2) are you referring to division by a power of two?
For (1), a processor should have an integer division unit. If not, the compiler should provide a good implementation.
For (2), you can use 100 >> 2 instead of 100 / 4. If the divisor is a power of two known at compile time, a good compiler will automatically use the shift instruction.
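For the question's concrete example, that is simply (my sketch):

#include <stdio.h>

int main(void)
{
    unsigned n = 100;
    /* divide n by every multiple of 2 up to n; plain / is the right tool */
    for (unsigned d = 2; d <= n; d += 2)
        printf("%u / %u = %u\n", n, d, n / d);
    return 0;
}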

Calculating natural logarithm and exponent by core C for Embedded System

I need to write two functions in C to calculate the natural log and the exponential, to be executed on an embedded system (microcontroller). I am not going to use any library function; rather, I need to write those functions using core C only.
You'll have to learn/use some calculus in order to do this:
http://en.wikipedia.org/wiki/Natural_logarithm#Derivative.2C_Taylor_series
Not very difficult to implement (if you know the expected input range, I would say use a Maclaurin series, which, if memory serves correctly, should work well), but little mistakes lead to big problems.
I would agree with Dhaivat that approximation via Taylor or Maclaurin series is the way to go should you need to implement natural logarithm yourself for an embedded system.
As to exponentiation, you might want to look here:
The most efficient way to implement an integer based power function pow(int, int)
Good luck,
The two usual solutions are Taylor series and lookup tables.
Choosing one over the other depends on two main aspects:
maximum speed: lookup table wins
minimum memory: Taylor series wins
The choice is also guided by other aspects that impact the first two:
range of input values
precision
If precision can be loose, you may consider a trick with floating point values: the exponent field of a value x is actually an approximation of log2(x). Switching between log2() and ln() is easy if you know ln(2).
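A rough sketch of that trick (my code, assuming IEEE-754 single precision): the stored exponent field is floor(log2(x)) plus a bias of 127, and ln(x) ≈ log2(x) * ln(2):

#include <string.h>

/* coarse ln(x) from the IEEE-754 exponent field; x must be positive and normal */
static float rough_ln(float x)
{
    unsigned int bits;
    memcpy(&bits, &x, sizeof bits);                   /* grab the bit pattern */
    int exponent = (int)((bits >> 23) & 0xFF) - 127;  /* floor(log2(x)) */
    return (float)exponent * 0.69314718f;             /* times ln(2) */
}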
Computing logarithms is possible using only division and multiplication in C:
static double native_log_computation(const double n) {
    // Basic logarithm computation.
    static const double euler = 2.7182818284590452354;
    unsigned a = 0, d;
    double b, c, e, f;
    if (n > 0) {
        for (c = n < 1 ? 1 / n : n; (c /= euler) > 1; ++a);
        c = 1 / (c * euler - 1), c = c + c + 1, f = c * c, b = 0;
        for (d = 1, c /= 2; e = b, b += 1 / (d * c), b - e /* > 0.0000001 */;)
            d += 2, c *= f;
    } else b = (n == 0) / 0.;
    return n < 1 ? -(a + b) : a + b;
}

static inline double native_ln(const double n) {
    // Returns the natural logarithm (base e) of N.
    return native_log_computation(n);
}

static inline double native_log_base(const double n, const double base) {
    // Returns the logarithm (base b) of N.
    return native_log_computation(n) / native_log_computation(base);
}

Efficient implementation of natural logarithm (ln) and exponentiation

I'm looking for implementations of the log() and exp() functions provided in the C library <math.h>. I'm working with 8-bit microcontrollers (OKI 411 and 431). I need to calculate the Mean Kinetic Temperature. The requirement is that we should be able to calculate MKT as fast as possible and with as little code memory as possible. The compiler comes with log() and exp() functions in <math.h>, but calling either function and linking with the library causes the code size to increase by 5 kilobytes, which will not fit in one of the micros we work with (OKI 411), because our code has already consumed ~12K of the available ~15K code memory.
The implementation I'm looking for should not use any other C library functions (like pow(), sqrt() etc.). This is because all library functions are packed in one library, and even if one function is called, the linker will bring the whole 5K library into code memory.
EDIT
The algorithm should be correct up to 3 decimal places.
Using a Taylor series is neither the simplest nor the fastest way of doing this. Most professional implementations use approximating polynomials. I'll show you how to generate one in Maple (a computer algebra program), using the Remez algorithm.
For 3 digits of accuracy execute the following commands in Maple:
with(numapprox):
Digits := 8
minimax(ln(x), x = 1 .. 2, 4, 1, 'maxerror')
maxerror
Its response is the following polynomial:
-1.7417939 + (2.8212026 + (-1.4699568 + (0.44717955 - 0.056570851 * x) * x) * x) * x
With a maximal error of 0.000061011436.
We generated a polynomial which approximates ln(x), but only inside the [1..2] interval. Increasing the interval is not wise, because that would increase the maximal error even more. Instead of that, do the following decomposition:
So first find the highest power of 2 which is still smaller than the number (see: What is the fastest/most efficient way to find the highest set bit (msb) in an integer in C?). Its exponent is the integer part of the base-2 logarithm. Divide by that power of two, and the result falls into the [1..2] interval. At the end we will have to add n * ln(2) (where n is that exponent) to get the final result.
An example implementation for numbers >= 1:
float ln(float y) {
    int log2;
    float divisor, x, result;

    log2 = msb((int)y); // See: https://stackoverflow.com/a/4970859/6630230
    divisor = (float)(1 << log2);
    x = y / divisor;    // normalized value between [1.0, 2.0]

    result = -1.7417939 + (2.8212026 + (-1.4699568 + (0.44717955 - 0.056570851 * x) * x) * x) * x;
    result += ((float)log2) * 0.69314718; // ln(2) = 0.69314718

    return result;
}
Although if you plan to use it only in the [1.0, 2.0] interval, then the function is like:
float ln(float x) {
    return -1.7417939 + (2.8212026 + (-1.4699568 + (0.44717955 - 0.056570851 * x) * x) * x) * x;
}
The Taylor series for e^x converges extremely quickly, and you can tune your implementation to the precision that you need. (http://en.wikipedia.org/wiki/Taylor_series)
The Taylor series for log is not as nice...
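As an illustration of the e^x case (my code, reasonable for modest |x|; for large |x| it is better to reduce the argument first):

/* e^x as the Taylor sum of x^k / k!, stopping once terms stop mattering */
static double exp_taylor(double x)
{
    double term = 1.0, sum = 1.0;
    for (int k = 1; term > 1e-9 || term < -1e-9; k++) {
        term *= x / k;   /* next term: multiply by x, divide by k */
        sum += term;
    }
    return sum;
}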
If you don't need floating-point math for anything else, you may compute an approximate fractional base-2 log pretty easily. Start by shifting your value left until it's 32768 or higher and store the number of times you did that in count. Then, repeat some number of times (depending upon your desired scale factor):
n = (mult(n, n) + 32768u) >> 16; // if a function is available for 16x16->32 multiply
count <<= 1;
if (n < 32768) n *= 2; else count += 1;
If the above loop is repeated 8 times, then the log base 2 of the number will be count/256. If ten times, count/1024. If eleven, count/2048. Effectively, this function works by computing the integer power-of-two logarithm of n**(2^reps), but with intermediate values scaled to avoid overflow.
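A self-contained reconstruction of that loop (my code: the 16x16->32 mult() is a plain uint32_t multiply here, and the normalization count is folded into the result so the return value is directly 256 * log2(n)):

#include <stdint.h>

/* fractional base-2 log of n > 0, scaled by 256 (8 squaring steps) */
static int32_t log2_256(uint32_t n)
{
    int32_t count = 15;                        /* log2(32768) */
    while (n < 32768u) { n <<= 1; count--; }   /* normalize into [32768, 65536) */
    while (n >= 65536u) { n >>= 1; count++; }  /* also accept wider inputs */
    for (int i = 0; i < 8; i++) {
        n = (n * n + 32768u) >> 16;            /* square, keep the 16-bit scale */
        count <<= 1;                           /* shift in the next log bit */
        if (n < 32768u) n <<= 1; else count += 1;
    }
    return count;                              /* ~= 256 * log2(original n) */
}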
Would a basic table with interpolation between values work? If the range of input values is limited (which is likely for your case; I doubt temperature readings have a huge range) and high precision is not required, it may work. It should be easy to test on a normal machine.
Here is one of many topics on table representation of functions: Calculating vs. lookup tables for sine value performance?
Necromancing.
I had to implement logarithms on rational numbers.
This is how I did it:
According to Wikipedia, there is the Halley-Newton approximation method, which can be used for very high precision.
Using Newton's method, the iteration simplifies to the implementation below, which has cubic convergence to ln(x), which is way better than what the Taylor series offers.
// Using Newton's method, the iteration simplifies to (implementation)
// which has cubic convergence to ln(x).
public static double ln(double x, double epsilon)
{
    double yn = x - 1.0d; // using the first term of the Taylor series as initial value
    double yn1 = yn;

    do
    {
        yn = yn1;
        yn1 = yn + 2 * (x - System.Math.Exp(yn)) / (x + System.Math.Exp(yn));
    } while (System.Math.Abs(yn - yn1) > epsilon);

    return yn1;
}
This is not C but C#, but I'm sure anybody capable of programming in C will be able to deduce the C code from it.
Furthermore, since
logn(x) = ln(x) / ln(n)
you have therefore just implemented logn as well:
public static double log(double x, double n, double epsilon)
{
    return ln(x, epsilon) / ln(n, epsilon);
}
where epsilon (error) is the minimum precision.
Now as to speed, you're probably better off using the ln cast in hardware, but as I said, I used this as a base to implement logarithms on a rational-number class working with arbitrary precision.
Arbitrary precision might be more important than speed, under certain circumstances.
Then, use the logarithmic identities for rational numbers:
logB(x/y) = logB(x) - logB(y)
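A direct C translation of that iteration might look like this (my sketch; it uses exp() and fabs() from <math.h>, so it is only useful once an exponential such as the expo() further down is available):

#include <math.h>

/* ln(x) by the iteration above; cubic convergence as described */
double ln_newton(double x, double epsilon)
{
    double yn = x - 1.0;   /* first Taylor term as the initial guess */
    double yn1 = yn;
    do {
        yn = yn1;
        double ey = exp(yn);                  /* evaluate e^yn once per step */
        yn1 = yn + 2.0 * (x - ey) / (x + ey);
    } while (fabs(yn - yn1) > epsilon);
    return yn1;
}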
In addition to Crouching Kitten's answer, which gave me inspiration, you can build a pseudo-recursive (at most one self-call) logarithm to avoid using polynomials. In pseudo-code:
ln(x) :=
    If (x <= 0)
        return NaN
    Else if (!(1 <= x < 2))
        return LN2 * b + ln(a)
    Else
        return taylor_expansion(x - 1)
This is pretty efficient and precise, since on [1; 2) the Taylor series converges A LOT faster, and we get such a number 1 <= a < 2 with the first call to ln if our input is positive but not in this range.
You can find b as the unbiased exponent from the data held in the float x, and a from the mantissa of the float x (a is exactly the same float as x, but with exponent 0 instead of b). LN2 should be kept as a macro in hexadecimal floating-point notation, IMO. You can also use frexp() (http://man7.org/linux/man-pages/man3/frexp.3.html) for this.
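For example (my sketch), frexpf() performs exactly that decomposition: it returns a mantissa m in [0.5, 1) and an exponent e with x = m * 2^e, so a = 2m and b = e - 1:

#include <math.h>

/* split x > 0 into a in [1, 2) and an integer b such that x = a * 2^b */
static float split_for_ln(float x, int *b)
{
    int e;
    float m = frexpf(x, &e);   /* x = m * 2^e with 0.5 <= m < 1 */
    *b = e - 1;
    return m * 2.0f;           /* now in [1, 2) */
}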
Also, the trick
unsigned long tmp = *(unsigned long *)(&d);
for "memory-casting" double to unsigned long, rather than "value-casting", is very useful to know when dealing with floats memory-wise, as bitwise operators will cause warnings or errors depending on the compiler.
Possible computation of ln(x) and expo(x) in C without <math.h> :
static double expo(double n) {
    int a = 0, b = n > 0;
    double c = 1, d = 1, e = 1;
    for (b || (n = -n); e + .00001 < (e += (d *= n) / (c *= ++a)););
    // approximately 15 iterations
    return b ? e : 1 / e;
}
(The native_log_computation(), native_ln() and native_log_base() functions are identical to the versions shown earlier in this thread.)
Building off Crouching Kitten's great natural log answer above: if you need it to be accurate for inputs below 1, you can add a simple scaling factor. Below is an example in C++ that I've used on microcontrollers. It has a scaling factor of 256, making it accurate for inputs down to 1/256 (~0.004) and up to 2^32/256 = 16777216 (due to overflow of a uint32 variable).
It's interesting to note that even on an STM32F103 (Arm Cortex-M3, no FPU), the float implementation below is significantly faster (e.g. 3x or better) than the 16-bit fixed-point implementation in libfixmath (that said, this float implementation still takes a few thousand cycles, so it's still not ~fast~).
#include <float.h>
#include <stdint.h>

float TempSensor::Ln(float y)
{
    // Algo from: https://stackoverflow.com/a/18454010
    // Accurate between (1 / scaling factor) < y < (2^32 / scaling factor).
    // Read the comments below for more info on how to extend this range.
    float divisor, x, result;
    const float LN_2 = 0.69314718; // pre-calculated constant used in calculations
    uint32_t log2 = 0;

    // handle if input is less than or equal to zero
    if (y <= 0)
    {
        return -FLT_MAX;
    }

    // Scaling factor. The polynomial below is accurate when the input y > 1;
    // using a scaling factor of 256 (2^8) extends this down to 1/256 (~0.004).
    // Given the use of uint32_t, the input y must stay below 2^24 = 16777216
    // (2^(32-8)), otherwise uint_y below will overflow. Increasing the scaling
    // factor reduces the lower accuracy bound and also reduces the upper
    // overflow bound. If you need a wider range, consider changing uint_y
    // to a uint64_t.
    const uint32_t SCALING_FACTOR = 256;
    const float LN_SCALING_FACTOR = 5.545177444; // natural log of the scaling factor, pre-calculated

    y = y * SCALING_FACTOR;

    uint32_t uint_y = (uint32_t)y;
    while (uint_y >>= 1) // Find the location of the MSB: the integer portion of Log2(y).
    {                    // See: https://stackoverflow.com/a/4970859/6630230
        log2++;
    }

    divisor = (float)(1 << log2);
    x = y / divisor; // remainder value between [1.0, 2.0]

    // This polynomial approximates ln(x) between [1, 2]
    result = -1.7417939 + (2.8212026 + (-1.4699568 + (0.44717955 - 0.056570851 * x) * x) * x) * x;
    // Using the log product rule Log(A) + Log(B) = Log(AB) and the base-change
    // rule log_x(A) = log_y(A) / log_y(x), calculate all the components in
    // base e and sum them: Ln(x_remainder) + log_2(x_integer) * ln(2) - ln(SCALING_FACTOR)
    result = result + ((float)log2) * LN_2 - LN_SCALING_FACTOR;

    return result;
}
