I need to compute the mathematical expression floor(ln(u)/ln(1-p)) for 0 < u < 1 and 0 < p < 1 in C on an embedded processor with no floating point arithmetics and no ln function. The result is a positive integer. I know about the limit cases (p=0), I'll deal with them later...
I imagine that the solution involves having u and p range over 0..UINT16_MAX, and appeal to a lookup table for the logarithm, but I cannot figure out how exactly: what does the lookup table map to?
The result needs not be 100% exact, approximations are OK.
Thanks!
Since the logarithm is used in both dividend and divisor, there is no need to use log(); we can use log2() instead. Due to the restrictions on the inputs u and p the logarithms are known to be both negative, so we can restrict ourselves to compute the positive quantity -log2().
We can use fixed-point arithmetic to compute the logarithm. We do so by multiplying the original input by a sequence of factors of decreasing magnitude that approach 1. Considering each of the factor in sequence, we multiply the input only by those factors that result in a product closer to 1, but without exceeding it. While doing so, we sum the log2() of the factors that "fit". At the end of this procedure we wind up with a number very close to 1 as our final product, and a sum that represents the binary logarithm.
This process is known in the literature as multiplicative normalization or pseudo division, and some early publications describing it are the works by De Lugish and Meggitt. The latter indicates that the origin is basically Henry Briggs's method for computing common logarithms.
B. de Lugish. "A Class of Algorithms for Automatic Evaluation of Functions and Computations in a Digital Computer". PhD thesis, Dept. of Computer Science, University of Illinois, Urbana, 1970.
J. E. Meggitt. "Pseudo division and pseudo multiplication processes". IBM Journal of Research and Development, Vol. 6, No. 2, April 1962, pp. 210-226
As the chosen set of factors comprises 2i and (1+2-i) the necessary multiplications can be performed without the need for a multiplication instruction: the products can be computed by either shift or shift plus add.
Since the inputs u and p are purely fractional numbers with 16 bits, we may want to chose a 5.16 fixed-point result for the logarithm. By simply dividing the two logarithm values, we remove the fixed-point scale factor, and apply a floor() operation at the same time, because for positive numbers, floor(x) is identical to trunc(x) and integer division is truncating.
Note that the fixed-point computation of the logarithm results in large relative error for inputs near 1. This in turn means the entire function computed using fixed-point arithmetic may deliver results significantly different from the reference if p is small. An example of this is the following test case: u=55af p=0052 res=848 ref=874.
#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>
/* input x is a 0.16 fixed-point number in [0,1)
function returns -log2(x) as a 5.16 fixed-point number in (0, 16]
*/
uint32_t nlog2_16 (uint16_t x)
{
uint32_t r = 0;
uint32_t t, a = x;
/* try factors 2**i with i = 8, 4, 2, 1 */
if ((t = a << 8 ) < 0x10000) { a = t; r += 0x80000; }
if ((t = a << 4 ) < 0x10000) { a = t; r += 0x40000; }
if ((t = a << 2 ) < 0x10000) { a = t; r += 0x20000; }
if ((t = a << 1 ) < 0x10000) { a = t; r += 0x10000; }
/* try factors (1+2**(-i)) with i = 1, .., 16 */
if ((t = a + (a >> 1)) < 0x10000) { a = t; r += 0x095c0; }
if ((t = a + (a >> 2)) < 0x10000) { a = t; r += 0x0526a; }
if ((t = a + (a >> 3)) < 0x10000) { a = t; r += 0x02b80; }
if ((t = a + (a >> 4)) < 0x10000) { a = t; r += 0x01664; }
if ((t = a + (a >> 5)) < 0x10000) { a = t; r += 0x00b5d; }
if ((t = a + (a >> 6)) < 0x10000) { a = t; r += 0x005ba; }
if ((t = a + (a >> 7)) < 0x10000) { a = t; r += 0x002e0; }
if ((t = a + (a >> 8)) < 0x10000) { a = t; r += 0x00171; }
if ((t = a + (a >> 9)) < 0x10000) { a = t; r += 0x000b8; }
if ((t = a + (a >> 10)) < 0x10000) { a = t; r += 0x0005c; }
if ((t = a + (a >> 11)) < 0x10000) { a = t; r += 0x0002e; }
if ((t = a + (a >> 12)) < 0x10000) { a = t; r += 0x00017; }
if ((t = a + (a >> 13)) < 0x10000) { a = t; r += 0x0000c; }
if ((t = a + (a >> 14)) < 0x10000) { a = t; r += 0x00006; }
if ((t = a + (a >> 15)) < 0x10000) { a = t; r += 0x00003; }
if ((t = a + (a >> 16)) < 0x10000) { a = t; r += 0x00001; }
return r;
}
/* Compute floor(log(u)/log(1-p)) for 0 < u < 1 and 0 < p < 1,
where 'u' and 'p' are represented as 0.16 fixed-point numbers
Result is an integer in range [0, 1048676]
*/
uint32_t func (uint16_t u, uint16_t p)
{
uint16_t one_minus_p = 0x10000 - p; // 1.0 - p
uint32_t log_u = nlog2_16 (u);
uint32_t log_p = nlog2_16 (one_minus_p);
uint32_t res = log_u / log_p; // divide and floor in one go
return res;
}
The maximum value of this function basically depends on the precision limit; that is, how arbitrarily close to the limits (u -> 0) or (1 - p -> 1) the fixed point values can be.
If we assume (k) fractional bits, e.g., with the limits: u = (2^-k) and 1 - p = 1 - (2^-k),
then the maximum value is: k / (k - log2(2^k - 1))
(As the ratio of natural logarithms, we are free to use any base e.g., lb(x) or log2)
Unlike njuffa's answer, I went with a lookup table approach, settling on k = 10 fractional bits to represent 0 < frac(u) < 1024 and 0 < frac(p) < 1024. This requires a log table with 2^k entries. Using 32-bit table values, we're only looking at a 4KiB table.
Any more than that, and you are using enough memory that you could seriously consider using the relevant parts of a 'soft-float' library. e.g., k = 16 would yield a 256KiB LUT.
We're computing the values - log2(i / 1024.0) for 0 < i < 1024. Since these values are in the open interval (0, k), we only need 4 binary digits to store the integral part. So we store the precomputed LUT in 32-bit [4.28] fixed-point format:
uint32_t lut[1024]; /* never use lut[0] */
for (uint32_t i = 1; i < 1024; i++)
lut[i] = (uint32_t) (- (log2(i / 1024.0) * (268435456.0));
Given: u, p represented by [0.10] fixed-point values in [1, 1023] :
uint32_t func (uint16_t u, uint16_t p)
{
/* assert: 0 < u, p < 1024 */
return lut[u] / lut[1024 - p];
}
We can easily test all valid (u, p) pairs against the 'naive' floating-point evaluation:
floor(log(u / 1024.0) / log(1.0 - p / 1024.0))
and only get a mismatch (+1 too high) on the following cases:
u = 193, p = 1 : 1708 vs 1707 (1.7079978488147417e+03)
u = 250, p = 384 : 3 vs 2 (2.9999999999999996e+00)
u = 413, p = 4 : 232 vs 231 (2.3199989016957960e+02)
u = 603, p = 1 : 542 vs 541 (5.4199909906444600e+02)
u = 680, p = 1 : 419 vs 418 (4.1899938077226307e+02)
Finally, it turns out that using the natural logarithm in a [3.29] fixed-point format gives us even higher precision, where:
lut[i] = (uint32_t) (- (log(i / 1024.0) * (536870912.0));
only yields a single 'mismatch', though 'bignum' precision suggests it's correct:
u = 250, p = 384 : 3 vs 2 (2.9999999999999996e+00)
What is the most efficient way given to raise an integer to the power of another integer in C?
// 2^3
pow(2,3) == 8
// 5^5
pow(5,5) == 3125
Exponentiation by squaring.
int ipow(int base, int exp)
{
int result = 1;
for (;;)
{
if (exp & 1)
result *= base;
exp >>= 1;
if (!exp)
break;
base *= base;
}
return result;
}
This is the standard method for doing modular exponentiation for huge numbers in asymmetric cryptography.
Note that exponentiation by squaring is not the most optimal method. It is probably the best you can do as a general method that works for all exponent values, but for a specific exponent value there might be a better sequence that needs fewer multiplications.
For instance, if you want to compute x^15, the method of exponentiation by squaring will give you:
x^15 = (x^7)*(x^7)*x
x^7 = (x^3)*(x^3)*x
x^3 = x*x*x
This is a total of 6 multiplications.
It turns out this can be done using "just" 5 multiplications via addition-chain exponentiation.
n*n = n^2
n^2*n = n^3
n^3*n^3 = n^6
n^6*n^6 = n^12
n^12*n^3 = n^15
There are no efficient algorithms to find this optimal sequence of multiplications. From Wikipedia:
The problem of finding the shortest addition chain cannot be solved by dynamic programming, because it does not satisfy the assumption of optimal substructure. That is, it is not sufficient to decompose the power into smaller powers, each of which is computed minimally, since the addition chains for the smaller powers may be related (to share computations). For example, in the shortest addition chain for a¹⁵ above, the subproblem for a⁶ must be computed as (a³)² since a³ is re-used (as opposed to, say, a⁶ = a²(a²)², which also requires three multiplies).
If you need to raise 2 to a power. The fastest way to do so is to bit shift by the power.
2 ** 3 == 1 << 3 == 8
2 ** 30 == 1 << 30 == 1073741824 (A Gigabyte)
Here is the method in Java
private int ipow(int base, int exp)
{
int result = 1;
while (exp != 0)
{
if ((exp & 1) == 1)
result *= base;
exp >>= 1;
base *= base;
}
return result;
}
An extremely specialized case is, when you need say 2^(-x to the y), where x, is of course is negative and y is too large to do shifting on an int. You can still do 2^x in constant time by screwing with a float.
struct IeeeFloat
{
unsigned int base : 23;
unsigned int exponent : 8;
unsigned int signBit : 1;
};
union IeeeFloatUnion
{
IeeeFloat brokenOut;
float f;
};
inline float twoToThe(char exponent)
{
// notice how the range checking is already done on the exponent var
static IeeeFloatUnion u;
u.f = 2.0;
// Change the exponent part of the float
u.brokenOut.exponent += (exponent - 1);
return (u.f);
}
You can get more powers of 2 by using a double as the base type.
(Thanks a lot to commenters for helping to square this post away).
There's also the possibility that learning more about IEEE floats, other special cases of exponentiation might present themselves.
power() function to work for Integers Only
int power(int base, unsigned int exp){
if (exp == 0)
return 1;
int temp = power(base, exp/2);
if (exp%2 == 0)
return temp*temp;
else
return base*temp*temp;
}
Complexity = O(log(exp))
power() function to work for negative exp and float base.
float power(float base, int exp) {
if( exp == 0)
return 1;
float temp = power(base, exp/2);
if (exp%2 == 0)
return temp*temp;
else {
if(exp > 0)
return base*temp*temp;
else
return (temp*temp)/base; //negative exponent computation
}
}
Complexity = O(log(exp))
If you want to get the value of an integer for 2 raised to the power of something it is always better to use the shift option:
pow(2,5) can be replaced by 1<<5
This is much more efficient.
int pow( int base, int exponent)
{ // Does not work for negative exponents. (But that would be leaving the range of int)
if (exponent == 0) return 1; // base case;
int temp = pow(base, exponent/2);
if (exponent % 2 == 0)
return temp * temp;
else
return (base * temp * temp);
}
Just as a follow up to comments on the efficiency of exponentiation by squaring.
The advantage of that approach is that it runs in log(n) time. For example, if you were going to calculate something huge, such as x^1048575 (2^20 - 1), you only have to go thru the loop 20 times, not 1 million+ using the naive approach.
Also, in terms of code complexity, it is simpler than trying to find the most optimal sequence of multiplications, a la Pramod's suggestion.
Edit:
I guess I should clarify before someone tags me for the potential for overflow. This approach assumes that you have some sort of hugeint library.
Late to the party:
Below is a solution that also deals with y < 0 as best as it can.
It uses a result of intmax_t for maximum range. There is no provision for answers that do not fit in intmax_t.
powjii(0, 0) --> 1 which is a common result for this case.
pow(0,negative), another undefined result, returns INTMAX_MAX
intmax_t powjii(int x, int y) {
if (y < 0) {
switch (x) {
case 0:
return INTMAX_MAX;
case 1:
return 1;
case -1:
return y % 2 ? -1 : 1;
}
return 0;
}
intmax_t z = 1;
intmax_t base = x;
for (;;) {
if (y % 2) {
z *= base;
}
y /= 2;
if (y == 0) {
break;
}
base *= base;
}
return z;
}
This code uses a forever loop for(;;) to avoid the final base *= base common in other looped solutions. That multiplication is 1) not needed and 2) could be int*int overflow which is UB.
more generic solution considering negative exponenet
private static int pow(int base, int exponent) {
int result = 1;
if (exponent == 0)
return result; // base case;
if (exponent < 0)
return 1 / pow(base, -exponent);
int temp = pow(base, exponent / 2);
if (exponent % 2 == 0)
return temp * temp;
else
return (base * temp * temp);
}
The O(log N) solution in Swift...
// Time complexity is O(log N)
func power(_ base: Int, _ exp: Int) -> Int {
// 1. If the exponent is 1 then return the number (e.g a^1 == a)
//Time complexity O(1)
if exp == 1 {
return base
}
// 2. Calculate the value of the number raised to half of the exponent. This will be used to calculate the final answer by squaring the result (e.g a^2n == (a^n)^2 == a^n * a^n). The idea is that we can do half the amount of work by obtaining a^n and multiplying the result by itself to get a^2n
//Time complexity O(log N)
let tempVal = power(base, exp/2)
// 3. If the exponent was odd then decompose the result in such a way that it allows you to divide the exponent in two (e.g. a^(2n+1) == a^1 * a^2n == a^1 * a^n * a^n). If the eponent is even then the result must be the base raised to half the exponent squared (e.g. a^2n == a^n * a^n = (a^n)^2).
//Time complexity O(1)
return (exp % 2 == 1 ? base : 1) * tempVal * tempVal
}
int pow(int const x, unsigned const e) noexcept
{
return !e ? 1 : 1 == e ? x : (e % 2 ? x : 1) * pow(x * x, e / 2);
//return !e ? 1 : 1 == e ? x : (((x ^ 1) & -(e % 2)) ^ 1) * pow(x * x, e / 2);
}
Yes, it's recursive, but a good optimizing compiler will optimize recursion away.
One more implementation (in Java). May not be most efficient solution but # of iterations is same as that of Exponential solution.
public static long pow(long base, long exp){
if(exp ==0){
return 1;
}
if(exp ==1){
return base;
}
if(exp % 2 == 0){
long half = pow(base, exp/2);
return half * half;
}else{
long half = pow(base, (exp -1)/2);
return base * half * half;
}
}
I use recursive, if the exp is even,5^10 =25^5.
int pow(float base,float exp){
if (exp==0)return 1;
else if(exp>0&&exp%2==0){
return pow(base*base,exp/2);
}else if (exp>0&&exp%2!=0){
return base*pow(base,exp-1);
}
}
In addition to the answer by Elias, which causes Undefined Behaviour when implemented with signed integers, and incorrect values for high input when implemented with unsigned integers,
here is a modified version of the Exponentiation by Squaring that also works with signed integer types, and doesn't give incorrect values:
#include <stdint.h>
#define SQRT_INT64_MAX (INT64_C(0xB504F333))
int64_t alx_pow_s64 (int64_t base, uint8_t exp)
{
int_fast64_t base_;
int_fast64_t result;
base_ = base;
if (base_ == 1)
return 1;
if (!exp)
return 1;
if (!base_)
return 0;
result = 1;
if (exp & 1)
result *= base_;
exp >>= 1;
while (exp) {
if (base_ > SQRT_INT64_MAX)
return 0;
base_ *= base_;
if (exp & 1)
result *= base_;
exp >>= 1;
}
return result;
}
Considerations for this function:
(1 ** N) == 1
(N ** 0) == 1
(0 ** 0) == 1
(0 ** N) == 0
If any overflow or wrapping is going to take place, return 0;
I used int64_t, but any width (signed or unsigned) can be used with little modification. However, if you need to use a non-fixed-width integer type, you will need to change SQRT_INT64_MAX by (int)sqrt(INT_MAX) (in the case of using int) or something similar, which should be optimized, but it is uglier, and not a C constant expression. Also casting the result of sqrt() to an int is not very good because of floating point precission in case of a perfect square, but as I don't know of any implementation where INT_MAX -or the maximum of any type- is a perfect square, you can live with that.
I have implemented algorithm that memorizes all computed powers and then uses them when need. So for example x^13 is equal to (x^2)^2^2 * x^2^2 * x where x^2^2 it taken from the table instead of computing it once again. This is basically implementation of #Pramod answer (but in C#).
The number of multiplication needed is Ceil(Log n)
public static int Power(int base, int exp)
{
int tab[] = new int[exp + 1];
tab[0] = 1;
tab[1] = base;
return Power(base, exp, tab);
}
public static int Power(int base, int exp, int tab[])
{
if(exp == 0) return 1;
if(exp == 1) return base;
int i = 1;
while(i < exp/2)
{
if(tab[2 * i] <= 0)
tab[2 * i] = tab[i] * tab[i];
i = i << 1;
}
if(exp <= i)
return tab[i];
else return tab[i] * Power(base, exp - i, tab);
}
Here is a O(1) algorithm for calculating x ** y, inspired by this comment. It works for 32-bit signed int.
For small values of y, it uses exponentiation by squaring. For large values of y, there are only a few values of x where the result doesn't overflow. This implementation uses a lookup table to read the result without calculating.
On overflow, the C standard permits any behavior, including crash. However, I decided to do bound-checking on LUT indices to prevent memory access violation, which could be surprising and undesirable.
Pseudo-code:
If `x` is between -2 and 2, use special-case formulas.
Otherwise, if `y` is between 0 and 8, use special-case formulas.
Otherwise:
Set x = abs(x); remember if x was negative
If x <= 10 and y <= 19:
Load precomputed result from a lookup table
Otherwise:
Set result to 0 (overflow)
If x was negative and y is odd, negate the result
C code:
#define POW9(x) x * x * x * x * x * x * x * x * x
#define POW10(x) POW9(x) * x
#define POW11(x) POW10(x) * x
#define POW12(x) POW11(x) * x
#define POW13(x) POW12(x) * x
#define POW14(x) POW13(x) * x
#define POW15(x) POW14(x) * x
#define POW16(x) POW15(x) * x
#define POW17(x) POW16(x) * x
#define POW18(x) POW17(x) * x
#define POW19(x) POW18(x) * x
int mypow(int x, unsigned y)
{
static int table[8][11] = {
{POW9(3), POW10(3), POW11(3), POW12(3), POW13(3), POW14(3), POW15(3), POW16(3), POW17(3), POW18(3), POW19(3)},
{POW9(4), POW10(4), POW11(4), POW12(4), POW13(4), POW14(4), POW15(4), 0, 0, 0, 0},
{POW9(5), POW10(5), POW11(5), POW12(5), POW13(5), 0, 0, 0, 0, 0, 0},
{POW9(6), POW10(6), POW11(6), 0, 0, 0, 0, 0, 0, 0, 0},
{POW9(7), POW10(7), POW11(7), 0, 0, 0, 0, 0, 0, 0, 0},
{POW9(8), POW10(8), 0, 0, 0, 0, 0, 0, 0, 0, 0},
{POW9(9), 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
{POW9(10), 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
};
int is_neg;
int r;
switch (x)
{
case 0:
return y == 0 ? 1 : 0;
case 1:
return 1;
case -1:
return y % 2 == 0 ? 1 : -1;
case 2:
return 1 << y;
case -2:
return (y % 2 == 0 ? 1 : -1) << y;
default:
switch (y)
{
case 0:
return 1;
case 1:
return x;
case 2:
return x * x;
case 3:
return x * x * x;
case 4:
r = x * x;
return r * r;
case 5:
r = x * x;
return r * r * x;
case 6:
r = x * x;
return r * r * r;
case 7:
r = x * x;
return r * r * r * x;
case 8:
r = x * x;
r = r * r;
return r * r;
default:
is_neg = x < 0;
if (is_neg)
x = -x;
if (x <= 10 && y <= 19)
r = table[x - 3][y - 9];
else
r = 0;
if (is_neg && y % 2 == 1)
r = -r;
return r;
}
}
}
My case is a little different, I'm trying to create a mask from a power, but I thought I'd share the solution I found anyway.
Obviously, it only works for powers of 2.
Mask1 = 1 << (Exponent - 1);
Mask2 = Mask1 - 1;
return Mask1 + Mask2;
In case you know the exponent (and it is an integer) at compile-time, you can use templates to unroll the loop. This can be made more efficient, but I wanted to demonstrate the basic principle here:
#include <iostream>
template<unsigned long N>
unsigned long inline exp_unroll(unsigned base) {
return base * exp_unroll<N-1>(base);
}
We terminate the recursion using a template specialization:
template<>
unsigned long inline exp_unroll<1>(unsigned base) {
return base;
}
The exponent needs to be known at runtime,
int main(int argc, char * argv[]) {
std::cout << argv[1] <<"**5= " << exp_unroll<5>(atoi(argv[1])) << ;std::endl;
}
I've noticed something strange about the standard exponential squaring algorithm with gnu-GMP :
I implemented 2 nearly-identical functions - a power-modulo function using the most vanilla binary exponential squaring algorithm,
labeled ______2()
then another one basically the same concept, but re-mapped to dividing by 10 at each round instead of dividing by 2,
labeled ______10()
.
( time ( jot - 1456 9999999999 6671 | pvE0 |
gawk -Mbe '
function ______10(_, __, ___, ____, _____, _______) {
__ = +__
____ = (____+=_____=____^= \
(_ %=___=+___)<_)+____++^____—
while (__) {
if (_______= __%____) {
if (__==_______) {
return (_^__ *_____) %___
}
__-=_______
_____ = (_^_______*_____) %___
}
__/=____
_ = _^____%___
}
}
function ______2(_, __, ___, ____, _____) {
__=+__
____+=____=_____^=(_%=___=+___)<_
while (__) {
if (__ %____) {
if (__<____) {
return (_*_____) %___
}
_____ = (_____*_) %___
--__
}
__/=____
_= (_*_) %___
}
}
BEGIN {
OFMT = CONVFMT = "%.250g"
__ = (___=_^= FS=OFS= "=")(_<_)
_____ = __^(_=3)^--_ * ++_-(_+_)^_
______ = _^(_+_)-_ + _^!_
_______ = int(______*_____)
________ = 10 ^ 5 + 1
_________ = 8 ^ 4 * 2 - 1
}
GNU Awk 5.1.1, API: 3.1 (GNU MPFR 4.1.0, GNU MP 6.2.1)
.
($++NF = ______10(_=$___, NR %________ +_________,_______*(_-11))) ^!___'
out9: 48.4MiB 0:00:08 [6.02MiB/s] [6.02MiB/s] [ <=> ]
in0: 15.6MiB 0:00:08 [1.95MiB/s] [1.95MiB/s] [ <=> ]
( jot - 1456 9999999999 6671 | pvE 0.1 in0 | gawk -Mbe ; )
8.31s user 0.06s system 103% cpu 8.058 total
ffa16aa937b7beca66a173ccbf8e1e12 stdin
($++NF = ______2(_=$___, NR %________ +_________,_______*(_-11))) ^!___'
out9: 48.4MiB 0:00:12 [3.78MiB/s] [3.78MiB/s] [<=> ]
in0: 15.6MiB 0:00:12 [1.22MiB/s] [1.22MiB/s] [ <=> ]
( jot - 1456 9999999999 6671 | pvE 0.1 in0 | gawk -Mbe ; )
13.05s user 0.07s system 102% cpu 12.821 total
ffa16aa937b7beca66a173ccbf8e1e12 stdin
For reasons extremely counter-intuitive and unknown to me, for a wide variety of inputs i threw at it, the div-10 variant is nearly always faster. It's the matching of hashes between the 2 that made it truly baffling, despite computers obviously not being built in and for a base-10 paradigm.
Am I missing something critical or obvious in the code/approach that might be skewing the results in a confounding manner ? Thanks.