Efficient exponentials with small base - c

I need to perform a softmax operation. That is, given a sequence of n real values ranging from -inf to +inf, I turn them into probabilities by exponentiating each value and dividing by the sum of the exponentials:
for (i = 0; i < n; i++)
p_x[i] = exp(x[i]) / sum_exp(x, n)
(don't take the code literally, I'm not summing up all exp's every iteration!)
I'm having overflow problems when values go above 700 in some extreme cases (using 8-byte doubles). I know I could use another base instead of e; however, I'm afraid calling pow will be much slower than exp (speed is critical for me).
What is the fastest way to solve this?

Use each number as the 52-bit mantissa in a 64-bit floating point number. This is simply a matter of masking then casting.
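#include <stdio.h>
#include <string.h>
int main(int argc, char *argv[])
{
    long long val = 1234567890;
    /* keep only the low 52 bits, i.e. the mantissa field of a 64-bit double */
    long long mval = val & ~0xfff0000000000000ULL;
    /* a 64-bit pattern must be reinterpreted as double, not as 32-bit float;
       memcpy does this without breaking the aliasing rules */
    double fval;
    memcpy(&fval, &mval, sizeof fval);
    printf("%g", fval);
    return 0;
}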

b^x = e^(x * ln b)
So using a smaller base b is equivalent to multiplying your values by ln b before applying exp, and dividing again at the end.

Related

Buggy transfer of single long long numbers to int array

I’m trying to grab a long long int and split each digit into its own spot in an array, in order of course, with array[0] holding the most significant digit.
So for instance, if the number was 314, then array[0] = 3, array[1] =1, and array[2] = 4.
This is part of a calculator project for a microcontroller where I’m writing the graphics library (for fun) and using arrays to display each line.
The issue is, it needs to be able to deal with really large numbers (9,999,999,999+), and I’m having dramas with the large stuff. If the long long is < 1,000,000, it writes all the digits perfectly, but as I add more digits, the ones towards the end start to come out wrong.
For instance, 1,234,567,890 displays as 1,234,567,966.
Here’s the snippet of code I’m using:
long long int number = 1234567890;
int answerArray[10];
int numberLength = 10;
for(writeNumber = 0; writeNumber < numberLength; writeNumber++)
{
answerArray[writeNumber] = ((int)(number / pow(10, (numberLength - 1 - writeNumber))) % 10;
}
I’m fairly sure this has to do with either the “%” or the mix of data types, because any number within the int range works perfectly.
Can you see where I’m going wrong? Is there a better way to achieve my goal? Any tips for large numbers?
The signature of pow is
double pow(double x, double y);
When you call the function, the computation implicitly uses floating point. That is why the result is no longer exact, as it would be with pure integer operations.
In addition, you have to be careful how you cast to int.
In your question, you have
((int)(number / pow(10, (numberLength - 1 - writeNumber))) % 10;
The parentheses do not match, so I will assume you meant:
(int)(number / pow(10, (numberLength - 1 - writeNumber))) % 10;
However, here you cast a number that may exceed the range of int before you apply the modulo 10 operation. That can result in an integer overflow. The code is doing the same as if you had written:
((int)(number / pow(10, (numberLength - 1 - writeNumber)))) % 10;
To avoid the overflow, it would be better to perform the modulo operation first. However, you are dealing implicitly with double at this point (because of pow), so it is not ideal either. It is best to stick with pure integer operations to avoid these pitfalls.
Your issue is that you're casting what is potentially a very large number to an int. Look at the iteration when writeNumber is numberLength-1. In that case, you're dividing a long long by 1 and then forcing the result into an int. Once number becomes larger than 2^31-1, you're going to run into problems.
You should remove the cast altogether as well as the call to pow. Instead, you should iteratively grab the next digit by modding out by 10 and then dividing number (or a copy of it) by 10.
E.g.,
int index = sizeof(answerArray)/sizeof(answerArray[0]);
for (long long x=number; x>0; x /= 10) {
answerArray[--index] = x%10;
}
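A slightly fuller sketch of the same idea, under the assumption that the digits should be left-aligned in the array and that zero should produce a single digit (the loop above writes nothing for 0 and fills the array from the right). The name split_digits and its signature are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Writes the decimal digits of a non-negative number into out[],
   most significant digit first, and returns how many digits were written.
   Returns 0 if cap is too small. Pure integer arithmetic, no pow(). */
size_t split_digits(long long number, int *out, size_t cap)
{
    assert(number >= 0);
    /* Count the digits first so the result can be left-aligned. */
    size_t len = 1;
    for (long long x = number; x >= 10; x /= 10)
        len++;
    if (len > cap)
        return 0;
    /* Fill from the least significant digit backwards. */
    long long x = number;
    for (size_t i = len; i-- > 0; ) {
        out[i] = (int)(x % 10);
        x /= 10;
    }
    return len;
}
```

For 1234567890 this returns 10 and leaves out[0] = 1 through out[9] = 0.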

C - erroneous output after multiplication of large numbers

I'm implementing my own decrease-and-conquer method for a^n.
Here's the program:
#include <stdio.h>
#include <math.h>
#include <stdlib.h>
#include <time.h>
double dncpow(int a, int n)
{
double p = 1.0;
if(n != 0)
{
p = dncpow(a, n / 2);
p = p * p;
if(n % 2)
{
p = p * (double)a;
}
}
return p;
}
int main()
{
int a;
int n;
int a_upper = 10;
int n_upper = 50;
int times = 5;
time_t t;
srand(time(&t));
for(int i = 0; i < times; ++i)
{
a = rand() % a_upper;
n = rand() % n_upper;
printf("a = %d, n = %d\n", a, n);
printf("pow = %.0f\ndnc = %.0f\n\n", pow(a, n), dncpow(a, n));
}
return 0;
}
My code works for small values of a and n, but a mismatch in the output of pow() and dncpow() is observed for inputs such as:
a = 7, n = 39
pow = 909543680129861204865300750663680
dnc = 909543680129861348980488826519552
I'm pretty sure that the algorithm is correct, but dncpow() is giving me wrong answers.
Can someone please help me rectify this? Thanks in advance!
Simple as that, these numbers are too large for what your computer can represent exactly in a single variable. With a floating point type, there's an exponent stored separately and therefore it's still possible to represent a number near the real number, dropping the lowest bits of the mantissa.
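To make the limit concrete: assuming double is an IEEE 754 64-bit type (true on mainstream platforms, but an assumption here), it carries a 53-bit significand, so not every integer above 2^53 can be stored exactly. 7^39 needs roughly 110 bits. The helper name below is hypothetical.

```c
#include <assert.h>

/* Returns 1 if v survives a round trip through double unchanged.
   Assumes |v| < 2^62 so that converting back to long long is well defined. */
int fits_exactly_in_double(long long v)
{
    assert(v > -(1LL << 62) && v < (1LL << 62));
    /* volatile forces the value to be rounded to double precision
       even on targets that compute in wider registers */
    volatile double d = (double)v;
    return (long long)d == v;
}
```

For example, 9007199254740992 (2^53) round-trips, while 9007199254740993 (2^53 + 1) does not.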
Regarding this comment:
I'm getting similar outputs upon replacing 'double' with 'long long'. The latter is supposed to be stored exactly, isn't it?
If you call a function taking double, it won't magically operate on long long instead. Your value is simply converted to double and you'll just get the same result.
Even with a function handling long long (which has 64 bits on today's typical platforms), you can't deal with such large numbers. 64 bits aren't enough to store them. With an unsigned integer type, the result simply wraps around on overflow (it is reduced modulo 2^64). With a signed integer type, the behavior on overflow is undefined (but still somewhat likely to wrap around). So you'll get some number that has absolutely nothing to do with your expected result. That's arguably worse than the result with a floating point type, which is merely imprecise.
For exact calculations on large numbers, the only way is to store them in an array (typically of unsigned integers like uintmax_t) and implement all the arithmetic yourself. That's a nice exercise, and a lot of work, especially when performance is of interest (the "naive" arithmetic algorithms are typically very inefficient).
For some real-life program, you won't reinvent the wheel here, as there are libraries for handling large numbers. The arguably best known is libgmp. Read the manuals there and use it.
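As a taste of the array-based approach described above (not a substitute for GMP), here is a minimal sketch that computes a^e exactly by repeated multiplication. The little-endian array of 32-bit limbs and the names mul_small and exact_pow are assumptions chosen for illustration; it uses plain repeated multiplication rather than the decrease-and-conquer scheme from the question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Multiplies the little-endian base-2^32 number limbs[0..n-1] by m in place.
   Returns the carry out of the top limb (nonzero means the array was too small). */
uint32_t mul_small(uint32_t *limbs, size_t n, uint32_t m)
{
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t cur = (uint64_t)limbs[i] * m + carry;  /* cannot overflow 64 bits */
        limbs[i] = (uint32_t)cur;
        carry = cur >> 32;
    }
    return (uint32_t)carry;
}

/* Sets limbs to a^e exactly; returns 0 on success, -1 if n limbs are not enough. */
int exact_pow(uint32_t *limbs, size_t n, uint32_t a, unsigned e)
{
    assert(n > 0);
    limbs[0] = 1;
    for (size_t i = 1; i < n; i++)
        limbs[i] = 0;
    while (e-- > 0)
        if (mul_small(limbs, n, a) != 0)
            return -1;
    return 0;
}
```

Four limbs (128 bits) are enough to hold 7^39 under this representation.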

Upper bound for number of digits of big integer in different base

I want to create a big integer from string representation and to do that efficiently I need an upper bound on the number of digits in the target base to avoid reallocating memory.
Example:
A 640 bit number has 640 digits in base 2, but only ten digits in base 2^64, so I will have to allocate ten 64 bit integers to hold the result.
The function I am currently using is:
int get_num_digits_in_different_base(int n_digits, double src_base, double dst_base){
return ceil(n_digits*log(src_base)/log(dst_base));
}
Where src_base is in {2, ..., 10 + 26} and dst_base is in {2^8, 2^16, 2^32, 2^64}.
I am not sure if the result will always be correctly rounded though. log2 would be easier to reason about, but I read that older versions of Microsoft Visual C++ do not support that function. It could be emulated like log2(x) = log(x)/log(2) but now I am back where I started.
GMP probably implements a function to do base conversion, but I may not read the source or else I might get GPL cancer so I can not do that.
I imagine speed is of some concern, or else you could just try the floating point-based estimate and adjust if it turned out to be too small. In that case, one can sacrifice tightness of the estimate for speed.
In the following, let dst_base be 2^w, src_base be b, and n_digits be n.
Let k(b,w)=max {j | b^j < 2^w}. This represents the largest power of b that is guaranteed to fit within a w-wide binary (non-negative) integer. Because of the relatively small number of source and destination bases, these values can be precomputed and looked-up in a table, but mathematically k(b,w)=[w log 2/log b] (where [.] denotes the integer part.)
For a given n let m=ceil( n / k(b,w) ). Then the maximum number of dst_base digits required to hold a number less than b^n is:
ceil(log (b^n-1)/log (2^w)) ≤ ceil(log (b^n) / log (2^w) )
≤ ceil( m . log (b^k(b,w)) / log (2^w) ) ≤ m.
In short, if you precalculate the k(b,w) values, you can quickly get an upper bound (which is not tight!) by dividing n by k, rounding up.
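A minimal integer-only sketch of this bound, assuming w is at most 64 and b is smaller than 2^w. The function names are hypothetical; k is computed here by repeated multiplication following the strict definition b^j < 2^w, rather than looked up in a precomputed table.

```c
#include <assert.h>
#include <stdint.h>

/* k(b, w): the largest j with b^j < 2^w, for b >= 2 and 1 <= w <= 64. */
unsigned max_power_below(unsigned b, unsigned w)
{
    assert(b >= 2 && w >= 1 && w <= 64);
    uint64_t limit = (w == 64) ? UINT64_MAX : ((UINT64_C(1) << w) - 1);
    uint64_t p = 1;
    unsigned j = 0;
    while (p <= limit / b) {   /* guarantees p * b <= limit, so no overflow */
        p *= b;
        j++;
    }
    return j;
}

/* Upper bound on the base-2^w digits needed for n base-b digits: ceil(n / k). */
unsigned digit_upper_bound(unsigned n, unsigned b, unsigned w)
{
    unsigned k = max_power_below(b, w);
    assert(k > 0);             /* requires b < 2^w */
    return (n + k - 1) / k;
}
```

For the 640-bit example from the question, digit_upper_bound(640, 2, 64) gives 11 rather than the exact 10, which shows the bound is safe but not tight.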
I'm not sure about floating-point rounding in this case, but it is relatively easy to implement this using only integers, as log2 is a classic bit manipulation pattern and integer division can be easily rounded up. The following code plays the same role as yours, but uses only integers:
// Returns log2(x) rounded up, using bit manipulation (not the most efficient way).
// Named ceil_log2 so it does not clash with log2() from <math.h>. Requires x >= 1.
unsigned int ceil_log2(unsigned int x)
{
    unsigned int y = 0;
    --x;
    while (x) {
        y++;
        x >>= 1;
    }
    return y;
}
// Returns ceil(a/b) using integer division
unsigned int roundup(unsigned int a, unsigned int b)
{
    return (a + b - 1) / b;
}
unsigned int get_num_digits_in_different_base(unsigned int n_digits, unsigned int src_base, unsigned int log2_dst_base)
{
    return roundup(n_digits * ceil_log2(src_base), log2_dst_base);
}
Please, note that:
This function returns different results compared to yours! However, in every case I looked at, both were still correct (the smaller value was more accurate, but your requirement is just an upper bound).
The integer version I wrote receives log2_dst_base instead of dst_base to avoid overflow for 2^64.
log2 can be made more efficient using lookup tables.
I've used unsigned int instead of int.

does modulus function is only applicable on integer data types?

My algorithm performs the arithmetic operations shown below. For small values it works perfectly, but for large numbers such as 218194447 it returns a seemingly random value. I have tried long long int and double, but nothing works, because the modulus operator I am using can only be applied to integer types. Can anyone explain how to solve this, or provide links that might be useful?
#include<stdio.h>
#include<math.h>
int main()
{
long long i,j;
int t,n;
scanf("%d\n",&t);
while(t--)
{
scanf("%d",&n);
long long k;
i = (n*n);
k = (1000000007);
j = (i % k);
printf("%d\n",j);
}
return 0;
}
You could declare your variables as int64_t or long long ; then they would compute the modulus in their range (e.g. 64 bits for int64_t). And it would work correctly only if all intermediate values fit in their range.
However, you probably want or need bignums. I suggest you learn and use GMPlib for that.
BTW, don't use pow since it computes in floating point. Try i = n * n; instead of i = pow(n,2);
P.S. this is not for a beginner in C programming, using gmplib requires some fluency with C programming (and programming in general)
The problem in your code is that intermediate values of your computation exceed the range of values that can be stored in an int. With a 32-bit int, n^2 cannot be represented once n exceeds 46340 (the square root of 2^31-1, rounded down).
Follow the link above given by R.T. for a way of doing modulo on big numbers. That won't be enough though, since you also need a class/library that can handle big integer values. With only the standard C library in place, that will otherwise be a tough task to do on your own. (OK, for n up to 2^31, a 64-bit integer would do, but if you're going even larger, you're out of luck again.)
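For the squaring case in the question, the parenthetical above can be sketched like this, assuming a 32-bit int and a 64-bit long long. The helper name square_mod is made up for illustration.

```c
#include <assert.h>

/* (n * n) % k with the product computed in long long.
   Assumes int is 32 bits and long long is 64 bits, so n * n cannot overflow. */
long long square_mod(int n, long long k)
{
    assert(k > 0);
    long long wide = n;        /* widen BEFORE multiplying */
    return (wide * wide) % k;
}
```

The key point is that writing i = n * n with n declared as int still multiplies in int and only widens the already overflowed result. The value returned here should also be printed with %lld rather than %d.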
After accepted answer
To find the modulo of a number n raised to some power p (2 in OP's case), there is no need to first calculate power(n,p). Instead, calculate intermediate modulo values as n is raised to intermediate powers.
The following code works with p==2 as needed by OP, but also works quickly if p=1000000000.
The only wider integers needed are integers that are twice as wide as n.
Performing all this with unsigned integers simplifies the needed code.
The resultant code is quite small.
#include <stdint.h>
uint32_t powmod(uint32_t base, uint32_t expo, uint32_t mod) {
// `y = 1u % mod` needed only for the cases expo==0, mod<=1
// otherwise `y = 1u` would do.
uint32_t y = 1u % mod;
while (expo) {
if (expo & 1u) {
y = ((uint64_t) base * y) % mod;
}
expo >>= 1u;
base = ((uint64_t) base * base) % mod;
}
return y;
}
#include<stdio.h>
#include<math.h>
int main(void) {
unsigned long j;
unsigned t, n;
scanf("%u\n", &t);
while (t--) {
scanf("%u", &n);
unsigned long k;
k = 1000000007u;
j = powmod(n, 2, k);
printf("%lu\n", j);
}
return 0;
}

how do we print a number that's greater than 2^32-1 with int and float? (is it even possible?)

How does your variable contain a number that is greater than 2^32 - 1? Short answer: It'll probably be a specific data-structure and assorted functions (oh, a class?) that deal with this.
Given this data structure, how do we print it? With BigInteger_Print(BigInteger*) of course :)
Really though, there is no correct answer to this, as printing a number larger than 2^32-1 depends entirely upon how you're storing that number.
More theoretically: suppose you have a very very very large number stored somewhere somehow; if so, I suppose that you are somehow able to do math on that number, otherwise it would be quite pointless storing it.
If you can do math on it, just divide the bignumber by ten (10); store the remainder somewhere. Repeat until the result is smaller than 10. When it's smaller than ten, print it, then print the remainders, from the last to the first. Finish.
You can speed things up by dividing by the largest power of 10 that you are able to print without effort (on 32 bit, 1'000'000'000).
Edit: pseudo code:
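#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <math_with_very_very_big_num.h> /* hypothetical bignum library */
int main(int argc, char **argv) {
    very_very_big_num bignum = someveryverybignum;
    /* number of decimal digits in bignum (assumes bignum > 0) */
    int size = (int) floor(vvbn_log10(bignum)) + 1;
    char *result = calloc(size, sizeof(char));
    int i = 0;
    /* peel off the least significant digit until a single digit is left;
       a while loop (not do/while) keeps single-digit inputs within bounds */
    while (vvbn_greater(bignum, 9)) {
        result[i++] = (char) vvbn_remainder(bignum, 10) + '0';
        bignum = vvbn_divide(bignum, 10);
    }
    result[i] = (char) vvbn_to_i(bignum) + '0';
    /* digits were stored least significant first, so print them in reverse */
    while (i >= 0)
        printf("%c", result[i--]);
    printf("\n");
    free(result);
    return 0;
}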
(I wrote this using long, then translated it to the veryverybignum stuff; it worked with long, but unfortunately I cannot try this version, so please forgive me if I made translation errors...)
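The speedup of dividing by 10^9 instead of 10 can be sketched concretely. Everything about the representation is an assumption here, since the thread does not fix one: the number is taken to be a little-endian array of at most 32 limbs of 32 bits each, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Divides the little-endian base-2^32 number limbs[0..n-1] by divisor in place
   and returns the remainder. */
static uint32_t div_small(uint32_t *limbs, size_t n, uint32_t divisor)
{
    uint64_t rem = 0;
    for (size_t i = n; i-- > 0; ) {
        uint64_t cur = (rem << 32) | limbs[i];
        limbs[i] = (uint32_t)(cur / divisor);
        rem = cur % divisor;
    }
    return (uint32_t)rem;
}

/* Writes the decimal form of the number into out (NUL-terminated).
   Destroys limbs. Returns the string length, or 0 if out is too small. */
size_t bignum_to_decimal(uint32_t *limbs, size_t n, char *out, size_t cap)
{
    uint32_t chunks[36];          /* 9 decimal digits each; enough for 32 limbs */
    size_t count = 0;
    assert(n > 0 && n <= 32);
    do {
        chunks[count++] = div_small(limbs, n, 1000000000u);
        while (n > 0 && limbs[n - 1] == 0)
            n--;                  /* drop limbs that have become zero */
    } while (n > 0);

    size_t len = 0;
    for (size_t i = count; i-- > 0; ) {
        /* only the most significant chunk is printed without zero padding */
        int w = snprintf(out + len, cap - len,
                         (i == count - 1) ? "%lu" : "%09lu",
                         (unsigned long)chunks[i]);
        if (w < 0 || (size_t)w >= cap - len)
            return 0;
        len += (size_t)w;
    }
    return len;
}
```

Each division pass yields nine decimal digits at once, so the number of passes over the array drops by a factor of nine compared with dividing by 10.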
If you are talking about int64 types, you can try %llu or %lld (standard since C99), or %I64u, %I64d, %I64x (specific to Microsoft's C runtime).
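A portable alternative, offered here as a sketch rather than something from the thread, is the PRIu64 macro from <inttypes.h>, which expands to the correct length modifier for uint64_t on the target platform. The helper name format_u64 is hypothetical.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/* Formats a 64-bit unsigned value portably. Returns the number of characters
   written (excluding the NUL), or -1 if buf is too small. */
int format_u64(uint64_t v, char *buf, size_t cap)
{
    assert(buf != NULL);
    int w = snprintf(buf, cap, "%" PRIu64, v);
    return (w < 0 || (size_t)w >= cap) ? -1 : w;
}
```

Calling it with UINT64_C(4294967296), the first value beyond 2^32-1, fills the buffer with the ten characters of that number.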
On common hardware, the largest float is (2^128 - 2^104), so if it's smaller than that, you just use %f (or %g or %a) with printf( ).
For int64 types, JustJeff's answer is spot on.
The range of double (%f) extends to nearly 2^1024, which is really quite huge; on Intel hardware, when the long double (%Lf) type corresponds to 80-bit float, the range of that type goes up to 2^16384.
If you need larger numbers than that, you need to use a library (which will likely have its own print routines) or roll your own representation and provide your own printing support.
