HashSet not storing equal Integer and Double values the same way - hashset

I'm still new to Java. While working on Project Euler problem 29, I tried a brute-force solution using a HashSet, but the number of values stored differs depending on whether the set holds Integer or Double elements, even though the same values are added in both cases.
Project Euler Problem 29: How many distinct terms are in the sequence generated by a^b for 2 ≤ a ≤ 100 and 2 ≤ b ≤ 100?
See below:
// Java
private static int distinctAb(int start, int last) {
HashSet<Double> products = new HashSet<>();
for (int a = start; a <= last; a++) {
for (int b = start; b <= last; b++) {
double result = Math.pow(a, b);
products.add(result);
}
}
return products.size();
}
private static int distinctAbInt(int start, int last) {
HashSet<Integer> products = new HashSet<>();
for (int a = start; a <= last; a++) {
for (int b = start; b <= last; b++) {
double result = Math.pow(a, b);
products.add((int) result);
}
}
return products.size();
}
The only difference between the two snippets is whether the HashSet stores Integer or Double elements. start and last are always 2 and 100 respectively. The first method, using Double, produces 9183 (which is correct), but the second, using Integer, produces 422 (wrong).
Is there a limiting factor in HashSet's Integer elements that causes it to produce a different answer?

Because casting a double that is too large for an int yields Integer.MAX_VALUE, almost every power collapses into the same set element:
System.out.println((int) Math.pow(99,100)); // Prints 2147483647
System.out.println((int) Math.pow(100,100)); // Prints 2147483647
See "Narrowing Primitive Conversion" (§5.1.3) in the Java Language Specification, which says: "The value must be too large (a positive value of large magnitude or positive infinity), and the result of the first step is the largest representable value of type int or long."

Related

Cubic integer overflow

The following simple calculation causes an integer overflow:
void main(void) {
int n = 1291;
long cube = n*n*n;
printf("Cube: %ld, n: %d", cube, n);
}
Output:
Cube: -2143282125, n: 1291
My thinking was that since the result of n*n*n is assigned to a long, the result should evaluate to 2151685171. However, it appears that the result is calculated first into an int; because if int n = 1291 is changed to long n = 1291, it works as expected.
Question:
Is the 'intermediary' result of n*n*n stored to int (the declared type) before being assigned to the long declaration? Or, more simply: Why does n*n*n cause an integer overflow when being assigned to a long type?
I tried to research the answer first, but I must have been searching with the wrong terms.
Your question closely resembles the typical integer-division question:
int a = 7;
double b = a / 2;
=> b is equal to 3 instead of 3.5, because a / 2 is evaluated in int before the assignment. To avoid this, you need to do:
double b = ((double)a) / 2; // or:
double b = a / 2.0; // which is the same as (double)2
So, I believe that you might benefit from the same reasoning, doing something like this:
int n = 1291;
long cube = ((long)n) * n * n;
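A minimal sketch of the same fix, assuming a hypothetical helper name cube_wide that does not appear in the thread. Both operands of n*n are int, so C performs the multiplication in int and only converts the already overflowed result when assigning it; widening one operand first makes every multiplication happen in the wider type. int64_t is used here because long is only 32 bits on some platforms, where the cast to long would not help.

```c
#include <stdint.h>

/* Hypothetical helper: cube of n computed in 64-bit arithmetic.
   The cast applies to the first operand, so n * n * n is evaluated
   left to right entirely in int64_t. */
int64_t cube_wide(int n)
{
    return (int64_t)n * n * n;
}
```

When printing the result, the PRId64 macro from <inttypes.h> gives a matching printf format.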

Track first digit during a long multiplication in c

I have an array of integers, each up to 10^5, and I have to find the first digit of the product of all the elements.
Example:
Array : 2,4,6,7
multiplication result: 336 and the first digit is 3.
Obviously I cannot compute the full product when the elements range up to 10^5, because it overflows.
How can I track only the first digit during multiplication?
We can also find the first digit with another method.
Let p be the final value after multiplying all the elements, so for an array of size n
p = a[0]*a[1]*a[2]*a[3]*.......*a[n-1]
Taking the base-10 logarithm of both sides turns the product into a sum:
log(p) = log(a[0])+log(a[1])+log(a[2])+.....+log(a[n-1])
To find the first digit we need the fractional part of this sum, which can be obtained this way:
frac = sum - (integer)sum
As the last step, compute 10^frac and truncate it to an integer; that is the required first digit. For the example above, log(336) ≈ 2.5263, so frac ≈ 0.5263 and 10^frac ≈ 3.36, giving 3.
This takes O(n) time and never forms the full product.
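#include <math.h>
int getFirstDigit(long a[], long n) {
double p = 0.0; // running sum of log10(a[i]); must start at 0
for(long i=0;i<n;i++) {
p = p+log10((double)a[i]);
}
double frac = p - floor(p); // fractional part of log10(product)
int firdig = (int)pow(10.0,frac); // 10^frac lies in [1, 10)
return firdig;
}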
In C or C++, convert each integer to a long double scaled so that its first digit is before the decimal point and the rest are after it.
This can be done as follows:
long double GetFraction(int number){
int length = (int) log10(number) + 1; // number of decimal digits in number (number > 0)
long double fraction = (long double) number / powl(10.0L, length - 1); // scale into [1, 10)
return fraction;
}
Example:
Let number = 12345
length = (int) log10(12345) + 1 = 5;
fraction = (long double) 12345 / 10^4 = 1.2345 (10^4 here means ten to the fourth power, i.e. powl(10, 4))
Now compute this fraction for every integer in the array and multiply the fractions together as follows:
int GetFirstDigit(int arr[] , int size){
if(size == 0)
return 0;
long double firstDigit = 1.0;
for(int i = 0 ; i < size ; i++){
firstDigit = firstDigit*GetFraction(arr[i]);
if(firstDigit >= 10.00) // You have to shorten your number otherwise it will same as large multiplication and will overflow.
firstDigit/=10;
}
return (int) firstDigit;
}
Disclaimer: this is my own approach and I don't have a formal proof of the accuracy of the result, but I have verified it for integers up to 10^9 and array sizes up to 10^5.
Please note that the following is just an attempt to show the logic for extracting the first digit of a single number, and that you will need to adapt the code to your requirements. I strongly suggest you make it a subroutine in your program and pass the arguments to it from main.
#include <stdio.h>
int main(void)
{
int num1, num2;
printf("Enter ur lovely number:\n");
scanf("%d",&num1);
num2=num1;
while(num2)
{
num2=num2/10;
if(num2!=0)
num1=num2;
}
printf("The first digit of the lovely number is %d !! :P\n ",num1);
}
Try this approach.
Read each integer into an int x1 and copy it into a double x2. Keep the running product in a double y, initially y = 1. Then use this loop:
while (x1 >= 10) {
x1 = x1/10;
x2 = x2/10; // brings the double into standard form m*10^k, dropping the 10^k part
}
For example, if x1 = 52, then x2 becomes 5.2.
Now assume y = 3 and x2 = 5.2.
The product is then 15.6; reduce it to 1.56 in the same way and repeat the process with the next element. At the end, the single digit before the decimal point is the first digit of the product of all the numbers.
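The running-mantissa idea described above can be sketched as one small C function. The name first_digit_of_product is hypothetical, and the sketch assumes every element is a positive int; because it relies on double arithmetic, accumulated rounding could in principle misreport a product that sits extremely close to a power-of-ten boundary.

```c
#include <stddef.h>

/* Hypothetical helper: leading decimal digit of the product of n
   positive ints. The running value is kept in [1, 10), so the full
   product is never formed and nothing overflows. */
int first_digit_of_product(const int *a, size_t n)
{
    double mantissa = 1.0;
    for (size_t i = 0; i < n; i++) {
        mantissa *= (double)a[i];
        /* Drop powers of ten until a single digit precedes the point. */
        while (mantissa >= 10.0)
            mantissa /= 10.0;
    }
    return (int)mantissa;
}
```

For the array 2, 4, 6, 7 the running value goes 2, 8, 4.8, 3.36, so the function returns 3.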

taylor series with error at most 10^-3

I'm trying to calculate the Taylor series of cos(x) with an error of at most 10^-3 for all x ∈ [-pi/4, pi/4], which means my error needs to be less than 0.001. I can modify the x += step in the for loop to get different results. I tried several numbers, but the error never drops below 0.001.
#include <stdio.h>
#include <math.h>
float cosine(float x, int j)
{
float val = 1;
for (int k = j - 1; k >= 0; --k)
val = 1 - x*x/(2*k+2)/(2*k+1)*val;
return val;
}
int main( void )
{
for( double x = 0; x <= PI/4; x += 0.9999 )
{
if(cosine(x, 2) <= 0.001)
{
printf("cos(x) : %10g %10g %10g\n", x, cos(x), cosine(x, 2));
}
printf("cos(x) : %10g %10g %10g\n", x, cos(x), cosine(x, 2));
}
return 0;
}
I'm also doing this for e^x. For this part, x must be in [-2,2].
float exponential(int n, float x)
{
float sum = 1.0f; // initialize sum of series
for (int i = n - 1; i > 0; --i )
sum = 1 + x * sum / i;
return sum;
}
int main( void )
{
// change the number of x in for loop so you can have different range
for( float x = -2.0f; x <= 2.0f; x += 1.587 )
{
// change the frist parameter to have different n value
if(exponential(5, x) <= 0.001)
{
printf("e^x = %f\n", exponential(5, x));
}
printf("e^x = %f\n", exponential(5, x));
}
return 0;
}
But whenever I change the number of terms in the for loop, the error is always greater than 1. How am I supposed to change it to get an error of less than 10^-3?
Thanks!
My understanding is that to increase precision, you would need to consider more terms in the Taylor series. For example, consider what happens when you attempt to calculate e(1) by a Taylor series.
$e(x) = \sum\limits_{n=0}^{\infty} \frac{x^n}{n!}$
we can consider the first few terms in the expansion of e(1):
n    value of nth term     sum
0    x^0/0! = 1            1
1    x^1/1! = 1            2
2    x^2/2! = 0.5          2.5
3    x^3/3! = 0.16667      2.66667
4    x^4/4! = 0.04167      2.70833
You should notice two things: first, as we add more terms we get closer to the exact value of e(1); second, the difference between consecutive sums gets smaller.
So, an implementation of e(x) could be written as:
#include <stdbool.h>
#include <stdio.h>
#include <math.h>
typedef float (*term)(int, int);
float evalSum(int, int, int, term);
float expTerm(int, int);
int fact(int);
int mypow(int, int);
bool sgn(float);
const int maxTerm = 10; // number of terms to evaluate in series
const float epsilon = 0.001; // the accepted error
int main(void)
{
// change these values to modify the range and increment
float start = -2;
float end = 2;
float inc = 1;
for(int x = start; x <= end; x += inc)
{
float value = 0;
float prev = 0;
for(int ndx = 0; ndx < maxTerm; ndx++)
{
value = evalSum(0, ndx, x, expTerm);
float diff = fabs(value-prev);
if((sgn(value) && sgn(prev)) && (diff < epsilon))
break;
else
prev = value;
}
printf("the approximate value of exp(%d) is %f\n", x, value);
}
return 0;
}
I've guessed that we will not need more than ten terms in the expansion to reach the desired precision, so the inner for loop runs over values of n from 0 up to, but not including, maxTerm = 10.
Several lines are dedicated to checking whether we have reached the required precision. First I calculate the absolute value of the difference between the current evaluation and the previous one. Checking whether that difference is less than our epsilon value (1E-3) is one of the criteria for exiting the loop early. I also needed to check that the current and previous values have the same sign, because of some fluctuation when calculating e(-1); that is what the first clause in the conditional does.
float evalSum(int start, int end, int val, term fnct)
{
float sum = 0;
for(int n = start; n <= end; n++)
{
sum += fnct(n, val);
}
return sum;
}
This is a utility function that I wrote to evaluate the first terms of a series. start is the starting index (which in this code is always 0), and end is the ending index. The final parameter is a pointer to a function that says how to calculate a given term. In this code, fnct can be a pointer to any function that takes two integer parameters and returns a float.
float expTerm(int n, int x)
{
return (float)mypow(x,n)/(float)fact(n);
}
Buried in this one-line function is where most of the work happens. It represents the general term of the Taylor expansion of e(x): looking carefully, you can see that it computes $\frac{x^n}{n!}$ for given values of x and n. As a hint, for the cosine part you would need a function that evaluates the general term of the Taylor expansion of cos, which is $(-1)^n\frac{x^{2n}}{(2n)!}$.
int fact(int n)
{
if(0 == n)
return 1; // by definition
else if(1 == n)
return 1;
else
return n*fact(n-1);
}
This is just a standard implementation of the factorial function. Nothing special to see here.
int mypow(int base, int exp)
{
int result = 1;
while(exp)
{
if(exp&1) // b&1 quick check for odd power
{
result *= base;
}
exp >>=1; // exp >>= 1 quick division by 2
base *= base;
}
return result;
}
A custom function for doing exponentiation. We certainly could have used the version from <math.h>, but because I knew we would only be doing integer powers we could write an optimized version. Hint: in doing cosine you probably will need to use the version from <math.h> to work with floating point bases.
bool sgn(float x)
{
if(x < 0) return false;
else return true;
}
An incredibly simple function to determine the sign of a floating point value, returning true if it is non-negative and false otherwise.
This code was compiled on my Ubuntu-14.04 using gcc version 4.8.4:
******#crossbow:~/personal/projects$ gcc -std=c99 -pedantic -Wall series.c -o series
******#crossbow:~/personal/projects$ ./series
the approximate value of exp(-2) is 0.135097
the approximate value of exp(-1) is 0.367857
the approximate value of exp(0) is 1.000000
the approximate value of exp(1) is 2.718254
the approximate value of exp(2) is 7.388713
The expected values, as given by using bc are:
******#crossbow:~$ bc -l
bc 1.06.95
Copyright 1991-1994, 1997, 1998, 2000, 2004, 2006 Free Software Foundation, Inc.
This is free software with ABSOLUTELY NO WARRANTY.
For details type `warranty'.
e(-2)
.13533528323661269189
e(-1)
.36787944117144232159
e(0)
1.00000000000000000000
e(1)
2.71828182845904523536
e(2)
7.38905609893065022723
As you can see, the values are well within the tolerance that you requested. I leave the cosine part as an exercise.
Hope this helps,
-T
exp and cos have power series that converge everywhere on the real line. For any bounded interval, e.g. [-pi/4, pi/4] or [-2, 2], the power series converge not just pointwise, but uniformly to exp and cos.
Pointwise convergence means that for any x in the region, and any epsilon > 0, you can pick a large enough N so that the approximation you get from the first N terms of the taylor series is within epsilon of the true value. However, with pointwise convergence, the N may be small for some x's and large for others, and since there are infinitely many x's there may be no finite N that accommodates them all. For some functions that really is what happens sometimes.
Uniform convergence means that for any epsilon > 0, you can pick a large enough N so that the approximation is within epsilon for EVERY x in the region. That's the kind of approximation that you are looking for, and you are guaranteed that that's the kind of convergence that you have.
In principle you could look at one of the proofs that exp, cos are uniformly convergent on any finite domain, sit down and say "what if we take epsilon = .001, and the regions to be ...", and compute some finite bound on N using a pen and paper. However most of these proofs will use at some steps some estimates that aren't sharp, so the value of N that you compute will be larger than necessary -- maybe a lot larger. It would be simpler to just implement it for N being a variable, then check the values using a for-loop like you did in your code, and see how large you have to make it so that the error is less than .001 everywhere.
So, I can't tell what the right value of N you need to pick is, but the math guarantees that if you keep trying larger values eventually you will find one that works.

Round-off error when calculating a geometric mean [duplicate]

I need to compute the geometric mean of a large set of numbers, whose values are not a priori limited. The naive way would be
double geometric_mean(std::vector<double> const&data) // failure
{
auto product = 1.0;
for(auto x:data) product *= x;
return std::pow(product,1.0/data.size());
}
However, this may well fail because of underflow or overflow in the accumulated product (note: long double doesn't really avoid this problem). So, the next option is to sum-up the logarithms:
double geometric_mean(std::vector<double> const&data)
{
auto sum_log = 0.0;
for(auto x:data) sum_log += std::log(x);
return std::exp(sum_log/data.size());
}
This works, but calls std::log() for every element, which is potentially slow. Can I avoid that? For example by keeping track of (the equivalent of) the exponent and the mantissa of the accumulated product separately?
The "split exponent and mantissa" solution:
double geometric_mean(std::vector<double> const & data)
{
double m = 1.0;
long long ex = 0;
double invN = 1.0 / data.size();
for (double x : data)
{
int i;
double f1 = std::frexp(x,&i);
m*=f1;
ex+=i;
}
return std::pow( std::numeric_limits<double>::radix,ex * invN) * std::pow(m,invN);
}
If you are concerned that ex might overflow you can define it as a double instead of a long long, and multiply by invN at every step, but you might lose a lot of precision with this approach.
EDIT For large inputs, we can split the computation in several buckets:
double geometric_mean(std::vector<double> const & data)
{
long long ex = 0;
auto do_bucket = [&data,&ex](int first,int last) -> double
{
double ans = 1.0;
for ( ;first != last;++first)
{
int i;
ans *= std::frexp(data[first],&i);
ex+=i;
}
return ans;
};
const int bucket_size = -std::log2( std::numeric_limits<double>::min() );
std::size_t buckets = data.size() / bucket_size;
double invN = 1.0 / data.size();
double m = 1.0;
for (std::size_t i = 0;i < buckets;++i)
m *= std::pow( do_bucket(i * bucket_size,(i+1) * bucket_size),invN );
m*= std::pow( do_bucket( buckets * bucket_size, data.size() ),invN );
return std::pow( std::numeric_limits<double>::radix,ex * invN ) * m;
}
I think I figured out a way to do it that combines the two routines in the question, similar to Peter's idea. Here is some example code.
double geometric_mean(std::vector<double> const&data)
{
const double too_large = 1.e64;
const double too_small = 1.e-64;
double sum_log = 0.0;
double product = 1.0;
for(auto x:data) {
product *= x;
if(product > too_large || product < too_small) {
sum_log+= std::log(product);
product = 1;
}
}
return std::exp((sum_log + std::log(product))/data.size());
}
The bad news is: this comes with a branch. The good news: the branch predictor is likely to get this almost always right (the branch should only rarely be triggered).
The branch could be avoided using Peter's idea of a constant number of terms in the product. The problem with that is that overflow/underflow may still occur within only a few terms, depending on the values.
You may be able to accelerate this by multiplying numbers as in your original solution and only converting to logarithms every certain number of multiplications (depending on the size of your initial numbers).
A different approach which would give better accuracy and performance than the logarithm method would be to compensate out-of-range exponents by a fixed amount, maintaining an exact logarithm of the cancelled excess. Like so:
const int EXP = 64; // maximal/minimal exponent
const double BIG = pow(2, EXP); // overflow threshold
const double SMALL = pow(2, -EXP); // underflow threshold
double product = 1;
int excess = 0; // number of times BIG has been divided out of product
for(int i=0; i<n; i++)
{
product *= A[i];
while(product > BIG)
{
product *= SMALL;
excess++;
}
while(product < SMALL)
{
product *= BIG;
excess--;
}
}
double mean = pow(product, 1.0/n) * pow(BIG, double(excess)/n);
All multiplications by BIG and SMALL are exact, and there's no calls to log (a transcendental, and therefore particularly imprecise, function).
There is a simple idea to reduce computation while still preventing overflow: group the numbers, say at least two at a time, multiply within each group, and sum the logs of the group products. With k denoting the geometric mean of a, b, c, d, e:
log(abcde) = 5*log(k)
log(ab) + log(cde) = 5*log(k)
Summing logs to compute products stably is perfectly fine, and rather efficient (if this is not enough: there are ways to get vectorized logarithms with a few SSE operations -- there are also Intel MKL's vector operations).
To avoid overflow, a common technique is to divide every number by the maximum or minimum magnitude entry beforehand (or sum log differences to the log max or log min). You can also use buckets if the numbers vary a lot (e.g. sum the logs of small numbers and large numbers separately). Note that typically neither of these is needed except for very large sets, since the log of a double is never huge (between roughly -700 and 700).
Also, you need to keep track of the signs separately.
Computing log x keeps typically the same number of significant digits as x, except when x is close to 1: you want to use std::log1p if you need to compute prod(1 + x_n) with small x_n.
Finally, if you have roundoff error problems when summing, you can use Kahan summation or variants.
Instead of using logarithms, which are very expensive, you can directly scale the results by powers of two.
double geometric_mean(std::vector<double> const&data) {
double huge = scalbn(1,512);
double tiny = scalbn(1,-512);
int scale = 0;
double product = 1.0;
for(auto x:data) {
if (x >= huge) {
x = scalbn(x, -512);
scale++;
} else if (x <= tiny) {
x = scalbn(x, 512);
scale--;
}
product *= x;
if (product >= huge) {
product = scalbn(product, -512);
scale++;
} else if (product <= tiny) {
product = scalbn(product, 512);
scale--;
}
}
return exp2((512.0*scale + log2(product)) / data.size());
}

How to take modulus of a large value stored in array?

Suppose I have an integer array containing digits and I want to take the modulus of the value stored in it, i.e. the array
int a[36]={1,2,3,4,5,6,7,8,9,1,2,3,4,5,6,7,8,9,1,2,3,4,5,6,7,8,9,1,2,3,4,5,6,7,8,9}
should be read as a number like 987654321987654321987654321987654321.
In C, long long int only holds values up to about 9 × 10^18. I want to take the modulus with 10^9+7. How can I do that?
Program:
int main()
{
int a[36]={1,2,3,4,5,6,7,8,9,1,2,3,4,5,6,7,8,9,1,2,3,4,5,6,7,8,9,1,2,3,4,5,6,7,8,9};
long long int temp=0;
int i;
for(i=0;i<36;i++)
{
temp=temp+a[i]*pow(10,i);
}
temp=temp%1000000007;
printf("%lld",temp);
return 0;
}
Since 36 decimal digits is too much for a typical long long, you need to perform your modulus operation during the conversion:
int a[36]={1,2,3,4,5,6,7,8,9,1,2,3,4,5,6,7,8,9,1,2,3,4,5,6,7,8,9,1,2,3,4,5,6,7,8,9};
long long int temp=0;
for(int i=35 ; i >= 0 ; i--) {
temp = 10*temp + a[i];
temp %= 1000000007;
}
printf("%lld",temp);
I made two changes to your code:
Fixed the way you convert an array of digits to a number - your code used pow, and treated digits at higher indexes as higher-order digits. This creates precision problems once you get past the highest power of ten that can be represented as double.
Moved the %= into the loop - the new code does not let the number overflow, because it keeps the value in the range from 0 to 1000000006, inclusive, after every step.
Running this code produces the same value that you would obtain with a library that supports arbitrary precision of integers (I used Java BigInteger here).
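The loop above can be wrapped in a reusable function. This is a sketch under the same convention as the thread (index 0 holds the lowest-order digit); the name mod_of_digits and its parameter list are hypothetical, and it assumes each entry is a single digit 0..9 and that 10*m fits in a long long.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: value of a decimal digit array modulo m.
   digits[0] is the least significant digit, so the array is walked
   from the highest index down (Horner's rule), reducing at each step. */
long long mod_of_digits(const int *digits, size_t n, long long m)
{
    assert(m > 0);
    long long r = 0;
    for (size_t i = n; i-- > 0; ) {
        r = (r * 10 + digits[i]) % m;  /* r stays in [0, m) */
    }
    return r;
}
```

Because r is reduced after every digit, the intermediate value never exceeds 10*(m - 1) + 9, which for m = 10^9+7 is far below the long long limit.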
