floor(x) does not equal x where x is a double - c

I have some simple code to check whether an INTEGER n is a power of 2:
bool isPowerOfTwo(int n)
{
double x = log(n)/log(2);
return floor(x) == x ;
}
Most of the test cases are fine until n = 536870912 = 2^29.
In this case, the function returns FALSE (which is not correct).
I used printf to print floor(x) and x; both of them give 29.000000. But still, it returns FALSE.
I hexdumped x and floor(x):
x:
01 00 00 00 00 00 3D 40
floor(x):
00 00 00 00 00 00 3D 40
Why would floor(x) change a certain bit of x when x is a round number (29)? How can I fix this problem?
Thanks in advance.
FYI:
gcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04)
Target: x86_64-linux-gnu
EDIT: interestingly, if log() is replaced with log10(), it works fine.

Why would floor(x) change a certain bit of x when x is a round number (29)?
Because the double type you are using does not have enough precision to store the result of the calculation exactly, it rounds; and because the rounded result lies slightly above the whole number, floor rounds it down to the whole number. On your platform, double is IEEE 754 double precision.
calculation                                     | real result              | rounded to double
log(536870912)                                  | 20.10126823623841397309  | 20.10126823623841474387
log(2)                                          |   .69314718055994530941  |   .69314718055994528623
20.10126823623841474387/.69314718055994528623   | 29.00000000000000208209  | 29.00000000000000355271
(double)log(536870912)/(double)log(2)           | 29.00000000000000000028  | 29.00000000000000355271
The result of the calculation of ln(536870912)/ln(2) is 29.00000000000000355271 (i.e. 0x1.d000000000001p+4). Floor then discards that fractional part and gives exactly 29.
Note how the result of (double)log(536870912)/(double)log(2) is closer to 29.00000000000000355271 than to 29. You can play around with https://www.binaryconvert.com/convert_double.html# and https://www.h-schmidt.net/FloatConverter/IEEE754.html to learn floats better.
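One way to see the difference directly (a small sketch of my own, not part of the original answer) is to print both values with the %a hexadecimal format, which shows every bit of the significand:
#include <stdio.h>
#include <math.h>

int main(void)
{
    double x = log(536870912.0) / log(2.0);
    /* %f hides the difference; %.17g and %a reveal it. */
    printf("x        = %.17g = %a\n", x, x);
    printf("floor(x) = %.17g = %a\n", floor(x), floor(x));
    return 0;
}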
How to fix this problem ?
The usual solutions: increase precision by using a wider type (e.g. long double) or use an arbitrary-precision arithmetic library. In this case, though, do not use floating point numbers at all; stick to integers.

A function which checks whether x is a power of 2 and, if it is, returns log2(x):
#include <stdio.h>  /* printf */
#include <limits.h> /* CHAR_BIT, UINT_MAX */
#define MAX(x) (CHAR_BIT * sizeof(x))
#define HALF(x) (1UL << (MAX(x) / 2))
int ispower2andLog(unsigned x)
{
int result = 0;
if(x > HALF(x))
{
for(int pos = MAX(x) / 2; pos < MAX(x); pos++)
{
if((1UL << pos) == x)
{
result = pos;
break;
}
}
}
else
{
for(int pos = 0; pos <= MAX(x) / 2; pos++)
{
if((1UL << pos) == x)
{
result = pos;
break;
}
}
}
return result;
}
int main(void)
{
unsigned testcases[] = {0, 2,4, 5, 0x100, HALF(UINT_MAX) - 1, HALF(UINT_MAX) - 0, HALF(UINT_MAX) + 1, UINT_MAX & (~(UINT_MAX >> 1)), UINT_MAX};
for(size_t i = 0; i < sizeof(testcases) / sizeof(testcases[0]); i++)
printf("%12u (%08x) result = %d\n", testcases[i], testcases[i], ispower2andLog(testcases[i]));
}

The following returns true iff n is a power of two:
_Bool isPowerOfTwo(int n)
{
return 0 < n && n == (n & - (unsigned) n);
}
- (unsigned) n computes the two’s complement of n (because unsigned arithmetic wraps modulo a power of two). In the two’s complement of a positive number, all bits above the lowest 1 bit change. For example, 0100 0110 changes to 1011 1010. So, when this is ANDed with the original number, the only bit on is the lowest 1 bit. If this equals the original number, the original number had only one bit set to 1, so it is a power of two. If it does not equal the original number, the original number had more than one bit set to 1, so it is not a power of two. (The case where the original number has no bits set to 1 is handled by the test 0 < n.)
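For illustration, here is a minimal harness (my own sketch, not part of the answer) that exercises this test, including the 2^29 value from the question:
#include <stdio.h>

static _Bool isPowerOfTwo(int n)
{
    return 0 < n && n == (n & - (unsigned) n);
}

int main(void)
{
    int tests[] = {0, 1, 2, 3, 4, 536870912, 536870913, -8};
    for (size_t i = 0; i < sizeof tests / sizeof tests[0]; i++)
        printf("isPowerOfTwo(%d) = %d\n", tests[i], isPowerOfTwo(tests[i]));
    return 0;
}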

Your method is not reliable for multiple reasons:
converting an int might lose precision if the int type has more value bits than the floating point type. This would be the case when converting a long long to an IEEE-754 double type.
both log(n) and log(2) are approximations: each result is rounded to the closest value representable as a double. Dividing these approximations might not produce the exact value of log2(n), and the quotient is itself rounded, so even a whole-number result is not proof that n is an exact power of 2.
if n is negative, log(n) might return a NaN and the test will evaluate to false, but log might also trigger a signal and cause unexpected behavior. log(0) might return -Infinity or trigger a signal too.
in a test like this, a floating point calculation is not the expected approach. Taking advantage of the binary representation of integers is a better solution.
A much simpler method is this:
bool isPowerOfTwo(unsigned int n) {
return (n & (n - 1)) == 0 && n != 0;
}

Related

ieee754 floating point 1/x * x > 1.0

I want to know whether the program defined below can return 1 assuming:
IEEE754 floating point arithmetics
no overflow (neither in max/x nor in f*x)
no nan or inf (obviously)
0 < x and 0 < n < 32
no unsafe math optimization
int canfail(int n, double x) {
double max = 1ULL << n; // 2^n
double f = max / x;
return f * x > max;
}
In my opinion, it should sometimes return 1, as roundToNearest(max / x) can in general be greater than max/x.
I'm able to find numbers for the opposite case, where f * x < max, but I have no examples of input that show f * x > max and I have no idea of how to find one. Can somebody help?
EDIT:
I know the value of x is in a range between 10^(-6) and 10^6 (that still leaves far too many possible double values), but I know I will not have to deal with overflow, underflow or subnormal numbers!
In addition, I just realized that because max is a power of two and we don't deal with overflow, the solution will be the same if we fix max=1, as it is exactly the same computation, just shifted.
Therefore, the problem corresponds to finding a positive, normal double value x such that (1/x) * x > 1.0!
I made a little program to try to find a solution:
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <stdint.h>
#include <omp.h>
int main( void ) {
#pragma omp parallel
{
unsigned short int xsubi[3] = {
omp_get_thread_num(),
omp_get_thread_num(),
omp_get_thread_num()
};
#pragma omp for
for(int64_t i=0; i<INT64_MAX; i++) {
double x = fmod(nrand48(xsubi), 1048576.0);
if(x<0.000001)
continue;
double f = 1.0 / x;
if(f * x > 1.0) {
printf("found !!! x=%.30f\n", x);
fflush(stdout);
}
}
}
return 1;
}
If you change the sign of the comparison, you will find some value quickly. However, it seems to run forever with f * x > 1.0
In the absence of underflow or overflow, the exponents are irrelevant; if M/x*x > M, then (M/p) / (x/q) * (x/q) > (M/p) for any powers of two p and q. So let's consider 2^52 ≤ x < 2^53 and M = 2^105. We can eliminate x = 2^52 since this yields exact floating-point arithmetic, so 2^52 < x < 2^53.
Division of 2^105 by x yields integer quotient q and integer remainder r, with 2^52 ≤ q < 2^53, 0 < r < x, and 2^105 = q•x + r.
In order for M/x*x to exceed M, both the division and the multiplication must round up. Since the division rounds up, x/2 ≤ r.
With rounding up, the result of floating-point division of 2^105 by x yields q+1. Then the exact (not rounded) multiplication yields (q+1)•x = q•x + x = q•x + x + r - r = q•x + r + x - r = 2^105 + x - r. Since x/2 ≤ r, x - r ≤ x/2, so rounding this exact result rounds down, yielding 2^105. (The "<" case always rounds down, and the "=" case rounds down because 2^105 has an even low bit in its significand.)
Therefore, for powers of two M and all arithmetic within exponent bounds, M/x*x > M never occurs with round-to-nearest-ties-to-even.
Multiplication by a power of two is just a scaling of the exponent; it does not change the problem, so it's the same as finding x such that (1/x) * x > 1.
One solution is brute force search.
For the same reasons, we can limit the search for such x to the interval [1.0, 2.0).
A better approach is to analyze error bounds without brute force.
Let's note ix the nearest floating point to 1/x.
Considering x and ix as exact fractions, we can write the integer division: 1 = ix * x + r, where r is the remainder
(these are all fractions with denominators being powers of 2, so we have to multiply the whole equation by appropriate power of 2 to really have integer division).
In other words, ix = 1/x - r/x, where -r/x is the rounding error of inversion.
When we multiply the inverse approximation by x, the exact value is ix*x = 1 - r.
We know that the floating point result will be rounded to the nearest float to that exact value.
So, assuming the default rounding mode (to nearest, ties to even), the question asked is whether -r can exceed 0.5 ulp.
The short answer is never!
Suppose |r| > 0.5 ulp, then the rounding error -r/x does exceed half ulp of exact result 1/x.
This is not a proper answer, because the exact result is not a floating point and does not have an ulp, but you get the idea...
I might come back with a correct proof if I have time, but my bet is that you can find it already done, possibly on SO.
EDIT
Why can you find (1/x) * x < 1?
Simply because 1.0 is at a binade limit, so below 1 we would have to prove that r < 0.25 ulp, which we cannot...
canfail(1, pow(2, 1023) * (2 - pow(2, -51))) will return 1.

How to compare (long double) values in C?

Here is my code:
#include <stdio.h>
static long double ft_ldmod(long double x, long double mod)
{
long double res;
long double round;
res = x / mod;
round = 0.0L;
while (res >= 1.0L || res <= -1.0L)
{
round += (res < 0.0L) ? -1.0L : 1.0L;
res += (res < 0.0L) ? 1.0L : -1.0L;
}
return ((x / mod - round) * mod);
}
int main(void)
{
long double x;
long double r;
x = 0.0000042L;
r = ft_ldmod(x, 1.0L);
while (r != 0.0L) // <-- I have an infinite loop here
{
x *= 10.0L;
r = ft_ldmod(x, 1.0L);
}
printf("%Lf", x);
return (0);
}
Something seems wrong, but I cannot figure out what.
The while loop in the main function keeps looping and doesn't break.
Even when the condition is false, it just carries on...
Help is welcome, thanks.
After x = 0.0000042L;, the value of x depends on the long double format used by your C implementation. It might be 4.2000000000000000001936105559186517000025418155928491614758968353271484375•10^-6. Thus, there are more digits in its decimal representation than the code in the question anticipates. As the number is repeatedly multiplied by 10, it grows large.
As it grows large, into the millions and billions, ft_ldmod becomes slower and slower, as it finds the desired value of round by counting by ones.
Furthermore, even if ft_ldmod is given sufficient time, x and round will eventually become so large that adding one to round has no effect. That is, representing the large value of round in long double will require an exponent so large that the lowest bit used to represent round in long double represents a value of 2.
Essentially, the program is fundamentally flawed as a way to find a decimal representation of x. Additionally, the statement x *= 10.0L; will incur rounding errors, as the exact mathematical result of multiplying a number by ten is often not exactly representable in long double, so it is rounded to the nearest representable value. (This is akin to multiplying by 11 in decimal. Starting with 1, we get 11, 121, 1331, 14641, and so on. The number of digits grows. Similarly, multiplying by ten in binary increases the number of significant bits.)
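A quick way to see the first point (a minimal sketch of my own, not part of the original answer) is to print the stored value of the literal with more digits than were written:
#include <stdio.h>

int main(void)
{
    long double x = 0.0000042L;
    /* Print far more digits than the literal has; the tail shows that the
       value actually stored is not exactly 4.2e-6. */
    printf("%.40Lg\n", x);
    return 0;
}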

c change the exponent of a double

Let's say I have a double a = 0.3;. How would I be able to change the exponent of the variable, without using math functions like pow(), or multiplying it manually?
I am guessing I would have to access the memory address of the variable using pointers, find the exponent and change it manually. But how would I accomplish this?
Note that this is on an 8-bit system, and I am trying to find a faster way to multiply the number by 10^12, 10^9, 10^6 or 10^3.
Best regards!
Note that a*10^3 = a*1000 = a*1024 - a*16 - a*8 = a*2^10 - a*2^4 - a*2^3.
So you can calculate a*10^3 as follows:
Read the 11 exponent bits into int exp
Read the 52 fraction bits into double frac
Calculate double x with exp+10 as the exponent and frac as the fraction
Calculate double y with exp+4 as the exponent and frac as the fraction
Calculate double z with exp+3 as the exponent and frac as the fraction
Calculate the output as x-y-z, and don't forget to add the sign bit if a < 0
You can use a similar method for the other options (a*10^6, a*10^9 and a*10^12)...
Here is how you can do the whole thing in a "clean" manner:
double MulBy1000(double a)
{
double x = a;
double y = a;
double z = a;
unsigned long long* px = (unsigned long long*)&x;
unsigned long long* py = (unsigned long long*)&y;
unsigned long long* pz = (unsigned long long*)&z;
*px += 10ULL << 52;
*py += 4ULL << 52;
*pz += 3ULL << 52;
return x - y - z;
}
Please note that this pointer cast technically violates the strict-aliasing rule; copying the representation with memcpy avoids that.
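For reference, here is a sketch of the same idea written with memcpy, which sidesteps the aliasing question (the helper names are mine, not from the answer; it assumes IEEE-754 binary64 doubles and a normal, nonzero a whose exponent does not overflow when increased):
#include <stdint.h>
#include <string.h>

/* Multiply a by 2^k by adding k to the biased exponent field. */
static double add_to_exponent(double a, uint64_t k)
{
    uint64_t bits;
    memcpy(&bits, &a, sizeof bits);  /* copy out the bit pattern */
    bits += k << 52;                 /* exponent field starts at bit 52 */
    memcpy(&a, &bits, sizeof a);
    return a;
}

double MulBy1000_safe(double a)
{
    /* a*1000 = a*1024 - a*16 - a*8 */
    return add_to_exponent(a, 10) - add_to_exponent(a, 4) - add_to_exponent(a, 3);
}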
Multiplying a number by 10 is the equivalent of
a) Multiplying the original number by 2
b) Multiplying the original number by 8
c) Adding the results of (a) and (b).
This works because ten is binary 1010.
One approach would therefore be to increment the exponent (for (a)), add 3 to the exponent (for (b)), then add the results.
To multiply by 10^n, repeat the above n times. Alternatively, work out the binary representation of 1,000, 1,000,000, etc., and add the relevant 1s. You may make things easier by noting that 1000, for instance, is 1024 - 16 - 8, i.e.
a) Add 10 to the exponent of the original to multiply by 1024
b) Add 4 to the exponent of the original to multiply by 16
c) Add 3 to the exponent of the original to multiply by 8
d) From (a) subtract (b) and (c) to get the answer.
Again, you can do that multiple times for 10^6, 10^9 etc.
For a quick approximation of 10^n where n is a multiple of 3, just add 10n/3 to the exponent (as 1024 ~= 1000).
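As a sketch of the same recipe using the standard library: ldexp only adjusts the exponent, so it avoids a general multiplication (whether it counts as a forbidden "math function" is for the questioner to decide; the function name below is mine):
#include <math.h>

/* a * 1000 computed as a*2^10 - a*2^4 - a*2^3, as described above. */
double mul_by_1000(double a)
{
    return ldexp(a, 10) - ldexp(a, 4) - ldexp(a, 3);
}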
For fun, a simple recursive solution:
double ScalePower10(double x, unsigned power) {
if (power <= 1) {
if (power == 0) return x;
return x * 10.0;
}
double y = ScalePower10(x, power/2);
y = y*y;
if (power%2) y *= 10.0;
return y;
}

Primality test using Fermat little theorem

code:
void prime()
{
int i,N;
scanf("%d",&N);
for(i=2;i<N;i++)
{
if (((i^(N-1))%N )==1);
else{
printf("not prime");
return;
}
}
printf("prime");
return;
}
This program is based on Fermat's little theorem on prime numbers. N is the number to be tested for primality. The program does not show the correct result for 11, perhaps due to some mistake I have not identified.
You are running into overflow if this is pseudo-code, or, if it is C code, the use of ^ as a power operator is not valid.
Working with large integers quickly becomes a problem in C. There are various BigInt libraries available.
Using floating point is challenging with large integer computation. Recommend avoiding double, pow(), etc.
Since the problem is all >= 0, suggest using unsigned integers. Also use the largest integer type available - typically unsigned long long. As overflow is a real possibility, detect it.
unsigned long long upower(unsigned i, unsigned N) {
unsigned long long power = 1;
if (i <= 1) return i;
while (N-- > 0) {
unsigned long long power_before = power;
power *= i;
if (power / i != power_before) { /* reliable unsigned overflow check; power < power_before misses some cases */
printf("Overflow\n");
return 0;
}
}
return power;
}
void prime() {
unsigned i, N;
scanf("%u", &N);
for (i = 2; i < N; i++) {
if ((upower(i, N - 1) % N) != 1) {
printf("not prime");
return;
}
}
printf("prime");
return;
}
In lieu of huge integers, the Chinese remainder theorem may offer an alternative to (upower(i, N - 1) % N) != 1.
If I read your code as pseudo-code, you're overflowing.
10^10 is bigger than 2^31 - 1, which is the max value for most int types. You could solve this for N=11 by using longs, but that will not get you far; you'll start overflowing at some point as well.
That theorem, at least expressed like this, is very impractical to use with finite-length numbers.
Now, if your code is real C, note that ^ means XOR, not exponentiation. Exponentiation is pow(). Thanks to the commenters for pointing that out.
Modular mathematical rules and principles can be applied here to show that in order to compute
(i ^ (N-1)) % N,
you do not even need to compute i^(N-1) at the first place.
You can easily break down (N-1) into powers of 2.
Let's take an example to make it more clear.
Assume that the subject of our primality test, N = 58.
So,
N - 1 = 57
57 can be easily rewritten as:
57 = 1 + 8 + 16 + 32
or,
57 = 2^0 + 2^3 + 2^4 + 2^5
So, substituting this value for N-1, we need to compute
(i ^ (2^0 + 2^3 + 2^4 + 2^5))% 58
or,
((i^1) × (i^8) × (i^16) × (i^32))% 58
Which, using the Modular Multiplication identities, can be rewritten as:
((i^1)% 58 × (i^8)% 58 × (i^16)% 58 × (i^32)% 58) mod 58 ---(1)
Note that,
(i^1)% 58 = i%58
can be easily computed without worrying of any overflows.
Once again utilising the Modular Multiplication identities, we know that
(i^2)% 58 = ((i^1)% 58 × (i^1)% 58)% 58
Substitute the value of (i^1)% 58 to find (i^2)% 58.
You can continue in this fashion, computing (i^4)% 58 through (i^32)% 58. Once completed, you can finally substitute the values in (1) to finally find the required value, very efficiently avoiding any overflows.
Note that other modular exponentiation techniques exist too. This was just an example to showcase how modular mathematical techniques can be used in implementing Fermat's little primality test.
Cheers!
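For completeness, here is a compact C sketch of this square-and-multiply idea (my own illustration, not from the answer above; it assumes N fits in 32 bits so the intermediate products fit in unsigned long long):
#include <stdio.h>

/* Computes (base^exp) % mod by repeated squaring; no intermediate value
   exceeds mod*mod, so there is no overflow as long as mod < 2^32. */
static unsigned long long powmod(unsigned long long base,
                                 unsigned long long exp,
                                 unsigned long long mod)
{
    unsigned long long result = 1 % mod;
    base %= mod;
    while (exp > 0) {
        if (exp & 1)                  /* this power of two is present in exp */
            result = result * base % mod;
        base = base * base % mod;     /* square for the next bit */
        exp >>= 1;
    }
    return result;
}

int main(void)
{
    /* Fermat check for N = 11: i^(N-1) % N should be 1 for every 2 <= i < N. */
    unsigned long long N = 11;
    for (unsigned long long i = 2; i < N; i++)
        printf("%llu^%llu mod %llu = %llu\n", i, N - 1, N, powmod(i, N - 1, N));
    return 0;
}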
Sorry to change your code a little. Using the BigInteger class, you can calculate very quickly for much larger numbers. Note, however, that this method is for testing whether a given number is prime, not for generating prime numbers in order.
using System;
using System.Numerics;
public class Program
{
public static void Main()
{
Console.WriteLine(2);
for(var i = 3; i < 100000; i+=2)
{
if(BigInteger.ModPow(2, i , i) == 2)
Console.WriteLine(i);
}
}
}
https://dotnetfiddle.net/nwDP7h
This code will produce erroneous results when it encounters the numbers in the following sequences:
https://oeis.org/a001567
https://oeis.org/a006935
To fix these errors, you need to edit the code as follows and do a binary search within these numbers to test whether the number is a pseudoprime.
public static bool IsPrime(ulong number)
{
    return number == 2
        ? true
        : (BigInteger.ModPow(2, number, number) == 2
            ? ((number & 1) != 0 && BinarySearchInA001567(number) == false)
            : false);
}
public static bool BinarySearchInA001567(ulong number)
{
// Is number in list?
// todo: Binary Search in A001567 (https://oeis.org/A001567) below 2 ^ 64
// Only 2.35 Gigabytes as a text file http://www.cecm.sfu.ca/Pseudoprimes/index-2-to-64.html
}

How can you easily calculate the square root of an unsigned long long in C?

I was looking at another question (here) where someone was looking for a way to get the square root of a 64 bit integer in x86 assembly.
This turns out to be very simple. The solution is to convert to a floating point number, calculate the sqrt and then convert back.
I need to do something very similar in C, however when I look into equivalents I'm getting a little stuck. I can only find a sqrt function which takes in doubles. Doubles do not have the precision to store large 64-bit integers without introducing significant rounding error.
Is there a common math library that I can use which has a long double sqrt function?
There is no need for long double; the square root can be calculated with double (if it is IEEE-754 64-bit binary). The rounding error in converting a 64-bit integer to double is nearly irrelevant in this problem.
The rounding error is at most one part in 2^53. This causes an error in the square root of at most one part in 2^54. The sqrt itself has a rounding error of less than one part in 2^53, due to rounding the mathematical result to the double format. The sum of these errors is tiny; the largest possible square root of a 64-bit integer (rounded to 53 bits) is 2^32, so an error of three parts in 2^54 is less than .00000072.
For a uint64_t x, consider sqrt(x). We know this value is within .00000072 of the exact square root of x, but we do not know its direction. If we adjust it to sqrt(x) - 0x1p-20, then we know we have a value that is less than, but very close to, the square root of x.
Then this code calculates the square root of x, truncated to an integer, provided the operations conform to IEEE 754:
uint64_t y = sqrt(x) - 0x1p-20;
if (2*y < x - y*y)
++y;
(2*y < x - y*y is equivalent to (y+1)*(y+1) <= x except that it avoids wrapping the 64-bit integer if y+1 is 2^32.)
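A small harness to exercise this (my own sketch, not from the original answer; the wrapper name isqrt64 is mine, and it assumes IEEE-754 doubles):
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>
#include <math.h>

static uint64_t isqrt64(uint64_t x)
{
    uint64_t y = sqrt(x) - 0x1p-20;   /* slightly below the true root */
    if (2*y < x - y*y)                /* i.e. (y+1)*(y+1) <= x */
        ++y;
    return y;
}

int main(void)
{
    uint64_t tests[] = {0, 1, 2, 3, 4, 999999999999999999u, UINT64_MAX};
    for (size_t i = 0; i < sizeof tests / sizeof tests[0]; i++)
        printf("isqrt(%" PRIu64 ") = %" PRIu64 "\n", tests[i], isqrt64(tests[i]));
    return 0;
}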
Function sqrtl(), taking a long double, is part of C99.
Note that your compilation platform does not have to implement long double as 80-bit extended precision. It is only required to be as wide as double, and Visual Studio implements it as a plain double. GCC and Clang do implement long double as 80-bit extended precision on Intel processors.
Yes, the standard library has sqrtl() (since C99).
If you only want to calculate sqrt for integers, using divide and conquer should find the result in max 32 iterations:
uint64_t mysqrt (uint64_t a)
{
uint64_t min=0;
//uint64_t max=1<<32;
uint64_t max=((uint64_t) 1) << 32; //chux' bugfix
while(1)
{
if (max <= 1 + min)
return min;
uint64_t sqt = min + (max - min)/2;
uint64_t sq = sqt*sqt;
if (sq == a)
return sqt;
if (sq > a)
max = sqt;
else
min = sqt;
}
}
Debugging is left as an exercise for the reader.
Here we collect several observations in order to arrive at a solution:
1. In standard C (C99 and later), it is guaranteed that non-negative integers have a representation in bits as one would expect for any base-2 number.
----> Hence, we can trust bit manipulation of this type of number.
2. If x is an unsigned integer type, then x >> 1 == x / 2 and x << 1 == x * 2.
(!) But: it is very probable that the bit operations will be faster than their arithmetical counterparts.
3. sqrt(x) is mathematically equivalent to exp(log(x)/2.0).
4. If we consider truncated logarithms and base-2 exponentials for integers, we can obtain a fair estimate: IntExp2( IntLog2(x) / 2) "==" IntSqrtDn(x), where "==" is informal notation meaning approximately equal to (in the sense of a good approximation).
5. If we write IntExp2( IntLog2(x) / 2 + 1) "==" IntSqrtUp(x), we obtain an "above" approximation for the integer square root.
6. The approximations obtained in (4.) and (5.) are a little rough (they enclose the true value of sqrt(x) between two consecutive powers of 2), but they can be a very good starting point for any algorithm that searches for the square root of x.
7. The Newton algorithm for square roots can work well for integers, if we have a good first approximation to the real solution.
http://en.wikipedia.org/wiki/Integer_square_root
The final algorithm needs some mathematical verification to be fully sure that it always works properly, but I will not do that right now... I will show you the final program instead:
#include <stdio.h> /* For printf()... */
#include <stdint.h> /* For uintmax_t... */
#include <math.h> /* For sqrt() .... */
int IntLog2(uintmax_t n) {
if (n == 0) return -1; /* Error */
int L;
for (L = 0; n >>= 1; L++)
;
return L; /* It takes < 64 steps for long long */
}
uintmax_t IntExp2(int n) {
if (n < 0)
return 0; /* Error */
uintmax_t E;
for (E = 1; n-- > 0; E <<= 1)
;
return E; /* It takes < 64 steps for long long */
}
uintmax_t IntSqrtDn(uintmax_t n) { return IntExp2(IntLog2(n) / 2); }
uintmax_t IntSqrtUp(uintmax_t n) { return IntExp2(IntLog2(n) / 2 + 1); }
int main(void) {
uintmax_t N = 947612934; /* Try here your number! */
uintmax_t sqrtn = IntSqrtDn(N), /* 1st approx. to sqrt(N) by below */
sqrtn0 = IntSqrtUp(N); /* 1st approx. to sqrt(N) by above */
/* The following means while( abs(sqrt-sqrt0) > 1) { stuff... } */
/* However, we take care of subtractions on unsigned arithmetic, just in case... */
while ( (sqrtn > sqrtn0 + 1) || (sqrtn0 > sqrtn+1) )
sqrtn0 = sqrtn, sqrtn = (sqrtn0 + N/sqrtn0) / 2; /* Newton iteration */
printf("N==%llu, sqrt(N)==%g, IntSqrtDn(N)==%llu, IntSqrtUp(N)==%llu, sqrtn==%llu, sqrtn*sqrtn==%llu\n\n",
N, sqrt(N), IntSqrtDn(N), IntSqrtUp(N), sqrtn, sqrtn*sqrtn);
return 0;
}
The last value stored in sqrtn is the integer square root of N.
The last line of the program just shows all the values, for verification purposes.
So, you can try different values of N and see what happens.
If we add a counter inside the while-loop, we'll see that no more than a few iterations happen.
Remark: It is necessary to verify that the condition abs(sqrtn-sqrtn0)<=1 is always achieved when working in the integer-number setting. If not, we shall have to fix the algorithm.
Remark 2: In the initialization statements, observe that sqrtn0 == sqrtn * 2 == sqrtn << 1. This saves us some calculations.
// sqrt_i64 returns the integer square root of v.
int64_t sqrt_i64(int64_t v) {
uint64_t q = 0, b = 1, r = v;
for( b <<= 62; b > 0 && b > r; b >>= 2);
while( b > 0 ) {
uint64_t t = q + b;
q >>= 1;
if( r >= t ) {
r -= t;
q += b;
}
b >>= 2;
}
return q;
}
The for loop may be optimized by using the clz machine code instruction.
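For instance, with GCC or Clang (this relies on a compiler builtin, not standard C), the initial scan for b can be replaced by a constant-time computation, assuming v > 0:
/* Largest power of 4 not exceeding v: take the index of v's highest set
   bit and round it down to an even bit position. */
uint64_t b = 1ULL << ((63 - __builtin_clzll(v)) & ~1);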
