When I run the following code:
#include <stdio.h>
int main()
{
int i = 0;
volatile long double sum = 0;
for (i = 1; i < 50; ++i) /* first snippet */
{
sum += (long double)1 / i;
}
printf("%.20Lf\n", sum);
sum = 0;
for (i = 49; i > 0; --i) /* second snippet */
{
sum += (long double)1 / i;
}
printf("%.20Lf", sum);
return 0;
}
The output is:
4.47920533832942346919
4.47920533832942524555
Shouldn't the two numbers be the same?
And more interestingly, the following code:
#include <stdio.h>
int main()
{
int i = 0;
volatile long double sum = 0;
for (i = 1; i < 100; ++i) /* first snippet */
{
sum += (long double)1 / i;
}
printf("%.20Lf\n", sum);
sum = 0;
for (i = 99; i > 0; --i) /* second snippet */
{
sum += (long double)1 / i;
}
printf("%.20Lf", sum);
return 0;
}
produces:
5.17737751763962084084
5.17737751763962084084
So why are the results different in the first case but the same in the second?
First, please correct your code: by the C standard, %lf is not the conversion to use for long double in *printf (the 'l' length modifier has no effect there; the expected type remains double). To print a long double, one should use %Lf. With %lf you may run into an improper-format bug, a cut-down value, etc. (You appear to be running a 32-bit environment: on 64-bit systems, both Unix and Windows pass double in XMM registers, but long double elsewhere - on the stack for Unix, in memory by pointer for Windows. On Windows/x86_64 your code would segfault because the callee expects a pointer. With Visual Studio, however, long double is AFAIK aliased to double, so you can remain unaware of this difference.)
Second, you can't be sure this code isn't optimized by your C compiler into compile-time calculations (which may be carried out with more precision than the default run-time one). To avoid such optimization, mark sum as volatile.
With these changes, your code shows:
On Linux/amd64, gcc 4.8:
for 50:
4.47920533832942505776
4.47920533832942505820
for 100:
5.17737751763962026144
5.17737751763962025971
On FreeBSD/i386, gcc 4.8, without any precision setting or with an explicit fpsetprec(FP_PD):
4.47920533832942346919
4.47920533832942524555
5.17737751763962084084
5.17737751763962084084
(the same as in your example);
but the same test on FreeBSD with fpsetprec(FP_PE), which switches the FPU to real long double operations, gives:
4.47920533832942505776
4.47920533832942505820
5.17737751763962026144
5.17737751763962025971
identical to the Linux case; so, with real long double arithmetic, there is a real difference with 100 summands, and, in accordance with common sense, it is larger than with 50. Your platform, however, defaults to rounding to double.
And, finally, in general this is the well-known effect of finite precision and the consequent rounding. For example, in this classical book, this misrounding of the sum of a decreasing number series is explained in the very first chapters.
I am not really prepared right now to investigate why the case of 50 summands with rounding to double shows such a huge difference, and why that difference is compensated with 100 summands. That needs a much deeper investigation than I can afford now, but I hope this answer clearly shows you the next place to dig.
UPDATE: on Windows, you can manipulate the FPU mode with _controlfp() and _controlfp_s(). On Linux, _FPU_SETCW does the same. This description elaborates some details and gives example code.
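For illustration, here is a minimal sketch of switching the x87 FPU to double precision on Linux/x86 with glibc's <fpu_control.h> (the _FPU_GETCW/_FPU_SETCW macros and the _FPU_EXTENDED/_FPU_DOUBLE masks are glibc's; this is an assumption-laden sketch, not the linked example code):
#include <fpu_control.h>
#include <stdio.h>

int main(void)
{
    fpu_control_t cw;
    _FPU_GETCW(cw);        /* read the current x87 control word */
    cw &= ~_FPU_EXTENDED;  /* clear the precision-control field */
    cw |= _FPU_DOUBLE;     /* select 53-bit (double) precision */
    _FPU_SETCW(cw);        /* write it back */
    printf("x87 precision control set to double\n");
    return 0;
}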
UPDATE2: using Kahan summation gives stable results in all cases. The following shows 4 values: ascending i, no KS; ascending i, KS; descending i, no KS; descending i, KS:
50 and FPU to double:
4.47920533832942346919 4.47920533832942524555
4.47920533832942524555 4.47920533832942524555
100 and FPU to double:
5.17737751763962084084 5.17737751763961995266
5.17737751763962084084 5.17737751763961995266
50 and FPU to long double:
4.47920533832942505776 4.47920533832942524555
4.47920533832942505820 4.47920533832942524555
100 and FPU to long double:
5.17737751763962026144 5.17737751763961995266
5.17737751763962025971 5.17737751763961995266
As you can see, the difference disappears and the results are stable. I would assume this is nearly the final point that can be added here :)
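For reference, here is a minimal sketch of the compensated (Kahan) summation used for the KS columns above, applied to the same harmonic sums; the helper name kahan_harmonic is mine, not taken from the measured code:
#include <stdio.h>

long double kahan_harmonic(int n)
{
    long double sum = 0.0L, c = 0.0L;      /* c accumulates the lost low-order bits */
    for (int i = 1; i <= n; ++i) {
        long double y = (long double)1 / i - c;
        long double t = sum + y;           /* low-order bits of y are lost here */
        c = (t - sum) - y;                 /* recover what was lost */
        sum = t;
    }
    return sum;
}

int main(void)
{
    printf("%.20Lf\n", kahan_harmonic(49));
    printf("%.20Lf\n", kahan_harmonic(99));
    return 0;
}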
Related
I can't get the correct value of 15136704000 for the third print line, and I am not sure what the issue is. It works correctly when compiled with gcc on Linux, but Windows keeps spitting out nonsense and I would just like to understand why.
Windows displays it as: Which is 2251802112 inches away.
#include <stdio.h>
int main(void)
{
const int five = 5;
const int eight = 8;
const int mi_to_in = 63360;
int miles_to_moon = 238900;
int km_to_moon = (float) eight / five * miles_to_moon;
unsigned long inches_to_moon = (long) miles_to_moon * mi_to_in;
printf("The moon is %d miles away.\n", miles_to_moon);
printf("Which is equivalent to %d kilometers away.\n", km_to_moon);
printf("Which is %lu inches away.\n", inches_to_moon);
}
As commented by @jamesdlin, the expression (long)miles_to_moon * mi_to_in causes an arithmetic overflow on Windows because the type long has only 32 bits on this system, including in its 64-bit version. Using unsigned long long for this computation would solve the problem, and you should actually use long for mi_to_in and miles_to_moon for portability to systems where int has only 16 bits.
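A minimal sketch of that unsigned long long fix (showing only the third line of output) might look like this:
#include <stdio.h>

int main(void)
{
    const int mi_to_in = 63360;
    int miles_to_moon = 238900;
    /* widen before multiplying: unsigned long long is at least 64 bits
       on both Linux and Windows, so 238900 * 63360 cannot overflow */
    unsigned long long inches_to_moon =
        (unsigned long long)miles_to_moon * mi_to_in;
    printf("Which is %llu inches away.\n", inches_to_moon);
    return 0;
}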
The C Standard provides fixed-width integer types such as int32_t and int64_t, defined in <stdint.h> on systems that support them. These types could be used for these variables with the proper range, but for better portability and simplicity, you should use double for such computations:
#include <stdio.h>
int main() {
double mi_to_in = 63360; /* exact figure */
double mi_to_km = 1.60934; /* exact figure */
double miles_to_moon = 238900; /* average distance approximation */
double km_to_moon = miles_to_moon * mi_to_km;
double inches_to_moon = miles_to_moon * mi_to_in;
printf("The moon is %.0f miles away.\n", miles_to_moon);
printf("Which is equivalent to %.0f kilometers away.\n", km_to_moon);
printf("Which is %.0f inches away.\n", inches_to_moon);
return 0;
}
Output:
The moon is 238900 miles away.
Which is equivalent to 384471 kilometers away.
Which is 15136704000 inches away.
Note however that multiplying an approximate figure by an exact one does not increase its precision, contrary to what the number of significant digits in the above output might suggest. Rounding these figures seems preferable, yet this would produce 384500 km, which is not the commonly used figure of 384400 km.
A more precise average semi-major axis is 384399 km, approximately 238855 miles, commonly converted to 238900 mi.
Rounding to a specified number of significant digits is not simple, and there is no standard function in the C library to do it. You can use snprintf with %.3e to produce the digits in exponential format and convert back using strtod, but this is cumbersome and inefficient.
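A minimal sketch of that snprintf/strtod round trip (the helper name round_sig is hypothetical), rounding here to 4 significant digits:
#include <stdio.h>
#include <stdlib.h>

/* Round x to `digits` significant digits by formatting with %e
   and parsing the text back. */
static double round_sig(double x, int digits)
{
    char buf[32];
    snprintf(buf, sizeof buf, "%.*e", digits - 1, x);  /* e.g. "3.845e+05" */
    return strtod(buf, NULL);
}

int main(void)
{
    printf("%.0f\n", round_sig(384471.0, 4));  /* prints 384500 */
    return 0;
}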
I'm implementing my own decrease-and-conquer method for computing a^n.
Here's the program:
#include <stdio.h>
#include <math.h>
#include <stdlib.h>
#include <time.h>
double dncpow(int a, int n)
{
double p = 1.0;
if(n != 0)
{
p = dncpow(a, n / 2);
p = p * p;
if(n % 2)
{
p = p * (double)a;
}
}
return p;
}
int main()
{
int a;
int n;
int a_upper = 10;
int n_upper = 50;
int times = 5;
time_t t;
srand(time(&t));
for(int i = 0; i < times; ++i)
{
a = rand() % a_upper;
n = rand() % n_upper;
printf("a = %d, n = %d\n", a, n);
printf("pow = %.0f\ndnc = %.0f\n\n", pow(a, n), dncpow(a, n));
}
return 0;
}
My code works for small values of a and n, but a mismatch in the output of pow() and dncpow() is observed for inputs such as:
a = 7, n = 39
pow = 909543680129861204865300750663680
dnc = 909543680129861348980488826519552
I'm pretty sure that the algorithm is correct, but dncpow() is giving me wrong answers.
Can someone please help me rectify this? Thanks in advance!
Simply put, these numbers are too large to be represented exactly in a single variable on your computer. With a floating point type, the exponent is stored separately, so it is still possible to represent a number close to the real one by dropping the lowest bits of the mantissa.
Regarding this comment:
I'm getting similar outputs upon replacing 'double' with 'long long'. The latter is supposed to be stored exactly, isn't it?
If you call a function taking double, it won't magically operate on long long instead. Your value is simply converted to double and you'll just get the same result.
Even with a function handling long long (which has 64 bits on today's typical platforms), you can't deal with such large numbers: 64 bits aren't enough to store them. With an unsigned integer type, they will just "wrap around" to 0 on overflow. With a signed integer type, the behavior on overflow is undefined (but still somewhat likely a wrap-around). So you'll get some number that has absolutely nothing to do with your expected result - arguably worse than the result with a floating point type, which is merely imprecise.
For exact calculations on large numbers, the only way is to store them in an array (typically of unsigned integers such as uintmax_t) and implement all the arithmetic yourself. That's a nice exercise, and a lot of work, especially when performance is of interest (the "naive" arithmetic algorithms are typically very inefficient).
For a real-life program, you won't reinvent the wheel here, as there are libraries for handling large numbers. Arguably the best known is libgmp. Read its manuals and use it.
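For example, a minimal GMP sketch for the failing case above (7^39), assuming libgmp is installed and linking with -lgmp:
#include <stdio.h>
#include <gmp.h>

int main(void)
{
    mpz_t p;
    mpz_init(p);
    mpz_ui_pow_ui(p, 7, 39);       /* p = 7^39, computed exactly */
    gmp_printf("7^39 = %Zd\n", p);
    mpz_clear(p);
    return 0;
}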
Just found the following line in some old source code:
int e = (int)fmod(matrix[i], n);
where matrix is an array of int, and n is a size_t.
I'm wondering why fmod was used rather than % when we have integer arguments, i.e. why not:
int e = (matrix[i]) % n;
Could there possibly be a performance reason for choosing fmod over %, or is it just a strange bit of code?
Could there possibly be a performance reason for choosing fmod over %
or is it just a strange bit of code?
fmod might be a bit faster on architectures with a high-latency IDIV instruction that takes, say, ~50 cycles or more, so the cost of fmod's function call and the int <-> double conversions can be amortized.
According to Agner Fog's instruction tables, IDIV on the AMD K10 architecture takes 24-55 cycles. On a modern Intel Haswell, by comparison, its latency range is listed as 22-29 cycles; and if there are no dependency chains, the reciprocal throughput is much better on Intel, at 8-11 clock cycles.
fmod might be a tiny bit faster than the integer division on selected architectures.
Note however that if n has a known non-zero value at compile time, matrix[i] % n would be compiled as a multiplication with a small adjustment, which should be much faster than both the integer modulus and the floating point modulus.
Another interesting difference is the behavior for n == 0 and for INT_MIN % -1. The integer modulus operation invokes undefined behavior on overflow, which results in abnormal program termination on many current architectures. Conversely, the floating point modulus does not have these corner cases: the result is +Infinity, -Infinity, or NaN depending on the value of matrix[i] and -INT_MIN, all exceeding the range of int, and the conversion back to int is implementation-defined, but it does not usually cause abnormal program termination. This might be the reason the original programmer chose this surprising solution.
Experimentally (and quite counter-intuitively), fmod is faster than % - at least on an AMD Phenom(tm) II X4 955 with 6400 bogomips. Here are two programs that use each of the techniques, both compiled with the same compiler (GCC) and the same options (cc -O3 foo.c -lm), and run on the same hardware:
#include <math.h>
#include <stdio.h>
int main()
{
int volatile a=10,b=12;
int i, sum = 0;
for (i = 0; i < 1000000000; i++)
sum += a % b;
printf("%d\n", sum);
return 0;
}
Running time: 9.07 sec.
#include <math.h>
#include <stdio.h>
int main()
{
int volatile a=10,b=12;
int i, sum = 0;
for (i = 0; i < 1000000000; i++)
sum += (int)fmod(a, b);
printf("%d\n", sum);
return 0;
}
Running time: 8.04 sec.
My algorithm calculates the arithmetic operations given below. For small values it works perfectly, but for large numbers such as 218194447 it returns a random value. I have tried to use long long int and double, but nothing works, because the modulus operator which I have used can only be used with integer types. Can anyone explain how to solve this, or provide links that could be useful?
#include<stdio.h>
#include<math.h>
int main()
{
long long i,j;
int t,n;
scanf("%d\n",&t);
while(t--)
{
scanf("%d",&n);
long long k;
i = (n*n);
k = (1000000007);
j = (i % k);
printf("%d\n",j);
}
return 0;
}
You could declare your variables as int64_t or long long; then they would compute the modulus in their range (e.g. 64 bits for int64_t). That would work correctly only if all intermediate values fit in that range.
However, you probably want or need bignums. I suggest you learn and use GMPlib for that.
BTW, don't use pow since it computes in floating point. Try i = n * n; instead of i = pow(n,2);
P.S. This is not for a beginner in C programming; using GMPlib requires some fluency with C programming (and programming in general).
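For the OP's specific case, a minimal sketch of the widening suggested above (the example value is the one from the question):
#include <stdio.h>

int main(void)
{
    unsigned n = 218194447u;              /* value from the question */
    /* widen before squaring so n*n is computed in 64 bits */
    unsigned long long sq = (unsigned long long)n * n;
    unsigned long long r = sq % 1000000007ULL;
    printf("%llu\n", r);
    return 0;
}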
The problem in your code is that intermediate values of your computation exceed the range of values that can be stored in an int. n*n for values of n greater than about 46340 (roughly the square root of INT_MAX) cannot be represented as a 32-bit int.
Follow the link above given by R.T. for a way of doing modulo on big numbers. That won't be enough though, since you also need a library that can handle big integer values. With only the standard C libraries in place, that would otherwise be a tough task to do on your own. (OK, for n up to 2^31 a 64-bit integer would do, but if you're going even larger, you're out of luck again.)
After the accepted answer:
To find the modulo of a number n raised to some power p (2 in OP's case), there is no need to first calculate power(n,p). Instead, calculate intermediate modulo values as n is raised to intermediate powers.
The following code works with p==2 as needed by OP, but also works quickly if p is as large as 1000000000.
The only wider integers needed are integers that are twice as wide as n.
Performing all this with unsigned integers simplifies the needed code.
The resultant code is quite small.
#include <stdint.h>
uint32_t powmod(uint32_t base, uint32_t expo, uint32_t mod) {
// `y = 1u % mod` needed only for the cases expo==0, mod<=1
// otherwise `y = 1u` would do.
uint32_t y = 1u % mod;
while (expo) {
if (expo & 1u) {
y = ((uint64_t) base * y) % mod;
}
expo >>= 1u;
base = ((uint64_t) base * base) % mod;
}
return y;
}
#include<stdio.h>
#include<math.h>
int main(void) {
unsigned long j;
unsigned t, n;
scanf("%u\n", &t);
while (t--) {
scanf("%u", &n);
unsigned long k;
k = 1000000007u;
j = powmod(n, 2, k);
printf("%lu\n", j);
}
return 0;
}
I want to know the first double, counting up from 0, that deviates from the long of the "same value" by some delta, say 1e-8. I'm failing here though. I'm trying to do this in C, although I usually use managed languages, just in case. Please help.
#include <stdio.h>
#include <limits.h>
#define DELTA 1e-8
int main() {
double d = 0; // checked, the literal is fine
long i;
for (i = 0L; i < LONG_MAX; i++) {
d=i; // gcc does the cast right, i checked
if (d-i > DELTA || d-i < -DELTA) {
printf("%f", d);
break;
}
}
}
I'm guessing that the issue is that d-i converts i to double, so d==i and the difference is always 0. How else can I detect this properly? I'd prefer fun C casting over comparing strings, which would take forever.
ANSWER: it is exactly as we expected. 2^53 + 1 = 9007199254740993 is the first point of difference according to standard C/UNIX/POSIX tools. Thanks much to pax for his program. And I guess mathematics wins again.
Doubles in IEEE 754 have a precision of 52 bits, which means they can store numbers accurately up to (at least) 2^51.
If your longs are 32-bit, they only have the (positive) range 0 to 2^31, so there is no 32-bit long that cannot be represented exactly as a double. For a 64-bit long, it will be (roughly) 2^52, so I'd start around there, not at zero.
You can use the following program to detect where the failures start to occur. An earlier version relied on the fact that the last digit of a number that is continuously doubled follows the sequence {2,4,8,6}. However, I eventually opted to use a known trusted tool (bc) for checking the whole number, not just the last digit.
Keep in mind that this may be affected by the behavior of sprintf() rather than the real accuracy of doubles (I don't think so personally, since it had no trouble with certain numbers up to 2^143).
This is the program:
#include <stdio.h>
#include <string.h>
int main() {
FILE *fin;
double d = 1.0; // 2^n-1 to avoid exact powers of 2.
int i = 1;
char ds[1000];
char tst[1000];
// Loop forever, rely on break to finish.
while (1) {
// Get C version of the double.
sprintf (ds, "%.0f", d);
// Get bc version of the double.
sprintf (tst, "echo '2^%d - 1' | bc >tmpfile", i);
system(tst);
fin = fopen ("tmpfile", "r");
fgets (tst, sizeof (tst), fin);
fclose (fin);
tst[strlen (tst) - 1] = '\0';
// Check them.
if (strcmp (ds, tst) != 0) {
printf( "2^%d - 1 <-- bc failure\n", i);
printf( " got [%s]\n", ds);
printf( " expected [%s]\n", tst);
break;
}
// Output for status then move to next.
printf( "2^%d - 1 = %s\n", i, ds);
d = (d + 1) * 2 - 1; // Again, 2^n - 1.
i++;
}
}
This keeps going until:
2^51 - 1 = 2251799813685247
2^52 - 1 = 4503599627370495
2^53 - 1 = 9007199254740991
2^54 - 1 <-- bc failure
got [18014398509481984]
expected [18014398509481983]
which is about where I expected it to fail.
As an aside, I originally used numbers of the form 2^n, but that only got me up to:
2^136 = 87112285931760246646623899502532662132736
2^137 = 174224571863520493293247799005065324265472
2^138 = 348449143727040986586495598010130648530944
2^139 = 696898287454081973172991196020261297061888
2^140 = 1393796574908163946345982392040522594123776
2^141 = 2787593149816327892691964784081045188247552
2^142 = 5575186299632655785383929568162090376495104
2^143 <-- bc failure
got [11150372599265311570767859136324180752990210]
expected [11150372599265311570767859136324180752990208]
with the size of a double being 8 bytes (checked with sizeof). It turned out these numbers were of the binary form "1000...", which can be represented for far longer with doubles. That's when I switched to using 2^n - 1 to get a better bit pattern: all one bits.
The first long to be 'wrong' when cast to a double won't be off by 1e-8; it will be off by 1. As long as the double can fit the long in its significand, it will represent it accurately.
I forget exactly how many bits a double has for precision versus exponent, but that would tell you the maximum size it can represent. The first long to be wrong should have the binary form 10000..., so you can find it much more quickly by starting at 1 and left-shifting.
Wikipedia says 52 bits in the significand, not counting the implicit leading 1. That should mean the first long to be cast to a different value is 2^53 + 1.
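A quick check of that, as a sketch assuming a 64-bit long and IEEE 754 doubles: 2^53 survives the round trip long -> double -> long, while 2^53 + 1 does not:
#include <stdio.h>

int main(void)
{
    long exact = 1L << 53;       /* 9007199254740992, exactly representable */
    long first_bad = exact + 1;  /* 9007199254740993, rounds back to 2^53 */
    printf("%ld -> %ld\n", exact, (long)(double)exact);
    printf("%ld -> %ld\n", first_bad, (long)(double)first_bad);
    return 0;
}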
Although I'm hesitant to mention Fortran 95 and its successors in this discussion, I'll note that Fortran since the 1990 standard has offered a SPACING intrinsic function which tells you the difference between representable REALs around a given REAL. You could do a binary search on this, stopping when SPACING(X) > DELTA. For compilers that use the same floating point model as the one you are interested in (likely to be the IEEE 754 standard), you should get the same results.
Offhand, I thought that doubles could represent all integers (within their bounds) exactly.
If that is not the case, then you're going to want to cast both i and d to something with MORE precision than either of them. Perhaps a long double will work.