Temperature Converter Challenge [closed]
I was doing a coding challenge for a website, the premise was:
In this challenge, write a program that takes in three arguments, a start temperature (in
Celsius), an end temperature (in Celsius) and a step size. Print out a table that goes from the start temperature to the end temperature, in steps of the step size; you do not
actually need to print the final end temperature if the step size does not exactly match.
You should perform input validation: do not accept start temperatures less than a lower
limit (which your code should specify as a constant) or higher than an upper limit (which
your code should also specify). You should not allow a step size greater than the
difference in temperatures. (This exercise was based on a problem from The C Programming
Language.)
I got the same results as the solution did, but I'm curious as to why their solution is more efficient (I'd presume it is). Is anyone able to explain it to me? Their solution is shown first, followed by mine.
#include <stdio.h>

#define LOWER_LIMIT 0
#define HIGHER_LIMIT 50000

int main(void) {
    double fahr, cel;
    int limit_low = -1;
    int limit_high = -1;
    int step = -1;
    int max_step_size = 0;

    /* Read in lower limit, higher limit and step */
    while (limit_low < (int) LOWER_LIMIT) {
        printf("Please give in a lower limit, limit >= %d: ", (int) LOWER_LIMIT);
        scanf("%d", &limit_low);
    }
    while ((limit_high <= limit_low) || (limit_high > (int) HIGHER_LIMIT)) {
        printf("Please give in a higher limit, %d < limit <= %d: ", limit_low, (int) HIGHER_LIMIT);
        scanf("%d", &limit_high);
    }
    max_step_size = limit_high - limit_low;
    while ((step <= 0) || (step > max_step_size)) {
        printf("Please give in a step, 0 < step <= %d: ", max_step_size);
        scanf("%d", &step);
    }

    /* Initialise Celsius variable */
    cel = limit_low;

    /* Print the table */
    printf("\nCelsius\t\tFahrenheit");
    printf("\n-------\t\t----------\n");
    while (cel <= limit_high) {
        fahr = (9.0 * cel) / 5.0 + 32.0;
        printf("%f\t%f\n", cel, fahr);
        cel += step;
    }
    printf("\n");
    return 0;
}
My solution:
#include <stdio.h>
#include <stdlib.h>

#define LOW 0
#define HIGH 50000

int main(void)
{
    int lower, higher, step, max_step;
    float cel, fahren;

    printf("\nPlease enter a lower limit, limit >= 0: ");
    scanf("%d", &lower);
    if (lower < LOW)
    {
        printf("\nERROR: Lower limit must be >= 0.");
        exit(1);
    }

    printf("\nPlease enter an upper limit, limit <= 50000: ");
    scanf("%d", &higher);
    if (higher > HIGH)
    {
        printf("\nERROR: Upper limit must be <= 50,000.");
        exit(1);
    }

    printf("\nPlease enter an increment amount, 0 < step <= 10: ");
    scanf("%d", &step);
    max_step = higher - lower;
    if (step > max_step)
    {
        printf("\nERROR: Step size cannot exceed difference between higher and lower limit.");
        exit(1);
    }

    printf("Celsius \tFahrenheit\n");
    printf("------- \t-----------\n\n");
    cel = (float)lower;
    while (cel < higher)
    {
        fahren = cel * 9/5 + 32;
        printf("%f \t%f\n", cel, fahren);
        cel = cel + step;
    }
    return 0;
}

Hmm, which is more efficient? Before making that claim we need some metrics. What are we talking about here: run time? Binary size?
Just a quick example: we can compile both solutions and run them with the "time" command and a worst case (0-50000 with a step of 1) to see what sort of times we're looking at.
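One plausible way to gather such numbers on a Linux box (the file name theirs.c is made up for the example; size and time are the usual binutils/shell utilities):

$ gcc -o theirs theirs.c
$ size theirs                             # text/data/bss image size
$ printf '0\n50000\n1\n' > input.txt      # lower limit, upper limit, step
$ time ./theirs < input.txt > /dev/null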
"Their" solution size:
text data bss dec hex filename
1937 276 8 2221 8ad a.out
"Their" solution running time:
user 0m 0.024s
sys 0m 0.601s
Your solution size:
text data bss dec hex filename
2054 276 8 2338 922 a.out
Your solution running time:
user 0m 0.025s
sys 0m 1.047s
So your solution takes longer and has a larger image size. Can we say now that "they" have a more efficient program? Not really; the timing isn't totally accurate at this scale (and with other things happening in the system), so we'd need a number of runs. Averaged over four* runs:
// you
user 0.0178
system 0.9015
// "them"
user 0.016
system 0.914
So no, they are not really more "efficient" than you are.
There are some trivial things we can do to increase the "efficiency" here, but because the solutions are so similar, and the code is so trivial (a single stepping traversal), I'm not really sure it matters all that much.
As far as "efficient" code size goes, you'll note that your .text size is larger than the other solution's. Your messages are more verbose and readable, so you take a hit on size. Is that more efficient? Perhaps, if size is important, but personally I think readability matters more unless we're talking about an embedded solution.
* - you need a lot more runs and a more sensitive timing mechanism, but this is just a quick example

Related

if else-if ladder always outputting final statement, ignoring everything above it [closed]

I have to write a program that takes distance traveled and hours spent traveling (both have to be float type), calculates the average speed, and then outputs a message depending on the speed (i.e., if the speed is greater than 5, output a message about bikes, but if it's above 30, output a message about cars).
I tried writing an if else-if ladder, but I am now always ending up with the first if statement's result regardless of the numbers I input. I'm very new to this, so it is probably a simple error I'm just not seeing, but I'm stuck regardless.
int main()
{
    float distance = 0.0, time_traveled = 0, speed = 0;

    //program that validates input

    printf("Your average speed was: %.2f ", distance / time_traveled, &speed);

    if ((speed>=0) && (speed<6))
        printf("\nWALK");
    else if ((speed>=6) && (speed<31))
        printf("\nBIKES");
    else if ((speed>=31) && (speed<201))
        printf("\nCARS");
    else
        printf("\nSPIRIT AIRLINES #1 !!!!!!");
    return 0;
}
Every time I run this program I get this error (well, compiler warning; the text is white in the OnlineGDB compiler):
main.c:22:12: warning: too many arguments for format [-Wformat-extra-args]
22 | printf("Your average speed was: %.2f ", distance / time_traveled, &speed);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
You never assigned the average speed to the speed variable.
Change
printf("Your average speed was: %.2f ", distance / time_traveled, &speed);
to
speed = distance / time_traveled;
printf("Your average speed was: %.2f ", speed);
You should calculate speed once and use that both in your printf and when deciding which message (WALK etc.) to display. You also do not need to check the lower bounds in your if statements: if speed is 7, for example, it's enough to check else if (speed < 31), because it must be >= 6 to have failed the first if check.
Example:
// program that validates input

// calculate speed
speed = distance / time_traveled;
printf("Your average speed was: %.2f\n", speed);

// bonus:
speed = fabsf(speed); // if in reverse, make it positive (fabsf needs <math.h>)

if (speed < 6)
    puts("WALK");
else if (speed < 31)
    puts("BIKES");
else if (speed < 201)
    puts("CARS");
else
    puts("SPIRIT AIRLINES #1 !!!!!!");

I'm trying to code a Discomfort Index program in C but I'm stuck [closed]

I've built the main part of the program, but the task requires us to add a feature: if the index is above the "No discomfort" zone, the program should report the decrease in temperature required for the index to reach "No discomfort" (assuming humidity stays constant).
The problem I'm facing: I set up a variable named x which I want to represent the needed decrease in temperature, but when I try to form an equation to solve for x, it only prints 0. I'm pretty sure I can't just hand the compiler an equation to solve, but is there any way I can get the needed decrease printed?
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

float main()
{
    float T, RH, x, y;

    printf("Insert current temperature in Celsius: \n");
    scanf("%f", &T);
    printf("Insert current humidity percentage: \n");
    scanf("%f", &RH);

    float DI = T - 0.55 * (1 - 0.01 * RH) * (T - 14.5);

    if (DI < 21)
        printf("No discomfort");
    else if (DI >= 21 && DI < 24)
        printf("Under 50 percent population feels discomfort");
    else if (DI >= 24 && DI < 27)
        printf("Most 50 percent population feels discomfort");
    else if (DI >= 27 && DI < 29)
        printf("Most of population suffers discomfort");
    else if (DI >= 29 && DI < 32)
        printf("Everyone feels severe stress");
    else if (DI >= 32)
        printf("State of medical emergency");

    if (DI >= 21)
        DI = 21;

    x=(DI - 14.5 * 0.55(1 - 0.01 * RH))/(1 - 0.55(1 - 0.01*RH));
    printf("\nThe temperature should be decreased to %.2f degrees\n", x);
    return 0;
}
Any help is appreciated
Your compiler is already telling you what's wrong here:
x=(DI - 14.5 * 0.55(1 - 0.01 * RH))/(1 - 0.55(1 - 0.01*RH));
Something like this:
error: called object type 'double' is not a function or function pointer
That's because 0.55(1) is math, not C. You need a *.
Always enable and inspect your compiler warnings before wondering why your code doesn't work.
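For reference, here is that line with the missing multiplication operators inserted (the algebra itself correctly inverts the DI formula for T, so with DI clamped to 21 it yields the target temperature):

x = (DI - 14.5 * 0.55 * (1 - 0.01 * RH)) / (1 - 0.55 * (1 - 0.01 * RH));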

C program, can't for the life of me figure it out

Ok, so I have to incorporate a function into my code to get it to print a bunch of numbers that eventually reach the sqrt of the number input by the user, all by using a while loop. The problem is, the number does not go into the function, and it loops indefinitely because the condition never becomes false. Any help?
#include <stdio.h>
#include <math.h>

int main(void)
{
    double in, out, var, new_guess, old_guess;

    printf("Enter a number: ");
    scanf("%lf", &in);

    while (fabs(in - sqrt(old_guess)) >= 1e-5) {
        new_guess = (old_guess + (in / old_guess)) / 2.0;
        printf("%11.5lf", old_guess);
    }
    printf("Estimated square root of %11.5lf: %11.5lf\n", in, new_guess);
    return 0;
}
Once you get all your syntax issues resolved, you will still never get the desired result because the math in your predictor/corrector method will never converge. Specifically, fabs(in - sqrt(old_guess)) will always be >= 1e-5 as in will always be greater than the sqrt of old_guess.
Further, if you are using a predictor/corrector method to compute the square root of a number, it rather defeats the purpose to use sqrt in the iteration. If you were going to use the sqrt function to find the answer, you could simply do:
double answer = sqrt (in); /* problem solved */
The purpose of an iterative method is to converge on a solution by using either a rate or an average difference to repeatedly refine your guess until it satisfies some condition, like an error tolerance between repeated terms (which is what it appears you are attempting to do here).
To iteratively find the square root of a number using the method you are attempting, you first find the next lower or higher perfect square of the number entered by the user. A simple brute force of starting at 1 and incrementing x until x * x is no longer less than in is fine.
You then divide the input by the root of that perfect square to predict the answer, and then take the average of the input divided by the predicted answer plus the predicted answer to correct for error between the terms (and repeat until your error tolerance is reached).
Note: you should also include an iteration limit to protect against an endless loop if your solution does not converge for some reason.
Putting it all together, you could do something similar to:
#include <stdio.h>
#include <math.h>

#define ILIM 64     /* max iteration limit */
#define TOL 1e-5    /* tolerance */

int main (void)
{
    double in, n = 0, new_guess, old_guess, root = 1;

    printf ("Enter a number: ");
    if (scanf ("%lf", &in) != 1) {
        fprintf (stderr, "error: invalid input.\n");
        return 1;
    }

    while (root * root < in)    /* find next larger perfect square */
        root++;

    /* compute initial old/new_guess */
    old_guess = (in / root + root) / 2.0;
    new_guess = (in / old_guess + old_guess) / 2.0;

    /* compare old/new_guess, repeat until limit or tolerance met */
    while (n++ < ILIM && fabs (new_guess - old_guess) >= TOL) {
        old_guess = new_guess;
        new_guess = (in / old_guess + old_guess) / 2.0;
    }

    printf ("Estimated square root of %.5f: %.5f\n", in, new_guess);
    printf ("Actual : %.5f\n", sqrt (in));
    return 0;
}
(note: sqrt is only used to provide a comparison with your iterative solution)
Example Use/Output
$ ./bin/sqrthelp
Enter a number: 9
Estimated square root of 9.00000: 3.00000
Actual : 3.00000
$ ./bin/sqrthelp
Enter a number: 9.6
Estimated square root of 9.60000: 3.09839
Actual : 3.09839
$ ./bin/sqrthelp
Enter a number: 10
Estimated square root of 10.00000: 3.16228
Actual : 3.16228
$ ./bin/sqrthelp
Enter a number: 24
Estimated square root of 24.00000: 4.89898
Actual : 4.89898
$ ./bin/sqrthelp
Enter a number: 25
Estimated square root of 25.00000: 5.00000
Actual : 5.00000
$ ./bin/sqrthelp
Enter a number: 30
Estimated square root of 30.00000: 5.47723
Actual : 5.47723

Why is recursion taking so long?

In using recursion to calculate the nth number of the fibonacci sequence, I have written this simple program:
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>

unsigned long long fibonacci(unsigned int number);
//game of craps

int main(int argc, char** argv)
{
    for (int n = 1; n <= 100; n++)
    {
        printf("%llu\n", fibonacci(n));
    }
    return (EXIT_SUCCESS);
}

unsigned long long fibonacci(unsigned int number)
{
    if (number == 0 || number == 1)
    {
        return number;
    }
    else
    {
        return fibonacci(number - 2) + fibonacci(number - 1);
    }
}
where each call for the (n+1)th number in the sequence doubles the number of function calls the program has to make. Therefore the number of calls being made to the recursive function is something like 2^n, or exponential complexity. Understood. But where is all of the computing power going? Once the nth number in the sequence starts to hit 40, the computer starts taking noticeable time to compute the result, and at n = 47 it takes 30+ seconds. However, my computer shows that I'm only using 21 percent of CPU power. I'm using the NetBeans IDE to run the program. It's a quad-core system.
the number of calls being made to the recursive function is something of 2^n, or exponential complexity. Understood.
I'm not sure you do entirely understand this, since you seem surprised about how slow it becomes around n=40, and n=47.
With a complexity of 2^n, and an n of 40, that would be 2^40, or 1,099,511,627,776, or about 1 trillion operations. If your computer can run about one of these operations per nanosecond, i.e. 1 billion operations per second, it would take 1000 seconds to finish.
Consider if n was only 30. 2^30 is 1,073,741,824, which would take only about 1 second to do on that same computer.
As has been mentioned, you're only using one core. You could parallelize, but that won't help much. Use four cores instead of one, and my n=40 example will still take 250 seconds. Go up to n=42 and you're back to 1000 seconds, because parallelizing at best multiplies your performance, but an algorithm like this grows exponentially.
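As a quick aside (my illustration, not from the answer above), you can watch that growth directly by instrumenting the recursive function with a call counter; a minimal sketch, where the counter and the range of n are arbitrary choices:

#include <stdio.h>

static unsigned long long calls; /* bumped on every invocation */

unsigned long long fibonacci(unsigned int number)
{
    calls++;
    if (number == 0 || number == 1)
        return number;
    return fibonacci(number - 2) + fibonacci(number - 1);
}

int main(void)
{
    for (unsigned int n = 10; n <= 35; n += 5) {
        calls = 0;
        unsigned long long f = fibonacci(n);
        printf("n=%2u fib=%9llu calls=%12llu\n", n, f, calls);
    }
    return 0;
}

Each +5 in n multiplies the call count by roughly 11 (the golden ratio to the fifth power), which is why the jump from n=40 to n=47 hurts so much.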
The posted code contains some extreme overcomplexity.
Even a long long unsigned int cannot contain Fibonacci number 100 (or even close to it); the largest Fibonacci number that fits in 64 bits is the 93rd, 12,200,160,415,121,876,738, as the output below shows.
Suggest using a very simple program to start, one that calculates the Fibonacci sequence. Then use that program to determine how to display the results.
The following program calculates the numbers, is very fast, but still has the problem of overflow of a long long unsigned int:
#include <stdio.h>  // printf()

int main( void )
{
    long long unsigned currentNum = 1;
    long long unsigned priorNum = 1;
    printf( "1\n1\n" );
    for (size_t i = 2; i < 100; i++ )
    {
        long long unsigned newNum = currentNum+priorNum;
        printf( "%llu\n", newNum );
        priorNum = currentNum;
        currentNum = newNum;
    }
}
On my Linux x86-64 computer, here are the last few lines of the output, showing the overflow problem:
99194853094755497
160500643816367088
259695496911122585
420196140727489673
679891637638612258
1100087778366101931
1779979416004714189
2880067194370816120
4660046610375530309
7540113804746346429
12200160415121876738
1293530146158671551
13493690561280548289
14787220707439219840
9834167195010216513
6174643828739884737
16008811023750101250
3736710778780434371
So, why is recursion taking so long?
Because of the huge number of recursive calls and the handling of the overflows.
The suggested code above eliminates the recursion, but not the overflows, and it takes less than a second (on my computer) to run.
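As a small aside (my tweak, not part of the answer above): since unsigned arithmetic wraps around rather than trapping, the loop can at least detect the overflow by noticing that the new term stopped growing:

#include <stdio.h>

int main( void )
{
    long long unsigned currentNum = 1;
    long long unsigned priorNum = 1;
    printf( "1\n1\n" );
    for (size_t i = 2; i < 100; i++ )
    {
        long long unsigned newNum = currentNum + priorNum;
        if (newNum < currentNum) /* the sum wrapped past 2^64 - 1 */
        {
            printf( "overflow: term %zu does not fit in 64 bits\n", i + 1 );
            break;
        }
        printf( "%llu\n", newNum );
        priorNum = currentNum;
        currentNum = newNum;
    }
}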
You won't exploit a quad-core system with a single-threaded program.
It will run on one core only, so the 21/25% CPU usage is realistic.
A way to use it all would be, first of all, not using recursion, as recursion makes this annoying to do; then, when you have a for/while loop, split it into 4 loops and put each of them in a new thread. You'll then have to manage synchronization in order to print the output properly, but it's not even that hard: you could store all the results in an array and then print it when all the threads are done.
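A minimal pthreads sketch of that idea (the names NTHREADS and worker are mine, and the recursive fibonacci is kept as-is, so each call is still exponentially slow; this only spreads the calls across cores):

#include <stdio.h>
#include <pthread.h>

#define NMAX     40 /* keep the range modest: the recursion is still exponential */
#define NTHREADS 4

static unsigned long long results[NMAX + 1]; /* each slot is written by exactly one thread */

static unsigned long long fibonacci(unsigned int number)
{
    if (number == 0 || number == 1)
        return number;
    return fibonacci(number - 2) + fibonacci(number - 1);
}

/* thread id computes every NTHREADS-th value of n */
static void *worker(void *arg)
{
    int id = *(int *)arg;
    for (int n = id + 1; n <= NMAX; n += NTHREADS)
        results[n] = fibonacci((unsigned int)n);
    return NULL;
}

int main(void)
{
    pthread_t tid[NTHREADS];
    int ids[NTHREADS];
    for (int i = 0; i < NTHREADS; i++) {
        ids[i] = i;
        pthread_create(&tid[i], NULL, worker, &ids[i]);
    }
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);
    for (int n = 1; n <= NMAX; n++) /* print in order once every thread is done */
        printf("%llu\n", results[n]);
    return 0;
}

Compile with gcc -pthread. As the first answer points out, though, this buys at best a constant factor of 4 against an exponentially growing workload.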
Building on the answer made by @user3629249, you can get rid of the overflows he mentioned by using the arbitrary-precision arithmetic provided by GMP.
e.g.
#include <stdio.h>  // printf
#include <stdlib.h> // free
#include <gmp.h>    // mpz_t

int main( void )
{
    mpz_t prevNum, currNum, tempNum, counter;
    mpz_init_set_si(prevNum, 0);
    mpz_init_set_si(currNum, 1);
    mpz_init_set_si(tempNum, 1);
    mpz_init_set_si(counter, 1);

    printf( "0: 0\n" );
    while (1) {
        char *tempNumRepr = mpz_get_str(NULL, 10, tempNum);
        char *counterRepr = mpz_get_str(NULL, 10, counter);
        printf("%s: %s\n", counterRepr, tempNumRepr);
        free(tempNumRepr);
        free(counterRepr);

        mpz_add(tempNum, currNum, prevNum); // tempNum = currNum + prevNum;
        mpz_add_ui(counter, counter, 1);    // counter = counter + 1;
        mpz_set(prevNum, currNum);          // prevNum = currNum;
        mpz_set(currNum, tempNum);          // currNum = tempNum;
    }

    mpz_clear(prevNum);
    mpz_clear(currNum);
    mpz_clear(tempNum);
    mpz_clear(counter);
    return EXIT_SUCCESS;
}
To compile it, ensure that you have libgmp installed and type:
~$ gcc fib.c -lgmp
You get massive Fibonacci values pretty fast:
~$ ./a.out
0: 0
1: 1
2: 1
3: 2
4: 3
5: 5
6: 8
7: 13
8: 21
9: 34
...
90: 2880067194370816120
91: 4660046610375530309
92: 7540113804746346429
93: 12200160415121876738
94: 19740274219868223167
95: 31940434634990099905
96: 51680708854858323072
97: 83621143489848422977
98: 135301852344706746049
99: 218922995834555169026
100: 354224848179261915075
...
142: 212207101440105399533740733471
143: 343358302784187294870275058337
144: 555565404224292694404015791808
145: 898923707008479989274290850145
146: 1454489111232772683678306641953
147: 2353412818241252672952597492098
148: 3807901929474025356630904134051
149: 6161314747715278029583501626149
150: 9969216677189303386214405760200
...
10456: 6687771891046976665010914682715972428661740561209776353485935351631179302708216108795962659308263419533746676628535531789045787219342206829688433844719175383255599341828410480942962469553971997586487609675800755252584139702413749597015823849849046700521430415467867019518212926720410106893075072562394664597041033593563521410003073230903292197734713471051090595503533547412747118747787351929732433449493727418908972479566909080954709569018619548197645271462668017096925677064951824250666293199593131718849011440475925874263429880250725807157443918222920142864819346465587051597207982477956741428300547495546275347374411309127960079792636429623948756731669388275421014167909883947268371246535572766045766175917299574719971717954980856956555916099403979976768699108922030154574061373884317374443228652666763423361895311742060974910298682465051864682016439317005971937944787596597197162234588349001773183227535867183191706435572614767923270023480287832648770215573899455920695896713514952891911913499762717737021116746179317675622780792638129991728650763618970292905899648572351513919065201266611540504973510404007895858009291738402611754822294670524761118059571137973416151185102238975390542996959456114838498320921216851752236455715812273599551395186676228882752252829522673168259864505917922994675966393982705428427387550834530918600733123354437191268657802903434440996622861582962869292133202292740984119730918997492224957849300327645752441866958526558379656521799598935096546592129670888574358354955519855060127168291877171959996776081517513455753528959306416265886428706197994064431298142841481516239689015446304286858347321708226391039390175388745315544138793021359869227432464706950061238138314080606377506673283324908921190615421862717588664540813607678946107283312579595718137450873566434040358736923152893920579043838335105796035360841757227288861017982575677839192583578548045589322945
...
Use CTRL+C to stop the program.

I do *not* want correct rounding for function exp

The GCC implementation of the C mathematical library on Debian systems apparently has an (IEEE 754-2008)-compliant implementation of the function exp, implying that rounding shall always be correct:
(from Wikipedia) The IEEE floating point standard guarantees that add, subtract, multiply, divide, fused multiply–add, square root, and floating point remainder will give the correctly rounded result of the infinite precision operation. No such guarantee was given in the 1985 standard for more complex functions and they are typically only accurate to within the last bit at best. However, the 2008 standard guarantees that conforming implementations will give correctly rounded results which respect the active rounding mode; implementation of the functions, however, is optional.
It turns out that I am encountering a case where this feature is actually a hindrance, because the exact result of the exp function is often almost exactly halfway between two consecutive double values (1); the program then carries out plenty of further computations, losing up to a factor of 400 (!) in speed: this was actually the explanation of my (ill-asked :-S) Question #43530011.
(1) More precisely, this happens when the argument of exp turns out to be of the form (2k + 1) × 2^(-53) with k a rather small integer (like 242, for instance). In particular, the computations involved in pow (1. + x, 0.5) tend to call exp with such an argument when x is of the order of magnitude of 2^(-44).
Since implementations of correct rounding can be so time-consuming in certain circumstances, I guess that the developers will also have devised a way to get a slightly less precise result (say, only up to 0.6 ULP or something like that) in a time which is (roughly) bounded for every value of the argument in a given range… (2)
… But how to do this??
(2) What I mean is that I just do not want some exceptional values of the argument, like (2k + 1) × 2^(-53), to be much more time-consuming than most values of the same order of magnitude; but of course I do not mind if some exceptional values of the argument go much faster, or if large arguments (in absolute value) need a larger computation time.
Here is a minimal program showing the phenomenon:
#include <stdlib.h>
#include <stdio.h>
#include <math.h>
#include <time.h>

int main (void)
{
    int i;
    double a, c;

    c = 0;
    clock_t start = clock ();
    for (i = 0; i < 1e6; ++i) // Doing the same type of computation a large number of times, with different values, to smooth out random fluctuations.
    {
        a = (double) (1 + 2 * (rand () % 0x400)) / 0x20000000000000; // "a" has only a few significant digits, and its last non-zero digit is at (fixed-point) position 53.
        c += exp (a); // Just to be sure that the compiler will actually perform the computation of exp (a).
    }
    clock_t stop = clock ();

    printf ("%e\n", c); // Just to be sure that the compiler will actually perform the computation.
    printf ("Clock time spent: %d\n", stop - start);
    return 0;
}
Now after gcc -std=c99 program53.c -lm -o program53:
$ ./program53
1.000000e+06
Clock time spent: 13470008
$ ./program53
1.000000e+06
Clock time spent: 13292721
$ ./program53
1.000000e+06
Clock time spent: 13201616
On the other hand, with program52 and program54 (obtained by replacing 0x20000000000000 with 0x10000000000000 and 0x40000000000000 respectively):
$ ./program52
1.000000e+06
Clock time spent: 83594
$ ./program52
1.000000e+06
Clock time spent: 69095
$ ./program52
1.000000e+06
Clock time spent: 54694
$ ./program54
1.000000e+06
Clock time spent: 86151
$ ./program54
1.000000e+06
Clock time spent: 74209
$ ./program54
1.000000e+06
Clock time spent: 78612
Beware, the phenomenon is implementation-dependent! Apparently, among the common implementations, only those of the Debian systems (including Ubuntu) show this phenomenon.
P.-S.: I hope that my question is not a duplicate: I searched for a similar question thoroughly without success, but maybe I did not use the relevant keywords… :-/
To answer the general question on why the library functions are required to give correctly rounded results:
Floating-point is hard, and often counterintuitive. Not every programmer has read what they should have. When libraries used to allow some slightly inaccurate rounding, people complained about the precision of the library functions when their inaccurate computations inevitably went wrong and produced nonsense. In response, the library writers made their libraries exactly rounded, so now people cannot shift the blame to them.
In many cases, specific knowledge about floating-point algorithms can produce considerable improvements in accuracy and/or performance, as in this testcase:
Taking the exp() of numbers very close to 0 in floating point is problematic, since the result is a number close to 1 while all of the precision is in the difference from 1, so most significant digits are lost. It is more precise (and significantly faster in this testcase) to compute exp(x) - 1 through the C math library function expm1(x). If exp() itself is really needed, it is still much faster to do expm1(x) + 1.
A similar concern exists for computing log(1 + x), for which there is the function log1p(x).
A quick fix that speeds up the provided testcase:
#include <stdlib.h>
#include <stdio.h>
#include <math.h>
#include <time.h>

int main (void)
{
    int i;
    double a, c;

    c = 0;
    clock_t start = clock ();
    for (i = 0; i < 1e6; ++i) // Doing the same type of computation a large number of times, with different values, to smooth out random fluctuations.
    {
        a = (double) (1 + 2 * (rand () % 0x400)) / 0x20000000000000; // "a" has only a few significant digits, and its last non-zero digit is at (fixed-point) position 53.
        c += expm1 (a) + 1; // replace exp() with expm1() + 1
    }
    clock_t stop = clock ();

    printf ("%e\n", c); // Just to be sure that the compiler will actually perform the computation.
    printf ("Clock time spent: %d\n", stop - start);
    return 0;
}
For this case, the timings on my machine are thus:
Original code
1.000000e+06
Clock time spent: 21543338
Modified code
1.000000e+06
Clock time spent: 55076
Programmers with advanced knowledge of the accompanying trade-offs may sometimes consider using approximate results where precision is not critical.
For an experienced programmer it may be possible to write an approximate implementation of a slow function using methods like Newton-Raphson or Taylor/Maclaurin polynomials; to use inexactly rounded specialty functions from libraries like Intel's MKL or AMD's ACML; to relax the compiler's floating-point standard compliance; to reduce precision to IEEE 754 binary32 (float); or any combination of these. A sketch of the binary32 option follows.
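As a minimal sketch of that last option, expf from <math.h> trades precision for speed. Note, though, that for arguments as tiny as the ones in this testcase, a float cannot even represent the difference from 1, so expm1 remains the right tool here:

#include <stdio.h>
#include <math.h>

int main (void)
{
    double a = 4.55e-13; /* same order of magnitude as the testcase's arguments */
    printf ("exp   : %.17g\n", exp (a));                  /* correctly rounded double */
    printf ("expf  : %.9g\n", (double) expf ((float) a)); /* binary32: collapses to 1 */
    printf ("expm1 : %.17g\n", expm1 (a));                /* keeps the tiny difference */
    return 0;
}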
Note that a better description of the problem would enable a better answer.
Regarding your comment on @EOF's answer, the "write your own" remark from @NominalAnimal seems simple enough here, even trivial, as follows.
Your original code above seems to have a max possible argument for exp() of a = (1+2*0x400)/0x2000... = 4.55e-13 (that should really be 2*0x3FF, and I'm counting 13 zeroes after your 0x2000..., which makes it 2x16^13). So that 4.55e-13 max argument is very, very small.
And then the trivial Taylor expansion is exp(a) = 1 + a + a^2/2 + a^3/6 + ..., which already gives you all of double's precision for such small arguments. Now, you'll have to discard the 1 part, as explained above, and then that just reduces to expm1(a) = a*(1. + a*(1. + a/3.)/2.). And that should go pretty darn quick! Just make sure a stays small. If it gets a little bigger, just add the next term, a^4/24 (you see how to do that?).
>>EDIT<<
I modified the OP's test program as follows to test a little more stuff (discussion follows the code).
/* https://stackoverflow.com/questions/44346371/
   i-do-not-want-correct-rounding-for-function-exp/44397261 */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>

#define BASE 16 /*denominator will be (multiplier)xBASE^EXPON*/
#define EXPON 13
#define taylorm1(a) (a*(1.+a*(1.+a/3.)/2.)) /*expm1() approx for small args*/

int main (int argc, char *argv[]) {
    int N = (argc>1? atoi(argv[1]) : 1e6),
        multiplier = (argc>2? atoi(argv[2]) : 2),
        isexp = (argc>3? atoi(argv[3]) : 1); /* flags to turn on/off exp() */
    int isexpm1 = 1; /* and expm1() for timing tests */
    int i, n=0;
    double denom = ((double)multiplier)*pow((double)BASE,(double)EXPON);
    double a, c=0.0, cm1=0.0, tm1=0.0;
    clock_t start = clock();

    n=0; c=cm1=tm1=0.0;
    /* --- to smooth random fluctuations, do the same type of computation
           a large number of (N) times with different values --- */
    for (i=0; i<N; i++) {
        n++;
        a = (double)(1 + 2*(rand()%0x400)) / denom; /* "a" has only a few
                              significant digits, and its last non-zero
                              digit is at (fixed-point) position 53. */
        if ( isexp ) c += exp(a); /* turn this off to time expm1() alone */
        if ( isexpm1 ) {          /* you can turn this off to time exp() alone, */
            cm1 += expm1(a);      /* but the difference is negligible */
            tm1 += taylorm1(a); }
    } /* --- end-of-for(i) --- */

    int nticks = (int)(clock()-start);
    printf ("N=%d, denom=%dx%d^%d, Clock time: %d (%.2f secs)\n",
            n, multiplier,BASE,EXPON,
            nticks, ((double)nticks)/((double)CLOCKS_PER_SEC));
    printf ("\t c=%.20e,\n\t c-n=%e, cm1=%e, tm1=%e\n",
            c,c-(double)n,cm1,tm1);
    return 0;
} /* --- end-of-function main() --- */
Compile and run it as test to reproduce the OP's 0x2000... scenario, or run it with (up to three) optional args test #trials multiplier timeexp, where #trials defaults to the OP's 1000000, and multiplier defaults to 2 for the OP's 2x16^13 (change it to 4, etc., for her other tests). For the last arg, timeexp, enter a 0 to time only the expm1() (and my unnecessary Taylor-like) calculation. The point of that is to show that the bad-timing cases displayed by the OP disappear with expm1(), which takes "no time at all" regardless of multiplier.
So default runs, test and test 1000000 4, produce (okay, I called the program rounding)...
bash-4.3$ ./rounding
N=1000000, denom=2x16^13, Clock time: 11155070 (11.16 secs)
c=1.00000000000000023283e+06,
c-n=2.328306e-10, cm1=1.136017e-07, tm1=1.136017e-07
bash-4.3$ ./rounding 1000000 4
N=1000000, denom=4x16^13, Clock time: 200211 (0.20 secs)
c=1.00000000000000011642e+06,
c-n=1.164153e-10, cm1=5.680083e-08, tm1=5.680083e-08
So the first thing you'll note is that the OP's c-n using exp() differs substantially from both cm1==tm1 using expm1() and my taylor approx. If you reduce N they come into agreement, as follows...
N=10, denom=2x16^13, Clock time: 941 (0.00 secs)
c=1.00000000000007140954e+01,
c-n=7.140954e-13, cm1=7.127632e-13, tm1=7.127632e-13
bash-4.3$ ./rounding 100
N=100, denom=2x16^13, Clock time: 5506 (0.01 secs)
c=1.00000000000010103918e+02,
c-n=1.010392e-11, cm1=1.008393e-11, tm1=1.008393e-11
bash-4.3$ ./rounding 1000
N=1000, denom=2x16^13, Clock time: 44196 (0.04 secs)
c=1.00000000000011345946e+03,
c-n=1.134595e-10, cm1=1.140730e-10, tm1=1.140730e-10
bash-4.3$ ./rounding 10000
N=10000, denom=2x16^13, Clock time: 227215 (0.23 secs)
c=1.00000000000002328306e+04,
c-n=2.328306e-10, cm1=1.131288e-09, tm1=1.131288e-09
bash-4.3$ ./rounding 100000
N=100000, denom=2x16^13, Clock time: 1206348 (1.21 secs)
c=1.00000000000000232831e+05,
c-n=2.328306e-10, cm1=1.133611e-08, tm1=1.133611e-08
And as far as timing of exp() versus expm1() is concerned, see for yourself...
bash-4.3$ ./rounding 1000000 2
N=1000000, denom=2x16^13, Clock time: 11168388 (11.17 secs)
c=1.00000000000000023283e+06,
c-n=2.328306e-10, cm1=1.136017e-07, tm1=1.136017e-07
bash-4.3$ ./rounding 1000000 2 0
N=1000000, denom=2x16^13, Clock time: 24064 (0.02 secs)
c=0.00000000000000000000e+00,
c-n=-1.000000e+06, cm1=1.136017e-07, tm1=1.136017e-07
Question: you'll note that once the exp() calculation reaches N=10000 trials, its sum remains constant regardless of larger N. Not sure why that would be happening.
>>__SECOND EDIT__<<
Okay, @EOF, "you made me look" with your "hierarchical accumulation" comment. And that indeed works to bring the exp() sum closer (much closer) to the (presumably correct) expm1() sum. The modified code is immediately below, followed by a discussion. But one discussion note here: recall multiplier from above. That's gone, and in its place is expon, so that the denominator is now 2^expon, where the default is 53, matching the OP's default (and, I believe, better matching how she was thinking about it). Okay, and here's the code...
/* https://stackoverflow.com/questions/44346371/
   i-do-not-want-correct-rounding-for-function-exp/44397261 */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>

#define BASE 2 /*denominator=2^EXPON, 2^53=2x16^13 default */
#define EXPON 53
#define taylorm1(a) (a*(1.+a*(1.+a/3.)/2.)) /*expm1() approx for small args*/

int main (int argc, char *argv[]) {
    int N = (argc>1? atoi(argv[1]) : 1e6),
        expon = (argc>2? atoi(argv[2]) : EXPON),
        isexp = (argc>3? atoi(argv[3]) : 1),   /* flags to turn on/off exp() */
        ncparts = (argc>4? atoi(argv[4]) : 1),  /* #partial sums for c */
        binsize = (argc>5? atoi(argv[5]) : 10); /* #doubles to sum in each bin */
    int isexpm1 = 1; /* and expm1() for timing tests */
    int i, n=0;
    double denom = pow((double)BASE,(double)expon);
    double a, c=0.0, cm1=0.0, tm1=0.0;
    double csums[10] = {0.0};   /* c partial-sum totals (zeroed: only ever accumulated into) */
    double cbins[10][65537];    /* c partial sums and hierarchy */
    int nbins[10], ibin=0;      /* start at lowest level */
    clock_t start = clock();

    n=0; c=cm1=tm1=0.0;
    if ( ncparts > 65536 ) ncparts=65536; /* array size check */
    if ( ncparts > 1 ) for(i=0;i<ncparts;i++) cbins[0][i]=0.0; /*init bin#0*/
    /* --- to smooth random fluctuations, do the same type of computation
           a large number of (N) times with different values --- */
    for (i=0; i<N; i++) {
        n++;
        a = (double)(1 + 2*(rand()%0x400)) / denom; /* "a" has only a few
                              significant digits, and its last non-zero
                              digit is at (fixed-point) position 53. */
        if ( isexp ) { /* turn this off to time expm1() alone */
            double expa = exp(a); /* exp(a) */
            c += expa;            /* just accumulate in a single "bin" */
            if ( ncparts > 1 ) cbins[0][n%ncparts] += expa; } /* accum in ncparts */
        if ( isexpm1 ) {     /* you can turn this off to time exp() alone, */
            cm1 += expm1(a); /* but the difference is negligible */
            tm1 += taylorm1(a); }
    } /* --- end-of-for(i) --- */

    int nticks = (int)(clock()-start);
    if ( ncparts > 1 ) { /* need to sum the partial-sum bins */
        nbins[ibin=0] = ncparts; /* lowest level has everything */
        while ( nbins[ibin] > binsize ) { /* need another hierarchy level */
            if ( ibin >= 9 ) break; /* no more bins */
            ibin++; /* next available hierarchy bin level */
            nbins[ibin] = (nbins[ibin-1]+(binsize-1))/binsize; /*#bins this level*/
            for(i=0;i<nbins[ibin];i++) cbins[ibin][i]=0.0; /* init bins */
            for(i=0;i<nbins[ibin-1];i++) {
                cbins[ibin][(i+1)%nbins[ibin]] += cbins[ibin-1][i]; /*accum in nbins*/
                csums[ibin-1] += cbins[ibin-1][i]; } /* accumulate in "one bin" */
        } /* --- end-of-while(nprevbins>binsize) --- */
        for(i=0;i<nbins[ibin];i++) csums[ibin] += cbins[ibin][i]; /*highest level*/
    } /* --- end-of-if(ncparts>1) --- */

    printf ("N=%d, denom=%d^%d, Clock time: %d (%.2f secs)\n", n, BASE,expon,
            nticks, ((double)nticks)/((double)CLOCKS_PER_SEC));
    printf ("\t c=%.20e,\n\t c-n=%e, cm1=%e, tm1=%e\n",
            c,c-(double)n,cm1,tm1);
    if ( ncparts > 1 ) { printf("\t binsize=%d...\n",binsize);
        for (i=0;i<=ibin;i++) /* display hierarchy */
            printf("\t level#%d: #bins=%5d, c-n=%e\n",
                   i,nbins[i],csums[i]-(double)n); }
    return 0;
} /* --- end-of-function main() --- */
Okay, and now you can notice two additional command-line args following the old timeexp. The first is ncparts, the initial number of bins into which the entire #trials will be distributed; so at the lowest level of the hierarchy, each bin should (modulo bugs:) have the sum of #trials/ncparts doubles. The argument after that is binsize, which is the number of doubles summed in each bin at every successive level, until the last level has fewer (or equal) bins than binsize. So here's an example dividing 1000000 trials into 50000 bins, meaning 20 doubles/bin at the lowest level and 5 doubles/bin thereafter...
bash-4.3$ ./rounding 1000000 53 1 50000 5
N=1000000, denom=2^53, Clock time: 11129803 (11.13 secs)
c=1.00000000000000465661e+06,
c-n=4.656613e-09, cm1=1.136017e-07, tm1=1.136017e-07
binsize=5...
level#0: #bins=50000, c-n=4.656613e-09
level#1: #bins=10002, c-n=1.734588e-08
level#2: #bins= 2002, c-n=7.974450e-08
level#3: #bins= 402, c-n=1.059379e-07
level#4: #bins= 82, c-n=1.133885e-07
level#5: #bins= 18, c-n=1.136214e-07
level#6: #bins= 5, c-n=1.138542e-07
Note how the c-n for exp() converges pretty nicely towards the expm1() value. But note how it's best at level#5 and isn't converging uniformly at all. And note that if you break the #trials into only 5000 initial bins, you get just as good a result,
bash-4.3$ ./rounding 1000000 53 1 5000 5
N=1000000, denom=2^53, Clock time: 11165924 (11.17 secs)
c=1.00000000000003527384e+06,
c-n=3.527384e-08, cm1=1.136017e-07, tm1=1.136017e-07
binsize=5...
level#0: #bins= 5000, c-n=3.527384e-08
level#1: #bins= 1002, c-n=1.164153e-07
level#2: #bins= 202, c-n=1.158332e-07
level#3: #bins= 42, c-n=1.136214e-07
level#4: #bins= 10, c-n=1.137378e-07
level#5: #bins= 4, c-n=1.136214e-07
In fact, playing with ncparts and binsize doesn't seem to show much sensitivity, and it's not always "more is better" (i.e., smaller binsize) either. So I'm not sure exactly what's going on. Could be a bug (or two), or could be yet another question for @EOF...???
>>EDIT -- example showing pair-addition "binary tree" hierarchy<<
The example below was added as per @EOF's comment.
(Note: re-copy the preceding code. I had to edit the nbins[ibin] calculation for each next level to nbins[ibin]=(nbins[ibin-1]+(binsize-1))/binsize; from nbins[ibin]=(nbins[ibin-1]+2*binsize)/binsize;, which was "too conservative" to create the ...16,8,4,2 sequence.)
bash-4.3$ ./rounding 1024 53 1 512 2
N=1024, denom=2^53, Clock time: 36750 (0.04 secs)
c=1.02400000000011573320e+03,
c-n=1.157332e-10, cm1=1.164226e-10, tm1=1.164226e-10
binsize=2...
level#0: #bins= 512, c-n=1.159606e-10
level#1: #bins= 256, c-n=1.166427e-10
level#2: #bins= 128, c-n=1.166427e-10
level#3: #bins= 64, c-n=1.161879e-10
level#4: #bins= 32, c-n=1.166427e-10
level#5: #bins= 16, c-n=1.166427e-10
level#6: #bins= 8, c-n=1.166427e-10
level#7: #bins= 4, c-n=1.166427e-10
level#8: #bins= 2, c-n=1.164153e-10
>>EDIT -- to show @EOF's elegant solution in his comment below<<
"Pair addition" can be elegantly accomplished recursively, as per @EOF's comment below, which I'm reproducing here. (Note the case 0/1 at the end of the recursion to handle n even/odd.)
/* Quoting from EOF's comment...
   What I (EOF) proposed is effectively a binary tree of additions:
   a+b+c+d+e+f+g+h as ((a+b)+(c+d))+((e+f)+(g+h)).
   Like this: Add adjacent pairs of elements; this produces
   a new sequence of n/2 elements.
   Recurse until only one element is left.
   (Note that this will require n/2 elements of storage,
   rather than a fixed number of bins like your implementation) */
double trecu(double *vals, double sum, int n) {
    int midn = n/2;
    switch (n) {
        case 0: break;
        case 1: sum += *vals; break;
        default: sum = trecu(vals+midn, trecu(vals,sum,midn), n-midn); break; }
    return(sum);
}
This is an "answer"/followup to @EOF's preceding comments re his trecu() algorithm and code for his "binary tree summation" suggestion. "Prerequisites" before reading this are reading that discussion. It would be nice to collect all that in one organized place, but I haven't done that yet...
...What I did do was build @EOF's trecu() into the test program from the preceding answer that I'd written by modifying the OP's original test program. But then I found that trecu() generated exactly (and I mean exactly) the same answer as the "plain sum" c using exp(), not the sum cm1 using expm1() that we'd expected from a more accurate binary-tree summation.
But that test program's a bit (maybe two bits:) "convoluted" (or, as @EOF said, "unreadable"), so I wrote a separate smaller test program, given below (with example runs and discussion below that), to separately test/exercise trecu(). Moreover, I also wrote the function bintreesum() into the code below, which abstracts/encapsulates the iterative code for binary-tree summation that I'd embedded into the preceding test program. In that preceding case, my iterative code indeed came close to the cm1 answer, which is why I'd expected @EOF's recursive trecu() to do the same. The long and short of it is that, below, the same thing happens -- bintreesum() remains close to the correct answer, while trecu() gets further away, exactly reproducing the "plain sum".
What we're summing below is just sum(i), i=1...n, which is the well-known n(n+1)/2. But that's not quite right -- to reproduce the OP's problem, the summand is not i alone but rather 1 + i*10^e, where (a negative) e can be given on the command line. So for, say, n=5, you don't get 15 but rather 5.000...00015, or for n=6 you get 6.000...00021, etc. And to avoid a long, long format, I printf() sum-n to remove that integer part. Okay??? So here's the code...
/* Quoting from EOF's comment...
   What I (EOF) proposed is effectively a binary tree of additions:
   a+b+c+d+e+f+g+h as ((a+b)+(c+d))+((e+f)+(g+h)).
   Like this: Add adjacent pairs of elements; this produces
   a new sequence of n/2 elements.
   Recurse until only one element is left. */
#include <stdio.h>
#include <stdlib.h>

double trecu(double *vals, double sum, int n) {
    int midn = n/2;
    switch (n) {
        case 0: break;
        case 1: sum += *vals; break;
        default: sum = trecu(vals+midn, trecu(vals,sum,midn), n-midn); break; }
    return(sum);
} /* --- end-of-function trecu() --- */

double bintreesum(double *vals, int n, int binsize) {
    double binsum = 0.0;
    int nbin0 = (n+(binsize-1))/binsize,
        nbin1 = (nbin0+(binsize-1))/binsize,
        nbins[2] = { nbin0, nbin1 };
    double *vbins[2] = {
        (double *)malloc(nbin0*sizeof(double)),
        (double *)malloc(nbin1*sizeof(double)) },
        *vbin0=vbins[0], *vbin1=vbins[1];
    int ibin=0, i;

    for ( i=0; i<nbin0; i++ ) vbin0[i] = 0.0;
    for ( i=0; i<n; i++ ) vbin0[i%nbin0] += vals[i];
    while ( nbins[ibin] > 1 ) {
        int jbin = 1-ibin; /* other bin, 0<-->1 */
        nbins[jbin] = (nbins[ibin]+(binsize-1))/binsize;
        for ( i=0; i<nbins[jbin]; i++ ) vbins[jbin][i] = 0.0;
        for ( i=0; i<nbins[ibin]; i++ )
            vbins[jbin][i%nbins[jbin]] += vbins[ibin][i];
        ibin = jbin; /* swap bins for next pass */
    } /* --- end-of-while(nbins[ibin]>1) --- */
    binsum = vbins[ibin][0];
    free((void *)vbins[0]); free((void *)vbins[1]);
    return ( binsum );
} /* --- end-of-function bintreesum() --- */

#if defined(TESTTRECU)
#include <math.h>
#define MAXN (2000000)

int main(int argc, char *argv[]) {
    int N = (argc>1? atoi(argv[1]) : 1000000 ),
        e = (argc>2? atoi(argv[2]) : -10 ),
        binsize = (argc>3? atoi(argv[3]) : 2 );
    double tens = pow(10.0,(double)e);
    double *vals = (double *)malloc(sizeof(double)*MAXN),
           sum = 0.0;
    int i;

    if ( N > MAXN ) N=MAXN;
    for ( i=0; i<N; i++ ) vals[i] = 1.0 + tens*(double)(i+1);
    for ( i=0; i<N; i++ ) sum += vals[i];
    printf(" N=%d, Sum_i=1^N {1.0 + i*%.1e} - N = %.8e,\n"
           "\t plain_sum-N = %.8e,\n"
           "\t trecu-N = %.8e,\n"
           "\t bintreesum-N = %.8e \n",
           N, tens, tens*((double)N)*((double)(N+1))/2.0,
           sum-(double)N,
           trecu(vals,0.0,N)-(double)N,
           bintreesum(vals,N,binsize)-(double)N );
} /* --- end-of-function main() --- */
#endif
So if you save that as trecu.c, then compile it as cc -DTESTTRECU trecu.c -lm -o trecu and run it with zero to three optional command-line args as trecu #trials e binsize. Defaults are #trials=1000000 (like the OP's program), e=-10, and binsize=2 (so my bintreesum() function does a binary-tree sum rather than using larger-size bins).
And here are some test results illustrating the problem described above,
bash-4.3$ ./trecu
N=1000000, Sum_i=1^N {1.0 + i*1.0e-10} - N = 5.00000500e+01,
plain_sum-N = 5.00000500e+01,
trecu-N = 5.00000500e+01,
bintreesum-N = 5.00000500e+01
bash-4.3$ ./trecu 1000000 -15
N=1000000, Sum_i=1^N {1.0 + i*1.0e-15} - N = 5.00000500e-04,
plain_sum-N = 5.01087168e-04,
trecu-N = 5.01087168e-04,
bintreesum-N = 5.00000548e-04
bash-4.3$
bash-4.3$ ./trecu 1000000 -16
N=1000000, Sum_i=1^N {1.0 + i*1.0e-16} - N = 5.00000500e-05,
plain_sum-N = 6.67552231e-05,
trecu-N = 6.67552231e-05,
bintreesum-N = 5.00001479e-05
bash-4.3$
bash-4.3$ ./trecu 1000000 -17
N=1000000, Sum_i=1^N {1.0 + i*1.0e-17} - N = 5.00000500e-06,
plain_sum-N = 0.00000000e+00,
trecu-N = 0.00000000e+00,
bintreesum-N = 4.99992166e-06
So you can see that for the default run, e=-10, everybody's doing everything right. That is, the top line that says "Sum" just does the n(n+1)/2 thing, so presumably displays the right answer. And everybody below that agrees for the default e=-10 test case. But for the e=-15 and e=-16 cases below that, trecu() exactly agrees with the plain_sum, while bintreesum() stays pretty close to the right answer. And finally, for e=-17, plain_sum and trecu() have "disappeared", while bintreesum() is still hanging in there pretty well.
So trecu() is correctly doing the sum all right, but its recursion is apparently not doing that "binary tree" type of thing that my more straightforward iterative bintreesum() does correctly. And that indeed demonstrates that @EOF's suggestion of "binary tree summation" realizes quite an improvement over the plain_sum for these 1+epsilon kinds of cases. So we'd really like to see his trecu() recursion work!!! When I originally looked at it, I thought it did work. But that double recursion (is there a special name for that?) in his default: case is apparently more confusing (at least to me:) than I thought. Like I said, it is doing the sum, but not the "binary tree" thing.
Okay, so who'd like to take on the challenge and explain what's going on in that trecu() recursion? And, maybe more importantly, fix it so it does what's intended. Thanks.
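For what it's worth, one plausible explanation (not confirmed anywhere in the thread): the accumulator parameter is the culprit. Because the inner call's result is passed back in as sum to the outer call, every element is still folded into one running total, left to right, which is exactly the plain sum with extra steps. Dropping the accumulator, so that each half is reduced to a single double before the two halves meet, restores the pairwise tree:

/* pairwise ("binary tree") summation with no accumulator threaded through:
   each half is summed independently, then the two half-sums are added */
double trecu2(double *vals, int n)
{
    if (n == 0) return 0.0;
    if (n == 1) return vals[0];
    int midn = n / 2;
    return trecu2(vals, midn) + trecu2(vals + midn, n - midn);
}

With that change, the e=-15 and e=-16 runs should track bintreesum() rather than plain_sum.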
