Related
I have seen various answer here that depicts Strange behavior of pow function in C.
But I Have something different to ask here.
In the below code I have initialized int x = pow(10,2) and int y = pow(10,n) (int n = 2).
In first case it when I print the result it shows 100 and in the other case it comes out to be 99.
I know that pow returns double and it gets truncated on storing in int, but I want to ask why the output comes to be different.
CODE1
#include<stdio.h>
#include<math.h>
int main()
{
int n = 2;
int x;
int y;
x = pow(10,2); //Printing Gives Output 100
y = pow(10,n); //Printing Gives Output 99
printf("%d %d" , x , y);
}
Output : 100 99
Why is the output coming out to be different. ?
My gcc version is 4.9.2
Update :
Code 2
int main()
{
int n = 2;
int x;
int y;
x = pow(10,2); //Printing Gives Output 100
y = pow(10,n); //Printing Gives Output 99
double k = pow(10,2);
double l = pow(10,n);
printf("%d %d\n" , x , y);
printf("%f %f\n" , k , l);
}
Output : 100 99
100.000000 100.000000
Update 2 Assembly Instructions FOR CODE1
Generated Assembly Instructions GCC 4.9.2 using gcc -S -masm=intel :
.LC1:
.ascii "%d %d\0"
.text
.globl _main
.def _main; .scl 2; .type 32; .endef
_main:
push ebp
mov ebp, esp
and esp, -16
sub esp, 48
call ___main
mov DWORD PTR [esp+44], 2
mov DWORD PTR [esp+40], 100 //Concerned Line
fild DWORD PTR [esp+44]
fstp QWORD PTR [esp+8]
fld QWORD PTR LC0
fstp QWORD PTR [esp]
call _pow //Concerned Line
fnstcw WORD PTR [esp+30]
movzx eax, WORD PTR [esp+30]
mov ah, 12
mov WORD PTR [esp+28], ax
fldcw WORD PTR [esp+28]
fistp DWORD PTR [esp+36]
fldcw WORD PTR [esp+30]
mov eax, DWORD PTR [esp+36]
mov DWORD PTR [esp+8], eax
mov eax, DWORD PTR [esp+40]
mov DWORD PTR [esp+4], eax
mov DWORD PTR [esp], OFFSET FLAT:LC1
call _printf
leave
ret
.section .rdata,"dr"
.align 8
LC0:
.long 0
.long 1076101120
.ident "GCC: (tdm-1) 4.9.2"
.def _pow; .scl 2; .type 32; .endef
.def _printf; .scl 2; .type 32; .endef
I know that pow returns double and it gets truncated on storing in int, but I want to ask why the output comes to be different.
You must first, if you haven't already, divest yourself of the idea that floating-point numbers are in any way sensible or predictable. double only approximates real numbers and almost anything you do with a double is likely to be an approximation to the actual result.
That said, as you have realized, pow(10, n) resulted in a value like 99.99999999999997, which is an approximation accurate to 15 significant figures. And then you told it to truncate to the largest integer less than that, so it threw away most of those.
(Aside: there is rarely a good reason to convert a double to an int. Usually you should either format it for display with something like sprintf("%.0f", x), which does rounding correctly, or use the floor function, which can handle floating-point numbers that may be out of the range of an int. If neither of those suit your purpose, like in currency or date calculations, possibly you should not be using floating point numbers at all.)
There are two weird things going on here. First, why is pow(10, n) inaccurate? 10, 2, and 100 are all precisely representable as double. The best answer I can offer is that the C standard library you are using has a bug. (The compiler and the standard library, which I assume are gcc and glibc, are developed on different release schedules and by different teams. If pow is returning inaccurate results, that is probably a bug in glibc, not gcc.)
In the comments on your question, amdn found a glibc bug to do with FP rounding that might be related and another Q&A that goes into more detail about why this happens and how it's not a violation of the C standard. chux's answer also addresses this. (C doesn't require implementation of IEEE 754, but even if it did, pow isn't required to use correct rounding.) I will still call this a glibc bug, because it's an undesirable property.
(It's also conceivable, though unlikely, that your processor's FPU is wrong.)
Second, why is pow(10, n) different from pow(10, 2)? This one is far easier. gcc optimizes away function calls for which the result can be calculated at compile time, so pow(10, 2) is almost certainly being optimized to 100.0. If you look at the generated assembly code, you will find only one call to pow.
The GCC manual, section 6.59 describes which standard library functions may be treated in this way (follow the link for the full list):
The remaining functions are provided for optimization purposes.
With the exception of built-ins that have library equivalents such as the standard C library functions discussed below, or that expand to library calls, GCC built-in functions are always expanded inline and thus do not have corresponding entry points and their address cannot be obtained. Attempting to use them in an expression other than a function call results in a compile-time error.
[...]
The ISO C90 functions abort, abs, acos, asin, atan2, atan, calloc, ceil, cosh, cos, exit, exp, fabs, floor, fmod, fprintf, fputs, frexp, fscanf, isalnum, isalpha, iscntrl, isdigit, isgraph, islower, isprint, ispunct, isspace, isupper, isxdigit, tolower, toupper, labs, ldexp, log10, log, malloc, memchr, memcmp, memcpy, memset, modf, pow, printf, putchar, puts, scanf, sinh, sin, snprintf, sprintf, sqrt, sscanf, strcat, strchr, strcmp, strcpy, strcspn, strlen, strncat, strncmp, strncpy, strpbrk, strrchr, strspn, strstr, tanh, tan, vfprintf, vprintf and vsprintf are all recognized as built-in functions unless -fno-builtin is specified (or -fno-builtin-function is specified for an individual function).
So it would seem you can disable this behavior with -fno-builtin-pow.
Why is the output coming out to be different. ? (in the updated appended code)
We do not know the values are that different.
When comparing the textual out of int/double, be sure to print the double with sufficient precision to see if it is 100.000000 or just near 100.000000 or in hex to remove all doubt.
printf("%d %d\n" , x , y);
// printf("%f %f\n" , k , l);
// Is it the FP number just less than 100?
printf("%.17e %.17e\n" , k , l); // maybe 9.99999999999999858e+01
printf("%a %a\n" , k , l); // maybe 0x1.8ffffffffffff0000p+6
Why is the output coming out to be different. ? (in the original code)
C does not specify the accuracy of most <math.h> functions. The following are all compliant results.
// Higher quality functions return 100.0
pow(10,2) --> 100.0
// Lower quality and/or faster one may return nearby results
pow(10,2) --> 100.0000000000000142...
pow(10,2) --> 99.9999999999999857...
Assigning a floating point (FP) number to an int simple drops the fraction regardless of how close the fraction is to 1.0
When converting FP to an integer, better to control the conversion and round to cope with minor computational differences.
// long int lround(double x);
long i = lround(pow(10.0,2.0));
You're not the first to find this. Here's a discussion form 2013:
pow() cast to integer, unexpected result
I'm speculating that the assembly code produced by the tcc guys is causing the second value to be rounded down after calculating a result that is REALLY close to 100.
Like mikijov said in that historic post, looks like the bug has been fixed.
As others have mentioned, Code 2 returns 99 due to floating point truncation. The reason why Code 1 returns a different and correct answer is because of a libc optimization.
When the power is a small positive integer, it is more efficient to perform the operation as repeated multiplication. The simpler path removes roundoff. Since this is inlined you don't see function calls being made.
You've fooled it into thinking that the inputs are real and so it gives an approximate answer, which happens to be slightly under 100, e.g. 99.999999 that is then truncated to 99.
It is nearly impossible(*) to provide strict IEEE 754 semantics at reasonable cost when the only floating-point instructions one is allowed to used are the 387 ones. It is particularly hard when one wishes to keep the FPU working on the full 64-bit significand so that the long double type is available for extended precision. The usual “solution” is to do intermediate computations at the only available precision, and to convert to the lower precision at more or less well-defined occasions.
Recent versions of GCC handle excess precision in intermediate computations according to the interpretation laid out by Joseph S. Myers in a 2008 post to the GCC mailing list. This description makes a program compiled with gcc -std=c99 -mno-sse2 -mfpmath=387 completely predictable, to the last bit, as far as I understand. And if by chance it doesn't, it is a bug and it will be fixed: Joseph S. Myers' stated intention in his post is to make it predictable.
Is it documented how Clang handles excess precision (say when the option -mno-sse2 is used), and where?
(*) EDIT: this is an exaggeration. It is slightly annoying but not that difficult to emulate binary64 when one is allowed to configure the x87 FPU to use a 53-bit significand.
Following a comment by R.. below, here is the log of a short interaction of mine with the most recent version of Clang I have :
Hexa:~ $ clang -v
Apple clang version 4.1 (tags/Apple/clang-421.11.66) (based on LLVM 3.1svn)
Target: x86_64-apple-darwin12.4.0
Thread model: posix
Hexa:~ $ cat fem.c
#include <stdio.h>
#include <math.h>
#include <float.h>
#include <fenv.h>
double x;
double y = 2.0;
double z = 1.0;
int main(){
x = y + z;
printf("%d\n", (int) FLT_EVAL_METHOD);
}
Hexa:~ $ clang -std=c99 -mno-sse2 fem.c
Hexa:~ $ ./a.out
0
Hexa:~ $ clang -std=c99 -mno-sse2 -S fem.c
Hexa:~ $ cat fem.s
…
movl $0, %esi
fldl _y(%rip)
fldl _z(%rip)
faddp %st(1)
movq _x#GOTPCREL(%rip), %rax
fstpl (%rax)
…
This does not answer the originally posed question, but if you are a programmer working with similar issues, this answer might help you.
I really don't see where the perceived difficulty is. Providing strict IEEE-754 binary64 semantics while being limited to 80387 floating-point math, and retaining 80-bit long double computation, seems to follow well-specified C99 casting rules with both GCC-4.6.3 and clang-3.0 (based on LLVM 3.0).
Edited to add: Yet, Pascal Cuoq is correct: neither gcc-4.6.3 or clang-llvm-3.0 actually enforce those rules correctly for '387 floating-point math. Given the proper compiler options, the rules are correctly applied to expressions evaluated at compile time, but not for run-time expressions. There are workarounds, listed after the break below.
I do molecular dynamics simulation code, and am very familiar with the repeatability/predictability requirements and also with the desire to retain maximum precision available when possible, so I do claim I know what I am talking about here. This answer should show that the tools exist and are simple to use; the problems arise from not being aware of or not using those tools.
(A preferred example I like, is the Kahan summation algorithm. With C99 and proper casting (adding casts to e.g. Wikipedia example code), no tricks or extra temporary variables are needed at all. The implementation works regardless of compiler optimization level, including at -O3 and -Ofast.)
C99 explicitly states (in e.g. 5.4.2.2) that casting and assignment both remove all extra range and precision. This means that you can use long double arithmetic by defining your temporary variables used during computation as long double, also casting your input variables to that type; whenever a IEEE-754 binary64 is needed, just cast to a double.
On '387, the cast generates an assignment and a load on both the above compilers; this does correctly round the 80-bit value to IEEE-754 binary64. This cost is very reasonable in my opinion. The exact time taken depends on the architecture and surrounding code; usually it is and can be interleaved with other code to bring the cost down to neglible levels. When MMX, SSE or AVX are available, their registers are separate from the 80-bit 80387 registers, and the cast usually is done by moving the value to the MMX/SSE/AVX register.
(I prefer production code to use a specific floating-point type, say tempdouble or such, for temporary variables, so that it can be defined to either double or long double depending on architecture and speed/precision tradeoffs desired.)
In a nutshell:
Don't assume (expression) is of double precision just because all the variables and literal constants are. Write it as (double)(expression) if you want the result at double precision.
This applies to compound expressions, too, and may sometimes lead to unwieldy expressions with many levels of casts.
If you have expr1 and expr2 that you wish to compute at 80-bit precision, but also need the product of each rounded to 64-bit first, use
long double expr1;
long double expr2;
double product = (double)(expr1) * (double)(expr2);
Note, product is computed as a product of two 64-bit values; not computed at 80-bit precision, then rounded down. Calculating the product at 80-bit precision, then rounding down, would be
double other = expr1 * expr2;
or, adding descriptive casts that tell you exactly what is happening,
double other = (double)((long double)(expr1) * (long double)(expr2));
It should be obvious that product and other often differ.
The C99 casting rules are just another tool you must learn to wield, if you do work with mixed 32-bit/64-bit/80-bit/128-bit floating point values. Really, you encounter the exact same issues if you mix binary32 and binary64 floats (float and double on most architectures)!
Perhaps rewriting Pascal Cuoq's exploration code, to correctly apply casting rules, makes this clearer?
#include <stdio.h>
#define TEST(eq) printf("%-56s%s\n", "" # eq ":", (eq) ? "true" : "false")
int main(void)
{
double d = 1.0 / 10.0;
long double ld = 1.0L / 10.0L;
printf("sizeof (double) = %d\n", (int)sizeof (double));
printf("sizeof (long double) == %d\n", (int)sizeof (long double));
printf("\nExpect true:\n");
TEST(d == (double)(0.1));
TEST(ld == (long double)(0.1L));
TEST(d == (double)(1.0 / 10.0));
TEST(ld == (long double)(1.0L / 10.0L));
TEST(d == (double)(ld));
TEST((double)(1.0L/10.0L) == (double)(0.1));
TEST((long double)(1.0L/10.0L) == (long double)(0.1L));
printf("\nExpect false:\n");
TEST(d == ld);
TEST((long double)(d) == ld);
TEST(d == 0.1L);
TEST(ld == 0.1);
TEST(d == (long double)(1.0L / 10.0L));
TEST(ld == (double)(1.0L / 10.0));
return 0;
}
The output, with both GCC and clang, is
sizeof (double) = 8
sizeof (long double) == 12
Expect true:
d == (double)(0.1): true
ld == (long double)(0.1L): true
d == (double)(1.0 / 10.0): true
ld == (long double)(1.0L / 10.0L): true
d == (double)(ld): true
(double)(1.0L/10.0L) == (double)(0.1): true
(long double)(1.0L/10.0L) == (long double)(0.1L): true
Expect false:
d == ld: false
(long double)(d) == ld: false
d == 0.1L: false
ld == 0.1: false
d == (long double)(1.0L / 10.0L): false
ld == (double)(1.0L / 10.0): false
except that recent versions of GCC promote the right hand side of ld == 0.1 to long double first (i.e. to ld == 0.1L), yielding true, and that with SSE/AVX, long double is 128-bit.
For the pure '387 tests, I used
gcc -W -Wall -m32 -mfpmath=387 -mno-sse ... test.c -o test
clang -W -Wall -m32 -mfpmath=387 -mno-sse ... test.c -o test
with various optimization flag combinations as ..., including -fomit-frame-pointer, -O0, -O1, -O2, -O3, and -Os.
Using any other flags or C99 compilers should lead to the same results, except for long double size (and ld == 1.0 for current GCC versions). If you encounter any differences, I'd be very grateful to hear about them; I may need to warn my users of such compilers (compiler versions). Note that Microsoft does not support C99, so they are completely uninteresting to me.
Pascal Cuoq does bring up an interesting problem in the comment chain below, which I didn't immediately recognize.
When evaluating an expression, both GCC and clang with -mfpmath=387 specify that all expressions are evaluated using 80-bit precision. This leads to for example
7491907632491941888 = 0x1.9fe2693112e14p+62 = 110011111111000100110100100110001000100101110000101000000000000
5698883734965350400 = 0x1.3c5a02407b71cp+62 = 100111100010110100000001001000000011110110111000111000000000000
7491907632491941888 * 5698883734965350400 = 42695510550671093541385598890357555200 = 100000000111101101101100110001101000010100100001011110111111111111110011000111000001011101010101100011000000000000000000000000
yielding incorrect results, because that string of ones in the middle of the binary result is just at the difference between 53- and 64-bit mantissas (64 and 80-bit floating point numbers, respectively). So, while the expected result is
42695510550671088819251326462451515392 = 0x1.00f6d98d0a42fp+125 = 100000000111101101101100110001101000010100100001011110000000000000000000000000000000000000000000000000000000000000000000000000
the result obtained with just -std=c99 -m32 -mno-sse -mfpmath=387 is
42695510550671098263984292201741942784 = 0x1.00f6d98d0a43p+125 = 100000000111101101101100110001101000010100100001100000000000000000000000000000000000000000000000000000000000000000000000000000
In theory, you should be able to tell gcc and clang to enforce the correct C99 rounding rules by using options
-std=c99 -m32 -mno-sse -mfpmath=387 -ffloat-store -fexcess-precision=standard
However, this only affects expressions the compiler optimizes, and does not seem to fix the 387 handling at all. If you use e.g. clang -O1 -std=c99 -m32 -mno-sse -mfpmath=387 -ffloat-store -fexcess-precision=standard test.c -o test && ./test with test.c being Pascal Cuoq's example program, you will get the correct result per IEEE-754 rules -- but only because the compiler optimizes away the expression, not using the 387 at all.
Simply put, instead of computing
(double)d1 * (double)d2
both gcc and clang actually tell the '387 to compute
(double)((long double)d1 * (long double)d2)
This is indeed I believe this is a compiler bug affecting both gcc-4.6.3 and clang-llvm-3.0, and an easily reproduced one. (Pascal Cuoq points out that FLT_EVAL_METHOD=2 means operations on double-precision arguments is always done at extended precision, but I cannot see any sane reason -- aside from having to rewrite parts of libm on '387 -- to do that in C99 and considering IEEE-754 rules are achievable by the hardware! After all, the correct operation is easily achievable by the compiler, by modifying the '387 control word to match the precision of the expression. And, given the compiler options that should force this behaviour -- -std=c99 -ffloat-store -fexcess-precision=standard -- make no sense if FLT_EVAL_METHOD=2 behaviour is actually desired, there is no backwards compatibility issues, either.) It is important to note that given the proper compiler flags, expressions evaluated at compile time do get evaluated correctly, and that only expressions evaluated at run time get incorrect results.
The simplest workaround, and the portable one, is to use fesetround(FE_TOWARDZERO) (from fenv.h) to round all results towards zero.
In some cases, rounding towards zero may help with predictability and pathological cases. In particular, for intervals like x = [0,1), rounding towards zero means the upper limit is never reached through rounding; important if you evaluate e.g. piecewise splines.
For the other rounding modes, you need to control the 387 hardware directly.
You can use either __FPU_SETCW() from #include <fpu_control.h>, or open-code it. For example, precision.c:
#include <stdlib.h>
#include <stdio.h>
#include <limits.h>
#define FP387_NEAREST 0x0000
#define FP387_ZERO 0x0C00
#define FP387_UP 0x0800
#define FP387_DOWN 0x0400
#define FP387_SINGLE 0x0000
#define FP387_DOUBLE 0x0200
#define FP387_EXTENDED 0x0300
static inline void fp387(const unsigned short control)
{
unsigned short cw = (control & 0x0F00) | 0x007f;
__asm__ volatile ("fldcw %0" : : "m" (*&cw));
}
const char *bits(const double value)
{
const unsigned char *const data = (const unsigned char *)&value;
static char buffer[CHAR_BIT * sizeof value + 1];
char *p = buffer;
size_t i = CHAR_BIT * sizeof value;
while (i-->0)
*(p++) = '0' + !!(data[i / CHAR_BIT] & (1U << (i % CHAR_BIT)));
*p = '\0';
return (const char *)buffer;
}
int main(int argc, char *argv[])
{
double d1, d2;
char dummy;
if (argc != 3) {
fprintf(stderr, "\nUsage: %s 7491907632491941888 5698883734965350400\n\n", argv[0]);
return EXIT_FAILURE;
}
if (sscanf(argv[1], " %lf %c", &d1, &dummy) != 1) {
fprintf(stderr, "%s: Not a number.\n", argv[1]);
return EXIT_FAILURE;
}
if (sscanf(argv[2], " %lf %c", &d2, &dummy) != 1) {
fprintf(stderr, "%s: Not a number.\n", argv[2]);
return EXIT_FAILURE;
}
printf("%s:\td1 = %.0f\n\t %s in binary\n", argv[1], d1, bits(d1));
printf("%s:\td2 = %.0f\n\t %s in binary\n", argv[2], d2, bits(d2));
printf("\nDefaults:\n");
printf("Product = %.0f\n\t %s in binary\n", d1 * d2, bits(d1 * d2));
printf("\nExtended precision, rounding to nearest integer:\n");
fp387(FP387_EXTENDED | FP387_NEAREST);
printf("Product = %.0f\n\t %s in binary\n", d1 * d2, bits(d1 * d2));
printf("\nDouble precision, rounding to nearest integer:\n");
fp387(FP387_DOUBLE | FP387_NEAREST);
printf("Product = %.0f\n\t %s in binary\n", d1 * d2, bits(d1 * d2));
printf("\nExtended precision, rounding to zero:\n");
fp387(FP387_EXTENDED | FP387_ZERO);
printf("Product = %.0f\n\t %s in binary\n", d1 * d2, bits(d1 * d2));
printf("\nDouble precision, rounding to zero:\n");
fp387(FP387_DOUBLE | FP387_ZERO);
printf("Product = %.0f\n\t %s in binary\n", d1 * d2, bits(d1 * d2));
return 0;
}
Using clang-llvm-3.0 to compile and run, I get the correct results,
clang -std=c99 -m32 -mno-sse -mfpmath=387 -O3 -W -Wall precision.c -o precision
./precision 7491907632491941888 5698883734965350400
7491907632491941888: d1 = 7491907632491941888
0100001111011001111111100010011010010011000100010010111000010100 in binary
5698883734965350400: d2 = 5698883734965350400
0100001111010011110001011010000000100100000001111011011100011100 in binary
Defaults:
Product = 42695510550671098263984292201741942784
0100011111000000000011110110110110011000110100001010010000110000 in binary
Extended precision, rounding to nearest integer:
Product = 42695510550671098263984292201741942784
0100011111000000000011110110110110011000110100001010010000110000 in binary
Double precision, rounding to nearest integer:
Product = 42695510550671088819251326462451515392
0100011111000000000011110110110110011000110100001010010000101111 in binary
Extended precision, rounding to zero:
Product = 42695510550671088819251326462451515392
0100011111000000000011110110110110011000110100001010010000101111 in binary
Double precision, rounding to zero:
Product = 42695510550671088819251326462451515392
0100011111000000000011110110110110011000110100001010010000101111 in binary
In other words, you can work around the compiler issues by using fp387() to set the precision and rounding mode.
The downside is that some math libraries (libm.a, libm.so) may be written with the assumption that intermediate results are always computed at 80-bit precision. At least the GNU C library fpu_control.h on x86_64 has the comment "libm requires extended precision". Fortunately, you can take the '387 implementations from e.g. GNU C library, and implement them in a header file or write a known-to-work libm, if you need the math.h functionality; in fact, I think I might be able to help there.
For the record, below is what I found by experimentation. The following program shows various behaviors when compiled with Clang:
#include <stdio.h>
int r1, r2, r3, r4, r5, r6, r7;
double ten = 10.0;
int main(int c, char **v)
{
r1 = 0.1 == (1.0 / ten);
r2 = 0.1 == (1.0 / 10.0);
r3 = 0.1 == (double) (1.0 / ten);
r4 = 0.1 == (double) (1.0 / 10.0);
ten = 10.0;
r5 = 0.1 == (1.0 / ten);
r6 = 0.1 == (double) (1.0 / ten);
r7 = ((double) 0.1) == (1.0 / 10.0);
printf("r1=%d r2=%d r3=%d r4=%d r5=%d r6=%d r7=%d\n", r1, r2, r3, r4, r5, r6, r7);
}
The results vary with the optimization level:
$ clang -v
Apple LLVM version 4.2 (clang-425.0.24) (based on LLVM 3.2svn)
$ clang -mno-sse2 -std=c99 t.c && ./a.out
r1=0 r2=1 r3=0 r4=1 r5=1 r6=0 r7=1
$ clang -mno-sse2 -std=c99 -O2 t.c && ./a.out
r1=0 r2=1 r3=0 r4=1 r5=1 r6=1 r7=1
The cast (double) that differentiates r5 and r6 at -O2 has no effect at -O0 and for variables r3 and r4. The result r1 is different from r5 at all optimization levels, whereas r6 only differs from r3 at -O2.
I'm just starting to learn assembly and I want to round a floating-point value using a specified rounding mode. I've tried to implement this using fstcw, fldcw, and frndint.
Right now I get a couple of errors:
~ $ gc a02p
gcc -Wall -g a02p.c -o a02p
a02p.c: In function `roundD':
a02p.c:33: error: parse error before '[' token
a02p.c:21: warning: unused variable `mode'
~ $
I'm not sure if I am even doing this right at all. I don't want to use any predefined functions. I want to use GCC inline assembly.
This is the code:
#include <stdio.h>
#include <stdlib.h>
#define PRECISION 3
#define RND_CTL_BIT_SHIFT 10
// floating point rounding modes: IA-32 Manual, Vol. 1, p. 4-20
typedef enum {
ROUND_NEAREST_EVEN = 0 << RND_CTL_BIT_SHIFT,
ROUND_MINUS_INF = 1 << RND_CTL_BIT_SHIFT,
ROUND_PLUS_INF = 2 << RND_CTL_BIT_SHIFT,
ROUND_TOWARD_ZERO = 3 << RND_CTL_BIT_SHIFT
} RoundingMode;
double roundD (double n, RoundingMode roundingMode)
{
short c;
short mode = (( c & 0xf3ff) | (roundingMode));
asm("fldcw %[nIn] \n"
"fstcw %%eax \n" // not sure why i would need to store the CW
"fldcw %[modeIn] \n"
"frndint \n"
"fistp %[nOut] \n"
: [nOut] "=m" (n)
: [nIn] "m" (n)
: [modeIn] "m" (mode)
);
return n;
}
int main (int argc, char **argv)
{
double n = 0.0;
if (argc > 1)
n = atof(argv[1]);
printf("roundD even %.*f = %.*f\n",
PRECISION, n, PRECISION, roundD(n, ROUND_NEAREST_EVEN));
printf("roundD down %.*f = %.*f\n",
PRECISION, n, PRECISION, roundD(n, ROUND_MINUS_INF));
printf("roundD up %.*f = %.*f\n",
PRECISION, n, PRECISION, roundD(n, ROUND_PLUS_INF));
printf("roundD zero %.*f = %.*f\n",
PRECISION, n, PRECISION, roundD(n, ROUND_TOWARD_ZERO));
return 0;
}
Am I even remotely close to getting this right?
A better process is to write a simple function that rounds a floating point value. Next, instruct your compiler to print an assembly listing for the function. You may want to put the function in a separate file.
This process will show you the calling and exiting conventions used by the compiler. By placing the function in a separate file, you won't have to build other files. Also, it will give you the opportunity to replace the C language function with an assembly language function.
Although inline assembly is supported, I prefer to replace an entire function in assembly language and not use inline assembly (inline assembly isn't portable, so the source will have to be changed when porting to a different platform).
GCC's inline assembler syntax is arcane to say the least, and I do not claim to be an expert, but when I have used it I used this howto guide. In all examples all template markers are of the form %n where n is a number, rather then the %[ttt] form that you have used.
I also note that the line numbers reported in your error messages do not seem to correspond with the code you posted. So I wonder if they are in fact for this exact code?
I am trying write a function named absD that returns the absolute value of its argument.
I do not want to use any predefined functions. Right now i am getting a parse error when i try to compile it.
I would image all i would have to do to get the absolute value of a double is change the sign bit? this is what i have
#include <stdio.h>
#include <stdlib.h>
#define PRECISION 3
double absD (double n)
{
asm(" fld %eax \n"
" movl $0x7FFFFFFFFFFFFFFF, %eax \n"
" pop %eax \n"
);
return n;
}
int main (int argc, char **argv)
{
double n = 0.0;
printf("Absolute value\n");
if (argc > 1)
n = atof(argv[1]);
printf("abs(%.*f) = %.*f\n", PRECISION, n, PRECISION, absD(n));
return 0;
}
I fixed the curly brace..
the error i am getting is
~ $ gc a02
gcc -Wall -g a02.c -o a02
/tmp/ccl2H7rf.s: Assembler messages:
/tmp/ccl2H7rf.s:228: Error: suffix or operands invalid for `fld'
/tmp/ccl2H7rf.s:229: Error: missing or invalid immediate expression `0x7FFFFFFFF
FFFFFFF'
~ $
Do you need to do it in assembly? Is this a homework requirement, or are you looking for very high performance?
This doesn't use any predefined functions:
double absD(double n)
{
if (n < 0.0)
n = -n;
return n;
}
I'm no expert, but it looks like you're using ( to open the assembly block and } to end it. You probably should use one or the other, not both inconsistently.
asm(" fld %eax \n"
" movl $0x7FFFFFFFFFFFFFFF, %eax \n"
" pop %eax \n"
};
notice the curley bracket before the semicolon.
Depending on how you want to treat -0.0, you can use C99 / POSIX (2004)'s signbit() function.
#include <math.h>
double absD (double x)
{
if ( signbit(x) ) {
#ifdef NAIVE
return 0.0 - x;
#else
return x &= 0x7FFFFFFFFFFFFFFF;
#endif
} else {
return x;
}
}
But frankly if you're using Standard C Library (libc) atof and printf, I don't see why not using fabs() is desirable. As you can also do normal bit-twiddling in C.
Of course if you're using assembly, why not usefchs op anyhow?
You have errors in your assembly code, which the assembler gives you perfectly reasonable error messages about.
you can't load a floating point value directly from %eax -- the operand needs to be an address to load from
you can't have constant literals that don't fit in 32 bits.
The sign of a floating bit number is just the high bit, so all you need to to is clear the most significant bit.
If you must do this in assembly, then it seems to me that you would be better of using integer rather than floating point instructions. You can't do bitwise operations on floating point registers.
Also, there isn't any need to load the entire 8 byte value into any register, you could just as easily operate only on the high byte (or int) if your processor doesn't have 8 byte integer registers.
/tmp/ccl2H7rf.s:228: Error: suffix or
operands invalid for `fld'
fld needs the operand to be in memory. Put it in memory, ie the stack, and supply the address.
/tmp/ccl2H7rf.s:229: Error: missing or
invalid immediate expression
`0x7FFFFFFFF FFFFFFF'
EAX does not hold more than 32 bits. If you meant for this to be a floating point number, put it on the floating point stack with a load instruction, ie fld.
Is it possible to compute pow(10,x) at compile time?
I've got a processor without floating point support and slow integer division. I'm trying to perform as many calculations as possible at compile time. I can dramatically speed up one particular function if I pass both x and C/pow(10,x) as arguments (x and C are always constant integers, but they are different constants for each call). I'm wondering if I can make these function calls less error prone by introducing a macro which does the 1/pow(10,x) automatically, instead of forcing the programmer to calculate it?
Is there a pre-processor trick? Can I force the compiler optimize out the library call?
There are very few values possible before you overflow int (or even long). For clarities sake, make it a table!
edit: If you are using floats (looks like you are), then no it's not going to be possible to call the pow() function at compile time without actually writing code that runs in the make process and outputs the values to a file (such as a header file) which is then compiled.
GCC will do this at a sufficiently high optimization level (-O1 does it for me). For example:
#include <math.h>
int test() {
double x = pow(10, 4);
return (int)x;
}
Compiles at -O1 -m32 to:
.file "test.c"
.text
.globl test
.type test, #function
test:
pushl %ebp
movl %esp, %ebp
movl $10000, %eax
popl %ebp
ret
.size test, .-test
.ident "GCC: (Ubuntu 4.3.3-5ubuntu4) 4.3.3"
.section .note.GNU-stack,"",#progbits
This works without the cast as well - of course, you do get a floating-point load instruction in there, as the Linux ABI passes floating point return values in FPU registers.
You can do it with Boost.Preprocessor:
http://www.boost.org/doc/libs/1_39_0/libs/preprocessor/doc/index.html
Code:
#include <boost/preprocessor/repeat.hpp>
#define _TIMES_10(z, n, data) * 10
#define POW_10(n) (1 BOOST_PP_REPEAT(n, _TIMES_10, _))
int test[4] = {POW_10(0), POW_10(1), POW_10(2), POW_10(3)};
Actually, by exploiting the C preprocessor, you can get it to compute C pow(10, x) for any real C and integral x. Observe that, as #quinmars noted, C allows you to use scientific syntax to express numerical constants:
#define myexp 1.602E-19 // == 1.602 * pow(10, -19)
to be used for constants. With this in mind, and a bit of cleverness, we can construct a preprocessor macro that takes C and x and combine them into an exponentiation token:
#define EXP2(a, b) a ## b
#define EXP(a, b) EXP2(a ## e,b)
#define CONSTPOW(C,x) EXP(C, x)
This can now be used as a constant numerical value:
const int myint = CONSTPOW(3, 4); // == 30000
const double myfloat = CONSTPOW(M_PI, -2); // == 0.03141592653
You can use the scientific notation for floating point values which is part of the C language. It looks like that:
e = 1.602E-19 // == 1.602 * pow(10, -19)
The number before the E ( the E maybe capital or small 1.602e-19) is the fraction part where as the (signed) digit sequence after the E is the exponent part. By default the number is of the type double, but you can attach a floating point suffix (f, F, l or L) if you need a float or a long double.
I would not recommend to pack this semantic into a macro:
It will not work for variables, floating point values, etc.
The scientific notation is more readable.
Actually, you have M4 which is a pre-processor way more powerful than the GCC’s. A main difference between those two is GCC’s is not recursive whereas M4 is. It makes possible things like doing arithmetic at compile-time (and much more!). The below code sample is what you would like to do, isn’t it? I made it bulky in a one-file source; but I usually put M4's macro definitions in separate files and tune my Makefile rules. This way, your code is kept from ugly intrusive M4 definitions into the C source code I've done here.
$ cat foo.c
define(M4_POW_AUX, `ifelse($2, 1, $1, `eval($1 * M4_POW_AUX($1, decr($2)))')')dnl
define(M4_POW, `ifelse($2, 0, 1, `M4_POW_AUX($1, $2)')')dnl
#include <stdio.h>
int main(void)
{
printf("2^0 = %d\n", M4_POW(2, 0));
printf("2^1 = %d\n", M4_POW(2, 1));
printf("2^4 = %d\n", M4_POW(2, 4));
return 0;
}
The command line to compile this code sample uses the ability of GCC and M4 to read from the standard input.
$ cat foo.c | m4 - | gcc -x c -o m4_pow -
$ ./m4_pow
2^0 = 1
2^1 = 2
2^4 = 16
Hope this help!
If you just need to use the value at compile time, use the scientific notation like 1e2 for pow(10, 2)
If you want to populate the values at compile time and then use them later at runtime then simply use a lookup table because there are only 23 different powers of 10 that are exactly representable in double precision
double POW10[] = {1., 1e1, 1e2, 1e3, 1e4, 1e5, 1e6, 1e7, 1e8, 1e9, 1e10,
1e11, 1e12, 1e13, 1e14, 1e15, 1e16, 1e17, 1e18, 1e19, 1e20, 1e21, 1e22};
You can get larger powers of 10 at runtime from the above lookup table to quickly get the result without needing to multiply by 10 again and again, but the result is just a value close to a power of 10 like when you use 10eX with X > 22
double pow10(int x)
{
if (x > 22)
return POW10[22] * pow10(x - 22);
else if (x >= 0)
return POW10[x];
else
return 1/pow10(-x);
}
If negative exponents is not needed then the final branch can be removed.
You can also reduce the lookup table size further if memory is a constraint. For example by storing only even powers of 10 and multiply by 10 when the exponent is odd, the table size is now only a half.
Recent versions of GCC ( around 4.3 ) added the ability to use GMP and MPFR to do some compile-time optimizations by evaluating more complex functions that are constant. That approach leaves your code simple and portable, and trust the compiler to do the heavy lifting.
Of course, there are limits to what it can do. Here's a link to the description in the changelog, which includes a list of functions that are supported by this. 'pow' is one them.
Unfortunately, you can't use the preprocessor to precalculate library calls. If x is integral you could write your own function, but if it's a floating-point type I don't see any good way to do this.
bdonlan's replay is spot on but keep in mind that you can perform nearly any optimization you chose on the compile box provided you are willing to parse and analyze the code in your own custom preprocessor. It is a trivial task in most version of unix to override the implicit rules that call the compiler to call a custom step of your own before it hits the compiler.