I'm just starting to learn assembly and I want to round a floating-point value using a specified rounding mode. I've tried to implement this using fstcw, fldcw, and frndint.
Right now I get a couple of errors:
~ $ gc a02p
gcc -Wall -g a02p.c -o a02p
a02p.c: In function `roundD':
a02p.c:33: error: parse error before '[' token
a02p.c:21: warning: unused variable `mode'
~ $
I'm not sure if I am even doing this right at all. I don't want to use any predefined functions. I want to use GCC inline assembly.
This is the code:
#include <stdio.h>
#include <stdlib.h>
#define PRECISION 3
#define RND_CTL_BIT_SHIFT 10
// floating point rounding modes: IA-32 Manual, Vol. 1, p. 4-20
typedef enum {
    ROUND_NEAREST_EVEN = 0 << RND_CTL_BIT_SHIFT,
    ROUND_MINUS_INF    = 1 << RND_CTL_BIT_SHIFT,
    ROUND_PLUS_INF     = 2 << RND_CTL_BIT_SHIFT,
    ROUND_TOWARD_ZERO  = 3 << RND_CTL_BIT_SHIFT
} RoundingMode;
double roundD (double n, RoundingMode roundingMode)
{
    short c;
    short mode = ((c & 0xf3ff) | (roundingMode));

    asm("fldcw %[nIn]    \n"
        "fstcw %%eax     \n" // not sure why I would need to store the CW
        "fldcw %[modeIn] \n"
        "frndint         \n"
        "fistp %[nOut]   \n"
        : [nOut] "=m" (n)
        : [nIn] "m" (n)
        : [modeIn] "m" (mode)
        );
    return n;
}
int main (int argc, char **argv)
{
    double n = 0.0;
    if (argc > 1)
        n = atof(argv[1]);
    printf("roundD even %.*f = %.*f\n",
           PRECISION, n, PRECISION, roundD(n, ROUND_NEAREST_EVEN));
    printf("roundD down %.*f = %.*f\n",
           PRECISION, n, PRECISION, roundD(n, ROUND_MINUS_INF));
    printf("roundD up %.*f = %.*f\n",
           PRECISION, n, PRECISION, roundD(n, ROUND_PLUS_INF));
    printf("roundD zero %.*f = %.*f\n",
           PRECISION, n, PRECISION, roundD(n, ROUND_TOWARD_ZERO));
    return 0;
}
Am I even remotely close to getting this right?
A better process is to write a simple function that rounds a floating point value. Next, instruct your compiler to print an assembly listing for the function. You may want to put the function in a separate file.
This process will show you the calling and exiting conventions used by the compiler. By placing the function in a separate file, you won't have to build other files. Also, it will give you the opportunity to replace the C language function with an assembly language function.
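For example (a hedged illustration; the file name is mine), GCC's -S switch stops after code generation and writes the listing to a corresponding .s file:

gcc -Wall -O1 -S round.c    # inspect the generated round.s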
Although inline assembly is supported, I prefer to replace an entire function in assembly language and not use inline assembly (inline assembly isn't portable, so the source will have to be changed when porting to a different platform).
GCC's inline assembler syntax is arcane to say the least, and I do not claim to be an expert, but when I have used it I relied on this HOWTO guide. In all its examples, the template markers are of the form %n, where n is a number, rather than the %[ttt] form that you have used.
I also note that the line numbers reported in your error messages do not seem to correspond with the code you posted. So I wonder if they are in fact for this exact code?
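For what it's worth, here is a hedged sketch of one corrected version (my own, untested): the third colon-separated section of an asm statement is the clobber list, so the mode operand must move into the input list; the current control word has to be read with fnstcw before it can be masked; and fstpl, not fistp, stores a double result:

double roundD (double n, RoundingMode roundingMode)
{
    short cw, newcw;

    asm volatile ("fnstcw %[cw]" : [cw] "=m" (cw)); // read the current control word
    newcw = (cw & 0xf3ff) | roundingMode;           // splice in the rounding-control bits

    asm volatile ("fldcw  %[newcw] \n"  // install the requested rounding mode
                  "fldl   %[n]     \n"  // push n onto the FPU stack
                  "frndint         \n"  // round st(0) in the current mode
                  "fstpl  %[n]     \n"  // pop the rounded double back into n
                  "fldcw  %[cw]    \n"  // restore the original control word
                  : [n] "+m" (n)
                  : [newcw] "m" (newcw), [cw] "m" (cw));
    return n;
}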
I'm porting some code from 32 bit to 64 bit, and ensuring the answers are the same. In doing so, I noticed that atan2f was giving different results between the two.
I created this min repro:
#include <stdio.h>
#include <math.h>
void testAtan2fIssue(float A, float B)
{
    float atan2fResult = atan2f(A, B);
    printf("atan2f: %.15f\n", atan2fResult);

    float atan2Result = atan2(A, B);
    printf("atan2: %.15f\n", atan2Result);
}
int main()
{
    float A = 16.323556900024414;
    float B = -5.843180656433105;
    testAtan2fIssue(A, B);
}
When built with:
gcc compilerTest.c -m32 -o 32bit.out -lm
it gives:
atan2f: 1.914544820785522
atan2: 1.914544820785522
When built with:
gcc compilerTest.c -o 64bit.out -lm
it gives:
atan2f: 1.914544701576233
atan2: 1.914544820785522
Note that atan2 gives the same result in both cases, but atan2f does not.
Things I have tried:
Building the 32 bit version with -ffloat-store
Building the 32 bit version with -msse2 -mfpmath=sse
Building the 64 bit version with -mfpmath=387
None changed the results for me.
(All of these were based on the hypothesis that it has something to do with the way floating point operations happen on 32 bit vs 64 bit architectures.)
Question:
What are my options for getting them to give the same result? (Is there a compiler flag I could use?) And also, what is happening here?
I'm running on an i7 machine, if that is helpful.
This is easier to see in hex notation.
void testAtan2fIssue(float A, float B) {
    double d = atan2(A, B);
    printf("        atan2 : %.13a %.15f\n", d, d);

    float f = atan2f(A, B);
    printf("        atan2f: %.13a %.15f\n", f, f);

    printf("(float) atan2 : %.13a %.15f\n", (float) d, (float) d);

    float f2 = nextafterf(f, 0);
    printf("problem value : %.13a %.15f\n", f2, f2);
}
// _ added for clarity
        atan2 : 0x1.ea1f9_b9d85de4p+0 1.914544_797857041
        atan2f: 0x1.ea1f9_c0000000p+0 1.914544_820785522
(float) atan2 : 0x1.ea1f9_c0000000p+0 1.914544_820785522
problem value : 0x1.ea1f9_a0000000p+0 1.914544_701576233
what is happening here?
The conversion from double to float can be expected to be optimal, yet arctangent functions may be a few ULP off on various platforms. The 1.914544701576233 is the next smaller float value and reflects the slightly inferior arctangent calculation.
What are my options for getting them to give the same result?
Few. You could roll your own my_atan2() from an established code base, as @stark suggests, yet even that may have subtle implementation differences.
Instead, consider making code checking tolerant of the minute variations.
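As a hedged sketch of what such tolerant checking might look like (my illustration, not part of the original answer; compile with -lm), a comparison that accepts results within a few ULPs can be built on nextafterf() from <math.h>:

#include <math.h>
#include <stdbool.h>

// Returns true when a and b are at most max_ulps float values apart.
static bool nearly_equal_ulps(float a, float b, int max_ulps)
{
    for (int i = 0; i < max_ulps; i++) {
        if (a == b)
            return true;
        a = nextafterf(a, b);   // step a one ULP toward b
    }
    return a == b;
}

With max_ulps = 1, this accepts the two atan2f results above as equal, since the answer shows they are adjacent float values.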
It is nearly impossible(*) to provide strict IEEE 754 semantics at reasonable cost when the only floating-point instructions one is allowed to use are the 387 ones. It is particularly hard when one wishes to keep the FPU working on the full 64-bit significand, so that the long double type is available for extended precision. The usual “solution” is to do intermediate computations at the only available precision, and to convert to the lower precision at more or less well-defined occasions.
Recent versions of GCC handle excess precision in intermediate computations according to the interpretation laid out by Joseph S. Myers in a 2008 post to the GCC mailing list. This description makes a program compiled with gcc -std=c99 -mno-sse2 -mfpmath=387 completely predictable, to the last bit, as far as I understand. And if by chance it doesn't, it is a bug and it will be fixed: Joseph S. Myers' stated intention in his post is to make it predictable.
Is it documented how Clang handles excess precision (say when the option -mno-sse2 is used), and where?
(*) EDIT: this is an exaggeration. It is slightly annoying but not that difficult to emulate binary64 when one is allowed to configure the x87 FPU to use a 53-bit significand.
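As a hedged sketch of that configuration (my addition; assumes glibc's <fpu_control.h> on x86):

#include <fpu_control.h>

// Switch the x87 FPU to a 53-bit significand so every operation rounds
// like IEEE-754 binary64 (the extended exponent range still differs).
static void set_x87_double_precision(void)
{
    fpu_control_t cw;
    _FPU_GETCW(cw);                            // read the current control word
    cw = (cw & ~_FPU_EXTENDED) | _FPU_DOUBLE;  // select the 53-bit significand
    _FPU_SETCW(cw);
}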
Following a comment by R.. below, here is the log of a short interaction of mine with the most recent version of Clang I have:
Hexa:~ $ clang -v
Apple clang version 4.1 (tags/Apple/clang-421.11.66) (based on LLVM 3.1svn)
Target: x86_64-apple-darwin12.4.0
Thread model: posix
Hexa:~ $ cat fem.c
#include <stdio.h>
#include <math.h>
#include <float.h>
#include <fenv.h>
double x;
double y = 2.0;
double z = 1.0;
int main(){
    x = y + z;
    printf("%d\n", (int) FLT_EVAL_METHOD);
}
Hexa:~ $ clang -std=c99 -mno-sse2 fem.c
Hexa:~ $ ./a.out
0
Hexa:~ $ clang -std=c99 -mno-sse2 -S fem.c
Hexa:~ $ cat fem.s
…
        movl    $0, %esi
        fldl    _y(%rip)
        fldl    _z(%rip)
        faddp   %st(1)
        movq    _x@GOTPCREL(%rip), %rax
        fstpl   (%rax)
…
This does not answer the originally posed question, but if you are a programmer working with similar issues, this answer might help you.
I really don't see where the perceived difficulty is. Providing strict IEEE-754 binary64 semantics while being limited to 80387 floating-point math, and retaining 80-bit long double computation, seems achievable by following the well-specified C99 casting rules with both GCC-4.6.3 and clang-3.0 (based on LLVM 3.0).
Edited to add: Yet, Pascal Cuoq is correct: neither gcc-4.6.3 nor clang-llvm-3.0 actually enforces those rules correctly for '387 floating-point math. Given the proper compiler options, the rules are correctly applied to expressions evaluated at compile time, but not to run-time expressions. There are workarounds, listed after the break below.
I write molecular dynamics simulation code, and am very familiar with the repeatability/predictability requirements, and also with the desire to retain maximum precision where possible, so I do claim to know what I am talking about here. This answer should show that the tools exist and are simple to use; the problems arise from not being aware of or not using those tools.
(A favorite example of mine is the Kahan summation algorithm. With C99 and proper casting (adding casts to e.g. the Wikipedia example code), no tricks or extra temporary variables are needed at all. The implementation works regardless of compiler optimization level, including at -O3 and -Ofast.)
C99 explicitly states (in e.g. 5.2.4.2.2) that casting and assignment both remove all extra range and precision. This means that you can use long double arithmetic by defining your temporary variables used during computation as long double, also casting your input variables to that type; whenever an IEEE-754 binary64 is needed, just cast to a double.
On '387, the cast generates an assignment and a load on both of the above compilers; this does correctly round the 80-bit value to IEEE-754 binary64. This cost is very reasonable in my opinion. The exact time taken depends on the architecture and the surrounding code; usually the cast can be interleaved with other code to bring the cost down to negligible levels. When MMX, SSE or AVX are available, their registers are separate from the 80-bit 80387 registers, and the cast is usually done by moving the value to an MMX/SSE/AVX register.
(I prefer production code to use a specific floating-point type, say tempdouble or such, for temporary variables, so that it can be defined to either double or long double depending on architecture and speed/precision tradeoffs desired.)
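A hedged sketch of that convention (my illustration; the configuration macro is invented):

// Pick the temporary type once, per architecture and the desired
// speed/precision tradeoff (USE_EXTENDED_TEMPS is a hypothetical switch).
#if defined(USE_EXTENDED_TEMPS)
typedef long double tempdouble;
#else
typedef double tempdouble;
#endif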
In a nutshell:
Don't assume (expression) is of double precision just because all the variables and literal constants are. Write it as (double)(expression) if you want the result at double precision.
This applies to compound expressions, too, and may sometimes lead to unwieldy expressions with many levels of casts.
If you have expr1 and expr2 that you wish to compute at 80-bit precision, but also need the product of each rounded to 64-bit first, use
long double expr1;
long double expr2;
double product = (double)(expr1) * (double)(expr2);
Note, product is computed as a product of two 64-bit values; not computed at 80-bit precision, then rounded down. Calculating the product at 80-bit precision, then rounding down, would be
double other = expr1 * expr2;
or, adding descriptive casts that tell you exactly what is happening,
double other = (double)((long double)(expr1) * (long double)(expr2));
It should be obvious that product and other often differ.
The C99 casting rules are just another tool you must learn to wield, if you do work with mixed 32-bit/64-bit/80-bit/128-bit floating point values. Really, you encounter the exact same issues if you mix binary32 and binary64 floats (float and double on most architectures)!
Perhaps rewriting Pascal Cuoq's exploration code, to correctly apply casting rules, makes this clearer?
#include <stdio.h>

#define TEST(eq) printf("%-56s%s\n", "" # eq ":", (eq) ? "true" : "false")

int main(void)
{
    double d = 1.0 / 10.0;
    long double ld = 1.0L / 10.0L;

    printf("sizeof (double) = %d\n", (int)sizeof (double));
    printf("sizeof (long double) == %d\n", (int)sizeof (long double));

    printf("\nExpect true:\n");
    TEST(d == (double)(0.1));
    TEST(ld == (long double)(0.1L));
    TEST(d == (double)(1.0 / 10.0));
    TEST(ld == (long double)(1.0L / 10.0L));
    TEST(d == (double)(ld));
    TEST((double)(1.0L/10.0L) == (double)(0.1));
    TEST((long double)(1.0L/10.0L) == (long double)(0.1L));

    printf("\nExpect false:\n");
    TEST(d == ld);
    TEST((long double)(d) == ld);
    TEST(d == 0.1L);
    TEST(ld == 0.1);
    TEST(d == (long double)(1.0L / 10.0L));
    TEST(ld == (double)(1.0L / 10.0));

    return 0;
}
The output, with both GCC and clang, is
sizeof (double) = 8
sizeof (long double) == 12
Expect true:
d == (double)(0.1): true
ld == (long double)(0.1L): true
d == (double)(1.0 / 10.0): true
ld == (long double)(1.0L / 10.0L): true
d == (double)(ld): true
(double)(1.0L/10.0L) == (double)(0.1): true
(long double)(1.0L/10.0L) == (long double)(0.1L): true
Expect false:
d == ld: false
(long double)(d) == ld: false
d == 0.1L: false
ld == 0.1: false
d == (long double)(1.0L / 10.0L): false
ld == (double)(1.0L / 10.0): false
except that recent versions of GCC promote the right hand side of ld == 0.1 to long double first (i.e. to ld == 0.1L), yielding true, and that with SSE/AVX, long double is 128-bit.
For the pure '387 tests, I used
gcc -W -Wall -m32 -mfpmath=387 -mno-sse ... test.c -o test
clang -W -Wall -m32 -mfpmath=387 -mno-sse ... test.c -o test
with various optimization flag combinations as ..., including -fomit-frame-pointer, -O0, -O1, -O2, -O3, and -Os.
Using any other flags or C99 compilers should lead to the same results, except for the long double size (and ld == 0.1 for current GCC versions). If you encounter any differences, I'd be very grateful to hear about them; I may need to warn my users of such compilers (compiler versions). Note that Microsoft does not support C99, so they are completely uninteresting to me.
Pascal Cuoq does bring up an interesting problem in the comment chain below, which I didn't immediately recognize.
When evaluating an expression, both GCC and clang with -mfpmath=387 evaluate everything at 80-bit precision. This leads to, for example:
7491907632491941888 = 0x1.9fe2693112e14p+62 = 110011111111000100110100100110001000100101110000101000000000000
5698883734965350400 = 0x1.3c5a02407b71cp+62 = 100111100010110100000001001000000011110110111000111000000000000
7491907632491941888 * 5698883734965350400 = 42695510550671093541385598890357555200 = 100000000111101101101100110001101000010100100001011110111111111111110011000111000001011101010101100011000000000000000000000000
yielding incorrect results, because that string of ones in the middle of the binary result sits exactly at the boundary between the 53-bit and 64-bit mantissas (of 64-bit and 80-bit floating point numbers, respectively). So, while the expected result is
42695510550671088819251326462451515392 = 0x1.00f6d98d0a42fp+125 = 100000000111101101101100110001101000010100100001011110000000000000000000000000000000000000000000000000000000000000000000000000
the result obtained with just -std=c99 -m32 -mno-sse -mfpmath=387 is
42695510550671098263984292201741942784 = 0x1.00f6d98d0a43p+125 = 100000000111101101101100110001101000010100100001100000000000000000000000000000000000000000000000000000000000000000000000000000
In theory, you should be able to tell gcc and clang to enforce the correct C99 rounding rules by using options
-std=c99 -m32 -mno-sse -mfpmath=387 -ffloat-store -fexcess-precision=standard
However, this only affects expressions the compiler optimizes, and does not seem to fix the 387 handling at all. If you use e.g. clang -O1 -std=c99 -m32 -mno-sse -mfpmath=387 -ffloat-store -fexcess-precision=standard test.c -o test && ./test with test.c being Pascal Cuoq's example program, you will get the correct result per IEEE-754 rules -- but only because the compiler optimizes away the expression, not using the 387 at all.
Simply put, instead of computing
(double)d1 * (double)d2
both gcc and clang actually tell the '387 to compute
(double)((long double)d1 * (long double)d2)
I believe this is a compiler bug affecting both gcc-4.6.3 and clang-llvm-3.0, and an easily reproduced one. (Pascal Cuoq points out that FLT_EVAL_METHOD=2 means operations on double-precision arguments are always done at extended precision, but I cannot see any sane reason -- aside from having to rewrite parts of libm on '387 -- to do that in C99, considering that the IEEE-754 rules are achievable by the hardware! After all, the correct operation is easily achievable by the compiler, by modifying the '387 control word to match the precision of the expression. And given that the compiler options that should force this behaviour -- -std=c99 -ffloat-store -fexcess-precision=standard -- make no sense if FLT_EVAL_METHOD=2 behaviour is actually desired, there are no backwards compatibility issues, either.) It is important to note that, given the proper compiler flags, expressions evaluated at compile time do get evaluated correctly, and that only expressions evaluated at run time get incorrect results.
The simplest workaround, and the portable one, is to use fesetround(FE_TOWARDZERO) (from fenv.h) to round all results towards zero.
In some cases, rounding towards zero may help with predictability and pathological cases. In particular, for intervals like x = [0,1), rounding towards zero means the upper limit is never reached through rounding; important if you evaluate e.g. piecewise splines.
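A hedged sketch of that workaround (my illustration; on glibc, link with -lm, and strictly speaking #pragma STDC FENV_ACCESS ON should be in effect):

#include <fenv.h>
#include <stdio.h>

int main(void)
{
    volatile double three = 3.0;   // volatile so the division happens at run time
    fesetround(FE_TOWARDZERO);     // round all subsequent results toward zero
    printf("%.20f\n", 1.0 / three);
    fesetround(FE_TONEAREST);      // restore the default rounding mode
    return 0;
}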
For the other rounding modes, you need to control the 387 hardware directly.
You can use either __FPU_SETCW() from #include <fpu_control.h>, or open-code it. For example, precision.c:
#include <stdlib.h>
#include <stdio.h>
#include <limits.h>

#define FP387_NEAREST   0x0000
#define FP387_ZERO      0x0C00
#define FP387_UP        0x0800
#define FP387_DOWN      0x0400

#define FP387_SINGLE    0x0000
#define FP387_DOUBLE    0x0200
#define FP387_EXTENDED  0x0300

static inline void fp387(const unsigned short control)
{
    unsigned short cw = (control & 0x0F00) | 0x007f;
    __asm__ volatile ("fldcw %0" : : "m" (*&cw));
}

const char *bits(const double value)
{
    const unsigned char *const data = (const unsigned char *)&value;
    static char buffer[CHAR_BIT * sizeof value + 1];
    char *p = buffer;
    size_t i = CHAR_BIT * sizeof value;
    while (i-->0)
        *(p++) = '0' + !!(data[i / CHAR_BIT] & (1U << (i % CHAR_BIT)));
    *p = '\0';
    return (const char *)buffer;
}

int main(int argc, char *argv[])
{
    double d1, d2;
    char dummy;

    if (argc != 3) {
        fprintf(stderr, "\nUsage: %s 7491907632491941888 5698883734965350400\n\n", argv[0]);
        return EXIT_FAILURE;
    }

    if (sscanf(argv[1], " %lf %c", &d1, &dummy) != 1) {
        fprintf(stderr, "%s: Not a number.\n", argv[1]);
        return EXIT_FAILURE;
    }

    if (sscanf(argv[2], " %lf %c", &d2, &dummy) != 1) {
        fprintf(stderr, "%s: Not a number.\n", argv[2]);
        return EXIT_FAILURE;
    }

    printf("%s:\td1 = %.0f\n\t %s in binary\n", argv[1], d1, bits(d1));
    printf("%s:\td2 = %.0f\n\t %s in binary\n", argv[2], d2, bits(d2));

    printf("\nDefaults:\n");
    printf("Product = %.0f\n\t %s in binary\n", d1 * d2, bits(d1 * d2));

    printf("\nExtended precision, rounding to nearest integer:\n");
    fp387(FP387_EXTENDED | FP387_NEAREST);
    printf("Product = %.0f\n\t %s in binary\n", d1 * d2, bits(d1 * d2));

    printf("\nDouble precision, rounding to nearest integer:\n");
    fp387(FP387_DOUBLE | FP387_NEAREST);
    printf("Product = %.0f\n\t %s in binary\n", d1 * d2, bits(d1 * d2));

    printf("\nExtended precision, rounding to zero:\n");
    fp387(FP387_EXTENDED | FP387_ZERO);
    printf("Product = %.0f\n\t %s in binary\n", d1 * d2, bits(d1 * d2));

    printf("\nDouble precision, rounding to zero:\n");
    fp387(FP387_DOUBLE | FP387_ZERO);
    printf("Product = %.0f\n\t %s in binary\n", d1 * d2, bits(d1 * d2));

    return 0;
}
Using clang-llvm-3.0 to compile and run, I get the correct results,
clang -std=c99 -m32 -mno-sse -mfpmath=387 -O3 -W -Wall precision.c -o precision
./precision 7491907632491941888 5698883734965350400
7491907632491941888: d1 = 7491907632491941888
0100001111011001111111100010011010010011000100010010111000010100 in binary
5698883734965350400: d2 = 5698883734965350400
0100001111010011110001011010000000100100000001111011011100011100 in binary
Defaults:
Product = 42695510550671098263984292201741942784
0100011111000000000011110110110110011000110100001010010000110000 in binary
Extended precision, rounding to nearest integer:
Product = 42695510550671098263984292201741942784
0100011111000000000011110110110110011000110100001010010000110000 in binary
Double precision, rounding to nearest integer:
Product = 42695510550671088819251326462451515392
0100011111000000000011110110110110011000110100001010010000101111 in binary
Extended precision, rounding to zero:
Product = 42695510550671088819251326462451515392
0100011111000000000011110110110110011000110100001010010000101111 in binary
Double precision, rounding to zero:
Product = 42695510550671088819251326462451515392
0100011111000000000011110110110110011000110100001010010000101111 in binary
In other words, you can work around the compiler issues by using fp387() to set the precision and rounding mode.
The downside is that some math libraries (libm.a, libm.so) may be written with the assumption that intermediate results are always computed at 80-bit precision. At least the GNU C library fpu_control.h on x86_64 has the comment "libm requires extended precision". Fortunately, you can take the '387 implementations from e.g. GNU C library, and implement them in a header file or write a known-to-work libm, if you need the math.h functionality; in fact, I think I might be able to help there.
For the record, below is what I found by experimentation. The following program shows various behaviors when compiled with Clang:
#include <stdio.h>

int r1, r2, r3, r4, r5, r6, r7;

double ten = 10.0;

int main(int c, char **v)
{
    r1 = 0.1 == (1.0 / ten);
    r2 = 0.1 == (1.0 / 10.0);
    r3 = 0.1 == (double) (1.0 / ten);
    r4 = 0.1 == (double) (1.0 / 10.0);
    ten = 10.0;
    r5 = 0.1 == (1.0 / ten);
    r6 = 0.1 == (double) (1.0 / ten);
    r7 = ((double) 0.1) == (1.0 / 10.0);
    printf("r1=%d r2=%d r3=%d r4=%d r5=%d r6=%d r7=%d\n", r1, r2, r3, r4, r5, r6, r7);
}
The results vary with the optimization level:
$ clang -v
Apple LLVM version 4.2 (clang-425.0.24) (based on LLVM 3.2svn)
$ clang -mno-sse2 -std=c99 t.c && ./a.out
r1=0 r2=1 r3=0 r4=1 r5=1 r6=0 r7=1
$ clang -mno-sse2 -std=c99 -O2 t.c && ./a.out
r1=0 r2=1 r3=0 r4=1 r5=1 r6=1 r7=1
The cast (double) that differentiates r5 and r6 has an effect at the default optimization level but none at -O2, and the corresponding casts have no effect at all for r3 and r4. The result r1 differs from r5 at all optimization levels, whereas r6 differs from r3 only at -O2.
I am trying to write a function named absD that returns the absolute value of its argument.
I do not want to use any predefined functions. Right now I am getting a parse error when I try to compile it.
I would imagine all I would have to do to get the absolute value of a double is change the sign bit? This is what I have:
#include <stdio.h>
#include <stdlib.h>

#define PRECISION 3

double absD (double n)
{
    asm(" fld %eax                       \n"
        " movl $0x7FFFFFFFFFFFFFFF, %eax \n"
        " pop %eax                       \n"
        );
    return n;
}
int main (int argc, char **argv)
{
    double n = 0.0;
    printf("Absolute value\n");
    if (argc > 1)
        n = atof(argv[1]);
    printf("abs(%.*f) = %.*f\n", PRECISION, n, PRECISION, absD(n));
    return 0;
}
I fixed the curly brace. The error I am getting is:
~ $ gc a02
gcc -Wall -g a02.c -o a02
/tmp/ccl2H7rf.s: Assembler messages:
/tmp/ccl2H7rf.s:228: Error: suffix or operands invalid for `fld'
/tmp/ccl2H7rf.s:229: Error: missing or invalid immediate expression `0x7FFFFFFFFFFFFFFF'
~ $
Do you need to do it in assembly? Is this a homework requirement, or are you looking for very high performance?
This doesn't use any predefined functions:
double absD(double n)
{
    if (n < 0.0)
        n = -n;
    return n;
}
I'm no expert, but it looks like you're using ( to open the assembly block and } to close it. You should probably use one or the other, not both inconsistently:

asm(" fld %eax \n"
    " movl $0x7FFFFFFFFFFFFFFF, %eax \n"
    " pop %eax \n"
    };

Notice the curly bracket before the semicolon.
Depending on how you want to treat -0.0, you can use C99 / POSIX (2004)'s signbit() function.
#include <math.h>
#include <string.h>
#include <stdint.h>

double absD (double x)
{
    if ( signbit(x) ) {
#ifdef NAIVE
        return 0.0 - x;
#else
        // clear the sign bit; a double cannot be &-ed directly,
        // so type-pun through memcpy into a 64-bit integer
        uint64_t u;
        memcpy(&u, &x, sizeof u);
        u &= 0x7FFFFFFFFFFFFFFFULL;
        memcpy(&x, &u, sizeof x);
        return x;
#endif
    } else {
        return x;
    }
}
But frankly, if you're already using the Standard C Library (libc) atof and printf, I don't see why avoiding fabs() is desirable; you can also do normal bit-twiddling in C.
Of course, if you're using assembly, why not use the fchs op anyhow?
You have errors in your assembly code, which the assembler gives you perfectly reasonable error messages about.
you can't load a floating point value directly from %eax -- the operand needs to be an address to load from
you can't have constant literals that don't fit in 32 bits.
The sign of a floating point number is just the high bit, so all you need to do is clear the most significant bit.
If you must do this in assembly, then it seems to me that you would be better off using integer rather than floating point instructions, since you can't do bitwise operations on floating point registers.
Also, there isn't any need to load the entire 8-byte value into any register; you could just as easily operate only on the high byte (or int) if your processor doesn't have 8-byte integer registers, as sketched below.
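A hedged sketch of that suggestion (my illustration; assumes little-endian x86 and 64-bit IEEE-754 doubles):

double absD(double n)
{
    union { double d; unsigned int u[2]; } v = { n };
    v.u[1] &= 0x7FFFFFFFu;   // the sign bit is the top bit of the high 32-bit word
    return v.d;
}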
/tmp/ccl2H7rf.s:228: Error: suffix or operands invalid for `fld'
fld needs the operand to be in memory. Put it in memory, i.e. the stack, and supply the address.
/tmp/ccl2H7rf.s:229: Error: missing or invalid immediate expression `0x7FFFFFFFFFFFFFFF'
EAX does not hold more than 32 bits. If you meant this to be a floating point number, put it on the floating point stack with a load instruction, i.e. fld.
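Putting those fixes together, a hedged sketch of an inline-assembly version (my own, untested; fabs is the x87 absolute-value instruction, so no bit mask is needed at all):

double absD (double n)
{
    asm ("fldl  %[n] \n"   // push the 64-bit double from memory
         "fabs       \n"   // st(0) = |st(0)|, i.e. clear the sign bit
         "fstpl %[n] \n"   // pop the result back into n
         : [n] "+m" (n));
    return n;
}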
Is it possible to compute pow(10,x) at compile time?
I've got a processor without floating point support and slow integer division. I'm trying to perform as many calculations as possible at compile time. I can dramatically speed up one particular function if I pass both x and C/pow(10,x) as arguments (x and C are always constant integers, but they are different constants for each call). I'm wondering if I can make these function calls less error prone by introducing a macro which does the 1/pow(10,x) automatically, instead of forcing the programmer to calculate it?
Is there a pre-processor trick? Can I force the compiler optimize out the library call?
There are very few values possible before you overflow int (or even long). For clarity's sake, make it a table! (See the sketch below.)
edit: If you are using floats (and it looks like you are), then no: it is not possible to call the pow() function at compile time without actually writing code that runs in the make process, outputs the values to a file (such as a header file), and has that file compiled in.
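A hedged sketch of the table approach for integers (my illustration; a 32-bit int overflows past 10^9, so the table is tiny):

static const int POW10_INT[10] = {
    1, 10, 100, 1000, 10000, 100000,
    1000000, 10000000, 100000000, 1000000000
};  // POW10_INT[x] == pow(10, x) for 0 <= x <= 9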
GCC will do this at a sufficiently high optimization level (-O1 does it for me). For example:
#include <math.h>
int test() {
    double x = pow(10, 4);
    return (int)x;
}
Compiles at -O1 -m32 to:
        .file   "test.c"
        .text
.globl test
        .type   test, @function
test:
        pushl   %ebp
        movl    %esp, %ebp
        movl    $10000, %eax
        popl    %ebp
        ret
        .size   test, .-test
        .ident  "GCC: (Ubuntu 4.3.3-5ubuntu4) 4.3.3"
        .section        .note.GNU-stack,"",@progbits
This works without the cast as well - of course, you do get a floating-point load instruction in there, as the Linux ABI passes floating point return values in FPU registers.
You can do it with Boost.Preprocessor:
http://www.boost.org/doc/libs/1_39_0/libs/preprocessor/doc/index.html
Code:
#include <boost/preprocessor/repeat.hpp>
#define _TIMES_10(z, n, data) * 10
#define POW_10(n) (1 BOOST_PP_REPEAT(n, _TIMES_10, _))
int test[4] = {POW_10(0), POW_10(1), POW_10(2), POW_10(3)};
Actually, by exploiting the C preprocessor, you can get it to compute C * pow(10, x) for any real C and integral x. Observe that, as @quinmars noted, C allows you to use scientific syntax to express numerical constants:
#define myexp 1.602E-19 // == 1.602 * pow(10, -19)
to be used for constants. With this in mind, and a bit of cleverness, we can construct a preprocessor macro that takes C and x and combines them into an exponentiation token:
#define EXP2(a, b) a ## b
#define EXP(a, b) EXP2(a ## e,b)
#define CONSTPOW(C,x) EXP(C, x)
This can now be used as a constant numerical value:
const int myint = CONSTPOW(3, 4); // == 30000
const double myfloat = CONSTPOW(M_PI, -2); // == 0.03141592653
You can use scientific notation for floating point values, which is part of the C language. It looks like this:
e = 1.602E-19 // == 1.602 * pow(10, -19)
The number before the E (the E may be capital or small: 1.602e-19) is the fraction part, whereas the (signed) digit sequence after the E is the exponent part. By default the number is of type double, but you can attach a floating point suffix (f, F, l or L) if you need a float or a long double.
I would not recommend packing these semantics into a macro:
It will not work for variables, floating point values, etc.
The scientific notation is more readable.
Actually, you have M4, which is a preprocessor far more powerful than GCC's. A main difference between the two is that GCC's preprocessor is not recursive, whereas M4 is; that makes things like compile-time arithmetic possible (and much more!). The code sample below is what you would like to do, isn't it? I made it bulky as a one-file source, but I usually put the M4 macro definitions in separate files and tune my Makefile rules, which keeps the C source free of the ugly, intrusive M4 definitions I've used here.
$ cat foo.c
define(M4_POW_AUX, `ifelse($2, 1, $1, `eval($1 * M4_POW_AUX($1, decr($2)))')')dnl
define(M4_POW, `ifelse($2, 0, 1, `M4_POW_AUX($1, $2)')')dnl

#include <stdio.h>

int main(void)
{
    printf("2^0 = %d\n", M4_POW(2, 0));
    printf("2^1 = %d\n", M4_POW(2, 1));
    printf("2^4 = %d\n", M4_POW(2, 4));
    return 0;
}
The command line to compile this code sample uses the ability of GCC and M4 to read from the standard input.
$ cat foo.c | m4 - | gcc -x c -o m4_pow -
$ ./m4_pow
2^0 = 1
2^1 = 2
2^4 = 16
Hope this helps!
If you just need to use the value at compile time, use the scientific notation like 1e2 for pow(10, 2)
If you want to populate the values at compile time and then use them later at runtime then simply use a lookup table because there are only 23 different powers of 10 that are exactly representable in double precision
double POW10[] = {1., 1e1, 1e2, 1e3, 1e4, 1e5, 1e6, 1e7, 1e8, 1e9, 1e10,
1e11, 1e12, 1e13, 1e14, 1e15, 1e16, 1e17, 1e18, 1e19, 1e20, 1e21, 1e22};
You can get larger powers of 10 at runtime from the above lookup table without multiplying by 10 again and again, but the result is then only a value close to a power of 10, exactly as when you write a literal 1eX with X > 22:
double pow10(int x)
{
    if (x > 22)
        return POW10[22] * pow10(x - 22);
    else if (x >= 0)
        return POW10[x];
    else
        return 1 / pow10(-x);
}
If negative exponents are not needed, the final branch can be removed.
You can also reduce the lookup table size further if memory is a constraint: for example, by storing only the even powers of 10 and multiplying by 10 once when the exponent is odd, the table shrinks to half the size, as sketched below.
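A hedged sketch of that halved table (my illustration, not from the answer; powers of 10 up to 1e22 are exactly representable, so the extra multiply by 10 loses nothing):

static const double POW10_EVEN[] = {1., 1e2, 1e4, 1e6, 1e8, 1e10, 1e12,
                                    1e14, 1e16, 1e18, 1e20, 1e22};

double pow10_halved(int x)   // valid for 0 <= x <= 22
{
    double p = POW10_EVEN[x / 2];
    return (x % 2) ? p * 10.0 : p;   // one extra exact multiply for odd exponents
}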
Recent versions of GCC (around 4.3) added the ability to use GMP and MPFR to do some compile-time optimizations by evaluating more complex functions that are constant. That approach leaves your code simple and portable, and trusts the compiler to do the heavy lifting.
Of course, there are limits to what it can do. Here's a link to the description in the changelog, which includes a list of functions that are supported by this. 'pow' is one of them.
Unfortunately, you can't use the preprocessor to precalculate library calls. If x is integral you could write your own function, but if it's a floating-point type I don't see any good way to do this.
bdonlan's reply is spot on, but keep in mind that you can perform nearly any optimization you choose on the build box, provided you are willing to parse and analyze the code in your own custom preprocessor. It is a trivial task in most versions of Unix to override the implicit rules that call the compiler, so that a custom step of your own runs before the code hits the compiler.
I'm using a fairly new install of Visual C++ 2008 Express.
I'm trying to compile a program that uses the log2 function, which was found by including <math.h> when using Eclipse on a Mac, but this Windows computer can't find the function (error C3861: 'log2': identifier not found).
The way I understood it, include directories are specific to the IDE, right? math.h is not present in my Microsoft SDKs\Windows\v6.0A\Include\ directory, but I did find a math.h in this directory: Microsoft Visual Studio 9.0\VC\include. There is also a cmath in that directory...
Where is log2?
From here:
Prototype: double log2(double anumber);
Header File: math.h (C) or cmath (C++)
Alternatively, emulate it like this:
#include <math.h>
...

// Calculates log2 of number.
double Log2( double n )
{
    // log(n)/log(2) is log2.
    return log( n ) / log( 2 );
}
Unfortunately Microsoft does not provide it.
log2() is only defined in the C99 standard, not the C90 standard. Microsoft Visual C++ is not fully C99 compliant (heck, there isn't a single fully C99 compliant compiler in existence, I believe -- not even GCC fully supports it), so it's not required to provide log2().
If you're trying to find the log2 of strictly integers, some bitwise shifting can't hurt:
#include <stdio.h>

unsigned int log2( unsigned int x )
{
    unsigned int ans = 0 ;
    while( x>>=1 ) ans++;
    return ans ;
}

int main()
{
    // log2(7) = 2 here, log2(8) = 3.
    //for( int i = 0 ; i < 32 ; i++ )
    //    printf( "log_2( %d ) = %d\n", i, log2( i ) ) ;

    for( unsigned int i = 1 ; i <= (1<<30) ; i <<= 1 )
        printf( "log_2( %d ) = %d\n", i, log2( i ) ) ;
}
With Visual Studio 2013, log2() was added. See C99 library support in Visual Studio 2013.
Note that:
log2(x) = log(x) * log2(e)
where log2(e) is a constant. math.h defines M_LOG2E to the value of log2(e) if you define _USE_MATH_DEFINES before including math.h:
#define _USE_MATH_DEFINES // needed to have definition of M_LOG2E
#include <math.h>

static inline double log2(double n)
{
    return log(n) * M_LOG2E;
}
Even though the usual approach is to do log(n)/log(2), I would advise using multiplication instead, as division is always slower, especially for floats, and more so on mobile CPUs. For example, on modern Intel CPUs the difference in generated code is just one instruction, mulsd vs divsd, and according to Intel's manuals we could expect the division to be 5-10 times slower. On mobile ARM CPUs I would expect floating point division to be somewhere between 10 and 100 times slower than multiplication.
Also, in case you have compilation issues with log2 for Android: log2 seems to be available in the headers starting from android-18:
#include <android/api-level.h>
#include <math.h>   // for log() and M_LOG2E

#if __ANDROID_API__ < 18
static inline double log2(double n)
{
    return log(n) * M_LOG2E;
}
#endif