Program activating the "else" clause despite condition being met (C) [duplicate]

float f = 0.7;
if (f == 0.7)
    printf("equal");
else
    printf("not equal");
Why is the output "not equal"? Why does this happen?

This happens because in your statement
if(f == 0.7)
the 0.7 is treated as a double. Try 0.7f to ensure the value is treated as a float:
if(f == 0.7f)
But, as Michael suggested in the comments below, you should never test for exact equality of floating-point values.
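A minimal demo of the difference (a sketch, assuming IEEE 754 single and double precision, which virtually every current platform uses):

#include <stdio.h>

int main(void)
{
    float f = 0.7f;
    /* 0.7 is a double constant: f is widened to double before comparing */
    printf("f == 0.7  -> %s\n", f == 0.7 ? "equal" : "not equal");
    /* 0.7f is a float constant: both sides hold the same single value */
    printf("f == 0.7f -> %s\n", f == 0.7f ? "equal" : "not equal");
    return 0;
}

On such platforms this prints "not equal" and then "equal".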

This answer complements the existing ones: note that 0.7 is not representable exactly as a float (or as a double). If it were represented exactly, there would be no loss of information when converting to float and then back to double, and you wouldn't have this problem.
It could even be argued that there should be a compiler warning for literal floating-point constants that cannot be represented exactly, especially when the standard is so fuzzy about whether the rounding will be done at run-time, in whatever rounding mode has been set at that time, or at compile-time, in another rounding mode.
All non-integer numbers that can be represented exactly have 5 as their last decimal digit. Unfortunately, the converse is not true: some numbers have 5 as their last decimal digit and cannot be represented exactly. Small integers can all be represented exactly, and division by a power of 2 transforms a number that can be represented into another that can be represented, as long as you do not enter the realm of denormalized numbers.
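A quick illustration of that last point (a hedged sketch; the exact digits printed depend on your C library, but the pattern holds on IEEE 754 systems):

#include <stdio.h>

int main(void)
{
    /* 11/16 = 0.6875 is a small integer divided by a power of 2, so it
       is exactly representable; note its last decimal digit is 5 */
    printf("%.20f\n", 0.6875);  /* 0.68750000000000000000 */
    /* 0.7 has no finite binary expansion, so the stored double is off */
    printf("%.20f\n", 0.7);     /* 0.69999999999999995559... */
    return 0;
}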

First of all, let's look inside a float. Take 0.1f: it is 4 bytes long (binary32), and in hex it is
3D CC CC CD.
By the IEEE 754 standard, to convert it to decimal we proceed like this:
In binary, 3D CC CC CD is
0 01111011 10011001100110011001101
The first digit is the sign bit. 0 means (-1)^0, i.e. our number is positive.
The next 8 bits are the exponent. In binary it is 01111011, which is 123 in decimal. But the real exponent is 123-127 (the bias is always 127) = -4, meaning we need to multiply the number we get by 2^(-4).
The last 23 bits are the significand. The first bit is multiplied by 1/(2^1) (0.5), the second by 1/(2^2) (0.25), and so on.
We add up all those powers of 2 and then add 1 to the sum (the leading 1 is always implicit, by the standard). The result is
1.60000002384185791015625
Now let's multiply this number by 2^(-4), from the exponent; we just divide the number above by 2 four times:
0.100000001490116119384765625
(I used MS Calculator for this.)
Now the second part: converting from decimal to binary.
Take the number 0.1.
It's easy because there is no integer part. First, the sign bit: it is 0.
Now I'll calculate the exponent and the significand. The logic is: repeatedly multiply by 2 (0.1*2 = 0.2, and so on); each integer part produced is the next bit, and whenever the product exceeds 1, subtract 1 and continue.
The number comes out as .00011001100110011001100110011..., and the standard says we must shift left until we get 1.(something). As you can see, we need 4 shifts, and from this we calculate the exponent (127-4 = 123). The significand is now 10011001100110011001100 (and there are lost bits).
Now the whole number: sign bit 0, exponent 123 (01111011), significand 10011001100110011001100, and all together it is
00111101110011001100110011001100
Let's compare it with the one we got in the previous part:
00111101110011001100110011001101
As you see, the last bits are not equal. That is because I truncated the number. The CPU and compiler know there is something beyond what the significand can hold, and round to nearest: the discarded bits are more than half a unit in the last place, so the last bit is rounded up to 1.
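You can check these bit patterns yourself. A minimal sketch, assuming IEEE 754 single precision and a 32-bit uint32_t (true on essentially every current platform):

#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* Print the sign, biased exponent, and fraction fields of a float. */
static void dump_float(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);  /* reinterpret the bytes without UB */
    printf("hex %08X  sign %u  exponent %u  fraction 0x%06X\n",
           (unsigned)bits, (unsigned)(bits >> 31),
           (unsigned)((bits >> 23) & 0xFF), (unsigned)(bits & 0x7FFFFF));
}

int main(void)
{
    dump_float(0.1f);  /* expect hex 3DCCCCCD, exponent 123, as above */
    return 0;
}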

Another nearly identical question was linked to this one, hence the years-late answer. I don't think the above answers are complete.
int fun1 ( void )
{
    float x=0.7;
    if(x==0.7) return(1);
    else return(0);
}
int fun2 ( void )
{
    float x=1.1;
    if(x==1.1) return(1);
    else return(0);
}
int fun3 ( void )
{
    float x=1.0;
    if(x==1.0) return(1);
    else return(0);
}
int fun4 ( void )
{
    float x=0.0;
    if(x==0.0) return(1);
    else return(0);
}
int fun5 ( void )
{
    float x=0.7;
    if(x==0.7f) return(1);
    else return(0);
}
float fun10 ( void )
{
    return(0.7);
}
double fun11 ( void )
{
    return(0.7);
}
float fun12 ( void )
{
    return(1.0);
}
double fun13 ( void )
{
    return(1.0);
}
Disassembly of section .text:
00000000 <fun1>:
0: e3a00000 mov r0, #0
4: e12fff1e bx lr
00000008 <fun2>:
8: e3a00000 mov r0, #0
c: e12fff1e bx lr
00000010 <fun3>:
10: e3a00001 mov r0, #1
14: e12fff1e bx lr
00000018 <fun4>:
18: e3a00001 mov r0, #1
1c: e12fff1e bx lr
00000020 <fun5>:
20: e3a00001 mov r0, #1
24: e12fff1e bx lr
00000028 <fun10>:
28: e59f0000 ldr r0, [pc] ; 30 <fun10+0x8>
2c: e12fff1e bx lr
30: 3f333333 svccc 0x00333333
00000034 <fun11>:
34: e28f1004 add r1, pc, #4
38: e8910003 ldm r1, {r0, r1}
3c: e12fff1e bx lr
40: 66666666 strbtvs r6, [r6], -r6, ror #12
44: 3fe66666 svccc 0x00e66666
00000048 <fun12>:
48: e3a005fe mov r0, #1065353216 ; 0x3f800000
4c: e12fff1e bx lr
00000050 <fun13>:
50: e3a00000 mov r0, #0
54: e59f1000 ldr r1, [pc] ; 5c <fun13+0xc>
58: e12fff1e bx lr
5c: 3ff00000 svccc 0x00f00000 ; IMB
Why did fun3 and fun4 return one and not the others? Why does fun5 work?
It is about the language. The language says that 0.7 is a double unless you use the suffix syntax 0.7f, in which case it is a single. So
float x=0.7;
the double 0.7 is converted to a single and stored in x.
if(x==0.7) return(1);
The language says we have to promote to the higher precision, so the single in x is converted to a double and compared with the double 0.7.
00000028 <fun10>:
28: e59f0000 ldr r0, [pc] ; 30 <fun10+0x8>
2c: e12fff1e bx lr
30: 3f333333 svccc 0x00333333
00000034 <fun11>:
34: e28f1004 add r1, pc, #4
38: e8910003 ldm r1, {r0, r1}
3c: e12fff1e bx lr
40: 66666666 strbtvs r6, [r6], -r6, ror #12
44: 3fe66666 svccc 0x00e66666
single 3f333333
double 3fe6666666666666
As Alexandr pointed out (if that answer remains), per IEEE 754 a single is
seeeeeeeefffffffffffffffffffffff
And double is
seeeeeeeeeeeffffffffffffffffffffffffffffffffffffffffffffffffffff
with 52 bits of fraction rather than the 23 that single has.
00111111001100110011... single
001111111110011001100110... double
0 01111110 01100110011... single
0 01111111110 01100110011... double
Just like 1/3 in base 10 is 0.3333333... forever, we have a repeating pattern here: 0110.
01100110011001100110011 single, 23 bits
01100110011001100110011001100110.... double 52 bits.
And here is the answer.
if(x==0.7) return(1);
x contains 01100110011001100110011 as its fraction; when that gets converted back to double, the fraction is
01100110011001100110011000000000....
which is not equal to
01100110011001100110011001100110...
but here
if(x==0.7f) return(1);
That promotion doesn't happen; the same bit patterns are compared with each other.
Why does 1.0 work?
00000048 <fun12>:
48: e3a005fe mov r0, #1065353216 ; 0x3f800000
4c: e12fff1e bx lr
00000050 <fun13>:
50: e3a00000 mov r0, #0
54: e59f1000 ldr r1, [pc] ; 5c <fun13+0xc>
58: e12fff1e bx lr
5c: 3ff00000 svccc 0x00f00000 ; IMB
0011111110000000...
0011111111110000000...
0 01111111 0000000...
0 01111111111 0000000...
In both cases the fraction is all zeros. So converting from double to single to double there is no loss of precision. It converts from single to double exactly and the bit comparison of the two values works.
The highest-voted and accepted answer by halfdan is the correct answer: this is a case of mixed precision, AND you should never do an equals comparison.
But the why wasn't shown in that answer: 0.7 fails while 1.0 works, and why 0.7 fails wasn't shown. In a duplicate question, 1.1 fails as well.
Edit
The equals comparison can be taken out of the problem here. It is a different question that has already been answered, but it is the same problem, and it produces the same "what the ..." initial shock.
int fun1 ( void )
{
    float x=0.7;
    if(x<0.7) return(1);
    else return(0);
}
int fun2 ( void )
{
    float x=0.6;
    if(x<0.6) return(1);
    else return(0);
}
Disassembly of section .text:
00000000 <fun1>:
0: e3a00001 mov r0, #1
4: e12fff1e bx lr
00000008 <fun2>:
8: e3a00000 mov r0, #0
c: e12fff1e bx lr
Why does one show as less than and the other not, when they should be equal?
From above we know the 0.7 story.
01100110011001100110011 single, 23 bits
01100110011001100110011001100110.... double 52 bits.
01100110011001100110011000000000....
is less than
01100110011001100110011001100110...
0.6 is a different repeating pattern, 0011 rather than 0110. But something changes when it is converted from a double to a single, or in general when it is represented as an IEEE 754 single:
00110011001100110011001100110011.... double 52 bits.
00110011001100110011001 is NOT the fraction for single
00110011001100110011010 IS the fraction for single
IEEE 754 defines rounding modes: round to nearest (the default), round up, round down, and round toward zero. Compilers round to nearest by default. Remember rounding in grade school: rounding 12345678 to the top 3 digits gives 12300000, but rounding to the top 4 digits gives 12350000, because if the next digit is 5 or greater you round up. 5 is half of 10, the base (decimal); in binary, 1 is half of the base, so if the bit after the position we round at is 1, round up, otherwise don't. For 0.7 that next bit is 0, so we don't round up; for 0.6 it is 1, so we do round up.
And now it is easy to see that
00110011001100110011010
converted to a double because of (x<0.6)
00110011001100110011010000000000....
is greater than
00110011001100110011001100110011....
So without even talking about equals, the issue still presents itself: 0.7 is a double, 0.7f is a single, and the operation is promoted to the higher precision if they differ.
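The widening that causes all of this is easy to see in a few lines of C. A hedged sketch (assuming 64-bit IEEE 754 doubles):

#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void)
{
    double widened = (double)0.7f;  /* the single 0.7f promoted to double */
    double direct  = 0.7;           /* the double constant */
    uint64_t w, d;
    memcpy(&w, &widened, sizeof w);
    memcpy(&d, &direct, sizeof d);
    printf("(double)0.7f = %016llX\n", (unsigned long long)w);  /* fraction padded with zeros */
    printf("0.7          = %016llX\n", (unsigned long long)d);  /* repeating pattern continues */
    return 0;
}

Expect 3FE6666660000000 versus 3FE6666666666666, the same mismatch walked through above.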

The problem you're facing is, as other commenters have noted, that it's generally unsafe to test for exact equivalency between floats: initialization errors or rounding errors in calculations can introduce minor differences that will cause the == operator to return false.
A better practice is to do something like
#include <math.h>   /* fabs */
#include <float.h>  /* FLT_EPSILON */

float f = 0.7;
if (fabs(f - 0.7) < FLT_EPSILON)
    printf("equal");
else
    printf("not equal");
FLT_EPSILON, from <float.h>, is defined as an appropriately small float value for your platform.
Since the rounding or initialization errors will be unlikely to exceed the value of FLT_EPSILON, this will give you the reliable equivalency test you're looking for.

A lot of the answers around the web make the mistake of looking at the absolute difference between floating-point numbers; that is only valid for special cases. The robust way is to look at the relative difference, as below:
// Floating-point comparison:
#include <cmath>     // std::abs
#include <algorithm> // std::max

bool CheckFP32Equal(float referenceValue, float value)
{
    const float fp32_epsilon = float(1E-7);
    float abs_diff = std::abs(referenceValue - value);

    // Both being exactly zero is a special case
    if (referenceValue == 0.0f && value == 0.0f)
        return true;

    float rel_diff = abs_diff / std::max(std::abs(referenceValue), std::abs(value));
    return rel_diff < fp32_epsilon;
}

Consider this:
#include <stdio.h>

int main()
{
    float a = 0.7;
    if (0.7 > a)
        printf("Hi\n");
    else
        printf("Hello\n");
    return 0;
}
In if (0.7 > a), a is a float variable and 0.7 is a double constant. The double constant 0.7 is greater than the float value stored in a; hence the condition is satisfied and it prints 'Hi'.
Example:
#include <stdio.h>

int main()
{
    float a = 0.7;
    printf("%.10f %.10f\n", 0.7, a);
    return 0;
}
Output:
0.7000000000 0.6999999881

The floating-point value saved in the variable and the constant do not have the same data type; the difference is in the precision of the data types.
If you change the data type of the f variable to double, it will print "equal". This is because floating-point constants are stored as double by default, and double's precision is higher than float's. It becomes completely clear if you look at how floating-point numbers are converted to binary.

Related

How to make this LC3 program multiply instead?

Was trying to learn how to multiply in LC3 but having trouble modifying my old program that was just meant for adding sums. How would I go about modifying this program to multiply the 2 given inputs?
Code:
.ORIG x3000 ; begin at x3000
; input two numbers
IN ;input an integer character (ascii) {TRAP 23}
LD R3, HEXN30 ;subtract x30 to get integer
ADD R0, R0, R3
ADD R1, R0, x0 ;move the first integer to register 1
IN ;input another integer {TRAP 23}
ADD R0, R0, R3 ;convert it to an integer
; add the numbers
ADD R2, R0, R1 ;add the two integers
; print the results
LEA R0, MESG ;load the address of the message string
PUTS ;"PUTS" outputs a string {TRAP 22}
ADD R0, R2, x0 ;move the sum to R0, to be output
LD R3, HEX30 ;add 30 to integer to get integer character
ADD R0, R0, R3
OUT ;display the sum {TRAP 21}
; stop
HALT ;{TRAP 25}
; data
MESG .STRINGZ "The sum of those two numbers is: "
HEXN30 .FILL xFFD0 ; -30 HEX
HEX30 .FILL x0030 ; 30 HEX
.END
The simplest approach to multiply on LC-3 is repetitive addition. So keep summing the multiplicand and decrement the multiplier; the iteration stops when the multiplier is consumed (i.e. zero).
There are lots of caveats: if the multiplier is negative, then we would either negate it to use with a count-down, or count up instead; either way, the final result would be negated.
Since multiplication is commutative, we might consider using the lesser (absolute) value for the multiplier so that fewer iterations are done. But for more optimal multiplication, we would switch to a whole 'nother algorithm, shift-and-add. Note that this algorithm is usually presented for hardware implementation, in which saving precious register bits is important, whereas for software this is not a really significant concern.
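For reference, here is the repetitive-addition algorithm itself as a hedged C sketch (my own illustration; translating it into LC-3 registers and branches is the exercise):

#include <stdlib.h>  /* abs */

/* Multiply by repeated addition: add 'a' to an accumulator |b| times,
   then fix up the sign if the multiplier was negative. */
int mul_by_addition(int a, int b)
{
    int count = abs(b);   /* count down with a non-negative multiplier */
    int result = 0;
    while (count--)
        result += a;
    return (b < 0) ? -result : result;
}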

Why does gcc compile f(1199) and f(1200) differently?

What causes GCC 7.2.1 on ARM to use a load from memory (ldr) for certain constants, and an immediate (mov) in some other cases? Concretely, I'm seeing the following:
GCC 7.2.1 for ARM compiles this:
extern void abc(int);
int test() { abc(1199); return 0; }
…into that:
test():
push {r4, lr}
ldr r0, .L4 // ??!
bl abc(int)
mov r0, #0
pop {r4, lr}
bx lr
.L4:
.word 1199
and this:
extern void abc(int);
int test() { abc(1200); return 0; }
…into that:
test():
push {r4, lr}
mov r0, #1200 // OK
bl abc(int)
mov r0, #0
pop {r4, lr}
bx lr
At first I expected 1200 to be some sort of unique cutoff, but there are other cut-offs like this at 1024 (1024 yields a mov r0, #1024, whereas 1025 uses ldr) and at other values.
Why would GCC use a load from memory to fetch a constant, rather than using an immediate?
This has to do with the way that constant operands are encoded in the ARM instruction set. They are encoded as an (unsigned) 8-bit constant combined with a 4-bit rotate field: the 8-bit value will be rotated right by 2 times the value in that 4-bit field. So any value that fits that form can be used as a constant operand.
The constant 1200 is 10010110000 in binary, so it can be encoded as the 8-bit constant 01001011 (0x4B) shifted left by 4, which is a rotate right by 28.
The constant 1199 is 10010101111 in binary, so there's no way to fit it in an ARM constant operand.
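That fit test is easy to express in code. A small sketch (the helper name is mine) that checks the classic ARM data-processing immediate form, an 8-bit value rotated right by an even amount:

#include <stdio.h>
#include <stdint.h>

/* Can 'value' be encoded as an 8-bit constant rotated right
   by an even number of bits (0, 2, ..., 30)? */
static int is_arm_immediate(uint32_t value)
{
    for (int rot = 0; rot < 32; rot += 2) {
        /* rotating left by rot undoes a rotate right by rot */
        uint32_t v = rot ? ((value << rot) | (value >> (32 - rot))) : value;
        if (v <= 0xFFu)
            return 1;
    }
    return 0;
}

int main(void)
{
    printf("1199 -> %s\n", is_arm_immediate(1199) ? "mov" : "ldr");  /* ldr */
    printf("1200 -> %s\n", is_arm_immediate(1200) ? "mov" : "ldr");  /* mov */
    return 0;
}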

Micro-optimizing C code for ARM

Apparently it's true that on ARM CPUs, division is 10-100x slower than bit shifts. On this site it is stated that this can be solved in a number of ways, one of them being look-up tables for small problems, which is fine and standard. But also listed was replacing division with multiplication by a fixed-point reciprocal followed by a bit shift (so that x/3 becomes a multiply by 0xAAAAAAAB followed by a shift of the 64-bit product, as in the disassembly below). Another was replacing (x % y) > z with x > (z * y).
I'm far from an expert, but this sounds really odd to me. I mean, if you're using a modern compiler, wouldn't this be exactly the kind of thing that is optimized for you?
unsigned int fun1 ( unsigned int a, unsigned int b )
{
    return(a/b);
}
unsigned int fun2 ( unsigned int a )
{
    return(a/2);
}
unsigned int fun3 ( unsigned int a )
{
    return(a/3);
}
unsigned int fun10 ( unsigned int a )
{
    return(a/10);
}
unsigned int fun13 ( void )
{
    return(10/13);
}
and just try it.
00000000 <fun1>:
0: e92d4010 push {r4, lr}
4: ebfffffe bl 0 <__aeabi_uidiv>
8: e8bd4010 pop {r4, lr}
c: e12fff1e bx lr
00000010 <fun2>:
10: e1a000a0 lsr r0, r0, #1
14: e12fff1e bx lr
00000018 <fun3>:
18: e59f3008 ldr r3, [pc, #8] ; 28 <fun3+0x10>
1c: e0802093 umull r2, r0, r3, r0
20: e1a000a0 lsr r0, r0, #1
24: e12fff1e bx lr
28: aaaaaaab bge feaaaadc <fun13+0xfeaaaa9c>
0000002c <fun10>:
2c: e59f3008 ldr r3, [pc, #8] ; 3c <fun10+0x10>
30: e0802093 umull r2, r0, r3, r0
34: e1a001a0 lsr r0, r0, #3
38: e12fff1e bx lr
3c: cccccccd stclgt 12, cr12, [r12], {205} ; 0xcd
00000040 <fun13>:
40: e3a00000 mov r0, #0
44: e12fff1e bx lr
As one would expect, if the compiler can't deal with it at compile time then it calls the appropriate library function, which is the root of the performance issue. If you don't have a native divide instruction, you end up with many instructions executed, plus all of their fetches. 10 to 100 times slower sounds about right.
Interesting that they do use the 1/3 and 1/10th trick here, and if the result can be computed at compile time, then just return the fixed result.
Compiler authors can read the same Hacker's Delight and Stack Overflow pages we can, know the same tricks, and, if willing and interested, can implement those optimizations. Don't assume they always will; just because I have some version of some compiler that finds these doesn't mean all compilers can/will.
As far as whether you should let the compiler/toolchain do it for you: that's up to you. Even if you have a divide instruction, if you target multiple platforms you may choose to shift right instead of dividing by 2, and you may choose to use others of these tricks. If you own the divide, you at least know what it is doing; if you hand it over to the compiler, you have to regularly disassemble to understand what it is doing (if you care). If this is in a timing-critical section, you may wish to do both: see what the compiler does, then steal that answer or create your own deterministic solution (leaving it up to the compiler is not necessarily deterministic, and I think that is the point).
EDIT
arm-none-eabi-gcc -O2 -c so.c -o so.o
arm-none-eabi-objdump -D so.o
arm-none-eabi-gcc --version
arm-none-eabi-gcc (GCC) 6.3.0
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
I have a gcc 4.8.3 here that also produced those optimizations...as well as a 5.4.0, so they have been doing it for a while.
The ARM UMULL instruction is a 64-bit = 32-bit * 32-bit operation, so the multiply can't overflow. It certainly covers 1/3 and 1/10; I'm not sure how large an N in 1/N you can go in 64 bits and still have every 32-bit operand work. A simple experiment shows that, at least for these two cases, all possible 32-bit patterns work, that is, for unsigned.
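That experiment is easy to reproduce. A hedged sketch of the exact trick the fun3 disassembly above shows (UMULL by 0xAAAAAAAB, then the high word shifted right by one more, a total shift of 33):

#include <stdio.h>
#include <stdint.h>

/* Divide by 3 via a fixed-point reciprocal: the (x * 0xAAAAAAAB) >> 33
   that the compiler emits as UMULL plus LSR. */
static uint32_t div3(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0xAAAAAAABu) >> 33);
}

int main(void)
{
    /* spot-check; an exhaustive loop over all 2^32 inputs also passes */
    for (uint32_t x = 0; x < 1000000u; x++) {
        if (div3(x) != x / 3) {
            printf("mismatch at %u\n", (unsigned)x);
            return 1;
        }
    }
    printf("ok\n");
    return 0;
}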
It appears to use the trick for signed as well:
int negfun ( int a )
{
    return(a/3);
}
00000000 <negfun>:
0: e59f3008 ldr r3, [pc, #8] ; 10 <negfun+0x10>
4: e0c32390 smull r2, r3, r0, r3
8: e0430fc0 sub r0, r3, r0, asr #31
c: e12fff1e bx lr
10: 55555556 ldrbpl r5, [r5, #-1366] ; 0xfffffaaa
Divide by constant is often optimized by compilers to a multiply and shift sequence even on processors with a divide instruction. In some cases the sequence is a bit longer, but still only uses one multiply. Link to prior thread about this.
Why does GCC use multiplication by a strange number in implementing integer division?
Divide by variable on a processor without a divide is usually handled by an optimized function, based on some variation of the methods mentioned in this wiki article:
http://en.wikipedia.org/wiki/Division_algorithm#Fast_division_methods
Using a 32 bit by 32 bit divide as an example, there may be 3 main paths used. For divisor < 256, the divide by constant method can be used (256 entry table). For expected quotients < 256, an unfolded subtract and shift sequence may be used. The main path does a table lookup to get an initial approximation, then a sequence that includes 4 multiplies, some adds, subtracts, and shifts to quadruple the number of correct bits from the table value in the estimated quotient such that estimated quotient = actual quotient or actual quotient - 1. Then the product of estimated quotient * divisor is subtracted from dividend, and if remainder >= divisor, quotient is incremented and divisor subtracted from dividend. For a 64 bit by 64 bit divide, the main sequence would involve 6 multiplies, ... to produce the estimated quotient.
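For flavor, here is the most basic of those software paths, a minimal restoring (shift-and-subtract) division sketch; the table-lookup-plus-multiply main path described above is considerably more involved:

#include <stdint.h>

/* Restoring division: bring down one dividend bit at a time and
   subtract the divisor whenever the running remainder allows it. */
static uint32_t soft_divide(uint32_t dividend, uint32_t divisor)
{
    uint32_t quotient = 0, remainder = 0;
    for (int i = 31; i >= 0; i--) {
        remainder = (remainder << 1) | ((dividend >> i) & 1u);
        if (remainder >= divisor) {
            remainder -= divisor;
            quotient |= 1u << i;
        }
    }
    return quotient;  /* divisor == 0 is left undefined, as in hardware */
}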


ARM Parameter Passing

I am trying to write an ARM program that takes three numbers and calculates the discriminant. It has two source files, driver.s & prog3.s. I understand how to find the discriminant, but how do I pass the values A, B, & C into the discrim function from the main function? I have included the code I typed thus far....
MAIN() driver.s
avalue .req r0
bvalue .req r1
cvalue .req r2
final .req r3
loopcount .req r4
readA:
.ascii "%d"
readB:
.ascii "%d"
readC:
.ascii "%d"
addressReadA: .word readA
addressReadB: .word readB
addressReadC: .word readC
main:
ldr avalue, addressReadA # load in avalue
ldr bvalue, addressReadB # load in bvalue
ldr cvalue, addressReadC # load in cvalue
DISCRIM() prog3.s
avalue .req r0
bvalue .req r1
cvalue .req r2
final .req r3
discrim:
mul bvalue, bvalue, bvalue # square bvalue
mul avalue, avalue, #4 # multiply avalue by 4
mul cvalue, avalue, cvalue # multiply avalue by cvalue
add final, bvalue, cvalue # calculated discriminant
Going with the calling convention that C compilers use is not a bad idea, especially since, if you move from pure assembly programs to mixed C and asm, you will already have that experience. And/or you may see the simplicity and wisdom in the calling conventions used.
How do you know what the calling convention for a compiler is? 1) Read the manual/documentation and google. 2) Just try it: prototype a function that is similar in the number and type of operands and the return value, feed it real-ish numbers, and see what it produces.
Compiling to asm sometimes works, but with pseudo-instructions and other things done by the assembler, I prefer to disassemble rather than compile to asm; YMMV.
unsigned int fun ( unsigned int a, unsigned int b, unsigned int c );
unsigned int test ( void )
{
    return(fun(1,2,3));
}
which with gnu currently results in
00000000 <test>:
0: e92d4010 push {r4, lr}
4: e3a02003 mov r2, #3
8: e3a01002 mov r1, #2
c: e3a00001 mov r0, #1
10: ebfffffe bl 0 <fun>
14: e8bd4010 pop {r4, lr}
18: e12fff1e bx lr
Each combination of compiler and target may have a different calling convention; there is no reason to assume that different compilers, or different versions of the same compiler, use the same convention. ARM, MIPS, and no doubt others try to help/encourage/suggest a calling convention to use, and some compilers simply follow that, why not.
There are lots of exceptions to the rule in the convention, but for ARM, for the first up-to-four registers' worth of parameters, in this case up to four signed or unsigned integers or up to four quantities of 32 bits or less (float can create exceptions), the first four general-purpose registers are used: r0 for the first parameter, r1 for the second, and so on. And currently the standard keeps the stack aligned on 64-bit boundaries.
So we see that the first parameter is indeed placed in r0, the second in r1, and the third in r2. Obviously you don't have to arrange those three instructions in that order; it doesn't matter.
Because this function calls another function, it has to preserve its return address in lr, so that goes on the stack. Because the standard says to keep the stack aligned on 64-bit boundaries, another register is pushed too; r4 is arbitrary, it could be any register, this is just the one the tool chose.
And because the standard says to return in r0, here is code that implements one of these functions:
unsigned int fun ( unsigned int a, unsigned int b, unsigned int c )
{
    return(a+b^c);
}
00000000 <fun>:
0: e0800001 add r0, r0, r1
4: e0200002 eor r0, r0, r2
8: e12fff1e bx lr
It is very interesting, now that I see this, that the compiler did not do a tail-call optimization here: it could have skipped saving lr and just branched to fun, since the return value in r0 is what test() was also returning in the same register. I'm really kind of baffled that that didn't happen.
But you can see that indeed the return value is left in r0, and per the convention we can trash r0-r3; we don't have to preserve them, and these functions don't.
If you change test to this
unsigned int fun ( unsigned int a, unsigned int b, unsigned int c );
unsigned int test ( void )
{
    return(fun(1,2,3)+7);
}
then it can't tail-optimize, and it also shows the return register, so you don't have to create a fun() function to see it.
00000000 <test>:
0: e92d4010 push {r4, lr}
4: e3a02003 mov r2, #3
8: e3a01002 mov r1, #2
c: e3a00001 mov r0, #1
10: ebfffffe bl 0 <fun>
14: e8bd4010 pop {r4, lr}
18: e2800007 add r0, r0, #7
1c: e12fff1e bx lr
You can do this kind of thing with other targets or other compilers; there is no reason to assume that one target has the same convention as another. For example:
Disassembly of section .text:
00000000 <fun>:
0: 0f 5e add r14, r15
2: 0f ed xor r13, r15
4: 30 41 ret
0000000000000000 <fun>:
0: 8d 04 37 lea (%rdi,%rsi,1),%eax
3: 31 d0 xor %edx,%eax
5: c3 retq
and this one is stack based instead of register based
Disassembly of section .text:
00000000 <_fun>:
0: 1166 mov r5, -(sp)
2: 1185 mov sp, r5
4: 1d41 0004 mov 4(r5), r1
8: 6d41 0006 add 6(r5), r1
c: 1d40 0008 mov 10(r5), r0
10: 7840 xor r1, r0
12: 1585 mov (sp)+, r5
14: 0087 rts pc
But if this is just a pure assembly project and you don't have to interface with compiled output, do whatever you want. Part of designing the project is not just each individual function but how they interact; no different from C or Python or some other language, you still have to define the interface between functions for yourself. Assembly doesn't make that special or different, it's just another language.
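Tying this back to the original question: a hedged C sketch of the discriminant (my own illustration, not the asker's prog3.s). Per the convention above, a, b, and c arrive in r0, r1, and r2, and the result comes back in r0, which is exactly how driver.s should hand values to discrim and read the answer back:

/* Discriminant of a quadratic: b*b - 4*a*c.
   Under the ARM convention: a in r0, b in r1, c in r2, result in r0. */
int discrim(int a, int b, int c)
{
    return b * b - 4 * a * c;
}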
