I saw the following piece of code in an opensource AAC decoder,
static void flt_round(float32_t *pf)
{
int32_t flg;
uint32_t tmp, tmp1, tmp2;
tmp = *(uint32_t*)pf;
flg = tmp & (uint32_t)0x00008000;
tmp &= (uint32_t)0xffff0000;
tmp1 = tmp;
/* round 1/2 lsb toward infinity */
if (flg)
{
tmp &= (uint32_t)0xff800000; /* extract exponent and sign */
tmp |= (uint32_t)0x00010000; /* insert 1 lsb */
tmp2 = tmp; /* add 1 lsb and elided one */
tmp &= (uint32_t)0xff800000; /* extract exponent and sign */
*pf = *(float32_t*)&tmp1 + *(float32_t*)&tmp2 - *(float32_t*)&tmp;
} else {
*pf = *(float32_t*)&tmp;
}
}
In that the line,
*pf = *(float32_t*)&tmp;
is same as,
*pf = (float32_t)tmp;
Isn't it?
Or is there a difference? Maybe in performance?
Thank you.
No, they're completely different. Say the value of tmp is 1. Their code will give *pf the value of whatever floating point number has the same binary representation as the integer 1. Your code would give it the floating point value 1.0!
This code is editing the value of a float knowing it is formatted using the standard IEEE 754 floating representation.
*(float32_t*)&tmp;
means reinterpret the address of temp as being a pointer on a 32 bit float, extract the value pointed.
(float32_t)tmp;
means cast the integer to float 32. Which means 32.1111f may well produce 32.
Very different.
The first causes the bit pattern of tmp to be reinterpreted as a float.
The second causes the numerical value of tmp to be converted to float (within the accuracy that it can be represented including rounding).
Try this:
int main(void) {
int32_t n=1078530011;
float32_t f;
f=*(float32_t*)(&n);
printf("reinterpet the bit pattern of %d as float - f==%f\n",n,f);
f=(float32_t)n;
printf("cast the numerical value of %d as float - f==%f\n",n,f);
return 0;
}
Example output:
reinterpet the bit pattern of 1078530011 as float - f==3.141593
cast the numerical value of 1078530011 as float - f==1078530048.000000
It's like thinking that
const char* str="3568";
int a=*(int*)str;
int b=atoi(str);
Will assign a and b the same values.
First to answer the question, my_float = (float)my_int safely converts the integer to a float according to the rules of the standard (6.3.1.4).
When a value of integer type is converted to a real floating type, if
the value being converted can be represented exactly in the new type,
it is unchanged. If the value being converted is in the range of
values that can be represented but cannot be represented exactly, the
result is either the nearest higher or nearest lower representable
value, chosen in an implementation-defined manner. If the value being
converted is outside the range of values that can be represented, the
behavior is undefined.
my_float = *(float*)&my_int on the other hand, is a dirty trick, telling the program that the binary contents of the integer should be treated as if they were a float variable, with no concerns at all.
However, the person who wrote the dirty trick was probably not aware of it leading to undefined behavior for another reason: it violates the strict aliasing rule.
To fix this bug, you either have to tell your compiler to behave in a non-standard, non-portable manner (for example gcc -fno-strict-aliasing), which I don't recommend.
Or preferably, you rewrite the code so that it doesn't rely on undefined behavior. Best way is to use unions, for which strict aliasing doesn't apply, in the following manner:
typedef union
{
uint32_t as_int;
float32_t as_float;
} converter_t;
uint32_t value1, value2, value3; // do something with these variables
*pf = (converter_t){value1}.as_float +
(converter_t){value2}.as_float -
(converter_t){value3}.as_float;
Also it is good practice to add the following sanity check:
static_assert(sizeof(converter_t) == sizeof(uint32_t),
"Unexpected padding or wrong type sizes!");
Related
I have a very basic question. In my program, i am doing multiplication of two fixed point numbers, which is given below. My inputs are of Q1.31 format and output also should be of same format. In order to do this, i am storing the result of multiplication in a temporary 64 bit variable and then doing some operations to get the result in required format.
int conversion1(float input, int Q_FORMAT)
{
return ((int)(input * ((1 << Q_FORMAT)-1)));
}
int mul(int input1, int input2, int format)
{
__int64 result;
result = (__int64)input1 * (__int64)input2;//Q2.62 format
result = result << 1;//Q1.63 format
result = result >> (format + 1);//33.31 format
return (int)result;//Q1.31 format
}
int main()
{
int Q_FORMAT = 31;
float input1 = 0.5, input2 = 0.5;
int q_input1, q_input2;
int temp_mul;
float q_muls;
q_input1 = conversion1(input1, Q_FORMAT);
q_input2 = conversion1(input2, Q_FORMAT);
q_muls = ((float)temp_mul / ((1 << (Q_FORMAT)) - 1));
printf("result of multiplication using q format = %f\n", q_muls);
return 0;
}
My question is while converting float input to integer input (and also while converting int output
to float output), i am using (1<<Q_FORMAT)-1 format. But i have seen people using (1<<Q_FORMAT)
directly in their codes. The Problem i am facing when using (1<<Q_FORMAT) is i am getting the
negative of the desired result.
For example, in my program,
If i use (1<<Q_FORMAT), i am getting -0.25 as the result
But, if i use (1<<Q_FORMAT)-1, i am getting 0.25 as the result which is correct.
Where am i going wrong? Do i need to understand any other concepts?
On common platforms, int is a two’s complement 32-bit integer providing 31 digits (plus a 'sign' bit). It's a bit too narrow to represent a Q1.31 number which requires 32 digits (plus a 'sign' bit).
In your example, this is manifesting as effective arithmetic overflow in the expression, 1 << Q_FORMAT.
To avoid this, you need to either use a type providing more digits (e.g. long long) or a fixed-point format requiring fewer digits (e.g. Q1.30). You can use unsigned to fix your example but the result will be a 'sign' bit short of Q2.30.
I'm trying to interface a board with a raspberry.
I have to read/write value to the board via modbus, but I can't write floating point value like the board.
I'm using C, and Eclipse debug perspective to see the variable's value directly.
The board send me 0x46C35000 which should value 25'000 Dec but eclipse shows me 1.18720512e+009...
When I try on this website http://www.binaryconvert.com/convert_float.html?hexadecimal=46C35000 I obtain 25,000.
What's the problem?
For testing purposes I'm using this:
int main(){
while(1){ // To view easily the value in the debug perspective
float test = 0x46C35000;
printf("%f\n",test);
}
return 0;
}
Thanks!
When you do this:
float test = 0x46C35000;
You're setting the value to 0x46C35000 (decimal 1187205120), not the representation.
You can do what you want as follows:
union {
uint32_t i;
float f;
} u = { 0x46C35000 };
printf("f=%f\n", u.f);
This safely allows an unsigned 32-bit value to be interpreted as a float.
You’re confusing logical value and internal representation. Your assignments sets the value, which is thereafter 0x46C35000, i.e. 1187205120.
To set the internal representation of the floating point number you need to make a few assumptions about how floating point numbers are represented in memory. The assumptions on the website you’re using (IEEE 754, 32 bit) are fair on a general purpose computer though.
To change the internal representation, use memcpy to copy the raw bytes into the float:
// Ensure our assumptions are correct:
#if !defined(__STDC_IEC_559__) && !defined(__GCC_IEC_559)
# error Floating points might not be in IEEE 754/IEC 559 format!
#endif
_Static_assert(sizeof(float) == sizeof(uint32_t), "Floats are not 32 bit numbers");
float f;
uint32_t rep = 0x46C35000;
memcpy(&f, &rep, sizeof f);
printf("%f\n", f);
Output: 25000.000000.
(This requires the header stdint.h for uint32_t, and string.h for memcpy.)
The constant 0x46C35000 being assigned to a float will implicitly convert the int value 1187205120 into a float, rather than directly overlay the bits into the IEEE-754 floating point format.
I normally use a union for this sort of thing:
#include <stdio.h>
typedef union
{
float f;
uint32_t i;
} FU;
int main()
{
FU foo;
foo.f = 25000.0;
printf("%.8X\n", foo.i);
foo.i = 0x46C35000;
printf("%f\n", foo.f);
return 0;
}
Output:
46C35000
25000.000000
You can understand how data are represented in memory when you access them through their address:
#include <stdio.h>
int main()
{
float f25000; // totally unused, has exactly same size as `int'
int i = 0x46C35000; // put binary value of 0x46C35000 into `int' (4 bytes representation of integer)
float *faddr; // pointer (address) to float
faddr = (float*)&i; // put address of `i' into `faddr' so `faddr' points to `i' in memory
printf("f=%f\n", *faddr); // print value pointed bu `faddr'
return 0;
}
and the result:
$ gcc -of25000 f25000.c; ./f25000
f=25000.000000
What it does is:
put 0x46C35000 into int i
copy address of i into faddr, which is also address that points data in memory, in this case of float type
print value pointed by faddr; treat it as float type
you get your 25000.0.
I cannot figure out how to convert the value of a referenced float pointer when it is referenced from an integer casted into a float pointer. I'm sorry if I'm wording this incorrectly. Here is an example of what I mean:
#include <stdio.h>
main() {
int i;
float *f;
i = 1092616192;
f = (float *)&i;
printf("i is %d and f is %f\n", i, *f);
}
the output for f is 10. How did I get that result?
Normally, the value of 1092616192 in hexadecimal is 0x41200000.
In floating-point, that will give you:
sign = positive (0b)
exponent = 130, 2^3 (10000010b)
significand = 2097152, 1.25 (01000000000000000000000b)
2^3*1.25
= 8 *1.25
= 10
To explain the exponent part uses an offset encoding, so you have to subtract 127 from it to get the real value. 130 - 127 = 3. And since this is a binary encoding, we use 2 as the base. 2 ^ 3 = 8.
To explain the significand part, you start with an invisible 'whole' value of 1. the uppermost (leftmost) bit is half of that, 0.5. The next bit is half of 0.5, 0.25. Because only the 0.25 bit and the default '1' bit is set, the significand represents 1 + 0.25 = 1.25.
What you are trying to do is called type-punning. It should be done via a union, or using memcpy() and is only meaningful on an architecture where sizeof(int) == sizeof(float) without padding bits. The result is highly dependent on the architecture: byte ordering and floating point representation will affect the reinterpreted value. The presence of padding bits would invoke undefined behavior as the representation of float 15.0 could be a trap value for type int.
Here is how you get the number corresponding to 15.0:
#include <stdio.h>
int main(void) {
union {
float f;
int i;
unsigned int u;
} u;
u.f = 15;
printf("re-interpreting the bits of float %.1f as int gives %d (%#x in hex)\n",
u.f, u.i, u.u);
return 0;
}
output on an Intel PC:
re-interpreting the bits of float 15.0 as int gives 1097859072 (0x41700000 in hex)
You are trying to predict the consequence of an undefined activity - it depends on a lot of random things, and on the hardware and OS you are using.
Basically, what you are doing is throwing a glass against the wall and getting a certain shard. Now you are asking how to get a differently formed shard. well, you need to throw the glass differently against the wall...
Here's the code:
#include <stdio.h>
union
{
unsigned u;
double d;
} a,b;
int main(void)
{
printf("Enter a, b:");
scanf("%lf %lf",&a.d,&b.d);
if(a.d>b.d)
{
a.u^=b.u^=a.u^=b.u;
}
printf("a=%g, b=%g\n",a.d,b.d);
return 0;
}
The a.u^=b.u^=a.u^=b.u; statement should have swapped a and b if a>b, but it seems that whatever I enter, the output will always be exactly my input.
a.u^=b.u^=a.u^=b.u; causes undefined behaviour by writing to a.u twice without a sequence point. See here for discussion of this code.
You could write:
unsigned tmp;
tmp = a.u;
a.u = b.u;
b.u = tmp;
which will swap a.u and b.u. However this may not achieve the goal of swapping the two doubles, if double is a larger type than unsigned on your system (a common scenario).
It's likely that double is 64 bits, while unsigned is only 32 bits. When you swap the unsigned members of the unions, you're only getting half of the doubles.
If you change d to float, or change u to unsigned long long, it will probably work, since they're likely to be the same size.
You're also causing UB by writing to the variables twice without a sequence point. The proper way to write the XOR swap is with multiple statements.
b.u ^= a.u;
a.u ^= b.u;
b.u ^= a.u;
For more about why not to use XOR for swapping, see Why don't people use xor swaps?
In usual environment, memory size of datatype 'unsigned' and 'double' are different.
That is why variables are not look like changed.
And you cannot using XOR swap on floating point variable.
because they are represented totally different in memory.
I'm using a function (Borrowing code from: http://www.exploringbinary.com/converting-floating-point-numbers-to-binary-strings-in-c/) to convert a float into binary; stored in a char. I need to be able to perform bitwise operations on the result though, so I've been trying to find a way to take the string and convert it to an integer so that I can shift the bits around as needed. I've tried atoi() but that seems to return -1.
Thus far, I have:
char binStringRaw[FP2BIN_STRING_MAX];
float myfloat;
printf("Enter a floating point number: ");
scanf("%f", &myfloat);
int castedFloat = (*((int*)&myfloat));
fp2bin(castedFloat, binStringRaw);
Where the input is "12.125", the output of binStringRaw is "10000010100001000000000000000000". However, attempting to perform a bitwise operation on this give an error: "Invalid operands to binary expression ('char[1077]' and 'int')".
P.S. - I apologize if this is a simple question or if there are some general problems with my code. I'm very new to C programming coming from Python.
"castedFloat already is the binary representation of the float, as the cast-operation tells it to interpret the bits of myfloat as bits of an integer instead of a float. "
EDIT: Thanks to Eric Postpischil:
Eric Postpischil in Comments:
"the above is not guaranteed by the C standard. Dereferencing a
converted pointer is not fully specified by the standard. A proper way
to do this is to use a union: int x = (union { float f; int i; }) {
myfloat } .i;. (And one must still ensure that int and float are the
same size in the C implementation being used.)"
Bitwise operations are only defined for Integer-type values, such as char, int, long, ..., thats why it fails when using them on the string (char-array)
btw,
int atoi(char*)
returns the integer-value of a number written inside that string, eg.
atoi("12")
will return an integer with value 12
If you would want to convert the binary representation stored in a string, you have to set the integer bit by bit corresponding to the chars, a function to do this could look like that:
long intFromBinString(char* str){
long ret=0; //initialize returnvalue with zero
int i=0; //stores the current position in string
while(str[i] != 0){ //in c, strings are NULL-terminated, so end of string is 0
ret<<1; //another bit in string, so binary shift resutl-value
if(str[i] == '1') //if the new bit was 1, add that by binary or at the end
ret || 0x01;
i++; //increment position in string
}
return ret; //return result
}
The function fp2bin needs to get a double as parameter. if you call it with castedFloat, the (now interpreted as an integer)value will be implicitly casted to float, and then pass it on.
I assume you want to get a binary representation of the float, play some bitwise ops on it, and then pass it on.
In order to do that you have to cast it back to float, the reverse way you did before, so
int castedFloat = (*((int*)&myfloat));
{/*** some bitwise magic ***/}
float backcastedFloat = (*(float*)&castedFloat);
fp2bin(castedFloat, binStringRaw);
EDIT:(Thanks again, Eric):
union bothType { float f; int i; }) both;
both.f = myfloat;
{/*** some bitwise magic on both.i ***/}
fp2bin(both.f, binStringRaw);
should work