Getting the exponent of a floating point number in C

Sorry if this has already been asked, and I've seen other ways of extracting the exponent of a floating point number, however this is what was given to me:
unsigned f2i(float f)
{
    union {
        unsigned i;
        float f;
    } x;
    x.i = 0;
    x.f = f;
    return x.i;
}
I'm having trouble understanding this union datatype, because shouldn't the return x.i at the end always make f2i return a 0?
Also, what application could this data type even be useful for? For example, say I have a function:
int getexponent(float f){
}
This function is supposed to get the exponent value of the floating point number with a bias of 127. I've found many ways to make this possible, however, how could I manipulate the f2i function to serve this purpose?
I appreciate any pointers!
Update!!
Wow, years later and this just seems trivial.
For those who may be interested, here is the function!
int getexponent(float f) {
    unsigned f2u(float f);
    unsigned int ui = (f2u(f) >> 23) & 0xff; // shift right by 23 and mask with 0xff to get the biased exponent
    int bias = 127;                          // exponent bias
    if (ui == 0) return 1 - bias;            // special case: zero/denormals
    else if (ui == 255) return 11111111;     // special case: infinity/NaN
    return ui - bias;
}
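(The f2u() helper referenced above isn't shown; presumably it is the same union trick as the f2i() in the question, i.e. something along these lines, assuming float and unsigned are both 32 bits:)
unsigned f2u(float f) {
    union {
        unsigned u;
        float f;
    } x;
    x.f = f;
    return x.u;   /* the raw bit pattern of f */
}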

I'm having trouble understanding this union datatype
The union data type is a way for a programmer to indicate that some variable can be one of a number of different types. The wording of the C11 standard is something like "a union contains at most one of its members". It is used for things like parameters that may be logically one thing or another. For example, an IP address might be an IPv4 address or an IPv6 address so you might define an address type as follows:
struct IpAddress
{
    bool isIPv6;
    union
    {
        uint8_t v4[4];
        uint8_t v6[16];
    } bytes;
};
And you would use it like this:
struct IpAddress address = // Something
if (address.isIPv6)
{
    doSomeV6ThingWith(address.bytes.v6);
}
else
{
    doSomeV4ThingWith(address.bytes.v4);
}
Historically, unions have also been used to get the bits of one type into an object of another type. This is because, in a union, the members all start at the same memory address. If I just do this:
float f = 3.0;
int i = f;
The compiler will insert code to convert a float to an integer, so the exponent will be lost. However, in
union
{
    unsigned int i;
    float f;
} x;

x.f = 3.0;
int i = x.i;
i now contains the exact bits that represent 3.0 in a float. Or at least you hope it does. There's nothing in the C standard that says float and unsigned int have to be the same size. There's also nothing in the C standard that mandates a particular representation for float (well, annex F says floats conform to IEC 60559 , but I don't know if that counts as part of the standard). So the above code is, at best, non portable.
To get the exponent of a float, the portable way is the frexpf() function defined in <math.h>.
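For instance (a sketch of my own; note that frexpf() normalizes the mantissa into [0.5, 1), so the exponent it reports is one more than the unbiased IEEE exponent):
#include <math.h>
#include <stdio.h>

int main(void) {
    int exp;
    float mantissa = frexpf(48.0f, &exp);   /* 48.0 = 0.75 * 2^6 */
    printf("%f * 2^%d\n", mantissa, exp);   /* prints 0.750000 * 2^6 */
    return 0;
}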
how could I manipulate the f2i function to serve this purpose?
Let's make the assumption that a float is stored in IEC 60559 format in 32 bits, which Wikipedia thinks is the same as IEEE 754. Let's also assume that integers are stored in little endian format.
union
{
    uint32_t i;
    float f;
} x;

x.f = someFloat;
uint32_t bits = x.i;
bits now contains the bit pattern of the floating point number. A single precision floating point number looks like this
SEEEEEEEEMMMMMMMMMMMMMMMMMMMMMMM
^        ^                     ^
bit 31   bit 22            bit 0
Where S is the sign bit, E is an exponent bit, M is a mantissa bit.
So having got your uint32_t you just need to do some shifting and masking:
uint32_t exponentWithBias = (bits >> 23) & 0xff;

Because it's a union, x.i and x.f have the same address, which allows you to reinterpret one data type as another. In this scenario the union is first zeroed out by x.i = 0; and then filled with f. Then x.i is returned, which is the integer representation (the raw bits) of the float f. If you then shift and mask that value you can get the exponent of the original f, because of the way a float is laid out in memory.

I'm having trouble understanding this union datatype, because shouldn't the return x.i at the end always make f2i return a 0?
The line x.i = 0; is a bit paranoid and shouldn't be necessary. Given that unsigned int and float are both 32 bits, the union creates a single chunk of 32 bits in memory, which you can access either as a float or as the pure binary representation of that float, which is what the unsigned is for. (It would have been better to use uint32_t.)
This means that the lines x.i = 0; and x.f = f; write to the very same memory area twice.
What you end up with after the function is the pure binary representation of the float. Parsing out the exponent or any other part from there is very much implementation-defined, since it depends on the floating point format and endianness. How to represent FLOAT number in memory in C might be helpful.

That union trick is strongly discouraged, as it is strongly architecture dependent and compiler implementation dependent... both things make it almost impossible to determine a correct, portable way to obtain the information you request.
There are portable ways of doing this, and they involve computing a base-ten logarithm. The integer part of log10(x) gives you the number you want:
int power10 = (int)log10(x);
(log10() is declared in <math.h>; it is equivalent to
double log10(double x)
{
    return log(x)/log(10.0);
}
in case you ever need to write it yourself.)
This is the power of ten by which the mantissa must be multiplied to get the number; if you divide the original number by ten raised to that power, you get the mantissa.
Be careful: floating point numbers are normally stored internally in base two, which means the exponent that is actually stored is a power of two, not a power of ten.
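A minimal sketch of my own putting that together for a positive value (using floor() so it also behaves for values below one), alongside the base-two exponent from frexp() for comparison:
#include <math.h>
#include <stdio.h>

int main(void) {
    double x = 1234.5;

    int power10 = (int)floor(log10(x));       /* decimal exponent: 3 */
    double mantissa = x / pow(10, power10);   /* 1.2345 */
    printf("%g = %g * 10^%d\n", x, mantissa, power10);

    int power2;
    frexp(x, &power2);                        /* x = m * 2^power2, m in [0.5, 1) */
    printf("stored (binary) exponent is based on 2: %d\n", power2);  /* 11 */
    return 0;
}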

Related

Is it defined what will happen if you shift a float?

I am following This video tutorial to implement a raycaster. It contains this code:
if(ra > PI) { ry = (((int)py>>6)<<6)-0.0001; rx=(py-ry)*aTan+px; yo=-64; xo=-yo*aTan; }//looking up
I hope I have transcribed this correctly. In particular, my question is about casting py (it's declared as float) to integer, shifting it back and forth, subtracting something, and then assigning it to ry (also a float). This line of code is entered at time 7:24, where he also explains that he wants to
round the y position to the nearest 64th value
(I'm unsure if that means the nearest multiple of 64 or the nearest (1/64), but I know that the 6 in the source is derived from the number 64, being 2⁶)
For one thing, I think that it would be valid for the compiler to load (say) a 32-bit float into a machine register, and then shift that value down by six spaces, and then shift it back up by six spaces (these two operations could interfere with the mantissa, or the exponent, or maybe something else, or these two operations could be deleted by a peephole optimisation step.)
Also I think it would be valid for the compiler to make demons fly out of your nose when this statement is executed.
So my question is, is (((int)py>>6)<<6) defined in C when py is float?
is (((int)py>>6)<<6) defined in C when py is float?
It is certainly undefined behavior (UB) for many float values. The cast to an int is UB for any float whose value, after truncating the fraction, falls outside the [INT_MIN ... INT_MAX] range.
So the code is UB for about 38% of all typical float values: the large-magnitude ones, NaNs and infinities.
For typical float, a cast to a 128-bit integer type such as int128_t would be defined for nearly all float values.
To get to OP's goal, code could use the below, which I believe to be well defined for all float.
If anything, use the below to assess the correctness of one's crafted code.
#include <math.h>

// round the y position to the nearest 64th value
float round_to_64th(float x) {
    if (isfinite(x)) {
        float ipart;
        // modff() breaks the argument into integral and fractional parts
        float frac = modff(x, &ipart);
        x = ipart + roundf(frac * 64) / 64;
    }
    return x;
}
"I'm unsure if that means the nearest multiple of 64 or the nearest (1/64)"
On review, OP's code is attempting to truncate to the nearest multiple of 64 or 2⁶.
It is still UB for many float.
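A quick numerical check of round_to_64th() above (my own values):
#include <stdio.h>
/* assumes round_to_64th() from above is defined in this file */
int main(void) {
    printf("%f\n", round_to_64th(10.3f));   /* 10.296875, i.e. 10 + 19/64 */
    return 0;
}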
That code doesn't shift a float because the bitshift operators aren't defined for floating-point types. If you try it you will get a compiler error.
Notice that the code is (int)py >> 6, the float is cast to an int before the shift operation. The integer value is what is being shifted.
If your question is "what will happen if you shift a float?", the answer is it won't compile. Example on Compiler Explorer.
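For illustration, a minimal reproduction of my own (the exact diagnostic wording varies by compiler):
int main(void) {
    float py = 1.0f;
    int ry = py >> 6;   /* error: invalid operands to binary >> (have 'float' and 'int') */
    return ry;
}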
The best possible recreation of the shift operators for floating point values, in short, without using additional library functions, is the following:
Left shift:
ShiftFloat(py,6,1);
Right shift:
ShiftFloat(py,6,0);
float ShiftFloat(float x, int count, int ismultiplication)
{
    float value = x;
    for (int i = 0; i < count; ++i)
    {
        /* multiply by 2 per step for a "left shift", divide by 2 for a "right shift" */
        value *= ismultiplication ? 2.0f : 0.5f;
    }
    return value;
}
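(Not part of the original answer: the standard library can do this scaling in one call with ldexpf() from <math.h>, for example:)
#include <math.h>
#include <stdio.h>

int main(void) {
    float py = 100.0f;
    printf("%f\n", ldexpf(py, 6));   /* 6400.0: py * 2^6, the "left shift"  */
    printf("%f\n", ldexpf(py, -6));  /* 1.5625: py / 2^6, the "right shift" */
    return 0;
}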

get the float number's exponents [duplicate]

This question already has answers here:
How to get the sign, mantissa and exponent of a floating point number
(7 answers)
Closed 6 years ago.
I just started learning floating point and got to know the sign/mantissa/exponent (SME) stuff. I'm still very confused about the mantissa... Can somebody explain to me how I can get the exponent part of a float? I am sorry if this is a super stupid and basic question, but I am having a hard time understanding it...
Also how do I implement the following function... clearly my implementation is wrong. But how do I do it?
// Extract the 8-bit exponent field of single precision
// floating point number f and return it as an unsigned byte
unsigned char get_exponent_field(float f)
{
    // TODO: Your code here.
    int bias = 127;
    int expp = (int)f;
    unsigned char E = expp - bias;
    return E;
}
If you want to extract the IEEE-754 single precision exponent from a float value (in excess-127 notation), you can use the standard floating point functions (such as frexpf()), or you can use a simple union with a shift and mask to do the same:
unsigned float_getexp (float f)
{
    union {
        unsigned u;
        float f;
    } uf;

    uf.f = f;

    return (uf.u >> 23) & 0xff;
}
If you want the actual (unbiased) exponent, i.e. the number of places the binary point is shifted during normalization prior to hidden bit removal, just subtract 127 from the value returned, or, if you want that value returned instead, subtract it before the return.
Give it a try and let me know if you have questions. (note: the type should be unsigned for your exponent, instead of the int you have).
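A quick check of float_getexp() above (my own numbers, assuming IEEE 754 single precision):
#include <stdio.h>
/* assumes float_getexp() from above is defined in this file */
int main(void) {
    printf("%u\n", float_getexp(1.0f));   /* 127: unbiased exponent  0 */
    printf("%u\n", float_getexp(8.0f));   /* 130: unbiased exponent  3 */
    printf("%u\n", float_getexp(0.5f));   /* 126: unbiased exponent -1 */
    return 0;
}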
First, take your floating-point number and work out its binary form by converting the integral and fractional parts separately. Once you've got that, say you've got 11010.101 (base 2), normalize the binary string: 1.1010101 x 2^4. Next, add your excess (bias) value, say excess 15, to the exponent of the normalized value, which gives 19 (base ten). Convert this to base two; this will be your exponent field.
This is just the structure of the operation; plug in your own bias, etc.

I don't know how to convert 16 byte hexadecimal to floating point

I have been trying to convert this for quite a while and wandering the internet solely for the answer to this question, but I could not find it. All I learned is that I can convert hexadecimal to decimal either by some serious programming or manually through math.
I am looking to convert a 16-byte hexadecimal value to floating point. If there is any way to do that, then please share. I have searched and found IEEE 754, which either is not working for me or I am not comprehending it. Can I do it manually through some equation (I think I heard about one)? Or is there a neat C program which can do it?
Please help! Any help would be highly appreciated.
You need to study the IEEE floating point spec.
This would be quite straightforward in Java. You have handy methods like Float.floatToRawIntBits(float x) and Float.intBitsToFloat(int x)
You might be able to do it with a union.
In C it's a bit more hacky. You can abuse a union. Unions in C reuse the same memory for two different variables. A union like
union DoubleLong {
    long l;
    double d;
} u;
would allow you to treat the same bit of memory as either a long u.l or a double u.d. They are both 8 bytes, so they take up the same space. So doing u.d = M_PI; printf("%lx\n", u.l); prints the binary representation of pi, 0x400921fb54442d18.
For 16 bytes we need the union to have an array of two 8-byte longs.
#include <stdio.h>

union Data {
    long i[2];
    long double f;
} u;

int main(int argc, char const *argv[]) {
    // Using random IP6 address 2602:306:cecd:7130:5421:a679:6d71:a660
    // Store in two separate 8-byte longs
    u.i[0] = 0x2602306cecd7130;
    u.i[1] = 0x5421a6796d71a660;
    // Print out in hexadecimal
    printf("%.15La %lx %lx\n", u.f, u.i[0], u.i[1]);
    // Print out in decimal
    printf("%.15Le %ld %ld\n", u.f, u.i[0], u.i[1]);
    return 0;
}
One problem is that 16-byte floating point numbers might not be defined on your system. float is typically 32 bits - 4 bytes, double is 64 bits - 8 bytes. There is a long double type, but on my mac it's only 80 bits - 10 bytes. It might be simpler to convert to two double precision numbers. So on my system only the last 4 hexadecimal digits of the second number are significant.
Not all hexadecimal numbers correspond to valid floating point numbers; a lot of values will correspond to NaNs. If the higher bits are 7FFF or FFFF (or 7FF, FFF for double), the result will be either infinity or NaN.

A small program for understanding unions in C [duplicate]

Suppose I define a union like this:
#include <stdio.h>
int main() {
    union u {
        int i;
        float f;
    };
    union u tst;
    tst.f = 23.45;
    printf("%d\n", tst.i);
    return 0;
}
Can somebody tell me what the memory where tst is stored will look like?
I am trying to understand the output 1102813594 that this program produces.
It depends on the implementation (compiler, OS, etc.) but you can use the debugger to actually see the memory contents if you want.
For example, in my MSVC 2008:
0x00415748 9a 99 bb 41
is the memory contents. Read from LSB on the left side (Intel, little-endian machine), this is 0x41bb999a or indeed 1102813594.
Generally, however, the integer and float are stored in the same bytes. Depending on how you access the union, you get the integer or floating point interpretation of those bytes. The size of the memory space, again, depends on the implementation, although it's usually the largest of its constituents aligned to some fixed boundary.
Why is the value what it is in your (or my) case? You should read about floating-point number representation for that (look up IEEE 754).
The result depends on the compiler implementation, but for most x86 compilers, float and int will be the same size. Wikipedia has a pretty good diagram of the layout of a 32-bit float (http://en.wikipedia.org/wiki/Single_precision_floating-point_format) that can help to explain 1102813594.
If you print out the int as a hex value, it will be easier to figure out.
printf("%x\n", tst.i);
With a union, both variables are stored starting at the same memory location. A float is stored in an IEEE format (IEEE 754, as pointed out by others). It is a sign-magnitude, normalized representation: the mantissa, written in binary, always lies between 1.0 and 10.0 (that is, between 1 and 2 in decimal), and the exponent can be anything.
You are reading the 4 bytes of that representation (you can look up which bits go where in the 32 bits that a float takes up) as an int. So the value basically means nothing and isn't useful as an int. That is, unless you know why you would want to do something like that; usually, a float and int combo like this isn't very useful.
And, strictly speaking, the layout is implementation-defined: the C standard only guarantees the IEEE format when the implementation defines __STDC_IEC_559__ (Annex F), although in practice virtually every current platform uses it.
In a union, the members share the same memory, so we can read the float value back as an integer value.
The floating point format is different from integer storage, so we can see the difference using the union.
For example: if I store the integer value 12 (in 32 bits), we can read that same storage back in floating point format. The float is stored as a sign (1 bit), an exponent (8 bits) and a significand/precision field (23 bits).
I wrote a little program that shows what happens when you preserve the bit pattern of a 32-bit float into a 32-bit integer. It gives you the exact same output you are experiencing:
#include <iostream>
int main()
{
    float f = 23.45;
    int x = *reinterpret_cast<int*>(&f);
    std::cout << x; // 1102813594
}

Arithmetic Bit Shift of Double Variable Data Type in C

I am trying to arithmetic bit shift a double data type in C. I was wondering if this is the correct way to do it:
NOTE: firdelay[ ][ ] is declared in main as
double firdelay[8][12]
void function1(double firdelay[][12]) {
    int *shiftptr;

    // Cast address of element of 2D matrix (type double) to integer pointer
    *shiftptr = (int *) (&firdelay[0][5]);

    // Dereference integer pointer and shift right by 12 bits
    *shiftptr >>= 12;
}
Bitwise shifting a floating point data type will not give you the result you're looking for.
In Simulink, the Shift Arithmetic block only does bit shifting for integer data types. If you feed it a floating point type it divides the input signal by 2^N where N is the number of bits to shift specified in the mask dialog box.
EDIT:
Since you don't have the capability to perform any floating point math, your options are:
1. understand the layout of a single precision floating point number, then figure out how to manipulate it bitwise to achieve division
2. convert whatever algorithm you're porting to use fixed point data types instead of floating point
I'd recommend option 2, it's a whole lot easier than 1.
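For illustration of option 2, a tiny sketch of my own using a made-up Q16.16 fixed point format (raw value = real value * 65536):
#include <stdint.h>
#include <stdio.h>

int main(void) {
    int32_t x = (int32_t)(3.75 * 65536);   /* 3.75 in Q16.16 */
    int32_t y = x >> 2;                    /* arithmetic shift = divide by 4 */
    printf("%f\n", y / 65536.0);           /* 0.9375 */
    return 0;
}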
Bit-shifting a floating-point data type (reinterpreted as an int) will give you gibberish (take a look at the diagrams of the binary representation here to see why).
If you want to multiply/divide by a power of 2, then you should do that explicitly.
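In the context of the question's 12-bit shift, "doing that explicitly" would be something like this (my own sketch):
#include <stdio.h>

int main(void) {
    double x = 123.456;
    printf("%f\n", x / 4096.0);   /* divide by 2^12 instead of "shifting right by 12" */
    return 0;
}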
According to the poorly worded and very unclear documentation, it seems that "bit shifting" in Simulink takes two arguments for floating point values and has the effect of multiplying a floating point value by two raised to the difference of the arguments.
You can use ldexp(double_number, bits_to_pseudo_shift) to obtain this behavior. The function ldexp is found in <math.h>.
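A small usage sketch of my own:
#include <math.h>
#include <stdio.h>

int main(void) {
    double x = 3.0;
    printf("%f\n", ldexp(x, 4));     /* 48.0: 3 * 2^4, a "left shift" by 4 */
    printf("%f\n", ldexp(x, -12));   /* 0.000732421875: 3 / 2^12, a "right shift" by 12 */
    return 0;
}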
There is no correct way to do this. Both operands of << must be of some integer type.
What you're doing is interpreting ("type-punning") a double object as if it were an int object, and then shifting the resulting int value. Even if double and int happen to be the same size, this is very unlikely to do anything useful. (And even if it is useful, it makes more sense to shift unsigned values rather than signed values).
One potential use case for this is for capturing mantissa bits, exponent bits, and sign bit if interested. For this you can use a union:
union doubleBits {
    double d;
    long l;
};
You can take your double and store it in the union:
union doubleBits myUnion;
myUnion.d = myDouble;
And bit shift the long portion of the union to extract the bits, like so:
myUnion.l >>= 1;
Since the number of bits for each portion of the double is defined, this is one way to extract the underlying bit representation. This is one use case where it could be feasible to want to get the raw underlying bits. I'm not familiar with Simulink, but if that is possibly why the double was being shifted in the first place, this could be a way to achieve that behavior in C. The fact that it was always 12 bits makes me think otherwise, but just in case figured it was worth pointing out for others who stumble upon this question.
There's a way to achieve this: just add your n to the exponent part of the bitwise representation of the double.
Convert your double to a long long using "reinterpret" or a bitwise conversion (using a union, for instance), extract bits 52 to 62 (the 11 exponent bits), then add your shift and put the result back into the exponent field.
You should take into account the special values of double (+/- infinity, NaN and zero):
double operator_shift_left(double a, int n)
{
    union
    {
        long long l;
        double d;
    } r;
    r.d = a;

    long long exponent = (r.l >> 52) & 0x7FF;   // stored (biased) exponent

    if (exponent == 0x7FF)                      // +/- infinity and NaN
        return a;
    if ((r.l & 0x7FFFFFFFFFFFFFFFLL) == 0)      // +0 and -0
        return a;

    long long nexp = exponent + n;              // new exponent
    if (nexp <= 0)                              // underflow
    {
        r.l &= ~0x7FFFFFFFFFFFFFFFLL;           // keep only the sign bit
        return r.d;                             // returns +/- 0
    }
    if (nexp >= 0x7FF)                          // overflow
    {
        r.l = (r.l & ~0x7FFFFFFFFFFFFFFFLL) | 0x7FF0000000000000LL;
        return r.d;                             // returns +/- infinity
    }
    // returns the number with the new exponent spliced in
    r.l = (r.l & ~0x7FF0000000000000LL) | (nexp << 52);
    return r.d;
}
(there's probably some x64 processor instruction that does it ?)
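A quick sanity check of operator_shift_left() above (my own numbers):
#include <stdio.h>
/* assumes operator_shift_left() from above is defined in this file */
int main(void) {
    printf("%g\n", operator_shift_left(3.0, 2));    /* 12: 3 * 2^2 */
    printf("%g\n", operator_shift_left(80.0, -4));  /* 5: 80 / 2^4 */
    return 0;
}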
