Converting given mantissa, exponent, and sign to float? - c

I am given the mantissa, exponent, and sign and I have to convert it into the corresponding float. I am using 22 bits for mantissa, 9 bits for exponent, and 1 bit for the sign.
I conceptually know how to convert them into a float: first adjust the exponent back to its place, then convert the resulting number back into a float. But I'm having trouble implementing this in C. I saw this thread, but I couldn't understand the code, and I'm not sure the answer is even right. Can anyone point me in the right direction? I need to code it in C.
Edit: I've made some progress by first converting the mantissa into binary, then adjusting the decimal point of the binary, then converting the decimal-point binary back into the actual float. I based my conversion functions on these two GeeksforGeeks pages (one, two). But it seems like doing all these binary conversions is the long and hard way. The link above apparently does it in very few steps by using the >> operator, but I don't understand exactly how that results in a float.

Here is a program with comments explaining the decoding:
#include <inttypes.h>
#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

// Define constants describing the floating-point encoding.
enum
{
    SignificandBits = 22,   // Number of bits in significand field.
    ExponentBits    =  9,   // Number of bits in exponent field.

    ExponentMaximum = (1 << ExponentBits) - 1,
    ExponentBias    = (1 << (ExponentBits - 1)) - 1,
};

/* Given the contents of the sign, exponent, and significand fields that
   encode a floating-point number following IEEE-754 patterns for binary
   floating-point, return the encoded number.

   "double" is used for the return type as not all values represented by the
   sample format (9 exponent bits, 22 significand bits) will fit in a "float"
   when it is the commonly used IEEE-754 binary32 format.
*/
double DecodeCustomFloat(
    unsigned SignField, uint32_t ExponentField, uint32_t SignificandField)
{
    /* We are given the significand field as an integer, but it is used as
       the value of a binary numeral consisting of “.” followed by the
       significand bits. That value equals the integer divided by 2 to the
       power of the number of significand bits. Define a constant with that
       value to be used for converting the significand field to the
       represented value.
    */
    static const double SignificandRatio = (uint32_t) 1 << SignificandBits;

    /* Decode the sign field:
       If the sign bit is 0, the sign is +, for which we use +1.
       If the sign bit is 1, the sign is -, for which we use -1.
    */
    double Sign = SignField ? -1. : +1.;

    // Dispatch to handle the different categories of exponent field.
    switch (ExponentField)
    {
        /* When the exponent field is all ones, the value represented is a
           NaN or infinity:
           If the significand field is zero, it is an infinity.
           Otherwise, it is a NaN. In either case, the sign should be
           preserved.
           Note this is a simple demonstration implementation that does not
           preserve the bits in the significand field of a NaN -- we just
           return the generic NAN without attempting to set its significand
           bits.
        */
        case ExponentMaximum:
        {
            return Sign * (SignificandField ? NAN : INFINITY);
        }

        /* When the exponent field is not all zeros or all ones, the value
           represented is a normal number:
           The exponent represented is ExponentField - ExponentBias, and
           the significand represented is the value given by the binary
           numeral “1.” followed by the significand bits.
        */
        default:
        {
            int Exponent = ExponentField - ExponentBias;
            double Significand = 1 + SignificandField / SignificandRatio;
            return Sign * ldexp(Significand, Exponent);
        }

        /* When the exponent field is zero, the value represented is subnormal:
           The exponent represented is 1 - ExponentBias, and the
           significand represented is the value given by the binary
           numeral “0.” followed by the significand bits.
        */
        case 0:
        {
            int Exponent = 1 - ExponentBias;
            double Significand = 0 + SignificandField / SignificandRatio;
            return Sign * ldexp(Significand, Exponent);
        }
    }
}

/* Test that a given set of fields decodes to the expected value and
   print the fields and the decoded value.
*/
static void Demonstrate(
    unsigned SignField, uint32_t ExponentField, uint32_t SignificandField,
    double Expected)
{
    double Observed
        = DecodeCustomFloat(SignField, ExponentField, SignificandField);

    if (! (Observed == Expected) && ! (isnan(Observed) && isnan(Expected)))
    {
        fprintf(stderr,
            "Error, expected (%u, %" PRIu32 ", %" PRIu32 ") to represent "
            "%g (hexadecimal %a) but got %g (hexadecimal %a).\n",
            SignField, ExponentField, SignificandField,
            Expected, Expected,
            Observed, Observed);
        exit(EXIT_FAILURE);
    }

    printf(
        "(%u, %" PRIu32 ", %" PRIu32 ") represents %g (hexadecimal %a).\n",
        SignField, ExponentField, SignificandField, Observed, Observed);
}

int main(void)
{
    Demonstrate(0,   0, 0,        +0.);
    Demonstrate(1,   0, 0,        -0.);
    Demonstrate(0, 255, 0,        +1.);
    Demonstrate(1, 255, 0,        -1.);
    Demonstrate(0, 511, 0,        +INFINITY);
    Demonstrate(1, 511, 0,        -INFINITY);
    Demonstrate(0, 511, 1,        +NAN);
    Demonstrate(1, 511, 1,        -NAN);
    Demonstrate(0,   0, 1,        +0x1p-276);
    Demonstrate(1,   0, 1,        -0x1p-276);
    Demonstrate(0, 255, 1,        +1. + 0x1p-22);
    Demonstrate(1, 255, 1,        -1. - 0x1p-22);
    Demonstrate(0,   1, 0,        +0x1p-254);
    Demonstrate(1,   1, 0,        -0x1p-254);
    Demonstrate(0, 510, 0x3fffff, +0x1p256 - 0x1p233);
    Demonstrate(1, 510, 0x3fffff, -0x1p256 + 0x1p233);
}
Some notes:
ldexp is a standard C library function. ldexp(x, e) returns x multiplied by 2 to the power of e.
uint32_t is an unsigned 32-bit integer type. It is defined in stdint.h.
"%" PRIu32 provides a printf conversion specification for formatting a uint32_t.

Here is a simple program to illustrate how to break a float into its components and how to compose a float value from a (sign, exponent, mantissa) triplet:
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

void dumpbits(uint32_t bits, int n) {
    while (n--)
        printf("%d%c", (bits >> n) & 1, ".|"[!n]);
}

int main(int argc, char *argv[]) {
    unsigned sign = 0;
    unsigned exponent = 127;
    unsigned long mantissa = 0;
    union {
        float f32;
        uint32_t u32;
    } u;

    if (argc == 2) {
        u.f32 = strtof(argv[1], NULL);
        sign = u.u32 >> 31;
        exponent = (u.u32 >> 23) & 0xff;
        mantissa = (u.u32) & 0x7fffff;
        printf("%.8g -> sign:%u, exponent:%u, mantissa:0x%06lx\n",
               (double)u.f32, sign, exponent, mantissa);
        printf("+s+----exponent---+------------------mantissa-------------------+\n");
        printf("|");
        dumpbits(sign, 1);
        dumpbits(exponent, 8);
        dumpbits(mantissa, 23);
        printf("\n");
        printf("+-+---------------+---------------------------------------------+\n");
    } else {
        if (argc > 1) sign = strtol(argv[1], NULL, 0);
        if (argc > 2) exponent = strtol(argv[2], NULL, 0);
        if (argc > 3) mantissa = strtol(argv[3], NULL, 0);
        u.u32 = (sign << 31) | (exponent << 23) | mantissa;
        printf("sign:%u, exponent:%u, mantissa:0x%06lx -> %.8g\n",
               sign, exponent, mantissa, (double)u.f32);
    }
    return 0;
}
Note that, contrary to your assignment, the mantissa here is 23 bits and the exponent 8 bits, which corresponds to the IEEE 754 standard 32-bit (aka single-precision) float. See the Wikipedia article on Single-precision floating-point format.

The linked question is C++, not C. To convert between datatypes in C while preserving bits, a tool to use is the union. Something like

union float_or_int {
    uint32_t i;
    float f;
};

float to_float(uint32_t mantissa, uint32_t exponent, uint32_t sign)
{
    union float_or_int result;
    result.i = (sign << 31) | (exponent << 22) | mantissa;
    return result.f;
}

Sorry for typos, it's been a while since I've coded in C 😙
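One caveat (my addition, following the approach of the first answer above): because this custom format has a 9-bit exponent and 22-bit mantissa while binary32 has 8 and 23, reinterpreting the packed word as a float will not reproduce the encoded value. A minimal portable sketch, assuming normal numbers only and the bias of 255 used above:

#include <math.h>
#include <stdint.h>

// Hypothetical decoder for the 1/9/22 format; handles normal numbers only,
// no bit reinterpretation needed.
double to_float_portable(uint32_t mantissa, uint32_t exponent, uint32_t sign)
{
    double significand = 1 + mantissa / (double) ((uint32_t) 1 << 22); // "1." + bits
    int unbiased = (int) exponent - 255;  // bias is 2^(9-1) - 1 = 255
    return (sign ? -1.0 : 1.0) * ldexp(significand, unbiased);
}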

Related

How can I convert this number representation to a float?

I read this 16-bit value from a temperature sensor (type MCP9808)
Ignoring the first three MSBs, what's an easy way to convert the other bits to a float?
I managed to convert the values 2^7 through 2^0 to an integer with some bit-shifting:
uint16_t rawBits = readSensor();
int16_t value = (rawBits << 3) / 128;
However I can't think of an easy way to also include the bits with an exponent smaller than 0, except for manually checking if they're set and then adding 1/2, 1/4, 1/8 and 1/16 to the result respectively.
Something like this seems pretty reasonable. Take the number portion, divide by 16, and fix the sign.
float tempSensor(uint16_t value) {
    bool negative = (value & 0x1000);
    return (negative ? -1 : 1) * (value & 0x0FFF) / 16.0f;
}
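As a quick sanity check of that function (my own example, assuming the MCP9808's 1/16 degree scaling): a raw reading of 0x0C5 is 197/16 = 12.3125. This fragment pairs with the tempSensor function above:

#include <stdio.h>

int main(void)
{
    printf("%g\n", tempSensor(0x0C5));  // prints 12.3125
    printf("%g\n", tempSensor(0x10C5)); // sign bit set: prints -12.3125
    return 0;
}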
float convert(unsigned char msb, unsigned char lsb)
{
    return ((lsb | ((msb & 0x0f) << 8)) * ((msb & 0x10) ? -1 : 1)) / 16.0f;
}

or

float convert(uint16_t val)
{
    // The cast truncates the three flag bits shifted out of the low 16 bits.
    return (((val & 0x1000) ? -1 : 1) * (uint16_t)(val << 4)) / 256.0f;
}
If performance isn't a super big deal, I would go for something less clever and more explicit, along the lines of:

bool is_bit_set(uint16_t value, uint16_t bit) {
    uint16_t mask = 1 << bit;
    return (value & mask) == mask;
}

float parse_temperature(uint16_t raw_reading) {
    if (is_bit_set(raw_reading, 15)) { /* temp is above Tcrit. Do something about it. */ }
    if (is_bit_set(raw_reading, 14)) { /* temp is above Tupper. Do something about it. */ }
    if (is_bit_set(raw_reading, 13)) { /* temp is above Tlower. Do something about it. */ }

    uint16_t whole_degrees = (raw_reading & 0x0FF0) >> 4;
    float magnitude = (float) whole_degrees;

    if (is_bit_set(raw_reading, 0)) magnitude += 1.0f/16.0f;
    if (is_bit_set(raw_reading, 1)) magnitude += 1.0f/8.0f;
    if (is_bit_set(raw_reading, 2)) magnitude += 1.0f/4.0f;
    if (is_bit_set(raw_reading, 3)) magnitude += 1.0f/2.0f;

    bool is_negative = is_bit_set(raw_reading, 12);
    // TODO: What do the 3 most significant bits do?
    return magnitude * (is_negative ? -1.0 : 1.0);
}

Honestly this is a lot of simple constant math; I'd be surprised if the compiler wasn't able to heavily optimize it. That would need confirmation, of course.
If your C compiler has a clz builtin or equivalent, it could be useful to avoid a multiply operation.
In your case, as the provided temp value is essentially a mantissa, and if your C compiler uses the IEEE-754 float representation, translating the temp value into its IEEE-754 equivalent may be a more efficient way:
Update: compacted the code a little, with a clearer explanation of the mantissa.

float convert(uint16_t val) {
    uint16_t mantissa = (uint16_t)(val << 4);
    if (mantissa == 0) return 0.0;
    unsigned char e = (unsigned char)(__builtin_clz(mantissa) - 16);
    // Casts to uint32_t keep the shifts into bit 31 out of signed-int range.
    uint32_t r = (uint32_t)(val & 0x1000) << 19 | (uint32_t)(0x86 - e) << 23 | ((mantissa << (e + 8)) & 0x7FFFFF);
    return *((float *)(&r));
}

or

float convert(unsigned char msb, unsigned char lsb) {
    uint16_t mantissa = (uint16_t)((msb << 8 | lsb) << 4);
    if (mantissa == 0) return 0.0;
    unsigned char e = (unsigned char)(__builtin_clz(mantissa) - 16);
    uint32_t r = (uint32_t)(msb & 0x10) << 27 | (uint32_t)(0x86 - e) << 23 | ((mantissa << (e + 8)) & 0x7FFFFF);
    return *((float *)(&r));
}
Explanation:
We use the fact that the temp value is essentially a mantissa in the range -255 to 255.
We can then consider that its IEEE-754 exponent will be at most 128 and at least -128.
We use the clz builtin to get the "order" of the first set bit in the mantissa;
this way we can define the exponent as the theoretical max (2^7 => 128) less this "order".
We also use this order to left-shift the temp value to get the IEEE-754 mantissa,
plus one extra left shift to remove the implied '1' part of the significand for IEEE-754.
Thus we build a 32-bit binary IEEE-754 representation from the temp value with:
First, the sign bit in the 32nd bit of our binary IEEE-754 representation.
Then the computed exponent: the theoretical max 7 (2^7 => 128) plus the IEEE-754 bias (127) minus the actual "order" of the temp value.
The "order" of the temp value is deduced from the number of leading '0' bits of its 12-bit representation in the variable mantissa, via the clz builtin.
Beware that we assume here that the clz builtin expects a 32-bit value as its parameter; that is why we subtract 16. This code may require adaptation if your clz expects anything else.
The number of leading '0' bits can go from 0 (temp value above 128 or under -127) to 11, as we directly return 0.0 for a zero temp value.
As the bit following the "order" is then 1 in the temp value, it is equivalent to a power-of-2 reduction from the theoretical max 7.
Thus, with 7 + 127 => 0x86, we can simply subtract the "order" from that, as the number of leading '0' bits lets us deduce the 'first' base exponent for IEEE-754.
If the "order" is greater than 7, we still get the negative exponent required for values less than 1.
We then add this 8-bit exponent to our binary IEEE-754 representation, from the 24th bit to the 31st bit.
The temp value is already essentially a mantissa; we suppress the leading '0' bits and its first set bit by shifting it to the left (e + 1), while also shifting left by 7 bits to place the mantissa within the 32 bits (e + 7 + 1 => e + 8). We then mask only the desired 23 bits (AND & 0x7FFFFF).
Its first set bit must be removed, as it is the implied '1' of the significand in IEEE-754 (the power of 2 of the exponent).
We then have the IEEE-754 mantissa and place it in bits 8 through 23 of our binary IEEE-754 representation.
The 4 initial trailing '0' bits from our 16-bit temp value and the seven 'right' 0 bits added by the shifting do not change the effective IEEE-754 value.
As we start from a 32-bit value and use the OR operator (|) on a 32-bit exponent and mantissa, we then have the final IEEE-754 representation.
We can then return this binary representation as an IEEE-754 C float value.
Due to the required clz and the IEEE-754 translation, this way is less portable. The main interest is to avoid MUL operations in the resulting machine code, for performance on an arch with a "poor" FPU.
P.S.: explanation of the casts. I've added explicit casts to let the C compiler know that we voluntarily discard some bits:
uint16_t mantissa = (uint16_t)(val << 4); : The cast here tells the compiler that we know we will "lose" the four leftmost bits, as that is the goal here. We discard the first four bits of the temp value for the mantissa.
(unsigned char)(__builtin_clz(mantissa) - 16) : We tell the C compiler that we only need an 8-bit range for the builtin's return value, as we know our mantissa has only 12 significant bits and thus an output range from 0 to 12. We do not need the full int return.
uint32_t r = (uint32_t) ... : We tell the C compiler not to bother with sign representation here, as we are building an IEEE-754 representation.
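To gain confidence in the bit-twiddling, a brute-force comparison against the plain formula from the earlier answer is cheap (my own test sketch; assumes a GCC/Clang __builtin_clz, IEEE-754 floats, and the convert(uint16_t) version above):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    // Exhaustively compare against sign-and-magnitude / 16 for all
    // 13-bit inputs (12 magnitude bits plus the sign bit).
    for (uint16_t val = 0; val < 0x2000; ++val) {
        float expected = ((val & 0x1000) ? -1 : 1) * (val & 0x0FFF) / 16.0f;
        float observed = convert(val);
        if (observed != expected)
            printf("mismatch at 0x%04x: got %g, expected %g\n",
                   (unsigned) val, (double) observed, (double) expected);
    }
    return 0;
}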

How can I obtain a float value from a double, with mantissa?

I'm sorry if I can't explain this correctly; my English is not very good.
Well, the question is: I have a double var, and I cast this var to float because I need to send exactly 4 bytes, not 8. This doesn't work for me, so I decided to calculate the value directly from the IEEE-754 standard.
I have this code:

union DoubleNumberIEEE754 {
    struct {
        uint64_t mantissa : 52;
        uint64_t exponent : 11;
        uint64_t sign : 1;
    } raw;
    double d;
    char c[8];
} dnumber;

floatval = (pow((-1), dnumber.raw.sign) * (1 + dnumber.raw.mantissa) * pow(2, (dnumber.raw.exponent - 1023)));

With this code, I can't obtain the correct value.
I looked at the Linux headers to see the correct order of the components, but I don't know if this code is correct.
I am skeptical that the double-to-float conversion is broken, but, assuming it is:
#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Create a mask of n low bits, for n from 0 to 63.
#define Mask(n) (((uint64_t) 1 << (n)) - 1)

/* This routine converts double values to float values:

   float and double must be IEEE-754 binary32 and binary64, respectively.

   The payloads of NaNs are not preserved, and only a quiet NaN is
   returned.

   The double is converted to the nearest value representable in float,
   with ties rounded to the float with the even low bit in the significand.

   We assume a standard C conversion from double to float is broken for
   unknown reasons but that a conversion from a representable uint32_t to a
   float works.
*/
static float ConvertDoubleToFloat(double x)
{
    // Copy the double into a uint64_t so we can access its representation.
    uint64_t u;
    memcpy(&u, &x, sizeof u);

    // Extract the fields from the representation of a double.
    int SignCode = u >> 63;
    int ExponentCode = u >> 52 & Mask(11);
    uint64_t SignificandCode = u & Mask(52);

    /* Convert the fields to their represented values.

       The sign code merely encodes - or +.

       The exponent code is biased by 1023 from the actual exponent.

       The significand code represents the portion of the significand
       after the radix point. However, since there is some problem
       converting double to float, we will maintain it with an integer
       type, scaled by 2**52 from its represented value.

       The exponent code also represents the portion of the significand
       before the radix point -- 1 if the exponent is non-zero, 0 if the
       exponent is zero. We include that in the significand, scaled by
       2**52.
    */
    float Sign = SignCode ? -1 : +1;
    int Exponent = ExponentCode - 1023;
    uint64_t ScaledSignificand =
        (ExponentCode ? ((uint64_t) 1 << 52) : 0) + SignificandCode;

    // Handle NaNs and infinities.
    if (ExponentCode == Mask(11))
        return Sign * (SignificandCode == 0 ? INFINITY : NAN);

    /* Round the significand:

       If Exponent < -150, all bits of the significand are below 1/2 ULP
       of the least positive float, so they round to zero.

       If -150 <= Exponent < -126, only bits of the significand
       corresponding to exponent -149 remain in the significand, so we
       shift accordingly and round the residue.

       Otherwise, the top 24 bits of the significand remain in the
       significand (except when there is overflow to infinity), so we
       shift accordingly and round the residue.

       Note that the scaling in the new significand is 2**23 instead of 2**52,
       since we are shifting it for the float format.
    */
    uint32_t NewScaledSignificand;
    if (Exponent < -150)
        NewScaledSignificand = 0;
    else
    {
        unsigned Shift = 53 - (Exponent < -126 ? Exponent - -150 : 24);
        NewScaledSignificand = ScaledSignificand >> Shift;

        // Clamp the exponent for subnormals.
        if (Exponent < -126)
            Exponent = -126;

        // Examine the residue being lost and round accordingly.
        uint64_t Residue = ScaledSignificand - ((uint64_t) NewScaledSignificand << Shift);
        uint64_t Half = (uint64_t) 1 << (Shift - 1);

        // If the residue is greater than 1/2 ULP, round up (in magnitude).
        if (Half < Residue)
            NewScaledSignificand += 1;

        /* If the residue is 1/2 ULP, round 0.1 to 0 and 1.1 to 10.0 (these
           numerals are binary with "." marking the ULP position).
        */
        else if (Half == Residue)
            NewScaledSignificand += NewScaledSignificand & 1;

        /* Otherwise, the residue is less than 1/2, and we have already
           rounded down, in the shift.
        */
    }

    // Combine the components, including removing the significand scaling.
    return Sign * ldexpf(NewScaledSignificand, Exponent - 23);
}

static void TestOneSign(double x)
{
    float Expected = x;
    float Observed = ConvertDoubleToFloat(x);
    if (Observed != Expected && !(isnan(Observed) && isnan(Expected)))
    {
        printf("Error, %a -> %a, but expected %a.\n",
               x, Observed, Expected);
        exit(EXIT_FAILURE);
    }
}

static void Test(double x)
{
    TestOneSign(+x);
    TestOneSign(-x);
}

int main(void)
{
    for (int e = -1024; e < 1024; ++e)
    {
        Test(ldexp(0x1.0p0, e));
        Test(ldexp(0x1.4p0, e));
        Test(ldexp(0x1.8p0, e));
        Test(ldexp(0x1.cp0, e));
        Test(ldexp(0x1.5555540p0, e));
        Test(ldexp(0x1.5555548p0, e));
        Test(ldexp(0x1.5555550p0, e));
        Test(ldexp(0x1.5555558p0, e));
        Test(ldexp(0x1.5555560p0, e));
        Test(ldexp(0x1.5555568p0, e));
        Test(ldexp(0x1.5555570p0, e));
        Test(ldexp(0x1.5555578p0, e));
    }
    Test(3.14);
    Test(0);
    Test(INFINITY);
    Test(NAN);
    Test(1/3.);
    Test(0x1p128);
    Test(0x1p128 - 0x1p104);
    Test(0x1p128 - 0x.9p104);
    Test(0x1p128 - 0x.8p104);
    Test(0x1p128 - 0x.7p104);
}

Turn int to IEEE 754, extract exponent, and add 1 to exponent

Problem
I need to multiply a number by two without using the * or + operators or other libraries, only binary logic.
To multiply a number by two using the IEEE norm, you add one to the exponent, for example:
12 = 0 10000010 100000(...)
So the exponent is: 10000010 (130)
If I want to multiply it by 2, I just add 1 to it and it becomes 10000011 (131).
Question
If I get a float, how do I turn it into binary, then into the IEEE norm? Example:
8.0 = 1000.0 in binary; in IEEE form I need only one digit on the left side, so 1.000 * 2^3. Then how do I add one to the exponent to multiply it by 2?
I need to take a float, e.g. 6.5
Turn it to binary: 110.1
Then to IEEE 754: 0 10000001 101000(...)
Extract the exponent: 10000001
Add one to it: 10000010
Return it to IEEE 754: 0 10000010 101000(...)
Then back to float: 13
Given that the C implementation is known to use IEEE-754 basic 32-bit binary floating-point for its float type, the following code shows how to take apart the bits that represent a float, adjust the exponent, and reassemble the bits. Only simple multiplications involving normal numbers are handled.
#include <assert.h>
#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void)
{
    float f = 6.125;

    // Copy the bits that represent the float f into a 32-bit integer.
    uint32_t u;
    assert(sizeof f == sizeof u);
    memcpy(&u, &f, sizeof u);

    // Extract the sign, exponent, and significand fields.
    uint32_t sign = u >> 31;
    uint32_t exponent = (u >> 23) & 0xff;
    uint32_t significand = u & 0x7fffff;

    // Assert the exponent field is in the normal range and will remain so.
    assert(0 < exponent && exponent < 254);

    // Increment the exponent.
    ++exponent;

    // Reassemble the bits and copy them back into f.
    u = sign << 31 | exponent << 23 | significand;
    memcpy(&f, &u, sizeof f);

    // Display the result.
    printf("%g\n", f);
}
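With f initialized to 6.125, this prints 12.25, the value doubled.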
Maybe not exactly what you are looking for, but C has a library function ldexp which does exactly what you need:
double x = 6.5;
x = ldexp(x, 1); // now x is 13
Maybe unions are the tool you need.

#include <iostream>

union fb {
    float f;
    struct b_s {
        // Bit-field order is implementation-defined; this ordering matches
        // little-endian targets such as x86 (mantissa in the low bits).
        unsigned int mant : 23;
        unsigned int exp  : 8;
        unsigned int sign : 1;
    } b;
};

fb num;

int main() {
    num.f = 3.1415;
    num.b.exp++;
    std::cout << num.f << std::endl;
    return 0;
}

Convert IEEE-754 floating-point encoding to floating-point value

How can an IEEE-754 basic 32-bit floating-point encoding, such as 0x466F9100, be converted to the represented value, 15332.25?
#include <inttypes.h>
#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Interpret a string containing a numeral for an integer value as an encoding
   of an IEEE-754 basic 32-bit binary floating-point value.
*/
static float Interpret(const char *s)
{
    // Interpret the string as an integer numeral. Errors are not handled.
    uint32_t x = strtoumax(s, NULL, 16);

    // Separate the sign (1 bit), exponent (8), and significand (23) fields.
    uint32_t sign        = x >> 31;
    uint32_t exponent    = (x >> 23) & 0xff;
    uint32_t significand = x & 0x7fffff;

    // Interpret the sign field.
    float Sign = sign ? -1 : +1;

    // Create an object to hold the magnitude (or NaN).
    float Magnitude;

    // How we form the magnitude depends on the exponent field.
    switch (exponent)
    {
        // If the exponent field is zero, the number is zero or subnormal.
        case 0:
        {
            // In a zero or subnormal number, the significand starts with 0.
            float Significand = 0 + significand * 0x1p-23f;

            // In a zero or subnormal number, the exponent has its minimum value.
            int Exponent = 1 - 127;

            // Form the magnitude from the significand and exponent.
            Magnitude = ldexpf(Significand, Exponent);
            break;
        }

        // If the exponent field is all ones, the datum is infinity or a NaN.
        case 0xff:
        {
            /* If the significand field is zero, the datum is infinity.
               Otherwise it is a NaN. Note that different NaN payloads and
               types (quiet or signaling) are not supported here. Standard C
               does not provide good support for these.
            */
            Magnitude = significand == 0 ? INFINITY : NAN;
            break;
        }

        // Otherwise, the number is normal.
        default:
        {
            // In a normal number, the significand starts with 1.
            float Significand = 1 + significand * 0x1p-23f;

            // In a normal number, the exponent is biased by 127.
            int Exponent = (int) exponent - 127;

            // Form the magnitude from the significand and exponent.
            Magnitude = ldexpf(Significand, Exponent);
        }
    }

    // Combine the sign and magnitude and return the result.
    return copysignf(Magnitude, Sign);
}

int main(void)
{
    printf("%.99g\n", Interpret("0x466F9100"));
}
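As a hand check of that example: 0x466F9100 has sign bit 0, exponent field 0x8C = 140, and significand field 0x6F9100 = 7311616. The exponent is therefore 140 - 127 = 13 and the significand is 1 + 7311616/2^23 = 1.87158203125, giving 1.87158203125 * 2^13 = 15332.25, as expected.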

Getting the mantissa (of a float) of either an unsigned int or a float (C)

So, I am trying to write a function which prints a given float number (n) in its (mantissa * 2^exponent) format. I was able to get the sign and the exponent, but not the mantissa (whichever the number is, the mantissa always comes out as 0.000000). What I have is:
unsigned int num = *(unsigned*)&n;
unsigned int m = num & 0x007fffff;
mantissa = *(float*)&m;
Any ideas of what the problem might be?
The C library includes a function that does this exact task, frexp:
int expon;
float mant = frexpf(n, &expon);
printf("%g = %g * 2^%d\n", n, mant, expon);
Another way to do it is with log2f and exp2f:
if (n == 0) {
    mant = 0;
    expon = 0;
} else {
    expon = floorf(log2f(fabsf(n)));
    mant = n * exp2f(-expon);
}

These two techniques are likely to give different results for the same input. For instance, on my computer the frexpf technique describes 4 as 0.5 × 2^3, but the log2f technique describes 4 as 1 × 2^2. Both are correct, mathematically speaking. Also, frexp will give you the exact bits of the mantissa, whereas log2f and exp2f will probably round off the last bit or two.
You should know that *(unsigned *)&n and *(float *)&m violate the rule against "type punning" and have undefined behavior. If you want to get the integer with the same bit representation as a float, or vice versa, use a union:
union { uint32_t i; float f; } u;
u.f = n;
num = u.i;
(Note: This use of unions is well-defined in C since roughly 2003, but, due to the C++ committee's long-standing habit of not paying sufficient attention to changes going into C, it is not officially well-defined in C++.)
You should also know IEEE floating-point numbers use "biased" exponents. When you initialize a float variable's mantissa field but leave its exponent field at zero, that gives you the representation of a number with a large negative exponent: in other words, a number so small that printf("%f", n) will print it as zero. Whenever printf("%f", variable) prints zero, change %f to %g or %a and rerun the program before assuming that variable actually is zero.
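A small demonstration of that last point (my own example): a subnormal float prints as zero with %f but not with %g:

#include <stdio.h>

int main(void)
{
    float tiny = 1e-40f;   // subnormal in binary32: the exponent field is zero
    printf("%f\n", tiny);  // prints 0.000000
    printf("%g\n", tiny);  // prints a nonzero value close to 1e-40
    return 0;
}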
You are stripping off the bits of the exponent, leaving 0. An exponent of 0 is special, it means the number is denormalized and is quite small, at the very bottom of the range of representable numbers. I think you'd find if you looked closely that your result isn't quite exactly zero, just so small that you have trouble telling the difference.
To get a reasonable number for the mantissa, you need to put an appropriate exponent back in. If you want a mantissa in the range of 1.0 to 2.0, you need an exponent of 0, but adding the bias means you really need an exponent of 127.
unsigned int m = (num & 0x007fffff) | (127 << 23);
mantissa = *(float*)&m;
If you'd rather have a fully integer mantissa you need an exponent of 23, biased it becomes 150.
unsigned int m = (num & 0x007fffff) | ((23+127) << 23);
mantissa = *(float*)&m;
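Putting that together with the union technique from the earlier answer (my own combined example, not original to either answer):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    float n = 6.5f;   // 6.5 = 1.625 * 2^2
    union { uint32_t i; float f; } u;
    u.f = n;
    // Keep the 23 mantissa bits and set the exponent field to the bias
    // (127), so the result is the mantissa scaled into [1.0, 2.0).
    u.i = (u.i & 0x007fffff) | (127u << 23);
    printf("mantissa of %g is %g\n", (double) n, (double) u.f); // 1.625
    return 0;
}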
In addition to zwol's remarks: if you want to do it yourself you have to acquire some knowledge about the innards of an IEEE-754 float. Once you have done so you can write something like
#include <stdlib.h>
#include <stdio.h>
#include <math.h> // for testing only

typedef union {
    float value;
    unsigned int bits; // assuming 32 bit large ints (better: uint32_t)
} ieee_754_float;

// clang -g3 -O3 -W -Wall -Wextra -Wpedantic -Weverything -std=c11 -o testthewest testthewest.c -lm
int main(int argc, char **argv)
{
    unsigned int m, num;
    int exp; // the exponent can be negative
    float n, mantissa;
    ieee_754_float uf;

    // neither checks nor balances included!
    if (argc == 2) {
        n = atof(argv[1]);
    } else {
        exit(EXIT_FAILURE);
    }

    uf.value = n;
    num = uf.bits;

    m = num & 0x807fffff;    // extract mantissa (i.e.: get rid of exponent, keep sign)
    num = num & 0x7fffffff;  // full number without sign bit
    exp = (num >> 23) - 126; // extract exponent and subtract bias
    m |= 0x3f000000;         // set the exponent field so the mantissa lies in [0.5, 1)
    uf.bits = m;
    mantissa = uf.value;

    printf("n = %g, mantissa = %g, exp = %d, check %g\n",
           n, mantissa, exp, mantissa * powf(2, exp));

    exit(EXIT_SUCCESS);
}
Note: the code above is of the quick&dirty(tm) species and is not meant for production. It also lacks handling for subnormal (denormal) numbers, a thing you must include. Hint: multiply the mantissa by a large power of two (e.g.: 2^25 or in that ballpark) and adjust the exponent accordingly (if you use the value from my example, subtract 25).
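Following that hint, subnormal inputs (raw exponent field zero, mantissa field nonzero) could be pre-scaled right after num = uf.bits; (my own untested sketch; bias_adjust is a new variable, and the exponent line below becomes exp = (num >> 23) - 126 - bias_adjust;):

int bias_adjust = 0; // extra exponent correction for subnormal inputs
if ((num & 0x7f800000) == 0 && (num & 0x007fffff) != 0) {
    uf.value = n * 0x1p25f; // 2^25 pushes any subnormal into the normal range
    num = uf.bits;          // redo the bit extraction on the scaled value
    bias_adjust = 25;       // compensate for the scaling in the exponent
}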
