I have 4 bytes whose values, as unsigned char, are: 63 129 71 174.
Supposedly, when converted to float, this should become 1.0099999904632568.
However, all I get back is 1.01, which is not enough precision for what I am doing.
I have used popular methods like memcpy or a union, but to no avail, which led me to believe... is this some kind of limitation in C?
If so, what is the optimal solution? Thanks.
EDIT: Sorry for the bad example; I should have picked a better one for my case. Consider these 4 bytes: 0 1 229 13.
The resulting value is very small, like really really small. But it's 4 bytes, so it still represents a float. However, C just returns 0, even when I print 16 digits after the decimal point.
So why, and how do I work with such a number?
EDIT 2: Sorry. My friend messed up. She gave me the 4-byte sequence and said it was a 32-bit float, but it turned out to be a 32-bit unsigned int. It pretty much ruined my entire afternoon. REALLY SORRY FOR BOTHERING.
I guess the conclusion here is: do not always trust your friend.
Using memcpy() really is the way to go.
#include <stdio.h>
#include <string.h>

int main(void)
{
    /* The byte sequences, given here in this host's (little-endian) order */
    const unsigned char raw1[] = { 174, 71, 129, 63 };
    const unsigned char raw2[] = { 0, 1, 229, 13 };
    float x, y;

    memcpy(&x, raw1, sizeof x); /* reinterpret the 4 raw bytes as a float */
    memcpy(&y, raw2, sizeof y);
    printf("%.6f\n", x);
    printf("%g\n", y);
    return 0;
}
This prints 1.010000. I don't think it's reasonable to expect more precision out of a float: it carries only about 7 significant decimal digits, and 1.0099999904632568 is simply the same stored value printed to more digits than a float meaningfully has. Note that I swapped the order of the bytes, since I tested on a little-endian system (ideone.com).
With %g for the second number, it prints 1.41135e-30.
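If the bytes arrive in a fixed big-endian order (as the original 63 129 71 174 sequence suggests), here is a small sketch that assembles the value the same way regardless of host endianness (float_from_be_bytes is a name introduced here, and IEEE-754 single precision is assumed):

#include <stdint.h>
#include <string.h>

float float_from_be_bytes(const unsigned char b[4])
{
    /* Build the 32-bit pattern explicitly, so host byte order doesn't matter */
    uint32_t bits = ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16)
                  | ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
    float f;
    memcpy(&f, &bits, sizeof f); /* reinterpret the bit pattern as a float */
    return f;
}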
So I'm using this (it's from another question I asked):
unsigned char *y = resultado->informacion;
int tam = data->tamanho;
unsigned char resAfter;

for (int i = 0; i < tam; i++)
{
    unsigned char x = data->informacion[i];
    x <<= 3;
    if (i > 0)
    {
        resAfter = (resAfter << 5) | x;
    }
    else
    {
        resAfter = x;
    }
}
printf("resAfter es %hhu\n", resAfter); /* was %s, which is undefined for a char argument */
So at the end I have this really long number (I'm estimating about 43 bits); how can I get groups of 8 bits? I think I'm getting something like (010101010101010.....000) and I want to separate this into groups of 8.
Another question: I know for sure that resAfter is going to have n bits, where n is a multiple of 8 plus 3. So my question is: is this possible, or is C going to complete the byte? Like, if I get 43 bits, is C going to pad them with 0s so that I have 48 bits? And is there a way to delete these 3 extra bits?
I'm new to C and bitwise operations, so sorry if what I'm doing is really bad.
Basically, in programming you deal with bytes (at least in most cases); in C you deal with types of specific sizes (depending on the system you run it on).
That said, char usually has a size of 1 byte, and you can't really play around with single bits on their own. You can do operations at the scale of single bits (<< for instance), but I don't know of any standard way to store fewer than 8 bits in a variable in C (though I may be wrong about it).
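The usual workaround is to accumulate the bits in a wider integer and then peel off 8 at a time. A sketch, assuming the 43 accumulated bits fit in a uint64_t and the 3 surplus bits sit at the low end (emit_bytes is a name made up here):

#include <stdint.h>
#include <stdio.h>

void emit_bytes(uint64_t acc, int nbits)
{
    acc >>= nbits % 8;        /* discard the surplus low bits (3 for 43 bits) */
    int nbytes = nbits / 8;   /* full bytes that remain (5 for 43 bits) */
    for (int i = nbytes - 1; i >= 0; i--)
        printf("%02x ", (unsigned)((acc >> (i * 8)) & 0xFF));
    printf("\n");
}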
This is the code (copy & pasted):
#include <stdio.h>

int main(){
    char x, y, result;

    //Sample 1:
    x = 35;
    y = 85;
    result = x + y;
    printf("Sample 1: %hi + %hi = %hi\n", x, y, result);

    //Sample 2:
    x = 85;
    y = 85;
    result = x + y;
    printf("Sample 2: %hi + %hi = %hi\n", x, y, result);

    return 0;
}
I've tried to compile it but it doesn't work. Am I stupid, or should it be "int" or "short" instead of char at the beginning? Once I change this it works, but I'm worried that it's supposed to work as is...
Does the program really just add x and y and show the result? That's what it does if I use short instead of char.
Thanks in advance!
E: Why the down votes? What is wrong with my post?
Thoughts:
For an introductory course, this is a terrible example. Depending on your implementation, char is either a signed or an unsigned number, and the code will behave very differently depending on this fact.
That being said, yes, this code is basically adding two numbers and printing the result. I agree that the %hi is odd. That expects a short int. I'd personally expect either %hhi or just %i, and let integer promotion do its thing.
If the numbers are unsigned chars
85 + 35 == 120, which is less than UCHAR_MAX (which is at least 255). So there's no problem and everything works fine.
85 + 85 == 170, which is less than UCHAR_MAX (which is at least 255). So there's no problem and everything works fine.
If the numbers are signed chars
85 + 35 == 120, which is probably less than CHAR_MAX (which is probably 127). So there's no problem and everything works fine.
85 + 85 == 170, which is probably greater than CHAR_MAX. The addition itself is done in int (thanks to integer promotion), but storing 170 back into a signed char is out of range, and the result of that conversion is implementation-defined.
The output of the program appears to be
Sample 1: 35 + 85 = 120
Sample 2: 85 + 85 = -86
I compiled this on http://ideone.com/ and it worked fine.
The output is in fact what you would expect. The program is working! The reason you are seeing a number that you do not expect is the width of the char data type: 1 byte.
The C standard does not dictate whether char is signed or unsigned, but assuming it is signed, it can represent numbers in the range -128 to 127 (a char is 8 bits, or 1 byte). 85 + 85 = 170, which is outside of this range... the MSB of the byte becomes 1 and the number wraps around to give you a negative number. Try reading up on two's complement arithmetic.
The arithmetic is:
01010101 +
01010101
--------
10101010
Because the data type is signed and the MSB is set, the number is now negative, in this case -86
Note: Bill Lynch's answer... he has rightly pointed out that the out-of-range result is nothing to rely on (the conversion back to signed char is implementation-defined).
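A minimal check of that arithmetic (a sketch; the conversion of 170 to signed char is implementation-defined, but yields -86 on the usual two's complement platforms):

#include <stdio.h>

int main(void)
{
    unsigned char sum = 85 + 85;              /* 170, bit pattern 10101010 */
    signed char as_signed = (signed char)sum; /* implementation-defined; typically -86 */
    printf("%d -> %d\n", sum, as_signed);     /* prints: 170 -> -86 */
    return 0;
}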
Suppose I have the following tagged union:
// f32 is a float of 32 bits
// uint32 is an unsigned int of 32 bits
struct f32_or_uint32 {
    char tag;
    union {          /* anonymous union: requires C11 */
        f32 f;
        uint32 u;
    };
};
If tag == 0, then it is a f32. If tag == 1, then it is a uint32. There is only one problem with that representation: it uses 64 bits, when only 33 should be necessary. That is almost a 1/2 waste, which can be considerable when you are dealing with huge buffers. I never use the full 32 bits, so I thought of using one bit as the flag and doing this instead:
#define IS_UINT32(x) (!((x) & 0x80000000))
#define IS_F32(x) ((x) & 0x80000000)
#define MAKE_F32(x) ((x) | 0x80000000)
#define EXTRACT_F32(x) ((x) & 0x7FFFFFFF)

union f32_or_uint32 {
    f32 f;
    uint32 u;
};
This way, I am using 31 bits for the value and only 1 for the tag. My question is: could this practice be detrimental to performance, maintainability and portability?
No, you can't do that. At least, not in the general sense.
An unsigned integer takes on 2^32 different values. It uses all 32 bits. Likewise, a float takes on (nearly) 2^32 different values. It uses all 32 bits.
With some care it might well be possible to isolate a bit that will always be 1 in one type and 0 for the other, across the range of values that you actually want to use. The high bit of unsigned int would be available if you decided to use values only up to 2^31. The low bit of float could be available if you didn't mind a small rounding error.
There is a better strategy available if the range of unsigned ints is smaller (say only 23 bits). You could select a high order bit pattern of 1+8 bits that was illegal for your usage of float. Perhaps you can manage without +/- infinity? Try 0x1ff.
To answer your other questions, it's relatively easy to create a new type like this in C++, using a class and some inline functions, and get good performance. Doing it with macros in C would tend to be more invasive of the code and more prone to bugs, but with similar performance. The instruction overhead required to do these tests and perhaps do some mask operations is unlikely to be detectable in most normal usages. Obviously that would have to be reconsidered in the case of a computationally intensive usage, but you can just see this as a typical space/speed trade-off.
Let's talk first about whether this works conceptually. This trick more or less works if you're storing unsigned 32-bit numbers but you know they will never be greater than 2^31. It works because all numbers smaller than 2^31 will always have a "0" in the high bit. If you know it will always be 0, you don't actually have to store it.
The trick also more or less works if you are storing floating point numbers that are never negative. For single-precision floating point numbers, the high bit indicates sign, and is always 0 if the number is positive. (This property of floating-point numbers is not nearly as well-known among programmers, so you'd want to document this).
So assuming your use case fits in these parameters, the approach works conceptually. Now let's investigate whether it is possible to express in C.
You can't perform bitwise operations on floating-point values in C (see the related question on why you can't perform a bitwise operation on floating point numbers). So to get at the floating-point number's bit pattern, you need to copy its bytes into an integer:
typedef uint32_t tagged_t;

tagged_t float_to_tagged(float f) {
    uint32_t ret;
    memcpy(&ret, &f, sizeof(f));
    // Make sure the user didn't pass us a negative number.
    assert((ret & 0x80000000) == 0);
    return ret | 0x80000000;
}
Don't worry about that memcpy() call -- any compiler worth its salt will optimize it away. This is the best and fastest way to get at the float's underlying bit pattern.
And you'd likewise need to use memcpy() to get the original float back:
float tagged_to_float(tagged_t val) {
    float ret;
    val &= 0x7FFFFFFF; /* clear the tag bit */
    memcpy(&ret, &val, sizeof(val));
    return ret;
}
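A quick round-trip sketch using the two helpers above (assumes <assert.h>, <stdint.h>, <stdio.h> and <string.h> are included):

int main(void) {
    float f = 0.25f;                    /* positive, so the sign bit is free to act as the tag */
    tagged_t t = float_to_tagged(f);
    assert(t & 0x80000000);             /* the high bit now marks the value as a float */
    printf("%f\n", tagged_to_float(t)); /* prints 0.250000 */
    return 0;
}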
I have answered your question directly because I believe in giving people the facts. That said, I agree with other posters who say this is unlikely to be your best design choice. Reflect on your use case: if you have very large buffers of these values, is it really the case that every single one can be either a uint32 or a float, and there is no pattern to it? If you can move this type information to a higher level, where the type info applies to all values in some part of the buffer, it will most definitely be more efficient than making your loops test the type of every value individually.
Using the high bit is going to be annoying on the most widespread x86 platforms, because it's both the sign bit of the float and the most significant bit of the unsigned int.
A scheme that's IMO slightly better is to use the lowest bit instead, but that requires decoding (i.e. storing a shifted integer):
#include <stdio.h>

typedef union tag_uifp {
    unsigned int ui32;
    float fp32;
} uifp;

#define FLOAT_VALUE 0x00
#define UINT_VALUE  0x01

int get_type(uifp x) {
    return x.ui32 & 1;
}

unsigned get_uiv(uifp x) {
    return x.ui32 >> 1; /* undo the shift applied in make_uiv */
}

float get_fpv(uifp x) {
    return x.fp32;
}

uifp make_uiv(unsigned x) {
    uifp result;
    result.ui32 = 1 + (x << 1); /* shift the value up and set the tag bit */
    return result;
}

uifp make_fpv(float x) {
    uifp result;
    result.fp32 = x;
    result.ui32 &= ~1; /* clear the tag bit, losing the lowest mantissa bit */
    return result;
}

uifp data[10];

void setNumbers() {
    int i;
    for (i = 0; i < 10; i++) {
        data[i] = (i & 1) ? make_fpv(i / 10.0) : make_uiv(i);
    }
}

void printNumbers() {
    int i;
    for (i = 0; i < 10; i++) {
        if (get_type(data[i]) == FLOAT_VALUE) {
            printf("%.3f\n", get_fpv(data[i]));
        } else {
            printf("%u\n", get_uiv(data[i])); /* %u, since get_uiv returns unsigned */
        }
    }
}

int main(int argc, const char *argv[]) {
    setNumbers();
    printNumbers();
    return 0;
}
With this approach, what you are losing is the least significant bit of precision from the float number (i.e. storing a float value and re-reading it is going to lose some accuracy), and only 31 bits are available for the integer.
You could instead use only NaN floating-point values for tagging, but this means that only 22 bits are easily available for the integers because of the float format (23 if you're willing to also lose infinity).
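For illustration, a minimal NaN-boxing sketch along those lines (the macro names and 22-bit payload layout are assumptions, not part of this answer; note that genuine NaNs produced by arithmetic could then no longer be stored as floats):

#include <assert.h>
#include <stdint.h>

#define QNAN_MASK    0x7FC00000u /* exponent all ones plus the quiet bit */
#define PAYLOAD_MASK 0x003FFFFFu /* low 22 bits hold the boxed integer */

uint32_t box_uint(uint32_t v) {
    assert(v <= PAYLOAD_MASK);   /* only 22 bits fit in the payload */
    return QNAN_MASK | v;
}

int is_boxed_uint(uint32_t bits) {
    return (bits & QNAN_MASK) == QNAN_MASK; /* matches only quiet-NaN patterns */
}

uint32_t unbox_uint(uint32_t bits) {
    return bits & PAYLOAD_MASK;
}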
The idea of using lowest bits for tagging is used often (e.g. Lisp implementations).
Hello, I have a question regarding doubles. I am on an IA-32 machine and want to see how a double is represented in memory. Below is my program.
int main(int argc, char *argv[])
{
    double d = 0.333333333333333314829616256247390992939472198486328125; // in hex: 3FD5 5555 5555 5555
    printf("%x\n", d); // prints 55555555
    return 0;
}
For some reason this only prints the last 4 bytes, 55555555. My question is: where are the high bytes (3FD5 5555) stored? At address (&d + 4)? Or (&d - 4)? Or somewhere else in memory? Since a double has 8 bytes, how is it stored on a 32-bit machine?
I'm not going to say the following is "correct" by any means, but it Works Here (TM), or really on whatever compiler/machine ideone uses (for the rest of this answer I will assume a modern x86 target), and it can be used for examining the individual bytes/bits of the double value.
#include <stdio.h>

int main(void) {
    double d = (double)1/3;
    unsigned char *x = (unsigned char *)&d;
    printf("chars: %2x%2x %2x%2x %2x%2x %2x%2x\n",
           x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7]);
    // as per Yu Hao's answer
    long long dd = *(long long*)&d;
    printf("lld : %8llx\n", dd);
    return 0;
}
Result:
chars: 5555 5555 5555 d53f // raw view doesn't account for LE/reversal
lld : 3fd5555555555555 // this is the "correct" value
The values in the two outputs differ because the bytes are stored little-endian in memory: the char dump shows them in address order (least significant byte first), while the integer print shows the most significant digits first, since they affect the magnitude the most. So the two views appear byte-reversed relative to each other.
With 1234.5678 as input the results are:
chars: adfa 5c6d 454a 9340
lld : 40934a456d5cfaad
And with some unscrambling, a correlation can be seen:
chars: AAaa BBbb CCcc DDdd
lld : ddDDccCCbbBBaaAA
On most machines today, double has 8 bytes; try using long long (at least 64 bits) like this:
printf("%llx\n", *(long long*)(&d));
A portable method to see the hexadecimal nature of a double: "%a" prints the double as a hexadecimal significand with the exponent (a power of 2) written in decimal.
printf("%a\n", d);
A not completely, but reasonably portable method.
This works even if long long is more than 8 bytes.
(A long long that is not the same size as double is uncommon these days.)
#include <inttypes.h>
...
printf("%016" PRIX64 "\n", *(uint64_t *) &d);`
I am trying to convert a string representing a 24-bit hexadecimal number (FFFFFF) to its decimal equivalent (-1). Could anyone help me understand why the following code does not return -1?
Thanks, LC
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(void) {
    char temp_str[] = "FFFFFF";
    long value;
    value = strtol(temp_str, NULL, 16);
    printf("value is %ld\n", value);
}
It seems like your input is the 24-bit 2's complement representation of the number, but strtol does not handle negative numbers in this way (and even if it did, it has no way of knowing that you meant a 24-bit representation). It only determines the sign of its output based on the existence of a - sign.
You can modify your code to get the result you want by adding this after the strtol:
if (value > 0x7fffff)
value -= 0x1000000;
Of course, this will only work for a 24-bit representation, other sizes will need different constants.
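For other widths the same idea generalizes; a small sketch (sign_extend is a name introduced here, not from the code above; it assumes bits is smaller than the width of long):

/* Sign-extend an n-bit two's complement value held in a long */
long sign_extend(long value, int bits)
{
    long limit = 1L << (bits - 1); /* 0x800000 when bits == 24 */
    if (value >= limit)
        value -= limit << 1;       /* subtract 2^bits (0x1000000 for 24 bits) */
    return value;
}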
Hacker's Delight covers this under sign extension.
For your 24-bit number, the sign bit is the 24th bit from the right; if it were set, the hex value would be 0x800000.
The book suggests these:
((x + 0x800000) & 0xFFFFFF) - 0x800000
or
((x & 0xFFFFFF) xor 0x800000) - 0x800000
From your question I would say that your number is never going to be more than 24 bits, so I would use the second option in your code as follows:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(void) {
    char temp_str[] = "FFFFFF";
    long value;
    value = strtol(temp_str, NULL, 16);
    // The & 0xFFFFFF mask is omitted since the number won't have more than 24 bits.
    value = (value ^ 0x800000) - 0x800000;
    printf("value is %ld\n", value);
}
Edit 1:
I fear that my original answer, though technically sound, did not answer the posed question.
Could anyone help me understand why the following code does not return -1?
Others have already covered this by the time I answered but I will restate it here anyway.
Your string is "FFFFFF", it consists of 6 hex digits. Each hex digit represents 4 bits, therefore your string represents a 24 bit number.
Your variable value is of type long, which these days can be either 32 or 64 bits wide depending on your architecture, so you are not guaranteed a full-width pattern unless you give exactly the right number of hex digits. One caveat: on overflow strtol does not wrap around, it returns LONG_MAX (and sets errno to ERANGE), so to turn a full-width all-F string into -1 you need strtoul and a conversion back to long (which yields -1 on the usual two's complement systems).
If long on your machine is 32 bits then two things are true:
sizeof(long) will return 4
(long)strtoul("FFFFFFFF", NULL, 16) will yield -1
If long on your machine is 64 bits then two things are true:
sizeof(long) will return 8
(long)strtoul("FFFFFFFFFFFFFFFF", NULL, 16) will yield -1
Digression
This then led me down a completely different path. We can generalize this and make a program that constructs a string for your machine, such that parsing it will always produce -1.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(void) {
    const char* ff = "FF";
    char temp[sizeof(long) * 2 + 1]; // Two hex digits per byte of long, plus 1 byte for '\0'
    size_t i;
    long value;

    /* Fill the temp array with "FF" repeated once per byte of long */
    for (i = 0; i < sizeof(long); ++i)
    {
        strcpy(&temp[i * 2], ff);
    }

    /* strtol would saturate to LONG_MAX here, so parse as unsigned and convert */
    value = (long)strtoul(temp, NULL, 16);
    printf("value of %s is %ld\n", temp, value);
}
This is a bad way to get a -1 result since the clear option is to just use
long value = -1;
but I will assume that this was simply an academic exercise.
Don't think like a computer here; just convert (FFFFFF)₁₆ to decimal using ordinary arithmetic: 16^6 - 1 = 16777215. This is not about two's complement negative notation.
Because you run this program on a 32- or 64-bit machine, not a 24-bit one. 0xffffff is actually 0x00ffffff, which is 16777215 in decimal.
The hex representation of -1 is 0xffffffff (32-bit long) or 0xffffffffffffffff (64-bit long).