Converting byte array to double

I'm trying to get the numerical (double) value from a byte array of 16 elements, as follows:
unsigned char input[16];
double output;
double a = input[0];
output = a;
for (i=1;i<16;i++){
    a = input[i] << 8*i;
    output += a;
}
But it does not work.
It seems that the temporary variable that contains the result of the left-shift can store only 32 bits, because after 4 shift operations of 8 bits it overflows.
I know that I can use something like
a = input[i] * pow(2,8*i);
But, out of curiosity, I was wondering if there's any solution to this problem using the shift operator.

Edit: this won't work for all 16 bytes (see comment) without something like __int128, because a shift count of 64 or more is undefined even for a 64-bit type.
a = input[i] << 8*i;
The expression input[i] is promoted to int, which is 32 bits on your machine. To overcome this issue, the left-hand operand has to be 64 bits wide, as in
a = (1L * input[i]) << 8*i;   /* only where long is 64-bit (not on 64-bit Windows) */
or
a = (long long unsigned) input[i] << 8*i;
And remember about endianness.
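Putting the cast together with the __int128 caveat: a minimal sketch, assuming the 16 bytes are an unsigned integer stored little-endian (input[0] least significant) and that a double's 53-bit precision is acceptable. It keeps every shift on uint64_t below 64 bits by assembling two 8-byte halves and combining them in floating point; bytes16_le_to_double is a hypothetical name.

```c
#include <stdint.h>

/* Interpret 16 bytes as an unsigned little-endian integer (assumption:
 * input[0] is the least significant byte) and return it as a double.
 * Shifts are done on uint64_t and never reach 64 bits; the upper half
 * is scaled by 2^64 in floating point instead of being shifted. */
double bytes16_le_to_double(const unsigned char input[16])
{
    uint64_t lo = 0, hi = 0;
    for (int i = 0; i < 8; i++) {
        lo |= (uint64_t)input[i] << (8 * i);      /* bytes 0..7  */
        hi |= (uint64_t)input[i + 8] << (8 * i);  /* bytes 8..15 */
    }
    /* 18446744073709551616.0 is 2^64, exactly representable as a double;
     * the result is rounded to double precision when hi is large. */
    return (double)hi * 18446744073709551616.0 + (double)lo;
}
```

For big-endian input, swap the indexing (input[15 - i] and input[7 - i]) rather than the shift amounts.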

The problem here is that, indeed, a 32-bit variable cannot be shifted by 32 (4*8) or more bits, i.e. your code works for 4 chars only.
What you could do is find the first significant char and use Horner's rule, with x = 256:
$$a_n x^n + a_{n-1} x^{n-1} + \dots + a_1 x + a_0 = \bigl(\cdots\bigl((a_n x + a_{n-1})\,x + a_{n-2}\bigr)\,x + \cdots\bigr)\,x + a_0$$
as follows:
unsigned char coefficients[16] = { 0, 0, ..., 14, 15 }; /* byte values, middle elided */
double result = 0.;
for (int exponent = 15; exponent >= 0; --exponent) {
    result *= 256.; // instead of <<8
    result += coefficients[ exponent ];
}
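A runnable version of the Horner loop, as a sketch: it assumes the same layout as above (the highest index is the most significant byte) and takes the length as a parameter; bytes_horner is a hypothetical name.

```c
#include <stddef.h>

/* Horner's rule with x = 256: bytes[n-1] is the most significant
 * coefficient and bytes[0] the least significant (little-endian layout,
 * an assumption). Multiplying by 256.0 plays the role of << 8, so there
 * is no integer width limit, only double's 53-bit precision. */
double bytes_horner(const unsigned char *bytes, size_t n)
{
    double result = 0.0;
    while (n > 0) {
        n--;
        result *= 256.0;
        result += bytes[n];
    }
    return result;
}
```

For example, the two bytes {0x4D, 0x08} give 0x084D, i.e. 2125.0.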

In short, no, you can't convert a sequence of bytes directly into a double by bit-shifting as shown in your code sample.
A byte (an integer type) and a double (a floating-point type, i.e. not an integer type) are not bitwise compatible: you can't just bit-shift the values of a bunch of bytes into a floating-point variable and expect an equivalent result.
1) Assuming the byte array is a memory buffer holding an integer value, you should be able to convert your byte array into a 128-bit integer via bit-shifting and then convert that resulting integer into a double. Don't forget that endianness issues may come into play depending on the CPU architecture.
2) Assuming the byte array is a memory buffer that contains a long double value (128 bits wide on some platforms, but not all), and assuming there are no endianness issues, you should be able to memcpy the value from the byte array into a long double:
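#include <string.h>
union doubleOrByte {
    unsigned char buffer[16]; /* BYTE in the original; long double is not 16 bytes everywhere */
    long double val;
} dOrb;
dOrb.val = 3.14159267L;
long double newval = 0.0;
memcpy(&newval, dOrb.buffer, sizeof newval); /* copy only as many bytes as newval holds */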

Why not simply cast the array to a double pointer?
unsigned char input[16];
double* pd = (double*)input;
for (int i=0; i<sizeof(input)/sizeof(double); ++i)
cout << pd[i];
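If you need to fix endianness, reverse the bytes of each 8-byte group with std::reverse() before casting to a double array. Note that this cast is not strictly portable: input may not be suitably aligned for double, and reading it through a double* violates the strict-aliasing rule; copying with memcpy avoids both problems.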
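A sketch of the same idea without the pointer cast, assuming the 16 bytes already hold two doubles in the host's own format and byte order; bytes_to_doubles is a hypothetical name.

```c
#include <string.h>

_Static_assert(sizeof(double) == 8, "this sketch assumes 8-byte doubles");

/* Copy 16 raw bytes into two doubles. memcpy has no alignment
 * requirement and is allowed to read any object as bytes, unlike
 * dereferencing (double*)input. */
void bytes_to_doubles(const unsigned char input[16], double out[2])
{
    memcpy(out, input, 2 * sizeof(double));
}
```

Compilers generally turn such a fixed-size memcpy into plain loads, so it need not be slower than the cast.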

Have you tried std::atof?

Are you trying to convert a string representation of a number to a real number? In that case, the C-standard atof is your best friend.
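If the input really is text, strtod from <stdlib.h> is the stricter sibling of atof: it reports where parsing stopped and sets errno on range errors. A minimal sketch; parse_double is a hypothetical helper name.

```c
#include <errno.h>
#include <stdlib.h>

/* Parse the whole string s as a double. Returns 1 on success and stores
 * the value in *out; returns 0 if there are no digits, trailing
 * characters, or the value is out of range. */
int parse_double(const char *s, double *out)
{
    char *end;
    errno = 0;
    double v = strtod(s, &end);
    if (end == s || *end != '\0' || errno == ERANGE)
        return 0;
    *out = v;
    return 1;
}
```

atof("12abc") would silently return 12.0; parse_double rejects it.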

Well, because of how the expression is evaluated, the right-hand side of
a = input[i] << 8*i;
is evaluated before it gets converted to a double, so you are shifting input[i] (promoted to int) by 8*i bits, which stores its result in a 32-bit temporary and thus overflows. You can try the following:
a = (long long unsigned int)input[i] << 8*i;
Edit: even with the cast, unsigned long long is typically 64 bits wide, so for i >= 8 the shift count 8*i is 64 or more, which is undefined behavior. The second half of your input array will therefore never be seen this way; the double itself has plenty of range, the 64-bit shift is the limit.


Hex array to Float

I have an array of
unsigned char array_a[4] = {0x00,0x00,0x08,0x4D};
unsigned char val[4];
float test;
What I want to do is combine all the elements into val to make 0x0000084D and convert it to a float, which is 2125.
I tried memcpy
val[4] = '\0';
but it still does not work.
First, 0x0000084D is the big-endian representation of the integer value 2125, not the IEEE-754 float representation of 2125.
Second, there is no need to copy to another char array (and to access the 5th element out of bounds in an attempt to "nul-terminate" the array). That part makes no sense.
To convert this array to an integer on your host, copy it into a standardized 32-bit integer first, then convert it according to the endianness of your machine (otherwise you'd get a wrong value on a little-endian machine):
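#include <stdint.h>
#include <string.h>
#include <arpa/inet.h> /* ntohl (POSIX; winsock2.h on Windows) */
unsigned char array_a[4] = {0x00,0x00,0x08,0x4D};
uint32_t the_int;
memcpy(&the_int, array_a, sizeof the_int);
the_int = ntohl(the_int);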
Or, without any external conversion library, use bit shifting, which makes it endianness-independent:
uint32_t the_int = 0;
int i;
for (i=0;i<sizeof(uint32_t);i++) {
    the_int <<= 8;
    the_int += array_a[i];
}
You get 2125 all right; now you can assign it to a float if you like:
float test = the_int;
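To make the difference between converting the value and reinterpreting the bits concrete, here is a sketch (function names hypothetical, IEEE-754 single-precision float assumed). Assembled as an integer, 00 00 08 4D is 2125; read as float bits, the same pattern is a tiny subnormal number, while the float 2125.0f is actually encoded as 45 04 D0 00.

```c
#include <stdint.h>
#include <string.h>

/* Assemble four bytes, most significant first (big-endian), into a
 * uint32_t with shifts; the result is the same on any host byte order. */
uint32_t be32_to_u32(const unsigned char b[4])
{
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}

/* Numeric conversion: 00 00 08 4D -> 2125.0f */
float be32_value_as_float(const unsigned char b[4])
{
    return (float)be32_to_u32(b);
}

/* Bit reinterpretation: treat the 32-bit pattern as IEEE-754 float bits
 * (assumes sizeof(float) == sizeof(uint32_t)). */
float be32_bits_as_float(const unsigned char b[4])
{
    uint32_t u = be32_to_u32(b);
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}
```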

C Bit-Level Int to Float Conversion Unexpected Output

I am playing around with bit-level coding (this is not homework - just curious). I found a lot of good material online and in a book called Hacker's Delight, but I am having trouble with one of the online problems.
It asks to convert an integer to a float. I used the following links as reference to work through the problem:
How to manually (bitwise) perform (float)x?
How to convert an unsigned int to a float?
Problem and Question:
I thought I understood the process well enough (I tried to document the process in the comments), but when I test it, I don't understand the output.
Test Cases:
float_i2f(2) returns 1073741824
float_i2f(3) returns 1077936128
I expected to see something like 2.0000 and 3.0000.
Did I mess up the conversion somewhere? I thought maybe this was a memory address, so I was thinking maybe I missed something in the conversion step needed to access the actual number? Or maybe I am printing it incorrectly? I am printing my output like this:
printf("Float_i2f ( %d ): ", 3);
printf("%u", float_i2f(3));
But I thought that printing method was fine for unsigned values in C (I'm used to programming in Java).
Thanks for any advice.
/*
 * float_i2f - Return bit-level equivalent of expression (float) x
 *   Result is returned as unsigned int, but
 *   it is to be interpreted as the bit-level representation of a
 *   single-precision floating point value.
 *   Legal ops: Any integer/unsigned operations incl. ||, &&. also if, while
 *   Max ops: 30
 *   Rating: 4
 */
unsigned float_i2f(int x) {
    if (x == 0){
        return 0;
    }
    //save the sign bit for later and get the absolute value of x
    //the absolute value is needed to shift bits to put them
    //into the appropriate position for the float
    unsigned int signBit = 0;
    unsigned int absVal = (unsigned int)x;
    if (x < 0){
        signBit = 0x80000000;
        absVal = 0u - (unsigned int)x; //same as -x, but also safe for INT_MIN
    }
    //Calculate the exponent
    // Shift the input left until the high order bit is set to form the mantissa.
    // Form the floating exponent by subtracting the number of shifts from 158.
    unsigned int exponent = 158; //158 = 127 (bias) + 31 (position of the high-order bit)
    while ((absVal & 0x80000000) == 0){ //loop until the high-order bit is 1
        absVal <<= 1;
        exponent--;
    }
    //find the mantissa (bit shift to the right); the low 8 bits are truncated, not rounded
    unsigned int mantissa = absVal >> 8;
    //place the exponent bits in the right place
    exponent = exponent << 23;
    //get the mantissa (drop the implicit leading 1)
    mantissa = mantissa & 0x7fffff;
    //return the reconstructed float
    return signBit | exponent | mantissa;
}
Continuing from the comment: your code is correct, and you are simply looking at the unsigned integer made up of the bits of your IEEE-754 single-precision floating-point number. The IEEE-754 single-precision format (made up of the sign, the biased exponent, and the mantissa) can be interpreted as a float, or those same bits can be interpreted as an unsigned integer (just the number formed by the 32 bits). You are printing the unsigned equivalent of the floating-point number.
You can confirm with a simple union. For example:
#include <stdio.h>
#include <stdint.h>
typedef union {
uint32_t u;
float f;
} u2f;
int main (void) {
    u2f tmp = { .f = 2.0 };
    printf ("\n u : %u\n f : %f\n", tmp.u, tmp.f);
    return 0;
}
Example Usage/Output
$ ./bin/unionuf
u : 1073741824
f : 2.000000
Let me know if you have any further questions. It's good to see that your study resulted in the correct floating point conversion. (also note the second comment regarding truncation/rounding)
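The code in the question truncates the 8 bits dropped by absVal >> 8, so for some magnitudes above 2^24 it differs from (float)x. A sketch of a rounding variant (hypothetical name float_i2f_rne), assuming 32-bit unsigned and IEEE-754 single precision, which rounds to nearest with ties to even and can be checked against the compiler's own conversion via the helper float_bits:

```c
#include <stdint.h>
#include <string.h>

/* Like float_i2f, but rounds to nearest, ties to even (the default
 * IEEE-754 rounding mode) instead of truncating. */
unsigned float_i2f_rne(int x)
{
    if (x == 0)
        return 0;
    unsigned sign = 0;
    unsigned absVal = (unsigned)x;
    if (x < 0) {
        sign = 0x80000000u;
        absVal = 0u - (unsigned)x;          /* magnitude, also valid for INT_MIN */
    }
    unsigned exponent = 158;                 /* 127 (bias) + 31 */
    while ((absVal & 0x80000000u) == 0) {
        absVal <<= 1;
        exponent--;
    }
    unsigned mantissa = (absVal >> 8) & 0x7fffffu;  /* 23 fraction bits */
    unsigned dropped = absVal & 0xffu;              /* bits lost by >> 8 */
    /* round up if above halfway, or exactly halfway with an odd kept LSB */
    if (dropped > 0x80u || (dropped == 0x80u && (mantissa & 1u))) {
        mantissa++;
        if (mantissa == 0x800000u) {        /* fraction overflowed: renormalize */
            mantissa = 0;
            exponent++;
        }
    }
    return sign | (exponent << 23) | mantissa;
}

/* Bit pattern of a float, for comparison with (float)x.
 * Assumes float is a 32-bit IEEE-754 type. */
uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

For example, 16777217 (2^24 + 1) lies exactly halfway between two floats and rounds down to 16777216, while 16777219 rounds up to 16777220.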
I'll just chime in here, because nothing specifically about endianness has been addressed. So let's talk about it.
The construction of the value in the original question was endianness-agnostic, using shifts and other bitwise operations. This means that regardless of whether your system is big- or little-endian, the actual value will be the same. The difference will be its byte order in memory.
IEEE-754 specifies the bit layout of a float (sign, exponent, fraction) but not its byte order in memory, so implementations are free to choose one. This means that if you want to directly interpret your integer value as a float, the integer's byte order must match the float's byte order on your platform.
So, you can use this approach combined with a union if and only if you know that the endianness of floats and integers on your system is the same.
On the common Intel-based (x86/x86-64) architectures this is the case: integers and floats are both little-endian, so the union or memcpy approach works directly. On a platform where the two differed (for example, floats stored big-endian while integers are little-endian), you would need to put the bytes into the float's order yourself. A simple approach is to repack them explicitly, even if they are already in that order:
uint32_t n = float_i2f( input_val );
// only correct where floats are stored big-endian in memory
uint8_t bytes[4] = {
    (uint8_t)((n >> 24) & 0xff),
    (uint8_t)((n >> 16) & 0xff),
    (uint8_t)((n >> 8) & 0xff),
    (uint8_t)(n & 0xff)
};
float fval;
memcpy( &fval, bytes, sizeof(float) );
I'll stress that you only need to worry about this if you are trying to reinterpret your integer representation as a float or the other way round.
If you're only trying to output what the representation is in bits, then you don't need to worry. You can just display your integer in a useful form such as hex:
printf( "0x%08x\n", n );
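If you do want to reinterpret the bits, a quick check of whether float and uint32_t share a byte order on a given platform might look like this (a sketch; the function name is hypothetical and a 32-bit IEEE-754 float is assumed):

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "assumes a 32-bit float");

/* Returns 1 if the integer bit pattern of 2.0f, copied byte for byte
 * into a float, reads back as 2.0f; that only happens when float and
 * uint32_t are stored in the same byte order. */
int float_int_same_byte_order(void)
{
    uint32_t bits = 0x40000000u;   /* 2.0f in IEEE-754 single precision */
    float f;
    memcpy(&f, &bits, sizeof f);   /* copies the bytes in native order */
    return f == 2.0f;
}
```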

how can split integers into bytes without using arithmetic in c?

I am implementing the four basic arithmetic functions (addition, subtraction, division, multiplication) in C.
The basic structure I imagined for these functions is:
the program gets two operands from the user using scanf,
then splits these values into bytes and computes!
I've completed addition and subtraction,
but I forgot that I shouldn't use arithmetic operators,
so when splitting an integer into single bytes,
I wrote code like
but it uses arithmetic operators that I shouldn't use,
so I have to rewrite the splitting part,
but I really have no idea how to split an integer into single bytes without using
% or /.
To access the bytes of a variable, type punning can be used.
According to standard C (C99 and C11), only unsigned char guarantees that this can be done safely.
This could be done in the following way:
typedef unsigned int myint_t;
myint_t x = 1234;
union {
myint_t val;
unsigned char byte[sizeof(myint_t)];
} u;
Now you can, of course, access the bytes of x this way:
u.val = x;
for (int j = 0; j < sizeof(myint_t); j++)
printf("%d ",u.byte[j]);
However, as WhozCrag has pointed out, there are issues with endianness.
It cannot be assumed that the bytes are in any particular order.
So, before doing any computation with bytes, your program needs to check how the endianness works.
#include <limits.h> /* To use UCHAR_MAX */
unsigned long int ByteFactor = 1u + UCHAR_MAX; /* 256 almost everywhere */
u.val = 0;
for (int j = sizeof(myint_t) - 1; j >= 0 ; j--)
u.val = u.val * ByteFactor + j;
Now, when you print the values of u.byte[], you will see the order in which the bytes are arranged for the type myint_t.
The least significant byte will have value 0.
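The probe above can also be written as a self-contained function. This sketch builds the probe value with shifts instead of multiplication and reads its bytes with memcpy rather than a union; uint_byte_order and host_is_little_endian are hypothetical names.

```c
#include <limits.h>
#include <string.h>

/* Fill order[j] with the significance of the byte stored at memory
 * offset j of an unsigned int (0 = least significant). */
void uint_byte_order(unsigned char order[sizeof(unsigned int)])
{
    unsigned int probe = 0;
    /* byte j of the value, counted from the least significant, holds j */
    for (unsigned j = 0; j < sizeof probe; j++)
        probe |= (unsigned int)j << (j * CHAR_BIT);
    memcpy(order, &probe, sizeof probe);  /* raw memory order */
}

/* Little-endian: the least significant byte is stored first. */
int host_is_little_endian(void)
{
    unsigned char order[sizeof(unsigned int)];
    uint_byte_order(order);
    return order[0] == 0;
}
```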
Assuming 32-bit integers (if that is not the case, just change the sizes), there are several approaches:
1) BYTE pointer
int x; // your integer or whatever else data type
BYTE *p=(BYTE*)&x;
Just get the address of your data as a BYTE pointer
and access the bytes directly via a 1D array: p[0]..p[3].
2) union
union {
    int x; // your integer or whatever else data type
    BYTE p[4];
} a;
and access the bytes directly via a 1D array: a.p[0]..a.p[3].
If you do not have BYTE defined, change it to unsigned char.
With the ALU you can use not only % and / but also >> and &, which are much faster but still arithmetic.
Depending on the platform endianness, the output for x = 0x11223344 can be 11,22,33,44 or 44,33,22,11, so you need to keep that in mind (especially for code used on multiple platforms).
You need to handle the sign of the number. For unsigned integers there is no problem,
but for signed ones C (in practice) uses two's complement, so it is better to separate the sign before splitting, like:
int s;
if (x<0) { s=-1; x=-x; } else s=+1; // note: -x overflows for INT_MIN
// now split ...
[edit2] Logical/bit operations
x<<n, x>>n - bit shift left and right of x by n bits
x&y - bitwise logical AND (performs logical AND on each bit separately)
So when you have, for example, a 32-bit unsigned int (called DWORD), you can split it into BYTEs like this:
DWORD x; // input 32 bit unsigned int, e.g. 0x11223344
BYTE a0,a1,a2,a3; // output BYTEs: a0 is the least significant, a3 the most significant
a0=(BYTE)((x    )&255); // should be 0x44
a1=(BYTE)((x>> 8)&255); // should be 0x33
a2=(BYTE)((x>>16)&255); // should be 0x22
a3=(BYTE)((x>>24)&255); // should be 0x11
This approach is not affected by endianness,
but it uses the ALU.
The point is to shift the bits you want into positions 0..7 and mask out the rest.
The &255 and the (BYTE) casts are not needed on all compilers, but some do weird stuff without them, especially on signed variables like char or int.
For unsigned x:
x>>n is the same as x/pow(2,n) = x/(1<<n)
x&((1<<n)-1) is the same as x%pow(2,n) = x%(1<<n)
so (x>>8)=x/256 and (x&255)=x%256
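The shift-and-mask split, and its inverse, as a self-contained sketch using uint32_t and uint8_t from <stdint.h> instead of DWORD and BYTE (split_u32 and join_u32 are hypothetical names):

```c
#include <stdint.h>

/* Split a 32-bit unsigned value into bytes with shifts and masks;
 * out[0] is the least significant byte on any host byte order. */
void split_u32(uint32_t x, uint8_t out[4])
{
    out[0] = (uint8_t)(x & 0xFFu);
    out[1] = (uint8_t)((x >> 8) & 0xFFu);
    out[2] = (uint8_t)((x >> 16) & 0xFFu);
    out[3] = (uint8_t)((x >> 24) & 0xFFu);
}

/* Inverse: rebuild the value with shifts and bitwise OR. */
uint32_t join_u32(const uint8_t in[4])
{
    return (uint32_t)in[0] | ((uint32_t)in[1] << 8) |
           ((uint32_t)in[2] << 16) | ((uint32_t)in[3] << 24);
}
```

Casting to uint32_t before each shift keeps the arithmetic unsigned, so in[3] << 24 cannot overflow a signed int.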

Shifting bit values in C

Say I have the following code:
uint32_t fillThisNum(int16_t a, int16_t b, int16_t c){
uint32_t x = 0;
uint16_t temp_a = 0, temp_b = 0, temp_c = 0;
temp_a = a << 24;
temp_b = b << 4;
temp_c = c << 4;
x = temp_a|temp_b|temp_c;
return x;
}
Essentially what I'm trying to do is fill the 32-bit number with bit information that I can extract at a later time to perform different operations.
Parameter a would hold the first 24 bits of "data", b would hold the next 4 bits of "data" and c would hold the final 4 bits of "data".
I have a couple questions:
Do the parameters have to be the same bit length as the function type, and must they be unsigned?
Can I assign an unsigned int to a signed int? (i.e. uint32_t a = int32_t b;)
Can I fill a 32-bit number with the 16-bit parameters so long they don't exceed the length of the 32-bit return value.
Any advice/tips/hints would be much appreciated, thank you.
A correct way to write this code is:
uint32_t fillThisNum(uint32_t a, uint32_t b, uint32_t c)
{
    // mask out the bits we are not interested in
    a &= 0xFFFFFF; // save lowest 24 bits
    b &= 0xF;      // save lowest 4 bits
    c &= 0xF;      // save lowest 4 bits
    // arrange a,b,c within a 32-bit unit so that they do not overlap
    return (a << 8) + (b << 4) + c;
}
By using an unsigned type for the parameters, you avoid any issues with signed arithmetic overflow, sign extension, etc.
It's OK to pass signed values as arguments when calling the function, those values will be converted to unsigned.
By using uint32_t as the parameter type, you avoid having to declare any temporary variables or worry about type width when doing your casting. This way, it is easier to write clear code.
You don't have to do it this way but this is a simple way to make sure you don't make any mistakes.
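A sketch of the matching unpack side, assuming the layout used above (a in bits 31..8, b in bits 7..4, c in bits 3..0); pack_fields and the unpack_* helpers are hypothetical names.

```c
#include <stdint.h>

/* Pack: same masks and positions as fillThisNum, combined with OR. */
uint32_t pack_fields(uint32_t a, uint32_t b, uint32_t c)
{
    return ((a & 0xFFFFFFu) << 8) | ((b & 0xFu) << 4) | (c & 0xFu);
}

/* Unpack: shift the field down to bit 0, then mask off its neighbours. */
uint32_t unpack_a(uint32_t x) { return x >> 8; }          /* bits 31..8 */
uint32_t unpack_b(uint32_t x) { return (x >> 4) & 0xFu; } /* bits 7..4  */
uint32_t unpack_c(uint32_t x) { return x & 0xFu; }        /* bits 3..0  */
```

Because the fields do not overlap, | and + give the same result when packing.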
Do the parameters have to be the same bit length as the function type, and must they be unsigned?
No, the arguments and the return value can be different types.
Can I assign an unsigned int to a signed int? (i.e. uint32_t a = int32_t b;)
Yes, the value will be converted from a signed to an unsigned value. The conversion is done modulo 2^32, so with two's complement the bits in "b" stay the same: a negative "b" becomes a large positive 32-bit number in "a".
So, for example, let int8_t c = -127. If you perform an assignment uint8_t d = c, then "d" will be 129.
Can I fill a 32-bit number with the 16-bit parameters so long they don't exceed the length of the 32-bit return value.
If by that, you mean the way that you did in your code:
x = temp_a|temp_b|temp_c;
Yes, that is fine, with the caveat that @chux mentioned: the shift count must be less than the width of the (promoted) left operand, and whatever you store in a 16-bit temporary loses every bit above bit 15. If you wanted to set bits more significant than bit 15 in x, a way to do this would be to make the temporaries 32-bit (uint32_t) instead of 16-bit.

How to get float in bytes?

I am using the HIDAPI to send some data to a USB device. This data can be sent only as byte array and I need to send some float numbers inside this data array. I know floats have 4 bytes. So I thought this might work:
float f = 0.6;
char data[4];
data[0] = (int) f >> 24;
data[1] = (int) f >> 16;
data[2] = (int) f >> 8;
data[3] = (int) f;
And later all I had to do is:
g = (float)((data[0] << 24) | (data[1] << 16) | (data[2] << 8) | (data[3]) );
But testing this shows me that the lines like data[0] = (int) f >> 24; returns always 0. What is wrong with my code and how may I do this correctly (i.e. break a float inner data in 4 char bytes and rebuild the same float later)?
I was able to accomplish this with the following codes:
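float f = 0.1;
unsigned char *pc;
pc = (unsigned char*)&f;
// 0.6 in float (IEEE-754, little-endian byte order assumed)
pc[0] = 0x9A;
pc[1] = 0x99;
pc[2] = 0x19;
pc[3] = 0x3F;
std::cout << f << std::endl; // will print 0.6
// same result in one statement (note: this cast breaks the strict-aliasing rule)
*(unsigned int*)&f = (0x3F << 24) | (0x19 << 16) | (0x99 << 8) | (0x9A << 0);
I know memcpy() is a "cleaner" way of doing it, but I think this way the performance is somewhat better (although compilers typically turn a fixed-size memcpy into a single load or store).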
You can do it like this:
char data[sizeof(float)];
float f = 0.6f;
memcpy(data, &f, sizeof f); // send data
float g;
memcpy(&g, data, sizeof g); // receive data
In order for this to work, both machines need to use the same floating point representations.
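If the two ends might not share a byte order, one option (a sketch, not the only convention) is to fix the byte order on the wire while still assuming both sides use 32-bit IEEE-754 floats. Here the float's bits are sent least significant byte first; float_to_le_bytes and float_from_le_bytes are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "assumes a 32-bit float");

/* Sender: get the float's bits, then emit them LSB first with shifts,
 * so the wire order does not depend on the host's byte order. */
void float_to_le_bytes(float f, unsigned char out[4])
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    out[0] = (unsigned char)(u & 0xFFu);
    out[1] = (unsigned char)((u >> 8) & 0xFFu);
    out[2] = (unsigned char)((u >> 16) & 0xFFu);
    out[3] = (unsigned char)((u >> 24) & 0xFFu);
}

/* Receiver: rebuild the bits with shifts, then memcpy them into a float. */
float float_from_le_bytes(const unsigned char in[4])
{
    uint32_t u = (uint32_t)in[0] | ((uint32_t)in[1] << 8) |
                 ((uint32_t)in[2] << 16) | ((uint32_t)in[3] << 24);
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}
```

With this convention, 0.6f travels as 9A 99 19 3F on every host.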
As was rightly pointed out in the comments, you don't necessarily need to do the extra memcpy; instead, you can treat f directly as an array of characters (of any signedness). You still have to do memcpy on the receiving side, though, since you may not treat an arbitrary array of characters as a float! Example:
unsigned char const * const p = (unsigned char const *)&f;
for (size_t i = 0; i != sizeof f; ++i)
printf("Byte %zu is %02X\n", i, p[i]);
In standard C, it is guaranteed that any type can be accessed as an array of bytes.
A straight way to do this is, of course, by using unions:
#include <stdio.h>
int main(void)
{
    float x = 0x1.0p-3; /* 2^(-3) in hexadecimal floating notation */
    union float_bytes {
        float val;
        unsigned char bytes[sizeof(float)];
    } data;
    data.val = x;
    for (int i = 0; i < (int)sizeof(float); i++)
        printf("Byte %d: %.2x\n", i, data.bytes[i]);
    data.val *= 2; /* Doing something with the float value */
    x = data.val;  /* Retrieving the float value */
    printf("%.4f\n", data.val);
    return 0;
}
As you can see, it is not necessary at all to use memcpy or pointers...
The union approach is easy to understand, standard and fast.
I will explain why this approach is valid in C (C99).
[] A byte has CHAR_BIT bits (an integer constant >= 8; in almost all cases it is 8).
[] The unsigned char type uses all its bits to represent the value of the object, which is a nonnegative integer, in a pure binary representation. This means that there are no padding bits or bits used for any other strange purpose. (The same thing is not guaranteed for signed char or char.)
[] Every non-bit-field type is represented in memory as a contiguous sequence of bytes.
[] (Cited) "Values stored in non-bit-field objects of any other object type consist of n × CHAR_BIT bits, where n is the size of an object of that type, in bytes. The value may be copied into an object of type unsigned char [n] (e.g., by memcpy); [...]"
[] A pointer to a structure object (in particular, a union), suitably converted, points to its initial member. (Thus, there are no padding bytes at the beginning of a union.)
[6.5(7)] The content of an object can be accessed by a character type:
An object shall have its stored value accessed only by an lvalue expression that has one of
the following types:
— a type compatible with the effective type of the object,
— a qualified version of a type compatible with the effective type of the object,
— a type that is the signed or unsigned type corresponding to the effective type of the object,
— a type that is the signed or unsigned type corresponding to a qualified version of the
effective type of the object,
— an aggregate or union type that includes one of the aforementioned types among its
members (including, recursively, a member of a subaggregate or contained union), or
— a character type
More information:
A discussion in google groups
Another detail of the standard C99:
[footnote 82] Type-punning is allowed:
If the member used to access the contents of a union object is not the same as the member last used to
store a value in the object, the appropriate part of the object representation of the value is reinterpreted
as an object representation in the new type as described in 6.2.6 (a process sometimes called "type
punning"). This might be a trap representation.
The C language guarantees that any value of any type¹ can be accessed as an array of bytes. The type of bytes is unsigned char. Here's a low-level way of copying a float to an array of bytes. sizeof(f) is the number of bytes used to store the value of the variable f; you can also use sizeof(float) (you can either pass sizeof a variable or more complex expression, or its type).
float f = 0.6;
unsigned char data[sizeof(float)];
size_t i;
for (i = 0; i < sizeof(float); i++) {
    data[i] = ((unsigned char*)&f)[i];
}
The functions memcpy or memmove do exactly that (or an optimized version thereof).
float f = 0.6;
unsigned char data[sizeof(float)];
memcpy(data, &f, sizeof(f));
You don't even need to make this copy, though. You can directly pass a pointer to the float to your write-to-USB function, and tell it how many bytes to copy (sizeof(f)). You'll need an explicit cast if the function takes a pointer argument other than void*.
int write_to_usb(unsigned char *ptr, size_t size);
result = write_to_usb((unsigned char*)&f, sizeof(f));
Note that this will work only if the device uses the same representation of floating point numbers, which is common but not universal. Most machines use the IEEE floating point formats, but you may need to switch endianness.
As for what is wrong with your attempt: the >> operator operates on integers. In the expression (int) f >> 24, f is first converted to an int; writing f >> 24 without the cast would not even compile, because both operands of a shift must have integer type. Converting a floating-point value to an integer truncates it toward zero, so (int)0.6 is 0, and therefore data[0] and all the other bytes are 0.
You need to act on the bytes of the float object, not on its value.
¹ Excluding functions which can't really be manipulated in C, but including function pointers which functions decay to automatically.
Assuming that both devices have the same notion of how floats are represented, why not just do a memcpy, i.e.:
unsigned char payload[4];
memcpy(payload, &f, 4);
The safest way to do this, if you control both sides, is to send some sort of standardized representation. This isn't the most efficient, but it isn't too bad for small numbers:
hostPort writes the char * "34.56\0" byte by byte,
the client reads the char * "34.56\0",
then converts it to float with the library function atof or atof_l.
Of course, that isn't the most optimized, but it sure will be easy to debug.
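For the text approach described in this answer, printing with nine significant digits is enough for a 32-bit IEEE-754 float to survive the round trip exactly (C11 names this constant FLT_DECIMAL_DIG). A sketch using snprintf and strtof; float_to_text and float_from_text are hypothetical names.

```c
#include <stdio.h>
#include <stdlib.h>

/* Sender: format with 9 significant digits so no float value is lost.
 * Returns 1 if the text fit into buf, 0 otherwise. */
int float_to_text(float f, char *buf, size_t len)
{
    int n = snprintf(buf, len, "%.9g", (double)f);
    return n > 0 && (size_t)n < len;
}

/* Receiver: strtof parses straight to float (no detour through double). */
float float_from_text(const char *s)
{
    return strtof(s, NULL);
}
```

For example, 0.6f is sent as "0.600000024", which parses back to exactly 0.6f.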
If you wanted to get more optimized and creative: the first byte is the length, then the exponent, then each byte represents 2 decimal digits, so
34.56 becomes char array[] = {4,-2,34,56}; something like that would be portable. I would just try not to pass binary float representations around, because it can get messy fast.
It might be safer to union the float and char array. Put in the float member, pull out the 4 (or whatever the length is) bytes.
