C - combine two uint32_t to a double

Hi, I'm working on socket translation between two protocols. I read from a binary file and store the parsed header in an array of type uint32_t. Then I grab the fields out of the array and convert them to their respective types. So far, uint32_t/int32_t/uint16_t to int32_t works fine.
However, I get all kinds of wrong output when I try to combine two uint32_t values (appending one after the other) and then convert this 64-bit data into a double.
Being a newbie to C programming, I'm struggling with how doubles/floats are represented by the machine.
Basically what I want to do is: without altering the bit pattern of the two uint32_t, concatenate one after the other to make a 64-bit value, then interpret that value as a double. The most important thing is not to alter the bit pattern, as that part of the bit stream is supposed to be a double.
The following is part of the code:
uint32_t* buffer = (uint32_t*) malloc (arraySize * sizeof(uint32_t));
...
double outputSampleRate = ((union { double i; double outputSampleRate; })
{ .i = ((uint64_t)buffer[6] << 32 | (uint64_t)buffer[7])}).outputSampleRate;
Data in input file:
35.5
value after my code:
4630192998146113536.000000
Also, is there a better way to handle the socket header parsing?

Reinterpreting bit patterns through a union requires that the union members have the right types. Your union has two doubles, so when you read from one member, it has the same value as the other. The initializer converts the uint64_t to double numerically (preserving the value, not the bits), which explains the "garbage": it is the integer whose bits spell out your double, printed as a number. You will also need to use the correct word order (low word first? high word first?), and the easiest way to handle that is to avoid bit shifting altogether.
double outputSampleRate = ((union { uint32_t i[2]; double d; })
{ .i = { buffer[6], buffer[7] } }).d;
You could use uint64_t i but... why bother?
You could also use memcpy() to copy the bytes...
double outputSampleRate;
memcpy(&outputSampleRate, &buffer[6], sizeof(outputSampleRate));
The usual caveats apply: while these solutions are relatively portable, they do not take endian issues into account, and they will not work on systems that violate your assumptions about e.g. how big a double is, but it is generally safe to make these assumptions.
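For illustration, here is a minimal, self-contained sketch of the union trick above. The sample words (0x00000000 and 0x4041C000, i.e. 35.5), the low-word-first order, and the little-endian IEEE-754 host are assumptions made up for the demo, not facts about the asker's stream:
#include <stdint.h>
#include <stdio.h>

int main(void) {
    /* pretend these came out of the parsed header */
    uint32_t buffer[8] = {0};
    buffer[6] = 0x00000000;  /* assumed: low 32 bits of the double 35.5 */
    buffer[7] = 0x4041C000;  /* assumed: high 32 bits of the double 35.5 */

    double outputSampleRate = ((union { uint32_t i[2]; double d; })
        { .i = { buffer[6], buffer[7] } }).d;

    printf("%f\n", outputSampleRate);  /* 35.500000 on a little-endian IEEE-754 host */
    return 0;
}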

Your union definition is incorrect, you want i to be defined as uint64_t:
double outputSampleRate = ((union { uint64_t i; double d; })
{ .i = ((uint64_t)buffer[6] << 32 | (uint64_t)buffer[7])}).d;
You might also be running into a word-order issue. Try the other order (buffer[6] as the low word):
double outputSampleRate = ((union { uint64_t i; double d; })
{ .i = ((uint64_t)buffer[7] << 32) | (uint64_t)buffer[6]}).d;
Reinterpreting the bits of the representation via a union is actually supported by the C Standard and is known as type punning. It is not guaranteed to work if the bits represent a trap value for the destination type.
You could try other casts and tricks: test your luck and use a pointer cast:
double outputSampleRate = *(double *)&buffer[6];
Another way to force type punning is to use the memcpy function:
double outputSampleRate;
uint64_t temp = ((uint64_t)buffer[7] << 32) | (uint64_t)buffer[6];
memcpy(&outputSampleRate, &temp, sizeof(outputSampleRate));
Or simply:
double outputSampleRate;
memcpy(&outputSampleRate, &buffer[6], sizeof(outputSampleRate));
The pointer cast is not guaranteed to work (it breaks the strict-aliasing/effective-type rules), although I have seen it in production code; the memcpy() versions are well defined as long as the bytes actually form a valid double.
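If you are unsure which word order the stream uses, a quick diagnostic (a sketch, assuming an IEEE-754 double and the buffer from the question) is to build the 64-bit value both ways and see which one prints the expected 35.5:
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* sketch: try both word orders and see which one looks right */
static void show_both_orders(const uint32_t *buffer) {
    uint64_t hi_first = ((uint64_t)buffer[6] << 32) | buffer[7];
    uint64_t lo_first = ((uint64_t)buffer[7] << 32) | buffer[6];
    double a, b;
    memcpy(&a, &hi_first, sizeof a);
    memcpy(&b, &lo_first, sizeof b);
    printf("buffer[6] as high word: %f\n", a);
    printf("buffer[6] as low word:  %f\n", b);
}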

Related

How to convert to integer a char[4] of "hexadecimal" numbers [C/Linux]

So I'm working with system calls in Linux. I'm using "lseek" to navigate through the file and "read" to read. I'm also using Midnight Commander to see the file in hexadecimal. The next 4 bytes I have to read are in little-endian, and look like this: "2A 00 00 00". But of course, the bytes can be something like "2A 5F B3 00". I have to convert those bytes to an integer. How do I approach this? My initial thought was to read them into a vector of 4 chars, and then to build my integer from there, but I don't know how. Any ideas?
Let me give you an example of what I've tried. I have the following bytes in file "44 00". I have to convert that into the value 68 (4 + 4*16):
char value[2];
read(fd, value, 2);
int i = (value[0] << 8) | value[1];
The variable i is 17408 instead of 68.
UPDATE: Nvm, I solved it. I mixed up the indexes when shifting. It should've been value[1] << 8 ... | value[0]
General considerations
There seem to be several pieces to the question -- at least how to read the data, what data type to use to hold the intermediate result, and how to perform the conversion. If indeed you are assuming that the on-file representation consists of the bytes of a 32-bit integer in little-endian order, with all bits significant, then I probably would not use a char[] as the intermediate, but rather a uint32_t or an int32_t. If you know or assume that the endianness of the data is the same as the machine's native endianness, then you don't need any conversion at all.
Determining native endianness
If you need to compute the host machine's native endianness, then this will do it:
static const uint32_t test = 1;
_Bool host_is_little_endian = *(char *)&test;
It is worthwhile doing that, because it may well be the case that you don't need to do any conversion at all.
Reading the data
I would read the data into a uint32_t (or possibly an int32_t), not into a char array. Possibly I would read it into an array of uint8_t.
uint32_t data;
int num_read = fread(&data, 4, 1, my_file);
if (num_read != 1) { /* ... handle error ... */ }
Converting the data
It is worthwhile knowing whether the on-file representation matches the host's endianness, because if it does, you don't need to do any transformation (that is, you're done at this point in that case). If you do need to swap byte order, note that ntohl() and htonl() convert between big-endian (network) order and host order, so they are no-ops on a big-endian host and will not help with little-endian file data there; use le32toh() where available, or swap the bytes yourself:
if (!host_is_little_endian) {
    data = ((data & 0x000000ffu) << 24) | ((data & 0x0000ff00u) << 8)
         | ((data & 0x00ff0000u) >> 8)  | ((data & 0xff000000u) >> 24);
}
(This assumes that little- and big-endian are the only host byte orders you need to be concerned with. Historically, there have been others, which is why the byte-reorder functions come in pairs, but you are extremely unlikely ever to see one of the others.)
Signed integers
If you need a signed instead of an unsigned integer, then you can do the same, but use a union (note that unsigned and signed are keywords, so they cannot be used as member names):
union {
uint32_t u;
int32_t s;
} data;
In all of the preceding, use data.u in place of plain data, and at the end, read out the signed result from data.s.
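For completeness, assembling the value byte by byte avoids the endianness question entirely; a minimal sketch (the helper name read_le32 is made up here):
#include <stdint.h>

/* sketch: build a 32-bit value from 4 little-endian bytes, portably */
static uint32_t read_le32(const unsigned char *p) {
    return  (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
Reading the question's example bytes "2A 00 00 00" through this helper gives 42 on any host.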
Suppose you point into your buffer:
unsigned char *p = &buf[20];
and you want to see the next 4 bytes as an integer and assign them to your integer, then you can cast it:
int i;
i = *(int *)p;
You just told the compiler that p is a pointer to an int, de-referenced that pointer, and assigned the result to i.
However, this depends on the endianness of your platform. If your platform has a different endianness, you may first have to reverse-copy the bytes to a small buffer and then use this technique. For example:
unsigned char ibuf[4];
for (i=3; i>=0; i--) ibuf[i]= *p++;
i = *(int *)ibuf;
EDIT
The suggestions and comments of Andrew Henle and Bodo could give:
unsigned char *p = &buf[20];
int i, j;
unsigned char *pi = (unsigned char *)&i;
for (j=3; j>=0; j--) *pi++ = *p++;
// and the other endian:
int i, j;
unsigned char *pi = (unsigned char *)&i + 3;
for (j=3; j>=0; j--) *pi-- = *p++;

How to change double's endianness?

I need to read data from a serial port. The data is little-endian, but I need to do this in a platform-independent way, so I have to handle the endianness of double. I couldn't find anywhere how to do it, so I wrote my own function. But I am not sure about it (and I don't have a big-endian machine to try it on).
Will this work correctly? or is there some better approach I wasn't able to find?
double get_double(uint8_t * buff){
double value;
memcpy(&value,buff,sizeof(double));
uint64_t tmp;
tmp = le64toh(*(uint64_t*)&value);
value = *(double*) &tmp;
return value;
}
p.s. I'm assuming double is 8 bytes long, so please don't bother with that. I know that there might be problems with this
EDIT: After suggestion, that I should use union, I did this:
union double_int{
double d;
uint64_t i;
};
double get_double(uint8_t * buff){
union double_int value;
memcpy(&value,buff,sizeof(double));
value.i = le64toh(value.i);
return value.d;
}
better? (though I don't see much of a difference)
EDIT2: attempt #3, what do you think now?
double get_double(uint8_t * buff){
double value;
uint64_t tmp;
memcpy(&tmp,buff,sizeof(double));
tmp = le64toh(tmp);
memcpy(&value,&tmp,sizeof(double));
return value;
}
Edit3: I compile it with gcc -std=gnu99 -lpthread -Wall -pedantic
Edit4: After the next suggestion I added a condition checking the byte order. I honestly have no idea what I am doing right now (shouldn't there be something like __DOUBLE_WORD_ORDER__?)
double get_double(uint8_t * buff){
double value;
if (__FLOAT_WORD_ORDER__ == __ORDER_BIG_ENDIAN__){
uint64_t tmp;
memcpy(&tmp,buff,sizeof(double));
tmp = le64toh(tmp);
memcpy(&value,&tmp,sizeof(double));
}
else {
memcpy(&value,buff,sizeof(double));
}
return value;
}
I'd just go and copy the bytes manually to a temporary double and then return that. In C (and I think C++) it is fine to cast any pointer to char *; that's one of the explicit permissions for aliasing (in the standard draft n1570, par. 6.5/7), as I mentioned in one of my comments. The exception is absolutely necessary in order to get anything done that is close to hardware, including reversing bytes received over a network :-).
There is no standard compile-time way to determine whether that's necessary, which is a pity; if you want to avoid branches (probably a good idea if you deal with lots of data), you should look up your compiler's documentation for proprietary defines so that you can choose the proper code path at compile time. gcc, for example, has __FLOAT_WORD_ORDER__ set to either __ORDER_LITTLE_ENDIAN__ or __ORDER_BIG_ENDIAN__.
(Because of your question in the comments: __FLOAT_WORD_ORDER__ applies to floating point in general. It would be a very sick mind who designs an FPU that has different byte orders for different data sizes :-). In reality there aren't many mainstream architectures which have different byte orders for floating-point vs. integer types. As Wikipedia says, small systems may differ.)
Basile pointed to ntohd, a conversion function which exists on Windows but apparently not on Linux.
My naive sample implementation would be like
/** copy the bytes at data into a double, reversing the
byte order, and return that.
*/
double reverseValue(const char *data)
{
double result;
char *dest = (char *)&result;
for(int i=0; i<sizeof(double); i++)
{
dest[i] = data[sizeof(double)-i-1];
}
return result;
}
/** Adjust the byte order from network to host.
On a big endian machine this is a NOP.
*/
double ntohd(double src)
{
# if !defined(__FLOAT_WORD_ORDER__) \
|| !defined(__ORDER_LITTLE_ENDIAN__)
# error "oops: unknown byte order"
# endif
# if __FLOAT_WORD_ORDER__ == __ORDER_LITTLE_ENDIAN__
return reverseValue((char *)&src);
# else
return src;
# endif
}
There is a working example here: https://ideone.com/aV9mj4.
An improved version would cater to the given CPU -- it may have an 8 byte swap command.
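As a sketch of that idea (assuming GCC or Clang, which provide __builtin_bswap64; the function name is mine and this is untested on a big-endian box):
#include <stdint.h>
#include <string.h>

/* sketch: let the compiler emit the CPU's 8-byte swap instruction */
double ntohd_builtin(double src)
{
#if __FLOAT_WORD_ORDER__ == __ORDER_LITTLE_ENDIAN__
    uint64_t bits;
    memcpy(&bits, &src, sizeof bits);
    bits = __builtin_bswap64(bits);
    memcpy(&src, &bits, sizeof src);
#endif
    return src;
}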
If you can adapt the software on both sides (emitter & receiver) you could use some serialization library and format. That could be the old XDR routines like xdr_double in your case (see xdr(3)...). You could also consider ASN.1 or textual formats like JSON.
XDR is big-endian. You might try to find some NDR implementation, which is little-endian.
See also this related question, and STFW for htond
Ok, so finally, thanks to you all, I have found the best solution. Now my code looks like this:
double reverseDouble(const char *data){
double result;
char *dest = (char *)&result;
for(int i=0; i<sizeof(double); i++)
dest[i] = data[sizeof(double)-i-1];
return result;
}
double get_double(uint8_t * buff){
double value;
memcpy(&value,buff,sizeof(double));
if (__FLOAT_WORD_ORDER__ == __ORDER_BIG_ENDIAN__)
return reverseDouble((char *)&value);
else
return value;
}
p.s. (checking for defines etc. is somewhere else)
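A quick usage sketch of that get_double() (the byte values are mine: 1.5 encoded as a little-endian IEEE-754 double):
uint8_t wire[8] = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xF8, 0x3F };
printf("%f\n", get_double(wire));   /* prints 1.500000 */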
If you can assume the endianness of doubles is the same as for integers (which you can't, generally, but it's almost always the same), you can assemble it as an integer by shifting it in byte by byte, and then reinterpret that representation as a double.
double get_double_from_little_endian(uint8_t * buff) {
uint64_t u64 = ((uint64_t)buff[0] << 0 |
(uint64_t)buff[1] << 8 |
(uint64_t)buff[2] << 16 |
(uint64_t)buff[3] << 24 |
(uint64_t)buff[4] << 32 |
(uint64_t)buff[5] << 40 |
(uint64_t)buff[6] << 48 |
(uint64_t)buff[7] << 56);
double d;
memcpy(&d, &u64, sizeof d); /* reinterpret the bits without an aliasing cast */
return d;
}
This is only standard C and it's optimized well by compilers. The C standard doesn't define how floats are represented in memory, though, so compiler-dependent macros like __FLOAT_WORD_ORDER__ may give better results, but it gets more complex if you also want to cover the "mixed-endian IEEE format" found in the old ARM ABI.

Is using the most significant bit to tag a union considered a bad practice?

Suppose I have the following tagged union:
// f32 is a float of 32 bits
// uint32 is an unsigned int of 32 bits
struct f32_or_uint32 {
char tag;
union {
f32 f;
uint32 u;
}
}
If tag == 0, then it is an f32. If tag == 1, then it is a uint32. There is only one problem with that representation: it uses 64 bits when only 33 should be necessary. That is almost half wasted, which can be considerable when you are dealing with huge buffers. I never need the full 32 bits of the value, so I thought of using one bit as the flag and doing this instead:
#define IS_UINT32(x) (!((x) & 0x80000000))
#define IS_F32(x) ((x) & 0x80000000)
#define MAKE_F32(x) ((x) | 0x80000000)
#define EXTRACT_F32(x) ((x) & 0x7FFFFFFF)
union f32_or_uint32 {
f32 f;
uint32 u;
}
This way, I am using 31 bits for the value and only 1 for the tag. My question is: could this practice be detrimental to performance, maintainability and portability?
No, you can't do that. At least, not in the general sense.
An unsigned integer takes on 2^32 different values. It uses all 32 bits. Likewise, a float takes on (nearly) 2^32 different values. It uses all 32 bits.
With some care it might well be possible to isolate a bit that will always be 1 in one type and 0 for the other, across the range of values that you actually want to use. The high bit of unsigned int would be available if you decided to use values only up to 2^31. The low bit of float could be available if you didn't mind a small rounding error.
There is a better strategy available if the range of unsigned ints is smaller (say only 23 bits). You could select a high order bit pattern of 1+8 bits that was illegal for your usage of float. Perhaps you can manage without +/- infinity? Try 0x1ff.
To answer your other questions, it's relatively easy to create a new type like this in C++, using a class and some inline functions, and get good performance. Doing it with macros in C would tend to be more invasive of the code and more prone to bugs, but with similar performance. The instruction overhead required to do these tests and perhaps do some mask operations is unlikely to be detectable in most normal usages. Obviously that would have to be reconsidered in the case of a computationally intensive usage, but you can just see this as a typical space/speed trade-off.
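To make the 0x1ff idea concrete, here is a rough sketch (the macro names are mine; it sacrifices -infinity and assumes integers fit in 23 bits):
#include <stdint.h>

/* sketch: top 9 bits all set (0x1ff) marks a 23-bit integer payload;
   any other pattern is an ordinary (finite, +inf or positive-NaN) float */
#define TAG_MASK          0xFF800000u
#define IS_SMALL_INT(x)   (((x) & TAG_MASK) == TAG_MASK)
#define MAKE_SMALL_INT(v) (TAG_MASK | ((uint32_t)(v) & 0x007FFFFFu))
#define GET_SMALL_INT(x)  ((x) & 0x007FFFFFu)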
Let's talk first about whether this works conceptually. This trick more or less works if you're storing unsigned 32-bit numbers but you know they will never be greater than 231. It works because all numbers smaller than 231 will always have a "0" in the high bit. If you know it will always be 0, you don't actually have to store it.
The trick also more or less works if you are storing floating point numbers that are never negative. For single-precision floating point numbers, the high bit indicates sign, and is always 0 if the number is positive. (This property of floating-point numbers is not nearly as well-known among programmers, so you'd want to document this).
So assuming your use case fits in these parameters, the approach works conceptually. Now let's investigate whether it is possible to express in C.
You can't perform bitwise operations on floating-point values (see "how to perform a bitwise operation on floating point numbers" for more). So to get at the floating-point number's bit pattern, you need to copy it into an integer, e.g. with memcpy():
typedef uint32_t tagged_t;
tagged_t float_to_tagged(float f) {
uint32_t ret;
memcpy(&ret, &f, sizeof(f));
// Make sure the user didn't pass us a negative number.
assert((ret & 0x80000000) == 0);
return ret | 0x80000000;
}
Don't worry about that memcpy() call -- any compiler worth its salt will optimize it away. This is the best and fastest way to get at the float's underlying bit pattern.
And you'd likewise need to use memcpy to get the original float back.
float tagged_to_float(tagged_t val) {
float ret;
val &= 0x7FFFFFFF;
memcpy(&ret, &val, sizeof(val));
return ret;
}
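A quick usage sketch of the two helpers (the 3.25f value is arbitrary):
tagged_t t = float_to_tagged(3.25f);   /* high bit now set: IS_F32(t) from the question is true */
float back = tagged_to_float(t);       /* 3.25f again, bit for bit */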
I have answered your question directly because I believe in giving people the facts. That said, I agree with other posters who say this is unlikely to be your best design choice. Reflect on your use case: if you have very large buffers of these values, is it really the case that every single one can be either a uint32 or a float, and there is no pattern to it? If you can move this type information to a higher level, where the type info applies to all values in some part of the buffer, it will most definitely be more efficient than making your loops test the type of every value individually.
Using the high bit is going to be annoying on the most widespread platform (x86), because it is both the float's sign bit and the most significant bit of unsigned ints.
A scheme that's IMO slightly better is to use the lowest bit instead but that requires decoding (i.e. storing a shifted integer):
#include <stdio.h>
typedef union tag_uifp {
unsigned int ui32;
float fp32;
} uifp;
#define FLOAT_VALUE 0x00
#define UINT_VALUE 0x01
int get_type(uifp x) {
return x.ui32 & 1;
}
unsigned get_uiv(uifp x) {
return x.ui32 >> 1;
}
float get_fpv(uifp x) {
return x.fp32;
}
uifp make_uiv(unsigned x) {
uifp result;
result.ui32 = 1 + (x << 1);
return result;
}
uifp make_fpv(float x) {
uifp result;
result.fp32 = x;
result.ui32 &= ~1;
return result;
}
uifp data[10];
void setNumbers() {
int i;
for (i=0; i<10; i++) {
data[i] = (i & 1) ? make_fpv(i/10.0) : make_uiv(i);
}
}
void printNumbers() {
int i;
for (i=0; i<10; i++) {
if (get_type(data[i]) == FLOAT_VALUE) {
printf("%0.3f\n", get_fpv(data[i]));
} else {
printf("%i\n", get_uiv(data[i]));
}
data[i] = (i & 1) ? make_fpv(i) : make_uiv(i);
}
}
int main(int argc, const char *argv[]) {
setNumbers();
printNumbers();
return 0;
}
With this approach what you are losing is the least significant bit of precision from the float number (i.e. storing a float value and re-reading it is going to lose some accuracy) and only 31 bits are available for the integer.
You could instead try to use only NaN floating-point values, but this means that only 22 bits are easily available for the integers because of the float format (23 if you're also willing to lose infinity).
The idea of using lowest bits for tagging is used often (e.g. Lisp implementations).

Big-endian arithmetic in C

Is there a convenient way of doing arithmetic with big-endian data? Here's what I've been doing (in pseudocode):
main:
unsigned int big_endian_number = 0x12345678;
int multiplier = 7;
unsigned int little_endian_number = reverse_the_bytes(big_endian_number);
little_endian_number = little_endian_number * multiplier;
big_endian_number = reverse_the_bytes(little_endian_number);
This seems direct, but verbose and error-prone. There has to be a better way.
Network byte order is big-endian; use ntohl() (network to host) to convert to your local byte order, then htonl() to convert back.
I can post a code example if necessary but I think that's fairly straightforward.
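For instance, a sketch along those lines (the 0x12345678 and ×7 are just the question's example values):
#include <arpa/inet.h>
#include <stdint.h>

uint32_t multiply_be(uint32_t big_endian_number, uint32_t multiplier) {
    /* convert to host order, do the math, convert back to big-endian */
    return htonl(ntohl(big_endian_number) * multiplier);
}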
Personally, I would define some functions in a header to do the arithmetic ops you need:
#include <arpa/inet.h>
static inline uint32_t BEAdd_u32(uint32_t x, uint32_t y) {
return htonl(ntohl(x) + ntohl(y));
}
and use those instead of littering your code with conversions.
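Usage might look like this (a sketch with arbitrary operands):
uint32_t a_be = htonl(40);                 /* big-endian operands */
uint32_t b_be = htonl(2);
uint32_t sum_be = BEAdd_u32(a_be, b_be);   /* big-endian 42 */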
This question makes no sense. If you write x = 0x12345678; x *= 2, then x will have the value 0x2468acf0. Whether that is stored in memory with the f0 in the first byte or the 24 in the first byte is totally irrelevant. The whole point of using a high level language is that it works and you don't care how 0x12345678 is stored. (That is, the compiler converts the literal 0x12345678 into the appropriate representation on the box, and you don't have to worry about it.)

How to treat a struct with two unsigned shorts as if it were an unsigned int? (in C)

I created a structure to represent a positive fixed-point number. I want the parts on each side of the decimal point to take up 2 bytes each.
typedef struct Fixed_t {
unsigned short floor; //left side of the decimal point
unsigned short fraction; //right side of the decimal point
} Fixed;
Now I want to add two fixed point numbers, Fixed x and Fixed y. To do so I treat them like integers and add.
(Fixed) ( (int)x + (int)y );
But as my visual studio 2010 compiler says, I cannot convert between Fixed and int.
What's the right way to do this?
EDIT: I'm not committed to the {short floor, short fraction} implementation of Fixed.
You could attempt a nasty hack, but there's a problem here with endian-ness. Whatever you do to convert, how is the compiler supposed to know that you want floor to be the most significant part of the result, and fraction the less significant part? Any solution that relies on re-interpreting memory is going to work for one endian-ness but not another.
You should either:
(1) define the conversion explicitly. Assuming short is 16 bits:
unsigned int val = (x.floor << 16) + x.fraction;
(2) change Fixed so that it has an int member instead of two shorts, and then decompose when required, rather than composing when required.
If you want addition to be fast, then (2) is the thing to do. If you have a 64 bit type, then you can also do multiplication without decomposing: unsigned int result = (((uint64_t)x) * y) >> 16.
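A sketch of what option (2) could look like (the 16.16 layout and the names here are illustrative, not from the answer):
#include <stdint.h>

typedef uint32_t Fixed32;   /* 16.16 fixed point: high half = floor, low half = fraction */

static Fixed32 fixed_from_parts(uint16_t whole, uint16_t frac) {
    return ((Fixed32)whole << 16) | frac;
}
static Fixed32 fixed_add(Fixed32 x, Fixed32 y) { return x + y; }
static Fixed32 fixed_mul(Fixed32 x, Fixed32 y) {
    return (Fixed32)(((uint64_t)x * y) >> 16);
}
static uint16_t fixed_floor(Fixed32 x)    { return (uint16_t)(x >> 16); }
static uint16_t fixed_fraction(Fixed32 x) { return (uint16_t)(x & 0xFFFF); }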
The nasty hack, by the way, would be this:
unsigned int val;
assert(sizeof(Fixed) == sizeof(unsigned int)); // could be a static test
assert(2 * sizeof(unsigned short) == sizeof(unsigned int)); // could be a static test
memcpy(&val, &x, sizeof(unsigned int));
That would work on a big-endian system, where Fixed has no padding (and the integer types have no padding bits). On a little-endian system you'd need the members of Fixed to be in the other order, which is why it's nasty. Sometimes casting through memcpy is the right thing to do (in which case it's a "trick" rather than a "nasty hack"). This just isn't one of those times.
If you have to you can use a union but beware of endian issues. You might find the arithmetic doesn't work and certainly is not portable.
typedef struct Fixed_t {
union {
struct { unsigned short floor; unsigned short fraction; };
unsigned int whole;
};
} Fixed;
which is more likely (I think) to work on big-endian systems (which Windows/Intel isn't).
Some magic:
typedef union Fixed {
uint16_t w[2];
uint32_t d;
} Fixed;
#define Floor w[((Fixed){1}).d==1]
#define Fraction w[((Fixed){1}).d!=1]
Key points:
I use fixed-size integer types so you're not depending on short being 16-bit and int being 32-bit.
The macros for Floor and Fraction (capitalized to avoid clashing with floor() function) access the two parts in an endian-independent way, as foo.Floor and foo.Fraction.
Edit: At OP's request, an explanation of the macros:
Unions are a way of declaring an object consisting of several different overlapping types. Here we have uint16_t w[2]; overlapping uint32_t d;, making it possible to access the value as 2 16-bit units or 1 32-bit unit.
(Fixed){1} is a compound literal, and could be written more verbosely as (Fixed){{1,0}}. Its first element (uint16_t w[2];) gets initialized with {1,0}. The expression ((Fixed){1}).d then evaluates to the 32-bit integer whose first 16-bit half is 1 and whose second 16-bit half is 0. On a little-endian system, this value is 1, so ((Fixed){1}).d==1 evaluates to 1 (true) and ((Fixed){1}).d!=1 evaluates to 0 (false). On a big-endian system, it'll be the other way around.
Thus, on a little-endian system, Floor is w[1] and Fraction is w[0]. On a big-endian system, Floor is w[0] and Fraction is w[1]. Either way, you end up storing/accessing the correct half of the 32-bit value for the endian-ness of your platform.
In theory, a hypothetical system could use a completely different representation for 16-bit and 32-bit values (for instance interleaving the bits of the two halves), breaking these macros. In practice, that's not going to happen. :-)
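Usage then looks like this (a sketch; the values are arbitrary):
Fixed f;
f.Floor = 3;            /* integer part */
f.Fraction = 0x8000;    /* one half, in 16.16 */
uint32_t whole = f.d;   /* 0x00038000 on either endianness */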
This is not possible portably, as the compiler does not guarantee a Fixed will use the same amount of space as an int. The right way is to define a function Fixed add(Fixed a, Fixed b).
Just add the pieces separately. You need to know the value of the fraction that means "1" - here I'm calling that FRAC_MAX:
// c = a + b
void fixed_add( Fixed* a, Fixed* b, Fixed* c){
unsigned short carry = 0;
if((int)(a->fraction) + (int)(b->fraction) >= FRAC_MAX){
carry = 1;
c->fraction = a->fraction + b->fraction - FRAC_MAX;
} else {
c->fraction = a->fraction + b->fraction;
}
c->floor = a->floor + b->floor + carry;
}
Alternatively, if you're just setting the fixed point as being at the 2 byte boundary you can do something like:
void fixed_add( Fixed* a, Fixed *b, Fixed *c){
unsigned int ia = ((unsigned int)a->floor << 16) + a->fraction;
unsigned int ib = ((unsigned int)b->floor << 16) + b->fraction;
unsigned int ic = ia + ib;
c->floor = ic >> 16;
c->fraction = ic & 0xFFFF;
}
Try this:
typedef union {
struct Fixed_t {
unsigned short floor; //left side of the decimal point
unsigned short fraction; //right side of the decimal point
} Fixed;
int Fixed_int;
} FixedOrInt; /* the typedef needs a name and a trailing semicolon */
If your compiler puts the two shorts in 4 bytes, then you can use memcpy to copy your int into your struct, but as said in another answer, this is not portable... and quite ugly.
Do you really mind adding each field separately in a dedicated function?
Or do you want to keep the integer representation for performance reasons?
// add two Fixed
Fixed operator+( Fixed a, Fixed b )
{
...
}
//add Fixed and int
Fixed operator+( Fixed a, int b )
{
...
}
You may cast any addressable type to another one by using:
*(newtype *)&var
