The code below outputs different numbers each time.
apples.num prints 2, which is correct, but apples.weight prints a different number on every run; once it even printed "nan", and I don't know why this is happening.
The really strange thing is that the double (apples.volume) prints out 2.0.
Can anybody explain this to me?
#include <stdio.h>

typedef union {
    short num;
    float weight;
    double volume;
} Count;

int main(int argc, char const *argv[])
{
    Count apples;
    apples.num = 2;
    printf("Num: %d\nWeight: %f\nVolume: %f\n", apples.num, apples.weight, apples.volume);
    return 0;
}
It seems to me you don't quite understand what a union is. The members of a union are overlapping values (in other words, the three members of a Count union share the same space).
Assuming, just for the sake of demonstration, a short is 16 bits (2 bytes), a float is 32 bits (4 bytes) and a double is 64 bits (8 bytes), then the union is 8 bytes in size. In little-endian format, the num member refers to the first 2 bytes, the weight member refers to the first 4 bytes (including the 2 bytes of num) and the volume member refers to the full 8 bytes (including the 2 bytes of num and the four bytes of weight).
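You can see the overlap directly with a quick check (a minimal sketch under the same size assumptions; the exact numbers are platform-dependent):

#include <stdio.h>

typedef union {
    short num;
    float weight;
    double volume;
} Count;

int main(void)
{
    Count apples;
    // All three members start at the same address; the union itself
    // is as large as its largest member (8 bytes under these assumptions).
    printf("sizeof(Count) = %zu\n", sizeof(Count));
    printf("&num = %p\n&weight = %p\n&volume = %p\n",
           (void *)&apples.num, (void *)&apples.weight, (void *)&apples.volume);
    return 0;
}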
Initially, your union contains garbage, i.e. some unknown bit pattern, let's display it like this (in hex):
GG GG GG GG GG GG GG GG // GG stands for garbage, i.e. unknown bit patterns
If you set num to 2, then the first two bytes are 0x02 0x00, but the other bytes are still garbage:
02 00 GG GG GG GG GG GG
If you read weight, you are simply reading the first four bytes, interpreted as a float, so the float contains the bytes
02 00 GG GG
Since floating-point values have a totally different format than integral types like short, you can't predict what those bytes (i.e. that particular bit pattern) represent. They do not represent the floating-point value 2.0f, which is probably what you expected. Actually, the "more significant" part of a float is stored in the upper bytes, i.e. in the "garbage" part of weight, so it can be almost anything, including a NaN, +infinity, -infinity, etc.
Similarly, if you read volume, you have a double that consists of the bytes
02 00 GG GG GG GG GG GG
and that does not necessarily represent 2.0 either (although, by chance, it MAY come very close, if by coincidence the right bits are set at the right places, and if the low bits are rounded away when you display such a value).
Unions are not meant to do a proper conversion from int to float or double. They are merely meant to be able to store different kinds of values in the same space, and reading from a member other than the one you last assigned simply means you are reinterpreting the bits present in the union as something completely different. You are not converting.
So how do you convert? It is quite simple and does not require a union:
short num = 2;
float weight = num; // the compiler inserts code that performs a conversion to float
double volume = num; // the compiler inserts code that performs a conversion to double
If you access a union via the "wrong" member (i.e. a member other than the one it was assigned through), the result will depend on the semantics of the particular bit pattern for that type. Where the assigned type has a smaller bit-width than the accessed type, some of those bits will be undefined.
You are accessing uninitialized data. That gives undefined behavior (i.e. unknown values in this case). You also likely mean to use a struct instead of a union.
#include <stdio.h>

typedef union {
    short num;
    float weight;
    double volume;
} Count;

int main(int argc, char const *argv[])
{
    Count apples = { 0 };
    apples.num = 2;
    printf("Num: %d\nWeight: %f\nVolume: %f\n", apples.num, apples.weight, apples.volume);
    return 0;
}
Initialize the union by either zeroing it out or setting the largest member to a value. Even if you set the largest member, the other members might not make sense. Unions like this are commonly used for creating a byte/word/nibble/long-word data type and making the individual bits accessible, as sketched below.
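For illustration, here is a minimal sketch of that byte/word idiom (the type names and the 32-bit layout are my assumptions; which byte lands where is implementation-defined):

#include <stdint.h>
#include <stdio.h>

typedef union {
    uint32_t word;     // the whole 32-bit value
    uint8_t  bytes[4]; // the same four bytes, individually addressable
} Word32;

int main(void)
{
    Word32 w = { .word = 0x11223344 };
    // On a little-endian machine this prints: 44 33 22 11
    for (int i = 0; i < 4; i++)
        printf("%02x ", w.bytes[i]);
    printf("\n");
    return 0;
}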
Related
Is it possible to do a bitwise assignment in C? (Assigning the bits of a variable to another, assuming for simplicity that the source and the target of assignment have the same number of bits.)
For example, assign the int 1 (which has bits 0...01) to a float variable, obtaining not the float number 1.0f but the number (assuming IEEE-754 representation and assuming a float is 4 bytes like the int) with bits:
0 (sign) 0000'0000 (exponent) 0...01 (mantissa)
which would be a subnormal number (because the exponent bits are all 0's and the mantissa is not zero), hence representing the number
+2^(-126) * 2^(-23) (the 23-bit mantissa 0...01 represents 2^(-23)), that is 2^(-149), approximately 1.4 * 10^(-45).
NOTE: I'm in the process of learning. I am not trying to do this in a real-life scenario.
Given two objects a and b that are known to be the same size, you can copy the bits of b into a with memcpy(&a, &b, sizeof a);.
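A minimal sketch of that memcpy approach, checking the subnormal value predicted above (assumes IEEE-754 floats and that int and float are both 4 bytes):

#include <stdio.h>
#include <string.h>

int main(void)
{
    int b = 1;
    float a;
    // Copy the raw bits of b into a; no numeric conversion takes place.
    memcpy(&a, &b, sizeof a);
    printf("%g\n", a); // prints about 1.4013e-45, the subnormal 2^(-149)
    return 0;
}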
You could use a union for that:
int source;
float target;

union Data {
    int i;
    float f;
} data;

source = 42;
data.i = source;
target = data.f; // target should now have the bitwise equivalent of 42.
Be mindful about the sizes of the union members. All members start at the beginning of the union, and the union is as large as its largest member, so if the sizes differ, reading the larger member includes bytes the smaller one never set. To be sure how your compiler lays things out, check its documentation.
You can access the bits of other types via pointers (including memcpy, which takes them as arguments) or via unions. There is already another answer about the former, so I'll focus on the union approach.
Union members share the same memory, so you could use bit fields or integer types to access the individual bits, and then view the same bits by using a member of another type. However, note that both accessing the value of another type via a union and bit fields themselves are implementation defined, so this is inherently non-portable. In particular it is not specified how the bits end up being aligned in relation to other union members…
An example for the case of floats:
#include <stdio.h>

union float_bits {
    float value;
    struct {
        unsigned mantissa : 23;
        unsigned exponent : 8;
        unsigned sign : 1;
    };
    struct {
        unsigned bits : 32;
    };
};

static void print_float_bits(union float_bits f) {
    printf("(%c %02x %06x) (%08x) %f\n",
           f.sign ? '-' : '+', (unsigned) f.exponent, (unsigned) f.mantissa,
           (unsigned) f.bits, f.value);
}

int main(void) {
    union float_bits f;

    f.value = 1;
    print_float_bits(f);

    f.sign = 1;
    print_float_bits(f);

    // Largest normal number
    f.sign = 0; f.exponent = 0xFE; f.mantissa = 0x7FFFFF;
    print_float_bits(f);

    // Infinity
    f.exponent = 0xFF; f.mantissa = 0;
    print_float_bits(f);

    return 0;
}
On my x86-64 machine with 32-bit IEEE-754 floats, compiled with clang, this outputs:
(+ 7f 000000) (3f800000) 1.000000
(- 7f 000000) (bf800000) -1.000000
(+ fe 7fffff) (7f7fffff) 340282346638528859811704183484516925440.000000
(+ ff 000000) (7f800000) inf
Disclaimer: Very much implementation defined behaviour, non-portable and dangerous. Bitfields used for readability of the example. Other alternatives would be to put an array of char or some integer type like uint32_t in the union instead of bitfields, but it's still very much implementation defined behaviour.
So I'm working with system calls in Linux. I'm using "lseek" to navigate through the file and "read" to read. I'm also using Midnight Commander to see the file in hexadecimal. The next 4 bytes I have to read are in little-endian order, and look like this: "2A 00 00 00". But of course, the bytes can be something like "2A 5F B3 00". I have to convert those bytes to an integer. How do I approach this? My initial thought was to read them into an array of 4 chars, and then to build my integer from there, but I don't know how. Any ideas?
Let me give you an example of what I've tried. I have the following bytes in file "44 00". I have to convert that into the value 68 (4 + 4*16):
char value[2];
read(fd, value, 2);
int i = (value[0] << 8) | value[1];
The variable i is 17408 instead of 68.
UPDATE: Nvm. I solved it. I mixed up the indexes when I shifted. It should've been value[1] << 8 ... | value[0]
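For completeness, a corrected sketch of that fix. Note the unsigned char buffer: with a plain (possibly signed) char, a byte like 0xB3 would sign-extend and corrupt the result:

unsigned char value[2];
read(fd, value, 2);
// little-endian on file: low byte first, so value[1] is the high byte
int i = (value[1] << 8) | value[0];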
General considerations
There seem to be several pieces to the question -- at least how to read the data, what data type to use to hold the intermediate result, and how to perform the conversion. If indeed you are assuming that the on-file representation consists of the bytes of a 32-bit integer in little-endian order, with all bits significant, then I probably would not use a char[] as the intermediate, but rather a uint32_t or an int32_t. If you know or assume that the endianness of the data is the same as the machine's native endianness, then you don't need anything else.
Determining native endianness
If you need to compute the host machine's native endianness, then this will do it:
static const uint32_t test = 1;
_Bool host_is_little_endian = *(char *)&test;
It is worthwhile doing that, because it may well be the case that you don't need to do any conversion at all.
Reading the data
I would read the data into a uint32_t (or possibly an int32_t), not into a char array. Possibly I would read it into an array of uint8_t.
uint32_t data;
size_t num_read = fread(&data, 4, 1, my_file);
if (num_read != 1) { /* ... handle error ... */ }
Converting the data
It is worthwhile knowing whether the on-file representation matches the host's endianness, because if it does, you don't need to do any transformation (that is, you're done at this point in that case). Note that ntohl() and htonl() won't actually help here: they convert between big-endian (network) order and host order, so on a big-endian host, which is exactly where the little-endian file data needs swapping, they are no-ops. If you do need to swap, do it explicitly:
if (!host_is_little_endian) {
    data = ((data & 0xFF) << 24) | ((data & 0xFF00) << 8)
         | ((data >> 8) & 0xFF00) | (data >> 24);
}
(This assumes that little- and big-endian are the only host byte orders you need to be concerned with. Historically, there have been others, which is why the byte-reorder functions come in pairs, but you are extremely unlikely ever to see one of the others.)
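Alternatively, you can sidestep host endianness entirely by assembling the value from unsigned char bytes with shifts; a minimal sketch (same my_file as above):

unsigned char bytes[4];
uint32_t data = 0;

if (fread(bytes, 4, 1, my_file) == 1) {
    // little-endian on file: bytes[0] is the least significant byte
    data = (uint32_t)bytes[0]
         | ((uint32_t)bytes[1] << 8)
         | ((uint32_t)bytes[2] << 16)
         | ((uint32_t)bytes[3] << 24);
}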
Signed integers
If you need a signed instead of an unsigned integer, then you can do the same, but use a union (note that unsigned and signed are keywords in C, so the members need other names):
union {
    uint32_t u;
    int32_t  s;
} data;
In all of the preceding, use data.u in place of plain data, and at the end, read out the signed result from data.s.
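Putting the pieces together, a minimal sketch (same my_file and host_is_little_endian as above):

union {
    uint32_t u;
    int32_t  s;
} data;

if (fread(&data.u, 4, 1, my_file) == 1) {
    if (!host_is_little_endian) {
        data.u = ((data.u & 0xFF) << 24) | ((data.u & 0xFF00) << 8)
               | ((data.u >> 8) & 0xFF00) | (data.u >> 24);
    }
    printf("%d\n", data.s); // the signed result
}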
Suppose you point into your buffer:
unsigned char *p = &buf[20];
and you want to see the next 4 bytes as an integer and assign them to your integer, then you can cast it:
int i;
i = *(int *)p;
You just said that p is now a pointer to an int, then you de-referenced that pointer and assigned the value to i. (Note this assumes the buffer is suitably aligned for an int; strictly speaking it also violates the strict aliasing rule, so memcpy is the portable alternative.)
However, this depends on the endianness of your platform. If your platform has a different endianness, you may first have to reverse-copy the bytes to a small buffer and then use this technique. For example:
unsigned char ibuf[4];
for (i=3; i>=0; i--) ibuf[i]= *p++;
i = *(int *)ibuf;
EDIT
The suggestions and comments of Andrew Henle and Bodo could give:
unsigned char *p = &buf[20];
int i, j;
unsigned char *pi = (unsigned char *)&i;
for (j = 3; j >= 0; j--) *pi++ = *p++;

// and the other endian:
int i, j;
unsigned char *pi = ((unsigned char *)&i) + 3;
for (j = 3; j >= 0; j--) *pi-- = *p++;
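A sketch of the memcpy alternative mentioned above, which avoids both the alignment and the strict-aliasing concerns of the pointer cast (same buf; assumes the file's byte order matches the host's):

#include <string.h>

int i;
memcpy(&i, &buf[20], sizeof i); // copies 4 bytes; no alignment requirement on buf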
What I've heard about unions is that they reserve memory space for the biggest member within them. Here I'm trying to assign the 'same' value in two different ways, but it's ending up problematic.
First,
union h {
    int a;
    char b;
};

int main()
{
    union h h1;
    h1.b = 'X';
    printf("%d %c\n", h1.a, h1.b);
    return 0;
}
The output is a large random-looking number followed by 'X':
-1674402216 X
When I tried assigning h1.a a number as well,
union h {
    int a;
    char b;
};

int main()
{
    union h h1;
    h1.a = 1;
    h1.b = 'X';
    printf("%d %c\n", h1.a, h1.b);
    return 0;
}
This gives the output
88 X
Can someone help me figure out what exactly is happening here?
Thank you :)
Union members occupy the same space in memory.
So your union looks something like this:
        ---------
N       | X | a |   <- char b overlaps the lowest-addressed byte of a
N+1     |   | a |
N+2     |   | a |
N+3     |   | a |
        ---------
(Assuming a system with a 32-bit little-endian int, so byte N is the least significant byte of a.)
By assigning 'X' you have also modified one byte of your uninitialized a. Your value (-1674402216) is 9C32A658 in base 16. Its least significant byte is 58 hex, which is the ASCII code of 'X', and the other three bytes kept their uninitialized values.
In your second case you first initialized the int to 1 (which sets all but the least significant byte to 0), then you overwrote the least significant byte with 'X'. So you get 88 (the ASCII code of 'X') when the union is read as an int, and the original 'X' when looking at the char member.
Not to forget to mention: a layout like this is implementation-defined. The standard, as mentioned in the comments to your question, frowns on reading a member other than the one last written (in C the stored bytes are simply reinterpreted and the value is implementation-defined; in C++ it is undefined behavior), while at the same time it is common practice to use unions exactly for this (see these threads: Why do we need C Unions?, What is the strict aliasing rule?).
I am implementing the four basic arithmetic functions (add, sub, division, multiplication) in C.
The basic structure of these functions as I imagined it is:
the program gets two operands from the user using scanf,
and the program splits these values into bytes and computes!
I've completed addition and subtraction,
but I forgot that I shouldn't use arithmetic operators,
so when splitting an integer into single bytes,
I wrote code like

while (quotient != 0) {
    bin[i] = quotient % 2;
    quotient = quotient / 2;
    i++;
}

but those are exactly the arithmetic operators I shouldn't use,
so I have to rewrite the splitting part,
and I really have no idea how I can split an integer into single bytes without using % or /.
To access the bytes of a variable, type punning can be used.
According to standard C (C99 and C11), only unsigned char guarantees that this operation is performed in a safe way.
This could be done in the following way:
typedef unsigned int myint_t;
myint_t x = 1234;

union {
    myint_t val;
    unsigned char byte[sizeof(myint_t)];
} u;
Now, you can of course access the bytes of x in this way:

u.val = x;
for (int j = 0; j < sizeof(myint_t); j++)
    printf("%d ", u.byte[j]);
However, as WhozCrag has pointed out, there are issues with endianness.
It cannot be assumed that the bytes are in a determined order.
So, before doing any computation with bytes, your program needs to check how the endianness works.
#include <limits.h> /* To use UCHAR_MAX */

unsigned long int ByteFactor = 1u + UCHAR_MAX; /* 256 almost everywhere */

u.val = 0;
for (int j = sizeof(myint_t) - 1; j >= 0; j--)
    u.val = u.val * ByteFactor + j;
Now, when you print the values of u.byte[], you will see the order in which the bytes are arranged for the type myint_t.
The least significant byte will have the value 0.
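For instance, with a 4-byte myint_t the loop above stores 0x03020100 in u.val, so on a little-endian machine printing u.byte[] would show (a hedged illustration; a big-endian machine would print the reverse):

0 1 2 3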
I assume 32-bit integers (if that's not the case then just change the sizes). There are several approaches:
BYTE pointer
#include <stdio.h>

int x;               // your integer or whatever else data type
BYTE *p = (BYTE*)&x;

x = 0x11223344;
printf("%x\n", p[0]);
printf("%x\n", p[1]);
printf("%x\n", p[2]);
printf("%x\n", p[3]);
just get the address of your data as BYTE pointer
and access the bytes directly via 1D array
union
#include <stdio.h>

union
{
    int x;      // your integer or whatever else data type
    BYTE p[4];
} a;

a.x = 0x11223344;
printf("%x\n", a.p[0]);
printf("%x\n", a.p[1]);
printf("%x\n", a.p[2]);
printf("%x\n", a.p[3]);
and access the bytes directly via 1D array
[notes]
if you do not have BYTE defined then change it for unsigned char
with the ALU you can use not only %, / but also >>, & which is way faster, though it still uses arithmetic
depending on the platform endianness the output can be 11,22,33,44 or 44,33,22,11 so you need to keep that in mind (especially for code used on multiple platforms)
you need to handle the sign of the number; for unsigned integers there is no problem,
but for signed ones C uses two's complement, so it is better to separate the sign before splitting, like:
int s;
if (x<0) { s=-1; x=-x; } else s=+1;
// now split ...
[edit2] logical/bit operations
x<<n,x>>n - is bit shift left and right of x by n bits
x&y - is bitwise logical and (perform logical AND on each bit separately)
so when you have for example a 32-bit unsigned int (called DWORD) you can split it into BYTEs like this:
DWORD x;             // input 32 bit unsigned int
BYTE a0, a1, a2, a3; // output BYTEs: a0 is the least significant, a3 the most significant

x = 0x11223344;
a0 = (BYTE)((x      ) & 255); // should be 0x44
a1 = (BYTE)((x >>  8) & 255); // should be 0x33
a2 = (BYTE)((x >> 16) & 255); // should be 0x22
a3 = (BYTE)((x >> 24) & 255); // should be 0x11
this approach is not affected by endianness
but it uses ALU
the point is to shift the bits you want into positions 0..7 and mask out the rest
the & 255 and the explicit casts are not needed on all compilers but some do weird stuff without them, especially with signed variables like char or int
x>>n is the same as x/(pow(2,n))=x/(1<<n)
x&((1<<n)-1) is the same as x%(pow(2,n))=x%(1<<n)
so (x>>8)=x/256 and (x&255)=x%256
I created a structure to represent a fixed-point positive number. I want the numbers on both sides of the decimal point to occupy 2 bytes each.
typedef struct Fixed_t {
    unsigned short floor;    //left side of the decimal point
    unsigned short fraction; //right side of the decimal point
} Fixed;
Now I want to add two fixed point numbers, Fixed x and Fixed y. To do so I treat them like integers and add.
(Fixed) ( (int)x + (int)y );
But as my Visual Studio 2010 compiler says, I cannot convert between Fixed and int.
What's the right way to do this?
EDIT: I'm not committed to the {short floor, short fraction} implementation of Fixed.
You could attempt a nasty hack, but there's a problem here with endian-ness. Whatever you do to convert, how is the compiler supposed to know that you want floor to be the most significant part of the result, and fraction the less significant part? Any solution that relies on re-interpreting memory is going to work for one endian-ness but not another.
You should either:
(1) define the conversion explicitly. Assuming short is 16 bits:
unsigned int val = (x.floor << 16) + x.fraction;
(2) change Fixed so that it has an int member instead of two shorts, and then decompose when required, rather than composing when required.
If you want addition to be fast, then (2) is the thing to do. If you have a 64 bit type, then you can also do multiplication without decomposing: unsigned int result = (((uint64_t)x) * y) >> 16.
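For instance, with option (2) the multiplication could look like this (a sketch; the single 16.16 value field v is my assumption, not the OP's layout):

#include <stdint.h>

typedef struct { uint32_t v; } Fixed; // 16.16 fixed point, field name assumed

Fixed fixed_mul(Fixed a, Fixed b)
{
    // Widen to 64 bits so the 32x32-bit product doesn't overflow,
    // then shift right by 16 to drop the extra fraction bits.
    Fixed r = { (uint32_t)(((uint64_t)a.v * b.v) >> 16) };
    return r;
}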
The nasty hack, by the way, would be this:
unsigned int val;
assert(sizeof(Fixed) == sizeof(unsigned int)); // could be a static test
assert(2 * sizeof(unsigned short) == sizeof(unsigned int)); // could be a static test
memcpy(&val, &x, sizeof(unsigned int));
That would work on a big-endian system, where Fixed has no padding (and the integer types have no padding bits). On a little-endian system you'd need the members of Fixed to be in the other order, which is why it's nasty. Sometimes casting through memcpy is the right thing to do (in which case it's a "trick" rather than a "nasty hack"). This just isn't one of those times.
If you have to you can use a union but beware of endian issues. You might find the arithmetic doesn't work and certainly is not portable.
typedef struct Fixed_t {
    union {
        struct { unsigned short floor; unsigned short fraction; };
        unsigned int whole;
    };
} Fixed;
which is more likely (I think) to work as intended on a big-endian machine (which Windows/Intel isn't).
Some magic:
typedef union Fixed {
    uint16_t w[2];
    uint32_t d;
} Fixed;
#define Floor w[((Fixed){1}).d==1]
#define Fraction w[((Fixed){1}).d!=1]
Key points:
I use fixed-size integer types so you're not depending on short being 16-bit and int being 32-bit.
The macros for Floor and Fraction (capitalized to avoid clashing with floor() function) access the two parts in an endian-independent way, as foo.Floor and foo.Fraction.
Edit: At OP's request, an explanation of the macros:
Unions are a way of declaring an object consisting of several different overlapping types. Here we have uint16_t w[2]; overlapping uint32_t d;, making it possible to access the value as 2 16-bit units or 1 32-bit unit.
(Fixed){1} is a compound literal, and could be written more verbosely as (Fixed){{1,0}}. Its first element (uint16_t w[2];) gets initialized with {1,0}. The expression ((Fixed){1}).d then evaluates to the 32-bit integer whose first 16-bit half is 1 and whose second 16-bit half is 0. On a little-endian system, this value is 1, so ((Fixed){1}).d==1 evaluates to 1 (true) and ((Fixed){1}).d!=1 evaluates to 0 (false). On a big-endian system, it'll be the other way around.
Thus, on a little-endian system, Floor is w[1] and Fraction is w[0]. On a big-endian system, Floor is w[0] and Fraction is w[1]. Either way, you end up storing/accessing the correct half of the 32-bit value for the endian-ness of your platform.
In theory, a hypothetical system could use a completely different representation for 16-bit and 32-bit values (for instance interleaving the bits of the two halves), breaking these macros. In practice, that's not going to happen. :-)
This is not possible portably, as the compiler does not guarantee a Fixed will use the same amount of space as an int. The right way is to define a function Fixed add(Fixed a, Fixed b).
Just add the pieces separately. You need to know the value of the fraction that means "1" - here I'm calling that FRAC_MAX:
// c = a + b
void fixed_add(Fixed* a, Fixed* b, Fixed* c) {
    unsigned short carry = 0;
    int frac = (int)(a->fraction) + (int)(b->fraction);
    if (frac >= FRAC_MAX) { // the fractions, not the floors, generate the carry
        carry = 1;
        frac -= FRAC_MAX;
    }
    c->fraction = frac;
    c->floor = a->floor + b->floor + carry;
}
Alternatively, if you're just putting the fixed point at the 2-byte boundary you can do something like:

void fixed_add(Fixed* a, Fixed* b, Fixed* c) {
    unsigned int ia = ((unsigned int)a->floor << 16) + a->fraction; // parenthesize: + binds tighter than <<
    unsigned int ib = ((unsigned int)b->floor << 16) + b->fraction;
    unsigned int ic = ia + ib;
    c->floor = ic >> 16;
    c->fraction = ic & 0xFFFF;
}
Try this:

typedef union {
    struct Fixed_t {
        unsigned short floor;    //left side of the decimal point
        unsigned short fraction; //right side of the decimal point
    } Fixed;
    int Fixed_int;
} FixedUnion;
If your compiler puts the two shorts in 4 bytes, then you can use memcpy to copy your int into your struct, but as said in another answer, this is not portable... and quite ugly.
Do you really mind adding each field separately in a dedicated function?
Do you want to keep the integer for performance reasons?
// add two Fixed (C++ operator syntax; in C these would be ordinary named functions)
Fixed operator+( Fixed a, Fixed b )
{
    ...
}

// add Fixed and int
Fixed operator+( Fixed a, int b )
{
    ...
}
You may cast any addressable type to another one by using:
*(newtype *)&var
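For example (a sketch using the Fixed struct above; note that this kind of access breaks the strict aliasing rule, so memcpy is the safer way to reinterpret bytes in portable code):

Fixed f = { 1, 0x8000 };                 // 1.5 in a 16.16 representation
unsigned int bits = *(unsigned int *)&f; // reinterpret the struct's bytes as an int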