union: strange behavior [c] - c

It is said: "A union is a special class type that can hold only one of its non-static data members at a time." (http://en.cppreference.com/w/cpp/language/union)
But how can it hold more than one member?
y is 8 bytes and x is 4 bytes. The size of the union is the size of its largest member (8 bytes). It cannot hold 4 + 8 = 12 bytes...
I'm confused.
#include <stdio.h>
#include <stdlib.h>
union number {
int x;
double y;
};
int main()
{
union number value;
// ok
value.x = 1;
printf("\n int: %5d\ndouble: %f\n", value.x, value.y);
// ok
value.y = 1.0;
printf("\n int: %5d\ndouble: %f\n", value.x, value.y);
// NOT OK! But if I swap `value.x and value.y` it will work properly...
value.y = 1.0;
value.x = 1;
printf("\n int: %5d\ndouble: %f\n", value.x, value.y);
return 0;
}
The output is

The union allocates enough memory for the "largest" datatype.
As an example, if the union contained uint8_t a and uint16_t b, the data would align as follows:
Bits LSB 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 MSB
Uint8_t a -------------->
Uint16_t b ------------------------------------>
Var b= 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 in binary
If you then read a, you get the first eight bits: 00000001 in the LSB-first notation above, which is 128 in decimal.
Oh well, I don't know if I confused you even more now :).

A union can have as many members as you want, but only one member at a time is valid. This is because all members of a union share the same memory, so if you change one member, the others change as well. However, assigning to one member and then reading another may not produce the result you expect, since the layout in memory often differs between types. This is the case for int and double: one cannot be used as the other.
What you're looking for is a structure:
struct number
{
int x;
double y;
};
The above contains two separate members that can both be used at the same time, as they no longer share the same memory.

Related

Memory alignment and padding in c

Consider this code segment
struct {
short x[5];
union {
float y;
long z;
} u;
} t;
Assume that objects of type short, float and long occupy 2 bytes, 4 bytes and 8 bytes, respectively. What is the memory requirement for variable t, taking alignment into consideration?
My attempt, without considering alignment: the struct reserves 10 bytes for x (5 elements of 2 bytes each) and 8 bytes for long z, so the total would be 18 bytes. But I want to know more about what this alignment is.
I want to know how memory alignment works.
From the C standard:
alignment
requirement that objects of a particular type be located on storage boundaries with
addresses that are particular multiples of a byte address
and further
Alignment of objects
Complete object types have alignment requirements which place restrictions on the
addresses at which objects of that type may be allocated. An alignment is an
implementation-defined integer value representing the number of bytes between
successive addresses at which a given object can be allocated. An object type imposes an
alignment requirement on every object of that type
Notice the part: implementation-defined
So an implementation of the C-standard is allowed to specify restrictions on the addresses where an object of a specific type may be located.
For instance, it could be that float should always be placed at addresses that are multiples of 8, i.e. valid addresses would be X * 8. So 4000, 4008, 4016 would be valid while 4001, 4002, 4003, 4004, 4005, 4006, 4007 would be invalid.
For such implementations padding will be inserted into structs in order to get a valid address.
For your example:
If your compiler requires 8-bytes alignment of long, it will have to insert padding between x and z to make z start at an 8 byte aligned address. The size will then be 24 bytes.
But remember that this is implementation defined.
You can try this program:
#include <stdio.h>
struct {
short x[5];
union {
float y;
long z;
} u;
}t;
int main(void) {
printf("Size of t is %zu\n", sizeof(t));
printf("Size of t.x is %zu\n", sizeof(t.x));
printf("Size of t.u.y is %zu\n", sizeof(t.u.y));
printf("Size of t.u.z is %zu\n", sizeof(t.u.z));
printf("Location of t is %p\n", (void*)&t);
printf("Location of t.x is %p\n", (void*)t.x);
printf("Location of t.y is %p\n", (void*)&t.u.y);
printf("Location of t.z is %p\n", (void*)&t.u.z);
return 0;
}
Possible output:
Size of t is 24
Size of t.x is 10
Size of t.u.y is 4
Size of t.u.z is 8
Location of t is 0x559b60552020
Location of t.x is 0x559b60552020
Location of t.y is 0x559b60552030
Location of t.z is 0x559b60552030
Notice here that the size of t.x is 10 but the address distance between t.x and t.y is 16 (aka 0x10) so there are 6 bytes padding between t.x and t.z.
therefore total would be equal to 18 bytes but i want to know more about what is this alignment?
I am a compiler. I assume from your post that:
type - size and alignment
short - 2 bytes
float - 4 bytes
long - 8 bytes
So I have this code:
struct {
short x[5]; // 2 * 5 = 10 bytes, has to start at address divisible by 2
union {
float y; // 4 bytes, has to start at address divisible by 4
long z; // 8 bytes, has to start at address divisible by 8
} u; // a union, so we take the bigger size and alignment,
// so it will have 8 bytes, and it also has to start at an address
// divisible by 8
} t; // a structure, I need to take the biggest alignment requirement of its members
// so it has to start at an address divisible by 8
// and it has at least the size of sum of members
// so at least 10 + 8 bytes + padding
So because _Alignof(long) = 8, I'll make typeof(t) start at address divisible by 8. So I'll look at my linker script and pick... and pick for example memory address 200.
But u needs to start at address divisible by 8. So:
struct { // memory cell 200
short x[5]; // 5 half-words in memory cells 200 - 209
// (209 inclusive, 210 exclusive)
// 210 % 8 = 2, so we need to insert 6 bytes of padding here
// so that the union will start at an address divisible by 8
// padding 6 bytes, memory cells 210 - 215
union { // long-word in memory cells 216 - 223
// 216 is divisible by 8
float y; // word in memory cells 216 - 219
long z; // long-word in memory cells 216 - 223
} u;
} t; // so it has a size of 10 + 6 bytes padding + 8 = 24 bytes
Alignment is just that the variable has to start at memory address that is divisible by a number. So you insert padding between members, so that they start at address divisible by a number they need to.

Union data type field behavior

I am having trouble figuring out how this piece of code works.
Mainly, I am confused by how does x.br get the value of 516 after x.str.a and x.str.b get their values of 4 and 2, respectively.
I am new to unions, so maybe there is something I am missing, but shouldn't there only be 1 active field in an union at any given time?
#include <stdio.h>
void f(short num, short* res) {
if (num) {
*res = *res * 10 + num % 10;
f(num / 10, res);
}
}
typedef union {
short br;
struct {
char a, b;
} str;
} un;
int main(void) {
short res = 0; un x;
x.str.a = 4;
x.str.b = 2;
f(x.br, &res);
x.br = res;
printf("%d %d %d\n", x.br, x.str.a, x.str.b);
}
I would be very thankful if somebody cleared this up for me, thank you!
To add to @Deepstop's answer, and to correct an important point about your understanding:
shouldn't there only be 1 active field in an union at any given time?
There's no such thing as an active field in unions. All the different fields refer to the exact same piece of memory (except for misaligned data).
You can look at the different fields as different ways to interpret the same data, i.e. you can read your union as two fields of 8 bits or as one field of 16 bits. Both will always "work" at the same time.
OK short is likely a 16 bit integer. Char a, b are each 8 bit chars.
So you are using the same 16 bit memory location for both.
0000 0010 0000 0100 is the 16 bit representation of 516
0000 0010 is the 8 bit representation of 2
0000 0100 is the 8 bit representation of 4
The CPU you are running this on is 'little-endian', so the low-order byte of a 16 bit integer comes first, which is the 4 (x.str.a), and the high-order byte, the 2 (x.str.b), comes second.
So by writing 4 then 2 into consecutive bytes, and reading them back as a 16 bit integer, you get 516, which is 2 * 256 + 4. If you wrote 5 then 3, you'd get 3 * 256 + 5, which is 773.
The point is that union puts two data structures in exactly the same memory location.
how does x.br get the value of 516 after x.str.a and x.str.b get their
values of 4 and 2
Your union definition
typedef union {
short br;
struct {
char a, b;
} str;
} un;
specifies that un.br shares the same memory address as un.str. This is the whole point of a union. It means that when you modify the value of un.br, you are also modifying the values of un.str.a and un.str.b.
I am new to unions, so maybe there is something I am missing, but
shouldn't there only be 1 active field in an union at any given time?
Not sure what you mean by "only be 1 active field", but the members of a union are all mapped to the same memory address so any time that you write a value to a union member it writes that value to the same memory address as the other members. If you want the members to be mapped to different memory addresses so that when you write the value of a member it only modifies the value of that specific member, then you should use a struct and not a union.

Difference between C struct bitfields on char and on int

When using bitfields in C, I found out differences I did not expect related to the actual type that is used to declare the fields.
I didn't find any clear explanation. The problem is now identified, so even though there is no clear answer, this post may be useful to anyone facing the same issue.
Still, if someone can point to a formal explanation, that would be great.
The following structure, takes 2 bytes in memory.
struct {
char field0 : 1; // 1 bit - bit 0
char field1 : 2; // 2 bits - bits 2 down to 1
char field2 ; // 8 bits - bits 15 down to 8
} reg0;
This one takes 4 bytes in memory, the question is why ?
struct {
int field0 : 1; // 1 bit - bit 0
int field1 : 2; // 2 bits - bits 2 down to 1
char field2 ; // 8 bits - bits 15 down to 8
} reg1;
In both cases, the bits are organized in memory in the same way: field 2 is always taking bits 15 down to 8.
I tried to find some literature on the subject, but still can't get a clear explanation.
The two most appropriate links I could find are:
http://www.catb.org/esr/structure-packing/
http://www.msg.ucsf.edu/local/programs/IBM_Compilers/C:C++/html/language/ref/clrc03defbitf.htm
However, neither really explains why the second structure takes 4 bytes. Actually, reading carefully, I would even expect the structure to take 2 bytes.
In both cases,
field0 takes 1 bit
field1 takes 2 bits
field2 takes 8 bits, and is aligned on the first available byte address
Hence, the useful data requires 2 bytes in both cases.
So what is happening behind the scenes that makes reg1 take 4 bytes?
Full Code Example:
#include <stdio.h>
// Register Structure using char
typedef struct {
// Reg0
struct _reg0_bitfieldsA {
char field0 : 1;
char field1 : 2;
char field2 ;
} reg0;
// Nextreg
char NextReg;
} regfileA_t;
// Register Structure using int
typedef struct {
// Reg1
struct _reg1_bitfieldsB {
int field0 : 1;
int field1 : 2;
char field2 ;
} reg1;
// Reg
char NextReg;
} regfileB_t;
regfileA_t regsA;
regfileB_t regsB;
int main(int argc, char const *argv[])
{
int* ptrA, *ptrB;
printf("sizeof(regsA) == %zu\n",sizeof(regsA)); // prints 3 - as expected
printf("sizeof(regsB) == %zu\n",sizeof(regsB)); // prints 8 - why ?
printf("\n");
printf("sizeof(regsA.reg0) == %zu\n",sizeof(regsA.reg0)); // prints 2 - as expected
printf("sizeof(regsB.reg1) == %zu\n",sizeof(regsB.reg1)); // prints 4 - the int bit-fields make the struct use 4 bytes
printf("\n");
printf("addrof(regsA.reg0) == 0x%08x\n",(int)(&regsA.reg0)); // 0x0804A028
printf("addrof(regsA.NextReg) == 0x%08x\n",(int)(&regsA.NextReg)); // 0x0804A02A = prev + 2
printf("addrof(regsB.reg1) == 0x%08x\n",(int)(&regsB.reg1)); // 0x0804A020
printf("addrof(regsB.NextReg) == 0x%08x\n",(int)(&regsB.NextReg)); // 0x0804A024 = prev + 4 - my register is not at the right place then.
printf("\n");
regsA.reg0.field0 = 1;
regsA.reg0.field1 = 3;
regsA.reg0.field2 = 0xAB;
regsB.reg1.field0 = 1;
regsB.reg1.field1 = 3;
regsB.reg1.field2 = 0xAB;
ptrA = (int*)&regsA;
ptrB = (int*)&regsB;
printf("regsA.reg0.value == 0x%08x\n",(int)(*ptrA)); // 0x0000AB07 (expected)
printf("regsB.reg0.value == 0x%08x\n",(int)(*ptrB)); // 0x0000AB07 (expected)
return 0;
}
When I first wrote the struct, I expected reg1 to take only 2 bytes, and hence the next register to be at offset 2.
The relevant part of the standard is C11/C17 6.7.2.1p11:
An implementation may allocate any addressable storage unit large enough to hold a bit-field. If enough space remains, a bit-field that immediately follows another bit-field in a structure shall be packed into adjacent bits of the same unit. If insufficient space remains, whether a bit-field that does not fit is put into the next unit or overlaps adjacent units is implementation-defined. The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined. The alignment of the addressable storage unit is unspecified.
that, in connection with C11/C17 6.7.2.1p5
A bit-field shall have a type that is a qualified or unqualified version of _Bool, signed int, unsigned int, or some other implementation-defined type. It is implementation-defined whether atomic types are permitted.
and that you're using char means that there is nothing to discuss in general - for a specific implementation check the compiler manuals. Here's the one for GCC.
From the two excerpts it follows that an implementation is free to use absolutely whatever types it wants to implement the bit-fields. It could even use int64_t for both of these cases, giving a structure of size 16 bytes. The only thing a conforming implementation must do is pack the bits within the chosen addressable storage unit if enough space remains.
For GCC on System-V ABI on 386-compatible (32-bit processors), the following stands:
Plain bit-fields (that is, those neither signed nor unsigned) always have non-negative values. Although they may have type char, short, int, long, (which can have negative values), these bit-fields have the same range as a bit-field of the same size with the corresponding unsigned type. Bit-fields obey the same size and alignment rules as other structure and union members, with the following additions:
Bit-fields are allocated from right to left (least to most significant).
A bit-field must entirely reside in a storage unit appropriate for its declared type. Thus a bit-field never crosses its unit boundary.
Bit-fields may share a storage unit with other struct/union members, including members that are not bit-fields. Of course, struct members occupy different parts of the storage unit.
Unnamed bit-fields' types do not affect the alignment of a structure or union, although individual bit-fields' member offsets obey the alignment constraints.
i.e. in System-V ABI, 386, int f: 1 says that the bit-field f must be within an int. If entire bytes of space remains, a following char within the same struct will be packed inside this int, even if it is not a bit-field.
Using this knowledge, the layout for
struct {
int a : 1; // 1 bit - bit 0
int b : 2; // 2 bits - bits 2 down to 1
char c ; // 8 bits - bits 15 down to 8
} reg1;
will be
                     1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
|a b b x x x x x|c c c c c c c c|x x x x x x x x|x x x x x x x x|
<------------------------------ int ---------------------------->
and the layout for
struct {
char a : 1; // 1 bit - bit 0
char b : 2; // 2 bits - bits 2 down to 1
char c ; // 8 bits - bits 15 down to 8
} reg1;
will be
                     1
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
|a b b x x x x x|c c c c c c c c|
<---- char ----><---- char ---->
So there are tricky edge cases. Compare the 2 definitions here:
struct x {
short a : 2;
short b : 15;
char c ;
};
struct y {
int a : 2;
int b : 15;
char c ;
};
Because the bit-field must not cross the unit boundary, the struct x members a and b need to go to different shorts. Then there is not enough space to accommodate the char c, so it must come after that. And the entire struct must be suitably aligned for short so it will be 6 bytes on i386. The latter however, will pack a and b in the 17 lowest bits of the int, and since there is still one entire addressable byte left within the int, the c will be packed here too, and hence sizeof (struct y) will be 4.
Finally, you must really specify whether the int or char is signed or not - the default might be not what you expect! Standard leaves it up to the implementation, and GCC has a compile-time switch to change them.

sizeof(), alignment in C structs:

Preface:
Did my research about struct alignment. Looked at this question, this one and also this one - but still did not find my answer.
My Actual Question:
Here is a code snippet I created in order to clarify my question:
#include "stdafx.h"
#include <stdio.h>
struct IntAndCharStruct
{
int a;
char b;
};
struct IntAndDoubleStruct
{
int a;
double d;
};
struct IntFloatAndDoubleStruct
{
int a;
float c;
double d;
};
int main()
{
printf("Int: %zu\n", sizeof(int));
printf("Float: %zu\n", sizeof(float));
printf("Char: %zu\n", sizeof(char));
printf("Double: %zu\n", sizeof(double));
printf("IntAndCharStruct: %zu\n", sizeof(struct IntAndCharStruct));
printf("IntAndDoubleStruct: %zu\n", sizeof(struct IntAndDoubleStruct));
printf("IntFloatAndDoubleStruct: %zu\n", sizeof(struct IntFloatAndDoubleStruct));
getchar();
}
And its output is:
Int: 4
Float: 4
Char: 1
Double: 8
IntAndCharStruct: 8
IntAndDoubleStruct: 16
IntFloatAndDoubleStruct: 16
I get the alignment seen in the IntAndCharStruct and in the IntAndDoubleStruct.
But I just don't get the IntFloatAndDoubleStruct one.
Simply put: Why isn't sizeof(IntFloatAndDoubleStruct) = 24?
Thanks in advance!
p.s: I'm using Visual-Studio 2017, standard console application.
Edit:
Per comments, tested IntDoubleAndFloatStruct (different order of elements) and got 24 in the sizeof() - And I will be happy if answers will note and explain this case too.
On your platform, the following holds: The size of int and float are both 4. The size & alignment requirement of double is 8.
We know this from the sizeof output you've shown. sizeof (T) gives the number of bytes between the addresses of two consecutive elements of type T in an array. So we know that the alignment requirements are as I've said above. (Note)
Now, the compiler reported 16 for IntFloatAndDoubleStruct. Does it work out?
Assume we have such an object at an address aligned to 16.
int a is therefore at address X aligned to 16, so it's aligned to 4 just fine. It will occupy bytes [X, X+4)
This means float c could start at X+4, which is aligned to 4, which is fine for float. It will occupy bytes [X+4, X+8)
Finally, double d could start at X+8, which is aligned to 8, which is fine for double. It will occupy bytes [X+8, X+16)
This leaves X+16 free for the next struct object, again aligned to 16.
So there's no reason to start any of the members later, so the whole struct fits into 16 bytes just fine.
(Note) This is not strictly true: for each of these, we know that both size and alignment are <= N, that N is a multiple of the alignment requirement, and that there is no N1 < N for which this would also hold. However, this is a very fine detail, and for clarity the answer simply assumes the actual size and alignment requirements for the primitive types are identical, which is the most likely case on the OP's platform anyway.
Your struct must be 8*N bytes long, since it has a member with 8 bytes (double). That means the struct sits in the memory at an address (A) divisible by 8 (A%8 == 0), and its end address will be (A + 8N) which will also be divisible by 8.
From there, you store 2 4-bytes variables (int + float) meaning you now occupy the memory area [A,A+8). Now you store an 8-byte variable (double). There is no need for padding since (A+8) % 8 == 0 [since A%8 == 0]. So, with no padding you get the 4+4+8 == 16.
If you change the order to int -> double -> float you'll occupy 24 bytes since the double variable original address will not be divisible by 8 and it will have to pad 4 bytes to get to a valid address (and also the struct will have padding at the end).
|--------||--------||--------||--------||--------||--------||--------||--------|
| each || cell || here ||represen||-ts 4 || bytes || || |
|--------||--------||--------||--------||--------||--------||--------||--------|
A A+4 A+8 A+12 A+16 A+20 A+24 [addresses]
|--------||--------||--------||--------||--------||--------||--------||--------|
| int || float || double || double || || || || | [content - basic case]
|--------||--------||--------||--------||--------||--------||--------||--------|
first padding to ensure the double sits on an address that is divisible by 8
last padding to ensure the struct size is divisible by the largest member's size (8)
|--------||--------||--------||--------||--------||--------||--------||--------|
| int || padding|| double || double || float || padding|| || | [content - change order case]
|--------||--------||--------||--------||--------||--------||--------||--------|
The compiler will insert padding in order to guarantee that each element is at an offset that is a multiple of its alignment requirement (which, for the primitive types here, equals its size).
In this case int will be at offset=0 (relative to address of a structure instance), float at offset=4, and double at offset=8, because sizes of int and float add up to 8.
There's no padding at the end - size of the structure is already 16, which is a multiple of size of double.

It will give the output 28 on DEV c++ compiler I am expecting 26 considering float as 4 byte int as 4 byte and char as 1 byte [duplicate]

#include <stdio.h>
struct
{
int m,s,l;
union
{
char c[10];
};
float p;
} a;
int main()
{
printf("%zu",sizeof(a));
return 0;
}
According to my calculation the answer should be 4*3+10+4=26, but the DEV C++ compiler shows 28.
The principles of allocating local variables on the stack are roughly the same for all systems:
1 -- The stack grows from high addresses to low addresses.
2 -- The order of declaration of the variables in your program corresponds to the growth of the stack.
3 -- Each type has an alignment: the address of any variable must be divisible by its alignment, which for basic types is usually its size (1 for char, 2 for short, etc.). The compiler tries to waste as little space as possible.
In this way, some space is wasted in order to improve the speed of memory access.
So for your code, this means:
#include <stdio.h>
#include <stdlib.h>
struct // the base address is &a
{
int m; // [0, 4)
int s; // [4, 8)
int l; // [8, 12)
union
{
char c[10]; // [12, 22)
};
// char m_align[2]; // 2 bytes of padding for alignment: [22, 24)
float p; // [24, 28)
}a; // sizeof = 28
The member m begins at the start of a and occupies 4 bytes on a 32-bit system, so it occupies [0, 4), and its start address is divisible by sizeof(int).
The same holds for s and l.
Then the union member c[10] occupies [12, 22); its start address, 12, is divisible by sizeof(int).
What matters is that the next free address is 22, and p needs sizeof(float) = 4 bytes, but 22 is not divisible by sizeof(float). Padding must be inserted to align the member while wasting as little space as possible, so p starts at 24, which is divisible by sizeof(float), and occupies [24, 28).
you can see this https://en.wikipedia.org/wiki/Data_structure_alignment for detail
The members of the structure are arranged in groups of 4 bytes on a 32-bit processor, so you get 28 bytes as the size.
for more details see here
It is due to padding

Resources