union initializing in c [duplicate] - c

This question already has answers here:
Difference between a Structure and a Union
(17 answers)
Closed 9 years ago.
#include<stdio.h>
int main()
{
union emp;
union emp{
int age;
char name[2];
};
union emp e1={512};
printf("%d,%d,%d",e1.age,e1.name[0],e1.name[1]);
return 0;
}
Here I've tried to create a union and initialize its first member, i.e. "int age". As far as I know, ANSI C compilers support this. My question is why I am getting an output like "512,0,2". If I replace 512 with 513, 514 and 768 I get the following outputs:
"513,1,2",
"514,2,2",
"768,0,3",
Now i can see that the e1.name[0] is storing (the number)%256 and e1.name[1] is storing (the number)/256.
It would be greatly appreciated if it is explained that why and how this happens.
Thank you all.

Strictly speaking, the C standard gives you few guarantees here: storing one union member and reading back another reinterprets the stored bytes (C99 and later explicitly permit this kind of type punning), but the values you get depend on the representation your implementation uses. Thus, you can't really "expect" any particular output from this piece of code, because it will differ across platforms.
See this question: A question about union in C
Apart from that, since 512 is 2^9, assuming ints are 32 bits and chars are 8 bits, this just means that your architecture is little endian, because 512 is 0x200. When you access e1.name[0] and e1.name[1], you are accessing the first 2 bytes of the stored value in memory, which happen to be 0x00 (the least significant byte) and 0x02 (the next byte up).

That's what a union is supposed to do. It stores the same data in different data types. For your values
513 = 0x0201
514 = 0x0202
768 = 0x0300
name[0] is storing the least significant byte and and name[1] is storing the most significant byte. If you were using a different architecture it would be reversed.

Imagine that: a union can hold two chars OR one int. If you store an int in the union, the two chars can be read out as well, but they will contain the raw bytes of the stored int.
So in your example, name[0] will contain the lowest 8 bits of the originally stored int, and name[1] will contain the next 8 bits.

Why does an int take up 4 bytes in c or any other language? [duplicate]

This question already has answers here:
size of int variable
(6 answers)
Closed 5 years ago.
This is a bit of a general question and not completely related to the C programming language, but it's what I'm studying at the moment.
Why does an integer take up 4 bytes, or however many bytes, depending on the system?
Why does it not take up 1 byte per integer?
For example why does the following take up 8 bytes:
int a = 1;
int b = 1;
Thanks
I am not sure whether you are asking why int objects have fixed sizes instead of variable sizes or whether you are asking why int objects have the fixed sizes they do. This answers the former.
We do not want the basic types to have variable lengths. That makes it very complicated to work with them.
We want them to have fixed lengths, because then it is much easier to generate instructions to operate on them. Also, the operations will be faster.
If the size of an int were variable, consider what happens when you do:
b = 3;
b += 100000;
scanf("%d", &b);
When b is first assigned, only one byte is needed. Then, when the addition is performed, the compiler needs more space. But b might have neighbors in memory, so the compiler cannot just grow it in place. It has to release the old memory and allocate new memory somewhere.
Then, when we do the scanf, the compiler does not know how much data is coming. scanf will have to do some very complicated work to grow b over and over again as it reads more digits. And, when it is done, how does it let you know where the new b is? The compiler has to have some mechanism to update the location for b. This is hard and complicated and will cause additional problems.
In contrast, if b has a fixed size of four bytes, this is easy. For the assignment, write 3 to b. For the addition, add 100000 to the value in b and write the result to b. For the scanf, pass the address of b to scanf and let it write the new value to b. This is easy.
The basic integral type int is guaranteed to have at least 16 bits; "at least" means that compilers/architectures may provide more. On current 32- and 64-bit systems int is almost always 32 bits, i.e. 4 bytes (cf., for example, cppreference.com):
Integer types
... int (also accessible as signed int): This is the most optimal
integer type for the platform, and is guaranteed to be at least 16
bits. Most current systems use 32 bits (see Data models below).
If you want an integral type with exactly 8 bits, use int8_t or uint8_t from <stdint.h>.
It doesn't. It's implementation-defined. A signed int in gcc on an Atmel 8-bit microcontroller, for example, is a 16-bit integer. An unsigned int is also 16 bits, but covers 0-65535 since it's unsigned.
The fact that an int uses a fixed number of bytes (such as 4) is a compiler/CPU efficiency and limitation, designed to make common integer operations fast and efficient.
There are types (such as BigInteger in Java) that take a variable amount of space. These types would have 2 fields, the first being the number of words being used to represent the integer, and the second being the array of words. You could define your own VarInt type, something like:
struct VarInt {
char length;
char bytes[]; // flexible array member
};
// Static initialization of a flexible array member is a GCC extension:
struct VarInt one    = {1, {1}};       // 2 bytes: value 1
struct VarInt v257   = {2, {1, 1}};    // 3 bytes: 1 + 1*256 = 257
struct VarInt v65537 = {3, {1, 0, 1}}; // 4 bytes: 1 + 0*256 + 1*65536 = 65537
and so on. But this would not be very fast for arithmetic; you'd have to decide how to treat overflow, and resizing the storage would require dynamic memory allocation.

C - Type Name : Number?

I was wondering what the following code is doing exactly? I know it has something to do with memory alignment, but when I ask for sizeof(struct vehicle) it prints 20, while by my count the struct's actual size should be 22. I just need to understand how this works, thanks!
struct vehicle {
short wheels:8;
short fuelTank : 6;
short weight;
char license[16];
};
printf("\n%zu", sizeof(struct vehicle));
20
Memory will be allocated as follows (assuming a memory word size of 8 bits):
struct vehicle {
short wheels:8; // 1 byte
short fuelTank : 6;
// pad 2 bits to make fuelTank fill out 1 byte.
short weight; // 2 bytes.
char license[16]; // 16 bytes.
};
1 + 1 + 2 + 16 = 20 bytes.
Consider a machine with a word size of 32 bits. The first two fields fit in a single 16-bit half-word, as they occupy 8 + 6 = 14 bits. The third field, weight, while not a bit-field (it has no :<number> to allocate space in bits), can fit in the other 16-bit half, completing a 32-bit word; so the first three fields pack into one 32-bit word (4 bytes) if the architecture allows memory to be accessed in 16-bit quantities. Finally, adding the 16 characters gives the 20 bytes that the sizeof operator sends to printf.
Why do you assume sizeof (struct vehicle) is 22 bytes? You asked the compiler to print it and it said 20. Compilers are free to pad (or not pad) structures to achieve better performance. That is architecture-dependent, and as you have not said which architecture and compiler you are using, it is not possible to go further.
For example, the 32-bit Intel architecture allows data to be placed at even boundaries without performance penalties, so this is a good choice to save memory. On other architectures, 16-bit accesses may not be allowed, and the data must be padded to align the third field (leading to 22 bytes for the whole structure).
The only guarantee you have when sizing data is that the compiler must allocate enough space to fit everything in an efficient way. So the only thing you can assume from that declaration is that it will occupy at least the minimum space needed to represent one field of 8 bits, another of 6, a complete short (I'll assume a short is 16 bits) and 16 characters (assuming 8 bits per char); that amounts to 8 + 6 + 16 + 16*8 = 158 bits minimum.
Suppose we are writing a compiler for D. Knuth's MIX machine. As stated in his book Fundamental Algorithms, this machine has bytes of unspecified size, each holding between 64 and 100 distinct values, with five bytes (plus a binary sign) making up one addressable word. A byte-size-independent compiler (one that compiles for any MIX machine, with no assumptions about byte size) can rely on no more than 64 possible values per byte, i.e. 6 bits per byte. The 6-bit field would then fill one complete byte (with its sign drawn from the word it belongs to) and the 8-bit field would need two complete bytes (using half the values for negatives). The 16-bit third field could sit in the second word, filling three complete bytes (6*3 = 18) plus that word's sign. The next 16 chars could begin on the following word, summing up to five complete words, so the whole structure would take 1 + 1 + 4 = 6 words, or 30 bytes. But if you want to handle the three signed fields efficiently, you need three complete words for them (as each word carries only one sign), leading to 7 words, or 35 bytes.
I have suggested this example because of the particular characteristics of this architecture, which makes one think about not-so-uncommon architectures that were in common use some time ago (the first machines ever built were not binary based, much like some of these MIX machines).
Note
You can try to print the actual offsets of the fields, to see where in the structure are located and see where the compiler is padding.
#define OFFSET(Typ, field) ((int)&((Typ *)0)->field)
(Note, edited)
This macro will tell you the offset as an int. Use it as OFFSET(struct vehicle, weight) or OFFSET(struct vehicle, license[3])
Note
I had to edit the last macro definition, as some compilers complain about it: the conversion from pointer to int is not always possible (on 64-bit architectures it loses some bits), so it's better to compute the difference of two pointers, which is a proper ptrdiff_t value, than to convert a pointer to int directly.
#define OFFSET(Typ, field) ((char *)&((Typ *)0)->field - (char *)0)

Output of following code with integer, float, char variable [duplicate]

This question already has answers here:
Why isn't sizeof for a struct equal to the sum of sizeof of each member?
(13 answers)
Closed 9 years ago.
When I run the following, it gives me output of 20.
But an int is 4 bytes, a float is 4 bytes and the character array is 10 bytes, so the total is 18 bytes. Why am I getting 20 bytes?
#include<stdio.h>
struct emp
{
int id;
char name[10];
float f;
}e1;
int main(void)
{
printf("\n\tSize Of Structure is==>%zu\n",sizeof(e1));
return 0;
}
It is address alignment.
See :
int 4
char 10 ==> 12 for alignment
float 4
like :
iiii
cccc
cccc
cc^^ <-- data padding to make the next field aligned.
ffff
total 20
For several reasons (instruction set architecture, performance of generated code, ABI conformance...), the compiler is doing some data padding.
You could perhaps use the GCC __attribute__((packed)) (if using GCC or a compatible compiler like CLANG) to avoid that (at the risk of slower code and non-conformance with called libraries). In general avoid packing your data structure, and if possible order fields in them by decreasing alignment. See also __alignof__ in GCC.
First of all, the unpadded sum of the members would indeed be 18 bytes. Secondly, the compiler is allowed to add padding to structs, usually to satisfy alignment requirements for the platform in question.
What is going on is called data structure alignment, or commonly discussed in two closely related parts: data alignment and data padding.
For the processor to read data efficiently, the data needs to sit at a memory offset that is a multiple of the word-size chunk (often the number of bytes required to store an integer); this is known as data alignment. Data padding is the process of inserting unused bytes so that fields land on such offsets. This can be done in the middle or at the end of a structure, entirely up to the compiler.
Consider the following example on a 32-bit environment. Looking at your structure:
struct emp {
int id;
char name[ 10 ];
float f;
};
If you were to create a new structure, it could be seen in memory as follows:
1. (byte for integer)
2. (byte for integer)
3. (byte for integer)
4. (byte for integer)
5. (byte for char)
6. (byte for char)
7. (byte for char)
8. (byte for char)
9. (byte for char)
10. (byte for char)
11. (byte for char)
12. (byte for char)
13. (byte for char)
14. (byte for char)
15. ***(padding byte)***
16. ***(padding byte)***
17. (byte for float)
18. (byte for float)
19. (byte for float)
20. (byte for float)
Remark:
[x] It can store an integer without any padding.
[x] It can store the 10 bytes for an array of 10 characters.
Notice that the first two fields together take 14 bytes, which is not a multiple of the word-size chunk of 4, so the compiler inserts padding to reach the proper offset.
[x] It stores two padding bytes so that the float begins at offset 16, the next multiple of 4.
[x] It stores four bytes for a float.
... and hence the amount of bytes required for the emp structure is 20 bytes (rather than the initial thought of 18). The compiler is trading space for performance.
Here the char array is followed by 2 bytes of padding. This is done to reduce the access time for the next variable. How does that reduce access time? Read about memory banks, i.e. the even-bank and odd-bank concept. If you want the struct laid out without that padding, use #pragma pack. But before doing so take a look at this

significance of int a : 2 syntax [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
What does ‘unsigned temp:3’ mean?
I came across syntax like int variable:4;
Can anyone tell me what this syntax means?
struct abc
{
int a;
int b:2;
int c:1;
};
It defines a width of a bitfield in a struct. A bitfield holds an integer value, but its length is restricted to a certain number of bits, and hence it can only hold a restricted range of values.
In the code you posted, a is a full int (typically 32 bits), b is a 2-bit bit-field and c is a 1-bit bit-field.
It is a bit-field. Instead of storing a full integer for b it only stores 2 bits, so b can have values of -2, -1, 0 and 1. Similarly c can only have values of -1 and 0.
Depending on your compiler, the signedness of plain-int bit-fields is implementation-defined, so some systems may present these values as 0, 1, 2 and 3, or 0 and 1.
This will also pack the fields into less than an integer, but again, this is in an implementation defined manner, and you would do well not to make assumptions about how much memory will actually be used or the order of the data in memory.

How the size of this structure comes out to be 4 byte

I have a structure containing bit-fields. By my calculation it should be 2 bytes, but it comes out as 4. I have read some related questions here on Stack Overflow but can't relate them to my problem. This is the structure I have:
struct s{
char b;
int c:8;
};
int main()
{
printf("sizeof struct s = %zu bytes \n",sizeof(struct s));
return 0;
}
If the int type has to be on its memory boundary, shouldn't the output be 8 bytes? But it's showing 4 bytes.
Source: http://geeksforgeeks.org/?p=9705
In sum: it is packing the bits (that's what bit-fields are for) as tightly as possible without compromising alignment.
A variable's data alignment deals with the way data is stored in these banks. For example, the natural alignment of int on a 32-bit machine is 4 bytes. When a data type is naturally aligned, the CPU fetches it in the minimum number of read cycles.
Similarly, the natural alignment of short int is 2 bytes. It means, a short int can be stored in bank 0 – bank 1 pair or bank 2 – bank 3 pair. A double requires 8 bytes, and occupies two rows in the memory banks. Any misalignment of double will force more than two read cycles to fetch double data.
Note that a double variable will be allocated on 8 byte boundary on 32 bit machine and requires two memory read cycles. On a 64 bit machine, based on number of banks, double variable will be allocated on 8 byte boundary and requires only one memory read cycle.
So the compiler will introduce alignment requirement to every structure. It will be as that of the largest member of the structure. If you remove char from your struct, you will still get 4 bytes.
In your struct, char is 1 byte aligned. It is followed by an int bit-field, which is 4 byte aligned for integers, but you defined a bit-field.
8 bits = 1 byte. A char can sit on any byte boundary, so char + int:8 = 2 bytes. That falls short of the struct's 4-byte alignment unit, so the compiler adds 2 more bytes to round the size up to the 4-byte boundary.
For it to be 8 bytes, you would have to declare an actual int (4 bytes) and a char (1 byte). That's 5 bytes, which again breaks the 4-byte boundary, so the struct is padded to 8 bytes.
What I have commonly done in the past to control the padding is to place fillers in between my struct to always maintain the 4 byte boundary. So if I have a struct like this:
struct s {
int id;
char b;
};
I am going to insert allocation as follows:
struct d {
int id;
char b;
char temp[3];
};
That would give me a struct with a size of 4 bytes + 1 byte + 3 bytes = 8 bytes! This way I can ensure that my struct is padded the way I want it, especially if I transmit it somewhere over the network. Also, if I ever change my implementation (such as if I were to maybe save this struct into a binary file, the fillers were there from the beginning and so as long as I maintain my initial structure, all is well!)
Finally, you can read this post on C Structure size with bit-fields for more explanation.
int c:8; means that you are declaring a bit-field with a size of 8 bits. Since the alignment on 32-bit systems is normally 4 bytes (= 32 bits), your object appears to take 4 bytes instead of 2 (char + 8-bit field).
But if you specify that c should occupy 8 bits, it's not really an int, is it? The size of c + b is 2 bytes, but your compiler pads the struct to 4 bytes.
They alignment of fields in a struct is compiler/platform dependent.
Maybe your compiler uses 16-bit integers for bitfields less than or equal to 16 bits in length, maybe it never aligns structs on anything smaller than a 4-byte boundary.
Bottom line: If you want to know how struct fields are aligned you should read the documentation for the compiler and platform you are using.
In generic, platform-independent C, you can never know the size of a struct/union nor the size of a bit-field. The compiler is free to add as many padding bytes as it likes anywhere inside the struct/union/bit-field, except at the very first memory location.
In addition, the compiler is also free to add any number of padding bits to a bit-field, and may put them anywhere it likes, because which bit is msb and lsb is not defined by C.
When it comes to bit-fields, you are left out in the cold by the C language, there is no standard for them. You must read compiler documentation in detail to know how they will behave on your specific platform, and they are completely non-portable.
The sensible solution is to never ever use bit-fields; they are a redundant feature of C. Instead, use bit-wise operators. Bit-wise operators and compiler-documented bit-fields will generate the same machine code (undocumented bit-fields are free to result in quite arbitrary code), but bit-wise operators are guaranteed to work the same on any compiler/system in the world.
