behavior of sizeof operator? - c

#include<stdio.h>
struct krishna {
int i,j,k,l,m;
char c;
double d;
char g[48];
};
int main() {
struct krishna *me={0};
printf("%ld %ld\n",sizeof(me),sizeof(*me));//output is 8 80 how??
return 0;
}
Hello everyone, I am new here, and the compiler I use is gcc. In the above code, can anyone explain why:
1) a pointer, irrespective of its type, is allocated 8 bytes?
2) the size of the above struct is 80? More generally, can anyone explain how to determine the size of any structure? I get confused each time: I expect one value but get a different answer. I have also read other questions and answers on Stack Overflow regarding this, and I am still not getting it. Please help.

printf("%ld %ld\n",sizeof(me),sizeof(*me));//output is 8 80 how??
Actually that should be:
printf("%zu %zu\n",sizeof(me),sizeof(*me));//output is 8 80 how??
"%zu" is the correct format string for a size_t value, such as the value you get from sizeof. "%ld" may happen to work on some systems (and apparently it does on yours), but you shouldn't count on that.
If your compiler doesn't support "%zu", you can use "%lu" (which expects an unsigned long argument) and explicitly convert the arguments:
printf("%lu %lu\n", (unsigned long)sizeof(me), (unsigned long)sizeof(*me));
You're getting 8 for sizeof(me) because that happens to be the size of a pointer on the compiler you're using (8 bytes, 64 bits). If you compiled and ran your program on a different system, you might get 4, because a lot of systems have 32-bit pointers. (And this assumes a byte is 8 bits, which is true for most systems but not guaranteed by the language.)
Most compilers make all pointers the same size, but that's not guaranteed by the language either. For example, on a word-addressed machine, an int* pointer could be just a machine-level address, but a char* pointer might need additional information to specify which byte within the word it points to. You're not very likely to run into a system with varying pointer sizes, but there's still no point in assuming that all pointers are the same size.
As for the size of the structure, that also can vary from one compiler to another. Here's your structure again:
struct krishna {
int i,j,k,l,m;
char c;
double d;
char g[48];
};
char is always exactly 1 byte, and char[48] is always exactly 48 bytes.
The number of bytes in an int can vary from one system to another; 4 bytes is most common these days.
The size of a double is typically 8 bytes, but this can also vary (though I don't think I've ever seen a system where sizeof (double) isn't 8 bytes.)
Structure members are laid out in the order in which they're declared, so your i will be at the very beginning of the structure, followed by j, k, and so forth.
Finally, the compiler will often insert padding bytes between members, or after the last member, so that each member is properly aligned. For example, on many systems a 4-byte int needs to be aligned at an offset that's a multiple of 4 bytes; if it's misaligned, access to it may be slow and/or very difficult.
The fact that sizeof (struct krishna) happens to be 80 bytes on your system isn't really all that important. It's more important to understand (a) the general rules compilers use to determine how structures are laid out, and (b) the fact that those rules can result in different layouts for different systems.
The language definition and your compiler guarantee that you can have objects of type struct krishna, and that you can access those objects and their members, getting back whatever values you stored in them. If you need to know how big a struct krishna is, the answer is simply sizeof (struct krishna). If, for some reason, you need to know more details than that (say, if you need to match some externally imposed layout), you can do some experiments and/or consult your compiler's documentation -- but be aware that the specifics will apply only to the compiler you're using on the system where you're using it. (Often an ABI for your system will constrain the compiler's choices.)
You can also use sizeof and offsetof (look it up) to find out where each member is allocated.

All pointers are addresses, and on most systems all object pointers have the same size, usually 4 bytes on a 32-bit system and 8 bytes on a 64-bit system. Since you are getting 8, you are most likely compiling for a 64-bit target.
The size of a struct depends on how the compiler "packs" the individual fields of the struct together into a single block of memory to contain the entire struct. In your case, your struct has 5 int fields (4 bytes each), a single char field (1 byte), a single double field (8 bytes), and a 48-element char array (48 bytes). Add all that up and you get 20 + 1 + 8 + 48 = 77 bytes to store your data. The actual size is 80 because the compiler inserts 3 extra unused bytes after the 1-byte char field, so that the double that follows starts at offset 24, a properly aligned address (a multiple of 8 on x86-64, and therefore also of 4), which is needed for good performance.
Hope that helps!

This is because:
sizeof( me ) // is a pointer.
... me is a pointer. The size of a pointer usually matches the word size of your environment, hence it's common that on 32-bit environments a pointer is 4 bytes whereas on a 64-bit environment a pointer is 8 bytes (but that's not written in stone). If you were to go back a few decades, a 16-bit environment would typically have a 2-byte pointer. Looking at the next sizeof:
sizeof( *me ) // is a struct krishna, hence 80 bytes are needed to store it in memory.
... is a structure and the size of the structure krishna is 80 bytes. If you look at the structure:
struct krishna {
int i,j,k,l,m; // sizeof( int ) * 5
char c; // sizeof( char ) * 1
// padding (3 bytes on typical systems) goes here, so that d is aligned
double d; // sizeof( double ) * 1
char g[48]; // sizeof( char ) * 48
};
... if you add up the number of bytes required for each field and include the appropriate data structure alignment for the memory address offsets, then it will total 80 bytes (as expected). The extra 3 unused bytes exist because a structure is stored in one contiguous block of memory and, for performance reasons, the compiler pads it so that each member starts at an address that is a multiple of its alignment (here, the double d must not start at the unaligned offset 21). The tradeoff is worth it: 3 bytes of memory matter far less nowadays than the performance the processor gains when data alignment is guaranteed.

Just to add to the answers by Jacob Pollack and ObjetDart, you can find more about structure padding at Structure padding in C.

Related

How is memory allocated when using structures

I'm curious: what exactly does the memory layout look like after declaring struct list x; for the code below?
struct list {
int key;
char name[10];
struct list* ptr;
};
Variable x will store 4 bytes for key, 10 bytes for name, and how many bytes for ptr?
struct list will be allocated as a single contiguous block of memory likely containing the following (assuming sizeof(int) == 4 for this platform and toolchain). I use the word "likely" because some of these details are actually implementation-defined.
Four bytes for key.
Ten bytes for name.
Padding bytes to align ptr to the expected alignment.
sizeof(struct list*) bytes for a pointer. On a modern-day desktop computer using a common operating system and ABI (meaning a flat addressing model), I would guess that it's likely to be 4 or 8 bytes for 32-bit and 64-bit systems respectively. In reality, a pointer's size is implementation-defined and depends on a number of factors, as Eric Postpischil adds:
...the C standard permits pointers to be different sizes depending on the types they point to. For example, in old word-addressed computers, some pointers may have only a word number. To address characters, a C implementation had to synthesize byte addresses by adding extra bits to the address and generating extra instructions to manipulate the word data.
The amount of padding is a bit tricky to figure out, since it depends on a combination of the platform (different CPU architectures have different alignment requirements), the toolchain/ABI, and any unusual commands/configurations (e.g. #pragma pack or equivalent).
If I had to guess with reasonable assumptions but no information, it would be plausible that there are two bytes of padding regardless of whether this was a 32-bit or 64-bit system. Two bytes of padding places the offset of ptr at 4+10+2=16, which satisfies both a four-byte and an eight-byte alignment.
It will depend on your architecture, but try it out:
#include <stdio.h>
struct list {
int key;
char name[10];
struct list* ptr;
};
int main(void) {
printf("Size of struct is %zu\n", sizeof(struct list));
struct list the_list;
printf("struct is at address %p\n", (void *)&the_list);
printf("key is at address %p\n", (void *)&the_list.key);
printf("name is at address %p\n", (void *)&the_list.name);
printf("ptr is at address %p\n", (void *)&the_list.ptr);
return 0;
}
When I ran this I got
Size of struct is 24
struct is at address 0x7ffcf32ad210
key is at address 0x7ffcf32ad210
name is at address 0x7ffcf32ad214
ptr is at address 0x7ffcf32ad220
showing that of the 24 bytes total, the first 4 bytes were for key at the beginning of the memory block, the next 12 held name (10 bytes) plus padding, and the final 8 were for ptr. Notice there were 2 bytes of padding between name and ptr.
But this may differ on different architectures. As always, best to try things out!
For a given implementation, the size of a pointer is fixed. It doesn't depend on the size of the structure it points to.
For additional information on the size of pointers, take a look here
Your structure will start with four bytes for key (because int is four bytes in your C implementation) and ten bytes for name. (The C standard would allow a C implementation to insert padding between key and name, but there is no need for it.)
Pointers are commonly four or eight bytes in modern systems, but they can be other sizes. They can even be different sizes in different C implementations on the same system. So ptr will likely take four or eight bytes in your C implementation.
Pointers may require alignment, such as requiring four-byte alignment for a four-byte pointer. In this case, there will be two bytes of padding after name and before ptr. That is because the structure has 14 bytes in key and name, so, to bring that up to a multiple of four, two bytes of padding are needed. (Again, the C standard allows an implementation to insert more padding, but it is not needed.)

C malloc offsets relative to struct definition locations (and padding)

C question:
Does malloc'ing a struct always result in linear placement from top to bottom of the data inside? As a second minor question: is there a standard on the padding size, or does it vary between 32 and 64 bit machines?
Using this test code:
#include <stdio.h>
#include <stdlib.h>
struct test
{
char a;
/* char pad[3]; // I'm assuming this happens from the compiler */
int b;
};
int main() {
int size;
char* testarray;
struct test* testarraystruct;
size = sizeof(struct test);
testarray = malloc(size * 4);
testarraystruct = (struct test *)testarray;
testarraystruct[1].a = 123;
printf("test.a = %d\n", testarray[size]); // Will this always be test.a?
free(testarray);
return 0;
}
On my machine, size is always 8. Therefore I check testarray[8] to see if it's the second struct's 'char a' field. In this example, my machine returns 123, but this obviously isn't proof it always is.
Does the C compiler have any guarantees that structs are created in linear order?
I am not claiming the way this is done is safe or the proper way, this is a curiosity question.
Does this change if this becomes C++?
Better yet, would my memory look like this if malloc returned 0x00001000?
0x00001000 char a // First test struct
0x00001001 char pad[0]
0x00001002 char pad[1]
0x00001003 char pad[2]
0x00001004 int b // These four bytes belong to int b
0x00001005
0x00001006
0x00001007
0x00001008 char a // Second test struct
0x00001009 char pad[0]
0x0000100a char pad[1]
0x0000100b char pad[2]
0x0000100c int b // These four bytes belong to int b
0x0000100d
0x0000100e
0x0000100f
NOTE: This question assumes ints are 32 bits
As far as I know, malloc for a struct does not place the data; it only allocates one linear block of memory, and the members are laid out within that block, in declaration order, when you create an object of that type. That layout is where padding comes in.
Padding also depends on the type of machine (i.e., 32-bit or 64-bit) and its ABI.
The CPU fetches memory in units that depend on whether it is 32-bit or 64-bit.
On a 32-bit machine your structure will be:
struct test
{
char a; /* 3 bytes padding done to a */
int b;
};
Here the CPU reads memory 32 bits (4 bytes) at a time.
So in this case (for this example) the CPU needs two aligned reads for the whole structure: one for a and one for b.
To make it clearer: in one fetch the CPU reads 4 bytes of memory, so 3 bytes of padding are added after "char a" to start b on the next 4-byte boundary.
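On a common 64-bit machine (for example x86-64), int is still 4 bytes with 4-byte alignment, so the padding is in the same place:
struct test
{
char a; /* 3 bytes padding after a */
int b;
};
Here the CPU can fetch 8 bytes at a time.
So in this case (for this example) the whole 8-byte structure can be read in one fetch. The padding still goes after "char a", not after "int b", because its purpose is to put b on a 4-byte boundary.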
However, if you want to avoid the padding, you can use #pragma pack(1) (a widely supported compiler extension, not standard C).
But this is less efficient in time, because b is then misaligned and the CPU may need extra memory accesses to read it (and some CPUs fault on misaligned accesses).
This is a tradeoff between CPU fetch cycles and padding.
For many CPU types, it is most efficient to read an N-byte quantity (where N is a power of 2 — 1, 2, 4, 8, sometimes 16) when it is aligned on an N-byte address boundary. Some CPU types will generate a SIGBUS error if you try to read an incorrectly aligned quantity; others will make extra memory accesses as necessary to retrieve an incorrectly aligned quantity. AFAICR, the DEC Alpha (subsequently Compaq and HP) had a mechanism that effectively used a system call to fix up a misaligned memory access, which was fiendishly expensive. You could control whether that was allowed with a program (uac — unaligned access control) which would stop the kernel from aborting the process and would do the necessary double reads.
C compilers are aware of the benefits and costs of accessing misaligned data, and go to lengths to avoid doing so. In particular, they ensure that data within a structure, or an array of structures, is appropriately aligned for fast access unless you hold them to ransom with quasi-standard #pragma directives like #pragma pack(1).
For your sample data structure, for a machine where sizeof(int) == 4 (most 32-bit and 64-bit systems), then there will indeed be 3 bytes of padding after an initial 1 byte char field and before a 4-byte int. If you use short s; after the single character, there would be just 1 byte of padding. Indeed, the following 3 structures are all the same size on many machines:
struct test_1
{
char a;
/* char pad[3]; // I'm assuming this happens from the compiler */
int b;
};
struct test_2
{
char a;
short s;
int b;
};
struct test_3
{
char a;
char c;
short s;
int b;
};
The C standard mandates that the elements of a structure are laid out in the sequence in which they are defined. That is, in struct test_3, the element a comes first, then c, then s, then b. That is, a is at the lowest address (and the standard mandates that there is no padding before the first element), then c is at an address higher than a (but the standard does not mandate that it will be one byte higher), then s is at an address higher than c, and that b is at an address higher than s. Further, the elements cannot overlap. There may be padding after the last element of a structure. For example in struct test_4, on many computers, there will be 7 bytes of padding between a and d, and there will be 7 bytes of padding after b:
struct test_4
{
char a;
double d;
char b;
};
This ensures that every element of an array of struct test_4 will have the d member properly aligned on an 8-byte boundary for optimal access (but at the cost of space; the size of the structure is often 24 bytes).
As noted in the first comment to the question, the layout and alignment of the structure is independent of whether the space is allocated by malloc() or on the stack or in global variables. Note that malloc() does not know what the pointer it returns will be used for. Its job is simply to ensure that no matter what the pointer is used for, there will be no misaligned access. That often means the pointer returned by malloc() will fall on an 8-byte boundary; on some 64-bit systems, the address is always a multiple of 16 bytes. That means that consecutive malloc() calls each allocating 1 byte will seldom produce addresses 1 byte apart.
For your sample code, I believe that the standard does require that testarray[size] equals 123 after the assignment. At the very least, you would be hard-pressed to find a compiler where it is not the case.
For simple structures containing plain old data (POD — simple C data types), C++ provides the same layout as C. If the structure is a class with virtual functions, etc, then the layout rules depend on the compiler. Virtual bases and the dreaded 'diamond of death' multiple inheritance, etc, also make changes to the layout of structures.

What memory space is occupied by auto variables on the stack

I read that functions in C may use local stack-based variables, and that they are allocated simply by decrementing the stack pointer by the amount of space required. This is always done in four-byte chunks (if I am not mistaken). But what if I run code like the following:
void foo(void)
{
char str[6];
......
}
What size does var str occupy? 6 bytes or 6 × 4 bytes as per the four-byte chunks.
The four-byte-chunk rule just means that the stack pointer must point to an address that is a multiple of four. In this case, allocating 8 bytes satisfies that rule, and such a block is large enough to hold a 6-character array with only 2 bytes of padding.
Data alignment is a CPU requirement, which means that the alignment amount changes from one CPU to another; keep that in mind.
Speaking about stack data-alignment, gcc for example keeps the data aligned using an option called -mpreferred-stack-boundary=n where the data will be aligned to 2^n.
By default, the value of n is 4, which makes the stack alignment 16 bytes.
What this means is that you'll find yourself allocating 16 bytes of stack memory although what you explicitly allocated was just an integer.
int main()
{
char ar[6] = {1,2,3,4,5,6};
int x = 10;
int y = 12 + (int) ar[1] + x;
return y;
}
Compiling this code with gcc on my CPU produces the following assembly (posting only the stack-allocation instruction):
subl $32, %esp
But why 32? We're allocating data that fits in 16 bytes.
Well, there are 8 bytes gcc needs to keep saved for the leave and ret, which makes the total needed memory 24.
BUT the alignment requirement is 16 bytes, so gcc needs to allocate stack space made up of 16-byte chunks; rounding 24 bytes up to 32 solves the problem.
You'll have enough space for your variables and for the ret and leave, and it's made of two 16-byte chunks.
The rule of allocating in 4-byte chunks is not valid in all cases. For example, the ARM EABI requires alignment of 64-bit integers and doubles on 8-byte boundaries.
Usually the allocated space follows the same rules as data packing in structures. So char[6] would actually take 6 bytes (usually), but the padding before the next field can use a few more bytes.
Example:
struct X
{
char field1[6];
};
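So the structure X size could be 8 on ABIs that round every structure up to a multiple of 4, although on common x86 and x86-64 compilers it is 6, since a char array needs no padding.
struct Y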
{
char field1[2];
double field2;
};
Structure Y is usually something like 8, 12 or 16 bytes depending on architecture.
The same rules apply to automatic stack variables: usually the padding is dictated not by the type you are using, but by the next type you are going to use. And the rules are sometimes a bit vague.
I guess you are getting confused between data size and data alignment. There is no general rule, but, on modern computers, your variable will be stored in 6 bytes. On the other side, the next element won't necessarily be stored at the next byte. This is known as data structure padding.
The word-aligned architectures, where every variable must begin on an address which is a multiple of the word size, are becoming rare. With newer processors such as SPARC or x86, variables are self-aligned: each one has to begin on an address which is a multiple of its type's size.
Therefore, there is no "four-byte chunk rule" on non-exotic computers. In your example, str will be stored in 6 bytes. If you then declare a variable with an alignment of 8 bytes, for instance (such as double on x86), your compiler will insert 2 padding bytes.
Alignment is fixed by the compiler according to your architecture, so the standard doesn't define the actual values. You may find further information on Wikipedia.
If you have:
char str[6];
int a;
char b;
char c;
The stack will be of sufficient size to contain all these variables and be divisible by 4 (or whatever alignment is required). But each variable does not need to be aligned on the same boundary (though there may be hardware requirements).
On my system, compiling the above and printing out the addresses of the stack variables (leading digits removed for brevity):
&str -- 18
&a -- 12
&b -- 10
&c -- 11
i.e., the compiler will arrange for the stack to be aligned, but the variables do not need to be padded.

size of struct in C [duplicate]

Closed 13 years ago.
Possible Duplicate:
Why isn’t sizeof for a struct equal to the sum of sizeof of each member?
Consider the following C code:
#include <stdio.h>
struct employee
{
int id;
char name[30];
};
int main()
{
struct employee e1;
printf("%zu %zu %zu", sizeof(e1.id), sizeof(e1.name), sizeof(e1));
return(0);
}
The output is:
4 30 36
Why is the size of the structure not equal to the sum of the sizes of its individual component variables?
The compiler may add padding for alignment requirements. Note that this applies not only to padding between the fields of a struct, but also may apply to the end of the struct (so that arrays of the structure type will have each element properly aligned).
For example:
struct foo_t {
int x;
char c;
};
Even though the c field doesn't need padding, the struct will generally have a sizeof(struct foo_t) == 8 (on a 32-bit system, or rather a system with a 32-bit int type) because there will need to be 3 bytes of padding after the c field.
Note that the padding might not be required by the system (like x86 or Cortex M3) but compilers might still add it for performance reasons.
As mentioned, the C compiler will add padding for alignment requirements. These requirements often have to do with the memory subsystem. Some types of computers can only access memory lined up to some 'nice' value, like 4 bytes. This is often the same as the word length. Thus, the C compiler may align fields in your structure to this value to make them easier to access (e.g., 4-byte values should be 4-byte aligned). Further, it may pad the end of the structure to line up data which follows the structure. I believe there are other reasons as well. More info can be found at this Wikipedia page.
Your default alignment is probably 4 bytes. Since a 30-byte char array needs no alignment of its own, it is the structure as a whole that was rounded up to the next 4-byte interval.
Padding up to 36 bytes is not weird, because it aligns the structure to addresses that are multiples of 4.
So basically you have 34 bytes in your structure, and the next structure should be placed at an address that is a multiple of 4. The closest such value after 34 is 36, and this padding area counts toward the size of the structure.

Questions about C bitfields

Is the bitfield a C concept or a C++ one?
Can it be used only within a structure? What are the other places we can use them?
AFAIK, bitfields are special structure variables that occupy memory only for a specified number of bits. They are useful for saving memory and nothing else. Am I correct?
I coded a small program to understand the usage of bitfields, but I think it is not working as expected. I expected the size of the structure below to be 1+4+2 = 7 bytes (considering the size of unsigned int is 4 bytes on my machine), but to my surprise it turns out to be 12 bytes (4+4+4). Can anyone tell me why?
#include <stdio.h>
struct s{
unsigned int a:1;
unsigned int b;
unsigned int c:2;
};
int main()
{
printf("sizeof struct s = %zu bytes \n",sizeof(struct s));
return 0;
}
OUTPUT:
sizeof struct s = 12 bytes
Because a and c are not contiguous, they each reserve a full int's worth of memory space. If you move a and c together, the size of the struct becomes 8 bytes.
Moreover, you are telling the compiler that you want a to occupy only 1 bit, not 1 byte. So even though a and c next to each other would occupy only 3 bits total (still under a single byte), the combination of a and c still becomes word-aligned in memory on your 32-bit machine, hence occupying a full 4 bytes in addition to the int b.
Similarly, you would find that
struct s{
unsigned int b;
short s1;
short s2;
};
occupies 8 bytes, while
struct s{
short s1;
unsigned int b;
short s2;
};
occupies 12 bytes, because in the latter case b must start on a 4-byte boundary, so each of the two shorts ends up in its own 4-byte slot.
1) They originated in C, but are part of C++ too, unfortunately.
2) Yes, or within a class in C++.
3) As well as saving memory, they can be used for some forms of bit twiddling. However, both memory saving and twiddling are inherently implementation dependent - if you want to write portable software, avoid bit fields.
It's C.
Your compiler has rounded the memory allocation up to 12 bytes for alignment purposes. Most memory subsystems access memory in whole words rather than in single bytes.
Your program is working exactly as I'd expect. The compiler allocates adjacent bitfields into the same memory word, but yours are separated by a non-bitfield.
Move the bitfields next to each other and you'll probably get 8, which is the size of two ints on your machine. The bitfields would be packed into one int. This is compiler specific, however.
Bitfields are useful for saving space, but not much else.
Bitfields are widely used in firmware to map different fields in registers. This saves a lot of manual bitwise operations which would otherwise be necessary to read/write those fields.
One disadvantage is that you can't take the address of a bitfield.
