I'm curious, how exactly does memory allocation looks like after calling struct list x; for code bellow:
struct list {
int key;
char name[10];
struct list* ptr;
};
Variable x will store 4 bytes for key, 10 bytes for name and how much bytes for ptr?
struct list will be allocated as a single contiguous block of memory likely containing the following (assuming sizeof(int) == 4 for this platform and toolchain. I use the word "likely" because some considerations here are actually implementation-defined.
Four bytes for key.
Ten bytes for name.
Padding bytes to align ptr to the expected alignment.
sizeof(list*) bytes for a pointer. On a modern-day desktop computer using a common operating system and ABI (meaning a flat addressing model), I could guess that it's likely to be 4 or 8 bytes for 32-bit and 64-bit systems respectively. In reality, a pointer's size is implementation-defined and depends on a number of factors, as Eric Postpischil adds:
...the C standard permits pointers to be different sizes depending on the types they point to. For example, in old word-addressed computers, some pointers may have only a word number. To address characters, a C implementation had to synthesize byte addresses by adding extra bits to the address and generating extra instructions to manipulate the word data.
The size of the alignment is a bit tricky to figure out since it depends on a combination of the platform (different CPU architectures have different alignment requirements), toolchain/ABI, and any unusual commands/configurations (e.g. #pragma pack or equivalent).
If I had to guess with reasonable assumptions but no information, it would be plausible that there are two bytes of padding regardless of whether this was a 32-bit or 64-bit system. Two bytes of padding places the offset of ptr at 4+10+2=16, which satisfies both a four-byte and an eight-byte alignment.
It will be dependent on one's architecture but try it out
#include <stdio.h>
struct list {
int key;
char name[10];
struct list* ptr;
};
int main(void) {
printf("Size of struct is %d\n", sizeof(struct list));
struct list the_list;
printf("struct is at address %p\n", &the_list);
printf("key is at address %p\n", &the_list.key);
printf("name is at address %p\n", &the_list.name);
printf("ptr is at address %p\n", &the_list.ptr);
return 0;
}
When I ran this I got
Size of struct is 24
struct is at address 0x7ffcf32ad210
key is at address 0x7ffcf32ad210
name is at address 0x7ffcf32ad214
ptr is at address 0x7ffcf32ad220
showing that of the 24 bytes total, the first 4 bytes were for key at the beginning of the memory block, the next 12 for name, and then the final 8 were for ptr. Notice there were 2 bytes of padding between name and ptr.
But this may differ on different architectures. As always, best to try things out!
The size of a pointer is constant. It doesn't depend on the size of the structure it points to.
For additional information on the size of pointers, take a look here
Your structure will start with four bytes for key (because int is four bytes in your C implementation) and ten bytes for name. (The C standard would allow a C implementation to insert padding between key and name, but there is no need for it.)
Pointers are commonly four or eight bytes in modern systems, but they can be other sizes. They can even be different sizes in different C implementations on the same system. So ptr will likely take four or eight bytes in your C implementation.
Pointers may require alignment, such as requiring four-byte alignment for a four-byte pointer. In this case, there will be two bytes of padding after name and before ptr. That is because the structure has 14 bytes in key and name, so, to bring that up to a multiple of four, two bytes of padding are needed. (Again, the C standard allows an implementation to insert more padding, but it is not needed.)
Related
I'm working with vectors and matrices right now and it was suggested to me that I should use SSE instead of using float arrays. However while reading the definition for the C intrinsics and the Assembly instructions it looks like there is a different version of some of the function where the vector has to be "16 byte aligned" and a slower version where the vector isn't aligned. What does having the vector be 16 byte aligned mean? How can I ensure that my vectors are 16 byte aligned?
Alignment ensures that objects are aligned on an address that is a multiple of some power of two. 16-byte-aligned means that the numeric value of the address is a multiple of 16. Alignment is important because CPUs are often less efficient or downright incapable of loading memory that doesn't have the required alignment.
Your ABI determines the natural alignment of types. In general, integer types and floating-point types are aligned to either their own size, or the size of the largest object of that kind that your CPU can treat at once, whichever is smaller. For instance, on 64-bit Intel machines, 32-bit integers are aligned on 4 bytes, 64-bit integers are aligned on 8 bytes, and 128-bit integers are also aligned on 8 bytes.
The alignment of structures and unions is the same as their most aligned field. This means that if your struct contains a field that has a 2-byte alignment and another field that has an 8-byte alignment, the structure will be aligned to 8 bytes.
In C++, you can use the alignof operator, just like the sizeof operator, to get the alignment of a type. In C, the same construct becomes available when you include <stdalign.h>; alternatively, you can use _Alignof without including anything.
AFAIK, there is no standard way to force alignment to be specific value in C or C++, but there are compiler-specific extensions to do it. On Clang and GCC, you can use the __attribute__((aligned(N))) attribute:
struct s_Stuff {
int var1;
short var2;
char padding[10];
} __attribute__((aligned(16)));
(Example.)
(This attribute is not to be confused with __attribute__((align(N))), which sets the alignment of a variable.)
Off the top of my head, I'm not sure for Visual Studio, but according to SoronelHaetir, that would be __declspec(align(N)). Not sure where it goes on the struct declaration.
In the context of vector instructions, alignment is important because people tend to create arrays of floating-point values and operate on them, instead of using types that are known to be aligned. However, __m128, __m256 and __m512 (and all of their variants, like _m128i and such) from <emmintrin.h>, if your compiler environment has it, are guaranteed to be aligned on the proper boundaries for use with aligned intrinsics.
Depending on your platform, malloc may or may not return memory that is aligned on the correct boundary for vector objects. aligned_alloc was introduced in C11 to address these issues, but not all platforms support it.
Apple: does not support aligned_alloc; malloc returns objects on the most exigent alignment that the platform supports;
Windows: does not support aligned_alloc; malloc returns objects aligned on the largest alignment that VC++ will naturally put an object on without an alignment specification; use _aligned_malloc for vector types
Linux: malloc returns objects aligned on an 8- or 16-byte boundary; use aligned_alloc.
In general, it's possible to request slightly more memory and perform alignment yourself with minimal penalties (aside that you're on your own to write a free-like function that will accept a pointer returned by this function):
void* aligned_malloc(size_t size, size_t alignment) {
intptr_t alignment_mask = alignment - 1;
void* memory = malloc(size + alignment_mask);
intptr_t unaligned_ptr = (intptr_t)memory;
intptr_t aligned_ptr = (unaligned_ptr + alignment_mask) & ~alignment_mask;
return (void*)aligned_ptr;
}
Purists might argue that treating pointers as integers is evil, but at the time of writing, they probably won't have a practical cross-platform solution to offer in exchange.
xx-byte alignment means that a the variable's memory address modulo xx is 0.
Ensuring that is a compiler-specific operation, visual c++ for example has __declspec(align(...)), which will work for variables that the compiler allocates (at file or function scope for example), alignment is somewhat harder for dynamic memory, you can use aligned_malloc for that, although your library may already guarantee 16-byte alignment for malloc, it's generally larger alignments that require such a call.
New Edit to improve and focus my answer to the specific query
To ensure data alignment in memory, there are specific functions in C to force this (assuming your data is compatible - where your data matches or discretely fits into your required alignment)
The function to use is [_aligned_malloc][1] instead of vanilla malloc.
// Using _aligned_malloc
// Note alignment should be 2^N where N is any positive int.
int alignment = 16;
ptr = _aligned_malloc('required_size', alignment);
if (ptr == NULL)
{
printf_s( "Error allocation aligned memory.");
return -1;
}
This will (if it succeeds) force your data to align on the 16 byte boundary and should satisfy the requirements for SSE.
Older answer where I waffle on about struct member alignment, which matters - but is not directly answering the query
To ensure struct member byte alignment, you can be careful how you arrange members in your structs (largest first), or you can set this (to some degree) in your compiler settings, member attributes or struct attributes.
Assuming 32 bit machine, 4 byte ints: This is still 4 byte aligned in memory (first largest member is 4 bytes), but padded to be 16 bytes in size.
struct s_Stuff {
int var1; /* 4 bytes */
short var2; /* 2 bytes */
char padding[10]; /* ensure totals struct size is 16 */
}
The compiler usually pads each member to assist with natural alignment, but the padding may be at the end of the struct too. This is struct member data alignment.
Older compiler struct member alignment settings could look similar to these 2 images below...But this is different to data alignment which relates to memory allocation and storage of the data.
It confuses me when Borland uses the phrase (from the images) Data Alignment, and MS uses Struct member alignment. (Although they both refer to specifically struct member alignment)
To maximise efficiency, you need to code for your hardware (or vector processing in this case), so lets assume 32 bit, 4 byte ints, etc. Then you want to use tight structs to save space, but padded structs may improve speed.
struct s_Stuff {
float f1; /* 4 bytes */
float f2; /* 4 bytes */
float f3; /* 4 bytes */
short var2; /* 2 bytes */
}
This struct may be padded to also align the struct members to 4 byte multiples....The compiler will do this unless you specify that it keeps single byte struct member alignment - so the size ON FILE could be 14 bytes, but still in MEMORY an array of this struct would be 16 bytes in size (with 2 bytes wasted), with an unknown data alignment (possibly 8 bytes as default by malloc but not guaranteed. As mentioned above you can force the data alignment in memory with _aligned_malloc on some platforms)
Also regarding member alignment in a struct, the compiler will use multiples of the largest member to set the alignment. Or more specifically:
A struct is always aligned to the largest type’s alignment
requirements
...from here
If you are using a UNION, you are correct that it is forced to the largest possible struct see here
Check that your compiler settings do not contradict your desired struct member alignment / padding too, or else your structs may differ in size to what you expect.
Now, why is it faster? See here which explains how alignment allows the hardware to transmit discrete chunks of data and maximises the use of the hardware that passes around data. That is, the data does not need to be split up or re-arranged at every stage - through the hardware processing
As a rule, its best to set your compiler to resonate with your hardware (and platform OS) so that your alignment (and padding) works best with your hardware processing ability. 32 bit machines usually work best with 4 byte (32 bit) member alignment, but then data written to file with 4 byte member alignment can consume more space than wanted.
Specifically regarding SSE vectors, as this link states, 4 * 4 bytes is they best way to ensure 16 byte alignment, perhaps like this. (And they refer to data alignment here)
struct s_data {
float array[4];
}
or simply an array of floats, or doubles.
#include<stdio.h>
struct std
{
char c;
char e;
};
void main()
{
struct std p,q;
int start,last;
last= &q;
start= &p;
printf("Size:%u %u Address_Diff:%d \n",&p, &q,(last-start) );
}
I am learning structure alignment and padding while we executing above program every time difference between two structure variables showing 16 bytes. I am not understanding why the difference is 16 bytes but i'm expecting the difference is 2 bytes.
some examples:
4077309424 4077309440 diff:16
3600238672 3600238688 diff:16
3272011008 3272011024 diff:16
Generally compiler may pad bytes, to keep each structure variable start position to word or double-word alignment. I guess the choice of alignment position may be architecture and compiler dependent.
It seems your compiler is keeping 16-byte alignment for each structure variable. Sometimes compiler may pad bytes to elements of structure to keep word or double-word alignment. You can check where the bytes padding happening, I mean bytes or padded at element level or structure level.
Do print each element address (as below) and see the difference between them.
printf("Size:%u %u Address_Diff:%d \n",&p.c, &p.e);
If difference is just one byte, it means that bytes are padded at structure level i.e., at the end of structure.
C question:
Does malloc'ing a struct always result in linear placement from top to bottom of the data inside? As a second minor question: is there a standard on the padding size, or does it vary between 32 and 64 bit machines?
Using this test code:
#include <stdio.h>
#include <stdlib.h>
struct test
{
char a;
/* char pad[3]; // I'm assuming this happens from the compiler */
int b;
};
int main() {
int size;
char* testarray;
struct test* testarraystruct;
size = sizeof(struct test);
testarray = malloc(size * 4);
testarraystruct = (struct test *)testarray;
testarraystruct[1].a = 123;
printf("test.a = %d\n", testarray[size]); // Will this always be test.a?
free(testarray);
return 0;
}
On my machine, size is always 8. Therefore I check testarray[8] to see if it's the second struct's 'char a' field. In this example, my machine returns 123, but this obviously isn't proof it always is.
Does the C compiler have any guarantees that struct's are created in linear order?
I am not claiming the way this is done is safe or the proper way, this is a curiosity question.
Does this change if this is becomes C++?
Better yet, would my memory look like this if it malloc'd to 0x00001000?
0x00001000 char a // First test struct
0x00001001 char pad[0]
0x00001002 char pad[1]
0x00001003 char pad[2]
0x00001004 int b // Next four belong to byte b
0x00001005
0x00001006
0x00001007
0x00001008 char a // Second test struct
0x00001009 char pad[0]
0x0000100a char pad[1]
0x0000100b char pad[2]
0x0000100c int b // Next four belong to byte b
0x0000100d
0x0000100e
0x0000100f
NOTE: This question assumes int's are 32 bits
As far as I know, malloc for struct is not linear placement of data but it's a linear allocation of memory for the members with in the structure that too when you create an object of it.
This is also necessary for padding.
Padding also depends on the type of machine (i.e 32 bit or 64 bit).
The CPU fetches the memory based on whether it is 32 bit or 64 bit.
For 32 bit machine your structure will be:
struct test
{
char a; /* 3 bytes padding done to a */
int b;
};
Here your CPU fetch cycle is 32 bit i.e 4 bytes
So in this case (for this example) the CPU takes two fetch cycles.
To make it more clear in one fetch cycle CPU allocates 4 bytes of memory. So 3 bytes of padding will be done to "char a".
For 64 bit machine your structure will be:
struct test
{
char a;
int b; /* 3 bytes padding done to b */
}
Here the CPU fetch cycle is 8 bytes.
So in this case (for this example) the CPU takes one fetch cycles. So 3 bytes of padding here must be done to "int b".
However you can avoid the padding you can use #pragma pack 1
But this will not be efficient w.r.t time because here CPU fetch cycles will be more (for this example CPU fetch cycles will be 5).
This is tradeoff between CPU fetch cycles and padding.
For many CPU types, it is most efficient to read an N-byte quantity (where N is a power of 2 — 1, 2, 4, 8, sometimes 16) when it is aligned on an N-byte address boundary. Some CPU types will generate a SIGBUS error if you try to read an incorrectly aligned quantity; others will make extra memory accesses as necessary to retrieve an incorrectly aligned quantity. AFAICR, the DEC Alpha (subsequently Compaq and HP) had a mechanism that effectively used a system call to fix up a misaligned memory access, which was fiendishly expensive. You could control whether that was allowed with a program (uac — unaligned access control) which would stop the kernel from aborting the process and would do the necessary double reads.
C compilers are aware of the benefits and costs of accessing misaligned data, and go to lengths to avoid doing so. In particular, they ensure that data within a structure, or an array of structures, is appropriately aligned for fast access unless you hold them to ransom with quasi-standard #pragma directives like #pragma pack(1).
For your sample data structure, for a machine where sizeof(int) == 4 (most 32-bit and 64-bit systems), then there will indeed be 3 bytes of padding after an initial 1 byte char field and before a 4-byte int. If you use short s; after the single character, there would be just 1 byte of padding. Indeed, the following 3 structures are all the same size on many machines:
struct test_1
{
char a;
/* char pad[3]; // I'm assuming this happens from the compiler */
int b;
};
struct test_2
{
char a;
short s;
int b;
};
struct test_3
{
char a;
char c;
short s;
int b;
};
The C standard mandates that the elements of a structure are laid out in the sequence in which they are defined. That is, in struct test_3, the element a comes first, then c, then s, then b. That is, a is at the lowest address (and the standard mandates that there is no padding before the first element), then c is at an address higher than a (but the standard does not mandate that it will be one byte higher), then s is at an address higher than c, and that b is at an address higher than s. Further, the elements cannot overlap. There may be padding after the last element of a structure. For example in struct test_4, on many computers, there will be 7 bytes of padding between a and d, and there will be 7 bytes of padding after b:
struct test_4
{
char a;
double d;
char b;
};
This ensures that every element of an array of struct test_4 will have the d member properly aligned on an 8-byte boundary for optimal access (but at the cost of space; the size of the structure is often 24 bytes).
As noted in the first comment to the question, the layout and alignment of the structure is independent of whether the space is allocated by malloc() or on the stack or in global variables. Note that malloc() does not know what the pointer it returns will be used for. Its job is simply to ensure that no matter what the pointer is used for, there will be no misaligned access. That often means the pointer returned by malloc() will fall on an 8-byte boundary; on some 64-bit systems, the address is always a multiple of 16 bytes. That means that consecutive malloc() calls each allocating 1 byte will seldom produce addresses 1 byte apart.
For your sample code, I believe that standard does require that testdata[size] does equal 123 after the assignment. At the very least, you would be hard-pressed to find a compiler where it is not the case.
For simple structures containing plain old data (POD — simple C data types), C++ provides the same layout as C. If the structure is a class with virtual functions, etc, then the layout rules depend on the compiler. Virtual bases and the dreaded 'diamond of death' multiple inheritance, etc, also make changes to the layout of structures.
#include<stdio.h>
struct krishna {
int i,j,k,l,m;
char c;
double d;
char g[48];
};
int main() {
struct krishna *me={0};
printf("%ld %ld\n",sizeof(me),sizeof(*me));//output is 8 80 how??
return 0;
}
Hello everyone I am new here and the compiler I use is gcc compiler in the above code can anyone explain why
1) pointer irrespective of any type is allocated 8 ?
2) sizeof the above struct is 80 ? Can anyone explain to me in general for any structure how can one determine the structure size , I am getting confused each time I expect one value but getting a different answer and I have also read other questions and answers in stack overflow regarding this and I am still not getting it.Please help.
printf("%ld %ld\n",sizeof(me),sizeof(*me));//output is 8 80 how??
Actually that should be:
printf("%zu %zu\n",sizeof(me),sizeof(*me));//output is 8 80 how??
"%zu" is the correct format string for a size_t value, such as the value you get from sizeof. "%ld" may happen to work on some systems (and apparently it does on yours), but you shouldn't count on that.
If your compiler doesn't support "%zu", you can use "%lu" (which expects an unsigned long argument) and explicitly convert the arguments:
printf("%lu %lu\n", (unsigned long)sizeof(me), (unsigned long)sizeof(*me));
You're getting 8 for sizeof(me) because that happens to be the size of a pointer on the compiler you're using (8 bytes, 64 bits). If you compiled and ran your program on a different system, you might get 4, because a lot of systems have 32-bit pointers. (And this assumes a byte is 8 bits, which is true for most systems but not guaranteed by the language.)
Most compilers make all pointers the same size, but that's not guaranteed by the language either. For example, on a word-addressed machine, an int* pointer could be just a machine-level address, but a char* pointer might need additional information to specify which byte within the word it points to. You're not very likely to run into a system with varying pointer sizes, but there's still no point in assuming that all pointers are the same size.
As for the size of the structure, that also can vary from one compiler to another. Here's your structure again:
struct krishna {
int i,j,k,l,m;
char c;
double d;
char g[48];
};
char is always exactly 1 byte, and char[48] is always exactly 48 bytes.
The number of bytes in an int can vary from one system to another; 4 bytes is most common these days.
The size of a double is typically 8 bytes, but this can also vary (though I don't think I've ever seen a system where sizeof (double) isn't 8 bytes.)
Structure members are laid out in the order in which they're declared, so your i will be at the very beginning of the structure, followed by j, k, and so forth.
Finally, the compiler will often insert padding bytes between members, or after the last member, so that each member is properly aligned. For example, on many systems a 4-byte int needs to be aligned at an offset that's a multiple of 4 bytes; if it's misaligned, access to it may be slow and/or very difficult.
The fact that sizeof (struct krishna) happens to be 80 bytes on your system isn't really all that important. It's more important to understand (a) the general rules compilers use to determine how structures are laid out, and (b) the fact that those rules can result in different layouts for different systems.
The language definition and your compiler guarantee that you can have objects of type struct krishna, and that you can access those objects and their members, getting back whatever values you stored in them. If you need to know how big a struct krishna is, the answer is simply sizeof (struct krishna). If, for some reason, you need to know more details than that (say, if you need to match some externally imposed layout), you can do some experiments and/or consult your compiler's documentation -- but be aware that the specifics will apply only to the compiler you're using on the system where you're using it. (Often an ABI for your system will constrain the compiler's choices.)
You can also use sizeof and offsetof (look it up) to find out where each member is allocated.
All pointers are addresses, and all addresses are the same size on a given system, usually 4 bytes on a 32 bit system and 8 bytes on a 64 bit system. Since you are getting 8, you must be on a 64 bit system.
The size of a struct depends on how the compiler "packs" the individual fields of the struct together into a single block of memory to contain the entire struct. In your case, your struct has 5 int fields (4 bytes each), a single char field (1 byte), a single double field (8 bytes), and a 48 character array. Add all that up and you get 20 + 1 + 8 + 48 = 77 bytes to store your data. The actual size is 80 because the compiler is "padding" the 1 byte char field with 3 extra unused bytes in order to keep all fields in the struct aligned to a 4-byte memory address, which is needed for good performance.
Hope that helps!
This is because:
sizeof( me ) // is a pointer.
... me is a pointer. The size of a pointer is a multiple of the word on your environment, hence it's common that on 32-bit environments a pointer is 4 bytes whereas on a 64-bit environment a pointer is 8 bytes (but not written in stone). If you were to go back a couple years, a 16-bit environment would have a 2 byte pointer. Looking at the next sizeof:
sizeof( *me ) // is a struct krishna, hence 80 bytes are needed to store it in memory.
... is a structure and the size of the structure krishna is 80 bytes. If you look at the structure:
struct krishna {
int i,j,k,l,m; // sizeof( int ) * 5
char c; // sizeof( char ) * 1
double d; // sizeof( double ) * 1
char g[48]; // sizeof( char ) * 48
// padding for memory address offset would be here.
};
... if you add up the amount of bytes required for each field and include the appropriate data structure alignment for the memory address offset then it will total 80 bytes (as expected). The reason it adds an extra 3 unused bytes is because to store a structure in memory it must be in a continuous block of memory that is allocated for the structure. For performance reasons, it will pad any size issues to ensure that the memory addresses are always a multiple of the word. The tradeoff of the 3 bytes for performance improvements is worth it, 3 bytes nowadays is not as impactful as the performance improvements the processor has when data alignment is guaranteed.
Just to add to answers of #Jacob Pollack and #ObjetDart, you can find more about structure padding at Structure padding in C.
I read that functions in C may use local stack-based variables, and they are allocated simply by decrementing the stack pointer by the amount of space required. This is always done in four-byte chunks (if I am not mistaken). But, what if run code like following:
void foo(void)
{
char str[6];
......
}
What size does var str occupy? 6 bytes or 6 × 4 bytes as per the four-byte chunks.
The four-byte-chunk rule just means that the stack pointer must point to an address that is a multiple of four. In this case, allocating 8 bytes satisfies that rule, and such a block is large enough to hold a 6-character array with only 2 bytes of padding.
Data alignment is a CPU requirement which means that the alignment amount changes from a CPU to another, keep that in mind.
Speaking about stack data-alignment, gcc for example keeps the data aligned using an option called -mpreferred-stack-boundary=n where the data will be aligned to 2^n.
By default, the value of n is 4 which makes the stack-alignment 16-bytes.
What this means is that you'll find yourself allocating 16 bytes in stack memory although what you explictly allocated was just an integer.
int main()
{
char ar[6] = {1,2,3,4,5,6};
int x = 10;
int y = 12 + (int) ar[1] + x;
return y;
}
Compiling this code with gcc on my CPU produces the following assembly(posting only the stack-allocation instruction):
subl $32, %esp
But why 32? we're allocating data that fits exactly in 16 bytes.
Well, there are 8 bytes gcc needs to keep saved for the leave and ret which makes the total needed memory 24.
BUT, the alignment requirement is 16-bytes and thus gcc needs to allocate stack-space so that it's made up of 16-bytes chunks; making that 24 bytes to 32 solves the problem.
You'll have enough space for your variables, for the ret and leave and it's made of two 16-bytes chunks.
The rule of allocating in 4-byte chunks is not valid in all cases. For example, ARM eabi requires aligment of 64-bit integers and doubles on 8-byte boundaries.
Usually the allocated space matches the rules of data packing into structures. So char[6] would actually take 6 bytes (usually), but the padding of the data (for the next field) can use few bytes more.
Example:
struct X
{
char field1[6];
};
So the structure X size would be 8
structure Y
{
char field1[2];
double field2;
};
Structure Y is usually something like 8, 12 or 16 bytes depending on architecture.
Same rules are applied to automatic stack variables: usually the padding is dictated not by type you are using, but by the next type you are going to use. And rules sometimes are a bit vague.
I guess you are getting confused between data size and data alignment. There is no general rule, but, on modern computers, your variable will be stored in 6 bytes. On the other side, the next element won't necessarily be stored at the next byte. This is known as data structure padding.
The word-aligned architectures, where every variable must begin on an address which is a multiple of the word size, are becoming rare. With new processors such as SPARC or x86, variables are self-aligned. It means that they have to begin on an address which is a multiple of its type size.
Therefore, there is no "four-bytes chuck rule" on non-exotic computers. In your example, str will be stored with 6 bytes. If you declare a variable with an alignment of 8 bytes for instance (such as double on x86), there will be 2 padding bytes inserted by your compiler.
Alignment is fixed by the compiler, according to your architecture. So the standard doesn't define anything about it. You may find further informations on Wikipedia.
If you have:
char str[6];
int a;
char b;
char c;
The stack will be of sufficient size to contain all these variables and be divisible by 4 (or whatever alignment is required). But each variable does not need to be aligned on the same boundary (though there may be hardware requirements).
On my system, compiling the above and printing out the addresses of the stack variables (leading digits removed for brevity):
&str -- 18
&a -- 12
&b -- 10
&c -- 11
i.e the compiler will arrange for the stack to be aligned, but the variables do not need to be padded.