Can you please help explain why the following program correctly prints the values of all the structure members?
#include <stdio.h>
struct st
{
int i;
char c1;
int j;
char c2;
};
int main()
{
struct st a = {5, 'i', 11, 'H'};
struct st * pa = &a;
int first;
char second;
int third;
char fourth;
first = *((int*)pa);
second = *((char*)pa + 4); /* offset = 4 bytes = sizeof(int) */
third = *((int*)pa + 2); /* why (pa + 2) here? */
fourth = *((char*)pa + 12); /* why (pa + 12) here? */
printf ("first = %d, second = %c, third = %d, fourth = %c\n", first, second, third, fourth);
return 0;
}
Output: first = 5, second = i, third = 11, fourth = H
How can I generalize the above program?
That's because of the padding bytes added to the structure. Three padding bytes are added after char c1; because it is followed by an int, a member with a larger alignment requirement, so padding is inserted to make the offset of j a multiple of the alignment of int.
How can I generalize the above program?
The only way to make it work reliably is by not guessing at the offset. Use the standard offsetof macro, and always do the pointer arithmetic with a character pointer:
first = *(int*)((char*)pa + offsetof(struct st, i));
You don't have to name the field at the point where you do the access, but you should definitely use the macro to compute the offset if you intend to pass it into your function.
It is because of structure padding.
After padding your structure will look like below.
struct st
{
int i;
char c1;
char padding1[3]; // for alignment of j.
int j;
char c2;
char padding2[3]; // for alignment of structure.
};
Hence
first = *((int*)pa);
second = *((char*)pa + 4); /* offset = 4 bytes = sizeof(int) */
third = *((int*)pa + 2); /* offset = 8 bytes(pointer arithmetic) to point to int j*/
fourth = *((char*)pa + 12); /* offset = 12 bytes to point to char c2*/
For more info on structure padding, read Data_structure_alignment.
As in the other answers: padding.
However, some compilers allow you to pack your structures, removing (in most cases) the padding.
gcc:
struct __attribute__((packed)) st
{
....
};
Code that accesses packed structs may be slower and larger.
Members of a struct do not all occupy the same amount of space: a char still takes 1 byte and an int typically 4. What happens is that the compiler inserts unused padding bytes after a member so that the next member starts at an address that is a multiple of its alignment. So a char followed by an int effectively uses up 4 bytes: 1 for the char and 3 for padding.
This is because the processor accesses data most efficiently (and on some architectures, exclusively) at addresses that are multiples of the data size, for example 32-bit words at 4-byte boundaries. Memory itself is byte-addressable, but when data is fetched by the CPU it is transferred in units that depend on the processor and bus architecture.
Also note that the offset depends on the pointer type you are using: adding 1 to a char* advances by 1 byte, while adding 1 to an int* advances by sizeof(int) bytes (4 here).
This also means that the code is not portable, since, for example, int may not have the same size on different architectures.
I'm reading structures from an *.stl file, and the structure is:
typedef struct
{
float x;
float y;
float z;
} point;
typedef struct
{
point normal_vector; //12 bytes
point p1; //12 bytes
point p2; //12 bytes
point p3; //12 bytes
short int notuse; //2 bytes
} triangle;
sizeof(triangle) is 52: 12+12+12+12+2+...2 (I don't know where the last 2 comes from). The size of each record in the *.stl file is 50 (not a multiple of 4).
How can I reduce the size of the structure to read the file (from 52 to 50)?
Thank you.
Rather than reading the struct as it is (its memory layout can vary, as you see) and trying to reduce its size, a preferable way is to read the data field by field.
That is, you can read the file in large blocks and cut the data into the parts you need. Then you read out field by field and put the data into your target array. Something like
#include <stdint.h>
#include <string.h>

/* Copy one float out of the byte stream and advance the cursor. */
float read_float(const unsigned char ** data) {
    float ret;
    memcpy(&ret, *data, sizeof ret); /* memcpy avoids unaligned access */
    *data += sizeof ret;
    return ret;
}
point read_point(const unsigned char ** data) {
    point ret;
    ret.x = read_float(data);
    ret.y = read_float(data);
    ret.z = read_float(data);
    return ret;
}
int16_t read16(const unsigned char ** data) {
    int16_t ret;
    memcpy(&ret, *data, sizeof ret);
    *data += sizeof ret;
    return ret;
}
triangle read_triangle(const unsigned char ** data) {
    triangle ret;
    ret.normal_vector = read_point(data);
    ret.p1 = read_point(data);
    ret.p2 = read_point(data);
    ret.p3 = read_point(data);
    ret.notuse = read16(data); // using short int is not portable, as its size could vary...
    return ret;
}
const unsigned char * scursor = source_array; // source_array would be an unsigned char array
while (scursor + 50 <= source_array + sizeof(source_array)) {
    // a complete 50-byte record is available here
    target[tcursor++] = read_triangle(&scursor); // the scursor will be advanced by the called function.
}
With certain enhancements, this approach could also be used to keep, e.g., the endianness of your numbers the same, which would preferably be big endian in files intended to be interchanged between platforms. The changes to read16 would be small, the changes to read_float a bit bigger, but still doable.
The extra two bytes come from padding. Padding rounds the size of the structure up to a multiple of its alignment, which here is the 4-byte alignment of float.
In the file, you have stored 50 bytes per structure. So you can read those 50 bytes and assign the value of each member one by one from those 50 bytes. The code will look like
char readbuf[50];
// Read the 50 bytes into the buffer readbuf.
triangle t;
// Copy with memcpy (from <string.h>): a float* cannot be assigned to a float.
memcpy(&t.normal_vector.x, readbuf, sizeof(float));
memcpy(&t.normal_vector.y, readbuf + sizeof(float), sizeof(float));
memcpy(&t.normal_vector.z, readbuf + 2*sizeof(float), sizeof(float));
memcpy(&t.p1.x, readbuf + 3*sizeof(float), sizeof(float));
//and so on for other members.
Please note that this approach is sensitive to byte order and alignment, and the same program may not work on a big-endian machine. So be wary of storing binary data directly without any rule or encoding.
With GCC/G++ you could do this to pack your structure:
typedef struct
{
point normal_vector; //12 bytes
point p1; //12 bytes
point p2; //12 bytes
point p3; //12 bytes
short int notuse; //2 bytes
} __attribute__((packed)) triangle;
I am trying to implement SSE vectorization on a piece of code for which I need my 1D array to be 16 byte memory aligned. However, I have tried several ways to allocate 16byte memory aligned data but it ends up being 4byte memory aligned.
I have to work with the Intel icc compiler.
This is a sample code I am testing with:
#include <stdio.h>
#include <stdlib.h>
#include <malloc.h> /* declares memalign on Linux */
void error(char *str)
{
printf("Error:%s\n",str);
exit(-1);
}
int main()
{
int i;
//float *A=NULL;
float *A = (float*) memalign(16,20*sizeof(float));
//align
// if (posix_memalign((void **)&A, 16, 20*sizeof(void*)) != 0)
// error("Cannot align");
for(i = 0; i < 20; i++)
printf("&A[%d] = %p\n",i,&A[i]);
free(A);
return 0;
}
This is the output I get:
&A[0] = 0x11fe010
&A[1] = 0x11fe014
&A[2] = 0x11fe018
&A[3] = 0x11fe01c
&A[4] = 0x11fe020
&A[5] = 0x11fe024
&A[6] = 0x11fe028
&A[7] = 0x11fe02c
&A[8] = 0x11fe030
&A[9] = 0x11fe034
&A[10] = 0x11fe038
&A[11] = 0x11fe03c
&A[12] = 0x11fe040
&A[13] = 0x11fe044
&A[14] = 0x11fe048
&A[15] = 0x11fe04c
&A[16] = 0x11fe050
&A[17] = 0x11fe054
&A[18] = 0x11fe058
&A[19] = 0x11fe05c
It is 4-byte aligned every time. I have tried both memalign and posix_memalign. Since I am working on Linux, I cannot use _mm_malloc, nor can I use _aligned_malloc.
I get a memory corruption error when I try to use _aligned_attribute (which I think is suitable for gcc alone).
Can anyone help me generate 16-byte aligned data with icc on Linux?
The memory you allocate is 16-byte aligned. See:
&A[0] = 0x11fe010
But in an array of float, each element is 4 bytes, so the second element is only 4-byte aligned.
You can use an array of structures, each containing a single float, with the aligned attribute:
struct x {
float y;
} __attribute__((aligned(16)));
struct x *A = memalign(...);
The address returned by the memalign function is 0x11fe010, which is a multiple of 0x10. So the function is doing the right thing. This also means that your array is properly aligned on a 16-byte boundary. What you are doing later is printing the address of every subsequent element of type float in your array. Since the size of float is exactly 4 bytes in your case, every next address will be equal to the previous one + 4. For instance, 0x11fe010 + 0x4 = 0x11FE014. Of course, address 0x11FE014 is not a multiple of 0x10. If you were to align all floats on a 16-byte boundary, you would have to waste 16 - 4 = 12 bytes per element. Double-check the requirements for the intrinsics that you are using.
AFAIK, both memalign and posix_memalign are doing their job.
&A[0] = 0x11fe010
This is aligned to 16 byte.
&A[1] = 0x11fe014
When you do &A[1] you are telling the compiler to advance a float pointer by one position. It will unavoidably lead to:
&A[0] + sizeof( float ) = 0x11fe010 + 4 = 0x11fe014
If you intend to have every element inside your vector aligned to 16 bytes, you should consider declaring an array of structures that are 16 bytes wide.
struct float_16byte
{
float data;
float padding[ 3 ];
};
Then you must allocate memory for ELEMENT_COUNT (20, in your example) variables:
struct float_16byte *A = ( struct float_16byte * )memalign( 16, ELEMENT_COUNT * sizeof( struct float_16byte ) );
I found this code on Wikipedia:
Example: get a 12bit aligned 4KBytes buffer with malloc()
// unaligned pointer to large area
void *up=malloc((1<<13)-1);
// well aligned pointer to 4KBytes
void *ap=aligntonext(up,12);
where aligntonext() is meant as:
move p to the right until next well aligned address if
not correct already. A possible implementation is
// PSEUDOCODE assumes uint32_t p,bits; for readability
// --- not typesafe, not side-effect safe
#define alignto(p,bits) (p>>bits<<bits)
#define aligntonext(p,bits) alignto((p+(1<<bits)-1),bits)
I personally believe your code is correct and suitable for Intel SSE code. When you load data into an XMM register with an aligned load, I believe the processor loads 4 contiguous floats from main memory, and only the first one has to be 16-byte aligned.
In short, I believe what you have done is exactly what you want.
You could also use this in VS:
__declspec(align(16)) struct x {
long long a;
long long b;
char c;
};
instead of this
struct x {
float y;
} __attribute__((aligned(16)));
I am getting unusual behaviour with my code, which is as follows:
#include<stdio.h>
struct a
{
int x;
char y;
};
int main()
{
struct a str;
str.x=2;
str.y='s';
printf("%d %d %d",(int)sizeof(int),(int)sizeof(char),(int)sizeof(str));
getch();
return 0;
}
For this piece of code I am getting the output:
4 1 8
As far as I know, the structure contains an integer variable of size 4 and a char variable of size 1, so the size of structure a should be 5. So how come the size of the structure is 8?
I am using visual C++ compiler.
Why this behaviour?
It is called structure padding.
Having data structures that start on a 4-byte word boundary (on CPUs with a 4-byte word size and bus) is far more efficient when moving data around memory, and between RAM and the CPU.
You can generally switch this off with compiler options and/or pragmas; the specifics depend on your compiler.
Hope this helps.
The compiler inserts padding for optimization and alignment purposes. Here, the compiler inserts 3 padding bytes after the char member y.
You can control the alignment with the #pragma pack directive.
Mostly to illustrate how this padding actually works, I've amended your program a little.
#include<stdio.h>
struct a
{
int x;
char y;
int z;
};
int main()
{
struct a str;
str.x=2;
str.y='s';
str.z = 13;
printf ( "sizeof(int) = %zu\n", sizeof(int));
printf ( "sizeof(char) = %zu\n", sizeof(char));
printf ( "sizeof(str) = %zu\n", sizeof(str));
printf ( "address of str.x = %p\n", (void *)&str.x );
printf ( "address of str.y = %p\n", (void *)&str.y );
printf ( "address of str.z = %p\n", (void *)&str.z );
return 0;
}
Note that I added a third element to the structure. When I run this program, I get:
amrith#amrith-vbox:~/so$ ./padding
sizeof(int) = 4
sizeof(char) = 1
sizeof(str) = 12
address of str.x = 0x7fffc962e070
address of str.y = 0x7fffc962e074
address of str.z = 0x7fffc962e078
amrith#amrith-vbox:~/so$
The part of this that illustrates padding is highlighted below.
address of str.y = 0x7fffc962e074
address of str.z = 0x7fffc962e078
While y is only one character, note that z is a full 4 bytes along.
After using malloc() to initialize 5000 bytes of memory, how would I reference the bytes in this memory space? For example, if I need to point to a starting location of data within the memory, how would I go about that?
EDIT: Does it matter what I use to point to it? I mean I am seeing people use bytes/int/char? Is it relevant?
Error I get:
You can use the subscript array[n] operator to access the index you are interested in reading/writing, like so:
uint8_t* bytes = (uint8_t*)malloc(5000); // not a const pointer, so it can be reset after free
bytes[0] = UINT8_MAX; // << write UINT8_MAX to the first element
uint8_t valueAtIndexZero = bytes[0]; // << read the first element (will be UINT8_MAX)
...
free(bytes), bytes = 0;
char * buffer = malloc(5000);
buffer[idx] = whatever;
char * p = buffer + idx;
*p = whatever;
malloc() doesn't initialize the memory it allocates. Use calloc() instead if you need it zeroed.
int *p = malloc (5000); // p points to the start of the dynamically allocated area.
As has been mentioned by others, you could do something like this:
typedef unsigned char byte; // byte is not a built-in C type
int nbytes = 23; // number of bytes of space to allocate
byte *stuff = malloc(nbytes * sizeof stuff[0]);
stuff[0] = 0; // set the first byte to 0
byte x = stuff[0]; // get the first byte
int n = 3;
stuff[n] = 0; // set the byte at index n to 0
x = stuff[n]; // byte at index n (or, for some other type, element n); n must be from 0 (inclusive) to nbytes (exclusive)
However, a couple of things to note:
malloc will not initialise the memory, but calloc will (as mentioned by Prasoon Saurav)
You should always check to see if the memory allocation failed (see below for an example)
int nbytes = 23; // or however many you want
byte *stuff = malloc(nbytes * sizeof stuff[0]);
if (NULL == stuff) // memory allocation failed!
{
//handle it here, e.g. by exiting the program and displaying an appropriate error message
}
stuff[0] = 0; // set the first byte to 0
byte x = stuff[0]; // get the first byte
int n = 3;
stuff[n] = 0; // set the nth byte to 0
x = stuff[n]; // nth byte, or in the case of some other type, nth whatever
malloc() returns a pointer to the allocated memory:
typedef unsigned char byte;
byte * mem = malloc( 5000 );
byte val = mem[1000]; /* gets the byte at index 1000, i.e. the 1001st byte */
After using malloc() to initialize 5000 bytes of memory, how would I
reference the bytes in this memory space? For example, if I need to
point to a starting location of data within the memory, how would I go
about that?
Does it matter what I use to point to it? I mean I am seeing people
use bytes/int/char? Is it relevant?
As you have seen, malloc allocates a block of memory counted in bytes. You can assign a pointer to that block, and depending on the pointer type the compiler knows how to reference individual elements:
unsigned char *memblob = malloc( 1024 );
short* pshort = (short*)memblob;
Now if you reference the second short value, i.e. *(pshort + 1) or pshort[1], the compiler knows that it needs to add 2 bytes (sizeof(short)) in order to get the next element.
float* pfloat = (float*)memblob;
Now if you reference the second float value, i.e. *(pfloat + 1) or pfloat[1], the compiler knows that it needs to add 4 bytes (sizeof(float)) in order to get the next element.
The same goes for user-defined data types:
typedef struct s
{
short a;
long b;
} mystruct_t;
mystruct_t* pstruct = (mystruct_t*)memblob;
pstruct + 1 accesses the struct at offset sizeof(mystruct_t).
So it is really up to you how you want to use the allocated memory.
struct {
integer a;
struct c b;
...
}
In general how does gcc calculate the required space? Is there anyone here who has ever peeked into the internals?
I have not "peeked at the internals", but it's pretty clear, and any sane compiler will do it exactly the same way. The process goes like:
Begin with size 0.
For each element, round size up to the next multiple of the alignment for that element, then add the size of that element.
Finally, round size up to the least common multiple of the alignments of all members.
Here's an example (assume int is 4 bytes and has 4 byte alignment):
struct foo {
char a;
int b;
char c;
};
Size is initially 0.
Round to alignment of char (1); size is still 0.
Add size of char (1); size is now 1.
Round to alignment of int (4); size is now 4.
Add size of int (4); size is now 8.
Round to alignment of char (1); size is still 8.
Add size of char (1); size is now 9.
Round to lcm(1,4) (4); size is now 12.
Edit: To address why the last step is necessary, suppose instead the size were just 9, not 12. Now declare struct foo myfoo[2]; and consider &myfoo[1].b, which is 13 bytes past the beginning of myfoo and 9 bytes past &myfoo[0].b. This means it's impossible for both myfoo[0].b and myfoo[1].b to be aligned to their required alignment (4).
There's no truly standardized way of aligning a struct, but the rule of thumb goes like this: The entire struct is aligned at a 4 or 8 byte boundary (depending on the platform). Within the struct, each member is aligned by its size. So the following packs with no padding:
char // 1
char
char
char
short int // 2
short int
int // 4
This will have a total size of 12. However, this next one will cause padding:
char // 1, + 1 byte padding
short // 2
int // 4
char // 1, + 1 byte padding
short // 2
char // 1
char // 1, + 2 bytes padding
Now the structure takes up 16 bytes.
This is just a typical example; the details will depend on your platform. Sometimes you can tell a compiler to never add any padding. This causes more expensive memory access (possibly introducing concurrency problems) but will save space.
To lay out aggregates as efficiently as possible, order the members by size, starting with the biggest.
The size of a structure is implementation defined, but it is hard to say what the size of your structure will be without more information (it is incomplete). For instance, given this struct:
struct MyStruct {
int abc;
int def;
char temp;
};
This yields a size of 9 on my compiler: 4 bytes for each int and 1 byte for the char. Many compilers would instead pad it to 12.
I have modified your code so that it compiles and ran it on the Eclipse/Microsoft C compiler platform:
struct c {
int a;
struct c *b;
};
struct c d;
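printf("\nsizeof c=%d, sizeof a=%d, sizeof b=%d",
(int)sizeof(d), (int)sizeof(d.a), (int)sizeof(d.b));
/* %08x with an unsigned cast assumes 32-bit pointers, as on this platform */
printf("\naddrof c =%08x", (unsigned)&d);
printf("\naddrof c.a=%08x", (unsigned)&d.a);
printf("\naddrof c.b=%08x", (unsigned)&d.b);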
The above code fragment produced the following output:
sizeof c=8, sizeof a=4, sizeof b=4
addrof c =0012ff38
addrof c.a=0012ff38
addrof c.b=0012ff3c
Do something like this so you can see (WITHOUT GUESSING) exactly how your compiler formats a structure.