understanding struct in memory - c

#include <stdio.h>
typedef struct ss {
int a;
char b;
int c;
} ssa;
int main(){
ssa *ss;
int *c=&ss->a;
char *d=&ss->b;
int *e=&ss->c;
*c=1;
*d=2;
*e=3;
printf("%d=%p %d=%p %d=%p\n",*c,c++,*c,c++,*c,c);
return 0;
}
//prints 1=0x4aaa4333ac68 2=0x4aaa4333ac6c 3=0x4aaa4333ac70
My thinking of how should be the memory structure:
int | char | int
(68 69 6A 6B) (6C) (6D 6E 6F 70)
I'm trying to understand how this code works in memory.
Why int *e starts from 0x...70?
Why c++ (increment) from char (6C) goes 4 bytes more?
Thanks.

First of all, these lines are illegal:
*c=1;
*d=2;
*e=3;
All you have is a pointer to ssa, but you haven't actually allocated any space for the pointed-to object. Thus, these 3 lines are trying to write into unallocated memory, and you have undefined behavior.
Structure layout in memory is such that member fields are in increasing memory addresses, but the compiler is free to place any amount of padding in between for alignment reasons, although 2 structures sharing the same initial elements will have the corresponding members at the same offset. This is one reason that could justify the "gaps" between member addresses.
You should be more careful with how you call printf(). The order in which function arguments are evaluated is unspecified, and you are modifying c more than once without an intervening sequence point, which is undefined behavior (see Undefined behavior and sequence points). Furthermore, pointer arithmetic is only guaranteed to work correctly when performed with pointers that point to elements of the same array or one past the end.
So, in short: the code has undefined behavior all over the place. Anything can happen. A better approach would have been:
#include <stdio.h>
typedef struct ss {
int a;
char b;
int c;
} ssa;
int main() {
ssa ss = { 0, 0, 0 };
int *c = &ss.a;
char *d = &ss.b;
int *e = &ss.c;
printf("c=%p d=%p e=%p\n", (void *) c, (void *) d, (void *) e);
return 0;
}
The cast to void * is necessary because %p expects a void * argument. You will probably see the addresses in d and e differ by 4, that is, 3 bytes of padding after b, but keep in mind that this is highly platform dependent.

There is often padding inside structures; you cannot assume that each field immediately follows the one before it.
The padding is added by the compiler to make structure member access quick, and sometimes in order to make it possible. Not all processors support unaligned accesses, and even those that do can have performance penalties for such accesses.
You can use offsetof() to figure out where there is padding, but typically you shouldn't care.

Related

Copying char arr[64] to char arr[] can cause a segmentation fault?

typedef struct {
int num;
char arr[64];
} A;
typedef struct {
int num;
char arr[];
} B;
I declared A* a; and then put some data into it. Now I want to cast it to a B*.
A* a;
a->num = 1;
strcpy(a->arr, "Hi");
B* b = (B*)a;
Is this right?
I get a segmentation fault sometimes (not always), and I wonder if this could be the cause of the problem.
I got a segmentation fault even though I didn't try to access to char arr[] after casting.
This defines a pointer variable
A* a;
It does not point to anything valid; the pointer is uninitialised.
This accesses whatever it is pointing to
a->num = 1;
strcpy(a->arr, "Hi");
Without allocating anything to the pointer beforehand (by e.g. using malloc()) this is asking for segfaults as one possible consequence of the undefined behaviour it invokes.
This is an addendum to Yunnosch's answer, which identifies the problem correctly. Let's assume you do it correctly and either write just
A a;
which gives you an object of automatic storage duration when declared inside a function, or you dynamically allocated an instance of A like this:
A *a = malloc(sizeof *a);
if (!a) return -1; // or whatever else to do in case of allocation error
Then, the next thing is your cast:
B* b = (B*)a;
This is not correct: types A and B are not compatible. Here, it will probably work in practice because the struct members are compatible, but beware that strange things can happen, because the compiler is allowed to assume that a and b point to different objects when their types are not compatible. For more information, read up on what is commonly called the strict aliasing rule.
You should also know that an incomplete array type (without a size) is only allowed as the very last member of a struct. With a definition like yours:
typedef struct {
int num;
char arr[];
} B;
the member arr is allowed to have any size, but it's your responsibility to allocate it correctly. The size of B (sizeof(B)) doesn't include this member. So if you just write
B b;
you can't store anything in b.arr, it has a size of 0. This last member is called a flexible array member and can only be used correctly with dynamic allocation, adding the size manually, like this:
B *b = malloc(sizeof *b + 64);
This gives you an instance *b with an arr of size 64. If the array doesn't have the type char, you must multiply manually with the size of your member type -- it's not necessary for char because sizeof(char) is by definition 1. So if you change the type of your array to something different, e.g. int, you'd write this to allocate it with 64 elements:
B *b = malloc(sizeof *b + 64 * sizeof *(b->arr));
It appears that you are confusing two different topics. In C99/C11 char arr[]; as the last member of a structure is a Flexible Array Member (FAM) and it allows you to allocate for the structure itself and N number of elements for the flexible array. However -- you must allocate storage for it. The FAM provides the benefit of allowing one-allocation and one-free where there would normally be two required. (In C89 a similar implementation went by the name struct hack, but it was slightly different).
For example, B *b = malloc (sizeof *b + 64 * sizeof *b->arr); would allocate storage for b plus 64-characters of storage for b->arr. You could then copy the members of a to b using the proper '.' and '->' syntax.
A short example can illustrate:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define NCHAR 64 /* if you need a constant, #define one (or more) */
typedef struct {
int num;
char arr[NCHAR];
} A;
typedef struct {
int num;
char arr[]; /* flexible array member */
} B;
int main (void) {
A a = { 1, "Hi" };
B *b = malloc (sizeof *b + NCHAR * sizeof *b->arr);
if (!b) {
perror ("malloc-b");
return 1;
}
b->num = a.num;
strcpy (b->arr, a.arr);
printf ("b->num: %d\nb->arr: %s\n", b->num, b->arr);
free (b);
return 0;
}
Example Use/Output
$ ./bin/struct_fam
b->num: 1
b->arr: Hi
Look things over and let me know if that helps clear things up. Also let me know if you were asking something different. It is a little unclear exactly where your confusion lies.

Casting char pointer to int pointer - buffer error 10

In this answer, the author discussed how it was possible to cast pointers in C. I wanted to try this out and constructed this code:
#include <stdio.h>
int main(void) {
char *c;
*c = 10;
int i = *(int*)(c);
printf("%d", i);
return 1;
}
This compiles (with a warning) and when I execute the binary it just outputs bus error: 10. I understand that a char is a smaller size than an int. I also understand from this post that I should expect this error. But I'd really appreciate if someone could clarify on what is going on here. In addition, I'd like to know if there is a correct way to cast the pointers and dereference the int pointer to get 10 (in this example). Thanks!
EDIT: To clarify my intent, if you are worried, I'm just trying to come up with a "working" example of pointer casting. This is just to show that this is allowed and might work in C.
c is uninitialized when you dereference it. That's undefined behaviour.
Likewise, even if c were initialized, your typecast of it to int * and then a dereference would get some number of extra bytes from memory, which is also undefined behaviour.
A working (safe) example that illustrates what you're trying:
#include <stdio.h>
int main(void)
{
int i = 10;
int *p = &i;
char c = *(char *)p;
printf("%d\n", c);
return 0;
}
This program will print 10 on a little-endian machine and 0 on a big-endian machine.
These lines of code are problematic. You are writing through a pointer that is uninitialized.
char *c;
*c = 10;
Change to something like this:
char * c = malloc (sizeof (char));
Then, the following line is invalid logic, and the compiler should at least warn you about this:
int i = *(int*)(c);
You are reading an int (probably 4 or 8 bytes) from a pointer that only has one byte of storage (sizeof (char)). You can't read an int worth of bytes from a char memory slot.
First of all your program has undefined behaviour because pointer c was not initialized.
As for the question then you may write simply
int i = *c;
printf("%d", i);
Integral types with ranks less than the rank of type int are promoted to type int in expressions.
I understand that a char is a smaller size than an int. I also understand from this post that I should expect this error. But I'd really appreciate if someone could clarify on what is going on here
Some architectures, like SPARC and some MIPS, require strict alignment. Thus if you want to read or write, for example, a word, it has to be aligned on 4 bytes, i.e. its address must be a multiple of 4, or the CPU will raise an exception. Other architectures like x86 can handle unaligned access, but at a performance cost.
Let's take your code, find all places where things go boom as well as the reason why, and do the minimum to fix them:
#include <stdio.h>
int main(void) {
char *c;
*c = 10;
The preceding line is Undefined Behavior (UB), because c does not point to at least one char-object. So, insert these two lines directly before:
char x;
c = &x;
Let's move on after that fix:
int i = *(int*)(c);
Now this line is bad too.
Let's make our life complicated by assuming you didn't mean the more reasonable implicit widening conversion; int i = *c;:
If the implementation defines _Alignof(int) != 1, the cast invokes UB because x is potentially mis-aligned.
If the implementation defines sizeof(int) != 1, the dereferencing invokes UB, because we refer to memory which is not there.
Let's fix both possible issues by changing the lines defining x and assigning its address to c to this:
_Alignas(int) char x[sizeof(int)];
c = x;
Now, reading the dereferenced pointer causes UB, because we treat some memory as if it stored an object of type int, which is not true unless we copied one there from a valid int variable - treating both as buffers of characters - or we last stored an int there.
So, add a store before the read:
*(int*)c = 0;
Moving on...
printf("%d", i);
return 1;
}
To recap, the changed program:
#include <stdio.h>
int main(void) {
char *c;
_Alignas(int) char x[sizeof(int)];
c = x;
*c = 10;
*(int*)c = 0;
int i = *(int*)(c);
printf("%d", i);
return 1;
}
(Used the C11 standard for my fixes.)

Why does following C code print 45 in case of int 45 and 36 in case of STRING and ASCII value of CHAR?

struct s{
int a;
char c;
};
int main()
{
struct s b = {5,'a'};
char *p =(char *)&b;
*p = 45;
printf("%d " , b.a);
return 0;
}
If *p is changed to any character, it prints the ASCII value of that character; if *p is changed to any string ("xyz"), it prints 36. Why is this happening?
Can you give a memory map of the structure s and *p?
As I understand it, the memory of struct s looks like z -> (****)(*), assuming 4 bytes for int. When s is initialized it becomes (00000000 00000000 00000000 00000101)(ASCII of char a), and p points to the starting address of z. Since p is a character pointer, I expected it to store an ASCII value at each byte location, i.e. each * would be occupied by the ASCII value of a char. After the assignment of 45, z should therefore have become (45 0 0 5)(ASCII of char a). But that is not what happens. Why?
When you write to the struct through a char * pointer, you store 45 in the first byte of the struct. If you are on a Little-Endian implementation, you will write to the low end of b.a. If you are on a Big-Endian implementation, you will write to the high end of b.a.
Here is a visualization of what typically happens to the structure on an implementation with 16-bit ints, before and after the assignment *p=45. Note that the struct is padded to a multiple of sizeof(int).
Little-Endian
Before: a [05][00]  (int)5
        c [61]
          [  ]
After:  a [2d][00]  (int)45
        c [61]
          [  ]
Big-Endian
Before: a [00][05]  (int)5
        c [61]
          [  ]
After:  a [2d][05]  (int)11525
        c [61]
          [  ]
With larger ints, there are more ways to order the bytes, but you are exceedingly unlikely to encounter any other than the two above in real life.
However, the next line invokes undefined behaviour for two reasons:
printf("%d " , b.a);
You are modifying a part of b.a through a pointer of a different type. This may give b.a a "trap representation", and reading a value containing a trap representation causes undefined behaviour. (And no, you are not likely to ever encounter a trap representation (in an integer type) in real life.)
You are calling a variadic function without a function declaration. Variadic functions typically have unusual ways of passing arguments, so the compiler has to know about it. The usual fix is to #include <stdio.h>.
Undefined behaviour means that anything could happen, such as printing the wrong value, crashing your program or (the worst of them all) doing exactly what you expect.
In little-endian, your struct looks like this:
00000101 00000000 00000000 00000000 01100001
So p points to the byte holding 5 and overwrites it. In the printf, the 4 little-endian bytes are read back as 45.
If you tried it on a big-endian machine, the result would be 754974725, because p points to the most significant byte of the int.
A simple test program to find out whether you are on little- or big-endian:
#include <stdio.h>
int main()
{
int a = 0x12345678;
unsigned char *c = (unsigned char*)(&a);
if (*c == 0x78)
printf("little-endian\n");
else
printf("big-endian\n");
return 0;
}
The C standard guarantees that the address of the first member of a structure is the address of the structure. That is, in your case,
int* p =(int*)&b;
is a safe cast. But there is no standard way of accessing the char member from the address of the structure by assuming its position. This is because the standard does not say anything about the contiguity of successive members in memory: in fact the compiler may or may not insert gaps (called padding) between members to suit the target architecture.
So what you're doing is essentially undefined.
Because, this
*p = 45;
Changes the value of what p points to to 45. And you made p point to b.
If you want a pointer as the structure member instead of a single char, try this:
#include<stdio.h>
struct s{
int a;
char *c;
};
int main()
{
struct s b = {5, "a"};
printf("%d %s", b.a, b.c);
return 0;
}
Or try this, using a pointer to the structure:
#include<stdio.h>
struct s{
int a;
char c[2]; /* room for 'a' plus the terminating '\0' that %s needs */
};
int main()
{
struct s *p;
struct s b = {5, 'a'};
p = &b;
printf("%d %s", p->a, p->c);
return 0;
}
char *p = (char*) &b; - in this line, p points to the beginning of the b struct as a char pointer.
*p = 45; writes 45 to the first byte of b's memory, which is also part of b.a.
When you call printf("%d ", b.a); you print the value of b.a, whose first byte now holds 45, so (on a little-endian machine) you get 45.
Try debugging it yourself and you'll see it in the watch window.

sizeof for a structure with char c[0]

struct xyz {
int a;
int b;
char c[0];
};
struct xyz x1;
printf("Size of structure is %d",sizeof(x1));
Output: 8
Why isn't the size of the structure 9 bytes? Is it because the character array is declared with size 0?
Zero-length arrays are not in the standard C, but they are allowed by many compilers.
The idea is that they must be placed as the very last field in a struct, but they don't occupy any bytes. The struct works as a header for the array that is placed just next to it in memory.
For example:
struct Hdr
{
int a, b, c;
struct Foo foos[0];
};
struct Hdr *buffer = malloc(sizeof(struct Hdr) + 10*sizeof(struct Foo));
buffer->a = ...;
buffer->foos[0] = ...;
buffer->foos[9] = ...;
The standard way to do that is to create an array of size 1 and then subtract that 1 from the length when computing the allocation size. But even that technique is controversial...
For more details and the similar flexible array member see this document.
Your array of characters has a length of 0, and hence the size of c is 0 bytes. Therefore, when your compiler lays out that structure, it only counts the two integers, and since int is 4 bytes on your platform (judging from your result), the size of the structure is 8 bytes.
Remark: You can still access the field c without any compiler warnings (compiled with gcc) however it will be some garbage value.
An array of length 0 is actually not permitted in standard C, but apparently your compiler supports it as an extension.
It's one way of implementing the so-called "struct hack", explained in question 2.6 of the comp.lang.c FAQ.
Because C implementations typically don't do bounds checking for arrays, a zero-element array (or in a more portable variant, a one-element array) gives you a base address for an array of arbitrary size. You have to allocate, typically using malloc, enough memory for the enclosing struct so that there's room for as many array elements as you need:
struct xyz *ptr = malloc(sizeof *ptr + COUNT * sizeof (char));
C99 added a new feature, "flexible array members", that does the same thing without specifying a fake array size:
struct xyz {
int a;
int b;
char c[];
};
In this case:
int a; // 4 bytes
int b; // 4 bytes
char c[0]; // 0 bytes
So it is 8.
Also, sizeof will never return 9 for your struct, even if you give c a size:
int a; // 4 bytes
int b; // 4 bytes
char c[1]; // 1 byte, but the struct is padded so that it stays aligned
The layout looks like this:
^^^^
^^^^
^~~~ // padding for alignment
So sizeof returns 12, not 9.

Array of pointers and more in C

Here is the piece of code that I don't understand:
#include "malloc.h"
/*some a type A and type for pointers to A*/
typedef struct a
{
unsigned long x;
} A, *PA;
/*some a type B and type for pointers to B*/
typedef struct b
{
unsigned char length;
/*array of pointers of type A variables*/
PA * x;
} B, *PB;
void test(unsigned char length, PB b)
{
/*we can set length in B correctly*/
b->length=length;
/*we can also allocate memory for the array of pointers*/
b->x=(PA *)malloc(length*sizeof(PA));
/*but we can't set pointers in x*/
while(length!=0)
b->x[length--]=0; /*it just would not work*/
}
int main()
{
B b;
test(4, &b);
return 0;
}
Can anyone elaborate conceptually to me why we can't set pointers in array x in test()?
On the last line of test() you are initializing the location off the end of your array. If your length is 4, then your array is 4 pointers long. b->x[4] is the 5th element of the array, as the 1st is b->x[0]. You need to change your while loop to iterate over values from 0 to length - 1.
If you want to set to null every PA in b->x, then writing --length instead of length-- should do the job.
Obviously trying to figure out where the -- belongs is confusing. You better write:
unsigned i;
for (i = 0; i < length; i++)
b->x[i] = 0;
But in fact, in this case, you could simply use:
memset(b->x, 0, length*sizeof(PA));
Your structure is more complicated by one level of dynamic memory allocation than is usually necessary. You have:
typedef struct a
{
unsigned long x;
...and other members...
} A, *PA;
typedef struct b
{
unsigned char length;
PA * x;
} B, *PB;
The last member of B is a struct a **, which might be needed, but seldom is. You should probably simplify everything by using:
typedef struct a
{
unsigned long x;
} A;
typedef struct b
{
unsigned length;
A *array;
} B;
This rewrite reflects a personal prejudice against typedefs for pointers (so I eliminated PA and PB). I changed the type of length in B to unsigned from unsigned char; using unsigned char saves no space in the design shown, though it might conceivably save space if you kept track of the allocated length separately from the length in use (but even then, I'd probably use unsigned short rather than unsigned char).
And, most importantly, it changes the type of the array so you don't have a separate pointer for each element because the array contains the elements themselves. Now, occasionally, you really do need to handle arrays of pointers. But it is relatively unusual and it definitely complicates the memory management.
The code in your test() function simplifies:
void init_b(unsigned length, B *b)
{
b->length = length;
/* allocate the elements themselves, not pointers to them */
b->array = (A *)malloc(length*sizeof(*b->array));
for (unsigned i = 0; i < length; i++)
b->array[i].x = 0;
}
int main()
{
B b;
init_b(4, &b);
return 0;
}
Using an idiomatic for loop avoids stepping out of bounds (one of the problems in the original code). The initialization loop for the allocated memory could perhaps be replaced with a memset() operation, or you could use calloc() instead of malloc().
Your original code was setting the pointers in the array of pointers to null; you could not then access any data because there was no data; you had not allocated the space for the actual struct a values, just space for an array of pointers to such values.
Of course, the code should either check whether memory allocation failed or use a cover function for the memory allocator that guarantees never to return if memory allocation fails. It is not safe to assume memory allocation will succeed without a check somewhere. Cover functions for the allocators often go by names such as xmalloc() or emalloc().
Someone else pointed out that malloc.h is non-standard. If you are using the tuning facilities it provides, or the reporting facilities it provides, then malloc.h is fine (but it is not available everywhere so it does limit the portability of your code). However, most people most of the time should just forget about malloc.h and use #include <stdlib.h> instead; using malloc.h is a sign of thinking from the days before the C89 standard, when there was no header that declared malloc() et al, and that is a long time ago.
See also Freeing 2D array of stack; the code there was isomorphic with this code (are you in the same class?). And I recommended and illustrated the same simplification there.
I just added a printf in main to test the b.length and b.x[] values, and everything works.
I added printf("%d, %d %d %d %d", b.length, b.x[0], b.x[1], b.x[2], b.x[3]); just before the return.
It gave 4, 0, 0, 0, 0, which I think is what you expect, no? Or is it an algorithmic error?
I assume you are trying to zero all of the unsigned longs inside the array of A's pointed to within B.
Is there a precedence issue with the -> and [] operators here?
Try:
(b->x)[length--] = 0;
And maybe change
typedef struct a
{
unsigned long x;
} A, *PA;
to
typedef struct a
{
unsigned long x;
} A;
typedef A * PA;
etc

Resources