I have the pieces of code below; what's the difference between them?
In the first one, the address of the buf member is 4 bytes greater than the address of the struct; in the second one it is not.
First
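#include <stdio.h>
#include <stdlib.h>
#include <string.h>
typedef struct A
{
int i;
char buf[]; //Here
}A;
int main()
{
A *pa = malloc(sizeof(A) + 13); // room for the flexible array as well
char *p = malloc(13);
memcpy(p, "helloworld", 11); // 10 characters plus the '\0'
memcpy(pa->buf, p, 11);
printf("%p %p %td %s\n", (void *)pa->buf, (void *)pa, (char *)pa->buf - (char *)pa, pa->buf);
free(p);
free(pa);
}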
Second
typedef struct A
{
int i;
char *buf; //Here
}A;
The first is a C99 'flexible array member'. The second is the reliable fallback for when you don't have C99 or later.
With a flexible array member, you allocate the space you need for the array along with the main structure:
A *pa = malloc(sizeof(A) + strlen(string) + 1);
pa->i = index;
strcpy(pa->buf, string);
/* ...use pa... */
free(pa);
As far as the memory allocation goes, the buf member has no size, so sizeof(A) == sizeof(int) unless there is extra padding because of the array's alignment (e.g. if you had a flexible array of double).
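Putting those pieces together, here is a minimal sketch of the single-allocation pattern. The helper name a_create is hypothetical (it is not from the question); it sizes the block as sizeof(A) plus the string length and its terminator, so one free releases everything.

```c
#include <stdlib.h>
#include <string.h>

typedef struct A
{
    int i;
    char buf[];   /* flexible array member: not counted in sizeof(A), apart from possible padding */
} A;

/* Hypothetical helper: a single allocation holds the struct and the string. */
static A *a_create(int index, const char *string)
{
    size_t len = strlen(string);
    A *pa = malloc(sizeof(A) + len + 1);   /* + 1 for the terminating '\0' */
    if (pa == NULL)
        return NULL;
    pa->i = index;
    memcpy(pa->buf, string, len + 1);      /* copies the '\0' as well */
    return pa;
}
```

For example, A *pa = a_create(7, "helloworld"); leaves "helloworld" in pa->buf, and free(pa) is the only cleanup needed.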
The alternative requires either two allocations (and two releases), or some care in the setup:
typedef struct A2
{
int i;
char *buf;
} A2;
A2 *pa2 = malloc(sizeof(A2));
pa2->buf = strdup(string);
/* ...use pa2... */
free(pa2->buf);
free(pa2);
Or:
A2 *pa2 = malloc(sizeof(A2) + strlen(string) + 1);
pa2->buf = (char *)pa2 + sizeof(A2);
strcpy(pa2->buf, string);
/* ...use pa2... */
free(pa2);
Note that using A2 requires more memory: either the size of the pointer (single allocation), or the size of the pointer plus the overhead of the second memory allocation (double allocation).
You will sometimes see something known as the 'struct hack' in use; this predates the C99 standard and is obsoleted by flexible array members. The code for this looks like:
typedef struct A3
{
int i;
char buf[1];
} A3;
A3 *pa3 = malloc(sizeof(A3) + strlen(string) + 1);
strcpy(pa3->buf, string);
This is almost the same as a flexible array member, but the structure is bigger. In the example, on most machines, the structure A3 would be 8 bytes long (instead of 4 bytes for A).
GCC has some support for zero-length arrays; you might see the struct hack with an array dimension of 0. That is not portable to compilers that do not mimic GCC.
It's called the 'struct hack' because it is not guaranteed to be portable by the language standard (because you are accessing outside the bounds of the declared array). However, empirically, it has 'always worked' and probably will continue to do so. Nevertheless, you should use flexible array members in preference to the struct hack.
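If you are stuck maintaining struct-hack code, a common refinement is to size the allocation with offsetof instead of sizeof, so the one-element array and any trailing padding are not counted twice. This is a sketch assuming a hypothetical helper a3_create; the out-of-bounds write that makes the struct hack non-portable is still there.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct A3
{
    int i;
    char buf[1];   /* struct hack: declared with 1 element, used as if larger */
} A3;

/* Hypothetical helper: room for the header plus exactly len + 1 chars. */
static A3 *a3_create(int index, const char *string)
{
    size_t len = strlen(string);
    size_t size = offsetof(A3, buf) + len + 1;
    if (size < sizeof(A3))
        size = sizeof(A3);               /* never allocate less than the declared struct */
    A3 *pa3 = malloc(size);
    if (pa3 == NULL)
        return NULL;
    pa3->i = index;
    memcpy(pa3->buf, string, len + 1);   /* still writes past buf[0]: that is the "hack" */
    return pa3;
}
```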
ISO/IEC 9899:2011 §6.7.2.1 Structure and union specifiers
¶3 A structure or union shall not contain a member with incomplete or function type (hence,
a structure shall not contain an instance of itself, but may contain a pointer to an instance
of itself), except that the last member of a structure with more than one named member
may have incomplete array type; such a structure (and any union containing, possibly
recursively, a member that is such a structure) shall not be a member of a structure or an
element of an array.
¶18 As a special case, the last element of a structure with more than one named member may
have an incomplete array type; this is called a flexible array member. In most situations,
the flexible array member is ignored. In particular, the size of the structure is as if the
flexible array member were omitted except that it may have more trailing padding than
the omission would imply. However, when a . (or ->) operator has a left operand that is
(a pointer to) a structure with a flexible array member and the right operand names that
member, it behaves as if that member were replaced with the longest array (with the same
element type) that would not make the structure larger than the object being accessed; the
offset of the array shall remain that of the flexible array member, even if this would differ
from that of the replacement array. If this array would have no elements, it behaves as if
it had one element but the behavior is undefined if any attempt is made to access that
element or to generate a pointer one past it.
struct A {
int i;
char buf[];
};
does not reserve any space for the array, or for a pointer to an array. What this says is that an array can directly follow the body of A and be accessed via buf, like so:
struct A *a = malloc(sizeof(*a) + 6);
strcpy(a->buf, "hello");
assert(a->buf[0] == 'h');
assert(a->buf[5] == '\0');
Note I reserved 6 bytes following a for "hello" and the nul terminator.
The pointer form uses an indirection (the memory could be contiguous, but this is neither relied on nor required):
struct B {
int i;
char *buf;
};
/* requiring two allocations: */
struct B *b1 = malloc(sizeof(*b1));
b1->buf = strdup("hello");
/* or some pointer arithmetic */
struct B *b2 = malloc(sizeof(*b2) + 6);
b2->buf = (char *)((&b2->buf)+1);
b2 is now laid out the same as a above, except with a pointer between the integer and the char array.
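For comparison, a sketch of the single-allocation pointer form wrapped in a hypothetical helper b_create. Pointing buf at b + 1 lands on the first byte after the whole struct; that matches (&b2->buf) + 1 when buf is the last member and there is no trailing padding, but does not depend on it.

```c
#include <stdlib.h>
#include <string.h>

struct B {
    int i;
    char *buf;   /* points at the characters; costs sizeof(char *) per object */
};

/* Hypothetical helper: one malloc, with buf aimed just past the struct. */
static struct B *b_create(int i, const char *s)
{
    size_t len = strlen(s);
    struct B *b = malloc(sizeof *b + len + 1);
    if (b == NULL)
        return NULL;
    b->i = i;
    b->buf = (char *)(b + 1);   /* first byte after the struct */
    memcpy(b->buf, s, len + 1);
    return b;
}
```

One practical difference: copying such an object with *dst = *src copies the pointer, so dst->buf still points into src's block; the flexible-array version has no stored pointer that can go stale.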
Related
I'm interested in the technique Sean Barrett uses to make a dynamic array in C for any type. Comments in the current version claim the code is safe to use with strict-aliasing optimizations:
https://github.com/nothings/stb/blob/master/stb_ds.h#L332
You use it like:
int *array = NULL;
arrput(array, 2);
arrput(array, 3);
The allocation it does holds both the array data + a header struct:
typedef struct
{
size_t length;
size_t capacity;
void * hash_table;
ptrdiff_t temp;
} stbds_array_header;
The macros/functions all take a void* to the array and access the header by casting the void* array and moving back one:
#define stbds_header(t) ((stbds_array_header *) (t) - 1)
I'm sure Sean Barrett is far more knowledgeable than the average programmer. I'm just having trouble following why this type of code is not undefined behavior under the strict aliasing rules of modern C. If it does avoid problems, I'd love to understand why, so I can incorporate it myself (maybe with fewer macros).
Let's follow the expansions of arrput in https://github.com/nothings/stb/blob/master/stb_ds.h :
#define STBDS_REALLOC(c,p,s) realloc(p,s)
#define arrput stbds_arrput
#define stbds_header(t) ((stbds_array_header *) (t) - 1)
#define stbds_arrput(a,v) (stbds_arrmaybegrow(a,1), (a)[stbds_header(a)->length++] = (v))
#define stbds_arrmaybegrow(a,n) ((!(a) || stbds_header(a)->length + (n) > stbds_header(a)->capacity) \
? (stbds_arrgrow(a,n,0),0) : 0)
#define stbds_arrgrow(a,b,c) ((a) = stbds_arrgrowf_wrapper((a), sizeof *(a), (b), (c)))
#define stbds_arrgrowf_wrapper stbds_arrgrowf
void *stbds_arrgrowf(void *a, size_t elemsize, size_t addlen, size_t min_cap)
{
...
b = STBDS_REALLOC(NULL, (a) ? stbds_header(a) : 0, elemsize * min_cap + sizeof(stbds_array_header));
//if (num_prev < 65536) prev_allocs[num_prev++] = (int *) (char *) b;
b = (char *) b + sizeof(stbds_array_header);
if (a == NULL) {
stbds_header(b)->length = 0;
stbds_header(b)->hash_table = 0;
stbds_header(b)->temp = 0;
} else {
STBDS_STATS(++stbds_array_grow);
}
stbds_header(b)->capacity = min_cap;
return b;
}
how this type of code is not undefined behavior because of the strict aliasing
Strict aliasing is about accessing an object through an lvalue whose type does not match the effective type of the data stored there. I would argue that the data stored in the memory region pointed to by stbds_header(array) has the effective type of the stbds_array_header structure, so accessing it is fine. The structure members are allocated by realloc and initialized one by one inside stbds_arrgrowf by the stbds_header(b)->length = 0; lines.
how this type of code is not undefined behavior
I think the pointer arithmetic is fine. You can say that the result of realloc points to an array of one stbds_array_header structure. In other words, at the first stbds_header(b)->length = assignment inside stbds_arrgrowf, the memory returned by realloc "becomes" an array of one stbds_array_header, because of C11 6.5p6 (https://port70.net/~nsz/c/c11/n1570.html#6.5p6): "If a value is stored into an object having no declared type through an lvalue having a type that is not a character type, then the type of the lvalue becomes the effective type of the object for that access".
int *array is assigned inside stbds_arrgrow to point "one past the last element" of that one-element array of stbds_array_header (which is also where the int array starts). ((stbds_array_header *) (array) - 1) then computes the address of the last (only) element by subtracting one from the one-past-the-end pointer. I would rewrite it as (char *)(void *)t - sizeof(stbds_array_header) anyway, as (stbds_array_header *) (array) looks like a cast that could generate a compiler warning.
Assigning (char *)result_of_realloc + sizeof(stbds_array_header) to int *array in the expansion of stbds_arrgrow could, in theory, produce a pointer that is not properly aligned for int, which would break C11 6.3.2.3p7 (https://port70.net/~nsz/c/c11/n1570.html#6.3.2.3p7): "If the resulting pointer is not correctly aligned for the referenced type, the behavior is undefined". This is very theoretical: stbds_array_header has size_t, void * and ptrdiff_t members, so on any normal architecture the address right after it is suitably aligned for int (or any other normal type).
I have only inspected the code in the expansion of arrput. This is 2000 lines of code; there may be undefined behavior elsewhere.
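To make the pattern concrete, here is a stripped-down sketch of the same header-in-front-of-the-data idea. It is not stb_ds's code: the names arr_header, ARR_HDR, ARR_LEN, arr_grow, arr_push_int and arr_free are hypothetical, it only handles int pushes, and it assumes C11 for max_align_t. Wrapping the header in a union with max_align_t is a swapped-in technique (not what stb_ds does) that removes the theoretical alignment concern above, because the element data then starts at a multiple of the strictest fundamental alignment.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical header, loosely modeled on stbds_array_header. The union
   rounds its size up to a multiple of max_align_t's alignment (C11). */
typedef union {
    struct {
        size_t length;
        size_t capacity;
    } h;
    max_align_t align;
} arr_header;

/* Step back from the element pointer to the header just in front of it. */
#define ARR_HDR(a) ((arr_header *)(void *)(a) - 1)
#define ARR_LEN(a) ((a) ? ARR_HDR(a)->h.length : (size_t)0)

/* Make room for at least one more element. Returns the (possibly moved)
   element pointer, or NULL on failure, leaving the old block untouched. */
static void *arr_grow(void *a, size_t elemsize)
{
    size_t len = a ? ARR_HDR(a)->h.length : 0;
    size_t cap = a ? ARR_HDR(a)->h.capacity : 0;
    if (len < cap)
        return a;
    size_t newcap = cap ? 2 * cap : 4;
    arr_header *hdr = realloc(a ? ARR_HDR(a) : NULL,
                              sizeof(arr_header) + newcap * elemsize);
    if (hdr == NULL)
        return NULL;
    hdr->h.length = len;
    hdr->h.capacity = newcap;
    return hdr + 1;   /* element data begins right after the header */
}

/* Append v; returns 0 on success, -1 if allocation failed. */
static int arr_push_int(int **a, int v)
{
    int *p = arr_grow(*a, sizeof **a);
    if (p == NULL)
        return -1;
    *a = p;
    p[ARR_HDR(p)->h.length++] = v;
    return 0;
}

static void arr_free(void *a)
{
    if (a)
        free(ARR_HDR(a));   /* free the pointer realloc actually returned */
}
```

As with stb_ds, the caller only holds the element pointer (int *xs = NULL; arr_push_int(&xs, 2);), and the header is reached by stepping back one arr_header from it.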
I'm fairly new to C and I am having trouble working with structs. I have the following code:
typedef struct uint8array {
uint8 len;
uint8 data[];
} uint8array;
int compare_uint8array(uint8array* arr1, uint8array* arr2) {
printf("%i %i\n data: %i, %i\n", arr1->len, arr2->len, arr1->data[0], arr2->data[0]);
if (arr1->len != arr2->len) return 1;
return 0;
}
int compuint8ArrayTest() {
printf("--compuint8ArrayTest--\n");
uint8array arr1;
arr1.len = 2;
arr1.data[0] = 3;
arr1.data[1] = 5;
uint8array arr2;
arr2.len = 4;
arr2.data[0] = 3;
arr2.data[1] = 5;
arr2.data[2] = 7;
arr2.data[3] = 1;
assert(compare_uint8array(&arr1, &arr2) != 0);
}
Now the output of this program is:
--compuint8ArrayTest--
3 4
data: 5, 3
Why are the values not what I initialized them to? What am I missing here?
In your case, uint8 data[]; is a flexible array member. You need to allocate memory for data before you can access it; arr1 and arr2 are plain local objects, so no storage exists for data at all.
In your code, you're accessing an invalid memory location, which causes undefined behavior.
Quoting C11, chapter §6.7.2.1 (emphasis mine)
As a special case, the last element of a structure with more than one named member may
have an incomplete array type; this is called a flexible array member. In most situations,
the flexible array member is ignored. In particular, the size of the structure is as if the
flexible array member were omitted except that it may have more trailing padding than
the omission would imply. However, when a . (or ->) operator has a left operand that is
(a pointer to) a structure with a flexible array member and the right operand names that
member, it behaves as if that member were replaced with the longest array (with the same
element type) that would not make the structure larger than the object being accessed; the
offset of the array shall remain that of the flexible array member, even if this would differ
from that of the replacement array. If this array would have no elements, it behaves as if
it had one element but the behavior is undefined if any attempt is made to access that
element or to generate a pointer one past it.
A proper usage example can also be found in chapter §6.7.2.1
EXAMPLE 2 After the declaration:
struct s { int n; double d[]; };
the structure struct s has a flexible array member d. A typical way to use this is:
int m = /* some value */;
struct s *p = malloc(sizeof (struct s) + sizeof (double [m]));
and assuming that the call to malloc succeeds, the object pointed to by p behaves, for most purposes, as if
p had been declared as:
struct { int n; double d[m]; } *p;
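Applied to the question, a corrected version might look like this sketch. It assumes uint8_t from <stdint.h> in place of the question's uint8, and the helper uint8array_new is hypothetical. Each array is allocated with malloc so the flexible member actually has storage, and the comparison also checks the contents, which the original only printed.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* uint8_t stands in for the question's uint8 type (assumption). */
typedef struct uint8array {
    uint8_t len;
    uint8_t data[];   /* flexible array member */
} uint8array;

/* Hypothetical helper: allocate a uint8array holding a copy of src[0..len). */
static uint8array *uint8array_new(uint8_t len, const uint8_t *src)
{
    uint8array *arr = malloc(sizeof *arr + len * sizeof arr->data[0]);
    if (arr == NULL)
        return NULL;
    arr->len = len;
    memcpy(arr->data, src, len);
    return arr;
}

/* Returns 0 when both arrays have the same length and contents, 1 otherwise. */
static int compare_uint8array(const uint8array *a, const uint8array *b)
{
    if (a->len != b->len)
        return 1;
    return memcmp(a->data, b->data, a->len) != 0;
}
```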
typedef struct {
int num;
char arr[64];
} A;
typedef struct {
int num;
char arr[];
} B;
I declared A* a; and then put some data into it. Now I want to cast it to a B*.
A* a;
a->num = 1;
strcpy(a->arr, "Hi");
B* b = (B*)a;
Is this right?
I get a segmentation fault sometimes (not always), and I wonder if this could be the cause of the problem.
I got a segmentation fault even though I didn't try to access char arr[] after casting.
This defines a pointer variable
A* a;
It does not point to anything valid; the pointer is uninitialised.
This accesses whatever it is pointing to
a->num = 1;
strcpy(a->arr, "Hi");
Without allocating anything to the pointer beforehand (by e.g. using malloc()) this is asking for segfaults as one possible consequence of the undefined behaviour it invokes.
This is an addendum to Yunnosch's answer, which identifies the problem correctly. Let's assume you do it correctly and either write just
A a;
which gives you an object of automatic storage duration when declared inside a function, or you dynamically allocated an instance of A like this:
A *a = malloc(sizeof *a);
if (!a) return -1; // or whatever else to do in case of allocation error
Then, the next thing is your cast:
B* b = (B*)a;
This is not correct: types A and B are not compatible. Here it will probably work in practice because the struct members are compatible, but beware that strange things can happen, because the compiler is allowed to assume a and b point to different objects since their types are not compatible. For more information, read up on what's commonly called the strict aliasing rule.
You should also know that an incomplete array type (without a size) is only allowed as the very last member of a struct. With a definition like yours:
typedef struct {
int num;
char arr[];
} B;
the member arr is allowed to have any size, but it's your responsibility to allocate it correctly. The size of B (sizeof(B)) doesn't include this member. So if you just write
B b;
you can't store anything in b.arr; it has a size of 0. This last member is called a flexible array member and can only be used correctly with dynamic allocation, adding the size manually, like this:
B *b = malloc(sizeof *b + 64);
This gives you an instance *b with an arr of size 64. If the array doesn't have the type char, you must multiply manually with the size of your member type -- it's not necessary for char because sizeof(char) is by definition 1. So if you change the type of your array to something different, e.g. int, you'd write this to allocate it with 64 elements:
B *b = malloc(sizeof *b + 64 * sizeof *(b->arr));
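When the element count comes from a variable, it is worth guarding the size computation against overflow before calling malloc. A minimal sketch, assuming a hypothetical int-array variant BI and a hypothetical helper bi_create:

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    int num;
    int arr[];   /* flexible array member of int instead of char */
} BI;

/* Hypothetical helper: allocate a BI with room for n ints. Returns NULL if
   sizeof(BI) + n * sizeof(int) would overflow size_t, or if malloc fails. */
static BI *bi_create(size_t n)
{
    if (n > (SIZE_MAX - sizeof(BI)) / sizeof(int))
        return NULL;
    BI *b = malloc(sizeof *b + n * sizeof *b->arr);
    if (b != NULL)
        b->num = 0;
    return b;
}
```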
It appears that you are confusing two different topics. In C99/C11 char arr[]; as the last member of a structure is a Flexible Array Member (FAM) and it allows you to allocate for the structure itself and N number of elements for the flexible array. However -- you must allocate storage for it. The FAM provides the benefit of allowing one-allocation and one-free where there would normally be two required. (In C89 a similar implementation went by the name struct hack, but it was slightly different).
For example, B *b = malloc (sizeof *b + 64 * sizeof *b->arr); would allocate storage for b plus 64 characters of storage for b->arr. You could then copy the members of a to b using the proper '.' and '->' syntax.
A short example can illustrate:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define NCHAR 64 /* if you need a constant, #define one (or more) */
typedef struct {
int num;
char arr[NCHAR];
} A;
typedef struct {
int num;
char arr[]; /* flexible array member */
} B;
int main (void) {
A a = { 1, "Hi" };
B *b = malloc (sizeof *b + NCHAR * sizeof *b->arr);
if (!b) {
perror ("malloc-b");
return 1;
}
b->num = a.num;
strcpy (b->arr, a.arr);
printf ("b->num: %d\nb->arr: %s\n", b->num, b->arr);
free (b);
return 0;
}
Example Use/Output
$ ./bin/struct_fam
b->num: 1
b->arr: Hi
Look things over and let me know if that helps clear things up. Also let me know if you were asking something different. It is a little unclear exactly where your confusion lies.
I'm trying to understand the pointer casting in this case.
// https://github.com/udp/json-parser/blob/master/json.c#L408
#define json_char char
typedef struct _json_object_entry
{
json_char * name;
unsigned int name_length;
struct _json_value * value;
} json_object_entry;
typedef struct _json_value
{
struct
{
unsigned int length;
json_object_entry * values;
#if defined(__cplusplus) && __cplusplus >= 201103L
decltype(values) begin () const
{ return values;
}
decltype(values) end () const
{ return values + length;
}
#endif
} object;
}
(*(json_char **) &top->u.object.values) += string_length + 1;
As far as I can see, top->u.object.values holds the address of the first element of values (type: json_object_entry); then we take the address of values and cast it to json_char **... and from here I'm lost. I don't really understand the purpose of this.
Note: this is a two-pass parser, for those who wonder.
Thanks
u.object.values is a pointer to the beginning of (or into) an array of json_object_entry elements. The code adjusts its value by a number of bytes, e.g. in order to skip a header or similar before the actual data. Because the pointer is typed, without a cast you can only change its value in multiples of sizeof(json_object_entry), but apparently the offset can have any value, depending on string_length. So the address of the pointer is taken, cast to the address of a char pointer (a char pointer can be changed in steps of 1), and dereferenced, so the result is a char pointer residing at the same place as the real u.object.values, which is then assigned to.
One should add that such code may break at run time if the architecture demands a minimal alignment for structures (possibly depending on their first member, here a pointer) and string_length can take a value that is not a multiple of that alignment. That would make the code UB. I'm not sure whether the code is nominally UB even when the alignment is preserved.
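If you want the byte-wise adjustment without reinterpreting the pointer object itself as a json_char *, one option is to do the arithmetic on a char * copy and convert back, checking alignment first. This is a sketch under assumptions: entry is a hypothetical stand-in for json_object_entry, advance_bytes is a hypothetical helper, and the uintptr_t remainder test relies on the usual (implementation-defined) mapping of pointers to integers.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for json_object_entry (same kinds of members). */
typedef struct entry {
    char *name;
    unsigned int name_length;
    void *value;
} entry;

/* Move p forward by nbytes. Returns NULL instead of producing a pointer
   that is misaligned for entry (C11 6.3.2.3p7). */
static entry *advance_bytes(entry *p, size_t nbytes)
{
    char *q = (char *)p + nbytes;   /* byte-wise arithmetic through a char * copy */
    if ((uintptr_t)q % alignof(entry) != 0)
        return NULL;
    return (entry *)(void *)q;
}
```

This does not cover the parser's first-pass use of the field as a running byte count; a plain size_t would be the straightforward home for that.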
Author here (guilty as charged...)
In the first pass, values hasn't yet been allocated, so the parser cheats by using the same field to store the amount of memory (length) that's going to be required when it's actually allocated in the second pass.
if (state.first_pass)
(*(json_char **) &top->u.object.values) += string_length + 1;
The cast to json_char ** is so that we add multiples of sizeof(json_char) to the length, rather than multiples of sizeof(json_object_entry).
It is a bit (...OK, more than a bit...) of a dirty hack re-using the field like that, but it was to save adding another field to json_value or using a union (C89 unions can't be anonymous, so it would have made the structure of json_value a bit weird).
There's no UB here, because we're not actually using values as an array of structs at this point, just subverting the type system and using it as an integer.
json_object_entry * values;
...
}
(*(json_char **) &top->u.object.values) += string_length + 1;
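Apart from the type punning, the statement has the same effect as:
top->u.object.values = (json_object_entry *)((json_char *) top->u.object.values + string_length + 1);
(You cannot simply collapse the & and *, because a cast expression is not an lvalue and cannot be the left operand of +=.)
top->u.object.values is indeed the pointer to the first element of the values array. Its address is cast to a pointer to a pointer to json_char, so the stored pointer is treated as a json_char * and advanced string_length + 1 characters. The net result is that top->u.object.values now points string_length + 1 json_chars ahead of where it used to.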
struct xyz {
int a;
int b;
char c[0];
};
struct xyz x1;
printf("Size of structure is %zu", sizeof(x1));
Output: 8
Why isn't the size of the structure 9 bytes? Is it because the character array is declared with size 0?
Zero-length arrays are not in the standard C, but they are allowed by many compilers.
The idea is that they must be placed as the very last field in a struct, but they don't occupy any bytes. The struct works as a header for the array that is placed just next to it in memory.
For example:
struct Hdr
{
int a, b, c;
struct Foo foos[0];
};
struct Hdr *buffer = malloc(sizeof(struct Hdr) + 10*sizeof(struct Foo));
buffer->a = ...;
buffer->foos[0] = ...;
buffer->foos[9] = ...;
The portable pre-C99 way to do that is to declare an array of size 1 and then subtract that 1 when computing how many elements to allocate. But even that technique is controversial...
For more details and the similar flexible array member see this document.
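For comparison, a sketch of the struct Hdr above using a C99 flexible array member instead of the zero-length extension. The contents of struct Foo are not shown in the answer, so the one here is a hypothetical placeholder, as is the helper hdr_create.

```c
#include <stdlib.h>

/* Hypothetical element type; the answer does not define struct Foo. */
struct Foo { int x; double y; };

struct Hdr
{
    int a, b, c;
    struct Foo foos[];   /* C99 flexible array member instead of foos[0] */
};

/* Hypothetical helper: header plus room for n Foo elements. */
static struct Hdr *hdr_create(size_t n)
{
    struct Hdr *h = malloc(sizeof *h + n * sizeof h->foos[0]);
    if (h != NULL)
        h->a = h->b = h->c = 0;
    return h;
}
```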
Your array of characters has a length of 0, so c occupies 0 bytes. Therefore, when your compiler lays out the structure it only counts the two integers, and since int is 4 bytes on your platform (as your result suggests), the size of the structure is 8 bytes.
Remark: you can still access the field c without any compiler warnings (compiled with gcc), but unless you allocated extra space for it, you will only read garbage.
An array of length 0 is actually not permitted in standard C, but apparently your compiler supports it as an extension.
It's one way of implementing the so-called "struct hack", explained in question 2.6 of the comp.lang.c FAQ.
Because C implementations typically don't do bounds checking for arrays, a zero-element array (or in a more portable variant, a one-element array) gives you a base address for an array of arbitrary size. You have to allocate, typically using malloc, enough memory for the enclosing struct so that there's room for as many array elements as you need:
struct xyz *ptr = malloc(sizeof *ptr + COUNT * sizeof (char));
C99 added a new feature, "flexible array members", that does the same thing without specifying a fake array size:
struct xyz {
int a;
int b;
char c[];
};
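A small sketch tying this back to the question: sizeof still reports only the two ints, the characters are allocated on top of it, and size_t is printed with %zu rather than %d. The helpers xyz_new and xyz_print_sizes are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct xyz {
    int a;
    int b;
    char c[];   /* flexible array member: not included in sizeof(struct xyz) */
};

/* Hypothetical helper: allocate an xyz whose c holds a copy of text. */
static struct xyz *xyz_new(const char *text)
{
    size_t count = strlen(text) + 1;
    struct xyz *ptr = malloc(sizeof *ptr + count * sizeof (char));
    if (ptr != NULL) {
        ptr->a = ptr->b = 0;
        memcpy(ptr->c, text, count);
    }
    return ptr;
}

static void xyz_print_sizes(void)
{
    /* %zu is the conversion specifier for size_t */
    printf("sizeof(struct xyz) = %zu, offsetof(struct xyz, c) = %zu\n",
           sizeof(struct xyz), offsetof(struct xyz, c));
}
```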
In this case:
int a; // 4
int b; // 4
char c[0]; // 0
So it is 8.
And
sizeof will never return 9 for your struct, even if you give char c[] a size:
int a; // 4 bytes
int b; // 4 bytes
char c[1]; // 1 byte, but the struct is padded for alignment
laid out like this:
^^^^
^^^^
^~~~ // 3 bytes of padding for alignment
So sizeof returns 12, not 9.
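The padding arithmetic can be checked with offsetof. This sketch assumes a hypothetical struct xyz1 with char c[1] and a hypothetical helper xyz1_trailing_padding. On a typical platform with 4-byte, 4-aligned int it reports 3 bytes of padding (12 - 9), but the exact numbers are implementation-defined.

```c
#include <stddef.h>

struct xyz1 {
    int a;
    int b;
    char c[1];   /* one byte of data ...                                  */
};               /* ... then padding up to a multiple of the struct's alignment */

/* Bytes of trailing padding after c (3 on typical 4-byte-int platforms). */
static size_t xyz1_trailing_padding(void)
{
    return sizeof(struct xyz1) - (offsetof(struct xyz1, c) + sizeof(char));
}
```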