What is the benefit of declaring a C structure member as an array of size 1 instead of a pointer:
struct {
a_struct_t a_member[1];
...
}b_struct;
Thanks in advance
In a typical case, a structure with a member that's declared as an array of one item will have that member as the last item in the struct. The intent is that the struct will be allocated dynamically. When it is allocated, the code will allocate space for as many items as you really want/need in that array:
struct X {
time_t birthday;
char name[1];
};
struct X *x = malloc(sizeof(*x) + 35);
x->birthday = mktime(&t);
strcpy(x->name, "no more than 35 characters");
This works particularly well for strings -- the character you've allocated in the struct gives you space for the NUL terminator, so when you do the allocation, the number of characters you allocate is exactly the strlen() of the string you're going to put there. For most other kinds of items, you normally want to subtract one from the allocation size (or just live with the allocated space being one item larger than is strictly necessary).
You can do (sort of) the same thing with a pointer, but it results in allocating the body of the struct separately from the item you refer to via the pointer. The good point is that (unlike the method above) more than one item can be allocated dynamically, where the method above only works for the last member of the struct.
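To make the contrast concrete, here is a minimal sketch of the two layouts. The type and function names (`person_inline`, `person_ptr`, and the two constructors) are made up for illustration. Note that the single-block version uses a C99 flexible array member rather than the size-1 array shown above, so the allocation has to add one byte for the NUL terminator explicitly.

```c
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Single-block layout: the name is stored inline, after the fixed members. */
struct person_inline {
    time_t birthday;
    char name[];            /* C99 flexible array member (assumption: C99 or later) */
};

/* Two-block layout: the struct holds only a pointer to the name. */
struct person_ptr {
    time_t birthday;
    char *name;
};

/* One malloc covers the struct and the string; one free releases both. */
struct person_inline *person_inline_new(time_t birthday, const char *name)
{
    size_t len = strlen(name) + 1;      /* +1 for the NUL terminator */
    struct person_inline *p = malloc(sizeof *p + len);
    if (p == NULL)
        return NULL;
    p->birthday = birthday;
    memcpy(p->name, name, len);
    return p;
}

/* Two mallocs: the caller must free p->name and then p. */
struct person_ptr *person_ptr_new(time_t birthday, const char *name)
{
    size_t len = strlen(name) + 1;
    struct person_ptr *p = malloc(sizeof *p);
    if (p == NULL)
        return NULL;
    p->name = malloc(len);
    if (p->name == NULL) {
        free(p);
        return NULL;
    }
    p->birthday = birthday;
    memcpy(p->name, name, len);
    return p;
}
```

With the inline version a single `free(p)` is enough. With the pointer version you need `free(p->name)` followed by `free(p)`.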
What you describe are two different things entirely. If you have a pointer as a member:
a_struct_t* a_member;
then it is simply a pointer. There is no memory allocated inside of the struct to hold an a_struct_t. If, on the other hand, you have an array of size 1:
a_struct_t a_member[1];
then your struct actually has an object of type a_struct_t inside of it. From a memory standpoint, it isn't much different from just putting an object of that type inside the struct:
a_struct_t a_member;
From a usage standpoint, an array requires indirection to access the one element (i.e., you need to use *a_member instead of a_member).
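A small sketch of the three declarations side by side may help. The definition of `a_struct_t` and the `with_*` and `read_x_*` names are hypothetical; the question does not show them.

```c
#include <stddef.h>

typedef struct { int x; double y; } a_struct_t;   /* hypothetical payload type */

struct with_pointer { a_struct_t *a_member; };    /* holds only an address */
struct with_array   { a_struct_t a_member[1]; };  /* holds one a_struct_t inline */
struct with_plain   { a_struct_t a_member; };     /* holds one a_struct_t inline */

/* Array member: the name decays to a pointer, so -> (or [0].) reaches the element. */
int read_x_array(const struct with_array *s)
{
    return s->a_member->x;          /* same as s->a_member[0].x */
}

/* Plain member: direct access with the dot operator. */
int read_x_plain(const struct with_plain *s)
{
    return s->a_member.x;
}

/* Pointer member: only valid if a_member points at a live object. */
int read_x_pointer(const struct with_pointer *s)
{
    return s->a_member->x;
}
```

On common implementations `sizeof(struct with_array)` equals `sizeof(struct with_plain)`, while `sizeof(struct with_pointer)` is just the size of a pointer.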
"Array of size 1 instead of a pointer"? Sorry, but I don't see how this question can possibly make sense. I would understand if you asked about "array of size 1 instead of an ordinary member (non-array)". But "instead of a pointer"? What does a pointer have to do with this? How is it interchangeable with an array, to justify the question?
If what you really wanted to ask is why it is declared as an array of size 1 instead of non-array as in
struct {
a_struct_t a_member;
} b_struct;
then one possible explanation is the well-known idiom called "struct hack". You might see a declaration like
struct {
...
a_struct_t a_member[1];
} b_struct;
used to implement an array of flexible size as the last member of the struct object. The actual struct object is later created within a memory block that is large enough to accommodate as many array elements as necessary. But in this case the array has to be the last member of the struct, not the first one as in your example.
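P.S. From time to time you might see the "struct hack" implemented through an array of size 0, which is actually a constraint violation in standard C (a conforming compiler must issue a diagnostic, although some compilers, such as GCC, accept it as an extension).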
So I think it's been stated that the main difference between pointers and arrays is that you have to allocate memory for pointers.
The tricky part about your question is that even as you allocate space for your struct, if your struct contains a pointer you have to allocate a SECOND time for the data the pointer refers to, while the pointer itself is allocated as part of the struct's allocation.
If your struct contained an array of 1 you would not have to allocate any additional memory, it would be stored in the struct (which you still have to allocate).
These are different things.
Such member's name is an address of allocated memory, allocated inside the struct instance itself.
I'm quite confused about the difference between flexible arrays and pointers as struct members. Someone suggested that structs with pointers need malloc twice. However, consider the following code:
struct Vector {
size_t size;
double *data;
};
int len = 20;
struct Vector* newVector = malloc(sizeof *newVector + len * sizeof*newVector->data);
printf("%p\n",newVector->data);//print 0x0
newVector->data =(double*)((char*)newVector + sizeof*newVector);
// do sth
free(newVector);
One difference I find is that the address held in the data member of Vector is not defined. The programmer needs to do a conversion to "find" the exact address. However, if Vector is defined as:
struct Vector {
size_t size;
double data[];
};
Then the address of data is defined.
I am wondering whether it is safe and able to malloc struct with pointers like this, and what is the exactly reason programmers malloc twice when using struct with pointers.
The difference is how the struct is stored. In the first example you over-allocate memory but that doesn't magically mean that the data pointer gets set to point at that memory. Its value after malloc is in fact indeterminate, so you can't reliably print it.
Sure, you can set that pointer to point beyond the part allocated by the struct itself, but that means potentially slower access since you need to go through the pointer each time. Also you allocate the pointer itself as extra space (and potentially extra padding because of it), whereas in a flexible array member sizeof doesn't count the flexible array member. Your first design is overall much more cumbersome than the flexible version, but other than that well-defined.
The reason why people malloc twice when using a struct with pointers could either be that they aren't aware of flexible array members or using C90, or alternatively that the code isn't performance-critical and they just don't care about the overhead caused by fragmented allocation.
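For comparison, a minimal sketch of the flexible array member version with a single allocation. The function name `vector_create` and the overflow guard are additions for illustration, not part of the question.

```c
#include <stdint.h>
#include <stdlib.h>

struct Vector {
    size_t size;
    double data[];          /* flexible array member; not counted by sizeof */
};

/* Allocates the header and n doubles in one block; returns NULL on failure. */
struct Vector *vector_create(size_t n)
{
    /* Guard against overflow in the size computation. */
    if (n > (SIZE_MAX - sizeof(struct Vector)) / sizeof(double))
        return NULL;

    struct Vector *v = malloc(sizeof *v + n * sizeof v->data[0]);
    if (v == NULL)
        return NULL;

    v->size = n;
    for (size_t i = 0; i < n; i++)
        v->data[i] = 0.0;
    return v;               /* release with a single free(v) */
}
```

There is no pointer to set up here: `v->data` is simply the memory that follows the header, already suitably aligned for `double`.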
I am wondering whether it is safe and able to malloc struct with pointers like this, and what is the exactly reason programmers malloc twice when using struct with pointers.
If you use pointer method and malloc only once, there is one extra thing you need to care of in the calculation: alignment.
Let's add one extra field to the structure:
struct Vector {
size_t size;
uint32_t extra;
double *data;
};
Let's assume that we are on a system where each field is 4 bytes, there is no trailing padding in the struct, and its total size is 12 bytes. Let's also assume that double is 8 bytes and requires 8-byte alignment.
Now there is a problem: the expression (char*)newVector + sizeof*newVector no longer gives an address that is divisible by 8. There needs to be 4 bytes of manual padding between the structure and the data. This complicates both the malloc size calculation and the data pointer offset calculation.
So the main reason you see the one-malloc pointer version less often is that it is harder to get right. With a pointer and two mallocs, or with a flexible array member, the compiler and allocator take care of the necessary alignment and padding so you don't have to.
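If you do want the single-malloc pointer version, the padding can be computed rather than hard-coded. The following is only a sketch: `round_up` and `vector_create_single` are hypothetical helpers, and it assumes C11 for `alignof`.

```c
#include <stdalign.h>
#include <stdint.h>
#include <stdlib.h>

struct Vector {
    size_t size;
    uint32_t extra;
    double *data;
};

/* Rounds n up to the next multiple of align (align must be a power of two). */
static size_t round_up(size_t n, size_t align)
{
    return (n + align - 1) & ~(align - 1);
}

/* One malloc for header plus payload; data points just past the padded header. */
struct Vector *vector_create_single(size_t len)
{
    size_t offset = round_up(sizeof(struct Vector), alignof(double));

    /* Guard against overflow in the size computation. */
    if (len > (SIZE_MAX - offset) / sizeof(double))
        return NULL;

    struct Vector *v = malloc(offset + len * sizeof(double));
    if (v == NULL)
        return NULL;

    v->size = len;
    v->extra = 0;
    v->data = (double *)((char *)v + offset);
    return v;               /* a single free(v) releases everything */
}
```

With the sizes assumed above (a 12-byte struct and 8-byte alignment for double), `round_up(12, 8)` gives 16, which is the 4 bytes of manual padding mentioned in the answer.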
What is the difference between using a flexible array member (FAM) and a pointer member? In both cases, a malloc and an element-by-element assignment must be done. But with a FAM, memory is allocated for the whole structure, whereas with a pointer member, memory is allocated only for the array the pointer refers to (see code). What are the pros and cons of these two methods?
#include <stdio.h>
#include <stdlib.h>
typedef struct farr_mb {
int lg;
int arr[];
} Farr_mb;
typedef struct ptr_mb {
int lg;
int * ptr;
} Ptr_mb;
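int main(void) {
int lg=5;
Farr_mb *a=malloc(sizeof(Farr_mb)+lg*sizeof(int));
Ptr_mb b; b.lg=lg; b.ptr=malloc(lg*sizeof(int));
if (a==NULL || b.ptr==NULL) return 1; /* allocation failed */
a->lg=lg;
for (int i=0;i<lg;i++) (a->arr)[i]=i;
for (int i=0;i<lg;i++) (b.ptr)[i]=i;
for (int i=0;i<lg;i++) printf("%d \t",(a->arr)[i]);
printf("\n");
for (int i=0;i<lg;i++) printf("%d \t",(b.ptr)[i]);
printf("\n");
free(a); /* one free for the FAM struct */
free(b.ptr); /* the pointer member's array is freed separately */
return 0;
}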
Before we get to the pros and cons, let's look at some real-world examples.
Let's say we wish to implement a hash table, where each entry is a dynamically managed array of elements:
struct hash_entry {
size_t allocated;
size_t used;
element array[];
};
struct hash_table {
size_t size;
struct hash_entry **entry;
};
#define HASH_TABLE_INITIALIZER { 0, NULL }
This in fact uses both. The hash table itself is a structure with two members. The size member indicates the size of the hash table, and the entry member is a pointer to an array of hash table entry pointers. This way, each unused entry is just a NULL pointer. When adding elements to a hash table entry, the entire struct hash_entry can be reallocated (to sizeof (struct hash_entry) + allocated * sizeof (element) bytes) or freed, as long as the corresponding pointer in the entry member in the struct hash_table is updated accordingly.
If we used element *array instead, we would need to use struct hash_entry *entry in the struct hash_table; or allocate the struct hash_entry separately from the array; or allocate both struct hash_entry and array in a single chunk, with the array pointer pointing just after the same struct hash_entry.
The cost of that would be two extra size_ts worth of memory used for each unused hash table slot, as well as an extra pointer dereference when accessing elements. (Or, to get the address of the array, two consecutive pointer dereferences, instead of one pointer dereference plus offset.) If this is a key structure heavily used in an implementation, that cost can be visible in profiling, and negatively affect cache performance. For random accesses, the larger the element array is, the less difference there is, however; the cost is largest when the arrays are small, and fit within the same cacheline (or a few cachelines) as the allocated and used members.
We do not usually want to make the entry member in the struct hash_table a flexible array member, because that would mean you no longer can declare a hash table statically, using struct hash_table my_table = HASH_TABLE_INITIALIZER;; you would need to use a pointer to a table, and an initializer function: struct hash_table *my_table; my_table = hash_table_init(); or similar.
I do have another example of related data structures using both pointer members and flexible array members. It allows one to use variables of type matrix to represent any 2D matrix with double entries, even when a matrix is a view to another (say, a transpose, a block, a row or column vector, or even a diagonal vector); these views are all equal (unlike in e.g. GNU Scientific Library, where matrix views are represented by a separate data type). This matrix representation approach makes writing robust numerical linear algebra code easy, and the ensuing code is much more readable than when using GSL or BLAS+LAPACK. In my opinion, that is.
So, let's look at the pros and cons, from the point of view of how to choose which approach to use. (For that reason, I will not designate any feature as "pro" or "con", as the determination depends on the context, on each particular use case.)
Structures with flexible array members cannot be initialized statically. You can only refer to them via pointers.
You can declare and initialize structures with pointer members. As shown in above example, using a preprocessor initializer macro can mean you do not need an initializer function. For example, a function accepting a struct hash_table *table parameter can always resize the array of pointers using realloc(table->entry, newsize * sizeof table->entry[0]), even when table->entry is NULL. This reduces the number of functions needed, and simplifies their implementation.
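As a sketch of that point: the function below resizes the slot array of a statically initialized table without any separate init function. `hash_table_grow` and the `element` typedef are hypothetical, and rehashing of existing entries is deliberately left out.

```c
#include <stdint.h>
#include <stdlib.h>

typedef int element;        /* hypothetical element type */

struct hash_entry {
    size_t allocated;
    size_t used;
    element array[];
};

struct hash_table {
    size_t size;
    struct hash_entry **entry;
};
#define HASH_TABLE_INITIALIZER { 0, NULL }

/* Grows the slot array to newsize; new slots start out as NULL (unused).
   Works on a freshly initialized table because realloc(NULL, n) behaves
   like malloc(n). Returns 0 on success, -1 on failure (table unchanged). */
int hash_table_grow(struct hash_table *table, size_t newsize)
{
    if (newsize <= table->size)
        return 0;
    if (newsize > SIZE_MAX / sizeof table->entry[0])
        return -1;

    struct hash_entry **grown =
        realloc(table->entry, newsize * sizeof table->entry[0]);
    if (grown == NULL)
        return -1;

    for (size_t i = table->size; i < newsize; i++)
        grown[i] = NULL;

    table->entry = grown;
    table->size = newsize;
    return 0;
}
```

Usage would look like `struct hash_table t = HASH_TABLE_INITIALIZER; hash_table_grow(&t, 64);`. The caller's `t` stays where it is and only its contents change.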
Accessing an array via a pointer member can require an extra pointer dereference.
If we compare the accesses to arrays in statically initialized structures with pointer to the array, to a structure with a flexible array member referred via a static pointer, the same number of dereferences are made.
If we have a function that gets the address of a structure as a parameter, then accessing an array element via a pointer member requires two pointer dereferences, whereas accessing a flexible array element requires only one pointer dereference and one offset. If the array elements are small enough and the array index small enough, so that the accessed array element is in the same cacheline, the flexible array member access is often significantly faster. For larger arrays, the difference in performance tends to be insignificant. This does vary between hardware architectures, however.
Reallocating an array via a pointer member hides the complexity from those using the structure as an opaque variable.
This means that if we have a function that receives a pointer to a structure as a parameter, and that structure has a pointer to a dynamically allocated array, the function can reallocate that array without the caller seeing any change in the structure address itself (only structure contents change).
However, if we have a function that receives a pointer to a structure with a flexible array member, reallocating the array means reallocating the entire structure. That potentially modifies the address of the structure. Because the pointer is passed by value, the modification is not visible to the caller. Thus, a function that may resize a flexible array member, must receive a pointer to a pointer to the structure with a flexible array member.
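A minimal sketch of such a function, assuming a hypothetical `struct int_list` and a doubling growth policy (both are illustration only):

```c
#include <stdint.h>
#include <stdlib.h>

struct int_list {
    size_t allocated;       /* capacity of items[] */
    size_t used;            /* number of items in use */
    int items[];
};

/* Appends value, growing the block when needed. Because realloc may move the
   whole structure, the caller's pointer is updated through listp.
   *listp may be NULL for an empty list. Returns 0 on success, -1 on failure. */
int int_list_append(struct int_list **listp, int value)
{
    struct int_list *list = *listp;
    size_t used = list ? list->used : 0;
    size_t allocated = list ? list->allocated : 0;

    if (used >= allocated) {
        size_t newcap = allocated ? 2 * allocated : 4;
        if (newcap > (SIZE_MAX - sizeof *list) / sizeof list->items[0])
            return -1;

        struct int_list *moved =
            realloc(list, sizeof *list + newcap * sizeof list->items[0]);
        if (moved == NULL)
            return -1;      /* the original block is still valid */

        moved->allocated = newcap;
        moved->used = used;
        *listp = list = moved;
    }

    list->items[list->used++] = value;
    return 0;
}
```

The caller writes `struct int_list *l = NULL; int_list_append(&l, 42);` and must keep using `l` afterwards, never a stale copy of the old pointer.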
If the function only examines the contents of a structure with a flexible array member, say counts the number of elements that fulfill some criteria, then a pointer to the structure suffices; and both the pointer and the pointed-to data can be marked const. This might help the compiler produce better code. Furthermore, all the data accessed is linear in memory, which helps more complex processors manage caching more efficiently. (To do the same with an array having a pointer member, one would need to pass the pointer to the array, as well as the size field at least, as parameters to the counting function, instead of a pointer to the structure containing those values.)
An unused/empty structure with a flexible array member can be represented by a NULL pointer (to such structure). This can be important when you have an array of arrays.
With structures with flexible array members, the outer array is just an array of pointers. With structures with pointer members, the outer array can be either an array of structures, or an array of pointers to structures.
Both can support different types of sub-arrays, if the structures have a common type tag as the first member, and you use a union of those structures. (What 'use' means in this context is unfortunately debatable. Some claim you need to access the array via the union; I claim the visibility of such a union is sufficient, because anything else would break a huge amount of existing POSIX C code; basically all server-side C code using sockets.)
Those are the major ones I can think of right now. Both forms are ubiquitous in my own code, and I have had no issues with either. (In particular, I prefer using a structure free helper function that poisons the structure to help detect use-after-free bugs in early testing; and my programs do not often have any memory-related issues.)
I will edit the above list, if I find I've missed important facets. Therefore, if you have a suggestion or think I've overlooked something above, please let me know in a comment, so I can verify and edit as appropriate.
I'm currently having an issue with the following struct:
typedef struct __attribute__((__packed__)) rungInput{
operation inputOperation;
inputType type;
char* name;
char numeroInput;
u8 is_not;
} rungInput;
I create multiple structs like above inside a for loop, and then fill in their fields according to my program logic:
while (a < 5){
rungInput input;
(...)
Then when I'm done filling the struct's fields appropriately, I then attempt to copy the completed struct to an array as such:
rungArray[a] = input; //memcpy here instead?
And then I iterate again through my loop. I'm having a problem where my structs seem to all have their name value be the same, despite clearly having gone through different segments of code and assigning different values to that field for every loop iteration.
For example, if I have three structs with the following names: "SW1" "SW2" "SW3", after I am done adding them to my array I seem to have all three structs point me to the value "SW3" instead. Does this mean I should call malloc() to allocate manually each pointer inside each struct to ensure that I do not have multiple structs that point to the same value or am I doing something else wrong?
When you write rungArray[i] = input;, you are copying the pointer that is in the input structure into the rungArray[i] structure. If you subsequently overwrite the data that the input structure is pointing at, then you also overwrite the data that the rungArray[i] structure is pointing at. Using memcpy() instead of assignment won't change this at all.
There are a variety of ways around this. The simplest is to change the structure so that you allocate a big enough array in the structure to hold the name:
enum { MAX_NAME_SIZE = 32 };
…
char name[MAX_NAME_SIZE];
…
However, if the extreme size of a name is large but the average size is small, then this may waste too much space. In that case, you continue using a char *, but you do indeed have to modify the copying process to duplicate the string with dynamically allocated memory:
rungArray[i] = input;
rungArray[i].name = strdup(input.name);
Remember to free the memory when you discard the rungArray. Yes, this code copies the pointer and then overwrites it, but it is more resilient to change because all the fields are copied, even if you add some extra (non-pointer) fields, and then the pointer fields are handled specially. If you write the assignments to each member in turn, you have to remember to track all the places where you do this (that would be a single assignment function, wouldn't it?) and add the new assignments there. With the code shown, that mostly happens automatically.
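The copy-then-duplicate step can be wrapped in a single assignment function, roughly like this. The struct here is a simplified stand-in for the one in the question, and `dup_string` is a hypothetical portable substitute for POSIX `strdup()`.

```c
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for the rungInput struct in the question. */
typedef struct {
    char *name;
    char numeroInput;
    unsigned char is_not;
} rung_input;

/* Portable equivalent of POSIX strdup(): returns a malloc'd copy or NULL. */
static char *dup_string(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, s, len);
    return copy;
}

/* Copies every field, then gives dst its own copy of the name.
   Returns 0 on success, -1 if the name could not be duplicated. */
int rung_input_copy(rung_input *dst, const rung_input *src)
{
    *dst = *src;
    dst->name = src->name ? dup_string(src->name) : NULL;
    return (src->name != NULL && dst->name == NULL) ? -1 : 0;
}
```

Each array element then owns its name, so overwriting the buffer that `input.name` points at no longer affects earlier elements. Every such name needs a matching `free()`.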
You should malloc memory for your struct and then store the pointers to the structs inside your array. You could also turn your structs into a linked list by adding a pointer to each struct that points to the next instance of your struct.
http://www.cprogramming.com/tutorial/c/lesson15.html
I have a struct in my C code of about 300 bytes (5 x int + 256 chars), and I wish to have a good array mechanism for handling all my 'objects' of this struct.
I want to have a global array of pointers, so that at first all indices in the array point to NULL, but then I initialize each index when I need it (malloc) and delete it when I'm done with it (free).
typedef struct myfiles mf;
mf* myArr[1000];
Is that what I'm looking for? Pointers mixed with arrays often confuse me.
If so, just to clarify, does
mf myArr[1000];
already allocate 1000 structs on the stack, where my first suggestion only allocates 1000 pointers?
You are correct. The former allocates 1000 pointers, the latter allocates 1000 objects of ~300 bytes each. (At file scope the pointers start out as null pointers; as a local variable they would be uninitialized.)
To initialize to null explicitly: foo_t* foo[1000] = {NULL};
But this is still silly. Why not just mf* myArr = NULL? That gives you a single pointer to keep track of instead of 1000 of them. Would you rather do
myArraySingle = malloc(sizeof(mf)*1000); or
for(int i = 0; i < 1000; i++) {
myArray[i] = malloc(sizeof(mf));
}
And access by myArraySingle[300] or *(myArray[300])? Anyway, my point is that, syntax aside, you shouldn't create this unnecessary indirection. A single pointer can point to a contiguous chunk of memory that holds a sequence of objects, much like an array, which is why pointers support array-style syntax and why array indices start at 0.
typedef struct myfiles mf;
mf* myArr[1000];
This is what you are looking for. This will allocate array of 1000 pointers to the structure mf.
You seem to understand correctly.
More accurately, I believe mf* myArr[1000] = { 0 }; would better meet your requirements, because it makes explicit that all of the elements are initialised to null pointers. Without an initialiser, that guarantee holds only for objects with static storage duration; an array with automatic storage duration would hold indeterminate values.
There is no "global" in C. You're referring to objects with static storage duration, declared at file scope.
typedef struct myfiles mf;
mf* myArr[1000];
Yes, it will declare an array of 1000 pointers; you have to allocate memory for each one using malloc/calloc before use.
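A minimal sketch of the allocate-on-demand pattern the question describes. The body of `struct myfiles` and the helper names `slot_get` and `slot_release` are assumptions; the question only gives the approximate size.

```c
#include <stdlib.h>

#define MAX_FILES 1000

/* Hypothetical definition matching the question's description (5 ints + 256 chars). */
struct myfiles {
    int fields[5];
    char text[256];
};
typedef struct myfiles mf;

/* File scope: static storage duration, so every element starts as a null pointer. */
static mf *myArr[MAX_FILES];

/* Allocates slot i on first use and returns it.
   Returns NULL if i is out of range or the allocation fails. */
mf *slot_get(size_t i)
{
    if (i >= MAX_FILES)
        return NULL;
    if (myArr[i] == NULL)
        myArr[i] = calloc(1, sizeof *myArr[i]);
    return myArr[i];
}

/* Frees slot i and marks it unused again. */
void slot_release(size_t i)
{
    if (i < MAX_FILES) {
        free(myArr[i]);
        myArr[i] = NULL;
    }
}
```

Only the slots actually in use cost about 300 bytes each; the array itself costs 1000 pointers.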
I just learned that it's possible to increase the size of the memory you'll allocate to a struct when using the malloc function. For example, you can have a struct like this:
struct test{
char a;
int v[1];
char b;
};
Which clearly has space for only 2 chars and 1 int (pointer to an int in reality, but anyway). But you could call malloc in such a way to make the struct holds 2 chars and as many ints as you wanted (let's say 10):
int main(){
struct test *ptr;
ptr = malloc (sizeof(struct test)+sizeof(int)*9);
ptr->v[9]=50;
printf("%d\n",ptr->v[9]);
return 0;
}
The output here would be "50" printed on the screen, meaning that the array inside the struct was holding up to 10 ints.
My questions for the experienced C programmers out there:
What is happening behind the scenes here? Does the computer allocate 2+4 (2 chars + pointer to int) bytes for the standard "struct test", and then 4*9 more bytes of memory and let the pointer "ptr" put whatever kind of data it wants on those extra bytes?
Does this trick only works when there is an array inside the struct?
If the array is not the last member of the struct, how does the computer manage the memory block allocated?
...Which clearly has space for only 2 chars and 1 int (pointer to an
int in reality, but anyway)...
Already incorrect. Arrays are not pointers. Your struct holds space for 2 chars and 1 int. There's no pointer of any kind there. What you have declared is essentially equivalent to
struct test {
char a;
int v;
char b;
};
There's not much difference between an array of 1 element and an ordinary variable (there's conceptual difference only, i.e. syntactic sugar).
...But you could call malloc in such a way to make it hold 1 char and as
many ints as you wanted (let's say 10)...
Er... If you want it to hold 1 char, why did you declare your struct with 2 chars???
Anyway, in order to implement an array of flexible size as a member of a struct you have to place your array at the very end of the struct.
struct test {
char a;
char b;
int v[1];
};
Then you can allocate memory for your struct with some "extra" memory for the array at the end
struct test *ptr = malloc(offsetof(struct test, v) + sizeof(int) * 10);
(Note how offsetof is used to calculate the proper size).
That way it will work, giving you an array of size 10 and 2 chars in the struct (as declared). It is called "struct hack" and it depends critically on the array being the very last member of the struct.
The C99 version of the C language introduced dedicated support for the "struct hack" in the form of flexible array members. In C99 it can be done as
struct test {
char a;
char b;
int v[];
};
...
struct test *ptr = malloc(sizeof(struct test) + sizeof(int) * 10);
What is happening behind the scenes here? Does the computer allocate
2+4 (2 chars + pointer to int) bytes for the standard "struct test",
and then 4*9 more bytes of memory and let the pointer "ptr" put
whatever kind of data it wants on those extra bytes?
malloc allocates as much memory as you ask it to allocate. It is just a single flat block of raw memory. Nothing else happens "behind the scenes". There's no "pointer to int" of any kind in your struct, so any questions that involve "pointer to int" make no sense at all.
Does this trick only works when there is an array inside the struct?
Well, that's the whole point: to access the extra memory as if it belongs to an array declared as the last member of the struct.
If the array is not the last member of the struct, how does the computer manage the memory block allocated?
It doesn't manage anything. If the array is not the last member of the struct, then trying to work with the extra elements of the array will trash the members of the struct that are declared after the array. This is pretty useless, which is why the "flexible" array has to be the last member.
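You can see where the "extra" elements would land without actually writing out of bounds, using `offsetof`. This is just a sketch; the struct and function names are made up for the comparison.

```c
#include <stddef.h>

struct test_mid {           /* array in the middle, as in the question */
    char a;
    int v[1];
    char b;
};

struct test_last {          /* array moved to the end */
    char a;
    char b;
    int v[1];
};

/* Byte offset at which element i of v would start, for each layout.
   Pure arithmetic: no out-of-bounds access is performed. */
size_t mid_elem_offset(size_t i)
{
    return offsetof(struct test_mid, v) + i * sizeof(int);
}

size_t last_elem_offset(size_t i)
{
    return offsetof(struct test_last, v) + i * sizeof(int);
}
```

For `struct test_mid`, element 1 would start at or before the offset of `b`, that is, still inside the struct, so writing it can overwrite `b`. For `struct test_last`, element 1 starts where the struct ends, which is exactly the extra memory that was requested from malloc.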
No, that does not work. You can't change the immutable size of a struct (which is a compile-time allocation, after all) by using malloc ( ) at run time. But you can allocate a memory block, or change its size, such that it holds more than one struct:
int main(){
struct test *ptr;
ptr = malloc (sizeof(struct test) * 9);
}
That's just about all you can do with malloc ( ) in this context.
In addition to what others have told you (summary: arrays are not pointers, pointers are not arrays, read section 6 of the comp.lang.c FAQ), attempting to access array elements past the last element invokes undefined behavior.
Let's look at an example that doesn't involve dynamic allocation:
struct foo {
int arr1[1];
int arr2[1000];
};
struct foo obj;
The language guarantees that obj.arr1 will be allocated starting at offset 0, and that the offset of obj.arr2 will be sizeof (int) or more (the compiler may insert padding between struct members and after the last member, but not before the first one). So we know that there's enough room in obj for multiple int objects immediately following obj.arr1. That means that if you write obj.arr1[5] = 42, and then later access obj.arr1[5], you'll probably get back the value 42 that you stored there (and you'll probably have clobbered obj.arr2[4]).
The C language doesn't require array bounds checking, but it makes the behavior of accessing an array outside its declared bounds undefined. Anything could happen -- including having the code quietly behave just the way you want it to. In fact, C permits array bounds checking; it just doesn't provide a way to handle errors, and most compilers don't implement it.
For an example like this, you're most likely to run into visible problems in the presence of optimization. A compiler (particularly an optimizing compiler) is permitted to assume that your program's behavior is well-defined, and to rearrange the generated code to take advantage of that assumption. If you write
int index = 5;
obj.arr1[index] = 42;
the compiler is permitted to assume that the index operation doesn't go outside the declared bounds of the array. As Henry Spencer wrote, "If you lie to the compiler, it will get its revenge".
Strictly speaking, the struct hack probably involves undefined behavior (which is why C99 added a well-defined version of it), but it's been so widely used that most or all compilers will support it. This is covered in question 2.6 of the comp.lang.c FAQ.