Are flexible array members really necessary? - c

A struct with a flexible array member, apparently, is not intended to be declared, but rather used in conjunction with a pointer to that struct. When declaring a flexible array member, there must be at least one other member, and the flexible array member must be the last member in that struct.
Let's say I have one that looks like this:
struct example{
int n;
int flm[];
}
Then to use it, I'll have to declare a pointer and use malloc to reserve memory for the structure's contents.
struct example *ptr = malloc(sizeof(struct example) + 5*sizeof(int));
That is, if I want my flm[] array to hold five integers. Then, I can just use my struct
like this:
ptr->flm[0] = 1;
My question is, shouldn't I be able to just use a pointer instead of this? Not only would it be compatible pre-C99, but I could use it with or without a pointer to that struct.
Considering I already have to use malloc with the flm, shouldn't I just be able to do this?
Consider this new definition of the example struct;
struct example{
int n;
int *notflm;
}
struct example test = {4, malloc(sizeof(int) * 5)};
I'd even be able to use the replacement the same way as the flexible array member:
Would this also work? (Provided the above definition of example with notflm)
struct example test;
test.n = 4;
notflm = malloc(sizeof(int) * 5);

Pointers are not arrays. The basic reasons for choosing which to use are the same as they always are with arrays versus pointers. In the special case of flexible array members, here are some reasons you may prefer them over a pointer:
Reducing storage requirements. A pointer will enlarge your structure by (typically) 4 or 8 bytes, and you'll spend much more in overhead if you allocate the pointed-to storage separately rather than with a single call to malloc.
Improving access efficiency. A flexible array member is located at a constant offset from the structure base. A pointer requires a separate dereference. This affects both number of instructions required to access it, and register pressure.
Atomicity of allocation success/failure. If you allocate the structure and allocate storage for it to point to as two separate steps, your code for cleaning up in the failure cases will be much uglier, since you have the case where one succeeded and the other failed. This can be avoided with some pointer arithmetic to carve both out of the same malloc request, but it's easy to get the logic wrong and invoke UB due to alignment issues.
Avoiding need for deep-copy. If you use a flexible array instead of a pointer, you can simply memcpy (not assign, since assignment can't know the flexible array length) to copy the structure rather than having to copy the pointed-to data too and fix up the pointer in the new copy.
Avoiding need for deep-free. It's very convenient and clean to be able to just free a single object rather than having to free pointed-to data too. This can also be achieved with the "carving up a single malloc" approach mentioned above, of course, but flexible arrays make it easier and less error-prone.
Surely many more reasons...

Those concepts are definitely not necessary as you have pointed out yourself.
The differences between the two that you have demonstrated are where your data is located in memory.
In the first example with flexible array your metadata and the array itself are in the same block of memory and can be moved as one block (pointer) if you have to.
In the second example your metadata is on the stack and your array is elsewhere on the heap. In order to move/copy it you will now need to move two blocks of memory and update the pointer in your metadata structure.
Generally flexible size arrays are used when you need to place an array and it's metadata spatially together in memory.
An example where this is definitely useful is for instance when placing an array with it's metadata in a file - you have only one continuous block of memory and each time you load it it will (most likely) be placed in a different location of your VM.

Related

How to declare a global array of structs and use it in different functions?

Switching from a OO language (C#), I would like to know what is the best way to declare a struct array that has application lifetime in C. After 1h struggle (and research for ex. about why not to use typedef, why to repeat struct later, etc.), this code is working:
// declaration
struct server {
char* name;
char* ip_address;
int port;
} server;
struct server *servers; // declaring struct server[] servers; does not work
Then using like this in a function, working as well (after multiple experiments with & and *...):
// nb_servers is known from previous calculation
servers = malloc(nb_servers * sizeof(struct server));
// later in the same function
free(servers);
Questions
Why does declaring the struct array with [] not work? Question actually is, is it also possible to declare an array with '[]' (unknown size) and then dynamically initialize it later with malloc and if yes, what is the syntax to do it? Pure syntax question independent of differences in how memory is managed.
If I free(servers) I can no longer use the values after that. But if I don't, if this function gets called multiple times, will the value of this variable simply be overwritten by the result of the new call? Will not freeing servers cause a memory leak?
Hope it is clear, I'm 100% new to C.
is it also possible to declare an array with '[]' (unknown size) and then dynamically initialize it later with malloc?
No. An array must have a knowable size when it is declared. This can be done by explicitly giving it a size in the brackets, or it can be inferred by its initializer. Once its size is set, it will remain that size for its lifetime. malloc, in the strictest sense, doesn't create an "array", but rather returns a pointer to a segment of memory that can be indexed similarly to an array syntactically, but is not truly an array in the same sense as an object declared with [].
Will not freeing servers cause a memory leak?
Yes, you will have a memory leak if you do not free servers before mallocing more memory for it without freeing. If you simply need to resize the array without ditching the data, you can use realloc.

Array of structs vs. Array of pointers to structs

As I continue learning the C language I got a doubt. Which are the differences between using an array in which each element is an struct and using an array in which each element is a pointer to the same type of struct. It seems to me that you can use both equally (Although in the pointers one you have to deal with memory allocation). Can somebody explain me in which case it is better to use one or the other?
Thank you.
Arrays of structures and arrays of pointers to structures are different ways to organize memory.
Arrays of structures have these strong points:
it is easy to allocate such an array dynamically in one step with struct s *p = calloc(n, sizeof(*p));.
if the array is part of an enclosing structure, no separate allocation code is needed at all. The same is true for local and global arrays.
the array is a contiguous block of memory, a pointer to the next and previous elements can be easily computed as struct s *prev = p - 1, *next = p + 1;
accessing array element members may be faster as they are close in memory, increasing cache efficiency.
They also have disadvantages:
the size of the array must be passed explicitly as there is no way to tell from the pointer to the array how many elements it has.
the expression p[i].member generates a multiplication, which may be costly on some architectures if the size of the structure is not a power of 2.
changing the order of elements is costly as it may involve copying large amounts of memory.
Using an array of pointers has these advantages:
the size of the array could be determined by allocating an extra element and setting it to NULL. This convention is used for the argv[] array of command line arguments provided to the main() function.
if the above convention is not used, and the number of elements is passed separately, NULL pointer values could be used to specify missing elements.
it is easy to change the order of elements by just moving the pointers.
multiple elements could be made to point to the same structure.
reallocating the array is easier as only the array of pointers needs reallocation, optionally keeping separate length and size counts to minimize reallocations. Incremental allocation is easy too.
the expression p[i].member generates a simple shift and an extra memory access, but may be more efficient than the equivalent expression for arrays of structures.
and the following drawbacks:
allocating and freeing this indirect array is more cumbersome. An extra loop is required to allocate and/or initialize the structures pointed to by the array.
access to structure elements involve an extra memory indirection. Compilers can generate efficient code for this if multiple members are accessed in the same function, but not always.
pointers to adjacent structures cannot be derived from a pointer to a given element.
EDIT: As hinted by David Bowling, one can combine some of the advantages of both approaches by allocating an array of structures on one hand and a separate array of pointers pointing to the elements of the first array. This is a handy way to implement a sort order, or even multiple concomitant sort orders with separate arrays of pointers, like database indexes.

C efficient way to manage function temporary dynamic memory

I'm making an Nondeterministic Finite Automata (NFA). An NFA has a set of states, I need four array with the same size(number of states in the NFA) to record temporary information about states during the simulation of the NFA.
However, different NFAs has different number of states, thus the size of arrays varies with different NFAs.
Using C I have come up with three ways to deal with the memory allocation:
Use a very big fix-sized array.
malloc memroy dynamically every time the function is called, and before the function finishes, free the allocated memory
Use malloc but not allocate memory on every function call, use four static pointer variables, and one static int arraySize to record allocated array size, for the first time when the function is called, allocate arrays of size NFA->numStates and assign to each of the static pointers, assign NFA->numStates to static int arraySize; for the second time, if NFA->numStates is less than or equal to arraySize, no memory allocation; if NFA->numStates is greater than arraySize, free previous memory and realloc arrays of size NFA->numStates.
Method 1 uses fix-sized array, which means when the input NFA->numStates is greater than the hard-coded array size, the function will not work.
Method 2 is adaptable, but need to allocate memory every time the function is called, not so efficient ?
Method 3 is adaptable too, but is complicated, and is not thread safe ?
Suggestions or other alternatives ?
How about option 2 but with alloca(), i.e. allocate on the stack? It's much (much) faster than malloc(), and automatically de-allocates when you leave the scope which sounds like what you want.
Of course, it's not standard or portable, so it might not be applicable for you.
Failing that, a big fixed-size array seems easy enough and doesn't use more memory.
Variable-Length Arrays (VLAs) have been available since C99. I'd be impressed if you are still working with an implementation that does not support such a thing. VLAs work like regular arrays, except that the size you use to declare them is determined at runtime. That seems to be what you're looking for.
As you are dealing with arrays of arbitrary size I suggest you to implement simple linked list structure that at initialization will hold an array of predefined size and can be extended to hold additional chunks of memory. This will be an abstraction over contiguous memory space. You just need to store the current size of this structure in parent container.
#include <stddef.h>
struct contmem
{
size_t size;
struct memchunk
{
void *data;
struct memchunk *next, *prev;
} head;
}
So, you can extend this structure when you have to and store information by iterating over elements of linked list. This is similar to what happens inside lists from standard C++ library and differs from straight memory reallocation.

structures containing structures vs. structures containing pointers

The following question is in regards to C programming. I am using Microchip C30 compiler (because I know someone will ask)
What is the difference between having a structure which contains several other structures vs a structure which contains several pointers to other structures? Does one make for faster code execution than the other? Does one technique use more or less memory? Does the memory get allocated at the same time in both cases?
If I use the following code does memory automatically get allocated for the subStruct?
// Header file...
typedef struct{
int a;
subStruct * b;
} mainStruct;
typedef struct{
int c;
int d;
}subStruct;
extern mainStruct myMainStruct;
// source file...
mainStruct myMainStruct;
int main(void)
{
//...
{
If you use a pointer, you have to allocate the memory yourself. If you use a substructure, you can allocate the entire thing in one go, either using malloc or on the stack.
What you need depends on your use case:
Pointers will give you smaller struct's
Substructures provide better locality of reference
A pointer may point to either a single struct or the first member in an array of them, while substructures are self-documenting: there's always one of them unless you use an array explicitly
Pointers take up some space, for the pointer itself + overhead from extra memory allocations
And no, it doesn't matter which compiler you use :)
Memory for pointers doesn't get automatically allocated, but when you contain whole structure in your struct, it does.
Also - with pointers you are likely to have fragmented memory - each pointed part of tructure could be in other part of memory.
But with poniters you can share the same substructures across many structs (but this makes changing and deleting them later harder).
Memory for a pointer wouldn't be automatically allocated. You would need to run:
myMainStruct.b=malloc(sizeof(*myMainStruct.b));
In terms of performance, there is likely a small hit to going from one structure to another via the pointer.
As far as speed goes, it varies. Generally, including structs, rather than pointers, will be faster, because the CPU doesn't have to dereference the pointer for every member access. However, if some of the members aren't used very often, and the sub-struct's size is massive, the structure might not fit in the cache and this can slow down your code quite a bit.
Using pointers will use /slightly/ more memory (but only the size of the pointers themselves) than the direct approach.
Usually with pointers to sub-structs you'll allocate the sub-structs separately, but you can write some kind of initialization function which abstracts all the allocation out to "the same time." In your code, memory is allocated for myMainStruct on the stack, but the b member will be garbage. You need to call malloc to allocate heap memory for b, or create a subStruct object on the stack and point myMainStruct.b to it.
What is the difference between having a structure which contains several other structures vs a structure which contains several pointers to other structures?
In the first case what you have is essentially one big structure in contiguous memory. In the "pointers to structures" case your master structure just contains the addresses to the sub-structures which are allocated separately.
Does one make for faster code execution than the other?
The difference should be negligible, but pointers method is will be slightly slower. This is because you must dereference the pointer with each access to the substructure.
Does one technique use more or less memory?
The pointer method uses number_of_pointers * sizeof(void*) more memory. sizeof(void*) will be 4 for 32-bit and 8 for 64-bit.
Does the memory get allocated at the same time in both cases?
No, you need to go through each pointer in your master struct and allocate memory for the sub-structs via malloc().
Conclusion
The pointers add a layer of indirection to the code, which is useful for switching out the sub-structs or having more than one pointer point to the same sub-struct. Having different master-structs pointing to common sub-structs in particular could save quite a bit of memory and allocation time.

Why does C need arrays if it has pointers?

If we can use pointers and malloc to create and use arrays, why does the array type exist in C? Isn't it unnecessary if we can use pointers instead?
Arrays are faster than dynamic memory allocation.
Arrays are "allocated" at "compile time" whereas malloc allocates at run time. Allocating takes time.
Also, C does not mandate that malloc() and friends are available in free-standing implementations.
Edit
Example of array
#define DECK_SIZE 52
int main(void) {
int deck[DECK_SIZE];
play(deck, DECK_SIZE);
return 0;
}
Example of malloc()
int main(void) {
size_t len = 52;
int *deck = malloc(len * sizeof *deck);
if (deck) {
play(deck, len);
}
free(deck);
return 0;
}
In the array version, the space for the deck array was reserved by the compiler when the program was created (but, of course, the memory is only reserved/occupied when the program is being run), in the malloc() version, space for the deck array has to be requested at every run of the program.
Arrays can never change size, malloc'd memory can grow when needed.
If you only need a fixed number of elements, use an array (within the limits of your implementation).
If you need memory that can grow or shrink during the running of the program, use malloc() and friends.
It's not a bad question. In fact, early C had no array types.
Global and static arrays are allocated at compile time (very fast). Other arrays are allocated on the stack at runtime (fast). Allocating memory with malloc (to be used for an array or otherwise) is much slower. A similar thing is seen in deallocation: dynamically allocated memory is slower to deallocate.
Speed is not the only issue. Array types are automatically deallocated when they go out of scope, so they cannot be "leaked" by mistake. You don't need to worry about accidentally freeing something twice, and so on. They also make it easier for static analysis tools to detect bugs.
You may argue that there is the function _alloca() which lets you allocate memory from the stack. Yes, there is no technical reason why arrays are needed over _alloca(). However, I think arrays are more convenient to use. Also, it is easier for the compiler to optimise the use of an array than a pointer with an _alloca() return value in it, since it's obvious what a stack-allocated array's offset from the stack pointer is, whereas if _alloca() is treated like a black-box function call, the compiler can't tell this value in advance.
EDIT, since tsubasa has asked for more details on how this allocation occurs:
On x86 architectures, the ebp register normally refers to the current function's stack frame, and is used to reference stack-allocated variables. For instance, you may have an int located at [ebp - 8] and a char array stretching from [ebp - 24] to [ebp - 9]. And perhaps more variables and arrays on the stack. (The compiler decides how to use the stack frame at compile time. C99 compilers allow variable-size arrays to be stack allocated, this is just a matter of doing a tiny bit of work at runtime.)
In x86 code, pointer offsets (such as [ebp - 16]) can be represented in a single instruction. Pretty efficient.
Now, an important point is that all stack-allocated variables and arrays in the current context are retrieved via offsets from a single register. If you call malloc there is (as I have said) some processing overhead in actually finding some memory for you. But also, malloc gives you a new memory address. Let's say it is stored in the ebx register. You can't use an offset from ebp anymore, because you can't tell what that offset will be at compile time. So you are basically "wasting" an extra register that you would not need if you used a normal array instead. If you malloc more arrays, you have more "unpredictable" pointer values that magnify this problem.
Arrays have their uses, and should be used when you can, as static allocation will help make programs more stable, and are a necessity at times due to the need to ensure memory leaks don't happen.
They exist because some requirements require them.
In a language such as BASIC, you have certain commands that are allowed, and this is known, due to the language construct. So, what is the benefit of using malloc to create the arrays, and then fill them in from strings?
If I have to define the names of the operations anyway, why not put them into an array?
C was written as a general purpose language, which means that it should be useful in any situation, so they had to ensure that it had the constructs to be useful for writing operating systems as well as embedded systems.
An array is a shorthand way to specify pointing to the beginning of a malloc for example.
But, imagine trying to do matrix math by using pointer manipulations rather than vec[x] * vec[y]. It would be very prone to difficult to find errors.
See this question discussing space hardening and C. Sometimes dynamic memory allocation is just a bad idea, I have worked with C libraries that are completely devoid of malloc() and friends.
You don't want a satellite dereferencing a NULL pointer any more than you want air traffic control software forgetting to zero out heap blocks.
Its also important (as others have pointed out) to understand what is part of C and what extends it into various uniform standards (i.e. POSIX).
Arrays are a nice syntax improvement compared to dealing with pointers. You can make all sorts of mistakes unknowingly when dealing with pointers. What if you move too many spaces across the memory because you're using the wrong byte size?
Explanation by Dennis Ritchie about C history:
Embryonic C
NB existed so briefly that no full description of it was written. It supplied the types int and char, arrays of them, and pointers to them, declared in a style typified by
int i, j;
char c, d;
int iarray[10];
int ipointer[];
char carray[10];
char cpointer[];
The semantics of arrays remained exactly as in B and BCPL: the declarations of iarray and carray create cells dynamically initialized with a value pointing to the first of a sequence of 10 integers and characters respectively. The declarations for ipointer and cpointer omit the size, to assert that no storage should be allocated automatically. Within procedures, the language's interpretation of the pointers was identical to that of the array variables: a pointer declaration created a cell differing from an array declaration only in that the programmer was expected to assign a referent, instead of letting the compiler allocate the space and initialize the cell.
Values stored in the cells bound to array and pointer names were the machine addresses, measured in bytes, of the corresponding storage area. Therefore, indirection through a pointer implied no run-time overhead to scale the pointer from word to byte offset. On the other hand, the machine code for array subscripting and pointer arithmetic now depended on the type of the array or the pointer: to compute iarray[i] or ipointer+i implied scaling the addend i by the size of the object referred to.
These semantics represented an easy transition from B, and I experimented with them for some months. Problems became evident when I tried to extend the type notation, especially to add structured (record) types. Structures, it seemed, should map in an intuitive way onto memory in the machine, but in a structure containing an array, there was no good place to stash the pointer containing the base of the array, nor any convenient way to arrange that it be initialized. For example, the directory entries of early Unix systems might be described in C as
struct {
int inumber;
char name[14];
};
I wanted the structure not merely to characterize an abstract object but also to describe a collection of bits that might be read from a directory. Where could the compiler hide the pointer to name that the semantics demanded? Even if structures were thought of more abstractly, and the space for pointers could be hidden somehow, how could I handle the technical problem of properly initializing these pointers when allocating a complicated object, perhaps one that specified structures containing arrays containing structures to arbitrary depth?
The solution constituted the crucial jump in the evolutionary chain between typeless BCPL and typed C. It eliminated the materialization of the pointer in storage, and instead caused the creation of the pointer when the array name is mentioned in an expression. The rule, which survives in today's C, is that values of array type are converted, when they appear in expressions, into pointers to the first of the objects making up the array.
To summarize in my own words - if name above were just a pointer, any of that struct would contain an additional pointer, destroying the perfect mapping of it to an external object (like an directory entry).

Resources