Array of structs vs. Array of pointers to structs - c

As I continue learning the C language, a question came up. What are the differences between using an array in which each element is a struct and an array in which each element is a pointer to the same type of struct? It seems to me that you can use both equally (although with the pointers you have to deal with memory allocation). Can somebody explain to me when it is better to use one or the other?
Thank you.

Arrays of structures and arrays of pointers to structures are different ways to organize memory.
Arrays of structures have these strong points:
it is easy to allocate such an array dynamically in one step with struct s *p = calloc(n, sizeof(*p));.
if the array is part of an enclosing structure, no separate allocation code is needed at all. The same is true for local and global arrays.
the array is a contiguous block of memory, so pointers to the next and previous elements can easily be computed as struct s *prev = p - 1, *next = p + 1;
accessing array element members may be faster as they are close in memory, increasing cache efficiency.
They also have disadvantages:
the size of the array must be passed explicitly as there is no way to tell from the pointer to the array how many elements it has.
the expression p[i].member generates a multiplication, which may be costly on some architectures if the size of the structure is not a power of 2.
changing the order of elements is costly as it may involve copying large amounts of memory.
Using an array of pointers has these advantages:
the size of the array could be determined by allocating an extra element and setting it to NULL. This convention is used for the argv[] array of command line arguments provided to the main() function.
if the above convention is not used, and the number of elements is passed separately, NULL pointer values could be used to specify missing elements.
it is easy to change the order of elements by just moving the pointers.
multiple elements could be made to point to the same structure.
reallocating the array is easier as only the array of pointers needs reallocation, optionally keeping separate length and size counts to minimize reallocations. Incremental allocation is easy too.
the expression p[i]->member generates a simple shift and an extra memory access, which may be more efficient than the equivalent expression for arrays of structures.
and the following drawbacks:
allocating and freeing this indirect array is more cumbersome. An extra loop is required to allocate and/or initialize the structures pointed to by the array.
access to structure members involves an extra memory indirection. Compilers can generate efficient code for this if multiple members are accessed in the same function, but not always.
pointers to adjacent structures cannot be derived from a pointer to a given element.
EDIT: As hinted by David Bowling, one can combine some of the advantages of both approaches by allocating an array of structures on one hand and a separate array of pointers pointing to the elements of the first array. This is a handy way to implement a sort order, or even multiple concomitant sort orders with separate arrays of pointers, like database indexes.

Related

How do I add/subtract pointers to different sections of the heap (in C)

In a program I'm writing, I am implementing binary tree and linked list structures; because I don't know how many nodes I will need, I am putting them on the heap and having the program use realloc() if they need more room.
The problem is that such structures include pointers to other locations in the same structure, and because the realloc() moves the structure, I need to redo all those pointers (unless I change them to offsets, but that increases the complexity of the code and the cost of using the structure, which is far more common than the reallocations).
Now, this might not be a problem; I could just take the old pointer, subtract it from the new pointer, and add the result to each of the pointers I need to change. However, this only works if it is possible to subtract two pointers and get the difference in their addresses (and then add that difference to another pointer to get the pointer that many bytes ahead); because I'm working on the heap, I can't guarantee that the difference of addresses will be divisible by the size of the entries, so normal pointer subtraction (which gives the number of objects in between) will introduce errors. So how do I make it give me the difference in bytes, and work even when they are in two different sections of the heap?
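To get the difference between two pointers in bytes, cast them to char * (the result has type ptrdiff_t):
(char *) ptrA - (char *) ptrB;
Strictly speaking, C only defines pointer subtraction when both pointers point into (or one past the end of) the same array object, so the result is not guaranteed to be meaningful for two different heap blocks.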
However, if you'd like to implement a binary tree or linked list with all nodes sharing the same block of memory, consider using an array of structs instead, with the pointers being replaced by array indices. The primary advantage of using pointers for a linked list or tree, rather than an array of structs, is that you can add or remove nodes individually without reallocating memory or moving other nodes around, but by making the nodes share the same array, you're negating this advantage.
The best way would indeed be to malloc() a new chunk for every node you have. But this has some overhead for the allocator's internal bookkeeping, so if you have lots of nodes, it might be useful to allocate space for several nodes at once.
If you do need realloc(), you should go another way:
1. Before calling realloc(), calculate each pointer's offset within your memory block: `ofs = ptrX - start`
2. Add this offset to the new address returned by `realloc()`.
This way, you always stay inside the area you allocated and never rely on differences between unrelated heap pointers, which have almost no meaning.
In fact, you can use malloc or calloc to get memory for each node.
Then you only need to remember the address of the tree's root node.
This way, you never need to realloc memory for the whole tree, and the address of each node never changes. :)

Are flexible array members really necessary?

A struct with a flexible array member, apparently, is not intended to be declared, but rather used in conjunction with a pointer to that struct. When declaring a flexible array member, there must be at least one other member, and the flexible array member must be the last member in that struct.
Let's say I have one that looks like this:
struct example{
int n;
int flm[];
};
Then to use it, I'll have to declare a pointer and use malloc to reserve memory for the structure's contents.
struct example *ptr = malloc(sizeof(struct example) + 5*sizeof(int));
That is, if I want my flm[] array to hold five integers. Then I can just use my struct like this:
ptr->flm[0] = 1;
My question is, shouldn't I be able to just use a pointer instead of this? Not only would it be compatible pre-C99, but I could use it with or without a pointer to that struct.
Considering I already have to use malloc with the flm, shouldn't I just be able to do this?
Consider this new definition of the example struct:
struct example{
int n;
int *notflm;
};
struct example test = {4, malloc(sizeof(int) * 5)};
I'd even be able to use the replacement the same way as the flexible array member:
test.notflm[0] = 1;
Would this also work? (Given the above definition of example with notflm.)
struct example test;
test.n = 4;
test.notflm = malloc(sizeof(int) * 5);
Pointers are not arrays. The basic reasons for choosing which to use are the same as they always are with arrays versus pointers. In the special case of flexible array members, here are some reasons you may prefer them over a pointer:
Reducing storage requirements. A pointer will enlarge your structure by (typically) 4 or 8 bytes, and you'll spend much more in overhead if you allocate the pointed-to storage separately rather than with a single call to malloc.
Improving access efficiency. A flexible array member is located at a constant offset from the structure base. A pointer requires a separate dereference. This affects both number of instructions required to access it, and register pressure.
Atomicity of allocation success/failure. If you allocate the structure and allocate storage for it to point to as two separate steps, your code for cleaning up in the failure cases will be much uglier, since you have the case where one succeeded and the other failed. This can be avoided with some pointer arithmetic to carve both out of the same malloc request, but it's easy to get the logic wrong and invoke UB due to alignment issues.
Avoiding need for deep-copy. If you use a flexible array instead of a pointer, you can simply memcpy (not assign, since assignment can't know the flexible array length) to copy the structure rather than having to copy the pointed-to data too and fix up the pointer in the new copy.
Avoiding need for deep-free. It's very convenient and clean to be able to just free a single object rather than having to free pointed-to data too. This can also be achieved with the "carving up a single malloc" approach mentioned above, of course, but flexible arrays make it easier and less error-prone.
Surely many more reasons...
Those concepts are definitely not necessary as you have pointed out yourself.
The differences between the two that you have demonstrated are where your data is located in memory.
In the first example, with the flexible array member, your metadata and the array itself are in the same block of memory and can be moved as one block (pointer) if you have to.
In the second example, your metadata is on the stack and your array is elsewhere on the heap. In order to move or copy it, you now need to move two blocks of memory and update the pointer in your metadata structure.
Generally, flexible array members are used when you need to keep an array and its metadata together in memory.
An example where this is definitely useful is placing an array with its metadata in a file: you have only one contiguous block of memory, and each time you load it, it will (most likely) be placed at a different location in your virtual address space.

difference between array and list

What is the difference between an array and a list?
In C, an array is a fixed-size region of contiguous storage containing multiple objects, one after the other. This array is an "object" in the meaning which C gives to the word - basically just some memory that represents something. An object could just be an int.
You can distinguish slightly between array objects, and array types. Often people use array objects which are allocated with malloc, and used via a pointer to the first element. But C does also have specific types for arrays of different sizes, and also for variable-length-arrays, whose size is set when they are created. VLAs have a slightly misleading name: the size is only "variable" in the sense that it isn't fixed at compile time. It can't change during the lifetime of the object.
So, when I say an array is fixed-size I mean that the size cannot change once the array is created, and this includes VLAs. There is realloc, which logically returns a pointer to a new array that replaces the old one, but can sometimes return the same address passed in, having changed the size of the array in place. realloc operates on memory allocations, not on arrays in general.
That's what an array is. The C programming language doesn't define anything called a list. Can't really compare something which is well defined, with something that isn't defined ;-) Usually "list" would mean a linked list, but in some contexts or in other languages it means other things.
For that matter, in other languages "array" could mean other things, although I can't immediately think of a language where it means anything very different from a C array.
If your question really has nothing to do with C, and is a language-agnostic data-structures question, "what is the difference between an array and a linked list?", then it's a duplicate of this:
Array versus linked-list
There is no such thing as a standard list in C. There is one in C++ (std::list), where it is implemented as a doubly linked list.
The main differences are that arrays have random access (you can access any member of the array in O(1) time, e.g. a[4] if a is an array) and have a size that is fixed when they are created. Linked lists have sequential access: to access an element, you have to loop through the list until you reach the element you want (e.g. if b is a linked list, to get to the 5th element of b you have to iterate through elements 0, 1, 2, 3 and 4), and their size can grow and shrink dynamically.
Although there is no list type in C per se, you could well be talking about a linked list implementation.
Array: Random access, predefine size.
Linked List: Sequential access, size at runtime.
Other languages, say Python, may have both lists and arrays built in, and their meanings may differ.
Useful comments from below:
You could add array lists: lists that internally use an array which is doubled when needed and halved when only 1/4 full. This gives amortized O(1) for add, remove and get(index). – lasseespeholt
Python's list is not a linked list. And the distinction between Python list and array is list can store anything while array can only store primitive types (int, float, etc). – KennyTM
An array has a fixed size, as when we write new int[100],
but a list does not have a fixed size; it can keep growing.
Insertion and deletion are easier in a list than in an array.
Reason: in a linked list we simply change pointers to insert and delete, but in an array insertion and deletion require shifting elements right or left.
A typical doubly linked list implementation uses a dummy head node to avoid special cases when inserting into an empty list or removing the last node from a single-element list, and it uses double links to allow iterating in both directions. The cost, of course, is the extra space needed to hold the dummy node (minimal cost) and the extra previous link in addition to the usual next link for each node (a much more significant cost).
In an array, we can add at a given position directly thanks to random access.
In such a linked list, the reference to the tail node is simply header.prev, which lets us append to the list in constant time (without having to iterate to find the tail, or having to maintain a separate tail reference).
But in an array, we may need to resize the array before inserting.
An array offers random access, unlike a linked list.
A linked list has drawbacks such as:
It consumes extra memory for the pointer in each node.
Reaching an element takes O(n) time instead of O(1) as in an array.
Reverse traversal is difficult for a singly linked list, and a doubly linked list needs another pointer per node, which means even more memory.
Heap restriction as well: a node can be allocated only if there is space available in the heap; if memory is insufficient, the allocation fails.
An array has drawbacks such as:
a risk of wasting memory (if it is too large) or running short (if it is too small).
Hope this helps! :)
An often underappreciated characteristic of linked data structures is that you can use them in situations where memory is highly fragmented, because there is no contiguity requirement between elements. For example, you could have 100MB of free space but a maximum run of contiguous free memory of only 10MB. In this case you can only create an array of up to 10MB, but potentially a larger linked list, since you would be able to make use of every run of free memory large enough to contain a single node.
An array holds only elements of the same type, i.e. it is homogeneous: we can have an array of strings, of integers, etc. Also, the size of an array is predefined.
In the case of a list, we can have elements of any type, be it strings, integers or a combination of both. Null and duplicate elements are also allowed in a list. Examples of lists include ArrayList and LinkedList. A list's size can grow or shrink at any time.

What is a "value" array?

In C, the idea of an array is very straightforward: simply a pointer to the first element in a row of elements in memory, which can be accessed via pointer arithmetic or the standard array[i] syntax.
However, in languages like Google Go, "arrays are values", not pointers. What does that mean? How is it implemented?
In most cases they're the same as C arrays, but the compiler/interpreter hides the pointer from you. This is mainly because then the array can be relocated in memory in a totally transparent way, and so such arrays appear to have an ability to be resized.
On the other hand, it is safer, because without the ability to move the pointers you cannot create a leak.
Since then (2010), the article Slices: usage and internals is a bit more precise:
The in-memory representation of [4]int is just four integer values laid out sequentially:
Go's arrays are values.
An array variable denotes the entire array; it is not a pointer to the first array element (as would be the case in C).
This means that when you assign or pass around an array value you will make a copy of its contents. (To avoid the copy you could pass a pointer to the array, but then that's a pointer to an array, not an array.)
One way to think about arrays is as a sort of struct but with indexed rather than named fields: a fixed-size composite value.
Arrays in Go are also values in that they are passed to functions by value (in the same way as ints, strings, floats, etc.), which requires copying the whole array for each function call.
This can be very slow for a large array, which is why in most cases it's better to use slices.

How can I concatenate two arrays in C?

How do I concatenate two arrays to get a single array containing the elements of both original arrays?
Arrays in C are simply a contiguous area of memory, with a pointer to their start*. So merging them involves:
1. Find the length of the arrays A and B (you will probably need to know the number of elements and the size of each element).
2. Allocate (malloc) a new array C that is the size of A + B.
3. Copy (memcpy) the memory from A to C.
4. Copy the memory from B to C + the length of A (see step 1).
5. You might also want to deallocate (free) the memory of A and B.
Note that this is an expensive operation, but this is the basic theory. If you are using a library that provides some abstraction, you might be better off. If A and B are more complicated than a simple array (e.g. sorted arrays), you will need to do smarter copying than steps 3 and 4 (see: how do i merge two arrays having different values into one array).
Although for the purpose of this question the pointer explanation will suffice, strictly speaking (and to pacify the commenter below): C has the concept of an array that can be used without pointer syntax. Implementation-wise, however, a C array and a contiguous area of memory with a pointer are close enough that they can be, and often are, used interchangeably.