Why should you use a 2D array of structs? - c

I`m curious about if there are cases to use a 2D array of structures, like f.e.:
typedef struct
{
//member variables
}myStruct;
myStruct array2D[x][y];
//Uses cases of array2D[x][y]? Do we need them?
Why should you use a 2D array of structs?

Why should you use a 2D array of structs?
A defined structure is basically just a type like any other (though of course, any type is different from the other and there is a difference between private and standard datatypes; not even mentioning the differences in memory allocation between objects of that types) with which you can declare objects.
You could also ask: Why should you use a char,int or double two-dimensional array?
It is not only a structure own kind of thing.
It depends on the context and its worth to have a clear "structure" in ones code; So to code in this way can help you to make your code more readable and clear, if you need a huge amount of objects of a certain type, in this case a structure.
Maybe you even want to group some objects and/or want to treat them differently. In this case, multiple array dimensions are beneficial because you can address objects of each dimension explicitly and separately.
As #buysbee mentioned in the comments:
One obvious example, where it is beneficial to store structure objects in a two-dimensional array is to storing the pixels of a picture. It is better to store the "pixel" structure objects in a two-dimensional array because this emulates how a picture is constructed of naturally.

We can add more questions: why we should use 3D, 4D. 5D ...... 10000000D arrays of something.
Data structures used in a project depend on the problem to be solved and the algorithm chosen.

Related

Fundamental limitations of cell arrays, arrays of structs, and scalar structs?

I've been using Matlab on and off for decades. I thought I had a good grip on arrays, structs, cell arrays, tables, an array of structs, and a struct in which each field is an array. For the latter two, I assumed that each field needed to be of uniform type. I'm finding that no such limitation exists:
Perhaps Matlab is becoming more flexible with the years (I'm using 2015b), but it does undermine my confidence in choosing the best type of variable for a task if I find that understanding of the limitations of each type is wrong. For the purpose of this question, I can't really articulate the needs of the task because the manner in which I break down a large to-do into tasks depends on my understanding of the data types at my disposal, and their advantages/limitations.
I can (and have) read online documentation ad nauseum, and while they will walk you through code to illustrate what the data types are able to do, I haven't yet come across a succinct description of the comparative limitations between cell arrays, arrays of structs, and structs whose fields are themselves arrays -- to the point that I can use that knowledge to choose the best structure in a given situation. Basic stuff, I do find, e.g., the same field names will occur in each struct of a struct array (but as the above example shows, each field of each struct can contain highly heterogeneous data types and/or array sizes).
THE QUESTION
Can anyone point to such a comparison of limitations between cell arrays, arrays of structs, and scalar structs whose fields are themselves arrays? I'm looking for a treatment at a level that informs a coder in deciding on the best trade-off between (i) speed, (ii) memory, and (iii) readability, maintainability, and evolvability.
I've deliberately left out tables because, although I'm enamoured of their convenient access to, and subsetting of, data sets (and presentation thereof), they have proved rather slow for manipulation of data. They have their uses, and I use them liberally, but I'm not interested in them for the purpose of this comparison, which is under-the-hood algorithm coding.
I think your question eventually narrows down to these three "types" of data structures:
comparative limitations between cell arrays, arrays of structs, and structs whose fiels are themselves arrays
[Note that "structs whose fields are themselves arrays" I translate as "scalar structs" here. An array of structs can also contain arbitrary arrays. My thinking becomes clear below, I hope.]
To me, these are not very different. All three are containers for heterogeneous data. (Heterogeneous data is non-uniform data, each data element is potentially of a different type and size.) Each of these statements can return an array of any type, unrelated to the type of any other array in the container:
cell array: array{i,j}
struct array: array(i,j).value
scalar struct: array.value
So it all depends on how you want to index:
array(i,j).value
^ ^
A B
If you want to index using A only, use a cell array (though you then need curly braces, of course). If you want to index using B only, use a scalar struct. If you want both A and B, use a struct array.
There is no difference in cost that I'm aware of. Each of the arrays contained in these containers takes up some space. The spatial overhead of the various containers is similar, and I have never noted a time overhead difference.
However, there is a huge difference between these two:
array(i).value % s1
array.value(i) % s2
I think that the question deals with this difference also. s1 has a lot more spatial overhead than s2:
>> s1=struct('value',num2cell(1:100))
s1 =
1×100 struct array with fields:
value
>> s2=struct('value',1:100)
s2 =
struct with fields:
value: [1×100 double]
>> whos
Name Size Bytes Class Attributes
s1 1x100 12064 struct
s2 1x1 976 struct
The data needs 800 bytes, so s2 has 176 bytes of overhead, whereas s1 has 11264 (1408%)!
The reason is not the container, but the fact that we're storing one array with 100 elements in one, and 100 arrays with one element in the other. Each array has a header of a certain size that MATLAB uses to know what type of array it is, what sizes it has, to manage its storage and the delayed copy mechanism. The fewer arrays one has, the less memory one uses.
So, don't use a heterogeneous container to store scalars! These things only make sense when you need to store larger arrays, or arrays of different type or size.
The heterogeneous container that is not explicitly asked about (and after the edit explicitly not asked about) is the table. A table is similar to a scalar struct in that each column of the table is a single array, and different columns can have different types. Note that it is possible to use a cell array as a column, allowing for heterogenous elements to be stored in a column, but they make most sense if this is not the case.
One difference with a scalar struct is that each column must have the same number of rows. Another difference is that indexing can look like that of a cell array, a scalar struct, or a struct array.
Thus, the table forces some constrains upon the contained data, which is very beneficial in some circumstances.
However, and as the OP noted, working with tables is slower than working with structs. This is because table is a custom class, not a native type like structs and cell arrays. If you type edit table in MATLAB, you'll see the source code, how it's implemented. It's a classdef file, just like something any of us could write. Consequently, it has the same speed limitations: the JIT is not optimized for it, indexing into a table implies running a function written as an M-file, etc.
One more thing: Don't create cell arrays of structs, or scalar structs with cell arrays. This increases the levels of containers, which increases overhead (both in space and time), and makes the contents more difficult to use. I have seen questions here on SO related to difficulty accessing data, caused by this type of construct:
data{i,j}.value % A cell array with structs. Don't do this!
data.value{i,j} % A struct with cell arrays. Don't do this!
The first example is equal to a struct array (with a lot more overhead), except there is no control over the struct fields within each cell. That is, it is possible for one of the cells to not have a .value field.
The second example makes sense only if value is a different size than a second struct field. If all struct fields are (supposed to be) cell arrays of the same size like this, then use a struct array. Again, less overhead and more uniformity.

Data structure - Array

Here it says:
Arrays are useful mostly because the element indices can be computed
at run time. Among other things, this feature allows a single
iterative statement to process arbitrarily many elements of an array.
For that reason, the elements of an array data structure are required
to have the same size and should use the same data representation.
Is this still true for modern languages?
For example, Java, you can have an array of Objects or Strings, right? Each object or string can have different length. Do I misunderstand the above quote, or languages like Java implements Array differently? How?
In java all types except primitives are referenced types meaning they are a pointer to some memory location manipulated by JVM.
But there are mainly two types of programming languages, fixed-typed like Java and C++ and dynamically-typed like python and PHP. In fixed-typed languages your array should consist of the same types whether String, Object or ...
but in dynamically-typed ones there's a bit more abstraction and you can have different data types in array (I don't know the actual implementation though).
An array is a regular arrangement of data in memory. Think of an array of soldiers, all in a line, with exactly equal spacing between each man.
So they can be indexed by lookup from a base address. But all items have to be the same size. So if they are not, you store pointers or references to make them the same size. All languages use that underlying structure, except for what are sometimes called "associative arrays", indexed by key (strings usually), where you have what is called a hash table. Essentially the hash function converts the key into an array index, with a fix-up to resolve collisions.

Cheapest structure for deferencing?

Let's say I have an associative array keyed by unsigned int; values could be of any fixed-size type. There is some pre-defined maximum no. of instances.
API usage example: MyStruct * valuePtr = get(1234); and put(6789, &myStructInstance); ...basic.
I want to minimise cache misses as I read entries rapidly and at random from this array, so I pre-malloc(sizeof(MyType) * MAX_ENTRIES) to ensure locality of reference inasmuch as possible.
Genericism is important for the values array. I've looked at C pseudo-generics, but prefer void * for simplicity; however, not sure if this is at odds with performance goals. Ultimately, I would like to know what is best for performance.
How should I implement my associative array for performance? Thoughts thus far...
Do I pass the associative array a single void * pointer to the malloced values array and allow it to use that internally (for which we would need to guarantee a matching keys array size)? Can I do this generically, since the type needs(?) to be known in order to index into values array?
Do I have a separate void * valuePtrs[] within the associative array, then have these pointers point to each element in the malloced values array? This would seem to avoid need to know about concrete type?
Do I use C pseudo-generics and thus allow get() to return a specific value type? Surely in this case, the only benefit is not having to explicitly cast, e.g. MyStruct* value = (MyStruct*) get(...)... the array element still has to be dereferenced and so has the same overhead?
And, in general, does the above approach to minimising cahce misses appear to make sense?
In both cases, the performance is basically the same.
In the first one (void* implementation), you will need to look up the value + dereference the pointer. So these are two instructions.
In the other implementation, you will need to multiply the index with the size of the values. So this implementation also asks for two instructions.
However, the first implementation will be easier and more clean to implement. Also, the array is fully transparent; the user will not need to know what kind of structures are in the array.
See solutions categorised below, in terms of pros and cons (thanks to Ruben for assisting my thinking)... I've implemented Option 2 and 5 for my use case, which is somewhat generalised; I recommend Option 4 if you need a very specific, one-off data structure. Option 3 is most flexible while being trivial to code, and is also the slowest. Option 4 is the quickest. Option 5 is a little slow but with flexibility on the array size and ease of general use.
Associative array struct points to array of typed pointers:
pros no failure value required, explicit casts not required, does not need compile-time size of array
cons costly double deref, requires generic library code
Associative array struct holds array of void * pointers:
pros no failure value required, no generic library code
cons costly double deref, explicit casts following get(), needs compile time size of array if VLAs are not used
Associative array struct points to array of void * values:
pros no generic library code, does not need compile-time size of array
cons costly triple deref, explicit casts following get(), requires offset calc which requires sizeof value passed in explicitly
Associative array struct holds array of typed values:
pros cheap single deref, explicit casts not required, keys and entries allocated contiguously
cons requires generic library code, failure value must be supplied, needs compile time size of array if VLAs are not used
Associative array struct points to array of typed values:
pros explicit casts not required, flexible array size
cons costly double deref, requires generic library code, failure value must be supplied, needs compile time size of array if VLAs are not used

When is the best time to use a Structure or an array

I am a little new to C programming. I was writing a C program which has 3 integers to handle. I had all of them inside an array and suddenly I had a thought of why should I not use a structure.
My question here is when is the best time to use a structure and when to use an array. And is there any memory usage difference between the two in this particular case.
Any help regarding this is appriciated. Thanks!
An array is best when you want to loop through the values (which, essentially, means they're strongly related). Otherwise a structure allows you to give them meaningful names and avoids the need to document that array, e.g. myVar[1] is the name of the company and myVar[0] is its phone number, etc. as opposed to companyName, companyPhone.
The difference is about semantic information. If you want to store your information as a list where there is no semantic distinction between different members of that list, then use an array. Perhaps each member of the list represents a different value for the same thing.
If each of those integers represents something special or different, use a struct. Note the implications of using a struct, such as the fact that people expect the members to be closely related semantically.
struct has other advantages over array which can make it more powerful. For example, its ability to encapsulate multiple data types.
If you are passing this information between many functions, a structure is likely more practical (because there is no need to pass the size). It would be bad to pass an array (which decays to a pointer) and expect the callee to know how many items are in the array. Using a struct implicitly makes this part of the function contract.
In terms of size, there is no difference. A 4 byte int would typically be 4-byte aligned.
You can think of structure like an object in OOP languages, a structure ties related data into a single type and allows you to access each member of the structure using the member's name instead of array indices. If you can think of a singular name that could unify the related data then you should be using a structure.
An array can be thought of as a list of items, if the name you thought of above contains the word list or collection or is a plural, then you should be using arrays or other collection types. The primary use of arrays is to loop over it and apply the same operation to every items in the array or a range of items in the array. If you used an array but never looped over it, it's an indication that probably array may not be the best data type.
I would suggest to use an array if the different things you store are logically the same data, but different instance of this. (like a list of telephone numbers or ages). And use a struct when they mean different things (like age and size) bound together because they are related to the same thing (a person).
The size is equal, since both store 3 integers without anything else; You could actually cast the struct to an array and use it like that (although you shouldn't do that for its ugliness).
You could test that with this simple programm:
#include <stdio.h>
struct three_numbers{
int x;
int y;
int z;
};
int main(int argc, char** argv) {
int test[3];
printf("struct: %d, array: %d\n", sizeof(three_numbers), sizeof(test));
}
prints on my system:
struct: 12, array: 12
In my opinion, you should think first from the perspective of the design to decide which one to use. In your question you have mentioned that "I have three integers to handle". The point here is that how did you arrive at three integers?
Just as many others have noted, let's say you need store details of a person, first you need to think of the person as an object and then decide what all information relevant to that person you will need and then decide what data type you need to use for each of those details. What you are trying to do is that you have decided that data types first and then trying work your way up.
To just put in simple words about the difference between structure and array. Structure is a Composite Data Type (or a User defined data type) whereas array is just a collection of similar data.
Use structures to group information about a single object. Use arrays to group information about multiple objects.

dynamic arrays and flexible arrays

what is the difference between those two types of arrays, thanks in advance for any good example taken from different languages
In C/C++ a flexible array is a member of a struct that is an array (not just a pointer) but doesn't specify a length (reference). As such, a struct with a flexible array is an incomplete type, so the typeof operator can't be used.
A dynamic array is an ordered collection (list) that can grow and shrink. Dynamic arrays are usually implemented as a linked list or something similar.
Dynamic arrays have values that can be changed by code and more values populated. The flexible array is an array that does not have a hard coded limit to its length.

Resources