I am unable to understand the difference between them. If the same thing can be done with all three, when should we choose an array, a structure, or a union?
In an array all the elements have the same size and type, so you can't use one for an int and the other one as a double value, and so on.
In a struct, every member can have a different size or type. One member can be an int and the others any data type you could use for a regular variable; you can also have arrays of structures.
Unions let a single variable hold one of several possible data types. The size of a union instance equals the size of its largest member (possibly rounded up for alignment), unlike a struct, whose size is at least the sum of its members' sizes (padding may be added).
Also, essentially the syntax is very much clearer if you use a struct even for members of the same type. For example, instead of having
float ****point3d;
You could have
struct point3d_s {
float x, y, z;
};
struct point3d_s *point3d;
This declares a pointer to a 3-dimensional point, which can also point to the first element of an array of such points.
Well, they are three totally different objects.
Use an array when you need many (well, at least two...) elements of the very same type, especially when the number of them might vary.
For example: Hold all the phone numbers of students in a class.
Use a struct when you need to aggregate a few variables together.
For example: Hold, per student, their name, their phone number and their address.
Use a union when you only ever need one variable at a time out of a few possible types.
For example: Hold, per student, either their phone number or their email address.
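The three examples above can be sketched together in C. This is only an illustration: the names (`struct student`, `union contact`, `contact_kind`, `set_student`, `student_contact`) and the buffer sizes are made up here, and the `kind` field is an added assumption so the code knows which union member is currently valid.

```c
#include <assert.h>
#include <stdio.h>

#define CLASS_SIZE 3   /* hypothetical class size */

/* Union: only one of these members holds a value at any time. */
union contact {
    char phone[16];
    char email[48];
};

enum contact_kind { CONTACT_PHONE, CONTACT_EMAIL };

/* Struct: a few related variables of different types, grouped together. */
struct student {
    char name[32];
    enum contact_kind kind;    /* records which union member is valid */
    union contact contact;
};

/* Array: many elements of the very same type. */
static struct student class_list[CLASS_SIZE];

/* Fill slot i of the array; returns 0 on success, -1 on a bad index. */
int set_student(size_t i, const char *name,
                enum contact_kind kind, const char *value)
{
    assert(name != NULL && value != NULL);
    if (i >= CLASS_SIZE)
        return -1;

    struct student *s = &class_list[i];
    snprintf(s->name, sizeof s->name, "%s", name);
    s->kind = kind;
    if (kind == CONTACT_PHONE)
        snprintf(s->contact.phone, sizeof s->contact.phone, "%s", value);
    else
        snprintf(s->contact.email, sizeof s->contact.email, "%s", value);
    return 0;
}

/* Return whichever contact string is currently stored. */
const char *student_contact(const struct student *s)
{
    return s->kind == CONTACT_PHONE ? s->contact.phone : s->contact.email;
}
```

The array is accessed by index (`class_list[i]`), while struct and union members are accessed by name (`s->name`, `s->contact.phone`).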
An array has no padding between its elements, whereas a structure may have padding between (and after) its members. All elements of an array or structure count toward its total size, while the size of a union is that of its largest member (plus any trailing padding needed for alignment).
An array has all elements of the same type, which is not a requirement for a structure or union.
An array uses index-based access for its elements, while a structure or union uses .element_name to access its members.
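The size rules above can be checked at compile time. The following is a small sketch, assuming a C11 compiler; the type names `mixed_s` and `mixed_u` and the helper `struct_padding` are invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>

struct mixed_s { char c; double d; };   /* members laid out one after another */
union  mixed_u { char c; double d; };   /* members share the same storage */

/* Arrays: no padding between elements. */
static_assert(sizeof(double[4]) == 4 * sizeof(double),
              "array elements are contiguous");

/* Structs: at least the sum of the member sizes (padding may be added). */
static_assert(sizeof(struct mixed_s) >= sizeof(char) + sizeof(double),
              "struct holds all of its members");

/* Unions: at least the size of the largest member, all members at offset 0. */
static_assert(sizeof(union mixed_u) >= sizeof(double),
              "union is as large as its largest member");
static_assert(offsetof(union mixed_u, d) == 0,
              "union members overlap at offset 0");

/* Number of padding bytes the compiler added to struct mixed_s. */
size_t struct_padding(void)
{
    return sizeof(struct mixed_s) - (sizeof(char) + sizeof(double));
}
```

The exact amount of padding reported by `struct_padding()` depends on the platform's alignment rules, so no particular value is assumed here.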
What is the difference between using a flexible array member (FAM) and a pointer member? In both cases, a malloc and an element-by-element assignment must be done. But with a FAM, memory is allocated for the whole structure at once, whereas with a pointer member, memory is allocated only for the array the pointer refers to (see code). What are the pros and cons of these two methods?
#include <stdio.h>
#include <stdlib.h>
typedef struct farr_mb {
int lg;
int arr[];
} Farr_mb;
typedef struct ptr_mb {
int lg;
int * ptr;
} Ptr_mb;
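int main(void) {
int lg=5;
Farr_mb *a=malloc(sizeof(Farr_mb)+lg*sizeof(int)); /* one allocation: header and array together */
Ptr_mb b; b.ptr=malloc(lg*sizeof(int)); /* separate allocation for the array only */
if (!a || !b.ptr) { free(a); free(b.ptr); return 1; }
a->lg=lg; b.lg=lg;
for (int i=0;i<lg;i++) (a->arr)[i]=i;
for (int i=0;i<lg;i++) (b.ptr)[i]=i;
for (int i=0;i<lg;i++) printf("%d \t",(a->arr)[i]);
printf("\n");
for (int i=0;i<lg;i++) printf("%d \t",(b.ptr)[i]);
printf("\n");
free(a); free(b.ptr);
return 0;
}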
Before we get to the pros and cons, let's look at some real-world examples.
Let's say we wish to implement a hash table, where each entry is a dynamically managed array of elements:
struct hash_entry {
size_t allocated;
size_t used;
element array[];
};
struct hash_table {
size_t size;
struct hash_entry **entry;
};
#define HASH_TABLE_INITIALIZER { 0, NULL }
This in fact uses both. The hash table itself is a structure with two members. The size member indicates the size of the hash table, and the entry member is a pointer to an array of hash table entry pointers. This way, each unused entry is just a NULL pointer. When adding elements to a hash table entry, the entire struct hash_entry can be reallocated (to sizeof (struct hash_entry) + allocated * sizeof (element)) or freed, as long as the corresponding pointer in the entry member in the struct hash_table is updated accordingly.
If we used element *array instead, we would need to use struct hash_entry *entry; in the struct hash_table; or allocate the struct hash_entry separately from the array; or allocate both struct hash_entry and array in a single chunk, with the array pointer pointing just after the same struct hash_entry.
The cost of that would be two extra size_ts worth of memory used for each unused hash table slot, as well as an extra pointer dereference when accessing elements. (Or, to get the address of the array, two consecutive pointer dereferences, instead of one pointer dereference plus offset.) If this is a key structure heavily used in an implementation, that cost can be visible in profiling, and negatively affect cache performance. For random accesses, the larger the element array is, the less difference there is, however; the cost is largest when the arrays are small, and fit within the same cacheline (or a few cachelines) as the allocated and used members.
We do not usually want to make the entry member in the struct hash_table a flexible array member, because that would mean you no longer can declare a hash table statically, using struct hash_table my_table = HASH_TABLE_INITIALIZER;; you would need to use a pointer to a table, and an initializer function: struct hash_table *my_table; my_table = hash_table_init(); or similar.
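To make the reallocation of a flexible-array entry concrete, here is a minimal sketch of appending to one hash table slot. It assumes `element` is an `int` (the text does not say what it is), and the function name `hash_entry_append` and the growth policy (start at 4, then double) are choices made only for this illustration. Size overflow checks are omitted for brevity.

```c
#include <assert.h>
#include <stdlib.h>

typedef int element;   /* assumption: the element type is not given in the text */

struct hash_entry {
    size_t  allocated;
    size_t  used;
    element array[];
};

/* Append value to the entry in *slot, growing the entry as needed.
   *slot may be NULL (an unused slot).
   Returns 0 on success, -1 if memory could not be allocated. */
int hash_entry_append(struct hash_entry **slot, element value)
{
    assert(slot != NULL);

    struct hash_entry *e = *slot;
    size_t used      = e ? e->used : 0;
    size_t allocated = e ? e->allocated : 0;

    if (used >= allocated) {
        size_t new_alloc = allocated ? 2 * allocated : 4;
        struct hash_entry *tmp =
            realloc(e, sizeof *tmp + new_alloc * sizeof tmp->array[0]);
        if (!tmp)
            return -1;             /* the old entry, if any, is still valid */
        tmp->allocated = new_alloc;
        tmp->used = used;
        *slot = e = tmp;           /* the entry may have moved: update the slot */
    }

    e->array[e->used++] = value;
    return 0;
}
```

Because the whole entry may move during `realloc`, the function takes the address of the slot (for example `&table->entry[i]`) rather than the entry pointer itself.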
I do have another example of related data structures using both pointer members and flexible array members. It allows one to use variables of type matrix to represent any 2D matrix with double entries, even when a matrix is a view to another (say, a transpose, a block, a row or column vector, or even a diagonal vector); these views are all equal (unlike in e.g. GNU Scientific Library, where matrix views are represented by a separate data type). This matrix representation approach makes writing robust numerical linear algebra code easy, and the ensuing code is much more readable than when using GSL or BLAS+LAPACK. In my opinion, that is.
So, let's look at the pros and cons, from the point of view of how to choose which approach to use. (For that reason, I will not designate any feature as "pro" or "con", as the determination depends on the context, on each particular use case.)
Structures with flexible array members cannot be initialized statically. You can only refer to them via pointers.
You can declare and statically initialize structures with pointer members. As shown in the example above, using a preprocessor initializer macro can mean you do not need an initializer function. For example, a function accepting a struct hash_table *table parameter can always resize the array of pointers using realloc(table->entry, newsize * sizeof table->entry[0]), even when table->entry is NULL. This reduces the number of functions needed, and simplifies their implementation.
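A sketch of that resize, assuming the `struct hash_table` and `HASH_TABLE_INITIALIZER` shown earlier. The function name `hash_table_grow` is hypothetical, and a real hash table would also have to rehash its existing entries after growing, which is left out here.

```c
#include <assert.h>
#include <stdlib.h>

struct hash_entry;   /* opaque here: the table only stores pointers to entries */

struct hash_table {
    size_t size;
    struct hash_entry **entry;
};
#define HASH_TABLE_INITIALIZER { 0, NULL }

/* Grow the slot array to newsize slots. This also works on a freshly
   initialized table, because realloc(NULL, n) behaves like malloc(n).
   Requests that would shrink the table are ignored in this sketch.
   Returns 0 on success, -1 if memory could not be allocated. */
int hash_table_grow(struct hash_table *table, size_t newsize)
{
    assert(table != NULL);
    if (newsize <= table->size)
        return 0;

    struct hash_entry **tmp =
        realloc(table->entry, newsize * sizeof table->entry[0]);
    if (!tmp)
        return -1;                 /* table is unchanged on failure */

    for (size_t i = table->size; i < newsize; i++)
        tmp[i] = NULL;             /* new slots start out unused */

    table->entry = tmp;
    table->size  = newsize;
    return 0;
}
```

The caller can write `struct hash_table my_table = HASH_TABLE_INITIALIZER;` and pass `&my_table` straight to `hash_table_grow()`; the address of `my_table` itself never changes.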
Accessing an array via a pointer member can require an extra pointer dereference.
If we compare the accesses to arrays in statically initialized structures with pointer to the array, to a structure with a flexible array member referred via a static pointer, the same number of dereferences are made.
If we have a function that gets the address of a structure as a parameter, then accessing an array element via a pointer member requires two pointer dereferences, whereas accessing a flexible array element requires only one pointer dereference and one offset. If the array elements are small enough and the array index small enough, so that the accessed array element is in the same cacheline, the flexible array member access is often significantly faster. For larger arrays, the difference in performance tends to be insignificant. This does vary between hardware architectures, however.
Reallocating an array via a pointer member hides the complexity from those using the structure as an opaque variable.
This means that if we have a function that receives a pointer to a structure as a parameter, and that structure has a pointer to a dynamically allocated array, the function can reallocate that array without the caller seeing any change in the structure address itself (only structure contents change).
However, if we have a function that receives a pointer to a structure with a flexible array member, reallocating the array means reallocating the entire structure. That potentially modifies the address of the structure. Because the pointer is passed by value, the modification is not visible to the caller. Thus, a function that may resize a flexible array member, must receive a pointer to a pointer to the structure with a flexible array member.
If the function only examines the contents of a structure with a flexible array member, say counts the number of elements that fulfill some criteria, then a pointer to the structure suffices; and both the pointer and the pointed-to data can be marked const. This might help the compiler produce better code. Furthermore, all the data accessed is linear in memory, which helps more complex processors manage caching more efficiently. (To do the same with a structure having a pointer member, one would need to pass the pointer to the array, as well as the size field at least, as parameters to the counting function, instead of a pointer to the structure containing those values.)
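The two cases above can be sketched with the `Farr_mb` type from the question. The function names `farr_resize` and `farr_count_even` are invented for this example, and zero-filling new elements is an arbitrary choice.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct farr_mb {
    int lg;
    int arr[];
} Farr_mb;

/* Resize *pp to hold lg ints. The whole structure may move, so the
   caller's pointer is updated through pp. *pp may be NULL.
   Returns 0 on success, -1 if memory could not be allocated. */
int farr_resize(Farr_mb **pp, int lg)
{
    assert(pp != NULL && lg >= 0);

    int old = *pp ? (*pp)->lg : 0;   /* read before realloc may move it */
    Farr_mb *tmp = realloc(*pp, sizeof *tmp + (size_t)lg * sizeof tmp->arr[0]);
    if (!tmp)
        return -1;                   /* *pp is still valid and unchanged */

    for (int i = old; i < lg; i++)
        tmp->arr[i] = 0;             /* zero any newly added elements */
    tmp->lg = lg;
    *pp = tmp;
    return 0;
}

/* Read-only access: a pointer to const data is enough. */
int farr_count_even(const Farr_mb *p)
{
    int n = 0;
    if (p)
        for (int i = 0; i < p->lg; i++)
            if (p->arr[i] % 2 == 0)
                n++;
    return n;
}
```

A caller would write `farr_resize(&a, 8)` and keep using `a` afterwards; passing `a` by value instead would leave the caller with a possibly dangling pointer.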
An unused/empty structure with a flexible array member can be represented by a NULL pointer (to such structure). This can be important when you have an array of arrays.
With structures with flexible array members, the outer array is just an array of pointers. With structures with pointer members, the outer array can be either an array of structures, or an array of pointers to structures.
Both can support different types of sub-arrays, if the structures have a common type tag as the first member, and you use a union of those structures. (What 'use' means in this context is unfortunately debatable. Some claim you need to access the array via the union; I claim the visibility of such a union is sufficient, because anything else would break a huge amount of existing POSIX C code; basically all server-side C code using sockets.)
Those are the major ones I can think of right now. Both forms are ubiquitous in my own code, and I have had no issues with either. (In particular, I prefer using a structure free helper function that poisons the structure to help detect use-after-free bugs in early testing; and my programs do not often have any memory-related issues.)
I will edit the above list, if I find I've missed important facets. Therefore, if you have a suggestion or think I've overlooked something above, please let me know in a comment, so I can verify and edit as appropriate.
So let's say I have a struct that looks like this
(pretty common in the real world, it turns out):
struct foo {
char bar[24];
uint32_t fnord;
uint32_t quux;
}__attribute__((aligned(4)));
What is the stride of bar, that is, what is &bar[1] - &bar[0],
given that it's in struct foo?
This has implications for sizeof(foo), which I'm pretty sure I wanted
to be 32, and I also wanted nice fast aligned operations on foo.fnord and foo.quux, or it wouldn't be aligned in the first place.
Per paragraph 6.2.5/20 of the standard,
An array type describes a contiguously allocated nonempty set of
objects with a particular member object type
(Emphasis added.) Thus, the elements of an array are always contiguous in memory. That is among the defining characteristics of an array. Linkage, storage class, membership in another data structure, alignment requirement of the array itself or of any data structure containing it -- none of these affect array elements' contiguity.
The alignment requirement of an array is normally a multiple of that of its element type, so that aligning the array itself also aligns all its elements. In no case are array elements subject to individual alignment.
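The layout in question can be checked with compile-time assertions. This is a sketch under assumptions: it needs C11, it leaves out the GCC-specific `__attribute__((aligned(4)))` (a struct containing `uint32_t` members normally gets suitable alignment without it), and the helper name `bar_stride` is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct foo {
    char     bar[24];
    uint32_t fnord;
    uint32_t quux;
};

/* bar occupies 24 contiguous bytes, so fnord starts right after it. */
static_assert(offsetof(struct foo, bar) == 0,    "bar is the first member");
static_assert(offsetof(struct foo, fnord) == 24, "no gap after bar");
static_assert(offsetof(struct foo, quux) == 28,  "quux follows fnord");
static_assert(sizeof(struct foo) == 32,          "no trailing padding");

/* Distance between consecutive elements of bar, in elements of bar.
   Since the elements are chars, this is also the distance in bytes. */
ptrdiff_t bar_stride(const struct foo *f)
{
    assert(f != NULL);
    return &f->bar[1] - &f->bar[0];
}
```

If any of the assertions failed on some unusual platform, the build would stop with the corresponding message instead of silently producing a different layout.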
While creating a linked list, a data type is supposed to be defined beforehand. Here is some pseudocode for creating a new linked list:
Type ListNode
Declare Pointer as integer
Declare Data as string
EndType
Declare Namelist[1-50] of ListNode
For Index=1 to 49
Namelist[Index].Pointer=Index + 1
Endfor
Namelist[50].Pointer=0
What confuses me is the similarity between newly defined data types and multidimensional arrays where multiple data elements could be stored.
How do newly defined data types allow the storage of multiple different data elements within a single array element?
Before answering your question, I want to define a few things. I am considering the C programming language here.
Array: An array is a container object that holds a fixed number of values of a single type.
Basic data-type: A basic type is a data type provided by a programming language as a basic building block.
Struct: A struct in the C programming language (and many derivatives) is a complex data type declaration that defines a physically grouped list of variables to be placed under one name in a block of memory, allowing the different variables to be accessed via a single pointer, or the struct declared name which returns the same address.
How do newly defined data types allow the storage of multiple different data elements within a single array element?
As mentioned above, an array contains objects of a single type. If you have defined a struct (in C) or a class (in Java), you can store objects of the type you defined in an array, and each array element will be a compound element.
Example
typedef struct student {
int id;
char name[30];
} Student;
This defines a type called struct student, and we have given it another name (Student) using typedef. Note that typedef is just a way to alias a type with a specific name.
Now, we can declare variables or array of variables of type Student:
Student record;
Student records[10];
We can declare the above in the following way as well.
struct student record;
struct student records[10];
Here, record is a single variable and records is an array which contains 10 variables of type struct student.
Now each element of array records has two data elements, id (type int) and name (type char array). This is how you can store different data elements in a single element of an array.
Note that if you declare an array of a primitive data type, say int or double, you will only be able to store elements of that specific type in the array, not elements of different data types.
Multidimensional array
int var[2][3];
Here, var is a multidimensional (2d) array which can hold 6 (2 x 3) integer values. Note that even though var is a 2d array, it can only contain elements of type int, not others.
Your understanding of multidimensional arrays is not quite right. As you said, multiple data elements can be stored in a multidimensional array. That is correct, but all the data elements will be of the same type. So the similarity between a multidimensional array and a user-defined data type (say, a struct in C) is only superficial.
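For comparison, the pseudocode from the question could look roughly like this in C. This is one possible translation, not the only one: it uses 0-based indices with -1 as the "no next node" marker (the pseudocode is 1-based and uses 0), a fixed 32-character buffer stands in for the string, and the function names are invented.

```c
#include <assert.h>
#include <stddef.h>

#define LIST_SIZE 50

/* One list node: two fields of different types under a single type. */
typedef struct list_node {
    int  next;        /* index of the next node, or -1 for "none" */
    char data[32];    /* assumed fixed-size buffer for the string */
} ListNode;

/* An array whose elements are all of the same type, ListNode. */
static ListNode name_list[LIST_SIZE];

/* Link every node to its successor, as the pseudocode loop does. */
void name_list_init(void)
{
    for (int i = 0; i < LIST_SIZE; i++) {
        name_list[i].next = (i < LIST_SIZE - 1) ? i + 1 : -1;
        name_list[i].data[0] = '\0';
    }
}

/* Count the nodes reachable from start by following the next indices. */
int name_list_length(int start)
{
    assert(start >= -1 && start < LIST_SIZE);
    int n = 0;
    for (int i = start; i != -1; i = name_list[i].next)
        n++;
    return n;
}
```

Each element of `name_list` is still of one single type; it is the struct type itself that bundles an `int` and a character buffer.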
Say I have:
struct a b[4];
//i filled some elements of b
I need to know the number of non-empty elements of b.
Since I'm not sure whether b has exactly 4 non-empty elements, is there any way to do this?
There is no way to retrieve this information. You have to keep track of the number of elements you use yourself.
Typically, C developers use another integer value alongside the array:
struct a b[4];
int b_count;
Increment the counter each time you fill an element in the array.
You can wrap all this into a structure, in order to keep the counter next to the array. This also allows you to return the array along with the counter from a function:
struct array {
struct a values[4];
int count;
};
struct array b;
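A small sketch of how such a wrapper might be used. The contents of `struct a` are not shown in the question, so a single `int` field is assumed here, and `array_push` and `CAPACITY` are names invented for the example.

```c
#include <assert.h>
#include <stddef.h>

#define CAPACITY 4

struct a { int value; };   /* placeholder: the real struct a is not shown */

struct array {
    struct a values[CAPACITY];
    int count;             /* number of elements in use, 0..CAPACITY */
};

/* Store item in the next free slot.
   Returns 0 on success, -1 if the array is already full. */
int array_push(struct array *arr, struct a item)
{
    assert(arr != NULL);
    if (arr->count >= CAPACITY)
        return -1;         /* refuse to write past the end of values */
    arr->values[arr->count++] = item;
    return 0;
}
```

After any sequence of pushes, `arr->count` is the number of non-empty elements, and the bounds check keeps writes inside the array.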
There are two normal ways to do this.
The first is to have some sort of sentinel value which indicates that the array element isn't in use. For example, if you were storing quantities in an integer, you could use the value -1 to indicate it wasn't in use.
As a more relevant example to your situation:
struct a {
int inUse;
// all other fields in structure
};
and set inUse within the array element to 1 or 0 depending on whether that array element is in use.
The second is to maintain extra information outside of the array to indicate which elements were in use. This could be a map if the usage information was sparse, or just a count if you could guarantee active elements would be contiguous at the start.
For a map, you could use:
struct a b[4];
int inUse[4]; // shows inUse indication for each element.
For a simpler count variation:
struct a b[4];
int inUseAtStart; // 0 thru 4 shows how many elements are in use,
// starting at b[0].
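With the in-structure flag variant, counting the used elements is a simple loop. This is a minimal sketch; the `payload` field stands in for "all other fields" and `count_in_use` is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>

struct a {
    int inUse;      /* 1 if this element holds valid data, 0 otherwise */
    int payload;    /* placeholder for all other fields in the structure */
};

/* Count how many of the first n elements are marked as in use. */
size_t count_in_use(const struct a *items, size_t n)
{
    assert(items != NULL || n == 0);
    size_t used = 0;
    for (size_t i = 0; i < n; i++)
        if (items[i].inUse)
            used++;
    return used;
}
```

This only works if every element's `inUse` field is initialized before counting, for example with `struct a b[4] = {0};`.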
There is no such empty or non-empty distinction in C.
The very thing you describe as empty may refer to uninitialized variables.
You will have to keep track of how many elements of the array you use when you populate it. Note that you must do this anyway, because C does no bounds checking on arrays: you have to make sure you do not exceed the bounds of the array (you end up with undefined behavior if you do), and while doing so you can easily keep track of how many elements you have used.
C adds no overhead to its arrays, so it does not store any additional information such as an element count. There's a decent C++ std::vector template for this in case you don't want to do it yourself (which can be annoying) and are willing to use C++, just saying :)
One thing you can do is mark the item after the last one you inserted. For example, if you used 2 elements, you can mark the third element with a specific value such as -1.
Another way is to keep a variable that holds the count of the elements in use.
I need high-performance iteration over transient arrays (on stack and/or in heap) which can store mixed types of data, including various types of pointers.
I thought of using unions to determine the largest size among the several supported array member types.
Is the following the fastest (safe) architecture and solution?
union array_sizer {
void *(* funcPtr)();
void *dataPtr;
struct {int i;} *strPtr;
int intVal;
float floatVal;
};
// create an array of 10 item *pairs*.
union array_sizer *myArray = malloc(22 * sizeof(union array_sizer));
// fill up the (null-terminated) array
// then, knowing that every *even* item is an int...
for(int i=0; myArray[i].intVal; i+=2){
//(... do something in loop ...)
}
The array data will be created by functions which enforce data integrity, so the for loop can be pretty skimpy on error checking beyond null-termination.
I would make a parallel 'look up table' which indexes into the original array to say what type each item is. So make an enum representing the types and make a corresponding array of it.
However, if you look at performance, doing it this way is likely to cost you cache misses (and possibly page faults), because the 2 arrays are probably going to be on different pages. To get around this, instead of a 'struct of arrays' make an 'array of structs'. To do this, create a struct which has 2 members: the enum type constant and the data itself. If you do this, then whenever we fetch an index, the data and the corresponding type information will sit next to each other in memory.
This would be my preferred method from a high level design point of view.
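A minimal sketch of the 'array of structs' layout described above. The names (`item_type`, `struct item`, `sum_ints`) and the use of an `ITEM_END` tag as terminator are assumptions made for this example; the question's function-pointer and struct-pointer members are left out to keep it short.

```c
#include <assert.h>
#include <stddef.h>

/* Type tag stored next to each value. ITEM_END marks the end of the array. */
enum item_type { ITEM_END = 0, ITEM_INT, ITEM_FLOAT, ITEM_DATA_PTR };

struct item {
    enum item_type type;    /* says which union member is valid */
    union {
        int   intVal;
        float floatVal;
        void *dataPtr;
    } as;
};

/* Sum all int items, stopping at the ITEM_END terminator. */
long sum_ints(const struct item *items)
{
    assert(items != NULL);
    long total = 0;
    for (const struct item *it = items; it->type != ITEM_END; it++)
        if (it->type == ITEM_INT)
            total += it->as.intVal;
    return total;
}
```

Terminating on the tag rather than on a zero `intVal` means a legitimate value of 0 does not end the loop early, and the loop never reads a union member that was not the one last stored.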