fastest multi-type array solution?

I need high-performance iteration over transient arrays (on stack and/or in heap) which can store mixed types of data, including various types of pointers.
I thought of using unions to determine largest size of several supported array members.
Is the following the fastest (safe) architecture and solution?
union array_sizer {
    void *(*funcPtr)();
    void *dataPtr;
    struct {int i;} *strPtr;
    int intVal;
    float floatVal;
};
// create an array of 10 item *pairs*.
union array_sizer *myArray = malloc(22 * sizeof(union array_sizer));
// fill up the (null-terminated) array
// then, knowing that every *even* item is an int...
for(int i=0; myArray[i].intVal; i+=2){
//(... do something in loop ...)
}
The array data will be created by functions which enforce data integrity, so the for loop can be pretty skimpy on error checking beyond null-termination.

I would make a parallel lookup table, indexed the same way as the original array, that records the type of each element. So define an enum representing the types and keep a corresponding array of those enum values.
However, if you do it this way, performance can suffer: the two arrays are likely to sit in different cache lines (or even on different pages), so each lookup risks an extra cache miss. To get around this, use an 'array of structs' instead of a 'struct of arrays'. Create a struct which has two members: the enum type constant and the data itself. When you fetch an element, the data and its type information are then adjacent in memory and are loaded together.
This would be my preferred method from a high level design point of view.
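A minimal sketch of the array-of-structs idea described above. The names (item_tag, struct item, sum_ints) and the TAG_END terminator are made up for illustration and are not from the question; the question's null-terminated pairs are replaced here by a tag that marks the end.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type tags; TAG_END terminates the array. */
enum item_tag { TAG_END = 0, TAG_INT, TAG_FLOAT, TAG_DATA_PTR };

/* The tag and the payload are adjacent, so they are fetched together. */
struct item {
    enum item_tag tag;
    union {
        int   int_val;
        float float_val;
        void *data_ptr;
    } u;
};

/* Sum every int in a TAG_END-terminated array, skipping other types. */
static long sum_ints(const struct item *items)
{
    long total = 0;
    assert(items != NULL);
    for (const struct item *p = items; p->tag != TAG_END; ++p) {
        if (p->tag == TAG_INT)
            total += p->u.int_val;
    }
    return total;
}
```

Because each element carries its own tag, the loop needs no second array and no assumption about which positions hold which type.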

Related

Flexible array member and pointer member : pros and cons?

What is the difference between using a flexible array member (FAM) and a pointer member? In both cases, a malloc and an element-by-element assignment must be done. But with a FAM, memory is allocated for the whole structure, whereas with a pointer member, memory is allocated only for the array the pointer refers to (see code). What are the pros and cons of these two methods?
#include <stdio.h>
#include <stdlib.h>
typedef struct farr_mb {
int lg;
int arr[];
} Farr_mb;
typedef struct ptr_mb {
int lg;
int * ptr;
} Ptr_mb;
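int main(void) {
    int lg = 5;
    /* FAM: one allocation holds both the header and the array */
    Farr_mb *a = malloc(sizeof(Farr_mb) + lg * sizeof(int));
    /* Pointer member: the struct is on the stack, only the array is allocated */
    Ptr_mb b;
    b.ptr = malloc(lg * sizeof(int));
    if (a == NULL || b.ptr == NULL) { free(a); free(b.ptr); return 1; }
    a->lg = lg;
    b.lg = lg;
    for (int i = 0; i < lg; i++) (a->arr)[i] = i;
    for (int i = 0; i < lg; i++) (b.ptr)[i] = i;
    /* print the stored values (the original assigned inside printf) */
    for (int i = 0; i < lg; i++) printf("%d \t", (a->arr)[i]);
    printf("\n");
    for (int i = 0; i < lg; i++) printf("%d \t", (b.ptr)[i]);
    printf("\n");
    free(a);
    free(b.ptr);
    return 0;
}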

Before we get to the pros and cons, let's look at some real-world examples.
Let's say we wish to implement a hash table, where each entry is a dynamically managed array of elements:
struct hash_entry {
size_t allocated;
size_t used;
element array[];
};
struct hash_table {
size_t size;
struct hash_entry **entry;
};
#define HASH_TABLE_INITIALIZER { 0, NULL }
This in fact uses both. The hash table itself is a structure with two members. The size member indicates the size of the hash table, and the entry member is a pointer to an array of hash table entry pointers. This way, each unused entry is just a NULL pointer. When adding elements to a hash table entry, the entire struct hash_entry can be reallocated (to sizeof (struct hash_entry) + allocated * sizeof (element)) or freed, as long as the corresponding pointer in the entry member in the struct hash_table is updated accordingly.
If we used element *array instead, we would need to use struct hash_entry *entry; in the struct hash_table; or allocate the struct hash_entry separately from the array; or allocate both struct hash_entry and array in a single chunk, with the array pointer pointing just after the same struct hash_entry.
The cost of that would be two extra size_ts worth of memory used for each unused hash table slot, as well as an extra pointer dereference when accessing elements. (Or, to get the address of the array, two consecutive pointer dereferences, instead of one pointer dereference plus offset.) If this is a key structure heavily used in an implementation, that cost can be visible in profiling, and negatively affect cache performance. For random accesses, the larger the element array is, the less difference there is, however; the cost is largest when the arrays are small, and fit within the same cacheline (or a few cachelines) as the allocated and used members.
We do not usually want to make the entry member in the struct hash_table a flexible array member, because that would mean you no longer can declare a hash table statically, using struct hash_table my_table = HASH_TABLE_INITIALIZER;; you would need to use a pointer to a table, and an initializer function: struct hash_table *my_table; my_table = hash_table_init(); or similar.
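To make the reallocation step concrete, here is a rough sketch of appending to one slot of such a table. The function name hash_slot_append, the growth policy (start at 4, then double) and the choice of int for element are my assumptions; the answer leaves all of these unspecified.

```c
#include <assert.h>
#include <stdlib.h>

typedef int element;   /* assumption: `element` is not defined in the answer */

struct hash_entry {
    size_t  allocated;
    size_t  used;
    element array[];   /* flexible array member */
};

struct hash_table {
    size_t              size;
    struct hash_entry **entry;   /* pointer member: array of entry pointers */
};

/* Append value to the given slot, growing the entry when it is full.
   Returns 0 on success, -1 on a bad slot or allocation failure. */
static int hash_slot_append(struct hash_table *table, size_t slot, element value)
{
    assert(table != NULL);
    if (slot >= table->size)
        return -1;

    struct hash_entry *e = table->entry[slot];
    size_t used = e ? e->used : 0;

    if (e == NULL || used >= e->allocated) {
        size_t n = e ? 2 * e->allocated : 4;
        struct hash_entry *grown =
            realloc(e, sizeof *grown + n * sizeof grown->array[0]);
        if (grown == NULL)
            return -1;             /* the old entry is still valid */
        grown->allocated = n;
        grown->used = used;
        table->entry[slot] = grown; /* the whole struct may have moved */
        e = grown;
    }
    e->array[e->used++] = value;
    return 0;
}
```

Note that an unused slot stays a NULL pointer until the first append, and that the only place that must learn about a moved entry is table->entry[slot].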
I do have another example of related data structures using both pointer members and flexible array members. It allows one to use variables of type matrix to represent any 2D matrix with double entries, even when a matrix is a view to another (say, a transpose, a block, a row or column vector, or even a diagonal vector); these views are all equal (unlike in e.g. GNU Scientific Library, where matrix views are represented by a separate data type). This matrix representation approach makes writing robust numerical linear algebra code easy, and the ensuing code is much more readable than when using GSL or BLAS+LAPACK. In my opinion, that is.
So, let's look at the pros and cons, from the point of view of how to choose which approach to use. (For that reason, I will not designate any feature as "pro" or "con", as the determination depends on the context, on each particular use case.)
Structures with flexible array members cannot be initialized statically. You can only refer to them via pointers.
You can declare and initialize structures with pointer members. As shown in the above example, using a preprocessor initializer macro can mean you do not need an initializer function. For example, a function accepting a struct hash_table *table parameter can always resize the array of pointers using realloc(table->entry, newsize * sizeof table->entry[0]), even when table->entry is NULL. This reduces the number of functions needed, and simplifies their implementation.
Accessing an array via a pointer member can require an extra pointer dereference.
If we compare the accesses to arrays in statically initialized structures with pointer to the array, to a structure with a flexible array member referred via a static pointer, the same number of dereferences are made.
If we have a function that gets the address of a structure as a parameter, then accessing an array element via a pointer member requires two pointer dereferences, whereas accessing a flexible array element requires only one pointer dereference and one offset. If the array elements are small enough and the array index small enough, so that the accessed array element is in the same cacheline, the flexible array member access is often significantly faster. For larger arrays, the difference in performance tends to be insignificant. This does vary between hardware architectures, however.
Reallocating an array via a pointer member hides the complexity from those using the structure as an opaque variable.
This means that if we have a function that receives a pointer to a structure as a parameter, and that structure has a pointer to a dynamically allocated array, the function can reallocate that array without the caller seeing any change in the structure address itself (only structure contents change).
However, if we have a function that receives a pointer to a structure with a flexible array member, reallocating the array means reallocating the entire structure. That potentially modifies the address of the structure. Because the pointer is passed by value, the modification is not visible to the caller. Thus, a function that may resize a flexible array member, must receive a pointer to a pointer to the structure with a flexible array member.
If the function only examines the contents of a structure with a flexible array member, say counts the number of elements that fulfill some criteria, then a pointer to the structure suffices; and both the pointer and the pointed-to data can be marked const. This might help the compiler produce better code. Furthermore, all the data accessed is linear in memory, which helps more complex processors manage caching more efficiently. (To do the same with an array having a pointer member, one would need to pass the pointer to the array, as well as the size field at least, as parameters to the counting function, instead of a pointer to the structure containing those values.)
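The difference in the resize interface can be sketched as follows. The struct and function names (ptr_vec, fam_vec, ptr_vec_resize, fam_vec_resize) are hypothetical; only the shape of the parameters matters here.

```c
#include <assert.h>
#include <stdlib.h>

struct ptr_vec { size_t len; int *data; };   /* pointer member */
struct fam_vec { size_t len; int  data[]; }; /* flexible array member */

/* Pointer member: only v->data changes, the struct itself stays put,
   so a plain pointer to the struct is enough. New elements are
   left uninitialized. */
static int ptr_vec_resize(struct ptr_vec *v, size_t n)
{
    assert(v != NULL && n > 0);
    int *p = realloc(v->data, n * sizeof *p);
    if (p == NULL)
        return -1;
    v->data = p;
    v->len = n;
    return 0;
}

/* Flexible array member: the whole struct is reallocated and may move,
   so the caller's pointer must be updated through a pointer to it. */
static int fam_vec_resize(struct fam_vec **vp, size_t n)
{
    assert(vp != NULL && n > 0);
    struct fam_vec *v = realloc(*vp, sizeof *v + n * sizeof v->data[0]);
    if (v == NULL)
        return -1;      /* *vp is untouched and still valid */
    v->len = n;
    *vp = v;
    return 0;
}
```

A caller would start from struct ptr_vec pv = { 0, NULL }; or struct fam_vec *fv = NULL; and pass &pv or &fv respectively.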
An unused/empty structure with a flexible array member can be represented by a NULL pointer (to such structure). This can be important when you have an array of arrays.
With structures with flexible array members, the outer array is just an array of pointers. With structures with pointer members, the outer array can be either an array of structures, or an array of pointers to structures.
Both can support different types of sub-arrays, if the structures have a common type tag as the first member, and you use a union of those structures. (What 'use' means in this context is unfortunately debatable. Some claim you need to access the array via the union; I claim the visibility of such a union is sufficient, because anything else will break a huge amount of existing POSIX C code: basically all server-side C code using sockets.)
Those are the major ones I can think of right now. Both forms are ubiquitous in my own code, and I have had no issues with either. (In particular, I prefer using a structure free helper function that poisons the structure to help detect use-after-free bugs in early testing; and my programs do not often have any memory-related issues.)
I will edit the above list, if I find I've missed important facets. Therefore, if you have a suggestion or think I've overlooked something above, please let me know in a comment, so I can verify and edit as appropriate.

Creating structs and unions on MPI

EDITED:
Thanks for the previous assistance. I really appreciate it.
Should I remove the pointer from *list too?
I'm also using pointers to create atributo and estructura; will that be a problem too?
union atributo{
int valor1;
char valor2[tamChar];
float valor3;
};
struct estructura{
int tipo[tamEstruct]; //Here i will have an array with the types of the union
union atributo *list[tamEstruct];
};
union atributo *atributos;
struct estructura *estructuras;
estructuras = malloc (sizeof(struct estructura) * (cantEstruct) );
MPI_Datatype atributo_MPI;
MPI_Datatype type[1] = { MPI_BYTE };
int blocklen[1] = { tamChar }; // the largest element is the char array
MPI_Aint disp[1];
disp[0]= atributos[0]; // the begin of the union
MPI_Datatype estructura_MPI;
MPI_Datatype type[2] = { MPI_INT, atributo_MPI };
int blocklen[2] = { tamEstruct, tamEstruct};
MPI_Aint disp2[2];
disp2[0]= offsetof(atributos, tipo);
disp2[1]= offsetof(atributos, list);
Am I getting close to the correct code?

There are several things wrong in that bit of code.
First, structs and unions are very different in this context. A struct contains each of its elements listed, lined up one after another in memory. A union, on the other hand, is only as large as its largest element, and all of its members share the same memory space (as if each were the sole member of a struct). This means you can't describe a union with an MPI struct datatype (as built with MPI_Type_create_struct), since every member's offset is in fact 0.
There are two ways to tackle the problem of an array of unions:
Send them as an array of MPI_BYTE. This is simple, but risks data corruption if your processes don't share the same data representation (e.g. their integers differ in endianness).
Use MPI_Pack() to pack the values one by one. This is type safe, but more complicated, and it will result in an in-memory copy of your data.
Note that in either case, the receiving process will need to know what values it received (i.e. if it received 3 unions, are those 3 integers, or 2 integers and a float, etc). You'll have to send this information as well if you use unions.
You might want to consider a trickier approach: group the unions according to which of their values is valid, and send a message in which similar unions are placed next to each other (e.g. a message that contains an array of MPI_INTs, then an array of MPI_FLOATs and so on). With clever use of derived MPI datatypes, you can preserve type safety, avoid in-memory copying, and avoid sending several messages.
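As a rough illustration of the grouping step only, here is a plain C sketch with no MPI calls in it. Unlike the derived-datatype approach described above, it does copy the values into per-type arrays, which could then each be sent with a single basic MPI type. TAM_CHAR, the enum tipo constants and group_by_type are names I made up; the question only shows tamChar and an int tipo array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TAM_CHAR 16   /* assumed value; the question does not give tamChar */

/* Hypothetical tags saying which union member is valid. */
enum tipo { TIPO_INT = 0, TIPO_CHAR = 1, TIPO_FLOAT = 2 };

union atributo {
    int   valor1;
    char  valor2[TAM_CHAR];
    float valor3;
};

/* Copy each attribute into the array for its type. Every output array
   must have room for n elements; counts[t] receives the number of
   elements of type t. */
static void group_by_type(const int *tipo, const union atributo *attr, size_t n,
                          int *ints, char (*strs)[TAM_CHAR], float *floats,
                          size_t counts[3])
{
    counts[TIPO_INT] = counts[TIPO_CHAR] = counts[TIPO_FLOAT] = 0;
    for (size_t i = 0; i < n; i++) {
        switch (tipo[i]) {
        case TIPO_INT:
            ints[counts[TIPO_INT]++] = attr[i].valor1;
            break;
        case TIPO_CHAR:
            memcpy(strs[counts[TIPO_CHAR]++], attr[i].valor2, TAM_CHAR);
            break;
        case TIPO_FLOAT:
            floats[counts[TIPO_FLOAT]++] = attr[i].valor3;
            break;
        default:
            assert(!"unknown type tag");
        }
    }
}
```

The receiver still needs the three counts (or the original tipo array) to know how to interpret what arrives.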
Second: NEVER, ever send pointers between MPI processes! It's easy to think of a char* as an array of chars, but it really isn't. It's just a memory address that won't make sense to the process you send it to, and even if it did, you didn't actually send the data that it was pointing to. If valor2 is a string with a reasonably short maximum length, I'd recommend declaring it as char valor2[maxLength] so that memory for it is allocated inside the actual struct/union. If that is not feasible, you'll have to do even more memory juggling to get your strings across to the other processes, as you would with any variable sized array.

What is there to be gained by deterministic field ordering in the memory layout?

Members of a structure are allocated within the structure in the order of their appearance in the declaration and have ascending addresses.
I am faced with the following dilemma: when I need to declare a structure, do I
(1) group the fields logically, or
(2) in decreasing size order, to save RAM and ROM size?
Here is an example, where the largest data member should be at the top, but also should be grouped with the logically-connected colour:
struct pixel{
int posX;
int posY;
tLargeType ColourSpaceSecretFormula;
char colourRGB[3];
};
The padding of a structure is not fixed by the standard (that is, it is implementation-dependent), so we cannot reliably do pointer arithmetic on structure elements (and we shouldn't: imagine someone reordering the fields to his liking: BOOM, the whole code stops working).
-fpack-struct solves this in gcc, but bears other limitations, so let's leave compiler options out of the question.
On the other hand, code should be, above all, readable. Micro optimizations are to be avoided at all cost.
So, I wonder, why are structures' members ordered by the standard, making me worry about the micro-optimization of ordering struct member in a specific way?

The compiler is constrained by several traditional and practical requirements.
The pointer to the struct after a cast (the standard calls it "suitably converted") will be equal to the pointer to the first element of the struct. This has often been used to implement overloading of messages in message passing. In that case a struct has the first element that describes what type and size the rest of the struct is.
The last element can be a dynamically resized array. Even before official language support this has often been used in practice. You allocate sizeof(struct) + length of extra data and can access the last element as a normal array with as many elements as you allocated.
Those two things force the compiler to have the first and last elements in the struct in the same order as they are declared.
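A small sketch of the first point, the message-passing pattern where every message starts with a header. The struct names and the msg_type helper are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical header that every message starts with. */
struct msg_header {
    int    type;   /* what kind of message follows */
    size_t size;   /* total size of the message in bytes */
};

struct msg_text {
    struct msg_header hdr;   /* must stay the first member */
    char text[32];
};

/* Works for any message struct whose first member is a msg_header:
   a pointer to the struct, suitably converted, points to that member. */
static int msg_type(const void *msg)
{
    assert(msg != NULL);
    const struct msg_header *hdr = msg;
    return hdr->type;
}
```

If the compiler were free to move hdr away from offset 0, msg_type would read garbage, which is why the first member has to stay first.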
Another practical requirement is that every compilation must order the struct members the same way. A smart compiler could make a decision that since it sees that some struct members are always accessed close to each other they could be reordered in a way that makes them end up in a cache line. This optimization is of course impossible in C because structs often define an API between different compilation units and we can't just reorder things differently on different compilations.
The best we could do given the limitations is to define some kind of packing order in the ABI to minimize alignment waste that doesn't touch the first or last element in the struct, but it would be complex, error prone and probably wouldn't buy much.

If you couldn't rely on the ordering, then it would be much harder to write low-level code which maps structures onto things like hardware registers, network packets, external file formats, pixel buffers, etc.
Also, some code uses a trick where it assumes that the last member of the structure is the highest-addressed in memory to signify the start of a much larger data block (of unknown size at compile time).

Reordering the fields of structures can sometimes yield good gains in data size and often also in code size, especially in the 64-bit memory model. Here is an example to illustrate (assuming common alignment rules):
struct list {
int len;
char *string;
bool isUtf;
};
will take 12 bytes in 32 bit but 24 in 64 bit mode.
struct list {
char *string;
int len;
bool isUtf;
};
will take 12 bytes in 32 bit but only 16 in 64 bit mode.
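If you have an array of these structures, you save a third of the data size in 64-bit mode (24 bytes down to 16 per element; put the other way, the first layout is 50% larger), and you also gain in code size, as indexing on a power of 2 is simpler than on other sizes.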
If your structure is a singleton or not frequent, there's not much point in reordering the fields. If it is used a lot, it's a point to look at.
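You can check what your own compiler does with a few lines like the following. The struct names list_logical and list_packed and the helper padding_of are mine; the actual numbers are implementation-dependent, so the sketch does not hard-code the 12/16/24 figures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Field order as in the first example above. */
struct list_logical {
    int   len;
    char *string;
    bool  isUtf;
};

/* Field order as in the second example: pointer first. */
struct list_packed {
    char *string;
    int   len;
    bool  isUtf;
};

/* Bytes of padding in a struct holding one int, one char* and one bool. */
static size_t padding_of(size_t struct_size)
{
    size_t payload = sizeof(int) + sizeof(char *) + sizeof(bool);
    assert(struct_size >= payload);
    return struct_size - payload;
}
```

Printing padding_of(sizeof(struct list_logical)) and padding_of(sizeof(struct list_packed)) shows how many bytes each ordering wastes on your platform.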
As for the other point of your question, why the compiler doesn't do this reordering of fields: if it did, it would be difficult to implement unions of structures that share a common initial pattern. For example:
enum type { TYPE_A, TYPE_B, TYPE_C }; /* example tags */
struct header {
    enum type type;
    int len;
};
struct a {
    enum type type;
    int len;
    bool whatever1;
};
struct b {
    enum type type;
    int len;
    long whatever2;
    long whatever4;
};
struct c {
    enum type type;
    int len;
    float fl;
};
union u {
    struct header header;
    struct a a;
    struct b b;
    struct c c;
};
If the compiler rearranged the fields, this construct would be much more inconvenient, as there would be no guarantee that the type and len fields were identical when accessing them via the different structs included in the union.
If I remember correctly the standard even mandates this behaviour.

Array of structure with struct hack

Suppose I have structure in C like this
struct A {
int len;
char s[1];
};
I want to have an array of the above structure, but the char s[1] member of struct A can be of variable length. How can we do this? Even the struct hack trick in C99 doesn't seem to work here. One solution is to have char * as the last member and do dynamic memory allocation, but I want all data of the struct to be in contiguous locations, as my implementation needs to be cache oblivious.

You can't have an array of variable size objects, so you can't have an array of structures using the struct hack. All the objects in an array must be the same size. And if they're all the same size, the size must be implied by the structure, so you won't be using the struct hack after all; there'll be a size other than 1 in the dimension of the array s in your structure (unless 1 is big enough for everything). The reason is that the storage location for a[i] (where a is the name of an array and i is an index into the array) must be computable as 'byte address of a plus (i times size of one object in array)'. So the size of the objects in the array (in this case, structures) must be known and fixed.
As an alternative, you can have an array of pointers to variable size objects; you simply have to arrange to allocate each object separately with the appropriate size, and save the pointer to that in the array.
Note that C99 does away with the 'struct hack' (which was never officially portable, though in practice it was) and introduces 'flexible array members' instead:
struct A {
int len;
char data[];
};
However, the advice above still applies.

If there is a maximum size for "s", you could use that instead of [1]. That keeps everything contiguous.
If you really don't want to use dynamic memory, then you can't do it with an array. You need your own "manager" that will use the struct hack trick on each member individually - but that means you can't do indexed lookups - you have to look at each element to see how big it is and jump the right number of bytes to the next element.

In C, array indexing involves multiplying the index by the compile-time-constant size of an individual element and adding the result to the base address. For that reason, you can't use built-in array support directly with the "struct hack": each s member will be allocated exactly the 1 byte you request, and indices past it will access the following S elements in the array (or go off the end completely, possibly crashing).
If you really need contiguous data for cache-access speed, you can pack it yourself. You can solve this (like most things) with an indirection: have a contiguous array of S*, and manually pack your data into another contiguous buffer (malloc() or stack-allocate enough memory for all your S objects, including the real data size of all s[] members). Your performance may suffer (or your program may crash) if the int len members aren't properly aligned for your architecture, so you may need to manually pad between S instances.
S* index[100]                 char data[10000];
(S*)(data) -----------------> S with 14-byte s[] using data[0]..[17]
(S*)(data + 20) -----\        2 byte padding so next S is 4-byte aligned
(S*)(data + 32) --\   \-----> S with 7-byte s[] using data[20]..[30]
                   \          1 byte padding...
                    \-------> ...
Unfortunately, this is quite an inflexible data layout: you can't just grow the amount of data in an element's s member without shuffling all the other data out of the way and patching the index. But that's normal for arrays, so if you were already considering using them then perhaps this will suit you. Another hassle is calculating the total size of the S structs (including s[] and any padding) up front.
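Here is a rough sketch of that packing scheme, using a flexible array member rather than s[1]. The helper names align_up and pack_strings are made up, the payloads are taken from C strings purely for convenience, and the two-pass structure (size first, then copy) is just one way to handle the up-front size calculation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct S {
    int  len;
    char s[];   /* payload, not NUL-terminated in this sketch */
};

/* Round n up to the alignment that struct S requires. */
static size_t align_up(size_t n)
{
    size_t a = _Alignof(struct S);
    return (n + a - 1) / a * a;
}

/* Pack count strings into one malloc'd buffer and store a pointer to
   each record in index[]. Returns the buffer (free it when done), or
   NULL if the allocation fails. */
static void *pack_strings(const char *const *strs, size_t count, struct S **index)
{
    assert(strs != NULL && index != NULL);

    /* Pass 1: total size, including padding between records. */
    size_t total = 0;
    for (size_t i = 0; i < count; i++)
        total = align_up(total) + sizeof(struct S) + strlen(strs[i]);

    unsigned char *buf = malloc(total ? total : 1);
    if (buf == NULL)
        return NULL;

    /* Pass 2: copy each record and remember where it starts. */
    size_t off = 0;
    for (size_t i = 0; i < count; i++) {
        size_t len = strlen(strs[i]);
        off = align_up(off);
        struct S *rec = (struct S *)(buf + off);
        rec->len = (int)len;
        memcpy(rec->s, strs[i], len);
        index[i] = rec;
        off += sizeof(struct S) + len;
    }
    return buf;
}
```

On a platform where int is 4 bytes with 4-byte alignment, packing a 14-byte and a 7-byte payload puts the records at offsets 0 and 20, matching the diagram above.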

What is the cause of flexible array member not at end of struct error?

I am wondering why I keep getting the "error: flexible array member not at end of struct" error when I call malloc. I have a struct with a variable length array, and I keep getting this error.
The struct is,
typedef struct {
size_t N;
double data[];
int label[];
} s_col;
and the call to malloc is,
col = malloc(sizeof(s_col) + lc * (sizeof(double) + sizeof(int)));
Is this the correct call to malloc?

You can only have one flexible array member in a struct, and it must always be the last member of the struct. In other words, in this case you've gone wrong before you call malloc, to the point that there's really no way to call malloc correctly for this struct.
To do what you seem to want (arrays of the same number of data and label members), you could consider something like:
struct my_pair {
double data;
int label;
};
typedef struct {
size_t N;
struct my_pair data_label[];
} s_col;
Note that this is somewhat different though: instead of an array of doubles followed by an array of ints, it gives you an array of one double followed by one int, then the next double, next int, and so on. Whether this is close enough to the same or not will depend on how you're using the data (e.g., for passing to an external function that expects a contiguous array, you'll probably have to do things differently).
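With that layout, the malloc call the question asks about could look roughly like this. The helper names s_col_alloc and s_col_set are hypothetical, and zero-initializing the pairs is my choice, not something the question requires.

```c
#include <assert.h>
#include <stdlib.h>

struct my_pair {
    double data;
    int    label;
};

typedef struct {
    size_t N;
    struct my_pair data_label[];   /* the single, final flexible array member */
} s_col;

/* Allocate an s_col with room for n pairs, all zeroed.
   Returns NULL on allocation failure; free() the result when done. */
static s_col *s_col_alloc(size_t n)
{
    s_col *col = malloc(sizeof *col + n * sizeof col->data_label[0]);
    if (col == NULL)
        return NULL;
    col->N = n;
    for (size_t i = 0; i < n; i++) {
        col->data_label[i].data = 0.0;
        col->data_label[i].label = 0;
    }
    return col;
}

/* Store one (data, label) pair at index i. */
static void s_col_set(s_col *col, size_t i, double data, int label)
{
    assert(col != NULL && i < col->N);
    col->data_label[i].data = data;
    col->data_label[i].label = label;
}
```

Using sizeof col->data_label[0] rather than sizeof(double) + sizeof(int) matters here, because each pair usually includes padding after the int.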

Given a struct definition and a pointer to the start of a struct, it is necessary that the C compiler be able to access any member of the struct without having to access anything else. Since the location of each item within the structure is determined by the number and types of items preceding it, accessing any item requires that the number and types of all preceding items be known. In the particular case where the last item is an array, this poses no particular difficulty since accessing an item in an array requires knowing where it starts (which requires knowing the number and type of preceding items, rather than the number of items in the array itself), and the item index (which the compiler may assume to be smaller than the number of items for which space exists, without having to know anything about the array size). If a Flexible Array Member appeared anywhere other than at the end of a struct, though, the location of any items which followed it would depend upon the number of items in the array--something the compiler isn't going to know.

typedef struct {
size_t N;
double data[];
int label[];
} s_col;
You can't have a flexible array member (double data[]) in the middle of a struct. Consider a hard-coded array size or double *data.
