Creating structs and unions on MPI - c

EDITED:
Thanks for the previous assistance. I really appreciate it.
Should I remove the pointer from *list too?
To create atributo and estructura I'm using pointers too; will that be a problem as well?
union atributo{
int valor1;
char valor2[tamChar];
float valor3;
};
struct estructura{
int tipo[tamEstruct]; //Here i will have an array with the types of the union
union atributo *list[tamEstruct];
};
union atributo *atributos;
struct estructura *estructuras;
estructuras = malloc (sizeof(struct estructura) * (cantEstruct) );
MPI_Datatype atributo_MPI;
MPI_Datatype type[1] = { MPI_BYTE };
int blocklen[1] = { tamChar }; // the largest element is the char array
MPI_Aint disp[1];
disp[0]= atributos[0]; // the begin of the union
MPI_Datatype estructura_MPI;
MPI_Datatype type[2] = { MPI_INT, atributo_MPI };
int blocklen[2] = { tamEstruct, tamEstruct};
MPI_Aint disp2[2];
disp2[0]= offsetof(atributos, tipo);
disp2[1]= offsetof(atributos, list);
Am I getting close to the correct code?

There are several things wrong in that bit of code.
First, structs and unions are very different in this context. A struct contains each of its elements listed, lined up one after another in memory. A union, on the other hand, is only as large as its largest element, and all of its members share the same memory space (each as if it were the sole member of a struct). This means you can't describe a union with an MPI struct datatype (MPI_Type_create_struct), since every member's offset is in fact 0.
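To make the layout point concrete, here is a minimal sketch that checks the member offsets of the union from the question. TAM_CHAR is an assumed value (the question never shows what tamChar is), and the two helper functions are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define TAM_CHAR 16  /* assumed value; the question leaves tamChar undefined */

union atributo {
    int   valor1;
    char  valor2[TAM_CHAR];
    float valor3;
};

/* Returns 1 if every member of the union starts at offset 0. */
int union_members_overlap(void)
{
    return offsetof(union atributo, valor1) == 0
        && offsetof(union atributo, valor2) == 0
        && offsetof(union atributo, valor3) == 0;
}

/* Returns 1 if the union is at least as large as its largest member. */
int union_fits_largest_member(void)
{
    return sizeof(union atributo) >= TAM_CHAR;
}
```

Both functions return 1 on any conforming compiler, which is why a struct-style list of per-member displacements carries no information for a union.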
There are two ways to tackle the problem of an array of unions:
Send them as an array of MPI_BYTE. This is simple, but risks data corruption if your processes don't share the same data representation (e.g. their integers differ in endianness).
Use MPI_Pack() to pack the values one by one. This is type safe, but more complicated, and it will result in an in-memory copy of your data.
Note that in either case, the receiving process will need to know what values it received (i.e. if it received 3 unions, are those 3 integers, or 2 integers and a float, etc). You'll have to send this information as well if you use unions.
You might want to consider a trickier approach: group the unions according to which of their members is valid, and send a message in which similar unions are placed next to each other (e.g. a message that contains an array of MPI_INTs, then an array of MPI_FLOATs and so on). With clever use of derived MPI datatypes, you can preserve type safety, avoid in-memory copying, and avoid sending several messages.
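The grouping idea can be sketched in plain C, without any MPI calls. Note that this version copies the values into two dense arrays, so it only illustrates the grouping itself, not the copy-free derived-datatype variant. The enum tipo tags, union valor, and group_by_type are hypothetical names, and the sketch assumes only int and float members.

```c
#include <assert.h>
#include <stddef.h>

enum tipo { TIPO_INT, TIPO_FLOAT };   /* hypothetical tag values */

union valor { int i; float f; };

/* Splits n tagged values into a dense int array and a dense float array.
   ints and floats must each have room for n elements.
   The counts are written to *n_ints and *n_floats. */
void group_by_type(const enum tipo *tags, const union valor *vals, size_t n,
                   int *ints, size_t *n_ints,
                   float *floats, size_t *n_floats)
{
    *n_ints = 0;
    *n_floats = 0;
    for (size_t k = 0; k < n; k++) {
        if (tags[k] == TIPO_INT)
            ints[(*n_ints)++] = vals[k].i;
        else
            floats[(*n_floats)++] = vals[k].f;
    }
}

/* Small self-check: 2 ints and 1 float end up in separate arrays. */
int group_by_type_demo(void)
{
    enum tipo tags[3] = { TIPO_INT, TIPO_FLOAT, TIPO_INT };
    union valor vals[3];
    int ints[3];
    float floats[3];
    size_t ni, nf;

    vals[0].i = 7;
    vals[1].f = 1.5f;
    vals[2].i = 9;
    group_by_type(tags, vals, 3, ints, &ni, floats, &nf);
    return ni == 2 && nf == 1 && ints[0] == 7 && ints[1] == 9
        && floats[0] == 1.5f;
}
```

Each dense array can then be described by a single basic MPI type, and the tag array tells the receiver how to interleave the values again.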
Second: NEVER, ever send pointers between MPI processes! It's easy to think of a char* as an array of chars, but it really isn't. It's just a memory address that won't make sense to the process you send it to, and even if it did, you didn't actually send the data that it was pointing to. If valor2 is a string with a reasonably short maximum length, I'd recommend declaring it as char valor2[maxLength] so that memory for it is allocated inside the actual struct/union. If that is not feasible, you'll have to do even more memory juggling to get your strings across to the other processes, as you would with any variable sized array.
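As a rough illustration of why the inline array helps, the sketch below copies the raw bytes of a struct (standing in for a byte-wise transfer) and checks that the string arrives intact. MAX_LENGTH, the two struct names, and the helper function are assumptions made for this example; the pointer variant is shown only for contrast.

```c
#include <assert.h>
#include <string.h>

#define MAX_LENGTH 32  /* assumed maximum string length, terminator included */

struct registro_ptr    { int id; char *valor2; };            /* text lives elsewhere */
struct registro_inline { int id; char valor2[MAX_LENGTH]; }; /* text lives in the struct */

/* Copies the raw bytes of a struct, as a byte-wise transfer would,
   and checks that the string survived the copy. Returns 1 on success. */
int inline_string_survives_byte_copy(void)
{
    struct registro_inline src, dst;
    unsigned char wire[sizeof src];

    memset(&src, 0, sizeof src);
    src.id = 1;
    strncpy(src.valor2, "hola", MAX_LENGTH - 1);

    memcpy(wire, &src, sizeof src);   /* "send" */
    memcpy(&dst, wire, sizeof dst);   /* "receive" */

    return dst.id == 1 && strcmp(dst.valor2, "hola") == 0;
}
```

Doing the same with struct registro_ptr would only carry the address stored in valor2, not the characters it refers to.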

Related

creating mpi derived data type, containing struct with pointer

I have a struct that is defined something like that:
typedef struct NodeItem {
int* data;
int info1;
int info2;
struct NodeItem* next;
} *Node;
I need to send this struct to another MPI process.
I know that I should use MPI_INT for info1 and info2 when defining the derived data type.
However, I struggle to define a derived data type with my pointers.
"data" points to an array of integers, and its size is only known at runtime.
"next" points to the next item in my linked list.
How should I define my derived data type if my struct contains pointers?
Thanks in advance,
Dvir.
To quote myself from another similar question:
NEVER, ever send pointers between MPI processes! It's easy to think of a char* as an array of chars, but it really isn't. It's just a memory address that won't make sense to the process you send it to, and even if it did, you didn't actually send the data that it was pointing to.
In other words, instead of sending the pointers, you need to send the data that the pointers point to, and the receiving process has to populate the values of its pointers itself. In your case, I'd split the send into two rounds:
In round 1, you send info1 and info2 from each struct (either as an array of 2-integer vectors, or just as an array of integers). The receiving process allocates the appropriate number of NodeItems and leaves data uninitialized in each one. info1 and info2 are filled from the received data, while next is determined locally (as it was on the sending process).
In round 2, you create a message from all the data that your data pointers are pointing to. The receiving process now allocates space for each NodeItem's array and sets that NodeItem's data pointer to the allocated address. (The receiver needs the array lengths to do this, so they have to travel with the round 1 data.)
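For the sending in round 2, you can define an indexed datatype. The number of blocks is the number of NodeItem objects, the block lengths are the sizes of each array, and the displacements can be calculated by calling MPI_Get_address() (MPI_Address() in older MPI versions) with the first element of each array (not the data pointer!) as input. Because these displacements are byte addresses, the matching constructor is MPI_Type_create_hindexed() rather than MPI_Type_indexed(), and the send buffer is MPI_BOTTOM.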
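The two rounds can be sketched without MPI as a flatten/rebuild pair working on plain int buffers, which are what would actually be sent. The len field is an assumption (the struct in the question has no length member), and flatten, rebuild, and xmalloc are hypothetical helper names. This version copies the arrays into one payload buffer instead of using an indexed datatype.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct NodeItem {
    int *data;
    int  len;              /* assumption: number of ints in data */
    int  info1;
    int  info2;
    struct NodeItem *next;
};

/* malloc that gives up on failure; enough for a sketch. */
static void *xmalloc(size_t bytes)
{
    void *p = malloc(bytes ? bytes : 1);
    if (!p) exit(EXIT_FAILURE);
    return p;
}

/* Sender side. meta gets {info1, info2, len} per node (round 1);
   payload gets every data array back to back (round 2).
   Returns the number of nodes; the caller frees both buffers. */
int flatten(const struct NodeItem *head, int **meta, int **payload)
{
    int n = 0, total = 0, k = 0, pos = 0;
    const struct NodeItem *p;

    for (p = head; p; p = p->next) { n++; total += p->len; }
    *meta = xmalloc((size_t)n * 3 * sizeof(int));
    *payload = xmalloc((size_t)total * sizeof(int));

    for (p = head; p; p = p->next, k++) {
        (*meta)[3 * k]     = p->info1;
        (*meta)[3 * k + 1] = p->info2;
        (*meta)[3 * k + 2] = p->len;
        if (p->len > 0)
            memcpy(*payload + pos, p->data, (size_t)p->len * sizeof(int));
        pos += p->len;
    }
    return n;
}

/* Receiver side. Allocates the nodes, fills info1/info2 from meta,
   points data at fresh storage, and links next locally. */
struct NodeItem *rebuild(const int *meta, const int *payload, int n)
{
    struct NodeItem *head = NULL, **tail = &head;

    for (int k = 0; k < n; k++) {
        struct NodeItem *node = xmalloc(sizeof *node);
        node->info1 = meta[3 * k];
        node->info2 = meta[3 * k + 1];
        node->len   = meta[3 * k + 2];
        node->data  = xmalloc((size_t)node->len * sizeof(int));
        if (node->len > 0)
            memcpy(node->data, payload, (size_t)node->len * sizeof(int));
        payload += node->len;
        node->next = NULL;
        *tail = node;
        tail = &node->next;
    }
    return head;
}

/* Round trip over a two-node list; returns 1 if the copy matches. */
int flatten_roundtrip_demo(void)
{
    int a[] = { 1, 2, 3 }, b[] = { 4 };
    struct NodeItem second = { b, 1, 30, 40, NULL };
    struct NodeItem first  = { a, 3, 10, 20, &second };
    int *meta, *payload, ok;

    int n = flatten(&first, &meta, &payload);
    struct NodeItem *copy = rebuild(meta, payload, n);

    ok = n == 2
      && copy->info1 == 10 && copy->len == 3 && copy->data[2] == 3
      && copy->next->info2 == 40 && copy->next->data[0] == 4
      && copy->next->next == NULL;

    free(meta);
    free(payload);
    while (copy) {
        struct NodeItem *nx = copy->next;
        free(copy->data);
        free(copy);
        copy = nx;
    }
    return ok;
}
```

In an MPI program, meta and payload would each go out as one message of MPI_INT, with the receiver calling rebuild once both have arrived.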

What are the real benefits of flexible array member?

After reading some posts related to flexible array members, I still do not fully understand why we need such a feature.
Possible Duplicate:
Flexible array members in C - bad?
Is this a Flexible Array Struct Members in C as well?
(Blame me if I didn't solve my problem from the possible duplicate questions above)
What is the real difference between the following two implementations:
struct h1 {
size_t len;
unsigned char *data;
};
struct h2 {
size_t len;
unsigned char data[];
};
I know the size of h2 is as if the flexible array member (data) were omitted, that is, sizeof(h2) == sizeof(size_t). And I also know that the flexible array member can only appear as the last element of a structure, so the original implementation can be more flexible in the position of data.
My real question is: why did C99 add this feature? Simply because sizeof(h2) doesn't include the real size of data? I am sure I must be missing some more important points about this feature. Please point them out for me.
The two structs in your post don't have the same structure at all. h1 has an integer and a pointer to char. h2 has an integer, and an array of characters inline (number of elements determined at runtime, possibly none).
Said differently, in h2 the character data is inside the struct. In h1 it has to be somewhere outside.
This makes a lot of difference. For instance, if you use h1 you need to take care of allocating/freeing the payload (in addition to the struct itself). With h2, only one allocation/free is necessary, everything is packaged together.
One case where using h2 might make sense is if you're communicating with something that expects messages in the form of {length,data} pairs. You allocate an instance of h2 by requesting sizeof(h2)+how many payload chars you want, fill it up, and then you can transfer the whole thing in a single write (taking care of endianness and such, of course). If you had used h1, you'd need two write calls (unless you want to send the memory address of the data, which usually doesn't make any sense).
So this feature exists because it's handy. And various (sometimes non-portable) tricks were used before that to simulate this feature. Adding it to the standard makes sense.
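For illustration, a single-allocation constructor for h2 might look like the sketch below. h2_new and h2_demo are hypothetical names, not part of the question.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct h2 {
    size_t len;
    unsigned char data[];   /* flexible array member, must be last */
};

/* Allocates the header and len payload bytes in a single block.
   Returns NULL on allocation failure; release with one free(). */
struct h2 *h2_new(const unsigned char *src, size_t len)
{
    struct h2 *h = malloc(sizeof *h + len);
    if (h == NULL)
        return NULL;
    h->len = len;
    if (len > 0)
        memcpy(h->data, src, len);
    return h;
}

/* Builds a 3-byte message and checks its contents. Returns 1 on success. */
int h2_demo(void)
{
    const unsigned char msg[3] = { 'a', 'b', 'c' };
    struct h2 *h = h2_new(msg, sizeof msg);
    int ok = h != NULL && h->len == 3 && memcmp(h->data, msg, 3) == 0;
    free(h);
    return ok;
}
```

With h1 the same job would take two allocations and two frees, and the header and payload would not be contiguous.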
The main reason the Committee introduced flexible array members is to implement the famous struct hack. See the quote below from the C99 Rationale, especially the part where I added emphasis.
Rationale for International Standard — Programming Languages — C §6.7.2.1 Structure and union specifiers
There is a common idiom known as the “struct hack” for creating a structure containing a variable-size array:
struct s
{
int n_items;
/* possibly other fields */
int items[1];
};
struct s *p;
size_t n, i;
/* code that sets n omitted */
p = malloc(sizeof(struct s) + (n - 1) * sizeof(int));
/* code to check for failure omitted */
p->n_items = n;
/* example usage */
for (i = 0; i < p->n_items; i++)
p->items[i] = i;
The validity of this construct has always been questionable. In the response to one Defect Report, the Committee decided that it was undefined behavior because the array p->items contains only one item, irrespective of whether the space exists. An alternative construct was suggested: make the array size larger than the largest possible case (for example, using int items[INT_MAX];), but this approach is also undefined for other reasons.
The Committee felt that, although there was no way to implement the “struct hack” in C89, it was nonetheless a useful facility. Therefore the new feature of “flexible array members” was introduced. Apart from the empty brackets, and the removal of the “-1” in the malloc call, this is used in the same way as the struct hack, but is now explicitly valid code.
There are a few restrictions on flexible array members that ensure that code using them makes sense. For example, there must be at least one other member, and the flexible array must occur last. Similarly, structures containing flexible arrays can't occur in other structures or in arrays. Finally, sizeof applied to the structure ignores the array but counts any padding before it. This makes the malloc call as simple as possible.
I don't know if this counts as an important point, but the GCC docs point this out:
GCC allows static initialization of flexible array members. This is equivalent to defining a new structure containing the original structure followed by an array of sufficient size to contain the data. E.g. in the following, f1 is constructed as if it were declared like f2.
struct f1 {
int x; int y[];
} f1 = { 1, { 2, 3, 4 } };
struct f2 {
struct f1 f1; int data[3];
} f2 = { { 1 }, { 2, 3, 4 } };
(taken from http://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html)

What is there to be gained by deterministic field ordering in the memory layout?

Members of a structure are allocated within the structure in the order of their appearance in the declaration and have ascending addresses.
I am faced with the following dilemma: when I need to declare a structure, do I
(1) group the fields logically, or
(2) in decreasing size order, to save RAM and ROM size?
Here is an example, where the largest data member should be at the top, but also should be grouped with the logically-connected colour:
struct pixel{
int posX;
int posY;
tLargeType ColourSpaceSecretFormula;
char colourRGB[3];
};
The padding of a structure is non-deterministic (that is, is implementation-dependent), so we cannot reliably do pointer arithmetic on structure elements (and we shouldn't: imagine someone reordering the fields to his liking: BOOM, the whole code stops working).
-fpack-struct solves this in gcc, but it has other limitations, so let's leave compiler options out of the question.
On the other hand, code should be, above all, readable. Micro optimizations are to be avoided at all cost.
So, I wonder, why are structures' members ordered by the standard, making me worry about the micro-optimization of ordering struct member in a specific way?
The compiler is limited by several traditional and practical limitations.
The pointer to the struct after a cast (the standard calls it "suitably converted") will be equal to the pointer to the first element of the struct. This has often been used to implement overloading of messages in message passing. In that case a struct has the first element that describes what type and size the rest of the struct is.
The last element can be a dynamically resized array. Even before official language support this has been often used in practice. You allocate sizeof(struct) + length of extra data and can access the last element as a normal array with as many elements that you allocated.
Those two things force the compiler to have the first and last elements in the struct in the same order as they are declared.
Another practical requirement is that every compilation must order the struct members the same way. A smart compiler could make a decision that since it sees that some struct members are always accessed close to each other they could be reordered in a way that makes them end up in a cache line. This optimization is of course impossible in C because structs often define an API between different compilation units and we can't just reorder things differently on different compilations.
The best we could do given the limitations is to define some kind of packing order in the ABI to minimize alignment waste that doesn't touch the first or last element in the struct, but it would be complex, error prone and probably wouldn't buy much.
If you couldn't rely on the ordering, then it would be much harder to write low-level code which maps structures onto things like hardware registers, network packets, external file formats, pixel buffers, etc.
Also, some code uses a trick where it assumes that the last member of the structure is the highest-addressed in memory to signify the start of a much larger data block (of unknown size at compile time).
Reordering fields of structures can sometimes yield good gains in data size and often also in code size, especially in a 64 bit memory model. Here is an example to illustrate (assuming common alignment rules):
struct list {
int len;
char *string;
bool isUtf;
};
will take 12 bytes in 32 bit but 24 in 64 bit mode.
struct list {
char *string;
int len;
bool isUtf;
};
will take 12 bytes in 32 bit but only 16 in 64 bit mode.
If you have an array of these structures, you save a third of the data size (24 bytes down to 16 per element) and also gain in code size, as indexing on a power of 2 is simpler than on other sizes.
If your structure is a singleton or not frequent, there's not much point in reordering the fields. If it is used a lot, it's a point to look at.
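The sizes are easy to check on your own platform. The sketch below renames the two variants to list_a and list_b (hypothetical names, needed so both can coexist in one file). On a typical LP64 target they come out as 24 and 16 bytes, but the standard does not fix these numbers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct list_a { int len; char *string; bool isUtf; };  /* original order */
struct list_b { char *string; int len; bool isUtf; };  /* pointer first */

/* Size of the struct with the fields in the original order. */
size_t size_original(void)  { return sizeof(struct list_a); }

/* Size of the struct with the pointer moved to the front. */
size_t size_reordered(void) { return sizeof(struct list_b); }

/* Bytes of padding the compiler inserted between len and string. */
size_t padding_before_string(void)
{
    return offsetof(struct list_a, string) - sizeof(int);
}
```

Printing the three values with %zu shows where the padding goes under your ABI.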
As for the other point of your question, why the compiler doesn't do this reordering of fields itself: if it did, it would be difficult to implement unions of structures that share a common pattern. For example:
enum type { TYPE_A, TYPE_B, TYPE_C }; /* illustrative enumerators */
struct header {
    enum type type;
    int len;
};
struct a {
    enum type type;
    int len;
    bool whatever1;
};
struct b {
    enum type type;
    int len;
    long whatever2;
    long whatever4;
};
struct c {
    enum type type;
    int len;
    float fl;
};
union u {
    struct header header;
    struct a a;
    struct b b;
    struct c c;
};
If the compiler rearranged the fields, this construct would be much more inconvenient, as there would be no guarantee that the type and len fields were identical when accessing them via the different structs included in the union.
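A reduced sketch of how such a union tends to be used: the tag is read through the header view regardless of which member was stored last. The enumerator names and the kind_of helper are hypothetical, and the member types are simplified to keep the example short.

```c
#include <assert.h>

enum type { TYPE_A, TYPE_B };            /* hypothetical tag values */

struct header { enum type type; int len; };
struct a      { enum type type; int len; int  whatever1; };
struct b      { enum type type; int len; long whatever2; };

union u {
    struct header header;
    struct a a;
    struct b b;
};

/* Reads the tag through the header view, whichever member was stored last.
   This relies on the members sharing a common initial sequence. */
enum type kind_of(const union u *msg)
{
    return msg->header.type;
}

/* Stores a struct b, then inspects it through the header view. */
int kind_of_demo(void)
{
    union u msg;
    msg.b.type = TYPE_B;
    msg.b.len = (int)sizeof(struct b);
    msg.b.whatever2 = 42L;
    return kind_of(&msg) == TYPE_B && msg.header.len == (int)sizeof(struct b);
}
```

This only works because type and len sit at the same offsets in every struct, which is exactly what declaration-order layout provides.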
If I remember correctly the standard even mandates this behaviour.

Creating an MPI_Datatype for a structure containing pointers

I have the following structure.
typedef struct
{
int *Ai;
double *Ax;
int nz;
}column;
I want to transfer this structure using MPI_Send and MPI_Recv. How do I create an MPI_Datatype for this structure?
MPI is designed to work with arrays of structures rather than with structures of arrays.
The MPI_Hindexed that @suszterpatt proposed is a terrible hack. It will only allow you to send one element of the structure type and only the element that was used to define the MPI data type. For other variables of the same structure type it is mostly guaranteed that the computed offsets will be wrong. Besides, hindexed types use one and the same MPI data type for all elements and thus do not allow you to send both ints and doubles.
The wise thing to do is to transform your program to use arrays of structures:
typedef struct
{
int i;
double z;
} point;
typedef struct
{
point *A;
int nz;
} column;
Now you can create an MPI structured type point_type and use it to send nz elements of that type giving column.A as the buffer address:
int lens[2];
MPI_Aint base, disps[2];
MPI_Datatype oldtypes[2], point_struct, point_type;
point pt; /* any instance will do; only the member offsets matter */
MPI_Get_address(&pt, disps);
MPI_Get_address(&pt.z, disps+1);
base = disps[0];
lens[0] = 1; disps[0] = MPI_Aint_diff(disps[0], base); oldtypes[0] = MPI_INT;
lens[1] = 1; disps[1] = MPI_Aint_diff(disps[1], base); oldtypes[1] = MPI_DOUBLE;
MPI_Type_create_struct(2, lens, disps, oldtypes, &point_struct);
MPI_Type_create_resized(point_struct, 0, sizeof(point), &point_type);
MPI_Type_commit(&point_type);
MPI_Send(column.A, column.nz, point_type, ...);
This first creates an MPI datatype point_struct that describes the layout of the structure members, but does not account for any padding at the end and therefore cannot be used to reliably send an array of such structures. Therefore, a second datatype point_type with the correct extent is created using MPI_Type_create_resized.
On the receiver side you would peek at the message with MPI_Probe, extract the number of elements with MPI_Get_count using point_type as the type (the result goes straight into the nz field), allocate the A field, and use it in MPI_Recv to receive the nz elements:
MPI_Status status;
MPI_Probe(source, tag, comm, &status);
MPI_Get_count(&status, point_type, &column.nz);
if (column.nz == MPI_UNDEFINED)
    /* ... non-integral message was received, do something */
column.A = (point *)malloc(column.nz*sizeof(point));
MPI_Recv(column.A, column.nz, point_type, source, tag, comm, MPI_STATUS_IGNORE);
If that code change is impossible you can still go through the intermediate step of transforming your structure before sending it, a process usually called (un-)marshaling. In your case do something like this (I assume that you store the number of array elements in both Ai and Ax in the nz field):
point *temp = (point *)malloc(column.nz*sizeof(point));
for (int i = 0; i < column.nz; i++)
{
    temp[i].i = column.Ai[i];
    temp[i].z = column.Ax[i];
}
MPI_Send(temp, column.nz, point_type, ...);
free(temp);
On the receiver side you must do the opposite: allocate a large enough buffer that can hold the structure, receive the message in it and then do the opposite transformation.
Once again, you do not need to transmit the actual value of nz since it can be easily extracted from the length of the message using MPI_Get_count.
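The receiver-side transformation is not spelled out above, so here is one possible sketch of it in plain C. column_from_points is a hypothetical helper; it assumes the nz points have already been received into pts, for example with the MPI_Probe / MPI_Recv sequence shown earlier.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { int i; double z; } point;
typedef struct { int *Ai; double *Ax; int nz; } column;

/* Receiver side: split nz received points back into the two arrays.
   Returns 0 on success, -1 if an allocation fails. */
int column_from_points(column *col, const point *pts, int nz)
{
    size_t count = (size_t)(nz > 0 ? nz : 1);

    col->Ai = malloc(count * sizeof *col->Ai);
    col->Ax = malloc(count * sizeof *col->Ax);
    if (col->Ai == NULL || col->Ax == NULL) {
        free(col->Ai);
        free(col->Ax);
        col->Ai = NULL;
        col->Ax = NULL;
        return -1;
    }
    col->nz = nz;
    for (int k = 0; k < nz; k++) {
        col->Ai[k] = pts[k].i;
        col->Ax[k] = pts[k].z;
    }
    return 0;
}

/* Un-marshals two points and checks the resulting arrays. */
int column_from_points_demo(void)
{
    point pts[2] = { { 3, 0.5 }, { 8, 2.0 } };
    column col;
    int ok = column_from_points(&col, pts, 2) == 0
          && col.nz == 2 && col.Ai[1] == 8 && col.Ax[0] == 0.5;
    free(col.Ai);
    free(col.Ax);
    return ok;
}
```

The temporary pts buffer can be freed as soon as the function returns, mirroring the free(temp) on the sending side.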
Sending pointers to another machine is pointless (no pun intended). Due to virtual addressing, the pointer will likely point to an invalid memory location on the receiving machine, and even if not, you haven't actually sent the data that it was pointing to.
However, with proper use of MPI_Address() and an MPI_Hindexed datatype, it is possible to describe the memory layout of your data (I'm assuming that your pointers point to dynamic arrays). E.g. if Ai points to 3 ints, and Ax points to 5 doubles, you'll need a Hindexed type with 3 blocks: 3 MPI_INTs, 5 MPI_DOUBLEs, and 1 MPI_INT, with the offsets acquired using MPI_Address().
Don't forget to redefine and recommit the datatype if you change the number of items to be sent or reallocate the arrays entirely. And if you're sending multiple structs, you'll have to define and commit this datatype for each one, since your MPI datatype is specific to one particular instance of these structs.
Also keep in mind that you'll have to do some similarly tricky unpacking on the receiving end if you want to recreate the original struct.
"The wise thing to do is to transform your program to use arrays of structures"
Often that's conceptually also better.
I would like to point out another mechanism: using MPI_Pack and MPI_Unpack. For instance, with the original structure you could pack the first integer, then pack the two arrays. The receiver would unpack the integer and then know how many of the other thingies to unpack.
This is also a good solution if your object is not directly accessible but can only be accessed through an iterator or so.
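To show the shape of that approach without depending on an MPI installation, the sketch below packs the count first and then the two arrays, using a position cursor in the same spirit as MPI_Pack and MPI_Unpack. It does not call MPI: put, get, pack_column, and unpack_column are hypothetical helpers built on memcpy, so unlike MPI_Pack they perform no data conversion between machines with different representations.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct { int *Ai; double *Ax; int nz; } column;

/* Appends bytes to buf at *pos and advances the cursor. */
static void put(unsigned char *buf, size_t *pos, const void *src, size_t bytes)
{
    if (bytes > 0)
        memcpy(buf + *pos, src, bytes);
    *pos += bytes;
}

/* Reads bytes from buf at *pos and advances the cursor. */
static void get(const unsigned char *buf, size_t *pos, void *dst, size_t bytes)
{
    if (bytes > 0)
        memcpy(dst, buf + *pos, bytes);
    *pos += bytes;
}

/* Buffer size needed for a column with nz entries. */
size_t packed_size(int nz)
{
    return sizeof(int) + (size_t)nz * (sizeof(int) + sizeof(double));
}

/* Packs the count first, then both arrays. */
void pack_column(const column *c, unsigned char *buf)
{
    size_t pos = 0;
    put(buf, &pos, &c->nz, sizeof(int));
    put(buf, &pos, c->Ai, (size_t)c->nz * sizeof(int));
    put(buf, &pos, c->Ax, (size_t)c->nz * sizeof(double));
}

/* Unpacks the count, allocates the arrays, then unpacks them.
   Returns 0 on success, -1 if an allocation fails. */
int unpack_column(const unsigned char *buf, column *c)
{
    size_t pos = 0;
    get(buf, &pos, &c->nz, sizeof(int));
    c->Ai = malloc((size_t)(c->nz > 0 ? c->nz : 1) * sizeof(int));
    c->Ax = malloc((size_t)(c->nz > 0 ? c->nz : 1) * sizeof(double));
    if (c->Ai == NULL || c->Ax == NULL) {
        free(c->Ai);
        free(c->Ax);
        c->Ai = NULL;
        c->Ax = NULL;
        return -1;
    }
    get(buf, &pos, c->Ai, (size_t)c->nz * sizeof(int));
    get(buf, &pos, c->Ax, (size_t)c->nz * sizeof(double));
    return 0;
}

/* Packs a two-entry column, unpacks it, and compares. */
int pack_roundtrip_demo(void)
{
    int ai[2] = { 4, 9 };
    double ax[2] = { 0.25, 1.5 };
    column in = { ai, ax, 2 }, out;
    unsigned char *buf = malloc(packed_size(in.nz));
    int ok;

    if (buf == NULL)
        return 0;
    pack_column(&in, buf);
    ok = unpack_column(buf, &out) == 0
      && out.nz == 2 && out.Ai[1] == 9 && out.Ax[0] == 0.25;
    free(out.Ai);
    free(out.Ax);
    free(buf);
    return ok;
}
```

With real MPI the put and get calls would become MPI_Pack and MPI_Unpack with MPI_INT and MPI_DOUBLE, and the buffer would be sent as MPI_PACKED.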

fastest multi-type array solution?

I need high-performance iteration over transient arrays (on stack and/or in heap) which can store mixed types of data, including various types of pointers.
I thought of using unions to determine largest size of several supported array members.
Is the following the fastest (safe) architecture and solution?
union array_sizer {
    void *(* funcPtr)();
    void *dataPtr;
    struct {int i;} *strPtr;
    int intVal;
    float floatVal;
};
// create an array of 10 item *pairs*.
union array_sizer *myArray = malloc(22 * sizeof(union array_sizer));
// fill up the (null-terminated) array
// then, knowing that every *even* item is an int...
for(int i=0; myArray[i].intVal; i+=2){
//(... do something in loop ...)
}
the array data will be created from functions which enforce data integrity, so the for loop can be pretty skimpy on error checking beyond null-termination.
I would make a parallel 'look up table' which indexes into the original array to say what type it is. So make an enum representing the types and make that a corresponding array.
However if you look at performance if you do it this way you will get page faulting and cache misses because likely the 2 arrays are going to be on different pages. So to get round this what you want to do is instead of a 'struct of arrays' make an 'array of structs'. To do this create a struct which has 2 members : the enum type constant and the data itself. If you do this, when we fetch an index it will ensure the data and the corresponding type information will be on the same page.
This would be my preferred method from a high level design point of view.
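A minimal sketch of that array-of-structs layout, assuming a reduced set of member types: each element carries its own tag next to the value, and an explicit KIND_END tag marks the end so that an int value of 0 is not mistaken for the terminator. The enum kind, struct item, and sum_ints names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum kind { KIND_END, KIND_INT, KIND_FLOAT, KIND_PTR };  /* hypothetical tags */

struct item {
    enum kind kind;                       /* which union member is valid */
    union { int intVal; float floatVal; void *dataPtr; } as;
};

/* Sums the int items of an array terminated by a KIND_END entry. */
long sum_ints(const struct item *items)
{
    long total = 0;
    for (size_t k = 0; items[k].kind != KIND_END; k++)
        if (items[k].kind == KIND_INT)
            total += items[k].as.intVal;
    return total;
}

/* Mixed array with a zero-valued int in the middle. */
int sum_ints_demo(void)
{
    struct item items[4];
    items[0].kind = KIND_INT;   items[0].as.intVal = 5;
    items[1].kind = KIND_FLOAT; items[1].as.floatVal = 2.5f;
    items[2].kind = KIND_INT;   items[2].as.intVal = 0;
    items[3].kind = KIND_END;
    return sum_ints(items) == 5;
}
```

The tag costs a few bytes per element, but the loop never has to consult a second array to interpret the value it just loaded.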

Resources