Related
I would like to have an example showing how to use MPI_Type_create_subarray to build 2D cyclic distribution for large matrix.
I know that MPI_Type_create_darray will give me 2D cyclic distribution, but it is not compatible with SCALAPACK process grid.
I would to do 2d block cyclic distribution using MPI_Type_create_subarray and pass the matrices to SCALAPACK routines.
Could I have an example showing this?
There are at least two parts to your question. The following sections address these two component pieces, but leave integration of the two to you. The example code contained below in both sections, along with explanations provided in the ScaLapack link below should provide some guidance...
From DeinoMPI:
The following sample code illustrates MPI_Type_create_subarray.
#include "mpi.h"
#include <stdio.h>
int main(int argc, char *argv[])
{
int myrank;
MPI_Status status;
MPI_Datatype subarray;
int array[9] = { -1, 1, 2, 3, -2, -3, -4, -5, -6 };
int array_size[] = {9};
int array_subsize[] = {3};
int array_start[] = {1};
int i;
MPI_Init(&argc, &argv);
/* Create a subarray datatype */
MPI_Type_create_subarray(1, array_size, array_subsize, array_start, MPI_ORDER_C, MPI_INT, &subarray);
MPI_Type_commit(&subarray);
MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
if (myrank == 0)
{
MPI_Send(array, 1, subarray, 1, 123, MPI_COMM_WORLD);
}
else if (myrank == 1)
{
for (i=0; i<9; i++)
array[i] = 0;
MPI_Recv(array, 1, subarray, 0, 123, MPI_COMM_WORLD, &status);
for (i=0; i<9; i++)
printf("array[%d] = %d\n", i, array[i]);
fflush(stdout);
}
MPI_Finalize();
return 0;
}
And from ScaLapack in C essentials:
Unfortunately, there is no C interface for ScaLAPACK or PBLAS.All
parametersshould be passed into routines and functionsby reference,
you can also define constants (i_one for 1, i_negone for -1, d_two for
2.0E+0 etc.) to pass into routines.Matrices should bestoredas 1d array(A[ i + lda*j ], not A[i][j])
To invoke ScaLAPACK routines in your program, you should first
initialize grid via BLACS routines (BLACS is enough). Second, you
should distribute your matrix over process grid (block cyclic 2d
distribution). You can do this by means of pdgeadd_ PBLAS routine.
This routine cumputes sum of two matrices A, B: B:=alphaA+betaB).
Matrices can have different distribution,in particularmatrixA can be
owned by only one process, thus, setting alpha=1, beta=0 you cansimply
copy your non-distributed matrix A into distributed matrix B.
Third, call pdgeqrf_ for matrix B. In the end of ScaLAPACK part of
code, you can collect results on one process (just copy distributed
matrix into local one via pdgeadd_). Finally, close grid via
blacs_gridexit_ and blacs_exit_.
After all, ScaLAPACK-using program should contain following:
void main(){
// Useful constants
const int i_one = 1, i_negone = -1, i_zero = 0;
const double zero=0.0E+0, one=1.0E+0;
... (See the rest of code in linked location above...)
I want to transport a struct between processes and for that I am trying to create a MPI struct. The code is for an Ant Colony Optimization (ACO) Algorithm.
The header file with he C struct contains:
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <math.h>
#include <mpi.h>
/* Constants */
#define NUM_CITIES 100 // Number of cities
//among others
typedef struct {
int city, next_city, tabu[NUM_CITIES], path[NUM_CITIES], path_index;
double tour_distance;
} ACO_Ant;
I tried to build my code as suggested in this thread.
Program code:
int main(int argc, char *argv[])
{
MPI_Datatype MPI_TABU, MPI_PATH, MPI_ANT;
// Initialize MPI
MPI_Init(&argc, &argv);
//Determines the size (&procs) of the group associated with a communicator (MPI_COMM_WORLD)
MPI_Comm_size(MPI_COMM_WORLD, &procs);
//Determines the rank (&rank) of the calling process in the communicator (MPI_COMM_WORLD)
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Type_contiguous(NUM_CITIES, MPI_INT, &MPI_TABU);
MPI_Type_contiguous(NUM_CITIES, MPI_INT, &MPI_PATH);
MPI_Type_commit(&MPI_TABU);
MPI_Type_commit(&MPI_PATH);
// Create ant struct
//int city, next_city, tabu[NUM_CITIES], path[NUM_CITIES], path_index;
//double tour_distance;
int blocklengths[6] = {1,1, NUM_CITIES, NUM_CITIES, 1, 1};
MPI_Datatype types[6] = {MPI_INT, MPI_INT, MPI_TABU, MPI_PATH, MPI_INT, MPI_DOUBLE};
MPI_Aint offsets[6] = { offsetof( ACO_Ant, city ), offsetof( ACO_Ant, next_city), offsetof( ACO_Ant, tabu), offsetof( ACO_Ant, path ), offsetof( ACO_Ant, path_index ), offsetof( ACO_Ant, tour_distance )};
MPI_Datatype tmp_type;
MPI_Aint lb, extent;
MPI_Type_create_struct(6, blocklengths, offsets, types, &tmp_type);
MPI_Type_get_extent( tmp_type, &lb, &extent );
//Tried all of these
MPI_Type_create_resized( tmp_type, lb, extent, &MPI_ANT );
//MPI_Type_create_resized( tmp_type, 0, sizeof(MPI_ANT), &MPI_ANT );
//MPI_Type_create_resized( tmp_type, 0, sizeof(ant), &MPI_ANT );
MPI_Type_commit(&MPI_ANT);
printf("Return: %d\n" , MPI_Bcast(ant, NUM_ANTS, MPI_ANT, 0, MPI_COMM_WORLD));
}
But once the program reaches the MPI_Bcast command, it crashes with Error Code 11, which I presume is MPI_ERR_TOPOLOGY as per this manual. is a segfault (signal 11).
I am also unsure about some of the code why the author of the original program -
Can some one explain why they create
MPI_Aint displacements[3];
MPI_Datatype typelist[3];
of size 3, when the struct has 2 variables?
int block_lengths[2];
Code:
void ACO_Build_best(ACO_Best_tour *tour, MPI_Datatype *mpi_type /*out*/)
{
int block_lengths[2];
MPI_Aint displacements[3];
MPI_Datatype typelist[3];
MPI_Aint start_address;
MPI_Aint address;
block_lengths[0] = 1;
block_lengths[1] = NUM_CITIES;
typelist[0] = MPI_DOUBLE;
typelist[1] = MPI_INT;
displacements[0] = 0;
MPI_Address(&(tour->distance), &start_address);
MPI_Address(tour->path, &address);
displacements[1] = address - start_address;
MPI_Type_struct(2, block_lengths, displacements, typelist, mpi_type);
MPI_Type_commit(mpi_type);
}
All and any help will be appreciated.
Edit: help with solving the problem, not marginally useful StackOverflow jargon
This part is wrong:
int blocklengths[6] = {1,1, NUM_CITIES, NUM_CITIES, 1, 1};
MPI_Datatype types[6] = {MPI_INT, MPI_INT, MPI_TABU, MPI_PATH, MPI_INT, MPI_DOUBLE};
MPI_Aint offsets[6] = { offsetof( ACO_Ant, city ), offsetof( ACO_Ant, next_city), offsetof( ACO_Ant, tabu), offsetof( ACO_Ant, path ), offsetof( ACO_Ant, path_index ), offsetof( ACO_Ant, tour_distance )};
The MPI_TABU and MPI_PATH datatypes already cover NUM_CITIES elements. When you specify the corresponding block size to also be NUM_CITIES, the resultant datatype will try to access NUM_CITIES * NUM_CITIES elements, likely resulting in a segfault (signal 11).
Either set all elements of blocklengths to 1 or replace MPI_TABU and MPI_PATH in the types array with MPI_INT.
This part is also wrong:
MPI_Type_create_struct(6, blocklengths, offsets, types, &tmp_type);
MPI_Type_get_extent( tmp_type, &lb, &extent );
//Tried all of these
MPI_Type_create_resized( tmp_type, lb, extent, &MPI_ANT );
//MPI_Type_create_resized( tmp_type, 0, sizeof(MPI_ANT), &MPI_ANT );
//MPI_Type_create_resized( tmp_type, 0, sizeof(ant), &MPI_ANT );
MPI_Type_commit(&MPI_ANT);
Calling MPI_Type_create_resized with the values returned by MPI_Type_get_extent is meaningless since it just duplicates the type without actually resizing it. Using sizeof(MPI_ANT) is wrong since MPI_ANT is not a C type but an MPI handle, which is either an integer index or a pointer (implementation-dependent). It will work with sizeof(ant) if ant is of type ACO_Ant, but given you call MPI_Bcast(ant, NUM_ANTS, ...), then ant is either a pointer, in which case sizeof(ant) is just the pointer size, or it is an array, in which case sizeof(ant) is NUM_ANTS times larger than it must be. The correct call is:
MPI_Type_create_resized(tmp_type, 0, sizeof(ACO_Ant), &ant_type);
MPI_Type_commit(&ant_type);
And please, never use MPI_ as prefix in your own variable or function names. This makes the code unreadable and is very misleading ("is that a predefined MPI datatype or a user-defined one?")
As for the last question, the author might have had a different structure in mind. Nothing stops you from using larger arrays as long as you call MPI_Type_create with the correct number of significant elements.
Note: You don't have to commit MPI datatypes that are never used directly in communication calls. I.e., those two lines are unnecessary:
MPI_Type_commit(&MPI_TABU);
MPI_Type_commit(&MPI_PATH);
I am new to MPI programing. I have a 8 by 10 array that I need to use to find the summation of each row parallely. In rank 0 (process 0), it will generate the 8 by 10 matrix using a 2 dimensional array. Then I would use tag number as the first index value(row number) of the array. This way, I can use a unique buffer to send through Isend. However, it looks like my method of tag number generation for Isend is not working. Can you please look in to the following code and tell me if I am passing the 2D array correctly and tag number. When I run this code, it stop just after executing rannk 1 and waits. I use 3 process for this example and use the command mpirun -np 3 test please let me know how to tackle this problem with an example if possible.
#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
int main (int argc, char *argv[])
{
MPI_Init(&argc, &argv);
int world_rank;
MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
int world_size;
MPI_Comm_size(MPI_COMM_WORLD, &world_size);
int tag = 1;
int arr[8][10];
MPI_Request request;
MPI_Status status;
int source = 0;
int dest;
printf ("\n--Current Rank: %d\n", world_rank);
if (world_rank == 0)
{
int i = 0;
int a, b, x, y;
printf("* Rank 0 excecuting\n");
for(a=0; a<8/(world_size-1); a++)//if -np is 3, this will loop 4 times
{
for(b=0; b<(world_size-1); b++)//if -np is 3, this loops will loop 2 times
{//So, if -np is 3, due to both of these loops, Isend will be called 8 times
dest = b+1;
tag = a+b;//create a uniqe tag value each time, which can be use as first index value of array
//Error: This tag value passing to Isend doesn't seems to be workiing
MPI_Isend(&arr[tag][0], 10, MPI_INT, dest, tag, MPI_COMM_WORLD, &request);
}
}
for(x=0; x<8; x++)//Generating the whole 8 by 10 2D array
{
i++;
for ( y = 0; y < 10; y++ )
{
arr[x][y] = i;
}
}
}
else
{
int a, b;
for(b=1; b<=8/(world_size-1); b++)
{
int sum = 0;
int i;
MPI_Irecv(&arr[tag][0], 10, MPI_INT, source, tag, MPI_COMM_WORLD, &request);
MPI_Wait (&request, &status);
//Error: not getting the correct tag value
for(i = 0; i<10; i++)
{
sum = arr[tag][i]+sum;
}
printf("\nSum is: %d at rank: %d and tag is:%d\n", sum, world_rank, tag);
}
}
MPI_Finalize();
}
The tag issue is because of how the tag is computed (or not) on different processes. You're initializing the tag values for all processes as
int tag = 1;
and later, for process rank 0 you set the tag to
tag = a+b;
which, for the first time this is set, will set tag to 0 because both a and b start out as zero. However, for processes with rank above 0, the tag is never changed. They will continue to have the tag set to 1.
The tag uniquely identifies the message being sent by MPI_Isend and MPI_Irecv, which means that a send and its corresponding receive must have the same tag for the data transfer to succeed. Because the tags are mismatched between processes for most of the receives, the transfers are mostly unsuccessful. This causes processes with rank higher than 0 to eventually block (wait) forever on the call to MPI_Wait.
In order to fix this, you have to make sure to change the tags for the processes with rank above zero. However, before we can do that, there's a few other issues worth touching up on.
With the way you've set your tag for the rank 0 process right now, tag can only ever have values 0 to 4 (assuming 3 processes). This is because a is limited to the range 0 to 3, and b can only have values 0 or 1. The maximum possible sum of these values is 4. This means that when you access your array using arr[tag][0], you will miss out on a lot of the data, and you'll re-send the same rows several times. I recommend changing the way you approach sending each subarray (which you're currently accessing with tag) so that you have only one for loop to determine which subarray to send, rather than two embedded loops. Then, you can calculate the process to send the array to as
dest = subarray_index%(world_size - 1) + 1;
This will alternate the desitnations between the processes with rank greater than zero. You can keep the tag as just subarray_index. On the receiving side you'll need to calculate the tag per process, per receive.
Finally, I saw that you were initializing your array after you sent the data. You want to do that beforehand.
Combining all these aspects, we get
#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
int main (int argc, char *argv[])
{
MPI_Init(&argc, &argv);
int world_rank;
MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
int world_size;
MPI_Comm_size(MPI_COMM_WORLD, &world_size);
int tag = 1;
int arr[8][10];
MPI_Request request;
MPI_Status status;
int source = 0;
int dest;
printf ("\n--Current Rank: %d\n", world_rank);
if (world_rank == 0)
{
int i = 0;
int a, b, x, y;
printf("* Rank 0 excecuting\n");
//I've moved the array generation to before the sends.
for(x=0; x<8; x++)//Generating the whole 8 by 10 2D array
{
i++;
for ( y = 0; y < 10; y++ )
{
arr[x][y] = i;
}
}
//I added a subarray_index as mentioned above.
int subarray_index;
for(subarray_index=0; subarray_index < 8; subarray_index++)
{
dest = subarray_index%(world_size - 1) + 1;
tag = subarray_index;
MPI_Isend(&arr[subarray_index][0], 10, MPI_INT, dest, tag, MPI_COMM_WORLD, &request);
}
}
else
{
int a, b;
for(b=0; b<8/(world_size-1); b++)
{
int sum = 0;
int i;
//We have to do extra calculations here. These match tag, dest, and subarray.
int my_offset = world_rank-1;
tag = b*(world_size-1) + my_offset;
int subarray = b;
MPI_Irecv(&arr[subarray][0], 10, MPI_INT, source, tag, MPI_COMM_WORLD, &request);
MPI_Wait (&request, &status);
for(i = 0; i<10; i++)
{
sum = arr[subarray][i]+sum;
}
printf("\nSum is: %d at rank: %d and tag is:%d\n", sum, world_rank, tag);
}
}
MPI_Finalize();
}
There's a one thing that still seems a bit unfinished in this version for you to consider: what will happen if your number of processes changes? For example, if you have 4 processes instead of 3, it looks like you may run into some trouble with the loop
for(b=0; b<8/(world_size-1); b++)
because each process will execute it the same number of times, but the amount of data sent doesn't cleanly split for 3 workers (non-rank-zero processes).
However, if that is not a concern to you, then you do not need to handle such cases.
Aside from the obvious question: "why on earth would you want to do that?", there are so many problems here that I'm not sure I'll be able to list them all. I'll give it a try though:
Tag: it seems that the bulk of your method is to use the tag as an indicator of where to look for the receiver. But there are (at least) two major flaws here:
Since tag isn't know before reception, what is &arr[tag][0] supposed to be?
Tags in MPI are messages "identifier"... On normal circumstances a given communication (send and matching receive) should have a matching tag. This can be alleviated by using MPI_ANY_TAG special tag on the receiving side, and retrieving its actual value using the MPI_TAG field of the reception's status. But that's another story.
Bottom line here is that the method isn't such a good one.
Data initialisation: one of the major principles of non-blocking MPI communications is that you should never modify a buffer you used for a communication between the post of the communication (the MPI_Isend() here) and its finalisation (which is missing here). Therefore, your data generation must happen before the attempts to communicate the data.
Speaking of which, communication finalisation: you have too finalise your sending communications. This can be done using either a wait-type call (MPI_Wait() or MPI_Waitall()), or an "infinite" loop of test-type calls (MPI_Test() and such)...
The MPI_Irecv(): why are you using a non-blocking receive when the very next call is MPI_Wait()? If you want a blocking receive, just call MPI_Recv() directly.
So fundamentally, what you try to do here doesn't look right. Therefore, I'm very reluctant in trying to propose you a corrected version since I don't understand the actual problem you try to solve. Is this code a reduced version of a bigger real one (or an initial version of something supposed to grow), or just a toy example meant for you to learn how MPI send/receive works? Is ther any fundamental reason why you're not using a collective communication such as MPI_Scatter()?
Depending on your answer on these questions, I can try to produce a valid version.
I would like to gather data from arrays of double and organize them at the same time. Say we have 2 MPI ranks:
if(rank == 0)
P = {0,1,4,5,8,9};
else
P = {2,3,6,7,10,11};
How could I gather the information located in P and locate them in order, i.e: P in the master should contain P= [0 1 2....9 10 11]
I could gather P as it is, and then reorganizing it in the root however this approach would not be very efficient as P is increased. I have tried creating an MPI_Type_vector however I have not managed to get it right yet. Any ideas?
It depends a little bit on what you mean by "in order". If you mean that, as in the above example, each vector is made up of blocks of data and you want those blocks interleaved in a fixed known order, yes, you can certainly do this. (The question could also be read to be asking if you can do a sort as part of the gather; that's rather harder.)
You have the right approach; you want to send the data as is, but receive the data into specified chunks broken up by processor. Here, the data type you want to receive into looks like this:
MPI_Datatype vectype;
MPI_Type_vector(NBLOCKS, BLOCKSIZE, size*BLOCKSIZE, MPI_CHAR, &vectype);
That is, for a given processor's input, you're going to receive it into NBLOCKS blocks of size BLOCKSIZE, each separated by however many processors there are times the blocksize. As it is, you could receive into that type; to gather into that type, however, you need to set the extents so that the data from each processor is gathered into the right place:
MPI_Datatype gathertype;
MPI_Type_create_resized(vectype, 0, BLOCKSIZE*sizeof(char), &gathertype);
MPI_Type_commit(&gathertype);
The reason for that resizing is given in, for instance, this answer, and likely elsewhere on this site as well.
Putting this together into sample code gives us the following:
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
int main(int argc, char **argv) {
int rank, size;
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &size);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
const int BLOCKSIZE=2; /* each block of data is 2 items */
const int NBLOCKS =3; /* each task has 3 such blocks */
char locdata[NBLOCKS*BLOCKSIZE];
for (int i=0; i<NBLOCKS*BLOCKSIZE; i++)
locdata[i] = 'A' + (char)rank; /* rank 0 = 'AAA..A'; rank 1 = 'BBB..B', etc */
MPI_Datatype vectype, gathertype;
MPI_Type_vector(NBLOCKS, BLOCKSIZE, size*BLOCKSIZE, MPI_CHAR, &vectype);
MPI_Type_create_resized(vectype, 0, BLOCKSIZE*sizeof(char), &gathertype);
MPI_Type_commit(&gathertype);
char *globaldata = NULL;
if (rank == 0) globaldata = malloc((NBLOCKS*BLOCKSIZE*size+1)*sizeof(char));
MPI_Gather(locdata, BLOCKSIZE*NBLOCKS, MPI_CHAR,
globaldata, 1, gathertype,
0, MPI_COMM_WORLD);
if (rank == 0) {
globaldata[NBLOCKS*BLOCKSIZE*size] = '\0';
printf("Assembled data:\n");
printf("<%s>\n", globaldata);
free(globaldata);
}
MPI_Type_free(&gathertype);
MPI_Finalize();
return 0;
}
Running gives:
$ mpirun -np 3 ./vector
Assembled data:
<AABBCCAABBCCAABBCC>
$ mpirun -np 7 ./vector
Assembled data:
<AABBCCDDEEFFGGAABBCCDDEEFFGGAABBCCDDEEFFGG>
I am writing a simple code to learn how to define an MPI_Datatype and use it in conjunction with MPI_Gatherv. I wanted to make sure I could combine variable length, dynamically allocated arrays of structured data on a process, which seems to be working fine, up until my call to MPI_Finalize(). I have confirmed that this is where the problem starts to manifest itself by using print statements and the Eclipse PTP debugger (backend is gdb-mi). My main question is, how can I get rid of the segmentation fault?
The segfault does not occur every time I run the code. For instance, it hasn't happened for 2 or 3 processes, but tends to happen regularly when I run with about 4 or more processes.
Also, when I run this code with valgrind, the segmentation fault does not occur. However, I do get error messages from valgrind, though the output is difficult for me to understand when I use MPI functions, even with a large number of targeted suppressions. I am also concerned that if I use more suppressions, I will silence a useful error message.
I compile the normal code using these flags, so I am using the C99 standard in both cases:
-ansi -pedantic -Wall -O2 -march=barcelona -fomit-frame-pointer -std=c99
and the debugged code with:
-ansi -pedantic -std=c99 -Wall -g
Both use the gcc 4.4 mpicc compiler, and are run on a cluster using Red Hat Linux with Open MPI v1.4.5. Please let me know if I have left out other important bits of information. Here is the code, and thanks in advance:
//#include <unistd.h>
#include <string.h>
#include <stdio.h>
#include <math.h>
#include <stdlib.h>
//#include <limits.h>
#include "mpi.h"
#define FULL_PROGRAM 1
struct CD{
int int_ID;
double dbl_ID;
};
int main(int argc, char *argv[]) {
int numprocs, myid, ERRORCODE;
#if FULL_PROGRAM
struct CD *myData=NULL; //Each process contributes an array of data, comprised of 'struct CD' elements
struct CD *allData=NULL; //root will dynamically allocate this array to store all the data from rest of the processes
int *p_lens=NULL, *p_disp=NULL; //p_lens stores the number of elements in each process' array, p_disp stores the displacements in bytes
int MPI_CD_size; //stores the size of the MPI_Datatype that is defined to allow communication operations using 'struct CD' elements
int mylen, total_len=0; //mylen should be the length of each process' array
//MAXlen is the maximum allowable array length
//total_len will be the sum of mylen across all processes
// ============ variables related to defining new MPI_Datatype at runtime ====================================================
struct CD sampleCD = {.int_ID=0, .dbl_ID=0.0};
int blocklengths[2]; //this describes how many blocks of identical data types will be in the new MPI_Datatype
MPI_Aint offsets[2]; //this stores the offsets, in bytes(bits?), of the blocks from the 'start' of the datatype
MPI_Datatype block_types[2]; //this stores which built-in data types the blocks are comprised of
MPI_Datatype myMPI_CD; //just the name of the new datatype
MPI_Aint myStruct_address, int_ID_address, dbl_ID_address, int_offset, dbl_offset; //useful place holders for filling the arrays above
// ===========================================================================================================================
#endif
// =================== Initializing MPI functionality ============================
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
MPI_Comm_rank(MPI_COMM_WORLD, &myid);
// ===============================================================================
#if FULL_PROGRAM
// ================== This part actually formally defines the MPI datatype ===============================================
MPI_Get_address(&sampleCD, &myStruct_address); //starting point of struct CD
MPI_Get_address(&sampleCD.int_ID, &int_ID_address); //starting point of first entry in CD
MPI_Get_address(&sampleCD.dbl_ID, &dbl_ID_address); //starting point of second entry in CD
int_offset = int_ID_address - myStruct_address; //offset from start of first to start of CD
dbl_offset = dbl_ID_address - myStruct_address; //offset from start of second to start of CD
blocklengths[0]=1; blocklengths[1]=1; //array telling it how many blocks of identical data types there are, and the number of entries in each block
//This says there are two blocks of identical data-types, and both blocks have only one variable in them
offsets[0]=int_offset; offsets[1]=dbl_offset; //the first block starts at int_offset, the second block starts at dbl_offset (from 'myData_address'
block_types[0]=MPI_INT; block_types[1]=MPI_DOUBLE; //the first block contains MPI_INT, the second contains MPI_DOUBLE
MPI_Type_create_struct(2, blocklengths, offsets, block_types, &myMPI_CD); //this uses the above arrays to define the MPI_Datatype...an MPI-2 function
MPI_Type_commit(&myMPI_CD); //this is the final step to defining/reserving the data type
// ========================================================================================================================
mylen = myid*2; //each process is told how long its array should be...I used to define that randomly but that just makes things messier
p_lens = (int*) calloc((size_t)numprocs, sizeof(int)); //allocate memory for the number of elements (p_lens) and offsets from the start of the recv buffer(d_disp)
p_disp = (int*) calloc((size_t)numprocs, sizeof(int));
myData = (struct CD*) calloc((size_t)mylen, sizeof(struct CD)); //allocate memory for each process' array
//if mylen==0, 'a unique pointer to the heap is returned'
if(!p_lens) { MPI_Abort(MPI_COMM_WORLD, 1); exit(EXIT_FAILURE); }
if(!p_disp) { MPI_Abort(MPI_COMM_WORLD, 1); exit(EXIT_FAILURE); }
if(!myData) { MPI_Abort(MPI_COMM_WORLD, 1); exit(EXIT_FAILURE); }
for(double temp=0.0;temp<1e6;++temp) temp += exp(-10.0);
MPI_Barrier(MPI_COMM_WORLD); //purely for keeping the output organized by give a delay in time
for (int k=0; k<numprocs; ++k) {
if(myid==k) {
//printf("\t ID %d has %d entries: { ", myid, mylen);
for(int i=0; i<mylen; ++i) {
myData[i]= (struct CD) {.int_ID=myid*(i+1), .dbl_ID=myid*(i+1)}; //fills data elements with simple pattern
//printf("%d: (%d,%lg) ", i, myData[i].int_ID, myData[i].dbl_ID);
}
//printf("}\n");
}
}
for(double temp=0.0;temp<1e6;++temp) temp += exp(-10.0);
MPI_Barrier(MPI_COMM_WORLD); //purely for keeping the output organized by give a delay in time
MPI_Gather(&mylen, 1, MPI_INT, p_lens, 1, MPI_INT, 0, MPI_COMM_WORLD); //Each process sends root the length of the vector they'll be sending
#if 1
MPI_Type_size(myMPI_CD, &MPI_CD_size); //gets the size of the MPI_Datatype for p_disp
#else
MPI_CD_size = sizeof(struct CD); //using this doesn't change things too much...
#endif
for(int j=0;j<numprocs;++j) {
total_len += p_lens[j];
if (j==0) { p_disp[j] = 0; }
else { p_disp[j] = p_disp[j-1] + p_lens[j]*MPI_CD_size; }
}
if (myid==0) {
allData = (struct CD*) calloc((size_t)total_len, sizeof(struct CD)); //allocate array
if(!allData) { MPI_Abort(MPI_COMM_WORLD, 1); exit(EXIT_FAILURE); }
}
MPI_Gatherv(myData, mylen, myMPI_CD, allData, p_lens, p_disp, myMPI_CD, 0, MPI_COMM_WORLD); //each array sends root process their array, which is stored in 'allData'
// ============================== OUTPUT CONFIRMING THAT COMMUNICATIONS WERE SUCCESSFUL=========================================
if(myid==0) {
for(int i=0;i<numprocs;++i) {
printf("\n\tElements from %d on MASTER are: { ",i);
for(int k=0;k<p_lens[i];++k) { printf("%d: (%d,%lg) ", k, (allData+p_disp[i]+k)->int_ID, (allData+p_disp[i]+k)->dbl_ID); }
if(p_lens[i]==0) printf("NOTHING ");
printf("}\n");
}
printf("\n"); //each data element should appear as two identical numbers, counting upward by the process ID
}
// ==========================================================================================================
if (p_lens) { free(p_lens); p_lens=NULL; } //adding this in didn't get rid of the MPI_Finalize seg-fault
if (p_disp) { free(p_disp); p_disp=NULL; }
if (myData) { free(myData); myData=NULL; }
if (allData){ free(allData); allData=NULL; } //the if statement ensures that processes not allocating memory for this pointer don't free anything
for(double temp=0.0;temp<1e6;++temp) temp += exp(-10.0);
MPI_Barrier(MPI_COMM_WORLD); //purely for keeping the output organized by give a delay in time
printf("ID %d: I have reached the end...before MPI_Type_free!\n", myid);
// ====================== CLEAN UP ================================================================================
ERRORCODE = MPI_Type_free(&myMPI_CD); //this frees the data type...not always necessary, but a good practice
for(double temp=0.0;temp<1e6;++temp) temp += exp(-10.0);
MPI_Barrier(MPI_COMM_WORLD); //purely for keeping the output organized by give a delay in time
if(ERRORCODE!=MPI_SUCCESS) { printf("ID %d...MPI_Type_free was not successful\n", myid); MPI_Abort(MPI_COMM_WORLD, 911); exit(EXIT_FAILURE); }
else { printf("ID %d...MPI_Type_free was successful, entering MPI_Finalize...\n", myid); }
#endif
ERRORCODE=MPI_Finalize();
for(double temp=0.0;temp<1e7;++temp) temp += exp(-10.0); //NO MPI_Barrier AFTER MPI_Finalize!
if(ERRORCODE!=MPI_SUCCESS) { printf("ID %d...MPI_Finalize was not successful\n", myid); MPI_Abort(MPI_COMM_WORLD, 911); exit(EXIT_FAILURE); }
else { printf("ID %d...MPI_Finalize was successful\n", myid); }
return EXIT_SUCCESS;
}
The outer loop on k is bogus, but is not technically wrong -- it's just useless.
The real issue is that your displacements to MPI_GATHERV are wrong. If you run through valgrind, you'll see something like this:
==28749== Invalid write of size 2
==28749== at 0x4A086F4: memcpy (mc_replace_strmem.c:838)
==28749== by 0x4C69614: unpack_predefined_data (datatype_unpack.h:41)
==28749== by 0x4C6B336: ompi_generic_simple_unpack (datatype_unpack.c:418)
==28749== by 0x4C7288F: ompi_convertor_unpack (convertor.c:314)
==28749== by 0x8B295C7: mca_pml_ob1_recv_frag_callback_match (pml_ob1_recvfrag.c:216)
==28749== by 0x935723C: mca_btl_sm_component_progress (btl_sm_component.c:426)
==28749== by 0x51D4F79: opal_progress (opal_progress.c:207)
==28749== by 0x8B225CA: opal_condition_wait (condition.h:99)
==28749== by 0x8B22718: ompi_request_wait_completion (request.h:375)
==28749== by 0x8B231E1: mca_pml_ob1_recv (pml_ob1_irecv.c:104)
==28749== by 0x955E7A7: mca_coll_basic_gatherv_intra (coll_basic_gatherv.c:85)
==28749== by 0x9F7CBFA: mca_coll_sync_gatherv (coll_sync_gatherv.c:46)
==28749== Address 0x7b1d630 is not stack'd, malloc'd or (recently) free'd
Indicating that MPI_GATHERV was given bad information somehow.
(there are other valgrind warnings that come from libltdl inside Open MPI which are unfortunately unavoidable -- it's a bug in libltdl, and another from PLPA, which is also unfortunately unavoidable because it's intentionally doing that [for reasons that aren't interesting to discuss here])
Looking at your displacements computation, I see
total_len += p_lens[j];
if (j == 0) {
p_disp[j] = 0;
} else {
p_disp[j] = p_disp[j - 1] + p_lens[j] * MPI_CD_size;
}
But MPI gather displacements are in units of datatypes, not bytes. So it really should be:
p_disp[j] = total_len;
total_len += p_lens[j];
Making this change made the MPI_GATHERV valgrind warning go away for me.
This outer on 'k' loop is just bogus. It's body is only executed for k=myid (which is a constant for every running process). The k is never referenced inside the loop (except the comparison with the almost-constant myid).
Also, the line with mylen = myid*2; is frowned upon. I suggest you change it to a constant.
for (int k=0; k<numprocs; ++k) {
if(myid==k) {
//printf("\t ID %d has %d entries: { ", myid, mylen);
for(int i=0; i<mylen; ++i) {
myData[i]= (struct CD) {.int_ID=myid*(i+1), .dbl_ID=myid*(i+1)}; //fills data elements with simple pattern
//printf("%d: (%d,%lg) ", i, myData[i].int_ID, myData[i].dbl_ID);
}
//printf("}\n");
}
}
, so (given that myid is between 0 and numprocs) this whole silly construct can be reduced to:
for(int i=0; i<mylen; ++i) {
myData[i].int_ID=myid*(i+1);
myData[i].dbl_ID=myid*(i+1);
}