MPI_Bcast a dynamic 2d array of structs - c

I followed Jonathan's code from here ( MPI_Bcast a dynamic 2d array ) to MPI_Bcast a dynamically allocated 2d array of structs. The struct is as follows:
typedef struct {
char num[MAX];
int neg;
} bignum;
Since this is not a native datatype, I used MPI_Type_create_struct to commit this datatype.
I omit the code here because it works on another project I was doing.
The method I used to dynamically allocate the (hopefully contiguous?) array is by calling:
bignum **matrix = creatematrix(DIMMAX,DIMMAX);
which is implemented as:
bignum ** creatematrix (int num_rows, int num_cols){
int i;
bignum *one_d = (bignum*)malloc (num_rows * num_cols * sizeof (bignum));
bignum **two_d = (bignum**)malloc (num_rows * sizeof (bignum *));
for (i = 0; i < num_rows; i++)
two_d[i] = &(one_d[i*num_cols]);//one_d + (i * num_cols);
return two_d;
}
Now I'd get input and store it inside matrix and call MPI_Bcast as:
MPI_Bcast(&(matrix[0][0]), DIMMAX*DIMMAX, Bignum, 0, MPI_COMM_WORLD);
There seems to be no segmentation fault, but the problem is that only the first row (matrix[0]) is broadcast. All ranks besides the root have data only for the first row; the other rows remain untouched.
It looks like there's something more to this than allocating a contiguous memory block. Is there anything I missed that causes the broadcast to be unsuccessful?
EDIT:
I stumbled on a weird workaround, using nested structs. Can anyone explain why this method works while the one above doesn't?
typedef struct {
char num[MAX];
int neg;
} bignum;
typedef struct {
bignum elements[DIMMAX];
} row;
int main(/*int argc,char *argv[]*/){
int dim, my_rank, comm_sz;
int i,j,k; //iterators
bignum global_sum = {{0},0};
row *matrix = (row*)malloc(sizeof(row)*DIMMAX);
row *tempmatrix = (row*)malloc(sizeof(row)*DIMMAX);
MPI_Init(NULL,NULL);
MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);
MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
//CREATE DERIVED DATATYPE Bignum
MPI_Datatype Bignum;
MPI_Datatype type[4] = {MPI_LB, MPI_CHAR, MPI_INT, MPI_UB};
int blocklen[4] = {1,MAX,1,1};
MPI_Aint disp[4];
//get offsets
MPI_Get_address(&global_sum, disp);
MPI_Get_address(&global_sum.num, disp+1);
MPI_Get_address(&global_sum.neg, disp+2);
//MPI_Get_address(&local_sum+1, disp+3);
int base;
base = disp[0];
for (i=0;i<3;i++){
disp[i] -= base;
}
disp[3] = disp[2]+4; //int
MPI_Type_create_struct(4,blocklen,disp,type,&Bignum);
MPI_Type_commit(&Bignum);
//CREATE DERIVED DATATYPE BignumRow
MPI_Datatype BignumRow;
MPI_Datatype type2[3] = {MPI_LB, Bignum, MPI_UB};
int blocklen2[3] = {1,DIMMAX,1};
MPI_Aint disp2[3] = {0,0,DIMMAX*(MAX+4)};
MPI_Type_create_struct(3,blocklen2,disp2,type2,&BignumRow);
MPI_Type_commit(&BignumRow);
MPI_Bcast(&dim, 1, MPI_INTEGER, 0, MPI_COMM_WORLD);
MPI_Bcast(matrix, dim, BignumRow, 0, MPI_COMM_WORLD);

This seems to come up a lot: If you pass around multi-dimensional arrays with MPI in C, you sort of have to think of a multi-dimensional array as an array of arrays. I think in your first example you've gotten lucky that matrix[0][0] (due to the consecutive mallocs?) sits at the beginning of a large block of memory (and hence, no seg fault).
In the second (working) example, you've described not just the memory address of the beginning of the memory region (matrix[0][0]), but also all of the internal layout of each row.
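As a side note (not part of the original question or answer): MPI_LB and MPI_UB are deprecated and were removed in MPI-3. A minimal sketch of an equivalent construction using offsetof (from <stddef.h>) and MPI_Type_create_resized, assuming the same bignum/row definitions and the same broadcast as in the edit:
MPI_Datatype Bignum, BignumResized, BignumRow;
int blocklen[2] = { MAX, 1 };
MPI_Aint disp[2] = { offsetof(bignum, num), offsetof(bignum, neg) };
MPI_Datatype type[2] = { MPI_CHAR, MPI_INT };
MPI_Type_create_struct(2, blocklen, disp, type, &Bignum);
/* pin the extent to sizeof(bignum) so consecutive elements match the C layout */
MPI_Type_create_resized(Bignum, 0, sizeof(bignum), &BignumResized);
MPI_Type_commit(&BignumResized);
/* a row is DIMMAX consecutive bignums */
MPI_Type_contiguous(DIMMAX, BignumResized, &BignumRow);
MPI_Type_commit(&BignumRow);
MPI_Bcast(matrix, dim, BignumRow, 0, MPI_COMM_WORLD);
Resizing the element type to sizeof(bignum) plays the same role as the MPI_LB/MPI_UB markers: it tells MPI where the next array element starts, including any compiler padding.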

Related

C pointer changes without modifying it

I am getting some weird pointer-related errors, and I do not understand how they can happen or how to solve them. Since I am new to C, any advice on good coding practice is also appreciated.
typedef struct matrix {
int n;
int m;
int **data;
int *inverses;
} Matrix;
Matrix m_new(int n, int m, int *inverses) {
int **data = malloc(n);
for (int k = 0; k < n; k++) {
int *row = malloc(m + 1);
*row++ = k;
*(data+k) = row;
// print_arr(*data); - introduced for debugging purposes
}
Matrix mat = {
.n = n,
.m = m,
.data = data,
.inverses = inverses,
};
return mat;
}
void print_arr(int *arr) {
int len = *(arr-1);
printf("len: %d\n", len);
}
Here, I am going to use this matrix for Gaussian elimination over the integers mod some number; that's why I have the inverses array, and I use int ** so that I can swap rows easily later. For arrays represented as int *, I use the -1th entry to store the length of the array. I got some weird bugs, so I wrote a small function (print_arr) to get the length of an array and check the first array in data. I got the following (for n=m=5):
len: 0
len: 0
len: 0
len: 0
len: -1219680380
Which I find really weird, because it seems like something is changing in the last iteration, but I don't know why.
Your allocations are too small. You need to multiply the number of elements by the size of the elements to compute how much memory to allocate:
int **data = malloc(n * sizeof (int *));
int *row = malloc((m + 1) * sizeof (int));
With your current code, you overflow your too-small buffers and cause undefined behavior.
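For completeness, a minimal corrected sketch of m_new (not from the original answer): the allocation sizes are fixed as above, and the -1th slot stores the row length m, which is an assumption about the intent, since the original stores the loop index k there.
Matrix m_new(int n, int m, int *inverses) {
    int **data = malloc(n * sizeof(int *));          /* n row pointers */
    for (int k = 0; k < n; k++) {
        int *row = malloc((m + 1) * sizeof(int));    /* m entries plus one slot for the length */
        *row++ = m;   /* assumption: the -1th entry is meant to hold the row length */
        data[k] = row;
    }
    Matrix mat = { .n = n, .m = m, .data = data, .inverses = inverses };
    return mat;
}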

An efficient way to perform an all reduction in MPI of a value based on another variable?

As an example, lets say I have
int a = ...;
int b = ...;
int c;
where a is the result of some complex local calculation and b is some metric for the quality of a.
I'd like to send the best value of a to every process and store it in c where best is defined by having the largest value of b.
I guess I'm just wondering if there is a more efficient way of doing this than doing an allgather on a and b and then searching through the resulting arrays.
The actual code involves sending and comparing several hundred values on up to several hundred or thousand processes, so any efficiency gains would be welcome.
I guess I'm just wondering if there is a more efficient way of doing
this than doing an allgather on a and b and then searching through the
resulting arrays.
This can be achieved with only a single MPI_Allreduce.
I will present two approaches: a simpler one (suitable for your use case) and a more generic one for more complex use cases. The latter will also be useful to showcase MPI functionality such as custom MPI datatypes and custom MPI reduction operators.
Approach 1
To represent
int a = ...;
int b = ...;
you could use the following struct:
typedef struct MyStruct {
int b;
int a;
} S;
then you can use the MPI Datatype MPI_2INT and the MPI operator MPI_MAXLOC:
The operator MPI_MINLOC is used to compute a global minimum and also
an index attached to the minimum value. MPI_MAXLOC similarly computes
a global maximum and index. One application of these is to compute a
global minimum (maximum) and the rank of the process containing this
value.
In your case, instead of the rank we will be using the value of 'a'. Hence, the MPI_AllReduce call:
S local, global;
...
MPI_Allreduce(&local, &global, 1, MPI_2INT, MPI_MAXLOC, MPI_COMM_WORLD);
The complete code would look like the following:
#include <stdio.h>
#include <mpi.h>
typedef struct MyStruct {
int b;
int a;
} S;
int main(int argc,char *argv[]){
MPI_Init(NULL,NULL); // Initialize the MPI environment
int world_rank;
int world_size;
MPI_Comm_rank(MPI_COMM_WORLD,&world_rank);
MPI_Comm_size(MPI_COMM_WORLD,&world_size);
// Some fake data
S local, global;
local.a = world_rank;
local.b = world_size - world_rank;
MPI_Allreduce(&local, &global, 1, MPI_2INT, MPI_MAXLOC, MPI_COMM_WORLD);
if(world_rank == 0){
printf("%d %d\n", global.b, global.a);
}
MPI_Finalize();
return 0;
}
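For reference, the example above can be built and run like any other MPI program (the file name below is made up); with 4 processes it should print 4 0, i.e. the largest b together with the a from the rank that owned it:
mpicc maxloc_2int.c -o maxloc_2int
mpiexec -n 4 ./maxloc_2int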
Second Approach
MPI_MAXLOC only works with a certain set of predefined pair datatypes. Nonetheless, for the remaining cases you can use the following approach (based on this SO thread):
Create a struct that will contain the values a and b;
Create a custom MPI_Datatype representing the struct from step 1, to be sent across processes;
Use MPI_AllReduce:
int MPI_Allreduce(const void *sendbuf, void *recvbuf, int count,
MPI_Datatype datatype, MPI_Op op, MPI_Comm comm)
Combines values from all processes and distributes the result back to
all processes
Use the operation MAX;
I'd like to send the best value of 'a' to every process and store it in
'c' where best is defined by having the largest value of 'b'.
Then you have to tell MPI to only consider the element b of the struct. Hence, you need to create a custom MPI_Op max operation.
Coding the approach
So let us break down the aforementioned implementation step by step:
First define the struct:
typedef struct MyStruct {
double a, b;
} S;
Second, create the custom MPI_Datatype:
void defineStruct(MPI_Datatype *tstype) {
const int count = 2;
int blocklens[count];
MPI_Datatype types[count];
MPI_Aint disps[count];
for (int i=0; i < count; i++){
types[i] = MPI_DOUBLE;
blocklens[i] = 1;
}
disps[0] = offsetof(S,a);
disps[1] = offsetof(S,b);
MPI_Type_create_struct(count, blocklens, disps, types, tstype);
MPI_Type_commit(tstype);
}
Very Important
Note that since we are using a struct you have to be careful with the fact that (source)
the C standard allows arbitrary padding between the fields.
So reducing a struct with two doubles is NOT the same as reducing an array with two doubles.
In the main you have to do:
MPI_Datatype structtype;
defineStruct(&structtype);
Third, create the custom max operation (note that the whole element is copied when its b wins, so that a travels along with b):
void max_struct(void *in, void *inout, int *len, MPI_Datatype *type){
S *invals = in;
S *inoutvals = inout;
for (int i=0; i < *len; i++)
if (invals[i].b > inoutvals[i].b)
inoutvals[i] = invals[i]; /* keep a together with the winning b */
}
in the main do:
MPI_Op maxstruct;
MPI_Op_create(max_struct, 1, &maxstruct);
Finally, call the MPI_AllReduce:
S local, global;
...
MPI_Allreduce(&local, &global, 1, structtype, maxstruct, MPI_COMM_WORLD);
The entire code put together:
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <stddef.h> /* for offsetof */
#include <mpi.h>
typedef struct MyStruct {
double a, b;
} S;
void max_struct(void *in, void *inout, int *len, MPI_Datatype *type){
S *invals = in;
S *inoutvals = inout;
for (int i=0; i<*len; i++)
if (invals[i].b > inoutvals[i].b)
inoutvals[i] = invals[i]; /* keep a together with the winning b */
}
void defineStruct(MPI_Datatype *tstype) {
const int count = 2;
int blocklens[count];
MPI_Datatype types[count];
MPI_Aint disps[count];
for (int i=0; i < count; i++) {
types[i] = MPI_DOUBLE;
blocklens[i] = 1;
}
disps[0] = offsetof(S,a);
disps[1] = offsetof(S,b);
MPI_Type_create_struct(count, blocklens, disps, types, tstype);
MPI_Type_commit(tstype);
}
int main(int argc,char *argv[]){
MPI_Init(NULL,NULL); // Initialize the MPI environment
int world_rank;
int world_size;
MPI_Comm_rank(MPI_COMM_WORLD,&world_rank);
MPI_Comm_size(MPI_COMM_WORLD,&world_size);
MPI_Datatype structtype;
MPI_Op maxstruct;
S local, global;
defineStruct(&structtype);
MPI_Op_create(max_struct, 1, &maxstruct);
// Just some random values
local.a = world_rank;
local.b = world_size - world_rank;
MPI_Allreduce(&local, &global, 1, structtype, maxstruct, MPI_COMM_WORLD);
if(world_rank == 0){
double c = global.a;
printf("%f %f\n", global.b, c);
}
MPI_Finalize();
return 0;
}
You can pair the value of b with the rank of the process to find the rank that contains the maximum value of b. The MPI_DOUBLE_INT type is very useful for this purpose. You can then broadcast a from this rank in order to have the value at each process.
#include <mpi.h>
#include <stdlib.h>
#include <stdio.h>
int main(int argc, char *argv[])
{
int my_rank;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
// Create random a and b on each rank.
srand(123 + my_rank);
double a = rand() / (double)RAND_MAX;
double b = rand() / (double)RAND_MAX;
struct
{
double value;
int rank;
} s_in, s_out;
s_in.value = b;
s_in.rank = my_rank;
printf("before: %d, %f, %f\n", my_rank, a, b);
// Find the maximum value of b and the corresponding rank.
MPI_Allreduce(&s_in, &s_out, 1, MPI_DOUBLE_INT, MPI_MAXLOC, MPI_COMM_WORLD);
b = s_out.value;
// Broadcast from the rank with the maximum value.
MPI_Bcast(&a, 1, MPI_DOUBLE, s_out.rank, MPI_COMM_WORLD);
printf("after: %d, %f, %f\n", my_rank, a, b);
MPI_Finalize();
}

MPI user defined type for parts of three different matrices

I am doing a particle simulation and need to send parts of three different arrays to other processes. How can I use MPI user-defined types to do this?
For example, suppose I have three matrices of doubles, A, B, and C, on Process 1. Now I want to send the first two rows of A, B, and C to Process 2. How can I use an MPI user-defined type to do this, assuming C-style storage for these matrices? Thank you.
Currently, I am copying the first two rows of these matrices into a single buffer and then performing an MPI send. This basically involves the following steps:
Copy the first two rows of A, B, and C to a send_buffer on Process 1.
Send the send_buffer from Process 1 to Process 2.
On Process 2, use recv_buffer to receive data from Process 1.
On Process 2, copy data from recv_buffer to A, B, C on Process 2.
I hope there is a better way to do this. Thanks.
In the code below, an MPI data type is defined to communicate a range of rows of a matrix. If there are three matrices, then there would be three sends/receives. You can compare the following code with your own code to see which one is better.
If you think transferring the matrices one by one is not efficient, then you might put all matrices inside a struct and make an MPI data type, or consider using MPI_PACK (see the sketch after the code below).
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
void make_layout(int row_begin, int row_end, int ncol, MPI_Datatype* mpi_dtype)
{
int nblock = 1;
int block_count = (row_end - row_begin + 1) * ncol;
MPI_Aint lb, extent;
MPI_Type_get_extent(MPI_DOUBLE, &lb, &extent);
MPI_Aint offset = row_begin * ncol * extent;
MPI_Datatype block_type = MPI_DOUBLE;
MPI_Type_create_struct(nblock, &block_count, &offset, &block_type, mpi_dtype);
MPI_Type_commit(mpi_dtype);
}
double** allocate(int nrow, int ncol)
{
double *data = (double *)malloc(nrow*ncol*sizeof(double));
double **array= (double **)malloc(nrow*sizeof(double*));
for (int i=0; i<nrow; i++)
array[i] = &(data[ncol*i]);
return array;
}
int main()
{
MPI_Init(NULL, NULL);
int rank;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
// make 3x3 matrix.
int nrow = 3;
int ncol = 3;
double** A = allocate(nrow, ncol);
// make mpi datatype to send rows [0, 1] excluding row 2.
// you can send any range of rows i.e. rows [row_begin, row_end].
int row_begin = 0;
int row_end = 1; // inclusive.
MPI_Datatype mpi_dtype;
make_layout(row_begin, row_end, ncol, &mpi_dtype);
if (rank == 0)
MPI_Send(&(A[0][0]), 1, mpi_dtype, 1, 0, MPI_COMM_WORLD);
else
MPI_Recv(&(A[0][0]), 1, mpi_dtype, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
MPI_Type_free(&mpi_dtype);
MPI_Finalize();
return 0;
}
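As a rough sketch of the struct/MPI_BOTTOM idea mentioned above (this is not part of the original answer, and the helper name is made up), the first two rows of three separately allocated matrices could also be sent in a single message by describing the three blocks with absolute addresses, assuming each matrix was allocated contiguously with allocate() as above:
void make_three_matrix_type(double **A, double **B, double **C,
                            int nrows_to_send, int ncol, MPI_Datatype *newtype)
{
    int n = nrows_to_send * ncol; /* number of doubles taken from each matrix */
    int blocklens[3] = { n, n, n };
    MPI_Datatype types[3] = { MPI_DOUBLE, MPI_DOUBLE, MPI_DOUBLE };
    MPI_Aint disps[3];
    /* absolute addresses of the start of each matrix (rows are stored contiguously) */
    MPI_Get_address(&(A[0][0]), &disps[0]);
    MPI_Get_address(&(B[0][0]), &disps[1]);
    MPI_Get_address(&(C[0][0]), &disps[2]);
    MPI_Type_create_struct(3, blocklens, disps, types, newtype);
    MPI_Type_commit(newtype);
}
Both ranks build the type over their own A, B, and C, then communicate using MPI_BOTTOM as the buffer, e.g. MPI_Send(MPI_BOTTOM, 1, dtype, 1, 0, MPI_COMM_WORLD) on rank 0 and the matching MPI_Recv(MPI_BOTTOM, 1, dtype, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE) on rank 1, freeing the type afterwards with MPI_Type_free(&dtype).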

MPI Error: (Segmentation fault: 11)

I have seen many similar threads here, but the problem in my case is that my program actually runs for some settings.
For example, when my matrix is 1024x1024:
For 2 cores: Error 11
For 4, 8, 16, etc., it works fine.
Matrix 2048x2048:
For any core setting: Error 11.
I don't understand why this happens; each process takes a 2048/(total processes) x 2048 matrix to calculate, and it should be working correctly.
This is how i declare my matrix:
int temp[wp][hp];
For receive:
rc = MPI_Recv(&temp[0][0], wp*hp, MPI_INT, i, tag, MPI_COMM_WORLD, &status);
And for send:
rc = MPI_Send(&temp[0][0], wp*hp, MPI_INT, 0, tag, MPI_COMM_WORLD);
I don't get it, it should be working. Do you think it is perhaps a memory issue and not pointer related?
I would create the array with malloc
int *temp =(int*)malloc(wp * hp * sizeof(int));
then I would change the other lines to
rc = MPI_Recv(temp, wp*hp, MPI_INT, i, tag, MPI_COMM_WORLD, &status);
and
rc = MPI_Send(temp, wp*hp, MPI_INT, 0, tag, MPI_COMM_WORLD);
and when I'm done with the array, I free it.
free(temp)
Like one of the commenters already stated, allocating the array your way is not legal C++.
Edit:
if you want to access the array two-dimensionally, here is the pattern for that:
temp[rowToAccess * numberOfColumns + columnToAccess]
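Putting those two pieces together, a minimal sketch (reusing wp, hp, tag and rc from the question, with hp playing the role of numberOfColumns):
int *temp = malloc((size_t)wp * hp * sizeof(int)); /* one contiguous heap block */
temp[2 * hp + 3] = 42;                             /* element at row 2, column 3 */
rc = MPI_Send(temp, wp*hp, MPI_INT, 0, tag, MPI_COMM_WORLD);
free(temp);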
The answer given is basically a rundown of the issue -- you are more than likely using up the stack space to store the array, and thus exhausting the memory reserved for the stack.
Thus the solution is to create the array dynamically.
Here is an example of creating a 2D array dynamically:
#include <stdlib.h>
//...
int** create2DArray(unsigned nrows, unsigned ncols)
{
unsigned i;
int** ptr = malloc(nrows * sizeof(int*)); // allocate pointers to rows
if ( ptr != NULL )
{
int* pool = malloc(nrows * ncols * sizeof(int)); // allocate pool of memory
if ( pool != NULL )
{
for (i = 0; i < nrows; ++i, pool += ncols )
ptr[i] = pool; // point the row pointers into the pool
}
else
{
free(ptr);
ptr = NULL;
}
}
return ptr;
}
void delete2DArray(int** arr)
{
free(arr[0]); // remove the pool
free(arr); // remove the pointers
}
int main()
{
int **temp = create2DArray(2048, 2048);
if ( temp != NULL )
{
temp[0][0] = 10; // for example you can use it like a 2D array
delete2DArray(temp); // free the memory
}
}
This will create, in essence, a contiguous 2D array, similar to the one you attempted to create with int temp[2048][2048], but this time the memory is obtained from the heap, not the stack.
Note that you can use the [][] syntax, just like with a 2-dimensional array.
I won't go into detail about how it works; however, it is simple enough to follow the logic.
So these were the problems:
1) As indicated by #PaulMcKenzie and #Alex, I had to dynamically allocate memory. Now the problem was that if I used
rc = MPI_Send(temp, wp*hp, MPI_INT, 0, tag, MPI_COMM_WORLD);
or
rc = MPI_Recv(temp, wp*hp, MPI_INT, i, tag, MPI_COMM_WORLD, &status);
as #Alex suggested, then my program would crash for some reason. So my initial &temp[0][0] was correct.
2) The second problem was that I was saving this array using fwrite, but when you have dynamically allocated arrays you need to do it like this:
for (int i = 0; i < w; i++){
for (int j = 0; j < h; j++){
fwrite(&im[i][j], sizeof(int), 1, yourfilename);
}
}
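As a side note (not from the original answer): if the 2D array is backed by a single contiguous pool, as in create2DArray above, the whole block can presumably also be written with one call:
fwrite(&im[0][0], sizeof(int), (size_t)w * h, yourfilename);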
Cheers everyone

Writing distributed arrays using MPI-IO and Cartesian topology

I have an MPI code that implements 2D domain decomposition to compute numerical solutions to a PDE. Currently I write certain 2D distributed arrays out for each process (e.g. array_x--> proc000x.bin). I want to reduce that to a single binary file.
array_0, array_1,
array_2, array_3,
Suppose the above illustrates a cartesian topology with 4 processes (2x2). Each 2D array has dimension (nx + 2, nz + 2). The +2 signifies "ghost" layers added to all sides for communication purposes.
I would like to extract the main arrays (omit the ghost layers) and write them to a single binary file with an order something like,
array_0, array_1, array_2, array_3 --> output.bin
If possible it would be preferable to write it as though I had access to the global grid and was writing row-by-row i.e.,
row 0 of array_0, row 0 of array_1, row 1 of array_0, row 1 of array_1, ...
The attempt below (in file array_test.c) tries the former of the two output formats.
#include <stdio.h>
#include <mpi.h>
#include <stdlib.h>
/* 2D array allocation */
float **alloc2D(int rows, int cols);
float **alloc2D(int rows, int cols) {
int i, j;
float *data = malloc(rows * cols * sizeof(float));
float **arr2D = malloc(rows * sizeof(float *));
for (i = 0; i < rows; i++) {
arr2D[i] = &(data[i * cols]);
}
/* Initialize to zero */
for (i= 0; i < rows; i++) {
for (j=0; j < cols; j++) {
arr2D[i][j] = 0.0;
}
}
return arr2D;
}
int main(void) {
/* Creates 5x5 array of floats with padding layers and
* attempts to write distributed arrays */
/* Run toy example with 4 processes */
int i, j, row, col;
int nx = 5, ny = 5, npad = 1;
int my_rank, nproc=4;
int dim[2] = {2, 2}; /* 2x2 cartesian grid */
int period[2] = {0, 0};
int coord[2];
int reorder = 1;
float **A = NULL;
MPI_Comm grid_Comm;
/* Initialize MPI */
MPI_Init(NULL, NULL);
MPI_Comm_size(MPI_COMM_WORLD, &nproc);
MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
/* Establish cartesian topology */
MPI_Cart_create(MPI_COMM_WORLD, 2, dim, period, reorder, &grid_Comm);
/* Get cartesian grid indicies of processes */
MPI_Cart_coords(grid_Comm, my_rank, 2, coord);
row = coord[1];
col = coord[0];
/* Add ghost layers */
nx += 2 * npad;
ny += 2 * npad;
A = alloc2D(nx, ny);
/* Create derived datatype for interior grid (output grid) */
MPI_Datatype grid;
int start[2] = {npad, npad};
int arrsize[2] = {nx, ny};
int gridsize[2] = {nx - 2 * npad, ny - 2 * npad};
MPI_Type_create_subarray(2, arrsize, gridsize,
start, MPI_ORDER_C, MPI_FLOAT, &grid);
MPI_Type_commit(&grid);
/* Fill interior grid */
for (i = npad; i < nx-npad; i++) {
for (j = npad; j < ny-npad; j++) {
A[i][j] = my_rank + i;
}
}
/* MPI IO */
MPI_File fh;
MPI_Status status;
char file_name[100];
int N, offset;
sprintf(file_name, "output.bin");
MPI_File_open(grid_Comm, file_name, MPI_MODE_CREATE | MPI_MODE_WRONLY,
MPI_INFO_NULL, &fh);
N = (nx - 2 * npad) * (ny - 2 *npad);
offset = (row * 2 + col) * N * sizeof(float);
MPI_File_set_view(fh, offset, MPI_FLOAT, grid, "native",
MPI_INFO_NULL);
MPI_File_write_all(fh, &A[0][0], N, MPI_FLOAT, MPI_STATUS_IGNORE);
MPI_File_close(&fh);
/* Cleanup */
free(A[0]);
free(A);
MPI_Type_free(&grid);
MPI_Finalize();
return 0;
}
Compiles with
mpicc -o array_test array_test.c
Runs with
mpiexec -n 4 array_test
While the code compiles and runs, the output is incorrect. I'm assuming that I have misinterpreted the use of the derived datatype and file writing in this case. I'd appreciate some help figuring out my mistakes.
The error you make here is that you have the wrong file view. Instead of creating a type representing the share of the file the current processor is responsible for, you use the mask corresponding to the local data you want to write.
You have actually two very distinct masks to consider:
The mask for the local data, excluding the halo layers; and
The mask for the global data, as it should be once collated into the file.
The former corresponds to this layout:
Here, the data that you want to output to the file for a given process is in dark blue, and the halo layer that should not be written to the file is in lighter blue.
The latter corresponds to this layout:
Here, each colour corresponds to the local data coming from a different process, as distributed on the 2D Cartesian grid.
To understand what you need to create to reach this final result, you have to think backwards:
Your final call to the IO routine should be MPI_File_write_all(fh, &A[0][0], 1, interior, MPI_STATUS_IGNORE);. So you have to have your interior type defined so as to exclude the halo boundary. Fortunately, the type grid you created already does exactly that, so we will use it.
But now you have to set the view on the file to allow for this MPI_File_write_all() call. So the view must be as described in the second picture. We will therefore create a new MPI type representing it. For that, MPI_Type_create_subarray() is what we need.
Here is the synopsis of this function:
int MPI_Type_create_subarray(int ndims,
const int array_of_sizes[],
const int array_of_subsizes[],
const int array_of_starts[],
int order,
MPI_Datatype oldtype,
MPI_Datatype *newtype)
Create a datatype for a subarray of a regular, multidimensional array
INPUT PARAMETERS
ndims - number of array dimensions (positive integer)
array_of_sizes
- number of elements of type oldtype in each
dimension of the full array (array of positive integers)
array_of_subsizes
- number of elements of type oldtype in each dimension of
the subarray (array of positive integers)
array_of_starts
- starting coordinates of the subarray in each dimension
(array of nonnegative integers)
order - array storage order flag (state)
oldtype - array element datatype (handle)
OUTPUT PARAMETERS
newtype - new datatype (handle)
For our 2D Cartesian file view, here are what we need for these input parameters:
ndims: 2 as the grid is 2D
array_of_sizes: these are the dimensions of the global array to output, namely { nnx*dim[0], nny*dim[1] }
array_of_subsizes: these are the dimensions of the local share of the data to output, namely { nnx, nny }
array_of_starts: these are the x,y start coordinates of the local share within the global grid, namely { nnx*coord[0], nny*coord[1] }
order: the ordering is C so this must be MPI_ORDER_C
oldtype: data are floats so this must be MPI_FLOAT
Now that we have our type for the file view, we simply apply it with MPI_File_set_view(fh, 0, MPI_FLOAT, view, "native", MPI_INFO_NULL); and the magic is done.
Your full code becomes:
#include <stdio.h>
#include <mpi.h>
#include <stdlib.h>
/* 2D array allocation */
float **alloc2D(int rows, int cols);
float **alloc2D(int rows, int cols) {
int i, j;
float *data = malloc(rows * cols * sizeof(float));
float **arr2D = malloc(rows * sizeof(float *));
for (i = 0; i < rows; i++) {
arr2D[i] = &(data[i * cols]);
}
/* Initialize to zero */
for (i= 0; i < rows; i++) {
for (j=0; j < cols; j++) {
arr2D[i][j] = 0.0;
}
}
return arr2D;
}
int main(void) {
/* Creates 5x5 array of floats with padding layers and
* attempts to write distributed arrays */
/* Run toy example with 4 processes */
int i, j, row, col;
int nx = 5, ny = 5, npad = 1;
int my_rank, nproc=4;
int dim[2] = {2, 2}; /* 2x2 cartesian grid */
int period[2] = {0, 0};
int coord[2];
int reorder = 1;
float **A = NULL;
MPI_Comm grid_Comm;
/* Initialize MPI */
MPI_Init(NULL, NULL);
MPI_Comm_size(MPI_COMM_WORLD, &nproc);
MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
/* Establish cartesian topology */
MPI_Cart_create(MPI_COMM_WORLD, 2, dim, period, reorder, &grid_Comm);
/* Get cartesian grid indicies of processes */
MPI_Cart_coords(grid_Comm, my_rank, 2, coord);
row = coord[1];
col = coord[0];
/* Add ghost layers */
nx += 2 * npad;
ny += 2 * npad;
A = alloc2D(nx, ny);
/* Create derived datatype for interior grid (output grid) */
MPI_Datatype grid;
int start[2] = {npad, npad};
int arrsize[2] = {nx, ny};
int gridsize[2] = {nx - 2 * npad, ny - 2 * npad};
MPI_Type_create_subarray(2, arrsize, gridsize,
start, MPI_ORDER_C, MPI_FLOAT, &grid);
MPI_Type_commit(&grid);
/* Fill interior grid */
for (i = npad; i < nx-npad; i++) {
for (j = npad; j < ny-npad; j++) {
A[i][j] = my_rank + i;
}
}
/* Create derived type for file view */
MPI_Datatype view;
int nnx = nx-2*npad, nny = ny-2*npad;
int startV[2] = { coord[0]*nnx, coord[1]*nny };
int arrsizeV[2] = { dim[0]*nnx, dim[1]*nny };
int gridsizeV[2] = { nnx, nny };
MPI_Type_create_subarray(2, arrsizeV, gridsizeV,
startV, MPI_ORDER_C, MPI_FLOAT, &view);
MPI_Type_commit(&view);
/* MPI IO */
MPI_File fh;
MPI_File_open(grid_Comm, "output.bin", MPI_MODE_CREATE | MPI_MODE_WRONLY,
MPI_INFO_NULL, &fh);
MPI_File_set_view(fh, 0, MPI_FLOAT, view, "native", MPI_INFO_NULL);
MPI_File_write_all(fh, &A[0][0], 1, grid, MPI_STATUS_IGNORE);
MPI_File_close(&fh);
/* Cleanup */
free(A[0]);
free(A);
MPI_Type_free(&view);
MPI_Type_free(&grid);
MPI_Finalize();
return 0;
}
