MPI Scatter Array of Matrices Struct - c

I have an array of Matrix structs that the program reads from the user's input. I need to distribute the matrices to processes with Open MPI. I tried using Scatter, but I am quite confused about the arguments needed for the call to work (and also about how to receive the data in each local array). Here is my current code:
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <mpi.h>

#define nil NULL
#define NMAX 100
#define DATAMAX 1000
#define DATAMIN -1000

typedef struct Matrix
{
    int mat[NMAX][NMAX]; // Matrix cells
    int row_eff;         // Matrix effective row
    int col_eff;         // Matrix effective column
} Matrix;

void init_matrix(Matrix *m, int nrow, int ncol)
{
    m->row_eff = nrow;
    m->col_eff = ncol;
    for (int i = 0; i < m->row_eff; i++)
    {
        for (int j = 0; j < m->col_eff; j++)
        {
            m->mat[i][j] = 0;
        }
    }
}

Matrix input_matrix(int nrow, int ncol)
{
    Matrix input;
    init_matrix(&input, nrow, ncol);
    for (int i = 0; i < nrow; i++)
    {
        for (int j = 0; j < ncol; j++)
        {
            scanf("%d", &input.mat[i][j]);
        }
    }
    return input;
}

void print_matrix(Matrix *m)
{
    for (int i = 0; i < m->row_eff; i++)
    {
        for (int j = 0; j < m->col_eff; j++)
        {
            printf("%d ", m->mat[i][j]);
        }
        printf("\n");
    }
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    // Get number of processes
    int size;
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Get process rank
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // Get matrices from user inputs
    int kernel_row, kernel_col, num_targets, target_row, target_col;

    // reads kernel's row and column and initalize kernel matrix from input
    scanf("%d %d", &kernel_row, &kernel_col);
    Matrix kernel = input_matrix(kernel_row, kernel_col);

    // reads number of target matrices and their dimensions.
    // initialize array of matrices and array of data ranges (int)
    scanf("%d %d %d", &num_targets, &target_row, &target_col);
    Matrix *arr_mat = (Matrix *)malloc(num_targets * sizeof(Matrix));
    for (int i = 0; i < num_targets; i++)
    {
        arr_mat[i] = input_matrix(target_row, target_col);
    }

    // Get number of matrices per process
    int num_mat_per_proc = ceil(num_targets / size);

    // Init local matrices and scatter the global matrices
    Matrix *local_mats = (Matrix *)malloc(num_mat_per_proc * sizeof(Matrix));
    MPI_Scatter(arr_mat, sizeof(local_mats), MPI_BYTE, &local_mats, sizeof(local_mats), MPI_BYTE, 0, MPI_COMM_WORLD);

    if (rank == 0)
    {
        // Range arrays -> array of convolution results
        int arr_range[num_targets];
        printf("From master \n");
        for (int i = 0; i < 3; i++)
        {
            print_matrix(&arr_mat[i]);
        }
    }
    else
    {
        printf("From slave %d = \n", rank);
        print_matrix(&local_mats[0]);
    }

    MPI_Finalize();
}
So here are a few doubts I have about the current implementation:
Can I accept the input just like that, or should I make it so that it only happens in rank 0?
How do I implement the scatter part, possibly using Scatterv, since the number of matrices might not be divisible by the number of processes?

Can I accept the input just like that or should I make it so that it
only happens in rank 0?
No. As best practice, you should use command-line arguments or read from a file.
If you want to use scanf, then do it inside rank 0 only. STDIN is typically forwarded to rank 0 (as far as I know this is not guaranteed by the standard, but it usually works and is implementation dependent).
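A minimal sketch of that approach (reading on rank 0 only and broadcasting what every rank needs; the variable names are taken from the question's code, and it assumes every rank needs the kernel and the dimensions) could look like this:

Matrix kernel;
if (rank == 0)
{
    // Only rank 0 touches stdin
    scanf("%d %d", &kernel_row, &kernel_col);
    kernel = input_matrix(kernel_row, kernel_col);
    scanf("%d %d %d", &num_targets, &target_row, &target_col);
}
// Every other rank gets the values from rank 0
MPI_Bcast(&num_targets, 1, MPI_INT, 0, MPI_COMM_WORLD);
MPI_Bcast(&target_row, 1, MPI_INT, 0, MPI_COMM_WORLD);
MPI_Bcast(&target_col, 1, MPI_INT, 0, MPI_COMM_WORLD);
MPI_Bcast(&kernel, sizeof(Matrix), MPI_BYTE, 0, MPI_COMM_WORLD);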
How do I implement the scatter part, possibly using Scatterv, since the
number of matrices might not be divisible by the number of processes?
If you need to send different sizes to different processes, then you should use MPI_Scatterv.
Scatter Syntax:
MPI_Scatter(
    void* send_data,
    int send_count,
    MPI_Datatype send_datatype,
    void* recv_data,
    int recv_count,
    MPI_Datatype recv_datatype,
    int root,
    MPI_Comm communicator)
Your usage:
MPI_Scatter(arr_mat, sizeof(local_mats), MPI_BYTE, &local_mats, sizeof(local_mats), MPI_BYTE, 0, MPI_COMM_WORLD);
Potential error points:
send_count: the size to send (as Gilles Gouaillardet pointed out in the comments) should not be sizeof(local_mats); it should be num_mat_per_proc * sizeof(Matrix).
recv_count: likewise, the size to receive should not be sizeof(local_mats).
Since you use the same type (MPI_BYTE) for send and receive, your send_count == recv_count.
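For example, a corrected call (a sketch based on the code in the question; it assumes num_targets is divisible by size) could look like this:

int num_mat_per_proc = num_targets / size;
Matrix *local_mats = malloc(num_mat_per_proc * sizeof(Matrix));

// Counts are in bytes, send_count per rank equals recv_count, and the
// receive buffer is the allocated block itself, not the address of the pointer.
MPI_Scatter(arr_mat, num_mat_per_proc * sizeof(Matrix), MPI_BYTE,
            local_mats, num_mat_per_proc * sizeof(Matrix), MPI_BYTE,
            0, MPI_COMM_WORLD);

If num_targets is not divisible by size, compute per-rank byte counts and displacements and use MPI_Scatterv instead, roughly like this:

int *sendcounts = malloc(size * sizeof(int));
int *displs = malloc(size * sizeof(int));
int offset = 0;
for (int i = 0; i < size; i++)
{
    // The first (num_targets % size) ranks get one extra matrix
    int n = num_targets / size + (i < num_targets % size ? 1 : 0);
    sendcounts[i] = n * sizeof(Matrix); // bytes sent to rank i
    displs[i] = offset;                 // byte offset into arr_mat
    offset += sendcounts[i];
}
Matrix *local_mats = malloc(sendcounts[rank]);
MPI_Scatterv(arr_mat, sendcounts, displs, MPI_BYTE,
             local_mats, sendcounts[rank], MPI_BYTE,
             0, MPI_COMM_WORLD);

Alternatively, you can build a derived datatype for Matrix (for example with MPI_Type_contiguous(sizeof(Matrix), MPI_BYTE, &mat_type)) so the counts are expressed in matrices rather than bytes.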

Related

MPI_Get doesn't send the correct elements between the buffers of two processes

I am trying to create a program that will ultimately be transposing a matrix in MPI so that it can be used in further computations. But right now I am trying to do a simple thing: Root process has a 4x4 matrix "A" which contains elements 0..15 in row-major order. This data is scattered to 2 processes so that each receives one half of the matrix. Process 0 has a 2x4 sub_matrix "a" and receives elements 0..7 and Process 1 gets elements 8..15 in its sub_matrix "a".
My goal is for these processes to swap their a matrices with each other using MPI_Get. Since I was encountering problems, I decided to test a simpler version and simply make process 0 get process 1's "a" matrix; that way, both processes will have the same elements in their respective sub_matrices once I print after the MPI_Get call and the MPI_Win_fence calls.
Yet the output is erratic; I have tried to troubleshoot for several hours but haven't been able to crack the nut. I would appreciate your help with this.
This is the code below, and the run-command: mpirun -n 2 ./get
Compile: mpicc -std=c99 -g -O3 -o get get.c -lm
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define NROWS 4
#define NCOLS 4

int allocate_matrix(int ***M, int ROWS, int COLS) {
    int *p;
    if (NULL == (p = malloc(ROWS * COLS * sizeof(int)))) {
        perror("Couldn't allocate memory for input (p in allocate_matrix)");
        return -1;
    }
    if (NULL == (*M = malloc(ROWS * sizeof(int*)))) {
        perror("Couldn't allocate memory for input (M in allocate_matrix)");
        return -1;
    }
    for(int i = 0; i < ROWS; i++) {
        (*M)[i] = &(p[i * COLS]);
    }
    return 0;
}

int main(int argc, char *argv[])
{
    int rank, nprocs, **A, **a, n_cols, n_rows, block_len;
    MPI_Win win;
    int errs = 0;

    if(rank==0)
    {
        allocate_matrix(&A, NROWS, NCOLS);
        for (int i=0; i<NROWS; i++)
            for (int j=0; j<NCOLS; j++)
                A[i][j] = i*NCOLS + j;
    }

    MPI_Init(&argc,&argv);
    MPI_Comm_size(MPI_COMM_WORLD,&nprocs);
    MPI_Comm_rank(MPI_COMM_WORLD,&rank);

    n_cols=NCOLS; //cols in a sub_matrix
    n_rows=NROWS/nprocs; //rows in a sub_matrix
    block_len = n_cols*n_rows;

    allocate_matrix(&a, n_rows, n_cols);
    for (int i = 0; i <n_rows; i++)
        for (int j = 0; j < n_cols; j++)
            a[i][j] = 0;

    MPI_Datatype block_type;
    MPI_Type_vector(n_rows, n_cols, n_cols, MPI_INTEGER, &block_type);
    MPI_Type_commit(&block_type);

    MPI_Scatter(*A, 1, block_type, &(a[0][0]), block_len, MPI_INTEGER, 0, MPI_COMM_WORLD);
    MPI_Barrier(MPI_COMM_WORLD);

    printf("process %d: \n", rank);
    for (int j=0; j<n_rows; j++){
        for (int i=0; i<n_cols; i++){
            printf("%d ",a[j][i]);
        }
        printf("\n");
    }

    if (rank == 0)
    {
        printf("TESTING, before Get a[0][0] %d\n", a[0][0]);
        MPI_Win_create(NULL, 0, 1, MPI_INFO_NULL, MPI_COMM_WORLD, &win);
        MPI_Win_fence((MPI_MODE_NOPUT | MPI_MODE_NOPRECEDE), win);
        MPI_Get(*a, 8, MPI_INTEGER, 1, 0, 8, MPI_INTEGER, win);
        MPI_Win_fence(MPI_MODE_NOSUCCEED, win);
        printf("TESTING, after Get a[0][0] %d\n", a[0][0]);
        printf("process %d:\n", rank);
        for (int j=0; j<n_rows; j++){
            for (int i=0; i<n_cols; i++){
                printf("%d ", a[j][i]);
            }
            printf("\n");
        }
    }
    else
    { /* rank = 1 */
        MPI_Win_create(a, n_rows*n_cols*sizeof(int), sizeof(int), MPI_INFO_NULL, MPI_COMM_WORLD, &win);
        MPI_Win_fence((MPI_MODE_NOPUT | MPI_MODE_NOPRECEDE), win);
        MPI_Win_fence(MPI_MODE_NOSUCCEED, win);
    }

    MPI_Type_free(&block_type);
    MPI_Win_free(&win);
    MPI_Finalize();
    return errs;
}
This is the output that I get:
process 0:
0 1 2 3
4 5 6 7
process 1:
8 9 10 11
12 13 14 15
process 0:
1552976336 22007 1552976352 22007
1552800144 22007 117 0
But what I want is that, the second time I print the matrix from process 0, it has the same elements as the matrix in process 1.
First, I doubt this is really the code you are testing. You are freeing MPI type variables that are not defined, and rank is also used uninitialised in
if(rank==0)
{
    allocate_matrix(&A, NROWS, NCOLS);
    for (int i=0; i<NROWS; i++)
        for (int j=0; j<NCOLS; j++)
            A[i][j] = i*NCOLS + j;
}
and the code segfaults because A won't get allocated in the root.
Moving this to after MPI_Comm_rank(), freeing the correct MPI type variable, and fixing the call to MPI_Win_create in rank 1:
MPI_Win_create(&a[0][0], n_rows*n_cols*sizeof(int), sizeof(int), MPI_INFO_NULL, MPI_COMM_WORLD, &win);
// This -------^^^^^^^^
produces the result you are seeking.
I'd recommend sticking to a single notation for the beginning of the array, like &a[0][0], instead of a mixture of *a and &a[0][0]. This will prevent (or at least reduce the occurrence of) similar errors in the future.
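To make the fix concrete, here is a sketch of the corrected ordering (only the relevant fragments; it assumes the rest of the program stays as posted):

MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);

// Allocate and fill A only after rank has been set
if (rank == 0)
{
    allocate_matrix(&A, NROWS, NCOLS);
    for (int i = 0; i < NROWS; i++)
        for (int j = 0; j < NCOLS; j++)
            A[i][j] = i * NCOLS + j;
}
...
// Rank 1 exposes the start of its data, not the array of row pointers
MPI_Win_create(&a[0][0], n_rows * n_cols * sizeof(int), sizeof(int),
               MPI_INFO_NULL, MPI_COMM_WORLD, &win);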

MPI matrix multiplication

I'm trying to make an MPI matrix multiplication program, but the scatter function doesn't seem to be working for me. Only one row is getting scattered and the rest of the cores receive garbage values.
Also, when calling the display_matrix() function before MPI_Init(), it seems to be running 4 threads instead of 1 (I have a quad-core CPU). Why is this happening even before initialisation?
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <mpi.h>

int **matrix_generator(int row,int col);
int **multiply_matrices(int **matrix_A,int **matrix_B,int rowsA, int colsA,int rowsB,int colsB);
void display_matrix(int **matrixA,int rows,int cols);

void main(int argc,char *argv[])
{
    srand(time(0));
    int **matrix_A,**matrix_B,**matrix_result,*scattered_matrix,*gathered_matrix, rowsA,colsA,rowsB,colsB,world_rank,world_size,i,j;

    rowsA = atoi(argv[1]);
    colsA = atoi(argv[2]);
    rowsB = atoi(argv[3]);
    colsB = atoi(argv[4]);
    scattered_matrix = (int *)malloc(sizeof(int) * rowsA*colsA/4);

    if (argc != 5)
    {
        fprintf(stderr,"Usage: mpirun -np <No. of processors> ./a.out <Rows A> <Columns A> <Rows B> <Columns B>\n");
        exit(-1);
    }
    else if(colsA != rowsB)
    {
        printf("Check the dimensions of the matrices!\n\n");
    }

    matrix_A = matrix_generator(rowsA,colsA);
    matrix_B = matrix_generator(rowsB,colsB);
    display_matrix(matrix_A,rowsA,colsA);

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    MPI_Scatter(matrix_A, rowsA*colsA/4, MPI_INT, scattered_matrix, rowsA*colsA/4, MPI_INT, 0, MPI_COMM_WORLD);

    for(i=0;i<world_size;i++)
    {
        printf("Scattering data %d from root to: %d \n",scattered_matrix[i],world_rank);
    }

    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();
}

int **matrix_generator(int row, int col)
{
    int i, j, **intMatrix;
    intMatrix = (int **)malloc(sizeof(int *) * row);
    for (i = 0; i < row; i++)
    {
        intMatrix[i] = (int *)malloc(sizeof(int *) * col);
        for (j = 0;j<col;j++)
        {
            intMatrix[i][j]=rand()%10;
        }
    }
    return intMatrix;
}

void display_matrix(int **matrix, int rows,int cols)
{
    int i,j;
    for (i = 0; i < rows; i = i + 1)
    {
        for (j = 0; j < cols; j = j + 1)
            printf("%d ",matrix[i][j]);
        printf("\n");
    }
}
The main issue is your matrices are not allocated in contiguous memory (see the comment section for a link)
The MPI standard does not specify what happens before an app invokes MPI_Init().
The two main MPI implementations choose to spawn all the tasks when mpirun is invoked (that means there are 4 independent processes first, and they "join" into a single MPI job when they all call MPI_Init()).
That being said, once upon a time a vendor chose to have mpirun start a single MPI task and to use its own remote-fork mechanism when MPI_Init() was called.
Bottom line: if you want to write portable code, do as little as possible (and never print anything) before MPI_Init() is called.
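As an illustration of the contiguous-memory point, here is a sketch of an allocator (a hypothetical matrix_generator_contiguous, not part of the original answer) that stores the whole matrix in one block while keeping the [i][j] access style:

int **matrix_generator_contiguous(int row, int col)
{
    // One contiguous block of row*col ints, plus an array of row pointers into it
    int *data = malloc(sizeof(int) * row * col);
    int **m = malloc(sizeof(int *) * row);
    for (int i = 0; i < row; i++)
    {
        m[i] = &data[i * col];
        for (int j = 0; j < col; j++)
            m[i][j] = rand() % 10;
    }
    return m;
}

With this layout you would pass &matrix_A[0][0] (or equivalently matrix_A[0]) as the send buffer to MPI_Scatter, not matrix_A itself.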

MPI gathering 2D subarrays

I know this has been answered many times before, and there is a comprehensive answer here which I have read and attempted to use, but I just can't get my code to work for some reason.
I have stripped my code down a bit to make it a bit easier to follow, but basically what I am trying to do is have each process initialise a sub-array and work on it, then put the whole big array back together on rank 0. MPI_Gatherv is giving me a segfault and I cannot figure out why.
Any help would be greatly appreciated.
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>
#include <mpi.h>

#define N 32

void init_lattice(double **site, int row, int col){
    int i,j;
    for(i=0; i<row; i++){
        for(j=0; j<col; j++){
            site[i][j]=(drand48()/4294967295.0 + 0.5)*2*M_PI;
        }
    }
}

int main(int argc, char *argv[]){
    int nprocs, rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_size (MPI_COMM_WORLD, &nprocs);
    MPI_Comm_rank (MPI_COMM_WORLD, &rank);

    int dim = 2;
    int grid[dim];
    grid[0]=0;
    grid[1]=0;

    // Assign the grid dimensions
    MPI_Dims_create(nprocs, dim, grid);
    printf("Dim grid: length: %d, width: %d\n", grid[0], grid[1]);

    // The new communicator
    MPI_Comm comm_grid;

    // Allow cyclic behavior
    int periodic[dim];
    periodic[0] = 1;
    periodic[1] = 1;

    // Create the communicator
    MPI_Cart_create(MPI_COMM_WORLD, dim, grid, periodic, 0, &comm_grid);

    int block_len, block_width;
    block_len = N/grid[1];
    block_width = N/grid[0];
    int i, j;

    //Create lattice subset
    double *data = (double *) malloc (block_len * block_width * sizeof(double));
    double **site = (double **) malloc (block_len * sizeof(double *));
    for (i = 0; i < block_len; i++)
        site[i] = & (data[i * block_width]);

    //Initialise lattice
    init_lattice(site, block_len, block_width);

    MPI_Datatype newtype, subtype;
    int sizes[dim];
    sizes[0]=N;
    sizes[1]=N;
    int subsizes[dim];
    subsizes[0] = block_len;
    subsizes[1] = block_width;
    int starts[dim];
    starts[0] = 0;
    starts[1] = 0;

    MPI_Type_create_subarray(2, sizes, subsizes, starts, MPI_ORDER_C, MPI_DOUBLE, &newtype);
    MPI_Type_create_resized(newtype, 0, N/grid[1]*sizeof(double), &subtype);
    MPI_Type_commit(&subtype);

    int sendcounts[grid[0]*grid[1]];
    int displs[grid[0]*grid[1]];

    if (rank == 0) {
        for (i=0; i<grid[0]*grid[1]; i++) sendcounts[i] = 1;
        int disp = 0;
        for (i=0; i<grid[0]; i++) {
            for (j=0; j<grid[1]; j++) {
                displs[i*grid[0]+j] = disp;
                disp += 1;
            }
            disp += ((N/grid[1])-1)*grid[0];
        }
    }

    //Create global lattice
    double *global_data = (double *) malloc (N * N * sizeof(double));
    double **global_site = (double **) malloc (N * sizeof(double *));
    for (i = 0; i < N; i++)
        global_site[i] = & (global_data[i * N]);

    MPI_Gatherv(&(site[0][0]), N*N/(grid[0]*grid[1]), MPI_DOUBLE, &(global_site[0][0]), sendcounts, displs, subtype, 0, MPI_COMM_WORLD);

    if(rank==0){
        printf("Rank: %d\n", rank);
        for(i=0; i<N; i++){
            for(j=0; j<N; j++){
                printf("%.2lf ", global_site[i][j]);
            }
            printf("\n");
        }
    }
    return 0;
}
EDIT:
Ok so I have changed my array allocations to contiguous memory and everything is working as it should now. Thanks talonmies!
The fundamental problem here is that MPI expects all allocations to be contiguous blocks of memory. Your site and global_site arrays are not, they are arrays of pointers. The MPI routines are just reading past the end of each individual row allocation and causing your segfault.
If you want to allocate an n x n array to use with MPI, then you need to replace this:
double **global_site;
if(rank==0){
    global_site = malloc(sizeof(double *)*(N));
    for(i=0; i<N; i++)
        global_site[i] = malloc(sizeof(double)*(N));
}
with something like this:
double *global_site = malloc(sizeof(double)*(N * N));
You will obviously need to adjust the rest of your code accordingly.
It seems the only reason you are actually using arrays of pointers is for the convenience of [i][j] style 2D indexing. If you use linear or pitched linear memory, you can easily make a little preprocessor macro or helper function which can give you that style of indexing into row or column major ordered storage which is still compatible with MPI.
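As an illustration (not part of the original answer), a hypothetical macro for row-major indexing into such a linear allocation, using the N from the question, could be:

// Row-major indexing into a linear N x N allocation
#define IDX(M, i, j) ((M)[(i) * N + (j)])

double *global_site = malloc(sizeof(double) * N * N);
IDX(global_site, 2, 3) = 1.0; /* the element at row 2, column 3 */

A helper function that takes the row length as a parameter would serve the same purpose for the per-process blocks, whose width is block_width rather than N.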

How do I use MPI to scatter a 2d array?

I am trying to use MPI to distribute the work for bucket sort. When I scatter the array, I wanted each process to receive a single bucket (an int array) and be able to print its contents. However, my current program prints out incorrect values, which makes me think I am not indexing into the memory I want. Can someone help explain how I can properly index into the array I am passing to each process, or what I am doing incorrectly?
#define MAX_VALUE 64
#define N 32

main(int argc, char *argv[]){
    MPI_Init(&argc, &argv); //initialize MPI environment

    int** sendArray = malloc(16*sizeof(int *));
    int *arrayIndex = (int *) malloc(16*sizeof(int));
    int *receiveArray = (int *) malloc(N*sizeof(int));
    int nps, myrank;

    MPI_Comm_size(MPI_COMM_WORLD, &nps);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

    int i;
    if(myrank == 0)
    {
        //create an array that stores the number of values in each bucket
        for( i = 0; i < 16; i++){
            arrayIndex[i] = 0;
        }

        int bucket =0;
        int temp = 0;

        //creates an int array within each array index of sendArray
        for( i = 0; i < 16; i++){
            sendArray[i] = (int *)malloc(N * sizeof(int));
        }

        //Create a random int array with values ranging from 0 to MAX_VALUE
        for(i = 0; i < N; i++){
            temp= rand() % MAX_VALUE;
            bucket = temp/4;
            printf("assigning %d to index [%d][%d]\n", temp, bucket, arrayIndex[bucket]);
            sendArray[bucket][arrayIndex[bucket]]= temp;
            arrayIndex[bucket] = arrayIndex[bucket] + 1;
        }

        MPI_Scatter(sendArray, 16, MPI_INT, receiveArray, N, MPI_INT, 0, MPI_COMM_WORLD);
        MPI_Bcast(arrayIndex, 16, MPI_INT, 0, MPI_COMM_WORLD);

        printf("bucket %d has %d values\n", myrank, arrayIndex[myrank]);
        for( i = 0; i < arrayIndex[myrank]; i++){
            printf("bucket %d index %d has value %d\n", myrank, i, receiveArray[i]);
        }
    }
What you are trying to do doesn't work because MPI always sends only the data you point to. It does not follow the pointers in your sendArray.
In your example you could just make your sendArray bigger, namely 16 * N, and put all your data into that contiguous array. That way you have a one-dimensional array, but that should not be a problem in the code you gave us because all buckets have the same length, so you can access element j of bucket i with sendArray[i * N + j].
Also, in most cases send_count should be equal to recv_count. In your case that would be N. The correct MPI call would be
MPI_Scatter(sendArray, N, MPI_INT, receiveArray, N, MPI_INT, 0, MPI_COMM_WORLD);
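A minimal sketch of the contiguous allocation this implies (reusing the variable names from the question) might be:

// One contiguous block holding 16 buckets of N ints each
int *sendArray = malloc(16 * N * sizeof(int));

// Element j of bucket i lives at sendArray[i * N + j]
sendArray[bucket * N + arrayIndex[bucket]] = temp;

With this layout the scatter hands each rank one bucket of N ints, and an MPI_Bcast of arrayIndex (called by every rank) tells each rank how many of those N slots are actually filled.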

MPI_Scatterv segfault

I'm just starting out with MPI programming and decided to make a simple distributed qsort using OpenMPI. To distribute parts of the array I want to sort, I'm trying to use MPI_Scatterv; however, the following code segfaults on me:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <time.h>
#include <mpi.h>

#define ARRAY_SIZE 26
#define BUFFER_SIZE 2048

int main(int argc, char** argv) {
    int my_rank, nr_procs;
    int* data_in, *data_out;
    int* sizes;
    int* offsets;

    srand(time(0));

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nr_procs);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    // everybody generates the control tables
    int nr_workers = nr_procs-1;
    sizes = malloc(sizeof(int)*nr_workers);
    offsets = malloc(sizeof(int)*nr_workers);
    int nr_elems = ARRAY_SIZE/nr_workers;

    // basic distribution
    for (int i = 0; i < nr_workers; ++i) {
        sizes[i] = nr_elems;
    }

    // distribute the remainder
    int left = ARRAY_SIZE%nr_workers;
    int curr_worker = 0;
    while (left) {
        ++sizes[curr_worker];
        curr_worker = (++curr_worker)%nr_workers;
        --left;
    }

    // offsets
    int curr_offset = 0;
    for (int i = 0; i < nr_workers; ++i) {
        offsets[i] = curr_offset;
        curr_offset += sizes[i];
    }

    if (my_rank == 0) {
        // root
        data_in = malloc(sizeof(int)*ARRAY_SIZE);
        data_out = malloc(sizeof(int)*ARRAY_SIZE);
        for (int i = 0; i < ARRAY_SIZE; ++i) {
            data_in[i] = rand();
        }
        for (int i = 0; i < nr_workers; ++i) {
            printf("%d at %d\n", sizes[i], offsets[i]);
        }
        MPI_Scatterv (data_in, sizes, offsets, MPI_INT, data_out, ARRAY_SIZE, MPI_INT, 0, MPI_COMM_WORLD);
    } else {
        // worker
        printf("%d has %d elements!\n",my_rank, sizes[my_rank-1]);
        // alloc the input buffer
        data_in = malloc(sizeof(int)*sizes[my_rank-1]);
        MPI_Scatterv(NULL, NULL, NULL, MPI_INT, data_in, sizes[my_rank-1], MPI_INT, 0, MPI_COMM_WORLD);
        printf("%d got:\n", my_rank);
        for (int i = 0; i < sizes[my_rank-1]; ++i) {
            printf("%d ", data_in[i]);
        }
        printf("\n");
    }

    MPI_Finalize();
    return 0;
}
How would I go about using Scatterv? Am I doing something wrong with allocating my input buffer from inside the worker code?
I changed some parts of your code to get something working.
MPI_Scatterv() will send data to every process, including the root itself. According to your program, processor 0 expects ARRAY_SIZE integers, but sizes[0] is much smaller.
There are other problems on the other processes: the root will send sizes[my_rank] integers to rank my_rank, but sizes[my_rank-1] integers will be expected...
Here is a version that scatters data_in from rank 0 to all processes, including rank 0 itself. Therefore I added 1 to nr_workers:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <time.h>
#include <mpi.h>

#define ARRAY_SIZE 26
#define BUFFER_SIZE 2048

int main(int argc, char** argv) {
    int my_rank, nr_procs;
    int* data_in, *data_out;
    int* sizes;
    int* offsets;

    srand(time(0));

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nr_procs);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    // everybody generates the control tables
    int nr_workers = nr_procs;
    sizes = malloc(sizeof(int)*nr_workers);
    offsets = malloc(sizeof(int)*nr_workers);
    int nr_elems = ARRAY_SIZE/nr_workers;

    // basic distribution
    for (int i = 0; i < nr_workers; ++i) {
        sizes[i] = nr_elems;
    }

    // distribute the remainder
    int left = ARRAY_SIZE%nr_workers;
    int curr_worker = 0;
    while (left) {
        ++sizes[curr_worker];
        curr_worker = (++curr_worker)%nr_workers;
        --left;
    }

    // offsets
    int curr_offset = 0;
    for (int i = 0; i < nr_workers; ++i) {
        offsets[i] = curr_offset;
        curr_offset += sizes[i];
    }

    if (my_rank == 0) {
        // root
        data_in = malloc(sizeof(int)*ARRAY_SIZE);
        for (int i = 0; i < ARRAY_SIZE; ++i) {
            data_in[i] = rand();
            printf("%d %d \n",i,data_in[i]);
        }
        for (int i = 0; i < nr_workers; ++i) {
            printf("%d at %d\n", sizes[i], offsets[i]);
        }
    } else {
        printf("%d has %d elements!\n",my_rank, sizes[my_rank]);
    }

    data_out = malloc(sizeof(int)*sizes[my_rank]);
    MPI_Scatterv (data_in, sizes, offsets, MPI_INT, data_out, sizes[my_rank], MPI_INT, 0, MPI_COMM_WORLD);

    printf("%d got:\n", my_rank);
    for (int i = 0; i < sizes[my_rank]; ++i) {
        printf("%d ", data_out[i]);
    }
    printf("\n");

    free(data_out);
    if(my_rank==0){
        free(data_in);
    }

    MPI_Finalize();
    return 0;
}
Regarding memory management, data_in and data_out should be freed at the end of the code.
Is this what you wanted to do? Good luck with qsort! I think you are not the first one to sort integers using MPI; see parallel sort using mpi. Your way of generating random numbers on process 0 and then scattering them is the right way to go. I think you will be interested in the TD_Trier() function there for the communication, even if you swap tri_fusion(T, 0, size - 1); for qsort(...).
Bye,
Francis
