How to convert MPI blocking code into non-blocking - C

I want to perform matrix multiplication. I have to write two versions of the code: one with MPI blocking communication and one with MPI non-blocking communication. The blocking version is done; I would like some help converting the code below to non-blocking MPI.
This is the matrix multiplication code with blocking communication that I want to convert to non-blocking. If anyone is available, please respond.
#include <stdlib.h>
#include <stdio.h>
#include "mpi.h"
#include <time.h>
#include <sys/time.h>
// Number of rows and columns in a matrix
#define N 4
MPI_Status status;
// Matrix holders are created
double matrix_a[N][N],matrix_b[N][N],matrix_c[N][N];
int main(int argc, char **argv)
{
int processCount, processId, slaveTaskCount, source, dest, rows, offset;
struct timeval start, stop;
// MPI environment is initialized
MPI_Init(&argc, &argv);
// Each process gets unique ID (rank)
MPI_Comm_rank(MPI_COMM_WORLD, &processId);
// Number of processes in communicator will be assigned to variable -> processCount
MPI_Comm_size(MPI_COMM_WORLD, &processCount);
// Number of slave tasks will be assigned to variable -> slaveTaskCount
slaveTaskCount = processCount - 1;
// Root (Master) process
if (processId == 0) {
// Matrix A and Matrix B both will be filled with random numbers
srand ( time(NULL) );
for (int i = 0; i<N; i++) {
for (int j = 0; j<N; j++) {
matrix_a[i][j]= rand()%10;
matrix_b[i][j]= rand()%10;
}
}
printf("\n\t\tMatrix - Matrix Multiplication using MPI\n");
// Print Matrix A
printf("\nMatrix A\n\n");
for (int i = 0; i<N; i++) {
for (int j = 0; j<N; j++) {
printf("%.0f\t", matrix_a[i][j]);
}
printf("\n");
}
// Print Matrix B
printf("\nMatrix B\n\n");
for (int i = 0; i<N; i++) {
for (int j = 0; j<N; j++) {
printf("%.0f\t", matrix_b[i][j]);
}
printf("\n");
}
rows = N/slaveTaskCount;
offset = 0;
for (dest=1; dest <= slaveTaskCount; dest++)
{
// Acknowledging the offset of the Matrix A
MPI_Send(&offset, 1, MPI_INT, dest, 1, MPI_COMM_WORLD);
// Acknowledging the number of rows
MPI_Send(&rows, 1, MPI_INT, dest, 1, MPI_COMM_WORLD);
// Send rows of the Matrix A which will be assigned to slave process to compute
MPI_Send(&matrix_a[offset][0], rows*N, MPI_DOUBLE,dest,1, MPI_COMM_WORLD);
// Matrix B is sent
MPI_Send(&matrix_b, N*N, MPI_DOUBLE, dest, 1, MPI_COMM_WORLD);
// Offset is modified according to number of rows sent to each process
offset = offset + rows;
}
for (int i = 1; i <= slaveTaskCount; i++)
{
source = i;
// Receive the offset of particular slave process
MPI_Recv(&offset, 1, MPI_INT, source, 2, MPI_COMM_WORLD, &status);
// Receive the number of rows that each slave process processed
MPI_Recv(&rows, 1, MPI_INT, source, 2, MPI_COMM_WORLD, &status);
// Calculated rows of each process will be stored in Matrix C according to their offset and the processed number of rows
MPI_Recv(&matrix_c[offset][0], rows*N, MPI_DOUBLE, source, 2, MPI_COMM_WORLD, &status);
}
// Print the result matrix
printf("\nResult Matrix C = Matrix A * Matrix B:\n\n");
for (int i = 0; i<N; i++) {
for (int j = 0; j<N; j++)
printf("%.0f\t", matrix_c[i][j]);
printf ("\n");
}
printf ("\n");
}
// Slave Processes
if (processId > 0) {
// Source process ID is defined
source = 0;
MPI_Recv(&offset, 1, MPI_INT, source, 1, MPI_COMM_WORLD, &status);
// The slave process receives number of rows sent by root process
MPI_Recv(&rows, 1, MPI_INT, source, 1, MPI_COMM_WORLD, &status);
// The slave process receives the sub portion of the Matrix A which assigned by Root
MPI_Recv(&matrix_a, rows*N, MPI_DOUBLE, source, 1, MPI_COMM_WORLD, &status);
// The slave process receives the Matrix B
MPI_Recv(&matrix_b, N*N, MPI_DOUBLE, source, 1, MPI_COMM_WORLD, &status);
// Matrix multiplication
for (int k = 0; k<N; k++) {
for (int i = 0; i<rows; i++) {
// Set initial value of the row summation
matrix_c[i][k] = 0.0;
// Matrix A's element(i, j) will be multiplied with Matrix B's element(j, k)
for (int j = 0; j<N; j++)
matrix_c[i][k] = matrix_c[i][k] + matrix_a[i][j] * matrix_b[j][k];
}
}
// Offset will be sent to the root process; it determines where the calculated rows belong in Matrix C
MPI_Send(&offset, 1, MPI_INT, 0, 2, MPI_COMM_WORLD);
// Number of rows the process calculated will be sent to root process
MPI_Send(&rows, 1, MPI_INT, 0, 2, MPI_COMM_WORLD);
// Resulting matrix with calculated rows will be sent to root process
MPI_Send(&matrix_c, rows*N, MPI_DOUBLE, 0, 2, MPI_COMM_WORLD);
}
MPI_Finalize();
}

Look at non-blocking communication this way: instead of spelling out "now I send this, now you receive that", you decide at each stage of the computation "what are all the messages that will be communicated here". Then you post an MPI_Isend for every send and an MPI_Irecv for every corresponding receive, and finally wait on all the resulting requests.
One complication is that each of these Isend/Irecv operations needs its own buffer, so you may need to allocate some extra memory.
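Applied to the code in the question, the master's distribution loop and the slave's receives could look roughly like the sketch below. The offsets[] array and the request arrays are additions for this example (each in-flight send needs its own stable buffer); everything else reuses the variables from the original program, and the result collection can be converted in the same way.
/* Master side, a sketch (not a complete program): post all sends at once, then wait. */
int offsets[processCount];              /* one stable offset copy per destination */
MPI_Request reqs[4 * slaveTaskCount];   /* 4 messages per slave */
int r = 0;
rows = N / slaveTaskCount;
offset = 0;
for (dest = 1; dest <= slaveTaskCount; dest++) {
    offsets[dest] = offset;
    MPI_Isend(&offsets[dest], 1, MPI_INT, dest, 1, MPI_COMM_WORLD, &reqs[r++]);
    MPI_Isend(&rows, 1, MPI_INT, dest, 1, MPI_COMM_WORLD, &reqs[r++]);
    MPI_Isend(&matrix_a[offset][0], rows * N, MPI_DOUBLE, dest, 1, MPI_COMM_WORLD, &reqs[r++]);
    MPI_Isend(&matrix_b, N * N, MPI_DOUBLE, dest, 1, MPI_COMM_WORLD, &reqs[r++]);
    offset = offset + rows;
}
/* None of the send buffers may be modified until the requests complete. */
MPI_Waitall(r, reqs, MPI_STATUSES_IGNORE);

/* Slave side: post all receives up front, then wait before computing.
 * Messages from the same source with the same tag match the receives in
 * posting order; rows is not known yet, so the matrix_a receive is posted
 * with the maximum count (receiving a shorter message is allowed). */
MPI_Request rreqs[4];
MPI_Irecv(&offset, 1, MPI_INT, 0, 1, MPI_COMM_WORLD, &rreqs[0]);
MPI_Irecv(&rows, 1, MPI_INT, 0, 1, MPI_COMM_WORLD, &rreqs[1]);
MPI_Irecv(&matrix_a, N * N, MPI_DOUBLE, 0, 1, MPI_COMM_WORLD, &rreqs[2]);
MPI_Irecv(&matrix_b, N * N, MPI_DOUBLE, 0, 1, MPI_COMM_WORLD, &rreqs[3]);
MPI_Waitall(4, rreqs, MPI_STATUSES_IGNORE);
/* ... the multiplication and the sends back to the root stay the same,
 * or can be turned into MPI_Isend + MPI_Waitall in exactly the same way. */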

Related

Matrix not received properly with MPI_Send and MPI_Recv

I am new to programming with MPI and I have an exercise where I have to multiply two matrices using MPI_Send and MPI_Recv, sending both matrices to my processes and sending the result back to the root process (both matrices are square and N is equal to the number of processes).
This is the code I have written:
#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
int main(int argc, char *argv[]){
srand(time(NULL));
int rank, nproc;
MPI_Status status;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &nproc);
int **matrice = (int **)malloc(nproc * sizeof(int *));
for ( int i=0; i<nproc; i++)
matrice[i] = (int *)malloc(nproc * sizeof(int));
int **matrice1 = (int **)malloc(nproc * sizeof(int *));
for (int i=0; i<nproc; i++)
matrice1[i] = (int *)malloc(nproc * sizeof(int));
int **result = (int **)malloc(nproc * sizeof(int *));
for (int i=0; i<nproc; i++)
result[i] = (int *)malloc(nproc * sizeof(int));
if(rank == 0){
for(int i = 0; i < nproc; i++){
for(int j = 0; j < nproc; j++){
matrice[i][j] = (rand() % 20) + 1;
matrice1[i][j] = (rand() % 20) + 1;
}
}
for(int i = 1; i < nproc; i++){
MPI_Send(&(matrice[0][0]), nproc*nproc, MPI_INT, i, 1, MPI_COMM_WORLD);
MPI_Send(&(matrice1[0][0]), nproc*nproc, MPI_INT, i, 2, MPI_COMM_WORLD);
}
}else{
MPI_Recv(&(matrice[0][0]), nproc*nproc, MPI_INT, 0, 1, MPI_COMM_WORLD, &status);
MPI_Recv(&(matrice1[0][0]), nproc*nproc, MPI_INT, 0, 2, MPI_COMM_WORLD, &status);
}
for(int i = 0; i < nproc; i++){
result[rank][i] = 0;
for(int j = 0; j < nproc; j++){
result[rank][i] += matrice[rank][j] * matrice1[j][i];
}
}
if(rank != 0){
MPI_Send(&result[rank][0], nproc, MPI_INT, 0, 'p', MPI_COMM_WORLD);
}
if(rank == 0){
for(int i = 1; i < nproc; i++){
MPI_Recv(&result[i][0], nproc, MPI_INT, i, 'p', MPI_COMM_WORLD, &status);
}
}
MPI_Finalize();
}
I am having problems with MPI_Send or MPI_Recv because only the first row of matrice I receive is correct; the second row is filled with 0 and the others are random.
I don't understand what is causing this problem.
I am having problems with MPI_Send or MPI_Recv because only the first row of matrice I receive is correct, the second row is filled with 0 and the others are random.
You are calling the MPI_Send as follows:
MPI_Send(&(matrice[0][0]), nproc*nproc, MPI_INT, i, 1, MPI_COMM_WORLD);
so you are telling MPI that you will be sending nproc*nproc elements starting from the address &(matrice[0][0]). MPI_Send expects those nproc*nproc elements to be contiguous in memory. Therefore, your matrices should be allocated contiguously in memory. You can think of the memory layout of such a matrix as:
| ------------ data used in the MPI_Send -----------|
| row1 row2 ... rowN |
|[0, 1, 2, 3, N][0, 1, 2, 3, N] ... [0, 1, 2, 3, N]|
\---------------------------------------------------/
From the last element of one row to the first element of the next row there is no gap.
Unfortunately, you have allocated your matrix as:
int **matrice = (int **)malloc(nproc * sizeof(int *));
for ( int i=0; i<nproc; i++)
matrice[i] = (int *)malloc(nproc * sizeof(int));
which does not allocate the matrix contiguously in memory, but rather allocates an array of pointers that are not forced to be contiguous in memory. You can think of that matrix as having the following memory layout:
| ------------ data used in the MPI_Send ----------|
| row1 [0, 1, 2, 3, N] ... (some "random" stuff) |
\--------------------------------------------------/
row2 [0, 1, 2, 3, N] ... (some "random" stuff)
...
rowN [0, 1, 2, 3, N] ... (some "random" stuff)
From the last element of one row to the first element of the next row there might be a memory gap, which makes it impossible for MPI_Send to know where the next row starts. That is why you receive the first row correctly, but not the remaining rows.
Among others, you can use the following approaches to solve that issue:
allocate the matrix contiguously in memory;
send the matrix row by row.
The simplest (and performance-wise better) solution is to use the first approach; check this SO thread to see how to dynamically allocate a contiguous block of memory for a 2D array.
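For illustration, a minimal sketch of the first approach (dest stands for the destination rank, as in your send loop):
/* Contiguous allocation of an nproc x nproc matrix: one block of data
 * plus an array of row pointers into it. */
int *data = malloc(nproc * nproc * sizeof(int));
int **matrice = malloc(nproc * sizeof(int *));
for (int i = 0; i < nproc; i++)
    matrice[i] = data + i * nproc;

/* &(matrice[0][0]) now really points at nproc*nproc consecutive ints,
 * so the whole matrix can be sent in one call. */
MPI_Send(&(matrice[0][0]), nproc * nproc, MPI_INT, dest, 1, MPI_COMM_WORLD);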

How to sum a 2D array in C using MPI

This is the program I am using to sum all values in a 1D array, and it works correctly. But how do I modify it to work on a 2D array? Imagine the variable a is something like a = { {1,2}, {3,4}, {5,6} };.
I tried a few solutions but they are not working, so can someone explain the few important changes needed to make it compatible with a 2D array as well.
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
// size of array
#define n 10
int a[] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
// Temporary array for slave process
int a2[1000];
int main(int argc, char* argv[])
{
int pid, np,
elements_per_process,
n_elements_recieved;
// np -> no. of processes
// pid -> process id
MPI_Status status;
// Creation of parallel processes
MPI_Init(&argc, &argv);
// find out process ID,
// and how many processes were started
MPI_Comm_rank(MPI_COMM_WORLD, &pid);
MPI_Comm_size(MPI_COMM_WORLD, &np);
// master process
if (pid == 0) {
int index, i;
elements_per_process = n / np;
// check if more than 1 processes are run
if (np > 1) {
// distributes the portion of array
// to child processes to calculate
// their partial sums
for (i = 1; i < np - 1; i++) {
index = i * elements_per_process;
MPI_Send(&elements_per_process,
1, MPI_INT, i, 0,
MPI_COMM_WORLD);
MPI_Send(&a[index],
elements_per_process,
MPI_INT, i, 0,
MPI_COMM_WORLD);
}
// last process adds remaining elements
index = i * elements_per_process;
int elements_left = n - index;
MPI_Send(&elements_left,
1, MPI_INT,
i, 0,
MPI_COMM_WORLD);
MPI_Send(&a[index],
elements_left,
MPI_INT, i, 0,
MPI_COMM_WORLD);
}
// master process add its own sub array
int sum = 0;
for (i = 0; i < elements_per_process; i++)
sum += a[i];
// collects partial sums from other processes
int tmp;
for (i = 1; i < np; i++) {
MPI_Recv(&tmp, 1, MPI_INT,
MPI_ANY_SOURCE, 0,
MPI_COMM_WORLD,
&status);
int sender = status.MPI_SOURCE;
sum += tmp;
}
// prints the final sum of array
printf("Sum of array is : %d\n", sum);
}
// slave processes
else {
MPI_Recv(&n_elements_recieved,
1, MPI_INT, 0, 0,
MPI_COMM_WORLD,
&status);
// stores the received array segment
// in local array a2
MPI_Recv(&a2, n_elements_recieved,
MPI_INT, 0, 0,
MPI_COMM_WORLD,
&status);
// calculates its partial sum
int partial_sum = 0;
for (int i = 0; i < n_elements_recieved; i++)
partial_sum += a2[i];
// sends the partial sum to the root process
MPI_Send(&partial_sum, 1, MPI_INT,
0, 0, MPI_COMM_WORLD);
}
// cleans up all MPI state before exit of process
MPI_Finalize();
return 0;
}
You can simplify a lot by using MPI_Reduce instead of MPI_Send/MPI_Recv:
Reduces values on all processes to a single value
A nice tutorial about that routine can be found here.
So each process holds an array (e.g., process 0 has { 1, 2, 3, 4, 5 } and process 1 has { 6, 7, 8, 9, 10 }) and computes the partial sum of that array. In the end, all processes call MPI_Reduce to combine the partial sums into a single value available to the master process (it could have been any other process as well). Have a look at this example:
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char* argv[]){
int np, pid;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &pid);
MPI_Comm_size(MPI_COMM_WORLD, &np);
int partial_sum = 0;
if (pid == 0) {
int a[] = { 1, 2, 3, 4, 5};
for(int i = 0; i < 5; i++)
partial_sum += a[i];
}
else if (pid == 1){
int a[] = {6, 7, 8, 9, 10};
for(int i = 0; i < 5; i++)
partial_sum += a[i];
}
int sum;
MPI_Reduce(&partial_sum, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
if (pid == 0){
printf("Sum of array is : %d\n", sum);
}
MPI_Finalize();
return 0;
}
This code only works with 2 processes (and it is kind of silly), but I am using it to showcase the use of MPI_Reduce.
I tried a few solutions but they are not working, so can someone explain the few important changes needed to make it compatible with a 2D array as well.
If you adapt your code to use MPI_Reduce as I have shown, then it does not matter whether it is a 1D or a 2D array, because you first compute the partial sum into a single value and then perform the reduction.
Alternatively, you can also have each row assigned to a process and then perform a reduction of the entire array, and then the master process performs the sum of the resulting array.
An example:
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char* argv[]){
int np, pid;
MPI_Status status;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &pid);
MPI_Comm_size(MPI_COMM_WORLD, &np);
int partial_sum = 0;
int size = 5;
int a[5] = {1, 2, 3 , 4, 5};
int sum[5] = {0};
MPI_Reduce(&a, &sum, size, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
if (pid == 0){
int total_sum = 0;
for(int i = 0; i < size; i++)
total_sum += sum[i];
printf("Sum of array is : %d\n", total_sum);
}
MPI_Finalize();
return 0;
}
Output (for two processes):
Sum of array is : 30

MPI library and memory

I have some problems with understanding how shared memory works. There is one main process and N others. The main process sends data to the others; I did it like this (the data for process i is placed in shared_mem[i]):
int *shared_mem = calloc(numb_of_parts, sizeof(double));
if(world_rank == 0)
{
for(int i = 1; i < numb_of_parts; i++)
{
MPI_Send(shared_mem+i, 1, MPI_DOUBLE, i, 0, MPI_COMM_WORLD);
}
}
Then the other processes calculate something and write the data back into the same cell:
{
MPI_Recv(shared_mem+world_rank, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
/* do smth with shared_mem[i] */
MPI_Send(shared_mem+world_rank, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
}
Then I wait for all processes and want to compute the sum of all cells (with the new data) in the main process:
MPI_Barrier(MPI_COMM_WORLD);
if(world_rank == 0)
{
for(int i = 0; i < numb_of_parts; i++)
{
sum += shared_mem[i];
}
}
But as a result I always get the sum of the previous data, i.e. the array in the main process has not changed. What is wrong?
Could you try to declare double *shared_mem = calloc(numb_of_parts, sizeof(double)); ? At the moment it is declared as int *, so shared_mem[i] and shared_mem+i may not be what you expect them to be, since the size of int can differ from the size of double.
Moreover, there are features of MPI which can significantly help you:
The functions MPI_Scatter() and MPI_Reduce() (using MPI_SUM) can be combined, as sketched right after this list.
You can allocate shared memory between processes in a given communicator using MPI_Win_allocate_shared(), if such a thing is possible.
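For the first point, here is a small self-contained sketch of what MPI_Scatter() combined with MPI_Reduce() could look like for this pattern (one double per process, as in the question; the variable names follow your snippet):
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int world_rank, numb_of_parts;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &numb_of_parts);

    double *all = NULL;
    if (world_rank == 0) {
        /* Root fills one value per process. */
        all = calloc(numb_of_parts, sizeof(double));
        for (int i = 0; i < numb_of_parts; i++)
            all[i] = i;
    }

    /* Each process receives its own element; replaces the hand-written send loop. */
    double value = 0.0;
    MPI_Scatter(all, 1, MPI_DOUBLE, &value, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    value = value * 2;   /* "do smth" with the local value */

    /* The sum of all local values ends up on rank 0; replaces the receive loop. */
    double sum = 0.0;
    MPI_Reduce(&value, &sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (world_rank == 0)
        printf("sum is %g\n", sum);

    free(all);
    MPI_Finalize();
    return 0;
}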
And @Gilles is right: the buffer shared_mem is not shared between processes. Indeed, each process allocates its own shared_mem, and this is the reason why message passing is required.
Here is a working code based on your code snippets. I had to add the receive part for the root process; is that what was missing? Compile with mpicc main.c -o main -std=c99 and run it with mpirun -np 4 main.
/* -*- Mode: C; c-basic-offset:4 ; -*- */
/*
* (C) 2001 by Argonne National Laboratory.
* See COPYRIGHT in top-level directory.
*/
/* This is an interactive version of cpi */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc,char *argv[])
{
int numb_of_parts, rank;
MPI_Init(&argc,&argv);
MPI_Comm_rank(MPI_COMM_WORLD,&rank);
MPI_Comm_size(MPI_COMM_WORLD,&numb_of_parts);
double *mem = calloc(numb_of_parts, sizeof(double));
if(rank == 0)
{
mem[0]=0;
for(int i = 1; i < numb_of_parts; i++)
{
mem[i]=i;
MPI_Send(mem+i, 1, MPI_DOUBLE, i, 0, MPI_COMM_WORLD);
}
for(int i = 1; i < numb_of_parts; i++)
{
MPI_Recv(mem+i, 1, MPI_DOUBLE, i, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}
}else{
MPI_Recv(mem+rank, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
/* do smth with shared_mem[i] */
mem[rank]=mem[rank]*2;
MPI_Send(mem+rank, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
}
MPI_Barrier(MPI_COMM_WORLD);
double sum=0;
if(rank == 0)
{
for(int i = 0; i < numb_of_parts; i++)
{
sum += mem[i];
}
printf("sum is %g\n",sum);
}
MPI_Finalize();
return 0;
}
The problem may be in the /* do smth with shared_mem[i] */ part: if it does nothing, or if it does not modify mem[rank], the sum will not change.

MPI Address not mapped in C

I have trouble with MPI_Recv when using malloc. Is there any suggestion for receiving a two-dimensional array created with malloc?
Thanks.
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <mpi.h>
#define SIZE 2000
/* Tags defines message from_to */
#define TO_SLAVE_TAG 1
#define TO_MASTER_TAG 5
void createMatrices();
/* Matrices */
int** first;
/* MPI_WORLD rank and size */
int rank, size;
MPI_Status status;
/*
* matrixSize: current matrix size
* lower_bound: lower bound of the number of rows of [first matrix] allocated to a slave
* upper_bound: upper bound of the number of rows of [first matrix] allocated to a slave
* portion: number of the rows of [first matrix] allocated to a slave according to the number of processors
* count: number of data elements that will be passed with the MPI functions
*/
int matrixSize, lower_bound, upper_bound, portion, count;
int sum = 0;
clock_t t, start_time, end_time;
int main( int argc, char **argv ) {
/* Initialize the MPI execution environment */
MPI_Init( &argc, &argv );
/* Determines the size of the group */
MPI_Comm_size( MPI_COMM_WORLD, &size );
/* Determines the rank of the calling process */
MPI_Comm_rank( MPI_COMM_WORLD, &rank );
if (rank == 0)
{
for (matrixSize = 500; matrixSize <= SIZE; matrixSize += 500) {
createMatrices(matrixSize);
/*
* Master processor divides [first matrix] elements
* and send them to proper slave processors.
* We can start time at this point.
*/
start_time = clock();
/* Define bounds for each processor except master */
for (int i = 1; i < size; ++i)
{
/* Calculate portion for each slave */
portion = (matrixSize / (size - 1));
lower_bound = (i-1) * portion;
if (((i+1)==size) && (matrixSize % (size-1) != 0)) {
upper_bound = matrixSize;
} else {
upper_bound = lower_bound + portion;
}
/* send matrix size to ith slave */
MPI_Send(&matrixSize, 1, MPI_INT, i, TO_SLAVE_TAG, MPI_COMM_WORLD);
/* send lower bount to ith slave */
MPI_Send(&lower_bound, 1, MPI_INT, i, TO_SLAVE_TAG + 1, MPI_COMM_WORLD);
/* send upper bount to ith slave */
MPI_Send(&upper_bound, 1, MPI_INT, i, TO_SLAVE_TAG + 2, MPI_COMM_WORLD);
/* send allocated row of [first matrix] to ith slave */
count = (upper_bound - lower_bound) * matrixSize;
printf("Count: %d\n", count);
MPI_Send(&(first[lower_bound][0]), count, MPI_DOUBLE, i, TO_SLAVE_TAG + 3, MPI_COMM_WORLD);
}
}
}
if (rank > 0)
{
//receive low bound from the master
MPI_Recv(&matrixSize, 1, MPI_INT, 0, TO_SLAVE_TAG, MPI_COMM_WORLD, &status);
printf("Matrix size: %d\n", matrixSize);
//receive low bound from the master
MPI_Recv(&lower_bound, 1, MPI_INT, 0, TO_SLAVE_TAG + 1, MPI_COMM_WORLD, &status);
printf("Lower bound: %d\n", lower_bound);
//next receive upper bound from the master
MPI_Recv(&upper_bound, 1, MPI_INT, 0, TO_SLAVE_TAG + 2, MPI_COMM_WORLD, &status);
printf("Upper bound: %d\n", upper_bound);
//finally receive row portion of [A] to be processed from the master
count = (upper_bound - lower_bound) * matrixSize;
printf("Count: %d\n", count);
MPI_Recv(&first[lower_bound][0], count, MPI_INT, 0, TO_SLAVE_TAG + 3, MPI_COMM_WORLD, &status);
printf("first[0][0]: %d\n", first[0][0]);
}
MPI_Finalize();
return 0;
}
void createMatrices(int mSize) {
/* matrix cols */
first = malloc(mSize * sizeof(int*));
/* matrix rows */
for (int i = 0; i < mSize; ++i)
first[i] = malloc(mSize * sizeof(int));
srand(time(NULL));
for (int i = 0; i < mSize; ++i)
for (int j = 0; j < mSize; ++j)
first[i][j] = rand()%2;
}
And the problem is:
*** Process received signal ***
Signal: Segmentation fault: 11 (11)
Signal code: Address not mapped (1)
Failing at address: 0x0
[ 0] 0 libsystem_platform.dylib 0x00007fff89cc8f1a _sigtramp + 26
[ 1] 0 libsystem_c.dylib 0x00007fff73857070 __stack_chk_guard + 0
[ 2] 0 libdyld.dylib 0x00007fff90f535c9 start + 1
[ 3] 0 ??? 0x0000000000000001 0x0 + 1
*** End of error message ***
To avoid the (possibly high) latency cost of sending each row individually, you need to create the matrix in linear memory. This is done by allocating enough memory for the entire matrix and setting up pointers to each row. Here is your function modified to do so.
void createMatrices(int mSize) {
/* initialize enough linear memory to store whole matrix */
int *raw_data = malloc(mSize*mSize*sizeof(int));
/* matrix row pointers i.e. they point to each consecutive row */
first = malloc(mSize * sizeof(int*));
/* set the pointers to the appropriate address */
for (int i = 0; i < mSize; ++i)
first[i] = raw_data + mSize*i;
/* initialize with random values */
srand(time(NULL));
for (int i = 0; i < mSize; ++i)
for (int j = 0; j < mSize; ++j)
first[i][j] = rand()%2;
}
The other major problem you are facing is proper memory handling. You should free your matrices before allocating new ones on the root rank.
You also need to allocate memory for a matrix on the slave ranks before trying to copy over the data. That also needs to be in linear memory as done in the above function.
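As a rough sketch of the slave side (assuming matrixSize, lower_bound and upper_bound have already been received as in your code, and using the same contiguous layout as the modified createMatrices above):
/* Allocate the matrix contiguously on the slave before receiving into it. */
int *raw_data = malloc(matrixSize * matrixSize * sizeof(int));
first = malloc(matrixSize * sizeof(int *));
for (int i = 0; i < matrixSize; ++i)
    first[i] = raw_data + i * matrixSize;

/* Only now is it safe to receive the row block into first[lower_bound][0].
 * The buffer holds ints, so both the send on the master and this receive
 * should use MPI_INT (the original code mixed MPI_DOUBLE and MPI_INT). */
count = (upper_bound - lower_bound) * matrixSize;
MPI_Recv(&first[lower_bound][0], count, MPI_INT, 0, TO_SLAVE_TAG + 3,
         MPI_COMM_WORLD, &status);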

MPI type matching

I was wondering why this program actually works in MPI (Open MPI 1.5/1.6).
#include <stdio.h>
#include <mpi.h>
#define VECTOR_SIZE 100
int main(int argc,char ** argv) {
int A[VECTOR_SIZE];
int sub_size=2;
int count=10;
MPI_Datatype partial_array;
int rank,size;
MPI_Status status;
MPI_Init(&argc,&argv);
MPI_Comm_rank(MPI_COMM_WORLD,&rank);
MPI_Comm_size(MPI_COMM_WORLD,&size);
MPI_Type_vector(count, sub_size,
2*sub_size, MPI_INT, &partial_array);
MPI_Type_commit(&partial_array);
if (rank == 0) {
int i;
// server - initialize data and send
for (i = 0; i< VECTOR_SIZE; i++) {
A[i] = i;
}
MPI_Send(&(A[0]), 1, partial_array, 1, 0, MPI_COMM_WORLD);
} else if (rank==1) {
int i;
for (i = 0; i< VECTOR_SIZE; i++) {
A[i] = 0;
}
// vector is composed by 20 MPI_INT elements
MPI_Recv(&(A[0]),20, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
printf("\n");
for (i = 0; i<VECTOR_SIZE; i++) {
printf("%d ",A[i]);
}
printf("\n");
}
MPI_Finalize();
}
while this other program where Send and Receive primitives are exchanged does not terminate (the receive never completes):
#include <stdio.h>
#include <mpi.h>
#define VECTOR_SIZE 100
int main(int argc,char ** argv) {
int A[VECTOR_SIZE];
int sub_size=2;
int count=10;
MPI_Datatype partial_array;
int rank,size;
MPI_Status status;
MPI_Init(&argc,&argv);
MPI_Comm_rank(MPI_COMM_WORLD,&rank);
MPI_Comm_size(MPI_COMM_WORLD,&size);
MPI_Type_vector(count, sub_size,
2*sub_size, MPI_INT, &partial_array);
MPI_Type_commit(&partial_array);
if (rank == 0) {
int i;
// server - initialize data and send
for (i = 0; i< VECTOR_SIZE; i++) {
A[i] = i;
}
MPI_Send(&(A[0]),20, MPI_INT, 0, 0, MPI_COMM_WORLD);
} else if (rank==1) {
int i;
// client - receive data and print
for (i = 0; i< VECTOR_SIZE; i++) {
A[i] = 0;
}
MPI_Recv(&(A[0]), 1, partial_array, 1, 0, MPI_COMM_WORLD, &status);
printf("\n");
for (i = 0; i<VECTOR_SIZE; i++) {
printf("%d ",A[i]);
}
printf("\n");
}
MPI_Finalize();
}
If I understand the MPI type matching rules correctly, neither of the two should complete.
Obviously, in the second program rank 0 is sending to itself and rank 1 is expecting a message also from itself:
MPI_Send(&(A[0]),20, MPI_INT, 0, 0, MPI_COMM_WORLD);
destination rank should be 1, not 0
MPI_Recv(&(A[0]), 1, partial_array, 1, 0, MPI_COMM_WORLD, &status);
source rank should be 0, not 1.
Otherwise, your understanding of MPI type matching is not quite correct: it only requires that the underlying primitive types in the type maps on both ends match. You are creating a vector whose type map has 20 primitive integers. If you send one element of this type, your message will actually contain 20 integers. On the receiver side you provide space for at least 20 integers, so this is correct. The opposite is also correct.
It is not correct if you send only 10 or 18 integers in the second program, since they will not make a complete element of the vector type. Nevertheless, the receive operation will complete, but if you call MPI_Get_count() on the status, it will return MPI_UNDEFINED because from the number of received primitive integer elements one cannot construct an integer number of vector elements. It is also not correct to mix primitive types, e.g. to send MPI_DOUBLE (or a vector, or a structure, or whatever other type that contains doubles) and receive it as MPI_INT.
Please also note that MPI messages do not carry their type map or type ID with them so most MPI implementations do not check if types match. It is possible to send MPI_FLOAT and receive it as MPI_INT (because both are 4 bytes on most systems) but it is not correct to do so.
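If you want to check this on the receiver, MPI_Get_count() can be queried against either type; a small sketch reusing the variables from the first program:
int n_ints, n_vectors;
/* Number of primitive MPI_INT elements that arrived (20 here). */
MPI_Get_count(&status, MPI_INT, &n_ints);
/* Number of complete partial_array elements: 1 here, or MPI_UNDEFINED
   if the received ints did not form a whole number of vector elements. */
MPI_Get_count(&status, partial_array, &n_vectors);
printf("received %d int(s) = %d vector element(s)\n", n_ints, n_vectors);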
