This is the program I am using to sum all values in a 1D array, and it works correctly. But how do I modify it to work on 2D array? Imagine variable a is something like a = { {1,2}, {3,4}, {5,6} };.
I tried few solutions but they are not working, so can someone explain few important changes to make to make it compatible with 2D array also.
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
// size of array
#define n 10
int a[] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
// Temporary array for slave process
int a2[1000];
int main(int argc, char* argv[])
{
int pid, np,
elements_per_process,
n_elements_recieved;
// np -> no. of processes
// pid -> process id
MPI_Status status;
// Creation of parallel processes
MPI_Init(&argc, &argv);
// find out process ID,
// and how many processes were started
MPI_Comm_rank(MPI_COMM_WORLD, &pid);
MPI_Comm_size(MPI_COMM_WORLD, &np);
// master process
if (pid == 0) {
int index, i;
elements_per_process = n / np;
// check if more than 1 processes are run
if (np > 1) {
// distributes the portion of array
// to child processes to calculate
// their partial sums
for (i = 1; i < np - 1; i++) {
index = i * elements_per_process;
MPI_Send(&elements_per_process,
1, MPI_INT, i, 0,
MPI_COMM_WORLD);
MPI_Send(&a[index],
elements_per_process,
MPI_INT, i, 0,
MPI_COMM_WORLD);
}
// last process adds remaining elements
index = i * elements_per_process;
int elements_left = n - index;
MPI_Send(&elements_left,
1, MPI_INT,
i, 0,
MPI_COMM_WORLD);
MPI_Send(&a[index],
elements_left,
MPI_INT, i, 0,
MPI_COMM_WORLD);
}
// master process add its own sub array
int sum = 0;
for (i = 0; i < elements_per_process; i++)
sum += a[i];
// collects partial sums from other processes
int tmp;
for (i = 1; i < np; i++) {
MPI_Recv(&tmp, 1, MPI_INT,
MPI_ANY_SOURCE, 0,
MPI_COMM_WORLD,
&status);
int sender = status.MPI_SOURCE;
sum += tmp;
}
// prints the final sum of array
printf("Sum of array is : %d\n", sum);
}
// slave processes
else {
MPI_Recv(&n_elements_recieved,
1, MPI_INT, 0, 0,
MPI_COMM_WORLD,
&status);
// stores the received array segment
// in local array a2
MPI_Recv(&a2, n_elements_recieved,
MPI_INT, 0, 0,
MPI_COMM_WORLD,
&status);
// calculates its partial sum
int partial_sum = 0;
for (int i = 0; i < n_elements_recieved; i++)
partial_sum += a2[i];
// sends the partial sum to the root process
MPI_Send(&partial_sum, 1, MPI_INT,
0, 0, MPI_COMM_WORLD);
}
// cleans up all MPI state before exit of process
MPI_Finalize();
return 0;
}
You can simplify a lot by using MPI_Reduce instead of MPI_Send/MPI_Recv:
Reduces values on all processes to a single value
A nice tutorial about that routine can be found here.
So each process contains an array (e.g., process 0 { 1, 2, 3, 4, 5} and process 1 {6, 7, 8, 9, 10 }) and performs the partial sum of that array. In the end, each process uses MPI_Reduce to sum all the partial sums into a single value available to the master process (it could have been another process as well). Have a look at this example:
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char* argv[]){
int np, pid;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &pid);
MPI_Comm_size(MPI_COMM_WORLD, &np);
int partial_sum = 0;
if (pid == 0) {
int a[] = { 1, 2, 3, 4, 5};
for(int i = 0; i < 5; i++)
partial_sum += a[i];
}
else if (pid == 1){
int a[] = {6, 7, 8, 9, 10};
for(int i = 0; i < 5; i++)
partial_sum += a[i];
}
int sum;
MPI_Reduce(&partial_sum, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
if (pid == 0){
printf("Sum of array is : %d\n", sum);
}
MPI_Finalize();
return 0;
}
This code only works with 2 processes (and it is kind of silly( but I am using it to showcase the use of the MPI_Reduce.
I tried few solutions but they are not working, so can someone explain
few important changes to make to make it compatible with 2D array
also.
If you adapt your code to use the MPI_Reduce as I have shown, then it does not matter if it a 1D or 2D array, because you will first do the partial sum into a single value and then performance the reduction.
Alternatively, you can also have each row assigned to a process and then perform a reduction of the entire array, and then the master process performs the sum of the resulting array.
An example:
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char* argv[]){
int np, pid;
MPI_Status status;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &pid);
MPI_Comm_size(MPI_COMM_WORLD, &np);
int partial_sum = 0;
int size = 5;
int a[5] = {1, 2, 3 , 4, 5};
int sum[5] = {0};
MPI_Reduce(&a, &sum, size, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
if (pid == 0){
int total_sum = 0;
for(int i = 0; i < size; i++)
total_sum += sum[i];
printf("Sum of array is : %d\n", total_sum);
}
MPI_Finalize();
return 0;
}
Output (for two processes):
Sum of array is : 30
Related
I am new to programing with MPI and I have an exercise where I have to multiply 2 matrices using MPI_Send and MPI_Recv while sending both matrices to my processes and sending back the result to the root process. (both matrices are square and N is equal to the number of processes).
This is the code I have written:
#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
int main(int argc, char *argv[]){
srand(time(NULL));
int rank, nproc;
MPI_Status status;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &nproc);
int **matrice = (int **)malloc(nproc * sizeof(int *));
for ( int i=0; i<nproc; i++)
matrice[i] = (int *)malloc(nproc * sizeof(int));
int **matrice1 = (int **)malloc(nproc * sizeof(int *));
for (int i=0; i<nproc; i++)
matrice1[i] = (int *)malloc(nproc * sizeof(int));
int **result = (int **)malloc(nproc * sizeof(int *));
for (int i=0; i<nproc; i++)
result[i] = (int *)malloc(nproc * sizeof(int));
if(rank == 0){
for(int i = 0; i < nproc; i++){
for(int j = 0; j < nproc; j++){
matrice[i][j] = (rand() % 20) + 1;
matrice1[i][j] = (rand() % 20) + 1;
}
}
for(int i = 1; i < nproc; i++){
MPI_Send(&(matrice[0][0]), nproc*nproc, MPI_INT, i, 1, MPI_COMM_WORLD);
MPI_Send(&(matrice1[0][0]), nproc*nproc, MPI_INT, i, 2, MPI_COMM_WORLD);
}
}else{
MPI_Recv(&(matrice[0][0]), nproc*nproc, MPI_INT, 0, 1, MPI_COMM_WORLD, &status);
MPI_Recv(&(matrice1[0][0]), nproc*nproc, MPI_INT, 0, 2, MPI_COMM_WORLD, &status);
}
for(int i = 0; i < nproc; i++){
result[i][j] = 0;
for(int j = 0; j < nproc; j++){
result[rank][i] += matrice[rank][j] * matrice1[j][i];
}
}
if(rank != 0){
MPI_Send(&result[rank][0], nproc, MPI_INT, 0, 'p', MPI_COMM_WORLD);
}
if(rank == 0){
for(int i = 1; i < nproc; i++){
MPI_Recv(&result[i][0], nproc, MPI_INT, i, 'p', MPI_COMM_WORLD, &status);
}
}
MPI_Finalize();
}
I am having problems with MPI_Send or MPI_Recv because only the first row of the matrice I receive is correct, the second row is filled with 0 and the others are random.
I don't understand what is causing this problem.
I am having problems with MPI_Send or MPI_Recv because only the first
row of the matrice I receive is correct, the second row is filled with
0 and the others are random.
You are calling the MPI_Send as follows:
MPI_Send(&(matrice[0][0]), nproc*nproc, MPI_INT, i, 1, MPI_COMM_WORLD);
so telling MPI that you will be sending nproc*nproc elements starting from the position &(matrice[0][0]). MPI_Send expects that those nproc*nproc elements are continuously allocated in memory. Therefore, your matrices should be allocated continuously in memory. You can think of the memory layout of such matrices as :
| ------------ data used in the MPI_Send -----------|
| row1 row2 ... rowN |
|[0, 1, 2, 3, N][0, 1, 2, 3, N] ... [0, 1, 2, 3, N]|
\---------------------------------------------------/
From the last element of one row to the first element of the next row there is no gap.
Unfortunately, you have allocated your matrix as:
int **matrice = (int **)malloc(nproc * sizeof(int *));
for ( int i=0; i<nproc; i++)
matrice[i] = (int *)malloc(nproc * sizeof(int));
which does not allocate a matrix continuously in memory, but rather allocates an array of pointers which are not force to be continuously in memory. You can think of that matrix as having the following memory layout:
| ------------ data used in the MPI_Send ----------|
| row1 [0, 1, 2, 3, N] ... (some "random" stuff) |
\--------------------------------------------------/
row2 [0, 1, 2, 3, N] ... (some "random" stuff)
...
rowN [0, 1, 2, 3, N] ... (some "random" stuff)
From the last element of one row to the first element of the next row there might be a memory gap. Consequently, making it impossible for the MPI_Send to know where the next rows starts. That is why you can receive the first row, but not the remaining rows.
Among others you can use the following approaches to solve that issue
allocated the matrix continuously in memory;
send the matrix row by row.
The simplest (and performance-wise better) solution would be for you to use the first approach; check this SO Thread to see how to dynamically allocate a contiguous block of memory for a 2D array.
So let's say I'm using MPI and I want to be able to send a number of rows of a matrix of integers from the main process to other processes. It's relatively easy to do so, like this:
MPI_Send(&matrix[start_row][0], amount_of_cells, MPI_INT, target_process, 1, MPI_COMM_WORLD);
Now let's say that in our matrix, instead of each cell holding an integer, each cell holds a reference to an integer array of size 2. How could we send a number of rows of the new matrix to subprocesses?
I was thinking of doing the same thing as the code above but doubling the amount_of_cells variable because each cell holds a reference to an integer array of size 2. However, it doesn't seem to work, I'm at a bit of a loss here.
Any tips or advice on how to approach this would be helpful?
Old matrix:
_________
| 1 | 2 |
--------
| 3 | 4 |
_________
New matrix:
___________________
| [1, 0] | [2, 0] |
--------------------
| [3, 0] | [4, 0] |
___________________
So instead of holding integers, each cell holds a reference to an array of size 2 of integers created using malloc(). How could I send rows of this new matrix to subprocesses?
I was considering defining my own MPI datatype which could be a place to start.
Please try this code,To send an array of pointers to arrays in C MPI
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#define n 10
int a[] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
int a2[1000];
int main(int argc, char* argv[])
{
int pid, np,
elements_per_process,
n_elements_recieved;
MPI_Status status;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &pid);
MPI_Comm_size(MPI_COMM_WORLD, &np);
if (pid == 0) {
int index, i;
elements_per_process = n / np;
if (np > 1) {
for (i = 1; i < np - 1; i++) {
index = i * elements_per_process;
MPI_Send(&elements_per_process,
1, MPI_INT, i, 0,
MPI_COMM_WORLD);
MPI_Send(&a[index],
elements_per_process,
MPI_INT, i, 0,
MPI_COMM_WORLD);
}
index = i * elements_per_process;
int elements_left = n - index;
MPI_Send(&elements_left,
1, MPI_INT,
i, 0,
MPI_COMM_WORLD);
MPI_Send(&a[index],
elements_left,
MPI_INT, i, 0,
MPI_COMM_WORLD);
}
int sum = 0;
for (i = 0; i < elements_per_process; i++)
sum += a[i];
int tmp;
for (i = 1; i < np; i++) {
MPI_Recv(&tmp, 1, MPI_INT,
MPI_ANY_SOURCE, 0,
MPI_COMM_WORLD,
&status);
int sender = status.MPI_SOURCE;
sum += tmp;
}
printf("Sum of array is : %d\n", sum);
}
else {
MPI_Recv(&n_elements_recieved,
1, MPI_INT, 0, 0,
MPI_COMM_WORLD,
&status);
MPI_Recv(&a2, n_elements_recieved,
MPI_INT, 0, 0,
MPI_COMM_WORLD,
&status);
int partial_sum = 0;
for (int i = 0; i < n_elements_recieved; i++)
partial_sum += a2[i];
MPI_Send(&partial_sum, 1, MPI_INT,
0, 0, MPI_COMM_WORLD);
}
MPI_Finalize();
return 0;
}
I hope this code will be usefull.
Thank you.
This is the very basic function of my program, and as such is not necessarily reproducible. However, I was wondering if there is a way to send an array of arrays using MPI? Or is this something that is not possible and I should flatten my array? Any help would be greatly appreciated as I've been struggling with trying to figure this out.
int *individual_topIds;
int **cell_topIds;
cell_topIds = (int**) malloc(sizeof(int*)*25*boxes);
if(rank == 0) {
for (int i = 0; i < boxes; i++) {
individual_topIds = (int*) malloc(sizeof(int)*25);
for(int j = 0; j < cellMatrix[i].numTop; j++){
individual_topIds[j] = cellMatrix[i].aTopIds[j];
}
cell_topIds[i] = individual_topIds;
}
MPI_Send(cell_topIds, boxes*25, MPI_INT, 1, 10, MPI_COMM_WORLD);
}
Then in my rank == 1 section. I have tried send and receive with just boxes, and not boxes*25 as well.
for 1 -> boxes
MPI_Recv(cell_topIds, boxes*25, MPI_INT, 0, 10, MPI_COMM_WORLD, &status);
int *ptop;
ptop = (int*) malloc(sizeof(int)*25);
ptop = cell_topIds[i];
printf("1\n");
for(int j = 0; j < sizeof(&ptop)/sizeof(int); j++){
printf("%d, ", ptop[j]);
}
printf("2\n");
end for i -> boxes
free(ptop);
Edit: Forgot to mention that the output of the print is a seg fault
Caught error: Segmentation fault (signal 11)
This is not a particularly well-worded question.
However, MPI will let you send arrays of arrays if you use a custom type, as below:
#include "mpi.h"
#include <stdio.h>
struct Partstruct
{
char c;
double d[6];
char b[7];
};
int main(int argc, char *argv[])
{
struct Partstruct particle[1000];
int i, j, myrank;
MPI_Status status;
MPI_Datatype Particletype;
MPI_Datatype type[3] = { MPI_CHAR, MPI_DOUBLE, MPI_CHAR };
int blocklen[3] = { 1, 6, 7 };
MPI_Aint disp[3];
MPI_Init(&argc, &argv);
disp[0] = &particle[0].c - &particle[0];
disp[1] = &particle[0].d - &particle[0];
disp[2] = &particle[0].b - &particle[0];
MPI_Type_create_struct(3, blocklen, disp, type, &Particletype);
MPI_Type_commit(&Particletype);
MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
if (myrank == 0)
{
MPI_Send(particle, 1000, Particletype, 1, 123, MPI_COMM_WORLD);
}
else if (myrank == 1)
{
MPI_Recv(particle, 1000, Particletype, 0, 123, MPI_COMM_WORLD, &status);
}
MPI_Finalize();
return 0;
}
Alternatively, use a flat array design (this is a good idea for performance reasons as well as easy compatibility with MPI).
I have some problems with understanding how works shared memory. There are one main process and N others. The main process sent data to other, I made it like this(data is placed in shared_mem[i] for i process):
int *shared_mem = calloc(numb_of_parts, sizeof(double));
if(world_rank == 0)
{
for(int i = 1; i < numb_of_parts; i++)
{
MPI_Send(shared_mem+i, 1, MPI_DOUBLE, i, 0, MPI_COMM_WORLD);
}
}
Next processes calculate something and write data in the same cell:
{
MPI_Recv(shared_mem+world_rank, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
/* do smth with shared_mem[i] */
MPI_Send(shared_mem+world_rank, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
}
Then I wait for all processes and want to count the sum of all cells(with new data) in main process:
PI_Barrier(MPI_COMM_WORLD);
if(world_rank == 0)
{
for(int i = 0; i < numb_of_parts; i++)
{
sum += shared_mem[i];
}
}
But as a result I get always sum of previous data i.e. in main process array haven't changed. What is wrong?
Could you try to decleare double *shared_mem = calloc(numb_of_parts, sizeof(double)); ? For the moment, it is decleared as int*, so shared_mem[i] and shared_mem+i may not be what it is expected to be, since the size of int can be different from the size of double.
Moreover, there are features of MPI which can significantly help you:
The function MPI_Scatter() and MPI_Reduce() using MPI_SUM can be combined.
You can allocate shared memory between processes in a given communicator using MPI_Win_allocate_shared(), if such a thing is possible.
And #Gilles is right: the buffer mem_shared is not shared between processes. Indeed, each process allocates its own buffer mem_shared and this is the reason why message passing is required.
Here is a working code based on your code snippets. I had to add the receive par for the root process. Is it what is missing ? Compile with mpicc main.c -o main -sdt=c99 and run it by mpirun -np 4 main.
/* -*- Mode: C; c-basic-offset:4 ; -*- */
/*
* (C) 2001 by Argonne National Laboratory.
* See COPYRIGHT in top-level directory.
*/
/* This is an interactive version of cpi */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc,char *argv[])
{
int numb_of_parts, rank;
MPI_Init(&argc,&argv);
MPI_Comm_rank(MPI_COMM_WORLD,&rank);
MPI_Comm_size(MPI_COMM_WORLD,&numb_of_parts);
int *mem = calloc(numb_of_parts, sizeof(double));
if(rank == 0)
{
mem[0]=0;
for(int i = 1; i < numb_of_parts; i++)
{
mem[i]=i;
MPI_Send(mem+i, 1, MPI_DOUBLE, i, 0, MPI_COMM_WORLD);
}
for(int i = 1; i < numb_of_parts; i++)
{
MPI_Recv(mem+i, 1, MPI_DOUBLE, i, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}
}else{
MPI_Recv(mem+rank, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
/* do smth with shared_mem[i] */
mem[rank]=mem[rank]*2;
MPI_Send(mem+rank, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
}
MPI_Barrier(MPI_COMM_WORLD);
double sum=0;
if(rank == 0)
{
for(int i = 0; i < numb_of_parts; i++)
{
sum += mem[i];
}
printf("sum is %g\n",sum);
}
MPI_Finalize();
return 0;
}
The problem can be in the /* do smth with shared_mem[i] */... if it does nothing, or if it does not modify mem[rank].
I was wondering why this program actually works in MPI (openMPI 1.5/1.6. )
#include <stdio.h>
#include <mpi.h>
#define VECTOR_SIZE 100
int main(int argc,char ** argv) {
int A[VECTOR_SIZE];
int sub_size=2;
int count=10;
MPI_Datatype partial_array;
int rank,size;
MPI_Status status;
MPI_Init(&argc,&argv);
MPI_Comm_rank(MPI_COMM_WORLD,&rank);
MPI_Comm_size(MPI_COMM_WORLD,&size);
MPI_Type_vector(count, sub_size,
2*sub_size, MPI_INT, &partial_array);
MPI_Type_commit(&partial_array);
if (rank == 0) {
int i;
// server - initialize data and send
for (i = 0; i< VECTOR_SIZE; i++) {
A[i] = i;
}
MPI_Send(&(A[0]), 1, partial_array, 1, 0, MPI_COMM_WORLD);
} else if (rank==1) {
int i;
for (i = 0; i< VECTOR_SIZE; i++) {
A[i] = 0;
}
// vector is composed by 20 MPI_INT elements
MPI_Recv(&(A[0]),20, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
printf("\n");
for (i = 0; i<VECTOR_SIZE; i++) {
printf("%d ",A[i]);
}
printf("\n");
}
MPI_Finalize();
}
while this other program where Send and Receive primitives are exchanged does not terminate (the receive never completes):
#include <stdio.h>
#include <mpi.h>
#define VECTOR_SIZE 100
int main(int argc,char ** argv) {
int A[VECTOR_SIZE];
int sub_size=2;
int count=10;
MPI_Datatype partial_array;
int rank,size;
MPI_Status status;
MPI_Init(&argc,&argv);
MPI_Comm_rank(MPI_COMM_WORLD,&rank);
MPI_Comm_size(MPI_COMM_WORLD,&size);
MPI_Type_vector(count, sub_size,
2*sub_size, MPI_INT, &partial_array);
MPI_Type_commit(&partial_array);
if (rank == 0) {
int i;
// server - initialize data and send
for (i = 0; i< VECTOR_SIZE; i++) {
A[i] = i;
}
MPI_Send(&(A[0]),20, MPI_INT, 0, 0, MPI_COMM_WORLD);
} else if (rank==1) {
int i;
// client - receive data and print
for (i = 0; i< VECTOR_SIZE; i++) {
A[i] = 0;
}
MPI_Recv(&(A[0]), 1, partial_array, 1, 0, MPI_COMM_WORLD, &status);
printf("\n");
for (i = 0; i<VECTOR_SIZE; i++) {
printf("%d ",A[i]);
}
printf("\n");
}
MPI_Finalize();
}
If I understand MPI type mathing rules correctly neither of the two should complete.
Obviously in the second program rank 0 is sending to itself and rank 1 is expecting message also from itself:
MPI_Send(&(A[0]),20, MPI_INT, 0, 0, MPI_COMM_WORLD);
destination rank should be 1, not 0
MPI_Recv(&(A[0]), 1, partial_array, 1, 0, MPI_COMM_WORLD, &status);
source rank should be 0, not 1.
Otherwise you do not understand the MPI type matching correctly. It only states that underlying primitive types in the type maps on both ends should match. You are creating a vector whose type map has 20 primitive integers. If you send one element of this type, your message will actually contain 20 integers. On the receiver side you provide space for at least 20 integers so this is correct. The opposite is also correct.
It is not correct if you send only 10 or 18 integers in the second program since they will not make a complete element of the vector type. Nevertheless, the receive operation will complete but if you call MPI_Get_count() on the status, if will return MPI_UNDEFINED because from the number of received primitive integer elements one cannot construct an integer number of vector elements. It is also not correct to mix primitive types, e.g. send MPI_DOUBLE (or vector, or structure, or whatever other type that has doubles) and receive it as MPI_INT.
Please also note that MPI messages do not carry their type map or type ID with them so most MPI implementations do not check if types match. It is possible to send MPI_FLOAT and receive it as MPI_INT (because both are 4 bytes on most systems) but it is not correct to do so.