Passing submatrices from master to slaves MPI - c

I'm trying to learn MPI and I've run into the following problem in one of my courses:
Consider a matrix A of dimensions n * n in which each element is an integer. Given two pairs of indices (i1,j1) and (i2,j2), find the submatrix of those dimensions in matrix A for which the sum of its elements is maximum.
I'd like some help on how to pass the submatrices to the processes. Should I calculate first how many submatrices (s) are in the matrix and send to each process N/s? How would I send the submatrices?
Some skeleton code I wrote:
using namespace std;
#pragma comment (lib, "msmpi.lib")
enum CommunicationTag
void print_matrix(int mat[10][10], int n) {
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            printf("%d ", mat[i][j]);
        }
    }
}
int main(int argc, char *argv[]) {
    //0. Init part, finding rank and number of processes
    int numprocs, rank, rc;
    rc = MPI_Init(&argc, &argv);
    if (rc != MPI_SUCCESS) {
        printf("Error starting MPI program. Terminating \n");
    }
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    printf("I'm rank %d. Num procs %d\n", rank, numprocs); fflush(stdout);
    //1. different machine code
    if (rank == 0) {
        int n;
        scanf("%d", &n);
        int i1, i2, j1, j2;
        scanf("%d%d%d%d", &i1, &i2, &j1, &j2);
        int mat[10][10];
        //init data
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) {
                mat[i][j] = (rand() % 100) - 50; //init random between -50 and 49
            }
        print_matrix(mat, n);
        //here; how do I pass the submatrices to the processes?
        for (int i = 1; i < numprocs; i++) {
            //here; how do I pass the submatrices to the processes?
        }
    }
    else {
        //if slave ...
    }
}

The first step is to stop thinking about how to use MPI_Send(). The basic solution is to use MPI_Bcast() to transmit A to all the MPI processes.
Then divide the work up (no need to communicate for this, the same dividing logic can run in each process). Compute the sums within each MPI process, and collect them in the main process using MPI_Gather(). Choose the largest and you're done.
It really only requires two MPI operations: Bcast to distribute the input data to all processes, and Gather to centralize the results.
Note that all MPI processes need to execute the collective operations together in lockstep. You only need if (rank == 0) to know which process should load the matrix and analyze the Gathered results.
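The "same dividing logic can run in each process" step can be sketched in plain C, without any MPI calls. This is only a minimal sketch under assumptions that are not in the question: the matrix is stored row-major in a flat array, the target submatrix has a fixed height h and width w derived from the two index pairs, and work_range, submatrix_sum and local_best are hypothetical helper names. Each rank would call local_best with its own rank after the Bcast, and the per-rank results would then be collected with Gather.

```c
#include <assert.h>
#include <limits.h>

/* Half-open range [begin, end) of work items owned by `rank` when `total`
   items are split as evenly as possible over `nprocs` ranks. */
void work_range(int total, int nprocs, int rank, int *begin, int *end) {
    assert(nprocs > 0 && rank >= 0 && rank < nprocs);
    int base = total / nprocs;
    int extra = total % nprocs;   /* the first `extra` ranks get one more item */
    *begin = rank * base + (rank < extra ? rank : extra);
    *end = *begin + base + (rank < extra ? 1 : 0);
}

/* Sum of the h x w submatrix of the row-major n x n matrix `a`
   whose top-left corner is at (r, c). */
long submatrix_sum(const int *a, int n, int r, int c, int h, int w) {
    long s = 0;
    for (int i = r; i < r + h; i++)
        for (int j = c; j < c + w; j++)
            s += a[i * n + j];
    return s;
}

/* Best sum among the candidate positions owned by `rank`;
   LONG_MIN if this rank owns no candidate. */
long local_best(const int *a, int n, int h, int w, int nprocs, int rank) {
    int cols = n - w + 1;               /* candidate columns for the corner */
    int total = (n - h + 1) * cols;     /* number of candidate positions    */
    int begin, end;
    work_range(total, nprocs, rank, &begin, &end);
    long best = LONG_MIN;
    for (int k = begin; k < end; k++) {
        long s = submatrix_sum(a, n, k / cols, k % cols, h, w);
        if (s > best)
            best = s;
    }
    return best;
}
```

Because every rank derives its range from nothing but total, nprocs and its own rank, no message is needed to assign the work; only the matrix itself and the final results have to travel.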


How to scatter multiple variables in an array for MPI_Scatter

I am currently struggling to distribute an array of 8 integers equally across 4 processors, 2 integers each. I used MPI_Bcast to let every processor know that the total array has 8 elements and that each processor will have a 2-integer array called "my_input".
MPI_Scatter (input, 2 , MPI_INT, &my_input, 2 , MPI_INT, 0, MPI_COMM_WORLD );
printf("\n my input is %d & %d and rank is %d \n" , my_input[0], my_input[1] , rank);
However, after scattering, the print statement does not print the 'rank' but all the integers from the 8-integer array. How should I write the program so that the root distributes the array equally among the other processors?
Here is my full code (it is just for testing a total of 8 integers, therefore scanf I will enter '8'):
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include "mpi.h"
int main(int argc, char *argv[])
{
    //initialise MPI
    MPI_Init(&argc, &argv);
    //Variable to identify processor and total number of processors
    int rank, size;
    int my_input[0];
    //initialise total array variable
    int totalarray =0;
    //initialise memory array
    int* input;
    //range of random number
    int upper = 100, lower = 0;
    //declare processor rank
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    //declare total size of processor
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    //let root gather N elements from user
    if (rank == 0)
    {
        printf("Enter a number from 1 to 1000: ");
        int number;
        //ask user to input number of elements
        scanf("%d", &number);
        printf("Your number is %d\n",number);
        //Fill the array to power of 2
        int totalarray = pow(2, ceil(log(number)/log(2)));
        //allocate memory for the array
        input = malloc(totalarray * sizeof(int) );
        //Add randomise number until N elements
        for(int i =0; i<=totalarray ; i++)
        {
            if( i<number)
            {
                input[i] = (rand() % (upper - lower + 1)) + lower; ;
            }
            //padding zero to the extra elements
            else if(number <= i < totalarray)
            {
                input[i] = 0;
            }
        }
        //confirm the input array
        printf("the input is: ");
        for(int i =0; i < totalarray ; i++)
        {
            printf( "%d ", input[i]);
        }
    }
    MPI_Scatter (input, 2 , MPI_INT, &my_input, 2 , MPI_INT, 0, MPI_COMM_WORLD );
    printf("\n my input is %d & %d and rank is %d \n" , my_input[0], my_input[1] , rank);
    return 0;
}
You wrote: "I used MPI_Bcast to let every processors to know there are total array of 8 and each of those will have 2 integers array called 'my_input'."
Yes, that makes sense.
You also wrote: "However after scattering, I see the print function cannot print the 'rank' but all the integers from the 8 integers array. How should I program in order to equally distribute the number of arrays to other processors from root?"
You have some issues with your code. For instance, you declare the variables my_input, totalarray, and input as:
int my_input[0];
int totalarray =0;
int* input;
and then within if (rank == 0) you redefine them again:
int totalarray = pow(2, ceil(log(number)/log(2)));
input = malloc(totalarray * sizeof(int) );
This is not correct. Instead, you can declare both arrays as int*, namely:
int *my_input;
int *input;
then allocate their space as soon as you know how many elements there will be in each of those arrays.
The input array can be allocated right after the user has inserted the size of that array:
//ask user to input number of elements
printf("Your number is %d\n",number);
input = malloc(totalarray * sizeof(int));
and the my_input array after the master process has broadcast the input size to the other processes:
MPI_Bcast(&totalarray, 1, MPI_INT, 0, MPI_COMM_WORLD);
int *my_input = malloc((totalarray/size) * sizeof(int));
For the variable totalarray, just do not declare it again within if (rank == 0). If you do, int totalarray = pow(2, ceil(log(number)/log(2))); will be a different variable that exists only in the scope of the if (rank == 0) block.
The second MPI_Bcast call is useless, since you want "to equally distribute total 8 integers in an array to 2 integers for 4 processors", and not for every process to have the entire content of the master process's my_input array.
For that you need MPI_Scatter, which you already use. However, instead of
MPI_Scatter (input, 2 , MPI_INT, &my_input, 2 , MPI_INT, 0, MPI_COMM_WORLD );
do not hardcode the size of the inputs, because the code will not work if you want to test with different input sizes and/or a different number of processes. Do the following instead:
int size_per_process = totalarray/size;
MPI_Scatter (input, size_per_process , MPI_INT, my_input, size_per_process , MPI_INT, 0, MPI_COMM_WORLD );
The loop for(int i =0; i<=totalarray ; i++) should actually be for(int i =0; i< totalarray ; i++); otherwise you write past the end of the array input. It is a personal opinion, but I think the logic that adds the random values reads better this way:
for(int i =0; i < number ; i++)
input[i] = (rand() % (upper - lower + 1)) + lower;
for(int i = number; i < totalarray; i++)
input[i] = 0;
The final code would look like the following:
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include "mpi.h"
int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);
    int rank, size;
    int *input = NULL;   // only the root allocates the full array
    int totalarray;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (rank == 0){
        printf("Enter a number from 1 to 1000: ");
        int number;
        scanf("%d", &number);
        printf("Your number is %d\n",number);
        totalarray = pow(2, ceil(log(number)/log(2)));
        input = malloc(totalarray * sizeof(int));
        int upper = 100, lower = 0;
        for(int i = 0; i < number ; i++)
            input[i] = (rand() % (upper - lower + 1)) + lower;
        for(int i = number; i < totalarray; i++)
            input[i] = 0;
        printf("the input is: ");
        for(int i =0; i < totalarray ; i++)
            printf( "%d ", input[i]);
    }
    MPI_Bcast(&totalarray, 1, MPI_INT, 0, MPI_COMM_WORLD);
    int size_per_process = totalarray / size;
    int *my_input = malloc(size_per_process * sizeof(int));
    printf("SIZE PER %d\n", size_per_process);
    MPI_Scatter (input, size_per_process, MPI_INT, my_input, size_per_process, MPI_INT, 0, MPI_COMM_WORLD );
    printf("\n my input is %d & %d and rank is %d \n" , my_input[0], my_input[1] , rank);
    free(my_input);
    free(input);   // free(NULL) is a no-op on the non-root ranks
    MPI_Finalize();
    return 0;
}
The last print can also be made to be more generic by printing the entire my_input rather than just the first two positions.
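The padded length that the code derives with pow, ceil and log can also be computed with integer arithmetic, which avoids any floating-point rounding concern. The following is a minimal sketch, not part of the original answer; next_pow2 and chunk_per_rank are hypothetical helper names, and the sketch assumes the input stays in the 1 to 1000 range that the prompt asks for.

```c
#include <assert.h>

/* Smallest power of two that is >= n (n must be at least 1). */
int next_pow2(int n) {
    assert(n >= 1);
    int p = 1;
    while (p < n)
        p *= 2;
    return p;
}

/* Number of elements each rank receives from MPI_Scatter. The padded length
   must divide evenly, because MPI_Scatter sends the same count to every rank. */
int chunk_per_rank(int padded_len, int nprocs) {
    assert(nprocs > 0 && padded_len % nprocs == 0);
    return padded_len / nprocs;
}
```

With 8 elements and 4 processes, chunk_per_rank(next_pow2(8), 4) gives the 2 integers per process from the question. The divisibility check fails when the process count is not itself a power of two (for example 3 processes), a case in which totalarray / size would silently drop the trailing elements.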

For some reason MPI_Waitall gets stuck (in a deadlock I believe) when I test my program with big numbers

For some reason MPI_Waitall waits forever when I enter 10000 as the length of the sequence. Basically, I create 4 lists of length n/4, where in this case n is 10000, and I am using non-blocking sends so that process 0 does not wait for each list to be sent separately; the lists do not share any values, so nothing is overwritten.
Keep in mind that the program works with smaller numbers like 1000 or 100 but I am not sure why it does not work with 10000+.
Here is my code:
#include "ore_header.h"
int main(int argc, char** argv) {
    int my_rank, p;
    void generate_sequence(int *arr, int n);
    int subsequence_check(int *arr,int n, int m);
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &p);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    int total;
    int length;
    int flag;
    int seq_length;
    MPI_Status stats[p];
    MPI_Request reqs[p];
    int p_length=0;
    int *buf[p];
    if (my_rank == 0) {
        printf("Enter length and sequence length\n");
        scanf("%d %d",&length, &seq_length);
        p_length = length / p;
        for (int i = 0; i < p; i++) {
            buf[i] = (int*)malloc(p_length*sizeof(int));
            generate_sequence(buf[i], p_length);
            MPI_Isend(buf[i], p_length, MPI_INT, i, 0, MPI_COMM_WORLD, &reqs[i]);
            printf("Data sent to process %d\n", i);
        }
        MPI_Waitall(p, reqs, stats); //Program wont go past this line
        printf("\n\n Data sent to all processes \n\n");
    }
    MPI_Bcast(&p_length, 1, MPI_INT, 0, MPI_COMM_WORLD);
    MPI_Bcast(&seq_length, 1, MPI_INT, 0, MPI_COMM_WORLD);
    buf[my_rank] = (int*)malloc(p_length*sizeof(int));
    MPI_Recv(buf[my_rank], p_length, MPI_INT, 0, 0, MPI_COMM_WORLD, &stats[my_rank]);
    printf("\nData received on process: %d Length: %d\n",my_rank,p_length);
    //for (int i = 0; i < p_length; i++) {
    //    printf("%d",buf[my_rank][i]);
    //}
    total = subsequence_check(buf[my_rank],p_length,seq_length);
    printf("\nI am process: %d\nTotal: %d\n",my_rank,total);
    return (0);
}

MPI Bcast to change globals in C

I'm having difficulties with the following functionality. I'm allowing each process to perform work on the global array. Each process starts with the same global array, but if it changes, it has to update the other processes' array.
float * globalArray;
int main() {
    //Sequential code to init globalArray
    int size = //Length of globalArray
    int rank;
    int x, i;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    while(x == 0) {
        for(i = 0; i < size; i++) {
            printf("Values of array: %f\n", globalArray[i]);
        }
        x = getCondition();
        MPI_Bcast(&globalArray, size, MPI_FLOAT, rank, MPI_COMM_WORLD);
    }
}
The problem is that the global arrays in the other processes aren't updated when I Bcast it.

MPI_Scatterv doesn't work

I've written a program in C/MPI that simply splits an NxN matrix into submatrices (by rows) and then distributes them to all processes with the MPI_Scatterv routine. The dimension N is not necessarily a multiple of the number of processes. I decided to give one extra row to a number of processes equal to DIM % size. The code is the following; it doesn't work, and I don't understand why. The error message is something like this:
job aborted:
rank: node: exit code[: error message]
0: PACI: -1073741819: process 0 exited without calling finalize
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>
#define DIM 4
#define ROOT 0
float **alloc (int, int);
void init (float **, int, int);
void print (float **, int, int);
int main(int argc, char *argv[])
{
    int rank, size, dimrecv, i;
    int *sendcount = NULL, *displs = NULL;
    float **matrix, **recvbuf;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    dimrecv = (int)(DIM / size);
    if(rank < (DIM % size))
        dimrecv += 1 ;
    recvbuf = alloc(dimrecv, DIM);
    if (rank == ROOT)
    {
        matrix = alloc(DIM, DIM);
        init(matrix, DIM, DIM);
        sendcount = (int*)calloc(size, sizeof(int));
        displs = (int*)calloc(size, sizeof(int));
        int total = 0;
        printf("MATRIX %d x %d", DIM, DIM);
        print(matrix, DIM, DIM);
        displs[0] = 0;
        for (i = 0; i < size; i++)
        {
            if (i < DIM % size)
                sendcount[i] = (ceil((float)DIM/size))*DIM;
            else
                sendcount[i] = (floor((float)DIM/size))*DIM;
            total += sendcount[i];
            if (i + 1 < size)
                displs[i + 1] = total;
        }
    }
    MPI_Scatterv(&(matrix[0][0]), sendcount, displs, MPI_FLOAT,
                 recvbuf, dimrecv*DIM, MPI_FLOAT, ROOT, MPI_COMM_WORLD);
    for(i = 0; i< size; i++)
    {
        if (i == rank)
        {
            printf("SUBMATRIX P%d", i);
            print(recvbuf, dimrecv, DIM);
        }
    }
    /* quit */
    return 0;
}
float **alloc(int rows, int cols)
{
    int i;
    float *num_elem = (float *)calloc(rows*cols, sizeof(float));
    float **matrix= (float **)calloc(rows, sizeof(float*));
    for (i=0; i<rows; i++)
        matrix[i] = &(num_elem[cols*i]);
    return matrix;
}
void init (float **matrix, int rows, int cols)
{
    int i, j;
    for (i = 0; i < rows; i++) {
        for (j = 0; j < cols; j++)
            matrix[i][j] = 1 + (rand() % 5);
    }
}
void print (float **matrix, int rows, int cols)
{
    int i, j;
    for (i = 0; i < rows; i++) {
        for (j = 0; j < cols; j++)
            printf("%.1f ", matrix[i][j]);
    }
}
How could I solve the problem while using dynamic allocation with a double pointer? I've written the same program with static allocation and it works! Thanks a lot.
You need to be more careful about which process/rank is allocating memory, and which process/rank is therefore freeing memory.
In your current implementation, you'll want rank == ROOT to allocate and initialize matrix. You'll want every rank to allocate and initialize sendcount and displs (otherwise, when they each enter MPI_Scatterv, how do they know what exactly they'll be receiving?). Finally, they'll also need to allocate but not initialize recvbuf. The initialization of this buffer happens internally to the MPI_Scatterv routine.
[Side note: You don't technically need to have each rank initialize sendcount and displs, although this will certainly be fastest. If only the rank == ROOT process has the knowledge to calculate these values, then you'll have to MPI_Bcast both of these arrays to every process before entering the MPI_Scatterv routine.]
And of course you'll then have to ensure that only the correct ranks free the correct memory they previously allocated.
The reason this worked in your static initialization is that each rank "allocated" the memory when you initially statically defined your arrays. Assuming you did this naively, you probably previously used excess memory in that implementation (because, as seen above, not every rank needs to allocate memory for every matrix/array you are using).
Hope this helps.
Thanks Nose for your suggestion. Nevertheless the program doesn't work well. The modified code is the following:
MPI_Bcast(sendcount, 4, MPI_INT, ROOT, MPI_COMM_WORLD);
MPI_Bcast(displs, 4, MPI_INT, ROOT, MPI_COMM_WORLD);
MPI_Scatterv(&(matrix[0][0]), sendcount, displs, MPI_FLOAT,
recvbuf, dimrecv*DIM, MPI_FLOAT, ROOT, MPI_COMM_WORLD);
for(i = 0; i< size; i++)
if (i == rank)
printf("SUBMATRIX P%d", i);
print(recvbuf, dimrecv, DIM);
if (rank == ROOT) {
for (i=0; i<DIM; i++)
for(i=0; i<dimrecv; i++)
sendcount and displs have now been allocated outside the if (rank == ROOT) block. There must be something wrong in the code that I don't catch.
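As far as the MPI standard describes MPI_Scatterv, the send buffer, the send counts and the displacements are significant only at the root, so broadcasting them is not what decides whether the call works. What every rank does need is a contiguous receive buffer and its own receive count. With the alloc() shown in the question, the contiguous storage starts at recvbuf[0] (that is, &recvbuf[0][0]), not at recvbuf itself, which is only the array of row pointers. Likewise, evaluating &(matrix[0][0]) on a rank that never allocated matrix dereferences an uninitialized pointer, which would be consistent with the access-violation exit code in the question. The layout computation itself can be sketched in plain C; scatterv_layout is a hypothetical helper name, and the row distribution (one extra row for the first rows % nprocs ranks) follows the question.

```c
#include <assert.h>

/* Fill counts[] and displs[] (both in elements, not rows) for scattering the
   rows of a row-major rows x cols matrix over nprocs ranks. The first
   rows % nprocs ranks get one extra row. Returns the total element count. */
int scatterv_layout(int rows, int cols, int nprocs, int *counts, int *displs) {
    assert(rows >= 0 && cols >= 0 && nprocs > 0);
    int base = rows / nprocs;
    int extra = rows % nprocs;
    int offset = 0;
    for (int r = 0; r < nprocs; r++) {
        int my_rows = base + (r < extra ? 1 : 0);
        counts[r] = my_rows * cols;   /* elements sent to rank r            */
        displs[r] = offset;           /* where rank r's block starts        */
        offset += counts[r];
    }
    return offset;
}
```

Under these assumptions the root would pass matrix[0], counts and displs as the send arguments, the other ranks could pass NULL for all three, and every rank would pass recvbuf[0] together with counts[rank] (or dimrecv*DIM, which is the same number) as the receive arguments.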

MPI_Scatter and MPI_Gather don't work

Hello, I am trying to write a simple parallel program in C using MPI. The program should find the maximum in an array. The root process should send chunks of the array to all processes using MPI_Scatter and then gather the results with MPI_Gather. When I run the program I get a general error like this:
Perhaps this Unix error message will help:
Unix errno: 14
Bad address
I know that there is some problem with MPI_Scatter and MPI_Gather, or with the values I am passing to these functions.
I was trying to find a solution, but I found nothing useful.
Here is my code:
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#define BUFSIZE 9
int max(int *buf, int N){
    int i;
    int value = 0;
    for(i=0; i<N; i++){
        if (buf[i]>value){
            value = buf[i];
        }
    }
    return value;
}
int main(int argc, char** argv)
{ int size, rank;
int slave;
int *buf;
int *buf1;
int *buf2;
int i, n, value;
MPI_Status status;
/* Initialize MPI */
/* Determine size in the world group. */
MPI_Comm_size(MPI_COMM_WORLD, &size);
if ((BUFSIZE % size) != 0) {
printf("Wrong Bufsize ");
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
if (rank==0) {
buf = (int *)malloc(BUFSIZE*sizeof(int));
buf2 = (int *)malloc(size*sizeof(int));
printf("\n Generated array: \n");
for(i=0; i<BUFSIZE; i++){
buf[i] = rand() % 20;
printf("%d, ", buf[i]);
printf("\n Sending values to processes:");
printf("\n -----------------------------");
buf1 = (int *)malloc((BUFSIZE/size)*sizeof(int));
MPI_Scatter(buf, BUFSIZE/size, MPI_INT, buf1, BUFSIZE/size, MPI_INT, 0, MPI_COMM_WORLD);
value = max(&buf1[0], BUFSIZE/size);
printf("\n Max from rocess %d : %d \n", rank, max(&buf1[0], BUFSIZE/size));
MPI_Gather(&value, 1, MPI_INT, buf2, 1, MPI_INT, 0, MPI_COMM_WORLD);
if (rank == 0){
printf("\n Max value: %d", max(&buf2[0], size));
Initialize your pointers to NULL, and track them.
Use buf1 instead of &buf1[0]; it is clearer.
Free your buffers before MPI_Finalize() with:
if(bufferPointer != NULL) free(bufferPointer);
If something is wrong with a pointer, the program will crash in the free call. In the max function, if all your numbers are less than zero, the maximum returned is zero. I fixed that:
int max(int *buf, int N){
    int i;
    int value = N? buf[0] : 0;
    for(i=0; i<N; i++){
        if (buf[i]>value){
            value = buf[i];
        }
    }
    return value;
}
Best regards!
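The scatter, local maximum, gather pipeline from the question can be checked serially before any MPI call is involved. The following is a minimal sketch in plain C, not the asker's program: max_of and chunked_max are hypothetical names, each loop iteration stands in for one rank, and the chunk count is assumed to divide the array length, as the BUFSIZE % size check in the question requires.

```c
#include <assert.h>
#include <stdlib.h>

/* Maximum of buf[0..n-1]; n must be at least 1. Starting from buf[0]
   keeps the result correct when every value is negative. */
int max_of(const int *buf, int n) {
    assert(n >= 1);
    int value = buf[0];
    for (int i = 1; i < n; i++)
        if (buf[i] > value)
            value = buf[i];
    return value;
}

/* Serial stand-in for scatter / local max / gather: split buf into nchunks
   equal pieces, take each piece's maximum, then the maximum of those. */
int chunked_max(const int *buf, int n, int nchunks) {
    assert(nchunks >= 1 && n >= nchunks && n % nchunks == 0);
    int chunk = n / nchunks;
    int *partial = malloc(nchunks * sizeof *partial);
    if (partial == NULL)
        abort();
    for (int r = 0; r < nchunks; r++)
        partial[r] = max_of(buf + r * chunk, chunk);   /* work of "rank r" */
    int result = max_of(partial, nchunks);             /* work of the root */
    free(partial);
    return result;
}
```

If chunked_max and max_of agree on the same array, the remaining problems in the MPI version are likely to be in buffer allocation and the collective calls rather than in the maximum logic.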
