Make slaves wait for MPI_Bcast from master - c

I'm trying to write a parallel program that implements a pipelined version of Gaussian elimination, using MPI and C.
However, I'm running into difficulties early in the implementation.
I use a root process to read a data matrix from a text file. This process determines the size of the matrix, and I broadcast that size to all other processes so that they can allocate the matrix in memory. However, the slave processes try to allocate it before the broadcast from the root arrives.
How can I make them wait?
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <mpi.h>
int CalcInd(int i, int j, int dimL)
return i*dimL +j;
int main (int argc, char **argv)
FILE *fin, *fout;
char fA[] = "Matrix.txt";
int rank, size, i, ii, j, k, m, n, picked, tmp, total;
int counter=0, elements=0;
int * RightNeigbhor, * LeftNeigbhor, * loc;
float f, magnitude, t;
float * A, * x;
MPI_Status status;
MPI_Request request;
// MPI initialization
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);
if(rank == 0)
// Master defines the neighbor processes
RightNeigbhor = (int *)calloc(size,sizeof(int));
if(RightNeigbhor==NULL){printf("!!! Could not allocate memory !!!\n"); exit(-1);}
LeftNeigbhor = (int *)calloc(size,sizeof(int));
if(LeftNeigbhor==NULL){printf("!!! Could not allocate memory !!!\n"); exit(-1);}
for(i = 0; i < size; i++ )
RightNeigbhor[i] = (rank + 1) % size;
LeftNeigbhor[i] = (rank - 1) % size;
// Broadcast the neighbor processes to all processes
MPI_Bcast ( RightNeigbhor, size, MPI_INTEGER, rank, MPI_COMM_WORLD );
MPI_Bcast ( LeftNeigbhor, size, MPI_INTEGER, rank, MPI_COMM_WORLD );
// Master reads matrix A
fin = fopen ( fA, "r" );
if (fin == NULL){ printf("!!! FILE NOT FOUND !!!"); exit(-1); }
while( !feof(fin))
fscanf (fin, "%f", &f);
f = 0;
while( !feof(fin))
if(fgetc(fin) == '\n')
n = counter;
m = (elements-1) / counter;
total = n*m;
MPI_Bcast ( &total, 1, MPI_INT, rank, MPI_COMM_WORLD );
MPI_Bcast ( &n, 1, MPI_INT, rank, MPI_COMM_WORLD );
// Variable allocation
A = (float *)calloc(total,sizeof(float));
if(A==NULL){printf("!!! Could not allocate memory !!!\n"); exit(-1);}
loc = (int *)calloc(n,sizeof(int*));
if(loc==NULL){printf("!!! Could not allocate memory !!!\n"); exit(-1);}

Everything in your rank == 0 block runs only in process 0; all other processes skip that block entirely, including the MPI_Bcast calls inside it. MPI_Bcast is a collective operation, so every process in the communicator (here MPI_COMM_WORLD) has to call it, and all of them must pass the same root (0, not their own rank). Move the MPI_Bcast calls out of the rank == 0 block so that every process reaches them. If the other processes skip the initialization and arrive at the broadcast before process 0 does, they will wait there until the broadcast has occurred.


distributed algorithm in C

I am a beginner in C and have to build a distributed program with the MPI library. This is my code:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <time.h>
#include <mpi.h>
int main(int argc, char **argv)
int N, w = 1, L = 2, M = 50; // with N number of threads
int T= 2;
int myid;
int buff;
float mit[N][T]; // I initialize a 2d array
for(int i = 0; i < N; ++i){
mit[i][0]= M / (float) N;
for (int j = 1; j < T; ++j){
mit[i][j] = 0;
float tab[T]; // 1d array
MPI_Status stat;
MPI_Init(&argc,&argv); // Initialisation
MPI_Comm_size(MPI_COMM_WORLD, &N);
MPI_Comm_rank(MPI_COMM_WORLD, &myid);
for(int j = 0; j < T; j++) {
for(int i = 0; i < N; i++) { // I iterate for each slave
if (myid !=0) {
float y = ((float) rand()) / (float) RAND_MAX;
mit[i][j + 1] = mit[i][j]*(1 + w * L * y);
MPI_Send(&buff, 128, MPI_INT, 0, 0, MPI_COMM_WORLD); // I send the variable buff to the master
if( myid == 0 ) { // Master
for(int i = 1; i < N; i++){
MPI_Recv(&buff, 128, MPI_INT, i, 0, MPI_COMM_WORLD, &stat);
tab[j] += buff; // I need to receive all the buff values sent by the slaves, sum them and store the result in tab at index j
printf("\n%.20f\n",tab[j]); // I print the result of the sum at index j
return 0;
I compile the program in the terminal with: mpicc .c -o my_file
Then I start it with 101 processes using: mpirun -np 101 my_file_c
The problem is that I get the following error in the terminal:
> It seems that [at least] one of the processes that was started with
> mpirun did not invoke MPI_INIT before quitting (it is possible that
> more than one process did not invoke MPI_INIT -- mpirun was only
> notified of the first one, which was on node n0).
> mpirun can *only* be used with MPI programs (i.e., programs that
> invoke MPI_INIT and MPI_FINALIZE). You can use the "lamexec" program
> to run non-MPI programs over the lambooted nodes.
It seems that the problem is with the master, but I don't know why.
Any ideas?
Thank you :)
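This behavior is very likely the result of memory corruption.
buff is a single int, so you cannot send 128 elements starting at its address:
int buff=mit[i][j+1];
MPI_Send(&buff, 128, MPI_INT, ...); // reads 127 ints past the end of buff
Depending on what you want to achieve, you can either send a single value and receive it with a matching count:
int buff=mit[i][j+1];
MPI_Send(&buff, 1, MPI_INT, ...);
// ...
MPI_Recv(&buff, 1, MPI_INT, ...);
or point at memory that really holds 128 elements of the declared datatype (mit is a float array, so the pointer and the datatype have to be float as well):
float *buff=&mit[i][j+1];
MPI_Send(buff, 128, MPI_FLOAT, ...);
// fix MPI_Recv() the same way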

How to send a integer array via MPI_Send?

I'm trying to create a program in plain C that divides an integer array equally among any number of processes. For debugging purposes I'm using an integer array with 12 numbers and only 2 processes, so that the master process will have [1,2,3,4,5,6] and slave1 will have [7,8,9,10,11,12]. However, I'm getting an error saying: MPI_ERR_BUFFER: invalid buffer pointer.
After some research I found out that there is a function that does this (MPI_Scatter). Unfortunately, since I'm learning MPI, the implementation is restricted to MPI_Send and MPI_Recv only. Anyway, both MPI_Send and MPI_Recv take a void*, and I'm passing an int*, so it should work. Can anyone point out what I am doing wrong? Thank you.
int* create_sub_vec(int begin, int end, int* origin);
void print(int my_rank, int comm_sz, int n_over_p, int* sub_vec);
int main(void){
int comm_sz;
int my_rank;
int vec[12] = {1,2,3,4,5,6,7,8,9,10,11,12};
int* sub_vec = NULL;
int n_over_p;
MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);
MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
n_over_p = 12/comm_sz;
printf("Process %d computes n_over_p = %d\n", my_rank, n_over_p);
if (my_rank != 0) {
MPI_Recv(sub_vec, n_over_p, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
print(my_rank, comm_sz, n_over_p, sub_vec);
} else {
printf("Distributing data\n");
for (int i = 1; i < comm_sz; i++) {
sub_vec = create_sub_vec(i*n_over_p, (i*n_over_p)+n_over_p, vec);
MPI_Send(sub_vec, n_over_p, MPI_INT, i, 0, MPI_COMM_WORLD);
printf("End of data distribution\n");
sub_vec = create_sub_vec(0, n_over_p, vec);
print(my_rank, comm_sz, n_over_p, sub_vec);
return 0;
int* create_sub_vec(int begin, int end, int* origin){
int* sub_vec;
int size;
int aux = 0;
size = end - begin;
sub_vec = (int*)malloc(size * sizeof(int));
for (int i = begin; i < end; ++i) {
*(sub_vec+aux) = *(origin+i);
aux += 1;
return sub_vec;
void print(int my_rank, int comm_sz, int n_over_p, int* sub_vec){
printf("Process %d out of %d received sub_vecotr: [ ", my_rank, comm_sz);
for (int i = 0; i < n_over_p; ++i)
printf("%d, ", *(sub_vec+i));
The issue is that sub_vec is never allocated on the non-zero ranks, so MPI_Recv is handed a NULL pointer.
Allocating it is up to you (i.e. MPI does not allocate the receive buffer for you).
The receive part should look like:
if (my_rank != 0) {
sub_vec = (int *)malloc(n_over_p * sizeof(int));
MPI_Recv(sub_vec, n_over_p, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
As you wrote, the natural way is MPI_Scatter() (and once again, it is up to you to allocate the receive buffer before starting the scatter).

Basic Matrix operation with dynamical array allocation using MPI

I have already looked for answers about MPI and dynamic allocation, but there is still an error in my code.
I think the send/receive pairs work well. The problem probably lies in the part where I perform some basic operations on the received array: as soon as I index into it, I get this error:
[lyomatnuc09:07574] * Process received signal *
[lyomatnuc09:07575] * Process received signal *
[lyomatnuc09:07575] Signal: Segmentation fault (11)
[lyomatnuc09:07575] Signal code: Address not mapped (1)
[lyomatnuc09:07575] Failing at address: 0x60
The basic code that reproduces the error is below:
int **alloc_array(int rows, int cols) {
int *data = (int *)malloc(rows*cols*sizeof(int));
int **array= (int **)malloc(rows*sizeof(int*));
for (int i=0; i<rows; i++)
array[i] = &(data[cols*i]);
return array;
int main(int argc, char *argv[])
MPI_Init(&argc, &argv); //initialize MPI operations
MPI_Comm_rank(MPI_COMM_WORLD, &rank); //get the rank
MPI_Comm_size(MPI_COMM_WORLD, &size); //get number of processes
MPI_Datatype columntype;
MPI_Type_vector(10, 1, 10, MPI_INT, &columntype);
start_time = MPI_Wtime();
if (rank == 0)
int **A;
A = alloc_array(10,10);
for ( int i =1 ;i<size;i++)
MPI_Send(&(A[0][0]), 10*10, MPI_INT, i, 1, MPI_COMM_WORLD);
} else if (rank >= 1) {
int **A2;
A2 = alloc_array(10,10);
MPI_Recv(&(A2[0][0]), 10*10, MPI_INT, 0, 1, MPI_COMM_WORLD, &status);
for (int i =0; i<10; i++)
for ( int j=0; j<10;i++)
A2[i][j]=i*j;//bug here
}//end slaves task
return 0;

fread() in MPI is giving Signal 7 Bus Error

I am a newbie to C and MPI.
I have the following code which I am using with MPI.
#include "RabinKarp.c"
#include <stdio.h>
#include <stdlib.h>
#include </usr/include/mpi/mpi.h>
typedef struct {
int lowerOffset;
int upperOffset;
int processorNumber;
} patternPartitioning;
int rank;
FILE *fp;
char* filename = "/home/rohit/Downloads/10_seqs_2000_3000_bp.fasta";
int n = 0;
int d = 0;
//number of processors
int k, i = 0, lower_limit, upper_limit;
int main(int argc, char** argv) {
char* pattern= "taaat";
patternPartitioning partition[k];
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &k);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
fp = fopen(filename, "rb");
if (fp != '\0') {
fseek(fp, 0L, SEEK_END);
n = ftell(fp);
fseek(fp, 0L, SEEK_SET);
//Do for Master Processor
if(rank ==0){
int m = strlen(pattern);
printf("pattern length is %d \n", m);
d = (int)(n - m + 1) / k;
for (i = 0; i <= k - 2; i++) {
lower_limit = round(i * d);
upper_limit = round((i + 1) * d) + m - 1;
partition->lowerOffset = lower_limit;
partition->upperOffset = upper_limit;
partition->processorNumber = i+1;
// k-2 times calculate the limits like this
printf(" the lower limit is %d and upper limit is%d\n",
partition->lowerOffset, partition->upperOffset);
int mpi_send_block[2];
mpi_send_block[0]= lower_limit;
mpi_send_block[1] = upper_limit;
MPI_Send(mpi_send_block, 2, MPI_INT, i+1, i+1, MPI_COMM_WORLD);
//int MPI_Send(void *buf, int count, MPI_Datatype dtype, int dest, int tag, MPI_Comm comm);
// for the last processor calculate the index here
lower_limit = round((k - 1) * d);
upper_limit = n;
partition->lowerOffset = lower_limit;
partition->upperOffset = n;
partition->processorNumber = k;
printf("Processor : %d : has start : %d : and end : %d :\n",rank,partition->lowerOffset,partition->upperOffset);
//perform the search here
int size = partition->upperOffset-partition->lowerOffset;
char *text = (char*) malloc (size);
fseek(fp,partition->lowerOffset , SEEK_SET);
fread(&text, sizeof(char), size, fp);
printf("read in rank0");
int number =0;
number = rabincarp(text,pattern);
for (i = 0; i <= k - 2; i++) {
int res[1];
MPI_Status status;
// MPI_Recv(res, 1, MPI_INT, i+1, i+1, MPI_COMM_WORLD, &status);
// number = number + res[0];
printf("\n\ntotal number of result found:%d\n", number);
} else {
patternPartitioning mypartition;
MPI_Status status;
int number[1];
int mpi_recv_block[2];
MPI_Recv(mpi_recv_block, 2, MPI_INT, 0, rank, MPI_COMM_WORLD, &status);
printf("Processor : %d : has start : %d : and end : %d :\n",rank,mpi_recv_block[0],mpi_recv_block[1]);
//perform the search here
int size = mpi_recv_block[1]-mpi_recv_block[0];
char *text = (char*) malloc (size);
fseek(fp,mpi_recv_block[0] , SEEK_SET);
fread(&text, sizeof(char), size, fp);
printf("read in rank1");
// fread(text,size,size,fp);
printf("length of text segment by proc: %d is %d",rank,(int)strlen(text));
number[0] = rabincarp(text,pattern);
//MPI_Send(number, 1, MPI_INT, 0, rank, MPI_COMM_WORLD);
return (EXIT_SUCCESS);
If I run this (mpirun -np 2 pnew), I get the following error:
[localhost:03265] *** Process received signal ***
[localhost:03265] *** Process received signal ***
mpirun noticed that process rank 1 with PID 3265 on node localhost exited on signal 7 (Bus error).
If I remove the fread() statements, I don't get the error. Can anyone tell me what I am missing?
char *text = (char*) malloc (size);
fseek(fp,partition->lowerOffset , SEEK_SET);
fread(&text, sizeof(char), size, fp);
The documentation for fread says "The function fread() reads nmemb elements of data, each size bytes long, from the stream pointed to by stream, storing them at the location given by ptr."
Since text is a char *, &text is the address of a char *. That won't have enough space to hold the data you're reading. You want to pass fread the address of the memory you allocated, not the address of the variable holding that address! (So remove the &.)
if (fp != '\0') {
fp is a FILE *, while '\0' is an int constant; compare fp against NULL instead.
This is not the cause of the bus error, but I suggest you compile with a higher warning level to catch this kind of mistake.

Initialize an array using openmpi once

I am trying to run some tests using Open MPI, processing data in an array by splitting up the work across nodes (the second part is with matrices). I am running into problems because the data array is initialized every time, and I don't know how to prevent this from happening.
How, using ANSI C, can I create a variable-length array only once when using Open MPI? I tried making it static and global, but that made no difference.
#define NUM_THREADS 4
#define NUM_DATA 1000
static int *list = NULL;
int main(int argc, char *argv[]) {
int numprocs, rank, namelen;
char processor_name[MPI_MAX_PROCESSOR_NAME];
int i;
if(list == NULL)
list = malloc(n*sizeof(int));
for(i = 0 ; i < n; i++)
list[i] = rand() % 1000;
int position;
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Get_processor_name(processor_name, &namelen);
printf("Process %d on %s out of %d\n", rank,processor_name, numprocs);
clock_t start = clock();
position = n / NUM_THREADS * rank;
search(list,position, n / NUM_THREADS * (rank + 1));
printf("Time elapsed: %f seconds\n", ((double)clock() - (double)start) /(double) CLOCKS_PER_SEC);
return 0;
Probably the easiest way is to have the rank 0 process do the initialization while the other processes block. Then once the initialization is done, have them all start their work.
A basic example trying to call your search function (NB: it's dry-coded):
#define NUM_THREADS 4
#define NUM_DATA 1000
int main(int argc, char *argv[]) {
int *list;
int numprocs, rank, namelen, i, n;
int chunksize,offset;
char processor_name[MPI_MAX_PROCESSOR_NAME];
MPI_Status stat;
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Get_processor_name(processor_name, &namelen);
n = NUM_DATA; // n was never assigned; assuming NUM_DATA is the intended array length
//note you'll need to handle n%NUM_THREADS !=0, but I'm ignoring that for now
chunksize = n / NUM_THREADS;
if (rank == 0) {
//Think of this as a master process
//Do your initialization in this process
list = malloc(n*sizeof(int));
for(i = 0 ; i < n; i++)
list[i] = rand() % 1000;
// Once you're ready, send each slave process a chunk to work on
offset = chunksize;
for(i = 1; i < numprocs; i++) {
MPI_Send(&list[offset], chunksize, MPI_INT, i, 0, MPI_COMM_WORLD);
offset += chunksize;
search(list, 0, chunksize);
//If you need some sort of response back from the slaves, do a recv loop here
} else {
// If you're not the master, you're a slave process, so wait to receive data
list = malloc(chunksize*sizeof(int));
MPI_Recv(list, chunksize, MPI_INT, 0, 0, MPI_COMM_WORLD, &stat);
// Now you can do work on your portion
search(list, 0, chunksize);
//If you need to send something back to the master, do it here.
