How to calculate percentage ratio? - c

I am new to MPI and I am trying to write a mini C program that calculates the percentage ratio of numbers that the user inputs.
The percentage ratio is calculated by the expression
`δi = ((xi – xmin ) / (xmax – xmin )) * 100`.
The numbers that the user inputs are stored in an array of fixed size data[100] and are scattered to all processes (this program is supposed to work only with four processes).
The problem I am facing is that the division doesn't work although all the processes have the data. For example if the user inputs the numbers {1, 2, 3, 4} the expected percentage ratio according to the mathematical expression is {0, 33.3, 66.6, 100} but instead I am getting {0,0,100,100}. This is what I have.
#include <stdio.h>
#include "mpi.h"
int main(int argc, char** argv){
    int my_rank;
    int total_processes;
    int root = 0;
    int data[100];
    int loc_data[100];
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &total_processes);
    int input_size = 0;
    if (my_rank == 0){
        printf("Input how many numbers: ");
        scanf("%d", &input_size);
        printf("Input the elements of the array: ");
        for(int i=0; i<input_size; i++){
            scanf("%d", &data[i]);
        }
    }
    MPI_Bcast(&input_size, 1, MPI_INT, root, MPI_COMM_WORLD);
    int loc_num = input_size/total_processes;
    MPI_Scatter(&data, loc_num, MPI_INT, loc_data, loc_num, MPI_INT, root, MPI_COMM_WORLD);
    int global_max = 0;
    int global_min = 0;
    MPI_Reduce(&loc_data, &global_max, 1, MPI_INT, MPI_MAX, root, MPI_COMM_WORLD);
    MPI_Reduce(&loc_data, &global_min, 1, MPI_INT, MPI_MIN, root, MPI_COMM_WORLD);
    float loc_delta[100];
    int x = 0;
    int y = 0;
    float p = 0;
    for(int j = 0; j< loc_num; j++){
        x = loc_data[j] - global_min;
        y = global_max - global_min;
    }
    MPI_Bcast(&y, 1, MPI_INT, root, MPI_COMM_WORLD);
    for(int j = 0; j< loc_num ; j++){
        p = (x / y) * 100;
        printf("p= %f \n", p);
        loc_delta[j] = p;
    }
    float final_delta[100];
    MPI_Gather(&loc_delta, 1, MPI_FLOAT, final_delta, 1, MPI_FLOAT, root, MPI_COMM_WORLD);
    if(my_rank == 0){
        printf("max number: %d\n", global_max);
        printf("min number: %d\n", global_min);
        for(int i = 0; i<input_size; i++)
            printf("delta[%d]: %.2f | ", i+1, final_delta[i]);
    }
    return 0;
}

There are several issues with your code.
int global_max = 0;
int global_min = 0;
MPI_Reduce(&loc_data, &global_max, 1, MPI_INT, MPI_MAX, root, MPI_COMM_WORLD);
MPI_Reduce(&loc_data, &global_min, 1, MPI_INT, MPI_MIN, root, MPI_COMM_WORLD);
MPI does not get the minimum of all elements in the array, you have to
do that manually. (source)
With a count of 1, MPI_Reduce only combines the first element of each process's loc_data, not every element of the array. Each process therefore has to compute the min and max of its own chunk first, and those local results are then reduced across the processes. Since every process needs the global min and max for the next step, you should use MPI_Allreduce instead of MPI_Reduce. Your code would look like the following:
int local_max = loc_data[0];
int local_min = loc_data[0];
for(int i = 1; i < loc_num; i++){
    local_max = (local_max > loc_data[i]) ? local_max : loc_data[i];
    local_min = (local_min < loc_data[i]) ? local_min : loc_data[i];
}
int global_max = local_max;
int global_min = local_min;
MPI_Allreduce(&local_max, &global_max, 1, MPI_INT, MPI_MAX, MPI_COMM_WORLD);
MPI_Allreduce(&local_min, &global_min, 1, MPI_INT, MPI_MIN, MPI_COMM_WORLD);
Unless you are assuming that loc_num=1, which you should not, this code
for(int j = 0; j< loc_num; j++){
    x = loc_data[j] - global_min;
    y = global_max - global_min;
}
overwrites the same x and y on every iteration, so only the values from the last iteration survive. Moreover, you should not call MPI_Bcast(&y, 1, MPI_INT, root, MPI_COMM_WORLD);. Instead, let all the processes first compute their share of the work in parallel, based on the formula:
δi = ((xi – xmin ) / (xmax – xmin )) * 100.
and only then send their work back to the master process. Each process applies the formula to its own elements, stores the results in an array, and sends that array back to the master process. Like so:
float loc_delta[100];
float y = global_max - global_min;
for(int j = 0; j< loc_num; j++){
    loc_delta[j] = (((float) (loc_data[j] - global_min) / y) * 100.0);
}
float final_delta[100];
MPI_Gather(&loc_delta, loc_num, MPI_FLOAT, final_delta, loc_num, MPI_FLOAT, root, MPI_COMM_WORLD);
Notice the cast to float in (((float) (loc_data[j] - global_min) / y) * 100.0). In your original code x and y are both int, so x / y is an integer division that truncates to 0 or 1 before it is multiplied by 100.


MPI_Send with type created through MPI_Type_create_darray

I have the following code
int world_size;
MPI_Comm_size(MPI_COMM_WORLD, &world_size);
int rank;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
int n_chunks = 16;
assert(n % n_chunks == 0);
int chunk_size = n / n_chunks;
int psizes[2] = {0, 0};
MPI_Dims_create(world_size, 2, psizes);
MPI_Datatype *dist_types = (MPI_Datatype *) malloc(world_size * sizeof(MPI_Datatype));
for (int i = 0; i < world_size; i++) {
    int sizes[2] = {n, n};
    int dargs[2] = {chunk_size, chunk_size};
    MPI_Type_create_darray(world_size, i, 2,
                           sizes, distribs, dargs, psizes,
                           MPI_ORDER_C, MPI_DOUBLE, &dist_types[i]);
}
MPI_Request *send_requests;
if (rank == 0) {
    send_requests = (MPI_Request *) malloc(world_size * sizeof(MPI_Request));
    for (int i = 0; i < world_size; i++) {
        MPI_Isend(&A[0][0], 1, dist_types[i],
                  i, 0, MPI_COMM_WORLD, &send_requests[i]);
    }
}
int dist_size;
MPI_Type_size(dist_types[rank], &dist_size);
dist_size /= sizeof(double);
double *D = (double *) malloc(dist_size * sizeof(double));
MPI_Request recv_request;
MPI_Irecv(D, dist_size, MPI_DOUBLE,
          0, 0, MPI_COMM_WORLD, &recv_request);
MPI_Wait(&recv_request, MPI_STATUS_IGNORE);
if (rank == 0) {
    MPI_Waitall(1, send_requests, MPI_STATUSES_IGNORE);
}
int m = n / psizes[0];
if (rank == 0) {
    for (int i = 0; i < m; i++) {
        for (int j = 0; j < m; j++) {
            printf("%.2lf ", D[i * m + j]);
        }
    }
}
When I print out the matrix D, I don't get a block cyclic view of A as I'd expect. Rather, the entries are all jumbled up and apart from the top row they look quite random.
Hence, my question is: can I generally expect this to work, or are you not really supposed to use MPI_Type_create_darray in this situation? I'm wondering because, from what I could find online, people only mention the function in the context of MPI-IO, and I couldn't locate a single example of it being used in a way similar to what I have.
I'm an MPI novice, so maybe I'm just doing something wrong that's unrelated to the type I'm using. Also, I did read that it's not really ideal to distribute your matrix this way and rather use MPI-IO, but I can't really change that.

How to convert MPI blocking code into non blocking

I want to perform matrix multiplication. I have to write two versions, one with blocking MPI calls and one with non-blocking MPI calls. The blocking version is done, and I would like some help converting it to non-blocking.
This is the blocking matrix multiplication code:
#include <stdlib.h>
#include <stdio.h>
#include "mpi.h"
#include <time.h>
#include <sys/time.h>
// Number of rows and columns in a matrix
#define N 4
MPI_Status status;
// Matrix holders are created
double matrix_a[N][N],matrix_b[N][N],matrix_c[N][N];
int main(int argc, char **argv)
{
    int processCount, processId, slaveTaskCount, source, dest, rows, offset;
    struct timeval start, stop;
    // MPI environment is initialized
    MPI_Init(&argc, &argv);
    // Each process gets unique ID (rank)
    MPI_Comm_rank(MPI_COMM_WORLD, &processId);
    // Number of processes in communicator will be assigned to variable -> processCount
    MPI_Comm_size(MPI_COMM_WORLD, &processCount);
    // Number of slave tasks will be assigned to variable -> slaveTaskCount
    slaveTaskCount = processCount - 1;
    // Root (Master) process
    if (processId == 0) {
        // Matrix A and Matrix B both will be filled with random numbers
        srand ( time(NULL) );
        for (int i = 0; i<N; i++) {
            for (int j = 0; j<N; j++) {
                matrix_a[i][j]= rand()%10;
                matrix_b[i][j]= rand()%10;
            }
        }
        printf("\n\t\tMatrix - Matrix Multiplication using MPI\n");
        // Print Matrix A
        printf("\nMatrix A\n\n");
        for (int i = 0; i<N; i++) {
            for (int j = 0; j<N; j++) {
                printf("%.0f\t", matrix_a[i][j]);
            }
        }
        // Print Matrix B
        printf("\nMatrix B\n\n");
        for (int i = 0; i<N; i++) {
            for (int j = 0; j<N; j++) {
                printf("%.0f\t", matrix_b[i][j]);
            }
        }
        rows = N/slaveTaskCount;
        offset = 0;
        for (dest=1; dest <= slaveTaskCount; dest++)
        {
            // Acknowledging the offset of the Matrix A
            MPI_Send(&offset, 1, MPI_INT, dest, 1, MPI_COMM_WORLD);
            // Acknowledging the number of rows
            MPI_Send(&rows, 1, MPI_INT, dest, 1, MPI_COMM_WORLD);
            // Send rows of the Matrix A which will be assigned to slave process to compute
            MPI_Send(&matrix_a[offset][0], rows*N, MPI_DOUBLE,dest,1, MPI_COMM_WORLD);
            // Matrix B is sent
            MPI_Send(&matrix_b, N*N, MPI_DOUBLE, dest, 1, MPI_COMM_WORLD);
            // Offset is modified according to number of rows sent to each process
            offset = offset + rows;
        }
        for (int i = 1; i <= slaveTaskCount; i++)
        {
            source = i;
            // Receive the offset of particular slave process
            MPI_Recv(&offset, 1, MPI_INT, source, 2, MPI_COMM_WORLD, &status);
            // Receive the number of rows that each slave process processed
            MPI_Recv(&rows, 1, MPI_INT, source, 2, MPI_COMM_WORLD, &status);
            // Calculated rows of each process will be stored in Matrix C according to their offset and
            // the processed number of rows
            MPI_Recv(&matrix_c[offset][0], rows*N, MPI_DOUBLE, source, 2, MPI_COMM_WORLD, &status);
        }
        // Print the result matrix
        printf("\nResult Matrix C = Matrix A * Matrix B:\n\n");
        for (int i = 0; i<N; i++) {
            for (int j = 0; j<N; j++)
                printf("%.0f\t", matrix_c[i][j]);
            printf ("\n");
        }
        printf ("\n");
    }
    // Slave Processes
    if (processId > 0) {
        // Source process ID is defined
        source = 0;
        // The slave process receives the offset sent by root process
        MPI_Recv(&offset, 1, MPI_INT, source, 1, MPI_COMM_WORLD, &status);
        // The slave process receives number of rows sent by root process
        MPI_Recv(&rows, 1, MPI_INT, source, 1, MPI_COMM_WORLD, &status);
        // The slave process receives the sub portion of the Matrix A which assigned by Root
        MPI_Recv(&matrix_a, rows*N, MPI_DOUBLE, source, 1, MPI_COMM_WORLD, &status);
        // The slave process receives the Matrix B
        MPI_Recv(&matrix_b, N*N, MPI_DOUBLE, source, 1, MPI_COMM_WORLD, &status);
        // Matrix multiplication
        for (int k = 0; k<N; k++) {
            for (int i = 0; i<rows; i++) {
                // Set initial value of the row summation
                matrix_c[i][k] = 0.0;
                // Matrix A's element(i, j) will be multiplied with Matrix B's element(j, k)
                for (int j = 0; j<N; j++)
                    matrix_c[i][k] = matrix_c[i][k] + matrix_a[i][j] * matrix_b[j][k];
            }
        }
        // Offset of the calculated rows in matrix C will be sent to root process
        MPI_Send(&offset, 1, MPI_INT, 0, 2, MPI_COMM_WORLD);
        // Number of rows the process calculated will be sent to root process
        MPI_Send(&rows, 1, MPI_INT, 0, 2, MPI_COMM_WORLD);
        // Resulting matrix with calculated rows will be sent to root process
        MPI_Send(&matrix_c, rows*N, MPI_DOUBLE, 0, 2, MPI_COMM_WORLD);
    }
    MPI_Finalize();
}
Look at non-blocking communication this way: instead of spelling out "now I send this, now you receive that", you decide at each stage of the computation what all the messages are that will be communicated there. Then you post an MPI_Isend for every send and an MPI_Irecv for every corresponding receive, and finally wait for all the resulting requests (for example with MPI_Waitall).
One problem is that each of these Isend/Irecv operations needs its own buffer, which must not be reused until its request has completed, so you may need to allocate some more memory.

How to sum a 2D array in C using MPI

This is the program I am using to sum all values in a 1D array, and it works correctly. But how do I modify it to work on a 2D array? Imagine variable a is something like a = { {1,2}, {3,4}, {5,6} };.
I tried few solutions but they are not working, so can someone explain few important changes to make to make it compatible with 2D array also.
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
// size of array
#define n 10
int a[] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
// Temporary array for slave process
int a2[1000];
int main(int argc, char* argv[])
{
    int pid, np,
        elements_per_process,
        n_elements_recieved;
    // np -> no. of processes
    // pid -> process id
    MPI_Status status;
    // Creation of parallel processes
    MPI_Init(&argc, &argv);
    // find out process ID,
    // and how many processes were started
    MPI_Comm_rank(MPI_COMM_WORLD, &pid);
    MPI_Comm_size(MPI_COMM_WORLD, &np);
    // master process
    if (pid == 0) {
        int index, i;
        elements_per_process = n / np;
        // check if more than 1 processes are run
        if (np > 1) {
            // distributes the portion of array
            // to child processes to calculate
            // their partial sums
            for (i = 1; i < np - 1; i++) {
                index = i * elements_per_process;
                MPI_Send(&elements_per_process, 1, MPI_INT, i, 0, MPI_COMM_WORLD);
                MPI_Send(&a[index], elements_per_process, MPI_INT, i, 0, MPI_COMM_WORLD);
            }
            // last process adds remaining elements
            index = i * elements_per_process;
            int elements_left = n - index;
            MPI_Send(&elements_left, 1, MPI_INT, i, 0, MPI_COMM_WORLD);
            MPI_Send(&a[index], elements_left, MPI_INT, i, 0, MPI_COMM_WORLD);
        }
        // master process add its own sub array
        int sum = 0;
        for (i = 0; i < elements_per_process; i++)
            sum += a[i];
        // collects partial sums from other processes
        int tmp;
        for (i = 1; i < np; i++) {
            MPI_Recv(&tmp, 1, MPI_INT, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &status);
            int sender = status.MPI_SOURCE;
            sum += tmp;
        }
        // prints the final sum of array
        printf("Sum of array is : %d\n", sum);
    }
    // slave processes
    else {
        MPI_Recv(&n_elements_recieved, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        // stores the received array segment
        // in local array a2
        MPI_Recv(&a2, n_elements_recieved, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        // calculates its partial sum
        int partial_sum = 0;
        for (int i = 0; i < n_elements_recieved; i++)
            partial_sum += a2[i];
        // sends the partial sum to the root process
        MPI_Send(&partial_sum, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }
    // cleans up all MPI state before exit of process
    MPI_Finalize();
    return 0;
}
You can simplify a lot by using MPI_Reduce instead of MPI_Send/MPI_Recv:
Reduces values on all processes to a single value
A nice tutorial about that routine can be found here.
So each process contains an array (e.g., process 0 { 1, 2, 3, 4, 5} and process 1 {6, 7, 8, 9, 10 }) and performs the partial sum of that array. In the end, each process uses MPI_Reduce to sum all the partial sums into a single value available to the master process (it could have been another process as well). Have a look at this example:
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char* argv[]){
    int np, pid;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &pid);
    MPI_Comm_size(MPI_COMM_WORLD, &np);
    int partial_sum = 0;
    if (pid == 0) {
        int a[] = { 1, 2, 3, 4, 5};
        for(int i = 0; i < 5; i++)
            partial_sum += a[i];
    }
    else if (pid == 1){
        int a[] = {6, 7, 8, 9, 10};
        for(int i = 0; i < 5; i++)
            partial_sum += a[i];
    }
    int sum;
    MPI_Reduce(&partial_sum, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    if (pid == 0){
        printf("Sum of array is : %d\n", sum);
    }
    MPI_Finalize();
    return 0;
}
This code only works with 2 processes (and it is kind of silly), but I am using it to showcase the use of MPI_Reduce.
I tried few solutions but they are not working, so can someone explain
few important changes to make to make it compatible with 2D array
If you adapt your code to use MPI_Reduce as I have shown, then it does not matter whether it is a 1D or 2D array, because each process first reduces its part to a single partial sum and only then performs the reduction.
Alternatively, you can assign each row to a process and perform an element-wise reduction of the entire row, so that the master process receives one array of column sums and then adds up its entries.
An example:
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char* argv[]){
    int np, pid;
    MPI_Status status;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &pid);
    MPI_Comm_size(MPI_COMM_WORLD, &np);
    int partial_sum = 0;
    int size = 5;
    int a[5] = {1, 2, 3 , 4, 5};
    int sum[5] = {0};
    MPI_Reduce(&a, &sum, size, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    if (pid == 0){
        int total_sum = 0;
        for(int i = 0; i < size; i++)
            total_sum += sum[i];
        printf("Sum of array is : %d\n", total_sum);
    }
    MPI_Finalize();
    return 0;
}
Output (for two processes):
Sum of array is : 30

Implementing MPI_Reduce with MPI_Send and MPI_Recv leads to wrong results

I am working on a program that uses MPI_Send() and MPI_Recv() to replace MPI_Reduce().
I get everything to run except the final bits of the code where it gives the PI approximate, error and runtime. I also don't get the correct sum value after receiving.
I believe there is something going wrong on the MPI_Recv() end but I could be wrong. I am only using 2 processors while running this. The program works fine without PI initialized to a value when using MPI_Reduce.
#include "mpi.h"
#include <stdio.h>
#include <math.h>
int main( int argc, char *argv[])
{
    int n, i;
    double PI25DT = 3.141592653589793238462643;
    double pi, h, sum, x;
    int size, rank;
    double startTime, endTime;
    /* Initialize MPI and get number of processes and my number or rank*/
    /* Processor zero sets the number of intervals and starts its clock*/
    if (rank==0)
    {
        for (int i = 0; i < size; i++) {
            if (i != rank) {
                MPI_Send(&n, 1, MPI_INT, i, 0, MPI_COMM_WORLD);
            }
        }
    }
    /* Broadcast number of intervals to all processes */
    /* Calculate the width of intervals */
    h = 1.0 / (double) n;
    /* Initialize sum */
    sum = 0.0;
    /* Step over each inteval I own */
    for (i = rank+1; i <= n; i += size)
    {
        /* Calculate midpoint of interval */
        x = h * ((double)i - 0.5);
        /* Add rectangle's area = height*width = f(x)*h */
        sum += (4.0/(1.0+x*x))*h;
    }
    /* Get sum total on processor zero */
    MPI_Send(&sum, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
    MPI_Send(&pi, 1, MPI_SUM, 0, 0, MPI_COMM_WORLD);
    if (rank == 0)
    {
        double total_sum = 0;
        for (int i = 0; i < size; i++)
        {
            total_sum += sum;
        }
        printf("Total Sum is %lf\n", total_sum);
    }
    /* Print approximate value of pi and runtime*/
    if (rank==0)
    {
        printf("pi is approximately %.16f, Error is %e\n",
               pi, fabs(pi - PI25DT));
        printf("runtime is=%.16f",endTime-startTime);
    }
    return 0;
}
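The call
MPI_Send(&pi, 1, MPI_SUM, 0, 0, MPI_COMM_WORLD);
is wrong: MPI_Send and MPI_Recv expect an MPI_Datatype as their third parameter, not an MPI_Op (i.e., MPI_SUM).
But looking at your code, what you actually want to do is replace those calls with:
double value = 0;
double pi = sum;
if (myid != 0) {
    MPI_Send(&sum, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
}
else {
    for (int i = 1; i < numprocs; i++) {
        MPI_Recv(&value, 1, MPI_DOUBLE, i, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        pi += value;
    }
}
This replaces the behavior of MPI_Reduce: every non-root process sends its partial sum to process 0, which receives them one by one and adds them to its own.
A running example: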
#include "mpi.h"
#include <stdio.h>
#include <math.h>
int main( int argc, char *argv[])
{
    int n, i;
    double PI25DT = 3.141592653589793238462643;
    double h, sum, x;
    int numprocs, myid;
    double startTime, endTime;
    /* Initialize MPI and get number of processes and my number or rank*/
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    /* Processor zero sets the number of intervals and starts its clock*/
    if (myid==0) {
        /* set n (the number of intervals) here */
        startTime = MPI_Wtime();
        for (int i = 0; i < numprocs; i++) {
            if (i != myid) {
                MPI_Send(&n, 1, MPI_INT, i, 0, MPI_COMM_WORLD);
            }
        }
    }
    else {
        MPI_Recv(&n, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }
    /* Calculate the width of intervals */
    h = 1.0 / (double) n;
    /* Initialize sum */
    sum = 0.0;
    /* Step over each inteval I own */
    for (i = myid+1; i <= n; i += numprocs) {
        /* Calculate midpoint of interval */
        x = h * ((double)i - 0.5);
        /* Add rectangle's area = height*width = f(x)*h */
        sum += (4.0/(1.0+x*x))*h;
    }
    /* Get sum total on processor zero */
    double value = 0;
    double pi = sum;
    if (myid != 0) {
        MPI_Send(&sum, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
    }
    else {
        for (int i = 1; i < numprocs; i++) {
            MPI_Recv(&value, 1, MPI_DOUBLE, i, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            pi += value;
        }
    }
    /* Print approximate value of pi and runtime*/
    if (myid==0) {
        printf("pi is approximately %.16f, Error is %e\n",
               pi, fabs(pi - PI25DT));
        endTime = MPI_Wtime();
        printf("runtime is=%.16f",endTime-startTime);
    }
    MPI_Finalize();
    return 0;
}
the output (2 processes):
pi is approximately 3.1415926535898993, Error is 1.061373e-13
runtime is=1.3594319820404053

MPI Search In Array

I'm trying to find a specific value inside an array, using a parallel search with MPI. When my code finds the value, it fails with this error:
Assertion failed in file src/mpid/ch3/src/ch3u_buffer.c at line 77: FALSE
memcpy argument memory ranges overlap, dst_=0x7ffece7eb590 src_=0x7ffece7eb590 len_=4
const char *FILENAME = "input.txt";
const size_t ARRAY_SIZE = 640;
int main(int argc, char **argv)
{
    int *array = malloc(sizeof(int) * ARRAY_SIZE);
    int rank,size;
    MPI_Status status;
    MPI_Request request;
    int done,myfound,inrange,nvalues;
    int i,j,dummy;
    /* Let the system do what it needs to start up MPI */
    if (rank == 0)
    {
        array = readFile(FILENAME);
    }
    MPI_Irecv(&dummy, 1, MPI_INT, MPI_ANY_SOURCE, 1, MPI_COMM_WORLD, &request);
    MPI_Test(&request, &done, &status);
    inrange = (i <= ((rank + 1) * nvalues - 1) && i >= rank * nvalues); //LIMIT OF THE OFFSET
    while (!done && inrange)
    {
        if (array[i] == 17)
        {
            dummy = 1;
            for (j = 0; j < size; j++)
            {
                MPI_Send(&dummy, 1, MPI_INT, j, 1, MPI_COMM_WORLD);
            }
            printf("P:%d found it at global index %d\n", rank, i);
            myfound = 1;
        }
        printf("P:%d - %d - %d\n", rank, i, array[i]);
        MPI_Test(&request, &done, &status);
        inrange = (i <= ((rank + 1) * nvalues - 1) && i >= rank * nvalues);
    }
    if (!myfound)
    {
        printf("P:%d stopped at global index %d\n", rank, i - 1);
    }
}
The error is somewhere in here, because when I put a value that is not in the array (for example -5) into the if condition, the program runs smoothly.
dummy = 1;
for (j = 0; j < size; j++)
    MPI_Send(&dummy, 1, MPI_INT, j, 1, MPI_COMM_WORLD);
printf("P:%d found it at global index %d\n", rank, i);
myfound = 1;
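Your program is invalid with respect to the MPI standard because you use the same buffer (&dummy) for both MPI_Irecv() and MPI_Send(). The loop also sends to the process itself (j == rank), which is consistent with the assertion above reporting identical source and destination addresses for the memcpy.
You can either use two distinct buffers (e.g. dummy_send and dummy_recv) or, since you do not seem to care about the value of dummy, use NULL as the buffer and send/receive zero-size messages.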
