Time of process computation with MPI - c

I have the program which calculate integral by using MPI. This is the code (trapezoidal rule, by using external function included in trap1.h file):
#include <mpi.h>
#include <stdio.h>
#include <math.h>
#include "get_data.h"
#include "trap1.h"
extern double trap1(int n1, int n2, double a, double h);
extern void Get_data(int* n, int m, int d);
int main(int argc, char** argv)
{
int world_size, world_rank, i, N, N1, N2;
double Sum, GSum, A, B, h, time1, time2, x, ISum,time;
MPI_Status Status;
A=0.0;
B=1.0;
MPI_Init(NULL, NULL);
MPI_Comm_size(MPI_COMM_WORLD, &world_size);
MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
Get_data(&N, world_rank, world_size);
if(world_rank==0) time1=MPI_Wtime();
N1 = N*world_rank/world_size+1;
N2 = N*(world_rank+1)/world_size;
h=(B-A)/N;
Sum = trap1(N1,N2, A, h);
FILE * pFile;
if(world_rank==0) {
GSum = Sum;
for(i=1;i<world_size;i++)
{
MPI_Recv(&ISum,1, MPI_DOUBLE_PRECISION,MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &Status);
}
time2 = MPI_Wtime();
GSum=GSum/N;
time = time2-time1;
printf("Result = %et Time = %et Intervals: %dn",GSum,time,N);
FILE * pFile;
pFile = fopen ("time1_1.log","a");
fprintf (pFile, "number of procs is %it time= %ft cost = %it", world_size, time, world_size*time);
fclose(pFile);
}
else MPI_Send(&Sum, 1, MPI_DOUBLE_PRECISION, 0, 0, MPI_COMM_WORLD);
MPI_Finalize();
}
I need to get the calculation time for each process (i.e., I want to calculate the period of time during which process i, where 0 <= i < world_size, was running). Unfortunately, all of my attempts were unsuccessful. If I understand correctly, I need to get the value
MPI_Wtime() - time1
where time1 is the program initialization value, for each process. But I don't understand exactly in which part of my code this value gives the calculation time for processes.
I've inserted the code
FILE * pFile;
pFile = fopen("time0_1_1.log","a");
fprintf (pFile, "particular_time= %f\t", MPI_Wtime() - time1);
fclose(pFile);
into the program code after
MPI_Recv(&ISum,1, MPI_DOUBLE_PRECISION,MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &Status);
line. But I'm not sure that the time I've obtained is really the time of particular processes.
Could You help me?

Related

Problem with calculating the total time in MPI program using MPI_Wtime()

I am trying to find out the time taken by each processor and the total time taken to calculate the whole program, there seems to be some sort of error. Any suggestions and help would be much appreciated.I had used the same method for another code and it worked there, but can't seem to figure out the problem in this one.
The code I have written
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include "mpi.h"
int main(int argc, char** argv){
int my_rank;
double time1, time2, duration, global;
int size;
float a ;
float b ;
int n ;
float h;
float local_a;
float local_b;
int local_n;
float integral;
float total;
int source;
int dest = 0;
int tag = 0;
MPI_Status status;
float Trap(float local_a, float local_b, int local_n, float h);
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);
if (my_rank == 0){
printf("Enter a, b and n \n");
scanf("%f %f %d", &a, &b, &n);
for ( dest = 1 ; dest < size; dest++){
MPI_Send(&a, 1 , MPI_FLOAT, dest , tag=0, MPI_COMM_WORLD);
MPI_Send(&b, 1 , MPI_FLOAT, dest , tag=1, MPI_COMM_WORLD);
MPI_Send(&n, 1 , MPI_INT, dest , tag=2, MPI_COMM_WORLD);
}
}
else{
MPI_Recv(&a, 1, MPI_FLOAT, source, tag=0, MPI_COMM_WORLD, &status);
MPI_Recv(&b, 1, MPI_FLOAT, source, tag=1, MPI_COMM_WORLD, &status);
MPI_Recv(&n, 1, MPI_INT, source, tag=2, MPI_COMM_WORLD, &status);
}
MPI_Barrier(MPI_COMM_WORLD);
time1 = MPI_Wtime();
h = (b-a)/n;
local_n = n/size;
local_a = a + my_rank * local_n * h;
local_b = (local_a + local_n) * h;
integral = Trap(local_a, local_b, local_n, h);
if (my_rank == 0){
total = integral;
for (source = 1; source < size; source++){
MPI_Recv(&integral, 1, MPI_FLOAT, source, tag, MPI_COMM_WORLD, &status);
total += integral;
}
}
else {
MPI_Send(&integral, 1, MPI_FLOAT, dest, tag, MPI_COMM_WORLD);
}
time2 = MPI_Wtime();
duration = time2 - time1;
MPI_Reduce(&duration, &global,1,MPI_DOUBLE,MPI_SUM,0,MPI_COMM_WORLD);
if (my_rank == 0){
printf("With n = %d trapezoids, our estimate \n", n);
printf("of the integral from %f to %f = %0.8f\n",a,b,total);
printf("Global runtime is %f\n",global);
}
printf("Runtime at %d is %f \n", my_rank,duration);
MPI_Finalize();
}
float Trap(float local_a, float local_b, int local_n, float h){
float integral;
float x;
int i;
float f(float x);
integral = (f(local_a) + f(local_b))/2.0;
x = local_a;
for (int i = 1; i <= local_n-1; i++){
x += h;
integral += f(x);
}
integral *= h;
}
float f(float x){
return x*x;
}
The error that it shows
[Sid-Laptop:4987] *** An error occurred in MPI_Recv
[Sid-Laptop:4987] *** reported by process [852688897,2]
[Sid-Laptop:4987] *** on communicator MPI_COMM_WORLD
[Sid-Laptop:4987] *** MPI_ERR_RANK: invalid rank
[Sid-Laptop:4987] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[Sid-Laptop:4987] *** and potentially your MPI job)
Enter a, b and n
[Sid-Laptop:04980] 2 more processes have sent help message help-mpi-errors.txt / mpi_errors_are_fatal
[Sid-Laptop:04980] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
I cannot reproduce your behaviour where removing the Wtime call "fixes" the program, but I suspect what is happening is this:
Your variable "source" is not set. Unset variables have some garbage-value, but they can often be zero. See this question What does uninitialised memory contain?
If your uninitialized source is 0, than it actually has the correct value for the first set of recv-calls. If it is not zero, there is probably no rank with that number, and the call fails.
Answering why the Wtime-call may or may not make it so that on your specific system (compiler+os+hardware+libraries etc) the uninitialized value happens to be zero is hard and also a bit useless. A C-Program that reads an uninitialized variable has so-called "Undefined Behaviour" and can do anything. It is important to understand the concept of undefined behaviour when programming in C. The c-faq describes it like this:
undefined: Anything at all can happen; the Standard imposes no requirements. The program may fail to compile, or it may execute incorrectly (either crashing or silently generating incorrect results), or it may fortuitously do exactly what the programmer intended.
(https://c-faq.com/ansi/undef.html)
This makes C really quite different from most programming languages in terms of debugging and it is the reason commenters advice you to enable compiler-warnings and fix them.

Not quite understanding MPI

I am attempting to make a program using MPI that will find the value of PI using MPI.
Currently I can find the sum this way:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#define NUMSTEPS 1000000
int main() {
int i;
double x, pi, sum = 0.0;
struct timespec start, end;
clock_gettime(CLOCK_MONOTONIC, &start);
double step = 1.0/(double) NUMSTEPS;
x = 0.5 * step;
for (i=0;i<= NUMSTEPS; i++){
x+=step;
sum += 4.0/(1.0+x*x);
}
pi = step * sum;
clock_gettime(CLOCK_MONOTONIC, &end);
u_int64_t diff = 1000000000L * (end.tv_sec - start.tv_sec) + end.tv_nsec - start.tv_nsec;
printf("PI is %.20f\n",pi);
printf("elapsed time = %llu nanoseconds\n", (long long unsigned int) diff);
return 0;
}
But this does not use MPI.
So I have tried to make my own in MPI. My logic is:
Split the 1000000 into equal parts based on how many processors I have
Calculate the values for each range
Send the calculated value back to the master and then divide by the number of processors. I would like to keep the main thread free and not do any work. Similar to a master-slave system.
Here's what I have currently. This doesn't seem to be working and the send/receive gives errors about incompatible variables for receive and send.
#include <mpi.h>
#include <stdio.h>
#include <string.h>
#define NUMSTEPS 1000000
int main(int argc, char** argv) {
int comm_sz; //number of processes
int my_rank; //my process rank
// Initialize the MPI environment
MPI_Init(NULL, NULL);
MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);
MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
// Get the name of the processor
char processor_name[MPI_MAX_PROCESSOR_NAME];
int name_len;
MPI_Get_processor_name(processor_name, &name_len);
// Slaves
if (my_rank != 0) {
// Process math then send
int i;
double x, pi, sum = 0.0;
double step = 1.0/(double) NUMSTEPS;
x = 0.5 * step;
// Find the start and end for the number
int processors = comm_sz - 1;
int thread_multi = NUMSTEPS / processors;
int start = my_rank * thread_multi;
if((my_rank - 1) != 0){
start += 1;
}
int end = start + thread_multi ;
for (i=start; i <= end; i++){
x+=step;
sum += 4.0 / (1.0 + x * x);
}
pi = step * sum;
MPI_Send(pi, 1.0, MPI_DOUBLE 1, 0, MPI_COMM_WORLD);
// Master
} else {
// Things in here only get called once.
double pi = 0.0;
double total = 0.0;
for (int q = 1; q < comm_sz; q++) {
MPI_Recv(pi, 1, MPI_DOUBLE, q, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
total += pi;
pi = 0.0;
}
// Take the added totals and divide by amount of processors that processed, to get the average
double finished = total / (comm_sz - 1);
// Print sum here
printf("Pi Is: %d", finished);
}
// Finalize the MPI environment.
MPI_Finalize();
}
I've currently spent around 3 hours working on this. Never used MPI. Any help would be greatly appreciated.
Try compiling with more compiler warnings and try to fix them, for instance -Wall -Wextra should give you excellent clues about what the issues are.
According to MPI_Send documentation the first argument is a pointer, so you seem to be ignoring an automatic "conversion to pointer" error. You have the same issue in the MPI_Recv() call.
You can try to pass pi as &pi in MPI_Recv and MPI_Send and check if that fixes the error.
As a comment, you can declare dummy variables as pi as a local variables inside the master loop to avoid side-effects:
for (int q = 1; q < comm_sz; q++) {
double pi = 0;
MPI_Recv(&pi, 1, MPI_DOUBLE, q, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
total += pi;
}

Storing values starting from a particular location in MPI_Recv

I am testing an example, where I am trying to send an array of 4 elements from process 0 to process 1 and I am doing so using MPI_Type_contiguous
This is the code for the same
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"
int main( int argc, char *argv[] )
{
MPI_Init(&argc, &argv);
int myrank, size; //size will take care of number of processes
MPI_Comm_rank(MPI_COMM_WORLD, &myrank) ;
MPI_Comm_size(MPI_COMM_WORLD, &size);
//declaring the matrix
double mat[4]={1, 2, 3, 4};
int r=4;
double snd_buf[r];
double recv_buf[r];
double buf[r];
int position=0;
MPI_Status status[r];
MPI_Datatype type;
MPI_Type_contiguous( r, MPI_DOUBLE, &type );
MPI_Type_commit(&type);
//sending the data
if(myrank==0)
{
MPI_Send (&mat[0], r , type, 1 /*dest*/ , 100 /*tag*/ , MPI_COMM_WORLD);
}
//receiving the data
if(myrank==1)
{
MPI_Recv(recv_buf, r, type, 0 /*src*/ , 100 /*tag*/, MPI_COMM_WORLD,&status[0]);
}
//printing
if(myrank==1)
{
for(int i=0;i<r;i++)
{
printf("%lf ",recv_buf[i]);
}
printf("\n");
}
MPI_Finalize();
return 0;
}
As one can see the recv_buf size is same as the size of the array. And the output that is being printed is 1 2 3 4
Now what I trying to do is that, say the recv_buf size is 10 and I want to store the elements starting from location 6 to 9. For which I have written this code, but to my surprise it is giving no output
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"
int main( int argc, char *argv[] )
{
MPI_Init(&argc, &argv);
int myrank, size; //size will take care of number of processes
MPI_Comm_rank(MPI_COMM_WORLD, &myrank) ;
MPI_Comm_size(MPI_COMM_WORLD, &size);
//declaring the matrix
double mat[4]={1, 2, 3, 4};
int r=4;
double snd_buf[r];
double recv_buf[10]; //declared it of size 10
double buf[r];
int position=0;
MPI_Status status[r];
MPI_Datatype type;
MPI_Type_contiguous( r, MPI_DOUBLE, &type );
MPI_Type_commit(&type);
//packing and sending the data
if(myrank==0)
{
MPI_Send (&mat[0], r , type, 1 /*dest*/ , 100 /*tag*/ , MPI_COMM_WORLD);
}
//receiving the data
if(myrank==1)
{
MPI_Recv(&recv_buf[6], r, type, 0 /*src*/ , 100 /*tag*/, MPI_COMM_WORLD,&status[0]);
}
//printing
if(myrank==1)
{
for(int i=6;i<10;i++)
{
printf("%lf ",recv_buf[i]);
}
printf("\n");
}
MPI_Finalize();
return 0;
}
Where am I going wrong?
From this SO Thread one can read:
MPI_Type_contiguous is for making a new datatype which is count copies
of the existing one. This is useful to simplify the processes of
sending a number of datatypes together as you don't need to keep track
of their combined size (count in MPI_send can be replaced by 1).
That being said, in your MPI_Send call:
MPI_Send (&mat[0], r , type, 1 /*dest*/ , 100 /*tag*/ , MPI_COMM_WORLD);
you should not send an array with 'r' elements of type 'type', but rather send 1 element of type 'type' (which is equal to 4 doubles). One of the goals of the MPI_Type_contiguous is to abstract away the count to 1 instead of keeping track of the number of elements.
The same applies to your recv call:
MPI_Recv(&recv_buf[6], 1, type, 0 /*src*/ , 100 /*tag*/, MPI_COMM_WORLD,&status[0]);
Finally, you should also free the custom type accordingly:
MPI_Type_free(&type);
The entire code:
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"
int main( int argc, char *argv[] )
{
MPI_Init(&argc, &argv);
int myrank, size;
MPI_Comm_rank(MPI_COMM_WORLD, &myrank) ;
MPI_Comm_size(MPI_COMM_WORLD, &size);
double mat[4]={1, 2, 3, 4};
int r=4;
double snd_buf[r];
double recv_buf[10];
MPI_Status status;
MPI_Datatype type;
MPI_Type_contiguous( r, MPI_DOUBLE, &type );
MPI_Type_commit(&type);
if(myrank==0)
MPI_Send (&mat[0], 1 , type, 1, 100, MPI_COMM_WORLD);
else if(myrank==1)
{
MPI_Recv(&recv_buf[6], 1, type, 0, 100, MPI_COMM_WORLD, &status);
for(int i=6;i<10;i++)
printf("%lf ",recv_buf[i]);
printf("\n");
}
MPI_Type_free(&type);
MPI_Finalize();
return 0;
}
The Output:
1.000000 2.000000 3.000000 4.000000

Open MPI Waitall() Segmentation Fault

I'm new with MPI and I'm trying to develop a non-blocking programm (with Isend and Irecv). The functionality is very basic (it's educational):
There is one process (rank 0) who is the master and receives messages from the slaves (rank 1-P). The master only receives results.
The slaves generates an array of N random numbers between 0 and R and then they do some operations with those numbers (again, it's just for educational purpose, the operations don't make any sense)
This whole process (operations + send data) is done M times (this is just for comparing different implementations; blocking and non-blocking)
I get a Segmentation Fault in the Master process when I'm calling the MPI_waitall() funcion
#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"
#include <math.h>
#include <time.h>
#define M 1000 //Number of times
#define N 2000 //Quantity of random numbers
#define R 1000 //Max value of random numbers
double SumaDeRaices (double*);
int main(int argc, char* argv[]) {
int yo; /* rank of process */
int p; /* number of processes */
int dest; /* rank of receiver */
/* Start up MPI */
MPI_Init(&argc, &argv);
/* Find out process rank */
MPI_Comm_rank(MPI_COMM_WORLD, &yo);
/* Find out number of processes */
MPI_Comm_size(MPI_COMM_WORLD, &p);
MPI_Request reqs[p-1];
MPI_Status stats[p-1];
if (yo == 0) {
int i,j;
double result;
clock_t inicio,fin;
inicio = clock();
for(i = 0; i<M; i++){ //M times
for(j = 1; j<p; j++){ //for every slave
MPI_Irecv(&result, sizeof(double), MPI_DOUBLE, j, i, MPI_COMM_WORLD, &reqs[j-1]);
}
MPI_Waitall(p-1,reqs,stats); //wait all slaves (SEG_FAULT)
}
fin = clock()-inicio;
printf("Tiempo total de ejecucion %f segundos \n", ((double)fin)/CLOCKS_PER_SEC);
}
else {
double* numAleatorios = (double*) malloc( sizeof(double) * ((double) N) ); //array with numbers
int i,j;
double resultado;
dest=0;
for(i=0; i<M; i++){ //again, M times
for(j=0; j<N; j++){
numAleatorios[j] = rand() % R ;
}
resultado = SumaDeRaices(numAleatorios);
MPI_Isend(&resultado,sizeof(double), MPI_DOUBLE, dest, i, MPI_COMM_WORLD,&reqs[p-1]); //send result to master
}
}
/* Shut down MPI */
MPI_Finalize();
exit(0);
} /* main */
double SumaDeRaices (double* valores){
int i;
double sumaTotal = 0.0;
//Raices cuadradas de los valores y suma de estos
for(i=0; i<N; i++){
sumaTotal = sqrt(valores[i]) + sumaTotal;
}
return sumaTotal;
}
There are several issues with your code. First and foremost in your Isend you pass &resultado several times without waiting until previous non-blocking operation finishes. You are not allowed to reuse the buffer you pass to Isend before you make sure the operation finishes.
Instead I recommend you using normal Send, because in contrast to synchronous send (SSend) normal blocking send returns as soon as you can reuse the buffer.
Second, there is no need to use message tags. I recommend you to just set tag to 0. In terms of performance it is simply faster.
Third, the result shouldn't be a simple variable, but an array of size at least (p-1)
Fourth, I do not recommend you to allocate arrays on stack, like MPI_Request and MPI_Status if the size is not a known small number. In this case the size of array is (p-1), so you better use malloc for this data structure.
Fifth, if you do not check status, use MPI_STATUSES_IGNORE.
Also instead of sizeof(double) you should specify number of items (1).
But of course the absolutely best version is just to use MPI_Gather.
Moreover, generally there is no reason not to run computations on the root node.
Here is slightly rewritten example:
#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"
#include <math.h>
#include <time.h>
#define M 1000 //Number of times
#define N 2000 //Quantity of random numbers
#define R 1000 //Max value of random numbers
double SumaDeRaices (double* valores)
{
int i;
double sumaTotal = 0.0;
//Raices cuadradas de los valores y suma de estos
for(i=0; i<N; i++) {
sumaTotal = sqrt(valores[i]) + sumaTotal;
}
return sumaTotal;
}
int main(int argc, char* argv[]) {
int yo; /* rank of process */
int p; /* number of processes */
/* Start up MPI */
MPI_Init(&argc, &argv);
/* Find out process rank */
MPI_Comm_rank(MPI_COMM_WORLD, &yo);
/* Find out number of processes */
MPI_Comm_size(MPI_COMM_WORLD, &p);
double *result;
clock_t inicio, fin;
double *numAleatorios;
if (yo == 0) {
inicio = clock();
}
numAleatorios = (double*) malloc( sizeof(double) * ((double) N) ); //array with numbers
result = (double *) malloc(sizeof(double) * p);
for(int i = 0; i<M; i++){ //M times
for(int j=0; j<N; j++) {
numAleatorios[j] = rand() % R ;
}
double local_result = SumaDeRaices(numAleatorios);
MPI_Gather(&local_result, 1, MPI_DOUBLE, result, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD); //send result to master
}
if (yo == 0) {
fin = clock()-inicio;
printf("Tiempo total de ejecucion %f segundos \n", ((double)fin)/CLOCKS_PER_SEC);
}
free(numAleatorios);
/* Shut down MPI */
MPI_Finalize();
} /* main */

Invalid node count in MPI pi calculation

I am trying to parallelize the following code for calculation of pi.
My approach is to use scatter to parallelize the for, and then use a reduce to calculate the sum value and finally show pi.
My code is the following
#include <stdio.h>
#include <mpi.h>
long num_steps = 100000;
double step = 1.0/100000.0;
int main() {
int i, myid, size;
double x, pi, local_sum = 0.0, sum=0.0;
double send_vec[num_steps], recv_vect[num_steps];
// Initialize the MPI environment
MPI_Init(NULL, NULL);
MPI_Comm_size(MPI_COMM_WORLD, &size);
MPI_Comm_rank(MPI_COMM_WORLD,&myid);
if (myid ==0){
int i=0;
for (i=0; i<num_steps;i++){
send_vec[i]=i;
}
}
MPI_Scatter(send_vec, num_steps/size, MPI_INT, recv_vect,
num_steps, MPI_INT, 0, MPI_COMM_WORLD);
for(i = 0; i < num_steps; ++i) {
x = (recv_vect[i]-0.5)*step;
local_sum += 4.0/(1.0+x*x);
}
MPI_Reduce(&local_sum, &sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
if (myid == 0){
pi = step*sum;
printf("PI value = %f\n", pi);
}
// Finalize the MPI environment.
MPI_Finalize();
}
The thing is when I run the program with the option -np 1 and 2
I do get the desired result.
Yet when I run with 3, 4 and higher I get the following error:
PIC_Send(284).........: Negative count, value is -240000
Fatal error in PMPI_Scatter: Invalid count, error stack
The call to MPI_Scatter() is to be corrected:
MPI_Scatter(send_vec, num_steps/size, MPI_INT, recv_vect,
num_steps, MPI_INT, 0, MPI_COMM_WORLD);
To send double, use the datatype MPI_DOUBLE as you did in the MPI_Reduce()
Since the sendtype is similar to the recvtype, the number of item sent to each process sendcount must be equal to the number of item received by each process recvcount. In the present case, it's num_steps/size.
Finally, the call to MPI_Scatter() will look like:
MPI_Scatter(send_vec, num_steps/size, MPI_DOUBLE, recv_vect,
num_steps/size, MPI_DOUBLE, 0, MPI_COMM_WORLD);
Lastly, dynamic memory allocation can be used to avoid using the stack for storing large arrays. Moreover, the allocated space can be decreased so as to reduce the memory footprint:
num_steps=(num_steps/size)*size;
double* send_vec=NULL;
double* recv_vec=NULL;
if(rank==0){
send_vec=malloc((num_steps/size)*sizeof(double));
if(send_vec==NULL){fprintf(stderr,"malloc failed\n");exit(1);}
}
recv_vec=malloc(num_steps*sizeof(double));
if(recv_vec==NULL){fprintf(stderr,"malloc failed\n");exit(1);}
...
if(rank==0){
free(send_vec);
}
free(recv_vec);

Resources