I'm very new to MPI and I was asked to write a parallel C implementation of Gauss elimination (without pivoting).
I gave it a try using a row-wise decomposition, but my code doesn't work, and I'm hoping someone can give me some pointers. I've been looking for the bug for a few days without success :(
Thank you in advance!
#include<stdio.h>
#include <stdlib.h>
#include <time.h>
#include <mpi.h>
int main(int argc, char **argv)
{
MPI_Init(&argc, &argv);
int i,j,k;
int map[500];
float A[500][500],b[500],c[500],x[500],sum=0.0;
double range=1.0;
int n=3;
int rank, nprocs;
clock_t begin1, end1, begin2, end2;
MPI_Status status;
MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* get current process id */
MPI_Comm_size(MPI_COMM_WORLD, &nprocs); /* get number of processes */
//////////////////////////////////////////////////////////////////////////////////
if (rank==0)
{
for (i=0; i<n; i++)
{
for (j=0; j<n; j++)
A[i][j]=range*(1.0-2.0*(double)rand()/RAND_MAX);
b[i]=range*(1.0-2.0*(double)rand()/RAND_MAX);
}
printf("\n Matrix A (generated randomly):\n");
for (i=0; i<n; i++)
{
for (j=0; j<n; j++)
printf("%9.6lf ",A[i][j]);
printf("\n");
}
printf("\n Vector b (generated randomly):\n");
for (i=0; i<n; i++)
printf("%9.6lf ",b[i]);
printf("\n\n");
}
//////////////////////////////////////////////////////////////////////////////////
begin1 =clock();
MPI_Bcast (A,n*n,MPI_DOUBLE,0,MPI_COMM_WORLD);
MPI_Bcast (b,n,MPI_DOUBLE,0,MPI_COMM_WORLD);
for(i=0; i<n; i++)
{
map[i]= i % nprocs;
}
for(k=0;k<n;k++)
{
MPI_Bcast (&A[k][k],n-k,MPI_DOUBLE,map[k],MPI_COMM_WORLD);
MPI_Bcast (&b[k],1,MPI_DOUBLE,map[k],MPI_COMM_WORLD);
for(i= k+1; i<n; i++)
{
if(map[i] == rank)
{
c[i]=A[i][k]/A[k][k];
}
}
for(i= k+1; i<n; i++)
{
if(map[i] == rank)
{
for(j=0;j<n;j++)
{
A[i][j]=A[i][j]-( c[i]*A[k][j] );
}
b[i]=b[i]-( c[i]*b[k] );
}
}
}
end1 = clock();
//////////////////////////////////////////////////////////////////////////////////
begin2 =clock();
if (rank==0)
{
x[n-1]=b[n-1]/A[n-1][n-1];
for(i=n-2;i>=0;i--)
{
sum=0;
for(j=i+1;j<n;j++)
{
sum=sum+A[i][j]*x[j];
}
x[i]=(b[i]-sum)/A[i][i];
}
end2 = clock();
}
//////////////////////////////////////////////////////////////////////////////////
if (rank==0)
{
printf("\nThe solution is:");
for(i=0;i<n;i++)
{
printf("\nx%d=%f\t",i,x[i]);
}
printf("\n\nLU decomposition time: %f", (double)(end1 - begin1) / CLOCKS_PER_SEC);
printf("\nBack substitution time: %f\n", (double)(end2 - begin2) / CLOCKS_PER_SEC);
}
return(0);
MPI_Finalize();
}
And this is the error I'm getting:
mpirun has exited due to process rank 1 with PID XXXX on node XXXX exiting without calling "finalize". This may have caused other processes in the application to be terminated by signals sent by mpirun (as reported here).
As noticed by High Performance Mark, add MPI_Finalize() before return(0). With that change the code runs without complaining, but the result is still incorrect: in parallel it prints nan as the solution, which is wrong.
The problem comes from MPI_Bcast(A, n*n, MPI_DOUBLE, ...) while A is defined as float A[500][500].
Broadcast the address of the first element, &A[0][0]; since A is a true 2D array this is the same address, but it makes explicit that one contiguous block is being sent.
The count is also wrong: if you send n*n elements (n=3), you send only the first 9 contiguous values, A[0][0],...,A[0][8], so elements such as A[1][1] are left uninitialized on the other ranks. This is what produces wrong results such as nan. For the sake of simplicity (laziness...), you may simply change the count to 500*500.
Finally, MPI_DOUBLE corresponds to double precision, not float. The solution is either to change the declaration to double A[500][500] or to call MPI_Bcast(&A[0][0], 500*500, MPI_FLOAT, ...). Do the same thing for b.
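Concretely, once A and b are declared as double (as in the edited code below), the two broadcasts become:
MPI_Bcast(&A[0][0], 500*500, MPI_DOUBLE, 0, MPI_COMM_WORLD);
MPI_Bcast(b, n, MPI_DOUBLE, 0, MPI_COMM_WORLD);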
This deterministic use of rand() is actually useful for debugging purposes... but do not forget to call srand() to seed your random generator!
EDIT: here is the corrected code:
#include<stdio.h>
#include <stdlib.h>
#include <time.h>
#include <mpi.h>
int main(int argc, char **argv)
{
MPI_Init(&argc, &argv);
int i,j,k;
int map[500];
double A[500][500],b[500],c[500],x[500],sum=0.0;
double range=1.0;
int n=3;
int rank, nprocs;
clock_t begin1, end1, begin2, end2;
MPI_Status status;
MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* get current process id */
MPI_Comm_size(MPI_COMM_WORLD, &nprocs); /* get number of processes */
//////////////////////////////////////////////////////////////////////////////////
if (rank==0)
{
for (i=0; i<n; i++)
{
for (j=0; j<n; j++)
A[i][j]=range*(1.0-2.0*(double)rand()/RAND_MAX);
b[i]=range*(1.0-2.0*(double)rand()/RAND_MAX);
}
printf("\n Matrix A (generated randomly):\n");
for (i=0; i<n; i++)
{
for (j=0; j<n; j++)
printf("%9.6lf ",A[i][j]);
printf("\n");
}
printf("\n Vector b (generated randomly):\n");
for (i=0; i<n; i++)
printf("%9.6lf ",b[i]);
printf("\n\n");
}
//////////////////////////////////////////////////////////////////////////////////
begin1 =clock();
MPI_Bcast (&A[0][0],500*500,MPI_DOUBLE,0,MPI_COMM_WORLD);
MPI_Bcast (b,n,MPI_DOUBLE,0,MPI_COMM_WORLD);
for(i=0; i<n; i++)
{
map[i]= i % nprocs;
}
for(k=0;k<n;k++)
{
MPI_Bcast (&A[k][k],n-k,MPI_DOUBLE,map[k],MPI_COMM_WORLD);
MPI_Bcast (&b[k],1,MPI_DOUBLE,map[k],MPI_COMM_WORLD);
for(i= k+1; i<n; i++)
{
if(map[i] == rank)
{
c[i]=A[i][k]/A[k][k];
}
}
for(i= k+1; i<n; i++)
{
if(map[i] == rank)
{
for(j=0;j<n;j++)
{
A[i][j]=A[i][j]-( c[i]*A[k][j] );
}
b[i]=b[i]-( c[i]*b[k] );
}
}
}
end1 = clock();
//////////////////////////////////////////////////////////////////////////////////
begin2 =clock();
if (rank==0)
{
x[n-1]=b[n-1]/A[n-1][n-1];
for(i=n-2;i>=0;i--)
{
sum=0;
for(j=i+1;j<n;j++)
{
sum=sum+A[i][j]*x[j];
}
x[i]=(b[i]-sum)/A[i][i];
}
end2 = clock();
}
//////////////////////////////////////////////////////////////////////////////////
if (rank==0)
{
printf("\nThe solution is:");
for(i=0;i<n;i++)
{
printf("\nx%d=%f\t",i,x[i]);
}
printf("\n\nLU decomposition time: %f", (double)(end1 - begin1) / CLOCKS_PER_SEC);
printf("\nBack substitution time: %f\n", (double)(end2 - begin2) / CLOCKS_PER_SEC);
}
MPI_Finalize();
return(0);
}
I'm not much of a C programmer but it looks to me as if you have probably called return prematurely. Specifically you have called it before MPI_Finalize(). Try swapping the order of the statements. Or even dropping the return altogether.
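In other words, the end of main() should read (as in the edited code above):
MPI_Finalize();   /* every rank must call this before exiting */
return(0);
}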
Related
Hi guys, I wanted to ask for help with collecting the array data; below is my code, and here is what it should do:
Implement a parallel algorithm in which each processor reads a vector of different size M, carries out the sum of its elements in local memory, and concatenates the results into a vector contained in the memory of each processor.
CODE
#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
int main (int argc, char *argv[])
{
int * a;
int i, n;
int nproc, menum;   /* number of processes and this process's rank */
MPI_Init (&argc, &argv);
MPI_Comm_size (MPI_COMM_WORLD, &nproc);
MPI_Comm_rank (MPI_COMM_WORLD,&menum);
printf("Insert the size of the array\n");
scanf ("%d", &n);
a = (int *) malloc(sizeof(int)*n);
printf("Size degli array inserito correttamente\n");
printf("Ora, inserire i %d elementi di a\n", n);
for(i = 0; i < n; i++)
scanf ("%d", &a[i]);
printf("Elementi di a inseriti correttamente\n");
float local_sum = 0;
for (i = 0; i < n; i++) {
local_sum += a[i];
}
MPI_Allgather(&localsum,1,MPI_INT,localsum,1,MPI_INT,MPI_COMMON_WORLD);
MPI_Finalize();
}
I've been working with a heat transfer code. Basically, this code establishes the initial conditions for a cube and all of its faces: the six faces start at different temperatures, and the code then calculates how the temperature changes in all of the faces due to the heat transfer between them. Now I've been trying to offload the computation to an NVIDIA GPU using OpenMP directives. The code initializes the face conditions using a triple pointer, which is essentially an array of arrays of arrays. Reading a bit about this, I've learned that such pointer-based 3D structures are not easily offloaded to GPUs. So my question is whether it is possible to offload these triple-pointer arrays to the GPU, or whether I have to use a flatter array layout.
Here is the code, which currently still runs on the CPU. It is the parallel version of the code.
#include <omp.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define N 25 //This defines the number of points per dimension (Cube = N*N*N)
#define NUM_STEPS 6000 //This is the number of simulations time steps
/*writeFile: this function writes simulation results into a file.
* A file is created for each iteration that's passed to the function
* as a parameter. It also takes the triple pointer to the simulation
* data*/
void writeFile(int iteration, double*** data){
char filename[50];
char itr[12];
sprintf(itr, "%d", iteration);
strcpy(filename, "heat_");
strcat(filename, itr);
strcat(filename, ".txt");
//printf("Filename is %s\n", filename);
FILE *fp;
fp = fopen(filename, "w");
fprintf(fp, "x,y,z,T\n");
for(int i=0; i<N; i++){
for(int j=0;j<N; j++){
for(int k=0; k<N; k++){
fprintf(fp,"%d,%d,%d,%f\n", i,j,k,data[i][j][k]);
}
}
}
fclose(fp);
}
void compute_heat_transfer(double ***arrayOld, double ***arrayNew){
int i,j,k;
/*Compute steady-state solution*/
for(int nsteps=0; nsteps < NUM_STEPS; nsteps++){
/*if(nsteps % 100 == 0){
writeFile(nsteps, arrayOld);
}*/
#pragma omp parallel shared(arrayNew, arrayOld) private(i,j,k)
{
#pragma omp for
for(i=1; i<N-1; i++){
for(j=1; j<N-1; j++){
for(k=1;k<N-1;k++){
//This is the 6-neighbor stencil computation
arrayNew[i][j][k] = (arrayOld[i-1][j][k] + arrayOld[i+1][j][k] + arrayOld[i][j-1][k] + arrayOld[i][j+1][k] +
arrayOld[i][j][k-1] + arrayOld[i][j][k+1])/6.0;
}
}
}
#pragma omp for
for(i=1; i<N-1; i++){
for(j=1; j<N-1; j++){
for(k=1; k<N-1; k++){
arrayOld[i][j][k] = arrayNew[i][j][k];
}
}
}
}
}
}
int main (int argc, char *argv[]) {
int i,j,k,nsteps;
double mean;
double ***arrayOld; //Variable that will hold the data of the past iteration
double ***arrayNew; //Variable where newly computed data will be stored
arrayOld = (double***)malloc(N*sizeof(double**));
arrayNew = (double***)malloc(N*sizeof(double**));
if(arrayOld== NULL){
fprintf(stderr, "Out of memory");
exit(0);
}
for(i=0; i<N;i++){
arrayOld[i] = (double**)malloc(N*sizeof(double*));
arrayNew[i] = (double**)malloc(N*sizeof(double*));
if(arrayOld[i]==NULL){
fprintf(stderr, "Out of memory");
exit(0);
}
for(int j=0;j<N;j++){
arrayOld[i][j] = (double*)malloc(N*sizeof(double));
arrayNew[i][j] = (double*)malloc(N*sizeof(double));
if(arrayOld[i][j]==NULL){
fprintf(stderr,"Out of memory");
exit(0);
}
}
}
/*Set boundary values and compute mean boundary values*/
mean = 0.0;
for(i=0; i<N; i++){
for(j=0;j<N;j++){
arrayOld[i][j][0] = 100.0;
mean += arrayOld[i][j][0];
}
}
for(i=0; i<N; i++){
for(j=0;j<N;j++){
arrayOld[i][j][N-1] = 100.0;
mean += arrayOld[i][j][N-1];
}
}
for(j=0; j<N; j++){
for(k=0;k<N;k++){
arrayOld[0][j][k] = 100.0;
mean += arrayOld[0][j][k];
}
}
for(j=0; j<N; j++){
for(k=0;k<N;k++){
arrayOld[N-1][j][k] = 100.0;
mean += arrayOld[N-1][j][k];
}
}
for(i=0; i<N; i++){
for(k=0;k<N;k++){
arrayOld[i][0][k] = 100.0;
mean += arrayOld[i][0][k];
}
}
for(i=0; i<N; i++){
for(k=0;k<N;k++){
arrayOld[i][N-1][k] = 0.0;
mean += arrayOld[i][N-1][k];
}
}
mean /= (6.0 * (N*N));
/*Initialize interior values*/
for(i=1; i<N-1; i++){
for(j=1; j<N-1; j++){
for(k=1; k<N-1;k++){
arrayOld[i][j][k] = mean;
}
}
}
double tdata = omp_get_wtime();
compute_heat_transfer(arrayOld, arrayNew);
tdata = omp_get_wtime()-tdata;
printf("Execution time was %f secs\n", tdata);
for(i=0; i<N;i++){
for(int j=0;j<N;j++){
free(arrayOld[i][j]);
free(arrayNew[i][j]);
}
free(arrayOld[i]);
free(arrayNew[i]);
}
free(arrayOld);
free(arrayNew);
return 0;
}
Use variable length arrays with dynamic storage:
Allocation:
double (*arr)[N][N] = calloc(N, sizeof *arr);
Indexing:
Use the good old arr[i][j][k] syntax.
Deallocation:
free(arr);
Flattening:
double *flat = (double*)arr;
Note that this conversion is not guaranteed by the C standard to work.
Though it will very likely work on all platforms capable of using GPUs.
Passing to functions:
VLAs can be parameters of the functions.
void fun(int n, double arr[n][n][n]) {
...
}
Exemplary usage would be:
fun(N, arr);
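Putting these pieces together, a minimal allocation-to-deallocation sketch (assuming N from the question and the VLA-friendly compute_heat_transfer() shown in the EDIT below; error handling kept to a minimum):
#include <stdio.h>
#include <stdlib.h>
#define N 25   /* same as in the question */
void compute_heat_transfer(int n, double arrayOld[restrict n][n][n], double arrayNew[restrict n][n][n]);
int main(void) {
    /* one contiguous N*N*N block per array, indexable as arr[i][j][k] */
    double (*arrOld)[N][N] = calloc(N, sizeof *arrOld);
    double (*arrNew)[N][N] = calloc(N, sizeof *arrNew);
    if (arrOld == NULL || arrNew == NULL) {
        fprintf(stderr, "Out of memory");
        return 1;
    }
    /* ... set boundary and interior values via arrOld[i][j][k] as in the original main() ... */
    compute_heat_transfer(N, arrOld, arrNew);
    free(arrOld);
    free(arrNew);
    return 0;
}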
EDIT
VLA friendly variant of compute_heat_transfer():
void compute_heat_transfer(int n, double arrayOld[restrict n][n][n], double arrayNew[restrict n][n][n]) {
int i,j,k;
/*Compute steady-state solution*/
for(int nsteps=0; nsteps < NUM_STEPS; nsteps++){
/*if(nsteps % 100 == 0){
writeFile(nsteps, arrayOld);
}*/
#pragma omp parallel for collapse(3)
for(i=1; i<n-1; i++){
for(j=1; j<n-1; j++){
for(k=1; k<n-1; k++){
//This is the 6-neighbor stencil computation
arrayNew[i][j][k] = (arrayOld[i-1][j][k] + arrayOld[i+1][j][k] + arrayOld[i][j-1][k] + arrayOld[i][j+1][k] +
arrayOld[i][j][k-1] + arrayOld[i][j][k+1])/6.0;
}}}
#pragma omp parallel for collapse(3)
for(i=1; i<n-1; i++){
for(j=1; j<n-1; j++){
for(k=1; k<n-1; k++){
arrayOld[i][j][k] = arrayNew[i][j][k];
}}}
}
}
The keyword restrict in arrayNew[restrict n][n][n] lets the compiler assume that arrayNew and arrayOld do not alias, which allows more aggressive optimizations.
Note that arrayNew and arrayOld are pointers to arrays. So rather than copying arrayNew into arrayOld you could simply swap those pointers, forming a simple kind of double buffering. That should make the code even faster.
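A minimal sketch of that double-buffering idea (illustrative only: the copy loop is removed, restrict is dropped because the two pointers are deliberately exchanged, and local pointers do the swapping; NUM_STEPS is the question's constant):
void compute_heat_transfer(int n, double arrayOld[n][n][n], double arrayNew[n][n][n]) {
    /* local aliases that can be swapped after every time step */
    double (*oldA)[n][n] = arrayOld;
    double (*newA)[n][n] = arrayNew;
    for (int nsteps = 0; nsteps < NUM_STEPS; nsteps++) {
        #pragma omp parallel for collapse(3)
        for (int i = 1; i < n-1; i++)
            for (int j = 1; j < n-1; j++)
                for (int k = 1; k < n-1; k++)
                    newA[i][j][k] = (oldA[i-1][j][k] + oldA[i+1][j][k] + oldA[i][j-1][k] +
                                     oldA[i][j+1][k] + oldA[i][j][k-1] + oldA[i][j][k+1]) / 6.0;
        /* swap the buffers instead of copying newA back into oldA */
        double (*tmp)[n][n] = oldA;
        oldA = newA;
        newA = tmp;
    }
    /* NUM_STEPS is even, so the most recent values end up back in the caller's arrayOld,
       matching the behaviour of the copying version */
}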
I got the above-mentioned error when running the following program in C.
The program first generates a 3x3 array and then executes a Jacobi iteration, using the MPI library. I don't know which parts of the code are wrong:
#include <stdio.h>
#include <string.h>
#include <mpi.h>
#include <math.h> // l2-norm //
#include <time.h>
int main(int argc, char **argv)
{
int numprocs, myid;
MPI_Init(&argc,&argv);
MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
MPI_Comm_rank(MPI_COMM_WORLD, &myid);
double a[3][3];
double b[3];
double x[3]={0};
double xa[3]={0};
double xnew[3]={0};
double y[3]={0};
float sigancha;
time_t startTime=0, endTime=0;
int n=3;
int i, j =0;
int k=0;
int o;
int hoessu=300;
int minhoessu=300;
double sum=1;
int numsent =0;
int ans;
int row;
MPI_Status status;
int sender;
int po;
double *buffer;
/* synchronization */
MPI_Barrier(MPI_COMM_WORLD);
for (i=0; i<n; i++){
b[i]=i*100;
for (j=0; j<n; j++) {
a[i][j]=((i+j)%10);
if (i==j) {a[i][j]+=5000;}
}
x[i]=b[i]/a[i][i];
}
/* run if sum is greater than 0.0002 */
for (k=0; k<hoessu&&sum>0.0002||k<minhoessu; k++) {
numsent = 0;
for (o=myid+1; o<n+1; o+=numprocs) {
i=o-1;
xa[i]=b[i]+a[i][i]*x[i];
for (j=0; j<n; j++) {
xa[i]-=a[i][j]*x[j];
}
xnew[i]=xa[i]/a[i][i];
/*send xnew[i] to master*/
MPI_Send(&xnew[i],1,MPI_DOUBLE,0,i,MPI_COMM_WORLD);
}
if (myid == 0){
/*get xnew[i]*/
for (i=0; i<n; i++) {
MPI_Recv(&ans, 1, MPI_DOUBLE, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
sender = status.MPI_SOURCE;
row = status.MPI_TAG;
xnew[row] = ans;
}
/*calculates sum at master*/
for (j=0; j<n; j++){
sum=0.0;
sum+=(xnew[j]-x[j])*(xnew[j]-x[j]);
x[j]=xnew[j];
}
sum=pow(sum,0.5);
MPI_Bcast(&x[0], n, MPI_DOUBLE, 0, MPI_COMM_WORLD);
}
}
if (myid == 0){
endTime=clock();
sigancha=(float)(endTime-startTime)/(CLOCKS_PER_SEC);
printf("finished\n");
for (j=0; j<n; j++) {
printf("x[%d]=%fl\n",j+1,xnew[j]);
}
printf("iteration; %d times itereation are done. \n l2-norm, error is %fl .\n %f seceonds are used. \n",k,sum,sigancha);
}
MPI_Finalize();
}
It is compiled with mpicc:
mpicc mpijacobi2.c -o taskingyeje
./taskingyeje
Result:
finished
x[1]=-1736884775.000000l
x[2]=-370936800.000000l
x[3]=2118301216.000000l
iteration; 300 times itereation are done.
l2-norm, error is 34332272.000000l .
0.020000 seceonds are used.
However, this is not the intended result. If the program worked correctly, it should give the same result as a serial Jacobi iteration,
which would be
x[1]=-0.000020l
x[2]=-0.019968l
x[3]=0.399956l
I don't know why this program generates the wrong result.
#include <stdio.h>
#include <string.h>
#include <mpi.h>
#include <math.h> // l2-norm //
#include <time.h>
int main(int argc, char **argv)
{
int numprocs, myid;
MPI_Init(&argc,&argv);
MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
MPI_Comm_rank(MPI_COMM_WORLD, &myid);
double a[700][700];
double b[700];
double x[700]={0};
double xa[700]={0};
double xnew[700]={0};
double y[700]={0};
float sigancha;
time_t startTime=0, endTime=0;
int n=700;
int i, j =0;
int k=0;
int o;
int hoessu=300;
int minhoessu=300;
double sum=1;
int numsent =0;
int ans;
int row;
MPI_Status status;
int sender;
int po;
double *buffer;
/* synchronization */
MPI_Barrier(MPI_COMM_WORLD);
for (i=0; i<n; i++){
b[i]=i*100;
for (j=0; j<n; j++) {
a[i][j]=((i+j)%10);
if (i==j) {a[i][j]+=10000;}
}
x[i]=b[i]/a[i][i];
}
/* run if sum is greater than 0.0002 */
for (k=0; k<hoessu&&sum>0.0002||k<minhoessu; k++) {
numsent = 0;
for (o=myid+1; o<n+1; o+=numprocs) {
i=o-1;
xa[i]=b[i]+a[i][i]*x[i];
for (j=0; j<n; j++) {
xa[i]-=a[i][j]*x[j];
}
xnew[i]=xa[i]/a[i][i];
/*send xnew[i] to master*/
ans=xnew[i];
MPI_Allgather(&xnew[i],1,MPI_DOUBLE,&xnew[i],1,MPI_DOUBLE,MPI_COMM_WORLD);
}
if (myid == 0){
/*calculates sum at master*/
for (j=0; j<n; j++){
sum=0.0;
sum+=(xnew[j]-x[j])*(xnew[j]-x[j]);
x[j]=xnew[j];
}
sum=pow(sum,0.5);
MPI_Bcast(&x[0], n, MPI_DOUBLE, 0, MPI_COMM_WORLD);
}
}
if (myid == 0){
endTime=clock();
sigancha=(float)(endTime-startTime)/(CLOCKS_PER_SEC);
printf("finished\n");
for (j=0; j<n; j++) {
printf("x[%d]=%fl\n",j+1,xnew[j]);
}
printf("iteration; %d times itereation are done. \n l2-norm, error is %fl .\n %f seceonds are used. \n",k,sum,sigancha);
}
MPI_Finalize();
}
I'm trying to learn MPI and I've run into the following problem in one of my courses:
Consider an n * n matrix A in which each element is an integer. Given two pairs of indices (i1,j1) and (i2,j2), find the submatrix of those dimensions in A whose element sum is maximum.
I'd like some help on how to pass the submatrices to the processes. Should I first calculate how many submatrices (s) there are in the matrix and send each process N/s of them? How would I send the submatrices?
Some skeleton code I wrote:
#include<mpi.h>
#include<stdio.h>
#include<math.h>
#include<assert.h>
#include<iostream>
using namespace std;
#pragma comment (lib, "msmpi.lib")
enum CommunicationTag
{
COMM_TAG_MASTER_SEND_TASK,
COMM_TAG_MASTER_SEND_TERMINATE,
COMM_TAG_SLAVE_SEND_RESULT,
};
void print_matrix(int mat[10][10], int n) {
for (int i = 0; i < n; i++) {
for (int j = 0; j < n; j++) {
printf("%d ", mat[i][j]);
}
printf("\n");
}
}
int main(int argc, char *argv[]) {
//0. Init part, finding rank and number of processes
int numprocs, rank, rc;
rc = MPI_Init(&argc, &argv);
if (rc != MPI_SUCCESS) {
printf("Error starting MPI program. Terminating \n");
MPI_Abort(MPI_COMM_WORLD, rc);
}
MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
printf("I'm rank %d. Num procs %d\n", rank, numprocs); fflush(stdout);
//1. different machine code
if (rank == 0)
{
int n;
scanf("%d", &n);
int i1, i2, j1, j2;
scanf("%d%d%d%d", &i1, &i2, &j1, &j2);
int mat[10][10];
//init data
for (int i = 0; i < n; i++)
for (int j = 0; j < n; j++) {
mat[i][j] = (rand() % 100) - 50; //init random between -50 and 49
}
print_matrix(mat, n);
//here; how do I pass the submatrices to the processes?
for (int i = 1; i < numprocs; i++) {
MPI_Send(&i1, 1, MPI_INT, i, COMM_TAG_MASTER_SEND_TASK, MPI_COMM_WORLD);
MPI_Send(&i2, 1, MPI_INT, i, COMM_TAG_MASTER_SEND_TASK, MPI_COMM_WORLD);
MPI_Send(&j1, 1, MPI_INT, i, COMM_TAG_MASTER_SEND_TASK, MPI_COMM_WORLD);
MPI_Send(&j2, 1, MPI_INT, i, COMM_TAG_MASTER_SEND_TASK, MPI_COMM_WORLD);
//here; how do I pass the submatrices to the processes?
}
}
else {
//if slave ...
}
system("Pause");
}
The first step is to stop thinking about how to use MPI_Send(). The basic solution is to use MPI_Bcast() to transmit A to all the MPI processes.
Then divide the work up (no need to communicate for this, the same dividing logic can run in each process). Compute the sums within each MPI process, and collect them in the main process using MPI_Gather(). Choose the largest and you're done.
It really only requires two MPI operations: Bcast to distribute the input data to all processes, and Gather to centralize the results.
Note that all MPI processes need to execute the collective operations together, in lockstep. The only place you need if (rank == 0) is to decide which process loads the matrix and analyzes the gathered results.
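A minimal sketch of that Bcast/compute/Gather structure (hypothetical names; it assumes the submatrix height h and width w have already been derived from the two index pairs, uses placeholder input on rank 0, and splits the candidate top-left corners round-robin across ranks):
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
#define N_MAX 10   /* same fixed bound as mat[10][10] in the question */
int main(int argc, char *argv[]) {
    int rank, numprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    int n = 0, h = 0, w = 0;            /* matrix size and submatrix dimensions */
    int mat[N_MAX][N_MAX] = {{0}};
    if (rank == 0) {
        /* ... read n and the two index pairs, fill mat, set h and w ... */
        n = 5; h = 2; w = 2;            /* placeholder values for this sketch */
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                mat[i][j] = (rand() % 100) - 50;
    }
    /* one Bcast per scalar plus one for the whole matrix: every rank gets the input */
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);
    MPI_Bcast(&h, 1, MPI_INT, 0, MPI_COMM_WORLD);
    MPI_Bcast(&w, 1, MPI_INT, 0, MPI_COMM_WORLD);
    MPI_Bcast(&mat[0][0], N_MAX * N_MAX, MPI_INT, 0, MPI_COMM_WORLD);
    /* every rank scans its share of candidate top-left corners */
    int local_best = INT_MIN, idx = 0;
    for (int i = 0; i + h <= n; i++)
        for (int j = 0; j + w <= n; j++, idx++) {
            if (idx % numprocs != rank) continue;   /* same dividing logic on every rank */
            int sum = 0;
            for (int di = 0; di < h; di++)
                for (int dj = 0; dj < w; dj++)
                    sum += mat[i + di][j + dj];
            if (sum > local_best) local_best = sum;
        }
    /* the root gathers one candidate per rank and keeps the largest */
    int *all_best = NULL;
    if (rank == 0) all_best = malloc(numprocs * sizeof(int));
    MPI_Gather(&local_best, 1, MPI_INT, all_best, 1, MPI_INT, 0, MPI_COMM_WORLD);
    if (rank == 0) {
        int best = all_best[0];
        for (int p = 1; p < numprocs; p++)
            if (all_best[p] > best) best = all_best[p];
        printf("Maximum submatrix sum: %d\n", best);
        free(all_best);
    }
    MPI_Finalize();
    return 0;
}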
Requirement: each process needs to calculate the distance between its own group of points and all the points.
My code is as follows:
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <string.h>
#include "mpi.h"
double **malloc_Array2D(int row, int col)
{
int size = sizeof(double);
int point_size = sizeof(double*);
double **arr = (double **) malloc(point_size * row + size * row * col);
if (arr != NULL)
{
memset(arr, 0, point_size * row + size * row * col);
double *head = (double*)(arr + row);
while (row--)
arr[row] = head + row * col;
}
return (double**)arr;
}
void free_Aarray2D(void **arr)
{
if (arr != NULL)
free(arr);
}
double distance(double *pos1, double *pos2, int dim)
{
int i;
double dist = 0.0;
for(i=0;i<dim;i++)
dist += pow((pos2[i]-pos1[i]), 2.0);
return sqrt(dist);
}
int main(int argc,char *argv[])
{
int np, myid;
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &np);
MPI_Comm_rank(MPI_COMM_WORLD, &myid);
//open file
FILE *fp;
if((fp=fopen("points.dat","r"))==NULL)
{
printf("open file failure! exit!\n");
return -1;
}
//read the number of points
int npoints;
fscanf(fp, "There are %d points\n", &npoints);
if(0==myid)
printf("There are %d points\n", npoints);
int nptsl_max = npoints/np;
if((npoints%np)!=0)
nptsl_max++;
double (*xi)[3];
double **dist;
int *ind = (int *)malloc(sizeof(int)*nptsl_max);
xi = (double (*)[3])malloc(sizeof(double)*nptsl_max*3);
dist= malloc_Array2D(nptsl_max, npoints);
int nptsl = 0; //local number of points
char str[200];
for(int i=0; i<npoints; i++)
{
if(myid == (i%np))
{
fscanf(fp, "%d %lf %lf %lf", ind+nptsl, &xi[nptsl][0], &xi[nptsl][1], &xi[nptsl][2]);
nptsl++;
}
else
fgets(str, 200, fp);
}
fclose(fp);
for(int i=0; i<nptsl; i++)
printf("point %4d on process %d\n", *(ind+i), myid);
dist = (myid == (np-1))?0 :(myid+1);
source = (myid == 0) ?(np-1):(myid-1);
double (*yi)[3];
yi = (double (*)[3])malloc(sizeof(double)*nptsl_max*3);
for(int i=0; i<nptsl; i++)
for(int j=0; j<3; j++)
yi[i][j] = xi[i][j];
for(int loop=0; loop < np)
{
for(int i=0; i<nptsl; i++)
{
for(int j=0; j<npoints; j++)
{
dist[i][j] = distance(xi[i], xi[j], 3);
}
}
}
sprintf(filename, "dist_%d.dat",myid);
fp = fopen(filename, "w");
for(i=0; i<npoints; i++)
{
fprintf(fp, "%4d", ind[i]);
for(j=0; j<npoints; j++)
{
fprintf(fp, " %f", dist[i][j]);
}
fprintf(fp, "\n");
}
fclose(fp);
free(ind);
free(xi);
free_Aarray2D((void **)dist);
}
I don't know how to communicate the messages.
I think MPI_Bcast or MPI_Gather should be used for the communication,
but I can't solve this problem. Can anyone help me? Thank you!
So each process is reading a subset from a file, and you want to calculate the distance from each point in the subset to ALL the points?
Why is there a loop with a "loop" variable?
If you only need one iteration to calculate the distances, it would probably be faster to have each process read the whole file, calculate the distances for its subset, and then use MPI_Gather to send all the distances to the root node, which writes the result.
If you plan on doing several iterations, where in each iteration you also update the location of the points, then you need to also use MPI_Allgather to send/recv the points from the other processes.
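A minimal sketch of the single-iteration variant (hypothetical names such as pts, local and dist3; it uses a contiguous block split instead of the question's round-robin split, assumes npoints is divisible by np, and fills placeholder coordinates where the file read would be):
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
/* Euclidean distance in 3D */
static double dist3(const double *a, const double *b) {
    double d = 0.0;
    for (int i = 0; i < 3; i++)
        d += (a[i] - b[i]) * (a[i] - b[i]);
    return sqrt(d);
}
int main(int argc, char *argv[]) {
    int np, myid;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &np);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    int npoints = 8;                      /* illustrative; read from the file in practice */
    int nlocal = npoints / np;            /* assumes an even split for simplicity */
    double (*pts)[3] = malloc(npoints * sizeof *pts);
    for (int i = 0; i < npoints; i++)     /* placeholder data instead of the file read */
        for (int k = 0; k < 3; k++)
            pts[i][k] = i + 0.1 * k;
    /* each rank computes the distances for its contiguous block of points */
    double *local = malloc(sizeof(double) * nlocal * npoints);
    for (int i = 0; i < nlocal; i++)
        for (int j = 0; j < npoints; j++)
            local[i * npoints + j] = dist3(pts[myid * nlocal + i], pts[j]);
    /* the root gathers all rows of the distance matrix and writes the result */
    double *all = NULL;
    if (myid == 0) all = malloc(sizeof(double) * npoints * npoints);
    MPI_Gather(local, nlocal * npoints, MPI_DOUBLE,
               all,   nlocal * npoints, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    if (myid == 0) {
        /* ... write the npoints x npoints matrix stored in 'all' to a file ... */
        free(all);
    }
    free(local);
    free(pts);
    MPI_Finalize();
    return 0;
}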