From OpenMP to MPI - c

I just wonder how to convert the following openMP program to a MPI program
#include <omp.h>
#define CHUNKSIZE 100
#define N 1000
int main (int argc, char *argv[])
{
int i, chunk;
float a[N], b[N], c[N];
/* Some initializations */
for (i=0; i < N; i++)
a[i] = b[i] = i * 1.0;
chunk = CHUNKSIZE;
#pragma omp parallel shared(a,b,c,chunk) private(i)
{
#pragma omp for schedule(dynamic,chunk) nowait
for (i=0; i < N; i++)
c[i] = a[i] + b[i];
} /* end of parallel section */
return 0;
}
I have a similar program that I would like to run on a cluster and the program is using OpenMP.
Thanks!
UPDATE:
In the following toy code, I want to limit the parallel part within function f():
#include "mpi.h"
#include <stdio.h>
#include <string.h>
void f();
int main(int argc, char **argv)
{
printf("%s\n", "Start running!");
f();
printf("%s\n", "End running!");
return 0;
}
void f()
{
char idstr[32]; char buff[128];
int numprocs; int myid; int i;
MPI_Status stat;
printf("Entering function f().\n");
MPI_Init(NULL, NULL);
MPI_Comm_size(MPI_COMM_WORLD,&numprocs);
MPI_Comm_rank(MPI_COMM_WORLD,&myid);
if(myid == 0)
{
printf("WE have %d processors\n", numprocs);
for(i=1;i<numprocs;i++)
{
sprintf(buff, "Hello %d", i);
MPI_Send(buff, 128, MPI_CHAR, i, 0, MPI_COMM_WORLD); }
for(i=1;i<numprocs;i++)
{
MPI_Recv(buff, 128, MPI_CHAR, i, 0, MPI_COMM_WORLD, &stat);
printf("%s\n", buff);
}
}
else
{
MPI_Recv(buff, 128, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &stat);
sprintf(idstr, " Processor %d ", myid);
strcat(buff, idstr);
strcat(buff, "reporting for duty\n");
MPI_Send(buff, 128, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
}
MPI_Finalize();
printf("Leaving function f().\n");
}
However, the running output is not expected. The printf parts before and after the parallel part have been executed by every process, not just the main process:
$ mpirun -np 3 ex2
Start running!
Entering function f().
Start running!
Entering function f().
Start running!
Entering function f().
WE have 3 processors
Hello 1 Processor 1 reporting for duty
Hello 2 Processor 2 reporting for duty
Leaving function f().
End running!
Leaving function f().
End running!
Leaving function f().
End running!
So it seems to me the parallel part is not limited between MPI_Init() and MPI_Finalize().

To answer your update:
When using MPI, the same program is run by each processor. In order to restrict parallel parts you will need to use a statement like:
if (rank == 0) { ...serial work... }
This will ensure that only one processor does the work inside this block.
You can see how this works in the example program you posted, inside f(), there is the if(myid == 0) statement. This block of statements will then only be executed by process 0, all other processes go straight to the else and receive their messages, before sending them back.
With regard to MPI_Init and MPI_Finalize -- MPI_Init initialises the MPI environment. Once you have called this method you can use the other MPI methods like Send and Recv. Once you have finished using MPI methods, MPI_Finalize will free up the resources etc, but the program will keep running. For example, you could call MPI_Finalize before performing some I/O that was going to take a long time. These methods do not delimit the parallel portion of the code, merely where you can use other MPI calls.
Hope this helps.

You just need to assign a portion of the arrays (a, b, c) to each process. Something like this:
#include <mpi.h>
#define N 1000
int main(int argc, char *argv[])
{
int i, myrank, myfirstindex, mylastindex, procnum;
float a[N], b[N], c[N];
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &procnum);
MPI_Comm_rank(comm, &myrank);
/* Dynamic assignment of chunks,
* depending on number of processes
*/
if (myrank == 0)
myfirstindex = 0;
else if (myrank < N % procnum)
myfirstindex = myrank * (N / procnum + 1);
else
myfirstindex = N % procnum + myrank * (N / procnum);
if (myrank == procnum - 1)
mylastindex = N - 1;
else if (myrank < N % procnum)
mylastindex = myfirstindex + N / procnum + 1;
else
mylastindex = myfirstindex + N / procnum;
// Initializations
for(i = myfirstindex; i < mylastindex; i++)
a[i] = b[i] = i * 1.0;
// Computations
for(i = myfirstindex; i < mylastindex; i++)
c[i] = a[i] + b[i];
MPI_Finalize();
}

You can try to use proprietary Intel Cluster OpenMP. It will run OpenMP programs on cluster.
Yes, it simulates shared memory computer on distributed memory clusters using the "Software Distributed shared memory" http://en.wikipedia.org/wiki/Distributed_shared_memory
It is easy to use and included in Intel C++ Compiler (9.1+). But it works only on 64-bit processors.

Related

MPI returns incorrect results for one of processes

I'm learning MPI now and wrote simple C program that uses MPI_Scatter and MPI_Reduce as follows:
int main(int argc, char **argv)
{
int mpirank, mpisize;
int tabsize = atoi(*(argv + 1));
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &mpirank);
MPI_Comm_size(MPI_COMM_WORLD, &mpisize);
unsigned long int sum = 0;
int rcvsize = tabsize / mpisize;
int *rcvbuf = malloc(rcvsize * sizeof(int));
int *tab = malloc(tabsize * sizeof(int));
int totalsum = 0;
if(mpirank == 0){
for(int i=0; i < tabsize; i++){
*(tab + i) = 1;
}
}
MPI_Scatter(tab, tabsize/mpisize, MPI_INT, rcvbuf, tabsize/mpisize, MPI_INT, 0, MPI_COMM_WORLD);
for(int i=0; i < tabsize/mpisize; i++){
sum += *(rcvbuf + i);
}
printf("%d sum = %ld %d\n", mpirank, sum, tabsize/mpisize);
MPI_Reduce(&sum, &totalsum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
if(mpirank == 0){
printf("The totalsum = %li\n", totalsum);
}
MPI_Finalize();
return 0;
}
The program gives inconsitient results and I don't understand why. For example:
$ mpirun -np 4 03_array_sum 120000000
1 sum = 29868633 30000000
2 sum = 30000000 30000000
0 sum = 30000000 30000000
3 sum = 30000000 30000000
The totalsum = 119868633
Here process 1 didn't count all elements given to it by MPI_Scatter.
UPDATE:
As user #Gilles Gouaillardet wrote below in the accepted answer I have run the code in a loop thirty times for both versions, with empty $OMPI_MCA_pml and set to "^ucx". When flag is empty for one run 8 out of 30 gives wrong values, when flag is set all runs are correct. Then I run same on Debian GNU/Linux 7 (wheezy) with OpenMPI 1.4.5 and all runs were correct with empty flag. Looks like something is wrong with OpenMPI 4.0.4 and/or Fedora 33.
I was able to reproduce the issue in the very same environment.
I do not know whether the root cause is within Open MPI or UCX.
Meanwhile, you can
mpirun --mca pml ^ucx ...
or
export OMPI_MCA_pml=^ucx
mpirun ...
or add into /etc/openmpi-x86_64/openmpi-mca-params.conf
pml = ^ucx

MPI_Scatter issues (C)

My assignment is to parallelize some provided sequential code that takes an array of numbers and merges it with another array in order to come up with a sorted list. The first step is to initialize the array on each processor, then fill it with values on the root process. Next, I'm supposed to scatter the array out to the other processors so each processor has a chunk of the data for sorting, then sort these small lists locally and then gather them back up with the root process. My problem here is that no matter what I try the scatter won't actually send any data to the other processors. The array in these processors is always full of 0's, whereas the root processor does contain a list of numbers. Anybody care to take a look at my code and tell me what I'm missing?
The original sequential code
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
/**
* Prints a vector without decimal places...
*/
void print_vector(int n, double vector[]) {
int i;
printf("[%.0f", vector[0]);
for (i = 1; i < n; i++) {
printf(", %.0f", vector[i]);
}
printf("]\n");
}
/**
* Just checks sequentially if everything is in ascending order.
*
* Note: we don't care about stability in this sort since we
* have no data attached to the double value, see
* http://en.wikipedia.org/wiki/Sorting_algorithm#Stability
* for more detail.
*/
void test_correctness(int n, double v[]) {
int i;
for (i = 1; i < n; i++) {
if (v[i] < v[i-1]) {
printf("Correctness test found error at %d: %.4f is not < %.4f but appears before it\n", i, v[i-1], v[i]);
}
}
}
/**
* Initialize random vector.
*
* You may not parallelize this (even though it could be done).
*/
void init_random_vector(int n, double v[]) {
int i, j;
for (i = 0; i < n; i++) {
v[i] = rand() % n;
}
}
double *R;
/**
* Merges two arrays, left and right, and leaves result in R
*/
void merge(double *left_array, double *right_array, int leftCount, int rightCount) {
int i,j,k;
// i - to mark the index of left aubarray (left_array)
// j - to mark the index of right sub-raay (right_array)
// k - to mark the index of merged subarray (R)
i = 0; j = 0; k =0;
while (i < leftCount && j < rightCount) {
if(left_array[i] < right_array[j])
R[k++] = left_array[i++];
else
R[k++] = right_array[j++];
}
while (i < leftCount)
R[k++] = left_array[i++];
while (j < rightCount)
R[k++] = right_array[j++];
}
/**
* Recursively merges an array of n values using group sizes of s.
* For example, given an array of 128 values and starting s value of 16
* will result in 8 groups of 16 merging into 4 groups of 32, then recursively
* calling merge_all which merges them into 2 groups of 64, then once more
* recursively into 1 group of 128.
*/
void merge_all(double *v, int n, int s) {
if (s < n) {
int i;
for (i = 0; i < n; i += 2*s) {
merge(v+i, v+i+s, s, s);
// result is in R starting at index 0
memcpy(v+i, R, 2*s*sizeof(double));
}
merge_all(v, n, 2*s);
}
}
void merge_sort(int n, double *v) {
merge_all(v, n, 1);
}
int main(int argc, char *argv[]) {
if (argc < 2) {
// () means optional
printf("usage: %s n (seed)\n", argv[0]);
return 0;
}
int n = atoi(argv[1]);
int seed = 0;
if (argc > 2) {
seed = atoi(argv[2]);
}
srand(seed);
int mpi_p, mpi_rank;
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &mpi_p);
MPI_Comm_rank(MPI_COMM_WORLD, &mpi_rank);
// init the temporary array used later for merge
R = malloc(sizeof(double)*n);
// all MPI processes will allocate a vector of length n, but only
// process 0 will initialize the values. You must distribute the
// values to processes. You will also likely need to allocate
// additional memory in your processes, make sure to clean it up
// by adding a correct free at the end of each process.
double *v = malloc(sizeof(double)*n);
if (mpi_rank == 0) {
init_random_vector(n, v);
}
double start = MPI_Wtime();
// do all the work ourselves! (you should make a better algorithm here!)
if (mpi_rank == 0) {
merge_sort(n, v);
}
double end = MPI_Wtime();
if (mpi_rank == 0) {
printf("Total time to solve with %d MPI Processes was %.6f\n", mpi_p, (end-start));
test_correctness(n, v);
}
MPI_Finalize();
free(v);
free(R);
return 0;
}
The parts I edited
double start = MPI_Wtime();
// do all the work ourselves! (you should make a better algorithm here!)
if (mpi_rank == 0)
{
//Scatter(sendbuf, sendcount, sendtype, recvbuf, recvcount, recvtype, root, comm)
MPI_Scatter(&v, sizeOf, MPI_DOUBLE, &v, sizeOf, MPI_DOUBLE, 0, MPI_COMM_WORLD);
}
if (mpi_rank > 0)
{
//merge_sort(sizeOf, R);
printf("%f :v[0]", v[0]); printf("%s", "\n");
printf("%f :v[1]", v[1]); printf("%s", "\n");
printf("%f :v[2]", v[0]); printf("%s", "\n");
}
if (mpi_rank == 0)
{
//Gather(sendbuf, sendcount, sendtype, recvbuf, recvcount, recvtype, root, comm)
//MPI_Gather(&v, sizeOf, MPI_DOUBLE, test, sizeOf, MPI_DOUBLE, 0, MPI_COMM_WORLD);
//merge(v, test, n, sizeOf);
merge_sort(n, v);
}
double end = MPI_Wtime();
The code should output three values from each processor that isn't the root, but it just gives me 0's. I tried many different parameters inside the scatter function call, to no avail. Almost everything I tried produces the same output found below. Sample output:
0.000000 :v[0]
0.000000 :v[1]
0.000000 :v[2]
Total time to solve with 2 MPI Processes was 0.000052
EDIT: Calling scatter from every process is also something I tried, as it sounds like this is the way you're supposed to call it instead of just calling it from the root. However, then I get a bunch of errors instead of any output at all. Errors look like this:
*** Process received signal ***
Signal: Segmentation fault (11)
Signal code: (128)
Failing at address: (nil)
*** Process received signal ***
Signal: Segmentation fault (11)
Signal code: (128)
Failing at address: (nil)
Something is clearly wrong here. Not sure what I'm doing wrong though.

MPI_scatter of 1D array

I new to MPI and I am trying to write program that uses MPI_scatter. I have 4 nodes(0, 1, 2, 3). Node0 is master, others are slaves. Master asks user for number of elements of array to send to slaves. Then it creates array of size number of elements * 4. Then every node prints it`s results.
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#define MASTER 0
int main(int argc, char **argv) {
int id, nproc, len, numberE, i, sizeArray;
int *arrayN=NULL;
int arrayNlocal[sizeArray];
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &size);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
if (id == MASTER){
printf("Enter number of elements: ");
scanf("%d", &numberE);
sizeArray = numberE * 4;
arrayN = malloc(numberE * sizeof(int));
for (i = 0; i < sizeArray; i++){
arrayN[i] = i + 1;
}
}
MPI_Scatter(arrayN, numberE, MPI_INT, &arrayNlocal, numberE,MPI_INT, MPI_COMM_WORLD);
printf("Node %d has: ", id);
for (i = 0; i < numberE; i++){
printf("%d ",arrayNlocal[i]);
}
MPI_Finalize();
return 0;
}
And as error i get:
BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
PID 9278 RUNNING AT 192.168.100.100
EXIT CODE: 139
CLEANING UP REMAINING PROCESSES
YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
In arrayNlocal[sizeArray];, sizeArray is not initialized. The best way to go is to broadcast numberE to every processes and allocate memory for arrayNlocal. Something like:
MPI_Bcast( &numberE, 1, MPI_Int, 0, MPI_COMM_WORLD)
arrayN is an array of size sizeArray = numberE * 4, so:
arrayN = malloc(sizeArray * sizeof(int));
MPI_Scatter() needs pointers to the data to be sent on root node, and a pointer to receive buffer on each process of the communicator. Since arrayNlocal is an array:
MPI_Scatter(arrayN, numberE, MPI_INT, arrayNlocal, numberE,MPI_INT,MASTER, MPI_COMM_WORLD);
or alternatively:
MPI_Scatter(arrayN, numberE, MPI_INT, &arrayNlocal[0], numberE,MPI_INT,MASTER, MPI_COMM_WORLD);
id is not initialized in id == MASTER: it must be rank==MASTER.
As is, the prints at the end might occur in a mixed way between processes.
Try to compile your code using mpicc main.c -o main -Wall to enable all warnings: it can save you a few hours in the near future!

Segfault when running openmpi

I am currently working on a project where I need to implement a parallel fft algorithm using openmpi. I have a compiling piece of code, but when I run it over the cluster I get segmentation faults.
I have my hunches about where things are going wrong, but I don't think I have enough of an understanding about pointers and references to be able to make a efficient fix.
The first chunk that could be going wrong is in the passing of the arrays to the helper functions. I believe that either my looping is inconsistent, or I am not understanding how the to pass these pointers and get back the things I need.
The second possible spot would be within the actual mpi_Send/Recv commands. I am sending a type that is not supported by the openmpi c datatypes, so I am using the mpi_byte type to send the raw data instead. Is this a viable option? Or should I be looking into an alternative to this method.
/* function declarations */
double complex get_block(double complex c[], int start, int stop);
double complex put_block(double complex from[], double complex to[],
int start, int stop);
void main(int argc, char **argv)
{
/* Initialize MPI */
MPI_Init(&argc, &argv);
double complex c[N/p];
int myid;
MPI_Comm_rank(MPI_COMM_WORLD, &myid);
//printf("My id is %d\n",myid);
MPI_Status status;
int i;
for(i=0;i<N/p;i++){
c[i] = 1.0 + 1.0*I;
}
int j = log(p)/log(2) + 1;
double q;
double complex z;
double complex w = exp(-2*PI*I/N);
double complex block[N/(2*p)]; // half the size of chunk c
int e,l,t,k,m,rank,plus,minus;
int temp = (log(N)-log(p))/log(2);
//printf("temp = %d", temp);
for(e = 0; e < (log(p)/log(2)); e++){
/* loop constants */
t = pow(2,e); l = pow(2,e+temp);
q = n/2*l; z = cpow(w,(complex)q);
j = j-1; int v = pow(2,j);
if(e != 0){
plus = (myid + p/v)%p;
minus = (myid - p/v)%p;
} else {
plus = myid + p/v;
minus = myid - p/v;
}
if(myid%t == myid%(2*t)){
MPI_Recv((char*)&c,
sizeof(c),
MPI_BYTE,
plus,
MPI_ANY_TAG,
MPI_COMM_WORLD,
&status);
/* transform */
for(k = 0; k < N/p; k++){
m = (myid * N/p + k)%l;
c[k] = c[k] + c[k+N/v] * cpow(z,m);
c[k+N/v] = c[k] - c[k + N/v] * cpow(z,m);
printf("(k,k+N/v) = (%d,%d)\n",k,k+N/v);
}*/
printf("\n\n");
/* end transform */
*block = get_block(c, N/v, N/v + N/p + 1);
MPI_Send((char*)&block,
sizeof(block),
MPI_BYTE,
plus,
1,
MPI_COMM_WORLD);
} else {
// send data of this PE to the (i- p/v)th PE
MPI_Send((char*)&c,
sizeof(c),
MPI_BYTE,
minus,
1,
MPI_COMM_WORLD);
// after the transformation, receive data from (i-p/v)th PE
// and store them in c:
MPI_Recv((char*)&block,
sizeof(block),
MPI_BYTE,
minus,
MPI_ANY_TAG,
MPI_COMM_WORLD,
&status);
*c = put_block(block, c, N/v, N/v + N/p - 1);
//printf("Process %d send/receive %d\n",myid, plus);
}
}
/* shut down MPI */
MPI_Finalize();
}
/* helper functions */
double complex get_block(double complex *c, int start, int stop)
{
double complex block[stop - start + 1];
//printf("%d = %d\n",sizeof(block)/sizeof(double complex), sizeof(&c)/sizeof(double complex));
int j = 0;
int i;
for(i = start; i < stop+1; i++){
block[j] = c[i];
j = j+1;
}
return *block;
}
double complex put_block(double complex from[], double complex to[], int start, int stop)
{
int j = 0;
int i;
for(i = start; i<stop+1; i++){
to[i] = from[j];
j = j+1;
}
return *to;
}
I really appreciate the feedback!
You are using arrays / pointers to arrays in the wrong way. For example you declare an array as double complex block[N], which is fine (although uncommon, in most cases it is better to use malloc) and then you receive into it via MPI_Recv(&block). However "block" is already a pointer to that array, so by writing "&block" you are passing the pointer of the pointer to MPI_Recv. That's not what it expects. If you want to use the "&" notation you have to write &block[0], which would give you the pointer to the first element of the block-array.
Have you tried debugging your code? This can be a pain in a parallel setting, but it can tell you exactly where it is failing and usually also why.
If you're using Linux or OS X, you could run your code as follows on the command line:
mpirun -np 4 xterm -e gdb -ex run --args ./yourprog yourargs
where I'm assuming yourprog is the name of your program and yourargs are any command-line arguments you want to pass.
What this command will do is launch four xterm windows. Each xterm will in turn launch gdb as specified by the option -e. gdb will then execute the command run as specified by the option -ex and launch your executable with the given options, as specified by --args.
What you get are four xterm windows running four instances of your program in parallel with MPI. If any of the instances crashes, gdb will tell you where and why.

C MPI - spawning multiple threads in batches

I have a basic question about MPI programming in C. Essentially what I want is that there is a master process that spawns a specific number of child processes, collects some information from all of them (waits until all of the children finish), calculates some metric, based on this metric it decides if it has to spawn more threads... it keeps doing this until the metric meets some specific condition. I have searched through the literature, to no avail. How can this be done? any pointers?.
Thanks for the help.
Courtesy : An introduction to the Message Passing Interface (MPI) using C. In the "complete parallel program to sum an array", lets say, "for some lame reason", I want the master process to sum the contents of the array twice. I.e in the first iteration, the master process starts the slave processes which compute the sum of the arrays, once they are done and the master process returns the value, I would like to invoke the master process to reinvoke another set of threads to do the computation again. Why would the code below not work? I added a while loop around the master process process which spawns the slave processes.
#include <stdio.h>
#include <mpi.h>
#define max_rows 100000
#define send_data_tag 2001
#define return_data_tag 2002
int array[max_rows];
int array2[max_rows];
main(int argc, char **argv)
{
long int sum, partial_sum,number_of_times;
number_of_times=0;
MPI_Status status;
int my_id, root_process, ierr, i, num_rows, num_procs,
an_id, num_rows_to_receive, avg_rows_per_process,
sender, num_rows_received, start_row, end_row, num_rows_to_send;
/* Now replicte this process to create parallel processes.
* From this point on, every process executes a seperate copy
* of this program */
ierr = MPI_Init(&argc, &argv);
root_process = 0;
/* find out MY process ID, and how many processes were started. */
ierr = MPI_Comm_rank(MPI_COMM_WORLD, &my_id);
ierr = MPI_Comm_size(MPI_COMM_WORLD, &num_procs);
if(my_id == root_process) {
/* I must be the root process, so I will query the user
* to determine how many numbers to sum. */
//printf("please enter the number of numbers to sum: ");
//scanf("%i", &num_rows);
num_rows=10;
while (number_of_times<2)
{
number_of_times++;
start_row=0;
end_row=0;
if(num_rows > max_rows) {
printf("Too many numbers.\n");
exit(1);
}
avg_rows_per_process = num_rows / num_procs;
/* initialize an array */
for(i = 0; i < num_rows; i++) {
array[i] = i + 1;
}
/* distribute a portion of the bector to each child process */
for(an_id = 1; an_id < num_procs; an_id++) {
start_row = an_id*avg_rows_per_process + 1;
end_row = (an_id + 1)*avg_rows_per_process;
if((num_rows - end_row) < avg_rows_per_process)
end_row = num_rows - 1;
num_rows_to_send = end_row - start_row + 1;
ierr = MPI_Send( &num_rows_to_send, 1 , MPI_INT,
an_id, send_data_tag, MPI_COMM_WORLD);
ierr = MPI_Send( &array[start_row], num_rows_to_send, MPI_INT,
an_id, send_data_tag, MPI_COMM_WORLD);
}
/* and calculate the sum of the values in the segment assigned
* to the root process */
sum = 0;
for(i = 0; i < avg_rows_per_process + 1; i++) {
sum += array[i];
}
printf("sum %i calculated by root process\n", sum);
/* and, finally, I collet the partial sums from the slave processes,
* print them, and add them to the grand sum, and print it */
for(an_id = 1; an_id < num_procs; an_id++) {
ierr = MPI_Recv( &partial_sum, 1, MPI_LONG, MPI_ANY_SOURCE,
return_data_tag, MPI_COMM_WORLD, &status);
sender = status.MPI_SOURCE;
printf("Partial sum %i returned from process %i\n", partial_sum, sender);
sum += partial_sum;
}
printf("The grand total is: %i\n", sum);
}
}
else {
/* I must be a slave process, so I must receive my array segment,
* storing it in a "local" array, array1. */
ierr = MPI_Recv( &num_rows_to_receive, 1, MPI_INT,
root_process, send_data_tag, MPI_COMM_WORLD, &status);
ierr = MPI_Recv( &array2, num_rows_to_receive, MPI_INT,
root_process, send_data_tag, MPI_COMM_WORLD, &status);
num_rows_received = num_rows_to_receive;
/* Calculate the sum of my portion of the array */
partial_sum = 0;
for(i = 0; i < num_rows_received; i++) {
partial_sum += array2[i];
}
/* and finally, send my partial sum to hte root process */
ierr = MPI_Send( &partial_sum, 1, MPI_LONG, root_process,
return_data_tag, MPI_COMM_WORLD);
}
ierr = MPI_Finalize();
}
You should start by looking at MPI_Comm_spawn and collective operations. To collect information from old child processes,one would typically use MPI_Reduce.
This stackoverflow question might also be helpful.
...to spawn more threads...
I guess you meant the right thing since you used "process" instead of "thread" mostly, but just to clarify: MPI only deals with processes and not with threads.
I'm not sure how well you know MPI already - let me know if my answer was any help or if you need more hints.
The MPI-2 standard includes process management functionality. It's described in detail in Chapter 5. I have not used it myself though, so perhaps someone else may weigh in with more practical hints.

Resources