I'm new to using MPI, and I'm having an issue understanding why my code isn't working correctly.
#include "mpi.h"
#include <stdio.h>
#include <string.h>
int main(int argc, char* argv[]) {
int list_size = 1000
int threads;
int th_nums;
int slice;
char* the_array[list_size];
char* temp_array[list_size];
char str_to_search[10];
FILE *in = fopen("inputfile", "r");
char parse[10];
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &threads);
MPI_Comm_size(MPI_COMM_WORLD, &th_nums);
if (threads == 0) { // if master task
fgets(parse, 10, in);
slice = atoi(parse); // How many slices to cut the array into
fgets(parse, 10, in);
sscanf(parse, "%s", search); // gives us the string we want to search
int i;
for (i = 0; i < list_size; i++) {
char temp[10];
fgets(parse, 10, in);
sscanf(parse, "%s", temp);
the_array[i] = strdup(temp);
}
int index = list_size/slice; //
MPI_Bcast(&str_to_search, 10, MPI_CHAR, 0, MPI_COMM_WORLD);
}
MPI_Bcast(&str_to_search, 10, MPI_CHAR, 0, MPI_COMM_WORLD);
MPI_Scatter(the_array, index, MPI_CHAR, temp_array, index, 0, MPI_COMM_WORLD);
// Search for string occurs here
MPI_Finalize();
return 0;
}
However, I'm finding that when I try to search, only the master task receives some of the slice, the rest is null. All other tasks don't receive anything. I've tried placing MPI_Scatter outside of the if(master task) statement, but I have no luck with this. Also, when the list_size increases, I find the program basically gets stuck at the MPI_Scatter line. What am I doing wrong, and what can I do to correct this?
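Look up some tutorials on MPI collectives. Collective operations require every process in the communicator to participate: if any process calls MPI_Scatter, then all processes must call MPI_Scatter, and the same holds for MPI_Bcast. In your code, rank 0 calls MPI_Bcast twice (once inside the if block and once after it) while every other rank calls it only once, so the calls no longer match up. I'd recommend looking at some sample code and playing with it until you understand what's going on, then coming back to your own code and seeing whether you can figure out the rest.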
My favorite reference for anything pre-MPI-3 is DeinoMPI. I've never used the implementation, but the documentation is great since it has a complete example for each function in the MPI-2 Spec.
My goal is to create a program that takes a large list of unsorted integers (1-10 million) and divides it into 6 parts where a thread concurrently sorts it. After sorting I merge it into one sorted array so I can find the median and mode quicker.
The input file will be something like this:
# 1000000
314
267
213
934
where the number following the # identifies the number of integers in the list.
Currently I can sort correctly and quickly without threading; however, when I began threading I ran into an issue. For a data set of 1,000,000 integers it only sorts the first 833,333, leaving the last 166,666 (1/6) unsorted.
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <time.h>
#define BUF_SIZE 1024
int sum; /* this data will be shared by the thread(s) */
int * bigArr;
int size;
int findMedian(int array[], int size)
{
if (size % 2 != 0)
return array[size / 2];
return (array[(size - 1) / 2] + array[size / 2]) / 2;
}
/*compare function for quicksort*/
int _comp(const void* a, const void* b) {
return ( *(int*)a - *(int*)b);
}
/*This function is the problem method*/
/*indicate range of array to be processed with the index(params)*/
void *threadFct(int param)
{
int x= size/6;
if(param==0)x= size/6;
if(param>0&&param<5)x= (size/6)*param;
if(param==5)x= (size/6)*param+ (size%size/6);/*pass remainder into last thread*/
qsort((void*)bigArr, x, sizeof(bigArr[param]), _comp);
pthread_exit(0);
}
int main(int argc, char *argv[])
{
FILE *source;
int i =0;
char buffer[BUF_SIZE];
if(argc!=2){
printf("Error. please enter ./a followed by the file name");
return -1;}
source= fopen(argv[1], "r");
if (source == NULL) { /*reading error msg*/
printf("Error. File not found.");
return 1;
}
int count= 0;
while (!feof (source)) {
if (fgets(buffer, sizeof (buffer), source)) {
if(count==0){ /*Convert string to int using atoi*/
char str[1];
sprintf(str, "%c%c%c%c%c%c%c%c%c",buffer[2],buffer[3],buffer[4],buffer[5],buffer[6],buffer[7],buffer[8],buffer[9],buffer[10]);/*get string of first */
size= atoi(str); /* read the size of file--> FIRST LINE of file*/
printf("SIZE: %d\n",size);
bigArr= malloc(size*sizeof(int));
}
else{
//printf("[%d]= %s\n",count-1, buffer); /*reads in the rest of the file*/
bigArr[count-1]= atoi(buffer);
}
count++;
}
}
/*thread the unsorted array*/
pthread_t tid[6]; /* the thread identifier */
pthread_attr_t attr; /* set of thread attributes */
// qsort((void*)bigArr, size, sizeof(bigArr[0]), _comp); <---- sorts array without threading
for(i=0; i<6;i++){
pthread_create(&tid[i], NULL, &threadFct, i);
pthread_join(tid[i], NULL);
}
printf("Sorted array:\n");
for(i=0; i<size;i++){
printf("%i \n",bigArr[i]);
}
fclose(source);
}
So to clarify the problem function is in my threadFct().
To explain what the function is doing: the param (thread number) identifies which chunk of the array to quicksort. I divide the size into 6 parts and, because the division is not always exact, the leftover numbers go into the last chunk. So for example, with 1,000,000 integers the first five chunks would hold 166,666 each and the last chunk would hold the remainder (166,670).
I am aware that
Multi-threading will not speed up much at all even for 10 million integers
This is not the most efficient way to find the median/mode
Thanks for reading this and any help is received with gratitude.
You're sorting the beginning of the array in every call to qsort. You're only changing the number of elements that each thread sorts, by setting x. You're also setting x to the same value in threads 0 and 1.
You need to calculate an offset into the array for each thread, which is just size/6 * param. The number of elements will be size/6 except for the last chunk, which uses a modulus to get the remainder.
As mentioned in the comments, the argument to the thread function should be a pointer, not int. You can hide an integer in the pointer, but you need to use explicit casts.
void *threadFct(void* param_ptr)
{
int param = (int)(intptr_t)param_ptr; /* intptr_t comes from <stdint.h> */
int start = size/6 * param;
int length;
if (param < 5) {
length = size/6;
} else {
length = size - 5 * (size/6);
}
qsort((void*)(bigArr+start), length, sizeof(*bigArr), _comp);
pthread_exit(0);
}
and later
pthread_create(&tid[i], NULL, &threadFct, (void*)(intptr_t)i); /* intptr_t comes from <stdint.h> */
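To make the offset arithmetic concrete, here is a minimal sketch of the same idea. The names chunk_t, chunk_bounds, sort_chunk and sort_in_chunks are made up for this example and do not come from the question. It passes a pointer to a small struct instead of hiding an integer in the pointer, and it starts all threads before joining any of them, whereas the loop in the question joins each thread right after creating it, so the sorts run one after another. Each chunk ends up sorted on its own; merging the chunks is still a separate step.

```c
#include <pthread.h>
#include <stdlib.h>

#define NUM_CHUNKS 6

/* Hypothetical helper type: one contiguous slice of the shared array. */
typedef struct {
    int *base;      /* first element of the slice */
    size_t length;  /* number of elements in the slice */
} chunk_t;

/* Comparison for qsort that cannot overflow, unlike plain subtraction. */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Slice idx starts at idx * (size / nchunks); the last slice also takes the remainder. */
static chunk_t chunk_bounds(int *array, size_t size, size_t nchunks, size_t idx)
{
    size_t per = size / nchunks;
    chunk_t c;
    c.base = array + idx * per;
    c.length = (idx + 1 < nchunks) ? per : size - idx * per;
    return c;
}

/* Thread entry point: sorts only the slice it was given. */
static void *sort_chunk(void *arg)
{
    chunk_t *c = arg;
    qsort(c->base, c->length, sizeof *c->base, cmp_int);
    return NULL;
}

/* Sorts each slice in its own thread. Returns 0 on success, -1 if a thread could not be created. */
static int sort_in_chunks(int *array, size_t size)
{
    pthread_t tid[NUM_CHUNKS];
    chunk_t chunks[NUM_CHUNKS];
    size_t i, started = 0;
    int rc = 0;

    /* Start every thread first ... */
    for (i = 0; i < NUM_CHUNKS; i++) {
        chunks[i] = chunk_bounds(array, size, NUM_CHUNKS, i);
        if (pthread_create(&tid[i], NULL, sort_chunk, &chunks[i]) != 0) {
            rc = -1;
            break;
        }
        started++;
    }
    /* ... and only then wait for the ones that were started. */
    for (i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    return rc;
}
```

In the program above this would be called as sort_in_chunks(bigArr, size) once the whole file has been read.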
I am trying to write MPI C code that repeatedly performs a calculation and saves its outcome into a single array for outputting less frequently. Example code below (the size of var, 200, is sufficient for the number of CPUs in use):
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
int main(int argc, char **argv){
float *phie, *phitemp, var[200];
int time=0, gatherphi=10, gatherfile = 200, j, iter=0, lephie, x;
int nelecrank = 2, size, rank, Tmax = 2000;
FILE *out;
MPI_Init(&argc, &argv) ;
MPI_Comm_size(MPI_COMM_WORLD, &size);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
lephie = gatherfile/gatherphi; // number of runs of calculation before output
// allocate memory
//printf("%d Before malloc.\n", rank);
if (rank==1) phie=(float *) malloc(nelecrank*size*gatherfile/gatherphi*sizeof(float));
phitemp=(float *) malloc(nelecrank*sizeof(float));
//printf("%d After malloc.\n", rank);
for(j=0;j<200;j++) var[j]=rank;
for(time=0;time<Tmax;time++){
if (!time%gatherphi) {// do calculation
for (j=0;j<nelecrank;j++) { // each processor does the calculation nelecrank times
phitemp[j]=0;
for (x = 0; x<=4; x++) {
phitemp[j]=phitemp[j]+var[rank+j*size];
}
}
} // end of j for loop
printf("iter: %d, %d Before MPI_Gather.\n", iter, rank);
MPI_Gather(phitemp, nelecrank, MPI_FLOAT, phie+iter*nelecrank*size*sizeof(float), nelecrank, MPI_FLOAT, 1, MPI_COMM_WORLD);
iter++;
} // end of gatherphi condition
if (time % gatherfile) { //output result of calculation
iter=0;
if (rank==1) {
out = fopen ("output.txt", "wt+");
if (out == NULL) {
printf("Could not open output file.\n");
exit(1);
}
else printf("Have opened output file.\n");
for (j=0;j<nelecrank*size*lephie;j++) {
fprintf(out,"%f ",*(phie+j*sizeof(float)));
}
fclose(out);
}
} // end of file output
if (rank==1) {
if (phie) free (phie);
}
if (phitemp) free (phitemp);
MPI_Finalize();
return 0;
}
It gives me repeated memory allocation problems until it finally exits. I am not experienced using memory allocation in MPI - can you help?
Many thanks,
Marta
Basically, phie is not big enough.
You malloced nelecrank*size*gatherfile/gatherphi*sizeof(float)=80*sizeof(float) memory for phie.
But, your MPI_Gather usage requires iter*nelecrank*size*sizeof(float)+nelecrank*size*sizeof(float) memory. iter takes a maximum value of nelecrank*Tmax-1, so phie needs to be (nelecrank*Tmax-1)*nelecrank*size*sizeof(float)+nelecrank*size*sizeof(float), which is around 8000*size*sizeof(float) large.
Maybe you want to reset iter=0 in your loop somewhere?
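One further thing worth checking in the code above: phie is a float *, and pointer arithmetic in C is already scaled by the element size. The expression phie+iter*nelecrank*size*sizeof(float) therefore steps sizeof(float) times too far, and the same applies to *(phie+j*sizeof(float)) in the output loop. A minimal sketch of the indexing, using two hypothetical helpers (gather_slot and gather_capacity) that are not part of the original code:

```c
#include <stddef.h>

/*
 * Hypothetical helper: address where the data gathered in round `iter`
 * should start. Each round stores per_rank * nranks floats.
 * There is no sizeof(float) factor: adding n to a float * already
 * advances by n floats, not by n bytes.
 */
static float *gather_slot(float *base, size_t iter, size_t per_rank, size_t nranks)
{
    return base + iter * per_rank * nranks;
}

/* Number of floats the receive buffer needs to hold `rounds` gathers. */
static size_t gather_capacity(size_t rounds, size_t per_rank, size_t nranks)
{
    return rounds * per_rank * nranks;
}
```

With these, the receive buffer passed to MPI_Gather would be gather_slot(phie, iter, nelecrank, size), and the writes stay inside the allocation as long as iter is reset before it reaches lephie.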
I am trying to program an MPI_Alltoallv using an MPI Derived datatype using MPI_Type_create_struct. I could not find any examples solving this particular problem. Most examples like this perform communication(Send/Recv) using a single struct member, whereas I am targeting an array of structs. Following is a simpler test code that attempts a MPI_Sendrecv operation on an array of structs created using DDT:
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#include <stddef.h>
typedef struct sample{
char str[12];
int count;
}my_struct;
int main(int argc, char **argv)
{
int rank, count;
my_struct *sbuf = (my_struct *) calloc (sizeof(my_struct),5);
my_struct *rbuf = (my_struct *) calloc (sizeof(my_struct),5);
int blens[2];
MPI_Aint displs[2];
MPI_Aint baseaddr, addr1, addr2;
MPI_Datatype types[2];
MPI_Datatype contigs[5];
MPI_Status status;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
strcpy(sbuf[0].str,"ACTGCCAATTCG");
sbuf[0].count = 10;
strcpy(sbuf[1].str,"ACTGCCCATACG");
sbuf[1].count = 5;
strcpy(sbuf[2].str,"ACTGCCAATTTT");
sbuf[2].count = 6;
strcpy(sbuf[3].str,"CCTCCCAATTCG");
sbuf[3].count = 12;
strcpy(sbuf[4].str,"ACTATGAATTCG");
sbuf[4].count = 8;
blens[0] = 12; blens[1] = 1;
types[0] = MPI_CHAR; types[1] = MPI_INT;
for (int i=0; i<5; i++)
{
MPI_Get_address ( &sbuf[i], &baseaddr);
MPI_Get_address ( &sbuf[i].str, &addr1);
MPI_Get_address ( &sbuf[i].count, &addr2);
displs[0] = addr1 - baseaddr;
displs[1] = addr2 - baseaddr;
MPI_Type_create_struct(2, blens, displs, types, &contigs[i]);
MPI_Type_commit(&contigs[i]);
}
/* send to ourself */
MPI_Sendrecv(sbuf, 5, contigs, 0, 0,
rbuf, 5, contigs, 0, 0,
MPI_COMM_SELF, &status);
for (int i=0; i<5; i++)
MPI_Type_free(&contigs[i]);
MPI_Finalize();
return 0;
}
I get the following warning at compile time:
coll.c(53): warning #810: conversion from "MPI_Datatype={int} *" to "MPI_Datatype={int}" may lose significant bits
MPI_Sendrecv(sbuf, 5, contigs, 0, 0,
^
coll.c(54): warning #810: conversion from "MPI_Datatype={int} *" to "MPI_Datatype={int}" may lose significant bits
rbuf, 5, contigs, 0, 0,
And observe the following error across all processes:
Rank 0 [Thu Jun 16 16:19:24 2016] [c0-0c2s9n1] Fatal error in MPI_Sendrecv: Invalid datatype, error stack:
MPI_Sendrecv(232): MPI_Sendrecv(sbuf=0x9ac440, scount=5, INVALID DATATYPE, dest=0, stag=0, rbuf=0x9ac4a0, rcount=5, INVALID DATATYPE, src=0, rtag=0, MPI_COMM_SELF, status=0x7fffffff6780) failed
Not sure what I am doing wrong. Do I need to additionally use "MPI_Type_create_resized" to register the "extent"? If so, an example based on the above scenario would really help.
Also my main goal is to perform "MPI_Alltoallv" using a similar array of structs (of size ~ several thousands). Hopefully if I can get the SendRecv to work I can move on to "MPI_Alltoallv".
Any help would be highly appreciated.
The sendtype and recvtype parameters each expect a single MPI_Datatype. What you're passing in is an array of these, i.e. an MPI_Datatype *, which is exactly what the compiler warning is telling you.
You can only pass one of these array elements at a time to this function, for example contigs[0].
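It may also help to see why one datatype is enough here: the displacements computed inside the loop do not depend on i, because every element of an array of structs has the same layout. The sketch below checks this in plain C, without MPI. field_displs and element_stride are hypothetical helpers that mirror the address subtraction from the question; they are not MPI functions.

```c
#include <stddef.h>

typedef struct sample {
    char str[12];
    int count;
} my_struct;

/*
 * Hypothetical helper mirroring the loop in the question: byte offsets of
 * the two members relative to the start of element i of the array.
 */
static void field_displs(const my_struct *arr, size_t i, ptrdiff_t displs[2])
{
    const char *base = (const char *)&arr[i];
    displs[0] = (const char *)&arr[i].str - base;
    displs[1] = (const char *)&arr[i].count - base;
}

/* Byte distance between two consecutive array elements. */
static ptrdiff_t element_stride(const my_struct *arr)
{
    return (const char *)&arr[1] - (const char *)&arr[0];
}
```

Because the offsets are the same for every i, a single call to MPI_Type_create_struct followed by one MPI_Type_commit should be enough, and that one type can then be used with a count of 5. The stride is sizeof(my_struct), which is the value the extent of the committed type has to match when more than one element is sent.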
I'm supposed to have two threads that search for the minimum element in an array: the first one searches the first half, and the second thread searches the other half. However, when I run my code, it seems that it chooses a thread randomly. I'm not sure what I'm doing wrong, but it probably has to do with the "mid" part. I tried dividing an array into two, finding the midpoint and then writing the conditions from there, but I probably went wrong somewhere. I also tried putting array[i] in the conditions, but in that case only thread2 executes.
EDIT: I'm really trying my best here, but I'm not getting anywhere. I edited the code in a way that made sense to me, and I probably typecasted "min" wrong but now it doesn't even execute it just gives me an error, even though it compiles just fine. I'm just a beginner, and while I do understand everything you guys are talking about, I have a hard time implementing the ideas, so really, any help with fixing this is appreciated!
EDIT2: Okay so the previous code made no sense at all, I do apologize but I was exhausted while writing it. Anyway, I came up with something else that works partially! I split the array into two halves, however only the first element is accessible when using the pointer. But would it work if the whole array was being accessed and if so how can I fix that then?
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <string.h>
#define size 20
void *smallest(void *arg);
pthread_t th, th2;
int array[size], i, min;
int main(int argc, char *argv[]) {
srand ( time(NULL) );
for(i = 0; i < size; i++)
{
array[i] = (rand() % 100)+1;
printf("%d ", array[i]);
}
int *array1 = malloc(10 * sizeof(int));
int *array2 = malloc(10 * sizeof(int));
memcpy(array1, array, 10 * sizeof(int));
memcpy(array2, array + 10, 10 * sizeof(int));
printf("\nFirst half gives %d \n", *array1);
printf("Second half gives %d \n", *array2);
pthread_create(&th, NULL, smallest, (void*) array1);
pthread_create(&th2, NULL, smallest, (void*) array2);
pthread_join(th, NULL);
pthread_join(th2, NULL);
//printf("\nFirst half gives %d\n", array1);
//printf("Second half gives %d\n", array2);
if (*array1 < *array2) {
printf("\nThread1 finds the number\n");
printf("The smallest element is %i\n", *array1);
}
else {
printf("\nThread2 finds the number\n");
printf("The smallest element is %i\n", *array2);
}
return 0;
}
void *smallest(void* arg){
int *array = (int*)arg;
min = array[0];
for (i = 0; i < size; i++) {
if (array[i] < min) {
min = array[i];
}
}
pthread_exit(NULL);
}
The code you've set up never runs more than one thread. Notice that if you run the first branch of the if statement, you fire off one thread to search half the array, wait for it to finish, then continue onward, and if the else branch executes, the same thing happens in the second half of the array. Fundamentally, you probably want to rethink your strategy here by having the code always launch two threads and join each of them only after both threads have started running.
The condition within your if statement also seems like it's mistaken. You're asking whether the middle element of the array is greater than its index. I assume this isn't what you're trying to do.
Finally, the code you have in each thread always looks at the entire array, not just a half of it. I would recommend rewriting the thread routine so that its argument represents the start and end indices of the range to take the minimum of. You would then update the code in main so that when you fire off the thread, you specify which range to search.
I would structure things like this:
Fire off a thread to find the minimum of the first half of the array.
Fire off a thread to find the minimum of the second half of the array.
Join both threads.
Use the results from each thread to find the minimum.
As one final note, since you'll have two different threads each running at the same time, you'll need to watch for data races as both threads try to read or write the minimum value. Consider having each thread use its exit code to signal where the minimum is and then resolving the true minimum back in main. This eliminates the race condition. Alternatively, have one global minimum value, but guard it with a mutex.
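The structure described above can be sketched roughly as follows. The names range_t, range_min and parallel_min are invented for this example. Instead of the exit code, each thread writes its result into its own slot of the argument struct, which is another way to avoid sharing a single min variable between the two threads.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical per-thread argument: a half-open range [begin, end) plus a result slot. */
typedef struct {
    const int *begin;
    const int *end;
    int min;          /* written only by the thread that owns this struct */
} range_t;

/* Thread entry point: finds the minimum of its own range only. */
static void *range_min(void *arg)
{
    range_t *r = arg;
    const int *p;

    r->min = *r->begin;
    for (p = r->begin + 1; p < r->end; p++)
        if (*p < r->min)
            r->min = *p;
    return NULL;
}

/* Searches both halves concurrently; n must be at least 2. Returns 0 on success. */
static int parallel_min(const int *array, size_t n, int *out)
{
    pthread_t t1, t2;
    range_t lo = { array, array + n / 2, 0 };
    range_t hi = { array + n / 2, array + n, 0 };

    /* Fire off both threads before joining either of them. */
    if (pthread_create(&t1, NULL, range_min, &lo) != 0)
        return -1;
    if (pthread_create(&t2, NULL, range_min, &hi) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    /* Both threads are finished, so reading their results here is safe. */
    *out = lo.min < hi.min ? lo.min : hi.min;
    return 0;
}
```

No mutex is needed in this variant because the two threads never write to the same memory; main combines the two results only after both joins have returned.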
1) You're redeclaring the global variables in the main function, so there's actually no point in declaring i, low, high, min globally:
int array[size], i, low, high, min;
The problem you're having is with the scope of the variables: when you redeclare them in the main function, the global ones with the same name become "invisible":
int *low = array;
int *high = array + (size/2);
int mid = (*low + *high) / 2;
So when the threads run, the global variables (low, high, min) are all 0, because main never actually modifies them and globals are zero-initialized by default.
Anyway, I wouldn't really recommend using global variables (it's really frowned upon) unless it's a really small project for personal use.
2) Another crucial problem is that you're ignoring the main idea behind threads, which is running both simultaneously:
if (array[mid] > mid) {
pthread_create(&th, NULL, &smallest, NULL);
pthread_join(th, NULL);
printf("\nThread1 finds the number\n");
}
else if (array[mid] < mid) {
pthread_create(&th2, NULL, &smallest, NULL);
pthread_join(th2, NULL);
printf("\nThread2 finds the number\n");
}
You're actually only running one thread when executing.
Try something like this:
pthread_create(&th, NULL, &smallest, NULL);
pthread_create(&th2, NULL, &smallest, NULL);
pthread_join(th2, NULL);
pthread_join(th, NULL);
3) You are trying to have two threads access the same variable, which can result in undefined behaviour. You MUST use a mutex so that a value written by one thread is not lost.
This guide is pretty complete regarding mutexes, but if you need any help please let me know.
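For the mutex route, a minimal sketch could look like the following. The names shared_min, min_lock, slice_t, update_min and min_with_mutex are hypothetical and chosen only for this example. Each thread scans its own half into a local variable and takes the lock just once, at the end, to update the shared minimum.

```c
#include <limits.h>
#include <pthread.h>
#include <stddef.h>

/* Shared result and the mutex that guards every access to it from the workers. */
static int shared_min = INT_MAX;
static pthread_mutex_t min_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical per-thread argument: where the slice starts and how long it is. */
typedef struct {
    const int *data;
    size_t len;
} slice_t;

static void *update_min(void *arg)
{
    const slice_t *s = arg;
    int local = INT_MAX;
    size_t i;

    /* Scan privately first so the lock is taken only once per thread. */
    for (i = 0; i < s->len; i++)
        if (s->data[i] < local)
            local = s->data[i];

    pthread_mutex_lock(&min_lock);
    if (local < shared_min)
        shared_min = local;
    pthread_mutex_unlock(&min_lock);
    return NULL;
}

/*
 * Returns the minimum of array[0..n-1] (INT_MAX if n is 0).
 * If a thread cannot be started its half is skipped; a real program
 * should report that error instead.
 */
static int min_with_mutex(const int *array, size_t n)
{
    pthread_t tid[2];
    slice_t s[2];
    int started[2] = { 0, 0 };
    int i;

    s[0].data = array;
    s[0].len = n / 2;
    s[1].data = array + n / 2;
    s[1].len = n - n / 2;

    shared_min = INT_MAX;  /* no worker is running yet, so no lock is needed here */
    for (i = 0; i < 2; i++)
        started[i] = (pthread_create(&tid[i], NULL, update_min, &s[i]) == 0);
    for (i = 0; i < 2; i++)
        if (started[i])
            pthread_join(tid[i], NULL);

    return shared_min;     /* all workers have been joined, so this read is safe */
}
```

Locking inside the loop on every comparison would also be correct, but it would make the two threads wait on each other most of the time.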
This is a single threaded version of what you are asking.
#include <stdio.h>
#include <stdlib.h>
/*
I can not run pthread on my system.
So this is some code that should kind of work the same way
*/
typedef int pthread_t;
typedef int pthread_attr_t;
typedef void*(*threadfunc)(void*);
int pthread_create(pthread_t *thread, const pthread_attr_t *attr, void *(*start_routine)(void*), void *arg)
{
start_routine(arg);
return 0;
}
int pthread_join(pthread_t thread, void **value_ptr)
{
return 0;
}
typedef struct context
{
int* begin;
int* end;
int* result;
} context; //typedef so that plain "context" also works when compiled as C
//the function has to be castable to the threadfunction type
//that way you do not have to worry about casting the argument.
//be careful though - if something does not match these errors may be hard to track
void * smallest(context * c) //signature needed for start routine
{
c->result = c->begin;
for (int* current = c->begin; current < c->end; ++current)
{
if (*current < *c->result)
{
c->result = current;
}
}
return 0; // not needed with the way the argument is set up.
}
int main(int argc, char *argv[])
{
pthread_t t1, t2;
#define size 20
int array[size];
srand(0);
for (int i = 0; i < size; ++i)
{
array[i] = (rand() % 100) + 1;
printf("%d ", array[i]);
}
//prepare data
//one surefire way of messing up in multithreading is sharing data between threads.
//even a simple approach like storing in a variable who is accessing will not solve the issues
//to properly lock data you would have to dive into the memory model.
//either lock with mutexes or memory barriers or just don't share data between threads.
context c1;
context c2;
c1.begin = array;
c1.end = array + (size / 2);
c2.begin = c1.end; //end is exclusive, so the second half starts exactly at c1.end
c2.end = array + size;
//start threads - here your threads would go
//note the casting - you may want to wrap this in its own function
//there is error potential here, especially due to maintenance etc...
pthread_create(&t1, 0, (void*(*)(void*))smallest, &c1); //without typedef
pthread_create(&t2, 0, (threadfunc)smallest, &c2); //with typedef
pthread_join(t1, 0);//instead of zero you could have a return value here
pthread_join(t2, 0);//as far as i read 0 throws the return value away
//return value could be useful for error handling
//evaluate
if (*c1.result < *c2.result)
{
printf("\nThread1 finds the number\n");
printf("The smallest element is %i\n", *c1.result);
}
else
{
printf("\nThread2 finds the number\n");
printf("The smallest element is %i\n", *c2.result);
}
return 0;
}
Edit:
I edited some stubs in to give you an idea of how to use multithreading.
I never used pthread but this should likely work.
I used this source for prototype information.
I'm trying to read in multiple files using MPI-IO in C. I'm following this example : http://users.abo.fi/Mats.Aspnas/PP2010/examples/MPI/readfile1.c
However, I'm reading in a matrix of doubles instead of a string of chars. Here is that implementation:
/*
Simple MPI-IO program that demonstrate parallel reading from a file.
Compile the program with 'mpicc -O2 readfile1.c -o readfile1'
*/
#include <stdlib.h>
#include <stdio.h>
#include "mpi.h"
#define FILENAME "filename.dat"
double** ArrayAllocation() {
int i;
double** array2D;
array2D= (double**) malloc(num_procs*sizeof(double*));
for(i = 0; i < num_procs; i++) {
array2D[i] = (double*) malloc(column_size*sizeof(double));
}
return array2D;
}
int main(int argc, char* argv[]) {
int i, np, myid;
int bufsize, nrchar;
double *buf; /* Buffer for reading */
double **matrix = ArrayAllocation();
MPI_Offset filesize;
MPI_File myfile; /* Shared file */
MPI_Status status; /* Status returned from read */
/* Initialize MPI */
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &myid);
MPI_Comm_size(MPI_COMM_WORLD, &np);
/* Open the files */
MPI_File_open (MPI_COMM_WORLD, FILENAME, MPI_MODE_RDONLY,
MPI_INFO_NULL, &myfile);
/* Get the size of the file */
MPI_File_get_size(myfile, &filesize);
/* Calculate how many elements that is */
filesize = filesize/sizeof(double);
/* Calculate how many elements each processor gets */
bufsize = filesize/np;
/* Allocate the buffer to read to, one extra for terminating null char */
buf = (double *) malloc((bufsize)*sizeof(double));
/* Set the file view */
MPI_File_set_view(myfile, myid*bufsize*sizeof(double), MPI_DOUBLE,
MPI_DOUBLE,"native", MPI_INFO_NULL);
/* Read from the file */
MPI_File_read(myfile, buf, bufsize, MPI_DOUBLE, &status);
/* Find out how many elements were read */
MPI_Get_count(&status, MPI_DOUBLE, &nrchar);
/* Set terminating null char in the string */
//buf[nrchar] = (double)0;
printf("Process %2d read %d characters: ", myid, nrchar);
int j;
for (j = 0; j <bufsize;j++){
matrix[myid][j] = buf[j];
}
/* Close the file */
MPI_File_close(&myfile);
if (myid==0) {
printf("Done\n");
}
MPI_Finalize();
exit(0);
}
However when I try to call MPI_File_open after I close the first file, I get an error. Do I need multiple communicators to perform this? Any tips will be appreciated.
The code in ArrayAllocation above does not quite match the logic of the main program. The matrix is allocated as an array of pointers to vectors of doubles before MPI is initialized, therefore it is impossible to set the number of rows to the number of MPI processes.
The column_size is also not known until the file size is determined.
It is a general convention in the C language to store matrices by rows. Violating this convention might confuse you or the reader of your code.
All in all, in order to get this program working, you need to declare
int num_procs, column_size;
as global variables before the definition of ArrayAllocation and move the call to this function down below the line where bufsize is calculated:
...
/* Calculate how many elements each processor gets */
bufsize = filesize/np;
num_procs = np;
column_size = bufsize;
double **matrix = ArrayAllocation();
...
With the above modifications this example should work on any MPI implementation that supports MPI-IO. I have tested it with OpenMPI 1.2.8.
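If you would rather avoid the global variables, the sizes can instead be passed as parameters once they are known. A minimal sketch, where alloc_matrix and free_matrix are hypothetical names and not part of the original program:

```c
#include <stdlib.h>

/*
 * Allocates a rows x cols matrix of doubles as an array of row pointers,
 * zero-initialized. rows and cols are assumed to be non-zero.
 * Returns NULL if any allocation fails.
 */
static double **alloc_matrix(size_t rows, size_t cols)
{
    size_t i;
    double **m = malloc(rows * sizeof *m);

    if (m == NULL)
        return NULL;
    for (i = 0; i < rows; i++) {
        m[i] = calloc(cols, sizeof **m);
        if (m[i] == NULL) {
            /* Undo the rows allocated so far before giving up. */
            while (i > 0)
                free(m[--i]);
            free(m);
            return NULL;
        }
    }
    return m;
}

/* Releases a matrix obtained from alloc_matrix. */
static void free_matrix(double **m, size_t rows)
{
    size_t i;

    if (m == NULL)
        return;
    for (i = 0; i < rows; i++)
        free(m[i]);
    free(m);
}
```

In main this would be called as alloc_matrix(np, bufsize), placed after the line where bufsize is calculated, with a matching free_matrix(matrix, np) before MPI_Finalize.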
In order to generate a test file you could use for instance the following code:
FILE* f = fopen(FILENAME,"wb"); /* binary mode, since raw doubles are written */
double x = 0;
for(i=0;i<100;i++){
fwrite(&x, 1,sizeof(double), f);
x +=0.1;
}
fclose(f);