My main question is: how can I associate the members of a custom-defined type/struct with the memory allocated via MPI_Win_allocate_shared(size,disp,...,&baseptr, &win)? Help in either C or Fortran is appreciated! Below I have included a sketch of what I want to do in both C and Fortran.
An example in C is roughly as follows:
struct MyStruct{
int * ptr_int;
double * ptr_dble;
};
int main(){
int n1,n2,n3;
struct MyStruct * data;
// I am looking to use MPI to allocate a struct equivalent to the following:
// data = calloc(n3,sizeof(struct MyStruct))
// for (int i=0;i<n3;i++) {
// data[i].ptr_int = calloc(n1,sizeof(int));
// data[i].ptr_dble = calloc(n2,sizeof(double));}
int w_size,w_rank;
MPI_Init(NULL,NULL);
MPI_Comm_size(MPI_COMM_WORLD,&w_size);
MPI_Comm_rank(MPI_COMM_WORLD,&w_rank);
MPI_Win win;
MPI_Aint size;
void * baseptr;
if (w_rank==0){
size = n3*(sizeof(int)*n1 + sizeof(double)*n2);
MPI_Win_allocate_shared(size,1,MPI_INFO_NULL,MPI_COMM_WORLD,&baseptr,&win);
// Question: how to associate struct * data with win, baseptr?
// Can &win then be initialized by calling data[i].ptr_int[j] = ...?
}else{
MPI_Win_shared_query(...);
// Question: again, how to associate struct * data with win, baseptr?
}
}
Equivalently, an example in Fortran is as follows:
type MyStruct
integer, allocatable :: ptr_int(:)
real, allocatable :: ptr_dble(:)
end type
program main
use mpi
use, intrinsic :: iso_c_binding, only: c_ptr
implicit none
integer :: n1,n2,n3
type(MyStruct), allocatable :: data(:)
integer :: w_rank, w_size, ierr
integer :: win
integer(kind=MPI_ADDRESS_KIND) :: size
type(c_ptr) :: baseptr
call mpi_init(ierr)
call mpi_comm_size(mpi_comm_world,w_size,ierr)
call mpi_comm_rank(mpi_comm_world,w_rank,ierr)
if (w_rank==0) then
size = n3*(sizeof(int)*n1 + sizeof(double)*n2)
call mpi_win_allocate_shared(size,1,MPI_INFO_NULL,MPI_COMM_WORLD,baseptr,win)
! Question: how to associate data with win, baseptr?
! Can win then be initialized by calling data(i)%ptr_int(j) = ...?
else
call mpi_win_shared_query(...);
! Question: again, how to associate type(mystruct) data with win, baseptr?
endif
end program main
I solved the C part of the problem, thanks to a similar question posted earlier at: MPI-3 Shared Memory for Array Struct. I still need to implement it in Fortran, which is more relevant to my current work.
The key aspect is that one can define a pointer to the struct in each MPI process, and use pointer arithmetic to associate the shared memory with the data structure. A complete C solution is given as follows:
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
struct product{
int nint;
int ndble;
int * ptr_int;
double * ptr_dble;
};
int main(){
int n1,n2,nproduct;
n1 = 3;
n2 = 4;
nproduct = 2;
struct product * tmp = calloc(nproduct,sizeof(struct product));
// Initiate MPI
int world_size,world_rank;
int disp_unit;
MPI_Win win;
MPI_Aint size;
void * baseptr;
MPI_Init(NULL,NULL);
MPI_Comm_size(MPI_COMM_WORLD,&world_size);
MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
printf("Hello world from rank %d out of %d processors\n",world_rank,world_size);
if (world_rank==0){
size = (sizeof(int)*n1 + sizeof(double)*n2) * nproduct;
disp_unit = 1;
MPI_Win_allocate_shared(size,disp_unit,MPI_INFO_NULL,MPI_COMM_WORLD,&baseptr, &win);
printf("Success allocation\n");
}
else{
MPI_Win_allocate_shared(0,1,MPI_INFO_NULL,MPI_COMM_WORLD,&baseptr,&win);
MPI_Win_shared_query(win,0,&size,&disp_unit,&baseptr);
printf("Success query\n");
}
for (int i=0;i<nproduct;i++){
tmp[i].nint = n1;
tmp[i].ndble = n2;
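// give each element its own slice of the window instead of aliasing the first one
char * slice = (char *) baseptr + i * (sizeof(int)*n1 + sizeof(double)*n2);
tmp[i].ptr_int = (int *) slice;
tmp[i].ptr_dble = (double *) (slice + sizeof(int)*n1);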
}
if (world_rank==0){
//MPI_Win_lock(MPI_LOCK_EXCLUSIVE,0,MPI_MODE_NOCHECK,win);
for (int i =0;i<nproduct;i++){
// initialize data stored in win via tmp
for (int j =0;j<n1;j++){
tmp[i].ptr_int[j] = j;
}
for (int j=0;j<n2;j++){
tmp[i].ptr_dble[j] = 2*(j-3);
}
}
//MPI_Win_unlock(0,win);
}
MPI_Barrier(MPI_COMM_WORLD);
// test
if (world_rank==1){
for (int j =0;j<n1;j++){
printf("%d ",tmp[1].ptr_int[j]);
}
printf("\n");
for (int j=0;j<n2;j++){
printf("%f ",tmp[1].ptr_dble[j]);
}
printf("\n");
}
MPI_Win_free(&win);
MPI_Finalize();
}
I've been setting up a four-node MPI cluster with Raspberry Pis. As far as I can tell, I am down to one final major issue: how to send an array of structs from each worker to the manager. I have cropped the code down to the version below, but this could take a few tries, as I might have cropped too much. I still get the same error with it (a seg fault, saying an address is not mapped), so apologies if there's a bit of back and forth.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stddef.h> // offsetof
#include <mpi.h>
struct ticknrank
{
char * ticker;
int errors;
int rank;
};
int main() //Designed for one master, three slaves
{
// i am under the impression the problem lies somewhere in this beginning section, before the commit.
int my_id;
MPI_Init(NULL,NULL);
MPI_Comm_rank(MPI_COMM_WORLD, &my_id);
MPI_Status status;
MPI_Datatype types[3] = {MPI_CHAR,MPI_INT,MPI_INT};
MPI_Datatype MPI_ticknrank, MPI_tmp;
int blocklengths[3] ={8,1,1};
MPI_Aint offsets[3];
offsets[0] = offsetof(struct ticknrank,ticker);
offsets[1] = offsetof(struct ticknrank,errors);
offsets[2]= offsetof(struct ticknrank,rank);
MPI_Aint lb, extent;
MPI_Type_create_struct(3,blocklengths, offsets, types, &MPI_tmp);
MPI_Type_get_extent(MPI_tmp, &lb, &extent);
MPI_Type_create_resized(MPI_tmp, lb, extent, &MPI_ticknrank);
MPI_Type_commit(&MPI_ticknrank);
// NOTE: sizeof(ticknrank) = 12, while MPI_Type_size(ticknrank) = 16. Not sure what to do about that.
if(my_id == 0) // meaning this process is a host job
{
//NOTE: NodethrRes and fou can be ommitted, I was just lazy and didn't wanna delete them
//on my cluster.
int length = 2;
struct ticknrank * NodeTwoRes = (struct ticknrank *)malloc(length * sizeof(struct ticknrank));
struct ticknrank * NodeThrRes = (struct ticknrank *)malloc(length * sizeof(struct ticknrank));
struct ticknrank * NodeFouRes = (struct ticknrank *)malloc(length * sizeof(struct ticknrank));
MPI_Recv(NodeTwoRes, length, MPI_ticknrank,1,MPI_ANY_TAG, MPI_COMM_WORLD, &status);
MPI_Recv(NodeThrRes, length, MPI_ticknrank,2,MPI_ANY_TAG, MPI_COMM_WORLD, &status);
MPI_Recv(NodeFouRes, length, MPI_ticknrank,3,MPI_ANY_TAG, MPI_COMM_WORLD, &status);
printf("%s\n", NodeTwoRes[0].ticker);
}
else
{
int myLen = 2;
struct ticknrank * results = malloc(myLen * sizeof(struct ticknrank));
results[0].ticker = strdup("FIRST");
results[0].rank = 4;
results[0].errors = 7;
results[1].ticker = strdup("SECON");
results[1].rank = 3;
results[1].errors = 15;
MPI_Send(results,myLen,MPI_ticknrank,0,1,MPI_COMM_WORLD);
}
MPI_Type_free(&MPI_ticknrank);
MPI_Finalize();
return 0;
}
The C struct has a char * ticker member (a pointer, which is 4 bytes on a 32-bit system), but the derived datatype describes a char ticker[8], which is indeed 8 bytes. The receiver therefore gets a pointer value that is meaningless in its own address space.
If you want to send multiple struct ticknrank in one shot, then the data must be in contiguous memory, which means moving from char * ticker to char ticker[8] and replacing strdup() with strcpy() (it is up to you to make sure there is no buffer overflow).
I am implementing a way to transfer a set of data to a programmable dongle. The dongle is based on smart card technology and can execute arbitrary code inside. The input and output data are passed as binary blocks that can be accessed via input and output pointers.
I would like to use an associative array to simplify the data processing code. Everything should work this way:
First the host application:
// Host application in C++
in_data["method"] = "calc_r";
in_data["id"] = 12;
in_data["loc_a"] = 56.19;
in_data["loc_l"] = 44.02;
processor->send(in_data);
Next the code inside the dongle:
// Some dongle function in C
char* method_name = assoc_get_string(in_data, "method");
int id = assoc_get_int(in_data, "id");
float loc_a = assoc_get_float(in_data, "loc_a");
float loc_l = assoc_get_float(in_data, "loc_l");
So my question is about the dongle-side functionality. Is there C code or a library that implements associative array behavior like the above?
GLib's hash table implements a map interface (an associative array).
It is most likely the most used hash table implementation for C.
GHashTable *table=g_hash_table_new(g_str_hash, g_str_equal);
/* put */
g_hash_table_insert(table,"SOME_KEY","SOME_VALUE");
/* get */
gchar *value = (gchar *) g_hash_table_lookup(table,"SOME_KEY");
My suspicion is that you would have to write your own. If I understand the architecture you are describing, then you will need to send the entire chunk of data in a single piece. If so, then most libraries will not work for that because they will most likely be allocating multiple pieces of memory, which would require multiple transfers (and an inside understanding of the structure). It would be similar to trying to use a library hash function and then sending its contents over the network on a socket just by passing the root pointer to the send function.
It would be possible to write some utilities of your own that manage a very simple associative array (or hash) in a single block of memory. If the amount of data is small, it could use a simple linear search for the entries and would be a fairly compact bit of code.
Try uthash, a header library implementing a hash table in C. It's small and fairly easy to use.
This is an old thread, but I thought this might still be useful for anyone out there looking for an implementation. It doesn't take too much code; I did mine in ~100 lines without any extra library. I called it a dictionary since it parallels (sort of) the Python datatype. Here is my code:
#include <stdlib.h>
#include <stdio.h>
#include <stdbool.h>
typedef struct hollow_list hollow_list;
struct hollow_list{
unsigned int size;
void *value;
bool *written;
hollow_list *children;
};
//Creates a hollow list and allocates all of the needed memory
hollow_list hollow_list_create(unsigned int size){
hollow_list output;
output = (hollow_list) {.size = size, .value = (void *) 0, .written = calloc(size, sizeof(bool)), .children = calloc(size, sizeof(hollow_list))};
return output;
}
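//Frees all memory owned by a hollow list and its children.
//The list struct itself is not freed here: children live inside their parent's array,
//so only the caller knows whether l came from malloc.
void hollow_list_free(hollow_list *l, bool free_values){
unsigned int i;
for(i = 0; i < l->size; i++){
if(l->written[i]){
hollow_list_free(l->children + i, free_values);
}
}
if(free_values){
free(l->value);
}
free(l->written);
free(l->children);
}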
//Reads from the hollow list and returns a pointer to the item's data
void *hollow_list_read(hollow_list *l, unsigned int index){
if(index == 0){
return l->value;
}
unsigned int bit_checker;
bit_checker = 1<<(l->size - 1);
int i;
for(i = 0; i < l->size; i++){
if(bit_checker & index){
if(l->written[i] == true){
return hollow_list_read(l->children + i, bit_checker ^ index);
} else {
return (void *) 0;
}
}
bit_checker >>= 1;
}
return (void *) 0; //no bit of index matched, so nothing is stored there
}
//Writes to the hollow list, allocating memory only as it needs
void hollow_list_write(hollow_list *l, unsigned int index, void *value){
if(index == 0){
l->value = value;
} else {
unsigned int bit_checker;
bit_checker = 1<<(l->size - 1);
int i;
for(i = 0; i < l->size; i++){
if(bit_checker & index){
if(!l->written[i]){
l->children[i] = hollow_list_create(l->size - i - 1);
l->written[i] = true;
}
hollow_list_write(l->children + i, bit_checker ^ index, value);
break;
}
bit_checker >>= 1;
}
}
}
typedef struct dictionary dictionary;
struct dictionary{
void *value;
hollow_list *child;
};
dictionary dictionary_create(){
dictionary output;
output.child = malloc(sizeof(hollow_list));
*output.child = hollow_list_create(8);
output.value = (void *) 0;
return output;
}
void dictionary_write(dictionary *dict, char *index, unsigned int strlen, void *value){
void *hollow_list_value;
dictionary *new_dict;
int i;
for(i = 0; i < strlen; i++){
hollow_list_value = hollow_list_read(dict->child, (int) index[i]);
if(hollow_list_value == (void *) 0){
new_dict = malloc(sizeof(dictionary));
*new_dict = dictionary_create();
hollow_list_write(dict->child, (int) index[i], new_dict);
dict = new_dict;
} else {
dict = (dictionary *) hollow_list_value;
}
}
dict->value = value;
}
void *dictionary_read(dictionary *dict, char *index, unsigned int strlen){
void *hollow_list_value;
dictionary *new_dict;
int i;
for(i = 0; i < strlen; i++){
hollow_list_value = hollow_list_read(dict->child, (int) index[i]);
if(hollow_list_value == (void *) 0){
return hollow_list_value;
} else {
dict = (dictionary *) hollow_list_value;
}
}
return dict->value;
}
int main(){
char index0[] = "hello, this is a test";
char index1[] = "hello, this is also a test";
char index2[] = "hello world";
char index3[] = "hi there!";
char index4[] = "this is something";
char index5[] = "hi there";
int item0 = 0;
int item1 = 1;
int item2 = 2;
int item3 = 3;
int item4 = 4;
dictionary d;
d = dictionary_create();
dictionary_write(&d, index0, 21, &item0);
dictionary_write(&d, index1, 26, &item1);
dictionary_write(&d, index2, 11, &item2);
dictionary_write(&d, index3, 9, &item3); // "hi there!" is 9 characters long
dictionary_write(&d, index4, 17, &item4);
printf("%d\n", *((int *) dictionary_read(&d, index0, 21)));
printf("%d\n", *((int *) dictionary_read(&d, index1, 26)));
printf("%d\n", *((int *) dictionary_read(&d, index2, 11)));
printf("%d\n", *((int *) dictionary_read(&d, index3, 9)));
printf("%d\n", *((int *) dictionary_read(&d, index4, 17)));
printf("%d\n", dictionary_read(&d, index5, 8) != (void *) 0); // 0: nothing was stored under "hi there"
}
Unfortunately you can't replicate the list[x] syntax, but this is the best alternative I have come up with.
Yes, but it will not work in the way you have specified. It will instead use a struct to store the data and functions that operate on that struct, giving you the result you want. See A Simple Associative Array Library In C. Example of use:
struct map_t *test;
test=map_create();
map_set(test,"One","Won");
map_set(test,"Two","Too");
map_set(test,"Four","Fore");
GLib's Hash Tables and Balanced Binary Trees might be what you're after.
Mark Wilkins gave you the right answer. If you want to send the data as a single chunk, you need to understand how C++ maps are represented in your architecture and write the access functions.
Anyway, if you decide to recreate the map on the dongle, I've written a small C library where you could write things like:
tbl_t in_data=NULL;
tblSetSS(in_data,"method","calc_r");
tblSetSN(in_data,"id",12);
tblSetSF(in_data,"loc_a",56.19);
tblSetSF(in_data,"loc_l",44.02);
and then:
char *method_name = tblGetP(in_data, "method");
int id = tblGetN(in_data, "id");
float loc_a = tblGetF(in_data, "loc_a");
float loc_l = tblGetF(in_data, "loc_l");
The hashtable is a variation of the Hopscotch hash, which is rather good on average, and you can have any mix of type for keys and data (i.e. you can use an entire table as a key).
The focus of those functions was on easing programming rather than pure speed, and the code is not thoroughly tested, but if you like the idea and want to expand on it, you can have a look at the code on googlecode.
(There are other things like variable length strings and a fast string pattern matching function but those might not be of interest in this case).
I am working on C code that holds a structure hosting some values, which I call a range.
My purpose is to use this so-called range dynamically (holding a different amount of data at every execution). For now I am provisionally using the #define comp instead. The range gets updated every time I call my update_range function, through the use of the s1 structure (and memory allocations).
What I found weird is that when I introduced a "show_range" function to output the actual values inside/outside the update function, I realized that I lose the first two values.
Here is the code.
Any suggestions on that?
Thanks in advance!
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <errno.h>
#include <string.h>
#include <complex.h>
#define comp 1024
// struct holding a complex-valued range
struct range {
int dimensions; /* number of dimensions */
int* size; /* array holding number of points per dimension */
complex double* values; /* array holding complex valued */
int components; /* number of components that will change on any execution*/
};
// parameters to use in function
struct s1 {
int tag;
struct range* range;
};
int update_range(struct s1* arg);
int show_range(struct range* argrange, char* message);
int copy_range(struct range* in, struct range* out);
int main(void) {
int ret = 0;
struct s1 s1;
s1.tag = 0;
s1.range = malloc(sizeof(struct range));
update_range(&s1);
show_range(s1.range, "s1.range inside main function");
return ret;
}
////////////////////////////////////////////
int update_range(struct s1* arg) {
int ret = 0;
int i;
struct range range;
range.dimensions = 1;
range.size = malloc(range.dimensions * sizeof(int));
range.components = comp;
range.size[0] = range.components; // unidimensional case
range.values = malloc(range.components * sizeof(complex double));
for (i = 0; i < range.components; i++) {
range.values[i] = (i + 1) + I * (i + 1);
}
show_range(&range, "range inside update_range function");
arg->range->size =
malloc(range.dimensions * sizeof(int)); // size was unknown before
arg->range->values =
malloc(comp * sizeof(complex double)); // amount of values was unknown
copy_range(&range, arg->range);
show_range(arg->range, "arg->range inside update_range function");
if (range.size)
free(range.size);
range.size = NULL;
if (range.values)
free(range.values);
range.values = NULL;
return ret;
}
////////////////////////////////////////////
// Show parameters (10 first values)
int show_range(struct range* argrange, char* message) {
int ret = 0;
int i;
printf(" ******************************\n");
printf(" range in %s \n", message);
printf(" arg.dimensions=%d \n", argrange->dimensions);
printf(" arg.size[0]=%d \n", argrange->size[0]);
printf(" argrange.components=%d \n", argrange->components);
printf(" first 10 {Re} values: \n");
for (i = 0; i < 10; i++) {
printf(" argrange.values[%d]=%f\n", i, creal(argrange->values[i]));
}
printf("\n");
return ret;
}
////////////////////////////////////////////
// copy range
int copy_range(struct range* in, struct range* out) {
int ret = 0;
if (in == NULL) {
fprintf(stderr, "error: in points to NULL (%s:%d)\n", __FILE__,
__LINE__);
ret = -1;
goto cleanup;
}
if (out == NULL) {
fprintf(stderr, "error: out points to NULL (%s:%d)\n", __FILE__,
__LINE__);
ret = -1;
goto cleanup;
}
out->dimensions = in->dimensions;
out->size = in->size;
out->values = in->values;
out->components = in->components;
cleanup:
return ret;
}
Your copy_range function is broken, because it copies only the pointers to size and values and not the memory they point to. When you then call free(range.size); and free(range.values); you also free the memory that the copy refers to, without setting its pointers back to NULL.
After calling update_range, s1.range has non NULL pointers in size and values, but they are pointing to deleted memory.
You are experiencing undefined behaviour (UB) due to accessing freed memory. Your copy_range() function only does a shallow copy of the two pointer fields, so when you run free(range.size) you make arg->range->size invalid.
You should make copy_range() a deep copy by allocating and copying the pointer contents like:
out->size = malloc(in->dimensions * sizeof(int));
memcpy(out->size, in->size, in->dimensions * sizeof(int));
out->values = malloc(in->components * sizeof(complex double));
memcpy(out->values , in->values, in->components * sizeof(complex double));
There may be fewer than 10 items to print (components is meant to change between runs), so the lines:
printf(" first 10 {Re} values: \n");
for (i = 0; i < 10; i++) {
printf(" argrange.values[%d]=%f\n", i, creal(argrange->values[i]));
}
could read past the end of the array.
A much better method would be:
int count = argrange->components < 10 ? argrange->components : 10;
printf(" first %d {Re} values: \n", count);
for (i = 0; i < count; i++) {
printf(" argrange.values[%d]=%f\n", i, creal(argrange->values[i]));
}
The above is just one of many problems with the code.
I would suggest executing the code using a debugger to get the full story.
As it is, the code has some memory leaks, mostly due
to overwriting malloc'd pointers.
For instance, as in the following:
arg->range->size =
malloc(range.dimensions * sizeof(int)); // size was unknown before
arg->range->values =
malloc(comp * sizeof(complex double)); // amount of values was unknown
I am writing a multi-threaded program that reads in a file of matrices and creates threads that multiply each row of 2 matrices together. I managed to get everything to work properly using only a single thread, but when I tried to create multiple threads simultaneously, I ran into problems. First, I am storing the result matrices in a global struct containing an array of struct that contains an array of matrix values. (ex. struct matrixArray.array -> struct matrix.array). When my function that was being called by pthread_create ran, my global matrixArray lost all of its data. So, I tried to pass the matrixArray into the function called by pthread_create. However, I keep getting the error: expression must have struct or union type. This is when I try to access the struct inside the function being called, despite the fact I am declaring it as struct matrixArray.
Here are my structures:
typedef struct matrix{
int rows;
int cols;
int multRow; // The "MULTIPLIED ROW". This is for determining which row the current thread needs to use for multiplication. This only applies to Matrix A in each set.
int size;
int set; // This is for which set the matrix belongs to.
char letter; // This is for labeling the matrices A B and C
int * array;
unsigned int * threadID; // Array containing the thread ids that are used to create the result
} matrix;
typedef struct matrixArray{
int size;
matrix * array;
} matrixArray;
Here is the main function:
int main(int argc, char *argv[])
{
pthread_t * tid; /* the thread identifier */
pthread_attr_t attr; /* set of attributes for the thread */
int i; // Counter
int aIndex; // Index of the current 'A' matrix being multiplied.
int rows,cols;
// Check to make sure we have the correct number of arguments supplied
// when running the program.
if(argc < 1){
printf("Error: You did not provide the correct number of arguments.\n\n");
return 0;
}
// Read the file and create the matrices
readFile();
// Initialize the result matrix before we start creating threads that use it.
mtxResults = newMatrixArray();
// Get the default attributes
pthread_attr_init(&attr);
// Set the current set to be multiplied to 1
currentSet = 1;
// Create a new matrixArray to pass to the threads
//struct matrixArray *mtxPassed = malloc(sizeof(struct matrixArray));
struct matrixArray *mtxPassed = newMatrixArray();
memcpy(mtxPassed, &mtxResults, sizeof(struct matrixArray));
// Allocate size of tid array based on number of threads
tid = malloc(threads * sizeof(pthread_t));
// Create the threads.
for(i = 0; i < threads; i++){
//pthread_create(&tid[i], &attr, runner, argv[1]);
pthread_create(&tid[i], &attr, runner, mtxPassed);
// Increment currentSet when the current row evaluated
// in the current set is equal to the total number of rows available.
aIndex = ((currentSet * 2) - 2);
if(mtx.array[aIndex].multRow == mtx.array[aIndex].rows){
currentSet++;
}
}
// Wait for threads to finish
for(i = 0; i < threads; i++){
pthread_join(tid[i], NULL);
}
// Print the matrices
//printMatrices();
} // End of main()
And here is the function runner, being called by pthread_create:
// The thread will begin control in this function
void *runner(void *param)
{
struct matrixArray *mtxA = newMatrixArray();
mtxA = (struct matrixArray *)param;
printf("mtxPassed.size = %i\n",mtxA.size);
// Do the matrix multiplication for a single row
matrixMultiply(currentSet, (unsigned int)pthread_self(), mtxA);
pthread_exit(0);
}
Originally, the error was occurring when I passed the mtxA structure to matrixMultiply function, but I have since tried accessing it inside runner directly to see if it worked there. It still is not.
I get the error at "printf("mtxPassed.size = %i\n",mtxA.size);". Instead of the first 2 lines of runner, I have also tried "struct matrixArray *mtxA = (struct matrixArray *)param;". That did not work either.
Here is the error:
[nsltg2#lewis assign2]$ cc -pthread -lpthread assign2.c
assign2.c(602): error: expression must have struct or union type
printf("mtxPassed.size = %i\n",mtxPassed.size);
^
I would really appreciate any help you can offer. Thank you so much!
I have created this little program to calculate pi using probability and ratios. To make it run faster I decided to give multithreading with pthreads a shot. Unfortunately, even after much searching I was unable to solve the following problem: when I run the threadFunc function with one thread, whether in a pthread or called directly from the calculate_pi_mt function, the performance is much better (at least two, if not three, times better) than when I run it with two threads on my dual-core machine. I have tried disabling optimizations to no avail. As far as I can see, the thread uses only local variables while running, apart from the end, where I use a mutex lock to sum the hits...
Firstly, are there any tips for creating code that will run better here (i.e. style)? I'm just learning by trying this stuff.
And secondly would there be any reason for these obvious performance problems?
When running with number of threads set to 1, one of my cpus maxes out at 100%. When set to two, the second cpu rises to roughly 80%-90%, but all this extra work it is apparently doing is to no avail! Could it be the use of the rand() function?
struct arguments {
int n_threads;
int rays;
int hits_in;
pthread_mutex_t *mutex;
};
void *threadFunc(void *arg)
{
struct arguments* args=(struct arguments*)arg;
int n = 0;
int local_hits_in = 0;
double x;
double y;
double r;
while (n < args->rays)
{
n++;
x = ((double)rand())/((double)RAND_MAX);
y = ((double)rand())/((double)RAND_MAX);
r = (double)sqrt(pow(x, 2) + pow(y, 2));
if (r < 1.0){
local_hits_in++;
}
}
pthread_mutex_lock(args->mutex);
args->hits_in += local_hits_in;
pthread_mutex_unlock(args->mutex);
return NULL;
}
double calculate_pi_mt(int rays, int threads){
double answer;
int c;
unsigned int iseed = (unsigned int)time(NULL);
srand(iseed);
if ( (float)(rays/threads) != ((float)rays)/((float)threads) ){
printf("Error: number of rays is not evenly divisible by threads\n");
}
/* argument initialization */
struct arguments* args = malloc(sizeof(struct arguments));
args->hits_in = 0;
args->rays = rays/threads;
args->n_threads = 0;
args->mutex = malloc(sizeof(pthread_mutex_t));
if (pthread_mutex_init(args->mutex, NULL)){
printf("Error creating mutex!\n");
}
pthread_t thread_ary[MAXTHREADS];
c=0;
while (c < threads){
args->n_threads += 1;
if (pthread_create(&(thread_ary[c]),NULL,threadFunc, args)){
printf("Error when creating thread\n");
}
printf("Created Thread: %d\n", args->n_threads);
c+=1;
}
c=0;
while (c < threads){
printf("main waiting for thread %d to terminate...\n", c+1);
if (pthread_join(thread_ary[c],NULL)){
printf("Error while waiting for thread to join\n");
}
printf("Destroyed Thread: %d\n", c+1);
c+=1;
}
printf("Hits in %d\n", args->hits_in);
printf("Rays: %d\n", rays);
answer = 4.0 * (double)(args->hits_in)/(double)(rays);
//freeing everything!
pthread_mutex_destroy(args->mutex);
free(args->mutex);
free(args);
return answer;
}
There's a couple of problems I can see:
rand() is not thread-safe. Use drand48_r() (which generates a double in the range [0.0, 1.0) natively, which is what you want)
You only create one struct arguments structure, then try to use that for multiple threads. You need to create a separate one for each thread (just use an array).
Here's how I'd clean up your approach. Note how we don't need to use any mutexes: each thread just stashes its own return value in a separate location, and the main thread adds them up after the other threads have finished:
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <sys/time.h>
#include <pthread.h>
struct thread_info {
int thread_n;
pthread_t thread_id;
int rays;
int hits_in;
};
void seed_rand(int thread_n, struct drand48_data *buffer)
{
struct timeval tv;
gettimeofday(&tv, NULL);
srand48_r(tv.tv_sec * thread_n + tv.tv_usec, buffer);
}
void *threadFunc(void *arg)
{
struct thread_info *thread_info = arg;
struct drand48_data drand_buffer;
int n = 0;
const int rays = thread_info->rays;
int hits_in = 0;
double x;
double y;
double r;
seed_rand(thread_info->thread_n, &drand_buffer);
for (n = 0; n < rays; n++)
{
drand48_r(&drand_buffer, &x);
drand48_r(&drand_buffer, &y);
r = x * x + y * y;
if (r < 1.0){
hits_in++;
}
}
thread_info->hits_in = hits_in;
return NULL;
}
double calculate_pi_mt(int rays, int threads)
{
int c;
int hits_in = 0;
if (rays % threads) {
printf("Error: number of rays is not evenly divisible by threads\n");
rays = (rays / threads) * threads;
}
/* argument initialization */
struct thread_info *thr = malloc(threads * sizeof thr[0]);
for (c = 0; c < threads; c++) {
thr[c].thread_n = c;
thr[c].rays = rays / threads;
thr[c].hits_in = 0;
if (pthread_create(&thr[c].thread_id, NULL, threadFunc, &thr[c])) {
printf("Error when creating thread\n");
}
printf("Created Thread: %d\n", thr[c].thread_n);
}
for (c = 0; c < threads; c++) {
printf("main waiting for thread %d to terminate...\n", c);
if (pthread_join(thr[c].thread_id, NULL)) {
printf("Error while waiting for thread to join\n");
}
hits_in += thr[c].hits_in;
printf("Destroyed Thread: %d\n", c);
}
printf("Hits in %d\n", hits_in);
printf("Rays: %d\n", rays);
double answer = (4.0 * hits_in) / rays;
free(thr);
return answer;
}
You're using more synchronization than you need. You should sum the local_hits at the end in the main thread, rather than using a mutex to update the shared total from each thread. Or, at least, you could use an atomic operation (it's just an int) instead of locking a mutex to update one int.
Threading has a cost. It may be that, as your useful computing code looks very simple, the cost of thread management (cost paid when changing thread and synchronisation cost) is much higher than the benefit.