Allocating memory buffer for use in MPI_Pack() - c

I'm going to use MPI_Pack() to make a message composed of n ints and m doubles. Their positions in the message buffer will be something like this
p1 x ints, q1 x doubles, p2 x ints, q2 x doubles, ..., pN x ints, qN x doubles
where n=p1+p2+...+pN and m=q1+q2+...+qN.
My question: Is the size of this message equal to the size of a message composed of the same number of ints and doubles but with the following order:
n x ints, m x doubles
I'm asking because I want to know how much memory should be allocated for the buffer. If the size of the message depends only on the number of ints and doubles, and not on how they are arranged, then the buffer can be allocated very easily:
MPI_Pack_size(n, MPI_INT, communicator, &k1);
MPI_Pack_size(m, MPI_DOUBLE, communicator, &k2);
buffer = malloc(k1 + k2);
Obviously the following solution is correct:
k = 0;
for (int i=0; i < N; i++)
{
MPI_Pack_size(p[i], MPI_INT, communicator, &k1);
MPI_Pack_size(q[i], MPI_DOUBLE, communicator, &k2);
k += k1 + k2;
}
buffer = malloc(k);
But for large N this may result in an excessively large buffer because, as the official MPI documentation states, the routine MPI_Pack_size()
returns an upper bound, rather than an exact bound, since the
exact amount of space needed to pack the message may depend on the context (e.g.,
first message packed in a packing unit may take more space).
UPDATE: here is a program I wrote to test whether the order in which the ints and doubles are packed affects the size of the message.
#include <stdio.h>
#include <mpi.h>
#include <assert.h>
#include <stdlib.h>
#include <time.h>
#define BUFF_SIZE 200000 /* buffer size in bytes */
#define MY_MPI_REAL MPI_DOUBLE
typedef double real;
int main()
{
MPI_Init(NULL, NULL);
int ic = 0, rc = 0; /* counters of int and real numbers */
int pos = 0; /* position in the buffer, used in MPI_Pack() calls */
/* allocate memory of the pack buffer */
void *buff = malloc(BUFF_SIZE);
assert(buff);
/* case 1: packing a large number of pairs of arrays */
srand(time(NULL));
for (int i=0; i<100; i++) /* 100 array pairs */
{
/* make int and real arrays of random lengths */
int ik = 99 * ((double)rand() / RAND_MAX) + 1;
int rk = 99 * ((double)rand() / RAND_MAX) + 1;
int *iarr = (int *)malloc(ik * sizeof(int));
assert(iarr);
double *rarr = (real *)malloc(rk * sizeof(real));
assert(rarr);
ic += ik;
rc += rk;
/* pack the array pair */
MPI_Pack(iarr, ik, MPI_INT, buff, BUFF_SIZE, &pos, MPI_COMM_WORLD);
MPI_Pack(rarr, rk, MY_MPI_REAL, buff, BUFF_SIZE, &pos, MPI_COMM_WORLD);
free(iarr);
free(rarr);
}
printf("final position for case 1 = %d\n", pos);
/* case 2: packing a single pair of arrays */
pos = 0;
int *iarr = (int *)malloc(ic * sizeof(int));
assert(iarr);
double *rarr = (real *)malloc(rc * sizeof(real));
assert(rarr);
MPI_Pack(iarr, ic, MPI_INT, buff, BUFF_SIZE, &pos, MPI_COMM_WORLD);
MPI_Pack(rarr, rc, MY_MPI_REAL, buff, BUFF_SIZE, &pos, MPI_COMM_WORLD);
free(iarr);
free(rarr);
printf("final position for case 2 = %d\n", pos);
free(buff);
printf("sizeof(int) = %ld, sizeof(real) = %ld\n", sizeof(int), sizeof(real));
printf("num of ints = %d, num of reals = %d\n", ic, rc);
printf("num of ints x sizeof(int) + num of reals x sizeof(real) = %ld\n", ic*sizeof(int)+rc*sizeof(real));
MPI_Finalize();
}

I think your worries are misplaced. The only possible overhead I can see would come from alignment: perhaps a one-time alignment at the start of the buffer, and then perhaps per element. However, the pack buffer is counted in bytes, and I just tested it: even packing a single byte does not lead to any padding. That leads me to suspect that every data type takes essentially the exact amount of space.
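For reference, here is a minimal sketch of that kind of check (assuming MPI has already been initialized); it simply compares the MPI_Pack_size() upper bound with the number of bytes MPI_Pack() actually consumes for a single int:
int ub = 0, pos = 0, x = 42;
char tmp[64];
MPI_Pack_size(1, MPI_INT, MPI_COMM_WORLD, &ub);
MPI_Pack(&x, 1, MPI_INT, tmp, sizeof tmp, &pos, MPI_COMM_WORLD);
/* pos is the space actually used; ub is only guaranteed to be >= pos */
printf("upper bound = %d bytes, actually packed = %d bytes\n", ub, pos);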

Related

Assertion failed in file, memcpy argument ranges overlap: MPI_Scatter

Below is my code. Currently, it tries to distribute the work to be done over a 1d representation of a matrix (2d array). I MPI_Scatter the portion of the array which needs work and store that portion into local_C, which should be the same size as the portion sent. I also broadcast M (col), Q (used for the gather as the column size), ....
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
int main(int argc, char *argv[]) {
int rank;
int nproc;
int n_local;
int N; // rows
int M; // rows/columns
int Q; // columns
MPI_Init (&argc, &argv); /* initialize MPI */
MPI_Comm comm = MPI_COMM_WORLD;
MPI_Comm_size(comm, &nproc);
MPI_Comm_rank(comm, &rank);
int *matrixA;
int *matrixB;
int *matrixC;
int *local_C;
// manager core constructs factors of matrix representation
if (rank == 0) {
N = atoi(argv[1]);
M = atoi(argv[2]);
Q = atoi(argv[3]);
// check if correct number of input
if (argc != 4) {
printf("Enter <filename> <N> <M> <Q>\n");
exit(1);
}
else if (N % nproc != 0) { // check if N is a multiple of the number of processors
printf("Ensure N is divisible by number of processors: %i\n", nproc);
exit(1);
}
// create matrices of size
matrixA = malloc(N * M * sizeof(long));
randomlyFillArray(matrixA, N * M);
matrixB = malloc(M * Q * sizeof(long));
randomlyFillArray(matrixB, M * Q);
// create resulting product matrix of size
matrixC = malloc(N * Q * sizeof(long));
// sequential compute
//computeMatrixProductSequentially(matrixA, matrixB, matrixC, M, N, Q);
// parallel compute
// block data
n_local = N / nproc;
local_C = malloc(n_local * M * sizeof(long));
MPI_Bcast(&M, 1, MPI_INT, 0, comm);
MPI_Bcast(&Q, 1, MPI_INT, 0, comm);
MPI_Bcast(&n_local, 1, MPI_INT, 0, comm);
// scatter matrixA for n_local row to cores
MPI_Scatter(&matrixA, n_local * M, MPI_LONG, &local_C, n_local * M, MPI_LONG, 0, comm);
// broadcast matrixB to all cores
MPI_Bcast(&matrixB, 1, MPI_LONG, 0, comm);
}
else {
MPI_Bcast(&M, 1, MPI_INT, 0, comm);
MPI_Bcast(&Q, 1, MPI_INT, 0, comm);
MPI_Bcast(&n_local, 1, MPI_INT, 0, comm);
// scatter recv matrixA row
MPI_Scatter(&matrixA, n_local * M, MPI_LONG, &local_C, n_local * M, MPI_LONG, 0, comm);
// broadcast recv matrixB
MPI_Bcast(&matrixB, 1, MPI_LONG, 0, comm);
//MPI_Gather();
}
MPI_Finalize();
return 0;
}
Here is the error when trying to compile and run the program.
The purpose, in case it matters, is to multiply two matrices in parallel using 1d arrays.
The problem with your code is that MPI calls take an int* or double* or whateversimpletype* argument. Your matrixA is an int*, so using &matrixA makes the buffer an int**. Solution: pass matrixA directly as the buffer.
Also: you are coding as if the scatter operation creates the matrix on the non-zero ranks. That is not the case: you need to allocate the array yourself, and MPI will write the values into it.
Another remark: scattering a matrix is not a scalable solution and is bad MPI coding. It introduces both a memory bottleneck, because process zero needs to be able to store all the data, and a time bottleneck, because all other processes have to wait for process zero to construct the matrix. The right way to code this is to let each process construct its own part of the matrix. Always keep your data structures distributed from start to end!
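To make that concrete, here is a minimal sketch of the communication part with the fixes above applied (a sketch only, not the full program; it assumes int data throughout, whereas the original mixes int* buffers with sizeof(long) and MPI_LONG, and it uses local_A as a stand-in for the buffer the post calls local_C). Note that the broadcast of matrixB also needs the full element count rather than 1:
/* every rank takes part in the collectives; M, Q, n_local have already been set on rank 0 */
MPI_Bcast(&M, 1, MPI_INT, 0, comm);
MPI_Bcast(&Q, 1, MPI_INT, 0, comm);
MPI_Bcast(&n_local, 1, MPI_INT, 0, comm);
/* each rank allocates its own receive buffers */
int *local_A = malloc(n_local * M * sizeof(int));
if (rank != 0)
    matrixB = malloc(M * Q * sizeof(int));
/* pass the pointers directly, not their addresses */
MPI_Scatter(matrixA, n_local * M, MPI_INT,   /* send buffer only significant at root */
            local_A, n_local * M, MPI_INT, 0, comm);
MPI_Bcast(matrixB, M * Q, MPI_INT, 0, comm); /* whole matrix, not a single element */
The larger point in the last paragraph still stands: for real codes, have each rank build its own block of A instead of scattering everything from rank 0.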

Open MPI Waitall() Segmentation Fault

I'm new to MPI and I'm trying to develop a non-blocking program (with Isend and Irecv). The functionality is very basic (it's educational):
There is one process (rank 0) that is the master and receives messages from the slaves (ranks 1-P). The master only receives results.
The slaves generate an array of N random numbers between 0 and R and then do some operations with those numbers (again, just for educational purposes; the operations don't make any sense).
This whole process (operations + sending data) is done M times (this is just for comparing different implementations: blocking and non-blocking).
I get a segmentation fault in the master process when calling the MPI_Waitall() function.
#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"
#include <math.h>
#include <time.h>
#define M 1000 //Number of times
#define N 2000 //Quantity of random numbers
#define R 1000 //Max value of random numbers
double SumaDeRaices (double*);
int main(int argc, char* argv[]) {
int yo; /* rank of process */
int p; /* number of processes */
int dest; /* rank of receiver */
/* Start up MPI */
MPI_Init(&argc, &argv);
/* Find out process rank */
MPI_Comm_rank(MPI_COMM_WORLD, &yo);
/* Find out number of processes */
MPI_Comm_size(MPI_COMM_WORLD, &p);
MPI_Request reqs[p-1];
MPI_Status stats[p-1];
if (yo == 0) {
int i,j;
double result;
clock_t inicio,fin;
inicio = clock();
for(i = 0; i<M; i++){ //M times
for(j = 1; j<p; j++){ //for every slave
MPI_Irecv(&result, sizeof(double), MPI_DOUBLE, j, i, MPI_COMM_WORLD, &reqs[j-1]);
}
MPI_Waitall(p-1,reqs,stats); //wait all slaves (SEG_FAULT)
}
fin = clock()-inicio;
printf("Tiempo total de ejecucion %f segundos \n", ((double)fin)/CLOCKS_PER_SEC);
}
else {
double* numAleatorios = (double*) malloc( sizeof(double) * ((double) N) ); //array with numbers
int i,j;
double resultado;
dest=0;
for(i=0; i<M; i++){ //again, M times
for(j=0; j<N; j++){
numAleatorios[j] = rand() % R ;
}
resultado = SumaDeRaices(numAleatorios);
MPI_Isend(&resultado,sizeof(double), MPI_DOUBLE, dest, i, MPI_COMM_WORLD,&reqs[p-1]); //send result to master
}
}
/* Shut down MPI */
MPI_Finalize();
exit(0);
} /* main */
double SumaDeRaices (double* valores){
int i;
double sumaTotal = 0.0;
//Raices cuadradas de los valores y suma de estos
for(i=0; i<N; i++){
sumaTotal = sqrt(valores[i]) + sumaTotal;
}
return sumaTotal;
}
There are several issues with your code. First and foremost, in your Isend you pass &resultado several times without waiting for the previous non-blocking operation to finish. You are not allowed to reuse a buffer passed to Isend before you have made sure that operation has completed.
Instead, I recommend using a normal Send: in contrast to a synchronous send (Ssend), a normal blocking send returns as soon as you can reuse the buffer.
Second, there is no need to use message tags. I recommend just setting the tag to 0; in terms of performance it is simply faster.
Third, the result shouldn't be a single variable but an array of size at least (p-1).
Fourth, I do not recommend allocating arrays such as MPI_Request and MPI_Status on the stack when the size is not a known small number. Here the array size is (p-1), so you are better off using malloc for these.
Fifth, if you do not check the statuses, use MPI_STATUSES_IGNORE.
Also, instead of sizeof(double) you should specify the number of items (1).
But of course the absolutely best version is just to use MPI_Gather.
Moreover, there is generally no reason not to run the computation on the root rank as well.
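If you want to keep the non-blocking structure, a minimal sketch that applies the points above might look like this (variable names follow the original post; error handling omitted):
if (yo == 0) {
    double *results = malloc((p - 1) * sizeof(double));        /* one slot per slave */
    MPI_Request *reqs = malloc((p - 1) * sizeof(MPI_Request)); /* heap, not stack */
    for (int i = 0; i < M; i++) {
        for (int j = 1; j < p; j++)
            MPI_Irecv(&results[j - 1], 1, MPI_DOUBLE, j, 0, MPI_COMM_WORLD, &reqs[j - 1]);
        MPI_Waitall(p - 1, reqs, MPI_STATUSES_IGNORE);
    }
    free(results);
    free(reqs);
} else {
    for (int i = 0; i < M; i++) {
        /* ... fill numAleatorios and compute resultado ... */
        MPI_Send(&resultado, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD); /* blocking send: buffer reusable on return */
    }
}
The collective version below is simpler still.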
Here is a slightly rewritten example:
#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"
#include <math.h>
#include <time.h>
#define M 1000 //Number of times
#define N 2000 //Quantity of random numbers
#define R 1000 //Max value of random numbers
double SumaDeRaices (double* valores)
{
int i;
double sumaTotal = 0.0;
//Raices cuadradas de los valores y suma de estos
for(i=0; i<N; i++) {
sumaTotal = sqrt(valores[i]) + sumaTotal;
}
return sumaTotal;
}
int main(int argc, char* argv[]) {
int yo; /* rank of process */
int p; /* number of processes */
/* Start up MPI */
MPI_Init(&argc, &argv);
/* Find out process rank */
MPI_Comm_rank(MPI_COMM_WORLD, &yo);
/* Find out number of processes */
MPI_Comm_size(MPI_COMM_WORLD, &p);
double *result;
clock_t inicio, fin;
double *numAleatorios;
if (yo == 0) {
inicio = clock();
}
numAleatorios = (double*) malloc( sizeof(double) * ((double) N) ); //array with numbers
result = (double *) malloc(sizeof(double) * p);
for(int i = 0; i<M; i++){ //M times
for(int j=0; j<N; j++) {
numAleatorios[j] = rand() % R ;
}
double local_result = SumaDeRaices(numAleatorios);
MPI_Gather(&local_result, 1, MPI_DOUBLE, result, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD); //send result to master
}
if (yo == 0) {
fin = clock()-inicio;
printf("Tiempo total de ejecucion %f segundos \n", ((double)fin)/CLOCKS_PER_SEC);
}
free(numAleatorios);
/* Shut down MPI */
MPI_Finalize();
} /* main */
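For reference, with a typical Open MPI or MPICH installation the example above can be built with something like mpicc example.c -o example -lm (the -lm is needed for sqrt) and run with mpirun -np 4 ./example; the exact commands depend on the local MPI setup.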

Sorting an array of structs in C

We were given an assignment in class to sort an array of structs. After the assignment was handed in, we discussed doing the sorting through an array of pointers instead, since that is more efficient than the way most people had done it.
I decided to try to do it this way as well; however, I'm running into some issues that I haven't been able to solve.
http://pastebin.com/Cs3y39yu
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
typedef struct stage2{//standard dec of the struct field
char need;
double ring;
char fight[8];
int32_t uncle;
char game;
double war;
int8_t train;
uint32_t beds;
float crook;
int32_t feast;
int32_t rabbits;
int32_t chin;
int8_t ground;
char veil;
uint32_t flowers;
int8_t adjustment;
int16_t pets;
} stage2;
void usage(){//usage method to handle areas
fprintf(stderr,"File not found\n");//prints to stderr
exit(1);//exits the program
}
int needS(const void *v1, const void *v2)
{
const stage2 *p1 = v1;
const stage2 *p2 = v2;
printf("%c %c \n",p1->need,p2->need);
return 0;
}
int main(int argc, char** argv){
if(argc != 3){//checks for a input files, only 1
usage();//if not runs usage
}
int structSize = 60;//size of structs in bytes, sizeof() doesnt return correct val
char* fileName = argv[1]; //pull input filename
FILE *file = fopen(fileName, "r");//opens in read mode
fseek(file, 0, SEEK_END); //goes to end of file
long fileSize = ftell(file); //saves filesize
char *vals = malloc(fileSize); //allocates the correct size for array based on its filesize
fseek(file, 0, SEEK_SET); //returns to start of file
fread(vals, 1, fileSize, file); //reads the file into char array vals
fclose(file); //closes file
int structAmount = fileSize/structSize; //determines amount of structs we need
stage2 mainArray[structAmount]; //makes array of structs correct size
int j;//loop variables
int i;
printf("need, ring, fight, uncle, game, war, train, beds, crook, feast, rabbits, chin, ground, veil, flowers, adjustment, pets\n");//prints our struct names
for(i = 0; i < structAmount; i ++){//initialises the array vals
mainArray[i].need = *(&vals[0+(i*60)]);
mainArray[i].ring = *((double *)&vals[1+(i*60)]);
for(j = 0;j<9;j++){
mainArray[i].fight[j] = *(&vals[j+9+(i*60)]);
}
mainArray[i].uncle = *((int32_t *)&vals[17+(i*60)]);
mainArray[i].game = *(&vals[21+(i*60)]);
mainArray[i].war = *((double *)&vals[22+(i*60)]);
mainArray[i].train = *((int8_t *)&vals[30+(i*60)]);
mainArray[i].beds = *((uint32_t *)&vals[31+(i*60)]);
mainArray[i].crook = *((float *)&vals[35+(i*60)]);
mainArray[i].feast = *((int32_t *)&vals[39+(i*60)]);
mainArray[i].rabbits = *((int32_t *)&vals[43+(i*60)]);
mainArray[i].chin = *((int32_t *)&vals[47+(i*60)]);
mainArray[i].ground = *((int8_t *)&vals[51+(i*60)]);
mainArray[i].veil = *(&vals[52+(i*60)]);
mainArray[i].flowers = *((uint32_t *)&vals[53+(i*60)]);
mainArray[i].adjustment = *((int8_t *)&vals[57+(i*60)]);
mainArray[i].pets = *((int16_t *)&vals[58+(i*60)]);
}
for(i = 0; i < structAmount; i ++){//prints
printf("%c, %f, %s, %d, %c, %f, %d, %u, %f, %d, %d, %d, %d, %c, %u, %d, %d \n",
mainArray[i].need,mainArray[i].ring,mainArray[i].fight,mainArray[i].uncle,mainArray[i].game,mainArray[i].war,mainArray[i].train,
mainArray[i].beds,mainArray[i].crook,mainArray[i].feast,mainArray[i].rabbits,mainArray[i].chin,mainArray[i].ground,mainArray[i].veil,
mainArray[i].flowers,mainArray[i].adjustment,mainArray[i].pets);//prints
}
free(vals);//frees the memory we allocated to vals
stage2 *array = malloc(structAmount * structSize);
for(i = 0; i < structAmount; i ++){
array[i] = mainArray[i];
}
printf("Before Sort\n\n");
for(i = 0; i < structAmount; i ++){
printf("%c, %f, %s, %d, %c, %f, %d, %u, %f, %d, %d, %d, %d, %c, %u, %d, %d \n",
array[i].need,array[i].ring,array[i].fight,array[i].uncle,array[i].game,array[i].war,array[i].train,
array[i].beds,array[i].crook,array[i].feast,array[i].rabbits,array[i].chin,array[i].ground,array[i].veil,
array[i].flowers,array[i].adjustment,array[i].pets);//prints
}
qsort(array, structAmount,structSize,needS);
printf("After Sort\n\n");
for(i = 0; i < structAmount; i ++){
printf("%c, %f, %s, %d, %c, %f, %d, %u, %f, %d, %d, %d, %d, %c, %u, %d, %d \n",
array[i].need,array[i].ring,array[i].fight,array[i].uncle,array[i].game,array[i].war,array[i].train,
array[i].beds,array[i].crook,array[i].feast,array[i].rabbits,array[i].chin,array[i].ground,array[i].veil,
array[i].flowers,array[i].adjustment,array[i].pets);//prints
}
FILE *my_file = fopen(argv[2], "wb");
for(i = 0; i < structAmount; i ++){
fwrite(&mainArray[i].need, sizeof(char), 1, my_file);
fwrite(&mainArray[i].ring, sizeof(double), 1, my_file);
fwrite(&mainArray[i].fight, sizeof(char[8]), 1, my_file);
fwrite(&mainArray[i].uncle, sizeof(int32_t), 1, my_file);
fwrite(&mainArray[i].game, sizeof(char), 1, my_file);
fwrite(&mainArray[i].war, sizeof(double), 1, my_file);
fwrite(&mainArray[i].train, sizeof(int8_t), 1, my_file);
fwrite(&mainArray[i].beds, sizeof(uint32_t), 1, my_file);
fwrite(&mainArray[i].crook, sizeof(float), 1, my_file);
fwrite(&mainArray[i].feast, sizeof(int32_t), 1, my_file);
fwrite(&mainArray[i].rabbits, sizeof(int32_t), 1, my_file);
fwrite(&mainArray[i].chin, sizeof(int32_t), 1, my_file);
fwrite(&mainArray[i].ground, sizeof(int8_t), 1, my_file);
fwrite(&mainArray[i].veil, sizeof(char), 1, my_file);
fwrite(&mainArray[i].flowers, sizeof(uint32_t), 1, my_file);
fwrite(&mainArray[i].adjustment, sizeof(int8_t), 1, my_file);
fwrite(&mainArray[i].pets, sizeof(int16_t), 1, my_file);
}
fclose(my_file);
return 0;//return statement
}
That is the link to the code I have been working on. My main issue: from what I gather, on the first pass of the sort the comparison method needS (line 31) should be handed the first two structs in the array, and the printf should show their first fields (I am aware this isn't a valid sorting method; I just wanted to make sure the variables were what I expected). That is not what happens: the p1 variable prints what I expect, but the p2 variable does not, and from that point onwards every call prints junk (I think) values.
Is there anything that I'm missing or doing wrong?
Your problem is in this line:
int structSize = 60; //size of structs in bytes, sizeof() doesn't return correct val
You're wrong; sizeof() does return the correct size, and whatever you're doing instead is bogus. You need to go back to the basics of serialization: write the data out correctly, and you should then be able to read all of it back into the array in a single read operation. If you really do have 60 bytes per record in the input, you have serious issues.
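As a sketch of what reading and writing "in a single operation" can look like (this assumes the file is produced and consumed with the same struct definition and ABI, so padding is stored as-is; out_file and in_file are placeholder FILE* handles, not names from the original code):
/* writing: one call, padding included */
fwrite(mainArray, sizeof(stage2), structAmount, out_file);

/* reading back: derive the count from sizeof(stage2), never a hard-coded 60 */
int structAmount = fileSize / sizeof(stage2);
stage2 *arr = malloc(structAmount * sizeof(stage2));
fread(arr, sizeof(stage2), structAmount, in_file);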
Note that your structure layout in memory wastes 7 bytes on most systems (3 on some) between the char need; and double ring; members. There is a similar gap between char game; and double war;, another gap (usually 3 bytes) between int8_t train; and uint32_t beds;, another (usually 2 bytes this time) between char veil; and uint32_t flowers; (because char veil; is preceded by int8_t ground;), and a gap of 1 byte between int8_t adjustment; and int16_t pets;. I won't guarantee that I've spotted all the gaps.
To learn more about structure layouts and padding, see Why isn't sizeof for a struct equal to the sum of sizeof of each member? as suggested by paddy.
To minimize the wasted space, the heuristic is to put the members with bigger base types before those with smaller base types. Thus all the double members should come before any of the char members. If a member is an array type, ignore the array and look at the size of the base type: for example, char fight[8]; is best put with the other char members at the end, though since it occupies a multiple of 8 bytes it could stay where it is; it is simpler to be consistent. Pointers need to be treated as 8 bytes on 64-bit systems, or as 4 bytes on 32-bit systems; place them between the non-pointer types such as long long or double (usually 8 bytes each) and the smaller non-pointer types such as int or uint32_t. The long type is a nuisance; it can be 4 or 8 bytes depending on the platform (it's 4 bytes on Windows, even 64-bit Windows; it is 8 bytes on 64-bit Unix and 4 bytes on 32-bit Unix).
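As an illustration of that heuristic (a sketch, not part of the original assignment), here is stage2 with its members reordered from largest to smallest base type; printing sizeof for both definitions shows how much padding the reordering removes (on a typical x86-64 ABI the original layout is noticeably larger, but print the values rather than trust a hard-coded number):
#include <stdint.h>
#include <stdio.h>

typedef struct stage2_reordered {
    double ring;            /* 8-byte members first */
    double war;
    int32_t uncle;          /* then 4-byte members */
    uint32_t beds;
    float crook;
    int32_t feast;
    int32_t rabbits;
    int32_t chin;
    uint32_t flowers;
    int16_t pets;           /* then 2-byte members */
    int8_t train;           /* then 1-byte members */
    int8_t ground;
    int8_t adjustment;
    char need;
    char game;
    char veil;
    char fight[8];          /* array of 1-byte elements, kept at the end */
} stage2_reordered;

int main(void)
{
    printf("sizeof(stage2_reordered) = %zu\n", sizeof(stage2_reordered));
    return 0;
}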
The function needS needs to return a value that is appropriate for sorting the items in the array.
int needS(const void *v1, const void *v2)
{
const stage2 *p1 = v1;
const stage2 *p2 = v2;
printf("%c %c \n",p1->need,p2->need);
// Something like (qsort expects a negative, zero, or positive result):
return (p1->need > p2->need) - (p1->need < p2->need);
}
You have confused the "packed" structure size in your data file with the in-memory structure array that you allocate (by default, struct stage2 will have extra padding to align data types efficiently). This line here is the problem:
stage2 *array = malloc(structAmount * structSize);
It should be:
stage2 *array = malloc(structAmount * sizeof(stage2));
Your call to qsort needs to be updated accordingly too:
qsort(array, structAmount, sizeof(stage2), needS);
In addition to other posters' good answers about comparison functions and structure size, you also don't sort by pointers, which was your initial intention.
In your code, you sort a copy of the whole array of large structs. When sorting by pointer, you create an auxiliary array of pointers to the original array elements and then sort these pointers by way of the data they point to. This will leave the original array intact.
Your comparison function must then treat the void * pointers as pointers to pointers to your struct. Example (with heavily abridged structures) below.
#include <stdio.h>
#include <stdlib.h>
typedef struct stage2 {
int need;
} stage2;
int needS(const void *v1, const void *v2)
{
stage2 *const *p1 = v1;
stage2 *const *p2 = v2;
return ((*p1)->need > (*p2)->need) - ((*p1)->need < (*p2)->need); /* negative, zero, or positive */
}
int main(int argc, char **argv)
{
stage2 mainArray[] = {
{8}, {3}, {5}, {1}, {19}, {-2}, {8}, {0},{0}, {4}, {5}, {1}, {8}
};
int structAmount = sizeof(mainArray) / sizeof(*mainArray);
int i;
stage2 **array = malloc(structAmount * sizeof(*array));
// Assign pointers
for (i = 0; i < structAmount; i++) {
array[i] = &mainArray[i];
}
qsort(array, structAmount, sizeof(*array), needS);
puts("Original");
for (i = 0; i < structAmount; i++) {
printf("%d\n", mainArray[i].need);
}
puts("");
puts("Sorted");
for (i = 0; i < structAmount; i++) {
printf("%d\n", array[i]->need);
}
free(array);
return 0;
}

Why does fopen() change the value of the element in an array?

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
int getbit(int * list, int n)
{
return (list[n / 32] >> (n % 32)) & 1;
}
void setbit(int * list, int n)
{
list[n / 32] |= 1 << (n % 32);
}
int main()
{
FILE * out;
int size = 99; //2000000000;
int root = sqrt(size);
int * list = malloc(size / 8.0); //(2*10^9)/8
memset(list, 0, sizeof list);
int i, j;
for (i = 2; i <= root; i++)
for (j = 2 * i; j < size; j += i)
setbit(list, j);
printf("i=%d j=%d 98=%d\n", i, j, getbit(list, 98));
out = fopen("output.txt", "w");
printf("i=%d j=%d 98=%d\n", i, j, getbit(list, 98));
/*for (i=2; i<size; i++)
if (!getbit(list, i))
fprintf(out, "%d\n", i);
fclose(out);*/
return 0;
}
Whenever I call fopen() between the two printf calls, the value printed for the third argument changes from 1 to 0. If I comment out that line, the value stays the same. What might be the reason behind this?
You are seeing undefined behavior: sizeof(list) is the size of a pointer, probably 4 or 8 bytes depending on the architecture, so the memset with zeros does not go past the fourth (or eighth) byte. Bit 98 lives in list[98/32] = list[3], a 32-bit word that came from malloc and has not been initialized by the memset. Moreover, you are allocating only 12 bytes (size/8.0 is truncated when converted to the integral size malloc expects; it never makes sense to pass a float or a double to malloc, because you cannot allocate fractional bytes), so accessing the 98th bit also reads past the allocated area: list[3] occupies bytes 12-15, but only bytes 0-11 were allocated.
You should fix these undefined behaviors: allocate enough memory by using
// count needs to be a multiple of sizeof(int)
// The math gets pretty ugly here, but it should work:
int count = sizeof(int)*(size+(8*sizeof(int))-1)/(8*sizeof(int));
int * list = malloc(count);
Then initialize the data to zero by using the proper size:
memset(list, 0, count);
You're writing to memory you do not own; that is undefined behavior.
Firstly, you're allocating only 12 bytes here:
int* list = malloc(size / 8.0);
You should do this (just giving you the idea, I don't know how many bytes you really want to allocate..):
int* list = malloc((size / 8.0) * sizeof(*list));
Secondly, you are memsetting only 4 bytes here (8 on a 64-bit system), because sizeof list is the size of the pointer:
memset(list, 0, sizeof list);
You should do this:
memset(list, 0, (size / 8.0) * sizeof(*list));
Finally, the only reason your call to fopen() changes things is that fopen() itself allocates memory; the bytes you wrote out of bounds sit in heap space that fopen() may hand out and overwrite.
Good luck.

Segmentation fault after scatterv

/**
* BLOCK_LOW
* Returns the offset of a local array
* with regards to block decomposition
* of a global array.
*
* #param (int) process rank
* #param (int) total number of processes
* #param (int) size of global array
* #return (int) offset of local array in global array
*/
#define BLOCK_LOW(id, p, n) ((id)*(n)/(p))
/**
* BLOCK_HIGH
* Returns the index immediately after the
* end of a local array with regards to
* block decomposition of a global array.
*
* #param (int) process rank
* #param (int) total number of processes
* #param (int) size of global array
* #return (int) offset after end of local array
*/
#define BLOCK_HIGH(id, p, n) (BLOCK_LOW((id)+1, (p), (n)))
/**
* BLOCK_SIZE
* Returns the size of a local array
* with regards to block decomposition
* of a global array.
*
* #param (int) process rank
* #param (int) total number of processes
* #param (int) size of global array
* #return (int) size of local array
*/
#define BLOCK_SIZE(id, p, n) ((BLOCK_HIGH((id), (p), (n))) - (BLOCK_LOW((id), (p), (n))))
/**
* BLOCK_OWNER
* Returns the rank of the process that
* handles a certain local array with
* regards to block decomposition of a
* global array.
*
* #param (int) index in global array
* #param (int) total number of processes
* #param (int) size of global array
* #return (int) rank of process that handles index
*/
#define BLOCK_OWNER(i, p, n) (((p)*((i)+1)-1)/(n))
/*Matricefilenames:
small matrix A.bin of dimension 100 × 50
small matrix B.bin of dimension 50 × 100
large matrix A.bin of dimension 1000 × 500
large matrix B.bin of dimension 500 × 1000
An MPI program should be implemented such that it can
• accept two file names at run-time,
• let process 0 read the A and B matrices from the two data files,
• let process 0 distribute the pieces of A and B to all the other processes,
• involve all the processes to carry out the chosen parallel algorithm
for matrix multiplication C = A * B ,
• let process 0 gather, from all the other processes, the different pieces
of C ,
• let process 0 write out the entire C matrix to a data file.
*/
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#include "mpi-utils.c"
void read_matrix_binaryformat (char*, double***, int*, int*);
void write_matrix_binaryformat (char*, double**, int, int);
void create_matrix (double***,int,int);
void matrix_multiplication (double ***, double ***, double ***,int,int, int);
int main(int argc, char *argv[]) {
int id,p; // Process rank and total amount of processes
int rowsA, colsA, rowsB, colsB; // Matrix dimensions
double **A; // Matrix A
double **B; // Matrix B
double **C; // Result matrix C : AB
int local_rows; // Local row dimension of the matrix A
double **local_A; // The local A matrix
double **local_C; // The local C matrix
MPI_Init (&argc, &argv);
MPI_Comm_rank (MPI_COMM_WORLD, &id);
MPI_Comm_size (MPI_COMM_WORLD, &p);
if(argc != 3) {
if(id == 0) {
printf("Usage:\n>> %s matrix_A matrix_B\n",argv[0]);
}
MPI_Finalize();
exit(1);
}
if (id == 0) {
read_matrix_binaryformat (argv[1], &A, &rowsA, &colsA);
read_matrix_binaryformat (argv[2], &B, &rowsB, &colsB);
}
if (p == 1) {
create_matrix(&C,rowsA,colsB);
matrix_multiplication (&A,&B,&C,rowsA,colsB,colsA);
char* filename = "matrix_C.bin";
write_matrix_binaryformat (filename, C, rowsA, colsB);
free(A);
free(B);
free(C);
MPI_Finalize();
return 0;
}
// For this assignment we have chosen to bcast the whole matrix B:
MPI_Bcast (&B, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
MPI_Bcast (&colsA, 1, MPI_INT, 0, MPI_COMM_WORLD);
MPI_Bcast (&colsB, 1, MPI_INT, 0, MPI_COMM_WORLD);
MPI_Bcast (&rowsA, 1, MPI_INT, 0, MPI_COMM_WORLD);
MPI_Bcast (&rowsB, 1, MPI_INT, 0, MPI_COMM_WORLD);
local_rows = BLOCK_SIZE(id, p, rowsA);
/* SCATTER VALUES */
int *proc_elements = (int*)malloc(p*sizeof(int)); // amount of elements for each processor
int *displace = (int*)malloc(p*sizeof(int)); // displacement of elements for each processor
int i;
for (i = 0; i<p; i++) {
proc_elements[i] = BLOCK_SIZE(i, p, rowsA)*colsA;
displace[i] = BLOCK_LOW(i, p, rowsA)*colsA;
}
create_matrix(&local_A,local_rows,colsA);
MPI_Scatterv(&A[0],&proc_elements[0],&displace[0],MPI_DOUBLE,&local_A[0],
local_rows*colsA,MPI_DOUBLE,0,MPI_COMM_WORLD);
/* END SCATTER VALUES */
create_matrix (&local_C,local_rows,colsB);
matrix_multiplication (&local_A,&B,&local_C,local_rows,colsB,colsA);
/* GATHER VALUES */
MPI_Gatherv(&local_C[0], rowsA*colsB, MPI_DOUBLE,&C[0],
&proc_elements[0],&displace[0],MPI_DOUBLE,0, MPI_COMM_WORLD);
/* END GATHER VALUES */
char* filename = "matrix_C.bin";
write_matrix_binaryformat (filename, C, rowsA, colsB);
free (proc_elements);
free (displace);
free (local_A);
free (local_C);
free (A);
free (B);
free (C);
MPI_Finalize ();
return 0;
}
void create_matrix (double ***C,int rows,int cols) {
*C = (double**)malloc(rows*sizeof(double*));
(*C)[0] = (double*)malloc(rows*cols*sizeof(double));
int i;
for (i=1; i<rows; i++)
(*C)[i] = (*C)[i-1] + cols;
}
void matrix_multiplication (double ***A, double ***B, double ***C, int rowsC,int colsC,int colsA) {
double sum;
int i,j,k;
for (i = 0; i < rowsC; i++) {
for (j = 0; j < colsC; j++) {
sum = 0.0;
for (k = 0; k < colsA; k++) {
sum = sum + (*A)[i][k]*(*B)[k][j];
}
(*C)[i][j] = sum;
}
}
}
/* Reads a 2D array from a binary file*/
void read_matrix_binaryformat (char* filename, double*** matrix, int* num_rows, int* num_cols) {
int i;
FILE* fp = fopen (filename,"rb");
fread (num_rows, sizeof(int), 1, fp);
fread (num_cols, sizeof(int), 1, fp);
/* storage allocation of the matrix */
*matrix = (double**)malloc((*num_rows)*sizeof(double*));
(*matrix)[0] = (double*)malloc((*num_rows)*(*num_cols)*sizeof(double));
for (i=1; i<(*num_rows); i++)
(*matrix)[i] = (*matrix)[i-1]+(*num_cols);
/* read in the entire matrix */
fread ((*matrix)[0], sizeof(double), (*num_rows)*(*num_cols), fp);
fclose (fp);
}
/* Writes a 2D array in a binary file */
void write_matrix_binaryformat (char* filename, double** matrix, int num_rows, int num_cols) {
FILE *fp = fopen (filename,"wb");
fwrite (&num_rows, sizeof(int), 1, fp);
fwrite (&num_cols, sizeof(int), 1, fp);
fwrite (matrix[0], sizeof(double), num_rows*num_cols, fp);
fclose (fp);
}
My task is to do a parallel matrix multiplication of matrices A and B and gather the results into matrix C.
I am doing this by dividing matrix A into row-wise pieces; each process uses its piece to multiply matrix B and gets back its piece of the product. Then I gather all the pieces from the processes and put them together into matrix C.
I already posted a similar question, but this code is improved and I have made progress; however, I am still getting a segmentation fault after the Scatterv call.
So I see a few problems right away:
MPI_Bcast (&B, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
Here, you're passing not a pointer to doubles, but a pointer to a pointer to a pointer to a double (B is defined as double **B) and you're telling MPI to follow that pointer and send 1 double from there. That is not going to work.
You might think that what you're accomplishing here is sending the pointer to the matrix, from which all tasks can read the array -- that doesn't work. The processes don't share a common memory space (that's why MPI is called distributed memory programming) and the pointer doesn't go anywhere. You're actually going to have to send the contents of the matrix,
MPI_Bcast (&(B[0][0]), rowsB*colsB, MPI_DOUBLE, 0, MPI_COMM_WORLD);
and you're going to have to make sure the other processes have correctly allocated memory for the B matrix ahead of time.
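A short sketch of that step, reusing the create_matrix() helper from the posted code (the dimensions must be broadcast before the contents so the non-root ranks know how much to allocate):
MPI_Bcast(&rowsB, 1, MPI_INT, 0, MPI_COMM_WORLD);
MPI_Bcast(&colsB, 1, MPI_INT, 0, MPI_COMM_WORLD);
if (id != 0)
    create_matrix(&B, rowsB, colsB);   /* contiguous rowsB x colsB block, as on rank 0 */
MPI_Bcast(&(B[0][0]), rowsB * colsB, MPI_DOUBLE, 0, MPI_COMM_WORLD);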
There are similar pointer problems elsewhere:
MPI_Scatterv(&A[0], ..., &local_A[0]
Again, A is a pointer to a pointer to doubles (double **A) as is local_A, and you need to be pointing MPI to pointer to doubles for this to work, something like
MPI_Scatterv(&(A[0][0]), ..., &(local_A[0][0])
That error seems to be present in all the communication routines.
Remember that anything that looks like (buffer, count, TYPE) in MPI means that the MPI routines follow the pointer buffer and send the next count pieces of data of type TYPE from there. MPI can't follow pointers within the buffer you send because in general it doesn't know they're there; it just takes the next (count * sizeof(TYPE)) bytes starting at buffer and does whatever communication is appropriate with them. So you have to pass it a pointer to a stream of data of type TYPE.
Having said all that, it would be a lot easier to work with you on this if you narrowed things down a bit; right now the program you've posted includes a lot of I/O that's irrelevant, which means no one can just run your program to see what happens without first figuring out the matrix format and then generating two matrices on their own. When posting a question about source code, you really want to post (a) a small bit of source which (b) reproduces the problem and (c) is completely self-contained.
Consider this an extended comment, as Jonathan Dursi has already given a fairly elaborate answer. Your matrices are really represented in a weird way, but at least you followed the advice given on your other question and allocate space for them as contiguous blocks rather than separately for each row.
Given that, you should replace:
MPI_Scatterv(&A[0],&proc_elements[0],&displace[0],MPI_DOUBLE,&local_A[0],
local_rows*colsA,MPI_DOUBLE,0,MPI_COMM_WORLD);
with
MPI_Scatterv(A[0],&proc_elements[0],&displace[0],MPI_DOUBLE,local_A[0],
local_rows*colsA,MPI_DOUBLE,0,MPI_COMM_WORLD);
A[0] already points to the beginning of the matrix data and there is no need to make a pointer to it. The same goes for local_A[0] as well as for the parameters to the MPI_Gatherv() call.
It has been said many times already - MPI doesn't do pointer chasing and only works with flat buffers.
I've also noticed another mistake in your code - memory for your matrices is not freed correctly. You are only freeing the array of pointers and not the matrix data itself:
free(A);
should really become
free(A[0]); free(A);
