MPI_Send and MPI_Recv, measuring transfer time of a 1 MB message - C

I am attempting to send a 1 MB message using MPI_Send and MPI_Recv and to measure how long the send takes. Here is my C code:
#include <stdio.h>
#include <mpi.h>
#include <assert.h>
#include <sys/time.h>
int main(int argc, char *argv[])
{
    int rank, p;
    struct timeval t1, t2;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &p);
    printf("my rank=%d\n", rank);
    printf("Rank=%d: number of processes =%d\n", rank, p);
    assert(p >= 2);
    if (rank == 0) {
        int x[255] = { 0 };
        int dest = 7;
        int i = 0;
        while (i < 254) {
            x[i] = 255;
            i++;
        }
        gettimeofday(&t1, NULL);
        MPI_Send(&x[0], 255, MPI_INT, dest, 1, MPI_COMM_WORLD);
        gettimeofday(&t2, NULL);
        int tSend = (t2.tv_sec - t1.tv_sec) * 1000 + (t2.tv_usec - t1.tv_usec) / 1000;
        printf("Rank=%d: sent message %d to rank %d; Send time %d millisec\n", rank, *x, dest, tSend);
    } else if (rank == 7) {
        int y[255] = { 0 };
        MPI_Status status;
        gettimeofday(&t1, NULL);
        MPI_Recv(&y[0], 255, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
        gettimeofday(&t2, NULL);
        int tRecv = (t2.tv_sec - t1.tv_sec) * 1000 + (t2.tv_usec - t1.tv_usec) / 1000;
        printf("Rank=%d: received message %d from rank %d; Recv time %d millisec\n", rank, *y, status.MPI_SOURCE, tRecv);
    }
    MPI_Finalize();
}
This code compiles and runs just fine, but it always says that it completes the send and receive in 0 milliseconds, which can't be right. I'm guessing that my syntax for sending the array is wrong, so I'm only sending 4 bytes or something, but I can't figure it out.
Any help would be appreciated!

A better way to measure the time might be to measure it in microseconds,
(t2.tv_sec - t1.tv_sec) * 1000000 + t2.tv_usec - t1.tv_usec
and see whether you then get nonzero values.
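For example, a minimal sketch of that change applied to the sender's timing code from the question (same variables, only the unit conversion differs):
gettimeofday(&t1, NULL);
MPI_Send(&x[0], 255, MPI_INT, dest, 1, MPI_COMM_WORLD);
gettimeofday(&t2, NULL);
/* elapsed time in microseconds rather than milliseconds */
long tSendUs = (t2.tv_sec - t1.tv_sec) * 1000000L + (t2.tv_usec - t1.tv_usec);
printf("Rank=%d: Send time %ld microsec\n", rank, tSendUs);
Note also that MPI_Send(&x[0], 255, MPI_INT, ...) transmits 255 ints, which is roughly 1 KB (assuming 4-byte ints) rather than 1 MB, so a sub-millisecond send time is entirely plausible.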

Related

MPI_Scatter produces write error, bad address (3)

I am getting a write error when trying to scatter a dynamically allocated matrix (it is contiguous); it happens when 5 or more cores are involved in the computation. I have placed printfs, and the error occurs in the scatter. The code is as follows:
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <cblas.h>
#include <sys/time.h>
int main(int argc, char* argv[])
{
    int err = MPI_Init(&argc, &argv);
    MPI_Comm world;
    world = MPI_COMM_WORLD;
    int size = 0;
    err = MPI_Comm_size(world, &size);
    int rank = 0;
    err = MPI_Comm_rank(world, &rank);
    int n_rows = 2400, n_cols = 2400, n_rpc = n_rows / size;
    float *A, *Asc, *B, *C; // Dynamically allocate A, B and C
    Asc = malloc(n_rpc * n_cols * sizeof(float));
    B = malloc(n_rows * n_cols * sizeof(float));
    C = malloc(n_rows * n_cols * sizeof(float));
    A = malloc(n_rows * n_cols * sizeof(float));
    if (rank == 0)
    {
        for (int i = 0; i < n_rows; i++)
        {
            for (int j = 0; j < n_cols; j++)
            {
                A[i*n_cols+j] = i + 1.0;
                B[i*n_cols+j] = A[i*n_cols+j];
            }
        }
    }
    struct timeval start, end;
    if (rank == 0) gettimeofday(&start, NULL);
    MPI_Bcast(B, n_rows*n_cols, MPI_FLOAT, 0, MPI_COMM_WORLD);
    if (rank == 0) printf("Before Scatter\n"); // It is breaking here
    MPI_Scatter(A, n_rpc*n_cols, MPI_FLOAT, Asc, n_rpc*n_cols, MPI_FLOAT, 0, MPI_COMM_WORLD);
    if (rank == 0) printf("After Scatter\n");
    /* Some computation */
    err = MPI_Finalize();
    if (err) DIE("MPI_Finalize");
    return err;
}
Up to 4 cores it works correctly and performs the scatter, but with 5 or more it does not, and I cannot find a clear reason.
The error message is as follows:
[raspberrypi][[26238,1],0][btl_tcp_frag.c:130:mca_btl_tcp_frag_send] mca_btl_tcp_frag_send: writev error (0xac51e0, 8)
Bad address(3)
[raspberrypi][[26238,1],0][btl_tcp_frag.c:130:mca_btl_tcp_frag_send] mca_btl_tcp_frag_send: writev error (0xaf197048, 29053982)
Bad address(1)
[raspberrypi:05345] pml_ob1_sendreq.c:308 FATAL
Thanks in advance!
There are multiple errors here. First of all, take care to always use the same type when defining variables. Then, when you use scatter, the send count and the receive count are the same, and each rank will receive Elements/Cores items. Likewise, when collecting the results back with gather, you have to receive the same amount you sent, so again Elements/Cores.
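A minimal sketch of what those matching counts look like, using the question's variables (and assuming n_rows is divisible by size):
/* each rank receives n_rpc*n_cols floats; send and receive counts match */
MPI_Scatter(A, n_rpc * n_cols, MPI_FLOAT,
            Asc, n_rpc * n_cols, MPI_FLOAT, 0, MPI_COMM_WORLD);
/* ... compute on the local block Asc ... */
/* gather back exactly the amount each rank received */
MPI_Gather(Asc, n_rpc * n_cols, MPI_FLOAT,
           C, n_rpc * n_cols, MPI_FLOAT, 0, MPI_COMM_WORLD);
If size does not divide n_rows evenly, n_rpc = n_rows / size truncates and the ranks collectively scatter fewer than n_rows rows; MPI_Scatterv is the usual tool for uneven block sizes.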

Sending and receiving different types of data using MPI_Isend

I want to send different types of data from one node to another. For example, I have two pieces of data with different types: an int and a double. However, after I send them, the receiving node gets the wrong values even though I specify the types. With the example code below, the second node receives 0.000000 and 0 for the two receives. How can I fix this so the second node receives the right values, in this case 1.2 for the double and 5 for the int?
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <string.h>
/* Main function */
int main(int argc, char *argv[]) {
    int rank, numberOfProcesses;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &numberOfProcesses);
    MPI_Status status[5];
    MPI_Request req[5]; // request array
    int nreq = 0;
    int tag = 0;
    if (rank == 0) {
        double sendDouble = 1.2;
        int sendInteger = 5;
        MPI_Isend(&sendInteger, 1, MPI_INT, 1, 100, MPI_COMM_WORLD, &req[nreq++]);
        MPI_Isend(&sendDouble, 1, MPI_DOUBLE, 1, 222, MPI_COMM_WORLD, &req[nreq++]);
    }
    MPI_Waitall(nreq, req, status);
    if (rank == 1) {
        MPI_Status status;
        double recvDouble;
        int recvInt;
        MPI_Irecv(&recvDouble, 1, MPI_DOUBLE, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &req[nreq++]);
        MPI_Irecv(&recvInt, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &req[nreq++]);
        printf("receive double %lf\n", recvDouble);
        printf("receive integer %d\n", recvInt);
    }
    MPI_Finalize();
    return 0;
}
receive double 0.000000
receive integer 0
I am not sure if this is an answer, but I have some observations:
Apparently you start the program twice, and depending on its rank it acts as a sender or a receiver.
If it is the sender, you send an integer. Then you send a double.
If it is the receiver, you want to receive a double first.
But in asynchronous communication you must read what is in the channel, so the first thing in the channel would be the integer. After having read the integer, you can read the double. (I cannot find any information suggesting that the receiver could pick a double out of the stream while ignoring the int.)
Note that your MPI_Waitall(nreq, req, status); is also executed for rank==1, though that won't matter there because nreq is zero.
You should also check the return values of the MPI calls.
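Building on those observations, here is a minimal sketch (an illustration, not the poster's code) of a receiving side that matches the sender's tags so each message lands in the right buffer, and that completes the nonblocking receives before reading them:
if (rank == 1) {
    double recvDouble;
    int recvInt;
    MPI_Request rreq[2];
    MPI_Status rstat[2];
    /* match the sender's tags (100 for the int, 222 for the double) */
    MPI_Irecv(&recvInt, 1, MPI_INT, 0, 100, MPI_COMM_WORLD, &rreq[0]);
    MPI_Irecv(&recvDouble, 1, MPI_DOUBLE, 0, 222, MPI_COMM_WORLD, &rreq[1]);
    /* the buffers are undefined until the requests complete */
    MPI_Waitall(2, rreq, rstat);
    printf("receive double %lf\n", recvDouble);
    printf("receive integer %d\n", recvInt);
}
Without the MPI_Waitall, the printf calls read the buffers before the receives have completed, which is exactly the 0.000000 and 0 output shown above.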

How to get runtime of one thread in kernel?

I need to get the process/thread time. I am using linux-4.13.4 on Ubuntu 16.04.
I read some posts and gathered that sum_exec_runtime is:
the total time the process has run on a CPU,
in real time,
in nanosecond units (10^−9 s),
updated in __update_curr(), called from update_curr().
So I think that, for a single-threaded program, I can get the running time of the thread by extracting sum_exec_runtime from its task_struct.
To get the time I added syscalls, making some small changes inside the Linux kernel:
struct task_struct {
    ...
    ...
    struct sched_entity se;
    // TODO: to get start time and end time
    u64 vm_start_time;
    u64 vm_end_time;
    ...
    ...
};
Then I added syscalls that store sum_exec_runtime into vm_start_time and vm_end_time when called:
asmlinkage long sys_start_vm_timer(int __user vm_tid);
asmlinkage long sys_end_vm_timer(int __user vm_tid, unsigned long long __user *time);
SYSCALL_DEFINE1(start_vm_timer,
                int __user, vm_tid)
{
    (find_task_by_vpid(vm_tid))->vm_start_time =
        (find_task_by_vpid(vm_tid))->se.sum_exec_runtime;
    return 0;
}
SYSCALL_DEFINE2(end_vm_timer,
                int __user, vm_tid,
                unsigned long long __user *, time)
{
    u64 vm_time;
    (find_task_by_vpid(vm_tid))->vm_end_time =
        (find_task_by_vpid(vm_tid))->se.sum_exec_runtime;
    printk("-------------------\n");
    printk("end_vm_time: vm_elapsed_time = %llu \n",
           (find_task_by_vpid(vm_tid))->vm_end_time - (find_task_by_vpid(vm_tid))->vm_start_time);
    vm_time = (find_task_by_vpid(vm_tid))->vm_end_time - (find_task_by_vpid(vm_tid))->vm_start_time;
    copy_to_user(time, &vm_time, sizeof(unsigned long long));
    return 0;
}
I tested it with this program, which tries to get the time of a for loop:
#include <stdio.h>
#include <linux/kernel.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <time.h>
#include <sys/time.h>
#include <stdlib.h>
int main(){
    int tid = syscall(SYS_gettid);
    printf("tid = %d \n", tid);
    printf("My process ID : %d\n", getpid());
    printf("My parent's ID: %d\n", getppid());
    struct timeval start, end;
    unsigned long long elapsedTime = 0;
    gettimeofday(&start, NULL);
    syscall(336, tid);
    int i = 0;
    int j = 0;
    for (i = 0; i < 65535; i++) {
        j += 1;
    }
    syscall(337, tid, &elapsedTime);
    gettimeofday(&end, NULL);
    printf("thread time = %llu microseconds \n", elapsedTime / 1000);
    printf("gettimeofday = %ld microseconds \n", ((end.tv_sec * 1000000 + end.tv_usec) - (start.tv_sec * 1000000 + start.tv_usec)));
    return 0;
}
I get an unexpected result:
wxf#wxf:/home/wxf/cPrj$ ./thread_time
tid = 6905
My process ID : 6905
My parent's ID: 6595
thread time = 0 microseconds
gettimeofday = 422 microseconds
From dmesg,
[63287.065285] tid = 6905
[63287.065288] start_vm_timer = 0
[63287.065701] tid = 6905
[63287.065702] -------------------
[63287.065703] end_vm_timer = 0
[63287.065704] end_vm_time: vm_elapsed_time = 0
I expected them to be almost the same. So why is the process/thread time 0?
The value of sum_exec_runtime does not include the current runtime of the thread. It's updated when needed, not continuously. See the update_curr function.
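If you need the up-to-date figure, one option (a sketch under the assumption that task_sched_runtime() from kernel/sched/cputime.c is appropriate here; it returns sum_exec_runtime plus the runtime the task has accumulated since the last update) would be:
#include <linux/sched/cputime.h>

SYSCALL_DEFINE1(start_vm_timer,
                int __user, vm_tid)
{
    /* task_sched_runtime() folds in the runtime accumulated since the
       last call to update_curr(), unlike reading se.sum_exec_runtime */
    (find_task_by_vpid(vm_tid))->vm_start_time =
        task_sched_runtime(find_task_by_vpid(vm_tid));
    return 0;
}
with the same substitution in end_vm_timer. Also note that a loop of only 65535 increments may simply run between two scheduler updates, so even a correct reading can legitimately round down to a very small value.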

A performance test doesn't work with nanomsg (bus)

I'd like to use nanomsg as a bus system, so I tried to write a performance test and run it on two PCs.
First I wrote a server, which connects to the other server:
#include <nanomsg/nn.h>
#include <nanomsg/bus.h>
#include <string.h>
#include <stdio.h>
#include <assert.h>
int main(int argc, char *argv[]) {
    if (argc == 2) {
        int socket = nn_socket(AF_SP_RAW, NN_BUS);
        assert(nn_bind(socket, "tcp://*:27384") != -1);
        assert(nn_connect(socket, argv[1]) != -1);
        assert(nn_device(socket, -1) != -1);
        nn_close(socket);
    }
}
In my case I run these commands as:
On the first PC:./server tcp://192.168.1.11:27384
On the second PC:./server tcp://192.168.1.241:27384
They are connected; to prove it, I used nanocat and connected it locally to each server:
On the first PC:
nanocat --bus --connect tcp://127.0.0.1:27384 --data foo --interval 1 --ascii
On the second PC:
nanocat --bus --connect tcp://127.0.0.1:27384 --data bar --interval 1 --ascii
On the first PC, I received a 'bar' every second, on the second PC a 'foo', also every second.
Then I wrote the receiver:
#include <nanomsg/nn.h>
#include <nanomsg/bus.h>
#include <string.h>
#include <stdio.h>
#include <assert.h>
#include <unistd.h> /* for sleep() */
int main(int argc, char *argv[]) {
    int socket = nn_socket(AF_SP, NN_BUS);
    assert(nn_connect(socket, "tcp://127.0.0.1:27384") != -1);
    sleep(1);
    unsigned char buffer[4096];
    while (1) {
        int n = nn_recv(socket, buffer, 4096, 0);
        if (n > 0) {
            /* echo back exactly the n bytes received; the buffer is not
               NUL-terminated, so strlen() would read past the message */
            nn_send(socket, buffer, n, 0);
        }
    }
    nn_close(socket);
}
It receives a message and sends it back.
Then I wrote the sender:
#include <nanomsg/nn.h>
#include <nanomsg/bus.h>
#include <stdio.h>
#include <unistd.h>
#include <time.h>
#include <string.h>
#define NANO_PER_SEC 1000000000.0
int main(int argc, char *argv[]) {
    int socket = nn_socket(AF_SP, NN_BUS);
    nn_connect(socket, "tcp://127.0.0.1:27384");
    sleep(1);
    unsigned char buffer[4096];
    int i = 0;
    for (i = 0; i < 1024; i++) {
        buffer[i] = 'a';
    }
    buffer[i] = '\0';
    struct timespec start, end;
    double start_sec, end_sec, elapsed_sec;
    double average;
    double m[4096];
    for (i = 0; i < 4096; i++) {
        clock_gettime(CLOCK_REALTIME, &start);
        int ns = nn_send(socket, buffer, strlen((char *)buffer), 0);
        int nr = nn_recv(socket, buffer, 4096, 0);
        clock_gettime(CLOCK_REALTIME, &end);
        start_sec = start.tv_sec + start.tv_nsec / NANO_PER_SEC;
        end_sec = end.tv_sec + end.tv_nsec / NANO_PER_SEC;
        m[i] = end_sec - start_sec;
    }
    elapsed_sec = 0.0;
    for (i = 0; i < 4096; i++) {
        elapsed_sec = elapsed_sec + m[i];
    }
    average = (elapsed_sec / 4096.0) * 1000000.0;
    printf("Average: %.3f micros\nWhole: %.12f seconds\n", average, elapsed_sec);
    nn_close(socket);
}
The sender transmits 1 kbyte to the receiver 4096 times and measures the time for each round trip, which gives me the total time and the average time.
First I tested it locally on a single PC, in three open bash terminals:
First terminal:./server tcp://192.168.1.11:27384
Second terminal:./receiver
Third terminal:./sender
From the "sender" program I got this output:
Average: 60.386 micros
Whole: 0.247341632843 seconds
Then I tried to run this:
On first PC:
./server tcp://192.168.1.11:27384
./receiver
On second PC:
./server tcp://192.168.1.241:27384
./sender
But it gets stuck: the first PC, which runs the "receiver", doesn't receive any messages from the second PC, which runs the "sender". I can't see what's wrong, because with nanocat it works fine.
Can somebody please help me?

Force MPI_Send to use eager or rendezvous protocol

I'm writing a little MPI (Open MPI) program in C for a workshop at college. Our objective is to observe the time difference between the two main protocols of MPI, eager and rendezvous, depending on the message size.
We haven't worked with MPI before, and we thought there might be a way to select between the two protocols. Searching Google for how to do this, I found (somewhere I don't remember) that there is an eager limit. I read that it is set by the MPI implementation, and also that you can change it somehow.
Any advice on how to choose between the protocols?
Is there any relation between the protocols and MPI_Send/MPI_Isend?
I thought that changing the receive buffer size would switch from eager to rendezvous, but that's just a hunch.
Here is my code for now:
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include "mpi.h"
#define KBdata 32000 // Open MPI default buffer size
#define ndata KBdata/4 // number of ints that fit in the buffer
int main(int argc, char *argv[]) {
    int myid, numprocs;
    int tag, source, destination, count;
    int buffer[ndata];
    MPI_Status status;
    MPI_Request request;
    int iter = 20;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    if (myid == 0 && numprocs == 2) {
        int recvID = 1;
        double acum = 0;
        int i;
        double startT;
        for (i = 0; i < iter; ++i)
        {
            double startTime = MPI_Wtime();
            MPI_Send(&buffer, ndata, MPI_INT, recvID, 0, MPI_COMM_WORLD);
            double endTime = MPI_Wtime();
            double elapsed = endTime - startTime;
            acum += elapsed;
            printf("%d, %f, elapsed: %f\n", i, acum, elapsed);
            fflush(stdout);
            usleep(500000);
        }
        printf("total: %f\nmean: %f\n", acum, acum/iter);
    }
    else if (numprocs == 2) {
        int i;
        for (i = 0; i < iter; ++i)
        {
            printf("Waiting for receive\n");
            fflush(stdout);
            MPI_Recv(&buffer, ndata, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
            printf("Received %d\n", i);
            fflush(stdout);
        }
    }
    else {
        printf("Need only 2 threads\n");
    }
    MPI_Finalize();
    return 0;
}
Thanks in advance.
There is no direct connection between eager/rendezvous and MPI_Send/MPI_Isend. However, if you're under the eager limit, your MPI_Send no longer blocks until the receiver posts a matching receive. If you want it to block regardless of message size, you can use MPI_Ssend, whose synchronous semantics mean it completes only once the matching receive has started.
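A minimal sketch of that comparison, fitted to the question's code (and assuming the other rank posts two matching MPI_Recv calls per iteration):
double t0 = MPI_Wtime();
MPI_Send(&buffer, ndata, MPI_INT, recvID, 0, MPI_COMM_WORLD);  /* may complete early below the eager limit */
double t1 = MPI_Wtime();
MPI_Ssend(&buffer, ndata, MPI_INT, recvID, 0, MPI_COMM_WORLD); /* always waits for the matching receive */
double t2 = MPI_Wtime();
printf("Send: %f s, Ssend: %f s\n", t1 - t0, t2 - t1);
Below the eager limit the two times should diverge clearly; above it, MPI_Send behaves like MPI_Ssend and they should be similar.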
Regarding eager limits:
MVAPICH2:
MV2_IBA_EAGER_THRESHOLD=<nbytes>
Intel MPI (depending on version):
I_MPI_EAGER_THRESHOLD=<nbytes>
I_MPI_SHM_EAGER_THRESHOLD=<nbytes>
Open MPI:
--mca btl_openib_eager_limit <nbytes>
--mca btl_openib_rndv_eager_limit <nbytes>
Cray MPICH:
MPICH_GNI_MAX_EAGER_MSG_SIZE=<nbytes>
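For example, with Open MPI over TCP (the parameter name for your transport is an assumption here; it can be checked with ompi_info --param btl tcp), lowering the eager limit would look something like:
mpirun -np 2 --mca btl_tcp_eager_limit 4096 ./eager_test
where ./eager_test stands for your compiled binary. Messages larger than that limit then go through the rendezvous path.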
