I have to face this situation:
given N MPI nodes
and
given a 2D real array of dimensions [N_ROWS, N_COLS],
I have to partition it in order to speed up the computation, giving each node
a subsection of the 2D array and taking advantage of the number of nodes.
Following the Fortran way of storing data in memory, arrays are indexed with the most rapidly changing variable first, so every [:,i] column of the array is "logically" separate from the others.
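For example, this tiny standalone program (just an illustration, not part of my real code) shows that storage order:
program column_major_demo
    implicit none
    integer :: i
    real :: a(3,2)
    ! Memory order is a(1,1), a(2,1), a(3,1), a(1,2), a(2,2), a(3,2),
    ! so each column a(:,i) is one contiguous block.
    a = reshape([(real(i), i = 1, 6)], [3, 2])
    print *, a   ! prints 1.0 2.0 3.0 4.0 5.0 6.0, i.e. column by column
end program column_major_demo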
I have looked at very illuminating questions like this one: Sending 2D arrays in Fortran with MPI_Gather
I have reached the idea of using mpi_scatterv and mpi_gatherv, BUT I'm stuck on the fact that the problem constraints give no guarantee that each MPI node gets the same amount of data, or, in pseudo code:
#Number_of_MPI_nodes != N_ROWS*N_COLS
I was looking to use vectors, since each "column" has its own "independent" series of data; when I say "independent" I mean that I have to do some manipulation on the data belonging to the same column, without affecting the other columns.
Obviously, given the inequality above, some MPI nodes will have a different number of "columns" to analyze.
After doing some math, I need to gather the data back using mpi_gatherv.
I will update the question with a working example in a few hours!
Thanks a lot to everybody !
CODE:
program main
use mpi
implicit none
integer:: N_COLS=100, N_ROWS=200
integer:: i, j
integer:: ID_mpi, COM_mpi, ERROR_mpi
integer:: master = 0, SIZE_mpi=0
integer:: to_each_cpu=0, to_each_cpu_oddment=0
integer:: sub_matrix_size=0
integer:: nans=0, infs=0, array_split =0, my_type=0
integer ,dimension(:), allocatable :: elem_to_each_cpu
integer ,dimension(:), allocatable :: displacements
integer,parameter:: seed = 12345
character*160:: message
real :: tot_sum = 0.0
real ,dimension(:,:), allocatable:: Data_Matrix
real ,dimension(:,:), allocatable:: sub_split_Data_Matrix
call srand(seed)
call MPI_INIT(ERROR_mpi)
COM_mpi = MPI_COMM_WORLD
call MPI_COMM_RANK(COM_mpi,ID_mpi,ERROR_mpi)
call MPI_COMM_SIZE(COM_mpi,SIZE_mpi,ERROR_mpi)
!! allocation Data_Matrix
i = 1; j = 1
if (ID_mpi .eq. master) then
i = N_ROWS; j = N_COLS
end if
allocate(Data_Matrix(i, j))
do j = 1, N_COLS
do i = 1, N_ROWS
Data_Matrix(i, j) = rand()
tot_sum = tot_sum + Data_Matrix(i, j)
enddo
enddo
write(message,*) "N_COLS:",N_COLS, "N_ROWS:", N_ROWS, " TOTAL_SUM:", tot_sum
write(*,*) message
!! SINCE THERE ARE NO RESTRICTIONS ON MPI NUMBER OR CPUS OR
!! SIZE OR Data_Matrix I NEED TO DO THIS
to_each_cpu =N_COLS / SIZE_mpi
to_each_cpu_oddment = N_COLS -( to_each_cpu * SIZE_mpi )
allocate(elem_to_each_cpu(SIZE_mpi))
elem_to_each_cpu = to_each_cpu
allocate(displacements(SIZE_mpi))
displacements = 0
!! I CHOOSE TO SPLIT THE DATA IN THIS WAY
if (ID_mpi .eq. master) then
write(message,*) "N_COLS:",N_COLS, "mpisize:", SIZE_mpi, "to_each_cpu\oddment:", to_each_cpu, " \ ", to_each_cpu_oddment
write(*,*) message
j=1
do i = 1 , to_each_cpu_oddment
elem_to_each_cpu(j) = elem_to_each_cpu(j) + 1
j = j + 1
if(j .gt. SIZE_mpi) j = 1
enddo
do j = 2, SIZE_mpi
displacements(j) = elem_to_each_cpu(j-1) + displacements(j-1)
enddo
do i = 1 , SIZE_mpi
write(message,*)i, " to_each_cpu:", &
elem_to_each_cpu(i), " sub_split_buff_displ:",displacements(i), "=",elem_to_each_cpu(i)+displacements(i)
write(*,*) message
enddo
end if
call MPI_BCAST(elem_to_each_cpu, SIZE_mpi, MPI_INT, 0, COM_mpi, ERROR_mpi)
call MPI_BCAST(displacements, SIZE_mpi, MPI_INT, 0, COM_mpi, ERROR_mpi)
allocate( sub_split_Data_Matrix(N_ROWS,elem_to_each_cpu(ID_mpi+1)) )
call MPI_TYPE_VECTOR(N_COLS,N_ROWS,N_ROWS,MPI_FLOAT,my_type,ERROR_mpi)
call MPI_TYPE_COMMIT(my_type, ERROR_mpi)
sub_split_Data_Matrix=0
sub_matrix_size = N_ROWS*elem_to_each_cpu(ID_mpi+1)
call MPI_scatterv( Data_Matrix,elem_to_each_cpu,displacements,&
MPI_FLOAT, sub_split_Data_Matrix, sub_matrix_size ,MPI_FLOAT, &
0, COM_mpi, ERROR_mpi)
!!! DOING SOME MATH ON SCATTERED MATRIX
call MPI_gatherv(&
sub_split_Data_Matrix, sub_matrix_size,MPI_FLOAT ,&
Data_Matrix, elem_to_each_cpu, displacements, &
MPI_FLOAT, 0, COM_mpi, ERROR_mpi)
!!! DOING SOME MATH ON GATHERED MATRIX
tot_sum = 0.0
do j = 1, N_COLS
do i = 1, N_ROWS
tot_sum = tot_sum + Data_Matrix(i, j)
enddo
enddo
write(message,*) "N_COLS:",N_COLS, "N_ROWS:", N_ROWS, " TOTAL_SUM:", tot_sum
write(*,*) message
deallocate(Data_Matrix)
if (ID_mpi .eq. master) then
deallocate(elem_to_each_cpu )
deallocate(displacements )
endif
deallocate(sub_split_Data_Matrix)
end
RESULT:
Error occurred in MPI_Gatherv
on communicator MPI_COMM_WORLD
Invalid memory reference
QUESTION:
Can you help me find the error?
Or better, can you help me see whether the approach I used was appropriate?
Thanks a lot!
I had a look at your code and made some changes to fix it:
Unimportant: a few stylistic/cosmetic changes here and there to (from my standpoint, and that is arguable) improve readability. Sorry if you don't like them.
There is no need for process 0 to be the only one computing the lengths and displacements for the MPI_Scatterv()/MPI_Gatherv() calls. All processes should compute them, since they all have the necessary data to do so. Moreover, this spares you two MPI_Bcast() calls, which is good.
The lengths were strangely computed. I suspect the computation was wrong, but I'm not sure, since it was so convoluted that I just rewrote it.
The main issue was a mix-up between the vector type and the scalar type: your lengths and displacements were computed for your vector type, but you were calling MPI_Scatterv()/MPI_Gatherv() with the scalar type. Moreover, for Fortran this scalar type is MPI_REAL, not MPI_FLOAT. In the code posted below, I computed lengths and displacements for MPI_REAL, but if you prefer, you can divide them all by N_ROWS and use the result of MPI_Type_contiguous( N_ROWS, MPI_REAL, my_type ) instead of MPI_REAL in the scatter/gather, and get the same result.
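For illustration, here is a rough, untested sketch of that second option (my own, reusing the variable names of the modified code below), with counts and displacements divided by N_ROWS so that they are expressed in whole columns:
! Alternative sketch: one MPI element = one whole column of N_ROWS reals.
call MPI_Type_contiguous( N_ROWS, MPI_REAL, my_type, ERROR_mpi )
call MPI_Type_commit( my_type, ERROR_mpi )
elem_to_each_cpu = elem_to_each_cpu / N_ROWS   ! counts are now whole columns
displacements = displacements / N_ROWS
call MPI_Scatterv( Data_Matrix, elem_to_each_cpu, displacements, my_type, &
                   sub_split_Data_Matrix, elem_to_each_cpu(ID_mpi+1), my_type, &
                   master, COM_mpi, ERROR_mpi )
! ... and the matching MPI_Gatherv() with the same counts, displacements and type.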
Here is the modified code:
program main
use mpi
implicit none
integer, parameter :: N_COLS=100, N_ROWS=200, master=0
integer :: i, j
integer :: ID_mpi,SIZE_mpi, COM_mpi, ERROR_mpi, my_type
integer :: to_each_cpu, to_each_cpu_oddment, sub_matrix_size
integer, allocatable :: elem_to_each_cpu(:), displacements(:)
real :: tot_sum = 0.0
real, allocatable :: Data_Matrix(:,:), sub_split_Data_Matrix(:,:)
call MPI_Init( ERROR_mpi )
COM_mpi = MPI_COMM_WORLD
call MPI_Comm_rank( COM_mpi, ID_mpi, ERROR_mpi )
call MPI_Comm_size( COM_mpi, SIZE_mpi, ERROR_mpi )
!! allocation Data_Matrix
if ( ID_mpi == master ) then
allocate( Data_Matrix( N_ROWS, N_COLS ) )
call random_number( Data_Matrix )
do j = 1, N_COLS
do i = 1, N_ROWS
tot_sum = tot_sum + Data_Matrix(i, j)
enddo
enddo
print *, "N_COLS:", N_COLS, "N_ROWS:", N_ROWS, " TOTAL_SUM:", tot_sum
end if
!! SINCE THERE ARE NO RESTRICTIONS ON MPI NUMBER OR CPUS OR
!! SIZE OR Data_Matrix I NEED TO DO THIS
to_each_cpu = N_COLS / SIZE_mpi
to_each_cpu_oddment = N_COLS - ( to_each_cpu * SIZE_mpi )
allocate( elem_to_each_cpu(SIZE_mpi) )
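!! Counts and displacements are expressed in scalar MPI_REAL elements: each column contributes N_ROWS reals.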
elem_to_each_cpu = to_each_cpu * N_ROWS
allocate( displacements(SIZE_mpi) )
displacements = 0
!! I CHOOSE TO SPLIT THE DATA IN THIS WAY
if ( ID_mpi == master ) then
print *, "N_COLS:", N_COLS, "mpisize:", SIZE_mpi, "to_each_cpu\oddment:", to_each_cpu, " \ ", to_each_cpu_oddment
end if
do i = 1, to_each_cpu_oddment
elem_to_each_cpu(i) = elem_to_each_cpu(i) + N_ROWS
enddo
do i = 1, SIZE_mpi-1
displacements(i+1) = displacements(i) + elem_to_each_cpu(i)
enddo
if ( ID_mpi == master ) then
do i = 1, SIZE_mpi
print *, i, " to_each_cpu:", &
elem_to_each_cpu(i), " sub_split_buff_displ:", displacements(i), &
"=", elem_to_each_cpu(i) + displacements(i)
enddo
end if
allocate( sub_split_Data_Matrix(N_ROWS, elem_to_each_cpu(ID_mpi+1)/N_ROWS) )
sub_split_Data_Matrix = 0
sub_matrix_size = elem_to_each_cpu(ID_mpi+1)
call MPI_scatterv( Data_Matrix, elem_to_each_cpu ,displacements, MPI_REAL, &
sub_split_Data_Matrix, sub_matrix_size, MPI_REAL, &
master, COM_mpi, ERROR_mpi )
!!! DOING SOME MATH ON SCATTERED MATRIX
call MPI_gatherv( sub_split_Data_Matrix, sub_matrix_size, MPI_REAL, &
Data_Matrix, elem_to_each_cpu, displacements, MPI_REAL, &
master, COM_mpi, ERROR_mpi )
!!! DOING SOME MATH ON GATHERED MATRIX
if ( ID_mpi == master ) then
tot_sum = 0.0
do j = 1, N_COLS
do i = 1, N_ROWS
tot_sum = tot_sum + Data_Matrix(i, j)
enddo
enddo
print *, "N_COLS:", N_COLS, "N_ROWS:", N_ROWS, " TOTAL_SUM:", tot_sum
deallocate( Data_Matrix )
endif
deallocate( elem_to_each_cpu )
deallocate( displacements )
deallocate( sub_split_Data_Matrix )
end program main
With these modifications, the code works as expected:
$ mpif90 scat_gath2.f90
$ mpirun -n 3 ./a.out
N_COLS: 100 N_ROWS: 200 TOTAL_SUM: 10004.4443
N_COLS: 100 mpisize: 3 to_each_cpu\oddment: 33 \ 1
1 to_each_cpu: 6800 sub_split_buff_displ: 0 = 6800
2 to_each_cpu: 6600 sub_split_buff_displ: 6800 = 13400
3 to_each_cpu: 6600 sub_split_buff_displ: 13400 = 20000
N_COLS: 100 N_ROWS: 200 TOTAL_SUM: 10004.4443
Want to gather a large 2D array in MPI on the root process
There is no problem with this code when I use small arrays, but when I use 360*75 rows it crashes. I use 6 processes, so every slave has a chunk of 60*75 rows.
program test
implicit none
include 'mpif.h'
INTEGER :: ierr, size, rank, i, j, k, l, num_angles,slice
DOUBLE PRECISION :: theta1, theta2, PI
DOUBLE PRECISION, ALLOCATABLE, DIMENSION(:,:) :: R
integer :: doublesize
integer (kind=MPI_Address_kind) :: start, extent
integer :: blocktype, resizedtype
CALL MPI_INIT(ierr)
CALL MPI_COMM_RANK(MPI_COMM_WORLD,rank,ierr)
CALL MPI_COMM_SIZE(MPI_COMM_WORLD,size,ierr)
if (rank == 0) then
if (allocated(R)) deallocate(R)
allocate(R(1:360*75,1:10))
else
if (allocated(R)) deallocate(R)
allocate(R(1:60*75,1:10))
end if
R = rank
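! blocktype describes one 60*75 x 10 block of the root's 360*75 x 10 array; resizing its
! extent to 60*75 doubles makes consecutive gathered blocks start one row-chunk further down.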
call MPI_Type_create_subarray(2, [360*75,10], [60*75,10], [0,0], &
MPI_ORDER_FORTRAN, MPI_DOUBLE_PRECISION, &
blocktype, ierr)
start = 0
call MPI_Type_size(MPI_DOUBLE_PRECISION, doublesize, ierr)
extent = doublesize * 60*75
call MPI_Type_create_resized(blocktype, start, extent, resizedtype, ierr)
call MPI_Type_commit(resizedtype, ierr)
CALL MPI_GATHER(R, 60*75*10, MPI_DOUBLE_PRECISION, R, 1, resizedtype, 0, MPI_COMM_WORLD, ierr)
CALL MPI_FINALIZE(ierr)
end program
The output on small arrays looks like
0 0 0 0
0 0 0 0
1 1 1 1
1 1 1 1
2 2 2 2
2 2 2 2
....
Can the problem be with memory allocation on the root process?
EDIT
I changed the following in the code based on a comment. I still have an error:
CALL MPI_Type_contiguous(60*75*10, MPI_DOUBLE_PRECISION, new_type,ierr)
call MPI_TYPE_COMMIT(new_type,ierr)
if (rank == 0) then
CALL MPI_GATHER(MPI_IN_PLACE, 60*75*10, new_type, R, 1, new_type, 0, MPI_COMM_WORLD, ierr)
else
CALL MPI_GATHER(R, 60*75*10, new_type, R, 1, new_type, 0, MPI_COMM_WORLD, ierr)
end if
CALL MPI_FINALIZE(ierr)
Intro:
I am trying to write a large set of data to a single file with MPI-IO, using the code below.
The problem I encounter is that I get an integer overflow (in the variable disp) and thus the MPI-IO does not work properly.
The reason for this is, I think, the declaration of the variable disp (integer (kind=MPI_OFFSET_KIND) :: disp = 0) in the subroutine write_to_file(...): for the process with the highest rank, disp overflows.
Question:
Can I somehow define disp as kind=MPI_OFFSET_KIND but with a higher range? I did not find a solution for that, other than writing to multiple files, but I would prefer writing into a single file.
Some context:
The code is just part of a code that I use to output (and read, but I cut that part from the example to keep it simple) scalar (ivar = 1), vector (ivar = 3) or tensor (ivar = 3, 6 or 9) values into binary files. The size of the 3D grid is defined by imax, jmax and kmax, where kmax is decomposed by Px processes into Mk. Lately the 3D grid grew to a size at which I encountered the described problem.
Code Example: MPI_IO_LargeFile.f90
"""
PROGRAM MPI_IO_LargeFile
use MPI
implicit none
integer rank, ierr, Px
integer i, j, k, cnt
integer imax, jmax, kmax, Mk
integer status(MPI_STATUS_SIZE)
integer ivars;
real*4, dimension(:,:,:,:), allocatable :: outarr, dataarr
call MPI_Init(ierr)
call MPI_Comm_size(MPI_COMM_WORLD, Px, ierr)
call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
imax = 256
jmax = 512
kmax = 1024
Mk = kmax / Px
if (rank < 1) print *, 'Preparing dataarr'
ivars = 6
allocate(dataarr(ivars,imax,jmax,Mk))
call RANDOM_NUMBER(dataarr)
! Output of Small File
if (rank < 1) print *, 'Output of SmallFile.txt'
ivars = 3
allocate(outarr(ivars,imax,jmax,Mk))
outarr(:,:,:,:) = dataarr(1:3,:,:,:)
call write_to_file(rank, 'SmallFile.txt', outarr)
deallocate(outarr)
! Output of Large File
if (rank < 1) print *, 'Output of LargeFile.txt'
ivars = 6
allocate(outarr(ivars,imax,jmax,Mk))
outarr(:,:,:,:) = dataarr(1:6,:,:,:)
call write_to_file(rank, 'LargeFile.txt', outarr)
deallocate(outarr)
deallocate(dataarr)
call MPI_Finalize(ierr)
CONTAINS
subroutine write_to_file(myrank, filename, arr)
implicit none
integer, intent(in) :: myrank
integer :: ierr, file, varsize
character(len=*), intent(in):: filename
real*4, dimension(:,:,:,:), allocatable, intent(inout) :: arr
integer (kind=MPI_OFFSET_KIND) :: disp = 0
varsize = size(arr)
disp = myrank * varsize * 4
write(*,*) rank, varsize, disp
call MPI_File_open(MPI_COMM_WORLD, filename, &
& MPI_MODE_WRONLY + MPI_MODE_CREATE, &
& MPI_INFO_NULL, file, ierr )
call MPI_File_set_view(file, disp, MPI_REAL4, &
& MPI_REAL4, 'native', MPI_INFO_NULL, ierr)
call MPI_File_write(file, arr, varsize, &
& MPI_REAL4, MPI_STATUS_IGNORE, ierr)
call MPI_FILE_CLOSE(file, ierr)
end subroutine write_to_file
END PROGRAM MPI_IO_LargeFile
"""
Output of Code: MPI_IO_LargeFile.f90
mpif90 MPI_IO_LargeFile.f90 -o MPI_IO_LargeFile
mpirun -np 4 MPI_IO_LargeFile
Preparing dataarr
Output of SmallFile.txt
2 100663296 805306368
1 100663296 402653184
3 100663296 1207959552
0 100663296 0
Output of LargeFile.txt
1 201326592 805306368
0 201326592 0
2 201326592 1610612736
3 201326592 -1879048192
mca_fbtl_posix_pwritev: error in writev:Invalid argument
mca_fbtl_posix_pwritev: error in writev:Invalid argument
The problem is that the multiplication in
disp = myrank * varsize * 4
overflowed, since every variable on the right-hand side was declared as a default integer.
One solution, provided by Gilles in the comments of the question, was simply to change this line to
disp = myrank * size(arr, kind=MPI_OFFSET_KIND) * 4
Using size(arr, kind=MPI_OFFSET_KIND) makes the size an integer of kind MPI_OFFSET_KIND, so the whole product is evaluated in that kind, which solves the overflow problem.
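To make the arithmetic concrete, here is a tiny standalone sketch (my own, not part of the original program) showing the wrap-around and the kind-promoted fix:
program disp_overflow_sketch
    use mpi   ! only needed here for the MPI_OFFSET_KIND constant
    implicit none
    integer :: myrank, varsize
    integer(kind=MPI_OFFSET_KIND) :: disp
    myrank = 3
    varsize = 201326592   ! = 6*256*512*256, the LargeFile case above
    ! All three factors are default (32-bit) integers, so the product wraps
    ! around before it is assigned to disp (here to -1879048192, as in the output above).
    disp = myrank * varsize * 4
    print *, 'overflowed disp:', disp
    ! Promoting one factor to MPI_OFFSET_KIND makes the whole product 64-bit.
    disp = myrank * int(varsize, kind=MPI_OFFSET_KIND) * 4
    print *, 'correct disp:   ', disp   ! 2415919104
end program disp_overflow_sketch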
Thank you for your help.
I have a physics simulation program that generates a file with six columns: one for the time and five for physical properties. I need to write a Fortran 90 program that reads this file and generates five files with two columns each: one for the time and one for a physical property.
I have used F90 before, but I only know how to create files and write to them; I have no idea how to read data from a file and generate more files from it.
I don't expect to have the problem solved for me; I just want to know where to find information. Any advice will be useful.
I don't know a priori how many rows the program will generate.
Here is an example which has not been tested...
It is a bit of a kindergarten approach, but it may be helpful. You could avoid the six separate arrays altogether, but it is often better to have the variables as separate arrays, since the contiguous memory layout vectorises better. One could also read the file directly into the six arrays and avoid the 6xN array.
PROGRAM ABC
  IMPLICIT NONE
  REAL, DIMENSION(:,:), ALLOCATABLE :: My_File_Data
  REAL, DIMENSION(:), ALLOCATABLE :: My_Data1
  REAL, DIMENSION(:), ALLOCATABLE :: My_Data2
  REAL, DIMENSION(:), ALLOCATABLE :: My_Data3
  REAL, DIMENSION(:), ALLOCATABLE :: My_Data4
  REAL, DIMENSION(:), ALLOCATABLE :: My_Data5
  REAL, DIMENSION(:), ALLOCATABLE :: My_Data6
  INTEGER :: Index, LUN, I, IO_Status
  OPEN(NEWUNIT=LUN, FILE='abc.dat')
  ! First pass: count the rows.
  Index = 0
  FirstPass: DO
    READ(UNIT=LUN, FMT=*, IOSTAT=IO_Status)
    IF(IO_Status /= 0) EXIT FirstPass
    Index = Index + 1
  ENDDO FirstPass
  REWIND(LUN)
  ALLOCATE(My_File_Data(6,Index))
  ALLOCATE(My_Data1(Index))
  ALLOCATE(My_Data2(Index))
  ALLOCATE(My_Data3(Index))
  ALLOCATE(My_Data4(Index))
  ALLOCATE(My_Data5(Index))
  ALLOCATE(My_Data6(Index))
  ! Second pass: read the values.
  SecondPass: DO I = 1, Index
    READ(UNIT=LUN, FMT=*) My_File_Data(:,I)
  ENDDO SecondPass
  DO I = 1, Index
    My_Data1(I) = My_File_Data(1,I)
  ENDDO
  ! What follows is more elegant...
  My_Data2(:) = My_File_Data(2,:) !Where the (:) on the left is redundant... It seems more readable, but there are some reasons not to use it...
  My_Data3 = My_File_Data(3,:)
  My_Data4 = My_File_Data(4,:)
  My_Data5 = My_File_Data(5,:)
  My_Data6 = My_File_Data(6,:)
  DEALLOCATE(My_File_Data)
  !Etc
END PROGRAM ABC
The first step is to read in the data. In the following instructions, we will first loop over the file and count the number of rows, nrows. This value will be used to allocate a data array to the necessary size. We then return to the beginning of the file and read in our data in a second loop.
Declare an integer variable to act as a file handle/reference.
Declare an allocatable array of reals (floats) to hold the data.
Loop over the file to count the number of lines in the file. Remove header lines from the count.
Allocate the data array to the proper size, (nrows,nvalues).
Return to the beginning of the file. Repeat the loop over each of the rows, reading all values from the row into your data array.
Close the file.
The next step is to create 5 new files, each containing the time and one of the 5 property measurements:
Loop over each of the 5 properties contained in data.
For each jth property, open a new file.
Loop over the data array, writing the time and jth property to a new line.
Close the file.
Here is working code you can use or modify to suit your needs:
program SO
implicit none
integer :: i, j, nrows, nvalues, funit, ios
real, allocatable, dimension(:,:) :: data
character(len=10), dimension(5) :: outfiles
!! begin
nvalues = 5
nrows = 0
open(newunit=funit, file='example.txt', status='old', iostat=ios)
if (ios /= 0) then
print *, 'File could not be opened.'
call exit
else
do
read(funit,*,iostat=ios)
if (ios == 0) then
nrows = nrows + 1
elseif (ios < 0) then !! End of file (EOF).
exit !! The 'exit' stmt breaks out of the loop.
else !! Error if > 0.
print *, 'Read error at line ', nrows + 1
call exit() !! The 'exit' intrinsic ends the program.
endif !! We may pass an optional exit code.
enddo
endif
nrows = nrows - 1 !! 'nrows-1': Remove column headers from count.
if (allocated(data)) deallocate(data) !! This test follows standard "best practices".
allocate(data(nrows,nvalues+1))
rewind(funit)
read(funit, *) !! Skip column headers.
do i = 1,nrows
read(funit, *) data(i,:) !! Read values into array.
enddo
close(funit)
!! Output file names.
outfiles = ['prop1.txt', 'prop2.txt', 'prop3.txt', 'prop4.txt', 'prop5.txt']
do j = 1,nvalues
open(newunit=funit, file=outfiles(j), status='replace', iostat=ios)
if (ios /= 0) then
print *, 'Could not open output file: ',outfiles(j)
call exit()
endif
write(funit,"(a)") "time "//outfiles(j)(1:5)
do i = 1,nrows
write(funit,"(f0.0,t14,es14.6)") data(i,1), data(i,j+1)
enddo
close(funit)
enddo
end program SO
All the other answers want to read in everything at once. I think that's too much of a bother.
Firstly, I'd check if I even needed Fortran for that. The Linux command cut can be used very effectively here. For example, if your data is comma separated, you could simply use
for i in {1..5}; do
cut -d, -f1,$((i+1)) data.txt > data${i}.txt;
done
to do the whole thing.
If you need Fortran, here's how I'd go about it:
Open all files
In a permanent loop, read in the whole row at once.
If you encounter an error, it's probably EOF, so exit the loop
Write the data to the output files.
Here's some basic code:
program split
implicit none
real :: t, d(5)
integer :: u_in, u_out(5)
integer :: i
integer :: ios
open(newunit=u_in, file='data.txt', status="old", action="read")
open(newunit=u_out(1), file='temperature.txt', status='unknown', action='write')
open(newunit=u_out(2), file='pressure.txt', status='unknown', action='write')
open(newunit=u_out(3), file='pair_energy.txt', status='unknown', action='write')
open(newunit=u_out(4), file='ewald_energy.txt', status='unknown', action='write')
open(newunit=u_out(5), file='pppm_energy.txt', status='unknown', action='write')
read(u_in, *) ! omit the column names
write(u_out(1), *) "Time Temperature"
write(u_out(2), *) "Time Pressure"
write(u_out(3), *) "Time Pair Energy"
write(u_out(4), *) "Time Ewald Energy"
write(u_out(5), *) "Time PPPM Energy"
do
read(u_in, *, iostat=ios) t, d
if (ios /= 0) exit
do i = 1, 5
write(u_out(i), *) t, d(i)
end do
end do
close(u_in)
do i = 1, 5
close(u_out(i))
end do
end program split
Cheers
I am writing a Fortran code for benchmarking three loop kernels:
program Kernel_benchmark
implicit none
double precision,dimension (:),save,allocatable:: a,b,c,d,x,y
double precision s
double precision,dimension (:,:),save,allocatable:: mat
double precision wcs,wce,ct,runtime, total
integer k,iter,r,i,j,N
do k = 3, 20
N = INT(2.5**k)
allocate (a(N),b(N),c(N),d(N))
do i=1,N
a(i) = 1.2
b(i) = 1.2
c(i) = 1.2
d(i) = 1.2
end do
iter = 1
runtime = 0.0
do while(runtime < 0.2)
call timing(wcs,ct)
do r =0, iter
do i=1,N
a(i) = b(i) + c(i) * d(i)
end do
if(a(ISHFT(N,-1)) < 0.0) then
call dummy(a)
end if
end do
call timing(wce,ct)
runtime = wce - wcs
iter = iter * 2
end do
iter = iter / 2
open(unit=1, file = 'vector_triad.dat',status = 'unknown')
write(1,*) N, (N * iter* 2) / (runtime * 1e-6)
close(1)
deallocate(a,b,c,d)
end do
do k = 3, 20
N = INT(2.5**k)
allocate(a(N))
do i = 1, N
a(i) = 1.2
end do
s = 2.2
iter = 1
runtime = 0.0
do while(runtime < 0.2)
call timing(wcs,ct)
do r = 0, iter
do i = 1, N
a(i) = s * a(i)
end do
if(a(ISHFT(N,-1)) < 0.0) then
call dummy(a)
end if
end do
call timing(wce,ct)
runtime = wce - wcs
iter = iter * 2
end do
iter = iter / 2
open (unit = 2, file = 'vector_update.txt', status = 'unknown' )
write(2,*) N, (N * iter) / (runtime * 1e-6)
close(2)
deallocate(a)
end do
do k = 10, 22
N = INT(1.5**k)
allocate (mat(N,N),x(N),y(N))
do i = 1, N
do j = 1, N
mat(i,j) = 1.2
end do
y(i) = 1.2
x(i) = 1.2
end do
iter = 1
runtime = 0.0
do while(runtime < 0.2)
call timing(wcs,ct)
do r = 0, iter
do i = 1, N
y(i) = 0.0
do j = 1, N
y(i) = y(i) + (mat(i,j) * x(i))
end do
end do
if(y(ISHFT(N,-1))< 0.0) then
call dummy(y)
end if
end do
call timing(wce,ct)
runtime = wce - wcs
iter = iter * 2
end do
iter = iter / 2
open (unit = 3, file = 'matrix_vector.txt', status ='unknown')
write(3,*) N, (2 * N * N * iter) / (runtime * 1e-6)
close(3)
deallocate(mat,x,y)
end do
end program Kernel_benchmark
I have written the dummy function in a C source file as follows:
#include "dummy.h"
void dummy(double *array){
printf ("Well if its printing this then you're pretty much screwed.");
}
and dummy.h simply contains the function prototype.
I made a dummy.o object file and I'm trying to link it with my Fortran source code using the Intel ifort compiler. Unfortunately, I'm getting the error
In function `MAIN__': bench.f90:(.text+0x8ca): undefined reference to `dummy_'
every time the dummy function is called. Any suggestions? Thanks in advance.
The modern way to interface with Fortran is the interoperability with C and the iso_c_binding module, as discussed many times on this site.
Calling a FORTRAN subroutine from C
https://stackoverflow.com/search?tab=votes&q=iso_c_binding
Inside the Fortran program, the symbol dummy is taken to be a subroutine with an implicit interface. Naturally, the Fortran compiler thinks that subroutine will be a Fortran subroutine and will arrange argument passing, linker name mangling, etc. appropriately.
Because the dummy procedure is a C function and not a Fortran subroutine, problems ensue.
If the Fortran compiler is explicitly told that the dummy symbol is a C function, then it will make the appropriate changes. In the specification part of your main program:
INTERFACE
SUBROUTINE dummy(array) BIND(C, NAME='dummy')
IMPLICIT NONE
DOUBLE PRECISION :: array(*)
END SUBROUTINE
END INTERFACE
Robust code would further set the kind of the array argument appropriately.
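For example (my sketch, not from the original answer), the array's kind could be tied to C's double via the iso_c_binding constants:
INTERFACE
  SUBROUTINE dummy(array) BIND(C, NAME='dummy')
    USE, INTRINSIC :: ISO_C_BINDING, ONLY: C_DOUBLE
    IMPLICIT NONE
    REAL(C_DOUBLE) :: array(*)
  END SUBROUTINE dummy
END INTERFACE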
If you use GNU compilers, note that the name mangling is a little bit different between C and Fortran. If your Fortran program calls subroutine xyz, then the corresponding C function should be named xyz_.
So in your case it is enough to rename dummy into dummy_ in the C sources. You may also need to link with -lg2c if I remember correctly.
I'm writing to a file as follows. The order does not necessarily matter (though it would be nice if I could get it ordered by K, as it would inherently be in serial code).
CALL MPI_BARRIER(MPI_COMM_WORLD, IERR)
OPEN(EIGENVALUES_UP_IO, FILE=EIGENVALUES_UP_PATH, ACCESS='APPEND')
WRITE(EIGENVALUES_UP_IO, *) K * 0.0001_DP * PI, (EIGENVALUES(J), J = 1, ATOM_COUNT)
CLOSE(EIGENVALUES_UP_IO)
I'm aware this is likely to be the worst option.
I've taken a look at MPI_FILE_WRITE_AT etc., but I'm not sure they (directly) take data in the form that I have.
The file must be in the same format as this, which comes out as one line per K, with ATOM_COUNT + 1 columns. The values are REAL(8).
I've hunted over and over, and can't find any simple references on achieving this. Any help? :)
Similar code in C (assuming it's basically the same as FORTRAN) is just as useful
Thanks!
So determining the right IO strategy depends on a lot of factors. If you are just sending back a handful of eigenvalues, and you're stuck writing out ASCII, you might be best off just sending all the data back to process 0 to write. This is not normally a winning strategy, as it obviously doesn't scale; but if the amount of data is very small, it could well be better than the contention involved in trying to write out to a shared file (which is, again, harder with ASCII).
Some code is below which will schlep the data back to proc 0, assuming everyone has the same amount of data.
Another approach would just be to have everyone write out their own ks and eigenvalues, and then as a postprocessing step once the program is finished, cat them all together. That avoids the MPI step, and (with the right filesystem) can scale up quite a ways, and is easy; whether that's better is fairly easily testable, and will depend on the amount of data, number of processors, and underlying file system.
program testio
use mpi
implicit none
integer, parameter :: atom_count = 5
integer, parameter :: kpertask = 2
integer, parameter :: fileunit = 7
integer, parameter :: io_master = 0
double precision, parameter :: pi = 3.14159
integer :: totalk
integer :: ierr
integer :: rank, nprocs
integer :: handle
integer(kind=MPI_OFFSET_KIND) :: offset
integer :: filetype
integer :: j,k
double precision, dimension(atom_count, kpertask) :: eigenvalues
double precision, dimension(kpertask) :: ks
double precision, allocatable, dimension(:,:):: alleigenvals
double precision, allocatable, dimension(:) :: allks
call MPI_INIT(ierr)
call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)
call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
totalk = nprocs*kpertask
!! setup test data
do k=1,kpertask
ks(k) = (rank*kpertask+k)*1.d-4*PI
do j=1,atom_count
eigenvalues(j,k) = rank*100+j
enddo
enddo
!! Everyone sends proc 0 their data
if (rank == 0) then
allocate(allks(totalk))
allocate(alleigenvals(atom_count, totalk))
endif
call MPI_GATHER(ks, kpertask, MPI_DOUBLE_PRECISION, &
allks, kpertask, MPI_DOUBLE_PRECISION, &
io_master, MPI_COMM_WORLD, ierr)
call MPI_GATHER(eigenvalues, kpertask*atom_count, MPI_DOUBLE_PRECISION, &
alleigenvals, kpertask*atom_count, MPI_DOUBLE_PRECISION, &
io_master, MPI_COMM_WORLD, ierr)
if (rank == 0) then
open(unit=fileunit, file='output.txt')
do k=1,totalk
WRITE(fileunit, *) allks(k), (alleigenvals(j,k), j = 1, atom_count)
enddo
close(unit=fileunit)
deallocate(allks)
deallocate(alleigenvals)
endif
call MPI_FINALIZE(ierr)
end program testio
If you can determine how long each rank's write will be, you can call MPI_SCAN(size, offset, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD) to compute the offset that each rank should start at (MPI_SCAN is an inclusive prefix sum, so subtract your own size from the result, or use MPI_EXSCAN), and then they can all call MPI_FILE_WRITE_AT. This is probably more suitable if you have a lot of data, and you are confident that your MPI implementation does the write efficiently (doesn't serialize internally, or the like).
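A rough, untested sketch of that second strategy (my own; it assumes one fixed-width text record per rank and uses the MPI_OFFSET datatype, available since MPI 2.2, for the prefix sum rather than MPI_INT so large totals don't overflow):
program scan_write_sketch
    use mpi
    implicit none
    integer, parameter :: reclen = 32
    integer :: ierr, rank, fh
    integer(kind=MPI_OFFSET_KIND) :: offset, mybytes, end_bytes
    character(len=reclen) :: line

    call MPI_INIT(ierr)
    call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)

    ! Pretend each rank has one fixed-width text record to write.
    write(line, '(a,i6)') 'record from rank ', rank
    line(reclen:reclen) = new_line('a')
    mybytes = reclen

    ! MPI_SCAN is an inclusive prefix sum, so subtracting our own byte count
    ! gives the offset at which this rank should start writing.
    call MPI_SCAN(mybytes, end_bytes, 1, MPI_OFFSET, MPI_SUM, MPI_COMM_WORLD, ierr)
    offset = end_bytes - mybytes

    call MPI_FILE_OPEN(MPI_COMM_WORLD, 'output.txt', &
                       MPI_MODE_WRONLY + MPI_MODE_CREATE, MPI_INFO_NULL, fh, ierr)
    call MPI_FILE_WRITE_AT(fh, offset, line, reclen, MPI_CHARACTER, &
                           MPI_STATUS_IGNORE, ierr)
    call MPI_FILE_CLOSE(fh, ierr)
    call MPI_FINALIZE(ierr)
end program scan_write_sketch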