I am trying to calculate a fairly complicated function, say func(), involving several additions, subtractions, multiplications, divisions and trigonometric functions of several two-dimensional arrays in Fortran. The calculation is massively parallel, in that each func() is independent over its row and column location. Each of the matrices is many gigabytes in size, and there are about a dozen of them as arguments.
I would like to make use of Intel MKL functions (invoking --mkl-parallel), in particular VML functions to add, subtract, divide etc. My question is: how can I render a complicated functional expression such as,
e.g.: func(x,y,z) = x*y+cos(z*x-x) where x,y,z are 2d arrays of several GB
in terms of VML functions but using more familiar binary operators? My problem requires, in principle, converting all the binary operators, such as "+" and "*", into binary functions taking arguments, as in ?vadd(x,y). Of course this would be very cumbersome and unsightly for large expressions. Is there a way to overload the binary arithmetic operators such as "+" and "-" to preferentially use the MKL/VML versions in Fortran? An example would be nice! Thanks!
I know this answer is a little bit off-topic.
Since all the operations are element-wise and simple, func() is likely to be a memory-bandwidth-bound task. In this case, using VML may not be a good choice for maximizing performance.
Suppose each of your arrays is 10GB in size; using VML as follows will need at least 9 x 10GB of reading and 5 x 10GB of writing.
func(...) {
tmp1=x*z;
tmp1=tmp1-x;
tmp1=cos(tmp1);
tmp2=x*y;
return tmp1+tmp2;
}
where all the operations are overloaded for 2D arrays.
Instead, you may find that the following approach needs far less memory traffic (3 x 10GB of reading and 1 x 10GB of writing) and thus could be quicker (pseudo code).
$omp parallel for
for i in 1 to m
for j in 1 to n
result(i,j)= x(i,j)*y(i,j)+cos(z(i,j)*x(i,j)-x(i,j));
end
end
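The fused loop above can be sketched as a small Fortran program. This is only an illustration under stated assumptions: the array sizes, the name res, and the random test data are made up here, and the OpenMP directive only takes effect when the compiler's OpenMP option is enabled (otherwise it is treated as a comment). Because Fortran stores arrays in column-major order, the first index is placed in the inner loop.

```fortran
program fused_loop
  implicit none
  ! Tiny sizes for illustration only; the real arrays are many GB.
  integer, parameter :: m = 4, n = 3
  real :: x(m,n), y(m,n), z(m,n), res(m,n)
  integer :: i, j

  call random_number(x)
  call random_number(y)
  call random_number(z)

  ! Each element is read once and written once; no whole-array temporaries.
  ! Column-major storage: the first index varies fastest, so i is the inner loop.
  !$omp parallel do private(i)
  do j = 1, n
    do i = 1, m
      res(i,j) = x(i,j)*y(i,j) + cos(z(i,j)*x(i,j) - x(i,j))
    end do
  end do
  !$omp end parallel do

  ! Compare against the whole-array expression as a sanity check.
  print *, 'max difference:', maxval(abs(res - (x*y + cos(z*x - x))))
end program fused_loop
```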
I developed a small example to show the addition of two vectors. As I don't have MKL installed anymore, I used the SAXPY routine from BLAS. The principle should be the same.
First, you define a module with the appropriate definitions. In my case these are an assignment that saves a real array in my datatype (this is only a convenience, as you could also access the array component directly) and the definition of the addition. They are new overloads of the = assignment and the + operator, respectively.
In the program, I define three fields. Two of them get assigned with random numbers and then added to get the third field. Then the first two fields get stored in my special variables, and the result of this addition is stored in a third variable of this type.
Finally, the result is compared by accessing the array directly. Please note that assignment from the custom datatype to the same datatype is already defined (e.g. ffield3 = ffield1 works without an overload).
My module:
MODULE fasttype
IMPLICIT NONE
PRIVATE
PUBLIC :: OPERATOR(+), ASSIGNMENT(=)
TYPE,PUBLIC :: fastreal
REAL,DIMENSION(:),ALLOCATABLE :: array
END TYPE
INTERFACE OPERATOR(+)
MODULE PROCEDURE fast_add
END INTERFACE
INTERFACE ASSIGNMENT(=)
MODULE PROCEDURE fast_assign
END INTERFACE
CONTAINS
FUNCTION fast_add(fr1, fr2) RESULT(fr3)
TYPE(FASTREAL), INTENT(IN) :: fr1, fr2
TYPE(FASTREAL) :: fr3
INTEGER :: L
L = SIZE(fr2%array)
fr3 = fr2
CALL SAXPY(L, 1., fr1%array, 1, fr3%array, 1)
END FUNCTION
SUBROUTINE fast_assign(fr1, r2)
TYPE(FASTREAL), INTENT(OUT) :: fr1
REAL, DIMENSION(:), INTENT(IN) :: r2
INTEGER :: L
IF (.NOT. ALLOCATED(fr1%array)) THEN
L = SIZE(r2)
ALLOCATE(fr1%array(L))
END IF
fr1%array = r2
END SUBROUTINE
END MODULE
My program:
PROGRAM main
USE fasttype
IMPLICIT NONE
REAL, DIMENSION(:), ALLOCATABLE :: field1, field2, field3
TYPE(fastreal) :: ffield1, ffield2, ffield3
ALLOCATE(field1(10),field2(10),field3(10))
CALL RANDOM_NUMBER(field1)
CALL RANDOM_NUMBER(field2)
field3 = field1 + field2
ffield1 = field1
ffield2 = field2
ffield3 = ffield1 + ffield2
WRITE(*,*) field3 == ffield3%array
END PROGRAM
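To get closer to the expression in the question, the same idea can be extended to further operators and to cos. The sketch below is an assumption-laden illustration, not the answer's code: the module name fasttype_ops and the procedure names fr_add, fr_sub, fr_mul and fr_cos are made up, and plain Fortran array operations stand in where a BLAS or VML call would go. Note that every overloaded operation returns a new temporary object, which matters for arrays of several GB.

```fortran
module fasttype_ops
  implicit none
  private
  public :: fastreal, operator(+), operator(-), operator(*), cos

  type :: fastreal
    real, allocatable :: array(:)
  end type fastreal

  interface operator(+)
    module procedure fr_add
  end interface
  interface operator(-)
    module procedure fr_sub
  end interface
  interface operator(*)
    module procedure fr_mul
  end interface
  interface cos              ! extends the intrinsic generic name to fastreal
    module procedure fr_cos
  end interface

contains

  function fr_add(a, b) result(c)
    type(fastreal), intent(in) :: a, b
    type(fastreal) :: c
    c%array = a%array + b%array   ! placeholder for a library call
  end function fr_add

  function fr_sub(a, b) result(c)
    type(fastreal), intent(in) :: a, b
    type(fastreal) :: c
    c%array = a%array - b%array
  end function fr_sub

  function fr_mul(a, b) result(c)
    type(fastreal), intent(in) :: a, b
    type(fastreal) :: c
    c%array = a%array * b%array   ! element-wise product
  end function fr_mul

  function fr_cos(a) result(c)
    type(fastreal), intent(in) :: a
    type(fastreal) :: c
    c%array = cos(a%array)        ! resolves to the intrinsic for real arrays
  end function fr_cos

end module fasttype_ops

program demo_ops
  use fasttype_ops
  implicit none
  type(fastreal) :: x, y, z, f

  x%array = [1.0, 2.0, 3.0]
  y%array = [0.5, 0.5, 0.5]
  z%array = [2.0, 2.0, 2.0]

  ! Reads like the mathematical expression, but builds several temporaries.
  f = x*y + cos(z*x - x)
  print *, f%array
end program demo_ops
```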
Related
I want to assign complex array as variable.
My code is like
complex indx(3,3)
integer i,j
do i=1,3
do j=1,3
indx(i,j) = (i,j)
write(*,*) indx(i,j)
end do
end do
and in this case I am getting an error like
A symbol must be a defined parameter in this context. [I]
indx(i,j) = (i,j)
You must use function cmplx to build a complex value you want to assign.
complex indx(3,3)
integer i,j
do i=1,3
do j=1,3
indx(i,j) = cmplx(i,j)
write(*,*) indx(i,j)
end do
end do
The syntax you tried is only valid for constant literals.
The answer by Vladimir F tells the important part: for (i,j) to be a complex literal constant i and j must be constants.1 As stated there, the intrinsic complex function cmplx can be used in more general cases.
For the sake of some variety and providing options, I'll look at other aspects of complex arrays. In the examples which follow I'll ignore the output statement and assume the declarations given.
We have, then, Vladimir F's correction:
do i=1,3
do j=1,3
indx(i,j) = CMPLX(i,j) ! Note that this isn't in array element order
end do
end do
We could note, though, that cmplx is an elemental function:
do i=1,3
indx(i,:) = CMPLX(i,[(j,j=1,3)])
end do
On top of that, we can consider
indx = RESHAPE(CMPLX([((i,i=1,3),j=1,3)],[((j,i=1,3),j=1,3)]),[3,3])
where this time the right-hand side is in array element order for indx.
Well, I certainly won't say that this last (or perhaps even the second) is better than the original loop, but it's an option. In some cases it could be more elegant.
But we've yet other options. If one has compiler support for complex part designators we have an alternative for the first form:
do i=1,3
do j=1,3
indx(i,j)%re = i
indx(i,j)%im = j
end do
end do
This doesn't really give us anything, but note that we can have the complex part of an array:
do i=1,3
indx(i,:)%re = [(i,j=1,3)]
indx(i,:)%im = [(j,j=1,3)]
end do
or
do i=1,3
indx(i,:)%re = i ! Using scalar to array assignment
indx(i,:)%im = [(j,j=1,3)]
end do
And we could go all the way to
indx%re = RESHAPE([((i,i=1,3),j=1,3)],[3,3])
indx%im = RESHAPE([((j,i=1,3),j=1,3)],[3,3])
Again, that's all in the name of variety or for other applications. There's even spread to consider in some of these. But don't hate the person reviewing your code.
1 That's constants, not constant expressions.
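The spread intrinsic mentioned in passing above can build the same array without any implied-do loops. The following is a minimal sketch; the helper array k and the program name are assumptions made for the example.

```fortran
program spread_demo
  implicit none
  complex :: indx(3,3)
  integer :: i, j
  integer :: k(3)

  k = [1, 2, 3]

  ! spread(k, dim=2, ncopies=3) has element (i,j) equal to k(i);
  ! spread(k, dim=1, ncopies=3) has element (i,j) equal to k(j).
  indx = cmplx(spread(k, dim=2, ncopies=3), spread(k, dim=1, ncopies=3))

  ! Print row by row: element (i,j) has real part i and imaginary part j.
  do i = 1, 3
    write(*,*) (indx(i,j), j=1,3)
  end do
end program spread_demo
```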
A similar question was answered in Fortran runtime warning: temporary array. However, the solutions do not quite help me in my case.
Inside a subroutine, I have a subroutine call as:
subroutine initialize_prim(prim)
real(kind=wp), dimension(2, -4:204), intent(out) :: prim
call double_gaussian(prim(1, :))
end subroutine initialize_prim
subroutine double_gaussian(y)
real(kind=wp), dimension(-4:204), intent(out) :: y
integer :: i
do i = -4, 204
y(i) = 0.5 * ( &
exp(-((r(i) - r0))**2) + exp(-((r(i) + r0)/std_dev)**2))
end do
end subroutine double_gaussian
This gives an error message saying that Fortran creates a temporary array for "y" in "double_gaussian". Having read a bit about contiguous arrays, I understand why this error appears.
Now, looking at my whole program, it would be very tedious to invert the order of the arrays for "prim", so that solution is not really possible.
To make "y" an assumed-shape array in "double_gaussian", I tried:
subroutine double_gaussian(y)
real(kind=wp), dimension(:), intent(out) :: y
integer :: i
do i = -4, 204
y(i) = 0.5 * ( &
exp(-((r(i) - r0))**2) + exp(-((r(i) + r0)/std_dev)**2))
end do
end subroutine double_gaussian
This, however, causes Fortran to crash with the error message
"Index '-4' of dimension 1 of array 'y' below lower bound of 1".
It seems that with an assumed-shape array the indexing inside the subroutine starts at 1, whereas in my case it starts at -4 in the caller.
Is there a way to resolve this issue?
I think that you have perhaps misinterpreted a compiler warning as an error. Usually compilers issue a warning when they create temporary arrays - it's a useful aid to high-performance programming. But I'm not sure a compiler ever regards that as an error. And yes, I understand why you might not want to re-order your array just to avoid that.
As for the crash - you have discovered that Fortran routines don't automagically know about the lower bounds of arrays which you have carefully set to be other than 1 (nor their upper bounds either). If it is necessary you have to pass the bounds (usually only the lower bound, the routine can figure out the upper bound itself) in the argument list.
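Passing the lower bound can be sketched as follows. The module name, the routine name fill and the argument name lb are made up for this illustration; the point is only the declaration y(lb:), which keeps the dummy assumed-shape (so no temporary copy is needed for the strided section) while restoring the caller's indexing.

```fortran
module bounds_demo
  implicit none
contains
  ! Assumed-shape dummy whose lower bound is supplied by the caller.
  subroutine fill(y, lb)
    integer, intent(in) :: lb
    real, intent(out) :: y(lb:)
    integer :: i
    ! The upper bound follows from lb and the extent of the actual argument.
    do i = lbound(y, 1), ubound(y, 1)
      y(i) = real(i)
    end do
  end subroutine fill
end module bounds_demo

program main
  use bounds_demo
  implicit none
  real :: prim(2, -4:204)

  prim = 0.0
  call fill(prim(1, :), lbound(prim, 2))
  ! The first and last elements hold their own indices, -4.0 and 204.0.
  print *, prim(1, -4), prim(1, 204)
end program main
```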
However, it rarely is necessary, and it doesn't seem to be in your code - the loop to set each value of the y array could (if I understand correctly) be replaced by
y = 0.5 * (exp(-((r - r0))**2) + exp(-((r + r0)/std_dev)**2))
PS I think that this part of your question, about routines not respecting other-than-1 array lower bounds, is almost certainly a duplicate of several others asked hereabouts but which I couldn't immediately find.
I am wanting to mask a Fortran array. Here's the way I am currently doing it...
where (my_array <=15.0)
mask_array = 1
elsewhere
mask_array = 0
end where
So then I get my masked array with:
masked = my_array * mask_array
Is there a more concise way to do this?
Use the MERGE intrinsic function:
masked = my_array * merge(1,0,my_array<=15.0)
Or, sticking with where,
masked = 0
where (my_array <=15.0) masked = my_array
I expect that there are differences, in speed and memory consumption, between the use of where and the use of merge but off the top of my head I don't know what they are.
There are two different approaches already given here: one retaining where and one using merge. In the first, High Performance Mark mentions that there may be differences in speed and memory use (think about temporary arrays). I'll point out another potential consideration (without making a value judgment).
subroutine work_with_masked_where(my_array)
real, intent(in) :: my_array(:)
real, allocatable :: masked(:)
allocate(masked(SIZE(my_array)), source=0.)
where (my_array <=15.0) masked = my_array
! ...
end subroutine
subroutine work_with_masked_merge(my_array)
real, intent(in) :: my_array(:)
real, allocatable :: masked(:)
masked = MERGE(my_array, 0., my_array<=15.)
! ...
end subroutine
That is, the merge solution can use automatic allocation. Of course, there are times when one doesn't want this (such as when working with lots of my_arrays of the same size: there are often overheads when checking array sizes in these cases): use masked(:) = MERGE(...) after handling the allocation (which may be relevant even for the question code).
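The difference between the two assignment forms can be seen in a small program. This is a sketch with made-up data; the only point is that masked = ... may (re)allocate the left-hand side, while masked(:) = ... requires it to be allocated with the right shape already.

```fortran
program merge_alloc_demo
  implicit none
  real :: my_array(5)
  real, allocatable :: masked(:)

  my_array = [10.0, 20.0, 15.0, 30.0, 5.0]

  ! Whole-array assignment: masked is allocated automatically
  ! to the shape of the right-hand side.
  masked = merge(my_array, 0.0, my_array <= 15.0)
  print *, 'size after automatic allocation:', size(masked)

  ! Array-section assignment: no (re)allocation check is involved,
  ! so masked must already be allocated with a conforming shape.
  masked(:) = merge(my_array, 0.0, my_array <= 15.0)
  print *, masked
end program merge_alloc_demo
```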
I find it useful to define a function where which takes an array of logicals and returns the integer indices of the .true. values, so e.g.
x = where([.true., .false., .false., .true.]) ! sets `x` to [1, 4].
This function can be defined as
function where(input) result(output)
logical, intent(in) :: input(:)
integer, allocatable :: output(:)
integer :: i
output = pack([(i, i=1, size(input))], input)
end function
With this where function, your problem can be solved as
my_array(where(my_array>15.0)) = 0
This is probably not the most performant way of doing this, but I think it is very readable and concise. This where function can also be more flexible than the where construct, as it can be used e.g. for specific dimensions of multi-dimensional arrays.
Limitations:
Note however that (as @francescalus points out) this will not work for arrays which are not 1-indexed. This limitation cannot easily be avoided, as performing comparison operations on such arrays drops the indexing information, e.g.
real :: my_array(-2:2)
logical, allocatable :: mask(:)
my_array(-2:2) = [1,2,3,4,5]
mask = my_array>3
write(*,*) lbound(mask), ubound(mask) ! Prints "1 5".
For arrays which are not 1-indexed, in order to use this where function you would need the rather ugly
my_array(where(my_array>15.0)+lbound(my_array,1)-1) = 0
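Putting the pieces together, a self-contained sketch might look like this. To avoid any confusion with the where construct, the function is given the hypothetical name true_indices here; the module name and the test data are likewise assumptions.

```fortran
module index_utils
  implicit none
contains
  ! Returns the 1-based positions of the .true. elements of input.
  function true_indices(input) result(output)
    logical, intent(in) :: input(:)
    integer, allocatable :: output(:)
    integer :: i
    output = pack([(i, i=1, size(input))], input)
  end function true_indices
end module index_utils

program demo
  use index_utils
  implicit none
  real :: my_array(-2:2)

  my_array = [1.0, 20.0, 3.0, 40.0, 5.0]

  ! The comparison result is 1-based, so shift the positions back
  ! to the array's own lower bound before using them as subscripts.
  my_array(true_indices(my_array > 15.0) + lbound(my_array, 1) - 1) = 0.0

  ! The two elements above 15 are now zero.
  print *, my_array
end program demo
```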
I need the index numbers of the cells that fulfil the following conditions:
Q(i)<=5 and V(i)/=1
(size(Q)==size(V)). I wrote something like this:
program test
implicit none
integer, allocatable, dimension(:):: R
integer Q(5)
integer V(5)
integer counter,limit,i
counter=0
limit=5
V=(/0,0,1,0,0/)
Q=(/5,10,2,7,2/)
do i=1,5
if((Q(i)<=5).AND.(V(i)/=1)) then
counter=counter+1
end if
end do
allocate(R(counter))
counter=1
do i=1,5
if((Q(i)<=5).AND.(V(i)/=1)) then
R(counter)=i
counter=counter+1
end if
end do
deallocate(R)
end program test
but I don't think it is very efficient. Is there a better solution to this problem?
I can remove one loop by writing
program test
implicit none
integer, allocatable, dimension(:):: R
integer Q(5)
integer V(5)
integer counter,limit,i
counter=0
limit=5
V=(/0,0,1,0,0/)
Q=(/5,10,2,7,2/)
V=-V+1
allocate(R((count(V*Q<=5)-count(V*Q==0))))
counter=1
do i=1,size(Q)
if((Q(i)<=5).AND.(V(i)==1)) then
R(counter)=i
counter=counter+1
end if
end do
end program test
The question is very close to being a duplicate but explaining why would be a cumbersome comment.
Answers to that question take advantage of a common idiom:
PACK((/(i,i=1,SIZE(mask))/), mask)
This returns an array of 1-based indexes corresponding to .TRUE. elements of the logical array mask. For that question mask was the result of arr.gt.min but mask can be any rank-1 logical array.
Here, mask could well be Q.le.5.and.V.ne.1 (noting Q and V are the same length).
In Fortran 95 (which is why I'm using (/../) and .ne.) one doesn't have access to the modern feature of automatic array allocation, so a manual allocation will be required. Something like
logical mask(5)
mask = Q.le.5.and.V.ne.1
ALLOCATE(R(COUNT(mask)))
R = PACK((/(i,i=1,5)/),mask)
As an incentive to use a modern compiler, with Fortran 2003 compliance enabled, this is the same as
R = PACK((/(i,i=1,5)/), Q.le.5.and.V.ne.1)
(with appropriate other declarations, etc.)
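For completeness, the idiom can be dropped into the question's test data as a full program. This sketch uses modern syntax ([...] constructors, <= and /=) and relies on automatic allocation of R; the program name is made up.

```fortran
program pack_indices
  implicit none
  integer :: Q(5), V(5), i
  integer, allocatable :: R(:)

  V = [0, 0, 1, 0, 0]
  Q = [5, 10, 2, 7, 2]

  ! R is allocated automatically to the number of .true. mask elements.
  R = pack([(i, i=1, size(Q))], Q <= 5 .and. V /= 1)

  ! Elements 1 and 5 satisfy both conditions for this data.
  print *, R
end program pack_indices
```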
When considering doing this creation in a subroutine it is exceptionally important to think about array bounds if using non-1-based indexing or subarrays. See my answer in the linked question for details.
I have allocated a lot of 2D arrays in my code, and I want each array to be read from a file named after the array. The problem is that each array has a different size, so I am looking for the most efficient way. The code is like this:
Module Test
USE ...
implicit NONE
private
public:: initializeTest, readFile
real(kind=8),dimension(:,:),allocatable,target:: ar1,ar2,ar3,ar4,ar5,...,ar10
real(kind=8),dimension(:,:),pointer:: pAr
CONTAINS
!
subroutine initializeTest
integer:: k1,k2,k3,k4,k5
integer:: ind1,ind2
allocate(ar1(k1,k1),ar2(k1,k2),ar3(k2,k4),ar4(k5,k5),...) !variable sizes
! here needs automatization - since its repeated
pAr => ar1
ind1 = size(pAr,1)
ind2 = size(pAr,2)
call readFile(par,ind1,ind2)
pAr => ar2
ind1 = size(pAr,1)
ind2 = size(pAr,2)
call readFile(par,ind1,ind2)
!....ar3, ... , ar9
pAr => ar10
ind1 = size(pAr,1)
ind2 = size(pAr,2)
call readFile(par,ind1,ind2)
end subroutine initializeTest
!
!
subroutine readFile(ar,row,col)
integer:: i,j,row,col
real(kind=8),dimension(row,col):: ar
! it should open the file with same name as 'ar'
open(unit=111,file='ar.dat')
do i = 1, row
read(111,*) (ar(i,j),j=1,col)
enddo
close(111)
end subroutine readFile
!
!
end module Test
If your arrays ar1, ar2, etc. had the same dimensions you could put them all in a 3-dimensional array. Since they have different dimensions, you can define a derived type, call it a "matrix", with an allocatable array component and then create an array of that derived type. Then you can read the i'th matrix from a file such as "input_1.txt" for i=1.
The program below, which works with g95 and gfortran, shows how the derived type can be declared and used.
module foo
implicit none
type, public :: matrix
real, allocatable :: xx(:,:)
end type matrix
end module foo
program xfoo
use foo, only: matrix
implicit none
integer, parameter :: nmat = 9
integer :: i
character (len=20) :: fname
type(matrix) :: y(nmat)
do i=1,nmat
allocate(y(i)%xx(i,i))
write (fname,"('input_',i0,'.txt')") i
! in actual code, read data into y(i)%xx from file fname
y(i)%xx = 0.0
print*,"read from file ",trim(fname)
end do
end program xfoo
As far as I know, extracting the name of the variable from the variable at runtime isn't going to work.
If you need lots of automation for the arrays, consider using an array of a derived type, as one other answer suggests, in order to loop over them both for allocation and reading. Then you can enumerate the files, or store a label with the derived type.
Sticking to specific array names, an alternative is to just read/write the files with the required name as an argument to the routine:
Module Test
...
! here needs automatization - since its repeated
call readFile(ar1,'ar1')
call readFile(ar2,'ar2')
!....ar3, ... , ar9
call readFile(ar10,'ar10')
end subroutine initializeTest
subroutine readFile(ar,label)
real(kind=8) :: ar(:,:)
character(len=*) :: label
integer:: i,j,nrow,ncol,fd
nrow=size(ar,1)
ncol=size(ar,2)
open(newunit=fd,file=label)
do i = 1, nrow
read(fd,*) (ar(i,j),j=1,ncol)
enddo
end subroutine readFile
end module Test
Some unsolicited comments: I don't really get why (in this example) readFile is public, or why the pointers are needed. Also, kind=8 shouldn't be used, since kind numbers are not portable across compilers (see Fortran 90 kind parameter).
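As a sketch of the portable alternatives to kind=8, either of the following declarations can be used. The constant name dp is a common convention but an assumption here; real64 comes from the intrinsic module iso_fortran_env (Fortran 2008).

```fortran
program kind_demo
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none
  ! Request at least 15 decimal digits and a decimal exponent range of 307.
  integer, parameter :: dp = selected_real_kind(15, 307)
  real(kind=dp) :: a
  real(kind=real64) :: b

  a = 1.0_dp
  b = 1.0_real64

  ! The kind numbers printed are compiler-specific; the precision is not.
  print *, kind(a), kind(b), precision(a)
end program kind_demo
```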