Passing array of unknown size (subroutine output) to another subroutine - arrays

I'm new to Intel MKL. Here's a problem I've come across. It is apparently not related to MKL itself, but to how to declare an array whose size is not yet known, have a subroutine fill it as output, and pass it on to another subroutine.
I'm trying to use mkl_ddnscsr to convert a matrix to its CSR format suitable for calling by Pardiso:
CALL mkl_ddnscsr(job,Nt,Nt,Adns,Nt,Acsr,ja,ia,info)
CALL PARDISO(pt,1,1,11,13,Nt,Acsr,ia,ja,perm,1,iparm,0,b,x,errr)
The problem is that I don't know the lengths of the CSR array Acsr and the index vector ja before calling mkl_ddnscsr. How should Acsr and ja be declared in the main program, or in the subroutine that contains these two calls?
I tried something like
INTERFACE
SUBROUTINE mkl_ddnscsr(job, m, n, Adns, lda, Acsr, ja, ia, info)
IMPLICIT NONE
INTEGER :: job(8)
INTEGER :: m, n, lda, info
INTEGER, ALLOCATABLE :: ja(:)
INTEGER :: ia(m+1)
REAL(KIND=8), ALLOCATABLE :: Acsr(:)
REAL(KIND=8) :: Adns(:)
END SUBROUTINE
END INTERFACE
followed by
INTEGER, ALLOCATABLE :: ja(:)
REAL(KIND=8), ALLOCATABLE :: Acsr(:)
outside the INTERFACE, in the main program. But this configuration gives me a segmentation fault at run time.
On the other hand, if I try something like
INTERFACE
SUBROUTINE mkl_ddnscsr(job, m, n, Adns, lda, Acsr, ja, ia, info)
IMPLICIT NONE
INTEGER :: job(8)
INTEGER :: m, n, lda, info
INTEGER :: ja(:), ia(m+1)
REAL(KIND=8) :: Acsr(:), Adns(:)
END SUBROUTINE
END INTERFACE
and then
INTEGER, DIMENSION(:) :: ja
REAL(KIND=8), DIMENSION(:) :: Acsr
then ifort gives me the following message:
error #6596: If a deferred-shape array is intended, then the ALLOCATABLE or POINTER attribute is missing; if an assumed-shape array is intended, the array must be a dummy argument.
Anyone got any idea how to work around this? What's the right way to declare ja and Acsr in the main program (or main subroutine) and pass them around?
Note that the subroutines are part of the Intel MKL package, not something I wrote myself, so putting them in a module appears to be out of the question.

You can find the interface for mkl_ddnscsr from the manual page, or from the include file mkl_spblas.fi in your MKL install directory (e.g., /path/to/mkl/include/).
INTERFACE
subroutine mkl_ddnscsr ( job, m, n, Adns, lda, Acsr, AJ, AI, info )
integer job(8)
integer m, n, lda, info
integer AJ(*), AI(m+1)
double precision Adns(*), Acsr(*)
end
END INTERFACE
Because this routine has only Fortran 77-style dummy arguments (i.e., explicit-shape arrays such as AI(m+1) or assumed-size arrays such as Adns(*)), you can pass any local or allocatable array as an actual argument, provided an allocatable array has been allocated by the caller before the call. It is not mandatory to write an interface block explicitly, but including one (on the caller side) helps detect interface mismatches.
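To illustrate the point without needing MKL, here is a minimal sketch. The routine fill_squares is a hypothetical stand-in for a library routine with an assumed-size dummy argument; it is not part of MKL.

```fortran
! Hypothetical stand-in for a Fortran 77-style library routine.
! The dummy array is assumed-size, so the routine trusts the caller
! to have provided at least n elements.
subroutine fill_squares(n, v)
  implicit none
  integer, intent(in) :: n
  double precision, intent(out) :: v(*)
  integer :: i
  do i = 1, n
    v(i) = dble(i)**2
  end do
end subroutine fill_squares

program pass_allocatable
  implicit none
  ! Optional interface block; it lets the compiler check the call.
  interface
    subroutine fill_squares(n, v)
      integer, intent(in) :: n
      double precision, intent(out) :: v(*)
    end subroutine fill_squares
  end interface
  double precision, allocatable :: work(:)
  integer :: n

  n = 4
  allocate(work(n))            ! must be allocated before the call
  call fill_squares(n, work)   ! the allocatable array is passed like any other array
  print *, work
end program pass_allocatable
```

The caller owns the allocation. The routine only sees the storage that is passed in, which is why the ALLOCATABLE attribute must not appear in the interface of such a routine.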
According to the manual, it looks like mkl_ddnscsr (a routine for converting a dense to sparse matrix) works something like this:
program main
implicit none
! include 'mkl_spblas.fi' !! or mkl.fi (not mandatory but recommended)
integer :: nzmax, nnz, job( 8 ), m, n, lda, info, irow, k
double precision :: A( 10, 20 )
double precision, allocatable :: Asparse(:)
integer, allocatable :: ia(:), ja(:)
A( :, : ) = 0.0d0
A( 2, 3 ) = 23.0d0
A( 2, 7 ) = 27.0d0
A( 5, 4 ) = 54.0d0
A( 9, 9 ) = 99.0d0
!! Give an estimate of the number of non-zeros.
nzmax = 10
!! Or assume that non-zeros occupy at most 2% of A(:,:), for example.
! nzmax = size( A ) / 50
!! Or count the number of non-zeros directly.
! nzmax = count( abs( A ) > 0.0d0 )
print *, "nzmax = ", nzmax
m = size( A, 1 ) !! number of rows
n = size( A, 2 ) !! number of columns
lda = m !! leading dimension of A
allocate( Asparse( nzmax ) )
allocate( ja( nzmax ) ) !! <-> columns(:)
allocate( ia( m + 1 ) ) !! <-> rowIndex(:)
job( 1 ) = 0 !! convert dense to sparse A
job( 2:3 ) = 1 !! use 1-based indices
job( 4 ) = 2 !! use the whole A as input
job( 5 ) = nzmax !! maximum allowed number of non-zeros
job( 6 ) = 1 !! generate Asparse, ia, and ja as output
call mkl_ddnscsr( job, m, n, A, lda, Asparse, ja, ia, info )
if ( info /= 0 ) then
print *, "insufficient nzmax (stopped at ", info, "row)"; stop
endif
nnz = ia( m+1 ) - 1
print *, "number of non-zero elements = ", nnz
do irow = 1, m
!! This loop runs only for rows having nonzero elements.
do k = ia( irow ), ia( irow + 1 ) - 1
print "(2i5, f15.8)", irow, ja( k ), Asparse( k )
enddo
enddo
end program
Compiling with ifort -mkl test.f90 (with ifort14.0) gives the expected result
nzmax = 10
number of non-zero elements = 4
2 3 23.00000000
2 7 27.00000000
5 4 54.00000000
9 9 99.00000000
As for the determination of nzmax, I think there are at least three ways for this: (1) just use a guess value (as above); (2) assume the fraction of nonzero elements in the whole array; or (3) directly count the number of nonzeros in the dense array. In any case, because we have the exact number of nonzeros as output (nnz), we could re-allocate Asparse and ja to have the exact size (if necessary).
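The re-allocation to the exact size could look like the following sketch. The helper names shrink_real and shrink_int are hypothetical, and the values assigned in the demo program merely stand in for the output of mkl_ddnscsr.

```fortran
module shrink_mod
  implicit none
contains
  ! Shrink a to its first n elements, keeping their values.
  subroutine shrink_real(a, n)
    double precision, allocatable, intent(inout) :: a(:)
    integer, intent(in) :: n
    double precision, allocatable :: tmp(:)
    allocate(tmp(n))
    tmp(:) = a(1:n)
    call move_alloc(tmp, a)   ! a now has exactly n elements; tmp is deallocated
  end subroutine shrink_real

  ! Same operation for integer index arrays such as ja.
  subroutine shrink_int(a, n)
    integer, allocatable, intent(inout) :: a(:)
    integer, intent(in) :: n
    integer, allocatable :: tmp(:)
    allocate(tmp(n))
    tmp(:) = a(1:n)
    call move_alloc(tmp, a)
  end subroutine shrink_int
end module shrink_mod

program shrink_demo
  use shrink_mod
  implicit none
  double precision, allocatable :: Asparse(:)
  integer, allocatable :: ja(:)
  integer :: nzmax, nnz

  nzmax = 10
  nnz = 4
  allocate(Asparse(nzmax), ja(nzmax))
  Asparse = 0.0d0
  ja = 0
  ! Stand-in for the values that mkl_ddnscsr would have written.
  Asparse(1:nnz) = [23.0d0, 27.0d0, 54.0d0, 99.0d0]
  ja(1:nnz) = [3, 7, 4, 9]

  call shrink_real(Asparse, nnz)
  call shrink_int(ja, nnz)
  print *, "sizes after shrinking: ", size(Asparse), size(ja)
end program shrink_demo
```

With a compiler that applies Fortran 2003 reallocation on assignment, Asparse = Asparse(1:nnz) should achieve the same thing in one line; whether that is enabled can depend on the compiler version and flags.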
Similarly, you can find the interface for PARDISO in the include file mkl_pardiso.fi or on the PARDISO pages of the MKL manual.

Related

Fortran array input

I haven't done any Fortran programming for years and it seems I'm rather rusty now. So, I won't provide you with all my failed attempts but will humbly ask you to help me with the following.
I've got the following "input" file
1 5 e 4
A b & 1
c Z ; b
y } " N
t r ' +
It can have more columns and/or rows. I would now like to assign each of these ASCII characters to an array element x(i,j) so that I can process them further after ICHAR conversion. In this example i=1,4 and j=1,5, but the ranges can be any numbers, depending on the input file. The simplest example
PROGRAM Example
integer :: i, j
CHARACTER, ALLOCATABLE, DIMENSION(:,:) :: A
READ *, A
ALLOCATE (A(i,j))
PRINT *, A
END PROGRAM Example
compiles (Example.f95) but
cat input | ./Example.f95
does not give any output.
I would greatly appreciate any advice on how to import these characters into the program as elements x(i,j) of an array.
In Fortran, it's always best to know in advance how big your arrays need to be. I understand that in your case you can't know.
Assuming that your input is at least formatted correctly (i.e. the columns match up and have only a single space in between them), I've created a code that should in theory be able to read them in an arbitrary shape. (Not quite arbitrary, it assumes that there are fewer than 511 columns.)
It uses two techniques:
1. It reads the whole first line at once (1024 characters, hence the 511 limit on columns) and calculates the number of columns from its trimmed length.
2. It allocates an array with a guessed number of rows and, whenever the guess turns out to be too small, allocates a new array with double the number of rows, copies the data over, and uses move_alloc to swap the allocations.
To find when it should end reading the values, it simply checks whether the read returns the IOSTAT_END error code.
Here's the code:
program read_input
use iso_fortran_env, only: IOSTAT_END
implicit none
character, dimension(:,:), allocatable :: A, A_tmp
character(len=1024) :: line ! Assumes that there are never more than 500 or so columns
integer :: i, ncol, nrow, nrow_guess
integer :: ios
character(len=256) :: iom ! long enough to hold a complete error message
! First, read the first line, to see how many columns there are
read(*, '(A)', iostat=ios, iomsg=iom) line
call iocheck('read first line', ios, iom)
ncol = (len_trim(line) + 1) / 2
! Let's first allocate memory for two rows, we can make it more later.
nrow_guess = 2
allocate(A(ncol, nrow_guess))
! Instead of standard input, we're reading from the line we read before.
read(line, '(*(A1,1X))', iostat=ios, iomsg=iom) A(:, 1)
call iocheck('read first line into vals', ios, iom)
! Now loop over all the rows
nrow = 1
read_loop: do
if (nrow >= nrow_guess) then ! We have guessed too small.
! This is a bit convoluted, but the best
! way to increase the array shape.
nrow_guess = nrow_guess * 2
allocate(A_tmp(ncol, nrow_guess))
A_tmp(:, 1:nrow_guess/2) = A(:,:)
call move_alloc(A_tmp, A)
end if
read(*, '(*(A1,1X))', iostat = ios, iomsg=iom) A(:, nrow+1)
if (ios == IOSTAT_END) exit read_loop ! We're done reading.
call iocheck('read line into vals', ios, iom)
nrow = nrow + 1
end do read_loop
! The last guess was probably too large,
! let's move it to an array of the correct size.
if (nrow < nrow_guess) then
allocate(A_tmp(ncol, nrow))
A_tmp(:,:) = A(:, 1:nrow)
call move_alloc(A_tmp, A)
end if
! To show we have all values, print them out.
do i = 1, nrow
print '(*(1X,A))', A(:, i)
end do
contains
! This is a subroutine to check for IO Errors
subroutine iocheck(op, ios, iom)
character(len=*), intent(in) :: op, iom
integer, intent(in) :: ios
if (ios == 0) return
print *, "IO ERROR"
print *, "Operation: ", op
print *, "Message: ", trim(iom)
error stop 1 ! do not continue with unusable data
end subroutine iocheck
end program read_input
Edited to add
I had trouble with the special characters in your example input file, otherwise I'd just have made a read(*, *) A(:, nrow) -- but that messed the special characters up. That's why I chose the explicit (*(A1, X)) format. Of course that messes up when your characters don't start at the first position in the line.
You need to read the first line and determine how many characters there are in the line. Then read the entire file to determine the number of lines. Allocate the 2D array to hold the characters. Then read the file again and parse each line into the 2D array. There are more elegant ways of doing this, but here you go:
program foo
implicit none
character(len=:), allocatable :: s
character, allocatable :: a(:,:)
integer fd, i, j, n, nr, nc
!
! Open file for reading
!
open(newunit=fd, file='tmp.dat', status='old', err=9)
!
! Determine number of characters in a row. Assumes all rows
! are of the same length.
!
n = 128
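1 if (allocated(s)) then
deallocate(s)
n = 2 * n
rewind(fd) ! re-read the first line with the larger buffer
end if
allocate(character(len=n) :: s)
read(fd,'(A)') s
! A completely filled buffer means the line may have been truncated.
if (len_trim(s) == n) goto 1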
s = adjustl(s)
n = len_trim(s)
deallocate(s)
!
! Allocate a string of the correct length.
!
allocate(character(len=n) :: s)
!
! Count the number of rows
!
rewind(fd)
nr = 0
do
read(fd,*,end=2)
nr = nr + 1
end do
!
! Read file and store individual characters in a(:,:)
!
2 rewind(fd)
nc = n / 2 + 1
allocate(a(nr,nc))
do i = 1, nr
read(fd,'(A)') s
do j = 1, nc
a(i,j) = s(2*j-1:2*j-1)
end do
end do
close(fd)
write(s,'(I0)') nc
s = '(' // trim(s) // '(A,1X))'
do i = 1, nr
write(*,s) a(i,:)
end do
stop
9 write(*,'(A)') 'Error: cannot open tmp.dat'
end program foo
Apparently, GOTO is verboten here. Here's a more elegant solution.
program foo
implicit none
character, allocatable :: s(:), a(:,:)
integer fd, i, j, n, nr, nc
! Open file for reading
open(newunit=fd, file='tmp.dat', status='old', access='stream', err=9)
inquire(fd, size = n) ! Determine file size.
allocate(s(n)) ! Allocate space
read(fd) s ! Read the entire file
close(fd)
nr = count(ichar(s) == 10) ! Number of rows
nc = (count(ichar(s) /= 32) - nr) / nr ! Number of columns
a = reshape(pack(s, ichar(s) /= 32 .and. ichar(s) /= 10), [nc,nr])
a = transpose(a)
do i = 1, nr
do j = 1, nc
write(*,'(A,1X)',advance='no') a(i,j)
end do
write(*,*)
end do
stop
9 write(*,'(A)') 'Error: cannot open tmp.dat'
end program foo

Clever/fast way of array multiplication and summation

I have to solve a double integral in my program that can be translated into the i and j loops in the following minimal working example:
program test
implicit none
integer :: i, j, n
double precision, allocatable :: y(:), res(:), C(:,:,:)
n=200
allocate(y(n), res(n), C(n,n,n))
call random_number(y)
call random_number(C)
res = 0.d0
do i=1, n
do j=1, n
res(:) = res(:) + y(i) * y(j) * C(:, j, i)
end do
end do
deallocate(y, res, C)
end program test
I have to solve this integral multiple times per execution, and profiling tells me that it is the bottleneck of my calculation, consuming more than 95 % of the execution time.
I was wondering whether there's any possibility to solve this in a more clever, i.e., fast way and maybe get rid of one or both of the loops.
My question is not to optimize the code with compiler flags or parallelization, but whether the double loop is the best practice to tackle the given problem. Usually loops are slow and I try to avoid them. I was thinking that it might be possible to avoid the loops by reshaping or spreading the arrays. But I just don't see it.
If you write the double loop in matrix notation, y(i)*y(j) becomes the dyadic product YY^t, with Y being an n x 1 matrix. With this you can rewrite the loop as (pseudo-code)
do n=1,size(C,1)
res(n) = sum( YY^t * C_n )
enddo
where C_n = C(n,:,:) and * is an element-wise multiplication. Apart from the element-wise calculation you already did, this leaves you two additional ways of calculating the results:
res(n) = sum( (YY^t) * C_n )
res(n) = sum( Y * (Y^t C_n) )
In both cases, it is beneficial to have contiguous data and re-order the array C:
do i=1,n
C2(:,:,i) = C(i,:,:)
enddo !i
The number of floating point operations is the same with both approaches and slightly less than in the original approach. So let's measure the time for all of them...
Here are the implementations using BLAS for the matrix operations (and using dot products where applicable):
1. sum( (YY^t) * C_n )
call system_clock(iTime1)
call dgemm('N','N',n,n,1,1.d0,y,n,y,1,0.d0,mat,n)
nn=n*n
do i=1,n
res(i) = ddot( nn, mat, 1, C2(:,:,i), 1 )
enddo !i
2. sum( Y * (Y^t C_n) )
do i=1,n
call dgemm('N','N',1,n,n,1.d0,y,1,C2(:,:,i),n,0.d0,vec,1)
res(i) = ddot( n, y, 1, vec, 1 )
enddo !i
The outcome is as follows:
Orig: 0.111000001
sum((YY^t)C): 0.116999999
sum(Y(Y^tC)): 0.187000006
Your original implementation is the fastest! Why? Most probably due to the ideal usage of the cache on the CPU. Fortran compilers typically are very smart in optimizing loops, and in the element-wise calculation, you simply add and scale vectors, without any matrix operation. This can be utilized very efficiently.
So, is there room for improvement? Certainly :) The operation you are performing inside the loop is commonly known as axpy: y = a*x + y. This is a commonly used BLAS subroutine - usually highly optimized.
Utilizing this leads to
res = 0.d0
do i=1, n
do j=1, n
call daxpy(n, y(i)*y(j), C(:,j,i), 1, res, 1)
end do
end do
and takes
Orig (DAXPY): 0.101000004
Which is roughly 10% faster.
Here is the complete code. All measurements were performed with OpenBLAS and with n=500 (to make the differences easier to see).
program test
implicit none
integer :: i, j, n, nn
double precision, allocatable, target :: y(:), res(:), resC(:), C(:,:,:), C2(:,:,:), mat(:,:), vec(:)
integer :: count_rate, iTime1, iTime2
double precision :: ddot
n=500
allocate(y(n), res(n), resC(n), C(n,n,n), C2(n,n,n), mat(n,n), vec(n))
call random_number(y)
call random_number(C)
! Get the count rate
call system_clock(count_rate=count_rate)
! Original Approach
call system_clock(iTime1)
res = 0.d0
do i=1, n
do j=1, n
res(:) = res(:) + y(i) * y(j) * C(:, j, i)
end do
end do
call system_clock(iTime2)
print *,'Orig: ',real(iTime2-iTime1)/real(count_rate)
! Original Approach, DAXPY
call system_clock(iTime1)
resC = 0.d0
do i=1, n
do j=1, n
call daxpy(n, y(i)*y(j), C(:,j,i), 1, resC, 1)
end do
end do
call system_clock(iTime2)
print *,'Orig (DAXPY): ',real(iTime2-iTime1)/real(count_rate)
! print *,maxval( abs(resC-res) )
! Re-order
do i=1,n
C2(:,:,i) = C(i,:,:)
enddo !i
! sum((YY^t)C)
call system_clock(iTime1)
call dgemm('N','N',n,n,1,1.d0,y,n,y,1,0.d0,mat,n)
nn=n*n
do i=1,n
resC(i) = ddot( nn, mat, 1, C2(:,:,i), 1 )
enddo !i
call system_clock(iTime2)
print *,'sum((YY^t)C): ',real(iTime2-iTime1)/real(count_rate)
! print *,maxval( abs(resC-res) )
! sum(Y(Y^tC))
call system_clock(iTime1)
do i=1,n
call dgemm('N','N',1,n,n,1.d0,y,1,C2(:,:,i),n,0.d0,vec,1)
resC(i) = ddot( n, y, 1, vec, 1 )
enddo !i
call system_clock(iTime2)
print *,'sum(Y(Y^tC)): ',real(iTime2-iTime1)/real(count_rate)
! print *,maxval( abs(resC-res) )
end program test
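One further reformulation, not among those timed above, addresses the reshaping idea from the question directly: view C as an n x n*n matrix and multiply it by the flattened outer product of y with itself. The following is only a sketch with hypothetical names (contract_mod, contract_loop, contract_matmul). It uses the matmul intrinsic rather than BLAS, and no timing is claimed for it; it only checks that the two forms agree.

```fortran
module contract_mod
  implicit none
contains
  ! Reference: the double loop from the question.
  function contract_loop(C, y) result(res)
    double precision, intent(in) :: C(:,:,:), y(:)
    double precision :: res(size(C, 1))
    integer :: i, j
    res = 0.d0
    do i = 1, size(y)
      do j = 1, size(y)
        res(:) = res(:) + y(i) * y(j) * C(:, j, i)
      end do
    end do
  end function contract_loop

  ! Same contraction written as one matrix-vector product.
  function contract_matmul(C, y) result(res)
    double precision, intent(in) :: C(:,:,:), y(:)
    double precision :: res(size(C, 1))
    double precision :: yy(size(y), size(y))
    integer :: n
    n = size(y)
    ! Outer product yy(j,i) = y(j)*y(i), in the same (j,i) order as C(:,j,i).
    yy = spread(y, dim=2, ncopies=n) * spread(y, dim=1, ncopies=n)
    ! Column-major storage lets C be viewed as a size(C,1) x n*n matrix.
    res = matmul(reshape(C, [size(C, 1), n*n]), reshape(yy, [n*n]))
  end function contract_matmul
end module contract_mod

program reshape_check
  use contract_mod
  implicit none
  integer, parameter :: n = 20
  double precision, allocatable :: y(:), C(:,:,:)

  allocate(y(n), C(n,n,n))
  call random_number(y)
  call random_number(C)
  ! The two results should differ only by floating-point rounding.
  print *, 'max abs difference: ', &
       maxval(abs(contract_matmul(C, y) - contract_loop(C, y)))
end program reshape_check
```

The matrix-vector product could presumably also be handed to the BLAS routine dgemv, which would avoid the temporary created by reshape; whether that beats the DAXPY loop would have to be measured.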

Error 542 - N appears in the dimension of a variable

program factorial
implicit none
integer:: n1
real:: fact = 1.0
integer:: n = n1
integer, dimension(1:n):: x
integer:: i
print *, "Enter a number:"
read *, n1
x(1) = n1
do i=1,n1-1
x(i+1) = n1-i
fact = fact*x(i)
end do
print *, fact
end program factorial
I have written code to calculate the factorial of a number. I ask the user to enter an integer n1, after which the program should create an array with n1 elements. I cannot get this to compile; I keep getting the following error:
factorial.F95(6) : error 542 - N appears in the dimension of a variable, yet is not a dummy argument, a variable available through USE or CONTAINS association, a COMMON variable, a PARAMETER, or a PURE FUNCTION
Compilation failed.
How can I fix this? I want the array dimension to be equal to the input number. For example, say I want to calculate 5! (5 factorial), I want the x array to be of 5 (row or column) element length. Somehow, I am unable to do that!
n1 needs to be a compile-time constant to be used as the dimension of a static array, for example
program factorial
implicit none
integer, parameter :: n1 = 5 ! example value, fixed at compile time
integer, dimension(1:n1) :: x
or you need to use allocatable arrays.
As Vladimir suggested, you have to allocate the array:
integer, dimension(:), allocatable :: x
integer :: alloc_stat
print *, "Enter a number:"
read *, n1
ALLOCATE( x(1:n1), STAT=alloc_stat )
IF ( alloc_stat .ne. 0 ) THEN
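! ERROR_UNIT requires: use, intrinsic :: iso_fortran_env, only: ERROR_UNIT
WRITE(ERROR_UNIT,*) "Array allocation failed."
ERROR STOP 1 ! a constant stop code is valid in Fortran 2008; a variable one needs Fortran 2018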
ENDIF
(I always check the status of my ALLOCATE statements. It's not whether you are paranoid or not, it's whether you're paranoid enough.)
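Putting the pieces together, a complete version of the program might look like the sketch below. The module and function names (factorial_mod, factorial_via_array) are hypothetical, and the loop is restructured slightly so that the array is filled and multiplied in one pass.

```fortran
module factorial_mod
  use, intrinsic :: iso_fortran_env, only: error_unit
  implicit none
contains
  ! Compute n1! by storing n1, n1-1, ..., 1 in an array sized at run time.
  function factorial_via_array(n1) result(fact)
    integer, intent(in) :: n1
    real :: fact
    integer, allocatable :: x(:)
    integer :: i, alloc_stat

    allocate(x(n1), stat=alloc_stat)
    if (alloc_stat /= 0) then
      write(error_unit, *) "Array allocation failed."
      error stop 1
    end if

    fact = 1.0
    do i = 1, n1
      x(i) = n1 - i + 1
      fact = fact * x(i)
    end do
  end function factorial_via_array
end module factorial_mod

program factorial
  use factorial_mod
  implicit none
  integer :: n1

  print *, "Enter a number:"
  read *, n1
  print *, factorial_via_array(n1)
end program factorial
```

The array is kept only because the question asks for it; the product could be accumulated without storing the factors at all.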

FORTRAN: out of bounds check fails in subroutines?

I have a rather simple piece of code (reduced to the essentials from bigger program).
I pass an array and the size of the array to a subroutine. I get no error if the passed size does not match the actual size of the array. And I can even manipulate the parts of the array that don't "exist" !!! (I overwrite memory that I shouldn't overwrite).
Here is the subroutine:
subroutine sub(arr, narr)
implicit none
integer, intent(in) :: narr
double precision, dimension(narr) :: arr
integer :: j
do j = 1, narr
! print all the values
write(*, '("Arr[",I0,"] = ",f0.10)') j, arr(j)
! change the values
arr(j) = -10d0
enddo
end subroutine
and here the main program
program main
implicit none
integer, parameter :: narr = 5
! the array is made smaller
double precision, dimension(narr - 2) :: array
integer :: j
! assign values to array
array = (/ (1d0*j, j = 1,narr - 2) /)
! print using the subroutine
print*, "inside subroutine"
call sub(array, narr)
! print outside the subroutine
print *, " "
print *, "outside subroutine"
do j = 1, narr
write(*, '("Arr[",I0,"] = ",f0.10)') j, array(j)
enddo
end program
If I compile with ifort and "-check all", it only catches the error in the main program, not the one in the subroutine.
Is there a way to also catch the error in the subroutine?
Yes. Declare the array as dimension(:) in the subroutine, i.e. as an assumed-shape array. This Fortran 90 (and later) declaration requires that the procedure interface be known to the caller; the easiest way is to put the procedure in a module and use that module in the caller. You don't actually need to pass the size of the array to the subroutine, because you can determine it as size(arr). I have left the argument narr in place to retain the bug.
module MySub
contains
subroutine sub(arr, narr)
implicit none
integer, intent(in) :: narr
double precision, dimension(:) :: arr
integer :: j
do j = 1, narr
! print all the values
write(*, '("Arr[",I0,"] = ",f0.10)') j, arr(j)
! change the values
arr(j) = -10d0
enddo
end subroutine
end module MySub
program main
use MySub
implicit none
integer, parameter :: narr = 5
! the array is made smaller
double precision, dimension(narr - 2) :: array
integer :: j
! assign values to array
array = (/ (1d0*j, j = 1,narr - 2) /)
! print using the subroutine
print*, "inside subroutine"
call sub(array,narr)
! print outside the subroutine
print *, " "
print *, "outside subroutine"
do j = 1, narr
write(*, '("Arr[",I0,"] = ",f0.10)') j, array(j)
enddo
end program
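For comparison, here is a sketch of the same example with the narr argument dropped and the loop bound taken from size(arr), so the subroutine cannot run past the end of the array. The module name MySubSized is hypothetical.

```fortran
module MySubSized
  implicit none
contains
  subroutine sub(arr)
    double precision, dimension(:), intent(inout) :: arr
    integer :: j
    ! The loop bound comes from the array itself, not from the caller.
    do j = 1, size(arr)
      write(*, '("Arr[",I0,"] = ",f0.10)') j, arr(j)
      arr(j) = -10d0
    end do
  end subroutine sub
end module MySubSized

program main
  use MySubSized
  implicit none
  integer, parameter :: narr = 5
  double precision, dimension(narr - 2) :: array
  integer :: j

  array = (/ (1d0*j, j = 1, narr - 2) /)
  print *, "inside subroutine"
  call sub(array)
  print *, " "
  print *, "outside subroutine"
  do j = 1, size(array)
    write(*, '("Arr[",I0,"] = ",f0.10)') j, array(j)
  end do
end program main
```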

min and max of input array file (.dat) with subroutine

I am trying to write a program that reads a number n, creates a vector to store n double precision numbers, reads those numbers, and calls a subroutine printminmax() to find the minimum and maximum. My code works perfectly for plain numbers (integer, real, etc.), but it fails when the input is in scientific notation (e.g. 0.3412E+01). Why? I thought reading with * accepts all formats. Thanks.
implicit none
integer, dimension(:), allocatable :: x
integer :: n
open (unit=77, file='input2.dat', action='read', status='old')
read(77,*), n
allocate(x(n))
call printminmax(n)
deallocate(x)
end
subroutine printminmax(y)
implicit none
integer, dimension(:), allocatable :: x
integer :: y,max,min,i
allocate(x(y))
read(77,*) x
!print *,'Maximun=', maxval(x)
!print *,'Minimun=', minval(x
!initialize the value max & min
max=x(1)
min=x(1)
do i=2,y
if (x(i)>max) max=x(i)
if (x(i)<min) min=x(i)
end do
write(*,*) 'Maximum=',max
write(*,*) 'Minimum=',min
end subroutine printminmax
One example of the input that fails is
1000
5.39524398466520e-01
9.85099770130787e-01
7.38946122872518e-01
6.47771620257608e-01
8.80871051119695e-01
2.99375585725816e-02
The error that I get for scientific notation is
At line 13 of file io.f90 (unit = 77, file = 'input3.dat')
Fortran runtime error: Bad integer for item 1 in list input
OK, I found it. x should be declared double precision, not integer.
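A sketch of the corrected program, under a few assumptions: the unit number is passed as an argument, the results are also returned through the hypothetical arguments xmin and xmax, and a scratch file stands in for input2.dat so that the example runs on its own.

```fortran
module minmax_mod
  implicit none
contains
  ! Read n values from an already-open unit and report minimum and maximum.
  subroutine printminmax(unit, n, xmin, xmax)
    integer, intent(in) :: unit, n
    double precision, intent(out) :: xmin, xmax
    double precision, allocatable :: x(:)   ! double precision, not integer
    allocate(x(n))
    ! List-directed input accepts 7, 0.5 and 5.39e-01 alike for a real variable.
    read(unit, *) x
    xmin = minval(x)
    xmax = maxval(x)
    write(*,*) 'Maximum=', xmax
    write(*,*) 'Minimum=', xmin
  end subroutine printminmax
end module minmax_mod

program minmax_demo
  use minmax_mod
  implicit none
  integer :: u, n
  double precision :: lo, hi

  ! Scratch file standing in for input2.dat.
  open(newunit=u, status='scratch', action='readwrite')
  write(u, '(A)') '3'
  write(u, '(A)') '5.39524398466520e-01'
  write(u, '(A)') '9.85099770130787e-01'
  write(u, '(A)') '2.99375585725816e-02'
  rewind(u)

  read(u, *) n
  call printminmax(u, n, lo, hi)
  close(u)
end program minmax_demo
```

The intrinsics minval and maxval replace the hand-written loop; the original loop works just as well once x, max and min are declared double precision.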

Resources