This question already has answers here:
Why Segmentation fault is happening in this openmp code?
(2 answers)
Closed 6 years ago.
I am using Eclipse with GNU Fortran compiler to compute a large arrays to solve a matrix problem. However, I have read and notice that I am unable to read all my data into the array which causes my project.exe to crash when I invoke -fopenmp into my compiler settings; otherwise, the program works fine.
program Top_tier
integer, parameter:: n=145894, nz_num=4608168
integer ia(n+1), ja(nz_num)
double precision a(nz_num), rhs(n)
integer i
open (21, file='ia.dat')
do i=1, n+1
read(21,*) ia(i)
enddo
close(21)
open (21, file='a.dat')
do i=1, nz_num
read(21,*) a(i)
enddo
close(21)
open (21, file='ja.dat')
do i=1, nz_num
read(21,*) ja(i)
enddo
close(21)
open (21, file='b.dat')
do i=1, n
read(21,*) rhs(i)
enddo
close(21)
End
In my quest to find a solution around it, I have found the most probable cause is the limit of the stack size which can be seen by the fact that if I set nz_num to lesser or equal to 26561, the program will run properly. A possible solution is to set environment variable to increase stacksize but the program does not recognise when I type "setenv" or "export" OMP_STACKSIZE into the program. Am I doing something wrong? Is there any advise on how I can solve this problem?
Thanks!
You are allocating a, rhs, ia ja on the stack, which is why you are running out of stack space in the first place. I would suggest to always allocate large arrays on the heap:
integer, parameter:: n=145894, nz_num=4608168
integer, dimension(:), allocatable :: ia, ja
double precision, dimension(:), allocatable :: a, rhs
integer i
allocate(ia(n+1), ja(nz_num))
allocate(a(nz_num), rhs(n))
! rest of your code...
deallocate(ia, ja)
deallocate(a, rhs)
Instead of directly declaring your four arrays of a certain size (causing them to be allocated on the stack) you declare them as allocatable and give the shape of the arrays. Further down you can then allocate your arrays to the size you want. This size can be chosen at runtime. That means, if you are reading your arrays from a file you could also store the size of the arrays at the beginning of the file and use this for your allocate call.
Finally, as always with dynamically allocated memory, don't forget to deallocate them when you don't need them anymore.
Edit:
And I forgot to say that this doesn't really have anything to do with openmp except for that openmp threads probably have much small stack size limits (in this case it would be only the openmp master thread).
My (Fortran) code is very simple. All it does is filling up a large array, that depends on five (independent!) variables. Here is a brief example
do i = 1, imax
do j = 1, jmax
do k = 1, kmax
array(i,j,k) = ! some function of i,j,k
end do
end do
end do
I would to use different threads to fill the values of array in a faster way.
I thought the simplest way to achieve that would be to enclose the loop in these commands
!$OMP PARALLEL DO
!$OMP PARALLEL END
However, if I do this I get completely different results from the serial case. I apologize if the question is too simple, but I couldn't really find a proper example to help solve my problem. Can you recommend a solution or provide an example?
I don't exatly know what is happening, but it could be a race condition or just bad declaration of the directives. Try this and see if it works
!replace ... with variables that are constants as in shared(a,b,c)
!$omp parallel do default(private) shared(...)
do i=1,imax
j=1,jmax
k=1,kmax
array(i,j,k) = ! some function of i,j,k
end do
end do
end do
!$omp end parallel do
Is it possible in a modern Fortran compiler such as Intel Fortran to determine array strides at runtime? For example, I may want to perform a Fast Fourier Transform (FFT) on an array section:
program main
complex(8),allocatable::array(:,:)
allocate(array(17, 17))
array = 1.0d0
call fft(array(1:16,1:16))
contains
subroutine fft(a)
use mkl_dfti
implicit none
complex(8),intent(inout)::a(:,:)
type(dfti_descriptor),pointer::desc
integer::stat
stat = DftiCreateDescriptor(desc, DFTI_DOUBLE, DFTI_COMPLEX, 2, shape(a) )
stat = DftiCommitDescriptor(desc)
stat = DftiComputeForward(desc, a(:,1))
stat = DftiFreeDescriptor(desc)
end subroutine
end program
However, the MKL Dfti* routines need to be explicitly told the array strides.
Looking through reference manuals I have not found any intrinsic functions which return stride information.
A couple of interesting resources are here and here which discuss whether array sections are copied and how Intel Fortran handles arrays internally.
I would rather not restrict myself to the way that Intel currently uses its array descriptors.
How can I figure out the stride information? Note that in general I would want the fft routine (or any similar routine) to not require any additional information about the array to be passed in.
EDIT:
I have verified that an array temporary is not created in this scenario, here is a simpler piece of code which I have checked on Intel(R) Visual Fortran Compiler XE 14.0.2.176 [Intel(R) 64], with optimizations disabled and heap arrays set to 0.
program main
implicit none
real(8),allocatable::a(:,:)
pause
allocate(a(8192,8192))
pause
call random_number(a)
pause
call foo(a(:4096,:4096))
pause
contains
subroutine foo(a)
implicit none
real(8)::a(:,:)
open(unit=16, file='a_sum.txt')
write(16, *) sum(a)
close(16)
end subroutine
end program
Monitoring the memory usage, it is clear that an array temporary is never created.
EDIT 2:
module m_foo
implicit none
contains
subroutine foo(a)
implicit none
real(8),contiguous::a(:,:)
integer::i, j
open(unit=16, file='a_sum.txt')
write(16, *) sum(a)
close(16)
call nointerface(a)
end subroutine
end module
subroutine nointerface(a)
implicit none
real(8)::a(*)
end subroutine
program main
use m_foo
implicit none
integer,parameter::N = 8192
real(8),allocatable::a(:,:)
integer::i, j
real(8)::count
pause
allocate(a(N, N))
pause
call random_number(a)
pause
call foo(a(:N/2,:N/2))
pause
end program
EDIT 3:
The example illustrates what I'm trying to achieve. There is a 16x16 contiguous array, but I only want to transform the upper 4x4 array. The first call simply passes in the array section, but it doesn't return a single one in the upper left corner of the array. The second call sets the appropriate stride and a subsequently contains the correct upper 4x4 array. The stride of the upper 4x4 array with respect to the full 16x16 array is not one.
program main
implicit none
complex(8),allocatable::a(:,:)
allocate(a(16,16))
a = 0.0d0
a(1:4,1:4) = 1.0d0
call fft(a(1:4,1:4))
write(*,*) a(1:4,1:4)
pause
a = 0.0d0
a(1:4,1:4) = 1.0d0
call fft_stride(a(1:4,1:4), 1, 16)
write(*,*) a(1:4,1:4)
pause
contains
subroutine fft(a) !{{{
use mkl_dfti
implicit none
complex(8),intent(inout)::a(:,:)
type(dfti_descriptor),pointer::desc
integer::stat
stat = DftiCreateDescriptor(desc, DFTI_DOUBLE, DFTI_COMPLEX, 2, shape(a) )
stat = DftiCommitDescriptor(desc)
stat = DftiComputeForward(desc, a(:,1))
stat = DftiFreeDescriptor(desc)
end subroutine !}}}
subroutine fft_stride(a, s1, s2) !{{{
use mkl_dfti
implicit none
complex(8),intent(inout)::a(:,:)
integer::s1, s2
type(dfti_descriptor),pointer::desc
integer::stat
integer::strides(3)
strides = [0, s1, s2]
stat = DftiCreateDescriptor(desc, DFTI_DOUBLE, DFTI_COMPLEX, 2, shape(a) )
stat = DftiSetValue(desc, DFTI_INPUT_STRIDES, strides)
stat = DftiCommitDescriptor(desc)
stat = DftiComputeForward(desc, a(:,1))
stat = DftiFreeDescriptor(desc)
end subroutine !}}}
end program
I'm guessing you get confused because you worked around the explicit interface of the MKL function DftiComputeForward by giving it a(:,1). This is contiguous and doesn't need an array temporary. It's wrong, however, the low-level routine will get the whole array and that's why you see that it works if you specify strides. Since the DftiComputeForward exects an array complex(kind), intent inout :: a(*), you can work by passing it through an external subroutine.
program ...
call fft(4,4,a(1:4,1:4))
end program
subroutine fft(m,n,a) !{{{
use mkl_dfti
implicit none
complex(8),intent(inout)::a(*)
integer :: m, n
type(dfti_descriptor),pointer::desc
integer::stat
stat = DftiCreateDescriptor(desc, DFTI_DOUBLE, DFTI_COMPLEX, 2, (/m,n/) )
stat = DftiCommitDescriptor(desc)
stat = DftiComputeForward(desc, a)
stat = DftiFreeDescriptor(desc)
end subroutine !}}}
This will create an array temporary though when going into the subroutine. A more efficient solution is then indeed strides:
program ...
call fft_strided(4,4,a,16)
end program
subroutine fft_strided(m,n,a,lda) !{{{
use mkl_dfti
implicit none
complex(8),intent(inout)::a(*)
integer :: m, n, lda
type(dfti_descriptor),pointer::desc
integer::stat
integer::strides(3)
strides = [0, 1, lda]
stat = DftiCreateDescriptor(desc, DFTI_DOUBLE, DFTI_COMPLEX, 2, (/m,n/) )
stat = DftiSetValue(desc, DFTI_INPUT_STRIDES, strides)
stat = DftiCommitDescriptor(desc)
stat = DftiComputeForward(desc, a)
stat = DftiFreeDescriptor(desc)
end subroutine !}}}
Tho routine DftiComputeForward accepts an assumed size array. If you pass something complicated and non-contiguous, a copy will have to be made at passing. The compiler can check at run-time if the copy is actually necessary or not. In any case for you the stride is always 1, because that will be the stride the MKL routine will see.
In your case you pass A(:,something), this is a contiguous section, provided A is contiguous. If A is not contiguous a copy will have to be made. Stride is always 1.
Some of the answers here do not understand the different between fortran strides and memory strides (though they are related).
To answer your question for future readers beyond the specific case you have here - there does not appear to be away to find an array stride solely in fortran, but it can be done via C using inter-operability features in newer compilers.
You can do this in C:
#include "stdio.h"
size_t c_compute_stride(int * x, int * y)
{
size_t px = (size_t) x;
size_t py = (size_t) y;
size_t d = py-px;
return d;
}
and then call this function from fortran on the first two elements of an array, e.g.:
program main
use iso_c_binding
implicit none
interface
function c_compute_stride(x, y) bind(C, name="c_compute_stride")
use iso_c_binding
integer :: x, y
integer(c_size_t) :: c_compute_stride
end function
end interface
integer, dimension(10) :: a
integer, dimension(10,10) :: b
write(*,*) find_stride(a)
write(*,*) find_stride(b(:,1))
write(*,*) find_stride(b(1,:))
contains
function find_stride(x)
integer, dimension(:) :: x
integer(c_size_t) :: find_stride
find_stride = c_compute_stride(x(1), x(2))
end function
end program
This will print out:
4
4
40
In short: assumed-shape arrays always have stride 1.
A bit longer: When you pass a section of an array to a subroutine which takes an assumed-shape array, as you have here, then the subroutine doesn't know anything about the original size of the array. If you look at the upper- and lower-bounds of the dummy argument in the subroutine, you'll see they will always be the size of the array section and 1.
integer, dimension(10:20) :: array
integer :: i
array = [ (i, i=10,20) ]
call foo(array(10:20:2))
subroutine foo(a)
integer, dimension(:) :: a
integer :: i
print*, lbound(a), ubound(a)
do i=lbound(a,1), ubound(a,2)
print*, a(i)
end do
end subroutine foo
This gives the output:
1 6
10 12 14 16 18 20
So, even when your array indices start at 10, when you pass it (or a section of it), the subroutine thinks the indices start at 1. Similarly, it thinks the stride is 1. You can give a lower bound to the dummy argument:
integer, dimension(10:) :: a
which will make lbound(a) 10 and ubound(a) 15. But it's not possible to give an assumed-shape array a stride.
I am having an issue with private arrays when using the !$OMP TASK construct. Arrays listed as PRIVATE for tasks are crashing/becoming corrupted when their bounds are given by input parameters in the subroutine. I am using static arrays to avoid the usual issues with allocatable arrays and !$OMP PARALLEL PRIVATE.
The following simplified code reproduces the issue, and crashes with SIGSEV:
SUBROUTINE do_work(n_in)
USE omp_lib
IMPLICIT NONE
INTEGER, INTENT(IN) :: n_in
INTEGER :: i, counter
REAL, DIMENSION(n_in) :: a
REAL, DIMENSION(20) :: b
!$OMP PARALLEL PRIVATE(a,i)
!$OMP SINGLE
counter = 0
DO WHILE(counter .LE. 20)
!$OMP TASK FIRSTPRIVATE(counter) PRIVATE(a,i)
a(:) = 5.0
DO i = 1,n_in
a(1) = a(1) + a(i)
END DO
b(counter) = a(1)
!$OMP END TASK
counter = counter + 1
END DO
!$OMP END SINGLE
!$OMP END PARALLEL
END SUBROUTINE do_work
The issue, however, is cleared away simply by hardcoding the size of array a i.e. REAL, DIMENSION(5) :: a. It is almost as if the task space is not aware of the array size parameter n_in. However, I have verified n_in both inside and outside of the task construct and outside of the parallel construct. Furthermore, if a is declared as a scalar, it works
Is the usage of PRIVATE clauses incorrect or incomplete?
SIDE NOTES:
I've written this simplified code to reproduce the problem. In reality, I am parallelizing a series of linked lists, as you can probably tell from the structure
Any code calling this subroutine is serial. There is no parallel nesting, recursion, etc.
I have no clue why, but in my case it works fine, if I include a radom print command (e.g print*,'test' or print*,a or even only print*, ) somewhere in the parallel region. If I comment it out, again I also get a SIGSEV ... strange. Sorry for this more or less answer.
I would like to create an array with a dimension based on the number of elements meeting a certain condition in another array. This would require that I initialize an array mid-routine, which Fortran won't let me do.
Is there a way around that?
Example routine:
subroutine example(some_array)
real some_array(50) ! passed array of known dimension
element_count = 0
do i=1,50
if (some_array.gt.0) then
element_count = element_count+1
endif
enddo
real new_array(element_count) ! new array with length based on conditional statement
endsubroutine example
Your question isn't about initializing an array, which involves setting its values.
However, there is a way to do what you want. You even have a choice, depending on how general it's to be.
I'm assuming that the element_count means to have a some_array(i) in that loop.
You can make new_array allocatable:
subroutine example(some_array)
real some_array(50)
real, allocatable :: new_array(:)
allocate(new_array(COUNT(some_array.gt.0)))
end subroutine
Or have it as an automatic object:
subroutine example(some_array)
real some_array(50)
real new_array(COUNT(some_array.gt.0))
end subroutine
This latter works only when your condition is "simple". Further, automatic objects cannot be used in the scope of modules or main programs. The allocatable case is much more general, such as when you want to use the full loop rather than the count intrinsic, or want the variable not as a procedure local variable.
In both of these cases you meet the requirement of having all the declarations before executable statements.
Since Fortran 2008 the block construct allows automatic objects even after executable statements and in the main program:
program example
implicit none
real some_array(50)
some_array = ...
block
real new_array(COUNT(some_array.gt.0))
end block
end program example
Try this
real, dimension(50) :: some_array
real, dimension(:), allocatable :: other_array
integer :: status
...
allocate(other_array(count(some_array>0)),stat=status)
at the end of this sequence of statements other_array will have the one element for each element of some_array greater than 0, there is no need to write a loop to count the non-zero elements of some_array.
Following #AlexanderVogt's advice, do check the status of the allocate statement.
You can use allocatable arrays for this task:
subroutine example(some_array)
real :: some_array(50)
real,allocatable :: new_array(:)
integer :: i, element_count, status
element_count = 0
do i=lbound(some_array,1),ubound(some_array,1)
if ( some_array(i) > 0 ) then
element_count = element_count + 1
endif
enddo
allocate( new_array(element_count), stat=status )
if ( status /= 0 ) stop 'cannot allocate memory'
! set values of new_array
end subroutine
You need to use an allocatable array (see this article for more on it). This would change your routine to
subroutine example(input_array,output_array)
real,intent(in) :: input_array(50) ! passed array of known dimension
real, intent(out), allocatable :: output_array(:)
integer :: element_count, i
element_count = 0
do i=1,50
if (some_array.gt.0) element_count = element_count+1
enddo
allocate(output_array(element_count))
end subroutine
Note that the intents may not be necessary, but are probably good practice. If you don't want to call a second array, it is possible to create a reallocate subroutine; though this would require the array to already be declared as allocatable.