Given the following code:
integer, parameter :: n = 10000
integer, parameter :: m = 3
real, dimension(:,:), allocatable :: arr
! First way
allocate(arr(n,m))
! Second way
allocate(arr(m,n))
What is the "best" way to allocate arr when there is a large difference in the two dimensions, the first way or the second way? Does it matter, or is it something that is strongly dependent on how arr will be used?
Fortran is column-major, i.e. the first dimension changes the fastest.
The optimal choice of dimensions depends on your problem:
If arr is a list of coordinates in a 3D space, and you commonly operate on these coordinates, you should probably choose the second option:
allocate(arr(m,n))
arr(:,1) = [x, y, z]
! ...
Then the three components of each point sit next to each other in memory.
If you have three vectors with a length of n = 10000 instead (e.g. three right-hand-sides), option one would give you contiguous chunks for each vector.
In conclusion: it depends what you are trying to do.
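A minimal sketch of the two layouts, assuming a compiler that supports the Fortran 2008 is_contiguous intrinsic. The program name and the array names pts and rhs are made up for illustration.

```fortran
program layout_demo
  implicit none
  integer, parameter :: n = 10000, m = 3
  real, allocatable :: pts(:,:)   ! shape (m,n): one column per 3D point
  real, allocatable :: rhs(:,:)   ! shape (n,m): one column per long vector

  allocate(pts(m,n), rhs(n,m))
  pts = 0.0
  rhs = 0.0

  ! Column sections run along the first (fastest-varying) dimension,
  ! so they occupy one contiguous block of memory.
  print *, 'pts(:,1) contiguous: ', is_contiguous(pts(:,1))   ! x, y, z of point 1
  print *, 'rhs(:,1) contiguous: ', is_contiguous(rhs(:,1))   ! first long vector

  ! Row sections are strided: consecutive elements are separated by
  ! the extent of the first dimension.
  print *, 'pts(1,:) contiguous: ', is_contiguous(pts(1,:))   ! all x values
  print *, 'rhs(1,:) contiguous: ', is_contiguous(rhs(1,:))
end program layout_demo
```

The two column sections should be reported as contiguous and the two row sections as not contiguous, which is the whole argument for putting the index you slice over in the second position.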
If there is a difference, it is a minor one related to the characteristics of sequential memory access, memory fetch optimizations, and maybe caching.
Otherwise, don't worry about it. Just make the program work correctly. If somehow a few more nanoseconds of performance need to be squeezed out of it, well, there are undoubtedly better places to expend mental energy to get it.
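If you are passing slices of the array to a function, then the difference matters greatly: a slice along the first dimension, such as arr(:,j), is contiguous, whereas a slice along the second, such as arr(i,:), is strided and may force the compiler to make a temporary copy.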
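As a rough illustration of the slice-passing case, here is a sketch with a hypothetical routine scale_vec that takes an explicit-shape dummy argument. With gfortran, compiling with -Warray-temporaries should flag the second call; other compilers have their own diagnostics.

```fortran
module slice_demo_mod
  implicit none
contains
  ! Explicit-shape dummy: the callee expects n contiguous reals.
  subroutine scale_vec(v, n, factor)
    integer, intent(in) :: n
    real, intent(inout) :: v(n)
    real, intent(in) :: factor
    v = factor * v
  end subroutine scale_vec
end module slice_demo_mod

program slice_demo
  use slice_demo_mod
  implicit none
  integer, parameter :: n = 10000, m = 3
  real, allocatable :: a(:,:), b(:,:)

  allocate(a(n,m), b(m,n))
  a = 1.0
  b = 1.0

  ! Contiguous column: can be passed by address, no copy needed.
  call scale_vec(a(:,2), n, 2.0)

  ! Strided row: the compiler typically packs it into a temporary,
  ! calls the routine, then unpacks the result (copy-in/copy-out).
  call scale_vec(b(2,:), n, 2.0)

  ! Both calls give the correct result; only the cost differs.
  print *, all(a(:,2) == 2.0), all(b(2,:) == 2.0)
end program slice_demo
```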
Is it possible in modern Fortran to use a vector to index a multidimensional array? That is, given, say,
integer, dimension(3) :: index = [4,6,9]
double precision, dimension(10,10,10) :: data
is there a better (more general) way to access data(4,6,9) than writing data(index(1), index(2), index(3))? It would be good not to have to hard-code the rank of the data array.
(Naively I would like to write data(index) but of course this actually means something different - subset "gathering" - requiring data to be a rank-one array itself.)
For what it's worth this is essentially the same question as multidimensional index by array of indices in JavaScript, but in Fortran instead. Unfortunately the clever answers there won't work with predefined array ranks.
No. And all the workarounds I can think of are ghastly hacks; you're better off writing a function that takes data and index as arguments and returns the element(s) you want.
You might, however, be able to use modern Fortran's capabilities for array rank remapping to do exactly the opposite, which might satisfy your wish to play fast-and-loose with array ranks.
Given the declaration
double precision, dimension(1000), target :: data
you can define a rank-3 pointer
double precision, pointer :: index_3d(:,:,:)
and then set it like this:
index_3d(1:10,1:10,1:10) => data
and hey presto, you can now use both rank-3 and rank-1 indices into data, which is close to what you want to do. I've not used this in anger yet, but a couple of simple tests haven't revealed any serious problems.
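A minimal sketch combining both ideas: the rank-remapped pointer from above, plus a small helper that turns an index vector into a position in the rank-1 storage. The helper linear_index and the names storage, cube and idx are hypothetical (chosen to avoid shadowing the index intrinsic), and lower bounds of 1 are assumed.

```fortran
module vec_index_mod
  implicit none
contains
  ! Hypothetical helper: column-major position of the element addressed
  ! by idx in an array of shape shp (all lower bounds assumed to be 1).
  pure function linear_index(idx, shp) result(pos)
    integer, intent(in) :: idx(:), shp(:)
    integer :: pos, k, stride
    pos = 1
    stride = 1
    do k = 1, size(idx)
      pos = pos + (idx(k) - 1) * stride
      stride = stride * shp(k)
    end do
  end function linear_index
end module vec_index_mod

program vec_index_demo
  use vec_index_mod
  implicit none
  double precision, target :: storage(1000)
  double precision, pointer :: cube(:,:,:)
  integer :: idx(3), i

  storage = [(dble(i), i = 1, 1000)]
  cube(1:10,1:10,1:10) => storage   ! rank-3 view of the rank-1 storage
  idx = [4, 6, 9]

  ! Both expressions address the same element.
  print *, cube(idx(1), idx(2), idx(3))
  print *, storage(linear_index(idx, shape(cube)))
end program vec_index_demo
```

Because the lookup goes through the rank-1 storage, linear_index itself does not depend on the rank of the view; only the pointer declaration does.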
I have some iterative Fortran code which at each integration step produces some output. What is the best practice in terms of speed/accuracy for getting each of these steps saved to disk?
My current approach involves declaring some large array, at each integration step saving the output to a row of the array, and then finally saving a cropped version of the total array to file. A pseudo-example is shown below.
program IO_example
implicit none
integer, parameter :: dp = selected_real_kind(33,4931) !quad precision
integer, parameter :: nrows = 1000000, ncols = 6 !array bounds must be constants
real(kind=dp), dimension(:,:), allocatable :: BigDataArray
real(kind=dp), dimension(ncols) :: RowVector
real(kind=dp), dimension(:,:), allocatable :: SmallDataArray
integer :: i !for iterating; a default integer, since dp is a real kind
allocate(BigDataArray(nrows,ncols))
i = 0
do while (condition) !placeholder for the real termination test
!Update RowVector
i = i+1
BigDataArray(i,:) = RowVector
enddo
!First reallocate to create a smaller array holding only the i filled rows
allocate(SmallDataArray(i,ncols))
SmallDataArray = BigDataArray(1:i, :)
!Now save
open(unit=10,file='BinaryData',status='replace',form='unformatted') !placeholder file name
write(10) SmallDataArray
close(10)
end program IO_example
Now this works fine, but my question is: is this the best way to do this, or is some other approach more favourable? By best I am particularly referring to speed (how much does writing to array and writing to file slow down the code), although accuracy issues are also important (I understand these are avoided by writing in binary unformatted. See this StackOverflow answer).
Some potential issues I can foresee are the SmallDataArray being larger than the available RAM (especially in quad precision), so that it cannot be written to disk. In addition, the number of iterations could become greater than nrows (in this case I suppose one can just increase nrows, but at what point does this start to impact performance?)
Thanks in advance for any help.
This is probably an extended comment, taking advantage of some formatting, and verges close to an opinion, but there are one or two matters which are amenable to measurement and which you might care to test for yourself.
I'm not sure what role BigDataArray plays in your code, since you don't seem to need all the data in memory after it has been computed. You could probably drop it altogether and simply accumulate results into SmallDataArray. If BigDataArray has 10^6 rows, then maybe give SmallDataArray 10^5 rows, and fill it up 10 times. Or, if you're not certain at the outset how many rows to allocate to Big, then don't, just set Small to 10^5 and fill it up as many times as necessary, exiting when the computation converges.
(And don't get hung up on the numbers I've chosen, the best size for Small is something you probably ought to experiment with.)
Once the code has filled Small write it to file, go back to row 1 and carry on.
If you follow this approach you will eliminate at least a couple of potential performance issues: the separate allocation of Small at the end (not sure what that's about anyway), and the movement of data when you copy a bunch of rows from Big to Small (which gains you nothing in terms of computation performance and is unnecessary for writing the data to the file).
As you seem to know, the rule when writing data to file (which is very slow computationally) is to write large volumes in one go, but it's difficult to state how large that volume should be without at least some measurements and some testing, so go measure and test.
By dropping Big altogether you remove that burden from the memory while the code runs. And if you do need all of Big at the end of the calculation, you could always read it back in (subject to memory being available of course).
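The fill-and-flush idea could look roughly like the sketch below. It is only an outline under several assumptions: the buffer size chunk_rows, the step count nsteps (a stand-in for the real termination test), the file name output.bin and the dummy row values are all made up, and stream access is my choice rather than something the question asked for. Each step is stored as a column so that one step is contiguous in memory and in the file.

```fortran
program chunked_output
  implicit none
  integer, parameter :: qp = selected_real_kind(33, 4931)  ! quad precision, as in the question
  integer, parameter :: ncols = 6
  integer, parameter :: chunk_rows = 100000   ! assumed buffer size; tune by measurement
  integer, parameter :: nsteps = 250000       ! stand-in for the real termination test
  real(qp), allocatable :: buffer(:,:)
  real(qp) :: row(ncols)
  integer :: step, filled, u

  allocate(buffer(ncols, chunk_rows))

  open(newunit=u, file='output.bin', status='replace', &
       form='unformatted', access='stream')

  filled = 0
  do step = 1, nsteps
    row = real(step, qp)            ! placeholder for the actual integration step
    filled = filled + 1
    buffer(:, filled) = row
    if (filled == chunk_rows) then
      write(u) buffer               ! flush a full buffer in one large write
      filled = 0
    end if
  end do

  ! Write whatever is left in the partly filled buffer.
  if (filled > 0) write(u) buffer(:, 1:filled)
  close(u)
end program chunked_output
```

Memory use is bounded by the buffer regardless of how many steps are taken, and the number of steps no longer has to be known in advance.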
Finally, let me get some retaliation in first: if your response to this 'answer' is something akin to "Oh, that doesn't answer my real question, it only answers the simplified question I asked, but I have all these other issues to consider, would you mind taking a look at these too ..." then you can take it that my response to that is (a) unprintable and (b) boils down to "Yes, I would mind".
In Fortran, I have a 1D array of type real, real :: work(2*N), which represents N complex numbers. I have no control over the declaration of the array.
Later I need to apply a complex conjugation on work. However, conjg(work(:)) does not work since it is of type real.
Is there an efficient way to convince the compiler to apply conjg to my array?
The easiest approach is already in the comment by HighPerformanceMark, just multiply the elements representing the imaginary part by -1.
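You can also use equivalence between a real array and a complex array. It will be just one array but viewed as both real and complex. Storage association between default real and default complex is defined by the standard (a complex value occupies two consecutive real storage units, real part first), but equivalence only works when N is a constant and work is not a dummy argument or allocatable, and Fortran 2018 marks the statement as obsolescent.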
The equivalence is used as:
real :: work(2*N)
complex :: cwork(N)
!both work and cwork point to the same data
equivalence (work, cwork)
work = some_initial_value
!this conjugates work at the same time as cwork because they are just different names for the same array
cwork = conjg(cwork)
Use a complex variable, COMPLEX :: temp(N), and apply the conjugation to that. You can then extract the real and imaginary parts with REAL(temp) and AIMAG(temp) and put them back into your work array.
Probably it is better to make your work a complex type from the outset though.
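A minimal sketch of the two non-equivalence approaches, assuming the real and imaginary parts are interleaved in work as (re, im) pairs; the question does not state the layout, so adjust the strides if yours differs. The program name and sample values are made up.

```fortran
program conj_demo
  implicit none
  integer, parameter :: n = 3
  real :: work(2*n)
  complex :: temp(n)

  ! Interleaved (re, im) pairs: 1+2i, 3+4i, 5+6i
  work = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

  ! Option 1: negate every second element in place.
  work(2::2) = -work(2::2)
  print *, work     ! imaginary parts are now -2, -4, -6

  ! Option 2: round trip through a complex temporary.
  temp = cmplx(work(1::2), work(2::2))
  temp = conjg(temp)
  work(1::2) = real(temp)
  work(2::2) = aimag(temp)
  print *, work     ! conjugated a second time, so back to the original values
end program conj_demo
```

Option 1 touches only half of the elements and needs no extra storage, which is why it is usually the simplest choice.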
I have been working with Fortran for quite a long time, but I have a question to which I can't find a satisfying answer.
If I have two arrays and I want to copy one into the other:
real,dimension(0:100,0:100) :: array1,array2
...
do i=0,100
do j=0,100
array1(i,j) = array2(i,j)
enddo
enddo
But I also noticed that it works as well if I do it like that:
real,dimension(0:100,0:100) :: array1,array2
...
array1 = array2
And there is a huge difference in computational time! (The second one is much faster!)
If I do it without a loop, can there be a problem? For example, could it be that I'm not copying the content but just the memory reference?
Does it change anything if I do another mathematical step like:
array1 = array2*5
Is there a problem on a different architecture (cluster server) or on a different compiler (gfortran, ifort)?
I have to perform various computational steps on huge amounts of data so the computational time is an issue.
Everything that @Alexander_Vogt said, but also:
do i=0,100
do j=0,100
array1(i,j) = array2(i,j)
enddo
enddo
will generally be slower than
do j=0,100
do i=0,100
array1(i,j) = array2(i,j)
enddo
enddo
(Unless the compiler notices it and reorders the loops.)
In Fortran, the first index is the fastest changing. That means that in the second loop nest, consecutive iterations of the inner loop touch adjacent memory locations, so several elements of the array can be loaded into the lower-level caches in one go and operated on.
If you have multidimensional loops, always have the innermost loop loop over the first index, and so on. (If possible in any way.)
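A small timing sketch for the two loop orders, assuming an array size of 2000 x 2000 picked only to make the effect visible. The absolute numbers depend on compiler, flags and hardware, and an optimizing compiler may interchange the loops itself, so treat this as something to experiment with rather than a benchmark.

```fortran
program loop_order
  implicit none
  integer, parameter :: i8 = selected_int_kind(18)
  integer, parameter :: n = 2000
  real, allocatable :: a(:,:), b(:,:)
  integer :: i, j
  integer(i8) :: t0, t1, t2, rate

  allocate(a(n,n), b(n,n))
  call random_number(b)

  call system_clock(t0, rate)
  do i = 1, n          ! first index in the OUTER loop: strided access
    do j = 1, n
      a(i,j) = b(i,j)
    end do
  end do
  call system_clock(t1)

  do j = 1, n          ! first index in the INNER loop: unit-stride access
    do i = 1, n
      a(i,j) = b(i,j)
    end do
  end do
  call system_clock(t2)

  print *, 'i outer, j inner (s):', real(t1 - t0) / real(rate)
  print *, 'j outer, i inner (s):', real(t2 - t1) / real(rate)
  print *, 'copies agree:', all(a == b)
end program loop_order
```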
Fortran is very capable of performing vector operations. Both
array1 = array2
and
array1 = array2*5
are valid operations.
This notation allows the compiler to efficiently parallelize (and/or) optimize the code, as no dependence on the order of the operations exists.
However, these constructs are equivalent to the explicit loops, and it depends on the compiler which one will be faster.
Semantically, array1 = array2 always copies the values, so array1 never becomes a reference to array2. Whether the copy is physically performed depends on what is done with the arrays afterwards and on whether the compiler can optimize it away; unless there is something to gain from eliding it, it is safe to assume the array will be copied.
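A short check of the copy semantics, using the array shapes from the question; the program name and the values are made up for illustration.

```fortran
program copy_demo
  implicit none
  real :: array1(0:100,0:100), array2(0:100,0:100)

  array2 = 1.0
  array1 = array2          ! copies every element
  array2 = 2.0             ! later changes to array2 ...
  print *, array1(0,0)     ! ... do not affect array1, which still holds 1.0

  array1 = array2*5        ! elementwise: each element becomes 2.0*5
  print *, array1(100,100), all(array1 == 10.0)
end program copy_demo
```

Since array1 and array2 are ordinary (non-pointer) arrays, there is no way for the assignment to make them share storage.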