Passing arrays as smaller than they actually are

In the following code, I declare an array mass with 20 elements. When passed to the subroutine foo, foo is told that mass has only 10 elements. However, I can still access the 20th element. My questions are:
Why can I pass an array to a subroutine and tell the subroutine the wrong size for the array?
Why can I still access the 20th element even though the subroutine thinks it only has 10 elements?
Would any changes I make to the 20th element in the array while in foo remain with the array just the same as if foo knew the proper size of the array?
Here is the code:
program test
  implicit none
  integer num2,i
  real*8 mass(20)
  num2=10
  do i = 1,num2
    mass(i) = 1.d0
  end do
  call foo(num2,mass)
end

subroutine foo(num2,mass)
  integer num2
  real*8 mass(num2)
  write(*,"(A20,E15.9)") "first one:",mass(1)
  write(*,"(A20,E15.9)") "tenth one:",mass(10)
  ! out of bounds: mass has extent num2 (=10) inside foo
  write(*,"(A20,E15.9)") "twentieth one:",mass(20)
  continue
end
Note: This situation of a subroutine being told the wrong size for an array is one I am encountering in someone else's code I am trying to modify for my own usage.

There are a couple of misconceptions underlying your question which I'll address in this answer. Some practical implications are given in the answer by M. S. B.
You say that you declare an array mass with twenty elements, but that foo is told that it has only ten. This is not correct.
What you have are actually two distinct entities: an array mass in the main program, with twenty elements, and an array (also called mass) of size ten in the subroutine foo.
The "passing" establishes an association (so-called argument association) between those two entities. The array called mass in the main program is the actual argument in the call, and the array called mass in the subroutine is the dummy argument.
The dummy argument is an array of explicit shape, extent num2 (which is also argument associated with the num2 of the main program). The first num2 elements of the dummy argument are associated with the first num2 elements of the actual argument. [This leads to the requirement that there are at least num2 elements in the array in the main program.]
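To make the association concrete, here is a minimal sketch. The module and procedure names are hypothetical, and dp = kind(1.0d0) is used instead of the nonstandard real*8. Because the dummy a has explicit shape (n), it has exactly n elements inside the procedures, and assigning to it changes only the first n elements of the caller's array.

```fortran
module assoc_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Hypothetical helper: inside, the explicit-shape dummy a(n) has exactly
  ! n elements, whatever the size of the caller's array.
  integer function dummy_size(n, a)
    integer, intent(in) :: n
    real(dp), intent(in) :: a(n)
    dummy_size = size(a)
  end function dummy_size

  ! Hypothetical helper: scaling the dummy changes the first n elements
  ! of the actual argument and leaves the rest untouched.
  subroutine scale_first(n, a, factor)
    integer, intent(in) :: n
    real(dp), intent(inout) :: a(n)
    real(dp), intent(in) :: factor
    a = a * factor
  end subroutine scale_first
end module assoc_demo
```

For example, with real(dp) :: mass(20), dummy_size(10, mass) returns 10, and call scale_first(10, mass, 2.0_dp) leaves mass(11:20) unchanged.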
So, the answer to your first question
Why can I pass an array to a subroutine and tell the subroutine the wrong size for the array?
is just this: you aren't telling it the wrong size, you are merely saying that the array in the subroutine corresponds to the first num2 elements of the argument.
Coming to
Why can I still access the 20th element even though the subroutine thinks it only has 10 elements?
This is a programming error. You are not allowed to use a subscript value larger than the extent of the array. What happens when you try this is up to the compiler. As the other answer says, it's quite possible that this is just accessing (because of the way the association is implemented) the location in memory corresponding to the twentieth element of the actual argument. But equally the compiler could complain (especially with run-time checking options such as bounds checking enabled), or if the passing is done with temporary copying the thing could crash. [This latter is unlikely.]
Finally
Would any changes I make to the 20th element in the array while in foo remain with the array just the same as if foo knew the proper size of the array?
This is again implementation specific. There is no correct Fortran answer, as your program isn't conforming. As with the previous point, if that's an area of memory corresponding to the appropriate element of the actual argument and there's no bounds checking going on, the changes could persist. If the compiler chose to do a copy, there could either be a crash or the changes outside the bounds could be ignored on return. [Again, these last two are unlikely, as they would require the compiler to be "clever".]

Why can I pass an array to a subroutine and tell the subroutine the wrong size for the array?
Because Fortran allows you to use a portion of an array.
Why can I still access the 20th element even though the subroutine thinks it only has 10 elements?
Because 1) the array elements are consecutive in memory, so the indexing reaches the 20th element, and 2) programs compiled from Fortran normally don't check whether the program is performing invalid array subscripting. Most compilers have options to insert such checks, e.g., -fcheck=bounds or -fcheck=all for gfortran.
Would any changes I make to the 20th element in the array while in foo remain with the array just the same as if foo knew the proper size of the array?
Yes. If you did this and the original array didn't have 20 elements, you would alter an unrelated memory location, with bad results.
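If foo really needs to see all twenty elements, the conforming fix is to give it the true size. The sketch below (hypothetical names, assuming the procedures live in a module so that callers get an explicit interface) shows two ways: pass size(mass) as the extent of an explicit-shape dummy, or use an assumed-shape dummy, whose extent is carried along with the array.

```fortran
module foo_fixed
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Explicit-shape version: the caller passes the true extent,
  ! e.g. last_explicit(size(mass), mass).
  function last_explicit(n, mass) result(v)
    integer, intent(in) :: n
    real(dp), intent(in) :: mass(n)
    real(dp) :: v
    v = mass(n)
  end function last_explicit

  ! Assumed-shape version: the extent travels with the array, so no size
  ! argument is needed (this requires an explicit interface, e.g. a module).
  function last_assumed(mass) result(v)
    real(dp), intent(in) :: mass(:)
    real(dp) :: v
    v = mass(size(mass))
  end function last_assumed
end module foo_fixed
```

Either way, referencing the last element is legal, so bounds checking has nothing to flag.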

Related

Passing size of array as argument to a subroutine in fortran

I was wondering about the overhead of querying the size of an array in Fortran. The old Fortran (pre-Fortran 90) way was to pass the size of the array as an argument of the subroutine:
subroutine asub(nelem,ar)
integer,intent(in)::nelem
real*8,intent(in)::ar(nelem)
! do stuff with nelem such as allocate other arrays
end subroutine asub
Since Fortran 90 introduced the SIZE intrinsic and assumed-shape arrays, it can be done this way:
subroutine asub(ar)
real*8,intent(in)::ar(:)
! do stuff with size(ar) such as allocate other arrays
end subroutine asub
Is method 2 bad performance-wise if asub is called a million times?
I am asking because I am working on a relatively big code where some array sizes are global variables (not even passed as subroutine arguments), which is really bad in my opinion. Method 1 would require a lot of work in order to propagate the array sizes to the whole code while method 2 is clearly faster to achieve in my case.
Thanks !
nelem is a number that you need to read from memory; size(ar) is also a number that you need to read from memory. And you need to inquire the value just once, and then probably do a lot of computation over nelem elements. The overhead of inquiring the size will be completely negligible.
OK, size(ar) is a function call, but the compiler can just insert a read of the right value from the array descriptor. And even if it remains a function call, it will still be called just once.
Differences, if any, will be elsewhere, mainly as described in the Q/A linked by francescalus, Passing arrays to subroutines in Fortran: Assumed shape vs explicit shape. Depending on what the compiler can assume about the array being contiguous in memory, it will be able to optimize it better or worse (e.g. SIMD vectorization).
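A sketch of the two methods side by side, assuming both live in a module (required for the assumed-shape version, which needs an explicit interface); the names and the work array are hypothetical. In the second version size(ar) is queried once and stored, which is the pattern described above.

```fortran
module asub_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Method 1: explicit-shape dummy, size passed by the caller.
  subroutine asub_explicit(nelem, ar, total)
    integer, intent(in) :: nelem
    real(dp), intent(in) :: ar(nelem)
    real(dp), intent(out) :: total
    real(dp), allocatable :: work(:)
    allocate(work(nelem))
    work = 2.0_dp * ar
    total = sum(work)
  end subroutine asub_explicit

  ! Method 2: assumed-shape dummy; SIZE is read once from the descriptor.
  subroutine asub_assumed(ar, total)
    real(dp), intent(in) :: ar(:)
    real(dp), intent(out) :: total
    real(dp), allocatable :: work(:)
    integer :: n
    n = size(ar)          ! queried once, then reused
    allocate(work(n))
    work = 2.0_dp * ar
    total = sum(work)
  end subroutine asub_assumed
end module asub_demo
```

The local work arrays are ALLOCATABLE, so they are deallocated automatically on return in both versions.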
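If contiguity is the concern, Fortran 2008 added the CONTIGUOUS attribute for assumed-shape dummies. In this sketch (hypothetical names) it lets the compiler assume unit stride inside the loop; if a caller passes a non-contiguous section, the compiler typically makes a contiguous temporary copy at the call instead.

```fortran
module contig_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! CONTIGUOUS (Fortran 2008): the compiler may assume unit stride for
  ! a and b, which can help vectorization. Assumes size(b) >= size(a).
  function dot_contig(a, b) result(s)
    real(dp), intent(in), contiguous :: a(:), b(:)
    real(dp) :: s
    integer :: i
    s = 0.0_dp
    do i = 1, size(a)
      s = s + a(i) * b(i)
    end do
  end function dot_contig
end module contig_demo
```

Whether this is actually faster for a given code is exactly the kind of thing to measure, as suggested above.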
As always, where performance matters, you should test and measure. Remember to enable all relevant compiler optimizations.

Fortran LAPACK cblat1 check2 ctest: pass value instead of array?

I'm going through a Fortran code, and one bit has me a little puzzled.
There is a subroutine, say
SUBROUTINE SSUB(X,...)
REAL*8 X(0:N1,1:N2,0:N3-1),...
...
RETURN
END
Which is called in another subroutine by:
CALL SSUB(W(0,1,0,1),...)
where W is a 'working array'. It appears that a specific value from W is passed to the X, however, X is dimensioned as an array. What's going on?
This is a not-uncommon idiom for getting the subroutine to work on a (rectangular, N-dimensional) subset of the original array.
All parameters in Fortran (at least before Fortran 90) are passed by reference, so the actual array argument is resolved as a location in memory. Choose a location inside the space allocated for the whole array, and the subroutine manipulates only part of the array.
Biggest issue: you have to be aware of how the array is laid out in memory and how Fortran's array indexing scheme works. Fortran uses column-major array ordering, which is the opposite convention from C. Consider an array that is 5x5 in size (and index both directions from 0 to make the comparison with C easier). In both languages (0,0) is the first element in memory. In C the next element in memory is [0][1], but in Fortran it is (1,0). This affects which indexes you drop when choosing a subspace: if the original array is A(i,j,k,l) and the subroutine works on a three-dimensional subspace (as in your example), in C it works on Aprime[i=constant][j][k][l], but in Fortran it works on Aprime(i,j,k,l=constant).
The other risk is wrap-around. The dimensions of the (sub)array in the subroutine have to match those in the calling routine, or strange, strange things will happen (think about it). So if A is declared as (0:4,0:5,0:6,0:7) and we call with element A(0,1,0,1), the receiving routine is free to start the index of each dimension wherever it likes, but must give the leading dimensions the extents (5,6,7), or else; but that means that the last element in the j direction actually wraps around! The thing to do about this is to not use the last element. Making sure that that happens is the programmer's job, and is a pain in the butt. Take care. Lots of care.
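The element-passing idiom and the wrap-around can be checked with a small sketch. The names are hypothetical stand-ins for SSUB, and W is declared as (0:4,0:5,0:6,0:7) as in the example above. Passing W(0,0,0,1) with leading extents (5,6,7) makes X exactly the l = 1 block of W, while starting at W(0,1,0,1) makes X(0,5,0) land on W(0,0,1,1), because W has no j = 6.

```fortran
module seqassoc_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Hypothetical stand-in for SSUB: x is explicit-shape, so passing an array
  ! element such as w(0,0,0,1) associates x with the element sequence that
  ! starts there, in column-major (array element) order.
  subroutine fill_block(x, n1, n2, n3, val)
    integer, intent(in) :: n1, n2, n3
    real(dp), intent(inout) :: x(0:n1, 0:n2, 0:n3)
    real(dp), intent(in) :: val
    x = val
  end subroutine fill_block

  ! Sets the single element x(i,j,k) of the dummy array.
  subroutine set_elem(x, n1, n2, n3, i, j, k, val)
    integer, intent(in) :: n1, n2, n3, i, j, k
    real(dp), intent(inout) :: x(0:n1, 0:n2, 0:n3)
    real(dp), intent(in) :: val
    x(i, j, k) = val
  end subroutine set_elem
end module seqassoc_demo
```

With real(dp) :: w(0:4,0:5,0:6,0:7), call fill_block(w(0,0,0,1), 4, 5, 6, 3.0_dp) touches only w(:,:,:,1), whereas call set_elem(w(0,1,0,1), 4, 5, 6, 0, 5, 0, 2.0_dp) is legal inside set_elem yet writes w(0,0,1,1).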
In Fortran, variables are passed by address.
So W(0,1,0,1) is both a value and an address; basically, you pass the subarray starting at W(0,1,0,1).
This is called "sequence association". In this case, what appears to be a scalar, an element of an array (the actual argument in the caller), is associated with an array (implicitly its first element), the dummy argument in the subroutine. Thereafter the elements of the arrays are associated by storage order, known as "sequence". This was done in Fortran 77 and earlier for various reasons, here apparently for a workspace array; perhaps the programmer was doing their own memory management. This is retained in Fortran >=90 for backwards compatibility, but IMO, doesn't belong in new code.

How to use an array of pointers to assign non-adjacent array entries in FORTRAN

I want to be able to reset elements in an array which are not in a consecutive block of memory. I thought to use an array of pointers for this rather than a pointer array, as my understanding of pointer arrays is that they must point to a coherent block of memory (e.g. pointer(1:10,1:10) => target(1:100)).
My simple test program is as follows:
program test
implicit none
integer, target :: arr(4,4)
type ptr
integer, pointer :: p
end type ptr
type(ptr), dimension(2) :: idx
arr=0
idx(1)%p=>arr(2,2)
idx(2)%p=>arr(4,2)
idx(1)%p=5 ! this is okay
idx(2)%p=5 ! this is okay
idx(1:2)%p=5 ! this gives an error
print *,arr
end program test
The first two statements idx(n)%p=5 are okay, but I want to be able to set a chunk of the array in one statement using the spanning method idx(1:n)%p=5 but when I do this I get the following compile error:
Error: Component to the right of a part reference with nonzero rank must not have the POINTER attribute at (1)
Can I set the value of a chunk of array entries using pointer somehow? Perhaps it is in fact possible with a pointer array rather than an array of pointers...
I think this is related to Fortran: using a vector to index a multidimensional array
but I can't see how to use that answer here.
Probably an extended comment rather than an answer, but I need a little formatting ...
It's best not to think of Fortran supporting arrays of pointers. You've already grasped that (I think) and built the usual workaround of an array of derived type each instance of the type having an element which is a pointer.
What's not entirely clear is why you don't use multiple vector subscripts such as
arr([2,4],[2]) = 5
(OK, the second subscript is something of a degenerate vector but the statement is intended to have the same effect as your attempt to use pointers.) If you would rather, you could use array subscript triplets instead
arr(2:4:2,2) = 5
Perhaps you've simplified your needs too far to express the necessity of using pointers in which case my feeble suggestions will not meet your undeclared needs.
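For completeness, a sketch of two pointer-based options (hypothetical names). Since idx(1:2)%p = 5 is rejected, the array-of-derived-type workaround needs an explicit loop. Also, contrary to the assumption in the question, a rank-1 array pointer can target a regularly strided section, so it need not be contiguous; it still cannot target an arbitrary irregular set of elements, for which the vector subscripts above are the simpler route.

```fortran
module ptr_demo
  implicit none
  type ptr
    integer, pointer :: p => null()
  end type ptr
contains
  ! idx(1:n)%p = val is not allowed, so assign through each pointer in turn.
  ! INTENT(IN) only freezes the pointer associations, not the targets.
  subroutine set_all(idx, val)
    type(ptr), intent(in) :: idx(:)
    integer, intent(in) :: val
    integer :: i
    do i = 1, size(idx)
      idx(i)%p = val
    end do
  end subroutine set_all

  ! A rank-1 array pointer targeting a strided section: rows 2 and 4 of
  ! column 2. Assumes arr has at least 4 rows and 2 columns.
  subroutine set_strided(arr, val)
    integer, intent(inout), target :: arr(:,:)
    integer, intent(in) :: val
    integer, pointer :: s(:)
    s => arr(2:4:2, 2)
    s = val
  end subroutine set_strided
end module ptr_demo
```

With idx(1)%p => arr(2,2) and idx(2)%p => arr(4,2) as in the question, call set_all(idx, 5) has the effect the question was after.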

Fortran memory management and subroutines/functions

At the moment I am working on a code for numerical simulations in Fortran 95. My platform is Windows and I use the MSVC environment with the Intel Fortran compiler.
This code, like many in this field, creates a system of equations to be solved. Numerically, this happens by storing a square matrix and a vector of known values. Now, in order to optimize memory, the matrices are stored in a convenient form, like the compressed sparse row (CSR) format or analogous, so the zero values are not stored.
Given this brief introduction, here are my doubts.
Since at compile time I do not know the dimensions of my arrays, I just declare them as:
REAL, DIMENSION(:), ALLOCATABLE :: myArray
and once I retrieve the dimension of such a vector, I call
ALLOCATE(myArray(N)) where N is the number of elements that I want to allocate
At this point the memory is still empty, since no values are stored yet, but a memory check is done in order to avoid a stack overflow. Is that right?
Now, as I fill it with values, the occupied space ramps up. The structure of a Fortran array, both for a 1D vector and for a multi-dimensional array, is to fill, in column order, a space equivalent to the number of values. That is to say, if we have a 2D array of dimension 1000x1000, it will be stored in 1M "contiguous boxes" ordered by column (first the first column is stored, then the second one and so on).
If this is true, so that the data layout is the same, is the access time to a particular value the only difference between a multidimensional array and a 1D vector?
Is then the command RESHAPE changing only the way the program "sees" the arrays?
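A small sketch of the column-major layout described above (the function name is hypothetical). For a(m, n) with default lower bounds, a(i, j) is element (j-1)*m + i of the underlying storage sequence, and RESHAPE and PACK both take elements in that same array element order. Note that assigning the result of RESHAPE to another array stores a separate copy of the data.

```fortran
module colmajor_demo
  implicit none
contains
  ! For an array declared a(m, n) with lower bounds 1, element a(i, j)
  ! sits at 1-based position (j-1)*m + i in column-major storage order.
  pure integer function linear_index(i, j, m)
    integer, intent(in) :: i, j, m
    linear_index = (j - 1) * m + i
  end function linear_index
end module colmajor_demo
```

For example, after a = reshape(v, [3, 4]) with v = [1, 2, ..., 12], a(i, j) equals v(linear_index(i, j, 3)) for every i and j.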
The array I need for my purposes is defined in a module that every subroutine/function shares. In particular, one subroutine allocates and fills it. Back in the main program there is no problem with that, since I display some statistics about it to the user. Let us say we allocated 400M REAL*4 values, with about 1.5GB of used memory.
However, once I get into another subroutine, the program stops with forrtl: severe(170): Program Exception - Stack Overflow. I ran out of memory. But how could that be if the matrix is already allocated and I did not allocate anything more? Notice that: the subroutine uses the same module, so the variables are already declared; my RAM still has about 1.3GB free; the program stops at the first line of the subroutine.
Is the subroutine (or function) duplicating the data? I thought Fortran would pass the address of my variables in that case, avoiding copies and working directly on the values.
Finally, like many of you, I enjoyed the C++ standard library functions, like vector::push_back and so on. In Fortran there are no such beautiful routines, but some very useful functions are still there. Masking an array, or using WHERE, COUNT or MERGE, can help you handle some operations effectively.
However, they are veeeeery slow when my matrix is bigger than 1M entries. In that case even a sequential search and substitute is faster than creating a mask or using WHERE. How could that be possible? Aren't they multithreaded?
Thank you in advance for your patience!! All suggestions are very welcome!!
Comment space is limited, so I am posting this as an answer. Obviously you are running out of stack space, not out of memory. The stack size of the main thread on Windows is fixed at link time (the default is 1 MiB), and any larger stack allocation could result in a stack overflow. This could happen for many reasons, but mainly:
the subroutine that you call uses big stack arrays (e.g. non-ALLOCATABLE arrays);
you pass a non-contiguous array subsection to the subroutine, e.g. myArray(1:10:2), and you don't have an explicit interface for that subroutine. In this case the compiler would make a temporary copy of the data being passed, most likely on the stack, which could exhaust the stack space and trigger the exception.
I would guess the first point is the one relevant to your case, since the exception occurs when you enter the subroutine (probably in the prologue, where stack space for all local variables is being reserved). You might instruct Intel Fortran to enable heap arrays in the project settings and see if it helps (not sure if the Windows version enables heap arrays by default or not).
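To illustrate the first point, here is a sketch contrasting an automatic local array with an ALLOCATABLE one (hypothetical names). Where the automatic array lives is up to the compiler; Intel Fortran typically puts it on the stack unless heap arrays are enabled, whereas ALLOCATABLE storage comes from the heap and is freed automatically on return.

```fortran
module locals_demo
  implicit none
contains
  ! Automatic array: its size comes from the argument n. Many compilers
  ! place it on the stack, so a large n can overflow a 1 MiB stack.
  real function sum_automatic(n)
    integer, intent(in) :: n
    real :: tmp(n)
    tmp = 1.0
    sum_automatic = sum(tmp)
  end function sum_automatic

  ! ALLOCATABLE local: heap storage, released automatically on return
  ! (Fortran 95 and later), so it does not consume stack space.
  real function sum_allocatable(n)
    integer, intent(in) :: n
    real, allocatable :: tmp(:)
    allocate(tmp(n))
    tmp = 1.0
    sum_allocatable = sum(tmp)
  end function sum_allocatable
end module locals_demo
```

Both give the same result for small n; the difference only shows up when n is large enough to exceed the stack.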
Without even a single line of your code shown, it would be quite hard to guess what is the source of the problem and to solve it.

What happens when I pass an array to a function/subroutine?

I had never thought about this before, but lately I've been worried about something. In Fortran90(95), say I create a really big array
Integer :: X(1000000)
and then I write a function that takes this array as an argument. When I pass the array to the function (as in myfunc(X)) what exactly happens during run time?
Does the entire array get passed by value and a new copy constructed inside the function?(costly)
Or does the compiler simply pass some sort of reference or pointer to the array?(cheap)
Do the dimension of the array or the declaration of the function make a difference?
In Fortran 90, as in most other programming languages, arrays are passed by reference (technically, this is often a reference to the first item of the array). In Fortran 90, non-array values are also usually passed by reference. So, you needn't worry about the size of the parameters you pass, since they won't be copied but will, instead, be passed simply by reference.
The one thing you don't want to do is something like:
INTEGER :: X(1:1000,1:1000,1:1000)
CALL myRoutine(X(2:999,2:999,2:999))
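where myRoutine cannot accept the bounds and strides of the slice for some reason (for example, its dummy argument is explicit-shape or assumed-size, or there is no explicit interface). The compiler cannot simply pass a reference to the slice, since it is not contiguous in memory, so it creates a temporary array, copies the values from X into it, and copies them back after the call. Needless to say, this is very slow. You shouldn't have that issue with a 1D slice of unit stride, such as X(2:999), as it is still contiguous in memory; a strided slice such as X(1:999:2) is not.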
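A sketch of the two kinds of dummy argument receiving the same strided section (hypothetical names, module procedures so the interfaces are explicit). With the explicit-shape dummy the compiler typically passes a contiguous temporary and copies it back; with the assumed-shape dummy the stride travels in the descriptor and the section can usually be used in place.

```fortran
module section_demo
  implicit none
contains
  ! Explicit-shape dummy: a non-contiguous actual such as x(1:10:2) is
  ! typically passed as a contiguous temporary (copy-in/copy-out).
  subroutine add_one_explicit(n, a)
    integer, intent(in) :: n
    real, intent(inout) :: a(n)
    a = a + 1.0
  end subroutine add_one_explicit

  ! Assumed-shape dummy: the stride is part of the descriptor, so the
  ! compiler can usually operate on x(1:10:2) without copying it.
  subroutine add_one_assumed(a)
    real, intent(inout) :: a(:)
    a = a + 1.0
  end subroutine add_one_assumed
end module section_demo
```

The results are identical either way; only the cost of the call differs.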

Resources