I had never thought about this before, but lately I've been worried about something. In Fortran90(95), say I create a really big array
Integer :: X(1000000)
and then I write a function that takes this array as an argument. When I pass the array to the function (as in myfunc(X)) what exactly happens during run time?
Does the entire array get passed by value, with a new copy constructed inside the function (costly)?
Or does the compiler simply pass some sort of reference or pointer to the array (cheap)?
Does the dimension of the array or the declaration of the function make a difference?
In Fortran 90, as in most other programming languages, arrays are in practice passed by reference (technically, this is often a reference to the first item of the array). In Fortran 90, non-array values are also usually passed by reference. So you needn't worry about the size of the arguments you pass, since they normally won't be copied but will instead be passed by reference.
The one thing you don't want to do is something like:
INTEGER :: X(1:1000,1:1000,1:1000)
CALL myRoutine(X(2:999,2:999,2:999))
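where myRoutine cannot operate on the bounds of the array for some reason, for example because its dummy argument is declared with explicit shape or assumed size and therefore expects one contiguous block of memory. The compiler cannot simply pass a reference to the slice, since the slice is not contiguous in memory. So it creates a temporary array, copies the values from X into it, and copies them back after the call. Needless to say, this is very slow. (If the dummy argument is assumed-shape, X(:,:,:), which requires an explicit interface, compilers typically pass a descriptor with strides instead and no copy is needed.) You shouldn't have the copy issue with a 1D array when you pass unit-stride slices such as X(2:999), as they are still contiguous in memory; a strided slice such as X(1:999:2) is not.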
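To make the cost concrete, here is a rough C sketch of the copy-in/copy-out that a compiler may generate for a non-contiguous section. It is only an illustration, not what any particular Fortran compiler emits: the names call_on_interior and scale_by_two are made up, the array is 2D for brevity, and the indexing is written column-major (first index fastest) to mimic Fortran's layout.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical callee: needs n ints in one contiguous block. */
static void scale_by_two(int *a, size_t n) {
    for (size_t i = 0; i < n; ++i) a[i] *= 2;
}

/* Apply scale_by_two to the interior of an nx-by-ny array stored
   column-major (element (i,j) lives at x[i + nx*j], zero-based).
   The interior is not contiguous, so it is gathered into a temporary,
   processed, and scattered back. Returns 0 on success, -1 on error. */
int call_on_interior(int *x, size_t nx, size_t ny) {
    if (nx < 3 || ny < 3) return -1;          /* no interior to work on */
    size_t mx = nx - 2, my = ny - 2;
    int *tmp = malloc(mx * my * sizeof *tmp);
    if (tmp == NULL) return -1;

    /* copy-in: gather the interior into the contiguous temporary */
    for (size_t j = 0; j < my; ++j)
        for (size_t i = 0; i < mx; ++i)
            tmp[i + mx * j] = x[(i + 1) + nx * (j + 1)];

    scale_by_two(tmp, mx * my);

    /* copy-out: scatter the results back into the original array */
    for (size_t j = 0; j < my; ++j)
        for (size_t i = 0; i < mx; ++i)
            x[(i + 1) + nx * (j + 1)] = tmp[i + mx * j];

    free(tmp);
    return 0;
}
```

The two extra passes over the data plus the allocation are exactly the overhead you avoid when the argument is contiguous.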
I'm going through a Fortran code, and one bit has me a little puzzled.
There is a subroutine, say
SUBROUTINE SSUB(X,...)
REAL*8 X(0:N1,1:N2,0:N3-1),...
...
RETURN
END
Which is called in another subroutine by:
CALL SSUB(W(0,1,0,1),...)
where W is a 'working array'. It appears that a specific value from W is passed to the X, however, X is dimensioned as an array. What's going on?
This is a not-uncommon idiom for getting the subroutine to work on a (rectangular in N dimensions) subset of the original array.
All parameters in Fortran (at least before Fortran 90) are passed by reference, so the actual array argument is resolved as a location in memory. Choose a location inside the space allocated for the whole array, and the subroutine manipulates only part of the array.
Biggest issue: you have to be aware of how the array is laid out in memory and how Fortran's array indexing scheme works. Fortran uses column-major array ordering, which is the opposite convention from C. Consider an array that is 5x5 in size (and index both directions from 0 to make the comparison with C easier). In both languages 0,0 is the first element in memory. In C the next element in memory is [0][1], but in Fortran it is (1,0). This affects which indexes you drop when choosing a subspace: if the original array is A(i,j,k,l), and the subroutine works on a three-dimensional subspace (as in your example), in C it works on Aprime[i=constant][j][k][l], but in Fortran it works on Aprime(i,j,k,l=constant).
The other risk is wrap around. The dimensions of the (sub)array in the subroutine have to match those in the calling routine, or strange, strange things will happen (think about it). So if A is declared of size (0:4,0:5,0:6,0:7), and we call with element A(0,1,0,1), the receiving routine is free to start the index of each dimension wherever it likes, but must make the sizes (5,6,7), or else; but that means that the last element in the j direction actually wraps around! The thing to do about this is not use the last element. Making sure that that happens is the programmer's job, and is a pain in the butt. Take care. Lots of care.
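If it helps to see the two layouts as arithmetic, here is a small C sketch. The function names are made up for illustration, and indices are zero-based, as in the 5x5 comparison above.

```c
#include <stddef.h>

/* Offset of element (i, j) in an n0-by-n1 array, zero-based indices. */

/* C convention (row-major): the LAST index varies fastest. */
size_t offset_row_major(size_t i, size_t j, size_t n0, size_t n1) {
    (void)n0;                 /* the leading extent does not affect the offset */
    return i * n1 + j;
}

/* Fortran convention (column-major): the FIRST index varies fastest. */
size_t offset_col_major(size_t i, size_t j, size_t n0, size_t n1) {
    (void)n1;                 /* the trailing extent does not affect the offset */
    return i + j * n0;
}

/* Column-major offset for any rank: idx[d] is the zero-based index and
   extent[d] the number of elements along dimension d. */
size_t offset_col_major_nd(const size_t *idx, const size_t *extent,
                           size_t rank) {
    size_t off = 0;
    for (size_t d = rank; d-- > 0; )
        off = off * extent[d] + idx[d];
    return off;
}
```

For A declared (0:4,0:5,0:6,0:7) the extents are 5, 6, 7 and 8, so element A(0,1,0,1) sits 0 + 5*(1 + 6*(0 + 7*1)) = 215 elements from the start. That is where the subroutine's array begins, and because it starts one step into the j direction, a full 5x6x7 block from there runs past the end of the l=1 slab.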
In Fortran, variables are passed by address.
So W(0,1,0,1) has both a value and an address, and it is the address that gets passed. Basically, you pass the subarray starting at W(0,1,0,1).
This is called "sequence association". In this case, what appears to be a scalar, an element of an array (the actual argument in the caller), is associated with an array, the dummy argument in the subroutine (implicitly with its first element). Thereafter the elements of the arrays are associated by storage order, known as "sequence". This was done in Fortran 77 and earlier for various reasons, here apparently for a workspace array -- perhaps the programmer was doing their own memory management. This is retained in Fortran >=90 for backwards compatibility, but IMO doesn't belong in new code.
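For readers more at home in C, the closest analogue is passing the address of an element and letting the callee treat it as the start of an array. A minimal sketch (zero_block and clear_part are made-up names, not anything from the code in the question):

```c
#include <stddef.h>

/* Callee: sees nothing but a run of n doubles starting at x. */
void zero_block(double *x, size_t n) {
    for (size_t i = 0; i < n; ++i) x[i] = 0.0;
}

/* Caller: hands over the address of one element in the middle of a
   larger work array; the callee then walks on from there in storage
   order. Keeping start + count within w is the caller's job. */
void clear_part(double *w, size_t start, size_t count) {
    zero_block(&w[start], count);
}
```

As with the Fortran idiom, nothing stops the callee from running past the end of the work array if the caller gets the sizes wrong.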
I want to be able to reset elements in an array which are not in a consecutive block of memory. I thought to use an array of pointers for this rather than a pointer array, as my understanding of pointer arrays is that they must point to a coherent block of memory (e.g. pointer(1:10,1:10) => target(1:100)).
My simple test program is as follows:
program test
implicit none
integer, target :: arr(4,4)
type ptr
integer, pointer :: p
end type ptr
type(ptr), dimension(2) :: idx
arr=0
idx(1)%p=>arr(2,2)
idx(2)%p=>arr(4,2)
idx(1)%p=5 ! this is okay
idx(2)%p=5 ! this is okay
idx(1:2)%p=5 ! this gives an error
print *,arr
end program test
The first two statements idx(n)%p=5 are okay, but I want to be able to set a chunk of the array in one statement using the spanning method idx(1:n)%p=5 but when I do this I get the following compile error:
Error: Component to the right of a part reference with nonzero rank must not have the POINTER attribute at (1)
Can I set the value of a chunk of array entries using pointer somehow? Perhaps it is in fact possible with a pointer array rather than an array of pointers...
I think this is related to Fortran: using a vector to index a multidimensional array
but I can't see how to use that answer here.
Probably an extended comment rather than an answer, but I need a little formatting ...
It's best not to think of Fortran as supporting arrays of pointers. You've already grasped that (I think) and built the usual workaround: an array of derived type, each instance of the type having a component which is a pointer.
What's not entirely clear is why you don't use multiple vector subscripts such as
arr([2,4],[2]) = 5
(OK, the second subscript is something of a degenerate vector but the statement is intended to have the same effect as your attempt to use pointers.) If you would rather, you could use array subscript triplets instead
arr(2:4:2,2) = 5
Perhaps you've simplified your needs too far to express the necessity of using pointers in which case my feeble suggestions will not meet your undeclared needs.
In the following code, I declare an array mass with 20 elements. When passed to the subroutine foo, foo is told that mass has only 10 elements. However, I can still access the 20th element. My questions are:
Why can I pass an array to a subroutine and tell the subroutine the wrong size for the array?
Why can I still access the 20th element even though the subroutine thinks it only has 10 elements?
Would any changes I make to the 20th element in the array while in foo remain with the array just the same as if foo knew the proper size of the array?
Here is the code:
program test
implicit none
integer num2,i
real*8 mass(20)
num2=10
do i = 1,size(mass)
mass(i) = 1.d0
end do
call foo(num2,mass)
end
subroutine foo(num2,mass)
integer num2
real*8 mass(num2)
write(*,"(A20,E15.9)") "first one:",mass(1)
write(*,"(A20,E15.9)") "tenth one:",mass(10)
write(*,"(A20,E15.9)") "twentieth one:",mass(20)
continue
end
Note: This situation of a subroutine being told the wrong size for an array is one I am encountering in someone else's code I am trying to modify for my own usage.
There are a couple of misconceptions underlying your question which I'll address in this answer. Some practical implications are given in the answer by M. S. B.
You say that you declare an array mass with twenty elements, but that foo is told that it has only ten. This is not correct.
What you have are actually two distinct entities: an array mass in the main program, with twenty elements, and an array (also called mass) of size ten in the subroutine foo.
The "passing" is establishing an association (so-called argument association) between those two entities. The array called mass in the main program is the actual argument in the call to the subroutine, and the array called mass in the subroutine is the dummy argument.
The dummy argument is an array of explicit shape, extent num2 (which is also argument associated with the num2 of the main program). The first num2 elements of the dummy argument are associated with the first num2 elements of the actual argument. [This leads to the requirement that there are at least num2 elements in the array in the main program.]
So, the answer to your first question
Why can I pass an array to a subroutine and tell the subroutine the wrong size for the array?
is just this: you aren't telling it the wrong size, you are merely saying that the array in the subroutine corresponds to the first num2 elements of the argument.
Coming to
Why can I still access the 20th element even though the subroutine thinks it only has 10 elements?
This is a programming error. You are not allowed to use a subscript value larger than the extent of the array. What happens when you try this is up to the compiler. As the other answer says, it's quite possible that this just accesses (because of the way the association is implemented) the location in memory corresponding to the twentieth element of the actual argument. But equally the compiler could complain (especially with bounds-checking compilation options chosen), or, if the passing is done with temporary copying, the thing could crash. [This latter is unlikely.]
Finally
Would any changes I make to the 20th element in the array while in foo remain with the array just the same as if foo knew the proper size of the array?
This is again implementation specific. There is no correct Fortran answer, as your program isn't conforming. As with the previous point, if that's an area of memory corresponding to the appropriate element of the actual argument and there's no bounds checking going on the changes could persist. If the compiler chose to do a copy there could either be a crash or the changes outside the bounds could be ignored on return. [Again, these last two are unlikely as it would require the compiler to be "clever".]
Why can I pass an array to a subroutine and tell the subroutine the wrong size for the array?
Because Fortran allows you to use a portion of an array.
Why can I still access the 20th element even though the subroutine thinks it only has 10 elements?
Because 1) the array elements are consecutive in memory, so the indexing reaches the 20th element, and 2) Fortran-produced programs normally don't check whether the program is performing invalid array subscripting. Most compilers have options to insert such checks, e.g., -fcheck=bounds or -fcheck=all for gfortran.
Would any changes I make to the 20th element in the array while in foo remain with the array just the same as if foo knew the proper size of the array?
Yes. If you did this and the original array didn't have 20 elements, you would alter an unrelated memory location, with bad results.
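To picture what a bounds check adds, here is a small C sketch of an accessor that knows the extent the callee was told and refuses anything outside it. This is only an illustration of the idea, not how gfortran implements -fcheck=bounds; checked_view and view_get are made-up names.

```c
#include <stddef.h>

/* Hypothetical view: a base pointer plus the extent the callee was told. */
typedef struct {
    double *data;
    size_t  extent;
} checked_view;

/* Read element i (1-based, as in Fortran). Returns 0 on success and
   -1 if i is outside 1..extent; *out is left untouched on failure. */
int view_get(const checked_view *v, size_t i, double *out) {
    if (i < 1 || i > v->extent) return -1;
    *out = v->data[i - 1];
    return 0;
}
```

With a view of extent 10 laid over a 20-element array, reading elements 1 and 10 succeeds and reading element 20 is rejected, even though the memory behind it exists. Without the check, the read would simply go through to that memory, which is what the program in the question relies on.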
In C, the idea of an array is very straightforward: simply a pointer to the first element in a row of elements in memory, which can be accessed via pointer arithmetic or the standard array[i] syntax.
However, in languages like Google Go, "arrays are values", not pointers. What does that mean? How is it implemented?
In most cases they're the same as C arrays, but the compiler/interpreter hides the pointer from you. This is mainly because then the array can be relocated in memory in a totally transparent way, and so such arrays appear to have an ability to be resized.
It is also safer: since you cannot manipulate the pointer yourself, you cannot cause a leak.
Since then (2010), the article Slices: usage and internals is a bit more precise:
The in-memory representation of [4]int is just four integer values laid out sequentially:
Go's arrays are values.
An array variable denotes the entire array; it is not a pointer to the first array element (as would be the case in C).
This means that when you assign or pass around an array value you will make a copy of its contents. (To avoid the copy you could pass a pointer to the array, but then that's a pointer to an array, not an array.)
One way to think about arrays is as a sort of struct but with indexed rather than named fields: a fixed-size composite value.
Arrays in Go are also values in that they are passed as values to functions (in the same way as ints, strings, floats, etc.),
which requires copying the whole array for each function call.
This can be very slow for a large array, which is why in most cases it's better to use slices.
In C, arrays are passed to functions as pointers. Structures can be passed to functions either by value or by address (pointer). Is there any specific reason why we cannot pass an array by value but we can pass a structure by value?
In C, everything is passed by value. There is another rule that says that in most contexts, the name of an array is equivalent to a pointer to its first element. Passing an array to a function is such a context.
So, the special case is not that arrays are passed by reference, the special case is the rule about arrays decaying to pointers. This gives one the impression that an array is passed by reference (which it effectively is, but now you know why!)
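A short sketch of the difference, using a made-up wrapper type int4 and made-up function names. The array parameter decays to a pointer, so writes reach the caller's array, while an array wrapped in a struct is copied along with the rest of the struct.

```c
#include <stddef.h>

/* Hypothetical wrapper: an array placed inside a struct. */
typedef struct {
    int items[4];
} int4;

/* Despite the [4], the parameter has type int *: the pointer is passed
   by value, and the write goes to the caller's array. */
void set_first_array(int a[4], int v) {
    a[0] = v;
}

/* The whole struct, array included, is copied into s. The write only
   changes the local copy; the caller's struct is unaffected. */
void set_first_struct(int4 s, int v) {
    s.items[0] = v;
    (void)s;
}
```

Calling set_first_array(a, 7) on an int a[4] changes a[0] in the caller, while set_first_struct(s, 7) leaves the caller's s.items[0] as it was. Wrapping an array in a struct is the usual way to get by-value array passing in C, at the cost of copying it on every call.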
The post in my link above explains in more detail about the type of an array in different contexts.