Read data from file into array

Read data from file into array - arrays

I am having trouble reading data from a file into an array, I am new to programing
and fortran so any information would be a blessing.
this is a sample of the txt file the program is reading from
!60
!USS Challenger 1.51 12712.2 1.040986E+11
!USS Cortez 9.14 97822.5 2.009091E+09
!USS Dauntless 5.76 27984.0 2.599167E+09
!USS Enterprise 2.48 9136.3 1.478474E+10
!USS Excalibur 3.83 32253.0 1.286400E+10
all together there is 60 ships. the variables are separated by spaces and
are as follows
warp factor, distance in light years, actual velocity.
this is the current code I have used, it has given the error The module or main
program array 'm' at (1) must have constant shape
PROGRAM engine_performance
IMPLICIT NONE
INTEGER :: i ! loop index
REAL,PARAMETER :: c = 2.997925*10**8 ! light years (m/s)
REAL,PARAMETER :: n = 60 ! number of flights
REAL :: warpFactor ! warp factor
REAL :: distanceLy ! distance in light years
REAL :: actualTT ! actual time travel
REAL :: velocity ! velocity in m/s
REAL :: TheoTimeT ! theoretical time travel
REAL :: average ! average of engine performance
REAL :: median ! median of engine performance
INTEGER, DIMENSION (3,n), ALLOCATABLE :: M
OPEN(UNIT=10, FILE="TrekTimes.txt")
DO i = 1,n
READ(*,100) warpFactor, distanceLY, actualTT
100 FORMAT(T19,F4.2,1X,F7.1,&
1X,ES 12.6)
WRITE(*,*) M
END DO
CLOSE (10)
END PROGRAM engine_performance

The first time I read your code I mistakenly read M as an array in which your program would store the numbers from the file of ships. On closer inspection I realise (a) that M is an array of integers and (b) the read statement later in the code reads each line of the input file but doesn't store warpFactor, distanceLY, actualTT anywhere.
Making a wild guess that M ought to be the representation of the numeric factors associated with each ship, here's how to fix your code. If the wild guess is wide of the mark, clarify what you are trying to do and any remaining problems with your code.
First, fix the declaration of M
REAL, DIMENSION (:,:), ALLOCATABLE :: M
The term (3,n) can't be used in the declaration of an allocatable array. Because n is previously declared to be a real constant it's not valid as the specification of an extent of an array. If it could be the declaration of the array would fix its dimensions at (3,60) which means that the array can't be allocatable.
So, also change the declaration of n to
INTEGER :: n
It's no longer a parameter, you're going to read its value from the first line of the file, which is why it's in the file in the first place.
Second, I think you have rows and columns switched in your array. The file has 60 rows (of ships), each of which has 3 columns of numeric data. So when it comes time to allocate M use the statement
ALLOCATE(M(n,3))
Of course, prior to that you'll have had to read n from the first line in the file, but that shouldn't be a serious problem.
Third, read the values into the array. Change
READ(*,100) warpFactor, distanceLY, actualTT
to
READ(*,100) M(i,:)
which will read the whole of row i.
Finally, those leading ! on each line of the file -- get rid of them (use an editor or re-create the file without them). They prevent the easy reading of values. With the ! present reading n requires a format specification, without it it's just read(10,*).
Oh, and absolutely finally: you should probably, after you've got this program working, direct your attention to the topic of defined types in your favourite Fortran tutorial, and learn how to declare type(starship) for added expressiveness and ease of programming.

Related

non-contiguous data and temporary array creation

A similar question was answered in Fortran runtime warning: temporary array. However, the solutions do not quite help me in my case.
Inside a subroutine, I have a subroutine call as:
subroutine initialize_prim(prim)
real(kind=wp), dimension(2, -4:204), intent(out) :: prim
call double_gaussian(prim(1, :))
end subroutine initialize_prim
subroutine double_gaussian(y)
real(kind=wp), dimension(-4:204), intent(out) :: y
integer :: i
do i = -4, 204
y(i) = 0.5 * ( &
exp(-((r(i) - r0))**2) + exp(-((r(i) + r0)/std_dev)**2))
end do
end subroutine double_gaussian
This gives an error message saying that fortran creates a temporary array for "y" in "double_gaussian". Having read a bit about continguous arrays, I understand why this error appears.
Now, looking at my whole program, it would be very tedious to invert the order of the arrays for "prim", so that solution is not really possible.
For creating assumed-shapes in "double_gaussian", I tried doing,
real(kind=wp), dimension(:), intent(out) :: y
integer :: i
do i = -4, 204
y(i) = 0.5 * ( &
exp(-((r(i) - r0))**2) + exp(-((r(i) + r0)/std_dev)**2))
end do
end subroutine double_gaussian
This, however, causes fortran to crash with the error message
"Index '-4' of dimension 1 of array 'y' below lower bound of 1".
It seems that for the assumed-shape format, the indexing is nonetheless assumed to start with 1, whereas it starts at -4 as in my case.
Is there a way to resolve this issue?

I think that you have perhaps misinterpreted a compiler warning as an error. Usually compilers issue a warning when they create temporary arrays - it's a useful aid to high-performance programming. But I'm not sure a compiler ever regards that as an error. And yes, I understand why you might not want to re-order your array just to avoid that
As for the crash - you have discovered that Fortran routines don't automagically know about the lower bounds of arrays which you have carefully set to be other than 1 (nor their upper bounds either). If it is necessary you have to pass the bounds (usually only the lower bound, the routine can figure out the upper bound itself) in the argument list.
However, it rarely is necessary, and it doesn't seem to be in your code - the loop to set each value of the y array could (if I understand correctly) be replaced by
y = 0.5 * (exp(-((r - r0))**2) + exp(-((r + r0)/std_dev)**2))
PS I think that this part of your question, about routines not respecting other-than-1 array lower bounds, is almost certainly a duplicate of several others asked hereabouts but which I couldn't immediately find.

Trick to read data from hard drive faster between sucessive compilations

I am developing code with a compiled language (Fortran 95) that does certain calculations on a huge galaxy catalog. Each time I implement some change, I compile and run the code, and it takes about 3 minutes just reading the ASCII file with the galaxy data from disk. This is a waste of time.
Had I started this project in IDL or Matlab, then it would be different, because the variables containing the array data would be kept in memory between different compilations.
However, I think something could be done to speed up that unnerving reading from disk, like having the files in a fake RAM partition or something.

Instead of going into details on RAM disks I propose you switch from ASCII databases to Binary ones. here is a very simplistic example... An array of random numbers, stored as ASCII (ASCII.txt) and as binary date (binary.bin):
program writeArr
use,intrinsic :: ISO_Fortran_env, only: REAL64
implicit none
real(REAL64),allocatable :: tmp(:,:)
integer :: uFile, i
allocate( tmp(10000,10000) )
! Formatted read
open(unit=uFile, file='ASCII.txt',form='formatted', &
status='replace',action='write')
do i=1,size(tmp,1)
write(uFile,*) tmp(:,i)
enddo !i
close(uFile)
! Unformatted read
open(unit=uFile, file='binary.bin',form='unformatted', &
status='replace',action='write')
write(uFile) tmp
close(uFile)
end program
Here is the result in terms of sizes:
:> ls -lah ASCII.txt binary.bin
-rw-rw-r--. 1 elias elias 2.5G Feb 20 20:59 ASCII.txt
-rw-rw-r--. 1 elias elias 763M Feb 20 20:59 binary.bin
So, you save a factor of ~3.35 in terms of storage.
Now comes the fun part: reading it back in...
program readArr
use,intrinsic :: ISO_Fortran_env, only: REAL64
implicit none
real(REAL64),allocatable :: tmp(:,:)
integer :: uFile, i
integer :: count_rate, iTime1, iTime2
allocate( tmp(10000,10000) )
! Get the count rate
call system_clock(count_rate=count_rate)
! Formatted write
open(unit=uFile, file='ASCII.txt',form='formatted', &
status='old',action='read')
call system_clock(iTime1)
do i=1,size(tmp,1)
read(uFile,*) tmp(:,i)
enddo !i
call system_clock(iTime2)
close(uFile)
print *,'ASCII read ',real(iTime2-iTime1,REAL64)/real(count_rate,REAL64)
! Unformatted write
open(unit=uFile, file='binary.bin',form='unformatted', &
status='old',action='read')
call system_clock(iTime1)
read(uFile) tmp
call system_clock(iTime2)
close(uFile)
print *,'Binary read ',real(iTime2-iTime1,REAL64)/real(count_rate,REAL64)
end program
The result is
ASCII read 37.250999999999998
Binary read 1.5460000000000000
So, a factor of >24!
So instead of thinking of anything else, please switch to a binary file format first.

Parsing a complicated function using MKL/VML

I am trying to calculate a fairly complicated function, say func() - involving several additions, substractions, multiplications, divisions and trigonometric functions, of several two-dimensional arrays in fortran. The calculation is massively parrallel, in that each func() is independent over its row and column location. Each of the matrices is many gigabytes in size, and there are about a dozen of them as arguments.
I would like to make use of Intel MKL functions (invoking --mkl-parallel), in particular VML functions to add, subtract, divide etc. My question is: how can I render a complicated functional expression such as,
e.g.: func(x,y,z) = x*y+cos(z*x-x) where x,y,z are 2d arrays of several GB
in terms of VML functions but using more familiar binary operators. You see my problem requires, in principle, converting all the binary operators, such as "+" and "*" into binary functions taking arguments as ?vadd(x,y). Of course this would be very cumbersome and unsightly for large expressions. Is there a way to overload the binary arithmetic operators such as "+","-" to preferentially use MKL/VML versions in fortran. An example would be nice! Thanks!

I know this answer is a little bit off-topic.
Since all the operations are element-wise and your operations are simple, the func() could be a memory bandwidth bounded task. In this case, using VML may not be a good choice to maximum the performance.
Suppose each of your arrays is of 10GB in size, uisng VML as follows will need at least 9 x 10GB reading and 5 x 10GB writing.
func(...) {
tmp1=x*z
tmp1=tmp1-x;
tmp1=cos(tmp1);
tmp2=x*y;
return tmp1+tmp2;
}
where all the operations all overloaded for 2d array.
Instead you may find the following approach has much less memory access (3 x 10GB reading and 1 x 10GB writing) thus could be quicker (pseudo code).
$omp parallel for
for i in 1 to m
for j in 1 to n
result(i,j)= x(i,j)*y(i,j)+cos(z(i,j)*x(i,j)-x(i,j));
end
end

I developped a small example to show the addition of two vectors. As I don't have MKL installed anymore, I used the SAXPY command from BLAS. The principle should be the same.
At first you define a module with the appropriate definitions. In my case this would be an assignment to save a real array in my datatype (this is only a convenience function as you could also directly access the array variable) and the definition of the addition. Both are a new overload to the + operator and = assignment.
In the program, I define three fields. Two of them get assigned with random numbers and then added to get the third field. Then the first two fields get stored in my special variables, and the result of this addition is stored in a third variable of this type.
Finally, the result is compared by accessing the array directly. Please note, that the assignment from custom datatype to the same datatype is already defined (e.g. ffield3 = ffield1 is already defined.)
My module:
MODULE fasttype
IMPLICIT NONE
PRIVATE
PUBLIC :: OPERATOR(+), ASSIGNMENT(=)
TYPE,PUBLIC :: fastreal
REAL,DIMENSION(:),ALLOCATABLE :: array
END TYPE
INTERFACE OPERATOR(+)
MODULE PROCEDURE fast_add
END INTERFACE
INTERFACE ASSIGNMENT(=)
MODULE PROCEDURE fast_assign
END INTERFACE
CONTAINS
FUNCTION fast_add(fr1, fr2) RESULT(fr3)
TYPE(FASTREAL), INTENT(IN) :: fr1, fr2
TYPE(FASTREAL) :: fr3
INTEGER :: L
L = SIZE(fr2%array)
fr3 = fr2
CALL SAXPY(L, 1., fr1%array, 1, fr3%array, 1)
END FUNCTION
SUBROUTINE fast_assign(fr1, r2)
TYPE(FASTREAL), INTENT(OUT) :: fr1
REAL, DIMENSION(:), INTENT(IN) :: r2
INTEGER :: L
IF (.NOT. ALLOCATED(fr1%array)) THEN
L = SIZE(r2)
ALLOCATE(fr1%array(L))
END IF
fr1%array = r2
END SUBROUTINE
END MODULE
My program:
PROGRAM main
USE fasttype
IMPLICIT NONE
REAL, DIMENSION(:), ALLOCATABLE :: field1, field2, field3
TYPE(fastreal) :: ffield1, ffield2, ffield3
ALLOCATE(field1(10),field2(10),field3(10))
CALL RANDOM_NUMBER(field1)
CALL RANDOM_NUMBER(field2)
field3 = field1 + field2
ffield1 = field1
ffield2 = field2
ffield3 = ffield1 + ffield2
WRITE(*,*) field3 == ffield3%array
END PROGRAM

Trying to read a 2 column input file into two separate arrays and having a lot of trouble

This is likely a trivial question, but for some reason I'm having a lot of trouble solving this problem. I'm reading from an input file which has to sets of numbers in two columns. The first column is a list of integers representing time(e.g. 0530). The second column is a list of real data 5 digits long with 3 digits after the decimal place(e.g. 19.213). The two columns have 3 spaces in between. I would like to read this into my program into separate arrays. I've stat the dimensions of the arrays at the maximum possible length(1440), as is shown below. Id like to use this arrays in a function eventually, but i cant even get the input to work properly. Thanks for the help.
PROGRAM readtest1
IMPLICIT NONE
INTEGER, DIMENSION(1440) :: t
REAL, DIMENSION(1440) :: tuvr
OPEN(1, FILE='AP2412.tv', STATUS='old', ACTION='read')
OPEN(2, FILE='timetuvr.txt', STATUS='replace', ACTION='write')
READ(1,100) t, tuvr
100 FORMAT(I5, F8.3)
WRITE(2,100) t, tuvr
END PROGRAM readtest1
Oh and when I compile and run the program i get the error 'FORTRAN runtime error: Expected REAL for item 2 in formatted transfer, got INTEGER)
I believe that fortran is reading directly down the column, which is causing this problem, but I'm unsure of how to fix it. Do i need a double loop?

read (...) t, tuvr reads the entire arrays at once, in a block. You want to read them one element at a time since that is how they are file. Like this:
do i=1, 1440
read (1, '(i5,f8.3)' ) t(i), tuvr(i)
end do
Depending on whether or not the numbers in the file are perfectly in columns, you might find it necessary to use list-directed IO: read (1, *) t(i), tuvr(i). This method is very flexible and easy to use.
If the file might have less than 1440 lines, try something like this, which detects the end of file and counts how many lines were read:
program test
use, intrinsic :: iso_fortran_env
implicit none
integer, parameter :: ArrayLen = 1440
INTEGER, DIMENSION(ArrayLen) :: t
REAL, DIMENSION(ArrayLen) :: tuvr
integer :: i, ReadCode, num
num = 0
ReadLoop: do i=1, ArrayLen
read (1, '(i5,f8.3)', iostat=ReadCode ) t(i), tuvr(i)
if ( ReadCode /= 0 ) then
if ( ReadCode == iostat_end ) then
exit ReadLoop
else
write ( *, '( / "Error on read: ", I0 )' ) ReadCode
stop
end if
end if
num = num + 1
end do ReadLoop
end program test

In Fortran 90, what is a good way to write an array to a text file, row-wise?

I am new to Fortran, and I would like to be able to write a two-dimensional array to a text file, in a row-wise manner (spaces between columns, and each row on its own line). I have tried the following, and it seems to work in the following simple example:
PROGRAM test3
IMPLICIT NONE
INTEGER :: i, j, k, numrows, numcols
INTEGER, DIMENSION(:,:), ALLOCATABLE :: a
numrows=5001
numcols=762
ALLOCATE(a(numrows,numcols))
k=1
DO i=1,SIZE(a,1)
DO j=1,SIZE(a,2)
a(i,j)=k
k=k+1
END DO
END DO
OPEN(UNIT=12, FILE="aoutput.txt", ACTION="write", STATUS="replace")
DO i=1,numrows
WRITE(12,*) (a(i,j), j=1,numcols)
END DO
END PROGRAM test3
As I said, this seems to work fine in this simple example: the resulting text file, aoutput.txt, contains the numbers 1-762 on line 1, numbers 763-1524 on line 2, and so on.
But, when I use the above ideas (i.e., the last fifth-to-last, fourth-to-last, third-to-last, and second-to-last lines of code above) in a more complicated program, I run into trouble; each row is delimited (by a new line) only intermittently, it seems. (I have not posted, and probably will not post, here my entire complicated program/script--because it is rather long.) The lack of consistent row delimiters in my complicated program/script probably suggests another bug in my code, not with the four-line write-to-file routine above, since the above simple example appears to work okay. Still, I am wondering, can you please help me think if there is a better row-wise write-to-text file routine that I should be using?
Thank you very much for your time. I really appreciate it.

There's a few issues here.
The fundamental one is that you shouldn't use text as a data format for sizable chunks of data. It's big and it's slow. Text output is good for something you're going to read yourself; you aren't going to sit down with a printout of 3.81 million integers and flip through them. As the code below demonstrates, the correct text output is about 10x slower, and 50% bigger, than the binary output. If you move to floating point values, there are precision loss issues with using ascii strings as a data interchange format. etc.
If your aim is to interchange data with matlab, it's fairly easy to write the data into a format matlab can read; you can use the matOpen/matPutVariable API from matlab, or just write it out as an HDF5 array that matlab can read. Or you can just write out the array in raw Fortran binary as below and have matlab read it.
If you must use ascii to write out huge arrays (which, as mentioned, is a bad and slow idea) then you're running into problems with default record lengths in list-drected IO. Best is to generate at runtime a format string which correctly describes your output, and safest on top of this for such large (~5000 character wide!) lines is to set the record length explicitly to something larger than what you'll be printing out so that the fortran IO library doesn't helpfully break up the lines for you.
In the code below,
WRITE(rowfmt,'(A,I4,A)') '(',numcols,'(1X,I6))'
generates the string rowfmt which in this case would be (762(1X,I6)) which is the format you'll use for printing out, and the RECL option to OPEN sets the record length to be something bigger than 7*numcols + 1.
PROGRAM test3
IMPLICIT NONE
INTEGER :: i, j, k, numrows, numcols
INTEGER, DIMENSION(:,:), ALLOCATABLE :: a
CHARACTER(LEN=30) :: rowfmt
INTEGER :: txtclock, binclock
REAL :: txttime, bintime
numrows=5001
numcols=762
ALLOCATE(a(numrows,numcols))
k=1
DO i=1,SIZE(a,1)
DO j=1,SIZE(a,2)
a(i,j)=k
k=k+1
END DO
END DO
CALL tick(txtclock)
WRITE(rowfmt,'(A,I4,A)') '(',numcols,'(1X,I6))'
OPEN(UNIT=12, FILE="aoutput.txt", ACTION="write", STATUS="replace", &
RECL=(7*numcols+10))
DO i=1,numrows
WRITE(12,FMT=rowfmt) (a(i,j), j=1,numcols)
END DO
CLOSE(UNIT=12)
txttime = tock(txtclock)
CALL tick(binclock)
OPEN(UNIT=13, FILE="boutput.dat", ACTION="write", STATUS="replace", &
FORM="unformatted")
WRITE(13) a
CLOSE(UNIT=13)
bintime = tock(binclock)
PRINT *, 'ASCII time = ', txttime
PRINT *, 'Binary time = ', bintime
CONTAINS
SUBROUTINE tick(t)
INTEGER, INTENT(OUT) :: t
CALL system_clock(t)
END SUBROUTINE tick
! returns time in seconds from now to time described by t
REAL FUNCTION tock(t)
INTEGER, INTENT(IN) :: t
INTEGER :: now, clock_rate
call system_clock(now,clock_rate)
tock = real(now - t)/real(clock_rate)
END FUNCTION tock
END PROGRAM test3

This may be a very roundabout and time-consuming way of doing it, but anyway... You could simply print each array element separately, using advance='no' (to suppress insertion of a newline character after what was being printed) in your write statement. Once you're done with a line you use a 'normal' write statement to get the newline character, and start again on the next line. Here's a small example:
program testing
implicit none
integer :: i, j, k
k = 1
do i=1,4
do j=1,10
write(*, '(I2,X)', advance='no') k
k = k + 1
end do
write(*, *) '' ! this gives you the line break
end do
end program testing
When you run this program the output is as follows:
1 2 3 4 5 6 7 8 9 10
11 12 13 14 15 16 17 18 19 20
21 22 23 24 25 26 27 28 29 30
31 32 33 34 35 36 37 38 39 40

Using an "*" is list-directed IO -- Fortran will make the decisions for you. Some behaviors aren't specified. You could gain more control using a format statement. If you wanted to positively identify row boundaries you write a marker symbol after each row. Something like:
DO i=1,numrows
WRITE(12,*) a(i,:)
write (12, '("X")' )
END DO
Addendum several hours later:
Perhaps with large values of numcols the lines are too long for some programs that are you using to examine the file? For the output statement, try:
WRITE(12, '( 10(2X, I11) )' ) a(i,:)
which will break each row of the matrix, if it has more than 10 columns, into multiple, shorter lines in the file.