Fastest way to read a large text file

I am looking to pull certain groups of lines from large (~870,000,000 line/~4GB) text files. As a small example, in a 50-line file I might want lines 3-6, 18-27, and 39-45. After starting from SO and writing some programs to benchmark with my data, it seems that Fortran 90 gives me the best results (compared with Python, shell commands (bash), etc.).
My current scheme is simply to open the file and use a series of loops to move the read pointer to where I need it, writing the results to an output file.
With the above small example this would look like:
open(unit=1, file=fileName)
open(unit=2, file=outFile)
do i=1,2
read(1,*)
end do
do i=3,6
read(1,*) line
write(2,*) line
end do
do i=7,17
read(1,*)
end do
do i=18,27
read(1,*) line
write(2,*) line
end do
do i=28,38
read(1,*)
end do
do i=39,45
read(1,*) line
write(2,*) line
end do
*It should be noted I am assuming buffered i/o when compiling, although this seems to only minimally speed things up.
I am curious if this is the most efficient way to accomplish my task. If the above is in fact the best way to do this with fortran90, is there another language more suited to this task?
*Update: Made sure I was using buffered i/o, manually finding the most efficient blocksize/blockcount. That increased speed by about 7%. I should note that the files I am working with do not have a fixed record length.

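You can also try the sed utility:
sed '3,6!d' yourfile.txt
sed '18,27!d' yourfile.txt
Unix utilities tend to be heavily optimized and handle simple tasks like this very quickly. Note that each command above scans the whole file; a single invocation such as sed -n '3,6p;18,27p;39,45p;45q' yourfile.txt prints all three ranges in one pass and quits after line 45.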

One should be able to do this in almost any language, so, sticking with the theme, here is something that should be close to working (it is untested, so watch for typos).
(If I had a Fortran compiler on an iPad, that would make it more useful.)
PROGRAM AA
IMPLICIT NONE
INTEGER :: In_Unit, Out_Unit, I
LOGICAL, DIMENSION(1000) :: doIt
CHARACTER(LEN=20) :: FileName = 'in.txt'
CHARACTER(LEN=20) :: Outfile = 'out.txt'
CHARACTER(LEN=80) :: line
open(NEWunit=In_Unit, file=FileName, status='old', action='read')
open(NEWunit=Out_Unit, file=Outfile, status='replace', action='write')
DoIt = .FALSE.
DoIt(3:6) = .TRUE.
DoIt(18:27) = .TRUE.
DoIt(39:45) = .TRUE.
do i=1,1000
read(In_Unit,'(A)') line ! '(A)' reads the whole line; list-directed (*) input stops at the first blank
IF(doIt(I)) write(Out_Unit,'(A)') trim(line)
end do
CLOSE(In_Unit)
CLOSE(Out_Unit)
END PROGRAM AA
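Since the question also asks about other languages, here is a minimal Python sketch of the same single-pass idea, for comparison only; it is not claimed to be faster than the Fortran version. The function name extract_ranges and its arguments are hypothetical, and the sketch assumes the ranges are 1-based, inclusive, sorted, and non-overlapping. It skips lines without storing them and stops reading after the last requested range.

```python
from itertools import islice


def extract_ranges(in_path, out_path, ranges):
    """Copy the 1-based, inclusive line ranges in `ranges` from in_path to out_path.

    Assumption: `ranges` is sorted and non-overlapping,
    e.g. [(3, 6), (18, 27), (39, 45)].
    """
    with open(in_path, "rb") as src, open(out_path, "wb") as dst:
        current = 0  # number of lines consumed so far
        for start, stop in ranges:
            # Skip the lines between the previous range and this one.
            for _ in islice(src, start - 1 - current):
                pass
            # Copy the lines that belong to this range.
            dst.writelines(islice(src, stop - start + 1))
            current = stop
        # Lines after the last range are never read.
```

For the 50-line example this would be called as extract_ranges("in.txt", "out.txt", [(3, 6), (18, 27), (39, 45)]). Files are opened in binary mode so the lines are copied byte for byte without decoding.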

Related

Fortran Getting inputs from a file

I have a Fortran code that gets integers from users, sorts them in descending order with bubble sort, then writes it into a file. However, in the beginning, instead of getting the integers from users, I have to get them from a file. How can I do that? Could you please help me? Thanks.
PROGRAM project
IMPLICIT NONE
INTEGER array(1000),t,p,c
PRINT*,"Enter 1000 element array"
READ*,array
c=1
OPEN(UNIT=25,FILE="sorted.txt")
DO p=1,999
DO c=1,999
IF (array(c)>array(c+1)) then
t=array(c)
array(c)=array(c+1)
array(c+1)=t
ENDIF
ENDDO
ENDDO
WRITE(98,*) array
CLOSE(98)
PRINT*,array(2:999)
END PROGRAM
Before I start, you open the file to write the output to as unit 25:
OPEN(UNIT=25,FILE="sorted.txt")
But you write and close unit 98:
WRITE(98,*) array
CLOSE(98)
Is this just an error in your question? If not, you might want to investigate.
That said, if you are running the program from a shell like bash, the easiest way is to use input redirection:
$ ./sort.exe < unsorted.txt
That way you don't have to change anything in your code.
If you don't want to do that, you need to open and read from the file in your program:
open(unit=24, file='unsorted.txt', action='READ', &
status='OLD', form="FORMATTED")
read(24, *) array
close(24)
This assumes that the file with the unsorted data contains exactly 1000 integers and nothing else.

Read 4D arrays from two binary files and write into a single one

Does anyone know how to read two unformatted binary files (each one holding a 4-dimensional variable) and write them into a single .bin file with the same dimensions?
I am trying this, but without success (I am new to Fortran):
program teste
implicit none
real, dimension(144,73,12,4) :: air,hgt
integer :: l,k,reclen
real :: irec
inquire(iolength=reclen)air
open(1,file='air.bin',status='old',form='unformatted',access='direct',action='read',recl=reclen)
open(2,file='hgt.bin',status='old',form='unformatted',access='direct',action='read',recl=reclen)
open(3,file='air_hgt.bin',form='unformatted',access='direct',action='write',recl=reclen)
read(1,rec=1)air
read(2,rec=1)hgt
close(1)
close(2)
irec=0
do l=1,4
do k=1,12
irec=irec+1
write(3,rec=irec)air(:,:,k,l)
end do
do k=1,12
irec=irec+1
write(3,rec=irec)hgt(:,:,k,l)
end do
end do
close(3)
end program teste
When I run the code, the program never stops running, even though the .bin files are very small (2MB each). So, after a few minutes, I force it to stop, and the file it has created is very large (hundreds of gigabytes). That doesn't seem to make much sense.

f90 read .txt file return NaN

I am trying to read a .txt file with multiple arrays with a fortran program.
It looks like the program is finding the file but it only returns NaN value...
!
INTEGER :: T, RH, i, j, ierror
!
REAL, DIMENSION(3,3) :: AFILE
!
LOGICAL :: dir_e
inquire(file='PSR_FAB.txt', exist=dir_e)
if ( dir_e ) then
print*, "dir exists!"
else
print*, 'nope'
end if
OPEN (UNIT = 1234 , FILE = 'PSR_FAB.txt', STATUS = 'OLD', ACTION = 'READ')
DO i=1,3
READ(1234,*, IOSTAT=ierror) (AFILE(i,j),j=1,3)
print*, (AFILE(i,j),j=1,3)
! if (ierror>0) then
! stop 'Error while reading from file. '
! elseif (ierror<0) then
! print* ,PSR_FILE
! stop 'Reached end of file. '
! endif
ENDDO
CLOSE(UNIT=1234)
!
T=2
RH=3
print*,AFILE(T,RH)
!
In order to test the program, I'm using the following .txt file:
1 2 3
4 5 6
7 8 9
Also, when I use the "ierror if test", the "Reached end of file" message pops up, which means ierror<0, which means the end of the file has been reached.
At first I thought it was because it could not find the file, but when I inquire it, it has no problem finding it...
And as I said earlier, the AFILE contains only NaN value after the file has been read.
I am wondering if the problem lies in the .txt file or in the code. Maybe it is the READ statement, but the code seems ok to me.
I am kind of stuck at the moment and out of ideas... Any thoughts?
Thank you
I experienced a similar problem when I ran a program on a Windows 2000 box, but it worked fine on a Windows 7 box.
In my situation, the application was built on a Win7 box using Intel Fortran V11. This application worked on a Win7 box, but when run on a Win2000 box it failed to read the first 2 entries.
Try compiling with the /arch:IA32 option. This will use the X87 instruction set instead of the SSE2 enhanced instructions. This produces an application that works on both platforms.
It's possible the PSR_FAB.txt file is being opened at a position that is not the beginning of the file. Without specifying the POSITION= attribute to the open statement the file position is taken ASIS. I'm not certain what conditions result in ASIS yielding a position other than the start of the file, though.
I recommend specifying the file be rewound on opening, with:
OPEN (UNIT = 1234 , FILE = 'PSR_FAB.txt', STATUS = 'OLD', &
ACTION = 'READ', POSITION = 'REWIND')
There are other issues that could cause this problem (or others like it), though.

How to read from a specific line from a text file in VHDL

I am doing a program in VHDL to read and write data. My program has to read data from a line, process it, and then save the new value in the old position. My code is somewhat like:
WRITE_FILE: process (CLK)
variable VEC_LINE : line;
file VEC_FILE : text is out "results";
begin
if CLK='0' then
write (VEC_LINE, OUT_DATA);
writeline (VEC_FILE, VEC_LINE);
end if;
end process WRITE_FILE;
If I want to read line 15, how can I specify that? Then I want to clear line 15 and write new data there. The LINE is of access type; will it accept integer values?
Russell's answer - using two files - is the answer.
There isn't a good way to find the 15th line (seek) but for VHDL's purpose, reading and discarding the first 14 lines is perfectly adequate. Just wrap it in a procedure named "seek" and carry on!
If you're on the 17th line already, you can't seek backwards, or rewind to the beginning. What you can do is flush the output file: save the open line, copy the rest of the input file to it, close both files and reopen them. (Naturally, this requires VHDL-93, not VHDL-87, syntax for file operations.) Just wrap that in a procedure called "rewind", and carry on!
Keep track of the current line number, and now you can seek to line 15, wherever you are.
It's not pretty and it's not fast, but it'll work just fine. And that's good enough for VHDL's purposes.
In other words you can write a text editor in VHDL if you must, (ignoring the problem of interactive input, though reading stdin should work) but there are much better languages for the job. One of them even looks a lot like an object-oriented VHDL...
Use 2 files, an input file and an output file.
file_open(vectors, "stimulus/input_vectors.txt", read_mode);
file_open(results, "stimulus/output_results.txt", write_mode);
while not endfile(vectors) loop
readline(vectors, iline);
read(iline, a_in);
-- etc. for all your input data...
write(oline, <output data>);
writeline(results, oline); -- flush the assembled line to the output file
end loop;
file_close(vectors);
file_close(results);
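The two-file pattern described above (read and discard lines until the target, write a replacement, copy the rest) is not specific to VHDL. As a rough illustration only, here is a minimal Python sketch of it; the function name replace_line and its parameters are hypothetical and do not come from either answer.

```python
def replace_line(in_path, out_path, line_no, new_text):
    """Copy in_path to out_path, replacing 1-based line `line_no` with new_text.

    Returns the original content of that line (without its newline),
    or None if the input has fewer than `line_no` lines.
    """
    old = None
    with open(in_path) as src, open(out_path, "w") as dst:
        # Keep track of the current line number while streaming through the file.
        for number, line in enumerate(src, start=1):
            if number == line_no:
                old = line.rstrip("\n")
                dst.write(new_text + "\n")  # write the new value in the old position
            else:
                dst.write(line)  # every other line is copied unchanged
    return old
```

The input file is never modified; the "edited" file is the output, which mirrors the two-file approach recommended for VHDL.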

Fortran. Keep reading first word of the document until it matches the input.

Good evening!
I am trying to read a text document in Fortran 95 and do some calculations with the values in it. The document has numerous gas names and certain values of 'A' assigned to them. So essentially it looks something like this:
Document 1: gas values.
GAS A1 A2
steam 1 2
air 3 4
I then want the user to input a gas name (variable gasNameIn) and to implement a while loop that keeps searching for the gas until it matches the input. So, e.g., the user inputs 'air' and the program keeps reading first words until air comes up. It then reads the values of A1 and A2 and uses them for the calculation. What I did was open the file as unit 25 and try the following loop:
do while(gasName .NE. gasNameIn)
read(25, *) gasName
if (gasName .EQ. gasNameIn)
read(25,*) A1, A2
endif
enddo
but I get an error "End of file on unit 25".
Any ideas on how my while loop is wrong? Thank you!
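With the first read statement you read the name correctly, but Fortran then advances to the next line (record), so the second read takes A1 and A2 from the line after the match. If the gas you searched for was on the last line, you get the end-of-file error; otherwise you get an error when reading A1, because the next line starts with a gas name.
Note that advance='no' cannot be combined with list-directed (*) input. Instead, skip the header lines once before the loop, then read the name and both values from each record in a single statement:
read(25, *) gasName, A1, A2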
You need to read whole lines as strings and then process them. This is untested, but here is the gist of it:
character*100 wholeline
wholeline=''
do while(index(wholeline,'air').eq.0)
read(unit,'(a)')wholeline
end do
Then, if you can count on the first strings taking up ~7 columns like in the example,
read(wholeline(7:),*)ia1,ia2
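The read-a-whole-line-then-parse idea can also be sketched in Python, purely as an illustration of the logic rather than as Fortran advice. The function name find_gas is hypothetical, and the sketch assumes the layout from the question: a gas name followed by two numeric values, separated by whitespace. Matching on the first word, rather than searching the whole line, avoids relying on fixed column widths.

```python
def find_gas(path, gas_name):
    """Return (A1, A2) for the first line whose first word equals gas_name.

    Returns None if the end of the file is reached without a match.
    """
    with open(path) as f:
        for line in f:
            fields = line.split()  # split the whole line on whitespace
            # A data line needs a name plus two values; lines whose first
            # word is not gas_name (caption, header, other gases) are skipped.
            if len(fields) >= 3 and fields[0] == gas_name:
                return float(fields[1]), float(fields[2])
    return None
```

Returning None at the end of the file plays the role of the end-of-file check that the original loop was missing.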
What happened is that the first read consumes the whole line, even though only the first word ends up in "gasName". When the name matches, the second read therefore starts on the following line, and if there is no following line you reach the end of the file. It has been a while since I've written in Fortran, but on the first pass through the loop "gasName" will be undefined. Usually that is a no-no in programming.
The answer that got in before I could get mine typed in is the way forward with your problem. Give it a go. If you still have some trouble maybe I'll fire up a Fortran compiler and try my hand at Fortran again. (It's been since the early 90's that I've done any.)
CHEERS!
Hi, here is your code modified a bit; you can pitch what you don't need. I just added a couple of lines to make it a self-contained program. For instance, there is a statement which assigns the value "steam" to "gasNameIn". I think you mentioned that you would have the user enter a value for that. I've also added a "write" statement to show that your program has properly read the values. It is a crude kind of "unit test". Otherwise, I've only addressed the part of the program that your question asked about. Check this out:
character (len=5) :: gasName, gasNameIn
real :: A1, A2
gasName = "" ! give gasName a defined value before the loop tests it
gasNameIn = "steam"
open (unit = 25, file = "gasvalues.txt")
do while (gasName .NE. gasNameIn)
read (25, *) gasName
if (gasName .EQ. gasNameIn) then
backspace (25)
read (25,*) gasName, A1, A2
write (*, *) "The gas is: ", gasName, ". The factors are A1: ", A1, "; A2: ", A2
endif
end do
close (25)
end
Ok, here are a couple of notes to help clarify what I've done. I tried to keep your code as intact as I could manage. In your program there were some things that my fortran 95 compiler complained about namely: the "then" is required in the "if" test, and "enddo" needed to be "end do". I think those were the main compiler issues I had.
As far as your code goes, when your program tests that "gasName" is the same as "gasNameIn", then you want to "backup" to that record again so that you can re-read the line. There are probably better ways to do this, but I tried to keep your program ideas the way you wanted them. Oh, yes, one more thing, I've named the input file "gasvalues.txt".
I hope this helps some.
CHEERS!
