Reading a specific line in Julia - file

Since I'm new to Julia, I sometimes run into problems that may seem obvious to you.
This time I do not know how to read a certain piece of data from a file, e.g.:
...
stencil: half/bin/3d/newton
bin: intel
Per MPI rank memory allocation (min/avg/max) = 12.41 | 12.5 | 12.6 Mbytes
Step TotEng PotEng Temp Press Pxx Pyy Pzz Density Lx Ly Lz c_Tr
200000 261360.25 261349.16 413.63193 2032.9855 -8486.073 4108.1669
200010 261360.45 261349.36 413.53903 22.925126 -29.762605 132.03134
200020 261360.25 261349.17 413.46495 20.373081 -30.088775 129.6742
Loop
What I want is to read this file from the third row after "Step" (the one which starts at 200010, which can be a different number; I have many files which start at the same place but with a different integer) until the program reaches "Loop". Could you help me, please? I'm stuck: I don't know how to combine the different features of Julia to do it.

Here is one solution. It uses eachline to iterate over the lines. The loop skips the header, and any empty lines, and breaks when the Loop line is found. The lines to keep are returned in a vector. You might have to modify the detection of the header and/or the end token depending on the exact file format you have.
julia> function f(file)
           result = String[]
           for line in eachline(file)
               if startswith(line, "Step") || isempty(line)
                   continue # skip the header and any empty lines
               elseif startswith(line, "Loop")
                   break # stop reading completely
               end
               push!(result, line)
           end
           return result
       end
f (generic function with 2 methods)
julia> f("file.txt")
3-element Array{String,1}:
"200000 261360.25 261349.16 413.63193 2032.9855 -8486.073 4108.1669"
"200010 261360.45 261349.36 413.53903 22.925126 -29.762605 132.03134"
"200020 261360.25 261349.17 413.46495 20.373081 -30.088775 129.6742"

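As written, the function above keeps every line that precedes the "Step" header, so it only returns the three data rows if the file starts at the header. If your files contain the preamble shown in the question, one option is to start collecting only after the header has been seen. Below is a minimal sketch of that idea, written in Python rather than Julia. The name `read_block` and the `skip` parameter are made up for illustration and are not part of the original answer.

```python
def read_block(lines, start="Step", stop="Loop", skip=0):
    """Collect the lines between the `start` header and the `stop` line.

    `lines` is any iterable of strings (an open file works too).
    `skip` drops that many data rows from the beginning of the block.
    """
    result = []
    in_block = False
    for line in lines:
        line = line.rstrip("\n")
        if not in_block:
            # Ignore everything up to and including the header line.
            if line.startswith(start):
                in_block = True
            continue
        if line.startswith(stop):
            break  # stop reading completely
        if not line.strip():
            continue  # skip empty lines inside the block
        result.append(line)
    return result[skip:]


sample = [
    "bin: intel",
    "Step TotEng PotEng Temp",
    "200000 261360.25 261349.16 413.63193",
    "200010 261360.45 261349.36 413.53903",
    "200020 261360.25 261349.17 413.46495",
    "Loop",
]
for row in read_block(sample, skip=1):
    print(row)
```

With a real file you would pass the open file object instead of the list, e.g. `with open("file.txt") as fh: rows = read_block(fh)`.
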
Related

Reordering array dimensions in matlab

First, some background:
As part of a much larger code I'm reading in data from a netCDF input file. I produce this input file beforehand. The code has been written to expect a term F which is an array shaped like t-by-x-by-y-by-z where time t usually has around 20 values, and x and y dimensions are usually of the order 1000 entries each and z has usually about 5.
In summary, F is a 20x1000x1000x5 array.
This format is incredibly slow to read. It is many times faster to read it if it's written in the format x-by-y-by-z-by-t.
So instead I am now producing an input netCDF file containing Fnew, which is a 1000x1000x5x20 array.
Now my question: I want to make as few changes to the larger code as possible, so after Fnew is read in, I immediately want to rearrange it to match F.
There must be an easy solution to this?
As Cris Luengo commented, the permute function is what you need. This is how you would use it in your context:
function test()
    %% instead of the current (using 10 instead of 1000 just for demo purposes):
    txyz = rand(20,10,10,5);
    largerCodeOld(txyz);
    %% you can now use:
    xyzt = rand(10,10,5,20);
    largerCodeNew( xyzt );
end
function largerCodeOld(txyz)
    % do stuff with txyz
end
function largerCodeNew(xyzt)
    txyz = permute(xyzt,[ 4, 1:3 ]);
    % either do stuff with txyz
    % or call largerCodeOld( txyz )
end

Outputting the rows from an (ixj) array into individual (5xj/5) arrays in a text file

In a program I'm writing, I've created an allocated, final product array AFT(n,92). In my output I would like to present each row as its own table, 5 columns wide.
So in this case, it would be n individual tables of 19 rows x 5 columns, with only 2 values on the final row. I attempted to do this with a do loop, as shown in the code snippet below, but the output comes out as just one long column. I'm not sure where to go from here.
DO i=1,n
   WRITE(4,800) t(i), ' HHMM LDT' !Writes the table header using an array which holds the corresponding time value
   800 FORMAT(I4, A9)
   DO j=1,92
      WRITE(4,900) AFT(i,j)
      900 FORMAT(5ES23.14)
   END DO
END DO
I believe this is happening because the write command is performed for each j individually due to the use of a loop, but my inexperience with FORTRAN leaves me blank when I try to come up with another approach.
Yes, each write statement produces one line of text output. If you want multiple items to be included in the same output record, you have to include them in the write statement. If you want to include portions of an array, you can use techniques such as:
do i=1, N
   write (*, *) (array (i,j), j=1, 5)
end do
or
do i=1, N
   write (*, *) array (i, 1:5)
end do
The first is using implied do loops, the second array sections.
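Applied to the question, this means handing all 92 values of a row to a single WRITE. With a format such as 5ES23.14, the format is reused for a new record once its five items are exhausted, so each row comes out as a 5-column table. The grouping itself can be sketched as follows, in Python rather than Fortran, purely as an illustration. The function name `format_table` is hypothetical, and the format string is only an approximation of ES23.14.

```python
def format_table(values, ncols=5, fmt="{:23.14E}"):
    """Split a flat sequence of numbers into text lines of `ncols` fields."""
    lines = []
    for start in range(0, len(values), ncols):
        chunk = values[start:start + ncols]  # the last chunk may be shorter
        lines.append("".join(fmt.format(v) for v in chunk))
    return lines


row = [float(j) for j in range(1, 93)]  # stands in for AFT(i, 1:92)
table = format_table(row)
print(len(table))             # number of lines in the table
print(len(table[-1].split())) # number of values on the final line
```

For 92 values this gives 18 full lines plus a final line holding the remaining 2 values.
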

find and delete lines in file python 3

I use Python 3.
Okay, I have a file that looks like this:
id:1
1
34
22
52
id:2
1
23
22
31
id:3
2
12
3
31
id:4
1
21
22
11
How can I find and delete only this part of the file?
id:2
1
23
22
31
I have been trying a lot to do this but can't get it to work.
Is the id used for the decision to delete the sequence, or is the list of values used for the decision?
You can build a dictionary where the id number is the key (converted to int because of the later sorting) and the following lines are collected into a list of strings that is the value for that key. Then you can delete the item with the key 2, traverse the items sorted by key, and output each id:key line followed by the formatted list of strings.
Or you can build a list of lists, where the order is preserved. If the sequence of the ids is to be preserved (i.e. not renumbered), you can also store the id:n line in the inner list.
This can be done for a reasonably sized file. If the file is huge, you should copy the source to the destination and skip the unwanted sequence on the fly. That last approach is also fairly easy for a small file.
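A minimal sketch of the dictionary approach could look like this. The function names `parse_sections` and `render_sections` are made up for illustration, and the sketch assumes that every section starts with a line of the form id:N.

```python
def parse_sections(lines):
    """Build {id number: [following lines]} from the id-delimited lines."""
    sections = {}
    current = None
    for line in lines:
        line = line.strip()
        if line.startswith("id:"):
            current = int(line[3:])   # int key, so sorting is numeric
            sections[current] = []
        elif current is not None and line:
            sections[current].append(line)
    return sections


def render_sections(sections):
    """Turn the dictionary back into a list of lines, sorted by id."""
    out = []
    for key in sorted(sections):
        out.append("id:{}".format(key))
        out.extend(sections[key])
    return out


text = "id:1\n1\n34\nid:2\n1\n23\nid:3\n2\n12\n"
sections = parse_sections(text.splitlines())
del sections[2]                       # drop the unwanted section
print("\n".join(render_sections(sections)))
```
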
[added after the clarification]
I recommend learning the following approach, which is useful in many such cases. It uses a so-called finite automaton that implements actions bound to transitions from one state to another (see Mealy machine).
The text line is the input element here. The nodes that represent the context status are numbered here. (My experience is that it is not worth giving them names; just keep them as plain numbers.) Here only two states are used, and the status could easily be replaced by a boolean variable. However, if the case becomes more complicated, that leads to the introduction of another boolean variable, and the code becomes more error-prone.
The code may look very complicated at first, but it is fairly easy to understand once you know that you can think about each if status == number separately. This is the mentioned context that captures the previous processing. Do not try to optimize; leave the code that way. It can actually be human-decoded later, and you can draw a picture similar to the Mealy machine example. If you do, it becomes much more understandable.
The wanted functionality is a bit generalized -- a set of ignored sections can be passed as the first argument:
import re
def filterSections(del_set, fname_in, fname_out):
    '''Filtering out the del_set sections from fname_in. Result in fname_out.'''
    # The regular expression was chosen for detecting and parsing the id-line.
    # It can be done differently, but I consider it just fine and efficient.
    rex_id = re.compile(r'^id:(\d+)\s*$')
    # Let's open the input and output file. The files will be closed
    # automatically.
    with open(fname_in) as fin, open(fname_out, 'w') as fout:
        status = 1                 # initial status -- expecting the id line
        for line in fin:
            m = rex_id.match(line) # get the match object if it is the id-line
            if status == 1:        # skipping the non-id lines
                if m:              # you can also write "if m is not None:"
                    num_id = int(m.group(1))  # get the numeric value of the id
                    if num_id in del_set:     # if this id should be deleted
                        status = 1            # or pass (to stay in this status)
                    else:
                        fout.write(line)      # copy this id-line
                        status = 2            # to copy the following non-id lines
                #else ignore this line (no code needed to ignore it :)
            elif status == 2:      # copy the non-id lines
                if m:              # the id-line found
                    num_id = int(m.group(1))  # get the numeric value of the id
                    if num_id in del_set:     # if this id should be deleted
                        status = 1            # switch to skipping the following lines
                    else:
                        fout.write(line)      # copy this id-line
                        status = 2            # or pass (to stay in this status)
                else:
                    fout.write(line)          # copy this non-id line
if __name__ == '__main__':
    filterSections( {1, 3}, 'data.txt', 'output.txt')
    # or you can write the older set([1, 3]) for the first argument.
Here the output id-lines were given their original numbers. If you want to renumber the sections, it can be done via a simple modification. Try the code and ask for details.
Beware, finite automata have limited power. They cannot be used to parse the usual programming languages, as they are not able to capture nested paired structures (like parentheses).
P.S. The 7000 lines is actually a tiny file from a computer perspective ;)
Read each line into an array of strings; the index is the line number minus 1. Before storing a line, check whether it equals "id:2". If it does, stop storing lines until a line equals "id:3". After reading all the lines, clear the file and write the array back to it. This may not be the most efficient way, but it should work.
If there aren't any values in between that would interfere, this would work:
import fileinput
...
def deleteIdGroup( number ):
    deleted = False
    for line in fileinput.input( "testid.txt", inplace = 1 ):
        line = line.strip( '\n' )
        if line.count( "id:" + number ): # > 0
            deleted = True
        elif line.count( "id:" ): # > 0
            deleted = False
        if not deleted:
            print( line )
EDIT:
Sorry, this deletes both id:2 and id:20. You could modify it so that the first if checks line == "id:" + number.
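A sketch of that exact-match variant is shown below. To keep it easy to try out, it works on a list of lines instead of editing the file in place; the function name `delete_id_group` is made up for illustration.

```python
def delete_id_group(lines, number):
    """Return `lines` without the section whose header is exactly id:<number>."""
    target = "id:" + str(number)
    kept = []
    deleting = False
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("id:"):
            # Compare the whole header line, so id:2 does not match id:20.
            deleting = (stripped == target)
        if not deleting:
            kept.append(line)
    return kept


lines = ["id:2", "1", "23", "id:20", "5", "id:3", "2"]
print(delete_id_group(lines, 2))
```
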

In Fortran 90, what is a good way to write an array to a text file, row-wise?

I am new to Fortran, and I would like to be able to write a two-dimensional array to a text file, in a row-wise manner (spaces between columns, and each row on its own line). I have tried the following, and it seems to work in the following simple example:
PROGRAM test3
  IMPLICIT NONE
  INTEGER :: i, j, k, numrows, numcols
  INTEGER, DIMENSION(:,:), ALLOCATABLE :: a
  numrows=5001
  numcols=762
  ALLOCATE(a(numrows,numcols))
  k=1
  DO i=1,SIZE(a,1)
    DO j=1,SIZE(a,2)
      a(i,j)=k
      k=k+1
    END DO
  END DO
  OPEN(UNIT=12, FILE="aoutput.txt", ACTION="write", STATUS="replace")
  DO i=1,numrows
    WRITE(12,*) (a(i,j), j=1,numcols)
  END DO
END PROGRAM test3
As I said, this seems to work fine in this simple example: the resulting text file, aoutput.txt, contains the numbers 1-762 on line 1, numbers 763-1524 on line 2, and so on.
But when I use the above ideas (i.e., the fifth-to-last through second-to-last lines of code above) in a more complicated program, I run into trouble: each row is delimited (by a new line) only intermittently, it seems. (I have not posted, and probably will not post, my entire complicated program here because it is rather long.) The lack of consistent row delimiters in my complicated program probably suggests another bug in my code rather than in the four-line write-to-file routine above, since the simple example appears to work. Still, I am wondering: is there a better row-wise write-to-text-file routine that I should be using?
Thank you very much for your time. I really appreciate it.
There are a few issues here.
The fundamental one is that you shouldn't use text as a data format for sizable chunks of data. It's big and it's slow. Text output is good for something you're going to read yourself; you aren't going to sit down with a printout of 3.81 million integers and flip through them. As the code below demonstrates, the correct text output is about 10x slower, and 50% bigger, than the binary output. If you move to floating point values, there are precision loss issues with using ascii strings as a data interchange format. etc.
If your aim is to interchange data with matlab, it's fairly easy to write the data into a format matlab can read; you can use the matOpen/matPutVariable API from matlab, or just write it out as an HDF5 array that matlab can read. Or you can just write out the array in raw Fortran binary as below and have matlab read it.
If you must use ascii to write out huge arrays (which, as mentioned, is a bad and slow idea) then you're running into problems with default record lengths in list-directed IO. Best is to generate at runtime a format string which correctly describes your output, and safest on top of this for such large (~5000 character wide!) lines is to set the record length explicitly to something larger than what you'll be printing out so that the Fortran IO library doesn't helpfully break up the lines for you.
In the code below,
WRITE(rowfmt,'(A,I4,A)') '(',numcols,'(1X,I6))'
generates the string rowfmt which in this case would be (762(1X,I6)) which is the format you'll use for printing out, and the RECL option to OPEN sets the record length to be something bigger than 7*numcols + 1.
PROGRAM test3
  IMPLICIT NONE
  INTEGER :: i, j, k, numrows, numcols
  INTEGER, DIMENSION(:,:), ALLOCATABLE :: a
  CHARACTER(LEN=30) :: rowfmt
  INTEGER :: txtclock, binclock
  REAL :: txttime, bintime
  numrows=5001
  numcols=762
  ALLOCATE(a(numrows,numcols))
  k=1
  DO i=1,SIZE(a,1)
    DO j=1,SIZE(a,2)
      a(i,j)=k
      k=k+1
    END DO
  END DO
  CALL tick(txtclock)
  ! Note: I6 holds at most 6 digits; wider values print as asterisks,
  ! so widen the field (and RECL) if your data needs it.
  WRITE(rowfmt,'(A,I4,A)') '(',numcols,'(1X,I6))'
  OPEN(UNIT=12, FILE="aoutput.txt", ACTION="write", STATUS="replace", &
       RECL=(7*numcols+10))
  DO i=1,numrows
    WRITE(12,FMT=rowfmt) (a(i,j), j=1,numcols)
  END DO
  CLOSE(UNIT=12)
  txttime = tock(txtclock)
  CALL tick(binclock)
  OPEN(UNIT=13, FILE="boutput.dat", ACTION="write", STATUS="replace", &
       FORM="unformatted")
  WRITE(13) a
  CLOSE(UNIT=13)
  bintime = tock(binclock)
  PRINT *, 'ASCII time = ', txttime
  PRINT *, 'Binary time = ', bintime
CONTAINS
  SUBROUTINE tick(t)
    INTEGER, INTENT(OUT) :: t
    CALL system_clock(t)
  END SUBROUTINE tick
  ! returns time in seconds from now to time described by t
  REAL FUNCTION tock(t)
    INTEGER, INTENT(IN) :: t
    INTEGER :: now, clock_rate
    call system_clock(now,clock_rate)
    tock = real(now - t)/real(clock_rate)
  END FUNCTION tock
END PROGRAM test3
This may be a very roundabout and time-consuming way of doing it, but anyway... You could simply print each array element separately, using advance='no' (to suppress insertion of a newline character after what was being printed) in your write statement. Once you're done with a line you use a 'normal' write statement to get the newline character, and start again on the next line. Here's a small example:
program testing
  implicit none
  integer :: i, j, k
  k = 1
  do i=1,4
    do j=1,10
      write(*, '(I2,1X)', advance='no') k
      k = k + 1
    end do
    write(*, *) '' ! this gives you the line break
  end do
end program testing
When you run this program the output is as follows:
1 2 3 4 5 6 7 8 9 10
11 12 13 14 15 16 17 18 19 20
21 22 23 24 25 26 27 28 29 30
31 32 33 34 35 36 37 38 39 40
Using an "*" is list-directed IO: Fortran will make the decisions for you, and some behaviors aren't specified. You could gain more control by using a format statement. If you want to positively identify row boundaries, you could write a marker symbol after each row. Something like:
DO i=1,numrows
  WRITE(12,*) a(i,:)
  write (12, '("X")' )
END DO
Addendum several hours later:
Perhaps with large values of numcols the lines are too long for some of the programs that you are using to examine the file? For the output statement, try:
WRITE(12, '( 10(2X, I11) )' ) a(i,:)
which will break each row of the matrix, if it has more than 10 columns, into multiple, shorter lines in the file.

Using a Do-Loop, to read a column of data (numerical & string) and filter the numbers as output into another file

I have an output file, single column, where each 7th line is a string and the others numerical (something like below)
998.69733
377.29340
142.22397
53.198547
19.743515
7.5493960
timestep: 1
998.69733
377.29340
142.22047
53.188023
19.755905
7.5060229
timestep: 2
998.69733
377.29337
I need to read this data into another file, omitting the text and keeping only the numbers. I tried a loop that reads the string into a dummy variable, but I get an error because it does not recognize (AI).
      DO 10 I = 1, 1000
        IF (MOD(I,7) == 0) THEN
          READ (8, FMT= '(AI)') dummy
        END IF
        READ (8,*) val
        WRITE (9,*) val
   10 CONTINUE
(8 - input file and 9 - output file allocation)
I am quite new to Fortran and spent a lot of time surfing for a solution or at least a similar problem but did not find anything. I would really appreciate some help.
Thank you very much in advance.
If you just want to skip every seventh line, you could do "read (8, '(A)' ) dummy" where dummy is declared as a character string (i.e., "character (len=80) :: dummy"). It won't matter that some of the characters are letters and others numbers.
P.S. The modern way to write loops is "do", "end do" ... no need for line numbers and continue statements.
Simply use list-directed input with an empty input item list.
Also, on every seventh iteration your loop executes two READ statements: the one for dummy and then the one for val. Put the read and write of val in an ELSE section or, alternatively, use the CYCLE statement:
DO I = 1, 1000
  IF(MOD(I,7) == 0) THEN
    READ(8,*)
    CYCLE
  END IF
  READ(8,*) val
  WRITE(9,*) val
END DO
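The loop above relies on the text line always being the seventh one. If that is not guaranteed, another option is to test each line instead of counting. Here is a minimal sketch of that idea, written in Python rather than Fortran; the function name `numeric_lines` is made up for illustration, and it assumes one value per line as in the sample.

```python
def numeric_lines(lines):
    """Keep only the lines that parse as a single number."""
    kept = []
    for line in lines:
        text = line.strip()
        try:
            float(text)          # raises ValueError for "timestep: 1" or ""
        except ValueError:
            continue             # not a number: skip this line
        kept.append(text)
    return kept


data = ["998.69733", "7.5493960", "timestep: 1", "998.69733", "377.29340"]
for value in numeric_lines(data):
    print(value)
```
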
