I have a data file with 4 columns:
x y u v
such that x and y are the coordinate positions associated with the values u and v.
The data is structured such that
x y u v
1 1 # #
2 1 # #
3 1 # #
...
However, I would like to restructure the file such that
x y u v
1 1 # #
1 2 # #
1 3 # #
...
Is there a function in Fortran which can achieve this?
Well, I never make claims about "pretty," but it should do the job. Obviously, you will need to check your FORMAT statements:
PROGRAM TEST
REAL*8 :: U(4,4)
REAL*8 :: V(4,4)
INTEGER :: X, Y
DO
READ(*,'(2I2)',ADVANCE='NO',END=10) X,Y
READ(*,'(2F6.1)',ADVANCE='YES',END=10) U(X,Y),V(X,Y)
END DO
10 CONTINUE
WRITE(*,'(2I4,2F10.2)') ((I,J,U(I,J),V(I,J),J=1,4),I=1,4)
END
I'm assuming that your arrays are already allocated properly.
Here's my input file:
$ cat test.in
1 1 5.0 10.0
2 1 1.3 -0.2
3 1 5.1 0.0
4 1 -9.1 3.0
1 2 4.0 2.0
2 2 14.0 -8.0
3 2 -8.0 8.0
4 2 4.0 9.6
1 3 2.0 1.1
2 3 3.4 8.0
3 3 4.0 7.0
4 3 4.0 4.1
1 4 5.5 8.4
2 4 34.1 23.0
3 4 -4.1 4.0
4 4 6.0 8.4
And the output:
$ cat test.in | ./a.out
1 1 5.0 10.0
1 2 4.0 2.0
1 3 2.0 1.1
1 4 5.5 8.4
2 1 1.3 -0.2
2 2 14.0 -8.0
2 3 3.4 8.0
2 4 34.1 23.0
3 1 5.1 0.0
3 2 -8.0 8.0
3 3 4.0 7.0
3 4 -4.1 4.0
4 1 -9.1 3.0
4 2 4.0 9.6
4 3 4.0 4.1
4 4 6.0 8.4
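For comparison, here is a minimal Python sketch of the same reordering. It is not part of the Fortran answer above: the function name reorder_y_fastest and the four sample rows are only illustrative, and it assumes the whole file fits in memory as a list of (x, y, u, v) tuples.

```python
# Illustrative rows in the original order (x varies fastest).
rows = [
    (1, 1, 5.0, 10.0),
    (2, 1, 1.3, -0.2),
    (1, 2, 4.0, 2.0),
    (2, 2, 14.0, -8.0),
]


def reorder_y_fastest(rows):
    """Return the rows sorted by x first, then y, so that y varies fastest."""
    return sorted(rows, key=lambda r: (r[0], r[1]))


# Print in a layout similar to the Fortran format (2I4,2F10.2).
for x, y, u, v in reorder_y_fastest(rows):
    print(f"{x:4d}{y:4d}{u:10.2f}{v:10.2f}")
```

Sorting on the (x, y) pair avoids having to know the grid size in advance, which the fixed 4x4 arrays in the Fortran version require.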
Say I have the following dataframe:
df = pd.DataFrame({'A' : [0, 0.3, 0.8, 1, 1.5, 2.3, 2.3, 2.9], 'B' : randn(8)})
df
Out[86]:
A B
0 0.0 0.130471
1 0.3 0.029251
2 0.8 0.790972
3 1.0 -0.870462
4 1.5 -0.700132
5 2.3 -0.361464
6 2.3 -1.100923
7 2.9 -1.003341
How could I split this dataframe based on the range of Col A values as in the following (0<=A<1, 1<=A<2, etc.)?:
A B
0 0.0 0.130471
1 0.3 0.029251
2 0.8 0.790972
A B
3 1.0 -0.870462
4 1.5 -0.700132
A B
5 2.3 -0.361464
6 2.3 -1.100923
7 2.9 -1.003341
I know that np.array_split could be used if this were to be split based on an equal number of rows:
np.array_split(df, 3)
but does something similar exist that allows me to apply my condition here?
Try with groupby and floored division:
for k, d in df.groupby(df['A']//1):
    print(d)
Output:
A B
0 0.0 0.130471
1 0.3 0.029251
2 0.8 0.790972
A B
3 1.0 -0.870462
4 1.5 -0.700132
A B
5 2.3 -0.361464
6 2.3 -1.100923
7 2.9 -1.003341
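If you want to keep the pieces rather than just print them, something along these lines might work. This is only a sketch: the helper name split_by_range and its width parameter are made up for illustration, and the B values are copied from the question instead of being drawn with randn.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [0, 0.3, 0.8, 1, 1.5, 2.3, 2.3, 2.9],
    "B": [0.130471, 0.029251, 0.790972, -0.870462,
          -0.700132, -0.361464, -1.100923, -1.003341],
})


def split_by_range(frame, column, width=1.0):
    """Group rows by floor(column / width) and return {bin number: sub-frame}."""
    bins = frame[column] // width
    return {int(k): part for k, part in frame.groupby(bins)}


parts = split_by_range(df, "A")
for k, part in parts.items():
    print(f"{k} <= A < {k + 1}")
    print(part)
```

With width=1.0 this is the same floored division as above; a different width gives ranges of another size.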
I'm a Julia beginner (scripting beginner too).
I have a text file which consists of 4 columns:
1 5.4 9.5 19.5
2 5.4 9.4 20.6
2 6.2 9.6 18.3
1 9.1 0.5 17.2
2 8.5 1.4 19.6
2 8.4 0.6 24.1
etc.
I have no idea how, in Julia, to replace certain values in the rows or add a new column according to an existing column pattern (1 2 2, 1 2 2). For example, I would like to add a column with the letters C and O (C when the first column is 1 and O when it is 2). After that column I would like to add another one in which the first 1 2 2 pattern is labeled with the number 4, the next with the number 5, and so on. This is how I imagine the result:
C 4 1 5.4 9.5 19.5
O 4 2 5.4 9.4 20.6
O 4 2 6.2 9.6 18.3
C 5 1 9.1 0.5 17.2
O 5 2 8.5 1.4 19.6
O 5 2 8.4 0.6 24.1
Thank you for your help in advance.
Kasia.
String processing is fairly straightforward in Julia. You might write a function that takes an input and output filename as follows:
function munge_file(in::AbstractString, out::AbstractString)
    # open the output file for writing
    open(out, "w") do out_io
        # open the input file for reading
        open(in, "r") do in_io
            # and process the contents
            munge_file(in_io, out_io)
        end
    end
end
Now, the inner call to munge_file will have to do the actual work (this isn't particularly optimized, but should be very straightforward):
function munge_file(input::IO, io::IO = IOBuffer())
    # initialize the pattern index
    pattern_index = 3
    # iterate over each line of the input
    for line in eachline(input)
        # skip empty lines
        isempty(line) && continue
        # split the current line into parts
        parts = split(line, ' ')
        # this line doesn't conform to the specified input pattern
        # might be better to throw an error here
        length(parts) == 4 || continue
        # this line starts a new pattern if the first character is a 1
        is_start = parse(Int, parts[1]) == 1
        # increment the counter (for the second output column)
        pattern_index += is_start
        # first column depends on whether a 1 2 2 pattern starts here or not
        print(io, is_start ? 'C' : 'O')
        print(io, ' ')
        # print the pattern counter
        print(io, pattern_index)
        print(io, ' ')
        # print the original line
        println(io, line)
    end
    return io
end
Using the code in the REPL produces the expected output:
shell> cat input.txt
1 5.4 9.5 19.5
2 5.4 9.4 20.6
2 6.2 9.6 18.3
1 9.1 0.5 17.2
2 8.5 1.4 19.6
2 8.4 0.6 24.1
julia> munge_file("input.txt", "output.txt")
IOStream(<file output.txt>)
shell> cat output.txt
C 4 1 5.4 9.5 19.5
O 4 2 5.4 9.4 20.6
O 4 2 6.2 9.6 18.3
C 5 1 9.1 0.5 17.2
O 5 2 8.5 1.4 19.6
O 5 2 8.4 0.6 24.1
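For readers more at home in Python, the same logic can be sketched as follows. This is a rough translation, not the Julia code itself: the name label_lines is made up, and the starting counter of 3 is kept only so that the first pattern is labeled 4, as in the question.

```python
def label_lines(lines, start_index=3):
    """Prefix each 4-column line with C/O and a running pattern counter."""
    pattern_index = start_index
    out = []
    for line in lines:
        parts = line.split()
        # skip blank or malformed lines
        if len(parts) != 4:
            continue
        # a line whose first column is 1 starts a new 1 2 2 pattern
        is_start = parts[0] == "1"
        if is_start:
            pattern_index += 1
        label = "C" if is_start else "O"
        out.append(f"{label} {pattern_index} {line.strip()}")
    return out


sample = [
    "1 5.4 9.5 19.5",
    "2 5.4 9.4 20.6",
    "2 6.2 9.6 18.3",
    "1 9.1 0.5 17.2",
    "2 8.5 1.4 19.6",
    "2 8.4 0.6 24.1",
]
print("\n".join(label_lines(sample)))
```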
Assuming your file is input.txt you could do:
open("output.txt","w") do f
println.(Ref(f),replace.(replace.(readlines("input.txt"),r"^1 "=>"C "), r"^2 "=>"O "))
end;
Dots (.) in the above code vectorize it, so the functions work on vectors rather than scalars. The replace function takes a String and a pair of a regular expression and its new value. In a regular expression, ^ means "line starts with".
I've been trying to merge two NFL dataframes of different sizes and frequencies that share two columns, team name and year. The first one is indexed by team name and year and holds the yearly averages (9 columns). The second one is sorted by team name and year but is broken into weekly games 1-17 (11 columns). I want to merge on team name and year, so that the yearly averages appear alongside each week of that year. I have been at this for 2 weeks and have tried every which way: multi-indexing, groupby, ... I can iterate through each dataframe and append to a list in the right order, but when I try to turn that list into a DataFrame it does not work.
Any help would be greatly appreciated.
Thanks
Year Tm_name W L W_L_Pct PD MoV SoS SRS OSRS DSRS
1 2015 1 13.0 3.0 0.813 176.0 11.0 1.3 12.3 9.0 3.4
2 2016 1 7.0 8.0 0.469 56.0 3.5 -1.9 1.6 2.4 -0.8
3 2017 1 8.0 8.0 0.500 -66.0 -4.1 0.4 -3.7 -4.0 0.2
4 2018 1 3.0 13.0 0.188 -200.0 -12.5 1.0 -11.5 -9.6 -1.9
5 2015 2 8.0 8.0 0.500 -6.0 -0.4 -3.4 -3.8 -4.0 0.3
Week Year Date Tm_name win_loss home_away Opp1_team Tm_Pnts \
0 1 2018 2018-09-09 1 0.0 1.0 32.0 6.0
1 2 2018 2018-09-16 1 0.0 0.0 18.0 0.0
2 3 2018 2018-09-23 1 0.0 1.0 6.0 14.0
3 4 2018 2018-09-30 1 0.0 1.0 28.0 17.0
4 5 2018 2018-10-07 1 1.0 0.0 29.0 28.0
Opp2_pnts Off_1stD Off_TotYd Def_1stD_All Def_TotYd_All
0 24.0 14.0 213.0 30.0 429.0
1 34.0 5.0 137.0 24.0 432.0
2 16.0 13.0 221.0 21.0 316.0
3 20.0 18.0 263.0 19.0 331.0
4 18.0 10.0 220.0 33.0 447.0
If you have 2 columns which are the same in both dataframes, why don't you use pandas.DataFrame.merge on those columns (or pandas.DataFrame.join, if they form the index) to combine the two tables? That way you would have all the data for a given team name and year in the same row.
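A minimal sketch of what that could look like, assuming both frames expose Tm_name and Year as ordinary columns (if they are the index of the yearly frame, call reset_index() on it first). The variable names yearly and weekly are made up, and only a few columns and rows from the question are reproduced.

```python
import pandas as pd

# A small subset of the yearly averages from the question.
yearly = pd.DataFrame({
    "Year": [2017, 2018],
    "Tm_name": [1, 1],
    "W": [8.0, 3.0],
    "L": [8.0, 13.0],
})

# A small subset of the weekly games from the question.
weekly = pd.DataFrame({
    "Week": [1, 2, 3],
    "Year": [2018, 2018, 2018],
    "Tm_name": [1, 1, 1],
    "Tm_Pnts": [6.0, 0.0, 14.0],
})

# A left merge keeps every weekly row and repeats the matching yearly values on it.
merged = weekly.merge(yearly, on=["Tm_name", "Year"], how="left")
print(merged)
```

Because the weekly frame is on the left, the result has one row per game, with the yearly columns filled in from the matching team and year.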
I have a simple query. I am trying to get the standard deviation of each row between two columns in an array (n=2 for the length of the array; I know it's a small sample size)
It forms part of a longer code but simply:
data$i <- sd(data$x, data$y)^2 + (0.1)^2 / data$j
so my data would look like this:
x y
3 13
4 9
19 3
14 3
18 4
3 10
9 4
3 6
3 8
10 9
8 10
11 9
13 12
15 14
19 16
8 8
8 18
11 14
10 12
18 14
12 20
6 8
and, just using the sd(), I would like to get this:
7.1
3.5
11.3
7.8
9.9
4.9
3.5
2.1
3.5
0.7
1.4
1.4
0.7
0.7
2.1
0.0
7.1
2.1
1.4
2.8
5.7
1.4
To apply sd() across the rows, you would use apply with MARGIN = 1:
apply(data[, c("x","y")],1,sd)
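As a cross-check of the expected numbers, here is the same row-wise calculation sketched in Python. It is not R: it uses the standard library function statistics.stdev, which, like R's sd(), computes the sample standard deviation. Only four of the pairs from the question are included.

```python
from statistics import stdev

# A few (x, y) pairs taken from the question.
pairs = [(3, 13), (4, 9), (19, 3), (8, 8)]

# One sample standard deviation per row.
row_sd = [stdev(pair) for pair in pairs]
print([round(s, 1) for s in row_sd])  # [7.1, 3.5, 11.3, 0.0]
```

For two values the sample standard deviation is just |x - y| / sqrt(2), which is why the pair (3, 13) gives 7.1.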
I have a sorted (Ascending trend) array as
[1 1 1 1 1 1.2 1.6 2 2 2 2.4 2.4 2.4 2.6 3 3.5 3.6 3.8 3.9 4 4.3 4.3 4.6 5 5.02 6 7]
I want to check and print the number of the repeated numbers between each "natural numbers".
for example:
between 1 and 2: 0 (no repeated)
between 2 and 3: 3 repeated with 2.4
between 3 and 4: 0
between 4 and 5: 2 repeated with 4.3
between 5 and 6: 0
between 6 and 7: 0
Is there any function in MATLAB to do this task?
You can use tabulate, and the array does not even need to be sorted for that.
Then just select the proper elements using logical conditions. For example:
A=[1 1 1 1 1 1.2 1.6 2 2 2 2.4 2.4 2.4 2.6 3 3.5 3.6 3.8 3.9 4 4.3 4.3 4.6 5 5.02 6 7]
M=tabulate(A) % get frequency table
id1=mod(M(:,1),1)>0; % get indices for non integer values
id2=M(:,2)>1; % get indices for more than one occurrence
idx=id1 & id2; % get indices that combines the two above
ans=[M(idx,1) , M(idx,2)] % show value , # of repeats
ans =
2.4000 3.0000
4.3000 2.0000
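The same frequency-table idea can be sketched in Python with collections.Counter. This is not MATLAB and only illustrates the logic: count every value, then keep the non-integer values that occur more than once. The variable names are made up.

```python
from collections import Counter

a = [1, 1, 1, 1, 1, 1.2, 1.6, 2, 2, 2, 2.4, 2.4, 2.4, 2.6, 3, 3.5,
     3.6, 3.8, 3.9, 4, 4.3, 4.3, 4.6, 5, 5.02, 6, 7]

# Frequency table: value -> number of occurrences.
counts = Counter(a)

# Keep the non-integer values that appear more than once.
repeats = {v: n for v, n in counts.items() if v != int(v) and n > 1}
print(repeats)  # {2.4: 3, 4.3: 2}
```

Taking int(v) of each key then tells you which pair of natural numbers the repeated value lies between.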
An alternative is to use histc. So if your vector is stored in a, then
h = histc(a,a); % count how many times the number is there, the a should be sorted
natNumbers = (mod(a,1)==0) .* h;
nonNatNumbers = (mod(a,1)>0).*h;
indNN = find(natNumbers>0);
indNNN = find(nonNatNumbers>1);
resultIndex = sort([indNN indNNN]);
result = [a(resultIndex);h(resultIndex)]
Then you can work with the result matrix by checking whether there are any numbers between the natural numbers.