pandas merging 2 dataframes of different sizes, columns and freq


I've been trying to merge 2 NFL dataframes of different sizes and frequencies that share 2 columns, team name and year. The first one is indexed by team name and year and holds the yearly averages (9 columns). The second one is sorted by team name and year but is broken into weekly games (weeks 1-17) with 11 different columns. I want to merge on team name and year so that the yearly averages appear alongside that year's weekly rows. I have been at this for 2 weeks and have tried every which way: multi-indexing, groupby, ... I can iterate through each dataframe and append to a list in the right order, but when I try to turn that list into a DataFrame it does not work.
Any help would be greatly appreciated.
Thanks
Year Tm_name W L W_L_Pct PD MoV SoS SRS OSRS DSRS
1 2015 1 13.0 3.0 0.813 176.0 11.0 1.3 12.3 9.0 3.4
2 2016 1 7.0 8.0 0.469 56.0 3.5 -1.9 1.6 2.4 -0.8
3 2017 1 8.0 8.0 0.500 -66.0 -4.1 0.4 -3.7 -4.0 0.2
4 2018 1 3.0 13.0 0.188 -200.0 -12.5 1.0 -11.5 -9.6 -1.9
5 2015 2 8.0 8.0 0.500 -6.0 -0.4 -3.4 -3.8 -4.0 0.3
Week Year Date Tm_name win_loss home_away Opp1_team Tm_Pnts \
0 1 2018 2018-09-09 1 0.0 1.0 32.0 6.0
1 2 2018 2018-09-16 1 0.0 0.0 18.0 0.0
2 3 2018 2018-09-23 1 0.0 1.0 6.0 14.0
3 4 2018 2018-09-30 1 0.0 1.0 28.0 17.0
4 5 2018 2018-10-07 1 1.0 0.0 29.0 28.0
Opp2_pnts Off_1stD Off_TotYd Def_1stD_All Def_TotYd_All
0 24.0 14.0 213.0 30.0 429.0
1 34.0 5.0 137.0 24.0 432.0
2 16.0 13.0 221.0 21.0 316.0
3 20.0 18.0 263.0 19.0 331.0
4 18.0 10.0 220.0 33.0 447.0

If the same 2 columns appear in both dataframes, why not use pandas.DataFrame.join (or pandas.DataFrame.merge on those columns) to join the two tables? That way each row would carry all the data for its team name and year.
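As a rough illustration of that suggestion, here is a minimal sketch using `merge` on the two shared columns. The frame names `yearly` and `weekly` are made up for the example, and only a few of the columns and rows from the question are reproduced. If the yearly frame is indexed by team name and year, calling `reset_index()` on it first should turn those index levels back into columns.

```python
import pandas as pd

# Yearly averages: one row per (team, year). Only two stat columns are kept here.
yearly = pd.DataFrame({
    "Year": [2017, 2018],
    "Tm_name": [1, 1],
    "W": [8.0, 3.0],
    "L": [8.0, 13.0],
})

# Weekly games: many rows per (team, year).
weekly = pd.DataFrame({
    "Week": [1, 2, 3],
    "Year": [2018, 2018, 2018],
    "Tm_name": [1, 1, 1],
    "Tm_Pnts": [6.0, 0.0, 14.0],
})

# A left merge keeps every weekly row and repeats the matching yearly
# averages on each of them.
merged = weekly.merge(yearly, on=["Tm_name", "Year"], how="left")
print(merged)
```

Each weekly row for team 1 in 2018 ends up with the 2018 yearly values attached, which is the "one-to-many" shape the question describes.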

Related

Splitting dataframes on range of values in one column

Say I have the following dataframe:
df = pd.DataFrame({'A' : [0, 0.3, 0.8, 1, 1.5, 2.3, 2.3, 2.9], 'B' : randn(8)})
df
Out[86]:
A B
0 0.0 0.130471
1 0.3 0.029251
2 0.8 0.790972
3 1.0 -0.870462
4 1.5 -0.700132
5 2.3 -0.361464
6 2.3 -1.100923
7 2.9 -1.003341
How could I split this dataframe based on the range of Col A values as in the following (0<=A<1, 1<=A<2, etc.)?:
A B
0 0.0 0.130471
1 0.3 0.029251
2 0.8 0.790972
A B
3 1.0 -0.870462
4 1.5 -0.700132
A B
5 2.3 -0.361464
6 2.3 -1.100923
7 2.9 -1.003341
I know that np.array_split could be used if this were to be split into equal numbers of rows:
np.array_split(df, 3)
but does something similar exist that allows me to apply my condition here?

Try with groupby and floored division:
for k, d in df.groupby(df['A']//1):
    print(d)
Output:
A B
0 0.0 0.130471
1 0.3 0.029251
2 0.8 0.790972
A B
3 1.0 -0.870462
4 1.5 -0.700132
A B
5 2.3 -0.361464
6 2.3 -1.100923
7 2.9 -1.003341
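If the pieces are needed as separate objects rather than just printed, the same grouping can be collected into a list. This is only a sketch: column B is filled with placeholder values instead of random numbers so the result is reproducible, and the name `parts` is made up for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [0, 0.3, 0.8, 1, 1.5, 2.3, 2.3, 2.9],
    "B": np.arange(8, dtype=float),  # placeholder values, not the question's data
})

# Floor division by 1 maps each value of A to the lower edge of its unit
# interval (0.0, 1.0, 2.0, ...), which then serves as the group key.
parts = [group for _, group in df.groupby(df["A"] // 1)]

for part in parts:
    print(part)
```

For bins of another width, dividing by that width instead of 1 should give the analogous split.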

Use for loop to get i:i+1 columns

I want to use a for loop so that each iteration fills a pair of columns of the result. E.g. when i = 1 I get the 1st and 2nd cols, when i = 2 the 3rd and 4th cols.
m= [2 -3;4 6]
al= [1 3;-2 -4]
l=[1 0; 2 4]
Random.seed!(1234)
d= [rand(2:20, 10) rand(-1:10, 10)]
Random.seed!(1234)
c= [rand(0:30, 10) rand(-1:20, 10)]
n,g=size(w)
mx=zeros(n,2*g)
for i =1:g
mx[ : ,i:i+1] = m[:,i]' .+ (d[:,i]' .* al[:,i])' .+ (l[:,i]' .* c[:,i])
end
return mx
I got the following, which is wrong when compared with doing the process manually:
julia> mx
10×4 Array{Float64,2}:
8.0 -6.0 10.0 0.0
22.0 3.0 18.0 0.0
40.0 18.0 38.0 0.0
9.0 9.0 22.0 0.0
51.0 3.0 18.0 0.0
10.0 18.0 34.0 0.0
52.0 12.0 30.0 0.0
36.0 21.0 42.0 0.0
44.0 24.0 42.0 0.0
33.0 24.0 42.0 0.0
The first 2 cols and the last 2 cols of mx should match the following results, in the same order:
mx1_2=m[:,1]' .+ (d[:,1]' .* al[:,1])' .+ (l[:,1]' .* c[:,1])
mx2_4=m[:,2]' .+ (d[:,2]' .* al[:,2])' .+ (l[:,2]' .* c[:,2])
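
One likely cause, judging from the output above: the slice `i:i+1` selects columns 1-2 for i = 1 but columns 2-3 for i = 2, so the second iteration overwrites column 2 and column 4 is never written. With 1-based indexing the non-overlapping pairs would be `2i-1:2i`. The sketch below shows the same idea in Python with NumPy (0-based, so the pair is `2*i:2*i+2`). The helper name `block` is made up for the example, and the random values will not match the Julia output because the generators differ.

```python
import numpy as np

m = np.array([[2, -3], [4, 6]])
al = np.array([[1, 3], [-2, -4]])
l = np.array([[1, 0], [2, 4]])

rng = np.random.default_rng(1234)
d = np.column_stack([rng.integers(2, 21, 10), rng.integers(-1, 11, 10)])
c = np.column_stack([rng.integers(0, 31, 10), rng.integers(-1, 21, 10)])


def block(i):
    """Return the n x 2 block built from column i of each input."""
    # np.outer(d[:, i], al[:, i]) has shape (n, 2); m[:, i] broadcasts over rows.
    return m[:, i] + np.outer(d[:, i], al[:, i]) + np.outer(c[:, i], l[:, i])


n, g = d.shape
mx = np.zeros((n, 2 * g))
for i in range(g):
    # Pair i occupies columns 2*i and 2*i + 1, so consecutive pairs never overlap.
    mx[:, 2 * i:2 * i + 2] = block(i)
```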

Fortran restructure data file

I have a data file with 4 columns:
x y u v
such that x and y are the coordinate positions associated to the values u and v.
The data is structured such that
x y u v
1 1 # #
2 1 # #
3 1 # #
...
However, I would like to restructure the file such that
x y u v
1 1 # #
1 2 # #
1 3 # #
...
Is there a function in Fortran which can achieve this?

Well, I never make claims about "pretty," but it should do the job. Obviously, you will need to check your FORMAT statements:
PROGRAM TEST
REAL*8 :: U(4,4)
REAL*8 :: V(4,4)
INTEGER :: X, Y
DO
READ(*,'(2I2)',ADVANCE='NO',END=10) X,Y
READ(*,'(2F6.1)',ADVANCE='YES',END=10) U(X,Y),V(X,Y)
END DO
10 CONTINUE
WRITE(*,'(2I4,2F10.2)') ((I,J,U(I,J),V(I,J),J=1,4),I=1,4)
END
I'm assuming that your arrays are already allocated properly.
Here's my input file:
$ cat test.in
1 1 5.0 10.0
2 1 1.3 -0.2
3 1 5.1 0.0
4 1 -9.1 3.0
1 2 4.0 2.0
2 2 14.0 -8.0
3 2 -8.0 8.0
4 2 4.0 9.6
1 3 2.0 1.1
2 3 3.4 8.0
3 3 4.0 7.0
4 3 4.0 4.1
1 4 5.5 8.4
2 4 34.1 23.0
3 4 -4.1 4.0
4 4 6.0 8.4
And the output:
$ cat test.in | ./a.out
1 1 5.0 10.0
1 2 4.0 2.0
1 3 2.0 1.1
1 4 5.5 8.4
2 1 1.3 -0.2
2 2 14.0 -8.0
2 3 3.4 8.0
2 4 34.1 23.0
3 1 5.1 0.0
3 2 -8.0 8.0
3 3 4.0 7.0
3 4 -4.1 4.0
4 1 -9.1 3.0
4 2 4.0 9.6
4 3 4.0 4.1
4 4 6.0 8.4
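For comparison, outside Fortran the same restructuring amounts to sorting the rows by x and then by y. A minimal Python sketch, assuming the rows have already been read into tuples (only four rows of the sample input are used, and the name `reordered` is made up for the example):

```python
# (x, y, u, v) rows in the original order: x varies fastest.
rows = [
    (1, 1, 5.0, 10.0),
    (2, 1, 1.3, -0.2),
    (1, 2, 4.0, 2.0),
    (2, 2, 14.0, -8.0),
]

# Sorting on (x, y) makes y vary fastest within each x.
reordered = sorted(rows, key=lambda r: (r[0], r[1]))

for x, y, u, v in reordered:
    print(f"{x:4d}{y:4d}{u:10.2f}{v:10.2f}")
```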

Add or subtract the first value of a column to the rest of the column (MATLAB)

I was wondering how to go about adding or subtracting the first value of my data to/from the rest of the column, so that the first row of data would be 0.
For instance, this:
A = [13.2 12.4 -11.7 6.3 -4.0
14.2 13.1 -9.2 8.2 -4.1
14.4 14.5 -7.6 10.0 -5.1];
Would change to:
0 0 0 0 0
1 0.7 2.5 1.9 0.1
1.2 2.1 4.1 3.7 1.1
I think I can check whether the first number is positive or negative by using sign() and choose whether to add or subtract it with an if/else statement, but I am unsure how to apply this to each column individually (or whether this is the best way!).
Many thanks in advance.

You need an element-by-element operation, which is what bsxfun provides. In your case it should be:
A = [13.2 12.4 -11.7 6.3 -4.0
14.2 13.1 -9.2 8.2 -4.1
14.4 14.5 -7.6 10.0 -5.1];
B=bsxfun(@minus,A,A(1,:))
B =
0 0 0 0 0
1.0000 0.7000 2.5000 1.9000 -0.1000
1.2000 2.1000 4.1000 3.7000 -1.1000
This is the result for your question as described, but judging from the example you added, I assume that you want the absolute values, so you need to add abs:
B=abs(bsxfun(@minus,A,A(1,:)))
B =
0 0 0 0 0
1.0000 0.7000 2.5000 1.9000 0.1000
1.2000 2.1000 4.1000 3.7000 1.1000

You can select the first row and subtract it from the matrix.
A = A - A(1, :)
Or for older versions of Matlab:
A = A - repmat(A(1, :), size(A, 1), 1)
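For readers working in Python rather than MATLAB, the same first-row subtraction can be sketched with NumPy broadcasting. This is only an illustration of the idea above, using the matrix from the question; the names `B` and `B_abs` are made up for the example.

```python
import numpy as np

A = np.array([
    [13.2, 12.4, -11.7, 6.3, -4.0],
    [14.2, 13.1, -9.2, 8.2, -4.1],
    [14.4, 14.5, -7.6, 10.0, -5.1],
])

# The first row is broadcast across all rows, so every column is shifted
# by its own first value and the first row becomes zero.
B = A - A[0, :]

# Absolute differences, matching the expected output in the question.
B_abs = np.abs(B)

print(B)
print(B_abs)
```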

Standard deviation of each row between two columns in R

I have a simple query. I am trying to get the standard deviation of each row across two columns of a data frame (so n=2 for each row; I know it's a small sample size).
It forms part of a longer code but simply:
data$i <- sd(data$x, data$y)^2 + (0.1)^2 / data$j
so my data would look like this:
x y
3 13
4 9
19 3
14 3
18 4
3 10
9 4
3 6
3 8
10 9
8 10
11 9
13 12
15 14
19 16
8 8
8 18
11 14
10 12
18 14
12 20
6 8
and, just using the sd(), I would like to get this:
7.1
3.5
11.3
7.8
9.9
4.9
3.5
2.1
3.5
0.7
1.4
1.4
0.7
0.7
2.1
0.0
7.1
2.1
1.4
2.8
5.7
1.4

To apply sd() across the rows, you would use apply:
apply(data[, c("x","y")],1,sd)
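As a cross-check of the expected numbers, the same row-wise calculation can be sketched in Python with NumPy. R's sd() uses the sample standard deviation (n - 1 in the denominator), which corresponds to `ddof=1` here. Only the first four rows of the question's data are used, and the name `row_sd` is made up for the example.

```python
import numpy as np

# First four (x, y) rows from the question.
data = np.array([
    [3, 13],
    [4, 9],
    [19, 3],
    [14, 3],
])

# axis=1 computes one value per row; ddof=1 gives the sample standard deviation.
row_sd = data.std(axis=1, ddof=1)
print(np.round(row_sd, 1))
```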
