Splitting dataframes on range of values in one column - arrays

Say I have the following dataframe:
import numpy as np
import pandas as pd
df = pd.DataFrame({'A': [0, 0.3, 0.8, 1, 1.5, 2.3, 2.3, 2.9], 'B': np.random.randn(8)})
df
Out[86]:
A B
0 0.0 0.130471
1 0.3 0.029251
2 0.8 0.790972
3 1.0 -0.870462
4 1.5 -0.700132
5 2.3 -0.361464
6 2.3 -1.100923
7 2.9 -1.003341
How can I split this DataFrame by ranges of the values in column A (0 <= A < 1, 1 <= A < 2, etc.) to get the following?
A B
0 0.0 0.130471
1 0.3 0.029251
2 0.8 0.790972
A B
3 1.0 -0.870462
4 1.5 -0.700132
A B
5 2.3 -0.361464
6 2.3 -1.100923
7 2.9 -1.003341
I know that np.array_split can split it into chunks with (roughly) equal numbers of rows:
np.array_split(df, 3)
but is there something similar that lets me split on a condition like this instead?

Try groupby with floor division (A // 1 maps each value to the integer at the start of its unit-wide range):
for k, d in df.groupby(df['A'] // 1):
    print(d)
Output:
A B
0 0.0 0.130471
1 0.3 0.029251
2 0.8 0.790972
A B
3 1.0 -0.870462
4 1.5 -0.700132
A B
5 2.3 -0.361464
6 2.3 -1.100923
7 2.9 -1.003341
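Floor division works because the ranges here are all one unit wide. For arbitrary edges, a minimal sketch with pd.cut should give the same grouping; the edge list below is an assumption chosen to reproduce the question's ranges, and right=False makes each bin closed on the left (0 <= A < 1).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [0, 0.3, 0.8, 1, 1.5, 2.3, 2.3, 2.9],
                   'B': np.random.randn(8)})

# Assumed bin edges: [0, 1), [1, 2), [2, 3)
edges = [0, 1, 2, 3]
bins = pd.cut(df['A'], bins=edges, right=False)

# observed=True keeps only the bins that actually contain rows
parts = [group for _, group in df.groupby(bins, observed=True)]

for part in parts:
    print(part)
```

Collecting the pieces into a list keeps them available after the loop instead of only printing them.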

Related

Merge 2 text files with the same first column

I need to merge these two files:
File1
1
1
2
2
2
3
4
4
4
File2
1 A 0.2 0.8 0.3
2 B 0.4 0.3 0.2
3 C 0.8 0.9 0.5
4 D 0.6 0.7 0.8
The output should be:
1 A 0.2 0.8 0.3
1 A 0.2 0.8 0.3
2 B 0.4 0.3 0.2
2 B 0.4 0.3 0.2
2 B 0.4 0.3 0.2
3 C 0.8 0.9 0.5
4 D 0.6 0.7 0.8
4 D 0.6 0.7 0.8
4 D 0.6 0.7 0.8
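If you are using Python and pandas, this is not too difficult:
import pandas as pd
# Both files are whitespace-separated and have no header row
d1 = pd.read_csv('doc1.txt', sep=r'\s+', header=None)
d2 = pd.read_csv('doc2.txt', sep=r'\s+', header=None)
# A left join on column 0 keeps every line of the first file, in order
data = d1.merge(d2, on=[0], how='left')
print(data)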
If the second file has no line for a key that appears in the first file, the merged row will contain NaN values. If you don't want that, change the join type, for example how='inner' to drop such rows.
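To check the join behaviour without real files, the sketch below feeds the question's data through io.StringIO (a stand-in for doc1.txt and doc2.txt, assumed to be whitespace-separated with no header) and writes the result back out in the original space-separated layout.

```python
import io
import pandas as pd

# Stand-ins for File1 and File2 (assumption: whitespace-separated, no header row)
file1 = io.StringIO("1\n1\n2\n2\n2\n3\n4\n4\n4\n")
file2 = io.StringIO("1 A 0.2 0.8 0.3\n2 B 0.4 0.3 0.2\n"
                    "3 C 0.8 0.9 0.5\n4 D 0.6 0.7 0.8\n")

d1 = pd.read_csv(file1, sep=r"\s+", header=None)
d2 = pd.read_csv(file2, sep=r"\s+", header=None)

# Left join: every line of File1 survives; keys missing from File2 would get NaN
merged = d1.merge(d2, on=0, how="left")

# Inner join: File1 lines whose key is absent from File2 would be dropped
inner = d1.merge(d2, on=0, how="inner")

# Serialize back to space-separated text without pandas' index or header
text = merged.to_csv(sep=" ", header=False, index=False)
print(text)
```

Here every key in File1 exists in File2, so both joins give the same nine rows; they only differ when a key is missing.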

Comparing two columns and summing the values in Matlab

I have 2 columns like this:
0.0 1.2
0.0 2.3
0.0 1.5
0.1 1.0
0.1 1.2
0.1 1.4
0.1 1.7
0.4 1.1
0.4 1.3
0.4 1.5
In the first column, 0.0 is repeated 3 times, and I want to sum the corresponding
elements in the second column (1.2 + 2.3 + 1.5). Similarly, 0.1 is repeated 4 times
in the first column, and I want to sum its corresponding elements (1.0 + 1.2 + 1.4 + 1.7),
and so on.
This is what I have tried so far:
for i = 1:length(col1)
    for j = 1:length(col2)
        if col2(j) == col1(i)
            % to do
        end
    end
end
This is a classic use of unique and accumarray:
x = [0.0 1.2
0.0 2.3
0.0 1.5
0.1 1.0
0.1 1.2
0.1 1.4
0.1 1.7
0.4 1.1
0.4 1.3
0.4 1.5]; % data
[~, ~, w] = unique(x(:,1)); % labels of unique elements
result = accumarray(w, x(:,2)); % sum using the above as grouping variable
You can also use the newer splitapply function instead of accumarray:
[~, ~, w] = unique(x(:,1)); % labels of unique elements
result = splitapply(@sum, x(:,2), w); % sum using the above as grouping variable
Alternatively, with an explicit loop:
a=[0.0 1.2
0.0 2.3
0.0 1.5
0.1 1.0
0.1 1.2
0.1 1.4
0.1 1.7
0.4 1.1
0.4 1.3
0.4 1.5]
% Get unique col1 values, and indices
[uniq,~,ib]=unique(a(:,1));
s = zeros(1, length(uniq)); % preallocate the result
% for each unique value in col1
for ii=1:length(uniq)
    % sum all col2 values whose label matches the current unique value
    s(ii)=sum(a(ib==ii,2));
end
Gives:
s =
5.0000 5.3000 3.9000
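For readers doing the same thing in Python, a rough NumPy analogue of the unique/accumarray pattern is np.unique with return_inverse=True followed by np.bincount with weights. This is a sketch of the equivalent technique, not part of the MATLAB answers.

```python
import numpy as np

x = np.array([[0.0, 1.2], [0.0, 2.3], [0.0, 1.5],
              [0.1, 1.0], [0.1, 1.2], [0.1, 1.4], [0.1, 1.7],
              [0.4, 1.1], [0.4, 1.3], [0.4, 1.5]])

# keys: sorted unique values of column 1
# w: 0-based group label of each row (like the third output of MATLAB's unique)
keys, w = np.unique(x[:, 0], return_inverse=True)

# bincount with weights sums column 2 within each label (the accumarray step)
sums = np.bincount(w, weights=x[:, 1])
print(keys, sums)
```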

SQL Server : adding rows for each row?

I have a table in SQL Server like this:
Col1 Col2 Col3
----- ---- -----
1 1 1
0.5 0.5 2
0.3 0.1 3
For each value in Col3 (1, 2, 3), I would like to add a fourth column, Col4, that contains the numbers 1 to 53 in sequence. So, something like:
Col1 Col2 Col3 Col4
----- ---- ----- ------
1 1 1 1
1 1 1 2
1 1 1 3
And so forth.
How could I accomplish this in T-SQL / Microsoft SQL Server 2016?
Thanks!
Are these the results you're trying to get?
IF OBJECT_ID('tempdb..#TestData', 'U') IS NOT NULL
DROP TABLE #TestData;
CREATE TABLE #TestData (
Col1 DECIMAL(9,1) NOT NULL,
Col2 DECIMAL(9,1) NOT NULL,
Col3 INT NOT NULL
);
INSERT #TestData (Col1, Col2, Col3) VALUES
(1, 1 ,1), (0.5,0.5,2), (0.3,0.1,3);
SELECT
td.Col1, td.Col2, td.Col3, Col4 = t.n
FROM
#TestData td
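-- dbo.tfn_Tally is a user-defined tally (numbers) function, not part of SQL Server;
-- it is assumed to return one row per n from 1 to 53. Any numbers source works here.
CROSS APPLY dbo.tfn_Tally(53, 1) t;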
Results...
Col1 Col2 Col3 Col4
----- ----- ---- -----
1.0 1.0 1 1
0.5 0.5 2 1
0.3 0.1 3 1
1.0 1.0 1 2
0.5 0.5 2 2
0.3 0.1 3 2
1.0 1.0 1 3
0.5 0.5 2 3
0.3 0.1 3 3
1.0 1.0 1 4
0.5 0.5 2 4
0.3 0.1 3 4
1.0 1.0 1 5
0.5 0.5 2 5
0.3 0.1 3 5
1.0 1.0 1 6
0.5 0.5 2 6
0.3 0.1 3 6
1.0 1.0 1 7
0.5 0.5 2 7
0.3 0.1 3 7
1.0 1.0 1 8
0.5 0.5 2 8
0.3 0.1 3 8
1.0 1.0 1 9
0.5 0.5 2 9
0.3 0.1 3 9
1.0 1.0 1 10
0.5 0.5 2 10
0.3 0.1 3 10
1.0 1.0 1 11
0.5 0.5 2 11
0.3 0.1 3 11
1.0 1.0 1 12
0.5 0.5 2 12
0.3 0.1 3 12
1.0 1.0 1 13
0.5 0.5 2 13
0.3 0.1 3 13
1.0 1.0 1 14
0.5 0.5 2 14
0.3 0.1 3 14
1.0 1.0 1 15
0.5 0.5 2 15
0.3 0.1 3 15
1.0 1.0 1 16
0.5 0.5 2 16
0.3 0.1 3 16
1.0 1.0 1 17
0.5 0.5 2 17
0.3 0.1 3 17
1.0 1.0 1 18
0.5 0.5 2 18
0.3 0.1 3 18
1.0 1.0 1 19
0.5 0.5 2 19
0.3 0.1 3 19
1.0 1.0 1 20
0.5 0.5 2 20
0.3 0.1 3 20
1.0 1.0 1 21
0.5 0.5 2 21
0.3 0.1 3 21
1.0 1.0 1 22
0.5 0.5 2 22
0.3 0.1 3 22
1.0 1.0 1 23
0.5 0.5 2 23
0.3 0.1 3 23
1.0 1.0 1 24
0.5 0.5 2 24
0.3 0.1 3 24
1.0 1.0 1 25
0.5 0.5 2 25
0.3 0.1 3 25
1.0 1.0 1 26
0.5 0.5 2 26
0.3 0.1 3 26
1.0 1.0 1 27
0.5 0.5 2 27
0.3 0.1 3 27
1.0 1.0 1 28
0.5 0.5 2 28
0.3 0.1 3 28
1.0 1.0 1 29
0.5 0.5 2 29
0.3 0.1 3 29
1.0 1.0 1 30
0.5 0.5 2 30
0.3 0.1 3 30
1.0 1.0 1 31
0.5 0.5 2 31
0.3 0.1 3 31
1.0 1.0 1 32
0.5 0.5 2 32
0.3 0.1 3 32
1.0 1.0 1 33
0.5 0.5 2 33
0.3 0.1 3 33
1.0 1.0 1 34
0.5 0.5 2 34
0.3 0.1 3 34
1.0 1.0 1 35
0.5 0.5 2 35
0.3 0.1 3 35
1.0 1.0 1 36
0.5 0.5 2 36
0.3 0.1 3 36
1.0 1.0 1 37
0.5 0.5 2 37
0.3 0.1 3 37
1.0 1.0 1 38
0.5 0.5 2 38
0.3 0.1 3 38
1.0 1.0 1 39
0.5 0.5 2 39
0.3 0.1 3 39
1.0 1.0 1 40
0.5 0.5 2 40
0.3 0.1 3 40
1.0 1.0 1 41
0.5 0.5 2 41
0.3 0.1 3 41
1.0 1.0 1 42
0.5 0.5 2 42
0.3 0.1 3 42
1.0 1.0 1 43
0.5 0.5 2 43
0.3 0.1 3 43
1.0 1.0 1 44
0.5 0.5 2 44
0.3 0.1 3 44
1.0 1.0 1 45
0.5 0.5 2 45
0.3 0.1 3 45
1.0 1.0 1 46
0.5 0.5 2 46
0.3 0.1 3 46
1.0 1.0 1 47
0.5 0.5 2 47
0.3 0.1 3 47
1.0 1.0 1 48
0.5 0.5 2 48
0.3 0.1 3 48
1.0 1.0 1 49
0.5 0.5 2 49
0.3 0.1 3 49
1.0 1.0 1 50
0.5 0.5 2 50
0.3 0.1 3 50
1.0 1.0 1 51
0.5 0.5 2 51
0.3 0.1 3 51
1.0 1.0 1 52
0.5 0.5 2 52
0.3 0.1 3 52
1.0 1.0 1 53
0.5 0.5 2 53
0.3 0.1 3 53
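You'll have to generate a numbers table on the fly, for example with a recursive CTE:
WITH nums AS (
    SELECT 1 AS num
    UNION ALL
    SELECT num + 1 FROM nums
    WHERE num < 53   -- stops at 53; the default MAXRECURSION of 100 is enough here
)
SELECT yourtable.*, nums.num AS Col4
FROM yourtable
CROSS JOIN nums;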
You can use the code below. There are many ways to generate a sequence (you can store it in a temp table or use a CTE); this one reads it from master..spt_values.
CREATE TABLE temp
(
Col1 DECIMAL(10,1),
Col2 DECIMAL(10,1),
Col3 INT
)
INSERT INTO temp
VALUES
(1,1,1)
,(0.5,0.5,2)
,(0.3,0.1,3)
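DECLARE @Start INT = 1
      , @End INT = 53
SELECT
    t.*
  , seq.n AS Col4
FROM temp t
CROSS APPLY
(
    -- spt_values contains several number series; DISTINCT removes the duplicates
    SELECT DISTINCT n = number
    FROM master..[spt_values]
    WHERE number BETWEEN @Start AND @End
) seq
ORDER BY t.Col3, seq.n; -- without ORDER BY, row order is not guaranteed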
RESULT:
Col1 Col2 Col3 Col4
--------------------------------------- --------------------------------------- ----------- -----------
1.0 1.0 1 1
1.0 1.0 1 2
1.0 1.0 1 3
1.0 1.0 1 4
1.0 1.0 1 5
1.0 1.0 1 6
1.0 1.0 1 7
1.0 1.0 1 8
1.0 1.0 1 9
1.0 1.0 1 10
1.0 1.0 1 11
1.0 1.0 1 12
1.0 1.0 1 13
1.0 1.0 1 14
1.0 1.0 1 15
1.0 1.0 1 16
1.0 1.0 1 17
1.0 1.0 1 18
1.0 1.0 1 19
1.0 1.0 1 20
1.0 1.0 1 21
1.0 1.0 1 22
1.0 1.0 1 23
1.0 1.0 1 24
1.0 1.0 1 25
1.0 1.0 1 26
1.0 1.0 1 27
1.0 1.0 1 28
1.0 1.0 1 29
1.0 1.0 1 30
1.0 1.0 1 31
1.0 1.0 1 32
1.0 1.0 1 33
1.0 1.0 1 34
1.0 1.0 1 35
1.0 1.0 1 36
1.0 1.0 1 37
1.0 1.0 1 38
1.0 1.0 1 39
1.0 1.0 1 40
1.0 1.0 1 41
1.0 1.0 1 42
1.0 1.0 1 43
1.0 1.0 1 44
1.0 1.0 1 45
1.0 1.0 1 46
1.0 1.0 1 47
1.0 1.0 1 48
1.0 1.0 1 49
1.0 1.0 1 50
1.0 1.0 1 51
1.0 1.0 1 52
1.0 1.0 1 53
0.5 0.5 2 1
0.5 0.5 2 2
0.5 0.5 2 3
0.5 0.5 2 4
0.5 0.5 2 5
0.5 0.5 2 6
and so on...
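The same idea (a cross join against a numbers table) can be sketched in pandas, assuming version 1.2 or later for merge(how="cross"); the table contents are copied from the question.

```python
import pandas as pd

table = pd.DataFrame({"Col1": [1.0, 0.5, 0.3],
                      "Col2": [1.0, 0.5, 0.1],
                      "Col3": [1, 2, 3]})

# The "numbers table": 1..53 inclusive
nums = pd.DataFrame({"Col4": range(1, 54)})

# Cross join: every row of table paired with every number (3 * 53 rows)
result = table.merge(nums, how="cross")

# Order like the last answer's output: all 53 numbers for Col3 = 1 first
result = result.sort_values(["Col3", "Col4"], ignore_index=True)
print(result.head())
```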

Fortran restructure data file

I have a data file with 4 columns:
x y u v
where x and y are the coordinate positions associated with the values u and v.
The data is ordered so that x varies fastest:
x y u v
1 1 # #
2 1 # #
3 1 # #
...
However, I would like to reorder the file so that y varies fastest:
x y u v
1 1 # #
1 2 # #
1 3 # #
...
Is there a function in Fortran that can do this?
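Well, I never make claims about "pretty," but it should do the job. Reading with list-directed input (READ(*,*)) avoids having to match column widths; adjust the output FORMAT to suit your data:
PROGRAM TEST
  IMPLICIT NONE
  DOUBLE PRECISION :: U(4,4), V(4,4)
  DOUBLE PRECISION :: UU, VV
  INTEGER :: X, Y, I, J
  ! Read "x y u v" lines until end of file, storing u and v at grid position (x, y)
  DO
    READ(*,*,END=10) X, Y, UU, VV
    U(X,Y) = UU
    V(X,Y) = VV
  END DO
10 CONTINUE
  ! The inner implied DO runs over J (y), so y varies fastest in the output
  WRITE(*,'(2I4,2F10.1)') ((I,J,U(I,J),V(I,J),J=1,4),I=1,4)
END PROGRAM TEST
This assumes a 4 x 4 grid with every (x, y) pair present; for other sizes, change the array dimensions and loop bounds (or use allocatable arrays).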
Here's my input file:
$ cat test.in
1 1 5.0 10.0
2 1 1.3 -0.2
3 1 5.1 0.0
4 1 -9.1 3.0
1 2 4.0 2.0
2 2 14.0 -8.0
3 2 -8.0 8.0
4 2 4.0 9.6
1 3 2.0 1.1
2 3 3.4 8.0
3 3 4.0 7.0
4 3 4.0 4.1
1 4 5.5 8.4
2 4 34.1 23.0
3 4 -4.1 4.0
4 4 6.0 8.4
And the output:
$ cat test.in | ./a.out
1 1 5.0 10.0
1 2 4.0 2.0
1 3 2.0 1.1
1 4 5.5 8.4
2 1 1.3 -0.2
2 2 14.0 -8.0
2 3 3.4 8.0
2 4 34.1 23.0
3 1 5.1 0.0
3 2 -8.0 8.0
3 3 4.0 7.0
3 4 -4.1 4.0
4 1 -9.1 3.0
4 2 4.0 9.6
4 3 4.0 4.1
4 4 6.0 8.4
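If the grid size is not known in advance, a sort-based approach avoids fixed-size arrays altogether. A minimal Python sketch, assuming whitespace-separated x y u v lines with an optional header; the sample is the four lines of test.in above with x, y <= 2. Sorting on (x, y) makes y vary fastest.

```python
import io

# Sample in the question's layout (assumption: whitespace-separated "x y u v")
data = io.StringIO("""x y u v
1 1 5.0 10.0
2 1 1.3 -0.2
1 2 4.0 2.0
2 2 14.0 -8.0
""")

rows = []
for line in data:
    fields = line.split()
    # Skip blank lines and a header such as "x y u v"
    if not fields or not fields[0].lstrip("-").isdigit():
        continue
    x, y = int(fields[0]), int(fields[1])
    # Keep u and v as strings so their original formatting is preserved
    rows.append((x, y, fields[2], fields[3]))

# Sort by x first, then y, so y varies fastest (the target layout)
rows.sort(key=lambda r: (r[0], r[1]))

for x, y, u, v in rows:
    print(x, y, u, v)
```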

Summing over rows of a matrix in Matlab with the same index

I have an h x k matrix A in MATLAB whose last column holds, for each row i, an index A(i,k) from {1, 2, ..., s}, with s <= h. Indices can repeat across rows. I want an s x (k-1) matrix B whose row j is the sum of the rows of A(:,1:k-1) that have index j. For example, if
A = [0.4 5 6 0.3 1;
0.6 -0.7 3 2 2;
0.3 4.5 6 8.9 1;
0.9 0.8 0.7 3 3;
0.7 0.8 0.9 0.5 2]
the result should be
B = [0.7 9.5 12 9.2;
1.3 0.1 3.9 2.5;
0.9 0.8 0.7 3]
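You'd need a multi-column version of accumarray. Failing that, you can use sparse, which adds up entries that share the same (row, column) subscripts:
[m, n] = size(A);
rows = reshape(repmat(1:m, n-1, 1), [], 1); % row of A for each element: 1,1,...,2,2,... (integer arithmetic, no rounding issues)
cols = repmat((1:n-1).', m, 1);             % column of each element: 1..n-1, once per row of A
vals = A(:,1:end-1).';                      % transpose so that vals(:) lists A row by row
B = full(sparse(A(rows,end), cols, vals(:)));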
Alternatively, with arrayfun:
cell2mat(arrayfun(@(x) sum(A(A(:,end)==x,1:end-1),1), unique(A(:,end)), 'UniformOutput', false))
The key point is selecting the rows A(A(:,end)==x,1:end-1), where x is each unique element of A(:,end).
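In NumPy, the "multi-column accumarray" mentioned above can be approximated with np.add.at, which accumulates without buffering so every repeated index is summed. A sketch, assuming the last column holds 1-based integer indices running from 1 to s:

```python
import numpy as np

A = np.array([[0.4, 5.0, 6.0, 0.3, 1],
              [0.6, -0.7, 3.0, 2.0, 2],
              [0.3, 4.5, 6.0, 8.9, 1],
              [0.9, 0.8, 0.7, 3.0, 3],
              [0.7, 0.8, 0.9, 0.5, 2]])

idx = A[:, -1].astype(int) - 1   # 1-based MATLAB-style index -> 0-based row of B
s = idx.max() + 1                # number of groups (assumes indices run 1..s)
B = np.zeros((s, A.shape[1] - 1))

# Unbuffered in-place add: rows that share an index are all accumulated
np.add.at(B, idx, A[:, :-1])
print(B)
```

A plain fancy-index update such as B[idx] += A[:, :-1] would keep only one contribution per repeated index, which is why np.add.at is used here.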
