Adding new column to pandas dataframe with same element multiple times [duplicate]


This question already has an answer here:
Pandas: how to create a simple counter that increases each n rows?
(1 answer)
Closed 1 year ago.
I have a dataframe which looks like this:
import pandas as pd
df = pd.DataFrame({
'SENDER_ID': [1,2,3,4,5,6,7,8,9,10,11,12] })
df =
SENDER_ID
0 1
1 2
2 3
3 4
4 5
5 6
6 7
7 8
8 9
9 10
10 11
11 12
Now I want to add a column, counter, in which each number appears three times in a row:
SENDER_ID counter
0 1 0
1 2 0
2 3 0
3 4 1
4 5 1
5 6 1
6 7 2
7 8 2
8 9 2
9 10 3
10 11 3
11 12 3
The dataframe always has a length that is a multiple of 3, and it is much larger than in this simple example.
What is the easiest and most generic way to add this new column?

Another way using pd.RangeIndex:
df['count'] = pd.RangeIndex(0, len(df)//3).repeat(3)
print(df)
# Output:
SENDER_ID count
0 1 0
1 2 0
2 3 0
3 4 1
4 5 1
5 6 1
6 7 2
7 8 2
8 9 2
9 10 3
10 11 3
11 12 3
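A further sketch, not taken from the original answers: integer-divide each row's position by the block size. It uses np.arange rather than df.index, so it does not depend on the index labels. Unlike RangeIndex(...).repeat(3), it also does not require the length to be a multiple of 3; the last block is simply shorter. The block size n is a parameter introduced here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'SENDER_ID': range(1, 13)})

n = 3  # block size (hypothetical parameter name)
# Row positions 0-2 map to 0, 3-5 map to 1, and so on.
df['count'] = np.arange(len(df)) // n
print(df['count'].tolist())
```

For a frame of 7 rows, the same expression gives 0, 0, 0, 1, 1, 1, 2.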

I think I found a solution that works:
max_list_length = int(len(df) / 3)
liste = [[n]*3 for n in range(0, max_list_length)]
value = sum(liste, [])
# value == [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]
for n in range(0, len(df)):
    df.at[n, 'counter'] = value[n]
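The list-building idea above can be kept with two changes. First, itertools.chain.from_iterable flattens the blocks in a single pass; sum(list_of_lists, []) copies the growing list at every step. Second, the whole list is assigned at once instead of cell by cell. This is a sketch: the block size n and the use of itertools are choices made here, not part of the original post.

```python
from itertools import chain, repeat

import pandas as pd

df = pd.DataFrame({'SENDER_ID': range(1, 13)})

n = 3  # block size (assumed; the question uses 3)
n_blocks = len(df) // n
# repeat(k, n) yields k exactly n times; chain.from_iterable concatenates the blocks.
value = list(chain.from_iterable(repeat(k, n) for k in range(n_blocks)))
# Assigning the full list at once creates an integer column in one step.
df['counter'] = value
print(value)
```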

Related

How can I count number of occurrences of unique row in MATLAB ?

I have a matrix like the following:
A =
1 2 3
4 5 6
7 8 9
10 11 12
4 5 6
7 8 9
4 5 6
1 2 3
I could extract the unique rows of this matrix using the command A_unique = unique(A,'rows'), with the following result:
A_unique =
1 2 3
4 5 6
7 8 9
10 11 12
I need to find the number of times each row occurs in the original matrix A,
something like the following:
A_unique_count =
2
3
2
1
How can I count the occurrences of each unique row? Can anyone help? Thanks in advance.
Manu

The third output of unique gives you the index of the unique row in the original array. You can use this with accumarray to count the number of occurrences.
For example:
A = [1 2 3
4 5 6
7 8 9
10 11 12
4 5 6
7 8 9
4 5 6
1 2 3];
[uniquerow, ~, rowidx] = unique(A, 'rows');
noccurrences = accumarray(rowidx, 1)
Returns:
noccurrences =
2
3
2
1
As expected
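For comparison, here is a NumPy translation of the unique plus accumarray idea (a sketch; NumPy is not part of the original thread). np.unique with axis=0 treats each row as one element, like unique(A, 'rows'). return_inverse plays the role of MATLAB's third output, and np.bincount is the direct analogue of accumarray(rowidx, 1).

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12],
              [4, 5, 6], [7, 8, 9], [4, 5, 6], [1, 2, 3]])

# Unique rows (sorted), row-to-unique mapping, and per-row counts in one call.
unique_rows, rowidx, counts = np.unique(A, axis=0, return_inverse=True, return_counts=True)
# Literal analogue of accumarray(rowidx, 1); ravel guards against a 2D inverse array.
noccurrences = np.bincount(rowidx.ravel())
print(noccurrences)
```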

I would recommend @excaza's approach, but just for variety:
A_unique_count = diff([0; find([any(diff(sortrows(A), [], 1), 2); 1])]);
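The one-liner works in three steps: sort the rows, mark where consecutive sorted rows differ, then difference the positions of those group ends. The sketch below spells out the same steps in NumPy (a translation, not the original answer's code). np.lexsort sorts by its last key first, so the columns are passed in reverse to make the first column the primary key.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12],
              [4, 5, 6], [7, 8, 9], [4, 5, 6], [1, 2, 3]])

# Lexicographic row sort, the equivalent of sortrows(A).
S = A[np.lexsort(A.T[::-1])]
# True where a row differs from the next one, i.e. a group ends there;
# the appended True closes the last group (the "; 1" in the MATLAB code).
ends = np.append(np.any(np.diff(S, axis=0) != 0, axis=1), True)
# 1-based end positions, differenced against a leading 0, give the group sizes.
counts = np.diff(np.concatenate(([0], np.flatnonzero(ends) + 1)))
print(counts)
```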

J: Coordinates with specific value

Let's say we have an array:
0 1 2 3 4 5 8 7 8 9
There are two indexes that have value 8:
(i.10) ([#~8={) 0 1 2 3 4 5 8 7 8 9
6 8
Is there a shorter way to get this result? Maybe some built-in verb?
More importantly, what about higher dimensions?
Let's say we have a 4x5 matrix (4 rows, 5 columns):
1 2 3 4 5
2 3 4 5 6
3 4 5 6 7
4 5 6 7 8
I want to find out what are coordinates with value 6.
I want to get a result like this (there are three coordinates):
4 1
3 2
2 3
It's a pretty basic task, and I think there should be a simple solution.
And what about the same in three dimensions?
Thank you

Using sparse array functionality ($.) gives a very fast and lean solution that also works for multiple dimensions:
]a=: 5 ]\ 1 + i. 8
1 2 3 4 5
2 3 4 5 6
3 4 5 6 7
4 5 6 7 8
6 = a
0 0 0 0 0
0 0 0 0 1
0 0 0 1 0
0 0 1 0 0
4 $. $. 6 = a
1 4
2 3
3 2
Tacitly:
getCoords=: 4 $. $.
getCoords 6 = a ,: a
0 1 4
0 2 3
0 3 2
1 1 4
1 2 3
1 3 2
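For readers coming from NumPy, np.argwhere is the closest analogue to 4 $. $. (this is a translation, not J). It returns one row of 0-based indices per match and works for any number of dimensions. Stacking the matrix twice mirrors the J example 6 = a ,: a.

```python
import numpy as np

# Same 4x5 matrix as the J example: row i holds i+1 .. i+5.
a = np.array([np.arange(1, 6) + i for i in range(4)])

coords = np.argwhere(a == 6)                  # 2D: (row, col) pairs
coords3 = np.argwhere(np.stack([a, a]) == 6)  # 3D: (plane, row, col) triples
print(coords.tolist())
```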

Verb indices I. almost does the job.
When you have a simple list, I.'s use is straightforward:
I. 8 = 0 1 2 3 4 5 8 7 8 9
6 8
For higher-rank arrays, you can pair it with antibase (#:) to convert the flat indices into coordinates, using the shape of the matrix ($a) as the base. E.g.:
]a =: 4 5 $ 1 2 3 4 5 2 3 4 5 6 3 4 5 6 7 4 5 6 7 8
1 2 3 4 5
2 3 4 5 6
3 4 5 6 7
4 5 6 7 8
I. 6 = ,a
9 13 17
($a) #: 9 13 17
1 4
2 3
3 2
Similarly, for any number of dimensions: flatten (,), compare (=), get indices (I.) and convert coordinates (($a)&#:):
]coords =: ($a) #: I. 5 = , a =: ? 5 6 7 $ 10
0 0 2
0 2 1
0 2 3
...
(<"1 coords) { a
5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
By the way, you can write I. x = y as x (I.@:=) y for extra performance; this is special code for finding the indices where x f y holds.
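The flatten, compare, get indices, convert coordinates recipe maps directly onto NumPy, sketched here for comparison. ravel corresponds to ",", == to "=", np.flatnonzero to "I.", and np.unravel_index with the array's shape to "($a) #:". The random 5x6x7 array (seed 0) is an illustrative stand-in for ? 5 6 7 $ 10.

```python
import numpy as np

a = np.array([[1, 2, 3, 4, 5],
              [2, 3, 4, 5, 6],
              [3, 4, 5, 6, 7],
              [4, 5, 6, 7, 8]])

flat = np.flatnonzero(a.ravel() == 6)                       # like I. 6 = ,a
coords = np.column_stack(np.unravel_index(flat, a.shape))   # like ($a) #: ...

# Any number of dimensions: random integers 0-9 in a 5x6x7 array.
b = np.random.default_rng(0).integers(0, 10, size=(5, 6, 7))
coords3 = np.column_stack(np.unravel_index(np.flatnonzero(b.ravel() == 5), b.shape))
picked = b[tuple(coords3.T)]                                # like (<"1 coords) { a
print(flat.tolist(), coords.tolist())
```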

Sum row vectors IF two or more rows in given column match (MATLAB)

I have a 48x202 matrix where the first column is an ID and the remaining columns are vectors related to that row's ID.
The ID column is sorted in ascending order, and multiple rows can have the same ID.
I want to combine all rows with equal IDs, meaning that I want to sum the rows of the matrix that have an identical ID in the first column.
The resulting matrix should be 32x202, since there are only 32 IDs.
Any ideas?

I'd totally approach this with accumarray as well as unique. Like the previous answer, let A be your matrix. You would obtain your answer thusly:
[vals,~,id] = unique(A(:,1),'stable');
B = accumarray(id, (1:numel(id)).', [], @(x) {sum(A(x,2:end),1)});
out = [vals cell2mat(B)];
The first line of code produces vals, a list of all unique IDs seen in the first column of A, and id, which assigns every row a unique integer ID without any gaps, from 1 up to the number of unique IDs in the first column of A. We need id for the next line of code.
How accumarray works is that you provide a set of keys and a set of values associated with each key. accumarray groups all values that belong to the same key and does something to all of those values. In our case, the keys are the integer IDs in id (one per unique value in the first column of A), and the values are the actual row locations of the matrix A, from 1 up to the number of rows of A. The default behaviour is to sum all of the values that belong to the same key, but we're going to do something a bit different. For each unique ID seen in the first column of A, there will be a bunch of row locations that map to that ID. We use these row locations to access the matrix A and sum those rows over all columns from the second column to the end. That's what the anonymous function in the fourth argument of accumarray is doing. accumarray normally outputs a single value representing all of the values mapped to a key, but we get around this by outputting a single cell, where each cell contains the sum of the mapped rows.
Each element of B gives you the row sum for the corresponding unique value in vals, so the last line of code pieces these together: each unique value in vals next to its row sum. I had to use cell2mat because B is a cell array, and it has to be converted into a numerical matrix to complete the task.
Here's an example seeing this in action. I'm going to do this for a smaller set of data:
>> rng(123);
>> A = [[1;1;1;2;2;2;2;3;3;4;4;5;6;7] randi(10, 14, 10)];
>> A
A =
1 7 4 3 4 5 1 10 3 2 3
1 3 8 7 5 7 9 9 4 9 6
1 3 2 1 9 9 7 4 6 4 9
2 6 2 5 3 6 8 1 7 6 4
2 8 6 5 5 7 1 4 2 6 8
2 5 6 5 10 6 6 4 2 6 2
2 10 7 5 6 7 6 8 4 1 7
3 7 9 4 7 7 2 10 7 10 9
3 5 8 5 2 9 2 4 9 10 10
4 4 7 9 9 1 7 8 6 3 1
4 4 8 10 7 8 4 6 9 3 5
5 8 4 6 6 3 7 7 4 6 3
6 5 4 7 4 2 6 2 4 10 5
7 1 3 2 4 6 4 4 4 10 6
The first column is our IDs, and the next columns are the data. Running the above code I just wrote, we get:
>> out
out =
1 13 14 11 18 21 17 23 13 15 18
2 29 21 20 24 26 21 17 15 19 21
3 12 17 9 9 16 4 14 16 20 19
4 8 15 19 16 9 11 14 15 6 6
5 8 4 6 6 3 7 7 4 6 3
6 5 4 7 4 2 6 2 4 10 5
7 1 3 2 4 6 4 4 4 10 6
If you double-check each row, you'll see that the rows sharing an ID have been summed column by column. For example, the first three rows map to ID 1, so they are summed to give the first row of the output: the second column is 7+3+3=13, the third column is 4+8+2=14, etc.
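The accumarray approach can be sketched in NumPy for comparison (a translation, not the answer's code). np.unique with return_inverse provides the gap-free group labels. np.add.at then accumulates each row's data into its group's row; it is unbuffered, so repeated labels add up as they do in accumarray. np.unique sorts the IDs, which matches 'stable' here only because the IDs are already sorted. The matrix keeps the ID column and the first two data columns of the 14-row example above.

```python
import numpy as np

# ID column plus the first two data columns of the 14-row example.
A = np.array([[1, 7, 4], [1, 3, 8], [1, 3, 2],
              [2, 6, 2], [2, 8, 6], [2, 5, 6], [2, 10, 7],
              [3, 7, 9], [3, 5, 8],
              [4, 4, 7], [4, 4, 8],
              [5, 8, 4], [6, 5, 4], [7, 1, 3]])

vals, idx = np.unique(A[:, 0], return_inverse=True)
sums = np.zeros((len(vals), A.shape[1] - 1), dtype=A.dtype)
np.add.at(sums, idx, A[:, 1:])   # add row j's data into sums[idx[j]]
out = np.column_stack([vals, sums])
print(out)
```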

Another approach is to apply unique and then use bsxfun to build a matrix that multiplied by the non-ID part of the input matrix will give the result.
Let the input matrix be denoted as A. Then:
[u, ~, v] = unique(A(:,1));
result = [ u bsxfun(@eq, u, u(v).') * A(:,2:end) ];
Example: borrowing from @rayryeng's answer, let
A = [ 1 7 4 3 4 5 1 10 3 2 3
1 3 8 7 5 7 9 9 4 9 6
1 3 2 1 9 9 7 4 6 4 9
2 6 2 5 3 6 8 1 7 6 4
2 8 6 5 5 7 1 4 2 6 8
2 5 6 5 10 6 6 4 2 6 2
2 10 7 5 6 7 6 8 4 1 7
3 7 9 4 7 7 2 10 7 10 9
3 5 8 5 2 9 2 4 9 10 10
4 4 7 9 9 1 7 8 6 3 1
4 4 8 10 7 8 4 6 9 3 5
5 8 4 6 6 3 7 7 4 6 3
6 5 4 7 4 2 6 2 4 10 5
7 1 3 2 4 6 4 4 4 10 6 ];
Then the result is
result =
1 13 14 11 18 21 17 23 13 15 18
2 29 21 20 24 26 21 17 15 19 21
3 12 17 9 9 16 4 14 16 20 19
4 8 15 19 16 9 11 14 15 6 6
5 8 4 6 6 3 7 7 4 6 3
6 5 4 7 4 2 6 2 4 10 5
7 1 3 2 4 6 4 4 4 10 6
and the intermediate matrix created with bsxfun is
>> bsxfun(@eq, u, u(v).')
ans =
1 1 1 0 0 0 0 0 0 0 0 0 0 0
0 0 0 1 1 1 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 1 0 0 0 0 0
0 0 0 0 0 0 0 0 0 1 1 0 0 0
0 0 0 0 0 0 0 0 0 0 0 1 0 0
0 0 0 0 0 0 0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 0 0 0 0 0 0 1
Pre-multiplying A by this matrix means that the first three rows of A are added to give the first row of the result; then the following four rows of A are added to give the second row of the result, etc.
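The indicator-matrix idea translates to NumPy broadcasting, sketched below for comparison; it is not the answer's code. Comparing the column vector u against the row vector u[v] builds the same 0/1 matrix that bsxfun(@eq, u, u(v).') produces, and a matrix product then sums each group's rows. The data are the ID column and first two data columns of the example matrix.

```python
import numpy as np

A = np.array([[1, 7, 4], [1, 3, 8], [1, 3, 2],
              [2, 6, 2], [2, 8, 6], [2, 5, 6], [2, 10, 7],
              [3, 7, 9], [3, 5, 8],
              [4, 4, 7], [4, 4, 8],
              [5, 8, 4], [6, 5, 4], [7, 1, 3]])

u, v = np.unique(A[:, 0], return_inverse=True)
# indicator[i, j] == 1 when row j of A has ID u[i]
indicator = (u[:, None] == u[v][None, :]).astype(A.dtype)
result = np.column_stack([u, indicator @ A[:, 1:]])
print(result)
```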

You can find the unique row IDs with unique and then loop over all of them, summing the other columns. Let A be your matrix; then:
rID = unique(A(:, 1));
B = zeros(numel(rID), size(A, 2));
for ii = 1:numel(rID)
B(ii, 1) = rID(ii);
B(ii, 2:end) = sum(A(A(:, 1) == rID(ii), 2:end), 1);
end
B contains your output.
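The same loop in NumPy (a line-by-line translation for comparison) uses a boolean mask to select each ID's rows and sums them along axis 0. The data are the ID column and first two data columns of the earlier example.

```python
import numpy as np

A = np.array([[1, 7, 4], [1, 3, 8], [1, 3, 2],
              [2, 6, 2], [2, 8, 6], [2, 5, 6], [2, 10, 7],
              [3, 7, 9], [3, 5, 8],
              [4, 4, 7], [4, 4, 8],
              [5, 8, 4], [6, 5, 4], [7, 1, 3]])

rID = np.unique(A[:, 0])
B = np.zeros((len(rID), A.shape[1]), dtype=A.dtype)
for ii, r in enumerate(rID):
    B[ii, 0] = r
    # Sum the data columns of all rows whose ID equals r.
    B[ii, 1:] = A[A[:, 0] == r, 1:].sum(axis=0)
print(B)
```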

Taking averages of data based on logical filter

We have two columns ('A' and 'B') as follows:
A = [10 5 6 6 10 2 3 2 1 3 2 3 3 7 9 8 6 8 8 12]
B = [10 5 6 6 2 2 3 2 1 3 2 3 3 7 2 2 3 3 8 12]
logicalFilter= ~(B<=3 & B>1)
Now I need to average the data points in A where logicalFilter == 1, separately for each of the three blocks of logicalFilter == 1, while also ignoring the first two points (for example) of each block when calculating the averages. How can this be done?

My mentalist skills lead me to this answer:
%// input
A = [10 5 6 6 10 2 3 2 1 3 2 3 3 7 9 8 6 8 8 12]
B = [10 5 6 6 2 2 3 2 1 3 2 3 3 7 2 2 3 3 8 12]
mask = (B<=3 & B>1)
%// get subs and vals for accumarray
C = cumsum(~mask) + 1
[~,~,subs] = unique(C(mask))
val = A(mask)
%// calculate mean starting with 3rd value of group
out = accumarray(subs(:),val(:),[],@(x) mean(x(3:end)) )
out =
2.5000 3.0000 7.0000
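The labelling trick is what makes this work. Every entry outside the mask bumps the cumulative sum, so each run of consecutive masked entries shares one label. unique then renumbers those labels 1, 2, 3. Below is a NumPy sketch of the same steps for comparison. It uses a list comprehension in place of accumarray, and skip is a hypothetical name for the number of leading points dropped. A block with no more than skip points would give NaN, as in MATLAB.

```python
import numpy as np

A = np.array([10, 5, 6, 6, 10, 2, 3, 2, 1, 3, 2, 3, 3, 7, 9, 8, 6, 8, 8, 12])
B = np.array([10, 5, 6, 6, 2, 2, 3, 2, 1, 3, 2, 3, 3, 7, 2, 2, 3, 3, 8, 12])
mask = (B <= 3) & (B > 1)

# Each False entry increments the counter, so each run of True shares a label.
C = np.cumsum(~mask) + 1
_, subs = np.unique(C[mask], return_inverse=True)  # block labels 0, 1, 2
val = A[mask]

skip = 2  # leading points ignored in each block (hypothetical name)
out = np.array([val[subs == g][skip:].mean() for g in range(subs.max() + 1)])
print(out)
```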

How to concatenate submatrix into a bigger matrix in Octave

I'm trying to solve the following problem: I have a 3x3x4 array like this:
A(:,:,1) =
1 1 1
1 1 1
1 1 1
A(:,:,2) =
2 2 2
2 2 2
2 2 2
A(:,:,3) =
3 3 3
3 3 3
3 3 3
A(:,:,4) =
4 4 4
4 4 4
4 4 4
I would like to produce a 6x6 matrix like the following:
B =
1 1 1 3 3 3
1 1 1 3 3 3
1 1 1 3 3 3
2 2 2 4 4 4
2 2 2 4 4 4
2 2 2 4 4 4
My first thought was to use something like the reshape function, but since it operates columnwise, the result is not what I want.
Do you have any ideas to perform it efficiently?
Thanks in advance

This is for a general case of converting a 3D array into such a 2D array -
m = 2; %// number of 3D slices to be vertically concatenated to form the rows
m1 = size(A,1)*m;
m2 = size(A,3)/m;
B = reshape(permute(reshape(permute(A,[1 3 2]),m1,m2,[]),[1 3 2]),m1,[])
Sample run -
A(:,:,1) =
1 1 7
1 9 1
1 7 2
A(:,:,2) =
3 9 2
9 4 7
9 3 7
A(:,:,3) =
2 6 8
4 8 4
1 8 4
A(:,:,4) =
1 1 7
8 3 4
1 9 8
A(:,:,5) =
7 9 2
6 8 5
4 1 6
A(:,:,6) =
3 2 8
4 9 1
4 4 4
B =
1 1 7 2 6 8 7 9 2
1 9 1 4 8 4 6 8 5
1 7 2 1 8 4 4 1 6
3 9 2 1 1 7 3 2 8
9 4 7 8 3 4 4 9 1
9 3 7 1 9 8 4 4 4
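The same general conversion can be sketched in NumPy (a translation, not the answer's code). NumPy is row-major, so the permutation differs from the MATLAB one. Splitting the slice axis into (p/m, m) gives grid[:, :, j, i] == A[:, :, j*m + i]. Moving the axes to (i, row, j, col) and flattening then puts slice j*m + i in block row i and block column j, the same column-by-column filling as above. The function name tile_slices is hypothetical.

```python
import numpy as np


def tile_slices(A, m):
    """Tile the slices A[:, :, k] into a grid with m block rows.

    Slice k lands in block row k % m and block column k // m.
    """
    r, c, p = A.shape
    if p % m != 0:
        raise ValueError("number of slices must be a multiple of m")
    grid = A.reshape(r, c, p // m, m)   # grid[:, :, j, i] == A[:, :, j*m + i]
    return grid.transpose(3, 0, 2, 1).reshape(m * r, (p // m) * c)


# The question's data: slice k is filled with the value k + 1.
A = np.stack([np.full((3, 3), k + 1) for k in range(4)], axis=2)
B = tile_slices(A, 2)
print(B)
```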

Since your sub-matrices are all of the same size you can assign them directly into B:
clear
B = zeros(6);
A(:,:,1) = ones(3);
A(:,:,2) = 2*ones(3);
A(:,:,3) = 3*ones(3);
A(:,:,4) = 4*ones(3);
B = [A(:,:,1) A(:,:,3); A(:,:,2) A(:,:,4)]
B =
1 1 1 3 3 3
1 1 1 3 3 3
1 1 1 3 3 3
2 2 2 4 4 4
2 2 2 4 4 4
2 2 2 4 4 4
This might become cumbersome if you have many more sub-matrices, though it could be automated.
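One possible way to automate the direct assignment, sketched in NumPy (an assumption, not the answer's code): preallocate B and copy each slice into its block with slice assignment. The placement rule (slice k goes to block row k % m, block column k // m) is read off the desired output, and m, the number of block rows, is a hypothetical parameter.

```python
import numpy as np

# The question's data: slice k is filled with the value k + 1.
A = np.stack([np.full((3, 3), k + 1) for k in range(4)], axis=2)

r, c, p = A.shape
m = 2  # number of block rows (assumed)
B = np.zeros((m * r, (p // m) * c), dtype=A.dtype)
for k in range(p):
    i, j = k % m, k // m  # block row and block column of slice k
    B[i * r:(i + 1) * r, j * c:(j + 1) * c] = A[:, :, k]
print(B)
```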

Using permute (à la Divakar) or manually slicing into a 2D array (à la Benoit) is much more efficient, but I'll add something to the mix for future readers. One way is to take each plane and place it into a 1D cell array, reshape the cell array into a 2 x 2 grid, then convert the 2 x 2 grid into the final matrix. Something like:
B = arrayfun(@(x) A(:,:,x), 1:4, 'uni', 0);
B = reshape(B, 2, 2);
B = cell2mat(B)
B =
1 1 1 3 3 3
1 1 1 3 3 3
1 1 1 3 3 3
2 2 2 4 4 4
2 2 2 4 4 4
2 2 2 4 4 4
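The cell-array route can be mirrored in NumPy with a nested list of planes and np.block, which plays the role of cell2mat; this is a sketch for comparison. MATLAB's reshape fills column-major, so the grid is built with plane j*m + i at position [i][j]. The variable m, the number of grid rows, is introduced here for illustration.

```python
import numpy as np

# The question's data: slice k is filled with the value k + 1.
A = np.stack([np.full((3, 3), k + 1) for k in range(4)], axis=2)

planes = [A[:, :, k] for k in range(A.shape[2])]  # like arrayfun(..., 'uni', 0)
m = 2  # number of grid rows (assumed)
# Column-major fill, like reshape(B, 2, 2): plane j*m + i goes to grid[i][j].
grid = [[planes[j * m + i] for j in range(len(planes) // m)] for i in range(m)]
B = np.block(grid)  # like cell2mat
print(B)
```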
