Sum row vectors IF two or more rows in given column match (MATLAB)

I have a 48x202 matrix, where the first column of the matrix is an ID and the remaining columns hold the vector associated with that row's ID.
The ID column is sorted in ascending order, and multiple rows can have the same ID.
I want to combine all rows with equal IDs, meaning that I want to sum the rows of the matrix that have identical IDs in the first column.
The resulting matrix should be 32x202, since there are only 32 IDs.
Any ideas?

I'd totally approach this with accumarray as well as unique. Like the previous answer, let A be your matrix. You would obtain your answer thusly:
[vals,~,id] = unique(A(:,1),'stable');
B = accumarray(id, (1:numel(id)).', [], @(x) {sum(A(x,2:end),1)});
out = [vals cell2mat(B)];
The first line of code produces vals which is a list of all unique IDs seen in the first column of A and id assigns a unique integer ID without any gaps from 1 up to as many unique IDs there are in the first column of A. The reason why you want to do this is for the next line of code.
accumarray takes a set of keys and a set of values associated with each key. It groups all values that belong to the same key and applies a function to each group. In our case the keys are the integer IDs derived from the first column of A (id), and the values are the row indices of A, from 1 up to the number of rows in A. By default accumarray sums the values that share a key, but we do something a bit different here. For each unique ID in the first column of A, there is a set of row indices that map to that ID. We use those row indices to index into A and sum the matching rows over columns 2 to the end. That is what the anonymous function in the fourth argument of accumarray does. accumarray normally expects the function to return a single value per key; we get around this by returning a single cell per key, where each cell holds the summed row vector.
Each element of B gives you the row sum for each corresponding unique value in vals and so the last line of code pieces these together - the unique value in vals with the corresponding row sum. I had to use cell2mat because this was a matrix of cells and I had to convert all of these into a numerical matrix to complete the task.
Here's an example seeing this in action. I'm going to do this for a smaller set of data:
>> rng(123);
>> A = [[1;1;1;2;2;2;2;3;3;4;4;5;6;7] randi(10, 14, 10)];
>> A
A =
1 7 4 3 4 5 1 10 3 2 3
1 3 8 7 5 7 9 9 4 9 6
1 3 2 1 9 9 7 4 6 4 9
2 6 2 5 3 6 8 1 7 6 4
2 8 6 5 5 7 1 4 2 6 8
2 5 6 5 10 6 6 4 2 6 2
2 10 7 5 6 7 6 8 4 1 7
3 7 9 4 7 7 2 10 7 10 9
3 5 8 5 2 9 2 4 9 10 10
4 4 7 9 9 1 7 8 6 3 1
4 4 8 10 7 8 4 6 9 3 5
5 8 4 6 6 3 7 7 4 6 3
6 5 4 7 4 2 6 2 4 10 5
7 1 3 2 4 6 4 4 4 10 6
The first column is our IDs, and the next columns are the data. Running the above code I just wrote, we get:
>> out
out =
1 13 14 11 18 21 17 23 13 15 18
2 29 21 20 24 26 21 17 15 19 21
3 12 17 9 9 16 4 14 16 20 19
4 8 15 19 16 9 11 14 15 6 6
5 8 4 6 6 3 7 7 4 6 3
6 5 4 7 4 2 6 2 4 10 5
7 1 3 2 4 6 4 4 4 10 6
If you double check each row, the sums over all rows that share an ID match up. For example, the first three rows map to the same ID, so summing those rows gives the first row of the output: the second column is 7+3+3=13, the third column is 4+8+2=14, etc.
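For readers more at home in Python, here is a rough NumPy counterpart of the unique + accumarray idea. It is only a sketch and not part of the original answer: the small matrix A is made up for illustration, and np.add.at stands in for accumarray.

```python
import numpy as np

# Made-up stand-in for A: first column is the sorted ID, the rest is data.
A = np.array([
    [1, 7, 4],
    [1, 3, 8],
    [1, 3, 2],
    [2, 6, 2],
    [2, 8, 6],
    [3, 5, 8],
])

# vals: the unique IDs; idx: for every row, the position of its ID in vals
# (the 0-based counterpart of the third output of MATLAB's unique).
vals, idx = np.unique(A[:, 0], return_inverse=True)

# Accumulate each data row into the output row of its group.
sums = np.zeros((vals.size, A.shape[1] - 1), dtype=A.dtype)
np.add.at(sums, idx, A[:, 1:])

out = np.column_stack([vals, sums])
print(out)
```

Here the three rows with ID 1 collapse into one row holding 7+3+3=13 and 4+8+2=14.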

Another approach is to apply unique and then use bsxfun to build a matrix that multiplied by the non-ID part of the input matrix will give the result.
Let the input matrix be denoted as A. Then:
[u, ~, v] = unique(A(:,1));
result = [ u bsxfun(@eq, u, u(v).') * A(:,2:end) ];
Example: borrowing from @rayryeng's answer, let
A = [ 1 7 4 3 4 5 1 10 3 2 3
1 3 8 7 5 7 9 9 4 9 6
1 3 2 1 9 9 7 4 6 4 9
2 6 2 5 3 6 8 1 7 6 4
2 8 6 5 5 7 1 4 2 6 8
2 5 6 5 10 6 6 4 2 6 2
2 10 7 5 6 7 6 8 4 1 7
3 7 9 4 7 7 2 10 7 10 9
3 5 8 5 2 9 2 4 9 10 10
4 4 7 9 9 1 7 8 6 3 1
4 4 8 10 7 8 4 6 9 3 5
5 8 4 6 6 3 7 7 4 6 3
6 5 4 7 4 2 6 2 4 10 5
7 1 3 2 4 6 4 4 4 10 6 ];
Then the result is
result =
1 13 14 11 18 21 17 23 13 15 18
2 29 21 20 24 26 21 17 15 19 21
3 12 17 9 9 16 4 14 16 20 19
4 8 15 19 16 9 11 14 15 6 6
5 8 4 6 6 3 7 7 4 6 3
6 5 4 7 4 2 6 2 4 10 5
7 1 3 2 4 6 4 4 4 10 6
and the intermediate matrix created with bsxfun is
>> bsxfun(@eq, u, u(v).')
ans =
1 1 1 0 0 0 0 0 0 0 0 0 0 0
0 0 0 1 1 1 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 1 0 0 0 0 0
0 0 0 0 0 0 0 0 0 1 1 0 0 0
0 0 0 0 0 0 0 0 0 0 0 1 0 0
0 0 0 0 0 0 0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 0 0 0 0 0 0 1
Pre-multiplying A by this matrix means that the first three rows of A are added to give the first row of the result; then the following four rows of A are added to give the second row of the result, etc.
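The same indicator-matrix idea can be sketched in Python with NumPy broadcasting. This is an illustration only, with a made-up small matrix; the comparison u[:, None] == ids[None, :] plays the role of the bsxfun call.

```python
import numpy as np

# Made-up stand-in for A: first column is the ID, the rest is data.
A = np.array([
    [1, 7, 4],
    [1, 3, 8],
    [1, 3, 2],
    [2, 6, 2],
    [2, 8, 6],
    [3, 5, 8],
])

ids = A[:, 0]
u = np.unique(ids)

# indicator[i, j] is True when row j of A carries the i-th unique ID.
indicator = u[:, None] == ids[None, :]

# Pre-multiplying the data columns by the 0/1 matrix sums the rows per ID.
result = np.column_stack([u, indicator.astype(A.dtype) @ A[:, 1:]])
print(indicator.astype(int))
print(result)
```

Note that the indicator matrix has one entry per (unique ID, row) pair, so for very large inputs it can take far more memory than the grouped sums themselves.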

You can find the unique row IDs with unique and then loop over all of those, summing the other columns: Let A be your matrix, then
rID = unique(A(:, 1));
B = zeros(numel(rID), size(A, 2));
for ii = 1:numel(rID)
    B(ii, 1) = rID(ii);
    B(ii, 2:end) = sum(A(A(:, 1) == rID(ii), 2:end), 1);
end
B contains your output.

Adding new column to pandas dataframe with same element multiple times [duplicate]

I have a dataframe which looks like this:
import pandas as pd
df = pd.DataFrame({
'SENDER_ID': [1,2,3,4,5,6,7,8,9,10,11,12] })
df =
SENDER_ID
0 1
1 2
2 3
3 4
4 5
5 6
6 7
7 8
8 9
9 10
10 11
11 12
Now I want to add a column in which each counter value is repeated multiple times:
SENDER_ID counter
0 1 0
1 2 0
2 3 0
3 4 1
4 5 1
5 6 1
6 7 2
7 8 2
8 9 2
9 10 3
10 11 3
11 12 3
The dataframe always has a length that is a multiple of 3 and is much larger than in this simple example.
What is the easiest and most generic way to add this new column?

Another way using pd.RangeIndex:
df['count'] = pd.RangeIndex(0, len(df)//3).repeat(3)
print(df)
# Output:
SENDER_ID count
0 1 0
1 2 0
2 3 0
3 4 1
4 5 1
5 6 1
6 7 2
7 8 2
8 9 2
9 10 3
10 11 3
11 12 3
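Yet another possibility, sketched here under the assumption that the rows are already in the desired order, is to integer-divide the row position by the block size. The name block_size is just an illustrative variable.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"SENDER_ID": list(range(1, 13))})

# Row positions 0..11 integer-divided by 3 give 0,0,0,1,1,1,...
block_size = 3
df["counter"] = np.arange(len(df)) // block_size
print(df)
```

Because it works on row positions, this also copes with a length that is not a multiple of the block size; the last block is simply shorter.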

I think I found a solution which works:
max_list_length = len(df) // 3
liste = [[n] * 3 for n in range(max_list_length)]
value = sum(liste, [])
# value is now [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]
for n in range(len(df)):
    df.at[n, 'counter'] = value[n]

Problems with setting array elements in Forth

I am writing code in Forth that should create a 12x12 array of random numbers from 1 to 8.
create big_array 144 allocate drop
: reset_array big_array 144 0 fill ;
reset_array
variable rnd here rnd !
: random rnd @ 31421 * 6927 + dup rnd ! ;
: choose random um* nip ;
: random_fill 144 1 do 8 choose big_array i + c! loop ;
random_fill
: Array_@ 12 * + big_array swap + c@ ;
: show_small_array cr 12 0 do 12 0 do i j Array_@ 5 u.r loop cr loop ;
show_small_array
However, I notice that elements 128 to 131 of my array are always much larger than expected:
0 4 0 4 2 6 0 5 2 5 7 3
6 3 7 3 7 7 3 1 5 0 6 1
0 3 3 0 3 1 0 7 2 0 4 5
3 7 6 6 2 1 0 2 3 4 2 7
4 7 1 5 3 5 7 2 3 5 3 6
3 0 6 4 1 3 3 2 5 4 4 7
3 2 1 4 3 4 3 7 2 6 5 5
2 4 4 3 4 5 4 4 6 5 6 0
2 5 2 7 3 1 5 0 1 4 6 7
2 0 3 3 0 7 3 6 4 1 3 6
0 1 1 6 0 3 0 2 169 112 41 70
7 2 3 1 2 2 7 6 0 5 1 2
Moreover, when I try to change the value of these elements individually, this causes the other three elements to change value. For example, if I code:
9 choose big_array 128 + c!
then the array will become:
0 4 0 4 2 6 0 5 2 5 7 3
6 3 7 3 7 7 3 1 5 0 6 1
0 3 3 0 3 1 0 7 2 0 4 5
3 7 6 6 2 1 0 2 3 4 2 7
4 7 1 5 3 5 7 2 3 5 3 6
3 0 6 4 1 3 3 2 5 4 4 7
3 2 1 4 3 4 3 7 2 6 5 5
2 4 4 3 4 5 4 4 6 5 6 0
2 5 2 7 3 1 5 0 1 4 6 7
2 0 3 3 0 7 3 6 4 1 3 6
0 1 1 6 0 3 0 2 2 12 194 69
7 2 3 1 2 2 7 6 0 5 1 2
Do you have any idea why these specific elements are always impacted and if there is a way to prevent this?

Better readability and less error prone: 144 allocate ⇨ 144 chars allocate
A mistake: create big_array 144 allocate drop ⇨ create big_array 144 chars allot
A mistake: random um* nip ⇨ random swap mod
A mistake: 144 1 do ⇨ 144 0 do
An excessive operation: big_array swap + ⇨ big_array +
And add the stack comments, please. Especially, when you ask for help.
Do you have any idea why these specific elements are always impacted and if there is a way to prevent this?
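Because you are using memory in the dictionary space without reserving it. create by itself reserves no data space, and the address returned by allocate is never used, so big_array points at memory that the Forth system itself goes on to use.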
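To relate the byte offsets to the printed grid, here is a small Python sketch (not Forth, and purely illustrative) of the row-major arithmetic that the array accessor in the question performs: offset = row * 12 + column. It also lists which offsets a loop written as 144 1 do visits.

```python
COLS = 12  # the grid in the question is 12 x 12, one byte per cell


def offset(row, col):
    """Byte offset of (row, col) in a row-major grid with COLS columns."""
    return row * COLS + col


def position(off):
    """Inverse of offset(): the (row, col) pair for a byte offset."""
    return divmod(off, COLS)


# Offsets 128..131 are the last four cells of the second-to-last row.
cells = [position(off) for off in range(128, 132)]
print(cells)

# A loop with limit 144 and start 1 visits offsets 1..143 and skips 0.
filled_offsets = range(1, 144)
print(0 in filled_offsets)
```

This matches the output in the question, where the oversized values sit in columns 8 to 11 of row 10 (counting from 0).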

J: Coordinates with specific value

Let's say we have the array
0 1 2 3 4 5 8 7 8 9
There are two indexes that have value 8:
(i.10) ([#~8={) 0 1 2 3 4 5 8 7 8 9
6 8
Is there a shorter way to get this result? Maybe some built-in verb?
More importantly, what about higher dimensions?
Let's say we have a 5x4 matrix (5 columns, 4 rows):
1 2 3 4 5
2 3 4 5 6
3 4 5 6 7
4 5 6 7 8
I want to find the coordinates of the cells with value 6.
I want a result like this (there are three coordinates):
4 1
3 2
2 3
It's a pretty basic task, so I think a simple solution should exist.
And what about three dimensions?
Thank you

Using Sparse array functionality ($.) provides a very fast and lean solution that also works for multiple dimensions.
]a=: 5 ]\ 1 + i. 8
1 2 3 4 5
2 3 4 5 6
3 4 5 6 7
4 5 6 7 8
6 = a
0 0 0 0 0
0 0 0 0 1
0 0 0 1 0
0 0 1 0 0
4 $. $. 6 = a
1 4
2 3
3 2
Tacitly:
getCoords=: 4 $. $.
getCoords 6 = a ,: a
0 1 4
0 2 3
0 3 2
1 1 4
1 2 3
1 3 2

Verb indices I. almost does the job.
When you have a simple list, I.'s use is straightforward:
I. 8 = 0 1 2 3 4 5 8 7 8 9
6 8
For higher-rank arrays you can pair it with antibase (#:) to convert the flat indices into coordinates, using the shape of the array ($) as the base. E.g.:
]a =: 4 5 $ 1 2 3 4 5 2 3 4 5 6 3 4 5 6 7 4 5 6 7 8
1 2 3 4 5
2 3 4 5 6
3 4 5 6 7
4 5 6 7 8
I. 6 = ,a
9 13 17
($a) #: 9 13 17
1 4
2 3
3 2
Similarly, for any number of dimensions: flatten (,), compare (=), get indices (I.) and convert coordinates (($a)&#:):
]coords =: ($a) #: I. 5 = , a =: ? 5 6 7 $ 10
0 0 2
0 2 1
0 2 3
...
(<"1 coords) { a
5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
By the way, you can write I. x = y as x (I.@:=) y for extra performance. It is special code for indices where x f y.
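For comparison, the flatten, compare, get indices, convert recipe can be sketched in Python with NumPy. This is not J and only illustrates the steps: np.flatnonzero stands in for I. and np.unravel_index for antibase with the array's shape.

```python
import numpy as np

a = np.array([
    [1, 2, 3, 4, 5],
    [2, 3, 4, 5, 6],
    [3, 4, 5, 6, 7],
    [4, 5, 6, 7, 8],
])

# Flatten, compare, and take the flat indices of the matches.
flat_idx = np.flatnonzero(a.ravel() == 6)

# Convert flat indices to (row, column) pairs using the shape as the base.
coords = np.column_stack(np.unravel_index(flat_idx, a.shape))
print(flat_idx)
print(coords)

# The same steps work for any number of dimensions; np.argwhere does
# both in one call. Here two copies of a are stacked into a 3-D array.
coords3d = np.argwhere(np.stack([a, a]) == 6)
print(coords3d)
```

The 2-D result lists the same coordinates as the J session above: 1 4, 2 3 and 3 2.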

Tacit function to multiply five consecutive number in a list: J, j701

I'm working on Project Euler problem 8 and trying a simple brute force: multiply each run of 5 consecutive digits of the number, make a list of the results, and find the highest.
This is the code I'm currently trying to write in J:
n =: 731671765313x
NB. 'n' will be the complete 1000-digits number
itl =: (".@;"0@":)
NB. 'itl' transforms an integer into a list of its digits
N =: itl n
NB. just for short writing
takeFive =: 5 {. ] }.~ 1 -~ [
NB. this is a dyad; I got this code from 13 : '5{.(x-1)}.y'
NB. it takes a starting index and is applied to a list
How can I use takeFive for all the indices of N?
I tried:
(i.#N) takeFive N
|length error: takeFive
| (i.#N) takeFive N
but it doesn't work and I don't know why.
Thank you all.

1. The reason that (i.#N) takeFive N is not working is that you are essentially trying to run 5 {. ((i.#N)-1) }. N, but x has to be used as an atom, not as a list. You can do that by setting the appropriate left-right rank (") of the verb:
(i.#N) (takeFive"0 _) N
7 3 1 6 7
7 3 1 6 7
3 1 6 7 1
1 6 7 1 7
6 7 1 7 6
7 1 7 6 5
1 7 6 5 3
7 6 5 3 1
6 5 3 1 3
5 3 1 3 0
3 1 3 0 0
1 3 0 0 0
2. Another way is to bind (&) your list (N) to takeFive and then apply the bound verb to each item of i.#N. To do this, it's better to use the argument-swapped version of takeFive, takeFive~:
((N&(takeFive~))"0) i.#N
7 3 1 6 7
7 3 1 6 7
3 1 6 7 1
1 6 7 1 7
6 7 1 7 6
7 1 7 6 5
1 7 6 5 3
7 6 5 3 1
6 5 3 1 3
5 3 1 3 0
3 1 3 0 0
1 3 0 0 0
or (N&(takeFive~)) each i.#N.
3. I think, though, that the infix dyad \ might serve you better:
5 >\N
7 3 1 6 7
3 1 6 7 1
1 6 7 1 7
6 7 1 7 6
7 1 7 6 5
1 7 6 5 3
7 6 5 3 1
6 5 3 1 3
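As a cross-check of the infix approach, here is a short Python sketch (not J) that builds the same windows of 5 digits from the 12-digit sample number in the question and takes the largest product. The variable names are illustrative only.

```python
from math import prod

n = 731671765313  # the 12-digit sample from the question
digits = [int(ch) for ch in str(n)]

# All runs of 5 consecutive digits, like the rows produced by the infix adverb.
width = 5
windows = [digits[i:i + width] for i in range(len(digits) - width + 1)]

# Multiply the digits of each window and keep the largest product.
products = [prod(w) for w in windows]
best = max(products)

print(windows)
print(best)
```

For this sample there are 8 windows, and the largest product comes from the window 6 7 1 7 6.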

Reshape acast() remove missing values

I have this dataframe:
df <- data.frame(subject = c(rep("one", 20), c(rep("two", 20))),
score1 = sample(1:3, 40, replace=T),
score2 = sample(1:6, 40, replace=T),
score3 = sample(1:3, 40, replace=T),
score4 = sample(1:4, 40, replace=T))
subject score1 score2 score3 score4
1 one 2 4 2 2
2 one 3 3 1 2
3 one 1 2 1 3
4 one 3 4 1 2
5 one 1 2 2 3
6 one 1 5 2 4
7 one 2 5 3 2
8 one 1 5 1 3
9 one 3 5 2 2
10 one 2 3 3 4
11 one 3 2 1 3
12 one 2 5 2 1
13 one 2 4 1 4
14 one 2 2 1 3
15 one 1 3 1 4
16 one 1 6 1 3
17 one 3 4 2 2
18 one 3 2 1 3
19 one 2 5 3 1
20 one 3 6 2 1
21 two 1 6 3 4
22 two 1 2 1 2
23 two 3 2 1 2
24 two 1 2 2 1
25 two 2 3 1 3
26 two 1 5 3 3
27 two 2 4 1 4
28 two 2 6 2 4
29 two 1 6 2 2
30 two 1 5 1 4
31 two 2 1 2 4
32 two 3 6 1 1
33 two 1 1 3 1
34 two 2 4 2 3
35 two 2 1 3 2
36 two 2 3 1 3
37 two 1 2 3 4
38 two 3 5 2 2
39 two 2 1 3 4
40 two 2 1 1 3
Note that the scores have different ranges of values. Score 1 ranges from 1-3, score 2 from 1-6, score 3 from 1-3, and score 4 from 1-4.
I'm trying to reshape data like this:
library(reshape2)
dfMelt <- melt(df, id.vars="subject")
acast(dfMelt, subject ~ value ~ variable)
Aggregation function missing: defaulting to length
, , score1
1 2 3 4 5 6
one 6 7 7 0 0 0
two 8 9 3 0 0 0
, , score2
1 2 3 4 5 6
one 0 5 3 4 6 2
two 5 4 2 2 3 4
, , score3
1 2 3 4 5 6
one 10 7 3 0 0 0
two 8 6 6 0 0 0
, , score4
1 2 3 4 5 6
one 3 6 7 4 0 0
two 3 5 5 7 0 0
Note that the output array includes scores as "0" if they are missing. Is there any way to stop these missing scores being outputted by acast?

In this case, you might do better sticking to base R's table feature. I'm not sure that you can have an irregular array like you are looking for.
For example:
> lapply(df[-1], function(x) table(df[[1]], x))
$score1
x
1 2 3
one 9 6 5
two 11 4 5
$score2
x
1 2 3 4 5 6
one 2 5 4 3 3 3
two 4 2 2 3 4 5
$score3
x
1 2 3
one 9 5 6
two 4 11 5
$score4
x
1 2 3 4
one 4 4 8 4
two 2 6 5 7
Or, using your "long" data:
with(dfMelt, by(dfMelt, variable,
FUN = function(x) table(x[["subject"]], x[["value"]])))

Since each "score" subset is going to have a different shape, you will not be able to preserve the array structure. One option is to use a list of two-dimensional arrays or data.frames, e.g.:
# your original acast call
res <- acast(dfMelt, subject ~ value ~ variable)
# remove any columns that are all zero
apply(res, 3, function(x) x[, apply(x, 2, sum)!=0] )
Which gives:
$score1
1 2 3
one 7 8 5
two 6 8 6
$score2
1 2 3 4 5 6
one 4 2 6 4 1 3
two 2 5 3 4 3 3
$score3
1 2 3
one 5 10 5
two 5 11 4
$score4
1 2 3 4
one 5 4 4 7
two 4 6 6 4
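The "one table per score" idea carries over to other tools as well. As a rough illustration in Python (not R, and with a tiny made-up data frame rather than the one in the question), pandas' crosstab builds each table with only the values that actually occur in that column:

```python
import pandas as pd

# Tiny made-up data: score1 takes values 1-3, score2 takes values 4-6.
df = pd.DataFrame({
    "subject": ["one", "one", "one", "two", "two", "two"],
    "score1": [1, 2, 2, 3, 1, 1],
    "score2": [4, 6, 6, 5, 4, 6],
})

# One contingency table per score column, kept in a dict because the
# tables can have different numbers of columns.
tables = {col: pd.crosstab(df["subject"], df[col]) for col in df.columns[1:]}

for name, table in tables.items():
    print(name)
    print(table)
```

A zero still appears where one subject lacks a value that the other subject has, but values that never occur in a score column get no column at all.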