An array of arrays of varying length in R

I use R for my statistical analysis.
I want to group my data in an array based on the ID column. The result should be an array of unique IDs in which each cell holds an array of the data for the corresponding ID. Since the number of rows per ID differs, the arrays in the cells have different lengths.
So how can I create an array of arrays of varying length in R?
I already have the following code, but it gives an error:
# unique IDs
gr<-unique(data[,1]);
for (i in 1:length (gr))
{
index<- which(data[,1]==gr[i]);
data_c[[i,1]]<-data[index,];
}
Here is the error
more elements supplied than there are to replace
Thanks in advance for any comment.
Let me explain my problem with an example.
I have the following data, called DATA_ALL:
DATA_ALL[]=
id age T1 T2 T3 T4
1 20 1 0 0 0
1 20 NA 0 NA 0
1 20 0 0 0 0
5 30 1 NA 0 0
5 30 0 0 0 1
6 40 0 1 0 0
I want to group the rows for each id and put the groups into a single array (an array of arrays):
DATA_GROUPED []=
id data
1 1 X1[]=[an array that includes all data from DATA_ALL where id=1]
2 5 X2[]=[an array that includes all data from DATA_ALL where id=5]
3 6 X3[]=[an array that includes all data from DATA_ALL where id=6]
Please note that X1, X2 and X3 all have different lengths.
So how can I create the DATA_GROUPED[] matrix?

It is nearly impossible to answer your question in relation to your code, but in general, I think what you want to do is create a list of vectors, a bit like this:
one<-letters[1]
two<-letters[2:3]
three<-letters[4:6]
combined<-list(one=one, two=two, three=three)
Be sure to use indexing correctly now, and preferably with [[:
for(i in 1:length(combined))
{
cat("The contents of item", names(combined)[i], "are:", combined[[i]], "\n")
}
Output:
The contents of item one are: a
The contents of item two are: b c
The contents of item three are: d e f
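To make the difference between [[ and [ concrete, here is a small sketch that rebuilds the same list. The names inner and wrapped are only illustrative and do not appear in the answer above.

```r
# Same list as above: three character vectors of different lengths.
combined <- list(one = letters[1], two = letters[2:3], three = letters[4:6])

# [[ extracts the element itself (here a character vector).
inner <- combined[["two"]]
class(inner)     # "character"
length(inner)    # 2

# [ returns a shorter list that still wraps the element.
wrapped <- combined["two"]
class(wrapped)   # "list"
length(wrapped)  # 1

# Element lengths can differ freely within one list.
lengths(combined)
```

This is why [[ is usually what you want inside a loop: it hands you the vector, not a one-element list around it.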
Edit (following edit of question):
split.data.frame(DATA_ALL, DATA_ALL[,1])
Check ?split and note the first paragraph in Details.
Note this indeed creates a list of matrices/arrays.
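As a rough illustration on the example data from the question, the sketch below assumes DATA_ALL is a data frame (the question does not say whether it is a data frame or a matrix) and uses the generic split(), which dispatches to split.data.frame for data frames.

```r
# Rebuild the example data from the question (values copied from DATA_ALL).
DATA_ALL <- data.frame(
  id  = c(1, 1, 1, 5, 5, 6),
  age = c(20, 20, 20, 30, 30, 40),
  T1  = c(1, NA, 0, 1, 0, 0),
  T2  = c(0, 0, 0, NA, 0, 1),
  T3  = c(0, NA, 0, 0, 0, 0),
  T4  = c(0, 0, 0, 0, 1, 0)
)

# split() returns a named list with one data frame per unique id.
DATA_GROUPED <- split(DATA_ALL, DATA_ALL$id)

names(DATA_GROUPED)         # "1" "5" "6"
sapply(DATA_GROUPED, nrow)  # 3, 2 and 1 rows respectively
DATA_GROUPED[["5"]]         # the two rows with id == 5
```

The groups are looked up by name ("1", "5", "6"), not by position, so DATA_GROUPED[["5"]] and DATA_GROUPED[[2]] refer to the same element here.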

Related

Generating an array using absolute values instead of cell references

This question is related to this one
I have (3,0), (2,1), and (2,0) in rows 1 to 3. There are ways to generate the array of {0,0,0,1,1,0,0} using cell addresses. (See the reference above.)
Now, my question is: can the same array be generated using only the three pairs of numbers, NOT the cell references?
NOTE: In the real case, there may be up to six pairs of numbers, A1:B1 to A6:B6, and up to 2880 array elements.
Excel 2010
A B
1 3 0
2 2 1
3 2 0
You can proceed as in the edit to my answer to the original question, by generating an array as follows:
0 0 0
1 0 0
1 1 0
and using MMULT instead of OFFSET to get the running totals of the first column of numbers.
=SUM(INDEX(INDEX({3,0;2,1;2,0},0,2),N(IF({1},MATCH(ROW(A1:INDEX(A:A,SUM(INDEX({3,0;2,1;2,0},0,1)))),
MMULT(IF(ROW(A1:INDEX($1:$1048576,COUNT(INDEX({3,0;2,1;2,0},0,1)),COUNT(INDEX({3,0;2,1;2,0},0,1))))>
COLUMN(A1:INDEX($1:$1048576,COUNT(INDEX({3,0;2,1;2,0},0,1)),COUNT(INDEX({3,0;2,1;2,0},0,1)))),1,0),INDEX({3,0;2,1;2,0},0,1))+1)))))
If you hard-code the number of pairs of numbers instead of using COUNT to count them, the formula is shorter:
=SUM(INDEX(INDEX({3,0;2,1;2,0},0,2),N(IF({1},MATCH(ROW(A1:INDEX(A:A,SUM(INDEX({3,0;2,1;2,0},0,1)))),
MMULT(IF(ROW(A1:INDEX($1:$1048576,3,3))>
COLUMN(A1:INDEX($1:$1048576,3,3)),1,0),INDEX({3,0;2,1;2,0},0,1))+1)))))

Matlab One Hot Encoding - convert column with categoricals into several columns of logicals

CONTEXT
I have a large number of columns with categoricals, all with different, unrankable choices. To make my life easier for analysis, I'd like to take each of them and convert it to several columns with logicals. For example:
1 GENRE
2 Pop
3 Classical
4 Jazz
...would turn into...
1 Pop Classical Jazz
2 1 0 0
3 0 1 0
4 0 0 1
PROBLEM
I've tried using ind2vec but this only works with numericals or logicals. I've also come across this but am not sure it works with categoricals. What is the right function to use in this case?
If you want to convert from a categorical vector to a logical array, you can use the unique function to generate column indices, then perform your encoding using any of the options from this related question:
% Sample data:
data = categorical({'Pop'; 'Classical'; 'Jazz'; 'Pop'; 'Pop'; 'Jazz'});
% Get unique categories and create indices:
[genre, ~, index] = unique(data)
genre =
Classical
Jazz
Pop
index =
3
1
2
3
3
2
% Create logical matrix:
mat = logical(accumarray([(1:numel(index)).' index], 1))
mat =
6×3 logical array
0 0 1
1 0 0
0 1 0
0 0 1
0 0 1
0 1 0
ind2vec can still be used if you first convert the categorical to a cell array of strings with cellstr and then map each string to a numeric index.
This code may help (adapted from this, with only small changes):
data = categorical({'Pop'; 'Classical'; 'Jazz';});
GENRE = cellstr(data); %change categorical data into cell strings
[~, loc] = ismember(GENRE, unique(GENRE));
genre = ind2vec(loc')';
Gen=full(genre);
array2table(Gen, 'VariableNames', unique(GENRE))
Running this code returns:
ans =
Classical Jazz Pop
_________ ____ ___
0 0 1
1 0 0
0 1 0
You can call unique(GENRE) to check the categories (as cell strings). Meanwhile, logical(Gen) (or logical(full(genre))) contains the logical columns that you need.
P.S. A categorical array might be faster than a cell array of strings, but ind2vec doesn't work with it directly. unique and accumarray might be the better option.

find largest value in an array if value in first column matches specified value

I'm trying to find the largest or max value in an array/range (E44:I205) among rows with values in column D (D44:D205) that match a word. For instance:
D E F G H I
Cheetah Cat 0 1 2 3 4
Tiger Cat 1 1 2 3 4 5
Dog 0 0 1 2 3
Among the rows whose value in column D ends in "cat" (that is, matches "*"&"cat"), I want to find the max value. In this example, the formula should return 5. I've tried the following formula, but it just returns the first instance of "cat" and the associated max value in that row.
=LARGE(IF($D$25:$D$205="*"&"cat",$E$44:$I$205,),1)
Any help is much appreciated!
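Use AGGREGATE with function 14 (LARGE) and option 6 (ignore error values). Rows where column D does not end in "cat" produce a division by zero and are skipped: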
=AGGREGATE(14,6,E25:I205/(RIGHT(D25:D205,3)="cat"),1)

Calculating mean over an array of lists in R

I have an array built to accept the outputs of a modelling package:
M <- array(list(NULL), c(trials,3))
Where trials is a number that will generate circa 50 sets of data.
From a sampling loop, I am inserting a specific aspect of the outputs. The output from the modelling package looks a little like this:
Mt$effects
c_name effect Other
1 DPC_I 0.0818277549 0
2 DPR_I 0.0150814475 0
3 DPA_I 0.0405341027 0
4 DR_I 0.1255416311 0
5 (etc.)
And I am inserting it into my array via a loop
for (x in 1:trials) {
Mt<-run_model(params)
M[[x,3]] <- Mt$effects
}
The object now looks as follows
M[,3]
[[1]]
c_name effect Other
1 DPC_I 0.0818277549 0
2 DPR_I 0.0150814475 0
3 DPA_I 0.0405341027 0
4 DR_I 0.1255416311 0
5 (etc.)
[[2]]
c_name effect Other
1 DPC_I 0.0717384637 0
2 DPR_I 0.0190812375 0
3 DPA_I 0.0856456427 0
4 DR_I 0.2330002551 0
5 (etc.)
[[3]]
And so on (up to 50 elements).
What I want to do is calculate an average (and sd) of effect, grouped by each c_name, across each of these 50 trial runs, but I’m unable to extract the data into a single data frame (for example) so that I can run a ddply summarise across them.
I have tried various combinations of rbind, cbind, unlist, but I just can’t understand how to correctly lift this data out of the sequential elements. I note also that any reference to .names results in NULL.
Any solution would be most appreciated!
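One possible approach, sketched in base R under stated assumptions: every element of M[,3] is a data frame with the same c_name and effect columns. The effects_list object below is a hypothetical stand-in for M[,3] with made-up values, and all_effects, means, sds and summary_df are illustrative names only.

```r
# Hypothetical stand-in for M[, 3]: a list of data frames that share
# the columns c_name and effect (the values are made up).
effects_list <- list(
  data.frame(c_name = c("DPC_I", "DPR_I"), effect = c(0.08, 0.02)),
  data.frame(c_name = c("DPC_I", "DPR_I"), effect = c(0.07, 0.01)),
  data.frame(c_name = c("DPC_I", "DPR_I"), effect = c(0.09, 0.03))
)

# Stack all trials into one long data frame.
all_effects <- do.call(rbind, effects_list)

# Mean and standard deviation of effect per c_name.
means <- aggregate(effect ~ c_name, data = all_effects, FUN = mean)
sds   <- aggregate(effect ~ c_name, data = all_effects, FUN = sd)

# Combine both summaries; columns become effect_mean and effect_sd.
summary_df <- merge(means, sds, by = "c_name", suffixes = c("_mean", "_sd"))
summary_df
```

With the real object, do.call(rbind, M[, 3]) would take the place of the stand-in list, provided no element is still NULL. The stacked data frame could also be passed to ddply instead of aggregate.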

Merge multiple arrays of unique occurrences

I want to merge multiple arrays of unique occurrences into a single array. To get the arrays in the first place I use this code, where img_series is a slice from a TIFF image imported using imread:
a = unique(img_series);
occu = [a,histc(img_series(:),a)];
I do that multiple times, because the tiff image I'm using has multiple hundred images stacked, which my RAM will not support to import at once. So each 'occu' looks something like this (first number is the unique value, second number is the number of occurrences):
occu1 occu2 .....
0 1 1 2
12 1 10 1
14 1 12 1
15 1 14 2
.. .. .. .. .....
Now I want to merge them all together, or better merge them in each iteration, when I'm reading another stacked image.
The merged results should be a 2D matrix similar to the one above. The number of occurrences of the same values should be added to one another, as this is the whole point of counting them. So the result of the above example should be this:
occu_total
0 1
1 2
10 1
12 2
14 3
15 1
.. ..
I found the join command, but that one does not seem to work here. I guess I could do it the long way of searching the matching number and add the occurrences together and so on, but there must be a quicker way of doing it.
A = [0 1;12 1; 14 1;15 1];B = [1 2;10 1;12 1;14 2];
tmp = [A;B]; %// merge arrays into a single one
tmp(:,1) = tmp(:,1)+1;%// remove zero occurrences by adding 1 to everything
C = accumarray(tmp(:,1),tmp(:,2)); %// add occurrences all up
D = [1:numel(C)].'; %// create numbered array
E = [D C];
E((C==0),:)=[]; %// get output
E(:,1) = E(:,1)-1;%// subtract the 1 again
E =
0 1
1 2
10 1
12 2
14 3
15 1
This is a job for accumarray. It takes the first argument as your dictionary key and adds the values of each key together. The addition and subtraction of 1 is done because 0 cannot be an index in MATLAB. To circumvent this (assuming you have no negative numbers), you can simply add 1 and remove it afterwards, shifting all your indices to positive integers. If you hit negative numbers, shift by the minimum instead: compute offset = 1 - min(tmp(:,1)) before modifying tmp, use tmp(:,1) = tmp(:,1) + offset, and undo the shift at the end with E(:,1) = E(:,1) - offset.
