Subsetting a data frame into an array

Is it possible to subset a data frame into an array?
I have a huge data frame and need to create subsets to work with.
There is a long way of doing this where I create many variables, but what about an array that stores these subsets?
I want to subset based on the year (from 2004 to 2015) and store each subset in an array.
Here is my code:
for (i in 4:15)
{
  v = 2004
  temp <- subset(LI, format(strptime(LI[,1], "%Y-%m-%d"), "%Y") == v)
  Annual_LI[i] <- temp
  v = v + 1
}
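In R, the message comes from Annual_LI[i] <- temp: single-bracket assignment tries to put a whole data frame (several columns) into one element. A list indexed with double brackets (Annual_LI[[i]] <- temp) can hold one data frame per element. Note also that v = 2004 sits inside the loop, so it is reset on every iteration. Base R's split() can build such a list in one call, for example split(LI, format(strptime(LI[,1], "%Y-%m-%d"), "%Y")). The same idea, one container entry per year, is sketched below in Python; the row format (tuples whose first field is a "%Y-%m-%d" date string) and the name split_by_year are assumptions for illustration.

```python
from datetime import datetime


def split_by_year(rows, first_year=2004, last_year=2015):
    """Group rows by year into a dict {year: [rows...]}.

    Assumes each row is a tuple whose first field is a 'YYYY-mm-dd' string.
    Rows outside [first_year, last_year] are dropped.
    """
    # One (possibly empty) list per year, like one list element per subset.
    subsets = {year: [] for year in range(first_year, last_year + 1)}
    for row in rows:
        year = datetime.strptime(row[0], "%Y-%m-%d").year
        if first_year <= year <= last_year:
            subsets[year].append(row)
    return subsets
```

Each subset is then reached by its year, e.g. split_by_year(rows)[2010], instead of by a shifted loop index.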
The error that appears is:
In M_LI[i] <- temp :
  number of items to replace is not a multiple of replacement length
How do I go about this?

Related

Pythonic, efficient way to compute the mean of two successive arrays and insert the resulting array between them

I have 9000 arrays of arrays called my_data (my_data.shape == (9000,)). Each element is itself composed of a number of arrays:
len(my_data[0]) = 345 arrays, each of 2000 values
len(my_data[700]) = 222 arrays, each of 2000 values
What I would like to do:
Given two successive arrays, compute their mean and insert the resulting mean vector between them.
What I have tried:
import numpy as np

new_data = []
for i in range(len(my_data)):
    for j in range(len(my_data[i]) - 1):  # stop one early so j + 1 stays in range
        mean_arrays = np.mean([my_data[i][j], my_data[i][j + 1]], axis=0)
        new_data.append(my_data[i][j])  # add the first array
        new_data.append(mean_arrays)  # add the mean of the two arrays
    new_data.append(my_data[i][-1])  # add the last array once, after its final mean
new_data = np.asarray(new_data)
Is there a more efficient, Pythonic way to compute this in less time and avoid the nested for loops?
Thank you
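A minimal NumPy sketch of a loop-free inner step, assuming each my_data[i] can be stacked into a 2-D array of shape (n, 2000). It writes the original arrays into the even rows of a preallocated output and the pairwise means, computed in one vectorized step as (a[:-1] + a[1:]) / 2, into the odd rows. The helper name interleave_means is hypothetical. Only the outer loop over the 9000 blocks remains, because the blocks have different lengths (345, 222, ...) and cannot form one rectangular array.

```python
import numpy as np


def interleave_means(block):
    """Return [a0, mean(a0, a1), a1, mean(a1, a2), a2, ...] as one 2-D array."""
    a = np.asarray(block, dtype=float)  # shape (n, d)
    n = a.shape[0]
    if n < 2:
        return a.copy()  # nothing to average
    out = np.empty((2 * n - 1,) + a.shape[1:], dtype=float)
    out[0::2] = a  # original arrays at even positions
    out[1::2] = (a[:-1] + a[1:]) / 2.0  # means of neighbours at odd positions
    return out
```

new_data = [interleave_means(block) for block in my_data] keeps the blocks separate; np.concatenate over that list gives one flat array like the loop above.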

Mapping a 2D array into 1D array with variable column width

I know mapping a 2D array into a 1D array has been asked many times, but I did not find a solution that fits a case where the column count varies.
So I want to get a 1-dimensional index from this 2-dimensional array:
         Col 0   Col 1   Col 2
Row 0      0       1       2
Row 1      3       4
Row 2      5       6       7
Row 3      8       9
Row 4     10      11      12
Row 5     13      14
The usual formula index = row * columns + column does not work, because the rows have different lengths: from row 2 onward the computed index is off (e.g. row 2, column 0 gives 6 instead of 5).
What is the correct formula here?
EDIT:
The specific issue is that I have a list of items laid out like the grid, but only a one-dimensional array for the data. While looping through the elements in the UI, I need to get the matching data, but I only know the row and column of each element. I need a way to turn a row/column pair into an index into the data array.
A truly optimal answer (or even a provably correct one) will depend on the language you are using and how it lays out memory for such arrays.
However, taking your question simply at face value, you have to know what the actual length of each row is in order to calculate a 1D index.
So either the row length follows some pattern that can be inferred from the data, or you have (or can write) an rlen = rowLength(2dTable, rowNumber) function.
Then, depending on how big the tables are and how fast you need to run, you can calculate the 1D index by adding up the lengths of all rows before the current row and then adding the column index.
Alternatively, build a 1D table of the row lengths (or cumulative row lengths) that you can scan, so that your rowLength function is called only once per row.
With a better description of your problem, you might get a better answer...
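A small Python sketch of the cumulative-row-length table, assuming the row lengths are known up front (here the alternating 3/2 pattern from the question). The offsets list is a prefix sum: offsets[r] is the number of cells in all rows before r, so the 1D index is offsets[r] + column. The helper names make_offsets and flat_index are hypothetical.

```python
from itertools import accumulate


def make_offsets(row_lengths):
    """offsets[r] = total number of cells in rows 0 .. r-1 (prefix sum)."""
    return [0] + list(accumulate(row_lengths))[:-1]


def flat_index(offsets, row_lengths, row, column):
    """Map (row, column) in a jagged grid to an index into the flat data."""
    if not 0 <= column < row_lengths[row]:
        raise IndexError("column outside this row")
    return offsets[row] + column
```

The offsets are computed once in O(rows); each lookup afterwards is O(1).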
For your example, which alternates between 3 and 2 columns, you can construct a formula:
index = (row / 2) * (3 + 2) + (row % 2 ? 3 : 0) + column
(C-like syntax, assuming integer division)
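The same closed-form formula transcribed into Python (where // is integer division), checked against every cell of the grid in the question. As the answer says, it only holds for the alternating 3/2 layout; the function name is hypothetical.

```python
def jagged_index(row, column):
    """1D index for a grid whose rows alternate between 3 and 2 columns."""
    # Each pair of rows holds 3 + 2 cells; odd rows start after a 3-cell row.
    return (row // 2) * (3 + 2) + (3 if row % 2 else 0) + column
```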
In general, though, the standard way to implement what you're doing here (a jagged array) is an array of arrays, a.k.a. an Iliffe vector: use the row number as an index into an array of pointers, each pointing to an individual row array that holds the actual data.
You can keep an additional 1D array, say "length", holding the number of columns in each row. Then the formula is index = length(0) + length(1) + ... + length(row - 1) + column, i.e. the sum runs over i from 0 to row - 1.

Split array into smaller unequal-sized arrays depending on array-column values

I'm quite new to MATLAB and this problem is really driving me insane:
I have a huge array with 2 columns and about 31,000 rows. One of the two columns holds a spatial coordinate on a grid, the other a dependent parameter. What I want to do is the following:
I. I need to split the array into smaller parts defined by the spatial column. Say the spatial coordinates range from 0 to 500; I want arrays that give me the two column values for spatial coordinates 0-10, then 10-20, and so on. This would result in 50 arrays of unequal size covering the spatial range from 0 to 500.
II. Second, I need to calculate the average of each column in every one of these arrays, so that each array yields a single 2-dimensional point.
III. Third, I could plot these points and I would be super happy.
Sadly, I'm super confused because I fail miserably at step I. Maybe there is an even easier way than splitting the giant array into so many small arrays; who knows.
I would be really really happy for any suggestion.
Thank you,
Arne
First of all, since you want a data structure holding arrays of different sizes, you will need to place them in a cell array, so you could try something like this:
res = arrayfun(@(x)arr(arr(:,1)==x,:), unique(arr(:,1)), 'UniformOutput', 0);
The code above returns a cell array with arr split according to its first column. @(x)arr(arr(:,1)==x,:) defines an anonymous function of x, and arrayfun(function, ..., 'UniformOutput', 0) applies that function to each element of the following arguments (taking one value from each argument per call) and collects the results in a cell array. Note that this groups rows by identical first-column values; to get 10-unit ranges instead, you could group on floor(arr(:,1)/10). Also note that arr must be numeric; if it is not, map your values to numeric values or select the rows some other way.
In the same way you could do:
uo = 'UniformOutput';
res = arrayfun(@(x){arr(arr(:,1)==x,:), mean(arr(arr(:,1)==x,2))}, unique(arr(:,1)), uo, 0);
You will probably want to flatten the returned value; see the function cat. You could do:
res = cat(1,res{:})
How to plot your data depends on its format, so I can't help without knowing what the data look like, but you could try plotting inside a loop over your res variable or something similar.
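For comparison, a plain-Python sketch of the same group-by-value idea, assuming the data is a list of (coordinate, value) pairs; the name group_by_first_column is hypothetical. Like unique(arr(:,1)), it visits the distinct first-column values in sorted order, and like the arrayfun version it groups by exact value, not by 10-unit range.

```python
def group_by_first_column(rows):
    """Group (key, value) rows by key; also return the mean value per key."""
    groups = {}
    for row in rows:
        groups.setdefault(row[0], []).append(row)
    keys = sorted(groups)  # mirror unique(): distinct values in sorted order
    groups = {k: groups[k] for k in keys}
    means = {k: sum(r[1] for r in groups[k]) / len(groups[k]) for k in keys}
    return groups, means
```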
Step I indeed comes with some difficulties. Once these are solved, I guess steps II and III can easily be solved. Let me make some suggestions for step I:
You first define the maximum value (maxValue = 500;) and the step size (stepSize = 10;). Now it is possible to iterate through all steps and create your new vectors.
for k=1:maxValue/stepSize
...
end
As every resulting array will have different dimensions, I suggest you save the vectors in a cell array:
Y = cell(maxValue/stepSize,1);
Use the find function to find the rows of the entries for each matrix. At each step k, the range of values of interest will be (k-1)*stepSize to k*stepSize.
row = find( (k-1)*stepSize <= X(:,1) & X(:,1) < k*stepSize );
You can now create the matrix for step k with:
Y{k,1} = X(row,:);
Putting everything together you should be able to create the cell array Y containing your matrices and continue with the other tasks. You could also save the average of each value range in a second column of the cell array Y:
Y{k,2} = mean( Y{k,1}(:,2) );
I hope this helps you with your task. Note that these are only suggestions and there may be different (maybe more appropriate) ways to handle this.
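A Python sketch of steps I and II combined, assuming (x, y) pairs with 0 <= x < 500 and a step of 10 as in the answer above; the function name is hypothetical. Bin k holds the points with k*step <= x < (k+1)*step, which is the answer's (k-1)*stepSize <= x < k*stepSize with 0-based k; as there, a point at exactly x = 500 falls outside every bin. Empty bins get None instead of a mean.

```python
def bin_and_average(points, max_value=500.0, step=10.0):
    """Split (x, y) points into fixed-width x-bins and average each bin."""
    n_bins = int(max_value // step)
    bins = [[] for _ in range(n_bins)]
    for x, y in points:
        k = int(x // step)  # 0-based bin index
        if 0 <= k < n_bins:
            bins[k].append((x, y))
    means = []
    for b in bins:
        if b:
            # One 2-D point per bin: (mean x, mean y)
            means.append((sum(p[0] for p in b) / len(b),
                          sum(p[1] for p in b) / len(b)))
        else:
            means.append(None)
    return bins, means
```

The non-None entries of means are the points to plot in step III.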

Insert new values into an array

I currently have column vectors of different lengths, and I want to insert another column vector at various points of the original array, i.e. I want to add my new array to the start of the old array, skip 10 places, add my new array again, skip another 10 places, add my new array again, and so on until the end of the array. I can do this by using:
OffsetSign = [1:30]';
Extra = [0;0;0;0;0];
OffsetSign =[Extra;OffsetSign(1:10);Extra;OffsetSign(11:20);Extra;OffsetSign(21:30)];
However this is not suitable for longer arrays. Any tips on an easy way to do this for longer arrays?
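Here's one way to do it:
a = [1:30]';
b = [0;0;0;0;0];
a = reshape(a,10,[]);         % each column now holds one 10-element block
b = repmat(b,[1 size(a,2)]);  % one copy of b per block
r = [b ; a];                  % put a copy of b on top of each block
r = r(:);                     % flatten column by column back into a vector
The trick is to reshape a into a matrix whose columns have the right size (10 elements each), replicate b to the same number of columns, concatenate both, and flatten the matrix back into a vector. Note that reshape requires the length of a to be a multiple of 10.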
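The same reshape/replicate/concatenate/flatten trick in NumPy, as a sketch; insert_every is a hypothetical name. One difference matters: NumPy reshapes row by row (C order) while MATLAB works column by column, so here each 10-element block becomes a row and b is prepended as extra columns before flattening.

```python
import numpy as np


def insert_every(a, b, block=10):
    """Insert b before every block of `block` elements of a."""
    a = np.asarray(a)
    b = np.asarray(b)
    if a.size % block:
        raise ValueError("length of a must be a multiple of block")
    blocks = a.reshape(-1, block)  # one row per block
    prefix = np.tile(b, (blocks.shape[0], 1))  # one copy of b per block
    return np.hstack([prefix, blocks]).ravel()  # row-major flatten
```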

Creating sub-arrays from large single array based on marker values

I need to create a 1-D array of 2-D arrays, so that a program can read each 2-D array separately.
I have a large array with 5 columns, with the second column storing 'marker' data. Depending on the marker value, I need to take the corresponding data from the remaining 4 columns and put them into a new array on its own.
I was thinking of having two for loops running, one to take the target data and write it to a cell in the 1-D array, and one to read the initial array line-by-line, looking for the markers.
I feel like this is a fairly simple issue, I'm just having trouble figuring out how to essentially cut and paste certain parts of an array and write them to a new one.
Thanks in advance.
No for loops are needed; use your marker with logical indexing. For example, if your large array is A:
B=A(A(:,2)==marker,[1 3:5])
selects all rows where the marker is present, without the 2nd column. Then you can use reshape or the (:) operator to make it 1D, for example:
B=B(:)
or, if you want a one-liner:
B=reshape(A(A(:,2)==marker,[1 3:5]),1,[]);
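A NumPy sketch of the same logical indexing, assuming the marker sits in the second column (index 1 in 0-based Python). MATLAB's column list [1 3:5] becomes [0, 2, 3, 4], and ravel(order="F") reproduces the column-major order of B(:). The function name is hypothetical.

```python
import numpy as np


def select_marked(A, marker):
    """Rows whose 2nd column equals marker, without that column, plus a 1-D copy."""
    A = np.asarray(A)
    rows = A[A[:, 1] == marker]  # boolean mask selects the marked rows
    B = rows[:, [0, 2, 3, 4]]  # drop the marker column
    return B, B.ravel(order="F")  # column-major, like B(:) in MATLAB
```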
I am just answering my own question to show any potential future users the solution I came up with eventually.
%=======SPECIFY CSV INPUT FILE HERE========
MARKER_DATA=csvread('ESphnB2.csv'); % load data from csv file
%===================================
A=MARKER_DATA(:,2); % column of marker values
isMarked=(A==231); % logical mask: true where the marker is 231
edgeArray=diff([0; isMarked; 0]); % +1 where a marked run starts, -1 just after it ends
ind=[find(edgeArray>0) find(edgeArray<0)-1]; % [start end] row indices of each marked run
B=cell(size(ind,1),1); % preallocate one cell per trial
for j=1:size(ind,1) % for every marked run
    B{j}=MARKER_DATA(ind(j,1):ind(j,2),3:6); % rows of this run, data columns 3 to 6
end
gazeVectors=B; % column cell array of trials for saccade analysis
%======SPECIFY MAT OUTPUT FILE HERE===
save('Trial_Data_2.mat','gazeVectors'); % save array to mat file
%=====================================
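The run-detection step of this script (pad the marker mask with zeros, take diff, read run starts from +1 and run ends from -1) can be sketched in NumPy as follows. The defaults mirror the script (marker 231 in the second column, data columns 3 to 6, i.e. Python slice 2:6), but the function name and signature are assumptions for illustration.

```python
import numpy as np


def split_marked_runs(data, marker=231, marker_col=1, cols=slice(2, 6)):
    """Return one sub-array per contiguous run of rows carrying `marker`."""
    data = np.asarray(data)
    is_marked = (data[:, marker_col] == marker).astype(int)
    # Padding with 0 on both sides guarantees every run has a start and an end.
    edges = np.diff(np.concatenate(([0], is_marked, [0])))
    starts = np.flatnonzero(edges == 1)  # first row of each run
    ends = np.flatnonzero(edges == -1) - 1  # last row of each run
    return [data[s:e + 1, cols] for s, e in zip(starts, ends)]
```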
