Loading multiple files into matrix using R - arrays

I am new to the programming world and need help with loading a file into R and creating a matrix from it. I can import individual files and create an individual matrix out of each one. How do I do this for multiple files? I have 21 files that each contain 100 rows and 100 columns, and I need to import each file and put everything in a single array.

I would use list.files to list my files by pattern,
lapply to loop through the list of files and create a list of data.frames with read.csv, and
rbindlist to bind them all into one big data.table (call as.matrix on the result if you need an actual matrix).
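temp <- list.files(pattern = "\\.csv$")  # pattern is a regular expression, not a shell glob
named.list <- lapply(temp, read.csv)     # one data.frame per file
library(data.table)
files.matrix <- rbindlist(named.list)    # stacks the data.frames row-wise into one data.table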

It's not exactly clear what structure you want. You can choose between a 2100 x 100 matrix, a 2100 x 100 dataframe, a 100 x 100 x 21 array, or a list with 21 entries, each of which is 100 x 100. (In R, an array is the term one would use for a regular 3-dimensional structure whose elements are all of the same type.) And then of course there is agstudy's suggestion that you use a data.table.
In a sense agstudy's code already gives you the 21 item list of dataframes each of dimension: 100x100:
temp <- list.files(pattern = "\\.csv$")  # pattern is a regular expression, not a shell glob
named.list <- lapply(temp, read.csv)
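To get a 100 x 100 x 21 array, continue with this:
library(abind)
arr <- abind(named.list, along = 3)  # along = 3 stacks the files in a new third dimension; the default binds along the last existing dimension (columns)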
To get the 2100 x 100 dataframe, continue instead with:
longdf <- do.call(rbind, named.list)
To get the 2100 x 100 matrix continue from the last line with:
longmtx <- data.matrix(longdf)
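If you would rather stay in base R, here is a minimal sketch of the same idea using simplify2array instead of abind. It is only an illustration under some assumptions: the files contain nothing but numbers, they have no header row (hence header = FALSE, otherwise read.csv would use the first row as column names), and they all have the same dimensions. The folder csv_demo and the small 4 x 5 demo files are made up so that the snippet runs on its own.

```r
# Sketch: stack several equally sized CSV files into one 3-D array (base R only).
# Assumptions: numeric content, no header row, identical dimensions in every file.
demo.dir <- file.path(tempdir(), "csv_demo")   # hypothetical folder for the demo
dir.create(demo.dir, showWarnings = FALSE)

# Write three small 4 x 5 demo files so the example is self-contained
set.seed(1)
for (k in 1:3) {
  m <- matrix(rnorm(20), nrow = 4, ncol = 5)
  write.table(m, file.path(demo.dir, sprintf("file%02d.csv", k)),
              sep = ",", row.names = FALSE, col.names = FALSE)
}

# List the files, read each one as a numeric matrix, then stack them
files <- list.files(demo.dir, pattern = "\\.csv$", full.names = TRUE)
mats  <- lapply(files, function(f) as.matrix(read.csv(f, header = FALSE)))
arr   <- simplify2array(mats)   # list of equal-sized matrices -> rows x cols x files

dim(arr)
```

With your real data you would point list.files at your own folder; arr[, , k] then holds the contents of the k-th file.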

Related

Displaying a matrix obtained from a vectorised structure

I am trying to extract the coordinates located in the variable x within a .mat structure. I would like to print them as a three column matrix. Let's say:
-5543837.67700032 -2054567.16633347 2387852.25825667
4641938.565315761 393003.28157792 4133325.70392322
-3957408.7414133 3310229.46968631 3737494.72491701
1492206.38965564 -4458130.51073730 4296015.51539152
4075539.69798060 931735.497964395 4801629.46009471
3451207.69353006 3060375.44622100 4391915.05780934
I know that I can get them with
file=load('./filee_scan.mat')
stat = [file.scan.stat]';
x = [stat.x]';
But I get something like:
-5543837.67700032
-2054567.16633347
2387852.25825667
4641938.565315761
393003.28157792
4133325.70392322
% :: and so on
I would like to print them as I showed at the beginning (x as a vector of 3 coordinates and one line per station) but I don't know how to treat them. I have tried with loops but I really don't know how to express them.
How can I display my coordinates as an n -by- 3 matrix?
[stat.x].' gives you a flattened vector, as you've seen. You can reshape() that vector to the desired format:
x = reshape(x,3,[]).';
This first reshapes to 3 rows and n columns (n being your number of stations), then transposes to get n rows of 3 columns.
For a short introduction on how reshape works, see this answer of mine.

Avoid collapsing dimensions when omitting NAs from array

I have an array where I have to omit NA values. I know that it is an array full of matrices where every row has exactly one NA value. My approach works well for >2 columns of the matrices, but apply() drops one dimension when there are only two columns (as after omitting the NA values, one column disappears).
As this step is part of a much larger code, I would like to avoid recoding the rest and make this step robust to the case when the number of columns is two. Here is a simple example:
#create an array
arr1 <- array(rnorm(3000),c(500,2,3))
#randomly distribute 1 NA value per row of the array
for(i in 1:500){
arr1[i,,sample(3,1)] <- NA
}
#omit the NAs from the array
arr1.apply <- apply(arr1, c(1,2),na.omit)
#we lose no dimension as every dimension >1
dim(arr1.apply)
[1] 2 500 2
#now repeat with a 500x2x2 array
#create an array
arr2 <- array(rnorm(2000),c(500,2,2))
#randomly distribute 1 NA value per row of the array
for(i in 1:500){
arr2[i,,sample(2,1)] <- NA
}
#omit the NAs from the array
arr2.apply <- apply(arr2, c(1,2),na.omit)
#we lose one dimension because the last dimension collapses to size 1
dim(arr2.apply)
[1] 500 2
I do not want apply() to drop the last dimension as it breaks the rest of my code.
I am aware that this is a known issue with apply(), however, I am eager to resolve the problem in this very step, so any help would be appreciated. So far I've tried to wrap apply() in an array() command using the dimensions that should result, however, I think this mixes up the values in the matrix in a way that is not desirable.
Thanks for your help.
I propose a stupid solution, but I think you have no choice if you want to keep it this way:
arr1.apply <- if (dim(arr1)[3] > 2) {
  apply(arr1, c(1, 2), na.omit)
} else {
  array(apply(arr1, c(1, 2), na.omit), dim = c(1, dim(arr1)[1:2]))
}
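A variant that avoids the if/else, as a rough sketch: always rebuild the array with the dimensions you expect. This relies on the assumption stated in the question, namely that every vector along the third dimension contains exactly one NA, so each one shrinks by exactly one element. The helper name drop_na_keep_dims is made up for this example.

```r
# Sketch: remove the single NA along the third dimension without losing a dimension.
# Assumption: every arr[i, j, ] contains exactly one NA.
# drop_na_keep_dims is a hypothetical helper name.
drop_na_keep_dims <- function(arr) {
  d <- dim(arr)
  # apply() returns (d[3] - 1) x d[1] x d[2], or a plain d[1] x d[2] matrix
  # when only one value is left per cell
  res <- apply(arr, c(1, 2), function(v) v[!is.na(v)])
  # Both shapes store the values in the same order, so fixing dim is safe
  array(res, dim = c(d[3] - 1, d[1], d[2]))
}

# The 500 x 2 x 2 case from the question
set.seed(42)
arr2 <- array(rnorm(2000), c(500, 2, 2))
for (i in 1:500) {
  arr2[i, , sample(2, 1)] <- NA
}
out <- drop_na_keep_dims(arr2)
dim(out)
```

Wrapping in array() does not mix up the values here, because the 500 x 2 matrix and the 1 x 500 x 2 array are laid out identically in memory; out[1, i, j] is the non-NA value of arr2[i, j, ].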

How to convert a cell array of 2D matrices into a multidimensional array in MATLAB

In MATLAB, I have a defined cell array C of
size(C) = 1 by 150
Each matrix T of this cell C is of size
size(C{i}) = 8 by 16
I am wondering if there is a way to define a new multidimensional (3D) matrix M that is of size 8 by 16 by 150
That is when I write the command size(M) I get 8 by 16 by 150
Thank you! Looking forward to your answers
If I'm understanding your problem correctly, you have a cell array of 150 cells, and each cell element is 8 x 16, and you wish to stack all of these matrices together in the third dimension so you have a 3D matrix of size 8 x 16 x 150.
It's as simple as:
M = cat(3, C{:});
This syntax may look strange, but it's very valid. The command cat performs concatenation of matrices where the first parameter is the dimension you want to concatenate to... so in your case, that's the third dimension, and the parameters after are the matrices you want to concatenate to make the final matrix.
Doing C{:} creates what is known as a comma-separated list. This is equivalent to typing out the following syntax in MATLAB:
C{1}, C{2}, C{3}, ..., C{150}
Therefore, by doing cat(3, C{:});, what you're really doing is:
cat(3, C{1}, C{2}, C{3}, ..., C{150});
As such, you're taking all of the 150 cells and concatenating them all together in the third dimension. However, instead of having to type out 150 individual cell entries, that is encapsulated by creating a comma-separated list via C{:}.

Adding multiple rows in Array

I have an array A of size 16x16 and I want to add up the first 3 of its 16 rows. What is the most efficient solution in MATLAB?
I tried this code but this is not efficient because I want to extend it for large arrays:
filename = 'n1.txt';
B = importdata(filename);
i = 1;
D = B(i,:)+ B(i+1,:)+ B(i+2,:);
For example, if I want to extend this to an array of size 256x256 and I want to extract 100 rows and add them, how would I do this?
A(1:3,:);%// first three rows.
This uses the standard indices of matrix notation. Check Luis's answer I linked for the full explanation on indices in all forms. For summing things:
B = A(1:100,:);%// first 100 rows
C = sum(B,1);%// sum per column
D = sum(B,2);%// sum per row
E = sum(B(:));%// sum all elements, rows and columns, to a single scalar

Creating sub-arrays from large single array based on marker values

I need to create a 1-D array of 2-D arrays, so that a program can read each 2-D array separately.
I have a large array with 5 columns, with the second column storing 'marker' data. Depending on the marker value, I need to take the corresponding data from the remaining 4 columns and put them into a new array on its own.
I was thinking of having two for loops running, one to take the target data and write it to a cell in the 1-D array, and one to read the initial array line-by-line, looking for the markers.
I feel like this is a fairly simple issue, I'm just having trouble figuring out how to essentially cut and paste certain parts of an array and write them to a new one.
Thanks in advance.
No for loops needed; use your marker with logical indexing. For example, if your large array is A:
B=A(A(:,2)==marker,[1 3:5])
will select all rows where the marker was present, without the 2nd col. Then you can use reshape or the (:) operator to make it 1D, for example
B=B(:)
or, if you want a one-liner:
B=reshape(A(A(:,2)==marker,[1 3:5]),1,[]);
I am just answering my own question to show any potential future users the solution I came up with eventually.
%=======SPECIFY CSV INPUT FILE HERE========
MARKER_DATA=csvread('ESphnB2.csv'); % load data from csv file
%===================================
A=MARKER_DATA(:,2); % create 1D array for markers
A=A'; % make column into row
for i=1:length(A) % for every marker
if A(i) ~= 231 % if it is not 231 then
A(i)=0; % set value to zero
end
end
edgeArray = diff([0; (A(:) ~= 0); 0]); % set non-zero values to 1
ind = [find(edgeArray > 0) find(edgeArray < 0)-1]; % find indices of 1 and save to array with beginning and end
t=1; % initialize counter for trials
for j=1:size(ind,1) % for every marked index
B{t}=MARKER_DATA(ind(j,1):ind(j,2),[3:6]); % create an array with the rows from the data according to indices
t=t+1; % create a new trial
end
gazeVectors=B'; % reorient and rename array of trials for saccade analysis
%======SPECIFY MAT OUTPUT FILE HERE===
save('Trial_Data_2.mat','gazeVectors'); % save array to mat file
%=====================================
