Select range based on contents of an array


I have two arrays. I'd like to copy ranges of data from one of them, based on a list of array locations stored in the other. For example, if the first array consisted of 100 rows and 2 columns, I might want to copy rows 10-20, rows 60-70 and rows 75-79. In that case, the contents of the second array would be as follows:
b =
10 20
60 70
75 79
To select the appropriate rows of the first array (let's call it 'a') based on the second ('b'), I would do the following:
c = a([b(1,1):b(1,2) b(2,1):b(2,2) b(3,1):b(3,2)], :)
This works, and returns array 'c', which is array 'a' with the requested rows extracted.
The problem is that array 'b' actually contains between 50 and 60 rows (i.e., ranges to be included).
How can I generalize the code above so that it works for any number of rows in 'b'?

Example:
a = rand(100,1);
ranges = [
10 20
60 70
75 79
];
idx = arrayfun(@colon, ranges(:,1), ranges(:,2), 'UniformOutput', false);
idx = [idx{:}];
b = a(idx)
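The question's data has two columns, while the example above uses a single column. A minimal sketch of the same @colon idea applied to a 100-by-2 matrix, keeping whole rows with a(idx, :); the names data and c are illustrative, and the data is built so the selected rows are easy to check:

```matlab
% 100-by-2 data matrix: row k is [k, 1000 + k]
data = [(1:100)', 1000 + (1:100)'];
ranges = [10 20; 60 70; 75 79];   % one [first last] pair per row

% One row vector of indices per range, then concatenate them all
idx = arrayfun(@colon, ranges(:,1), ranges(:,2), 'UniformOutput', false);
idx = [idx{:}];

% Keep every column of the selected rows
c = data(idx, :);
```

Because idx is built from whatever number of rows ranges has, this works unchanged for 50 or 60 ranges.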

You could use a for loop, assuming the ranges of values to be extracted from a are in b, and the result should go to c:
% input
a = rand(100,2);
b = [10 20; 60 70; 75 79];
% output
c = zeros(0, size(a,2)); % empty, with the same number of columns as a
for i = 1:size(b,1)
c = vertcat(c, a(b(i,1):b(i,2), :));
end
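Growing c with vertcat reallocates it on every iteration, which gets slow with many ranges. A sketch that computes the total number of rows first, preallocates c, and fills it block by block; lens and pos are illustrative names:

```matlab
a = rand(100, 2);
b = [10 20; 60 70; 75 79];

lens = b(:,2) - b(:,1) + 1;          % number of rows in each range
c = zeros(sum(lens), size(a, 2));    % preallocate the full output
pos = 0;                             % last row of c filled so far
for i = 1:size(b, 1)
    c(pos + (1:lens(i)), :) = a(b(i,1):b(i,2), :);
    pos = pos + lens(i);
end
```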

A solution using arrayfun and cell arrays:
% Some example data:
a = reshape(1:100, 10, 10);
b = [ 2 3; 5 8; 10 10 ];
Here's the code:
bCell = mat2cell(b, size(b,1), [ 1 1 ]); % split b into its two columns
aRows = cell2mat(arrayfun(@(x,y) a(x:y, :), bCell{1}, bCell{2}, 'UniformOutput', false));

Related

Loading arrays of different sizes into a single array

I have 100 arrays with dimensions n x 1, where n varies from one array to the next (e.g., n1 = 50, n2 = 52, n3 = 48, etc.). I would like to combine all these arrays into a single m x 100 matrix (one array per column), where m is the largest of the n's.
The issue I am running into is that, because n varies, MATLAB throws an error saying that the dimensions do not match. Is there a way around this, so that I can pad the "missing" cells with N/A? For instance, if the first array contains 50 elements (i.e., n1 = 50), like this:
23
31
6
...
22
the second array contains 52 elements (i.e., n2 = 52) like this:
25
85
41
...
8
12
66
The result should be:
23 25
31 85
6 41
... ...
22 8
N/A 12
N/A 66
Thanks to the community in advance!

Here is another approach without eval.
array_lengths = cellfun(@numel, arrays);
max_length = max(array_lengths);
result = nan(max_length, num_arrays);
for r=1:num_arrays
result(1:array_lengths(r),r) = arrays{r}(1:array_lengths(r));
end
Some explanation: I'm assuming your arrays are stored in a cell array to begin with. Here is some code to generate dummy data with the dimensions you gave:
% some dummy data for your arrays.
num_arrays = 100;
primerArrayCell = num2cell(ones(1,num_arrays));
arrays = cellfun(@(c) rand(randi(50, 1),1), primerArrayCell, 'UniformOutput', false);
You can use cellfun with a function handle to get the length of each individual array:
% Assume your arrays are in a cell of arrays with the variable name arrays
array_lengths = cellfun(@numel, arrays);
max_length = max(array_lengths);
Allocate an array of NaN values to store your result:
% initialize your data to NaNs.
result = nan(max_length, num_arrays);
Then fill in the non-NaN values based on the lengths of the arrays calculated previously.
for r=1:num_arrays
result(1:array_lengths(r),r) = arrays{r}(1:array_lengths(r));
end
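A self-contained run of the approach above on small fixed data (the three arrays are made up for illustration, and the first two start like the question's example), padding with NaN as in the answer:

```matlab
% Three column vectors of different lengths, stored in a cell array
arrays = {[23; 31; 6], [25; 85; 41; 8; 12], [7; 9]};
num_arrays = numel(arrays);

array_lengths = cellfun(@numel, arrays);   % [3 5 2]
max_length = max(array_lengths);

% One column per input array, padded with NaN
result = nan(max_length, num_arrays);
for r = 1:num_arrays
    result(1:array_lengths(r), r) = arrays{r};
end
```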

You may want to consider using structure arrays for storing such datasets as it makes everything easier when merging them into a single array.
But to answer your question, if you have arrays like this:
a1 = 1:20; % array of size 1 x 20
n1 = numel(a1); % 20
a2 = 50:60; % array of size 1 x 11
n2 = numel(a2); % 11
... say you have nArrs arrays
Given nArrs such arrays, you can create the desired matrix res like this:
m = max([n1, n2, .... ]);
res = nan(m, nArrs); % initialize the result matrix w/ NaN
% Manually
res(1:n1,1) = a1.';
res(1:n2,2) = a2.';
% ... so on
% Or use eval instead like this
for i = 1:nArrs
eval(['res(1:n' int2str(i) ', i) = a' int2str(i) '.'';'])
end
Now bear in mind that using eval is NOT recommended, but hopefully this gives you an idea of what to do. If you did use structures, you could replace eval with something more efficient and robust, such as arrayfun.
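A sketch of the structure-array route, assuming each array is stored in a field named data of a struct array s (both names are hypothetical). Because the arrays are addressed by number rather than by variable name, no eval is needed:

```matlab
% Hypothetical struct array: s(k).data holds the k-th array
s(1).data = 1:20;     % 1 x 20
s(2).data = 50:60;    % 1 x 11
s(3).data = 5:7;      % 1 x 3
nArrs = numel(s);

lens = arrayfun(@(x) numel(x.data), s);   % length of each array
m = max(lens);

res = nan(m, nArrs);                      % initialize the result with NaN
for i = 1:nArrs
    res(1:lens(i), i) = s(i).data(:);     % (:) turns each array into a column
end
```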

How do I create a list of indices based on a value, and a new array from it? MATLAB

I would like to eliminate all the columns of my dataset in which the third row contains a zero.
As an example:
original_data = [1 2 3 4 5; 1 2 3 4 5; 0 0 0 1 2]
For the first three columns (with zeros in the third row), I would like to create a new array in which the columns with zeros in the third row are deleted, to get the result:
new_data = [ 4 5; 4 5; 1 2]
I would also like an array of the column indices of the non-zero values in the original array.
For example:
original_indices = [4, 5]
I tried:
dados_teste = dados_out_15;
dados_p6 = [];
[m,n] = size(dados_teste)
for i = 1:n
if dados_teste(3:i) == 0;
dados_p6 = dados_teste(:,i)
else
dados_p6 = dados_teste(:,n)
end
end
But it clearly does not work...

I would apply the find() function to find all the non-zero indices, then apply matrix indexing to generate a new array that only contains the columns corresponding to the non-zero indices in the third row.
Sample_Array = [20 30 40 50; 30 20 70 90; 0 2 1 2];
%Grabbing the third row of the matrix%
Third_Row = Sample_Array(3,:);
%Finding all the non-zero indices%
[Non_Zero_Indices] = find(Third_Row);
%Using matrix indices to generate a new array based on the non-zero
%indices%
New_Matrix = Sample_Array(:,Non_Zero_Indices);
%Printing matrices%
Sample_Array
New_Matrix
Non_Zero_Indices
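The same result can also be had with logical indexing, without calling find until the column numbers are needed. A minimal sketch on the question's own data (keep is an illustrative name):

```matlab
original_data = [1 2 3 4 5; 1 2 3 4 5; 0 0 0 1 2];

keep = original_data(3,:) ~= 0;        % logical mask over columns
new_data = original_data(:, keep);     % drop columns with a zero in row 3
original_indices = find(keep);         % column numbers that were kept
```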

reversing shuffling of array by indexing

I have a matrix whose columns were shuffled according to some index. I now want to find the index that 'unshuffles' the array back into its original state.
For example:
myArray = [10 20 30 40 50 60]';
myShuffledArray = nan(6,3)
myShufflingIndex = nan(6,3)
for x = 1:3
myShufflingIndex(:,x) = randperm(length(myArray))';
myShuffledArray(:,x) = myArray(myShufflingIndex(:,x));
end
Now I want to find a matrix myUnshufflingIndex, which reverses the shuffling to get an array myUnShuffledArray = [10 20 30 40 50 60; 10 20 30 40 50 60; 10 20 30 40 50 60]'
I expect to use myUnshufflingIndex in the following way:
for x = 1:3
myUnShuffledArray(:,x) = myShuffledArray(myUnshufflingIndex(:,x), x);
end
For example, if one column in myShufflingIndex = [2 4 6 3 5 1]', then the corresponding column in myUnshufflingIndex is [6 1 4 2 5 3]'
Any ideas on how to get myUnshufflingIndex in a neat vectorised way? Also, is there a better way to unshuffle the array columnwise than in a loop?

You can get myUnshufflingIndex with a single call to sort:
[~, myUnshufflingIndex] = sort(myShufflingIndex, 1);
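Why sort works: a column of the shuffling index is a permutation of 1..n, so its k-th smallest entry is k, and sort's second output reports the position where that k sits, i.e. where original element k ended up after shuffling. A worked check on the column from the question (shuffleIdx, unshuffleIdx and restored are illustrative names):

```matlab
shuffleIdx = [2 4 6 3 5 1]';
[~, unshuffleIdx] = sort(shuffleIdx, 1);   % inverse permutation

myArray = [10 20 30 40 50 60]';
shuffled = myArray(shuffleIdx);            % [20 40 60 30 50 10]'
restored = shuffled(unshuffleIdx);         % back to the original order
```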
Alternatively, you don't even need to compute myUnshufflingIndex, since you can just use myShufflingIndex on the left hand side of the assignment to unshuffle the data:
for x = 1:3
myUnShuffledArray(myShufflingIndex(:, x), x) = myShuffledArray(:, x);
end
And if you'd like to avoid a for loop while unshuffling, you can vectorize it by adding an offset to each column of your index, turning it into a matrix of linear indices instead of just row indices:
[nRows, nCols] = size(myShufflingIndex);
linearIndex = myShufflingIndex+repmat(0:nRows:(nRows*(nCols-1)), nRows, 1);
myUnShuffledArray = nan(nRows, nCols); % Preallocate
myUnShuffledArray(linearIndex) = myShuffledArray;
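The same column offsets also work on the right-hand side, so the sort-based inverse index can gather the data in one vectorized step instead. A sketch that rebuilds the question's setup and checks the result; sortIdx and offsets are illustrative names:

```matlab
myArray = [10 20 30 40 50 60]';
nRows = numel(myArray);
nCols = 3;

% Shuffle each column independently, as in the question
myShufflingIndex = zeros(nRows, nCols);
for x = 1:nCols
    myShufflingIndex(:, x) = randperm(nRows)';
end
myShuffledArray = myArray(myShufflingIndex);   % nRows-by-nCols

% Inverse permutation of every column at once
[~, sortIdx] = sort(myShufflingIndex, 1);

% Column offsets turn per-column row indices into linear indices
offsets = repmat(0:nRows:nRows*(nCols-1), nRows, 1);
myUnShuffledArray = myShuffledArray(sortIdx + offsets);
```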

Delete values between specific ranges of indices in an array

I have an array :
Z = [1 24 3 4 52 66 77 8 21 100 101 120 155];
I have another array:
deletevaluesatindex=[1 3; 6 7;10 12]
I want to delete the values in array Z at indices (1 to 3, 6 to 7, 10 to 12) represented in the array deletevaluesatindex
So the result of Z is:
Z=[4 52 8 21 155];
I tried to use the expression below, but it does not work:
X([deletevaluesatindex])=[]

Another solution using bsxfun and cumsum:
%// create index matrix
idx = bsxfun(@plus , deletevaluesatindex.', [0; 1])
%// create mask
mask = zeros(numel(Z),1);
mask(idx(:)) = (-1).^(0:numel(idx)-1)
%// extract unmasked elements
out = Z(~cumsum(mask))
out = 4 52 8 21 155

This will do it:
rdvi= size(deletevaluesatindex,1); %finding rows of 'deletevaluesatindex'
temp = cell(1,rdvi); %Pre-allocation
for i=1:rdvi
%making a cell array of elements to be removed
temp(i)={deletevaluesatindex(i,1):deletevaluesatindex(i,2)};
end
temp = cell2mat(temp); %Now temp array contains the elements to be removed
Z(temp)=[] % Removing the elements

If you control how deletevaluesatindex is generated, you can instead generate the ranges directly with MATLAB's colon operator and concatenate them:
deletevaluesatindex=[1:3 6:7 10:12]
then use the expression you suggested
Z([deletevaluesatindex])=[]
If you have to use deletevaluesatindex as given, you can generate the concatenated range using a loop, or something like this:
lo = deletevaluesatindex(:,1)
up = deletevaluesatindex(:,2)
x = cumsum(accumarray(cumsum([1;up(:)-lo(:)+1]),[lo(:);0]-[0;up(:)]-1)+1);
deleteat = x(1:end-1)
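The accumarray line is compact but opaque. The running sum adds +1 per step, except at the output position where each range starts, where an extra jump moves it from the end of the previous range to the start of the next one; the final entry cancels the sum back to 0 and is dropped. A step-by-step check on the question's data (starts and jumps are illustrative names):

```matlab
Z = [1 24 3 4 52 66 77 8 21 100 101 120 155];
deletevaluesatindex = [1 3; 6 7; 10 12];

lo = deletevaluesatindex(:,1);   % [1; 6; 10]
up = deletevaluesatindex(:,2);   % [3; 7; 12]

% Output positions where each range starts, plus one past the last range
starts = cumsum([1; up(:) - lo(:) + 1]);       % [1; 4; 6; 9]
% Extra jump at each start: lo(k) - up(k-1) - 1 (the first one is lo(1) - 1)
jumps = [lo(:); 0] - [0; up(:)] - 1;           % [0; 2; 2; -13]
% Every step is +1, plus the jump where a range starts
x = cumsum(accumarray(starts, jumps) + 1);     % [1 2 3 6 7 10 11 12 0]'
deleteat = x(1:end-1);

Z(deleteat) = [];
```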

Edit: as noted in the comments, this solution only works in GNU Octave.
With bsxfun, this is possible:
Z=[1 24 3 4 52 66 77 8 21 100 101 120 155];
deletevaluesatindex = [1 3; 6 7;10 12];
idx = 1:size(deletevaluesatindex ,1);
idx_rm=bsxfun(@(A,B) (A(B):deletevaluesatindex (B,2))',deletevaluesatindex (:,1),idx);
Z(idx_rm(idx_rm ~= 0))=[]

Find timeline for duration values in Matlab

I have the following time-series:
b = [2 5 110 113 55 115 80 90 120 35 123];
Each number in b is one data point at a time instant. I computed the duration values from b. A duration is a run of consecutive numbers in b that are greater than or equal to 100 (all other numbers are discarded); a gap of at most one number smaller than 100 is allowed within a run. This is what the code for duration looks like:
N = 2; % maximum allowed gap
duration = cellfun(@numel, regexp(char((b>=100)+'0'), [repmat('0',1,N) '+'], 'split'));
giving the following duration values for b:
duration = [4 3];
I want to find the positions (time-lines) within b for each value in duration. Next, I want to replace the other positions located outside duration with zeros. The result would look like this:
result = [0 0 3 4 5 6 0 0 9 10 11];
If anyone could help, it would be great.

Answer to original question: pattern with at most one value below 100
Here's an approach using a regular expression to detect the desired pattern. I'm assuming that one value <100 is allowed only between (not after) values >=100. So the pattern is: one or more values >=100, with a possible single value <100 in between.
b = [2 5 110 113 55 115 80 90 120 35 123]; %// data
B = char((b>=100)+'0'); %// convert to string of '0' and '1'
[s, e] = regexp(B, '1+(.1+|)', 'start', 'end'); %// find pattern
y = 1:numel(B);
c = any(bsxfun(@ge, y, s(:)) & bsxfun(@le, y, e(:))); %// filter by locations of pattern
y = y.*c; %// result
This gives
y =
0 0 3 4 5 6 0 0 9 10 11
Answer to edited question: pattern with at most n values in a row below 100
The regexp needs to be modified, and it has to be dynamically built as a function of n:
b = [2 5 110 113 55 115 80 90 120 35 123]; %// data
n = 2;
B = char((b>=100)+'0'); %// convert to string of '0' and '1'
r = sprintf('1+(.{1,%i}1+)*', n); %// build the regular expression from n
[s, e] = regexp(B, r, 'start', 'end'); %// find pattern
y = 1:numel(B);
c = any(bsxfun(@ge, y, s(:)) & bsxfun(@le, y, e(:))); %// filter by locations of pattern
y = y.*c; %// result
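The bsxfun filter compares every position against every match, building a matrix with one row per match and one column per sample. A sketch of a different technique that marks the matched spans with a running sum instead (mask and inRun are illustrative names). It uses the single-gap case from the first part, written as '1+(.1+)?', which should match the same strings as the answer's '1+(.1+|)':

```matlab
b = [2 5 110 113 55 115 80 90 120 35 123];   % data
B = char((b >= 100) + '0');                  % string of '0' and '1'
[s, e] = regexp(B, '1+(.1+)?', 'start', 'end');

% +1 where a match starts, -1 just after it ends; since matches never
% overlap, the running sum is positive exactly inside a match
mask = zeros(1, numel(B) + 1);
mask(s) = mask(s) + 1;
mask(e + 1) = mask(e + 1) - 1;
inRun = cumsum(mask(1:end-1)) > 0;

y = (1:numel(B)) .* inRun;                   % result
```

For the edited question, only the pattern changes (the sprintf-built r above); the marking step stays the same.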

Here is another solution, not using regexp. It naturally generalizes to arbitrary gap sizes and thresholds. Not sure whether there is a better way to fill the gaps. Explanation in comments:
% maximum step size and threshold
N = 2;
threshold = 100;
% data
b = [2 5 110 113 55 115 80 90 120 35 123];
% find valid data
B = b >= threshold;
B_ind = find(B);
% find lengths of gaps
step_size = diff(B_ind);
% find acceptable steps (and ignore step size 1)
permissible_steps = 1 < step_size & step_size <= N;
% find beginning and end of runs
good_begin = B_ind([permissible_steps, false]);
good_end = good_begin + step_size(permissible_steps);
% fill gaps in B
for ii = 1:numel(good_begin)
B(good_begin(ii):good_end(ii)) = true;
end
% find durations of runs in B. This finds points where we switch from 0 to
% 1 and vice versa. Due to padding the first match is always a start of a
% run, the last one always an end. There will be an even number of matches,
% so we can reshape and diff and thus find the durations
durations = diff(reshape(find(diff([false, B, false])), 2, []));
% get positions of 'good' data
outpos = zeros(size(b));
outpos(B) = find(B);
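On the open question of filling the gaps without a loop: one option is to mark each gap with +1 at good_begin and -1 just after good_end, so that the running sum is positive on every position covered by at least one gap. A sketch that repeats the setup above and replaces only the for loop (gapMask is an illustrative name; separate additions keep the counts correct when one gap ends where the next begins):

```matlab
N = 2;
threshold = 100;
b = [2 5 110 113 55 115 80 90 120 35 123];

B = b >= threshold;
B_ind = find(B);
step_size = diff(B_ind);
permissible_steps = 1 < step_size & step_size <= N;
good_begin = B_ind([permissible_steps, false]);
good_end = good_begin + step_size(permissible_steps);

% Loop-free gap filling with a running sum
gapMask = zeros(1, numel(B) + 1);
gapMask(good_begin) = gapMask(good_begin) + 1;
gapMask(good_end + 1) = gapMask(good_end + 1) - 1;
B = B | (cumsum(gapMask(1:end-1)) > 0);

durations = diff(reshape(find(diff([false, B, false])), 2, []));
outpos = zeros(size(b));
outpos(B) = find(B);
```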