How can I efficiently remove zeroes from a (non-sparse) matrix?

I have a matrix:
x = [0 0 0 1 1 0 5 0 7 0];
I need to remove all of the zeroes, like so:
x = [1 1 5 7];
The matrices I am using are large (1x15000) and I need to do this multiple times (5000+), so efficiency is key!

One way:
x(x == 0) = [];
A note on timing:
As mentioned by woodchips, this method seems slow compared to the one used by KitsuneYMG. This has also been noted by Loren in one of her MathWorks blog posts. Since you mentioned having to do this thousands of times, you may notice a difference, in which case I would try x = x(x~=0); first.
WARNING: Beware if you are using non-integer numbers. If, for example, you have a very small number that you would like to consider close enough to zero so that it will be removed, the above code won't remove it. Only exact zeroes are removed. The following will help you also remove numbers "close enough" to zero:
tolerance = 0.0001; % Choose a threshold for "close enough to zero"
x(abs(x) <= tolerance) = [];
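To make the warning concrete, here is a small sketch. The sample vector is made up for illustration: its first entry is a floating-point rounding residue that is tiny but not exactly zero, so the exact comparison keeps it while the tolerance test removes it.

```matlab
% Hypothetical sample data: 0.1 + 0.2 - 0.3 is about 5.55e-17, not exactly 0
x = [0.1 + 0.2 - 0.3, 1, 0, 2];

exact = x;
exact(exact == 0) = [];                  % removes only the exact zero

tolerance = 1e-4;                        % threshold chosen for this example
approx = x;
approx(abs(approx) <= tolerance) = [];   % also removes the rounding residue

disp(numel(exact))                       % 3
disp(numel(approx))                      % 2
```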

Just to be different:
x=x(x~=0);
or
x=x(abs(x)>threshold);
This has the bonus of working on complex numbers too

Those are the three common solutions. It helps to see the difference.
x = round(rand(1,15000));
y = x;
tic,y(y==0) = [];toc
Elapsed time is 0.004398 seconds.
y = x;
tic,y = y(y~=0);toc
Elapsed time is 0.001759 seconds.
y = x;
tic,y = y(find(y));toc
Elapsed time is 0.003579 seconds.
As you can see, the cheapest way is the direct logical index, selecting the elements to be retained. The find version is more expensive, since MATLAB first finds those elements, returns a list of their indices, and only then indexes into the vector.

Here's another way
y = x(find(x))
I'll leave it to you to figure out the relative efficiency of the various approaches you try -- do write and let us all know.

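Though my timing results are not conclusive as to whether it is significantly faster, this seems to be the fastest and easiest approach:
y = nonzeros(y)
Note that nonzeros always returns a column vector, so transpose the result if you need a row vector.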
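Single tic/toc measurements like the ones in this thread are noisy, so a repeated measurement may give a clearer picture. The following is only a sketch, assuming a MATLAB release that ships timeit (R2013b or newer); the handle names are made up for illustration. The delete-by-assignment form, x(x == 0) = [], is left out because an assignment cannot be written inside an anonymous function.

```matlab
x = round(rand(1, 15000));         % same kind of test data as in the timings above

% Each handle takes no inputs, as timeit requires
f_logical  = @() x(x ~= 0);        % logical indexing
f_find     = @() x(find(x));       % find, then index
f_nonzeros = @() nonzeros(x).';    % nonzeros, transposed back to a row

fprintf('logical : %.6f s\n', timeit(f_logical));
fprintf('find    : %.6f s\n', timeit(f_find));
fprintf('nonzeros: %.6f s\n', timeit(f_nonzeros));
```

The absolute numbers depend on the machine and the MATLAB release, so treat them as relative indications only.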

If you have two vectors and want to remove only the positions where both are zero, for example:
x = [0 0 0 1 1 0 5 0 7 0]
y = [0 2 0 1 1 2 5 2 7 0]
Then x2 and y2 can be obtained as:
x2=x(~(x==0 & y==0))
y2=y(~(x==0 & y==0))
x2 = [0 1 1 0 5 0 7]
y2 = [2 1 1 2 5 2 7]
Hope this helps!
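The same result can be written with the mask computed once and reused for both vectors. This is a sketch of that variation; the variable name keep is made up for illustration.

```matlab
x = [0 0 0 1 1 0 5 0 7 0];
y = [0 2 0 1 1 2 5 2 7 0];

keep = (x ~= 0) | (y ~= 0);   % true wherever at least one vector is nonzero
x2 = x(keep);
y2 = y(keep);

disp(x2)                      % 0 1 1 0 5 0 7
disp(y2)                      % 2 1 1 2 5 2 7
```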

Related

How can I remove rows of a matrix in Matlab when the difference between two consecutive rows is more than a threshold?

Suppose a data like:
X y
1 5
2 6
3 1
4 7
5 3
6 8
I want to remove 3 1 and 5 3 because their difference with the previous row is more than 3. In fact, I want to draw a plot with them and want it to be smooth.
I tried
for qq = 1:size(data,1)
if data(qq,2) - data(qq-1,2) > 3
data(qq,:)=[];
end
end
However, it gives:
Subscript indices must either be real positive integers or logicals.
Moreover, I guess the size of array changes as I remove some elements.
In the end, the difference between no consecutive elements must be greater than threshold.
In practice, I want to smooth the following picture, which fluctuates strongly.
One very simple filter from mathematical morphology that you could try is the closing with a structuring element of size 2. It changes the value of any sample that is lower than both neighbors to the lower of its two neighbors. Other values are not changed. Thus, it doesn't use a threshold to determine which samples are wrong; it only checks whether the sample is lower than both neighbors:
y = [5, 6, 1, 7, 3, 8]; % OP's second column
y1 = y;
y1(end+1) = -inf; % enforce boundary condition
y1 = max(y1,circshift(y1,1)); % dilation
y1 = min(y1,circshift(y1,-1)); % erosion
y1 = y1(1:end-1); % undo boundary condition change
This returns y1 = [5 6 6 7 7 8].
If you want to prevent changing your signal for small deviations, you can apply your threshold as a second step:
I = y1 - y < 3;
y1(I) = y(I);
This finds the places where we changed the signal, but the change was less than the threshold of 3. At those places we write back the original value.
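To see the threshold step change something, here is the closing plus write-back on made-up data with one small dip and one large dip. It is a sketch only: the sample values are hypothetical, and circshift is called with an explicit [0 1] shift so that the row vector is shifted along its second dimension.

```matlab
y = [5, 6, 5, 7, 3, 8];                % made-up data: small dip (5), large dip (3)

% Closing with a structuring element of size 2
y1 = [y, -inf];                        % enforce boundary condition
y1 = max(y1, circshift(y1, [0 1]));    % dilation
y1 = min(y1, circshift(y1, [0 -1]));   % erosion
y1 = y1(1:end-1);                      % undo boundary condition change

% Restore samples that the closing changed by less than the threshold
I = y1 - y < 3;
y1(I) = y(I);

disp(y1)                               % 5 6 5 7 7 8
```

The small dip survives because the closing raised it by only 1, while the large dip was raised by 4 and stays filled.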
You have a few errors:
Your index needs to start from 2, so that you aren't trying to index 0 for a previous index.
You need to check that the absolute value of the difference is greater than 3.
Since your data matrix is changing sizes, you can't use a for loop with a fixed number of iterations. Use a while loop instead.
This should give you the results you want:
qq = 2;
while qq <= size(data, 1)
if abs(data(qq, 2) - data(qq-1, 2)) > 3,
data(qq, :) = [];
else
qq = qq+1;
end
end
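As a check, here is the loop run on the sample data from the question. This is a minimal sketch in which the data matrix is typed in by hand.

```matlab
data = [1 5; 2 6; 3 1; 4 7; 5 3; 6 8];   % the sample data from the question

qq = 2;
while qq <= size(data, 1)
    if abs(data(qq, 2) - data(qq-1, 2)) > 3
        data(qq, :) = [];     % drop the row; the next row moves into position qq
    else
        qq = qq + 1;          % only advance when nothing was removed
    end
end

disp(data)                    % rows (1,5), (2,6), (4,7), (6,8) remain
```

Because each comparison is made against the last row that was kept, no two consecutive rows in the result differ by more than the threshold.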

Replace +/- values around index - MATLAB

Following this question and the precious help I got from it, I've run into the following issue:
Using indices of detected peaks and having computed the median of my signal +/-3 datapoints around these peaks, I need to replace my signal in a +/-5 window around the peak with the previously computed median.
I'm only able to replace the datapoint at the peak with the median, but not the surrounding +/-5 data points (see figure). Black = original peak; Yellow = data point at peak changed to the median of +/-3 datapoints around it.
Original peak and changed peak
Unfortunately I have not been able to make it work by following suggestions on the previous question.
Any help will be very much appreciated!
Cheers,
M
Assuming you mean the following. Given the array
x = [0 1 2 3 4 5 35 5 4 3 2 1 0]
you want to replace 35 and surrounding +/- 5 entries with the median of 3,4,5,35,5,4,3, which is 4, so the resulting array should be
x = [0 4 4 4 4 4 4 4 4 4 4 4 0]
Following my answer in this question, an intuitive approach is to simply replace the neighbors with the median value by offsetting the indices. This can be accomplished as follows:
[~,idx]=findpeaks(x);
med_sz = 3; % Take the median with respect to +/- this many neighbors
repl_sz = 5; % Replace neighbors +/- this distance from peak
if ~isempty(idx)
m = medfilt1(x,med_sz*2+1);
N = numel(x);
for offset = -repl_sz:repl_sz
idx_offset = idx + offset;
idx_valid = idx_offset >= 1 & idx_offset <= N;
x(idx_offset(idx_valid)) = m(idx(idx_valid));
end
end
Alternatively, if you want to avoid loops, an equivalent loopless implementation is
[~,idx]=findpeaks(x);
med_sz = 3;
repl_sz = 5;
if ~isempty(idx)
m = medfilt1(x,med_sz*2+1);
idx_repeat = repmat(idx,repl_sz*2+1,1);
idx_offset = idx_repeat + repmat((-repl_sz:repl_sz)',1,numel(idx));
idx_valid = idx_offset >= 1 & idx_offset <= numel(x); % keep only offsets that fall inside x
idx_repeat = idx_repeat(idx_valid);
idx_offset = idx_offset(idx_valid);
x(idx_offset) = m(idx_repeat);
end
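Both versions above rely on findpeaks and medfilt1 from the Signal Processing Toolbox. For checking the expected result on the example array without the toolbox, the following sketch may help. It assumes a single peak located at the maximum of x, which is a simplification and not how findpeaks works in general; the window variable names are made up for illustration.

```matlab
x = [0 1 2 3 4 5 35 5 4 3 2 1 0];
med_sz  = 3;                      % median window: +/- this many neighbors
repl_sz = 5;                      % replacement window: +/- this distance

N = numel(x);
[~, idx] = max(x);                % assumption: one peak, located at the maximum

% Clamp both windows to the valid index range 1..N
med_win  = max(1, idx - med_sz)  : min(N, idx + med_sz);
repl_win = max(1, idx - repl_sz) : min(N, idx + repl_sz);

m = median(x(med_win));           % median of 3,4,5,35,5,4,3 is 4
x(repl_win) = m;

disp(x)                           % 0 4 4 4 4 4 4 4 4 4 4 4 0
```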

Upsample vector by including zeros in-between elements

I am trying to write upsampling code, so I want to insert a vector of zeros between the elements of a vector, like this:
z=[0 0]%Zeros Vector
x=[1 2 3 4]%Vector to Upsample
y=[1 0 0 2 0 0 3 0 0 4 0 0]%Vector Upsampled
I have this code:
fs=20;
N=50;
T=1/fs;
n=0:1:N-1;
L=3;
M=2;
x = exp(-0.5*n*T).*sin(2*pi*n*T);
A=zeros(1,L);
disp(A);
for i = 1:M:length(x)
x(:,i)=A;
end
disp(x);
But I am getting this error:
A(I,J,...) = X: dimensions mismatch
Any idea how I can do that?
Forget the loop. Use the following solution:
out = zeros(1,length(x)*L);
out(:,1:L:end) = x
Here's a solution using bsxfun and reshape:
y = reshape(bsxfun(@times,[1;z.'],x),1,[]);
I initially thought of repelem, but decided it was too much work. However, if you just want to make your vector longer using a "zero order approximation" - this is just the function for you.
You can use repmat and reshape to get the upsampled vector as:
y = reshape([x' repmat(z,size(x,2),1)]',1,[])
y =
1 0 0 2 0 0 3 0 0 4 0 0
Keep in mind that z and x are row vectors, you may need to play with the statement a little bit if they are column vectors.
A short alternative using the kronecker tensor product:
y = kron(x,[1, z]) %// x(:).' and z(:).' for independent vector orientations
And another fast alternative:
y = [1; z(:)]*x; y = y(:).' %// x(:).' for independent vector orientations
which is basically equivalent to:
y = reshape( [1; z(:)]*x, 1, []) %// x(:).' for independent vector orientations
Use upsample
fs=20;
N=50;
T=1/fs;
n=0:1:N-1;
L=3;
x = exp(-0.5*n*T).*sin(2*pi*n*T);
x = upsample(x, L)
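For the small example from the question, two of the toolbox-free approaches can be checked against each other. This is a sketch; the names y_index and y_kron are made up for illustration.

```matlab
z = [0 0];                        % zeros to insert after each sample
x = [1 2 3 4];                    % vector to upsample

L = numel(z) + 1;                 % upsampling factor
y_index = zeros(1, numel(x) * L);
y_index(1:L:end) = x;             % strided assignment into a zero vector

y_kron = kron(x, [1, z]);         % Kronecker product variant

disp(y_index)                     % 1 0 0 2 0 0 3 0 0 4 0 0
disp(isequal(y_index, y_kron))    % 1
```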

Split vector in MATLAB

I'm trying to elegantly split a vector. For example,
vec = [1 2 3 4 5 6 7 8 9 10]
According to another vector of 0's and 1's of the same length where the 1's indicate where the vector should be split - or rather cut:
cut = [0 0 0 1 0 0 0 0 1 0]
Giving us a cell output similar to the following:
[1 2 3] [5 6 7 8] [10]
Solution code
You can use cumsum & accumarray for an efficient solution -
%// Create ID/labels for use with accumarray later on
id = cumsum(cut)+1
%// Mask to get valid values from cut and vec corresponding to ones in cut
mask = cut==0
%// Finally get the output with accumarray using masked IDs and vec values
out = accumarray(id(mask).',vec(mask).',[],@(x) {x})
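Applied to the example from the question, the intermediate values look like this. It is a small sketch for illustration; note that each cell comes back as a column vector because accumarray is given column inputs.

```matlab
vec = [1 2 3 4 5 6 7 8 9 10];
cut = [0 0 0 1 0 0 0 0 1 0];

id   = cumsum(cut) + 1;           % 1 1 1 2 2 2 2 2 3 3
mask = cut == 0;                  % false at the cut positions

% accumarray groups vec(mask) by id(mask); each group becomes one cell
out = accumarray(id(mask).', vec(mask).', [], @(v) {v});

celldisp(out)
```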
Benchmarking
Here are some performance numbers when using a large input on the three most popular approaches listed to solve this problem -
N = 100000; %// Input Datasize
vec = randi(100,1,N); %// Random inputs
cut = randi(2,1,N)-1;
disp('-------------------- With CUMSUM + ACCUMARRAY')
tic
id = cumsum(cut)+1;
mask = cut==0;
out = accumarray(id(mask).',vec(mask).',[],@(x) {x});
toc
disp('-------------------- With FIND + ARRAYFUN')
tic
N = numel(vec);
ind = find(cut);
ind_before = [ind-1 N]; ind_before(ind_before < 1) = 1;
ind_after = [1 ind+1]; ind_after(ind_after > N) = N;
out = arrayfun(@(x,y) vec(x:y), ind_after, ind_before, 'uni', 0);
toc
disp('-------------------- With CUMSUM + ARRAYFUN')
tic
cutsum = cumsum(cut);
cutsum(cut == 1) = NaN; %Don't include the cut indices themselves
sumvals = unique(cutsum); % Find the values to use in indexing vec for the output
sumvals(isnan(sumvals)) = []; %Remove NaN values from sumvals
output = arrayfun(@(val) vec(cutsum == val), sumvals, 'UniformOutput', 0);
toc
Runtimes
-------------------- With CUMSUM + ACCUMARRAY
Elapsed time is 0.068102 seconds.
-------------------- With FIND + ARRAYFUN
Elapsed time is 0.117953 seconds.
-------------------- With CUMSUM + ARRAYFUN
Elapsed time is 12.560973 seconds.
Special case scenario: In cases where you might have runs of 1's, you need to modify a few things, as listed next -
%// Mask to get valid values from cut and vec corresponding to ones in cut
mask = cut==0
%// Setup IDs differently this time. The idea is to have successive IDs.
id = cumsum(cut)+1
[~,~,id] = unique(id(mask))
%// Finally get the output with accumarray using masked IDs and vec values
out = accumarray(id(:),vec(mask).',[],@(x) {x})
Sample run with such a case -
>> vec
vec =
1 2 3 4 5 6 7 8 9 10
>> cut
cut =
1 0 0 1 1 0 0 0 1 0
>> celldisp(out)
out{1} =
2
3
out{2} =
6
7
8
out{3} =
10
For this problem, a handy function is cumsum, which can create a cumulative sum of the cut array. The code that produces an output cell array is as follows:
vec = [1 2 3 4 5 6 7 8 9 10];
cut = [0 0 0 1 0 0 0 0 1 0];
cutsum = cumsum(cut);
cutsum(cut == 1) = NaN; %Don't include the cut indices themselves
sumvals = unique(cutsum); % Find the values to use in indexing vec for the output
sumvals(isnan(sumvals)) = []; %Remove NaN values from sumvals
output = {};
for i=1:numel(sumvals)
output{i} = vec(cutsum == sumvals(i)); %#ok<SAGROW>
end
As another answer shows, you can use arrayfun to create a cell array with the results. To apply that here, you'd replace the for loop (and the initialization of output) with the following line:
output = arrayfun(@(val) vec(cutsum == val), sumvals, 'UniformOutput', 0);
That's nice because it doesn't end up growing the output cell array.
The key feature of this routine is the variable cutsum, which ends up looking like this:
cutsum =
0 0 0 NaN 1 1 1 1 NaN 2
Then all we need to do is use it to create indices to pull the data out of the original vec array. We loop from zero to max and pull matching values. Notice that this routine handles some situations that may arise. For instance, it handles 1 values at the very beginning and very end of the cut array, and it gracefully handles repeated ones in the cut array without creating empty arrays in the output. This is because of the use of unique to create the set of values to search for in cutsum, and the fact that we throw out the NaN values in the sumvals array.
You could use -1 instead of NaN as the signal flag for the cut locations to not use, but I like NaN for readability. The -1 value would probably be more efficient, as all you'd have to do is truncate the first element from the sumvals array. It's just my preference to use NaN as a signal flag.
The output of this is a cell array with the results:
output{1} =
1 2 3
output{2} =
5 6 7 8
output{3} =
10
There are some odd conditions we need to handle. Consider the situation:
vec = [1 2 3 4 5 6 7 8 9 10 11 12 13 14];
cut = [1 0 0 1 1 0 0 0 0 1 0 0 0 1];
There are repeated 1's in there, as well as a 1 at the beginning and end. This routine properly handles all this without any empty sets:
output{1} =
2 3
output{2} =
6 7 8 9
output{3} =
11 12 13
You can do this with a combination of find and arrayfun:
vec = [1 2 3 4 5 6 7 8 9 10];
N = numel(vec);
cut = [0 0 0 1 0 0 0 0 1 0];
ind = find(cut);
ind_before = [ind-1 N]; ind_before(ind_before < 1) = 1;
ind_after = [1 ind+1]; ind_after(ind_after > N) = N;
out = arrayfun(@(x,y) vec(x:y), ind_after, ind_before, 'uni', 0);
We thus get:
>> celldisp(out)
out{1} =
1 2 3
out{2} =
5 6 7 8
out{3} =
10
So how does this work? Well, the first line defines your input vector, the second line finds how many elements are in this vector and the third line denotes your cut vector which defines where we need to cut in our vector. Next, we use find to determine the locations that are non-zero in cut which correspond to the split points in the vector. If you notice, the split points determine where we need to stop collecting elements and begin collecting elements.
However, we need to account for the beginning of the vector as well as the end. ind_after tells us the locations of where we need to start collecting values and ind_before tells us the locations of where we need to stop collecting values. To calculate these starting and ending positions, you simply take the result of find and add and subtract 1 respectively.
Each corresponding position in ind_after and ind_before tell us where we need to start and stop collecting values together. In order to accommodate for the beginning of the vector, ind_after needs to have the index of 1 inserted at the beginning because index 1 is where we should start collecting values at the beginning. Similarly, N needs to be inserted at the end of ind_before because this is where we need to stop collecting values at the end of the array.
Now for ind_after and ind_before, there is a degenerate case where the cut point may be at the end or beginning of the vector. If this is the case, then subtracting or adding 1 will generate a start or stop position that's out of bounds. We check for this in the 5th and 6th lines of code and simply set these to 1 or N depending on whether we're at the beginning or end of the array.
The last line of code uses arrayfun and iterates through each pair of ind_after and ind_before to slice into our vector. Each result is placed into a cell array, and our output follows.
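The start and stop positions described above can be inspected directly for the example input. This is a short sketch that only prints the index pairs before any slicing happens.

```matlab
vec = [1 2 3 4 5 6 7 8 9 10];
cut = [0 0 0 1 0 0 0 0 1 0];
N   = numel(vec);

ind = find(cut);                      % 4 9
ind_before = [ind-1 N];               % stop positions:  3 8 10
ind_after  = [1 ind+1];               % start positions: 1 5 10
ind_before(ind_before < 1) = 1;       % clamp a stop position that fell below 1
ind_after(ind_after > N)   = N;       % clamp a start position that went past N

disp([ind_after; ind_before])         % each column is one start/stop pair
```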
We can check for the degenerate case by placing a 1 at the beginning and end of cut and some values in between:
vec = [1 2 3 4 5 6 7 8 9 10];
cut = [1 0 0 1 0 0 0 1 0 1];
Using this example and the above code, we get:
>> celldisp(out)
out{1} =
1
out{2} =
2 3
out{3} =
5 6 7
out{4} =
9
out{5} =
10
Yet another way, but this time without any loops or accumulating at all...
lengths = diff(find([1 cut 1])) - 1; % assuming a row vector
lengths = lengths(lengths > 0);
data = vec(~cut);
result = mat2cell(data, 1, lengths); % also assuming a row vector
The diff(find(...)) construct gives us the distance from each marker to the next - we append boundary markers with [1 cut 1] to catch any runs of zeros which touch the ends. Each length is inclusive of its marker, though, so we subtract 1 to account for that, and remove any which just cover consecutive markers, so that we won't get any undesired empty cells in the output.
For the data, we mask out any elements corresponding to markers, so we just have the valid parts we want to partition up. Finally, with the data ready to split and the lengths into which to split it, that's precisely what mat2cell is for.
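Traced on the example from the question, the intermediate values are as follows. This is a sketch in which the helper variable markers is introduced only for readability.

```matlab
vec = [1 2 3 4 5 6 7 8 9 10];
cut = [0 0 0 1 0 0 0 0 1 0];

markers = find([1 cut 1]);        % 1 5 10 12 (boundary markers included)
lengths = diff(markers) - 1;      % 3 4 1
lengths = lengths(lengths > 0);   % nothing to drop in this example
data    = vec(~cut);              % 1 2 3 5 6 7 8 10
result  = mat2cell(data, 1, lengths);

celldisp(result)
```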
Also, using @Divakar's benchmark code:
-------------------- With CUMSUM + ACCUMARRAY
Elapsed time is 0.272810 seconds.
-------------------- With FIND + ARRAYFUN
Elapsed time is 0.436276 seconds.
-------------------- With CUMSUM + ARRAYFUN
Elapsed time is 17.112259 seconds.
-------------------- With mat2cell
Elapsed time is 0.084207 seconds.
...just sayin' ;)
Here's what you need:
function spl = Splitting(vec,cut)
n=1;
j=1;
for i=1:1:length(vec)
if cut(i)==0
spl{n}(j)=vec(i);
j=j+1;
else
n=n+1;
j=1;
end
end
end
Despite how simple my method is, it's in 2nd place for performance:
-------------------- With CUMSUM + ACCUMARRAY
Elapsed time is 0.264428 seconds.
-------------------- With FIND + ARRAYFUN
Elapsed time is 0.407963 seconds.
-------------------- With CUMSUM + ARRAYFUN
Elapsed time is 18.337940 seconds.
-------------------- SIMPLE
Elapsed time is 0.271942 seconds.
Unfortunately there is no 'inverse concatenate' in MATLAB. If you wish to solve a question like this, you can try the code below. It gives you what you are looking for in the case where you have two split points that produce three vectors at the end. If you want more splits, you will need to modify the code after the loop.
The results are in vector form. To make them into cells, use num2cell on the results.
pos_of_one = 0;
% The loop finds the split points and puts their positions into a vector.
for kk = 1 : length(cut)
if cut(1,kk) == 1
pos_of_one = pos_of_one + 1;
A(1,pos_of_one) = kk;
end
end
F = vec(1 : A(1,1) - 1);
G = vec(A(1,1) + 1 : A(1,2) - 1);
H = vec(A(1,2) + 1 : end);
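One way the code after the loop might be generalized to any number of split points is sketched below. It uses find instead of the explicit loop to locate the split points, and the names starts, stops and parts are made up for illustration.

```matlab
vec = [1 2 3 4 5 6 7 8 9 10];
cut = [0 0 0 1 0 0 0 0 1 0];

A      = find(cut);                 % positions of the split points
starts = [1, A + 1];                % first index of each piece
stops  = [A - 1, numel(vec)];       % last index of each piece

parts = cell(1, numel(starts));     % preallocate one cell per piece
for kk = 1:numel(starts)
    % An empty range (start > stop) yields an empty piece, e.g. for adjacent split points
    parts{kk} = vec(starts(kk):stops(kk));
end

celldisp(parts)
```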

Using bsxfun with an anonymous function

After trying to understand the bsxfun function, I have tried to use it in a script to avoid looping. I am trying to check whether each individual element of an array is contained in a matrix, returning a matrix the same size as the initial array containing 1s and 0s respectively. The anonymous function I have created is:
myfunction = @(x,y) (sum(any(x == y)));
x is the matrix which will contain the 'accepted values', per se. y is the input array. So far I have tried using the bsxfun function in this way:
dummyvar = bsxfun(myfunction,dxcp,X)
I understand that myfunction is the handle of the anonymous function and that bsxfun can be used to accomplish this; I just do not understand the reason for the following error:
Non-singleton dimensions of the two input arrays must match each other.
I am using the following test data:
dxcp = [1 2 3 6 10 20];
X = [2 5 9 18];
and hope for the output to be:
dummyvar = [1,0,0,0]
Cheers, NZBRU.
EDIT: Reached 15 rep so I have updated the answer
Thanks again guys, I thought I would update this as I now understand how the solution provided by Divakar works. This might spare others who have read my initial question and are confused as to how bsxfun() works; I think writing it out helps me understand it better too.
Note: The following may be incorrect, I have just tried to understand how the function operates by looking at this one case.
The input into the bsxfun function was dxcp and X transposed. The function handle used was @eq, so each pair of elements was compared for equality.
%%// Given data
dxcp = [1 2 3 6 10 20];
X = [2 5 9 18];
The following code:
bsxfun(@eq,dxcp,X')
compared every value of dxcp, the first input variable, to every row of X'. The following matrix is the output of this:
dummyvar =
0 1 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
The first element was found by comparing 1 and 2 dxcp = [1 2 3 6 10 20]; X' = [2;5;9;18];
The next along the first row was found by comparing 2 and 2 dxcp = [1 2 3 6 10 20]; X' = [2;5;9;18];
This was repeated until all of the values of dxcp were compared to the first row of X'. Following this logic, the first element in the second row was calculated using the comparison between: dxcp = [1 2 3 6 10 20]; X' = [2;5;9;18];
The final solution provided was any(bsxfun(@eq,dxcp,X'),2) which is equivalent to: any(dummyvar,2). http://nf.nci.org.au/facilities/software/Matlab/techdoc/ref/any.html seems to explain the any function well. Basically, say:
A = [1,2;0,0;0,1]
If the following code is run:
result = any(A,2)
Then the function any will check if each row contains one or several non-zero elements and return 1 if so. The result of this example would be:
result = [1;0;1];
Because the second input parameter is equal to 2. If the above line was changed to result = any(A,1) then it would check for each column.
Using this logic,
result = any(dummyvar,2)
was used to obtain the final result.
1
0
0
0
which if needed could be transposed to equal
[1,0,0,0]
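The whole chain can be reproduced in a few lines. This is a sketch using the test data from the question; the names tbl, in_rows and in_cols are made up for illustration.

```matlab
dxcp = [1 2 3 6 10 20];
X    = [2 5 9 18];

% A 1x6 row against a 4x1 column expands to a 4x6 comparison table
tbl = bsxfun(@eq, dxcp, X');

in_rows = any(tbl, 2);     % one result per row, i.e. per element of X
in_cols = any(tbl, 1);     % one result per column, i.e. per element of dxcp

disp(in_rows')             % 1 0 0 0
disp(in_cols)              % 0 1 0 0 0 0
```

The original call failed because both inputs were row vectors of different lengths (1x6 and 1x4), so neither mismatched dimension was a singleton that bsxfun could expand.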
Performance- After running the following code:
tic
dummyvar = any(bsxfun(@eq,dxcp,X'),2)'
toc
It was found that the duration was:
Elapsed time is 0.000085 seconds.
The alternative below:
tic
arrayfun(@(el) any(el == dxcp),X)
toc
using the arrayfun() function (which applies a function to each element of an array) resulted in a runtime of:
Elapsed time is 0.000260 seconds.
The run times above are averages over 5 runs each, so in this case bsxfun() is faster on average.
You don't want every combination of elements thrown into your any(x == y) test, you want each element from dxcp tested to see if it exists in X. So here is the short version, which also needs no transposes. Vectorization should also be a bit faster than bsxfun.
arrayfun(@(el) any(el == X), dxcp)
The result is
ans =
0 1 0 0 0 0
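For plain membership tests like this, the built-in ismember may also be worth trying, since it covers both directions of the question without a hand-written comparison. This is a sketch with the same test data; the last line assumes R2016b or newer, where == expands a row against a column implicitly.

```matlab
dxcp = [1 2 3 6 10 20];
X    = [2 5 9 18];

in_dxcp = ismember(X, dxcp);      % which elements of X occur in dxcp
in_X    = ismember(dxcp, X);      % which elements of dxcp occur in X

disp(in_dxcp)                     % 1 0 0 0
disp(in_X)                        % 0 1 0 0 0 0

% Assumption: R2016b or newer (implicit expansion instead of bsxfun)
alt = any(dxcp == X.', 2).';      % same result as in_dxcp
```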
