How can I remove rows of a matrix in Matlab when the difference between two consecutive rows is more than a threshold? - arrays

Suppose I have data like:
X y
1 5
2 6
3 1
4 7
5 3
6 8
I want to remove 3 1 and 5 3 because their difference with the previous row is more than 3. In fact, I want to draw a plot with them and want it to be smooth.
I tried
for qq = 1:size(data,1)
if data(qq,2) - data(qq-1,2) > 3
data(qq,:)=[];
end
end
However, it gives:
Subscript indices must either be real positive integers or logicals.
Moreover, I guess the size of the array changes as I remove elements.
In the end, no two consecutive elements may differ by more than the threshold.
In practice, I want to smooth the following plot, which fluctuates strongly.

One very simple filter from mathematical morphology that you could try is the closing with a structuring element of size 2. It raises any sample that is lower than both of its neighbors to the lower of the two neighbors; other values are not changed. Thus, it does not use a threshold to decide which samples are wrong; it only checks whether a sample is lower than both neighbors:
y = [5, 6, 1, 7, 3, 8]; % OP's second column
y1 = y;
y1(end+1) = -inf; % enforce boundary condition
y1 = max(y1,circshift(y1,1)); % dilation
y1 = min(y1,circshift(y1,-1)); % erosion
y1 = y1(1:end-1); % undo boundary condition change
This returns y1 = [5 6 6 7 7 8].
If you want to prevent changing your signal for small deviations, you can apply your threshold as a second step:
I = y1 - y < 3;
y1(I) = y(I);
This finds the places where we changed the signal, but the change was less than the threshold of 3. At those places we write back the original value.

You have a few errors:
Your index needs to start from 2, so that you aren't trying to index 0 for a previous index.
You need to check that the absolute value of the difference is greater than 3.
Since your data matrix is changing sizes, you can't use a for loop with a fixed number of iterations. Use a while loop instead.
This should give you the results you want:
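qq = 2; % start at 2 so that qq-1 is a valid index
while qq <= size(data, 1) % size is re-evaluated after each deletion
    if abs(data(qq, 2) - data(qq-1, 2)) > 3
        data(qq, :) = []; % delete the row; the next row moves into position qq
    else
        qq = qq+1; % advance only when nothing was deleted
    end
end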

Related

Find a duplicate in array of integers

This was an interview question.
I was given an array of n+1 integers from the range [1,n]. The property of the array is that it has k (k>=1) duplicates, and each duplicate can appear more than twice. The task was to find an element of the array that occurs more than once in the best possible time and space complexity.
After significant struggling, I proudly came up with an O(n log n) solution that takes O(1) space. My idea was to divide the range [1,n] into two halves and determine which of the two halves contains more elements from the input array than it has distinct values (I was using the pigeonhole principle). The algorithm continues recursively until it reaches the interval [X,X] where X occurs at least twice, and that is a duplicate.
The interviewer was satisfied, but then he told me that there exists an O(n) solution with constant space. He generously offered a few hints (something related to permutations?), but I had no idea how to come up with such a solution. Assuming that he wasn't lying, can anyone offer guidelines? I have searched SO and found a few (easier) variations of this problem, but not this specific one. Thank you.
EDIT: In order to make things even more complicated, interviewer mentioned that the input array should not be modified.
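For reference, the halving idea described in the question can be sketched roughly as follows. This is only an illustration, assuming the input is a JavaScript array of n+1 integers in [1,n]; the function name findDuplicateByCounting is made up for this sketch and is written iteratively rather than recursively.

```javascript
// Hypothetical helper: O(n log n) time, O(1) extra space, input is not modified.
// Assumes `arr` holds n + 1 integers, each in the range [1, n].
function findDuplicateByCounting(arr) {
  let lo = 1;
  let hi = arr.length - 1; // n
  // Invariant: more than (hi - lo + 1) array elements lie in [lo, hi],
  // so by the pigeonhole principle some value in [lo, hi] repeats.
  while (lo < hi) {
    const mid = Math.floor((lo + hi) / 2);
    let count = 0;
    for (const v of arr) {
      if (v >= lo && v <= mid) count++;
    }
    if (count > mid - lo + 1) {
      hi = mid;      // the lower half is overfull
    } else {
      lo = mid + 1;  // otherwise the upper half must be overfull
    }
  }
  return lo;
}
```

Each pass over the array costs O(n) and the value range halves every time, which gives the O(n log n) bound mentioned in the question.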
1. Take the very last element (x).
2. Save the element at position x (y).
3. If x == y you found a duplicate.
4. Overwrite position x with x.
5. Assign x = y and continue with step 2.
You are basically sorting the array; this is possible because you know where each element has to be inserted. O(1) extra space and O(n) time complexity. You just have to be careful with the indices: for simplicity I assumed the first index is 1 here (not 0), so we don't have to do +1 or -1.
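A minimal sketch of these steps, assuming a 0-based JavaScript array that holds n+1 values in [1,n] (so position x is stored at index x - 1). The function name findDuplicateInPlace is hypothetical, and the sketch overwrites the input array, as this version of the approach does.

```javascript
// Hypothetical helper: O(n) time, O(1) extra space, but MODIFIES `arr`.
// Assumes `arr` holds n + 1 integers, each in the range [1, n].
function findDuplicateInPlace(arr) {
  let x = arr[arr.length - 1];   // take the very last element
  while (true) {
    const y = arr[x - 1];        // save the element at (1-based) position x
    if (x === y) return x;       // position x already holds x: duplicate
    arr[x - 1] = x;              // put x where it belongs
    x = y;                       // continue with the displaced element
  }
}
```

Every iteration that does not return places one value at its home position, so the loop can run at most n times before it meets a position that already holds its own value.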
Edit: without modifying the input array
This algorithm is based on the idea that if we find the entry point of the permutation cycle, we have also found a duplicate (again using a 1-based array for simplicity):
Example:
2 3 4 1 5 4 6 7 8
Entry: 8 7 6
Permutation cycle: 4 1 2 3
As we can see the duplicate (4) is the first number of the cycle.
Finding the permutation cycle
1. x = last element
2. x = element at position x
3. Repeat step 2 n times (in total); this guarantees that we have entered the cycle.
Measuring the cycle length
1. a = last x from above, b = last x from above, counter c = 0
2. a = element at position a, b = element at position b, b = element at position b, c++ (so b takes 2 steps forward in the cycle and a takes 1 step)
3. If a == b, the cycle length is c; otherwise continue with step 2.
Finding the entry point to the cycle
1. x = last element
2. x = element at position x
3. Repeat step 2 c times (in total).
4. y = last element
5. If x == y, then x is a solution (x made one full cycle and y is just about to enter the cycle).
6. x = element at position x, y = element at position y
7. Repeat steps 5 and 6 until a solution is found.
The 3 major steps are all O(n) and sequential therefore the overall complexity is also O(n) and the space complexity is O(1).
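The three phases above could be sketched like this. It is an illustration only, assuming a 0-based JavaScript array of n+1 integers in [1,n]; the names findDuplicate and at are invented for the sketch, and at(pos) simply emulates the 1-based lookup used in the description.

```javascript
// Hypothetical helper: O(n) time, O(1) extra space, `arr` is only read.
// Assumes `arr` holds n + 1 integers, each in the range [1, n].
function findDuplicate(arr) {
  const len = arr.length;              // n + 1
  const at = (pos) => arr[pos - 1];    // 1-based lookup
  const last = at(len);

  // Phase 1: walk long enough to be certain we are inside the cycle
  // (arr.length steps is more than the longest possible tail).
  let x = last;
  for (let i = 0; i < len; i++) x = at(x);

  // Phase 2: measure the cycle length c; a moves 1 step, b moves 2 steps.
  let a = x;
  let b = x;
  let c = 0;
  do {
    a = at(a);
    b = at(at(b));
    c++;
  } while (a !== b);

  // Phase 3: give x a head start of c steps, then move x and y in lockstep.
  x = last;
  for (let i = 0; i < c; i++) x = at(x);
  let y = last;
  while (x !== y) {
    x = at(x);
    y = at(y);
  }
  return x; // entry point of the cycle, i.e. a duplicated value
}
```

For the first example array, findDuplicate([2, 3, 4, 1, 5, 4, 6, 7, 8]) returns 4, and for the second, findDuplicate([3, 1, 4, 6, 1, 2, 5]) returns 1.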
Example from above:
x takes the following values: 8 7 6 4 1 2 3 4 1 2
a takes the following values: 2 3 4 1 2
b takes the following values: 2 4 2 4 2
therefore c = 4 (yes there are 5 numbers but c is only increased when making steps, not initially)
x takes the following values: 8 7 6 4 | 1 2 3 4
y takes the following values: | 8 7 6 4
x == y == 4 in the end and this is a solution!
Example 2 as requested in the comments: 3 1 4 6 1 2 5
Entering cycle: 5 1 3 4 6 2 1 3
Measuring cycle length:
a: 3 4 6 2 1 3
b: 3 6 1 4 2 3
c = 5
Finding the entry point:
x: 5 1 3 4 6 | 2 1
y: | 5 1
x == y == 1 is a solution
Here is a possible implementation:
function checkDuplicate(arr) {
console.log(arr.join(", "));
let len = arr.length
,pos = 0
,done = 0
,cur = arr[0]
;
while (done < len) {
if (pos === cur) {
cur = arr[++pos];
} else {
pos = cur;
if (arr[pos] === cur) {
console.log(`> duplicate is ${cur}`);
return cur;
}
cur = arr[pos];
}
done++;
}
console.log("> no duplicate");
return -1;
}
for (const t of [
[0, 1, 2, 3]
,[0, 1, 2, 1]
,[1, 0, 2, 3]
,[1, 1, 0, 2, 4]
]) checkDuplicate(t);
It is basically the solution proposed by @maraca (typed too slowly!). It has constant space requirements (for the local variables), but apart from that only uses the original array for its storage. It should be O(n) in the worst case, because as soon as a duplicate is found, the process terminates.
If you are allowed to non-destructively modify the input vector, then it is pretty easy. Suppose we can "flag" an element in the input by negating it (which is obviously reversible). In that case, we can proceed as follows:
Note: The following assumes that the vector is indexed starting at 1. Since it is probably indexed starting at 0 (in most languages), you can implement "Flag item at index i" with "Negate the item at index i-1".
1. Set i to 0 and do the following loop:
   1. Increment i until item i is unflagged.
   2. Set j to i and do the following loop:
      1. Set j to vector[j].
      2. If the item at j is flagged, j is a duplicate. Terminate both loops.
      3. Flag the item at j.
      4. If j != i, continue the inner loop.
2. Traverse the vector, setting each element to its absolute value (i.e. unflag everything to restore the vector).
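A rough sketch of the flagging procedure, assuming a 0-based JavaScript array of n+1 integers in [1,n]. The function name findDuplicateByFlagging is hypothetical. One detail the steps leave implicit: once an item is flagged it is negative, so the sketch reads it through Math.abs when following it.

```javascript
// Hypothetical helper: temporarily negates entries of `vec` as flags and
// restores them before returning. Assumes n + 1 integers in [1, n].
function findDuplicateByFlagging(vec) {
  let duplicate = -1;
  outer:
  for (let i = 1; i <= vec.length; i++) {
    if (vec[i - 1] < 0) continue;        // item i is already flagged
    let j = i;
    do {
      j = Math.abs(vec[j - 1]);          // follow the stored value
      if (vec[j - 1] < 0) {              // reached item j a second time
        duplicate = j;
        break outer;                     // terminate both loops
      }
      vec[j - 1] = -vec[j - 1];          // flag item j
    } while (j !== i);
  }
  // Unflag everything to restore the original contents.
  for (let k = 0; k < vec.length; k++) vec[k] = Math.abs(vec[k]);
  return duplicate;
}
```

Each item is flagged at most once, so the total work stays linear in the length of the vector.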
It depends on what tools you (or your app) can use. Many frameworks and libraries exist nowadays. For example, with the C++ standard library you can use std::map<>, as maraca mentioned.
Or, if you have time, you can write your own binary tree implementation, but keep in mind that inserting elements works differently than with a plain array. In this case you can optimise the search for duplicates to suit your particular case.
Binary tree explanation, for reference:
https://www.wikiwand.com/en/Binary_tree

Replace +/- values around index - MATLAB

Following this question and the precious help I got from it, I've run into the following issue:
Using the indices of detected peaks and having computed the median of my signal +/-3 data points around these peaks, I need to replace my signal in a +/-5 window around each peak with the previously computed median.
I'm only able to replace the data point at the peak with the median, but not the surrounding +/-5 data points (see figure). Black = original peak; Yellow = data point at peak changed to the median of +/-3 data points around it.
Original peak and changed peak
Unfortunately I have not been able to make it work by following suggestions on the previous question.
Any help will be very much appreciated!
Cheers,
M
Assuming you mean the following. Given the array
x = [0 1 2 3 4 5 35 5 4 3 2 1 0]
you want to replace 35 and surrounding +/- 5 entries with the median of 3,4,5,35,5,4,3, which is 4, so the resulting array should be
x = [0 4 4 4 4 4 4 4 4 4 4 4 0]
Following my answer in this question, an intuitive approach is to simply replace the neighbors with the median value by offsetting the indices. This can be accomplished as follows
[~,idx]=findpeaks(x);
med_sz = 3; % Take the median with respect to +/- this many neighbors
repl_sz = 5; % Replace neighbors +/- this distance from peak
if ~isempty(idx)
m = medfilt1(x,med_sz*2+1);
N = numel(x);
for offset = -repl_sz:repl_sz
idx_offset = idx + offset;
idx_valid = idx_offset >= 1 & idx_offset <= N;
x(idx_offset(idx_valid)) = m(idx(idx_valid));
end
end
Alternatively, if you want to avoid loops, an equivalent loopless implementation is
[~,idx]=findpeaks(x);
med_sz = 3;
repl_sz = 5;
if ~isempty(idx)
m = medfilt1(x,med_sz*2+1);
idx_repeat = repmat(idx,repl_sz*2+1,1);
idx_offset = idx_repeat + repmat((-repl_sz:repl_sz)',1,numel(idx));
idx_valid = idx_offset >= 1 & idx_offset <= numel(x); % keep only offsets that fall inside x
idx_repeat = idx_repeat(idx_valid);
idx_offset = idx_offset(idx_valid);
x(idx_offset) = m(idx_repeat);
end

How to check if all the entries in columns of a matrix are equal (in MATLAB)?

I have a matrix of growing length for example a 4-by-x matrix A where x is increasing in a loop. I want to find the smallest column c where all columns before that, each, carry one single number. The matrix A can look like:
A = [1 2 3 4;
1 2 3 5;
1 2 3 1;
1 2 3 0];
where c=3, and x=4.
At each iteration of the loop where A grows in length, the value of index c grows as well. Therefore, at each iteration, I want to update the value of c. How efficiently can I code this in Matlab?
Let's say you had the matrix A and you wanted to check a particular column ii to see if all its elements are the same. The code would be:
all(A(:, ii)==A(1, ii)) % checks if all elements in column are same as first element
Also, keep in mind that once the condition is broken, x can no longer be updated. Therefore, your code should be:
x = 0;
while true
    %% expand A by one column
    if ~all(A(:, x+1)==A(1, x+1)) % true if not all elements in column x+1 equal its first element
        break;
    end
    x = x+1;
end
You could use this:
c = find(arrayfun(@(ind)~all(A(1, ind)==A(:, ind)), 1:x), 1, 'first');
This finds the first column where not all values are the same. If you run this in a loop, you can detect when entries in a column start to differ:
for x = 1:maxX
% grow A
c = find(arrayfun(@(ind)~all(A(1, ind)==A(:, ind)), 1:x), 1, 'first');
% If c is empty, all columns have values equal to first row.
% Otherwise, we have to subtract 1 to get the number of columns with equal values
if isempty(c)
c = x;
else
c = c - 1;
end
end
Let me give a try as well:
% Find the columns whose elements are all the same and sum up the logical array
c = sum(A(1,:) == power(prod(A,1), 1/size(A,1)))
d=size(A,2)
To find the last column such that each column up to that one consists of equal values:
c = find(any(diff(A,1,1),1),1)-1;
or
c = find(any(bsxfun(@ne, A, A(1,:)),1),1)-1;
For example:
>> A = [1 2 3 4 5 6;
1 2 3 5 5 7;
1 2 3 1 5 0;
1 2 3 0 5 8];
>> c = find(any(diff(A,1,1),1),1)-1
c =
3
You can try this (easy and fast):
Equal_test = A(1,:)==A(2,:)& A(2,:)==A(3,:)&A(3,:)==A(4,:);
c=find(Equal_test==false,1,'first')-1;
You can also check the result of find if you want.

How to get elements larger than x in a given range?

Given a matrix A, how do I get the elements (and their indices) larger than x in a specific range?
e.g.
A = [1:5; 2:6; 3:7; 4:8; 5:9]
A =
1 2 3 4 5
2 3 4 5 6
3 4 5 6 7
4 5 6 7 8
5 6 7 8 9
For instance, I want all elements that are larger than 5 and appear in the range A(2:4,3:5). I should get:
elements:
6 , 6 , 7 , 6 , 7 , 8
indices:
14, 18, 19, 22, 23, 24
A(A>5) would give me all entries which are larger than 5.
A(2:4,3:5) would give all elements in the range 2:4,3:5.
I want some combination of the two. Is it possible or the only way is to put the needed range in another array B and only then perform B(B>5)? Obviously 2 problems here: I'd lose the original indices, and it will be slower. I'm doing this on a large number of matrices.
Code. I'm trying to avoid matrix multiplication, so this may look a bit odd:
A = [1:5; 2:6; 3:7; 4:8; 5:9];
[r,c] = meshgrid(2:4,3:5);
n = sub2ind(size(A), r(:), c(:));
indices = sort(n(A(n) > 5)); %'skip sorting if not needed'
values = A(indices);
Explanation. The code converts the Cartesian product of the subscripts to linear indices in the A matrix. Then it selects the indices that respect the condition, then it selects the values.
However, it is slow.
Optimization. Following LuisMendo's suggestion, the code may be sped up by replacing the sub2ind-based linear index calculation with a handcrafted linear index calculation:
A = [1:5; 2:6; 3:7; 4:8; 5:9];
% For column-first, 1-based-index array memory
% layout, as in MATLAB/FORTRAN, the linear index
% formula is:
% L = R + (C-1)*NR
n = bsxfun(@plus, (2:4), (transpose(3:5) - 1)*size(A,1));
indices = n(A(n) > 5);
values = A(indices);
If you only need the values (not the indices), it can be done using the third output of find and matrix multiplication. I don't know if it will be faster than using a temporary array, though:
[~, ~, values] = find((A(2:4,3:5)>5).*A(2:4,3:5));
Assuming you need the linear indices and the values, then if the threshold is positive you could define a mask. This may be a good idea if the mask can be defined once and reused for all matrices (that is, if the desired range is the same for all matrices):
mask = false(size(A));
mask(2:4,3:5) = true;
indices = find(A.*mask>5);
values = A(indices);
It's a little clunky, but:
R = 2:4;
C = 3:5;
I = reshape(find(A),size(A))
indicies = nonzeros(I(R,C).*(A(R,C)>5))
values = A(indicies)

How would I go about this task- Matlab [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
How to store value generated from nested for loop in an array, in Matlab?
I have an array of digits. e.g. x = [4,9,8]. I use find(x) to obtain [1,2,3], then find(x)+length(x) to obtain [4,5,6].
I want this (in this case, adding 3 to the array, to make a sequence of 1,2,3 4,5,6 7,8,9...) to go on n times, so I require a loop.
Now with the array x, I want to add [4,9,8] to [1,2,3] , which gives [5,11,11].
I have [1,2,3]...[10,11,12]...[n,n+1,n+2] from find(x)+length(x) looped, I want to add elements in x to the elements in corresponding positions, in the array that is going up in three.
So, for example, [4,5,6] 5 is in position 2. x=[4,9,8]. 9 is in position 2 within x. Therefore, I want to add 9 to 5. I want to do this for each element (in this case, each of the three elements). I would add 9 to 11, and 9 to 11 as both numbers are in position '2' in their respective arrays.
I was thinking of using a nested for loop, to take care of the find(x)+length(x). I am just unsure of how to make the 'location additions' happen.
I would then like to store the results of the additions in a separate array.
Thanks in advance for your time and help!
So, we start with
x = [4,9,8]
Let's add [1, 2, 3]
x + [1, 2, 3]
ans =
5 11 11
A more flexible way
x + (1 : length(x))
ans =
5 11 11
If you do not want to start at 1 but at b (say, we add [5, 6, 7] to x):
b = 5;
x + b + (0 : length(x) - 1)
ans =
9 15 15
I think this should get you going and you can add your loop now.
Warning: You have a very strange way of using find(). Just to make sure: find(x) returns the indices of the non-zero entries in x. If all elements of the vector x are non-zero, you have the equality
find(x) == 1 : length(x)
If any element in x is zero, you run into problems, when adding it to find(x):
x = [4, 9, 0, 8];
find(x)
ans =
1 2 4
x + find(x)
Error using +
Matrix dimensions must agree.
