What does "uniquetol" do, exactly? - arrays

The uniquetol function, introduced in R2015a, computes "unique elements within tolerance". Specifically,
C = uniquetol(A,tol) returns the unique elements in A using tolerance tol.
But the problem of finding unique elements with a given tolerance has several solutions. Which one is actually produced?
Let's see two examples:
Let A = [3 5 7 9] with absolute tolerance 2.5. The output can be [3 7], or it can be [5 9]. Both solutions satisfy the requirement.
For A = [1 3 5 7 9] with absolute tolerance 2.5, the output can be [1 5 9] or [3 7]. So even the number of elements in the output can vary.
See this nice discussion about the transitivity issue that lies at the heart of the problem.
So, how does uniquetol work? What output does it produce among the several existing solutions?

To simplify, I consider the one-output, two-input version of uniquetol,
C = uniquetol(A, tol);
where the first input is a double vector A. In particular, this implies that:
The 'ByRows' option of uniquetol is not used.
The first input is a vector. If it were not, uniquetol would implicitly linearize it to a column, as usual.
The second input, which defines the tolerance, is interpreted as follows:
Two values, u and v, are within tolerance if abs(u-v) <= tol*max(abs(A(:)))
That is, the specified tolerance is relative by default. The actual tolerance used in the comparisons is obtained by scaling by the maximum absolute value in A.
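As an aside, if an absolute tolerance is preferred, the 'DataScale' option of uniquetol can be used to bypass this scaling. A small sketch:
A = [3 5 7 9];
C1 = uniquetol(A, 2.5/9);               % relative tol: scaled by max(abs(A)) = 9
C2 = uniquetol(A, 2.5, 'DataScale', 1); % absolute tol: second input used as-is
% Both C1 and C2 are [3 7]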
With these considerations, it seems that the approach that uniquetol uses is:
1. Sort A.
2. Pick the first entry of sorted A and set it as the reference value (this value will be updated later).
3. Write the reference value into the output C.
4. Skip subsequent entries of sorted A until one is found that is not within tolerance of the reference value. When such an entry is found, take it as the new reference value and go back to step 3.
Of course, I'm not saying that this is what uniquetol internally does. But the output seems to be the same. So this is functionally equivalent to what uniquetol does.
The following code implements the approach described above (inefficient code, just to illustrate the point).
% Inputs A, tol
% Output C
tol_scaled = tol*max(abs(A(:))); % scale tolerance
C = []; % initialize output; will be extended
ref = NaN; % initialize reference value to NaN. This will immediately cause
           % A(1) to become the new reference
for a = sort(A(:)).'
    if ~(a-ref <= tol_scaled)
        ref = a;
        C(end+1) = ref;
    end
end
To verify this, let's generate some random data and compare the output of uniquetol and of the above code:
clear
N = 1e3; % number of realizations
S = 1e5; % maximum input size
for n = 1:N
    % Generate inputs:
    s = randi(S); % input size
    A = (2*rand(1,s)-1) / rand; % random input of length s; positive and
                                % negative values; random scaling
    tol = .1*rand; % random tolerance (relative). Change value .1 as desired
    % Compute output:
    tol_scaled = tol*max(abs(A(:))); % scale tolerance
    C = []; % initialize output; will be extended
    ref = NaN; % initialize reference value to NaN. This will immediately cause
               % A(1) to become the new reference
    for a = sort(A(:)).'
        if ~(a-ref <= tol_scaled)
            ref = a;
            C(end+1) = ref;
        end
    end
    % Check if output is equal to that of uniquetol:
    assert(isequal(C, uniquetol(A, tol)))
end
In all my tests this has run without the assertion failing.
So, in summary, uniquetol seems to sort the input, pick its first entry, and keep skipping entries for as long as it can.
For the two examples in the question, the outputs are as follows. Note that the second input is specified as 2.5/9, where 9 is the maximum of the first input, to achieve an absolute tolerance of 2.5:
>> uniquetol([1 3 5 7 9], 2.5/9)
ans =
1 5 9
>> uniquetol([3 5 7 9], 2.5/9)
ans =
3 7

Related

Define a vector with random steps

I want to create an array that has incremental random steps, and I've used this simple code.
t_inici=(0:10*rand:100);
The problem is that the random number stays unchanged between steps. Is there any simple way to draw a new random number for each step?
If you have a set number of points, say nPts, then you could do the following
nPts = 10; % Could use 'randi' here for a random number of points
lims = [0, 10]; % Start and end points
x = rand(1, nPts); % Create random numbers
% Sort and scale x to fit your limits and be ordered
% ('minmax' ships with the Deep Learning Toolbox)
x = diff(lims) * ( sort(x) - min(x) ) / diff(minmax(x)) + lims(1)
This approach always includes your end point, which a 0:dx:10 approach would not necessarily.
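For example, a fixed-step range can miss the end point:
>> 0:3:10
ans =
0 3 6 9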
If you had some maximum number of points, say nPtsMax, then you could do the following
nPtsMax = 1000; % Max number of points
lims = [0,10]; % Start and end points
% Could do 10* or any other multiplier as in your example in front of 'rand'
x = lims(1) + [0 cumsum(rand(1, nPtsMax))];
x(x > lims(2)) = []; % remove values above maximum limit
This approach may be slower, but is still fairly quick and better represents the behaviour in your question.
My first approach to this would be to randomly generate N-2 samples, where N is the desired number of samples, sort them, and add the extrema:
N=50;
endpoint=100;
initpoint=0;
randsamples=sort(rand(1, N-2)*(endpoint-initpoint)+initpoint);
t_inici=[initpoint randsamples endpoint];
However, I am not sure how "uniformly random" this is, as you are "faking" the last 2 data points to have the extrema included, which will somewhat distort pure randomness (I think). If you are not necessarily interested in including the extrema, just remove the last line and generate N points. That will make sure they are indeed random (or as random as MATLAB can make them).
Here is an alternative solution that keeps the steps "uniformly random":
[initpoint,endpoint,coef] = deal(0,100,10);
t_inici(1) = initpoint;
while t_inici(end) < endpoint
    t_inici(end+1) = t_inici(end) + rand()*coef;
end
t_inici(end) = [];
In my view, this fits your attempt well: unknown steps, starting from 0, but not necessarily ending at 100.
From your code it seems you want a uniformly random step that varies between each two entries. This implies that the number of entries that the vector will have is unknown in advance.
A way to do that is as follows. This is similar to Hunter Jiang's answer but adds entries in batches instead of one by one, in order to reduce the number of loop iterations.
1. Guess a number of required entries, n. Any value will do, but a large value will result in fewer iterations and will probably be more efficient.
2. Initialize the result to the first value.
3. Generate n entries and concatenate them to the (temporary) result.
4. See if the current entries are already too many.
5. If they are, cut as needed and output the (final) result. Else go back to step 3.
Code:
lower_value = 0;
upper_value = 100;
step_scale = 10;
n = 5*(upper_value-lower_value)/step_scale*2; % STEP 1. The number 5 here is arbitrary.
% It's probably more efficient to err with too many than with too few
result = lower_value; % STEP 2
done = false;
while ~done
    result = [result result(end)+cumsum(step_scale*rand(1,n))]; % STEP 3. Append
                                                                % n new entries
    ind_final = find(result>upper_value,1); % STEP 4. Index of first entry exceeding
                                            % upper_value, if any
    if ind_final % STEP 5. If non-empty, we're done
        result = result(1:ind_final-1);
        done = true;
    end
end

Compute the product of the next n elements in array

I would like to compute the product of the next n adjacent elements of a matrix. The number n of elements to be multiplied should be given in the function's input.
For example, for this input I should compute the product of every 3 consecutive elements, starting from the first.
[p, ind] = max_product([1 2 2 1 3 1],3);
This gives [1*2*2, 2*2*1, 2*1*3, 1*3*1] = [4,4,6,3].
Is there any practical way to do it? Now I do this using:
for ii = 1:(length(v)-2)
    p = prod(v(ii:ii+n-1));
end
where v is the input vector and n is the number of elements to be multiplied.
In this example n = 3, but n can take any positive integer value.
Depending on whether n is odd or even, or length(v) is odd or even, I sometimes get right answers and sometimes an error.
For example for arguments:
v = [1.35912281237829 -0.958120385352704 -0.553335935098461 1.44601450110386 1.43760259196739 0.0266423803393867 0.417039432979809 1.14033971399183 -0.418125096873537 -1.99362640306847 -0.589833539347417 -0.218969651537063 1.49863539349242 0.338844452879616 1.34169199365703 0.181185490389383 0.102817336496793 0.104835620599133 -2.70026800170358 1.46129128974515 0.64413523430416 0.921962619821458 0.568712984110933]
n = 7
I get the error:
Index exceeds matrix dimensions.
Error in max_product (line 6)
p = prod(v(ii:ii+n-1));
Is there any correct general way to do it?
Based on the solution in Fast numpy rolling_product, I'd like to suggest a MATLAB version of it, which leverages the movsum function introduced in R2016a.
The mathematical reasoning is that a product of numbers equals the exponential of the sum of their logarithms:
a(1)*a(2)*...*a(n) = exp( log(a(1)) + log(a(2)) + ... + log(a(n)) )
A possible MATLAB implementation of the above may look like this:
function P = movprod(vec,window_sz)
    P = exp(movsum(log(vec),[0 window_sz-1],'Endpoints','discard'));
    if isreal(vec) % Ensures correct outputs when the input contains
        P = real(P); % negative and/or complex entries.
    end
end
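For example, applying it to the vector from the question (assuming the function above is on the path) should give:
>> movprod([1 2 2 1 3 1], 3)
ans =
4 4 6 3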
Several notes:
I haven't benchmarked this solution, and do not know how it compares in terms of performance to the other suggestions.
It should work correctly with vectors containing zero and/or negative and/or complex elements.
It can be easily expanded to accept a dimension to operate along (for array inputs), and any other customization afforded by movsum.
The 1st input is assumed to be either a double or a complex double row vector.
Outputs may require rounding.
Update
Inspired by the nicely thought-out answer of Dev-iL, here comes this handy solution, which does not require MATLAB R2016a or above:
out = real( exp(conv(log(a),ones(1,n),'valid')) )
The basic idea is to transform the multiplication into a sum of logarithms, over which a moving sum can be used, which in turn can be realised by convolution.
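As a quick sanity check of the one-liner against a plain loop (a minimal sketch; the variable names are arbitrary):
a = [1 2 2 1 3 1]; n = 3;
out = real( exp(conv(log(a),ones(1,n),'valid')) ); % convolution-based product
ref = zeros(1, numel(a)-n+1); % direct windowed product for comparison
for k = 1:numel(ref)
    ref(k) = prod(a(k:k+n-1));
end
max(abs(out - ref)) % should be of the order of eps (log/exp rounding)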
Old answers
This is one way using gallery to get a circulant matrix and indexing the relevant part of the resulting matrix before multiplying the elements:
a = [1 2 2 1 3 1]
n = 3
%// circulant matrix
tmp = gallery('circul', a(:))
%// product of relevant parts of matrix
out = prod(tmp(end-n+1:-1:1, end-n+1:end), 2)
out =
4
4
6
3
A more memory-efficient alternative, in case there are no zeros in the input:
a = [10 9 8 7 6 5 4 3 2 1]
n = 2
%// cumulative product
x = [1 cumprod(a)]
%// shifted by n and divided by itself
y = circshift( x,[0 -n] )./x
%// remove last elements
out = y(1:end-n)
out =
90 72 56 42 30 20 12 6 2
Your approach is correct. You should just change the for loop to for ii = 1:(length(v)-n+1) and then it will work fine.
If you are not going to deal with large inputs, another approach is using gallery as explained in @thewaywewalk's answer.
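For completeness, a minimal corrected version of the loop from the question, which also stores one product per window (the original overwrote p on every iteration):
v = [1 2 2 1 3 1]; n = 3; % example inputs from the question
p = zeros(1, length(v)-n+1); % one output per window
for ii = 1:(length(v)-n+1)
    p(ii) = prod(v(ii:ii+n-1)); % product of the n elements starting at ii
end
% p is [4 4 6 3]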
I think the problem lies in your indexing. The line for ii = 1:(length(v)-2) does not provide the correct range of ii.
Try this:
function out = max_product(in,size)
    size = size-1; % this is because we add size to i later
    out = zeros(length(in)-size,1); % preallocate the output (column vector)
    for i = 1:length(in)-size
        out(i) = prod(in(i:i+size));
    end
end
Your code works when restated like so:
for ii = 1:(length(v)-(n-1))
    p = prod(v(ii:ii+(n-1)));
end
That should take care of the indexing problem.
Using bsxfun, you can create a matrix in which each row contains n consecutive elements, and then take prod along the 2nd dimension of the matrix. I think this is the most efficient way:
max_product = @(v, n) prod(v(bsxfun(@plus, (1 : n), (0 : numel(v)-n)')), 2);
p = max_product([1 2 2 1 3 1],3)
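For these example inputs, the index matrix built by bsxfun is the following; each row addresses one window of n consecutive elements:
>> bsxfun(@plus, (1 : 3), (0 : 6-3)')
ans =
1 2 3
2 3 4
3 4 5
4 5 6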
Update:
Some other solutions have been posted since, and some, such as @Dev-iL's answer, outperform the others. I can also suggest fftconv, which in Octave outperforms conv.
If you can upgrade to R2017a, you can use the new movprod function to compute a windowed product.
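A minimal sketch of such a call, using the same window convention as the movsum-based answer above:
v = [1 2 2 1 3 1]; n = 3;
% Built-in movprod (R2017a or later), not the custom movprod defined above:
p = movprod(v, [0 n-1], 'Endpoints', 'discard') % returns 4 4 6 3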

Test if arrays are proportional

Is there a nice way to test whether two arrays are proportional in MATLAB? Something like the isequal function but for testing for proportionality.
One heuristic way to do this would be to simply divide one array by the other element-wise and check that the largest and smallest values of the result are within some tolerance. The degenerate case is when there are zeros in the arrays: dividing 0 by 0 gives NaN, but using max and min still works because those functions ignore NaN values. However, if both A and B are all-zero arrays, there are infinitely many possible scalar multiples, so there isn't one answer; we'll set the result to NaN if we encounter this.
Given A and B, something like this could work:
C = A./B; % Divide element-wise
tol = 1e-10; % Tolerance for the comparison
% Check if the difference between the largest and smallest values is
% within the tolerance
check = abs(max(C) - min(C)) < tol;
if check
    scalar = C(1); % If yes, get the scalar multiple
else
    scalar = NaN; % If not, set to NaN
end
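For example, with some hypothetical inputs:
A = [1 3 5];
B = [2 6 10]; % B is 2*A, so A./B is 0.5 everywhere
C = A./B;
check = abs(max(C) - min(C)) < 1e-10 % true
scalar = C(1) % 0.5, i.e. A = 0.5*B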
If you have the Statistics Toolbox, you can use pdist2 to compute the 'cosine' distance between the two arrays. This will give 0 if they are proportional:
>> pdist2([1 3 5], [10 30 50], 'cosine')
ans =
0
>> pdist2([1 3 5], [10 30 51], 'cosine')
ans =
3.967230676171774e-05
As mentioned by @rayryeng, be sure to use a tolerance if you are dealing with real numbers.
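A sketch of such a tolerance-based test (the threshold 1e-10 is an arbitrary choice):
A = [1 3 5]; B = [10 30 50];
areProportional = pdist2(A, B, 'cosine') < 1e-10 % true; the cosine distance is ~0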
A = rand(1,5);
B = pi*A;
C = A./B; % Divide the two
PropArray = all(abs(diff(C))<(3*eps)); % check for equality within tolerance
if PropArray
    PropConst = C(1); % they're all equal, get the constant
else
    PropConst = nan; % they're not equal, set NaN
end

Matlab: Help in implementing quantized time series

I am having trouble implementing this code due to the variable s_k being logical 0/1. In what way can I implement this statement?
s_k is a random sequence of 0/1, generated by quantizing the output of rand() at its mean, as given below. After this, I don't know how to implement the summation. Please help.
N =1000;
input = randn(N);
s = (input>=0.5); %converting into logical 0/1;
UPDATE
N = 3;
tmax = 5;
y(1) = 0.1;
for i = 1 : tmax+N-1 %// Change here
    y(i+1) = 4*y(i)*(1-y(i)); % nonlinear model generating the input to the autoregressive model
end
s = (y>=0.5);
ind = bsxfun(@plus, (0:tmax), (0:N-1).');
x = sum(s(ind+1).*(2.^(-ind+N+1))); % The output of this conversion should be real numbers
% Autoregressive model of order 1
z(1) = 0;
for j = 2 : N
    z(j) = 0.195*z(j-1) + x(j);
end
You've generated the random logical sequence, which is great. You also need to know N, which is the total number of points to collect at one time, as well as a list of time values t. Because this is a discrete summation, I'm going to assume the values of t are discrete. What you need to do first is generate a sliding window matrix. Each column of this matrix represents a set of time values for each value of t for the output. This can easily be achieved with bsxfun. Assuming a maximum time of tmax, a starting time of 0 and a neighbourhood size N (like in your equation), we can do:
ind = bsxfun(@plus, (0:tmax), (0:N-1).');
For example, assuming tmax = 5 and N = 3, we get:
ind =
0 1 2 3 4 5
1 2 3 4 5 6
2 3 4 5 6 7
Each column represents a time that we want to calculate the output at and every row in a column shows a list of time values we want to calculate for the desired output.
Finally, to calculate the output x, you simply take your s_k vector, make it a column vector, use ind to index into it, do a point-by-point multiplication with 2^(-k+N+1) by substituting k with what we got from ind, and sum along the rows. So:
s = rand(max(ind(:))+1, 1) >= 0.5;
x = sum(s(ind+1).*(2.^(-ind+N+1)));
The first statement generates a random vector that is as long as the maximum time value that we have. Once we have this, we use ind to index into this random vector so that we can generate a sliding window of logical values. We need to offset this by 1 as MATLAB starts indexing at 1.

How to deal with multiple minimum values when using "min" function

I am using MATLAB's "min" function to determine the index corresponding to the minimum value within an array (just a vector, actually)... All's well and good, except that I've found that when there are multiple values in the array that share the minimum value, the function [C, I] = min(A) returns only one of the indices. This actually would not be an issue, except that the index it returns is not always the first (i.e., smallest) index that has the minimum value. The documentation says that this should be the case (so, if entry #4 and entry #13 in an array have the same (minimum) value, it should return I = 4), but that's not what's happening.
Does anyone know how to have the min function return the smallest/lowest index for a shared minimum value within an array/vector? Relatedly, can anyone explain why the function is not behaving as it seemingly should?
As stated above, the values are then most likely not exactly the same. Consider
a = [1 2 3 4 2 4 3 1];
b = a;
b(1) = 1+eps; b(end) = 1-eps; % added a small error to the 1st and 8th element
[~,Ia] = min(a);
[~,Ib] = min(b);
where Ia is 1 and Ib would be 8.
A solution is to round off your inputs:
f = 0.1; % rounding off to 1 decimal place
c = round(b/f)*f;
[~,Ic] = min(c);
where Ic will be 1, as expected.
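Alternatively, instead of rounding, one can ask directly for the first index whose value lies within a tolerance of the minimum (a sketch; the tolerance value is arbitrary):
tol = 1e-10;
Ifirst = find(b <= min(b) + tol, 1) % first index within tol of the minimum; here 1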
