Find multiple maxima of vector for a moving window - arrays

I have a 1-D vector x and I want, over a number of iterations, to slide a 20-sample analysis window along it with 50% overlap, so the window moves by 10 samples each time. I found bsxfun but I don't know how to adapt it to my problem.
I wrote the code below but I'm not getting the results I expect. For every iteration I need the maximum of autocorr over each overlapping window, but I get an error about the number of lags.
x = rand(1,100);
N = length(x); % length of signal
n1 = 20; % length of analysing window
win_num = floor((N/n1)*2-1); % number of windows
for i=1:win_num
    xmax(i) = max(bsxfun(@autocorr,x(1:n1/2:N),win_num-1));
end

You can modify your loop as follows, to make it work:
x = rand(1,100); %// example data
N = length(x) %// length of signal
n = 20 %// length of analysing window
for ii = n/2:n/2:N-1
    xmax(ii*2/n) = max( x( ii-n/2+1 : ii+n/2) );
end
A vectorized version could be:
xmax = max( x( bsxfun(@plus, (1:n).',0:n/2:N-n) ) )
Explanation:
%// create index matrix with moving window
idx = bsxfun(@plus, (1:n).',0:n/2:N-n);
%// get values of original vector
xM = x( idx );
%// find the maximum along dimension 1
xmax = max( xM ).'
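If your MATLAB release has movmax (R2016a or newer), the windowed maximum can also be taken without building the index matrix explicitly. A minimal sketch, assuming x, n and N as defined above:
w = movmax(x, [0 n-1]);      %// w(i) = max of x(i : i+n-1)
xmax = w(1 : n/2 : N-n+1).'; %// keep only the 50%-overlap window starts, as a column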


Minimize (firstA_max - firstA_min) + (secondB_max - secondB_min)

Given n pairs of integers, split them into two subsets A and B so as to minimize (maximum difference among the first values of A) + (maximum difference among the second values of B).
Example : n = 4
{0; 0}; {5; 5}; {1; 1}; {3; 4}
A = {{0; 0}; {1; 1}}
B = {{5; 5}; {3; 4}}
(maximum difference among first values of A) = fA_max - fA_min = 1 - 0 = 1
(maximum difference among second values of B) = sB_max - sB_min = 5 - 4 = 1
Therefore, the answer is 1 + 1 = 2, and this is the best split.
Obviously, the maximum difference among a set of values equals (maximum value - minimum value). Hence, what we need to do is find the minimum of (fA_max - fA_min) + (sB_max - sB_min).
Suppose the given array is arr[]; the first value of a pair is arr[i].first and the second value is arr[i].second.
I think it is quite easy to solve this in quadratic complexity. You just need to sort the array by the first value. Then all the elements of subset A should be picked consecutively in the sorted array, so you can loop over all ranges [L; R] of the sorted array: for each range, put all elements in that range into subset A and all the rest into subset B.
For more detail, this is my C++ code:
int calc(pair<int, int> a[], int n){ // a[1..n], sorted by first value
    int m = 1e9, M = -1e9, res = 2e9; // m and M: min and max of the second values of a[1..l-1], which go into subset B
    for (int l = 1; l <= n; l++){
        int g = m, G = M; // g and G: min and max of the second values in subset B
        for (int r = n; r >= l; r--) {
            if (r - l + 1 < n){
                res = min(res, a[r].first - a[l].first + G - g);
            }
            g = min(g, a[r].second);
            G = max(G, a[r].second);
        }
        m = min(m, a[l].second);
        M = max(M, a[l].second);
    }
    return res;
}
Now I want to improve my algorithm down to loglinear complexity. Of course, sort the array by the first value. After that, if I fix fA_min = a[i].first, then as the index i increases, fA_max will increase while (sB_max - sB_min) decreases.
But I am still stuck here: is there any way to solve this problem in loglinear complexity?
The following approach is an attempt to escape the O(n^2) bound, using an argmin list for the second element of the tuples (let's say the y part), with the points sorted by x.
One observation is that there is an optimal solution in which A includes index argmin[0] or argmin[n-1] or both.
In get_best_interval_min_max we focus once on including argmin[0], then the next smallest element in y, and so on; then we do the same starting from the maximum element.
We get two dictionaries {(i,j): (profit, idx)}, telling us how much we gain in y when including points[i:j+1] in A, towards the min or the max in y. idx is the index into the argmin array.
Then calculate the objective for each dictionary, assuming the max/min of y is not in A.
Finally, combine the results of the two dictionaries: for entries (i1,j1): (v1, idx1) and (i2,j2): (v2, idx2), merge the intervals; the result is x[max(j1,j2)] - x[min(i1,i2)] + (max_y - min_y) - v1 - v2.
Constraint: idx1 < idx2, because the index ranges in the argmin array must not intersect; otherwise some profit in y might be counted twice.
On average the dictionaries (dmin, dmax) are smaller than n, but in the worst case, when x and y are correlated ([(i,i) for i in range(n)]), they have exactly n entries and we gain nothing. Anyhow, on random instances this approach is much faster. Maybe someone can improve upon it.
import numpy as np
from random import randrange
import time

def get_best_interval_min_max(points):  # sorted input according to x dim
    L = len(points)
    argmin_b = np.argsort([p[1] for p in points])
    b_min, b_max = points[argmin_b[0]][1], points[argmin_b[L-1]][1]
    arg = [argmin_b[0], argmin_b[0]]
    res_min = dict()
    for i in range(1, L):
        res_min[tuple(arg)] = points[argmin_b[i]][1] - points[argmin_b[0]][1], i  # the profit in b towards min
        if arg[0] > argmin_b[i]: arg[0] = argmin_b[i]
        elif arg[1] < argmin_b[i]: arg[1] = argmin_b[i]
    arg = [argmin_b[L-1], argmin_b[L-1]]
    res_max = dict()
    for i in range(L-2, -1, -1):
        res_max[tuple(arg)] = points[argmin_b[L-1]][1] - points[argmin_b[i]][1], i  # the profit in b towards max
        if arg[0] > argmin_b[i]: arg[0] = argmin_b[i]
        elif arg[1] < argmin_b[i]: arg[1] = argmin_b[i]
    # return the two dicts and the difference along y
    return res_min, res_max, b_max - b_min

def argmin_algo(points):
    # return the objective value, sets A and B, and the interval for A in points
    points.sort()
    # get the profits for different intervals on the sorted array, for max and min
    dmin, dmax, y_diff = get_best_interval_min_max(points)
    key = [None, None]
    res_min = 2e9
    # the best result when only the min/max b value is included in A
    for d in [dmin, dmax]:
        for k, (v, i) in d.items():
            res = points[k[1]][0] - points[k[0]][0] + y_diff - v
            if res < res_min:
                key = k
                res_min = res
    # combine the results for max and min
    for k1, (v1, i) in dmin.items():
        for k2, (v2, j) in dmax.items():
            if i > j: break  # their argmin_b indices cannot intersect!
            idx_l, idx_h = min(k1[0], k2[0]), max(k1[1], k2[1])  # index low and index high for the combination
            res = points[idx_h][0] - points[idx_l][0] - v1 - v2 + y_diff
            if res < res_min:
                key = (idx_l, idx_h)  # new merged interval
                res_min = res
    return res_min, points[key[0]:key[1]+1], points[:key[0]] + points[key[1]+1:], key

def quadratic_algorithm(points):
    points.sort()
    m, M, res = 1e9, -1e9, 2e9
    idx = (0, 0)
    for l in range(len(points)):
        g, G = m, M
        for r in range(len(points)-1, l-1, -1):
            if r - l + 1 < len(points):
                res_n = points[r][0] - points[l][0] + G - g
                if res_n < res:
                    res = res_n
                    idx = (l, r)
            g = min(g, points[r][1])
            G = max(G, points[r][1])
        m = min(m, points[l][1])
        M = max(M, points[l][1])
    return res, points[idx[0]:idx[1]+1], points[:idx[0]] + points[idx[1]+1:], idx

# let's try it and compare running times to the quadratic_algorithm
# get some "random" points
c1 = 0
c2 = 0
for i in range(100):
    points = [(randrange(100), randrange(100)) for i in range(1, 200)]
    points.sort()  # sorted along the x dimension
    s = time.time()
    r1 = argmin_algo(points)
    e1 = time.time()
    r2 = quadratic_algorithm(points)
    e2 = time.time()
    c1 += (e1 - s)
    c2 += (e2 - e1)
    if not r1[0] == r2[0]:
        print(r1, r2)
        raise Exception("Error, results are not equal")
print("time of argmin_algo", c1, "time of quadratic_algorithm", c2)
UPDATE: @Luka proved that the algorithm described in this answer is not exact. But I will keep it here because it is a good performance heuristic and opens the way to many probabilistic methods.
I will describe a loglinear algorithm. I couldn't find a counterexample, but I also couldn't find a proof. :/
Let set A be ordered by first element and set B be ordered by second element. They are initially empty. Take floor(n/2) random points of your point set and put them in set A. Put the remaining points in set B. Define this as a partition.
Let's call a partition stable if you can't take an element of set A, put it in B and decrease the objective function and if you can't take an element of set B, put it in A and decrease the objective function. Otherwise, let's call the partition unstable.
For an unstable partition, the only moves that are interesting are the ones that take the first or the last element of A and move to B or take the first or the last element of B and move to A. So, we can find all interesting moves for a given unstable partition in O(1). If an interesting move decreases the objective function, do it. Go like that until the partition becomes stable. I conjecture that it takes at most O(n) moves for the partition to become stable. I also conjecture that at the moment the partition becomes stable, you will have a solution.
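For concreteness, here is a minimal MATLAB sketch of that local search (my own interpretation of the conjectured procedure; the function name and structure are made up, and it inherits the same lack of an exactness proof):
function [bestObj, inA] = stable_partition(P) %// P: n-by-2 matrix of pairs [first, second]
    n = size(P,1);
    inA = false(n,1); inA(randperm(n, floor(n/2))) = true; %// random initial partition
    obj = @(s) (max(P(s,1)) - min(P(s,1))) + (max(P(~s,2)) - min(P(~s,2)));
    bestObj = obj(inA);
    improved = true;
    while improved %// repeat until the partition is stable
        improved = false;
        idxA = find(inA); idxB = find(~inA);
        [~,ia] = min(P(idxA,1)); [~,ja] = max(P(idxA,1)); %// extremes of A by first value
        [~,ib] = min(P(idxB,2)); [~,jb] = max(P(idxB,2)); %// extremes of B by second value
        for k = [idxA(ia), idxA(ja), idxB(ib), idxB(jb)] %// the four "interesting" moves
            trial = inA; trial(k) = ~trial(k);
            if any(trial) && any(~trial) && obj(trial) < bestObj
                inA = trial; bestObj = obj(trial); improved = true;
            end
        end
    end
end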

Can this loop containing different indices be vectorized or speeded up?

I have code that does some processing on every point of a 3D array. The array input_vec_1D is accessed by an unusual index ind_prime which depends on the loop variable (for context, the index is determined from an algorithm that I am using in Eq. 42e of this paper, and my full code is here). I have managed to get it working correctly by first turning the array into a 1D vector, calculating the correct indices, doing the processing, and reshaping back to 3D afterwards:
Nx = 8; Ny = 6; Nz = 4; Ntot = Nx*Ny*Nz; % Number of points
xvals = rand(1,Nx); yvals = rand(1,Ny); zvals = rand(1,Nz); % Grid vectors
input_vec_3D = rand(Ny,Nx,Nz); % Dummy 3D array
factor1 = 3.6*xvals; % some constant times xvals
factor2 = 1.2*yvals;
factor3 = 8.5*zvals;
input_vec_1D = reshape( permute(input_vec_3D,[3,1,2]) , [Ntot 1]); % Reshape to 1D for loop
output_vec = zeros(Ntot,1);
for ind = 1:Ntot
    j1 = floor( floor( (ind-1)/Nz ) /Ny ) + 1;
    j2 = mod( floor( (ind-1)/Nz ) , Ny ) + 1;
    j3 = mod( (ind-1) , Nz ) + 1;
    n1 = mod( 5*(j1-1) ,Nx);
    n2 = mod( 3*(j2-1) ,Ny);
    n3 = mod( 2*(j3-1) ,Nz);
    ind_prime = mod( ( n3 + Nz*(n2 + Ny*n1) ) , Ntot ) + 1; % a different index for input_vec
    output_vec(ind) = output_vec(ind) + input_vec_1D(ind_prime) * factor1(j1)*factor2(j2)*factor3(j3);
end
output_vec = permute( reshape( output_vec, [Nz,Ny,Nx] ) , [2,3,1] ); % Reshape back to 3D
This loop over all elements is the slowest part of my code, so I would like to speed it up - by vectorizing or otherwise.
My arrays are typically 512x512x1024 complex doubles, so it is crucial for my application not to store any extra large temporary matrices, because RAM is limited (around 6 GB); this precludes the use of meshgrid() to generate the factors (note that factor1, factor2, factor3 are only 1D vectors, so their memory usage is small).
I was kindly helped with a very similar loop here, which was solved using MATLAB's implicit expansion. However, this one is more complicated, because the processing line uses several different indices: ind_prime, ind, and the j's.
You can vectorise this entirely, which is quicker (by about 50% in my tests with a 512x512x10 input matrix). But that involves creating several full-size arrays, whose memory footprint we can reduce in two ways.
Use an integer data type (e.g. uint32) for the indices. uint32 is 4 bytes per index, compared to 8 bytes for a double, so that's a decent saving, especially since indices are always integers anyway. Note that to make the most of this you have to use idivide instead of ./ to avoid MATLAB converting to double internally, and you have to convert Nx/Ny/Nz to uint32 for the same reason.
We can't use uint8 or uint16 as their max values are too small to cater to your large arrays.
Also note that (by default) idivide uses fix rounding, i.e. rounding towards 0, so you can skip using floor and maybe make up a small amount of performance there.
Recycle your arrays. Instead of using j1..3 and n1..3 as 6 separate indexing arrays, we can reorder the operations slightly and recycle J1..3.
This comes together like so:
IND = uint32(1:Ntot); % Shorthand; could skip defining this and write it out each time if memory was tight
Nx = uint32(Nx); Ny = uint32(Ny); Nz = uint32(Nz); Ntot = Nx*Ny*Nz; % uint32 conversion
J1 = idivide( idivide(IND-1,Nz), Ny ) + 1; % idivide to avoid "double" casting, does "fix" rounding
J2 = mod( idivide( IND-1, Nz ), Ny ) + 1; % idivide to avoid "double" casting, does "fix" rounding
J3 = mod( IND-1, Nz ) + 1;
FAC = (factor1(J1).*factor2(J2).*factor3(J3)).'; % done here so we can recycle J1..3
J1 = mod( 5*(J1-1), Nx );
J2 = mod( 3*(J2-1), Ny );
J3 = mod( 2*(J3-1), Nz );
ind_prime2 = (mod( ( J3 + Nz*(J2 + Ny*J1) ) , Ntot ) + 1).';
output_vec3 = input_vec_1D(ind_prime2) .* FAC;
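If even these full-length uint32 index arrays are too big for your RAM, one possible middle ground (my own sketch, not benchmarked like the code above) is to run the same arithmetic in fixed-size chunks, so only blk indices exist at any time:
blk = 2^20; % chunk size, tune to available RAM
output_vec = zeros(double(Ntot),1);
for lo = 1:blk:double(Ntot)
    IND = uint32(lo : min(lo+blk-1, double(Ntot))); % indices of this chunk only
    J1 = idivide( idivide(IND-1,Nz), Ny ) + 1;
    J2 = mod( idivide( IND-1, Nz ), Ny ) + 1;
    J3 = mod( IND-1, Nz ) + 1;
    FAC = (factor1(J1).*factor2(J2).*factor3(J3)).';
    J1 = mod( 5*(J1-1), Nx );
    J2 = mod( 3*(J2-1), Ny );
    J3 = mod( 2*(J3-1), Nz );
    ip = (mod( J3 + Nz*(J2 + Ny*J1), Ntot ) + 1).';
    output_vec(lo : lo+numel(ip)-1) = input_vec_1D(ip) .* FAC;
end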

Array not defined

I'm still confused about why I can't get the results of this small algorithm on my array. The array is 1-D with almost 1000 numbers. I'm trying to find each peak and its index. I found the peaks, but I can't find their indices. Could you please help me out? I want to plot all my values regardless of the indices.
%clear all
%close all
%clc
%// not generally appreciated
%-----------------------------------
%message1.txt.
%-----------------------------------
% t=linspace(0,tmax,length(x)); %get all numbers
% t1_n=0:0.05:tmax;
x=load('ww.txt');
tmax= length(x) ;
tt= 0:tmax -1;
x4 = x(1:5:end);
t1_n = 1:5:tt;
x1_n_ref=0;
k=0;
for i=1:length(x4)
    if x4(i)>170
        if x1_n_ref-x4(i)<0
            x1_n_ref=x4(i);
            alpha=1;
        elseif alpha==1 && x1_n_ref-x4(i)>0
            k=k+1;
            peak(k)=x1_n_ref; % This is my peak value, but I also want to know its index, which will represent the time.
            %peak_time(k) = t1_n(i); % this is my issue.
            alpha=2;
        end
    else
        x1_n_ref=0;
    end
end
%----------------------
figure(1)
% plot(t,x,'k','linewidth',2)
hold on
% subplot(2,1,1)
grid
plot( x4,'b'); % ,tt,x,'k'
legend('down-sampling by 5');
Here is your error:
tmax= length(x) ;
tt= 0:tmax -1;
x4 = x(1:5:end);
t1_n = 1:5:tt; % <---
tt is an array containing the numbers 0 through tmax-1. Defining t1_n as t1_n = 1:5:tt will not create an array, but an empty matrix. Why? The expression 1:5:tt uses only the first value of the array tt, hence it reduces to t1_n = 1:5:0, which is empty. Naturally, when you later try to access t1_n as if it were an array (peak_time(k) = t1_n(i)), you'll get an error.
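A quick demonstration of the pitfall (with stand-in values):
tt = 0:99;  % stand-in for the real tt
v = 1:5:tt; % only tt(1) = 0 is used, so this is 1:5:0
isempty(v)  % ans = 1: v is empty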
You probably want to exchange t1_n = 1:5:tt with
t1_n = 1:5:tmax;
You need to index the tt array correctly. You can use
t1_n = tt(1:5:end); % note this gives zero-based values since tt starts at 0; use t1_n = 1:5:tmax if you want one-based (MATLAB style)
You can also cut the code down a little: some variables don't seem to be used or necessary, including t1_n itself:
x = load('ww.txt');
tmax = length(x);
x4 = x(1:5:end);
xmin = 170;
% now change the code
maxnopeaks = round(tmax/2);
peaks(maxnopeaks) = 0; % preallocate peaks for speed
index(maxnopeaks) = 0; % preallocate index for speed
i = 0;
for n = 2 : tmax-1
    if x(n) > xmin
        if x(n) >= x(n-1) && x(n) >= x(n+1)
            i = i+1;
            peaks(i) = x(n); % the peak value
            index(i) = n;    % its index
        end
    end
end
% now trim the excess values (if any)
peaks = peaks(1:i);
index = index(1:i);
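Alternatively, if you have the Signal Processing Toolbox, findpeaks returns peak values and locations in one call, replacing the loop above (using the same 170 threshold):
[pks, locs] = findpeaks(x, 'MinPeakHeight', 170); % pks are the peak values, locs their indices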

find possible combinations with specific condition

I want to calculate all possible combinations of the numbers 1:16 taken 10 at a time.
combos = combntns(1:16,10)
But the condition is that each returned combination should contain at least one member of each of the following vectors:
V1 = 1:4, V2 = 5:8, V3 = 9:12, V4 = 13:16
Any solution?
With that problem size you can afford to generate all combinations and then select those that meet the requirements:
n = 16; %// number of elements to choose from
c = 10; %// combination size
s = 4; %// size of each group (size of V1, V2 etc)
combos = nchoosek(1:n, c);
ind = all(any(any(bsxfun(@eq, combos, reshape(1:n, 1,1,s,[])),2),3),4);
combos = combos(ind,:);
This can be generalized for generic elements and arbitrary condition vectors, assuming all vectors are the same size:
elements = 1:16; %// elements to choose from
c = 10; %// combination size
vectors = {1:4, 5:8, 9:12, 13:16}; %// cell array of vectors
s = numel(vectors{1});
combos = nchoosek(elements, c);
ind = all(any(any(bsxfun(@eq, combos, reshape(cat(1,vectors{:}).', 1,1,s,[])),2),3),4);
combos = combos(ind,:);
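If readability matters more than speed, an equivalent row filter can be written with ismember over the same vectors cell array (a sketch, typically slower than the bsxfun test; it replaces the ind line above):
ind = true(size(combos,1), 1);
for v = 1:numel(vectors)
    ind = ind & any(ismember(combos, vectors{v}), 2); %// row must contain at least one element of group v
end
combos = combos(ind,:);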

Sorted vector of indices from a vector

My question is easier to understand with an example.
Given an arbitrary vector, e.g. [6 2 5], I want to get another vector whose elements are the ranks of the input vector's entries in sorted order: in this case, [3 1 2].
Is there any MATLAB function capable of returning this?
Thanks!
Use the second output argument of sort, twice:
[~, tmp] = sort( myInput );
[~, myOutput] = sort( tmp );
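A quick check on the example from the question:
myInput = [6 2 5];
[~, tmp] = sort( myInput );  %// tmp = [2 3 1]: positions of the sorted values in the input
[~, myOutput] = sort( tmp ); %// myOutput = [3 1 2]: the rank of each input element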
Regarding running times:
n = 1000;
x = unique(randi(100*n,1,n)); %// make sure all elements of x are different
tic; %// try this answer
[ii t]=sort(x);
[ii out1]=sort(t);
toc,
tic;
out2 = sum(bsxfun(@ge, x, x.'));
toc
Output:
Elapsed time is 0.000778 seconds. %// this answer
Elapsed time is 0.003835 seconds. %// bsxfun approach
If all elements of the input vector x are assured to be different, you could use bsxfun: for each element of x, count how many elements (including itself) it equals or exceeds:
y = sum(bsxfun(@ge, x(:).', x(:)), 1);
If the elements of x are not necessarily different, you need an additional step to make sure comparisons are done only with previous and current elements:
m = bsxfun(@ge, x(:).', x(:));
y = sum(m & ~tril(m,-1).', 1);
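A small check with a repeated element (my own example): the tie is resolved to distinct ranks, so the output is still a valid permutation.
x = [2 1 2];
m = bsxfun(@ge, x(:).', x(:));
y = sum(m & ~tril(m,-1).', 1) %// y = [3 1 2]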
