Create a matrix with a changing number of columns - arrays

I'm trying to write a homemade version of peakfinder.m (http://www.mathworks.com/matlabcentral/fileexchange/25500-peakfinder), making it work with multiple arrays instead of just one at a time, for better time efficiency.
I have a 2D matrix where I need to find if the sign changes in the 2nd dimension.
dx0 = diff(x0,1,2); % Find derivative
dx0(dx0 == 0) = -eps; % This is so we find the first of repeated values
ind = find(dx0(:,1:end-1).*(dx0(:,2:end)) < 0)+1; % Find where the derivative changes sign
Now my problem is that it does find where the derivative changes sign, but the result is one big vector. So if the sign changes twice in the same row (or doesn't change at all in a row), I have no way to find out.
So if x0 is of size 1000x10, I'd like ind to be of size 1000xY, where Y is the number of times the sign changes in EACH row. I also need to know at which values of x0 there is a sign change. So each row would look something like:
2 4 7
4 8
2 5 6 8
etc.
Is this possible at all? Or should I change the code so it places a 0 if it doesn't change and a 1 if it does change, considering I'll be working with the values where it changes?

cellfun approach -
b1 = padarray(sign(dx0(:,1:end-1))~=sign(dx0(:,2:end)),[0 1],'pre')
out = cellfun(@find,mat2cell(b1,ones(1,size(b1,1)),size(b1,2)),'uni',0)
The above code uses padarray, which requires MATLAB's Image Processing Toolbox. So, if you don't have it, you can prepend a column of false values instead, like this -
b1 = sign(dx0(:,1:end-1))~=sign(dx0(:,2:end))
b1 = [false(size(b1,1),1) b1]
out = cellfun(@find,mat2cell(b1,ones(1,size(b1,1)),size(b1,2)),'uni',0)
Alternative solution using cellfun with nonzeros function -
b1 = padarray(sign(dx0(:,1:end-1))~=sign(dx0(:,2:end)),[0 1],'pre')
out = cellfun(@nonzeros,mat2cell(bsxfun(@times,b1,1:size(b1,2)),ones(1,size(b1,1)),size(b1,2)),'uni',0)
out contains the locations of sign change across the rows, which can be displayed using celldisp(out).
The counts of the sign changes can be calculated using -
counts = cellfun(#numel,out)
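For reference, here's a small end-to-end sketch with made-up demo data (two rows with different numbers of sign changes), using the non-padarray variant from above:
% hypothetical demo data, not from the question
x0 = [1 3 2 4 1 5;
1 2 3 2 1 0];
dx0 = diff(x0,1,2); % derivative along the rows
dx0(dx0 == 0) = -eps; % catch the first of repeated values
b1 = [false(size(dx0,1),1) sign(dx0(:,1:end-1))~=sign(dx0(:,2:end))];
out = cellfun(@find,mat2cell(b1,ones(1,size(b1,1)),size(b1,2)),'uni',0);
celldisp(out) % out{1} = [2 3 4 5], out{2} = 3
counts = cellfun(@numel,out) % counts = [4; 1]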

Related

Is there any mechanism to auto squeeze in Matlab / Octave

For an nD array, it would be nice to be able to auto squeeze to remove singleton dimensions. Is there a way to do this that I don't know about? This would be especially useful for aggregate functions (e.g. sum, mean, etc) where you always expect a result with fewer dimensions.
Here's a simple example:
>> A = ones(3,3,3);
>> B = mean(A);
>> size(B)
ans =
1 3 3
>> squeeze(B)
ans =
1 1 1
1 1 1
1 1 1
It would be nice if Matlab/Octave would automatically do the squeezing for me. Or if there was a way to turn that option on (something similar to hold on for plots).
As far as I know, Matlab does not have that. And I don't think it would be a good idea. Consider a modified version of your example:
>> A = ones(3,1,1,3);
>> B = mean(A);
>> size(B)
ans =
1 1 1 3
What should "auto-squeeze" do here? Reduce B to size [1 1 3] or to [1 3]?
You could argue that it should remove the same dimension that mean has turned into a singleton. But then it would have to be done within the mean function, perhaps with an optional input argument. Once you get the function output, there is no information about how it was obtained.
Or you could argue that it should remove all singleton dimensions, like squeeze (more or less) does. But then it would remove dimensions that were already singleton in the function input, which is probably unwanted.
If you ask me, having a second input in squeeze specifying which (singleton) dimensions to remove would be a nice addition (in the same vein as you can use mean(A, 1) to force the operation to be applied along the first dimension even if A happens to be a row vector).
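To illustrate, a minimal sketch of such a two-argument squeeze (the name squeezedims and its behaviour are my own invention, not an existing MATLAB function) might look like:
function B = squeezedims(A, dims)
% Remove only the requested dimensions, which must all be singletons.
sz = size(A);
assert(all(sz(dims) == 1), 'Requested dimensions must be singletons.')
sz(dims) = []; % drop the chosen singleton dimensions
sz = [sz, ones(1, 2-numel(sz))]; % pad so reshape always gets at least 2 sizes
B = reshape(A, sz);
end
With this, squeezedims(mean(ones(3,1,1,3)), 1) returns a 1x1x3 array: only the dimension that mean collapsed is removed.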
I agree with Luis and Cris, but I would add the following.
Both Matlab and Octave do automatically squeeze extra dimensions, in one very particular scenario: any singleton dimensions at the end are automatically squeezed out.
E.g.
A = ones([1,2,3,4]);
B = mean(A, 4);
size(B)
% ans = 1 2 3
Note how the answer is [1,2,3], and not [1,2,3,1]. This is in contrast to languages like Python, for instance, where a size of (1,1) is very different from a size of (1,).
Therefore, with regard to your questions, one way to use this to your advantage could be to ensure that the dimension that is to be reduced is always found at the end, and thus automatically simplified.
This becomes even more useful when you realise that:
size(A(:)) % ans = 24 1 (i.e. 24)
size(A(:,:)) % ans = 1 24
size(A(:,:,:)) % ans = 1 2 12
size(A(:,:,:,:)) % ans = 1 2 3 4
Meaning, if you order your dimensions hierarchically you can ensure that any operations that need to take place over the higher dimensions, can a) be vectorised easily, and b) give a natural result, without the need to waste time squeezing or permuting the resulting dimensions.
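For example (a toy illustration of this idea), averaging over everything except the first dimension needs no squeeze or permute at all:
A = rand(3, 4, 5, 6);
m = mean(A(:,:), 2); % A(:,:) folds dims 2:4 into columns, so m is 3x1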

Ignore NaN when detrending 3-d array

I'm using Matlab 2016a; I'm attempting to detrend a 3-dimensional array along the third dimension, but where there are missing values. It is critical that the values stay in the same positions in the array since the position relates to a geographic location.
Imagine a three-page array where Page 2 has NaN at random locations but Page 1 and Page 3 have complete data. Detrending along the 3rd dimension, some vectors will have three data points and some will have two. I need to be able to detrend along the third dimension using all available values. If I were to look at the values for the detrended Page 1 or Page 3, there should be no missing values (since there are always either 2 or 3 data points to use), but Page 2 would have NaN placeholders in the locations where the NaNs were.
My question is: how can I detrend along the third dimension while ignoring NaN?
I've attempted using detrend3 (found on the Matlab file exchange: https://www.mathworks.com/matlabcentral/fileexchange/61328-detrend3?focused=7203929&tab=function), which works perfectly when detrending 3-d arrays with no missing values.
Detrending with NaN present produces an error. I've tried ignoring NaN and also setting NaN to -9999 and then ignoring that number, but have been unable to get these efforts to work.
Any guidance about what direction to go would be greatly appreciated.
function detrended = detrendNaN3(A,t)
%DETRENDNAN3 Detrends a matrix with NaNs into the third dimension
% Input Arguments:
% - A: NxMxK matrix
% - t: 1xK time vector
% expand t to the same size as A
t = bsxfun(@times,permute(t,[3 1 2]),ones(size(A)));
% where A == NaN, set t = NaN as well
t(isnan(A)) = NaN;
% mean time for each pixel
xm = nanmean(t,3);
% mean of every pixel in A
ym = nanmean(A,3);
% calculate slope using least squares for every pixel
a = nansum(bsxfun(@times,bsxfun(@minus,t,xm),bsxfun(@minus,A,ym)),3)./nansum(bsxfun(@minus,t,xm).^2,3);
% calculate intercept for every pixel
b = ym - a.*xm;
% calculate trend for every pixel
trend = bsxfun(@plus,b,bsxfun(@times,a,t));
% remove trend
detrended = A-trend;
end
Even though the function is fully vectorised, it could be written a bit faster - but it's currently very readable, and with a 2500x1700x100 matrix it takes about 8 seconds, which I deem acceptable.
An updated version is maintained at the file exchange.
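A hypothetical usage sketch (the test data here is made up, not from the question):
A = randn(4, 5, 10); % 4x5 grid, 10 time steps
A(2, 3, 4) = NaN; % inject a missing value
t = 1:10; % time vector
D = detrendNaN3(A, t);
% D(2,3,4) stays NaN; every other pixel's series is detrended
% using whatever time steps were available at that pixel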

Reshape a 3D array and remove missing values

I have an NxMxT array where each element corresponds to a grid cell of Earth. If the cell is over the ocean, the value is 999. If the cell is over land, it contains an observed value. N is longitude, M is latitude, and T is months.
In particular, I have an array called tmp60 for the ten years 1960 through 1969, so 120 months for each grid.
To test what the global mean in January 1960 was, I write:
tmpJan60=tmp60(:,:,1);
tmpJan60(tmpJan60(:,:)>200)=NaN;
nanmean(nanmean(tmpJan60))
which gives me 5.855.
I am confused about the reshape function. I thought the following code should yield the same average, namely 5.855, but it does not:
load tmp60
N1=size(tmp60,1)
N2=size(tmp60,2)
N3=size(tmp60,3)
reshtmp60 = reshape(tmp60, N1*N2,N3);
reshtmp60( reshtmp60(:,1)>200,: )=[];
mean(reshtmp60(:,1))
this gives me -1.6265, which is not correct.
I have checked the result in Excel (!) and 5.855 is correct, so I assume I make a mistake in the reshape function.
Ideally, I want a matrix that takes each grid, going first down the N-dimension, and make the 720 rows with 120 columns (each column is a month). These first 720 rows will represent one longitude band around Earth for the same latitude. Next, I want to increase the latitude by 1, thus another 720 rows with 120 columns. Ultimately I want to do this for all 360 latitudes.
If longitude and latitude were inputs, say column 1 and 2, then the matrix should look like this:
temp = [-179.75 -89.75 -1 2 ...
-179.25 -89.75 2 4 ...
...
179.75 -89.75 5 9 ...
-179.75 -89.25 2 5 ...
-179.25 -89.25 3 4 ...
...
-179.75 89.75 2 3 ...
...
179.75 89.75 6 9 ...]
So temp(:,3) should be all January 1960 observations.
One way to do this is:
grid1 = tmp60(1,1,:);
g1 = reshape(grid1, [1,120]);
grid2 = tmp60(2,1,:);
g2 = reshape(grid2,[1,120]);
g = [g1;g2];
But obviously very cumbersome.
I am not able to automate this procedure for the N*M elements, so comments are appreciated!
A link to the file tmp60.mat
The main problem in your code is the treatment of the NaNs. Observe the following example:
a = randi(10,6);
a(a>7)=nan
m = [mean(a(:),'omitnan') mean(mean(a,'omitnan'),'omitnan')]
m =
3.8421 3.6806
Both elements in m are meant to be the mean over all elements in a. But they are different! The reason is that taking the mean of all values together, with mean(a(:),'omitnan'), is like summing all non-NaN values and dividing by the number of values we summed:
sum(a(:),'omitnan')/sum(~isnan(a(:)))==mean(a(:),'omitnan') % this is true
but taking the mean along the first dimension, we get 6 column means:
sum(a,'omitnan')./sum(~isnan(a))==mean(a,'omitnan') % this is also true
and when we take the mean of those, every column mean gets equal weight, regardless of how many non-NaN values it was computed from, so the result differs:
mean(sum(a,'omitnan')./sum(~isnan(a)))==mean(a(:),'omitnan') % this is false
Here is what I think you want in your code:
% this is exactly as your first test:
tmpJan60=tmp60(:,:,1);
tmpJan60(tmpJan60>200) = nan;
m1 = mean(mean(tmpJan60,'omitnan'),'omitnan')
% this creates the matrix as you want it:
result = reshape(permute(tmp60,[3 1 2]),120,[]).';
result(result>200) = nan;
r = reshape(result(:,1),720,360);
m2 = mean(mean(r,'omitnan'),'omitnan')
isequal(m1,m2)
To create the matrix you first permute the dimensions so the one you want to keep as is (time) will be the first. Then reshape the array to Tx(lon*lat), so you get 120 rows for all time steps and 259200 columns for all combinations of the coordinates. All that's left is to transpose it.
m1 is your first calculation, and m2 is what you try to do in the second one. They are equal here, but their value is not 5.855, even if I use your code.
However, I think the right solution will be to take the mean of all values together:
mean(result(:,1),'omitnan')
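If you also want the longitude and latitude as the first two columns, as in your target layout, a sketch along these lines could work (the half-degree grid vectors are assumed from your example values; they are not given in the question):
lon = (-179.75:0.5:179.75).'; % 720 longitudes, assumed spacing
lat = (-89.75:0.5:89.75).'; % 360 latitudes, assumed spacing
[LON, LAT] = ndgrid(lon, lat); % longitude varies fastest, matching result's row order
temp = [LON(:), LAT(:), result]; % 259200 x (2 + 120)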

Compute the product of the next n elements in array

I would like to compute the product of the next n adjacent elements of a matrix. The number n of elements to be multiplied should be given as an input to the function.
For example for this input I should compute the product of every 3 consecutive elements, starting from the first.
[p, ind] = max_product([1 2 2 1 3 1],3);
This gives [1*2*2, 2*2*1, 2*1*3, 1*3*1] = [4,4,6,3].
Is there any practical way to do it? Now I do this using:
for ii = 1:(length(v)-2)
p(ii) = prod(v(ii:ii+n-1));
end
where v is the input vector and n is the number of elements to be multiplied.
in this example n=3 but can take any positive integer value.
Depending on whether n is odd or even or length(v) is odd or even, I sometimes get the right answer and sometimes an error.
For example for arguments:
v = [1.35912281237829 -0.958120385352704 -0.553335935098461 1.44601450110386 1.43760259196739 0.0266423803393867 0.417039432979809 1.14033971399183 -0.418125096873537 -1.99362640306847 -0.589833539347417 -0.218969651537063 1.49863539349242 0.338844452879616 1.34169199365703 0.181185490389383 0.102817336496793 0.104835620599133 -2.70026800170358 1.46129128974515 0.64413523430416 0.921962619821458 0.568712984110933]
n = 7
I get the error:
Index exceeds matrix dimensions.
Error in max_product (line 6)
p = prod(v(ii:ii+n-1));
Is there any correct general way to do it?
Based on the solution in Fast numpy rolling_product, I'd like to suggest a MATLAB version of it, which leverages the movsum function introduced in R2016a.
The mathematical reasoning is that a product of numbers is equal to the exponential of the sum of their logarithms:
prod(x) = exp(sum(log(x)))
A possible MATLAB implementation of the above may look like this:
function P = movprod(vec,window_sz)
P = exp(movsum(log(vec),[0 window_sz-1],'Endpoints','discard'));
if isreal(vec) % Ensures correct outputs when the input contains negative and/or
P = real(P); % complex entries.
end
end
Several notes:
I haven't benchmarked this solution, and do not know how it compares in terms of performance to the other suggestions.
It should work correctly with vectors containing zero and/or negative and/or complex elements.
It can be easily expanded to accept a dimension to operate along (for array inputs), and any other customization afforded by movsum.
The 1st input is assumed to be either a double or a complex double row vector.
Outputs may require rounding.
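A quick sanity check with the example from the question (my own check, not a benchmark):
v = [1 2 2 1 3 1];
P = movprod(v, 3) % returns [4 4 6 3], matching the loop-based result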
Update
Inspired by the nicely thought-out answer of Dev-iL comes this handy solution, which does not require Matlab R2016a or above:
out = real( exp(conv(log(a),ones(1,n),'valid')) )
The basic idea is to transform the multiplication to a sum and a moving average can be used, which in turn can be realised by convolution.
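For instance, with the example input from the question (all values positive, so the real() wrapper is a no-op here):
a = [1 2 2 1 3 1]; n = 3;
out = real( exp(conv(log(a),ones(1,n),'valid')) ) % returns [4 4 6 3]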
Old answers
This is one way using gallery to get a circulant matrix and indexing the relevant part of the resulting matrix before multiplying the elements:
a = [1 2 2 1 3 1]
n = 3
%// circulant matrix
tmp = gallery('circul', a(:))
%// product of relevant parts of matrix
out = prod(tmp(end-n+1:-1:1, end-n+1:end), 2)
out =
4
4
6
3
A more memory-efficient alternative, in case there are no zeros in the input:
a = [10 9 8 7 6 5 4 3 2 1]
n = 2
%// cumulative product
x = [1 cumprod(a)]
%// shifted by n and divided by itself
y = circshift( x,[0 -n] )./x
%// remove last elements
out = y(1:end-n)
out =
90 72 56 42 30 20 12 6 2
Your approach is correct. You should just change the for loop to for ii = 1:(length(v)-n+1) and then it will work fine.
If you are not going to deal with large inputs, another approach is using gallery as explained in @thewaywewalk's answer.
I think the problem may be based on your indexing. The line that states for ii = 1:(length(v)-2) does not provide the correct range of ii.
Try this:
function out = max_product(in,n)
n = n-1; % this is because we add n to i later
out = zeros(length(in)-n,1); % preallocate the output as a column vector
for i = 1:length(in)-n
out(i) = prod(in(i:i+n));
end
Your code works when restated like so:
for ii = 1:(length(v)-(n-1))
p(ii) = prod(v(ii:ii+(n-1)));
end
That should take care of the indexing problem.
Using bsxfun you can create a matrix in which each row contains n consecutive elements, then take prod along the 2nd dimension of that matrix. I think this is the most efficient way:
max_product = @(v, n) prod(v(bsxfun(@plus, (1 : n), (0 : numel(v)-n)')), 2);
p = max_product([1 2 2 1 3 1],3)
Update:
Some other solutions have been updated since, and some, such as @Dev-iL's answer, outperform the others. I can also suggest fftconv, which in Octave outperforms conv.
If you can upgrade to R2017a, you can use the new movprod function to compute a windowed product.
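For example (my own sketch; the [0 n-1] window spec follows the same convention as movsum above):
p = movprod(v, [0 n-1], 'Endpoints', 'discard'); % product of each forward window of n elements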

Randomize matrix elements between two values while keeping row and column sums fixed (MATLAB)

I have a bit of a technical issue, but I feel like it should be possible with MATLAB's powerful toolset.
What I have is a random n by n matrix of 0's and w's, say generated with
A=w*(rand(n,n)<p);
A typical value of w would be 3000, but that should not matter too much.
Now, this matrix has two important quantities, the vectors
c = sum(A,1);
r = sum(A,2)';
These are two row vectors, the first denotes the sum of each column and the second the sum of each row.
What I want to do next is randomize each value of w, for example between 0.5 and 2. This I would do as
rand_M = (2-0.5).*rand(n,n) + 0.5
A_rand = rand_M.*A;
However, I don't want to just pick these random numbers: I want them to be such that for every column and row, the sums are still equal to the elements of c and r. So to clean up the notation a bit, say we define
A_rand_c = sum(A_rand,1);
A_rand_r = sum(A_rand,2)';
I want that for all j = 1:n, A_rand_c(j) = c(j) and A_rand_r(j) = r(j).
What I'm looking for is a way to redraw the elements of rand_M in a sort of algorithmic fashion I suppose, so that these demands are finally satisfied.
Now of course, unless I have infinite amounts of time this might not really happen. I therefore accept these quantities to fall into a specific range: A_rand_c(j) has to be an element of [(1-e)*c(j),(1+e)*c(j)] and A_rand_r(j) of [(1-e)*r(j),(1+e)*r(j)]. This e I define beforehand, say like 0.001 or something.
Would anyone be able to help me in the process of finding a way to do this? I've tried an approach where I just randomly repick the numbers, but this really isn't getting me anywhere. It does not have to be crazy efficient either, I just need it to work in finite time for networks of size, say, n = 50.
To be clear, the final output is the matrix A_rand that satisfies these constraints.
Edit:
Alright, so after thinking a bit I suppose it might be doable with some while statement that goes through every element of the matrix. The difficult part is that there are four possibilities: if you are at a specific element A_rand(i,j), it could be that A_rand_c(j) and A_rand_r(i) are both too small, both too large, or one of each. The first two cases are good, because then you can just redraw the random number (larger or smaller than the current value, respectively) and improve the situation. But the other two cases are problematic, as you will improve one criterion but worsen the other. I guess it would have to look at which criterion is less satisfied, so that it tries to fix the one that is worse. But this is not trivial, I would say.
You can take advantage of the fact that rows/columns with a single non-zero entry in A automatically give you results for that same entry in A_rand. If A(2,5) = w and it is the only non-zero entry in its column, then A_rand(2,5) = w as well. What else could it be?
You can alternate between finding these single-entry rows/cols, and assigning random numbers to entries where the value doesn't matter.
Here's a skeleton for the process:
A_rand=zeros(size(A)) is the matrix you are going to fill
entries_left = A>0 is a binary matrix showing which entries in A_rand you still need to fill
col_totals=sum(A,1) is the amount you still need to add in every column of A_rand
row_totals=sum(A,2) is the amount you still need to add in every row of A_rand
while sum( entries_left(:) ) > 0
% STEP 1:
% function to fill entries in A_rand if entries_left has rows/cols with one nonzero entry
% you will need to keep looping over this function until nothing changes
% update() A_rand, entries_left, row_totals, col_totals every time you loop
% STEP 2:
% let (i,j) be the indices of the next non-zero entry in entries_left
% assign a random number to A_rand(i,j) <= col_totals(j) and <= row_totals(i)
% update() A_rand, entries_left, row_totals, col_totals
end
update()
A_rand(i,j) = random_value;
entries_left(i,j) = 0;
col_totals(j) = col_totals(j) - random_value;
row_totals(i) = row_totals(i) - random_value;
end
Picking the range for random_value might be a little tricky. The best I can think of is to draw it from a relatively narrow distribution centered around N*w*p where p is the probability of an entry in A being nonzero (this would be the average value of row/column totals).
This doesn't scale well to large matrices as it will grow with n^2 complexity. I tested it for a 200 by 200 matrix and it worked in about 20 seconds.
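For concreteness, here is one minimal MATLAB rendering of that skeleton (my own sketch, not the tested code timed above; each random draw is simply capped by the remaining row/column totals):
A_rand = zeros(size(A));
entries_left = A > 0;
col_totals = sum(A, 1);
row_totals = sum(A, 2);
while any(entries_left(:))
    % STEP 1: fill entries forced by rows/columns with a single nonzero left
    changed = true;
    while changed
        changed = false;
        for i = find(sum(entries_left, 2) == 1)' % rows with one entry left
            j = find(entries_left(i, :));
            A_rand(i, j) = row_totals(i); % forced: nothing else can absorb the remainder
            col_totals(j) = col_totals(j) - row_totals(i);
            row_totals(i) = 0;
            entries_left(i, j) = false;
            changed = true;
        end
        for j = find(sum(entries_left, 1) == 1) % columns with one entry left
            i = find(entries_left(:, j));
            A_rand(i, j) = col_totals(j);
            row_totals(i) = row_totals(i) - col_totals(j);
            col_totals(j) = 0;
            entries_left(i, j) = false;
            changed = true;
        end
    end
    % STEP 2: assign a random value to the next remaining free entry
    [i, j] = find(entries_left, 1);
    if ~isempty(i)
        v = rand * min(row_totals(i), col_totals(j));
        A_rand(i, j) = v;
        row_totals(i) = row_totals(i) - v;
        col_totals(j) = col_totals(j) - v;
        entries_left(i, j) = false;
    end
end
Note that this sketch only matches the row/column sums approximately; checking the result against the (1-e)/(1+e) bounds and re-drawing when they are violated would still be needed.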
