signrank test in a three-dimensional array in MATLAB - arrays

I have a 60x60x35 array and would like to use the Wilcoxon signed rank test to check whether the median of each element across the third dimension (i.e. across its 35 values) differs from zero. I would like the results in two 60x60 arrays: one with values of 0 and 1 depending on the outcome of the test, and a separate one with the corresponding p values.
The problem I am facing is specifying the command so that the test is calculated across the appropriate dimension of the array and the output has the appropriate dimensions.
Thanks for your help and all the best!

So one way to solve your problem is to use a nested for-loop. Let's say your data is stored in data:
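data=rand(60,60,35);                      % example data
size_data=size(data);
p=zeros(size_data(1),size_data(2));       % preallocate the p values
p(:,:)=NaN;                               % NaN marks entries that were never filled
h=zeros(size_data(1),size_data(2));       % preallocate the test decisions
h(:,:)=NaN;
for k=1:size_data(1)
    for l=1:size_data(2)
        tmp_data=data(k,l,:);                            % 1x1x35 slice
        tmp_data=reshape(tmp_data,1,numel(tmp_data));    % 1x35 row vector
        [p(k,l), h(k,l)]=signrank(tmp_data);
    end
end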
I preallocate p and h as 60x60 matrices and then set them to NaN, so you can easily see if something went wrong (0 would be an acceptable result, so it cannot serve as a marker). I then loop over all elements and store the data for the current element in a temporary variable. signrank needs a vector as input, so I reshape the 1x1x35 slice to a 1x35 row vector.
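I guess you could hide those loops with arrayfun (bsxfun only applies element-wise binary operations, so it would not help here), but that is unlikely to be faster.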
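For comparison, here is a rough Python sketch of the same per-element test, assuming NumPy and SciPy are available. scipy.stats.wilcoxon stands in for signrank, the 6x6x35 array is only a small stand-in for the 60x60x35 data, and signed_rank_p is a hypothetical helper name. np.apply_along_axis hides the two loops but still runs the test once per element.

```python
import numpy as np
from scipy.stats import wilcoxon

# Small stand-in for the 60x60x35 data (assumption: values are continuous,
# so there are no zero differences or ties).
rng = np.random.default_rng(0)
data = rng.normal(loc=0.5, scale=1.0, size=(6, 6, 35))

def signed_rank_p(values):
    """Hypothetical helper: p value of the signed rank test against zero."""
    return wilcoxon(values).pvalue

# Apply the test to every vector along the third axis (axis index 2).
p = np.apply_along_axis(signed_rank_p, 2, data)

# Decision at the 5% level, analogous to the h output of signrank.
h = p < 0.05
```

Both p and h end up with the shape of the first two dimensions, which is the layout asked for in the question.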

Related

Getting 10 nsmallest arrays from a set of arrays

First of all, I apologize for the confusing title, the task which I'm trying to accomplish is itself still confusing to me, hence why I'm finding it hard to do it. I'll try to be as clear as I can from now on.
I have 100 500x500 arrays, the values inside range from 0 to 1. What I would like to do is write a code that gives me 10 arrays, these arrays will be a sort of composite of the minimum values between them.
The first array is made of the absolute minimum values, the second array with the 2nd order minimum values....and so on. So the 10 arrays will be a composite of sorted ascending values.
I managed to get the absolute minimum with np.minimum() but I have no clue on how to proceed to the next ones.
To reiterate, I don't want to sort the 100 arrays, but loop through them and create new arrays with the lowest values found in each position.
Sorting is the most efficient way.
np.sort([array0,array1,...], 0)
This yields an array whose first element is a 500x500 array of the smallest element-wise entries across all your arrays, the second is the second smallest, and so on; the first 10 slices are the arrays you want.
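A minimal sketch of that idea, using 100 random 5x5 arrays as a hypothetical stand-in for the 100 500x500 arrays (the names arrays, stacked, ordered and ten_smallest are made up for the example):

```python
import numpy as np

# Hypothetical stand-in data: 100 small arrays with values in [0, 1).
rng = np.random.default_rng(0)
arrays = [rng.random((5, 5)) for _ in range(100)]

stacked = np.stack(arrays)            # shape (100, 5, 5)
ordered = np.sort(stacked, axis=0)    # sort each position across the 100 arrays
ten_smallest = ordered[:10]           # smallest, 2nd smallest, ..., 10th smallest
```

Here ten_smallest[0] matches what np.minimum gives when reduced over all the arrays, and each later slice holds the next-smallest value at every position.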

How to blit from a 1D array along a dimension of a 2D array?

I have a 2D array, and have computed necessary updates along a given dimension of it using a 1D array (said updates can't be computed in place as earlier calculations would override values needed in later calculations). I thus want to copy the updates into my 2D array. The most obvious way to do this would, at first glance, appear to be to use Array slicing and Array.blit.
I have tried the approach of extracting the relevant dimension using array slicing, and then blitting across to that, but that doesn't update the values inside the 2D array. I think what is happening is that a new, separate, 1D array is being created when I make the slice, and the values are being blitted into that new array, which of course is dropped a moment later when it goes back out of scope.
I suppose you could say that I was expecting the slicing to return a view into the 2D array which would work for the blit function call, but instead the slicing actually returns a new array with the values copied into it (which, thinking about it, is what slicing does otherwise, I believe).
Currently I am using a workaround whereby I create a 2D array, where one of the dimensions is only 1 element wide (thus effectively re-creating a 1D array), and then using Array2D.blit. I would prefer to do it directly though, both because I find this ugly, and moreover because it would be quite useful elsewhere in my program where I can't just declare a 1D array as 2D.
My first approach:
let srcArray = Array.zeroCreate srcArrayLength
... // do relevant computation
srcArray.[index] <- result
... // finish computation
Array.blit srcArray 0 destArray.[index, *] 0 srcArrayLength
My current approach:
let srcArray = Array2D.zeroCreate 1 srcArrayLength
... // do relevant computation
srcArray.[0,index] <- result
... // finish computation
Array2D.blit srcArray 0 0 destArray index 0 1 srcArrayLength
The former approach has no effect on my destination 2D array. The latter approach works where I use it, but as I said above it isn't nice, and cannot be used in another situation, where I have a jagged 2D array (i.e. 'a[][]) that I would like to blit across from.
How might I go about achieving my aim? I thought of Span/Memory, but it wasn't clear to me if and how they could be used here. Alternatively, if you can spot a better way to do this that doesn't involve blit, I'm all-virtual-ears.
I figured out a fairly good solution to this, with the help of someone over in the F# Foundation Slack. Since nobody else has posted an answer, I'll put this one up.
Both Array.Copy (note that that is the .NET Array.Copy method, not the F#-specific Array.copy) and Buffer.BlockCopy were suggested to me. Array.Copy still complains about mismatching array types, but Buffer.BlockCopy ignores the dimensionality of the supplied array, and merely copies the specified number of bytes from one location to another. Using this and relying on the fact that 2D arrays are really stored as 1D arrays in row-major order (the same as C, I believe), it is quite possible to overwrite the last dimension of a multi-dimensional array reasonably cleanly.
I updated the code from the 'current approach' in my question to the below:
let srcArray = Array.zeroCreate srcArrayLength
... //do relevant computation
srcArray.[index] <- result
... //finish computation
Buffer.BlockCopy(srcArray, 0, destArray, firstDimIndex * lengthOfSecondDim * sizeof<'a>, lengthOfSecondDim * sizeof<'a>)
Not only does it do the job in a way which I personally find a bit tidier, but it has a side-benefit in that it is noticeably faster than the second approach described in the question - I haven't yet run a benchmark to quantify the difference though.
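The offset arithmetic behind this can be sketched in Python, purely as an illustration of the row-major layout and not of the .NET API. The helper blit_row and the 3x4 grid are hypothetical; the flat array module buffer plays the role of the 2D array's underlying storage, and offsets are counted in elements rather than bytes.

```python
from array import array

def blit_row(flat, n_cols, row_index, src):
    """Overwrite one row of a row-major 2D grid stored in a flat buffer."""
    if len(src) != n_cols:
        raise ValueError("source length must equal the row length")
    start = row_index * n_cols           # element offset of the row's first cell
    flat[start:start + n_cols] = src     # same-length slice assignment, in place
    return flat

n_rows, n_cols = 3, 4
dest = array("d", [0.0] * (n_rows * n_cols))    # 3x4 grid of zeros, row-major
blit_row(dest, n_cols, 1, array("d", [1.0, 2.0, 3.0, 4.0]))

# View the flat buffer as rows again to check the result.
rows = [list(dest[r * n_cols:(r + 1) * n_cols]) for r in range(n_rows)]
```

Only the middle row changes, because the start offset is the row index times the row length, which is the same product used for the destination offset in the BlockCopy call (there additionally multiplied by the element size).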

Change array dimensions, using spreadsheet functions, when used inside SUMPRODUCT

I am interested in spreadsheet functions, not VBA solutions, to be included in a single cell formula.
[A1:A15 contain numeric values from 1 to 127, B1:B15 contain integers from 1 to 7 that set a divisor.]
Given the function:
=SUMPRODUCT(MOD(FREQUENCY(A1:A15;A1:A15);B1:B15))
FREQUENCY(A1:A15;A1:A15) gives a 1-column array of 15+1 rows, whereas the second part (B1:B15) is a 1-column array of 15 rows.
I would like to change the resulting array given by FREQUENCY (only in memory -not explicit in sheet-) from a 1-column 16 rows array to a 1-column 15 rows array with the first 15 cell values of that array.
[FREQUENCY documentation: https://support.office.com/en-us/article/FREQUENCY-function-44e3be2b-eca0-42cd-a3f7-fd9ea898fdb9 NB: for Excel, second remark states number of elements that depend on bins_array. ]
I would appreciate suggestions.
Thus, both arrays within MOD will have the same dimensions and SUMPRODUCT will not find cells with error values. I can disregard error values using IF and ISERROR within SUMPRODUCT, but I'd rather disregard the non-relevant part of the FREQUENCY resulting array if it is possible.
It has been thought that making it more specific might be more helpful, so it has been heavily reduced and simplified.
With external help, I have been able to fine-tune a way to solve my problem using INDEX in array formula mode. I am posting the answer in case it helps others.
One way: Put FREQUENCY(A1:A15;A1:A15), or any formula that produces a multi-cell array, within INDEX and pass arrays of consecutive values as the 2nd and/or 3rd arguments to represent the rows/columns to keep.
INDEX(FREQUENCY(A1:A15;A1:A15);ROW(INDIRECT("1:" & ROWS(FREQUENCY(A1:A15;A1:A15))-1));1)
The first argument of INDEX is the array returned by the formula to be shrunk (from 16x1 to 15x1), which would be a multi-cell array formula if entered explicitly. The second argument is the array 1..15, built from the row numbers 1 to the total number of rows of that array MINUS 1, which selects the first 15 (out of 16) values. The third argument is the column of the shrunk array (if need be, more than one column could be selected in the same way as for the second argument).
In the particular case of FREQUENCY, because it is known that we are interested in the "bins" part of the function, the formula can be simplified by including the total rows of the "bins"/"intervals" array inside FREQUENCY (its second argument). We will have
INDEX(FREQUENCY(A1:A15;A1:A15);ROW(INDIRECT("1:" & ROWS(A1:A15)));1)
and the complete formula would become
SUMPRODUCT(MOD(INDEX(FREQUENCY(A1:A15;A1:A15);ROW(INDIRECT("1:" & ROWS(A1:A15)));1);B1:B15))
Now, both dividend and divisor of MOD have exactly the same dimensions (15x1) and because B1:B15 includes integers greater than 0 there are no errors.
Thanks all for helping me in making question more concise and better formatted.
ADDITIONAL INFORMATION: As pointed out correctly in comments by XOR LX, this does not seem to work in the widely popular spreadsheet software Excel. It has been developed for an INDEX function inside SUMPRODUCT as used in Open Office Calc which I had mistakenly thought 100% equivalent to Excel's version. A more complete answer perhaps using other functions would be appreciated.
In the previous answer, XOR LX points out very correctly that this formula cannot work in Excel, due to row_num/column_num argument behaviour. Very kindly XOR LX has shown me how that approach can work, and also thanks and credit for supplying a good answer: "INDEX can be used to redimension array (even dynamically created ones) if the row_num/column_num array is coerced to take an arbitrary array with the right dimensions, as shown on this blog entry " The following formula has been checked in Excel 2010 and has the expected results:
SUMPRODUCT(MOD(INDEX(FREQUENCY(A1:A15,A1:A15),N(INDEX(ROW(INDIRECT("1:" & ROWS(A1:A15))),,)),1),B1:B15))
NB: the row_num argument of the first INDEX, an auxiliary array generated with ROW, has been nested inside N(INDEX([...],,)); at least one comma is necessary to account for the two-argument minimum of the nested INDEX. The general discussion of which arguments of INDEX, and of other functions, need to be coerced to take arrays is interesting in itself (see XOR LX's blog). For Open Office users it might be worth stressing the point made at the blog:
"Unlike OFFSET, (...) for which the first parameter must be a reference (...) in the worksheet, INDEX can also accept – and manipulate – for its reference arrays which consist of values generated e.g. via other subfunctions within the formula." (XOR LX's blog)
That is indeed the case when changing the dimensions of an array as in this question, but it is also useful for reversing or displacing the values in an array, for example. Open Office accepts arrays as row_num/column_num, so the coercion is not needed and some formulas rely on this, but without it, these formulas are unlikely to work when the files are opened in Excel.
Regrettably, this type of coercion is not passed correctly to Open Office, and formulas need to be "decoerced" to work, at least in my casual tests.
In order to use a formula that would work in both spreadsheet programs regarding shortening arrays, the only thing I have managed is the following (required: arrays must be single-column)
SUMPRODUCT(
(COLUMN(INDIRECT("R1C1:R"& ROWS(vals_to_mod) &"C"& ROWS(FREQUENCY(vals_for_freq,vals_for_freq)),FALSE))
-ROW(INDIRECT("R1C1:R"& ROWS(vals_to_mod) &"C"& ROWS(FREQUENCY(vals_for_freq,vals_for_freq)),FALSE))
=0)
*MOD(TRANSPOSE(FREQUENCY(vals_for_freq,vals_for_freq)),vals_to_mod)
)
(it "shortens" one array to the shortest of the pair, by creating an auxiliary array with TRUE/1s on the diagonal starting top-left and FALSE/0s elsewhere, therefore disregarding all defined values outside the square section of the array. Thus, SUMPRODUCT adds values within the diagonal of the square section which are the product of the corresponding values up to the last value of the shorter array.)
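To make the diagonal trick concrete, here is a small Python sketch with made-up numbers. The lists freq and divisors are hypothetical stand-ins for the FREQUENCY result and vals_to_mod, and the function names are invented for the example. It shows that summing only the cells where the column index equals the row index gives the same result as pairing the two arrays up to the length of the shorter one.

```python
def diagonal_masked_sum(freq, divisors):
    """Sum freq[c] % divisors[r] over cells where column index equals row index."""
    total = 0
    for r, d in enumerate(divisors):         # one row per divisor
        for c, f in enumerate(freq):         # one column per frequency value
            on_diagonal = (c - r == 0)       # plays the role of COLUMN(...)-ROW(...)=0
            total += on_diagonal * (f % d)   # off-diagonal cells contribute 0
    return total

def truncated_sum(freq, divisors):
    """Same result by pairing values up to the end of the shorter array."""
    return sum(f % d for f, d in zip(freq, divisors))

freq = [3, 0, 2, 5, 1, 0]    # hypothetical counts, one element longer
divisors = [2, 3, 4, 2, 5]   # hypothetical divisors, all greater than 0

masked = diagonal_masked_sum(freq, divisors)
paired = truncated_sum(freq, divisors)
```

The extra last element of freq never lands on the diagonal, so it is ignored, which is the "shortening" effect described above.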

Mean of a 4D array across selected dimensions

I am using the mean function in MATLAB on a 4D matrix.
The matrix is a 32x2x20x7 array and I wish to find the mean of each row, of all columns and elements of 3rd dimension, for each 4th dimension.
So basically mean(data(b,:,:,c)) [pseudo-code] for each b, c.
However, when I do this it spits me out separate means for each 3rd dimension, do you know how I can get it to give me one mean for the above equation - so it would be (32x7=)224 means.
You could do it without loops:
data = rand(32,2,20,7); %// example data
squeeze(mean(mean(data,3),2))
The key is to use a second argument to mean, which specifies across which dimension the mean is taken (in your case: dimensions 2 and 3). squeeze just removes singleton dimensions.
This should work:
a=rand(32,2,20,7);
for i=1:32
    for j=1:7
        c=a(i,:,:,j);   % 1x2x20 slice for row i and 4th-dimension index j
        mean(c(:))      % mean over all 40 values of the slice
    end
end
Note that with two calls to mean, there will be small numerical differences in the result depending on the order of operations. As such, I suggest doing this with one call to mean to avoid such concerns:
squeeze(mean(reshape(data,size(data,1),[],size(data,4)),2))
Or if you dislike squeeze (some people do!):
mean(permute(reshape(data,size(data,1),[],size(data,4)),[1 3 2]),3)
Both commands use reshape to combine the second and third dimensions of data, so that a single call to mean on the new larger second dimension will perform all of the required computations.
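For readers more used to NumPy, the same reshape idea can be sketched as follows. This is an analogue under the assumption that the data is a NumPy array of the same shape, not a translation of the MATLAB commands; the variable names are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((32, 2, 20, 7))    # example data, same shape as in the question

# Mean over the 2nd and 3rd dimensions in a single call (axes 1 and 2, 0-based).
means_direct = data.mean(axis=(1, 2))                    # shape (32, 7)

# Same result by merging the two adjacent middle dimensions first.
means_reshaped = data.reshape(32, -1, 7).mean(axis=1)    # shape (32, 7)
```

Either way the result holds one mean per combination of first and fourth index, i.e. 32x7 = 224 values.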

Initialize array without zeroes

I have a 3-dimensional array that represents an xy grid, and the z vector represents depth. I only know the depths of certain rows and am trying to interpolate the array. My question is how do I create a 720x400 array without setting all the values to 0 (as that could affect the interpolation).
Thanks!
You can use:
A = nan(m,n,...);
to initialize a matrix with NaN's, if that is what you ask for. Other popular choices are inf(m,n,...) to initialize with Inf's and ones(m,n,...) to initialize with 1's.
So, to create a 720x400 matrix full of NaN's you can just:
A = nan(720,400);
It is not necessary to initialize the empty rows to a special value. Instead, you can modify the interpolation procedure to assign a zero weight to these rows. Then, they will not affect the interpolation.
A simple way to do so in MATLAB would be to use the griddata function for the interpolation, passing it only the points whose depths are known.

Resources