The Goal
(Forgive me for the length of this, it's mostly background and detail.)
I'm contributing to a TOML encoder/decoder for MATLAB and I'm working with numerical arrays right now. I want to input (and then be able to write out) the numerical array in the same format. This format is the nested square-bracket format that is used by numpy.array. For example, to make multi-dimensional arrays in numpy:
The following is in python, just to be clear. It is a useful example though my work is in MATLAB.
1D and 2D arrays
>> x = np.array([1,2])
>> x
array([1, 2])
>> x = np.array([[1],[2]])
>> x
array([[1],
       [2]])
3D array
>> x = np.array([[[1,2],[3,4]],[[5,6],[7,8]]])
>> x
array([[[1, 2],
        [3, 4]],

       [[5, 6],
        [7, 8]]])
4D array
>> x = np.array([[[[1,2],[3,4]],[[5,6],[7,8]]],[[[9,10],[11,12]],[[13,14],[15,16]]]])
>> x
array([[[[ 1,  2],
         [ 3,  4]],

        [[ 5,  6],
         [ 7,  8]]],


       [[[ 9, 10],
         [11, 12]],

        [[13, 14],
         [15, 16]]]])
The input is a logical construction of the dimensions by nested brackets. Turns out this works pretty well with the TOML array structure. I can already successfully parse and decode any size/any dimension numeric array with this format from TOML to MATLAB numerical array data type.
Now, I want to encode that MATLAB numerical array back into this char/string structure to write back out to TOML (or whatever string).
So I have the following 4D array in MATLAB (same 4D array as with numpy):
>> x = permute(reshape([1:16],2,2,2,2),[2,1,3,4])
x(:,:,1,1) =
1 2
3 4
x(:,:,2,1) =
5 6
7 8
x(:,:,1,2) =
9 10
11 12
x(:,:,2,2) =
13 14
15 16
And I want to turn that into a string that has the same format as the 4D numpy input (with some function named bracketarray or something):
>> str = bracketarray(x)
str =
'[[[[1,2],[3,4]],[[5,6],[7,8]]],[[[9,10],[11,12]],[[13,14],[15,16]]]]'
I can then write out the string to a file.
EDIT: I should add, that the function numpy.array2string() basically does exactly what I want, though it adds some other whitespace characters. But I can't use that as part of the solution, though it is basically the functionality I'm looking for.
The Problem
Here's my problem. I have successfully solved this problem for up to 3 dimensions using the following function, but I cannot for the life of me figure out how to extend it to N-dimensions. I feel like it's an issue of the right kind of counting for each dimension, making sure to not skip any and to nest the brackets correctly.
Current bracketarray.m that works up to 3D
function out = bracketarray(in, internal)
    in_size = size(in);
    in_dims = ndims(in);
    % if array has only 2 dimensions, create the string
    if in_dims == 2
        storage = cell(in_size(1), 1);
        for jj = 1:in_size(1)
            storage{jj} = strcat('[', strjoin(split(num2str(in(jj, :)))', ','), ']');
        end
        if exist('internal', 'var') || in_size(1) > 1 || (in_size(1) == 1 && in_dims >= 3)
            out = {strcat('[', strjoin(storage, ','), ']')};
        else
            out = storage;
        end
        return
    % if array has more than 2 dimensions, recursively send planes of 2 dimensions for encoding
    else
        out = cell(in_size(end), 1);
        for ii = 1:in_size(end) %<--- this doesn't track dimensions or counts of them
            out(ii) = bracketarray(in(:,:,ii), 'internal'); %<--- this is limited to 3 dimensions atm. and out(indexing) need help
        end
    end
    % bracket the final bit together
    if in_size(1) > 1 || (in_size(1) == 1 && in_dims >= 3)
        out = {strcat('[', strjoin(out, ','), ']')};
    end
end
Help me Obi-wan Kenobis, y'all are my only hope!
EDIT 2: Added test suite below and modified current code a bit.
Test Suite
Here is a test suite to check whether the output is what it should be. Just copy and paste it into the MATLAB command window. For my current posted code, all tests return true except the ones with more than 3 dimensions. My current code outputs a cell. If your solution outputs something else (like a string), then you'll have to remove the curly brackets from the test suite.
isequal(bracketarray(ones(1,1)), {'[1]'})
isequal(bracketarray(ones(2,1)), {'[[1],[1]]'})
isequal(bracketarray(ones(1,2)), {'[1,1]'})
isequal(bracketarray(ones(2,2)), {'[[1,1],[1,1]]'})
isequal(bracketarray(ones(3,2)), {'[[1,1],[1,1],[1,1]]'})
isequal(bracketarray(ones(2,3)), {'[[1,1,1],[1,1,1]]'})
isequal(bracketarray(ones(1,1,2)), {'[[[1]],[[1]]]'})
isequal(bracketarray(ones(2,1,2)), {'[[[1],[1]],[[1],[1]]]'})
isequal(bracketarray(ones(1,2,2)), {'[[[1,1]],[[1,1]]]'})
isequal(bracketarray(ones(2,2,2)), {'[[[1,1],[1,1]],[[1,1],[1,1]]]'})
isequal(bracketarray(ones(1,1,1,2)), {'[[[[1]]],[[[1]]]]'})
isequal(bracketarray(ones(2,1,1,2)), {'[[[[1],[1]]],[[[1],[1]]]]'})
isequal(bracketarray(ones(1,2,1,2)), {'[[[[1,1]]],[[[1,1]]]]'})
isequal(bracketarray(ones(1,1,2,2)), {'[[[[1]],[[1]]],[[[1]],[[1]]]]'})
isequal(bracketarray(ones(2,1,2,2)), {'[[[[1],[1]],[[1],[1]]],[[[1],[1]],[[1],[1]]]]'})
isequal(bracketarray(ones(1,2,2,2)), {'[[[[1,1]],[[1,1]]],[[[1,1]],[[1,1]]]]'})
isequal(bracketarray(ones(2,2,2,2)), {'[[[[1,1],[1,1]],[[1,1],[1,1]]],[[[1,1],[1,1]],[[1,1],[1,1]]]]'})
isequal(bracketarray(permute(reshape([1:16],2,2,2,2),[2,1,3,4])), {'[[[[1,2],[3,4]],[[5,6],[7,8]]],[[[9,10],[11,12]],[[13,14],[15,16]]]]'})
isequal(bracketarray(ones(1,1,1,1,2)), {'[[[[[1]]]],[[[[1]]]]]'})
I think it would be easier to just loop and use join. Your test cases pass.
function out = bracketarray_matlabbit(in)
    out = permute(in, [2 1 3:ndims(in)]);
    out = string(out);
    dimsToCat = ndims(out);
    if iscolumn(out)
        dimsToCat = dimsToCat-1;
    end
    for i = 1:dimsToCat
        out = "[" + join(out, ",", i) + "]";
    end
end
This also seems to be faster than the route you were pursuing:
>> x = permute(reshape([1:16],2,2,2,2),[2,1,3,4]);
>> tic; for i = 1:1e4; bracketarray_matlabbit(x); end; toc
Elapsed time is 0.187955 seconds.
>> tic; for i = 1:1e4; bracketarray_cris_luengo(x); end; toc
Elapsed time is 5.859952 seconds.
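For anyone following along from the numpy side of the question, here is a minimal Python sketch of the same loop-and-join idea. It is not a translation of the MATLAB function: the name bracket_join is made up for illustration, and it assumes the data is given as a flat list in numpy (row-major) order with a non-empty shape, which is why no permute step is needed here.

```python
def bracket_join(flat_row_major, shape):
    """Bracket a flat list whose LAST axis varies fastest (numpy / C order).

    Hypothetical helper for illustration; assumes every extent is >= 1.
    """
    parts = [str(v) for v in flat_row_major]
    # Work from the innermost axis outwards: each pass joins consecutive
    # groups of `extent` strings and wraps them in one bracket level.
    for extent in reversed(shape):
        parts = ["[" + ",".join(parts[i:i + extent]) + "]"
                 for i in range(0, len(parts), extent)]
    return parts[0]


print(bracket_join(list(range(1, 17)), (2, 2, 2, 2)))
# [[[[1,2],[3,4]],[[5,6],[7,8]]],[[[9,10],[11,12]],[[13,14],[15,16]]]]
```

Each pass shrinks the list by a factor of the axis length, so after the last pass a single string is left.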
The recursive function is almost complete. What is missing is a way to index the last dimension. There are several ways to do this, the neatest, I find, is as follows:
n = ndims(x);
index = cell(n-1, 1);
index(:) = {':'};
y = x(index{:}, ii);
It's a little tricky at first, but this is what happens: index is a set of n-1 strings ':'. index{:} is a comma-separated list of these strings. When we index x(index{:},ii) we actually do x(:,:,:,ii) (if n is 4).
The completed recursive function is:
function out = bracketarray(in)
    n = ndims(in);
    if n == 2
        % Fill in your n==2 code here
    else
        % if array has more than 2 dimensions, recursively send planes of 2 dimensions for encoding
        index = cell(n-1, 1);
        index(:) = {':'};
        storage = cell(size(in, n), 1);
        for ii = 1:size(in, n)
            storage(ii) = bracketarray(in(index{:}, ii)); % last dimension automatically removed
        end
    end
    out = { strcat('[', strjoin(storage, ','), ']') };
end
Note that I have preallocated the storage cell array, to prevent it from being resized in every loop iteration. You should do the same in your 2D case code. Preallocating is important in MATLAB for performance reasons, and the MATLAB Editor should warn you about this too.
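To make the nesting order explicit, here is a rough Python sketch of the same recursion. It is only an illustration under assumptions: bracket_array is a made-up name, the data is passed as a flat list in MATLAB's column-major order together with the result of size(), and the special case for a 2D row vector (one bracket level instead of two) is inferred from the test suite in the question.

```python
def bracket_array(flat_col_major, shape):
    """Nested-bracket string for data stored in MATLAB (column-major) order.

    Hypothetical helper; `shape` plays the role of size(x) with >= 2 entries.
    """
    shape = tuple(shape)
    # Column-major strides: the first dimension varies fastest.
    strides, step = [], 1
    for extent in shape:
        strides.append(step)
        step *= extent
    # Outermost bracket = last dimension, ..., third dimension, rows, columns.
    order = list(range(len(shape) - 1, 1, -1)) + [0, 1]
    if len(shape) == 2 and shape[0] == 1:
        order = [1]  # 2D row vector: the test suite expects '[1,1]'

    def render(level, offset):
        if level == len(order):
            return str(flat_col_major[offset])
        dim = order[level]
        parts = [render(level + 1, offset + k * strides[dim])
                 for k in range(shape[dim])]
        return "[" + ",".join(parts) + "]"

    return render(0, 0)


# permute(reshape(1:16,2,2,2,2),[2,1,3,4]) laid out in column-major order
x = [1, 3, 2, 4, 5, 7, 6, 8, 9, 11, 10, 12, 13, 15, 14, 16]
print(bracket_array(x, (2, 2, 2, 2)))
# [[[[1,2],[3,4]],[[5,6],[7,8]]],[[[9,10],[11,12]],[[13,14],[15,16]]]]
```

The recursion peels off one dimension per level, which is the counting the question was struggling with: trailing dimensions first, then rows, then columns.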
I'm trying to manually sort the array below.
The issue is that the result differs depending on whether I read the items from the for-loop enumeration (marked // (2)) or read them by subscript (marked // (1)). It could be a minor issue that I'm overlooking. I appreciate your time.
var mySortArray : Array<Int> = []
mySortArray = [1,5,3,3,21,11,2]
for (itemX,X) in mySortArray.enumerated() {
    for (itemY,Y) in mySortArray.enumerated() {
        // if mySortArray[itemX] < mySortArray[itemY] // (1)
        if X < Y // (2)
        {
            //Swap the position of item in the array
            mySortArray.swapAt(itemX, itemY)
        }
    }
}
print(mySortArray)
// Prints [1, 2, 3, 3, 5, 11, 21] ( for condition // (1))
// Prints [2, 1, 3, 5, 11, 3, 21] ( for condition // (2))
mySortArray = [1,5,3,3,21,11,2]
print("Actual Sort Order : \(mySortArray.sorted())")
// Prints Actual Sort Order : [1, 2, 3, 3, 5, 11, 21]
The problem here is that .enumerated() returns a new sequence and iterates over that. Think of it as a new array.
So, you are working with 3 different arrays here.
You have an unsorted array that you want to fix. Let's call this w (the "working array"), and then you have your array x and your array y.
So, w is [1,5,3,3,21,11,2], and x and y both start out as copies of it (x is copied once when the outer loop starts, y is copied afresh each time the inner loop starts).
Now you get your first two values that need to swap...
valueX is at index 0 of x (1). valueY is at index 1 of y (5).
And you swap them... in w.
So now w is [5,1,3,3,21,11,2] but x and y are unchanged.
So now your indexes are being thrown off. You are comparing items in x with items in y and then swapping them in w, which is completely different.
You need to work with one array the whole time.
Of course... there is also the issue that your function is currently very slow. O(n^2) and there are much more efficient ways of sorting.
If you are doing this as an exercise in learning how to write sort algorithms then keep going. If not you should really be using the .sort() function.
Really what you want to be doing is not using .enumerated() at all. Just use ints to get (and swap) values in w.
i.e. something like
for indexX in 0..<w.count {
    for indexY in indexX..<w.count {
        // do some comparison stuff.
        // do some swapping stuff.
    }
}
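If it helps to see the two behaviours side by side, here is a small Python stand-in for the Swift code (a sketch, not Swift semantics in every detail). The function names are made up, and list(w) is used to mimic the copy that .enumerated() iterates over. The second function fills in the index-based loop above with one possible comparison and swap.

```python
def sort_with_snapshots(values):
    """Mimics condition (2): compare values taken from copies, swap in w."""
    w = list(values)
    for i, x in enumerate(list(w)):      # copy taken once, before the outer loop
        for j, y in enumerate(list(w)):  # fresh copy for every outer pass
            if x < y:                    # x and y may be stale by now
                w[i], w[j] = w[j], w[i]
    return w


def sort_by_index(values):
    """Index-only loop: always reads the current contents of w."""
    w = list(values)
    for i in range(len(w)):
        for j in range(i, len(w)):
            if w[j] < w[i]:
                w[i], w[j] = w[j], w[i]
    return w


data = [1, 5, 3, 3, 21, 11, 2]
print(sort_with_snapshots(data))  # [2, 1, 3, 5, 11, 3, 21]
print(sort_by_index(data))        # [1, 2, 3, 3, 5, 11, 21]
```

The first function reproduces the unsorted output from the question, because the comparison uses stale values while the swap acts on the live array.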
Ruby has lots of nice ways of iterating and directly returning the result. These mostly involve array methods. For example:
def ten_times_tables
  (1..5).map { |i| i * 10 }
end
ten_times_tables # => [10, 20, 30, 40, 50]
However, I sometimes want to iterate using while and directly return the resulting array. For example, the contents of the array may depend on the expected final value or some accumulator, or even on conditions outside of our control.
A (contrived) example might look like:
def fibonacci_up_to(max_number)
  sequence = [1, 1]
  while sequence.last < max_number
    sequence << sequence[-2..-1].reduce(:+)
  end
  sequence
end
fibonacci_up_to(5) # => [1, 1, 2, 3, 5]
To me, this sort of approach feels quite "un-Ruby". The fact that I construct, name, and later return an array feels like an anti-pattern. So far, the best I can come up with is using tap, but it still feels quite icky (and quite nested):
def fibonacci_up_to(max_number)
  [1, 1].tap do |sequence|
    while sequence.last < max_number
      sequence << sequence[-2..-1].reduce(:+)
    end
  end
end
Does anyone else have any cleverer solutions to this sort of problem?
Something you might want to look into for situations like this (though maybe your contrived example fits this a lot better than your actual use case) is creating an Enumerator, so your contrived example becomes:
From the docs for initialize:
fib = Enumerator.new do |y|
  a = b = 1
  loop do
    y << a
    a, b = b, a + b
  end
end
and then call it:
p fib.take_while { |elem| elem <= 5 }
#=> [1, 1, 2, 3, 5]
So, you create an enumerator which iterates over all your values, and once you have that, you can iterate through it and collect the values you want for your array in any of the usual Ruby-ish ways.
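For comparison only, the same "infinite source plus take_while" shape can be sketched in Python with a generator. This is an analogy, not Ruby advice; the function name fibonacci is chosen here for illustration.

```python
from itertools import takewhile


def fibonacci():
    """Endless generator, playing the role of the Enumerator above."""
    a = b = 1
    while True:
        yield a
        a, b = b, a + b


# The consumer decides where to stop, so no array is built and named by hand.
print(list(takewhile(lambda n: n <= 5, fibonacci())))  # [1, 1, 2, 3, 5]
```

The point is the same in both languages: the producer knows nothing about the stopping condition, and the caller picks whatever collection method suits it.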
Similar to Simple Lime's Enumerator solution, you can write a method that wraps itself in an Enumerator:
def fibonacci_up_to(max_number)
  return enum_for(__callee__, max_number) unless block_given?
  a = b = 1
  while a <= max_number
    yield a
    a, b = b, a + b
  end
end
fibonacci_up_to(5).to_a # => [1, 1, 2, 3, 5]
This achieves the same result as returning an Enumerator instance from a method, but it looks a bit nicer and you can use the yield keyword instead of a yielder block variable. It also lets you do neat things like:
fibonacci_up_to(5) do |i|
  # ..
end
I have an array in Matlab, and I have labeled every entry with a natural number. This defines an equivalence relation on the array: entries with the same label are in the same class.
For example,
array = [1 2 3 5 6 7]
classes = [1 2 1 1 3 3].
I want to get a cell array in which the i-th cell corresponds to the i-th entry of the initial array and lists all the elements that are in the same class as that entry. For the example above, I would get:
{[1 3 5], [2], [1 3 5], [1 3 5], [6 7], [6 7]}
It can be done easily with a for-loop, but is there any other solution? Ideally it would run faster than O(n^2), where n is the size of the initial array.
Edit.
The problem will be solved if I know how to split a sorted array into cells holding the indices of equal elements in O(n).
array = [1 1 1 2 3 3]
groups = {[1 2 3], [4], [5 6]}
Not sure about complexity, but accumarray with cell output is useful for splitting up the array based on unique values of the classes:
data = sortrows([classes; array].',1) %' stable w.r.t. array
arrayPieces = accumarray(data(:,1),data(:,2)',[],@(x){x.'})
classElements = arrayPieces(classes).'
Regarding splitting a sorted array into cells of indices:
>> array = [1 1 1 2 3 3]
>> arrayinds = accumarray(array',1:numel(array),[],@(x){x'})' %' transpose for rows
arrayinds =
[1x3 double] [4] [1x2 double]
>> arrayinds{:}
ans =
1 2 3
ans =
4
ans =
5 6
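As a language-neutral illustration of what the accumarray call is doing (bucket the values by class, then look the bucket up for every entry), here is a short Python sketch. The name class_members is made up, and a dictionary stands in for accumarray's indexed output. It makes one pass to build the buckets and one pass to look them up.

```python
from collections import defaultdict


def class_members(array, classes):
    """For each entry, list all values that share its class label."""
    buckets = defaultdict(list)
    for value, label in zip(array, classes):
        buckets[label].append(value)            # one bucket per class
    return [buckets[label] for label in classes]  # expand back per entry


print(class_members([1, 2, 3, 5, 6, 7], [1, 2, 1, 1, 3, 3]))
# [[1, 3, 5], [2], [1, 3, 5], [1, 3, 5], [6, 7], [6, 7]]
```

Note that entries in the same class share the same list object in this sketch, so the expanded result does not copy the buckets.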
I don't know how to do this without for-loops entirely, but you can use a combination of sort, diff, and find to organize and partition the equivalence class identifiers. That'll give you a mostly vectorized solution, where the M-code level for-loop is O(n) where n is the number of classes, not the length of the whole input array. This should be pretty fast in practice.
Here's a rough example using some index munging. Be careful; there's probably an off-by-one edge case bug in there somewhere since I just banged this out.
function [eqVals,eqIx] = equivsets(a,x)
%EQUIVSETS Find indexes of equivalent values
[b,ix] = sort(x);
ixEdges = find(diff(b)); % identifies partitions between equiv classes
ix2 = [0 ixEdges numel(ix)];
eqVals = cell([1 numel(ix2)-1]);
eqIx = cell([1 numel(ix2)-1]);
% Map back to original input indexes and values
for i = 1:numel(ix2)-1
    eqIx{i} = ix((ix2(i)+1):ix2(i+1));
    eqVals{i} = a(eqIx{i});
end
I included the indexes in the output because they're often more useful than the values themselves. You'd call it like this.
% Get the values in each class (the second output, eqIx, holds their indexes)
equivs = equivsets(array, classes)
% You can expand that to get equivalences for each input element
equivsByValue = equivs(classes)
It's a lot more efficient to build the lists for each class first and then expand them out to match the input indexes. Not only do you have to do the work just once, but when you use the b = a(ix) to expand a small cell array to a larger one, Matlab's copy-on-write optimization will end up reusing the memory for the underlying numeric mxArrays so you get a more compact representation in memory.
This transformation pops up a lot when working with unique() or databases. For decision support systems and data warehouse style things I've worked with, it happens all over the place. I wish it were built in to Matlab. (And maybe it's been added to one of the db or timeseries toolboxes in recent years; I'm a few versions behind.)
Realistically, if performance of this is critical for your code, you might also look at dropping down to Java or C MEX functions and implementing it there. But if your data sets are low cardinality - that is, have a small number of classes/distinct values, like numel(unique(classes)) / numel(array) tends to be less than 0.1 or so - the M-code implementation will probably be just fine.
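The sort-then-partition idea can also be sketched in Python, in case the index juggling is easier to follow there. This is an assumption-laden analogue rather than a port: equiv_sets is a made-up name, indexes are 0-based (MATLAB's are 1-based), and itertools.groupby plays the role of the diff/find step that locates the boundaries between classes.

```python
from itertools import groupby


def equiv_sets(values, labels):
    """Return (values per class, indexes per class), classes in sorted order."""
    # Stable sort of the positions by label, like [b, ix] = sort(x).
    order = sorted(range(len(labels)), key=lambda i: labels[i])
    eq_vals, eq_ix = [], []
    # groupby splits the sorted positions wherever the label changes.
    for _, run in groupby(order, key=lambda i: labels[i]):
        ix = list(run)
        eq_ix.append(ix)
        eq_vals.append([values[i] for i in ix])
    return eq_vals, eq_ix


vals, idx = equiv_sets([1, 2, 3, 5, 6, 7], [1, 2, 1, 1, 3, 3])
print(vals)  # [[1, 3, 5], [2], [6, 7]]
print(idx)   # [[0, 2, 3], [1], [4, 5]]
```

The cost is dominated by the sort, and the loop body runs once per class rather than once per element.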
For the second question:
array = [1 1 1 2 3 3]; %// example data
Use diff to find the end of each run of equal values, and from that build the groups:
ind = [0 find(diff([array NaN])~=0)];
groups = arrayfun(@(n) ind(n)+1:ind(n+1), 1:numel(ind)-1, 'uni', 0);
Same approach using unique:
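[~, ind] = unique(array, 'last'); %// index of the last occurrence of each value
ind = [0 ind(:).'];               %// force a row vector before prepending 0
groups = arrayfun(@(n) ind(n)+1:ind(n+1), 1:numel(ind)-1, 'uni', 0);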
I haven't tested if the complexity is O(n), though.
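For what it's worth, the run-boundary idea itself is a single pass over the data. Here is a minimal Python sketch of that pass, with a made-up function name and 1-based indexes to match the MATLAB output. It says nothing about how diff, find and arrayfun perform internally.

```python
def run_index_groups(sorted_values):
    """Split a sorted list into groups of (1-based) indexes of equal values."""
    groups = []
    start = 0
    n = len(sorted_values)
    for i in range(1, n + 1):
        # A run ends at the end of the data or where the value changes.
        if i == n or sorted_values[i] != sorted_values[i - 1]:
            groups.append(list(range(start + 1, i + 1)))
            start = i
    return groups


print(run_index_groups([1, 1, 1, 2, 3, 3]))  # [[1, 2, 3], [4], [5, 6]]
```

Every element is compared with its predecessor exactly once, so the work is linear in the length of the input.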