Sorting a cell array?

I have a cell array of numbers, but most of its cells are empty. For example:
x =
[] [6] [] [4] [] [] [] [1]
I have a matching array y
y = [1, 3,1,5,7,3,1,5]
I want to get the indices of the numbers in the cell array x and use them to get the corresponding values from y, so x(2) matches y(2). I convert x to an array using
x = cell2mat(x);
But the problem is that it returns
x = [6,4,1]
This loses the original indices, so I cannot sort x and then reorder y accordingly so that the same indices match up. I tried to use sort, but that does not work for cell arrays.

Just use y(x) on the converted numeric x; that will return the elements at indices 6, 4, and 1 of the y vector.
Note that the order of the returned matrix will depend on the order of the indices in x; if you want to sort x, do it before running y(x).
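If the goal is instead to keep track of where the non-empty cells sit, so that x and y can be sorted together, the idea can be sketched in Python. This is only an illustrative analogue, not MATLAB code: None stands in for an empty cell, indices are 0-based, and the names idx, order, x_sorted and y_sorted are made up for the example.

```python
# None marks an empty cell; positions are 0-based here, unlike MATLAB.
x = [None, 6, None, 4, None, None, None, 1]
y = [1, 3, 1, 5, 7, 3, 1, 5]

# Positions of the non-empty entries of x.
idx = [i for i, v in enumerate(x) if v is not None]

# Order those positions by the value stored in x.
order = sorted(idx, key=lambda i: x[i])

# Apply the same ordering to both x and y so the pairs stay matched.
x_sorted = [x[i] for i in order]
y_sorted = [y[i] for i in order]

print(idx)       # [1, 3, 7]
print(x_sorted)  # [1, 4, 6]
print(y_sorted)  # [5, 5, 3]
```

In MATLAB itself, the positions of the non-empty cells could be obtained with something like find(~cellfun(@isempty, x)) before calling cell2mat.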

Related

how to compare two 1d arrays and output the indices where array 1 has the same scalar value as array 2

I'm trying to figure out how to take two 1d arrays and pass them to a function so that every number in array 1 is compared to every number in array 2, and the element numbers of the duplicates found are displayed with respect to array 1
for example
array 1 = [12,16,36,72,82]
array 2= [16,53,72,12,40,71]
and the output would be
elements= 1 2 4
I'm new to MATLAB, so I don't currently have the skills to make this work. I'm trying to figure it out, but I don't know exactly what to do.
It won't let me post the code because it doesn't make sense.
I'm not sure how to post it on here otherwise.
[Edit2]
Best approach
The unreliable approach below works for the example given by the OP, with the restriction discussed below. The best approach is using MATLAB's ismember and find functions:
array1 = [12,16,36,72,82];
array2 = [16,53,72,12,40,71];
idc = find( ismember(array1, array2 ) )
ismember(array1, array2) returns a logical array indicating which elements of array1 are contained in array2.
find(...) converts the logical array to indices.
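The same membership-then-indices idea can be sketched in Python for comparison. This is an analogue rather than a translation of ismember: the set named lookup is an illustrative choice, and 1 is added to each position so the result matches MATLAB's 1-based indices.

```python
array1 = [12, 16, 36, 72, 82]
array2 = [16, 53, 72, 12, 40, 71]

# A set gives fast membership tests, playing the role of ismember.
lookup = set(array2)

# Collect the 1-based positions in array1 whose value occurs in array2,
# which is what find(...) does with the logical array.
idc = [i + 1 for i, v in enumerate(array1) if v in lookup]

print(idc)  # [1, 2, 4]
```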
[Edit1]
Alternative approach
An approach without ismember would be:
array1 = [12,16,36,72,82];
array2 = [16,53,72,12,40,71];
findFcn = @(X) find( array1(:) == X )';
idcs = arrayfun(findFcn, array2(:), 'UniformOutput', false );
idcs = unique([idcs{:}])
Explanation:
findFcn(X) gives the index of the value X in array1.
findFcn(X) is called by arrayfun with X being equal to each element of array2, and arrayfun stores the return values of the calls in the cell array idcs.
Finally, idcs = unique([idcs{:}]); converts the cell array idcs to a double array and removes repetitions.
[First answer]
Unreliable approach
You can use the intersect function:
array1 = [12,16,36,72,82];
array2 = [16,53,72,12,40,71];
[ ~, x1 ] = intersect( array1, array2 );
~ means that the first return value, which would be the intersection of array1 and array2, is discarded. x1 is then equal to the indices that the intersection's values have in array1.
If you also want to have the indices in array2 you could do
[ ~, x1, x2 ] = intersect( array1, array2 );
to store them in x2.
Note:
x1 and x2 only contain the indices of the first occurrence in array1 and array2, respectively.
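To see why the first-occurrence restriction matters, here is a small Python sketch with made-up data in which 12 appears twice in array1. It only emulates the behaviour described in the note (one index per shared value, taken from its first occurrence and ordered by value); it does not call or reproduce MATLAB's intersect.

```python
array1 = [12, 16, 12, 72]  # made-up data: 12 appears twice
array2 = [16, 72, 12]

lookup = set(array2)

# Membership test: every matching position in array1 (1-based).
all_matches = [i + 1 for i, v in enumerate(array1) if v in lookup]

# First occurrence only: one index per distinct shared value.
first_seen = {}
for i, v in enumerate(array1):
    if v in lookup and v not in first_seen:
        first_seen[v] = i + 1
first_only = [first_seen[v] for v in sorted(first_seen)]

print(all_matches)  # [1, 2, 3, 4]
print(first_only)   # [1, 2, 4]
```

The second occurrence of 12 (position 3) is missing from the first-occurrence result, which is why that approach is only safe when array1 has no repeated values.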

Argmax of a multidimensional array along a subset of dimensions in Matlab

Say, Y is a 7-dimensional array, and I need an efficient way to maximize it along the last 3 dimensions, that will work on GPU.
As a result I need a 4-dimensional array with maximal values of Y and three 4-dimensional arrays with the indices of these values in the last three dimensions.
I can do
[Y7, X7] = max(Y , [], 7);
[Y6, X6] = max(Y7, [], 6);
[Y5, X5] = max(Y6, [], 5);
Then I have already found the values (Y5) and the indices along the 5th dimension (X5). But I still need indices along the 6th and 7th dimensions.
Here's a way to do it. Let N denote the number of dimensions along which to maximize.
Reshape Y to collapse the last N dimensions into one.
Maximize along the collapsed dimension. This gives argmax as a linear index over the original N dimensions.
Unroll the linear index into N subindices, one for each dimension.
The following code works for any number of dimensions (not necessarily 7 and 3 as in your example). To achieve that, it handles the size of Y generically and uses a comma-separated list obtained from a cell array to get N outputs from ind2sub.
Y = rand(2,3,2,3,2,3,2); % example 7-dimensional array
N = 3; % last dimensions along which to maximize
D = ndims(Y);
sz = size(Y);
[~, ind] = max(reshape(Y, [sz(1:D-N) prod(sz(D-N+1:end))]), [], D-N+1);
sub = cell(1,N);
[sub{:}] = ind2sub(sz(D-N+1:D), ind);
As a check, after running the above code, observe for example Y(2,3,1,2,:) (shown as a row vector for convenience):
>> reshape(Y(2,3,1,2,:), 1, [])
ans =
0.5621 0.4352 0.3672 0.9011 0.0332 0.5044 0.3416 0.6996 0.0610 0.2638 0.5586 0.3766
The maximum is seen to be 0.9011, which occurs at the 4th position (where "position" is defined along the N=3 collapsed dimensions). In fact,
>> ind(2,3,1,2)
ans =
4
>> Y(2,3,1,2,ind(2,3,1,2))
ans =
0.9011
or, in terms of the N=3 subindices,
>> Y(2,3,1,2,sub{1}(2,3,1,2),sub{2}(2,3,1,2),sub{3}(2,3,1,2))
ans =
0.9011
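The unrolling step can be sketched in Python as repeated division in column-major order. This is an illustrative reimplementation, not MATLAB's ind2sub itself: the function name is reused only for readability, it handles a single scalar index, and indices are 1-based to match the text.

```python
def ind2sub(dims, ind):
    """Convert a 1-based column-major linear index into 1-based subindices."""
    ind -= 1  # work with a 0-based index internally
    subs = []
    for d in dims:
        subs.append(ind % d + 1)  # subindex along this dimension
        ind //= d                 # move on to the next, slower dimension
    return tuple(subs)

# Trailing sizes (2, 3, 2) as in the example array; position 4 was the argmax.
print(ind2sub((2, 3, 2), 4))  # (2, 2, 1)
```

With trailing sizes (2, 3, 2), linear position 4 corresponds to subindices (2, 2, 1) along the three collapsed dimensions.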

Python: append kmean.labels_ to Numpy array

The shapes of the two NumPy arrays are:
(406, 278)
(406,)
However, an error occurred while appending the NumPy arrays:
ValueError: all the input arrays must have same number of dimensions
code:
y = numpy.array(kmeans.labels_,copy=True)
x = numpy.append(x, y, axis=1); #error
x = numpy.append(x, y, axis=0); #error
As the error says, you are trying to append a 1d array to a 2d array with an axis parameter, and according to the docs:
When axis is specified, values must have the correct shape.
You need to reshape y to a 2d array first. Since x has shape (406, 278) and y has shape (406,), y has to become a (406, 1) column that is appended along axis=1.
Either of these two methods should work:
np.append(x, y[:, None], axis=1)
np.append(x, y.reshape(-1, 1), axis=1)
According to the NumPy documentation,
When axis is specified, values must have the correct shape.
So if you want to append the vector y = [0, 1, 2] to the matrix x = [[0, 0], [1, 1], [2, 2]] with axis=1, you first need to turn y into matrix form and then transpose it:
x = numpy.zeros((406,278))
y = numpy.zeros((406,))
x = numpy.append(x, numpy.transpose([y]), axis=1);
print(x.shape) # gives (406,279)

Cost efficient algorithm to group array of sets

Can anyone suggest an efficient algorithm for the following task?
I have a file of unique row numbers, with an array of integers per row.
I need to check each row for values that also show up in other rows and put those rows in one group. Here is an example of how it may look:
Row Number; Array of data[...]
L1; [1,2,3,4,5]
L2; [2,3]
L3; [8,9]
L4; [6]
L5; [7]
L6; [5,6]
Based on these input data, I expect the algorithm to produce the result:
Group N; Array of rows [...]
G1; [L1,L2,L4,L6]
G2; [L3]
G3; [L5]
P.S. The original dataset has hundreds of millions of rows and can contain close to a million array elements, so time efficiency is a concern.
Thanks
I believe this is equivalent to finding connected components of a graph in which:
The vertices correspond to the initial row numbers
There is an edge between two vertices x and y if there is a common element in the array for x and the array for y
This can be done efficiently using a disjoint set data structure as follows:
MakeSet(d) for each of the data values d (1,2,3,4,5,6,7,8,9 in your example)
For each row with array A, call Union(A[0], A[i]) for each index i of A.
This will produce a set for each connected component. You can then produce your output array by iterating over the rows a second time:
set output to an array of empty lists
for each row r
    A = array for row r
    id = find(A[0])
    output[id].append(r)
Example Python Code
from collections import defaultdict
data = [[1, 2, 3, 4, 5],
        [2, 3],
        [8, 9],
        [6],
        [7],
        [5, 6]]
N = max(max(A) for A in data)  # largest data value
rank = [0] * (N + 1)
parent = list(range(N + 1))  # every value starts as its own representative
def Find(x):
    """Find representative of connected component"""
    if parent[x] != x:
        parent[x] = Find(parent[x])  # path compression
    return parent[x]
def Union(x, y):
    """Merge sets containing elements x and y"""
    x = Find(x)
    y = Find(y)
    if x == y:
        return
    if rank[x] < rank[y]:
        parent[x] = y
    elif rank[x] > rank[y]:
        parent[y] = x
    else:
        parent[y] = x
        rank[x] += 1
# First join all data
for row, A in enumerate(data):
    for x in A:
        Union(A[0], x)
# Then place rows into sets
D = defaultdict(list)
for row, A in enumerate(data):
    D[Find(A[0])].append(row + 1)
# Then display output
for i, L in enumerate(D.values()):
    print(i + 1, L)
Running this code under Python 3.7 or later (where dictionaries preserve insertion order) prints the output:
1 [1, 2, 4, 6]
2 [3]
3 [5]

How do concatenation and indexing differ for cells and arrays in MATLAB?

I am a little confused about the usage of cells and arrays in MATLAB and would like some clarification on a few points. Here are my observations:
An array can dynamically adjust its own memory to allow for a dynamic number of elements, while cells seem to not act in the same way:
a=[]; a=[a 1]; b={}; b={b 1};
Several elements can be retrieved from cells, but it doesn't seem like they can be from arrays:
a={'1' '2'}; figure; plot(...); hold on; plot(...); legend(a{1:2});
b=['1' '2']; figure; plot(...); hold on; plot(...); legend(b(1:2));
%# b(1:2) is an array, not its elements, so it is wrong with legend.
Are these correct? What are some other different usages between cells and array?
Cell arrays can be a little tricky since you can use the [], (), and {} syntaxes in various ways for creating, concatenating, and indexing them, although they each do different things. Addressing your two points:
To grow a cell array, you can use one of the following syntaxes:
b = [b {1}]; % Make a cell with 1 in it, and append it to the existing
% cell array b using []
b = {b{:} 1}; % Get the contents of the cell array as a comma-separated
% list, then regroup them into a cell array along with a
% new value 1
b{end+1} = 1; % Append a new cell to the end of b using {}
b(end+1) = {1}; % Append a new cell to the end of b using ()
When you index a cell array with (), it returns a subset of cells in a cell array. When you index a cell array with {}, it returns a comma-separated list of the cell contents. For example:
b = {1 2 3 4 5}; % A 1-by-5 cell array
c = b(2:4); % A 1-by-3 cell array, equivalent to {2 3 4}
d = [b{2:4}]; % A 1-by-3 numeric array, equivalent to [2 3 4]
For d, the {} syntax extracts the contents of cells 2, 3, and 4 as a comma-separated list, then uses [] to collect these values into a numeric array. Therefore, b{2:4} is equivalent to writing b{2}, b{3}, b{4}, or 2, 3, 4.
With respect to your call to legend, the syntax legend(a{1:2}) is equivalent to legend(a{1}, a{2}), or legend('1', '2'). Thus two arguments (two separate characters) are passed to legend. The syntax legend(b(1:2)) passes a single argument, which is a 1-by-2 string '12'.
Every cell array is an array! From this answer:
[] is an array-related operator. An array can be of any type - array of numbers, char array (string), struct array or cell array. All elements in an array must be of the same type!
Example: [1,2,3,4]
{} is a type. Imagine you want to put items of different type into an array - a number and a string. This is possible with a trick - first put each item into a container {} and then make an array with these containers - cell array.
Example: [{1},{'Hallo'}] with shorthand notation {1, 'Hallo'}
