numpy using multidimensional index array on another multidimensional array - arrays

I have two multidimensional arrays, and I'd like to use one as the index to produce a new multidimensional array. For example:
a = array([[4, 3, 2, 5],
           [7, 8, 6, 8],
           [3, 1, 5, 6]])
b = array([[0,2],[1,1],[3,1]])
I want to use the first array in b to return those indexed elements in the first array of a, and so on. So I want the output to be:
array([[4,2],[8,8],[6,1]])
This is probably simple but I couldn't find an answer by searching. Thanks.

This is a little tricky, but the following will do it:
>>> a[np.arange(3)[:, np.newaxis], b]
array([[4, 2],
       [8, 8],
       [6, 1]])
You need to index both the rows and the columns of the a array, so to match your b array you would need an array like this:
rows = np.array([[0, 0],
                 [1, 1],
                 [2, 2]])
And then a[rows, b] would clearly return what you are after. You can get the same result relying on broadcasting as above, replacing the rows array with np.arange(3)[:, np.newaxis], which is equivalent to np.arange(3).reshape(3, 1).
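As a minimal sketch, the two forms above can be checked against each other using the arrays from the question. The last line shows np.take_along_axis as an alternative; it is not part of the original answer and assumes NumPy 1.15 or newer.

```python
import numpy as np

a = np.array([[4, 3, 2, 5],
              [7, 8, 6, 8],
              [3, 1, 5, 6]])
b = np.array([[0, 2], [1, 1], [3, 1]])

# Explicit row indices with the same shape as b.
rows = np.array([[0, 0], [1, 1], [2, 2]])
explicit = a[rows, b]

# Column vector of row indices, shape (3, 1), broadcast against b's shape (3, 2).
broadcasted = a[np.arange(a.shape[0])[:, np.newaxis], b]

# Alternative (assumption: NumPy >= 1.15): pick entries along axis 1 using b.
along_axis = np.take_along_axis(a, b, axis=1)

print(explicit)
```

Using a.shape[0] instead of the literal 3 keeps the expression valid for any number of rows.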

Related

Select rows by minimum values of a column considering unique values of another column (numpy array)

I want to select only the rows for each unique value of a column (first column) that have a minimum value in another column (second column).
How can I do it?
Let's say I have this array:
[[10, 1], [10, 5], [10, 2], [20, 4], [20, 1], [20, 7], [20, 2], [40, 7], [40, 4], [40, 5]]
I would like to obtain the following array:
[[10, 1], [20, 1], [40, 4]]
I was trying selecting rows in this way:
d = {i: array[array[:, 0] == i] for i in np.unique(array[:, 0])}
but then I don't know how to detect the one with the minimum value in the second column.
What you want is the idea of groupby, as implemented in pandas for instance. As we don't have that in numpy, let's implement something similar to this other answer.
Let's call your input array A. So first, sort the rows by the values in the first column. We do this so that all entries with the same value appear one after the other.
sor = A[A[:,0].argsort()]
And get the indices where new unique values are found.
uniq=np.unique(sor[:,0],return_index=True)[1]
print(uniq)
>>> array([0, 3, 7])
This indicates the places of the array where we need to cut to get groups. Now split the second column into such groups. That way you get chunks of elements of the second column, grouped by the elements on the first column.
grp=np.split(sor[:,1],uniq[1:])
print(grp)
>>> [array([1, 5, 2]), array([4, 1, 7, 2]), array([7, 4, 5])]
Last step is to get the index of the minimum value out of each of these groups
ind=np.array(list(map(np.argmin,grp))) + uniq
print(ind)
>>> array([0, 4, 8])
The first part maps the np.argmin function to every group in grp. The + uniq part is there for mapping every one of these minimum arguments into the original scale.
Now you only need to index your sorted array using these indices.
print(sor[ind])
>>> array([[10, 1],
           [20, 1],
           [40, 4]])
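For reference, here is a sketch that collects the steps above into one function, followed by a shorter variant based on np.lexsort. The function names are hypothetical, and the lexsort variant is an alternative technique, not the method described in this answer.

```python
import numpy as np

def group_min_split(A):
    """Sort by column 0, split column 1 into groups, take each group's argmin."""
    sor = A[A[:, 0].argsort()]
    uniq = np.unique(sor[:, 0], return_index=True)[1]   # start index of each group
    grp = np.split(sor[:, 1], uniq[1:])                 # one chunk per unique key
    ind = np.array([g.argmin() for g in grp]) + uniq    # back to positions in sor
    return sor[ind]

def group_min_lexsort(A):
    """Alternative: sort by (column 0, column 1); each group's first row is its minimum."""
    # np.lexsort uses the LAST key as the primary sort key.
    sor = A[np.lexsort((A[:, 1], A[:, 0]))]
    first = np.unique(sor[:, 0], return_index=True)[1]
    return sor[first]

A = np.array([[10, 1], [10, 5], [10, 2], [20, 4], [20, 1],
              [20, 7], [20, 2], [40, 7], [40, 4], [40, 5]])
result_split = group_min_split(A)
result_lexsort = group_min_lexsort(A)
print(result_split)
```

Both versions return one row per unique value of the first column. If a group has several rows tied for the minimum, which of them is returned is not specified here.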

Find all array with second highest elements in a list

Assuming that I have a list of arrays in Python 3.2, I want to output every array whose second element is the highest among all second elements, together with its index position in the list. How can I achieve this in the most scalable way (i.e., without a nested for-loop)?
Input
a = [[2,3], [1,4,5], [1,4,6,2], [3,3,5], [9,4]]
Expected Output
res = [[[1,4,5], 1], [[1, 4, 6,2], 2], [[9,4], 4]]
Can someone please help assist me on how to do this without using nested for-loop?
You could do:
b = max(a, key=lambda x:x[1])[1]
[[j, i] for i, j in enumerate(a) if j[1]==b]
Out[6]: [[[1, 4, 5], 1], [[1, 4, 6, 2], 2], [[9, 4], 4]]
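The same idea with the two passes spelled out, as a sketch. The names best and res are only illustrative, and the code assumes every sublist has at least two elements (otherwise sub[1] raises IndexError).

```python
a = [[2, 3], [1, 4, 5], [1, 4, 6, 2], [3, 3, 5], [9, 4]]

# First pass: largest second element across all sublists.
best = max(sub[1] for sub in a)

# Second pass: keep each sublist whose second element equals that maximum,
# paired with its position in the original list.
res = [[sub, i] for i, sub in enumerate(a) if sub[1] == best]

print(res)
```

Each pass visits every sublist once, so the work grows linearly with the length of a and no nested loop is needed.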

get array index from sort in Ruby

I have an array
array_a1 = [9,43,3,6,7,0]
which I'm trying to get the sort indices out of, i.e. the answer should be
array_ordered = [6, 3, 4, 5, 1, 2]
I want to do this as a function, so that
def order (array)
will return array_ordered
I have tried implementing advice from Find the index by current sort order of an array in ruby but I don't see how I can do what they did for an array :(
if there are identical values in the array, e.g.
array_a1 = [9,43,3,6,7,7]
then the result should look like:
array_ordered = [3, 4, 5, 6, 1, 2]
(all indices should be 0-based, but these are 1-based)
You can do it this way:
[9,43,3,6,7,0].
each_with_index.to_a. # [[9, 0], [43, 1], [3, 2], [6, 3], [7, 4], [0, 5]]
sort_by(&:first). # [[0, 5], [3, 2], [6, 3], [7, 4], [9, 0], [43, 1]]
map(&:last)
#=> [5, 2, 3, 4, 0, 1]
First you add index to each element, then you sort by the element and finally you pick just indices.
Note that arrays are zero-indexed in Ruby, so each result is one less than in your spec.
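You should be able to just map over the sorted array and look up the index of each number in the original array. Note that arr.index(n) returns the first matching index, so this only works when the values are distinct.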
arr = [9,43,3,6,7,0]
arr.sort.map { |n| arr.index(n) } #=> [5, 2, 3, 4, 0, 1]
Or if you really want it 1 indexed, instead of zero indexed, for some reason:
arr.sort.map { |n| arr.index(n) + 1 } #=> [6, 3, 4, 5, 1, 2]
array_a1 = [9,43,3,6,7,0]
array_a1.each_index.sort_by { |i| array_a1[i] }
#=> [5, 2, 3, 4, 0, 1]
If array_a1 may contain duplicates and ties are to be broken by the indices of the elements (the element with the smaller index first), you may modify the calculation as follows.
array_a1 = [9,43,3,6,7,7]
array_a1.each_index.sort_by { |i| [array_a1[i], i] }
#=> [2, 3, 4, 5, 0, 1]
Enumerable#sort_by compares two elements with the spaceship operator, <=>. Here, as pairs of arrays are being compared, it is the method Array#<=> that is used. See especially the third paragraph of that doc.

Is there any functional difference between array_upper and array_length?

The postgres docs on these 2 array functions are pretty weak.
I've tried both functions a few different ways and they seem to return the same results.
SELECT array_length(array[[1, 2], [3, 4], [5, 6]], 1);
SELECT array_upper(array[[1, 2], [3, 4], [5, 6]], 1);
SELECT array_length(array[[1, 2], [3, 4], [5, 6]], 2);
SELECT array_upper(array[[1, 2], [3, 4], [5, 6]], 2);
Yes, there is a difference. PostgreSQL array subscripts start at one by default but they don't have to:
By default PostgreSQL uses a one-based numbering convention for arrays, that is, an array of n elements starts with array[1] and ends with array[n].
[...]
Subscripted assignment allows creation of arrays that do not use one-based subscripts. For example one might assign to myarray[-2:7] to create an array with subscript values from -2 to 7.
[...]
By default, the lower bound index value of an array's dimensions is set to one. To represent arrays with other lower bounds, the array subscript ranges can be specified explicitly before writing the array contents.
In general, you need to use array_lower and array_upper instead of assuming that the array will start at 1 and end at array_length(a, n).

How to extract lines in an array, which contain a certain value? (numpy, scipy)

I have a NumPy 2D array and I want it to return column c for the rows where the entry at (r, c-1) (row r, column c-1) equals a certain value (int n).
I don't want to iterate over the rows writing something like
result = []
for r in range(len(array)):
    if array[r, c-1] == 1:
        result.append(array[r, c])
because there are 4000 of them and this 2D array is just one of 20 I have to look through.
I found "filter" but don't know how to use it (found no docs).
Is there a function that provides such a search?
I hope I understood your question correctly. Let's say you have an array a
a = array(list(range(7)) * 3).reshape(7, 3)
print(a)
array([[0, 1, 2],
       [3, 4, 5],
       [6, 0, 1],
       [2, 3, 4],
       [5, 6, 0],
       [1, 2, 3],
       [4, 5, 6]])
and you want to extract all lines where the first entry is 2. This can be done like this:
print(a[a[:, 0] == 2])
array([[2, 3, 4]])
a[:,0] denotes the first column of the array, == 2 returns a Boolean array marking the entries that match, and then we use advanced indexing to extract the respective rows.
Of course, NumPy needs to iterate over all entries, but this will be much faster than doing it in Python.
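Applied to the question as asked (return column c where column c-1 equals n), a sketch could look like the following. The values chosen for n and c are hypothetical, picked only to match the example array.

```python
import numpy as np

a = np.array(list(range(7)) * 3).reshape(7, 3)

n = 2   # value to search for (hypothetical)
c = 1   # column to return; the test is applied to column c - 1

# Boolean mask: True for rows whose entry in column c - 1 equals n.
mask = a[:, c - 1] == n

values = a[mask, c]              # entries of column c in the matching rows
row_idx = np.flatnonzero(mask)   # row numbers of the matches

print(values, row_idx)
```

The same mask can be reused to pull several columns at once, for example a[mask][:, [c, c + 1]].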
NumPy arrays are not indexed. If you need to perform this specific operation more efficiently than linear in the array size, then you need to use something other than NumPy.

Resources