I want to select only the rows for each unique value of a column (first column) that have a minimum value in another column (second column).
How can I do it?
Let's say I have this array:
[[10, 1], [10, 5], [10, 2], [20, 4], [20, 1], [20, 7], [20, 2], [40, 7], [40, 4], [40, 5]]
I would like to obtain the following array:
[[10, 1], [20, 1], [40, 4]]
I was trying selecting rows in this way:
d = {i: array[array[:, 0] == i] for i in np.unique(array[:, 0])}
but then I don't know how to detect the row with the minimum value in the second column.
What you want is the idea of groupby, as implemented in pandas for instance. As we don't have that in numpy, let's implement something similar to this other answer.
Let's call your input array A. So first, sort the rows by the values in the first column. We do this so that all entries with the same value appear one after the other.
sor = A[A[:,0].argsort()]
And get the indices where new unique values are found.
uniq=np.unique(sor[:,0],return_index=True)[1]
print(uniq)
>>> array([0, 3, 7])
This indicates the places of the array where we need to cut to get groups. Now split the second column into such groups. That way you get chunks of elements of the second column, grouped by the elements on the first column.
grp=np.split(sor[:,1],uniq[1:])
print(grp)
>>> [array([1, 5, 2]), array([4, 1, 7, 2]), array([7, 4, 5])]
The last step is to get the index of the minimum value within each of these groups.
ind=np.array(list(map(np.argmin,grp))) + uniq
print(ind)
>>> array([0, 4, 8])
The first part maps the np.argmin function to every group in grp. The + uniq part is there for mapping every one of these minimum arguments into the original scale.
Now you only need to index your sorted array using these indices.
print(sor[ind])
>>> array([[10, 1],
[20, 1],
[40, 4]])
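For comparison, here is a shorter variant of the same idea. It is only a sketch, and it swaps the split/argmin steps for np.lexsort: once the rows are sorted by the first column with ties broken by the second, the first row of each group is already the minimum. The function name min_rows_per_group is made up for this illustration.

```python
import numpy as np

def min_rows_per_group(A):
    """Return, for each unique value in column 0, the row with the smallest column 1."""
    # lexsort uses the LAST key as the primary key: sort by column 0, then column 1
    order = np.lexsort((A[:, 1], A[:, 0]))
    sor = A[order]
    # The first occurrence of each key in the sorted array is that group's minimum row
    _, first = np.unique(sor[:, 0], return_index=True)
    return sor[first]

A = np.array([[10, 1], [10, 5], [10, 2], [20, 4], [20, 1],
              [20, 7], [20, 2], [40, 7], [40, 4], [40, 5]])
result = min_rows_per_group(A)
print(result)
```

If two rows in a group tie for the minimum, this returns one of them rather than all of them.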
I am very much an amateur when it comes to scipy. I am trying to use scipy's fmin function on a multidimensional variable system. For the sake of simplicity I am using a list of lists of lists. My data is 12-dimensional; when I enter np.shape(DATA) it returns (3,2,2). I am not even sure scipy can handle that many dimensions; if not, no problem, I can reduce them. The point is that the optimize.fmin() function doesn't accept list-based arrays as the x0 initial parameters, so I need help rewriting either the x0 array into a numpy-compatible one or the entire DATA array into a 12-dimensional matrix or something like that.
Here is a simpler example illustrating the issue:
from scipy import optimize
import numpy as np
def f(x): return(x[0][0]*1.5-x[0][1]*2.0+x[1][0]*2.5-x[1][1]*3.0)
result = optimize.fmin(f,[[0.1,0.1],[0.1,0.1]])
print(result)
It gives an error saying "invalid index to scalar variable", which probably comes from fmin not understanding the [[],[]] list-of-lists structure, so it probably only understands numpy array formats.
So how can I rewrite this to make it work, including for my (3,2,2)-shaped list of lists?
scipy.optimize.fmin needs the initial guess for the function parameters to be a 1D array with a number of elements that suits the function to optimize. In your case, maybe you can use flatten and reshape if you just need the output to be in the same shape as your input parameters. An example based on your illustration code:
from scipy import optimize
import numpy as np
def f(x):
    return x[0]*1.5-x[1]*2.0+x[2]*2.5-x[3]*3.0
guess = np.array([[0.1, 0.1],
[0.1, 0.1]]) # guess.shape is (2,2)
out = optimize.fmin(f, guess.flatten()) # flatten upon input
# out.shape is (4,)
# reshape output according to guess
out = out.reshape(guess.shape) # out.shape is (2,2) again
or out = optimize.fmin(f, guess.flatten()).reshape(guess.shape) in one line. Note that this also works for a 3-dimensional array as you propose:
guess = np.arange(12).reshape(3,2,2)
# array([[[ 0, 1],
# [ 2, 3]],
# [[ 4, 5],
# [ 6, 7]],
# [[ 8, 9],
# [10, 11]]])
guess = guess.flatten()
# array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
guess = guess.reshape(3,2,2)
# array([[[ 0, 1],
# [ 2, 3]],
# [[ 4, 5],
# [ 6, 7]],
# [[ 8, 9],
# [10, 11]]])
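If the objective is easier to write against the original shape, one option is to keep it that way and reshape inside a thin wrapper. The following is a sketch only: target, f and f_flat are hypothetical names, and the quadratic objective is an assumption chosen because it has a well-defined minimum (the linear f from the question is unbounded, so fmin cannot converge on it).

```python
import numpy as np
from scipy import optimize

# Hypothetical objective written against the natural (2, 2) layout;
# its unique minimum is at `target`.
target = np.array([[1.0, 2.0],
                   [3.0, 4.0]])

def f(x):
    return np.sum((x - target) ** 2)

guess = np.full((2, 2), 0.1)

def f_flat(v):
    # fmin hands over a 1D vector, so restore the shape before calling f
    return f(v.reshape(guess.shape))

# Flatten on the way in, reshape on the way out
out = optimize.fmin(f_flat, guess.ravel(), disp=False).reshape(guess.shape)
print(out)
```

The same wrapper should work for a (3,2,2) guess, although Nelder-Mead (the method behind fmin) tends to converge more slowly as the number of parameters grows.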
So I was embarking on a mission to figure out how the numpy swapaxes function operates and reached a sort of a roadblock when it came to swapping axes in arrays of dimensions > 3.
Say
import numpy as np
array=np.arange(24).reshape(3,2,2,2)
This would create a numpy array of shape (3,2,2,2) with elements 0-23. Can someone explain to me how exactly axes swapping works in this case, where we cannot visualise the four axes separately?
Say I want to swap axes 0 and 2.
array.swapaxes(0,2)
It would be great if someone could actually describe the abstract swapping which is occurring when there are 4 or more axes. Thanks!
How do you 'describe' a 4d array? We don't have intuitions to match; the best we can do is project from 2d experience. rows, cols, planes, ??
This array is small enough to show the actual print:
In [271]: arr = np.arange(24).reshape(3,2,2,2)
In [272]: arr
Out[272]:
array([[[[ 0, 1],
[ 2, 3]],
[[ 4, 5],
[ 6, 7]]],
[[[ 8, 9],
[10, 11]],
[[12, 13],
[14, 15]]],
[[[16, 17],
[18, 19]],
[[20, 21],
[22, 23]]]])
The print marks the higher dimensions with extra [] and blank lines.
In [273]: arr.swapaxes(0,2)
Out[273]:
array([[[[ 0, 1],
[ 8, 9],
[16, 17]],
[[ 4, 5],
[12, 13],
[20, 21]]],
[[[ 2, 3],
[10, 11],
[18, 19]],
[[ 6, 7],
[14, 15],
[22, 23]]]])
To see what's actually being done, we have to look at the underlying properties of the arrays
In [274]: arr.__array_interface__
Out[274]:
{'data': (188452024, False),
'descr': [('', '<i4')],
'shape': (3, 2, 2, 2),
'strides': None, # arr.strides = (32, 16, 8, 4)
'typestr': '<i4',
'version': 3}
In [275]: arr.swapaxes(0,2).__array_interface__
Out[275]:
{'data': (188452024, False),
'descr': [('', '<i4')],
'shape': (2, 2, 3, 2),
'strides': (8, 16, 32, 4),
'typestr': '<i4',
'version': 3}
The data attributes are the same - the swap is a view, sharing data buffer with the original. So no numbers are moved around.
The shape change is obvious; that's what we told it to swap. Sometimes it helps to make all dimensions different, e.g. (2,3,4).
It has also swapped 2 strides values, though how that affects the display is harder to explain. We have to know something about how shape and strides work together to create a multidimensional array (from a flat data buffer).
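One way to make this concrete without picturing four axes is to check the index relationship directly. This is a small sketch; the index values i, j, k, l are arbitrary picks for illustration.

```python
import numpy as np

arr = np.arange(24).reshape(3, 2, 2, 2)
swapped = arr.swapaxes(0, 2)

# Swapping axes 0 and 2 only exchanges those two index positions
i, j, k, l = 2, 1, 0, 1
print(swapped[k, j, i, l] == arr[i, j, k, l])

# Strides give the byte step along each axis, so the position of an element
# in the flat buffer is sum(index * stride) divided by the item size
offset = sum(ix * st for ix, st in zip((i, j, k, l), arr.strides))
flat_position = offset // arr.itemsize
print(flat_position, arr[i, j, k, l])  # both are 21

# No data is copied: the result is a view with two strides exchanged
print(np.shares_memory(arr, swapped))
print(swapped.strides == (arr.strides[2], arr.strides[1], arr.strides[0], arr.strides[3]))
```

In other words, swapped[k, j, i, l] and arr[i, j, k, l] refer to the same element for every valid index, whatever the number of axes.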
I'm trying to write a function that will take as input a numpy array in the form:
a = [[0,0], [10,0], [10,10], [5,4]]
and return a numpy array b such that:
b = [[[0,0]], [[10,0]], [[10,10]], [[5,4]]]
For some reason I'm finding this surprisingly difficult.
The reason I'm doing this is that I have some contours generated using skimage that I'm attempting to use opencv2 on to calculate features ( area, perimeter etc...) but the opencv functions will only take arrays in the form of b as input, rather than a.
a is shape (4,2), b is (4,1,2)
a.reshape(4,1,2)
np.expand_dims(a, 1)
a[:,None]
all work
In [503]: B
Out[503]:
array([[[ 0, 0]],
[[10, 0]],
[[10, 10]],
[[ 5, 4]]])
In [504]: B.tolist()
Out[504]: [[[0, 0]], [[10, 0]], [[10, 10]], [[5, 4]]]
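A quick check that the three forms agree, as a sketch. Using -1 in reshape is an addition here so the number of points does not have to be hard-coded.

```python
import numpy as np

a = np.array([[0, 0], [10, 0], [10, 10], [5, 4]])

b1 = a.reshape(-1, 1, 2)   # -1 lets numpy infer the number of points
b2 = np.expand_dims(a, 1)  # insert a new axis at position 1
b3 = a[:, None]            # same thing via indexing with None (np.newaxis)

print(b1.shape, b2.shape, b3.shape)
print(np.array_equal(b1, b2) and np.array_equal(b2, b3))
```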
I have a numpy 2D array and I want to return the values in column c for the rows where the entry at (r, c-1), in (row, column) notation, equals a certain value (int n).
I don't want to iterate over the rows writing something like
result = []
for r in range(array.shape[0]):
    if array[r, c-1] == n:
        result.append(array[r, c])
, because there are 4000 of them and this 2D array is just one of 20 I have to look through.
I found "filter" but don't know how to use it (found no docs).
Is there a function that provides such a search?
I hope I understood your question correctly. Let's say you have an array a
import numpy as np
a = np.array(list(range(7)) * 3).reshape(7, 3)
print(a)
array([[0, 1, 2],
[3, 4, 5],
[6, 0, 1],
[2, 3, 4],
[5, 6, 0],
[1, 2, 3],
[4, 5, 6]])
and you want to extract all lines where the first entry is 2. This can be done like this:
print(a[a[:,0] == 2])
array([[2, 3, 4]])
a[:,0] denotes the first column of the array, == 2 returns a Boolean array marking the entries that match, and then we use advanced indexing to extract the respective rows.
Of course, NumPy needs to iterate over all entries, but this will be much faster than doing it in Python.
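Applied to the question as asked (values of column c where column c-1 equals n), the same mask can be combined with a column index. This is a sketch; the choices c = 1 and n = 2 are arbitrary.

```python
import numpy as np

a = np.array(list(range(7)) * 3).reshape(7, 3)

c, n = 1, 2
mask = a[:, c - 1] == n   # Boolean array, one entry per row
values = a[mask, c]       # entries of column c in the matching rows
print(values)             # [3]
```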
Numpy arrays are not indexed. If you need to perform this specific operation more efficiently than linear in the array size, then you need to use something other than numpy.
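As one illustration of "something other than numpy", a plain dictionary can serve as a lookup table when the same array is queried many times. This is only a sketch under that assumption; the name index and the column choice c = 1 are made up for the example, and building the dictionary is itself a linear pass.

```python
from collections import defaultdict
import numpy as np

a = np.array(list(range(7)) * 3).reshape(7, 3)
c = 1

# Build the index once: value in column c-1 -> list of values in column c
index = defaultdict(list)
for key, value in zip(a[:, c - 1].tolist(), a[:, c].tolist()):
    index[key].append(value)

# Each later lookup is a dictionary access instead of a scan over all rows
print(index[2])
```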