This question already has answers here:
Deleting certain elements from numpy array using conditional checks
(3 answers)
Remove all occurrences of a value from a list?
(26 answers)
Closed 4 years ago.
The title is pretty self-explanatory: I have a NumPy array like (let's say ints)
[ 1 2 10 2 12 2 ] and I would like to remove all occurrences of 2, so that the resulting array is [ 1 10 12 ]. Preferably I would like to do this as fast as possible, because I am using relatively large arrays.
NumPy has a function called numpy.delete(), but it takes indices as an argument, which I do not have.
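For reference, a sketch of how those indices could be obtained with np.where and passed to np.delete (the boolean-indexing answers below are more direct):
import numpy as np

arr = np.array([1, 2, 10, 2, 12, 2])
idx = np.where(arr == 2)[0]   # indices of every occurrence of 2
print(np.delete(arr, idx))    # [ 1 10 12]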
Edit: The question is indeed different from Deleting certain elements from numpy array using conditional checks, which is, I guess, a more "general" case. However, the idea of removing all occurrences of a value from an array is fundamental enough to merit its own explicit question, so I am keeping it.
You can use boolean indexing:
arr = np.array([1, 2, 10, 2, 12, 2])
print(arr[arr != 2])
# [ 1 10 12]
Timing is pretty good:
from timeit import Timer
arr = np.array(range(5000))
print(min(Timer(lambda: arr[arr != 4999]).repeat(500, 500)))
# 0.004942436999999522
You can use another NumPy function: numpy.setdiff1d(ar1, ar2, assume_unique=False), which finds the set difference of two arrays. One caveat: setdiff1d treats the inputs as sets, so the result comes back sorted and deduplicated, which would also drop repeats of any values you keep.
import numpy as np

a = np.array([1, 2, 10, 2, 12, 2])
b = np.array([2])
c = np.setdiff1d(a, b)
print(c)
# [ 1 10 12]
There are several ways to do this. I suggest you use a mask:
import numpy as np
a = np.array([ 1, 2 ,10, 2, 12, 2 ])
a[~np.isin(a, 2)]
>> array([ 1, 10, 12])
np.isin is convenient because you can apply the filter to multiple elements at once if you need to:
a[~np.isin(a, (1,2))]
>> array([ 10, 12])
Also note that, unlike a basic slice, boolean (and fancy) indexing returns a new array rather than a view, so a[mask] already leaves the original values untouched; an explicit .copy() is not needed:
b = a[~np.isin(a, (1, 2))]
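A quick check (my addition) that the filtered result is independent of the original:
import numpy as np

a = np.array([1, 2, 10, 2, 12, 2])
b = a[~np.isin(a, (1, 2))]
b[0] = 99
print(a)  # [ 1  2 10  2 12  2] -- original unchanged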
Related
Say that I have a batch of arrays, and I would like to alter them based on conditions involving particular values at given indices.
For example, say that I would like to increase and decrease particular values if the difference between those values is less than two.
For a single 1D array it can be done like this
import numpy as np
single2 = np.array([8, 8, 9, 10])
if abs(single2[1] - single2[2]) < 2:
    single2[1] = single2[1] - 1
    single2[2] = single2[2] + 1
single2
array([ 8,  7, 10, 10])
But I do not know how to do it for batch of arrays. This is my initial attempt
import numpy as np
single1 = np.array([6, 0, 3, 7])
single2 = np.array([8, 8, 9, 10])
single3 = np.array([2, 15, 15, 20])
batch = np.array([
np.copy(single1),
np.copy(single2),
np.copy(single3),
])
if abs(batch[:,1] - batch[:,2]) < 2:
    batch[:,1] = batch[:,1] - 1
    batch[:,2] = batch[:,2] + 1
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
Looking at np.any and np.all, they reduce an array of boolean values to a single truth value, and I am not sure how they could be used in the code snippet above.
My second attempt uses np.where, following the method described here for comparing particular values across a batch of arrays by creating new versions of the arrays with sentinel values added to the front/back:
https://stackoverflow.com/a/71297663/3259896
In this example I am comparing values that are right next to each other, so I create copies that shift the arrays forward and backward by 1. I also use only the particular slice of the array that I am comparing, since otherwise the other numbers would also take part in the np.where comparison.
batch_ap = np.concatenate(
(batch[:, 1:2+1], np.repeat(-999, 3).reshape(3,1)),
axis=1
)
batch_pr = np.concatenate(
(np.repeat(-999, 3).reshape(3,1), batch[:, 1:2+1]),
axis=1
)
Finally, I do the comparisons and adjust the values:
batch[:, 1:2+1] = np.where(
abs(batch_ap[:,1:]-batch_ap[:,:-1])<2,
batch[:, 1:2+1]-1,
batch[:, 1:2+1]
)
batch[:, 1:2+1] = np.where(
abs(batch_pr[:,1:]-batch_pr[:,:-1])<2,
batch[:, 1:2+1]+1,
batch[:, 1:2+1]
)
print(batch)
[[ 6 0 3 7]
[ 8 7 10 10]
[ 2 14 16 20]]
Though I am not sure whether this is the most computationally efficient or the most elegant way to do this. It seems like a lot of operations and code for the task, but I do not have a strong enough mastery of NumPy to be certain.
This works:
mask = abs(batch[:,1]-batch[:,2])<2
batch[mask,1] -= 1
batch[mask,2] += 1
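For completeness, a quick check against the example batch above (my addition), reproducing the expected result:
import numpy as np

batch = np.array([
    [6, 0, 3, 7],
    [8, 8, 9, 10],
    [2, 15, 15, 20],
])
mask = abs(batch[:, 1] - batch[:, 2]) < 2   # rows where the two values differ by less than 2
batch[mask, 1] -= 1
batch[mask, 2] += 1
print(batch)
# [[ 6  0  3  7]
#  [ 8  7 10 10]
#  [ 2 14 16 20]]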
My basic problem is that I need to take 2 arrays of integers and arrive at a combined array that is the concatenation of many ranges built from pairwise combinations of the 2 initial arrays.
Said slightly differently, I want to use 2 arrays, combine them to produce a set of ranges, and then merge these ranges together. Importantly, I need to do this without using any looping, as I am going to need to do this almost 4 million times.
My 2 starting arrays are:
import numpy as np
sd = np.array([3,3,4,2,5,1]) # StartDate
ed = np.array([4,5,5,5,8,2]) # EndDate
Pairwise, they would look like this, combining sd[i] with ed[i]:
[(3, 4), (3, 5), (4, 5), (2, 5), (5, 8), (1, 2)] # Pairwise combinations of StartDate and EndDate
By way of example, I could iterate over these pairs, creating ranges, exemplifying below:
[In]: range1 = np.arange(3,4)
[Out]: array([3])
[In]: range2 = np.arange(3,5)
[Out]: array([3,4])
...and so on, to arrive at the final output, which would be:
array([3, 3, 4, 4, 2, 3, 4, 5, 6, 7, 1]) # End result, where the ranges are tiled one after another
# (note: the first three values are range1 and range2 from immediately above)
My issue is that I need to go from the input arrays and to the output array without looping, as I have already tried a version of this, and it is WAY too slow. Any help very much appreciated.
You are in luck. Here is a one-liner solution:
indexer = np.r_[tuple([np.s_[i:j] for (i,j) in zip(sd,ed)])]
output:
[3 3 4 4 2 3 4 5 6 7 1]
I have also explained a similar case for torch here. Here is how it works:
np.s_[i:j] creates a slice object (simply a range) of indices from start=i to end=j.
np.r_[i:j, k:m] creates an array of ALL the indices in slices (i,j) and (k,m). (You can pass more slices to np.r_ to concatenate them all together at once; this is an example of concatenating only two slices.)
Therefore, indexer creates a list of ALL indices by concatenating a list of slices (each slice is a range of indices).
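Note that the list comprehension above is still a Python-level loop over the pairs. If that ever becomes the bottleneck, here is a sketch of a fully vectorized alternative using a repeat/cumsum trick (my addition, not part of the original answer):
import numpy as np

sd = np.array([3, 3, 4, 2, 5, 1])  # StartDate
ed = np.array([4, 5, 5, 5, 8, 2])  # EndDate

lengths = ed - sd                    # length of each range
starts = np.repeat(sd, lengths)      # each start value, repeated once per element
# offset of each element within its own range, built with a cumsum trick
offsets = np.arange(lengths.sum()) - np.repeat(np.cumsum(lengths) - lengths, lengths)
print(starts + offsets)
# [3 3 4 4 2 3 4 5 6 7 1]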
This question already has answers here:
Get N maximum values and indices along an axis in a NumPy array
(4 answers)
Closed 4 years ago.
I have a NumPy 2D array, and the list of indices corresponding to the top 3 elements obtained using argsort. Now I am trying to extract the values corresponding to these indices, and it is not working. What is the workaround?
A = array([[0.19334242, 0.9787497 , 0.41453434, 0.35298119, 0.17943745,
0.63468207, 0.43840688],
[0.39811914, 0.68040634, 0.7589702 , 0.3573046 , 0.16365397,
0.86329535, 0.48559053],
[0.5848541 , 0.54203383, 0.27262654, 0.21979374, 0.06917679,
0.10586995, 0.57083441],
[0.76765549, 0.05703751, 0.83383973, 0.71867625, 0.16338699,
0.85721418, 0.5953548 ]])
np.flip(A.argsort(),axis=1)[:,0:3]
array([[1, 5, 6],
[5, 2, 1],
[0, 6, 1],
[5, 2, 0]])
This raises an error:
>>> A[np.flip(A.argsort(),axis=1)[:,0:3]]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: index 5 is out of bounds for axis 0 with size 4
In [22]: A.ravel()[A.argsort(axis=None)[::-1][:3]]
Out[22]: array([ 0.9787497 , 0.86329535, 0.85721418])
Explanation
By default, argsort() sorts along the last axis. In your case, you want to sort a flattened version of the array as you don't give any meaning to the fact that the array is 2D. This happens by passing axis=None to argsort().
Since you get 1D indices, you also need to access values on a flattened version of the array, which is what ravel() does.
[::-1] reverses the argsort array to get top values first and [:3] gets the first 3 values.
Note: there are other and possibly more efficient ways to do that, but this was the first thing that came to my mind.
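If what you actually wanted was the top 3 values per row (matching the shape of your index array), one option is np.take_along_axis; a sketch I am adding, with A as defined in the question:
import numpy as np

idx = np.flip(A.argsort(axis=1), axis=1)[:, :3]   # per-row indices of the top 3
print(np.take_along_axis(A, idx, axis=1))         # per-row top-3 values
# [[0.9787497  0.63468207 0.43840688]
#  [0.86329535 0.7589702  0.68040634]
#  [0.5848541  0.57083441 0.54203383]
#  [0.85721418 0.83383973 0.76765549]]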
I am looking for a way to loop over 1D fibers (row, column, and multi-dimensional equivalents) along any dimension in a 3+-dimensional array.
In a 2D array this is fairly trivial since the fibers are rows and columns, so just saying for row in A gets the job done. But for 3D arrays for example, this expression iterates over 2D slices, not 1D fibers.
A working solution is the one below:
import numpy as np
A = np.arange(27).reshape((3,3,3))
func = np.sum
for fiber_index in np.ndindex(A.shape[:-1]):
    print(func(A[fiber_index]))
However, I am wondering whether there is something that is:
More idiomatic
Faster
Hope you can help!
I think you might be looking for numpy.apply_along_axis
In [10]: def my_func(x):
    ...:     return x**2 + x
In [11]: np.apply_along_axis(my_func, 2, A)
Out[11]:
array([[[ 0, 2, 6],
[ 12, 20, 30],
[ 42, 56, 72]],
[[ 90, 110, 132],
[156, 182, 210],
[240, 272, 306]],
[[342, 380, 420],
[462, 506, 552],
[600, 650, 702]]])
Although many NumPy functions (including sum) have their own axis argument to specify which axis to use:
In [12]: np.sum(A, axis=2)
Out[12]:
array([[ 3, 12, 21],
[30, 39, 48],
[57, 66, 75]])
numpy provides a number of different ways of looping over 1 or more dimensions.
Your example:
func = np.sum
for fiber_index in np.ndindex(A.shape[:-1]):
    print(fiber_index)
    print(A[fiber_index])
produces something like:
(0, 0)
[0 1 2]
(0, 1)
[3 4 5]
(0, 2)
[6 7 8]
...
This generates all index combinations over the first two dimensions, giving your function the 1D fiber along the last.
Look at the code for ndindex; it's instructive. I tried to extract its essence in https://stackoverflow.com/a/25097271/901925.
It uses as_strided to generate a dummy matrix over which an nditer iterates. It uses the 'multi_index' mode to generate an index set, rather than the elements of that dummy. The iteration itself is done with a __next__ method. This is the same style of indexing that is currently used in numpy compiled code.
http://docs.scipy.org/doc/numpy-dev/reference/arrays.nditer.html
Iterating Over Arrays has a good explanation, including an example of doing so in cython.
Many functions, among them sum, max, product, let you specify which axis (axes) you want to iterate over. Your example, with sum, can be written as:
np.sum(A, axis=-1)
np.sum(A, axis=(1,2)) # sum over 2 axes
An equivalent is
np.add.reduce(A, axis=-1)
np.add is a ufunc, and reduce specifies an iteration mode. There are many other ufuncs, and other iteration modes - accumulate, reduceat. You can also define your own ufuncs.
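A quick illustration of those modes (a sketch I am adding):
import numpy as np

A = np.arange(27).reshape(3, 3, 3)
np.add.reduce(A, axis=-1)       # same as A.sum(axis=-1)
np.add.accumulate(A, axis=-1)   # running sum along the last axis
np.multiply.reduce(A, axis=-1)  # product along the last axis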
xnx suggests
np.apply_along_axis(np.sum, 2, A)
It's worth digging through apply_along_axis to see how it steps through the dimensions of A. In your example, it steps over all possible i,j in a while loop, calculating:
outarr[(i,j)] = np.sum(A[(i, j, slice(None))])
Including slice objects in the indexing tuple is a nice trick. Note that it edits a list, and then converts it to a tuple for indexing. That's because tuples are immutable.
Your iteration can be applied along any axis by rolling that axis to the end. This is a 'cheap' operation, since it just changes the strides.
def with_ndindex(A, func, ax=-1):
    # apply func along axis ax
    A = np.rollaxis(A, ax, A.ndim)  # roll ax to end (changes strides)
    shape = A.shape[:-1]
    B = np.empty(shape, dtype=A.dtype)
    for ii in np.ndindex(shape):
        B[ii] = func(A[ii])
    return B
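For example (a usage check I am adding), it reproduces the axis-based sum:
A = np.arange(27).reshape(3, 3, 3)
print(np.array_equal(with_ndindex(A, np.sum), np.sum(A, axis=-1)))  # True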
I did some timings on 3x3x3, 10x10x10 and 100x100x100 A arrays. This np.ndindex approach is consistently a third faster than the apply_along_axis approach. Direct use of np.sum(A, -1) is much faster.
So if func is limited to operating on a 1D fiber (unlike sum), then the ndindex approach is a good choice.
Say I have an Array[Int] like
val array = Array( 1, 2, 3 )
Now I would like to append an element to the array, say the value 4, as in the following example:
val array2 = array + 4 // will not compile
I can of course use System.arraycopy() and do this on my own, but there must be a Scala library function for this, which I simply could not find. Thanks for any pointers!
Notes:
I am aware that I can append another Array of elements, like in the following line, but that seems too round-about:
val array2b = array ++ Array( 4 ) // this works
I am aware of the advantages and drawbacks of List vs Array and here I am for various reasons specifically interested in extending an Array.
Edit 1
Thanks for the answers pointing to the :+ operator method. This is what I was looking for. Unfortunately, it is rather slower than a custom append() method implementation using arraycopy -- about two to three times slower. Looking at the implementation in SeqLike[], a builder is created, then the array is added to it, then the append is done via the builder, then the builder is rendered. Not a good implementation for arrays. I did a quick benchmark comparing the two methods, looking at the fastest time out of ten cycles. Doing 10 million repetitions of a single-item append to an 8-element array instance of some class Foo takes 3.1 sec with :+ and 1.7 sec with a simple append() method that uses System.arraycopy(); doing 10 million single-item append repetitions on 8-element arrays of Long takes 2.1 sec with :+ and 0.78 sec with the simple append() method. Wonder if this couldn't be fixed in the library with a custom implementation for Array?
Edit 2
For what it's worth, I filed a ticket:
https://issues.scala-lang.org/browse/SI-5017
You can use :+ to append element to array and +: to prepend it:
0 +: array :+ 4
should produce:
res3: Array[Int] = Array(0, 1, 2, 3, 4)
It's the same as with any other implementation of Seq.
val array2 = array :+ 4
//Array(1, 2, 3, 4)
It also works "reversed":
val array2 = 4 +: array
//Array(4, 1, 2, 3)
There is also an "in-place" version:
var array = Array( 1, 2, 3 )
array +:= 4
//Array(4, 1, 2, 3)
array :+= 0
//Array(4, 1, 2, 3, 0)
The easiest might be:
Array(1, 2, 3) :+ 4
Actually, Array can be implicitly converted to a WrappedArray.