I have an enormous 1D numpy array of booleans w and a non-decreasing list of indices i, which splits w into len(i)+1 subarrays. A toy example is:
w=numpy.array([True,False,False,False,True,True,True,True,False,False])
i=numpy.array([0,0,2,5,5,8,8])
I wish to compute a numpy array wi whose i-th entry is 1 if the i-th subarray contains a True and 0 otherwise. In other words, the i-th entry of wi is the logical OR of the elements of the i-th subarray of w. In our example, the output is:
[0 0 1 1 0 1 0 0]
This is achieved with the code:
wi=numpy.fromiter(map(numpy.any,numpy.split(w,i)),int)
Is there a more efficient way of doing this or is this optimal as far as memory is concerned?
P.S. related post
For efficiency (memory and performance), use np.bitwise_or.reduceat as it keeps the output in boolean -
In [10]: np.bitwise_or.reduceat(w,np.r_[0,i])
Out[10]: array([ True, True, False, True, False, False])
To have as int output, view as int -
In [11]: np.bitwise_or.reduceat(w,np.r_[0,i]).view('i1')
Out[11]: array([1, 1, 0, 1, 0, 0], dtype=int8)
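A small sanity check, written as a sketch on the question's toy w. The one-liner matches the np.split/np.any baseline when the split points are strictly increasing. However, reduceat handles an index that is not smaller than the next one by returning the single element w[idx] rather than an empty-segment result, so repeated indices (empty subarrays, as in the question's i) give different answers. The helper names baseline and reduceat_or are illustrative only.

```python
import numpy as np

w = np.array([True, False, False, False, True, True, True, True, False, False])

def baseline(w, i):
    # Reference result from the question: OR over each np.split segment.
    return np.fromiter(map(np.any, np.split(w, i)), int)

def reduceat_or(w, i):
    # One-liner from the answer, cast to int for comparison.
    return np.bitwise_or.reduceat(w, np.r_[0, i]).astype(int)

# Strictly increasing split points: both approaches agree.
i_strict = np.array([2, 5, 8])
strict_base = baseline(w, i_strict)     # [1 1 1 0]
strict_fast = reduceat_or(w, i_strict)  # [1 1 1 0]

# Repeated split points (empty segments): for an index that is not smaller
# than the next one, reduceat returns w[idx], so the results differ.
i_repeat = np.array([0, 0, 2, 5, 5, 8, 8])
repeat_base = baseline(w, i_repeat)     # [0 0 1 1 0 1 0 0]
repeat_fast = reduceat_or(w, i_repeat)  # [1 1 1 1 1 1 0 0]
```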
Here's an all-weather solution, which also handles repeated indices (empty subarrays) and indices at or beyond len(w) -
def slice_reduce_or(w, i):
    valid = i < len(w)
    invalidc = (~valid).sum()
    i = i[valid]
    mi = np.r_[i[:-1] != i[1:], True]
    pp = i[mi]
    p1 = np.bitwise_or.reduceat(w, pp)
    N = len(i) + 1
    out = np.zeros(N + invalidc, dtype=bool)
    out[1:N][mi] = p1
    out[0] = w[:i[0]].any()
    return out.view('i1')
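A usage sketch that runs the function on the question's toy input, and on a hypothetical i containing an out-of-range index, so both can be compared with the np.split/np.any baseline from the question. The function body is the answer's code, re-indented so the block runs on its own.

```python
import numpy as np

def slice_reduce_or(w, i):
    valid = i < len(w)                    # indices past the end only add empty tail segments
    invalidc = (~valid).sum()
    i = i[valid]
    mi = np.r_[i[:-1] != i[1:], True]     # last position of each run of equal indices
    pp = i[mi]
    p1 = np.bitwise_or.reduceat(w, pp)    # OR over each non-empty segment
    N = len(i) + 1
    out = np.zeros(N + invalidc, dtype=bool)
    out[1:N][mi] = p1                     # empty segments keep their False
    out[0] = w[:i[0]].any()               # leading segment w[:i[0]]
    return out.view('i1')

w = np.array([True, False, False, False, True, True, True, True, False, False])
res_toy = slice_reduce_or(w, np.array([0, 0, 2, 5, 5, 8, 8]))  # [0 0 1 1 0 1 0 0]
res_oob = slice_reduce_or(w, np.array([2, 12]))                # [1 1 0]
```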
Let's try np.add.reduceat:
wi = np.add.reduceat(w,np.r_[0,i]).astype(bool).astype(int)
output:
array([1, 1, 0, 1, 0, 0])
And performance:
%timeit -n 100 wi = np.add.reduceat(w,np.r_[0,i]).astype(bool).astype(int)
21.7 µs ± 7.86 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit -n 100 wi=np.fromiter(map(np.any,np.split(w,i)),int)
44.5 µs ± 7.79 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
So we're looking at about a 2x speedup here.
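As a swapped-in alternative (not taken from either answer), the number of Trues in each segment can be read off a prefix sum. This handles repeated indices naturally, because an empty segment simply has a count of 0. It is a sketch: the helper name segment_any_cumsum is hypothetical, and the trade-off is that np.cumsum on a boolean array produces an array of the default integer type, so it uses more memory than the boolean reduceat approach.

```python
import numpy as np

def segment_any_cumsum(w, i):
    """Hypothetical helper: 1 where a np.split(w, i) segment contains a True, else 0."""
    i = np.minimum(i, len(w))          # indices past the end give empty tail segments, as with np.split
    c = np.r_[0, np.cumsum(w)]         # c[k] = number of Trues in w[:k]
    bounds = np.r_[0, i, len(w)]       # segment k is w[bounds[k]:bounds[k+1]]
    counts = c[bounds[1:]] - c[bounds[:-1]]
    return (counts > 0).astype(np.int8)

w = np.array([True, False, False, False, True, True, True, True, False, False])
wi = segment_any_cumsum(w, np.array([0, 0, 2, 5, 5, 8, 8]))   # [0 0 1 1 0 1 0 0]
```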
Related
For a numpy array
a = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8])
You can get a slice using something like a[3:6]
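But what about getting the rest of the array, i.e. the complement of the slice? What is the most computationally efficient method for this? Conceptually, something like a[:3, 6:] (pseudo-notation only: on a 1-D array NumPy rejects that as too many indices).
The best I can come up with is to use concatenate: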
np.concatenate([a[:3], a[6:]], axis=0)
I am wondering if this is the best method, as I will be doing millions of these operations for a data processing pipeline.
Your solution seems to be the most efficient one since it is more than 2x faster than the next best thing.
a = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8])
%timeit -n 100000 np.concatenate([a[:3], a[6:]], axis=0)
%timeit -n 100000 np.delete(a, slice(3, 6))
%timeit -n 100000 a[np.r_[:3,6:]]
>2.03 µs ± 75.5 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
>4.61 µs ± 146 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
>11 µs ± 350 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
However, the real question is whether these operations (taking the complement of a slice, i.e. deletion) need to be applied consecutively. If not, you could aggregate the indices via set operations and take the complement a single time at the end to obtain the final NumPy array.
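A minimal sketch of the "aggregate first, slice once" idea, using a boolean keep-mask in place of set operations on index lists (a substitution, not the answer's code). The helper drop_ranges is hypothetical and assumes half-open [start, stop) ranges on a 1-D array; overlapping ranges are fine because they just clear the same mask entries twice.

```python
import numpy as np

def drop_ranges(a, ranges):
    """Hypothetical helper: copy of 1-D `a` without the half-open ranges [start, stop)."""
    keep = np.ones(len(a), dtype=bool)   # collect every removal in one mask
    for start, stop in ranges:
        keep[start:stop] = False
    return a[keep]                       # single boolean-indexing pass at the end

a = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8])
once = drop_ranges(a, [(3, 6)])              # [0 1 2 6 7 8]
several = drop_ranges(a, [(1, 3), (2, 5)])   # overlapping ranges: [0 5 6 7 8]
```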
I find that declaring an empty array and filling it seems to be very slightly faster than using concatenate. As André mentioned in their comment, this will vary with the shape.
a = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8])
def testing123():
    new = np.zeros(6, dtype=int)
    new[:3] = a[0:3]
    new[3:] = a[6:]
    return new
%timeit -n 100000 np.concatenate([a[:3], a[6:]], axis=0)
100000 loops, best of 5: 2.18 µs per loop
%timeit -n 100000 np.delete(a, slice(3, 6))
100000 loops, best of 5: 6.11 µs per loop
%timeit -n 100000 a[np.r_[:3,6:]]
100000 loops, best of 5: 16.4 µs per loop
%timeit -n 100000 testing123()
100000 loops, best of 5: 2.01 µs per loop
a = np.arange(10_000)
def testing123():
    new = np.empty(5000, dtype=int)
    new[:2500] = a[:2500]
    new[2500:] = a[7500:]
    return new
%timeit -n 100000 np.concatenate([a[:2500], a[7500:]], axis=0)
100000 loops, best of 5: 3.99 µs per loop
%timeit -n 100000 np.delete(a, slice(2500, 7500))
100000 loops, best of 5: 7.76 µs per loop
%timeit -n 100000 a[np.r_[:2500,7500:]]
100000 loops, best of 5: 47.3 µs per loop
%timeit -n 100000 testing123()
100000 loops, best of 5: 3.61 µs per loop
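The fill-a-preallocated-array idea can be wrapped in a small helper. This is a sketch, assuming 0 <= start <= stop <= len(a); the name complement_slice is hypothetical. Unlike testing123 it takes the output dtype from a instead of hard-coding int, so float inputs are not silently truncated, and it uses np.empty because every element is overwritten.

```python
import numpy as np

def complement_slice(a, start, stop):
    """Hypothetical helper: `a` without a[start:stop], assuming 0 <= start <= stop <= len(a)."""
    out = np.empty(len(a) - (stop - start), dtype=a.dtype)  # uninitialised, fully filled below
    out[:start] = a[:start]
    out[start:] = a[stop:]
    return out

a = np.arange(10_000)
r = complement_slice(a, 2500, 7500)
```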
The following function applies NumPy functions to two NumPy arrays.
import numpy as np
def my_func(a: np.ndarray, b: np.ndarray) -> float:
    return np.nanmin(a, axis=0) + np.nanmin(b, axis=0)
>>> my_func(np.array([1., 2., np.nan]), np.array([1., np.nan]))
2.0
However, what is the best way to apply this same function to an object array of arrays with different shapes?
a = np.array([np.array([1., 2]), np.array([1, 2., 3, np.nan])], dtype=object) # First array shape (2,), second (4,)
b = np.array([np.array([1]), np.array([1.5, 2.5, np.nan])], dtype=object)
np.vectorize does work
>>> np.vectorize(my_func)(a, b)
array([2. , 2.5])
but as specified by the vectorize documentation:
The vectorize function is provided primarily for convenience, not for
performance. The implementation is essentially a for loop.
Is there a smarter solution?
I could use np.pad to give them identical shapes, but it seems sub-optimal as it requires padding up to the maximum length of the inner arrays (here 4 for a and 3 for b).
I looked at numba and this Stack Exchange question about performance, but I am not sure of the best practice for such a case.
Your function and arrays:
In [222]: def my_func(a: np.ndarray, b: np.ndarray) -> float:
     ...:     return np.nanmin(a, axis=0) + np.nanmin(b, axis=0)
...:
In [223]: a = np.array([np.array([1., 2]), np.array([1, 2., 3, np.nan])], dtype=object
     ...: ) # First array shape (2,), second (4,)
...: b = np.array([np.array([1]), np.array([1.5, 2.5, np.nan])], dtype=object)
In [224]: a
Out[224]: array([array([1., 2.]), array([ 1., 2., 3., nan])], dtype=object)
In [225]: b
Out[225]: array([array([1]), array([1.5, 2.5, nan])], dtype=object)
Compare vectorize with a straightforward list comprehension:
In [226]: np.vectorize(my_func)(a, b)
Out[226]: array([2. , 2.5])
In [227]: [my_func(i,j) for i,j in zip(a,b)]
Out[227]: [2.0, 2.5]
and their times:
In [228]: timeit np.vectorize(my_func)(a, b)
157 µs ± 117 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [229]: timeit [my_func(i,j) for i,j in zip(a,b)]
85.9 µs ± 148 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [230]: timeit np.array([my_func(i,j) for i,j in zip(a,b)])
89.7 µs ± 1.03 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
If you are going to work with object arrays, frompyfunc is faster than vectorize:
In [231]: np.frompyfunc(my_func,2,1)(a, b)
Out[231]: array([2.0, 2.5], dtype=object)
In [232]: timeit np.frompyfunc(my_func,2,1)(a, b)
83.2 µs ± 50.1 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
I'm a bit surprised that it's even better than the list comprehension.
frompyfunc (and vectorize) are more useful when the inputs need to 'broadcast' against each other:
In [233]: np.frompyfunc(my_func,2,1)(a[:,None], b)
Out[233]:
array([[2.0, 2.5],
[2.0, 2.5]], dtype=object)
I'm not a numba expert, but I suspect it doesn't handle object dtype arrays, or if it does, it doesn't improve speed much. Remember, object dtype means the elements are object references, just like in lists.
I get better times by using otypes and taking the function creation out of the timing loop:
In [235]: %%timeit f=np.vectorize(my_func, otypes=[float])
...: f(a, b)
...:
...:
95.5 µs ± 316 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [236]: %%timeit f=np.frompyfunc(my_func,2,1)
...: f(a, b)
...:
...:
81.1 µs ± 103 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
If you don't know about otypes, you haven't read the np.vectorize docs well enough.
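A different route, sketched here as an assumption rather than taken from the answer: if every inner array is non-empty, concatenate them into one flat float array and compute all segment minima with a single np.fmin.reduceat call. np.fmin returns the non-NaN operand when one input is NaN, so reducing with it behaves like nanmin per segment. segment_nanmin is a hypothetical name, and an empty inner array would break the offsets logic.

```python
import numpy as np

def segment_nanmin(obj_arr):
    """Hypothetical helper: nanmin of each non-empty 1-D array in a 1-D object array."""
    lengths = np.array([len(x) for x in obj_arr])
    flat = np.concatenate(list(obj_arr)).astype(float)   # one contiguous float buffer
    offsets = np.r_[0, np.cumsum(lengths)[:-1]]          # start index of each inner array
    # fmin skips NaN when the other operand is a number, so this acts like
    # nanmin per segment (an all-NaN segment still yields NaN).
    return np.fmin.reduceat(flat, offsets)

a = np.array([np.array([1., 2]), np.array([1, 2., 3, np.nan])], dtype=object)
b = np.array([np.array([1]), np.array([1.5, 2.5, np.nan])], dtype=object)
result = segment_nanmin(a) + segment_nanmin(b)   # [2.  2.5]
```

The result is a plain float array rather than an object array, so no otypes or astype step is needed afterwards.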
We have to perform the following operation around 400,000 times, so I'm searching for the most efficient solution. I have tried several things, but I'm curious whether there are even better approaches :)
Data example
We can use the following code to generate an example test set
import random
import numpy as np

random.seed(10)
np.random.seed(10)
def test_str():
    n = 10000000
    arr = np.random.randint(10000, size=n)
    sign = np.random.choice(['+','-'], size=n)
    return 'ID1' + '\t' + ' '.join(["{}{}".format(a,b) for a,b in zip(arr, sign)])
Which looks like ID1\t7688+ 737+ 677+ 1508- 9251-......
The code where it is all about :)
Copy the code from Google Colab (P.S. running it there gave me a TypingError, whereas it ran fine on my machine), or just see the functions below.
General function
Taken from this Numba issue; however, based on @armamut's answer, this may introduce a lot of overhead with Numba, apparently making native NumPy faster.
import numba as nb

@nb.jit(nopython=True)
def str_to_int(s):
    final_index, result = len(s) - 1, 0
    for i,v in enumerate(s):
        result += (ord(v) - 48) * (10 ** (final_index - i))
    return result
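For comparison, a sketch of the same digit-by-digit parsing using Horner's scheme (multiply the running value by 10, then add the next digit), which avoids computing a power of ten for every digit. The name str_to_int_horner is hypothetical; it is shown as plain Python, and whether Numba compiles it any better is not tested here.

```python
def str_to_int_horner(s):
    """Hypothetical variant of str_to_int: parse an ASCII digit string without 10**k."""
    result = 0
    for ch in s:
        result = result * 10 + (ord(ch) - 48)   # 48 == ord('0')
    return result

value = str_to_int_horner("1508")
```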
Approach 1
@nb.jit(nopython=True)
def process_number(numb, identifier, i):
    sign = 1 if numb[-1] == '+' else -1
    return str_to_int(numb[:-1]), sign, i, identifier
@nb.jit(nopython=True)
def expand1(data):
    identifier, l = data.split('\t')
    identifier = str_to_int(identifier[-1])
    numbers = l.split()
    # init empty numpy array
    arr = np.empty(shape = (len(numbers), 4), dtype = np.int64)
    # Fill array
    for i, numb in enumerate(numbers):
        arr[i,:] = process_number(numb, identifier, i)
    return arr
Approach 2
@nb.jit(nopython=True)
def expand2(data):
    identifier, l = data.split('\t')
    identifier = str_to_int(identifier[-1])
    numbers = l.split()
    size = len(numbers)
    numbs = [ str_to_int(numb[:-1]) for numb in numbers ]
    # the sign is the last character of each token
    signs = [ 1 if numb[-1] == '+' else -1 for numb in numbers ]
    arr = np.empty(shape = (size, 4), dtype = np.int64)
    arr[:,0] = numbs
    arr[:,1] = signs
    arr[:,2] = np.arange(0, size)
    arr[:,3] = np.repeat(identifier, size)
    return arr
Approach 3
@nb.jit(nopython=True)
def expand3(data):
    identifier, l = data.split('\t')
    identifier = str_to_int(identifier[-1])
    numbers = l.split()
    arr = np.empty(shape = (len(numbers), 4), dtype = np.int64)
    for i, numb in enumerate(numbers):
        # the sign is the last character of each token
        arr[i,:] = str_to_int(numb[:-1]), 1 if numb[-1] == '+' else -1, i, identifier
    return arr
Answer approach
def expand4(t):
    identifier, l = t.split('\t')
    identifier = int(identifier[-1])   # np.int was an alias of int, removed in NumPy 1.24
    numbers = np.array([int(k[:-1]) for k in l.split(' ')])
    signs = np.array([(k[-1] == '+') for k in l.split(' ')]) * 2 - 1
    N = len(numbers)
    arr = np.empty(shape = (N, 4), dtype = np.int64)
    arr[:, 0] = numbers
    arr[:, 1] = signs
    arr[:, 2] = np.arange(N)   # index column, same order as approaches 1-3
    arr[:, 3] = identifier
    return arr
Test results:
Expand 1
72.7 ms ± 177 ms per loop (mean ± std. dev. of 7 runs, 5 loops each)
Expand 2
27.9 ms ± 67.1 ms per loop (mean ± std. dev. of 7 runs, 5 loops each)
Expand 3
8.81 ms ± 20.3 ms per loop (mean ± std. dev. of 7 runs, 5 loops each)
Expand 4 ANSWER 1
429 µs ± 63.4 µs per loop (mean ± std. dev. of 7 runs, 5 loops each)
I cannot replicate your code, as I also got an error from Numba saying "ord" is not implemented.
But why are you using Numba? Your str_to_int operation seems very expensive and is not suited to vectorized operations. Why not do this (without Numba):
def expand(t):
    identifier, l = t.split('\t')
    identifier = int(identifier[-1])   # np.int was an alias of int, removed in NumPy 1.24
    numbers = np.array([int(k[:-1]) for k in l.split(' ')])
    signs = np.array([(k[-1] == '+') for k in l.split(' ')]) * 2 - 1
    N = len(numbers)
    arr = np.empty(shape = (N, 4), dtype = np.int64)
    arr[:, 0] = numbers
    arr[:, 1] = signs
    arr[:, 2] = np.arange(N)   # index column, same order as the question's approaches
    arr[:, 3] = identifier
    return arr
t = test_str()
%timeit expand(t)
>>>
1.01 ms ± 121 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
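A further sketch that moves more of the parsing out of per-token Python code. It is an assumption, not part of the answer, and it has not been benchmarked here. It strips the sign characters with str.translate and lets NumPy cast the digit strings to int64, then reads the signs straight from the bytes with np.frombuffer. It assumes every token is digits followed by exactly one '+' or '-', that no other '+'/'-' characters appear after the tab, and it uses the column order of approaches 1-3 (number, sign, index, identifier). expand_vectorized is a hypothetical name.

```python
import numpy as np

# Translation table that deletes the sign suffixes '+' and '-'.
_STRIP_SIGNS = str.maketrans('', '', '+-')

def expand_vectorized(t):
    """Hypothetical variant of expand(): columns are (number, sign, index, identifier)."""
    identifier, l = t.split('\t')
    identifier = int(identifier[-1])   # same convention as the original: last char of 'ID1'
    # Numbers: drop the sign characters, split on whitespace, cast the digit strings.
    numbers = np.array(l.translate(_STRIP_SIGNS).split()).astype(np.int64)
    # Signs: every '+' or '-' byte in the payload is a sign suffix, in token order.
    raw = np.frombuffer(l.encode('ascii'), dtype=np.uint8)
    sign_chars = raw[(raw == ord('+')) | (raw == ord('-'))]
    signs = np.where(sign_chars == ord('+'), 1, -1)
    N = len(numbers)
    arr = np.empty((N, 4), dtype=np.int64)
    arr[:, 0] = numbers
    arr[:, 1] = signs
    arr[:, 2] = np.arange(N)
    arr[:, 3] = identifier
    return arr

arr = expand_vectorized('ID1\t7688+ 737+ 677+ 1508- 9251-')
```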
Basically I have two 1D numpy arrays, let's call them x and y, both of the same length. I essentially want the result x1*y1 + x2*y2 + ... + xn*yn. Obviously I could do this with a for loop, but is there a built-in method or something that does this in one line?
What you are trying to compute is known as an 'inner product' and, in the case of two vectors, is called a 'dot product'. NumPy has built-in functions for both, and they are faster than the simple (x*y).sum() solution.
import numpy as np
a = np.array([1, 2, 3])
b = np.array([3, 2, 1])
print(np.inner(a, b))
# 10
print(np.dot(a, b))
# 10
Some timing results, with vectors a and b of 1000 elements each generated with np.random.randn:
np.dot(a, b) # 920 ns ± 9.9 ns
np.inner(a, b) # 1.1 µs ± 83.5 ns
(a*b).sum() # 4.2 µs ± 62.9 ns
np.sum(a*b) # 5.7 µs ± 170 ns
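You can use sum(x*y) or (x*y).sum(); for 1D arrays they give the same result (up to floating-point rounding), although the built-in sum loops in Python and is much slower on long arrays.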
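A quick check, using small integer arrays so the results are exact, that the one-line options mentioned in this thread all compute the same sum of products. It also includes the x @ y matrix-multiplication operator (Python 3.5+), which for two 1-D arrays gives the same inner product as np.dot.

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([3, 2, 1])

# Each entry computes x1*y1 + x2*y2 + ... + xn*yn for 1-D inputs.
results = {
    "np.dot": np.dot(x, y),
    "np.inner": np.inner(x, y),
    "x @ y": x @ y,                # matrix-multiplication operator
    "(x*y).sum()": (x * y).sum(),
    "sum(x*y)": sum(x * y),        # built-in sum: loops in Python over the product array
}
for name, value in results.items():
    print(f"{name:12} -> {value}")
```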
Say I have the following numpy array
n = 50
a = np.array(range(1, 1000)) / 1000.
I would like to execute this line of code
%timeit v = [a ** k for k in range(0, n)]
1000 loops, best of 3: 2.01 ms per loop
However, this line of code will ultimately be executed inside a loop, so performance is an issue.
Is there a way to optimize this? For example, each result in the list comprehension is simply the previous result multiplied by a again.
I don't mind storing the results in a 2d-array instead of arrays in a list. That would probably be cleaner. By the way, I also tried the following, but it yields similar performance results:
k = np.array(range(0, n))
ones = np.ones(n)
temp = np.outer(a, ones)
And then performed the following calculation
%timeit temp ** k
1000 loops, best of 3: 1.96 ms per loop
or
%timeit np.power(temp, k)
1000 loops, best of 3: 1.92 ms per loop
But both give timings similar to the list comprehension above. By the way, n will always be an integer in my case.
In quick tests cumprod seems to be faster.
In [225]: timeit v = np.array([a ** k for k in range(0, n)])
2.76 ms ± 1.62 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [228]: %%timeit
...: A=np.broadcast_to(a[:,None],(len(a),50))
...: v1=np.cumprod(A,axis=1)
...:
208 µs ± 42.3 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
To compare values I have to tweak ranges, since v includes a 0 power, while v1 starts with a 1 power:
In [224]: np.allclose(np.array(v)[1:], v1.T[:-1])
Out[224]: True
But the timings suggest that cumprod is worth refining.
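One small refinement, sketched here: if a column of ones is prepended before the cumulative product, column k of the result is a**k for k = 0..n-1, so it lines up with the question's v without the index shift used in the comparison above. Only correctness is checked; no timing is claimed.

```python
import numpy as np

n = 50
a = np.array(range(1, 1000)) / 1000.

v = np.array([a ** k for k in range(n)])   # question's result, shape (n, len(a))

# Column 0 holds ones (the 0th power); columns 1..n-1 hold copies of a,
# so the running product along axis 1 gives a**0, a**1, ..., a**(n-1).
A = np.empty((len(a), n))
A[:, 0] = 1.0
A[:, 1:] = a[:, None]                      # broadcast a into columns 1..n-1
v1 = np.cumprod(A, axis=1)                 # shape (len(a), n), v1[:, k] == a**k
```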
The proposed duplicate was Efficient way to compute the Vandermonde matrix. That still has good ideas.
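NumPy also provides a function for exactly this matrix: np.vander with increasing=True returns the columns a**0, a**1, ..., a**(n-1). A sketch checking it against the question's list comprehension (no timing claimed):

```python
import numpy as np

n = 50
a = np.array(range(1, 1000)) / 1000.

# With increasing=True, column k of the Vandermonde matrix is a**k, k = 0..n-1,
# i.e. the transpose of the question's list-comprehension result.
V = np.vander(a, n, increasing=True)   # shape (len(a), n)
```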