The following function applies NumPy functions to two NumPy arrays.
import numpy as np
def my_func(a: np.ndarray, b: np.ndarray) -> float:
    return np.nanmin(a, axis=0) + np.nanmin(b, axis=0)
>>> my_func(np.array([1., 2., np.nan]), np.array([1., np.nan]))
However, what is the best way to apply this same function to an object array whose elements are arrays of different shapes?
a = np.array([np.array([1., 2]), np.array([1, 2., 3, np.nan])], dtype=object) # First array shape (2,), second (4,)
b = np.array([np.array([1]), np.array([1.5, 2.5, np.nan])], dtype=object)
np.vectorize does work
>>> np.vectorize(my_func)(a, b)
array([2. , 2.5])
but as specified by the vectorize documentation:
The vectorize function is provided primarily for convenience, not for
performance. The implementation is essentially a for loop.
Is there a more clever solution ?
I could use np.pad to give the inner arrays identical shapes, but that seems sub-optimal because it requires padding every inner array up to the maximum length (here 4 for a and 3 for b).
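For reference, the padding idea could be sketched roughly as follows. The helper name `pad_with_nan` is hypothetical (it is not a NumPy function), it uses `np.full` plus slice assignment rather than `np.pad` itself, and it assumes every inner array is one-dimensional and non-empty.

```python
import numpy as np

def pad_with_nan(ragged):
    """Hypothetical helper: stack 1-D arrays into a 2-D float array, padded with NaN."""
    width = max(len(row) for row in ragged)
    out = np.full((len(ragged), width), np.nan)
    for i, row in enumerate(ragged):
        out[i, :len(row)] = row  # copy the real values; the tail stays NaN
    return out

a = np.array([np.array([1., 2]), np.array([1, 2., 3, np.nan])], dtype=object)
b = np.array([np.array([1]), np.array([1.5, 2.5, np.nan])], dtype=object)

# nanmin ignores the NaN padding, so each row reduces as if it were unpadded.
result = np.nanmin(pad_with_nan(a), axis=1) + np.nanmin(pad_with_nan(b), axis=1)
print(result)  # [2.  2.5]
```

The memory cost is one full row of the maximum width per inner array, which is the overhead the paragraph above worries about.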
I looked at numba and at this Stack Exchange question about performance, but I am not sure what the best practice is for such a case.
Thanks !

Your function and arrays:
In [222]: def my_func(a: np.ndarray, b: np.ndarray) -> float:
...:     return np.nanmin(a, axis=0) + np.nanmin(b, axis=0)
In [223]: a = np.array([np.array([1., 2]), np.array([1, 2., 3, np.nan])], dtype=object
...: ) # First array shape (2,), second (4,)
...: b = np.array([np.array([1]), np.array([1.5, 2.5, np.nan])], dtype=object)
In [224]: a
Out[224]: array([array([1., 2.]), array([ 1., 2., 3., nan])], dtype=object)
In [225]: b
Out[225]: array([array([1]), array([1.5, 2.5, nan])], dtype=object)
Compare vectorize with a straightforward list comprehension:
In [226]: np.vectorize(my_func)(a, b)
Out[226]: array([2. , 2.5])
In [227]: [my_func(i,j) for i,j in zip(a,b)]
Out[227]: [2.0, 2.5]
and their times:
In [228]: timeit np.vectorize(my_func)(a, b)
157 µs ± 117 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [229]: timeit [my_func(i,j) for i,j in zip(a,b)]
85.9 µs ± 148 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [230]: timeit np.array([my_func(i,j) for i,j in zip(a,b)])
89.7 µs ± 1.03 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
If you are going to work with object arrays, frompyfunc is faster than vectorize:
In [231]: np.frompyfunc(my_func,2,1)(a, b)
Out[231]: array([2.0, 2.5], dtype=object)
In [232]: timeit np.frompyfunc(my_func,2,1)(a, b)
83.2 µs ± 50.1 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
I'm a bit surprised that it's even better than the list comprehension.
frompyfunc (and vectorize) are more useful when the inputs need to 'broadcast' against each other:
In [233]: np.frompyfunc(my_func,2,1)(a[:,None], b)
array([[2.0, 2.5],
[2.0, 2.5]], dtype=object)
I'm not a numba expert, but I suspect it doesn't handle object dtype arrays, or if it does, it doesn't improve speed much. Remember, object dtype means the elements are object references, just like in lists.
I get better times by using otypes and taking the function creation out of the timing loop:
In [235]: %%timeit f=np.vectorize(my_func, otypes=[float])
...: f(a, b)
95.5 µs ± 316 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [236]: %%timeit f=np.frompyfunc(my_func,2,1)
...: f(a, b)
81.1 µs ± 103 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
If you don't know about otypes, you haven't read the np.vectorize docs well enough.
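If avoiding the Python-level call per element matters, one possible direction (not timed here, so treat it as an idea rather than a recommendation) is to flatten each ragged array once and reduce the segments with `np.fmin.reduceat`. The helper name `ragged_nanmin` is hypothetical, and the sketch assumes every inner array is 1-D and non-empty, since `reduceat` does not treat empty segments the way a per-array `nanmin` would.

```python
import numpy as np

def ragged_nanmin(ragged):
    """Hypothetical helper: NaN-ignoring minimum of each 1-D array in `ragged`."""
    lengths = np.array([len(row) for row in ragged])
    flat = np.concatenate([np.asarray(row, dtype=float) for row in ragged])
    # Start index of each inner array inside the flattened buffer.
    starts = np.concatenate(([0], np.cumsum(lengths)[:-1]))
    # np.fmin prefers the non-NaN operand, so NaN entries are skipped
    # unless a whole segment is NaN.
    return np.fmin.reduceat(flat, starts)

a = np.array([np.array([1., 2]), np.array([1, 2., 3, np.nan])], dtype=object)
b = np.array([np.array([1]), np.array([1.5, 2.5, np.nan])], dtype=object)

result = ragged_nanmin(a) + ragged_nanmin(b)
print(result)  # [2.  2.5]
```

The flattening step still loops over the inner arrays in Python, so any gain would come only from the reduction itself.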


Most computationally efficient method to get the rest of the array of a slice in numpy array?

For a numpy array
a = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8])
You can get a slice using something like a[3:6]
But what about getting the rest of the slice? What is the most computationally efficient method for this? So something like a[:3, 6:].
The best I can come up with is to use a concatenate.
np.concatenate([a[:3], a[6:]], axis=0)
I am wondering if this is the best method, as I will be doing millions of these operations for a data processing pipeline.
Your solution seems to be the most efficient one since it is more than 2x faster than the next best thing.
a = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8])
%timeit -n 100000 np.concatenate([a[:3], a[6:]], axis=0)
%timeit -n 100000 np.delete(a, slice(3, 6))
%timeit -n 100000 a[np.r_[:3,6:]]
>2.03 µs ± 75.5 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
>4.61 µs ± 146 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
>11 µs ± 350 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
However, the real question is whether these operations (taking the complement of a slice, i.e. deletion) need to be applied consecutively. If not, you could aggregate the indices via set operations and take the complement a single time at the end to obtain the final NumPy array.
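One way to read that suggestion is to accumulate everything to be dropped in a boolean mask and index once. This is only a sketch: the list `slices_to_drop` is a hypothetical example input, and a mask is used here in place of explicit set operations.

```python
import numpy as np

a = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8])

# Hypothetical slices to remove; they are allowed to overlap.
slices_to_drop = [slice(3, 6), slice(5, 7)]

keep = np.ones(len(a), dtype=bool)
for s in slices_to_drop:
    keep[s] = False  # mark every index covered by any slice

rest = a[keep]  # a single copy at the end
print(rest)  # [0 1 2 7 8]
```

Note that all slices here refer to positions in the original array, which differs from deleting consecutively, where later indices shift after each deletion.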
I find that declaring an empty array and filling it up is very slightly faster than using concatenate. As André mentioned in their comment, this will vary with the shape.
a = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8])
def testing123():
    new = np.zeros(6, dtype=int)
    new[:3] = a[0:3]
    new[3:] = a[6:]
    return new
%timeit -n 100000 np.concatenate([a[:3], a[6:]], axis=0)
100000 loops, best of 5: 2.18 µs per loop
%timeit -n 100000 np.delete(a, slice(3, 6))
100000 loops, best of 5: 6.11 µs per loop
%timeit -n 100000 a[np.r_[:3,6:]]
100000 loops, best of 5: 16.4 µs per loop
%timeit -n 100000 testing123()
100000 loops, best of 5: 2.01 µs per loop
a = np.arange(10_000)
def testing123():
    new = np.empty(5000, dtype=int)
    new[:2500] = a[:2500]
    new[2500:] = a[7500:]
    return new
%timeit -n 100000 np.concatenate([a[:2500], a[7500:]], axis=0)
100000 loops, best of 5: 3.99 µs per loop
%timeit -n 100000 np.delete(a, slice(2500, 7500))
100000 loops, best of 5: 7.76 µs per loop
%timeit -n 100000 a[np.r_[:2500,7500:]]
100000 loops, best of 5: 47.3 µs per loop
%timeit -n 100000 testing123()
100000 loops, best of 5: 3.61 µs per loop

Get max and std over array-fields in dataframe column pandas

(Pandas version 1.1.1.)
I have arrays as entries in the cells of a Dataframe column.
a = np.array([1,8])
b = np.array([5,14])
df = pd.DataFrame({'float':[1,2], 'array': [a,b]})
> float array
> 0 1 [1, 8]
> 1 2 [5, 14]
Now I need some statistics over each array position.
It works perfectly with the mean:
> array([ 3., 11.])
But if I try to do it with the maximum or the standard deviation, errors occur:
> setting an array element with a sequence.
> The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
It seems like .mean(), .std() and .max() are implemented differently. Anyhow, does anyone have an idea how to calculate the std and max (and min etc.) without splitting the array into several columns?
(The DataFrame has arrays of different shapes, but I only want to calculate statistics within a .groupby() over rows where the arrays have the same shape.)
You can convert the column to a 2D array and use NumPy for the computation:
a = np.array([1,8])
b = np.array([5,14])
df = pd.DataFrame({'float':[1,2], 'array': [a,b]})
#2k for test
df = pd.concat([df] * 1000, ignore_index=True)
In [150]: %timeit (pd.DataFrame(df['array'].tolist(), index=df.index).std())
4.25 ms ± 305 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [151]: %timeit (np.std(np.array(df['array'].tolist()), ddof=1, axis=0))
944 µs ± 1.59 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [152]: %timeit (pd.DataFrame(df['array'].tolist(), index=df.index).max())
4.31 ms ± 646 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [153]: %timeit (np.max(np.array(df['array'].tolist()), axis=0))
836 µs ± 1.47 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
For 20k rows:
df = pd.concat([df] * 10000, ignore_index=True)
In [155]: %timeit (pd.DataFrame(df['array'].tolist(), index=df.index).std())
35.3 ms ± 87.6 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [156]: %timeit (np.std(np.array(df['array'].tolist()), ddof=1, axis=0))
9.13 ms ± 170 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [157]: %timeit (pd.DataFrame(df['array'].tolist(), index=df.index).max())
35.3 ms ± 127 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [158]: %timeit (np.max(np.array(df['array'].tolist()), axis=0))
8.21 ms ± 27.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
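To make the values visible rather than just the timings, here is the same NumPy approach applied to the original two-row frame. It is a sketch that assumes all arrays in the column have the same length (for mixed shapes, it would have to be applied per group of equal-length rows).

```python
import numpy as np
import pandas as pd

a = np.array([1, 8])
b = np.array([5, 14])
df = pd.DataFrame({'float': [1, 2], 'array': [a, b]})

# One row per DataFrame row, one column per array position.
stacked = np.array(df['array'].tolist())

col_max = stacked.max(axis=0)
col_std = stacked.std(axis=0, ddof=1)  # ddof=1 matches pandas' default sample std

print(col_max)  # [ 5 14]
print(col_std)
```

`ddof=1` is passed explicitly because NumPy's `std` defaults to `ddof=0`, whereas pandas uses the sample standard deviation by default.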

Multiply each element of 2 numpy arrays and then sum up

Basically I have two 1d numpy arrays, let's call them x and y, both of the same length. I want to essentially get the result x1*y1 + x2*y2 + ... + xn*yn. Obviously I could do this with a for loop but is there a built-in method or something where I can do this in one line?
What you are trying to compute is known as an 'inner product' and, in the case of two vectors, is called a 'dot product'. Numpy has built-in functions for computing both which are optimized for speed over the simple (x*y).sum() solution.
import numpy as np
a = np.array([1, 2, 3])
b = np.array([3, 2, 1])
print(np.inner(a, b))
# 10
print(np.dot(a, b))
# 10
Some timing results are shown below, with vectors a and b each being 1000 randomly selected elements from np.random.randn:
np.dot(a, b) # 920 ns ± 9.9 ns
np.inner(a, b) # 1.1 µs ± 83.5 ns
(a*b).sum() # 4.2 µs ± 62.9 ns
np.sum(a*b) # 5.7 µs ± 170 ns
You can use sum(x*y) or (x*y).sum(); they give the same result.
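As a quick sanity check that these spellings agree with the explicit loop from the question, here is a small sketch. The values of `x` and `y` are made up for illustration, and `x @ y` is the operator form of the same 1-D dot product.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])

# The explicit loop: x1*y1 + x2*y2 + ... + xn*yn
loop_total = 0.0
for xi, yi in zip(x, y):
    loop_total += xi * yi

dot_total = np.dot(x, y)
matmul_total = x @ y
sum_total = (x * y).sum()

print(loop_total, dot_total, matmul_total, sum_total)  # 32.0 32.0 32.0 32.0
```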

how to optimize iteration on exponential in numpy

Say I have the following numpy array
n = 50
a = np.array(range(1, 1000)) / 1000.
I would like to execute this line of code
%timeit v = [a ** k for k in range(0, n)]
1000 loops, best of 3: 2.01 ms per loop
However, this line of code will ultimately be executed in a loop, therefore I have performance issues.
Is there a way to optimize the loop? For example, the result of a specific calculation i in the list comprehension is simply the result of the previous calculation result in the loop, multiplied by a again.
I don't mind storing the results in a 2d-array instead of arrays in a list. That would probably be cleaner. By the way, I also tried the following, but it yields similar performance results:
k = np.array(range(0, n))
ones = np.ones(n)
temp = np.outer(a, ones)
And then performed the following calculation
%timeit temp ** k
1000 loops, best of 3: 1.96 ms per loop
%timeit np.power(temp, k)
1000 loops, best of 3: 1.92 ms per loop
But both yield similar timings to the list comprehension above. By the way, n will always be an integer in my case.
In quick tests cumprod seems to be faster.
In [225]: timeit v = np.array([a ** k for k in range(0, n)])
2.76 ms ± 1.62 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [228]: %%timeit
...: A=np.broadcast_to(a[:,None],(len(a),50))
...: v1=np.cumprod(A,axis=1)
208 µs ± 42.3 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
To compare values I have to tweak ranges, since v includes a 0 power, while v1 starts with a 1 power:
In [224]: np.allclose(np.array(v)[1:], v1.T[:-1])
Out[224]: True
But the timings suggest that cumprod is worth refining.
The proposed duplicate was Efficient way to compute the Vandermonde matrix. That still has good ideas.
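One possible refinement, sketched here under the assumption that the 0th power should be included, is to seed the first row with ones so the cumulative product lines up with `range(0, n)` directly. `np.vander` with `increasing=True` builds the same powers (transposed) and is used here only as a cross-check.

```python
import numpy as np

n = 50
a = np.arange(1, 1000) / 1000.

# Row 0 is all ones (a ** 0); rows 1..n-1 each hold a copy of a.
seed = np.ones((n, len(a)))
seed[1:] = a

# Running product down the rows: row k becomes a ** k.
powers = np.cumprod(seed, axis=0)

reference = np.array([a ** k for k in range(0, n)])
vander = np.vander(a, n, increasing=True)  # shape (len(a), n)

print(np.allclose(powers, reference))  # True
print(np.allclose(powers, vander.T))   # True
```

Repeated multiplication can differ from `a ** k` in the last few bits, which is why the comparison uses `np.allclose` rather than exact equality.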

numpy.array(list) being slow

So I have a list with 5,000,000 integers, and I want to convert the list to a numpy array. I tried the following code:
numpy.array( list )
But it is very slow.
I benchmarked this operation 100 times and compared it with looping over the list 100 times. There is not much difference.
Any good idea how to make it faster?
If you have Cython, you can create a function that is definitely faster. But just a warning: it will crash if there are invalid elements inside your list (non-integers or integers that are too big).
I use the IPython magic here (%load_ext cython and %%cython); the point is to show what the function looks like, not how to compile Cython code (it's not hard, and Cython's "how-to-compile" documentation is quite good).
%load_ext cython
%%cython
cimport cython
import numpy as np
cpdef to_array(list inp):
    cdef long[:] arr = np.zeros(len(inp), dtype=long)
    cdef Py_ssize_t idx
    for idx in range(len(inp)):
        arr[idx] = inp[idx]
    return np.asarray(arr)
And the timings:
import numpy as np
def other(your_list): # the approach from #Damian Lattenero in the other answer
    ret = np.zeros(shape=(len(your_list)), dtype=int)
    np.copyto(ret, your_list)
    return ret
inp = list(range(1000000))
%timeit np.array(inp)
# 315 ms ± 5.42 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit np.array(inp, dtype=int)
# 311 ms ± 2.28 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit other(inp)
# 316 ms ± 3.97 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit to_array(inp)
# 23.4 ms ± 1.15 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
So it's more than 10 times faster.
I think this is fast, I checked the times:
import numpy as np
import time
start_time = time.time()
number = 1
elements = 10000000
your_list = [number] * elements
ret = np.zeros(shape=(len(your_list)))
np.copyto(ret, your_list)
print("--- %s seconds ---" % (time.time() - start_time))
--- 0.7615997791290283 seconds ---
Make a big list of small integers; use the numpy crutch:
In [619]: arr = np.random.randint(0,256, 5000000)
In [620]: alist = arr.tolist()
In [621]: timeit alist = arr.tolist() # just for reference
10 loops, best of 3: 108 ms per loop
And time for plain list iteration (doesn't do anything)
In [622]: timeit [i for i in alist]
10 loops, best of 3: 193 ms per loop
Make an array of specified dtype
In [623]: arr8 = np.array(alist, 'uint8')
In [624]: timeit arr8 = np.array(alist, 'uint8')
1 loop, best of 3: 508 ms per loop
We can get a 2x improvement with fromiter; evidently it does less checking. np.array will work even if the list is a mix of numbers and strings. It also handles lists of lists etc.
In [625]: timeit arr81 = np.fromiter(alist, 'uint8')
1 loop, best of 3: 249 ms per loop
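Since the list length is known up front, `fromiter` can also be given a `count`, which lets it allocate the output once instead of growing it as it reads. A minimal sketch, using a small stand-in list and `np.int64` as an assumed dtype (no timing is claimed here):

```python
import numpy as np

# Small stand-in for the 5,000,000-element list from the question.
alist = list(range(1000))

# count tells fromiter how many items to read, so it can preallocate.
arr = np.fromiter(alist, dtype=np.int64, count=len(alist))

print(arr.dtype, arr.shape)  # int64 (1000,)
```

Like the plain `fromiter` call, this only works for a flat iterable of scalars that fit the chosen dtype.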
The advantage of working with arrays becomes apparent when we do math across the whole thing:
In [628]: timeit arr8.sum()
100 loops, best of 3: 6.93 ms per loop
In [629]: timeit sum(alist)
10 loops, best of 3: 74.4 ms per loop
In [630]: timeit 2*arr8
100 loops, best of 3: 6.89 ms per loop
In [631]: timeit [2*i for i in alist]
1 loop, best of 3: 465 ms per loop
It's well known that working with arrays is faster than with lists, but that there is a significant 'startup' overhead.
