I would like to vectorize the NumPy function polyder, which computes derivatives of polynomials. Is there a simple way or a built-in function to do it?
By vectorize, I mean that if the input is an array of polynomials, the output would be an array with the derivative of each polynomial.
An example:
p = np.array([[3,4,5], [1,2]])
the output should be something like
np.array([[6, 4], [1]])
Since your subarrays, both input and output, can have different lengths, you are better off treating both as lists.
In [97]: [np.polyder(d) for d in [[3,4,5],[1,2]]]
Out[97]: [array([6, 4]), array([1])]
Your p is just a list in an expensive (timewise) array wrapper.
In [100]: p=np.array([[3,4,5],[1,2]])
In [101]: p
Out[101]: array([[3, 4, 5], [1, 2]], dtype=object)
There is little that you can do with such an array that you can't do just as well with a list. Do some time tests. You will probably find that iterating over an object array is slower than iterating over the equivalent list, especially once you take into account the time it takes to convert a list to an array.
It can also be tricky to create such arrays. If all the sublists are the same length, the result will be a 2d array. Forcing them to be an object array takes special initialization, as sketched below.
A general rule of thumb is: if the individual steps work with arrays or lists of different lengths, you probably can't vectorize. That is, you can't form a rectangular 2d array and apply vector operations.
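A minimal sketch of that special initialization (assuming a recent NumPy, where ragged input without dtype=object is an error): allocate an empty object array first, then fill it:
import numpy as np

# Allocate an object array, then fill it element by element,
# so NumPy never tries to build a rectangular 2d array.
p = np.empty(2, dtype=object)
p[0] = [3, 4, 5]
p[1] = [1, 2]
print(p)  # array([list([3, 4, 5]), list([1, 2])], dtype=object)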
If the polynomial lists were all the same length, then p could be 2d, and the result could also be that:
In [107]: p=np.array([[3,4,5],[0,1,2]])
In [108]: p
Out[108]:
array([[3, 4, 5],
[0, 1, 2]])
In [109]: np.array([np.polyder(i) for i in p])
Out[109]:
array([[6, 4],
[0, 1]])
In effect it is iterating over the rows of p, and then reassembling the result into an array. There are some numpy functions that streamline iteration (but don't speed it up much), but I see little need for those here.
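One such iteration helper, as a hedged sketch, is np.frompyfunc, which wraps a Python function as a ufunc over object arrays; it still loops in Python under the hood, so don't expect a real speedup:
import numpy as np

# Wrap np.polyder so it applies element-wise over an object array.
polyder_obj = np.frompyfunc(np.polyder, 1, 1)

p = np.empty(2, dtype=object)
p[0] = [3, 4, 5]
p[1] = [1, 2]
print(polyder_obj(p))  # object array holding array([6, 4]) and array([1])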
Looking at the code of this function, the core is:
p = NX.asarray(p)
n = len(p) - 1
y = p[:-1] * NX.arange(n, 0, -1)
which, for this 2d array (rows of length 3), becomes:
In [117]: p[:,:-1]*np.arange(2,0,-1)
Out[117]:
array([[6, 4],
[0, 1]])
So if the polynomials are all the same length, this simple multiplication gives the 1st-order derivative coefficients. And of course the rows can be padded so they are all the same length. So 'vectorization' is easier than I initially thought.
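A minimal sketch of that padded approach (the names polys, width, and deriv are illustrative, not from the original): left-pad the coefficient lists with zeros to a common length, then apply the same multiplication polyder uses internally:
import numpy as np

polys = [[3, 4, 5], [1, 2]]
width = max(len(q) for q in polys)
# Left-pad with zeros so every row has the same length.
p = np.array([[0] * (width - len(q)) + list(q) for q in polys])

n = p.shape[1] - 1
deriv = p[:, :-1] * np.arange(n, 0, -1)  # first-derivative coefficients
print(deriv)
# [[6 4]
#  [0 1]]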
import numpy as np
p = np.array([[3,4,5], [1,2]], dtype=object)  # dtype=object is required for ragged input in newer NumPy
np.array([np.polyder(coefficients) for coefficients in p], dtype=object)  # array([array([6, 4]), array([1])], dtype=object)
would fulfill your interface for your specific example. But as hpaulj mentions, there's little sense in working with NumPy arrays instead of normal python lists here, and no actual (hardware-level) vectorization will happen. (Though, as with list comprehensions in general, the interpreter would be free to employ other means of parallelism to compute them.)
Summary of problem
Ultimate goal
I would like to take a sub-array from a large input numpy array. This sub-array is dynamic: every iteration through the larger input array changes it, so that I can perform a set of calculations that depend on previous iterations of the array. This involves nested for loops, which I realize is not very pythonic, but I don't know of another way.
Problem
The problem arises when I add to the existing dynamic sub-array: it seems to grow extra bracketing. This seems simple to fix, but I am having trouble adapting my MATLAB knowledge of array indexing to numpy indexing. I have not even started implementing my calculations yet, but I cannot seem to get the structure of this loop correct.
What I've tried
I have [tried this originally in Pandas][1]. Originally, I thought I could write a pretty simple program to do this using pandas indexing and column naming. But it was SLOW! So I am trying to streamline this by
changing the architecture and
relying on numpy instead of Pandas.
Below is a simple program that emulates what I want to do. I am sure I will have other questions, but this is the start. I have a simple (5, 2) array that I loop through the rows of. With each row after row 0, I add the new row to the top of the temp sub-array and delete the last row of the array, maintaining a (2, 2) array throughout. However, as you will see when you run this code, it results in some strange behavior that prevents writing the results into the output array. You will also see that I have tried several ways to add and delete columns. Whether these are optimal is beside the point - the current code is the closest I have gotten to running this program!
Some Example code
This code 'works' in the sense that it doesn't throw errors. However, it doesn't produce the desired results. In this case that would be an output array with the same values as the inputs (because I am not doing any calculations - this is just to get the architecture correct). The desired result would be that each loop creates a sub-array in this order:
n=1 [1 1]
n=2 [[1,1], [2,2]]
n=3 [[2, 2], [3, 3]]
n=4 [[3, 3], [4, 4]]
...
N [[N-1, N-1], [N, N]].
This does not need to be limited to 2 items (if a list) or rows (if an array), and the length will be set by an input variable. Thus, the size of this array must be dynamic (set during the call of the function). Furthermore, I supply a simple example here, but each loop will basically need to add a row from the input. It will be a little more advanced than simply a 2-row ndarray. Lists have the advantage of .append and .pop methods, but as far as I can tell, arrays do not. I present the following code example using only arrays.
import numpy as np

a = np.array([[1, 1], [2, 2], [3, 3], [4, 4], [5, 5]])
print('Original a array: ', a)
out = np.empty_like(a)
b = np.empty(len(a[0, :]))
for ii, rr in enumerate(a):
    if ii == 0:
        c = [a[ii]]
    else:
        print('Before: ', c)
        # Add next row from array a to the temp array for calculations
        c = np.insert(c, 1, [rr], axis=0)
        print('During: ', c)
        # Remove the last row of the temp array prior to calculations
        # indices_to_remove = [0]
        # d = c[~np.isin(np.arange(c.size), [indices_to_remove])]
        d = c[1::]
        c = [d]
        print('After: ', c)
    # Add the temp array to the output array after calculations
    # THIS THROWS ERRORS, AND I THINK IT IS DUE TO THE INCREASING NUMBERS OF BRACKETS.
    # out[ii, :] = c
    # print(c)
[1]: https://stackoverflow.com/questions/70186681/nested-loops-altering-rows-in-pandas-avoiding-a-value-is-trying-to-be-set-on?noredirect=1#comment124076103_70186681
MATLAB uses 1-based indexing whereas Python uses 0-based indexing.
Let's say we have a 2D array like this:
a= [[1, 2],
[3, 4],
[5, 6]]
In MATLAB, a(1, 1) gives 1; in Python, a[1, 1] gives 4. Also, as you know, MATLAB supports linear indexing, so a(6) returns 6; if you try a[6] in Python you will get an IndexError. Otherwise the indexing is almost the same, except that in Python it starts from 0.
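A quick demonstration of the offset (note that MATLAB counts linear indices column-major, while NumPy's flat iterator is row-major; for this particular array both happen to give 6):
import numpy as np

a = np.array([[1, 2],
              [3, 4],
              [5, 6]])

print(a[0, 0])    # 1 -- the element MATLAB addresses as a(1, 1)
print(a[1, 1])    # 4
print(a.flat[5])  # 6 -- the closest analogue of MATLAB's linear a(6)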
Here is the working example with your desired results.
import numpy as np
a = np.array([[1, 1], [2, 2], [3, 3], [4,4], [5,5]])
test = []
for idx in range(len(a)):
    if idx == 0:
        test.append(a[idx])
    test.append(a[idx:idx+2, :])

# remove the last [5, 5]
test.pop(-1)

for i in test:
    print(i, end=',\n')
output
[1 1],
[[1 1]
[2 2]],
[[2 2]
[3 3]],
[[3 3]
[4 4]],
[[4 4]
[5 5]],
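As an aside, a possible alternative sketch (not from the answer above) uses collections.deque with maxlen, which gives exactly the append/pop behavior the question asks for, with the window size set by a variable:
import numpy as np
from collections import deque

a = np.array([[1, 1], [2, 2], [3, 3], [4, 4], [5, 5]])
window = 2  # would be set during the call of the real function

buf = deque(maxlen=window)  # oldest row drops out automatically
for row in a:
    buf.append(row)
    c = np.array(buf)  # the current sub-array, ready for calculations
    print(c)
The first iteration yields a (1, 2) array rather than the flat [1 1] shown above, but from n=2 onward the windows match the desired output.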
I am working on implementing a kmeans algorithm in python.
I am testing out new ways of initializing my centroids and wanted to implement this and see what effect it would have on the clusters.
My idea is to select datapoints from my data set in a way that the centroids are initialized to edge points of my data.
A simple 2-attribute example:
Let's say this is my input array:
input = np.array([[3,3], [1,1], [-1,-1], [3,-3], [-1,1], [-3,3], [1,-1], [-3,-3]])
From this array I would like to select the edge points, which would be [3,3], [-3,-3], [-3,3] and [3,-3]. So if my k is 4, these points would be selected.
The data I am working with has 4 or 9 attributes and around 300 data points.
Note: I have not found a solution for when k is not equal to the number of edge points, but if k is greater than the number of edge points, I think I would select those 4 points and then try to place the rest of them around the center point of the graph.
I have also thought about finding the max and min of each column and trying to find the edges of my data set from there, but I don't have an effective way of identifying the edges from those values.
If you believe this idea will not work I would love to hear what you have to say.
Questions
Does numpy have such a function to get the indexes of data points on the edge of my data set?
If not, how would I go at finding these edge points in my data set?
Use scipy and pair-wise distances to find how far each point is from every other:
from scipy.spatial.distance import pdist, squareform
p=pdist(input)
Then, use squareform to reshape the condensed vector p into a square matrix:
s=squareform(pdist(input))
Then, use numpy argwhere to find the indices where the distance is at its maximum, and look those indices up in the input array:
input[np.argwhere(s==np.max(p))]
array([[[ 3, 3],
[-3, -3]],
[[ 3, -3],
[-3, 3]],
[[-3, 3],
[ 3, -3]],
[[-3, -3],
[ 3, 3]]])
Complete code would be:
import numpy as np
from scipy.spatial.distance import pdist, squareform

p = pdist(input)
s = squareform(p)
input[np.argwhere(s == np.max(p))]
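Note that the argwhere lookup returns each extreme pair twice, once as (i, j) and once as (j, i). A possible cleanup sketch (the names data and edges are illustrative; input is renamed to avoid shadowing the Python builtin) collapses the pairs into the unique edge points:
import numpy as np
from scipy.spatial.distance import pdist, squareform

data = np.array([[3, 3], [1, 1], [-1, -1], [3, -3],
                 [-1, 1], [-3, 3], [1, -1], [-3, -3]])
s = squareform(pdist(data))
pairs = data[np.argwhere(s == s.max())]  # shape (n_pairs, 2, n_features)
edges = np.unique(pairs.reshape(-1, data.shape[1]), axis=0)
print(edges)
# [[-3 -3]
#  [-3  3]
#  [ 3 -3]
#  [ 3  3]]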
I have a list of 50 numpy arrays called vectors:
[array([0.1, 0.8, 0.03, 1.5], dtype=float32), array([1.2, 0.3, 0.1], dtype=float32), .......]
I also have a smaller list (means) of 10 numpy arrays, all of which are from the bigger list above. I want to loop through each array in means and find its position in vectors.
So when I do this:
for c in means:
    print(vectors.index(c))
I get the error:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
I've gone through various SO questions and I know why I'm getting this error, but I can't find a solution. Any help?
Thanks!
One possible solution is converting to a list.
vectors = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]], np.int32)
print(vectors.tolist().index([1, 2, 3]))
This will return 0, because [1, 2, 3] can be found at index 0 of vectors.
The example above uses a 2d NumPy array; however, you seem to have a list of NumPy arrays, so I would convert it to a list of lists this way:
vectors = [arr.tolist() for arr in vectors]
Do the same for means:
means = [arr.tolist() for arr in means]
Now we are working with two lists of lists, so your original for loop will work:
for c in means:
    print(vectors.index(c))
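Alternatively, a sketch that skips the list conversion entirely: compare with np.array_equal, which also copes with arrays of different lengths (the sample data here is made up to match the question's description):
import numpy as np

vectors = [np.array([0.1, 0.8, 0.03, 1.5], dtype=np.float32),
           np.array([1.2, 0.3, 0.1], dtype=np.float32)]
means = [np.array([1.2, 0.3, 0.1], dtype=np.float32)]

for c in means:
    # next() stops at the first index whose array matches c exactly
    idx = next(i for i, v in enumerate(vectors) if np.array_equal(v, c))
    print(idx)  # 1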
I have two large data files, one with two columns and one with three columns. I want to select all the rows from the second file that are contained in the first array. My idea was to compare the numpy arrays.
Let's say I have:
a = np.array([[1, 2, 3], [3, 4, 5], [1, 4, 6]])
b = np.array([[1, 2], [3, 4]])
and the result should look like this:
[[1, 2, 3], [3, 4, 5]]
Any advice on that?
EDIT:
So in the end this works. Not very handy but it works.
for ii in range(a.shape[0]):
    u, v, w = a[ii, :]
    for jj in range(b.shape[0]):
        if u == b[jj, 0] and v == b[jj, 1]:
            print([u, v, w])
The numpy_indexed package (disclaimer: I am its author) contains functionality to solve such problems efficiently, without using any python loops:
import numpy_indexed as npi
a[npi.contains(b, a[:, :2])]
If you prefer not to use another library but want to do this in numpy only, you can do something similar to what is suggested here and here, namely to use np.in1d (see docs), which provides you with a mask indicating whether each element of one 1D array exists in another 1D array. As the name indicates, this function only works for 1D arrays. But you can use a structured array view (using ndarray.view) to cheat numpy into thinking you have 1D arrays. One caveat, though: you need a deep copy of the first array a, since view doesn't mix well with slices. But if that is not too big of an issue for you, something along the lines of:
a_cp = a[:, :2].copy()
a[np.in1d(a_cp.view((np.void, a_cp.dtype.itemsize*a_cp.shape[1])).ravel(),
b.view((np.void, b.dtype.itemsize*b.shape[1])).ravel())]
might work for you.
This directly uses the boolean mask to return the matching rows from your array a.
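For completeness, a loop-free sketch in plain numpy (assuming both arrays are small enough for the (len(a), len(b)) broadcast to fit in memory): compare every 2-column prefix of a against every row of b, then keep the rows of a with at least one match:
import numpy as np

a = np.array([[1, 2, 3], [3, 4, 5], [1, 4, 6]])
b = np.array([[1, 2], [3, 4]])

# (len(a), len(b), 2) comparison, reduced to a boolean row mask.
mask = (a[:, None, :2] == b[None, :, :]).all(axis=-1).any(axis=1)
print(a[mask])
# [[1 2 3]
#  [3 4 5]]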
Check this, @Ernie. It may help you get to the solution. ;D
http://docs.scipy.org/doc/numpy-1.10.1/reference/generated/numpy.in1d.html
I am looking for a way to loop over 1D fibers (row, column, and multi-dimensional equivalents) along any dimension in a 3+-dimensional array.
In a 2D array this is fairly trivial since the fibers are rows and columns, so just saying for row in A gets the job done. But for 3D arrays for example, this expression iterates over 2D slices, not 1D fibers.
A working solution is the one below:
import numpy as np
A = np.arange(27).reshape((3,3,3))
func = np.sum
for fiber_index in np.ndindex(A.shape[:-1]):
    print(func(A[fiber_index]))
However, I am wondering whether there is something that is:
More idiomatic
Faster
Hope you can help!
I think you might be looking for numpy.apply_along_axis
In [10]: def my_func(x):
    ...:     return x**2 + x
In [11]: np.apply_along_axis(my_func, 2, A)
Out[11]:
array([[[ 0, 2, 6],
[ 12, 20, 30],
[ 42, 56, 72]],
[[ 90, 110, 132],
[156, 182, 210],
[240, 272, 306]],
[[342, 380, 420],
[462, 506, 552],
[600, 650, 702]]])
Although many NumPy functions (including sum) have their own axis argument to specify which axis to use:
In [12]: np.sum(A, axis=2)
Out[12]:
array([[ 3, 12, 21],
[30, 39, 48],
[57, 66, 75]])
numpy provides a number of different ways of looping over 1 or more dimensions.
Your example:
func = np.sum
for fiber_index in np.ndindex(A.shape[:-1]):
    print(fiber_index)
    print(A[fiber_index])
produces something like:
(0, 0)
[0 1 2]
(0, 1)
[3 4 5]
(0, 2)
[6 7 8]
...
It generates all index combinations over the first 2 dimensions, giving your function the 1D fiber on the last.
Look at the code for ndindex; it's instructive. I tried to extract its essence in https://stackoverflow.com/a/25097271/901925.
It uses as_strided to generate a dummy matrix over which an nditer iterates. It uses the 'multi_index' mode to generate an index set, rather than elements of that dummy. The iteration itself is done with a __next__ method. This is the same style of indexing that is currently used in numpy compiled code.
The nditer documentation, Iterating Over Arrays (http://docs.scipy.org/doc/numpy-dev/reference/arrays.nditer.html), has a good explanation, including an example of doing so in Cython.
Many functions, among them sum, max, product, let you specify which axis (axes) you want to iterate over. Your example, with sum, can be written as:
np.sum(A, axis=-1)
np.sum(A, axis=(1,2)) # sum over 2 axes
An equivalent is
np.add.reduce(A, axis=-1)
np.add is a ufunc, and reduce specifies an iteration mode. There are many other ufuncs, and other iteration modes - accumulate, reduceat. You can also define your own ufunc.
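For instance, a short illustration of two of those other iteration modes (using the same A as above):
import numpy as np

A = np.arange(27).reshape((3, 3, 3))

# accumulate: running sums along each 1D fiber of the last axis.
print(np.add.accumulate(A, axis=-1))

# reduceat: sums over the index blocks [0:2] and [2:] of each fiber.
print(np.add.reduceat(A, [0, 2], axis=-1))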
xnx suggests
np.apply_along_axis(np.sum, 2, A)
It's worth digging through apply_along_axis to see how it steps through the dimensions of A. In your example, it steps over all possible i,j in a while loop, calculating:
outarr[(i,j)] = np.sum(A[(i, j, slice(None))])
Including slice objects in the indexing tuple is a nice trick. Note that it edits a list, and then converts it to a tuple for indexing. That's because tuples are immutable.
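The same trick by hand: a tuple mixing integers and slice(None) is exactly what A[1, 2, :] builds behind the scenes:
import numpy as np

A = np.arange(27).reshape((3, 3, 3))

idx = (1, 2, slice(None))  # equivalent to A[1, 2, :]
print(A[idx])  # [15 16 17]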
Your iteration can be applied along any axis by rolling that axis to the end. This is a 'cheap' operation since it just changes the strides.
def with_ndindex(A, func, ax=-1):
    # apply func along axis ax
    A = np.rollaxis(A, ax, A.ndim)  # roll ax to end (changes strides)
    shape = A.shape[:-1]
    B = np.empty(shape, dtype=A.dtype)
    for ii in np.ndindex(shape):
        B[ii] = func(A[ii])
    return B
I did some timings on 3x3x3, 10x10x10 and 100x100x100 A arrays. This np.ndindex approach is consistently a third faster than the apply_along_axis approach. Direct use of np.sum(A, -1) is much faster.
So if func is limited to operating on a 1D fiber (unlike sum), then the ndindex approach is a good choice.