Replace zeros in a NumPy array with new values one by one

I'm stuck on a simple question in NumPy. I have an array of zeros, and as I generate new values I would like to insert them into the array one by one.
arr = np.array([0, 0, 0])
# something like this
l = [1, 5, 10]
for x in l:
    arr.append(x)  # from Python list logic; NumPy arrays have no append method
So I would like to insert each x into the array one by one, giving: 1st iteration arr=([1,0,0]); 2nd iteration arr=([1,5,0]); 3rd iteration arr=([1,5,10]).
Basically I need to substitute the zeros with new values one by one in NumPy (I am learning NumPy!).
I checked many NumPy options like np.append (it appends new values after the existing ones), but I can't find the right one.
Thank you!

There are a few things to pick up with numpy:
You can generate an array full of zeros with:
>>> np.zeros(3)
array([ 0., 0., 0.])
You can get/set array elements with indexing as with lists etc:
arr[2] = 7
for i, val in enumerate([1, 5, 10]):
    arr[i] = val
Or, if you want to fill the array with values from something like a list, you can create it directly:
>>> np.array([1, 5, 10])
array([ 1, 5, 10])
Also, numpy's signature for appending stuff to an array is a bit different:
arr = np.append(arr, 7)
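Putting those pieces together, here is a minimal sketch of the fill-in pattern the question asks for (the values list l is taken from the question):
import numpy as np

arr = np.zeros(3)          # array([0., 0., 0.])
l = [1, 5, 10]
for i, x in enumerate(l):
    arr[i] = x             # overwrite the i-th zero in place
    print(arr)             # [1. 0. 0.] -> [1. 5. 0.] -> [1. 5. 10.]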
Having said that, you should consider diving into NumPy's own user guide.

Related

Python: Finding a numpy array in a list of numpy arrays

I have a list of 50 numpy arrays called vectors:
[array([0.1, 0.8, 0.03, 1.5], dtype=float32), array([1.2, 0.3, 0.1], dtype=float32), .......]
I also have a smaller list (means) of 10 NumPy arrays, all of which are from the bigger list above. I want to loop through each array in means and find its position in vectors.
So when I do this:
for c in means:
    print(vectors.index(c))
I get the error:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
I've gone through various SO questions and I know why I'm getting this error, but I can't find a solution. Any help?
Thanks!
One possible solution is converting to a list.
vectors = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]], np.int32)
print(vectors.tolist().index([1, 2, 3]))
This will return 0, because [1, 2, 3] is found at index 0 of vectors.
The example above uses a 2D NumPy array; however, you seem to have a list of NumPy arrays, so I would convert it to a list of lists this way:
vectors = [arr.tolist() for arr in vectors]
Do the same for means:
means = [arr.tolist() for arr in means]
Now we are working with two lists of lists, so your original for loop will work:
for c in means:
    print(vectors.index(c))
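If converting everything to lists is undesirable, a sketch of an alternative (assuming exact element-wise matches; the data here is made up in the question's shape) is to compare arrays directly with np.array_equal, which also copes with arrays of different lengths:
import numpy as np

vectors = [np.array([0.1, 0.8, 0.03, 1.5]), np.array([1.2, 0.3, 0.1])]
means = [np.array([1.2, 0.3, 0.1])]

for c in means:
    # next() yields the first index whose array matches c exactly
    idx = next(i for i, v in enumerate(vectors) if np.array_equal(v, c))
    print(idx)  # 1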

Why does deepcopy change values of numpy array?

I am having a problem in which values in a NumPy array change after copying it with copy.deepcopy or numpy.copy; in fact, I get different values if I just print the array before copying it.
I am using Python 3.5, Numpy 1.11.1, Scipy 0.18.0
My starting array is contained in a list of tuples; each tuple is pair: a float (a time point) and a numpy array (the solution of an ODE at that time point), e.g.:
[(0.0, array([ 0., ... 0.])), ...
(3.0, array([ 0., ... 0.]))]
In this case, I want the array for the last time point.
When I call the following:
tandy = c1.IntegrateColony(3)
ylast = copy.deepcopy(tandy[-1][1])
print(ylast)
I get something that makes sense for the system I'm trying to simulate:
[7.14923891e-07 7.14923891e-07 ... 8.26478813e-01 8.85589634e-01]
However, with the following:
tandy = c1.IntegrateColony(3)
print(tandy[-1][1])
ylast = copy.deepcopy(tandy[-1][1])
print(ylast)
I get all zeros:
[0.00000000e+00 0.00000000e+00 ... 0.00000000e+00 0.00000000e+00]
[ 0. 0. ... 0. 0.]
I should add, with larger systems and different parameters, displaying tandy[k][1] (either with print() or just by calling it in the command line) shows all non-zero values that are all very close to zero, i.e. <1e-70, but that's still not sensible for the system.
With:
tandy = c1.IntegrateColony(3)
ylast = np.copy(tandy[-1][1])
print(ylast)
I get sensible output again:
[7.14923891e-07 7.14923891e-07 ... 8.26478813e-01 8.85589634e-01]
The function that generates 'tandy' is the following (edited for clarity), which uses scipy.integrate.ode, and the set_solout method to get the solution at intermediate time points:
def IntegrateColony(self, tmax=1):
    # I edited out initialization of dCdt & first_step for clarity.
    y = ode(dCdt)
    y.set_integrator('dopri5', first_step=dt0, nsteps=2000)
    sol = []
    def solout(tcurrent, ytcurrent):
        sol.append((tcurrent, ytcurrent))
    y.set_solout(solout)
    y.set_initial_value(y=C0, t=0)
    yfinal = y.integrate(tmax)
    return sol
Although I could get the last time point by returning yfinal, I'd like to get the whole time course once I figure out why it's behaving the way it is.
Thanks for your suggestions!
Mickey
Edit:
If I print all of sol (print(tandy) or print(IntegrateColony(...))), it comes out as shown above (with the values in the arrays as 0), i.e.:
[(0.0, array([ 0., ... 0.])), ...
(3.0, array([ 0., ... 0.]))]
However, if I copy it with (y = copy.deepcopy(tandy); print(y)), the arrays take on values between 1e-7 and 1e+1.
If I do print(tandy[-1][1]) twice in a row, they're filled with zeros, but the format changes (from 0.0000 to 0.).
One other feature I noticed while following the suggestions in LutzL's and hpaulj's comments: if I run tandy = c1.IntegrateColony(3) in the console (running Spyder), the arrays are filled with zeros in the variable explorer. However, if I run the following in the console:
tandy = c1.IntegrateColony(3); ylast=copy.deepcopy(tandy)
Both the arrays in tandy and in ylast are filled with values in the range I would expect, and print(tandy[-1][1]) now gives:
[7.14923891e-07 7.14923891e-07 ... 8.26478813e-01 8.85589634e-01]
Even if I find a solution that stops this behavior, I'd appreciate anyone's insight about what's going on so I don't make the same mistakes again.
Thanks!
Edit:
Here's a simple case that gives this behavior:
import numpy as np
from scipy.integrate import ode

def testODEint(tmax=1):
    C0 = np.ones((3,))
    # C0 = 1  # This seems to behave the same
    def dCdt_simpleinputs(t, C):
        return C
    y = ode(dCdt_simpleinputs)
    y.set_integrator('dopri5')
    sol = []
    def solout(tcurrent, ytcurrent):
        sol.append((tcurrent, ytcurrent))  # Behaves oddly
        # sol.append((tcurrent, ytcurrent.copy()))  # LutzL's idea: works
    y.set_solout(solout)
    y.set_initial_value(y=C0, t=0)
    yfinal = y.integrate(tmax)
    return sol

tandy = testODEint(1)
ylast = np.copy(tandy[-1][1])
print(ylast)  # Expect same values as tandy[-1][1] below

tandy = testODEint(1)
tandy[-1][1]
print(tandy[-1][1])  # Expect same values as ylast above
When I run this, I get the following output for ylast and tandy[-1][1]:
[ 2.71828196 2.71828196 2.71828196]
[ 0.00000000e+00 0.00000000e+00 0.00000000e+00]
The code I was working on when I ran into this problem is an embarrassing mess, but if you want to take a look, an old version is here: https://github.com/mvondassow/BryozoanModel2
The details of why this is happening are tied to how ytcurrent is handled in integrate. But there are various contexts in Python where all values of a list end up the same - contrary to expectations.
For example:
In [159]: x
Out[159]: [0, 1, 2]
In [160]: x = []
In [161]: y = np.array([1, 2, 3])
In [162]: for i in range(3):
     ...:     y += i
     ...:     x.append(y)
In [163]: x
Out[163]: [array([4, 5, 6]), array([4, 5, 6]), array([4, 5, 6])]
All elements of x have the same value, because they are all pointers to the same y and thus show its final value.
But if I copy y before appending it to the list, I see the changes:
In [164]: x = []
In [165]: for i in range(3):
     ...:     y += i
     ...:     x.append(y.copy())
In [166]: x
Out[166]: [array([4, 5, 6]), array([5, 6, 7]), array([7, 8, 9])]
Now that does not explain why the print statement changes the values. But that whole solout callback mechanism is a bit obscure. I wonder if there are any warnings in scipy about pitfalls in defining such a callback?
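The practical fix, as the question's own edit (LutzL's idea) shows, is to snapshot the array inside the callback. A minimal sketch of the same toy system (the function name integrate_with_copy is made up for illustration):
import numpy as np
from scipy.integrate import ode

def integrate_with_copy(tmax=1.0):
    # dC/dt = C, so the solution is C0 * exp(t)
    y = ode(lambda t, C: C)
    y.set_integrator('dopri5')
    sol = []
    def solout(t, yt):
        # yt.copy() snapshots the state; appending yt itself stores a
        # reference to a buffer the solver may keep reusing
        sol.append((t, yt.copy()))
    y.set_solout(solout)
    y.set_initial_value(y=np.ones(3), t=0)
    y.integrate(tmax)
    return sol

sol = integrate_with_copy(1.0)
print(sol[-1][1])  # ~[2.71828 2.71828 2.71828], stable across repeated prints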

How to vectorize NumPy polyder function?

I would like to vectorize the NumPy function polyder, which computes derivatives of polynomials. Is there a simple way or a built-in function to do it?
With vectorize, I mean that if the input is an array of polynomials, the output would be the array with the derivative of the polynomials.
An example:
p = np.array([[3,4,5], [1,2]])
the output should be something like
np.array([[6, 4], [1]])
Since your subarrays, both input and output, can have different lengths, you are better off treating both as lists.
In [97]: [np.polyder(d) for d in [[3,4,5],[1,2]]]
Out[97]: [array([6, 4]), array([1])]
Your p is just a list in an expensive (timewise) array wrapper.
In [100]: p=np.array([[3,4,5],[1,2]])
In [101]: p
Out[101]: array([[3, 4, 5], [1, 2]], dtype=object)
There is little that you can do with such an array that you can't do just as well with a list. Do some time tests. You will probably find that iterating over an array of objects is slower than iterating over the equivalent list, especially if you take into account the time it takes to convert a list to an array.
It can also be tricky to create such arrays. If all the sublists are the same length, the result will be a 2D array; forcing them into an object array takes special initialization.
A general rule of thumb: if individual steps work with arrays or lists of different lengths, you probably can't vectorize. That is, you can't form a rectangular 2D array and apply vector operations.
If the polynomial lists were all the same length, then p could be 2d, and the result could also be that:
In [107]: p=np.array([[3,4,5],[0,1,2]])
In [108]: p
Out[108]:
array([[3, 4, 5],
[0, 1, 2]])
In [109]: np.array([np.polyder(i) for i in p])
Out[109]:
array([[6, 4],
[0, 1]])
In effect it is iterating over the rows of p, and then reassembling the result into an array. There are some numpy functions that streamline iteration (but don't speed it up much), but I see little need for those here.
Looking at the code of this function, the core is:
p = NX.asarray(p)
n = len(p) - 1
y = p[:-1] * NX.arange(n, 0, -1)
which, for this 2D array (rows of length 3), becomes:
In [117]: p[:,:-1]*np.arange(2,0,-1)
Out[117]:
array([[6, 4],
[0, 1]])
So if the polynomials are all the same length, this simple multiplication gives the first-order derivative coefficients. And of course the rows can be padded so they are all the same length. So 'vectorization' is easier than I initially thought.
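A minimal sketch of that padding idea, assuming left-padding with zero coefficients (which leaves each polynomial's value unchanged):
import numpy as np

# Pad the coefficient lists to a common length, then apply the polyder
# core as one vectorized multiply.
polys = [[3, 4, 5], [1, 2]]
n = max(len(p) for p in polys)
P = np.array([[0] * (n - len(p)) + list(p) for p in polys])
D = P[:, :-1] * np.arange(n - 1, 0, -1)  # derivative coefficients
print(D)  # [[6 4]
          #  [0 1]] -- the leading 0 in the second row is the padding artifact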
import numpy as np
p = np.array([[3,4,5], [1,2]])
np.array([np.polyder(coefficients) for coefficients in p])  # array([array([6, 4]), array([1])], dtype=object)
would fulfill your interface for your specific example. But as hpaulj mentions, there's little sense in working with NumPy arrays instead of normal python lists here, and no actual (hardware-level) vectorization will happen. (Though, as with list comprehensions in general, the interpreter would be free to employ other means of parallelism to compute them.)

Julia Approach to python equivalent list of lists

I just started tinkering with Julia and I'm really getting to like it. However, I am running into a road block. For example, in Python (although not very efficient or pythonic), I would create an empty list and append a list of a known size and type, and then convert to a NumPy array:
Python Snippet
a = []
for ....
    a.append([1., 2., 3., 4.])
b = numpy.array(a)
I want to be able to do something similar in Julia, but I can't seem to figure it out. This is what I have so far:
Julia snippet
a = Array{Float64}[]
for .....
    push!(a, [1., 2., 3., 4.])
end
The result is an n-element Array{Array{Float64,N},1} of size (n,), but I would like it to be an nx4 Array{Float64,2}.
Any suggestions or better way of doing this?
The literal translation of your code would be
# Building up as rows
a = [1. 2. 3. 4.]
for i in 1:3
    a = vcat(a, [1. 2. 3. 4.])
end

# Building up as columns
b = [1., 2., 3., 4.]
for i in 1:3
    b = hcat(b, [1., 2., 3., 4.])
end
But this isn't a natural pattern in Julia, you'd do something like
A = zeros(4, 4)
for i in 1:4, j in 1:4
    A[i, j] = j
end
or even
A = Float64[j for i in 1:4, j in 1:4]
Basically allocating all the memory at once.
Does this do what you want?
julia> a = Array{Float64}[]
0-element Array{Array{Float64,N},1}
julia> for i = 1:3
           push!(a, [1., 2., 3., 4.])
       end
julia> a
3-element Array{Array{Float64,N},1}:
[1.0,2.0,3.0,4.0]
[1.0,2.0,3.0,4.0]
[1.0,2.0,3.0,4.0]
julia> b = hcat(a...)'
3x4 Array{Float64,2}:
1.0 2.0 3.0 4.0
1.0 2.0 3.0 4.0
1.0 2.0 3.0 4.0
It seems to match the python output:
In [9]: a = []
In [10]: for i in range(3):
   ....:     a.append([1, 2, 3, 4])
   ....:
In [11]: b = numpy.array(a); b
Out[11]:
array([[1, 2, 3, 4],
[1, 2, 3, 4],
[1, 2, 3, 4]])
I should add that this is probably not what you actually want to be doing as the hcat(a...)' can be expensive if a has many elements. Is there a reason not to use a 2d array from the beginning? Perhaps more context to the question (i.e. the code you are actually trying to write) would help.
The other answers don't work if the number of loop iterations isn't known in advance, or assume that the underlying arrays being merged are one-dimensional. It seems Julia lacks a built-in function for "take this list of N-D arrays and return me a new (N+1)-D array".
Julia requires a different concatenation call depending on the dimension of the underlying data. For example, if the elements of a are vectors, one can use hcat(a...) or cat(a..., dims=2). But if the elements of a are e.g. 2D arrays, one must use cat(a..., dims=3), etc. The dims argument to cat is not optional, and there is no default value to indicate "the last dimension".
Here is a helper function that mimics the np.array functionality for this use case. (I called it collapse instead of array, because it doesn't behave quite the same way as np.array)
function collapse(x)
    return cat(x..., dims=length(size(x[1])) + 1)
end
One would use this as
a = []
for ...
    ... compute new_a ...
    push!(a, new_a)
end
a = collapse(a)
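For comparison, NumPy does provide this building block directly: np.stack joins a sequence of N-D arrays along a new axis, which is essentially what collapse does. A sketch with the question's row data:
import numpy as np

a = []
for _ in range(3):
    a.append(np.array([1., 2., 3., 4.]))

b = np.stack(a)  # new leading axis: shape (3, 4), like numpy.array(a)
print(b.shape)   # (3, 4)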

Python 2.7: looping over 1D fibers in a multidimensional Numpy array

I am looking for a way to loop over 1D fibers (row, column, and multi-dimensional equivalents) along any dimension in a 3+-dimensional array.
In a 2D array this is fairly trivial since the fibers are rows and columns, so just saying for row in A gets the job done. But for 3D arrays for example, this expression iterates over 2D slices, not 1D fibers.
A working solution is the one below:
import numpy as np
A = np.arange(27).reshape((3,3,3))
func = np.sum
for fiber_index in np.ndindex(A.shape[:-1]):
    print func(A[fiber_index])
However, I am wondering whether there is something that is:
More idiomatic
Faster
Hope you can help!
I think you might be looking for numpy.apply_along_axis
In [10]: def my_func(x):
    ...:     return x**2 + x
In [11]: np.apply_along_axis(my_func, 2, A)
Out[11]:
array([[[ 0, 2, 6],
[ 12, 20, 30],
[ 42, 56, 72]],
[[ 90, 110, 132],
[156, 182, 210],
[240, 272, 306]],
[[342, 380, 420],
[462, 506, 552],
[600, 650, 702]]])
Although many NumPy functions (including sum) have their own axis argument to specify which axis to use:
In [12]: np.sum(A, axis=2)
Out[12]:
array([[ 3, 12, 21],
[30, 39, 48],
[57, 66, 75]])
numpy provides a number of different ways of looping over 1 or more dimensions.
Your example:
func = np.sum
for fiber_index in np.ndindex(A.shape[:-1]):
    print fiber_index
    print A[fiber_index]
produces something like:
(0, 0)
[0 1 2]
(0, 1)
[3 4 5]
(0, 2)
[6 7 8]
...
This generates all index combinations over the first two dimensions, giving your function the 1D fiber on the last.
Look at the code for ndindex. It's instructive. I tried to extract its essence in https://stackoverflow.com/a/25097271/901925.
It uses as_strided to generate a dummy matrix over which an nditer iterates. It uses the 'multi_index' mode to generate an index set, rather than elements of that dummy. The iteration itself is done with a __next__ method. This is the same style of indexing that is currently used in numpy compiled code.
Iterating Over Arrays (http://docs.scipy.org/doc/numpy-dev/reference/arrays.nditer.html) has a good explanation, including an example of doing so in Cython.
Many functions, among them sum, max, product, let you specify which axis (axes) you want to iterate over. Your example, with sum, can be written as:
np.sum(A, axis=-1)
np.sum(A, axis=(1,2)) # sum over 2 axes
An equivalent is
np.add.reduce(A, axis=-1)
np.add is a ufunc, and reduce specifies an iteration mode. There are many other ufuncs, and other iteration modes: accumulate, reduceat. You can also define your own ufunc, as sketched below.
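As a small illustration of that last point, np.frompyfunc wraps a plain Python function into an (object-dtype) ufunc whose reduce also iterates along a chosen axis. This is a sketch for flexibility, not speed:
import numpy as np

A = np.arange(27).reshape(3, 3, 3)

# Wrap a 2-in/1-out Python function as a ufunc; reduce then works on any axis.
myadd = np.frompyfunc(lambda a, b: a + b, 2, 1)
print(myadd.reduce(A, axis=-1))  # object dtype, same numbers as A.sum(axis=-1)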
xnx suggests
np.apply_along_axis(np.sum, 2, A)
It's worth digging through apply_along_axis to see how it steps through the dimensions of A. In your example, it steps over all possible i,j in a while loop, calculating:
outarr[(i,j)] = np.sum(A[(i, j, slice(None))])
Including slice objects in the indexing tuple is a nice trick. Note that it edits a list, and then converts it to a tuple for indexing. That's because tuples are immutable.
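As a minimal illustration of that trick (the idx list here is made up):
import numpy as np

A = np.arange(27).reshape(3, 3, 3)

# An index tuple mixing integers and a slice:
# A[(1, 2, slice(None))] is the same as A[1, 2, :]
idx = [1, 2, slice(None)]    # build as a mutable list...
print(A[tuple(idx)])         # ...convert to a tuple to index: [15 16 17]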
Your iteration can be applied along any axis by rolling that axis to the end. This is a 'cheap' operation since it just changes the strides.
def with_ndindex(A, func, ax=-1):
    # apply func along axis ax
    A = np.rollaxis(A, ax, A.ndim)  # roll ax to end (changes strides only)
    shape = A.shape[:-1]
    B = np.empty(shape, dtype=A.dtype)
    for ii in np.ndindex(shape):
        B[ii] = func(A[ii])
    return B
I did some timings on 3x3x3, 10x10x10 and 100x100x100 A arrays. This np.ndindex approach is consistently a third faster than the apply_along_axis approach. Direct use of np.sum(A, -1) is much faster.
So if func is limited to operating on a 1D fiber (unlike sum), then the ndindex approach is a good choice.
