White spaces in Theano arrays

I just started playing around with Theano and I got surprised by the result of this code.
from theano import *
import theano.tensor as T
a = T.vector()
out = a + a ** 10
f = function([a], out)
print(f([0, 1, 2]))
Using Python 3 I get:
array([    0.,     2.,  1026.])
The array itself is correct; it contains the right values. However, the printed output is odd. I would expect something like this:
array([0, 2, 1026])
or
array([0.0, 2.0, 1026.0])
Why is this so? What are the extra white spaces? Should I be concerned about them?

What you're printing is a numpy.ndarray. By default, NumPy pads the entries to a common width so the columns line up when an array is printed, which is where the extra white space comes from.
The output array is a floating point array because, by default, Theano uses floating point tensors.
If you want to use integer tensors then you need to specify a dtype:
a = T.vector(dtype='int64')
Or use a bit of syntactic sugar:
a = T.lvector()
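For example, rebuilding the function from the question with an integer vector produces an integer result (a quick sketch; exact spacing depends on your NumPy version):
a = T.lvector()
out = a + a ** 10
f = function([a], out)
print(f([0, 1, 2]))  # now an int64 array, printed without decimal points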
Compare your output with the output of the following:
import numpy
print(numpy.array([0, 2, 1026], dtype=numpy.float64))
print(numpy.array([0, 2, 1026], dtype=numpy.int64))
You can change the default printing options of numpy using numpy.set_printoptions.
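For example, a minimal sketch (the exact spacing varies across NumPy versions):
import numpy as np
a = np.array([0., 2., 1026.])
print(a)  # entries are right-aligned to a common width, hence the spaces
np.set_printoptions(precision=1, suppress=True)
print(a)  # same values, with scientific notation suppressed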

Related

Saving to an empty array of arrays from nested for-loop

I have an array of arrays filled with zeros, so this is the shape I want for the result.
I'm having trouble saving the nested for-loop to this array of arrays. In other words, I want to replace all of the zeros with what the last line calculates.
percent = []
for i in range(len(F300)):
    percent.append(np.zeros(lengths[i]))
for i in range(0, len(Name)):
    for j in range(0, lengths[i]):
        percent[i][j] = (j + 1) / lengths[i]
The last line only saves the last j value for each i.
I'm getting:
percent = [[0,0,1],[0,1],[0,0,0,1]]
but I want:
percent = [[.3,.6,1],[.5,1],[.25,.5,.75,1]]
The problem with this code is that in Python 2.7 the / operator performs integer ("classic") division when both operands are integers, so 1/3 evaluates to 0 while float(1)/float(3) evaluates to 0.333.... There are a couple of different approaches to solving this in Python 2.7. One approach is to convert the numbers being divided into floating point numbers:
import numpy as np

lengths = [3, 2, 4]  # Deduced values of lengths from your output.
percent = []
for i in range(3):  # Deduced size of F300 from the length of percent.
    percent.append(np.zeros(lengths[i]))
for i in range(0, len(percent)):
    for j in range(0, lengths[i]):
        percent[i][j] = float(j + 1) / float(lengths[i])
Another approach is to import division from the __future__ module. Note that this import must appear at the top of the file, before any other statements:
from __future__ import division
import numpy as np

lengths = [3, 2, 4]  # Deduced values of lengths from your output.
percent = []
for i in range(3):  # Deduced size of F300 from the length of percent.
    percent.append(np.zeros(lengths[i]))
for i in range(0, len(percent)):
    for j in range(0, lengths[i]):
        percent[i][j] = (j + 1) / lengths[i]
The third approach, and the one I personally prefer, is to make good use of NumPy's built-in functions:
import numpy as np
lengths = [3, 2, 4] # Deduced values of lengths from your output.
percent = np.array([np.linspace(1.0 / float(l), 1.0, l) for l in lengths])
All three approaches will produce a list (or, in the last case, a numpy.ndarray) of numpy.ndarray objects with the following values:
[[0.33333333, 0.66666667, 1.], [0.5, 1.], [0.25, 0.5, 0.75, 1.]]

Iterating through two numpy arrays applying a function in Python

I have
import numpy as np
a = np.array([np.nan,2,3])
b = np.array([1,np.nan,2])
I want to apply a function to a and b. Is there a fast way of doing this (like in Pandas, where we can use apply)?
Specifically, I am interested in averaging a and b, but taking the average to be one of the numbers when the other number is missing.
i.e. I want to return
np.array([1,2,2.5])
for the example above. However, I would like to know the answer to this in a more general setting (where I want to apply an operation element-wise to a number of NumPy arrays).
You can use numpy.nanmean, which ignores NaNs:
np.nanmean([a, b], axis=0)
# array([ 1. , 2. , 2.5])
If you want to apply custom functions to NumPy arrays with the efficiency of NumPy's universal functions (ufuncs), the choices are:
Write your own C code
Use the ufuncify method of SymPy to generate code for you.
Here is an example of the latter, where the function is exp(x) + log(y) (since NumPy's ufuncs exp and log are already available, this is just for demonstration):
import numpy as np
import sympy as sym
from sympy.utilities.autowrap import ufuncify
x, y = sym.symbols('x y')
f = ufuncify([x, y], sym.exp(x) + sym.log(y))
Now applying f(np.array([1, 2, 3]), np.array([4, 5, 6])) will return NumPy array [4.10457619, 8.99849401, 21.87729639] in a way that's not a Python loop but a call to (by default) compiled Fortran code.
(But in practice, you are likely to find that NumPy already has some ufuncs that do what you want, if combined in a right way.)
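If speed is not critical, numpy.vectorize offers the same element-wise convenience without any compilation (under the hood it is just a Python loop). A minimal sketch for the averaging example above, where nan_aware_mean is a hypothetical helper:
import numpy as np

def nan_aware_mean(x, y):
    # Fall back to the value that is present when the other is NaN.
    if np.isnan(x):
        return y
    if np.isnan(y):
        return x
    return (x + y) / 2.0

vmean = np.vectorize(nan_aware_mean)
a = np.array([np.nan, 2, 3])
b = np.array([1, np.nan, 2])
print(vmean(a, b))  # [1.  2.  2.5]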

Why does deepcopy change values of numpy array?

I am having a problem in which values in a NumPy array change after copying it with copy.deepcopy or numpy.copy; in fact, I get different values if I simply print the array first before copying it.
I am using Python 3.5, Numpy 1.11.1, Scipy 0.18.0
My starting array is contained in a list of tuples; each tuple is pair: a float (a time point) and a numpy array (the solution of an ODE at that time point), e.g.:
[(0.0, array([ 0., ... 0.])), ...
(3.0, array([ 0., ... 0.]))]
In this case, I want the array for the last time point.
When I call the following:
tandy = c1.IntegrateColony(3)
ylast = copy.deepcopy(tandy[-1][1])
print(ylast)
I get something that makes sense for the system I'm trying to simulate:
[7.14923891e-07 7.14923891e-07 ... 8.26478813e-01 8.85589634e-01]
However, with the following:
tandy = c1.IntegrateColony(3)
print(tandy[-1][1])
ylast = copy.deepcopy(tandy[-1][1])
print(ylast)
I get all zeros:
[0.00000000e+00 0.00000000e+00 ... 0.00000000e+00 0.00000000e+00]
[ 0. 0. ... 0. 0.]
I should add, with larger systems and different parameters, displaying tandy[k][1] (either with print() or just by calling it in the command line) shows all non-zero values that are all very close to zero, i.e. <1e-70, but that's still not sensible for the system.
With:
tandy = c1.IntegrateColony(3)
ylast = np.copy(tandy[-1][1])
print(ylast)
I get sensible output again:
[7.14923891e-07 7.14923891e-07 ... 8.26478813e-01 8.85589634e-01]
The function that generates 'tandy' is the following (edited for clarity), which uses scipy.integrate.ode, and the set_solout method to get the solution at intermediate time points:
def IntegrateColony(self, tmax=1):
    # I edited out initialization of dCdt & first_step for clarity.
    y = ode(dCdt)
    y.set_integrator('dopri5', first_step=dt0, nsteps=2000)
    sol = []
    def solout(tcurrent, ytcurrent):
        sol.append((tcurrent, ytcurrent))
    y.set_solout(solout)
    y.set_initial_value(y=C0, t=0)
    yfinal = y.integrate(tmax)
    return sol
Although I could get the last time point by returning yfinal, I'd like to get the whole time course once I figure out why it's behaving the way it is.
Thanks for your suggestions!
Mickey
Edit:
If I print all of sol (print(tandy) or print(IntegrateColony...)), it comes out as shown above, with the values in the arrays as 0, i.e.:
[(0.0, array([ 0., ... 0.])), ...
(3.0, array([ 0., ... 0.]))]
However, if I copy it with (y = copy.deepcopy(tandy); print(y)), the arrays take on values between 1e-7 and 1e+1.
If I do print(tandy[-1][1]) twice in a row, they're filled with zeros, but the format changes (from 0.0000 to 0.).
One other feature I noticed while following the suggestions in LutzL's and hpaulj's comments: if I run tandy = c1.IntegrateColony(3) in the console (running Spyder), the arrays are filled with zeros in the variable explorer. However, if I run the following in the console:
tandy = c1.IntegrateColony(3); ylast = copy.deepcopy(tandy)
Both the arrays in tandy and in ylast are filled with values in the range I would expect, and print(tandy[-1][1]) now gives:
[7.14923891e-07 7.14923891e-07 ... 8.26478813e-01 8.85589634e-01]
Even if I find a solution that stops this behavior, I'd appreciate anyone's insight about what's going on so I don't make the same mistakes again.
Thanks!
Edit:
Here's a simple case that gives this behavior:
import numpy as np
from scipy.integrate import ode

def testODEint(tmax=1):
    C0 = np.ones((3,))
    # C0 = 1  # This seems to behave the same
    def dCdt_simpleinputs(t, C):
        return C
    y = ode(dCdt_simpleinputs)
    y.set_integrator('dopri5')
    sol = []
    def solout(tcurrent, ytcurrent):
        sol.append((tcurrent, ytcurrent))  # Behaves oddly
        # sol.append((tcurrent, ytcurrent.copy()))  # LutzL's idea: Works
    y.set_solout(solout)
    y.set_initial_value(y=C0, t=0)
    yfinal = y.integrate(tmax)
    return sol

tandy = testODEint(1)
ylast = np.copy(tandy[-1][1])
print(ylast)  # Expect same values as tandy[-1][1] below

tandy = testODEint(1)
tandy[-1][1]
print(tandy[-1][1])  # Expect same values as ylast above
When I run this, I get the following output for ylast and tandy[-1][1]:
[ 2.71828196 2.71828196 2.71828196]
[ 0.00000000e+00 0.00000000e+00 0.00000000e+00]
The code I was working on when I ran into this problem is an embarrassing mess, but if you want to take a look, an old version is here: https://github.com/mvondassow/BryozoanModel2
The details of why this is happening are tied to how ytcurrent is handled in integrate. But there are various contexts in Python where all values of a list end up the same - contrary to expectations.
For example:
In [160]: x = []
In [161]: y = np.array([1, 2, 3])
In [162]: for i in range(3):
     ...:     y += i
     ...:     x.append(y)
In [163]: x
Out[163]: [array([4, 5, 6]), array([4, 5, 6]), array([4, 5, 6])]
All elements of x have the same value because they are all references to the same y, and thus show its final value. But if I copy y before appending it to the list, I see the changes:
In [164]: x = []
In [165]: for i in range(3):
     ...:     y += i
     ...:     x.append(y.copy())
In [166]: x
Out[166]: [array([4, 5, 6]), array([5, 6, 7]), array([7, 8, 9])]
Now that does not explain why the print statement changes the values. But that whole solout callback mechanism is a bit obscure. I wonder if there are any warnings in scipy about pitfalls in defining such a callback?

Replace zero array with new values one by one NumPy

I'm stuck on a simple question in NumPy. I have an array of zero values. Once I generate a new value, I would like to add it one by one.
arr = np.array([0, 0, 0])
# something like this
l = [1, 5, 10]
for x in l:
    arr.append(x)  # from python logic
So I would like to add each x into the array one by one, giving: 1st iteration arr=[1, 0, 0]; 2nd iteration arr=[1, 5, 0]; 3rd iteration arr=[1, 5, 10].
Basically I need to substitute the zeros with new values one by one in NumPy (I am learning NumPy!).
I checked many of NumPy's options, like np.append (which adds new values after the existing ones), but can't find the right one.
Thank you!
There are a few things to pick up with numpy:
you can generate the array full of zeros with
>>> np.zeros(3)
array([ 0., 0., 0.])
You can get/set array elements with indexing as with lists etc:
arr[2] = 7

for i, val in enumerate([1, 5, 10]):
    arr[i] = val
Or, if you want to fill with array with something like a list, you can directly use:
>>> np.array([1, 5, 10])
array([ 1, 5, 10])
Also, numpy's signature for appending stuff to an array is a bit different:
arr = np.append(arr, 7)
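Putting those pieces together, a minimal sketch of the one-by-one fill described in the question:
import numpy as np

arr = np.zeros(3)
for i, x in enumerate([1, 5, 10]):
    arr[i] = x  # overwrite the i-th zero with the new value
    print(arr)  # [1. 0. 0.], then [1. 5. 0.], then [1. 5. 10.]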
Having said that, you should consider diving into NumPy's own user guide.

Looping through slices of Theano tensor

I have two 2D Theano tensors, call them x_1 and x_2, and suppose for the sake of example that both x_1 and x_2 have shape (1, 50). Now, to compute their mean squared error, I simply run:
T.sqr(x_1 - x_2).mean(axis=-1)
However, what I wanted to do was construct a new tensor that consists of their mean squared error in chunks of 10. In other words, since I'm more familiar with NumPy, what I had in mind was to create the following tensor M in Theano:
M = [theano.tensor.sqr(x_1[:, i:i+10] - x_2[:, i:i+10]).mean(axis = -1) for i in xrange(0, 50, 10)]
Now, since Theano doesn't have for loops, but instead uses scan (which map is a special case of), I thought I would try the following:
sequence = T.arange(0, 50, 10)
M = theano.map(lambda i: theano.tensor.sqr(x_1[:, i:i+10] - x_2[:, i:i+10]).mean(axis = -1), sequence)
However, this does not seem to work, as I get the error:
only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices
Is there a way to loop through the slices using theano.scan (or map)? Thanks in advance, as I'm new to Theano!
Similar to what can be done in NumPy, a solution would be to reshape your (1, 50) tensor to a (1, 5, 10) tensor (or even a (5, 10) tensor), and then compute the mean along the last axis.
To illustrate this with NumPy, suppose I want to compute means by slices of 2:
x = np.array([0, 2, 0, 4, 0, 6])
x = x.reshape([3, 2])
np.mean(x, axis=1)
outputs
array([ 1., 2., 3.])
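The same reshape trick can be written directly in Theano. Assuming x_1 and x_2 have shape (1, 50) as in the question, a sketch (not tied to a specific Theano version):
import theano
import theano.tensor as T

x_1 = T.matrix()
x_2 = T.matrix()
# 50 columns -> 5 chunks of 10; average within each chunk.
M = T.sqr(x_1 - x_2).reshape((1, 5, 10)).mean(axis=-1)
f = theano.function([x_1, x_2], M)  # f returns an array of shape (1, 5)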
