Why is PyMC3 ADVI worse than MCMC in this logistic regression example?

I am aware of the mathematical differences between ADVI and MCMC, but I am trying to understand the practical implications of using one or the other. I am running a very simple logistic regression example on data I created this way:
import pandas as pd
import pymc3 as pm
import matplotlib.pyplot as plt
import numpy as np

def logistic(x, b, noise=None):
    L = x.T.dot(b)
    if noise is not None:
        L = L + noise
    return 1 / (1 + np.exp(-L))

x1 = np.linspace(-10., 10, 10000)
x2 = np.linspace(0., 20, 10000)
bias = np.ones(len(x1))
X = np.vstack([x1, x2, bias])  # Add intercept
B = [-10., 2., 1.]  # Sigmoid params for X + intercept
# Noisy mean (note: scale=0. means no noise is actually added here)
pnoisy = logistic(X, B, noise=np.random.normal(loc=0., scale=0., size=len(x1)))
# Dichotomize pnoisy -- sample 0/1 with probability pnoisy
y = np.random.binomial(1., pnoisy)
And then I run ADVI like this:
with pm.Model() as model:
    # Define priors
    intercept = pm.Normal('Intercept', 0, sd=10)
    x1_coef = pm.Normal('x1', 0, sd=10)
    x2_coef = pm.Normal('x2', 0, sd=10)
    # Define likelihood
    likelihood = pm.Bernoulli('y',
                              pm.math.sigmoid(intercept + x1_coef*X[0] + x2_coef*X[1]),
                              observed=y)
    approx = pm.fit(90000, method='advi')
Unfortunately, no matter how many iterations I run, ADVI does not seem to be able to recover the original betas I defined, [-10., 2., 1.], while MCMC works fine (plot not shown here).
Thanks for the help!

This is an interesting question! The default 'advi' in PyMC3 is mean field variational inference, which does not do a great job capturing correlations. It turns out that the model you set up has an interesting correlation structure, which can be seen with this:
import arviz as az
az.plot_pair(trace, figsize=(5, 5))
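Here trace is the NUTS trace from the question's MCMC run, which was not shown. A minimal sketch of producing it, reusing the question's model block, might be:
with model:
    # Assumed, not from the original post: a NUTS run for comparison
    trace = pm.sample(2_000, tune=1_000)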
PyMC3 has a built-in convergence checker - running the optimization for too long or too short a time can lead to funny results:
from pymc3.variational.callbacks import CheckParametersConvergence

with model:
    fit = pm.fit(100_000, method='advi', callbacks=[CheckParametersConvergence()])
    draws = fit.sample(2_000)
This stops after about 60,000 iterations for me. Now we can inspect the correlations and see that, as expected, ADVI fit axis-aligned Gaussians:
az.plot_pair(draws, figsize=(5, 5))
Finally, we can compare the fit from NUTS and (mean field) ADVI:
az.plot_forest([draws, trace])
Note that ADVI underestimates the variance, but is fairly close on the mean of each parameter. Also, you can set method='fullrank_advi' to capture the correlations you are seeing a little better.
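For example, a minimal sketch, assuming the same model context as above:
with model:
    # Full-rank ADVI learns a dense covariance, at a higher cost per iteration
    full = pm.fit(100_000, method='fullrank_advi', callbacks=[CheckParametersConvergence()])
    full_draws = full.sample(2_000)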
(note: arviz is soon to be the plotting library for PyMC3)

Related

Spatial interpolation with Gaussian process regression

I have a CSV file with 140,000 points (rows). It consists of:
longitude value
latitude value
subsidence value at specific points
I assume that these points are spatially correlated. I want to perform a spatial interpolation analysis of the area of the points, i.e. a geostatistical interpolation analysis using, for example, kriging (Gaussian process regression).
I'm reading the scikit-learn page about Gaussian process regression, but I'm unsure how to implement it.
What characteristics determine which kernel I can use? How do I implement this correctly with my spatial data?
First, you should convert your data to a projected coordinate system. The best one depends on where your data are located; essentially you want the conformal projection with the least amount of distortion for your location (e.g. Mercator near the equator, or Transverse Mercator if your data are all close to a single meridian). You can achieve this in geopandas, for example:
import pandas as pd
import geopandas as gpd

data = {'latitude': [54, 56, 58], 'longitude': [-62, -63, -64], 'subsidence': [10, 20, 30]}
df = pd.DataFrame(data)

params = {
    'geometry': gpd.points_from_xy(df.longitude, df.latitude),
    'crs': 'epsg:4326',  # WGS84
}
gdf_ = gpd.GeoDataFrame(df, **params)
gdf = gdf_.to_crs('epsg:2961')  # UTM20N
gdf
This GeoDataFrame is now in projected coordinates. Now you can do some spatial prediction:
import numpy as np
from sklearn.gaussian_process.kernels import RBF
from sklearn.gaussian_process import GaussianProcessRegressor
kernel = RBF(length_scale=100_000)
gpr = GaussianProcessRegressor(kernel=kernel)
X = np.array([gdf.geometry.x, gdf.geometry.y]).T
y = gdf.subsidence
gpr.fit(X, y)
Now you can predict at a location, e.g. gpr.predict([(500_000, 5_900_000)]) gives array([22.86764555]) for my toy data.
To predict on a grid, you could do this:
x_min, x_max = np.min(gdf.geometry.x) - 10_000, np.max(gdf.geometry.x) + 10_000
y_min, y_max = np.min(gdf.geometry.y) - 10_000, np.max(gdf.geometry.y) + 10_000
grid_y, grid_x = np.mgrid[y_min:y_max:10_000, x_min:x_max:10_000]
X_grid = np.stack([grid_x.ravel(), grid_y.ravel()]).T
y_grid = gpr.predict(X_grid).reshape(grid_x.shape)
Things to think about:
You should read the docs for geopandas and sklearn.gaussian_process
You should fit the kernel to your data.
You might want to use an anisotropic kernel (see the sketch after this list).
The estimator has a few hyperparameters which you should pay attention to.
Don't forget to do some validation of your estimates, check the distribution of the residuals, etc.
You might want to use a specialist geostats package like gstools, which will do a lot of the fiddly things for you.
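On the anisotropic kernel point, a minimal sketch (the length scales in metres are illustrative, not tuned to your data):
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Separate length scales for easting and northing, plus a noise term;
# n_restarts_optimizer helps the hyperparameter fit escape local optima.
kernel = RBF(length_scale=[100_000, 50_000]) + WhiteKernel(noise_level=1.0)
gpr = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=5)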

Why does importing the numpy zeros function fail for parallelization using numba?

According to the Numba docs, the numpy array creation functions zeros and ones should be supported. However, testing this with simple functions leads to a nopython error when I import the zeros function from numpy directly, whereas if I do import numpy as np and use np.zeros, there is no problem. Is there some difference in the functions I'm getting from numpy? I'd prefer to import only the functions I need, rather than the entire numpy library.
This code snippet fails:
from numpy import array
from numpy import zeros
from numpy.random import rand
from numba import njit, prange

# @njit()
@njit(parallel=True)
def prange_test(A):
    s = 0
    z = zeros((3, 3))
    for i in prange(A.shape[0]):
        s += A[i]
    return s

A = rand(10)
test = prange_test(A)
This code snippet works:
from numpy import array
from numpy.random import rand
from numba import njit, prange
import numpy as np

@njit(parallel=True)
def prange_test(A):
    s = 0
    z = np.zeros((3, 3))
    for i in prange(A.shape[0]):
        s += A[i]
    return s

A = rand(10)
test = prange_test(A)
I'm using Numba version 0.35.0 and NumPy version 1.13.2.
Let's go step by step:
a) the @numba.njit( parallel = True ) decorator's parallel option is (cit.) "experimental" in its efforts to auto-detect opportunities in the code to introduce some form of parallelism.
b) the code is almost exactly the code snippet from the numba documentation, using almost exactly the same prange()-constructor code block, but inside an @autojit-decorated example:
from numba import autojit, prange

@autojit
def parallel_sum(A):
    sum = 0.0
    for i in prange(A.shape[0]):
        sum += A[i]
    return sum
c) the error message reports problems inside this auto-detect transformation, related to line 12, which (only weakly referenced) might be s += A[i]. It points to some kind of problem inside the "automated understanding" of the intent expressed in the Intermediate Representation of the code block where the prange index ought to be used - Var($parfor_index_tuple_var.14) - for which some type-related or tuple-decoupling-related problem could not be resolved by the numba.jit LLVM translator. The traceback also mentions that call_parallel_gufunc had problems detecting the upper bound of the prange-constructor, stop = load_range( stop ), whereas the numba documentation so far mentions that only CPU-directed parallel code is supported (not any { GPU | guvectorize | et al } non-CPU kernels). Here, a better-documented MCVE together with the matching error traceback as text would be appreciated, instead of a weakly referring PNG picture.
d) last but not least, the numba documentation requires, as a mandatory step, that parallel=True be used only (cit.) "in conjunction with nopython=True"
How to proceed?
1) test the numba-published code copied above as-is, to see whether the newer release of numba still keeps all the promises that were already working in previous releases, i.e. use the @numba.autojit decorator and re-run the exact code copy to { POSACK | NACK } this test.
2) test the code, POSACK-ed from step 1, this time under the @numba.njit( parallel = True, nopython = True ) decorator (no other change except the decorator) to { POSACK | NACK } the influence of the decorator policy.
3) test the code, POSACK-ed from step 2, this time with the other modifications described below
Conceptual remarks:
With all due respect to the numba team, there could hardly be a worse example of a parallel and prange() anti-pattern than this one.
Besides the immense overhead costs of setting up the [PAR] process section, there is absolutely nothing here to compute efficiently in parallel (just notice the actual value dependency graph): the criticism of Amdahl's Law's initial, add-on-overheads-agnostic formulation shows how much one can pay for what is, in principle, just worse-than-original performance. Parallel process scheduling typically has exactly the opposite motivation.
If indeed interested in smarter code execution, use numba.jit, which has a much better performance/cost ratio:
shave off any residual type-analysis-related parts of the IR code by using explicit announcements of the calling-interface signatures
avoid memory allocations inside the performance-tuned code; rather, pre-allocate and pass as another parameter
extend the calling interface, so as to avoid deferring things well known on the caller side to the numba-automated code analysis
@numba.jit( 'float64( float64[:], int64, float64[:,:] )', nogil = True, nopython = True )
def prange_test( vectorA,       #
                 vectorAshape0, # avoids numba-code speculating on the type
                 arrayZ         # avoids a "local" new memory allocation
                 ):
    sum = 0
    ...
    return sum
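A hypothetical usage sketch (not part of the original answer), assuming the elided body sums vectorA: the work array is pre-allocated once at the caller, and the first call is a warm-up that pays the one-off compilation cost (aClk is the zmq Stopwatch used in the benchmark below):
import numpy as np
A = np.random.rand( 1000000 )
Z = np.zeros( ( 3, 3 ) )              # pre-allocated once, reused across calls
_ = prange_test( A, A.shape[0], Z )   # warm-up call triggers the JIT compile
aClk.start(); s = prange_test( A, A.shape[0], Z ); aClk.stop()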
Performance?
from zmq import Stopwatch; aClk = Stopwatch()

def a_just_vectorised_sum( vectorA ):
    return vectorA.sum()

A = np.random.rand( 1000000 )
aClk.start(); s = a_just_vectorised_sum( A ); aClk.stop()
1145L
1190L
1188L
Benchmark. Always. Always on a real-world-sized dataset. Never rely on schoolbook-sized artifacts; go to real-world scales.
The results show that the 1,000,000-cell vector took about 1,200 [us] ~ 0.0012 [s] to sum(), i.e. less than about 1.2 [ns] per summed cell. This sets a yardstick against which to compare any other implementation.

How to do «go to left/right/forward/backward» with python dronekit?

I am using APM to autopilot my hexacopter, following this tutorial. Looking at the available commands, I cannot see how one can command the drone to go left/right/forward/backward.
Can anybody help me on this?
You need to create a vehicle.message_factory.set_position_target_local_ned_encode message.
It requires a frame of mavutil.mavlink.MAV_FRAME_BODY_NED.
You then add the required x, y and/or z velocities (in m/s) to the message.
from pymavlink import mavutil
from dronekit import connect, VehicleMode, LocationGlobalRelative
import time

def send_body_ned_velocity(velocity_x, velocity_y, velocity_z, duration=0):
    msg = vehicle.message_factory.set_position_target_local_ned_encode(
        0,        # time_boot_ms (not used)
        0, 0,     # target system, target component
        mavutil.mavlink.MAV_FRAME_BODY_NED,  # frame; needs to be MAV_FRAME_BODY_NED for forward/back, left/right control
        0b0000111111000111,  # type_mask: enable only the velocity components
        0, 0, 0,  # x, y, z positions (not used)
        velocity_x, velocity_y, velocity_z,  # m/s
        0, 0, 0,  # x, y, z acceleration
        0, 0)
    for x in range(0, duration):
        vehicle.send_mavlink(msg)
        time.sleep(1)

connection_string = 'tcp:192.168.1.2:5760'  # Edit to suit your needs.
takeoff_alt = 10

vehicle = connect(connection_string, wait_ready=True)

while not vehicle.is_armable:
    time.sleep(1)

vehicle.mode = VehicleMode("GUIDED")
vehicle.armed = True
while not vehicle.armed:
    print('Waiting for arming...')
    time.sleep(1)

vehicle.simple_takeoff(takeoff_alt)  # Take off to target altitude
while True:
    print('Altitude: %d' % vehicle.location.global_relative_frame.alt)
    if vehicle.location.global_relative_frame.alt >= takeoff_alt * 0.95:
        print('REACHED TARGET ALTITUDE')
        break
    time.sleep(1)

# This is the command to move the copter 5 m/s forward for 10 sec.
velocity_x = 0
velocity_y = 5
velocity_z = 0
duration = 10
send_body_ned_velocity(velocity_x, velocity_y, velocity_z, duration)

# Backwards at 5 m/s for 10 sec.
velocity_x = 0
velocity_y = -5
velocity_z = 0
duration = 10
send_body_ned_velocity(velocity_x, velocity_y, velocity_z, duration)

vehicle.mode = VehicleMode("LAND")
Have fun, and of course, observe rigorous safeguards when programming and flying UAVs. A manual mode override is a must!
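A caveat (my note, not part of the original answer): the comments above treat velocity_y as forward/backward, but in the standard MAVLink convention for body-NED frames +x is forward, +y is right, and +z is down, so verify the axis mapping in SITL before a real flight. Under that standard convention, a left/right sketch using the same helper would be:
# Hypothetical sketch, assuming the standard body-NED axes
# (+x forward, +y right, +z down); verify in SITL first.
send_body_ned_velocity(0,  5, 0, duration=10)   # right at 5 m/s for 10 s
send_body_ned_velocity(0, -5, 0, duration=10)   # left at 5 m/s for 10 s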
Dronekit-python has non-trivial APIs for commanding the drone in the local frame. From my personal experience, it was hard to get my head around using these commands to make my drone follow a shape locally, e.g. a square or a circle.
An alternative is using the FlytOS drone APIs. If you look at this sample python code on GitHub, you can see how easy it is to command the drone to go left x meters, then forward y meters, etc. Jon's answer does correctly show how dronekit could be used to achieve what you are trying to do, but another beginner might be intimidated by the complex code.
I had the same problem; I hope this link helps. Look for the "controlling by specifying the vehicle's velocity components" section of the article here.

Theano - logistic regression example weight vector becomes NaN?

I am doing a tutorial (code here, with the accompanying video here, 13:00 minutes in).
My only change is using the MNIST training set from a different location (and creating a one-hot encoding), but it is not working. I literally copy-pasted all of the code (except for the MNIST loading). Here is the code:
import theano
from theano import tensor as T
import numpy as np
from numpy import newaxis
from sklearn.datasets import fetch_mldata
from sklearn.preprocessing import OneHotEncoder
# train_test_split lives in sklearn.cross_validation on older scikit-learn
from sklearn.model_selection import train_test_split

mnist = fetch_mldata("MNIST Original")
trX, teX, trY_digit, teY_digit = train_test_split(mnist.data, mnist.target, test_size=.4)

# Get one-hot encoding
enc = OneHotEncoder()
enc.fit([[n] for n in range(10)])
# sparse_to_floatX is the poster's helper (definition not shown); presumably
# it densifies the sparse matrix and casts it, e.g. floatX(s.todense())
trY, teY = sparse_to_floatX(enc.transform(trY_digit[:, newaxis])), sparse_to_floatX(enc.transform(teY_digit[:, newaxis]))
def floatX(X):
    return np.asarray(X, dtype=theano.config.floatX)

def init_weights(shape):
    return theano.shared(floatX(np.random.randn(*shape) * 0.1))

def model(X, w):
    return T.nnet.softmax(T.dot(X, w))

X = T.fmatrix()
Y = T.fmatrix()
w = init_weights((784, 10))
py_x = model(X, w)
y_pred = T.argmax(py_x, axis=1)
cost = T.mean(T.nnet.categorical_crossentropy(py_x, Y))
gradient = T.grad(cost=cost, wrt=w)
update = [[w, w - gradient * 0.05]]
train = theano.function(inputs=[X, Y], outputs=cost, updates=update, allow_input_downcast=True)
predict = theano.function(inputs=[X], outputs=y_pred, allow_input_downcast=True)

for i in range(10):
    print w.get_value()
    cost = train(trX, trY)
    print i, predict(teX)
The weight vector updates once, and becomes all NaN on the second update. I am very new to theano, but I am looking for tips to figure this out, especially if someone has already done this tutorial.
UPDATE.
It looks like the gradient is the issue.
When I add this:
the_grad = T.sum(gradient)
f_grad = theano.function(inputs=[X, Y], outputs=the_grad, allow_input_downcast=True)
print f_grad(trX, trY)
It prints NaN. This appears to be the correct usage of T.grad though.
UPDATE 2.
When I change the cost function to this:
cost = T.mean(T.sum(T.sqr(py_x - Y), axis=1), axis=0)
It works now, but I only get 70% accuracy, which is really bad.
UPDATE 3.
I downloaded the MNIST data used in the tutorial and it worked, with 92% accuracy.
I am not sure why my first MNIST data source was dying with the cross-entropy cost, and then performing really poorly with the mean squared error cost function.
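One plausible explanation (an assumption on my part, not verified in the post): fetch_mldata returns raw pixel values in [0, 255], whereas the tutorial's own loader scales them to [0, 1]. Unscaled inputs saturate the softmax, so categorical_crossentropy evaluates log(0) and the gradients become NaN, while the squared-error cost merely limps along instead of blowing up. A minimal sketch of the fix:
# Hypothesis: scale pixels to [0, 1] before training; fetch_mldata
# serves raw values in [0, 255], unlike the tutorial's loader.
trX = floatX(trX) / 255.0
teX = floatX(teX) / 255.0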

Mathematical Operations on a Jython array

I am trying to do simple math operations on every element of a Jython array in the following manner:
import math

for i in xrange(x * y * z):
    medfiltArray[i] = 2 * math.sqrt(medfiltArray[i] + (3.0/8.0))
    InputImgArray[i] = 2 * math.sqrt(InputImgArray[i] + (3.0/8.0))
The problem is that my array is large (8,388,608 elements) and the whole process takes a little more than 12 seconds. Is there a more efficient way to do this? I found a slightly faster way (about 7 seconds):
medfiltArray = map(lambda x: 2 * math.sqrt(x + (3.0/8.0)), medfiltArray)
The advantage of the for loop over this method is that I can modify several arrays of the same size simultaneously, and therefore save time overall. But despite all this, it is still very slow. In MATLAB, modifying the matrix would take less than a second:
img = 2 * sqrt(img + (3/8));
Any tips on modifying arrays in Jython would be much appreciated. Thanks!
Python comes with batteries included, but no good matrix batteries. Fortunately NumPy fixes that; unfortunately, I don't know the Jython alternatives from personal experience, only what a couple of searches reveal: jnumeric (seems outdated), http://acs.lbl.gov/ACSSoftware/colt/ (outdated as well?), http://mail.scipy.org/pipermail/numpy-discussion/2012-August/063751.html and its SO link: Using NumPy and CPython with Jython.
In any case, a simple CPython/NumPy example could look like this:
import numpy as np
# dummy init values:
x = 800
y = 100
z = 100
length = x*y*z
medfiltArray = np.arange(length, dtype='f')
InputImgArray = np.arange(length, dtype='f')
# m is a constant, no reason to recalculate it 8million times
m = (3.0/8.0)
medfiltArray = 2 * np.sqrt(medfiltArray + m)
InputImgArray = 2 * np.sqrt(InputImgArray + m)
# timed, it runs in:
# real 0m0.161s
# user 0m0.131s
# sys 0m0.032s
Good luck finding your Jython alternative; I hope this sets you on the right path.
There is a fast vector and matrix Java library called Vectorz. Vectorz can be imported into Jython and does the computation described in my question in about 200 ms; you will have to switch from the Python (or Java) arrays to Vectorz arrays. There is also another solution: if you are doing image processing (like me), there is a program called ImageJ, which has extensive functionality. I am working on an ImageJ plugin, and for these math operations you can also use ImageJ's internal math commands:
IJ.run(InputImg, "32-bit", "");
IJ.run(InputImg, "Add...", "value=0.375 stack");
IJ.run(InputImg, "Square Root", "stack");
IJ.run(InputImg, "Multiply...", "value=2 stack");
This takes only 0.1 sec.
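For context, the IJ.run calls above assume a Jython script running inside ImageJ; a minimal sketch of the surrounding boilerplate (the image name is illustrative):
from ij import IJ

InputImg = IJ.getImage()   # the currently active image stack
IJ.run(InputImg, "32-bit", "")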
