Spatial interpolation with Gaussian process regression - artificial-intelligence

I have a CSV file with 140,000 points (rows). It consists of:
longitude value
latitude value
subsidence value at specific points. I assume that these points are spatially correlated.
I want to perform a spatial interpolation analysis of the area covered by the points. That is, I will do a geostatistical interpolation analysis using, for example, kriging, i.e. Gaussian process regression.
I'm reading the scikit-learn page about Gaussian process regression, but I'm unsure how to implement it.
What characteristics determine which kernel I can use? How do I implement this correctly with my spatial data?

First, you should convert your data to a projected coordinate system. The best one depends on where your data are located; essentially you want the conformal projection with the least amount of distortion for your location (e.g. Mercator near the equator, or Transverse Mercator if your data are all close to a single meridian). You can achieve this in geopandas, for example:
import pandas as pd
import geopandas as gpd

data = {'latitude': [54, 56, 58], 'longitude': [-62, -63, -64], 'subsidence': [10, 20, 30]}
df = pd.DataFrame(data)
params = {
    'geometry': gpd.points_from_xy(df.longitude, df.latitude),
    'crs': 'epsg:4326',  # WGS84
}
gdf_ = gpd.GeoDataFrame(df, **params)
gdf = gdf_.to_crs('epsg:2961')  # UTM20N
gdf
This GeoDataFrame is now in projected coordinates. Now you can do some spatial prediction:
import numpy as np
from sklearn.gaussian_process.kernels import RBF
from sklearn.gaussian_process import GaussianProcessRegressor
kernel = RBF(length_scale=100_000)
gpr = GaussianProcessRegressor(kernel=kernel)
X = np.array([gdf.geometry.x, gdf.geometry.y]).T
y = gdf.subsidence
gpr.fit(X, y)
Now you can predict at a location, e.g. gpr.predict([(500_000, 5_900_000)]) gives array([22.86764555]) for my toy data.
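Since you are doing kriging, you probably also want the prediction uncertainty; scikit-learn's predict can return the predictive standard deviation. A minimal sketch, reusing the same toy location:
y_pred, y_std = gpr.predict([(500_000, 5_900_000)], return_std=True)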
To predict on a grid, you could do this:
x_min, x_max = np.min(gdf.geometry.x) - 10_000, np.max(gdf.geometry.x) + 10_000
y_min, y_max = np.min(gdf.geometry.y) - 10_000, np.max(gdf.geometry.y) + 10_000
grid_y, grid_x = np.mgrid[y_min:y_max:10_000, x_min:x_max:10_000]
X_grid = np.stack([grid_x.ravel(), grid_y.ravel()]).T
y_grid = gpr.predict(X_grid).reshape(grid_x.shape)
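To get a quick look at the interpolated surface, here is a minimal sketch with matplotlib (assuming you have it installed; the labels are arbitrary choices):
import matplotlib.pyplot as plt

# rows of y_grid run from y_min to y_max, so use origin='lower'
plt.imshow(y_grid, origin='lower', extent=(x_min, x_max, y_min, y_max))
plt.colorbar(label='subsidence')
plt.show()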
Things to think about:
You should read the docs for geopandas and sklearn.gaussian_process
You should fit the kernel to your data (a sketch follows this list).
You might want to use an anisotropic kernel.
The estimator has a few hyperparameters which you should pay attention to.
Don't forget to do some validation of your estimates, check the distribution of the residuals, etc.
You might want to use a specialist geostats package like gstools, which will do a lot of the fiddly things for you.
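On the kernel-fitting and anisotropy points, here is a minimal sketch, not a definitive recipe: the initial length scales, bounds, and noise level below are placeholder guesses you would tune to your data. scikit-learn optimizes the kernel hyperparameters during fit, a list-valued length_scale makes the RBF anisotropic, and a WhiteKernel term acts as a nugget for noisy measurements. Also bear in mind that exact GP regression scales roughly cubically with the number of points, so with 140,000 points you will likely need to subsample or use an approximate method.
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# anisotropic RBF: separate length scales for x and y (initial guesses)
kernel = RBF(length_scale=[50_000, 50_000], length_scale_bounds=(1_000, 1_000_000)) \
    + WhiteKernel(noise_level=1.0)  # nugget term for measurement noise
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gpr.fit(X, y)  # hyperparameters are tuned by maximizing the log marginal likelihood
print(gpr.kernel_)  # inspect the fitted length scales and noise level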

Related

Receiving different measured values from crossK and lohboot

I have a marked ppp dataset looking at crimes and their relation to locations.
I am performing an inhomogeneous cross-K using Kcross.inhom, and am using lohboot to bootstrap confidence intervals around the inhomogeneous cross-K. However, I am getting different measured values of iso for the two, when we would anticipate identical values.
The crime dataset is 26k rows; I am unsure how to subset it to create a reproducible example.
#creating the ppp
crime.coords = as.data.frame(st_coordinates(crime)) #coordinates of crimes
center.coords = as.data.frame(st_coordinates(center)) #coordinates of locations
temp = rbind(data.frame(x=crime.coords$X, y=crime.coords$Y, type='crime'),
             data.frame(x=center.coords$X, y=center.coords$Y, type='center')) #df for marked ppp
temp = ppp(temp[,1], temp[,2], window=owin(border.coords), marks=relevel(as.factor(temp$type), 'crime')) #creating marked ppp

#creating an intensity model of the crimes
temp = rescale(temp, 10000) #rescaling for polynomial model coefficients
crime.ppp = unmark(split(temp)$crime)
model.crime = ppm(crime.ppp ~ polynom(x, y, 2), Poisson())

ck = Kcross.inhom(temp, i = 'crime', j = 'center', lambdaI = model.crime) #cross-K with intensity function
ckenv = lohboot(temp, fun='Kcross.inhom', i = 'crime', j = 'center', lambdaI = model.crime) #bootstrapped CIs for cross-K with intensity function
Here are the values plotted, showing different curves:
A few things I've noted are that the r values are different for the two functions, and setting the lohboot r argument does not in fact make them identical. I'm unsure where to go from here, having exhausted all my resources in finding a solution. Thank you in advance.
These curves are not guaranteed to be equal. lohboot subdivides the data, randomly resamples the subdivisions, computes the contributions from these randomly selected subdivisions, and averages them. If you repeat the experiment you should get a slightly different answer from lohboot each time. See the help file for lohboot.
It would be desirable for the two curves to be close. Unfortunately, the default behaviour of lohboot does not often achieve that. For consistency, the default behaviour follows the original implementation, which was not very good. Try setting block = TRUE for better performance. Also try the other options, basicboot and Vcorrection.

Plotting a 2D vector with separate component arrays

I feel like this is a pretty basic question but I can't seem to get my head around it. I have a velocity vector V with two components in x and in y that both depend on time: v_x(t) = sin(at) and v_y(t) = exp(bt).
I have created an array for t ranging from 0 to 100 with np.arange(0, 100, 1). I want to plot, with matplotlib, the resulting vector and its evolution with respect to t. How do I do that?
A simple way that you might try is the following:
import matplotlib.pyplot as plt
import numpy as np
t = np.arange(0,100,1)
a = 0.1
b = 0.05
vel = np.array([np.sin(a*t), np.exp(b*t)],float)
plt.plot(vel[0,:],vel[1,:])
plt.xlabel('x-axis')
plt.ylabel('y-axis')
plt.show()
This gave me a plot of the trajectory traced out by the vector (vel[1,:] against vel[0,:]).
The line vel = np.array([np.sin(a*t), np.exp(b*t)],float) basically does all the magic. np.sin(a*t) makes a new array using each value in t to calculate each element (and np.exp() works similarly).
It would also be possible (and fun) to make an animation of the evolution of the vector.
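For instance, here is a rough sketch using matplotlib's FuncAnimation (the axis limits, frame interval, and the a, b values are arbitrary choices):
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

t = np.arange(0, 100, 1)
a, b = 0.1, 0.05
vx, vy = np.sin(a * t), np.exp(b * t)

fig, ax = plt.subplots()
ax.set_xlim(-1.5, 1.5)
ax.set_ylim(0, vy.max() * 1.1)
# a single arrow from the origin, redrawn each frame
arrow = ax.quiver(0, 0, vx[0], vy[0], angles='xy', scale_units='xy', scale=1)

def update(i):
    arrow.set_UVC(vx[i], vy[i])  # point the arrow at the i-th velocity vector
    return arrow,

anim = FuncAnimation(fig, update, frames=len(t), interval=50, blit=True)
plt.show()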

Why is Pymc3 ADVI worse than MCMC in this logistic regression example?

I am aware of the mathematical differences between ADVI and MCMC, but I am trying to understand the practical implications of using one or the other. I am running a very simple logistic regression example on data I created in this way:
import pandas as pd
import pymc3 as pm
import matplotlib.pyplot as plt
import numpy as np

def logistic(x, b, noise=None):
    L = x.T.dot(b)
    if noise is not None:
        L = L + noise
    return 1 / (1 + np.exp(-L))

x1 = np.linspace(-10., 10, 10000)
x2 = np.linspace(0., 20, 10000)
bias = np.ones(len(x1))
X = np.vstack([x1, x2, bias])  # Add intercept
B = [-10., 2., 1.]  # Sigmoid params for X + intercept

# Noisy mean
pnoisy = logistic(X, B, noise=np.random.normal(loc=0., scale=0., size=len(x1)))
# dichotomize pnoisy -- sample 0/1 with probability pnoisy
y = np.random.binomial(1., pnoisy)
And then I run ADVI like this:
with pm.Model() as model:
    # Define priors
    intercept = pm.Normal('Intercept', 0, sd=10)
    x1_coef = pm.Normal('x1', 0, sd=10)
    x2_coef = pm.Normal('x2', 0, sd=10)
    # Define likelihood
    likelihood = pm.Bernoulli('y',
                              pm.math.sigmoid(intercept + x1_coef*X[0] + x2_coef*X[1]),
                              observed=y)
    approx = pm.fit(90000, method='advi')
Unfortunately, no matter how much I increase the number of iterations, ADVI does not seem able to recover the original betas I defined [-10., 2., 1.], while MCMC works fine (as shown below).
Thanks for the help!
This is an interesting question! The default 'advi' in PyMC3 is mean field variational inference, which does not do a great job capturing correlations. It turns out that the model you set up has an interesting correlation structure, which can be seen with this:
import arviz as az
az.plot_pair(trace, figsize=(5, 5))
PyMC3 has a built-in convergence checker - running the optimization for too long or too short can lead to funny results:
from pymc3.variational.callbacks import CheckParametersConvergence

with model:
    fit = pm.fit(100_000, method='advi', callbacks=[CheckParametersConvergence()])
    draws = fit.sample(2_000)
This stops after about 60,000 iterations for me. Now we can inspect the correlations and see that, as expected, ADVI fit axis-aligned Gaussians:
az.plot_pair(draws, figsize=(5, 5))
Finally, we can compare the fit from NUTS and (mean field) ADVI:
az.plot_forest([draws, trace])
Note that ADVI underestimates the variance, but is fairly close for the mean of each parameter. Also, you can set method='fullrank_advi' to capture the correlations you are seeing a little better.
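A sketch of that, reusing the model above (the iteration count is arbitrary, and the fit may stop early via the convergence callback):
with model:
    fullrank_fit = pm.fit(100_000, method='fullrank_advi',
                          callbacks=[CheckParametersConvergence()])
    fullrank_draws = fullrank_fit.sample(2_000)
az.plot_pair(fullrank_draws, figsize=(5, 5))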
(note: arviz is soon to be the plotting library for PyMC3)

Theano - logistic regression example weight vector becomes NaN?

I am doing a tutorial (code here) and video here (13:00 minutes in).
My only change is using the MNIST training set from a different location (and creating a one-hot encoding), but it is not working. I literally copy-pasted all the code (except for the MNIST loading) in this example. Here is the code:
import theano
from theano import tensor as T
import numpy as np
from numpy import newaxis
from sklearn.datasets import fetch_mldata
# in older scikit-learn versions train_test_split lives in sklearn.cross_validation
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

def floatX(X):
    return np.asarray(X, dtype=theano.config.floatX)

def init_weights(shape):
    return theano.shared(floatX(np.random.randn(*shape) * 0.1))

def model(X, w):
    return T.nnet.softmax(T.dot(X, w))

mnist = fetch_mldata("MNIST Original")
trX, teX, trY_digit, teY_digit = train_test_split(mnist.data, mnist.target, test_size=.4)

# Get one-hot encoding
enc = OneHotEncoder()
enc.fit([[n] for n in range(10)])
# sparse_to_floatX is a helper from the tutorial (it converts the sparse
# one-hot matrix to a dense floatX array); it is not defined in this snippet
trY = sparse_to_floatX(enc.transform(trY_digit[:, newaxis]))
teY = sparse_to_floatX(enc.transform(teY_digit[:, newaxis]))

X = T.fmatrix()
Y = T.fmatrix()
w = init_weights((784, 10))

py_x = model(X, w)
y_pred = T.argmax(py_x, axis=1)

cost = T.mean(T.nnet.categorical_crossentropy(py_x, Y))
gradient = T.grad(cost=cost, wrt=w)
update = [[w, w - gradient * 0.05]]

train = theano.function(inputs=[X, Y], outputs=cost, updates=update, allow_input_downcast=True)
predict = theano.function(inputs=[X], outputs=y_pred, allow_input_downcast=True)

for i in range(10):
    print w.get_value()
    cost = train(trX, trY)
    print i, predict(teX)
The weight vector updates once, and becomes all NaN on the second update. I am very new to theano, but I am looking for tips to figure this out, especially if someone has already done this tutorial.
UPDATE.
It looks like the gradient is the issue.
When I add this:
the_grad = T.sum(gradient)
f_grad = theano.function(inputs=[X, Y], outputs=the_grad, allow_input_downcast=True)
print f_grad(trX, trY)
It prints NaN. This appears to be the correct usage of T.grad though.
UPDATE 2.
When I change the cost function to this:
cost = T.mean(T.sum(T.sqr(py_x - Y), axis=1), axis=0)
It works now, but I only get 70% accuracy, which is really bad.
UPDATE 3.
I downloaded the MNIST data used in the tutorial and it worked, with 92% accuracy.
I am not sure why my first MNIST data source was failing with the cross-entropy cost, and then performing really poorly with the mean squared error cost function.

ILNumerics: ILMath.ridge_regression

Does anyone know how to use the ridge_regression function in ILMath?
I tried reading the documentation and searching several websites, but I can't find an example.
Here is the method:
public static ILMath.ILRidgeRegressionResult<double> ridge_regression(
    ILInArray<double> X,
    ILInArray<double> Y,
    ILBaseArray Degree,
    ILBaseArray Regularization
)
See the ILNumerics documentation for the details of the function.
I am a bit confused by the "Regularization" parameter.
Basically, ridge_regression learns a polynomial model from some training data. It works in a two-step process:
1) In the learning phase you create a model. The model is represented by an instance of the ILRidgeRegressionResult class, which is returned from ridge_regression:
using (var result = ridge_regression(X, Y, 4, 0.01)) {
    // the model represented by 'result' is used within here
    // to apply it to some unseen data... See step 2) below.
    L.a = result.Apply(X + 0.6);
}
Here, X is some data set and Y is the set of 'labels' corresponding to that X data. In this example, X is a linear vector and Y is the result of the sin() function on that vector. So the ridge_regression result represents a model which produces similar results as the sin() function - within certain limits. In real applications, X may be of any dimensionality.
2) Apply the model: the regression result is then used to estimate values corresponding to new, unseen data. We apply the model to data which have the same number of dimensions as the original data, but where some of the data points lie within the range we used to learn from and some lie outside of it. The Apply() function of the regression result object therefore allows us to interpolate as well as extrapolate data.
The complete example:
private class Computation : ILMath {
    public static void Fit(ILPanel panel) {
        using (ILScope.Enter()) {
            // just some data
            ILArray<double> X = linspace(0, 30, 20) / pi / 4;
            // the underlying function. Here: sin()
            ILArray<double> Y = sin(X);
            // learn a model of 4th order, representing the sin() function
            using (var result = ridge_regression(X, Y, 4, 0.002)) {
                // the model represented by 'result' is used within here
                // to apply it to some unseen data... See step 2) above.
                ILArray<double> L = result.Apply(X + 0.6);
                // plot the stuff: create a plotcube + 2 line XY-plots
                ILArray<double> plotData = X.C; plotData["1;:"] = Y;
                ILArray<double> plotDataL = X + 0.6; plotDataL["1;:"] = L;
                panel.Scene.Add(new ILPlotCube() {
                    new ILLinePlot(tosingle(plotData), lineColor: Color.Black, markerStyle: MarkerStyle.Dot),
                    new ILLinePlot(tosingle(plotDataL), lineColor: Color.Red, markerStyle: MarkerStyle.Circle),
                    new ILLegend("Original", "Ridge Regression")
                });
            }
        }
    }
}
This produces the following result (the original data in black, the ridge regression estimate in red):
Some Notes
1) Use ridge_regression in a 'using' block (C#). This ensures that the data of the model, which can be quite large, are disposed of correctly.
2) The Regularization parameter becomes more important once you try to learn a model from data which may introduce stability problems. You need to experiment with the regularization term and take the actual data into account.
3) In this example, you see the interpolation result fitting the original function very nicely. However, the underlying model is based on a polynomial. As is common for (all/polynomial) models, the estimated values may reflect the underlying function less and less the farther you get from the original range of values used in the learning phase.
