ILNumerics: ILMath.ridge_regression - arrays

Does anyone know how to use the ridge_regression function in ILMath?
I have tried reading the documentation and searching several websites, but I can't find an example.
Here is the method:
public static ILMath.ILRidgeRegressionResult<double> ridge_regression(
    ILInArray<double> X,
    ILInArray<double> Y,
    ILBaseArray Degree,
    ILBaseArray Regularization
)
(See the ILNumerics documentation for the details of the function.)
I am a bit confused about the "Regularization" parameter.

ILNumerics.ILMath.ridge_regression
Basically, ridge_regression learns a polynomial model from some sample data. It works in a two-step process:
1) In the learning phase you create a model. The model is represented by an instance of the ILRidgeRegressionResult class which is returned from ridge_regression:
using (var result = ridge_regression(Data, Labels, 4, 0.01)) {
    // the model represented by 'result' is used within here
    // to apply it to some unseen data... See step 2) below.
    L.a = result.Apply(X + 0.6);
}
Here, X is some data set and Y is the set of 'labels' which correspond to those X data. In this example, X is a linear vector and Y is the result of the sin() function on that vector. So the ridge_regression result represents a model which produces similar results as the sin() function - within certain limits. In real applications, X may be of any dimensionality.
2) Apply the model: the regression result is then used to estimate values corresponding to new, unseen data. We apply the model to data which have the same number of dimensions as the original data, but some of the data points lie within the range we learned from and some outside of it. The Apply() function of the regression result object therefore allows us to interpolate as well as extrapolate data.
The complete example:
private class Computation : ILMath {
    public static void Fit(ILPanel panel) {
        using (ILScope.Enter()) {
            // just some data
            ILArray<double> X = linspace(0, 30, 20) / pi / 4;
            // the underlying function. Here: sin()
            ILArray<double> Y = sin(X);
            // learn a model of 4th order, representing the sin() function
            using (var result = ridge_regression(X, Y, 4, 0.002)) {
                // the model represented by 'result' is used within here
                // to apply it to some unseen data... See step 2) below.
                ILArray<double> L = result.Apply(X + 0.6);
                // plot the stuff: create a plotcube + 2 line XY-plots
                ILArray<double> plotData = X.C; plotData["1;:"] = Y;
                ILArray<double> plotDataL = X + 0.6; plotDataL["1;:"] = L;
                panel.Scene.Add(new ILPlotCube() {
                    new ILLinePlot(tosingle(plotData), lineColor: Color.Black, markerStyle: MarkerStyle.Dot),
                    new ILLinePlot(tosingle(plotDataL), lineColor: Color.Red, markerStyle: MarkerStyle.Circle),
                    new ILLegend("Original", "Ridge Regression")
                });
            }
        }
    }
}
This produces a plot showing the original data (black dots, "Original") and the regression output (red circles, "Ridge Regression").
Some Notes
1) Use ridge_regression in a 'using' block (C#). This ensures that the data of the model, which can be quite large, are disposed of correctly.
2) The Regularization becomes more important once you try to learn a model from data that may introduce stability problems. You need to experiment with the regularization term and take the actual data into account.
3) In this example, the interpolation result fits the original function very nicely. However, the underlying model is based on a polynomial. As is common for such models, the estimated values may reflect the underlying function less and less the farther you get from the original range of values used in the learning phase.

Related

Receiving different measured values from crossK and lohboot

I have a marked ppp dataset looking at crimes and their relation to locations.
I am performing an inhomogeneous cross-K using Kcross.inhom, and am using lohboot to bootstrap confidence intervals around the inhomogeneous cross-K. However, I am getting different measured values of iso for the two, when we would anticipate identical values.
The crime dataset is 26k rows, and I am unsure how to subset it to create a reproducible example.
#creating the ppp
crime.coords = as.data.frame(st_coordinates(crime)) #coordinates of crimes
center.coords = as.data.frame(st_coordinates(center)) #coordinates of locations
temp = rbind(data.frame(x=crime.coords$X, y=crime.coords$Y, type='crime'),
             data.frame(x=center.coords$X, y=center.coords$Y, type='center')) #df for marked ppp
temp = ppp(temp[,1], temp[,2], window=owin(border.coords), marks=relevel(as.factor(temp$type), 'crime')) #creating marked ppp
#creating an intensity model of the crimes
temp = rescale(temp, 10000) #rescaling for polynomial model coefficients
crime.ppp = unmark(split(temp)$crime)
model.crime = ppm(crime.ppp ~ polynom(x, y, 2), Poisson())
ck = Kcross.inhom(temp, i = 'crime', j = 'center', lambdaI = model.crime) #cross K w/ intensity function
ckenv = lohboot(temp, fun='Kcross.inhom', i = 'crime', j='center', lambdaI = model.crime) #bootstrapped CIs for cross K w/ intensity function
Here are the values plotted, showing different curves:
A few things I've noted: the r values are different for the two functions, and setting r in lohboot does not in fact make them identical. I am unsure where to go from here and have exhausted all my resources in finding a solution. Thank you in advance.
These curves are not guaranteed to be equal. lohboot subdivides the data, randomly resamples the subdivisions, computes the contributions from these randomly selected subdivisions, and averages them. If you repeat the experiment you should get a slightly different answer from lohboot each time. See the help file for lohboot.
It would be desirable for the two curves to be close. Unfortunately, the default behaviour of lohboot does not often achieve that. For consistency, the default behaviour follows the original implementation, which was not very good. Try setting block = TRUE for better performance. Also try the other options basicboot and Vcorrection.
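For example, a minimal tweak of the lohboot call from the question, keeping all other arguments the same (basicboot and Vcorrection can be toggled in the same way), might look like this:
ckenv = lohboot(temp, fun = 'Kcross.inhom', i = 'crime', j = 'center',
                lambdaI = model.crime, block = TRUE) #block bootstrap, as suggested above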

MatchIt - how to make matching date specific?

I'm trying to use MatchIt to create two sets of matched investment companies (treatment vs control).
I need to match the treatment companies to the control companies using only data from the 1-3 years preceding the treatment.
For example, if a company received treatment in 2009, then I would want to match it using data from 2009, 2008 and 2007 (my after-treatment effects dummy would hold a value from 2010 onwards in this case).
I am unsure how to add this selection into my matching code, which currently looks like this:
matchit(signatory ~ totalUSD + brownUSD + country + strategy, data = panel6, method = "full")
Should I consider using the 'after treatment' effects dummy in some way?
Any tips for how I add this in would be greatly appreciated!
There is no straightforward way to do this in MatchIt. You can set a caliper, which requires the control companies to be within a certain number of years of a treated company, but there isn't a way to require that control companies have a year strictly before the treated company. You can perform exact matching on year, so that the treated and control companies have exactly the same year, using the exact argument.
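For instance (a sketch, assuming panel6 contains the year variable used later in this answer):
m_exact <- matchit(signatory ~ totalUSD + brownUSD + country + strategy,
                   data = panel6, method = "full", exact = ~year)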
Another, slightly more involved way is to construct a distance matrix yourself and set to Inf any distances between units that are forbidden to match with each other. The first step would be estimating a propensity score, which you can do manually or using matchit(). Then you construct a distance matrix and, for each entry in the distance matrix, decide whether to set the distance to Inf. Finally, you can supply the distance matrix to the distance argument of matchit(). Here's how you would do that:
#Estimate the propensity score
ps <- matchit(signatory ~ totalUSD + brownUSD + country + strategy,
              data = panel6, method = NULL)$distance
#Create the distance matrix
dist <- optmatch::match_on(signatory ~ ps, data = panel6)
#Loop through the matrix and set disallowed matches to Inf
t <- which(panel6$signatory == 1)
u <- which(panel6$signatory != 1)
for (i in seq_along(t)) {
  for (j in seq_along(u)) {
    if (panel6$year[u[j]] > panel6$year[t[i]] || panel6$year[u[j]] < panel6$year[t[i]] - 2)
      dist[i,j] <- Inf
  }
}
#Note: can be vectorized for speed but shouldn't take long regardless
#Supply the distance matrix to matchit() and match
m <- matchit(signatory ~ totalUSD + brownUSD + country + strategy,
             data = panel6, method = "full", distance = dist)
That should work. You can verify by looking at individual groups of matched companies using match.data():
md <- match.data(m, data = panel6)
md <- md[with(md, order(subclass, signatory)),]
View(md) #assuming you're using RStudio
You should see that within subclasses, the control units are 0-2 years below the treated units.

Interpolate 2D Array to single point in MATLAB

I have three graphs of an IV curve (a monotonically increasing function; think of a positive quadratic function in the 1st quadrant; photo attached) at three different temperatures that are not spaced linearly. That is, one is obtained at 25C, one at 125C and one at 150C.
What I want to make is an interpolated 2D array to fill in the other temperatures. My current method to build a meshgrid-type array is as follows:
H = 5;
W = 6;
[Wmat,Hmat] = meshgrid(1:W,1:H);
X = [1:W; 1:W];
Y = [ones(1,W); H*ones(1,W)];
Z = [vecsatIE25; vecsatIE125];
img = griddata(X,Y,Z,Wmat,Hmat,'linear')
This works to build an H-by-W array, from which I can then index a single row and interpolate from that 1D array.
This is really not what I want to do.
For example, the rows correspond to temps = 25C, 50C, 75C, 100C, 125C and 150C. So I must select a temperature of, say, 50C when my temperature is actually 57.5C. Then I can interpolate my I to get my V output. So, again for example, my I is 113.2A, and I can actually interpolate a value and get a V for 113.2A.
When I take the attached photo and digitize the plot information, I get an array of points. So my goal is to input any temperature and any current and get a voltage by interpolation. The type of interpolation is not that important, so long as it produces reasonable values - I do not want nearest-neighbour interpolation; linear or something similar is preferred. If it is an option, I will try different kinds of interpolation later (cubic, linear).
I am not sure how best to accomplish this. The meshgrid array does not need to exist; I simply need the one value.
Thank you.
If I understand the question properly, I think what you're looking for is interp2:
Vq = interp2(X,Y,V,Xq,Yq) where Vq is the V you want, Xq and Yq are the temperature and current, and X, Y, and V are the input arrays for temperature, current, and voltage.
As an option, you can change the method between 'linear', 'nearest', 'cubic', 'makima', and 'spline'.
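As a minimal sketch under assumptions from the question (the digitized curves vecsatIE25, vecsatIE125 and vecsatIE150 resampled onto a common current grid are placeholders, and the current range is arbitrary), it could look like this:
% temperatures at which the three digitized curves were measured
T = [25 125 150];
% common current grid onto which each digitized curve has been resampled
Ivec = linspace(0, 200, 50);
% V holds one row of voltages per temperature (3-by-50)
V = [vecsatIE25; vecsatIE125; vecsatIE150];
% sample grids the same size as V
[Igrid, Tgrid] = meshgrid(Ivec, T);
% single interpolated voltage at 57.5C and 113.2A
Vq = interp2(Igrid, Tgrid, V, 113.2, 57.5, 'linear');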

How to write a random array (with no spatial reference) to geotiff format?

The following MATLAB script generates random locations within a 300x400 array and codes those locations with values from 1-12. How can I convert this non-spatial array to a geotiff? I hope to use the geotiff output to perform some trial analyses. Any projected coordinate system (e.g. UTM) would do for this analysis.
I have tried using geotiffwrite() without success using the following implementation:
out = geotiffwrite('C:\path\to\file\test.tif', m)
Which yields the following error:
>> test
Error using geotiffwrite
Too many output arguments.
EDIT:
The main problem I am encountering is a lack of inputs to the geotiffwrite() function, and I am unsure how to deal with it. For example, I have no A or R variable because the array has no spatial reference. As long as each pixel is georeferenced somewhere, I do not care what the spatial reference is. The purpose of this is to create a sample dataset that I can experiment with using MATLAB spatial functions.
% Generate a totally black image to start with.
m = zeros(300, 400, 'uint8');
% Generate 1000 random locations.
numRandom = 1000;
linearIndices = randi(numel(m), 1, numRandom);
% Set those locations to random values from 1 to 12.
m(linearIndices) = randi(12, [numel(linearIndices) 1]);
% Display it. The random locations will appear as non-black pixels.
image(m);
colormap(gray);
I believe your question has a very simple answer. Skip the out-variable when you call geotiffwrite. That is, use:
geotiffwrite('C:\path\to\file\test.tif', m)
Instead of
out = geotiffwrite('C:\path\to\file\test.tif', m)
This is an example of working code using geotiffwrite, taken from the documentation. As you can see, there is no output variable there:
basename = 'boston_ovr';
imagefile = [basename '.jpg'];
RGB = imread(imagefile);
worldfile = getworldfilename(imagefile);
R = worldfileread(worldfile, 'geographic', size(RGB));
filename = [basename '.tif'];
geotiffwrite(filename, RGB, R)
figure
usamap(RGB, R)
geoshow(filename)
Update:
According to the documentation, you need at least 3 input parameters. The correct syntax is:
geotiffwrite(filename,A,R)
geotiffwrite(filename,X,cmap,R)
geotiffwrite(...,Name,Value)
From documentation:
geotiffwrite(filename,A,R) writes a georeferenced image or data grid,
A, spatially referenced by R, into an output file, filename.
Please see the geotiffwrite documentation for details on how to use the function.
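To address the EDIT (there is no R variable because the array has no spatial reference), one option is simply to invent one. This is a sketch, not part of the original answer; the extent and the EPSG code below are arbitrary placeholder choices:
% arbitrary 1 m cells placed somewhere in a projected system (units: meters)
R = maprefcells([500000 500400], [4000000 4000300], size(m));
% write the array together with an arbitrary projected CRS (EPSG:32619, WGS 84 / UTM zone 19N)
geotiffwrite('C:\path\to\file\test.tif', m, R, 'CoordRefSysCode', 32619);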

Uniformly sampling on hyperplanes

Given the vector size N, I want to generate a vector <s1, s2, ..., sn> such that s1 + s2 + ... + sn = S.
It is known that 0 < S < 1 and si < S. Also, the vectors generated should be uniformly distributed.
Any code in C that helps explain would be great!
The code here seems to do the trick, though it's rather complex.
I would probably settle for a simpler rejection-based algorithm, namely: pick an orthonormal basis in n-dimensional space starting with the hyperplane's normal vector. Transform each of the points (S,0,...,0), (0,S,0,...,0), ..., (0,...,0,S) into that basis and store the minimum and maximum along each of the basis vectors. Sample each component uniformly in the new basis, except for the first one (along the normal vector), which is always S, then transform back to the original space and check whether the constraints are satisfied. If they are not, sample again.
P.S. I think this is more of a maths question, actually, could be a good idea to ask at http://maths.stackexchange.com or http://stats.stackexchange.com
[I'll skip the "hyper-" prefix for simplicity]
One possible idea: generate many uniformly distributed points in some enclosing volume and project them onto the target part of the plane.
To get a uniform distribution, the volume must be shaped like the part of the plane but with added margins along the plane normal.
To uniformly generate points in such a volume, we can enclose it in a cube and reject everything outside of the volume.
1) Select a margin M; let's take M = S for simplicity (as long as the margin is positive it only affects performance).
2) Generate a point in the cube [-M,S+M]x[-M,S+M]x[-M,S+M].
3) If the distance from the point to the plane is more than M, reject the point and go to step 2.
4) Project the point onto the plane.
5) Check that the projection falls into [0,S]x[0,S]x[0,S]; if not, reject it and go to step 2.
6) Add the point to the resulting set and go to step 2 if you need more points.
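A minimal C sketch of these steps for N = 3 (the specific values of S, the margin, the RNG and the sample count are illustrative choices, not part of the original answer):
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define N 3

/* uniform double in [lo, hi] */
static double uniform_in(double lo, double hi) {
    return lo + (hi - lo) * ((double)rand() / RAND_MAX);
}

int main(void) {
    const double S = 0.7;      /* target sum, 0 < S < 1 */
    const double M = S;        /* step 1: margin along the plane normal */
    const int wanted = 1000;   /* number of accepted samples */
    int kept = 0;
    srand(42);

    while (kept < wanted) {
        double x[N], sum = 0.0;
        int inside = 1, i;

        /* step 2: uniform point in the cube [-M, S+M]^N */
        for (i = 0; i < N; i++) {
            x[i] = uniform_in(-M, S + M);
            sum += x[i];
        }

        /* step 3: reject if farther than M from the plane sum(x) = S */
        if (fabs(sum - S) / sqrt((double)N) > M)
            continue;

        /* step 4: project onto the plane along the normal (1,...,1) */
        for (i = 0; i < N; i++)
            x[i] -= (sum - S) / N;

        /* step 5: keep only projections inside [0, S]^N */
        for (i = 0; i < N; i++) {
            if (x[i] < 0.0 || x[i] > S) { inside = 0; break; }
        }
        if (!inside)
            continue;

        /* step 6: accept the sample */
        for (i = 0; i < N; i++)
            printf("%f%c", x[i], i == N - 1 ? '\n' : ' ');
        kept++;
    }
    return 0;
}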
The problem can be mapped to that of sampling on linear polytopes, for which the common approaches are Monte Carlo methods, random walks, and hit-and-run methods (see https://www.jmlr.org/papers/volume19/18-158/18-158.pdf for examples and a short comparison). It is related to linear programming and can be extended to manifolds.
There is also the analysis of polytopes in compositional data analysis, e.g. https://link.springer.com/content/pdf/10.1023/A:1023818214614.pdf, which provides an invertible transformation between the plane and the polytope that can be used for sampling.
If you are working in low dimensions, you can also use rejection sampling. This means you first sample on the plane containing the polytope and then reject the samples that violate the inequalities defining it. This latter method is easy to implement (and wasteful, of course); the GNU Octave code below (I leave it to the author of the question to re-implement it in C) is an example.
The first requirement is to get a vector orthogonal to the hyperplane. For a sum of N variables this is n = (1,...,1). The second requirement is a point on the plane; for your example that could be p = (S,...,S)/N.
Now any point x on the plane satisfies n^T * (x - p) = 0.
We also assume that x_i >= 0.
With these given, you compute an orthonormal basis of the plane (the null space of the vector n) and then create random combinations of that basis. Finally, you map back to the original space and apply your constraints to the generated samples.
# Example in 3D
dim = 3;
S = 1;
n = ones(dim, 1); # perpendicular vector
p = S * ones(dim, 1) / dim;
# null-space of the perpendicular vector (transposed, i.e. row vector)
# this generates a basis in the plane
V = null (n.');
# These steps are just to reduce the amount of samples that are rejected
# we build a tight bounding box
bb = S * eye(dim); # each column is a corner of the constrained region
# project on the null-space
w_bb = V \ (bb - repmat(p, 1, dim));
wmin = min (w_bb(:));
wmax = max (w_bb(:));
# random combinations and map back
nsamples = 1e3;
w = wmin + (wmax - wmin) * rand(dim - 1, nsamples);
x = V * w + p;
# mask the points inside the polytope
msk = true(1, nsamples);
for i = 1:dim
  msk &= (x(i,:) >= 0);
endfor
x_in = x(:, msk); # inside the polytope (your samples)
x_out = x(:, !msk); # outside the polytope
# plot the results
scatter3 (x(1,:), x(2,:), x(3,:), 8, double(msk), 'filled');
hold on
plot3(bb(1,:), bb(2,:), bb(3,:), 'xr')
axis image
