Uniformly sampling on hyperplanes in C

Given the vector size N, I want to generate a vector <s1, s2, ..., sN> such that s1 + s2 + ... + sN = S.
It is known that 0 < S < 1 and each si < S. The generated vectors should also be uniformly distributed.
Any code in C that helps explain would be great!

The code here seems to do the trick, though it's rather complex.
I would probably settle for a simpler rejection-based algorithm, namely: pick an orthonormal basis in n-dimensional space whose first vector is the hyperplane's normal. Transform each of the points (S,0,...,0), (0,S,0,...,0), ..., (0,...,0,S) into that basis and store the minimum and maximum along each basis vector. Sample each component in the new basis uniformly between its stored bounds, except for the first one (the coordinate along the normal), which is the same for every point on the hyperplane; then transform back to the original space and check whether the constraints are satisfied. If they are not, sample again.
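As a point of comparison, here is a minimal NumPy sketch of an even simpler rejection scheme (the question asks for C, but the logic translates directly; like the answers below, it assumes si >= 0): draw the first N-1 components uniformly from [0, S] and solve for the last one. The hyperplane is the graph of a linear function over those components, so the area scaling is constant and accepted points are uniform on the constrained region; the acceptance rate, however, drops quickly with N.

import numpy as np

rng = np.random.default_rng()

def sample_sum_constrained(N, S):
    # uniform sample of (s1, ..., sN) with sum S and 0 <= si <= S, by rejection
    while True:
        s = rng.uniform(0.0, S, size=N - 1)   # first N-1 components
        last = S - s.sum()                    # forced by the sum constraint
        if 0.0 <= last <= S:                  # accept only if the last one is feasible
            return np.append(s, last)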
P.S. I think this is more of a maths question, actually, could be a good idea to ask at http://maths.stackexchange.com or http://stats.stackexchange.com

[I'll skip "hyper-" prefix for simplicity]
One possible idea: generate many uniformly distributed points in some enclosing volume and project them onto the target part of the plane.
To get a uniform distribution, the volume must be shaped like the part of the plane but with margins added along the plane's normal.
To generate points uniformly in such a volume, we can enclose it in a cube and reject everything outside of the volume.
1. Select a margin M; let's take M = S for simplicity (as long as the margin is positive, it only affects performance).
2. Generate a point in the cube [-M, S+M] x [-M, S+M] x [-M, S+M].
3. If the distance from the point to the plane is more than M, reject it and go to step 2.
4. Project the point onto the plane.
5. Check that the projection falls into [0,S] x [0,S] x [0,S]; if not, reject it and go to step 2.
6. Add the point to the resulting set and go to step 2 if you need more points.
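A minimal NumPy sketch of these steps for the 3D case (again, the question asks for C, but this translates directly; the values of S and M are illustrative):

import numpy as np

rng = np.random.default_rng()
S = 0.7
M = S                                  # margin, step 1
n = np.ones(3) / np.sqrt(3.0)          # unit normal of the plane x + y + z = S

def sample_on_plane():
    while True:
        pt = rng.uniform(-M, S + M, size=3)      # step 2: point in the cube
        dist = (pt.sum() - S) / np.sqrt(3.0)     # signed distance to the plane
        if abs(dist) > M:                        # step 3: outside the slab
            continue
        proj = pt - dist * n                     # step 4: orthogonal projection
        if np.all((proj >= 0.0) & (proj <= S)):  # step 5: inside [0,S]^3
            return proj                          # step 6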

The problem can be mapped to that of sampling on linear polytopes, for which the common approaches are Monte Carlo methods, random walks, and hit-and-run methods (see https://www.jmlr.org/papers/volume19/18-158/18-158.pdf for examples and a short comparison). It is related to linear programming, and can be extended to manifolds.
There is also the analysis of polytopes in compositional data analysis, e.g. https://link.springer.com/content/pdf/10.1023/A:1023818214614.pdf, which provides an invertible transformation between the plane and the polytope that can be used for sampling.
If you are working in low dimensions, you can also use rejection sampling: first sample on the plane containing the polytope (defined by your inequalities), then reject the samples violating the constraints. This latter method is easy to implement (and wasteful, of course); the GNU Octave code below (which I leave to the author of the question to re-implement in C) is an example.
The first requirement is a vector orthogonal to the hyperplane. For a sum of N variables this is n = (1, ..., 1). The second requirement is a point on the plane. For your example that could be p = (S, ..., S)/N.
Now any point on the plane satisfies n^T * (x - p) = 0.
We also assume that x_i >= 0.
With these given, you compute an orthonormal basis on the plane (the null space of the vector n) and then create random combinations in that basis. Finally you map back to the original space and apply your constraints to the generated samples.
# Example in 3D
dim = 3;
S = 1;
n = ones(dim, 1); # perpendicular vector
p = S * ones(dim, 1) / dim;
# null-space of the perpendicular vector (transposed, i.e. row vector)
# this generates a basis in the plane
V = null (n.');
# These steps are just to reduce the amount of samples that are rejected
# we build a tight bounding box
bb = S * eye(dim); # each column is a corner of the constrained region
# project on the null-space
w_bb = V \ (bb - repmat(p, 1, dim));
wmin = min (w_bb(:));
wmax = max (w_bb(:));
# random combinations and map back
nsamples = 1e3;
w = wmin + (wmax - wmin) * rand(dim - 1, nsamples);
x = V * w + p;
# mask the points inside the polytope
msk = true(1, nsamples);
for i = 1:dim
  msk &= (x(i,:) >= 0);
endfor
x_in = x(:, msk); # inside the polytope (your samples)
x_out = x(:, !msk); # outside the polytope
# plot the results
scatter3 (x(1,:), x(2,:), x(3,:), 8, double(msk), 'filled');
hold on
plot3(bb(1,:), bb(2,:), bb(3,:), 'xr')
axis image

Creating a logarithmic spaced array in IDL

I was looking for a way to generate a logarithmic spaced array in IDL.
From the L3 Harris Geospatial website I came across "arrgen" and was trying to use it for this purpose. However,
arrgen(1,215,/log)
returns the error: Variable is undefined: ARRGEN.
What would be the correct way to do it?
Thanks in advance for your help
Start by defining your lower and upper bounds in whichever log base you prefer. I will use base $e$ for brevity's sake.
lowe = ALOG(low[0])
uppe = ALOG(upp[0])
where low and upp are scalar, numerical values you, the user, define (e.g., 1 and 215 in your example). Then construct an evenly spaced array of n elements, such as:
dinde = DINDGEN(n[0])*(uppe[0] - lowe[0])/(n[0] - 1L) + lowe[0]
where n is a scalar integer. Now convert back to linear space to get:
dind = EXP(dinde)
This will be a logarithmically spaced array. If you want to work in base-10 logs, substitute ALOG10 for ALOG (and exponentiate with 10^dinde instead of EXP). If you need another base, you can use the logarithmic change-of-base rule:
log_b x = log_c x / log_c b
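For reference, the same recipe in Python/NumPy (a sketch; the bounds mirror the question's example, and n is arbitrary):

import numpy as np

low, upp, n = 1.0, 215.0, 50
# evenly spaced in log space, then mapped back to linear space
dind = np.exp(np.linspace(np.log(low), np.log(upp), n))
# equivalently, in a single call:
dind = np.geomspace(low, upp, n)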

FiPy: Can I directly change faceVariables depending on neighboring cells?

I am working with a biological model of the distribution of microbial biomass (b1) on a 2D grid. From the biomass a protein (p1) is produced. The biomass diffuses over the grid, while the protein does not. The biomass is supposed to diffuse only once a certain amount of protein has been produced (p > p_lim).
I try to implement this by using a dummy cell variable z multiplied with the diffusion coefficient and setting it from 0 to 1 only in cells where p > p_lim.
The condition works fine, and when the critical amount of p is reached in a cell, z is set to 1 and diffusion happens. However, the diffusion still does not work at the rate I would like, because the diffusion term uses the face variable, not the value of the cell itself. The faces of z are always a mean of a cell with z=1 and its neighboring cells with z=0. I, however, would like the diffusion to work at its original rate even if the neighbouring cell is still at p < p_lim.
So, my question is: can I somehow access a faceVariable and change it? For example, set a face to 1 if any neighboring cell has reached p1 > p_lim? I guess this is not a proper mathematical thing to do, but I couldn't think of another way to simulate this problem.
I will show a very reduced form of my model below. In any case, I thank you very much for your time!
##### produce mesh
from fipy import (Grid2D, CellVariable, TransientTerm, DiffusionTerm,
                  ImplicitSourceTerm, Viewer)
nx = 5
ny = nx
dx = 1.
dy = dx
L = nx * dx
mesh = Grid2D(nx=nx, ny=ny, dx=dx, dy=dy)
x, y = mesh.cellCenters  # cell-center coordinates, needed for setValue below
# parameters
h1 = 0.5    # production rate of p
Db = 10.    # diffusion coeff of b
p_lim = 0.1
# cell variables
z = CellVariable(name="z", mesh=mesh, value=0.)
b1 = CellVariable(name="b1", mesh=mesh, hasOld=True, value=0.)
p1 = CellVariable(name="p1", mesh=mesh, hasOld=True, value=0.)
# equations
eqb1 = (TransientTerm(var=b1) == DiffusionTerm(var=b1, coeff=Db*z.arithmeticFaceValue)
        - ImplicitSourceTerm(var=b1, coeff=h1))
eqp1 = (TransientTerm(var=p1) == ImplicitSourceTerm(var=b1, coeff=h1))
# set b1 to 10. in the center of the grid
b1.setValue(10., where=((x > 2.) & (x < 3.) & (y > 2.) & (y < 3.)))
vi = Viewer(vars=(b1, p1), FIPY_VIEWER="matplotlib")
eq = eqb1 & eqp1
from builtins import range
for t in range(10):
    b1.updateOld()
    p1.updateOld()
    z.setValue(z + 0.1, where=((p1 >= p_lim) & (z < 1.)))
    eq.solve(dt=0.1)
    vi.plot()
In addition to .arithmeticFaceValue, FiPy provides other interpolators between cell and face values, such as .harmonicFaceValue and .minmodFaceValue.
These properties are implemented using subclasses of _CellToFaceVariable, specifically _ArithmeticCellToFaceVariable, _HarmonicCellToFaceVariable, and _MinmodCellToFaceVariable.
You can also make a custom interpolator by subclassing _CellToFaceVariable. Two such examples are _LevelSetDiffusionVariable and ScharfetterGummelFaceVariable (neither is well documented, I'm afraid).
You need to override the _calc_() method to provide your custom calculation. This method takes three arguments:
alpha: an array of the ratio (0-1) of the distance from the face to the cell on one side, relative to the distance from the cell on one side to the cell on the other side
id1: an array of indices of the cells on one side of the face
id2: an array of indices of the cells on the other side of the face
Note: you can ignore any "if inline.doInline:" clause and look at the _calc_() method defined under the "else:" clause.
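For illustration, here is a hedged sketch of such a subclass that takes the elementwise maximum of the two neighboring cells, so that a face becomes fully "open" as soon as either adjacent cell has z = 1. The import path and the _calc_() signature follow the description above, but this is private API that may differ between FiPy versions:

from fipy.tools import numerix
from fipy.variables.cellToFaceVariable import _CellToFaceVariable

class _MaxCellToFaceVariable(_CellToFaceVariable):
    # face value = elementwise maximum of the two adjacent cells (alpha is ignored)
    def _calc_(self, alpha, id1, id2):
        return numerix.maximum(numerix.take(self.var, id1),
                               numerix.take(self.var, id2))

The diffusion coefficient would then read Db * _MaxCellToFaceVariable(z) in place of Db * z.arithmeticFaceValue (untested; check against your FiPy version).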

Take numbers from two intervals in concentric spheres in Julia

I am trying to take numbers from two intervals in Julia. The problem is the following:
I am trying to create concentric spheres, and I need to generate vectors of dimension equal to 15 filled with numbers taken from each circle. The code is:
rmax = 5
ra = fill(0.0, 1, rmax)
for i = 1:rmax-1
    ra[:, i] .= rad / i
    ra[:, rmax] .= 0
end
for i = 1:3
    ptset = Any[]
    for j = 1:200
        yt = 0
        yt = rand(Truncated(Normal(0, 1), -ra[i], ra[i]))
        if -ra[(i+1)] < yt <= -ra[i] || ra[(i+1)] <= yt < ra[i]
            push!(ptset, yt)
            if length(ptset) == 15
                break
            end
        end
    end
end
Here, I am trying to generate spheres with uniform random numbers inside each one; in this case, yt is only part of the construction of the numbers inside the sphere.
I would like to generate points in a sphere with radius r0 (ra[:,4] for this case), then points distributed from the edge of the first sphere to the second one with radius r1 (here ra[:,3]), and so on.
In order to do that, I try to take elements that fulfill one of the two conditions -ra[(i+1)] < yt <= -ra[i]
or ra[(i+1)] <= yt < ra[i], i.e. I would like to generate a vector with positive and negative numbers. I used the operator || but it seems to take only the positive part. I am new to Julia and I am not sure how to take the elements from both parts of the interval. Does anyone have a hint on how to do it? Thanks in advance.
I hope I understood you correctly. First, we need to be able to sample uniformly from an N-dimensional shell with radii r0 and r1:
using Random
using LinearAlgebra: normalize

struct Shell{N}
    r0::Float64
    r1::Float64
end

Base.eltype(::Type{<:Shell}) = Vector{Float64}

function Random.rand(rng::Random.AbstractRNG, d::Random.SamplerTrivial{Shell{N}}) where {N}
    shell = d[]
    θ = normalize(randn(rng, N)) # uniformly distributed N-dimensional direction of length 1
    # for a uniform density inside the shell, the radius must follow
    # the density ∝ r^(N-1); draw it by inverting that CDF:
    r = (shell.r0^N + rand(rng) * (shell.r1^N - shell.r0^N))^(1 / N)
    return r .* θ # scale the direction to land between r0 and r1
end
(See here for more info about hooking into Random. You could equally implement a new Distribution, but that's not really necessary.)
Most importantly, a truncated normal will not result in a uniform distribution, and neither will scaling the radius linearly: within a shell the radius must be drawn with density proportional to r^(N-1), hence the inverse-CDF N-th root above, which reduces to the familiar square root in 2D with r0 = 0 (you should check the math once more).
Then we can just create a sequence of shell samples with nested radii:
rmax = 5
rad = 10.0
ra = range(0, rad, length=rmax)
ptset = [rand(Shell{2}(ra[i], ra[i+1]), 15) for i = 1:(rmax - 1)]
(This part I wasn't really sure about, but the point should be clear.)

How to efficiently evaluate or approximate a road Clothoid?

I'm facing the problem of computing values of a clothoid in C in real-time.
First I tried using the Matlab Coder to obtain auto-generated C code for the quadgk integrator for the Fresnel formulas. This essentially works great in my test scenarios. The only issue is that it runs incredibly slowly (in Matlab as well as the auto-generated code).
Another option was interpolating a data-table of the unit clothoid connecting the sample points via straight lines (linear interpolation). I gave up after I found out that for only small changes in curvature (tiny steps along the clothoid) the results were obviously degrading to lines. What a surprise...
I know that circles may be plotted using a different formula, but low changes in curvature are often encountered in real-world scenarios, and 30k sampling points between the headings 0° and 360° didn't provide enough angular resolution for my problems.
Then I tried a Taylor approximation around the R = inf point, hoping that there would be significant curvature everywhere I wanted it to be. I soon realized I couldn't use more than 4 terms (power of 15), as the polynomial otherwise quickly becomes unstable (probably due to numerical inaccuracies in double-precision fp computation). Thus accuracy obviously degrades quickly for large t values. And by "large t values" I'm talking about every point on the clothoid that represents a curve of more than 90° w.r.t. the zero-curvature point.
For instance when evaluating a road that goes from R=150m to R=125m while making a 90° turn I'm way outside the region of valid approximation. Instead I'm in the range of 204.5° - 294.5° whereas my Taylor limit would be at around 90° of the unit clothoid.
I'm kinda done randomly trying out things now. I mean I could just try to spend time on the dozens of papers one finds on that topic. Or I could try to improve or combine some of the methods described above. Maybe there even exists an integrate function in Matlab that is compatible with the Coder and fast enough.
This problem is so fundamental that it feels like I shouldn't have this much trouble solving it. Any suggestions?
About the 4 terms in the Taylor series - you should be able to use many more. A total theta of 2·pi is certainly doable with doubles.
You're probably calculating each term in isolation, according to the full formula, computing full factorial and power values. That is the reason for losing precision extremely fast.
Instead, calculate the terms progressively, each one from the previous one. Find the formula for the ratio of the next term over the previous one in the series, and use it.
For increased precision, do not calculate in theta but rather in the distance, s (so as not to lose precision on scaling).
Your example is an extremely flat clothoid. If I made no mistake, it goes from (25/22)·pi =~ 204.545° to (36/22)·pi =~ 294.545° (why not include these details in your question?). Nevertheless it should be OK. Even 2·pi = 360°, the full circle (and twice that), should pose no problem.
given: r = 150 -> 125, 90 degree turn:
r·s = A²  =>  150·s = 125·(s + x)
=> 1 + x/s = 150/125 = 6/5  =>  x/s = 1/5
theta1 = s²/(2A²) = s²/(300·s) = s/300 = (pi/2)·(25/11) = 204.545°
theta2 = (s+x)²/(2A²) = (6/5)²·s/300 = (pi/2)·(36/11) = 294.545°
theta2 - theta1 = (36/25 - 1)·s/300 = pi/2
=> s = 300·(pi/2)·(25/11) = 1070.99749554,  x = s/5 = 214.1994991
A² = 150·s = 150·300·(pi/2)·(25/11)
a = sqrt(2A²) = 300·sqrt((pi/2)·(25/11)) = 566.83264608
The reference point is at r = Infinity, where theta = 0.
We have x = a · ∫[u=0..s/a] cos(u²) du, where a = sqrt(2·r·s) and theta = (s/a)². Write out the Taylor series for cos and integrate it term by term to get your Taylor approximation for x as a function of the distance s along the curve from the zero point. That's all.
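Here is a hedged Python sketch of that progressive-term scheme (the question targets C, but it translates line for line; the term ratios follow from the series coefficients, and the function name and term count are my choices):

def fresnel_cs(t, nterms=24):
    # Approximates C(t) = integral of cos(u^2) du and S(t) = integral of
    # sin(u^2) du, both from 0 to t, by integrating the Taylor series of
    # cos/sin term by term. Each term is computed from the previous one
    # via the term ratio, avoiding huge factorials and powers.
    t2 = t * t
    t4 = t2 * t2
    c_term, s_term = t, t * t2 / 3.0   # k = 0 terms of both series
    c_sum, s_sum = c_term, s_term
    for k in range(nterms):
        # ratio of term k+1 to term k, derived from the series coefficients
        c_term *= -t4 * (4*k + 1) / ((2*k + 1) * (2*k + 2) * (4*k + 5))
        s_term *= -t4 * (4*k + 3) / ((2*k + 2) * (2*k + 3) * (4*k + 7))
        c_sum += c_term
        s_sum += s_term
    return c_sum, s_sum

# sanity check: fresnel_cs(1.0) should be close to (0.90452, 0.31027)

The clothoid point is then (a·C(s/a), a·S(s/a)); for theta up to 2·pi (t up to about 2.5), a couple of dozen terms converge comfortably in double precision.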
Next you have to decide with what density to calculate your points along the clothoid. You can derive it from a desired tolerance above the chord for your minimal radius of 125: a chord on a circle of radius R deviates from it by at most the sagitta h, so a tolerance h allows a spacing of ds = 2·sqrt(2·R·h - h²) ≈ 2·sqrt(2·R·h); e.g. R = 125 m and h = 1 cm give ds ≈ 3.16 m. These points then define the approximation of the curve by line segments drawn between consecutive points.
I am doing my thesis in the same area right now.
My approach is the following.
At each point of your clothoid, calculate the change in heading divided by the distance traveled along the clothoid; this simple ratio gives you the curvature at each point.
Plot each curvature value, with the distance along the clothoid on the x-axis and the curvature on the y-axis. By plotting this and applying a simple piecewise-linear fitting algorithm (search for a Ramer-Douglas-Peucker implementation in your language of choice),
you can easily identify the curve sections with a value of zero (a line has no curvature), linearly increasing or decreasing curvature (Euler spiral, CCW/CW), or a constant value != 0 (an arc has constant curvature across all of its points).
I hope this will help you a little bit.
You can find my code on github. I implemented some algorithms for such problems like Peuker Algorithm.

What's the fastest way to find deepest path in a 3D array?

I've been trying to find a solution to my problem for more than a week, and I couldn't come up with anything better than a million-iteration program, so I think it's time to ask someone for help.
I've got a 3D array. Let's say we're talking about the ground, and the first layer is the surface.
The other layers are floors below the ground. I have to find the deepest path's length, the count of isolated caves underground, and the size of the biggest cave.
Here's the visualisation of my problem.
Input:
5 5 5 // x, y, z
xxxxx
oxxxx
xxxxx
xoxxo
ooxxx
xxxxx
xxoxx
and so...
Output:
5 // deepest path - starting from the surface
22 // size of the biggest cave
3 // number of isolated caves (red ones) (isolated - a cave that doesn't reach the surface)
Note that even though the red cell on the 2nd floor is placed next to the green one, it's not the same cave, because it's placed diagonally, and that doesn't count.
I've been told that the best way to do this might be a recursive divide-and-conquer algorithm, however I don't really know what it would look like.
I think you should be able to do it in O(N).
When you parse your input, assign each node a 'caveNumber' initialized to 0. Set it to a valid number whenever you visit a cave:
CaveCount = 0, IsolatedCaveCount = 0
AllSizes = new Vector.
For each node,
    ProcessNode(size:0, depth:0);

ProcessNode(size, depth):
    If node.isCave and !node.caveNumber
        if (size==0) ++CaveCount
        if (size==0 and depth!=0) IsolatedCaveCount++
        node.caveNumber = CaveCount
        AllSizes[CaveCount]++
        For each neighbor of node,
            if (goingDeeper) depth++
            ProcessNode(size+1, depth).
You will visit each node 7 times in the worst case: once from the outer loop, and possibly once from each of its six neighbors. But you'll only work on each one once, since after that the caveNumber is set and you ignore it.
You can do the depth tracking by adding a depth parameter to the recursive ProcessNode call, and only incrementing it when visiting a lower neighbor.
The solution shown below (as a python program) runs in time O(n lg*(n)), where lg*(n) is the nearly-constant iterated-log function often associated with union operations in disjoint-set forests.
In the first pass through all cells, the program creates a disjoint-set forest, using routines called makeset(), findset(), link(), and union(), just as explained in section 22.3 (Disjoint-set forests) of edition 1 of Cormen/Leiserson/Rivest. In later passes through the cells, it counts the number of members of each disjoint forest, checks the depth, etc. The first pass runs in time O(n lg*(n)) and later passes run in time O(n) but by simple program changes some of the passes could run in O(c) or O(b) for c caves with a total of b cells.
Note that the code shown below is not subject to the error contained in a previous answer, where the previous answer's pseudo-code contains the line
if (size==0 and depth!=0) IsolatedCaveCount++
The error in that line is that a cave with a connection to the surface might have underground rising branches, which the other answer would erroneously add to its total of isolated caves.
The code shown below produces the following output:
Deepest: 5 Largest: 22 Isolated: 3
(Note that the count of 24 shown in your diagram should be 22, from 4+9+9.)
v = [0b0000010000000000100111000,  # Cave map, one bitmask per floor
     0b0000000100000110001100000,
     0b0000000000000001100111000,
     0b0000000000111001110111100,
     0b0000100000111001110111101]
nx, ny, nz = 5, 5, 5
inlay, ncells = (nx+1) * ny, (nx+1) * ny * nz
masks = []
for r in range(ny):
    masks += [2**j for j in range(nx*ny)][nx*r:nx*r+nx] + [0]
p = [-1 for i in range(ncells)]  # parent links
r = [ 0 for i in range(ncells)]  # rank
c = [ 0 for i in range(ncells)]  # forest-size counts
d = [-1 for i in range(ncells)]  # depths

def makeset(x):  # Ref: CLR 22.3, Disjoint-set forests
    p[x] = x
    r[x] = 0

def findset(x):
    if x != p[x]:
        p[x] = findset(p[x])
    return p[x]

def link(x, y):
    if r[x] > r[y]:
        p[y] = x
    else:
        p[x] = y
        if r[x] == r[y]:
            r[y] += 1

def union(x, y):
    link(findset(x), findset(y))

fa = 0  # fa = floor above
bc = 0  # bc = floor's base cell #
for f in v:      # f = current-floor map
    cn = bc - 1  # cn = cell #
    ml = 0
    for m in masks:
        cn += 1
        if m & f:
            makeset(cn)
            if ml & f:
                union(cn, cn-1)
            mr = m >> nx
            if mr and mr & f:
                union(cn, cn-nx-1)
            if m & fa:
                union(cn, cn-inlay)
        ml = m
    bc += inlay
    fa = f

for i in range(inlay):  # surface cells: mark their roots as depth 0
    findset(i)
    if p[i] > -1:
        d[p[i]] = 0
for i in range(ncells):
    if p[i] > -1:
        c[findset(i)] += 1
        if d[p[i]] > -1:
            d[p[i]] = max(d[p[i]], i // inlay)
isola = len([i for i in range(ncells) if c[i] > 0 and d[p[i]] < 0])
print("Deepest:", 1 + max(d), " Largest:", max(c), " Isolated:", isola)
It sounds like you're solving a "connected components" problem. If your 3D array can be converted to a bit array (e.g. 0 = bedrock, 1 = cave, or vice versa) then you can apply a technique used in image processing to find the number and dimensions of either the foreground or background.
Typically this algorithm is applied in 2D images to find "connected components" or "blobs" of the same color. If possible, find a "single pass" algorithm:
http://en.wikipedia.org/wiki/Connected-component_labeling
The same technique can be applied to 3D data. Googling "connected components 3D" will yield links like this one:
http://www.ecse.rpi.edu/Homepages/wrf/pmwiki/pmwiki.php/Research/ConnectedComponents
Once the algorithm has finished processing your 3D array, you'll have a list of labeled, connected regions, and each region will be a list of voxels (volume elements analogous to image pixels). You can then analyze each labeled region to determine volume, closeness to the surface, height, etc.
Implementing these algorithms can be a little tricky, and you might want to try a 2D implementation first. Though it might not be as efficient as you'd like, you could create a 3D connected-component labeling algorithm by applying a 2D algorithm iteratively to each layer and then relabeling the connected regions from the top layer to the bottom layer:
For layer 0, find all connected regions using the 2D connected component algorithm
For layer 1, find all connected regions.
If any labeled pixel in layer 0 sits directly over a labeled pixel in layer 1, change all the labels in layer 1 to the label in layer 0.
Apply this labeling technique iteratively through the stack until you reach layer N.
One important consideration in connected-component labeling is how one considers regions to be connected. In a 2D image (or 2D array) of bits, we can consider either the "4-connected" region of neighbor elements
X 1 X
1 C 1
X 1 X
where "C" is the center element, "1" indicates neighbors that would be considered connected, and "X" are adjacent neighbors that we do not consider connected. Another option is to consider "8-connected neighbors":
1 1 1
1 C 1
1 1 1
That is, every element adjacent to a central pixel is considered connected. At first this may sound like the better option, but in real-world 2D image data a chessboard pattern of noise or a diagonal string of single noise pixels would be detected as a connected region, so we typically test for 4-connectivity.
For 3D data you can consider either 6-connectivity or 26-connectivity: 6-connectivity considers only the neighbor pixels that share a full cube face with the center voxel, and 26-connectivity considers every adjacent pixel around the center voxel. You mention that "diagonally placed" doesn't count, so 6-connectivity should suffice.
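For reference, a minimal sketch using scipy.ndimage, whose default structuring element in 3D is exactly 6-connected (the array layout, with axis 0 as depth and layer 0 as the surface, is an assumption):

import numpy as np
from scipy import ndimage

cave = np.random.rand(5, 5, 5) < 0.4       # placeholder data: True = cave cell

labels, ncomponents = ndimage.label(cave)  # 6-connected labeling by default

sizes = np.bincount(labels.ravel())[1:]    # voxels per component
surface = np.setdiff1d(np.unique(labels[0]), [0])  # labels reaching the surface
isolated = ncomponents - surface.size      # components never reaching layer 0

# deepest layer reached by any surface-connected cave (1-based depth)
deepest = max((zi + 1 for zi in range(labels.shape[0])
               if np.isin(labels[zi], surface).any()), default=0)

print("Deepest:", deepest,
      "Largest:", sizes.max() if sizes.size else 0,
      "Isolated:", isolated)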
You can observe it as a graph where (non-diagonally) adjacent elements are connected if they are both empty (part of a cave). Note that you don't have to convert it to a graph; you can use the normal 3D array representation.
Finding caves is the same task as finding the connected components in a graph (O(N)) and the size of a cave is the number of nodes of that component.
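A dependency-free sketch of that idea: a BFS flood fill that assigns a component label to every 6-connected cave cell, counting sizes along the way (the cave[z][y][x] layout is an assumption):

from collections import deque

def label_caves(cave):
    # cave[z][y][x] is True where the cell is empty (part of a cave)
    nz, ny, nx = len(cave), len(cave[0]), len(cave[0][0])
    label = [[[0] * nx for _ in range(ny)] for _ in range(nz)]
    sizes = {}
    current = 0
    for z0 in range(nz):
        for y0 in range(ny):
            for x0 in range(nx):
                if not cave[z0][y0][x0] or label[z0][y0][x0]:
                    continue
                current += 1          # unvisited cave cell: start a new component
                sizes[current] = 0
                label[z0][y0][x0] = current
                queue = deque([(z0, y0, x0)])
                while queue:          # BFS over the 6-connected neighbors
                    z, y, x = queue.popleft()
                    sizes[current] += 1
                    for dz, dy, dx in ((1,0,0), (-1,0,0), (0,1,0),
                                       (0,-1,0), (0,0,1), (0,0,-1)):
                        z2, y2, x2 = z + dz, y + dy, x + dx
                        if (0 <= z2 < nz and 0 <= y2 < ny and 0 <= x2 < nx
                                and cave[z2][y2][x2] and not label[z2][y2][x2]):
                            label[z2][y2][x2] = current
                            queue.append((z2, y2, x2))
    return label, sizes

Each cell is enqueued at most once, so the fill is O(N) in the number of cells, matching the claim above.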
