EKF-SLAM: What would be the consequences of ignoring correlations between landmarks?

While implementing EKF SLAM, I noticed that the covariance matrix quickly begins to gain correlations between landmarks. These correlations become very expensive as the landmark count increases, so I wondered what the consequence would be of simply ignoring them altogether.
I would only keep the correlations between the robot pose and each of the landmarks.
In my case, the depths of the landmarks start out with very high uncertainty, and I measure only the bearing to each landmark. If I never computed the correlations between the landmarks, would the landmark depths converge?
Intuitively, I would think that a measurement of a landmark's bearing would only affect that landmark's bearing vector and depth, and the pose.
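To make the block structure I mean concrete, here is a minimal numpy sketch (the state layout, sizes and names are purely illustrative, not from any particular implementation): the full covariance has a pose block, pose-landmark strips and landmark-landmark cross blocks, and the approximation I'm describing zeroes the landmark-landmark off-diagonal blocks after each update.

import numpy as np

# Illustrative state: 3-DoF robot pose followed by N landmarks with 2
# parameters each (e.g. bearing + depth); the exact parameterisation
# doesn't matter for the block structure.
POSE_DIM, LM_DIM, N_LANDMARKS = 3, 2, 4
DIM = POSE_DIM + LM_DIM * N_LANDMARKS

rng = np.random.default_rng(0)
A = rng.normal(size=(DIM, DIM))
P = A @ A.T  # some dense, positive-definite covariance after a few updates

def drop_landmark_cross_correlations(P):
    """Keep the pose block, the pose-landmark strips and each landmark's own
    block; zero every landmark-to-other-landmark cross-covariance block."""
    P_approx = P.copy()
    for i in range(N_LANDMARKS):
        for j in range(N_LANDMARKS):
            if i != j:
                ri = slice(POSE_DIM + i * LM_DIM, POSE_DIM + (i + 1) * LM_DIM)
                rj = slice(POSE_DIM + j * LM_DIM, POSE_DIM + (j + 1) * LM_DIM)
                P_approx[ri, rj] = 0.0
    return P_approx

P_approx = drop_landmark_cross_correlations(P)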
Thanks

Related

Sample size calculation for experimental design

I have three treatments (Wild type, Mutant1 and Mutant2). I would like input on how to decide on a sample size that would give statistically significant results (alpha < 0.05) with high statistical power (1 - beta = 0.8).
Questions
I understand that we need information about the effect size. How do we approach this problem if we don't know the expected effect size in advance? One option is a trial experiment to estimate the effect size. In that case, what sample size should the trial start with: as high as n=10, or as low as n=3? Can n=3 per treatment provide a good estimate of the effect size, or is n=10 better? To be specific: suppose we have resources for n=10 at most and must choose between n=3 and n=10 for this trial.
This question is better asked in https://stats.stackexchange.com.
I would discourage you from trying to estimate effect sizes from pilot experiments with low n. Your estimates will be quite noisy, and this is rarely done (at least in my field of neuroscience). Instead, I would suggest you estimate your effect size from the literature. Have other people measured something similar to what you are planning to do? What sample sizes do they use? What kind of effect sizes do they report?
If you were going to go ahead with the plan to run a pilot study, I would recommend pre-registering your experimental design (https://www.cos.io/initiatives/prereg). Something like:
"We will test the effects of mutation 1 and mutation 2 on XXXX (compared to wild type) in a cohort of 30 mice (10 in each group). Based on the results of this study, we will then conduct a power analysis and reproduce the experiments with the sample size required to have a power of 0.8 at p=0.05.
Our criteria for excluding animals from the power analysis will be .....
The statistical test for estimating effect size will be......"
etc.
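If it helps: once you have an effect size estimate (from the literature or from the pilot), the power analysis itself is straightforward with standard tools. A minimal sketch using statsmodels for a three-group one-way ANOVA; the effect size below is a pure placeholder, not a recommendation.

from statsmodels.stats.power import FTestAnovaPower

effect_size = 0.4  # placeholder Cohen's f; substitute your literature/pilot estimate
analysis = FTestAnovaPower()

# Total sample size (summed over the 3 groups) needed for power 0.8 at alpha 0.05.
n_total = analysis.solve_power(effect_size=effect_size, k_groups=3,
                               alpha=0.05, power=0.8)
print(n_total, n_total / 3)  # total and approximate per-group n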

How to apply gradient descent on the weights of a neural network?

Consider a neural network with two hidden layers. In this case we have three matrices of weights. Let's say I'm starting the training. In the first round I'll set random values for all the weights of the three matrices. If this is correct, I have two questions:
1- Should I do the training from the input layer forward (left to right), or the other way around?
2- In the second round of the training I have to apply gradient descent to the weights. Should I apply it to all the weights of all matrices and then calculate the error, or apply it weight by weight, checking whether the error has decreased before moving on to the next weight, and so on until the next training round?
You need to be familiar with forward propagation and backward propagation. In a neural network, you first initialize the weights randomly. Then you predict the output value (let's say y_pred) from the training set values (X_train). For each X_train sample you have y_train, which is the true output (the ground truth) for that training sample. Then you calculate a loss value according to the loss function; for simplicity, let's say loss = y_pred - y_train (this is not the actual loss function, which is a bit more complex than that). That is forward propagation in short.
Once you have the loss, you calculate how much you need to change the weights in order to train your neural network in the next iteration. For this we use the gradient descent algorithm: you compute new weights using the loss value you got. That is backward propagation in short.
You repeat these steps many times, and your weights improve from random values to trained weights.
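To make this concrete, here is a minimal numpy sketch of that loop for a network with two hidden layers (sigmoid hidden units, mean-squared-error loss; all sizes and names are illustrative). Note that a single gradient descent step updates all three weight matrices at once after the loss has been computed, not weight by weight.

import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Toy data and the three weight matrices (input -> hidden1 -> hidden2 -> output).
X_train = rng.normal(size=(32, 4))     # 32 samples, 4 input features
y_train = rng.normal(size=(32, 1))
W1 = 0.1 * rng.normal(size=(4, 8))
W2 = 0.1 * rng.normal(size=(8, 8))
W3 = 0.1 * rng.normal(size=(8, 1))
lr = 0.1

for step in range(100):
    # Forward propagation: from the input layer towards the output layer.
    a1 = sigmoid(X_train @ W1)
    a2 = sigmoid(a1 @ W2)
    y_pred = a2 @ W3                   # linear output layer
    loss = np.mean((y_pred - y_train) ** 2)

    # Backward propagation: chain rule applied from the output layer back.
    d_pred = 2.0 * (y_pred - y_train) / len(y_train)
    dW3 = a2.T @ d_pred
    d_a2 = (d_pred @ W3.T) * a2 * (1 - a2)
    dW2 = a1.T @ d_a2
    d_a1 = (d_a2 @ W2.T) * a1 * (1 - a1)
    dW1 = X_train.T @ d_a1

    # One gradient descent step updates every weight matrix together.
    W1 -= lr * dW1
    W2 -= lr * dW2
    W3 -= lr * dW3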

What are Bijectors, in layman's terms, in TensorFlow Probability?

I am not able to understand Bijectors in TensorFlow Probability or how to use them. For example:
standard_gumbel = tfd.TransformedDistribution(
    distribution=tfd.Exponential(rate=1.),
    bijector=tfb.Chain([
        tfb.Affine(
            scale_identity_multiplier=-1.,
            event_ndims=0),
        tfb.Invert(tfb.Exp()),
    ]))
Bijectors encapsulate the change of variables for a probability density.
Roughly speaking, when you (smoothly and invertibly) map one space to another, you also induce a map from probability densities on the initial space to densities on the target space. In general such transformations warp lengths/areas/volumes (measures) in the initial space to different lengths/areas/volumes in the target space. Since densities incorporate volume information, we need to keep track of these warpings and account for them in the computation of the probability density in the latter space.
By implementing forward and inverse transformations, as well as log Jacobian determinants, Bijectors give us all the info we need to transform random samples as well as probability densities.
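As a small concrete example (a sketch, assuming tensorflow_probability is importable in the usual way): pushing a standard normal through tfb.Exp() gives a log-normal distribution, and its log-density is the base log-density at the inverse-mapped point plus the inverse log-Jacobian term that the bijector supplies.

import tensorflow_probability as tfp
tfd, tfb = tfp.distributions, tfp.bijectors

# X ~ Normal(0, 1), Y = exp(X): the transformed distribution is log-normal.
log_normal = tfd.TransformedDistribution(
    distribution=tfd.Normal(loc=0., scale=1.),
    bijector=tfb.Exp())

y = 2.0
# The bijector provides the pieces of the change-of-variables formula:
x = tfb.Exp().inverse(y)                                      # g^{-1}(y) = log(y)
ildj = tfb.Exp().inverse_log_det_jacobian(y, event_ndims=0)   # log |dg^{-1}/dy|
manual = tfd.Normal(loc=0., scale=1.).log_prob(x) + ildj

# manual should agree with log_normal.log_prob(y).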

Can k-means clustering be used to define classifications in recognition?

I'm doing a recognition problem (faces) and trying to reduce the problem size. I started with training data in a feature-wise coordinate system of 120 dimensions, but through PCA I found a better PC-wise coordinate system needing only 20 dimensions while still retaining 95% of the variance.
I began thinking that recognition by definition is a problem of classification. Points in n-space belonging to the same object/face/whatever would cluster. To take an example, if 5 instances of the same individual are in the training data, they would cluster and the mid-point of that cluster could be numerically defined using k-means.
I have 100,000 observations and each person is represented by 5-10 headshots. This means that instead of comparing a novel input to 100,000 points in my 20-space, I could compare it to 10,000-20,000 centroids. Can k-means be used like this, or have I misinterpreted it? k is obviously not known up front, but I've been reading up on ways to find an optimal k.
My specific recognition problem doesn't use neural nets, just simple Euclidean distances between points.
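To make the idea concrete, here is the kind of comparison I have in mind, as a minimal numpy sketch (the data and sizes are placeholders): one centroid per person, computed as the mean of that person's headshots in the 20-dimensional PC space, and a novel input assigned to the nearest centroid by Euclidean distance.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))            # placeholder PCA-projected features
labels = rng.integers(0, 200, size=1000)   # placeholder person ids

# One centroid per person: the mean of that person's headshots.
people = np.unique(labels)
centroids = np.stack([X[labels == p].mean(axis=0) for p in people])

def identify(query):
    """Return the person whose centroid is nearest (Euclidean) to the query."""
    distances = np.linalg.norm(centroids - query, axis=1)
    return people[np.argmin(distances)]

print(identify(X[0]))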

Check that smaller cubes fill bigger cube

Given one large cube (axis aligned and on integer coordinates) and many smaller cubes (also axis aligned and on integer coordinates), how can we check that the large cube is perfectly filled by the smaller cubes?
Currently we check:
1. For each small cube, that it is fully contained by the large cube.
2. That it doesn't intersect any other small cube.
3. That the sum of the volumes of the small cubes equals the volume of the large cube.
This is OK for small numbers of cubes, but we need to support this test for cubes with dimensions greater than 2^32. Even at 2^16 the number of small cubes required to fill the large cube is large enough that step 2 takes a while (it is O(n^2), checking that each cube intersects no other).
Is there a better algorithm?
EDIT:
There seems to be some confusion over this. I am not trying to split a cube into smaller cubes; that's already done. Part of our program splits large OpenCL ranges (axis-aligned cubes on integer coordinates) into lots of smaller ranges that fit into a hardware job.
What I'm doing is hooking into this system and checking that the jobs it produces correctly cover the large initial range. My algorithm above works, but it's slow, and given the number of tests we have to run I'd like to keep them as fast as possible.
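For concreteness, the current check looks roughly like this (a simplified sketch; cubes are represented here as (xmin, ymin, zmin, xmax, ymax, zmax) integer tuples, whereas the real code works on OpenCL ranges):

def overlaps(a, b):
    """Strict interior overlap of two axis-aligned boxes
    given as (xmin, ymin, zmin, xmax, ymax, zmax) tuples."""
    return all(a[i] < b[i + 3] and b[i] < a[i + 3] for i in range(3))

def contained(inner, outer):
    return all(outer[i] <= inner[i] and inner[i + 3] <= outer[i + 3] for i in range(3))

def volume(c):
    return (c[3] - c[0]) * (c[4] - c[1]) * (c[5] - c[2])

def fills(big, small):
    return (all(contained(c, big) for c in small)                      # step 1
            and not any(overlaps(a, b) for i, a in enumerate(small)
                        for b in small[i + 1:])                        # step 2, O(n^2)
            and sum(volume(c) for c in small) == volume(big))          # step 3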
We are talking about 3D, right?
For 2D one can do a similar (but simpler) process (with, I believe, an O(n log n) running time).
The basic idea of the following is the sweep-line algorithm.
Note that rectangle intersection can be done by checking whether any corner of any cube is contained in any other cube.
You can improve on (2) as follows:
1. Split each cube into 2 rectangles in the y-z plane (so you'd have 2 rectangles defined by the same set of 4 (y, z) coordinates, but with different x coordinates).
2. Define the rectangle with the smaller x-coordinate as the start of a cube and the other rectangle as the end of a cube.
3. Sort the rectangles by x-coordinate.
4. Maintain an interval tree, initially empty (each interval should also store a reference to the rectangle it belongs to).
5. For each rectangle in sorted order:
   - Look up the y-coordinate of each corner of the rectangle in the interval tree. For each matching interval, look up its rectangle and check whether the corner's z-coordinate also lies within that rectangle's z-range (this is all that's required, because the tree only contains rectangles whose x-range covers the current sweep position, and the y-coordinates are checked by the interval lookup itself). If it does, we have an overlap.
   - If the rectangle is the start of a cube, insert its 2 y-coordinates as an interval into the interval tree; otherwise, remove the interval defined by those 2 y-coordinates from the tree.
The running time is between O(n) (best case) and O(n^2) (worst case), depending on how much overlap there is in the x- and y-coordinates (more overlap is worse).
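For illustration, a rough Python sketch of that sweep, simplified in two ways: it keeps the active rectangles in a plain list instead of an interval tree (so it retains the O(n^2) worst case) and uses a direct y/z range-overlap test instead of the corner lookup. Cubes are again (xmin, ymin, zmin, xmax, ymax, zmax) tuples.

def any_overlap(cubes):
    """Sweep along x; a cube is 'active' between its start and end events.
    Ends are processed before starts at the same x, so cubes that merely
    share a face are not reported as overlapping."""
    events = []
    for c in cubes:
        events.append((c[0], 1, c))   # start event
        events.append((c[3], 0, c))   # end event (0 sorts before 1 at equal x)
    events.sort(key=lambda e: (e[0], e[1]))

    active = []
    for _, kind, c in events:
        if kind == 0:
            active.remove(c)
            continue
        for other in active:
            if (c[1] < other[4] and other[1] < c[4] and     # y-ranges overlap
                    c[2] < other[5] and other[2] < c[5]):   # z-ranges overlap
                return True
        active.append(c)
    return False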
Order your insert cubes by size.
Insert the biggest insert cube in one of the corners of your cube and split up the remaining space into subcubes.
Insert the second biggest insert cube into the first of the subcubes it will fit in, and add the remaining subcubes of this subcube to the set of subcubes.
etc.
Another go, again only addressing step 2 in the original question:
Define a space-filling curve with good spatial locality, such as a 3D Hilbert Curve.
For each cube, calculate the pair of coordinates on the curve for the points at which the curve enters and leaves the cube. The space-filling curve will enter and leave some cubes more than once; calculate more than one pair of coordinates in those cases.
You've now got an unknown number of coordinate pairs, but I'd guess no more than 2^18. These coordinates define intervals along the space-filling curve, so sort them and look for overlaps.
The time complexity is probably dominated by the sort; the space complexity is probably quite big.
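That last step is a one-dimensional interval problem. A minimal sketch of the sort-and-scan (it assumes the entry/exit curve coordinates have already been computed by the space-filling-curve step, which is the hard part and not shown here; intervals are treated as half-open ranges):

def any_interval_overlap(intervals):
    """intervals: the (entry, exit) pairs of every cube along the curve,
    flattened into one list and treated as half-open ranges.
    After sorting by entry point, an overlap exists iff some interval
    starts before the furthest exit seen so far."""
    furthest_exit = None
    for start, end in sorted(intervals):
        if furthest_exit is not None and start < furthest_exit:
            return True
        furthest_exit = end if furthest_exit is None else max(furthest_exit, end)
    return False

# Example with made-up curve coordinates:
# any_interval_overlap([(0, 8), (8, 12), (10, 16)]) -> True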
