How to use ELKI for DBSCAN with a precomputed distance matrix

I have a precomputed distance matrix for all the points in my database.
I'm trying to invoke the ELKI GUI with the following command:
/usr/share/java/elki.jar
-dbc.in xml_files.1000
-dbc.filter FixedDBIDsFilter
-dbc.startid 0
-algorithm clustering.DBSCAN
-algorithm.distancefunction external.FileBasedDoubleDistanceFunction
-distance.matrix Distance.txt
-dbscan.epsilon 1
-dbscan.minpts 10
But I keep getting the following error message:
Wrong parameter format! Parameter "dbscan.epsilon" requires a distance value, but the distance was not set!
I can't figure out what I'm doing wrong here...

Which version of ELKI are you using?
This error message usually indicates an issue with the distance parser used by the matrix reader.
Since ELKI supports distance functions with different value types, DBSCAN cannot parse the epsilon parameter until the actual distance value type is known (that type provides the value parsing function).
Is there any earlier error message? Any earlier error (including a missing required parameter) will prevent DBSCAN from being able to parse the value.
Try setting the epsilon value last, and also try the command line. In the MiniGUI, due to the incremental way parameters are set, these dynamically typed parameters can unfortunately be flaky. Any patch to improve the handling of such parameters is appreciated.
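For example, you could try the same invocation with the DBSCAN parameters moved to the end (a sketch that simply reorders the parameters from the question; exact parameter names can differ between ELKI versions):
/usr/share/java/elki.jar
-dbc.in xml_files.1000
-dbc.filter FixedDBIDsFilter
-dbc.startid 0
-algorithm clustering.DBSCAN
-algorithm.distancefunction external.FileBasedDoubleDistanceFunction
-distance.matrix Distance.txt
-dbscan.minpts 10
-dbscan.epsilon 1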

Related

I need to be able to find the mean, mode, and median, but it won't work no matter what

It keeps saying that a tuple object can't be edited, or unsupported operand for int and str. Please help me; I need to do this.
I expected to be able to use the statistics module early on, but it spectacularly failed. I want to be able to calculate statistics and graph the data as I append more data.

Matlab's bvp4c: output arrays not always the same length as the initial guess

The Matlab function bvp4c solves boundary value problems. It takes a differential equation, boundary conditions and an initial guess as input, and returns a structure array containing arrays of x, y and yp (which stands for "y prime", or y').
The length of the output arrays should be the same as that of the initial guess, but I found that it isn't always. I have checked the dimensions of the input (the initial guess, always 1x101 double for x and 16x101 double for y) and the output (sometimes 1x101 double for x and 16x101 double for y and yp as it should be, but often different values, such as 1x91 double and 16x91 double or 1x175 double and 16x175 double).
Looking at the output array x when its length is off, some extra values are squeezed in, or some are taken out. For example, the initial guess has 100 positions between x=0 and x=1, and the x array should be [0 0.01 0.02 ... 1], but sometimes a new position like 0.015 shows up.
Question: Why does this happen, and how can this be solved?
"The length of the output arrays should be the same as that of the initial guess ...." This is incorrect.
As described in the bvp4c documentation, sol.x contains a "[mesh] selected by bvp4c" with an "[approximation] to y(x) at the mesh points of sol.x". In order to evaluate bvp4c's solution on your mesh, use deval.
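For instance, a minimal sketch (sol being the structure returned by bvp4c, as in the documentation quote above, and xmesh an illustrative name for the mesh you want the solution on):

xmesh = linspace(0, 1, 101);   % e.g. the 1x101 mesh from the initial guess
Sxint = deval(sol, xmesh);     % approximation of y(x) on xmesh, here 16x101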
Why does bvp4c choose a mesh? Quoting from the cited paper1, which you can get in full here if you have a MathWorks account:
Because BVPs can have more than one solution, BVP codes require users to supply a guess for the solution desired. The guess includes a guess for an initial mesh that reveals the behavior of the desired solution. The codes then adapt the mesh so as to obtain an accurate numerical solution with a modest number of mesh points.
Because a steady BVP generally has a global behavior strongly dependent on its boundary values, the spatial mesh between the two boundaries may need to be refined in order to properly approximate the desired solution with the locally chosen basis functions for the method. However, there may also be portions of the mesh that do not need to be refined and can even be coarsened in some cases to maintain a reasonably small residual and accurate approximation. Therefore, for general efficiency, the guess mesh is adaptively refined or coarsened depending on some locally chosen metric (since bvp4c is collocation based, the metric is probably point-based or division-integrated based) such that the mesh returned by bvp4c is, in some sense, adequate enough for generic interpolation within the boundaries.
I'll also note that this is different from numerically solving IVPs, since their state is not global across the entire time integration locus: it depends only on the current state (and possibly on previous time steps, if using a multi-step method or solving a delay differential equation), which makes the refinement inherently local. This local behavior of IVPs is what allows functions like ode45 to return a solution at pre-selected time values, because they can locally refine the solution at the selected points while performing the time march (this is known as dense output).
1 Shampine, L.F., M.W. Reichelt, and J. Kierzenka, "Solving Boundary Value Problems for Ordinary Differential Equations in MATLAB with bvp4c".

LabVIEW: Boolean Array to Number Block Troubleshooting

I am using an EV3 Cube to scan a sheet that represents a binary number; i.e. a black line represents a 1 and a white line represents a 0.
From this scan, I generate a numeric array consisting of 1's and 0's, use an Index Array block to split it into single digits, use a quick comparison (!= 0) to generate their Boolean values, and then use the Build Array block to turn them into a Boolean array.
However, when using the Convert Boolean Array to Integer block, I receive an error whose cause I do not know.
If anyone could help me, I would be grateful! Thank you.
(By the way, I am a Freshman engineering student with no prior knowledge of LabView, just a year of C++ and 2 years of Java to help me. So thorough explanations would be much easier for me to comprehend)
Attached are pictures of my code along with the error I receive.
Unfortunately the error isn't fully visible, as it is truncated in your screenshot;
it would help to either have the code or be able to read the entire message.
But my guess, from what I can see, is that it says this is a target-specific error: the Boolean Array To Number function is not supported.
This could mean that a function you are trying to use, which is normally available in the PC version of LabVIEW, will not work on the target platform (the embedded CPU and OS of your EV3).

Mathematica FindRoot: exploring the parameter space

I am solving three non-linear equations in three variables (H0D, H0S, and H1S) using FindRoot. In addition to the three variables of interest, there are four parameters in these equations that I would like to be able to vary. My parameters and the ranges over which I want to vary them are as follows:
CF ∈ {0, 15}, CR ∈ {0, 8}, T ∈ {0, 0.35}, H1R ∈ {40, 79}
The problem is that my non-linear system may not have any solutions for part of this parameter range. What I basically want to ask is if there is a smart way to find out exactly what part of my parameter range admits real solutions.
I could run FindRoot inside a loop, but because of the non-linearity, FindRoot is very sensitive to initial conditions, so error messages could frequently be due to bad initial conditions rather than the absence of a solution.
Is there a way for me to find out what part of the parameter space works, short of plugging in 10^4 combinations of parameter values by hand, playing around with the initial conditions, and hoping that FindRoot gives me a solution?
Thanks a lot,

Find cosine similarity between two arrays

I'm wondering if there is a built in function in R that can find the cosine similarity (or cosine distance) between two arrays?
Currently, I have implemented my own function, but I can't help but think that R should already come with one.
These sorts of questions come up all the time (for me--and, as evidenced by the r-tagged SO question list, for others as well):
is there a function, either in R core or in any R package, that does X? And if so,
where can I find it among the 2000+ R packages on CRAN?
Short answer: give the sos package a try when these sorts of questions come up.
One of the earlier answers gave cosine along with a link to its help page. This is probably exactly what the OP wants. When you look at the linked-to page you see that this function is in the lsa package.
But how would you find this function if you didn't already know which Package to look for it in?
You can always try the standard R help functions (the ">" below just denotes the R prompt):
> ?<some_name>
> ??<some_name>
> apropos("<some_name>")
If these fail, then install and load the sos package, and call findFn.
findFn is also aliased to "???", though I don't often use that because I don't think you can pass in arguments other than the function name.
for the question here, try this:
> library(sos)
> findFn("cosine", maxPages=2, sortby="MaxScore")
The additional arguments (maxPages=2 and sortby="MaxScore") limit the number of results returned and specify how the results are ranked, respectively--i.e., "find a function named 'cosine', or one that has the term 'cosine' in its description, return only two pages of results, and order them by descending relevance score".
The findFn call above returns a data frame with nine columns and the results as rows--rendered as HTML.
Scanning the last column, Description and Link, item (row) 21 you find:
Cosine Measures (Matrices)
This text is also a link; clicking on it takes you to the help page for that function in the package that contains it--in other words,
using findFn, you can pretty quickly find the function you want even though you have no idea which package it's in.
It looks like a few options are already available, but I just stumbled across an idiomatic solution I like so I thought I'd add it to the list.
install.packages('proxy') # Let's be honest, you've never heard of this before.
library('proxy') # Library of similarity/dissimilarity measures for 'dist()'
dist(m, method="cosine")
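A quick usage sketch (m here is just an illustrative matrix with the observations in its rows):

library(proxy)
m <- matrix(rnorm(20), nrow = 4)  # 4 observations, 5 features
dist(m, method = "cosine")        # pairwise cosine-based distances between the rows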
Following the comment from Jonathan Chang, I wrote this function to mimic dist. No extra packages to load.
cosineDist <- function(x){
  # cosine distance (1 - cosine similarity) between the rows of x, returned as a dist object
  as.dist(1 - x %*% t(x) / (sqrt(rowSums(x^2) %*% t(rowSums(x^2)))))
}
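A quick usage sketch (m is just an illustrative example matrix):

m <- matrix(rnorm(20), nrow = 4)  # 4 observations, 5 features
cosineDist(m)                     # dist object with the 6 pairwise cosine distances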
Check these functions: lsa::cosine(), clv::dot_product(), and arules::dissimilarity().
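For example, a minimal sketch with lsa::cosine() on two vectors (the values are chosen just for illustration):

library(lsa)
a <- c(1, 2, 3)
b <- c(2, 4, 6)
cosine(a, b)  # 1, since b is a scalar multiple of a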
You can also check the vegan package: http://cran.r-project.org/web/packages/vegan//index.html
The function vegdist in this package has a variety of dissimilarity (distance) functions, such as manhattan, euclidean, canberra, bray, kulczynski, jaccard, gower, altGower, morisita, horn, mountford, raup, binomial, chao, or cao. Please check the PDF in the package for definitions, or consult the references: https://stats.stackexchange.com/a/33001/12733.
If you have a dot product matrix, you can use this function to compute the cosine similarity matrix:
get_cos = function(S){
  # S is the dot-product (Gram) matrix S = dt %*% t(dt), so the vector norms
  # are the square roots of its diagonal
  doc_norm = sqrt(diag(S))
  divide_one_norm = S / doc_norm           # divide row i by the norm of vector i
  cosine = t(divide_one_norm) / doc_norm   # divide column j by the norm of vector j
  return(cosine)
}
The input S is the matrix of dot products; simply, S = dt %*% t(dt), where dt is your dataset (observations in rows).
This function basically divides each dot product by the norms of the two vectors.
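A quick usage sketch (dt here is just an illustrative example dataset):

dt <- matrix(rnorm(20), nrow = 4)  # 4 observations, 5 features
S  <- dt %*% t(dt)                 # dot-product matrix
get_cos(S)                         # 4x4 cosine similarity matrix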
Cosine similarity is not invariant to shift. Correlation similarity may be a better choice, because it fixes this problem and is also connected to squared Euclidean distances (if the data are standardized).
If you have two objects described by p-dimensional vectors of features,
x1 and x2 both of dimension p, you can compute the correlation similarity by cor(x1, x2).
Note that in statistics correlation is used as a scaled moment notion, so it is naturally thought of as correlation between random variables. The cor(dataset) function will compute correlations between the columns of the data matrix.
In a typical situation with an (n x p) data matrix X, with units (or objects) on its rows and variables (or features) on its columns, you can compute the correlation similarity matrix simply by computing cor on the transpose of X and giving the result a dist class:
as.dist(cor(t(X)))
By the way, you can compute the correlation dissimilarity matrix the same way. The following makes a distinction based on both the size of the angle and the orientation of the objects' vectors:
1 - cor(t(X))
This one doesn't care about the orientation, only the size of the angle:
1 - abs(cor(t(X)))
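A minimal sketch tying these together (X here is just an illustrative data matrix with the objects in its rows):

X <- matrix(rnorm(30), nrow = 5)            # 5 objects, 6 features
d_signed   <- as.dist(1 - cor(t(X)))        # orientation matters
d_unsigned <- as.dist(1 - abs(cor(t(X))))   # orientation ignored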
