Is there any geospatial support available for OrientDB (whether it's included or from 3rd-party vendors)?
In particular I need the ability to build spatial-indexes on my geo-data.
Since the SB-Tree index supports range queries, my understanding is that the distance() function could be used efficiently.
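For reference, a distance() function over latitude/longitude pairs typically computes great-circle distance. Here is a plain-Python haversine sketch of that computation; the assumption that OrientDB's distance() is haversine-based is mine, so check the linked docs:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))
```

A spatial index lets a range query narrow the candidate set first, so the distance function only runs on nearby points instead of the whole dataset.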
Sources:
https://github.com/orientechnologies/orientdb/wiki/SQL-Where#wiki-functions
https://github.com/orientechnologies/orientdb/wiki/Indexes
https://code.google.com/p/orient/source/browse/trunk/tests/src/test/java/com/orientechnologies/orient/test/database/auto/GEOTest.java
Is the Lucene spatial index what you are looking for? See the documentation:
https://github.com/orientechnologies/orientdb-lucene/wiki/Spatial-Index
Related
Does Vespa support comparators for string matching like Levenshtein, Jaro–Winkler, Soundex etc? Is there any way we can implement them as plugins as some are available in Elasticsearch? What are the approaches to do this type of searches?
The match modes supported by Vespa are documented here https://docs.vespa.ai/documentation/reference/schema-reference.html#match plus regular-expression matching for attribute fields https://docs.vespa.ai/documentation/reference/query-language-reference.html#matches
None of the mentioned string matching/ranking algorithms are supported out of the box. Both edit-distance variants sound more like a text ranking feature, which should be easy to implement. (Open a GitHub issue at https://github.com/vespa-engine/vespa/issues)
The matching in Vespa happens in a C++ component, so there is no plugin support there yet.
You can deploy a plugin written in Java in the container by deploying a custom searcher (https://docs.vespa.ai/documentation/searcher-development.html). Then you can work on the top-k hits, using e.g. regular expressions or n-gram matching to retrieve candidate documents. The Soundex algorithm can be implemented accurately using a searcher and a document processor.
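As a reference for what such a searcher/document processor would compute, here is a plain-Python sketch of the classic American Soundex algorithm (this illustrates the algorithm itself; it is not Vespa plugin code, which would be Java):

```python
def soundex(word):
    """American Soundex: first letter followed by three digits."""
    codes = {}
    for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in letters:
            codes[ch] = digit
    word = "".join(ch for ch in word.upper() if ch.isalpha())
    if not word:
        return ""
    result = word[0]
    prev = codes.get(word[0], "")
    for ch in word[1:]:
        digit = codes.get(ch, "")
        if digit and digit != prev:
            result += digit
        if ch not in "HW":  # H and W do not separate letters with the same code
            prev = digit
    return (result + "000")[:4]
```

For example, soundex("Robert") and soundex("Rupert") both yield "R163", which is exactly the kind of phonetic bucketing a document processor could precompute into an indexed field at feed time.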
ElasticSearch has a percolator for prospective search. Does Solr have a similar feature where you define your query upfront? If not, is there an effective way of implementing this myself on top of the existing Solr features?
Besides what BunkerMentality said, it is not hard to build your own percolator. What you need:
Are the queries you want to run easy to model in Lucene-only syntax? If so, you are good; if not, you need to convert them to Lucene-only queries. Build them, and keep them in memory as Lucene queries.
When a doc arrives:
build a MemoryIndex containing only that single doc
run all your queries on the index
I have done this for a system ingesting millions of docs a day and it worked fine.
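The steps above can be sketched in plain Python. This is a toy stand-in of my own, where each registered query is reduced to a set of required terms; a real implementation would register actual Lucene Query objects and run them against a one-document MemoryIndex, as described:

```python
class Percolator:
    """Toy percolator: queries are registered up front, docs arrive one at a time."""

    def __init__(self):
        self.queries = {}  # query id -> set of required terms

    def register(self, query_id, required_terms):
        """Build the query and keep it in memory."""
        self.queries[query_id] = set(t.lower() for t in required_terms)

    def percolate(self, text):
        """When a doc arrives: index just that doc, then run all queries on it."""
        doc_terms = set(text.lower().split())  # the one-doc "index"
        return [qid for qid, terms in self.queries.items()
                if terms <= doc_terms]  # query matches if all its terms appear
```

The key property, same as in the Lucene/MemoryIndex version, is that the per-document index is tiny and throwaway, so matching thousands of stored queries per incoming document stays cheap.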
It's listed as an open new feature, SOLR-4587, on Solr JIRA but it doesn't seem like any work has started on it yet.
There is a link in the comments there to a separate project called Luwak that seems to implement some features similar to percolator.
If it is still relevant, you can use this:
It's a Solr update processor based on Luwak.
Is there a way for me to search across all the namespaces in Google App Engine? Conceptually it's not possible, but I wanted to check with the community.
Currently, I iterate through all namespaces and query each of them. It's time-consuming and slow.
Not possible with standard datastore queries. Options would be to use Search API, or export to BigQuery.
Not possible, as Gwyn points out. I DO see that there is a request for this feature to be added in Google's Public Issue Tracker (namely, this issue).
It's also not possible using the Search API. My understanding is that namespaces are designed for isolation.
You could assign the same search document to two indexes: one generic or default, and the other isolated.
Then just search over the generic one, for example:
generic = search.Index("all_docs")
specific = search.Index("specific", namespace="sample_namespace")
document = search.Document(doc_id="search_document")
generic.put(document)
specific.put(document)
With ElasticSearch, an app can point to the alias of an index, instead of the index directly, which makes it easy to switch the index the app uses.
Tire, the equivalent of Sunspot for ES, allows me to interact with aliases.
I can't find anything regarding aliases with Sunspot. How do you handle them in your apps which use Sunspot?
I do not know anything about Sunspot, but as far as Solr is concerned: there was a core alias feature up to version 3.1 of Solr. It was removed with SOLR-1637 and "really, really" removed with SOLR-6169 in version 4.9.
But with the advent of SolrCloud this feature has been re-introduced with a better/different implementation (SOLR-4497) in Solr 4.2.
Unfortunately, when skimming through the Sunspot reference I do not find a word about SolrCloud or aliasing. Probably those features have not been adopted by the Sunspot developers? As stated, I do not know Sunspot; perhaps they name it differently?
Most likely you will have to get your hands dirty and manage SolrCloud, and consequently aliases, not through the API Sunspot offers, but through Solr's admin interface.
Sources of information
There is this old wiki page that covers SolrCloud. It has a small, separate section about creating aliases.
The official reference also has a section about collection aliases.
The folks at Cloudera, who donated the feature to Solr, have also written a blog post about it.
How do we create an RDF database in Jython? I want to use it to implement SPARQL in Jython, so I need to create the database first.
See RDFAlchemyJython for reusing the most well-known Java tools for RDF and SPARQL in Jython; or go for RDFLib, a widespread RDF and SPARQL framework for Python.
I was going to say use the Jena libs, but msalvadores got there already; check the RDFAlchemyJython link. I'll add that it is pretty straightforward: just use them like you would any other Java libs in Jython.
TDB is probably the best bet for a SPARQLable database, see:
https://jena.apache.org/documentation/tdb/java_api.html
Just put the libs on your classpath and tweak the code to be Jython instead of Java.