We are desperate to switch over to Lucene (via Solr), but one big issue we have is the syntax support.
dtSearch supports xfirstword, w/N, pre/N, and probably some others.
I think w/N can be ported to Lucene, but the other ones I have no idea how to port.
I did a search and found an article that claims they have made the switch--still using dtSearch syntax, but I have yet to get the source. I left a comment about getting the source, but no response yet.
What do you guys recommend?
We basically want Solr with dtSearch syntax.
Do you have any good articles on how to specifically add features to indexing, etc. needed to accomplish these features?
Since I wasn't able to find a good solution to this, I wrote a dtSearch parser in Antlr4.
Many of you have asked for it, so I've posted it to GitHub.
Here's the link:
https://github.com/blmille1/dtsearchparser
Related
I saw there was a article in the Apache wiki on OpenNLP for Solr.
Is it valid for current solr version 5.3.1?
No, if you have a look at LUCENE-2899, you'll see that the code discussed was never added to trunk. You'll have to download/patch/update the code yourself if you're going to have it native to Solr.
It's probably a better idea to do all the NLP stuff outside of Solr, then index the result in a form suited for the task you're trying to solve.
Yes. It's better to keep it outside.
Here is a small project I tried.
https://github.com/john77eipe/DeepQA
ElasticSearch has percolator for prospective search. Does SOLR have a similar feature where you define your query upfront? If not, is there an effective way of implementing this myself on top of the existing SOLR features?
besides what BunkerMentality said, it is not hard to build your own percolator, what you need:
Are the queries you want to run easy to model on Lucene only syntax? if so you are good, if not, you need to convert them to Lucene only. Built them, and keep them in memory as Lucene queries
When a doc arrives:
build a MemoryIndex containing only that single doc
run all your queries on the index
I have done this for a system ingesting millions docs a day and it worked fine.
It's listed as an open new feature, SOLR-4587, on Solr JIRA but it doesn't seem like any work has started on it yet.
There is a link in the comments there to a separate project called Luwak that seems to implement some features similar to percolator.
If it is still relevant, you can use this
It's SOLR Update Processor that based on Luwak
I am currently in the process of choosing a technology/format to expose my API. It seems there are lots of discussion on this topic, but could not find the one for future use. I am planning to use Hydra:
http://www.markus-lanthaler.com/hydra
as it seems to be fully restfull (hypermedia api) but it seems it is not accepted yet (neither HAL is).
when I go to : http://www.markus-lanthaler.com/hydra/api-demo/vocab, I get a json that seems to be what swagger returns.
my questions:
- is Hydra Documentation meant to be sthg like swagger
- could find any tool for it like swagger has.
- I would prefer using Hydra as it seems it has more description on operations... by using json-ld, but it does not seem to be as supported as Hal or swagger is.
-does anyone have experience with hydra
Great that you consider using Hydra. You are right, we do not have extensive tooling yet but that's just a matter of time. In fact, I'm already working on a documentation generator. If you have further questions regarding Hydra, please don't hesitate to post to our mailing list. There are a lot of people on that list that will be happy to help.
I'm really interested in being able to annotate my data. I am not really sure where to start, so I thought of using Apache Uima with Solr. I'm not sure if I'm no the right path, yet. Anyhow, I'm looking for some good documentation on this component called Solr-Uima
http://code.google.com/p/solr-uima/
Thanks,
J
I have no idea what UIMA is, but when I can't find any docs I read the tests. If the tests are not enough I read the code itself. (sorry if this is a generic answer, but this code doesn't seem to be too difficult to follow, so I really think it's a viable option).
The former project on Google Code has been contributed to Lucene / Solr codebase.
There is a wiki page, but needs to be updated.
It's a simple question, but I haven't found the answer anywhere. Thoughts and input appreciated.
I'm using Django, too, for what it's worth. :)
Cheers.
The Search API is now available as experimental for Java and Python .
With Java GAE, you could use Compass, but that won't help with Django. For Python, Bill Katz offers one solution -- open source -- and these guys offer a Django-specific approach which, however, is free only for non-commercial applications (i.e. if your app makes money they want you to pay for their full-text search). I have no real-world experience with either of these solutions so I can't really give well-grounded recommendations, but from what one can see with just a little playing around they seem quite useful.
An overview of the Python App Engine searches that I am aware of:
Google did add a cut down search using SearchableModel although that has limitations (5000 indexed word limit, String property only not Text):
http://groups.google.com/group/google-appengine/browse_thread/thread/f64eacbd31629668/8dac5499bd58a6b7?lnk=gst&q=searchablemodel
Or as another posters have pointed out there are these options:
The Quick and simple text search:
http://www.billkatz.com/2009/6/Simple-Full-Text-Search-for-App-Engine
This product which has a fairly comprehensive free version and a more extensive commercial version:
http://gae-full-text-search.appspot.com/customers/download/
I've read that Google do have a project to bring full text search to App Engine although this is not scheduled to happen any time soon
I'd really like to see a comparison of the various searching frameworks and see how they stack up to each other. Does anyone know of any report like this?
Edit:
Google Search API now available (although still experimental)
For now, the real answer is that there is no real full-text search on Google App Engine. The solutions provided by the other answers here are fine for toy data sets, but do not scale to anything more than O(10000) documents or so. Google will have to provide search as an infrastructural feature of GAE. See the feature request for (mostly superfluous) discussion.
# The Quick and simple text search:
http://www.billkatz.com/2009/6/Simple-Full-Text-Search-for-App-Engine
this solution did not work for me - and looking at the limitations below, it is unlikely to be useful for real use cases.
It uses StringListProperty to store phrases which has a limitation of 500 characters.
It does not work with the standard query filters.
Issue 217 Bill Katz released a package to deal with and http://gae-full-text-search.appspot.com/ is available alternatively, levensthein is a another match measure
You should be able to adapt Whoosh! to write in the datastore instead of on disk. It's a pure python full-text search engine. It's not as fast or full-featured as Lucene, but it should run on GAE without too many modifications.