Is there something equivalent to Solr's UpdateRequestProcessor in Elasticsearch?

I want to create a plugin that adds a new field to the document before it gets indexed. In Solr there is a specific component for this purpose: the UpdateRequestProcessor.
Is there something similar for Elasticsearch?

Although some rivers support scripting to modify documents before they are indexed, that would definitely slow indexing down, and it is not supported within Elasticsearch itself.
Doing this work on the client side is the way to go.
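For example, here is a minimal sketch of enriching a document on the client side before sending it over Elasticsearch's HTTP API with Java's built-in HTTP client. The index name, field names, the computed title_length field, and the localhost URL are all made up for illustration, and the exact endpoint path depends on your Elasticsearch version:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class EnrichAndIndex {
        public static void main(String[] args) throws Exception {
            String title = "Quick brown fox";

            // The client-side "update processor": compute the extra field before indexing.
            int titleLength = title.length();

            // Build the enriched document by hand; a real client would use a JSON library.
            String doc = "{\"title\":\"" + title + "\",\"title_length\":" + titleLength + "}";

            // Send it to Elasticsearch; newer versions use /<index>/_doc/<id> instead.
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("http://localhost:9200/myindex/mytype/1"))
                    .header("Content-Type", "application/json")
                    .PUT(HttpRequest.BodyPublishers.ofString(doc))
                    .build();

            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode() + " " + response.body());
        }
    }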

I just built a tool that allows you to use Solr's UpdateRequestProcessor in Elasticsearch.

Related

Does the Lucene search engine (not Solr) have a REST API for querying the indexed content?

How do you query the indexed content in Lucene? Do we need to write a script, or are there any APIs available to query the index?
No, Lucene is a library; you have to write custom Java code to do anything useful with it.
If you are looking for something higher level that does not require you to write code, look at Solr or Elasticsearch, both of which are built on top of Lucene.
How to query the indexed content in Lucene? You write a Java class using the IndexReader and IndexSearcher classes of the Lucene API. You need to build a query and pass it to a searcher instance as a parameter. There is no REST endpoint out of the box.
Lucene is simply an API, originally in Java and later ported to .NET, so you can use either Java or C# to develop your index creation as well as your index searching programs.
Your searcher code will eventually be a Java class, and it is you, the programmer, who might wish to expose the search logic via a REST endpoint. Lucene doesn't provide anything like that off the shelf.
IndexReader and IndexSearcher are the main Java classes for searching an index (a minimal sketch follows at the end of this answer).
The Lucene API changes heavily from one version to another, so look for code examples for your chosen version only.
As per the accepted answer of this SO question, it's possible to search a Lucene index with Solr. I have personally not performed that kind of search, though.
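To make the above concrete, here is a minimal search sketch written against a reasonably recent (5.x and later style) Lucene API; the index path, field name, and query string are placeholders, and as noted above the exact classes and signatures shift between versions:

    import java.nio.file.Paths;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.queryparser.classic.QueryParser;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.FSDirectory;

    public class SearchIndex {
        public static void main(String[] args) throws Exception {
            // Open the existing index directory (the path is a placeholder).
            IndexReader reader = DirectoryReader.open(FSDirectory.open(Paths.get("/path/to/index")));
            IndexSearcher searcher = new IndexSearcher(reader);

            // Build a query against the "content" field and pass it to the searcher.
            Query query = new QueryParser("content", new StandardAnalyzer()).parse("hello world");
            TopDocs hits = searcher.search(query, 10);

            // Print the top 10 hits; exposing this via REST is up to you (servlet, Spring, etc.).
            for (ScoreDoc hit : hits.scoreDocs) {
                Document doc = searcher.doc(hit.doc);
                System.out.println(hit.score + " -> " + doc.get("content"));
            }
            reader.close();
        }
    }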

Are there any NoSQL databases that can do search (like Lucene) on map/reduce views?

I'm using Cloudant, where I can use map/reduce to project views of the data, and it can also search documents with Lucene.
But these two features are separate and cannot be used together.
Suppose I make a game with user data like this:
    {
      "name": "",
      "items": []
    }
Each user has items. Then I want to let a user find all swords with quality +10. With Cloudant I might project type and quality as the key and query with key=["sword",10].
But I cannot make a query any more complex than that, the way Lucene could. To use Lucene I would need to normalize all the items into documents of their own and reference their owner.
I really wish I could do a Lucene search on the key of a data projection. I mean, instead of normalizing, I could store nested documents as I want and use map/reduce to project the data inside a document, so I could search for the items directly.
PS: If that database also supported partial updates by scripting and had transactional updates built in, that would be the best.
I'd suggest trying out Elasticsearch.
It seems like your use case should be covered by the search API.
If you need to do more complex analytics, Elasticsearch supports aggregations.
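For the sword example above, here is a minimal sketch of what this could look like with Elasticsearch's nested fields, using Java's built-in HTTP client and text blocks. The index name, field names, localhost URL, and request bodies are assumptions, and the exact mapping syntax depends on the Elasticsearch version:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class NestedItemSearch {
        static final HttpClient HTTP = HttpClient.newHttpClient();

        static String send(String method, String url, String body) throws Exception {
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create(url))
                    .header("Content-Type", "application/json")
                    .method(method, HttpRequest.BodyPublishers.ofString(body))
                    .build();
            return HTTP.send(request, HttpResponse.BodyHandlers.ofString()).body();
        }

        public static void main(String[] args) throws Exception {
            // Map "items" as nested so each item is matched as its own sub-document.
            send("PUT", "http://localhost:9200/users", """
                { "mappings": { "properties": {
                    "name":  { "type": "keyword" },
                    "items": { "type": "nested", "properties": {
                        "type":    { "type": "keyword" },
                        "quality": { "type": "integer" } } } } } }
                """);

            // Index a user document with its items embedded; no normalization needed.
            // refresh=true makes the document searchable immediately (for the demo only).
            send("PUT", "http://localhost:9200/users/_doc/1?refresh=true", """
                { "name": "alice",
                  "items": [ { "type": "sword",  "quality": 10 },
                             { "type": "shield", "quality": 3 } ] }
                """);

            // Find users holding a sword of quality 10; both conditions must match
            // on the same nested item, not just somewhere across the items array.
            System.out.println(send("POST", "http://localhost:9200/users/_search", """
                { "query": { "nested": { "path": "items", "query": { "bool": { "must": [
                    { "term": { "items.type": "sword" } },
                    { "term": { "items.quality": 10 } } ] } } } } }
                """));
        }
    }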
I am not at all sure that I got the question correctly, but you may want to take a look at Riak. It offers Solr-based search, which is quite well documented. I have used it in the past for distributed search over a distributed key-value index and it was quite fast.
If you use this, you will also need to look at the syntax of Solr queries, so I am adding it here to save you some time. However, keep in mind that not all of those Solr query features were available in Riak (at least not when I used it).
There are several solutions that would do the job. I can give my 2 cents and propose the well-established MongoDB. With MongoDB you can create a text index on a given field and then do a full-text search, as explained here. The feature has been in MongoDB since version 2.4 and the syntax is well documented in the MongoDB docs.
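A minimal sketch of that with the MongoDB Java driver; the connection string, database, collection, and field names are made up:

    import com.mongodb.client.MongoClient;
    import com.mongodb.client.MongoClients;
    import com.mongodb.client.MongoCollection;
    import com.mongodb.client.model.Filters;
    import com.mongodb.client.model.Indexes;
    import org.bson.Document;

    public class MongoTextSearch {
        public static void main(String[] args) {
            try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
                MongoCollection<Document> items = client.getDatabase("game").getCollection("items");

                // Create a text index on the field you want full-text search over.
                items.createIndex(Indexes.text("description"));

                // $text searches the text index; note it is simpler than what Lucene offers.
                for (Document doc : items.find(Filters.text("sword"))) {
                    System.out.println(doc.toJson());
                }
            }
        }
    }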

How to use entity recognition with Apache Solr and LingPipe or similar tools

I would like to use NLP while indexing the data with Apache Solr:
Identify the synonyms of the words and index those as well.
Identify the named entities and label them while indexing.
When someone queries the Solr index, I should be able to extract the named entities and the intent from the query and form the query string, so that it can effectively search the indexed files.
Are there any tools / plugins available to satisfy my requirements? I believe this is a common use case for most content-based websites. How do people handle it?
Here's a tutorial on using Stanford NER with SOLR.
Check out Apache UIMA.
Specifically, if you need Solr to do named entity recognition, you can integrate it with UIMA using SolrUIMA.
Check out this talk, which demonstrates UIMA + Solr.
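As a rough sketch of the index-time half of the question: run an entity recognizer over the text and store what it finds in a separate field before the document goes to Solr. The extractEntities method below is a placeholder for whichever library you pick (LingPipe, Stanford NER, UIMA), and the Solr URL and field names are assumptions that would have to exist in your schema:

    import java.util.List;
    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class NerIndexer {
        // Placeholder: plug in LingPipe, Stanford NER, UIMA, etc. here.
        static List<String> extractEntities(String text) {
            return List.of("Barack Obama", "Hawaii"); // hypothetical output
        }

        public static void main(String[] args) throws Exception {
            String text = "Barack Obama was born in Hawaii.";

            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "1");
            doc.addField("content", text);
            // Keep recognized entities in their own multi-valued field so queries
            // can filter or boost on them directly.
            for (String entity : extractEntities(text)) {
                doc.addField("entities", entity);
            }

            try (SolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build()) {
                solr.add(doc);
                solr.commit();
            }
        }
    }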

Can I use Solr just to search an existing Lucene index?

I use Lucene locally to index documents. I know how to use Lucene pretty well. I have never used Solr, but I want to run a web search over a Lucene index, so I'm now looking into it.
Can I install Solr on EC2, say, and then, instead of indexing documents using Solr, do the indexing locally using Lucene directly and just copy the Lucene index from my machine to EC2, which Solr will use for search?
I'm assuming it's possible as long as I keep the index on disk but would like to be sure.
Thanks!
It's certainly possible; you just have to make sure to maintain exactly the same index structure (defined by the Solr schema). However, it would also mean that your configuration is stored in two completely separate places: e.g. each time you change an analyzer in Lucene, you would need to synchronize that change in the Solr XML configuration. I'm not sure what benefit Solr would bring in such a use case.
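To illustrate the synchronization point, a minimal sketch of the local Lucene indexing side; the path and field names are placeholders, and the analyzer and fields chosen here would have to be mirrored by the field types declared in the Solr schema:

    import java.nio.file.Paths;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.StringField;
    import org.apache.lucene.document.TextField;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.FSDirectory;

    public class LocalIndexer {
        public static void main(String[] args) throws Exception {
            // This analyzer must match the analyzer declared for the corresponding
            // <fieldType> in Solr's schema, or searches through Solr will misbehave.
            IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());

            try (IndexWriter writer = new IndexWriter(
                    FSDirectory.open(Paths.get("/path/to/index")), config)) {
                Document doc = new Document();
                // "id" and "content" must also be defined as fields in the Solr schema.
                doc.add(new StringField("id", "1", Field.Store.YES));
                doc.add(new TextField("content", "hello from a locally built index", Field.Store.YES));
                writer.addDocument(doc);
            }
            // Afterwards, copy /path/to/index into the Solr core's data/index directory.
        }
    }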

Is there a project that integrates CouchDB and Solr?

I would like to be able to search a CouchDB database using Solr. Are there any projects that provide such an integration?
I am also aware of CouchDB-Lucene. Is there a way to hook Solr into that?
Thanks!
It would make more sense to roll your own, given how easy it is. First you need to decide what kind of SOLR schema to use and how to map your CouchDB documents onto that schema. Then simply iterate through all the documents in the db (see Pagination in CouchDB?) and generate SOLR <add> documents.
People do this all the time with all kinds of data sources. Since SOLR is essentially searching a single table, the hard work is often figuring out how to map your database format onto a single table. Read up on what you can do with the SOLR schema, and you may be surprised at how easy this is.
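A minimal sketch of that loop, assuming CouchDB's _all_docs endpoint with include_docs=true, Jackson for JSON, and SolrJ for the adds; the URLs are made up, and the toSolrDoc method is a hypothetical stand-in for the schema mapping you would design yourself:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import com.fasterxml.jackson.databind.JsonNode;
    import com.fasterxml.jackson.databind.ObjectMapper;
    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class CouchToSolr {
        public static void main(String[] args) throws Exception {
            // Fetch every document from CouchDB (a real indexer would page with limit/startkey).
            HttpResponse<String> resp = HttpClient.newHttpClient().send(
                    HttpRequest.newBuilder()
                            .uri(URI.create("http://localhost:5984/mydb/_all_docs?include_docs=true"))
                            .build(),
                    HttpResponse.BodyHandlers.ofString());
            JsonNode rows = new ObjectMapper().readTree(resp.body()).get("rows");

            try (SolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build()) {
                for (JsonNode row : rows) {
                    solr.add(toSolrDoc(row.get("doc")));
                }
                solr.commit();
            }
        }

        // This is where you decide how a CouchDB document maps onto your Solr schema.
        static SolrInputDocument toSolrDoc(JsonNode doc) {
            SolrInputDocument out = new SolrInputDocument();
            out.addField("id", doc.get("_id").asText());
            out.addField("content", doc.toString()); // crude: index the whole JSON as text
            return out;
        }
    }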
There is a CouchDB integration for ElasticSearch available, apart from feeding ElasticSearch with JSON on your own. Both work with schema-less JSON, so it's very easy to integrate them.
In terms of features, ElasticSearch would offer a comparable set to Solr (in addition to some unique features, of course.)
According to http://wiki.apache.org/couchdb/Related_Projects there was a CouchDB-Solr2 project (scroll down to the end), which is no longer maintained.
