Testing SOLR to Elasticsearch data transfer - solr

I recently moved all of my Solr documents into Elasticsearch after creating a mapping that is an exact equivalent of the schema.xml. To test the accuracy, I created about 120 Lucene queries and ran them against both Solr and Elasticsearch.
However, the hit counts differed between Solr and Elasticsearch for 17 of the 120 queries. Could there be any reason for this apart from the analyzers, tokenizers, and filters defined in schema.xml and the Elasticsearch mappings? The Solr version is 4.3.0, whereas the Elasticsearch version is 1.3.2.
The Elasticsearch query I used is:
{"query_string":{"query":lucene_query}}
Please let me know if there is an alternative way to test query accuracy between Solr and Elasticsearch.

First, make sure that both systems use the same analysis chain: the same tokenizers, filters, and stemmers.
Also, Apache Solr 4.3.0 is built on Apache Lucene 4.3.0, while Elasticsearch 1.3.2 is built on Apache Lucene 4.9.0.
I don't know whether this is the cause, but if I were you, I would check the release notes of the Apache Lucene versions after 4.3.0 and see what changed.
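
To narrow down which queries disagree, the comparison can be scripted. The following is only a sketch: the host names, the core name `collection1`, and the index name `myindex` are assumptions, and it relies on Solr returning the count in `response.numFound` (with `wt=json`) and Elasticsearch 1.x returning it in `hits.total`.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoints; adjust host, core and index names to your setup.
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"
ES_SEARCH = "http://localhost:9200/myindex/_search"


def solr_num_found(response):
    """Hit count from a parsed Solr JSON response (wt=json)."""
    return response["response"]["numFound"]


def es_total(response):
    """Hit count from a parsed Elasticsearch 1.x search response."""
    return response["hits"]["total"]


def fetch_solr(query):
    # rows=0: only the count is needed, not the documents.
    params = urllib.parse.urlencode({"q": query, "rows": 0, "wt": "json"})
    with urllib.request.urlopen(SOLR_SELECT + "?" + params) as resp:
        return json.load(resp)


def fetch_es(query):
    body = json.dumps({"size": 0, "query": {"query_string": {"query": query}}})
    req = urllib.request.Request(
        ES_SEARCH,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def mismatches(queries, solr_fetch=fetch_solr, es_fetch=fetch_es):
    """Return (query, solr_hits, es_hits) for every query whose counts differ."""
    out = []
    for q in queries:
        s = solr_num_found(solr_fetch(q))
        e = es_total(es_fetch(q))
        if s != e:
            out.append((q, s, e))
    return out
```

Calling `mismatches(list_of_queries)` returns only the disagreeing queries, which can then be inspected one at a time against the analysis settings on each side.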

Related

How to find out what Solr version DSE is using

I am trying to find out what Solr version our DSE setup is using. I know it uses a custom modified solr, but I want to know the index Lucene version.
Apart from opening an index with Luke, is there somewhere where DSE shows this info? I don't see it in the Solr admin overview.
EDIT: I am only counting on looking at the setup, not any doc
Check the release notes:
http://docs.datastax.com/en/datastax_enterprise/4.8/datastax_enterprise/RNdse.html
You can also see it in your system.log on startup.
Note: Solr and Lucene versions are the same now that they are a single project:
https://github.com/apache/lucene-solr/releases
In the solrconfig.xml, there is usually a line such as this:
<luceneMatchVersion>5.3.0</luceneMatchVersion>
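This tells Solr which Lucene version's behavior its components should match, so it indicates the version the configuration targets rather than necessarily the exact Lucene version that is running.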
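
For checking the setup itself rather than the docs, the configured value can be read straight out of solrconfig.xml. A minimal sketch, assuming the file has the usual `<config>` root element; the `sample` string stands in for the real file contents.

```python
import xml.etree.ElementTree as ET


def lucene_match_version(solrconfig_xml):
    """Return the text of <luceneMatchVersion>, or None if it is absent."""
    root = ET.fromstring(solrconfig_xml)
    node = root.find("luceneMatchVersion")
    if node is None or not node.text:
        return None
    return node.text.strip()


# Stand-in for the contents of a real solrconfig.xml.
sample = """<config>
  <luceneMatchVersion>5.3.0</luceneMatchVersion>
</config>"""

print(lucene_match_version(sample))  # prints 5.3.0
```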

Update solr schema.xml in real time for Solr 4.10.1

I understand that Solr 5.0 provides a REST API for updating the schema in real time using curl. However, I could not do that with my earlier version, Solr 4.10.1.
Would like to check, is this function available for the earlier version of Solr, and is the curl syntax the same as Solr 5.0?
According to the Solr Wiki, it is possible to read the schema through the API from Solr 4.2 onward and to modify it from Solr 4.4 onward:
In order to enable schema modifications via the Schema REST API, the
schema implementation must be declared as managed by Solr, that is,
not to be manually edited.
Further, the schema must be configured as mutable in order to make
modifications to it.
Both of these schema features (managed and mutable) are configured via
the <schemaFactory/> element in solrconfig.xml.
More information - https://wiki.apache.org/solr/SchemaRESTAPI
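
As a rough illustration of what a 4.x request looks like, the sketch below builds (but does not send) a request that adds a field. It assumes a managed, mutable schema, a core named `collection1` on localhost, and the 4.x convention of posting a JSON array of field definitions to `/schema/fields`; the field name `sell_by` and type `tdate` are made up for the example. As far as I recall, Solr 5.0 instead accepts commands such as `add-field` posted to `/schema`, so the syntax is not identical and should be checked against the wiki page for your version.

```python
import json
import urllib.request


def add_fields_request(core_url, fields):
    """Build (but do not send) a POST to <core_url>/schema/fields.

    `fields` is a list of field definitions, e.g.
    [{"name": "sell_by", "type": "tdate", "stored": True}].
    """
    return urllib.request.Request(
        core_url.rstrip("/") + "/schema/fields",
        data=json.dumps(fields).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical core URL and field definition.
req = add_fields_request(
    "http://localhost:8983/solr/collection1",
    [{"name": "sell_by", "type": "tdate", "stored": True}],
)
# urllib.request.urlopen(req) would send it to a running Solr instance.
```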

Using ComplexPhraseQueryParser in Datastax search

I want to perform complex searches in DataStax search. The Solr wiki suggests using the complex phrase query parser for this (https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-ComplexPhraseQueryParser). However, the syntax did not work, so it seems I need to plug the parser in separately.
I am using Datastax enterprise 4.5. Is there any particular procedure to plug in the parser - maybe put it in particular location and make specific changes to get it started?
The complex phrase query parser was added in Solr 4.8. DSE 4.5 and DSE 4.6 are on Solr 4.6. DSE 4.7 (which is not available yet) will contain Solr 4.10, which includes the complex phrase query parser.
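
Once the underlying Solr is 4.8 or later, the parser is selected with local-params syntax in the `q` parameter. A small sketch of building such a query string; the field name `name` and the phrase are placeholders, and `inOrder` is the parser's option for requiring the terms in order.

```python
import urllib.parse


def complexphrase_params(field, phrase, in_order=True):
    """URL-encode a q parameter that uses the complexphrase parser."""
    q = '{!complexphrase inOrder=%s}%s:"%s"' % (
        str(in_order).lower(),
        field,
        phrase,
    )
    return urllib.parse.urlencode({"q": q})


# Placeholder field and phrase; append the result to the select URL.
params = complexphrase_params("name", "Jo* Smith")
```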

Best Tika integration on Solr or Nutch

Which is the best integration for Apache Tika, assuming that I have already connected and used Nutch (2.2.1) + Solr (4.3)?
I understand that Tika can be integrated within Nutch and/or Solr, but which one is the better choice?
Set up the Tika plugin with Nutch; Nutch will then parse the data and do all the hard work for you.
I would suggest setting it up on Solr as well: you may wish to send documents to Solr via curl, and it helps to have it set up there too. It needs little extra configuration and has no performance cost.
There is a guide to setting up Tika & extracting request handler here
Apply tika parser in Nutch's parsing phase.

How to configure Apache Tika with apache Solr 1.4.1

I want to index a large number of pdf documents.
I have found a reference showing that it can be done using Apache Tika, but unfortunately I cannot find any reference that describes how to configure Apache Tika in Solr 1.4.1.
Once I do have it configured, how can I send documents to Solr directly without using curl?
I am using solrnet for indexing.
See ExtractingRequestHandler
Support for ExtractingRequestHandler in SolrNet is not yet complete. You can either finish implementing it, or work around it and craft your own HttpWebRequests.
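
Whatever client is used, the request itself is a plain HTTP POST of the file to the extract handler. The sketch below shows its shape in Python rather than C#, purely as an illustration to mirror with HttpWebRequest. It assumes the handler is mapped at `/update/extract`, that the unique key field is `id`, and that Solr runs at the URL shown.

```python
import urllib.parse
import urllib.request


def extract_request(solr_url, doc_id, pdf_bytes):
    """Build (but do not send) a POST of raw PDF bytes to /update/extract."""
    # literal.id sets the id field of the resulting Solr document.
    params = urllib.parse.urlencode({"literal.id": doc_id, "commit": "true"})
    return urllib.request.Request(
        solr_url.rstrip("/") + "/update/extract?" + params,
        data=pdf_bytes,
        headers={"Content-Type": "application/pdf"},
        method="POST",
    )


# Usage against a running Solr (not executed here):
# with open("doc.pdf", "rb") as f:
#     req = extract_request("http://localhost:8983/solr", "doc1", f.read())
#     urllib.request.urlopen(req)
```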
