Mahout & Lucene Version Compatibility - solr

I am trying to use Mahout to do some analysis on the term vectors stored in my Solr/Lucene index. Unfortunately, it seems that the latest Mahout release is behind the latest Solr/Lucene release.
My Solr/Lucene installation is 4.10.3. As far as I can tell, the latest Mahout release (1.0) expects Lucene indexes at version 4.6.1.
When I run mahout lucene.vector I get the error:
Exception in thread "main" org.apache.lucene.index.IndexFormatTooNewException: Format version is not supported (resource: MMapIndexInput(path="/path/to/data/index/segments.gen")): -3 (needs to be between -2 and -2)
I have tried two things so far to tackle this problem:
First, I edited my solrconfig.xml file to say:
<luceneMatchVersion>4.6.1</luceneMatchVersion>
I then deleted my indexed data and built a clean index from the original documents. This did nothing to fix the error.
Second, I changed the lucene.version in the Mahout pom.xml file to 4.10.3 and recompiled the binary to see whether support had been added yet. I knew this was unlikely to work, but tried anyway.
My question is, how do I appropriately change the Lucene version that Solr uses for writing index files if it is not the above luceneMatchVersion setting in solrconfig.xml?

Mahout seems to support Solr 3.x for now. You may try this patch for Mahout.

Related

Is it possible to upgrade from Solr 4.x directly to Solr 6.1?

We are looking to upgrade from SolrCloud 4.10.3 to SolrCloud 6.1. The documentation for Solr 6.1 is not very clear on backward compatibility.
I came across this post on the LucidWorks site.
The index format is backward compatible between two consecutive major
Solr versions. So a Solr 3.x index is compatible with a Solr 4.x
index. However if you have a Solr 1.x index and want to upgrade to
Solr 4.x then you would need to first upgrade to Solr 3.x first.
It was written before Solr 6.x was out, and the wording of "between two consecutive major Solr versions" is unclear. The example skips the exact scenario that I'm interested in (skipping exactly 1 major version).
Do I have to first upgrade to Solr 5.x and then go to Solr 6.1?
I faced the same situation when upgrading Solr from 4.x to 6.x, and was lucky enough to find the following script on GitHub, which performs the upgrade:
https://github.com/cominvent/solr-tools.git/
All credit goes to "cominvent" for this script.
Since the core folder structure in version 4.x is not the same as in version 6.x, I wrote a script that creates the right directory tree and then applies upgradeindex.sh.
The script (buildsorltree.sh) can be found at https://github.com/cradules/bash_scripts, and the repo also contains upgradeindex.sh. Since I have linked these two scripts, I put them in the same repo. Good luck!
I was able to find this on the Apache website.
Solr 6 has no support for reading Lucene/Solr 4.x and earlier indexes.
Be sure to run the Lucene IndexUpgrader included with Solr 5.5 if you
might still have old 4x formatted segments in your index.
Alternatively: fully optimize your index with Solr 5.5 to make sure it
consists only of one up-to-date index segment.
So this means that you can upgrade directly, but only if you run the IndexUpgrader from Solr 5.5 first.
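As a rough illustration of the IndexUpgrader step, the sketch below only assembles the command line and prints it; it does not run anything. The jar file names, the 5.5.0 version suffix, and both paths are assumptions for illustration, so adjust them to your installation. The lucene-backward-codecs jar is included because Lucene 5.x needs it on the classpath to read 4.x segments.

```python
import os
import shlex


def index_upgrader_command(lib_dir, index_dir, lucene_version="5.5.0"):
    """Build the argument list for running Lucene's IndexUpgrader.

    lib_dir, index_dir and lucene_version are placeholders (assumptions),
    not values taken from any real installation.
    """
    jars = [
        os.path.join(lib_dir, f"lucene-core-{lucene_version}.jar"),
        os.path.join(lib_dir, f"lucene-backward-codecs-{lucene_version}.jar"),
    ]
    return [
        "java",
        "-cp", os.pathsep.join(jars),
        "org.apache.lucene.index.IndexUpgrader",
        "-delete-prior-commits",  # drop older commit points after upgrading
        "-verbose",
        index_dir,
    ]


if __name__ == "__main__":
    # Hypothetical paths; print the command instead of executing it.
    cmd = index_upgrader_command("/path/to/solr-5.5/lib", "/path/to/data/index")
    print(" ".join(shlex.quote(part) for part in cmd))
```

IndexUpgrader rewrites the index in place, so it is probably wise to stop Solr and back up the index directory before running the printed command.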

How to find out what Solr version DSE is using

I am trying to find out what Solr version our DSE setup is using. I know it uses a custom modified solr, but I want to know the index Lucene version.
Apart from opening an index with Luke, is there somewhere where DSE shows this info? I don't see it in the Solr admin overview.
EDIT: I am only counting on looking at the setup, not any doc
Check the release notes:
http://docs.datastax.com/en/datastax_enterprise/4.8/datastax_enterprise/RNdse.html
You can also see it in your system.log on startup.
Note: Solr and Lucene versions are the same now that they are a single project:
https://github.com/apache/lucene-solr/releases
In the solrconfig.xml, there is usually a line such as this:
<luceneMatchVersion>5.3.0</luceneMatchVersion>
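This tells you which Lucene version's behavior the Solr components are configured to match. It is usually, but not necessarily, the version of the Lucene library actually in use.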
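If you want to read that setting programmatically, a minimal sketch along these lines may help. The sample XML is a hypothetical, stripped-down solrconfig.xml, and the function name is made up for this example. Keep in mind that it reports only the configured value, not the version of the Lucene jars on the classpath.

```python
import xml.etree.ElementTree as ET


def lucene_match_version(solrconfig_xml):
    """Return the text of <luceneMatchVersion>, or None if it is absent."""
    root = ET.fromstring(solrconfig_xml)
    element = root.find(".//luceneMatchVersion")
    if element is None or element.text is None:
        return None
    return element.text.strip()


if __name__ == "__main__":
    # Hypothetical, minimal solrconfig.xml content used only for this demo.
    sample = """<config>
  <luceneMatchVersion>5.3.0</luceneMatchVersion>
</config>"""
    print(lucene_match_version(sample))  # prints 5.3.0
```

For a file on disk, `ET.parse(path).getroot()` can replace `ET.fromstring(...)`.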

Install ComplexPhraseQueryParser with Solr 4.2

In reference to my other SO question (Using solr 4.2 how do I use/enable fuzzy phrase searching)
I was told I can get fuzzy phrases working by installing the plugin mentioned there (https://issues.apache.org/jira/browse/SOLR-1604). However, with every attempt I've made I cannot get it to work. When I download the latest dated file, there is no readme or installation instructions. Also, I'm not entirely sure that there is a version for Solr 4.2 yet.
Can someone provide me with instructions on how to install that plugin with Solr 4.2?
I was just looking at this myself. According to another SO question (https://stackoverflow.com/a/28463319/1222019) that parser was added in Solr 4.8, though it may be possible to add it to the solr war file in earlier versions and recompile.

What are the differences between Solr 3.6.2 and Solr 4.0?

Are there any major differences between Solr 3.6 and Solr 4.0 other than new features? Am I safe using my existing queries (those that work in Solr 3.6) inside of Solr 4.0?
Are there any major differences between Solr 3.6 and Solr 4.0 other
than new features?
I find this question weird, to say the least. Bug fixes and new features are the whole point of releases!
You can look at the full changelog of the Solr release, which is available here. Don't forget that Solr and Lucene are released in unison, so you also need to look for relevant changes in both projects.
Am I safe using my existing queries (those that work in Solr 3.6)
inside of Solr 4.0?
Queries should be fine, but indices - probably not. Quoting javanna from another SO post:
The index format has changed, but Solr will take care of upgrading the
index. That happens automatically once you start Solr with your old
index. But after that the index cannot be read anymore by a previous
Solr/lucene version.
Ideally they should work.
You can check Changes.txt, which lists all the new features, changes, bug fixes, and optimizations.
If anything breaks, you can always refer to the changes to check whether anything related has been modified.

Upgrade solr 1.4 index to solr 3.3?

I have an existing index built using Apache Solr 1.4.
I want to use this existing index in version 3.3. As you know, the index format changed in 3.x, so how is it possible to do this?
I have exported the existing index (that is in 1.4 version) using Luke to XML.
There are two ways to do this:
If your index is unoptimized, simply optimize it; this will upgrade the file format along the way.
If your index is already optimized, you can't do this. Instead, use the command-line tool supplied with Solr (your path may differ from mine):
java -cp work/Jetty_0_0_0_0_8983_solr.war__solr__k1kf17/webapp/WEB-INF/lib/lucene-core-3.3.0.jar org.apache.lucene.index.IndexUpgrader -verbose /path/to/index/directory
However, note that this only changes the file format. It won't stop deprecation warnings, because unless you tell it otherwise, solrconfig.xml defaults to assuming you're using an old index format. See http://www.mail-archive.com/dev@lucene.apache.org/msg23233.html
You may still get lots of lines like this in your logfile:
WARNING: LowerCaseFilterFactory is using deprecated LUCENE_24 emulation. You should at some point declare and reindex to at least 3.0, because 2.x emulation is deprecated and will be removed in 4.0
until you tell solrconfig.xml that you're ready to use all the features of the new index format. You do this by adding the following to solrconfig.xml (at the top level, just after the abortOnConfigurationError setting).
<!-- Controls what version of Lucene various components of Solr
adhere to. Generally, you want to use the latest version to
get all bug fixes and improvements. It is highly recommended
that you fully re-index after changing this setting as it can
affect both how text is indexed and queried.
-->
<luceneMatchVersion>LUCENE_33</luceneMatchVersion>
If you still have the source data, the best approach is to reindex all of it in Solr 3.3.
You can use the data import handler to index your exported XML files.
If building a new index is not an option for you, there are other possibilities:
As far as I know, Solr 3.3 can read old indexes.
So one idea could be to use shards: one shard for the old data (read-only) and another shard for the new data. Unfortunately, with this solution you will be unable to modify the old data.
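To make the shard idea a little more concrete, here is a small sketch that builds a distributed query URL using Solr's `shards` request parameter. The host names, core names ("old" and "new"), and the helper function are all hypothetical. It assumes both cores are already being served and share a compatible schema.

```python
from urllib.parse import urlencode


def sharded_select_url(base_url, shards, query="*:*"):
    """Build a /select URL that fans a query out over several shards."""
    params = {
        "q": query,
        # Solr expects a comma-separated list of host:port/path entries.
        "shards": ",".join(shards),
    }
    return f"{base_url}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical hosts and core names, for illustration only.
    url = sharded_select_url(
        "http://localhost:8983/solr/new",
        ["localhost:8983/solr/old", "localhost:8983/solr/new"],
        query="title:lucene",
    )
    print(url)
```

Updates would still be sent only to the core holding the new data; the old core is queried but never written to.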
