Upgrade Solr 1.4 index to Solr 3.3?

I have an existing index built using Apache Solr 1.4.
I want to use this existing index in version 3.3. As you know, the index format changed in 3.x, so how can I do this?
I have exported the existing index (which is in 1.4 format) to XML using Luke.

There are two ways to do this:
if your index is unoptimized, simply optimize it; this will upgrade the file format along the way.
if your index is already optimized, you can't do this. Instead, use the command-line tool supplied with Solr (your path may differ from mine):
java -cp work/Jetty_0_0_0_0_8983_solr.war__solr__k1kf17/webapp/WEB-INF/lib/lucene-core-3.3.0.jar org.apache.lucene.index.IndexUpgrader -verbose /path/to/index/directory
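Here is a minimal Java sketch of the two routes above. It assumes Java 9+ and uses a placeholder Solr base URL. The class and method names are hypothetical. It only builds the optimize request URL and the IndexUpgrader command line shown in the answer, and it can run the command as a child process.

```java
import java.util.List;

// Hypothetical helper: base URL, jar path and index path are placeholders, not values Solr requires.
final class UpgradeCommands {

    // Route 1 (unoptimized index): ask a running Solr 3.x to optimize. Merging down to
    // one segment rewrites the existing segments, so they are written in the current format.
    static String optimizeUrl(String solrBaseUrl) {
        return solrBaseUrl.replaceAll("/+$", "") + "/update?optimize=true";
    }

    // Route 2 (already optimized index): the IndexUpgrader command line from the answer.
    static List<String> indexUpgraderCommand(String luceneCoreJar, String indexDir) {
        return List.of("java", "-cp", luceneCoreJar,
                "org.apache.lucene.index.IndexUpgrader", "-verbose", indexDir);
    }

    // Runs IndexUpgrader as a child process and returns its exit code.
    // Assumes Solr is stopped, so nothing else holds the index write lock.
    static int runIndexUpgrader(String luceneCoreJar, String indexDir)
            throws java.io.IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(indexUpgraderCommand(luceneCoreJar, indexDir));
        pb.inheritIO(); // show the -verbose output on this console
        return pb.start().waitFor();
    }
}
```

For example, runIndexUpgrader("/path/to/lucene-core-3.3.0.jar", "/path/to/index/directory") mirrors the command above. Opening the optimize URL with curl or any HTTP client triggers the first route instead.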
However, note that either approach only changes the file format. It won't stop deprecation warnings, because unless you tell it otherwise, solrconfig.xml defaults to assuming you're still using an old index format. See http://www.mail-archive.com/dev#lucene.apache.org/msg23233.html
You may still get lots of lines like this in your logfile:
WARNING: LowerCaseFilterFactory is using deprecated LUCENE_24 emulation. You should at some point declare and reindex to at least 3.0, because 2.x emulation is deprecated and will be removed in 4.0
until you tell solrconfig.xml that you're ready to use all the features of the new index format. You do this by adding the following to solrconfig.xml (at the top level, just after the abortOnConfigurationError setting).
<!-- Controls what version of Lucene various components of Solr
adhere to. Generally, you want to use the latest version to
get all bug fixes and improvements. It is highly recommended
that you fully re-index after changing this setting as it can
affect both how text is indexed and queried.
-->
<luceneMatchVersion>LUCENE_33</luceneMatchVersion>
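To check whether a core already declares this setting, here is a small sketch that uses only the JDK's DOM parser. The helper name is hypothetical. It only inspects direct children of <config>, which matches the "top level" placement described above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper: reports the luceneMatchVersion a solrconfig.xml declares, if any.
final class MatchVersionCheck {

    static String readLuceneMatchVersion(String solrconfigXml) {
        try {
            Element config = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(solrconfigXml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            NodeList children = config.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                // only a direct child of <config> counts ("at the top level")
                if (child.getNodeType() == Node.ELEMENT_NODE
                        && "luceneMatchVersion".equals(child.getNodeName())) {
                    return child.getTextContent().trim();
                }
            }
            return null; // not declared: expect the deprecation warnings quoted above
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse solrconfig.xml", e);
        }
    }
}
```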

If you still have the data, the best approach is to index all of it from scratch in Solr 3.3.
You can use the DataImportHandler to index your exported XML files.
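If the DataImportHandler route is not convenient, another option is to post Solr's XML update format (<add><doc><field name="...">) to the /update handler. The sketch below assumes the exported data has already been reshaped into maps from field name to values. Luke's XML export uses a different layout, and converting it is not shown. Class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;

// Hypothetical helper: turns field/value maps into a Solr XML <add> message and posts it.
final class SolrXmlUpdate {

    static String escape(String s) {
        // '&' must be replaced first so the other entities are not double-escaped
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;").replace("'", "&apos;");
    }

    static String addMessage(List<Map<String, List<String>>> docs) {
        StringBuilder xml = new StringBuilder("<add>");
        for (Map<String, List<String>> doc : docs) {
            xml.append("<doc>");
            for (Map.Entry<String, List<String>> field : doc.entrySet()) {
                for (String value : field.getValue()) { // a multiValued field repeats <field>
                    xml.append("<field name=\"").append(escape(field.getKey())).append("\">")
                       .append(escape(value)).append("</field>");
                }
            }
            xml.append("</doc>");
        }
        return xml.append("</add>").toString();
    }

    // POSTs the message, e.g. to http://localhost:8983/solr/update?commit=true
    static int post(String updateUrl, String body) {
        try {
            HttpURLConnection conn = (HttpURLConnection) URI.create(updateUrl).toURL().openConnection();
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");
            try (OutputStream os = conn.getOutputStream()) {
                os.write(body.getBytes(StandardCharsets.UTF_8));
            }
            return conn.getResponseCode();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```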
If building a new index is not an option for you, there are other possibilities:
As far as I know, Solr 3.3 can read old indexes.
So one idea could be to use shards: one shard for the old data (read-only) and the other shard for the new data. Unfortunately, with this solution you will be unable to modify the old data.
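The shard idea relies on Solr's distributed search, where the shards parameter lists the cores to query. Here is a sketch (Java 10+) that builds such a request, assuming two hypothetical cores named old and new on one host. uniqueKey values should not be duplicated across the two shards, and new documents go only to the new core.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper: builds a distributed /select URL over an "old" and a "new" core.
final class ShardQuery {

    static String selectUrl(String baseUrl, List<String> shards, String q) {
        return baseUrl.replaceAll("/+$", "")
                + "/select?q=" + URLEncoder.encode(q, StandardCharsets.UTF_8)
                // shard addresses are host:port/path, without the http:// prefix
                + "&shards=" + URLEncoder.encode(String.join(",", shards), StandardCharsets.UTF_8);
    }
}
```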

Related

Migrate Solr standalone index to SolrCloud

I have indexes for several Solr cores that I converted from Solr 4 to Solr 6, but in standalone mode, so they don't have the _version_ field that SolrCloud requires.
Now I want to migrate to SolrCloud 6 and put them in a cluster. Because these indexes have no _version_ field, when I placed them in the data directory of a SolrCloud leader core, the other replicas in the shard did not update. So I decided to read them with Lucene, get each document's fields, add them to a Solr document, and index them one by one into SolrCloud. But because some fields in these indexes are not stored, not all fields get carried over.
In the end, it seems I have no option other than reindexing.
I would appreciate any better ideas or solutions that could make the migration easier.
If there is any chance to reindex, just do so; it will be the best option in the end. You have to deal with two separate issues: a) migrating from 4.x to 6.0 and b) moving from standalone to SolrCloud. Doing both without reindexing is going to be messy.
If you cannot reindex:
are all your fields stored OR do they have docValues=true? If so, you can get the original contents of your docs. Read them and index them with SolrJ or with a script.
if not, and you have a _version_ field: try to manually put the index into SolrCloud. Not straightforward, but possible.
if you don't have a _version_ field, I think it is impossible to put the index as is into SolrCloud (although some posts online suggest otherwise). You could try to write some Lucene code to add a _version_ field to all docs (with values that make sense), but this should be the very last resort.
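The first case (all fields stored or with docValues) can be handled by reading every document out of the standalone core with cursorMark deep paging, which Solr has offered since 4.7. In this sketch the HTTP fetch and the SolrJ indexing call are left as function parameters, and the uniqueKey is assumed to be id. Whether fl=* also returns docValues-only fields depends on the Solr version and schema, so list such fields explicitly if needed. copyField destinations are best dropped, since the target schema fills them again. The class name is hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical sketch: copies every document out of a standalone core page by page.
final class CursorExport {

    // One page of results plus the nextCursorMark value from the response.
    static final class Page {
        final List<Map<String, Object>> docs;
        final String nextCursorMark;

        Page(List<Map<String, Object>> docs, String nextCursorMark) {
            this.docs = docs;
            this.nextCursorMark = nextCursorMark;
        }
    }

    // Query parameters for one page. cursorMark requires a sort that includes the
    // uniqueKey field, assumed here to be "id".
    static String pageParams(String cursorMark, int rows) {
        return "q=*:*&sort=id+asc&rows=" + rows + "&fl=*&cursorMark="
                + URLEncoder.encode(cursorMark, StandardCharsets.UTF_8);
    }

    // fetch: runs pageParams(...) against the old core and parses the response (not shown).
    // sink: re-indexes a batch into SolrCloud, e.g. through SolrJ (not shown).
    static int exportAll(Function<String, Page> fetch, Consumer<List<Map<String, Object>>> sink) {
        String cursorMark = "*"; // "*" starts a new cursor
        int copied = 0;
        while (true) {
            Page page = fetch.apply(cursorMark);
            if (!page.docs.isEmpty()) {
                sink.accept(page.docs);
                copied += page.docs.size();
            }
            // Solr signals the last page by returning the cursorMark it was given
            if (page.nextCursorMark.equals(cursorMark)) {
                return copied;
            }
            cursorMark = page.nextCursorMark;
        }
    }
}
```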

Solr luceneMatchVersion syntax

I have Solr 4.10 with a collection whose solrconfig.xml sets <luceneMatchVersion> as follows:
<luceneMatchVersion>4.7</luceneMatchVersion>
Is this correct? I have seen other examples with values such as LUCENE_35. I also need to know how to express LUCENE_xx for my current Solr version.
You should use:
<luceneMatchVersion>4.10.4</luceneMatchVersion>
I recommend checking your current Solr version; in my case it was 4.10.4.
If you are going to reindex, both numbers should match. The only reason to have them differ is if you had an index created with, say, Lucene 4.7, in which case you would have
<luceneMatchVersion>4.7</luceneMatchVersion>
Then you upgrade Lucene to 4.10.
Now, suppose some changes between 4.7 and 4.10 affect analysis, so that the same sentence analysed in both versions produces different output. In that case you might want to keep the version number at 4.7. Otherwise, some queries containing affected terms might not match, because those terms were analysed differently at index time than at query time. You have to assess how critical that issue might be.
That is why the recommendation is to upgrade, change the setting to the current number, and reindex. This way you are sure to avoid any issue.
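To compare the two spellings seen in this thread (the LUCENE_35 style and the 4.10.4 style), here is a hypothetical helper that normalizes both into major, minor and bugfix numbers. It is not Lucene's own parser and only handles the forms shown here (for example, not LUCENE_CURRENT). A negative result from compare(configured, running) is the situation described above, where you either keep the old value deliberately or raise it and reindex.

```java
// Hypothetical helper, not Lucene's parser: normalizes luceneMatchVersion spellings.
final class LuceneVersions {

    // "4.10.4", "4.7", "LUCENE_33", "LUCENE_410", "LUCENE_4_10_4" -> {major, minor, bugfix}
    static int[] parse(String v) {
        String s = v.trim();
        String[] parts;
        if (s.startsWith("LUCENE_")) {
            String rest = s.substring("LUCENE_".length());
            if (rest.contains("_")) {
                parts = rest.split("_");  // LUCENE_4_10_4 style
            } else {
                // LUCENE_33 / LUCENE_410 style: first digit is the major, the rest the minor
                parts = new String[] {rest.substring(0, 1), rest.substring(1)};
            }
        } else {
            parts = s.split("\\.");       // 4.10.4 style
        }
        int[] out = new int[3];           // missing parts stay 0
        for (int i = 0; i < Math.min(parts.length, 3); i++) {
            out[i] = Integer.parseInt(parts[i]);
        }
        return out;
    }

    // Negative if a is older than b, zero if equal, positive if newer.
    static int compare(String a, String b) {
        int[] x = parse(a);
        int[] y = parse(b);
        for (int i = 0; i < 3; i++) {
            if (x[i] != y[i]) {
                return Integer.compare(x[i], y[i]);
            }
        }
        return 0;
    }
}
```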
If anyone is using Drupal, the Search API Solr (search_api_solr) module has config templates by version in /sites/all/modules/search_api_solr/solr-conf/.
The template README.md states the following:
The solr-conf-templates directory contains config-set templates for
different Solr versions.
These are templates and are not to be used as config-sets!
To get a functional config-set you need to generate it via the Drupal
admin UI or with drush solr-gsc. See README.md in the module
directory for details.
The module's README.md lists these instructions:
Make sure you have Apache Solr started and accessible (i.e. via port 8983). You can start it without having a core configured at
this stage.
Visit Drupal configuration (/admin/config/search/search-api) and create a new Search API Server according to the search_api
documentation using "Solr" as Backend and the connector that
matches your setup. Input the correct core name (which you will
create at step 4, below).
Download the config.zip from the server's details page or by using drush solr-gsc with proper options, for example for a server named
"my_solr_server": drush solr-gsc my_solr_server config.zip 8.4.
Copy the config.zip to the Solr server and extract.
I generated a config file for 8.x, and it uses this:
<luceneMatchVersion>${solr.luceneMatchVersion:LUCENE_80}</luceneMatchVersion>
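The ${solr.luceneMatchVersion:LUCENE_80} form is Solr's property substitution. The value comes from a property of that name (for example, a -Dsolr.luceneMatchVersion=... system property at startup) and falls back to the text after the colon. The Java below is a hypothetical re-implementation of that rule for illustration only, not Solr's code.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of ${name:default} substitution; not Solr's implementation.
final class PropertySubstitution {

    // ${name} or ${name:default}
    private static final Pattern REF = Pattern.compile("\\$\\{([^}:]+)(?::([^}]*))?\\}");

    static String resolve(String text, Map<String, String> props) {
        Matcher m = REF.matcher(text);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = props.get(m.group(1)); // an explicitly set property wins
            if (value == null) {
                value = m.group(2);               // otherwise use the default after ':'
            }
            if (value == null) {
                throw new IllegalArgumentException("No value for " + m.group(1));
            }
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```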

Mahout & Lucene Version Compatibility

I am trying to use Mahout to do some analysis on the term vectors stored in my Solr/Lucene index. Unfortunately, it seems that the latest Mahout release is behind the latest Solr/Lucene release.
My Solr/Lucene installation is 4.10.3. As far as I can tell, the latest Mahout release (1.0) expects Lucene indexes at version 4.6.1.
When I run mahout lucene.vector I get the error:
Exception in thread "main" org.apache.lucene.index.IndexFormatTooNewException: Format version is not supported (resource: MMapIndexInput(path="/path/to/data/index/segments.gen")): -3 (needs to be between -2 and -2)
I have tried two things so far to tackle this problem:
First, I edited my solrconfig.xml file to say:
<luceneMatchVersion>4.6.1</luceneMatchVersion>
deleted my indexed data, and built a clean index from the original documents. This did nothing to fix the error.
So secondly, I tried to change the lucene.version in the Mahout pom.xml file to 4.10.3 and recompile the binary to see if the capabilities had been added yet. I knew this was unlikely to work, but tried anyway.
My question is, how do I appropriately change the Lucene version that Solr uses for writing index files if it is not the above luceneMatchVersion setting in solrconfig.xml?
Mahout seems to support Solr 3.x for now. You could try this patch for Mahout.

What are the differences between Solr 3.6.2 and Solr 4.0?

Are there any major differences between Solr 3.6 and Solr 4.0 other than new features? Am I safe using my existing queries (those that work in Solr 3.6) inside of Solr 4.0?
Are there any major differences between Solr 3.6 and Solr 4.0 other
than new features?
I find this question weird, to say the least. Bug fixes and new features are the whole point of releases!
You can look at the full changelog of the Solr release, which is available here. Don't forget that Solr and Lucene are released in unison, so you also need to look for relevant changes in both projects.
Am I safe using my existing queries (those that work in Solr 3.6)
inside of Solr 4.0?
Queries should be fine, but indices - probably not. Quoting javanna from another SO post:
The index format has changed, but Solr will take care of upgrading the
index. That happens automatically once you start Solr with your old
index. But after that the index cannot be read anymore by a previous
Solr/lucene version.
Ideally they should work.
You can check Changes.txt, which gives an overview of all the new features, changes, bug fixes and optimizations.
If anything breaks, you can always refer to Changes.txt to check whether something related has changed.

Solr Upgrade from 3.4 to 4

To use the pivot faceting feature in Solr 4, I upgraded from 3.4.
Should I do a full reindex of the content because of this upgrade, or are the indexes compatible somehow?
And will my client applications, which currently access my Solr 3.4 server, have problems after the upgrade? (In my preliminary tests they still run; the XML returned in query responses doesn't seem to change when you don't use new features.)
You need to do a full reindex if you want to use the Solr 4 index structure. Otherwise, you need to change the Lucene version in solrconfig.xml to keep using the old index.
The schema will need a new field called _version_ if you want to use the Real Time Get functionality.
Other than that, most things are pretty much the same for the client.
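Here is a JDK-only sketch that checks whether an existing schema.xml already has the field Real Time Get needs. The helper name is hypothetical. In the Solr 4 example schema the field is declared as <field name="_version_" type="long" indexed="true" stored="true"/>. As far as I know, Real Time Get also needs the update log enabled in solrconfig.xml.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: checks whether a schema.xml declares the _version_ field.
final class VersionFieldCheck {

    static boolean hasVersionField(String schemaXml) {
        try {
            NodeList fields = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(schemaXml.getBytes(StandardCharsets.UTF_8)))
                    .getElementsByTagName("field"); // <field> elements at any depth
            for (int i = 0; i < fields.getLength(); i++) {
                Element field = (Element) fields.item(i);
                if ("_version_".equals(field.getAttribute("name"))) {
                    return true;
                }
            }
            return false;
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse schema.xml", e);
        }
    }
}
```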

Resources