Move Solr data from v4 to v7 - solr

I have a solr cluster running solr v4.3 I want to export all the data and import it into a new solr v7.1 cluster.
What options do I have to export/import the data?

options are:
if you have ALL your fields stored, you can try several things:
use DIH in Solr7 to index all data from a SolrEntityProcessor
write some scripts/code to export all data (in batches, using cursorMark if available in 4.3 or doing the cursor yourself with a fq on some field) in csv, and index it into Solr7
similarly, write some java/Solrj code that does the same thing
if you don't have all fields stored, then the only way is upgrading to Solr 6 first, then to 7 (by going through the upgrade process, but this does not reindex the data, which is highly recommended)
All this, assuming you don't have the original data to reindex, if you have it, it is a no brainer: reindex off it.

Related

How to reindex solr after schema change?

We need to modify the schema fields dynamically, which requires reindexing.The solr ref guide recommands deleting all the documents and then re-run the original index process,but that does fit us.Does anyone have other ideas?
Thanks for any help.
Reindexing is required. Best option is using Data Import Handler. If you don't want to run the queries from database or don't want to pass documents, then also solr provides way to index the solr core through other solr core.
You can do something like this :
For more details, read solr DIH and SolrEntityProcessor : https://solr.apache.org/guide/6_6/uploading-structured-data-store-data-with-the-data-import-handler.html#solrentityprocessor

Can we compare Lucene and Solr index files

We are upgrading Sitecore 8 to 9.3 for that we upgraded Lucene to solr
Can we compare Lucene and Solr index files so that we will be able to know the newly generated solr index files have the same data or not
It seem technically possible as you could use Luke to explore the content of the Lucene index folder.
While Solr data can be queried via either Sitecore UI, or Solr admin.
No. The indexes are very different even though the underlying technology is similar. What I find best is to have an old and new version of the same site with the same data. Then you can compare site search pages and any part of the site that runs on search.

Migrate solr standalone index to solrcloud

There are indexes of some solr cores which I convert them from solr4 to solr6 but in solr standalone mode. so they don't have the "version" field that solrcolud require.
Here now I want to migrate to solrcloud 6 and I need to put them under cluster. Because the version field dose not exist there in these indexes when I put them Under a solrcloud leader core on the data directory the replicas in the shard didn't update as I saw. so I decided to read them by lucene, get each doc fields, add them to a solrdoc and then put them doc by doc in solrcloud. But cause there are fields that not stored in these indexes so all fields that exist here in these indexes don't move there.
At the end it seems there is no way for me than re-indexing.
I appreciate if there is any better idea or solutions that can help me migrate more easily.
If there is any chance to reindex, just do so, it's going to be the best in the end (you have to deal with two separate issues: a) migrate from 4.X to 6.0 and b)from standalone to SolrCloud...it's going to be messy).
If you cannot reindex:
are all your fields stored OR have docValues=true? If so, you can get the original contents of your docs. Read them and index them with solrj or with some script.
if not, and you have a version field: try to manually put the index in Solrcloud. Not straighforward, but possible.
if you don't have a version field, I think it is impossible to put the index as is in Solrcloud (although some post on the net make you think it is). You could try to write some lucene code to add version field to all docs (with values that make sense), but this should be the very last resort.

How to Insert a document to a specific shard in Apache Solr Cloud Mode

Is there a way we can add documents into a specific shard?
For example, documents type A will always get inserted into shard1 and document type B always go to shard2.
I have tried using custom router but it does not guaranty that different prefix will route to different shard.
PS. I am on Solr 5 using cloud mode.
A caveat: I'm using SolrNet to access SolrCloud, and it doesn't integrate with ZooKeeper yet. For Java clients, this might be far easier.
Despite what I read here and here with regard to the CompositeId Router, I could never get it to work. What #jay helped me figure out is a way to use "implicit" routing to achieve this. If you create your collection like this (leave out the numShards parameter):
http://localhost:8983/solr/admin/collections?action=CREATE&name=myCol&maxShardsPerNode=2&router.name=implicit&shards=shard1,shard2&router.field=shard
then add a field to your schema.xml named "shard" (matching the router.field parameter), you can index to a specific shard simply by adding the shard field to the document being indexed and specifying the shard name. At query time, you can specify shards to search -- more here (I was able to simply specify the shard name w/o a specific address).
I haven't tested this in production yet, but have verified using multiple VirtualBox instances, with ZooKeeper, HAProxy, and several Solr nodes, and it's doing exactly what I expected. Corrections and comments welcome.

Solr Upgrade from 3.4 to 4

In order to make use of pivot feature present on Solr 4, I upgraded from 3.4.
Shall I proceed with a full reindex of the content due this upgrade or are they compatible somehow?
And regarding my client-applications that are currently accessing my solr server 3.4, will they present problem after upgrade? (The preliminary test I did they are running, seems the xml schema returned in a query response didn't changed when you don't use new features)
You need to do a full reindex if you want to use the Solr 4 index structure. Else you need to change the Lucene version in solrconfig to use the old index.
The schema will need a new field called _version_ if you want to use the Real Time Get functionality.
Other then that most things are pretty much the same for the client.

Resources