I have indexes from some Solr cores that I converted from Solr 4 to Solr 6, but in standalone mode, so they don't have the "_version_" field that SolrCloud requires.
Now I want to migrate to SolrCloud 6 and put them under a cluster. Because the _version_ field does not exist in these indexes, when I put them in the data directory of a SolrCloud leader core, the replicas in the shard did not update, as far as I could see. So I decided to read them with Lucene, get each doc's fields, add them to a Solr document, and then add the documents one by one to SolrCloud. But because some fields are not stored in these indexes, not all of the fields in these indexes get moved over.
In the end, it seems there is no way for me other than re-indexing.
I would appreciate any better ideas or solutions that could help me migrate more easily.
If there is any chance to reindex, just do so; it's going to be the best option in the end. You have to deal with two separate issues: a) migrating from 4.x to 6.0 and b) moving from standalone to SolrCloud, so it's going to be messy.
If you cannot reindex:
Are all your fields stored OR do they have docValues=true? If so, you can get the original contents of your docs. Read them and index them with SolrJ or with some script.
If not, and you have a _version_ field: try to manually put the index in SolrCloud. Not straightforward, but possible.
If you don't have a _version_ field, I think it is impossible to put the index as is in SolrCloud (although some posts on the net might make you think it is possible). You could try to write some Lucene code to add a _version_ field to all docs (with values that make sense), but this should be the very last resort.
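For the case where every field is stored or has docValues, the "read them and index them with some script" option could look roughly like the sketch below. It pages through the old core with cursorMark and posts each page to the new collection as JSON. The host names, core and collection names, and the unique key `id` are assumptions for illustration, not values from the question.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoints and unique key; adjust to your setup.
SOURCE_SELECT = "http://localhost:8983/solr/old_core/select"
TARGET_UPDATE = "http://localhost:8984/solr/new_collection/update"
UNIQUE_KEY = "id"


def select_params(cursor_mark, rows=500):
    """Query parameters for one page of a cursorMark deep-paging scan."""
    return {
        "q": "*:*",
        "fl": "*",
        "sort": UNIQUE_KEY + " asc",  # cursorMark requires a sort on the unique key
        "rows": rows,
        "cursorMark": cursor_mark,
        "wt": "json",
    }


def clean(doc):
    """Drop _version_ (if present) so the target collection assigns its own."""
    return {k: v for k, v in doc.items() if k != "_version_"}


def http_fetch_page(cursor_mark):
    """Fetch one page of results from the source core."""
    url = SOURCE_SELECT + "?" + urllib.parse.urlencode(select_params(cursor_mark))
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def http_send_batch(docs):
    """Post a list of documents to the target collection as a JSON array."""
    req = urllib.request.Request(
        TARGET_UPDATE,
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req).close()


def reindex(fetch_page=http_fetch_page, send_batch=http_send_batch):
    """Copy every document from source to target; returns the number copied."""
    cursor, copied = "*", 0
    while True:
        page = fetch_page(cursor)
        docs = [clean(d) for d in page["response"]["docs"]]
        if docs:
            send_batch(docs)
            copied += len(docs)
        next_cursor = page["nextCursorMark"]
        if next_cursor == cursor:  # an unchanged cursor means the scan is finished
            return copied
        cursor = next_cursor
```

Calling `reindex()` runs the copy; a commit on the target is still needed afterwards. Stored fields that are copyField destinations would probably need to be excluded from `fl`, otherwise their values get copied a second time on the target.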
We are upgrading Sitecore 8 to 9.3, and as part of that we moved from Lucene to Solr.
Can we compare the Lucene and Solr index files, so that we can tell whether the newly generated Solr index files contain the same data?
It seems technically possible, as you could use Luke to explore the content of the Lucene index folder, while Solr data can be queried via either the Sitecore UI or the Solr admin.
No. The indexes are very different even though the underlying technology is similar. What I find best is to have an old and new version of the same site with the same data. Then you can compare site search pages and any part of the site that runs on search.
ElasticSearch has percolator for prospective search. Does SOLR have a similar feature where you define your query upfront? If not, is there an effective way of implementing this myself on top of the existing SOLR features?
Besides what BunkerMentality said, it is not hard to build your own percolator. What you need:
Are the queries you want to run easy to model in Lucene-only syntax? If so, you are good; if not, you need to convert them to Lucene-only syntax. Build them, and keep them in memory as Lucene queries.
When a doc arrives:
build a MemoryIndex containing only that single doc
run all your queries on the index
I have done this for a system ingesting millions of docs a day and it worked fine.
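The control flow of that approach can be illustrated with a toy sketch in plain Python. This is not Lucene: each registered query is reduced to a set of required terms (a stand-in for a parsed Lucene query), and the per-document term set stands in for a single-document MemoryIndex. The class and method names are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase word tokens; a stand-in for a real Lucene analyzer."""
    return set(re.findall(r"\w+", text.lower()))


class ToyPercolator:
    """Keeps queries in memory and matches each incoming doc against all of them."""

    def __init__(self):
        self._queries = {}

    def register(self, query_id, required_terms):
        """Store a query up front; it matches docs containing all given terms."""
        self._queries[query_id] = tokenize(" ".join(required_terms))

    def percolate(self, doc_text):
        """Return the ids of all registered queries that match this one doc."""
        doc_terms = tokenize(doc_text)  # the single-document "index"
        return sorted(
            qid for qid, terms in self._queries.items() if terms <= doc_terms
        )
```

With real Lucene, the registered values would be Query objects and the match step would be a search against the MemoryIndex built for the incoming document; the outer loop stays the same.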
It's listed as an open new feature, SOLR-4587, on Solr JIRA but it doesn't seem like any work has started on it yet.
There is a link in the comments there to a separate project called Luwak that seems to implement some features similar to percolator.
If it is still relevant, you can use this
It's a Solr Update Processor that is based on Luwak.
Is there a way we can add documents into a specific shard?
For example, documents of type A will always get inserted into shard1 and documents of type B will always go to shard2.
I have tried using a custom router, but it does not guarantee that different prefixes will route to different shards.
PS. I am on Solr 5 using cloud mode.
A caveat: I'm using SolrNet to access SolrCloud, and it doesn't integrate with ZooKeeper yet. For Java clients, this might be far easier.
Despite what I read here and here with regard to the CompositeId Router, I could never get it to work. What #jay helped me figure out is a way to use "implicit" routing to achieve this. If you create your collection like this (leave out the numShards parameter):
http://localhost:8983/solr/admin/collections?action=CREATE&name=myCol&maxShardsPerNode=2&router.name=implicit&shards=shard1,shard2&router.field=shard
then add a field to your schema.xml named "shard" (matching the router.field parameter), you can index to a specific shard simply by adding the shard field to the document being indexed and specifying the shard name. At query time, you can specify shards to search -- more here (I was able to simply specify the shard name w/o a specific address).
I haven't tested this in production yet, but have verified using multiple VirtualBox instances, with ZooKeeper, HAProxy, and several Solr nodes, and it's doing exactly what I expected. Corrections and comments welcome.
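As a rough sketch of the three steps described above (create the collection with the implicit router, tag each document with its shard, restrict a query to given shards), the helpers below only build the URLs and the document; they do not send anything. The function names are hypothetical, and the parameter values are the ones from the URL in this answer.

```python
import urllib.parse

SOLR = "http://localhost:8983/solr"  # same host as in the answer


def create_collection_url(name, shards, router_field="shard"):
    """Collections API CREATE call using the implicit router (no numShards)."""
    params = {
        "action": "CREATE",
        "name": name,
        "maxShardsPerNode": 2,
        "router.name": "implicit",
        "shards": ",".join(shards),
        "router.field": router_field,
    }
    return SOLR + "/admin/collections?" + urllib.parse.urlencode(params, safe=",")


def routed_doc(doc, shard, router_field="shard"):
    """Copy of doc with the routing field set to the target shard name."""
    routed = dict(doc)
    routed[router_field] = shard
    return routed


def shard_query_url(collection, q, shards):
    """Select URL restricted to the named shards via the shards parameter."""
    params = {"q": q, "shards": ",".join(shards), "wt": "json"}
    return (
        SOLR + "/" + collection + "/select?"
        + urllib.parse.urlencode(params, safe=",:*")
    )
```

For the example in the question, type A documents would be passed through `routed_doc(doc, "shard1")` and type B documents through `routed_doc(doc, "shard2")` before being sent to the update handler.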
I am relatively new to Apache Solr and have recently been working with DIH, specifically the XPathEntityProcessor. I need a way to periodically index new XML files; however, it appears the delta-import command is only supported by the SqlEntityProcessor [1].
I am working with an increasingly large dataset of XML files and was hoping solr could determine new files and index them...
A potential solution that came to mind is to do a full-import from a staging area consisting of documents that have not been previously indexed, before moving the documents to their respective permanent locations.
Is there a workaround to mimic delta-import using XPathEntityProcessor?
What sort of approaches do people using XPathEntityProcessor use to index newer documents?
[1] http://wiki.apache.org/solr/DataImportHandler#Using_delta-import_command-1
I've resorted to using the UpdateRequestHandler; it's perfect for what I want to do.
[1] http://wiki.apache.org/solr/XsltUpdateRequestHandler
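One way to approximate delta-import with this approach is to remember when the last run happened and post only the XML files modified since then. The sketch below is an assumption about how that could be wired up, not the setup actually used here: the core name `mycore`, the stylesheet name `updateXml.xsl`, and the `tr` parameter on `/update` should be checked against your Solr version and configuration.

```python
import os
import urllib.request

# Hypothetical update URL; "tr" names an XSLT stylesheet in the core's config.
UPDATE_URL = "http://localhost:8983/solr/mycore/update?tr=updateXml.xsl"


def files_newer_than(root, last_run):
    """Return XML files under root modified after last_run (epoch seconds)."""
    found = []
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            if name.endswith(".xml") and os.path.getmtime(path) > last_run:
                found.append(path)
    return sorted(found)


def build_update_request(path):
    """Build (but do not send) a POST of one XML file to the update handler."""
    with open(path, "rb") as fh:
        body = fh.read()
    return urllib.request.Request(
        UPDATE_URL, data=body, headers={"Content-Type": "text/xml"}
    )
```

A periodic job would call `files_newer_than(root, last_run)`, send each request with `urllib.request.urlopen`, issue a commit, and then store the new run time for the next pass.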
In order to make use of the pivot feature present in Solr 4, I upgraded from 3.4.
Should I proceed with a full reindex of the content because of this upgrade, or are the indexes compatible somehow?
And regarding my client applications that currently access my Solr 3.4 server, will they have problems after the upgrade? (In the preliminary tests I did they are running; it seems the XML schema returned in a query response doesn't change when you don't use new features.)
You need to do a full reindex if you want to use the Solr 4 index structure. Else you need to change the Lucene version in solrconfig to use the old index.
The schema will need a new field called _version_ if you want to use the Real Time Get functionality.
Other than that, most things are pretty much the same for the client.