I've accidentally cancelled the Solr index build on one of my Search nodes. How do I restart the indexing on that node?
nodetool rebuild_index doesn't work. The command exits almost immediately, probably because it is meant for native Cassandra indexes, whereas my table's indexes are of the custom type "com.datastax.bdp.search.solr.Cql3SolrSecondaryIndex".
Clicking the "Reindex"/"Full reindex" button in the Solr core admin UI, on the other hand, triggers re-indexing of the whole column family across all Search nodes.
Is there a way to trigger the indexing in that node only? I'm using DSE 4.0.1 (Cassandra 2.0.5, Solr 4.6.0.1)
In order to reindex a single node, you have to reload its core with the reindex=true and distributed=false parameters, as explained in: http://www.datastax.com/documentation/datastax_enterprise/4.0/datastax_enterprise/srch/srchReldCore.html
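For reference, on DSE that reload is a plain core admin call against the node you want to reindex; something like the following (host and the keyspace.table core name are placeholders for your own setup, and deleteAll=false keeps serving the existing index while it is rebuilt):

curl "http://localhost:8983/solr/admin/cores?action=RELOAD&name=mykeyspace.mytable&distributed=false&reindex=true&deleteAll=false"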
Related
We are upgrading Sitecore 8 to 9.3, and as part of that we are migrating from Lucene to Solr.
Can we compare the Lucene and Solr index files, so that we can verify whether the newly generated Solr index files contain the same data?
It seems technically possible, as you could use Luke to explore the contents of the Lucene index folder. The Solr data, meanwhile, can be queried via either the Sitecore UI or the Solr admin console.
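If it helps, Solr ships a Luke-style inspection endpoint of its own (the LukeRequestHandler), so you can dump the field list and top terms from the new index and compare them by hand against what Luke shows for the Lucene folder. A sketch, assuming a core named sitecore_web_index:

curl "http://localhost:8983/solr/sitecore_web_index/admin/luke?numTerms=10&wt=json"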
No. The indexes are very different even though the underlying technology is similar. What I find best is to have an old and new version of the same site with the same data. Then you can compare site search pages and any part of the site that runs on search.
I am using Solr v7.7.1 in cloud mode. I am facing an issue related to optimistic concurrency:
I have a nested document which can be updated concurrently multiple times before committing the updates. During indexing, we fetch the document we want to modify along with its _version_, modify it, and then send it back to Solr with the same _version_. If the update happens more than once before committing, the following error is thrown:
Caused by: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://1.2.3.4:8983/solr/mcollection_shard1_replica_n2: version conflict for 1111 expected=1645085633861910528 actual=1645090791527284737
In the above error, we are basically trying to index a document with id 1111 before a previous version of the document was indexed and committed. The solution to this problem is simply to commit all the updates and then try indexing the new document again. However, Solr gives the same error with the same version codes even after committing. What could possibly be the issue?
A strange observation is that this problem does not occur when Solr is not running in cloud mode.
This seems to be a very specific issue with Solr when using nested documents.
When a document is indexed with _version_ specified, Solr checks the version of the latest existing document by doing a real-time get. The real-time get reads the data from the update logs (which means that data not yet open for search is also accessible). For this, Solr does something like the following:
http://1.2.3.4:8983/solr/mcollection/get?id=1111
Now if you have two nested documents where, in one document (doc1), the parent has id=1111 and, in the other document (doc2), a child has id=1111, Solr may check the version of doc2 when you intended to index doc1. This is because Solr still indexes all documents in a flat structure and doesn't consider the parent-child relationship when doing a real-time get.
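To illustrate, a pair of block updates like the following collide, because the flat index ends up holding two documents whose id is 1111 (field names here are only illustrative):

doc1: {"id": "1111", "type": "parent", "_childDocuments_": [{"id": "2222", "type": "child"}]}
doc2: {"id": "3333", "type": "parent", "_childDocuments_": [{"id": "1111", "type": "child"}]}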
The solution to this is to make the ids of parent and child documents different from each other.
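With distinct ids in place, the usual optimistic-concurrency cycle behaves as expected. A minimal sketch with curl (host, collection, and the extra field name are assumptions):

# 1. Real-time get: fetch the latest _version_ of the target document
curl "http://localhost:8983/solr/mcollection/get?id=1111&fl=id,_version_"

# 2. Resend the update carrying that _version_; Solr rejects it with an
#    HTTP 409 version conflict if another update landed in the meantime
curl -X POST "http://localhost:8983/solr/mcollection/update?commit=true" \
  -H "Content-Type: application/json" \
  -d '[{"id": "1111", "somefield": "new value", "_version_": 1645085633861910528}]'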
The bug has been reported: https://issues.apache.org/jira/browse/SOLR-13785
I have the indexes of some Solr cores which I converted from Solr 4 to Solr 6, but in Solr standalone mode, so they don't have the _version_ field that SolrCloud requires.
Now I want to migrate to SolrCloud 6, and I need to put these indexes in a cluster. Because the _version_ field does not exist in these indexes, when I placed them under a SolrCloud leader core's data directory, the replicas in the shard did not update, as I observed. So I decided to read them with Lucene, get each document's fields, add them to a SolrInputDocument, and insert them doc by doc into SolrCloud. But because some fields in these indexes are not stored, not all of the fields present in these indexes can be moved over.
In the end, it seems there is no way for me other than re-indexing.
I would appreciate any better idea or solution that could help me migrate more easily.
If there is any chance to reindex, just do so; it's going to be the best option in the end (you have to deal with two separate issues: a) migrating from 4.x to 6.0 and b) migrating from standalone to SolrCloud... it's going to be messy).
If you cannot reindex:
Are all your fields stored OR do they have docValues=true? If so, you can retrieve the original contents of your docs. Read them and index them with SolrJ or with some script (a minimal sketch follows this list).
If not, and you have a _version_ field: try to manually put the index into SolrCloud. Not straightforward, but possible.
If you don't have a _version_ field, I think it is impossible to put the index into SolrCloud as-is (although some posts on the net make it sound possible). You could try to write some Lucene code to add a _version_ field to all docs (with values that make sense), but this should be the very last resort.
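For the first case, the copy could be as simple as paging through the old core with a cursor and re-posting the docs to the new collection; a rough sketch (hosts and core/collection names are assumptions, and you would loop until nextCursorMark stops changing, dropping any _version_ values before re-posting):

# Page through the old standalone core (cursorMark needs a sort on the uniqueKey)
curl "http://oldhost:8983/solr/oldcore/select?q=*:*&rows=1000&sort=id+asc&cursorMark=*&wt=json" > page1.json

# Extract the docs array from the response into docs.json, then post it
# to the new SolrCloud collection
curl -X POST "http://newhost:8983/solr/newcollection/update?commit=true" \
  -H "Content-Type: application/json" -d @docs.json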
Is there a way we can add documents into a specific shard?
For example, documents of type A always get inserted into shard1, and documents of type B always go to shard2.
I have tried using a custom router, but it does not guarantee that different prefixes will route to different shards.
PS. I am on Solr 5 using cloud mode.
A caveat: I'm using SolrNet to access SolrCloud, and it doesn't integrate with ZooKeeper yet. For Java clients, this might be far easier.
Despite what I read here and here with regard to the CompositeId Router, I could never get it to work. What #jay helped me figure out is a way to use "implicit" routing to achieve this. If you create your collection like this (leave out the numShards parameter):
http://localhost:8983/solr/admin/collections?action=CREATE&name=myCol&maxShardsPerNode=2&router.name=implicit&shards=shard1,shard2&router.field=shard
and then add a field to your schema.xml named "shard" (matching the router.field parameter), then you can index to a specific shard simply by adding the shard field to the document being indexed and specifying the shard name. At query time, you can specify the shards to search -- more here (I was able to simply specify the shard name without a specific address).
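In other words, the shard field is just an ordinary stored string field in schema.xml, e.g. <field name="shard" type="string" indexed="true" stored="true"/>, and routing happens per document. A sketch against the collection created above (the document fields are invented for the example):

# Send a type-A document straight to shard1
curl -X POST "http://localhost:8983/solr/myCol/update?commit=true" \
  -H "Content-Type: application/json" \
  -d '[{"id": "a-1", "type": "A", "shard": "shard1"}]'

# Query only shard1 by name
curl "http://localhost:8983/solr/myCol/select?q=*:*&shards=shard1"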
I haven't tested this in production yet, but I have verified it using multiple VirtualBox instances, with ZooKeeper, HAProxy, and several Solr nodes, and it's doing exactly what I expected. Corrections and comments welcome.
I saw from the FAQ that a DSE node can be reprovisioned from RT mode to Hadoop mode. Is something similar supported with DSE Search and DSE Spark? I have an existing 6-node DSE Search cluster. I want to test DSE Spark but I have very limited time left for development so if possible, I'd like to skip the bootstrap process by simply restarting my cluster as an Analytics DC instead of adding new nodes in a separate DC.
UPDATE:
I tried to find an answer on my own. These are the closest that I found:
http://www.datastax.com/wp-content/uploads/2012/03/WP-DataStax-WhatsNewDSE2.pdf
http://www.datastax.com/doc-source/pdf/dse20.pdf
These documents are for a very old release of DSE. Both documents say that only RT and Analytics nodes can be re-provisioned. The second document even explicitly says that a Solr node cannot be re-provisioned. Unfortunately, there is no mention of re-provisioning in more recent documentation.
Can anybody confirm whether this is still true with DSE 4.5.1? (preferably with a link to a reference)
I also saw this forum thread, which explains why the section about re-provisioning was removed from recent documentation. However, in my case, I plan to re-provision all of my Search nodes as Analytics nodes (in contrast to re-provisioning only a subset), and the re-provisioning would only be temporary.
Yes, you can do that. Just start the node using 'dse cassandra -k'.
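For a tarball install, the per-node cycle might look like the sketch below; for package installs, the equivalent is setting SPARK_ENABLED=1 in /etc/default/dse and restarting the service (treat both as a sketch for DSE 4.5 and check the docs for your exact version):

nodetool drain           # flush memtables before stopping the node
bin/dse cassandra-stop   # stop the running DSE process
bin/dse cassandra -k     # restart the node with Spark (Analytics) enabled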