I have a SolrCloud cluster with 2 nodes. It has a single shard with 2 replicas, one on each node.
The cores created are {collection_name}_shard1_replica1 and {collection_name}_shard1_replica2.
When I perform a collection backup and restore it into a new collection, the documents are indexed properly on both nodes. However, the cores created are named differently: {collection_name}_shard1_replica0 and {collection_name}_shard1_replica1.
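For reference, a collection backup and restore of this kind is typically done with the Collections API BACKUP/RESTORE actions (Solr 6.1 and later), roughly like this; the backup name and location below are placeholders for the actual values:

curl 'http://localhost:8983/solr/admin/collections?action=BACKUP&name=mybackup&collection=mycollection&location=/backups'
curl 'http://localhost:8983/solr/admin/collections?action=RESTORE&name=mybackup&collection=mycollection_restored&location=/backups'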
Additionally, when I delete or add documents, the change is applied on only one node, which means replication does not work. I also noticed that the index folder is missing on the node where documents are not getting deleted or added.
What could I be possibly doing wrong?
For all those interested in the solution: a sequential restart of all the nodes helped (I am still not able to digest why it was required and why it is missing from the documentation).
We have some sites which use Solr as an internal search. This is done with the extension ext:solr from DKD. Within the extension there is an install script which provides cores for multiple languages.
This is working well on most systems.
Meanwhile we have some bigger sites, and because of some special requirements we run into problems:
We have sites which import data on a regular basis from outside of TYPO3. To keep the Solr index up to date we need to rebuild the complete index (at night). But as the site gets bigger, the reindex takes longer and longer, and if an error occurs, the index is broken the next day.
You could say: no problem, just refresh all records. But that would leave information in the index for records which have been deleted in the meantime (there is no 'delete' information in the import, except that a deleted record is no longer present in it). So a complete delete of all records before the import (or special marking and explicit deletion afterwards) is necessary.
Anyway, the reindex takes very long and can't be triggered at just any time, and an error leaves the index incomplete.
In theory there is the option to work with two indices: one which is built up anew while the other one is used for search requests. In this way you always have a complete index, even if it might not be fully up to date. After the new index is built, you can swap the indices and rebuild the older one.
That needs to be triggered from inside of TYPO3, but I have not found anything about such a configuration.
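For plain Solr by itself, such a swap could presumably be done with the CoreAdmin SWAP action, e.g. (core names are placeholders):

curl 'http://localhost:8983/solr/admin/cores?action=SWAP&core=index_live&other=index_rebuild'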
Another theoretical option might be a master-slave configuration, but as far as I can tell:
when the index of the master is reset in order to rebuild it, this reset would be synchronized to the slave, which then loses all the information it should provide until the rebuild is complete.
(I think the problem is independent of a specific TYPO3 or solr version, so no version tag)
Do you know about our read and write concept introduced in EXT:Solr 9? https://docs.typo3.org/p/apache-solr-for-typo3/solr/11.0/en-us/Releases/solr-release-9-0.html#support-to-differ-between-read-and-write-connections
Isn't that something for your case?
The only thing you need to do is set it up properly in your deployment.
If your fresh index is finalized, fine, and not broken, you just switch the read connection to point to the core that was previously written to.
I am running Solr version 4. I have created a million fields in Solr using a script. I saw GC activity go very high after adding these fields, because every time a searcher is opened, these fields are loaded.
Now I want to go back to the state my Solr cluster was in before adding those fields. Even though I delete the documents which have those fields, the cluster is not coming back to what it was, as the fields are not getting deleted from the fieldsInfo file.
Is there a way we can explicitly tell Solr to delete the fields from the fieldsInfo file?
There is a documented Schema API that can delete a field. However, I don't know whether it is already available for Solr 4; you should try whether it works.
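If your version already supports the modify operations, deleting a field would look roughly like this (a managed schema is required; the collection and field names are placeholders):

curl -X POST -H 'Content-type:application/json' --data-binary '{"delete-field":{"name":"myfield"}}' http://localhost:8983/solr/mycollection/schema

Note that this removes the field from the schema; data already written for it in existing segments is only cleaned up as segments are merged or the index is optimized.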
I remove a document in CouchDB by setting the _deleted attribute to true (PUT method). The last revision of the document is deleted, but the previous revision is still available.
And when I pull documents of a specific type from the database, this document is still available.
How should I delete a document so that it is no longer available?
I use synchronization between CouchDB on the server and PouchDB instances on mobile applications (Ionic).
You need to compact your database. Compaction is the process of removing unused and old data from database or view index files, not unlike VACUUM in an RDBMS. It can be triggered by calling the _compact endpoint of a database, e.g. curl -X POST http://192.168.99.100:5984/koi/_compact -H 'Content-Type: application/json'. After that, attempts to access previous revisions of a deleted document should return a 404 error with the reason "missing".
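For example (the document id and revision are placeholders):

curl http://192.168.99.100:5984/koi/mydoc?rev=1-abcdef
# -> {"error":"not_found","reason":"missing"}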
Note that the document itself is not going to disappear completely; something called a "tombstone" will be left behind. The reason is that CouchDB needs to track deleted documents during replication to prevent accidental document recovery.
We have the following DSE cluster setup:
DC Cassandra
Cassandra node 1
DC Solr
Solr node 1
Solr node 2
Solr node 3
Solr node 4
We want to replace Solr node 1 with a more powerful machine. I'm under the impression that we need to follow the procedure for replacing a dead node, which involves (a rough command sketch follows the list):
Adding the new node to the cluster
Allowing the cluster to automatically re-balance itself
Removing the old node via nodetool removenode
Running nodetool cleanup in each remaining node
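Roughly, the last two steps come down to the following nodetool calls (the Host ID is a placeholder taken from the nodetool status output):

nodetool removenode <host-id-of-old-node>   # run from any live node
nodetool cleanup                            # run on each remaining node afterwards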
However, my colleague resorted to copying everything (user files, system files, and the Cassandra/Solr data files) from the old machine to the new machine. Will this approach work? If yes, is there any additional step that we need to do? If not, how can we correct this? (i.e. do we simply delete the data files and restart the node as an empty node? Or will doing so lead to data loss?)
So your approach should work... here are some things to observe
Make sure you shut down your C* on the node to replace.
Make it impossible to start C* on the old node by accident (move the jar files away for example, or at least temporarily move the /etc/init.d/dse script somewhere else)
Copy everything to the new machine
Shutdown the old machine (disconnect network if possible).
Make sure that the new machine has the same ip address as the old one, and that for the first boot it's not gonna start C* (not a real requirement, but more a precaution in case the ip address doesn't match, or there is something else wrong with that box).
Double-check everything is fine, re-enable C*, and restart the machine. Depending on how you copied that machine I would be more concerned with OS system files in terms of stability. If you just copied the C* app and data files you should be fine.
Make sure you NEVER start the old machine with an active C*.
I haven't tried this, but there isn't anything I know of that would prevent this from working (now that I've said this, I am probably gonna get dinged... but I DID ask one of our key engineers :-).
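If you only copy the C* app and data files rather than a full OS image, that copy step might look roughly like this; the hostnames and paths assume the default DSE package locations for data and configuration:

rsync -a /var/lib/cassandra/ newnode:/var/lib/cassandra/
rsync -a /etc/dse/ newnode:/etc/dse/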
The more "standard" procedure is this, which I will propose for our docs:
Replacing a running node
Replace a node with a new node, for example to update to newer hardware or for proactive maintenance.
You must prepare and start the replacement node, integrate it into the cluster, and then remove the old node.
Procedure
Confirm that the node is alive:
a) Run nodetool ring if not using vnodes.
b) Run nodetool status if using vnodes.
The nodetool command shows an Up/Normal status (UN) for the node.
Note the Host ID of the node to replace; it is used in the last step.
Add and start the replacement node as described in http://www.datastax.com/docs/1.1/cluster_management#adding-capacity-to-an-existing-cluster
Using the Host ID of the original old node, remove the old node from the cluster using the nodetool removenode command. See http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_remove_node_t.html for detailed instructions.
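On the command line, the procedure above comes down to something like this (the Host ID placeholder comes from the nodetool status output):

nodetool status                             # confirm the node is Up/Normal (UN) and note its Host ID
nodetool removenode <host-id-of-old-node>   # run once the replacement node has joined the cluster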
We have started migrating our current search from master/slave to SolrCloud. I have a couple of questions related to expanding the nodes dynamically. Please help.
What is the best way to migrate an existing shard to a new node? Is it just creating a core on the new node manually, as below, or is there another way?
localhost:8888/solr/admin/cores?action=CREATE&name=testcollection_shard1_replica1&collection=testcollection&shard=shard1&collection.configName=collection1
How do I create a new replica dynamically? Is it just creating a new core as below, or is there another way?
localhost:8888/solr/admin/cores?action=CREATE&name=testcollection_shard1_replica2&collection=testcollection&shard=shard1&collection.configName=collection1
How do I add a brand new shard to the collection dynamically? Is it just creating a new core with a new shard name on a new node, as below? Will documents be distributed automatically to the newly created shard? Or is this not the right way, and should we use shard splitting instead?
localhost:8888/solr/admin/cores?action=CREATE&name=testcollection_shard2_replica1&collection=testcollection&shard=shard2&collection.configName=collection1
Thank you so much for help!!
-Umesh
To move an existing shard to a new node, just add a new replica on the new node and wait until replication is complete. After that, you can shut down the old node or remove the old replica from the cluster using the UNLOAD command.
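If your version has the ADDREPLICA collection action, that is roughly (host names are placeholders; the collection, shard, and core names reuse the ones from your question):

curl 'http://localhost:8888/solr/admin/collections?action=ADDREPLICA&collection=testcollection&shard=shard1&node=newhost:8888_solr'
curl 'http://oldhost:8888/solr/admin/cores?action=UNLOAD&core=testcollection_shard1_replica1'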
To create a new replica dynamically, the collection command you have mentioned is the only way.
To create a new shard, the only thing you can do is split an existing shard. Just mind that your collection will then no longer be balanced: the split shard will have its hash range divided, but all other, unsplit shards still have the same hash ranges as before.
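For example (the collection and shard names reuse the ones from your question):

curl 'http://localhost:8888/solr/admin/collections?action=SPLITSHARD&collection=testcollection&shard=shard1'

This produces two sub-shards (shard1_0 and shard1_1) that each cover half of shard1's original hash range.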