We have the following DSE cluster setup:
DC Cassandra
    Cassandra node 1
DC Solr
    Solr node 1
    Solr node 2
    Solr node 3
    Solr node 4
We want to replace Solr node 1 with a more powerful machine. I'm under the impression that we need to follow the procedure for replacing a dead node which involves:
Adding the new node to the cluster
Allowing the cluster to automatically re-balance itself
Removing the old node via nodetool removenode
Running nodetool cleanup on each remaining node
However, my colleague resorted to copying everything (user files, system files, and the Cassandra/Solr data files) from the old machine to the new machine. Will this approach work? If yes, is there any additional step that we need to take? If not, how can we correct this? (i.e. do we simply delete the data files and restart the node as an empty node? Or will doing so lead to data loss?)
So your approach should work... here are some things to observe:
Make sure you shut down your C* on the node to replace.
Make it impossible to start C* on the old node by accident (move the jar files away for example, or at least temporarily move the /etc/init.d/dse script somewhere else)
Copy everything to the new machine
Shutdown the old machine (disconnect network if possible).
Make sure that the new machine has the same IP address as the old one, and that on its first boot it won't start C* (not a hard requirement, more a precaution in case the IP address doesn't match or something else is wrong with that box).
Double-check that everything is fine, re-enable C*, and restart the machine. Depending on how you copied that machine, I would be more concerned about OS system files in terms of stability. If you just copied the C* app and data files you should be fine.
Make sure you NEVER start the old machine with an active C*.
I haven't tried this, but there isn't anything I know of that would prevent it from working (now that I've said this, I am probably going to get dinged... but I DID ask one of our key engineers :-).
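For reference, a minimal sketch of the copy-based move described above, assuming a packaged DSE install managed via /etc/init.d/dse and data under /var/lib/cassandra (both paths are assumptions; adjust to your layout):

    # On the old node: stop DSE and make sure it cannot come back by accident
    sudo service dse stop
    sudo mv /etc/init.d/dse /etc/init.d/dse.disabled

    # Copy the Cassandra/Solr data and config to the new machine (which will take over the same IP)
    sudo rsync -a /var/lib/cassandra/ newhost:/var/lib/cassandra/
    sudo rsync -a /etc/dse/ newhost:/etc/dse/

    # On the new machine, once the IP has been taken over and everything checks out
    sudo service dse start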
The more "standard" procedure is this, which I will propose for our docs:
Replacing a running node
Replace a node with a new node, for example to update to newer hardware or for proactive maintenance.
You must prepare and start the replacement node, integrate it into the cluster, and then remove the old node.
Procedure
Confirm that the node is alive:
a) Run nodetool ring if not using vnodes.
b) Run nodetool status if using vnodes.
The nodetool command shows an Up/Normal (UN) status for the node.
Note the Host ID of the node to replace; it is used in the last step.
Add and start the replacement node as described in http://www.datastax.com/docs/1.1/cluster_management#adding-capacity-to-an-existing-cluster
Using the Host ID of the old node, remove it from the cluster using the nodetool removenode command. See http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_remove_node_t.html for detailed instructions.
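A rough sketch of those steps on the command line (the Host ID shown is a placeholder):

    # On any live node: list cluster state and note the Host ID of the node to replace
    nodetool status

    # After the replacement node has bootstrapped and joined the ring,
    # remove the old node by its Host ID
    nodetool removenode 12345678-90ab-cdef-1234-567890abcdef

    # Then reclaim space on each remaining node
    nodetool cleanup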
Related
We have some sites which use Solr as an internal search. This is done with the extension ext:solr from DKD. The extension includes an install script which provides cores for multiple languages.
This is working well on most systems.
Meanwhile we have some bigger sites, and because of some special requirements we run into problems:
We have sites which import data on a regular basis from outside of TYPO3. To get the Solr index up to date we need to rebuild the complete index (at night). But as the site gets bigger, the reindex takes longer and longer. And if an error occurs, the index is broken the next day.
You could say: no problem, just refresh all records. But that would leave information in the index for records which have been deleted in the meantime (there is no 'delete' information in the import, except that a deleted record is no longer in it). So a complete delete of all records before the import (or special marking and explicit deletion afterwards) is necessary.
Anyway, the reindex takes very long and can't be triggered at just any time. And an error leaves the index incomplete.
In theory there is the option to work with two indices: one which is built up anew while the other one is used for search requests. In this way you always have a complete index, even if it might not be fully up to date. After the new index is built, you can swap the indices and rebuild the older one.
That needs to be triggered from inside of TYPO3, but I have not found anything about such a configuration.
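In case it helps to make the idea concrete, the swap itself could presumably be triggered via Solr's CoreAdmin API, roughly like this (the core names live and rebuild are just placeholders); the open question remains how to drive this from inside TYPO3:

    # Build the new index into the 'rebuild' core, then atomically swap it with 'live'
    curl "http://localhost:8983/solr/admin/cores?action=SWAP&core=live&other=rebuild"

    # Afterwards the old index sits in 'rebuild' and can be cleared before the next run
    curl "http://localhost:8983/solr/rebuild/update?commit=true" \
         -H "Content-Type: text/xml" --data-binary "<delete><query>*:*</query></delete>"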
Another theoretical option might be a master-slave configuration, but as far as I can tell:
when the master's index is reset for the rebuild, this reset would be synchronized to the slave, which then loses all the information it should provide until the rebuild is complete.
(I think the problem is independent of a specific TYPO3 or solr version, so no version tag)
Do you know about our read and write concept introduced in EXT:Solr 9? https://docs.typo3.org/p/apache-solr-for-typo3/solr/11.0/en-us/Releases/solr-release-9-0.html#support-to-differ-between-read-and-write-connections
Isn't it something for your case?
The only thing you need to do is set it up properly in your deployment.
If your fresh index is finalized and not broken, you just switch the read core to point at the previous write core.
Sorry, my English is not good.
The job node can delete old checkpoints, but the task nodes can't.
I have already set the retain-num setting.
I also tried using an NFS mount, but it did not work.
What should I do?
I have a SolrCloud cluster with 2 nodes. It has 2 replicas, one on each node, with a single shard.
The cores created are {collection_name}_shard1_replica1 and {collection_name}_shard1_replica2.
When I perform a collection backup and restore it into a new collection, the documents are indexed properly on both nodes. However, the cores created are named differently: {collection_name}_shard1_replica0 and {collection_name}_shard1_replica1.
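Roughly, the backup and restore were done with the Collections API along these lines (the backup name, location, collection names, and host below are placeholders, not my actual values):

    # Back up the existing collection to a location reachable by all nodes
    curl "http://node1:8983/solr/admin/collections?action=BACKUP&name=mybackup&collection=old_collection&location=/mnt/backups"

    # Restore it into a new collection
    curl "http://node1:8983/solr/admin/collections?action=RESTORE&name=mybackup&location=/mnt/backups&collection=new_collection"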
Additionally, when I delete or add documents, the change is only applied on one node, which means replication does not work. I also noticed that the index folder is missing on the node where documents are not getting deleted or added.
What could I possibly be doing wrong?
For all those interested in the solution: a sequential restart of all the nodes helped (I am still not able to digest why this was required and why it is missing from the documentation).
We have a Cassandra cluster of 4 nodes, and it was working perfectly. After 2 of the nodes got restarted (they are LXCs on the same machine), those 2 nodes are not able to join the cluster and fail with the error:
ERROR [MigrationStage:1] 2014-07-06 20:34:36,994 MigrationTask.java (line 55) Can't send migration request: node /X.X.X.93 is down.
The two nodes that were not restarted show the restarted ones as DN in nodetool status, while the restarted nodes show the others as UN.
I've checked the gossipinfo and that is fine.
Can anybody help me on this?
I suppose you have cross_node_timeout set to true and the time between your servers is not in sync. You might want to check your NTP settings.
The new nodes might be dropping the requests for data that they get from the older nodes as timed out. Hence NTP should be configured on all of the Cassandra nodes.
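A quick way to check both suspects on each node (a hedged sketch; the cassandra.yaml path depends on your install, and timedatectl is only available on systemd machines):

    # Is the cross-node timeout enabled?
    grep cross_node_timeout /etc/cassandra/cassandra.yaml

    # Is the clock actually in sync on this node?
    ntpq -p        # peer list and offsets, if ntpd is running
    timedatectl    # reports whether the system clock is NTP-synchronized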
I set up a replication master and a replication slave for Solr and it doesn't do anything. My suspicion is the generation number: the master has 232 while the slave has 241. If somebody can confirm my suspicion, that would be great, and I would also like to know how to resolve this issue if so.
I can't confirm it but I can deny it. :)
From this post, here is how it works:
It looks at the index version AND the index generation. If both the slave's version and generation are the same as on the master, nothing gets replicated. If the master's generation is greater than the slave's, the slave fetches the delta files only (even if a partial merge was done on the master) and puts the new files from the master into the same index folder on the slave (either index or index.<timestamp>, see further explanation). However, if the master's index generation is equal to or less than the slave's, the slave does a full replication by fetching all files of the master's index and placing them into a separate folder on the slave (index.<timestamp>). Then, if the fetch is successful, the slave updates (or creates) the index.properties file and puts there the name of the "current" index folder. The "old" index.<timestamp> folder(s) will be kept in 1.4.x - which was treated as a bug - see SOLR-2156 (this was fixed in 3.1). After this, the slave does a commit or reloads the core depending on whether the config files were replicated. There is another bug in 1.4.x that fails replication if the slave needs to do a full replication AND the config files were changed - also fixed in 3.1 (see SOLR-1983).
Plus, there is another bug still open about the index generation id:
http://lucene.472066.n3.nabble.com/replication-problems-with-solr4-1-td4039647.html
Now, to answer your question in one line: replication always occurs (full or partial) if the version (or generation) numbers differ between master and slave.
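If you want to see the numbers the replication logic is comparing, you can query the ReplicationHandler on both sides; a small sketch with placeholder host and core names:

    # Current index version and generation on the master
    curl "http://master:8983/solr/mycore/replication?command=indexversion&wt=json"

    # Replication status on the slave, including what it last fetched from the master
    curl "http://slave:8983/solr/mycore/replication?command=details&wt=json"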