New Solr node in "Active - Joining" state for several days

We are trying to add a new Solr node to our cluster:
DC Cassandra
Cassandra node 1
DC Solr
Solr node 1 <-- new node (actually, a replacement for an old node; we followed the steps for "replacing a dead node")
Solr node 2
Solr node 3
Solr node 4
Solr node 5
Our Cassandra data is approximately 962 GB. The replication factor is 1 for both DCs. Is it normal for the new node to be in the "Active - Joining" state for several days? Is there a way to track the progress?
Last week, we had to kill and restart the DSE process because it began throwing "too many open files" exceptions. Right now, the system log is full of messages about completed compaction/flushing tasks (no errors so far).
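For what it's worth, streaming and compaction activity can usually be checked from the shell on the joining node; a minimal sketch (plain invocations, host/port and JMX options omitted):
# active streaming sessions and bytes transferred per file
nodetool netstats
# pending and active compaction tasks
nodetool compactionstats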
EDIT:
The node is still in the "Active - Joining" state as of this moment. It's been exactly a week since we restarted the DSE process on that node. I started monitoring the size of the solr.data directory yesterday and so far I haven't seen an increase. The system.log is still filled with compaction/flushing messages.
One thing that bothers me is that in the OpsCenter Nodes screen (ring/list view), the node is shown under the "Cassandra" DC even though it is a Solr node. In nodetool status, nodetool ring, and dsetool ring, the node is listed under the correct DC.
EDIT:
We decided to restart the bootstrap process from scratch by deleting the data and commitlog directories. Unfortunately, during the subsequent bootstrap attempt:
The stream from node 3 to node 1 (the new node) failed with an exception: ERROR [STREAM-OUT-/] 2014-04-01 01:14:40,887 CassandraDaemon.java (line 196) Exception in thread Thread[STREAM-OUT-/,5,main]
The stream from node 4 to node 1 never started. The last relevant line in node 4's system.log is: StreamResultFuture.java (line 116) Received streaming plan for Bootstrap. It should have been followed by: Prepare completed. Receiving 0 files(0 bytes), sending x files(y bytes)
How can I force those streams to be retried?

Related

TDengine database 2.6 cluster failed to build

The first node already has data. I want to add a second node, tdengine-server-b, but its status shows "not received" and it cannot be added.

Solr recovery mode

I am running a Solr 7.4 cluster with 2 nodes, 9 shards, and 2 replicas for each shard.
When one of the servers crashes, I see messages like this (Skipping download for _3nap.fnm because it already exists) in the logs:
2019-04-16 09:20:21.333 INFO (recoveryExecutor-4-thread-36-processing-n:192.168.1.2:4239_solr x:telegram_channel_post_archive_shard5_replica_n53 c:telegram_channel_post_archive s:shard5 r:core_node54) [c:telegram_channel_post_archive s:shard5 r:core_node54 x:telegram_channel_post_archive_shard5_replica_n53] o.a.s.h.IndexFetcher Skipping download for _3nap.fnm because it already exists
2019-04-16 09:20:35.265 INFO (recoveryExecutor-4-thread-36-processing-n:192.168.1.2:4239_solr x:telegram_channel_post_archive_shard5_replica_n53 c:telegram_channel_post_archive s:shard5 r:core_node54) [c:telegram_channel_post_archive s:shard5 r:core_node54 x:telegram_channel_post_archive_shard5_replica_n53] o.a.s.h.IndexFetcher Skipping download for _3nap.dim because it already exists
2019-04-16 09:20:51.437 INFO (recoveryExecutor-4-thread-36-processing-n:192.168.1.2:4239_solr x:telegram_channel_post_archive_shard5_replica_n53 c:telegram_channel_post_archive s:shard5 r:core_node54) [c:telegram_channel_post_archive s:shard5 r:core_node54 x:telegram_channel_post_archive_shard5_replica_n53] o.a.s.h.IndexFetcher Skipping download for _3nap.si because it already exists
2019-04-16 09:21:00.528 INFO (qtp1543148593-32) [c:telegram_channel_post_archive s:shard20 r:core_node41 x:telegram_channel_post_archive_shard20_replica_n38] o.a.s.u.p.LogUpdateProcessorFactory [telegram_channel_post_archive_shard20_replica_n38] webapp=/solr path=/update params={update.distrib=FROMLEADER&update.chain=dedupe&distrib.from=http://192.168.1.1:4239/solr/telegram_channel_post_archive_shard20_replica_n83/&min_rf=2&wt=javabin&version=2}{add=[9734588300_4723 (1630961769251864576), 9734588300_4693 (1630961769253961728), 9734588300_4670 (1630961769255010304), 9734588300_4656 (1630961769255010305)]} 0 80197
How does the recovery process work in Solr?
Will it transfer all the documents in the shard or only the broken parts?
I found this note in the document:
If a leader goes down, it may have sent requests to some replicas and not others. So when a new potential leader is identified, it runs a synch process against the other replicas. If this is successful, everything should be consistent, the leader registers as active, and normal actions proceed. If a replica is too far out of sync, the system asks for a full replication/replay-based recovery.
but I don't understand this part; what does the following mean?
"If a replica is too far out of sync"
The note just says that it'll attempt to sync as little as possible, but if that's not possible (i.e. the replica is so far behind that the transaction log isn't usable any longer), the complete set of files in the index will be replicated to the replica. This takes longer than a regular incremental sync.
The message you're getting is that the file in question has already been replicated, so it doesn't have to be sent to the replica again.
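As a side note, replica recovery state can be inspected through the Collections API; a minimal sketch using the host and port from the log lines above (the JSON response lists each replica's state, e.g. active or recovering):
curl 'http://192.168.1.2:4239/solr/admin/collections?action=CLUSTERSTATUS&wt=json'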

Corrupt sstable /var/lib/cassandra/data/solr_admin/solr_resources in Datastax

My DSE version is 4.7.3.
I got the error "Corrupt sstable /var/lib/cassandra/data/solr_admin/solr_resources-a31c76040e40393b82d7ba3d910ad50a/solr_admin-solr_resources-ka-9808=[TOC.txt, Index.db, Digest.sha1, Filter.db, CompressionInfo.db, Statistics.db, Data.db]; skipping table"
Because of this, I am getting timeout errors while inserting records. After restarting the node, the issue is temporarily fixed, but after some hours I get the timeout errors again when inserting records.
Kindly help me fix this issue.
You can get this if the server is killed and not allowed to shut down cleanly. It is caused by https://issues.apache.org/jira/browse/CASSANDRA-10501. I would recommend updating to 4.8.11 or 5.0.4 (or later) to rule it out.
Follow the steps below (a command sketch follows the list):
1) Try to rebuild the SSTable on the node using "nodetool scrub":
http://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsScrub.html
If the issue is still not solved, follow the remaining steps:
2) Shut down the DSE node.
3) Scrub the SSTable using "sstablescrub [options]":
http://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsSSTableScrub_t.html
4) Remove the corrupt SSTable.
5) Start the DSE service on the node.
6) Repair using "nodetool repair".
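A minimal sketch of those steps as shell commands, using the keyspace and table from the error above (service names and exact paths vary by installation, so treat this as an outline rather than an exact recipe):
# 1) online scrub of the affected table while the node is up
nodetool scrub solr_admin solr_resources
# 2)-3) stop DSE, then run the offline scrub
sudo service dse stop
sstablescrub solr_admin solr_resources
# 4) remove the SSTable generation named in the error (files matching solr_admin-solr_resources-ka-9808-*)
# 5)-6) start DSE again and repair
sudo service dse start
nodetool repair solr_admin solr_resources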

SymmetricDS Could Not Find Batch to Acknowledge as OK

I'm doing bi-directional push across 3 tiers of nodes.
Why are the 1st and 2nd tier nodes spamming errors like these?
The 1st tier node is logging:
"IncomingBatchService - Skipping batch x"
"DataLoaderService - x data and x batches loaded during push request from 2nd tier. There were x batches in error."
The 2nd tier node is logging:
"PushService - Push data sent to 3rd tier"
"AcknowledgeService - Could not find batch to acknowledge as OK"
"PushService - Pushed data to 3rd tier. x data and x batches were processed"
After checking the databases:
On the 2nd tier node, the batch is pointed to the 3rd tier node with LD status on the reload channel. There is no batch with the same ID pointed to the 1st tier node.
On the 1st tier node, the batch is pointed to the 2nd tier node with OK status on the reload channel.
Help, thank you.
There must be logs on the target nodes with exceptions thrown by the data loader while trying to load the batches in error. Find them and they'll tell you what's wrong.
There's a mistake in the 3rd tier node's configuration: sync.url should be http://<3rd_tier_node_IP>/sync/<engine.name>
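For illustration, that setting lives in the 3rd tier node's engine properties file; a minimal sketch with the placeholders left as written above:
# engines/<engine.name>.properties on the 3rd tier node
sync.url=http://<3rd_tier_node_IP>/sync/<engine.name>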

Disabling virtual nodes in an existing Solr DC

I have an existing cluster with the following topology:
DC Cassandra: 2 nodes
DC Solr: 5 nodes
All of the nodes currently use vnodes. I want to disable vnodes in the Solr DC for performance reasons.
According to this document, to disable vnodes:
In the cassandra.yaml file, set num_tokens to 1
Uncomment the initial_token property and set it to 1 or to the value of a generated token for a multi-node cluster.
Is this all that I need to do? (No repair, no cleanup, nothing else?) It seems too good to be true to me.
As for token assignment, should I use the Python code found here (for Murmur3) or should I reuse one of the existing tokens from the vnodes that the node currently has?
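For context, the usual even-spacing calculation for single Murmur3 tokens looks like the sketch below (not necessarily the code linked above; n is the number of nodes in the DC):
# prints one initial_token per node, evenly spaced over the Murmur3 range
python -c 'n=5; print("\n".join(str(i*(2**64//n) - 2**63) for i in range(n)))'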
The only way to disable vnodes is to follow this procedure in reverse: http://www.datastax.com/documentation/cassandra/1.2/cassandra/configuration/configVnodesProduction_t.html
That is, make a new Solr DC with vnodes off and switch over to it.
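For reference, on each node of the new single-token DC the relevant cassandra.yaml lines would be roughly as follows (the token value is a placeholder; each node gets its own distinct token):
# cassandra.yaml
num_tokens: 1
initial_token: <generated_token_for_this_node>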
