Daemon thread doesn't complete its execution when we restart ZooKeeper - Solr

In our current architecture we use Solr to gather, store, and index documents from different sources and make them searchable in near real time.
Our web applications run on Tomcat and connect to Solr to create and modify documents.
Solr uses ZooKeeper to keep the configuration centralized.
There are 5 servers in our cluster running Solr.
When ZooKeeper restarts on one of the servers, the daemon thread created on that server doesn't complete its execution.
As a result, we get continuous logs with the exception below while trying to connect to ZooKeeper from the Tomcat instance:
org.apache.catalina.loader.WebappClassLoaderBase.checkStateForResourceLoading Illegal access: this web application instance has been stopped already. Could not load [org.apache.zookeeper.ClientCnxn$SendThread]. The following stack trace is thrown for debugging purposes as well as to attempt to terminate the thread which caused the illegal access.
After some time, the server runs out of threads.
Can someone please help me with the question below?
Why doesn't the daemon thread complete its execution when we restart ZooKeeper?
Solr version: 8.5.1
ZooKeeper version: 3.5.5

Related

Solr is stopped when update/reindex

Production server: Solr 5.4.1, Ruby on Rails, Ubuntu server.
Solr suddenly stopped. After I restarted it, select/get queries worked, but whenever an update/reindex job runs, Solr stops again. I cannot find any error statement in the log either.
I compared the Solr logs of a running system and the stopped system and found that, after DirectUpdateHandler2 end_commit_flush runs, the log line below is missing from the non-working system's log:
97588877 INFO (searcherExecutor-7-thread-1-processing-x:namecol) [x:namecol] o.a.s.c.SolrCore [namecol] Registered new searcher Searcher#1bf35cb6[namecol main{ExitableDirectoryReader(UninvertingDirectoryReader(Uninverting(_3rc22(5.4.1):C68771/19227:delGen=227) Uninverting(_4ee4k(5.4.1):C43777/12974) Uninverting(_4fogn(5.4.1):C13374/2400) Uninverting(_4fopo(5.4.1):c1712/83) Uninverting(_4fomr(5.4.1):c1150/216) Uninverting(_4foqs(5.4.1):c995/64) Uninverting(_4for4(5.4.1):c156) Uninverting(_4for8(5.4.1):c94) Uninverting(_4for9(5.4.1):c3)))}
Which part do I need to check? I have set softCommit to -1, so Solr no longer stops after frontend changes, but the select results are also not updated until I restart it.

As a workaround, I created a new core and re-indexed all the data.
I also updated Solr to version 8.8.2 for a more stable release.

Solr server not starting on Windows

Solr version 8.5.1
My Solr is not starting anymore. I use the solr start command to start Solr. Every time I run this command I see the following error:
Java HotSpot(TM) 64-Bit Server VM warning: JVM cannot use large page memory because it does not have enough privilege to lock pages in memory.
Waiting up to 30 to see Solr running on port 8983
ERROR: Solr at http://localhost:8983/solr did not come online within 30 seconds!
There is no error in the log files. But connecting to Solr is failing. This was working earlier.
Could someone please help me to troubleshoot the issue?

I found out what the issue is. Even though the message indicated that the server did not start in 30 seconds, it started after some time.
I had closed the console window because I thought the server was running in the background, and that killed the server. The server stays up only as long as I keep open the command window that I used to start it.

Remote debugging Flink local cluster

I want to deploy my jobs on a local Flink cluster during development (i.e. JobManager and TaskManager running on my development laptop) and use remote debugging. I tried adding
"-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005" to the flink-conf.yaml file. Since the JobManager and TaskManager are running on the same machine, the TaskManager throws an exception stating that the socket is already in use and terminates. Is there any way I can get this running?

You are probably setting env.java.opts, which affects all JVMs started by Flink. Since the jobmanager gets started first, it grabs the port before the taskmanager is started.
You can use env.java.opts.taskmanager to pass parameters only for taskmanager JVMs.
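The underlying conflict is that only one process can listen on a given TCP port at a time, so the second JVM asked to open the same debug port fails. Here is a minimal Python sketch of that behaviour, with no Flink involved. `try_listen` is a hypothetical helper written for this illustration, and an OS-assigned port stands in for 5005.

```python
import socket
from typing import Optional


def try_listen(port: int) -> Optional[socket.socket]:
    """Return a socket listening on 127.0.0.1:port, or None if binding fails."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind(("127.0.0.1", port))
        sock.listen(1)
        return sock
    except OSError:
        # Typically "address already in use" when another listener holds the port.
        sock.close()
        return None


# Port 0 asks the OS for any free port (assumption: stands in for 5005).
first = try_listen(0)
port = first.getsockname()[1]

# A second listener on the same port plays the role of the second JVM.
second = try_listen(port)

print("first listener bound:", first is not None)
print("second listener bound:", second is not None)

first.close()
```

The first listener plays the role of the jobmanager and the second the taskmanager. Giving the debug options to only one of the two processes avoids the clash.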

ZooKeeper - SOLR issue

We are using Solr 4.2.1 and ZooKeeper 3.4.5 and there are 2 Solr servers.
Solr is reporting "No registered leader was found" and "WARNING ZkStateReader ZooKeeper watch triggered, but Solr cannot talk to ZK".
ZooKeeper is reporting "Exception when following the leader".
After restarting both, it works for some time and then reports the issue again.
Here are some additional logs from Solr:
SEVERE ZkController There was a problem finding the leader in zk:org.apache.solr.common.SolrException: Could not get leader props
org.apache.solr.common.SolrException: No registered leader was found, collection:www-live slice:shard1
SEVERE: shard update error StdNode: http://10.23.3.47:8983/solr/www-live/:org.apache.solr.client.solrj.SolrServerException: Server refused connection at: http://10.23.3.47:8983/solr/www-live
SEVERE: Recovery failed - trying again... (5) core=www-live
From ZooKeeper
2016-01-14 11:25:08,423 [myid:1] - WARN [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower#89] - Exception when following the leader
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:375)
at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:152)
at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:740)
Any help is much appreciated.
Thank you.

How many ZooKeeper servers do you have?
It should be an odd number for leader election. If it is an even number, please change it to an odd number and try again.
Three ZooKeeper servers is the minimum recommended size for an
ensemble, and we also recommend that they run on separate machines.
For reliable ZooKeeper service, you should deploy ZooKeeper in a
cluster known as an ensemble. As long as a majority of the ensemble
are up, the service will be available. Because Zookeeper requires a
majority, it is best to use an odd number of machines. For example,
with four machines ZooKeeper can only handle the failure of a single
machine; if two machines fail, the remaining two machines do not
constitute a majority. However, with five machines ZooKeeper can
handle the failure of two machines.
http://zookeeper.apache.org/doc/r3.1.2/zookeeperAdmin.html
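The majority arithmetic in the quoted passage can be checked with a short sketch. The function names `majority` and `tolerated_failures` are made up for this illustration and are not part of any ZooKeeper API. The only assumption is the rule stated above: the service stays available while a strict majority of the ensemble is up.

```python
def majority(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Number of servers that can fail while a majority is still up."""
    return ensemble_size - majority(ensemble_size)


# Print ensemble size, required majority, and tolerated failures.
for n in range(1, 7):
    print(n, majority(n), tolerated_failures(n))
```

For 4 servers this gives 1 tolerated failure, the same as for 3, and for 5 servers it gives 2, which matches the documentation excerpt. With only 2 servers no failure is tolerated at all.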

Tomcat 6.0 is getting stopped after certain time automatically

Tomcat 6.0 stops automatically after a certain time. My machine is never turned off, but the process still stops. I am using my Tomcat server in production mode, and I really don't want to start the server manually every day.
What could be the reason? In production mode the server should never stop.

Check your Task Scheduler:
Go to Start and type "task scheduler" in the search box.
Open Task Scheduler and check whether any task is running that stops the server.
Alternatively, you can increase the PermGen space.
The server might be stopping because of an out-of-memory exception.
