Zookeeper errors - solr

I am using Solr with ZooKeeper and see the following errors in the ZooKeeper logs.
I am using ZooKeeper 3.4.10 and Solr 6.6.
EndOfStreamException: Unable to read additional data from client sessionid 0x1XXXXXXX, likely client has closed socket
at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:239)
at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203)
at java.lang.Thread.run(Thread.java:745)
2019-04-28 06:24:59,939 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn#1044] - Closed socket connection for client /10.40.96.193:46260 which had sessionid 0x1XXXXXXX
The ZooKeeper config:
tickTime=2000
initLimit=10
syncLimit=5
Do these config values cause the above exception? If yes, can someone explain whether we should increase or decrease initLimit and syncLimit?
Thanks in advance.

Those three config parameters apply only to the ZooKeeper servers (the ensemble) and are irrelevant to your exception. They govern synchronization between the leader and the followers.
Your client connection exception is more likely caused by a network issue (possibly TCP keep-alive settings).
See "Cluster Options" in the ZooKeeper Administrator's Guide for more information on initLimit and syncLimit.
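To make the units concrete: initLimit and syncLimit are counted in ticks, so their effective durations depend on tickTime. The sketch below converts the values from the question into milliseconds. The function name `ensemble_timeouts` is made up for illustration, and the session timeout bounds assume the documented defaults (2 x tickTime and 20 x tickTime) with minSessionTimeout and maxSessionTimeout left unset.

```python
def ensemble_timeouts(tick_time_ms, init_limit, sync_limit):
    """Convert tick-based ZooKeeper limits into milliseconds.

    Hypothetical helper for illustration; not part of any ZooKeeper API.
    """
    return {
        # Time a follower has to connect to the leader and sync initially.
        "init_timeout_ms": tick_time_ms * init_limit,
        # How far a follower may lag behind the leader before it is dropped.
        "sync_timeout_ms": tick_time_ms * sync_limit,
        # Assumed defaults when minSessionTimeout/maxSessionTimeout are unset.
        "min_session_timeout_ms": 2 * tick_time_ms,
        "max_session_timeout_ms": 20 * tick_time_ms,
    }


# Values taken from the config in the question.
timeouts = ensemble_timeouts(tick_time_ms=2000, init_limit=10, sync_limit=5)
for name, value in timeouts.items():
    print(f"{name} = {value}")
```

Only the two session timeout bounds concern client connections such as Solr's. The first two values apply to follower-to-leader traffic inside the ensemble.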

Related

Flink cluster unable to boot up - getChildren() failed w/ error = -6

I set up a new Flink cluster (v1.15) in a Kubernetes cluster. This new cluster is set up in the same namespace in which an existing Flink cluster (v1.13) is running fine.
The job-manager of the new Flink cluster is in a CrashLoopBackOff state. job-manager prints the following set of messages continuously, which includes a specific ERROR message:
2022-10-10 23:02:47,214 INFO org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn [] - Opening socket connection to server flink-zk-client-service/100.65.161.135:2181
2022-10-10 23:02:47,214 ERROR org.apache.flink.shaded.curator5.org.apache.curator.ConnectionState [] - Authentication failed
2022-10-10 23:02:47,215 INFO org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn [] - Socket connection established, initiating session, client: /100.98.125.116:57754, server: flink-zk-client-service/100.65.161.135:2181
2022-10-10 23:02:47,216 INFO org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn [] - Session establishment complete on server flink-zk-client-service/100.65.161.135:2181, sessionid = 0x381a609d51f0082, negotiated timeout = 4000
2022-10-10 23:02:47,216 INFO org.apache.flink.shaded.curator5.org.apache.curator.framework.state.ConnectionStateManager [] - State change: RECONNECTED
2022-10-10 23:02:47,216 INFO org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriver [] - Connection to ZooKeeper was reconnected. Leader election can be restarted.
2022-10-10 23:02:47,217 INFO org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalDriver [] - Connection to ZooKeeper was reconnected. Leader retrieval can be restarted.
2022-10-10 23:02:47,217 INFO org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalDriver [] - Connection to ZooKeeper was reconnected. Leader retrieval can be restarted.
2022-10-10 23:02:47,218 ERROR org.apache.flink.shaded.curator5.org.apache.curator.framework.recipes.leader.LeaderLatch [] - getChildren() failed. rc = -6 <============
2022-10-10 23:02:47,218 INFO org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn [] - Unable to read additional data from server sessionid 0x381a609d51f0082, likely server has closed socket, closing socket connection and attempting reconnect
It seems the error message indicates that a specific node of the new cluster in ZK either does not exist or does not have any children. But I could be off. I am using the same zookeeper for the v1.13 cluster, no issues with that cluster.
Content of ZooKeeper (the cluster ID is dev-cl2):
ls /dev-cl2/dev-cl2
[leader]
ls /dev-cl2/dev-cl2/leader
[]
get /dev-cl2/dev-cl2/leader
// Nothing printed
Any help or pointers to troubleshooting this issue would be greatly appreciated. Thank you.
Update1:
I noticed the following in the 1.15 release notes:
A new multiple component leader election service was implemented that only runs a single leader election per Flink process. If this should cause any problems, then you can set high-availability.use-old-ha-services: true in the flink-conf.yaml to use the old high availability services.
As a test, I set high-availability.use-old-ha-services: true. It did not have any effect.
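For reading the `rc = -6` in the LeaderLatch log line above: ZooKeeper return codes are defined in `org.apache.zookeeper.KeeperException.Code`. There, -6 is UNIMPLEMENTED, while a missing node is NONODE (-101). The sketch below maps a subset of those codes to their names. The helper `describe_rc` is hypothetical and the table is deliberately incomplete.

```python
# Subset of org.apache.zookeeper.KeeperException.Code values.
ZK_RETURN_CODES = {
    0: "OK",
    -4: "CONNECTIONLOSS",
    -6: "UNIMPLEMENTED",
    -7: "OPERATIONTIMEOUT",
    -101: "NONODE",
    -102: "NOAUTH",
    -110: "NODEEXISTS",
    -112: "SESSIONEXPIRED",
}


def describe_rc(rc):
    """Return the symbolic name for a ZooKeeper return code, if known.

    Hypothetical helper for illustration only.
    """
    return ZK_RETURN_CODES.get(rc, f"UNKNOWN({rc})")


print(describe_rc(-6))
print(describe_rc(-101))
```

An UNIMPLEMENTED result means the server rejected the operation itself rather than reporting a missing node. One possible cause, stated here as an assumption rather than a diagnosis, is a client library sending a request type that the server version does not support. Comparing the ZooKeeper server version with the client version shaded into Flink 1.15 may be worthwhile.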

ActiveMQ slave broker accepts incoming connection from Apache Camel

I have the following configuration:
Two actively running Tomcat instances running Apache Camel 2.20.2 that use the competing consumer concept to read messages from the same JMS message queue
ActiveMQ 5.15.0 in a master/slave configuration using a shared KahaDB
Sometimes one of the Camel instances connects to the slave broker even though the slave broker is not active (i.e. as far as I can tell from the log files, it did not get a lock on the KahaDB).
When this occurs, the route on that Camel instance is blocked, we get an ExchangeTimedOutException, and messages queue up.
WARN EndpointMessageListener:213 - Execution of JMS message listener failed. Caused by: [org.apache.camel.RuntimeCamelException - org.apache.camel.ExchangeTimedOutException: The OUT message was not received within: 30000 millis. Exchange[ID-MXPBMES-01P-I02-1625784159041-1-16108]]
Is it normal that a slave broker accepts a connection from a client application (Camel in our case)?
The secondary broker should not accept connections, so this sounds like a bug. However, you are not using the latest broker, so before doing anything else you should update to the latest release, as bugs are fixed continually.
Some issues can arise if the underlying file system does not provide a reliable locking mechanism, which can lead to both the primary and backup brokers becoming active.

Solr loses connection with zookeeper and does not reconnect

We have a SolrCloud setup (Solr version 4.10.4). It works well, except that the Solr instances sometimes lose their connection to ZooKeeper. The exception in the Solr logs says 'No route to host'. Restarting both Solr instances solves the issue.
Could that be an issue with the ZooKeeper DNS? Even if so, how can restarting Solr fix the issue? Please help.
The exception is as follows:
"1480390035494","11/29/2016 14:27:15.494 +1100","2016-11-29 14:27:15,494 ERROR ajp-bio-127.0.0.1-18062-exec-18 org.apache.solr.core.SolrCore - org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates are disabled.
at org.apache.solr.update.processor.DistributedUpdateProcessor.zkCheck(DistributedUpdateProcessor.java:1555)
at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:650)
at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:247)
at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:174)
Session 0x357b694d40e000e for server null, unexpected error, closing socket connection and attempting reconnect
java.net.NoRouteToHostException: No route to host
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)","ip-","608","linux--solr","app/solr","ip-172","/opt/solr.log"
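To tell "No route to host" apart from other failure modes when the problem recurs, a plain TCP probe from the Solr host to each ZooKeeper address can help. The sketch below is a minimal example using only Python's standard `socket` module. The function name `probe` and the result labels are made up for illustration.

```python
import errno
import socket


def probe(host, port, timeout=3.0):
    """Classify a TCP connection attempt to host:port.

    Hypothetical diagnostic helper; the labels are arbitrary.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Host reachable, but nothing is listening on that port.
        return "refused"
    except socket.timeout:
        return "timeout"
    except socket.gaierror:
        # Name resolution failed before any connection attempt.
        return "dns-failure"
    except OSError as exc:
        if exc.errno in (errno.EHOSTUNREACH, errno.ENETUNREACH):
            # Corresponds to java.net.NoRouteToHostException.
            return "no-route"
        return "error"


if __name__ == "__main__":
    # Replace with the real ZooKeeper host names; 2181 is the usual client port.
    print(probe("localhost", 2181))
```

Running it while Solr is failing and again after a restart shows whether the route itself changed or only Solr's view of it.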

Connection refused when starting Solr with external Zookeeper

I have set up three servers on Amazon EC2, each with the following ZooKeeper config:
tickTime=2000
initLimit=10
syncLimit=5
clientPort=2181
server.1=server1address:2888:3888
server.2=server3address:2888:3888
server.3=server3address:2888:3888
I start ZooKeeper on each server, and after I start Solr on the servers, I get errors like this in Solr:
3766 [main] INFO org.apache.solr.common.cloud.ConnectionManager – Waiting for client to connect to ZooKeeper
3790 [main-SendThread(*serverAddress*:2181)] WARN org.apache.zookeeper.ClientCnxn – Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:692)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
This apparently happened because ZooKeeper wasn't running properly. I then found that ZooKeeper was producing this error:
2013-06-09 08:00:57,953 [myid:1] - INFO [ec2amazonaddress.com/ipaddress#amazon:QuorumCnxManager$Listener#493] - Received connection request /ipaddress:60855
2013-06-09 08:00:57,963 [myid:1] - WARN [WorkerSender[myid=1]:QuorumCnxManager#368] - Cannot open channel to 3 at election address ec2amazonaddress/ipaddress#amazon:3888
java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:391)
at java.net.Socket.connect(Socket.java:579)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:354)
So the problem is with ZooKeeper. I started a different server before the one I had previously started first, and then it worked. However, after some restarts that no longer worked. In other words, it seems that the order in which the ZK servers are started matters. I saw that some servers that were started first went into follower mode instead of leader mode right away, and maybe that's the reason. I have deleted and reinstalled my whole setup, but the problem is still there.
I have checked the ports and killed all processes using ports 2181 and 2888/3888 before launching ZooKeeper. What bothers me is that the same setup worked earlier.
I hope some of you have experience with this problem. Any suggestion related to not being able to connect to the ZK servers is also welcome.
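One detail worth ruling out: in the config as posted above, both server.2 and server.3 point at server3address. That may simply be a slip made while anonymizing the question, but if the real zoo.cfg has the same duplication, one ensemble member would be unreachable on its election port. The sketch below checks the `server.N` lines of a config for duplicate hosts. The function name `find_duplicate_servers` is made up for illustration.

```python
from collections import defaultdict


def find_duplicate_servers(cfg_text):
    """Return {host: [server ids]} for hosts listed under more than one id.

    Hypothetical helper; it only inspects `server.N=host:port:port` lines.
    """
    by_host = defaultdict(list)
    for line in cfg_text.splitlines():
        line = line.strip()
        if not line.startswith("server."):
            continue
        key, _, value = line.partition("=")
        server_id = key[len("server."):]
        host = value.split(":")[0]
        by_host[host].append(server_id)
    return {host: ids for host, ids in by_host.items() if len(ids) > 1}


# The config exactly as posted in the question.
ZOO_CFG = """\
tickTime=2000
initLimit=10
syncLimit=5
clientPort=2181
server.1=server1address:2888:3888
server.2=server3address:2888:3888
server.3=server3address:2888:3888
"""

duplicates = find_duplicate_servers(ZOO_CFG)
print(duplicates)
```

Running the same check against the actual zoo.cfg on each of the three machines would also confirm that all of them list the same ensemble.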

SolrCloud with embedded ZooKeeper server says: "ZooKeeperServer not running"

When I start my SolrCloud server, Solr opens a socket connection to the embedded ZooKeeper server but says: "ZooKeeperServer not running".
It doesn't state a reason.
How can I figure out why the ZooKeeper server isn't actually running?
2012-05-30 15:02:36.538 [main] INFO org.apache.solr.cloud.SolrZkServer - STARTING EMBEDDED STANDALONE ZOOKEEPER SERVER at port 9983
2012-05-30 15:02:36.545 [Thread-14] INFO o.a.z.server.ZooKeeperServerMain - Starting server
2012-05-30 15:02:36.552 [Thread-14] INFO o.a.zookeeper.server.ZooKeeperServer - Server environment:zookeeper.version=3.3.3-1203054, built on 11/17/2011 05:47 GMT
... [snip] ...
2012-05-30 15:02:37.092 [main-SendThread()] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server localhost/127.0.0.1:9983
2012-05-30 15:02:37.097 [main-SendThread(localhost:9983)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to localhost/127.0.0.1:9983, initiating session
2012-05-30 15:02:37.097 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:9983] INFO o.a.zookeeper.server.NIOServerCnxn - Accepted socket connection from /127.0.0.1:43635
2012-05-30 15:02:37.100 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:9983] WARN o.a.zookeeper.server.NIOServerCnxn - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
2012-05-30 15:02:37.100 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:9983] INFO o.a.zookeeper.server.NIOServerCnxn - Closed socket connection for client /127.0.0.1:43635 (no session established for client)
2012-05-30 15:02:37.101 [main-SendThread(localhost:9983)] INFO org.apache.zookeeper.ClientCnxn - Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
In my case specifically, it seemed that having a bunch of extra files in my conf/ directory was causing problems. Try to keep only the files that are strictly necessary in that directory so that the embedded ZooKeeper runs properly.
