Connection refused when starting Solr with external Zookeeper - solr

I have set up 3 servers on Amazon EC2, each with the following ZooKeeper config:
tickTime=2000
initLimit=10
syncLimit=5
clientPort=2181
server.1=server1address:2888:3888
server.2=server3address:2888:3888
server.3=server3address:2888:3888
I start ZooKeeper on each server, and when I then start Solr on the servers, I get errors like this in Solr:
3766 [main] INFO org.apache.solr.common.cloud.ConnectionManager – Waiting for client to connect to ZooKeeper
3790 [main-SendThread(*serverAddress*:2181)] WARN org.apache.zookeeper.ClientCnxn – Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:692)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
This apparently happened because ZooKeeper wasn't running properly. I then found that ZooKeeper was logging this error:
2013-06-09 08:00:57,953 [myid:1] - INFO [ec2amazonaddress.com/ipaddress#amazon:QuorumCnxManager$Listener#493] - Received connection request /ipaddress:60855
2013-06-09 08:00:57,963 [myid:1] - WARN [WorkerSender[myid=1]:QuorumCnxManager#368] - Cannot open channel to 3 at election address ec2amazonaddress/ipaddress#amazon:3888
java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:391)
at java.net.Socket.connect(Socket.java:579)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:354)
So the problem is with ZooKeeper. I started a different server before the one I had previously started first, and then it worked. After some restarts, however, that stopped working. In other words, the order in which the ZK servers are started seems to matter. I saw that some servers that were started first went into follower mode right away instead of leader mode, and maybe that's the reason. I have deleted and reinstalled my whole setup, but the problem is still there.
I have checked the ports and killed all processes using ports 2181 and 2888/3888 before launching ZooKeeper. What bothers me is that this same setup worked earlier.
I hope some of you have experience with this problem. Any suggestion related to not being able to connect to the ZK servers is also welcome.
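One thing worth ruling out in a setup like this is a mistake in the `server.N` lines: the config as posted lists `server3address` for both `server.2` and `server.3`. That may only be a typo in the post, but a quick check is cheap. The following is a minimal sketch, assuming the config is available as text; `find_duplicate_hosts` is a hypothetical helper written for this illustration, not part of ZooKeeper.

```python
import re
from collections import defaultdict

# Matches lines such as "server.1=host:2888:3888" (host, quorum port, election port).
SERVER_LINE = re.compile(r"^server\.(\d+)=([^:\s]+):(\d+):(\d+)")


def find_duplicate_hosts(config_text):
    """Return {host: [server ids]} for every host listed under more than one id."""
    ids_by_host = defaultdict(list)
    for line in config_text.splitlines():
        match = SERVER_LINE.match(line.strip())
        if match:
            server_id, host = int(match.group(1)), match.group(2)
            ids_by_host[host].append(server_id)
    return {host: ids for host, ids in ids_by_host.items() if len(ids) > 1}


# The config exactly as posted in the question.
config = """
tickTime=2000
initLimit=10
syncLimit=5
clientPort=2181
server.1=server1address:2888:3888
server.2=server3address:2888:3888
server.3=server3address:2888:3888
"""

print(find_duplicate_hosts(config))  # {'server3address': [2, 3]}
```

If the real files contain three distinct addresses, this check returns an empty dict and the cause lies elsewhere.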

Related

Zookeeper errors

I am using Solr with ZooKeeper and see the following errors in the ZooKeeper logs.
I am using ZK 3.4.10 and Solr 6.6.
EndOfStreamException: Unable to read additional data from client sessionid 0x1XXXXXXX, likely client has closed socket
at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:239)
at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203)
at java.lang.Thread.run(Thread.java:745)
2019-04-28 06:24:59,939 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn#1044] - Closed socket connection for client /10.40.96.193:46260 which had sessionid 0x1XXXXXXX
The ZooKeeper config:
tickTime=2000
initLimit=10
syncLimit=5
Do these config values cause the above exception? If so, can someone explain whether we should increase or decrease initLimit and syncLimit?
Thanks in advance.
Those 3 config parameters apply only to the ZooKeeper servers (the ensemble) and are irrelevant to your exception. They govern synchronization between the leader and the followers.
Your client connection exception is more likely caused by a network issue (maybe TCP keep alive settings).
See the ZooKeeper Administrator's Guide, section "Cluster Options", for more information on initLimit and syncLimit.
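For reference, initLimit and syncLimit are counted in ticks, so their effective duration depends on tickTime (given in milliseconds). A small worked sketch using the values from the question; `limits_in_seconds` is a hypothetical helper name used only for this illustration:

```python
def limits_in_seconds(tick_time_ms, init_limit, sync_limit):
    """Convert initLimit and syncLimit (counted in ticks) into seconds."""
    tick_seconds = tick_time_ms / 1000
    return {
        "initLimit": init_limit * tick_seconds,
        "syncLimit": sync_limit * tick_seconds,
    }


# Values from the question: tickTime=2000, initLimit=10, syncLimit=5.
print(limits_in_seconds(2000, 10, 5))  # {'initLimit': 20.0, 'syncLimit': 10.0}
```

With this config, followers get 20 seconds to connect and sync to the leader initially, and may fall at most 10 seconds behind afterwards. Neither value affects how client sessions such as Solr's are closed.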

Running a Flink built-in program sometimes raises an exception: java.io.IOException: Connecting the channel failed

I have set up a Flink standalone cluster with one master and three slaves, all SUSE Linux machines. In the master dashboard at http://flink-master:8081/ I can see 3 Task Managers and 3 task slots, since I set taskmanager.numberOfTaskSlots: 1 in flink-conf.yaml on all of the slaves.
When I run a Flink built-in program, such as examples/streaming/Iteration.jar, I often get this exception:
java.io.IOException: Connecting the channel failed: Connecting to remote task manager + 'ccr202/127.0.0.2:49651' has failed. This might indicate that the remote task manager has been lost.
at org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory$ConnectingChannel.waitForChannel(PartitionRequestClientFactory.java:197)
at org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory$ConnectingChannel.access$000(PartitionRequestClientFactory.java:132)
at org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory.createPartitionRequestClient(PartitionRequestClientFactory.java:84)
at org.apache.flink.runtime.io.network.netty.NettyConnectionManager.createPartitionRequestClient(NettyConnectionManager.java:59)
at org.apache.flink.runtime.io.network.partition.consumer.RemoteInputChannel.requestSubpartition(RemoteInputChannel.java:156)
at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.requestPartitions(SingleInputGate.java:480)
at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.getNextBufferOrEvent(SingleInputGate.java:502)
at org.apache.flink.streaming.runtime.io.BarrierTracker.getNextNonBlocked(BarrierTracker.java:93)
at org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:214)
at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:69)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:264)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:718)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException: Connecting to remote task manager + 'ccr202/127.0.0.2:49651' has failed. This might indicate that the remote task manager has been lost.
at org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory$ConnectingChannel.operationComplete(PartitionRequestClientFactory.java:220)
at org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory$ConnectingChannel.operationComplete(PartitionRequestClientFactory.java:132)
at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:603)
at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:563)
at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:424)
at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:268)
at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:284)
at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
... 1 more
Caused by: java.net.ConnectException: Connection refused: ccr202/127.0.0.2:49651
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.flink.shaded.netty4.io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224)
at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:281)
... 6 more
It seems that the network causes the problem, but sometimes the Flink program finishes successfully. So what is the reason?
I also encounter this issue very frequently, especially when there are many TaskManagers. It happens when a TaskManager reads a remote partition through a Netty connection and the connection request times out. I tried a few config options to solve it; increasing "taskmanager.network.netty.server.numThreads" solved the issue.
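Separately, the trace in the question shows `ccr202` resolving to `127.0.0.2`, which is a loopback address. If a task manager's hostname maps to loopback, a connection from another machine to that address would land on the connecting machine itself, which might explain an intermittent "Connection refused". This is an assumption drawn from the trace, not something confirmed in this thread. A minimal sketch for checking it on each node; both function names are hypothetical:

```python
import ipaddress
import socket


def is_loopback(address):
    """True if the textual IP address lies in a loopback range such as 127.0.0.0/8."""
    return ipaddress.ip_address(address).is_loopback


def resolves_to_loopback(hostname):
    """Resolve hostname on this machine and report whether it maps to loopback."""
    return is_loopback(socket.gethostbyname(hostname))


# The address shown for ccr202 in the stack trace.
print(is_loopback("127.0.0.2"))  # True
```

Running `resolves_to_loopback("ccr202")` on each machine would show how that node resolves the name; an entry in /etc/hosts is one place such a mapping could come from.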

Solr loses connection with zookeeper and does not reconnect

We have a SolrCloud setup (Solr version 4.10.4). It works well, except that sometimes the Solr instances lose their connection to ZooKeeper. The exception in the Solr logs says 'No route to host'. Restarting both Solr instances solves the issue.
Could this be an issue with the ZooKeeper DNS? Even if so, how can restarting Solr fix the issue? Please help.
The exception is as follows:
"1480390035494","11/29/2016 14:27:15.494 +1100","2016-11-29 14:27:15,494 ERROR ajp-bio-127.0.0.1-18062-exec-18 org.apache.solr.core.SolrCore - org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates are disabled.
at org.apache.solr.update.processor.DistributedUpdateProcessor.zkCheck(DistributedUpdateProcessor.java:1555)
at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:650)
at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:247)
at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:174)
Session 0x357b694d40e000e for server null, unexpected error, closing socket connection and attempting reconnect
java.net.NoRouteToHostException: No route to host
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)","ip-","608","linux--solr","app/solr","ip-172","/opt/solr.log"

failed to connect to 127.0.0.1:7199: connection refused

I get the error failed to connect to 127.0.0.1:7199: connection refused when I run nodetool status on my RHEL machine. It was working fine until yesterday, but today it suddenly started giving this error. I did not make any changes to the configuration files.
I have DSE installed and properly configured; it had been running fine for the past 3-4 months until yesterday. The cassandra.yaml has the cluster name, seed, rpc address, rpc port, and listen address all configured correctly. I also set -Djava.rmi.server.hostname=<server ip address>; in cassandra-env.sh, but it still did not work. I cannot connect to cqlsh, and my Solr is not accessible either. I have also allowed all ports in my machine's security group to rule out a port problem, but that did not help.
Any help would be appreciated.
Check your /etc/cassandra/cassandra.yaml file. It should contain:
authenticator: AllowAllAuthenticator
This setting may be the cause of the problem.
I was getting the same error, and it worked for me after the following commands:
systemctl start cassandra
systemctl restart cassandra
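A "connection refused" on 127.0.0.1:7199 generally means nothing is listening on that port, which is consistent with the Cassandra process not running. Before and after restarting the service, the port can be probed directly. A minimal sketch using only the standard library; `port_is_open` is a hypothetical helper name:

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


# 7199 is the JMX port from the error message.
print(port_is_open("127.0.0.1", 7199))
```

If this still prints False after the service reports as started, the Cassandra logs are the next place to look, since the process may be exiting during startup.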

SolrCloud with embedded ZooKeeper server says: "ZooKeeperServer not running"

When I start my SolrCloud server, Solr opens a socket connection to the embedded ZooKeeper server but says: "ZooKeeperServer not running".
It doesn't state a reason.
How can I figure out why the ZooKeeper server isn't actually running?
2012-05-30 15:02:36.538 [main] INFO org.apache.solr.cloud.SolrZkServer - STARTING EMBEDDED STANDALONE ZOOKEEPER SERVER at port 9983
2012-05-30 15:02:36.545 [Thread-14] INFO o.a.z.server.ZooKeeperServerMain - Starting server
2012-05-30 15:02:36.552 [Thread-14] INFO o.a.zookeeper.server.ZooKeeperServer - Server environment:zookeeper.version=3.3.3-1203054, built on 11/17/2011 05:47 GMT
... [snip] ...
2012-05-30 15:02:37.092 [main-SendThread()] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server localhost/127.0.0.1:9983
2012-05-30 15:02:37.097 [main-SendThread(localhost:9983)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to localhost/127.0.0.1:9983, initiating session
2012-05-30 15:02:37.097 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:9983] INFO o.a.zookeeper.server.NIOServerCnxn - Accepted socket connection from /127.0.0.1:43635
2012-05-30 15:02:37.100 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:9983] WARN o.a.zookeeper.server.NIOServerCnxn - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
2012-05-30 15:02:37.100 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:9983] INFO o.a.zookeeper.server.NIOServerCnxn - Closed socket connection for client /127.0.0.1:43635 (no session established for client)
2012-05-30 15:02:37.101 [main-SendThread(localhost:9983)] INFO org.apache.zookeeper.ClientCnxn - Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
In my case specifically, it seemed that a bunch of extra files in my conf/ directory was causing the problem. Try to keep only the files that are strictly necessary in that directory so that the embedded ZooKeeper runs properly.

Resources