Solr loses connection with ZooKeeper and does not reconnect

We have a SolrCloud setup (Solr version 4.10.4). It works well, except that the Solr instances sometimes lose their connection to ZooKeeper. The exception in the Solr logs says 'No route to host'. Restarting both Solr instances solves the issue.
Could this be an issue with the ZooKeeper DNS? Even if so, how can restarting Solr fix the issue? Please help.
The exception is as follows:
"1480390035494","11/29/2016 14:27:15.494 +1100","2016-11-29 14:27:15,494 ERROR ajp-bio-127.0.0.1-18062-exec-18 org.apache.solr.core.SolrCore - org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates are disabled.
at org.apache.solr.update.processor.DistributedUpdateProcessor.zkCheck(DistributedUpdateProcessor.java:1555)
at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:650)
at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:247)
at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:174)
Session 0x357b694d40e000e for server null, unexpected error, closing socket connection and attempting reconnect
java.net.NoRouteToHostException: No route to host
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)","ip-","608","linux--solr","app/solr","ip-172","/opt/solr.log"
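One way to narrow down whether 'No route to host' points at DNS or at the network is to resolve and probe each ZooKeeper address from the Solr machine while the problem is occurring. The following is a minimal sketch, not a definitive diagnostic: it assumes a zkHost-style connect string, both function names are made up for illustration, and the host names in the usage note are placeholders that do not come from the question.

```python
import socket


def parse_zk_hosts(zk_host):
    """Split a zkHost-style connect string into (host, port) pairs.

    An optional chroot suffix such as "/solr" is dropped, and port 2181 is
    assumed when an entry has no explicit port.
    """
    servers = zk_host.split("/", 1)[0]
    pairs = []
    for entry in servers.split(","):
        entry = entry.strip()
        if not entry:
            continue
        host, _, port = entry.partition(":")
        pairs.append((host, int(port) if port else 2181))
    return pairs


def probe(host, port, timeout=3.0):
    """Resolve host, then try a TCP connection; return (ip, error_or_None)."""
    try:
        ip = socket.gethostbyname(host)
    except socket.gaierror as exc:
        return None, "DNS lookup failed: %s" % exc
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return ip, None
    except OSError as exc:
        # "No route to host" and "Connection refused" both surface here.
        return ip, str(exc)
```

Calling `probe(host, port)` for every pair returned by `parse_zk_hosts("zk1.example.com:2181,zk2.example.com:2181/solr")` shows which IP each name currently resolves to and whether a TCP connection succeeds. If the resolved IP differs from the one the running Solr process is still trying to reach, that would support the DNS hypothesis.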

Related

Daemon thread doesn't complete its execution when we restart ZooKeeper

In our current project architecture, we use Solr to gather, store, and index documents from different sources and make them searchable in near real time.
Our web applications run on Tomcat and connect to Solr to create and modify documents.
Solr uses ZooKeeper to keep the configuration centralized.
There are 5 servers in our cluster running Solr.
When ZooKeeper restarts on one of the servers, the daemon thread created on that server doesn't complete its execution.
As a result, we get continuous log entries with the exception below while the Tomcat instance tries to connect to ZooKeeper:
org.apache.catalina.loader.WebappClassLoaderBase.checkStateForResourceLoading Illegal access: this web application instance has been stopped already. Could not load [org.apache.zookeeper.ClientCnxn$SendThread]. The following stack trace is thrown for debugging purposes as well as to attempt to terminate the thread which caused the illegal access.
After some time, the server runs out of threads.
Can someone help me with the question below?
Why doesn't the daemon thread complete its execution when we restart ZooKeeper?
Solr version: 8.5.1
ZooKeeper version: 3.5.5

Zookeeper errors

I am using Solr with ZooKeeper and see the following errors in the ZooKeeper logs.
I am using ZooKeeper 3.4.10 and Solr 6.6.
EndOfStreamException: Unable to read additional data from client sessionid 0x1XXXXXXX, likely client has closed socket
at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:239)
at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203)
at java.lang.Thread.run(Thread.java:745)
2019-04-28 06:24:59,939 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1044] - Closed socket connection for client /10.40.96.193:46260 which had sessionid 0x1XXXXXXX
The ZooKeeper config:
tickTime=2000
initLimit=10
syncLimit=5
Do these config values cause the above exception? If so, can someone explain whether we should increase or decrease initLimit and syncLimit?
Thanks in advance.
initLimit and syncLimit apply only to the ZooKeeper servers (the ensemble) and are irrelevant to your exception. They govern synchronization between the leader and the followers, and tickTime is the base time unit in which they are measured.
Your client connection exception is more likely caused by a network issue (possibly TCP keepalive settings).
See the "Cluster Options" section of the ZooKeeper Administrator's Guide for more information on initLimit and syncLimit.
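To make the units concrete: initLimit and syncLimit are counted in ticks, so the values from the question translate into wall-clock limits as sketched below. The function name is made up for illustration, and the session timeout bounds assume that minSessionTimeout and maxSessionTimeout are left at their documented defaults of 2 and 20 ticks.

```python
def zk_derived_timeouts(tick_time_ms, init_limit, sync_limit):
    """Convert tick-based ZooKeeper settings into milliseconds."""
    return {
        # Time a follower may take to connect to and sync with the leader.
        "init_limit_ms": tick_time_ms * init_limit,
        # How far a follower may fall behind the leader before it is dropped.
        "sync_limit_ms": tick_time_ms * sync_limit,
        # Default bounds for client session timeouts (assumed not overridden).
        "min_session_timeout_ms": 2 * tick_time_ms,
        "max_session_timeout_ms": 20 * tick_time_ms,
    }


timeouts = zk_derived_timeouts(tick_time_ms=2000, init_limit=10, sync_limit=5)
for name, value in timeouts.items():
    print(name, value)
```

With the quoted config this gives 20 seconds for the initial sync and 10 seconds for ongoing sync, both of which concern server-to-server traffic only. The client-facing numbers are the session timeout bounds of 4 to 40 seconds.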

Destroying connection that could not be successfully matched: error in JBoss while connecting to the database

Modified: I am using a JBoss 7 server and Oracle 10g for my web application. When I start the server, the application works properly, but after 2 or 3 hours it becomes slow.
My guess:
1. The connection between the database and the JBoss server is not working properly, so data is not being fetched, and this is what makes the application slow.
The datasource configuration in my standalone-full.xml file is below:
<datasources>
  <datasource jndi-name="java:/TTKConnectionDataSource" pool-name="TTKConnectionDataSourcePool" enabled="true" use-java-context="true" use-ccm="false">
    <connection-url>jdbc:oracle:thin:@10.1.0.112:1521:vidaltest</connection-url>
    <driver-class>oracle.jdbc.OracleDriver</driver-class>
    <driver>oracle</driver>
    <transaction-isolation>TRANSACTION_READ_COMMITTED</transaction-isolation>
    <pool>
      <min-pool-size>5</min-pool-size>
      <max-pool-size>150</max-pool-size>
      <prefill>true</prefill>
      <use-strict-min>true</use-strict-min>
      <flush-strategy>FailingConnectionOnly</flush-strategy>
    </pool>
    <security>
      <user-name>appln</user-name>
      <password>appln</password>
    </security>
    <validation>
      <valid-connection-checker class-name="org.jboss.jca.adapters.jdbc.extensions.oracle.OracleValidConnectionChecker"/>
      <validate-on-match>true</validate-on-match>
      <background-validation-millis>300000</background-validation-millis>
      <stale-connection-checker class-name="org.jboss.jca.adapters.jdbc.extensions.oracle.OracleStaleConnectionChecker"/>
      <exception-sorter class-name="org.jboss.jca.adapters.jdbc.extensions.oracle.OracleExceptionSorter"/>
    </validation>
    <timeout>
      <idle-timeout-minutes>10</idle-timeout-minutes>
    </timeout>
  </datasource>
</datasources>
This is the error I get when the server slows down:
11:31:04,689 WARN [org.jboss.jca.core.connectionmanager.pool.strategy.OnePool] (http-web-10) IJ000612: Destroying connection that could not be successfully matched: org.jboss.jca.core.connectionmanager.listener.TxConnectionListener@323ff8fa[state=DESTROYED managed connection=org.jboss.jca.adapters.jdbc.local.LocalManagedConnection@62b2c9f connection handles=0 lastUse=1494391408674 trackByTx=false pool=org.jboss.jca.core.connectionmanager.pool.strategy.OnePool@547ac048 pool internal context=SemaphoreArrayListManagedConnectionPool@46e5e24c[pool=TTKConnectionDataSourcePool] xaResource=LocalXAResourceImpl@1e6c0ff1[connectionListener=323ff8fa connectionManager=488aa6d1 warned=false currentXid=null] txSync=null]
11:37:09,075 INFO [com.ttk.action.claims.TTKListener] (http-web-20) Session is created sessionCreated
11:37:09,078 ERROR [org.apache.struts.actions.DispatchAction] (http-web-20) Request[/LoginAction] does not contain handler parameter named 'mode'. This may be caused by whitespace in the label text.
11:46:35,964 WARN [org.jboss.jca.adapters.jdbc.local.LocalManagedConnectionFactory] (http-web-10) Destroying connection that is not valid, due to the following exception: oracle.jdbc.driver.T4CConnection@5b8b1ccb: java.sql.SQLException: pingDatabase failed status=-1
at org.jboss.jca.adapters.jdbc.extensions.oracle.OracleValidConnectionChecker.isValidConnection(OracleValidConnectionChecker.java:74) [ironjacamar-jdbc-1.0.12.Final.jar:1.0.12.Final]
05: Connection error occured: org.jboss.jca.core.connectionmanager.listener.TxConnectionListener@38b370[state=NORMAL managed connection=org.jboss.jca.adapters.jdbc.local.LocalManagedConnection@a8c7e2d connection handles=0 lastUse=1494390662156 trackByTx=false pool=org.jboss.jca.core.connectionmanager.pool.strategy.OnePool@547ac048 pool internal context=SemaphoreArrayListManagedConnectionPool@46e5e24c[pool=TTKConnectionDataSourcePool] xaResource=LocalXAResourceImpl@dafc1c4[connectionListener=38b370 connectionManager=488aa6d1 warned=false currentXid=null] txSync=null]: java.sql.SQLException: pingDatabase failed status=-1
When I restart the server, the connection is fine for 2 or 3 hours at most, and then the server slows down again. Please suggest what I can do to overcome this issue.
Thanks in advance.
Sorry for the disturbance, but I am facing this problem again. What is actually happening? When my server slows down, it does not let me log in; it keeps loading, because I think it is not connecting to my database, or maybe it is not getting a connection object. After some time it gives the warning I mentioned above, "Destroying connection that could not be successfully matched". After that, logging in to the application either takes a long time or, as a last resort, I have to restart the server, which is not something I want to do regularly.
The WARN messages are not unusual when something outside JBoss closes a connection.
That warning indicates that JBoss asked the Oracle JDBC driver to ping the database to check that the connection still worked, and the driver reported that it didn't, so JBoss destroyed the connection. JBoss then creates a new one and gives that to the application, so in theory this should not cause any actual problems.
The exception is thrown by org.jboss.jca.adapters.jdbc.extensions.oracle.OracleValidConnectionChecker.isValidConnection(), which internally calls OracleConnection's pingDatabase() method [1]. If the database connection is closed, pingDatabase() returns -1; see [2]. This is a known issue with the Oracle driver that shows up when there are problems with the network or the database, and it is not related to JBoss.
I would ask you to check that network connectivity between the JBoss node and the database is stable.
[1] https://docs.oracle.com/cd/E18283_01/appdev.112/e13995/oracle/jdbc/OracleConnection.html#pingDatabase__
[2] https://docs.oracle.com/cd/E18283_01/appdev.112/e13995/oracle/jdbc/OracleConnection.html#DATABASE_CLOSED
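The validation behaviour described in this answer (ping the connection before handing it out, destroy it if the ping fails, then create a replacement) can be sketched independently of JBoss. What follows is a toy model in Python and not IronJacamar's implementation: FakeConnection, ValidatingPool, and their methods are hypothetical names, and the -1 status mirrors the DATABASE_CLOSED value referenced in [2].

```python
DATABASE_OK = 0
DATABASE_CLOSED = -1  # status reported in the log as "pingDatabase failed status=-1"


class FakeConnection:
    """Stand-in for a JDBC connection; `alive` simulates the network state."""

    def __init__(self):
        self.alive = True

    def ping_database(self):
        return DATABASE_OK if self.alive else DATABASE_CLOSED


class ValidatingPool:
    """Toy pool that validates each idle connection before handing it out."""

    def __init__(self, factory, size):
        self.factory = factory
        self.idle = [factory() for _ in range(size)]
        self.destroyed = 0

    def get_connection(self):
        while self.idle:
            conn = self.idle.pop()
            if conn.ping_database() == DATABASE_OK:
                return conn
            self.destroyed += 1  # the point where a real pool logs its WARN
        return self.factory()  # every idle connection was stale: open a new one

    def release(self, conn):
        self.idle.append(conn)


pool = ValidatingPool(FakeConnection, size=3)
for conn in pool.idle:
    conn.alive = False  # simulate something outside the pool dropping idle sockets
fresh = pool.get_connection()
print(pool.destroyed, fresh.ping_database())
```

In this model a request that meets only stale connections pays for one failed ping per stale connection plus a reconnect. If something between JBoss and Oracle silently drops idle connections, that pattern would be consistent with the warnings and the slowdown reported in the question, although the sketch cannot confirm that this is the cause.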

Intermittent Communications link failure with Cloud SQL

I'm using JMeter to stress test a GAE web service that uses Cloud SQL, and I'm getting intermittent communications link failure exceptions.
I've tried using direct connections and a connection pool, and I see exceptions in either scenario. The exceptions increase as the number of requests per second increases.
Note that we are using the highest tier of Cloud SQL, D32, and the tests are well under the maximum of 3200 connections.
Here's a stack trace for reference:
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
at sun.reflect.GeneratedConstructorAccessor48.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:33)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:1117)
at com.mysql.jdbc.MysqlIO.<init>(MysqlIO.java:350)
at com.mysql.jdbc.ConnectionImpl.coreConnect(ConnectionImpl.java:2413)
at com.mysql.jdbc.ConnectionImpl.connectOneTryOnly(ConnectionImpl.java:2450)
at com.mysql.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:2235)
at com.mysql.jdbc.ConnectionImpl.<init>(ConnectionImpl.java:818)
at com.mysql.jdbc.JDBC4Connection.<init>(JDBC4Connection.java:46)
at sun.reflect.GeneratedConstructorAccessor46.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:33)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
at com.mysql.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:404)
at com.mysql.jdbc.GoogleNonRegisteringDriver$JdbcWrapper.getInstance(GoogleNonRegisteringDriver.java:276)
at com.mysql.jdbc.GoogleNonRegisteringDriver.connect(GoogleNonRegisteringDriver.java:246)
at java.sql.DriverManager.getConnection(DriverManager.java:571)
at java.sql.DriverManager.getConnection(DriverManager.java:215)
Update: I changed the connection pool settings to maxActive = 5 and maxIdle = 5 and the intermittent communications link exceptions went away. Note that I've tried commons dbcp and tomcat dbcp. I'm now seeing the following exceptions in the logs:
Caused by: java.sql.SQLException: java.lang.SecurityException: Unable to access gatherPerformanceMetrics
Caused by: java.sql.SQLException: java.lang.SecurityException: Unable to access includeThreadDumpInDeadlockExceptions
Caused by: java.sql.SQLException: java.lang.SecurityException: Unable to access nullNamePatternMatchesAll
From https://cloud.google.com/appengine/docs/java/cloud-sql/#Java_Size_and_access_limits
"Each App Engine instance cannot have more than 12 concurrent connections to a Google Cloud SQL instance."
Can you tell us more about the test setup? How many requests is JMeter sending to App Engine, and how many connections does the app instance open for each of those requests?
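Given the quoted limit of 12 concurrent connections per App Engine instance, one option is to cap how many connections the application holds open at once, which is roughly what a small maxActive setting does inside a pool. The following is a minimal sketch using a semaphore. ConnectionLimiter and open_connection are hypothetical names, and the actual driver call is left to the caller.

```python
import threading
from contextlib import contextmanager


class ConnectionLimiter:
    """Allow at most max_connections connections to be open at the same time."""

    def __init__(self, max_connections, open_connection):
        self._slots = threading.BoundedSemaphore(max_connections)
        self._open = open_connection  # callable that returns a new connection

    @contextmanager
    def connection(self, timeout=None):
        # Wait for a free slot instead of opening one connection too many.
        if not self._slots.acquire(timeout=timeout):
            raise TimeoutError("no free connection slot")
        try:
            conn = self._open()
            try:
                yield conn
            finally:
                close = getattr(conn, "close", None)
                if close is not None:
                    close()
        finally:
            self._slots.release()


# object() stands in for a real database connection in this example.
limiter = ConnectionLimiter(12, open_connection=object)
with limiter.connection() as conn:
    pass  # run queries here
```

Requests beyond the cap wait for a slot (or time out) rather than failing at the driver level, so under load the symptom becomes queueing latency instead of connection errors.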
For anyone wondering why they might be getting "com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure" on a connection:
Make sure your IP is allowed if you are calling from a test server!
I was testing at a friend's house, and this unhelpful error kept showing up.

Connection refused when starting Solr with external Zookeeper

I have set up 3 servers on Amazon EC2, each with the following ZooKeeper config:
tickTime=2000
initLimit=10
syncLimit=5
clientPort=2181
server.1=server1address:2888:3888
server.2=server3address:2888:3888
server.3=server3address:2888:3888
I start zookeeper on each server, and after I start Solr on the servers, I get errors like this in Solr:
3766 [main] INFO org.apache.solr.common.cloud.ConnectionManager – Waiting for client to connect to ZooKeeper
3790 [main-SendThread(*serverAddress*:2181)] WARN org.apache.zookeeper.ClientCnxn – Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:692)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
This apparently happened because ZooKeeper wasn't running properly. I then found that ZooKeeper was producing this error:
2013-06-09 08:00:57,953 [myid:1] - INFO [ec2amazonaddress.com/ipaddress#amazon:QuorumCnxManager$Listener@493] - Received connection request /ipaddress:60855
2013-06-09 08:00:57,963 [myid:1] - WARN [WorkerSender[myid=1]:QuorumCnxManager@368] - Cannot open channel to 3 at election address ec2amazonaddress/ipaddress#amazon:3888
java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:391)
at java.net.Socket.connect(Socket.java:579)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:354)
So the problem is with ZooKeeper. What I did was start a different server before the one I had previously started first, and then it worked. However, after some restarts that didn't work anymore. In other words, it seems that the order in which the ZK servers are started matters. I could see that some servers that were started first went into follower mode right away instead of leader mode, and maybe that's the reason. I have deleted and reinstalled my whole setup, but the problem is still there.
I have checked the ports and killed all processes using ports 2181 and 2888/3888 before launching ZooKeeper. What bothers me is that this worked with the same setup earlier.
I hope some of you have experience with this problem. Any suggestion related to not being able to connect to the ZK servers is also welcome.
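The configuration quoted in the question lists server3address for both server.2 and server.3. That may only be a typo in the post, but a duplicate like this is easy to check for mechanically. Below is a small sketch that parses the server.N entries and reports any address that is assigned to more than one id. The function name is made up for illustration.

```python
from collections import defaultdict


def duplicate_server_addresses(zoo_cfg_text):
    """Return {address: [ids]} for addresses listed under more than one server.N."""
    ids_by_address = defaultdict(list)
    for line in zoo_cfg_text.splitlines():
        line = line.strip()
        if not line.startswith("server.") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        server_id = key[len("server."):].strip()
        # Compare the full host:peerPort:electionPort value, since two ids on
        # one host with different ports can be a legitimate test setup.
        ids_by_address[value.strip()].append(server_id)
    return {addr: ids for addr, ids in ids_by_address.items() if len(ids) > 1}


cfg = """\
tickTime=2000
initLimit=10
syncLimit=5
clientPort=2181
server.1=server1address:2888:3888
server.2=server3address:2888:3888
server.3=server3address:2888:3888
"""
print(duplicate_server_addresses(cfg))
```

If the real zoo.cfg files contain the same duplicate, server.2 would never be contacted at its own address, which could plausibly produce "Cannot open channel" warnings during leader election. Whether that is what happened here cannot be determined from the post alone.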

Resources