Intermittent Communications link failure with Cloud SQL - google-app-engine

I'm using jmeter to stress test a GAE web service which uses CloudSQL and I'm getting intermittent communications link failure exceptions.
I've tried using direct connections and a connection pool, and I see exceptions in either scenario. The exceptions increase as the number of requests per second increase.
Note that we are using the highest tier of cloud sql, D32 and the tests are well under the max 3200 connections.
Here's a stack trace for reference:
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
at sun.reflect.GeneratedConstructorAccessor48.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:33)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:1117)
at com.mysql.jdbc.MysqlIO.<init>(MysqlIO.java:350)
at com.mysql.jdbc.ConnectionImpl.coreConnect(ConnectionImpl.java:2413)
at com.mysql.jdbc.ConnectionImpl.connectOneTryOnly(ConnectionImpl.java:2450)
at com.mysql.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:2235)
at com.mysql.jdbc.ConnectionImpl.<init>(ConnectionImpl.java:818)
at com.mysql.jdbc.JDBC4Connection.<init>(JDBC4Connection.java:46)
at sun.reflect.GeneratedConstructorAccessor46.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:33)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
at com.mysql.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:404)
at com.mysql.jdbc.GoogleNonRegisteringDriver$JdbcWrapper.getInstance(GoogleNonRegisteringDriver.java:276)
at com.mysql.jdbc.GoogleNonRegisteringDriver.connect(GoogleNonRegisteringDriver.java:246)
at java.sql.DriverManager.getConnection(DriverManager.java:571)
at java.sql.DriverManager.getConnection(DriverManager.java:215)
Update: I changed the connection pool settings to maxActive = 5 and maxIdle = 5 and the intermittent communications link exceptions went away. Note that I've tried commons dbcp and tomcat dbcp. I'm now seeing the following exceptions in the logs:
Caused by: java.sql.SQLException: java.lang.SecurityException: Unable to access gatherPerformanceMetrics
Caused by: java.sql.SQLException: java.lang.SecurityException: Unable to access includeThreadDumpInDeadlockExceptions
Caused by: java.sql.SQLException: java.lang.SecurityException: Unable to access nullNamePatternMatchesAll

From https://cloud.google.com/appengine/docs/java/cloud-sql/#Java_Size_and_access_limits
"Each App Engine instance cannot have more than 12 concurrent connections to a Google Cloud SQL instance."
Can you tell more about the test set-up? How many requests is jmeter sending to appengine and how many connections does the app instance open for each of those requests?

To everyone who are looking for why you might be getting "com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure" on a connection.
Make sure your IP is allowed if you are calling from a test server!
I was testing at a friends house, and this unhelpful error kept showing up.

Related

Running Flink build-in program sometimes arise Exception:java.io.IOException: Connecting the channel failed

I have set up a flink standalone cluster, with one master and three slaves , all SESU Linux machines. In the master Dashboard http://flink-master:8081/ I can see 3 Task Managers and 3 task slots as I have set taskmanager.numberOfTaskSlots: 1 in flink-conf.yaml in all of the slaves.
When I run a flink built-in program,like the examples/streaming/Iteration.jar,I get exception often:
java.io.IOException: Connecting the channel failed: Connecting to remote task manager + 'ccr202/127.0.0.2:49651' has failed. This might indicate that the remote task manager has been lost.
at org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory$ConnectingChannel.waitForChannel(PartitionRequestClientFactory.java:197)
at org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory$ConnectingChannel.access$000(PartitionRequestClientFactory.java:132)
at org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory.createPartitionRequestClient(PartitionRequestClientFactory.java:84)
at org.apache.flink.runtime.io.network.netty.NettyConnectionManager.createPartitionRequestClient(NettyConnectionManager.java:59)
at org.apache.flink.runtime.io.network.partition.consumer.RemoteInputChannel.requestSubpartition(RemoteInputChannel.java:156)
at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.requestPartitions(SingleInputGate.java:480)
at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.getNextBufferOrEvent(SingleInputGate.java:502)
at org.apache.flink.streaming.runtime.io.BarrierTracker.getNextNonBlocked(BarrierTracker.java:93)
at org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:214)
at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:69)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:264)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:718)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException: Connecting to remote task manager + 'ccr202/127.0.0.2:49651' has failed. This might indicate that the remote task manager has been lost.
at org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory$ConnectingChannel.operationComplete(PartitionRequestClientFactory.java:220)
at org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory$ConnectingChannel.operationComplete(PartitionRequestClientFactory.java:132)
at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:603)
at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:563)
at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:424)
at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:268)
at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:284)
at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
... 1 more
Caused by: java.net.ConnectException: Connection refused: ccr202/127.0.0.2:49651
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.flink.shaded.netty4.io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224)
at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:281)
... 6 more
It seems that the network causes the problem,but sometimes the flink program can successfully finish.So what is the reason?
I also encounter this issue very frequently especially when there are many taskManagers. There are a few config I have tried to solve this issue. It's happened when the taskManager read the remote partition through netty connection. It timed out when request the connection. I increased the config "taskmanager.network.netty.server.numThreads", it solved the issue.

exception occurred when using GAE/J and Google Cloud SQL with multiple project

following exception occured.
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:33)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:1117)
at com.mysql.jdbc.MysqlIO.<init>(MysqlIO.java:350)
at com.mysql.jdbc.ConnectionImpl.coreConnect(ConnectionImpl.java:2416)
at com.mysql.jdbc.ConnectionImpl.connectOneTryOnly(ConnectionImpl.java:2450)
at com.mysql.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:2235)
at com.mysql.jdbc.ConnectionImpl.<init>(ConnectionImpl.java:818)
at com.mysql.jdbc.JDBC4Connection.<init>(JDBC4Connection.java:46)
Caused by: java.net.SocketException: Unable to open connection to the instance: project-A:dba
at com.mysql.jdbc.GoogleCloudSqlSocket.<init>(GoogleCloudSqlSocket.java:48)
at com.mysql.jdbc.GoogleCloudSqlSocketFactory.connect(GoogleCloudSqlSocketFactory.java:81)
at com.mysql.jdbc.MysqlIO.<init>(MysqlIO.java:300)
... 44 more
Caused by: java.io.FileNotFoundException: /cloudsql/project-A:dba(No such file or directory)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:247)
at java.io.RandomAccessFile.<init>(RandomAccessFile...(length 8379)
I have two GAE/J project and both refer same google cloud sql instance.
ex)
project-A(gae/j) use project-A:dba(cloudsql)
project-B(gae/j) use project-A:dba(cloudsql)
such exception occured in project-B.
db connection setting is following.
////(in project-B's java file)//////
String url = "jdbc:google:mysql://"+ "project-B" + ":db/db1";
Connection con = DriverManager.getConnection(url, properties);
////
this is bug? or any mistake?
If the Cloud SQL instance is the same for both projects, then the "project X" used as a qualifier of the instance, must be the same for both applications. This is the project name under which the Cloud SQL instance was created.
The DB host is built in Java as "jdbc:google:mysql://your-project-id:your-instance-name/"
According to your example, it would be "project-A:db/".
Don't forget to authorize your project-B app to access the Cloud SQL instance.

GAE: java.sql.SQLException: Transient error, please try again

db.DbTransaction getConnection: null
java.sql.SQLException: Transient error, please try again.
at
com.google.appengine.api.rdbms.RdbmsApiProxyClient$ApiProxyBlockingInterface.makeSyncCall(RdbmsApiProxyClient.java:108)
at com.google.appengine.api.rdbms.RdbmsApiProxyClient$ApiProxyBlockingInterface.openConnection(RdbmsApiProxyClient.java:71)
at com.google.cloud.sql.jdbc.internal.SqlProtoClient.openConnection(SqlProtoClient.java:58)
at com.google.cloud.sql.jdbc.Driver.connect(Driver.java:66)
at com.google.cloud.sql.jdbc.Driver.connect(Driver.java:26)
at java.sql.DriverManager.getConnection(DriverManager.java:620)
at java.sql.DriverManager.getConnection(DriverManager.java:222)
at db.DbTransaction.getConnection(DbTransaction.java:44)
When i restarted the instance of Google Cloud SQL, it is resolved. What the reason behind it and how can i solve it. I have used servlet based connection pooling in this application. Whether the cause of exception have an relation to connection pooling or have any issue related to An App Engine instance cannot have more than 30 concurrent connections to Google Cloud SQL, so a leak will eventually cause new connections to fail (https://developers.google.com/appengine/docs/java/cloud-sql/).

Connection refused when starting Solr with external Zookeeper

I have setup 3 servers with Amazon EC2, and have each server with the following Zookeeper-config.
tickTime=2000
initLimit=10
syncLimit=5
clientPort=2181
server.1=server1address:2888:3888
server.2=server3address:2888:3888
server.3=server3address:2888:3888
I start zookeeper on each server, and after I start Solr on the servers, I get errors like this in Solr:
3766 [main] INFO org.apache.solr.common.cloud.ConnectionManager – Waiting for client to connect to ZooKeeper
3790 [main-SendThread(*serverAddress*:2181)] WARN org.apache.zookeeper.ClientCnxn – Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:692)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
This was apparently coming because Zookeeper wasn't running properly. What I then figured out was that zookeeper was producing this error:
2013-06-09 08:00:57,953 [myid:1] - INFO [ec2amazonaddress.com/ipaddress#amazon:QuorumCnxManager$Listener#493] - Received connection request /ipaddress:60855
2013-06-09 08:00:57,963 [myid:1] - WARN [WorkerSender[myid=1]:QuorumCnxManager#368] - Cannot open
channel to 3 at election address ec2amazonaddress/ipaddress#amazon:
3888
java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:391)
at java.net.Socket.connect(Socket.java:579)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:35
4)
So the problem is with ZooKeeper. What I did was to start another server before the server I previously started first, and then it worked. However, after some restarts that didn't work anymore. In other words, it seems like the order of when you start the ZK server matters. I was able to see that some servers who were fired up first went into follower mode instead of leader mode right away, and maybe that's the reason. I have deleted and reinstalled my whole setup, but the problem was still there.
I have checked the ports and have killed all processes using ports 2181 and 2888/3888 before launching Zookeeper. What bothers me is that this has worked with the same setup earlier.
Hope some of you guys have some experience with this problem. Any suggestion that could be related to not being able to connect to ZK-servers is also welcomed

Failure trying to get a pooled connection. java.sql.SQLException: Protocol violation

I'm administering a web-based application that is set-up to pull in data from a variety of databases; SQL, Oracle, Mainframe, etc.
I was given credentials to access an Oracle DB, and am establishing a connection through the web-based app server via JDBC. The JDBC connection requires me to provide a Database URL and JDBC driver for the connection. I also built in a SQL statement to pull only the information I needed from the Oracle DB into my web-based app.
Things were running smoothly with this set-up, until just recently. I now receive the following error when trying to establish a connection from my web-based app to the Oracle DB:
Failure trying to get a pooled connection to
[jdbc:oracle:thin:#<SERVER NAME>:1521:cqdb]java.sql.SQLException: Protocol violation
The Oracle DBA I work with is not very helpful in resolving in helping me troubleshoot this issue. Without his help, I really don't even know where to start with troubleshooting.
Any suggestions on where to start? I can provide additional information if needed.
*Additional information. This is what is in my STDOUT file in relation to the error. I can keep digging as well:
07:31:08,565 WARN QuartzScheduler_Worker-5 WEB-APP.api.Aggregator:979 - Exception during aggregation. Reason: Failure trying to get a pooled connection to [jdbc:oracle:thin:#SERVER-NAME:1521:cqdb]java.sql.SQLException: Protocol violation
WEB-APP.tools.GeneralException: Failure trying to get a pooled connection to [jdbc:oracle:thin:#SERVER-NAME:1521:cqdb]java.sql.SQLException: Protocol violation
at WEB-APP.api.Aggregator.aggregateAccounts(Aggregator.java:1897)
at WEB-APP.api.Aggregator.execute(Aggregator.java:1222)
at WEB-APP.task.ResourceIdentityScan.execute(ResourceIdentityScan.java:76)
at WEB-APP.api.TaskManager.runSync(TaskManager.java:643)
at WEB-APP.scheduler.JobAdapter.execute(JobAdapter.java:116)
at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:529)
Caused by: WEB-APP.connector.ConnectorException: Failure trying to get a pooled connection to [jdbc:oracle:thin:#SERVER-NAME:1521:cqdb]java.sql.SQLException: Protocol violation
at WEB-APP.connector.JDBCConnector.getConnection(JDBCConnector.java:833)
at WEB-APP.connector.JDBCConnector.iterateObjects(JDBCConnector.java:649)
at WEB-APP.connector.JDBCConnector.iterateObjects(JDBCConnector.java:90)
at WEB-APP.connector.ConnectorProxy.iterateObjects(ConnectorProxy.java:109)
at WEB-APP.api.Aggregator.iterateObjects(Aggregator.java:2673)
at WEB-APP.api.Aggregator.aggregateAccounts(Aggregator.java:1818)
... 6 more
Caused by: WEB-APP.tools.GeneralException: Failure trying to get a pooled connection to [jdbc:oracle:thin:#SERVER-NAME:1521:cqdb]java.sql.SQLException: Protocol violation
at WEB-APP.tools.JdbcUtil.getPooledConnection(JdbcUtil.java:1178)
at WEB-APP.tools.JdbcUtil.getConnection(JdbcUtil.java:823)
at WEB-APP.connector.JDBCConnector.getConnection(JDBCConnector.java:830)
... 11 more
07:33:03,983 ERROR http-8080-2 WEB-APP.server.Authenticator:229 - WEB-APP.connector.AuthenticationFailedException: [LDAP: error code 49 - 80090308: LdapErr: DSID-0C090334, comment: AcceptSecurityContext error, data 52e, vece
**Additional Information:
Oracle JDBC Driver version 14, JRE ver 1.6.0_23-b05. I don't have the Oracle DB version. Awaiting response from our Oracle DBA.
***Additional Information:
This issue was resolved. Our Oracle DBA did something on his end to correct the connection issue. He hasn't explained what he did, yet. Thanks for your help. Sorry for not getting you all the info you needed up front.

Resources