WorkLightAuthenticationException - worklight-server

On our production server, a series of the exceptions below keeps being generated, which eventually forces a restart of the server.
Please let us know what is causing this exception. We are using Worklight 6.1.
[8/31/17 9:00:53:093 IST] 000002ac ServletWrappe E com.ibm.ws.webcontainer.servlet.ServletWrapper service SRVE0068E: An exception was thrown by one of the service methods of the servlet [GadgetAPIServlet] in application [IBM_Worklight_Console]. Exception created : [com.worklight.server.auth.api.WorkLightAuthenticationException
at com.worklight.core.auth.impl.AuthenticationContext.checkAuthentication(AuthenticationContext.java:548)
at com.worklight.core.auth.impl.AuthenticationContext.processRealms(AuthenticationContext.java:414)
at com.worklight.core.auth.impl.AuthenticationContext.pushCurrentResource(AuthenticationContext.java:391)
at com.worklight.core.auth.impl.AuthenticationServiceBean.accessResource(AuthenticationServiceBean.java:75)
at com.worklight.integration.services.impl.DataAccessServiceImpl.invokeProcedureInternal(DataAccessServiceImpl.java:384)
at com.worklight.integration.services.impl.DataAccessServiceImpl.invokeProcedure(DataAccessServiceImpl.java:112)
at com.worklight.gadgets.serving.handler.BackendQueryHandler.getContent(BackendQueryHandler.java:184)
at com.worklight.gadgets.serving.handler.BackendQueryHandler.doPost(BackendQueryHandler.java:75)
at com.worklight.gadgets.serving.GadgetAPIServlet.doGetOrPost(GadgetAPIServlet.java:141)
at com.worklight.gadgets.serving.GadgetAPIServlet.doPost(GadgetAPIServlet.java:103)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:595)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:668)
at com.ibm.ws.webcontainer.servlet.ServletWrapper.service(ServletWrapper.java:1230)
at com.ibm.ws.webcontainer.servlet.ServletWrapper.handleRequest(ServletWrapper.java:779)
at com.ibm.ws.webcontainer.servlet.ServletWrapper.handleRequest(ServletWrapper.java:478)
at com.ibm.ws.webcontainer.servlet.ServletWrapperImpl.handleRequest(ServletWrapperImpl.java:178)
at com.ibm.ws.webcontainer.filter.WebAppFilterChain.invokeTarget(WebAppFilterChain.java:136)
at com.ibm.ws.webcontainer.filter.WebAppFilterChain.doFilter(WebAppFilterChain.java:97)
at com.worklight.core.auth.impl.AuthenticationFilter$1.execute(AuthenticationFilter.java:191)
at com.worklight.core.auth.impl.AuthenticationServiceBean.accessResource(AuthenticationServiceBean.java:76)
at com.worklight.core.auth.impl.AuthenticationFilter.doFilter(AuthenticationFilter.java:195)
at com.ibm.ws.webcontainer.filter.FilterInstanceWrapper.doFilter(FilterInstanceWrapper.java:195)
at com.ibm.ws.webcontainer.filter.WebAppFilterChain.doFilter(WebAppFilterChain.java:91)
at com.ibm.ws.webcontainer.filter.WebAppFilterManager.doFilter(WebAppFilterManager.java:967)
at com.ibm.ws.webcontainer.filter.WebAppFilterManager.invokeFilters(WebAppFilterManager.java:1107)
at com.ibm.ws.webcontainer.servlet.CacheServletWrapper.handleRequest(CacheServletWrapper.java:87)
at com.ibm.ws.webcontainer.WebContainer.handleRequest(WebContainer.java:939)
at com.ibm.ws.webcontainer.WSWebContainer.handleRequest(WSWebContainer.java:1662)
at com.ibm.ws.webcontainer.channel.WCChannelLink.ready(WCChannelLink.java:200)
at com.ibm.ws.http.channel.inbound.impl.HttpInboundLink.handleDiscrimination(HttpInboundLink.java:463)
at com.ibm.ws.http.channel.inbound.impl.HttpInboundLink.handleNewRequest(HttpInboundLink.java:530)
at com.ibm.ws.http.channel.inbound.impl.HttpInboundLink.processRequest(HttpInboundLink.java:316)
at com.ibm.ws.http.channel.inbound.impl.HttpICLReadCallback.complete(HttpICLReadCallback.java:88)
at com.ibm.ws.tcp.channel.impl.AioReadCompletionListener.futureCompleted(AioReadCompletionListener.java:175)
at com.ibm.io.async.AbstractAsyncFuture.invokeCallback(AbstractAsyncFuture.java:217)
at com.ibm.io.async.AsyncChannelFuture.fireCompletionActions(AsyncChannelFuture.java:161)
at com.ibm.io.async.AsyncFuture.completed(AsyncFuture.java:138)
at com.ibm.io.async.ResultHandler.complete(ResultHandler.java:204)
at com.ibm.io.async.ResultHandler.runEventProcessingLoop(ResultHandler.java:775)
at com.ibm.io.async.ResultHandler$2.run(ResultHandler.java:905)
at com.ibm.ws.util.ThreadPool$Worker.run(ThreadPool.java:1881)
]

I've been seeing a similar error, but for a different issue in the apps we have. While researching this error, I came across an IBM document which says there is an issue with WebSphere 6.1. It appears the problem is that it's not cleaning up the pool of database connections, which means it runs out of available connections. This may be why you were seeing it fail after so many hours (you've depleted all available connections) and why rebooting helps (the pool of connections is reset).
Hopefully you've figured it out by now, but if not, check out the IBM document. There should already be a fix out for it.
http://www-01.ibm.com/support/docview.wss?uid=swg1PK92140

Related

Error when trying to start Flink job from retained checkpoint

As I understand from the documentation, it should be possible to resume a Flink job from a checkpoint just as from a savepoint, by specifying the checkpoint path in the "Savepoint path" input box of the web UI (e.g. /path/to/my/checkpoint/chk-1, where "chk-1" contains the "_metadata" file).
I've been trying this out, but I get the following exception:
2020-09-04 10:35:11
java.lang.Exception: Exception while creating StreamOperatorStateContext.
at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:191)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:255)
at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeStateAndOpen(StreamTask.java:1006)
at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$beforeInvoke$0(StreamTask.java:454)
at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.runThrowing(StreamTaskActionExecutor.java:94)
at org.apache.flink.streaming.runtime.tasks.StreamTask.beforeInvoke(StreamTask.java:449)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:461)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:707)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:532)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.flink.util.FlinkException: Could not restore keyed state backend for LegacyKeyedProcessOperator_632e4c67d1f4899514828b9c5059a9bb_(1/1) from any of the 1 provided restore options.
at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135)
at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:304)
at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:131)
... 9 more
Caused by: org.apache.flink.runtime.state.BackendBuildingException: Caught unexpected exception.
at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackendBuilder.build(RocksDBKeyedStateBackendBuilder.java:336)
at org.apache.flink.contrib.streaming.state.RocksDBStateBackend.createKeyedStateBackend(RocksDBStateBackend.java:548)
at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:288)
at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:142)
at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:121)
... 11 more
Caused by: java.nio.file.NoSuchFileException: /tmp/flink-io-ee95b361-a616-4531-b402-7a21189e8ce5/job_c71cd62de3a34d90924748924e78b3f8_op_LegacyKeyedProcessOperator_632e4c67d1f4899514828b9c5059a9bb__1_1__uuid_ae7dd096-f52f-4eab-a2a3-acbfe2bc4573/336ed2fe-30a4-44b5-a419-9e485cd456a4/CURRENT
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at sun.nio.fs.UnixCopyFile.copy(UnixCopyFile.java:526)
at sun.nio.fs.UnixFileSystemProvider.copy(UnixFileSystemProvider.java:253)
at java.nio.file.Files.copy(Files.java:1274)
at org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreInstanceDirectoryFromPath(RocksDBIncrementalRestoreOperation.java:483)
at org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreFromLocalState(RocksDBIncrementalRestoreOperation.java:218)
at org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreFromRemoteState(RocksDBIncrementalRestoreOperation.java:194)
at org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreWithoutRescaling(RocksDBIncrementalRestoreOperation.java:168)
at org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restore(RocksDBIncrementalRestoreOperation.java:154)
at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackendBuilder.build(RocksDBKeyedStateBackendBuilder.java:279)
... 15 more
Does anyone have an idea of what's causing this?
UPDATE: After some tests, I noticed that this behavior depends on the state backend used. In this case I'm using RocksDBStateBackend with incremental checkpointing enabled. When I switched to FsStateBackend, the error disappeared.
Come to think of it, that would make sense since, from what I understand, checkpoints taken with incremental checkpointing enabled only record the changes compared to the previous completed checkpoint instead of the full job state, so it would not be possible to restore the job from this kind of checkpoint.
If that's correct, I think it would be useful to add a notice to the documentation (https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/checkpoints.html#resuming-from-a-retained-checkpoint).
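For context, a minimal sketch of the kind of configuration involved, assuming a Flink 1.11-era job; the class name, paths, and interval below are placeholders rather than my actual setup:

import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.environment.CheckpointConfig;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointSetup {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(60_000L); // hypothetical 60 s checkpoint interval
        // Keep checkpoints around after the job is cancelled, so they can be used for a restart
        env.getCheckpointConfig().enableExternalizedCheckpoints(
                CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);
        // RocksDB with incremental checkpointing enabled (the case where the restore failed)
        env.setStateBackend(new RocksDBStateBackend("hdfs:///flink/checkpoints", true));
        // Plain filesystem backend (the case where the restore worked); alternative:
        // env.setStateBackend(new FsStateBackend("hdfs:///flink/checkpoints"));
    }
}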

Intermittent errors that go away after a few minutes

Every day, I'm seeing errors come in that go away on their own after a few minutes. Usually they seem related to Cloud SQL, but I've also seen SSL errors:
1) "Lost connection to MySQL server at 'reading initial communication packet', system error: 95"
2) HTTPSConnectionPool(host='accounts.google.com', port=443): Max retries exceeded with url: /o/oauth2/token (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:852)'),))
Anyone have any idea what could be causing this? Or tips on how to diagnose? It almost seems like a service is restarting and the problem gets resolved on its own.

Google PubSub Simultaneous Publish Requests

In Google PubSub, the publish call from the client can be called asynchronously. Because of this, I would think that it would be possible to have multiple publish requests triggered and sent to the server, all at the same time, especially if the batch thresholds are too low.
If this is true, how does the Pub/Sub client control the number of simultaneous publish requests that can be created? Is there a hard limit, or an error that can occur if too many requests are created? Is this the intended use of having an asynchronous publisher, or is it simply to allow other non-publishing activity to occur?
Though this question applies to any of the clients, we are specifically having an issue with the C# client, and are intermittently receiving the following error:
Grpc.Core.RpcException: Status(StatusCode=DeadlineExceeded, Detail="Deadline Exceeded")
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Google.Api.Gax.Grpc.ApiCallRetryExtensions.<>c__DisplayClass0_0`2.<<WithRetry>b__0>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
My thought is that we are sending too many publish requests, but I am not sure.
I would advise against using the raw gRPC code; instead, use the client library, which is a very thin wrapper.
Looking at the client source code always helps me; you can find the C# code here: PublisherClient.cs (thin wrapper).
If you are using PublishAsync, it queues/batches the messages anyway; the behaviour is controlled by the settings you give to the client (see PublisherServiceApiClient for how to tune it). You can also control the number of client connections that are used to send the queues. I suggest playing with the batch size first, then the number of connections, until you find the sweet spot for your throughput.
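For illustration, a rough sketch of that kind of tuning using the Java client (the C# PublisherClient exposes analogous batching settings); the project, topic name, and thresholds are placeholders, not recommendations:

import com.google.api.gax.batching.BatchingSettings;
import com.google.cloud.pubsub.v1.Publisher;
import com.google.protobuf.ByteString;
import com.google.pubsub.v1.PubsubMessage;
import com.google.pubsub.v1.TopicName;
import org.threeten.bp.Duration;

public class TunedPublisher {
    public static void main(String[] args) throws Exception {
        TopicName topic = TopicName.of("my-project", "my-topic"); // placeholders
        BatchingSettings batching = BatchingSettings.newBuilder()
                .setElementCountThreshold(500L)           // max messages per batch
                .setRequestByteThreshold(512L * 1024L)    // max bytes per batch
                .setDelayThreshold(Duration.ofMillis(50)) // max wait before a partial batch is sent
                .build();
        Publisher publisher = Publisher.newBuilder(topic)
                .setBatchingSettings(batching)
                .build();
        publisher.publish(PubsubMessage.newBuilder()
                .setData(ByteString.copyFromUtf8("hello"))
                .build());
        publisher.shutdown();
    }
}

Raising the element/byte thresholds means fewer, larger publish RPCs; lowering the delay threshold trades throughput for latency.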

Weird LDAP error

See stacktrace below
Caused by: javax.naming.AuthenticationException: [LDAP: error code 49 - 8009030C: LdapErr: DSID-0C0903A8, comment: AcceptSecurityContext error, data 52e, v1db1 ]
at com.sun.jndi.ldap.LdapCtx.mapErrorCode(LdapCtx.java:3099)
at com.sun.jndi.ldap.LdapCtx.processReturnCode(LdapCtx.java:3045)
at com.sun.jndi.ldap.LdapCtx.processReturnCode(LdapCtx.java:2847)
at com.sun.jndi.ldap.LdapCtx.connect(LdapCtx.java:2761)
at com.sun.jndi.ldap.LdapCtx.<init>(LdapCtx.java:328)
Can't seem to find the exact error description based on the LdapErr code (DSID-0C0903A8) and the other details. The weird thing is, this happens intermittently. It gets fixed when the application server connecting to LDAP is restarted. We are using WebSphere Application Server 8.5.
Can anyone suggest the cause for this error?
This error will occur when invalid credentials (data 52e) are presented. The trivial reason is a mistyped password or username.
A more sophisticated reason can be an unknown user or an outdated password due to replication issues. This can happen if an administrator creates (or modifies) the user account on DC-1 and WebSphere tries to bind that user against DC-42 before the account has been replicated to that DC. Depending on network topology and latency settings, you may have a lot of time (between seconds and hours) to play that game.
You may want to make sure WebSphere connects to the PDC emulator, so that at least current passwords are known as quickly as possible.
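For what it's worth, a minimal, self-contained JNDI bind sketch (host, user, and password are placeholders) that shows where this exception surfaces and how to spot the 52e case:

import java.util.Hashtable;
import javax.naming.AuthenticationException;
import javax.naming.Context;
import javax.naming.NamingException;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;

public class LdapBindCheck {
    public static void main(String[] args) {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, "ldap://dc01.example.com:389"); // placeholder DC
        env.put(Context.SECURITY_AUTHENTICATION, "simple");
        env.put(Context.SECURITY_PRINCIPAL, "someuser@example.com");  // placeholder bind identity
        env.put(Context.SECURITY_CREDENTIALS, "secret");              // placeholder password
        try {
            DirContext ctx = new InitialDirContext(env); // the bind happens here
            ctx.close();
            System.out.println("bind OK");
        } catch (AuthenticationException e) {
            // "data 52e" inside the message means the DC rejected the credentials
            System.out.println("bind rejected: " + e.getMessage());
        } catch (NamingException e) {
            System.out.println("other LDAP failure: " + e.getMessage());
        }
    }
}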

List database connections that were not disconnected or not properly disconnected

I'm using a dynamic Java web application (Tomcat 8.0.15, Java EE 7 Web) with SQL Server 2008, and after getting the warning/exception
WARNING [Tomcat JDBC Pool Cleaner[510210701:1481713957404]] org.apache.tomcat.jdbc.pool.ConnectionPool.suspect Connection has been marked suspect, possibly abandoned PooledConnection[net.sourceforge.jtds.jdbc.JtdsConnection#510fc080][67975 ms.]:java.lang.Exception
quite often, I wonder whether somewhere in the depths of my source code I forgot to disconnect a JDBC or Hibernate connection to the database. I'd like to list such connections somehow.
A regular
static
{
    try {
        // JNDI lookup of the container-managed connection pool
        Context context = new InitialContext();
        dataSource = (DataSource) context.lookup("java:comp/env/jdbc/sqlserv");
    } catch (NamingException ex) {
        Logger.getLogger(Basisverbindung.class.getName()).log(Level.SEVERE, null, ex);
    }
}
does that job and in my hibernate.cfg.xml it's the same:
<property name="hibernate.connection.datasource">java:comp/env/jdbc/sqlserv</property>
I looked through Stack Overflow and found only a few entries, which I have already consulted (and even upvoted):
Tomcat 7 connection pooling error
WebApp (Tomcat-jdbc) Pooled DB connection throwing abandon exception
https://dba.stackexchange.com/questions/114759/tomcat7-jdbc-connection-pool-connection-has-been-abandoned
But the issue persists or comes up again after a while, so I would like to find a way to track down where I forgot to close the connection. On my Tomcat there's also a PSI Probe instance running, telling me that some errors are coming up in the requests and that the response time is sometimes maxed out.
I see a nice list of requests there but don't know which ones are abandoned.
The Activity Monitor in SQL Server Management Studio is not of much help either; it lists quite a few processes that I know are closed (or, well, should be).
What's the best way to analyze that kind of problem?
What you really want to do is enable "abandoned connection" tracking and reporting.
You didn't say which of Tomcat's JDBC DataSource pools you were using (there are two), but they are configured similarly:
commons-dbcp2-based: logAbandoned=true, removeAbandonedOnBorrow=true
tomcat-jdbc-based: logAbandoned=true, removeAbandoned=true (a programmatic sketch of this variant follows below)
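As an illustration, here is a rough programmatic sketch of the tomcat-jdbc variant; the URL, driver, and credentials are placeholders, and the same attributes can equally be set on the <Resource> element in context.xml:

import org.apache.tomcat.jdbc.pool.DataSource;
import org.apache.tomcat.jdbc.pool.PoolProperties;

public class PoolSetup {
    public static DataSource createPool() {
        PoolProperties p = new PoolProperties();
        p.setUrl("jdbc:jtds:sqlserver://dbhost:1433/mydb");       // placeholder URL
        p.setDriverClassName("net.sourceforge.jtds.jdbc.Driver");
        p.setUsername("app");                                     // placeholder credentials
        p.setPassword("secret");
        // Abandoned-connection tracking: reclaim connections that were borrowed
        // but never closed, and log the stack trace of the borrowing code.
        p.setRemoveAbandoned(true);
        p.setRemoveAbandonedTimeout(60); // seconds in use before a connection counts as abandoned
        p.setLogAbandoned(true);
        p.setSuspectTimeout(60);         // the "marked suspect" warning in the question comes from this check
        DataSource ds = new DataSource();
        ds.setPoolProperties(p);
        return ds;
    }
}

With logAbandoned=true, the pool prints the stack trace of the code that borrowed the leaked connection, which is exactly what you need to find the missing close() call.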
I always recommend everyone run with a maximum pool size of 1 in development environments. This will help you identify pool leakage very quickly, plus catch any potential deadlocks you may have planted in your code.