Flink job submission failed for akka.pattern.AskTimeoutException - apache-flink

Job submission failed for akka.pattern.AskTimeoutException, I have tried to set akka.ask.timeout: "60s", still failed for the same error. Any idea?
akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/dispatcher#-1594246162]] after [10000 ms]. Message of type [org.apache.flink.runtime.rpc.messages.LocalFencedMessage]. A typical reason for `AskTimeoutException` is that the recipient actor didn't send a reply.
at akka.pattern.PromiseActorRef$$anonfun$2.apply(AskSupport.scala:635)
at akka.pattern.PromiseActorRef$$anonfun$2.apply(AskSupport.scala:635)
at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:648)
at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:205)
at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:328)
at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:279)
at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:283)
at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:235)

This seems to be a known issue FLINK-11143. Hope it could be resolved soon.

Related

How to handle camel-quickfix CannotSendException?

I am using QuickFix/J 1.6.4 in a camel-quickfix component. Sometimes, I get the CannotSendException and it is not clear, what the exception cause is. I looked at the code, but there it seems that this exception is thrown, when the message could not be send - regardless of the reason.
So how do I have to handle this exception? Do I have to implement a retry mechanism, or does the engine handle this for me? If the engine does, how can I verify, that the message is sent afterwards?
Exception Stacktrace (I replaced the FIX message with *** because I cannot publish it):
org.apache.camel.component.quickfixj.CannotSendException: Cannot send FIX message: ***
at org.apache.camel.component.quickfixj.QuickfixjProducer.sendMessage(QuickfixjProducer.java:75)
at org.apache.camel.component.quickfixj.QuickfixjProducer.process(QuickfixjProducer.java:47)
at org.apache.camel.util.AsyncProcessorConverterHelper$ProcessorToAsyncProcessorBridge.process(AsyncProcessorConverterHelper.java:61)
at org.apache.camel.processor.SendProcessor.process(SendProcessor.java:145)
at org.apache.camel.management.InstrumentationProcessor.process(InstrumentationProcessor.java:77)
at org.apache.camel.processor.RedeliveryErrorHandler.process(RedeliveryErrorHandler.java:468)
at org.apache.camel.spring.spi.TransactionErrorHandler.processByErrorHandler(TransactionErrorHandler.java:220)
at org.apache.camel.spring.spi.TransactionErrorHandler.process(TransactionErrorHandler.java:101)
at org.apache.camel.processor.CamelInternalProcessor.process(CamelInternalProcessor.java:171)
at org.apache.camel.processor.Pipeline.process(Pipeline.java:121)
at org.apache.camel.processor.Pipeline.process(Pipeline.java:83)
at org.apache.camel.processor.ChoiceProcessor.process(ChoiceProcessor.java:117)
at org.apache.camel.management.InstrumentationProcessor.process(InstrumentationProcessor.java:77)
at org.apache.camel.processor.RedeliveryErrorHandler.process(RedeliveryErrorHandler.java:468)
at org.apache.camel.spring.spi.TransactionErrorHandler.processByErrorHandler(TransactionErrorHandler.java:220)
at org.apache.camel.spring.spi.TransactionErrorHandler$1.doInTransactionWithoutResult(TransactionErrorHandler.java:183)
at org.springframework.transaction.support.TransactionCallbackWithoutResult.doInTransaction(TransactionCallbackWithoutResult.java:33)
at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:131)
at org.apache.camel.spring.spi.TransactionErrorHandler.doInTransactionTemplate(TransactionErrorHandler.java:176)
at org.apache.camel.spring.spi.TransactionErrorHandler.processInTransaction(TransactionErrorHandler.java:136)
at org.apache.camel.spring.spi.TransactionErrorHandler.process(TransactionErrorHandler.java:105)
at org.apache.camel.spring.spi.TransactionErrorHandler.process(TransactionErrorHandler.java:114)
at org.apache.camel.processor.CamelInternalProcessor.process(CamelInternalProcessor.java:196)
at org.apache.camel.processor.Pipeline.process(Pipeline.java:121)
at org.apache.camel.processor.Pipeline.process(Pipeline.java:83)
at org.apache.camel.processor.CamelInternalProcessor.process(CamelInternalProcessor.java:196)
at org.apache.camel.util.AsyncProcessorHelper.process(AsyncProcessorHelper.java:109)
at org.apache.camel.processor.DelegateAsyncProcessor.process(DelegateAsyncProcessor.java:91)
at org.apache.camel.component.jms.EndpointMessageListener.onMessage(EndpointMessageListener.java:112)
at org.springframework.jms.listener.AbstractMessageListenerContainer.doInvokeListener(AbstractMessageListenerContainer.java:555)
at org.springframework.jms.listener.AbstractMessageListenerContainer.invokeListener(AbstractMessageListenerContainer.java:515)
at org.springframework.jms.listener.AbstractMessageListenerContainer.doExecuteListener(AbstractMessageListenerContainer.java:485)
at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.doReceiveAndExecute(AbstractPollingMessageListenerContainer.java:325)
at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.receiveAndExecute(AbstractPollingMessageListenerContainer.java:243)
at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.invokeListener(DefaultMessageListenerContainer.java:1103)
at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.executeOngoingLoop(DefaultMessageListenerContainer.java:1095)
at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.run(DefaultMessageListenerContainer.java:992)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1164)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:634)
at java.lang.Thread.run(Thread.java:809)
I have found the nabble on this topic and in camel-quickfix versions before Camel 2.10.0 it was probably not possible to ensure that a FIX message was sent. This has been changed with the CAMEL-5247 patch and the CannotSendException has been added. So it is now possible to configure the sending of FIX messages as a synchronous write with SocketSynchronousWrites, even if it is not recommended for performance reasons.
Write messages synchronously. This is not generally recommended as it may result in performance degradation. The MINA communication layer is asynchronous by design, but this option will override that behavior if needed.
So now you can also get info about failed sending of FIX messages, even if you do not use SocketSynchronousWrites. The error handling must then be implemented in the application.

Handle large messages with Apache Camel and AMQ Artemis

When I receive a large message (100KiB+) in a AMQ Artemis Queue and try to route this message to another AMQ and this message have the property _AMQ_LARGE_SIZE I got the follow error:
14:38:56.250 [Camel (CamelTestRoute) thread #1 - JmsConsumer[QUEUE.TEST]] WARN o.a.c.c.jms.EndpointMessageListener - Execution of JMS message listener failed. Caused by: [org.apache.camel.RuntimeCamelException - javax.jms.JMSRuntimeException: Invalid address QUEUE.TEST]
org.apache.camel.RuntimeCamelException: javax.jms.JMSRuntimeException: Invalid address QUEUE.TEST
I know if I set the property minLargeMessageSize in the Connection Factory that post the message in AMQ, this problem does not happens.
The problem is, I don't have control of the codes that create the Connection Factories, and some times they don't set the Large Message Size property.
Is there a way that can I handle this in Camel with my Connection Factory?
*EDIT
16:33:03.836 [Camel (CamelTestRoute) thread #1 - JmsConsumer[QUEUE.TEST]] WARN o.a.c.c.jms.EndpointMessageListener - Execution of JMS message listener failed. Caused by: [org.apache.camel.RuntimeCamelException - javax.jms.JMSRuntimeException: Invalid address QUEUE.TEST]
org.apache.camel.RuntimeCamelException: javax.jms.JMSRuntimeException: Invalid address QUEUE.TEST
at org.apache.camel.util.ObjectHelper.wrapRuntimeCamelException(ObjectHelper.java:1830)
at org.apache.camel.component.jms.EndpointMessageListener$EndpointMessageListenerAsyncCallback.done(EndpointMessageListener.java:196)
at org.apache.camel.component.jms.EndpointMessageListener.onMessage(EndpointMessageListener.java:117)
at org.springframework.jms.listener.AbstractMessageListenerContainer.doInvokeListener(AbstractMessageListenerContainer.java:719)
at org.springframework.jms.listener.AbstractMessageListenerContainer.invokeListener(AbstractMessageListenerContainer.java:679)
at org.springframework.jms.listener.AbstractMessageListenerContainer.doExecuteListener(AbstractMessageListenerContainer.java:649)
at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.doReceiveAndExecute(AbstractPollingMessageListenerContainer.java:317)
at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.receiveAndExecute(AbstractPollingMessageListenerContainer.java:255)
at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.invokeListener(DefaultMessageListenerContainer.java:1168)
at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.executeOngoingLoop(DefaultMessageListenerContainer.java:1160)
at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.run(DefaultMessageListenerContainer.java:1057)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: javax.jms.JMSRuntimeException: Invalid address QUEUE.TEST
at org.apache.activemq.artemis.jms.client.ActiveMQDestination.fromAddress(ActiveMQDestination.java:119)
at org.apache.activemq.artemis.jms.client.ActiveMQMessage.getJMSDestination(ActiveMQMessage.java:386)
at org.apache.camel.component.jms.JmsBinding.extractHeadersFromJms(JmsBinding.java:187)
at org.apache.camel.component.jms.JmsMessage.populateInitialHeaders(JmsMessage.java:229)
at org.apache.camel.impl.DefaultMessage.createHeaders(DefaultMessage.java:257)
at org.apache.camel.component.jms.JmsMessage.ensureInitialHeaders(JmsMessage.java:214)
at org.apache.camel.component.jms.JmsMessage.getHeader(JmsMessage.java:164)
at org.apache.camel.impl.DefaultMessage.getHeader(DefaultMessage.java:93)
at org.apache.camel.impl.DefaultUnitOfWork.<init>(DefaultUnitOfWork.java:115)
at org.apache.camel.impl.MDCUnitOfWork.<init>(MDCUnitOfWork.java:54)
at org.apache.camel.impl.DefaultUnitOfWorkFactory.createUnitOfWork(DefaultUnitOfWorkFactory.java:32)
at org.apache.camel.processor.CamelInternalProcessor$UnitOfWorkProcessorAdvice.createUnitOfWork(CamelInternalProcessor.java:695)
at org.apache.camel.processor.CamelInternalProcessor$UnitOfWorkProcessorAdvice.before(CamelInternalProcessor.java:663)
at org.apache.camel.processor.CamelInternalProcessor$UnitOfWorkProcessorAdvice.before(CamelInternalProcessor.java:634)
at org.apache.camel.processor.CamelInternalProcessor.process(CamelInternalProcessor.java:149)
at org.apache.camel.processor.DelegateAsyncProcessor.process(DelegateAsyncProcessor.java:97)
at org.apache.camel.component.jms.EndpointMessageListener.onMessage(EndpointMessageListener.java:113)
... 11 common frames omitted
If you're using an Artemis 1.x client against an Artemis 2.x broker then you need to configure the acceptor that the client is connecting to with the appropriate anycastPrefix and multicastPrefix, e.g:
<acceptor name="artemis">tcp://0.0.0.0:61616?tcpSendBufferSize=1048576;tcpReceiveBufferSize=1048576;protocols=CORE,AMQP,STOMP,HORNETQ,MQTT,OPENWIRE;anycastPrefix=jms.queue.;multicastPrefix=jms.topic.;useEpoll=true;amqpCredits=1000;amqpLowCredits=300</acceptor>

How to apply expiration message in dead letter queue (amq)

<policyEntry queue="Activemq.DLQ">
<deadLetterStrategy>
<sharedDeadLetterStrategy processExpired="false" expiration="300000"/>
</deadLetterStrategy>
 </policyEntry>
above is my amq.xml changes. when i applied this changes, this broker unable to start easy to say is not running. My intentions is to put expiration towards all the messages in dlq and auto discard it. Any help is appreciated below is the error from amq log
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)[:1.7.0_45]
at java.lang.reflect.Method.invoke(Method.java:606)[:1.7.0_45]
at org.apache.activemq.console.Main.runTaskClass(Main.java:262)[activemq.jar:5.10.0]
at org.apache.activemq.console.Main.main(Main.java:115)[activemq.jar:5.10.0]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)[:1.7.0_45]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)[:1.7.0_45]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)[:1.7.0_45]
at java.lang.reflect.Method.invoke(Method.java:606)[:1.7.0_45]
at org.tanukisoftware.wrapper.WrapperSimpleApp.run(WrapperSimpleApp.java:240)[wrapper.jar:3.2.3]
at java.lang.Thread.run(Thread.java:744)[:1.7.0_45]

How can I see if a job failed and why?

How can I use ClusterClient to check if a job failed and why?
ClusterClient#getJobStatus may seem like a good first candidate but it only says if the job failed without any information regarding the exceptions.
The submission of the job is being done with a detached client therefore waiting for its ClusterClient#run to return a JobExecutionResult is not an option.
I've also tried
RestClusterClient#retrieveJob also does not work, failing with:
org.apache.flink.runtime.client.JobRetrievalException: Couldn't
retrieve leading JobManager. at
org.apache.flink.runtime.client.JobListeningContext.getJobManager(JobListeningContext.java:157)
at
org.apache.flink.runtime.client.JobListeningContext.getClassLoader(JobListeningContext.java:141)
at
org.apache.flink.runtime.client.JobClient.awaitJobResult(JobClient.java:262)
at
org.apache.flink.client.program.ClusterClient.retrieveJob(ClusterClient.java:586)
at java.lang.Thread.run(Thread.java:745)
Caused by:
org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException:
Could not retrieve the leader gateway. at
org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderGateway(LeaderRetrievalUtils.java:82)
at
org.apache.flink.runtime.client.JobListeningContext.getJobManager(JobListeningContext.java:152)
... 10 more
Caused by: java.util.concurrent.TimeoutException: Futures
timed out after [10000 milliseconds] at
scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223)
at
scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)
at
scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:190) at
scala.concurrent.Await.result(package.scala) at
org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderGateway(LeaderRetrievalUtils.java:80)
... 11 more
Use NewClusterClient#requestJobResult which can be done using a RestClusterClient.

Camel with RabbitMQ exception only occurs on second message - mis-spelt exchange name

I'm using Camel within a Spring boot application and integrate with RabbitMQ but am encountering strange behaviour.
My app has Restful endpointswhich convert the http request to a RabbitMQ message and publish this to a predefined exchange. There is a separate consumer app which listens to a queue and processes the messages.
I have deliberately entered an incorrect rabbitmq exchange name (invalidxchangename)to check that the application will fail if the exchange does not exist however the camel context starts without error and when I send in a first request is does not report any error. This message gets lost as there is no matching RabbitMQ exchange. When I submit a second request I receive the following exception which I would have expected on route startup.
com.rabbitmq.client.AlreadyClosedException: channel is already closed due to channel error; protocol method: #method<channel.close>(reply-code=404, reply-text=NOT_FOUND - no exchange 'invalidxchangename' in vhost
EDIT:
I've tried a more simple example to show the issue in Camel.
I've created a simple route as follows:
from("file:in?fileName=in.txt").log(LoggingLevel.DEBUG, "in here!").to("rabbitmq://localhost:5762/invalidexchange?declare=false");
where there is an existing RabbitMQ exchange called validexchange (so I have deliberately made a typo in the RabbitMQ uri). I would expect the camel route to fail at startup since the exchange doesn't exist, or even the first time it tries to process a new in.txt file.
What I am actually seeing in the logs is that on start up it reports no error and only on the 2nd invocation of the route does it report an error.
2015-03-11 16:17:04.356 INFO 9756 : ID-SBMELW7W-06220-59960-1426051020468-0-2 >>> (route2) from(file://in?fileName=in.txt) --> log[in here!] <<< Pattern:InOnly, Headers:...
2015-03-11 16:17:04.360 INFO 9756 : ID-SBMELW7W-06220-59960-1426051020468-0-2 >>> (route2) log[in here!] --> rabbitmq://localhost:5762/customerchannel.exchang?declare=false <<< Pattern:InOnly, Headers:...
2015-03-11 16:17:45.073 INFO 9756 : ID-SBMELW7W-06220-59960-1426051020468-0-4 >>> (route2) from(file://in?fileName=in.txt) --> log[in here!] <<< Pattern:InOnly, Headers: ...
2015-03-11 16:17:45.079 INFO 9756 : ID-SBMELW7W-06220-59960-1426051020468-0-4 >>> (route2) log[in here!] --> rabbitmq://localhost:5762/customerchannel.exchang?declare=false <<< Pattern:InOnly, Headers:...
2015-03-11 16:17:45.092 ERROR 9756 : Failed delivery for (MessageId: ID-SBMELW7W-06220-59960-1426051020468-0-3 on ExchangeId: ID-SBMELW7W-06220-59960-1426051020468-0-4). Exhausted after delivery attempt: 1 caught: com.rabbitmq.client.AlreadyClosedException: channel is already closed due to channel error; protocol method: #method<channel.close>(reply-code=404, reply-text=NOT_FOUND - no exchange 'customerchannel.exchang' in vhost '/', class-id=60, method-id=40)
It looks like the first request is causing an error which closes the connection and logs the reason, and when you try to use the channel the second time it's returning an AlreadyClosedException with the message that caused the channel to close in the first call.
You can test this by trying to publish the second message to a different exchange name in the same channel and checking which exchange is in the error. E.g. publish the second message to invalidxchangename2 and you should still see invalidxchangename as the exchange in the error.
To fix, you should handle the publish result when you publish and re-establish the connection if there's an error.
If you want to be sure that a message got delivered to a RabbitMQ queue, then you have to use publisher confirms: https://www.rabbitmq.com/confirms.html
That you are able to publish a message it doesn't mean that the message will reach a queue. You could go to a mailbox and leave a letter inside, but between the time you left the letter there and a postman picked up, many things could have happened, for example, the mailbox catching fire and so on.

Resources