I am trying to achieve "requests recovery" in fail-over scenario in two different machine with their clock also sync.
My configuration as below:
step 1: camel-context.xml
I have defined the below route in camel-context.xml file.
<route id="quartz" trace="true">
<from uri="quartz2://cluster/quartz?cron=0+0/2+++*+?&durableJob=true&stateful=true&recoverableJob=true">
<route>
step 2: quartz.properties:
I have enabled
org.quartz.jobStore.isClustered = true
org.quartz.scheduler.instanceId = AUTO
org.quartz.scheduler.instanceName =ClusteredScheduler
Currently I am running same camel application in two different instances in my local and clustering is working fine . But when I try to test the "requests recovery" I am getting below exception.
Exception :
[QuartzScheduler_ClusteredScheduler-camelContext-16308243724_ClusterManager] INFO org.quartz.impl.jdbcjobstore.JobStoreTX - ClusterManager: detected 1 failed or restarted instances.
[QuartzScheduler_ClusteredScheduler-camelContext-16308243724_ClusterManager] INFO org.quartz.impl.jdbcjobstore.JobStoreTX - ClusterManager: Scanning for instance "6308270818"'s failed in-progress jobs.
[QuartzScheduler_ClusteredScheduler-camelContext-16308243724_ClusterManager] INFO org.quartz.impl.jdbcjobstore.JobStoreTX - ClusterManager: ......Scheduled 1 recoverable job(s) for recovery.
[ClusteredScheduler-camelContext_Worker-1] WARN org.apache.camel.component.quartz2.CamelJob - Cannot find existing QuartzEndpoint with uri: quartz2://cluster/quartz?cron=0+0%2F2+*+*+*+%3F&durableJob=true&recoverableJob=true&stateful=true. Creating new endpoint instance.
[ClusteredScheduler-camelContext_Worker-1] ERROR org.apache.camel.component.quartz2.CamelJob - Failed to execute CamelJob.
**org.apache.camel.ResolveEndpointFailedException: Failed to resolve endpoint: quartz2://cluster/quartz?cron=0+0%2F2+*+*+*+%3F&durableJob=true&recoverableJob=true&stateful=true due to: Trigger key cluster.quartz is already in used by Endpoint[quartz2://cluster/quartz?cron=0+0%2F2+*+*+*+%3F&durableJob=true&recoverableJob=true&stateful=true]**
at org.apache.camel.impl.DefaultCamelContext.getEndpoint(DefaultCamelContext.java:545)
at org.apache.camel.impl.DefaultCamelContext.getEndpoint(DefaultCamelContext.java:558)
at org.apache.camel.component.quartz2.CamelJob.lookupQuartzEndpoint(CamelJob.java:123)
at org.apache.camel.component.quartz2.CamelJob.execute(CamelJob.java:49)
at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573)
Caused by: java.lang.IllegalArgumentException: Trigger key cluster.quartz is already in used by Endpoint[quartz2://cluster/quartz?cron=0+0%2F2+*+*+*+%3F&durableJob=true&recoverableJob=true&stateful=true]
at org.apache.camel.component.quartz2.QuartzEndpoint.ensureNoDupTriggerKey(QuartzEndpoint.java:272)
at org.apache.camel.component.quartz2.QuartzEndpoint.addJobInScheduler(QuartzEndpoint.java:254)
at org.apache.camel.component.quartz2.QuartzEndpoint.doStart(QuartzEndpoint.java:202)
at org.apache.camel.support.ServiceSupport.start(ServiceSupport.java:61)
at org.apache.camel.impl.DefaultCamelContext.startService(DefaultCamelContext.java:2158)
at org.apache.camel.impl.DefaultCamelContext.doAddService(DefaultCamelContext.java:1016)
at org.apache.camel.impl.DefaultCamelContext.addService(DefaultCamelContext.java:977)
at org.apache.camel.impl.DefaultCamelContext.addService(DefaultCamelContext.java:973)
at org.apache.camel.impl.DefaultCamelContext.getEndpoint(DefaultCamelContext.java:541)
... 5 more
After shutting down the instance1 which is currently excuting the job , instance 2 is trying to recover the job immediately but its failing to execute the job .It is picking the same job in next interval (which is fine).
My requirement is active node immediately recover the failed job.
Thanks in advance.
I think we can avoid the checking of ensureNoDupTriggerKey, if the recoverableJob is true. I just created a JIRA CAMEL-8076 for it.
Related
Background :
I have been trying to setup BATCH + STREAMING in the same flink application which is deployed on kinesis analytics runtime. The STREAMING part works fine, but I'm having trouble adding support for BATCH.
Flink : Handling Keyed Streams with data older than application watermark
Apache Flink : Batch Mode failing for Datastream API's with exception `IllegalStateException: Checkpointing is not allowed with sorted inputs.`
The logic is something like this :
The logic is something like this :
streamExecutionEnvironment.setRuntimeMode(RuntimeExecutionMode.BATCH);
streamExecutionEnvironment.fromSource(FileSource.forRecordStreamFormat(new TextLineFormat(), path).build(),
WatermarkStrategy.noWatermarks(),
"Text File")
.process(process function which transforms input)
.assignTimestampsAndWatermarks(WatermarkStrategy
.<DetectionEvent>forBoundedOutOfOrderness(orderness)
.withTimestampAssigner(
(SerializableTimestampAssigner<Event>) (event, l) -> event.getEventTime()))
.keyBy(keyFunction)
.window(TumblingEventWindows(Time.of(x days))
.process(processWindowFunction);
On doing this I'm getting the below exception :
java.lang.Exception: Exception while creating StreamOperatorStateContext.
at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:254)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:272)
at org.apache.flink.streaming.runtime.tasks.OperatorChain.initializeStateAndOpenOperators(OperatorChain.java:441)
at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask.java:582)
at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.call(StreamTaskActionExecutor.java:55)
at org.apache.flink.streaming.runtime.tasks.StreamTask.executeRestore(StreamTask.java:562)
at org.apache.flink.streaming.runtime.tasks.StreamTask.runWithCleanUpOnFail(StreamTask.java:647)
at org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:537)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:764)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:571)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.apache.flink.util.FlinkException: Could not restore keyed state backend for WindowOperator_90bea66de1c231edf33913ecd54406c1_(1/1) from any of the 1 provided restore options.
at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:160)
at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:345)
at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:163)
... 10 more
Caused by: java.io.IOException: Failed to acquire shared cache resource for RocksDB
at org.apache.flink.contrib.streaming.state.RocksDBOperationUtils.allocateSharedCachesIfConfigured(RocksDBOperationUtils.java:306)
at org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend.createKeyedStateBackend(EmbeddedRocksDBStateBackend.java:426)
at org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend.createKeyedStateBackend(EmbeddedRocksDBStateBackend.java:90)
at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:328)
at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:168)
at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135)
... 12 more
Caused by: java.lang.IllegalArgumentException: The fraction of memory to allocate should not be 0. Please make sure that all types of managed memory consumers contained in the job are configured with a non-negative weight via `taskmanager.memory.managed.consumer-weights`.
at org.apache.flink.util.Preconditions.checkArgument(Preconditions.java:160)
at org.apache.flink.runtime.memory.MemoryManager.validateFraction(MemoryManager.java:672)
at org.apache.flink.runtime.memory.MemoryManager.computeMemorySize(MemoryManager.java:653)
at org.apache.flink.runtime.memory.MemoryManager.getSharedMemoryResourceForManagedMemory(MemoryManager.java:521)
at org.apache.flink.contrib.streaming.state.RocksDBOperationUtils.allocateSharedCachesIfConfigured(RocksDBOperationUtils.java:302)
... 17 more
Seems like kinesis-analytics does not allow clients to define a flink-conf.yaml file to define taskmanager.memory.managed.consumer-weights. Is there any way around this ?
It's not clear to me what the underlying cause of this exception is, nor how to make batch processing work on KDA.
You can try this (but I'm not sure KDA will allow it):
Configuration conf = new Configuration();
conf.setString("taskmanager.memory.managed.consumer-weights", "put-the-value-here");
StreamExecutionEnvironment env =
StreamExecutionEnvironment.getExecutionEnvironment(conf);
I am new to flink. I am trying to run the flink example on my local PC(windows).
However, after I run the start-cluster.bat, I login to the dashboard, it shows the task manager is 0.
I checked the log and seems it fails to initialize:
2020-02-21 23:03:14,202 ERROR org.apache.flink.runtime.taskexecutor.TaskManagerRunner - TaskManager initialization failed.
org.apache.flink.configuration.IllegalConfigurationException: Failed to create TaskExecutorResourceSpec
at org.apache.flink.runtime.taskexecutor.TaskExecutorResourceUtils.resourceSpec.FromConfig(TaskExecutorResourceUtils.java:72)
at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.startTaskManager(TaskManagerRunner.java:356)
at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.<init>(TaskManagerRunner.java:152)
at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.runTaskManager(TaskManagerRunner.java:308)
at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.lambda$runTaskManagerSecurely$2(TaskManagerRunner.java:322)
at org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.runTaskManagerSecurely(TaskManagerRunner.java:321)
at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.main(TaskManagerRunner.java:287)
Caused by: org.apache.flink.configuration.IllegalConfigurationException: The required configuration option Key: 'taskmanager.cpu.cores' , default: null (fallback keys: []) is not set
at org.apache.flink.runtime.taskexecutor.TaskExecutorResourceUtils.checkConfigOptionIsSet(TaskExecutorResourceUtils.java:90)
at org.apache.flink.runtime.taskexecutor.TaskExecutorResourceUtils.lambda$checkTaskExecutorResourceConfigSet$0(TaskExecutorResourceUtils.java:84)
at java.util.Arrays$ArrayList.forEach(Arrays.java:3880)
at org.apache.flink.runtime.taskexecutor.TaskExecutorResourceUtils.checkTaskExecutorResourceConfigSet(TaskExecutorResourceUtils.java:84)
at org.apache.flink.runtime.taskexecutor.TaskExecutorResourceUtils.resourceSpecFromConfig(TaskExecutorResourceUtils.java:70)
... 7 more
2020-02-21 23:03:14,217 INFO org.apache.flink.runtime.blob.TransientBlobCache - Shutting down BLOB cache
Basically, it looks like a required option 'taskmanager.cpu.cores' is not set. However, I can't find this property in flink-conf.yaml and in the document(https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/config.html) either.
I am using flink 1.10.0. Any help would be highly appreciated!
That configuration option is intended for internal use only -- it shouldn't be user configured, which is why it isn't documented.
The windows start-cluster.bat is failing because of a bug introduced in Flink 1.10. See https://jira.apache.org/jira/browse/FLINK-15925.
One workaround is to use the bash script, start-cluster.sh, instead.
See also this mailing list thread: https://lists.apache.org/thread.html/r7693d0c06ac5ced9a34597c662bcf37b34ef8e799c32cc0edee373b2%40%3Cdev.flink.apache.org%3E
Polling app daily once. We are loading data using upsert bulk operation in salesforce. First day it is working fine . Second day on wards we are getting the below error(InvalidSession id ). I found workaround as reconnection strategy and configured even after it is not working.
[2017-06-07 10:02:24.519] ERROR org.mule.retry.notifiers.ConnectNotifier [[test-salesforce-bulk].checkbulkupsert.stage1.05]: Failed to connect/reconnect: Work Descriptor. Root Exception was: InvalidSessionId : Invalid session id. Type: class com.sforce.async.AsyncApiException
[2017-06-07 10:02:24.547] ERROR org.mule.exception.CatchMessagingExceptionStrategy [[test-salesforce-bulk].checkbulkupsert.stage1.05]:
********************************************************************************
Message : Failed to invoke upsertBulk.
Element : /test-salesforce-bulk/processors/4/prepareAccountRequestSubFlow/subprocessors/1 # test-salesforce-bulk
--------------------------------------------------------------------------------
Exception stack is:
Failed to invoke upsertBulk. (org.mule.api.MessagingException)
com.sforce.async.BulkConnection.parseAndThrowException(BulkConnection.java:180)
com.sforce.async.BulkConnection.createOrUpdateJob(BulkConnection.java:164)
com.sforce.async.BulkConnection.createOrUpdateJob(BulkConnection.java:132)
com.sforce.async.BulkConnection.createJob(BulkConnection.java:122)
org.mule.modules.salesforce.SalesforceConnector.createJobInfo(SalesforceConnector.java:2570)
org.mule.modules.salesforce.SalesforceConnector.upsertBulk(SalesforceConnector.java:672)
org.mule.modules.salesforce.generated.processors.UpsertBulkMessageProcessor$1.process(UpsertBulkMessageProcessor.java:153)
(50 more...)
(set debug level logging or '-Dmule.verbose.exceptions=true' for everything)
Can you please help on this.
There error says Invalid Session Id.
This means that the initial session has expired. Check your login history and see if there's even an attempt to login. If there's not, you need to update your username in Mulesoft. If there is and it was unsuccessful, you need to update your password/security token. If there is and it was successful, then it could be that your Session Settings are set to limit calls to the initial IP address or something like that.
I am getting below error while starting one of the manage node in my cluster. There are around 6 manage node and the other 5 are coming up fine. However one node is giving below error. Can someone let me know where i need to look for the cause of this.
<Mar 3, 2017 9:26:10 AM CST> <Error> <Deployer> <WL-149231> <Unable to set the activation state to true for the application 'apsp-ear-trp'.
weblogic.application.ModuleException: Could not setup environment
at weblogic.servlet.internal.WebAppModule.activateContexts(WebAppModule.java:1516)
at weblogic.servlet.internal.WebAppModule.activate(WebAppModule.java:444)
at weblogic.application.internal.flow.ModuleStateDriver$2.next(ModuleStateDriver.java:375)
at weblogic.application.utils.StateMachineDriver.nextState(StateMachineDriver.java:52)
at weblogic.application.internal.flow.ModuleStateDriver.activate(ModuleStateDriver.java:95)
Truncated. see log file for complete stacktrace
Caused By: weblogic.deployment.EnvironmentException: [J2EE:160101]Error: The ejb-link 'UserManager' declared in the ejb-ref or ejb-local-ref 'ejb/UserManager' in the application module 'admin-tools.war' could not be resolved. The target EJB for the ejb-ref could not be found. Please ensure the link is correct.
at weblogic.deployment.BaseEnvironmentBuilder.addEJBLinkRef(BaseEnvironmentBuilder.java:469)
at weblogic.deployment.EnvironmentBuilder.addEJBReferences(EnvironmentBuilder.java:496)
at weblogic.servlet.internal.CompEnv.activate(CompEnv.java:157)
at weblogic.servlet.internal.WebAppServletContext.activate(WebAppServletContext.java:3164)
at weblogic.servlet.internal.WebAppModule.activateContexts(WebAppModule.java:1514)
Truncated. see log file for complete stacktrace
<Mar 3, 2017 9:26:10 AM CST> <Error> <Deployer> <WL-149250> <Unable to unprepare application 'apsp-ear-trp'.
java.lang.NoClassDefFoundError: org/hibernate/event/EventListeners$2
at org.hibernate.event.EventListeners.destroyListeners(EventListeners.java:215)
at org.hibernate.impl.SessionFactoryImpl.close(SessionFactoryImpl.java:850)
at org.hibernate.ejb.EntityManagerFactoryImpl.close(EntityManagerFactoryImpl.java:46)
at weblogic.deployment.BasePersistenceUnitInfoImpl.close(BasePersistenceUnitInfoImpl.java:656)
at weblogic.deployment.PersistenceUnitInfoImpl.close(PersistenceUnitInfoImpl.java:19)
Truncated. see log file for complete stacktrace
Caused By: java.lang.NoClassDefFoundError: org/hibernate/event/EventListeners$2
at org.hibernate.event.EventListeners.destroyListeners(EventListeners.java:215)
at org.hibernate.impl.SessionFactoryImpl.close(SessionFactoryImpl.java:850)
at org.hibernate.ejb.EntityManagerFactoryImpl.close(EntityManagerFactoryImpl.java:46)
at weblogic.deployment.BasePersistenceUnitInfoImpl.close(BasePersistenceUnitInfoImpl.java:656)
at weblogic.deployment.PersistenceUnitInfoImpl.close(PersistenceUnitInfoImpl.java:19)
Truncated. see log file for complete stacktrace
I'm using RemoteAPI to fetch entities from GAE Datastore, 300 at a time.
I'm doing something along the lines of:
while(!(emails = getEmails()).isEmpty()) {
Filter filter = new FilterPredicate("email", FilterOperator.IN, emails)
Query query = new Query("MyEntity").setFilter(filter);
QueryResultIterable<Entity> result = ds.prepare(query).asQueryResultIterable();
for (Entity entity : result) {
System.out.println(entity.getProperty("name"));
}
}
I'm processing something like 50k emails. The first time I ran this code it got to maybe 3/4 of the way, then it threw the following exception. Now it throws it after a single loop iteration is run.
com.google.appengine.tools.remoteapi.RemoteApiException: remote API call: I/O error
at com.google.appengine.tools.remoteapi.RemoteRpc.makeException(RemoteRpc.java:160)
at com.google.appengine.tools.remoteapi.RemoteRpc.callImpl(RemoteRpc.java:104)
at com.google.appengine.tools.remoteapi.RemoteRpc.call(RemoteRpc.java:50)
at com.google.appengine.tools.remoteapi.RemoteDatastore.runQuery(RemoteDatastore.java:156)
at com.google.appengine.tools.remoteapi.RemoteDatastore.handleRunQuery(RemoteDatastore.java:115)
at com.google.appengine.tools.remoteapi.RemoteDatastore.handleDatastoreCall(RemoteDatastore.java:93)
at com.google.appengine.tools.remoteapi.RemoteApiDelegate.makeDefaultSyncCall(RemoteApiDelegate.java:57)
at com.google.appengine.tools.remoteapi.StandaloneRemoteApiDelegate.makeSyncCall(StandaloneRemoteApiDelegate.java:47)
at com.google.appengine.tools.remoteapi.StandaloneRemoteApiDelegate$1.call(StandaloneRemoteApiDelegate.java:58)
at com.google.appengine.tools.remoteapi.StandaloneRemoteApiDelegate$1.call(StandaloneRemoteApiDelegate.java:54)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:152)
at java.net.SocketInputStream.read(SocketInputStream.java:122)
at sun.security.ssl.InputRecord.readFully(InputRecord.java:442)
at sun.security.ssl.InputRecord.read(InputRecord.java:480)
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:934)
at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:891)
at sun.security.ssl.AppInputStream.read(AppInputStream.java:102)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:690)
at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1324)
at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:468)
at sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:338)
at com.google.appengine.repackaged.com.google.api.client.http.javanet.NetHttpResponse.<init>(NetHttpResponse.java:37)
at com.google.appengine.repackaged.com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:94)
at com.google.appengine.repackaged.com.google.api.client.http.HttpRequest.execute(HttpRequest.java:972)
at com.google.appengine.tools.remoteapi.OAuthClient.post(OAuthClient.java:54)
at com.google.appengine.tools.remoteapi.RemoteRpc.callImpl(RemoteRpc.java:102)
... 12 more
I can't figure out what the problem is, but the code seems to be evaluating the for() condition before throwing the exception.
Could this be a quota problem? The quota details screen doesn't show any problems and I can't find any relevant information in the documentation.
For future readers of this question, if you see occurrences of RemoteApiException: remote API call: I/O error which are happening consistently and not intermittently, this could be related to a disruption in network connectivity or possibly a remote issue on the App Engine side.
If the first possibility is ruled out, the best course of action is to report the issue on the Google App Engine issue tracker.
To fix this, first, check your Internet connection. Then clean all artifacts and build them again by (with IntelliJ):
Go to Build => Build Artifacts...
Focus on All Artifacts => Clean
Focus on All Artifacts => Build