Is there any example to follow for using the Apache Camel Atmosphere Websocket component (atmosphere-websocket)? I am trying a basic thing: sending a message from the client (HTML5 WebSocket) to the server (Camel atmosphere-websocket component) and sending it back to the client. The server I am using is vFabric tc Runtime 2.9.5.SR1 or Tomcat 7.0.50.
My route looks like this:
from("atmosphere-websocket://localhost:8181/test1").to("log:body ${body}").to("atmosphere-websocket://localhost:8181/test1");
JavaScript code:
var ws = new WebSocket("ws://localhost:8181/test1");
ws.onopen = function (e) {
console.log("open");
}
ws.onclose = function (e) {
console.log("close");
}
ws.onerror = function (e) {
console.log("error");
}
ws.onmessage = function (e) {
console.log("message");
}
But I am getting an error:
WebSocket connection to 'ws://localhost:8181/test1' failed: Error during WebSocket handshake: Unexpected response code: 403
From the log I can see that the route and the Atmosphere framework started without any error:
2015-03-10 16:13:14,391 [ost-startStop-1] WARN IOUtils - META-INF/services/org.atmosphere.cpr.AtmosphereFramework not found in class loader
2015-03-10 16:13:14,435 [ost-startStop-1] INFO AtmosphereFramework - Atmosphere is using org.atmosphere.cpr.DefaultAnnotationProcessor for processing annotation
2015-03-10 16:13:14,436 [ost-startStop-1] INFO DefaultAnnotationProcessor - AnnotationProcessor class org.atmosphere.cpr.DefaultAnnotationProcessor$BytecodeBasedAnnotationProcessor being used
2015-03-10 16:13:14,482 [ost-startStop-1] INFO AtmosphereFramework - Auto detecting atmosphere handlers /WEB-INF/classes/
2015-03-10 16:13:14,484 [ost-startStop-1] INFO AtmosphereFramework - Auto detecting WebSocketHandler in /WEB-INF/classes/
2015-03-10 16:13:14,485 [ost-startStop-1] INFO AtmosphereFramework - Installed WebSocketProtocol org.apache.camel.component.atmosphere.websocket.WebsocketHandler
2015-03-10 16:13:14,516 [ost-startStop-1] INFO AtmosphereFramework - Installed AtmosphereHandler org.atmosphere.cpr.AtmosphereFramework$5 mapped to context-path: /*
2015-03-10 16:13:14,516 [ost-startStop-1] INFO AtmosphereFramework - Installed the following AtmosphereInterceptor mapped to AtmosphereHandler org.atmosphere.cpr.AtmosphereFramework$5
2015-03-10 16:13:14,516 [ost-startStop-1] INFO AtmosphereFramework - Installing Default AtmosphereInterceptor
2015-03-10 16:13:14,517 [ost-startStop-1] INFO AtmosphereFramework - org.atmosphere.interceptor.CorsInterceptor : CORS Interceptor Support
2015-03-10 16:13:14,517 [ost-startStop-1] INFO AtmosphereFramework - org.atmosphere.interceptor.CacheHeadersInterceptor : Default Response's Headers Interceptor
2015-03-10 16:13:14,520 [ost-startStop-1] INFO AtmosphereFramework - org.atmosphere.interceptor.PaddingAtmosphereInterceptor : Browser Padding Interceptor Support
2015-03-10 16:13:14,520 [ost-startStop-1] INFO AtmosphereFramework - org.atmosphere.interceptor.AndroidAtmosphereInterceptor : Android Interceptor Support
2015-03-10 16:13:14,521 [ost-startStop-1] INFO AtmosphereFramework - org.atmosphere.interceptor.HeartbeatInterceptor : Heartbeat Interceptor Support
2015-03-10 16:13:14,521 [ost-startStop-1] INFO AtmosphereFramework - org.atmosphere.interceptor.SSEAtmosphereInterceptor : SSE Interceptor Support
2015-03-10 16:13:14,521 [ost-startStop-1] INFO AtmosphereFramework - org.atmosphere.interceptor.JSONPAtmosphereInterceptor : JSONP Interceptor Support
2015-03-10 16:13:14,524 [ost-startStop-1] INFO AtmosphereFramework - org.atmosphere.interceptor.JavaScriptProtocol : Atmosphere JavaScript Protocol
2015-03-10 16:13:14,524 [ost-startStop-1] INFO AtmosphereFramework - org.atmosphere.interceptor.WebSocketMessageSuspendInterceptor : org.atmosphere.interceptor.WebSocketMessageSuspendInterceptor
2015-03-10 16:13:14,525 [ost-startStop-1] INFO AtmosphereFramework - org.atmosphere.interceptor.OnDisconnectInterceptor : Browser disconnection detection
2015-03-10 16:13:14,525 [ost-startStop-1] INFO AtmosphereFramework - org.atmosphere.interceptor.IdleResourceInterceptor : org.atmosphere.interceptor.IdleResourceInterceptor
2015-03-10 16:13:14,525 [ost-startStop-1] INFO AtmosphereFramework - Set org.atmosphere.cpr.AtmosphereInterceptor.disableDefaults to disable them.
2015-03-10 16:13:14,532 [ost-startStop-1] INFO AtmosphereFramework - Using EndpointMapper class org.atmosphere.util.DefaultEndpointMapper
2015-03-10 16:13:14,532 [ost-startStop-1] WARN AtmosphereFramework - No BroadcasterCache configured. Broadcasted message between client reconnection will be LOST. It is recommended to configure the org.atmosphere.cache.UUIDBroadcasterCache
2015-03-10 16:13:14,532 [ost-startStop-1] INFO AtmosphereFramework - Default Broadcaster Class: org.atmosphere.cpr.DefaultBroadcaster
2015-03-10 16:13:14,532 [ost-startStop-1] INFO AtmosphereFramework - Broadcaster Polling Wait Time 100
2015-03-10 16:13:14,532 [ost-startStop-1] INFO AtmosphereFramework - Shared ExecutorService supported: true
2015-03-10 16:13:14,532 [ost-startStop-1] INFO AtmosphereFramework - Messaging Thread Pool Size: Unlimited
2015-03-10 16:13:14,532 [ost-startStop-1] INFO AtmosphereFramework - Async I/O Thread Pool Size: 200
2015-03-10 16:13:14,533 [ost-startStop-1] INFO AtmosphereFramework - Using BroadcasterFactory: org.atmosphere.cpr.DefaultBroadcasterFactory
2015-03-10 16:13:14,533 [ost-startStop-1] INFO AtmosphereFramework - Using WebSocketProcessor: org.atmosphere.websocket.DefaultWebSocketProcessor
2015-03-10 16:13:14,533 [ost-startStop-1] INFO AtmosphereFramework - Invoke AtmosphereInterceptor on WebSocket message true
2015-03-10 16:13:14,533 [ost-startStop-1] INFO AtmosphereFramework - HttpSession supported: false
2015-03-10 16:13:14,533 [ost-startStop-1] INFO AtmosphereFramework - Atmosphere is using DefaultAtmosphereObjectFactory for dependency injection and object creation
2015-03-10 16:13:14,533 [ost-startStop-1] INFO AtmosphereFramework - Atmosphere is using async support: org.atmosphere.container.Tomcat7Servlet30SupportWithWebSocket running under container: Undefined using javax.servlet/3.0
2015-03-10 16:13:14,535 [ost-startStop-1] INFO AtmosphereFramework - Atmosphere Framework 2.2.0 started.
2015-03-10 16:13:14,576 [ost-startStop-1] INFO SpringCamelContext - Route: route4 started and consuming from: Endpoint[atmosphere-websocket://localhost:8181/test1]
When you are using an atmosphere-websocket endpoint, you should just specify the path. There is a Java DSL sample in the unit test folder of the component.
There is also an OSGi sample and its instructions at
https://github.com/elakito/testzone/tree/master/samples/osgi_camel_websocket_sample_route_bp
https://github.com/elakito/testzone/blob/master/samples/instruction_osgi_camel_websocket_sample_route.txt
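For illustration, here is a minimal sketch of such a route (my own sketch, not the component's official sample), assuming the usual servlet-based setup where the component's websocket servlet is mapped in web.xml: the endpoint URI carries only the service path, while host and port are owned by the servlet container.

import org.apache.camel.builder.RouteBuilder;

public class WebsocketEchoRoute extends RouteBuilder {
    @Override
    public void configure() {
        from("atmosphere-websocket:///test1")       // frames sent by the browser arrive here
            .log("received: ${body}")               // log the incoming payload
            .to("atmosphere-websocket:///test1");   // send the payload back over the websocket
    }
}

With this style the browser would connect to something like ws://host:port/<webapp-context>/test1, where host and port come from Tomcat rather than from the endpoint URI.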
Related
I am executing the Camel code on Windows using Eclipse and it is working fine.
However, when I execute the same code standalone on Linux, the route prints the first log statement, but when fetching the file it stops without any error.
Here is my code:
from("timer://alertstrigtimer?period=90s&repeatCount=1")
.log(LoggingLevel.INFO, "*******************************Job-Alert-System: Started: alertstrigtimer******************************" + getFileURI(getWorkFilePath(), getWorkFileName()))
.pollEnrich(getFileURI(getWorkFilePath(), getWorkFileName()))
.log(LoggingLevel.INFO, "*******************************Job-Alert-System: Started: alertstrigtimer******************************" + getFileURI(getWorkFilePath(), getWorkFileName()))
.choice()
.when(header("CamelFileName").isNull())
.log(LoggingLevel.INFO, "No File")
.process(new Processor() {
public void process(Exchange exchange) throws Exception {
log.info("Job-Alert-System: No Date File Exist!!!! Calculate 15 Minutes Back and fetching data from Masterdata");
// Do something
}
})
.otherwise()
.log(LoggingLevel.INFO, "Job Alert System: Date File Loaded: ${header.CamelFileName} at ${header.CamelFileLastModified}")
.process(new Processor() {
    public void process(Exchange exchange) throws Exception {
        // Do something by a processor
    }
});
public static String getFileURI(String filePath, String fileName) {
return "file://" + filePath + "?fileName=" + fileName
+ "&preMove=$simple{file:onlyname.noext}.$simple{date:now:yyyy-MM-dd'T'hh-mm-ss}";
}
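For reference, pollEnrich also has an overload that takes an explicit timeout, so the route continues instead of waiting forever when the file never appears. Below is a minimal sketch of that variant; the file URI and the 30-second timeout are hypothetical values for illustration, not the ones built by getFileURI above.

import org.apache.camel.Exchange;
import org.apache.camel.LoggingLevel;
import org.apache.camel.builder.RouteBuilder;

public class AlertsPollSketch extends RouteBuilder {
    @Override
    public void configure() {
        from("timer://alertstrigtimer?period=90s&repeatCount=1")
            // wait at most 30 seconds for the file instead of blocking indefinitely
            .pollEnrich("file:///tmp/alerts?fileName=LastExecutionTime_JobAlerts.txt", 30000)
            .choice()
                .when(header(Exchange.FILE_NAME).isNull())
                    .log(LoggingLevel.INFO, "No file arrived before the timeout")
                .otherwise()
                    .log(LoggingLevel.INFO, "Loaded: ${header.CamelFileName}")
            .end();
    }
}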
Here are my logs from the Linux environment:
[main] INFO org.apache.camel.impl.DefaultCamelContext - Apache Camel 2.21.1 (CamelContext: camel-1) is starting
[main] INFO org.apache.camel.management.ManagedManagementStrategy - JMX is enabled
[main] INFO org.apache.camel.impl.converter.DefaultTypeConverter - Type converters loaded (core: 194, classpath: 0)
[main] INFO org.apache.camel.impl.DefaultCamelContext - StreamCaching is not in use. If using streams then its recommended to enable stream caching. See more details at http://camel.apache.org/stream-caching.html
[main] INFO org.apache.camel.impl.DefaultCamelContext - Route: route1 started and consuming from: timer://alertstrigtimer?period=90s&repeatCount=1
[main] INFO org.apache.camel.impl.DefaultCamelContext - Route: loadDataAndAlerts started and consuming from: direct://loadDataAndAlerts
[main] INFO org.apache.camel.impl.DefaultCamelContext - Total 2 routes, of which 2 are started
[main] INFO org.apache.camel.impl.DefaultCamelContext - Apache Camel 2.21.1 (CamelContext: camel-1) started in 0.664 seconds
[Camel (camel-1) thread #1 - timer://alertstrigtimer] INFO route1 - *******************************Job-Alert-System: Started: alertstrigtimer******************************file:///shared/wildfly/work-files/alerts?fileName=LastExecutionTime_JobAlerts.txt&preMove=.2020-10-12T06-48-16
It stops here. It creates a directory structure, but does not move forward.
Logs from My Local Machine:
[main] INFO org.apache.camel.impl.DefaultCamelContext - Apache Camel 2.21.1 (CamelContext: camel-1) is starting
[main] INFO org.apache.camel.management.ManagedManagementStrategy - JMX is enabled
[main] INFO org.apache.camel.impl.converter.DefaultTypeConverter - Type converters loaded (core: 194, classpath: 5)
[main] INFO org.apache.camel.impl.DefaultCamelContext - StreamCaching is not in use. If using streams then its recommended to enable stream caching. See more details at http://camel.apache.org/stream-caching.html
[main] INFO org.apache.camel.impl.DefaultCamelContext - Route: route1 started and consuming from: timer://alertstrigtimer?period=90s&repeatCount=1
[main] INFO org.apache.camel.impl.DefaultCamelContext - Route: loadDataAndAlerts started and consuming from: direct://loadDataAndAlerts
[main] INFO org.apache.camel.impl.DefaultCamelContext - Total 2 routes, of which 2 are started
[main] INFO org.apache.camel.impl.DefaultCamelContext - Apache Camel 2.21.1 (CamelContext: camel-1) started in 0.845 seconds
[Camel (camel-1) thread #1 - timer://alertstrigtimer] INFO route1 - *******************************Job-Alert-System: Started: alertstrigtimer******************************file://null?fileName=null&preMove=null.2020-10-12T10-28-51
[Camel (camel-1) thread #1 - timer://alertstrigtimer] INFO route1 - Job Alert System: Date File Loaded: null.2020-10-12T10-28-51 at 0
It creates the directory structure in addition to a file, but the file is not present, and it moves forward.
I am running a 3-node YARN cluster on EMR (1 master, 2 core nodes). I am using Flink 1.6.0. I have checkpointing enabled (RocksDB), writing to S3. Checkpointing seems to work correctly in other tests. In the case where YARN crashes on the master node (in this case, I killed the YARN processes), I am unable to resume my application from the last checkpoint. Here is the output when I try to restart:
[hadoop#emr flink-1.6.0]$ bin/flink run -s s3://bucket/kinesis-pipeline-checkpoint/a8a9ceb95845c3ea9833e025b5771470 -p 1 -d ~/pipeline-assembly-0.2.0.jar
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/flink-1.6.0/lib/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2018-11-08 19:01:06,069 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - Found Yarn properties file under /tmp/.yarn-properties-hadoop.
2018-11-08 19:01:06,069 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - Found Yarn properties file under /tmp/.yarn-properties-hadoop.
2018-11-08 19:01:06,488 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - YARN properties set default parallelism to 1
2018-11-08 19:01:06,488 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - YARN properties set default parallelism to 1
YARN properties set default parallelism to 1
2018-11-08 19:01:06,637 INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at emr:8032
2018-11-08 19:01:06,745 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
2018-11-08 19:01:06,745 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
2018-11-08 19:01:06,845 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Found application JobManager host name 'emr' and port '39541' from supplied application id 'application_1541703591281_0001'
Starting execution of program
------------------------------------------------------------
The program finished with the following exception:
org.apache.flink.client.program.ProgramInvocationException: Could not submit job (JobID: c701b6511ad76b5e4faae703763f388e)
at org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:249)
at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:486)
at org.apache.flink.client.program.DetachedEnvironment.finalizeExecute(DetachedEnvironment.java:77)
at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:432)
at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:804)
at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:280)
at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:215)
at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1044)
at org.apache.flink.client.cli.CliFrontend.lambda$main$11(CliFrontend.java:1120)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1120)
Caused by: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit JobGraph.
at org.apache.flink.client.program.rest.RestClusterClient.lambda$submitJob$8(RestClusterClient.java:379)
at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
at org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$5(FutureUtils.java:213)
at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:561)
at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:929)
at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.CompletionException: org.apache.flink.runtime.concurrent.FutureUtils$RetryException: Could not complete the operation. Exception is not retryable.
at java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:326)
at java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:338)
at java.util.concurrent.CompletableFuture.uniRelay(CompletableFuture.java:911)
at java.util.concurrent.CompletableFuture$UniRelay.tryFire(CompletableFuture.java:899)
... 12 more
Caused by: org.apache.flink.runtime.concurrent.FutureUtils$RetryException: Could not complete the operation. Exception is not retryable.
... 10 more
Caused by: java.util.concurrent.CompletionException: org.apache.flink.runtime.rest.util.RestClientException: [Job submission failed.]
at java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:326)
at java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:338)
at java.util.concurrent.CompletableFuture.uniRelay(CompletableFuture.java:911)
at java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:953)
at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:926)
... 4 more
Caused by: org.apache.flink.runtime.rest.util.RestClientException: [Job submission failed.]
at org.apache.flink.runtime.rest.RestClient.parseResponse(RestClient.java:310)
at org.apache.flink.runtime.rest.RestClient.lambda$submitRequest$3(RestClient.java:294)
at java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:952)
... 5 more
Is this expected behavior, or am I doing something wrong in this situation?
Thank you
UPDATE: jobmanager.log
LogType:jobmanager.log
Log Upload Time:Tue Nov 20 16:37:52 +0000 2018
LogLength:49255
Log Contents:
2018-11-20 16:33:33,276 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - --------------------------------------------------------------------------------
2018-11-20 16:33:33,277 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Starting YarnSessionClusterEntrypoint (Version: 1.6.0, Rev:ff472b4, Date:07.08.2018 # 13:31:13 UTC)
2018-11-20 16:33:33,278 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - OS current user: yarn
2018-11-20 16:33:33,672 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Current Hadoop/Kerberos user: hadoop
2018-11-20 16:33:33,672 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - JVM: OpenJDK 64-Bit Server VM - Oracle Corporation - 1.8/25.181-b13
2018-11-20 16:33:33,672 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Maximum heap size: 13653 MiBytes
2018-11-20 16:33:33,672 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - JAVA_HOME: /usr/lib/jvm/java-openjdk
2018-11-20 16:33:33,673 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Hadoop version: 2.8.3
2018-11-20 16:33:33,673 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - JVM Options:
2018-11-20 16:33:33,673 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Xmx15360m
2018-11-20 16:33:33,673 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Dlog.file=/var/log/hadoop-yarn/containers/application_1542731534971_0001/container_1542731534971_0001_01_000001/jobmanager.log
2018-11-20 16:33:33,673 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Dlogback.configurationFile=file:logback.xml
2018-11-20 16:33:33,673 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Dlog4j.configuration=file:log4j.properties
2018-11-20 16:33:33,673 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Program Arguments: (none)
2018-11-20 16:33:33,674 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - --------------------------------------------------------------------------------
2018-11-20 16:33:33,675 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Registered UNIX signal handlers for [TERM, HUP, INT]
2018-11-20 16:33:33,678 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - YARN daemon is running as: hadoop Yarn client user obtainer: hadoop
2018-11-20 16:33:33,680 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: state.backend.fs.checkpointdir, s3://bucket/kinesis-checkpoint
2018-11-20 16:33:33,680 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: web.timeout, 60000
2018-11-20 16:33:33,680 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: high-availability.cluster-id, application_1542731534971_0001
2018-11-20 16:33:33,680 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, localhost
2018-11-20 16:33:33,681 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.port, 6123
2018-11-20 16:33:33,681 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: rest.port, 8081
2018-11-20 16:33:33,681 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: internal.cluster.execution-mode, NORMAL
2018-11-20 16:33:33,681 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.memory.fraction, 0.9
2018-11-20 16:33:33,681 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 1
2018-11-20 16:33:33,681 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 1
2018-11-20 16:33:33,681 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: state.backend, rocksdb
2018-11-20 16:33:33,681 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: akka.ask.timeout, 60s
2018-11-20 16:33:33,681 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.heap.size, 20480m
2018-11-20 16:33:33,682 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.size, 20480m
2018-11-20 16:33:33,682 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: state.checkpoints.dir, s3://bucket/kinesis-checkpoint
2018-11-20 16:33:33,695 INFO org.apache.flink.runtime.clusterframework.BootstrapTools - Setting directories for temporary files to: /mnt/yarn/usercache/hadoop/appcache/application_1542731534971_0001
2018-11-20 16:33:33,708 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Starting YarnSessionClusterEntrypoint.
2018-11-20 16:33:33,708 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Install default filesystem.
2018-11-20 16:33:33,772 INFO org.apache.flink.runtime.security.modules.HadoopModule - Hadoop user set to hadoop (auth:SIMPLE)
2018-11-20 16:33:33,786 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Initializing cluster services.
2018-11-20 16:33:33,791 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Trying to start actor system at ip-172-31-18-80.us-west-2.compute.internal:45751
2018-11-20 16:33:34,239 INFO akka.event.slf4j.Slf4jLogger - Slf4jLogger started
2018-11-20 16:33:34,328 INFO akka.remote.Remoting - Starting remoting
2018-11-20 16:33:34,428 INFO akka.remote.Remoting - Remoting started; listening on addresses :[akka.tcp://flink#ip-172-31-18-80.us-west-2.compute.internal:45751]
2018-11-20 16:33:34,437 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Actor system started at akka.tcp://flink#ip-172-31-18-80.us-west-2.compute.internal:45751
2018-11-20 16:33:34,469 INFO org.apache.flink.runtime.blob.BlobServer - Created BLOB server storage directory /mnt/yarn/usercache/hadoop/appcache/application_1542731534971_0001/blobStore-1dc43ec8-8ed7-4342-adae-c8d20a691640
2018-11-20 16:33:34,473 INFO org.apache.flink.runtime.blob.BlobServer - Started BLOB server at 0.0.0.0:39955 - max concurrent requests: 50 - max backlog: 1000
2018-11-20 16:33:34,488 INFO org.apache.flink.runtime.metrics.MetricRegistryImpl - No metrics reporter configured, no metrics will be exposed/reported.
2018-11-20 16:33:34,492 INFO org.apache.flink.runtime.dispatcher.FileArchivedExecutionGraphStore - Initializing FileArchivedExecutionGraphStore: Storage directory /mnt/yarn/usercache/hadoop/appcache/application_1542731534971_0001/executionGraphStore-0c4fd7ac-17d2-40d6-b279-dfef5041a76f, expiration time 3600000, maximum cache size 52428800 bytes.
2018-11-20 16:33:34,514 INFO org.apache.flink.runtime.blob.TransientBlobCache - Created BLOB cache storage directory /mnt/yarn/usercache/hadoop/appcache/application_1542731534971_0001/blobStore-4c662c5c-afa5-4bf2-8a01-3acc0b9aa491
2018-11-20 16:33:34,521 WARN org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - Upload directory /tmp/flink-web-6885656b-18cc-451f-8853-03ff7cf14b0e/flink-web-upload does not exist, or has been deleted externally. Previously uploaded files are no longer available.
2018-11-20 16:33:34,522 INFO org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - Created directory /tmp/flink-web-6885656b-18cc-451f-8853-03ff7cf14b0e/flink-web-upload for file uploads.
2018-11-20 16:33:34,525 INFO org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - Starting rest endpoint.
2018-11-20 16:33:34,702 INFO org.apache.flink.runtime.webmonitor.WebMonitorUtils - Determined location of main cluster component log file: /var/log/hadoop-yarn/containers/application_1542731534971_0001/container_1542731534971_0001_01_000001/jobmanager.log
2018-11-20 16:33:34,702 INFO org.apache.flink.runtime.webmonitor.WebMonitorUtils - Determined location of main cluster component stdout file: /var/log/hadoop-yarn/containers/application_1542731534971_0001/container_1542731534971_0001_01_000001/jobmanager.out
2018-11-20 16:33:34,844 INFO org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - Rest endpoint listening at ip-172-31-18-80.us-west-2.compute.internal:35939
2018-11-20 16:33:34,844 INFO org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - http://ip-172-31-18-80.us-west-2.compute.internal:35939 was granted leadership with leaderSessionID=00000000-0000-0000-0000-000000000000
2018-11-20 16:33:34,844 INFO org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - Web frontend listening at http://ip-172-31-18-80.us-west-2.compute.internal:35939.
2018-11-20 16:33:34,857 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService - Starting RPC endpoint for org.apache.flink.yarn.YarnResourceManager at akka://flink/user/resourcemanager .
2018-11-20 16:33:34,948 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService - Starting RPC endpoint for org.apache.flink.runtime.dispatcher.StandaloneDispatcher at akka://flink/user/dispatcher .
2018-11-20 16:33:34,981 INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at ip-172-31-30-52.us-west-2.compute.internal/172.31.30.52:8030
2018-11-20 16:33:35,234 INFO org.apache.flink.yarn.YarnResourceManager - Recovered 0 containers from previous attempts ([]).
2018-11-20 16:33:35,237 INFO org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy - yarn.client.max-cached-nodemanagers-proxies : 0
2018-11-20 16:33:35,238 INFO org.apache.flink.yarn.YarnResourceManager - ResourceManager akka.tcp://flink#ip-172-31-18-80.us-west-2.compute.internal:45751/user/resourcemanager was granted leadership with fencing token 00000000000000000000000000000000
2018-11-20 16:33:35,239 INFO org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager - Starting the SlotManager.
2018-11-20 16:33:35,252 INFO org.apache.flink.runtime.dispatcher.StandaloneDispatcher - Dispatcher akka.tcp://flink#ip-172-31-18-80.us-west-2.compute.internal:45751/user/dispatcher was granted leadership with fencing token 00000000-0000-0000-0000-000000000000
2018-11-20 16:33:35,252 INFO org.apache.flink.runtime.dispatcher.StandaloneDispatcher - Recovering all persisted jobs.
2018-11-20 16:34:20,094 INFO org.apache.flink.runtime.dispatcher.StandaloneDispatcher - Submitting job bd0d5dbaeba3990a3bef1eebee49cd79 (Data Session Pipeline v0.0.7).
2018-11-20 16:34:20,108 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService - Starting RPC endpoint for org.apache.flink.runtime.jobmaster.JobMaster at akka://flink/user/jobmanager_0 .
2018-11-20 16:34:20,115 INFO org.apache.flink.runtime.jobmaster.JobMaster - Initializing job Data Session Pipeline v0.0.7 (bd0d5dbaeba3990a3bef1eebee49cd79).
2018-11-20 16:34:20,124 INFO org.apache.flink.runtime.jobmaster.JobMaster - Using restart strategy FixedDelayRestartStrategy(maxNumberRestartAttempts=2147483647, delayBetweenRestartAttempts=0) for Data Session Pipeline v0.0.7 (bd0d5dbaeba3990a3bef1eebee49cd79).
2018-11-20 16:34:20,127 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService - Starting RPC endpoint for org.apache.flink.runtime.jobmaster.slotpool.SlotPool at akka://flink/user/0e6f5de3-53ad-4bae-acf3-3c66106c0a54 .
2018-11-20 16:34:20,148 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Job recovers via failover strategy: full graph restart
2018-11-20 16:34:20,170 INFO org.apache.flink.runtime.jobmaster.JobMaster - Running initialization on master for job Data Session Pipeline v0.0.7 (bd0d5dbaeba3990a3bef1eebee49cd79).
2018-11-20 16:34:20,170 INFO org.apache.flink.runtime.jobmaster.JobMaster - Successfully ran initialization on master in 0 ms.
2018-11-20 16:34:20,203 INFO org.apache.flink.runtime.jobmaster.JobMaster - Using application-defined state backend: RocksDBStateBackend{checkpointStreamBackend=File State Backend (checkpoints: 's3://bucket/kinesis-checkpoint', savepoints: 'null', asynchronous: UNDEFINED, fileStateThreshold: -1), localRocksDbDirectories=null, enableIncrementalCheckpointing=TRUE}
2018-11-20 16:34:20,203 INFO org.apache.flink.runtime.jobmaster.JobMaster - Configuring application-defined state backend with job/cluster config
2018-11-20 16:34:22,624 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Starting job bd0d5dbaeba3990a3bef1eebee49cd79 from savepoint s3://bucket/kinesis-pipeline-checkpoint/8a6e5aeebeef202a2daddd3cf9419a80 ()
2018-11-20 16:34:22,663 ERROR org.apache.flink.runtime.rest.handler.job.JobSubmitHandler - Exception occurred in REST handler.
org.apache.flink.runtime.rest.handler.RestHandlerException: Job submission failed.
at org.apache.flink.runtime.rest.handler.job.JobSubmitHandler.lambda$handleRequest$2(JobSubmitHandler.java:119)
at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
at org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:770)
at akka.dispatch.OnComplete.internal(Future.scala:258)
at akka.dispatch.OnComplete.internal(Future.scala:256)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:186)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:183)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
at org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:83)
at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:534)
at akka.pattern.PipeToSupport$PipeableFuture$$anonfun$pipeTo$1.applyOrElse(PipeToSupport.scala:20)
at akka.pattern.PipeToSupport$PipeableFuture$$anonfun$pipeTo$1.applyOrElse(PipeToSupport.scala:18)
at scala.concurrent.Future$$anonfun$andThen$1.apply(Future.scala:436)
at scala.concurrent.Future$$anonfun$andThen$1.apply(Future.scala:435)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:91)
at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
at akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:90)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:39)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:415)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.util.concurrent.CompletionException: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit job.
at org.apache.flink.runtime.dispatcher.Dispatcher.lambda$submitJob$2(Dispatcher.java:256)
at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:561)
at java.util.concurrent.CompletableFuture$UniRun.tryFire(CompletableFuture.java:690)
at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:332)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:158)
at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:70)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.onReceive(AkkaRpcActor.java:142)
at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.onReceive(FencedAkkaRpcActor.java:40)
at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165)
at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
at akka.actor.ActorCell.invoke(ActorCell.scala:495)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
at akka.dispatch.Mailbox.run(Mailbox.scala:224)
at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
... 4 more
Caused by: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit job.
... 24 more
Caused by: java.util.concurrent.CompletionException: java.lang.RuntimeException: org.apache.flink.runtime.client.JobExecutionException: Could not set up JobManager
at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)
at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280)
at java.util.concurrent.CompletableFuture.uniRun(CompletableFuture.java:708)
at java.util.concurrent.CompletableFuture$UniRun.tryFire(CompletableFuture.java:687)
... 18 more
Caused by: java.lang.RuntimeException: org.apache.flink.runtime.client.JobExecutionException: Could not set up JobManager
at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:199)
at org.apache.flink.util.function.ConsumerWithException.accept(ConsumerWithException.java:40)
at org.apache.flink.runtime.dispatcher.Dispatcher.lambda$waitForTerminatingJobManager$29(Dispatcher.java:820)
at java.util.concurrent.CompletableFuture.uniRun(CompletableFuture.java:705)
... 19 more
Caused by: org.apache.flink.runtime.client.JobExecutionException: Could not set up JobManager
at org.apache.flink.runtime.jobmaster.JobManagerRunner.<init>(JobManagerRunner.java:176)
at org.apache.flink.runtime.dispatcher.Dispatcher$DefaultJobManagerRunnerFactory.createJobManagerRunner(Dispatcher.java:936)
at org.apache.flink.runtime.dispatcher.Dispatcher.createJobManagerRunner(Dispatcher.java:291)
at org.apache.flink.runtime.dispatcher.Dispatcher.runJob(Dispatcher.java:281)
at org.apache.flink.runtime.dispatcher.Dispatcher.persistAndRunJob(Dispatcher.java:266)
at org.apache.flink.util.function.ConsumerWithException.accept(ConsumerWithException.java:38)
... 21 more
Caused by: java.io.FileNotFoundException: Cannot find meta data file '_metadata' in directory 's3://sledfs/kinesis-pipeline-checkpoint/8a6e5aeebeef202a2daddd3cf9419a80'. Please try to load the checkpoint/savepoint directly from the metadata file instead of the directory.
at org.apache.flink.runtime.state.filesystem.AbstractFsCheckpointStorage.resolveCheckpointPointer(AbstractFsCheckpointStorage.java:256)
at org.apache.flink.runtime.state.filesystem.AbstractFsCheckpointStorage.resolveCheckpoint(AbstractFsCheckpointStorage.java:109)
at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.restoreSavepoint(CheckpointCoordinator.java:1102)
at org.apache.flink.runtime.jobmaster.JobMaster.tryRestoreExecutionGraphFromSavepoint(JobMaster.java:1220)
at org.apache.flink.runtime.jobmaster.JobMaster.createAndRestoreExecutionGraph(JobMaster.java:1144)
at org.apache.flink.runtime.jobmaster.JobMaster.<init>(JobMaster.java:295)
at org.apache.flink.runtime.jobmaster.JobManagerRunner.<init>(JobManagerRunner.java:157)
... 26 more
2018-11-20 16:37:52,321 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - RECEIVED SIGNAL 15: SIGTERM. Shutting down as requested.
2018-11-20 16:37:52,322 INFO org.apache.flink.runtime.blob.TransientBlobCache - Shutting down BLOB cache
2018-11-20 16:37:52,340 INFO org.apache.flink.runtime.blob.BlobServer - Stopped BLOB server at 0.0.0.0:39955
The checkpoint you are referring to, s3://bucket/kinesis-pipeline-checkpoint/a8a9ceb95845c3ea9833e025b5771470, does not contain a valid _metadata file. This indicates that the checkpoint was started but could not be completed. Please choose a checkpoint which has been successfully completed.
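As an aside (an illustrative sketch of my own, not part of the original answer), a checkpoint is only usable with flink run -s if a completed one is retained; in Flink 1.6 that is typically enabled roughly as follows. The interval and job wiring below are assumptions, not taken from the question.

import org.apache.flink.streaming.api.environment.CheckpointConfig;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointSetupSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // take a checkpoint every 60 seconds
        env.enableCheckpointing(60_000);

        // retain completed checkpoints (with their _metadata) even when the job
        // is cancelled, so they can later be passed to `flink run -s ...`
        env.getCheckpointConfig().enableExternalizedCheckpoints(
                CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);

        // ... define sources, operators and sinks here, then:
        // env.execute("example job");
    }
}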
I'm having problems with a Flink application failure.
The streaming job runs shortly after deploying on YARN.
But it fails after some minutes with the error messages below.
Can this be evidence of high load on a low-performance YARN cluster?
Flink 1.5.0 and YARN, single job mode
Each node is equipped with 100 GB RAM and 40 vcores
48 YARN node managers
2 Kafka topic inputs (150 GB/hour for each input stream)
480 Kafka partitions
10 Flink slots per node manager
From the beginning of the Flink log:
Log Type: jobmanager.log
Log Upload Time: Tue Jun 12 18:19:50 +0900 2018
Log Length: 10807897
2018-06-11 18:59:27,167 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - --------------------------------------------------------------------------------
2018-06-11 18:59:27,168 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Starting YarnSessionClusterEntrypoint (Version: 1.5.0, Rev:c61b108, Date:24.05.2018 # 14:54:44 UTC)
2018-06-11 18:59:27,168 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - OS current user: irteam
2018-06-11 18:59:27,472 WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2018-06-11 18:59:27,536 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Current Hadoop/Kerberos user: irteam
2018-06-11 18:59:27,536 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - JVM: OpenJDK 64-Bit Server VM - Oracle Corporation - 1.8/25.161-b14
2018-06-11 18:59:27,536 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Maximum heap size: 66667 MiBytes
2018-06-11 18:59:27,537 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - JAVA_HOME: /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.161-0.b14.el7_4.x86_64
2018-06-11 18:59:27,537 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Hadoop version: 2.8.3
2018-06-11 18:59:27,537 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - JVM Options:
2018-06-11 18:59:27,538 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Xmx75000m
2018-06-11 18:59:27,538 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Djava.library.path=/home1/irteam/realtime-tools
2018-06-11 18:59:27,538 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Dlog.file=/naver/search-cluster/eye/var/logs/application_1528711080009_0002/container_e08_1528711080009_0002_01_000001/jobmanager.log
2018-06-11 18:59:27,538 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Dlogback.configurationFile=file:logback.xml
2018-06-11 18:59:27,538 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Dlog4j.configuration=file:log4j.properties
2018-06-11 18:59:27,538 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Program Arguments: (none)
2018-06-11 18:59:27,538 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Class path[omit]
2018-06-11 18:59:27,539 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - --------------------------------------------------------------------------------
2018-06-11 18:59:27,539 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Registered UNIX signal handlers for [TERM, HUP, INT]
2018-06-11 18:59:27,542 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - YARN daemon is running as: irteam Yarn client user obtainer: irteam
2018-06-11 18:59:27,544 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: env.java.home, "/usr/lib/jvm/java-1.8.0-openjdk"
2018-06-11 18:59:27,544 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: env.java.opts, "-Djava.library.path=/home1/irteam/realtime-tools"
2018-06-11 18:59:27,545 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: high-availability.cluster-id, application_1528711080009_0002
2018-06-11 18:59:27,545 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, 0.0.0.0
2018-06-11 18:59:27,545 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.mb, 100000
2018-06-11 18:59:27,545 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.network.request-backoff.max, 100000
2018-06-11 18:59:27,545 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.port, 6123
2018-06-11 18:59:27,545 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: containerized.taskmanager.env.JAVA_HOME, /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.161-0.b14.el7_4.x86_64
2018-06-11 18:59:27,545 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: rest.port, 8081
2018-06-11 18:59:27,545 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: internal.cluster.execution-mode, NORMAL
2018-06-11 18:59:27,545 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 480
2018-06-11 18:59:27,546 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 10
2018-06-11 18:59:27,546 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.heap.mb, 100000
2018-06-11 18:59:27,546 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: containerized.master.env.JAVA_HOME, /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.161-0.b14.el7_4.x86_64
2018-06-11 18:59:27,558 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Setting directories for temporary files to: /home1/irteam/naver/search-cluster/eye/volume/nodemanager/usercache/irteam/appcache/application_1528711080009_0002
2018-06-11 18:59:27,570 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Starting YarnSessionClusterEntrypoint.
2018-06-11 18:59:27,570 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Install default filesystem.
2018-06-11 18:59:27,636 INFO org.apache.flink.runtime.security.modules.HadoopModule - Hadoop user set to irteam (auth:SIMPLE)
2018-06-11 18:59:27,650 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Initializing cluster services.
2018-06-11 18:59:27,654 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Trying to start actor system at chd004.eye.nfra.io:33524
2018-06-11 18:59:28,126 INFO akka.event.slf4j.Slf4jLogger - Slf4jLogger started
2018-06-11 18:59:28,222 INFO akka.remote.Remoting - Starting remoting
2018-06-11 18:59:28,322 INFO akka.remote.Remoting - Remoting started; listening on addresses :[akka.tcp://flink#chd004.eye.nfra.io:33524]
2018-06-11 18:59:28,329 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Actor system started at akka.tcp://flink#chd004.eye.nfra.io:33524
2018-06-11 18:59:28,348 INFO org.apache.flink.runtime.blob.BlobServer - Created BLOB server storage directory /home1/irteam/naver/search-cluster/eye/volume/nodemanager/usercache/irteam/appcache/application_1528711080009_0002/blobStore-c25d4d9d-4ddc-442d-8d5e-7bec36dca006
2018-06-11 18:59:28,349 INFO org.apache.flink.runtime.blob.BlobServer - Started BLOB server at 0.0.0.0:45733 - max concurrent requests: 50 - max backlog: 1000
2018-06-11 18:59:28,363 INFO org.apache.flink.runtime.metrics.MetricRegistryImpl - No metrics reporter configured, no metrics will be exposed/reported.
2018-06-11 18:59:28,367 INFO org.apache.flink.runtime.dispatcher.FileArchivedExecutionGraphStore - Initializing FileArchivedExecutionGraphStore: Storage directory /home1/irteam/naver/search-cluster/eye/volume/nodemanager/usercache/irteam/appcache/application_1528711080009_0002/executionGraphStore-63bcf196-410d-4d8c-8388-f270beb53555, expiration time 3600000, maximum cache size 52428800 bytes.
2018-06-11 18:59:28,388 INFO org.apache.flink.runtime.blob.TransientBlobCache - Created BLOB cache storage directory /home1/irteam/naver/search-cluster/eye/volume/nodemanager/usercache/irteam/appcache/application_1528711080009_0002/blobStore-02db740f-8c23-46e8-bb24-1f583b6a0b33
2018-06-11 18:59:28,395 WARN org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - Upload directory /tmp/flink-web-8698d702-67fe-437c-b62e-78c2969bf770/flink-web-upload does not exist, or has been deleted externally. Previously uploaded files are no longer available.
2018-06-11 18:59:28,396 INFO org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - Created directory /tmp/flink-web-8698d702-67fe-437c-b62e-78c2969bf770/flink-web-upload for file uploads.
2018-06-11 18:59:28,399 INFO org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - Starting rest endpoint.
2018-06-11 18:59:28,737 INFO org.apache.flink.runtime.webmonitor.WebMonitorUtils - Determined location of main cluster component log file: /naver/search-cluster/eye/var/logs/application_1528711080009_0002/container_e08_1528711080009_0002_01_000001/jobmanager.log
2018-06-11 18:59:28,737 INFO org.apache.flink.runtime.webmonitor.WebMonitorUtils - Determined location of main cluster component stdout file: /naver/search-cluster/eye/var/logs/application_1528711080009_0002/container_e08_1528711080009_0002_01_000001/jobmanager.out
2018-06-11 18:59:28,808 INFO org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - Rest endpoint listening at chd004.eye.nfra.io:39794
2018-06-11 18:59:28,808 INFO org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - http://chd004.eye.nfra.io:39794 was granted leadership with leaderSessionID=00000000-0000-0000-0000-000000000000
2018-06-11 18:59:28,808 INFO org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - Web frontend listening at http://chd004.eye.nfra.io:39794.
2018-06-11 18:59:28,817 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService - Starting RPC endpoint for org.apache.flink.yarn.YarnResourceManager at akka://flink/user/resourcemanager .
2018-06-11 18:59:28,902 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService - Starting RPC endpoint for org.apache.flink.runtime.dispatcher.StandaloneDispatcher at akka://flink/user/dispatcher .
2018-06-11 18:59:28,916 INFO org.apache.flink.yarn.YarnResourceManager - ResourceManager akka.tcp://flink#chd004.eye.nfra.io:33524/user/resourcemanager was granted leadership with fencing token 00000000000000000000000000000000
2018-06-11 18:59:28,917 INFO org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager - Starting the SlotManager.
2018-06-11 18:59:29,161 INFO org.apache.flink.yarn.YarnResourceManager - Recovered 0 containers from previous attempts ([]).
2018-06-11 18:59:29,163 INFO org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy - yarn.client.max-cached-nodemanagers-proxies : 0
2018-06-11 18:59:29,174 INFO org.apache.flink.runtime.dispatcher.StandaloneDispatcher - Dispatcher akka.tcp://flink#chd004.eye.nfra.io:33524/user/dispatcher was granted leadership with fencing token 00000000000000000000000000000000
2018-06-11 18:59:29,174 INFO org.apache.flink.runtime.dispatcher.StandaloneDispatcher - Recovering all persisted jobs.
2018-06-11 18:59:31,120 INFO org.apache.flink.runtime.dispatcher.StandaloneDispatcher - Submitting job 5f090c4f4287db062cee0996da5d5ffc (LCS realtime data).
2018-06-11 18:59:31,130 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService - Starting RPC endpoint for org.apache.flink.runtime.jobmaster.JobMaster at akka://flink/user/jobmanager_0 .
2018-06-11 18:59:31,136 INFO org.apache.flink.runtime.jobmaster.JobMaster - Initializing job LCS realtime data (5f090c4f4287db062cee0996da5d5ffc).
2018-06-11 18:59:31,144 INFO org.apache.flink.runtime.jobmaster.JobMaster - Using restart strategy FixedDelayRestartStrategy(maxNumberRestartAttempts=3, delayBetweenRestartAttempts=30000) for LCS realtime data (5f090c4f4287db062cee0996da5d5ffc).
2018-06-11 18:59:31,148 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService - Starting RPC endpoint for org.apache.flink.runtime.jobmaster.slotpool.SlotPool at akka://flink/user/a6ffe322-07db-4282-a29c-0836ad26cd9f .
2018-06-11 18:59:31,165 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Job recovers via failover strategy: full graph restart
2018-06-11 18:59:31,174 INFO org.apache.flink.runtime.jobmaster.JobMaster - Running initialization on master for job LCS realtime data (5f090c4f4287db062cee0996da5d5ffc).
2018-06-11 18:59:31,174 INFO org.apache.flink.runtime.jobmaster.JobMaster - Successfully ran initialization on master in 0 ms.
2018-06-11 18:59:31,248 INFO org.apache.flink.runtime.jobmaster.JobMaster - Using application-defined state backend: File State Backend (checkpoints: 'file:/home1/irteam/apps/flink-1.4.0/checkpoint', savepoints: 'null', asynchronous: UNDEFINED, fileStateThreshold: -1)
2018-06-11 18:59:31,248 INFO org.apache.flink.runtime.jobmaster.JobMaster - Configuring application-defined state backend with job/cluster config
2018-06-11 18:59:31,258 INFO org.apache.flink.runtime.jobmaster.JobManagerRunner - JobManager runner for job LCS realtime data (5f090c4f4287db062cee0996da5d5ffc) was granted leadership with session id 00000000-0000-0000-0000-000000000000 at akka.tcp://flink#chd004.eye.nfra.io:33524/user/jobmanager_0.
2018-06-11 18:59:31,260 INFO org.apache.flink.runtime.jobmaster.JobMaster - Starting execution of job LCS realtime data (5f090c4f4287db062cee0996da5d5ffc)
2018-06-11 18:59:31,261 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Job LCS realtime data (5f090c4f4287db062cee0996da5d5ffc) switched from state CREATED to RUNNING.
2018-06-11 18:59:31,264 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Custom Source (1/480) (98a01166bb2ac99dd301e4b60febbc45) switched from CREATED to SCHEDULED.
Near the timeout event which might have caused the Flink job to fail:
2018-06-12 18:17:39,750 INFO org.apache.flink.runtime.rest.handler.legacy.backpressure.StackTraceSampleCoordinator - Cancelling sample 5589
akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka.tcp://flink#chd023.eye.nfra.io:34783/user/taskmanager_0#-297572584]] after [15000 ms]. Sender[null] sent message of type "org.apache.flink.runtime.rpc.messages.RemoteRpcInvocation".
at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)
at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
at java.lang.Thread.run(Thread.java:748)
2018-06-12 18:17:39,770 INFO org.apache.flink.runtime.rest.handler.legacy.backpressure.StackTraceSampleCoordinator - Cancelling sample 5590
akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka.tcp://flink#chd032.eye.nfra.io:34653/user/taskmanager_0#424015125]] after [15000 ms]. Sender[null] sent message of type "org.apache.flink.runtime.rpc.messages.RemoteRpcInvocation".
at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)
at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
at java.lang.Thread.run(Thread.java:748)
2018-06-12 18:17:51,270 INFO org.apache.flink.runtime.rest.handler.legacy.backpressure.StackTraceSampleCoordinator - Cancelling sample 5591
akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka.tcp://flink#chd032.eye.nfra.io:34653/user/taskmanager_0#424015125]] after [15000 ms]. Sender[null] sent message of type "org.apache.flink.runtime.rpc.messages.RemoteRpcInvocation".
at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)
at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
at java.lang.Thread.run(Thread.java:748)
2018-06-12 18:17:55,650 INFO org.apache.flink.yarn.YarnResourceManager - The heartbeat of TaskManager with id container_e08_1528711080009_0002_01_000017 timed out.
2018-06-12 18:17:55,650 INFO org.apache.flink.yarn.YarnResourceManager - Closing TaskExecutor connection container_e08_1528711080009_0002_01_000017 because: The heartbeat of TaskManager with id container_e08_1528711080009_0002_01_000017 timed out.
2018-06-12 18:17:55,650 INFO org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager - Unregister TaskManager 525095d833344e8b205017666accd9c5 from the SlotManager.
2018-06-12 18:17:55,650 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Window(EventTimeSessionWindows(300000), NowTrigger, NowSessionProcessor) -> Sink: Unnamed (188/480) (f9ed2fc23d6ca5a364300864b60760af) switched from RUNNING to FAILED.
org.apache.flink.util.FlinkException: Releasing TaskManager container_e08_1528711080009_0002_01_000017.
at org.apache.flink.runtime.jobmaster.slotpool.SlotPool.releaseTaskManagerInternal(SlotPool.java:1067)
at org.apache.flink.runtime.jobmaster.slotpool.SlotPool.releaseTaskManager(SlotPool.java:1050)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:247)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:162)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.onReceive(AkkaRpcActor.java:142)
at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165)
at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
at akka.actor.ActorCell.invoke(ActorCell.scala:495)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
at akka.dispatch.Mailbox.run(Mailbox.scala:224)
at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
2018-06-12 18:17:55,651 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Job LCS realtime data (5f090c4f4287db062cee0996da5d5ffc) switched from state RUNNING to FAILING.
org.apache.flink.util.FlinkException: Releasing TaskManager container_e08_1528711080009_0002_01_000017.
at org.apache.flink.runtime.jobmaster.slotpool.SlotPool.releaseTaskManagerInternal(SlotPool.java:1067)
at org.apache.flink.runtime.jobmaster.slotpool.SlotPool.releaseTaskManager(SlotPool.java:1050)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:247)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:162)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.onReceive(AkkaRpcActor.java:142)
at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165)
at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
at akka.actor.ActorCell.invoke(ActorCell.scala:495)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
at akka.dispatch.Mailbox.run(Mailbox.scala:224)
at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
2018-06-12 18:17:55,679 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Custom Source (1/480) (98a01166bb2ac99dd301e4b60febbc45) switched from RUNNING to CANCELING.
I am trying to launch the Flink Scala shell in YARN mode, but I hit the following error.
This is the command I use. Am I missing anything? Thanks
bin/start-scala-shell.sh yarn -n 2
Starting Flink Shell:
2018-06-04 17:31:18,166 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, localhost
2018-06-04 17:31:18,168 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.port, 6123
2018-06-04 17:31:18,168 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.mb, 1024
2018-06-04 17:31:18,168 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.heap.mb, 1024
2018-06-04 17:31:18,169 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 1
2018-06-04 17:31:18,169 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 1
2018-06-04 17:31:18,169 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: rest.port, 8081
Exception in thread "main" java.lang.UnsupportedOperationException: Can't deploy a standalone cluster.
at org.apache.flink.client.deployment.StandaloneClusterDescriptor.deploySessionCluster(StandaloneClusterDescriptor.java:57)
at org.apache.flink.client.deployment.StandaloneClusterDescriptor.deploySessionCluster(StandaloneClusterDescriptor.java:31)
at org.apache.flink.api.scala.FlinkShell$.deployNewYarnCluster(FlinkShell.scala:272)
at org.apache.flink.api.scala.FlinkShell$.fetchConnectionInfo(FlinkShell.scala:164)
at org.apache.flink.api.scala.FlinkShell$.liftedTree1$1(FlinkShell.scala:194)
at org.apache.flink.api.scala.FlinkShell$.startShell(FlinkShell.scala:193)
at org.apache.flink.api.scala.FlinkShell$.main(FlinkShell.scala:135)
at org.apache.flink.api.scala.FlinkShell.main(FlinkShell.scala)
Which version of Flink do you use? If it is 1.5.0, there is a known issue that the Scala shell does not work with FLIP-6 mode (enabled by default). You can try running it with the legacy mode. There is already an open JIRA, FLINK-8795, for fixing it.
We have a Camel (2.15.2) based application with some REST services published.
The camel-swagger component is used to publish information about the services.
Everything works perfectly if the app is alone in a Tomcat container:
127.0.0.1 - - [12/Jun/2015:11:25:20 +0200] "GET /myapp/api-docs/ HTTP/1.1" 200 383
However, if I deploy a freshly downloaded Hawt.io 1.4.51 WAR (sample-1.4.51.war) in the same container (no changes to configuration done), I get a 204 response code from my original app:
127.0.0.1 - - [12/Jun/2015:12:50:51 +0200] "GET /myapp/api-docs/ HTTP/1.1" 204 -
I guess it is all about JMX and how Swagger gets information about the REST services published in the Camel context, but I am not sure how to avoid this error.