How to run flink scala shell in yarn mode - apache-flink

I am trying to launch the Flink Scala shell in YARN mode, but I hit the following error.
This is the command I use. Am I missing anything? Thanks.
bin/start-scala-shell.sh yarn -n 2
Starting Flink Shell:
2018-06-04 17:31:18,166 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, localhost
2018-06-04 17:31:18,168 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.port, 6123
2018-06-04 17:31:18,168 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.mb, 1024
2018-06-04 17:31:18,168 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.heap.mb, 1024
2018-06-04 17:31:18,169 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 1
2018-06-04 17:31:18,169 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 1
2018-06-04 17:31:18,169 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: rest.port, 8081
Exception in thread "main" java.lang.UnsupportedOperationException: Can't deploy a standalone cluster.
at org.apache.flink.client.deployment.StandaloneClusterDescriptor.deploySessionCluster(StandaloneClusterDescriptor.java:57)
at org.apache.flink.client.deployment.StandaloneClusterDescriptor.deploySessionCluster(StandaloneClusterDescriptor.java:31)
at org.apache.flink.api.scala.FlinkShell$.deployNewYarnCluster(FlinkShell.scala:272)
at org.apache.flink.api.scala.FlinkShell$.fetchConnectionInfo(FlinkShell.scala:164)
at org.apache.flink.api.scala.FlinkShell$.liftedTree1$1(FlinkShell.scala:194)
at org.apache.flink.api.scala.FlinkShell$.startShell(FlinkShell.scala:193)
at org.apache.flink.api.scala.FlinkShell$.main(FlinkShell.scala:135)
at org.apache.flink.api.scala.FlinkShell.main(FlinkShell.scala)

Which version of Flink are you using? If it is 1.5.0, there is a known issue: the Scala shell does not work with FLIP-6 mode, which is enabled by default. You can try running it in legacy mode. There is already an open JIRA issue, FLINK-8795, for fixing it.
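For Flink 1.5, legacy mode is selected in conf/flink-conf.yaml. A minimal sketch, assuming the `mode` option documented for the 1.5 release (values `new` and `legacy`) applies to your build:

```yaml
# conf/flink-conf.yaml (Flink 1.5.x)
# Assumption: `mode` defaults to `new` (FLIP-6); `legacy` starts the pre-FLIP-6 components.
mode: legacy
```

With this set, retry bin/start-scala-shell.sh yarn -n 2. This is a workaround for 1.5.x only, not a fix for the underlying issue tracked in FLINK-8795.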

Related

Flink Standalone K8 - Cannot create HA service - NullPointerException

I currently have a Flink (1.12) cluster running in standalone mode on Kubernetes (v1.16).
For our purposes, we have gone with an application cluster mode deployment.
To make our Flink cluster more resilient to failures, we want to add HA to our current setup. I have gone through the documentation and followed the example configuration recommended for our setup (here).
flink-conf.yaml
jobmanager.rpc.address: {{ $fullName }}-jobmanager
jobmanager.rpc.port: 6123
jobmanager.memory.process.size: 1600m
taskmanager.numberOfTaskSlots: 2
taskmanager.rpc.port: 6122
taskmanager.memory.process.size: 1728m
blob.server.port: 6124
queryable-state.proxy.ports: 6125
parallelism.default: 2
scheduler-mode: reactive
execution.checkpointing.interval: 10s
high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
high-availability.cluster-id: thoros-cluster-1
high-availability.storageDir: s3:///company-flink-{{ .Values.environment }}/recovery
job.yaml (excerpt)
...
      restartPolicy: OnFailure
      containers:
        - name: jobmanager
          image: "{{ .Values.thoros.image.repository }}:{{ .Chart.AppVersion }}"
          imagePullPolicy: {{ default "Always" .Values.thoros.image.pullPolicy }}
          env:
            - name: POD_IP
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: status.podIP
          envFrom:
            - configMapRef:
                name: {{ $fullName }}
          # The following args overwrite the value of jobmanager.rpc.address configured in the configuration config map to POD_IP.
          args: [
            "standalone-job",
            "--host",
            "$(POD_IP)",
            "--job-classname",
            "com.company.beam.Main"]
There are of course a couple of other configurations I am leaving out (happy to provide those if needed).
To test, I have set the Job parallelism to 2 (which spins up two JobManagers, one of which should be standby).
When I deploy this to Kubernetes, the JobManager pods fail immediately with the following error. I am not sure what is missing here, beyond the fact that something evidently is, hence the NullPointerException.
2021-08-20 12:06:55,133 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Initializing cluster services.
2021-08-20 12:06:55,176 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils [] - Trying to start actor system, external address 100.107.0.5:6123, bind address 0.0.0.0:6123.
2021-08-20 12:06:56,956 INFO akka.event.slf4j.Slf4jLogger [] - Slf4jLogger started
2021-08-20 12:06:57,067 INFO akka.remote.Remoting [] - Starting remoting
2021-08-20 12:06:57,469 INFO akka.remote.Remoting [] - Remoting started; listening on addresses :[akka.tcp://flink@100.107.0.5:6123]
2021-08-20 12:06:57,687 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils [] - Actor system started at akka.tcp://flink@100.107.0.5:6123
2021-08-20 12:06:58,671 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Shutting StandaloneApplicationClusterEntryPoint down with application status FAILED. Diagnostics org.apache.flink.util.FlinkException: Could not create the ha services from the instantiated HighAvailabilityServicesFactory org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory.
at org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.createCustomHAServices(HighAvailabilityServicesUtils.java:268)
at org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.createHighAvailabilityServices(HighAvailabilityServicesUtils.java:124)
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.createHaServices(ClusterEntrypoint.java:338)
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.initializeServices(ClusterEntrypoint.java:296)
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:224)
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$1(ClusterEntrypoint.java:178)
at org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28)
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:175)
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runClusterEntrypoint(ClusterEntrypoint.java:585)
at org.apache.flink.container.entrypoint.StandaloneApplicationClusterEntryPoint.main(StandaloneApplicationClusterEntryPoint.java:85)
Caused by: java.lang.NullPointerException
at org.apache.flink.util.Preconditions.checkNotNull(Preconditions.java:59)
at org.apache.flink.kubernetes.kubeclient.Fabric8FlinkKubeClient.<init>(Fabric8FlinkKubeClient.java:85)
at org.apache.flink.kubernetes.kubeclient.FlinkKubeClientFactory.fromConfiguration(FlinkKubeClientFactory.java:106)
at org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory.createHAServices(KubernetesHaServicesFactory.java:37)
at org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.createCustomHAServices(HighAvailabilityServicesUtils.java:265)
... 9 more
.
2021-08-20 12:06:58,684 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService [] - Stopping Akka RPC service.
2021-08-20 12:06:58,754 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator [] - Shutting down remote daemon.
2021-08-20 12:06:58,767 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator [] - Remote daemon shut down; proceeding with flushing remote transports.
2021-08-20 12:06:58,833 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator [] - Remoting shut down.
2021-08-20 12:06:58,882 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService [] - Stopped Akka RPC service.
2021-08-20 12:06:58,882 ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Could not start cluster entrypoint StandaloneApplicationClusterEntryPoint.
org.apache.flink.runtime.entrypoint.ClusterEntrypointException: Failed to initialize the cluster entrypoint StandaloneApplicationClusterEntryPoint.
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:201) ~[flink-dist_2.12-1.12.5.jar:1.12.5]
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runClusterEntrypoint(ClusterEntrypoint.java:585) [flink-dist_2.12-1.12.5.jar:1.12.5]
at org.apache.flink.container.entrypoint.StandaloneApplicationClusterEntryPoint.main(StandaloneApplicationClusterEntryPoint.java:85) [flink-dist_2.12-1.12.5.jar:1.12.5]
Caused by: org.apache.flink.util.FlinkException: Could not create the ha services from the instantiated HighAvailabilityServicesFactory org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory.
at org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.createCustomHAServices(HighAvailabilityServicesUtils.java:268) ~[flink-dist_2.12-1.12.5.jar:1.12.5]
at org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.createHighAvailabilityServices(HighAvailabilityServicesUtils.java:124) ~[flink-dist_2.12-1.12.5.jar:1.12.5]
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.createHaServices(ClusterEntrypoint.java:338) ~[flink-dist_2.12-1.12.5.jar:1.12.5]
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.initializeServices(ClusterEntrypoint.java:296) ~[flink-dist_2.12-1.12.5.jar:1.12.5]
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:224) ~[flink-dist_2.12-1.12.5.jar:1.12.5]
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$1(ClusterEntrypoint.java:178) ~[flink-dist_2.12-1.12.5.jar:1.12.5]
at org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28) ~[flink-dist_2.12-1.12.5.jar:1.12.5]
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:175) ~[flink-dist_2.12-1.12.5.jar:1.12.5]
... 2 more
Caused by: java.lang.NullPointerException
at org.apache.flink.util.Preconditions.checkNotNull(Preconditions.java:59) ~[flink-dist_2.12-1.12.5.jar:1.12.5]
at org.apache.flink.kubernetes.kubeclient.Fabric8FlinkKubeClient.<init>(Fabric8FlinkKubeClient.java:85) ~[flink-dist_2.12-1.12.5.jar:1.12.5]
at org.apache.flink.kubernetes.kubeclient.FlinkKubeClientFactory.fromConfiguration(FlinkKubeClientFactory.java:106) ~[flink-dist_2.12-1.12.5.jar:1.12.5]
at org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory.createHAServices(KubernetesHaServicesFactory.java:37) ~[flink-dist_2.12-1.12.5.jar:1.12.5]
at org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.createCustomHAServices(HighAvailabilityServicesUtils.java:265) ~[flink-dist_2.12-1.12.5.jar:1.12.5]
at org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.createHighAvailabilityServices(HighAvailabilityServicesUtils.java:124) ~[flink-dist_2.12-1.12.5.jar:1.12.5]
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.createHaServices(ClusterEntrypoint.java:338) ~[flink-dist_2.12-1.12.5.jar:1.12.5]
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.initializeServices(ClusterEntrypoint.java:296) ~[flink-dist_2.12-1.12.5.jar:1.12.5]
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:224) ~[flink-dist_2.12-1.12.5.jar:1.12.5]
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$1(ClusterEntrypoint.java:178) ~[flink-dist_2.12-1.12.5.jar:1.12.5]
at org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28) ~[flink-dist_2.12-1.12.5.jar:1.12.5]
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:175) ~[flink-dist_2.12-1.12.5.jar:1.12.5]
This issue was due to using high-availability.cluster-id when it should be kubernetes.cluster-id.
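Applied to the flink-conf.yaml from the question, the fix would look roughly like this. It is a sketch: only the cluster-id key changes, and every value is copied from the question as posted.

```yaml
# flink-conf.yaml (HA-related keys only)
high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
# was: high-availability.cluster-id: thoros-cluster-1
kubernetes.cluster-id: thoros-cluster-1
high-availability.storageDir: s3:///company-flink-{{ .Values.environment }}/recovery
```

This is consistent with the stack trace, where the NullPointerException is raised by a checkNotNull in the Fabric8FlinkKubeClient constructor, presumably on the cluster id that was never set.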

Flink 1.10.0 - The heartbeat of ResourceManager with id xxxx timed out

I am running a Flink standalone HA cluster in Kubernetes. The same setup runs perfectly with Flink 1.9, but with Flink 1.10 I continuously get the error below.
INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - The heartbeat of ResourceManager with id 783439e4ead380c60498e32a8e1c0ce3 timed out.
DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor - Close ResourceManager connection 783439e4ead380c60498e32a8e1c0ce3.
org.apache.flink.runtime.taskexecutor.exceptions.TaskManagerException: The heartbeat of ResourceManager with id 783439e4ead380c60498e32a8e1c0ce3 timed out.
at org.apache.flink.runtime.taskexecutor.TaskExecutor$ResourceManagerHeartbeatListener.notifyHeartbeatTimeout(TaskExecutor.java:1842)
at org.apache.flink.runtime.heartbeat.HeartbeatMonitorImpl.run(HeartbeatMonitorImpl.java:109)
flink-conf.yaml :
jobmanager.rpc.address: xx.xxx.xx.xxx
jobmanager.rpc.port: 6123
jobmanager.heap.size: 1500m
taskmanager.memory.process.size: 4000m
taskmanager.numberOfTaskSlots: 1
parallelism.default: 1
jobmanager.execution.failover-strategy: region
state.backend: filesystem
state.checkpoints.dir: file:///checkpoints
state.savepoints.dir: file:///savepoints
high-availability: zookeeper
high-availability.jobmanager.port: 50010
high-availability.zookeeper.quorum: xx.xx.xx.xx:xxxx
high-availability.zookeeper.path.root: /flink
high-availability.cluster-id: /ABCD
high-availability.storageDir: file:///recovery
heartbeat.interval: 60000
heartbeat.timeout: 60000
taskmanager.debug.memory.log: true
taskmanager.debug.memory.log-interval: 10000
taskmanager.memory.managed.fraction: 0.1
blob.server.port: 6124
query.server.port: 6125

Flink CLI throws exception on EMR on a yarn cluster

After moving my environment from a standalone cluster to a YARN cluster on EMR, I have been running into issues with the Flink CLI commands when a job has been running for a long time. Running flink list on the CLI throws an exception:
> bin/flink list
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/flink-1.6.0/lib/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Waiting for response...
------------------------------------------------------------
The program finished with the following exception:
org.apache.flink.util.FlinkException: Failed to retrieve job list.
at org.apache.flink.client.cli.CliFrontend.listJobs(CliFrontend.java:438)
at org.apache.flink.client.cli.CliFrontend.lambda$list$0(CliFrontend.java:420)
at org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:979)
at org.apache.flink.client.cli.CliFrontend.list(CliFrontend.java:417)
at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1047)
at org.apache.flink.client.cli.CliFrontend.lambda$main$11(CliFrontend.java:1120)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1120)
Caused by: org.apache.flink.runtime.concurrent.FutureUtils$RetryException: Could not complete the operation. Number of retries has been exhausted.
at org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$5(FutureUtils.java:213)
at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
at org.apache.flink.runtime.rest.RestClient.lambda$submitRequest$1(RestClient.java:276)
at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:511)
at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:504)
at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:483)
at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:424)
at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:121)
at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:327)
at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:343)
at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:633)
at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:884)
at org.apache.flink.shaded.netty4.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.CompletionException: org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: localhost/127.0.0.1:8081
at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
at java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:943)
at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:926)
... 17 more
Caused by: org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: localhost/127.0.0.1:8081
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.flink.shaded.netty4.io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:325)
at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340)
... 7 more
Caused by: java.net.ConnectException: Connection refused
... 11 more
The job itself, as well as YARN, seems to be fine; there is no issue there. I am unsure how long it takes for this to happen: I have some jobs that run for about a week with no problems, but usually after 2+ weeks the exception occurs at some point. I am currently running version 1.6.0.
I am not sure which logs would be useful in this case, but I would be happy to provide anything I can in order to solve this problem.
Thank you
Update with logs:
2018-11-28 17:01:52,368 INFO org.apache.flink.client.cli.CliFrontend - --------------------------------------------------------------------------------
2018-11-28 17:01:52,369 INFO org.apache.flink.client.cli.CliFrontend - Starting Command Line Client (Version: 1.6.0, Rev:ff472b4, Date:07.08.2018 @ 13:31:13 UTC)
2018-11-28 17:01:52,369 INFO org.apache.flink.client.cli.CliFrontend - OS current user: hadoop
2018-11-28 17:01:52,790 INFO org.apache.flink.client.cli.CliFrontend - Current Hadoop/Kerberos user: hadoop
2018-11-28 17:01:52,790 INFO org.apache.flink.client.cli.CliFrontend - JVM: OpenJDK 64-Bit Server VM - Oracle Corporation - 1.8/25.181-b13
2018-11-28 17:01:52,790 INFO org.apache.flink.client.cli.CliFrontend - Maximum heap size: 7150 MiBytes
2018-11-28 17:01:52,790 INFO org.apache.flink.client.cli.CliFrontend - JAVA_HOME: /etc/alternatives/jre
2018-11-28 17:01:52,792 INFO org.apache.flink.client.cli.CliFrontend - Hadoop version: 2.8.3
2018-11-28 17:01:52,792 INFO org.apache.flink.client.cli.CliFrontend - JVM Options:
2018-11-28 17:01:52,792 INFO org.apache.flink.client.cli.CliFrontend - -Dlog.file=/home/hadoop/flink-1.6.0/log/flink-hadoop-client.log
2018-11-28 17:01:52,792 INFO org.apache.flink.client.cli.CliFrontend - -Dlog4j.configuration=file:/home/hadoop/flink-1.6.0/conf/log4j-cli.properties
2018-11-28 17:01:52,792 INFO org.apache.flink.client.cli.CliFrontend - -Dlogback.configurationFile=file:/home/hadoop/flink-1.6.0/conf/logback.xml
2018-11-28 17:01:52,792 INFO org.apache.flink.client.cli.CliFrontend - Program Arguments:
2018-11-28 17:01:52,792 INFO org.apache.flink.client.cli.CliFrontend - list
2018-11-28 17:01:52,794 INFO org.apache.flink.client.cli.CliFrontend - --------------------------------------------------------------------------------
2018-11-28 17:01:52,797 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, localhost
2018-11-28 17:01:52,797 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.port, 6123
2018-11-28 17:01:52,797 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.size, 20480m
2018-11-28 17:01:52,797 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.heap.size, 20480m
2018-11-28 17:01:52,797 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.memory.fraction, 0.9
2018-11-28 17:01:52,797 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 1
2018-11-28 17:01:52,797 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 1
2018-11-28 17:01:52,798 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: state.backend, rocksdb
2018-11-28 17:01:52,798 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: state.backend.fs.checkpointdir, s3://bucket/checkpoint
2018-11-28 17:01:52,798 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: state.checkpoints.dir, s3://bucket/checkpoint
2018-11-28 17:01:52,798 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: rest.port, 8081
2018-11-28 17:01:52,798 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: web.timeout, 60000
2018-11-28 17:01:52,798 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: akka.ask.timeout, 60s
2018-11-28 17:01:53,029 INFO org.apache.flink.runtime.security.modules.HadoopModule - Hadoop user set to hadoop (auth:SIMPLE)
2018-11-28 17:01:53,051 INFO org.apache.flink.client.cli.CliFrontend - Running 'list' command.
2018-11-28 17:01:53,082 WARN org.apache.flink.configuration.Configuration - Config uses deprecated configuration key 'jobmanager.rpc.address' instead of proper key 'rest.address'
2018-11-28 17:01:53,256 INFO org.apache.flink.runtime.rest.RestClient - Rest client endpoint started.
2018-11-28 17:01:53,418 INFO org.apache.flink.client.cli.CliFrontend - Waiting for response...
2018-11-28 17:02:53,492 INFO org.apache.flink.runtime.rest.RestClient - Shutting down rest endpoint.
2018-11-28 17:02:53,493 INFO org.apache.flink.runtime.rest.RestClient - Rest endpoint shutdown complete.
2018-11-28 17:02:53,495 ERROR org.apache.flink.client.cli.CliFrontend - Error while running the command.
org.apache.flink.util.FlinkException: Failed to retrieve job list.
at org.apache.flink.client.cli.CliFrontend.listJobs(CliFrontend.java:438)
at org.apache.flink.client.cli.CliFrontend.lambda$list$0(CliFrontend.java:420)
at org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:979)
at org.apache.flink.client.cli.CliFrontend.list(CliFrontend.java:417)
at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1047)
at org.apache.flink.client.cli.CliFrontend.lambda$main$11(CliFrontend.java:1120)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1120)
Caused by: org.apache.flink.runtime.concurrent.FutureUtils$RetryException: Could not complete the operation. Number of retries has been exhausted.
at org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$5(FutureUtils.java:213)
at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
at org.apache.flink.runtime.rest.RestClient.lambda$submitRequest$1(RestClient.java:276)
at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:511)
at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:504)
at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:483)
at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:424)
at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:121)
at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:327)
at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:343)
at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:633)
at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:884)
at org.apache.flink.shaded.netty4.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.CompletionException: org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: localhost/127.0.0.1:8081
at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
at java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:943)
at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:926)
... 17 more
Caused by: org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: localhost/127.0.0.1:8081
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.flink.shaded.netty4.io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:325)
at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340)
... 7 more
Caused by: java.net.ConnectException: Connection refused
... 11 more
Entrypoint + config logs
Container: container_1541525872902_0001_01_000001 on compute.internal_8041
=======================================================================================================
LogType:jobmanager.log
Log Upload Time:Thu Nov 29 20:13:10 +0000 2018
LogLength:12837590
Log Contents:
2018-11-06 18:26:25,585 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - --------------------------------------------------------------------------------
2018-11-06 18:26:25,586 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Starting YarnSessionClusterEntrypoint (Version: 1.6.0, Rev:ff472b4, Date:07.08.2018 @ 13:31:13 UTC)
2018-11-06 18:26:25,586 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - OS current user: yarn
2018-11-06 18:26:26,007 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Current Hadoop/Kerberos user: hadoop
2018-11-06 18:26:26,007 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - JVM: OpenJDK 64-Bit Server VM - Oracle Corporation - 1.8/25.181-b13
2018-11-06 18:26:26,007 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Maximum heap size: 13653 MiBytes
2018-11-06 18:26:26,007 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - JAVA_HOME: /usr/lib/jvm/java-openjdk
2018-11-06 18:26:26,008 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Hadoop version: 2.8.3
2018-11-06 18:26:26,008 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - JVM Options:
2018-11-06 18:26:26,008 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Xmx15360m
2018-11-06 18:26:26,008 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Dlog.file=/var/log/hadoop-yarn/containers/application_1541525872902_0001/container_1541525872902_0001_01_000001/jobmanager.log
2018-11-06 18:26:26,008 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Dlogback.configurationFile=file:logback.xml
2018-11-06 18:26:26,008 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Dlog4j.configuration=file:log4j.properties
2018-11-06 18:26:26,008 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Program Arguments: (none)
2018-11-06 18:26:26,010 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - --------------------------------------------------------------------------------
2018-11-06 18:26:26,011 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Registered UNIX signal handlers for [TERM, HUP, INT]
2018-11-06 18:26:26,013 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - YARN daemon is running as: hadoop Yarn client user obtainer: hadoop
2018-11-06 18:26:26,015 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: state.backend.fs.checkpointdir, s3://bucket/checkpoint
2018-11-06 18:26:26,016 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: web.timeout, 60000
2018-11-06 18:26:26,016 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: high-availability.cluster-id, application_1541525872902_0001
2018-11-06 18:26:26,016 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, localhost
2018-11-06 18:26:26,016 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.port, 6123
2018-11-06 18:26:26,016 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: rest.port, 8081
2018-11-06 18:26:26,016 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: internal.cluster.execution-mode, NORMAL
2018-11-06 18:26:26,016 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.memory.fraction, 0.9
2018-11-06 18:26:26,016 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 1
2018-11-06 18:26:26,016 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 1
2018-11-06 18:26:26,017 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: state.backend, rocksdb
2018-11-06 18:26:26,017 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: akka.ask.timeout, 60s
2018-11-06 18:26:26,017 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.heap.size, 20480m
2018-11-06 18:26:26,017 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.size, 20480m
2018-11-06 18:26:26,017 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: state.checkpoints.dir, s3://bucket/checkpoint
2018-11-06 18:26:26,031 INFO org.apache.flink.runtime.clusterframework.BootstrapTools - Setting directories for temporary files to: /mnt/yarn/usercache/hadoop/appcache/application_1541525872902_0001
2018-11-06 18:26:26,046 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Starting YarnSessionClusterEntrypoint.
2018-11-06 18:26:26,046 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Install default filesystem.
2018-11-06 18:26:26,108 INFO org.apache.flink.runtime.security.modules.HadoopModule - Hadoop user set to hadoop (auth:SIMPLE)
2018-11-06 18:26:26,125 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Initializing cluster services.
2018-11-06 18:26:26,131 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Trying to start actor system at compute.internal:40607
2018-11-06 18:26:26,612 INFO akka.event.slf4j.Slf4jLogger - Slf4jLogger started
2018-11-06 18:26:26,706 INFO akka.remote.Remoting - Starting remoting
2018-11-06 18:26:26,804 INFO akka.remote.Remoting - Remoting started; listening on addresses :[akka.tcp://flink@compute.internal:40607]
2018-11-06 18:26:26,813 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Actor system started at akka.tcp://flink@compute.internal:40607
2018-11-06 18:26:26,838 INFO org.apache.flink.runtime.blob.BlobServer - Created BLOB server storage directory /mnt/yarn/usercache/hadoop/appcache/application_1541525872902_0001/blobStore-b4eb7331-9ac8-4fc9-ab1f-64f6a9c8173f
2018-11-06 18:26:26,842 INFO org.apache.flink.runtime.blob.BlobServer - Started BLOB server at 0.0.0.0:32901 - max concurrent requests: 50 - max backlog: 1000
2018-11-06 18:26:26,857 INFO org.apache.flink.runtime.metrics.MetricRegistryImpl - No metrics reporter configured, no metrics will be exposed/reported.
2018-11-06 18:26:26,860 INFO org.apache.flink.runtime.dispatcher.FileArchivedExecutionGraphStore - Initializing FileArchivedExecutionGraphStore: Storage directory /mnt/yarn/usercache/hadoop/appcache/application_1541525872902_0001/executionGraphStore-0b405259-cc50-4332-93dc-847b92071699, expiration time 3600000, maximum cache size 52428800 bytes.
2018-11-29 20:13:10,717 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - RECEIVED SIGNAL 15: SIGTERM. Shutting down as requested.

Resume Flink when yarn crashes

I am running a 3-node YARN cluster on EMR (1 master, 2 core nodes) with Flink 1.6.0. Checkpointing is enabled (RocksDB state backend), writing to S3, and it seems to work correctly in other tests. When YARN crashes on the master node (here, I killed the YARN processes myself), I am unable to resume my application from the last checkpoint. Here is the output when I try to restart:
[hadoop@emr flink-1.6.0]$ bin/flink run -s s3://bucket/kinesis-pipeline-checkpoint/a8a9ceb95845c3ea9833e025b5771470 -p 1 -d ~/pipeline-assembly-0.2.0.jar
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/flink-1.6.0/lib/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2018-11-08 19:01:06,069 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - Found Yarn properties file under /tmp/.yarn-properties-hadoop.
2018-11-08 19:01:06,069 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - Found Yarn properties file under /tmp/.yarn-properties-hadoop.
2018-11-08 19:01:06,488 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - YARN properties set default parallelism to 1
2018-11-08 19:01:06,488 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - YARN properties set default parallelism to 1
YARN properties set default parallelism to 1
2018-11-08 19:01:06,637 INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at emr:8032
2018-11-08 19:01:06,745 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
2018-11-08 19:01:06,745 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
2018-11-08 19:01:06,845 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Found application JobManager host name 'emr' and port '39541' from supplied application id 'application_1541703591281_0001'
Starting execution of program
------------------------------------------------------------
The program finished with the following exception:
org.apache.flink.client.program.ProgramInvocationException: Could not submit job (JobID: c701b6511ad76b5e4faae703763f388e)
at org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:249)
at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:486)
at org.apache.flink.client.program.DetachedEnvironment.finalizeExecute(DetachedEnvironment.java:77)
at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:432)
at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:804)
at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:280)
at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:215)
at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1044)
at org.apache.flink.client.cli.CliFrontend.lambda$main$11(CliFrontend.java:1120)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1120)
Caused by: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit JobGraph.
at org.apache.flink.client.program.rest.RestClusterClient.lambda$submitJob$8(RestClusterClient.java:379)
at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
at org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$5(FutureUtils.java:213)
at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:561)
at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:929)
at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.CompletionException: org.apache.flink.runtime.concurrent.FutureUtils$RetryException: Could not complete the operation. Exception is not retryable.
at java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:326)
at java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:338)
at java.util.concurrent.CompletableFuture.uniRelay(CompletableFuture.java:911)
at java.util.concurrent.CompletableFuture$UniRelay.tryFire(CompletableFuture.java:899)
... 12 more
Caused by: org.apache.flink.runtime.concurrent.FutureUtils$RetryException: Could not complete the operation. Exception is not retryable.
... 10 more
Caused by: java.util.concurrent.CompletionException: org.apache.flink.runtime.rest.util.RestClientException: [Job submission failed.]
at java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:326)
at java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:338)
at java.util.concurrent.CompletableFuture.uniRelay(CompletableFuture.java:911)
at java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:953)
at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:926)
... 4 more
Caused by: org.apache.flink.runtime.rest.util.RestClientException: [Job submission failed.]
at org.apache.flink.runtime.rest.RestClient.parseResponse(RestClient.java:310)
at org.apache.flink.runtime.rest.RestClient.lambda$submitRequest$3(RestClient.java:294)
at java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:952)
... 5 more
Is this expected behavior, or am I doing something wrong in this situation?
Thank you
UPDATE: jobmanager.log
LogType:jobmanager.log
Log Upload Time:Tue Nov 20 16:37:52 +0000 2018
LogLength:49255
Log Contents:
2018-11-20 16:33:33,276 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - --------------------------------------------------------------------------------
2018-11-20 16:33:33,277 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Starting YarnSessionClusterEntrypoint (Version: 1.6.0, Rev:ff472b4, Date:07.08.2018 @ 13:31:13 UTC)
2018-11-20 16:33:33,278 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - OS current user: yarn
2018-11-20 16:33:33,672 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Current Hadoop/Kerberos user: hadoop
2018-11-20 16:33:33,672 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - JVM: OpenJDK 64-Bit Server VM - Oracle Corporation - 1.8/25.181-b13
2018-11-20 16:33:33,672 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Maximum heap size: 13653 MiBytes
2018-11-20 16:33:33,672 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - JAVA_HOME: /usr/lib/jvm/java-openjdk
2018-11-20 16:33:33,673 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Hadoop version: 2.8.3
2018-11-20 16:33:33,673 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - JVM Options:
2018-11-20 16:33:33,673 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Xmx15360m
2018-11-20 16:33:33,673 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Dlog.file=/var/log/hadoop-yarn/containers/application_1542731534971_0001/container_1542731534971_0001_01_000001/jobmanager.log
2018-11-20 16:33:33,673 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Dlogback.configurationFile=file:logback.xml
2018-11-20 16:33:33,673 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Dlog4j.configuration=file:log4j.properties
2018-11-20 16:33:33,673 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Program Arguments: (none)
2018-11-20 16:33:33,674 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - --------------------------------------------------------------------------------
2018-11-20 16:33:33,675 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Registered UNIX signal handlers for [TERM, HUP, INT]
2018-11-20 16:33:33,678 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - YARN daemon is running as: hadoop Yarn client user obtainer: hadoop
2018-11-20 16:33:33,680 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: state.backend.fs.checkpointdir, s3://bucket/kinesis-checkpoint
2018-11-20 16:33:33,680 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: web.timeout, 60000
2018-11-20 16:33:33,680 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: high-availability.cluster-id, application_1542731534971_0001
2018-11-20 16:33:33,680 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, localhost
2018-11-20 16:33:33,681 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.port, 6123
2018-11-20 16:33:33,681 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: rest.port, 8081
2018-11-20 16:33:33,681 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: internal.cluster.execution-mode, NORMAL
2018-11-20 16:33:33,681 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.memory.fraction, 0.9
2018-11-20 16:33:33,681 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 1
2018-11-20 16:33:33,681 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 1
2018-11-20 16:33:33,681 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: state.backend, rocksdb
2018-11-20 16:33:33,681 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: akka.ask.timeout, 60s
2018-11-20 16:33:33,681 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.heap.size, 20480m
2018-11-20 16:33:33,682 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.size, 20480m
2018-11-20 16:33:33,682 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: state.checkpoints.dir, s3://bucket/kinesis-checkpoint
2018-11-20 16:33:33,695 INFO org.apache.flink.runtime.clusterframework.BootstrapTools - Setting directories for temporary files to: /mnt/yarn/usercache/hadoop/appcache/application_1542731534971_0001
2018-11-20 16:33:33,708 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Starting YarnSessionClusterEntrypoint.
2018-11-20 16:33:33,708 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Install default filesystem.
2018-11-20 16:33:33,772 INFO org.apache.flink.runtime.security.modules.HadoopModule - Hadoop user set to hadoop (auth:SIMPLE)
2018-11-20 16:33:33,786 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Initializing cluster services.
2018-11-20 16:33:33,791 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Trying to start actor system at ip-172-31-18-80.us-west-2.compute.internal:45751
2018-11-20 16:33:34,239 INFO akka.event.slf4j.Slf4jLogger - Slf4jLogger started
2018-11-20 16:33:34,328 INFO akka.remote.Remoting - Starting remoting
2018-11-20 16:33:34,428 INFO akka.remote.Remoting - Remoting started; listening on addresses :[akka.tcp://flink@ip-172-31-18-80.us-west-2.compute.internal:45751]
2018-11-20 16:33:34,437 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Actor system started at akka.tcp://flink@ip-172-31-18-80.us-west-2.compute.internal:45751
2018-11-20 16:33:34,469 INFO org.apache.flink.runtime.blob.BlobServer - Created BLOB server storage directory /mnt/yarn/usercache/hadoop/appcache/application_1542731534971_0001/blobStore-1dc43ec8-8ed7-4342-adae-c8d20a691640
2018-11-20 16:33:34,473 INFO org.apache.flink.runtime.blob.BlobServer - Started BLOB server at 0.0.0.0:39955 - max concurrent requests: 50 - max backlog: 1000
2018-11-20 16:33:34,488 INFO org.apache.flink.runtime.metrics.MetricRegistryImpl - No metrics reporter configured, no metrics will be exposed/reported.
2018-11-20 16:33:34,492 INFO org.apache.flink.runtime.dispatcher.FileArchivedExecutionGraphStore - Initializing FileArchivedExecutionGraphStore: Storage directory /mnt/yarn/usercache/hadoop/appcache/application_1542731534971_0001/executionGraphStore-0c4fd7ac-17d2-40d6-b279-dfef5041a76f, expiration time 3600000, maximum cache size 52428800 bytes.
2018-11-20 16:33:34,514 INFO org.apache.flink.runtime.blob.TransientBlobCache - Created BLOB cache storage directory /mnt/yarn/usercache/hadoop/appcache/application_1542731534971_0001/blobStore-4c662c5c-afa5-4bf2-8a01-3acc0b9aa491
2018-11-20 16:33:34,521 WARN org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - Upload directory /tmp/flink-web-6885656b-18cc-451f-8853-03ff7cf14b0e/flink-web-upload does not exist, or has been deleted externally. Previously uploaded files are no longer available.
2018-11-20 16:33:34,522 INFO org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - Created directory /tmp/flink-web-6885656b-18cc-451f-8853-03ff7cf14b0e/flink-web-upload for file uploads.
2018-11-20 16:33:34,525 INFO org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - Starting rest endpoint.
2018-11-20 16:33:34,702 INFO org.apache.flink.runtime.webmonitor.WebMonitorUtils - Determined location of main cluster component log file: /var/log/hadoop-yarn/containers/application_1542731534971_0001/container_1542731534971_0001_01_000001/jobmanager.log
2018-11-20 16:33:34,702 INFO org.apache.flink.runtime.webmonitor.WebMonitorUtils - Determined location of main cluster component stdout file: /var/log/hadoop-yarn/containers/application_1542731534971_0001/container_1542731534971_0001_01_000001/jobmanager.out
2018-11-20 16:33:34,844 INFO org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - Rest endpoint listening at ip-172-31-18-80.us-west-2.compute.internal:35939
2018-11-20 16:33:34,844 INFO org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - http://ip-172-31-18-80.us-west-2.compute.internal:35939 was granted leadership with leaderSessionID=00000000-0000-0000-0000-000000000000
2018-11-20 16:33:34,844 INFO org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - Web frontend listening at http://ip-172-31-18-80.us-west-2.compute.internal:35939.
2018-11-20 16:33:34,857 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService - Starting RPC endpoint for org.apache.flink.yarn.YarnResourceManager at akka://flink/user/resourcemanager .
2018-11-20 16:33:34,948 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService - Starting RPC endpoint for org.apache.flink.runtime.dispatcher.StandaloneDispatcher at akka://flink/user/dispatcher .
2018-11-20 16:33:34,981 INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at ip-172-31-30-52.us-west-2.compute.internal/172.31.30.52:8030
2018-11-20 16:33:35,234 INFO org.apache.flink.yarn.YarnResourceManager - Recovered 0 containers from previous attempts ([]).
2018-11-20 16:33:35,237 INFO org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy - yarn.client.max-cached-nodemanagers-proxies : 0
2018-11-20 16:33:35,238 INFO org.apache.flink.yarn.YarnResourceManager - ResourceManager akka.tcp://flink@ip-172-31-18-80.us-west-2.compute.internal:45751/user/resourcemanager was granted leadership with fencing token 00000000000000000000000000000000
2018-11-20 16:33:35,239 INFO org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager - Starting the SlotManager.
2018-11-20 16:33:35,252 INFO org.apache.flink.runtime.dispatcher.StandaloneDispatcher - Dispatcher akka.tcp://flink@ip-172-31-18-80.us-west-2.compute.internal:45751/user/dispatcher was granted leadership with fencing token 00000000-0000-0000-0000-000000000000
2018-11-20 16:33:35,252 INFO org.apache.flink.runtime.dispatcher.StandaloneDispatcher - Recovering all persisted jobs.
2018-11-20 16:34:20,094 INFO org.apache.flink.runtime.dispatcher.StandaloneDispatcher - Submitting job bd0d5dbaeba3990a3bef1eebee49cd79 (Data Session Pipeline v0.0.7).
2018-11-20 16:34:20,108 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService - Starting RPC endpoint for org.apache.flink.runtime.jobmaster.JobMaster at akka://flink/user/jobmanager_0 .
2018-11-20 16:34:20,115 INFO org.apache.flink.runtime.jobmaster.JobMaster - Initializing job Data Session Pipeline v0.0.7 (bd0d5dbaeba3990a3bef1eebee49cd79).
2018-11-20 16:34:20,124 INFO org.apache.flink.runtime.jobmaster.JobMaster - Using restart strategy FixedDelayRestartStrategy(maxNumberRestartAttempts=2147483647, delayBetweenRestartAttempts=0) for Data Session Pipeline v0.0.7 (bd0d5dbaeba3990a3bef1eebee49cd79).
2018-11-20 16:34:20,127 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService - Starting RPC endpoint for org.apache.flink.runtime.jobmaster.slotpool.SlotPool at akka://flink/user/0e6f5de3-53ad-4bae-acf3-3c66106c0a54 .
2018-11-20 16:34:20,148 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Job recovers via failover strategy: full graph restart
2018-11-20 16:34:20,170 INFO org.apache.flink.runtime.jobmaster.JobMaster - Running initialization on master for job Data Session Pipeline v0.0.7 (bd0d5dbaeba3990a3bef1eebee49cd79).
2018-11-20 16:34:20,170 INFO org.apache.flink.runtime.jobmaster.JobMaster - Successfully ran initialization on master in 0 ms.
2018-11-20 16:34:20,203 INFO org.apache.flink.runtime.jobmaster.JobMaster - Using application-defined state backend: RocksDBStateBackend{checkpointStreamBackend=File State Backend (checkpoints: 's3://bucket/kinesis-checkpoint', savepoints: 'null', asynchronous: UNDEFINED, fileStateThreshold: -1), localRocksDbDirectories=null, enableIncrementalCheckpointing=TRUE}
2018-11-20 16:34:20,203 INFO org.apache.flink.runtime.jobmaster.JobMaster - Configuring application-defined state backend with job/cluster config
2018-11-20 16:34:22,624 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Starting job bd0d5dbaeba3990a3bef1eebee49cd79 from savepoint s3://bucket/kinesis-pipeline-checkpoint/8a6e5aeebeef202a2daddd3cf9419a80 ()
2018-11-20 16:34:22,663 ERROR org.apache.flink.runtime.rest.handler.job.JobSubmitHandler - Exception occurred in REST handler.
org.apache.flink.runtime.rest.handler.RestHandlerException: Job submission failed.
at org.apache.flink.runtime.rest.handler.job.JobSubmitHandler.lambda$handleRequest$2(JobSubmitHandler.java:119)
at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
at org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:770)
at akka.dispatch.OnComplete.internal(Future.scala:258)
at akka.dispatch.OnComplete.internal(Future.scala:256)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:186)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:183)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
at org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:83)
at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:534)
at akka.pattern.PipeToSupport$PipeableFuture$$anonfun$pipeTo$1.applyOrElse(PipeToSupport.scala:20)
at akka.pattern.PipeToSupport$PipeableFuture$$anonfun$pipeTo$1.applyOrElse(PipeToSupport.scala:18)
at scala.concurrent.Future$$anonfun$andThen$1.apply(Future.scala:436)
at scala.concurrent.Future$$anonfun$andThen$1.apply(Future.scala:435)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:91)
at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
at akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:90)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:39)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:415)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.util.concurrent.CompletionException: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit job.
at org.apache.flink.runtime.dispatcher.Dispatcher.lambda$submitJob$2(Dispatcher.java:256)
at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:561)
at java.util.concurrent.CompletableFuture$UniRun.tryFire(CompletableFuture.java:690)
at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:332)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:158)
at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:70)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.onReceive(AkkaRpcActor.java:142)
at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.onReceive(FencedAkkaRpcActor.java:40)
at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165)
at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
at akka.actor.ActorCell.invoke(ActorCell.scala:495)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
at akka.dispatch.Mailbox.run(Mailbox.scala:224)
at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
... 4 more
Caused by: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit job.
... 24 more
Caused by: java.util.concurrent.CompletionException: java.lang.RuntimeException: org.apache.flink.runtime.client.JobExecutionException: Could not set up JobManager
at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)
at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280)
at java.util.concurrent.CompletableFuture.uniRun(CompletableFuture.java:708)
at java.util.concurrent.CompletableFuture$UniRun.tryFire(CompletableFuture.java:687)
... 18 more
Caused by: java.lang.RuntimeException: org.apache.flink.runtime.client.JobExecutionException: Could not set up JobManager
at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:199)
at org.apache.flink.util.function.ConsumerWithException.accept(ConsumerWithException.java:40)
at org.apache.flink.runtime.dispatcher.Dispatcher.lambda$waitForTerminatingJobManager$29(Dispatcher.java:820)
at java.util.concurrent.CompletableFuture.uniRun(CompletableFuture.java:705)
... 19 more
Caused by: org.apache.flink.runtime.client.JobExecutionException: Could not set up JobManager
at org.apache.flink.runtime.jobmaster.JobManagerRunner.<init>(JobManagerRunner.java:176)
at org.apache.flink.runtime.dispatcher.Dispatcher$DefaultJobManagerRunnerFactory.createJobManagerRunner(Dispatcher.java:936)
at org.apache.flink.runtime.dispatcher.Dispatcher.createJobManagerRunner(Dispatcher.java:291)
at org.apache.flink.runtime.dispatcher.Dispatcher.runJob(Dispatcher.java:281)
at org.apache.flink.runtime.dispatcher.Dispatcher.persistAndRunJob(Dispatcher.java:266)
at org.apache.flink.util.function.ConsumerWithException.accept(ConsumerWithException.java:38)
... 21 more
Caused by: java.io.FileNotFoundException: Cannot find meta data file '_metadata' in directory 's3://sledfs/kinesis-pipeline-checkpoint/8a6e5aeebeef202a2daddd3cf9419a80'. Please try to load the checkpoint/savepoint directly from the metadata file instead of the directory.
at org.apache.flink.runtime.state.filesystem.AbstractFsCheckpointStorage.resolveCheckpointPointer(AbstractFsCheckpointStorage.java:256)
at org.apache.flink.runtime.state.filesystem.AbstractFsCheckpointStorage.resolveCheckpoint(AbstractFsCheckpointStorage.java:109)
at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.restoreSavepoint(CheckpointCoordinator.java:1102)
at org.apache.flink.runtime.jobmaster.JobMaster.tryRestoreExecutionGraphFromSavepoint(JobMaster.java:1220)
at org.apache.flink.runtime.jobmaster.JobMaster.createAndRestoreExecutionGraph(JobMaster.java:1144)
at org.apache.flink.runtime.jobmaster.JobMaster.<init>(JobMaster.java:295)
at org.apache.flink.runtime.jobmaster.JobManagerRunner.<init>(JobManagerRunner.java:157)
... 26 more
2018-11-20 16:37:52,321 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - RECEIVED SIGNAL 15: SIGTERM. Shutting down as requested.
2018-11-20 16:37:52,322 INFO org.apache.flink.runtime.blob.TransientBlobCache - Shutting down BLOB cache
2018-11-20 16:37:52,340 INFO org.apache.flink.runtime.blob.BlobServer - Stopped BLOB server at 0.0.0.0:39955
The checkpoint you are referring to, s3://bucket/kinesis-pipeline-checkpoint/a8a9ceb95845c3ea9833e025b5771470, does not contain a valid _metadata file. This indicates that the checkpoint was started but never completed. Please choose a checkpoint that completed successfully.
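One way to find a usable restore path is to list the objects under the checkpoint location and keep only the directories that directly contain a _metadata object. The sketch below does this over a plain list of object keys. The keys, the JOB_ID placeholder and the function name are hypothetical, and the listing itself (for example from an S3 client) is left out. It also assumes the usual Flink layout, in which each checkpoint lives in a chk-<n> subdirectory beneath the job ID directory, so the directory to pass to -s may be one level below the one used in the question.

```python
from typing import Iterable, List


def completed_checkpoints(keys: Iterable[str], metadata_name: str = "_metadata") -> List[str]:
    """Return the directories that directly contain a metadata object.

    `keys` is any iterable of object keys (paths without the s3://bucket/ prefix).
    A directory without `_metadata` is treated as an incomplete checkpoint.
    """
    dirs = set()
    for key in keys:
        # Split "a/b/_metadata" into parent "a/b" and file name "_metadata".
        parent, _, name = key.rpartition("/")
        if name == metadata_name and parent:
            dirs.add(parent)
    return sorted(dirs)


# Hypothetical listing; JOB_ID stands in for the real job ID directory.
listing = [
    "kinesis-pipeline-checkpoint/JOB_ID/shared/some-sst-file",
    "kinesis-pipeline-checkpoint/JOB_ID/chk-41/_metadata",
    "kinesis-pipeline-checkpoint/JOB_ID/chk-42/part-file",  # no _metadata here
]

print(completed_checkpoints(listing))
```

With this listing, only the chk-41 directory is reported. A directory found this way is a candidate for the path given to bin/flink run -s.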

Flink job fails after 10 minutes from initialization

My Flink application keeps failing.
The streaming job runs for a short while after being deployed on YARN,
but it fails after a few minutes with the error messages below.
Could this be a sign of high load on an underpowered YARN cluster?
Flink 1.5.0, single job on YARN
Each node has 100 GB of RAM and 40 vcores
48 YARN NodeManagers
2 Kafka input topics (150 GB/hour per input stream)
480 Kafka partitions
10 Flink slots per NodeManager
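As a rough sanity check on the load question, the figures above can be turned into an average ingest rate per slot. This is only a back-of-the-envelope sketch. It assumes decimal units (1 GB = 1000 MB) and an even spread of data across all slots, neither of which the question states, and it ignores state size, checkpointing and data skew.

```python
# Figures taken from the list above; how they combine is an assumption.
NODE_MANAGERS = 48
SLOTS_PER_NODE_MANAGER = 10
INPUT_TOPICS = 2
GB_PER_HOUR_PER_TOPIC = 150


def total_slots(node_managers: int, slots_per_node_manager: int) -> int:
    """Total number of Flink task slots in the cluster."""
    return node_managers * slots_per_node_manager


def ingest_mb_per_second(topics: int, gb_per_hour_per_topic: float) -> float:
    """Average ingest rate in MB/s, using decimal units (1 GB = 1000 MB)."""
    return topics * gb_per_hour_per_topic * 1000 / 3600


slots = total_slots(NODE_MANAGERS, SLOTS_PER_NODE_MANAGER)
total_rate = ingest_mb_per_second(INPUT_TOPICS, GB_PER_HOUR_PER_TOPIC)
per_slot_rate = total_rate / slots

print(f"slots: {slots}")
print(f"total ingest: {total_rate:.1f} MB/s")
print(f"per slot: {per_slot_rate:.3f} MB/s")
```

The slot count comes out at 480, the same as the partition count given in the list, and the average per-slot rate is well under 1 MB/s. An average this low does not rule out load problems, but it suggests checking the logs for other causes as well.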
The jobmanager.log from the start of the Flink session:
Log Type: jobmanager.log
Log Upload Time: Tue Jun 12 18:19:50 +0900 2018
Log Length: 10807897
2018-06-11 18:59:27,167 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - --------------------------------------------------------------------------------
2018-06-11 18:59:27,168 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Starting YarnSessionClusterEntrypoint (Version: 1.5.0, Rev:c61b108, Date:24.05.2018 @ 14:54:44 UTC)
2018-06-11 18:59:27,168 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - OS current user: irteam
2018-06-11 18:59:27,472 WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2018-06-11 18:59:27,536 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Current Hadoop/Kerberos user: irteam
2018-06-11 18:59:27,536 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - JVM: OpenJDK 64-Bit Server VM - Oracle Corporation - 1.8/25.161-b14
2018-06-11 18:59:27,536 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Maximum heap size: 66667 MiBytes
2018-06-11 18:59:27,537 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - JAVA_HOME: /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.161-0.b14.el7_4.x86_64
2018-06-11 18:59:27,537 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Hadoop version: 2.8.3
2018-06-11 18:59:27,537 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - JVM Options:
2018-06-11 18:59:27,538 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Xmx75000m
2018-06-11 18:59:27,538 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Djava.library.path=/home1/irteam/realtime-tools
2018-06-11 18:59:27,538 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Dlog.file=/naver/search-cluster/eye/var/logs/application_1528711080009_0002/container_e08_1528711080009_0002_01_000001/jobmanager.log
2018-06-11 18:59:27,538 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Dlogback.configurationFile=file:logback.xml
2018-06-11 18:59:27,538 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Dlog4j.configuration=file:log4j.properties
2018-06-11 18:59:27,538 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Program Arguments: (none)
2018-06-11 18:59:27,538 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Class path[omit]
2018-06-11 18:59:27,539 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - --------------------------------------------------------------------------------
2018-06-11 18:59:27,539 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Registered UNIX signal handlers for [TERM, HUP, INT]
2018-06-11 18:59:27,542 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - YARN daemon is running as: irteam Yarn client user obtainer: irteam
2018-06-11 18:59:27,544 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: env.java.home, "/usr/lib/jvm/java-1.8.0-openjdk"
2018-06-11 18:59:27,544 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: env.java.opts, "-Djava.library.path=/home1/irteam/realtime-tools"
2018-06-11 18:59:27,545 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: high-availability.cluster-id, application_1528711080009_0002
2018-06-11 18:59:27,545 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, 0.0.0.0
2018-06-11 18:59:27,545 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.mb, 100000
2018-06-11 18:59:27,545 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.network.request-backoff.max, 100000
2018-06-11 18:59:27,545 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.port, 6123
2018-06-11 18:59:27,545 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: containerized.taskmanager.env.JAVA_HOME, /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.161-0.b14.el7_4.x86_64
2018-06-11 18:59:27,545 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: rest.port, 8081
2018-06-11 18:59:27,545 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: internal.cluster.execution-mode, NORMAL
2018-06-11 18:59:27,545 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 480
2018-06-11 18:59:27,546 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 10
2018-06-11 18:59:27,546 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.heap.mb, 100000
2018-06-11 18:59:27,546 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: containerized.master.env.JAVA_HOME, /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.161-0.b14.el7_4.x86_64
2018-06-11 18:59:27,558 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Setting directories for temporary files to: /home1/irteam/naver/search-cluster/eye/volume/nodemanager/usercache/irteam/appcache/application_1528711080009_0002
2018-06-11 18:59:27,570 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Starting YarnSessionClusterEntrypoint.
2018-06-11 18:59:27,570 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Install default filesystem.
2018-06-11 18:59:27,636 INFO org.apache.flink.runtime.security.modules.HadoopModule - Hadoop user set to irteam (auth:SIMPLE)
2018-06-11 18:59:27,650 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Initializing cluster services.
2018-06-11 18:59:27,654 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Trying to start actor system at chd004.eye.nfra.io:33524
2018-06-11 18:59:28,126 INFO akka.event.slf4j.Slf4jLogger - Slf4jLogger started
2018-06-11 18:59:28,222 INFO akka.remote.Remoting - Starting remoting
2018-06-11 18:59:28,322 INFO akka.remote.Remoting - Remoting started; listening on addresses :[akka.tcp://flink@chd004.eye.nfra.io:33524]
2018-06-11 18:59:28,329 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Actor system started at akka.tcp://flink@chd004.eye.nfra.io:33524
2018-06-11 18:59:28,348 INFO org.apache.flink.runtime.blob.BlobServer - Created BLOB server storage directory /home1/irteam/naver/search-cluster/eye/volume/nodemanager/usercache/irteam/appcache/application_1528711080009_0002/blobStore-c25d4d9d-4ddc-442d-8d5e-7bec36dca006
2018-06-11 18:59:28,349 INFO org.apache.flink.runtime.blob.BlobServer - Started BLOB server at 0.0.0.0:45733 - max concurrent requests: 50 - max backlog: 1000
2018-06-11 18:59:28,363 INFO org.apache.flink.runtime.metrics.MetricRegistryImpl - No metrics reporter configured, no metrics will be exposed/reported.
2018-06-11 18:59:28,367 INFO org.apache.flink.runtime.dispatcher.FileArchivedExecutionGraphStore - Initializing FileArchivedExecutionGraphStore: Storage directory /home1/irteam/naver/search-cluster/eye/volume/nodemanager/usercache/irteam/appcache/application_1528711080009_0002/executionGraphStore-63bcf196-410d-4d8c-8388-f270beb53555, expiration time 3600000, maximum cache size 52428800 bytes.
2018-06-11 18:59:28,388 INFO org.apache.flink.runtime.blob.TransientBlobCache - Created BLOB cache storage directory /home1/irteam/naver/search-cluster/eye/volume/nodemanager/usercache/irteam/appcache/application_1528711080009_0002/blobStore-02db740f-8c23-46e8-bb24-1f583b6a0b33
2018-06-11 18:59:28,395 WARN org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - Upload directory /tmp/flink-web-8698d702-67fe-437c-b62e-78c2969bf770/flink-web-upload does not exist, or has been deleted externally. Previously uploaded files are no longer available.
2018-06-11 18:59:28,396 INFO org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - Created directory /tmp/flink-web-8698d702-67fe-437c-b62e-78c2969bf770/flink-web-upload for file uploads.
2018-06-11 18:59:28,399 INFO org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - Starting rest endpoint.
2018-06-11 18:59:28,737 INFO org.apache.flink.runtime.webmonitor.WebMonitorUtils - Determined location of main cluster component log file: /naver/search-cluster/eye/var/logs/application_1528711080009_0002/container_e08_1528711080009_0002_01_000001/jobmanager.log
2018-06-11 18:59:28,737 INFO org.apache.flink.runtime.webmonitor.WebMonitorUtils - Determined location of main cluster component stdout file: /naver/search-cluster/eye/var/logs/application_1528711080009_0002/container_e08_1528711080009_0002_01_000001/jobmanager.out
2018-06-11 18:59:28,808 INFO org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - Rest endpoint listening at chd004.eye.nfra.io:39794
2018-06-11 18:59:28,808 INFO org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - http://chd004.eye.nfra.io:39794 was granted leadership with leaderSessionID=00000000-0000-0000-0000-000000000000
2018-06-11 18:59:28,808 INFO org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - Web frontend listening at http://chd004.eye.nfra.io:39794.
2018-06-11 18:59:28,817 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService - Starting RPC endpoint for org.apache.flink.yarn.YarnResourceManager at akka://flink/user/resourcemanager .
2018-06-11 18:59:28,902 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService - Starting RPC endpoint for org.apache.flink.runtime.dispatcher.StandaloneDispatcher at akka://flink/user/dispatcher .
2018-06-11 18:59:28,916 INFO org.apache.flink.yarn.YarnResourceManager - ResourceManager akka.tcp://flink@chd004.eye.nfra.io:33524/user/resourcemanager was granted leadership with fencing token 00000000000000000000000000000000
2018-06-11 18:59:28,917 INFO org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager - Starting the SlotManager.
2018-06-11 18:59:29,161 INFO org.apache.flink.yarn.YarnResourceManager - Recovered 0 containers from previous attempts ([]).
2018-06-11 18:59:29,163 INFO org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy - yarn.client.max-cached-nodemanagers-proxies : 0
2018-06-11 18:59:29,174 INFO org.apache.flink.runtime.dispatcher.StandaloneDispatcher - Dispatcher akka.tcp://flink@chd004.eye.nfra.io:33524/user/dispatcher was granted leadership with fencing token 00000000000000000000000000000000
2018-06-11 18:59:29,174 INFO org.apache.flink.runtime.dispatcher.StandaloneDispatcher - Recovering all persisted jobs.
2018-06-11 18:59:31,120 INFO org.apache.flink.runtime.dispatcher.StandaloneDispatcher - Submitting job 5f090c4f4287db062cee0996da5d5ffc (LCS realtime data).
2018-06-11 18:59:31,130 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService - Starting RPC endpoint for org.apache.flink.runtime.jobmaster.JobMaster at akka://flink/user/jobmanager_0 .
2018-06-11 18:59:31,136 INFO org.apache.flink.runtime.jobmaster.JobMaster - Initializing job LCS realtime data (5f090c4f4287db062cee0996da5d5ffc).
2018-06-11 18:59:31,144 INFO org.apache.flink.runtime.jobmaster.JobMaster - Using restart strategy FixedDelayRestartStrategy(maxNumberRestartAttempts=3, delayBetweenRestartAttempts=30000) for LCS realtime data (5f090c4f4287db062cee0996da5d5ffc).
2018-06-11 18:59:31,148 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService - Starting RPC endpoint for org.apache.flink.runtime.jobmaster.slotpool.SlotPool at akka://flink/user/a6ffe322-07db-4282-a29c-0836ad26cd9f .
2018-06-11 18:59:31,165 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Job recovers via failover strategy: full graph restart
2018-06-11 18:59:31,174 INFO org.apache.flink.runtime.jobmaster.JobMaster - Running initialization on master for job LCS realtime data (5f090c4f4287db062cee0996da5d5ffc).
2018-06-11 18:59:31,174 INFO org.apache.flink.runtime.jobmaster.JobMaster - Successfully ran initialization on master in 0 ms.
2018-06-11 18:59:31,248 INFO org.apache.flink.runtime.jobmaster.JobMaster - Using application-defined state backend: File State Backend (checkpoints: 'file:/home1/irteam/apps/flink-1.4.0/checkpoint', savepoints: 'null', asynchronous: UNDEFINED, fileStateThreshold: -1)
2018-06-11 18:59:31,248 INFO org.apache.flink.runtime.jobmaster.JobMaster - Configuring application-defined state backend with job/cluster config
2018-06-11 18:59:31,258 INFO org.apache.flink.runtime.jobmaster.JobManagerRunner - JobManager runner for job LCS realtime data (5f090c4f4287db062cee0996da5d5ffc) was granted leadership with session id 00000000-0000-0000-0000-000000000000 at akka.tcp://flink@chd004.eye.nfra.io:33524/user/jobmanager_0.
2018-06-11 18:59:31,260 INFO org.apache.flink.runtime.jobmaster.JobMaster - Starting execution of job LCS realtime data (5f090c4f4287db062cee0996da5d5ffc)
2018-06-11 18:59:31,261 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Job LCS realtime data (5f090c4f4287db062cee0996da5d5ffc) switched from state CREATED to RUNNING.
2018-06-11 18:59:31,264 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Custom Source (1/480) (98a01166bb2ac99dd301e4b60febbc45) switched from CREATED to SCHEDULED.
Log around the timeout event that might be causing the Flink job to fail:
2018-06-12 18:17:39,750 INFO org.apache.flink.runtime.rest.handler.legacy.backpressure.StackTraceSampleCoordinator - Cancelling sample 5589
akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka.tcp://flink@chd023.eye.nfra.io:34783/user/taskmanager_0#-297572584]] after [15000 ms]. Sender[null] sent message of type "org.apache.flink.runtime.rpc.messages.RemoteRpcInvocation".
at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)
at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
at java.lang.Thread.run(Thread.java:748)
2018-06-12 18:17:39,770 INFO org.apache.flink.runtime.rest.handler.legacy.backpressure.StackTraceSampleCoordinator - Cancelling sample 5590
akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka.tcp://flink@chd032.eye.nfra.io:34653/user/taskmanager_0#424015125]] after [15000 ms]. Sender[null] sent message of type "org.apache.flink.runtime.rpc.messages.RemoteRpcInvocation".
at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)
at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
at java.lang.Thread.run(Thread.java:748)
2018-06-12 18:17:51,270 INFO org.apache.flink.runtime.rest.handler.legacy.backpressure.StackTraceSampleCoordinator - Cancelling sample 5591
akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka.tcp://flink@chd032.eye.nfra.io:34653/user/taskmanager_0#424015125]] after [15000 ms]. Sender[null] sent message of type "org.apache.flink.runtime.rpc.messages.RemoteRpcInvocation".
at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)
at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
at java.lang.Thread.run(Thread.java:748)
2018-06-12 18:17:55,650 INFO org.apache.flink.yarn.YarnResourceManager - The heartbeat of TaskManager with id container_e08_1528711080009_0002_01_000017 timed out.
2018-06-12 18:17:55,650 INFO org.apache.flink.yarn.YarnResourceManager - Closing TaskExecutor connection container_e08_1528711080009_0002_01_000017 because: The heartbeat of TaskManager with id container_e08_1528711080009_0002_01_000017 timed out.
2018-06-12 18:17:55,650 INFO org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager - Unregister TaskManager 525095d833344e8b205017666accd9c5 from the SlotManager.
2018-06-12 18:17:55,650 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Window(EventTimeSessionWindows(300000), NowTrigger, NowSessionProcessor) -> Sink: Unnamed (188/480) (f9ed2fc23d6ca5a364300864b60760af) switched from RUNNING to FAILED.
org.apache.flink.util.FlinkException: Releasing TaskManager container_e08_1528711080009_0002_01_000017.
at org.apache.flink.runtime.jobmaster.slotpool.SlotPool.releaseTaskManagerInternal(SlotPool.java:1067)
at org.apache.flink.runtime.jobmaster.slotpool.SlotPool.releaseTaskManager(SlotPool.java:1050)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:247)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:162)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.onReceive(AkkaRpcActor.java:142)
at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165)
at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
at akka.actor.ActorCell.invoke(ActorCell.scala:495)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
at akka.dispatch.Mailbox.run(Mailbox.scala:224)
at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
2018-06-12 18:17:55,651 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Job LCS realtime data (5f090c4f4287db062cee0996da5d5ffc) switched from state RUNNING to FAILING.
org.apache.flink.util.FlinkException: Releasing TaskManager container_e08_1528711080009_0002_01_000017.
at org.apache.flink.runtime.jobmaster.slotpool.SlotPool.releaseTaskManagerInternal(SlotPool.java:1067)
at org.apache.flink.runtime.jobmaster.slotpool.SlotPool.releaseTaskManager(SlotPool.java:1050)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:247)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:162)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.onReceive(AkkaRpcActor.java:142)
at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165)
at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
at akka.actor.ActorCell.invoke(ActorCell.scala:495)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
at akka.dispatch.Mailbox.run(Mailbox.scala:224)
at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
2018-06-12 18:17:55,679 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Custom Source (1/480) (98a01166bb2ac99dd301e4b60febbc45) switched from RUNNING to CANCELING.
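Because the jobmanager.log is large (about 10 MB), it may help to summarize which TaskManagers time out before reading it line by line. The following is a minimal sketch, assuming the log lines have the shape shown above; the regular expressions and the function name summarize_timeouts are illustrative and not part of Flink. The sample lines are shortened copies of lines from the excerpt.

```python
import re
from collections import Counter

# Matches only the first heartbeat message, not the follow-up
# "Closing TaskExecutor connection ... because: ..." line that repeats it.
HEARTBEAT_RE = re.compile(
    r" - The heartbeat of TaskManager with id (\S+) timed out\."
)
# Captures the host of the actor that did not answer in time.
# The separator after the actor system name is matched loosely ('@' or '#').
ASK_TIMEOUT_RE = re.compile(
    r"AskTimeoutException: Ask timed out on "
    r"\[Actor\[akka\.tcp://flink[@#]([^:/\]]+):\d+"
)


def summarize_timeouts(lines):
    """Count heartbeat timeouts per container and ask timeouts per host."""
    heartbeats = Counter()
    asks = Counter()
    for line in lines:
        match = HEARTBEAT_RE.search(line)
        if match:
            heartbeats[match.group(1)] += 1
        match = ASK_TIMEOUT_RE.search(line)
        if match:
            asks[match.group(1)] += 1
    return heartbeats, asks


SAMPLE = [
    "2018-06-12 18:17:39,750 INFO org.apache.flink.runtime.rest.handler.legacy.backpressure.StackTraceSampleCoordinator - Cancelling sample 5589",
    "akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka.tcp://flink@chd023.eye.nfra.io:34783/user/taskmanager_0#-297572584]] after [15000 ms].",
    "akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka.tcp://flink@chd032.eye.nfra.io:34653/user/taskmanager_0#424015125]] after [15000 ms].",
    "akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka.tcp://flink@chd032.eye.nfra.io:34653/user/taskmanager_0#424015125]] after [15000 ms].",
    "2018-06-12 18:17:55,650 INFO org.apache.flink.yarn.YarnResourceManager - The heartbeat of TaskManager with id container_e08_1528711080009_0002_01_000017 timed out.",
    "2018-06-12 18:17:55,650 INFO org.apache.flink.yarn.YarnResourceManager - Closing TaskExecutor connection container_e08_1528711080009_0002_01_000017 because: The heartbeat of TaskManager with id container_e08_1528711080009_0002_01_000017 timed out.",
]

heartbeats, asks = summarize_timeouts(SAMPLE)
print("heartbeat timeouts per container:", dict(heartbeats))
print("ask timeouts per host:", dict(asks))
```

To run it against the real file, pass an open file object instead of SAMPLE, for example `summarize_timeouts(open("jobmanager.log"))`. If the same hosts show up repeatedly, that would point at specific nodes rather than the cluster as a whole.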

Resources