Configuring Ports for Flink Job/Task Manager Metrics

I am running Flink in Amazon EMR. In flink-conf.yaml, I have metrics.reporter.prom.port: 9249-9250
Depending on whether the job manager and task manager are running on the same node, the task manager metrics are reported on port 9250 (if running on the same node as the job manager) or on port 9249 (if running on a different node).
Is there a way to configure this so that the task manager metrics are always reported on port 9250?
I saw a post saying that we can "provide each *Manager with a separate configuration." How can I do that?
Thanks

You can configure different ports for the JobManager and TaskManager by starting the processes with differently configured flink-conf.yaml files.
On YARN, Flink currently uses the same flink-conf.yaml for all processes.
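A minimal sketch of that approach for a standalone setup, assuming you keep two copies of the configuration directory (the directory names below are hypothetical):

    # conf-jobmanager/flink-conf.yaml
    metrics.reporter.prom.port: 9249

    # conf-taskmanager/flink-conf.yaml
    metrics.reporter.prom.port: 9250

    # point each process at its own configuration directory
    FLINK_CONF_DIR=conf-jobmanager ./bin/jobmanager.sh start
    FLINK_CONF_DIR=conf-taskmanager ./bin/taskmanager.sh start

The standalone scripts read the FLINK_CONF_DIR environment variable; on YARN (as on EMR) all containers share one configuration, so this workaround does not apply there.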

Related

Daemon thread doesn't complete its execution when we restart ZooKeeper

In the current architecture of our project we use Solr for gathering, storing, and indexing documents from different sources and making them searchable in near real time.
Our web applications running on Tomcat connect to Solr to create / modify the documents.
Solr uses ZooKeeper to keep the configuration centralized.
There are 5 servers in our cluster on which we run Solr.
When ZooKeeper restarts on one of the servers, the daemon thread created on that server doesn't complete its execution, and as a result we get continuous logs with the exception below while trying to connect to ZooKeeper from the Tomcat instance:
org.apache.catalina.loader.WebappClassLoaderBase.checkStateForResourceLoading Illegal access: this web application instance has been stopped already. Could not load [org.apache.zookeeper.ClientCnxn$SendThread]. The following stack trace is thrown for debugging purposes as well as to attempt to terminate the thread which caused the illegal access.
After some time this exhausts the threads on the server.
Can someone help me with the question below, please?
Why doesn't the daemon thread complete its execution when we restart ZooKeeper?
Solr version: 8.5.1
ZooKeeper version: 3.5.5

How to migrate a Vert.x timer in cluster mode?

I deploy multiple Vert.x instances on different servers in cluster mode using the Hazelcast cluster manager. If one instance has a vertx timer and that instance shuts down unexpectedly, how can I migrate the timer to another instance and make sure it still runs with the right delay?
Apologies for my bad English, but I really need help.
Vert.x timers are not clustered. What you could do is:
start your cluster nodes in high-availability (HA) mode, and
deploy the verticle that creates the timer as an HA deployment (see the sketch below).
When Vert.x runs with HA enabled, if a Vert.x instance where a verticle runs fails or dies, the verticle is redeployed automatically on another Vert.x instance of the cluster.
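A minimal Java sketch of that setup using the Vert.x 3.x API, assuming vertx-hazelcast is on the classpath; TimerVerticle is a hypothetical verticle that arms its timer in start():

    import io.vertx.core.AbstractVerticle;
    import io.vertx.core.DeploymentOptions;
    import io.vertx.core.Vertx;
    import io.vertx.core.VertxOptions;

    // Hypothetical verticle: the timer is armed in start(), so it is set up
    // again whenever the verticle is (re)deployed on a node.
    class TimerVerticle extends AbstractVerticle {
        @Override
        public void start() {
            vertx.setTimer(60_000, id -> System.out.println("timer fired"));
        }
    }

    public class HaTimerExample {
        public static void main(String[] args) {
            // Join the cluster (Hazelcast is picked up from the classpath)
            // with high availability enabled for this node.
            VertxOptions options = new VertxOptions().setHAEnabled(true);
            Vertx.clusteredVertx(options, res -> {
                if (res.succeeded()) {
                    // Deploy by name and mark the deployment as HA, so that if
                    // this node dies another HA node re-instantiates the verticle.
                    res.result().deployVerticle(TimerVerticle.class.getName(),
                            new DeploymentOptions().setHa(true));
                }
            });
        }
    }

Note that failover runs start() again, so the timer restarts with its full delay rather than the remaining one; if the remaining delay matters, you would have to persist the deadline somewhere shared (for example a Hazelcast map) and compute the delay in start().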

Apache Flink number of taskmanagers in local mode

I am working on an Apache Flink (1.5.0) based streaming application.
As part of this I have launched Flink in local mode on my Windows machine.
In order to run my job with a degree of parallelism of 8, I need 8 task managers providing one task slot each.
I added a task manager with the following command:
'/cygdrive/b/Binaries Flink/flink-1.5.0/bin/taskmanager.sh' start
The first few times, a task manager was added successfully with the following message:
[INFO] 3 instance(s) of taskexecutor are already running on ... .
Starting taskexecutor daemon on host ... .
After 5 task managers were available, I got the same message:
[INFO] 5 instance(s) of taskexecutor are already running on ... .
Starting taskexecutor daemon on host ... .
The problem is that a sixth task manager is never created.
When I stop one task manager, the count goes down to 4 and I can add one additional task manager, but never more than 5 task managers in total.
Is there any limit on the number of task managers?
Did anyone experience similar behaviour?
Thank you very much
There is no limit on how many TaskManagers you can start locally. The only limit is the resources available on your local machine.
If you are using standalone mode in Flink 1.5.0, then you can also set the number of slots per TaskManager to 7 by adding the following line to flink-conf.yaml:
taskmanager.numberOfTaskSlots: 7
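As a sketch of how that plays out, assuming a standalone setup started from the Flink distribution directory:

    # flink-conf.yaml: each TaskManager now offers 7 slots
    taskmanager.numberOfTaskSlots: 7

    # restart so running TaskManagers pick up the new value
    ./bin/stop-cluster.sh
    ./bin/start-cluster.sh
    ./bin/taskmanager.sh start    # add a second TaskManager -> 14 slots total

With two TaskManagers at 7 slots each you have 14 slots, comfortably enough for a job with parallelism 8.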

Remote debugging Flink local cluster

I want to deploy my jobs on a local Flink cluster during development (i.e. JobManager and TaskManager running on my development laptop) and use remote debugging. I tried adding
"-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005" to the flink-conf.yaml file. Since the job manager and task manager run on the same machine, the task manager throws an exception stating that the socket is already in use, and terminates. Is there any way I can get this running?
You are probably setting env.java.opts, which affects all JVMs started by Flink. Since the jobmanager is started first, it grabs the port before the taskmanager starts.
You can use env.java.opts.taskmanager to pass parameters only to taskmanager JVMs.
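A minimal flink-conf.yaml sketch of that approach (the port numbers are just examples):

    # attach the debugger only to TaskManager JVMs
    env.java.opts.taskmanager: -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005

    # optionally debug the JobManager on a different port
    env.java.opts.jobmanager: -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5006

Using suspend=n keeps the processes from blocking at startup until a debugger attaches; note that several TaskManagers on one machine would still collide on the same debug port.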

Flink state backend for TaskManager

I have a Flink v1.2 setup with 1 JobManager and 2 TaskManagers, each in its own VM. I configured the state backend to filesystem and pointed it at a local location on each of the above hosts (state.backend.fs.checkpointdir: file:///home/ubuntu/Prototype/flink/flink-checkpoints). I have set parallelism to 1, and each task manager has 1 slot.
I then run an event processing job on the JobManager which assigns it to a TaskManager.
I kill the TaskManager running the job, and after a few unsuccessful attempts on the failed TaskManager, Flink tries to run the job on the remaining TaskManager. At this point it fails again because it cannot find the corresponding checkpoints / state: java.io.FileNotFoundException: /home/ubuntu/Prototype/flink/flink-checkpoints/56c409681baeaf205bc1ba6cbe9f8091/chk-14/46f6e71d-ebfe-4b49-bf35-23c2e7f97923 (No such file or directory)
The folder /home/ubuntu/Prototype/flink/flink-checkpoints/56c409681baeaf205bc1ba6cbe9f8091 only exists on the TaskManager that I killed and not on the other one.
My question is: am I supposed to set the same shared location for checkpointing / state on all the task managers if I want the above failover to work?
Thanks!
The checkpoint directory you use needs to be shared across all machines that make up your Flink cluster. Typically this would be something like HDFS or S3, but it can be any shared filesystem.
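For example, using the Flink 1.2 configuration keys from the question (the HDFS address below is hypothetical):

    state.backend: filesystem
    state.backend.fs.checkpointdir: hdfs://namenode:8020/flink/flink-checkpoints

With a shared location like this, the remaining TaskManager can read the checkpoint files written by the one that was killed and restore the job's state.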
