Flink - multiple job managers in jobmanager.rpc.address

I'm trying to configure Flink with two job managers for high availability (HA). Should I specify both of them in flink-conf.yaml under jobmanager.rpc.address?
If yes, how?

You don't need to. In HA mode, jobmanager.rpc.address is chosen automatically by default. Have a look at the docs:
By default, the job manager will pick a random port for inter-process communication. You can change this via the high-availability.jobmanager.port key. This key accepts single ports (e.g. 50010), ranges (50000-50025), or a combination of both (50010,50011,50020-50025,50050-50075).
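For reference, a minimal ZooKeeper-based HA sketch (the hostnames, ports, and storage path below are placeholders, not values from the question) lists both job managers in conf/masters, one host:webui-port per line:

master1:8081
master2:8081

and points flink-conf.yaml at ZooKeeper instead of a fixed rpc address:

high-availability: zookeeper
high-availability.zookeeper.quorum: zk1:2181,zk2:2181,zk3:2181
high-availability.storageDir: hdfs:///flink/ha/
high-availability.jobmanager.port: 50000-50025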

Related

Flink on YARN: how do I specify the number of Task Managers

In early versions of Flink (e.g., 1.6), I could specify the number of Task Managers both in session mode with -n and in per-job mode with -yn, but these flags no longer exist in later versions of Flink (e.g., 1.12).
How should I set the number of Task Managers on YARN in newer versions of Flink? Or what related properties can I use to control the resources used by Flink?
In newer versions of Flink, the resource manager dynamically launches task managers as needed to provide the number of slots requested by the job(s) that are submitted. Each task manager will take its configuration either from flink-conf.yaml, or from the parameters provided when the cluster is started via yarn-session.sh.
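As a rough illustration (the flag spellings below are from 1.12-era yarn-session.sh; double-check them against your version), you steer resources through slots and memory rather than a task-manager count:

./bin/yarn-session.sh -d -s 4 -jm 1024m -tm 4096m

or, equivalently, in flink-conf.yaml:

taskmanager.numberOfTaskSlots: 4
jobmanager.memory.process.size: 1024m
taskmanager.memory.process.size: 4096m

A job submitted with parallelism 8 would then cause YARN to launch two task managers (8 slots needed / 4 slots each).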

Passing custom parameters to docker when running Flink on Mesos/Marathon

My team is trying to set up an Apache Flink (v1.4) cluster on Mesos/Marathon. We are using the Docker image provided by Mesosphere. It works really well!
Because of a new requirement, the task managers have to be launched with extended runtime privileges. We can easily enable these runtime privileges for the app manager via the Marathon web UI. However, we cannot find a way to enable the privileges for the task managers.
In Apache Spark, we can set spark.mesos.executor.docker.parameters privileged=true in Spark's configuration file, so Spark can pass this parameter to the docker run command. I am wondering if Apache Flink allows us to pass a custom parameter to docker run when launching task managers. If not, how can we start task managers with extended runtime privileges?
Thanks
There is a new parameter, mesos.resourcemanager.tasks.container.docker.parameters, introduced in this commit, which allows passing arbitrary parameters to Docker.
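Assuming that parameter behaves like the Spark setting from the question (a key=value pair handed through to docker run; I have not verified the exact syntax against your Flink version), usage in flink-conf.yaml would look like:

mesos.resourcemanager.tasks.container.docker.parameters: privileged=true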
Unfortunately, this is not possible as of right now (or only for the framework scheduler as Tobi pointed out).
I went ahead and created a Jira for this feature so you can keep track/add details/contribute it yourself: https://issues.apache.org/jira/browse/FLINK-8490
You should be able to tweak the setting for the parameters in the ContainerInfo of https://github.com/mesoshq/flink-framework/blob/master/index.js to support this. I’ll eventually update the Flink version in the Docker image...

Flink : Unable to collect Task Metrics via JMX

I have been able to run JMX with Flink with the following configuration applied to the flink-conf.yaml file of all nodes in the cluster:
metrics.reporters: jmx
metrics.reporter.jmx.class: org.apache.flink.metrics.jmx.JMXReporter
metrics.reporter.jmx.port: 9020-9022
env.java.opts: -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=9999 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false
When I run JConsole and connect to master-IP:9999 or slave-IP:9020, I am able to see system metrics like CPU, memory, etc.
How can I access the task metrics and their respective graphs, like bytesRead, latency, etc., which are collected for each subtask and shown in the GUI?
You can go to the MBeans tab in JConsole; there you will see the metrics on the right-hand side, organized by job and task name. Let me know if you have any issues.
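To help locate them: the MBean names are built from Flink's metric scope formats, which you can adjust in flink-conf.yaml. The default task scope in the 1.x docs is, roughly:

metrics.scope.task: <host>.taskmanager.<tm_id>.<job_name>.<task_name>.<subtask_index>

so a per-subtask counter such as numBytesIn appears under an MBean whose name embeds the host, job, and task names.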

Can't set parallelism using Flink's CLI or Web-UI when using Apache Beam

I am using Flink 1.2.1 running on Docker, with Task Managers distributed across different VMs as part of a Docker Swarm.
Uploading an Apache Beam application using the Flink Web UI and trying to set the parallelism at job submission time doesn't work. Neither does submitting the job via the Flink CLI.
It seems the parallelism doesn't get picked up at the client level; it ends up defaulting to 1.
When I set the parallelism programmatically within the Apache Beam code, it works: flinkPipelineOptions.setParallelism(4);
I suspect the root of the problem may be in the org.apache.beam.runners.flink.DefaultParallelismFactory class, as it checks for Flink's GlobalConfiguration, which may not pick up runtime values passed to Flink.
Any ideas on how this could be fixed or worked around? I need to be able to change the parallelism dynamically, so the programmatic approach won't work, nor will setting the Flink configuration at system level.
I am using the following documentation:
https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/parallel.html
https://beam.apache.org/documentation/sdks/javadoc/2.0.0/org/apache/beam/runners/flink/DefaultParallelismFactory.html
This should probably be fixed in the Beam Flink Runner, but as a workaround you can try setting the parallelism to -1 programmatically. This should make the translation pick up the parallelism that is specified when submitting the job.
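A minimal sketch of that workaround, assuming the Beam 2.0-era FlinkPipelineOptions API from the javadoc linked in the question:

import org.apache.beam.runners.flink.FlinkPipelineOptions;
import org.apache.beam.runners.flink.FlinkRunner;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class ParallelismWorkaround {
    public static void main(String[] args) {
        FlinkPipelineOptions options =
                PipelineOptionsFactory.fromArgs(args).as(FlinkPipelineOptions.class);
        options.setRunner(FlinkRunner.class);
        // -1 defers to whatever parallelism is passed at submission time
        // (e.g. the CLI's -p flag or the Web UI field) instead of pinning it here.
        options.setParallelism(-1);
        Pipeline p = Pipeline.create(options);
        // ... build the pipeline here ...
        p.run();
    }
}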

Flink add Task/JobManagers to cluster

Regarding adding new Task/JobManagers to an existing running cluster the procedure can be found here (https://ci.apache.org/projects/flink/flink-docs-release-1.2/setup/cluster_setup.html#adding-jobmanagertaskmanager-instances-to-a-cluster).
However, if we shut down the cluster and start it again, the information about the added hosts will be lost.
Is it safe practice, while adding the new host to the running cluster, to also update and save the "masters" and "slaves" configuration files on all nodes?
Yes, it is absolutely safe. The masters and slaves files are only read by the startup scripts.
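For reference, the per-host scripts the linked cluster-setup page uses to add instances to a running cluster are (exact arguments vary across Flink versions, so check the docs for yours):

bin/jobmanager.sh start
bin/taskmanager.sh start

Persisting the new host in conf/masters or conf/slaves on every node, as the question proposes, is what lets start-cluster.sh pick it up again after a restart.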
