How to install Flink on Mesos cluster without DC/OS? - apache-flink

I am a newbie to Apache Flink and our team is trying to set up an Apache Flink cluster on Apache Mesos. We have already installed Apache Mesos & Marathon with 3 master nodes and 3 slaves, and now we are trying to install Apache Flink without DC/OS as mentioned here https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/mesos.html#mesos-without-dcos.
I have a couple of questions here:
Do we need to download Flink on all the nodes (master and slaves) and configure mesos.master on all of them?
Or shall we download Flink on only one master node and configure mesos.master there?
If Flink needs to be downloaded on all the nodes, what should the location of the Flink directory be, and is there a script where I can specify that?
Is running "mesos-appmaster.sh" on the master node also responsible for running the Flink libraries and classes on the slaves?
Thanks

Do we need to download Flink on all the nodes (master and slaves) and configure mesos.master on all of them?
No, you don't. Actually, it depends on how you want to run Flink. In your setup, the most convenient way to run Flink would be to run it with Marathon and download the binaries during deployment; a sketch of such a Marathon app definition is shown below.
Or shall we download Flink on only one master node and configure mesos.master there?
It's up to you. You can run Flink on a dedicated server or let Marathon deploy it for you. If you already have Marathon, it's easier to run Flink with Marathon. On the other hand, for debugging purposes and proofs of concept I'd recommend a standalone setup, where you can quickly change the configuration on a local machine and see how it works. Creating Docker images or binaries, publishing them in a repository, and finally deploying Flink on Marathon adds overhead that will slow down development, but it will keep you safe in production. In this setup, Marathon is also needed to provide basic high availability (HA) for the Flink app master, i.e., to launch a new instance of Flink when an agent crashes.
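As an illustration, here is a minimal sketch of deploying the Flink app master through Marathon. All hostnames, ports, paths, and resource sizes below are placeholders, not values from your cluster:

    # Write a Marathon app definition for the Flink app master
    # (all values here are placeholders)
    cat > flink-appmaster.json <<'EOF'
    {
      "id": "flink",
      "cmd": "/opt/flink/bin/mesos-appmaster.sh -Dmesos.master=master-1:5050 -Dmesos.resourcemanager.tasks.mem=2048",
      "cpus": 1.0,
      "mem": 1536,
      "instances": 1
    }
    EOF
    # Submit it to Marathon's REST API; Marathon restarts the app master if it dies
    curl -X POST -H "Content-Type: application/json" \
         http://marathon-host:8080/v2/apps -d @flink-appmaster.json

Marathon then keeps the app master alive, and the app master in turn launches task managers on the Mesos agents.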
If flink needs to be downloaded on all the nodes then what should be the location of flink directory or if there is any script where I can specify that?
Flink does not have to be downloaded on all nodes. It can be downloaded when needed at deployment.
Is running "mesos-appmaster.sh" on master node also responsible for running flink libraries and classes on slaves?
Flink is a scheduler which means that it should start tasks and executors on Mesos when needed.
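To illustrate, starting the app master by hand looks roughly like this. It runs on a single node only, and all option values below are placeholders (see the Mesos setup page linked above for the full option list):

    # Run on one node only; Flink launches task managers on the Mesos agents itself
    ./bin/mesos-appmaster.sh \
        -Dmesos.master=master-1:5050 \
        -Dmesos.initial-tasks=2 \
        -Dmesos.resourcemanager.tasks.mem=2048 \
        -Dtaskmanager.heap.mb=1536 \
        -Dtaskmanager.numberOfTaskSlots=2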

Even when not using DC/OS, feel free to look at the Apache Flink DC/OS package. At its core, it is a Marathon app definition you can deploy on plain Marathon/Mesos. The Flink package (as of today) does not require any DC/OS-specific features.
The DC/OS example might also provide useful information.

Related

Is it possible to add a new embedded worker while the cluster is running on Statefun?

Here is the deal:
I'm trying to add a new (embedded) worker to a running cluster (Flink Statefun 2.2.1).
As you can see, the new task manager registers with the cluster:
[Screenshot of the newly deployed task manager]
But it doesn't initialize (it doesn't deploy sources).
What am I missing here? Do the master and workers have to have the same jar files, or should deploying the task manager with the jar file be enough?
Any help would be appreciated,
Thx.
Flink supports two different approaches to rescaling: active and reactive.
Reactive mode is new in Flink 1.13 (released just this week), and works as you expected: add (or remove) a task manager, and your application will adjust to the new parallelism. You can read about elastic scaling and reactive mode in the docs.
Reactive mode is currently a work in progress, but it might meet your needs.
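As a sketch (for a standalone cluster; the example jar path is a placeholder), reactive mode is enabled when the job is started, and workers can then come and go:

    # Start the job in reactive mode (Flink 1.13+, standalone application deployment)
    ./bin/standalone-job.sh start -Dscheduler-mode=reactive ./examples/streaming/TopSpeedWindowing.jar
    # Add a task manager: the running job automatically scales up to use it
    ./bin/taskmanager.sh start
    # Remove one: the job scales down, restarting from the latest checkpoint
    ./bin/taskmanager.sh stop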
In broad strokes, for active mode rescaling you need to:
Do a stop with savepoint to bring down your current job while taking a snapshot of its state.
Relaunch with the new parallelism, using the savepoint as the starting point.
The exact details depend on how your cluster is deployed.
For a step-by-step tutorial, see Upgrading & Rescaling a Job in the Flink Operations Playground.
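With the CLI, those two active-mode steps look roughly like this (the savepoint directory, job ID, parallelism, and jar name are placeholders):

    # Stop the job while taking a savepoint; the savepoint path is printed
    ./bin/flink stop --savepointPath hdfs:///flink/savepoints <jobId>
    # Resubmit with the new parallelism, restoring from that savepoint
    ./bin/flink run -s hdfs:///flink/savepoints/savepoint-xxxx -p 8 my-statefun-job.jar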
The above applies to rescaling statefun embedded functions. Being stateless, remote functions can be rescaled more straightforwardly.

After the Flink version is upgraded, the taskmanager log information cannot be seen in the web UI

After the Flink version was upgraded, the taskmanager log information can no longer be seen in the web UI. In stdout, you can see the log output of the code itself, but not the logs of Spring and Flink themselves.
What version have you upgraded to, and how is Flink running (e.g., YARN, Kubernetes, standalone)?
With some versions of Flink in certain environments, the logs aren't available in the web UI because they are being aggregated elsewhere. For example, you will want to use something like kubectl logs to access the logs if you are running on Kubernetes with certain versions of Flink.
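For example, on Kubernetes (the pod name below is a placeholder):

    # Find the task manager pods, then read a pod's log directly
    kubectl get pods
    kubectl logs flink-taskmanager-6f7c9d5b8-abcde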
UPDATE
Flink 1.11 switched from log4j1 to log4j2. See the release notes for details. Also, the logging properties file log4j-yarn-session.properties was renamed to log4j-session.properties and yarn-session.sh was updated to use the new file. Again, see the release notes for more info.

Flink EMR Installation

I am new to Flink and trying to deploy it on an EMR cluster. I have used a 3-node cluster (1 master and 2 slaves) with its default configuration. I have not made any configuration changes and am sticking with the defaults.
I am curious to understand the following points:
How do the master and slaves communicate with each other, given that I have not mentioned any IPs in conf/slaves on the master node?
I can see a Flink library on the master node (path: /usr/lib/flink) but cannot find a Flink library on the slave nodes. How is my code getting executed on the slave nodes?
I will change some configuration in conf/flink-conf.yaml according to my requirements, if needed. Do I need to make any other changes on the master or slave nodes apart from this?
See the Running flink-crawler in EMR wiki page for details on how we run a Flink streaming job on top of EMR. Note that in this mode Flink is running via YARN, thus the Flink conf/slaves file isn't being used. You should also take a look at the YARN Setup documentation to better understand how Flink runs on top of YARN.
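As a sketch, submitting Flink to YARN looks something like this (resource values, task manager count, and jar path are placeholders; the flags shown are from the older Flink releases that EMR shipped at the time):

    # Option A: start a long-running YARN session, then submit jobs to it
    ./bin/yarn-session.sh -n 2 -jm 1024 -tm 2048 -d
    ./bin/flink run ./my-job.jar
    # Option B: run a single job as its own YARN application
    ./bin/flink run -m yarn-cluster -yn 2 ./my-job.jar

In both cases YARN ships the Flink jars and your job jar to containers on the slave nodes, which is why no Flink installation is needed there.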

Flink dynamic scaling

I am currently studying scalability on Flink. Starting with version 1.2.0, dynamic rescaling was introduced. I am looking at scaling a long-running job which reads data from a Kafka source.
Questions regarding dynamic rescaling.
To scale out my Flink application, for example by adding new task managers, must I restart the job / YARN session to use the newly added resources?
I think it's possible to write a YARN client that deploys new task managers and makes them talk to the job manager. Is that already available in the existing Flink YARN client application?
Pardon me if these questions are too basic; I did go through the documents, and I have to admit I have not been able to put the concepts together with some test deployments on YARN recently.
Currently, dynamic scaling means the capability to update an operator's parallelism (Flink 1.2), either for keyed state or for non-keyed state.
To scale out my Flink application, for example by adding new task managers, must I restart the job / YARN session to use the newly added resources? - Yes, the job has to be stopped first, the parallelism updated, and the job restarted. You do not have to worry about the state; Flink will handle it, including repartitioning.
I think it's possible to write a YARN client that deploys new task managers and makes them talk to the job manager. Is that already available in the existing Flink YARN client application? - No, you cannot do that currently. This feature may be added in the future.
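Concretely, the restart cycle looks something like this (job ID, savepoint path, task manager count, and parallelism are placeholders; flag names follow the Flink 1.2-era CLI):

    # Take a savepoint of the running job, then stop it
    ./bin/flink savepoint <jobId>
    ./bin/flink cancel <jobId>
    # Resubmit with more task managers and a higher parallelism
    ./bin/flink run -m yarn-cluster -yn 4 -p 8 -s <savepointPath> ./my-job.jar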

How to deploy new changes of my flow to Apache Flink cluster?

For example, I uploaded a JAR with my flow and ran it through the Apache Flink dashboard. Then I implemented some changes in the flow and want to deploy them.
Can anybody explain step by step how to deploy a new version of my flow to an Apache Flink cluster correctly (without downtime, losing state, etc.)? I didn't find a description of the deploy process in the official documentation.
What you want to use is the savepoints in Flink.
The steps are as follows:
Prepare the new jar for your job
Save the state of the currently running job using flink savepoint <JobID>
Stop the job
Start the new jar using the just-created savepoint: flink run -s <pathToSavepoint> <jobJar> ...
See also: https://www.ververica.com/blog/how-apache-flink-enables-new-streaming-applications-part-1
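Putting those steps together with the CLI (job ID, savepoint path, and jar name are placeholders):

    # 1. Take a savepoint of the running job; its path is printed on success
    ./bin/flink savepoint <JobID>
    # 2. Stop the job
    ./bin/flink cancel <JobID>
    # 3. Start the new jar from the savepoint
    ./bin/flink run -s <pathToSavepoint> new-flow.jar

On newer Flink versions, steps 1 and 2 can be combined into a single cancel-with-savepoint call: flink cancel -s <JobID>.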
