Flink EMR Installation

I am new to Flink and trying to deploy it on an EMR cluster. I am using a 3-node cluster (1 master and 2 slaves) and have not made any configuration changes, sticking with the defaults.
I am curious to understand the following points:
How do the master and slaves communicate with each other, given that I have not listed any IPs in conf/slaves on the master node?
I can see a Flink library on the master node (path: /usr/lib/flink) but cannot find the Flink library on the slave nodes. How is my code getting executed on the slave nodes?
I will change some settings in conf/flink-conf.yaml according to my requirements, if needed. Do I need to make any other change on the master or slave nodes apart from this?

See the Running flink-crawler in EMR wiki page for details on how we run a Flink streaming job on top of EMR. Note that in this mode Flink runs via YARN, so the Flink conf/slaves file isn't used. You should also take a look at the YARN Setup documentation to better understand how Flink runs on top of YARN.
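In practice this means that YARN, not the conf/slaves file, ships the Flink binaries and your job jar to the worker nodes, which is also why /usr/lib/flink only needs to exist on the master. As a rough sketch of submitting from the EMR master node (the jar name is a placeholder and the exact flags vary by Flink version):

    # Start a long-running Flink session on YARN (-n: number of TaskManagers,
    # -s: slots per TaskManager; these flags apply to older Flink releases)
    /usr/lib/flink/bin/yarn-session.sh -n 2 -s 2 -jm 1024 -tm 2048

    # Or submit a single job as its own YARN application:
    /usr/lib/flink/bin/flink run -m yarn-cluster ./my-streaming-job.jar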

Related

Is it possible to add a new embedded worker while the cluster is running on Statefun?

Here is the deal:
I'm trying to add a new (embedded) worker to a running cluster (Flink Statefun 2.2.1).
As you can see, the new task manager registers with the cluster:
Screenshot of the newly deployed task manager
But it doesn't initialize (it doesn't deploy its sources):
What am I missing here? (Do the master and workers have to have the same jar files, or should deploying the task manager with the jar file be enough?)
Any help would be appreciated,
Thx.
Flink supports two different approaches to rescaling: active and reactive.
Reactive mode is new in Flink 1.13 (released just this week), and works as you expected: add (or remove) a task manager, and your application will adjust to the new parallelism. You can read about elastic scaling and reactive mode in the docs.
Reactive mode is currently a work in progress, but it might meet your needs.
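For reference, reactive mode is switched on via the scheduler setting and currently only works with standalone application-mode clusters. A minimal sketch, assuming Flink 1.13+ and a hypothetical job class com.example.MyJob:

    # Launch an application cluster in reactive mode
    ./bin/standalone-job.sh start -Dscheduler-mode=reactive --job-classname com.example.MyJob

    # Starting or stopping TaskManagers now rescales the job automatically
    ./bin/taskmanager.sh start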
In broad strokes, for active mode rescaling you need to:
Do a stop with savepoint to bring down your current job while taking a snapshot of its state.
Relaunch with the new parallelism, using the savepoint as the starting point.
The exact details depend on how your cluster is deployed.
For a step-by-step tutorial, see Upgrading & Rescaling a Job in the Flink Operations Playground.
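Concretely, with a session cluster and the Flink CLI, active rescaling looks roughly like this (the savepoint location and parallelism are placeholders):

    # Stop the job while taking a savepoint (the CLI prints the savepoint path)
    flink stop --savepointPath s3://my-bucket/savepoints <JobID>

    # Relaunch from that savepoint with the new parallelism
    flink run -s s3://my-bucket/savepoints/savepoint-xxxx -p 8 my-job.jar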
The above applies to rescaling statefun embedded functions. Being stateless, remote functions can be rescaled more straightforwardly.

Flink Logging not working in cluster mode

Recently I encountered a problem with Flink logging in standalone cluster mode when using logback.xml for logging. My requirement is that all my jobs log to a particular folder, that the Flink framework logs go to a separate folder, and that each job running in my Flink cluster gets its own folder. I tested this on my local cluster and it works fine: I get separate log folders for each submitted Flink job. But as soon as I deploy my code to the standalone cluster, along with the respective logback.xml for each job, it doesn't log at all. I also referred to the following link, but I am still stuck:
Flink logging limitation: How to pass logging configuration to a flink job
Could you please specify where your log file resides?
According to the Flink docs, the logback configuration should either be specified explicitly by setting the system property -Dlogback.configurationFile=<file> or by putting logback.xml on the classpath; usually I override the one in the flink/conf directory.
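As a starting point, here is a minimal logback.xml sketch that routes a job's logs to a dedicated folder; the path /var/log/flink/jobs/my-job is an assumption to adapt per job:

    <!-- logback.xml: write this job's logs to its own folder -->
    <configuration>
      <appender name="JOB_FILE" class="ch.qos.logback.core.FileAppender">
        <!-- placeholder path; change per job -->
        <file>/var/log/flink/jobs/my-job/job.log</file>
        <encoder>
          <pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n</pattern>
        </encoder>
      </appender>
      <root level="INFO">
        <appender-ref ref="JOB_FILE"/>
      </root>
    </configuration>

Keep in mind that on a standalone cluster the task manager JVMs pick up their logging configuration when the cluster starts, not at job submission, which is likely why a per-job logback.xml is ignored there (that is the limitation discussed in the linked question).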

How to install Flink on Mesos cluster without DC/OS?

I am a newbie to Apache Flink and our team is trying to set up an Apache Flink cluster on Apache Mesos. We have already installed Apache Mesos & Marathon with 3 master nodes and 3 slaves, and now we are trying to install Apache Flink without DC/OS as described here: https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/mesos.html#mesos-without-dcos.
I have a couple of questions here:
Do we need to download Flink on all the nodes (master and slaves) and configure mesos.master on all nodes?
Or shall we download Flink on only one master node and configure mesos.master there?
If Flink needs to be downloaded on all the nodes, what should be the location of the Flink directory, and is there a script where I can specify that?
Is running "mesos-appmaster.sh" on the master node also responsible for running Flink libraries and classes on the slaves?
Thanks
Do we need to download Flink on all the nodes (master and slaves) and configure mesos.master on all nodes?
No, you don't. Actually, it depends on the way you want to run Flink. In your setup the most convenient way would be to run it with Marathon and download the binaries during deployment. See this.
Or shall we download Flink on only one master node and configure mesos.master there?
It's up to you. You can run Flink on a dedicated server or let Marathon do it for you. If you already have Marathon, then it's easier to run Flink with Marathon. On the other hand, for debugging purposes and proofs of concept I'd recommend the standalone version, where you can quickly change the configuration on a local machine and see how it works. Creating Docker images or binaries, publishing them in a repository, and finally deploying Flink on Marathon adds overhead that will slow you down during development, but it will keep you safe in production. Flink does not come with High Availability (HA) support out of the box in this setup, so Marathon is needed to provide basic HA (launching a new instance of Flink when an agent crashes).
If Flink needs to be downloaded on all the nodes, what should be the location of the Flink directory, and is there a script where I can specify that?
Flink does not have to be downloaded on all nodes. It can be downloaded when needed at deployment time.
Is running "mesos-appmaster.sh" on the master node also responsible for running Flink libraries and classes on the slaves?
Flink acts as a Mesos framework (a scheduler), which means it starts its tasks and executors on the Mesos slaves when needed.
Even when not using DC/OS, feel free to look at the Apache Flink DC/OS package. At its core, it is a Marathon app definition you can deploy on pure Marathon/Mesos. The Flink package (as of today) does not require any DC/OS-specific features.
The DC/OS example might also provide useful information.
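For illustration, such a Marathon app definition might look like the sketch below; the option values mirror the Flink 1.3-era Mesos documentation, and mesos-master.example.com:5050 is an assumed address:

    {
      "id": "flink",
      "cmd": "$FLINK_HOME/bin/mesos-appmaster.sh -Dmesos.master=mesos-master.example.com:5050 -Djobmanager.heap.mb=1024 -Dtaskmanager.heap.mb=1024 -Dmesos.initial-tasks=2",
      "cpus": 1.0,
      "mem": 1024
    }

Marathon then supervises the app master and relaunches it if the agent it runs on crashes, which is the basic HA support mentioned above.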

DC/OS: modifying Cluster name post installation

I missed updating the cluster name (cluster_name) in my boot node's genconf/config.yaml before deploying the DC/OS cluster. I was wondering if there's a configuration/properties file on the nodes (or a way via dcos-cli or etcd) that I need to change to update the cluster name string (that appears in the DC/OS UI). I'd appreciate any help.
version: DC/OS 1.8
nodes running on CoreOS
size: 3 masters and 11 agents
The cluster name that appears in the DC/OS interface is taken from the Mesos cluster name. Judging by this configuration generation file, the name is passed to Mesos through an environment variable, so changing that variable should be possible. Obviously you're going to have to restart the Mesos masters one by one.
Important note: I have not had the chance to test this; if you are in a production environment, I highly recommend against it.
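For what it's worth, a rough sketch of that procedure on DC/OS 1.8; the config file path and systemd unit name are assumptions and may differ between DC/OS versions:

    # On each master node, one at a time:
    sudo sed -i 's/^MESOS_CLUSTER=.*/MESOS_CLUSTER=my-new-name/' /opt/mesosphere/etc/mesos-master
    sudo systemctl restart dcos-mesos-master.service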

How to deploy new changes of my flow to Apache Flink cluster?

For example, I uploaded a JAR with my flow and ran it through the Apache Flink dashboard. Then I implemented some changes in the flow and want to deploy them.
Can anybody explain to me, step by step, how to deploy a new version of my flow to an Apache Flink cluster correctly (without downtime, losing state, etc.)? I didn't find a description of the deploy process in the official documentation.
What you want to use are savepoints in Flink.
The steps are as follows:
Prepare the new jar for your job
Save the state of the currently running job using flink savepoint <JobID>
Stop the job
Start the new jar from the just-created savepoint: flink run -s <pathToSavepoint> <jobJar> ...
See also: https://www.ververica.com/blog/how-apache-flink-enables-new-streaming-applications-part-1
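Concretely, with the Flink CLI this might look like the following; the savepoint directory is a placeholder, and on recent Flink versions the savepoint and stop steps can be combined with flink cancel -s (or flink stop):

    # Take a savepoint of the running job (the CLI prints the savepoint path)
    flink savepoint <JobID> hdfs:///flink/savepoints

    # Cancel the old job
    flink cancel <JobID>

    # Start the new jar from the savepoint
    flink run -s hdfs:///flink/savepoints/savepoint-xxxx new-flow.jar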
