I have a flink job and it expects another dependency jar to be available at run time. Is it possible to submit a job with multiple jars in flink cluster. I can not build one common jar.
You could install the dependency jar on every flink node in the flink/lib folder as described here. That would make it available to user uploaded jars without the need for your users to bundle it with their jars.
Keep in mind that this would be handled entirely outside of the job submission API, so you would have to incorporate it into whatever system you're using to manage configuration on your flink nodes.
Related
In Spark, instead of building a fat-jar, we are able to upload third-party dependencies required for my job via --jars.
Similarly, I am able to upload the configuration files needed for my job via --files.
But in Flink, I checked --help and didn't find a similar option.
So:
For dependent jars, is here a way to load third-party dependencies for jobs on-demand without modifying the cluster environment? (The required third-party dependencies may be different for different jobs)
For the configured files, is there a way to upload them like --files? And how do I read it?
Thanks.
additional
When use flink run to submit a job to yarn-per-job, I can read my local-file in flink.
But when using flink run-application, I tried a variety of ways, are unable to get to read the local-file, there is any way to go to the it.
I am using yarn. Flink version: 1.14
Recently, I encountered a problem in Flink Logging in Standalone cluster mode when using logback.xml as logging. My requirement is that all my jobs should log in the particular folder and my flink framework logs should be placed in the seperate folder and also for each job running in my flink cluster there should be seperate folder for different jobs. I tested it in my local cluster which works fine and i get all my logs seperate folders respective to my Flink job submitted but as soon as i deploy my code in the Standalone cluster along with respective logback.xml for each job it doesn't logs at all. I also referred the follow. link for my query but still i am stuck with the problem.
Flink logging limitation: How to pass logging configuration to a flink job
Could you please specify where your log file resides ?
According to flink docs, it should either be specified explicitly by setting the environment property -Dlogback.configurationFile=<file> or by putting logback.xml in the classpath - usually, I overridden the one in flink/conf directory.
My Flink job has a jar file provided by the client, which I can store in /lib folder. Is there a way to reload the updated jar file, without restarting the cluster?
No, that is not possible with the current version (Flink 1.4.0, Dec 2017).
Flink offers savepoints to save the state of an application.
If you want to change the code (or dependencies) of an application, zou have to take a savepoint, update the code/dependencies, and restart the application from the savepoint.
This technique can also be used to scale an application up or down or to migrate it.
I am newbie in Apache Flink and our team is trying to set up an Apache Flink Cluster on Apaches Mesos. We have already installed Apache Mesos & Marathon with 3 Master nodes and 3 Slaves and now we are trying to install Apache Flink without DC/OS as mentioned here https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/mesos.html#mesos-without-dcos.
I have couple of questions over here :
Do we need to download Flink on all the nodes(master and slaves) and configure mesos.master in all nodes?
Or Shall we download flink on only one master node and configure mesos.master over there?
If flink needs to be downloaded on all the nodes then what should be the location of flink directory or if there is any script where I can specify that?
Is running "mesos-appmaster.sh" on master node also responsible for running flink libraries and classes on slaves?
Thanks
Do we need to download Flink on all the nodes(master and slaves) and configure mesos.master in all nodes?
No you don't. Actualy it depends on the way you want to run Flink. In your setup the most convenient way to run Flink would be to run it with Marathon and download binaries during deployment. See this
Or Shall we download flink on only one master node and configure mesos.master over there?
It's up to you. You can run Flink on dedicated server or let Marathon do it for you. If you already have Marathon then it's easier to run Flink with Marathon. On the other hand for debugging purposes and proof of concept I'll recommend standalone version where you can quickly change configuration on local machine and see how it works. Creating docker images or binaries and publishing them in repository and finally deploying Flink on Marathon could have more overhead that will slow you down on development but will keep you safe on production. Flink does not come with support for High Availability (HA) so Marathon is required to provide basic HA support (launch new instance of Flink when agent crash).
If flink needs to be downloaded on all the nodes then what should be the location of flink directory or if there is any script where I can specify that?
Flink does not have to be downloaded on all nodes. It can be downloaded when needed at deployment.
Is running "mesos-appmaster.sh" on master node also responsible for running flink libraries and classes on slaves?
Flink is a scheduler which means that it should start tasks and executors on Mesos when needed.
Even when not using DC/OS, feel free to look at the Apache Flink DC/OS package. At its core, it is a marathon app definition you can deploy on pure Marathon/Mesos. The Flink package (as of today) does not require any DC/OS specific features.
The DC/OS example might also provide useful information.
I am currently studying scalability on Flink. Starting from Version 1.2.0, dynamic rescaling was introduced. I am looking at scaling a long running job which reads data from Kafka source.
Questions regarding dynamic rescaling.
To scale out my flink application, for example: add new task managers, must I restart the job / yarn session to use the newly added resource?
I think it's possible to write Yarn client to deploy new task managers and make it talk to job manager, is that already available in existing flink yarn client application?
Pardon me if these questions are too basic, I did go through the documents and I have to admit I have not been able to put the concepts altogether with some test deployments on yarn recently.
Currently, Dynamic Scaling means the capability to update the operators' parallelism(Flink 1.2), either for keyed state or for non-keyed state.
To scale out my flink application, for example: add new task managers, must I restart the job / yarn session to use the newly added resource? - Yes, the job has to be stopped first, update the parallelism, and restart it again. Do not have to worry about the state, Flink will handle them, including repartition.
I think it's possible to write Yarn client to deploy new task managers and make it talk to job manager, is that already available in existing flink yarn client application? - No, you can not. This feature seems to be added in the future. Currently, we can not do that.