Flink run job with remote jar file

I'm new to Flink and trying to submit my Flink program to my Flink cluster.
I have a Flink cluster running on a remote Kubernetes cluster and blob storage on Azure.
I know how to submit a Flink job when I have the jar file on my local machine, but I have no idea how to submit the job with a remote jar file (the jar can be accessed over HTTPS).
I checked the documentation, and it doesn't seem to provide anything like what we do in Spark.
Thanks in advance

I think you can use an init container to download the job jar into a shared volume, then submit the local jar to Flink.
Also, Google's Flink operator for Kubernetes supports remote job jars; see this example.
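A minimal sketch of the init-container idea (the blob URL, volume mount path, and jar name are placeholders, not from the question): the init container downloads the jar into a volume shared with the Flink container, and the job is then submitted from that local path.
# runs in the init container; /opt/flink/usrlib is an emptyDir volume shared with the Flink container
wget -O /opt/flink/usrlib/job.jar "https://<your-account>.blob.core.windows.net/jars/job.jar"
# runs in the Flink container (or a client pod) that mounts the same volume
./bin/flink run /opt/flink/usrlib/job.jar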

Related

Uploading jar on multiple flink job managers

We are currently running Flink 1.12 in HA mode in production, with 3 job managers (1 leader and 2 standby). When I upload a jar to one of the job managers, it is somehow not reflected on the other job managers. Is there any way to make a jar uploaded to a single job manager also become available on the other job managers in HA mode?
The problem I am facing because of this: when the jar is uploaded to, say, job manager 'A', but the job-submit request using that uploaded jar is sent to job manager 'B', I get an error saying the jar was not found.
The HA documentation (https://nightlies.apache.org/flink/flink-docs-stable/docs/deployment/ha/overview/) mentions under State persistence: "Persisting state which is required for the successor to resume the job execution (JobGraphs, user code jars, completed checkpoints)".
That implies that Flink's HA service takes care of the JARs as well. Have you tried what happens if you shut down the active JobManager?
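For reference, a sketch of the flink-conf.yaml settings behind that HA service; the ZooKeeper quorum and storage directory below are placeholders for your environment, not values from the question.
# append to conf/flink-conf.yaml on every jobmanager; values are placeholders
cat >> conf/flink-conf.yaml <<'EOF'
high-availability: zookeeper
high-availability.zookeeper.quorum: zk-1:2181,zk-2:2181,zk-3:2181
high-availability.storageDir: hdfs:///flink/ha/
EOF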

Adding Hadoop dependencies to standalone Flink cluster

I want to create an Apache Flink standalone cluster with several taskmanagers. I would like to use HDFS and Hive, so I have to add some Hadoop dependencies.
After reading the documentation, the recommended way is to set the HADOOP_CLASSPATH environment variable. But how do I add the Hadoop files? Should I download them into a directory like /opt/hadoop on the taskmanagers and point the variable to that path?
I only know the old, now deprecated way of downloading an uber-jar with the dependencies and placing it under the /lib folder.
Normally you'd do a standard Hadoop installation, since for HDFS you need DataNodes running on every server (with appropriate configuration), plus the NameNode running on your master server.
So then you can do something like this on the master server where you're submitting your Flink workflow:
export HADOOP_CLASSPATH=`hadoop classpath`
export HADOOP_CONF_DIR=/etc/hadoop/conf
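If you just want the Hadoop classes and client configuration on the taskmanagers without a full installation (as the question suggests with /opt/hadoop), a sketch like the following works too; the Hadoop version, download URL, and paths here are assumptions, not from the answer above.
# run on every jobmanager/taskmanager host, then restart the Flink processes
wget https://archive.apache.org/dist/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz
tar -xzf hadoop-3.3.6.tar.gz -C /opt && ln -s /opt/hadoop-3.3.6 /opt/hadoop
export HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop        # point this at your cluster's config
export HADOOP_CLASSPATH=`/opt/hadoop/bin/hadoop classpath`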

Flink savepoint with local execution environment (like standalone application)

How can I take Flink savepoints with a standalone application (local execution environment or mini cluster)?
I configured the savepoint directory in the flink-conf.yaml file, but I'm not sure how to take a savepoint before shutting down the application, or how to restore from it when restarting the application.
Is there any way to do this, or do I have to use a Flink cluster and then the CLI?
Appreciate your help. Thanks.
You can use either the CLI or the REST API to trigger savepoints.
https://ci.apache.org/projects/flink/flink-docs-stable/ops/cli.html#savepoints
For example, to trigger a savepoint while leaving the job running:
./bin/flink savepoint <jobId> [savepointDirectory]
or to take a savepoint while stopping the job:
./bin/flink stop [-p targetDirectory] [-d] <jobID>
To restore the state from a savepoint during a restart:
./bin/flink run -s <savepointPath> ...
For a tutorial on this and related topics, see https://ci.apache.org/projects/flink/flink-docs-stable/try-flink/flink-operations-playground.html#upgrading--rescaling-a-job.
The REST API is documented here: https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/rest_api.html. For example, you can take a savepoint via
curl -X POST localhost:8001/jobs/:jobid/savepoints -d '{"cancel-job": false}'
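The POST returns a request id (trigger id) that you can use to poll for completion of the savepoint; the port here matches the example above, and :jobid and :triggerid are placeholders for real values.
curl localhost:8001/jobs/:jobid/savepoints/:triggerid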
If you want to use the REST API to trigger savepoints without running a separate cluster, you can do this in your job to start a local cluster (in a single JVM) with the WebUI and REST API:
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

// Local cluster in this JVM, with the WebUI/REST API and a default savepoint directory.
Configuration conf = new Configuration();
conf.setString("state.savepoints.dir", "file:///tmp/savepoints");
StreamExecutionEnvironment env =
        StreamExecutionEnvironment.createLocalEnvironmentWithWebUI(conf);
These are the only ways to do this with open source Flink. There are commercial products (such as the free community edition of Ververica Platform) that make this easier.

How to execute flink job remotely when the flink job jar is bulky

I have a Flink server running on a Kubernetes cluster. I have a job jar which is bulky due to product and third-party dependencies.
I run it via
ExecutionEnvironment env = ExecutionEnvironment.createRemoteEnvironment(host, port, jar);
The jar size is around 130 MB after optimization.
I want to invoke the remote execution without uploading the jar, so that the upload does not happen every time the job needs to be executed. Is there a way to upload the jar once and call it remotely without specifying the jar (in Java)?
You could deploy a per-job cluster on Kubernetes. This will submit your user-code jar along with the Flink binaries to your Kubernetes cluster. The downside is that you cannot change the job afterwards without restarting the Flink cluster.
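With a reasonably recent Flink, the same idea is expressed as application mode on native Kubernetes; a sketch of the submission command follows, where the cluster id, image name, and the jar path baked into the image are placeholders, not details from the question.
./bin/flink run-application \
    --target kubernetes-application \
    -Dkubernetes.cluster-id=my-job-cluster \
    -Dkubernetes.container.image=registry.example.com/my-flink-job:latest \
    local:///opt/flink/usrlib/my-job.jar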

Where can I find my jar on Apache Flink server which I submitted using Apache Flink dashboard

I developed a Flink job and submitted it using the Apache Flink dashboard. Per my understanding, when I submit my job, my jar should be available on the Flink server. I tried to figure out the path of my jar but wasn't able to. Does Flink keep these jar files on the server? If yes, where can I find them? Any documentation? Please help. Thanks!
JAR files are renamed when they are uploaded and stored in a directory that can be configured with the web.upload.dir configuration key.
If the web.upload.dir parameter is not set, the JAR files are stored in a dynamically generated directory under the jobmanager.web.tmpdir (default is System.getProperty("java.io.tmpdir")).
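A quick way to see this on the jobmanager with the default settings; treat the exact paths as assumptions, since the directory name contains a component generated at startup.
# uploaded jars appear here, renamed with a generated UUID prefix in front of the original name
ls /tmp/flink-web-*/flink-web-upload/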
