Pass Flink Job Manager Configuration via Flink Submit Job REST API

We are using the Flink REST API to submit jobs to Flink EMR clusters that are already running in session mode. We want to know if there is any way to pass the following Flink JobManager configuration parameters while submitting a job via the Flink REST API call:
s3.connection.maximum: 1000
state.backend.local-recovery: true
state.checkpoints.dir: hdfs://ha-nn-uri/flink/checkpoints
state.savepoints.dir: hdfs://ha-nn-uri/flink/savepoints
I found that the Flink submit-job request has a "programArgs" field and tried using it, but the JobManager configuration didn't pick up these settings:
"programArgs": f" --s3.connection.maximum 1000 state.backend.local-recovery true --stage '{ddb_config}' --cell-name '{cluster_name}'"

Related

Flink Submit Job REST API returning JobManager metaspace error

We are using the Flink REST APIs to submit jobs to Flink. I observed that we start getting a Metaspace size error on job submission if we do multiple submissions in quick succession after the upload-jar REST call. Our job jar is around 300 MB. I suspect Flink is keeping all the old jars in memory, which takes up a lot of space. Is there any automated way to clean up old jars?
{
  "errors": [
    "Internal server error: Metaspace"
  ]
}
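One way to automate cleanup of previously uploaded jars is the REST API itself: GET /jars lists everything uploaded via /jars/upload, and DELETE /jars/:jarid removes one. A minimal sketch (the JobManager address is a placeholder); note this reclaims the disk space used by uploads, while Metaspace pressure comes from the classes loaded for each submitted job:
import requests

# Placeholder JobManager address.
JM = "http://jobmanager-host:8081"

# List jars previously uploaded via POST /jars/upload.
files = requests.get(f"{JM}/jars").json()["files"]

# Delete every uploaded jar except the most recently uploaded one.
for jar in sorted(files, key=lambda j: j["uploaded"])[:-1]:
    requests.delete(f"{JM}/jars/{jar['id']}")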

Query on automating Flink Job submission

I am trying to use the Flink REST APIs to automate the Flink job submission process via a pipeline. To call any Flink REST endpoint, we need to know the JobManager web interface address. For my POC, I got the IP after running the flink-yarn-session command on the CLI, but what is the way to get it from code?
For automation, I am planning to call the following REST APIs in sequence:
requests.get('http://ip-10-0-127-59.ec2.internal:8081/jobs/overview')                        # Get running job IDs
requests.post('http://ip-10-0-127-59.ec2.internal:8081/jobs/:jobId/savepoints/')             # Cancel job with savepoint
requests.get('http://ip-10-0-127-59.ec2.internal:8081/jobs/:jobId/savepoints/:savepointId')  # Get savepoint status
requests.post('http://ip-10-0-127-59.ec2.internal:8081/jars/upload')                         # Upload jar for new job
requests.post('http://ip-10-0-127-59.ec2.internal:8081/jars/de05ced9-03b7-4f8a-bff9-4d26542c853f_ATVPlaybackStateMachineFlinkJob-1.0-super-2.3.3.jar/run')  # Submit new job
requests.get('http://ip-10-0-116-99.ec2.internal:8081/jobs/:jobId')                          # Get status of new job
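A runnable sketch of the upload-and-run steps, assuming Python's requests library and the JobManager address from above; the jar id is read from the upload response instead of being hard-coded:
import os
import requests

JM = "http://ip-10-0-127-59.ec2.internal:8081"  # JobManager address (must be discovered, see below)

# Upload the job jar; the response contains the server-side path of the stored jar.
with open("my-flink-job.jar", "rb") as f:
    upload = requests.post(
        f"{JM}/jars/upload",
        files={"jarfile": ("my-flink-job.jar", f, "application/x-java-archive")},
    ).json()
jar_id = os.path.basename(upload["filename"])

# Run the uploaded jar; the response contains the new job id.
run = requests.post(f"{JM}/jars/{jar_id}/run", json={"programArgs": ""}).json()
job_id = run["jobid"]

# Poll the status of the new job.
print(requests.get(f"{JM}/jobs/{job_id}").json()["state"])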
If you have the flexibility to run on Kubernetes instead of YARN (you appear to be on AWS judging by your hostnames, so you could use EKS), then I would recommend the official Flink Kubernetes Operator - it is built by the community for exactly this purpose.
If YARN is a given for your use case, then you can follow the code that Flink itself uses to talk to the YARN ResourceManager in the flink-yarn package, especially the following:
https://github.com/apache/flink/blob/master/flink-yarn/src/main/java/org/apache/flink/yarn/YarnClusterDescriptor.java#L384
https://github.com/apache/flink/blob/master/flink-yarn/src/main/java/org/apache/flink/yarn/YarnResourceManagerDriver.java#L258
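As a simpler alternative to the Flink Java classes, the JobManager web interface address of a YARN session can also be looked up from the YARN ResourceManager REST API, which exposes each application's tracking URL. A minimal sketch, assuming the ResourceManager web UI is reachable on its default port 8088 and that the session was submitted with Flink's default application type:
import requests

RM = "http://resourcemanager-host:8088"  # placeholder YARN ResourceManager address

# List running YARN applications and pick the Flink session cluster
# (Flink registers its YARN applications with applicationType "Apache Flink").
apps = requests.get(f"{RM}/ws/v1/cluster/apps", params={"states": "RUNNING"}).json()
flink_apps = [a for a in apps["apps"]["app"] if a["applicationType"] == "Apache Flink"]

# The application's tracking URL points at the JobManager web interface.
jobmanager_url = flink_apps[0]["trackingUrl"].rstrip("/")
print(jobmanager_url)  # e.g. http://ip-10-0-127-59.ec2.internal:8081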

Set a Job Name for a Flink job using Table API

I want to set a job name for my Flink application written with the Table API, the way I do it with the Streaming API via env.execute(jobName).
I want to replace the auto-generated job name with my own.
I can't find a way in the documentation, except setting it while running the job from a jar:
bin/flink run -d -yD pipeline.name=MyPipelineName-v1.0 ...
flink: 1.14.5
env: Yarn
Update:
In case someone faces the same situation: we can add Table API pipelines to the DataStream API (as described in the DataStream API integration docs), which allows us to set the desired job name programmatically.
Ex.:
// Describe the Kafka sink with a TableDescriptor.
val sinkDescriptor = TableDescriptor.forConnector("kafka")
  .option("topic", "topic_out")
  .option("properties.bootstrap.servers", "localhost:9092")
  .schema(schema)
  .format(FormatDescriptor.forFormat("avro").build())
  .build()

tEnv.createTemporaryTable("OutputTable", sinkDescriptor)
statementSet.addInsert(sinkDescriptor, tA)

// Attach the Table API pipeline to the DataStream environment and
// execute it with an explicit job name.
statementSet.attachAsDataStream()
env.execute(jobName)
Only StreamExecutionEnvironment calls setJobName on the stream graph.

Flink disk usage in JobManager increases after every job submission over REST

I have deployed my own Flink setup on AWS ECS: one service for the JobManager and one service for the TaskManagers. I am running one ECS task for the JobManager and 3 ECS tasks for the TaskManagers.
I have a batch-style job that I upload via the Flink REST API every day with new arguments. Each time I submit it, disk usage grows by ~600 MB. Checkpoints go to S3, and I have set historyserver.archive.clean-expired-jobs: true.
Since I am running on ECS, I am not able to find out why disk usage increases on every jar upload and execution.
What are the Flink config params I should look at to make sure disk usage does not shoot up on every new job upload?
Try these configuration options:
blob.service.cleanup.interval:
https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#blob-service-cleanup-interval
historyserver.archive.retained-jobs:
https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#historyserver-archive-retained-jobs
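For example, in flink-conf.yaml (the values below are illustrative, not tuned recommendations):
blob.service.cleanup.interval: 3600        # cleanup interval of the blob caches, in seconds
historyserver.archive.retained-jobs: 20    # maximum number of archived jobs the HistoryServer retains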

Prometheus alert for a failed Flink job?

I'm trying to monitor the availability of my Flink jobs using Prometheus alerts.
I have tried the flink_jobmanager_job_uptime/downtime metrics, but they don't seem to fit, since they simply stop being emitted after the job has failed/finished.
I have already been pointed to the numRunningJobs metric for alerting on a missing job. I don't want to use that solution, since I would have to update my Prometheus config each time I want to deploy a new job.
Has anyone managed to create this alert of a Flink failed job using Prometheus?
Prometheus has an absent() function that returns 1 if the metric doesn't exist. So you can set the alert expression to something like:
absent(flink_jobmanager_job_uptime) == 1
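As an illustration, a Prometheus alerting rule built around that expression could look like this (rule name, duration, and labels are placeholders):
groups:
  - name: flink-jobs
    rules:
      - alert: FlinkJobMissing
        # Fires when no flink_jobmanager_job_uptime series has been scraped for 5 minutes,
        # e.g. because the job failed or finished.
        expr: absent(flink_jobmanager_job_uptime) == 1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Flink job uptime metric is absent"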
