How to fail the whole Flink application if one job fails? - apache-flink

There are two jobs running in Flink, shown in the image below. If one of them fails, I need to fail the whole Flink application. How can I do that? Suppose the job with parallelism 1 fails due to some exception; how do I also fail the job with parallelism 4?

The details of how you should go about this depend a bit on the type of infrastructure you are using to run Flink, and how you are submitting the jobs. But if you look at ClusterClient and JobClient and associated classes, you should be able to find a way forward.
If you aren't already, you may want to take advantage of application mode, which was added in Flink 1.11. It makes it possible for a single main() method to launch multiple jobs, and env.executeAsync() was added for non-blocking job submission.
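As a rough sketch (not tied to any particular deployment, and assuming Flink 1.12+ where JobClient#getJobExecutionResult() takes no arguments), something along these lines could tie the fate of two jobs together: submit both with executeAsync(), wait on their results, and cancel the survivor if either fails. The pipelines and job names are placeholders.

```java
import org.apache.flink.core.execution.JobClient;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class TwoJobApplication {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // First pipeline (placeholder for the parallelism:1 job).
        env.fromElements(1, 2, 3).print();
        JobClient jobA = env.executeAsync("job-a");

        // Second pipeline (placeholder for the parallelism:4 job).
        env.fromElements(4, 5, 6).print();
        JobClient jobB = env.executeAsync("job-b");

        try {
            // Block until both jobs finish; a failure in either one
            // surfaces here as an exception.
            jobA.getJobExecutionResult().get();
            jobB.getJobExecutionResult().get();
        } catch (Exception e) {
            // Cancel whatever is still running so the application fails as a unit.
            jobA.cancel();
            jobB.cancel();
            throw e;
        }
    }
}
```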

Related

Flink StreamSink and Checkpoint Understanding

I have written a job where 5 different sources and sinks are in a single application. I am writing the data in Parquet format using the streaming file sink. Since the Parquet sink writes data on checkpoint, if one of the sources gets some malformed records, I get an exception in the sink.
But that causes all of my consumers to stop; I am not able to write any data with the other sinks either.
Example:
source1(kafka) --- sink1(s3)
source2(kafka) --- sink2(s3)
source3(kafka) --- sink3(s3)
I need to understand why one sink failing causes all the consumers to stop, so that no data is written to S3 at all. Can somebody please help me understand this, or am I missing something?
The application needs to fail, otherwise the ordering and consistency guarantees cannot hold anymore. This is completely independent of checkpointing.
If just one task fails, all other tasks in the application need to fail as well, since Flink cannot know which tasks are affected by the failure and which are not.
In your case, you actually seem to have 3 independent applications. So you have three options:
If they should fail together, you put them all in the same StreamExecutionEnvironment as you have done.
If all applications should run independently, you need to start the job 3 times with different parameters. The three deployments can then be restarted independently.
If you would still like to deploy only once, then you could spawn 3 StreamExecutionEnvironments and let them run in parallel in different threads (sketched below). The main method should then join on these threads.
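For the third option, a minimal sketch might look like the following; the topic names and the dummy pipelines are placeholders, and each thread builds and executes its own environment.

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ThreeIndependentPipelines {

    public static void main(String[] args) throws Exception {
        Thread t1 = new Thread(() -> runPipeline("topic-1"));
        Thread t2 = new Thread(() -> runPipeline("topic-2"));
        Thread t3 = new Thread(() -> runPipeline("topic-3"));

        t1.start();
        t2.start();
        t3.start();

        // The main thread joins on the pipeline threads so the client
        // process stays alive until all three pipelines have finished.
        t1.join();
        t2.join();
        t3.join();
    }

    private static void runPipeline(String topic) {
        try {
            // Each pipeline gets its own environment, so the pipelines are
            // submitted as separate jobs and fail independently of each other.
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            // Placeholder: replace with your Kafka source and Parquet/S3 sink.
            env.fromElements("record from " + topic).print();
            env.execute("pipeline-" + topic);
        } catch (Exception e) {
            throw new RuntimeException("Pipeline for " + topic + " failed", e);
        }
    }
}
```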

Data/event exchange between jobs

Is it possible in Apache Flink to create an application which consists of multiple jobs that together build a pipeline to process some data?
For example, consider a process with an input/preprocessing stage, a business logic and an output stage.
In order to be flexible in development and (re)deployment, I would like to run these as independent jobs.
Is it possible in Flink to build this and directly pipe the output of one job to the input of another (without external components)?
If yes, where can I find documentation about this and can it buffer data if one of the jobs is restarted?
If no, does anyone have experience with such a setup and point me to a possible solution?
Thank you!
If you really want separate jobs, then one way to connect them is via something like Kafka, where job A publishes, and job B (downstream) subscribes. Once you disconnect the two jobs, though, you no longer get the benefit of backpressure or unified checkpointing/saved state.
Kafka can do buffering of course (up to some max amount of data), but that's not a solution to a persistent difference in performance, if the upstream job is generating data faster than the downstream job can consume it.
I imagine you could also use files as the 'bridge' between jobs (streaming file sink and then streaming file source), though that would typically create significant latency as the downstream job has to wait for the upstream job to decide to complete a file, before it can be consumed.
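To make the Kafka "bridge" idea concrete, here is a hedged sketch using the KafkaSource/KafkaSink connectors (flink-connector-kafka, Flink 1.14+); the broker address, topic name, group id, and the placeholder pipelines are all illustrative, not part of the original answer.

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
import org.apache.flink.connector.kafka.sink.KafkaSink;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class KafkaBridge {

    // Upstream job: the preprocessing stage publishes its output to a bridge topic.
    public static void runUpstream() throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        KafkaSink<String> bridgeSink = KafkaSink.<String>builder()
                .setBootstrapServers("localhost:9092")             // placeholder broker
                .setRecordSerializer(KafkaRecordSerializationSchema.builder()
                        .setTopic("preprocessed-events")           // placeholder bridge topic
                        .setValueSerializationSchema(new SimpleStringSchema())
                        .build())
                .build();

        env.fromElements("event-1", "event-2")                     // placeholder preprocessing
                .sinkTo(bridgeSink);
        env.execute("preprocessing-job");
    }

    // Downstream job: the business-logic stage subscribes to the same bridge topic.
    public static void runDownstream() throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        KafkaSource<String> bridgeSource = KafkaSource.<String>builder()
                .setBootstrapServers("localhost:9092")
                .setTopics("preprocessed-events")
                .setGroupId("business-logic")
                .setStartingOffsets(OffsetsInitializer.earliest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        env.fromSource(bridgeSource, WatermarkStrategy.noWatermarks(), "bridge-source")
                .print();                                          // placeholder business logic
        env.execute("business-logic-job");
    }
}
```

Because the two jobs only share the topic, each needs its own checkpointing and restart handling; the upstream job's backpressure no longer reaches the downstream one, as noted above.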
An alternative approach that's been successfully used a number of times is to provide the details of the preprocessing and business logic stages dynamically, rather than compiling them into the application. This means that the overall topology of the job graph is static, but you are able to modify the processing logic while the job is running.
I've seen this done with purpose-built DSLs, PMML models, Javascript (via Rhino), Groovy, Java classloading, ...
You can use a broadcast stream to communicate/update the dynamic portions of the processing.
Here's an example of this pattern, described in a Flink Forward talk by Erik de Nooij from ING Bank.
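As a rough sketch of the broadcast-stream variant, assuming a simple string-based "rule" format purely for illustration: the event stream connects to a broadcast rules stream, processBroadcastElement stores the latest rule, and processElement applies it to each event.

```java
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.datastream.BroadcastStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.co.BroadcastProcessFunction;
import org.apache.flink.util.Collector;

public class DynamicLogicJob {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Placeholder event stream; in practice this would be your Kafka source.
        DataStream<String> events = env.fromElements("order-1", "order-2");

        // Placeholder rules stream; in practice this comes from a topic or file
        // that you update while the job is running.
        DataStream<String> rules = env.fromElements("UPPERCASE");

        final MapStateDescriptor<String, String> ruleDescriptor =
                new MapStateDescriptor<>("rules", Types.STRING, Types.STRING);

        BroadcastStream<String> broadcastRules = rules.broadcast(ruleDescriptor);

        events.connect(broadcastRules)
                .process(new BroadcastProcessFunction<String, String, String>() {

                    @Override
                    public void processElement(String event, ReadOnlyContext ctx, Collector<String> out)
                            throws Exception {
                        // Apply whatever logic the most recently broadcast rule prescribes.
                        String rule = ctx.getBroadcastState(ruleDescriptor).get("current");
                        out.collect("UPPERCASE".equals(rule) ? event.toUpperCase() : event);
                    }

                    @Override
                    public void processBroadcastElement(String rule, Context ctx, Collector<String> out)
                            throws Exception {
                        // Store the newest rule so processElement can pick it up.
                        ctx.getBroadcastState(ruleDescriptor).put("current", rule);
                    }
                })
                .print();

        env.execute("dynamic-logic-job");
    }
}
```

The job graph stays fixed; only the contents of the broadcast state change at runtime, which is what allows updating the processing logic without a redeploy.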

How to deploy a new job without downtime

I have an Apache Flink application that reads from a single Kafka topic.
I would like to update the application from time to time without experiencing downtime. For now the Flink application executes some simple operators such as map, and some synchronous IO to external systems via HTTP REST APIs.
I have tried to use the stop command, but I get "Job termination (STOP) failed: This job is not stoppable." I understand that the Kafka connector does not support the stop behavior.
A simple solution would be to cancel with savepoint and to redeploy the new jar with the savepoint, but then we get downtime.
Another solution would be to control the deployment from the outside, for example, by switching to a new topic.
What would be a good practice?
If you don't need exactly-once output (i.e., you can tolerate some duplicates), you can take a savepoint without cancelling the running job. Once the savepoint is completed, you start a second job. The second job could write to a different topic, but it doesn't have to. When the second job is up, you can cancel the first job.
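With the CLI, that workflow is roughly: flink savepoint <jobId> against the running job, then flink run -s <savepointPath> new.jar to start the second job, then flink cancel <oldJobId> once it is healthy. If you submit from your own main() and therefore hold a JobClient, a minimal sketch of the programmatic equivalent could look like this; the savepoint directory and the pipeline are placeholders.

```java
import org.apache.flink.core.execution.JobClient;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SavepointWithoutCancel {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.fromElements("a", "b", "c").print();   // placeholder pipeline

        // Submit the running ("old") job without blocking, keeping a handle to it.
        JobClient oldJob = env.executeAsync("old-job");

        // Trigger a savepoint while the job keeps running -- no cancellation.
        String savepointPath = oldJob
                .triggerSavepoint("s3://bucket/savepoints")   // placeholder directory
                .get();
        System.out.println("Savepoint completed at " + savepointPath);

        // Start the new job from savepointPath (e.g. flink run -s <savepointPath> new.jar)
        // and, once it is consuming from Kafka, cancel the old job:
        oldJob.cancel().get();
    }
}
```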

Flink: how to have separate log files and logging configuration for different jobs on the same TaskManager?

I run multiple jobs on the same TaskManager on a single machine. They all write to the same TaskManager and JobManager log and .out files.
Is there any way to separate the log files of these jobs into separate files with different logging configs for each job?
Does that solution also apply to Flink in standalone cluster mode?
As far as I know this is not possible. The reason is that when you make a call to the LOGGER, it would be quite hard for the framework to figure out from which job instance you are calling it.
The only way around it is to have one TaskManager per job, but that comes with significant overhead.

MVC3 Job Scheduler - XML Export

I am new to the community and looking forward to being a contributing member. I wanted to throw this out there and see if anyone had any advice:
I am currently in the middle of developing an MVC 3 app that controls various SQL Jobs. It basically allows users to schedule jobs to be completed in the future, but also allows them to run jobs on demand.
I was thinking of having a thread run in the web app that pulls entity information into an XML file, and writing a Windows service to monitor this file and perform the requested jobs. Does this sound like a good method? Has anyone done something like this before or used a different approach? Any advice would be great. I will keep the forum posted on progress and practices.
Thanks
I can see you running into some issues using a file for complex communication between processes - files can generally only be written by one process at a time, so what happens if the worker process tries to remove a task at the same time as the web process tries to add a task?
A better approach would be to store the tasks in a database that is accessible to both processes - a database can be written to by multiple processes, and it is easy to select all tasks that have a scheduled date in the past.
Using a database you don't get to use FileSystemWatcher, which I suspect is one of the main reasons you want to use a file. If you really need the job to run instantly there are various sorts of messaging you could use, but for most purposes you can just check the queue table on a timer.
