Task Manager Affinity in Flink

We need to run 10 jobs on a Flink cluster. 4 of them are not CPU bound, so for them we can have 2×CPU task slots; however, 6 jobs are CPU bound and need heavy CPU, i.e. vCPU/2 slots on each task manager. My question is: how can I tell Flink to use x machines (task managers) for one job and y task managers for another? Do I need a separate cluster for the CPU-bound jobs, or is there a way to achieve this in a single cluster?

Currently you will need to have a separate cluster for this. FLIP-169: DataStream API for Fine-Grained Resource Requirements, coming in Flink 1.14, may better support this use case.
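For a sense of what that API looks like, here is a sketch based on the fine-grained resource management added with FLIP-169; the group name and resource values are illustrative, not from the question:

    import org.apache.flink.api.common.operators.SlotSharingGroup;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class FineGrainedResourcesSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // Declare a slot sharing group with explicit resources for the CPU-bound operators
            // (the cluster must have fine-grained resource management enabled).
            SlotSharingGroup cpuBound = SlotSharingGroup.newBuilder("cpu-bound")
                    .setCpuCores(2.0)           // illustrative resource values
                    .setTaskHeapMemoryMB(512)
                    .build();

            env.fromElements("a", "b", "c")
               .map(String::toUpperCase)
               .slotSharingGroup(cpuBound);     // slots for this group are sized to the declared resources

            env.execute("fine-grained-resources");
        }
    }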

Related

Share data between task slots in Flink JVM memory

I have 5 different jobs running in 5 task slots. They all read from Kafka and sink back to Kafka. The Kafka load is about 200K messages/sec.
I have another job, let's say job6, which needs to get some information from these 5 jobs. For each device we make some calculations in those 5 jobs, and according to the results of these calculations, I need to do something more in the 6th job.
As a first solution, I used side outputs in these 5 jobs and sent this additional info to a Kafka topic. Then my 6th job subscribed to it. But as the workload on Kafka was already very high, this solution doubled the workload on Kafka.
As all task slots run in the same task manager JVM, what I have in mind is developing custom RichSink and RichSource functions which use the same static/singleton Java object. As it will be static, I believe all tasks will have access to the same object. This object will keep a queue (a Java BlockingQueue). Instead of feeding the data to Kafka, I will feed this queue in all tasks, and the 6th job will process the data received from this queue.
Please let me know if this is a good idea for a big distributed system. I assume clustering will not be a problem, because after reading data from the shared queue I will call keyBy(), so I hope Flink will handle that part. Also please let me know about dangerous points and any tips you have.
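For context, a minimal sketch of the bridge being proposed (all class names are hypothetical; the answers below explain why this design is risky):

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;
    import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
    import org.apache.flink.streaming.api.functions.source.RichParallelSourceFunction;

    // One static queue per TaskManager JVM, visible to every slot in that JVM.
    public class SharedQueue {
        public static final BlockingQueue<String> QUEUE = new LinkedBlockingQueue<>(10_000);
    }

    // Used by jobs 1-5 in place of the Kafka sink.
    class QueueSink extends RichSinkFunction<String> {
        @Override
        public void invoke(String value, Context context) throws Exception {
            SharedQueue.QUEUE.put(value);  // blocks (back-pressures) when the queue is full
        }
    }

    // Used by job 6 in place of the Kafka source.
    class QueueSource extends RichParallelSourceFunction<String> {
        private volatile boolean running = true;
        @Override
        public void run(SourceContext<String> ctx) throws Exception {
            while (running) {
                String value = SharedQueue.QUEUE.take();
                synchronized (ctx.getCheckpointLock()) {
                    ctx.collect(value);
                }
            }
        }
        @Override
        public void cancel() { running = false; }
    }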
You essentially have an in-memory data store for bridging between two jobs. One of several issues here is that if the Task Manager crashes, you lose this data, thus eliminating one of the key benefits of Flink (guaranteed at-least-once or exactly-once processing).
You'd also have to ensure that you've got at least one of your job 6 source operators running in a slot on every TM instance. Flink doesn't yet support the ability to easily control which sub-tasks run in what slots, though if you set the downstream job's parallelism == the number of slots then you can work around that issue.
I'm sure there are other issues, I just haven't spent much time thinking about it :)
Depending on the version of Flink you're using, I wonder if Flink's new Table Store would be an option for you.
The GlobalAggregateManager in Flink may be helpful. It can be used to share state amongst the parallel tasks of a job, though performance may be poor in high-throughput scenarios.
Here are some demos of these projects: Arctic, Flink.
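As a minimal sketch of the GlobalAggregateManager approach mentioned above, assuming a map function that reports each element into a named job-wide aggregate (the aggregate name and the counting function are made up for illustration):

    import org.apache.flink.api.common.functions.AggregateFunction;
    import org.apache.flink.api.common.functions.RichMapFunction;
    import org.apache.flink.streaming.api.operators.StreamingRuntimeContext;

    // Reports each element into a job-wide aggregate maintained on the JobMaster;
    // every parallel task of the job can update and read it the same way.
    public class CountPublisher extends RichMapFunction<String, String> {
        @Override
        public String map(String value) throws Exception {
            ((StreamingRuntimeContext) getRuntimeContext())
                .getGlobalAggregateManager()
                .updateGlobalAggregate("sharedCount", value, new CountAggregate());
            return value;
        }
    }

    // A trivial aggregate: counts how many elements have been reported so far.
    class CountAggregate implements AggregateFunction<String, Long, Long> {
        @Override public Long createAccumulator() { return 0L; }
        @Override public Long add(String value, Long acc) { return acc + 1; }
        @Override public Long getResult(Long acc) { return acc; }
        @Override public Long merge(Long a, Long b) { return a + b; }
    }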

Flink jobmanager or taskmanager instances

I have a few questions about the Flink stream processing framework. Please let me know your comments on these questions.
1. Let's say I build a cluster with n nodes, of which m nodes are job managers (for HA). Are the remaining (n-m) nodes the task managers?
2. Each node has n cores. How can we control/use a specific number of cores per task manager/job manager?
3. If we add a new node as a task manager, does the job manager automatically assign tasks to the newly added task manager?
4. Does Flink have the concepts of partitions and data skew?
5. If Flink connects to Pulsar and needs to read data from a partitioned topic, what is the parallelism here? (Is the parallelism equal to the number of partitions, or does it depend entirely on the number of task slots of the Flink task managers?)
6. Does Flink have any built-in optimization of the job graph? (For example, my job graph has many filter, map, flatMap, etc. operators.) Can you suggest any docs/materials on Flink job optimization?
7. Do we have any option like a dedicated core for Prometheus metrics scraping?
1. Yes.
2. Configuring the number of slots per TM: https://nightlies.apache.org/flink/flink-docs-stable/docs/concepts/flink-architecture/#task-slots-and-resources. Note that each operator runs in its own thread and you have no control over which core it runs on, so you don't really have fine-grained control of how cores are used. Configuring resource groups also allows you to distribute operators across slots (see the sketch after this list): https://nightlies.apache.org/flink/flink-docs-stable/docs/dev/datastream/operators/overview/#task-chaining-and-resource-groups
3. Not for currently running jobs; you'd need to re-scale them. New jobs will use it, though.
4. Yes. https://nightlies.apache.org/flink/flink-docs-stable/docs/dev/datastream/sources/
5. It will depend on the Flink source parallelism.
6. Flink automatically optimizes the graph as it sees fit. You have some control via rescaling and chaining/splitting operators: https://nightlies.apache.org/flink/flink-docs-stable/docs/dev/datastream/operators/overview/ (towards the end). As a rule of thumb, I would start by deploying a full job per slot and then, once you properly understand where the bottlenecks are, try to optimize the graph. Most of the time it is not worth it, due to the increased serialization and shuffling of data.
7. You can export Prometheus metrics, but you cannot dedicate a core to it: https://nightlies.apache.org/flink/flink-docs-stable/docs/deployment/metric_reporters/#prometheus
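As a sketch of the chaining and resource-group knobs from answers 2 and 6 (the operators and the group name are illustrative):

    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class ChainingSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            env.fromElements("a", "bb", "ccc")
               .map(String::length)
               .startNewChain()                 // begin a new operator chain at this map
               .filter(n -> n > 1)
               .disableChaining()               // run the filter in its own task, unchained
               .slotSharingGroup("analytics")   // place it (and downstream operators) in a separate slot group
               .print();

            env.execute("chaining-sketch");
        }
    }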

How to increase the number of taskmanagers in Flink 1.9.0 or later versions

I have 1452 independent tasks in a Flink job. It reads from Kafka, does some transformation using flatMap, and then sinks to HDFS files. Kafka and flatMap have a parallelism of 20 each, and I have 1450 independent sinks, each with a parallelism of 1. As the maximum parallelism here is 20, Flink creates only 5 task managers when I use 4 slots per task manager. As the total number of tasks in the job is very high, I need more task managers to be created.
As of now I am giving more parallelism (100) to one of the sink tasks so that I get the desired number of task managers (which is not a proper way), but this leads to all sink tasks (except the one with parallelism 100) being created on only one task manager, while the remaining task managers are used by the other tasks.
So I need some way to instantiate a desired number of task managers in Flink, and some way to distribute the sink tasks across all task managers.
The simplest way to isolate slots and force YARN to provide additional task managers and slots is to use slot sharing groups for the operations you want to isolate. By default, all operations are put into the default slot sharing group, and therefore all your tasks share the same slots.
To do this, just set a different slot sharing group on the operators you want to isolate:
stream
    .op(...)
    .slotSharingGroup("job-N");
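A slightly fuller sketch under the setup described in the question (the DiscardingSink is a stand-in for the HDFS sinks, and the group names are arbitrary):

    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.functions.sink.DiscardingSink;

    public class IsolatedSinksSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // Stand-in for the Kafka source + flatMap transformation.
            DataStream<String> events = env.fromElements("a", "b", "c");

            // Give each sink its own slot sharing group; its subtask can then no longer
            // share a slot with the other sinks, so YARN must provision more slots/TMs.
            for (int i = 0; i < 3; i++) {
                events.addSink(new DiscardingSink<>())
                      .slotSharingGroup("sink-" + i);
            }

            env.execute("isolated-sinks");
        }
    }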

Why should the user have to set parallelism explicitly

I started a Flink application with n TaskManagers and s slots per TaskManager, so my application has n*s slots in total.
That means Flink could run at most n*s subtasks at the same time. But why doesn't Flink try to use the available resources to run as many subtasks as possible, rather than bothering end users to set the parallelism explicitly?
For Flink beginners who don't know about the parallelism setting (the default is 1), the job will always run only one subtask, even when given more resources!
I would like to know the design considerations here, thanks!
A Flink cluster can also be used by multiple users, or a single user can run multiple jobs on a cluster. Such clusters are sized not to run a single job but to run multiple jobs. In such environments it's not desirable for jobs to grab all available resources by default.
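If you do want a single job to use every slot, you can opt in explicitly; a minimal sketch, with n and s standing in for the values from the question:

    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class ExplicitParallelism {
        public static void main(String[] args) throws Exception {
            int n = 4, s = 8;  // illustrative: task manager count and slots per task manager
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            env.setParallelism(n * s);  // default parallelism for every operator in this job
            env.fromElements(1, 2, 3).print();
            env.execute("explicit-parallelism");
        }
    }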

Distribute a Flink operator evenly across taskmanagers

I'm prototyping a Flink streaming application on a bare-metal cluster of 15 machines. I'm using yarn-mode with 90 task slots (15x6).
The app reads data from a single Kafka topic. The Kafka topic has 15 partitions, so I set the parallelism of the source operator to 15 as well. However, I found that Flink in some cases assigns 2-4 instances of the consumer task to the same taskmanager. This causes certain nodes to become network-bound (the Kafka topic serves a high volume of data and the machines only have 1G NICs) and creates bottlenecks in the entire data flow.
Is there a way to "force" or otherwise instruct Flink to distribute a task evenly across all taskmanagers, perhaps round robin? And if not, is there a way to manually assign tasks to specific taskmanager slots?
To the best of my knowledge, this isn't possible. The job manager, which schedules tasks into task slots, is only aware of task slots. It isn't aware that some task slots belong to one task manager, and others to another task manager.
Flink does not allow manually assigning tasks to slots, because in the case of failure handling it must be free to redistribute the tasks to the remaining task managers.
However, you can distribute the workload evenly by setting cluster.evenly-spread-out-slots: true in flink-conf.yaml.
This works for Flink >= 1.9.2.
To make it work, you may also have to set:
taskmanager.numberOfTaskSlots equal to the number of available CPUs per machine, and
parallelism.default equal to the total number of CPUs in the cluster.
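Putting the three settings together for the cluster from the question (15 machines with 6 slots each; adjust the values for other setups):

    # flink-conf.yaml
    # Spread slots across all task managers (Flink >= 1.9.2)
    cluster.evenly-spread-out-slots: true
    # slots per TM = CPUs available per machine
    taskmanager.numberOfTaskSlots: 6
    # default parallelism = total CPUs in the cluster (15 x 6)
    parallelism.default: 90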
