How can I specify that parts of my flink job run in different taskmanagers - flink-streaming

I have a cluster with several taskmanagers. Each taskmanager (1 taskslot per TM) is running a different breed of job.
I have a particular job consisting of stages, which runs in 1 taskmanager (there is no rebalancing, so the graph optimizer merges everything into the same thread). I want its 3 operators to run in 3 different taskmanagers. How do I set that up?

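The mechanism you're looking for is slot sharing groups. Calling .slotSharingGroup("name") on an operator assigns it to a named group, and operators in different groups never share a slot. Putting each stage of your pipeline in its own group forces each stage into its own slot, and since your TMs have 1 slot each, the stages end up in different taskmanagers.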
Your application might perform better if instead you were to disable operator chaining (env.disableOperatorChaining() will force each pipeline stage into its own thread) and then run this job on a TM that uses 2 or 3 CPU cores per slot. With this configuration you'd be using shared memory for communication between the stages, rather than the network.

Related

Configuring core usage per slot in Flink

I have a cluster of 3 machines with 4 cores each. Each machine has one task manager. I know that the number of slots in Flink can be controlled by taskmanager.numberOfTaskSlots. I initially allotted 12 slots in total (every task manager had 4 slots). Although there is no explicit CPU isolation among slots (as mentioned here), I assume that each slot is roughly using 1 core. Am I right in assuming this?
I haven't mentioned any slot sharing group in my code and my pipeline does not have any blocking edges. The parallelism of each task is the same and is equal to the number of slots. I am assuming that one subtask from each task will be in a slot. Am I correct in this understanding?
After some conversation (link for the curious minds :-)), I wanted to increase the cores per slot to 2 for my experiments. So I reduced taskmanager.numberOfTaskSlots to 2 on each machine. After doing this, the Flink WebUI shows 6 slots in total and 2 slots for each task manager. I have also reduced the parallelism of each task to 6. Is this all that I need to do?
Note: I am not using the MVP feature of fine grained resource management right now.
That sounds right.
Each Task Manager is a single JVM. A task slot doesn't correspond to anything physical -- it's just an abstract resource managed by the Flink scheduler. Each task in a task slot is an instance of an operator chain in the execution graph, and each task is single-threaded. No two instances of the same operator chain will ever be scheduled into the same slot.
All of the threads for all of the tasks in all of the slots in a given task manager will compete for the resources available to that JVM: cores, memory, etc.
As you have noted, there is no way to explicitly set the number of cores per slot. And there's no requirement that the number be an integer. You could, for example, decide that your 4-core TMs are each providing 3 slots, for a total parallelism of 9 across the 3 TMs.
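To make that arithmetic concrete, here is a small sketch in plain Java with no Flink dependency. The class and method names (SlotMath, totalSlots, coresPerSlot) are made up for illustration, and "cores per slot" is only an informal ratio, since Flink does not enforce any CPU isolation between slots.

```java
public class SlotMath {
    /** Total slots in the cluster: number of TaskManagers times slots per TaskManager. */
    public static int totalSlots(int taskManagers, int slotsPerTaskManager) {
        return taskManagers * slotsPerTaskManager;
    }

    /** Informal share of cores per slot; it does not have to be an integer. */
    public static double coresPerSlot(int coresPerTaskManager, int slotsPerTaskManager) {
        return (double) coresPerTaskManager / slotsPerTaskManager;
    }

    public static void main(String[] args) {
        int taskManagers = 3;        // 3 machines, one TaskManager each
        int coresPerTaskManager = 4; // 4 cores per machine

        // Compare the settings discussed above: 4, 3, and 2 slots per TaskManager.
        for (int slots : new int[] {4, 3, 2}) {
            System.out.printf("%d slots/TM -> %d total slots, %.2f cores/slot%n",
                    slots,
                    totalSlots(taskManagers, slots),
                    coresPerSlot(coresPerTaskManager, slots));
        }
    }
}
```

With 2 slots per TM this gives the 6 total slots seen in the WebUI, and with 3 slots per TM it gives the total parallelism of 9 mentioned above.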

Can I run multiple taskManager in one single pc?

flink version: 1.10
os: centos 7
detail:
I've started a standalone Flink cluster on my server. I can see one TaskManager in the Flink web UI.
Question: Is it reasonable to run another TaskManager on this server?
Here are my steps (the Flink cluster is already running):
1. On my server, go to Flink's root directory, then start another TaskManager:
cd bin
./taskmanager.sh start
After a while, two TaskManagers appear in my Flink web UI.
If running multiple TaskManagers on one server is acceptable, what should I watch out for when doing this?
The existing task manager (TM) has 4 slots and has 4 CPU cores available to it. Whether it's reasonable to run another TM depends on what resources the server has, and how resource intensive your workload is. If your server still has free cores and isn't busy doing other things besides running Flink, then sure, run another TM -- or make this one bigger.
What matters most is how many total task slots are being provided by the server. As a starting point, you might think in terms of one slot per CPU core. Whether those slots are all in one TM, or each in their own TM, or somewhere in between, is a secondary concern. (See Is one TaskManager with three slots the same as three TaskManagers with one slot in Apache Flink for discussion of that point.)

Apache Flink: number of TaskManagers per machine

The number of CPU cores per machine is four. In flink standalone mode, how should I set the number of TaskManagers on each machine?
1 TaskManager, each TaskManager has 4 slots.
2 TaskManagers, each TaskManager has 2 slots.
4 TaskManagers, each TaskManager has 1 slot. This setting is like apache-storm.
Normally you'd have one TaskManager per server, and (as per the doc that bupt_ljy referenced) one slot per physical CPU core. So I'd go with your option #1.
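For reference, option #1 comes down to a single setting in flink-conf.yaml on each machine. This is a minimal sketch, assuming the 4-core machines from the question and one TaskManager process started per machine:

```yaml
# flink-conf.yaml on each 4-core machine (one TaskManager per machine)
# One slot per CPU core, as suggested above.
taskmanager.numberOfTaskSlots: 4
```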
There's also the consideration of Flink's scheduling algorithm. We've frequently run into problems where, with multiple hosts running one large task manager apiece, all jobs get scheduled to one host, which can cause load problems.
We ended up making multiple smaller task managers per host, and jobs seem to be distributed better (although they still often cluster on one node).
So, in my experience, I'd lean more towards 4 task managers with 1 slot apiece, or maybe compromise at 2 task managers with 2 slots apiece.
I think it depends on your application.
The official documentation (Distributed Runtime Environment) says: "As a rule-of-thumb, a good default number of task slots would be the number of CPU cores. With hyper-threading, each slot then takes 2 or more hardware thread contexts."
But if your application has to use a lot of memory, you may not want too many slots in one task manager, because all slots in a task manager share that JVM's memory.

What are reasons to prefer increasing the number of task managers instead of task slots per task manager?

According to the Flink documentation, there exist two dimensions to affect the amount of resources available to a task:
The number of task managers
The number of task slots available to a task manager.
Having one slot per TaskManager means each task group runs in a separate JVM (which can be started in a separate container, for example). Having multiple slots means more subtasks share the same JVM. Tasks in the same JVM share TCP connections (via multiplexing) and heartbeat messages. They may also share data sets and data structures, thus reducing the per-task overhead.
With this line in the documentation, it seems that you would always err on the side of increasing the number of task slots per task manager instead of increasing the number of task managers.
A concrete scenario: if I have a job cluster deployed in Kubernetes (let's assume 16 CPU cores are available) and a pipeline consisting of one source + one map function + one sink, then I would default to having a single TaskManager with 16 slots available to that TaskManager.
Is this the optimal configuration? Is there a case where I would prefer 16 TaskManagers with a single slot each or maybe a combination of TaskManager and slots that could take advantage of all 16 CPU cores?
There is no optimal configuration because "optimal" cannot be defined in general. A configuration with a single slot per TM provides good isolation and is often easier to manage and reason about.
If you run multiple jobs, a multi-slot configuration might schedule tasks of different jobs to one TM. If the TM goes down, e.g., because either of two tasks consumed too much memory, both jobs will be restarted. On the other hand, running one slot per TM might leave more memory unused. If you only run a single job per cluster, multiple slots per TM might be fine.

Distribute a Flink operator evenly across taskmanagers

I'm prototyping a Flink streaming application on a bare-metal cluster of 15 machines. I'm using yarn-mode with 90 task slots (15x6).
The app reads data from a single Kafka topic. The Kafka topic has 15 partitions, so I set the parallelism of the source operator to 15 as well. However, I found that Flink in some cases assigns 2-4 instances of the consumer task to the same taskmanager. This causes certain nodes to become network-bound (the Kafka topic is serving a high volume of data and the machines only have 1G NICs) and creates bottlenecks in the entire data flow.
Is there a way to "force" or otherwise instruct Flink to distribute a task evenly across all taskmanagers, perhaps round robin? And if not, is there a way to manually assign tasks to specific taskmanager slots?
To the best of my knowledge, this isn't possible. The job manager, which schedules tasks into task slots, is only aware of task slots. It isn't aware that some task slots belong to one task manager, and others to another task manager.
Flink does not let you assign tasks to specific task slots manually, because when it handles a failure it needs to be free to redistribute the tasks to the remaining task managers.
However, you can distribute the workload evenly by setting cluster.evenly-spread-out-slots: true in flink-conf.yaml.
This works for Flink >= 1.9.2.
To make it work, you may also have to set:
taskmanager.numberOfTaskSlots equal to the number of available CPUs per machine, and
parallelism.default equal to the total number of CPUs in the cluster.
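Put together, the settings above might look like the following in flink-conf.yaml. This is only a sketch: the numbers assume the 15-machine cluster from the question with 6 CPUs per machine, which is a guess based on the 6 slots per machine mentioned there, so adjust them to your hardware.

```yaml
# flink-conf.yaml (sketch; the values are assumptions for 15 machines x 6 CPUs)
# Spread slots across all registered TaskManagers (Flink >= 1.9.2).
cluster.evenly-spread-out-slots: true
# One slot per available CPU on each machine.
taskmanager.numberOfTaskSlots: 6
# Total number of CPUs in the cluster: 15 * 6.
parallelism.default: 90
```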
