Can I run multiple TaskManagers on a single PC? - apache-flink

flink version: 1.10
os: centos 7
I've started a standalone Flink cluster on my server. I can see one TaskManager in the Flink web UI.
Question: Is it reasonable to run another TaskManager on this server?
Here are my steps (the Flink cluster is already running):
1. On my server, go to Flink's root directory. Then start another TaskManager:
cd bin
./ start
After a while, two TaskManagers appear in my Flink web UI.
And if running multiple TaskManagers on a single server is acceptable, what should I watch out for when doing this?

The existing task manager (TM) has 4 slots and has 4 CPU cores available to it. Whether it's reasonable to run another TM depends on what resources the server has, and how resource intensive your workload is. If your server still has free cores and isn't busy doing other things besides running Flink, then sure, run another TM -- or make this one bigger.
What matters most is how many total task slots are being provided by the server. As a starting point, you might think in terms of one slot per CPU core. Whether those slots are all in one TM, or each in their own TM, or somewhere in between, is a secondary concern. (See "Is one TaskManager with three slots the same as three TaskManagers with one slot in Apache Flink" for discussion of that point.)


How can I specify that parts of my flink job run in different taskmanagers

I have a cluster with several taskmanagers. Each taskmanager (1 taskslot per TM) is running a different breed of job.
I have a particular job consisting of several stages, which runs in 1 taskmanager (there is no rebalancing, so the graph optimizer merges everything into the same thread), and I want its 3 operators to run in 3 different taskmanagers. How do I set that up?
The mechanism you're looking for is slot sharing groups. This will allow you to force each stage of your pipeline into its own slot.
Your application might perform better if instead you were to disable operator chaining (env.disableOperatorChaining() will force each pipeline stage into its own thread) and then run this job on a TM that uses 2 or 3 CPU cores per slot. With this configuration you'd be using shared memory for communication between the stages, rather than the network.

Ideal Number of Task Slots

We have developed a Flink application on v1.13.0 and deployed it on Kubernetes, with one Task Manager instance per Kubernetes pod. I am not sure how to determine the ideal number of task slots for each Task Manager instance. Should we configure one task slot per Task Manager/pod, two slots, or more? We currently have two task slots per Task Manager instance and are wondering whether that is the right setting. What are the pros and cons of running one task slot versus two or more slots on a Task Manager/pod?
As a general rule, for containerized deployments (like yours), one slot per TM is a good default starting point. This tends to keep the configuration as straightforward as possible.
It depends on your expected workload, input, and state size.
Is it a batch or a stream?
Batch: is the workload fast enough?
Stream: is the workload backpressuring?
For these throughput limitations, you might want to increase the number of TMs.
State size: how are you processing your data? Does it require a lot of state?
For example, this query:
FROM user_logins
will need state proportional to the number of users.
You can tune the memory of TM in the options.
Here is a useful link:
Concurrent jobs: is this machine underused, and do you need to keep a pool of unused task slots (TS) ready to execute a job?
A TM's memory is sliced between its task slots (be sure each slice fits your state size), but the CPU is shared, so a slot can use cores that the others leave idle.
Other than that, if it's running fine with one TM on one pod, then you have nothing to do.
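To make the memory-slicing point concrete, here is a minimal Python sketch. It assumes the TM's managed memory is split evenly across its slots; the function name and the 2048 MB figure are hypothetical, chosen only for illustration.

```python
def managed_memory_per_slot_mb(tm_managed_memory_mb: int, slots: int) -> float:
    """Return the share of a TM's managed memory that each slot receives,
    assuming an even split across slots."""
    if slots < 1:
        raise ValueError("a TaskManager needs at least one slot")
    return tm_managed_memory_mb / slots


# Hypothetical TM with 2048 MB of managed memory.
for slots in (1, 2, 4):
    share = managed_memory_per_slot_mb(2048, slots)
    print(f"{slots} slot(s): {share:.0f} MB of managed memory per slot")
```

If the per-slot share is smaller than the state one subtask is expected to hold, that suggests either fewer slots per TM or a larger TM.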

Apache Flink: number of TaskManagers per machine

The number of CPU cores per machine is four. In flink standalone mode, how should I set the number of TaskManagers on each machine?
1 TaskManager, each TaskManager has 4 slots.
2 TaskManagers, each TaskManager has 2 slots.
4 TaskManagers, each TaskManager has 1 slot. This setting is like apache-storm.
Normally you'd have one TaskManager per server, and (as per the doc that bupt_ljy referenced) one slot per physical CPU core. So I'd go with your option #1.
There's also the consideration of Flink's scheduling algorithm. We've frequently run into problems where, with multiple hosts running one large task manager apiece, all jobs get scheduled to one host, which can cause load problems.
We ended up running multiple smaller task managers per host, and jobs seem to be distributed better (although they still often cluster on one node).
So, in my experience, I'd lean more towards 4 task managers with 1 slot apiece, or maybe compromise at 2 task managers with 2 slots apiece.
I think it depends on your application.
The official documentation (Distributed Runtime Environment) says: "As a rule-of-thumb, a good default number of task slots would be the number of CPU cores. With hyper-threading, each slot then takes 2 or more hardware thread contexts."
But if you have to use a lot of memory in your application, then you don't need too many slots in one task manager.
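The layouts discussed in this thread are simply the ways to split a fixed slot budget (one slot per core) across TaskManagers. A small Python sketch that enumerates them; the function name is hypothetical, and the choice among the results still depends on the isolation, memory, and scheduling trade-offs described above.

```python
from typing import List, Tuple


def layouts(total_slots: int) -> List[Tuple[int, int]]:
    """Return every (task_managers, slots_per_tm) pair whose product is total_slots."""
    return [
        (tms, total_slots // tms)
        for tms in range(1, total_slots + 1)
        if total_slots % tms == 0
    ]


cores = 4  # the machine from the question, one slot per core
for tms, slots in layouts(cores):
    print(f"{tms} TaskManager(s) x {slots} slot(s) each")
```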

What are reasons to prefer increasing the number of task managers instead of task slots per task manager?

According to the Flink documentation, there exist two dimensions to affect the amount of resources available to a task:
The number of task managers
The number of task slots available to a task manager.
Having one slot per TaskManager means each task group runs in a separate JVM (which can be started in a separate container, for example). Having multiple slots means more subtasks share the same JVM. Tasks in the same JVM share TCP connections (via multiplexing) and heartbeat messages. They may also share data sets and data structures, thus reducing the per-task overhead.
With this line in the documentation, it seems that you would always err on the side of increasing the number of task slots per task manager instead of increasing the number of task managers.
A concrete scenario: if I have a job cluster deployed in Kubernetes (let's assume 16 CPU cores are available) and a pipeline consisting of one source + one map function + one sink, then I would default to having a single TaskManager with 16 slots available to that TaskManager.
Is this the optimal configuration? Is there a case where I would prefer 16 TaskManagers with a single slot each or maybe a combination of TaskManager and slots that could take advantage of all 16 CPU cores?
There is no optimal configuration because "optimal" cannot be defined in general. A configuration with a single slot per TM provides good isolation and is often easier to manage and reason about.
If you run multiple jobs, a multi-slot configuration might schedule tasks of different jobs to one TM. If the TM goes down, e.g., because either of two tasks consumed too much memory, both jobs will be restarted. On the other hand, running one slot per TM might leave more memory unused. If you only run a single job per cluster, multiple slots per TM might be fine.
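The isolation argument can be illustrated with a toy model in Python. This is only a sketch: the placements, TM names, and job names are hypothetical, and it models nothing beyond "a job restarts if it had a slot on the failed TM".

```python
from typing import Dict, List, Set


def jobs_restarted(placement: Dict[str, List[str]], failed_tm: str) -> Set[str]:
    """Return the jobs that occupied at least one slot on the failed TaskManager."""
    return set(placement.get(failed_tm, []))


# Hypothetical placements: TM name -> job occupying each of its slots.
multi_slot = {"tm-1": ["job-a", "job-b"]}                 # one TM, two slots
single_slot = {"tm-1": ["job-a"], "tm-2": ["job-b"]}      # two TMs, one slot each

print(sorted(jobs_restarted(multi_slot, "tm-1")))   # both jobs are affected
print(sorted(jobs_restarted(single_slot, "tm-1")))  # only job-a is affected
```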

Distribute a Flink operator evenly across taskmanagers

I'm prototyping a Flink streaming application on a bare-metal cluster of 15 machines. I'm using yarn-mode with 90 task slots (15x6).
The app reads data from a single Kafka topic. The Kafka topic has 15 partitions, so I set the parallelism of the source operator to 15 as well. However, I found that Flink in some cases assigns 2-4 instances of the consumer task to the same taskmanager. This causes certain nodes to become network-bound (the Kafka topic is serving high volume of data and the machines only have 1G NICs) and bottlenecks in the entire data flow.
Is there a way to "force" or otherwise instruct Flink to distribute a task evenly across all taskmanagers, perhaps round robin? And if not, is there a way to manually assign tasks to specific taskmanager slots?
To the best of my knowledge, this isn't possible. The job manager, which schedules tasks into task slots, is only aware of task slots. It isn't aware that some task slots belong to one task manager, and others to another task manager.
Flink does not let you assign task slots manually, because during failure handling it needs to be free to redistribute tasks to the remaining task managers.
However, you can distribute the workload evenly by setting cluster.evenly-spread-out-slots: true in flink-conf.yaml.
This works for Flink >= 1.9.2.
To make it work, you may also have to set:
taskmanager.numberOfTaskSlots equal to the number of available CPUs per machine, and
parallelism.default equal to the total number of CPUs in the cluster.
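As a rough sketch, the three settings above can be derived from the cluster size. The Python below only computes and prints the flink-conf.yaml lines; the helper names are hypothetical, and the example assumes the 15 machines from the question each have 6 CPUs (the question only states 15x6 task slots).

```python
from typing import Dict


def spread_out_settings(machines: int, cpus_per_machine: int) -> Dict[str, str]:
    """Build the flink-conf.yaml entries named in the answer above."""
    return {
        "cluster.evenly-spread-out-slots": "true",
        "taskmanager.numberOfTaskSlots": str(cpus_per_machine),
        "parallelism.default": str(machines * cpus_per_machine),
    }


def render(settings: Dict[str, str]) -> str:
    """Format the settings as 'key: value' lines for flink-conf.yaml."""
    return "\n".join(f"{key}: {value}" for key, value in settings.items())


# Assumed cluster shape: 15 machines with 6 CPUs each.
print(render(spread_out_settings(machines=15, cpus_per_machine=6)))
```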
