I started a Flink application with n TaskManagers and s slots per TaskManager, so my application has n*s slots in total.
That means Flink should be able to run at most n*s subtasks at the same time. So why doesn't Flink try to use as many of those resources as possible and run as many subtasks as it can, instead of bothering end users to set the parallelism explicitly?
Flink beginners who don't know about the parallelism setting (the default is 1) will always get just one subtask, even when more resources are available!
I would like to know the design considerations here, thanks!
A Flink cluster can also be used by multiple users, or a single user can run multiple jobs on one cluster. Such clusters are not sized to run a single job but to run many of them. In such environments it is not desirable for jobs to grab all available resources by default.
Related
I have 5 different jobs running in 5 task slots. They all read from Kafka and sink back to Kafka. Kafka load is about 200K messages/sec.
I have another job, let's say job6, which needs to get some information from these 5 jobs. For each device we make some calculations in those 5 jobs, and depending on the results of these calculations, I need to do some further processing in the 6th job.
As a first solution, I used side outputs in these 5 jobs and sent this additional info to a Kafka topic, which my 6th job then subscribed to. But as the load on Kafka was already very high, this solution doubled the Kafka workload.
As all task slots run in the same TaskManager JVM, what I have in mind is developing custom RichSink and RichSource functions that use the same static/singleton Java object. As it will be static, I believe all tasks will have access to the same object. This object will keep a queue (a Java BlockingQueue). Instead of feeding data to Kafka, I will feed this queue in all tasks, and the 6th job will process the data received from this queue.
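Roughly what I have in mind (just a sketch; class names are placeholders and error handling is omitted):

import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
import org.apache.flink.streaming.api.functions.source.RichSourceFunction;

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Shared in-memory bridge; one instance per TaskManager JVM.
public class SharedQueueBridge {

    // Static queue visible to every task running in this JVM.
    public static final BlockingQueue<String> QUEUE = new LinkedBlockingQueue<>(100_000);

    // Used as the sink in jobs 1-5: instead of writing to Kafka, push to the queue.
    public static class QueueSink extends RichSinkFunction<String> {
        @Override
        public void invoke(String value, Context context) throws Exception {
            QUEUE.put(value); // blocks when the queue is full
        }
    }

    // Used as the source in job6: drain the queue and emit downstream.
    public static class QueueSource extends RichSourceFunction<String> {
        private volatile boolean running = true;

        @Override
        public void run(SourceContext<String> ctx) throws Exception {
            while (running) {
                String value = QUEUE.take(); // blocks until data is available
                synchronized (ctx.getCheckpointLock()) {
                    ctx.collect(value);
                }
            }
        }

        @Override
        public void cancel() {
            running = false;
        }
    }
}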
Please let me know if this is a good idea for a big distributed system. I assume clustering will not be a problem, because after reading data from the shared queue I will call keyBy(), so I hope Flink will handle that part. Also please let me know about any dangerous points and tips if you have them.
You essentially have an in-memory data store for bridging between two jobs. One of several issues here is that if the Task Manager crashes, you lose this data, thus eliminating one of the key benefits of Flink (guaranteed at-least-once or exactly-once processing).
You'd also have to ensure that you've got at least one of your job6 source operators running in a slot on every TM instance. Flink doesn't yet support the ability to easily control which sub-tasks run in which slots, though if you set the downstream job's parallelism equal to the number of slots then you can work around that issue.
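For example (a minimal sketch; the slot count is an assumed value you'd replace with your cluster's total):

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// With parallelism == total number of slots, the downstream job occupies every slot,
// so each TaskManager ends up hosting at least one of its source subtasks.
int totalSlots = 5; // assumption: adjust to taskManagers * slotsPerTaskManager
env.setParallelism(totalSlots);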
I'm sure there are other issues, I just haven't spent much time thinking about it :)
Depending on the version of Flink you're using, I wonder if Flink's new Table Store would be an option for you.
The GlobalAggregateManager in Flink may be helpful.
It can be used to share state amongst parallel tasks in a job. However, performance may be poor in high-throughput scenarios (see the sketch below the demo links).
Here are some demos of these projects:
Arctic, Flink
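A rough sketch of how it can be used from inside an operator (treat this as an assumption-laden example: the exact API may differ between Flink versions, and names like GlobalCountingFunction and the "eventCount" aggregate are made up for illustration):

import org.apache.flink.api.common.functions.AggregateFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.runtime.taskexecutor.GlobalAggregateManager;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.streaming.api.operators.StreamingRuntimeContext;
import org.apache.flink.util.Collector;

// Maintains a counter that is global across all parallel subtasks of this operator.
public class GlobalCountingFunction extends ProcessFunction<String, String> {

    private transient GlobalAggregateManager aggregateManager;

    @Override
    public void open(Configuration parameters) {
        // The aggregate lives on the JobMaster and is reachable from every subtask.
        aggregateManager =
                ((StreamingRuntimeContext) getRuntimeContext()).getGlobalAggregateManager();
    }

    @Override
    public void processElement(String value, Context ctx, Collector<String> out) throws Exception {
        // Every call goes through the JobMaster, which applies the AggregateFunction
        // and returns the updated global value - this round trip is why throughput
        // can suffer under heavy load.
        Long globalCount =
                aggregateManager.updateGlobalAggregate("eventCount", 1L, new CountAggregate());
        out.collect(value + " | global count so far: " + globalCount);
    }

    // Simple aggregate that sums long increments into a global counter.
    private static class CountAggregate implements AggregateFunction<Long, Long, Long> {
        @Override public Long createAccumulator() { return 0L; }
        @Override public Long add(Long value, Long accumulator) { return accumulator + value; }
        @Override public Long getResult(Long accumulator) { return accumulator; }
        @Override public Long merge(Long a, Long b) { return a + b; }
    }
}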
I have a few questions about the Flink stream processing framework. Please share your thoughts on them.
Let's say I build a cluster with n nodes, of which m nodes are JobManagers (for HA). Are the remaining (n-m) nodes the TaskManagers?
Each node has n cores; how can we control how many cores a TaskManager/JobManager gets to use?
If we add a new node as a TaskManager, does the JobManager automatically assign tasks to the newly added TaskManager?
Does Flink have a concept of partitions and data skew?
If Flink connects to Pulsar and needs to read data from a partitioned topic, what is the parallelism here? (Is the parallelism equal to the number of partitions, or does it depend entirely on the number of task slots of the TaskManagers?)
Does Flink have any built-in optimization of the job graph? (For example, my job graph has many filter, map, flatMap, etc. operators.) Can you suggest any docs/materials on Flink job optimization?
Is there an option to dedicate one core to Prometheus metrics scraping?
Yes
Configuring the number of slots per TM: https://nightlies.apache.org/flink/flink-docs-stable/docs/concepts/flink-architecture/#task-slots-and-resources. However, each operator runs in its own thread and you have no control over which core it runs on, so you don't really have fine-grained control of how cores are used. Configuring resource groups also allows you to distribute operators across slots: https://nightlies.apache.org/flink/flink-docs-stable/docs/dev/datastream/operators/overview/#task-chaining-and-resource-groups
Not for currently running jobs; you'd need to re-scale them. New jobs will use it, though.
Yes. https://nightlies.apache.org/flink/flink-docs-stable/docs/dev/datastream/sources/
It will depend on the Flink source parallelism.
It automatically optimizes the graph as it sees fit. You have some control via rescaling and chaining/splitting operators: https://nightlies.apache.org/flink/flink-docs-stable/docs/dev/datastream/operators/overview/ (towards the end); see the sketch after these answers. As a rule of thumb, I would start by deploying a full job per slot and then, once you properly understand where the bottlenecks are, try to optimize the graph. Most of the time it is not worth it due to the increased serialization and shuffling of data.
You can export Prometheus metrics, but not have a core dedicated to it: https://nightlies.apache.org/flink/flink-docs-stable/docs/deployment/metric_reporters/#prometheus
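A small sketch of the chaining and resource-group hooks mentioned above (the operator bodies and the group name "heavy" are placeholders, not anything from your job):

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class ChainingExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<String> lines = env.fromElements("a", "b", "c");

        lines
            .filter(s -> !s.isEmpty())
            .map(String::toUpperCase)
                // Start a new chain here: the map gets its own thread instead of
                // being fused with the preceding filter.
                .startNewChain()
            .flatMap((String s, Collector<String> out) -> out.collect(s))
                .returns(String.class)
                // Put this operator (and, by default, its downstream operators)
                // into a dedicated slot-sharing group so they land in other slots.
                .slotSharingGroup("heavy")
            .print();

        env.execute("chaining-and-resource-groups");
    }
}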
We need to run 10 jobs on a Flink cluster. 4 of them are not CPU bound, so for them we can have 2 × vCPU task slots; however, the other 6 jobs are CPU bound and need heavy CPU, i.e. vCPU/2 slots on each task manager. My question is: how can I tell Flink to use x machines (task managers) for one job and y task managers for another? Do I need to have a separate cluster for the CPU-bound jobs, or is there any way to achieve this in a single cluster?
Currently you will need to have a separate cluster for this. FLIP-169: DataStream API for Fine-Grained Resource Requirements, coming in Flink 1.14, may better support this use case.
We have developed a Flink application on v1.13.0 and deployed it on Kubernetes, where each Task Manager instance runs in its own Kubernetes pod. I am not sure how to determine the ideal number of task slots for each Task Manager instance. Should we configure one task slot per Task Manager/pod, two slots per Task Manager/pod, or more? We currently configured two task slots per Task Manager instance and are wondering if that is the right choice/setting. What are the pros and cons of running one task slot versus two or more slots on a Task Manager/pod?
As a general rule, for containerized deployments (like yours), one slot per TM is a good default starting point. This tends to keep the configuration as straightforward as possible.
Depends on your expected workload, input, state size.
Is it a batch or a stream?
Batch: is the workload fast enough?
Stream: is the workload backpressuring?
For these throughput limitations, you might want to increase the number of TMs
State size: how are you processing your data? Does it require a lot of state?
For example, this query:
SELECT
  user_id,
  count(*)
FROM user_logins
GROUP BY user_id
will need state proportional to the number of users.
You can tune the TM memory in the configuration options.
Here is a useful link: https://www.ververica.com/blog/how-to-size-your-apache-flink-cluster-general-guidelines
Concurrent jobs: is this machine under-used, and do you need to keep a pool of unused task slots ready to execute a job?
A TM's memory will be sliced between its task slots (be sure it fits your state size), but the CPU is shared when slots are idle.
Other than that, if it runs fine with one TM per pod, then there is nothing more you need to do.
I'm prototyping a Flink streaming application on a bare-metal cluster of 15 machines. I'm using yarn-mode with 90 task slots (15x6).
The app reads data from a single Kafka topic. The Kafka topic has 15 partitions, so I set the parallelism of the source operator to 15 as well. However, I found that in some cases Flink assigns 2-4 instances of the consumer task to the same taskmanager. This causes certain nodes to become network-bound (the Kafka topic serves a high volume of data and the machines only have 1G NICs) and creates bottlenecks in the entire data flow.
Is there a way to "force" or otherwise instruct Flink to distribute a task evenly across all taskmanagers, perhaps round robin? And if not, is there a way to manually assign tasks to specific taskmanager slots?
To the best of my knowledge, this isn't possible. The job manager, which schedules tasks into task slots, is only aware of task slots. It isn't aware that some task slots belong to one task manager, and others to another task manager.
Flink does not allow you to manually assign tasks to specific task slots, because in case of failure handling it needs to be able to redistribute the tasks to the remaining task managers.
However, you can distribute the workload evenly by setting cluster.evenly-spread-out-slots: true in flink-conf.yaml.
This works for Flink >= 1.9.2.
To make it work, you may also have to set (see the example below):
taskmanager.numberOfTaskSlots equal to the number of available CPUs per machine, and
parallelism.default equal to the total number of CPUs in the cluster.
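For instance, with 16-core machines and 10 TaskManagers the relevant flink-conf.yaml entries would look roughly like this (the numbers are illustrative assumptions, not recommendations):

cluster.evenly-spread-out-slots: true
taskmanager.numberOfTaskSlots: 16   # CPUs available per machine (example value)
parallelism.default: 160            # total CPUs across the cluster (example value)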