How to auto scale up/down Flink Stateful Functions on K8s - apache-flink

My Current Flink Application
based on Flink Stateful Function 3.1.1, it reads message from Kafka, process the message and then sink to Kafka Egress
Application has been deployed on K8s following guide and is running well: Stateful Functions Deployment
Based on the standard deployment, I have turned on kubernetes HA
My Objectives
I want to auto scale up/down the stateful functions.
I also want to know how to create more standby job managers
My Observations about the HA
I tried to set kubernetes.jobmanager.replicas in the flink-config ConfigMap:
---
apiVersion: v1
kind: ConfigMap
metadata:
name: flink-config
labels:
app: shadow-fn
data:
flink-conf.yaml: |+
kubernetes.jobmanager.replicas: 7
high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
I see no standby job managers in K8s.
Then I directly adjust the replicas of deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
name: statefun-master
spec:
replicas: 7
Standby job managers show up. I check the pod log, the leader election is done successfully. However, when I access UI in the web browser, it says:
{"errors":["Service temporarily unavailable due to an ongoing leader election. Please refresh."]}
What's wrong with my approach?
My Questions about the scaling
Reactive Mode is exactly what I need. I tried but failed, job manager has error message:
Exception in thread "main" org.apache.flink.configuration.IllegalConfigurationException: Reactive mode is configured for an unsupported cluster type. At the moment, reactive mode is only supported by standalone application clusters (bin/standalone-job.sh).
It seems that stateful function auto scaling shouldn't be done in this way.
What's the correct way to do the auto scaling, then?
Potential Approach(Probably incorrect)
After some research, my current direction is:
Job Manger has nothing to do with auto scaling. It is related to HA on K8s. I just need to make sure Job Manager has correct failover behaviors
My stateful functions are Flink remote services, i.e., they are regular k8s services. they can be deployed in form of KNative service to achieve auto scaling. Replicas of services goes up only when http requests come from Flink's worker
The most important part, Flink's worker(or Task Manager) I have no idea how to do the auto scaling yet. Maybe I should use KNative to deploy the Flink worker?
If it doesn't work with KNative, maybe I should totally change the flink runtime deployment. E.g., to try the original reactive demo. But I'm afraid the Stateuful functions are not intended to work like that.
At the last
I have read the Flink documentation and Github samples over and over but cannot find any more information to do this. Any hint/instructions/guideline are appreciated!

Since Reactive Mode is a new, experimental feature, not all features supported by the default scheduler are also available with Reactive Mode (and its adaptive scheduler). The Flink community is working on addressing these limitations.
https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/elastic_scaling/

Related

Flink Session Cluster vs Job Cluster

Please can some one help me on this to under stand better
How to create Flink Session based cluster and How to create job based cluster? Do we have any specific configuration params?
How de we know cluster is session based or job based?
This is covered in the Flink documentation. For an overview, see https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/deployment/overview/, and for the details, see the pages for your specific environment, for example:
standalone
standalone kubernetes
native kubernetes
yarn
Note that job mode deployments have been deprecated in favor of application mode.

Kubernetes volumes: when is Statefulset necessary?

I'm approaching k8s volumes and best practices and I've noticed that when reading documentation it seems that you always need to use StatefulSet resource if you want to implement persistency in your cluster:
"StatefulSet is the workload API object used to manage stateful
applications."
I've implemented some tutorials, some of them use StatefulSet, some others don't.
In fact, let's say I want to persist some data, I can have my stateless Pods (even MySql server pods!) in which I use a PersistentVolumeClaim which persists the state. If I stop and rerun the cluster, I can resume the state from the Volume with no need of StatefulSet.
I attach here an example of Github repo in which there is a stateful app with MySql and no StatefulSet at all:
https://github.com/shri-kanth/kuberenetes-demo-manifests
So do I really need to use a StatefulSet resource for databases in k8s? Or are there some specific cases it could be a necessary practice?
PVCs are not the only reason to use Statefulsets over Deployments.
As the Kubernetes manual states:
StatefulSets are valuable for applications that require one or more of the following:
Stable, unique network identifiers.
Stable, persistent storage.
Ordered, graceful deployment and scaling.
Ordered, automated rolling updates.
You can read more about database considerations when deployed on Kubernetes here To run or not to run a database on Kubernetes
StatefulSet is not the same as PV+PVC.
A StatefulSet manages Pods that are based on an identical container spec. Unlike a Deployment, a StatefulSet maintains a sticky identity for each of their Pods. These pods are created from the same spec, but are not interchangeable: each has a persistent identifier that it maintains across any rescheduling.
In other words it manages the deployment and scaling of a set of Pods , and provides guarantees about the ordering and uniqueness of these Pods.
So do I really need to use a StatefulSet resource for databases in k8s?
It depends on what you would like to achieve.
StatefulSet gives you:
Possibility to have a Stable Network ID (so your pods will be always named as $(statefulset name)-$(ordinal) )
Possibility to have a Stable Storage, so when a Pod is (re)scheduled onto a node, its volumeMounts mount the PersistentVolumes associated with its PersistentVolume Claims.
...MySql and no StatefulSet...
As you can see, if your goal is just to run single RDBMS Pod (for example Mysql) that stores all its data (DB itself) on PV+PVC, then the StatefulSet is definitely an overkill.
However, if you need to run Redis cluster (distributed DB) :-D it'll be close to impossible to do that without a StatefulSet (to the best of my knowledge and based on numerous threads about the same on StackOverflow).
I hope that this info helps you.

Flink: Multi-data center deployment possible?

Does Flink support multi-data center deployments so that jobs can survive a single data-center outage (i.e. resume failed jobs in a different data-center) ?
My limited research on Google suggests the answer: there is no such support.
Thanks
While there is no explicit support for multi-datacenter failover in Flink, it can be (and has been) setup. A Flink Forward talk provides one example: engineers from Uber describe their experience with this and related topics in Practical Experience running Flink in Production. There are a number of components to their solution, but the key ideas are: (1) running extra YARN resource managers, configured with a longer inetaddr.ttl, and (2) checkpointing to a federation of two HDFS clusters.

Flink: How to configure Flink such that the Taskmanagers auto restart after a failure?

How to configure Flink such that the Taskmanagers auto restart after a failure ?
On yarn and kubernetes Flink has a native resource manager (YarnResourceManager and KubernetesResourceManager) that will arrange for the requested number of slots to be available. In other environments you'll need to use cluster-framework-specific mechanisms to take care of this yourself.
Note that for k8s, only session clusters are supported by this new, more active mode implemented by KubernetesResourceManager. Job clusters still need to be managed in the old fashioned way, as described in the docs.
And then there are managed Flink environments where these details are taken care of for you -- e.g., Ververica Platform or Kinesis Data Analytics.

Where is the JobManager on embedded Flink instances?

I am developing an application with multiple (micro)services.
I am using Flink (over Kafka) to stream messages between the services. Flink is embedded in the Java applications, each running in a separate docker container.
This is the first time I'm trying Flink and after reading the docs I still have a feeling I'm missing something basic.
Who is managing the jobs?
Where is the JobManager running?
How do I monitor the processing?
Thanks,
Moshe
I would recommend this talk by Stephan Ewen at Flink Forward 2016. It explains the current Apache Flink architecture (10:45) for different deployments as well as future goals.
In general, the JobManager is managing Flink jobs and TaskManagers execute your job consisting of multiple tasks. How the components are orchestrated depends on your deployment (local, Flink cluster, YARN, Mesos etc.).
The best tool for monitor your processing is the Flink Web UI at port 8081 by default, it offers different metrics for debugging and monitoring (e.g. monitoring checkpointing or back-pressure).

Resources