Flink: How to set System Properties in TaskManager? - apache-flink

I have some code that reads messages from Kafka, like below:
def main(args: Array[String]): Unit = {
  System.setProperty("java.security.auth.login.config", "someValue")
  val env = StreamExecutionEnvironment.getExecutionEnvironment
  val consumerProperties = new Properties()
  consumerProperties.setProperty("security.protocol", "SASL_PLAINTEXT")
  consumerProperties.setProperty("sasl.mechanism", "PLAIN")
  val kafkaConsumer = new FlinkKafkaConsumer011[ObjectNode](consumerProperties.getProperty("topic"), new JsonNodeDeserializationSchema, consumerProperties)
  val stream = env.addSource(kafkaConsumer)
}
When the source tries to read messages from Apache Kafka, the org.apache.kafka.common.security.JaasContext.defaultContext function loads the "java.security.auth.login.config" property.
But the property is only set on the JobManager, and once my job is running, the property cannot be loaded correctly on the TaskManagers, so the source fails.
I tried to set extra JVM_OPTS like "-Dxxx=yyy", but the Flink cluster is deployed in standalone mode, and the environment variables cannot be changed very often.
Is there any way to set the property on the TaskManagers?

The file bin/config.sh of a Flink standalone cluster holds a property named DEFAULT_ENV_JAVA_OPTS.
Also, if you export JVM_ARGS="your parameters", bin/config.sh will pick it up via these lines:
# Arguments for the JVM. Used for job and task manager JVMs.
# DO NOT USE FOR MEMORY SETTINGS! Use conf/flink-conf.yaml with keys
# KEY_JOBM_MEM_SIZE and KEY_TASKM_MEM_SIZE for that!
if [ -z "${JVM_ARGS}" ]; then
    JVM_ARGS=""
fi
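To get the -D flag from the question onto the TaskManager JVMs, here is a minimal sketch; the JAAS file path /path/to/jaas.conf is only a placeholder for your own value. Either export the variable before starting the cluster:

export JVM_ARGS="-Djava.security.auth.login.config=/path/to/jaas.conf"
bin/start-cluster.sh

or set the env.java.opts key in conf/flink-conf.yaml (recent Flink versions also offer env.java.opts.taskmanager for TaskManager-only options):

env.java.opts: -Djava.security.auth.login.config=/path/to/jaas.conf

Either way the option is only applied when the JobManager/TaskManager processes start, so the standalone cluster has to be restarted for it to take effect.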

Related

Kinesis Data Analytics (Flink) - how do I configure environment variables?

We are currently running a Flink cluster in standalone mode on Kubernetes. We want to explore whether we could migrate over to managed Flink on AWS (Kinesis Data Analytics, KDA).
But I can't find any documentation or indication that it is possible to inject environment variables. Do these need to be provided as runtime arguments?
Relatedly, is it possible to override the default Flink configuration that we currently specify in our flink-conf.yml in managed Flink?
Thanks in advance!
I'll answer my own question: it seems there is no way to provide environment variables in the same fashion you would with ConfigMaps in Kubernetes. Instead, we need to use the Runtime properties that can be defined in KDA. These can then be retrieved using KinesisAnalyticsRuntime.getApplicationProperties().
As an example:
val params: ParameterTool = ParameterTool.fromArgs(args)
val config = params.get("env", "") match {
  case "local" => AppConfiguration.initialize(sys.env)
  case _ => // KDA
    val kdaProperties = KinesisAnalyticsRuntime.getApplicationProperties()
    logger.error(
      s"kdaProperties $kdaProperties",
      Some(Map("kda" -> kdaProperties))
    )
    Option(kdaProperties.get("DevProperties")) match {
      case Some(kdaProperties) =>
        val kdaPropsToMap = kdaProperties.asScala.toMap
        AppConfiguration.initialize(kdaPropsToMap)
      case None =>
        logger.error(s"could not read KDA runtime properties", Some(Map("kda" -> kdaProperties)))
        throw new Error(
          "unable to read KDA runtime properties"
        ) // scalafix:ok
    }
}
Here the grouping key defined in KDA for the Runtime properties ("DevProperties" in this example) is what is used to fetch them.
This also means that whatever we would normally put in flink-conf.yml would have to be supplied as Runtime properties and then applied at runtime (that is my understanding).
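As a minimal sketch of that last point (the group name "FlinkConfOverrides" and the key "buffer.timeout.ms" are hypothetical names for illustration, not something KDA defines):

import com.amazonaws.services.kinesisanalytics.runtime.KinesisAnalyticsRuntime
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment

val env = StreamExecutionEnvironment.getExecutionEnvironment
// Read a setting that would normally live in flink-conf.yaml from a KDA
// Runtime property group and apply it programmatically at startup.
Option(KinesisAnalyticsRuntime.getApplicationProperties().get("FlinkConfOverrides"))
  .flatMap(group => Option(group.getProperty("buffer.timeout.ms")))
  .map(_.toLong)
  .foreach(env.setBufferTimeout)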

Is there a way to limit the members of a Camel cluster (using KubernetesClusterService) with a pod selector?

I am programmatically creating my CamelContext and successfully creating the ClusterServiceProvider using the KubernetesClusterService implementation. When running in my kubernetes cluster, it is electing a leader and responding appropriately to "master:" routes. All good.
However, I would like to limit the Pod/Deployments that are detected in the cluster member negotiation/inspection. It currently has knowledge of (finds) every Pod in the cluster namespace which includes completely unrelated deployments/instances.
The overall question is how to select what Pods/Deployments should be included in the particular camel cluster?
I see that KubernetesLockConfiguration has an attribute called clusterLabels, but it is unclear to me how, or for what, it is used. When I do set clusterLabels to something syntactically common in Kubernetes (e.g. app -> my-app), the cluster finds no members.
I mention that I am doing this programmatically as there is no spring-boot or other commonly documented configuration of camel involved. Running in Play-Framework Scala.
First, I create a CamelContext
val context = new DefaultCamelContext()
val rb = new RouteBuilder() {
  def configure() = {
    val policy = ClusteredRoutePolicy.forNamespace("default")
    from(s"master:ip:timer://master-timer?fixedRate=true&period=60000")
      .routePolicy(policy)
      .bean(classOf[MasterTimer], "execute()")
      .log("Current Leader ${routeId}")
  }
}
Second, I create a ClusterServiceProvider
import org.apache.camel.component.kubernetes.cluster.KubernetesClusterService

val crc = new ClusteredRouteController()
val service = new KubernetesClusterService
service.setCamelContext(context)
service.setKubernetesNamespace("default")
//
// If I set clusterLabels here, no camel cluster is realized/created.
// My assumption is that my syntax for camel-kubernetes is wrong, however
// it is unclear from the documentation how to make this work.
//
// If I do not set clusterLabels, every pod in my kubernetes cluster becomes
// a cluster member (CamelKubernetesLeaderNotifier logs that the list of
// cluster members has changed). So I get completely unrelated deployments
// in the context of something that I want to be specifically related,
// namely all pods with an "app" label of "my-app".
//
service.setClusterLabels(Map("app" -> "my-app").asJava)
crc.setNamespace("default")
crc.setClusterService(service)
context.addRoutes(rb) // register the RouteBuilder from above
context.addService(service)
context.setRouteController(crc)
context.start()
context.getRouteController().startAllRoutes()

Is there a method to set up Flink memory dynamically?

When using Apache Flink we can configure values in flink-conf.yaml. But we can also assign some values dynamically with CLI commands when starting or submitting a job or a task in Flink.
e.g. bin/taskmanager.sh start-foreground -Dtaskmanager.numberOfTaskSlots=12
But some values like jobmanager.memory.process.size and taskmanager.memory.process.size cannot be set dynamically using "-D".
Is there a way to set those values dynamically when starting the jobmanager and taskmanager from the CLI?
Maybe this might help you:
// One can specify the properties defined in conf/flink-conf.yaml in this properties file
ParameterTool parameters = ParameterTool.fromPropertiesFile("src/main/resources/application.properties");
Configuration config = Configuration.fromMap(parameters.toMap());
TaskExecutorResourceUtils.adjustForLocalExecution(config);

StreamExecutionEnvironment env =
        StreamExecutionEnvironment.getExecutionEnvironment(config);
env.setParallelism(3);

System.out.println("Config Params : " + config.toMap());
Note that the parallelism should be set equal to the number of task slots per task manager, as per this link. By default, the number of task slots for a task manager is one.
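If it is specifically the memory sizes from the question that you want to set in code, here is a minimal sketch in the same local-execution spirit as the snippet above, assuming a recent Flink version (1.12+) where getExecutionEnvironment(Configuration) is available. Whether these programmatic values are honoured depends on how the job is executed; on a long-running standalone cluster the process sizes are fixed when the JVMs start.

import org.apache.flink.configuration.{Configuration, JobManagerOptions, MemorySize, TaskManagerOptions}
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment

val config = new Configuration()
// Same keys as taskmanager.memory.process.size / jobmanager.memory.process.size in flink-conf.yaml
config.set(TaskManagerOptions.TOTAL_PROCESS_MEMORY, MemorySize.parse("1728m"))
config.set(JobManagerOptions.TOTAL_PROCESS_MEMORY, MemorySize.parse("1600m"))
val env = StreamExecutionEnvironment.getExecutionEnvironment(config)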

Flink Table-API and DataStream ProcessFunction

I want to join a big table, impossible to fit in TM memory, with a stream (Kafka). In my tests I successfully joined both, mixing the Table API with the DataStream API. I did the following:
val stream: DataStream[MyEvent] = env.addSource(...)
stream
  .timeWindowAll(...)
  .trigger(...)
  .process(new ProcessAllWindowFunction[MyEvent, MyEvent, TimeWindow] {
    var tableEnv: StreamTableEnvironment = _

    override def open(parameters: Configuration): Unit = {
      // init table env
    }

    override def process(context: Context, elements: Iterable[MyEvent], out: Collector[MyEvent]): Unit = {
      val table = tableEnv.sqlQuery(...)
      elements.map(e => {
        // do process
        out.collect(...)
      })
    }
  })
It is working, but I have never seen this type of implementation anywhere. Is it OK? What would be the drawbacks?
One should not use StreamExecutionEnvironment or TableEnvironment within a Flink function. An environment is used to construct a pipeline that is submitted to the cluster.
Your example submits a job to the cluster from within a job that is already running on the cluster.
This might work for certain use cases but is generally discouraged. Imagine your outer stream contains thousands of events and your function creates a job for every event; it could potentially DDoS your cluster.
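A minimal, self-contained sketch of the recommended structure, assuming a recent Flink version (1.13+) with the Scala bridge; the case classes, table names, and in-memory data are only illustrative stand-ins for the Kafka stream and the big table. The point is that the environment and the query are created once in main(), so the join is planned as part of the single pipeline that gets submitted to the cluster:

import org.apache.flink.streaming.api.scala._
import org.apache.flink.table.api.bridge.scala._

case class Event(id: Int, action: String)
case class UserDim(id: Int, name: String)

object JoinInMain {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val tableEnv = StreamTableEnvironment.create(env) // created once, in main()

    // Stand-ins for the Kafka stream and the big table
    val events = env.fromElements(Event(1, "click"), Event(2, "view"))
    val dim = env.fromElements(UserDim(1, "user-a"), UserDim(2, "user-b"))

    tableEnv.createTemporaryView("events", events)
    tableEnv.createTemporaryView("user_dim", dim)

    // The join is expressed once and becomes part of the job graph
    val joined = tableEnv.sqlQuery(
      "SELECT e.id, e.action, d.name FROM events e JOIN user_dim d ON e.id = d.id")

    tableEnv.toDataStream(joined).print()
    env.execute("join-in-main")
  }
}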

Use database properties from an external property file in Grails 3

I want to use an external property file for the database configuration in the production environment. I have tried some solutions from blogs and Stack Overflow, but they only work for the development environment.
grailsVersion=3.3.2
First, create a properties file in src/main/resources (if the resources dir does not exist, create it).
Then remove the configuration from application.yml (if you don't, it will override the external file).
In Application.groovy, load the file with this:
// Application must implement EnvironmentAware so that Spring calls setEnvironment()
// and the Environment is available here.
@Override
void setEnvironment(Environment environment) {
    def url = getClass().classLoader.getResource("myconfig.properties")
    def confFile = new File(url.toURI())
    Properties properties = new Properties()
    confFile.withInputStream {
        properties.load(it)
    }
    environment.propertySources.addFirst(new PropertiesPropertySource("local.config.location", properties))
}
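For example, the external myconfig.properties might hold the standard Grails dataSource keys (the values below are placeholders for your production settings):

dataSource.url=jdbc:mysql://prod-db:3306/myapp
dataSource.username=myapp
dataSource.password=changeit
dataSource.driverClassName=com.mysql.cj.jdbc.Driver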
