Flink with Log4j2 - apache-flink

We are running a Flink job on v1.13.2 and have configured logging with Log4j (classic, pre-Log4j2). We want to upgrade to Log4j2 instead but could not find a way to do that. Has any team gone down this path to upgrade Log4j? Thanks.

Log4j2 has been the default logger since Flink 1.11. If you are still using Log4j 1, there must be some configuration in place that needs to be removed or updated. See the documentation for details.

Although Flink's log configuration file is named log4j.properties, it actually uses Log4j2, as David said.
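Since the file name does not tell you which syntax is in use, one quick check is to look at the property keys. Log4j 1 keys start with the prefix log4j. (for example log4j.rootLogger), while Log4j2 properties files use keys such as rootLogger.level and appender.console.type. The sketch below is only a heuristic under that assumption; the class and method names are hypothetical and are not part of Flink or Log4j.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class Log4jSyntaxCheck {

    /** Heuristic: true if any key uses the Log4j 1 style "log4j." prefix. */
    public static boolean looksLikeLog4j1(Properties props) {
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith("log4j.")) {
                return true;
            }
        }
        return false;
    }

    /** Parses properties-file text into a Properties object. */
    public static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // Log4j 1 style keys (the old syntax).
        String v1 = "log4j.rootLogger=INFO, console\n"
                  + "log4j.appender.console=org.apache.log4j.ConsoleAppender\n";
        // Log4j2 style keys (the syntax Flink expects since 1.11).
        String v2 = "rootLogger.level = INFO\n"
                  + "rootLogger.appenderRef.console.ref = ConsoleAppender\n"
                  + "appender.console.name = ConsoleAppender\n"
                  + "appender.console.type = CONSOLE\n";
        System.out.println("v1 looks like Log4j 1: " + looksLikeLog4j1(parse(v1)));
        System.out.println("v2 looks like Log4j 1: " + looksLikeLog4j1(parse(v2)));
    }
}
```

If the check reports Log4j 1 style keys in the file your job actually loads, that file is a likely candidate for the configuration that needs updating.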

Related

Use OpenTelemetry with Apache Flink

I have been trying to use OpenTelemetry (https://opentelemetry.io/) in an Apache Flink job. I am sending the traces to a Kafka topic so that I can view them in Jaeger.
Tracing works when I run the job inside my IntelliJ IDE, but once I build the package and run it on the cluster, I cannot get it to work.
Is there any blocker in that sense for Apache Flink that I am not aware of?
I got this working by setting an environment variable:
export FLINK_ENV_JAVA_OPTS=-javaagent:./lib/opentelemetry-javaagent-all.jar
But this only works when I set up the Flink cluster myself. The problem is that the cluster I am using runs inside AWS (Kinesis Analytics), where I am not able to set this variable.
Is there a way to use OpenTelemetry with Flink?

Flink checkpointing failing in Kubernetes with FsStateBackend

I am getting the error below while running Flink in Kubernetes with a per-job state backend of FsStateBackend, like so: env.setStateBackend(new FsStateBackend("file:///data/flink/checkpoints"))
I am setting it in the code itself.
Error:
Mkdirs failed to create file:/data/flink/checkpoints/3321ab76ccf319397f5b52be25f6cd8d
Can someone suggest a resolution for this?
Thanks in advance. Cheers!!
In addition to what @chuckskull pointed out, also make sure that this file URI is accessible to every pod in your cluster. All of the task managers and the job manager have to be able to read and write the checkpoint files using this URI.
Here are a couple of things you can check:
Make sure that /data/flink/checkpoints exists.
Make sure that the user running the flink job has read/write access to this directory.
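The two checks above can be sketched as a small standalone Java program. This is a minimal sketch, not Flink code: the class name CheckpointDirCheck and the method isUsable are hypothetical, and it only tests the machine (or pod) it runs on, so it would need to be run in every pod and as the same user as the Flink process to be meaningful.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class CheckpointDirCheck {

    /**
     * Returns true if the directory behind a file:// URI exists (or can be
     * created) and is readable and writable by the current user.
     */
    public static boolean isUsable(String fileUri) {
        Path dir = Paths.get(URI.create(fileUri));
        try {
            // Like mkdirs: creates missing parents, no-op if it already exists.
            Files.createDirectories(dir);
        } catch (IOException e) {
            // Covers permission problems and a regular file blocking the path.
            return false;
        }
        return Files.isDirectory(dir) && Files.isReadable(dir) && Files.isWritable(dir);
    }

    public static void main(String[] args) {
        // Default taken from the question; pass another URI as the first argument.
        String uri = args.length > 0 ? args[0] : "file:///data/flink/checkpoints";
        System.out.println(uri + " usable: " + isUsable(uri));
    }
}
```

If this prints false in any pod, the "Mkdirs failed to create" error is expected there, and the fix is on the volume mount or permission side rather than in the job code.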

Using Flink LocalEnvironment for Production

I wanted to understand the limitations of LocalExecutionEnvironment and whether it can be used in production.
Appreciate any help/insight. Thanks
LocalExecutionEnvironment spins up a Flink MiniCluster, which runs the entire Flink system (JobManager, TaskManager) in a single JVM. So you're limited to CPU cores and memory available on that one machine. You also don't have HA from multiple JobManagers. I haven't looked at other limitations of the MiniCluster environment, but I'm sure more exist.
A LocalExecutionEnvironment doesn't load a config file on startup, so you have to do all of the configuration in the application. By default it also doesn't offer a REST endpoint. You can solve both these issues by doing something like this:
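// Load the Flink configuration file from the current working directory.
String cwd = Paths.get(".").toAbsolutePath().normalize().toString();
Configuration conf = GlobalConfiguration.loadConfiguration(cwd);
// Start the local environment with the Web UI (and REST endpoint) enabled.
StreamExecutionEnvironment env = StreamExecutionEnvironment.createLocalEnvironmentWithWebUI(conf);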
Logging may be another issue that will require a workaround.
I don't believe you'll be able to use the Flink CLI to control the job, but if you create the Web UI (as shown above) you can at least use the REST API to do things like triggering savepoints (after first using the REST API to get the job ID).
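As a rough illustration of the REST route, the sketch below uses the JDK 11+ HttpClient to list jobs (GET /jobs) and to trigger a savepoint (POST /jobs/:jobid/savepoints). The class name SavepointViaRest, the base URL http://localhost:8081, and the command-line argument layout are assumptions for illustration; adjust them to your setup. The job ID has to be copied from the /jobs response by hand here, since the sketch does not parse JSON.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class SavepointViaRest {

    /** Builds the request that lists all jobs and their IDs. */
    public static HttpRequest listJobs(String baseUrl) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/jobs")).GET().build();
    }

    /** Builds the request that triggers a savepoint for one job. */
    public static HttpRequest triggerSavepoint(String baseUrl, String jobId, String targetDir) {
        // Naive JSON construction: assumes targetDir needs no escaping.
        String body = "{\"target-directory\": \"" + targetDir + "\", \"cancel-job\": false}";
        return HttpRequest.newBuilder(URI.create(baseUrl + "/jobs/" + jobId + "/savepoints"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Assumed argument layout: [baseUrl] [jobId targetDir]
        String baseUrl = args.length > 0 ? args[0] : "http://localhost:8081";
        HttpClient client = HttpClient.newHttpClient();

        // Step 1: fetch the job list so the job ID can be read from the response.
        HttpResponse<String> jobs =
                client.send(listJobs(baseUrl), HttpResponse.BodyHandlers.ofString());
        System.out.println(jobs.body());

        // Step 2: trigger the savepoint once a job ID and target directory are given.
        if (args.length >= 3) {
            HttpResponse<String> trigger = client.send(
                    triggerSavepoint(baseUrl, args[1], args[2]),
                    HttpResponse.BodyHandlers.ofString());
            System.out.println(trigger.body());
        }
    }
}
```

The savepoint call is asynchronous: the response to the POST only identifies the trigger request, so you would poll the REST API afterwards to see whether the savepoint completed.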

Passing custom parameters to docker when running Flink on Mesos/Marathon

My team is trying to set up an Apache Flink (v1.4) cluster on Mesos/Marathon. We are using the Docker image provided by Mesosphere. It works really well!
Because of a new requirement, the task managers have to be launched with extended runtime privileges. We can easily enable these runtime privileges for the app manager via the Marathon web UI. However, we cannot find a way to enable the privileges for task managers.
In Apache Spark, we can set spark.mesos.executor.docker.parameters privileged=true in Spark's configuration file, so Spark can pass this parameter to the docker run command. I am wondering if Apache Flink allows us to pass a custom parameter to docker run when launching task managers. If not, how can we start task managers with extended runtime privileges?
Thanks
There is a new parameter mesos.resourcemanager.tasks.container.docker.parameters introduced in this commit which will allow passing arbitrary parameters to Docker.
Unfortunately, this is not possible as of right now (or only for the framework scheduler as Tobi pointed out).
I went ahead and created a Jira for this feature so you can keep track/add details/contribute it yourself: https://issues.apache.org/jira/browse/FLINK-8490
You should be able to tweak the setting for the parameters in the ContainerInfo of https://github.com/mesoshq/flink-framework/blob/master/index.js to support this. I’ll eventually update the Flink version in the Docker image...

Is there any way to index Kafka outputs in Apache Solr?

I'm new to Apache Solr and I want to index data from Kafka into Solr. Can anyone give a simple example of doing this?
The easiest way to get started on this would probably be to use Kafka Connect.
Connect is part of the Apache Kafka package, so it should already be installed on your Kafka node(s). Please refer to the quickstart for a brief introduction on how to run Connect.
For writing data to Solr there are two connectors that you could try:
https://github.com/jcustenborder/kafka-connect-solr
https://github.com/MSurendra/kafka-connect-solr
While I don't have any experience with either of them, I'd probably try Jeremy's first based on latest commit and the fact that he works for Confluent.
