Recently, I ran into a logging problem with Flink in standalone cluster mode when using logback.xml for the logging configuration. My requirement is that all my jobs log to one particular folder, the Flink framework logs go to a separate folder, and each job running in my Flink cluster gets its own separate folder. I tested this on my local cluster, where it works fine and the logs for each submitted Flink job land in their own folder, but as soon as I deploy my code to the standalone cluster along with the respective logback.xml for each job, nothing is logged at all. I also referred to the following link for my query, but I am still stuck with the problem.
Flink logging limitation: How to pass logging configuration to a flink job
Could you please specify where your log file resides?
According to the Flink docs, the logging configuration should either be specified explicitly by setting the JVM system property -Dlogback.configurationFile=<file> or be picked up from a logback.xml on the classpath. I usually override the one in the flink/conf directory.
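To see which of the two options is actually in effect on a given node, a small diagnostic along these lines could be run in the same JVM setup as the job. This is only a sketch: the class name LogbackConfigCheck and its describe method are hypothetical helpers, not part of Flink or Logback, and the check is simplified to the two locations mentioned above (Logback itself also looks for other files, such as logback-test.xml).

```java
import java.net.URL;

// Hypothetical diagnostic helper; not part of Flink or Logback.
final class LogbackConfigCheck {

    /**
     * Reports which of the two locations named above would supply the
     * configuration: the system property first, then the classpath.
     */
    static String describe(String systemProperty, URL classpathResource) {
        if (systemProperty != null && !systemProperty.isEmpty()) {
            return "system property: " + systemProperty;
        }
        if (classpathResource != null) {
            return "classpath: " + classpathResource;
        }
        return "none found";
    }

    public static void main(String[] args) {
        // Value of -Dlogback.configurationFile=<file>, or null if not set
        String prop = System.getProperty("logback.configurationFile");
        // First logback.xml visible to this class loader, or null if absent
        URL onClasspath = LogbackConfigCheck.class.getClassLoader().getResource("logback.xml");
        System.out.println(describe(prop, onClasspath));
    }
}
```

If this prints "none found" on the standalone cluster but not locally, the per-job logback.xml is probably not reaching the JVM that runs the tasks.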
Related
After upgrading Flink, the TaskManager logs can no longer be seen in the web UI. In stdout, I can see the log output of my own code, but not the logs of Spring or of Flink itself.
What version have you upgraded to, and how is Flink running (i.e., Yarn, Kubernetes, standalone, etc)?
With some versions of Flink in certain environments, the logs aren't available in the web UI because they are being aggregated elsewhere. For example, you will want to use something like kubectl logs to access the logs if you are running on Kubernetes with certain versions of Flink.
UPDATE
Flink 1.11 switched from log4j1 to log4j2. See the release notes for details. Also, the logging properties file log4j-yarn-session.properties was renamed to log4j-session.properties and yarn-session.sh was updated to use the new file. Again, see the release notes for more info.
I wanted to understand the limitations of LocalExecutionEnvironment and whether it can be used to run in production.
Appreciate any help/insight. Thanks
LocalExecutionEnvironment spins up a Flink MiniCluster, which runs the entire Flink system (JobManager, TaskManager) in a single JVM. So you're limited to CPU cores and memory available on that one machine. You also don't have HA from multiple JobManagers. I haven't looked at other limitations of the MiniCluster environment, but I'm sure more exist.
A LocalExecutionEnvironment doesn't load a config file on startup, so you have to do all of the configuration in the application. By default it also doesn't offer a REST endpoint. You can solve both these issues by doing something like this:
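// Load the Flink configuration file from the current working directory
String cwd = Paths.get(".").toAbsolutePath().normalize().toString();
Configuration conf = GlobalConfiguration.loadConfiguration(cwd);
// Create the local environment with the Web UI and REST endpoint enabled
StreamExecutionEnvironment env = StreamExecutionEnvironment.createLocalEnvironmentWithWebUI(conf);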
Logging may be another issue that will require a workaround.
I don't believe you'll be able to use the Flink CLI to control the job, but if you create the Web UI (as shown above) you can at least use the REST API to do things like triggering savepoints (after first using the REST API to get the job ID).
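As a rough sketch of that REST workflow, the following uses the JDK's built-in HTTP client (Java 11+) against Flink's /jobs and /jobs/:jobid/savepoints endpoints. The class name SavepointTrigger, the base URL http://localhost:8081, and the command-line arguments are assumptions for illustration; adjust them to however the Web UI is configured in your setup. The JSON body is built by plain string concatenation with no escaping, so it only suits simple directory paths.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper; the base URL and arguments are assumptions.
final class SavepointTrigger {

    /** GET /jobs returns the IDs and statuses of the jobs on the cluster. */
    static HttpRequest listJobs(String baseUrl) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/jobs")).GET().build();
    }

    /** JSON body for the savepoint request (no escaping is applied). */
    static String savepointBody(String targetDirectory) {
        return "{\"target-directory\": \"" + targetDirectory + "\", \"cancel-job\": false}";
    }

    /** POST /jobs/:jobid/savepoints asks Flink to start a savepoint. */
    static HttpRequest triggerSavepoint(String baseUrl, String jobId, String targetDirectory) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/jobs/" + jobId + "/savepoints"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(savepointBody(targetDirectory)))
                .build();
    }

    public static void main(String[] args) throws Exception {
        String baseUrl = "http://localhost:8081"; // assumed REST address
        HttpClient client = HttpClient.newHttpClient();

        // Step 1: list the jobs and read the job ID from the response
        HttpResponse<String> jobs = client.send(listJobs(baseUrl), HttpResponse.BodyHandlers.ofString());
        System.out.println(jobs.body());

        // Step 2: with a job ID and target directory given as arguments, trigger the savepoint
        if (args.length == 2) {
            HttpResponse<String> resp = client.send(
                    triggerSavepoint(baseUrl, args[0], args[1]),
                    HttpResponse.BodyHandlers.ofString());
            System.out.println(resp.body());
        }
    }
}
```

Run it once without arguments to list the jobs, then again with the job ID and a savepoint directory to trigger the savepoint.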
We have a few Flink jobs that run on YARN. We would like to upload the Flink job logs to ELK to simplify debugging and analysis. Currently the Flink task managers write their logs to /mnt/flinklogs/$application_id/$container_id. We want them to write to a directory without the nested $application_id/$container_id structure.
I tried setting env.log.dir: /mnt/flink. With this setting, the configuration is not passed through correctly; the log file option still points at the nested path:
-Dlog.file=/mnt/flinklogs/application_1560449379756_1312/\
container_e02_1560449379756_1312_01_000619/taskmanager.log
I think the best approach is to use YARN log aggregation to write the logs to disk and Elastic Filebeat to ship them to Elasticsearch.
I have a Flink job that expects another dependency jar to be available at run time. Is it possible to submit a job with multiple jars to a Flink cluster? I cannot build one common jar.
You could install the dependency jar on every flink node in the flink/lib folder as described here. That would make it available to user uploaded jars without the need for your users to bundle it with their jars.
Keep in mind that this would be handled entirely outside of the job submission API, so you would have to incorporate it into whatever system you're using to manage configuration on your flink nodes.
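Because the jar is installed outside of job submission, it can help for the job to fail fast with a clear message when a node is missing it. A minimal sketch, assuming a hypothetical helper class DependencyCheck and a placeholder class name standing in for some class from your dependency jar:

```java
// Hypothetical helper; not part of Flink.
final class DependencyCheck {

    /** Returns true if the named class can be located by this class's class loader. */
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, DependencyCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Placeholder name; replace with a class from the jar placed in flink/lib
        String required = args.length > 0 ? args[0] : "com.example.SomeDependencyClass";
        if (!isOnClasspath(required)) {
            throw new IllegalStateException("Missing dependency on this node: " + required);
        }
        System.out.println("Found " + required);
    }
}
```

Calling isOnClasspath at the start of the job's main method would surface a node where the jar was not installed before any tasks start failing with less obvious errors.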
Can anybody explain to me why the "Configuration" section of a running job in the Apache Flink Dashboard is empty?
How can I use this job configuration in my flow? It seems this is not described in the documentation.
The configuration tab of a running job shows the values of the ExecutionConfig. Depending on the version of Flink, you might see different behaviour.
Flink <= 1.0
The ExecutionConfig is only accessible for finished jobs. For running jobs, it is not possible to access it. Once the job has finished or has been stopped/cancelled, you should be able to see the ExecutionConfig.
Flink > 1.0
The ExecutionConfig can also be accessed for running jobs.