Why "Configuration" section of running job is empty? - apache-flink

Can anybody explain to me why the "Configuration" section of a running job in the Apache Flink Dashboard is empty?
How can I use this job configuration in my flow? It does not seem to be described in the documentation.

The configuration tab of a running job shows the values of the ExecutionConfig. Depending on the version of Flink, you will see different behaviour.
Flink <= 1.0
The ExecutionConfig is only accessible for finished jobs, not for running ones. Once the job has finished or has been stopped/cancelled, you should be able to see it.
Flink > 1.0
The ExecutionConfig can also be accessed for running jobs.
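For example, anything you register as global job parameters on the ExecutionConfig is shown as user configuration on that tab. A minimal sketch (the parameter names and the job itself are illustrative):
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ConfigTabExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Values registered as global job parameters appear as user configuration
        // on the job's Configuration tab, next to the ExecutionConfig values.
        ParameterTool params = ParameterTool.fromArgs(args); // e.g. --input /tmp/in
        env.getConfig().setGlobalJobParameters(params);
        env.fromElements(1, 2, 3).print();
        env.execute("config-tab-example");
    }
}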

Related

DebeziumIO read with SQL Server not streaming with Apache beam in GCP

I configured standalone Debezium and tested that streaming works. After that I created a pipeline as follows:
pipeline.apply("Read from DebeziumIO",
DebeziumIO.<String>read()
.withConnectorConfiguration(
DebeziumIO.ConnectorConfiguration.create()
.withUsername("user")
.withPassword("password")
.withHostName("hostname")
.withPort("1433")
.withConnectorClass(SqlServerConnector.class)
.withConnectionProperty("database.server.name", "customer")
.withConnectionProperty("database.dbname", "test001")
.withConnectionProperty("database.include.list", "test002")
.withConnectionProperty("include.schema.changes", "true")
.withConnectionProperty("database.history.kafka.bootstrap.servers", "kafka:9092")
.withConnectionProperty("database.history.kafka.topic", "schema-changes.inventory")
.withConnectionProperty("connect.keep.alive", "false")
.withConnectionProperty("connect.keep.alive.interval.ms", "200")
).withFormatFunction(new SourceRecordJson.SourceRecordJsonMapper()).withCoder(StringUtf8Coder.of())
)
When I start the pipeline using the DirectRunner, the data stream is not captured by the pipeline. In my pipeline code I just added a step to dump the data to the console for the time being.
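For reference, that console dump is nothing more than something along these lines, chained onto the read above (simplified; the DoFn below is illustrative):
.apply("Dump to console", ParDo.of(new DoFn<String, Void>() {
    @ProcessElement
    public void processElement(@Element String json) {
        // Just print each captured change record while debugging.
        System.out.println(json);
    }
}));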
Also, from the logs I observe that Debezium is being started and stopped frequently. Is that by design?
Also, when a change is made in the DB (INSERT/DELETE/UPDATE), I don't see it reflected in the logs.
So my questions are:
Is the configuration I provided sufficient?
Why is the pipeline not being triggered when there is a change?
What additional steps do I need to perform to get it working?
Can restarting Debezium multiple times cause a performance impact, since it creates a JDBC connection each time?

Is it possible to add new embedded worker while cluster is running on statefun?

Here is the deal:
I'm trying to add a new (embedded) worker to a running cluster (Flink Stateful Functions 2.2.1).
As you can see, the new task manager gets registered with the cluster:
Screenshot of the newly deployed task manager
But it doesn't initialize (it doesn't deploy its sources):
What am I missing here? (Do the master and the workers have to have the same jar files, or should deploying the task manager with the jar file be enough?)
Any help would be appreciated,
Thx.
Flink supports two different approaches to rescaling: active and reactive.
Reactive mode is new in Flink 1.13 (released just this week), and works as you expected: add (or remove) a task manager, and your application will adjust to the new parallelism. You can read about elastic scaling and reactive mode in the docs.
Reactive mode is currently a work in progress, but it might meet your needs.
In broad strokes, for active mode rescaling you need to:
Do a stop with savepoint to bring down your current job while taking a snapshot of its state.
Relaunch with the new parallelism, using the savepoint as the starting point.
The exact details depend on how your cluster is deployed.
For a step-by-step tutorial, see Upgrading & Rescaling a Job in the Flink Operations Playground.
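For a standalone/session cluster, that flow boils down to something like the following with the Flink CLI (the job id, savepoint path, parallelism, and jar name here are just placeholders):
./bin/flink stop --savepointPath /tmp/flink-savepoints <jobId>
# the command prints the path of the completed savepoint, e.g. /tmp/flink-savepoints/savepoint-xxxx
./bin/flink run -s /tmp/flink-savepoints/savepoint-xxxx -p 8 my-statefun-job.jar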
The above applies to rescaling statefun embedded functions. Being stateless, remote functions can be rescaled more straightforwardly.

After the Flink version is upgraded, the taskmanager log information cannot be seen in the web UI

After upgrading the Flink version, the task manager log information can no longer be seen in the web UI. In stdout I can see the logs from my own code, but not the logs from Spring or from Flink itself.
What version have you upgraded to, and how is Flink running (i.e., Yarn, Kubernetes, standalone, etc)?
With some versions of Flink in certain environments, the logs aren't available in the web UI because they are being aggregated elsewhere. For example, if you are running on Kubernetes with certain versions of Flink, you will want to use something like kubectl logs to access them.
UPDATE
Flink 1.11 switched from log4j1 to log4j2. See the release notes for details. Also, the logging properties file log4j-yarn-session.properties was renamed to log4j-session.properties and yarn-session.sh was updated to use the new file. Again, see the release notes for more info.

Flink Logging not working in cluster mode

Recently, I encountered a problem with Flink logging in standalone cluster mode when using logback.xml for logging. My requirement is that all my jobs log to a particular folder, that the Flink framework logs go to a separate folder, and that each job running in my Flink cluster gets its own folder. I tested this on my local cluster and it works fine: I get separate log folders for each Flink job I submit. But as soon as I deploy my code to the standalone cluster, along with the respective logback.xml for each job, nothing gets logged at all. I also referred to the following link for my query, but I am still stuck with the problem.
Flink logging limitation: How to pass logging configuration to a flink job
Could you please specify where your log file resides?
According to the Flink docs, the logback configuration should either be specified explicitly by setting the system property -Dlogback.configurationFile=<file> or by putting logback.xml on the classpath; usually, I override the one in the flink/conf directory.
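For what it's worth, a minimal logback.xml along these lines (the file path is just an example) is a good way to verify that your configuration is actually being picked up on the cluster:
<configuration>
  <appender name="FILE" class="ch.qos.logback.core.FileAppender">
    <!-- Point this at the per-job folder you want; the path below is illustrative. -->
    <file>/var/log/flink-jobs/my-job.log</file>
    <encoder>
      <pattern>%d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %-5level %logger{60} - %msg%n</pattern>
    </encoder>
  </appender>
  <root level="INFO">
    <appender-ref ref="FILE"/>
  </root>
</configuration>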

Can't set parallelism using Flink's CLI or Web-UI when using Apache Beam

I am using Flink 1.2.1 running on Docker, with Task Managers distributed across different VMs as part of a Docker Swarm.
Uploading an Apache Beam application using the Flink Web UI and trying to set the parallelism at job submission doesn't work. Neither does submitting the job using the Flink CLI.
It seems the parallelism doesn't get picked up at the client level; it ends up defaulting to 1.
When I set the parallelism programmatically within the Apache Beam code, it works: flinkPipelineOptions.setParallelism(4);
I suspect the root of the problem may be in the org.apache.beam.runners.flink.DefaultParallelismFactory class, as it checks for Flink's GlobalConfiguration, which may not pick up runtime values passed to Flink.
Any ideas on how this could be fixed or worked around? I need to be able to change the parallelism dynamically, so the programmatic approach won't work, nor will setting the Flink configuration at system level.
I am using the following documentation:
https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/parallel.html
https://beam.apache.org/documentation/sdks/javadoc/2.0.0/org/apache/beam/runners/flink/DefaultParallelismFactory.html
This should probably be fixed in the Beam Flink Runner, but as a workaround you can try setting the parallelism to -1 programmatically. This should make the translation pick up the parallelism that is specified when submitting the job.
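A rough sketch of that workaround with the Beam Flink runner options (everything apart from the option and class names is illustrative):
import org.apache.beam.runners.flink.FlinkPipelineOptions;
import org.apache.beam.runners.flink.FlinkRunner;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

FlinkPipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).withValidation().as(FlinkPipelineOptions.class);
options.setRunner(FlinkRunner.class);
// -1 means "not set", so the translation falls back to the parallelism
// passed when the job is submitted via the CLI or the web UI.
options.setParallelism(-1);
Pipeline pipeline = Pipeline.create(options);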
