Flink CLI vs Flink Web Console

We have a requirement to replace the Flink console UI and enable all of the functionality of the Flink web console using CLI utilities. For some of the functionality, like starting jobs, taking savepoints, etc., we are already using the Flink CLI.
My questions are:
Does the Flink CLI have parity with the Flink Web UI console?
If not, are there alternate ways to do, without the UI, what is possible via the Flink console (like checking/monitoring the backpressure of a job, etc.)?
I am trying to find a solution where an on-call engineer can completely monitor and operate Flink from the command line / terminal, without needing to go to the web UI.
Thanks in Advance

In theory the Flink CLI plus the REST API provide a superset of the functionality available via the web UI. But some things, like identifying a busy task that's causing backpressure, can be done much more quickly with the web UI. For monitoring and troubleshooting I think you'll need to build some tooling and/or set up a metrics dashboard (e.g., using Grafana in combination with your preferred metrics reporter).
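For example, backpressure can be sampled from the terminal by walking the REST API with curl (a sketch assuming the JobManager's REST endpoint is at the default localhost:8081; the IDs are placeholders):

# List running jobs to get the job ID (the CLI equivalent is "flink list")
curl http://localhost:8081/jobs
# Get the job's vertices (tasks)
curl http://localhost:8081/jobs/<job-id>
# Trigger/read a backpressure sample for one vertex
curl http://localhost:8081/jobs/<job-id>/vertices/<vertex-id>/backpressure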

Related

Use OpenTelemetry with Apache Flink

I have been trying to use OpenTelemetry (https://opentelemetry.io/) in an Apache Flink job. I am sending the traces to a Kafka topic in order to view them in Jaeger.
The tracing works when I execute the job inside my IntelliJ IDE, but once I create the package and try to execute it inside the cluster, I am not able to make it work.
Is there any blocker in that sense for Apache Flink that I am not aware of?
I have accomplished this using an environment variable:
export FLINK_ENV_JAVA_OPTS=-javaagent:./lib/opentelemetry-javaagent-all.jar
But this works only if I am setting up the Flink cluster myself. The problem is that the cluster I am using is inside AWS (Kinesis Analytics), and I am not able to set this variable.
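I believe the equivalent setting in conf/flink-conf.yaml would be something like the line below, but I cannot edit that file on Kinesis Analytics either:

# Presumed equivalent of the export above, appended to the Flink configuration
echo 'env.java.opts: -javaagent:./lib/opentelemetry-javaagent-all.jar' >> conf/flink-conf.yaml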
Is there a way to use OpenTelemetry with Flink?

Apache Flink in Kubernetes

Could anyone please let me know how I can set up Flink in my serverless platform (FaaS) to perform event-driven operations?
I looked at Flink Stateful Functions and it seems promising. Could anyone clarify the points below?
What do I need to install in my FaaS environment to trigger the Flink function when an event (a file change in my S3 bucket) occurs?
I don't have a big data platform, so I am planning to use Flink in my serverless/Kubernetes environment.
Thanks in advance!!
To use StateFun you would generally need:
An Ingress that would trigger the functions.
The actual code that reacts to your events (the stateful function), Dockerized.
A way to launch your application.
Specifically:
Every stateful function application starts with an Ingress; basically, that is a funnel of events that your functions can react to.
In your case, you can use Amazon Kinesis as your Ingress, and make sure that your S3 events will end up there.
The next thing that you would need is to familiarize yourself with a Stateful Functions SDK, either in Java or in Python, and write the logic that deals with the incoming events. The result of that stage would be a Docker image.
Then, you need to launch the image obtained at (2), and for that you can use Kubernetes (you don't have to).
There are Helm charts provided for your convenience and a simple utility to generate the necessary k8s resources.
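As a rough illustration of steps (2) and (3), with hypothetical image and registry names:

# Build the Dockerized stateful function (Dockerfile assumed to extend the official flink-statefun base image)
docker build -t my-statefun-app:1.0 .
# Push it somewhere your cluster can pull from, then point the Helm charts at it
docker push my-registry.example.com/my-statefun-app:1.0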

Using Flink LocalEnvironment for Production

I wanted to understand the limitations of LocalExecutionEnvironment and whether it can be used in production.
Appreciate any help/insight. Thanks
LocalExecutionEnvironment spins up a Flink MiniCluster, which runs the entire Flink system (JobManager, TaskManager) in a single JVM. So you're limited to the CPU cores and memory available on that one machine. You also don't have HA from multiple JobManagers. I haven't looked at other limitations of the MiniCluster environment, but I'm sure more exist.
A LocalExecutionEnvironment doesn't load a config file on startup, so you have to do all of the configuration in the application. By default it also doesn't offer a REST endpoint. You can solve both these issues by doing something like this:
// Load flink-conf.yaml from the current working directory and bring up the web UI / REST endpoint
String cwd = Paths.get(".").toAbsolutePath().normalize().toString();
Configuration conf = GlobalConfiguration.loadConfiguration(cwd);
StreamExecutionEnvironment env = StreamExecutionEnvironment.createLocalEnvironmentWithWebUI(conf);
Logging may be another issue that will require a workaround.
I don't believe you'll be able to use the Flink CLI to control the job, but if you create the Web UI (as shown above) you can at least use the REST API to do things like triggering savepoints (after first using the REST API to get the job ID).
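For example, a sketch with curl, assuming the REST endpoint from the snippet above is on the default localhost:8081:

# Find the job ID
curl http://localhost:8081/jobs
# Trigger a savepoint (the response contains a trigger ID)
curl -X POST http://localhost:8081/jobs/<job-id>/savepoints -H 'Content-Type: application/json' -d '{"target-directory": "file:///tmp/savepoints", "cancel-job": false}'
# Poll the savepoint status with the returned trigger ID
curl http://localhost:8081/jobs/<job-id>/savepoints/<trigger-id>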

Can't set parallelism using Flink's CLI or Web-UI when using Apache Beam

I am using Flink 1.2.1 running on Docker, with Task Managers distributed across different VMs as part of a Docker Swarm.
Uploading an Apache Beam application using the Flink Web UI and trying to set the parallelism at job submission doesn't work. Neither does submitting the job using the Flink CLI.
It seems the parallelism doesn't get picked up at the client level; it ends up defaulting to 1.
When I set the parallelism programmatically within the Apache Beam code, it works: flinkPipelineOptions.setParallelism(4);
I suspect the root of the problem may be in the org.apache.beam.runners.flink.DefaultParallelismFactory class, as it checks for Flink's GlobalConfiguration, which may not pick up runtime values passed to Flink.
Any ideas on how this could be fixed or worked around? I need to be able to change the parallelism dynamically, so the programmatic approach won't work, nor will setting the Flink configuration at system level.
I am using the following documentation:
https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/parallel.html
https://beam.apache.org/documentation/sdks/javadoc/2.0.0/org/apache/beam/runners/flink/DefaultParallelismFactory.html
This should probably be fixed in the Beam Flink Runner, but as a workaround you can try setting the parallelism to -1 programmatically. This should make the translation pick up the parallelism that is specified when submitting the job.
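A sketch of the combined workaround (the jar name and arguments are hypothetical):

# In the Beam code, first set: flinkPipelineOptions.setParallelism(-1);
# Then the -p flag passed at submission should take effect:
flink run -p 4 target/my-beam-pipeline.jar --runner=FlinkRunner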

Deploying AngularJs + Sinatra to AWS

I have an AngularJS site consuming an API written in Sinatra.
I'm simply trying to deploy these 2 components together on an AWS EC2 instance.
How would one go about doing that? What tools do you recommend? What structure do you think is most suitable?
Cheers
This is based upon my experience of using the HashiCorp line of tools.
Manual: Launch an Ubuntu image, gem install sinatra, and deploy your code. Take a snapshot for safekeeping. This one-off approach is good for a development box to iron out the configuration process. Write down the commands you run and any options you may need.
Automated: Use the Packer EC2 Builder and Shell Provisioner to automate your commands from the previous manual approach. This will give you a configured AMI that can be launched.
You can apply different methods of getting to an AMI using different toolsets. However, in the end, you want a single immutable image that can be deployed repeatedly.
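A rough sketch of both paths (the package names and Packer template name are assumed):

# Manual: provision a fresh Ubuntu instance by hand, then snapshot it
sudo apt-get update && sudo apt-get install -y ruby-full build-essential
sudo gem install sinatra
# Automated: bake the same steps into an immutable AMI with Packer
packer build sinatra-angular.json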
