I'm looking for a sample project that will help me understand how to use the Flink RemoteStreamEnvironment API. Does anyone have a link to a sample project?
Normally there is no need to use either RemoteStreamEnvironment or LocalStreamEnvironment directly. You can usually do
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
and the runtime will use either a local or remote stream execution environment, depending on how Flink is launched (e.g., from within an IDE you'll get a local mini-cluster, and with the CLI or REST API it's straightforward to submit a job to a cluster).
But if you want to use a RemoteStreamEnvironment directly, you can use one of the StreamExecutionEnvironment#createRemoteEnvironment(...) methods to create the remote execution environment, and then use that env like you would any other to set up the job graph and execute the job.
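As a rough illustration of the createRemoteEnvironment(...) route, here is a minimal sketch. The class name, host, port, and jar path are placeholders I made up; the port assumes a cluster whose REST endpoint listens on 8081, and the jar is assumed to contain this class so the cluster can load the user code.

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RemoteJobSketch {

    // Thin wrapper around the factory method mentioned above.
    // jarFiles are shipped to the cluster together with the job.
    static StreamExecutionEnvironment remoteEnv(String host, int port, String... jarFiles) {
        return StreamExecutionEnvironment.createRemoteEnvironment(host, port, jarFiles);
    }

    public static void main(String[] args) throws Exception {
        // Placeholder values (assumptions): adjust to your cluster and build output.
        StreamExecutionEnvironment env =
                remoteEnv("localhost", 8081, "target/my-job.jar");

        // From here on the env is used like any other StreamExecutionEnvironment.
        env.fromElements("a", "b", "c")
           .map(value -> value.toUpperCase())
           .print(); // output lands in the TaskManager's stdout, not the client's

        // Submits the job graph to the remote cluster and waits for it to finish.
        env.execute("remote-env-sketch");
    }
}
```

Creating the environment does not contact the cluster; the connection only happens on execute().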
Related
I have been trying to use OpenTelemetry (https://opentelemetry.io/) in an Apache Flink job. I am sending the traces to a Kafka topic in order to view them in Jaeger.
Tracing works when I run the job inside my IntelliJ IDE, but once I build the package and try to run it on the cluster, I am not able to make it work.
Is there any blocker in that sense for Apache Flink that I am not aware of?
I have accomplished this using an environment variable:
export FLINK_ENV_JAVA_OPTS=-javaagent:./lib/opentelemetry-javaagent-all.jar
But this only works if I am setting up the Flink cluster myself. The problem is that the cluster I am using runs inside AWS (Kinesis Analytics), and I am not able to set this variable there.
Is there a way to use OpenTelemetry with Flink?
I would like to expose an endpoint from my Flink streaming application that returns some static metadata about the app. What are the possible ways to implement this? Please help.
What sort of metadata would you like to retrieve? Flink exposes a CLI that enables you to gather data about the running job. You can use it whether you're running on, e.g., Kubernetes or AWS KDA.
You can also define and expose your own metrics if the CLI doesn't fulfil your use case.
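For the custom-metric route, a minimal sketch could look like the following. The class name and the metric name "appVersion" are made up for illustration, and the open(Configuration) signature assumes a Flink 1.x API. Note that some metric reporters may only handle numeric gauges, so check how yours treats string values.

```java
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.metrics.Gauge;

// Pass-through operator that registers a gauge carrying static app metadata.
public class MetadataGaugeMap extends RichMapFunction<String, String> {

    private final String appVersion; // hypothetical piece of static metadata

    public MetadataGaugeMap(String appVersion) {
        this.appVersion = appVersion;
    }

    @Override
    public void open(Configuration parameters) {
        // Registered once per parallel task when the operator starts.
        getRuntimeContext()
                .getMetricGroup()
                .gauge("appVersion", (Gauge<String>) () -> appVersion);
    }

    @Override
    public String map(String value) {
        return value; // records are forwarded unchanged
    }
}
```

It would be wired into a job with something like stream.map(new MetadataGaugeMap("1.0.0")), after which the gauge shows up wherever the job's other metrics are reported.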
Could anyone please let me know how I can set up Flink in my serverless platform (FaaS) to perform event-driven operations?
I looked at Flink Stateful Functions and it seems promising. Could anyone clarify the following?
What do I need to install in my FaaS environment to trigger the Flink function when an event (a file change in my S3 bucket) occurs?
I don't have a big data platform, so I'm planning to use Flink in my serverless/Kubernetes environment.
Thanks in advance!!
To use StateFun you would generally need:
An Ingress that triggers the functions.
The actual code that reacts to your events (the stateful function), packaged as a Docker image.
A way to launch your application.
Specifically:
Every stateful function application starts with an Ingress, which is basically a funnel of events that your functions can react to.
In your case, you can use Amazon Kinesis as your Ingress, and make sure that your S3 events end up there.
The next thing you need is to familiarize yourself with a stateful function SDK, either in Java or in Python, and write the logic that deals with the incoming events. The result of that stage would be a Docker image.
Then you need to launch the image obtained in the previous step; for that you can use Kubernetes (though you don't have to).
There are Helm charts provided for your convenience and a simple utility to generate the necessary k8s resources.
I wanted to understand the limitations of LocalExecutionEnvironment and whether it can be used to run in production.
Appreciate any help/insight. Thanks
LocalExecutionEnvironment spins up a Flink MiniCluster, which runs the entire Flink system (JobManager, TaskManager) in a single JVM. So you're limited to CPU cores and memory available on that one machine. You also don't have HA from multiple JobManagers. I haven't looked at other limitations of the MiniCluster environment, but I'm sure more exist.
A LocalExecutionEnvironment doesn't load a config file on startup, so you have to do all of the configuration in the application. By default it also doesn't offer a REST endpoint. You can solve both these issues by doing something like this:
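// Load the Flink configuration file from the current working directory
String cwd = Paths.get(".").toAbsolutePath().normalize().toString();
Configuration conf = GlobalConfiguration.loadConfiguration(cwd);
// Local environment that also starts the web UI and REST endpoint
StreamExecutionEnvironment env = StreamExecutionEnvironment.createLocalEnvironmentWithWebUI(conf);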
Logging may be another issue that will require a workaround.
I don't believe you'll be able to use the Flink CLI to control the job, but if you create the Web UI (as shown above) you can at least use the REST API to do things like triggering savepoints (after first using the REST API to get the job ID).
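The REST interaction described above could be sketched roughly as follows, using only the JDK's HttpClient (Java 11+). The class name, base URL, job ID, and target directory are placeholders, and the port assumes the default REST port of 8081. The paths follow Flink's REST API (GET /jobs, POST /jobs/:jobid/savepoints), but check them against the documentation for your Flink version.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class SavepointViaRest {

    // GET /jobs lists the IDs and statuses of the jobs known to the cluster.
    static HttpRequest listJobs(String baseUrl) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/jobs")).GET().build();
    }

    // POST /jobs/<jobId>/savepoints asks Flink to trigger a savepoint.
    // The JSON is built naively; targetDir must not contain quotes or backslashes.
    static HttpRequest triggerSavepoint(String baseUrl, String jobId, String targetDir) {
        String body = "{\"target-directory\": \"" + targetDir + "\", \"cancel-job\": false}";
        return HttpRequest.newBuilder(URI.create(baseUrl + "/jobs/" + jobId + "/savepoints"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Usage (assumed): SavepointViaRest [baseUrl] [jobId] [targetDir]
        String baseUrl = args.length > 0 ? args[0] : "http://localhost:8081";
        HttpClient client = HttpClient.newHttpClient();

        // Step 1: look up the job ID in the returned JSON.
        HttpResponse<String> jobs =
                client.send(listJobs(baseUrl), HttpResponse.BodyHandlers.ofString());
        System.out.println(jobs.body());

        // Step 2: trigger the savepoint once a job ID and target directory are given.
        if (args.length > 2) {
            HttpResponse<String> triggered = client.send(
                    triggerSavepoint(baseUrl, args[1], args[2]),
                    HttpResponse.BodyHandlers.ofString());
            System.out.println(triggered.body());
        }
    }
}
```

The trigger call is asynchronous: the response carries a request ID rather than the finished savepoint, so the savepoint's status has to be polled separately.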
My team is trying to set up an Apache Flink (v1.4) cluster on Mesos/Marathon. We are using the Docker image provided by Mesosphere. It works really well!
Because of a new requirement, the task managers have to be launched with extended runtime privileges. We can easily enable these runtime privileges for the app manager via the Marathon web UI. However, we cannot find a way to enable the privileges for task managers.
In Apache Spark, we can set spark.mesos.executor.docker.parameters privileged=true in Spark's configuration file, so that Spark passes this parameter to the docker run command. I am wondering if Apache Flink allows us to pass a custom parameter to docker run when launching task managers. If not, how can we start task managers with extended runtime privileges?
Thanks
There is a new parameter mesos.resourcemanager.tasks.container.docker.parameters introduced in this commit which will allow passing arbitrary parameters to Docker.
Unfortunately, this is not possible as of right now (or only for the framework scheduler as Tobi pointed out).
I went ahead and created a Jira for this feature so you can keep track/add details/contribute it yourself: https://issues.apache.org/jira/browse/FLINK-8490
You should be able to tweak the setting for the parameters in the ContainerInfo of https://github.com/mesoshq/flink-framework/blob/master/index.js to support this. I’ll eventually update the Flink version in the Docker image...