I would like to expose an endpoint from my Flink streaming application that returns some static metadata about the app. What are the possible ways to implement this? Please help.
What sort of metadata would you like to retrieve? Flink exposes a CLI that enables you to gather data about the running job, and you can use it whether you're running on, e.g., Kubernetes or AWS KDA.
You can also define and expose your own metrics if the CLI doesn't fulfil your use case.
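If neither the CLI nor custom metrics fits, one further option is to serve the metadata yourself over HTTP. The following is a minimal sketch using only the JDK's built-in HttpServer; the port 8085, the /metadata path, and the metadata keys are illustrative assumptions, not anything Flink provides. Where you start the server (for example from the open() method of a rich function) and whether the port is reachable at all depend on your deployment; a managed service such as KDA may not allow it.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class MetadataEndpoint {

    /** Renders a flat string map as a JSON object (escapes backslashes and quotes only). */
    public static String toJson(Map<String, String> metadata) {
        return metadata.entrySet().stream()
                .map(e -> "\"" + escape(e.getKey()) + "\":\"" + escape(e.getValue()) + "\"")
                .collect(Collectors.joining(",", "{", "}"));
    }

    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** Starts an HTTP server that answers every request under /metadata with the same static JSON. */
    public static HttpServer start(int port, Map<String, String> metadata) throws IOException {
        byte[] body = toJson(metadata).getBytes(StandardCharsets.UTF_8);
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/metadata", exchange -> {
            exchange.getResponseHeaders().add("Content-Type", "application/json");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical metadata values, for illustration only.
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("app", "my-flink-job");
        metadata.put("version", "1.0.0");
        start(8085, metadata);
    }
}
```

With the server running locally, a GET on http://localhost:8085/metadata returns the JSON object. Call stop(0) on the returned server when the job shuts down so the port is released.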
I want to create a Custom Apache Flink Sink to AWS Sagemaker Feature store, but there is no documentation for how to create custom sinks on Flink's website. There are also multiple base classes that I can potentially extend (e.g. AsyncSinkBase, RichSinkFunction), so I'm not sure which to use.
I am looking for guidelines on how to implement a custom sink (both in general and for my specific use case). For my specific use case: SageMaker Feature Store has a synchronous client with a putRecord call to send records to AWS SageMaker FS, so I am ideally looking for a way to create a custom sink that would work well with this client. Note: I require at-least-once processing guarantees, as SageMaker FS is DynamoDB (a key-value store) under the hood.
Java Client: https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/sagemakerfeaturestoreruntime/AmazonSageMakerFeatureStoreRuntime.html
Example of the putRecord call using the Python client: https://github.com/aws-samples/amazon-sagemaker-feature-store-streaming-aggregation/blob/main/src/lambda/StreamingIngestAggFeatures/lambda_function.py#L31
What I've Found so Far
- Some older articles that say to use org.apache.flink.streaming.api.functions.sink.RichSinkFunction and SinkFunction
- Some connectors using classes in org.apache.flink.connector.base.sink.writer (e.g. AsyncSinkWriter, AsyncSinkBase)
- This section of the Flink docs says to use the SourceReaderBase from org.apache.flink.connector.base.source.reader when creating custom sources; SourceReaderBase seems to be the source-side equivalent of the sink classes in the bullet above
Any help/guidance/insights are much appreciated, thanks.
How about extending RichAsyncFunction?
You can find a similar example here: https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/operators/asyncio/#async-io-api
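Whichever base class ends up hosting the call, at-least-once delivery with a synchronous client mostly comes down to never swallowing a failed write. Below is a minimal, Flink-free sketch of that core: retry the blocking call a few times, then throw. SyncClient is a hypothetical stand-in for the putRecord call, and the attempt count and backoff are assumptions. The idea is that a sink's invoke() would call write(), so an exhausted retry fails the task; with checkpointing enabled, Flink then restarts from the last checkpoint and replays the records, which should be harmless if a repeated put simply overwrites the same key.

```java
public class RetryingWriter<T> {

    /** Hypothetical stand-in for a synchronous client call such as putRecord. */
    public interface SyncClient<T> {
        void put(T record) throws Exception;
    }

    private final SyncClient<T> client;
    private final int maxAttempts;
    private final long backoffMillis;

    public RetryingWriter(SyncClient<T> client, int maxAttempts, long backoffMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.client = client;
        this.maxAttempts = maxAttempts;
        this.backoffMillis = backoffMillis;
    }

    /** Blocks until the record is written, or throws once maxAttempts attempts have failed. */
    public void write(T record) {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                client.put(record);
                return; // written successfully
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    sleep(backoffMillis * attempt); // linear backoff between attempts
                }
            }
        }
        // Propagate the failure instead of dropping the record.
        throw new IllegalStateException("Giving up after " + maxAttempts + " attempts", last);
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while backing off", e);
        }
    }
}
```

Because write() blocks, throughput is bounded by the latency of the client call; that is the trade-off the asynchronous base classes are meant to address.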
Hi, I am planning to use Flink as the backend for a feature where we show the user a UI for graphically creating event patterns, e.g. multiple login failures from the same IP address.
We will create the Flink pattern programmatically from the criteria the user gives in the UI.
Is there any documentation on how to dynamically create the JAR file and dynamically submit the job with it to a Flink cluster?
Is there any best practice for this kind of use case with Apache Flink?
Another way to achieve that is to have one JAR that contains something like an “interpreter”, and to pass it the definition of your patterns in some format (e.g. JSON). The “interpreter” then translates this JSON into Flink’s operators. This is how it is done in https://github.com/TouK/nussknacker/, a Flink-based execution engine. If you use such an approach, you will need to handle redeployment of new definitions in your own application.
One straightforward way to achieve this would be to generate a SQL script for each pattern (using MATCH_RECOGNIZE) and then use Ververica Platform's REST API to deploy and manage those scripts: https://docs.ververica.com/user_guide/application_operations/deployments/artifacts.html?highlight=sql#sql-script-artifacts
Flink doesn't provide tooling for automating the creation of JAR files, or submitting them. That's the sort of thing you might use a CI/CD pipeline to do (e.g., GitHub Actions).
Disclaimer: I work for Ververica.
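To make the "generate a SQL script for each pattern" idea concrete, here is a minimal sketch that renders a MATCH_RECOGNIZE query for "N matching events per key within M minutes". The class name, the parameters, and the table and column names in the usage example are all hypothetical; the time column is assumed to be a time attribute, and the condition string is assumed to be built from vetted input rather than taken verbatim from the UI.

```java
public class MatchRecognizeSql {

    /** Rejects anything that is not a plain SQL identifier, since names may originate from user input. */
    private static String ident(String name) {
        if (!name.matches("[A-Za-z_][A-Za-z0-9_]*")) {
            throw new IllegalArgumentException("Not a plain identifier: " + name);
        }
        return name;
    }

    /**
     * Builds a query matching `times` consecutive rows that satisfy `condition`
     * for the same partition key within `withinMinutes` minutes.
     * The pattern variable is always named E, so `condition` should refer to E.<column>.
     */
    public static String repeatedEvent(String table, String partitionKey, String timeColumn,
                                       String condition, int times, int withinMinutes) {
        if (times < 1 || withinMinutes < 1) {
            throw new IllegalArgumentException("times and withinMinutes must be >= 1");
        }
        String t = ident(timeColumn);
        return String.join("\n",
                "SELECT *",
                "FROM " + ident(table),
                "MATCH_RECOGNIZE (",
                "  PARTITION BY " + ident(partitionKey),
                "  ORDER BY " + t,
                "  MEASURES",
                "    FIRST(E." + t + ") AS first_event_time,",
                "    LAST(E." + t + ") AS last_event_time",
                "  ONE ROW PER MATCH",
                "  AFTER MATCH SKIP PAST LAST ROW",
                "  PATTERN (E{" + times + "}) WITHIN INTERVAL '" + withinMinutes + "' MINUTE",
                "  DEFINE",
                "    E AS " + condition,
                ")");
    }

    public static void main(String[] args) {
        // Hypothetical table and columns: three failed logins from one IP within five minutes.
        System.out.println(repeatedEvent("LoginEvents", "ip", "event_time",
                "E.status = 'FAILURE'", 3, 5));
    }
}
```

The generated text can then be handed to whatever deploys SQL for you. Keeping the generator a pure function of the pattern definition makes it easy to unit test separately from the cluster.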
I have been trying to use OpenTelemetry (https://opentelemetry.io/) in an Apache Flink job. I am sending the traces to a Kafka topic in order to see them in Jaeger.
The traceability is working in the job when I am executing it inside my IntelliJ IDE, but once I create the package and try to execute it inside the cluster, I am not able to make it work.
Is there any blocker in that sense for Apache Flink that I am not aware of?
I have accomplished this using a variable:
export FLINK_ENV_JAVA_OPTS=-javaagent:./lib/opentelemetry-javaagent-all.jar
But this only works if I am setting up the Flink cluster myself. The problem is that the cluster I am using is inside AWS (Kinesis Analytics), and I am not able to set this variable there.
Is there a way to use OpenTelemetry with Flink?
I'm looking for a sample project that will help me understand how to use the Flink RemoteStreamEnvironment API. Does anyone have a link to a sample project?
Normally there is no need to use either RemoteStreamEnvironment or LocalStreamEnvironment. Normally you can do
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
and the runtime will use either a local or remote stream execution environment, depending on how Flink is being launched (e.g., if from within an IDE you'll get a local mini-cluster, and with the CLI or REST API it's straightforward to submit a job to a cluster).
But if you want to use a RemoteStreamEnvironment directly, you can use one of the StreamExecutionEnvironment#createRemoteEnvironment(...) methods to create the remote execution environment, and then just use that env like you would any other to set up the job graph and execute the job.
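As an illustration of the REST route mentioned above, here is a minimal sketch that builds the request for running a JAR that has already been uploaded to the cluster (via POST /jars/upload, which returns the JAR's ID). It uses only the JDK's java.net.http client and so needs Java 11 or later. The base URL, the entry class name, and the hand-assembled JSON body are assumptions for illustration; a real client would use a JSON library and check the response.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class FlinkRestSubmit {

    /** Builds a POST to /jars/{jarId}/run asking the cluster to start the given entry class. */
    public static HttpRequest runJarRequest(String baseUrl, String jarId,
                                            String entryClass, int parallelism) {
        // Assumes entryClass contains no characters that need JSON escaping.
        String body = "{\"entryClass\":\"" + entryClass + "\",\"parallelism\":" + parallelism + "}";
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/jars/" + jarId + "/run"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) throws Exception {
        // args[0]: the JAR ID returned by the upload call.
        // The URL and class name below are placeholders.
        HttpRequest request = runJarRequest("http://localhost:8081", args[0], "com.example.MyJob", 1);
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

Compared with createRemoteEnvironment, this keeps the job code itself on getExecutionEnvironment() and moves the choice of cluster into the submission step.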
Could anyone please let me know how I can set up Flink in my serverless platform (FaaS) to perform event-driven operations?
I looked at Flink functions and they seem promising. Could anyone clarify the points below?
What do I need to install in my FaaS environment to trigger the Flink function when an event (a file change in my S3 bucket) occurs?
I don't have a big data platform, so I am planning to use Flink in my serverless/Kubernetes environment.
Thanks in advance!!
To use StateFun you would generally need:
1. An Ingress that would trigger the functions.
2. The actual code that would react to your events (the stateful function), Dockerized.
3. A way to launch your application.
Every stateful function application starts with an Ingress; basically, that is a funnel of events that your functions can react to.
In your case, you can use Amazon Kinesis as your Ingress, and make sure that your S3 events will end up there.
The next thing you would need is to familiarize yourself with a stateful function SDK, either in Java or in Python, and write the logic that deals with the incoming events. The result of that stage would be a Docker image.
Then, you need to launch the image obtained at (2), and for that you can use Kubernetes (you don't have to).
There are Helm charts provided for your convenience and a simple utility to generate the necessary k8s resources.