Does the DataStream API in Apache Flink's Python SDK (PyFlink) support operators such as windowing? Every PyFlink windowing example I have seen so far uses the Table API. The DataStream API itself does support these operators, but it looks like they are not yet available via PyFlink?
Thanks!
That's correct, PyFlink doesn't yet support the DataStream window API. Follow FLINK-21842 to track progress on this issue.
Related
I want to create a Custom Apache Flink Sink to AWS Sagemaker Feature store, but there is no documentation for how to create custom sinks on Flink's website. There are also multiple base classes that I can potentially extend (e.g. AsyncSinkBase, RichSinkFunction), so I'm not sure which to use.
I am looking for guidelines on how to implement a custom sink (both in general and for my specific use case). For my specific use case: Sagemaker Feature Store has a synchronous client with a putRecord call to send records to AWS Sagemaker FS, so I am ideally looking for a way to create a custom sink that would work well with this client. Note: I require at-least-once processing guarantees, as Sagemaker FS is DynamoDB (a key-value store) under the hood.
Java Client: https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/sagemakerfeaturestoreruntime/AmazonSageMakerFeatureStoreRuntime.html
Example of the putRecord call using the Python client: https://github.com/aws-samples/amazon-sagemaker-feature-store-streaming-aggregation/blob/main/src/lambda/StreamingIngestAggFeatures/lambda_function.py#L31
What I've Found so Far
Some older articles which say to use org.apache.flink.streaming.api.functions.sink.RichSinkFunction and SinkFunction
Some connectors using classes in org.apache.flink.connector.base.sink.writer (e.g. AsyncSinkWriter, AsyncSinkBase)
This section of the Flink docs says to use SourceReaderBase from org.apache.flink.connector.base.source.reader when creating custom sources; SourceReaderBase seems to be the source-side counterpart of the sink classes in the bullet above
Any help/guidance/insights are much appreciated, thanks.
How about extending RichAsyncFunction?
You can find a similar example here: https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/operators/asyncio/#async-io-api
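To make the suggestion concrete, here is a minimal, language-agnostic sketch of the pattern that Flink's Async I/O operator builds on: a blocking putRecord-style call is handed to a thread pool and retried on failure. This is plain Python, not Flink API; `put_with_retry`, `write_all`, and the `put_record` callable are hypothetical names used only for illustration. In a Java RichAsyncFunction, the equivalent work would happen inside asyncInvoke, completing the ResultFuture when the call returns.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def put_with_retry(put_record, record, max_attempts=3, backoff_s=0.0):
    """Call a blocking put_record until it succeeds or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        try:
            return put_record(record)
        except Exception:
            if attempt == max_attempts:
                raise  # give up and surface the error to the caller
            time.sleep(backoff_s * attempt)  # simple linear backoff


def write_all(put_record, records, workers=4):
    """Submit blocking writes to a thread pool and wait for every one."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(put_with_retry, put_record, r) for r in records]
        # result() re-raises any exception from a write that never succeeded
        return [f.result() for f in futures]
```

Retrying can write the same record more than once. That matches at-least-once semantics, and it is usually acceptable for a key-value store where a repeated put overwrites the same key.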
I want to subscribe to a Kafka topic and ingest/transform the data in Snowflake. I would like to use the Snowpark API for Scala and the Streams & Tasks available in Snowflake.
I couldn't find anything in the API documentation or via Google. Is this option already available, and if not, is it planned on a roadmap?
I want to call the Java interfaces in my jar file from a PyFlink job. I found no solution in the official documentation.
It looks to me like support for this was not included in Flink 1.9, but is ongoing work. See FLIP-58. FLIP-78 and FLIP-88 may also be of interest. Note that most of these improvements will be included in the upcoming Flink 1.10 release.
If it satisfies your need, you can use the Python Table API to register a Java user-defined function. The method is register_java_function on the table environment.
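A minimal sketch of how that call might be used, assuming `table_env` is a PyFlink TableEnvironment from a release that still provides register_java_function and that the jar containing the Java class is on the job's classpath. The helper name `register_and_query`, the function name, the class name, and the query in the usage note are all hypothetical.

```python
def register_and_query(table_env, fn_name, java_class, query):
    """Register a Java UDF by its fully qualified class name, then query with it.

    table_env is expected to be a PyFlink TableEnvironment; any object that
    offers register_java_function and sql_query works for this sketch.
    """
    # Make the Java function callable from SQL under the name fn_name.
    table_env.register_java_function(fn_name, java_class)
    # Run a SQL query that can now reference fn_name.
    return table_env.sql_query(query)
```

For example, `register_and_query(table_env, "my_upper", "com.example.MyUpper", "SELECT my_upper(name) FROM src")` would register the (hypothetical) class com.example.MyUpper and return the resulting table.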
Is it possible to use PyFlink with python machine learning libraries such as LightGBM for a streaming application? Is there any good example for this?
There is no complete example, but you can take a look at Getting Started with Flink Python and then at how Python UDFs can be used: UDFs in the Table API.
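As a rough sketch of the UDF route, the model can be wrapped in a callable object that loads it lazily, so each Python worker loads it only once, and that object can then be turned into a scalar UDF. Everything here is an assumption for illustration: the `Scorer` class, the two-feature signature, and the model file path in the usage note are hypothetical, and the model is only assumed to expose a `predict` method that accepts a list of rows.

```python
class Scorer:
    """Callable that loads a model on first use and scores one row per call."""

    def __init__(self, load_model):
        self._load_model = load_model  # zero-argument factory returning the model
        self._model = None

    def __call__(self, f1, f2):
        if self._model is None:
            # Load lazily so the model is created on the worker, not the client.
            self._model = self._load_model()
        # predict is assumed to take a list of rows and return one score per row.
        return float(self._model.predict([[f1, f2]])[0])


def as_flink_udf(scorer):
    """Wrap the scorer as a PyFlink scalar UDF (requires pyflink to be installed)."""
    from pyflink.table import DataTypes
    from pyflink.table.udf import udf

    return udf(scorer, result_type=DataTypes.DOUBLE())
```

With LightGBM, the factory could be something like `lambda: lightgbm.Booster(model_file="model.txt")`, and the result of `as_flink_udf(...)` would then be registered with the table environment as described in the UDF documentation.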
In the Flink source, there are flink-streaming-java and flink-streaming-scala modules. Why do we need two modules for Flink streaming?
https://github.com/apache/flink/tree/master/flink-streaming-java
https://github.com/apache/flink/tree/master/flink-streaming-scala
Both flink-streaming-java and flink-streaming-scala provide a similar API for managing Flink streams; you only need one of them, depending on your language.
Please note that whichever you choose, some dependencies such as flink-runtime and flink-clients depend on a Scala version (2.11 or 2.12), because Flink is built on Akka, a framework written in Scala.
There is an ongoing effort to remove the Scala dependency from a higher-level API, flink-table (FLINK-11063).
flink-streaming-java is the implementation of the Java API for streams, and flink-streaming-scala is the implementation of the Scala API for streams. So you can find DataStream.java in flink-streaming-java and DataStream.scala in flink-streaming-scala.
The two modules provide the same functionality for developers working in different languages. Personally, I find Scala better suited to describing operators in big data frameworks such as Flink and Spark.