Use KeyProcessFunction on KeyBy - apache-flink

I have a snippet of code that looks like this:
DataStream<Tuple2<Long, Integer>> datastream = otherDatastream
.keyBy(event -> event.getField(1))
.process(new SomeFunction());
My SomeFunction is a class that extends KeyedProcessFunction, but trying this code results in a "Cannot resolve method process(SomeFunction)" error. I am unsure what the correct syntax would look like for this case.

It's necessary to get all of the details in SomeFunction exactly right: the type parameters, method overrides, etc. If you share all of the details we can be more helpful, but a good strategy, in general, is to rely on your IDE to generate the boilerplate for you.
For starters, make sure that the SomeFunction class extends KeyedProcessFunction<KEY, IN, OUT>, where KEY is whatever type is returned by event.getField(1), IN is whatever type event is, and OUT appears to be Tuple2<Long, Integer>.
Another strategy would be to start from working examples, like the ones in the Apache Flink training repository.
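As a sketch of the advice above, here is what a correctly parameterized SomeFunction might look like. MyEvent, its fields, and the String key type are hypothetical stand-ins for whatever otherDatastream actually carries:

```java
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

// Hypothetical event type; substitute the actual element type of otherDatastream.
class MyEvent {
    public String key;       // assumed to be what event.getField(1) returns
    public long timestamp;
    public int count;

    public MyEvent(String key, long timestamp, int count) {
        this.key = key;
        this.timestamp = timestamp;
        this.count = count;
    }
}

// KEY = String, IN = MyEvent, OUT = Tuple2<Long, Integer>
public class SomeFunction extends KeyedProcessFunction<String, MyEvent, Tuple2<Long, Integer>> {
    @Override
    public void processElement(MyEvent event, Context ctx,
                               Collector<Tuple2<Long, Integer>> out) throws Exception {
        // Emit one (timestamp, count) tuple per incoming event.
        out.collect(Tuple2.of(event.timestamp, event.count));
    }
}
```

With the three type parameters lined up this way, `keyBy(event -> event.key).process(new SomeFunction())` should resolve.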

Related

Is it better to use Row or GenericRowData with DataStream API?

I am working with Flink 1.15.2. Should I use Row, or GenericRowData (which inherits from RowData), for my own data type? I mostly use the streaming API.
Thanks.
Sig.
In general, the DataStream API is very flexible when it comes to record types. POJO types might be the most convenient ones. Basically any Java class can be used, but you need to check which TypeInformation is extracted via reflection; sometimes it is necessary to manually overwrite it.
For Row you will always have to provide the types manually as reflection cannot do much based on class signatures.
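As an illustration of providing those types manually (the field names id/name here are invented for the example):

```java
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.types.Row;

public class RowTypeExample {
    // Reflection can't recover Row's field types, so declare them by hand.
    static TypeInformation<Row> rowType() {
        return Types.ROW_NAMED(
                new String[] {"id", "name"},
                Types.LONG, Types.STRING);
    }
}
```

The resulting TypeInformation is what you would pass to `.returns(...)` on a DataStream of Rows.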
GenericRowData should be avoided; it is rather an internal class with many caveats (strings must be StringData, and array handling is not straightforward). Also, GenericRowData becomes BinaryRowData after deserialization. TL;DR: this type is meant for the SQL engine.
The docs are actually helpful here; I was confused too.
The section at the top titled "All Known Implementing Classes" lists all the implementations. RowData and GenericRowData are described as internal data structures. If you can use a POJO, then great. But if you need something that implements RowData, take a look at BinaryRowData, BoxedWrapperRowData, ColumnarRowData, NestedRowData, or any of the implementations there that aren't listed as internal.
I'm personally using NestedRowData to map a DataStream[Row] into a DataStream[RowData], and I'm not at all sure that's a good idea :) Especially since I can't seem to add a string attribute.

How to implement the class `MyTupleReducer` in the Flink official documentation

I'm learning from the Flink documentation (DataSet API).
There's a class called MyTupleReducer.
I'm trying to complete it:
https://paste.ubuntu.com/p/3CjphGQrXP/
but it's full of red lines in IntelliJ.
Could you give me a correct version of the above code?
Thanks for your help!
PS:
I'm writing part of MyTupleReducer:
https://pastebin.ubuntu.com/p/m4rjs6t8QP/
but the return part is wrong.
I suspect that importing Reduce from Akka has thrown you off course. Flink's ReduceFunction<T> interface requires you to implement a reduce method with this signature:
T reduce(T value1, T value2) throws Exception;
This is a classic reduce that takes two objects of type T and produces a third object of the same type.
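Since the pastes linked above aren't reproduced here, the following is only a guessed shape, not the code from the documentation: a ReduceFunction over (word, count) tuples that merges two elements by summing the counts.

```java
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.java.tuple.Tuple2;

// Illustrative sketch: both inputs arrive with the same key,
// so keep one key and combine the values.
public class MyTupleReducer implements ReduceFunction<Tuple2<String, Integer>> {
    @Override
    public Tuple2<String, Integer> reduce(Tuple2<String, Integer> a,
                                          Tuple2<String, Integer> b) {
        return Tuple2.of(a.f0, a.f1 + b.f1);
    }
}
```

Note that the returned tuple is a new object of the same type as the inputs, which is exactly what the signature demands.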

Use Storage.writeObject with a Runnable in Codename One

In Codename One, a code like the following doesn't compile:
Runnable r = (Runnable & Serializable)() -> Log.p("Serializable!");
I get:
error: cannot find symbol
symbol: method getImplMethodKind()
location: interface SerializedLambda
Is there any way to write a Runnable to the Storage? Thank you
No. Unlike Java serialization, we don't write class data, only the data of the object. Since the bytecode is transpiled to native platforms, there's no applicable class data to write. We also don't support the Serializable interface, only our version of Externalizable, which isn't compatible.
You can write something based on that, but it won't be as pretty, as you need to create a regular class. That's because we can't use reflection voodoo to load an oddly structured class dynamically.
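A minimal sketch of that workaround, assuming Codename One's com.codename1.io.Externalizable API: instead of serializing the lambda's code, a regular class stores the data the task needs (here just a log message; the class and storage key names are made up).

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

import com.codename1.io.Externalizable;
import com.codename1.io.Log;
import com.codename1.io.Util;

// A regular class instead of a lambda: only its data (the message)
// is written to Storage, never its code.
public class LogTask implements Externalizable, Runnable {
    private String message = "Serializable!";

    public void run() {
        Log.p(message);
    }

    public int getVersion() {
        return 1;
    }

    public void externalize(DataOutputStream out) throws IOException {
        Util.writeUTF(message, out);
    }

    public void internalize(int version, DataInputStream in) throws IOException {
        message = Util.readUTF(in);
    }

    public String getObjectId() {
        return "LogTask";
    }
}
```

Register it once at startup with `Util.register("LogTask", LogTask.class)`; after that, `Storage.getInstance().writeObject("task", new LogTask())` can persist it.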

Correct way to add a (lagging) watermark to a source which already has timestamps

First up, I'm very new to stream processing, feel free to correct me where I misunderstand concepts :)
I'm using Apache Flink.
My source is a FlinkKafkaConsumer, which already adds timestamps which it takes from Kafka.
In my processing I want to be able to use a watermark (the why is out of scope for this question).
What I'm after is the watermark generation behaviour as provided by the abstract class BoundedOutOfOrdernessTimestampExtractor.
But this class only provides a:
public abstract long extractTimestamp(T element);
which, if you override it, gives you the element, but not the timestamp originally provided by the FlinkKafkaConsumer.
The TimestampAssigner interface implemented by BoundedOutOfOrdernessTimestampExtractor does provide a public final long extractTimestamp(T element, long previousElementTimestamp), which gives you the previously assigned timestamp, in which case you could just re-use it. But this method is made final in BoundedOutOfOrdernessTimestampExtractor, and thus can't be overridden.
So my way of getting around this now is to copy/paste the source code of BoundedOutOfOrdernessTimestampExtractor, and rewrite it to use previousElementTimestamp as the timestamp.
My question is: Is this indeed the best way to go about this, or is there a (better) alternative?
I'm just surprised at having to copy-paste classes; I'm used to (spoiled by) frameworks designed so that such 'basic' functionality can be done with what's included. (Or maybe what I want is actually very esoteric :)
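The copy-and-rewrite described above might look like the following with the (legacy) AssignerWithPeriodicWatermarks API: the same watermark logic as BoundedOutOfOrdernessTimestampExtractor, but reusing the timestamp the Kafka source already attached instead of extracting a new one. The class name is made up.

```java
import org.apache.flink.streaming.api.functions.AssignerWithPeriodicWatermarks;
import org.apache.flink.streaming.api.watermark.Watermark;

public class KafkaTimestampWatermarks<T> implements AssignerWithPeriodicWatermarks<T> {
    private final long maxOutOfOrderness; // allowed lag, in milliseconds
    private long currentMaxTimestamp;

    public KafkaTimestampWatermarks(long maxOutOfOrderness) {
        this.maxOutOfOrderness = maxOutOfOrderness;
        // Start low enough that the first watermark can't underflow.
        this.currentMaxTimestamp = Long.MIN_VALUE + maxOutOfOrderness;
    }

    @Override
    public long extractTimestamp(T element, long previousElementTimestamp) {
        // Reuse the timestamp the FlinkKafkaConsumer already assigned.
        currentMaxTimestamp = Math.max(currentMaxTimestamp, previousElementTimestamp);
        return previousElementTimestamp;
    }

    @Override
    public Watermark getCurrentWatermark() {
        // The watermark trails the highest timestamp seen so far by the allowed lag.
        return new Watermark(currentMaxTimestamp - maxOutOfOrderness);
    }
}
```

Usage would be `stream.assignTimestampsAndWatermarks(new KafkaTimestampWatermarks<>(3000))`. In newer Flink versions, `WatermarkStrategy.forBoundedOutOfOrderness(...)` with a `withTimestampAssigner` that returns the record timestamp covers this without copying any class.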

Kotlin Nested Object Classes

OK, so I've been starting to learn Kotlin for a week now, and I love the language :p
Besides the great utility of extension functions, I feel like it lacks a proper way of creating namespaces like Java utility classes (xxxUtil).
I have recently started to use this approach, which I'm not sure is the right one, and I would like some feedback from experienced Kotlin users.
Is this a valid and proper thing to do:
object RealmDb {
    private val realmInstance by lazy { Realm.getInstance(MainApplication.instance) }

    private fun wrapInTransaction(code: () -> Unit) {
        realmInstance.beginTransaction()
        code.invoke()
        realmInstance.commitTransaction()
    }

    object NormaNote {
        fun create(...) {...}
        fun update(...) {...}
    }
}
So, whenever I want to update some NormaNote value in a Realm database, I do the following:
RealmDb.NormaNote.create(title.text.toString(), note.text.toString())
Is this a common thing to do? Are there better approaches? As I understand it, this is singleton nesting, and I don't think there's any problem with it; I just don't like to put common things like DB operations inside classes that need to be instantiated. In old Java I opted for static classes.
The officially recommended way to create namespaces in Kotlin is to put properties and functions that don't need to be inside classes at the top level of the file, and to use the package statements to create a namespace hierarchy. We see the practice of creating utility classes in Java as a workaround for a deficiency in the language, and not as a good practice to be followed in other languages.
In your example, I would put all of the code in top-level functions and properties.
I don't know about the rest of the code, but I do know that you don't have to call .invoke() on code. The invoke method can always be shortened to a direct call, which in this case would be code().
