I'm learning Flink's DataSet API from the documentation.
There's a class called MyTupleReducer that I'm trying to complete:
https://paste.ubuntu.com/p/3CjphGQrXP/
but it's full of red lines in IntelliJ.
Could you show me the right way to write the code above?
Thanks for your help~!
PS:
I'm writing part of MyTupleReducer:
https://pastebin.ubuntu.com/p/m4rjs6t8QP/
but the return part is wrong.
I suspect that importing Reduce from Akka has thrown you off course. Flink's ReduceFunction<T> interface requires you to implement a reduce method with this signature:
T reduce(T value1, T value2) throws Exception;
This is a classic reduce that takes two objects of type T and produces a third object of the same type.
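Since the paste may expire, here's a minimal sketch of what such a reducer could look like; the Tuple2<String, Integer> element type and the summing logic are assumptions for illustration, not taken from your paste:

import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.java.tuple.Tuple2;

// Combines two tuples by keeping the first field and summing the counts.
public class MyTupleReducer implements ReduceFunction<Tuple2<String, Integer>> {
    @Override
    public Tuple2<String, Integer> reduce(Tuple2<String, Integer> value1,
                                          Tuple2<String, Integer> value2) throws Exception {
        // The result must have the same type T as both inputs.
        return new Tuple2<>(value1.f0, value1.f1 + value2.f1);
    }
}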
I am working with Flink 1.15.2. Should I use Row, or GenericRowData (which inherits from RowData), for my own data type? I mostly use the streaming API.
Thanks.
Sig.
In general the DataStream API is very flexible when it comes to record types. POJO types might be the most convenient ones. Basically any Java class can be used but you need to check which TypeInformation is extracted via reflection. Sometimes it is necessary to manually overwrite it.
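As an illustration of the POJO route (the SensorReading class below is invented for this example), Flink treats a class as a POJO when it is public, has a public no-argument constructor, and its fields are either public or reachable via getters and setters:

// Satisfies Flink's POJO rules, so it is serialized with the PojoSerializer
// rather than falling back to Kryo.
public class SensorReading {
    public String sensorId;
    public long timestamp;
    public double value;

    // Required public no-arg constructor.
    public SensorReading() {}

    public SensorReading(String sensorId, long timestamp, double value) {
        this.sensorId = sensorId;
        this.timestamp = timestamp;
        this.value = value;
    }
}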
For Row you will always have to provide the types manually as reflection cannot do much based on class signatures.
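For instance, a sketch of declaring the types by hand via returns() (the field layout and input format here are assumptions):

import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.types.Row;

public class RowTypeExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Reflection cannot infer a Row's field types, so they are declared
        // explicitly with returns(...).
        DataStream<Row> rows = env.fromElements("a:1", "b:2")
                .map(s -> Row.of(s.split(":")[0], Long.parseLong(s.split(":")[1])))
                .returns(Types.ROW(Types.STRING, Types.LONG));

        rows.print();
        env.execute("Row type example");
    }
}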
GenericRowData should be avoided; it is rather an internal class with many caveats (strings must be StringData, and array handling is not straightforward). Also, GenericRowData becomes BinaryRowData after deserialization. TL;DR: this type is meant for the SQL engine.
The docs are actually helpful here, I was confused too.
The section at the top titled "All Known Implementing Classes" lists all the implementations. RowData and GenericRowData are described as internal data structures. If you can use a POJO, then great. But if you need something that implements RowData, take a look at BinaryRowData, BoxedWrapperRowData, ColumnarRowData, NestedRowData, or any of the implementations there that aren't listed as internal.
I'm personally using NestedRowData to map a DataStream[Row] into a DataStream[RowData], and I'm not at all sure that's a good idea :) Especially since I can't seem to add a string attribute.
I have a snippet of code that looks like this:
DataStream<Tuple2<Long, Integer>> datastream = otherDatastream
.keyBy(event -> event.getField(1))
.process(new SomeFunction());
My SomeFunction is a class that extends KeyedProcessFunction, but trying this code results in a "Cannot resolve method process(SomeFunction)" error. I am unsure what the correct syntax would look like for this case.
It's necessary to get all of the details in SomeFunction exactly right: the type parameters, method overrides, etc. If you share all of the details we can be more helpful, but a good strategy, in general, is to rely on your IDE to generate the boilerplate for you.
For starters, make sure that the SomeFunction class extends KeyedProcessFunction<KEY, IN, OUT>, where KEY is whatever type is returned by event.getField(1), IN is whatever type event is, and OUT appears to be Tuple2<Long, Integer>.
Another strategy would be to start from working examples, like the ones in the Apache Flink training repository.
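Putting those pieces together, a minimal sketch might look like this; the Event class and the String key type are guesses based on your snippet, not known from your code:

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

// Hypothetical element type, standing in for whatever otherDatastream carries.
class Event {
    String category;
    String getField(int pos) { return category; } // simplified stand-in
}

// KEY = String (the type returned by event.getField(1) in this sketch),
// IN = Event, OUT = Tuple2<Long, Integer>.
public class SomeFunction
        extends KeyedProcessFunction<String, Event, Tuple2<Long, Integer>> {

    @Override
    public void processElement(Event event, Context ctx,
                               Collector<Tuple2<Long, Integer>> out) throws Exception {
        // Placeholder logic: emit the element's timestamp paired with a count
        // of 1 (assumes timestamps have been assigned upstream).
        out.collect(new Tuple2<>(ctx.timestamp(), 1));
    }
}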
I'm seeing some logs within my Flink app with respect to my Thrift classes:
2020-06-01 14:31:28 INFO TypeExtractor:1885 - Class class com.test.TestStruct contains custom serialization methods we do not call, so it cannot be used as a POJO type and must be processed as GenericType. Please read the Flink documentation on "Data Types & Serialization" for details of the effect on performance.
So I followed the instructions here:
https://flink.apache.org/news/2020/04/15/flink-serialization-tuning-vol-1.html#apache-thrift-via-kryo
And I did that for the Thrift definition of TestStruct, along with all the Thrift structs within it (I've skipped over named types though).
Also, the generated Thrift code is in Java, whereas the Flink app is written in Scala.
How would I make that message disappear? I'm also hitting another bug: when I pass my DataStream through the conversion into that TestStruct, some fields are missing. I suspect this is due to serialization issues?
Actually, as of now, you can't get rid of this warning, but it is also not a problem for the following reason:
The warning basically just says that Flink's type system is not using any of its internal serializers but will instead treat the type as a "generic type" which means, it is serialized via Kryo. If you followed my blog post on this, this is exactly what you want: use Kryo to serialize via Thrift. You could use a debugger to set a breakpoint into TBaseSerializer to verify that Thrift is being used.
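For reference, the registration step from that blog post boils down to a single line; TestStruct is the class from your question, and env is assumed to be your StreamExecutionEnvironment:

import com.twitter.chill.thrift.TBaseSerializer;

// Serialize the generated Thrift class via chill-thrift's TBaseSerializer
// whenever Kryo encounters it as a generic type.
env.getConfig().addDefaultKryoSerializer(TestStruct.class, TBaseSerializer.class);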
As for the missing fields, I would suspect that this happens during the conversion into your TestStruct in your (flat)map operator, and maybe not in the serialization that is used to pass this struct to the next operator. You should verify where these fields go missing - if you can reproduce it, a breakpoint in the debugger of your favourite IDE should help you find the cause.
First up, I'm very new to stream processing, feel free to correct me where I misunderstand concepts :)
I'm using Apache Flink.
My source is a FlinkKafkaConsumer, which already attaches the timestamps it takes from Kafka.
In my processing I want to be able to use a watermark (the why is out of scope for this question).
What I'm after is the watermark generation behaviour as provided by the abstract class BoundedOutOfOrdernessTimestampExtractor.
But this class only provides a:
public abstract long extractTimestamp(T element);
This gives you the element when you override it, but not the timestamp originally provided by the FlinkKafkaConsumer.
The TimestampAssigner interface, which BoundedOutOfOrdernessTimestampExtractor implements, does provide a long extractTimestamp(T element, long previousElementTimestamp) that hands you the previously assigned timestamp, so you could simply re-use it. But BoundedOutOfOrdernessTimestampExtractor declares its implementation of this method final, so it can't be overridden.
So my way of getting around this for now is to copy/paste the source code of BoundedOutOfOrdernessTimestampExtractor and rewrite it to use previousElementTimestamp as the timestamp (see the sketch below).
My question is: Is this indeed the best way to go about this, or is there a (better) alternative?
I'm just surprised at having to copy/paste classes; I'm used to (spoiled by) frameworks designed so that such 'basic' functionality can be done with what's included. (Or maybe what I want is actually very esoteric :)
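For what it's worth, a rough sketch of that rewritten extractor might look like this. The class name and details are my own, not from Flink; it keeps the bounded-out-of-orderness watermark logic but returns previousElementTimestamp instead of reading a timestamp from the element:

import org.apache.flink.streaming.api.functions.AssignerWithPeriodicWatermarks;
import org.apache.flink.streaming.api.watermark.Watermark;

// Reuses the timestamps already assigned by the FlinkKafkaConsumer while
// emitting watermarks that lag behind the maximum timestamp seen so far.
public class KafkaTimestampAssigner<T> implements AssignerWithPeriodicWatermarks<T> {

    private final long maxOutOfOrdernessMillis;
    private long currentMaxTimestamp;

    public KafkaTimestampAssigner(long maxOutOfOrdernessMillis) {
        this.maxOutOfOrdernessMillis = maxOutOfOrdernessMillis;
        // Start low, but avoid underflow when subtracting the bound below.
        this.currentMaxTimestamp = Long.MIN_VALUE + maxOutOfOrdernessMillis;
    }

    @Override
    public long extractTimestamp(T element, long previousElementTimestamp) {
        // Reuse the Kafka-provided timestamp instead of extracting a new one.
        currentMaxTimestamp = Math.max(currentMaxTimestamp, previousElementTimestamp);
        return previousElementTimestamp;
    }

    @Override
    public Watermark getCurrentWatermark() {
        return new Watermark(currentMaxTimestamp - maxOutOfOrdernessMillis);
    }
}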
I am a beginner to Spring AOP, and I am going through the Spring AOP documentation to understand the concepts, but I failed to understand 'target object'.
The documentation says the target object is the "object being advised by one or more aspects. Also referred to as the advised object".
What does "being advised by one or more aspects" mean here? Can anyone explain what a target object is in layman's terms, as I am still a beginner?
For a simple explanation of some basic AOP terms please refer to my other answer. Please read that one first before continuing to read here.
So the target object is the (Java or Spring) component to which you want to add new behaviour, usually a cross-cutting concern, i.e. some behaviour that is to be applied to many classes in your code base.
An aspect is a class in which you implement that cross-cutting concern and also determine where and how to apply it. The where is defined by a pointcut, some kind of search expression finding the relevant parts of your code base to apply the behaviour to. The how is implemented in an aspect method called an advice.
So when we say that an aspect advises an object, it means that it adds (cross-cutting) behaviour to it without changing the class itself.
In Spring AOP this is mostly method interception, i.e. doing something before or after a method executes.
In the more powerful AspectJ you can also intercept changes of member variables and constructor execution. Furthermore you can change the class structure itself by adding new members or methods or making the target class implement an interface etc.
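To make this concrete, here is a minimal Spring AOP sketch (the package name and the logging behaviour are invented for illustration):

import org.aspectj.lang.JoinPoint;
import org.aspectj.lang.annotation.Aspect;
import org.aspectj.lang.annotation.Before;
import org.springframework.stereotype.Component;

// The aspect: implements a cross-cutting concern (logging) and declares
// where to apply it. Every bean matched by the pointcut is a target object.
@Aspect
@Component
public class LoggingAspect {

    // The "where": a pointcut matching all methods in com.example.service.
    // The "how": this advice runs before each matched method on the target,
    // adding behaviour without changing the target class itself.
    @Before("execution(* com.example.service..*(..))")
    public void logBefore(JoinPoint joinPoint) {
        System.out.println("About to execute: " + joinPoint.getSignature());
    }
}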
Is it possible to define multiple targets, like below?
@Before(value = "com.test.createUpdateDeletePointCut() && (target(com.testlab.A) || target(com.testlab.B))")