Flink re-scalable keyed stream stateful function - apache-flink

I have the following Flink job, where I tried to use a keyed-stream stateful function (MapState) with the RocksDB state backend:
environment
.addSource(consumer).name("MyKafkaSource").uid("kafka-id")
.flatMap(pojoMapper).name("MyMapFunction").uid("map-id")
.keyBy(new MyKeyExtractor())
.map(new MyRichMapFunction()).name("MyRichMapFunction").uid("rich-map-id")
.addSink(sink).name("MyFileSink").uid("sink-id")
MyRichMapFunction is a stateful function that extends RichMapFunction and has the following code:
public static class MyRichMapFunction extends RichMapFunction<MyEvent, MyEvent> {
    private transient MapState<String, Boolean> cache;
    @Override
    public void open(Configuration config) {
        MapStateDescriptor<String, Boolean> descriptor =
            new MapStateDescriptor<>("seen-values", TypeInformation.of(new TypeHint<String>() {}), TypeInformation.of(new TypeHint<Boolean>() {}));
        cache = getRuntimeContext().getMapState(descriptor);
    }
    @Override
    public MyEvent map(MyEvent value) throws Exception {
        // An event id that is already in the cache has been seen before.
        if (cache.contains(value.getEventId())) {
            value.setIsSeenAlready(Boolean.TRUE);
            return value;
        }
        value.setIsSeenAlready(Boolean.FALSE);
        cache.put(value.getEventId(), Boolean.TRUE);
        return value;
    }
}
In the future, I would like to rescale the parallelism (from 2 to 4). My question is: how can I achieve re-scalable keyed state, so that after changing the parallelism each key's cached data ends up in the task slot that now owns that key? While exploring this I found some documentation here. According to it, re-scalable operator state can be achieved with the ListCheckpointed interface, which provides snapshotState/restoreState methods for that purpose. But I am not sure how re-scalable keyed state (as in MyRichMapFunction) can be achieved. Do I need to implement the ListCheckpointed interface in my MyRichMapFunction class? If so, how can I redistribute the cache according to the key hash for the new parallelism in the restoreState method? (My MapState will hold a huge number of keys with TTL enabled; say, at most 1 billion keys at any point in time.) Could someone please help me with this, or point me to an example?

The code you've written is already rescalable; Flink's managed keyed state is rescalable by design. Keyed state is rescaled by rebalancing the assignment of keys to instances. (You can think of keyed state as a sharded key/value store. Technically what happens is that consistent hashing is used to map keys to key groups, and each parallel instance is responsible for some of the key groups. Rescaling simply involves redistributing the key groups among the instances.)
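To make the key-group idea concrete, here is a minimal plain-Java sketch of the two-step mapping (key to key group, key group to parallel instance). It is an illustration, not Flink's implementation: the class name KeyGroupSketch, the fixed MAX_PARALLELISM of 128, and the simple modulo hash are assumptions for this example (Flink, as far as I know, applies a murmur hash to key.hashCode()).

```java
/** Plain-Java illustration of key groups; not Flink's actual implementation. */
final class KeyGroupSketch {

    // Assumed for this sketch: the number of key groups (the job's max parallelism).
    static final int MAX_PARALLELISM = 128;

    /** Key -> key group. Stand-in hash; it never depends on the current parallelism. */
    static int keyGroupFor(Object key) {
        return Math.floorMod(key.hashCode(), MAX_PARALLELISM);
    }

    /** Key group -> index of the parallel instance that owns it (contiguous ranges). */
    static int operatorIndexFor(int keyGroup, int parallelism) {
        return keyGroup * parallelism / MAX_PARALLELISM;
    }

    public static void main(String[] args) {
        for (String eventId : new String[] {"evt-1", "evt-2", "evt-3"}) {
            int group = keyGroupFor(eventId);
            System.out.printf("%s: key group %d, instance %d of 2, instance %d of 4%n",
                    eventId, group, operatorIndexFor(group, 2), operatorIndexFor(group, 4));
        }
    }
}
```

The point of the sketch is that a key's key group is fixed; only the ownership of key groups changes when the parallelism goes from 2 to 4, so the state can be moved group by group without rehashing individual keys.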
The ListCheckpointed interface is for state used in a non-keyed context, so it's inappropriate for what you are doing. Note also that ListCheckpointed will be deprecated in Flink 1.11 in favor of the more general CheckpointedFunction.
One more thing: if MyKeyExtractor is keying by value.getEventId(), then you could be using ValueState<Boolean> for your cache, rather than MapState<String, Boolean>. This works because with keyed state there is a separate value of ValueState for every key. You only need to use MapState when you need to store multiple attribute/value pairs for each key in your stream.
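The scoping of ValueState can be pictured with a plain-Java stand-in. This is only a mental model, assuming the stream is keyed by event id: the class KeyedStateScopeSketch and its HashMap are hypothetical and stand in for what the state backend does per key.

```java
import java.util.HashMap;
import java.util.Map;

/** Mental model of ValueState<Boolean> on a stream keyed by event id; not Flink code. */
final class KeyedStateScopeSketch {

    // Stand-in for the state backend: one Boolean slot per stream key.
    private final Map<String, Boolean> valuePerKey = new HashMap<>();

    /** Returns true if this event id (the stream key) was seen before, then marks it seen. */
    boolean seenBefore(String eventId) {
        return valuePerKey.put(eventId, Boolean.TRUE) != null;
    }

    /** Like calling clear() on keyed state: only the current key's entry is removed. */
    void clear(String currentKey) {
        valuePerKey.remove(currentKey);
    }
}
```

In the real job the map disappears: the function holds a single ValueState<Boolean>, and Flink selects the slot for the current key before each call to map().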
Most of this is discussed in the Flink documentation under Hands-on Training, which includes an example that's very close to what you are doing.

Related

How to configure RocksDB StateBackend parameters for specific keyed state in Flink

I have a Flink job using RocksDBStateBackend.
There are several ProcessFunctions, and each of them uses MapState.
Suppose one of the MapStates is rather small (200~300 MB) and I want to ensure high read QPS. I want to configure a large enough block cache for that MapState, while the other MapStates use the managed memory of the task slot, so that the total memory in the block cache won't grow too much.
public static class MyProcessFunction1 extends KeyedProcessFunction<Integer, String, Long> {
// this map state will store a lot of kv pairs,
// so I hope to configure rocksdb column family options
// optimized for writes
private transient MapState<byte[], byte[]> largeMapState;
}
public static class MyProcessFunction2 extends KeyedProcessFunction<Integer, String, Long> {
// this map state will store a few dozen kv pairs,
// so I hope to configure rocksdb column family options
// optimized for a small db; furthermore, I want this
// map state not to share managed memory with other state.
private transient MapState<byte[], byte[]> smallMapState;
}

Instance of object related to flink Parallelism & Apply Method

First let me ask my question; then could you please clarify my assumption about the apply method?
Question: If my application creates approximately 1,500,000 records every minute, and a Flink job reads these records from a Kafka consumer with, let's say, 15+ different operators, could this logic create latency, backpressure, etc.? (You may assume that the parallelism is 16.)
public class Sample{
//op1 =
kafkaSource
.keyBy(something)
.timeWindow(Time.minutes(1))
.apply(new ApplySomething())
.name("Name")
.addSink(kafkaSink);
//op2 =
kafkaSource
.keyBy(something2)
.timeWindow(Time.seconds(1)) // let's assume that this one second
.apply(new ApplySomething2())
.name("Name")
.addSink(kafkaSink);
// ...
//op16 =
kafkaSource
.keyBy(something16)
.timeWindow(Time.minutes(1))
.apply(new ApplySomething16())
.name("Name")
.addSink(kafkaSink);
}
// ..
public class ApplySomething ... {
    private AnyObject object;
    private int threshold = 30; // differs per operator: 30, 40, 100, ...
    @Override
    public void open(Configuration parameters) throws Exception {
        object = new AnyObject();
    }
    @Override
    public void apply(Tuple tuple, TimeWindow window, Iterable<Record> input, Collector<Result> out) throws Exception {
        int counter = 0;
        for (Record each : input) {
            counter += each.getValue();
            // emit once the running total exceeds the threshold, then stop
            if (counter > threshold) {
                out.collect(each.getResult());
                return;
            }
        }
    }
}
If yes, should I use flatMap with state (RocksDB) instead of timeWindow?
My prediction is "YES". Let me explain why I think so:
If the parallelism is 16, then there will be 16 different instances of each of ApplySomething1(), ApplySomething2(), ..., ApplySomething16(), and also sixteen AnyObject() instances for each ApplySomething..() class.
When the application runs, if the number of distinct keys from keyBy(something) is larger than 16 (assume my application has 1,000,000 different something values per day), then some of the ApplySomething..() instances will handle several different keys, so one apply() call has to wait for the others' for loops to finish before it can process. Will this create latency?
Flink's time windows are aligned to the epoch (e.g., if you have a bunch of hourly windows, they will all trigger on the hour). So if you do intend to have a bunch of different windows in your job like this, you should configure them to have distinct offsets, so they aren't all triggered simultaneously. Doing that will spread out the load. It will look something like this:
.window(TumblingProcessingTimeWindows.of(Time.minutes(1), Time.seconds(15)))
(or use TumblingEventTimeWindows as the case may be). This will create minute-long windows that trigger at 15 seconds after each minute.
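The effect of the offset is easiest to see in the window-start arithmetic. The sketch below is plain Java, not a Flink API: the class WindowOffsetSketch is hypothetical, and the formula mirrors what I understand Flink's tumbling-window assignment to do, assuming non-negative timestamps and 0 <= offset < size.

```java
/** Window-start arithmetic for tumbling windows with an offset; illustration only. */
final class WindowOffsetSketch {

    /** Start (in ms since the epoch) of the window that contains timestampMs. */
    static long windowStart(long timestampMs, long sizeMs, long offsetMs) {
        return timestampMs - (timestampMs - offsetMs + sizeMs) % sizeMs;
    }

    public static void main(String[] args) {
        long size = 60_000L;   // one-minute windows
        long ts = 100_000L;    // an event at 1 min 40 s after the epoch
        System.out.println("no offset:   window starts at " + windowStart(ts, size, 0L));
        System.out.println("15 s offset: window starts at " + windowStart(ts, size, 15_000L));
    }
}
```

With no offset the event falls into the window starting at 60000 ms; with a 15-second offset it falls into the one starting at 75000 ms, so the two operators fire at different moments.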
Whenever your use case permits, you should use incremental aggregation (via reduce or aggregate), rather than using a WindowFunction (or ProcessWindowFunction) that has to collect all of the events assigned to each window in a list before processing them as a sort of mini-batch.
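The difference between the two styles can be sketched in plain Java. The Aggregator interface below is hypothetical; it is only loosely modeled on the shape of Flink's AggregateFunction (createAccumulator, add, getResult) so that the sketch runs without Flink on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

/** Contrasts incremental aggregation with buffering a whole window; illustration only. */
final class IncrementalAggregationSketch {

    /** Hypothetical interface, loosely shaped like Flink's AggregateFunction. */
    interface Aggregator<IN, ACC, OUT> {
        ACC createAccumulator();
        ACC add(IN value, ACC accumulator);
        OUT getResult(ACC accumulator);
    }

    /** Running sum: the only per-window state is a single Long. */
    static final class SumAggregator implements Aggregator<Integer, Long, Long> {
        @Override public Long createAccumulator() { return 0L; }
        @Override public Long add(Integer value, Long accumulator) { return accumulator + value; }
        @Override public Long getResult(Long accumulator) { return accumulator; }
    }

    /** Incremental style: each element is folded into the accumulator on arrival. */
    static long sumIncrementally(int[] windowValues) {
        SumAggregator agg = new SumAggregator();
        Long acc = agg.createAccumulator();
        for (int v : windowValues) {
            acc = agg.add(v, acc);
        }
        return agg.getResult(acc);
    }

    /** WindowFunction style: all elements are buffered, then processed when the window fires. */
    static long sumBuffered(int[] windowValues) {
        List<Integer> buffer = new ArrayList<>();
        for (int v : windowValues) {
            buffer.add(v);
        }
        long sum = 0;
        for (int v : buffer) {
            sum += v;
        }
        return sum;
    }
}
```

Both methods return the same sum, but the incremental version keeps constant state per window and spreads the work over the window's lifetime instead of doing it all when the window fires.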
A keyed time window will keep its state in RocksDB, assuming you have configured RocksDB as your state backend. You don't need to switch to using a RichFlatMap to have access to RocksDB. (Moreover, since a flatMap can't use timers, I assume you would really end up using a process function instead.)
While any of the parallel instances of the window operator is busy executing its window function (one of the ApplySomethings) you are correct in thinking that that task will not be doing anything else -- and thus it will (unless it completes very quickly) create temporary backpressure. You will want to increase the parallelism as needed so that the job can satisfy your requirements for throughput and latency.

State handling on KeyedCoProcessFunction serving ML models

I am working on a KeyedCoProcessFunction that looks like this:
class MyOperator extends KeyedCoProcessFunction[String, ModelDef, Data, Prediction]
with CheckpointedFunction {
// To hold loaded models
@transient private var models: HashMap[(String, String), Model] = _
// For serialization purposes
@transient private var modelsBytes: MapState[(String, String), Array[Byte]] = _
...
override def snapshotState(context: FunctionSnapshotContext): Unit = {
modelsBytes.clear() // This raises an exception when there is no active key set
for ((k, model) <- models) {
modelsBytes.put(k, model.toBytes(v))
}
}
override def initializeState(context: FunctionInitializationContext): Unit = {
modelsBytes = context.getKeyedStateStore.getMapState[String, String](
new MapStateDescriptor("modelsBytes", classOf[String], classOf[String])
)
if (context.isRestored) {
// restore models from modelsBytes
}
}
}
The state consists of a collection of ML models built using a third party library. Before checkpoints, I need to dump the loaded models into byte arrays in snapshotState.
My question is, within snapshotState, modelsBytes.clear() raises an exception when there is no active key. This happens when I start the application from scratch without any data on the input streams. So, when the time for a checkpoint comes, I get this error:
java.lang.NullPointerException: No key set. This method should not be called outside of a keyed context.
However, when the input stream contains data, checkpoints work just fine. I am a bit confused about this because snapshotState does not provide a keyed context (contrary to processElement1 and processElement2, where the current key is accessible by doing ctx.getCurrentKey) so it seems to me that the calls to clear and put within snapshotState should fail always since they're supposed to work only within a keyed context. Can anyone clarify if this is the expected behaviour actually?
A keyed state can only be used on a keyed stream as written in the documentation.
* <p>The state is only accessible by functions applied on a {@code KeyedStream}. The key is
* automatically supplied by the system, so the function always sees the value mapped to the
* key of the current element. That way, the system can handle stream and state partitioning
* consistently together.
If you call clear(), you will not clear the whole map, but just reset the state of the current key. The key is always known in processElementX.
/**
* Removes the value mapped under the current key.
*/
void clear();
You should actually receive a better exception when you try to call clear in a function other than processElementX. In the end, you are using the keyed state incorrectly.
Now for your actual problem. I'm assuming you are using a KeyedCoProcessFunction because the models are updated via a separate input. If they are static, you could just load them in open() from a static source (for example, included in the jar). Furthermore, often there is only one model that is applied to all values regardless of key; in that case you could use broadcast state. So I'm assuming you have different models for different types of data, separated by key.
If they are coming in from input2, then you already serialize them upon invocation of processElement2.
override def processElement2(model: Model, ctx: Context, collector): Unit = {
models.put(ctx.getCurrentKey, model)
modelsBytes.put(ctx.getCurrentKey, model.toBytes(v))
}
Then you would not override snapshotState, as the state is already up-to-date. initializeState would deserialize models eagerly or you could also materialize them lazily in processElement1.
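The lazy variant can be sketched in plain Java. Everything here is a stand-in and an assumption for illustration: the class LazyModelCacheSketch is hypothetical, a HashMap of byte arrays plays the role of the keyed state, a String plays the role of the third-party model, and the method names only hint at processElement2 and processElement1.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Serialize on update, deserialize lazily on first use after a restart; illustration only. */
final class LazyModelCacheSketch {

    // Stand-in for keyed state holding serialized models (checkpointed by Flink in a real job).
    private final Map<String, byte[]> modelBytes = new HashMap<>();
    // Transient cache of deserialized models; it is empty again after a restart.
    private final Map<String, String> loaded = new HashMap<>();
    int deserializations = 0;

    /** Roughly processElement2: keep the live model and its serialized form in sync. */
    void onModel(String key, String model) {
        loaded.put(key, model);
        modelBytes.put(key, model.getBytes(StandardCharsets.UTF_8));
    }

    /** Roughly processElement1: use the cached model, rebuilding it from bytes if needed. */
    String modelFor(String key) {
        String model = loaded.get(key);
        if (model == null) {
            byte[] bytes = modelBytes.get(key);
            if (bytes == null) {
                return null; // no model has arrived for this key yet
            }
            model = new String(bytes, StandardCharsets.UTF_8);
            deserializations++;
            loaded.put(key, model);
        }
        return model;
    }

    /** Simulates recovery: state survives, the transient cache does not. */
    void simulateRestart() {
        loaded.clear();
    }
}
```

Because the serialized form is written at the moment the model arrives, there is nothing left to do at checkpoint time, which is why snapshotState no longer needs to touch keyed state.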

how can I implement keyed window timeouts in Flink?

I have keyed events coming in on a stream that I would like to accumulate by key, up to a timeout (say, 5 minutes), and then process the events accumulated up to that point (and ignore everything after for that key, but first things first).
I am new to Flink, but conceptually I think I need something like the code below.
DataStream<Tuple2<String, String>> dataStream = see
.socketTextStream("localhost", 9999)
.flatMap(new Splitter())
.keyBy(0)
.window(GlobalWindows.create())
.trigger(ProcessingTimeTrigger.create()) // how do I set the timeout value?
.fold(new Tuple2<>("", ""), new FoldFunction<Tuple2<String, String>, Tuple2<String, String>>() {
public Tuple2<String, String> fold(Tuple2<String, String> agg, Tuple2<String, String> elem) {
if ( agg.f0.isEmpty()) {
agg.f0 = elem.f0;
}
if ( agg.f1.isEmpty()) {
agg.f1 = elem.f1;
} else {
agg.f1 = agg.f1 + "; " + elem.f1;
}
return agg;
}
});
This code does NOT compile because a ProcessingTimeTrigger needs a TimeWindow, and GlobalWindow is not a TimeWindow. So...
How can I accomplish keyed window timeouts in Flink?
You will have a much easier time if you approach this with a KeyedProcessFunction.
I suggest establishing an item of keyed ListState in the open() method of a KeyedProcessFunction. In the processElement() method, if the list is empty, set a processing-time timer (a per-key timer, relative to the current time) to fire when you want the window to end. Then append the incoming event to the list.
When the timer fires the onTimer() method will be called, and you can iterate over the list, produce a result, and clear the list.
To make sure all of this happens only once per key, add a ValueState<Boolean> to the KeyedProcessFunction to record that the key has already been processed. (Note that if your key space is unbounded, you should think about a strategy for eventually expiring the state for stale keys.)
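The control flow described above can be simulated in plain Java. This is not Flink code: the class KeyedTimeoutSketch is hypothetical, the maps and the set stand in for ListState, per-key timers, and ValueState<Boolean>, and time is passed in explicitly instead of coming from a timer service.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Simulates per-key buffering with a timeout; the collections stand in for Flink state. */
final class KeyedTimeoutSketch {

    private final long timeoutMs;
    private final Map<String, List<String>> buffers = new HashMap<>(); // stand-in for ListState
    private final Map<String, Long> timers = new HashMap<>();          // stand-in for per-key timers
    private final Set<String> done = new HashSet<>();                  // stand-in for ValueState<Boolean>

    KeyedTimeoutSketch(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    /** Like processElement(): buffer the event, arming a timer on the first one per key. */
    void onElement(String key, String value, long nowMs) {
        if (done.contains(key)) {
            return; // this key was already processed once; ignore later events
        }
        List<String> buffer = buffers.computeIfAbsent(key, k -> new ArrayList<>());
        if (buffer.isEmpty()) {
            timers.put(key, nowMs + timeoutMs);
        }
        buffer.add(value);
    }

    /** Like onTimer(): for every timer due at nowMs, emit the joined events and clear the buffer. */
    Map<String, String> advanceTo(long nowMs) {
        Map<String, String> results = new TreeMap<>();
        Iterator<Map.Entry<String, Long>> it = timers.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> timer = it.next();
            if (timer.getValue() <= nowMs) {
                String key = timer.getKey();
                results.put(key, String.join("; ", buffers.remove(key)));
                done.add(key);
                it.remove();
            }
        }
        return results;
    }
}
```

In a real KeyedProcessFunction the key is implicit, so the three collections collapse into one ListState, one registered processing-time timer, and one ValueState<Boolean> per key.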
The documentation describes how to use Process Functions and how to work with state. You can find additional examples in the Flink training site, such as this exercise.

Is there an equivalent to Kafka's KTable in Apache Flink?

Apache Kafka has a concept of a KTable, where "each data record represents an update".
Essentially, I can consume a Kafka topic and keep only the latest message per key.
Is there a similar concept available in Apache Flink? I have read about Flink's Table API, but it does not seem to solve the same problem.
Some help comparing and contrasting the 2 frameworks would be helpful. I am not looking for which is better or worse. But rather just how they differ. The answer for which is right would then depend on my requirements.
You are right. Flink's Table API and its Table class do not correspond to Kafka's KTable. The Table API is a relational language-embedded API (think of SQL integrated in Java and Scala).
Flink's DataStream API does not have a built-in concept that corresponds to a KTable. Instead, Flink offers sophisticated state management and a KTable would be a regular operator with keyed state.
For example, a stateful operator with two inputs that stores the latest value observed from the first input and joins it with values from the second input, can be implemented with a CoFlatMapFunction as follows:
DataStream<Tuple2<Long, String>> first = ...
DataStream<Tuple2<Long, String>> second = ...
DataStream<Tuple2<String, String>> result = first
// connect first and second stream
.connect(second)
// key both streams on the first (Long) attribute
.keyBy(0, 0)
// join them
.flatMap(new TableLookup());
// ------
public static class TableLookup
    extends RichCoFlatMapFunction<Tuple2<Long,String>, Tuple2<Long,String>, Tuple2<String,String>> {
    // keyed state
    private ValueState<String> lastVal;
    @Override
    public void open(Configuration conf) {
        ValueStateDescriptor<String> valueDesc =
            new ValueStateDescriptor<String>("table", Types.STRING);
        lastVal = getRuntimeContext().getState(valueDesc);
    }
    @Override
    public void flatMap1(Tuple2<Long, String> value, Collector<Tuple2<String, String>> out) throws Exception {
        // update the value for the current Long key with the String value.
        lastVal.update(value.f1);
    }
    @Override
    public void flatMap2(Tuple2<Long, String> value, Collector<Tuple2<String, String>> out) throws Exception {
        // look up latest String for current Long key.
        String lookup = lastVal.value();
        // emit current String and looked-up String
        out.collect(Tuple2.of(value.f1, lookup));
    }
}
In general, state can be used very flexibly with Flink and lets you implement a wide range of use cases. There are also more state types, such as ListState and MapState, and with a ProcessFunction you have fine-grained control over time, for example to remove the state of a key if it has not been updated for a certain amount of time (KTables have a configuration for that as far as I know).
