Why does Flink not drop late data? - apache-flink

I am calculating the maximum value of a simple stream, and the result is:
(S1,1000,S1(100) value: 999)
(S1,2000,S1(1000) value: 41)
The last reading is obviously late: new SensorReading("S1", 999, 100L).
Why was it included in the first window (0-1000)?
I think the first window should have fired when SensorReading("S1", 41, 1000L) arrived.
I am very confused by this result.
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
env.setParallelism(TrainingBase.parallelism);
DataStream<SensorReading> input = env.fromElements(
new SensorReading("S1", 35, 500L),
new SensorReading("S1", 42, 999L),
new SensorReading("S1", 41, 1000L),
new SensorReading("S1", 40, 1200L),
new SensorReading("S1", 23, 1400L),
new SensorReading("S1", 999, 100L)
);
input.assignTimestampsAndWatermarks(new AssignerWithPeriodicWatermarks<SensorReading>() {
private long currentMaxTimestamp;
@Nullable
@Override
public Watermark getCurrentWatermark() {
return new Watermark(currentMaxTimestamp);
}
@Override
public long extractTimestamp(SensorReading element, long previousElementTimestamp) {
currentMaxTimestamp = element.ts;
return currentMaxTimestamp;
}
})
.keyBy((KeySelector<SensorReading, String>) value -> value.sensorName)
.window(TumblingEventTimeWindows.of(Time.seconds(1)))
.reduce(new MyReducingMax(), new MyWindowFunction())
.print();
env.execute();
MyReducingMax and MyWindowFunction:
private static class MyReducingMax implements ReduceFunction<SensorReading> {
public SensorReading reduce(SensorReading r1, SensorReading r2) {
return r1.getValue() > r2.getValue() ? r1 : r2;
}
}
private static class MyWindowFunction extends
ProcessWindowFunction<SensorReading, Tuple3<String, Long, SensorReading>, String, TimeWindow> {
@Override
public void process(
String key,
Context context,
Iterable<SensorReading> maxReading,
Collector<Tuple3<String, Long, SensorReading>> out) {
SensorReading max = maxReading.iterator().next();
out.collect(new Tuple3<>(key, context.window().getEnd(), max));
}
}
public static class SensorReading {
String sensorName;
int value;
Long ts;
public SensorReading() {
}
public SensorReading(String sensorName, int value, Long ts) {
this.sensorName = sensorName;
this.value = value;
this.ts = ts;
}
public Long getTs() {
return ts;
}
public void setTs(Long ts) {
this.ts = ts;
}
public String getSensorName() {
return sensorName;
}
public void setSensorName(String sensorName) {
this.sensorName = sensorName;
}
public int getValue() {
return value;
}
public void setValue(int value) {
this.value = value;
}
public String toString() {
return this.sensorName + "(" + this.ts + ") value: " + this.value;
}
}

An AssignerWithPeriodicWatermarks doesn't create a Watermark at every conceivable opportunity. Instead, Flink calls such an assigner periodically to get the latest watermark, and by default this is done every 200 msec (of real time, not event time). This interval is controlled by ExecutionConfig.setAutoWatermarkInterval(...).
This means that all six of your test events have almost certainly been processed before your watermark assigner could be called.
If you care about having more predictable watermarking, you could use an AssignerWithPunctuatedWatermarks instead.
BTW, the way that your watermark assigner is written, all of the out-of-order events are potentially late. It is more typical to use a BoundedOutOfOrdernessTimestampExtractor that allows for some out-of-orderness.
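For example, here is a minimal sketch of both options, assuming the SensorReading type and input stream from the question (the 50 ms interval and the 2-second bound are illustrative values, not recommendations):
// Option 1: have periodic watermarks generated more often (wall-clock time).
env.getConfig().setAutoWatermarkInterval(50);

// Option 2: allow for bounded out-of-orderness. The watermark trails the
// largest timestamp seen so far by 2 seconds, so in the example data the
// out-of-order reading at ts=100 stays ahead of the watermark.
input.assignTimestampsAndWatermarks(
        new BoundedOutOfOrdernessTimestampExtractor<SensorReading>(Time.seconds(2)) {
            @Override
            public long extractTimestamp(SensorReading element) {
                return element.ts;
            }
        });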

Related

Why was clear method in custom trigger of global window not been invoked?

I have used a global window with a custom trigger, and I noticed that the state size in every checkpoint keeps increasing. I set breakpoints in the clear method and found that it never seems to be invoked, so I suspect the state keeps growing because clear is never called.
Main method:
final StreamExecutionEnvironment see = StreamExecutionEnvironment.getExecutionEnvironment();
see.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime);
see.enableCheckpointing(5000L, CheckpointingMode.EXACTLY_ONCE);
see.getCheckpointConfig().setMinPauseBetweenCheckpoints(1000L);
see.setStateBackend(new MemoryStateBackend());
see.getCheckpointConfig().setCheckpointTimeout(3000L);
DataStream<String> dataStream = generateData(see);
dataStream.flatMap(new FlatMapFunction<String, Tuple2<String,Integer>>() {
@Override
public void flatMap(String line, Collector<Tuple2<String,Integer>> collector) throws Exception {
String[] split = line.split(" ");
for (String s1 : split) {
collector.collect(new Tuple2<>(s1,1));
}
}
}).keyBy(0).window(GlobalWindows.create())
.trigger(PurgingTrigger.of(CountWithTimeoutTrigger.of(10, 1000L)))
.process(new CustomProcessWindow())
.print().setParallelism(1);
see.execute();
Trigger implementation:
public class CountWithTimeoutTrigger<T, W extends Window> extends Trigger<T, W> {
private static final long serialVersionUID = 1L;
private final long maxCount;
private final long timeoutMs;
private final ValueStateDescriptor<Long> countDesc = new ValueStateDescriptor<>("count", LongSerializer.INSTANCE, 0L);
private final ValueStateDescriptor<Long> deadlineDesc = new ValueStateDescriptor<>("deadline", LongSerializer.INSTANCE, Long.MAX_VALUE);
private CountWithTimeoutTrigger(long maxCount, long timeoutMs) {
this.maxCount = maxCount;
this.timeoutMs = timeoutMs;
}
@Override
public TriggerResult onElement(T element, long timestamp, W window, Trigger.TriggerContext ctx) throws IOException {
final ValueState<Long> deadline = ctx.getPartitionedState(deadlineDesc);
final ValueState<Long> count = ctx.getPartitionedState(countDesc);
final long currentDeadline = deadline.value();
final long currentTimeMs = System.currentTimeMillis();
final long newCount = count.value() + 1;
if (currentTimeMs >= currentDeadline || newCount >= maxCount) {
return fire(deadline, count);
}
if (currentDeadline == deadlineDesc.getDefaultValue()) {
final long nextDeadline = currentTimeMs + timeoutMs;
deadline.update(nextDeadline);
ctx.registerProcessingTimeTimer(nextDeadline);
}
count.update(newCount);
return TriggerResult.CONTINUE;
}
@Override
public TriggerResult onEventTime(long time, W window, Trigger.TriggerContext ctx) {
return TriggerResult.CONTINUE;
}
@Override
public TriggerResult onProcessingTime(long time, W window, Trigger.TriggerContext ctx) throws Exception {
final ValueState<Long> deadline = ctx.getPartitionedState(deadlineDesc);
// fire only if the deadline hasn't changed since registering this timer
if (deadline.value() == time) {
return fire(deadline, ctx.getPartitionedState(countDesc));
}
return TriggerResult.CONTINUE;
}
@Override
public void clear(W window, TriggerContext ctx) throws Exception {
// ***** this method is never invoked *****
final ValueState<Long> deadline = ctx.getPartitionedState(deadlineDesc);
final ValueState<Long> cntState = ctx.getPartitionedState(countDesc);
final long deadlineValue = deadline.value();
if (deadlineValue != deadlineDesc.getDefaultValue()) {
ctx.deleteProcessingTimeTimer(deadlineValue);
}
deadline.clear();
cntState.clear();
}
private TriggerResult fire(ValueState<Long> deadline, ValueState<Long> count) throws IOException {
deadline.update(Long.MAX_VALUE);
count.update(0L);
return TriggerResult.FIRE;
}
public static <T, W extends Window> CountWithTimeoutTrigger<T, W> of(long maxCount, long intervalMs) {
return new CountWithTimeoutTrigger<>(maxCount, intervalMs);
}
}
I expect the clear method to be called so that the state can be cleaned up there, but it is never invoked, and the state size in every checkpoint keeps increasing.
The Trigger.clear() method is invoked when the window is closed. This happens when the application time (processing time or event time as defined by WindowAssigner.isEventTime()) reaches the end timestamp of the window.
Since a GlobalWindow never ends, the end timestamp of a GlobalWindow is Long.MAX_VALUE. Hence, the Trigger.clear() method will never be called if the trigger is applied on a GlobalWindow.
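One way to work around this, sketched below under the assumption that the CountWithTimeoutTrigger above stays on a GlobalWindow, is to release the timer and the state in the fire path instead of relying on clear():
// A sketch: clean up in fire() because clear() is never called for
// GlobalWindows. Call this as fire(ctx) from both onElement and
// onProcessingTime in place of the two-argument fire().
private TriggerResult fire(Trigger.TriggerContext ctx) throws IOException {
    final ValueState<Long> deadline = ctx.getPartitionedState(deadlineDesc);
    final ValueState<Long> count = ctx.getPartitionedState(countDesc);
    final Long deadlineValue = deadline.value();
    if (deadlineValue != null && deadlineValue != Long.MAX_VALUE) {
        ctx.deleteProcessingTimeTimer(deadlineValue); // drop the pending timer
    }
    deadline.clear(); // remove the state entries instead of storing defaults
    count.clear();
    return TriggerResult.FIRE; // PurgingTrigger turns this into FIRE_AND_PURGE
}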

How to sort the union datastream of flink without watermark

My Flink job has multiple data streams, which I merge with the org.apache.flink.streaming.api.datastream.DataStream#union method.
The problem is that the resulting datastream is out of order, and I cannot set a window to sort the data in the stream.
Sorting union of streams to identify user sessions in Apache Flink
I found the answer there, but com.liam.learn.flink.example.union.UnionStreamDemo.SortFunction#onTimer is never invoked.
Environment info: Flink version 1.7.0
In general, I hope to sort the union datastream without watermarks.
You need watermarks so that the sorting function knows when it can safely emit sorted elements. Without watermarks, you could get a record from stream B that has an earlier timestamp than any of the first N records of stream A, right?
But adding watermarks is easy, especially if you know that "event time" is strictly increasing for any one stream. Below is some code I wrote that extends what David Anderson posted in his answer to the other SO issue you referenced above - hopefully this will get you started.
-- Ken
package com.scaleunlimited.flinksnippets;
import java.util.PriorityQueue;
import java.util.Random;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.TypeHint;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.TimerService;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.streaming.api.functions.source.RichParallelSourceFunction;
import org.apache.flink.streaming.api.functions.timestamps.AscendingTimestampExtractor;
import org.apache.flink.util.Collector;
import org.junit.Test;
public class MergeAndSortStreamsTest {
@Test
public void testMergeAndSort() throws Exception {
StreamExecutionEnvironment env = StreamExecutionEnvironment.createLocalEnvironment(2);
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
DataStream<Event> streamA = env.addSource(new EventSource("A"))
.assignTimestampsAndWatermarks(new EventTSWAssigner());
DataStream<Event> streamB = env.addSource(new EventSource("B"))
.assignTimestampsAndWatermarks(new EventTSWAssigner());
streamA.union(streamB)
.keyBy(r -> r.getKey())
.process(new SortByTimestampFunction())
.print();
env.execute();
}
private static class Event implements Comparable<Event> {
private String _label;
private long _timestamp;
public Event(String label, long timestamp) {
_label = label;
_timestamp = timestamp;
}
public String getLabel() {
return _label;
}
public void setLabel(String label) {
_label = label;
}
public String getKey() {
return "1";
}
public long getTimestamp() {
return _timestamp;
}
public void setTimestamp(long timestamp) {
_timestamp = timestamp;
}
@Override
public String toString() {
return String.format("%s @ %d", _label, _timestamp);
}
@Override
public int compareTo(Event o) {
return Long.compare(_timestamp, o._timestamp);
}
}
@SuppressWarnings("serial")
private static class EventTSWAssigner extends AscendingTimestampExtractor<Event> {
@Override
public long extractAscendingTimestamp(Event element) {
return element.getTimestamp();
}
}
@SuppressWarnings("serial")
private static class SortByTimestampFunction extends KeyedProcessFunction<String, Event, Event> {
private ValueState<PriorityQueue<Event>> queueState = null;
@Override
public void open(Configuration config) {
ValueStateDescriptor<PriorityQueue<Event>> descriptor = new ValueStateDescriptor<>(
// state name
"sorted-events",
// type information of state
TypeInformation.of(new TypeHint<PriorityQueue<Event>>() {
}));
queueState = getRuntimeContext().getState(descriptor);
}
@Override
public void processElement(Event event, Context context, Collector<Event> out) throws Exception {
TimerService timerService = context.timerService();
long currentWatermark = timerService.currentWatermark();
System.out.format("processElement called with watermark %d\n", currentWatermark);
if (context.timestamp() > currentWatermark) {
PriorityQueue<Event> queue = queueState.value();
if (queue == null) {
queue = new PriorityQueue<>(10);
}
queue.add(event);
queueState.update(queue);
timerService.registerEventTimeTimer(event.getTimestamp());
}
}
@Override
public void onTimer(long timestamp, OnTimerContext context, Collector<Event> out) throws Exception {
PriorityQueue<Event> queue = queueState.value();
long watermark = context.timerService().currentWatermark();
System.out.format("onTimer called with watermark %d\n", watermark);
Event head = queue.peek();
while (head != null && head.getTimestamp() <= watermark) {
out.collect(head);
queue.remove(head);
head = queue.peek();
}
// persist the drained queue; relying on in-place mutation only works with the heap state backend
queueState.update(queue);
}
}
@SuppressWarnings("serial")
private static class EventSource extends RichParallelSourceFunction<Event> {
private String _prefix;
private transient Random _rand;
private transient boolean _running;
private transient int _numEvents;
public EventSource(String prefix) {
_prefix = prefix;
}
@Override
public void open(Configuration parameters) throws Exception {
super.open(parameters);
_rand = new Random(_prefix.hashCode() + getRuntimeContext().getIndexOfThisSubtask());
}
@Override
public void cancel() {
_running = false;
}
@Override
public void run(SourceContext<Event> context) throws Exception {
_running = true;
_numEvents = 0;
long timestamp = System.currentTimeMillis() + _rand.nextInt(10);
while (_running && (_numEvents < 100)) {
long deltaTime = timestamp - System.currentTimeMillis();
if (deltaTime > 0) {
Thread.sleep(deltaTime);
}
context.collect(new Event(_prefix, timestamp));
_numEvents++;
// Generate a timestamp every 5...15 ms, average is 10.
timestamp += (5 + _rand.nextInt(10));
}
}
}
}

Why CEP doesn't print the first event only after I input second event when using ProcessingTime?

I sent one event with isStart true to Kafka and had Flink consume it, set the TimeCharacteristic to ProcessingTime, and used within(Time.seconds(5)) on the pattern, so I expected CEP to print the event 5 seconds after I sent it. However, it printed the first event only after I sent a second event to Kafka. Why does it print the first event only after I have sent two events? Shouldn't it print the event 5 seconds after the first one arrives when using ProcessingTime?
The following is the code:
public class LongRidesWithKafka {
private static final String LOCAL_ZOOKEEPER_HOST = "localhost:2181";
private static final String LOCAL_KAFKA_BROKER = "localhost:9092";
private static final String RIDE_SPEED_GROUP = "rideSpeedGroup";
private static final int MAX_EVENT_DELAY = 60; // rides are at most 60 sec out-of-order.
public static void main(String[] args) throws Exception {
final int popThreshold = 1; // threshold for popular places
// set up streaming execution environment
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime);
Properties kafkaProps = new Properties();
//kafkaProps.setProperty("zookeeper.connect", LOCAL_ZOOKEEPER_HOST);
kafkaProps.setProperty("bootstrap.servers", LOCAL_KAFKA_BROKER);
kafkaProps.setProperty("group.id", RIDE_SPEED_GROUP);
// always read the Kafka topic from the start
kafkaProps.setProperty("auto.offset.reset", "earliest");
// create a Kafka consumer
FlinkKafkaConsumer011<TaxiRide> consumer = new FlinkKafkaConsumer011<>(
"flinktest",
new TaxiRideSchema(),
kafkaProps);
// assign a timestamp extractor to the consumer
//consumer.assignTimestampsAndWatermarks(new CustomWatermarkExtractor());
DataStream<TaxiRide> rides = env.addSource(consumer);
DataStream<TaxiRide> keyedRides = rides.keyBy("rideId");
// A complete taxi ride has a START event followed by an END event
Pattern<TaxiRide, TaxiRide> completedRides =
Pattern.<TaxiRide>begin("start")
.where(new SimpleCondition<TaxiRide>() {
@Override
public boolean filter(TaxiRide ride) throws Exception {
return ride.isStart;
}
})
.next("end")
.where(new SimpleCondition<TaxiRide>() {
@Override
public boolean filter(TaxiRide ride) throws Exception {
return !ride.isStart;
}
});
// We want to find rides that have NOT been completed within the timeout (5 seconds here)
PatternStream<TaxiRide> patternStream = CEP.pattern(keyedRides, completedRides.within(Time.seconds(5)));
OutputTag<TaxiRide> timedout = new OutputTag<TaxiRide>("timedout") {
};
SingleOutputStreamOperator<TaxiRide> longRides = patternStream.flatSelect(
timedout,
new LongRides.TaxiRideTimedOut<TaxiRide>(),
new LongRides.FlatSelectNothing<TaxiRide>()
);
longRides.getSideOutput(timedout).print();
env.execute("Long Taxi Rides");
}
public static class TaxiRideTimedOut<TaxiRide> implements PatternFlatTimeoutFunction<TaxiRide, TaxiRide> {
@Override
public void timeout(Map<String, List<TaxiRide>> map, long l, Collector<TaxiRide> collector) throws Exception {
TaxiRide rideStarted = map.get("start").get(0);
collector.collect(rideStarted);
}
}
public static class FlatSelectNothing<T> implements PatternFlatSelectFunction<T, T> {
@Override
public void flatSelect(Map<String, List<T>> pattern, Collector<T> collector) {
}
}
private static class TaxiRideTSExtractor extends AscendingTimestampExtractor<TaxiRide> {
private static final long serialVersionUID = 1L;
@Override
public long extractAscendingTimestamp(TaxiRide ride) {
// Watermark Watermark = getCurrentWatermark();
if (ride.isStart) {
return ride.startTime.getMillis();
} else {
return ride.endTime.getMillis();
}
}
}
private static class CustomWatermarkExtractor implements AssignerWithPeriodicWatermarks<TaxiRide> {
private static final long serialVersionUID = -742759155861320823L;
private long currentTimestamp = Long.MIN_VALUE;
@Override
public long extractTimestamp(TaxiRide ride, long previousElementTimestamp) {
// the inputs are assumed to be of format (message,timestamp)
if (ride.isStart) {
this.currentTimestamp = ride.startTime.getMillis();
return ride.startTime.getMillis();
} else {
this.currentTimestamp = ride.endTime.getMillis();
return ride.endTime.getMillis();
}
}
@Nullable
@Override
public Watermark getCurrentWatermark() {
return new Watermark(currentTimestamp == Long.MIN_VALUE ? Long.MIN_VALUE : currentTimestamp - 1);
}
}
}
The reason is that Flink's CEP library currently only checks the timestamps when another element arrives and is processed. The underlying assumption is that you have a steady flow of events.
I think this is a limitation of Flink's CEP library. To work correctly, Flink should register processing-time timers for arrivalTime + timeout that trigger the pattern timeout even if no further events arrive.
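Until that limitation is addressed, one workaround is to detect the timeout yourself with an explicit processing-time timer. A minimal sketch, assuming a TaxiRide with public rideId and isStart fields (the field names come from the code above; the class and the timeout value are illustrative):
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class RideTimeoutFunction extends KeyedProcessFunction<Long, TaxiRide, TaxiRide> {
    private final long timeoutMs;
    private transient ValueState<TaxiRide> startState;

    public RideTimeoutFunction(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    @Override
    public void open(Configuration parameters) {
        startState = getRuntimeContext().getState(
                new ValueStateDescriptor<>("start-event", TaxiRide.class));
    }

    @Override
    public void processElement(TaxiRide ride, Context ctx, Collector<TaxiRide> out) throws Exception {
        if (ride.isStart) {
            startState.update(ride);
            // this timer fires even if no further event ever arrives for this ride
            ctx.timerService().registerProcessingTimeTimer(
                    ctx.timerService().currentProcessingTime() + timeoutMs);
        } else {
            startState.clear(); // the ride completed in time
        }
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<TaxiRide> out) throws Exception {
        TaxiRide start = startState.value();
        if (start != null) {
            out.collect(start); // no END event was seen within the timeout
            startState.clear();
        }
    }
}
Applied as rides.keyBy(r -> r.rideId).process(new RideTimeoutFunction(5000L)), this emits the START event 5 seconds after it arrives if no matching END event has been seen.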

Evaluate only the latest window for event time based sliding windows

I would like to process events in event time using sliding windows. The window size is 24 hours and the slide is 30 minutes, which means each event falls into 48 windows, and the code below produces 48 calculations for each event. In our case events arrive in order, so we need only the latest window to be evaluated.
Thanks,
Dejan
public static void processEventsa(
DataStream<Tuple2<String, MyEvent>> events) throws Exception {
events.assignTimestampsAndWatermarks(new MyWatermark()).
keyBy(0).
timeWindow(Time.hours(windowSizeHour), Time.seconds(windowSlideSeconds)).
apply(new WindowFunction<Tuple2<String, MyEvent>, Tuple2<String, MyEvent>, Tuple, TimeWindow>() {
@Override
public void apply(Tuple key, TimeWindow window, Iterable<Tuple2<String, MyEvent>> input,
Collector<Tuple2<String, MyEvent>> out) throws Exception {
for (Tuple2<String, MyEvent> record : input) {
}
}
});
}
public class MyWatermark implements
AssignerWithPunctuatedWatermarks<Tuple2<String, MyEvent>> {
@Override
public long extractTimestamp(Tuple2<String, MyEvent> event, long previousElementTimestamp) {
return event.f1.eventTime;
}
@Override
public Watermark checkAndGetNextWatermark(Tuple2<String, MyEvent> event, long previousElementTimestamp) {
return new Watermark(event.f1.eventTime);
}
}
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
The problem was in the watermark: an AssignerWithPeriodicWatermarks should be used instead.
public class MyWatermark implements
AssignerWithPeriodicWatermarks<Tuple2<String, MyEvent>> {
private final long maxTimeLag = 5000;
@Override
public long extractTimestamp(Tuple2<String, MyEvent> event, long previousElementTimestamp) {
try {
return event.f1.eventTime;
}
catch(NullPointerException ex) {}
return System.currentTimeMillis() - maxTimeLag;
}
@Override
public Watermark getCurrentWatermark() {
return new Watermark(System.currentTimeMillis() - maxTimeLag);
}
}

Apache Flink 1.3 table api rowtime strange behavior

The following code sample does not work in Flink 1.3:
public class TumblingWindow {
public static void main(String[] args) throws Exception {
List<Content> data = new ArrayList<Content>();
data.add(new Content(1L, "Hi"));
data.add(new Content(2L, "Hallo"));
data.add(new Content(3L, "Hello"));
data.add(new Content(4L, "Hello"));
data.add(new Content(7L, "Hello"));
data.add(new Content(8L, "Hello world"));
data.add(new Content(16L, "Hello world"));
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
final StreamTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(env);
DataStream<Content> stream = env.fromCollection(data);
DataStream<Content> stream2 = stream.assignTimestampsAndWatermarks(
new BoundedOutOfOrdernessTimestampExtractor<Content>(Time.milliseconds(1)) {
private static final long serialVersionUID = 410512296011057717L;
@Override
public long extractTimestamp(Content element) {
return element.getRecordTime();
}
});
Table table = tableEnv.fromDataStream(stream2,
"urlKey,httpGetMessageCount,httpPostMessageCount" + ",uplink,downlink,statusCode,statusCodeCount,rowtime.rowtime");
table.window(Tumble.over("1.hours").on("rowtime").as("w")).groupBy("w, urlKey")
.select("w.start,urlKey,uplink.sum,downlink.sum,httpGetMessageCount.sum,httpPostMessageCount.sum ");
env.execute();
}
public static class Content implements Serializable {
private String urlKey;
private long recordTime;
// private String recordTimeStr;
private long httpGetMessageCount;
private long httpPostMessageCount;
private long uplink;
private long downlink;
private long statusCode;
private long statusCodeCount;
public Content() {
super();
}
public Content(long recordTime, String urlKey) {
super();
this.recordTime = recordTime;
this.urlKey = urlKey;
}
public String getUrlKey() {
return urlKey;
}
public void setUrlKey(String urlKey) {
this.urlKey = urlKey;
}
public long getRecordTime() {
return recordTime;
}
public void setRecordTime(long recordTime) {
this.recordTime = recordTime;
}
public long getHttpGetMessageCount() {
return httpGetMessageCount;
}
public void setHttpGetMessageCount(long httpGetMessageCount) {
this.httpGetMessageCount = httpGetMessageCount;
}
public long getHttpPostMessageCount() {
return httpPostMessageCount;
}
public void setHttpPostMessageCount(long httpPostMessageCount) {
this.httpPostMessageCount = httpPostMessageCount;
}
public long getUplink() {
return uplink;
}
public void setUplink(long uplink) {
this.uplink = uplink;
}
public long getDownlink() {
return downlink;
}
public void setDownlink(long downlink) {
this.downlink = downlink;
}
public long getStatusCode() {
return statusCode;
}
public void setStatusCode(long statusCode) {
this.statusCode = statusCode;
}
public long getStatusCodeCount() {
return statusCodeCount;
}
public void setStatusCodeCount(long statusCodeCount) {
this.statusCodeCount = statusCodeCount;
}
}
private class TimestampWithEqualWatermark implements AssignerWithPunctuatedWatermarks<Object[]> {
private static final long serialVersionUID = 1L;
@Override
public long extractTimestamp(Object[] element, long previousElementTimestamp) {
return (long) element[0];
}
@Override
public Watermark checkAndGetNextWatermark(Object[] lastElement, long extractedTimestamp) {
return new Watermark(extractedTimestamp);
}
}
}
will raise the following exception:
Exception in thread "main" org.apache.flink.table.api.TableException: The rowtime attribute can only be replace a field with a valid time type, such as Timestamp or Long.
at org.apache.flink.table.api.StreamTableEnvironment$$anonfun$validateAndExtractTimeAttributes$1.apply(StreamTableEnvironment.scala:450)
at org.apache.flink.table.api.StreamTableEnvironment$$anonfun$validateAndExtractTimeAttributes$1.apply(StreamTableEnvironment.scala:440)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at org.apache.flink.table.api.StreamTableEnvironment.validateAndExtractTimeAttributes(StreamTableEnvironment.scala:440)
at org.apache.flink.table.api.StreamTableEnvironment.registerDataStreamInternal(StreamTableEnvironment.scala:401)
at org.apache.flink.table.api.java.StreamTableEnvironment.fromDataStream(StreamTableEnvironment.scala:88)
at com.taiwanmobile.cep.noc.TumblingWindow.main(TumblingWindow.java:53)
But if I delete statusCodeCount in fromDataStream, the sample runs successfully without the exception:
Table table = tableEnv.fromDataStream(stream2,
"urlKey,httpGetMessageCount,httpPostMessageCount" + ",uplink,downlink,statusCode,rowtime.rowtime");
table.window(Tumble.over("1.hours").on("rowtime").as("w")).groupBy("w, urlKey")
.select("w.start,urlKey,uplink.sum,downlink.sum,httpGetMessageCount.sum,httpPostMessageCount.sum ");
Any suggestions?
This is a bug, filed as FLINK-6881. As a workaround, you could define your own StreamTableSource that implements DefinedRowtimeAttribute (see also this documentation draft). A table source also nicely hides the underlying DataStream API, which makes table programs more compact.
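A minimal sketch of such a table source, assuming the Flink 1.3 interfaces (StreamTableSource with getDataStream, DefinedRowtimeAttribute with getRowtimeAttribute; treat the exact method set as an assumption) and the Content type above; the attribute name and watermark bound are illustrative:
import java.util.List;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.table.sources.DefinedRowtimeAttribute;
import org.apache.flink.table.sources.StreamTableSource;

public class ContentTableSource implements StreamTableSource<Content>, DefinedRowtimeAttribute {

    private final List<Content> data;

    public ContentTableSource(List<Content> data) {
        this.data = data;
    }

    @Override
    public DataStream<Content> getDataStream(StreamExecutionEnvironment execEnv) {
        // emit the records with timestamps and watermarks, as before
        return execEnv
                .fromCollection(data)
                .assignTimestampsAndWatermarks(
                        new BoundedOutOfOrdernessTimestampExtractor<Content>(Time.milliseconds(1)) {
                            @Override
                            public long extractTimestamp(Content element) {
                                return element.getRecordTime();
                            }
                        });
    }

    @Override
    public String getRowtimeAttribute() {
        return "rowtime"; // appended to the schema as the event-time attribute
    }

    @Override
    public TypeInformation<Content> getReturnType() {
        return TypeInformation.of(Content.class);
    }

    @Override
    public String explainSource() {
        return "ContentTableSource";
    }
}
You would then register it with tableEnv.registerTableSource("contents", new ContentTableSource(data)) and query it via tableEnv.scan("contents").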
