How to create batch or sliding windows using Flink CEP?

I'm just starting with Flink CEP, coming from the Esper CEP engine. As you may (or may not) know, Esper's syntax (EPL) lets you easily create a batch or sliding window, grouping the events in that window so you can apply functions to them (avg, max, min, ...).
For example, the following statement creates a batch window of 5 seconds and calculates the average value of the price attribute over all the Stock events received in that window.
select avg(price) from Stock#time_batch(5 sec)
I would like to know how to implement this in Flink CEP. I'm aware that the goal or approach of Flink CEP is probably different, so implementing this may not be as simple as in Esper.
I have looked at the docs on time windows, but I haven't been able to combine those windows with Flink CEP. So, given the following code:
DataStream<Stock> stream = ...; // Consume events from Kafka
// Keep only events with a non-negative price
Pattern<Stock, ?> pattern = Pattern.<Stock>begin("start")
.where(
new SimpleCondition<Stock>() {
public boolean filter(Stock event) {
return event.getPrice() >= 0;
}
}
);
PatternStream<Stock> patternStream = CEP.pattern(stream, pattern);
/**
CREATE A BATCH WINDOW OF 5 SECONDS IN WHICH
I COMPUTE OVER THE AVERAGE PRICES AND, IF IT IS
GREATER THAN A THRESHOLD, AN ALERT IS DETECTED
return avg(allEventsInWindow.getPrice()) > 1;
*/
DataStream<Alert> result = patternStream.select(
new PatternSelectFunction<Stock, Alert>() {
@Override
public Alert select(Map<String, List<Stock>> pattern) throws Exception {
return new Alert(pattern.toString());
}
}
);
How can I create a window that starts at the first event received and computes the average over the events that follow within 5 seconds? For example:
t = 0 seconds
Stock(price = 1); (...starting batch window...)
Stock(price = 1);
Stock(price = 1);
Stock(price = 2);
Stock(price = 2);
Stock(price = 2);
t = 5 seconds (...end of batch window...)
Avg = 1.5 => Alert detected!
The average after 5 seconds would be 1.5, and will trigger the alert. How can I code this?
Thanks!

With Flink's CEP library this behavior is not expressible. I would rather recommend using Flink's DataStream or Table API to calculate the averages. Based on that you could again use CEP to generate other events.
final DataStream<Stock> input = env
.fromElements(
new Stock(1L, 1.0),
new Stock(2L, 2.0),
new Stock(3L, 1.0),
new Stock(4L, 2.0))
.assignTimestampsAndWatermarks(new BoundedOutOfOrdernessTimestampExtractor<Stock>(Time.seconds(0L)) {
@Override
public long extractTimestamp(Stock element) {
return element.getTimestamp();
}
});
final DataStream<Double> windowAggregation = input
.timeWindowAll(Time.milliseconds(2))
.aggregate(new AggregateFunction<Stock, Tuple2<Integer, Double>, Double>() {
@Override
public Tuple2<Integer, Double> createAccumulator() {
return Tuple2.of(0, 0.0);
}
@Override
public Tuple2<Integer, Double> add(Stock value, Tuple2<Integer, Double> accumulator) {
return Tuple2.of(accumulator.f0 + 1, accumulator.f1 + value.getValue());
}
@Override
public Double getResult(Tuple2<Integer, Double> accumulator) {
return accumulator.f1 / accumulator.f0;
}
@Override
public Tuple2<Integer, Double> merge(Tuple2<Integer, Double> a, Tuple2<Integer, Double> b) {
return Tuple2.of(a.f0 + b.f0, a.f1 + b.f1);
}
});
final DataStream<Double> result = windowAggregation.filter((FilterFunction<Double>) value -> value > THRESHOLD);
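To make the window semantics concrete without a running Flink job, here is a minimal plain-Java sketch (no Flink dependency) of what a 5-second tumbling window average computes. The class name `TumblingAverageSketch` and its methods are hypothetical and only illustrate the bucketing; in a real job the window operator does this work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustration only: hypothetical helper, not part of Flink.
class TumblingAverageSketch {

    // Returns window start (ms) -> average price, with windows aligned to multiples of windowMs.
    static Map<Long, Double> averages(long[] timestamps, double[] prices, long windowMs) {
        Map<Long, double[]> acc = new TreeMap<>(); // window start -> {count, sum}
        for (int i = 0; i < timestamps.length; i++) {
            long windowStart = timestamps[i] - Math.floorMod(timestamps[i], windowMs);
            double[] a = acc.computeIfAbsent(windowStart, k -> new double[2]);
            a[0] += 1;          // count
            a[1] += prices[i];  // sum
        }
        Map<Long, Double> result = new TreeMap<>();
        for (Map.Entry<Long, double[]> e : acc.entrySet()) {
            result.put(e.getKey(), e.getValue()[1] / e.getValue()[0]);
        }
        return result;
    }

    // Window starts whose average is strictly greater than the threshold.
    static List<Long> alerts(Map<Long, Double> averages, double threshold) {
        List<Long> out = new ArrayList<>();
        for (Map.Entry<Long, Double> e : averages.entrySet()) {
            if (e.getValue() > threshold) {
                out.add(e.getKey());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // The six Stock events from the question, all inside the first 5 seconds.
        long[] ts = {0L, 1000L, 2000L, 3000L, 4000L, 4999L};
        double[] prices = {1, 1, 1, 2, 2, 2};
        Map<Long, Double> avg = averages(ts, prices, 5000L);
        System.out.println(avg);
        System.out.println(alerts(avg, 1.0));
    }
}
```

With the question's data the window starting at 0 has an average of 1.5, which exceeds the threshold of 1. Note that Flink's tumbling windows are aligned to the epoch rather than to the first event received, so the sketch buckets by multiples of the window size in the same way.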

Related

Absence of event in Apache Flink CEP

I'm new to Apache Flink CEP and I'm struggling to detect the simple absence of an event.
What I'm trying to detect is whether an event of type CurrencyEvent with a certain id does not occur within a certain amount of time. I would like to detect the absence of such an event every time 3000 ms pass without it occurring.
My pattern code looks as follows:
Pattern<Event, ?> myPattern = Pattern.<Event>begin("CurrencyEvent")
.subtype(CurrencyEvent.class)
.where(new SimpleCondition<CurrencyEvent>() {
@Override
public boolean filter(CurrencyEvent currencyEvent) throws Exception {
return currencyEvent.getId().equalsIgnoreCase("usd");
}
})
.within(Time.milliseconds(3000L));
So now my idea is to use timeout functions in order to detect timeout events:
DataStreamSource<Event> events = env.addSource(new TestSource(
Arrays.asList(
basicCurrencyWithMivLevelEvent("EUR", 100L, Arrays.asList("1", "2"), 200D),
basicCurrencyWithMivLevelEvent("USD", 100L, Arrays.asList("1", "2"), 200D),
basicCurrencyWithMivLevelEvent("EUR", 100L, Arrays.asList("1", "2"), 200D)
),
1636040364820L, // initial timestamp for the first element
7000 // 7 seconds between each event
));
PatternStream<Event> patternStream = CEP.pattern(
events,
(Pattern<Event, ?>) myPattern
);
OutputTag<Alarm> tag = new OutputTag<Alarm>("currency-timeout"){};
PatternFlatTimeoutFunction<Event, Alarm> eventAlarmTimeoutPatternFunction = (patterns, timestamp, ctx) -> {
System.out.println("New alarm, since after 3 seconds an event with id=usd is not detected");
//TODO: call collect
};
PatternFlatSelectFunction<Event, Alarm> eventAlarmPatternSelectFunction = (patterns, ctx) -> {
System.out.println("Select! (we can ignore it) " + patterns);
// ignore matched events
};
return patternStream.flatSelect(
tag,
eventAlarmTimeoutPatternFunction,
TypeInformation.of(Alarm.class),
eventAlarmPatternSelectFunction
);
My test source uses event timestamps and watermarks, as follows:
public class TestSource implements SourceFunction<Event> {
private final List<Event> events;
private final long initialTimestamp;
private final long timeBetweenInMillis;
public TestSource(List<Event> events, long initialTimestamp, long timeBetweenInMillis){
this.events = events;
this.initialTimestamp = initialTimestamp;
this.timeBetweenInMillis = timeBetweenInMillis;
}
@Override
public void run(SourceContext<Event> sourceContext) throws InterruptedException {
long timestamp = this.initialTimestamp;
for(Event event: this.events){
sourceContext.collectWithTimestamp(event, timestamp);
sourceContext.emitWatermark(new Watermark(timestamp));
timestamp+=this.timeBetweenInMillis;
}
}
@Override
public void cancel() {
}
}
I'm using TimeCharacteristic.EventTime.
Since the window time (3 seconds) is shorter than the event-time difference between consecutive events (7 seconds), I expect to get some timeout events, but I'm getting none.
A CEP Pattern matches a sequence of one or more events; the within(interval) clause adds an additional constraint that all of the events in the sequence must occur within the specified interval. When partial matches time out, this can be captured in a TimedOutPartialMatchHandler.
In your case, since a successfully matched Pattern consists of a single event, there can be no partial matches, and a match can never time out. (Your matching sequences are always less than 3 seconds long.)
What you can do is to extend the pattern definition to include a second event, so that to match there must be a start event followed by another event within 3 seconds. When that second event is missing, then you will have a partial match that times out.
For more flexibility than what CEP offers for implementing use cases involving missing events, you can use a KeyedProcessFunction with timers.
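The timer idea can be sketched in plain Java without any Flink dependency. The assumption here is that a process function would register an event-time timer at `eventTime + timeout` for each matching event and replace it when the next matching event arrives; the class `AbsenceTimerSketch` and its `alarms` method are hypothetical and only simulate that logic over a list of timestamps.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: simulates "one timer per event, replaced by the next event".
class AbsenceTimerSketch {

    // eventTimes: ascending event-time timestamps (ms) of the watched events for one key.
    // finalWatermark: how far event time has advanced overall.
    // Returns the timestamps at which an absence alarm would fire.
    static List<Long> alarms(long[] eventTimes, long timeoutMs, long finalWatermark) {
        List<Long> fired = new ArrayList<>();
        for (int i = 0; i < eventTimes.length; i++) {
            long timer = eventTimes[i] + timeoutMs; // deadline set when event i arrives
            long next = (i + 1 < eventTimes.length) ? eventTimes[i + 1] : Long.MAX_VALUE;
            // The alarm fires only if event time reaches the deadline and no
            // later event arrived strictly before it (an event exactly at the
            // deadline is treated as an alarm in this sketch).
            if (timer <= finalWatermark && timer <= next) {
                fired.add(timer);
            }
        }
        return fired;
    }

    public static void main(String[] args) {
        // Events 7 seconds apart with a 3-second timeout, as in the question.
        long[] times = {0L, 7000L, 14000L};
        System.out.println(alarms(times, 3000L, 14000L));
    }
}
```

Unlike the single-event CEP pattern, this logic produces an alarm for every gap longer than the timeout, because each event starts its own deadline regardless of whether a pattern match has completed.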

How to add a custom WatermarkGenerator to a WatermarkStrategy

I'm using Apache Flink 1.11 and want to use some custom WatermarkGenerator.
With the WatermarkStrategy, you can add built-in WatermarkGenerators with ease:
WatermarkStrategy.forMonotonousTimestamps();
WatermarkStrategy.forBoundedOutOfOrderness(Duration.ofSeconds(10));
In the documentation, you can see how to implement a custom WatermarkGenerator, for example a periodic WatermarkGenerator:
public class BoundedOutOfOrdernessGenerator implements WatermarkGenerator<MyEvent> {
private final long maxOutOfOrderness = 3500; // 3.5 seconds
private long currentMaxTimestamp;
@Override
public void onEvent(MyEvent event, long eventTimestamp, WatermarkOutput output) {
currentMaxTimestamp = Math.max(currentMaxTimestamp, eventTimestamp);
}
@Override
public void onPeriodicEmit(WatermarkOutput output) {
// emit the watermark as current highest timestamp minus the out-of-orderness bound
output.emitWatermark(new Watermark(currentMaxTimestamp - maxOutOfOrderness - 1));
}
}
How can I add this custom BoundedOutOfOrdernessGenerator to a WatermarkStrategy?
A WatermarkStrategy is the thing you need to define. So assuming you have some class MyWatermarkGenerator that implements WatermarkGenerator<MyEvent>, then you'd do something like:
WatermarkStrategy<MyEvent> ws = (ctx -> new MyWatermarkGenerator());
...
DataStream<MyEvent> ds = xxx;
ds.assignTimestampsAndWatermarks(ws);
Note that unless your source is setting up timestamps for you (e.g. Kafka record timestamps), you'll want to add a timestamp extractor to your WatermarkStrategy, as in...
WatermarkStrategy<MyEvent> ws = (ctx -> new MyWatermarkGenerator());
ws = ws.withTimestampAssigner((r, ts) -> r.getTimestamp());
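For intuition about what such a generator emits once it is plugged into the strategy, here is a small plain-Java sketch of the bounded-out-of-orderness arithmetic from the question (watermark = highest timestamp seen, minus the bound, minus 1). It has no Flink dependency; the class `BoundedOutOfOrdernessSketch` and its methods are hypothetical names used for illustration only.

```java
// Illustration only: mirrors the arithmetic of the generator in the question.
class BoundedOutOfOrdernessSketch {
    private final long maxOutOfOrderness;
    private long currentMaxTimestamp;

    BoundedOutOfOrdernessSketch(long maxOutOfOrderness) {
        this.maxOutOfOrderness = maxOutOfOrderness;
        // Start high enough that the first watermark computation cannot underflow.
        this.currentMaxTimestamp = Long.MIN_VALUE + maxOutOfOrderness + 1;
    }

    // Corresponds to onEvent: remember the highest timestamp seen so far.
    void onEvent(long eventTimestamp) {
        currentMaxTimestamp = Math.max(currentMaxTimestamp, eventTimestamp);
    }

    // Corresponds to the value passed to emitWatermark in onPeriodicEmit.
    long currentWatermark() {
        return currentMaxTimestamp - maxOutOfOrderness - 1;
    }

    // An element is late if its timestamp is not greater than the current watermark.
    boolean isLate(long eventTimestamp) {
        return eventTimestamp <= currentWatermark();
    }

    public static void main(String[] args) {
        BoundedOutOfOrdernessSketch generator = new BoundedOutOfOrdernessSketch(3500L);
        generator.onEvent(10_000L);
        generator.onEvent(8_000L); // out of order, but within the bound
        System.out.println(generator.currentWatermark());
        System.out.println(generator.isLate(6_000L));
    }
}
```

An out-of-order event never moves the watermark backwards, because only the maximum timestamp is tracked.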

How to adjust colors in the UI of OptaPlanner?

I am currently using OptaPlanner's job scheduling algorithm to create a schedule. I want every execution mode used in the schedule to be shown in a different color (instead of each project being shown in a different color). Is it possible to implement this and, if so, how? I have been searching through the code for a while now and have no idea how to do this.
This cannot be done easily with the Project Scheduling Swing application that's part of the OptaPlanner project. It plots the data using JFreeChart and I couldn't find a simple way to associate metadata (like color) with the data that's being plotted.
You can override YIntervalRenderer behavior to return color of your choice based on data item's row (seriesIndex) and column (item's index in the series) but you have to keep the mapping between execution mode and [row, column] yourself, which is cumbersome.
Here's an example of modified ProjectJobSchedulingPanel that does the above:
public class ProjectJobSchedulingPanel extends SolutionPanel<Schedule> {
private static final Logger logger = LoggerFactory.getLogger(ProjectJobSchedulingPanel.class);
private static final Paint[] PAINT_SEQUENCE = DefaultDrawingSupplier.DEFAULT_PAINT_SEQUENCE;
public static final String LOGO_PATH = "/org/optaplanner/examples/projectjobscheduling/swingui/projectJobSchedulingLogo.png";
public ProjectJobSchedulingPanel() {
setLayout(new BorderLayout());
}
@Override
public void resetPanel(Schedule schedule) {
removeAll();
ChartPanel chartPanel = new ChartPanel(createChart(schedule));
add(chartPanel, BorderLayout.CENTER);
}
private JFreeChart createChart(Schedule schedule) {
YIntervalSeriesCollection seriesCollection = new YIntervalSeriesCollection();
Map<Project, YIntervalSeries> projectSeriesMap = new LinkedHashMap<>(
schedule.getProjectList().size());
ExecutionMode[][] executionModeByRowAndColumn = new ExecutionMode[schedule.getProjectList().size()][schedule.getAllocationList().size()];
YIntervalRenderer renderer = new YIntervalRenderer() {
@Override
public Paint getItemPaint(int row, int column) {
ExecutionMode executionMode = executionModeByRowAndColumn[row][column];
logger.info("getItemPaint: ExecutionMode [{},{}]: {}", row, column, executionMode);
return executionMode == null
? TangoColorFactory.ALUMINIUM_5
: PAINT_SEQUENCE[(int) (executionMode.getId() % PAINT_SEQUENCE.length)];
}
};
Map<Project, Integer> seriesIndexByProject = new HashMap<>();
int maximumEndDate = 0;
int seriesIndex = 0;
for (Project project : schedule.getProjectList()) {
YIntervalSeries projectSeries = new YIntervalSeries(project.getLabel());
seriesCollection.addSeries(projectSeries);
projectSeriesMap.put(project, projectSeries);
renderer.setSeriesShape(seriesIndex, new Rectangle());
renderer.setSeriesStroke(seriesIndex, new BasicStroke(3.0f));
seriesIndexByProject.put(project, seriesIndex);
seriesIndex++;
}
for (Allocation allocation : schedule.getAllocationList()) {
int startDate = allocation.getStartDate();
int endDate = allocation.getEndDate();
YIntervalSeries projectSeries = projectSeriesMap.get(allocation.getProject());
int column = projectSeries.getItemCount();
executionModeByRowAndColumn[seriesIndexByProject.get(allocation.getProject())][column] = allocation.getExecutionMode();
logger.info("ExecutionMode [{},{}] = {}", seriesIndexByProject.get(allocation.getProject()), column, allocation.getExecutionMode());
projectSeries.add(allocation.getId(), (startDate + endDate) / 2.0,
startDate, endDate);
maximumEndDate = Math.max(maximumEndDate, endDate);
}
NumberAxis domainAxis = new NumberAxis("Job");
domainAxis.setStandardTickUnits(NumberAxis.createIntegerTickUnits());
domainAxis.setRange(-0.5, schedule.getAllocationList().size() - 0.5);
domainAxis.setInverted(true);
NumberAxis rangeAxis = new NumberAxis("Day (start to end date)");
rangeAxis.setRange(-0.5, maximumEndDate + 0.5);
XYPlot plot = new XYPlot(seriesCollection, domainAxis, rangeAxis, renderer);
plot.setOrientation(PlotOrientation.HORIZONTAL);
// Uncomment this to use Tango color sequence instead of JFreeChart default sequence.
// This results in color per project mode.
// DefaultDrawingSupplier drawingSupplier = new DefaultDrawingSupplier(
// TangoColorFactory.SEQUENCE_1,
// DefaultDrawingSupplier.DEFAULT_FILL_PAINT_SEQUENCE,
// DefaultDrawingSupplier.DEFAULT_OUTLINE_PAINT_SEQUENCE,
// DefaultDrawingSupplier.DEFAULT_STROKE_SEQUENCE,
// DefaultDrawingSupplier.DEFAULT_OUTLINE_STROKE_SEQUENCE,
// DefaultDrawingSupplier.DEFAULT_SHAPE_SEQUENCE);
// plot.setDrawingSupplier(drawingSupplier);
return new JFreeChart("Project Job Scheduling", JFreeChart.DEFAULT_TITLE_FONT, plot, true);
}
}
Another approach would be to implement JFreeChart interfaces and make a custom Dataset and Renderer so that you could plot Allocations directly, similar to the Gantt chart implementation in JFreeChart.
Or write your custom UI from the ground up. It depends on how much effort you're willing to put into it :)

Is there a work-around to handle multiple "temporal constraints" in Flink CEP?

The CEP documentation (https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/libs/cep.html) states that only one temporal constraint is allowed in a pattern sequence, so I'm struggling to find a way to handle a business case that contains 2 temporal constraints.
I need to monitor some business events and alert on the events that meet the following rules:
a new account is registered
the account gets authenticated in 5 minutes after registration
the account completes at least 2 transactions, each with an amount greater than 1000.00, in the next hour.
And the code is something like this:
Pattern<Event, ?> pattern = Pattern.<Event>begin("register").where(new SimpleCondition<Event>() {
@Override
public boolean filter(Event value) throws Exception {
return (value.getEventType() == EventType.REGISTER);
}
}).followedBy("authentication").where(new SimpleCondition<Event>() {
@Override
public boolean filter(Event value) throws Exception {
return (value.getEventType() == EventType.AUTHENTICATION);
}
}).where(new IterativeCondition<Event>() {
@Override
public boolean filter(Event value, Context<Event> ctx) throws Exception {
for (Event event : ctx.getEventsForPattern("register")) {
if (value.getEventTime() - event.getEventTime() <= 1000 * 60 * 5) {
return true;
}
}
return false;
}
}).followedBy("transaction").where(new SimpleCondition<Event>() {
@Override
public boolean filter(Event value) throws Exception {
return (value.getEventType() == EventType.TRANSACTION && value.getAmount() > 1000.00);
}
}).where(new IterativeCondition<Event>() {
@Override
public boolean filter(Event value, Context<Event> ctx) throws Exception {
for (Event event : ctx.getEventsForPattern("authentication")) {
if (value.getEventTime() - event.getEventTime() <= 1000 * 60 * 60) {
return true;
}
}
return false;
}
}).timesOrMore(2);
You can see that I use 2 IterativeConditions to handle the temporal constraints. Is there a better way to make the code more concise?
As you said, right now you can apply only one time constraint to the whole pattern in the CEP library. What you could do, though, is split your pattern into 2 sub-patterns. First apply a pattern that looks for REGISTER -> AUTHENTICATE and generates a complex event out of those (let's name it REGISTER_AUTHENTICATED). Then use it in the subsequent pattern REGISTER_AUTHENTICATED -> 2* TRANSACTIONS.
Then you can apply a separate time constraint to each of those patterns.
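The two-stage idea can be illustrated in plain Java, without Flink, as two passes over one account's events. This is only a sketch under stated assumptions: events are sorted by time and belong to a single account, and the names `TwoStagePatternSketch`, `Event`, `registerAuthenticated` and `alert` are hypothetical. In a real job each stage would be its own CEP pattern with its own `within(...)` constraint.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: two passes standing in for two chained CEP patterns.
class TwoStagePatternSketch {
    enum Type { REGISTER, AUTHENTICATION, TRANSACTION }

    static final class Event {
        final Type type;
        final long time;     // event time in ms
        final double amount; // only meaningful for TRANSACTION
        Event(Type type, long time, double amount) {
            this.type = type;
            this.time = time;
            this.amount = amount;
        }
    }

    static final long FIVE_MINUTES = 5 * 60 * 1000L;
    static final long ONE_HOUR = 60 * 60 * 1000L;

    // Stage 1: REGISTER followed by AUTHENTICATION within 5 minutes.
    // Emits the authentication time as a stand-in for a REGISTER_AUTHENTICATED event.
    static List<Long> registerAuthenticated(List<Event> events) {
        List<Long> out = new ArrayList<>();
        for (int i = 0; i < events.size(); i++) {
            if (events.get(i).type != Type.REGISTER) {
                continue;
            }
            for (int j = i + 1; j < events.size(); j++) {
                Event e = events.get(j);
                if (e.time - events.get(i).time > FIVE_MINUTES) {
                    break; // first time constraint exceeded
                }
                if (e.type == Type.AUTHENTICATION) {
                    out.add(e.time);
                    break;
                }
            }
        }
        return out;
    }

    // Stage 2: REGISTER_AUTHENTICATED followed by at least 2 transactions
    // above 1000.00 within 1 hour of the authentication.
    static boolean alert(List<Event> events) {
        for (long authTime : registerAuthenticated(events)) {
            int large = 0;
            for (Event e : events) {
                if (e.type == Type.TRANSACTION && e.amount > 1000.00
                        && e.time > authTime && e.time - authTime <= ONE_HOUR) {
                    large++;
                }
            }
            if (large >= 2) {
                return true;
            }
        }
        return false;
    }
}
```

Because the first stage emits its own event, the second stage measures its one-hour limit from the authentication rather than from the registration, which is what a single `within` on the combined pattern cannot express.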

How to report a value in real time in Flink?

I want three values: aggValueInLastHour, aggValueInLastDay, and aggValueInLastThreeDay.
I've tried like below.
But I don't want to wait, meaning I'd prefer not to use a sliding window for the aggregation. (A 3-day window has to wait for three days of data, which is unacceptable for our system.)
How can I get the aggregation value for the last 3 days when the first event arrives?
Thanks in advance for any advice!
If you want to get more frequent updates you can use QueryableState, polling the state at a rate that suits your use case.
You can make use of the ContinuousEventTimeTrigger, which causes your window to fire at a shorter interval than the full window, allowing you to see the intermediate state. You can optionally wrap it in a PurgingTrigger if the downstream consumers of your sink expect each output to be a partial aggregation (rather than the full current state) and sum the outputs up themselves.
I've tried CEP.
code:
AfterMatchSkipStrategy strategy = AfterMatchSkipStrategy.skipShortOnes();
Pattern<RiskEvent, ?> loginPattern = Pattern.<RiskEvent>begin("start", strategy)
.where(eventTypeCondition)
.timesOrMore(1)
.greedy()
.within(Time.hours(1));
KeyedStream<RiskEvent, String> keyedStream = dataStream.keyBy(new KeySelector<RiskEvent, String>() {
@Override
public String getKey(RiskEvent riskEvent) throws Exception {
// key by user for aggregation
return riskEvent.getEventType() + riskEvent.getDeviceFp();
}
});
PatternStream<RiskEvent> eventPatternStream = CEP.pattern(keyedStream, loginPattern);
eventPatternStream.select(new PatternSelectFunction<RiskEvent, RiskResult>() {
@Override
public RiskResult select(Map<String, List<RiskEvent>> map) throws Exception {
List<RiskEvent> list = map.get("start");
ArrayList<Long> times = new ArrayList<>();
for (RiskEvent riskEvent : list) {
times.add(riskEvent.getEventTime());
}
Long min = Collections.min(times);
Long max = Collections.max(times);
Set<String> accountList = list.stream().map(RiskEvent::getUserName).collect(Collectors.toSet());
logger.info("Time range: " + new Date(min) + " --- " + new Date(max) + " Event: " + list.get(0).getEventType() + ", Device fingerprint: " + list.get(0).getDeviceFp() + ", Associated accounts: " + accountList.toString());
return null;
}
});
You may notice that the skip strategy skipShortOnes is a customized strategy.
Here are my modifications to the CEP library.
Add the strategy to the enum:
public enum SkipStrategy{
NO_SKIP,
SKIP_PAST_LAST_EVENT,
SKIP_TO_FIRST,
SKIP_TO_LAST,
SKIP_SHORT_ONES
}
Add an access method in AfterMatchSkipStrategy.java:
public static AfterMatchSkipStrategy skipShortOnes() {
return new AfterMatchSkipStrategy(SkipStrategy.SKIP_SHORT_ONES);
}
Add the strategy's actions to the discardComputationStatesAccordingToStrategy method in NFA.java:
case SKIP_SHORT_ONES:
int i = 0;
List<Map<String, List<T>>> tempResult = new ArrayList<>(matchedResult);
for (Map<String, List<T>> resultMap : tempResult) {
if (i++ == 0) {
continue;
}
matchedResult.remove(resultMap);
}
break;
