Is there a work-around to handle multiple "temporal constraints" in Flink CEP? - flink-streaming

As stated in the CEP documentation (https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/libs/cep.html), only one temporal constraint is allowed in a pattern sequence, so I'm struggling to find a way to handle a business case that contains two temporal constraints.
I need to monitor some business events and alert on the events that meet the following rules:
a new account is registered
the account is authenticated within 5 minutes after registration
the account completes at least 2 transactions, each with an amount greater than 1000.00, within the next hour
And the code is something like this:
Pattern<Event, ?> pattern = Pattern.<Event>begin("register").where(new SimpleCondition<Event>() {
    @Override
    public boolean filter(Event value) throws Exception {
        return (value.getEventType() == EventType.REGISTER);
    }
}).followedBy("authentication").where(new SimpleCondition<Event>() {
    @Override
    public boolean filter(Event value) throws Exception {
        return (value.getEventType() == EventType.AUTHENTICATION);
    }
}).where(new IterativeCondition<Event>() {
    @Override
    public boolean filter(Event value, Context<Event> ctx) throws Exception {
        // authentication must happen within 5 minutes of registration
        for (Event event : ctx.getEventsForPattern("register")) {
            if (value.getEventTime() - event.getEventTime() <= 1000 * 60 * 5) {
                return true;
            }
        }
        return false;
    }
}).followedBy("transaction").where(new SimpleCondition<Event>() {
    @Override
    public boolean filter(Event value) throws Exception {
        return (value.getEventType() == EventType.TRANSACTION && value.getAmount() > 1000.00);
    }
}).where(new IterativeCondition<Event>() {
    @Override
    public boolean filter(Event value, Context<Event> ctx) throws Exception {
        // transactions must happen within 1 hour of authentication
        for (Event event : ctx.getEventsForPattern("authentication")) {
            if (value.getEventTime() - event.getEventTime() <= 1000 * 60 * 60) {
                return true;
            }
        }
        return false;
    }
}).timesOrMore(2);
You can see that I use two IterativeConditions to handle the temporal constraints. Is there a better way to make the code more concise?

As you said, you can currently apply only one time constraint to the whole pattern in the CEP library. What you could do, though, is split your pattern into two sub-patterns: first apply a pattern that looks for REGISTER -> AUTHENTICATE and generates a complex event out of those matches (let's name it REGISTER_AUTHENTICATED), then use it in a subsequent pattern REGISTER_AUTHENTICATED -> 2* TRANSACTION.
That way you can apply a separate time constraint to each of the two patterns, as in the sketch below.
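For illustration, a rough sketch of this two-pattern approach, assuming a hypothetical RegisterAuthenticated complex-event type and reusing the Event/EventType classes from the question:

// First pattern: REGISTER followed by AUTHENTICATION, constrained to 5 minutes.
Pattern<Event, ?> registerAuth = Pattern.<Event>begin("register")
    .where(new SimpleCondition<Event>() {
        @Override
        public boolean filter(Event value) {
            return value.getEventType() == EventType.REGISTER;
        }
    })
    .followedBy("authentication")
    .where(new SimpleCondition<Event>() {
        @Override
        public boolean filter(Event value) {
            return value.getEventType() == EventType.AUTHENTICATION;
        }
    })
    .within(Time.minutes(5));

// Collapse each match into a single REGISTER_AUTHENTICATED complex event.
DataStream<RegisterAuthenticated> registerAuthenticated =
    CEP.pattern(events, registerAuth)
        .select(new PatternSelectFunction<Event, RegisterAuthenticated>() {
            @Override
            public RegisterAuthenticated select(Map<String, List<Event>> match) {
                return new RegisterAuthenticated(
                    match.get("register").get(0),
                    match.get("authentication").get(0));
            }
        });

The second pattern, REGISTER_AUTHENTICATED followed by timesOrMore(2) large transactions, would then be applied with its own within(Time.hours(1)) to a stream that contains both the complex events and the transaction events.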

Related

Can I use Flink CEP to sort a stream?

I know I can use Flink SQL to sort a stream by timestamp, but as I'm already using CEP, I'd like to use it for sorting instead.
Sorting with CEP is pretty easy, since CEP always sorts its input by timestamp. Something like this will do the trick:
DataStream<Event> streamWithTimestampsAndWatermarks = ...

Pattern<Event, ?> matchEverything =
    Pattern.<Event>begin("any")
        .where(new SimpleCondition<Event>() {
            @Override
            public boolean filter(Event event) throws Exception {
                return true;
            }
        });

PatternStream<Event> patternStream = CEP.pattern(
    streamWithTimestampsAndWatermarks, matchEverything);

SingleOutputStreamOperator<Event> sorted = patternStream
    .select(new PatternSelectFunction<Event, Event>() {
        @Override
        public Event select(Map<String, List<Event>> map) throws Exception {
            return map.get("any").get(0);
        }
    });
If you want to sort the stream key-by-key, rather than globally, then use keyBy before applying a pattern to it.
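For instance, a minimal sketch, assuming a hypothetical getKey() accessor on Event:

// Apply the same match-everything pattern per key instead of globally.
PatternStream<Event> keyedPatternStream = CEP.pattern(
    streamWithTimestampsAndWatermarks.keyBy(event -> event.getKey()),
    matchEverything);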

Absence of event in Apache Flink CEP

I'm new to Apache Flink CEP and I'm struggling to detect a simple absence of an event.
What I'm trying to detect is whether an event of type CurrencyEvent with a certain id does not occur within a certain amount of time. I would like to detect the absence of such an event every time 3000ms pass without it occurring.
My pattern code looks as follows:
Pattern<Event, ?> myPattern = Pattern.<Event>begin("CurrencyEvent")
    .subtype(CurrencyEvent.class)
    .where(new SimpleCondition<CurrencyEvent>() {
        @Override
        public boolean filter(CurrencyEvent currencyEvent) throws Exception {
            return currencyEvent.getId().equalsIgnoreCase("usd");
        }
    })
    .within(Time.milliseconds(3000L));
So now my idea is to use timeout functions in order to detect timeout events:
DataStreamSource<Event> events = env.addSource(new TestSource(
    Arrays.asList(
        basicCurrencyWithMivLevelEvent("EUR", 100L, Arrays.asList("1", "2"), 200D),
        basicCurrencyWithMivLevelEvent("USD", 100L, Arrays.asList("1", "2"), 200D),
        basicCurrencyWithMivLevelEvent("EUR", 100L, Arrays.asList("1", "2"), 200D)
    ),
    1636040364820L, // initial timestamp for the first element
    7000            // 7 seconds between each event
));

PatternStream<Event> patternStream = CEP.pattern(events, myPattern);

OutputTag<Alarm> tag = new OutputTag<Alarm>("currency-timeout"){};

PatternFlatTimeoutFunction<Event, Alarm> eventAlarmTimeoutPatternFunction = (patterns, timestamp, out) -> {
    System.out.println("New alarm, since after 3 seconds an event with id=usd is not detected");
    //TODO: call collect
};

PatternFlatSelectFunction<Event, Alarm> eventAlarmPatternSelectFunction = (patterns, out) -> {
    System.out.println("Select! (we can ignore it) " + patterns);
    // ignore matched events
};

return patternStream.flatSelect(
    tag,
    eventAlarmTimeoutPatternFunction,
    TypeInformation.of(Alarm.class),
    eventAlarmPatternSelectFunction
);
My Test source is using event timestamps and watermarks, as shown as follows:
public class TestSource implements SourceFunction<Event> {

    private final List<Event> events;
    private final long initialTimestamp;
    private final long timeBetweenInMillis;

    public TestSource(List<Event> events, long initialTimestamp, long timeBetweenInMillis) {
        this.events = events;
        this.initialTimestamp = initialTimestamp;
        this.timeBetweenInMillis = timeBetweenInMillis;
    }

    @Override
    public void run(SourceContext<Event> sourceContext) throws InterruptedException {
        long timestamp = this.initialTimestamp;
        for (Event event : this.events) {
            sourceContext.collectWithTimestamp(event, timestamp);
            sourceContext.emitWatermark(new Watermark(timestamp));
            timestamp += this.timeBetweenInMillis;
        }
    }

    @Override
    public void cancel() {
    }
}
I'm using TimeCharacteristic.EventTime.
Since the window time (3 seconds) is lower than the event-time gap between consecutive events (7 seconds), I expect to get some timeout events, but I'm getting 0.
A CEP Pattern matches a sequence of one or more events; the within(interval) clause adds an additional constraint that all of the events in the sequence must occur within the specified interval. When partial matches time out, this can be captured in a TimedOutPartialMatchHandler.
In your case, since a successfully matched Pattern consists of a single event, there can be no partial matches, and a match can never time out. (Your matching sequences are always less than 3 seconds long.)
What you can do is extend the pattern definition to include a second event, so that a match requires a start event followed by another event within 3 seconds. When that second event is missing, you will have a partial match that times out.
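A minimal sketch of that extension (the follow-up step here is arbitrary; any second event that would complete the match works):

Pattern<Event, ?> myPattern = Pattern.<Event>begin("CurrencyEvent")
    .subtype(CurrencyEvent.class)
    .where(new SimpleCondition<CurrencyEvent>() {
        @Override
        public boolean filter(CurrencyEvent currencyEvent) {
            return currencyEvent.getId().equalsIgnoreCase("usd");
        }
    })
    // With a required follow-up event, a lone "CurrencyEvent" is only a
    // partial match, and it times out if nothing follows within 3 seconds.
    .next("followUp")
    .subtype(CurrencyEvent.class)
    .where(new SimpleCondition<CurrencyEvent>() {
        @Override
        public boolean filter(CurrencyEvent currencyEvent) {
            return currencyEvent.getId().equalsIgnoreCase("usd");
        }
    })
    .within(Time.milliseconds(3000L));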
For more flexibility than what CEP offers for implementing use cases involving missing events, you can use a KeyedProcessFunction with timers.
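For illustration, a rough sketch of that timer approach, assuming the stream is keyed by currency id and that Alarm has a String constructor (both assumptions, not given in the question):

public class AbsenceDetector extends KeyedProcessFunction<String, CurrencyEvent, Alarm> {

    // Timestamp of the currently armed timer for this key, if any.
    private transient ValueState<Long> timerState;

    @Override
    public void open(Configuration parameters) {
        timerState = getRuntimeContext().getState(
            new ValueStateDescriptor<>("timer", Long.class));
    }

    @Override
    public void processElement(CurrencyEvent event, Context ctx, Collector<Alarm> out) throws Exception {
        // An event arrived: cancel any pending deadline and arm a new one 3 seconds out.
        Long pending = timerState.value();
        if (pending != null) {
            ctx.timerService().deleteEventTimeTimer(pending);
        }
        long deadline = ctx.timestamp() + 3000L;
        ctx.timerService().registerEventTimeTimer(deadline);
        timerState.update(deadline);
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<Alarm> out) throws Exception {
        // The deadline fired: no event arrived for this key within 3 seconds.
        out.collect(new Alarm("no event for key " + ctx.getCurrentKey() + " within 3s"));
        timerState.clear();
    }
}

It would be applied to the CurrencyEvent stream with something like currencyEvents.keyBy(CurrencyEvent::getId).process(new AbsenceDetector()).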

How to create batch or slide windows using Flink CEP?

I'm just starting with Flink CEP and I come from the Esper CEP engine. As you may (or may not) know, in Esper you can easily create a batch or sliding window using their syntax (EPL), grouping the events in those windows and applying functions (avg, max, min, ...) to them.
For example, with the following pattern you can create a batch window of 5 seconds and calculate the average value of the price attribute of all the Stock events received in that window:
select avg(price) from Stock#time_batch(5 sec)
The thing is, I would like to know how to implement this in Flink CEP. I'm aware that the goal or approach in Flink CEP is probably different, so the way to implement this may not be as simple as in Esper CEP.
I have taken a look at the docs regarding time windows, but I haven't been able to combine such windows with Flink CEP. So, given the following code:
DataStream<Stock> stream = ...; // Consume events from Kafka

// Filtering events with negative price
Pattern<Stock, ?> pattern = Pattern.<Stock>begin("start")
    .where(new SimpleCondition<Stock>() {
        public boolean filter(Stock event) {
            return event.getPrice() >= 0;
        }
    });

PatternStream<Stock> patternStream = CEP.pattern(stream, pattern);

/**
 * CREATE A BATCH WINDOW OF 5 SECONDS IN WHICH
 * I COMPUTE THE AVERAGE OF THE PRICES AND, IF IT IS
 * GREATER THAN A THRESHOLD, AN ALERT IS DETECTED
 *
 * return avg(allEventsInWindow.getPrice()) > 1;
 */
DataStream<Alert> result = patternStream.select(
    new PatternSelectFunction<Stock, Alert>() {
        @Override
        public Alert select(Map<String, List<Stock>> pattern) throws Exception {
            return new Alert(pattern.toString());
        }
    }
);
How can I create a window in which, starting from the first event received, the average of the events arriving within the next 5 seconds is calculated? For example:
t = 0 seconds
Stock(price = 1); (...starting batch window...)
Stock(price = 1);
Stock(price = 1);
Stock(price = 2);
Stock(price = 2);
Stock(price = 2);
t = 5 seconds (...end of batch window...)
Avg = 1.5 => Alert detected!
The average after 5 seconds would be 1.5, which should trigger the alert. How can I code this?
Thanks!
This behavior is not expressible with Flink's CEP library. I would rather recommend using Flink's DataStream or Table API to calculate the averages. Based on that, you could again use CEP to generate other events.
final DataStream<Stock> input = env
    .fromElements(
        new Stock(1L, 1.0),
        new Stock(2L, 2.0),
        new Stock(3L, 1.0),
        new Stock(4L, 2.0))
    .assignTimestampsAndWatermarks(
        new BoundedOutOfOrdernessTimestampExtractor<Stock>(Time.seconds(0L)) {
            @Override
            public long extractTimestamp(Stock element) {
                return element.getTimestamp();
            }
        });

final DataStream<Double> windowAggregation = input
    .timeWindowAll(Time.milliseconds(2))
    .aggregate(new AggregateFunction<Stock, Tuple2<Integer, Double>, Double>() {
        @Override
        public Tuple2<Integer, Double> createAccumulator() {
            return Tuple2.of(0, 0.0);
        }

        @Override
        public Tuple2<Integer, Double> add(Stock value, Tuple2<Integer, Double> accumulator) {
            return Tuple2.of(accumulator.f0 + 1, accumulator.f1 + value.getValue());
        }

        @Override
        public Double getResult(Tuple2<Integer, Double> accumulator) {
            return accumulator.f1 / accumulator.f0;
        }

        @Override
        public Tuple2<Integer, Double> merge(Tuple2<Integer, Double> a, Tuple2<Integer, Double> b) {
            return Tuple2.of(a.f0 + b.f0, a.f1 + b.f1);
        }
    });

final DataStream<Double> result = windowAggregation.filter((FilterFunction<Double>) value -> value > THRESHOLD);
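From there, a hypothetical follow-up could turn each threshold breach into an Alert, reusing the Alert(String) constructor from the question:

final DataStream<Alert> alerts = result.map(new MapFunction<Double, Alert>() {
    @Override
    public Alert map(Double avg) {
        return new Alert("average price " + avg + " exceeded the threshold");
    }
});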

Processing multiple patterns in Flink CEP in Parallel

I have the following scenario:
Two virtual machines send streams to Kafka, which are received by the CEP engine, and warnings are generated when particular conditions are satisfied on the individual streams.
Currently, CEP checks the same conditions (heart rate > 65 and respiration rate > 68) on both streams for both patients and raises alarms in parallel, as shown below:
// detecting pattern
Pattern<joinEvent, ?> pattern = Pattern.<joinEvent>begin("start")
    .subtype(joinEvent.class)
    .where(new FilterFunction<joinEvent>() {
        @Override
        public boolean filter(joinEvent joinEvent) throws Exception {
            return joinEvent.getHeartRate() > 65;
        }
    })
    .subtype(joinEvent.class)
    .where(new FilterFunction<joinEvent>() {
        @Override
        public boolean filter(joinEvent joinEvent) throws Exception {
            return joinEvent.getRespirationRate() > 68;
        }
    })
    .within(Time.milliseconds(100));
But I want to use different conditions for the two streams. For example, I would like to raise an alarm if:
For patient A : if heart rate > 65 and Respiration Rate > 68
For patient B : if heart rate > 75 and Respiration Rate > 78
How do I achieve this? Do I need to create multiple stream environments, or multiple patterns in the same environment?
For your requirements, you can create two different patterns to have a clear separation if you want.
Doing it with a single pattern is possible as well. To do so, read all your Kafka topics with one Kafka source:
FlinkKafkaConsumer010<JoinEvent> kafkaSource = new FlinkKafkaConsumer010<>(
    Arrays.asList("topic1", "topic2"),
    new StringSerializerToEvent(),
    props);
Here I am assuming that the structure of your events is the same in both topics and that the patient name is transmitted as part of the event.
Once you have done that, it becomes easy, as you just need to create a pattern with "or", something like the following:
Pattern.<JoinEvent>begin("first")
    .where(new SimpleCondition<JoinEvent>() {
        @Override
        public boolean filter(JoinEvent event) throws Exception {
            return event.getPatientName().equals("A") && event.getHeartRate() > 65 && event.getRespirationRate() > 68;
        }
    })
    .or(new SimpleCondition<JoinEvent>() {
        @Override
        public boolean filter(JoinEvent event) throws Exception {
            return event.getPatientName().equals("B") && event.getHeartRate() > 75 && event.getRespirationRate() > 78;
        }
    });
This will produce a match whenever your condition matches. However, I am not really sure what .within(Time.milliseconds(100)) achieves in your example.

NullPointerException thrown when inserting an entity using the auto-generated Classendpoint insert method

I am confused about using the auto-generated endpoint class. I want to use the generated endpoint to insert a new object into the datastore, but an exception is thrown:
fooEndpoint.insertFoo(foo); // throws null pointer exception
My entity class is similar to the example given at this source: https://developers.google.com/appengine/docs/java/datastore/jpa/overview.
Here is my entity:
@Entity
public class Foo {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Key ID;
Here is the stack trace:
java.lang.NullPointerException
    at org.datanucleus.api.jpa.JPAEntityManager.find(JPAEntityManager.java:318)
    at org.datanucleus.api.jpa.JPAEntityManager.find(JPAEntityManager.java:256)
    at com.FooEndpoint.containsFoo(FooEndpoint.java:150)
    at com.FooEndpoint.insertFoo(FooEndpoint.java:96)
On the other hand, I can insert a new object when I use the EntityManager persist method directly, because it does not check whether the object already exists in the datastore.
I expected the class endpoint's insert method to save the object and assign an auto-generated key to the ID field.
Or do I need to initialize the ID field?
Here is auto-generated endpoint class insertFoo method.
/**
 * This inserts a new entity into App Engine datastore. If the entity already
 * exists in the datastore, an exception is thrown.
 * It uses HTTP POST method.
 *
 * @param foo the entity to be inserted.
 * @return The inserted entity.
 */
public Foo insertFoo(Foo foo) {
    EntityManager mgr = getEntityManager();
    try {
        if (containsFoo(foo)) {
            throw new EntityExistsException("Object already exists");
        }
        mgr.persist(foo);
    } finally {
        mgr.close();
    }
    return foo;
}
Here is the containsFoo method
private boolean containsFoo(Foo foo) {
    EntityManager mgr = getEntityManager();
    boolean contains = true;
    try {
        Foo item = mgr.find(Foo.class, foo.getID()); // exception occurs here
        if (item == null) {
            contains = false;
        }
    } finally {
        mgr.close();
    }
    return contains;
}
foo.getID() is null because it is a new object. I am expecting App Engine to create a key for it. Or do I need to explicitly create a key for it?
The other fields in the Foo class are simple types such as String and boolean.
Thanks for your time.
I had exactly the same problem.
I will present the way I worked around it.
Relevant code of the original auto-generated endpoint class:
private boolean containsFoo(Foo foo) {
    EntityManager mgr = getEntityManager();
    boolean contains = true;
    try {
        Foo item = mgr.find(Foo.class, foo.getID());
        if (item == null) {
            contains = false;
        }
    } finally {
        mgr.close();
    }
    return contains;
}
I changed the relevant code to include a null check for the entity object that is passed as an argument:
private boolean containsFoo(Foo foo) {
    EntityManager mgr = getEntityManager();
    boolean contains = true;
    try {
        // If no ID was set, the entity doesn't exist yet.
        if (foo.getID() == null) {
            return false;
        }
        Foo item = mgr.find(Foo.class, foo.getID());
        if (item == null) {
            contains = false;
        }
    } finally {
        mgr.close();
    }
    return contains;
}
This way it works as expected, although I'm confident that more experienced answers and explanations will appear.
I was having the exact same problem after using the Eclipse plugin to auto-generate the cloud endpoints (by selecting "Google > Generate Cloud Endpoint Class").
Following your advice, I added:
if (foo.getID() == null) // replace foo with the name of your own object
    return false;
The problem was solved.
How is it that Google hasn't updated the auto-generated code yet, given that this must be a highly recurring issue?
Thanks for the solution.
