Flink Complex Event Processing

I have a Flink CEP program that reads from a socket and detects a pattern. Let's say the pattern (word) is 'alert'. If the word 'alert' occurs five times or more, an alert should be created. But I am getting an input mismatch error. The Flink version is 1.3.0. Thanks in advance!
package pattern;
import org.apache.flink.cep.CEP;
import org.apache.flink.cep.PatternStream;
import org.apache.flink.cep.pattern.Pattern;
import org.apache.flink.cep.pattern.conditions.IterativeCondition;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;
import java.util.List;
import java.util.Map;
public class cep {
public static void main(String[] args) throws Exception {
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
DataStreamSource<String> dss = env.socketTextStream("localhost", 3005);
dss.print();
Pattern<String,String> pattern = Pattern.<String> begin("first")
.where(new IterativeCondition<String>() {
@Override
public boolean filter(String word, Context<String> context) throws Exception {
return word.equals("alert");
}
})
.times(5);
PatternStream<String> patternstream = CEP.pattern(dss, pattern);
DataStream<String> alerts = patternstream
.flatSelect((Map<String,List<String>> in, Collector<String> out) -> {
String first = in.get("first").get(0);
for (int i = 0; i < 6; i++ ) {
out.collect(first);
}
});
alerts.print();
env.execute();
}
}

Just some clarification on the original problem: in 1.3.0 there was a bug that made it impossible to use lambdas as arguments to select/flatSelect.
It was fixed in 1.3.1, so your first version of the code will work with 1.3.1.
Besides, I think you have misinterpreted the times quantifier. It matches an exact number of times, so in your case it will fire only when the event is matched exactly five times, not five or more.
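If "five or more" is really what you need, an open-ended quantifier expresses it directly. The sketch below uses timesOrMore, which as far as I remember was only added after 1.3 (in 1.4), so treat it as an illustration of the intent rather than something guaranteed to compile on 1.3.x:
Pattern<String, String> pattern = Pattern.<String>begin("first")
        .where(new IterativeCondition<String>() {
            @Override
            public boolean filter(String word, Context<String> context) throws Exception {
                return word.equals("alert");
            }
        })
        // five or more "alert" events; on 1.3.x you would have to stay with times(5)
        .timesOrMore(5);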

So I have got the code to work. Here is the working solution,
package pattern;
import org.apache.flink.cep.CEP;
import org.apache.flink.cep.PatternSelectFunction;
import org.apache.flink.cep.PatternStream;
import org.apache.flink.cep.pattern.Pattern;
import org.apache.flink.cep.pattern.conditions.IterativeCondition;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;
import java.util.List;
import java.util.Map;
public class cep {
public static void main(String[] args) throws Exception {
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
DataStreamSource<String> dss = env.socketTextStream("localhost", 3005);
dss.print();
Pattern<String,String> pattern = Pattern.<String> begin("first")
.where(new IterativeCondition<String>() {
@Override
public boolean filter(String word, Context<String> context) throws Exception {
return word.equals("alert");
}
})
.times(5);
PatternStream<String> patternstream = CEP.pattern(dss, pattern);
DataStream<String> alerts = patternstream
.select(new PatternSelectFunction<String, String>() {
@Override
public String select(Map<String, List<String>> in) throws Exception {
String first = in.get("first").get(0);
if(first.equals("alert")){
return ("5 or more alerts");
}
else{
return (" ");
}
}
});
alerts.print();
env.execute();
}
}
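As a side note on the working solution: because the where() condition only accepts events equal to "alert", the else branch above can never be taken. A flatSelect-based variant (just a sketch, not something you need to change) avoids emitting the placeholder blank string altogether:
// requires: import org.apache.flink.cep.PatternFlatSelectFunction;
DataStream<String> alerts = patternstream
        .flatSelect(new PatternFlatSelectFunction<String, String>() {
            @Override
            public void flatSelect(Map<String, List<String>> in, Collector<String> out) throws Exception {
                // "first" holds the matched "alert" events; emit one alert per complete match
                if (!in.get("first").isEmpty()) {
                    out.collect("5 or more alerts");
                }
            }
        });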

Related

Checkpointing is not working with DynamoDB Streams records in Flink

How do I checkpoint the processed records in Apache Flink? The same messages are being consumed at regular intervals.
Do I need to explicitly checkpoint each message after it is consumed?
I can see that the eventId and sequenceNumber match across multiple consumed messages.
It seems checkpointing is not happening, so the same messages are retrieved from the stream at regular intervals.
Here is the code:
package com.flink.basics;
import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.kinesis.shaded.com.amazonaws.services.dynamodbv2.model.AttributeValue;
import org.apache.flink.kinesis.shaded.com.amazonaws.services.dynamodbv2.model.Record;
import org.apache.flink.kinesis.shaded.com.amazonaws.services.kinesis.clientlibrary.lib.worker.PreparedCheckpointer;
import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.CheckpointConfig;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.streaming.api.functions.sink.DiscardingSink;
import org.apache.flink.streaming.connectors.kinesis.FlinkDynamoDBStreamsConsumer;
import org.apache.flink.streaming.connectors.kinesis.config.AWSConfigConstants;
import org.apache.flink.streaming.connectors.kinesis.config.ConsumerConfigConstants;
import org.apache.flink.streaming.connectors.kinesis.serialization.DynamoDBStreamsSchema;
import org.apache.flink.util.Collector;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.Properties;
public class DynamoDbConsumer {
public static void main(String[] args) throws Exception {
Properties consumerConfig = new Properties();
consumerConfig.put(AWSConfigConstants.AWS_REGION, "us-east-1");
consumerConfig.put(AWSConfigConstants.AWS_ACCESS_KEY_ID, "aws_access_key_id");
consumerConfig.put(AWSConfigConstants.AWS_SECRET_ACCESS_KEY, "aws_secret_access_key");
consumerConfig.put(AWSConfigConstants.AWS_ENDPOINT, "http://localhost:4566");
consumerConfig.put(ConsumerConfigConstants.STREAM_INITIAL_POSITION, "LATEST");
System.setProperty("com.amazonaws.sdk.disableCbor", "true");
System.setProperty("org.apache.flink.kinesis.shaded.com.amazonaws.sdk.disableCbor", "true");
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.enableCheckpointing(1000, CheckpointingMode.EXACTLY_ONCE);
env.getCheckpointConfig().setMaxConcurrentCheckpoints(1);
env.getCheckpointConfig().enableExternalizedCheckpoints(CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);
// File based Backend
env.setStateBackend(new FsStateBackend(Paths.get("/Users/polimea/flink-basics/stbackend").toUri(), false));
FlinkDynamoDBStreamsConsumer<Record> flinkConsumer = new FlinkDynamoDBStreamsConsumer<Record>(
Collections.singletonList("arn:aws:dynamodb:us-east-1:000000000000:table/FDXTable/stream/2022-05-24T00:18:12.500"),
new DynamoDBStreamsSchema(), consumerConfig);
DataStream<Record> kinesisDBStream = env.addSource(flinkConsumer);
KeyedStream<Record, String> snapshotKeyedStream = kinesisDBStream.keyBy((KeySelector<Record, String>)
record -> record.getDynamodb().getNewImage().get("SNP").getS());
SingleOutputStreamOperator<Tuple2<String, Record>> records = snapshotKeyedStream.process(new StatefulReduceFunc());
records.print();
records.addSink(new DiscardingSink<>());
snapshotKeyedStream.process(new KeyedProcessFunction<String, Record, Object>() {
@Override
public void processElement(Record record, KeyedProcessFunction<String, Record, Object>.Context context,
Collector<Object> collector) throws Exception {
}
});
// kinesisDBStream.print();
env.execute("Stream for buffering dynamodb records till snapshot is committed");
}
private static class StatefulReduceFunc extends KeyedProcessFunction<String, Record, Tuple2<String, Record>> {
private transient ListState<Record> records;
public void open(Configuration parameters) {
ListStateDescriptor<Record> listStateDescriptor =
new ListStateDescriptor<>("records", Record.class);
records = getRuntimeContext().getListState(listStateDescriptor);
}
@Override
public void processElement(Record record, Context context,
Collector<Tuple2<String, Record>> collector) throws Exception {
Iterable<Record> recordIterator = this.records.get();
AttributeValue snCommitted = record.getDynamodb().getNewImage().get("SNCommitted");
if (snCommitted != null && snCommitted.getBOOL()) {
for (Record recordInList : recordIterator) {
collector.collect(new Tuple2<>(record.getDynamodb().getNewImage().get("SNP").getS(), recordInList));
}
} else {
records.add(record);
}
}
}
}
Not sure if this is related to your issue, but the code you provided will buffer the records forever. I think what you want is to emit the records and clear the state once the commit message arrives. Something along those lines:
// ...
if (snCommitted != null && snCommitted.getBOOL()) {
var snp = record.getDynamodb().getNewImage().get("SNP").getS();
for (Record recordInList : recordIterator) {
collector.collect(new Tuple2<>(snp, recordInList));
}
// explicitly clear the buffer not to emit same events over and over again
records.clear();
}
// ...

When Flink+Redisson fails and keeps retrying, the Redis connection count keeps growing and eventually reaches the upper limit

I encountered a problem while integrating Flink and Redisson. When the task hits an exception and keeps retrying, the number of Redis clients fluctuates (sometimes it goes up, sometimes down, but the overall trend is growth). Even though I shut down the Redisson instance by overriding the close() method, the number of Redis clients keeps growing, and eventually it reaches the upper limit and an error is thrown. Moreover, this only happens in cluster mode; in local mode the number of Redis clients stays stable. The test code is below. I wonder if you can provide specific reasons for this situation and a solution, thank you.
Flink version: 1.13.0
Redisson version: 3.16.1
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import java.util.Properties;
import java.util.Random;
public class ExceptionTest {
public static void main(String[] args) throws Exception{
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(new Configuration());
env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime);
env.enableCheckpointing(1000 * 60);
DataStream<String> mock = createDataStream(env);
mock.keyBy(x -> new Random().nextInt(20))
.process(new ExceptionTestFunction())
.uid("batch-query-key-process")
.filter(x->x!=null)
.print();
env.execute("Exception-Test");
}
private static DataStream<String> createDataStream(StreamExecutionEnvironment env) {
String topic = "test_topic_xhb03";
Properties test = new Properties();
test.setProperty(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker");
test.setProperty(ConsumerConfig.GROUP_ID_CONFIG, "group");
FlinkKafkaConsumer<String> consumer = new FlinkKafkaConsumer<String>(topic, new SimpleStringSchema(), test);
consumer.setStartFromLatest();
DataStream<String> source = env.addSource(consumer);
return source;
}
}
import lombok.extern.slf4j.Slf4j;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;
import org.redisson.Redisson;
import org.redisson.api.RedissonRxClient;
import org.redisson.config.Config;
@Slf4j
public class ExceptionTestFunction extends KeyedProcessFunction<Integer, String, String> {
private RedissonRxClient redisson;
@Override
public void close() {
this.redisson.shutdown();
log.info(String.format("Shut down redisson instance in close method, RedissonRxClient shutdown is %s", redisson.isShutdown()));
}
@Override
public void open(Configuration parameters) {
String prefix = "redis://";
Config config = new Config();
config.useSingleServer()
.setClientName("xhb-redisson-main")
.setTimeout(5000)
.setConnectTimeout(10000)
.setConnectionPoolSize(4)
.setConnectionMinimumIdleSize(2)
.setIdleConnectionTimeout(10000)
.setAddress("127.0.0.1:6379")
.setDatabase(0)
.setPassword(null);
this.redisson = Redisson.create(config).rxJava();
}
@Override
public void processElement(String value, Context ctx, Collector<String> out) throws Exception {
throw new NullPointerException("Null Pointer in ProcessElement");
}
}
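One thing worth checking (a sketch under assumptions, not a confirmed fix): Flink also calls close() when a task fails, including cases where open() never completed, so the shutdown should be guarded; and since every parallel subtask opens its own connection pool in open(), a restart loop multiplies pools, so lowering the pool sizes or sharing one client per TaskManager JVM keeps the total connection count bounded.
// Sketch only: guard against a half-initialized function so close() itself cannot
// throw and leave the client (and its connections) behind across repeated restarts.
@Override
public void close() {
    if (redisson != null && !redisson.isShutdown()) {
        redisson.shutdown();
    }
}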

Why is my Flink standalone-cluster not receiving my job?

I created a program in Flink (Java) to calculate the average of 9 fake sensors in 3 different rooms. The program runs fine if I start the jar file directly. So I decided to start the Flink standalone cluster to watch the TaskManagers running my job and its tasks, as described here (https://ci.apache.org/projects/flink/flink-docs-stable/tutorials/local_setup.html). I am running everything on my machine.
Why can I not see the job running on the dashboard (http://localhost:8081/#/overview), even though the log files (tail -f log/flink--client--*-T430.log) show something being processed?
Moreover, the print() method is writing its output to the console.
I start my application with this command: ./bin/flink run examples/explore-flink.jar -c
But maybe there is some parameter in a config file that I have to set. Here is my code:
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.api.common.state.MapState;
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.sense.flink.mqtt.MqttTemperature;
import org.sense.flink.mqtt.TemperatureMqttConsumer;
public class SensorsMultipleReadingMqttEdgentQEP {
private boolean checkpointEnable = true;
private long checkpointInterval = 1000;
private CheckpointingMode checkpointMode = CheckpointingMode.EXACTLY_ONCE;
public SensorsMultipleReadingMqttEdgentQEP() throws Exception {
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setStreamTimeCharacteristic(TimeCharacteristic.IngestionTime);
if (checkpointEnable)
env.enableCheckpointing(checkpointInterval, checkpointMode);
DataStream<MqttTemperature> temperatureStream01 = env.addSource(new TemperatureMqttConsumer("topic-edgent-01"));
DataStream<MqttTemperature> temperatureStream02 = env.addSource(new TemperatureMqttConsumer("topic-edgent-02"));
DataStream<MqttTemperature> temperatureStream03 = env.addSource(new TemperatureMqttConsumer("topic-edgent-03"));
DataStream<MqttTemperature> temperatureStreams = temperatureStream01.union(temperatureStream02)
.union(temperatureStream03);
DataStream<Tuple2<String, Double>> average = temperatureStreams.keyBy(new TemperatureKeySelector())
.map(new AverageTempMapper());
average.print();
String executionPlan = env.getExecutionPlan();
System.out.println("ExecutionPlan ........................ ");
System.out.println(executionPlan);
System.out.println("........................ ");
// env.execute("SensorsMultipleReadingMqttEdgentQEP");
env.execute();
}
public static class TemperatureKeySelector implements KeySelector<MqttTemperature, Integer> {
private static final long serialVersionUID = 5905504239899133953L;
@Override
public Integer getKey(MqttTemperature value) throws Exception {
return value.getId();
}
}
public static class AverageTempMapper extends RichMapFunction<MqttTemperature, Tuple2<String, Double>> {
private static final long serialVersionUID = -5489672634096634902L;
private MapState<String, Double> averageTemp;
@Override
public void open(Configuration parameters) throws Exception {
averageTemp = getRuntimeContext()
.getMapState(new MapStateDescriptor<>("average-temperature", String.class, Double.class));
}
@Override
public Tuple2<String, Double> map(MqttTemperature value) throws Exception {
String key = "no-room";
Double temp = value.getTemp();
if (value.getId().equals(1) || value.getId().equals(2) || value.getId().equals(3)) {
key = "room-A";
} else if (value.getId().equals(4) || value.getId().equals(5) || value.getId().equals(6)) {
key = "room-B";
} else if (value.getId().equals(7) || value.getId().equals(8) || value.getId().equals(9)) {
key = "room-C";
} else {
System.err.println("Sensor not defined in any room.");
}
if (averageTemp.contains(key)) {
temp = (averageTemp.get(key) + value.getTemp()) / 2;
} else {
averageTemp.put(key, temp);
}
return new Tuple2<String, Double>(key, temp);
}
}
}
Thanks,
Felipe
After I selected the option "Extract required libraries into generated JAR" it worked. Strange, because I was generating the JAR with the option "Package required libraries into generated JAR" and it was not working.
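One more detail worth mentioning (an aside; the class name below is a placeholder, not taken from the project): in ./bin/flink run examples/explore-flink.jar -c, everything after the jar is passed to the program as an argument, so the trailing -c is ignored by the CLI. If you want to pick the entry class explicitly, -c takes the fully qualified class name and goes before the jar:
./bin/flink run -c your.package.MainClass examples/explore-flink.jar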

Flink CEP No Results Printed

I am trying to print out a string if Hello and World are found using the Flink CEP library. My source is Kafka, and I am using the console producer to input the data. That part is working: I can print out what I enter into the topic. However, it will not print my final message "The world is so nice!". It will not even print that it entered the lambda. Below is the class.
package kafka;
import org.apache.flink.cep.CEP;
import org.apache.flink.cep.PatternStream;
import org.apache.flink.cep.pattern.Pattern;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer08;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;
import org.apache.flink.util.Collector;
import java.util.Map;
import java.util.Properties;
/**
* Created by crackerman on 9/16/16.
*/
public class WordCount {
public static void main(String[] args) throws Exception {
Properties properties = new Properties();
properties.put("bootstrap.servers", "localhost:9092");
properties.put("zookeeper.connect", "localhost:2181");
properties.put("group.id", "test");
StreamExecutionEnvironment see = StreamExecutionEnvironment.getExecutionEnvironment();
see.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
DataStream<String> src = see.addSource(new FlinkKafkaConsumer08<>("complexString",
new SimpleStringSchema(),
properties));
src.print();
Pattern<String, String> pattern = Pattern.<String>begin("first")
.where(evt -> evt.contains("Hello"))
.followedBy("second")
.where(evt -> evt.contains("World"));
PatternStream<String> patternStream = CEP.pattern(src, pattern);
DataStream<String> alerts = patternStream.flatSelect(
(Map<String, String> in, Collector<String> out) -> {
System.out.println("Made it to the lambda");
String first = in.get("first");
String second = in.get("second");
System.out.println("First: " + first);
System.out.println("Second: " + second);
if (first.equals("Hello") && second.equals("World")) {
out.collect("The world is so nice!");
}
});
alerts.print();
see.execute();
}
}
Any help would be greatly appreciated.
Thanks!
The issue is the following line:
see.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
If that is removed, it works the way I expected it to.
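For completeness, a sketch (not part of the original setup) of what keeping event time would require: the CEP operator only emits matches once watermarks advance, so the source needs timestamps and watermarks assigned, for example:
// requires: import org.apache.flink.streaming.api.functions.timestamps.AscendingTimestampExtractor;
DataStream<String> srcWithTimestamps = src.assignTimestampsAndWatermarks(
        new AscendingTimestampExtractor<String>() {
            @Override
            public long extractAscendingTimestamp(String element) {
                // placeholder: a real job would extract the timestamp from the event itself
                return System.currentTimeMillis();
            }
        });
PatternStream<String> patternStream = CEP.pattern(srcWithTimestamps, pattern);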

Selenium WebDriver - failure screenshot is not captured in TestNG report

With the code below, if the test case passes, the screenshot is captured successfully and displayed in the report. But when the test fails, the screenshot is not displayed; even the screenshot hyperlink is not shown in the report. Can anybody spot the mistake in the code?
package listeners;
import java.io.File;
import java.io.IOException;
import java.text.Format;
import java.text.SimpleDateFormat;
import java.util.Date;
import org.apache.commons.io.FileUtils;
import org.openqa.selenium.By;
import org.openqa.selenium.OutputType;
import org.openqa.selenium.TakesScreenshot;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;
import org.testng.ITestResult;
import org.testng.Reporter;
import org.testng.annotations.Listeners;
import org.testng.annotations.Test;
import org.testng.TestListenerAdapter;
import java.util.logging.Logger;
@Listeners
public class CountryChoserLayer extends TestListenerAdapter {
@Test(priority=1)
public void choseCountry() throws Exception{
driver.findElement(By.id("intselect")).sendKeys("India");
driver.findElement(By.xpath(".//*[@id='countryChooser']/a/img")).click();
//window.onbeforeunload = null;
Date date=new Date();
Format formatter = new SimpleDateFormat("yyyy-MM-dd_hh-mm-ss");
File scrnsht = ((TakesScreenshot)driver).getScreenshotAs(OutputType.FILE);
String NewFileNamePath=("C://Documents and Settings//vlakshm//workspace//MyTNG//test-output//Screenshots"+"//SearsINTL_"+ formatter.format(date)+".png");
FileUtils.copyFile(scrnsht, new File(NewFileNamePath));
System.out.println(NewFileNamePath);
Reporter.log("Passed Screenshot");
System.out.println("---------------------------------------");
System.out.println("Country choser layer test case-Success");
System.out.println("---------------------------------------");
}
public String baseurl="http://www.sears.com/shc/s/CountryChooserView?storeId=10153&catalogId=12605";
public WebDriver driver;
public int Count = 0;
@Test(priority=0)
public void openBrowser() {
driver = new FirefoxDriver();
driver.manage().deleteAllCookies();
driver.get(baseurl);
}
@Test(priority=2)
public void closeBrowser() {
driver.quit();
}
@Override
public void onTestFailure(ITestResult result){
Reporter.log("Fail");
System.out.println("BBB");
//Reporter.setCurrentTestResult(result);
Date date=new Date();
Format formatter = new SimpleDateFormat("yyyy-MM-dd_hh-mm-ss");
File scrnsht = ((TakesScreenshot)driver).getScreenshotAs(OutputType.FILE);
//File scrFile = ((TakesScreenshot) WebDriver.globalDriverInstance).getScreenshotAs(OutputType.FILE);
String NewFileNamePath=("C://Documents and Settings//vlakshm//workspace//MyTNG//test-output//Screenshots"+"//SearsINTL_"+ formatter.format(date)+".png");
//System.out.println("AAA" + NewFileNamePath);
try {
//System.out.println("CCC");
FileUtils.copyFile(scrnsht,new File(NewFileNamePath));
System.out.println(NewFileNamePath);
} catch (IOException e) {
// TODO Auto-generated catch block
System.out.println("DDD");
e.printStackTrace();
}
Reporter.log("Failed Screenshot");
Reporter.setCurrentTestResult(null);
System.out.println("---------------------------------------");
System.out.println("Country choser layer test case Failed");
System.out.println("---------------------------------------");
}
@Override
public void onTestSkipped(ITestResult result) {
// will be called after test will be skipped
Reporter.log("Skip");
}
@Override
public void onTestSuccess(ITestResult result) {
// will be called after test will pass
Reporter.log("Pass");
}
}
Your onTestFailure method is not being called because you didn't specify a listener for your test class. You are missing a value in the @Listeners annotation. It should be something like:
@Listeners({CountryChoserLayer.class})
You can find more ways of specifying a listener in the official TestNG documentation.
Another problem you are likely to encounter is a NullPointerException when trying to take the screenshot in the onTestFailure method. The easiest workaround is to change the declaration of the driver field to static. I ran the code with those fixes and got the report with the screenshot.
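For the second point, the change is just in the field declaration (a sketch; everything else stays as in the question):
// a static field lets onTestFailure() see the same WebDriver instance the @Test methods use
public static WebDriver driver;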
I must add that in my opinion putting both test and listener methods into one class is not a good practice.
