NotifyBuilder.matches() always times out - apache-camel

I would like to validate the retry logic built into my Camel route definition.
from(somewhere)
.errorHandler(
defaultErrorHandler()
.log("something")
.maximumRedeliveries(3)
)
.to(somewhere-else)
To do so I wrote a test that deliberately raises an exception:
int counter = 0;
@Test
public void simulateError() throws Exception {
NotifyBuilder nb = new NotifyBuilder(mock.getCamelContext()).whenDone(3).create();
mock.whenAnyExchangeReceived(
new Processor() {
@Override
public void process(Exchange exchange) throws Exception {
counter++;
throw new FooException("Error during processing: " + counter);
}
}
);
template.sendBody(somewhere, "some message");
boolean matches = nb.matches(8, TimeUnit.SECONDS);
assertEquals("Exception raised", 3, counter);
}
Now this works fine. However, if I assert on matches by adding
assertTrue(matches);
it fails. In other words, the NotifyBuilder's match criterion is never met and it always times out.
Why is that? Is it because retries don't count as exchange deliveries?
What is the canonical way to test that redelivery is attempted the expected number of times?

Closing the loop & answering my own question.
Firstly: indeed, retries don't count towards done messages.
As noted by Claus Ibsen, the preferred (shortest?) solution is to verify that the mock receives the expected number of messages. That will be maxRetries + 1 (4 in my case). So the working code looks like this:
@Test
public void simulateError() throws Exception {
/*
* Verify the error handling logic by checking the number of messages that are delivered.
* It must be 1 + number of retries.
*/
mock.expectedMessageCount(maxRetries + 1);
mock.setAssertPeriod(6000); // Necessary to ensure the message count is treated as an exact number.
mock.whenAnyExchangeReceived(new Processor() {
@Override
public void process(Exchange exchange) throws Exception {
System.out.println("Intercepted to-endpoint");
FooException e = new FooException("Error during processing");
exchange.setException(e);
throw e;
}
});
producerTemplate.sendBody(umbFrom, "Hello world");
mock.assertIsSatisfied();
}
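For completeness, a sketch (untested, reusing the mock, endpoints and FooException from above) of how NotifyBuilder could still be used: all redelivery attempts happen inside the single exchange that enters the route, so the builder should wait for that one exchange to fail rather than for three exchanges to be done.

@Test
public void simulateErrorWithNotifyBuilder() throws Exception {
    // The redeliveries all happen within one exchange, so wait for that
    // single exchange to fail instead of counting "done" exchanges.
    NotifyBuilder nb = new NotifyBuilder(mock.getCamelContext()).whenFailed(1).create();

    mock.whenAnyExchangeReceived(new Processor() {
        @Override
        public void process(Exchange exchange) throws Exception {
            throw new FooException("Error during processing");
        }
    });

    template.sendBody(somewhere, "some message");

    assertTrue(nb.matches(8, TimeUnit.SECONDS));
}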

Related

Flink KeyedProcessFunction integrating anonymous methods

I am attempting to write a KeyedProcessFunction; the code looks like this:
DataStream<Tuple2<Long, Integer>> busyMachinesPerWindow = busyMachines
// group by timestamp (window end)
.keyBy(event -> event.getField(1))
.process(new KeyedProcessFunction<Tuple1<Long>, Tuple3<Long, Long, Long>, Tuple2<Long, Integer>>() {
private ValueState<Integer> state;
@Override
public void open(Configuration config) throws IOException {
// initialize the state descriptors here
state = getRuntimeContext().getState(new ValueStateDescriptor<>("machine-counts", Integer.class));
if (state.value() == null) {
state.update(0);
}
}
@Override
public void processElement(Tuple3<Long, Long, Long> inWindow, Context ctx, Collector<Tuple2<Long, Integer>> out) throws Exception {
if (state.value() != null) {
state.update(state.value() + 1);
} else {
state.update(1);
}
ctx.timerService().registerEventTimeTimer(inWindow.f1);
}
@Override
public void onTimer(long timestamp, OnTimerContext ctx, Collector<Tuple2<Long, Integer>> out) throws Exception {
int counter = state.value();
state.clear();
// we can now output the window and the machine count
out.collect(new Tuple2<>(((Tuple1<Long>) ctx.getCurrentKey()).f0, counter));
}
});
However, this fails with an error saying it cannot derive the anonymous method. I don't see what the problem is with this code. Is there some type ambiguity that I am not handling correctly?
One problem with this code is that you are calling state.value() and state.update(0) in the open method. This is not allowed. These methods can only be used in processElement and in onTimer, because only then is there a specific event being processed whose key can be used to access/update the appropriate entry in the state backend.
An instance of a KeyedProcessFunction is multiplexed across all of the keys assigned to a given task slot. The open method is called just once, at a time when there is no specific key in the runtime context, so the state cannot be accessed or updated at this time.
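A sketch of the corrected pattern (not compiled; names taken from the snippet above): create the state descriptor in open() and handle the "not yet set" case in processElement().

private ValueState<Integer> state;

@Override
public void open(Configuration config) {
    // Only create the descriptor here; per-key state can be read or updated
    // only while an element (and therefore a key) is being processed.
    state = getRuntimeContext().getState(
            new ValueStateDescriptor<>("machine-counts", Integer.class));
}

@Override
public void processElement(Tuple3<Long, Long, Long> inWindow, Context ctx,
                           Collector<Tuple2<Long, Integer>> out) throws Exception {
    Integer current = state.value();          // null the first time this key is seen
    state.update(current == null ? 1 : current + 1);
    ctx.timerService().registerEventTimeTimer(inWindow.f1);
}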

Apache Flink side output not outputting expected results when the order of processors is swapped in the original stream

I have a small Flink app:
public class App {
public static final OutputTag<String> numberOutputTag = new OutputTag<String>("side-output") {
};
public static void main(String[] args) throws Exception {
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
DataStreamSource<String> text = env.fromElements(
"abc,123"
);
// Router will split input on commas and redirect number strings to the side output
SingleOutputStreamOperator<String> ingestStream = text
.process(new RouterProcessor())
.process(new UppercaseProcessor())
;
DataStream<String> numberStream = ingestStream.getSideOutput(numberOutputTag)
// Prepends a "$" to the values.
.map(new MoneyMapper());
numberStream.print();
ingestStream.print();
env.execute();
}
}
class RouterProcessor extends ProcessFunction<String, String> {
@Override
public void processElement(String value, Context ctx, Collector<String> out) throws Exception {
String[] tokens = value.split(",");
for (String token : tokens) {
if (token.matches("[0-9]+")) {
ctx.output(App.numberOutputTag, token);
} else {
out.collect(token);
}
}
}
}
class MoneyMapper implements MapFunction<String, String> {
@Override
public String map(String t) throws Exception {
return "$" + t;
}
}
class UppercaseProcessor extends ProcessFunction<String, String> {
@Override
public void processElement(String value, Context ctx, Collector<String> out) throws Exception {
out.collect(value.toUpperCase());
}
}
I'd expect it to output something similar to:
18> ABC
18> $123
However, it only outputs:
10> ABC
If I swap the order of the processors to:
.process(new UppercaseProcessor())
.process(new RouterProcessor())
everything works as expected.
I've read the documentation but I don't see anything that would explain why this is as it is. I'm curious if I'm missing something or doing something wrong.
I've included a GitHub gist here for easier viewing, with all the supporting files: https://gist.github.com/baelec/95f41d875dda0a2806a0fb9b9313b90e
Here is a repo if you'd prefer to download the sample project: https://github.com/baelec/flink_sample_broken_0
EDIT: I see that StackOverflow asks us to avoid comments like "Thanks!" but I don't have enough rep to visibly upvote the responses so thanks David and Jaya for your help. I had made some incorrect assumptions regarding side outputs. I appreciate the clarification.
The problem is that you are taking the side output from the UppercaseProcessor, which doesn't use a side output.
It's easier to see what's wrong if you look at the job graph. If you rearrange the code to be like this:
SingleOutputStreamOperator<String> ingestStream = text
.process(new RouterProcessor());
DataStream<String> numberStream = ingestStream.getSideOutput(numberOutputTag)
.map(new MoneyMapper());
numberStream.print();
ingestStream
.process(new UppercaseProcessor())
.print();
then it works as you expected, and the job graph reflects the new arrangement.
The numberOutputTag side output is emitted inside RouterProcessor, so you need to extract the side output from the SingleOutputStreamOperator returned by the RouterProcessor process() call. In your code, however, the side output is extracted after the UppercaseProcessor. Change it to something like the below:
SingleOutputStreamOperator<String> tempStream = text.process(new RouterProcessor());
SingleOutputStreamOperator<String> ingestStream = tempStream.process(new UppercaseProcessor());
DataStream<String> numberStream = tempStream.getSideOutput(numberOutputTag).map(new MoneyMapper());
numberStream.print();
ingestStream.print();
Note the usage of the tempStream variable in the above example.

Flink pattern: issue with ArrayList cast in alert code

I followed this example and implemented it with Kafka, using the same kind of sample data as JSON.
Consumer sample data: {"temperature" : 28,"machineName":"xyz"}
DataStream<Alert> patternStream = CEP.pattern(inputEventStream, warningPattern)
.flatSelect(new PatternFlatSelectFunction<TemperatureEvent, Alert>() {
private static final long serialVersionUID = 1L;
@Override
public void flatSelect(Map<String, List<TemperatureEvent>> event, Collector<Alert> out) throws Exception {
new Alert("Temperature Rise Detected:" + ((TemperatureEvent) event.get("first")).getTemperature()
+ " on machine name:" + ((MonitoringEvent) event.get("first")).getMachineName());
}
});
Now I am getting an issue with the ArrayList cast:
Exception in thread "main" org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
at org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:146)
at org.apache.flink.runtime.minicluster.MiniCluster.executeJobBlocking(MiniCluster.java:647)
at org.apache.flink.streaming.api.environment.LocalStreamEnvironment.execute(LocalStreamEnvironment.java:123)
at Test.KafkaApp.main(KafkaApp.java:61)
Caused by: java.lang.ClassCastException: java.util.ArrayList cannot be cast to Test.TemperatureEvent
at Test.KafkaApp$2.flatSelect(KafkaApp.java:53)
at org.apache.flink.cep.operator.FlatSelectCepOperator.processMatchedSequences(FlatSelectCepOperator.java:66)
at org.apache.flink.cep.operator.AbstractKeyedCEPPatternOperator.processEvent(AbstractKeyedCEPPatternOperator.java:382)
at org.apache.flink.cep.operator.AbstractKeyedCEPPatternOperator.processElement(AbstractKeyedCEPPatternOperator.java:198)
at org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:202)
at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:105)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:300)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:704)
at java.lang.Thread.run(Unknown Source)
Your code contains two problems:
First of all, flatSelect receives a Map<String, List<TemperatureEvent>>. This means that you potentially get multiple TemperatureEvents per pattern. Thus, you have to select which one you want.
You don't add any Alerts to the Collector<Alert>. A flat map function does not return values but outputs them via the Collector<Alert>.
Without compiling, I think this should do the trick
DataStream<Alert> patternStream = CEP.pattern(inputEventStream, warningPattern)
.flatSelect(
new PatternFlatSelectFunction<TemperatureEvent, Alert>() {
private static final long serialVersionUID = 1L;
@Override
public void flatSelect(Map<String, List<TemperatureEvent>> event, Collector<Alert> out) throws Exception {
TemperatureEvent temperatureEvent = event.get("first").get(0);
out.collect(new Alert("Temperature Rise Detected:" + temperatureEvent.getTemperature() + " on machine name:" + temperatureEvent.getMachineName()));
}
});
By the way, the linked code from the O'Reilly repository won't compile with Flink. The PatternSelectFunction has the wrong signature.
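For reference, a sketch (not compiled) of the select variant with the signature that current Flink versions expect; the pattern name "first" is taken from the example above.

PatternStream<TemperatureEvent> patternStream = CEP.pattern(inputEventStream, warningPattern);

DataStream<Alert> alerts = patternStream.select(
        new PatternSelectFunction<TemperatureEvent, Alert>() {
            @Override
            public Alert select(Map<String, List<TemperatureEvent>> pattern) {
                // select() receives a Map<String, List<IN>> and returns exactly one result per match
                TemperatureEvent first = pattern.get("first").get(0);
                return new Alert("Temperature Rise Detected:" + first.getTemperature()
                        + " on machine name:" + first.getMachineName());
            }
        });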

Flink: ValueState on RichFlatMapFunction always returns null

I try to calculate the highest amount of found hashtags in a given Tumbling window.
For this I do a kind of "word count" for hashtags and sum them up. This works fine. After that, I try to find the hashtag with the highest count in the given window. I use a RichFlatMapFunction for this and a ValueState to save the current maximum count of a single hashtag, but this doesn't work.
I have debugged my code and found out that the value of the ValueState "maxVal" is null in every flatMap step. So the update() and value() methods don't work in my scenario.
Do I misunderstand the RichFlatMapFunction or its usage?
Here is my code; everything except the last flatMap function is working as expected:
public class TwitterHashtagCount {
public static void main(String args[]) throws Exception {
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment().setParallelism(1);
DataStream<String> tweetsRaw = env.addSource(new TwitterSource(TwitterConnection.getTwitterConnectionProperties()));
DataStream<String> tweetsGerman = tweetsRaw.filter(new EnglishLangFilter());
DataStream<Tuple2<String, Integer>> tweetHashtagCount = tweetsGerman
.flatMap(new TwitterHashtagFlatMap())
.keyBy(0)
.timeWindow(Time.seconds(15))
.sum(1)
.flatMap(new RichFlatMapFunction<Tuple2<String, Integer>, Tuple2<String, Integer>>() {
private transient ValueState<Integer> maxVal;
@Override
public void open(Configuration parameters) throws Exception {
ValueStateDescriptor<Integer> descriptor =
new ValueStateDescriptor<>(
// state name
"max-val",
// type information of state
TypeInformation.of(Integer.class));
maxVal = getRuntimeContext().getState(descriptor);
}
@Override
public void flatMap(Tuple2<String, Integer> value, Collector<Tuple2<String, Integer>> out) throws Exception {
Integer maxCount = maxVal.value();
if(maxCount == null) {
maxCount = 0;
maxVal.update(0);
}
if(value.f1 > maxCount) {
maxVal.update(maxCount);
out.collect(new Tuple2<String, Integer>(value.f0, value.f1));
}
}
});
tweetHashtagCount.print();
env.execute("Twitter Streaming WordCount");
}
}
I'm wondering why the code you've shared runs at all. The result of sum(1) is a non-keyed stream, and the managed state interface you are using expects a keyed stream and will keep a separate instance of the state for each key. I'm surprised you're not getting an error saying "Keyed state can only be used on a 'keyed stream', i.e., after a 'keyBy()' operation."
Since you've previously windowed the stream, if you do key it again (with the same key) before the RichFlatMapFunction, each key will occur once and the maxVal will always be null.
Something like this might do what you intend, if your goal is to find the max across all hashtags in each time window:
tweetsGerman
.flatMap(new TwitterHashtagFlatMap())
.keyBy(0)
.timeWindow(Time.seconds(15))
.sum(1)
.timeWindowAll(Time.seconds(15))
.max(1)
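If the goal is to emit the hashtag together with its count, maxBy(1) may be closer to what is intended, since max(1) only tracks the maximum value of field 1 and the other fields are not guaranteed to come from the same record (a sketch, not compiled):

tweetsGerman
    .flatMap(new TwitterHashtagFlatMap())
    .keyBy(0)
    .timeWindow(Time.seconds(15))
    .sum(1)
    .timeWindowAll(Time.seconds(15))
    .maxBy(1)   // keeps the whole Tuple2 whose count (field 1) is largest in the window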

How to close a Beanstalkd message before the end of a Camel route

I have a Beanstalkd queue and have enabled job dependencies via a sub-queue that posts when the dependent job is complete. In this example I have distributed parallel processing, so that I can trigger an action once all jobs are complete.
The issue is that I am looping through the response queue picking up messages, but once complete it does not attempt to delete any of the messages until the end, causing it to attempt to close the same message multiple times. To work around this I am trying to enrich the original message with the responses from the parallel processor. Is there any way to make the message end after the direct route is complete, or another method of message aggregation I could use?
#Bean
RouteBuilder exampleRoute() {
return new RouteBuilder() {
@Override
public void configure() throws Exception {
/**
* Step 1: Read a message from the listener tube, this will take the form:
* {
* parent_job_id : The parent job id
* split_count: Number of responses to expect
* }
**/
from("beanstalk://localhost/dev_job_listener?onFailure=release&jobDelay=20&jobTimeToRun=10")
.unmarshal().json(JsonLibrary.Jackson, Map.class)
.setProperty("message", simple("${body}"))
// Get required tube jobId to allow deletion of job at end of this route
.process(new Processor() {
@Override
public void process(Exchange exchange) throws Exception {
final Deque<Long> stack = new ArrayDeque<>();
stack.push(Long.valueOf(exchange.getIn().getHeader("beanstalk.jobId").toString()));
exchange.setProperty("stack", stack);
Map<String, Object> message = (HashMap) exchange.getProperties().get("message");
exchange.setProperty("parent_job_id", Integer.valueOf(message.get("parent_job_id").toString()));
exchange.setProperty("message_count", Integer.valueOf(message.get("split_count").toString()));
}
})
/**
* Step 2: Listen to the response queue job_{parent_job_id} for the number of completion messages
* from the split_count.
*/
.to("direct:verifySubJobs").end()
/**
* Step 3: Post to the complete queue that all jobs have completed
*/
.to("beanstalk://localhost/next_step?jobTimeToRun=10")
.process(new Processor() {
@Override
public void process(Exchange exchange) throws Exception {
exchange.getIn().setHeader("beanstalk.jobId", exchange.getProperty("stack", Deque.class).pop());
}
})
.log("Ending Job : " + simple("${header.beanstalk.jobId}"));
from("direct:verifySubJobs")
.log("-> direct:verifySubJobs")
.setProperty("url", simple("beanstalk://localhost/dev_job_" + simple("${property.parent_job_id}").getText() + "?onFailure=release&jobTimeToRun=10"))
.pollEnrich().simple("${property.url}")
.process(new Processor() {
@Override
public void process(Exchange exchange) throws Exception {
Deque<Long> stack = exchange.getProperty("stack", Deque.class);
stack.push(Long.valueOf(exchange.getIn().getHeader("beanstalk.jobId").toString()));
Integer counter = exchange.getProperty("message_count", Integer.class);
exchange.setProperty("message_count", counter);
}
}).end()
.choice().when(simple("${property.message_count} > 0"))
.to("direct:verifySubJobs")
.end()
.process(new Processor() {
@Override
public void process(Exchange exchange) throws Exception {
exchange.getIn().setHeader("beanstalk.jobId", exchange.getProperty("stack", Deque.class).pop());
}
}).end()
.log("<- direct:verifySubJobs");
}
};
}
