I have created an interceptor and added it to the context. This interceptor is executed for each node in the route, but I want to identify when the processing of all nodes is complete and then perform some action.
public class MyInterceptStrategy implements InterceptStrategy {
public int count = 0;
@Override
public Processor wrapProcessorInInterceptors(final CamelContext context,
final ProcessorDefinition<?> definition, final Processor target,
final Processor nextTarget) throws Exception {
return new DelegateAsyncProcessor(new Processor() {
@Override
public void process(Exchange exchange) throws Exception {
count++;
target.process(exchange);
// if this is the last node, print "all nodes processed" and the value of count
//System.out.println(count);
}
}) {
};
}
}
UPDATE: I tried the following to get the total count of nodes, but it returns all the nodes in all the routes, not just the nodes that are eligible for processing.
public int getTotalProcessors(CamelContext context) {
int totalProcessorsCount = 0;
for (Route r : context.getRoutes()) {
totalProcessorsCount = totalProcessorsCount + r.getRouteContext().getRoute().getOutputs().size();
}
return totalProcessorsCount;
}
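For reference, one way to react once an exchange has passed through every node, rather than counting processors up front, is to register a completion callback on the exchange the first time the interceptor sees it. This is only a rough sketch against the Camel 2.x API (Exchange.addOnCompletion and SynchronizationAdapter); the property name nodeCount and the class name are made up for illustration:
public class CountingInterceptStrategy implements InterceptStrategy {
    @Override
    public Processor wrapProcessorInInterceptors(final CamelContext context,
            final ProcessorDefinition<?> definition, final Processor target,
            final Processor nextTarget) throws Exception {
        return new DelegateAsyncProcessor(new Processor() {
            @Override
            public void process(Exchange exchange) throws Exception {
                // keep the counter on the exchange so it is per message, not per interceptor instance
                int count = exchange.getProperty("nodeCount", 0, Integer.class);
                exchange.setProperty("nodeCount", count + 1);
                if (count == 0) {
                    // first wrapped node: hook a callback that fires when the exchange has
                    // finished the whole route (SynchronizationAdapter is in org.apache.camel.impl on Camel 2.x)
                    exchange.addOnCompletion(new SynchronizationAdapter() {
                        @Override
                        public void onDone(Exchange done) {
                            System.out.println("all nodes processed, count = "
                                    + done.getProperty("nodeCount", Integer.class));
                        }
                    });
                }
                target.process(exchange);
            }
        });
    }
}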
I am trying to do pre-shuffle aggregation in Flink. The following is the MapBundleFunction implementation.
public class TaxiFareMapBundleFunction extends MapBundleFunction<Long, TaxiFare, TaxiFare, TaxiFare> {
@Override
public TaxiFare addInput(@Nullable TaxiFare value, TaxiFare input) throws Exception {
if (value == null) {
return input;
}
value.tip = value.tip + input.tip;
return value;
}
@Override
public void finishBundle(Map<Long, TaxiFare> buffer, Collector<TaxiFare> out) throws Exception {
for (Map.Entry<Long, TaxiFare> entry : buffer.entrySet()) {
out.collect(entry.getValue());
}
}
}
I am using "CountBundleTrigger.java" . But the pre-shuffle aggregation is not working as the "count" variable is always 0. Please let me know If I am missing something.
@Override
public void onElement(T element) throws Exception {
count++;
if (count >= maxCount) {
callback.finishBundle();
reset();
}
}
Here is the main code.
MapBundleFunction<Long, TaxiFare, TaxiFare, TaxiFare> mapBundleFunction = new TaxiFareMapBundleFunction();
BundleTrigger<TaxiFare> bundleTrigger = new CountBundleTrigger<>(10);
KeySelector<TaxiFare, Long> taxiFareLongKeySelector = new KeySelector<TaxiFare, Long>() {
@Override
public Long getKey(TaxiFare value) throws Exception {
return value.driverId;
}
};
DataStream<Tuple3<Long, Long, Float>> hourlyTips =
// fares.keyBy((TaxiFare fare) -> fare.driverId)
//     .window(TumblingEventTimeWindows.of(Time.hours(1))).process(new AddTips());
fares.transform("preshuffle", TypeInformation.of(TaxiFare.class),
new TaxiFareStream(mapBundleFunction, bundleTrigger,
taxiFareLongKeySelector
))
.assignTimestampsAndWatermarks(new
BoundedOutOfOrdernessTimestampExtractor<TaxiFare>(Time.seconds(20)) {
@Override
public long extractTimestamp(TaxiFare element) {
return element.startTime.getEpochSecond();
}
})
.keyBy((TaxiFare fare) -> fare.driverId)
.window(TumblingProcessingTimeWindows.of(Time.minutes(1)))
.process(new AddTips());
DataStream<Tuple3<Long, Long, Float>> hourlyMax =
hourlyTips.windowAll(TumblingEventTimeWindows.of(Time.hours(1))).maxBy(2);
Here is the code for TaxiFareStream.java.
public class TaxiFareStream extends MapBundleOperator<Long, TaxiFare, TaxiFare, TaxiFare> {
private KeySelector<TaxiFare, Long> keySelector;
public TaxiFareStream(MapBundleFunction<Long, TaxiFare,
TaxiFare, TaxiFare> userFunction,
BundleTrigger<TaxiFare> bundleTrigger,
KeySelector<TaxiFare, Long> keySelector) {
super(userFunction, bundleTrigger, keySelector);
this.keySelector = keySelector;
}
@Override
protected Long getKey(TaxiFare input) throws Exception {
return keySelector.getKey(input);
}
}
Update
I have created the following class, but I am seeing an error. I think it is not able to serialize the MapStreamBundleOperator class.
public class MapStreamBundleOperator<K, V, IN, OUT> extends
AbstractMapStreamBundleOperator<K, V, IN, OUT> {
private static final long serialVersionUID = 6556268125924098320L;
/** KeySelector is used to extract key for bundle map. */
private final KeySelector<IN, K> keySelector;
public MapStreamBundleOperator(MapBundleFunction<K, V, IN, OUT> function, BundleTrigger<IN> bundleTrigger,
KeySelector<IN, K> keySelector) {
super(function, bundleTrigger);
this.keySelector = keySelector;
}
@Override
protected K getKey(IN input) throws Exception {
return this.keySelector.getKey(input);
}
}
2021-08-27 05:06:04,814 ERROR FlinkDefaults.class - Stream execution failed
org.apache.flink.streaming.runtime.tasks.StreamTaskException: Cannot serialize operator object class org.apache.flink.streaming.api.operators.SimpleUdfStreamOperatorFactory.
at org.apache.flink.streaming.api.graph.StreamConfig.setStreamOperatorFactory(StreamConfig.java:247)
at org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.setVertexConfig(StreamingJobGraphGenerator.java:497)
at org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createChain(StreamingJobGraphGenerator.java:318)
at org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createChain(StreamingJobGraphGenerator.java:297)
at org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createChain(StreamingJobGraphGenerator.java:297)
at org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.setChaining(StreamingJobGraphGenerator.java:264)
at org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:173)
at org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:113)
at org.apache.flink.streaming.api.graph.StreamGraph.getJobGraph(StreamGraph.java:850)
at org.apache.flink.client.StreamGraphTranslator.translateToJobGraph(StreamGraphTranslator.java:52)
at org.apache.flink.client.FlinkPipelineTranslationUtil.getJobGraph(FlinkPipelineTranslationUtil.java:43)
at org.apache.flink.client.deployment.executors.PipelineExecutorUtils.getJobGraph(PipelineExecutorUtils.java:55)
at org.apache.flink.client.deployment.executors.AbstractJobClusterExecutor.execute(AbstractJobClusterExecutor.java:62)
at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:1810)
at org.apache.flink.client.program.StreamContextEnvironment.executeAsync(StreamContextEnvironment.java:128)
at org.apache.flink.client.program.StreamContextEnvironment.execute(StreamContextEnvironment.java:76)
at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1697)
at com.pinterest.xenon.flink.FlinkDefaults$.run(FlinkDefaults.scala:46)
at com.pinterest.xenon.flink.FlinkWorkflow.run(FlinkWorkflow.scala:74)
at com.pinterest.xenon.flink.WorkflowLauncher$.executeWorkflow(WorkflowLauncher.scala:43)
at com.pinterest.xenon.flink.WorkflowLauncher$.delayedEndpoint$com$pinterest$xenon$flink$WorkflowLauncher$1(WorkflowLauncher.scala:25)
at com.pinterest.xenon.flink.WorkflowLauncher$delayedInit$body.apply(WorkflowLauncher.scala:9)
at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.collection.immutable.List.foreach(List.scala:392)
at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
at scala.App$class.main(App.scala:76)
at com.pinterest.xenon.flink.WorkflowLauncher$.main(WorkflowLauncher.scala:9)
at com.pinterest.xenon.flink.WorkflowLauncher.main(WorkflowLauncher.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:288)
at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:198)
at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:168)
at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:699)
at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:232)
at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:916)
at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:992)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
at org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:992)
Caused by: java.io.NotSerializableException: visibility.mabs.src.main.java.com.pinterest.mabs.MabsFlinkJob
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1184)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
I would not rely on the official MapBundleOperator, since David already said that it is not very well documented. I will answer this question based on my own AbstractMapStreamBundleOperator. I think you are missing the counter increment numOfElements++; inside the processElement() method, and it is also better to use generic types. Use this code:
public abstract class AbstractMapStreamBundleOperator<K, V, IN, OUT>
extends AbstractUdfStreamOperator<OUT, MapBundleFunction<K, V, IN, OUT>>
implements OneInputStreamOperator<IN, OUT>, BundleTriggerCallback {
private static final long serialVersionUID = 1L;
private final Map<K, V> bundle;
private final BundleTrigger<IN> bundleTrigger;
private transient TimestampedCollector<OUT> collector;
private transient int numOfElements = 0;
public AbstractMapStreamBundleOperator(MapBundleFunction<K, V, IN, OUT> function, BundleTrigger<IN> bundleTrigger) {
super(function);
chainingStrategy = ChainingStrategy.ALWAYS;
this.bundle = new HashMap<>();
this.bundleTrigger = checkNotNull(bundleTrigger, "bundleTrigger is null");
}
@Override
public void open() throws Exception {
super.open();
numOfElements = 0;
collector = new TimestampedCollector<>(output);
bundleTrigger.registerCallback(this);
// reset trigger
bundleTrigger.reset();
}
@Override
public void processElement(StreamRecord<IN> element) throws Exception {
// get the key and value for the map bundle
final IN input = element.getValue();
final K bundleKey = getKey(input);
final V bundleValue = this.bundle.get(bundleKey);
// get a new value after adding this element to bundle
final V newBundleValue = userFunction.addInput(bundleValue, input);
// update to map bundle
bundle.put(bundleKey, newBundleValue);
numOfElements++;
bundleTrigger.onElement(input);
}
protected abstract K getKey(final IN input) throws Exception;
@Override
public void finishBundle() throws Exception {
if (!bundle.isEmpty()) {
numOfElements = 0;
userFunction.finishBundle(bundle, collector);
bundle.clear();
}
bundleTrigger.reset();
}
}
Then create the MapStreamBundleOperator like you already have. Use this code:
public class MapStreamBundleOperator<K, V, IN, OUT> extends AbstractMapStreamBundleOperator<K, V, IN, OUT> {
private final KeySelector<IN, K> keySelector;
public MapStreamBundleOperator(MapBundleFunction<K, V, IN, OUT> function, BundleTrigger<IN> bundleTrigger,
KeySelector<IN, K> keySelector) {
super(function, bundleTrigger);
this.keySelector = keySelector;
}
@Override
protected K getKey(IN input) throws Exception {
return this.keySelector.getKey(input);
}
}
The counter inside the trigger is what makes the bundle operator flush the events to the next phase. The CountBundleTrigger looks like the code below. You will also need the BundleTriggerCallback interface (a sketch of it follows the trigger code).
public class CountBundleTrigger<T> implements BundleTrigger<T> {
private final long maxCount;
private transient BundleTriggerCallback callback;
private transient long count = 0;
public CountBundleTrigger(long maxCount) {
Preconditions.checkArgument(maxCount > 0, "maxCount must be greater than 0");
this.maxCount = maxCount;
}
@Override
public void registerCallback(BundleTriggerCallback callback) {
this.callback = Preconditions.checkNotNull(callback, "callback is null");
}
@Override
public void onElement(T element) throws Exception {
count++;
if (count >= maxCount) {
callback.finishBundle();
reset();
}
}
@Override
public void reset() { count = 0; }
@Override
public String explain() {
return "CountBundleTrigger with size " + maxCount;
}
}
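In case you are copying these classes instead of depending on Flink's table-runtime module, the two interfaces referenced above are small. Roughly (this mirrors the Flink table-runtime interfaces, so double-check the signatures against your Flink version):
/** Callback that a BundleTrigger uses to ask the operator to flush the current bundle. */
public interface BundleTriggerCallback {
    void finishBundle() throws Exception;
}
/** Decides when a bundle of buffered records should be flushed downstream. */
public interface BundleTrigger<T> extends Serializable {
    void registerCallback(BundleTriggerCallback callback);
    void onElement(T element) throws Exception;
    void reset();
    String explain();
}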
Then you have to create one of these triggers to pass to your operator. Here I am creating a bundle of 100 TaxiFare events. Take this example with another POJO: I wrote MapBundleTaxiFareImpl here (a sketch of it follows this snippet), but you can create your own UDF based on it.
private OneInputStreamOperator<Tuple2<Long, TaxiFare>, Tuple2<Long, TaxiFare>> getPreAggOperator() {
MapBundleFunction<Long, TaxiFare, Tuple2<Long, TaxiFare>, Tuple2<Long, TaxiFare>> myMapBundleFunction = new MapBundleTaxiFareImpl();
CountBundleTrigger<Tuple2<Long, TaxiFare>> bundleTrigger = new CountBundleTrigger<Tuple2<Long, TaxiFare>>(100);
return new MapStreamBundleOperator<>(myMapBundleFunction, bundleTrigger, keyBundleSelector);
}
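Since MapBundleTaxiFareImpl is only referenced above, here is a rough sketch of what it could look like. It mirrors TaxiFareMapBundleFunction from the question, just with Tuple2<driverId, fare> as input and output (the tip field comes from the training's TaxiFare POJO; adjust the import of MapBundleFunction to wherever you keep your copy):
public class MapBundleTaxiFareImpl
        extends MapBundleFunction<Long, TaxiFare, Tuple2<Long, TaxiFare>, Tuple2<Long, TaxiFare>> {
    @Override
    public TaxiFare addInput(@Nullable TaxiFare value, Tuple2<Long, TaxiFare> input) throws Exception {
        if (value == null) {
            return input.f1;
        }
        // pre-aggregate the tip per driver inside the bundle
        value.tip = value.tip + input.f1.tip;
        return value;
    }
    @Override
    public void finishBundle(Map<Long, TaxiFare> buffer, Collector<Tuple2<Long, TaxiFare>> out) throws Exception {
        for (Map.Entry<Long, TaxiFare> entry : buffer.entrySet()) {
            out.collect(Tuple2.of(entry.getKey(), entry.getValue()));
        }
    }
}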
In the end you call this new operator somewhere using transform(). Take this example with another POJO.
stream
...
.transform("my-pre-agg",
TypeInformation.of(new TypeHint<Tuple2<Long, TaxiFare>>(){}), getPreAggOperator())
...
I think that is all you need. Try to use those classes, and if anything is missing it is probably in the git repository that I linked. I hope you can make it work.
I'm fairly new to Flink and would be grateful for any advice with this issue.
I wrote a job that receives some input events and compares them with some rules before forwarding them on to Kafka topics based on whatever rules match. I implemented this using a flatMap and found it worked well, with one downside: I was loading the rules just once, during application startup, by calling an API from my main() method, and passing the result of this API call into the flatMap function. This worked, but it means that if there are any changes to the rules I have to restart the application, so I wanted to improve it.
I found this page in the documentation which seems to be an appropriate solution to the problem. I wrote a custom source to poll my Rules API every few minutes, and then used a BroadcastProcessFunction, with the Rules added to the broadcast state using processBroadcastElement and the events processed by processElement.
The solution is working, but with one problem. My first approach using a FlatMap would process the events almost instantly. Now that I have changed to a BroadcastProcessFunction, each event takes 60 seconds to process, and it seems to be more or less exactly 60 seconds every time with almost no variation. I made no changes to the rule matching logic itself.
I've had a look through the documentation and I can't seem to find a reason for this, so I'd appreciate it if anyone more experienced in Flink could offer a suggestion as to what might cause this delay.
The job:
public static void main(String[] args) throws Exception {
// set up the streaming execution environment
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setStreamTimeCharacteristic(TimeCharacteristic.IngestionTime);
// read the input from Kafka
DataStream<KafkaEvent> documentStream = env.addSource(
createKafkaSource(getSourceTopic(), getSourceProperties())).name("Kafka[" + getSourceTopic() + "]");
// Configure the Rules data stream
DataStream<RulesEvent> ruleStream = env.addSource(
new RulesApiHttpSource(
getApiRulesSubdomain(),
getApiBearerToken(),
DataType.DataTypeName.LOGS,
getRulesApiCacheDuration()) // Currently set to 120000
);
MapStateDescriptor<String, RulesEvent> ruleStateDescriptor = new MapStateDescriptor<>(
"RulesBroadcastState",
BasicTypeInfo.STRING_TYPE_INFO,
TypeInformation.of(new TypeHint<RulesEvent>() {
}));
// broadcast the rules and create the broadcast state
BroadcastStream<RulesEvent> ruleBroadcastStream = ruleStream
.broadcast(ruleStateDescriptor);
// extract the resources and attributes
documentStream
.connect(ruleBroadcastStream)
.process(new FanOutLogsRuleMapper()).name("FanOut Stream")
.addSink(createKafkaSink(getDestinationProperties()))
.name("FanOut Sink");
// run the job
env.execute(FanOutJob.class.getName());
}
The custom HTTP source which gets the rules
public class RulesApiHttpSource extends RichSourceFunction<RulesEvent> {
private static final Logger LOGGER = LoggerFactory.getLogger(RulesApiHttpSource.class);
private final long pollIntervalMillis;
private final String endpoint;
private final String bearerToken;
private final DataType.DataTypeName dataType;
private final RulesApiCaller caller;
private volatile boolean running = true;
public RulesApiHttpSource(String endpoint, String bearerToken, DataType.DataTypeName dataType, long pollIntervalMillis) {
this.pollIntervalMillis = pollIntervalMillis;
this.endpoint = endpoint;
this.bearerToken = bearerToken;
this.dataType = dataType;
this.caller = new RulesApiCaller(this.endpoint, this.bearerToken);
}
@Override
public void open(Configuration configuration) throws Exception {
// do nothing
}
@Override
public void close() throws IOException {
// do nothing
}
@Override
public void run(SourceContext<RulesEvent> ctx) throws IOException {
while (running) {
if (pollIntervalMillis > 0) {
try {
RulesEvent event = new RulesEvent();
event.setRules(getCurrentRulesList());
event.setDataType(this.dataType);
event.setRetrievedAt(Instant.now());
ctx.collect(event);
Thread.sleep(pollIntervalMillis);
} catch (InterruptedException e) {
running = false;
}
} else if (pollIntervalMillis <= 0) {
cancel();
}
}
}
public List<Rule> getCurrentRulesList() throws IOException {
// call the API and get the rules
}
@Override
public void cancel() {
running = false;
}
}
The BroadcastProcessFunction
public abstract class FanOutRuleMapper extends BroadcastProcessFunction<KafkaEvent, RulesEvent, KafkaEvent> {
protected final String RULES_EVENT_NAME = "rulesEvent";
protected final MapStateDescriptor<String, RulesEvent> ruleStateDescriptor = new MapStateDescriptor<>(
"RulesBroadcastState",
BasicTypeInfo.STRING_TYPE_INFO,
TypeInformation.of(new TypeHint<RulesEvent>() {
}));
@Override
public void processBroadcastElement(RulesEvent rulesEvent, BroadcastProcessFunction<KafkaEvent, RulesEvent, KafkaEvent>.Context ctx, Collector<KafkaEvent> out) throws Exception {
ctx.getBroadcastState(ruleStateDescriptor).put(RULES_EVENT_NAME, rulesEvent);
LOGGER.debug("Added to broadcast state {}", rulesEvent.toString());
}
// omitted rules matching logic
}
public class FanOutLogsRuleMapper extends FanOutRuleMapper {
public FanOutLogsRuleMapper() {
super();
}
@Override
public void processElement(KafkaEvent in, BroadcastProcessFunction<KafkaEvent, RulesEvent, KafkaEvent>.ReadOnlyContext ctx, Collector<KafkaEvent> out) throws Exception {
RulesEvent rulesEvent = ctx.getBroadcastState(ruleStateDescriptor).get(RULES_EVENT_NAME);
ExportLogsServiceRequest otlpLog = extractOtlpMessageFromJsonPayload(in);
for (Rule rule : rulesEvent.getRules()) {
boolean match = false;
// omitted rules matching logic
if (match) {
for (RuleDestination ruleDestination : rule.getRulesDestinations()) {
out.collect(fillInTheEvent(in, rule, ruleDestination, otlpLog));
}
}
}
}
}
Maybe you can give the complete code of the FanOutLogsRuleMapper class; currently the match variable is always false.
I am writing my Apache Flink (1.10) job to update records in real time like this:
public class WalletConsumeRealtimeHandler {
public static void main(String[] args) throws Exception {
walletConsumeHandler();
}
public static void walletConsumeHandler() throws Exception {
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
FlinkUtil.initMQ();
FlinkUtil.initEnv(env);
DataStream<String> dataStreamSource = env.addSource(FlinkUtil.initDatasource("wallet.consume.report.realtime"));
DataStream<ReportWalletConsumeRecord> consumeRecord =
dataStreamSource.map(new MapFunction<String, ReportWalletConsumeRecord>() {
@Override
public ReportWalletConsumeRecord map(String value) throws Exception {
ObjectMapper mapper = new ObjectMapper();
ReportWalletConsumeRecord consumeRecord = mapper.readValue(value, ReportWalletConsumeRecord.class);
consumeRecord.setMergedRecordCount(1);
return consumeRecord;
}
}).assignTimestampsAndWatermarks(new BoundedOutOfOrdernessGenerator());
consumeRecord.keyBy(
new KeySelector<ReportWalletConsumeRecord, Tuple2<String, Long>>() {
@Override
public Tuple2<String, Long> getKey(ReportWalletConsumeRecord value) throws Exception {
return Tuple2.of(value.getConsumeItem(), value.getTenantId());
}
})
.timeWindow(Time.seconds(5))
.reduce(new SumField(), new CollectionWindow())
.addSink(new SinkFunction<List<ReportWalletConsumeRecord>>() {
@Override
public void invoke(List<ReportWalletConsumeRecord> reportPumps, Context context) throws Exception {
WalletConsumeRealtimeHandler.invoke(reportPumps);
}
});
env.execute(WalletConsumeRealtimeHandler.class.getName());
}
private static class CollectionWindow extends ProcessWindowFunction<ReportWalletConsumeRecord,
List<ReportWalletConsumeRecord>,
Tuple2<String, Long>,
TimeWindow> {
public void process(Tuple2<String, Long> key,
Context context,
Iterable<ReportWalletConsumeRecord> minReadings,
Collector<List<ReportWalletConsumeRecord>> out) throws Exception {
ArrayList<ReportWalletConsumeRecord> employees = Lists.newArrayList(minReadings);
if (employees.size() > 0) {
out.collect(employees);
}
}
}
private static class SumField implements ReduceFunction<ReportWalletConsumeRecord> {
public ReportWalletConsumeRecord reduce(ReportWalletConsumeRecord d1, ReportWalletConsumeRecord d2) {
Integer merged1 = d1.getMergedRecordCount() == null ? 1 : d1.getMergedRecordCount();
Integer merged2 = d2.getMergedRecordCount() == null ? 1 : d2.getMergedRecordCount();
d1.setMergedRecordCount(merged1 + merged2);
d1.setConsumeNum(d1.getConsumeNum() + d2.getConsumeNum());
return d1;
}
}
public static void invoke(List<ReportWalletConsumeRecord> records) {
WalletConsumeService service = FlinkUtil.InitRetrofit().create(WalletConsumeService.class);
Call<ResponseBody> call = service.saveRecords(records);
call.enqueue(new Callback<ResponseBody>() {
@Override
public void onResponse(Call<ResponseBody> call, Response<ResponseBody> response) {
}
@Override
public void onFailure(Call<ResponseBody> call, Throwable t) {
t.printStackTrace();
}
});
}
}
Now I have found that the Flink task only triggers the sink after receiving at least 2 records. Does the reduce action require this?
You need two records to trigger the window. Flink only knows when to close a window (and fire the subsequent calculation) once it receives a watermark that is larger than the end timestamp of the window.
In your case, you use BoundedOutOfOrdernessGenerator, which updates the watermark according to the incoming records. So it generates a second watermark only after having seen the second record.
You can use a different watermark generator. In the troubleshooting training there is a watermark generator that also generates watermarks on timeout.
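As a rough sketch of that idea (not the training's exact code), here is a periodic assigner in the question's Flink 1.10-style API that allows some out-of-orderness and, when no records arrive for a while, keeps advancing the watermark with wall-clock time so the last window can still fire. The getEventTime() accessor on ReportWalletConsumeRecord and the timeout values are assumptions:
public class IdleAwareWatermarkAssigner implements AssignerWithPeriodicWatermarks<ReportWalletConsumeRecord> {
    private static final long MAX_OUT_OF_ORDERNESS_MS = 5_000L;
    private static final long IDLE_TIMEOUT_MS = 10_000L;
    private long maxSeenTimestamp = Long.MIN_VALUE;
    private long lastRecordWallClock = System.currentTimeMillis();

    @Override
    public long extractTimestamp(ReportWalletConsumeRecord element, long previousElementTimestamp) {
        long ts = element.getEventTime(); // assumed event-time getter on the record
        maxSeenTimestamp = Math.max(maxSeenTimestamp, ts);
        lastRecordWallClock = System.currentTimeMillis();
        return ts;
    }

    @Override
    public Watermark getCurrentWatermark() {
        if (maxSeenTimestamp == Long.MIN_VALUE) {
            return new Watermark(Long.MIN_VALUE); // nothing seen yet
        }
        long idleFor = System.currentTimeMillis() - lastRecordWallClock;
        if (idleFor > IDLE_TIMEOUT_MS) {
            // no input for a while: keep pushing the watermark forward with wall-clock time
            // so the window holding the last buffered record is eventually closed
            return new Watermark(maxSeenTimestamp + (idleFor - IDLE_TIMEOUT_MS));
        }
        return new Watermark(maxSeenTimestamp - MAX_OUT_OF_ORDERNESS_MS);
    }
}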
I am writing a pipeline to group events into sessions per user, keyed by id, using an event-time session window. I am using a periodic watermark assigner and a custom session accumulator which counts the events in a given session.
What is happening is that my window operator is consuming records but not emitting anything. I am not sure what is missing here.
FlinkKafkaConsumer010<String> eventSource =
new FlinkKafkaConsumer010<>("events", new SimpleStringSchema(), properties);
eventSource.setStartFromLatest();
DataStream<Event> eventStream = env.addSource(eventSource
).flatMap(
new FlatMapFunction<String, Event>() {
@Override
public void flatMap(String value, Collector<Event> out) throws Exception {
out.collect(Event.toEvent(value));
}
}
).assignTimestampsAndWatermarks(
new AssignerWithPeriodicWatermarks<Event>() {
long maxTime;
@Override
public long extractTimestamp(Event element, long previousElementTimestamp) {
maxTime = Math.max(previousElementTimestamp, maxTime);
return previousElementTimestamp;
}
@Nullable
@Override
public Watermark getCurrentWatermark() {
return new Watermark(maxTime);
}
}
);
DataStream <Session> session_stream =eventStream.keyBy((KeySelector<Event, String>)value -> value.id)
.window(EventTimeSessionWindows.withGap(Time.minutes(5)))
.aggregate(new AggregateFunction<Event, pipe.SessionAccumulator, Session>() {
@Override
public pipe.SessionAccumulator createAccumulator() {
return new pipe.SessionAccumulator();
}
@Override
public pipe.SessionAccumulator add(Event e, pipe.SessionAccumulator sessionAccumulator) {
sessionAccumulator.add(e);
return sessionAccumulator;
}
@Override
public Session getResult(pipe.SessionAccumulator sessionAccumulator) {
return sessionAccumulator.getLocalValue();
}
@Override
public pipe.SessionAccumulator merge(pipe.SessionAccumulator prev, pipe.SessionAccumulator next) {
prev.merge(next);
return prev;
}
}, new WindowFunction<Session, Session, String, TimeWindow>() {
@Override
public void apply(String s, TimeWindow timeWindow, Iterable<Session> iterable, Collector<Session> collector) throws Exception {
collector.collect(iterable.iterator().next());
}
});
public static class SessionAccumulator implements Accumulator<Event, Session>{
Session session;
public SessionAccumulator(){
session = new Session();
}
@Override
public void add(Event e) {
session.add(e);
}
@Override
public Session getLocalValue() {
return session;
}
@Override
public void resetLocal() {
session = new Session();
}
@Override
public void merge(Accumulator<Event, Session> accumulator) {
session.merge(Collections.singletonList(accumulator.getLocalValue()));
}
@Override
public Accumulator<Event, Session> clone() {
SessionAccumulator sessionAccumulator = new SessionAccumulator();
sessionAccumulator.session = new Session(
session.id
);
return sessionAccumulator;
}
}
public static class SessionAccumulator implements Accumulator<Event, Session>{
Session session;
public SessionAccumulator(){
session = new Session();
}
@Override
public void add(Event e) {
session.add(e);
}
@Override
public Session getLocalValue() {
return session;
}
@Override
public void resetLocal() {
session = new Session();
}
@Override
public void merge(Accumulator<Event, Session> accumulator) {
session.merge(Collections.singletonList(accumulator.getLocalValue()));
}
@Override
public Accumulator<Event, Session> clone() {
SessionAccumulator sessionAccumulator = new SessionAccumulator();
sessionAccumulator.session = new Session(
session.id,
session.lastEventTime,
session.earliestEventTime,
session.count
);
return sessionAccumulator;
}
}
If your watermarks are not advancing, this would explain why no results are being emitted by the window. Possible causes include:
Your events haven't been timestamped by Kafka, and thus previousElementTimestamp isn't set.
You have an idle Kafka partition holding back the watermarks. (This is a somewhat complex topic. If this turns out to be the cause of your problems, and you get stuck on it, please come back with a new question.)
Another possibility is that there is never a 5 minute-long gap in the events, in which case the events will accumulate in a never-ending session.
Also, you don't appear to have included a sink. If you don't print or otherwise send the results to a sink, Flink won't do anything.
And don't forget that you must call env.execute() to get anything to happen.
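For example, the minimal wiring at the end of the pipeline (reusing the variable names from your snippet) would be something like:
session_stream.print(); // or .addSink(...) with your real sink
env.execute("session-window-job"); // the job name is arbitrary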
A few other things:
Your watermark generator isn't allowing for any out-of-orderness, so the window is going to ignore all out-of-order events (because they will be late). If your events have strictly ascending timestamps you should go ahead and use an AscendingTimestampExtractor; if they can be out-of-order, then a BoundedOutOfOrdernessTimestampExtractor is appropriate (see the sketch after this list).
Your WindowFunction is superfluous. It is simply forwarding downstream the result from the aggregator, so you could remove it.
You have posted two different implementations of SessionAccumulator.
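As a sketch of the BoundedOutOfOrdernessTimestampExtractor option mentioned above (it lives in org.apache.flink.streaming.api.functions.timestamps), assuming Event carries a numeric event-time field (eventTime below is hypothetical), the custom assigner could be replaced with something like:
DataStream<Event> timestampedStream = eventStream.assignTimestampsAndWatermarks(
        new BoundedOutOfOrdernessTimestampExtractor<Event>(Time.minutes(1)) {
            @Override
            public long extractTimestamp(Event element) {
                return element.eventTime; // hypothetical event-time field on Event
            }
        });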
During the processing of an Exchange received from JMS, I'm dynamically creating a route that fetches a file from FTP to the file system, and when the batch is done I need to remove that same route. The following code fragment shows how I do this:
public void execute() {
try {
context.addRoutes(createFetchIndexRoute(routeId()));
} catch (Exception e) {
throw Throwables.propagate(e);
}
}
private RouteBuilder createFetchIndexRoute(final String routeId) {
return new RouteBuilder() {
@Override
public void configure() throws Exception {
from("ftp://" + getRemoteQuarterDirectory() +
"?fileName=" + location.getFileName() +
"&binary=true" +
"&localWorkDirectory=" + localWorkDirectory)
.to("file://" + getLocalQuarterDirectory())
.process(new Processor() {
RouteTerminator terminator;
@Override
public void process(Exchange exchange) throws Exception {
if (camelBatchComplete(exchange)) {
terminator = new RouteTerminator(routeId,
exchange.getContext());
terminator.start();
}
}
})
.routeId(routeId);
}
};
}
I'm using a thread to stop a route from within a route, which is the approach recommended in the Camel documentation: How can I stop a route from a route.
public class RouteTerminator extends Thread {
private String routeId;
private CamelContext camelContext;
public RouteTerminator(String routeId, CamelContext camelContext) {
this.routeId = routeId;
this.camelContext = camelContext;
}
@Override
public void run() {
try {
camelContext.stopRoute(routeId);
camelContext.removeRoute(routeId);
} catch (Exception e) {
throw Throwables.propagate(e);
}
}
}
As a result the route does stop, but what I see in JConsole is that the thread that corresponds to the route isn't removed, so over time these abandoned threads just keep accumulating.
Is there a way to properly stop/remove a route dynamically/programmatically and also to release the route's thread, so that they don't accumulate through time?
This is fixed in the upcoming Camel releases 2.9.2 and 2.10, by this ticket:
https://issues.apache.org/jira/browse/CAMEL-5072