Camel condition on aggregate of messages - apache-camel

I'm looking for a way to conditionally handle messages based on the aggregation of messages. I've looked into a lot of ways to do this, but it seems that Apache Camel doesn't support it. I'll explain the scenario and then the solutions I tried.
Scenario:
I'm trying to conditionally clean a directory. I poll from the directory every x days and fetch all the files (file://...). I route this into an aggregation, that aggregates the files into a single size (directorySize). I then check if this size passes a certain threshold.
Here is where the problem lies. I now want to remove certain files if this condition passes, but I don't have access to the original messages anymore because they were aggregated in a new exchange.
Solutions:
I tried to fetch the files again to process them. Problem is that you can't make a consumer fetch on demand as far as I know. I tried using pollEnrich, but that will only fetch a single file and not all files in the directory.
I tried to filter/stop the parent route. The problem here is that filter()/choice...stop()/end() will only stop the aggregated route with the directory size and not the parent route with the file messages. I can't conditionally process these.
I tried to move the aggregated condition to another route that I would call, but this causes the same problem as the first solution.
Things I consider doing:
Rewrite the aggregation strategy to not only aggregate the size, but also the files itself into a groupedExchange. This way I can split the aggregation again after the check. I don't really like this solution because it causes a lot boilerplate, both in code as during runtime.
Move the file size calculator to a processor instead of the aggregator. This would defeat the purpose of using camel in the first place.. I would manually be fetching the files and adding the sizes.. And that for every single file..
Use a ControlBus to dynamically start the delete route on that directory. Once again a lot of workaround to achieve something that I feel should be able to be done in a simple route.
I would like to set the calculated size on every parent message, but I have no clue how this could be achieved?
Another way to stop the parent route that I haven't thought of?
I'm a bit stunned that you can't elegantly filter messages based on the aggregation of these messages. Is there something that I missed in Camel that would provide an elegant solution? Or is this a case of the least bad solution?
Simple Schema
Message(File)
Message(File) --> AggregatedMessage(directorySize) --> delete certain Files?
Message(File)

Camel is really awesome, but sometimes it's sure difficult to see exactly which design pattern to use ;)
Firstly, you need to keep a copy of the file objects, because you don't know whether to delete them or not until you reach your threshold - there are basically (at least) two ways to do this.
Alternative 1
The first way is to use a List in an exchange property. This property will hang around no matter what you do with the exchange body. If you have a look at the source code for GroupedExchangeAggregationStrategy, it does precisely this:
list = new ArrayList<Exchange>();
answer.setProperty(Exchange.GROUPED_EXCHANGE, list);
// ...
list.add(newExchange);
Or you could do the same thing manually on your own exchange property. In any case, it's completely fine to use the Grouped aggregation strategy as you have done.
Alternative 2
The second way to "keep" old messages is to send a copy to a stopped SEDA queue. So you would do to("seda:xyz"). You define this queue as .noAutoStartup(). Then you can send messages to it and they will queue up on an internal queue, managed by camel. When you want to process the messages, you simply start it up via controlbus and stop it again afterwards.
Generally, messing around with starting and stopping queues should be avoided unless absolutely necessary, but that's certainly another way to do it
Suggested solution
I suggest you do as you have done (i.e. alternative 1):
aggregate via GroupedExchangeAggregationStrategy to keep the individual files in a list
Compute the total file size (use a processor, or do it along the way with a custom aggregation strategy)
Use a filter(simple("${body} < 123"))
"Unwind" your aggregation via a splitter(simple("${property.CamelGroupedExchange}"))
Delete your files one by one
Please let me know if this doesn'y makes sense, or if I have misunderstood your problem in any way.

Related

Salesforce Apex CPU LImit

currently we are having issue with an CPU Limit. We do have a lot of processes that are most likely not optimized, I have already combined some processes for the same object but it is not enough. I am trying to understand logs rights now - as you can see on the screenshots, there is one process that is being called multiple times (I assume each time for created record). Even if I create, for example, 60 records in one operation/dml statement, the Process Builders still gets called 60 times? (this is what I think is happening) Is that a problem we are having right now? If so, is there a better way to do it? Because right now we need updates from PB to run, but I expected it should get bulkified or something like that. I was also thinking there might be some looping between processes. If there are more information you need, please let me know. Thank you.
Well, yes, the process builder will be invoked 60 times, 1 record at a time. But that shouldn't be your problem. The final update / create child records / email send (or whatever your action is) will be bulkified, it won't save 1 record at a time. If the process calls some apex actions - they're supposed to support passing collection of records, not just single record.
You maybe looking at wrong place. CPU time suggests code problems, not config (flow, workflow, process builder... although if you're doing updates of fields on "this" record it's possible you'd benefit from before-save flows). Try to compare timestamps related to METHOD_BEGIN, METHOD_END for triggers, code methods (including invocable action / process plugin interfaces).
Maybe there's code that doesn't need to run because key fields didn't change, there's nothing to recalculate, rollup. Hard to say without seeing the debug log.
Maybe the operation doesn't have to be immediate. Think if you can offload some stuff to "scheduled actions", "time based workflows" or in apex terms "#future, batchable, queueable". But they'd have to be relatively safe to run, if there's error - it won't display to the user because the action will be in the background, you'd need to handle the errors manually (send an email, create a record, make chatter post or bell notification).
You could try uploading the log to https://apextimeline.herokuapp.com/ and try to make sense out of that Gantt-chart-like output. Or capture the log "pro" way, with https://help.salesforce.com/s/articleView?id=sf.code_dev_console_solving_problems_using_system_log.htm&type=5 or https://marketplace.visualstudio.com/items?itemName=financialforce.lana (you'll likely need developer's help to make sense out of it).

Extended windows

I have an always one application, listening to a Kafka stream, and processing events. Events are part of a session. And I need to do calculations based off of a sessions data. I am running into a problem trying to correctly run my calculations due to the length of my sessions. 90% of my sessions are done after 5 minutes. 99% are done after 1 hour. Sessions may last more than a day, due to this being a real-time system, there is no determined end. Session are unique, and show never collide.
I am looking for a way where I can process a window multiple times, either with an initial wait period and processing any later events after that, or a pure process per event type structure. I will need to keep all previous events around(ListState), as well as previously processed values(ValueState).
I previously thought allowedLateness would allow me to do this, but it seems the lateness is only considered for when the event should have been processed, it does not extend an actual window. GlobalWindows may also work, but I am unsure if there is a way to process a window multiple times. I believe I can used an evictor with GlobalWindows to purge the Windows after a period of inactivity(although admittedly, I did not research this yet, because I was unsure of how to trigger a GlobalWindow multiple times.
Any suggestions on how to achieve what I am looking to do would be greatly appreciated, I would also be happy to clarify any points needed.
If SessionWindows won't do the job, then you can use GlobalWindows with a custom Trigger and Evictor. The Trigger interface has onElement and timer-based callbacks that can fire whenever and as often as you like. If you go down this route, then yes, you'll also need to implement an Evictor to dispose of elements when they are no longer needed.
The documentation and the source code are helpful when trying to understand how this all fits together.

Adding patterns dynamically in Apache Flink without restarting job

My use case is that I want to apply different CEP patterns to the same datastream. the CEP patterns come dynamically & i want them to be added to flink without having to restart the job. While all conditions can be handled via custom classes that implement IterativeCondition, my main problem is that the temporal condition accepts only TimeWindow; which cannot be handled. Is there some way that the value passed to .within() be set based on the input elements?
Something similar was asked here: Flink and Dynamic templates recognition
Best Answer:
"What one could add is a co-flat map operator which receives on one input channel the events and on the other input channel patterns. For each newly received pattern one either updates the existing NFA (this functionality is missing) or compiles a new one. In the latter case, one would apply incoming events to all stored NFAs."
I am trying to implement this but I am facing some difficulty. Specifically, on the point of "In the latter case, one would apply incoming events to all stored NFAs"
Reason being that I apply stream to pattern using: PatternStream matchStream = CEP.pattern(tmatchStream, pattern);
But the stream "tmatchStream" would not be defined inside the co-flatMap. Am I missing something here??? Any help would be greatly appreciated.
Unfortunately the answer to the linked question is still valid. Flink CEP does not support dynamic patterns at that moment. There is already a JIRA ticket for that though: FLINK-7129
The earliest reasonable target version for that feature will be 1.6.0

Multiple consumers for the same endpoint is not allowed

I would like to read files from a directory with camel file consumer but I need my route to be transacted. So I can not use threads inside the rout.
Is it ok to write multiply routes to read from the same endpoint (same directory) with a little change between the uris (for example the sort type) , and like this to avoid the Multiple consumers for the same endpoint is not allowed exception ?
Yeah sure you can do that, mind that you will have competing consumes for the same files now, so mind about read-locks. By default Camel use the marker file.
You can also use different delay so they dont poll at the same interval/time. And you can sort by random to make less chance of processing the same files.

How to make another search call inside SOLR

I would like to implement some kind of fallback querying mechanism inside SOLR. That is if a first search call doesn't generate enough results, I would like to make another call with different ranking and then combine the results and return it. I guess this can be done in the SOLR client side but I hope to do this inside the SOLR. By reading documentation, I guess I need to implement a search component and then add it next to "query" component? Any reference or experience in this regard would be highly appreciated.
SearchHandler calls all the registered search components in order you define, and there are several stages (prepare,process etc.).
You know the number of results only after the distributed processing phase (I suppose you work with distributed mode),so your custom search component should check the number of results in response object and run its own query if necessary.
Actually you may inherit (or wrap) a regular QueryComponent for that, augmenting its process/distributed process phases.

Resources