onComplete and aggregate with completionTimeout - apache-camel

I'm trying to run an onCompletion() block on my route, which contains an aggregate definition with completionTimeout. It seems like onCompletion is called before the route is actually completed, since I get log entries from OnCompletion before AggregateTimeoutChecker log entries.
How can I make onCompletion wait for the aggregation timeout?
Of course I can add a delay greater than completionTimeout to onCompletion, but that will slow down my tests a lot.
My route looks like this:
from(fileEndpoint)
    .bean(externalLogger, "start")
    .onCompletion()
        .bean(externalLogger, "end") // <-- gets called too early
    .end()
    .split().tokenize("\n")
    .bean(MyBean.class)
    .aggregate(header("CamelFileName"), ...)
        .completionSize(size)
        .completionTimeout(500)
        .bean(AggregatesProcessor.class); // <-- some changes here don't arrive at onCompletion

onCompletion() is triggered for each incoming exchange once that exchange has completed the route. When using an aggregator, every exchange that does not complete the aggregation finishes the route at the aggregator, so your externalLogger is called once for each file being aggregated.
If you want logging after the aggregation, you can simply call the logger after aggregate().
If you need to distinguish between timeout and completion of your aggregation, it can help to provide a custom AggregationStrategy that also implements the interfaces CompletionAwareAggregationStrategy and TimeoutAwareAggregationStrategy, as sketched below.
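For illustration, here is a minimal sketch of such a strategy. The class name, the merge logic, and the "completedBy" property are invented for this example; the callback signatures are those of Camel 2.x's org.apache.camel.processor.aggregate interfaces:

import org.apache.camel.Exchange;
import org.apache.camel.processor.aggregate.AggregationStrategy;
import org.apache.camel.processor.aggregate.CompletionAwareAggregationStrategy;
import org.apache.camel.processor.aggregate.TimeoutAwareAggregationStrategy;

public class FileLinesAggregationStrategy implements AggregationStrategy,
        CompletionAwareAggregationStrategy, TimeoutAwareAggregationStrategy {

    @Override
    public Exchange aggregate(Exchange oldExchange, Exchange newExchange) {
        // example merge: concatenate the line bodies
        if (oldExchange == null) {
            return newExchange;
        }
        String merged = oldExchange.getIn().getBody(String.class)
                + "\n" + newExchange.getIn().getBody(String.class);
        oldExchange.getIn().setBody(merged);
        return oldExchange;
    }

    @Override
    public void onCompletion(Exchange exchange) {
        // invoked when the group completed normally, e.g. completionSize was reached
        exchange.setProperty("completedBy", "size"); // illustrative property
    }

    @Override
    public void timeout(Exchange exchange, int index, int total, long timeout) {
        // invoked when the group was closed by completionTimeout
        exchange.setProperty("completedBy", "timeout"); // illustrative property
    }
}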

Related

How to perform Camel routes with different exchanges synchronously?

I have some synchronous Camel routes with:
from("file:...")
...
.to("direct:next1")
from("direct:next1")
...
Now I'd like to run another route with a different exchange synchronously:
from("file:local/A")
...
.to("file:remote/A")
.to("direct:next2")
from("file:remote/A") // direct:next2 ?
...
How can I achieve this?
First off - the elephant in the room. Both of your file routes are going to start and run asynchronously by default because there is no dependency between those two routes at an EIP level to tell one to fire after the other.
There are two ways to solve this (i.e. to force an interdependency between the routes):
Require local/A to start the remote/A route (and not attempt to restart the route if it's already running).
Allow remote/A to poll for the file at given intervals.
Chain-starting the routes is a more sophisticated function of Camel and it allows you to precisely define the lifecycle of routes at your leisure. In this case, after your local/A file route finishes consuming, it would start up the remote/A route.
For that we can leverage the Control Bus EIP which allows us to control the lifecycle of routes directly.
from("file:local/A")
.routeId("localA")
.process(aProcessor)
.to("controlbus:route?routeId=remoteA&action=start");
from("file:remote/A")
.routeId("remoteA")
.autoStartup(false)
.process(aDifferentProcessor)
.to("controlbus:route?routeId=remoteA&action=suspend&async=true");
Polling is probably the most straightforward approach, and it doesn't require much in the way of finicky setup. Either the file exists or it doesn't when you go to process it, and you can choose how long to wait before polling for the file again.
from("file:local/A")
.routeId("localA")
.process(aProcessor);
from("file:remote/A?delay=2m")
.routeId("remoteA")
.process(aDifferentProcessor);
}
My preference would be to poll at a fixed interval or on a cron schedule, simply because it doesn't feel like you'd be gaining true synchronicity with either approach. You get the power to decide what fires when, but it's not 100% synchronous like direct routes.

Camel unit of work

I am trying to understand the unit of work concept in Camel. I have a simple question; hopefully someone here can help.
If there are multiple routes involved in routing the Exchange, for example:
from("aws-sqs:Q1").to("direct:processMe");//route1
from("direct:processMe").to("direct:aws-post");//route2
from("direct:aws-post").to("htt4:myservice");//route3
Is the unit of work invoked at the end of each route, or only at the end of route3? In my example, will the SQS message be deleted off of SQS once route1 completes, or will it wait until my message reaches "myservice"?
Thanks.
FOLLOW UP:
I've modified the route slightly:
from("aws-sqs:Q1").to("direct:processMe");//route1
from("direct:processMe").process(
new Processor(){
public void process(Exchange exchange) throws Exception {
throw new RuntimeException("fail on purpose");
}
}
).to("direct:aws-post");//route2
from("direct:aws-post").to("http4:myservice");//route3
The thought process is this:
If the unit of work is invoked at the end of each route, then once the message is read from the SQS queue it will be acknowledged as read by the SQS component. On the other hand, if the unit of work is only invoked once the Exchange is done routing through all routes, then the exception in route 2 will result in the message not being acknowledged, and it will be available for redelivery once the visibility timeout expires.
The test showed that the message remains on the queue despite being read by the first route. It is picked up again and again (until it ends up in the dead letter queue). As such I strongly believe that the unit-of-work boundary is defined by the end of the Exchange's routing.
A unit of work is basically a transaction.
By default a message for a route (a Camel Exchange) runs inside a single UnitOfWork context.
In your example there are 3 UnitOfWorks set up, starting at each from and finishing at the final to in each route.
I would expect the message from SQS to be consumed after the 1st route finishes. To test, you could add in a sleep to allow you to check the queue.
from("direct:processMe").process(new Processor()
{ void process() { try { Thread.sleep(60000L) } catch (Exception e) { } }
}).to("direct:aws-post")
If you want the message to remain on the queue until myservice gets the message then you need to put the processing in a single route.
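For illustration, a minimal sketch of that single-route variant; processMeProcessor is a placeholder for whatever route2 did:

// the SQS message is acknowledged only after the whole route,
// including the HTTP call, has completed
from("aws-sqs:Q1")
    .process(processMeProcessor) // placeholder for route2's work
    .to("http4:myservice");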
I have found a great explanation as to the unit of work boundary:
"The OnCompletion DSL name is used to define an action that is to take place when a Unit of Work is completed.
A Unit of Work is a Camel concept that encompasses an entire exchange. See Section 43.1, “Exchanges”. The onCompletion command has the following features:
The scope of the OnCompletion command can be global or per route. A route scope overrides global scope.
OnCompletion can be configured to be triggered on success or failure.
The onWhen predicate can be used to only trigger the onCompletion in certain situations.
You can define whether or not to use a thread pool, though the default is no thread pool."
In case of SQS processing, the consumer defines onCompletion on the exchange. So it is invoked only after the exchange is done routing.
The whole answer can be found here: Apache Camel Development Guide, section 2.11 "OnCompletion".
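To illustrate those features, here is a small sketch of a route-scoped onCompletion; the header name is made up for the example:

from("aws-sqs:Q1")
    .onCompletion()
        .onFailureOnly()                                 // trigger on failure only
        .onWhen(header("notifyOnError").isEqualTo(true)) // hypothetical header
        .to("log:failedExchanges")
    .end()
    .to("direct:processMe");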

Aggregate results of batch consumer in Camel (for example from SQS)

I'm consuming messages from SQS FIFO queue with maxMessagesPerPoll=5 set.
Currently I'm processing each message individually which is a total waste of resources.
In my case, as we are using a FIFO queue and all of those 5 messages are related to the same object, I could process them all together.
I thought this might be done using the aggregate pattern, but I wasn't able to get any results.
My consumer route looks like this:
from("aws-sqs://my-queue?maxMessagesPerPoll=5&messageGroupIdStrategy=usePropertyValue")
.process(exchange -> {
// process the message
})
I believe it should be possible to do something like this:
from("aws-sqs://my-queue?maxMessagesPerPoll=5&messageGroupIdStrategy=usePropertyValue")
    .aggregate(constant(true), new GroupedExchangeAggregationStrategy())
    .completionFromBatchConsumer()
    .process(exchange -> {
        // process ALL messages together as I now have a list of all exchanges
    });
but the processor is never invoked.
Second thing:
If I'm able to make this work, when is the ACK sent to SQS? When each individual message is processed, or when the aggregation finishes? I hope the latter.
If the processor is not called, the aggregator is probably still waiting for more messages to aggregate.
You could try to use completionSize(5) instead of completionFromBatchConsumer() for a test (see the sketch below). If this works, the batch completion definition is the problem.
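For illustration, that test variant would look something like this (queue URI copied from the question):

from("aws-sqs://my-queue?maxMessagesPerPoll=5&messageGroupIdStrategy=usePropertyValue")
    .aggregate(constant(true), new GroupedExchangeAggregationStrategy())
    .completionSize(5) // fixed size instead of completionFromBatchConsumer()
    .process(exchange -> {
        // if this is invoked now, the batch-consumer completion was the problem
    });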
For the ACK against the broker: unfortunately no. I think the message is committed when it arrives at the aggregator.
The Camel aggregator component is a "stateful" component and therefore it must end the current transaction.
For this reason you can equip such components with a persistent repository to avoid data loss when the process is killed. Without a persistent repository attached, the already aggregated messages would obviously be lost in such a scenario.
The problem lies in GroupedExchangeAggregationStrategy
When I use this strategy, the output is a list of all exchanges. This means that the exchange that arrives at the completion predicate no longer has the initial properties; instead it has CamelGroupedExchange and CamelAggregatedSize, which are of no use to completionFromBatchConsumer().
As I don't actually need the whole exchanges to be aggregated, it's enough to use GroupedBodyAggregationStrategy. Then the exchange properties remain as in the original exchange, and just the body contains a list (see the sketch below).
Another solution would be to use completionSize(Predicate predicate) with a custom predicate that extracts the necessary value from the grouped exchanges.
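A minimal sketch of the GroupedBodyAggregationStrategy variant described above; the queue URI is taken from the question, and the list element type depends on what the SQS component delivers:

from("aws-sqs://my-queue?maxMessagesPerPoll=5&messageGroupIdStrategy=usePropertyValue")
    .aggregate(constant(true), new GroupedBodyAggregationStrategy())
    .completionFromBatchConsumer() // works now: the original exchange properties survive
    .process(exchange -> {
        // the body is a java.util.List holding the individual message bodies
        java.util.List<?> bodies = exchange.getIn().getBody(java.util.List.class);
        // process all messages of the batch together
    });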

Apache Camel aggregator in combination with onCompletion

I want onCompletion to occur after all the aggregated exchanges, both those released by completion size and those released by timeout, have been processed. But it occurs right after the completion size is reached, while some of the exchanges are still waiting to be released by the timeout criterion.
I have the route configured as
from(fromEndPoint)
    .onCompletion()
        .doSomething()
    .split() // each line
        .streaming()
        .parallelProcessing()
        .unmarshal().bindy
    .aggregate()
        .completionSize(100)
        .completionTimeout(5000)
        .to(toEndpoint)
Assume the split was done on 405 lines: the first 4 sets of aggregated exchanges go to the to endpoint, completing 400 lines (exchanges), and onCompletion is triggered immediately after that. But there are still 5 more aggregated exchanges that are only released when the completionTimeout criterion is met, and onCompletion was not triggered again after those 5 exchanges were routed to the to endpoint.
My question here is: should onCompletion be triggered for each exchange, or once after all of them?
Note: my from endpoint here is a file.

Getting IDs of added documents after import operation is complete

I'm trying to set up a Solr dataimport.EventListener to call a SOAP service with the IDs of the documents which have been added in the update event. I have a class which implements org.apache.solr.handler.dataimport.EventListener, and I thought that the result of getAllEntityFields() would yield a collection of document IDs. Unfortunately, the method yields an empty list. Even more confusing, context.getSolrCore().getName() yields an empty string rather than the actual core name. So it seems I am not quite on the right path here.
The current setup is the following:
Whenever a certain sproc is called in SQL, it puts a message in a queue. This queue has a listener on it which initiates a program that reads the queue and calls other sprocs. After the sprocs are complete, a delta or full import operation is performed on Solr. Immediately after, a method is called to update a cache. However, because the import operation on Solr may not have completed before this update method is called, the cache may be updated with "stale" data.
I was hoping to use a dataimport EventListener to call the method which updates the cache since my other options seem far too complex (e.g. polling the dataimport URL to determine when to call the update method or using a queue to list document IDs which need to be updated and have the EventListener call a method on a service to receive this queue and update the cache). I'm having a bit of a hard time finding documentation or examples. Does anyone have any ideas on how I should approach the problem?
From what I understand, you are trying to update your cache as and when documents are added. Depending on which version of Solr you are running, you can do one of the following.
Solr 4.0 provides a ScriptTransformer that lets you do this:
http://wiki.apache.org/solr/DataImportHandler#ScriptTransformer
With prior versions of Solr, you can chain one handler on top of another, as answered in the following post:
Solr and custom update handler
