Apache Camel splitter with Hazelcast SEDA queue

I'm trying to build a file import process where a file is picked up in a subdirectory of a given folder, the subdirectory identifying the client the file is for; the records are then parsed, split, and sent on Hazelcast SEDA queues. I want each record to be processed as it's read off the Hazelcast SEDA queue and to return a status code (created, updated, or errored) that can be aggregated.
I'm also creating a job record when the file is first picked up, and I want to update the job record with the final counts of created, updated, and errored records.
The JobProcessor below creates this record and sets the client Organization and Job objects in headers on the message. The CensusExcelDataFormat reads an Excel file and creates an Employee object for each line, then returns a Collection.
from("file:" + censusDirectory + "?recursive=true").idempotentConsumer(new SimpleExpression("file:name"), idempotentRepository)
.process(new JobProcessor(organizationService, jobService, Job.JobType.CENSUS))
.unmarshal(censusExcelDataFormat)
.split(body(), new ListAggregationStrategy()).parallelProcessing()
.to(ExchangePattern.InOut, "hazelcast:seda:process-employee-import").end()
.process(new JobCompletionProcessor(jobService))
.end();
from("hazelcast:seda:process-employee-import")
.idempotentConsumer(simple("${body.entityId}"), idempotentRepository)
.bean(employeeImporterJob, "importOrUpdate");
The problem I'm having is that the list aggregation happens immediately, and instead of getting a list of statuses I'm getting the same list of Employee objects back. I want the Employee objects to be sent on the SEDA queue and the return values from the processing on the queue to be aggregated, then run through the JobCompletionProcessor to update the Job record.

The behaviour you are seeing is the default behaviour. The Apache Camel Splitter documentation clearly states this in the "What the Splitter returns" section:
Camel 2.2 or older: The Splitter will by default return the last splitted message.
Camel 2.3 and newer: The Splitter will by default return the original input message.
For all versions: You can override this by supplying your own strategy as an AggregationStrategy. There is a sample on this page (Split aggregate request/reply sample). Notice it's the same strategy as the Aggregator supports. This Splitter can be viewed as having a built-in lightweight Aggregator.
So, as you can see, you are required to implement your own splitter aggregation strategy. To do this, create a new class that implements AggregationStrategy, something like the code below:
public class MyAggregationStrategy implements AggregationStrategy {

    public Exchange aggregate(Exchange oldExchange, Exchange newExchange) {
        if (oldExchange == null) {
            // This is null on the first exchange: do any first-time setup here
            // and keep the new exchange as the starting point of the aggregation.
            return newExchange;
        }
        /*
         * Here you put your code to calculate failed, updated, created.
         */
        return oldExchange;
    }
}
You can then use your custom aggregation strategy by specifying it like the following examples:
.split(body(), new MyAggregationStrategy()) //Java DSL
<split strategyRef="myAggregationStrategy"/> //XML Blueprint

Related

Code after Splitter with aggregation strategy is not executed if exception in inner route were handled (Apache Camel)

I've run into behavior that I can't understand. The issue happens when a Split with an AggregationStrategy is executed and an exception occurs during one of the iterations. The exception occurs inside the Splitter, in another route (a direct endpoint which is called for each iteration). It seems like route execution stops just after the Splitter.
Here is sample code.
This is a route that builds one report per client and collects the names of the created files for internal statistics.
@Component
@RequiredArgsConstructor
@FieldDefaults(level = PRIVATE, makeFinal = true)
public class ReportRouteBuilder extends RouteBuilder {

    ClientRepository clientRepository;

    @Override
    public void configure() throws Exception {
        errorHandler(deadLetterChannel("direct:handleError")); // handles an error: adds the error message to an internal error collector for statistics and writes a log
        from("direct:generateReports")
            .setProperty("reportTask", body()) // at this point the body is an object of type ReportTask, containing all data required for building the report
            .bean(clientRepository, "getAllClients") // body is a List<Client>
            .split(body())
                .aggregationStrategy(new FileNamesListAggregationStrategy())
                .to("direct:generateReportForClient") // creates a report which is saved in the file system; uses the same error handler
            .end()
            // when an exception occurs during the split, the code after the splitter is not executed
            .log("Finished generating reports. Files created ${body}"); // body has to be a List<String> with file names
    }
}
The AggregationStrategy is pretty simple - it just extracts the name of the file. If the header is absent, it returns null.
public class FileNamesListAggregationStrategy extends AbstractListAggregationStrategy<String> {

    @Override
    public String getValue(Exchange exchange) {
        Message inMessage = exchange.getIn();
        return inMessage.getHeader(Exchange.FILE_NAME, String.class);
    }
}
When everything goes smoothly, after splitting the body holds a List with all the file names. But when some exception occurs in the route "direct:generateReportForClient" (I've added error simulation for one client), the aggregated body just contains one file name less, which is OK (everything was aggregated correctly).
BUT just after the Split, route execution stops, and the result that is in the body at this point (the List of file names) is returned to the caller (a FluentProducerTemplate), which expects a ReportTask as the response body. It then tries to convert the value (the aggregated List) to ReportTask, which causes org.apache.camel.NoTypeConversionAvailableException: No type converter available to convert from type
Why does the route break after the split? All errors were handled and the aggregation finished correctly.
PS: I've read the Camel in Action book and the Splitter documentation, but I haven't found the answer.
PPS: The project runs on Spring Boot 2.3.1 and Camel 3.3.0.
UPDATE
This route is started by FluentProducerTemplate
ReportTask processedReportTask = producer.to("direct:generateReports")
        .withBody(reportTask)
        .request(ReportTask.class);
The problem is the combination of the error handler and the custom aggregation strategy in the split.
From the Camel in Action book (section 5.3.5):
WARNING: When using a custom AggregationStrategy with the Splitter, it's important to know that you're responsible for handling exceptions. If you don't propagate the exception back, the Splitter will assume you've handled the exception and will ignore it.
In your code, you use an aggregation strategy extending AbstractListAggregationStrategy. Let's look at the aggregate method in AbstractListAggregationStrategy:
@Override
public Exchange aggregate(Exchange oldExchange, Exchange newExchange) {
    List<V> list;
    if (oldExchange == null) {
        list = getList(newExchange);
    } else {
        list = getList(oldExchange);
    }
    if (newExchange != null) {
        V value = getValue(newExchange);
        if (value != null) {
            list.add(value);
        }
    }
    return oldExchange != null ? oldExchange : newExchange;
}
If the first exchange is handled by the error handler, the resulting exchange (newExchange) will carry a number of properties set by the error handler (Exchange.EXCEPTION_CAUGHT, Exchange.FAILURE_ENDPOINT, Exchange.ERRORHANDLER_HANDLED and Exchange.FAILURE_HANDLED) and exchange.errorHandlerHandled=true. The methods getErrorHandlerHandled()/setErrorHandlerHandled(Boolean errorHandlerHandled) are available on the ExtendedExchange interface.
In this case, your split finishes with an exchange that has errorHandlerHandled=true, and that breaks the route.
The reason is described in the Camel exception clause manual:
If handled is true, then the thrown exception will be handled and Camel will not continue routing in the original route, but break out.
To prevent this behaviour, you can cast your exchange to ExtendedExchange and set errorHandlerHandled=false in the aggregation strategy's aggregate method. Then your route won't be broken but will continue.
@Override
public Exchange aggregate(Exchange oldExchange, Exchange newExchange) {
    Exchange aggregatedExchange = super.aggregate(oldExchange, newExchange);
    ((ExtendedExchange) aggregatedExchange).setErrorHandlerHandled(false);
    return aggregatedExchange;
}
The tricky part is that if the exchange handled by the error handler is not the first one to reach your aggregation strategy, you won't face this issue at all, because Camel uses the first exchange (without errorHandlerHandled=true) as the base for the aggregation.
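Putting the two pieces together, the strategy from the question with the fix applied would look like this (a sketch against Camel 3.3; package locations may differ in other versions):

import org.apache.camel.Exchange;
import org.apache.camel.ExtendedExchange;
import org.apache.camel.Message;
import org.apache.camel.processor.aggregate.AbstractListAggregationStrategy;

public class FileNamesListAggregationStrategy extends AbstractListAggregationStrategy<String> {

    @Override
    public String getValue(Exchange exchange) {
        Message inMessage = exchange.getIn();
        return inMessage.getHeader(Exchange.FILE_NAME, String.class);
    }

    @Override
    public Exchange aggregate(Exchange oldExchange, Exchange newExchange) {
        Exchange aggregated = super.aggregate(oldExchange, newExchange);
        // Clear the flag set by the error handler so the Splitter does not
        // treat the aggregated exchange as handled and break out of the route.
        ((ExtendedExchange) aggregated).setErrorHandlerHandled(false);
        return aggregated;
    }
}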

Camel: Poll Enrich with Aggregation

The Camel book, in the section 'Using pollEnrich to merge additional data with an existing message', shows that you can merge the oldExchange (from Quartz) with the new one (from FTP).
In my case I have a file from a topic (the old exchange) and I use pollEnrich to get a new file from an FTP server, and I want to merge these as well. I'm interested in copying some headers from the oldExchange to the newExchange.
The problem I am facing is that the oldExchange is always null.
I have read the examples from the Camel book for the aggregator, and it says: "The first message arrives for the first group. == null". I don't understand: where is my oldExchange then, the one from the topic? Why is the exchange non-null only from the second iteration onwards (for the same group)?
from("myTopic")
.pollEnrich()
.simple("ftp://myUrl&fileName=${in.headers.test}")
.aggregate((Exchange oldExchange, Exchange newExchange) -> {
final String oldHeader = oldExchange.getIn().getHeader("test", String.class);
newExchange.getIn().setHeader("test", oldHeader);
return newExchange;
})
I have read this: http://camel.465427.n5.nabble.com/Split-and-Aggregate-Old-Exchange-is-null-everytime-in-AggregationStrategy-td5746365.html#a5746405 and I still don't understand how both messages can belong to the same group.
The first message arrives for the first group. == null. I don't understand ...
This is true for a standard aggregation where you aggregate, for example, multiple incoming messages into one. In this case, on the first incoming message the aggregator is still empty, and therefore the oldExchange (the aggregator content) is null. You have to wait for another (second) message to be able to aggregate something.
However, in your case (enrich) the oldExchange should not be null because the first message, i.e. the message from your topic, is already there.
Have you tried to inspect the message from the topic in the debugger or log it out before it reaches the enricher? Just to make sure it is not empty.
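For example, a log step placed before the enricher would show whether the topic message is intact (a sketch; the endpoint names are illustrative):

from("myTopic")
    // Dump body and headers to the log to verify the topic message is intact.
    .to("log:beforeEnrich?showAll=true")
    .to("mock:result"); // stand-in for the rest of the route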
Added after a test
This is fascinating: I tried this with a unit test, and when I define the pollEnrich as you do, I get the inverse result: my consumed message routed by .from(...) is the oldExchange, and my newExchange is always null.
However, if I define the pollEnrich "inline", it works fine
.pollEnrich("URI", Timeout, (AggregationStrategy))
I suspect that this is explainable if you analyze what the DSL does with these two definitions, but from my quick test perspective it looks a bit strange.
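For reference, the inline form passes the URI, a timeout and the strategy directly (a sketch with an illustrative fixed URI and a 10-second timeout):

from("myTopic")
    .pollEnrich("ftp://myUrl?fileName=report.csv", 10000,
        (Exchange oldExchange, Exchange newExchange) -> {
            // Here oldExchange is the topic message, as expected.
            final String oldHeader = oldExchange.getIn().getHeader("test", String.class);
            newExchange.getIn().setHeader("test", oldHeader);
            return newExchange;
        });

The limitation, as noted below, is that this form takes a fixed URI string, so it does not directly support a dynamic file name taken from a header.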
@burki true, it works as you said with the aggregationStrategy inside the pollEnrich(), but I need the simple expression because I am calling the endpoint dynamically, and I cannot do that in the inline pollEnrich (or at least I don't know how).
I was able to solve it like this:
from("myTopic")
.pollEnrich()
.simple("ftp://myUrl&fileName=${in.headers.test}")
.aggregationStrategy((Exchange oldExchange, Exchange newExchange) -> {
final String oldHeader = oldExchange.getIn().getHeader("test", String.class);
newExchange.getIn().setHeader("test", oldHeader);
return newExchange;
})
So instead of the .aggregate call, I am using .aggregationStrategy. What I understood is that .aggregate is for standard aggregation (as @burki mentioned), when we want to aggregate multiple messages, while .aggregationStrategy can be used to merge two messages (one of them coming from an external service).

How to use multithreading inside a JPA Camel route

We have an importer running on a powerful, multi-core server. However, our Apache Camel routes are single threaded, which is a shame.
Our [camel] importer is a single-instance program. How can I make a specific route process the messages using multiple threads? The messages are atomic and are processed by a bean, which already does this in a thread-safe way.
I would already be happy if I could process the batches (maxMessagesPerPoll) in threads, with idle time until the next poll takes place (after all, that's still better than sequential processing).
Here is one of the routes I would like to make multithreaded:
public void onConfigure() throws Exception {
    // This is a JPA query which selects all unprocessed modules
    String query = RouteQueryHelper.selectNextUnprocessedStaged(ImportAction.IMPORT_MODULES);

    from("jpa:com.so.importer.entity.ModuleStageEntity" +
            "?consumer.query=" + query +
            "&maxMessagesPerPoll=2000" +
            "&consumeLockEntity=false" +
            "&consumer.delay=1000" +
            "&consumeDelete=false")
        .transacted().policy("CAMEL_DEFAULT_POLICY")
        .bean(moduleImportService) // processes the entity and updates its status flag
        .to("log:import-module?groupInterval=10000")
        .routeId("so.route.import-module");
}
The route has consumeDelete=false, because we use a status property on the entity instead (which is modified and saved). The status property is also respected in the consumer.query.
We use camel version 2.17.1 in spring boot (1.3.8.RELEASE) on Java 8.
EDIT 2019-Jan-21: The entities have a method annotated with @Consumed, which pushes the entity into the next route after it has been processed:
@Consumed
public void gotoNextStatus() {
    switch (stageStatus) {
        case STAGED: setStageStatus(StageStatus.IMPORTED); break;
        case IMPORTED: setStageStatus(StageStatus.RENDERED); break;
        case RENDERED: setStageStatus(StageStatus.DONE); break;
    }
}
You could introduce some asynchronicity by sending your messages to an intermediate SEDA endpoint:
from("jpa:")
...
.to("seda:intermediateStage")
And then put the real processing inside a new route with N concurrent SEDA consumers (the default is one):
from("seda:intermediateStage?concurrentConsumers=5")
.process(...)
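Applied to the route above, a minimal sketch could look like this (queue name and consumer count are illustrative). One caveat: with this split, the JPA transaction and the @Consumed status update complete when the first route ends, before the bean has processed the entity, so the status handling would need to be re-checked.

from("jpa:com.so.importer.entity.ModuleStageEntity"
        + "?consumer.query=" + query
        + "&maxMessagesPerPoll=2000"
        + "&consumeDelete=false")
    .transacted().policy("CAMEL_DEFAULT_POLICY")
    // Hand each entity off to the in-memory queue and poll again.
    .to("seda:intermediateStage");

from("seda:intermediateStage?concurrentConsumers=5")
    // Five threads now run the import bean in parallel.
    .bean(moduleImportService)
    .to("log:import-module?groupInterval=10000");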

Camel - data from two sources

We are using Apache Camel for routing and extracting data from files.
I have a situation where I need to get data from a file in a shared folder and data from a database. I need to combine the data only when both sides have arrived. If either side has not arrived yet, my combining process should wait until both are present.
Is this possible? How can I achieve it? Any sample code?
Something must trigger the process - either the file or the database - so pick one.
Then you can use the enricher pattern to populate the other source (when the data is ready). An aggregation strategy is used to combine the data; you typically write the aggregation strategy in Java.
The link has examples of how to enrich and merge data. You can find out how to handle databases and files in the Camel documentation.
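A minimal sketch of that idea, with the file as the trigger and the database side fetched via pollEnrich (endpoint URIs, timeout and merge logic are assumptions):

from("file:/shared/folder")
    // Wait up to 30 seconds for the database side to be available.
    .pollEnrich("sql:select * from import_data", 30000,
        (Exchange fileExchange, Exchange dbExchange) -> {
            if (dbExchange == null) {
                return fileExchange; // timeout: database data not there yet
            }
            // Keep the file as the body and attach the rows as a header;
            // a custom holder type could be used instead.
            fileExchange.getIn().setHeader("dbRows", dbExchange.getIn().getBody());
            return fileExchange;
        })
    .to("direct:combineProcess");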
I use this to zip a processed file together with its processing log. I've attached an example; I hope it helps.
// Archived
from("direct:" + EnvironmentSetup.ARCHIVED)
    .routeId(ROUTES.ARCHIVED.name())
    .setHeader(HEADER_ZIP_AGG_ID, header(Exchange.FILE_NAME))
    .setHeader(HEADER_AFTER_ZIP_DEST).constant(getArchiveUri())
    .setHeader(HEADER_STATUS).constant(STATUS.SUCCESS)
    .pipeline()
        .to("direct:" + EnvironmentSetup.ARCHIVED_ZIP)
    .end()
    .pipeline()
        .setHeader(Exchange.FILE_NAME, header(Exchange.FILE_NAME).append(".report"))
        .setBody(header(ProcessManager.PROCESS_LOG).convertToString())
        .to("direct:" + EnvironmentSetup.ARCHIVED_ZIP)
    .end()
    .end();

from(
        "direct:" + EnvironmentSetup.DECRYPT_FAILED_ZIP,
        "direct:" + EnvironmentSetup.PROCESS_FAILED_ZIP,
        "direct:" + EnvironmentSetup.ARCHIVED_ZIP
    )
    .routeId("ZIP")
    .aggregate(header(HEADER_ZIP_AGG_ID), new CopiedGroupedExchangeAggregationStrategy())
        .completionSize(2)
        .marshal(zipFileDataFormat)
        .multicast()
            .pipeline()
                .setHeader(Exchange.FILE_NAME, simple(String.format(
                    "${in.header.%s}/${in.header.%s}", HEADER_EMAIL, Exchange.FILE_NAME))) //header(HEADER_EMAIL). header(Exchange.FILE_NAME))
                //.dynamicRouter(header(HEADER_AFTER_ZIP_DEST))
                .to("direct:dynamic")
            .end()
            .pipeline()
                .marshal(encryption)
                .setHeader(Exchange.FILE_NAME, simple(String.format(
                    "${in.header.%s}/${in.header.%s}.gpg", HEADER_EMAIL, Exchange.FILE_NAME)))
                //.setHeader(Exchange.FILE_NAME, header(Exchange.FILE_NAME).append(".gpg"))
                .to("direct:" + EnvironmentSetup.SEND_BACK)
            .end()
        .end() // end aggregate
    .end();
CopiedGroupedExchangeAggregationStrategy.java
public class CopiedGroupedExchangeAggregationStrategy extends
        AbstractListAggregationStrategy<Exchange> {

    @Override
    public boolean isStoreAsBodyOnCompletion() {
        // keep the list as a property to be compatible with old behavior
        return true;
    }

    @Override
    public Exchange getValue(Exchange exchange) {
        return exchange.copy();
    }
}

Apache Camel: how to consume messages from two or more JMS queues

From a programming point of view, I have a very simple business case. However, I can't figure out how to implement it using Apache Camel... Well, I have 2 JMS queues: one to receive commands, and another to store a large number of messages which should be delivered to an external system in batches of 1000 or less.
Here is the concept message exchange algorithm:
1. Upon receiving a command message on the 1st JMS queue, prepare an XML message.
2. Send the XML message to an external SOAP web service to obtain a usertoken.
3. Using the usertoken, prepare another XML message and send it to a REST service to obtain a jobToken.
4. Loop (see the aggregation sketch after this list):
4.1. Aggregate messages from the 2nd JMS queue in batches of 1000; stop aggregation on a timeout.
4.2. For every batch, convert it to a CSV file.
4.3. Send the CSV via HTTP POST to a REST service.
4.4. Retain the batchtoken assigned to each batch.
5. Using the jobtoken, prepare an XML message and send it to the REST service to commit the batches.
6. Using the batchtoken, check the execution status of each batch via an XML message to the REST service.
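Step 4.1 maps directly onto Camel's Aggregator with a completion size and a completion timeout. A sketch (queue name, timeout and CSV handling are assumptions; it presumes each message body is a List of fields so the CSV data format can marshal the batch):

from("jms:dataQueue")
    // Correlate everything into a single batch; close the batch at 1000
    // messages, or after 5 seconds without new messages, whichever is first.
    .aggregate(constant(true), (Exchange oldExchange, Exchange newExchange) -> {
        if (oldExchange == null) {
            List<Object> batch = new ArrayList<>();
            batch.add(newExchange.getIn().getBody());
            newExchange.getIn().setBody(batch);
            return newExchange;
        }
        List<Object> batch = oldExchange.getIn().getBody(List.class);
        batch.add(newExchange.getIn().getBody());
        return oldExchange;
    })
        .completionSize(1000)
        .completionTimeout(5000)
    .marshal().csv() // 4.2: batch -> CSV
    .setHeader("CamelHttpMethod", constant("POST"))
    .to("http4://restService/batch"); // 4.3: POST the CSV; the reply carries the batchtoken to retain (4.4)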
While looking at Camel, I was able to create a sample project where I model out steps 1-3 and 5 of the exchange:
from("file:src/data?noop=true")
.setHeader("sfUsername", constant("a#fd.com"))
.setHeader("sfPwd", constant("12345"))
.to("velocity:com/eip/vm/bulkPreLogin.vm?contentCache=false")
.setHeader(Exchange.CONTENT_TYPE, constant("text/xml; charset=UTF-8"))
.setHeader("SOAPAction", constant("login"))
.setHeader("CamelHttpMethod", constant("POST"))
.to("http4://bulklogin") // send login
.to("xslt:com/eip/xslt/bulkLogin.xsl") //xslt transformation to retrieve userToken
.process(new Processor() {
#Override
public void process(Exchange exchange) throws Exception {
String body = (String) exchange.getIn().getBody();
String[] bodyParts = body.split(",");
exchange.getProperties().put("userToken", bodyParts[0]);
.....
}
})
.to("velocity:com/eip/vm/jobInsertTeamOppStart.vm")
.setHeader(Exchange.CONTENT_TYPE, constant("application/xml; charset=UTF-8"))
.setHeader("X-Session", property("userToken"))
.setHeader("CamelHttpMethod", constant("POST"))
.to("http4://scheduleJob") //schedule job
.to("xslt:com//eip/xslt/jobInfoTransform.xsl")
.process(new Processor() {
#Override
public void process(Exchange exchange) throws Exception {
String body = (String) exchange.getIn().getBody();
exchange.getProperties().put("jobToken",body.trim());
}
})
//add batches in a loop ???
.to("velocity:com/eip/vm/jobInsertTeamOppEnd.vm")
.setHeader(Exchange.HTTP_URI, simple("https://na15.com/services/async/job/${property.jobToken}"))
.setHeader(Exchange.CONTENT_TYPE, constant("application/xml; charset=UTF-8"))
.setHeader("X-ID-Session", property("userToken"))
.setHeader("CamelHttpMethod", constant("POST"))
.to("http4://closeJob") //schedule job
//check batch?
.bean(new SomeBean());
So, my question is:
How can I read messages from my 2nd JMS queue?
This doesn't strike me as a very good use case for a single Camel route. I think you should implement the main functionality in a POJO and use Camel's Bean Integration for consuming and producing messages. This will result in code that is much easier to maintain, and also allows for easier exception handling.
See https://camel.apache.org/pojo-consuming.html
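A sketch of what that can look like, with the @Consume annotation binding a bean method to the second queue (the names and batching logic are illustrative):

import java.util.ArrayList;
import java.util.List;
import org.apache.camel.Consume;

public class BatchCollector {

    private final List<String> batch = new ArrayList<>();

    // Camel calls this method for every message arriving on the queue.
    @Consume(uri = "jms:dataQueue")
    public synchronized void onRecord(String record) {
        batch.add(record);
        if (batch.size() >= 1000) {
            flush(); // convert to CSV, POST to the REST service, keep the batchtoken
        }
    }

    private void flush() {
        // batching logic goes here
        batch.clear();
    }
}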
