How can I improve the performance of a Seda queue? - apache-camel

Take this example:
from("seda:data").log("data added to queue")
.setHeader("CamelHttpMethod", constant("POST"))
.setHeader(Exchange.CONTENT_TYPE, constant("application/json"))
.process(new Processor() {
public void process(Exchange exchange) throws Exception {
exchange.setProperty(Exchange.CHARSET_NAME, "UTF-8");
}
})
.recipientList(header(RECIPIENT_LIST))
.ignoreInvalidEndpoints().parallelProcessing();
Assume the RECIPENT_LIST header contains only one http endpoint. For a given http endpoint, messages should be processed in order, but two messages for different end points can be processed in parallel.
Basically, I want to know if there is anything be done to improve performance. For example, would using concurrentConsumers help?

SEDA with concurrentConsumers > 1 would absolutely help with throughput because it would allow multiple threads to run in parallel...but you'll need to implement your own locking mechanism to make sure only a single thread is hitting a given http endpoint at a given time
otherwise, here is an overview of your options: http://camel.apache.org/parallel-processing-and-ordering.html
in short, if you can use JMS, then consider using ActiveMQ message groups as its trivial to use and is designed for exactly this use case (parallel processing, but single threaded by groups of messages, etc).

Related

how to perform parallel processing of gcp pubsub messages in apache camel

I have this code below that takes message from pubsub source topic -> transform it as per a template -> then publish the transformed message to a target topic.
But to improve performance I need to do this task in parallel.That is i need to poll 500 messages,and then transform it in parallel and then publish them to the target topic.
From the camel gcp component documentation I believe maxMessagesPerPoll and concurrentConsumers parameter will do the job.Due to lack of documentation I am not sure how does it internally works.
I mean a) if I poll say 500 message ,will then it create 500 parallel route that will process the messages and publish it to the target topic b)what about ordering of the messages c) should I be looking at parallel processing EIPs as an alternative
etc.
The concept is not clear to me
Was go
// my route
private void addRouteToContext(final PubSub pubSub) throws Exception {
this.camelContext.addRoutes(new RouteBuilder() {
#Override
public void configure() throws Exception {
errorHandler(deadLetterChannel("google-pubsub:{{gcp_project_id}}:{{pubsub.dead.letter.topic}}")
.useOriginalMessage().onPrepareFailure(new FailureProcessor()));
/*
* from topic
*/
from("google-pubsub:{{gcp_project_id}}:" + pubSub.getFromSubscription() + "?"
+ "maxMessagesPerPoll={{consumer.maxMessagesPerPoll}}&"
+ "concurrentConsumers={{consumer.concurrentConsumers}}").
/*
* transform using the velocity
*/
to("velocity:" + pubSub.getToTemplate() + "?contentCache=true").
/*
* attach header to the transform message
*/
setHeader("Header ", simple("${date:now:yyyyMMdd}")).routeId(pubSub.getRouteId()).
/*
* log the transformed event
*/
log("${body}").
/*
* publish the transformed event to the target topic
*/
to("google-pubsub:{{gcp_project_id}}:" + pubSub.getToTopic());
}
});
}
a) if I poll say 500 message ,will then it create 500 parallel route that will process the messages and publish it to the target topic
No, Camel does not create 500 parallel threads in this case. As you suspect, the number of concurrent consumer threads is set with concurrentConsumers. So if you define 5 concurrentConsumers with a maxMessagesPerPoll of 500, every consumer will fetch up to 500 messages and process them one after the other in a single thread. That is, you have 5 messages processed in parallel.
what about ordering of the messages
As soon as you process messages in parallel, the order of messages is messed up. But this already happens with 1 Consumer when you got processing errors and they are detoured to your deadLetterChannel and reprocessed later.
should I be looking at parallel processing EIPs as an alternative
Only if the concurrentConsumers option is not sufficient.
When you mention the concurrentConsumers option(let's say concurrentConsumers=10), you are asking Camel to create a thread pool of 10 threads, and each of those 10 threads will pick up a different message from the pub-sub queue and process them.
The thing to note here is that when you are specifying the concurrentConsumers option, the thread pool uses a fixed size, which means that a fixed number of active threads are waiting at all times to process incoming messages. So 10 threads(since I specified concurrentConsumers=10) will be waiting to process my messages, even if there aren't 10 messages coming in simultaneously.
Obviously, this is not going to guarantee that the incoming messages will be processed in the same order. If you are looking to have the messages in the same order, you can have a look at the Resequencer EIP to order your messages.
As for your third question, I don't think google-pubsub component allows a parallel processing option. You can make your own using the Threads EIP. This would definitely give more control over your concurrency.
Using Threads, your code would look something like this:
from("google-pubsub:project-id:destinationName?maxMessagesPerPoll=20")
// the 2 parameters are 'pool size' and 'max pool size'
.threads(5, 20)
.to("direct:out");

Camel ApacheMQ -> AHC behaviour (blocking?)

I just started using Apache Camel and I'm curious about the seemingly counter-intuitive default behaviour of asynchronous http client (AHC). While consuming messages from ActiveMQ, I can't get it to act in a non-blocking fashion.
My route looks like this:
#Component
public class Broadcaster extends RouteBuilder {
#Override
public void configure() throws Exception {
errorHandler(deadLetterChannel("activemq:failed.messages"));
from("activemq:outbound.messages")
.setExchangePattern(ExchangePattern.InOnly)
.recipientList(simple("ahc:${in.header[PublishDestination]}"))
.end();
}
}
I enqueued several messages, half of which I sent to a delayed web server, and the other half to a normal one. I expected to see all the normal messages consumed immediately by the fast server, and the slow messages gradually over time. However, this was the behaviour observed on the fast web server:
00:24:02.585, <hello>World</hello>
00:24:03.622, <hello>World</hello>
00:24:04.640, <hello>World</hello>
00:24:05.658, <hello>World</hello>
As you can see there is exactly one second between each logged request that corresponds to the artificial 1 second delay on the slow server. Based on the route timings, it looks like the JMS consumer is waiting for AHC to complete before it consumes the next message off the queue:
Processor Elapsed (ms)
[activemq://outbound.messages ] [ 1020]
[setExchangePattern[InOnly] ] [ 0]
[ahc:${in.header[PublishDestination]}} ] [ 1018]
Am I supposed to explicitly use async producers and write callback handlers in these cases, or is there something else I'm missing? Thank you!
Well, case of a RTFM I guess, although I ActiveMQ page leaves a lot to be desired in terms of properties available for endpoint configuration. There should probably be a note to say most (all?) JMS config options are also available for ActiveMQ component. In any case, the solution is to define the consumer as follows:
from("activemq:outbound.messages?asyncConsumer=true")

Apache Camel: how to consume messages from two or more JMS queues

From a programming point of view, I have a very simple business case. However, I can't figure out how to implement it using Apache Camel... Well, I have 2 JMS queues: one to receive commands, another - to store large number of message which should be delivered to external system in a batches of 1000 or less.
Here is the concept message exchange algorithm:
upon receiving a command message in 1st JMS queue I prepare XML
message
Send the XML message to external SOAP Web Service to obtain a usertoken
Using the usertoken, prepare another XML message and send it to a REST service to obtain jobToken
loop:
4.1. aggregate messages from 2nd JMS queue in batches of 1000, stop aggregation at timeout
4.2. for every batch, convert it to CSV file
4.3. send csv via HTTP Post to a REST service
4.4. retain batchtoken assigned to each batch
using the jobtoken prepare XML message and send to REST service to commit the batches
using batchtoken check execution status of each batch via XML message to REST service
While looking at Camel I could create a sample project where I can model out the exchange 1-3, 5:
from("file:src/data?noop=true")
.setHeader("sfUsername", constant("a#fd.com"))
.setHeader("sfPwd", constant("12345"))
.to("velocity:com/eip/vm/bulkPreLogin.vm?contentCache=false")
.setHeader(Exchange.CONTENT_TYPE, constant("text/xml; charset=UTF-8"))
.setHeader("SOAPAction", constant("login"))
.setHeader("CamelHttpMethod", constant("POST"))
.to("http4://bulklogin") // send login
.to("xslt:com/eip/xslt/bulkLogin.xsl") //xslt transformation to retrieve userToken
.process(new Processor() {
#Override
public void process(Exchange exchange) throws Exception {
String body = (String) exchange.getIn().getBody();
String[] bodyParts = body.split(",");
exchange.getProperties().put("userToken", bodyParts[0]);
.....
}
})
.to("velocity:com/eip/vm/jobInsertTeamOppStart.vm")
.setHeader(Exchange.CONTENT_TYPE, constant("application/xml; charset=UTF-8"))
.setHeader("X-Session", property("userToken"))
.setHeader("CamelHttpMethod", constant("POST"))
.to("http4://scheduleJob") //schedule job
.to("xslt:com//eip/xslt/jobInfoTransform.xsl")
.process(new Processor() {
#Override
public void process(Exchange exchange) throws Exception {
String body = (String) exchange.getIn().getBody();
exchange.getProperties().put("jobToken",body.trim());
}
})
//add batches in a loop ???
.to("velocity:com/eip/vm/jobInsertTeamOppEnd.vm")
.setHeader(Exchange.HTTP_URI, simple("https://na15.com/services/async/job/${property.jobToken}"))
.setHeader(Exchange.CONTENT_TYPE, constant("application/xml; charset=UTF-8"))
.setHeader("X-ID-Session", property("userToken"))
.setHeader("CamelHttpMethod", constant("POST"))
.to("http4://closeJob") //schedule job
//check batch?
.bean(new SomeBean());
So, my question is:
How can I read messages from my 2nd JMS queue?
This doesn't strike me as a very good use-case for a single camel route. I think you should implement the main functionality in a POJO and use Camels Bean Integration for consuming and producing messages. This will result in much more easy to maintain code, and also for easier Exception handling.
See https://camel.apache.org/pojo-consuming.html

Camel RaabitMQ Acknowledgement

I am using Camel for my messaging application. In my use case I have a producer (which is RabbitMQ here), and the Consumer is a bean.
from("rabbitmq://127.0.0.1:5672/exDemo?queue=testQueue&username=guest&password=guest&autoAck=false&durable=true&exchangeType=direct&autoDelete=false")
.throttle(100).timePeriodMillis(10000)
.process(new Processor() {
#Override
public void process(Exchange exchange) throws Exception {
MyCustomConsumer.consume(exchange.getIn().getBody())
}
});
Apparently, when autoAck is false, acknowledgement is sent when the process() execution is finished (please correct me if I am wrong here)
Now I don't want to acknowledge when the process() execution is finished, I want to do it at a later stage. I have a BlockingQueue in my MyCustomConsumer where consume() is putting messages, and MyCustomConsumer has different mechanism to process them. I want to acknowledge message only when MyCustomConsumer finishes processing messages from BlockingQueue. How can I achieve this?
You can consider to use the camel AsyncProcessor API to call the callback done once you processing the message from BlockingQueue.
I bumped into the same issue.
The Camel RabbitMQConsumer.RabbitConsumer implementation does
consumer.getProcessor().process(exchange);
long deliveryTag = envelope.getDeliveryTag();
if (!consumer.endpoint.isAutoAck()) {
log.trace("Acknowledging receipt [delivery_tag={}]", deliveryTag);
channel.basicAck(deliveryTag, false);
}
So it's just expecting a synchronous processor.
If you bind this to a seda route for instance, the process method returns immediately and you're pretty much back to the autoAck situation.
My understanding is that we need to make our own RabbitMQ component to do something like
consumer.getAsyncProcessor().process(exchange, new AsynCallback() {
public void done(doneSync) {
if (!consumer.endpoint.isAutoAck()) {
long deliveryTag = envelope.getDeliveryTag();
log.trace("Acknowledging receipt [delivery_tag={}]", deliveryTag);
channel.basicAck(deliveryTag, false);
}
}
});
Even then, the semantics of the "doneSync" parameter is not clear to me. I think it's merely a marker to identify whether we're dealing with a real async processor or a synchronous processor that was automatically wrapped into an async one.
Maybe someone can validate or invalidate this solution?
Is there a lighter/faster/stronger alternative?
Or could this be suggested as the default implementation for the RabbitMQConsumer?

Apache Camel: access both request and reply message at end of route

I would like to process both request and response messages at the end of my route. However, I do not see a way how to access the original request message.
I have the terrible feeling I am struggling with some basic concept.
Here is a simple example route in DSL to outline my problem (streamCaching is enabled for the whole context):
from("activemq:queue:myQueue")
.to("log:" + getClass().getName() + "?showOut=true")
.to("http://localhost:8080/someBackend")
.log("Now in.Body returns this: ${in.body} and out.Body this: ${out.body}")
.to("log:" + getClass().getName() + "?showOut=true");
Here is an according excerpt from my logs (line-breaks edited for better reading). As one can see, the original SOAP message is lost once the http server replied, and the SOAP response object is stored in the inBody of the message.
2012-09-25 17:28:08,312 local.bar.foo.MyRouteBuilder INFO -
Exchange[ExchangePattern:InOut, BodyType:byte[],
Body:<?xml version="1.0" encoding="UTF-8"?><env:Envelope xmlns:env="http://schemas.xmlsoap.org/soap/envelope/"><env:Header /><env:Body><urn:someRequest xmlns:urn="http://foo.bar.local/ns"></urn:someRequest></env:Body></env:Envelope>,
Out: null]
2012-09-25 17:28:08,398 org.apache.camel.component.http.HttpProducer DEBUG -
Executing http POST method: http://localhost:8080/someBackend
2012-09-25 17:28:09,389 org.apache.camel.component.http.HttpProducer DEBUG -
Http responseCode: 200
2012-09-25 17:28:09,392 route2 INFO -
Now in.Body returns this: <soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"><soap:Body><ns2:someResponse xmlns:ns2="http://foo.bar.local/ns"</ns2:someResponse></soap:Body></soap:Envelope>
and out.Body this:
2012-09-25 17:28:09,392 local.bar.foo.MyRouteBuilder INFO -
Exchange[ExchangePattern:InOut,
BodyType:org.apache.camel.converter.stream.InputStreamCache,
Body:[Body is instance of org.apache.camel.StreamCache],
Out: null]
I would have expected to have in.body and out.body be preserved across the whole route?
Alternative solutions I am considering:
Make use of the Correlation Identifier pattern to correlate both request and reply. But would this preserve the message bodies as well? Also, my request/reply messages do not have unique identifiers for correlation.
Write a custom bean, which performs the call to the http backend, processing both request and reply objects (but this is basically a no-Camel solution, reinventing the wheel and hence not preferred)
Already failed approaches:
I tried to access the original request message using a Processor like this at the end of my route, with no success:
process(new Processor() {
#Override
public void process(Exchange exchange) throws Exception {
Message originalInMessage = exchange.getUnitOfWork().getOriginalInMessage();
logger.debug(originalInMessage.getBody(String.class));
logger.debug(exchange.getIn().getBody(String.class));
}
});
Thanks for any help
Simply store the original body of the in message in a header or a property and retrieve it at the end:
from("activemq:queue:myQueue")
.setProperty("origInBody", body())
.to("http://localhost:8080/someBackend")
After the http call you can then access the property origInBody.
First, this article shows very well how in and out works in camel: http://camel.apache.org/using-getin-or-getout-methods-on-exchange.html
Typically, the out message is not always used, but rather copied from the in-message in each step.
In your case, where you want the original message to stay around til the end of the route, you could go ahead with the Enrichment EIP. http://camel.apache.org/content-enricher.html
Your route would be something like this:
public class MyAggregationStrategy implements AggregationStrategy {
public Exchange aggregate(Exchange orig, Exchange httpExchange){
// if you want to check something with the Http request, you better do that here
if( httpExchange is not correct in some way )
throw new RuntimeException("Something went wrong");
return orig;
}
}
AggregationStrategy aggStrategy = new MyAggregationStrategy();
from("activemq:queue:myQueue")
.enrich("http://localhost:8080/someBackend",aggStrategy)
.//keep processing the original request here if you like, in the "in" message
One of the biggest problem of camel, is the ease to misuse it. The best way to use it correctly is to think in terms of EIP : one of the main goals of camel, is to implement EIP in its DSL.
Here is a list of EIP
Now think about it. You want the request and the response at the end, for what use ? Logging, Aggregation, ... ? For logging, a correlationId should suffice, so I presume you need it to create a response, based on both request and the proxied-response. If that's what you want, you could do something like
from("direct:receiveRequest")
.enrich("direct:proxyResponse", new RequestAndResponseAggregationStrategy())
You will have the opportunity to merge your Request (in oldExchange) and your Response (in newExchange).
With all the due respect I have for Christian Schneider, I do think the idea of putting the request in a property that could be reused later is a bad design. By doing it, you create side-effect between your routes. If your route is a subroute for another, you'll maybe erase their property. If you store it to put it back later, maybe you should do something like
from("direct:receiveRequest")
.enrich("direct:subRouteToIgnoreResponse", AggregationStrategies.useOriginal())
A really really bad design that I have done too many time myself is to do :
from("direct:receiveRequest")
.to("direct:subroute")
from("direct:subroute")
.setProperty("originalBody", body())
.to("direct:handling")
.transform(property("originalBody")
This will lead to "properties/headers hell", and to routes that are just a successive call of processors.
And if you can't think of a solution of your problem with EIP, you should maybe use camel only to access their components. For example, something like :
from("http://api.com/services")
.to(new SomeNotTranslatableToEIPProcessor())
.to("ftp://destination")
But don't forget that those components has their own goals : creating a common abstraction between similar behaviour (e.g, time based polling consumer). If you have a very specific need, trying to bend a camel component to this specific need can lead to huge chunk of code not easily maintainable.
Don't let Camel become your Golden Hammer anti-pattern
I often use an aggregation strategy, which preserves the old exchange and puts the result of the enrich into a header:
import org.apache.camel.Exchange;
import org.apache.camel.processor.aggregate.AggregationStrategy;
public class SetBodyToHeaderAggregationStrategy implements AggregationStrategy {
private String headerName = "newBody";
public Exchange aggregate(Exchange oldExchange, Exchange newExchange) {
oldExchange.getIn().setHeader(headerName, newExchange.getIn().getBody());
return oldExchange;
}
#SuppressWarnings("unused")
public void setHeaderName(String headerName) {
this.headerName = headerName;
}
}
Now you can use it like this:
<enrich strategyRef="setBodyToHeaderAggregationStrategy">
<constant>dosomething</constant>
</enrich>
<log loggingLevel="INFO" message="Result body from enrich: ${header.newBody}. Original body: ${body}" loggerRef="log"/>

Resources