Camel Multicast Subroutes out of order - apache-camel

I have a scenario where I get as input Message A. Message A must then be split into 3 different types of message, and forwarded to other routes. It is important that the messages arrive in a precise order, Ie. A-1 must be sent before A-2, which must be sent before A-3.
To do this I have done the following (outline):
from("activemq:queue:somequeue-local")
.multicast().to("direct:a1","direct:a2","direct:a3");
from("direct:a1)
//split incoming message and prepare output document for A-1
.to("activemq:queue:otherqueue")
.from("direct:a2)
//split incoming message and prepare output document for A-2
.to("activemq:queue:otherqueue")
.from("direct:a3)
//split incoming message and prepare output document for A-3
.to("activemq:queue:otherqueue")
And in another context, responsible for sending out the info to the external system, I have
.from("activemq:queue:otherqueue?maxMessagesPerTask=1&concurrentConsumers=1&maxConcurrentConsumers=1")
// do different stuff based on which type we are called with then end with
.beanref("somebean","writeToFileAndCallImportbat");
Now, my problem is, that when I get to the receiver, I get the messages in random order. Sometimes A-1,A-3,A-2, sometimes right, A-1,A-2,A-3.
I have tried adding JMSXGroupID and JMSXGroupSeq to the messages, but without any luck.
I have also tried skipping the MQ part entirely, and use direct-vm: to call the shared receiver, but then it looks like I have three simultanious invocations of the receiver at once, and still in random execution order.
I was under the impression that multicast would run sequential, unless otherwise prompted to?
Is there something fundamentally wrong with the approach taken?
I am using Camel version 2.12.
Or, said more plainly:
I would like a route that creates three different output messages, and executes a batch file on them, in order. How do I go about that?

If you use the Splitter pattern, have you checked to see if the streaming property is set to false.
If enabled then Camel will split in a streaming fashion, which means it will split the input message in chunks. This reduces the memory overhead. For example if you split big messages its recommended to enable streaming. If streaming is enabled then the sub-message replies will be aggregated out-of-order, eg in the order they come back. If disabled, Camel will process sub-message replies in the same order as they where splitted.

So, it turned out to not be a problem with multicast after all.
Rather, in each of my sub-routes, I did this:
.split(..stax(SpecialClass)).streaming()
.beanRef("transformationBean","somefunction")
.aggregate(constant("1"), new MyAggregator())
.completionTimeout(5000)
.completionSize(1000)
.to(writeToFileAndRunBat)
Which, I assumed meant "Process all elements in the split, and if you aren't finished in 5 seconds or after 1000 elements, break out".
I changed it to
.split(..stax(SpecialClass), , new MyAggregator()).streaming()
.beanRef("transformationBean","somefunction")
.end()
.to(writeToFileAndRunBat)
Coming to think of it, it makes perfect sense, as the first version couldn't really know when we were done, while the last (I assume) just iterate over all elements in the split and calls the Aggregator for each.
Also, I had to .end() in the first version. So I guess the whole thing was just acting random.

Related

Apache Camel - Dead Letter Channel: apply comparison after dequeuing

I'm having some issues trying to figure out the solution for this problem:
I need to implement a DLC on Apache Camel, though when message are dequeued from the dead letter queue I have on ActiveMQ, every single one of them has to be compared with the latest massage present on another AMQ queue.
So to be clear: when Camel is consuming from queue1 (dead letter queue) the message M1, before trying to resend it to a certain route, it has to compare M1 (for example header comparison) with the latest message present on queue2, M2. M2 is not to be removed from queue2 (it shall serve also for the next comparison) while M1 has to be removed from queue1.
I want to understand if this is possible and which EIP I'm missing in order to implement this.
What you need is a QueueBrowser to browse the messages of queue2 without consuming them.
Alternatively you could also consume from queue2 in a transaction and then force a rollback so that the message is not consumed. But when "latest message present on queue2" does not mean the first message, this will not work because you can only process the first message like this.

Aggregate results of batch consumer in Camel (for example from SQS)

I'm consuming messages from SQS FIFO queue with maxMessagesPerPoll=5 set.
Currently I'm processing each message individually which is a total waste of resources.
In my case, as we are using FIFO queue and all of those 5 messages are related to the same object, I could process them all toghether.
I though this might be done by using aggregate pattern but I wasn't able to get any results.
My consumer route looks like this:
from("aws-sqs://my-queue?maxMessagesPerPoll=5&messageGroupIdStrategy=usePropertyValue")
.process(exchange -> {
// process the message
})
I believe it should be possible to do something like this
from("aws-sqs://my-queue?maxMessagesPerPoll=5&messageGroupIdStrategy=usePropertyValue")
.aggregate(const(true), new GroupedExchangeAggregationStrategy())
.completionFromBatchConsumer()
.process(exchange -> {
// process ALL messages together as I now have a list of all exchanges
})
but the processor is never invoked.
Second thing:
If I'm able to make this work, when does ACK is sent to SQS? When each individual message is processed or when the aggregate process finishes? I hope the latter
When the processor is not called, the aggregator probably still waits for new messages to aggregate.
You could try to use completionSize(5) instead of completionFromBatchConsumer() for a test. If this works, the batch completion definition is the problem.
For the ACK against the broker: unfortunately no. I think the message is commited when it arrives at the aggregator.
The Camel aggregator component is a "stateful" component and therefore it must end the current transaction.
For this reason you can equip such components with persistent repositories to avoid data loss when the process is killed. In such a scenario the already aggregated messages would obviously be lost if you don't have a persistent repository attached.
The problem lies in GroupedExchangeAggregationStrategy
When I use this strategy, the output is an "array" of all exchanges. This means that the exchange that comes to the completion predicate no longer has the initial properties. Instead it has CamelGroupedExchange and CamelAggregatedSize which makes no use for the completionFromBatchConsumer()
As I don't actually need all exchanges being aggregated, it's enough to use GroupedBodyAggregationStrategy. Then exchange properties will remain as in the original exchange and just the body will contain an "array"
Another solution would be to use completionSize(Predicate predicate) and use a custom predicate that extracts necessary value from groupped exchanges.

Request Reply and Scatter Gather Using Apache Camel

I am attempting to construct a route which will do the following:
Consume a message from jms:sender-in. I am using a INOUTrequest reply pattern. The JMSReplyTo = sender-out
The above message will be routed to multiple recipients like jms:consumer1-in, jms:consumer2-in and jms:consumer3-in. All are using a request reply pattern. The JMSReplyTo is specified per consumer ( in this case, the JMSReplyTo are in this order jms:consumer1-out, jms:consumer2-out, jms:consumer3-out
I need to aggregate all the replies together and send the result back to jms:sender-out.
I constructed a route which will resemble this:
from("jms:sender-in")
.to("jms:consumer1-in?exchangePattern=InOut&replyTo=queue:consumer1-out&preserveMessageQos=true")
.to("jms:consumer2-in?exchangePattern=InOut&replyTo=queue:consumer2-out&preserveMessageQos=true")
.to("jms:consumer3-in?exchangePattern=InOut&replyTo=queue:consumer3-out&preserveMessageQos=true");
I then send the replies back to some queue to gather and aggreagte:
from("jms:consumer1-out?preserveMessageQos=true").to("jms:gather");
from("jms:consumer1-out?preserveMessageQos=true").to("jms:gather");
from("jms:consumer1-out?preserveMessageQos=true").to("jms:gather");
from("jms:gather").aggregate(header("TransactionID"), new GatherResponses()).completionSize(3).to("jms:sender-out");
To emulate the behavior of my consumers, I added the following route:
from("jms:consumer1-in").setBody(body());
from("jms:consumer2-in").setBody(body());
from("jms:consumer3-in").setBody(body());
I am getting a couple off issues:
I am getting a timeout error on the replies. If I comment out the gather part, then no issues. Why is there a timeout even though the replies are coming back to the queue and then forwarded to another queue.
How can I store the original JMSReplyTo value so Camel is able to send the aggregated result back to the sender's reply queue.
I have a feeling that I am struggling with some basic concepts. Any help is appreciated.
Thanks.
A good question!
There are two things you need to consider
Don't mix the exchange patterns, Request Reply (InOut) vs Event
message (InOnly). (Unless you have a good reason).
If you do a scatter-gather, you need to make the requests
multicast, otherwise they will be pipelined which is not
really scatter-gather.
I've made two examples which are similar to your case - one with Request Reply and one with (one way) Event messages.
Feel free to replace the activemq component with jms - it's the same thing in these examples.
Example one, using event messages - InOnly:
from("activemq:amq.in")
.multicast()
.to("activemq:amq.q1")
.to("activemq:amq.q2")
.to("activemq:amq.q3");
from("activemq:amq.q1").setBody(constant("q1")).to("activemq:amq.gather");
from("activemq:amq.q2").setBody(constant("q2")).to("activemq:amq.gather");
from("activemq:amq.q3").setBody(constant("q3")).to("activemq:amq.gather");
from("activemq:amq.gather")
.aggregate(new ConcatAggregationStrategy())
.header("breadcrumbId")
.completionSize(3)
.to("activemq:amq.out");
from("activemq:amq.out")
.log("${body}"); // logs "q1q2q3"
Example two, using Request reply - note that the scattering route has to gather the responses as they come in. The result is the same as the first example, but with less routes and less configuration.
from("activemq:amq.in2")
.multicast(new ConcatAggregationStrategy())
.inOut("activemq:amq.q4")
.inOut("activemq:amq.q5")
.inOut("activemq:amq.q6")
.end()
.log("Received replies: ${body}"); // logs "q4q5q6"
from("activemq:amq.q4").setBody(constant("q4"));
from("activemq:amq.q5").setBody(constant("q5"));
from("activemq:amq.q6").setBody(constant("q6"));
As for your question two - of course, it's possible to pass around JMSReplyTo headers and force exchange patterns along the road - but you will create hard to debug code. Keep your exchange patterns simple and clean - it keep bugs away.

Camel condition on aggregate of messages

I'm looking for a way to conditionally handle messages based on the aggregation of messages. I've looked into a lot of ways to do this, but it seems that Apache Camel doesn't support it. I'll explain the scenario and then the solutions I tried.
Scenario:
I'm trying to conditionally clean a directory. I poll from the directory every x days and fetch all the files (file://...). I route this into an aggregation, that aggregates the files into a single size (directorySize). I then check if this size passes a certain threshold.
Here is where the problem lies. I now want to remove certain files if this condition passes, but I don't have access to the original messages anymore because they were aggregated in a new exchange.
Solutions:
I tried to fetch the files again to process them. Problem is that you can't make a consumer fetch on demand as far as I know. I tried using pollEnrich, but that will only fetch a single file and not all files in the directory.
I tried to filter/stop the parent route. The problem here is that filter()/choice...stop()/end() will only stop the aggregated route with the directory size and not the parent route with the file messages. I can't conditionally process these.
I tried to move the aggregated condition to another route that I would call, but this causes the same problem as the first solution.
Things I consider doing:
Rewrite the aggregation strategy to not only aggregate the size, but also the files itself into a groupedExchange. This way I can split the aggregation again after the check. I don't really like this solution because it causes a lot boilerplate, both in code as during runtime.
Move the file size calculator to a processor instead of the aggregator. This would defeat the purpose of using camel in the first place.. I would manually be fetching the files and adding the sizes.. And that for every single file..
Use a ControlBus to dynamically start the delete route on that directory. Once again a lot of workaround to achieve something that I feel should be able to be done in a simple route.
I would like to set the calculated size on every parent message, but I have no clue how this could be achieved?
Another way to stop the parent route that I haven't thought of?
I'm a bit stunned that you can't elegantly filter messages based on the aggregation of these messages. Is there something that I missed in Camel that would provide an elegant solution? Or is this a case of the least bad solution?
Simple Schema
Message(File)
Message(File) --> AggregatedMessage(directorySize) --> delete certain Files?
Message(File)
Camel is really awesome, but sometimes it's sure difficult to see exactly which design pattern to use ;)
Firstly, you need to keep a copy of the file objects, because you don't know whether to delete them or not until you reach your threshold - there are basically (at least) two ways to do this.
Alternative 1
The first way is to use a List in an exchange property. This property will hang around no matter what you do with the exchange body. If you have a look at the source code for GroupedExchangeAggregationStrategy, it does precisely this:
list = new ArrayList<Exchange>();
answer.setProperty(Exchange.GROUPED_EXCHANGE, list);
// ...
list.add(newExchange);
Or you could do the same thing manually on your own exchange property. In any case, it's completely fine to use the Grouped aggregation strategy as you have done.
Alternative 2
The second way to "keep" old messages is to send a copy to a stopped SEDA queue. So you would do to("seda:xyz"). You define this queue as .noAutoStartup(). Then you can send messages to it and they will queue up on an internal queue, managed by camel. When you want to process the messages, you simply start it up via controlbus and stop it again afterwards.
Generally, messing around with starting and stopping queues should be avoided unless absolutely necessary, but that's certainly another way to do it
Suggested solution
I suggest you do as you have done (i.e. alternative 1):
aggregate via GroupedExchangeAggregationStrategy to keep the individual files in a list
Compute the total file size (use a processor, or do it along the way with a custom aggregation strategy)
Use a filter(simple("${body} < 123"))
"Unwind" your aggregation via a splitter(simple("${property.CamelGroupedExchange}"))
Delete your files one by one
Please let me know if this doesn'y makes sense, or if I have misunderstood your problem in any way.

Server client message verification

I have a question about communication between a client and a server. I would like to send data over a TCP unix socket (I know how to do this), and I don't know what is the best practises to test if the message sent is ready to be read entirely (not block per block).
Thus, I'm thinking of this
The client send the data printf(3) formatted, the message is written in a string and sent.
The server receive the message, but how to be sure the message if full ? Do I need to loop until the message is complete ?
So my idea is to use a code (or checksum maybe ?) that will be prepended and appended to the message like this :
[verification code] my_long_data_formatted [verification code]
And then, the server tries to read the data until the second verification code is read and succesfully checked.
Is this a proper solution for client / server communication ? If yes, what do you advise me for the verfications boundaries ?
TCP already has a built-in checksum/verification. So if the message was received, it was received correctly.
Typically then, the only thing you have to worry about is figuring out how long the message is. This is done either by sending the message length at the beginning or by putting a termination character or sequence at the end.
To make sure both the sender and receiver are "on the same page" so to speak, the receiver will typically send back a response after the message was received, even if that response only says "OK".
Examples of this technique include HTTP, SMTP, POP3, IMAP, and many others.
There are several ways to do this, so I don't think there is a "proper" solution. Your solution will probably work. The one note of caution is that you need to make sure the valiation code you chose is not sent as part of the data in the message. If that is the case, you will detect that the message is complete, even though it is really not the end of the message. If that is not possible to know what the data will look like, you may need to try a different technique.
From your description, it sounds like your messages are variable length. Another way would be to make all the messages the same length, that way you know how much data is to be read each time to get a complete message.
Another way would be to send over the length of the message first (such as a binary 32 bit number) which indicates the number of bytes to be read until the end of the message. You read this first to get the amount of data, and then read that amount from the socket.
If you had a set number of messages where the length was the same each time, you could assign a number to each message and send that number first, which you could then read. With that information you can determine how much data is to be read based on the assigned number to the message.
What you select to use for a solution will probably be based on factors like if messages are varaible or fixed in length and/or do you need to send additional information with the data. In this case, you might have a mixture where you send a fixed length header, which contains the information about the data which follows; either in length or the type of data which follows.
You need to establish an application-level protocol that would tell you somehow where application messages start and end in the byte stream provided to you by TCP (and then maybe how connected parties proceed with the conversation).
Popular choices are:
Fixed-length messages. Works well with binary data and is very simple.
Variable-size messages with fixed format or size header that tells exact size (and maybe type) of the rest of the message. Works with both binary and text data.
Delimited messages - some character(s) like new-line or \x1 are special and denote message boundaries. Best with text data.
Self-describing messages like XML, or S-expressions, or ASN.1.

Resources