I am looking to use apache-camel to poll an imap inbox, but I am wondering how this setup would behave in a cluster. I would deploy apache camel on each node of the cluster, and each node would poll the inbox.
How can I avoid having many consumers pick up the same message?
I decided to take the simple road and not install additional components. I used a clustered quartz job to trigger the polling of the inbox. The poller then places a retrieval command on a Hazelcast distributed queue, which is received by an array of Message retrieval components in the cluster.
Installing JMS and James in addition to Camel, just to solve this task, smelled wrong to me.
This is not very easy, since IMAP is not really a protocol designed for this kind of task.
The trick is still to have one consumer do the polling, not many. If you have many nodes for high availability, you could do some tricks with JMS to trigger IMAP polls.
For instance, you could use a JMS trigger message to initiate a poll and have all members of the cluster listen for that trigger. Keep concurrentConsumers at 1 and asynchronous JMS consumption disabled in Camel. You can rely on Message Groups or the ActiveMQ exclusive consumer feature to be sure that only one node gets the trigger messages (while it is alive; otherwise another node will take over). Generating the polling messages might be tricky, but could be done as simply as a timer route on each Camel node. Just tune the frequency.
This setup avoids race conditions in IMAP and, while not load balanced, at least provides failover. It might be good enough to just go ahead with concurrent polling and accept a few issues. However, I don't think you will be 100% safe unless only one consumer is allowed.
In a clustered environment you may consider having a way of electing a single Camel route that is active, which does the imap polling. And then have logic for failover if the node goes down.
In Camel you can take a look at route policy which can be applied to routes.
http://camel.apache.org/routepolicy
The ZooKeeper component has a policy for electing a leader in a cluster, and only allows one route to be active. This does require that you use ZooKeeper, though.
http://camel.apache.org/zookeeper
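The idea of a single active poller with failover can be sketched in plain Java. This is only an in-process illustration of lease-based election, not the Camel route policy or the ZooKeeper API; the class `PollerLease` and its methods are hypothetical names, and in a real cluster the lease would have to live in a shared store such as ZooKeeper or a database row.

```java
import java.util.Optional;

/** Hypothetical in-process stand-in for a cluster-wide lock (ZooKeeper, a database row, etc.). */
class PollerLease {
    private final long leaseMillis;
    private String holder;   // node currently allowed to poll, or null if nobody holds the lease
    private long expiresAt;  // time at which the current lease lapses

    PollerLease(long leaseMillis) {
        this.leaseMillis = leaseMillis;
    }

    /**
     * Returns true if nodeId may poll now. The current holder renews its lease;
     * any other node succeeds only once the previous lease has expired.
     */
    synchronized boolean tryAcquire(String nodeId, long nowMillis) {
        boolean free = holder == null || nowMillis >= expiresAt;
        if (free || holder.equals(nodeId)) {
            holder = nodeId;
            expiresAt = nowMillis + leaseMillis;
            return true;
        }
        return false;
    }

    /** The node holding an unexpired lease at nowMillis, if any. */
    synchronized Optional<String> currentHolder(long nowMillis) {
        return (holder != null && nowMillis < expiresAt) ? Optional.of(holder) : Optional.empty();
    }
}
```

Each node would call `tryAcquire` before every poll and skip the poll when it returns false. If the active node dies, its lease lapses and the next node to ask takes over.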
We are investigating Camel for use in a new system; one of our situations is that a set of steps is started, and some of the steps in the set can take hours or days to execute. We need to execute other steps after such long-elapsed-time steps are finished.
We also need to have the system survive reboot while some of the "routes" with these steps are in progress, i.e., the system might be rebooted, and we need the "routes" in progress to remain in their state and pick up where they left off.
I know we can use a queued messaging system such as JMS, and that messages put into such a queue can be handled to be persisted. What isn't clear to me is how (or whether) that fits into Camel -- would we need to treat the step(s) after each queue as its own route, so that it could, on startup, read from the queue? That's going to split up our processing steps into many more 'routes' than we would have otherwise, but maybe that's the way it's done.
Is/are there Camel construct/constructs which assist with this kind of system? If I know their terms and basic outline, I can probably figure it out, but what I really need is an explanation of what the constructs do.
Camel is not a human workflow / long-lasting tasks system. For that kind of work, look at BPMS products. Camel is a better fit for real-time / near-real-time integrations.
For long tasks, persist their state in some external system such as a message broker, database, or BPMS, and then use Camel routes to process and move from one state to the next. Camel also fits where you need to integrate with many different systems, which you can do out of the box with the 200+ Camel components.
Camel does provide graceful shutdown, so you can safely shut down or reboot Camel. But if you are talking about surviving an unexpected system crash, you may want to look at transactions and idempotency.
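The idea of keeping a job's state in an external store and advancing it one step at a time can be sketched in plain Java. This is only an illustration under stated assumptions: `CheckpointedSteps` is a hypothetical class, a local file stands in for the broker, database, or BPMS, and a step that crashes midway will be run again on restart, so each step would need to be idempotent.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical sketch: a file stands in for the external state store. */
class CheckpointedSteps {
    private final Path checkpoint;

    CheckpointedSteps(Path checkpoint) {
        this.checkpoint = checkpoint;
    }

    /** Index of the next step to run; 0 when no checkpoint exists yet. */
    int nextStep() {
        try {
            if (!Files.exists(checkpoint)) {
                return 0;
            }
            String text = new String(Files.readAllBytes(checkpoint), StandardCharsets.UTF_8);
            return Integer.parseInt(text.trim());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Runs the remaining steps in order, recording progress after each one completes. */
    void resume(List<Runnable> steps) {
        for (int i = nextStep(); i < steps.size(); i++) {
            steps.get(i).run();
            save(i + 1);
        }
    }

    private void save(int next) {
        try {
            Files.write(checkpoint, Integer.toString(next).getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

After a reboot, constructing a new `CheckpointedSteps` over the same file and calling `resume` with the same step list skips the steps that already completed.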
You are referring to asynchronous processing of messages in routes. Camel has a couple of components that you can use to achieve this.
SEDA: In memory, not persistent, and can only call endpoints in the same CamelContext.
VM: In memory, not persistent, and can call endpoints in different Camel contexts, but limited to the same JVM. This component is an extension of SEDA.
JMS: Persistence is configurable on the queue stack. Much more heavyweight but also more fault tolerant than SEDA/VM.
SEDA/VM can be used as low-overhead replacements for JMS components, and in some cases I would use them exclusively. In your case persistence is a requirement, so SEDA/VM is not an option, but to keep things simple the examples will use SEDA, as you can get some basics up and running quickly.
The example assumes the following: we have a timer that fires, and there are two processes it needs to run. See the screenshot below:
In this route the message flow is synchronous between the timer and the two process beans.
If we wanted to make these steps asynchronous we would need to break each step into a route of its own. We would then connect these routes using one of the components listed in the beginning. See the screenshot below:
Notice we have three routes, and each route has only one "processing" step in it. Route Test only has a timer, which fires a message to the SEDA queue called processOne. This message is received on the SEDA queue and sent to the Process_One bean. After this it is sent to the SEDA queue called processTwo, where it is received and passed to the Process_Two bean. All this is done asynchronously.
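The hand-off between the three routes can be sketched in plain Java, with a `BlockingQueue` playing the role of each in-memory SEDA queue. This is not Camel code; `StagedPipelineSketch`, `stage`, and `run` are hypothetical names, and the string suffixes merely stand in for the work done by the Process_One and Process_Two beans.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;
import java.util.function.Function;

class StagedPipelineSketch {
    private static final String STOP = "__STOP__"; // poison pill that shuts a stage down

    /** Starts a thread that drains 'in', applies 'step', and hands the result to 'out'. */
    static Thread stage(BlockingQueue<String> in, Function<String, String> step, Consumer<String> out) {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    String msg = in.take();
                    if (STOP.equals(msg)) {
                        out.accept(STOP); // pass the shutdown signal downstream
                        return;
                    }
                    out.accept(step.apply(msg));
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.start();
        return t;
    }

    /** Feeds the ticks through both stages and returns the fully processed messages. */
    static List<String> run(List<String> ticks) {
        BlockingQueue<String> processOne = new LinkedBlockingQueue<>();
        BlockingQueue<String> processTwo = new LinkedBlockingQueue<>();
        List<String> done = new CopyOnWriteArrayList<>();

        Thread one = stage(processOne, m -> m + ">one", processTwo::add);
        Thread two = stage(processTwo, m -> m + ">two", m -> {
            if (!STOP.equals(m)) {
                done.add(m);
            }
        });

        for (String tick : ticks) {
            processOne.add(tick); // the "timer" route only enqueues and returns immediately
        }
        processOne.add(STOP);
        try {
            one.join();
            two.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return done;
    }
}
```

As with SEDA, everything sitting in these queues is lost if the JVM stops, which is why a persistent queue is needed when messages must survive a restart.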
You can replace the SEDA components with JMS once you get to understand the concepts. I suspect that state tracking is going to be the most complicated part as Camel makes the asynchronous part easy.
I'm trying to create a router to integrate a number of JMS topics and queues. I am constrained by the fact that the client I am working for can't change the JMS implementation (TIBCO EMS with some custom client libraries) and the fact that they have written their own XA transaction manager, which doesn't quite conform to the JTA spec. It is very important that message delivery is guaranteed.
I've done a lot of reading and experimenting with Camel and I've realised that I probably need to write my own JMS component, as the standard JMS component is not going to integrate with the JMS client libraries or TM I have.
I need to be able to put hooks into the route lifecycle at the following points:
During the route startup, I need to identify all JMS connections and enlist them as XA resources with the TM implementation
When a message is received at the consumer, I need to start a transaction including all the JMS connections in the route
When a routing decision is made, I need to send the message to the producer and commit the transaction
Given the above, I think I can implement a very simplified version of the camel-jms component which strips out all the Spring parts and only contains the bare minimum required to interact with my JMS libraries.
Where would be the best place to initialise the transaction manager? I've been looking at DefaultCamelContext, RoutePolicy and RouteContext but I can't find a place where all the endpoints are resolved and initialised.
I solved this problem by implementing the UserTransaction and TransactionManager interfaces and creating a PlatformTransactionManager which the Camel JMS component uses to create the DefaultMessageListenerContainer.
One important point to note is that the transacted property on the Camel JMSComponent refers to local transactions, not XA transactions. If you set this property to true after passing a PlatformTransactionManager to the component, the DMLC will effectively try to commit your transaction twice, which won't work.
This leaves me with a nice working example consuming from one JMS broker and producing to another, but it is very slow - ~5 messages per second. Unfortunately Spring JMS does not support batching so it seems the best solution here is to adjust the JMS topic configurations such that routing only takes place between topics on the same broker.
I am implementing a callback in Java to store messages in a database. I have a client subscribing to '#'. The problem is that when this '#' client disconnects and reconnects, it adds duplicate entries for retained messages to the database. If I search for previous entries before inserting, larger tables will make that expensive in computing power. So should I allot a separate table for each sensor, or one per broker? I would really appreciate suggestions for a better design.
Subscribing to wildcard with a single client is definitely an anti-pattern. The reasons for that are:
Wildcard subscribers get all messages of the MQTT broker. Most client libraries can't handle that load, especially not when transforming / persisting messages.
If your wildcard subscriber dies, you will lose messages (unless the broker queues endlessly for you, which also doesn't work)
You essentially have a single point of failure in your system. Use MQTT brokers which are hardened for production use. These are much more robust single points of failure than your hand-written clients. (You can overcome the single point of failure through clustering and load balancing, though).
So to solve the problem, I suggest the following:
Use a broker which can handle shared subscriptions (like HiveMQ or MessageSight), so you can balance all messages between many clients
Use a custom plugin for doing the persistence at the broker instead of the client.
You can also read more about that topic here: http://www.hivemq.com/blog/mqtt-sql-database
Also consider using QoS 2 (the highest MQTT QoS level) for all messages to make sure each message is delivered exactly once. You may also consider timestamping each message to avoid inserting duplicates if the QoS requirement is not met.
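The timestamp suggestion can be sketched as an idempotent insert keyed on topic and timestamp. This is a minimal in-memory illustration, assuming the publisher puts a timestamp on every message; `RetainedMessageStore` is a hypothetical class, and in a real database the same effect would come from a unique constraint on the two columns rather than from searching the table first.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a table with a UNIQUE constraint on (topic, timestamp). */
class RetainedMessageStore {
    private final Map<String, String> rows = new ConcurrentHashMap<>();

    /** Returns true if the message was stored, false if the same (topic, timestamp) already exists. */
    boolean insertIfAbsent(String topic, long timestampMillis, String payload) {
        String key = topic + "|" + timestampMillis;
        return rows.putIfAbsent(key, payload) == null;
    }

    int size() {
        return rows.size();
    }
}
```

When the subscriber reconnects and the broker replays a retained message, the second insert is rejected by the key lookup instead of creating a duplicate row.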
I'm currently developing a Camel Integration app in which resumption from a previous state of processing is important. When there's a power outage, for instance, it's important that all previously processed messages are not re-processed. The processing should resume from where it left off before the outage.
I've gone through a number of possible solutions including Terracotta and Apache Shiro. I'm not sure how to use either, as documentation on their integration with Apache Camel is scarce. I've not settled on either, however.
I'm looking for suggestions on the potential alternatives I can use or a pointer to some tutorial to get me started.
The difficulty in surviving outages lies primarily in state, and what to do with in-flight messages.
Usually, when you're talking state within routes the solution is to flush it to disk, or other nodes in the cluster. Taking the aggregator pattern as an example, aggregated state is persisted in an aggregation repository. The default implementation is in memory, so if the power goes out, all the state is lost. However, there are other implementations, including one for JDBC, and another using Hazelcast (a lightweight in-memory data grid). I haven't used Hazelcast myself, but JDBC does a synchronous write to disk. The aggregator pattern allows you to resume from where you left off. A similar solution exists for idempotent consumption.
The second question, around in-flight messages is a little more complicated, and largely depends on where you are consuming from. If you're in the middle of handling a web service request, and the power goes out, does it matter if you have lost the message? The user can simply retry. Any effects on external systems can be wrapped in a transaction, or an idempotent consumer with JDBC idempotent repository.
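Idempotent consumption backed by durable storage can be sketched in plain Java as follows. This is not Camel's idempotent repository API; `FileIdempotentStore` and `markIfNew` are hypothetical names, a local file stands in for the JDBC table, and message ids are assumed to contain no line breaks. Note that this sketch records the id before the message is processed, so a crash between the two would cause that message to be skipped on restart.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch: remembers processed message ids in a file so they survive a restart. */
class FileIdempotentStore {
    private final Path file;
    private final Set<String> seen = new HashSet<>();

    FileIdempotentStore(Path file) {
        this.file = file;
        try {
            if (Files.exists(file)) {
                // Reload the ids recorded before the last shutdown or crash.
                seen.addAll(Files.readAllLines(file, StandardCharsets.UTF_8));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Records the id and returns true the first time it is seen; returns false for repeats. */
    synchronized boolean markIfNew(String messageId) {
        if (!seen.add(messageId)) {
            return false;
        }
        try {
            // SYNC asks for the write to reach the storage device before returning.
            Files.write(file, Collections.singletonList(messageId), StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND, StandardOpenOption.SYNC);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return true;
    }
}
```

A consumer would call `markIfNew(id)` for each incoming message and process it only when the call returns true.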
If you are building out integrations based on messaging, you should consume within a transaction, so that if your server goes down, the messages go back into the broker and can be replayed to another consumer.
Be careful when using seda: or threads blocks. These use an in-memory queue to pass exchanges between threads, so any messages flowing down these sorts of routes will be lost if someone trips over the power cable. If you can't afford message loss, and need this sort of processing model, consider using a JMS queue as the endpoint between the two routes (with transactions to ensure you pick up where you left off).
Let's say I have a Job Scheduler which has 4 consumers A, B, C and D. Jobs of type X will have to be routed to Consumer A, type Y to Consumer B and so on. Consumers A, B, C and D are to run as independent applications without any dependency, either locally or remotely.
The consumers take varying times to complete their jobs, which are subsequently routed to the Job Scheduler for aggregation.
Clones of one of the consumers may also be needed to share its eligible jobs. A job should however be processed only once.
Is Content-based router the best solution for this? Mind you, I need the custom job scheduler, because it only has the intelligence to split up a job among the consumers.
Or is there any better way to handle this? I don't require those features of the broker like automatically switching over to another consumer (load balancing) and such in case of a failure.
I'm not completely sure that I follow you. This sounds like a rather straightforward scenario for asynchronous processing.
I'm not sure how you plan to send these jobs to the Camel application, but given you can receive them somewhere you could probably go ahead with a simple content based router.
Given your requirements for the consumers, I would go for JMS queues (using Apache ActiveMQ or similar broker middleware), one queue per job type. This makes it easy to distribute consumers to different machines without really changing the code.
// Central node routes
from("xxx:routeJob")
    .choice()
        .when(header("type").isEqualTo("x"))
            .to("jms:queue:processJobTypeX")
        .when(header("type").isEqualTo("y"))
            .to("jms:queue:processJobTypeY")
        .otherwise()
            .to("jms:queue:processJobTypeZ");

from("jms:queue:aggregateJob")
    .bean(aggregate);

// different camel application (may be duplicated for multiple instances).
from("jms:queue:processJobTypeX")
    .bean(heavyProcessing)
    .to("jms:queue:aggregateJob");

// Yet another camel application
from("jms:queue:processJobTypeY")
    .bean(lightProcessing)
    .to("jms:queue:aggregateJob");
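The reason several clones can share one queue without processing a job twice can be sketched in plain Java. This is not Camel or JMS code; `JobRouterSketch`, `route`, and `drain` are hypothetical names, and in-memory queues stand in for the broker queues. Each worker removes a job from the queue atomically, so no two clones ever see the same job.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;

class JobRouterSketch {
    // One queue per job type, created on first use.
    final Map<String, BlockingQueue<String>> queues = new ConcurrentHashMap<>();

    /** Content-based routing: the job type decides which queue receives the job. */
    void route(String type, String job) {
        queues.computeIfAbsent(type, t -> new LinkedBlockingQueue<>()).add(job);
    }

    /** Drains one job type with 'clones' competing workers; returns "job@worker" entries. */
    List<String> drain(String type, int clones) {
        BlockingQueue<String> queue = queues.computeIfAbsent(type, t -> new LinkedBlockingQueue<>());
        List<String> processed = new CopyOnWriteArrayList<>();
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < clones; i++) {
            String name = type + "-clone-" + i;
            Thread worker = new Thread(() -> {
                String job;
                while ((job = queue.poll()) != null) { // poll() removes the job atomically
                    processed.add(job + "@" + name);
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return processed;
    }
}
```

Which clone handles which job is not deterministic, but every job routed to a type appears exactly once in the result.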
Please revisit your question for a better answer :)