I am using Camel to implement a route that loads data from a DB, applies some processing to it, and then produces results that are saved back into the DB.
This is part of a web application.
My problem is that this WAR is going to be deployed behind a load balancer onto two servers. There will then be two Camel contexts with two routes performing the same processing on the same DB.
This means the same record can end up being processed by both routes. How do I handle this and prevent the routes from performing the same job twice?
If you need a setup where each server might receive the same record, then you need an idempotent route, and you need to make sure your idempotent repository is shared between your machines. Using a database as the repository is an easy option. If you do not have a database, a Hazelcast repository might be an option.
What can be an issue is determining what is unique in your records - such as an order number, customer + date/time, or some increasing transaction ID.
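As a minimal sketch of the database option, assuming both servers share the same DataSource and that each record carries a unique orderNumber (both names are placeholders), the route could use camel-sql's JdbcMessageIdRepository:

import javax.sql.DataSource;
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.processor.idempotent.jdbc.JdbcMessageIdRepository;

public class DedupRouteBuilder extends RouteBuilder {
    private final DataSource dataSource; // the shared database both servers point at

    public DedupRouteBuilder(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    @Override
    public void configure() {
        // The repository stores processed keys in the shared DB (CAMEL_MESSAGEPROCESSED table),
        // so both Camel contexts see the same set of already-handled records
        JdbcMessageIdRepository repo = new JdbcMessageIdRepository(dataSource, "recordProcessor");

        from("timer:pollRecords?period=5000")
            .to("sql:select * from records where processed = 0")
            .split(body())
            // "orderNumber" is a placeholder; use whatever is unique in your records
            .idempotentConsumer(simple("${body[orderNumber]}"), repo)
            .to("bean:recordProcessor");
    }
}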
I have a Vespa.ai cluster with multiple container/content nodes. After Vespa is loaded with data, my app sends queries and gets the data back. I want to be sure that I utilize all the nodes well and get the data as fast as possible. My app builds an HTTP request and sends it to one of the nodes.
Which node/nodes should I direct my request to?
How can I be sure that all instances participate in answering queries?
What should I do to utilize all the cluster nodes?
Does Vespa know to load balance these requests to other instances for better performance?
Vespa is a 2-tier system: a stateless container tier that receives queries, and a content tier that stores the data and matches queries against it.
The containers will load balance over the content nodes (if you have multiple groups), but since you are sending the requests to the containers, you need to load balance over those yourself.
This can be done by code you write in your client, by a VIP, by another tier of nodes you host yourself such as Nginx, or by a hosted load balancer such as AWS ELB.
You can debug the distributed query execution by adding &presentation.timing=true&trace.timestamps&tracelevel=5
to the search request. You'll then get a trace in the response where you can see how the query was dispatched and how long each node took to match it. See also Scaling Vespa: https://docs.vespa.ai/en/performance/sizing-search.html
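As a minimal sketch of the "code you write in your client" option, assuming two container nodes at hypothetical hostnames, a round-robin client could look like this:

import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class VespaQueryClient {
    // Hypothetical container endpoints; replace with your own hostnames
    private static final List<String> CONTAINERS =
            List.of("http://container-0:8080", "http://container-1:8080");

    private final AtomicInteger next = new AtomicInteger();
    private final HttpClient http = HttpClient.newHttpClient();

    public String search(String query) throws Exception {
        // Round-robin over the container tier; each container then fans out to the content nodes
        String base = CONTAINERS.get(Math.floorMod(next.getAndIncrement(), CONTAINERS.size()));
        URI uri = URI.create(base + "/search/?query="
                + URLEncoder.encode(query, StandardCharsets.UTF_8)
                + "&presentation.timing=true&trace.timestamps&tracelevel=5");
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        return http.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }
}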
We have created a little Spring Boot application with Camel embedded. It simply polls an Office 365 mailbox via IMAP for unread emails.
We have some verbose logging, and we can see box one consume the message, process it (send some REST requests) and finish. Two seconds after box 1 has finished, box 2 picks the same message up and processes it.
We implemented an Idempotent consumer:
from(casesMailBox.getUri())
    .idempotentConsumer(simple("${in.headers.Message-ID}"), repo)
    .routeId("messaging")
    .process(emailToCaseProcessor);
We can see duplicate entries in the underlying Oracle tables.
The documentation is not clear, but I assumed the idempotentConsumer would commit to the DB as soon as possible.
Am I missing something here?
The idempotent consumer will not work in a clustered environment if the idempotent repository is an in-memory one.
You have to use a central database or a Hazelcast data grid based implementation.
For more info, refer to: http://camel.apache.org/idempotent-consumer.html
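As a rough sketch of the Hazelcast option (class and package names as provided by camel-hazelcast; the mailbox URI and repository name are placeholders), assuming both boxes join the same Hazelcast cluster:

import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import org.apache.camel.Processor;
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.processor.idempotent.hazelcast.HazelcastIdempotentRepository;

public class MailRouteBuilder extends RouteBuilder {
    private final Processor emailToCaseProcessor; // the processor from the question

    public MailRouteBuilder(Processor emailToCaseProcessor) {
        this.emailToCaseProcessor = emailToCaseProcessor;
    }

    @Override
    public void configure() {
        // Both boxes join the same Hazelcast cluster, so the set of seen Message-IDs is shared
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();
        HazelcastIdempotentRepository repo =
                new HazelcastIdempotentRepository(hz, "processedMessageIds");

        from("imaps://outlook.office365.com:993?username=user&password=secret")
            .routeId("messaging")
            .idempotentConsumer(header("Message-ID"), repo)
            .process(emailToCaseProcessor);
    }
}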
ZooKeeper:
If you want to use a polling consumer/scheduler in a clustered environment and want to avoid duplicate route triggering, you can use ZooKeeper with a route policy.
Ref: http://camel.apache.org/zookeeper.html
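A rough sketch of that approach, assuming a ZooKeeper ensemble on localhost:2181 (the znode path and endpoint names are placeholders): the policy runs an election and only the winning node keeps the route started.

import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.component.zookeeper.policy.ZooKeeperRoutePolicy;

public class PollingRouteBuilder extends RouteBuilder {
    @Override
    public void configure() {
        // "1" means only one node in the cluster may run this route at a time
        ZooKeeperRoutePolicy policy =
                new ZooKeeperRoutePolicy("zookeeper:localhost:2181/myapp/leader", 1);

        from("timer:poll?period=5000")
            .routePolicy(policy)
            .to("bean:pollingJob");
    }
}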
Hope it helps!!!
I'm looking for a best practice for monitoring the functionality of Camel routes.
I know there are monitoring tools like hawtio and camelwatch, but that's not exactly what I'm looking for.
I want to know whether a route is "working" as expected. For example, you have a route which listens on a queue (from("jms...")). Maybe there are messages in the queue, but the listener is not able to dequeue them because of some DB issue or something else (depending on the JMS provider). With the monitoring tools mentioned above you just see inflight/failed/completed messages, but you don't see whether the listener is able to get the messages at all - so the route is not "working".
I know there is also Apache BAM; maybe I have to do some more research, but it looks like BAM creates new routes and you can't monitor existing ones. I also don't want to implement/define such business cases for each route - I'm looking for a more generic way. It's also mentioned on the Camel 3.0 idea board that BAM hasn't been touched for 5 years, so I think people don't use it that often (which suggests to me it doesn't fit their needs exactly).
I had a similar requirement some time ago, and in the end I developed a small Camel application for monitoring.
It runs on a timer, queries the different Camel applications installed on remote servers through JMX/Jolokia, and, if the LastExchangeCompletedTimestamp of the route I am interested in is older than some time interval, sends a mail to the administrators.
Maybe this approach is too simple for your scenario, but it could be an option.
(Edit: more details added)
Principal points:
The main route queries the DB for entities to control and spawns controlling routes
Controlling routes fire on Quartz and HTTP POST to the following URL
.to("http://server:port/app/jolokia/?"+
"maxDepth=7&maxCollectionSize=500&ignoreErrors=true&canonicalNaming=false")
sending the following jsonRequest body:
// Build the Jolokia read request for the route's MBean
ObjectMapper mapper = new ObjectMapper();
LinkedHashMap<String, Object> request = new LinkedHashMap<>();
request.put("type", "read");
request.put("mbean", "org.apache.camel:" + entity.getRouteId());
String jsonRequest = mapper.writeValueAsString(request);
As a response you get another JSON document; parse it and read the LastExchangeCompletedTimestamp value.
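A hedged sketch of that last step with Jackson (the exact shape of the value depends on how Jolokia serializes the MBean attributes):

// JsonNode is com.fasterxml.jackson.databind.JsonNode
JsonNode root = mapper.readTree(jsonResponse);
String lastCompleted = root.path("value").path("LastExchangeCompletedTimestamp").asText();
// Compare lastCompleted against (now - allowed interval) and mail the administrators if it is older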
We have some Camel routes defined in a single CamelContext, which contains web services, ActiveMQ, etc. in the route.
Initially we deployed the routes as a WAR on a single JBoss node.
To scale out (we usually do this for web services), I've deployed the same CamelContext on multiple JBoss nodes.
But the performance actually decreased.
FYI: all the CamelContexts point to the same ActiveMQ brokers.
Here are my questions:
How do we load balance / fail over Camel contexts on different machines?
If CamelContexts are deployed on multiple nodes, will aggregation work correctly?
Kindly give your thoughts!
Without seeing your system in detail there is no way of knowing why it has slowed down, so I'll pass over that. For your other two questions:
Failover
You don't say what sort of failover/load balancing behaviour you want. The not-very-helpful Camel documentation is here: http://camel.apache.org/clustering-and-loadbalancing.html.
One mechanism that works easily with Camel and ActiveMQ is to deploy to multiple servers and run active-active, sharing the same ActiveMQ queues. Each route attempts to read from the same queue to get a message to process. Only one route will get the message and therefore only one route processes it. Other routes are free to read subsequent messages, giving you simple load balancing. If one route crashes, the other routes will continue to process the messages, there will just be reduced capacity on your system.
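As an illustrative sketch (queue and bean names are placeholders), the same route deployed unchanged on every node gives you competing consumers on one shared queue:

// Deployed identically on each server; the broker delivers each message to exactly one consumer
from("activemq:queue:orders?concurrentConsumers=5")
    .routeId("orderProcessor")
    .to("bean:orderService");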
If you need to provide fault tolerance for your web services then you need to look outside Camel and use something like Elastic Load Balancing. http://aws.amazon.com/elasticloadbalancing/
Aggregation
Each Camel context will run independently of the other contexts, so one context will aggregate messages independently of what the other contexts are up to. For example, suppose you have an aggregator that stores messages from an ActiveMQ queue until it receives a special end-of-batch message. If you have the aggregator running in two different routes, the messages will be split between the two routes and only one route will receive the end-of-batch message. So one aggregator will sit there with half the messages and do nothing. The other aggregator will have the other messages and will process the end-of-batch message, but it won't know about the messages the other route picked up.
I have generated endpoint methods: get, list, remove, update. But what if I have a collection of objects that I want to insert - is the only way to insert them in a loop, or is there a bulk-insert solution in App Engine?
You will have to look at alternative strategies to load data into your application. The reason is that there could be hundreds or thousands of records that you want to insert as part of your bulk insert.
Now having said that, you could look at the following approach with Cloud Endpoints:
Consider uploading a file (CSV, JSON, XML) to your endpoint API method. This file will contain the multiple records that you want to insert.
Process the file in your endpoint @ApiMethod implementation: process each record and insert it accordingly.
While the above is achievable, you have to consider the fact that a client has made this API call and is waiting for the response. If you end up processing multiple records (inserts) and only then send back the response, things could time out quickly, and it is not best practice to make the API client wait.
So I suggest that while there are ways to do it via the API, you should look at various alternatives to get the data into your App Engine app. If you really have to do the file approach, consider accepting the file and immediately returning a 202 Accepted response. You could then use a Task Queue on App Engine to process the file.
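A minimal sketch of that accept-then-enqueue pattern, assuming the Java Cloud Endpoints framework and the App Engine Task Queue API (the payload class and worker URL are hypothetical):

import com.google.api.server.spi.config.Api;
import com.google.api.server.spi.config.ApiMethod;
import com.google.appengine.api.taskqueue.Queue;
import com.google.appengine.api.taskqueue.QueueFactory;
import com.google.appengine.api.taskqueue.TaskOptions;

@Api(name = "records")
public class RecordEndpoint {

    // Simple wrapper so Endpoints can deserialize the uploaded file content
    public static class BulkPayload {
        private String content; // raw CSV/JSON/XML text
        public String getContent() { return content; }
        public void setContent(String content) { this.content = content; }
    }

    @ApiMethod(name = "bulkInsert", httpMethod = "POST")
    public void bulkInsert(BulkPayload payload) {
        // Hand the heavy lifting to a task queue worker and return immediately;
        // /tasks/bulk-insert is a servlet you map yourself to parse and insert the records
        Queue queue = QueueFactory.getDefaultQueue();
        queue.add(TaskOptions.Builder.withUrl("/tasks/bulk-insert")
                .payload(payload.getContent()));
    }
}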