How to chain two separate Camel routes together for throttling

I have two separate routes running in an application and I want to control the total amount of work in flight across the entire path.
Route 1: Gzipped file on SFTP --> unzip --> local directory
Route 2: local directory --> process stuff --> Kafka
If route 2 has problems or falls behind, I don't want route 1 to keep filling up the local directory. How can I limit the total number of files sitting in the local directory waiting to be processed?
(If it were a single route I could probably use throttle() more easily, but are there other options that look at the overall picture across multiple routes?)

You can implement a custom RoutePolicy that checks the number of files in that directory: if it is bigger than X, suspend route 1, and when it drops below X again, resume it.
See the Camel docs for more details: http://camel.apache.org/routepolicy.html
For inspiration, look at how the existing ThrottlingInflightRoutePolicy is implemented.
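A minimal sketch of the check such a policy could run, using only the JDK. It assumes route 2 consumes (moves or deletes) the files from the local directory, so the count falls as work completes. The class, thresholds, and method names are hypothetical, and the Camel wiring that actually suspends and resumes route 1's consumer is not shown. Two thresholds give some hysteresis so route 1 doesn't flap around a single limit; passing the same X for both reproduces the rule above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical helper: decides whether route 1 (SFTP -> local dir) should be
// suspended, based on how many files are waiting in the local directory.
class BacklogGate {
    private final Path directory;
    private final int suspendAbove; // suspend once the backlog exceeds this
    private final int resumeBelow;  // resume once the backlog falls below this
    private boolean suspended = false;

    BacklogGate(Path directory, int suspendAbove, int resumeBelow) {
        if (resumeBelow > suspendAbove) {
            throw new IllegalArgumentException("resumeBelow must not exceed suspendAbove");
        }
        this.directory = directory;
        this.suspendAbove = suspendAbove;
        this.resumeBelow = resumeBelow;
    }

    // Counts regular files directly in the directory; subdirectories
    // (such as a .camel folder for processed files) are ignored.
    long backlog() {
        try (Stream<Path> entries = Files.list(directory)) {
            return entries.filter(Files::isRegularFile).count();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Re-evaluates the backlog and returns true while route 1 should stay suspended.
    boolean check() {
        long waiting = backlog();
        if (!suspended && waiting > suspendAbove) {
            suspended = true;
        } else if (suspended && waiting < resumeBelow) {
            suspended = false;
        }
        return suspended;
    }
}
```

Because a suspended route 1 produces no exchanges of its own, one option is to evaluate check() from callbacks on route 2 (or on a timer) rather than on route 1.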

Related

Camel: Concurrent consumers, process file in route by order of arrival

I have a scenario with two routes. The first downloads files from an FTP server and puts them in a local directory.
The second route picks the files up from the local directory and processes them further.
Multiple files can arrive on the FTP server at any time. The second route uses a thread pool (10 consumers by default), and as files are downloaded to the local directory, it picks them up and starts processing them.
However, the second route picks files from the local directory in no particular order. I want it to pick them by timestamp.
So if the second route is currently processing 10 files (one per configured thread) and more files arrive in the local directory, then whenever a consumer becomes free it should pick the file that arrived first.
Can anyone advise how to achieve this?
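A minimal sketch of the ordering being asked for, in plain Java: list the regular files in the directory and sort them by last-modified time, oldest first, breaking ties by name. The class and method names are hypothetical. Inside Camel, the file consumer's sortBy option (for example sortBy=file:modified) is the place to look for this kind of ordering. Note that with 10 concurrent consumers this only controls the order in which files are picked up; they can still finish in any order.

```java
import java.io.File;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: returns the regular files in dir, oldest first.
class OldestFirst {
    static List<File> ordered(File dir) {
        File[] files = dir.listFiles(File::isFile); // skip subdirectories
        if (files == null) {
            return Collections.emptyList(); // not a directory, or an I/O error
        }
        return Arrays.stream(files)
                .sorted(Comparator.comparingLong(File::lastModified)
                        .thenComparing(File::getName)) // deterministic tie-break on equal timestamps
                .collect(Collectors.toList());
    }

    // The file a free consumer should take next, or null if none are waiting.
    static File next(File dir) {
        List<File> files = ordered(dir);
        return files.isEmpty() ? null : files.get(0);
    }
}
```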

Throttling FTP Polling consumers using apache camel

I have a requirement where, at one point in time, I need to connect to many FTP/SFTP endpoints (say 100) to download files and process them.
I have a route like the one below. The SEDA queue further processes the messages by moving them into the appropriate folders.
from("ftp://username@host/foldername?password=XXXXX&include=.*").to("seda:" + routeId)
Starting all the FTP endpoints at the same time results in JVM memory issues. How can I throttle the starting of the FTP endpoints? Can I use a SEDA endpoint in front of the FTP endpoints to throttle them, and if so, how? Are there any other EIPs or ideas I could use to throttle the triggering of the polling FTP consumers?
If you want to throttle the fetching of the messages, look into the Throttler DSL:
http://camel.apache.org/throttler.html
For controlling startup, look into the SimpleScheduledRoutePolicy:
http://camel.apache.org/simplescheduledroutepolicy.html
It handles activating and deactivating routes. I haven't used it myself, but it looks like you could use it to add a controlled delay to when routes start and stop.
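A minimal sketch of staggered start times for many endpoints, in plain Java: routes are started in batches, each batch a fixed gap after the previous one, so only a limited number of FTP consumers start together. The helper, batch size, and gap are hypothetical. The resulting Date values are the kind of thing you would give each route's SimpleScheduledRoutePolicy as its route start date; check that policy's docs for the exact option in your Camel version.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

// Hypothetical helper: start times for `routes` FTP routes, started `batchSize`
// at a time with `gap` between batches (route i starts at first + (i / batchSize) * gap).
class StaggeredStarts {
    static List<Date> startDates(Instant first, Duration gap, int batchSize, int routes) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        List<Date> dates = new ArrayList<>(routes);
        for (int i = 0; i < routes; i++) {
            long batch = i / batchSize; // integer division groups routes into batches
            dates.add(Date.from(first.plus(gap.multipliedBy(batch))));
        }
        return dates;
    }
}
```

With 100 endpoints in batches of 10 one minute apart, the last batch starts nine minutes after the first.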
I have had this problem in the past and solved it using cron, as follows:
from("ftp://username@host/foldername?password=XXXXX&include=.*&scheduler=quartz2&scheduler.cron=0/2+*+*+*+*+?")
You can set up each FTP consumer to poll at a different time (say, one minute apart).
If you decide to go down this path, you can use the following website to build your cron expressions easily:
http://www.cronmaker.com/
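A minimal sketch of generating one staggered cron per consumer along those lines. It is hypothetical: consumer i fires once an hour at minute i % 60 (Quartz cron fields are seconds, minutes, hours, day-of-month, month, day-of-week), and spaces are written as '+' as in the endpoint URI above. With 100 endpoints, some minutes will be shared by two consumers; spread them over seconds as well if that matters.

```java
// Hypothetical helper: one Quartz cron per FTP consumer, staggered by minute.
class StaggeredCron {
    // Fires once an hour, at second 0 of minute (index % 60).
    // Spaces are written as '+' so the expression can go straight into an endpoint URI.
    static String hourlyCronFor(int index) {
        int minute = Math.floorMod(index, 60);
        return String.join("+", "0", Integer.toString(minute), "*", "*", "*", "?");
    }

    // Appends the scheduler options to an existing FTP endpoint URI.
    static String ftpUri(String baseUri, int index) {
        String separator = baseUri.contains("?") ? "&" : "?";
        return baseUri + separator + "scheduler=quartz2&scheduler.cron=" + hourlyCronFor(index);
    }
}
```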
Hope this helps.
R.

Camel FTP endpoint move to a folder based on date

I have a Camel route consuming from an FTP server that moves every file it consumes to a directory using move=.dealtWith. However, the number of files in this .dealtWith directory can quickly become too large for users to browse, so I would like to move each file to a .dealtWith/{the_date} directory instead. Is there a way to specify this in Camel without bringing the route down?
Use the Camel Simple expression language:
ftp:url?move=.dealtWith/${date:now:yyyy-MM-dd}/${header.CamelFileName}
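To see the directory layout that move option produces, here is a plain-Java sketch that builds the same relative path for a fixed day. The helper is hypothetical and only illustrates the expansion; Camel evaluates the Simple expression itself each time it moves a file, using the date at that moment, so new date folders appear without restarting the route.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper mirroring .dealtWith/${date:now:yyyy-MM-dd}/${header.CamelFileName}
// for a given day, to show where a consumed file would end up.
class DealtWithPath {
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyy-MM-dd");

    static String target(LocalDate day, String fileName) {
        return ".dealtWith/" + DAY.format(day) + "/" + fileName;
    }
}
```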

How to deploy same Camel routes in multiple server nodes for load balancing and fail over?

We have some Camel routes defined in a single CamelContext, using web services, ActiveMQ, etc.
Initially we deployed the routes as a WAR on a single JBoss node.
To scale out (as we usually do for web services), I deployed the same CamelContext on multiple JBoss nodes.
But the performance actually decreased.
FYI: all the CamelContexts point to the same ActiveMQ brokers.
Here are my questions:
How can I load balance / fail over a CamelContext across different machines?
If CamelContexts are deployed on multiple nodes, will aggregation work correctly?
Kindly give your thoughts!
Without seeing your system in detail, there is no way of knowing why it has slowed down, so I'll pass over that. For your other two questions:
Failover
You don't say what sort of failover/load balancing behaviour you want. The not-very-helpful Camel documentation is here: http://camel.apache.org/clustering-and-loadbalancing.html.
One mechanism that works easily with Camel and ActiveMQ is to deploy to multiple servers and run active-active, sharing the same ActiveMQ queues. Each route tries to read the next message from the same queue. Only one route gets any given message, so only one route processes it, and the other routes are free to read subsequent messages, which gives you simple load balancing. If one route crashes, the other routes will continue to process the messages; there will just be reduced capacity on your system.
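A small plain-Java model of that competing-consumer behaviour: several workers (standing in for the Camel contexts on different servers) take from one shared queue (standing in for the ActiveMQ queue). BlockingQueue.poll() removes the head atomically, so each message is processed by exactly one worker, and if one worker stops early (a crude stand-in for a crashed node) the others drain the rest. Everything here is hypothetical and JDK-only; no ActiveMQ or Camel API is used.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicIntegerArray;

// Hypothetical model of competing consumers: `nodes` workers share one queue.
class CompetingConsumers {
    // Returns, for each message id, how many times it was processed.
    // Worker 0 stops after `firstNodeLimit` messages to mimic a node going down.
    static int[] run(int messages, int nodes, int firstNodeLimit) {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        for (int i = 0; i < messages; i++) {
            queue.add(i);
        }
        AtomicIntegerArray processed = new AtomicIntegerArray(messages);
        ExecutorService pool = Executors.newFixedThreadPool(nodes);
        for (int n = 0; n < nodes; n++) {
            final int limit = (n == 0) ? firstNodeLimit : Integer.MAX_VALUE;
            pool.submit(() -> {
                int handled = 0;
                Integer msg;
                // poll() atomically removes the head, so no two workers get the same message
                while (handled < limit && (msg = queue.poll()) != null) {
                    processed.incrementAndGet(msg);
                    handled++;
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        int[] counts = new int[messages];
        for (int i = 0; i < messages; i++) {
            counts[i] = processed.get(i);
        }
        return counts;
    }
}
```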
If you need to provide fault tolerance for your web services then you need to look outside Camel and use something like Elastic Load Balancing. http://aws.amazon.com/elasticloadbalancing/
Aggregation
Each Camel context runs independently of the others, so one context aggregates messages without knowing what the other contexts are doing. For example, suppose you have an aggregator that stores messages from an ActiveMQ queue until it receives a special end-of-batch message. If the same aggregator route runs in two different contexts, the messages will be split between the two routes and only one route will receive the end-of-batch message. One aggregator will sit there with half the messages and do nothing. The other will have the remaining messages and will process the end-of-batch message, but won't know about the messages the other route picked up.
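A deterministic plain-Java sketch of that failure mode. It assumes, purely for illustration, that the broker hands messages to the two contexts round-robin (a real broker need not be this regular). Each context has its own hypothetical aggregator that completes only when it sees the end-of-batch marker.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: the same aggregator deployed in several contexts,
// each seeing only the messages the broker happened to give it.
class SplitAggregation {
    static final String END = "END-OF-BATCH";

    static final class Aggregator {
        final List<String> pending = new ArrayList<>();
        List<String> completed = null; // stays null until END arrives at this context

        void onMessage(String body) {
            if (END.equals(body)) {
                completed = new ArrayList<>(pending); // completes with only what this context saw
                pending.clear();
            } else {
                pending.add(body);
            }
        }
    }

    // Assumption for illustration only: round-robin delivery across contexts.
    static Aggregator[] deliver(List<String> messages, int contexts) {
        Aggregator[] aggregators = new Aggregator[contexts];
        for (int c = 0; c < contexts; c++) {
            aggregators[c] = new Aggregator();
        }
        for (int i = 0; i < messages.size(); i++) {
            aggregators[i % contexts].onMessage(messages.get(i));
        }
        return aggregators;
    }
}
```

With messages m1 to m4 followed by the marker, the first context completes a batch containing only m1 and m3, while m2 and m4 wait forever in the second.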

Apache Camel File Component dynamic source directory in from EndPoint

I am currently working on a Camel-based test harness application that processes groups of files from multiple folders and compares them with the files in the local repository.
Is there any way to change the folder location of the from endpoint in a Camel route dynamically? I want to use a single route to poll files from multiple folders.
To change the endpoint dynamically in Camel, use the following procedure:
stop the route
remove the route
change the endpoint
add the route
start the route
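A plain-Java sketch of those five steps in order. RouteAdmin is a hypothetical interface standing in for whatever route-control calls your Camel version provides; its method names are illustrative, not Camel's API. RecordingAdmin just logs the calls so the order can be checked.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the route-control operations; not Camel's actual API.
interface RouteAdmin {
    void stopRoute(String routeId);
    void removeRoute(String routeId);
    void addFileRoute(String routeId, String fromUri);
    void startRoute(String routeId);
}

class FolderSwitcher {
    // Applies the five steps, in order, to point routeId at newFolder.
    static void switchFolder(RouteAdmin admin, String routeId, String newFolder) {
        admin.stopRoute(routeId);             // 1. stop the route
        admin.removeRoute(routeId);           // 2. remove the route
        String fromUri = "file:" + newFolder; // 3. change the endpoint
        admin.addFileRoute(routeId, fromUri); // 4. add the route with the new endpoint
        admin.startRoute(routeId);            // 5. start the route
    }
}

// Test double that records calls so the order can be checked.
class RecordingAdmin implements RouteAdmin {
    final List<String> calls = new ArrayList<>();
    public void stopRoute(String routeId) { calls.add("stop " + routeId); }
    public void removeRoute(String routeId) { calls.add("remove " + routeId); }
    public void addFileRoute(String routeId, String fromUri) { calls.add("add " + routeId + " " + fromUri); }
    public void startRoute(String routeId) { calls.add("start " + routeId); }
}
```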
