Camel file polling: finish processing a batch before polling again - apache-camel

I have a file drop endpoint that I poll from. I need to poll the files in sequential order as they are received, and I am using a cron expression to poll only at certain hours of the day. Here is my file endpoint:
file:///tmp/input?idempotent=true&moveFailed=/tmp/error&readLock=changed&readLockCheckInterval=2500&sortBy=file:modified&move=processed/&scheduler=quartz2&scheduler.cron=0+0/5+0-3,5-23+*+*+?
The issue I have is that Camel polls a batch of files, but newer files keep being written to the directory, so on a subsequent poll a new file can be processed before the previous batch has completed.
I added a log statement to my route that shows the batch size and whether the batch has been completed, just for information:
<camel:log message="Camel batch size: $simple{property.CamelBatchSize}, Camel Batch Index: $simple{property.CamelBatchIndex}, Camel Batch finished: $simple{property.CamelBatchComplete}"/>
How can I tell Camel not to poll until the previous batch is complete? I do this because order of file processing is important. Thanks!

I am not sure there is an existing way to accomplish this with a cron job on the file route directly. However, you could achieve it by using 3 routes (a rough Java DSL sketch follows the outline below).
Cron job route
  - Emit a suspend signal to the Stopper route if the Collector route is already started (check via the controlBus component)
  - Start the Collector route at the correct time (trigger via the controlBus component)
Collector route
  - Control the file consumer behavior
  - Emit a complete signal to the Stopper route when the batch is completed
Stopper route
  - Suspend the Collector route when the signal is received (trigger via the controlBus component)
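A rough Java DSL sketch of this setup, assuming route ids, endpoint names and URI options of my own choosing (the direct/seda endpoints and the collector options are illustrative, not taken from your configuration):

from("quartz2://batchWindow?cron=0+0/5+0-3,5-23+*+*+?")            // cron job route
    .routeId("cronRoute")
    // start the collector at the allowed times (a no-op if it is already running)
    .to("controlbus:route?routeId=collectorRoute&action=start");

from("file:///tmp/input?sortBy=file:modified&readLock=changed&move=processed/&moveFailed=/tmp/error")
    .routeId("collectorRoute")                                      // collector route
    .autoStartup(false)                                             // only the cron route starts it
    .to("direct:processFile")
    // CamelBatchComplete is true on the last exchange of the current batch
    .filter(simple("${property.CamelBatchComplete}"))
        .to("seda:batchDone");

from("seda:batchDone")
    .routeId("stopperRoute")                                        // stopper route
    // a route cannot cleanly suspend itself, so a separate route issues the suspend
    .to("controlbus:route?routeId=collectorRoute&action=suspend");

With autoStartup(false) on the collector, only the cron route decides when a batch window opens, and the stopper closes it again once the batch has been drained.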

Related

How to resume a suspended Camel consumer

I created a RoutePolicy that suspends a consumer when a configurable number of errors has occurred in the route.
Before I suspend the consumer I want to make sure it will be resumed after a configurable amount of time (for example 30 minutes after suspension).
What is the best way to achieve this?
I tried to use the onExchangeBegin method of the RoutePolicy. But in a test I found that it is no longer executed when the route is suspended (as I had assumed).
I tried to create a SimpleScheduledRoutePolicy before suspending the route, but I didn't find a way to register this new bean in the Camel context (backed by Spring).
Therefore I currently create a TimerTask that sends a message to the Camel Control Bus to resume the route. That works, but it feels a bit alien, since Camel does not know about such resume tasks.
Is there another, more "Camel native" way to reach my goal?
Before you suspend the consumer, you can create a dummy file.
Have another route polling for the dummy file, with a filter that checks whether it was created more than 30 minutes ago. Something like:
from("file:dummyLocation?include=.dummy&delete=true&filter=#filterFileOlderThanThirtyMins")
.to("controlbus:route?routeId=suspendedRoute&action=start")
This is just off the top of my head, though!
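The #filterFileOlderThanThirtyMins referenced above would be a filter bean you register yourself; a minimal sketch, assuming the bean name and the 30-minute threshold from the snippet, could look like this:

import java.io.File;

import org.apache.camel.component.file.GenericFile;
import org.apache.camel.component.file.GenericFileFilter;

// Accept only dummy files whose last modification is more than 30 minutes in the past.
public class FilterFileOlderThanThirtyMins implements GenericFileFilter<File> {
    @Override
    public boolean accept(GenericFile<File> file) {
        long thirtyMinutes = 30 * 60 * 1000L;
        return System.currentTimeMillis() - file.getLastModified() > thirtyMinutes;
    }
}

Register it in the registry (for example as a Spring bean named filterFileOlderThanThirtyMins) so the filter=# option can find it.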

apache camel ftp: how to prevent the ftp component from processing a file while it is still being written

My Camel app processes files from an FTP server. While testing, I found that my route starts picking up and processing a file while it is still being uploaded. I have set readLock to 'changed' and delay to '60000', and my files are around 500 MB. Am I missing anything?
Notice that the delay option is just a fixed interval that is not "coupled" with the readLock.
The readLock option changed checks every second whether the file size has changed. With slow uploads this can be the reason that files which are still uploading are already consumed.
You could try increasing the readLockCheckInterval to more than 1 second.
See the Camel FTP docs for more details and options (option readLock).
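For example, something along these lines (host, credentials and the exact values are placeholders; note that readLockTimeout must be larger than readLockCheckInterval):

from("ftp://user@host/inbox?password=secret"
        + "&readLock=changed"
        + "&readLockCheckInterval=10000"   // re-check the file size every 10 s instead of every second
        + "&readLockTimeout=120000"        // give a slow upload up to 2 minutes to settle
        + "&delay=60000")                  // poll interval, independent of the read lock
    .to("direct:processUpload");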

Stopping a Camel route when there are no more messages in ActiveMQ

Is there a way to stop the Camel route when there are no more messages in ActiveMQ? My required scenario:
1. Fetch all the messages from the ActiveMQ queue and process them.
2. Poll 2-3 more times to check whether there are any new messages; if yes, execute step #1.
3. If no message is present, stop the route and start it again after, say, 5 minutes (which I guess can be achieved with a polling strategy).
Have a look at this answer. It polls the queue with a scheduler and a polling strategy (POJO).
With the scheduler you can choose the polling interval.
With the timeout of the polling-strategy consumer you can stop consuming (e.g. if no message arrives for 5 seconds, the queue is probably empty).
If you want to stop/start the consumer completely, you can add Camel Control Bus to the mix. You could then start and stop the consumer route.
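A minimal sketch of the "poll until the queue looks empty" part, not taken from the linked answer (the queue name, timeouts and the direct endpoint are made up):

from("timer:drainQueue?period=300000")                        // start a drain cycle every 5 minutes
    .process(exchange -> {
        ConsumerTemplate consumer = exchange.getContext().createConsumerTemplate();
        ProducerTemplate producer = exchange.getContext().createProducerTemplate();
        try {
            Object body;
            // a null return after 5 s means no message arrived, so the queue is probably empty
            while ((body = consumer.receiveBody("activemq:queue:orders", 5000)) != null) {
                producer.sendBody("direct:process", body);
            }
        } finally {
            consumer.stop();
            producer.stop();
        }
    });

If you prefer to stop and start an actual consumer route between cycles instead, the Control Bus component mentioned above is the way to do that.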

Throttling FTP polling consumers using Apache Camel

I have a requirement where, at one point in time, I need to connect to multiple FTP/SFTP endpoints (say 100 FTP endpoints) to download files and process them.
I have a route like the one below. The SEDA queue further processes the messages by moving them into the appropriate folders.
from("ftp://username#host/foldername?password=XXXXX&include=.*").to("seda:" + routeId)
Now I am starting all the FTP endpoints at the same time, which results in JVM memory issues. How can I throttle the starting of the FTP endpoints? Can I use a SEDA queue before the FTP endpoints to throttle them (and if so, how)? Are there any other EIPs or ideas I could use to throttle the triggering of the polling FTP consumers?
You can look into the Throttler DSL if you want to throttle the fetching of the messages.
http://camel.apache.org/throttler.html
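For example, on the consuming side of a SEDA queue (the endpoint names and numbers are just illustrative):

from("seda:ftpFiles")
    .throttle(10).timePeriodMillis(1000)   // let at most 10 exchanges per second through
    .to("direct:process");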
For controlling the startup you can look into the SimpleScheduledRoutePolicy:
http://camel.apache.org/simplescheduledroutepolicy.html
It handles route activation and deactivation. I haven't used it myself, but it looks like you can add a controlled delay for when routes should start and stop.
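A rough sketch of that idea, giving every FTP route its own policy so their start times are staggered (the one-minute offset and the index variable are assumptions on my part; the FTP URI reuses your masked example):

// SimpleScheduledRoutePolicy comes with the camel-quartz / camel-quartz2 route policy support
SimpleScheduledRoutePolicy policy = new SimpleScheduledRoutePolicy();
// start this consumer 'index' minutes from now so the 100 routes do not start together
policy.setRouteStartDate(new Date(System.currentTimeMillis() + index * 60_000L));
policy.setRouteStartRepeatCount(0);

from("ftp://username#host/foldername?password=XXXXX&include=.*")
    .routePolicy(policy)
    .noAutoStartup()                       // let the policy start the route
    .to("seda:" + routeId);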
I have had this problem in the past and solved it using cron in the following way:
from("ftp://username#host/foldername?password=XXXXX&include=.*&scheduler=quartz2&scheduler.cron=0/2+*+*+*+*+?")
You can set up every FTP consumer to poll at a different time (say, with a one-minute difference between them).
If you decide to go down this path, you can use the following website to construct your cron expressions easily:
http://www.cronmaker.com/
Hope this helps.
R.

How to stop Camel from deleting an FTP file when processing fails and the exception is handled by an error handler

I have a route that reads from an FTP server, then processes the message. The route has DeadLetterChannel error handler that routes the message to some bean when an exception is thrown while processing the message.
Now, when an exception is handled by the error handler, Camel assumes everything went fine and still deletes the FTP file.
If I remove the error handler, Camel doesn't delete the file when there is an exception.
Now my question is: how can I have a DeadLetterChannel error handler and at the same time stop Camel from deleting the FTP file when processing fails?
You can set the option noop=true on the ftp endpoint. Then the file will be left alone.
You would then have to consider how to skip picking up the same files again in the future. For that you can use an idempotent repository to keep track of which files you have already processed; an alternative is to move the file when you are done.
As the ftp component extends the file component, see the details at: http://camel.apache.org/file2
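For example (host, path and the repository bean name are placeholders):

from("ftp://user@host/orders?password=secret"
        + "&noop=true"                                // leave the remote file untouched
        + "&idempotent=true"
        + "&idempotentRepository=#myIdempotentRepo")  // a persistent repository from the registry, e.g. file- or JDBC-backed
    .to("direct:handleOrder");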
You have several options to do that:
Do not use the delete=true option at all and handle the deletion of the file yourself in the "success" scenario. This would be relatively transparent.
If you end up in the DLC, you can manipulate the endpoint you are consuming from. To do so, define your own processor for the DLC via onPrepareFailure, for example: deadLetterChannel("jms:dlc").onPrepareFailure(new ErrorProcessor())
Inside that processor you can use the getContext() method to get the CamelContext and one of the getEndpoint() methods to get your consumer endpoint.
Once you have the endpoint, you can see which process strategy class is used via getProcessStrategy, and there you can update the delete flag to avoid deleting your file.
For this endpoint it is also possible to define your own process strategy class with the setProcessStrategy method. Have a look at which process strategy class is used in your case; you can then override the corresponding delete method, such as deleteLocalWorkFile, and just do nothing.
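A minimal sketch of wiring such a processor into the DLC (the jms:dlc endpoint is reused from the example above; what you do inside the processor, such as looking up the FTP endpoint and adjusting its process strategy, depends on your Camel version and is only hinted at in the comments):

errorHandler(deadLetterChannel("jms:dlc")
    .onPrepareFailure(exchange -> {
        // runs just before the failed exchange is handed to the dead letter endpoint
        Exception cause = exchange.getProperty(Exchange.EXCEPTION_CAUGHT, Exception.class);
        exchange.getIn().setHeader("failureReason", cause != null ? cause.getMessage() : "unknown");
        // from here you can reach exchange.getContext() and look up your consumer endpoint
        // to inspect or adjust its process strategy, as described above
    }));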
