Skip duplicate for idempotent file2 component - apache-camel

I am using Camel in a clustered environment and want to use a read lock on my file consumer endpoint so that only one server tries to process each file.
The only cluster-safe read lock is the idempotent read lock (readLock=idempotent); however, it requires an idempotentRepository to be set on the file URI.
Inside the route I use an idempotent consumer that moves any duplicate files to an error folder and logs the error to a dedicated file. It uses a JdbcMessageIdRepository to store the idempotent keys.
Is there a way to combine the duplicate-handling logic of the idempotent consumer with the idempotent read lock? Alternatively, can the idempotentRepository on the file component be configured not to skip duplicates, so that they are picked up by the idempotent consumer in the route instead?

If you use the same idempotent repository keys for both locking and consuming, you won't be able to process the same file twice: the idempotent consumer's check happens after the read lock's check, so the file is skipped before the consumer can check whether it exists. I also suspect you will run into conflicts between the consumer and the read lock itself, because the read lock inserts the row in the DB before the consumer performs its check, so the consumer may report the file as already existing.
The quickest solution I can think of is to use different keys for the idempotent consumer and the read lock, so that they don't conflict, and to make the read lock remove its key on commit by setting readLockRemoveOnCommit=true. This way, the read lock still prevents concurrent processing of the same file, while the idempotent consumer's own key decides whether the file has been processed before.
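The interaction between the two checks can be modelled in plain Java. This is a simplified sketch, not Camel code: a `HashSet` stands in for the shared JDBC-backed repository, and the `lock-`/`done-` key prefixes and the `ReadLockModel` class are hypothetical names chosen for illustration. `sameKey` reproduces the conflict described above, where the read lock's insert makes the consumer flag a brand-new file as a duplicate; `separateKeys` uses distinct keys and releases the lock key when the exchange completes, as `readLockRemoveOnCommit=true` would.

```java
import java.util.HashSet;
import java.util.Set;

// Plain-Java model of a read lock and an idempotent consumer sharing one repository.
// Not Camel API: the HashSet stands in for a clustered (e.g. JDBC) idempotent repository.
class ReadLockModel {
    private final Set<String> repo = new HashSet<>();

    // Same key for both checks: the read lock inserts the key first,
    // so the consumer's check finds it and flags the file as a duplicate.
    String sameKey(String fileName) {
        if (!repo.add(fileName)) {
            return "skipped";       // read lock not acquired: file never reaches the route
        }
        if (!repo.add(fileName)) {
            return "duplicate";     // consumer sees the key the read lock just wrote
        }
        return "processed";
    }

    // Separate keys (hypothetical "lock-"/"done-" prefixes) plus removal of the lock key
    // when the exchange completes, mimicking readLockRemoveOnCommit=true.
    String separateKeys(String fileName) {
        String lockKey = "lock-" + fileName;
        String doneKey = "done-" + fileName;
        if (!repo.add(lockKey)) {
            return "locked";        // another node is processing this file right now
        }
        try {
            if (!repo.add(doneKey)) {
                return "duplicate"; // the route would move the file to the error folder
            }
            return "processed";
        } finally {
            repo.remove(lockKey);   // release the lock so a later copy can be inspected
        }
    }

    public static void main(String[] args) {
        ReadLockModel shared = new ReadLockModel();
        System.out.println(shared.sameKey("a.csv"));     // duplicate
        System.out.println(shared.sameKey("a.csv"));     // skipped

        ReadLockModel split = new ReadLockModel();
        System.out.println(split.separateKeys("a.csv")); // processed
        System.out.println(split.separateKeys("a.csv")); // duplicate
    }
}
```

With separate keys, a second copy of a file reaches the route and can be handled by the idempotent consumer's duplicate logic instead of being silently skipped by the read lock.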

Related

Apache camel file component to periodically read files without deleting or moving files and without idempotency

file:/../..?noop=true&scheduler=quartz2&scheduler.cron=0+0/10+*+*+*+?
Using noop=true leaves the files in place after the route consumes them, but it also enables idempotent, which I don't want. (A second route deletes files based on some other logic, so I believe consuming non-idempotently in the first route shouldn't lead to an infinite loop.)
I could overwrite the file and use idempotentKey=${file:name}-${file:modified} so that the file is picked up on the next poll, but that still means an extra write. Deleting and re-creating the same file should also work, but again it is not a clean approach.
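Why the composite key would force a re-read can be shown with a small plain-Java model: because the key includes the last-modified time, rewriting the file yields a key the repository has not seen. This is an illustration only; `FileKeyModel` is a hypothetical class, the timestamp is modelled as a plain number, and in Camel the `idempotentKey` expression is evaluated by the component itself.

```java
import java.util.HashSet;
import java.util.Set;

// Model of an idempotent check keyed on file name plus last-modified time,
// mirroring idempotentKey=${file:name}-${file:modified}. Not Camel API.
class FileKeyModel {
    private final Set<String> seen = new HashSet<>();

    // Returns true if a poll would pick the file up, i.e. its key is new.
    boolean poll(String name, long lastModified) {
        String key = name + "-" + lastModified;
        return seen.add(key);
    }

    public static void main(String[] args) {
        FileKeyModel m = new FileKeyModel();
        System.out.println(m.poll("data.txt", 1000L)); // true: first time seen
        System.out.println(m.poll("data.txt", 1000L)); // false: unchanged file is skipped
        System.out.println(m.poll("data.txt", 2000L)); // true: rewritten file has a new key
    }
}
```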
Is there a better way to accomplish this? I could not find one in the Camel documentation.
Edit: To summarize, I want to read the same files over and over on a schedule (say, every 10 minutes) from the same directory. SOLVED! Answer below.
Camel Version: 2.14.1
Thanks!
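SOLVED! Setting idempotent=false explicitly turns off the idempotent check that noop=true enables by default: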
file:/../..?noop=true&idempotent=false&scheduler=quartz2&scheduler.cron=0+0/10+*+*+*+?
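The effect of `idempotent=false` can be modelled in plain Java: with `noop=true` the files are never moved or deleted, so every poll lists them again, and only the idempotent check decides whether they are consumed. `NoopPollModel` is a hypothetical simulation of that behaviour, not Camel code.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simulates a noop=true file consumer: files stay in place,
// so each poll sees the same directory listing. Not Camel API.
class NoopPollModel {
    private final boolean idempotent;
    private final Set<String> processed = new HashSet<>();

    NoopPollModel(boolean idempotent) {
        this.idempotent = idempotent;
    }

    // Returns the files that one scheduled poll would consume.
    List<String> poll(List<String> directoryListing) {
        List<String> consumed = new ArrayList<>();
        for (String file : directoryListing) {
            // idempotent=true (implied by noop=true): a name seen before is skipped.
            // idempotent=false: every file is consumed again on each poll.
            if (!idempotent || processed.add(file)) {
                consumed.add(file);
            }
        }
        return consumed;
    }

    public static void main(String[] args) {
        List<String> dir = List.of("a.txt", "b.txt");
        NoopPollModel defaults = new NoopPollModel(true);
        System.out.println(defaults.poll(dir)); // [a.txt, b.txt]
        System.out.println(defaults.poll(dir)); // []
        NoopPollModel reread = new NoopPollModel(false);
        System.out.println(reread.poll(dir));   // [a.txt, b.txt]
        System.out.println(reread.poll(dir));   // [a.txt, b.txt]
    }
}
```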

Camel File component: what's the difference between an IdempotentRepository and an InProgressRepository?

What's exactly the difference between the IdempotentRepository and the InProgressRepository?
I have following definitions from the File component page:
IdempotentRepository: "Option to use the Idempotent Consumer EIP pattern to let Camel skip already processed files"
InProgressRepository: "The in-progress repository is used to account the current in progress files being consumed."
To me these read as the same definition, just phrased slightly differently.
They can also both use the same idempotent repository.
So I'm slightly confused: do I need both, or is the idempotentRepository good enough?
Before reading the information below, make sure you understand the concept of idempotency.
IdempotentRepository: where the cache of already processed files is stored (i.e. files that have been consumed and handled by your route). It is used only when the idempotent option is enabled.
InProgressRepository: where the cache of files currently in progress is stored (i.e. files being consumed in the current batch). It is always used by the file consumer.
IMO, you always need an InProgressRepository, and the default setup (a memory-based repository) is generally fine. You need an IdempotentRepository only if idempotency is required, in which case you can choose your own setup (file-based, JPA-based, ...) so that it survives application restarts.
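The different lifetimes of the two repositories can be sketched in plain Java. This is a simplified model, not Camel's implementation: two `HashSet`s stand in for the repositories, and the hypothetical `begin`/`complete` methods represent a poll picking a file up and the exchange finishing. The in-progress entry exists only while the file is being handled; the idempotent entry stays, so the file is skipped on later polls.

```java
import java.util.HashSet;
import java.util.Set;

// Simplified model of how a file consumer uses its two repositories. Not Camel API.
class RepositoryModel {
    final Set<String> inProgress = new HashSet<>(); // files currently being handled
    final Set<String> idempotent = new HashSet<>(); // files already handled (idempotent=true)

    // Called when a poll finds a file; returns false if the file must be skipped.
    boolean begin(String file) {
        if (idempotent.contains(file)) {
            return false;             // processed in an earlier poll
        }
        return inProgress.add(file);  // false if it is still being handled
    }

    // Called when the exchange for the file completes.
    void complete(String file) {
        inProgress.remove(file);      // in-progress entries are short-lived
        idempotent.add(file);         // idempotent entries are kept
    }

    public static void main(String[] args) {
        RepositoryModel m = new RepositoryModel();
        System.out.println(m.begin("f.xml")); // true: picked up
        System.out.println(m.begin("f.xml")); // false: still in progress
        m.complete("f.xml");
        System.out.println(m.inProgress);     // []
        System.out.println(m.begin("f.xml")); // false: already processed
    }
}
```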

Camel SFTP fetch on schedule and on demand

I have seen variations of this problem but haven't managed to find a definitive answer.
Here is the use case:
An SFTP server that I want to poll every hour.
On top of that, I want to expose a REST endpoint that the user can hit to force an ad-hoc retrieval from the same SFTP server. I'm happy for the polling schedule to remain as-is, i.e. if I polled and the user forces a refresh 20 minutes later, the next poll can happen 40 minutes after that.
Both should be idempotent: a file downloaded by the polling mechanism should not be downloaded again by an ad-hoc pull, and vice versa. Both should download ALL available files that have not yet been downloaded (there will likely be more than one new file; I saw a similar question about on-demand fetching, but it was for a single file).
I would like to avoid hammering the SFTP server via pollEnrich: my understanding is that each pollEnrich requests a fresh file listing from the server, so calling pollEnrich in a loop until all files are retrieved would hit the server multiple times.
I was thinking of creating a route that starts/stops a separate route for the ad-hoc fetch, but I'm not sure whether the idempotent behaviour would be maintained across the two routes.
So, smart Camel brains out there, what is the most elegant way of fulfilling such requirements?
Not a smart Camel brain, but I'll give it a try based on my understanding.
Hopefully you have already gone through:
http://camel.apache.org/file2.html
http://camel.apache.org/ftp2.html
I would create a filter and separate routes for the consumer and the producer.
For the file options, I would use: idempotent=true, delay, initialDelay, useFixedDelay=true, maxMessagesPerPoll=1, eagerMaxMessagesPerPoll=true, readLock=idempotent, idempotentKey=${file:onlyname}, idempotentRepository and recursive=false (for consuming).
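As a sketch of how those options fit together, the snippet below assembles them into a single consumer URI. The host, directory, timing values and the `#sftpRepo` bean name are hypothetical placeholders; the option names are the ones listed above. Whether `readLock=idempotent` is accepted by the SFTP consumer depends on your Camel version, so check the ftp2 page before relying on it.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Builds the consumer endpoint URI from the suggested options.
// Host, directory, timings and the #sftpRepo bean name are hypothetical placeholders.
class SftpUriSketch {
    static String consumerUri() {
        Map<String, String> options = new LinkedHashMap<>(); // keeps insertion order
        options.put("initialDelay", "1000");
        options.put("delay", "3600000");                  // poll once per hour
        options.put("useFixedDelay", "true");
        options.put("maxMessagesPerPoll", "1");
        options.put("eagerMaxMessagesPerPoll", "true");
        options.put("readLock", "idempotent");
        options.put("idempotent", "true");
        options.put("idempotentKey", "${file:onlyname}");
        options.put("idempotentRepository", "#sftpRepo"); // repository bean shared by both routes
        options.put("recursive", "false");

        StringJoiner query = new StringJoiner("&");
        options.forEach((k, v) -> query.add(k + "=" + v));
        return "sftp://user@sftp.example.com/outbox?" + query;
    }

    public static void main(String[] args) {
        System.out.println(consumerUri());
    }
}
```

Note that maxMessagesPerPoll=1 limits each poll to a single file, so with an hourly delay only one file would be fetched per hour; if every new file should be downloaded in one go, as the question requires, that option may need to be dropped or raised.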
No file will be read twice. You can combine the documented options and try whichever suits you best, such as the delay option. If yo
"I would like to avoid hammering the SFTP via pollEnrich - my understanding is that each pollEnrich would request a fresh list of files from SFTP, so doing pollEnrich in a loop until all files are retrieved would be calling the SFTP multiple times." - > Unless you use the option disconnect=true, the connection will not be terminated and you can either consume or produce files continously, check ftp options for disconnect and disconnectOnBatchComplete.
Hope this helps!

I want to process the same file multiple times in Apache Camel 2

My requirement is to process the same file multiple times in Apache Camel 2. How can I achieve this?
If I use noop=true, idempotent is set to true; otherwise, the file is moved to another directory.

Read files in sequence with MULE

I'm using a File Inbound Endpoint in Mule to process files from one directory and, after processing, move them to another directory. The problem is that sometimes there are a lot of files in the "incoming" directory, and when Mule starts up it tries to process them concurrently. This is bad for the database that the flow reads and updates. Can the files be read in sequence, in any order?
Set the flow's processing strategy to synchronous so that the file poller thread itself carries each file through the entire flow:
<flow name="filePoller" processingStrategy="synchronous">
    <!-- file inbound endpoint and the rest of the flow go here -->
</flow>
On top of that, do not use any <async> block or one-way endpoint downstream in the flow; otherwise, another thread pool will kick in, potentially leading to parallel processing, which is undesirable for your use case.
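The effect of the synchronous strategy can be illustrated with a plain-Java analogy (this is not Mule code, and `SequentialFilesSketch` with its `processFile` stand-in for the DB work is hypothetical). When the polling thread runs the whole flow for one file before taking the next, at most one file touches the database at a time; handing files to a thread pool, the analogue of an `<async>` block, lets them overlap.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Java analogy for processingStrategy="synchronous" versus an async hand-off. Not Mule API.
class SequentialFilesSketch {
    private final AtomicInteger active = new AtomicInteger();
    private final AtomicInteger maxActive = new AtomicInteger();

    // Hypothetical stand-in for the DB work done by the flow for one file.
    private void processFile(String file) {
        int now = active.incrementAndGet();
        maxActive.accumulateAndGet(now, Math::max); // track peak concurrency
        try {
            Thread.sleep(10); // simulate DB access
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            active.decrementAndGet();
        }
    }

    // Synchronous: the poller thread runs the whole flow for each file in turn.
    int runSynchronous(List<String> files) {
        files.forEach(this::processFile);
        return maxActive.get();
    }

    // Asynchronous hand-off (like an <async> block): a pool may overlap the files.
    int runWithPool(List<String> files) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        files.forEach(f -> pool.submit(() -> processFile(f)));
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return maxActive.get();
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> files = List.of("1.csv", "2.csv", "3.csv", "4.csv");
        System.out.println(new SequentialFilesSketch().runSynchronous(files)); // 1
        System.out.println(new SequentialFilesSketch().runWithPool(files));    // may be greater than 1
    }
}
```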
