I want to process the same file multiple times in Apache Camel 2 - apache-camel

My requirement is to process the same file multiple times in Apache Camel 2. How can I achieve this?
If I use noop=true then idempotent will be set to true; otherwise the file will be moved to another directory.

Related

Apache camel file component to periodically read files without deleting or moving files and without idempotency

file:/../..?noop=true&scheduler=quartz2&scheduler.cron=0+0/10+*+*+*+?
Using noop=true lets me keep the files in place after the route consumes them, but it also enables idempotent, which I don't want. (A second route will do the deletion based on other logic, so consuming non-idempotently shouldn't lead the first route into an infinite loop, I believe.)
I think I could overwrite the file and use idempotentKey as ${file:name}-${file:modified} so that the file is picked up on the next poll, but that still means an extra write. Deleting and recreating the same file would also work, but again that is not a clean approach.
Is there a better way to accomplish this? I could not find one in the Camel documentation.
Edit: To summarize, I want to read the same files over and over on a schedule (say every 10 minutes) from the same directory. SOLVED! - Answer below.
Camel Version: 2.14.1
Thanks!
SOLVED!
file:/../..?noop=true&idempotent=false&scheduler=quartz2&scheduler.cron=0+0/10+*+*+*+?
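For completeness, the same endpoint can be written in the Java DSL. A minimal sketch, assuming camel-quartz2 is on the classpath; the directory and the log step are placeholders:

```java
import org.apache.camel.builder.RouteBuilder;

// Sketch of the solved configuration (Camel 2.x Java DSL): noop=true
// leaves the files where they are, idempotent=false makes the consumer
// pick the same files up again on every poll, and the quartz2 scheduler
// fires the poll every 10 minutes. The directory is a placeholder.
public class RereadFilesRoute extends RouteBuilder {
    @Override
    public void configure() {
        from("file:/data/inbox?noop=true&idempotent=false"
                + "&scheduler=quartz2&scheduler.cron=0+0/10+*+*+*+?")
            .log("Re-processing ${file:name}");
    }
}
```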

Consume multiple text files with Apache Flink DataSet API

I am writing a batch job with Apache Flink using the DataSet API. I can read a text file using readTextFile(), but this function just reads one file at a time.
I would like to consume all the text files in my directory one by one and process them, one by one, in the same function, as a batch job with the DataSet API, if that is possible.
The other option is to implement a loop that runs multiple jobs, one for each file, instead of one job with multiple files. But I think that solution is not the best.
Any suggestions?
If I read the documentation right, you can read an entire path using ExecutionEnvironment.readTextFile(). You can find an example here: Word-Count-Batch-Example
References:
Flink Documentation
Flink Sources
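To illustrate the answer above (a minimal sketch, not tested; the input path is a placeholder): when readTextFile() is given a directory, every file directly inside it becomes part of a single DataSet. For descending into nested sub-directories, Flink's recursive.file.enumeration input-format parameter can be enabled.

```java
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

// Minimal sketch: pointing readTextFile() at a directory reads every file
// directly inside it into one DataSet. The path is a placeholder.
public class ReadWholeDirectory {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        DataSet<String> lines = env.readTextFile("file:///data/input");

        // Process all files' lines in the same batch job, e.g. count them.
        System.out.println(lines.count());
    }
}
```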

Creating new Files after a specific period of time Apache Camel File

I have a Camel route that consumes from Kafka in a constant flow. Is there a way to define that every 10 minutes, for example,
a new file is created? I need each file to have a different name, since one file cannot have the same name as another. I've tried the timer and quartz Camel components without success.
from("kafka:testTopic?brokers=localhost:9092&groupId=01")
.to("file:C://DESTINATION?fileExist=Append&fileName=prefixName_${variable}.json");
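One workaround is to derive the file name from the clock. For per-minute rollover, Camel's file expression language already supports fileName=prefixName_${date:now:yyyyMMdd_HHmm}.json. For exact 10-minute buckets, a small helper can round the timestamp down to its window; the route could then set Exchange.FILE_NAME from it via a bean call. A hypothetical sketch (the class name and prefix are made up):

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical helper: maps a timestamp to its 10-minute bucket, so every
// message appended within the same window targets the same file name, and
// a new file starts when the window rolls over. UTC is used so the names
// are unambiguous; adjust if local time is preferred.
public class BucketName {
    static String fileNameFor(Date ts) {
        long windowMillis = 10 * 60 * 1000L;
        Date bucket = new Date((ts.getTime() / windowMillis) * windowMillis);
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMdd_HHmm");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return "prefixName_" + fmt.format(bucket) + ".json";
    }

    public static void main(String[] args) {
        System.out.println(fileNameFor(new Date()));
    }
}
```

With fileExist=Append, messages pile into one file per window, and the file component creates each new name on first write, so no name clash occurs.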

How can I synchronize different Apache Camel servers to work together on files without issues?

Our setup : Our production is using several instances (distinct JVM) of Apache Camel spread over a few physical computers (each computer is running more than one JVM).
Most of our Camel routes are HTTP-based (REST or SOAP), and we have a network component dealing with load-balancing the http queries, so it's working fine for that purpose.
On the file side, the shared folder is a Linux NFS mount. Each of the physical computers has the same shared folder mounted.
My issue: For a "new" pattern, we have to deal with files: somebody will produce files and put them in that shared folder, and Apache Camel will have to detect the files, do some work on them, and rename them.
We have two uses for that pattern: one is processing a single file, the other is processing a folder containing several files.
I've tried various things, but I can't find a reliable way to ensure that one and only one Camel instance consumes a given file.
Here are some of my attempts :
Using the file component with the option readLock=markerFile. As far as I know, it worked fine for the single file (there may have been issues I'm not aware of; the single-file case wasn't used much), but it was not working properly for the folder (I don't remember the exact issues; I stopped trying that a year ago).
Using a timer to start a Java bean that does the work:
I tried to create my own lock files, but it turns out that is not reliable on an NFS filesystem. On some rare occasions, two physical computers would both think they had succeeded in creating the lock file and process the folder, and when the rename operation happened, one JVM would just hang there forever.
I tried to use a database with primary-key constraints to synchronize the Camels, but I'm getting SQL deadlock exceptions, which result in processing failures.
I'm out of ideas, and I'm worried that there is no effective way to synchronize Camels to work together with our architecture (parallel peers).
Is there a way to simulate a master/slave architecture using the same code spread over different Camel instances?
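One direction worth evaluating (it requires Camel 2.16 or later, so possibly an upgrade) is the file component's readLock=idempotent mode, which delegates the read lock to an idempotent repository. Backing that repository with a shared database, e.g. camel-sql's JdbcMessageIdRepository, makes the "who owns this file" decision cluster-wide instead of relying on NFS semantics. A hedged sketch; the directory, bean id, and processing endpoint are placeholders:

```java
import org.apache.camel.builder.RouteBuilder;

// Sketch (Camel 2.16+): readLock=idempotent consults the idempotent
// repository bound under "fileRepo" in the registry. If that bean is a
// JdbcMessageIdRepository (from camel-sql) pointing at a database shared
// by all JVMs, only one instance wins each file. The path and bean name
// are placeholders, not tested against your setup.
public class ClusterSafeFileRoute extends RouteBuilder {
    @Override
    public void configure() {
        from("file:/mnt/shared/inbox"
                + "?readLock=idempotent"
                + "&idempotentRepository=#fileRepo"
                // release the entry once the exchange completes, so the
                // same file name can be processed again later
                + "&readLockRemoveOnCommit=true")
            .to("direct:process");
    }
}
```

Because the locking happens on a single database insert rather than hand-rolled primary-key coordination, this tends to avoid the deadlock pattern described above.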

Read files in sequence with MULE

I'm using a File Inbound Endpoint in Mule to process files from one directory and, after processing, move the files to another directory. The problem I have is that sometimes there are a lot of files in the "incoming directory", and when Mule starts up it tries to process them concurrently. This is no good for the DB that is accessed and updated in the flow. Can the files be read in sequence, no matter the order?
Set the flow's processing strategy to synchronous, so the file poller's thread is used across the whole flow.
<flow name="filePoller" processingStrategy="synchronous">
On top of that, do not use any <async> block or one-way endpoint downstream in the flow; otherwise another thread pool will kick in, leading to potential (and, for your use case, undesired) parallel processing.
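Put together, a minimal flow might look like this (a sketch in Mule 3.x syntax; the directories, polling frequency, and the logger step standing in for the real DB work are placeholders):

```xml
<flow name="filePoller" processingStrategy="synchronous">
    <!-- Polls the incoming directory; because the flow is synchronous,
         the poller thread carries each file through the whole flow,
         so files are processed one at a time. -->
    <file:inbound-endpoint path="/data/incoming"
                           moveToDirectory="/data/processed"
                           pollingFrequency="10000"/>
    <!-- Replace with the real DB work; keep everything request-response,
         with no async blocks or one-way endpoints downstream. -->
    <logger level="INFO"
            message="Processing #[message.inboundProperties.originalFilename]"/>
</flow>
```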
