I need to process 1000 files after downloading them from FTP. The downloading part is done using Apache Camel; can I also break the processing of the files into subtasks using Camel, i.e. multiple workers that Camel handles for me?
You can always use the threads() API to enable concurrent processing on a route:
from("file://downloaded").threads(10).to(...);
We are using the Apache Camel SFTP component 2.25.4 as a poller (JSch) to read XML files. There are two Spring Boot 2.6.10 applications (the same application, deployed twice for redundancy) reading from the same SFTP folder 'inbound/orders' with the configuration:
sftp://user@localhost:2222/inbound/orders?preMove=$simple{file:parent}/.processing_$simple{sys.hostname}/$simple{file:onlyname}
When either of the applications is shut down for maintenance (a graceful shutdown), the preMove file becomes orphaned. Is there a way to ensure Camel fully consumes this 'preMove' file before shutting down the route?
I expect some may suggest an idempotent consumer to handle this, which is something we are considering, but (dare I say) we are trying to avoid the overhead of a cache lookup during the read operation (we consider this a tier 0 service, so we need to avoid any dependencies).
I have tried other styles of control, such as markerFile and the rename read-lock strategy, but none work as well as preMove does.
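For what it's worth, Camel does expose a per-route shutdown option aimed at exactly this draining behavior. A minimal sketch, assuming the sftp URI above and a hypothetical downstream endpoint, of asking the consumer to complete all queued tasks (not just the current file) during graceful shutdown:

import org.apache.camel.ShutdownRunningTask;
import org.apache.camel.builder.RouteBuilder;

public class OrdersRoute extends RouteBuilder {
    @Override
    public void configure() {
        from("sftp://user@localhost:2222/inbound/orders"
                + "?preMove=$simple{file:parent}/.processing_$simple{sys.hostname}/$simple{file:onlyname}")
            .routeId("orders-inbound")
            // drain the whole polled batch, not only the in-flight file,
            // before the route is allowed to stop
            .shutdownRunningTask(ShutdownRunningTask.CompleteAllTasks)
            .to("direct:handleOrder"); // placeholder for the real processing
    }
}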
I have an Apache Camel route that uses LevelDB as the aggregation repository. My problem is that when the Camel Context starts, the LevelDBAggregationRepository is automatically started by Camel, even if the route that uses it is switched off and not started.
Is there a way of preventing this?
Why is this important to me? I want my application to be highly available, so I want to share the same LevelDB between nodes. Unfortunately, the LevelDBAggregationRepository does not support access from multiple processes at a time, and I have no SQL database available for the JDBC aggregation repository.
So my current attempt is a route policy that ensures only one node at a time has the Camel route enabled (determined by leader election with ZooKeeper). However, when I start a second node with the route turned off, its Camel Context launches the LevelDB anyway, and then all hell breaks loose.
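For reference, a minimal sketch of the kind of setup being described; the endpoint names, correlation key, and completion size are placeholders, and the aggregation strategy simply keeps the newest exchange. Even with autoStartup(false) on the route, Camel registers the repository as a context-level service and starts it with the context, which is the behavior at issue:

import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.component.leveldb.LevelDBAggregationRepository;

public class LevelDbAggregationRoute extends RouteBuilder {
    @Override
    public void configure() {
        LevelDBAggregationRepository repo =
                new LevelDBAggregationRepository("orders", "data/leveldb.dat");

        from("direct:orders")
            .routeId("orders-aggregation")
            .autoStartup(false) // route stays off, but the repository still starts
            .aggregate(header("orderId"),
                    (oldEx, newEx) -> newEx != null ? newEx : oldEx)
                .aggregationRepository(repo)
                .completionSize(100)
                .to("direct:aggregated");
    }
}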
I have two servers that poll the same FTP location. When a file is placed in that location, both servers pick it up. But I need only one server to pick up the file, process it, and delete it. I am using Camel version 2.16.2. Is there any way to solve this issue?
Camel FTP uses most of the camel-file2 API internally, so all the camel-file options are inherited.
There are many strategies to avoid parallel processing of the same file.
Use the preMove, moveFailed, readLock, and readLockRemoveOnCommit camel-file options.
For instance, you could use the parameters below.
ftp://{{input.folder}}?readLock=rename&preMove=.inprogress&moveFailed=.error
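As a rough illustration, a consumer route using those options might look like the following; the {{input.folder}} placeholder is kept from the URI above, and delete=true plus the "direct:process" endpoint are assumptions added to match the ask (process, then delete):

import org.apache.camel.builder.RouteBuilder;

public class SingleConsumerFtpRoute extends RouteBuilder {
    @Override
    public void configure() {
        from("ftp://{{input.folder}}"
                + "?readLock=rename"       // only the node that wins the rename reads the file
                + "&preMove=.inprogress"   // move the file aside before processing
                + "&moveFailed=.error"     // park failed files in .error
                + "&delete=true")          // remove the file after successful processing
            .routeId("ftp-single-consumer")
            .to("direct:process");
    }
}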
I want to migrate an Apache Spark based use case to Apache Flink. In this use case, I distribute some files/directories to the working directory of the task nodes with the API sc.addFile(). However, I could not find the equivalent feature in Apache Flink. I did see an API env.registerCachedFile(), but it does not move the files/directories to the worker nodes. Can someone throw light on this issue? Thanks.
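For comparison, this is how Flink's distributed cache is meant to be used per its API; the path and registered name here are made up. registerCachedFile() is declared on the execution environment, and rich functions resolve the local copy on the worker via the runtime context:

import java.io.File;
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.api.java.ExecutionEnvironment;

public class CachedFileExample {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        // register the file under a name; Flink ships it to each task manager
        env.registerCachedFile("hdfs:///shared/lookup.txt", "lookup");

        env.fromElements("a", "b")
           .map(new RichMapFunction<String, String>() {
               @Override
               public String map(String value) throws Exception {
                   // resolve the locally cached copy on the worker node
                   File lookup = getRuntimeContext()
                           .getDistributedCache().getFile("lookup");
                   return value + ":" + lookup.getName();
               }
           })
           .print(); // print() triggers execution in the DataSet API
    }
}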
I am trying to load a file into a Kafka consumer, either with Flume or directly with Kafka. I started the Kafka server using this link: http://kafka.apache.org/081/quickstart.html
As mentioned in the doc, I started ZooKeeper and also the brokers. Then I was able to send messages from a producer to a consumer. But now I am trying to see if I can upload an input file from my local machine to Kafka.
Any advice? Thanks.
You can't load a file into a Kafka consumer. You can only write data (or a file's contents) to a Kafka topic using the Kafka producer API.
So you need to write that file into a Kafka topic, and then your consumers will be able to read it.
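A minimal sketch of doing that from Java, assuming a broker on localhost:9092, a topic named "file-lines", and a local file input.txt (note this uses the current Java producer API rather than the 0.8-era one from the quickstart linked above); each line of the file becomes one record:

import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class FileToTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        // send each line of the file as one record to the topic
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (String line : Files.readAllLines(Paths.get("input.txt"))) {
                producer.send(new ProducerRecord<>("file-lines", line));
            }
        }
    }
}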