Camel SFTP fetch on schedule and on demand

I can see similar problems in different variations but haven't managed to find a definite answer.
Here is the use case:
an SFTP server that I want to poll every hour
on top of that, I want to expose a REST endpoint that the user can hit to force an ad-hoc retrieval from that same SFTP. I'm happy for the polling schedule to remain as-is, i.e. if I polled and the user forces a refresh 20 mins later, the next poll can be 40 mins after that.
Both of these should be idempotent, in that a file that was downloaded by the polling mechanism should not be downloaded again by an ad-hoc pull, and vice versa. Both ways of accessing should download ALL the available files that have not yet been downloaded (there will likely be more than one new file - I saw a similar question here for on-demand fetch, but it was for a single file).
I would like to avoid hammering the SFTP server via pollEnrich - my understanding is that each pollEnrich would request a fresh list of files from the SFTP server, so doing pollEnrich in a loop until all files are retrieved would mean calling the SFTP server multiple times.
I was thinking of creating a route that will start/stop a separate route for the ad-hoc fetch, but I'm not sure this would maintain the idempotent behaviour between the two routes.
So, smart Camel brains out there, what is the most elegant way of fulfilling such requirements?

Not a smart Camel brain, but I'll give it a try as per my understanding.
Hope you already went through:
http://camel.apache.org/file2.html
http://camel.apache.org/ftp2.html
I would create a filter and separate routes for the consumer and producer.
For consuming, I would use these file options: delay, initialDelay, useFixedDelay=true, maxMessagesPerPoll=1, eagerMaxMessagesPerPoll=true, readLock=idempotent, idempotent=true, idempotentKey=${file:onlyname}, idempotentRepository, recursive=false.
No files will be read again! You can use a variety of options as documented and try which suits you best, such as the delay option.
"I would like to avoid hammering the SFTP via pollEnrich - my understanding is that each pollEnrich would request a fresh list of files from SFTP, so doing pollEnrich in a loop until all files are retrieved would be calling the SFTP multiple times." - > Unless you use the option disconnect=true, the connection will not be terminated and you can either consume or produce files continously, check ftp options for disconnect and disconnectOnBatchComplete.
Hope this helps!

Related

Apache camel file component to periodically read files without deleting or moving files and without idempotency

file:/../..?noop=true&scheduler=quartz2&scheduler.cron=0+0/10+*+*+*+?
Using noop=true allows me to keep the files in the same place after the route consumes them, but it also enables idempotent, which I don't want. (There is a second route which will do deletion based on some other logic, so the first route consuming non-idempotently shouldn't lead to an infinite loop, I believe.)
I think I could overwrite the file and use an idempotentKey of ${file:name}-${file:modified} so that the file will be picked up on the next poll, but that still means an extra write. Deleting and recreating the same file should also work, but again, that is not a clean approach.
Is there a better way to accomplish this? I could not find one in the Camel documentation.
Edit: To summarize, I want to read the same files over and over in a scheduled manner (say every 10 mins) from the same repo. SOLVED! - answer below.
Camel Version: 2.14.1
Thanks!
SOLVED!
file:/../..?noop=true&idempotent=false&scheduler=quartz2&scheduler.cron=0+0/10+*+*+*+?
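For completeness, a minimal route sketch around that endpoint (the path and target are placeholders):

from("file:/path/to/repo?noop=true&idempotent=false"
        + "&scheduler=quartz2&scheduler.cron=0+0/10+*+*+*+?")
    .log("Re-reading ${file:name}")
    .to("direct:process");

noop=true leaves the files in place, and idempotent=false switches off the in-memory idempotent repository that noop=true would otherwise enable, so the same files are picked up on every cron-triggered poll.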

pollEnrich from FTP

I'm trying to enrich using dynamically selected remote files on an FTP server using pollEnrich. The remote files must remain in place and can be selected again and again, so the endpoint has noop=true and idempotent=false. Everything seems to work fine until multiple requests start coming in that use the same remote file for enrichment; this results in all but a few of the requests receiving a null body for the new-exchange argument in the aggregation strategy. Here is the relevant part of the route, which has been modified slightly to post here:
.pollEnrich()
.simple("ftp://username:password#ftp.example.com/path/files?fileName=${header.sourceFilename}&passiveMode=true&noop=true&idempotent=false")
.timeout(0)
.cacheSize(-1)
.aggregationStrategy(myEnrichmentAggregationStrategy)
I switched to using file:// instead of ftp:// as a test and still saw the same problem. I also tried different values for timeout and cacheSize, and enabled streamCaching since the body is an InputStream. I'm now thinking about implementing a custom read-lock mechanism (processStrategy), but it feels like a long-shot workaround. Has anyone else come across this problem and can shed some light on what's wrong?
I believe I've found a solution: use the inProgressRepository property on the polling consumer to set a dummy implementation of IdempotentRepository, which disables the check for in-progress files.
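For anyone hitting the same issue, a no-op repository along these lines should do it (my sketch, written against Camel 3's non-generic IdempotentRepository; in Camel 2 the interface is the generic IdempotentRepository<String> without start/stop):

import org.apache.camel.spi.IdempotentRepository;

// Never reports a file as in progress, so concurrent pollEnrich calls are
// not blocked from reading the same remote file.
public class NoOpInProgressRepository implements IdempotentRepository {
    public boolean add(String key) { return true; }
    public boolean contains(String key) { return false; }
    public boolean remove(String key) { return true; }
    public boolean confirm(String key) { return true; }
    public void clear() { }
    public void start() { }
    public void stop() { }
}

Bind an instance in the registry (say as noOpRepo) and append &inProgressRepository=#noOpRepo to the consumer URI.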

What's the right way to do long-running processes on app engine?

I'm working with Go on App Engine and I'm trying to build an API that needs to perform a long-running background task - in this case it needs to parse and chunk a big file out to task queues. I'd like it to return a 200 and close the user connection immediately and let the process keep running in the background until it's complete (this could take 5-10 minutes). Task queues alone don't really work for my use case because parsing the initial file can take more than the time limit for an API request.
At first I tried a goroutine as a solution to this problem. This failed because my App Engine context expired as soon as the parent function closed the user connection. (I suppose I could try writing a goroutine that doesn't require a context, but then I'd lose logging, and I'd need to fetch the entire remote file and pass it to the goroutine.)
Looking through the docs, it looks like App Engine used to have functionality that supported exactly what I want to do: runtime.RunInBackground, but that functionality is now deprecated and the replacement isn't obvious.
Is there a "right" or recommended way to do background processing now?
I suppose I could put a link to my big file into a task queue, but if I understand correctly, even functions called through task queues have to complete execution within a specified amount of time (is it 90 seconds?), and I need to be able to run longer than that.
Thanks for any help.
Try using:
appengine.BackgroundContext()
It should be long-lived, but it only works on GAE Flex.
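Roughly like this (a sketch; parseAndChunk and the file URL stand in for your own code):

import (
    "net/http"

    "google.golang.org/appengine"
)

func handleUpload(w http.ResponseWriter, r *http.Request) {
    // A context that is not cancelled when this request returns
    // (flexible environment only).
    ctx := appengine.BackgroundContext()
    go parseAndChunk(ctx, "gs://bucket/big-file") // long-running worker
    w.WriteHeader(http.StatusOK)                  // return 200 immediately
}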

Asynchronous requestAction

Pleeeeease help me... :- )
"You are my only hope".
I need to execute an action asynchronously several thousand times. The action is supposed to fetch email content from an external API, and it is located in a different controller, so I use requestAction to get it. When I have all the results, these 1000 email contents are sent as 1000 emails during one request, using another API.
Unfortunately, when run sequentially this takes a lot of time, so I need to run these requests asynchronously.
My question is:
Can I execute
$this->requestAction($myUrl)
...in parallel? For example, 100 requests at a time? I've seen a few asynchronous examples in PHP, but they all used static files, and I need to preserve the CakePHP structure to be able to use requestAction.
Thanks to all who can help!
EDIT: By the way, when I tried to run the requests via fopen($url, 'r'); and then stream_get_contents(), the performance was dreadful, though maybe it could be improved. I don't know, but requestAction definitely seems to be the better option (I think).
Is there any reason why you wouldn't use a shell for this?
Please take a look at: CakePHP Shells
"Deferred Execution" is probably what you are after here, basically you want to send one command not have the user waiting around for it? If so then you can use a Message Queue to handle this pretty easily.
We use CakeResque and Redis to send thousands of emails and perform other API calls.
CakeResque - Deferred Processing for CakePHP
There are other message queues available, but this one is really simple to get working and probably won't need much of a change to your code.
In the end, I wasn't able to find a solution that would use requestAction, but I was able to extract the code to a standalone PHP file, which is then processed using include.
And the most interesting part, the asynchronous requests, was done using a great library called cURL-easy, with the help of my own utility function. You can read about it (how to install and how to use it) here:
Is making asynchronous HTTP requests possible with PHP?
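For reference, the raw curl_multi approach underneath such libraries looks roughly like this (a sketch; $urls would hold your action URLs, requested over plain HTTP):

$mh = curl_multi_init();
$handles = array();
foreach ($urls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($mh, $ch);
    $handles[] = $ch;
}
do {
    curl_multi_exec($mh, $running); // drive all transfers
    curl_multi_select($mh);         // wait for activity instead of spinning
} while ($running > 0);
$results = array();
foreach ($handles as $ch) {
    $results[] = curl_multi_getcontent($ch);
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);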

BizTalk 2006 - Copy a received file to a new directory

I want to be able to copy the file I receive, which comes in as XML, into a new folder location on the server. Essentially I want to keep a backup of the input files in a new folder.
What I have done so far is try to follow what has been said on this forum post - link text
At first I tried the last method, which didn't do anything (file renaming while reading). So I tried one of the other options: I altered the orchestration and put a Send shape just after the Receive shape, so the same message that comes in is sent out to the logical port. I exported the MSI and created a Send Port in the Admin console which points to my copy location. It copies the file, but it continues to create one every second, and the Event Viewer reports warnings saying "The file exists". I have set the Copy Mode of the port to 'Overwrite' and to 'Create New'; neither works.
I have looked on Google but nothing helps - BTW I support BizTalk, but I have no idea how pipelines and ports work. So any help would be appreciated.
Thanks for the quick responses.
As David has suggested I want to be able to track the message off the wire before BizTalk does any processing with it.
I have tried the CodePlex link that Ben supplied, and it points to 'Atomic-Scope's BizTalk Message Archiving Pipeline Component', which it looks like my client will have to pay for. I have downloaded the trial and will see if I have any luck.
David - I agree that the orchestration should represent the business flow and making a copy of a file isn't part of the business process. I just assumed when I started tinkering around I could do it myself in the orchestration as suggested on the link I posted.
I'd also rather not rely on the BizTalk tracking within the message box database, as I suppose the tracked messages will need to be pruned on a regular basis. Is that correct, or am I talking nonsense?
However, is there a way I can do what Atomic-Scope have done that may be cheaper?
Hi again - I have figured it out from David's original post. As indicated, I also created a Send port which just has a "Filter" expression like: BTS.ReceivePortName == ReceivePortName
Thanks all!
As the post you linked to suggests, there are several ways of achieving this sort of result.
The first question is: What do you need to track?
It sounds like there are two possible answers to that question in your case, which I'll address separately.
You need to track the message as received off the wire before BizTalk touches it
This scenario often arises where you need to be able to prove that your BizTalk solution is not the source of any message corruption or degradation being seen in messages.
There are two common approaches to this:
Use a pipeline component such as the one Ben Runchey suggests
There is another example of a pipeline component for archiving here on codebetter.com. It looks good - just be careful, if you use other components, about where you place this component, so that you are still following proper BizTalk streaming-model practices. BizTalk pipelines are all forward-only streaming, meaning that your stream is read only once, and all the work on it happens in an eventing manner.
This is a good approach, but with the following caveats:
You need to be careful about the streaming employed within the pipeline component
You are not actually tracking the on-the-wire message - what your pipeline actually sees is the message after it has gone through the BizTalk adapter (e.g. the HTTP adapter, the File adapter, etc.)
Rely upon BizTalk's out of the box tracking
BizTalk automatically persists all messages to the message box database and if you turn on BizTalk tracking you can make BizTalk keep these messages around.
The main downside here is that enabling this tracking will result in some performance degradation on your server - depending on the exact scenario this may not be a huge hit, but it can be significant.
You can track the message after it has gone through the initial receive pipeline
With this approach there are two main options: use a pure messaging send port subscribing to the receive port, or use an orchestration send port.
I personally do not like the idea of using an orchestration send port. Orchestrations are generally best used to model the business flow needed. Unless this archiving is part of the business flow as understood by standard users, it could simply confuse what does what in your solution.
The approach I tend to use is to create a messaging send port in the BizTalk admin console that subscribes to your receive port. The send port will then just use a standard BizTalk file adapter, with a pass through pipeline.
I think you should look at the BizTalk Message Archiving pipeline component. You can find it on CodePlex (http://www.codeplex.com/btsmsgarchcomp).
You will have to create a new pipeline and deploy it to your BizTalk group. Then update your receive pipeline to archive the file to a location that the host this receive location runs under has access to.
