Taking input from multiple files to a camel route

Taking input from multiple files to a camel route - file

I want to write a camel route which will take input from multiple file destination and process them after aggregating.
is it possible to take input from multiple files for a single route?

Yes you can use poll-enrich to call consumer-endpoints like file to enrich the message. This works for many other consumer-endpoints as well like SFTP or message queues.
If you need to read same file multiple times it can get trickier as you'll likely have to set noop=true and possibly use something like dummy idempotent repository to get around camels default behavior.
Note that calling pollEnrich seems to clear headers / create new message so use exchange properties to persist data between pollEnrich calls.
from("file:someDirectory")
.setProperty("file1").body()
.pollEnrich("file:otherDirectory", 3000)
.setProperty("file2").body()
.pollEnrich("file:yetAnotherDirectory", 3000)
.setProperty("file3").body();

Related

Multiple consumers for the same endpoint is not allowed

I would like to read files from a directory with camel file consumer but I need my route to be transacted. So I can not use threads inside the rout.
Is it ok to write multiply routes to read from the same endpoint (same directory) with a little change between the uris (for example the sort type) , and like this to avoid the Multiple consumers for the same endpoint is not allowed exception ?

Yeah sure you can do that, mind that you will have competing consumes for the same files now, so mind about read-locks. By default Camel use the marker file.
You can also use different delay so they dont poll at the same interval/time. And you can sort by random to make less chance of processing the same files.

Camel from direct read file and process

I have a couple parallel routes in camel. One is reading sql data. One is reading a file on disk and then comparing to the prior sql data. I need to run route one, and based on if anything is imported, run route 2.
fromF("quartz2://mio/%s?cron={{route_1_cron}}", order).
log("Running data import...").
to("sql:{{sql_select}}").
choice().
when(body().isNull()).
stop().
when(body().isNotNull()).
bean(Utility.class,"incomingSqlData").
choice().when(header("status").isEqualTo(true).
to("direct:start").stop();
So far I am good. Now on the second route how do I start with from(direct:start) and then read the file from it's directory? Since I cannot have from(direct).from("file:..), since that would create two from routes.
And using from("direct:start").to("file:...") will try to write to the file.
Tl:dr: How should I start a route with direct and then read a file?

To expand on #noMad17 comment, you can use a content enricher. So, your from("direct:start") route can look something like:
from("direct:start")
.pollEnrich("file:...", new MyAggregationStrategy())
....
This will prompt your route to read a file.
Note that AggregationStrategy"is used to combine the original exchange and the resource exchange" and is optional. If not provided, then the body of resource exchange (i.e. the exchange resulting from reading the file) will overwrite the original exchange.

How to consume a Chunked HTTP Stream with Apache Camel?

I have an issue consuming a Chunked HTTP RSVP Stream using Camel.
The stream is here and you can find more information about it at the end of this page
I have created a simple route such:
from("http4://stream.meetup.com/2/rsvps").log(org.apache.camel.LoggingLevel.INFO, "MeetupElasticSearch.Log", "JSON RSVP ${body}")
But nothing happens no message are consumed. I tried by adding a camel timer before because I am not sure you can use http4 component directly in the from but result is the same.
Can you help me please?

You cannot use the http4 component in a from() directive.
However, there is several ways to call an URL and to retrieve the result:
One is to create a timer and call the http4 component from a to() like this:
from("quartz2://HttpPoll/myTimer?trigger.repeatCount=0")
.to("http4://stream.meetup.com/2/rsvps")
.log("Body is: ${body}")
.to(mockDestination);
Another way is to use the content enricher pattern if you want to for example aggegates results into one another stucture.
Note as well, if your URL is dynamic you must use recipientList() instead to().
UPDATE:
Another way to consume a stream from the Internet is to use another component than http4: the camel-stream.
This one you can declare it in a from()directive:
// Read stream from the given URL... an publish in a queue
from("stream:url?url=http://stream.meetup.com/2/rsvps&scanStream=true&retry=true").
to("seda:processingQueue");
// Read records from the processing queue
// and do something with them.
from("seda:processingQueue").to("log:out")
You can see more sample in this site.

Camel condition on aggregate of messages

I'm looking for a way to conditionally handle messages based on the aggregation of messages. I've looked into a lot of ways to do this, but it seems that Apache Camel doesn't support it. I'll explain the scenario and then the solutions I tried.
Scenario:
I'm trying to conditionally clean a directory. I poll from the directory every x days and fetch all the files (file://...). I route this into an aggregation, that aggregates the files into a single size (directorySize). I then check if this size passes a certain threshold.
Here is where the problem lies. I now want to remove certain files if this condition passes, but I don't have access to the original messages anymore because they were aggregated in a new exchange.
Solutions:
I tried to fetch the files again to process them. Problem is that you can't make a consumer fetch on demand as far as I know. I tried using pollEnrich, but that will only fetch a single file and not all files in the directory.
I tried to filter/stop the parent route. The problem here is that filter()/choice...stop()/end() will only stop the aggregated route with the directory size and not the parent route with the file messages. I can't conditionally process these.
I tried to move the aggregated condition to another route that I would call, but this causes the same problem as the first solution.
Things I consider doing:
Rewrite the aggregation strategy to not only aggregate the size, but also the files itself into a groupedExchange. This way I can split the aggregation again after the check. I don't really like this solution because it causes a lot boilerplate, both in code as during runtime.
Move the file size calculator to a processor instead of the aggregator. This would defeat the purpose of using camel in the first place.. I would manually be fetching the files and adding the sizes.. And that for every single file..
Use a ControlBus to dynamically start the delete route on that directory. Once again a lot of workaround to achieve something that I feel should be able to be done in a simple route.
I would like to set the calculated size on every parent message, but I have no clue how this could be achieved?
Another way to stop the parent route that I haven't thought of?
I'm a bit stunned that you can't elegantly filter messages based on the aggregation of these messages. Is there something that I missed in Camel that would provide an elegant solution? Or is this a case of the least bad solution?
Simple Schema
Message(File)
Message(File) --> AggregatedMessage(directorySize) --> delete certain Files?
Message(File)

Camel is really awesome, but sometimes it's sure difficult to see exactly which design pattern to use ;)
Firstly, you need to keep a copy of the file objects, because you don't know whether to delete them or not until you reach your threshold - there are basically (at least) two ways to do this.
Alternative 1
The first way is to use a List in an exchange property. This property will hang around no matter what you do with the exchange body. If you have a look at the source code for GroupedExchangeAggregationStrategy, it does precisely this:
list = new ArrayList<Exchange>();
answer.setProperty(Exchange.GROUPED_EXCHANGE, list);
// ...
list.add(newExchange);
Or you could do the same thing manually on your own exchange property. In any case, it's completely fine to use the Grouped aggregation strategy as you have done.
Alternative 2
The second way to "keep" old messages is to send a copy to a stopped SEDA queue. So you would do to("seda:xyz"). You define this queue as .noAutoStartup(). Then you can send messages to it and they will queue up on an internal queue, managed by camel. When you want to process the messages, you simply start it up via controlbus and stop it again afterwards.
Generally, messing around with starting and stopping queues should be avoided unless absolutely necessary, but that's certainly another way to do it
Suggested solution
I suggest you do as you have done (i.e. alternative 1):
aggregate via GroupedExchangeAggregationStrategy to keep the individual files in a list
Compute the total file size (use a processor, or do it along the way with a custom aggregation strategy)
Use a filter(simple("${body} < 123"))
"Unwind" your aggregation via a splitter(simple("${property.CamelGroupedExchange}"))
Delete your files one by one
Please let me know if this doesn'y makes sense, or if I have misunderstood your problem in any way.

Camel use pollEnrich from same URI multiple times returns null body

I have 2 routes. The first route uses poll enrich to check if a file is present. The second route uses a poll enrich on the same uri to read and process the file. The first route invokes the second via a SEDA queue, like so:
public void configure() throws Exception {
String myFile = "file://myDir?fileName=MyFile.zip&delete=false&readLock=none";
from("direct:test")
.pollEnrich(myFile, 10000)
.to("seda:myQueue")
;
from("seda:myQueue")
.pollEnrich(myFile, 10000)
.log("Do something with the body")
;
}
As it stands, if I execute the first route, the poll enrich finds a file, but when the poll enrich in the second route executes, it returns a body of null. If I just execute the second route on its own, it retrieves the file correctly.
Why does the second poll enrich return null, is the file locked? (I was hoping using a combination of noop,readLock, and delete=false would prevent any locking)
Does camel consider the second poll enrich as a duplicate, therefore filtering it out? (I have tried implementing my own IdempotentRepository to return false on contains(), but the second pollEnrich still returns null)
You may wonder why I'm trying to enrich from 2 routes, the first route has to check if a number of files exist, only when all files are present (i.e., pollEnrich doesn't return null) can the second route start processing them.
Is there an alternative to pollEnrich that I can use? I'm thinking that perhaps I'll need to create a bean that retrieves a file by URI and returns it as the body.
I'm using camel 2.11.0

I realize this is now an old topic, but I just had a similar problem.
I suggest you try the options:
noop=true
which you already have, and
idempotent=false
To tell Camel it is OK to process the same file twice.
Update after testing:
I actually tested this with both settings as suggested above, it works some times, but under moderate load, it fails, i.e. returns null body for some exchanges, although not all.
The documentation indicates that setting noop=true automatically sets idempotent=true, so I am not sure the idempotent setting is being honoured in this case.

Is there any specific reason why you are not using just one route?
I don't understand why you are using two routes for this. File component can check if the file is there and if it is, pull it. If you are worried about remembering the files so you don't get duplicates, you can use an idempotent repository. At least, based on your question, I don't think you need to complicate the logic using two routes and the content enricher EIP.

the second route returns NULL because the file was already consumed in the first route...if you are just looking for a signal message when all files are present, then use a file consumer along with an aggregator and possibly a claim check to avoid carrying around large payloads in memory, etc...

As you've probably learned this does not work as one might expect
noop=true&idempotent=false
my guess is that Camel ignores idempotent=false and as documented uses instance of MemoryMessageIdRepository. To work around this, one can configure file endpoint to use custom idempotent repo:
noop=true&idempotentRepository=#myRepo
and register custom repository in the registry or spring context:
#Component("myRepo")
public class MyRepo implements IdempotentRepository {
#Override
public boolean contains(Object o) {
return false;
}
...
}

Try pollEnrich with strategyMethodAllowNull="true". By default , this value is false. When it is false, the aggregation strategy looks for the existing Exchange body, to aggregate the content returned from file.
When we make strategyMethodAllowNull="true", the existing body is considered as null. So every time , the content of the file is set into the current exchange body

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight