I would like to read files from a directory with the Camel file consumer, but I need my route to be transacted, so I cannot use threads inside the route.
Is it OK to write multiple routes that read from the same endpoint (the same directory) with a small change between the URIs (for example, the sort type), and in this way avoid the "Multiple consumers for the same endpoint is not allowed" exception?
Yes, you can do that. Keep in mind that you will now have competing consumers for the same files, so pay attention to read locks. By default, Camel uses a marker file.
You can also use different delays so the routes don't poll at the same interval/time. And you can sort randomly to reduce the chance of processing the same files.
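To make the read-lock point concrete, here is a plain-Java model of competing consumers that each claim a file by atomically creating a marker file next to it, which is the idea behind the marker-file read lock. It is a sketch, not Camel's implementation: the class and method names are hypothetical, and the `.camelLock` suffix is an assumption about Camel's marker naming.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class MarkerLockDemo {
    // Claim a file by atomically creating "<name>.camelLock" next to it.
    // Files.createFile fails if the marker already exists, so only one consumer wins.
    static boolean tryClaim(Path file) {
        Path marker = file.resolveSibling(file.getFileName() + ".camelLock");
        try {
            Files.createFile(marker);
            return true;
        } catch (FileAlreadyExistsException alreadyClaimed) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // One poll of one consumer: visit the *.txt files in random order and claim what it can.
    static List<Path> poll(Path dir, Random rnd) throws IOException {
        List<Path> candidates = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "*.txt")) {
            for (Path p : files) candidates.add(p);
        }
        Collections.shuffle(candidates, rnd); // random order lowers the chance of collisions
        List<Path> claimed = new ArrayList<>();
        for (Path p : candidates) {
            if (tryClaim(p)) claimed.add(p);
        }
        return claimed;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("inbox");
        for (int i = 0; i < 5; i++) Files.createFile(dir.resolve("f" + i + ".txt"));
        List<Path> first = poll(dir, new Random(1));
        List<Path> second = poll(dir, new Random(2));
        System.out.println(first.size() + second.size()); // 5: every file claimed exactly once
    }
}
```

The two polls run one after the other here for determinism; because `Files.createFile` is atomic on a local file system, the single-winner guarantee would also hold if they ran concurrently.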
I want to write a Camel route that takes input from multiple file directories and processes the files after aggregating them.
Is it possible to take input from multiple files in a single route?
Yes, you can use pollEnrich to call consumer endpoints such as file to enrich the message. This works for many other consumer endpoints as well, such as SFTP or message queues.
If you need to read the same file multiple times it can get trickier, as you'll likely have to set noop=true and possibly use something like a dummy idempotent repository to get around Camel's default behavior.
Note that calling pollEnrich seems to clear headers / create a new message, so use exchange properties to persist data between pollEnrich calls.
from("file:someDirectory")
.setProperty("file1").body()
.pollEnrich("file:otherDirectory", 3000)
.setProperty("file2").body()
.pollEnrich("file:yetAnotherDirectory", 3000)
.setProperty("file3").body();
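A toy model of the behaviour described above may help: the polled resource replaces the message (so headers set earlier disappear), while exchange properties live on the exchange itself and survive. The `Message` and `Exchange` classes below are hypothetical stand-ins, not Camel's types, and the replace-the-message step models what this answer reports rather than Camel's actual code.

```java
import java.util.HashMap;
import java.util.Map;

public class PollEnrichModel {
    // Hypothetical stand-in for a Camel message: a body plus headers.
    static final class Message {
        final Object body;
        final Map<String, Object> headers = new HashMap<>();
        Message(Object body) { this.body = body; }
    }

    // Hypothetical stand-in for an exchange: the current message plus properties.
    static final class Exchange {
        Message in;
        final Map<String, Object> properties = new HashMap<>();
        Exchange(Message in) { this.in = in; }
    }

    // Models the reported behaviour: the polled resource becomes a brand-new message,
    // so headers of the previous message are gone while properties are untouched.
    static void pollEnrich(Exchange exchange, Object polledBody) {
        exchange.in = new Message(polledBody);
    }

    public static void main(String[] args) {
        Exchange ex = new Exchange(new Message("content of file 1"));
        ex.in.headers.put("note", "set before pollEnrich");
        ex.properties.put("file1", ex.in.body); // like .setProperty("file1").body()

        pollEnrich(ex, "content of file 2");
        ex.properties.put("file2", ex.in.body); // like .setProperty("file2").body()

        System.out.println(ex.in.headers.get("note"));  // null
        System.out.println(ex.properties.get("file1")); // content of file 1
    }
}
```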
Let's consider the following use case:
a set of providers pushes data in a corresponding directory on a local server (e.g. P1 pushes data into data/P1, P2 into data/P2, etc.)
each provider has its own generation rules (e.g. P1 generates plain txt files, P2 generates archives, P3 generates encrypted files, etc.)
in a Spring Boot application running on the server, each provider has its own Camel route which, every 10 minutes, reads from the corresponding directory (e.g. R1 reads from("file:data/P1"), R2 reads from("file:data/P2"), etc.)
a given provider can also combine rules (e.g. P4 generates archives containing encrypted data)
depending on the route, read data is then processed accordingly in order to move plain txt files to a target directory (e.g. R2 unzips data and moves it, R4 unzips data, decrypts the extraction result and moves it, etc.)
As more routes are implemented, it quickly becomes obvious that most of the code is duplicated and can be extracted; in fact, since rules can be combined, each processing step could be seen as an atomic operation available to any given route.
Let's consider, for instance, the following atomic operations:
unzip
decrypt
move
So, here's how those routes could look:
R1
from("file:data/P1")
.to("file:destination")
R2
from("file:data/P2")
// UNZIP LOGIC HERE
.to("file:destination")
R3
from("file:data/P3")
// DECRYPT LOGIC HERE
.to("file:destination")
R4
from("file:data/P4")
// UNZIP LOGIC HERE
// DECRYPT LOGIC HERE
.to("file:destination")
Since I want to extract common logic, I see two main options here (with corresponding R4 resulting code):
extract the logic into a custom component
from("file:data/P4")
// FOR EACH FILE
.to("my-custom-component:unzip")
.to("my-custom-component:decrypt")
.to("file:destination")
extract the logic into smaller routes
from("file:data/P4")
// FOR EACH FILE
.to("direct:my-unzip-route")
.to("direct:my-decrypt-route")
.to("file:destination")
(Of course, this is an oversimplification, but it's just to give you the big picture.)
Between those two options, I prefer the latter, which allows me to quickly reuse Camel EIPs and data formats (e.g. unmarshal().pgp()):
from("file:data/P4")
.to("direct:my-unzip-route")
.to("direct:my-decrypt-route")
.to("file:destination");
from("direct:my-unzip-route")
// LOGIC
.unmarshal().zip()
// MORE LOGIC
;
from("direct:my-decrypt-route")
// LOGIC
.unmarshal().pgp()
// MORE LOGIC
;
First question: since the given sub-route changes the original set of files (e.g. unzip could transform one archive into 100 files), would it be better to use enrich() instead of to()?
Second question: what am I supposed to route between those sub-routes? Right now, due to implementation details not explained here for simplicity, I'm routing a collection of file names so, instead of the from("file:data/P4") I have a from("direct:read-P4") which receives a list of file names as input and then propagates this list to the given sub-routes; each sub-route, starting from the list of file names, applies its own logic, generating new files and returning a body with the updated list (e.g. receives {"test.zip"} and returns {"file1.txt", "file2.txt"}).
So, the given sub-route looks like this:
from("direct:...")
// FOR EACH FILE NAME
// APPLY TRANSFORMATION LOGIC TO THE CORRESPONDING FILE
.setBody(/* UPDATED LIST OF FILE NAMES */)
Is it correct to end a route without a producer EIP? Or am I supposed to end it with the to() that generates the new file? If so, the next sub-route would have to read all data again from the very same directory, and that doesn't seem very efficient, since I already know which files need to be considered.
Third question: supposing it's OK to let a sub-route transform data and return the corresponding list of names, how am I supposed to test it? It wouldn't be possible to mock the final producer, since I'm not ending the route with a producer... so, do I have to use interceptors? Or something else?
Basically, I'm asking this question because I have a perfectly working set of routes but, while writing tests, I've noticed that some of them are awkward to test... and this could easily be the result of a wrong design.
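The list-in/list-out contract described in the question can be sketched in plain Java, which also shows why it is easy to test: each step is just a function from the current file list to the updated one. The names `Step`, `unzip` and `moveTo` are hypothetical, the decrypt step is omitted because it would need a PGP library, and entry paths inside the archive are flattened to their file names as a simple guard against path traversal.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class FileListPipeline {
    // A hypothetical "sub-route": takes the current list of files, returns the updated list.
    interface Step extends UnaryOperator<List<Path>> {}

    // Unzip step: every *.zip is replaced by the files it contains; other files pass through.
    static Step unzip(Path workDir) {
        return files -> {
            List<Path> out = new ArrayList<>();
            for (Path f : files) {
                if (!f.toString().endsWith(".zip")) {
                    out.add(f);
                    continue;
                }
                try (ZipInputStream zin = new ZipInputStream(Files.newInputStream(f))) {
                    ZipEntry entry;
                    while ((entry = zin.getNextEntry()) != null) {
                        if (entry.isDirectory()) continue;
                        // keep only the file name, so entries cannot escape workDir
                        Path target = workDir.resolve(Paths.get(entry.getName()).getFileName().toString());
                        Files.copy(zin, target, StandardCopyOption.REPLACE_EXISTING);
                        out.add(target);
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }
            return out;
        };
    }

    // Move step: the final "producer", moving each file into the destination directory.
    static Step moveTo(Path destination) {
        return files -> {
            List<Path> out = new ArrayList<>();
            for (Path f : files) {
                try {
                    out.add(Files.move(f, destination.resolve(f.getFileName()),
                            StandardCopyOption.REPLACE_EXISTING));
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }
            return out;
        };
    }

    public static void main(String[] args) throws IOException {
        Path work = Files.createTempDirectory("work");
        Path zip = work.resolve("test.zip");
        try (ZipOutputStream zout = new ZipOutputStream(Files.newOutputStream(zip))) {
            for (String name : List.of("file1.txt", "file2.txt")) {
                zout.putNextEntry(new ZipEntry(name));
                zout.write(name.getBytes());
                zout.closeEntry();
            }
        }
        // R4-like chain without the decrypt step: {"test.zip"} -> {"file1.txt", "file2.txt"}
        List<Path> result = unzip(work).andThen(moveTo(Files.createTempDirectory("destination")))
                .apply(List.of(zip));
        System.out.println(result.size()); // 2
    }
}
```

Because each step returns its result instead of only writing files, a test can assert on the returned list directly, without mocking a final producer.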
enrich is used if you want/need to integrate external data into your route.
The example below illustrates how you could load the actual list of file types from an external file, where the aggregation strategy adds the file types to an exchange header. Afterwards, a split is performed on the enriched header and the exchange is directed to the respective route (or an error is thrown if no route is available for the file type). This route makes use of the enrich EIP, the split EIP and content-based routing.
from(...)
.enrich("file:loadFileList", aggregationStrategy)
.split(header("fileList").tokenize(","))
.to("direct:routeContent");
from("direct:routeContent")
.choice()
.when(header("fileList").isEqualTo("..."))
.to("direct:storeFile")
.when(header("fileList").isEqualTo("..."))
.to("direct:unzip")
.when(header("fileList").isEqualTo("..."))
...
.otherwise()
.throwException(...)
.end();
from("direct:storeFile")
.to("file:destination");
from("direct:unzip")
.split(new ZipSplitter())
.streaming().convertBodyTo(String.class)
.choice()
.when(body().isNotNull())
.to("file:destination")
.otherwise()
.throwException(...)
.end()
.end();
The corresponding aggregation strategy might look something like this:
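// Camel 3.x and later; in Camel 2.x the interface is org.apache.camel.processor.aggregate.AggregationStrategy
import org.apache.camel.AggregationStrategy;
import org.apache.camel.Exchange;

public class FileListAggregationStrategy implements AggregationStrategy {
@Override
public Exchange aggregate(Exchange original, Exchange resource) {
// Store the content of the loaded file (the comma-separated file types) in the "fileList" header
String content = resource.getIn().getBody(String.class);
original.getIn().setHeader("fileList", content);
return original;
}
}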
Note, however, that I have not tested the code myself. I just wrote down a basic structure, mostly off the top of my head.
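As a plain-Java sketch of the split-then-choice idea (not Camel code): the comma-separated file list is tokenized, each token is dispatched to a handler registered for that type, and an unknown type raises an error, like the otherwise() branch. `FileTypeRouter` and its handlers are hypothetical stand-ins for the direct: routes, and trimming whitespace around tokens is a choice of this sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public class FileTypeRouter {
    // Hypothetical routing table: file type -> handler (stand-ins for direct:storeFile, direct:unzip, ...)
    private final Map<String, Consumer<String>> routes = new LinkedHashMap<>();

    FileTypeRouter when(String type, Consumer<String> handler) {
        routes.put(type, handler);
        return this;
    }

    // Like split(...tokenize(",")) followed by choice()/otherwise().throwException(...)
    void dispatch(String fileList) {
        for (String token : fileList.split(",")) {
            String type = token.trim();
            Consumer<String> handler = routes.get(type);
            if (handler == null) {
                throw new IllegalArgumentException("No route for file type: " + type);
            }
            handler.accept(type);
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        FileTypeRouter router = new FileTypeRouter()
                .when("txt", t -> log.add("storeFile:" + t))
                .when("zip", t -> log.add("unzip:" + t));
        router.dispatch("txt,zip");
        System.out.println(log); // [storeFile:txt, unzip:zip]
    }
}
```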
Is it correct to end a route without a producer EIP
AFAIK Camel should "tell you" (in the sense of an error) that no producer is available when you attempt to load the route. However, a simple .log(LoggingLevel.DEBUG, "...") is enough to stop Camel from complaining, if I remember correctly (I haven't worked with Camel in a while now).
You always have the possibility to weave certain routes using AdviceWithRouteBuilder and modify them to your needs. Interceptors also act on certain route definitions, such as .to(...). The easiest way to test routes is to use MockEndpoints: replace the final producer calls (e.g. .to("file:destination")) with your mock endpoint and perform your assertions on it.
In the sample above you could, for example, test only the final unzipping route by sending the respective ZIP archive as the body via a ProducerTemplate and replacing .to("file:destination") with a mock endpoint you perform assertions against. You could also send a single ZIP archive to direct:routeContent and pass along a fileList header with the respective name, so that the routing invokes the unzip route. Here you could either reuse the direct:unzip route definition modified as described above, and thus also check whether the archive could be unzipped, or replace the .to("direct:unzip") invocation with another mock endpoint, and so on.
I have a couple of parallel routes in Camel. One reads SQL data. The other reads a file on disk and then compares it to the previously read SQL data. I need to run route one and, depending on whether anything is imported, run route two.
fromF("quartz2://mio/%s?cron={{route_1_cron}}", order).
log("Running data import...").
to("sql:{{sql_select}}").
choice().
when(body().isNull()).
stop().
when(body().isNotNull()).
bean(Utility.class,"incomingSqlData").
choice().when(header("status").isEqualTo(true)).
to("direct:start").stop();
So far so good. Now, on the second route, how do I start with from("direct:start") and then read the file from its directory? I cannot use from("direct:start").from("file:..."), since that would give the route two inputs.
And using from("direct:start").to("file:...") would try to write to the file.
TL;DR: How should I start a route with direct and then read a file?
To expand on @noMad17's comment, you can use a content enricher. So, your from("direct:start") route can look something like this:
from("direct:start")
.pollEnrich("file:...", new MyAggregationStrategy())
....
This will prompt your route to read a file.
Note that the AggregationStrategy "is used to combine the original exchange and the resource exchange" and is optional. If it is not provided, the body of the resource exchange (i.e. the exchange resulting from reading the file) will overwrite the original exchange.
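Outside Camel, the two ideas in this answer (reading a file on demand with a timeout, and an optional strategy that merges the original with the polled content) can be modelled as below. This is a plain-Java sketch under assumptions: `pollFile` and `enrich` are hypothetical helpers, the 50 ms poll interval is arbitrary, and returning null on timeout only loosely mirrors pollEnrich(uri, timeout) giving up when no file arrives in time.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.BinaryOperator;

public class OnDemandRead {
    // Poll 'file' until it exists or 'timeoutMillis' elapses; returns null if it never appears.
    static String pollFile(Path file, long timeoutMillis) throws IOException, InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (true) {
            if (Files.exists(file)) {
                return new String(Files.readAllBytes(file));
            }
            if (System.currentTimeMillis() >= deadline) return null;
            Thread.sleep(50);
        }
    }

    // No strategy: the polled content replaces the original. Otherwise the strategy combines both.
    static String enrich(String original, String resource, BinaryOperator<String> strategy) {
        return strategy == null ? resource : strategy.apply(original, resource);
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("in");
        Path file = dir.resolve("data.csv");
        Files.write(file, "a,b".getBytes());
        String polled = pollFile(file, 1000);
        System.out.println(enrich("trigger", polled, null));                  // a,b
        System.out.println(enrich("trigger", polled, (o, r) -> o + "|" + r)); // trigger|a,b
        System.out.println(pollFile(dir.resolve("missing.csv"), 100));        // null
    }
}
```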
I'm looking for a way to conditionally handle messages based on the aggregation of messages. I've looked into a lot of ways to do this, but it seems that Apache Camel doesn't support it. I'll explain the scenario and then the solutions I tried.
Scenario:
I'm trying to conditionally clean a directory. I poll the directory every x days and fetch all the files (file://...). I route them into an aggregation that sums the file sizes into a single value (directorySize). I then check whether this size exceeds a certain threshold.
Here is where the problem lies. I now want to remove certain files if this condition passes, but I don't have access to the original messages anymore because they were aggregated in a new exchange.
Solutions:
I tried to fetch the files again to process them. The problem is that, as far as I know, you can't make a consumer fetch on demand. I tried using pollEnrich, but that only fetches a single file and not all the files in the directory.
I tried to filter/stop the parent route. The problem here is that filter()/choice...stop()/end() will only stop the aggregated route with the directory size and not the parent route with the file messages. I can't conditionally process these.
I tried to move the aggregated condition to another route that I would call, but this causes the same problem as the first solution.
Things I consider doing:
Rewrite the aggregation strategy to aggregate not only the size but also the files themselves into a grouped exchange. This way I can split the aggregation again after the check. I don't really like this solution because it causes a lot of boilerplate, both in code and at runtime.
Move the file size calculation to a processor instead of the aggregator. This would defeat the purpose of using Camel in the first place... I would be fetching the files and adding up the sizes manually... and that for every single file...
Use a ControlBus to dynamically start the delete route on that directory. Once again, a lot of workarounds to achieve something that I feel should be doable in a simple route.
I would like to set the calculated size on every parent message, but I have no clue how this could be achieved.
Another way to stop the parent route that I haven't thought of?
I'm a bit stunned that you can't elegantly filter messages based on the aggregation of these messages. Is there something that I missed in Camel that would provide an elegant solution? Or is this a case of the least bad solution?
Simple Schema
Message(File)
Message(File) --> AggregatedMessage(directorySize) --> delete certain Files?
Message(File)
Camel is really awesome, but sometimes it's sure difficult to see exactly which design pattern to use ;)
Firstly, you need to keep a copy of the file objects, because you don't know whether to delete them or not until you reach your threshold - there are basically (at least) two ways to do this.
Alternative 1
The first way is to use a List in an exchange property. This property will hang around no matter what you do with the exchange body. If you have a look at the source code for GroupedExchangeAggregationStrategy, it does precisely this:
list = new ArrayList<Exchange>();
answer.setProperty(Exchange.GROUPED_EXCHANGE, list);
// ...
list.add(newExchange);
Or you could do the same thing manually on your own exchange property. In any case, it's completely fine to use the Grouped aggregation strategy as you have done.
Alternative 2
The second way to "keep" old messages is to send a copy to a stopped SEDA queue. So you would do to("seda:xyz") and define the route that consumes this queue with .noAutoStartup(). Then you can send messages to it and they will queue up on an internal queue managed by Camel. When you want to process the messages, you simply start the route via the Control Bus and stop it again afterwards.
Generally, starting and stopping queues should be avoided unless absolutely necessary, but it's certainly another way to do it.
Suggested solution
I suggest you do as you have done (i.e. alternative 1):
aggregate via GroupedExchangeAggregationStrategy to keep the individual files in a list
Compute the total file size (use a processor, or do it along the way with a custom aggregation strategy)
Use a filter(simple("${body} < 123"))
"Unwind" your aggregation via a split(simple("${property.CamelGroupedExchange}"))
Delete your files one by one
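The suggested steps can be sketched in plain Java as follows, which may help when checking the logic before wiring it into routes. `cleanIfOver` and the `deletable` predicate are hypothetical; which files count as "certain files" is not specified in the question, so the choice is passed in as a parameter. The sketch deletes when the total exceeds the threshold; adjust the comparison to match your own filter expression.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;

public class DirectoryCleaner {
    // Returns the files that were deleted (empty if the threshold was not exceeded).
    static List<Path> cleanIfOver(Path dir, long thresholdBytes, Predicate<Path> deletable) throws IOException {
        // 1. "Aggregate": keep every file, as GroupedExchangeAggregationStrategy keeps every exchange
        List<Path> group = new ArrayList<>();
        try (DirectoryStream<Path> ds = Files.newDirectoryStream(dir)) {
            for (Path p : ds) {
                if (Files.isRegularFile(p)) group.add(p);
            }
        }
        // 2. Compute the total size of the group
        long total = 0;
        for (Path p : group) total += Files.size(p);
        // 3. Filter: stop here unless the threshold is exceeded
        if (total <= thresholdBytes) return Collections.emptyList();
        // 4. "Split" the group again and 5. delete the selected files one by one
        List<Path> deleted = new ArrayList<>();
        for (Path p : group) {
            if (deletable.test(p)) {
                Files.delete(p);
                deleted.add(p);
            }
        }
        return deleted;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("cleanup");
        Files.write(dir.resolve("keep.txt"), new byte[100]);
        Files.write(dir.resolve("old.log"), new byte[300]);
        List<Path> deleted = cleanIfOver(dir, 250, p -> p.toString().endsWith(".log"));
        System.out.println(deleted.size()); // 1: total 400 > 250, so old.log is removed
    }
}
```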
Please let me know if this doesn't make sense, or if I have misunderstood your problem in any way.
I have 2 routes. The first route uses poll enrich to check if a file is present. The second route uses a poll enrich on the same uri to read and process the file. The first route invokes the second via a SEDA queue, like so:
public void configure() throws Exception {
String myFile = "file://myDir?fileName=MyFile.zip&delete=false&readLock=none";
from("direct:test")
.pollEnrich(myFile, 10000)
.to("seda:myQueue")
;
from("seda:myQueue")
.pollEnrich(myFile, 10000)
.log("Do something with the body")
;
}
As it stands, if I execute the first route, the poll enrich finds a file, but when the poll enrich in the second route executes, it returns a body of null. If I just execute the second route on its own, it retrieves the file correctly.
Why does the second poll enrich return null? Is the file locked? (I was hoping that a combination of noop, readLock and delete=false would prevent any locking.)
Does camel consider the second poll enrich as a duplicate, therefore filtering it out? (I have tried implementing my own IdempotentRepository to return false on contains(), but the second pollEnrich still returns null)
You may wonder why I'm trying to enrich from two routes: the first route has to check whether a number of files exist, and only when all files are present (i.e., pollEnrich doesn't return null) can the second route start processing them.
Is there an alternative to pollEnrich that I can use? I'm thinking that perhaps I'll need to create a bean that retrieves a file by URI and returns it as the body.
I'm using Camel 2.11.0.
I realize this is now an old topic, but I just had a similar problem.
I suggest you try the options:
noop=true
which you already have, and
idempotent=false
to tell Camel it is OK to process the same file twice.
Update after testing:
I actually tested this with both settings as suggested above. It works sometimes, but under moderate load it fails, i.e. it returns a null body for some exchanges, although not all.
The documentation indicates that setting noop=true automatically sets idempotent=true, so I am not sure the idempotent setting is being honoured in this case.
Is there any specific reason why you are not using just one route?
I don't understand why you are using two routes for this. File component can check if the file is there and if it is, pull it. If you are worried about remembering the files so you don't get duplicates, you can use an idempotent repository. At least, based on your question, I don't think you need to complicate the logic using two routes and the content enricher EIP.
The second route returns null because the file was already consumed by the first route. If you are just looking for a signal message when all files are present, use a file consumer along with an aggregator, and possibly a claim check to avoid carrying large payloads around in memory, etc.
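The aggregator approach boils down to a completion check that fires once, when the last expected file has been seen. Here is a plain-Java sketch of that check, assuming the set of expected file names is known up front; the class name is hypothetical, and in Camel this logic would live in an aggregator's completion condition rather than in a hand-written class.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class AllFilesPresentAggregator {
    private final Set<String> expected;
    private final Set<String> seen = new HashSet<>();

    AllFilesPresentAggregator(Collection<String> expectedNames) {
        this.expected = new HashSet<>(expectedNames);
    }

    // Called once per consumed file; returns true exactly when the last expected file arrives.
    boolean offer(String fileName) {
        if (!expected.contains(fileName)) return false; // ignore unrelated files
        boolean wasComplete = seen.containsAll(expected);
        seen.add(fileName);
        return !wasComplete && seen.containsAll(expected);
    }

    public static void main(String[] args) {
        AllFilesPresentAggregator agg = new AllFilesPresentAggregator(List.of("a.zip", "b.zip"));
        System.out.println(agg.offer("a.zip"));     // false
        System.out.println(agg.offer("other.txt")); // false
        System.out.println(agg.offer("b.zip"));     // true: signal the processing route
    }
}
```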
As you've probably learned, this does not work as one might expect:
noop=true&idempotent=false
My guess is that Camel ignores idempotent=false and, as documented, uses an instance of MemoryMessageIdRepository. To work around this, you can configure the file endpoint to use a custom idempotent repository:
noop=true&idempotentRepository=#myRepo
and register custom repository in the registry or spring context:
@Component("myRepo")
public class MyRepo implements IdempotentRepository {
@Override
public boolean contains(Object o) {
// Never report a key as already processed, so the same file can be consumed again
return false;
}
...
}
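The effect of a repository whose contains() always returns false can be modelled in a few lines of plain Java. `SeenRepository` is a hypothetical two-method interface, deliberately smaller than Camel's IdempotentRepository, and `poll` only imitates a noop=true consumer, where files stay in the directory and the repository alone decides what is skipped.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class IdempotentPollModel {
    // Hypothetical minimal repository; Camel's IdempotentRepository has more methods.
    interface SeenRepository {
        boolean contains(String key);
        void add(String key);
    }

    // Remembers every key, like an in-memory idempotent repository.
    static final class InMemoryRepo implements SeenRepository {
        private final Set<String> keys = new HashSet<>();
        public boolean contains(String key) { return keys.contains(key); }
        public void add(String key) { keys.add(key); }
    }

    // Never reports a key as seen, so the same file can be picked up on every poll.
    static final class ForgetfulRepo implements SeenRepository {
        public boolean contains(String key) { return false; }
        public void add(String key) { }
    }

    // One noop-like poll: files stay in place and the repository decides what is skipped.
    static List<String> poll(List<String> filesInDir, SeenRepository repo) {
        List<String> picked = new ArrayList<>();
        for (String f : filesInDir) {
            if (repo.contains(f)) continue;
            repo.add(f);
            picked.add(f);
        }
        return picked;
    }

    public static void main(String[] args) {
        List<String> dir = List.of("MyFile.zip");
        SeenRepository memory = new InMemoryRepo();
        poll(dir, memory);
        System.out.println(poll(dir, memory));    // []: the second poll finds nothing
        SeenRepository forgetful = new ForgetfulRepo();
        poll(dir, forgetful);
        System.out.println(poll(dir, forgetful)); // [MyFile.zip]
    }
}
```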
Try pollEnrich with strategyMethodAllowNull="true". By default, this value is false. When it is false, the aggregation strategy looks for the existing Exchange body to aggregate the content returned from the file.
When strategyMethodAllowNull="true", the existing body is considered null, so every time the content of the file is set into the current exchange body.