I want to write a Camel route that takes input from multiple file locations and processes the files after aggregating them.
Is it possible to take input from multiple files in a single route?
Yes, you can use pollEnrich to call consumer endpoints such as file to enrich the message. This also works for many other consumer endpoints, such as SFTP or message queues.
If you need to read the same file multiple times it gets trickier: you will likely have to set noop=true and possibly use something like a dummy idempotent repository to get around Camel's default behavior.
Note that calling pollEnrich seems to clear the headers / create a new message, so use exchange properties to persist data between pollEnrich calls.
from("file:someDirectory")
.setProperty("file1").body()
.pollEnrich("file:otherDirectory", 3000)
.setProperty("file2").body()
.pollEnrich("file:yetAnotherDirectory", 3000)
.setProperty("file3").body();
I have this simple route in my RouteBuilder.
from("amq:MyQueue").routeId(routeId).log(LoggingLevel.DEBUG, "Log: ${in.headers} - ${in.body}")
As stated in the doc for HTTP-component:
Camel will store the HTTP response from the external server on the OUT body. All headers from the IN message will be copied to the OUT message, ...
I would like to know whether this concept also applies to the amq component, routeId, and log. Is it the default behaviour that IN always gets copied to OUT?
Thank you,
Hadi
First of all: The concept of IN and OUT messages is deprecated in Camel 3.x.
This is mentioned in the Camel 3 migration guide and also annotated on the getOut method of the Camel Exchange.
It has not been removed (yet), but the takeaway is: don't worry about the OUT message. Use the getMessage method and stop using getIn and getOut.
To answer your question:
Yes, most components behave like this
Every step in the route takes the (IN) message and processes it
The body is typically overwritten with the new processing result
The headers typically stay, new headers can be added
So while the Camel Exchange traverses the route, typically the body is continuously updated and the header list grows.
However, some components like aggregator create new messages based on an AggregationStrategy. In such cases nothing is copied automatically and you have to implement the strategy to your needs.
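To make the difference concrete, here is a small plain-Java model of the two behaviours described above. It is only a sketch: Msg, step and aggregate are hypothetical names invented for illustration and are not Camel API, and real components may deviate from this simplified picture.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java model only; none of these types belong to Camel.
class MessageFlowSketch {

    /** Hypothetical stand-in for a message: one body plus a header map. */
    static final class Msg {
        Object body;
        final Map<String, Object> headers = new LinkedHashMap<>();
    }

    /** Typical step: overwrites the body with its result, keeps existing headers, adds one. */
    static Msg step(Msg in, Object result, String headerName, Object headerValue) {
        in.body = result;
        in.headers.put(headerName, headerValue);
        return in;
    }

    /** Aggregator-like step: builds a NEW message, so only what is copied here survives. */
    static Msg aggregate(Msg first, Msg second) {
        Msg out = new Msg();
        out.body = first.body + "+" + second.body;
        // Headers are deliberately not copied: nothing is carried over automatically.
        return out;
    }

    public static void main(String[] args) {
        Msg m = new Msg();
        m.body = "raw";
        step(m, "parsed", "parsedBy", "stepOne");
        step(m, "validated", "validatedBy", "stepTwo");
        System.out.println(m.body + " " + m.headers.keySet());

        Msg other = new Msg();
        other.body = "second";
        System.out.println(aggregate(m, other).headers.isEmpty());
    }
}
```

In the model, the header map only grows while the body is replaced at every step, whereas the aggregated message starts out with no headers at all unless the strategy copies them.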
Let's consider the following use case:
a set of providers pushes data in a corresponding directory on a local server (e.g. P1 pushes data into data/P1, P2 into data/P2, etc.)
each provider has its own generation rules (e.g. P1 generates plain txt files, P2 generates archives, P3 generates encrypted files, etc.)
on a Spring Boot application running on the server, each provider has its own Camel route which, every 10 minutes, reads from the corresponding directory (e.g. R1 reads from("file:data/P1"), R2 reads from("file:data/P2"), etc.)
the given provider can also combine rules (e.g. P4 generates archives containing encrypted data)
depending on the route, read data is then processed accordingly in order to move plain txt files to a target directory (e.g. R2 unzips data and moves it, R4 unzips data, decrypts the extraction result and moves it, etc.)
As soon as more routes are implemented, it immediately appears obvious that most of the code is duplicated and can be extracted; in fact, since rules can be combined, each data elaboration could be seen as an atomic operation, available for the given route.
Let's consider, for instance, the following atomic operations:
unzip
decrypt
move
So, here's what those routes could look like:
R1
from("file:data/P1")
.to("file:destination")
R2
from("file:data/P2")
// UNZIP LOGIC HERE
.to("file:destination")
R3
from("file:data/P3")
// DECRYPT LOGIC HERE
.to("file:destination")
R4
from("file:data/P4")
// UNZIP LOGIC HERE
// DECRYPT LOGIC HERE
.to("file:destination")
Since I want to extract common logic, I see two main options here (with corresponding R4 resulting code):
extract the logic into a custom component
from("file:data/P4")
// FOR EACH FILE
.to("my-custom-component:unzip")
.to("my-custom-component:decrypt")
.to("file:destination")
extract the logic into smaller routes
from("file:data/P4")
// FOR EACH FILE
.to("direct:my-unzip-route")
.to("direct:my-decrypt-route")
.to("file:destination")
(of course, this is a super simplification, but it's just to give you the big picture).
Between those two options, I prefer the latter, which allows me to quickly reuse Camel EIPs (e.g. unmarshal().pgp()):
from("file:data/P4")
.to("direct:my-unzip-route")
.to("direct:my-decrypt-route")
.to("file:destination");
from("direct:my-unzip-route")
// LOGIC
.unmarshal().zip()
// MORE LOGIC
;
from("direct:my-decrypt-route")
// LOGIC
.unmarshal().pgp()
// MORE LOGIC
;
First question: since the given sub-route changes the original set of files (e.g. unzip could transform one archive into 100 files), would it be better to use enrich() instead of to()?
Second question: what am I supposed to route between those sub-routes? Right now, due to implementation details not explained here for simplicity, I'm routing a collection of file names so, instead of the from("file:data/P4") I have a from("direct:read-P4") which receives a list of file names as input and then propagates this list to the given sub-routes; each sub-route, starting from the list of file names, applies its own logic, generating new files and returning a body with the updated list (e.g. receives {"test.zip"} and returns {"file1.txt", "file2.txt"}).
So, the given sub-route looks like this:
from("direct:...")
// FOR EACH FILE NAME
// APPLY TRANSFORMATION LOGIC TO THE CORRESPONDING FILE
.setBody( // UPDATED LIST OF FILE NAMES )
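For illustration, the "list of names in, updated list of names out" contract described above could be sketched in plain Java as follows. This is not the actual implementation behind the routes: the class UnzipStepSketch, the method unzipAll and the helper sampleDir are hypothetical, and the sketch assumes that all names are relative to a single working directory and that archives are recognised by a .zip suffix.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class UnzipStepSketch {

    /** Extracts every *.zip in names (relative to dir) and returns the updated list of names. */
    static List<String> unzipAll(Path dir, List<String> names) {
        Path base = dir.toAbsolutePath().normalize();
        List<String> updated = new ArrayList<>();
        for (String name : names) {
            if (!name.endsWith(".zip")) {
                updated.add(name); // not an archive: pass the name through unchanged
                continue;
            }
            try (ZipInputStream zip = new ZipInputStream(Files.newInputStream(base.resolve(name)))) {
                for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                    if (e.isDirectory()) {
                        continue;
                    }
                    Path target = base.resolve(e.getName()).normalize();
                    if (!target.startsWith(base)) { // refuse entries such as ../../evil.txt
                        throw new IOException("Entry escapes target directory: " + e.getName());
                    }
                    Files.createDirectories(target.getParent());
                    Files.copy(zip, target, StandardCopyOption.REPLACE_EXISTING);
                    updated.add(base.relativize(target).toString());
                }
            } catch (IOException ex) {
                throw new UncheckedIOException(ex);
            }
        }
        return updated;
    }

    /** Demo helper: creates a temp directory holding one archive with the given entries. */
    static Path sampleDir(String zipName, String... entryNames) {
        try {
            Path dir = Files.createTempDirectory("unzip-sketch");
            try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(dir.resolve(zipName)))) {
                for (String entry : entryNames) {
                    out.putNextEntry(new ZipEntry(entry));
                    out.write(("content of " + entry).getBytes(StandardCharsets.UTF_8));
                    out.closeEntry();
                }
            }
            return dir;
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        Path dir = sampleDir("test.zip", "file1.txt", "file2.txt");
        System.out.println(unzipAll(dir, Arrays.asList("test.zip")));
    }
}
```

A step written this way has no side channel: everything the next step needs is in the returned list, which is also what makes it easy to assert on in a test. The original archive is left in place in this sketch.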
Is it correct to end a route without a producer EIP? Or am I supposed to end it with the to() which generates the given new file? If so, it would be mandatory for the next sub-route to read all data again from the very same directory, and that doesn't seem so optimized, since I already know which files have to be taken into consideration.
Third question: supposing it's ok to let the given sub-route transform data and return the corresponding names list, how am I supposed to test it? It wouldn't be possible to mock the ending producer, since I'm not completing the route with a producer... so, do I have to use interceptors? Or what else?
Basically, I'm asking this question because I have a perfectly working set of routes but, while writing the tests, I noticed that some of them are unnatural to test... and this could easily be the result of a wrong design.
enrich is used if you want/need to integrate external data into your route
The example below illustrates how you could load the actual list of file types from an external file, where the aggregation strategy will add the file types to the exchange header. Afterwards a split is performed on the enriched header and the exchange directed to the respective route (or an error is thrown if no route is available for the file type). This route makes use of the enrich EIP, the split EIP and content based routing.
from(...)
    // note: to actually read from a file endpoint, pollEnrich is needed (enrich calls the producer)
    .enrich("file:loadFileList", aggregationStrategy)
    .split(header("fileList").tokenize(","))
        .to("direct:routeContent");
from("direct:routeContent")
    .choice()
        .when(header("fileList").isEqualTo("..."))
            .to("direct:storeFile")
        .when(header("fileList").isEqualTo("..."))
            .to("direct:unzip")
        .when(header("fileList").isEqualTo("..."))
            ...
        .otherwise()
            .throwException(...)
    .end();
from("direct:storeFile")
    .to("file:destination");
from("direct:unzip")
    .split(new ZipSplitter())
        .streaming().convertBodyTo(String.class)
        .choice()
            .when(body().isNotNull())
                .to("file:destination")
            .otherwise()
                .throwException(...)
        .end()
    .end();
The corresponding aggregation strategy might look something like this:
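import org.apache.camel.AggregationStrategy; // Camel 3.x; in Camel 2.x: org.apache.camel.processor.aggregate.AggregationStrategy
import org.apache.camel.Exchange;

public class FileListAggregationStrategy implements AggregationStrategy {
    @Override
    public Exchange aggregate(Exchange original, Exchange resource) {
        // Copy the content of the enrichment resource into a header of the original exchange.
        String content = resource.getIn().getBody(String.class);
        original.getIn().setHeader("fileList", content);
        return original;
    }
}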
Note, however, that I have not tested the code myself. I just wrote down a basic structure, mostly off the top of my head.
Is it correct to end a route without a producer EIP?
AFAIK Camel should "tell you" (in the sense of an error) that no producer is available when you attempt to load the route. However, a simple .log(LoggingLevel.DEBUG, "...") is enough to stop Camel complaining, if I remember correctly (I haven't worked with Camel in a while now).
You always have the possibility to weave routes using AdviceWithRouteBuilder and modify them to your needs. The interceptor itself also acts on certain route definitions, such as .to(...). The easiest way to test routes is to use mock endpoints: replace the final producer call (e.g. .to("file:destination")) with your mock endpoint and perform your assertions on it.
In the sample above you could, for example, test only the final unzipping route by sending the respective ZIP archive as body via a ProducerTemplate and replacing .to("file:destination") with a mock endpoint you assert against. You could also send a single ZIP archive to direct:routeContent and pass along a fileList header with the respective name, so that the routing invokes the unzip route. Here you could either reuse the modified direct:unzip route definition, and thus also check whether the archive could be unzipped, or replace the .to("direct:unzip") invocation with another mock endpoint.
I would like to read files from a directory with the Camel file consumer, but I need my route to be transacted, so I cannot use threads inside the route.
Is it OK to write multiple routes that read from the same endpoint (same directory) with a small difference between the URIs (for example the sort type), and in this way avoid the "Multiple consumers for the same endpoint is not allowed" exception?
Yes, you can do that. Keep in mind that you will now have competing consumers for the same files, so pay attention to read locks. By default Camel uses a marker file.
You can also use different delays so the consumers don't poll at the same interval/time, and you can sort randomly to reduce the chance of several consumers picking up the same files.
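To show why a marker file keeps competing consumers apart, here is a simplified plain-Java model of the idea. It is a sketch under assumptions, not Camel's implementation: the class and method names are hypothetical, and the .camelLock suffix follows the naming described in the Camel file component documentation (fileName.camelLock).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Simplified model of a marker-file read lock; not Camel's actual code.
class MarkerFileLockSketch {

    static final String SUFFIX = ".camelLock";

    /** The marker lives next to the data file: data.txt -> data.txt.camelLock */
    static Path markerFor(Path file) {
        return file.resolveSibling(file.getFileName() + SUFFIX);
    }

    /** Returns true if this caller created the marker, false if another consumer holds it. */
    static boolean tryLock(Path file) {
        try {
            // createFile checks for existence and creates the file in one atomic operation
            Files.createFile(markerFor(file));
            return true;
        } catch (FileAlreadyExistsException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Releases the lock by deleting the marker (no-op if it is already gone). */
    static void unlock(Path file) {
        try {
            Files.deleteIfExists(markerFor(file));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = Paths.get(System.getProperty("java.io.tmpdir"), "demo-" + System.nanoTime() + ".txt");
        System.out.println("consumer A got lock: " + tryLock(file));
        System.out.println("consumer B got lock: " + tryLock(file));
        unlock(file);
    }
}
```

Whichever consumer creates the marker first processes the file; the others skip it on that poll. This is also why a stale marker left behind by a crashed consumer can block a file until it is cleaned up.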
I have a couple parallel routes in camel. One is reading sql data. One is reading a file on disk and then comparing to the prior sql data. I need to run route one, and based on if anything is imported, run route 2.
fromF("quartz2://mio/%s?cron={{route_1_cron}}", order).
log("Running data import...").
to("sql:{{sql_select}}").
choice().
when(body().isNull()).
stop().
when(body().isNotNull()).
bean(Utility.class,"incomingSqlData").
choice().when(header("status").isEqualTo(true)).
to("direct:start").stop();
So far I am good. Now, in the second route, how do I start with from("direct:start") and then read the file from its directory? I cannot have from("direct:start").from("file:..."), since that would create two from routes.
And using from("direct:start").to("file:...") will try to write to the file.
TL;DR: How should I start a route with direct and then read a file?
To expand on @noMad17's comment, you can use a content enricher. So your from("direct:start") route can look something like:
from("direct:start")
.pollEnrich("file:...", new MyAggregationStrategy())
....
This will prompt your route to read a file.
Note that the AggregationStrategy "is used to combine the original exchange and the resource exchange" and is optional. If it is not provided, the body of the resource exchange (i.e. the exchange resulting from reading the file) overwrites the body of the original exchange.
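The effect of leaving the strategy out can be modelled in a few lines of plain Java. This is only an illustration of the rule stated above: PollEnrichMergeSketch and merge are hypothetical names, bodies are reduced to strings, and a null argument stands in for "no AggregationStrategy provided".

```java
import java.util.function.BinaryOperator;

// Plain-Java model of the merge rule; not Camel API.
class PollEnrichMergeSketch {

    /**
     * original: body before the poll (e.g. the SQL result)
     * resource: body of the polled file
     * strategy: how to combine the two, or null if none was provided
     */
    static String merge(String original, String resource, BinaryOperator<String> strategy) {
        if (strategy == null) {
            return resource; // no strategy: the resource body replaces the original body
        }
        return strategy.apply(original, resource);
    }

    public static void main(String[] args) {
        System.out.println(merge("sql rows", "file content", null));
        System.out.println(merge("sql rows", "file content", (o, r) -> o + " | " + r));
    }
}
```

So if the second route still needs the SQL data after reading the file, it either has to provide a strategy that keeps both, or stash the original data somewhere else before the poll.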