Camel from direct read file and process

Camel from direct read file and process - apache-camel

I have a couple parallel routes in camel. One is reading sql data. One is reading a file on disk and then comparing to the prior sql data. I need to run route one, and based on if anything is imported, run route 2.
fromF("quartz2://mio/%s?cron={{route_1_cron}}", order).
log("Running data import...").
to("sql:{{sql_select}}").
choice().
when(body().isNull()).
stop().
when(body().isNotNull()).
bean(Utility.class,"incomingSqlData").
choice().when(header("status").isEqualTo(true).
to("direct:start").stop();
So far I am good. Now on the second route how do I start with from(direct:start) and then read the file from it's directory? Since I cannot have from(direct).from("file:..), since that would create two from routes.
And using from("direct:start").to("file:...") will try to write to the file.
Tl:dr: How should I start a route with direct and then read a file?

To expand on #noMad17 comment, you can use a content enricher. So, your from("direct:start") route can look something like:
from("direct:start")
.pollEnrich("file:...", new MyAggregationStrategy())
....
This will prompt your route to read a file.
Note that AggregationStrategy"is used to combine the original exchange and the resource exchange" and is optional. If not provided, then the body of resource exchange (i.e. the exchange resulting from reading the file) will overwrite the original exchange.

Related

Taking input from multiple files to a camel route

I want to write a camel route which will take input from multiple file destination and process them after aggregating.
is it possible to take input from multiple files for a single route?

Yes you can use poll-enrich to call consumer-endpoints like file to enrich the message. This works for many other consumer-endpoints as well like SFTP or message queues.
If you need to read same file multiple times it can get trickier as you'll likely have to set noop=true and possibly use something like dummy idempotent repository to get around camels default behavior.
Note that calling pollEnrich seems to clear headers / create new message so use exchange properties to persist data between pollEnrich calls.
from("file:someDirectory")
.setProperty("file1").body()
.pollEnrich("file:otherDirectory", 3000)
.setProperty("file2").body()
.pollEnrich("file:yetAnotherDirectory", 3000)
.setProperty("file3").body();

in-Message copied in out-Message

I have this simple route in my RouteBuilder.
from("amq:MyQueue").routeId(routeId).log(LoggingLevel.DEBUG, "Log: ${in.headers} - ${in.body}")
As stated in the doc for HTTP-component:
Camel will store the HTTP response from the external server on the OUT body. All headers from the IN message will be copied to the OUT message, ...
I would like to know if this concept also applies to amq-component, routeId, and log? Is it the default behaviour, that IN always gets copied to OUT?
Thank you,
Hadi

First of all: The concept of IN and OUT messages is deprecated in Camel 3.x.
This is mentioned in the Camel 3 migration guide and also annotated on the getOut method of the Camel Exchange.
However, it is not (yet) removed, but what you can take from it: don't care about the OUT message. Use the getMessage method and don't use getIn and getOut anymore.
To answer your question:
Yes, most components behave like this
Every step in the route takes the (IN) message and processes it
The body is typically overwritten with the new processing result
The headers typically stay, new headers can be added
So while the Camel Exchange traverses the route, typically the body is continuously updated and the header list grows.
However, some components like aggregator create new messages based on an AggregationStrategy. In such cases nothing is copied automatically and you have to implement the strategy to your needs.

Apache Camel: how to isolate logic into testable atomic routes

Let's consider the following use case:
a set of providers pushes data in a corresponding directory on a local server (e.g. P1 pushes data into data/P1, P2 into data/P2, etc.)
each provider has its own generation rules (e.g. P1 generates plain txt files, P2 generates archives, P3 generates encrypted files, etc.)
on a Spring Boot application running on the server, each provider has its own Camel route which, every 10 minutes, reads from the corresponding directory (e.g. R1 reads from("file:data/P1"), R2 reads from("file:data/P2"),
etc.)
the given provider can also combine rules (e.g. P4 generates archives containing encrypted data)
depending on the route, read data is then processed accordingly in order to move plain txt files to a target directory (e.g. R2 unzips data and moves it, R4 unzips data, decrypts the extraction result and moves it, etc.)
As soon as more routes are implemented, it immediately appears obvious that most of the code is duplicated and can be extracted; in fact, since rules can be combined, each data elaboration could be seen as an atomic operation, available for the given route.
Let's consider, for instance, the following atomic operations:
unzip
decrypt
move
So, here's how those routes could look like:
R1
from("file:data/P1")
.to("file:destination")
R2
from("file:data/P2")
// UNZIP LOGIC HERE
.to("file:destination")
R3
from("file:data/P3")
// DECRYPT LOGIC HERE
.to("file:destination")
R4
from("file:data/P4")
// UNZIP LOGIC HERE
// DECRYPT LOGIC HERE
.to("file:destination")
Since I want to extract common logic, I see two main options here (with corresponding R4 resulting code):
extract the logic into a custom component
from("file:data/P4")
// FOR EACH FILE
.to("my-custom-component:unzip")
.to("my-custom-component:decrypt")
.to("file:destination")
extract the logic into smaller routes
from("file:data/P4")
// FOR EACH FILE
.to("direct:my-unzip-route")
.to("direct:my-decrypt-route")
.to("file:destination")
(of course, this is a super simplification, but it's just to give you the big picture).
Between those two options, I prefer the latter, which allows me to quickly reuse Camel EIPs (e.g. unmarshal().pgp()):
from("file:data/P4")
.to("direct:my-unzip-route")
.to("direct:my-decrypt-route")
.to("file:destination");
from("direct:my-unzip-route")
// LOGIC
.unmarshal().zip()
// MORE LOGIC
;
from("direct:my-decrypt-route")
// LOGIC
.unmarshal().pgp()
// MORE LOGIC
;
First question: since the given sub-route changes the original set of files (e.g. unzip could transform one archive into 100 files), would it be better to use enrich() instead of to()?
Second question: what am I supposed to route between those sub-routes? Right now, due to implementation details not explained here for simplicity, I'm routing a collection of file names so, instead of the from("file:data/P4") I have a from("direct:read-P4") which receives a list of file names as input and then propagates this list to the given sub-routes; each sub-route, starting from the list of file names, applies its own logic, generating new files and returning a body with the updated list (e.g. receives {"test.zip"} and returns {"file1.txt", "file2.txt"}).
So, the given sub-route looks like this:
from("direct:...")
// FOR EACH FILE NAME
// APPLY TRANSFORMATION LOGIC TO THE CORRESPONDING FILE
.setBody( // UPDATED LIST OF FILE NAMES )
Is it correct to end a route without a producer EIP? Or am I supposed to end it with the to() which generates the given new file? If so, it would be mandatory for the next sub-route to read all data again from the very same directory, and that doesn't seem so optimized, since I already know which files have to be taken into consideration.
Third question: supposing it's ok to let the given sub-route transform data and return the corresponding names list, how am I supposed to test it? It wouldn't be possible to mock the ending producer, since I'm not completing the route with a producer... so, do I have to use interceptors? Or what else?
Basically, I'm making this question because I have a perfectly working set of routes but, during tests creation, I've noticed that some of them are unnatural to test... and this could easily be the result of a wrong design.

enrich is used if you want/need to integrate external data into your route
The example below illustrates how you could load the actual list of file types from an external file, where the aggregation strategy will add the file types to the exchange header. Afterwards a split is performed on the enriched header and the exchange directed to the respective route (or an error is thrown if no route is available for the file type). This route makes use of the enrich EIP, the split EIP and content based routing.
from(...)
.enrich("file:loadFileList", aggregationStrategy)
.split(header("fileList").tokenize(","))
.to("direct:routeContent");
from("direct:routeContent")
.choice()
.when(header("fileList").isEqualTo("..."))
.to("direct:storeFile")
.when(header("fileList").isEqualTo("..."))
.to("direct:unzip")
.when(header("fileList").isEqualTo("..."))
...
.default()
.throw(...)
.end()
from("direct:storeFile")
.to("file:destination");
from("direct:unzip")
.split(new ZipSplitter())
.streaming().convertBodyTo(String.class)
.choice()
.when(body().isNotNull())
.to("file:destination")
.default()
.throw(...)
.end()
.end()
The corresponding aggregation strategy might look something like this:
public class FileListAggregationStrategy implements AggregationStrategy {
#Override
public Exchange aggregate(Exchange original, Exchange resource) {
String content = resource.getIn().getBody(String.class);
origina.setHeader("fileList", content);
}
}
Note however, that I have not tested the code myself. I just wrote down a basic structure mostly on top of my head.
Is it correct to end a route without a producer EIP
AFAIK Camel should "tell you" (in the sense of an error) that no producer is available when you attempt to load the route. A simple .log(LoggingLevel.DEBUG, "...") however is enough to stop Camel complaining if I remember correctly (haven't worked with Camel in a while now).
You always have the possibility to weave certain routes using AdviceRouteBuilder and modify routes to your needs. The intercepter itself also acts on certain route definitions, such as .to(...) i.e. The easiest way to test routes is to use MockEndpoints and replace the final producer calls (i.e. .to("file:destination") with your mock endpoint and perform your assertions on it. In the above mentioned sample you could i.e. only test the final unzipping route by providing the ProducerTemplate with the respective ZIP archive as body and replace the .to("file:destination") with a mock endpoint you perform assertions against. You could also send a single ZIP archive to the direct:routeContent and pass along a fileList header with the respective name so that the routing should invoke the unzip route. Here you could either reuse the above modified direct:unzip route definition and thus also check whether the archive could be unzipped or you could replace the .to("direct:unzip") route invocation with an other mock endpoint and such.

Multiple consumers for the same endpoint is not allowed

I would like to read files from a directory with camel file consumer but I need my route to be transacted. So I can not use threads inside the rout.
Is it ok to write multiply routes to read from the same endpoint (same directory) with a little change between the uris (for example the sort type) , and like this to avoid the Multiple consumers for the same endpoint is not allowed exception ?

Yeah sure you can do that, mind that you will have competing consumes for the same files now, so mind about read-locks. By default Camel use the marker file.
You can also use different delay so they dont poll at the same interval/time. And you can sort by random to make less chance of processing the same files.

Camel use pollEnrich from same URI multiple times returns null body

I have 2 routes. The first route uses poll enrich to check if a file is present. The second route uses a poll enrich on the same uri to read and process the file. The first route invokes the second via a SEDA queue, like so:
public void configure() throws Exception {
String myFile = "file://myDir?fileName=MyFile.zip&delete=false&readLock=none";
from("direct:test")
.pollEnrich(myFile, 10000)
.to("seda:myQueue")
;
from("seda:myQueue")
.pollEnrich(myFile, 10000)
.log("Do something with the body")
;
}
As it stands, if I execute the first route, the poll enrich finds a file, but when the poll enrich in the second route executes, it returns a body of null. If I just execute the second route on its own, it retrieves the file correctly.
Why does the second poll enrich return null, is the file locked? (I was hoping using a combination of noop,readLock, and delete=false would prevent any locking)
Does camel consider the second poll enrich as a duplicate, therefore filtering it out? (I have tried implementing my own IdempotentRepository to return false on contains(), but the second pollEnrich still returns null)
You may wonder why I'm trying to enrich from 2 routes, the first route has to check if a number of files exist, only when all files are present (i.e., pollEnrich doesn't return null) can the second route start processing them.
Is there an alternative to pollEnrich that I can use? I'm thinking that perhaps I'll need to create a bean that retrieves a file by URI and returns it as the body.
I'm using camel 2.11.0

I realize this is now an old topic, but I just had a similar problem.
I suggest you try the options:
noop=true
which you already have, and
idempotent=false
To tell Camel it is OK to process the same file twice.
Update after testing:
I actually tested this with both settings as suggested above, it works some times, but under moderate load, it fails, i.e. returns null body for some exchanges, although not all.
The documentation indicates that setting noop=true automatically sets idempotent=true, so I am not sure the idempotent setting is being honoured in this case.

Is there any specific reason why you are not using just one route?
I don't understand why you are using two routes for this. File component can check if the file is there and if it is, pull it. If you are worried about remembering the files so you don't get duplicates, you can use an idempotent repository. At least, based on your question, I don't think you need to complicate the logic using two routes and the content enricher EIP.

the second route returns NULL because the file was already consumed in the first route...if you are just looking for a signal message when all files are present, then use a file consumer along with an aggregator and possibly a claim check to avoid carrying around large payloads in memory, etc...

As you've probably learned this does not work as one might expect
noop=true&idempotent=false
my guess is that Camel ignores idempotent=false and as documented uses instance of MemoryMessageIdRepository. To work around this, one can configure file endpoint to use custom idempotent repo:
noop=true&idempotentRepository=#myRepo
and register custom repository in the registry or spring context:
#Component("myRepo")
public class MyRepo implements IdempotentRepository {
#Override
public boolean contains(Object o) {
return false;
}
...
}

Try pollEnrich with strategyMethodAllowNull="true". By default , this value is false. When it is false, the aggregation strategy looks for the existing Exchange body, to aggregate the content returned from file.
When we make strategyMethodAllowNull="true", the existing body is considered as null. So every time , the content of the file is set into the current exchange body

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight

Camel from direct read file and process - apache-camel

Related

Taking input from multiple files to a camel route

in-Message copied in out-Message

Apache Camel: how to isolate logic into testable atomic routes

Multiple consumers for the same endpoint is not allowed

Camel use pollEnrich from same URI multiple times returns null body

Categories

Resources