How to get the number of iterations in a split? - apache-camel

I am new to Apache Camel.
I need to split a file line by line and do some operation on each line.
At the end I need a footer line with information from the previous lines (the number of lines and the sum of the values of one column).
My understanding is that I should be using an aggregation strategy, so I tried something like this:
.split(body().tokenize("\r\n|\n"), sumAggregationStrategy)
.process("fileProcessor")
In my aggregation strategy I just set two headers with the incremented values:
newExchange.getIn().setHeader("sum", sum);
newExchange.getIn().setHeader("numberOfLines", numberOfLines);
And in the processor I try to access those headers:
int sum = inMessage.getIn().getHeader("sum", Integer.class);
int numberOfLines = inMessage.getIn().getHeader("numberOfLines", Integer.class);
There are two problems.
First of all, the aggregation strategy seems to be called after the first iteration of the processor.
Second, my headers don't exist in the processor, so I can't access the information I need when I am at the last line of the file. The headers do exist in the oldExchange of the aggregator, though.
I think I can still do it, but I would have to create a new processor just for the purpose of writing the last line of the file.
Is there something I'm missing with aggregation strategies? Is there a better way to do this?

The aggregation strategy will be called for every iteration of the split. That is how it is supposed to work.
The reason you don't see the headers within the processor is that headers live and die with the message and are not visible outside it. You need to set 'sum' and 'numberOfLines' as exchange properties instead. Because every iteration within a split results in a new exchange, you need to get the properties from the old exchange and set them again on the new exchange to pass them to subsequent components in the route.
This is how you could do it:
AggregationStrategy:
public class SumAggregationStrategy implements AggregationStrategy {

    public Exchange aggregate(Exchange oldExchange, Exchange newExchange) {
        long sum = 0;
        long numberOfLines = 0;
        if (oldExchange != null) {
            sum = oldExchange.getProperty("sum", Long.class);
            numberOfLines = oldExchange.getProperty("numberOfLines", Long.class);
        }
        // Line is whatever type your row has been converted to
        sum = sum + ((Line) newExchange.getIn().getBody()).getColumnValue();
        numberOfLines++;
        newExchange.setProperty("sum", sum);
        newExchange.setProperty("numberOfLines", numberOfLines);
        // newExchange already carries the CamelSplitComplete property set by the
        // splitter; the completion predicate in the route below relies on it
        return newExchange;
    }
}
Route:
.split(body().tokenize("\r\n|\n"), sumAggregationStrategy)
    .completionPredicate(simple("${exchangeProperty.CamelSplitComplete} == true"))
    .process("fileProcessor")
    .to("file:your_file_name?fileExist=Append");
Processor:
public class FileProcessor implements Processor {

    public void process(Exchange exchange) throws Exception {
        long sum = exchange.getProperty("sum", Long.class);
        long numberOfLines = exchange.getProperty("numberOfLines", Long.class);
        String footer = "Your Footer String"; // build this from sum and numberOfLines
        exchange.getIn().setBody(footer);
    }
}

Using a custom aggregator as Srini suggested is a good idea, and it might also handle streaming of large files better.
However, if you want to keep things simple and avoid split and aggregation, you could just use .tokenize("\r\n|\n") and .convertBodyTo(List.class) to convert the string into a list of strings.
from("direct:addFooter")
.routeId("addFooter")
.setBody().tokenize("\r\n|\n")
.convertBodyTo(List.class)
.process(exchange -> {
List<String> rows = exchange.getMessage().getBody(List.class);
int sum = 0;
for (int i = 0; i < rows.size(); i++) {
sum += Integer.parseInt(rows.get(i));
}
int numberOfLines = rows.size();
exchange.getMessage().setHeader("numberOfLines", numberOfLines);
exchange.getMessage().setHeader("sum", sum);
})
// Write data to file using file or stream component
// you could also use Velocity, FreeMarker or Mustache templates to format the
// result before writing it to file.
;
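As one hedged sketch of that commented write step (the endpoint, file name, and footer format are placeholders, not part of the original answer), the route could continue like this before the terminating semicolon:
.process(exchange -> {
    // join the rows back together and append a footer built from the two headers
    List<String> rows = exchange.getMessage().getBody(List.class);
    String footer = "FOOTER;" + exchange.getMessage().getHeader("numberOfLines")
            + ";" + exchange.getMessage().getHeader("sum");
    exchange.getMessage().setBody(String.join("\n", rows) + "\n" + footer);
})
.to("file:target?fileName=out.txt")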

Related

Why does my Apache Camel split/aggregate route return no results?

I'm trying to read a binary file, convert it into a POJO format and then output it as CSV. The unmarshalling (and marshalling) seems to be fine, but I'm having trouble optimising the conversion of the relevant records to Foo.class. The attempt below returns no results.
from(String.format("file://%s?move=%s", INPUT_DIRECTORY, MOVE_DIRECTORY))
    .unmarshal(unmarshaller)
    .split(bodyAs(Iterator.class), new ListAggregationStrategy())
        .choice()
            .when(not(predicate)).stop()
            .otherwise().convertBodyTo(Foo.class)
        .end()
    .end()
    .marshal(csv)
    .to(String.format("file://%s?fileName=${header.CamelFileName}.csv", OUTPUT_DIRECTORY));
I was able to get it to work like this, but it feels like there has to be a better way. This will need to be efficient, and having a 1s timeout feels like it goes against that, which is why I was attempting to use the built-in split aggregation. Alternatively, some way of using completionFromBatchConsumer might work, but I was struggling to make that work too.
from(String.format("file://%s?move=%s", INPUT_DIRECTORY, MOVE_DIRECTORY))
    .unmarshal(unmarshaller)
    .split(bodyAs(Iterator.class))
        .streaming()
        .filter(predicate)
        .convertBodyTo(Foo.class)
        .aggregate(header("CamelFileName"), new ListAggregationStrategy())
        .completionTimeout(1000)
        .marshal(csv)
        .to(String.format("file://%s?fileName=${header.CamelFileName}.csv", OUTPUT_DIRECTORY));
You could create your own AggregationStrategy for your first solution.
Instead of calling stop() in your choice statement, set a header like "skipMerge" to true.
In your strategy, test whether this header is set and, if so, skip the exchange.
class ArrayListAggregationStrategy implements AggregationStrategy {

    public Exchange aggregate(Exchange oldExchange, Exchange newExchange) {
        Object newBody = newExchange.getIn().getBody();
        Boolean skipMerge = newExchange.getIn().getHeader("skipMerge", Boolean.class);
        // skip exchanges that were flagged instead of stopped
        // (null-safe: the header may be absent on unflagged exchanges)
        if (Boolean.TRUE.equals(skipMerge)) {
            return oldExchange;
        }
        ArrayList<Object> list = null;
        if (oldExchange == null) {
            list = new ArrayList<Object>();
            list.add(newBody);
            newExchange.getIn().setBody(list);
            return newExchange;
        } else {
            list = oldExchange.getIn().getBody(ArrayList.class);
            list.add(newBody);
            return oldExchange;
        }
    }
}
Currently, your code never reaches marshal(csv) because the aggregator does not receive all the split parts.
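For illustration, here is how the first route could set that flag instead of stopping; apart from the setHeader line this sketch is unchanged from the question:
from(String.format("file://%s?move=%s", INPUT_DIRECTORY, MOVE_DIRECTORY))
    .unmarshal(unmarshaller)
    .split(bodyAs(Iterator.class), new ArrayListAggregationStrategy())
        .choice()
            // flag the exchange instead of stopping it, so the aggregator still receives it
            .when(not(predicate)).setHeader("skipMerge", constant(true))
            .otherwise().convertBodyTo(Foo.class)
        .end()
    .end()
    .marshal(csv)
    .to(String.format("file://%s?fileName=${header.CamelFileName}.csv", OUTPUT_DIRECTORY));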

What is the best way to set up Spring JPA to handle searching for items based on tags?

I am trying to set up a search system for a database where each element (a code) in one table has tags mapped by a many-to-many relationship. I am trying to write a controller, "search", where I can search for a set of tags which basically act like keywords, giving me a list of elements that all have the specified tags. My current function is incredibly naive: it retrieves all the codes that are mapped to a tag, adds those to a set, and then sorts the codes by how many times the tags for each code are found in the query string.
public List<Code> naiveSearch(String queryText) {
    String[] tagMatchers = queryText.split(" ");
    Set<Code> retained = new HashSet<>();
    for (int i = 0; i < Math.min(tagMatchers.length, 4); i++) {
        tagRepository.findAllByValueContaining(tagMatchers[i]).ifPresent(tags ->
            tags.forEach(tag -> retained.addAll(tag.getCodes()))
        );
    }
    SortedMap<Integer, List<Code>> matches = new TreeMap<>();
    List<Code> c;
    for (Code code : retained) {
        int sum = 0;
        for (String tagMatcher : tagMatchers) {
            for (Tag tag : code.getTags()) {
                if (tag.getValue().contains(tagMatcher)) {
                    sum += 1;
                }
            }
        }
        c = matches.getOrDefault(sum, new ArrayList<>());
        c.add(code);
        matches.put(sum, c);
    }
    c = new ArrayList<>();
    matches.values().forEach(c::addAll);
    Collections.reverse(c);
    return c;
}
This is quite slow and the overhead is unacceptable. My previous trick was basically a retrieval on the description of each code in the CrudRepository:
public interface CodeRepository extends CrudRepository<Code, Long> {
    Optional<Code> findByCode(String codeId);
    Optional<Iterable<Code>> findAllByDescriptionContaining(String query);
}
However, this is brittle since the order of words in the description factors into whether a result will be found, e.g. I want "tall ... dog" == "dog ... tall".
So, okay, I'm back several days later with how I actually solved this problem. I used Hibernate Search, Hibernate's built-in search library, which has a very easy integration with Spring: just paste the required Maven coordinates into your pom.xml and it is ready to roll.
First I removed the many-to-many mapping between tags and codes and just concatenated all my tags into a string field. Next I added @Field to the tags field and then wrote a basic search method. The method is a very simple search function which takes a set of "keywords" or tags and then performs a boolean search based on fuzzy terms against the indexed tags of each code. So far it is pretty good. My database is fairly small (100k rows) so I'm not sure how this will scale, but currently each search returns in about 20-50 ms, which is fast enough for my purposes.
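For reference, a minimal sketch of what that can look like with the Hibernate Search 5 API; the entity shape and the fuzzy boolean query are my assumptions, not the poster's actual code:
import javax.persistence.Entity;
import javax.persistence.EntityManager;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import java.util.List;
import org.apache.lucene.search.Query;
import org.hibernate.search.annotations.Field;
import org.hibernate.search.annotations.Indexed;
import org.hibernate.search.jpa.FullTextEntityManager;
import org.hibernate.search.jpa.Search;
import org.hibernate.search.query.dsl.BooleanJunction;
import org.hibernate.search.query.dsl.QueryBuilder;

@Entity
@Indexed
public class Code {
    @Id
    @GeneratedValue
    private Long id;

    // all tags concatenated into one indexed string field
    @Field
    private String tags;
    // getters/setters omitted
}

@SuppressWarnings("unchecked")
public List<Code> search(EntityManager entityManager, String queryText) {
    FullTextEntityManager ftem = Search.getFullTextEntityManager(entityManager);
    QueryBuilder qb = ftem.getSearchFactory()
            .buildQueryBuilder().forEntity(Code.class).get();

    // one fuzzy keyword clause per whitespace-separated term
    // (assumes queryText contains at least one term)
    BooleanJunction<?> bool = qb.bool();
    for (String term : queryText.split("\\s+")) {
        bool.should(qb.keyword().fuzzy().onField("tags").matching(term).createQuery());
    }
    Query luceneQuery = bool.createQuery();

    return ftem.createFullTextQuery(luceneQuery, Code.class).getResultList();
}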

What is happening in this Apex code?

String color1 = moreColors.get(0);
String color2 = moreColors[0];
System.assertEquals(color1, color2);
// Iterate over a list to read elements
for(Integer i=0;i<colors.size();i++) {
// Write value to the debug log
System.debug(colors[i]);
}
I am learning Apex and have just started. What is the meaning of the line System.assertEquals(color1, color2); and what is meant by the debug log here?
System.assert, System.assertEquals, System.assertNotEquals: I'd argue these are three of the most important method calls in Apex.
These are assert statements. They are used in testing to validate that the data you have matches your expectations.
System.assert tests a logical statement. If the statement evaluates to true, the code keeps running. If the statement evaluates to false, the code throws an exception.
System.assertEquals tests that two values are equal. If the two are equal, the code keeps running. If they are not equal, the code throws an exception.
System.assertNotEquals tests that two values are not equal. If the two are not equal, the code keeps running. If they are equal, the code throws an exception.
These are critical for complete system testing. In Apex, you must have 75% line test coverage. Many people achieve this by generating test code that simply covers 75% of their lines of code. However, this is an incomplete test. A good test class actually tests that the code does what you expect, which makes debugging and regression testing far easier. For example, let's create a method called square(Integer i) that returns the square of the integer passed in.
public static Integer square( Integer i ) {
    return i * i;
}
A poor test method would simply be:
@isTest
public static void test_square() {
    square( 1 );
}
A good test method could be:
@isTest
public static void test_square() {
    Integer i;
    Integer ret_square;
    i = 3;
    ret_square = square( i );
    System.assertEquals( i * i, ret_square );
}
How I would probably write it is like this:
@isTest
public static void test_square() {
    // MAX_TEST_RUNS is assumed to be a class constant, e.g. static final Integer MAX_TEST_RUNS = 100;
    for( Integer i = 0; i < MAX_TEST_RUNS; i++ ) {
        System.assertEquals( i * i, square( i ) );
    }
}
Good testing practices are integral to being a good developer. Look up more on Test-Driven Development: https://en.wikipedia.org/wiki/Test-driven_development
Line by Line ...
// Get the color at position 0 of the moreColors list using the list get method and store it in String color1
String color1 = moreColors.get(0);
// Get the color at position 0 of the moreColors list using array notation and store it in String color2,
// basically getting the same value in a different way
String color2 = moreColors[0];
// Assert that the values are the same; throws an exception if false
System.assertEquals(color1, color2);
// Iterate over a list to read elements
for(Integer i=0;i<colors.size();i++) {
// Write value to the debug log
System.debug(colors[i]);//Writes the value of color list ith position to the debug log
}
If you are running this code anonymously via the Developer Console, you can look for lines prefixed with DEBUG| to find the statements, e.g.
16:09:32:001 USER_DEBUG 1|DEBUG| blue
More about system methods can be found at https://developer.salesforce.com/docs/atlas.en-us.apexcode.meta/apexcode/apex_methods_system_system.htm#apex_System_System_methods

Aggregation and filtering through the consumer template

This is more of a general, what's-the-best-practice question...
I have a few processes where the consumer template has been used to read a directory (or an MQ queue) for whatever is available and then stop itself; the entire route set it calls is created programmatically based on a few parameters.
So, using the consumer template method below, is there a way to assign:
1. A filter operation, programmatically? (If I want to filter out certain files from the below, it is easy through a standard route via .filter, but at the moment I have no predefined beans, so adding filter=#filter to the endpoint URI is not really an option.)
2. An aggregation function from inside my while loop, while still using the template? (See the sketch after the code below.)
@Override
public void process(Exchange exchange) throws Exception {
    getConsumer().start();
    int exchangeCount = 0;
    while (true) {
        String consumerEp = "file:d://directory?delete=true&sendEmptyMessageWhenIdle=true&idempotent=false";
        Exchange fileExchange = getConsumer().receive(consumerEp);
        if (fileExchange == null || fileExchange.getIn() == null || fileExchange.getIn().getHeader(CAMEL_FILE_NAME) == null) {
            break;
        }
        exchangeCount++;
        Boolean batchStatus = (Boolean) fileExchange.getProperty(PROP_CAMEL_BATCH_COMPLETE);
        LOG.info("---PROCESSING : " + fileExchange.getIn().getHeader(CAMEL_FILE_NAME));
        getProducer().send("direct:some-other-process", fileExchange);
        // Get the CamelBatchComplete property to establish the end of the batch, and not cycle through twice.
        if (batchStatus != null && batchStatus) {
            break;
        }
    }
    // Stop the consumer service
    getConsumer().stop();
    LOG.info("End Group Operation : Total Exchanges=" + exchangeCount);
}
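Not a full answer, but a hedged sketch of how both could be done inside the loop itself, since the template hands you each Exchange directly; the file-name rule and the chosen strategy are purely illustrative, not a drop-in replacement:
String consumerEp = "file:d://directory?delete=true&sendEmptyMessageWhenIdle=true&idempotent=false";
AggregationStrategy strategy = new SumAggregationStrategy(); // any strategy you like
Exchange aggregated = null;
while (true) {
    Exchange fileExchange = getConsumer().receive(consumerEp);
    if (fileExchange == null || fileExchange.getIn().getHeader(CAMEL_FILE_NAME) == null) {
        break;
    }
    String fileName = fileExchange.getIn().getHeader(CAMEL_FILE_NAME, String.class);
    // "filter": skip unwanted files; the .csv rule is only an example
    if (!fileName.endsWith(".csv")) {
        continue;
    }
    // "aggregate": call the strategy by hand, exactly as the aggregate EIP would
    // (the strategy receives null as the old exchange on the first call)
    aggregated = strategy.aggregate(aggregated, fileExchange);
}
if (aggregated != null) {
    getProducer().send("direct:some-other-process", aggregated);
}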

Is there a way to make the output file of a stream:file?fileName= dynamic?

Given a simple route like this
route.from("direct:foo")
.split()
.tokenize("\n")
.streaming()
.to("stream:file?fileName=target/streaming${header.count}.txt&closeOnDone=true");
which I then trigger with this
@Test
public void splitAndStreamToFile() {
    StringBuilder builder = new StringBuilder();
    for (int i = 0; i < 500; i++) {
        builder.append(i);
        builder.append("\n");
    }
    for (int i = 0; i < 10; i++) {
        template.sendBodyAndHeader(builder.toString(), "count", i);
    }
}
I get one big file that contains 10 times 500 lines, where I would have hoped to have 10 files that contain 500 lines each.
In other words, it seems that the fileName in the stream:file endpoint is not dynamic. I am wondering if this is at all possible? My google-fu turned up nothing so far.
EDIT:
With Claus' answer, I got it to work like this:
route.from("direct:foo")
.split()
.tokenize("\n")
.streaming()
.recipientList(route.simple("stream:file?fileName=target/streaming${header.count}.txt&closeOnDone=true"));
It's a dynamic to, for which there is an EIP pattern:
http://camel.apache.org/how-to-use-a-dynamic-uri-in-to.html
But it could be a good idea to support the file/simple language on the fileName option as the regular file component does. Feel free to log a JIRA ticket about this improvement.
The source code of StreamProducer looks like it does not support any of Camel's expression languages yet:
private OutputStream resolveStreamFromFile() throws IOException {
    String fileName = endpoint.getFileName();
    ObjectHelper.notEmpty(fileName, "fileName");
    LOG.debug("About to write to file: {}", fileName);
    File f = new File(fileName);
    // will create a new file if missing or append to existing
    f.getParentFile().mkdirs();
    f.createNewFile();
    return new FileOutputStream(f, true);
}
See the source code.
If you need dynamic filenames, you should take a look at the file component, which supports the file language and the CamelFileName header.
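For example, a minimal sketch of that approach, reusing the count header from the question (the endpoint names are placeholders); fileExist=Append is used because each split line arrives as a separate write:
from("direct:foo")
    .split().tokenize("\n").streaming()
    // the file component honours the CamelFileName header when naming the output file
    .setHeader(Exchange.FILE_NAME, simple("streaming${header.count}.txt"))
    .to("file:target?fileExist=Append");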
In short,
toD uri=stream:file...
will do it.
The toD basically resolves the "simple" or "file" language before the URI hits the stream component code, so it works for fileName=....
