Camel Splitter store CamelSplitSize and processed rows on failure - apache-camel

http://camel.apache.org/splitter.html [1]
From link [1] I saw that CamelSplitSize will be set on the completed Exchange.
I am learning Camel and would like to know whether it is possible to split an XML file containing, say, 100 rows.
If the split fails while processing the 50th row, I need to see CamelSplitIndex as 50, CamelSplitSize as 100 and CamelSplitComplete as false:
.bean(Splitter.class, "saveFile(${camelContext.properties[mySplitSize]}, ${camelContext.properties[mySplitIndex]}, ${camelContext.properties[mySplitComplete]})")
I could not find a way to accomplish this, as link [1] clearly states that CamelSplitSize is stored only on the completed Exchange. Is there any way to achieve this?

If you need these properties, you can catch the exception that stops the splitter and get the exchange that caused it. There you will find your properties.
public void show(Exchange exchange) {
    // The splitter wraps the failing exchange in a CamelExchangeException,
    // stored on the handled exchange as the CamelExceptionCaught property
    CamelExchangeException camelExceptionCaught =
            (CamelExchangeException) exchange.getProperty("CamelExceptionCaught");
    // The split properties live on the exchange that caused the failure
    System.out.println(camelExceptionCaught.getExchange().getProperty("CamelSplitSize"));
    System.out.println(camelExceptionCaught.getExchange().getProperty("CamelSplitComplete"));
    System.out.println(camelExceptionCaught.getExchange().getProperty("CamelSplitIndex"));
}
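A minimal route sketch of how this could be wired up (the URIs, the xpath expression and the bean names are placeholders; the show method above is assumed to live in a SplitInspector bean). With stopOnException() the splitter stops at the first failure and, per the above, the wrapping exception reaches a route-scoped onException handler:

from("file://inbox?fileName=rows.xml")
    .onException(Exception.class)
        .handled(true)
        // CamelExceptionCaught on this exchange wraps the failing
        // split exchange, which still carries the split properties
        .bean(SplitInspector.class, "show")
    .end()
    .split(xpath("/rows/row")).stopOnException()
        .to("bean:rowProcessor")
    .end();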

Related

Sql Component: Consume multiple rows and mark them all as processed using onConsume

I configured the Camel SQL component to read data from a database table. The "onConsume" parameter works when I read one row at a time, but doesn't work when I try to read multiple rows at a time using "maxMessagesPerPoll". Here is what I tried.
Working: when I read one row at a time and update the row using onConsume.
My consumer endpoint uri looks like :
sql:select * from REPORT where IS_VIOLATED != 'N' and TYPE = 'Provisioning'?consumer.delay=1000&consumer.onConsume=update REPORT set IS_VIOLATED = 'N' where REPORT_ID =:#REPORT_ID
Not working: when I configured Camel's SQL component to read a configurable number of rows (using "maxMessagesPerPoll"). It reads multiple rows at a time, but onConsume doesn't seem to work. I tried telling Camel to use the IN operator and setting the header value (REPORT_ID) to an array of values for the IN clause.
My consumer endpoint uri now looks like :
sql:select * from REPORT where IS_VIOLATED != 'N' and TYPE = 'Provisioning'?consumer.delay=1000&maxMessagesPerPoll=3&consumer.useIterator=false&consumer.onConsume=update REPORT set IS_VIOLATED = 'N' where REPORT_ID in(:#REPORT_ID)
I might be doing something wrong here. I have done enough searching on this already and found related post1 and post2, but they didn't put me on the correct path.
I need to be able to mark all the consumed rows with IS_VIOLATED = 'N'.
Thanks for your help.
I noticed that you set consumer.useIterator=false, and the doc says:
If true each row returned when polling will be processed individually. If false the entire java.util.List of data is set as the IN body.
So I think that because of this option, the :#REPORT_ID is no longer understood, since it would have to come from the entire list rather than from each row.
Maybe removing this option would already be enough.
I also didn't understand why you changed the where clause from where REPORT_ID =:#REPORT_ID to where REPORT_ID in(:#REPORT_ID).
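With that option removed and the original per-row where clause kept, the endpoint would go back to something like this sketch of your working URI, just with the higher poll size; each of the up-to-3 rows polled per cycle is then routed individually and onConsume runs once per row:

sql:select * from REPORT where IS_VIOLATED != 'N' and TYPE = 'Provisioning'?consumer.delay=1000&maxMessagesPerPoll=3&consumer.onConsume=update REPORT set IS_VIOLATED = 'N' where REPORT_ID =:#REPORT_ID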
By carefully looking at the Apache Camel SQL component docs, I tried implementing a custom processing strategy, using the "processingStrategy" attribute.
public class ReportProcessingStratergy implements SqlProcessingStrategy {
    @Override
    public int commit(DefaultSqlEndpoint defaultSqlEndpoint, Exchange exchange, Object o,
                      JdbcTemplate jdbcTemplate, String s) throws Exception {
        // Expand the single IN(?) placeholder with the consumed REPORT_IDs
        // (hardcoded here while testing)
        s = s.replace("?", "5066834,5066835,5066832");
        return jdbcTemplate.update(s);
    }

    @Override
    public int commitBatchComplete(DefaultSqlEndpoint defaultSqlEndpoint,
                                   JdbcTemplate jdbcTemplate, String s) throws Exception {
        return 0;
    }
}
Configure the Spring bean:
<bean class="go.ga.ns.reconc.sl.ReportProcessingStratergy" id="reportProcessingStratergy"/>
Now my SQL consumer endpoint URI looks like:
sql:select * from REPORT where IS_VIOLATED != 'N' and TYPE = 'Provisioning'?consumer.delay=1000&maxMessagesPerPoll=3&consumer.useIterator=false&processingStrategy=#reportProcessingStratergy&consumer.onConsume=update REPORT set IS_VIOLATED = 'N' where REPORT_ID in(?)
Note: the # in processingStrategy=#reportProcessingStratergy is significant, as it refers to a bean in the registry (as explained here); it did not work without it.
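For reference, a less hardcoded commit could derive the IDs from the consumed batch itself. A sketch only, assuming (per the useIterator=false doc quote above) that the data object handed to the strategy is the java.util.List of row maps:

@Override
public int commit(DefaultSqlEndpoint defaultSqlEndpoint, Exchange exchange, Object data,
                  JdbcTemplate jdbcTemplate, String query) throws Exception {
    // Each consumed row arrives as a Map of column name to value
    @SuppressWarnings("unchecked")
    java.util.List<java.util.Map<String, Object>> rows =
            (java.util.List<java.util.Map<String, Object>>) data;
    String ids = rows.stream()
            .map(row -> String.valueOf(row.get("REPORT_ID")))
            .collect(java.util.stream.Collectors.joining(","));
    // Expand the single IN(?) placeholder with the actual IDs;
    // acceptable here only because the IDs are numeric values read from the DB
    return jdbcTemplate.update(query.replace("?", ids));
}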

Apache Camel: How to use "done" files to identify records written into a file is over and it can be moved

As the title suggests, I want to move a file into a different folder after I am done writing DB records to it.
I have already looked into several questions related to this: Apache camel file with doneFileName
But my problem is a little different, since I am using split, stream and parallelProcessing for getting the DB records and writing them to a file. I cannot work out when and how to create the done file alongside the parallelProcessing. Here is the code snippet:
My route to fetch records and write it to a file:
from(<ROUTE_FETCH_RECORDS_AND_WRITE>)
.setHeader(Exchange.FILE_PATH, constant("<path to temp folder>"))
.setHeader(Exchange.FILE_NAME, constant("<filename>.txt"))
.setBody(constant("<sql to fetch records>&outputType=StreamList"))
.to("jdbc:<endpoint>")
.split(body(), <aggregation>).streaming().parallelProcessing()
.<some processors>
.aggregate(header(Exchange.FILE_NAME), (o, n) -> {
<file aggregation>
return o;
}).completionInterval(<some time interval>)
.toD("file://<to the temp file>")
.end()
.end()
.to("file:"+<path to temp folder>+"?doneFileName=${file:header."+Exchange.FILE_NAME+"}.done"); //this line is just for trying out done filename
In my aggregation strategy for the splitter I have code that basically counts records processed and prepares the response that would be sent back to the caller.
And in my other aggregate outside I have code for aggregating the DB rows and, after that, writing them into the file.
And here is the file listener for moving the file:
from("file://<path to temp folder>?delete=true&include=<filename>.*.TXT&doneFileName=done")
.to("file://<final filename with path>?fileExist=Append");
Doing something like this is giving me this error:
Caused by: [org.apache.camel.component.file.GenericFileOperationFailedException - Cannot store file: <folder-path>/filename.TXT] org.apache.camel.component.file.GenericFileOperationFailedException: Cannot store file: <folder-path>/filename.TXT
at org.apache.camel.component.file.FileOperations.storeFile(FileOperations.java:292)[209:org.apache.camel.camel-core:2.16.2]
at org.apache.camel.component.file.GenericFileProducer.writeFile(GenericFileProducer.java:277)[209:org.apache.camel.camel-core:2.16.2]
at org.apache.camel.component.file.GenericFileProducer.processExchange(GenericFileProducer.java:165)[209:org.apache.camel.camel-core:2.16.2]
at org.apache.camel.component.file.GenericFileProducer.process(GenericFileProducer.java:79)[209:org.apache.camel.camel-core:2.16.2]
at org.apache.camel.util.AsyncProcessorConverterHelper$ProcessorToAsyncProcessorBridge.process(AsyncProcessorConverterHelper.java:61)[209:org.apache.camel.camel-core:2.16.2]
at org.apache.camel.processor.SendProcessor.process(SendProcessor.java:141)[209:org.apache.camel.camel-core:2.16.2]
at org.apache.camel.management.InstrumentationProcessor.process(InstrumentationProcessor.java:77)[209:org.apache.camel.camel-core:2.16.2]
at org.apache.camel.processor.RedeliveryErrorHandler.process(RedeliveryErrorHandler.java:460)[209:org.apache.camel.camel-core:2.16.2]
at org.apache.camel.processor.CamelInternalProcessor.process(CamelInternalProcessor.java:190)[209:org.apache.camel.camel-core:2.16.2]
at org.apache.camel.processor.Pipeline.process(Pipeline.java:121)[209:org.apache.camel.camel-core:2.16.2]
at org.apache.camel.processor.Pipeline.process(Pipeline.java:83)[209:org.apache.camel.camel-core:2.16.2]
at org.apache.camel.processor.CamelInternalProcessor.process(CamelInternalProcessor.java:190)[209:org.apache.camel.camel-core:2.16.2]
at org.apache.camel.component.seda.SedaConsumer.sendToConsumers(SedaConsumer.java:298)[209:org.apache.camel.camel-core:2.16.2]
at org.apache.camel.component.seda.SedaConsumer.doRun(SedaConsumer.java:207)[209:org.apache.camel.camel-core:2.16.2]
at org.apache.camel.component.seda.SedaConsumer.run(SedaConsumer.java:154)[209:org.apache.camel.camel-core:2.16.2]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)[:1.8.0_144]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)[:1.8.0_144]
at java.lang.Thread.run(Thread.java:748)[:1.8.0_144]
Caused by: org.apache.camel.InvalidPayloadException: No body available of type: java.io.InputStream but has value: Total number of records discovered: 5
What am I doing wrong? Any inputs will help.
PS: Newly introduced to Apache Camel
I would guess that the error comes from .toD("file://<to the temp file>") trying to write a file but finding the wrong type of body (the String "Total number of records discovered: 5" instead of an InputStream).
I don't understand why you have one file destination inside the splitter and one outside of it.
As @claus-ibsen suggested, try to remove the extra .aggregate(...) in your route. To split and re-aggregate, it is sufficient to reference the aggregation strategy in the splitter. Claus also pointed to an example in the Camel docs:
from(<ROUTE_FETCH_RECORDS_AND_WRITE>)
.setHeader(Exchange.FILE_PATH, constant("<path to temp folder>"))
.setHeader(Exchange.FILE_NAME, constant("<filename>.txt"))
.setBody(constant("<sql to fetch records>&outputType=StreamList"))
.to("jdbc:<endpoint>")
.split(body(), <aggregationStrategy>)
.streaming().parallelProcessing()
// the processors below get individual parts
.<some processors>
.end()
// The end statement above ends split-and-aggregate. From here
// you get the re-aggregated result of the splitter.
// So you can simply write it to a file and also write the done-file
.to(...);
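For instance, that final step could write both the payload and its done-file from one endpoint, since the file producer's doneFileName option supports the ${file:name} placeholder (the folder path below is a placeholder):

// writes <filename>.txt first, then <filename>.txt.done
.to("file://<path to final folder>?doneFileName=${file:name}.done");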
However, if you need to control the aggregation sizes, you have to combine splitter and aggregator. That would look something like this:
from(<ROUTE_FETCH_RECORDS_AND_WRITE>)
.setHeader(Exchange.FILE_PATH, constant("<path to temp folder>"))
.setHeader(Exchange.FILE_NAME, constant("<filename>.txt"))
.setBody(constant("<sql to fetch records>&outputType=StreamList"))
.to("jdbc:<endpoint>")
// No aggregationStrategy here so it is a standard splitter
.split(body())
.streaming().parallelProcessing()
// the processors below get individual parts
.<some processors>
.end()
// The end statement above ends split. From here
// you still got individual records from the splitter.
.to("seda:aggregate");
// new route to do the controlled aggregation
from("seda:aggregate")
// constant(true) is the correlation predicate => collect all messages in 1 aggregation
.aggregate(constant(true), new YourAggregationStrategy())
.completionSize(500)
// not sure if this 'end' is needed
.end()
// write files with 500 aggregated records here
.to("...");

Camel enrich SQL syntax issue

I'm tasked with creating a Camel route, using Camel version 2.20.0, that takes a line from a CSV file, uses a value from that line in the SQL statement's where clause, and merges the results and outputs them again. If I hardcode the identifier in the SQL statement it works fine; if I try to use a dynamic URI I get an error.
The route is:
from("file:///tmp?fileName=test.csv")
.split()
.tokenize("\n")
.streaming()
.parallelProcessing(true)
.setHeader("userID", constant("1001"))
//.enrich("sql:select emplid,name from employees where emplid = '1001'",
.enrich("sql:select name from employees where emplid = :#userID",
new AggregationStrategy() {
public Exchange aggregate(Exchange oldExchange,
Exchange newExchange) {...
As I said, if I uncomment the line with the hardcoded 1001, it queries the DB and works as expected. However, using the :#userID syntax I get an Oracle error of:
java.sql.SQLSyntaxErrorException: ORA-00942: table or view does not exist
Message History
---------------------------------------------------------------------------------------------------------------------------------------
RouteId ProcessorId Processor Elapsed (ms)
[route3 ] [route3 ] [file:///tmp?fileName=test.csv ] [ 43]
[route3 ] [log5 ] [log ] [ 2]
[route3 ] [setHeader2 ] [setHeader[userID] ] [ 0]
[route3 ] [enrich2 ] [enrich[constant{sql:select name from employees where emplid = :#userID] [ 40]
The table is clearly there, because it works when the value is hardcoded, so it has something to do with passing in the dynamic value. I've tried lots of variations on how to pass that variable in (inside single quotes, using values from the body instead of headers, etc.) and haven't found the working combination yet, though I've seen lots of similar, seemingly working examples.
I've turned trace on, and it appears the header is correctly set as well:
o.a.camel.processor.interceptor.Tracer : >>> (route3) setHeader[userID, 1001] --> enrich[constant{sql:select name from employees where emplid = :#userID}] <<< Pattern:InOnly, Headers:{CamelFileAbsolute=true, CamelFileAbsolutePath=/tmp/test.csv, CamelFileLastModified=1513116018000, CamelFileLength=26, CamelFileName=test.csv, CamelFileNameConsumed=test.csv, CamelFileNameOnly=test.csv, CamelFileParent=/tmp, CamelFilePath=/tmp/test.csv, CamelFileRelativePath=test.csv, userID=1001}, BodyType:String, Body:1001,SomeValue,MoreValues
What needs to change to make this work?
I should also note that I've tried this approach, using various syntax options to refer to the header value, without any luck:
.enrich().simple("sql:select * from employees where emplid = :#${in.header.userID}").aggregate ...
From the docs:
From Camel 2.16 onwards both enrich and pollEnrich supports dynamic endpoints that uses an Expression to compute the uri, which allows to use data from the current Exchange. In other words all what is told above no longer apply and it just works.
As you are using 2.20, I think you may try this example:
from("file:///tmp?fileName=test.csv")
.split()
.tokenize("\n")
.streaming()
.parallelProcessing(true)
.setHeader("userID", constant("1001"))
//.enrich("sql:select emplid,name from employees where emplid = '1001'",
.enrich("sql:select name from employees where emplid = ':#${in.header.userID}'",
new AggregationStrategy() {
public Exchange aggregate(Exchange oldExchange,
Exchange newExchange) {...
Take a look at the Expression topic in the docs for further examples.
To sum up, the expression could be:
"sql:select name from employees where emplid = ':#${in.header.userID}'"
EDIT:
Sorry, I missed the :# prefix. You can see a working unit test here.
Just take care with the column types. If it's an integer, you shouldn't need the quotes.
Cheers!
From the Camel docs:
pollEnrich or enrich does not access any data from the current Exchange, which means when polling it cannot use any of the existing headers you may have set on the Exchange.
The recommended way of achieving what you want is to instead use the recipientList, so I suggest you read up on that.
Content Enricher
Recipient List
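For those older versions, a recipientList with a simple expression can compute the SQL endpoint URI from the header. A sketch only (note that the header value is inlined into the statement before the endpoint is resolved, so quoting matters for non-numeric columns):

.setHeader("userID", constant("1001"))
// simple(...) substitutes the header value into the URI,
// producing: sql:select name from employees where emplid = '1001'
.recipientList(simple("sql:select name from employees where emplid = '${header.userID}'"));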
Edit:
As Ricardo Zanini rightly pointed out in his answer, it is actually possible to achieve this with Camel versions from 2.16 onwards. As the OP is using 2.20, my answer is invalid.
I will, however, keep my answer but want to point out that this is only valid if you're using an older version than 2.16.

Parsing data from Kafka in Apache Flink

I am just getting started with Apache Flink (Scala API); my issue is the following:
I am trying to stream data from Kafka into Apache Flink based on one example from the Flink site:
val stream =
env.addSource(new FlinkKafkaConsumer09("testing", new SimpleStringSchema(), properties))
Everything works correctly; the stream.print() statement displays the following on the screen:
2018-05-16 10:22:44 AM|1|11|-71.16|40.27
I would like to use a case class in order to load the data. I've tried using
flatMap(p=>p.split("|"))
but it's only splitting the data one character at a time.
Basically, the expected result is to be able to populate the 5 fields of the case class as follows:
field(0)=2018-05-16 10:22:44 AM
field(1)=1
field(2)=11
field(3)=-71.16
field(4)=40.27
but it's now doing:
field(0) = 2
field(1) = 0
field(2) = 1
field(3) = 8
etc...
Any advice would be greatly appreciated.
Thank you in advance
Frank
The problem is the usage of String.split. If you call it with a String, the method expects it to be a regular expression. Thus, p.split("\\|") would be the correct regular expression for your input data. Alternatively, you can call the split variant that takes a separating character: p.split('|'). Both solutions should give you the desired result.
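For illustration, the same behaviour can be reproduced in plain Java, since java.lang.String.split likewise treats its argument as a regex; a minimal sketch (the Reading class and its field names are made up):

// Shows why "|" must be escaped: unescaped it is the regex alternation
// operator and matches the empty string between every pair of characters.
public class SplitDemo {
    record Reading(String timestamp, int id, int count, double lon, double lat) {}

    public static void main(String[] args) {
        String line = "2018-05-16 10:22:44 AM|1|11|-71.16|40.27";
        String[] f = line.split("\\|"); // escaped pipe: yields the 5 expected fields
        Reading r = new Reading(f[0], Integer.parseInt(f[1]), Integer.parseInt(f[2]),
                Double.parseDouble(f[3]), Double.parseDouble(f[4]));
        System.out.println(r);
    }
}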

Is it possible to skip parsing some field in HAPI?

I am using Apache Camel, Mina2 and HAPI to receive HL7 v2 messages. I noticed that it takes a lot of time to unmarshal and create the Message object, and this time increases with larger messages.
My message has around 120 OBX segments and I am using only the OBX-3 and OBX-5 fields. I tested manually removing fields after OBX-5 and found some improvement in performance. Is there any way to tell HAPI not to parse any fields after OBX-5?
You could extend ca.uhn.hl7v2.parser.PipeParser and override the Segment parsing method.
@Override
public void parse(Segment destination, String segment, EncodingCharacters encodingChars, Integer theRepetition) throws HL7Exception {
    // Skip OBX segments once more than five OBSERVATION groups exist;
    // everything else is parsed as usual.
    if (!"OBX".equals(destination.getName())
            || destination.getParent().getParent().getAll("OBSERVATION").length <= 5) {
        super.parse(destination, segment, encodingChars, theRepetition);
    }
}
Use this parser for your messages and it will only parse the first 5 OBSERVATIONs in the ORDER_DETAIL.
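The custom parser can then be used in place of the stock PipeParser; a sketch (TruncatingPipeParser is a made-up name for the subclass above, and rawEr7Message stands for the raw pipe-delimited HL7 string):

// Parsing through the subclass skips OBX segments beyond the
// fifth OBSERVATION instead of fully parsing them
Parser parser = new TruncatingPipeParser();
Message message = parser.parse(rawEr7Message);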
