Write and read file line by line with Camel - file

I would like to write byte arrays to a file with Camel. But, in order to get my arrays back, I want to write them line by line, or with some other separator.
How can I do that with Camel?
from(somewhere)
    .process(new Processor() {
        @Override
        public void process(final Exchange exchange) throws Exception {
            final MyObject body = exchange.getIn().getBody(MyObject.class);
            byte[] serializedObject = MySerializer.serialize(body);
            exchange.getOut().setBody(serializedObject);
            exchange.getOut().setHeader(Exchange.FILE_NAME, "filename");
        }
    }).to("file://filepath?fileExist=Append&autoCreate=true");
Or does anyone have another way to get them back?
PS: I need to have only one file, otherwise it would have been too easy ...
EDIT :
I successfully write my file line by line with the out.writeObject method (thanks to Petter), and I can read the objects back with:
InputStream file = new FileInputStream(FILENAME);
InputStream buffer = new BufferedInputStream(file);
ObjectInputStream input = new ObjectInputStream(buffer);
Object obj = null;
while ((obj = input.readObject()) != null) {
    // Do something with obj
}
But I am not able to split and read them with Camel. Do you have any idea how to read them with Camel?

It depends on what your serialized objects look like, since you seem to have your own serializer. Is it standard Java binary serialization, i.e. something like this?
ByteArrayOutputStream bos = new ByteArrayOutputStream();
ObjectOutput out = new ObjectOutputStream(bos);
out.writeObject(obj);
return bos.toByteArray();
It probably won't be such a great idea to use text-based separators like \n.
Can't you serialize into some text format instead? Camel has several easy-to-use data formats (http://camel.apache.org/data-format.html). XStream, for instance, takes a line of code or so to create XML from your objects; then it's no big deal to split the file into several XML parts and read them back with XStream.
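For illustration, a rough sketch of that approach (the endpoints and the root tag "myObject" are made-up placeholders; the XStream data format needs camel-xstream on the classpath, and tokenizeXML requires a reasonably recent Camel version):

// writing: marshal each object to XML with XStream and append it to one file
from("direct:in")
    .marshal().xstream()
    .to("file://filepath?fileExist=Append&autoCreate=true");

// reading back: split the file into the individual XML documents and unmarshal them
from("file://filepath?noop=true")
    .split().tokenizeXML("myObject")
    .unmarshal().xstream()
    .to("direct:out");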
In your example, if you really want a separator, why don't you just append it to the byte[]? Copy the array into a new, bigger byte[] and insert some unique sequence at the end.
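A minimal sketch of that idea, inside the processor from the question (the "#EOR#" marker is just a made-up example of a "unique sequence"; it needs java.util.Arrays and java.nio.charset.StandardCharsets):

byte[] serializedObject = MySerializer.serialize(body);
byte[] marker = "#EOR#".getBytes(StandardCharsets.UTF_8); // hypothetical end-of-record marker
byte[] withMarker = Arrays.copyOf(serializedObject, serializedObject.length + marker.length);
System.arraycopy(marker, 0, withMarker, serializedObject.length, marker.length);
exchange.getOut().setBody(withMarker);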

Change CheckCharacters on XmlReader generated by SqlCommand.ExecuteXmlReader

If I try to change CheckCharacters the following way, reader.Settings.CheckCharacters is still true. How am I supposed to do it?
using (var reader_org = command.ExecuteXmlReader())
{
    var settings = new XmlReaderSettings { CheckCharacters = false, ConformanceLevel = ConformanceLevel.Auto };
    var reader = XmlReader.Create(reader_org, settings);
    reader.Read();
}
According to the documentation it's supposed to work:
"Add features to an existing XML reader. The Create method can accept another XmlReader object. The underlying XmlReader object can be a user-defined reader, a XmlTextReader object, or another XmlReader instance that you want to add additional features to."
It appears you are using FOR XML in SQL Server to generate XML values that contain restricted characters and are therefore not actually valid XML.
SQL Server will quite rightly not allow you to generate such XML if you use the , TYPE directive. But if you do not use that directive, it generates the XML as a string and does not validate the characters. See also this article.
Ideally, you would use Base64 or similar to encode this. But assuming for whatever reason you don't want to do that, the reason your current code does not work is that the underlying reader_org XML reader already has CheckCharacters = true and will therefore throw an exception.
Instead you need to create your own XML reader from the string. Since FOR XML without , TYPE also splits large XML blobs into separate rows, you also need to concatenate them all first.
var sb = new StringBuilder();
using (var reader = command.ExecuteReader())
{
    while (reader.Read()) // read all rows
    {
        sb.Append(reader.GetString(0));
    }
}
var settings = new XmlReaderSettings { CheckCharacters = false, ConformanceLevel = ConformanceLevel.Auto };
// wrap the concatenated string in a StringReader; passing the string itself to
// XmlReader.Create would treat it as a URI rather than as XML content
using (var xmlReader = XmlReader.Create(new StringReader(sb.ToString()), settings))
{
    // do stuff with the reader here
}
There are more performant ways to do that: for example you could create your own Stream out of sequential reader.GetStream results, but that is significantly more complex.

Parse string in json format from Kafka using Flink

What I want to do is read a string in JSON format, e.g.
{"a":1, "b":2}
using Flink and then extract a specific value by its key, say the 1 stored under "a".
I referred to this page: https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/connectors/kafka.html
What I have done is:
val params = ParameterTool.fromArgs(args)
val env = StreamExecutionEnvironment.getExecutionEnvironment
val kafkaConsumer = new FlinkKafkaConsumer010(
params.getRequired("input-topic"),
new JSONKeyValueDeserializationSchema(false),
params.getProperties
)
val messageStream = env.addSource(kafkaConsumer)
But I am not quite sure how to move forward from there. The link above says I can use objectNode.get("field").as(Int/String/…)() to extract a specific value by key, but how exactly can I do that?
Or is there a completely different way to achieve what I want?
Thanks!
Apply a data transformation to the data from Kafka like this:
messageStream.map(new MapFunction<ObjectNode, Object>() {
    @Override
    public Object map(ObjectNode value) throws Exception {
        // JSONKeyValueDeserializationSchema puts the message payload under "value",
        // so the field "a" of {"a":1, "b":2} can be read like this:
        return value.get("value").get("a").asInt();
    }
})

Create PDF file from Text String or HTML String

My codenameone app is producing some data which I would like to be able to summarize in a PDF file for documentation purposes.
Would it be possible either to use a Java library as a cn1 library, or to use a web service which converts an HTML string into a PDF file, like this one:
https://www.html2pdfrocket.com/convert-android-html-to-pdf
Maybe someone else already figured out a best-practice for this.
Thanks a lot!
There is currently no built-in solution for that; it should be easy enough to wrap native libs or maybe even port a JavaSE lib that does this. Most developers who do something like this use a server-side process to generate the PDF.
After giving it a quick-and-dirty try with html2pdfrocket - instead of using or porting a Java library - I was simply amazed by how simple this is with codenameone. I wasn't expecting it to be so easy AT ALL.
This class and method are all you need to save the PDF file to FileSystemStorage.
import com.codename1.io.FileSystemStorage;
import com.codename1.io.Util;

public class PDFHandler {

    private final static String URL = "http://api.html2pdfrocket.com/pdf";
    private final static String APIKEY = "<YOURAPI-KEY>";

    /**
     * Stores the given HTML string or URL to storage under the given filename.
     * @param value URL or HTML; add quotes if you have spaces, using single quotes instead of double
     * @param filename target file name
     */
    public void getFile(String value, String filename) {
        // Validate parameters
        if (value == null || value.length() < 1)
            return;
        if (filename == null || filename.length() < 1)
            return;
        // Encode
        value = Util.encodeUrl(value);
        String fullPathToFile = FileSystemStorage.getInstance().getAppHomePath() + filename;
        Util.downloadUrlToFileSystemInBackground(URL + "?apikey=" + APIKEY + "&value=" + value, fullPathToFile);
    }
}
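A minimal usage sketch (the HTML snippet and the file name are just placeholders):

PDFHandler handler = new PDFHandler();
// the generated PDF ends up under FileSystemStorage.getInstance().getAppHomePath() + "report.pdf"
handler.getFile("<html><body><h1>My Report</h1></body></html>", "report.pdf");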
I hope this helps some other codenameone-newbie!

How to aggregate many marshalled (json) objects to one file

I created a route to buffer/store marshalled objects (JSON) in files. This route (and the other route that reads the buffer) works fine.
storing in buffer:
from(DIRECT_IN).marshal().json().marshal().gzip().to(fileTarget());
reading from buffer:
from(fileTarget()).unmarshal().gzip().unmarshal().json().to("mock:a")
To reduce I/O I want to aggregate many exchanges into one file. I tried to aggregate just after json() and just before it, so I added this after json() or from(...):
.aggregate(constant(true)).completionSize(20).completionTimeout(1000).groupExchanges()
In both cases I get conversion exceptions. How do I do it correctly? I would prefer a way without a custom aggregator. And it would be nice if the exchanges/objects were simply aggregated into one JSON document (as a list of objects) or into one text file - one JSON object per line.
Thanks in advance.
Meanwhile I added a simple aggregator:
public class LineAggregator implements AggregationStrategy {

    @Override
    public final Exchange aggregate(final Exchange oldExchange, final Exchange newExchange) {
        // if first message of aggregation
        if (oldExchange == null) {
            return newExchange;
        }
        // else aggregate
        String oldBody = oldExchange.getIn().getBody(String.class);
        String newBody = newExchange.getIn().getBody(String.class);
        String aggregate = oldBody + System.lineSeparator() + newBody;
        oldExchange.getIn().setBody(aggregate);
        return oldExchange;
    }
}
The routes look like this. To the buffer:
from(...) // marshal objects to json
    .marshal()
    .json()
    .aggregate(constant(true), lineAggregator)
        .completionSize(BUFFER_PACK_SIZE)
        .completionTimeout(BUFFER_PACK_TIMEOUT)
    .marshal()
    .gzip()
    .to(...)
From the buffer:
from(...)
    .unmarshal()
    .gzip()
    .split()
    .tokenize("\r\n|\n|\r")
    .unmarshal()
    .json()
    .to(....)
But the question remains, is the aggregator necessary?

Training own model in opennlp

I am finding it difficult to create my own model in OpenNLP.
Can anyone tell me how to create my own model?
How should the training be done?
What should the input be, and where will the output model file be stored?
https://opennlp.apache.org/docs/1.5.3/manual/opennlp.html
This website is very useful. It shows, both in code and via the OpenNLP command-line application, how to train models for all the different types, like entity extraction and part of speech, etc.
I could give you some code examples here, but the page is very clear to follow.
Theory-wise:
Essentially you create a file which lists the stuff you want to train on, e.g.
Sport [whitespace] this is a page about football, rugby and stuff
Politics [whitespace] this is a page about tony blair being prime minister.
The format is described on the page above (each model expects a different format). Once you have created this file, you run it through either the API or the opennlp application (via the command line), and it generates a .bin file. Once you have this .bin file, you can load it into a model and start using it (as per the API on the website above).
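As a rough sketch of the API route for the Sport/Politics document-categorizer example above (assuming the OpenNLP 1.5.x API; the file names are made-up placeholders):

import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;

import opennlp.tools.doccat.DoccatModel;
import opennlp.tools.doccat.DocumentCategorizerME;
import opennlp.tools.doccat.DocumentSample;
import opennlp.tools.doccat.DocumentSampleStream;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;

public class DoccatTraining {
    public static void main(String[] args) throws Exception {
        // one training sample per line: "<category> <whitespace> <text>"
        ObjectStream<String> lineStream =
                new PlainTextByLineStream(new FileInputStream("doccat.train"), "UTF-8");
        ObjectStream<DocumentSample> sampleStream = new DocumentSampleStream(lineStream);

        DoccatModel model;
        try {
            model = DocumentCategorizerME.train("en", sampleStream);
        } finally {
            sampleStream.close();
        }

        // write the trained model to a .bin file
        BufferedOutputStream modelOut = new BufferedOutputStream(new FileOutputStream("en-doccat.bin"));
        model.serialize(modelOut);
        modelOut.close();
    }
}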
First you need to train the data for the required entity.
Sentences should be separated with a newline character (\n). Values should be separated from the <START> and <END> tags with a space character.
Let's say you want to create a medicine entity model, so the data should look something like this:
<START:medicine> Augmentin-Duo <END> is a penicillin antibiotic that contains two medicines - <START:medicine> amoxicillin trihydrate <END> and
<START:medicine> potassium clavulanate <END>. They work together to kill certain types of bacteria and are used to treat certain types of bacterial infections.
You can refer to a sample dataset for an example. The training data should have at least 15000 sentences to get better results.
Then you can use the OpenNLP TokenNameFinderTrainer.
The output file will be in .bin format.
Here is the example: Writing a custom NameFinder model in OpenNLP
For more details, refer to the OpenNLP documentation.
Perhaps this article will help you out. It describes how to do TokenNameFinder training from data extracted from Wikipedia...
nuxeo - blog - Mining Wikipedia with Hadoop and Pig for Natural Language Processing
Copy the data into data.txt and run the code below to get your own mymodel.bin.
Sample data you can use: https://github.com/mccraigmccraig/opennlp/blob/master/src/test/resources/opennlp/tools/namefind/AnnotatedSentencesWithTypes.txt
import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.util.Collections;

import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.NameSample;
import opennlp.tools.namefind.NameSampleDataStream;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;

public class Training {

    static String onlpModelPath = "mymodel.bin";
    // training data set
    static String trainingDataFilePath = "data.txt";

    public static void main(String[] args) throws IOException {
        Charset charset = Charset.forName("UTF-8");
        ObjectStream<String> lineStream = new PlainTextByLineStream(
                new FileInputStream(trainingDataFilePath), charset);
        ObjectStream<NameSample> sampleStream = new NameSampleDataStream(lineStream);

        TokenNameFinderModel model = null;
        try {
            // model = NameFinderME.train("en", "drugs", sampleStream, Collections.<String, Object>emptyMap(), 100, 4);
            model = NameFinderME.train("en", "drugs", sampleStream, Collections.<String, Object>emptyMap());
        } finally {
            sampleStream.close();
        }

        BufferedOutputStream modelOut = null;
        try {
            modelOut = new BufferedOutputStream(new FileOutputStream(onlpModelPath));
            model.serialize(modelOut);
        } finally {
            if (modelOut != null) {
                modelOut.close();
            }
        }
    }
}
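Once mymodel.bin has been written, loading it back and running the name finder could look roughly like this (a sketch assuming OpenNLP 1.5.x; the tokenized sentence is just an example, and note that the finder expects pre-tokenized input):

import java.io.FileInputStream;
import java.util.Arrays;

import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.util.Span;

public class Detection {
    public static void main(String[] args) throws Exception {
        TokenNameFinderModel model =
                new TokenNameFinderModel(new FileInputStream("mymodel.bin"));
        NameFinderME finder = new NameFinderME(model);

        String[] tokens = { "Augmentin-Duo", "is", "a", "penicillin", "antibiotic", "." };
        Span[] names = finder.find(tokens);

        // each span holds the token range and the entity type, e.g. "drugs"
        for (Span span : names) {
            System.out.println(span.getType() + ": "
                    + Arrays.asList(tokens).subList(span.getStart(), span.getEnd()));
        }
    }
}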
