I am looking to convert HL7 v2 messages (the older EDI-style format) to JSON, so that I can process them with Apache Drill and compress them as Parquet.
I looked into HAPI, but I am not having any luck finding a utility for converting non-XML HL7 to JSON.
Does anyone have a suggestion or a reference to a library?
Just use HAPI to convert to XML. The code below requires Saxon, because the XML-to-JSON stylesheet requires XSLT 2.0, but if you already have a method to convert XML to JSON, then you only need the first two lines, which are entirely HAPI. You should download the XSLT locally for production, of course. :-)
String convertHL7ToJson(Message message) {
    try {
        // Encode the HL7 v2 message as HL7 XML using HAPI's canonical v2.6 model.
        DefaultXMLParser xmlParser = new DefaultXMLParser(new CanonicalModelClassFactory("2.6"));
        String xml = xmlParser.encode(message);
        // Saxon is required here because the xml-to-json stylesheet uses XSLT 2.0.
        Transformer xmlTransformer = TransformerFactory.newInstance("net.sf.saxon.TransformerFactoryImpl", null).newTransformer(
                new StreamSource(new StringReader(readFileFromURL("https://github.com/bramstein/xsltjson/raw/master/conf/xml-to-json.xsl")))
        );
        StringWriter result = new StringWriter();
        xmlTransformer.transform(
                new StreamSource(new StringReader(xml)), new StreamResult(result)
        );
        return result.toString();
    } catch (Throwable t) {
        t.printStackTrace();
    }
    return null;
}
String readFileFromURL(String url) {
    InputStream is = null;
    try {
        // "\\A" matches the beginning of input, so next() returns the whole stream in one token.
        return new Scanner(is = new URL(url).openStream(), "UTF-8").useDelimiter("\\A").next();
    } catch (Throwable t) {
        t.printStackTrace();
    } finally {
        if (is != null) {
            try {
                is.close();
            } catch (Throwable ignored) {}
        }
    }
    return null;
}
This creates output like this:
"ORM_O01":{"MSH":{"MSH.1":"|","MSH.2":"^~\\&","MSH.3":{"HD.1":"TEST"},"MSH.4":{"HD.1":"TEST000","HD.2":"BL"},...
If there is a way to convert the HL7 to XML, you can query the XML natively with Drill and then ultimately convert that to Parquet.
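For example (a hedged, untested sketch: the host, workspace, and file paths below are placeholders), a CTAS statement issued over Drill's JDBC driver can rewrite the converted JSON as Parquet:
Class.forName("org.apache.drill.jdbc.Driver");
try (Connection conn = DriverManager.getConnection("jdbc:drill:drillbit=localhost");
     Statement stmt = conn.createStatement()) {
    // dfs.tmp is Drill's default writable workspace; Parquet is the default output format.
    stmt.execute("CREATE TABLE dfs.tmp.`hl7_parquet` AS SELECT * FROM dfs.`/data/hl7.json`");
}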
I'm trying to read a binary file, convert it into a POJO format, and then output it as CSV. The unmarshalling (and marshalling) seems to be fine, but I'm having trouble optimising the conversion of the relevant records to Foo.class. The attempt below returns no results.
from(String.format("file://%s?move=%s", INPUT_DIRECTORY, MOVE_DIRECTORY))
    .unmarshal(unmarshaller)
    .split(bodyAs(Iterator.class), new ListAggregationStrategy())
        .choice()
            .when(not(predicate)).stop()
            .otherwise().convertBodyTo(Foo.class)
        .end()
    .end()
    .marshal(csv)
    .to(String.format("file://%s?fileName=${header.CamelFileName}.csv", OUTPUT_DIRECTORY));
I was able to get it to work as shown below, but it feels like there has to be a better way. This needs to be efficient, and a 1s timeout feels like it goes against that, which is why I was attempting to use the built-in split aggregation. Alternatively, I looked for some way of using completionFromBatchConsumer, but I was struggling to make that work too.
from(String.format("file://%s?move=%s", INPUT_DIRECTORY, MOVE_DIRECTORY))
    .unmarshal(unmarshaller)
    .split(bodyAs(Iterator.class))
        .streaming()
        .filter(predicate)
        .convertBodyTo(Foo.class)
        .aggregate(header("CamelFileName"), new ListAggregationStrategy())
            .completionTimeout(1000)
            .marshal(csv)
            .to(String.format("file://%s?fileName=${header.CamelFileName}.csv", OUTPUT_DIRECTORY));
You could create your own AggregationStrategy for your first solution.
Instead of calling stop() in your choice statement, set a simple header such as "skipMerge" to true.
In your strategy, test whether this header is set and, if so, skip that exchange.
class ArrayListAggregationStrategy implements AggregationStrategy {
    public Exchange aggregate(Exchange oldExchange, Exchange newExchange) {
        Object newBody = newExchange.getIn().getBody();
        Boolean skipMerge = newExchange.getIn().getHeader("skipMerge", Boolean.class);
        // The header may be absent (null), so guard against that before testing it.
        if (Boolean.TRUE.equals(skipMerge)) {
            return oldExchange;
        }
        ArrayList<Object> list = null;
        if (oldExchange == null) {
            list = new ArrayList<Object>();
            list.add(newBody);
            newExchange.getIn().setBody(list);
            return newExchange;
        } else {
            list = oldExchange.getIn().getBody(ArrayList.class);
            list.add(newBody);
            return oldExchange;
        }
    }
}
Currently, your code never reaches marshal(csv) because the aggregator does not receive all the split parts.
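Putting it together, your first route might become something like this sketch (the skipMerge header name is just the convention suggested above):
from(String.format("file://%s?move=%s", INPUT_DIRECTORY, MOVE_DIRECTORY))
    .unmarshal(unmarshaller)
    .split(bodyAs(Iterator.class), new ArrayListAggregationStrategy())
        .choice()
            // Flag unwanted records instead of stop(), so every part reaches the aggregator.
            .when(not(predicate)).setHeader("skipMerge", constant(true))
            .otherwise().convertBodyTo(Foo.class)
        .end()
    .end()
    .marshal(csv)
    .to(String.format("file://%s?fileName=${header.CamelFileName}.csv", OUTPUT_DIRECTORY));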
Is there a way to set a JWE full serialization input with jose4j? For example, what goes in the TODO below?
public String decryptJWE(PrivateKey privateKey, String payload, boolean compact) throws JoseException {
    JsonWebEncryption jwe = new JsonWebEncryption();
    if (compact) {
        jwe.setCompactSerialization(payload);
    } else {
        // TODO: what goes here? expecting something like jwe.setFullSerialization(payload)
    }
    jwe.setKey(privateKey);
    return jwe.getPayload();
}
No, only the JWE compact serialization is supported. The general and flattened JWE JSON serializations aren't directly supported.
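One possible workaround, if your input is the flattened JSON serialization with a single recipient and nothing outside the protected header, is to reassemble the compact form yourself before handing it to jose4j. An untested sketch using jose4j's bundled JsonUtil; note that unprotected headers and AAD have no compact equivalent, and encrypted_key may be absent (e.g. for "dir"), in which case an empty string belongs in its place:
Map<String, Object> flat = JsonUtil.parseJson(payload);
// Compact form: protected.encrypted_key.iv.ciphertext.tag (all parts already base64url-encoded).
String compact = flat.get("protected") + "."
        + flat.get("encrypted_key") + "."
        + flat.get("iv") + "."
        + flat.get("ciphertext") + "."
        + flat.get("tag");
jwe.setCompactSerialization(compact);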
I have a problem with ReactiveGridFsTemplate. I am trying to read a GridFS file that was written with the old GridFS API (com.mongodb.gridfs) instead of the new one (com.mongodb.client.gridfs.model.GridFS), and it has a UUID as its ID instead of an ObjectId. Reading the GridFS file's metainfo works fine, but as soon as I try to get the ReactiveGridFsResource it blows up with new MongoGridFSException("Custom id type used for this GridFS file").
The culprit is the code below from ReactiveGridFsTemplate, which calls getObjectId() instead of getId(). Should it call getObjectId(), or could it be rewritten to use getId()?
public Mono<ReactiveGridFsResource> getResource(GridFSFile file) {
    Assert.notNull(file, "GridFSFile must not be null!");
    return Mono.fromSupplier(() -> {
        GridFSDownloadStream stream = this.getGridFs().openDownloadStream(file.getObjectId());
        return new ReactiveGridFsResource(file, BinaryStreamAdapters.toPublisher(stream, this.dataBufferFactory));
    });
}
I hacked ReactiveGridFsTemplate to use getId() instead of getObjectId(), but now it throws a StackOverflowError. Can someone tell me what I'm doing wrong?
ReactiveGridFsTemplate reactiveGridFsTemplate = new ReactiveGridFsTemplate(mongoDbDFactory, operations.getConverter(), "nl.loxia.collectie.buitenlandbladen.dgn", 1024) {
    public Mono<ReactiveGridFsResource> getResource(GridFSFile file) {
        Assert.notNull(file, "GridFSFile must not be null!");
        return Mono.fromSupplier(() -> {
            GridFSDownloadStream stream = this.getGridFs().openDownloadStream(file.getId());
            return new ReactiveGridFsResource(file, BinaryStreamAdapters.toPublisher(stream, this.dataBufferFactory));
        });
    }
};
var q = Query.query(Criteria.where("_id").is("5449d9e3-7f6d-47b7-957d-056842f190f7"));
List<DataBuffer> block = reactiveGridFsTemplate
    .findOne(q)
    .flatMap(reactiveGridFsTemplate::getResource)
    .flux()
    .flatMap(ReactiveGridFsResource::getDownloadStream)
    .collectList()
    .block();
The stacktrace: https://gist.github.com/nickstolwijk/fa77681572db1d91941d85f6c845f2f4
Also, this code hangs after the StackOverflowError. Is that expected?
I am trying to write a spell corrector using the Lucene spellchecker. I want to feed it a single text file with blog text content. The problem is that it works only when I give it one sentence/word per line in my dictionary file. Also, the suggest API returns results without giving any weight to the number of occurrences. Here is the source code:
public class SpellCorrector {

    SpellChecker spellChecker = null;

    public SpellCorrector() {
        try {
            File file = new File("/home/ubuntu/spellCheckIndex");
            Directory directory = FSDirectory.open(file);
            spellChecker = new SpellChecker(directory);
            StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_36);
            IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_36, analyzer);
            spellChecker.indexDictionary(
                    new PlainTextDictionary(new File("/home/ubuntu/main.dictionary")), config, true);
            // Should I format this file with one sentence/word per line?
        } catch (IOException e) {
        }
    }

    public String correct(String query) {
        if (spellChecker != null) {
            try {
                String[] suggestions = spellChecker.suggestSimilar(query, 5);
                // This returns suggestions based not on occurrence counts but on when they occurred
                if (suggestions != null && suggestions.length != 0) {
                    return suggestions[0];
                }
            } catch (IOException e) {
                return null;
            }
        }
        return null;
    }
}
Do I need to make some changes?
Regarding your first issue, that sounds like the expected, documented dictionary format; see the PlainTextDictionary API. If you want to pass arbitrary text in, you might want to index it and use a LuceneDictionary instead, or possibly a HighFrequencyDictionary, depending on your needs.
The SpellChecker suggests replacements based on the similarity between the words (Levenshtein distance), before any other concern. If you want it to recommend only more popular terms as suggestions, you should pass a SuggestMode to SpellChecker.suggestSimilar. This ensures that the suggested matches are at least as strong, popularity-wise, as the word they are intended to replace.
If you must override how Lucene decides on the best matches, you can do that with SpellChecker.setComparator, creating your own Comparator for SuggestWord. Since SuggestWord exposes freq to you, it should be easy to order the matches found by popularity.
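A sketch of both ideas against the Lucene 3.6 APIs you are already using (untested; the blog index path and the "contents" field name are assumptions, and building that index from your blog text is not shown):
IndexReader reader = IndexReader.open(FSDirectory.open(new File("/home/ubuntu/blogIndex")));
// Build the dictionary from an index so term frequencies are available, then
// ask only for suggestions more popular than the queried word.
spellChecker.indexDictionary(new HighFrequencyDictionary(reader, "contents", 0.0f), config, true);
String[] suggestions = spellChecker.suggestSimilar(query, 5, reader, "contents",
        SuggestMode.SUGGEST_MORE_POPULAR);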
In a nutshell, since GAE cannot write to the filesystem, I have decided to persist my data in the datastore (using JDO). Now I would like to retrieve the data byte by byte and pass it to the client as an input stream. There is code from the gwtupload library (http://code.google.com/p/gwtupload/) (see below) which breaks on GAE because it writes to the filesystem. I would like to provide a GAE-ported solution.
public static void copyFromInputStreamToOutputStream(InputStream in, OutputStream out) throws IOException {
    byte[] buffer = new byte[100000];
    while (true) {
        synchronized (buffer) {
            int amountRead = in.read(buffer);
            if (amountRead == -1) {
                break;
            }
            out.write(buffer, 0, amountRead);
        }
    }
    in.close();
    out.flush();
    out.close();
}
One workaround I have tried (it didn't work) is to retrieve the data from the datastore as a resource, like this:
InputStream resourceAsStream = null;
Object lf = null; // declaration added so the snippet compiles
PersistenceManager pm = PMF.get().getPersistenceManager();
try {
    Query q = pm.newQuery(ImageFile.class);
    lf = q.execute();
    resourceAsStream = getServletContext().getResourceAsStream((String) pm.getObjectById(lf));
} finally {
    pm.close();
}
if (lf != null) {
    response.setContentType(receivedContentTypes.get(fieldName));
    copyFromInputStreamToOutputStream(resourceAsStream, response.getOutputStream());
}
I welcome your suggestions.
Store the data in a byte array, and use a ByteArrayInputStream or ByteArrayOutputStream to pass it to libraries that expect streams.
If by 'client' you mean an HTTP client or browser, though, there's no reason to do this: just deal with regular byte arrays on your end and send them to/from the user as you would any other data. The only reason to mess around with streams like this is if you have some library that expects them.
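For example, a minimal sketch of serving datastore-backed bytes without touching the filesystem; the imageFile entity and its getData() accessor are hypothetical stand-ins for however your JDO class exposes its byte[] contents:
byte[] data = imageFile.getData(); // hypothetical accessor for the entity's byte[] payload
response.setContentType(receivedContentTypes.get(fieldName));
copyFromInputStreamToOutputStream(new ByteArrayInputStream(data), response.getOutputStream());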