morphline / flume - looking for regexp change and a hash function - solr

I'm sending data to Solr with Flume, and the data is transformed with a morphline.
I'm looking for a couple of basic functions in the morphline library:
create a hash value based on other attribute values (e.g. hash=("sha-1", timestamp, message, host, ...))
change the case of an attribute's string value (something more generic, like regexp_replace, would do as well).
I don't want to write a custom Java handler yet; I think there should be an easier way :)

(1) A non-generic solution for the hash function, since I couldn't find an out-of-the-box morphline implementation. SHA-1 is hard-coded (e.g. no for loop, exactly 20 bytes):
{ java {
imports : "import java.security.*;"
code: """
try {
MessageDigest digest = MessageDigest.getInstance("SHA-1");
String value;
value = (String) record.getFirstValue("message");
if (value != null) { digest.update(value.getBytes("ISO-8859-1"), 0, value.length()); }
value = (String) record.getFirstValue("timestamp");
if (value != null) { digest.update(value.getBytes("ISO-8859-1"), 0, value.length()); }
value = (String) record.getFirstValue("hostname");
if (value != null) { digest.update(value.getBytes("ISO-8859-1"), 0, value.length()); }
byte[] a = digest.digest();
record.replaceValues("id"
, String.format("%02X%02X%02X%02X%02X%02X%02X%02X%02X%02X%02X%02X%02X%02X%02X%02X%02X%02X%02X%02X"
,a[0] ,a[1] ,a[2] ,a[3] ,a[4] ,a[5] ,a[6] ,a[7] ,a[8] ,a[9] //SHA-1 has exactly 20 bytes
,a[10],a[11],a[12],a[13],a[14],a[15],a[16],a[17],a[18],a[19]) );
}
catch (java.security.NoSuchAlgorithmException e) { logger.error("hash to id: caught NoSuchAlgorithmException for SHA-1"); }
catch (java.io.UnsupportedEncodingException e) { logger.error("hash to id: caught UnsupportedEncodingException"); }
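// Pass the record on after the try/catch; a return inside finally would also swallow unexpected runtime exceptions
return child.process(record);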
"""
}
}
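A more generic version of (1), sketched in plain Java: `FieldHasher.hashHex` (a hypothetical helper, not part of the morphline library) loops over any number of field values and accepts any `MessageDigest` algorithm name instead of hard-coding SHA-1's 20 bytes. It encodes values as UTF-8 rather than ISO-8859-1, so ids for non-Latin-1 text would differ from the morphline above; for ASCII input they are identical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.List;

public class FieldHasher {

    // Hashes the given field values in order, skipping nulls, and returns the
    // digest as upper-case hex, whatever its length (20 bytes for SHA-1, 32 for SHA-256).
    static String hashHex(String algorithm, List<String> values) {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance(algorithm);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unknown digest algorithm: " + algorithm, e);
        }
        for (String value : values) {
            if (value != null) {
                digest.update(value.getBytes(StandardCharsets.UTF_8));
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            // %02X prints a (possibly negative) byte as two unsigned hex digits
            hex.append(String.format("%02X", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        // Same field order as the morphline: message, timestamp, hostname
        System.out.println(hashHex("SHA-1", Arrays.asList("a", null, "bc")));
    }
}
```

Running `main` prints `A9993E364706816ABA3E25717850C26C9CD0D89D`, the standard SHA-1 test vector for "abc", because the null value is skipped and the remaining values are hashed back to back. That back-to-back concatenation also means ("ab", "c") and ("a", "bc") produce the same id, exactly as in the original; feeding a separator byte between fields would avoid this, at the cost of changing existing ids.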
(2) A non-generic implementation of the lower-case transformation (I had hoped morphline would have something like regexp_replace):
java {
code: """
String program = (String) record.getFirstValue("program");
// Records without a "program" field are passed through unchanged
if (program != null) {
String program_lc = program.toLowerCase();
if (! program.equals(program_lc) )
{ record.replaceValues("program", program_lc); }
}
return child.process(record);
"""
}
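For (2), the transformation itself is small. Here is a plain-Java sketch of a null-safe lower-case and a regexp_replace-style helper; `ValueTransforms` is a hypothetical name, not a morphline command. It passes `Locale.ROOT`, because `toLowerCase()` without a locale follows the JVM's default locale, which changes the result for some letters (for example under a Turkish locale).

```java
import java.util.Locale;

public class ValueTransforms {

    // Null-safe lower-casing that does not depend on the JVM default locale
    static String lowerCase(String value) {
        return value == null ? null : value.toLowerCase(Locale.ROOT);
    }

    // Null-safe regexp replace in the spirit of SQL's regexp_replace:
    // every match of regex is replaced (String.replaceAll semantics)
    static String regexpReplace(String value, String regex, String replacement) {
        return value == null ? null : value.replaceAll(regex, replacement);
    }

    public static void main(String[] args) {
        System.out.println(lowerCase("SSHD"));                            // sshd
        System.out.println(regexpReplace("cron[123]", "\\[\\d+\\]", "")); // cron
        System.out.println(lowerCase(null));                              // null
    }
}
```

`regexpReplace` follows `String.replaceAll`, so `$` and `\` in the replacement string are special; wrap the replacement in `java.util.regex.Matcher.quoteReplacement` if it comes from data.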


Spring Cloud Open Feign: Decoder for ByteArrayResource

I have a Spring Boot REST endpoint, defined in an interface, that downloads an image:
@GetMapping(value = "/{name}")
ResponseEntity<ByteArrayResource> getFileByName(@PathVariable("name") String name);
I use Feign.builder() to invoke this endpoint:
Feign.builder()
.client(new ApacheHttpClient())
.contract(new SpringMvcContract())
.decoder(new JacksonDecoder())
.encoder(new JacksonEncoder())
.target(clazz, url)
When I invoke it, I get the following error:
com.fasterxml.jackson.core.JsonParseException: Unexpected character ('�' (code 65533 / 0xfffd)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')
at [Source: (BufferedReader); line: 1, column: 2]
When I call the endpoint directly from Insomnia, it works fine, but it fails through the Feign builder. The response content type is image/jpeg.
Is there a specific decoder in Feign that handles ByteArrayResource? I tried ResponseEntityDecoder, StreamDecoder and JacksonDecoder; none of them works.
While debugging, I see that Jackson's ObjectMapper.readValue fails. I also tried changing the return type from ByteArrayResource to byte[], but that didn't work either.
Any help?
I wrote a small decoder of my own, and that resolved the problem. Here is the decoder:
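private Decoder byteArrayResourceDecoder() {
Decoder decoder = (response, type) -> {
// Binary bodies (e.g. image/jpeg) must not go through Jackson; wrap the raw bytes instead
if (type instanceof Class && ByteArrayResource.class.isAssignableFrom((Class<?>) type)) {
return new ByteArrayResource(StreamUtils.copyToByteArray(response.body().asInputStream()));
}
return new JacksonDecoder().decode(response, type);
};
// ResponseEntityDecoder unwraps ResponseEntity<T> and passes T to the decoder above
return new ResponseEntityDecoder(decoder);
}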
I hope this template helps others with similar issues. I would have expected Feign to provide a decoder that supports all return types.
Thanks, Maz, your solution helped me.
I modified it for my needs to read a Spring StreamingResponseBody:
1.) Create a decoder wrapper that either delegates to JacksonDecoder (the default) or reads the response body into a byte array.
Decoder decoder = (response, type) -> {
// Look up the Content-Type header; header names are case-insensitive
Collection<String> contentType = null;
for (Map.Entry<String, Collection<String>> header : response.headers().entrySet()) {
if ("content-type".equalsIgnoreCase(header.getKey())) {
contentType = header.getValue();
}
}
// No Content-Type, or JSON: fall back to the default JacksonDecoder
if (contentType == null || contentType.stream().anyMatch(x -> x.contains("application/json"))) {
return new JacksonDecoder(getMapper()).decode(response, type);
}
// Any other content type: copy the raw response body into a byte array
try (InputStream initialStream = response.body().asInputStream();
ByteArrayOutputStream out = new ByteArrayOutputStream()) {
byte[] buffer = new byte[512];
int length;
while ((length = initialStream.read(buffer)) != -1) {
out.write(buffer, 0, length);
}
return out.toByteArray();
}
};
2.) Use the custom decoder with the Feign.Builder
Feign.Builder builder = Feign.builder()
// --
.decoder(decoder)
// --
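The header handling in step 1 can be pulled out and tested on its own. This sketch (hypothetical class `ContentTypes`, plain Java, no Feign types) looks the header up with `equalsIgnoreCase`, since HTTP header names are case-insensitive, and applies the same rule as the decoder: no Content-Type, or any value containing `application/json`, goes to Jackson.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class ContentTypes {

    // Case-insensitive header lookup; returns null when the header is absent
    static Collection<String> findHeader(Map<String, Collection<String>> headers, String name) {
        for (Map.Entry<String, Collection<String>> e : headers.entrySet()) {
            if (name.equalsIgnoreCase(e.getKey())) {
                return e.getValue();
            }
        }
        return null;
    }

    // Missing Content-Type, or any JSON value, means "decode with Jackson"
    static boolean shouldUseJackson(Map<String, Collection<String>> headers) {
        Collection<String> contentType = findHeader(headers, "Content-Type");
        return contentType == null
                || contentType.stream().anyMatch(v -> v.contains("application/json"));
    }

    public static void main(String[] args) {
        Map<String, Collection<String>> jpeg = new HashMap<>();
        jpeg.put("content-type", Collections.singletonList("image/jpeg"));
        Map<String, Collection<String>> json = new HashMap<>();
        json.put("Content-Type", Arrays.asList("application/json;charset=UTF-8"));
        System.out.println(shouldUseJackson(jpeg));            // false
        System.out.println(shouldUseJackson(json));            // true
        System.out.println(shouldUseJackson(new HashMap<>())); // true
    }
}
```

Inside the decoder, `shouldUseJackson(response.headers())` would replace the header loop and the stream filter. Note that `contains("application/json")` does not match JSON variants such as `application/problem+json`; widen the check if your API returns those.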

How to access Spans with a SpanNearQuery in solr 6.3

I am trying to build a query parser that ranks the passages containing the query terms.
I understand that I need to use SpanNearQuery, but even after going through the documentation I can't find a way to access the Spans. The method I found returns null.
I have read https://lucidworks.com/blog/2009/07/18/the-spanquery/, which explains the query well, including how to access spans. But it targets Solr 4.0, and unfortunately Solr 6.3 doesn't have AtomicReader any more.
How can I get the actual spans?
public void process(ResponseBuilder rb) throws IOException {
SolrParams params = rb.req.getParams();
log.warn("in Process");
if (!params.getBool(COMPONENT_NAME, false)) {
return;
}
Query origQuery = rb.getQuery();
// TODO: longer term, we don't have to be a span query, we could re-analyze the document
if (origQuery != null) {
if (origQuery instanceof SpanNearQuery == false) {
throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,
"Illegal query type. The incoming query must be a Lucene SpanNearQuery and it was a " + origQuery.getClass().getName());
}
SpanNearQuery sQuery = (SpanNearQuery) origQuery;
SolrIndexSearcher searcher = rb.req.getSearcher();
IndexReader reader = searcher.getIndexReader();
log.warn("before leaf reader context");
List<LeafReaderContext> ctxs = (List<LeafReaderContext>) reader.leaves();
log.warn("after leaf reader context");
LeafReaderContext ctx = ctxs.get(0);
SpanWeight spanWeight = sQuery.createWeight(searcher, true);
Spans spans = spanWeight.getSpans(ctx, SpanWeight.Postings.POSITIONS);
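// Solr 4.x approach from the blog post; it does not compile on Solr 6.3
// (AtomicReader became LeafReader in Lucene 5, fleeceQ is not defined here, and spans would be redeclared):
// AtomicReader wrapper = SlowCompositeReaderWrapper.wrap(reader);
// Map<Term, TermContext> termContexts = new HashMap<Term, TermContext>();
// Spans spans = fleeceQ.getSpans(wrapper.getContext(), new Bits.MatchAllBits(reader.numDocs()), termContexts);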
// SpanWeight.Postings[] postings= SpanWeight.Postings.values();
// Spans spans = sQuery.getSpans();
// Assumes the query is a SpanQuery
// Build up the query term weight map and the bi-gram
Map<String, Float> termWeights = new HashMap<String, Float>();
Map<String, Float> bigramWeights = new HashMap<String, Float>();
createWeights(params.get(CommonParams.Q), sQuery, termWeights, bigramWeights, reader);
float adjWeight = params.getFloat(ADJACENT_WEIGHT, DEFAULT_ADJACENT_WEIGHT);
float secondAdjWeight = params.getFloat(SECOND_ADJ_WEIGHT, DEFAULT_SECOND_ADJACENT_WEIGHT);
float bigramWeight = params.getFloat(BIGRAM_WEIGHT, DEFAULT_BIGRAM_WEIGHT);
// get the passages
int primaryWindowSize = params.getInt(OWLParams.PRIMARY_WINDOW_SIZE, DEFAULT_PRIMARY_WINDOW_SIZE);
int adjacentWindowSize = params.getInt(OWLParams.ADJACENT_WINDOW_SIZE, DEFAULT_ADJACENT_WINDOW_SIZE);
int secondaryWindowSize = params.getInt(OWLParams.SECONDARY_WINDOW_SIZE, DEFAULT_SECONDARY_WINDOW_SIZE);
WindowBuildingTVM tvm = new WindowBuildingTVM(primaryWindowSize, adjacentWindowSize, secondaryWindowSize);
PassagePriorityQueue rankedPassages = new PassagePriorityQueue();
// intersect w/ doclist
DocList docList = rb.getResults().docList;
log.warn("Before Spans");
while (spans.nextDoc() != Spans.NO_MORE_DOCS) {
// build up the window
log.warn("Iterating through spans");
if (docList.exists(spans.docID())) {
tvm.spanStart = spans.startPosition();
tvm.spanEnd = spans.endPosition();
// tvm.terms
Terms terms = reader.getTermVector(spans.docID(), sQuery.getField());
tvm.map(terms, spans);
// The entries map contains the window, do some ranking of it
if (tvm.passage.terms.isEmpty() == false) {
log.debug("Candidate: Doc: {} Start: {} End: {} ", new Object[] { spans.docID(), spans.startPosition(), spans.endPosition() });
}
tvm.passage.lDocId = spans.docID();
tvm.passage.field = sQuery.getField();
// score this window
try {
addPassage(tvm.passage, rankedPassages, termWeights, bigramWeights, adjWeight, secondAdjWeight, bigramWeight);
} catch (CloneNotSupportedException e) {
throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, "Internal error cloning Passage", e);
}
// clear out the entries for the next round
tvm.passage.clear();
}
}
}
}
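Two details of the Lucene 6.x API seem relevant to the null result: `SpanWeight.getSpans(LeafReaderContext, Postings)` works one segment at a time and returns `null` when that segment has no matches, and `Spans.docID()` is relative to its segment, so the top-level id that `DocList.exists` and `getTermVector` on the top-level reader expect is `ctx.docBase + spans.docID()`. The code above only asks `ctxs.get(0)`, which may well be a segment without matches. The sketch below models that bookkeeping in plain Java; `Segment` is a hypothetical stand-in for a `LeafReaderContext` plus its spans, not a Lucene type.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SegmentDocIds {

    // Hypothetical stand-in: docBase is the segment's first top-level doc id,
    // localDocs are the segment-relative ids a Spans iterator would report
    // (null models getSpans(...) returning null for this segment).
    static final class Segment {
        final int docBase;
        final int[] localDocs;

        Segment(int docBase, int[] localDocs) {
            this.docBase = docBase;
            this.localDocs = localDocs;
        }
    }

    // Visit every segment, skip those without spans, and translate
    // segment-relative ids into top-level ids with docBase + localId.
    static List<Integer> topLevelDocIds(List<Segment> segments) {
        List<Integer> result = new ArrayList<>();
        for (Segment segment : segments) {
            if (segment.localDocs == null) {
                continue; // no matches in this segment
            }
            for (int local : segment.localDocs) {
                result.add(segment.docBase + local);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Segment> segments = Arrays.asList(
                new Segment(0, null),                // first segment: no matching terms
                new Segment(100, new int[] {3, 7}),
                new Segment(250, new int[] {0}));
        System.out.println(topLevelDocIds(segments)); // [103, 107, 250]
    }
}
```

In the real component, the outer loop would run over `reader.leaves()`, call `spanWeight.getSpans(ctx, SpanWeight.Postings.POSITIONS)` for each context, `continue` when it returns `null`, and add `ctx.docBase` before checking `docList`.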

Set field Accessibility to Custom Salesforce Lead field from Java code

I am working with Salesforce through the Force.com API and the Metadata API, version 36.
I can create a custom field on the Lead object, but by default it is hidden. As a result, I cannot create a new Lead that sets these custom fields: the request fails with a bad request (status code 400).
Is there a way to make the custom field visible from code?
public boolean createCustomExtTextField(String name, LoginResult metadataLoginResult, int length) {
boolean success = false;
CustomField cs = new CustomField();
cs.setFullName("Lead."+name+"__c");
cs.setLabel("Custom"+name+"Field");
cs.setType(FieldType.LongTextArea);
cs.setLength(length);
cs.setVisibleLines(50); // max 50
try {
MetadataConnection metadataConnection = createMetadataConnection(metadataLoginResult);
SaveResult[] results = metadataConnection.createMetadata(new Metadata[] { cs });
for (SaveResult r : results) {
if (r.isSuccess()) {
success = true;
} else {
System.out.println("Errors were encountered while creating " + r.getFullName());
for (com.sforce.soap.metadata.Error e : r.getErrors()) {
System.out.println("Error message: " + e.getMessage());
System.out.println("Status code: " + e.getStatusCode());
}
}
}
} catch (ConnectionException e) {
e.printStackTrace();
}
return success;
}
I have searched a lot and haven't found anything that actually helps, so any hints are welcome. Thank you.
I finally found a solution. The one that worked for me was to make all custom fields REQUIRED.
CustomField cs = new CustomField();
cs.setFullName("Lead.YourCompanyName" + name + "__c");
cs.setLabel("YourCompanyName" + name);
cs.setRequired(true);
...
com.sforce.soap.enterprise.LoginResult metadataLoginResult = operations.loginToMetadata(username, password, "https://login.salesforce.com/services/Soap/c/36.0");
...
private boolean createFieldInMetadata(LoginResult metadataLoginResult, CustomField cs) {
boolean success = false;
try {
MetadataConnection metadataConnection = createMetadataConnection(metadataLoginResult);
SaveResult[] results = metadataConnection.createMetadata(new Metadata[] { cs });
for (SaveResult r : results) {
if (r.isSuccess()) {
success = true;
} else {
System.out.println("Errors were encountered while creating " + r.getFullName());
for (com.sforce.soap.metadata.Error e : r.getErrors()) {
System.out.println("Error message: " + e.getMessage());
System.out.println("Status code: " + e.getStatusCode());
}
}
}
} catch (Exception e) {
e.printStackTrace();
}
return success;
}
That way the field appears in the page layout. It is very important to know that a required field cannot be left empty; it must have some value. So if not all of your custom fields are required in your own logic, and you want to avoid unzipping the page layout, editing it and zipping it back (however that may be done), just set "N/A" or any character of your choice on the fields that are required by the code but not by your project.
I managed to make the custom field's Field Level Security visible for the "Admin" profile, but I did not get Field Accessibility set to visible. The latter is untested.
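A sketch of the "N/A" trick in plain Java: before creating a Lead, every field that was made required only so that it shows up in the layout gets a placeholder when it has no value. `RequiredFieldDefaults` and the field names are hypothetical; the resulting map would still have to be copied onto the Lead object you send to Salesforce.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RequiredFieldDefaults {

    // Returns a copy of values in which every required field that is missing,
    // empty or blank is set to the placeholder; other values are kept as-is.
    static Map<String, String> withPlaceholders(Map<String, String> values,
                                                List<String> requiredFields,
                                                String placeholder) {
        Map<String, String> result = new LinkedHashMap<>(values);
        for (String field : requiredFields) {
            String value = result.get(field);
            if (value == null || value.trim().isEmpty()) {
                result.put(field, placeholder);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> lead = new LinkedHashMap<>();
        lead.put("YourCompanyNameNotes__c", "called twice");
        lead.put("YourCompanyNameSource__c", "");
        List<String> required = Arrays.asList(
                "YourCompanyNameNotes__c", "YourCompanyNameSource__c", "YourCompanyNameRegion__c");
        System.out.println(withPlaceholders(lead, required, "N/A"));
        // {YourCompanyNameNotes__c=called twice, YourCompanyNameSource__c=N/A, YourCompanyNameRegion__c=N/A}
    }
}
```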

AngularJs, how to set empty string in URL

In the controller I have the following function:
@RequestMapping(value = "administrator/listAuthor/{authorName}/{pageNo}", method = { RequestMethod.GET,
RequestMethod.POST }, produces = "application/json")
public List<Author> listAuthors(@PathVariable(value = "authorName") String authorName,
@PathVariable(value = "pageNo") Integer pageNo) {
try {
if (authorName == null) {
authorName = "";
}
if (pageNo == null) {
pageNo = 1;
}
return adminService.listAuthor(authorName, pageNo);
} catch (Exception e) {
e.printStackTrace();
return null;
}
}
This function fetches data from a MySQL database based on "authorName" and "pageNo" and returns it. For example, when "authorName = a" and "pageNo = 1" I get:
[Screenshot: data returned when "authorName = a" and "pageNo = 1"]
Now I want to set "authorName" to "" (an empty string) so that I can fetch all the data from the MySQL database (because in the backend the pattern "%" + "" + "%" matches every row).
What can I do if I want to set authorName = empty string?
http://localhost:8080/spring/administrator/listAuthor/{empty string}/1
Thanks in advance!
I don't think you can encode an empty string as a path segment in the URL. What I suggest is to declare a constant that stands for the empty string, such as null.
Example:
administrator/listAuthor/null/90
Then, on the server side, check whether authorName is "null" (it arrives as the four-character string, not as Java null) and, if so, set the local variable to an empty string.
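A sketch of the server-side half of this suggestion. The sentinel `"null"` and the helper name are assumptions for illustration; the point is that Spring hands the path segment over as the string `"null"`, so an `authorName == null` check alone will not catch it.

```java
public class AuthorNameSentinel {

    // Hypothetical sentinel: the client sends the literal path segment "null"
    // to mean "no author filter"
    static final String EMPTY_SENTINEL = "null";

    // Maps the sentinel (or a real Java null) to "", leaves other names untouched
    static String normalizeAuthorName(String authorName) {
        if (authorName == null || EMPTY_SENTINEL.equals(authorName)) {
            return "";
        }
        return authorName;
    }

    public static void main(String[] args) {
        System.out.println("[" + normalizeAuthorName("null") + "]"); // []
        System.out.println("[" + normalizeAuthorName("a") + "]");    // [a]
    }
}
```

Call it at the top of `listAuthors` (`authorName = normalizeAuthorName(authorName);`). The trade-off is that an author actually named "null" can no longer be searched for; a less ambiguous sentinel, or an optional request parameter instead of a path variable, avoids that.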

JPA Query returns no result

I am trying to search for entities of kind "Book" in the datastore using the code below, but it returns an empty list: I get an empty result list with size=-1. Where am I going wrong?
@ApiMethod(name = "searchBook")
public List<Book> searchBook(@Named("bookName") String bookName,@Named("languageId")String languageId, @Named("subjectId")String subjectId) {
EntityManager mgr = getEntityManager();
List<Book> bookList=new ArrayList<Book>();
String queryString="";
queryString="SELECT x FROM Book x WHERE ";
//If I don't want to filter on a field, I send BLANK as its value.
if(bookName!=null && !bookName.matches("BLANK")){
queryString=queryString+"x.bookName =:bookName"+" AND ";
}
if(languageId!=null && !languageId.matches("BLANK")){
queryString=queryString+"x.languageId =:languageId"+" AND ";
}
if(subjectId!=null && !subjectId.matches("BLANK")){
queryString=queryString+"x.categoryId =:subjectId"+" AND ";
}
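//Removing the trailing " AND " (or the trailing " WHERE " when no filter is set)
if (queryString.endsWith(" AND ")) {
queryString=queryString.substring(0,queryString.length()-5);
} else {
queryString=queryString.substring(0,queryString.length()-7);
}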
Query q = mgr.createQuery(queryString);
//setting query parameters..
if(bookName!=null && !bookName.matches("BLANK")){
q.setParameter("bookName", bookName);
}
if(languageId!=null && !languageId.matches("BLANK")){
q.setParameter("languageId", languageId);
}
if(subjectId!=null && !subjectId.matches("BLANK")){
q.setParameter("subjectId", subjectId);
}
//executing the query...
try {
mgr.getTransaction().begin();
bookList = (List<Book>) q.getResultList();
mgr.getTransaction().commit();
} catch (Exception e) {
// TODO Auto-generated catch block
e.printStackTrace();
}finally{
mgr.close();
}
return bookList;
}
In my case, I tested by setting only the subjectId parameter. Is something wrong with how I set the parameters? Is this the correct way to set a dynamic number of parameters in a query?
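One way to make a dynamic query harder to get wrong is to build the JPQL string and the parameter map in the same place, so every `:name` in the query has exactly one bound value and no trailing `AND` or `WHERE` needs trimming. A plain-Java sketch (`BookQueryBuilder` is a hypothetical name; the JPQL and the `BLANK` convention are taken from the question):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BookQueryBuilder {

    // "BLANK" (or null) means "do not filter on this field", as in the question
    static boolean isSet(String value) {
        return value != null && !value.equals("BLANK");
    }

    // Appends "field = :param" when the value is set, using WHERE for the
    // first condition and AND for every later one, and records the value.
    private static void addCondition(StringBuilder jpql, Map<String, Object> params,
                                     String field, String param, String value) {
        if (!isSet(value)) {
            return;
        }
        jpql.append(params.isEmpty() ? " WHERE " : " AND ");
        jpql.append(field).append(" = :").append(param);
        params.put(param, value);
    }

    // Returns the JPQL; params is cleared and then filled with exactly the
    // named parameters that appear in the returned string.
    static String build(String bookName, String languageId, String subjectId,
                        Map<String, Object> params) {
        params.clear();
        StringBuilder jpql = new StringBuilder("SELECT x FROM Book x");
        addCondition(jpql, params, "x.bookName", "bookName", bookName);
        addCondition(jpql, params, "x.languageId", "languageId", languageId);
        addCondition(jpql, params, "x.categoryId", "subjectId", subjectId);
        return jpql.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> params = new LinkedHashMap<>();
        System.out.println(build("BLANK", "BLANK", "42", params)); // SELECT x FROM Book x WHERE x.categoryId = :subjectId
        System.out.println(params);                                // {subjectId=42}
    }
}
```

With JPA, each entry can then be bound in one loop with `q.setParameter(entry.getKey(), entry.getValue())`.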
