Jackrabbit Oak: Binary property in Solr index does not have the correct value - solr

I use a remote Solr index. I have one node type that has a binary property. When I add a node of this type and attach a non-empty text file, Oak adds a document to Solr, but the value of the binary field consists only of newline characters.
I traced it and found that the binary value extracted in the SolrIndexEditor class (line 235) is a LinkedList with a single entry that contains just "\n\n\n\n\n". Is there any configuration that I missed, or is this a bug?

I had missed the Tika parser in my dependencies; adding it solved the problem. I found the answer here.
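A quick diagnostic sketch for this kind of missing dependency, assuming the symptom comes from Tika having no concrete parser available at runtime. It only checks whether a few Tika classes can be loaded from the classpath. The class names are assumptions about where Tika ships them (AutoDetectParser in tika-core, TXTParser in the separate parsers artifact); adjust them to the Tika version your Oak build uses.

```java
class TikaClasspathCheck {
    // True if the class can be loaded by this class's loader (without initializing it).
    // A LinkageError means the class exists but is unusable, so it also counts as false.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, TikaClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed locations: AutoDetectParser ships in tika-core, while concrete
        // parsers such as TXTParser ship in the separate parsers artifact.
        String[] classNames = {
            "org.apache.tika.parser.AutoDetectParser",
            "org.apache.tika.parser.txt.TXTParser"
        };
        for (String name : classNames) {
            System.out.println(name + " -> " + (isOnClasspath(name) ? "present" : "MISSING"));
        }
    }
}
```

Run it with the same classpath as the Oak application; if the concrete parser class is reported as MISSING, the parsers dependency is probably absent.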

Related

Solr highlighting does not work with multiple fields hl.fl when dynamic field is present

I have a dynamic text field bar_* in my index and want Solr to return highlighting snippets for that field. So what I run is:
q=gold&hl=true&hl.fl=bar_*
It works as expected, BUT if I add more fields to hl.fl it stops working, e.g.:
q=gold&hl=true&hl.fl=bar_*,foo
Notes:
bar_* and foo fields are in the index/schema and there is no error here.
Rewriting the request as q=gold&hl=true&hl.fl=bar_*&hl.fl=foo or q=gold&hl=true&hl.fl=bar_* foo does NOT help.
I didn't find any bug about this in the Solr JIRA.
Does anyone have an idea how to beat this? The possible workarounds I see are:
Use hl.fl=*, but this is bad for performance.
Explicitly list all possible field names for my dynamic field, which I don't like at all.
I don't know which version you are using, but this seems to have been a bug in earlier Solr versions; I can confirm that in Solr 7.3 it works as expected:
curl -X GET \
'http://localhost:8983/solr/test/select?q=x_ggg:Test1%20OR%20bar_x:Test2&hl=true&hl.fl=%2A_ggg,foo,bar_%2A' \
-H 'cache-control: no-cache'
The more correct way is hl.fl=bar_*,foo,*_ggg (use a comma or a space as the delimiter).
This also saves you a long debugging session later: with a |-separated value such as bar_*|foo, highlighting on those fields stops working as soon as you remove the asterisk from hl.fl, because the value is no longer processed as a regex.
Here are the places in the Solr 7.3 sources where this behavior can be traced:
Solr calls org.apache.solr.highlight.SolrHighlighter#getHighlightFields
Before the fields are processed, the value is split by comma or space in:
org.apache.solr.util.SolrPluginUtils#split
private final static Pattern splitList=Pattern.compile(",| ");
/** Split a value that may contain a comma, space of bar separated list. */
public static String[] split(String value){
return splitList.split(value.trim(), 0);
}
The result of the split is then passed to org.apache.solr.highlight.SolrHighlighter#expandWildcardsInHighlightFields.
The documentation also describes the expected contract (https://lucene.apache.org/solr/guide/7_3/highlighting.html):
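The split-then-expand flow can be illustrated with a small, self-contained Java sketch. It is a simplified re-creation, not Solr's actual code: it reuses the split pattern from the SolrPluginUtils excerpt, then turns each entry containing * into a regex (* becomes .*) and matches it against a hypothetical list of field names; entries without * are kept as literal field names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

class HighlightFieldExpansion {
    // Same pattern as the SolrPluginUtils excerpt: split on a comma or a single space.
    private static final Pattern SPLIT_LIST = Pattern.compile(",| ");

    static String[] split(String value) {
        return SPLIT_LIST.split(value.trim(), 0);
    }

    // Simplified stand-in for wildcard expansion: each entry is handled on its own,
    // so a wildcard entry and a plain field name can be mixed freely.
    static List<String> expand(String hlFl, List<String> knownFields) {
        Set<String> result = new LinkedHashSet<>(); // keeps order, drops duplicates
        for (String entry : split(hlFl)) {
            if (entry.isEmpty()) {
                continue; // e.g. "bar_*, foo" yields an empty entry between , and space
            }
            if (entry.contains("*")) {
                String regex = entry.replace("*", ".*");
                for (String field : knownFields) {
                    if (field.matches(regex)) {
                        result.add(field);
                    }
                }
            } else {
                result.add(entry);
            }
        }
        return new ArrayList<>(result);
    }

    public static void main(String[] args) {
        // Hypothetical field names, for illustration only.
        List<String> fields = Arrays.asList("bar_x", "bar_y", "foo", "x_ggg", "other");
        System.out.println(expand("bar_*,foo,*_ggg", fields)); // [bar_x, bar_y, foo, x_ggg]
    }
}
```

Because each entry is expanded independently, removing the asterisk from one entry simply turns it into a literal field name instead of breaking the others.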
hl.fl
Specifies a list of fields to highlight. Accepts a comma- or space-delimited list of fields for which Solr should generate highlighted snippets.
A wildcard of * (asterisk) can be used to match field globs, such as text_* or even * to highlight on all fields where highlighting is possible. When using *, consider adding hl.requireFieldMatch=true.
When not defined, the defaults defined for the df query parameter will be used.
Try:
q=gold&hl=true&hl.fl=bar_*&hl.fl=foo
After digging into the Solr sources (org.apache.solr.highlight.SolrHighlighter#getHighlightFields), I found a workaround. As it turns out, Solr interprets the hl.fl content as a regular expression pattern, so I specified hl.fl as:
hl.fl=bar_*|foo
I.e. using | instead of comma. That worked perfectly for me.
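To see why | works while a comma does not in the affected versions, here is a small Java model of the behavior this answer describes. It assumes the whole hl.fl value is converted into a single regex (* becomes .*) without first being split into separate fields; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LegacyHighlightFieldRegex {
    // Assumed older behavior: a value containing '*' becomes ONE regex as a whole,
    // so commas and spaces are treated as literal characters of the pattern.
    static List<String> matchWholeValueAsRegex(String hlFl, List<String> knownFields) {
        String regex = hlFl.replace("*", ".*");
        List<String> matched = new ArrayList<>();
        for (String field : knownFields) {
            if (field.matches(regex)) { // String.matches requires a full match
                matched.add(field);
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList("bar_x", "bar_y", "foo");
        // "bar_.*,foo" only matches names that literally end in ",foo": none here.
        System.out.println(matchWholeValueAsRegex("bar_*,foo", fields)); // []
        // "bar_.*|foo" is a regex alternation, so both kinds of field match.
        System.out.println(matchWholeValueAsRegex("bar_*|foo", fields)); // [bar_x, bar_y, foo]
    }
}
```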
By the way, I found no documentation of this on the internet.

libxml2: missing children when dumping a node with xmlNodeDump()

I'm facing an issue with libxml2 (version 2.7.8.13).
I'm attempting to dump a node while parsing an in-memory document with an xmlTextReaderPtr reader.
So upon parsing the given node, I use xmlNodeDump() to get its whole content, and then switch to the next node.
Here is how I proceed:
[...]
// get the xmlNodePtr from the text reader
node = xmlTextReaderCurrentNode(reader);
// allocate a buffer to dump into
buf = xmlBufferCreate();
// dump the node
xmlNodeDump(buf, node->doc, node, 0 /* level of indentation */, 0 /* disable formatting */);
result = strdup((char*)xmlBufferContent(buf));
// free the buffer once its content has been copied
xmlBufferFree(buf);
This works in most cases, but sometimes the result is missing some children from the parsed node. For instance, the whole in-memory xml document contains
[...]
<aList>
<a>
<b>42</b>
<c>aaa</c>
<d/>
</a>
<a>
<b>43</b>
...
</aList>
and I get something like:
<aList>
<a>
<b>42</b>
</c>
</a>
</aList>
The result is well formed, but it lacks some data! A whole bunch of children have "disappeared". xmlNodeDump() should recursively dump all children of the node.
It looks like some kind of size limitation.
I guess I do something wrong, but I can't figure out what.
Thank you for your answers.
I succeeded in implementing this correctly another way, though I still do not understand what happened there. Thank you for having read my question.
FYI, instead of trying to tinker with the existing parsing code based on xmlTextReader, I simply rewrote a small parsing module for my case (dumping all the first-level siblings into separate memory chunks).
I did so by using the parsing and tree modules of libxml2, so:
get the tree from the in-memory xml document with xmlReadMemory()
get the first node with xmlDocGetRootElement()
for each sibling (with xmlNextElementSibling()), dump its content (all children recursively) with xmlNodeDump()
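Et voilà, kinda straightforward actually. Sometimes it's easier to start from scratch...
I guess there was some side effect. A likely one: xmlTextReader builds the tree incrementally, so when the reader is positioned on an element, its children may not have been parsed yet, and the node returned by xmlTextReaderCurrentNode() may be freed on later reads. Calling xmlTextReaderExpand(), which parses the current node's full subtree, before xmlNodeDump() should avoid this.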

Append bytes to a content of nt:unstructured type of node in jcr

I want to append some bytes to an already saved file in JCR. How can we do this?
The file is stored in an nt:unstructured node in the JCR repository. Any suggestions?
Because Binary values are streamed, I don't think you have any choice other than to create a new Binary from the stream of the original plus the extra bytes you want to append. In other words, I know of no utility or built-in functionality for this, so you have to write this minimal code yourself.
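A minimal sketch of the stream part, using only the JDK: SequenceInputStream yields the original bytes followed by the appended ones. In a JCR 2.0 repository, the original stream would come from the existing property via Property.getBinary().getStream(), and the combined stream would be turned into a new value with session.getValueFactory().createBinary(InputStream) and written back with Node.setProperty(name, binary). The property name depends on your content and is not shown; the class and method names below are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class AppendToBinary {
    // Yields every byte of `original`, then every byte of `extra`.
    // In JCR 2.0, `original` would be Property.getBinary().getStream(), and the
    // returned stream would be passed to ValueFactory.createBinary(InputStream).
    static InputStream appended(InputStream original, byte[] extra) {
        return new SequenceInputStream(original, new ByteArrayInputStream(extra));
    }

    // Demo helper: drains a stream into a byte array and closes it.
    static byte[] readAll(InputStream in) {
        try (InputStream src = in) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = src.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        InputStream original = new ByteArrayInputStream("hello ".getBytes(StandardCharsets.UTF_8));
        byte[] combined = readAll(appended(original, "world".getBytes(StandardCharsets.UTF_8)));
        System.out.println(new String(combined, StandardCharsets.UTF_8)); // hello world
    }
}
```

Streaming the concatenation avoids loading the whole file into memory; only the appended bytes are held in a buffer.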

How to convert <node/> to <node></node> with libxml (converting empty elements to start-end tag pairs)

While generating XML content, I get an empty node <node/>, and I want it to be <node></node>. (Since <node></node> is the correct form in c14n; the process is called "converting empty elements to start-end tag pairs".)
How should I convert it?
There's a way, hinted at by Jim Garrison (thank you), to do this:
use xmlBufferCreate, xmlSaveToBuffer, xmlSaveDoc and xmlSaveClose
with the xmlSaveOption XML_SAVE_NO_EMPTY.
Take a look at the libxml2 documentation, specifically xmlSaveOption value XML_SAVE_NO_EMPTY
I found another way, which is easier when you control how the nodes are generated: simply give the node the value "" (an empty string).

JCR create single file, link from different nodes

I am trying to create a single file node for an image named, say, A.gif. Now I want to reuse the file across multiple nodes. Is there a way to do this?
As a workaround, I am re-creating file nodes for different paths in my repository, but this results in duplication of files.
If you're using Jackrabbit, copying a file node (or rather, copying a binary property) is cheap if the DataStore is active.
That component makes sure "large" binary properties (with a configurable size threshold IIRC) are stored once only, based on a digest of their content.
So you can in this case copy the same file node many times without having to worry about disk space.
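To illustrate the idea of storing a binary once based on a digest of its content, here is a toy in-memory model. It is not Jackrabbit's DataStore: the class and method names are hypothetical, SHA-256 is an assumed digest, and, per the answer above, the real component also applies a configurable size threshold that this sketch ignores.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

class ToyContentAddressedStore {
    // digest (hex) -> content; identical content ends up under one key
    private final Map<String, byte[]> blobs = new HashMap<>();

    static String sha256Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    // Stores the content only if it is new and returns its identifier (the digest).
    String put(byte[] content) {
        String id = sha256Hex(content);
        blobs.putIfAbsent(id, content.clone());
        return id;
    }

    int blobCount() {
        return blobs.size();
    }

    public static void main(String[] args) {
        ToyContentAddressedStore store = new ToyContentAddressedStore();
        byte[] image = "pretend this is A.gif".getBytes(StandardCharsets.UTF_8);
        // Three "file nodes" pointing at the same content
        String id1 = store.put(image);
        String id2 = store.put(image.clone());
        String id3 = store.put(image);
        System.out.println(id1.equals(id2) && id2.equals(id3)); // true
        System.out.println(store.blobCount()); // 1
    }
}
```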
I'm not sure I understand your problem. However, what I would do is store the file in a single location and then reference it using a path property from multiple locations.
Assume that you have the following node structure:
- content
  - articles
    - article1
    - article2
  - images
    - image1
On each article you can set a property named imagePath that points to the path of the image to display, in this case /content/images/image1.
The nt:linkedFile type was made for just this kind of use.
And just for completeness, don't forget references.
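import javax.jcr.Node;
import javax.jcr.Property;
import javax.jcr.PropertyIterator;
import org.apache.jackrabbit.JcrConstants;

// rootNode and session are assumed to come from an existing JCR session
Node imageNode = rootNode.addNode("imageNode");
// only referenceable nodes can be the target of a REFERENCE property
imageNode.addMixin(JcrConstants.MIX_REFERENCEABLE);
Node node1 = rootNode.addNode("1");
node1.setProperty("image", imageNode); // creates a REFERENCE property
Node node2 = rootNode.addNode("2");
node2.setProperty("image", imageNode);
session.save();
// list every property that references imageNode
PropertyIterator references = imageNode.getReferences();
while (references.hasNext()) {
    Property reference = references.nextProperty();
    System.out.println(reference.getPath());
}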

Resources