Solr highlighting does not work with multiple fields in hl.fl when a dynamic field is present

I have a dynamic text field bar_* in my index and want Solr to return highlights for that field. So I run:
q=gold&hl=true&hl.fl=bar_*
This works as expected, BUT as soon as I add more fields to hl.fl, highlighting stops working. E.g.
q=gold&hl=true&hl.fl=bar_*,foo
Notes:
The bar_* and foo fields are both in the index/schema, and no error is reported.
Rewriting the request as q=gold&hl=true&hl.fl=bar_*&hl.fl=foo or q=gold&hl=true&hl.fl=bar_* foo does NOT help.
I didn't find any bugs in Solr JIRA on that topic.
Does anyone have an idea how to beat this? The possible workarounds that I see are:
Use hl.fl=*. But this is bad for performance.
Explicitly list every possible field name for my dynamic field. But I don't like that at all.

I don't know which version you are using, but this looks like a bug in earlier Solr versions. I can confirm that in Solr 7.3 this works as expected.
curl -X GET \
'http://localhost:8983/solr/test/select?q=x_ggg:Test1%20OR%20bar_x:Test2&hl=true&hl.fl=%2A_ggg,foo,bar_%2A' \
-H 'cache-control: no-cache'
The more correct way is hl.fl=bar_*,foo,*_ggg (use a comma or a space as the delimiter).
This avoids a long debugging session later: with a workaround that relies on regex-like syntax, removing the asterisk from your hl.fl parameter makes per-field highlighting stop working, because the value is then no longer processed as a pattern.
Here are the places in the Solr 7.3 sources where this behavior can be traced:
Solr calls org.apache.solr.highlight.SolrHighlighter#getHighlightFields
Before the fields are processed, the value is split on a comma or a space here:
org.apache.solr.util.SolrPluginUtils#split
private final static Pattern splitList=Pattern.compile(",| ");
/** Split a value that may contain a comma, space of bar separated list. */
public static String[] split(String value){
return splitList.split(value.trim(), 0);
}
The results of the split go to the method org.apache.solr.highlight.SolrHighlighter#expandWildcardsInHighlightFields.
The documentation also describes the expected contract: https://lucene.apache.org/solr/guide/7_3/highlighting.html
hl.fl
Specifies a list of fields to highlight. Accepts a comma- or space-delimited list of fields for which Solr should generate highlighted snippets.
A wildcard of * (asterisk) can be used to match field globs, such as text_* or even * to highlight on all fields where highlighting is possible. When using *, consider adding hl.requireFieldMatch=true.
When not defined, the defaults defined for the df query parameter will be used.
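The split-then-expand flow described above can be sketched in a few lines of Python. This is only an illustration of the documented contract, not Solr's actual code: the function names, the sample field list, and the use of fnmatch for glob matching are all assumptions, and empty pieces are dropped for simplicity.

```python
import re
from fnmatch import fnmatchcase

# Same delimiter pattern as the Java snippet above: a comma or a single space.
SPLIT_LIST = re.compile(r",| ")


def split_fields(value):
    """Split an hl.fl value on commas or spaces, dropping empty pieces."""
    return [part for part in SPLIT_LIST.split(value.strip()) if part]


def expand_highlight_fields(hl_fl, known_fields):
    """Expand glob entries against known field names (hypothetical helper)."""
    expanded = []
    for entry in split_fields(hl_fl):
        if "*" in entry:
            # Glob entry: keep every known field that matches it.
            matches = [f for f in known_fields if fnmatchcase(f, entry)]
        else:
            # Plain entry: taken literally as a field name.
            matches = [entry]
        for name in matches:
            if name not in expanded:
                expanded.append(name)
    return expanded


# Hypothetical field names, chosen to mirror the request above.
fields = ["bar_x", "bar_y", "foo", "x_ggg", "title"]
print(expand_highlight_fields("bar_*,foo,*_ggg", fields))
```

With these sample fields the call prints ['bar_x', 'bar_y', 'foo', 'x_ggg'], and "bar_* foo" (space-delimited) splits the same way as "bar_*,foo".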

Try:
q=gold&hl=true&hl.fl=bar_*&hl.fl=foo

After digging into the Solr sources (org.apache.solr.highlight.SolrHighlighter#getHighlightFields) I found a workaround. It appears that Solr interprets the hl.fl content as a regular expression pattern. So I specified hl.fl as:
hl.fl=bar_*|foo
I.e. using | instead of a comma. That worked perfectly for me.
By the way, I found no documentation of this on the internet.


Issues with searching special characters in Solr

I'm using Solr 6.1.0
When I use defType=edismax and enable debug mode with debugQuery=true, I find that a search for "r&d" is actually run against just the character "r".
http://localhost:8983/solr/collection1/highlight?q="r&d"&debugQuery=true&defType=edismax
"debug":{
"rawquerystring":"\"r",
"querystring":"\"r",
"parsedquery":"(+DisjunctionMaxQuery((text:r)))/no_coord",
"parsedquery_toString":"+(text:r)"
Escaping the character with a backslash does not help either.
http://localhost:8983/solr/collection1/highlight?q="r\&d"&debugQuery=true&defType=edismax
"debug":{
"rawquerystring":"\"r\\",
"querystring":"\"r\\",
"parsedquery":"(+DisjunctionMaxQuery((text:r)))/no_coord",
"parsedquery_toString":"+(text:r)",
But if I use other symbols, as in "r*d", the search is OK.
http://localhost:8983/solr/collection1/highlight?q="r*d"&debugQuery=true&defType=edismax
"debug":{
"rawquerystring":"\"r*d\"",
"querystring":"\"r*d\"",
"parsedquery":"(+DisjunctionMaxQuery((text:\"r d\")))/no_coord",
"parsedquery_toString":"+(text:\"r d\")",
What could be the reason behind this?
Regards,
Edwin
First: if you're using the URL exactly as you've pasted it, & is the separator between arguments in the URL. When it belongs to an argument's value rather than separating arguments, it has to be properly URL-encoded.
q=text:"foo&bar"&fl=..
is parsed as
q=text:"foo
bar"
fl=..
Your Solr client library usually handles this for you transparently. text%3A%22r%26d%22 is the URL-encoded version of text:"r&d".
Secondly, any further parsing depends on the analysis chain and tokenizer for the field you're searching. These determine which characters are kept and how the text is tokenized (split into separate tokens) before the tokens from the query text are matched against those from the indexed text.
Which analyzer are you using for your field? Try an analyzer that does not split the field into tokens, for example one built on KeywordTokenizerFactory.
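A small Python sketch of the effect, using only the standard library. It is not tied to any Solr client; the parameter values are taken from the examples above.

```python
from urllib.parse import parse_qsl, quote, urlencode

# Unencoded: the "&" inside the quoted phrase is treated as a separator.
raw = 'q=text:"foo&bar"&fl=..'
broken = parse_qsl(raw, keep_blank_values=True)
print(broken)

# Encoding the value first keeps the "&" inside the q parameter.
encoded_value = quote('text:"r&d"', safe="")
print(encoded_value)

# urlencode applies the same escaping to every value in a parameter dict.
query_string = urlencode(
    {"q": 'text:"r&d"', "defType": "edismax", "debugQuery": "true"}
)
round_trip = dict(parse_qsl(query_string))
print(round_trip["q"])
```

The first print shows q cut off at text:"foo with bar" parsed as a separate, empty parameter; the second prints text%3A%22r%26d%22; the last shows the full text:"r&d" surviving a round trip.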

Add specified comment pattern in c

I'm wondering whether there is a function or a software tool that lets me add an empty comment pattern, for example /**...*/, to the variables defined in methods in C.
I've tried Eclipse and Vim. The best I can do is add comments for functions at the beginning. I'd like to know whether I can add such a pattern wherever I want.
I know that a shortcut key like Shift+Ctrl+/ can turn a line into a comment, but only in the // format. If there's a way to change this format to the one I want, that would also be a great help. Thanks!
In Notepad++ you can do that!
Check this link
In the web page search for Comment / uncomment section.
With The NERD Commenter, you can surround selected text or a variable with comment delimiters via its <Leader>cc mapping:
[count]<Leader>cc NERDComComment
Comment out the current line or text selected in visual mode.
With
let NERDComInsertMap='<c-c>'
you can define an insert mode mapping that inserts the comment prefix and suffix at the current position, and puts the cursor in between. The comment syntax is filetype-specific and can be configured via the 'commentstring' option.
To change the comment prefix / suffix, you have to customize the plugin (in your ~/.vimrc), as described by :help NERDCustomDelimiters, e.g. for Java:
let g:NERDCustomDelimiters = {
\ 'java': { 'left': '/**', 'right': '*/' }
\ }
For unknown filetypes, you can also use 'commentstring', as this is what the plugin falls back to.

CakePHP/Croogo: Searching for strings with some special chars returns no results

I noticed a bug in Croogo's NodesController::search() when searching for words that contain certain non-ASCII characters, e.g. 'üäö'. If I search for 'Steuergeräte' (German), I get no results even though I should. If I search for 'Steuergerate' (which is misspelled in German), I get the desired results, which is totally weird.
A direct query on the database works fine:
"SELECT * FROM i18n WHERE content LIKE '%Steuergeräte%';"
Which returns the expected records.
But it's not a general problem with Unicode characters: searching for a Japanese word, for example, worked as expected. So this only affects some characters.
Cakephp: 2.4.0, Croogo: 1.4.5
Ok, I found the cause of the problem.
On the search-view, the string to be searched for is cleaned with:
$q = Sanitize::clean($this->request->params['named']['q']);
Among other things, this runs HTML entity encoding (PHP's htmlentities) on the string when 'encode' => true is set, which is the default. This turns e.g. ö into &ouml;, so the search then looks for words containing HTML entities.
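To make the mismatch concrete, here is a rough Python approximation of named-entity encoding. It is not PHP's htmlentities and not CakePHP code; the helper name is hypothetical and it only maps characters that have a named HTML 4 entity.

```python
from html.entities import codepoint2name


def encode_named_entities(text):
    """Replace each character that has a named HTML entity with that entity."""
    return "".join(
        f"&{codepoint2name[ord(ch)]};" if ord(ch) in codepoint2name else ch
        for ch in text
    )


stored = "Steuergeräte"                       # raw text as stored in the database
searched = encode_named_entities("Steuergeräte")

print(searched)            # the term the search actually looks for
print(searched in stored)  # whether that term occurs in the stored text
print(encode_named_entities("日本語"))  # characters without a named entity
```

The encoded search term becomes Steuerger&auml;te, which no longer occurs in the raw stored text, while the Japanese string passes through unchanged. That matches the observation that only some characters are affected.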
I got a workaround by doing:
$q = $this->request->params['named']['q'];
// Use encode=false on Sanitize::clean to prevent äüöß etc. getting
// replaced by html entities. And strip tags manually first to prevent
// html injected strings.
$q = strip_tags($q);
$q = Sanitize::clean($q, array('encode' => false));
Note: If, as in my case, TinyMCE is set with 'entity_encoding' => 'raw', then the body field in the nodes table also contains äöü instead of HTML entities, which IMO is far better practice than replacing them with HTML entities. By default, though, TinyMCE replaces such characters with HTML entities, so the body field would work with the default search behaviour of Croogo/CakePHP. But searching in the title field, for example, wouldn't.
Update
OK, as mark's comments suggested, sanitizing is not necessary when using Cake's paginate method, so the Sanitize part can be skipped. I also found htmlspecialchars to be even better than strip_tags, because strip_tags doesn't take care of e.g. '&', and in the body TinyMCE saves those as HTML entities. So the updated code looks like this:
$q = htmlspecialchars($this->request->params['named']['q']);
// go on with searching for nodes on paginate-method

Highlighting issue with quoted queries in Solr - fragment not returned

This is very curious. Highlighting works fine in every other case, but there's this one case it doesn't return any fragments. My document is as follows (fieldType text_en):
Abu Yahya Suhaib bin Sinan (May Allah be pleased with him) reported that: The Messenger of Allah (PBUH) said, "How wonderful is the case of a believer; there is good for him in everything and this applies only to a believer. If prosperity attends him, he expresses gratitude to Allah and that is good for him; and if adversity befalls him, he endures it patiently and that is better for him".[Muslim].
My query is
"wonderful is the case of a believer"
Solr finds the document all right, but the highlighting section of the response doesn't contain the text of the document. It has an entry for the primary key of the document (as always) but nothing deeper than that.
If I remove the last word, everything works properly. If I remove the last word from the quotes and place it outside, it works. It even works with a longer (different) string in quotes. It just doesn't work for this one!
How do I begin debugging this?
I don't have any highlighting set up in schema.xml, and here are the parameters I'm passing as part of the query:
&hl=true&hl.fl=hadithText&hl.snippets=50&hl.fragsize=2500&hl.mergeContiguous=true&defType=edismax&mm=3<-1%205<-2&hl.usePhraseHighlighter=true

How to Increase/Configure Snippet size of a highlight?

How can I configure the snippet size (number of words or characters) in highlighting? My current problem is that Solr sometimes gives me a snippet that is exactly the matched word. For example, if I query Solr for "Contents:risk" using SolrNet, the highlighting snippet is exactly "risk", with no additional characters or words. I get the same result from the Solr admin UI.
I'm not very familiar with the highlighting features, but I believe this is done with the hl.fragsize parameter.
Mauricio already answered and this is a bit of an old thread, but just to add the solution using SolrNet:
Create a new Highlighting parameters object.
Set fragsize
Other parameters are possible
Highlighting documentation can be found here: Highlighting.md
Here is a sample code:
private HighlightingParameters SetHighLightSnippetParameters()
{
return new HighlightingParameters
{
Fragsize = SearchConstants.SnippetSize
};
}
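For comparison, the same setting can be passed as a raw request parameter. The sketch below builds such a URL in Python; hl.fragsize and hl.snippets are standard Solr highlighting parameters, while the helper name, host, core name, and the chosen sizes are assumptions for illustration.

```python
from urllib.parse import urlencode


def highlight_query(base_url, query, field, fragsize=100, snippets=1):
    """Build a select URL that asks for highlighted snippets of a given size."""
    params = {
        "q": query,
        "hl": "true",
        "hl.fl": field,
        "hl.fragsize": fragsize,   # approximate snippet size in characters
        "hl.snippets": snippets,   # maximum number of snippets per field
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical host and core name.
url = highlight_query(
    "http://localhost:8983/solr/collection1/select",
    "Contents:risk",
    "Contents",
    fragsize=200,
)
print(url)
```

Pasting the resulting URL into a browser or the admin UI is a quick way to check whether a larger fragsize changes the snippets before wiring the value into SolrNet.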
