I want to know how to configure the snippet size (number of words or characters) in highlighting. Currently I'm facing a problem: sometimes Solr gives me a snippet that is exactly the matched word. For example, if I query Solr with "Contents:risk" using SolrNet, the highlighting snippet is exactly "risk", with no further characters or words. I get the same result from the Solr admin interface.
I'm not quite familiar with highlighting features but I believe this is done with the hl.fragsize parameter.
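As a minimal sketch of how that parameter travels in a request, the snippet below builds a select URL with highlighting switched on. The host, port, and core name (`localhost:8983`, `test`) are placeholders rather than details from the question, and the value 200 is arbitrary; `hl.fragsize` is the approximate snippet size in characters.

```python
from urllib.parse import urlencode

# Assumed endpoint: adjust host, port, and core name for your setup.
SOLR_SELECT_URL = "http://localhost:8983/solr/test/select"

def build_highlight_url(query, field, fragsize):
    """Build a select URL that asks Solr for highlighted snippets."""
    params = {
        "q": query,
        "hl": "true",             # turn highlighting on
        "hl.fl": field,           # field to generate snippets for
        "hl.fragsize": fragsize,  # approximate snippet size in characters
    }
    return SOLR_SELECT_URL + "?" + urlencode(params)

url = build_highlight_url("Contents:risk", "Contents", 200)
print(url)
```

If the snippet still comes back as the bare matched word, comparing the generated URL with what the client actually sends can show whether the parameter reaches Solr at all.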
Mauricio already answered, and this is a fairly old thread, but to add the SolrNet solution:
Create a new HighlightingParameters object.
Set Fragsize.
Other parameters can be set the same way.
The highlighting documentation can be found here: Highlighting.md
Here is some sample code:
private HighlightingParameters SetHighLightSnippetParameters()
{
    return new HighlightingParameters
    {
        Fragsize = SearchConstants.SnippetSize
    };
}
I have a dynamic text field bar_* in my index and want Solr to return highlighting for that field. So what I run is:
q=gold&hl=true&hl.fl=bar_*
It works as expected, but as soon as I add more fields to hl.fl it stops working. E.g.
q=gold&hl=true&hl.fl=bar_*,foo
Notes:
bar_* and foo fields are in the index/schema and there is no error here.
Just rewriting the request as q=gold&hl=true&hl.fl=bar_*&hl.fl=foo or q=gold&hl=true&hl.fl=bar_* foo does NOT help.
I didn't find any bugs in Solr JIRA on that topic.
Does anyone have an idea how to beat this? The possible workarounds that I see are:
Use hl.fl=*. But this one is not good for performance.
Explicitly specify all possible fields names for my dynamic field. But I don't like that at all.
I don't know which version you are using, but this looks like a bug in earlier Solr versions. I can confirm that in Solr 7.3 this works as expected.
curl -X GET \
'http://localhost:8983/solr/test/select?q=x_ggg:Test1%20OR%20bar_x:Test2&hl=true&hl.fl=%2A_ggg,foo,bar_%2A' \
-H 'cache-control: no-cache'
The more correct way is to write hl.fl=bar_*,foo,*_ggg (use a comma or a space as the delimiter).
Doing so avoids a long debugging session later on, when you remove the asterisk from your hl.fl parameter and highlighting for those fields stops working because the value is no longer processed as a regex.
Here are the places in the Solr 7.3 sources where this behavior can be traced:
Solr calls org.apache.solr.highlight.SolrHighlighter#getHighlightFields
Before the fields are processed, the value is split on , or space here:
org.apache.solr.util.SolrPluginUtils#split
private final static Pattern splitList=Pattern.compile(",| ");
/** Split a value that may contain a comma, space of bar separated list. */
public static String[] split(String value){
    return splitList.split(value.trim(), 0);
}
The results of the split are passed to the method org.apache.solr.highlight.SolrHighlighter#expandWildcardsInHighlightFields.
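The two steps above can be approximated in a few lines of Python. This is a sketch of the behavior as described in this answer, not a port of the Solr source: it assumes each name containing `*` is turned into a regular expression by rewriting `*` to `.*` and matched against the known field names, and the field names in `stored` are made up for illustration.

```python
import re

def split_field_list(value):
    """Split on a comma or a single space, like the pattern ',| ' above."""
    # Simplification: empty tokens (from ", " for example) are dropped here.
    return [token for token in re.split(r",| ", value.strip()) if token]

def expand_wildcards(stored_fields, requested):
    """Expand each requested name containing '*' against the known field names."""
    expanded = []
    for name in requested:
        if "*" in name:
            # Turn the glob into a regular expression: '*' becomes '.*'.
            pattern = re.compile(name.replace("*", ".*"))
            expanded.extend(f for f in stored_fields if pattern.fullmatch(f))
        else:
            # Names without a wildcard are passed through unchanged.
            expanded.append(name)
    return expanded

# Hypothetical schema fields, used only for this demonstration.
stored = ["bar_x", "bar_y", "foo", "x_ggg", "title"]
fields = expand_wildcards(stored, split_field_list("bar_*,foo,*_ggg"))
print(fields)
```

With a comma- or space-delimited list, each entry is handled on its own, so plain names and globs can be mixed freely.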
The documentation also states the expected contract: https://lucene.apache.org/solr/guide/7_3/highlighting.html
hl.fl
Specifies a list of fields to highlight. Accepts a comma- or space-delimited list of fields for which Solr should generate highlighted snippets.
A wildcard of * (asterisk) can be used to match field globs, such as text_* or even * to highlight on all fields where highlighting is possible. When using *, consider adding hl.requireFieldMatch=true.
When not defined, the defaults defined for the df query parameter will be used.
Try:
q=gold&hl=true&hl.fl=bar_*&hl.fl=foo
After digging into the Solr sources (org.apache.solr.highlight.SolrHighlighter#getHighlightFields), I found a workaround for this. It appears that Solr interprets the hl.fl content as a regular expression pattern, so I specified hl.fl as:
hl.fl=bar_*|foo
I.e. using | instead of comma. That worked perfectly for me.
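One possible explanation for why this works, sketched in Python: if the whole value is turned into a single regular expression by rewriting `*` to `.*` (an assumption about that Solr version, not something verified here), then `|` acts as regex alternation. The candidate field names are hypothetical.

```python
import re

# Assumption: the whole hl.fl value becomes one regex, with '*' rewritten to '.*'.
hl_fl = "bar_*|foo"
pattern = re.compile(hl_fl.replace("*", ".*"))

# Hypothetical field names, used only for illustration.
candidates = ["bar_x", "bar_long_name", "foo", "title"]
matched = [name for name in candidates if pattern.fullmatch(name)]
print(matched)
```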
By the way, I found no documentation of this on the internet.
I have a List of Strings that I want to set as a parameter in a PreparedStatement. This question and answer make it look easy:
How to use an arraylist as a prepared statement parameter
Well, not really easy. There is still the conversion of a List to an SQL Array, which I found easiest to do by creating a String[] in between.
Below is my code:
PreparedStatement s = con.prepareStatement("SELECT * FROM Table WHERE Country IN ? ");
String[] countryArray = new String[countryListObject.size()];
countryArray = countryListObject.toArray(countryArray);
Array cArray = con.createArrayOf("VARCHAR", countryArray); //<--- Throws the exception
s.setArray(1, cArray);
This answer seems to address a similar problem, but I can't really understand how it helped solve anything. The answer is ambiguous at best, stating only that:
Basically what you are wanting to do is not directly possible using a PreparedStatement.
I've learned from the API documentation that this exception is thrown if the JDBC driver does not support this method. I'm running a com.microsoft.sqlserver sqljdbc4 version 3.0. I am trying to find out which versions do and don't support setArray, but I can't find the information. It is probably right in front of me, but I would really appreciate a little help on this.
How can I figure out whether my JDBC driver supports setArray()?
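When a driver has no SQL array support, a common fallback is to skip the array entirely and generate one placeholder per list element. The sketch below shows that different technique in Python with the standard-library sqlite3 module, not JDBC or SQL Server; the table, column names, and data are made up for the demonstration.

```python
import sqlite3

def select_by_countries(conn, countries):
    """Query with one '?' placeholder per list element instead of an SQL array."""
    if not countries:
        return []  # many databases reject an empty 'IN ()', so short-circuit
    placeholders = ", ".join("?" for _ in countries)
    sql = (
        "SELECT name, country FROM people "
        "WHERE country IN (%s) ORDER BY name" % placeholders
    )
    # Only the placeholder list is interpolated; the values stay bound parameters.
    return conn.execute(sql, list(countries)).fetchall()

# In-memory demo table; the names and rows are invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Ann", "Norway"), ("Bob", "Chile"), ("Cy", "Japan")],
)
rows = select_by_countries(conn, ["Norway", "Japan"])
print(rows)
```

The same idea carries over to a PreparedStatement: build the `IN (?, ?, ...)` text from the list size, then bind each element by index.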
I am using Lucene 4.2 and would like to know how WordNet can be used to expand an input query in this version of Lucene. Basically, if my query is like
term_1 AND term_2 OR term_3
I would like it to be expanded as
(term_1 OR term_1syn_1 OR term_1syn_2) AND (term_2 OR term_2syn_1) OR (term_3 OR term_3syn_1)
and so on.
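To pin down the transformation being asked for, here is a plain-Python illustration. It does not use Lucene at all: the synonym table is hypothetical, the query is split on whitespace only, and AND, OR, and NOT are assumed to be the only operators.

```python
OPERATORS = {"AND", "OR", "NOT"}

def expand_query(query, synonyms):
    """Wrap each term in an OR group with its synonyms; keep operators as-is."""
    parts = []
    for token in query.split():
        if token in OPERATORS:
            parts.append(token)
            continue
        alternatives = [token] + synonyms.get(token, [])
        if len(alternatives) == 1:
            parts.append(token)  # no synonyms known: leave the term alone
        else:
            parts.append("(" + " OR ".join(alternatives) + ")")
    return " ".join(parts)

# Hypothetical synonym table; in practice it would be loaded from WordNet.
synonyms = {
    "term_1": ["term_1syn_1", "term_1syn_2"],
    "term_2": ["term_2syn_1"],
    "term_3": ["term_3syn_1"],
}
expanded = expand_query("term_1 AND term_2 OR term_3", synonyms)
print(expanded)
```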
I looked at other answers on Stack Overflow for this kind of question, but none of them have a sample implementation.
Given an input query in the form of a string, how can I expand it using the WordNetQueryParser and SynonymMap classes?
I have already downloaded the WordNet Prolog file, and I know that the _s.pl file has all the synonyms.
Any sample code would be highly appreciated.
A SynonymFilter allows you to add a SynonymMap to a simple custom Analyzer.
You can create a custom Analyzer by just overriding Analyzer.createComponents, and pass the custom version to both the IndexWriter and the QueryParser, when writing to and searching the index respectively.
One thing to consider: your case involves exploding out all possible synonyms, which means passing true for includeOrig in Builder.add. There are tradeoffs either way here, so it is worth looking into which will actually serve your needs best.
Lucene's Analyzer is designed to be readily extended, so that you can define the processing for your particular case. The Analyzer API documentation linked above provides an example of overriding the createComponents method for your custom Analyzer.
Something like:
protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
    Tokenizer source = new ClassicTokenizer(Version.LUCENE_40, reader);
    TokenStream filter = new StandardFilter(Version.LUCENE_40, source);
    filter = new LowerCaseFilter(Version.LUCENE_40, filter);
    filter = new SynonymFilter(filter, mySynonymMap, false);
    //Whatever other filter you want to add to the chain, being mindful of order.
    return new TokenStreamComponents(source, filter);
}
And you'll need to define mySynonymMap from the example, which is a SynonymMap. The SynonymMap should generally be built with SynonymMap.Builder, via the add(CharsRef, CharsRef, boolean) method linked above.
SynonymMap.Builder builder = new SynonymMap.Builder(true);
builder.add(new CharsRef("crimson"), new CharsRef("red"), true);
//Be sure the boolean last arg you pass there is the one you want. There are significant tradeoffs here.
//Add as many terms as you like here...
SynonymMap mySynonymMap = builder.build();
There is also a WordNetSynonymParser, if you prefer that, which at a glance looks like a SynonymMap.Builder designed to read a particular sort of specification.
I am using the Sunburnt Python API for Solr search. I am using highlighted search in Sunburnt, and it works fine.
I am using the following code:
search_record = solrconn.query(search_text).highlight("content").highlight("title")
records = search_record.execute().highlighting
The problem is that it returns only 10 records. I know this can be changed in solrconfig.xml, but the issue is that I want all records.
I want to apply pagination to the highlighted search in Sunburnt.
Given the SOLR-534 issue, which is still unresolved, you can't tell Solr to give you all results, but you can use a really high rows parameter depending on how many documents you expect to have in your index. I don't know anything about sunburnt but I believe something like this should work:
search_record = solrconn.query(search_text).paginate(rows=10000).highlight("content").highlight("title")
You just have to replace the rows value with something large enough for your index size.
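If one very large rows value is impractical, the alternative is to walk the result set page by page using Solr's `start` and `rows` parameters. The helper below is a hypothetical sketch that only computes the parameter pairs; it assumes the total hit count is already known (for example from `numFound` in a first response) and leaves the actual request to whichever client is in use.

```python
def page_params(total_hits, rows_per_page):
    """Yield the start/rows pair for each page needed to cover total_hits."""
    for start in range(0, total_hits, rows_per_page):
        yield {"start": start, "rows": rows_per_page}

# Example: 25 matching documents fetched 10 at a time.
pages = list(page_params(total_hits=25, rows_per_page=10))
print(pages)
```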
The general approach to this is to use a paginator:
from django.core.paginator import Paginator
paginator = Paginator(si.query("black"), 30)
Once that's done, you can just paginate through everything:
for result in paginator.object_list:
    print(result)
As I'm saying in title, i want to get a complete sentence when i search with highlighting.
Currently, I get a result that is cut off in the middle of a word.
For example, if I'm searching for the word "complete", I get "ying in title, i want to get a complete sentence wh", but I want the complete sentence: "As I'm saying in title, i want to get a complete sentence when i search with highlighting."
I've already tried to use "fragmenter", but without any result.
Can anyone help me?
Thanks, and sorry for my English.
Also check whether your request handler or query parameters have hl.useFastVectorHighlighter set to true. If the field that summaries are generated from is not set up with the correct term settings, as mentioned in the wiki, words could be cut as you describe.
I think you will find your answer here: http://wiki.apache.org/solr/HighlightingParameters
Take a look at the parameters hl.snippets and hl.fragsize, where you can define the length of the returned fragment. You could set the value to the field size (if the field isn't too large).
Another interesting parameter in your case is hl.fragmenter.
I had almost the same problem with words being cut. As I just mentioned over here, you could use another BoundaryScanner.
This gave me perfect results.
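To illustrate the idea behind a sentence-level boundary scanner, here is a plain-Python toy. It is not Solr's BoundaryScanner: it simply widens a fragment, given by character offsets, to the nearest sentence terminators on either side, and it assumes that `.`, `!`, and `?` are the only terminators.

```python
def expand_to_sentence(text, start, end, terminators=".!?"):
    """Widen text[start:end] to the sentence boundaries around it."""
    left = start
    # Walk left until the character just before the slice ends a sentence.
    while left > 0 and text[left - 1] not in terminators:
        left -= 1
    right = end
    # Walk right until the slice itself ends with a terminator (or the text ends).
    while right < len(text) and text[right - 1] not in terminators:
        right += 1
    return text[left:right].strip()

text = "First sentence here. I want to get a complete sentence when I search. Last one."
fragment = "t a complete sentence wh"  # a snippet cut in the middle of words
offset = text.index(fragment)
sentence = expand_to_sentence(text, offset, offset + len(fragment))
print(sentence)
```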