I need to customize Solr highlighting prefix and suffix like this:
<span class="highlight">text</span>
instead of the default
<em>text</em>
That's why I'm using this configuration within the solrconfig.xml for the HighlightComponent:
<searchComponent class="solr.HighlightComponent" name="highlight">
<highlighting>
<fragmentsBuilder name="simple" default="true" class="solr.highlight.SimpleFragmentsBuilder">
<lst name="defaults">
<str name="hl.tag.pre"><![CDATA[<span class="highlight">]]></str>
<str name="hl.tag.post"><![CDATA[</span>]]></str>
</lst>
</fragmentsBuilder>
</highlighting>
</searchComponent>
The following are the default parameters for my standard request handler:
<requestHandler name="standard" class="solr.SearchHandler" default="true">
<lst name="defaults">
<str name="hl">true</str>
<str name="hl.fl">body,title</str>
<str name="hl.useFastVectorHighlighter">true</str>
</lst>
</requestHandler>
When I search for the text word I do get the text word highlighted, but not always using the prefix and suffix I configured:
<lst name="highlighting">
<lst name="document_1">
<arr name="body">
<str>my <em>text</em> highlighted</str>
</arr>
<arr name="title">
<str>my <span class="highlight">text</span> highlighted</str>
</arr>
</lst>
</lst>
Does anybody know why?
I am guessing you are seeing this behavior behavior because you only have the prefix and suffix defined for the SimpleFragmentsBuilder and the other highlights are coming from another fragment builder.
I am using a custom prefix and suffix for my highlighting and I set this value in the formatter section of the highlighting section of the solrconfig.xml and have not had any issues as it will apply to all fragment builders.
So maybe try the following:
<highlighting>
<fragmentsBuilder name="simple" default="true"
class="solr.highlight.SimpleFragmentsBuilder"/>
<!-- Configure the standard formatter -->
<formatter name="html" class="org.apache.solr.highlight.HtmlFormatter"
default="true">
<lst name="defaults">
<str name="hl.simple.pre"><![CDATA[<span class="highlight">]]></str>
<str name="hl.simple.post"><![CDATA[</span>]]></str>
</lst>
</formatter>
</highlighting>
I finally found out why! I'm using fastVectorHighlighter to make highlighting faster.
At the beginning I was highlighting only the title field and everything worked fine.
When I added the body field to highlighting I forgot to enable termVectors=true.
Now that my body field looks like this
<field name="body" type="text" indexed="true" stored="true" termVectors="true" termPositions="true" termOffsets="true" />
after a full reindex highlighting is working perfectly:
<lst name="highlighting">
<lst name="document_1">
<arr name="body">
<str>my <span class="highlight">text</span> highlighted</str>
</arr>
<arr name="title">
<str>my <span class="highlight">text</span> highlighted</str>
</arr>
</lst>
</lst>
Previously the body field highlighting did work, but without fastVectorHighlighter since the field didn't have the termVectors=true parameter. That's why I got body highlighted with default prefix and suffix. Since fastVectorHighlighter is a completely different highlighting method, the configuration is different as well.
To avoid this kind of mistakes, as long the users can choose what fields to highlight with the hl.fl parameter, I'd recommend to include also the configuration for the standard highlighting (formatter element, class solr.highlight.HtmlFormatter) like this:
<searchComponent class="solr.HighlightComponent" name="highlight">
<highlighting>
<formatter name="html" default="true" class="solr.highlight.HtmlFormatter">
<lst name="defaults">
<str name="hl.simple.pre"><![CDATA[<span class="highlight">]]></str>
<str name="hl.simple.post"><![CDATA[</span>]]></str>
</lst>
</formatter>
<fragmentsBuilder name="simple" default="true" class="solr.highlight.SimpleFragmentsBuilder">
<lst name="defaults">
<str name="hl.tag.pre"><![CDATA[<span class="highlight">]]></str>
<str name="hl.tag.post"><![CDATA[</span>]]></str>
</lst>
</fragmentsBuilder>
</highlighting>
</searchComponent>
This way highlighting will work with the same prefix and suffix even for fields with termVectors disabled.
Related
I actually use Solr 4.8.1 and I set up spellcheck. After indexing, the request doesn't return any suggestion.
After the advice of #n0tting, I modified a little my files.
Here are steps:
1- solrconfig.xml
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
<str name="queryAnalyzerFieldType">phraseText</str>
<lst name="spellchecker">
<str name="classname">solr.IndexBasedSpellChecker</str>
<str name="spellcheckIndexDir">./spellchecker</str>
<str name="name">default</str>
<str name="field">title_spellcheck</str>
<str name="buildOnCommit">true</str>
</lst>
</searchComponent>
add some configurations in standard requestHandler:
<requestHandler name="standard" class="solr.StandardRequestHandler" default="true">
<!-- default values for query parameters -->
<lst name="defaults">
<str name="echoParams">explicit</str>
<int name="rows">10</int>
<!-- Optional, must match spell checker's name as defined above, defaults to "default" -->
<str name="spellcheck.dictionary">default</str>
<!-- omp = Only More Popular -->
<str name="spellcheck.onlyMorePopular">false</str>
<!-- exr = Extended Results -->
<str name="spellcheck.extendedResults">false</str>
<!-- The number of suggestions to return -->
<str name="spellcheck.count">1</str>
</lst>
<arr name="last-components">
<str>spellcheck</str>
</arr>
</requestHandler>
2 schema.xml
Define a field for spell check:
<field name="title_spellcheck" type="phraseText" indexed="true" stored="false" multiValued="true" />
<copyField source="title" dest="title_spellcheck"/>
3 Request:
.../select?q=recommend&defType=edismax&qf=title&spellcheck=true&spellcheck.build=true&spellcheck.q=recommend&spellcheck.collate=true
I don't get any suggestion at result, neither <lst name="spellcheck">. can anybody give me an advice? Thanks a lot.
References:
https://cwiki.apache.org/confluence/display/solr/Spell+Checking
http://solr.pl/en/2011/05/23/%E2%80%9Ccar-sale-application%E2%80%9D-%E2%80%93-spellcheckcomponent-%E2%80%93-did-you-really-mean-that-part-5/
I am having an issue with by solr settings.
After a lot of investigation today, I found that its the spellcheck component which is causing the issue of Core Reload to hang.
If its turned off, all will run well and core can easily reload. However, when the spellcheck is on, the core wont reload instead hangs forever. Then the only way to get the project back alive is to stop solr, and delete the data folder then start solr again.
Here are the solr config settings for spell check:
<requestHandler name="/select" class="solr.SearchHandler">
<lst name="defaults">
<!-- Spell checking defaults -->
<str name="spellcheck.dictionary">default</str>
<str name="spellcheck">on</str>
<str name="spellcheck.count">5</str>
<str name="spellcheck.onlyMorePopular">false</str>
<str name="spellcheck.maxResultsForSuggest">5</str>
<str name="spellcheck.alternativeTermCount">2</str>
<str name="spellcheck.extendedResults">false</str>
<str name="spellcheck.collate">true</str>
<str name="spellcheck.maxCollations">3</str>
<str name="spellcheck.maxCollationTries">3</str>
<str name="spellcheck.collateExtendedResults">true</str>
</lst>
<arr name="last-components">
<str>spellcheck</str>
</arr>
</requestHandler>
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
<str name="queryAnalyzerFieldType">text_en_splitting</str>
<lst name="spellchecker">
<str name="name">default</str>
<str name="field">location_details</str>
<str name="classname">solr.DirectSolrSpellChecker</str>
<str name="buildOnCommit">true</str>
<float name="accuracy">0.5</float>
<float name="thresholdTokenFrequency">.01</float>
<int name="maxEdits">1</int>
<int name="minPrefix">3</int>
<int name="maxInspections">3</int>
<int name="minQueryLength">4</int>
<float name="maxQueryFrequency">0.001</float>
</lst>
</searchComponent>
.
Here is the field from schema:
<field name="location_details" type="text_en_splitting" indexed="true" stored="false" required="false" />
Basically, it is a bug in Solr. You need to just hide/comment/remove the following from your requestHandler:
<!--<str name="spellcheck.maxCollationTries">3</str> here is a bug, put this parameter in the actual query string instead -->
Furthermore, if you really need to use maxCollationTries, you can enter it as a Query parameter in your url instead.
The following query works well for me
http://[]:8983/solr/vault/select?q=VersionComments%3AWhite
returns all the documents where version comments includes White
I try to omit the field name and put it as a default value as follows :
In solr config I write
<requestHandler name="/select" class="solr.SearchHandler">
<!-- default values for query parameters can be specified, these
will be overridden by parameters in the request
-->
<lst name="defaults">
<str name="echoParams">explicit</str>
<int name="rows">10</int>
<str name="df">PackageName</str>
<str name="df">Tag</str>
<str name="df">VersionComments</str>
<str name="df">VersionTag</str>
<str name="df">Description</str>
<str name="df">SKU</str>
<str name="df">SKUDesc</str>
</lst>
I restart the solr and create a full import.
Then I try using
http://[]:8983/solr/vault/select?q=White
(Where
http://[]:8983/solr/vault/select?q=VersionComments%3AWhite
still works)
But I dont get the document any as answer.
What am I doing wrong?
As far as I know you should only have the <str name="df"></str> declared once in your requestHandler
Typically what I do is copy all the fields that i want to search into a default search field called text.
schema.xml:
<copyField source="name_t" dest="text"/>
solrconfig.xml
<requestHandler name="/select" class="solr.SearchHandler">
<!-- default values for query parameters can be specified, these
will be overridden by parameters in the request
-->
<lst name="defaults">
<str name="q">*:*</str>
<str name="echoParams">explicit</str>
<int name="rows">10</int>
<str name="df">text</str>
</lst>
</requestHandler>
If this is not good enough, you can always search other fields using a dismax search with the qf declaration like so:
http://localhost:8983/solr/vault/select/?q= White&defType=dismax&qf=PackageName+Tag+VersionComments+VersionTag+Description+SKU+SKUDesc
I use spellcheck component and when I request solr I have results. But if I use spellcheck.q, i haven't result.
Someone has an idea ?
Thanks
<!-- The spell check component can return a list of alternative spelling
suggestions. -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
<str name="queryAnalyzerFieldType">textSpell</str>
<lst name="spellchecker">
<str name="name">default</str>
<str name="field">spellCheck</str>
<str name="spellcheckIndexDir">./spellchecker</str>
<str name="buildOnCommit">true</str>
<str name="accuracy">0.4</str>
<float name="thresholdTokenFrequency">.0004</float>
</lst>
</searchComponent>
<!--<queryConverter name="queryConverter" class="solr.SpellingQueryConverter"/>-->
<!-- Handler par défaut -->
<requestHandler name="default" class="solr.SearchHandler" lazy="true" default="true">
<lst name="defaults">
<str name="spellcheck.onlyMorePopular">false</str>
<str name="spellcheck.extendedResults">false</str>
<str name="spellcheck.count">10</str>
<str name="hl.usePhraseHighLighter">true</str>
<str name="hl.highlightMultiTerm">true</str>
<str name="hl.mergeContiguous">true</str>
</lst>
<arr name="last-components">
<str>highlight</str>
<str>spellcheck</str>
</arr>
</requestHandler>
Have you added your spellcheck component to the corresponding request handler (in solr config), set spellcheck parameter to true (or on) and configured the correct dictionary to use (if its name different than "default")?
If you don't use the spellcheck.q parameter, then the default is to use the q parameter (from http://wiki.apache.org/solr/SpellCheckComponent#q_OR_spellcheck.q). From that wiki:
Essentially, if you have a spelling "ready" version in your application, then it is probably better to send spellcheck.q, otherwise, if you just want Solr to do the job, use the q parameter
The reason that it works if you change the definition of the field type is probably due to the new field type being "spelling ready". It would help if you posted the query you are using and the relevant lines in the schema.xml.
I am trying to set up spellchecker, according to solr documentation. But when I am testing, I don't have any suggestion. My piece of code follows:
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
<str name="queryAnalyzerFieldType">textSpell</str>
<lst name="spellchecker">
<str name="classname">solr.IndexBasedSpellChecker</str>
<str name="name">default</str>
<str name="field">name</str>
<str name="spellcheckIndexDir">./spellchecker</str>
</lst>
<str name="queryAnalyzerFieldType">textSpell</str>
</searchComponent>
<requestHandler name="/spellcheck" class="solr.SearchHandler">
<lst name="defaults">
<str name="echoParams">explicit</str>
<!-- Optional, must match spell checker's name as defined above, defaults to "default" -->
<str name="spellcheck.dictionary">default</str>
<!-- omp = Only More Popular -->
<str name="spellcheck.onlyMorePopular">false</str>
<!-- exr = Extended Results -->
<str name="spellcheck.extendedResults">false</str>
<!-- The number of suggestions to return -->
<str name="spellcheck.count">1</str>
</lst>
<arr name="last-components">
<str>spellcheck</str>
</arr>
</requestHandler>
The query I send to Solr:
q=%2B%28text%3A%28gasal%29%29&suggestField=contentOriginal&ontologySeed=gasal&spellcheck.build=true&spellcheck.q=gasal&spellcheck=true&spellcheck.collate=true&hl=true&hl.snippets=5&hl.fl=text&hl.fl=text&rows=12&start=0&qt=%2Fsuggestprobabilistic
Does anybody know why?? Thanks in advance
First, don't repeat queryAnalyzerFieldType twice in the component configuration.
It is recommended not to use a /spellcheck handler but instead to bind the spellcheck component to the standard query handler (or dismax if it is what you use) like this:
<requestHandler name="standard" class="solr.SearchHandler" default="true">
<lst name="defaults">
...
</lst>
<arr name="last-components">
<str>spellcheck</str>
...
</arr>
</requestHandler>
You can then call it like this:
http://localhost:8983/solr/select?q=komputer&spellcheck=true
Also don't forget to build the spellcheck dictionary before you use it:
http://localhost:8983/solr/select/?q=*:*&spellcheck=true&spellcheck.build=true
You can force the dictionary to build at each commit by configuring it in the component:
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
<str name="queryAnalyzerFieldType">textSpell</str>
<lst name="spellchecker">
<str name="classname">solr.IndexBasedSpellChecker</str>
<str name="name">default</str>
<str name="field">name</str>
<str name="spellcheckIndexDir">./spellchecker1</str>
<str name="buildOnCommit">true</str>
</lst>
</searchComponent>
Finally, make sure that your name field is really an indexed field of type textSpell and that it contains enough content to build a good dictionary. In my case, I have a field named spellchecker that is populated from a couple of fields of my index (using copyField instructions in the schema).