I need to get snippets from documents where the query terms are matched to be able to output results similar to Google's snippet beneath the website URL. For example:
Snippet - Wikipedia, the free encyclopedia
en.wikipedia.org/wiki/Snippet
A snippet is defined as a small piece of something, it may in more specific contexts refer to: Sampling (music), the use of a short phrase of a recording as an ...
I have set hl=true and even hl.fl='*' in the query URL, but no summaries are being output.
Solr FAQs say:
For a field to be summarizable it must be both stored and indexed.
I'm using Nutch and Solr and have set them up using this tutorial. What additional steps do I need to take to be able to do this?
Adding sample query and output:
http://localhost:8983/solr/select/?q=test&version=2.2&start=0&rows=10&indent=on&hl=true
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">57</int>
<lst name="params">
<str name="indent">on</str>
<str name="start">0</str>
<str name="q">test</str>
<str name="hl">true</str>
<str name="version">2.2</str>
<str name="rows">10</str>
</lst>
</lst>
<result name="response" numFound="94" start="0">
<doc>
<arr name="anchor">
<str>User:Sir Lestaty de Lioncourt</str>
</arr>
<float name="boost">0.0</float>
<str name="digest">6c27160d0b08068f3873bb2c063508b3</str>
<str name="id">
http://aa.wikibooks.org/wiki/User:Sir_Lestaty_de_Lioncourt
</str>
<str name="segment">20111029223245</str>
<str name="title">User:Sir Lestaty de Lioncourt - Wikibooks</str>
<date name="tstamp">2011-10-29T21:34:27.055Z</date>
<str name="url">
http://aa.wikibooks.org/wiki/User:Sir_Lestaty_de_Lioncourt
</str>
</doc>
...
</result>
<lst name="highlighting">
<lst name="http://aa.wikibooks.org/wiki/User:Sir_Lestaty_de_Lioncourt"/>
<lst name="http://aa.wikipedia.org/wiki/User:PipepBot"/>
<lst name="http://aa.wikipedia.org/wiki/User:Purodha"/>
...
</lst>
</response>
It looks like you aren't specifying the field to highlight (hl.fl). You should create a text field to use for highlighting (not a string type) and make sure it is both stored and indexed.
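As a minimal sketch of what the request could look like, the Python snippet below builds a select URL with both hl and hl.fl set. The field name `content` is only an assumption for illustration; substitute whichever stored and indexed text field your schema actually defines. The helper name `build_highlight_url` is hypothetical as well.

```python
from urllib.parse import urlencode


def build_highlight_url(base_url, query, highlight_field, rows=10):
    """Build a Solr select URL that asks for highlighting on one field."""
    params = {
        "q": query,
        "start": 0,
        "rows": rows,
        "indent": "on",
        "hl": "true",
        # Field to build snippets from. "content" below is an assumed name;
        # the field must be stored and indexed for snippets to come back.
        "hl.fl": highlight_field,
    }
    return base_url + "?" + urlencode(params)


url = build_highlight_url("http://localhost:8983/solr/select", "test", "content")
print(url)
```

If the highlighting section still comes back empty with hl.fl set, the field definition in the schema is the next thing to check.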
I have created two query documents named 'makeup' and 'make up' in elevate.xml.
When I execute the elevate Solr query, I get the exception "Boosting query defined twice for query",
whereas when I save two documents named 'ChildCare' and 'Child Care', Solr returns the results.
Below is my Solr query:
http://localhost:8983/solr/oneweb-collection/elevate?
q=*:*&defType=edismax&fl=id&fl=title&fl=subtitle&fl=course_code&
fl=cricos_code&fl=course_introduction&fl=outcome&fl=page_url&
fl=score&fl=%5Btafe_elevated%5D&rows=3&wt=json
When I save the document nodes, the system internally replaces the spaces and stores the documents under the same name.
What is the resolution for this issue?
Config for elevator:
<searchComponent name="elevator" class="solr.QueryElevationComponent" >
<str name="queryFieldType">text_general</str>
<str name="config-file">elevate.xml</str>
<str name="forceElevation">true</str>
<str name="exclusive">true</str>
<str name="editorialMarkerFieldName">test_elevated</str>
</searchComponent>
<requestHandler name="/elevate" class="solr.SearchHandler" startup="lazy">
<lst name="defaults">
<str name="echoParams">explicit</str>
<str name="defType">edismax</str>
<int name="rows">3</int>
<str name="fl">id,title,subtitle,course_code,cricos_code,course_introduction,outcome,page_url,[test_elevated],score</str>
<str name="q.alt">*:*</str>
</lst>
<arr name="last-components">
<str>elevator</str>
</arr>
</requestHandler>
I am using Solr for spell checking and have enabled both DirectSolrSpellChecker and WordBreakSolrSpellChecker. I have the following issues:
A. When I query for "worry", Solr converts the term to "worri" and returns results for that. If a word ends with "y" (for example "injury" or "worry"), the trailing "y" is replaced with "i".
Example Query:
http://localhost:8983/solr/MY_CORE/spell?df=text&spellcheck.q=worry&spellcheck=true&spellcheck.extendedResults=true&spellcheck.onlyMorePopular=true
Solr Result:
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">5</int>
</lst>
<result name="response" numFound="0" start="0"/>
<lst name="spellcheck">
<lst name="suggestions">
<lst name="worri">
<int name="numFound">9</int>
<int name="startOffset">0</int>
<int name="endOffset">5</int>
<int name="origFreq">5</int>
<arr name="suggestion">
<lst>
<str name="word">wo r ri</str>
<int name="freq">90</int>
</lst>
<lst>
<str name="word">worst</str>
<int name="freq">12</int>
</lst>
<lst>
<str name="word">wo r r i</str>
<int name="freq">5246</int>
</lst>
<lst>
<str name="word">work</str>
<int name="freq">2920</int>
</lst>
<lst>
<str name="word">w o r ri</str>
<int name="freq">530</int>
</lst>
<lst>
<str name="word">worn</str>
<int name="freq">81</int>
</lst>
<lst>
<str name="word">w o r r i</str>
<int name="freq">5246</int>
</lst>
<lst>
<str name="word">wors</str>
<int name="freq">79</int>
</lst>
<lst>
<str name="word">worm</str>
<int name="freq">10</int>
</lst>
</arr>
</lst>
</lst>
<bool name="correctlySpelled">false</bool>
</lst>
</response>
B. The output above also contains words like "w o r r i", and I couldn't find any of them in the Solr field. I don't know why Solr returns such words, with the letters separated by spaces.
Below is the schema file:
<field name=MY FIELD type="text_en" multiValued="false" indexed="true" stored="true"/>
Below is the config file:
<!-- a spellchecker built from a field of the main index -->
<lst name="spellchecker">
<str name="name">default</str>
<str name="field"> MY FIELD </str>
<str name="classname">solr.DirectSolrSpellChecker</str>
<!-- the spellcheck distance measure used, the default is the internal levenshtein -->
<str name="distanceMeasure">internal</str>
<!-- minimum accuracy needed to be considered a valid spellcheck suggestion -->
<float name="accuracy">0.5</float>
<!-- the maximum #edits we consider when enumerating terms: can be 1 or 2 -->
<int name="maxEdits">2</int>
<!-- the minimum shared prefix when enumerating terms -->
<int name="minPrefix">1</int>
<!-- maximum number of inspections per result. -->
<int name="maxInspections">5</int>
<!-- minimum length of a query term to be considered for correction -->
<int name="minQueryLength">4</int>
<!-- maximum threshold of documents a query term can appear to be considered for correction -->
<float name="maxQueryFrequency">0.01</float>
<!-- uncomment this to require suggestions to occur in 1% of the documents
<float name="thresholdTokenFrequency">.01</float>
-->
</lst>
<!-- a spellchecker that can break or combine words. See "/spell" handler below for usage -->
<lst name="spellchecker">
<str name="name">wordbreak</str>
<str name="classname">solr.WordBreakSolrSpellChecker</str>
<str name="field">MY FIELD</str>
<str name="combineWords">false</str>
<str name="breakWords">true</str>
<int name="maxChanges">10</int>
</lst>
</searchComponent>
<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
<lst name="defaults">
<str name="spellcheck.dictionary">default</str>
<str name="spellcheck.dictionary">wordbreak</str>
<str name="spellcheck">on</str>
<str name="spellcheck.extendedResults">true</str>
<str name="spellcheck.count">10</str>
<str name="spellcheck.alternativeTermCount">5</str>
<str name="spellcheck.maxResultsForSuggest">5</str>
<str name="spellcheck.collate">false</str>
<str name="spellcheck.collateExtendedResults">false</str>
<str name="spellcheck.maxCollationTries">10</str>
<str name="spellcheck.maxCollations">5</str>
</lst>
<arr name="last-components">
<str>spellcheck</str>
</arr>
</requestHandler>
I would really appreciate it if someone could help me with this.
Thanks in advance !
The "strange" suggestions such as "wo r r i" appear because you're using WordBreakSolrSpellChecker, which breaks tokens apart while trying to offer spellcheck suggestions. If you remove it, you shouldn't get these kinds of suggestions. Here is the quote from the official documentation:
WordBreakSolrSpellChecker offers suggestions by combining adjacent
query terms and/or breaking terms into multiple words. It is a
SpellCheckComponent enhancement, leveraging Lucene's
WordBreakSpellChecker. It can detect spelling errors resulting from
misplaced whitespace without the use of shingle-based dictionaries and
provides collation support for word-break errors, including cases
where the user has a mix of single-word spelling errors and word-break
errors in the same query. It also provides shard support.
So, basically, in your example you're getting normal suggestions from the Solr index, such as worst, work, worm, worn and wors. All the others are just the result of WordBreakSolrSpellChecker, and you will never find them in your index.
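To see the two kinds of suggestions side by side, here is a small Python sketch that parses a trimmed copy of the response above and separates them. It relies on a heuristic that is my own assumption, not something Solr reports: a suggestion containing a space is treated as a word-break suggestion. The function name `split_suggestions` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the spellcheck section from the question.
SAMPLE = """<lst name="spellcheck">
  <lst name="suggestions">
    <lst name="worri">
      <arr name="suggestion">
        <lst><str name="word">wo r ri</str><int name="freq">90</int></lst>
        <lst><str name="word">worst</str><int name="freq">12</int></lst>
        <lst><str name="word">w o r r i</str><int name="freq">5246</int></lst>
        <lst><str name="word">work</str><int name="freq">2920</int></lst>
      </arr>
    </lst>
  </lst>
</lst>"""


def split_suggestions(xml_text):
    """Return (index_terms, word_breaks) from a spellcheck XML fragment."""
    root = ET.fromstring(xml_text)
    index_terms, word_breaks = [], []
    for node in root.iter("str"):
        if node.get("name") != "word":
            continue
        word = node.text or ""
        # Assumed heuristic: a space means the term was split by the
        # word-break checker rather than taken from the index as-is.
        if " " in word:
            word_breaks.append(word)
        else:
            index_terms.append(word)
    return index_terms, word_breaks


index_terms, word_breaks = split_suggestions(SAMPLE)
print(index_terms)
print(word_breaks)
```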
I would like to channel Solr search results at query time. For example, I have three channels: products, faq and other_docs, all within the same Solr core with the same fields filled. What I would like to achieve is to have Solr group the results by "channel" for me.
Sample database (csv):
id,channel,name,desc
1,product,Some product,This is an very cool product!
2,product,Other product,This is an other product!
3,faq,How to stuff,This time: Simply do it!
4,other_docs,Legal notice,All your base are belong to us!
Wanted query result (xml):
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="grouped">
<lst name="channel">
<int name="matches">3</int>
<arr name="groups">
<lst>
<str name="groupValue">product</str>
<result name="doclist" numFound="2" start="0">
<doc>
<str name="name">Some product</str>
<str name="desc">This is an very cool product!</str></doc>
<doc>
<str name="name">Other product</str>
<str name="desc">This is an other product!</str></doc>
</result>
</lst>
<lst>
<str name="groupValue">faq</str>
<result name="doclist" numFound="1" start="0">
<doc>
<str name="name">How to stuff</str>
<str name="desc">This time: Simply do it!</str></doc>
</result>
</lst>
</arr>
</lst>
</lst>
</response>
How do I achieve this?
Check the field collapsing feature in Solr:
Result Grouping / Field Collapsing
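As a rough sketch, a grouped request for the sample data might be built like this in Python. The base URL and the helper name `build_grouped_url` are assumptions for illustration; group, group.field and group.limit are the result grouping parameters. Since group.limit defaults to 1, it has to be raised to get both product documents back as in the wanted output.

```python
from urllib.parse import urlencode


def build_grouped_url(base_url, query, group_field, per_group=10):
    """Build a Solr URL that groups results by the values of one field."""
    params = {
        "q": query,
        "group": "true",
        "group.field": group_field,
        # Documents returned per group; Solr's default is 1.
        "group.limit": per_group,
        "fl": "name,desc",
    }
    return base_url + "?" + urlencode(params)


# Assumed core URL; adjust to your own setup.
url = build_grouped_url("http://localhost:8983/solr/select", "*:*", "channel")
print(url)
```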
We're trying to use Solr to correct the spelling of certain texts in a search box. We found that it works like this:
http://localhost:8080/solr/collection1/spell?q=badspelled&spellcheck=true
It returns a set of suggested terms. What we need, however, is not a list of suggestions but for Solr to run the search directly using the first suggestion. Is that possible?
You will need to add the "spellcheck.collate=true" parameter to your first search query and then use the "collation" value in the response to fire a second query with that value.
Example from the plugin page:
http://localhost:8983/solr/spell?q=price:[80 TO 100] delll ultrashar&spellcheck=true&spellcheck.extendedResults=true&spellcheck.collate=true
This returns the suggestions:
<lst name="spellcheck">
<lst name="suggestions">
<lst name="delll">
<int name="numFound">1</int>
<int name="startOffset">18</int>
<int name="endOffset">23</int>
<int name="origFreq">0</int>
<arr name="suggestion">
<lst>
<str name="word">dell</str>
<int name="freq">2</int>
</lst>
</arr>
</lst>
<lst name="ultrashar">
<int name="numFound">1</int>
<int name="startOffset">24</int>
<int name="endOffset">33</int>
<int name="origFreq">0</int>
<arr name="suggestion">
<lst>
<str name="word">ultrasharp</str>
<int name="freq">2</int>
</lst>
</arr>
</lst>
<bool name="correctlySpelled">false</bool>
<str name="collation">price:[80 TO 100] dell ultrasharp</str>
</lst>
</lst>
Then fire another query with the suggested query:
http://localhost:8983/solr/spell?q=price:[80 TO 100] dell ultrasharp&spellcheck=true&spellcheck.extendedResults=true&spellcheck.collate=true
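The two-step flow can be sketched in Python roughly as follows. This only shows pulling the collation out of a response and building the follow-up URL; actually sending the requests is left out. The function names are hypothetical, and the sample is a trimmed copy of the response above.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Trimmed copy of the spellcheck section shown above.
SAMPLE = """<lst name="spellcheck">
  <lst name="suggestions">
    <bool name="correctlySpelled">false</bool>
    <str name="collation">price:[80 TO 100] dell ultrasharp</str>
  </lst>
</lst>"""


def extract_collation(xml_text):
    """Return the first collation string in the response, or None."""
    root = ET.fromstring(xml_text)
    for node in root.iter("str"):
        if node.get("name") == "collation":
            return node.text
    return None


def build_requery_url(base_url, collation):
    """Build the second query URL from the collated query string."""
    return base_url + "?" + urlencode({"q": collation})


collation = extract_collation(SAMPLE)
if collation is not None:
    url = build_requery_url("http://localhost:8983/solr/spell", collation)
    print(url)
```

When no collation comes back, the original query was either spelled correctly or Solr had nothing to suggest, so the first result set can be used as it is.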
SpellCheck collate
I am using the highlighting feature of Solr. It works well except for one thing. Here is the problem in the response:
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">9</int>
</lst>
<result name="response" numFound="1" start="0" maxScore="0.6901834">
<doc>
<str name="desc">I study in school</str>
<str name="name">school</str>
<str name="value">DPS</str>
<str name="country">India</str>
<str name="state">delhi</str>
<str name="city">New Delhi</str>
<str name="area">R.K. Puram</str>
<str name="id">c02101a4-c5c2-46a9-bb73-805208167b3c</str>
<float name="score">0.6901834</float></doc>
</result>
<lst name="highlighting">
<lst name="c02101a4-c5c2-46a9-bb73-805208167b3c">
<arr name="name">
<str>school</str>
</arr>
<arr name="value">
<str><em>DP</em>S</str>
</arr>
</lst>
</lst>
</response>
In the highlighting section, why am I getting the "name":"school" field, even though nothing in it is highlighted, unlike "value":"DPS"?
Thanks