I have some docs in Solr that contain information about books. One of the fields is author, defined as:
<field name="author" type="text_general" indexed="true" stored="true"/>
This is an example of a doc:
<doc>
<str name="id">db04</str>
<str name="isbn">0596529325</str>
<str name="author">Toby Segaran</str>
<str name="category">Computers/Programming/Information Retrieval/Machine Learning</str>
<arr name="title">
<str>Programming Collective Intelligence</str>
</arr>
<int name="yearpub">2007</int>
<date name="pubdate">2007-07-28T00:00:01Z</date>
</doc>
I'm trying to create an autocomplete system using Solr 4.2. So far it works well: if I search for "to", it returns Toby Segaran as the result.
But on our website many people search for "Segaran", for instance, and I was wondering whether it is possible to suggest Toby Segaran when this happens.
So far this is the schema.xml I'm using:
<field name="author_suggest" type="text_auto" indexed="true" stored="true" multiValued="false"/>
<copyField source="author" dest="author_suggest"/>
<fieldType class="solr.TextField" name="text_auto">
<analyzer>
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
Basically, the value of the author field is copied to author_suggest, where it is indexed as a single lowercased token.
In solrconfig.xml, these were created:
<searchComponent class="solr.SpellCheckComponent" name="suggest">
<lst name="spellchecker">
<str name="name">suggest</str>
<str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
<str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookupFactory</str>
<str name="field">author_suggest</str>
<float name="threshold">0.005</float>
<str name="buildOnCommit">true</str>
</lst>
</searchComponent>
<requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/suggest">
<lst name="defaults">
<str name="spellcheck">true</str>
<str name="spellcheck.dictionary">suggest</str>
<str name="spellcheck.onlyMorePopular">true</str>
<str name="spellcheck.count">6</str>
<str name="spellcheck.collate">false</str>
</lst>
<arr name="components">
<str>suggest</str>
</arr>
</requestHandler>
So, is it possible to make suggestions based on words that are not at the beginning of the phrase, using Solr's suggester?
If you need more information please let me know.
Thanks in advance
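To illustrate the behaviour described in the question: with KeywordTokenizerFactory the whole author value becomes a single token, so a prefix-based lookup can only match from the start of the phrase. The following is a minimal Python sketch of that difference. It is plain Python, not Solr code; the function names `prefix_suggest` and `infix_suggest` and the second author entry are made up for illustration.

```python
def prefix_suggest(values, typed):
    """Mimic a keyword-tokenized prefix lookup: the whole value is one lowercased token."""
    typed = typed.lower()
    return [v for v in values if v.lower().startswith(typed)]


def infix_suggest(values, typed):
    """Match the typed text against the start of any whitespace-separated word."""
    typed = typed.lower()
    return [v for v in values
            if any(word.startswith(typed) for word in v.lower().split())]


# "Jane Doe" is a made-up entry, only there so the list has a non-matching value.
authors = ["Toby Segaran", "Jane Doe"]

print(prefix_suggest(authors, "to"))   # matches from the start of the phrase
print(prefix_suggest(authors, "seg"))  # empty: "seg" is not at the start of any value
print(infix_suggest(authors, "seg"))   # matches the second word of the first value
```

The second function is the behaviour being asked for: the full value is still returned, but any word in it may start the match.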
Related
I am developing a web application, and am using Solr as search engine. I would like to add autocomplete functionality. To do this, I have added the Suggester component, and configured a separate field for it. This works ok.
The problem is that Suggester returns the whole value of the field. For example, if the name of an article is "A newsworthy item" and I search for "new", it will return the whole "A newsworthy item", where I would like it to just return "newsworthy". In other words, return the individual word tokens.
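The desired behaviour can be stated as a small Python sketch. This is plain Python rather than Solr configuration, and `token_suggest` is a hypothetical name; it only shows what "return the individual word tokens" means for the example above.

```python
def token_suggest(values, typed):
    """Return the individual lowercased word tokens that start with the typed text."""
    typed = typed.lower()
    found = []
    for value in values:
        # Whitespace split plus lowercasing, one token per word.
        for token in value.lower().split():
            if token.startswith(typed) and token not in found:
                found.append(token)
    return found


print(token_suggest(["A newsworthy item"], "new"))
```

Here the suggestion is the single token that matched, not the whole stored value.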
The schema looks like this:
<fieldType name="text_autocomplete" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
<field name="term" type="text_autocomplete" indexed="true" stored="true" multiValued="false" />
<field name="weight" type="float" indexed="true" stored="true" />
<copyField source="name" dest="term"/>
The values are copied into the "term" field. The Solr config:
<!-- Search component -->
<searchComponent name="suggest" class="solr.SuggestComponent">
<lst name="suggester">
<str name="name">suggester</str>
<str name="lookupImpl">AnalyzingLookupFactory</str>
<str name="dictionaryImpl">DocumentDictionaryFactory</str>
<str name="field">term</str>
<str name="weightField">weight</str>
<str name="suggestAnalyzerFieldType">text_autocomplete</str>
<str name="buildOnStartup">false</str>
</lst>
</searchComponent>
<!-- Search handler -->
<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
<lst name="defaults">
<str name="suggest">true</str>
<str name="suggest.count">10</str>
<str name="suggest.dictionary">suggester</str>
<str name="suggest.build">true</str>
</lst>
<arr name="components">
<str>suggest</str>
</arr>
</requestHandler>
Can anyone suggest a schema and/or configuration that will make the Suggester return a single word?
Instead of solr.SuggestComponent, try solr.SpellCheckComponent, because SuggestComponent is meant to suggest the full phrase.
You can find the details of solr.SpellCheckComponent here:
http://wiki.apache.org/solr/SpellCheckComponent
For quick reference, you can try this:
<searchComponent name="suggest" class="solr.SpellCheckComponent">
<lst name="spellchecker">
<str name="name">suggest</str>
<str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
<str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FSTLookupFactory</str>
<str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
<str name="field">term</str>
<str name="accuracy">0.7</str>
<float name="thresholdTokenFrequency">.0001</float>
</lst>
</searchComponent>
I have this configuration (with solr 5.3.1):
<searchComponent class="solr.SuggestComponent" name="suggest">
<lst name="suggester">
<str name="name">suggest</str>
<str name="storeDir">dict_suggest</str>
<str name="lookupImpl">AnalyzingInfixLookupFactory</str>
<str name="highlight">false</str>
<str name="field">suggestion</str>
<str name="suggestAnalyzerFieldType">suggest</str>
<str name="dictionaryImpl">DocumentDictionaryFactory</str>
<str name="payloadField">id</str>
<str name="buildOnStartup">false</str>
<str name="buildOnCommit">false</str>
</lst>
</searchComponent>
<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
<lst name="defaults">
<str name="echoParams">explicit</str>
<str name="suggest">true</str>
<str name="suggest.dictionary">suggest</str>
<str name="suggest.count">10</str>
</lst>
<arr name="components">
<str>suggest</str>
</arr>
</requestHandler>
The field in schema.xml is defined as <field name="suggestion" type="suggest" indexed="true" stored="true" required="true" multiValued="true" />.
The field type definition is this:
<fieldType name="suggest" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
Each time I try to build the index, Solr reports "Store Lookup build failed".
There is no dump or description in the logs.
Am I missing something in the config? The suggester seems to work fine, so the "in memory" index works fine.
Thanks
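Since both buildOnStartup and buildOnCommit are false in this configuration, the dictionary is only built when a request asks for it via suggest.build. A minimal Python sketch of such a build request URL follows; the host, port and core name (`mycore`) are assumptions, and `suggest_build_url` is a hypothetical helper.

```python
from urllib.parse import urlencode, urlparse, parse_qs


def suggest_build_url(base_url, core, dictionary):
    """Build a /suggest request that asks Solr to (re)build the named dictionary."""
    params = {
        "suggest": "true",
        "suggest.dictionary": dictionary,  # must match the <str name="name"> of the suggester
        "suggest.build": "true",
    }
    return f"{base_url}/{core}/suggest?{urlencode(params)}"


# Host, port and core name are placeholders, not taken from the question.
url = suggest_build_url("http://localhost:8983/solr", "mycore", "suggest")
print(url)
```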
I am working with Solr's autocomplete functionality. I am using Solr 4.50 to build my application, and I am following this link as a reference. My suggest component looks something like this:
<searchComponent class="solr.SpellCheckComponent" name="suggest">
<lst name="spellchecker">
<str name="name">suggest</str>
<str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
<str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
<str name="storeDir">suggest</str>
<str name="field">autocomplete_text</str>
<bool name="exactMatchFirst">true</bool>
<float name="threshold">0.005</float>
<str name="buildOnCommit">true</str>
<str name="buildOnOptimize">true</str>
</lst>
<lst name="spellchecker">
<str name="name">jarowinkler</str>
<str name="field">lowerfilt</str>
<str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
<str name="spellcheckIndexDir">spellchecker</str>
</lst>
<str name="queryAnalyzerFieldType">edgytext</str>
</searchComponent>
But I am getting the following error:
org.apache.solr.spelling.suggest.Suggester – Loading stored lookup data failed
java.io.FileNotFoundException: /home/anurag/Downloads/solr-4.4.0/example/solr/collection1/data/suggest/tst.dat (No such file or directory)
It says that a file is missing, but the Suggester page on the Solr wiki says it supports these lookupImpls:
<str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
<!-- Alternatives to lookupImpl:
org.apache.solr.spelling.suggest.fst.FSTLookup [finite state automaton]
org.apache.solr.spelling.suggest.fst.WFSTLookupFactory [weighted finite state automaton]
org.apache.solr.spelling.suggest.jaspell.JaspellLookup [default, jaspell-based]
org.apache.solr.spelling.suggest.tst.TSTLookup [ternary trees]
-->
I don't know what I am doing wrong. Any help will be deeply appreciated.
I was able to get the autosuggest functionality working by using Solr's TermsComponent.
Add the terms component to your solrconfig.xml like this:
<searchComponent name="terms" class="solr.TermsComponent"/>
<!-- A request handler for demonstrating the terms component -->
<requestHandler name="/terms" class="solr.SearchHandler" startup="lazy">
<lst name="defaults">
<bool name="terms">true</bool>
<bool name="distrib">false</bool>
</lst>
<arr name="components">
<str>terms</str>
</arr>
</requestHandler>
Define a field type for your autosuggest text in schema.xml:
<fieldType name="edgytext" class="solr.TextField" >
<analyzer>
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
Add fields in schema.xml like this:
<field name="name" type="edgytext" indexed="true" stored="true" />
<field name="autocomplete_text" type="edgytext" indexed="true" stored="false" multiValued="true" omitNorms="true" omitTermFreqAndPositions="false" />
<copyField source="name" dest="autocomplete_text"/>
Now the most important step: remove all the folders from your index directory
(its location can be found in solrconfig.xml; look for the <dataDir> tag).
Restart Solr and reindex your data. You will see new folders created in your index directory.
You can check that autosuggest is working by hitting this URL:
http://127.0.0.1:8983/solr/your_core/terms?terms.fl=autocomplete_text&omitHeader=true&terms.limit=20&terms.sort=index&terms.regex=(.*)your_query(.*)
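When the typed text comes from a user, the URL above needs encoding, and regex metacharacters in the input should be escaped. A minimal Python sketch, assuming the same host, core and field names as in the URL above; `build_terms_url` is a hypothetical helper, and the pattern drops the capture groups since only the match matters.

```python
import re
from urllib.parse import urlencode, urlparse, parse_qs


def build_terms_url(base_url, core, field, typed, limit=20):
    """Build a /terms request that looks for `typed` anywhere inside an indexed term."""
    # The edgytext type lowercases at index time, so lowercase the input as well.
    # re.escape stops characters such as "." or "+" from being read as regex syntax.
    # Assumption: Python's escaping is also accepted by the regex engine Solr uses.
    pattern = ".*" + re.escape(typed.lower()) + ".*"
    params = {
        "terms.fl": field,
        "omitHeader": "true",
        "terms.limit": limit,
        "terms.sort": "index",
        "terms.regex": pattern,
    }
    return f"{base_url}/{core}/terms?{urlencode(params)}"


url = build_terms_url("http://127.0.0.1:8983/solr", "your_core", "autocomplete_text", "Seg")
print(url)
```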
I am using the solr suggestion component with the following configuration:
schema.xml
<fieldType name="textSpell" class="solr.TextField">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
<field name="image_memo" type="text_general"/>
<field name="username" type="text_general"/>
<field name="image_text" type="text_general"/>
<!-- More fields included here -->
<field name="spell" type="textSpell" indexed="true" stored="true" multiValued="true"/>
<copyField source="*" dest="spell"/>
solrconfig.xml
<searchComponent class="solr.SpellCheckComponent" name="suggest">
<lst name="spellchecker">
<str name="name">suggest</str>
<str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
<str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
<str name="field">spell</str>
<str name="buildOnCommit">true</str>
</lst>
</searchComponent>
<requestHandler class="org.apache.solr.handler.component.SearchHandler"
name="/suggest">
<lst name="defaults">
<str name="spellcheck">true</str>
<str name="spellcheck.dictionary">suggest</str>
<str name="spellcheck.onlyMorePopular">true</str>
<str name="spellcheck.count">6</str>
<str name="spellcheck.collate">true</str>
<str name="spellcheck.collateExtendedResults">true</str>
<str name="spellcheck.maxCollations">6</str>
<str name="spellcheck.maxCollationTries">1000</str>
<str name="spellcheck.extendedResults">true</str>
<str name="spellcheck.collateParam.mm">100%</str>
</lst>
<arr name="components">
<str>suggest</str>
<str>query</str>
</arr>
</requestHandler>
As you can see, there is a field spell which I am using for suggestion queries.
This works great, even for multiple-term queries.
But what I need is to search on selected fields only.
So, for example, I want valid suggestions only for the fields image_memo and username.
The user can dynamically add and remove fields to search.
I know that I could do something like this:
q=(image_memo:*search* OR image_username:*search*)
But this slows down dramatically if you have a lot of fields and a multi-term query.
Example: searching the fields memo, username, field, field1 and field2 for term, term1 and term2:
((memo:term OR username:term OR field:term OR field1:term OR
field2:term) AND (memo:term1 OR username:term1 OR field:term1
OR field1:term1 OR field2:term1) AND (memo:term2 OR
username:term2 OR field:term2 OR field1:term2 OR
field2:term2))
Is there any way to dynamically select the spell fields? Or is there a way to search only specific fields within a multivalued field?
I am using Apache Solr 4 Alpha.
All you need to do is use Dismax or eDismax. SpellCheckComponent automatically runs every suggestion using your query params.
So, you have to query like this:
/suggest?q={!dismax}term1 term2 term3&qf=memo username field field1 field2
or
/suggest?q=term1 term2 term3&defType=dismax&qf=memo username field field1 field2
You can implement a custom query parser if you don't want to use (e)dismax.
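Because the user can add and remove fields, the qf value has to be assembled per request. A minimal Python sketch of the second URL form above, with proper encoding of the spaces; the host and core name are assumptions, and `build_suggest_url` is a hypothetical helper.

```python
from urllib.parse import urlencode, urlparse, parse_qs


def build_suggest_url(base_url, core, terms, fields):
    """Build a dismax /suggest request restricted to the currently selected fields."""
    params = {
        "q": " ".join(terms),
        "defType": "dismax",
        "qf": " ".join(fields),  # dismax expects a space-separated field list
    }
    return f"{base_url}/{core}/suggest?{urlencode(params)}"


# Host and core name are placeholders; fields and terms come from the question's example.
url = build_suggest_url(
    "http://localhost:8983/solr", "mycore",
    ["term1", "term2", "term3"],
    ["memo", "username", "field", "field1", "field2"],
)
print(url)
```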
I'm trying to configure a spellchecker to autocomplete full sentences from my query.
I've already been able to get these results:
"american israel" :
-> "american something"
-> "israel something"
But I want:
"american israel" :
-> "american israel something"
This is my solrconfig.xml :
<searchComponent name="suggest_full" class="solr.SpellCheckComponent">
<str name="queryAnalyzerFieldType">suggestTextFull</str>
<lst name="spellchecker">
<str name="name">suggest_full</str>
<str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
<str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
<str name="field">text_suggest_full</str>
<str name="fieldType">suggestTextFull</str>
</lst>
</searchComponent>
<requestHandler name="/suggest_full" class="org.apache.solr.handler.component.SearchHandler">
<lst name="defaults">
<str name="echoParams">explicit</str>
<str name="spellcheck">true</str>
<str name="spellcheck.dictionary">suggest_full</str>
<str name="spellcheck.count">10</str>
<str name="spellcheck.onlyMorePopular">true</str>
</lst>
<arr name="last-components">
<str>suggest_full</str>
</arr>
</requestHandler>
And this is my schema.xml:
<fieldType name="suggestTextFull" class="solr.TextField">
<analyzer type="index">
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
...
<field name="text_suggest_full" type="suggestTextFull" indexed="true" stored="false" multiValued="true"/>
I've read somewhere that I have to use spellcheck.q because q uses the WhitespaceAnalyzer, but when I use spellcheck.q I get a java.lang.NullPointerException.
Any ideas?
If your spellcheck field (text_suggest_full) contains american something and israel something, make sure that there is also a document/entry with the value american israel something.
Solr will not merge american something and israel something into one term, so it will not offer such a result when spellchecking american israel.
Wouldn't an autocomplete approach be more suitable? See this article, for example.
You can use the Suggester, a flexible "autocomplete" component.
You must have version 3.x of Solr.
solrconfig.xml:
<searchComponent name="suggest" class="solr.SpellCheckComponent">
<lst name="spellchecker">
<str name="name">suggest</str>
<str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
<str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
<str name="field">name_autocomplete</str>
</lst>
</searchComponent>
<requestHandler name="/suggest" class="org.apache.solr.handler.component.SearchHandler">
<lst name="defaults">
<str name="spellcheck">true</str>
<str name="spellcheck.dictionary">suggest</str>
<str name="spellcheck.count">10</str>
</lst>
<arr name="components">
<str>suggest</str>
</arr>
</requestHandler>
schema.xml:
<field name="name_autocomplete" type="text" indexed="true" stored="true" multiValued="false" />
Add a copyField:
<copyField source="name" dest="name_autocomplete" />
Reload Solr, reindex everything and test:
http://localhost:8983/solr/suggest?q=ameri&spellcheck=true&spellcheck.collate=true&spellcheck.build=true
You should get something like:
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="spellcheck">
<lst name="suggestions">
<lst name="ameri">
<int name="numFound">2</int>
<int name="startOffset">0</int>
<int name="endOffset">2</int>
<arr name="suggestion">
<str>american morocco</str>
<str>american morocco something</str>
</arr>
</lst>
<str name="collation">american morocco something</str>
</lst>
</lst>
</response>
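On the client side, the suggestions can be pulled out of a response shaped like the one above. A minimal Python sketch using only the standard library; `extract_suggestions` is a hypothetical helper, and the embedded sample is a shortened copy of the response above.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<response>
<lst name="spellcheck">
<lst name="suggestions">
<lst name="ameri">
<int name="numFound">2</int>
<arr name="suggestion">
<str>american morocco</str>
<str>american morocco something</str>
</arr>
</lst>
<str name="collation">american morocco something</str>
</lst>
</lst>
</response>"""


def extract_suggestions(xml_text):
    """Map each queried token to its list of suggested completions."""
    root = ET.fromstring(xml_text)
    result = {}
    # Each <lst> directly under "suggestions" is named after the token that was looked up.
    for entry in root.findall("./lst[@name='spellcheck']/lst[@name='suggestions']/lst"):
        result[entry.get("name")] = [
            s.text for s in entry.findall("./arr[@name='suggestion']/str")
        ]
    return result


print(extract_suggestions(SAMPLE))
```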
Hope that helps
Cheers
IMHO, a problem with the spellcheck component is that each word is spell checked against the full index.
The "collation" of the spell-checked words does not necessarily match a single document within the index, but might come from separate indexed documents.
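This effect is easy to reproduce outside Solr. The following Python sketch is a toy stand-in, not Solr's algorithm: each query word is completed on its own against the vocabulary of all documents, and the joined result is then checked against each document. All names and the two sample documents are made up for illustration.

```python
def complete_word(prefix, vocabulary):
    """Return the alphabetically first vocabulary term that starts with the prefix."""
    for term in sorted(vocabulary):
        if term.startswith(prefix):
            return term
    return prefix  # no completion found: keep the word unchanged


def collate(query, documents):
    """Complete every query word independently, using words from the whole index."""
    vocabulary = {word for doc in documents for word in doc.split()}
    return " ".join(complete_word(word, vocabulary) for word in query.split())


def matches_any_document(phrase, documents):
    """True if at least one document contains every word of the phrase."""
    words = set(phrase.split())
    return any(words <= set(doc.split()) for doc in documents)


documents = ["american history", "israel travel guide"]  # made-up index content
collation = collate("amer isr", documents)
print(collation, matches_any_document(collation, documents))
```

Each completed word exists in the index, but they come from different documents, so the combined phrase matches none of them.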