I want to apply pattern-replacement transformations to the extracted/indexed values of a few copied fields in Solr.
In my schema.xml, the source_file field contains a value that I copy into the collection field. I then use PatternReplaceCharFilterFactory (regex/replacement) to modify this value:
<field name="source_file" type="string" indexed="true" stored="true" multiValued="false" docValues="false"/>
<field name="collection" type="collectionType" indexed="true" stored="true" multiValued="false"/>
<copyField source="source_file" dest="collection"/>
<fieldType name="collectionType" class="solr.TextField" stored="true" indexed="true">
  <analyzer type="index">
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="A" replacement="B"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="A" replacement="B"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>
EDIT 1
In the Solr web UI's Analysis screen, I can see that the pattern replacement works for both the Index and Query analyzers.
My source_file value is modified as configured (A becomes B), which suggests that my schema definition is correct.
But when I run a Solr query, the pattern replacement does not seem to work: no change is applied, and I always get the value A.
http://xxx:898x/solr/discovery_collection/select?fl=collection&indent=on&q=*:*&wt=json
The result shows that the value of the collection field is not modified by the replacement pattern.
All my configuration is in schema.xml; I have changed nothing in solrconfig.xml.
Have I forgotten some configuration (in solrconfig.xml?) that forces the modified fields to be indexed?
EDIT 2
My request handlers in solrconfig.xml:
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="df">text</str>
    <str name="defType">edismax</str>
    <str name="timeAllowed">30000</str>
    <bool name="preferLocalShards">false</bool>
  </lst>
</requestHandler>
<requestHandler name="/query" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="wt">json</str>
    <str name="indent">true</str>
    <str name="df">text</str>
    <int name="rows">10</int>
    <str name="defType">edismax</str>
    <str name="timeAllowed">30000</str>
  </lst>
</requestHandler>
<requestHandler name="/update/extract"
                startup="lazy"
                class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="lowernames">true</str>
    <str name="uprefix">ignored_</str>
    <!-- capture link hrefs but ignore div attributes -->
    <str name="captureAttr">true</str>
    <str name="fmap.a">links</str>
    <str name="fmap.div">ignored_</str>
  </lst>
</requestHandler>
EDIT 3
I've just realized that it does work: the value "B" is indexed and I can search for it with a query, but the JSON result of the query still displays the value "A" instead of "B".
Do I have to use a custom updateRequestProcessorChain and requestHandler in solrconfig.xml, and add a custom class in lib/ to do the pattern transformation?
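For context, Solr analyzers (charFilters included) only transform the indexed terms; the stored value returned through fl is always the original input, which is why the JSON still shows "A". The update-chain idea can be sketched with Solr's built-in update processors instead of a custom class in lib/. This is a minimal sketch, assuming a Solr version that ships CloneFieldUpdateProcessorFactory and RegexReplaceProcessorFactory; the chain name replace-collection is hypothetical.

```xml
<!-- solrconfig.xml: "replace-collection" is a hypothetical chain name -->
<updateRequestProcessorChain name="replace-collection">
  <!-- Copy source_file into collection before indexing (takes over from the copyField) -->
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">source_file</str>
    <str name="dest">collection</str>
  </processor>
  <!-- Rewrite the value itself, so both the stored and the indexed value become B -->
  <processor class="solr.RegexReplaceProcessorFactory">
    <str name="fieldName">collection</str>
    <str name="pattern">A</str>
    <str name="replacement">B</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

<!-- Point the existing /update/extract handler at the chain -->
<requestHandler name="/update/extract"
                startup="lazy"
                class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="update.chain">replace-collection</str>
    <!-- keep the existing lowernames, uprefix, captureAttr and fmap.* defaults here -->
  </lst>
</requestHandler>
```

If you try this, remove `<copyField source="source_file" dest="collection"/>`: otherwise the single-valued collection field would receive two values and indexing would fail. Documents indexed before the change keep their old stored value until they are reindexed.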
Goal
I am trying to implement an auto-suggester in Solr. The fields to extract suggestions from are title and content.
Progress thus far
I followed the official Solr guide to implement the feature, but was stuck for a long time because Solr complained that the custom field type suggestType was not defined.
After much trial and error, I added the field type to managed-schema.xml instead of schema.xml, and it worked!
So far, it only works when I base the suggestion field on content alone; however, we would like to base suggestions on two fields, title and content.
Steps followed
1) Add a custom field type in managed-schema.xml:
<fieldType name="suggestType" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[^a-zA-Z0-9]" replacement=" " />
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
2) Add a custom field that uses the custom field type in schema.xml:
<field name="suggestText" type="suggestType" stored="true" indexed="true" />
3) Add the 'suggest' component and handler in solrconfig.xml:
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">fuzzySuggester</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="storeDir">fuzzy_suggestions</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">suggestText</str>
    <str name="suggestAnalyzerFieldType">suggestType</str>
    <str name="buildOnStartup">false</str>
    <str name="buildOnCommit">false</str>
  </lst>
</searchComponent>
<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy" >
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.dictionary">analyzingSuggester</str>
    <str name="suggest.onlyMorePopular">true</str>
    <str name="suggest.count">10</str>
    <str name="suggest.collate">true</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>
4) Copy both fields 'title' and 'content' to 'suggestText' in schema.xml:
<copyField source="title" dest="suggestField"/>
<copyField source="content" dest="suggestField"/>
Questions
Why does it only work when I add the custom field type to managed-schema.xml instead of schema.xml? From my understanding, managed-schema.xml should not be manually edited.
No results appear after I map both the title and content fields to the custom field suggestText. What am I missing?
Thanks.
It seems like you have a typo in your copy-field definition. The "dest" attribute is suggestField but the field you created earlier is called suggestText.
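Below is a sketch that combines that fix with two other mismatches in the posted config, assuming everything not shown stays as posted. First, the handler's suggest.dictionary (analyzingSuggester) does not match the suggester's name (fuzzySuggester). Second, a field fed by two copyField sources should be multiValued="true". On the managed-schema question: with ManagedIndexSchemaFactory (the default in recent Solr versions), a hand-edited schema.xml is ignored once a managed-schema file exists. Declaring ClassicIndexSchemaFactory makes Solr read schema.xml instead.

```xml
<!-- schema: one multi-valued field fed by both sources -->
<field name="suggestText" type="suggestType" stored="true" indexed="true" multiValued="true"/>
<copyField source="title" dest="suggestText"/>
<copyField source="content" dest="suggestText"/>

<!-- solrconfig.xml: suggest.dictionary must equal the suggester's "name" -->
<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.dictionary">fuzzySuggester</str>
    <str name="suggest.onlyMorePopular">true</str>
    <str name="suggest.count">10</str>
    <str name="suggest.collate">true</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>

<!-- solrconfig.xml, optional: read a hand-edited schema.xml instead of managed-schema -->
<schemaFactory class="ClassicIndexSchemaFactory"/>
```

Because buildOnCommit is false, the suggester has to be rebuilt explicitly after reindexing, for example with /suggest?suggest.build=true.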
I've tried everything under the sun (well, it is called Solr after all) to make the Solr Suggester case-insensitive, but it stubbornly remains case-sensitive.
This returns a suggestion of Mexican:
http://localhost:8983/solr/mycollection/autocomplete?suggest.q=Mex
This returns 0 results:
http://localhost:8983/solr/mycollection/autocomplete?suggest.q=mex
To diagnose further, I tried a lowercase /select search against my suggestions field, which successfully returned docs containing "Mexican":
http://localhost:8983/solr/mycollection/select?q=suggestions:mex*
But no such luck using lowercase with the Suggester. It's as though my <filter class="solr.LowerCaseFilterFactory"/> has no effect when used by the Suggester.
Of course, I did a full config upload, collection reload, data reindex, and suggester rebuild before testing. I'm on Solr 6.4.1 running in cloud mode. Any ideas or diagnostic tips?
schema.xml
<fieldType name="textSuggest" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<field name="recipe" type="text_general" indexed="true" stored="true" multiValued="false" />
<field name="suggestions" type="textSuggest" indexed="true" stored="true" multiValued="true" />
<copyField source="recipe" dest="suggestions"/>
solrconfig.xml
<searchComponent class="solr.SuggestComponent" name="suggest">
  <lst name="suggester">
    <str name="name">foodsuggester</str>
    <str name="lookupImpl">WFSTLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">suggestions</str>
    <str name="buildOnStartup">false</str>
    <str name="buildOnCommit">false</str>
    <str name="storeDir">suggester_wfst_dir</str>
    <str name="suggestAnalyzerFieldType">textSuggest</str>
  </lst>
</searchComponent>
<requestHandler name="/autocomplete" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.dictionary">foodsuggester</str>
    <str name="suggest.count">10</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>
The WFSTLookupFactory apparently does not take the suggestAnalyzerFieldType parameter, so it is ignored. You could use AnalyzingLookupFactory instead, which analyzes the text according to suggestAnalyzerFieldType. So if you want the suggester to lowercase its input, use AnalyzingLookupFactory and set suggestAnalyzerFieldType to your textSuggest field type.
It seems the WFSTLookupFactory lookup implementation is case-sensitive.
You can use FuzzyLookupFactory if you don't have a specific reason for using WFSTLookupFactory.
<str name="lookupImpl">FuzzyLookupFactory</str>
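A fuller sketch of what the answers suggest, assuming the rest of the posted config stays unchanged. It swaps the lookup for FuzzyLookupFactory, which, like AnalyzingLookupFactory, applies suggestAnalyzerFieldType, and points storeDir at a fresh directory so the old WFST data is not reused. The directory name suggester_fuzzy_dir is hypothetical.

```xml
<searchComponent class="solr.SuggestComponent" name="suggest">
  <lst name="suggester">
    <str name="name">foodsuggester</str>
    <!-- analyzing-based lookup: suggestAnalyzerFieldType is honored here -->
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">suggestions</str>
    <!-- textSuggest lowercases, so "mex" and "Mex" should match the same entries -->
    <str name="suggestAnalyzerFieldType">textSuggest</str>
    <str name="buildOnStartup">false</str>
    <str name="buildOnCommit">false</str>
    <!-- hypothetical fresh directory for the new lookup's data -->
    <str name="storeDir">suggester_fuzzy_dir</str>
  </lst>
</searchComponent>
```

Rebuild afterwards with /autocomplete?suggest.build=true. Suggestions should still come back in their original casing (e.g. "Mexican"), because only the lookup keys are analyzed. If typo tolerance is unwanted, AnalyzingLookupFactory takes the same suggestAnalyzerFieldType parameter.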
I am developing a web application and using Solr as the search engine. I would like to add autocomplete functionality, so I added the Suggester component and configured a separate field for it. This works OK.
The problem is that Suggester returns the whole value of the field. For example, if the name of an article is "A newsworthy item" and I search for "new", it will return the whole "A newsworthy item", where I would like it to just return "newsworthy". In other words, return the individual word tokens.
The schema looks like this:
<fieldType name="text_autocomplete" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<field name="term" type="text_autocomplete" indexed="true" stored="true" multiValued="false" />
<field name="weight" type="float" indexed="true" stored="true" />
<copyField source="name" dest="term"/>
The values are copied into the "term" field. The Solr config:
<!-- Search component -->
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">suggester</str>
    <str name="lookupImpl">AnalyzingLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">term</str>
    <str name="weightField">weight</str>
    <str name="suggestAnalyzerFieldType">text_autocomplete</str>
    <str name="buildOnStartup">false</str>
  </lst>
</searchComponent>
<!-- Search handler -->
<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.count">10</str>
    <str name="suggest.dictionary">suggester</str>
    <str name="suggest.build">true</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>
Can anyone suggest a schema and/or configuration that will make the Suggester return a single word?
Instead of solr.SuggestComponent, try using solr.SpellCheckComponent, because SuggestComponent is meant to suggest the full phrase.
You can find the details of solr.SpellCheckComponent here:
http://wiki.apache.org/solr/SpellCheckComponent
For quick reference, you can try this:
<searchComponent name="suggest" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FSTLookupFactory</str>
    <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
    <str name="field">term</str>
    <str name="accuracy">0.7</str>
    <float name="thresholdTokenFrequency">.0001</float>
  </lst>
</searchComponent>
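The answer's component still needs a request handler that turns spellcheck on and names the dictionary. Below is a minimal sketch modeled on the classic SpellCheckComponent-based suggester setup. It would replace the question's SuggestComponent-based /suggest handler, and the parameter values are assumptions to adjust.

```xml
<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <!-- must match the spellchecker's "name" in the component above -->
    <str name="spellcheck.dictionary">suggest</str>
    <str name="spellcheck.count">10</str>
    <str name="spellcheck.onlyMorePopular">true</str>
  </lst>
  <arr name="components">
    <!-- the SpellCheckComponent registered under the name "suggest" -->
    <str>suggest</str>
  </arr>
</requestHandler>
```

Since no buildOnCommit is configured, build the dictionary once, e.g. with /suggest?q=new&spellcheck.build=true. This suggester draws its entries from the indexed terms of term. With the whitespace-tokenized, lowercased text_autocomplete type, suggestions for "new" should therefore be single words such as "newsworthy".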
I am working on Solr autocomplete functionality. I am using Solr 4.50 to build my application, and I am following this link as a reference. My suggest component looks like this:
<searchComponent class="solr.SpellCheckComponent" name="suggest">
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
    <str name="storeDir">suggest</str>
    <str name="field">autocomplete_text</str>
    <bool name="exactMatchFirst">true</bool>
    <float name="threshold">0.005</float>
    <str name="buildOnCommit">true</str>
    <str name="buildOnOptimize">true</str>
  </lst>
  <lst name="spellchecker">
    <str name="name">jarowinkler</str>
    <str name="field">lowerfilt</str>
    <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
    <str name="spellcheckIndexDir">spellchecker</str>
  </lst>
  <str name="queryAnalyzerFieldType">edgytext</str>
</searchComponent>
But I am getting the following error:
org.apache.solr.spelling.suggest.Suggester – Loading stored lookup data failed
java.io.FileNotFoundException: /home/anurag/Downloads/solr-4.4.0/example/solr/collection1/data/suggest/tst.dat (No such file or directory)
It says that a file is missing, but the Solr wiki page for the Suggester component says it supports these lookupImpl values:
<str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
<!-- Alternatives to lookupImpl:
org.apache.solr.spelling.suggest.fst.FSTLookup [finite state automaton]
org.apache.solr.spelling.suggest.fst.WFSTLookupFactory [weighted finite state automaton]
org.apache.solr.spelling.suggest.jaspell.JaspellLookup [default, jaspell-based]
org.apache.solr.spelling.suggest.tst.TSTLookup [ternary trees]
-->
I don't know what I am doing wrong. Any help would be deeply appreciated.
I was able to get the autosuggest functionality working by using the Solr TermsComponent.
Add the terms component to your solrconfig.xml like this:
<searchComponent name="terms" class="solr.TermsComponent"/>
<!-- A request handler for demonstrating the terms component -->
<requestHandler name="/terms" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <bool name="terms">true</bool>
    <bool name="distrib">false</bool>
  </lst>
  <arr name="components">
    <str>terms</str>
  </arr>
</requestHandler>
Define a field type for your autosuggest text in schema.xml:
<fieldType name="edgytext" class="solr.TextField" >
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
Add fields in schema.xml like this:
<field name="name" type="edgytext" indexed="true" stored="true" />
<field name="autocomplete_text" type="edgytext" indexed="true" stored="false" multiValued="true" omitNorms="true" omitTermFreqAndPositions="false" />
<copyField source="name" dest="autocomplete_text"/>
Now the most important step: remove all the folders from your index directory (its location is set in solrconfig.xml; look for the <dataDir> tag).
Restart Solr and reindex your data. You will see new folders created in your index directory.
You can check that autosuggest is working by hitting this URL:
http://127.0.0.1:8983/solr/your_core/terms?terms.fl=autocomplete_text&omitHeader=true&terms.limit=20&terms.sort=index&terms.regex=(.*)your_query(.*)
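Here is a variant of the /terms handler above with field defaults, so that client requests only need to carry the user's input. terms.fl, terms.limit, terms.sort and terms.prefix are standard TermsComponent parameters; the default values chosen here are assumptions.

```xml
<requestHandler name="/terms" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <bool name="terms">true</bool>
    <bool name="distrib">false</bool>
    <!-- hypothetical defaults so clients only pass terms.prefix -->
    <str name="terms.fl">autocomplete_text</str>
    <int name="terms.limit">20</int>
    <!-- "count" orders by document frequency; "index" orders alphabetically -->
    <str name="terms.sort">count</str>
  </lst>
  <arr name="components">
    <str>terms</str>
  </arr>
</requestHandler>
```

With these defaults, http://127.0.0.1:8983/solr/your_core/terms?terms.prefix=your_query returns indexed values that start with your_query. Because edgytext uses KeywordTokenizerFactory plus lowercasing, each term is a whole lowercased value. terms.prefix therefore only matches from the beginning of the value, and the client should lowercase its input. The terms.regex form above matches anywhere in the value but is generally slower on large indexes.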
I am working with Solr spell checking and have it up and running. However, for certain misspellings it does not give the expected result:
Correct word: Cancer
Incorrect spellings: cacner, cacnar, cancar, cancre, cancere
I am not getting "cancer" as the suggestion for "cacner"; instead it shows "inner", which, although it sounds similar to "cacner", is not the correct suggestion. For "cacnar", I get the suggestion "pulmonary".
Is there any way to configure it to suggest "cancer" instead of these other results?
Alternatively, is there a score for each suggestion that I can check before showing it to the user?
As requested, here is the configuration:
The field used for the dictionary (in schema.xml):
<copyField source="procname" dest="dtextspell" />
<field name="dtextspell" stored="false" type="text_small" multiValued="true" indexed="true"/>
Definition of "text_small" (also in schema.xml):
<fieldType name="text_small" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StandardFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StandardFilterFactory"/>
  </analyzer>
</fieldType>
In solrconfig.xml:
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">text_small</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <str name="field">dtextspell</str>
    <float name="thresholdTokenFrequency">.0001</float>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <str name="field">name</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>
I attached it to the select request handler like this:
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="spellcheck.count">10</str>
    <str name="df">text</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
To build the spellchecker:
http://localhost:8080/solr/select?q=*:*&spellcheck=true&spellcheck.build=true
To search for a term:
http://localhost:8080/solr/select?q=procname:%22cacner%22&spellcheck=true&defType=edismax
The response XML:
<lst name="spellcheck">
  <lst name="suggestions">
    <lst name="cacner">
      <int name="numFound">1</int>
      <int name="startOffset">10</int>
      <int name="endOffset">16</int>
      <arr name="suggestion">
        <str>inner</str>
        <!-- closing tags omitted from here on -->
Hope it helps !!
Sounds like you've not rebuilt the spellchecker's index recently. Request a manual update by making a query with spellcheck=true&spellcheck.build=true appended to the query string (do NOT do this on every request, as the build process can take some time). You should also make sure that you're using the correct field to build your spellchecker's index.
You can also configure the spellchecker component to rebuild the index on every commit or on every optimize, by adding:
<str name="buildOnCommit">true</str>
or
<str name="buildOnOptimize">true</str>
to your spellchecker configuration.
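Below is a sketch of the question's spellchecker config with this advice applied, assuming dtextspell is the field the dictionary should come from. It keeps a single field entry, because the posted config sets field twice (to dtextspell and to name), which makes the dictionary source ambiguous, and it rebuilds on commit. For the score question, spellcheck.extendedResults=true adds each suggestion's frequency, plus the original term's frequency, to the response; these can be checked before a suggestion is shown.

```xml
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">text_small</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <!-- exactly one source field for the dictionary -->
    <str name="field">dtextspell</str>
    <float name="thresholdTokenFrequency">.0001</float>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <!-- rebuild the dictionary on every commit, as suggested above -->
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="df">text</str>
    <str name="spellcheck.count">10</str>
    <!-- adds per-suggestion "freq" and the original term's "origFreq" -->
    <str name="spellcheck.extendedResults">true</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```

After a commit (or an explicit spellcheck.build=true request), query again with spellcheck=true and compare the reported frequencies before deciding which suggestion to show.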