Lucene / SOLR term to number range proximity search - solr

I am using SOLR 4.9.0 with the following configuration (I am including only the part I consider relevant to the question):
<field name="content" type="text" indexed="true" stored="false"
termVectors="true" multiValued="false" />
<fieldType name="text" class="solr.TextField">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory" />
<filter class="solr.LowerCaseFilterFactory" />
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory" />
<filter class="solr.LowerCaseFilterFactory" />
</analyzer>
</fieldType>
I can do proximity search for a term being close to another term:
content:"very suggestion"~100
I need to add the functionality of being able to search for a term being close to a number token, such as in:
content:"very [0.01 TO 0.99]"~100
content:"very [100 TO 1000000]"~100
Is there a tokenizer that already provides this functionality?
If not, what would roughly be the steps in order to adapt the standard tokenizer to be able to do that?
Any speculations on what the effect on the index structure, size, and indexing/searching speed would be?
EDIT:
I think that the following SOLR configuration is actually also relevant to my question:
<requestHandler name="/select" class="solr.SearchHandler">
<lst name="defaults">
<str name="echoParams">explicit</str>
<int name="rows">10</int>
<str name="df">id</str>
<str name="wt">json</str>
<str name="indent">true</str>
<str name="fl">* score</str>
</lst>
</requestHandler>

More than two years later, I found the answer to my own question :)
Using the ComplexPhraseQueryParser
https://lucene.apache.org/solr/guide/6_6/other-parsers.html#OtherParsers-ComplexPhraseQueryParser
one can do:
{!complexphrase inOrder=false}content:"fee [100 TO 10000]"~10

Related

Solr Suggester returns 0 results when context language is en-AU

I have a list of product pages for one of my Australia sites where the content is in 2 language versions:
EN
en-AU
I have a search suggestion box that I am trying to populate with a few of the title fields through a computed field named autosuggestiontitle_sm.
Here's my suggester component defined in solrconfig.xml:
<searchComponent name="suggest" class="solr.SuggestComponent">
<lst name="suggester">
<str name="name">mySuggester</str>
<str name="lookupImpl">BlendedInfixLookupFactory</str>
<str name="dictionaryImpl">DocumentDictionaryFactory</str>
<str name="field">autosuggestiontitle_sm</str>
<str name="contextField">_contextLanguage</str>
<str name="suggestAnalyzerFieldType">text_suggester</str>
<str name="buildOnStartup">true</str>
</lst>
</searchComponent>
<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy" >
<lst name="defaults">
<str name="suggest">true</str>
<str name="suggest.dictionary">mySuggester</str>
<str name="suggest.count">10</str>
</lst>
<arr name="components">
<str>suggest</str>
</arr>
</requestHandler>
Since my suggestAnalyzerFieldType is a custom field type, I have added the following entries to the managed-schema file:
<fieldType name="text_suggester" class="solr.TextField" positionIncrementGap="100" multiValued="true">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
<filter class="solr.SynonymFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
I have also added the two custom fields, both with type text_suggester:
<field name="autosuggestiontitle_sm" type="text_suggester" multiValued="true" indexed="true" stored="true"/>
<field name="_contextLanguage" type="text_suggester" multiValued="false" indexed="true" stored="true"/>
Since _language is a string type, I defined a custom field named _contextLanguage of type text_suggester and added the copyField entry below:
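A copyField entry for this purpose typically has the following shape. The source and dest values are assumptions inferred from the field names mentioned above (_language and _contextLanguage), so treat this as a sketch rather than the exact entry used.

```xml
<!-- Assumed entry: copies the string field _language into the analyzed
     field _contextLanguage, which the suggester uses as its contextField. -->
<copyField source="_language" dest="_contextLanguage"/>
```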
Then I restarted my Solr server and re-indexed the custom index for my website context.
Now my search term is "fit".
Scenario 1 Query: https://localhost:8983/solr/custom_master_index/suggest?q=fit
The result is as expected: 7 results where the term "fit" appears in the title text, from both the EN and en-AU versions.
Scenario 2 Query: https://localhost:8983/solr/custom_master_index/suggest?q=fit&suggest=true&suggest.cfq=en
The result is as expected: 2 results where the term "fit" appears in the title text of the EN content.
But when I query with en-AU, which is the current context language of my Australia site, the result is either 0 or, at times, the EN results.
(Issue) Scenario 3 Query: https://localhost:8983/solr/custom_master_index/suggest?q=fit&suggest=true&suggest.cfq=en-AU
Note: I have tried running the query with different values such as suggest.cfq=en-au and suggest.cfq=au (nothing helped).
Can someone help me understand what is missing, so that the en-AU contextField value is matched correctly?
Thanks in advance!

SOLR Proximity Search setting

I have some address data that I need to search, and I am struggling a bit with proximity search.
An example of an address that I am trying to search is:
CATO STREET WEST LAUNCESTON TAS
My proximity query returns nothing when I search for (CATO WEST)~2.
The configuration for the data field (schema.xml) is as follows:
<field name="street_name_space" type="text_general" indexed="true" stored="true"/>
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
Request handler is as follows:
<requestHandler name="/proximity" class="solr.SearchHandler">
<lst name="defaults">
<str name="defType">edismax</str>
<str name="echoParams">explicit</str>
<str name="qf">street_name_space</str>
<str name="qs">10</str>
<str name="pf">street_name_space</str>
<str name="ps">10</str>
<str name="echoParams">explicit</str>
<str name="fl">street_name, street_name_clean, street_name_space</str>
</lst>
</requestHandler>
Any idea what I should do to get results?
The KeywordTokenizerFactory you are using keeps the whole value as a single term, so the only term indexed is 'cato street west launceston tas'. That single term cannot match your query.
Use a different tokenizer, such as WhitespaceTokenizerFactory, and it should work.
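A minimal sketch of that change, assuming the field type keeps the name text_general from the question. Only the tokenizer is swapped, and the data has to be re-indexed afterwards. Note also that Lucene's proximity syntax expects a quoted phrase, so the query would be written as "CATO WEST"~2 rather than (CATO WEST)~2.

```xml
<!-- Sketch: same field type as in the question, with the tokenizer swapped
     so that each word becomes its own term with its own position. -->
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```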

Solr 6.5.1 search in order first full match second left to right match

I am using Lucene for indexing and Solr for searching, and I have the following requirements.
Example: “Test Five”
Highest priority: entries containing both “Test” and “Five”
Next: entries containing only the leftmost word, “Test”
Next: the next word in the list, “Five” (left to right), and so on
Here is my schema:
<field name="name" type="text_general" indexed="true" stored="true"/>
<field name="acSearch" type="searchFieldType" required="false" indexed="true" stored="false" multiValued="true" />
<copyField source="name" dest="acSearch" />
<fieldType name="searchFieldType" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.PatternTokenizerFactory" pattern="[,]+" />
<filter class="solr.LowerCaseFilterFactory" />
<filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25" />
</analyzer>
<analyzer type="query">
<tokenizer class="solr.PatternTokenizerFactory" pattern="[,]+" />
<filter class="solr.LowerCaseFilterFactory" />
</analyzer>
</fieldType>
solrconfig.xml
<requestHandler name="/select" class="solr.SearchHandler">
<lst name="defaults">
<str name="wt">json</str>
<str name="indent">true</str>
<str name="fl">name</str>
<str name="rows">200</str>
<str name="df">dySearch</str>
<str name="sort">score desc</str>
</lst>
<arr name="components">
<str>query</str>
</arr>
</requestHandler>
I am not getting the expected output.
If I search for Test Five, “Test Five” comes first in the response, but the remaining results are not ordered left to right; they come back in arbitrary order.
If I search for Five Test, the related data comes first.
Could you please help?
You can give a different boost to each term when you build your query string:
q=Test^10 Five^3 last^1
With this you don't need to mess with n-grams; a standard analyzer is enough.
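If the n-gram analysis is dropped as suggested, a plain word-based field type could look like the sketch below. The name text_simple is hypothetical; the tokenizer and filter are the standard ones. Assuming the default OR operator, a query such as q=Test^10 Five^3 then tends to score entries containing both terms highest, followed by those with only Test, then those with only Five.

```xml
<!-- Sketch: word-level analysis without edge n-grams.
     "text_simple" is a hypothetical name, not taken from the question. -->
<fieldType name="text_simple" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<field name="acSearch" type="text_simple" indexed="true" stored="false" multiValued="true"/>
```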

How to exclude specific Solr spellchecking results

We have the problem that some spellchecking results are technically correct but not suitable for the context of the input term.
For example, the user searches for "ventilator" and the spellchecker returns "vibrator" as the corrected term.
We could remove the value "vibrator" from the possible results, but if someone misspells "vibrator" we should still return the corrected term.
Is it possible to exclude specific mappings (e.g. "ventilator" > "vibrator")?
The current config:
solrconfig.xml:
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
<str name="queryAnalyzerFieldType">text_spell</str>
<lst name="spellchecker">
<str name="name">de</str>
<str name="field">spellcheck_de</str>
<str name="buildOnCommit">true</str>
<str name="buildOnOptimize">true</str>
</lst>
</searchComponent>
And the Field config from schema.xml:
<fieldType name="text_spell_de" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory" />
<filter class="solr.LowerCaseFilterFactory" />
<filter class="solr.TrimFilterFactory" />
<filter class="solr.RemoveDuplicatesTokenFilterFactory" />
</analyzer>
</fieldType>
You can do things like 'exclude terms that are less frequent than X' in the index, and the like. But if you want to 'exclude term X only when serving suggestions for term Y', then no, you can't.
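As a sketch of the frequency-based option: the spellchecker definition accepts a thresholdTokenFrequency setting, which keeps terms that occur in too few documents out of the suggestions. The name and field below are taken from the question's config; the 0.01 value is an assumption for illustration.

```xml
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">text_spell</str>
  <lst name="spellchecker">
    <str name="name">de</str>
    <str name="field">spellcheck_de</str>
    <!-- Assumed value: only suggest terms that occur in at least 1% of documents -->
    <float name="thresholdTokenFrequency">0.01</float>
    <str name="buildOnCommit">true</str>
    <str name="buildOnOptimize">true</str>
  </lst>
</searchComponent>
```

This filters by frequency across the board; it cannot express a rule for one specific pair such as "ventilator" > "vibrator".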

solr suggester not returning any results

I've followed the solr wiki article for suggester almost to the T here: http://wiki.apache.org/solr/Suggester. I have the following xml in my solrconfig.xml:
<searchComponent class="solr.SpellCheckComponent" name="suggest">
<lst name="spellchecker">
<str name="name">suggest</str>
<str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
<str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
<str name="field">description</str>
<float name="threshold">0.05</float>
<str name="buildOnCommit">true</str>
</lst>
</searchComponent>
<requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/suggest">
<lst name="defaults">
<str name="spellcheck">true</str>
<str name="spellcheck.dictionary">suggest</str>
<str name="spellcheck.onlyMorePopular">true</str>
<str name="spellcheck.count">5</str>
<str name="spellcheck.collate">true</str>
</lst>
<arr name="components">
<str>suggest</str>
</arr>
</requestHandler>
However, when I run the following query (or something similar):
../suggest/?q=barbequ
I only get the following result xml back:
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">78</int>
</lst>
<lst name="spellcheck">
<lst name="suggestions"/>
</lst>
</response>
As you can see, this isn't very helpful. Any suggestions to help resolve this?
A couple of things I can think of that might cause this problem:
The source field ("description") is incorrect - ensure that this is indeed the field that seeds terms for your spell checker. It could even be that the field is a different case (eg. "Description" instead of "description").
The source field in your schema.xml is not set up correctly or is being processed by filters that cause the source dictionary to be invalid. I use a separate field to seed the dictionary, and use <copyField /> to copy relevant other fields to that.
The term "barbeque" doesn't appear in at least 5% of records (you've indicated this requirement by including <float name="threshold">0.05</float>) and therefore is not included in the lookup dictionary.
In SpellCheckComponent the <str name="spellcheck.onlyMorePopular">true</str> setting means that only terms that would produce more results are returned as suggestions. According to the Suggester documentation this has a different function (sorting suggestions by weight) but it might be worth switching this to false to see if it is causing the issue.
Relevant parts of my schema.xml:
<schema>
<types>
<!-- Field type specifically for spell checking -->
<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory" />
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
<filter class="solr.LowerCaseFilterFactory" />
<filter class="solr.StandardFilterFactory" />
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory" />
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" />
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
<filter class="solr.LowerCaseFilterFactory" />
<filter class="solr.StandardFilterFactory" />
</analyzer>
</fieldType>
</types>
<fields>
<field name="spell" type="textSpell" indexed="true" stored="false" multiValued="true" />
</fields>
<!-- Copy fields which are used to seed the spell checker -->
<copyField source="name" dest="spell" />
<copyField source="description" dest="spell" />
</schema>
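Pulling the points above together, the suggester from the question could be pointed at the dedicated spell field and given a lower threshold. This is a sketch, not a verified fix: the 0.005 threshold is an assumed value, and the spell field is the one defined in the schema above.

```xml
<searchComponent class="solr.SpellCheckComponent" name="suggest">
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
    <!-- Dedicated dictionary field, seeded via copyField -->
    <str name="field">spell</str>
    <!-- Assumed value: terms must appear in at least 0.5% of documents -->
    <float name="threshold">0.005</float>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>
```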
Could the problem be that you're querying /suggest instead of /spell?
../suggest/?q=barbequ
In my setup, this is the string I pass in:
/solr/spell?q=barbequ&spellcheck=true&spellcheck.collate=true
And the first time you do a spellcheck you need to include
&spellcheck.build=true
I'm running on Solr 4, by the way. So perhaps /suggest is an entirely different endpoint that does something else. If so, I apologize.
Please check whether the term vector parameters are set in schema.xml, for example:
<field name="TEXT" type="text_en" indexed="true" stored="true" multiValued="true"
termVectors="true"
termPositions="true"
termOffsets="true"/>
Then restart Solr and reindex.
