Solr 6.6.0 Case Insensitive Query Not Working - solr

Case-insensitive queries are not working for me in Solr 6.6.0.
I have tried every option and answer I could find on the internet.
I tried
<tokenizer class="solr.LowerCaseTokenizerFactory"/>
but it does not work.
I also tried
<filter class="solr.LowerCaseFilterFactory"/>
but it does not work either.
I have tried many other approaches, and none of them work.
That is, I want the same results when searching with title_s:iPhone and title_s:iphone.
I am not sure what is causing the problem.

If case-insensitive search were broken in a Solr release, there would be much more noise than a single Stack Overflow question.
Let's use this question to illustrate the approach everyone should follow for basic Solr usage:
1) Refer to the documentation. Solr has good, free online documentation,
including a section that describes how to configure schema.xml and its various aspects [1].
From there you can learn that it is quite simple to configure a field to be case insensitive:
<field name="title" type="text_case_insensitive" indexed="true" stored="true"/>
<fieldType name="text_case_insensitive" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
N.B. If you had a previous configuration in the schema for the title field, you need to re-index.
[1] https://lucene.apache.org/solr/guide/6_6/field-type-definitions-and-properties.html
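One thing the answer above does not spell out: the question searches title_s, while the example defines a field called title. A minimal sketch of one way to bridge the two, assuming title_s is matched by a "*_s" dynamic field of the unanalyzed string type (the asker's follow-up suggests this), and using a hypothetical field name title_ci:

```xml
<!-- Assumption: title_s resolves to a "*_s" dynamic field of type "string"
     (solr.StrField). StrField values are indexed verbatim, so title_s:iphone
     cannot match an indexed "iPhone". -->

<!-- Hypothetical field name (title_ci). It reuses the text_case_insensitive
     type from the answer above; stored="false" because title_s already keeps
     the original value. -->
<field name="title_ci" type="text_case_insensitive" indexed="true" stored="false"/>

<!-- Copy the raw title_s value into the analyzed field at index time. -->
<copyField source="title_s" dest="title_ci"/>
```

With something like this in place, and after re-indexing, title_ci:iPhone and title_ci:iphone should return the same documents, while title_s stays available for exact matching.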

I tried many different approaches, but none worked.
Then I implemented the change below, and it works fine.
Let me know whether the method below is correct, but it works fine for me :)
I removed the following from the schema:
<fieldType name="string" class="solr.StrField" sortMissingLast="true" docValues="true"/>
And replaced it with the following:
<fieldType name="string" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
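On whether this method is correct: redefining the type named string changes every field that uses it, not just title_s. It also drops the docValues setting and makes values split on whitespace. A possibly safer sketch, not taken from the thread, keeps the stock string type and adds a separate type for whole-value, case-insensitive matching. The names string_ci and title_exact_ci are hypothetical:

```xml
<!-- Keep the stock definition so existing string fields behave as before. -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true" docValues="true"/>

<!-- Hypothetical type name. KeywordTokenizerFactory keeps the whole value as
     one token, and LowerCaseFilterFactory lowercases it. A single <analyzer>
     element without a type attribute is used at both index and query time. -->
<fieldType name="string_ci" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical field using the new type. -->
<field name="title_exact_ci" type="string_ci" indexed="true" stored="true"/>
```

As with any analysis change, existing documents need to be re-indexed before the new behaviour shows up in search results.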

Related

Solr Dynamic Fields with Custom field type not working properly

I have added a wildcard dynamic field to my schema.xml:
<dynamicField name="test_srch_*" type="filter_text" indexed="true" stored="true" multiValued="true"/>
<!-- Tokenized text for search -->
<fieldType name="filter_text" class="solr.TextField" omitNorms="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
The output in Solr is not being lowercased, and individual token searches such as BBB, bbb, CCC, or ccc do not work. Only a search for "BBB, CCC" works.
<arr name="test_srch_2">
<str>BBB, CCC</str>
</arr>
Can someone please suggest what I am doing wrong?
I forgot to restart the servers. My bad! Any change to schema.xml requires a server restart (or a core reload) before it takes effect in the core. Case-insensitive search is working fine now.

solr highlighting: overlapped html tags

I have to highlight HTML text returned from Solr v6.6.2.
Original HTML stored in the body_txt_en field in Solr:
I have a strong <strong>TCL</strong> code
Highlight /select parameters:
hl.q=have strong TCL
hl=on
hl.fl=*_txt_en
Expected result:
I <em>have</em> a <em>strong</em> <strong><em>TCL</em></strong> code
Actual result:
I <em>have</em> a <em>strong</em> <strong><em>TCL</strong></em> code
As you can see, </em> appears after </strong>, which breaks the HTML for large documents.
Field configuration:
field configuration:
<dynamicField name="*_txt_en" type="text_en" indexed="true" stored="true"/>
<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
All other Solr parameters and configs are the defaults for version 6.6.2.
Analysis on indexing gives:
I can't understand why, after solr.StandardTokenizerFactory (ST), end - start = 12 for the TCL keyword.
Question:
How can I solve this "wrong HTML tag order" issue?

Solr substring search yields all indexed results

To do a substring search, I have added a new fieldType, "text", with an n-gram filter.
It works fine, but it has one downside.
Example:
name = ['Apple','Samy','And','a']
When I search for name:a, all of the above items are returned. Even when I change the search to "App", all of the above items are still returned. How can I fix this issue?
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="100" />
  </analyzer>
</fieldType>
As you can see in the analysis, both the indexed value and the query value get passed through the EdgeNGramFilter, meaning that a query matches any value that shares a leading n-gram with it. Add a simpler analysis chain for querying the field, and you should be good to go.
The example from the Wiki should be usable by just copying and pasting it:
<fieldType name="text_general_edge_ngram" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="15" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>
  </analyzer>
</fieldType>
My initial guess was that since you don't provide two alternative definitions, Solr uses the same chain for both. Your analysis output confirms that suspicion. Try adding an analyzer with type="query" to have a specific chain for querying the field (you do not want EdgeNGram in both places).
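Applied to the field type from the question, the advice above might look like the sketch below. The LowerCaseFilterFactory lines are an addition that was not in the asker's configuration, included on the assumption that case-insensitive matching is wanted as well:

```xml
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <!-- Index time: expand each token into its leading n-grams (a, ap, app, ...). -->
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="100"/>
  </analyzer>
  <!-- Query time: no n-gram filter, so the query term must equal one of the
       indexed grams instead of being expanded into grams itself. -->
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Note that EdgeNGramFilterFactory only produces grams anchored at the start of each token, so this gives prefix matching. Matching in the middle of a word would need NGramFilterFactory on the index side instead. Either way, the field has to be re-indexed after the change.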

Autocomplete in Solr with Case-insensitive feature

I have been trying out the autocomplete feature in Solr 4.7.1 using the Suggester. I have configured it to display phrase suggestions as well. The problem is that if I type "game", I get "game" or phrases containing "game" as suggestions.
But if I type "Game", no suggestion is displayed at all. How can I make the suggestions case insensitive?
I have configured the fields in schema.xml like this:
<fieldType name="text_auto" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory"
            minShingleSize="2"
            maxShingleSize="4"
            outputUnigrams="true"
            outputUnigramsIfNoShingles="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
What worked for me was tweaking the code in the Velocity file head.vm. I changed it to
'terms.prefix': function() { return $("#q").val().toLowerCase();},
which solved my issue, as I am using the Terms component for suggestions.
I tried the same schema in the Solr Admin Analysis view. You can provide the index and query values there to see how the tokens are matched.
I tried your schema in my local Solr instance, and it seems to work fine, i.e., Game and game are considered equal and are matched.
I would urge you to post the request query and/or provide the Suggester configuration (if you are using it).

KeywordTokenizerFactory with LowerCaseFilterFactory

I wanted to use an NGramFilterFactory for my index, saw the following example, and tried it out:
<fieldType name="NGramText" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="25" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<field name="mark" type="NGramText" indexed="true" stored="true" omitNorms="true" omitTermFreqAndPositions="true"/>
The example uses KeywordTokenizerFactory. What is the purpose of using it? From what I understand, it does not really do anything: "the entire input string is preserved as a single token", according to what I found online.
Is there a good reason to use KeywordTokenizerFactory to make n-grams, or could I swap it for WhitespaceTokenizerFactory without slowing down searches?
Also, in this example LowerCaseFilterFactory is not making the field values lowercase. Could that have something to do with combining it with KeywordTokenizerFactory?
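To make the difference concrete, here is a hedged sketch of the same type with WhitespaceTokenizerFactory swapped in (the type name NGramTextWords is hypothetical). With KeywordTokenizerFactory the whole field value is one token, so n-grams can span spaces and a multi-word query has to appear as one contiguous substring. With WhitespaceTokenizerFactory each word is turned into n-grams separately, so grams never cross a word boundary and each query word is matched on its own. No performance claim is made here; that would need measuring on real data.

```xml
<!-- Hypothetical variant of the NGramText type from the question. -->
<fieldType name="NGramTextWords" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- Splits on whitespace: "Foo Bar" becomes the tokens "Foo" and "Bar". -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Grams are built per token: fo, oo, foo, ba, ar, bar. -->
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

On the lowercase point: analysis filters only change the terms written to the index. The stored value returned in search results is always the original input, so seeing mixed case in the response does not mean LowerCaseFilterFactory is being skipped. The Analysis screen in the Solr Admin UI shows the terms that are actually indexed.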

Resources