KeywordTokenizerFactory with LowerCaseFilterFactory - solr

I wanted to use an NGramFilterFactory for my index, saw the following example, and tried it out:
<fieldType name="NGramText" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="25" />
</analyzer>
<analyzer type="query">
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
<field name="mark" type="NGramText" indexed="true" stored="true" omitNorms="true" omitTermFreqAndPositions="true"/>
The example uses KeywordTokenizerFactory. What is the purpose of using it? From what I understand it does not really do anything: "the entire input string is preserved as a single token", according to what I found online.
Is there a good reason to use KeywordTokenizerFactory to make n-grams, or could I swap it for WhitespaceTokenizerFactory without slowing down the searches?
Also, with this example LowerCaseFilterFactory is not making the fields lowercase. Could that have something to do with combining it with KeywordTokenizerFactory?
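For comparison, here is a rough sketch of the WhitespaceTokenizerFactory variant the question asks about. The type name NGramTextWords is hypothetical and nothing here has been benchmarked. The only difference it illustrates is that KeywordTokenizerFactory hands the whole field value to NGramFilterFactory as one token, so grams can span spaces, whereas a whitespace tokenizer produces grams per word. Note also that analysis only affects the indexed terms. The stored value returned in results keeps its original case, which may be why the field does not appear to be lowercased.

```xml
<!-- Hypothetical variant of NGramText: grams are built per word, not across the whole value -->
<fieldType name="NGramTextWords" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With this variant a multi-word query is analyzed as several terms rather than one, so which tokenizer fits depends on whether matches should be able to cross word boundaries.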

Related

Solr Dynamic Fields with Custom field type not working properly

I have added a wildcard dynamic field to my schema.xml:
<dynamicField name="test_srch_*" type="filter_text" indexed="true" stored="true" multiValued="true"/>
<!-- Tokenized text for search -->
<fieldType name="filter_text" class="solr.TextField" omitNorms="true">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
</fieldType>
The output in Solr is not being lowercased, and individual token searches such as BBB, bbb, CCC, or ccc are not working. It only works with "BBB, CCC":
<arr name="test_srch_2">
<str>BBB, CCC</str>
</arr>
Can someone please suggest what I am doing wrong?
I forgot to restart the servers. My bad!! Any change to schema.xml requires a server restart before it becomes available as part of the core. Case-insensitive search is working fine now.

Tokenizer with lower case filter not working

<fieldType name="keyword" class="solr.TextField">
<analyzer>
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
Field definition:
<field name="name" type="keyword" indexed="true" stored="true"/>
I have data where the value of the above field is APPLE-INC.
I expect it to be found when I search for apple-inc, but that is not happening.
Any thoughts?
I have added the below field type in the schema file.
<fieldType name="keyword" class="solr.TextField">
<analyzer>
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
Here is the output I am getting on the analysis page.
Please refer to the screenshot.

Solr 6.6.0 Case Insensitive Query Not Working

I have tried all the other options and answers available on the internet.
I tried
<tokenizer class="solr.LowerCaseTokenizerFactory"/>
but it's not working.
I tried
<filter class="solr.LowerCaseFilterFactory"/>
but it's not working.
I have tried many different ways, but none work.
That is, I want the same result when searching with title_s:iPhone and title_s:iphone.
I am not sure what is causing the problem.
If case-insensitive search were broken in a Solr release, there would be much more noise than a single Stack Overflow question.
Let's use this question to illustrate the approach everyone should follow for basic Solr usage:
1) Refer to the documentation. Solr has good, free online documentation.
Specifically, it describes how to configure schema.xml and its various aspects [1].
From there you can learn that it is quite simple to configure a field to be case insensitive:
<field name="title" type="text_case_insensitive" indexed="true" stored="true"/>
<fieldType name="text_case_insensitive" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
N.B. If you had a previous configuration in the schema for the title field, you need to re-index.
[1] https://lucene.apache.org/solr/guide/6_6/field-type-definitions-and-properties.html
I tried many different ways, but none worked.
Then I implemented it as below, and it works fine.
Let me know whether the method below is correct, but it works fine for me :)
I removed the code below from the schema,
<fieldType name="string" class="solr.StrField" sortMissingLast="true" docValues="true"/>
and added the code below in its place:
<fieldType name="string" class="solr.TextField">
<analyzer type="index">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
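An alternative sketch, on the assumption that other fields in the schema still rely on the original string type: leave string as a StrField and add a separate lowercased type next to it. The names string_ci and title_ci are hypothetical, and the copyField assumes title_s is covered by a *_s dynamic field. KeywordTokenizerFactory keeps the whole value as one token, which is closer to StrField behaviour than splitting on whitespace.

```xml
<!-- Keep the stock string type untouched -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true" docValues="true"/>

<!-- Hypothetical case-insensitive "string": the whole value becomes one lowercased token -->
<fieldType name="string_ci" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical search-only copy of title_s; query it as title_ci:iphone or title_ci:iPhone -->
<field name="title_ci" type="string_ci" indexed="true" stored="false"/>
<copyField source="title_s" dest="title_ci"/>
```

As with any schema change of this kind, existing documents need to be re-indexed before the new field can match anything.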

Solr tokenizer for search

I have defined a new field type in Solr for auto-suggest:
<fieldType name="auto_text" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.KeywordTokenizerFactory" />
<filter class="solr.LowerCaseFilterFactory" />
<filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" />
</analyzer>
<analyzer type="query">
<tokenizer class="solr.KeywordTokenizerFactory" />
<filter class="solr.LowerCaseFilterFactory" />
</analyzer>
</fieldType>
Now if I search on a particular field, for example
/solr/select?q=ree
I'm able to get responses like "reebok shirt" but not records like "white reebok shirt". Should I add another tokenizer to achieve that?
See the wiki. KeywordTokenizerFactory does this: it treats the entire field as a single token, regardless of its content. Use WhitespaceTokenizerFactory instead.
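A minimal sketch of that suggestion, keeping the rest of auto_text as in the question. With WhitespaceTokenizerFactory, "white reebok shirt" is split into three tokens before EdgeNGramFilterFactory runs, so prefixes of "reebok" such as "re" and "ree" are indexed as well. Whether the query analyzer should change too is not settled by the answer; this sketch changes both, which is an assumption.

```xml
<fieldType name="auto_text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- Split on whitespace so every word gets its own edge n-grams -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <!-- Assumption: tokenize queries the same way, but without n-gramming them -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Documents indexed under the old definition have to be re-indexed before queries like q=ree can match words in the middle of a value.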

Solr filter factory syntax not working

So I am attempting to have a custom field in my Solr schema that is filtered and processed a certain way but it doesn't seem to be working.
<fieldType name="removeWhitespace" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory" />
<filter class="solr.LowerCaseFilterFactory" />
<filter class="solr.TrimFilterFactory" />
<filter class="solr.PatternReplaceFilterFactory" pattern="\s" replacement="" replace="all" />
</analyzer>
</fieldType>
<field name="whiteSpaceRmved" type="removeWhitespace" stored="true" indexed="true"/>
<copyField source="original" dest="whiteSpaceRmved"/>
Basically, if I have a field like,
Hello World
I want to have that field, and a new field that looks like,
HelloWorld
But when I try it, it copies the field, but doesn't change it in any way. Any ideas?
The tokenizer always runs before the filters, so <tokenizer class="solr.StandardTokenizerFactory" /> breaks the field value into tokens before PatternReplaceFilterFactory gets a chance to remove whitespace; by then there is none left to remove. And since you are removing whitespace anyway, you might not need real tokenization at all, since it looks like you want to treat the values as plain strings.
You should use KeywordTokenizerFactory, which does no actual tokenizing, so the entire input string is preserved as a single token:
<fieldType name="removeWhitespace" class="solr.TextField" sortMissingLast="true" omitNorms="true">
<analyzer>
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory" />
<filter class="solr.TrimFilterFactory" />
<filter class="solr.PatternReplaceFilterFactory" pattern="(\s)" replacement="" replace="all"/>
</analyzer>
</fieldType>
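Another option, sketched here as an assumption rather than something either answer proposes, is to strip the whitespace with a char filter. Char filters run on the raw text before the tokenizer, so the ordering problem does not arise. Bear in mind that analysis only changes the indexed terms: the stored value of whiteSpaceRmved returned in search results will still be the original "Hello World".

```xml
<fieldType name="removeWhitespace" class="solr.TextField" omitNorms="true">
  <analyzer>
    <!-- Runs before tokenization: "Hello World" becomes "HelloWorld" -->
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\s+" replacement=""/>
    <!-- Keep what is left as a single token -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- The indexed term ends up as "helloworld" -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

If the whitespace-free text has to appear in the stored output as well, the value would need to be transformed before it reaches the analyzer, for example on the client side before indexing.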
