How to search words with number and special characters in Solr - solr

After changing splitOnNumerics="0" I can search words that mix numbers and normal characters, such as "90s" and "omega30", but it is still not working with special characters like "80"" and "40)", even though I escaped them: 80\", 40\), etc. Do you have any idea?

Related

How to search word with and without special characters in Solr

We use StandardTokenizerFactory in Solr, but we have run into an issue when searching without the special character.
For example, the content contains "What’s the Score?", which includes a special character. When we search for "Whats the Score" we do not get a proper result. Searching the title with and without the special character should both work.
Please suggest which filter we should use to satisfy both conditions.
If you have a recent version of Solr, try adding solr.WordDelimiterGraphFilterFactory with catenateWords="1" to your analyzer chain.
Starting from What's, this should create three tokens: What, s, and Whats.
I am not sure whether ' is in the list of characters the filter uses to concatenate words; in any case you can add it using the types="characters.txt" parameter.
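As a sketch, the suggested chain could look like the following fieldType; the name and the surrounding tokenizer/filters are illustrative assumptions, not from the question:

```xml
<!-- Sketch: field type using WordDelimiterGraphFilterFactory with catenateWords -->
<fieldType name="text_catenate" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterGraphFilterFactory" catenateWords="1" generateWordParts="1"/>
    <!-- FlattenGraphFilterFactory is needed at index time after graph filters -->
    <filter class="solr.FlattenGraphFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```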

Solr spellcheck polish characters

I would be more than grateful for information on whether somebody has been able to configure spellcheck in Solr so that queries return values when Polish characters are replaced with their ASCII equivalents.
I have spellcheck enabled, however I am not getting any results when searching for 'slub', while I am getting plenty for 'ślub'.
Cheers
You should add an ASCIIFoldingFilterFactory to your spellcheck field's analyzer configuration.
<filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="false"/>
Converts alphabetic, numeric, and symbolic Unicode characters which are not in the first 127 ASCII characters (the "Basic Latin" Unicode block) into their ASCII equivalents, if one exists.
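A minimal sketch of a spellcheck field type with the folding filter in place (the fieldType name here is illustrative):

```xml
<fieldType name="text_spell" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- folds ś -> s, so 'slub' can match 'ślub' in the spellcheck dictionary -->
    <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="false"/>
  </analyzer>
</fieldType>
```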

Getting frequency for whitespace preserved word in SOLR spell suggestion

I am currently working with the Solr spellcheck feature. My problem is that I am not able to find the original frequency for the input when it contains whitespace.
For example,
spellcheck.q=aple returns me origFreq for the word 'aple'
However, when I input text with spaces, like bank of amarica, I do not get the frequency of the whole phrase; instead it gives each individual word's frequency. The suggestion for this is given via collation in Solr.
Is there a way to get the hits of the input entered with spaces, in this case bank of amarica?
Solr handles multiple words differently depending on the setting of sp.query.extendedResults. If it is false, words with spaces are treated as a single token. If it is true, they are tokenized and treated as separate words. So try changing the core configuration. If this is not the case, post your config file.

How to search in solr with special character

I'm a newbie in Solr and I would like to search with special characters.
For example, given a document
id:123
data:it's
the query
q=it'
should return the document with data:it's.
Thanks,
Donquixote
The query with the special character ' that you have used, q=data:it'*, will give you the result.
But there are some special characters, like ~ ^ * ( ) { } [ ] : \ " and whitespace, which you have to escape with \ in the query. Other special characters available on the keyboard can be searched as-is.
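To illustrate the escaping rule, here is a minimal Python sketch (the function name is mine, not part of any Solr client API) that backslash-escapes the query-parser special characters while leaving ordinary characters such as ' untouched:

```python
# Characters that are special to the Lucene/Solr query parser and must be
# escaped with a backslash; the apostrophe is deliberately not among them.
SPECIAL = set('+-&|!(){}[]^"~*?:\\/')

def escape_solr_term(term: str) -> str:
    """Backslash-escape query-parser special characters in a single term."""
    return ''.join('\\' + ch if ch in SPECIAL or ch.isspace() else ch
                   for ch in term)

print(escape_solr_term('80"'))   # -> 80\"
print(escape_solr_term("it's"))  # -> it's (the apostrophe needs no escaping)
```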

How to index words with special character in Solr

I would like to index some words with special characters all together.
For example, given m&m, I would like to index it as a whole, rather than delimiting it as m and m (normally & would be considered as a delimiter).
Is there a way to achieve this by using standard tokenizer/filter or should I have to write one myself?
Basically, text field types filter out special characters before indexing. You could use the string type instead, but it is not advisable for searching. Alternatively, you can use the types option of WordDelimiterFilterFactory so those special characters are kept, or map them to words, for example:
% => percent
& => and
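As a sketch of the types option (the filename wdfftypes.txt is an assumption): the character-type file assigns character classes such as ALPHA, so the filter no longer splits on those characters. Note that rewriting & to and would instead be done with a MappingCharFilterFactory; the types file only changes how characters are classified.

```xml
<!-- In the analyzer chain; wdfftypes.txt is an assumed filename -->
<filter class="solr.WordDelimiterFilterFactory" types="wdfftypes.txt"/>
<!-- wdfftypes.txt would contain lines such as:
     % => ALPHA
     & => ALPHA
-->
```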
The StandardTokenizerFactory splits/tokenizes the given text at special characters. To index with special characters you could write your own custom tokenizer, or you can do the following:
1. Take a list of characters at which you want to tokenize/split the text. For example, my list is {" ", ";"}.
2. Use a PatternTokenizer with the above list of characters, instead of the StandardTokenizer. Your configuration will look like:
<analyzer>
<tokenizer class="solr.PatternTokenizerFactory" pattern=" |;" />
</analyzer>
You can use WhitespaceTokenizerFactory.
http://docs.lucidworks.com/display/solr/Tokenizers#Tokenizers-WhiteSpaceTokenizer
It tokenizes only on whitespace. For example, "m&m" will be treated as a single token and indexed as such.
