Solr/Lucene search only the first word in a sentence - need query

I am searching for data that matches only sentences which start with the search key.
Example:
search key: "what"
Desired result:
**what** is your name
**what** are you doing
**what** is that
etc.
What I am getting now is:
Is that **what** you want
some text before **what**
etc.
I am using EdgeNGram as well, but it is still giving me the second kind of result.
Any help appreciated.

You can use a regex 'starts with' query: q=name:/what.*/ will yield results where the name field starts with 'what'.
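Note that a regex query runs against the indexed terms, so this only behaves as "the whole value starts with what" if the field is not tokenized. A hedged sketch of a suitable field definition, reusing the name field from the query above and the string type from the default schema:

<!-- untokenized copy of the sentence, so /what.*/ is anchored at the start of the whole value -->
<field name="name" type="string" indexed="true" stored="true"/>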

I think you're using the WhitespaceTokenizer. You should try using the KeywordTokenizer with EdgeNGram from the left end.
If you're trying to implement auto suggest, have a look at the Suggester component.
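For the prefix-match case itself, a minimal sketch of a field type combining the KeywordTokenizer with a left-anchored EdgeNGram filter (the type and field names and the 25-character limit are illustrative):

<fieldType name="text_prefix" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- keep the whole sentence as one token, then emit left-anchored prefixes -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <!-- at query time, match the search key against the indexed prefixes as-is -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<field name="title_prefix" type="text_prefix" indexed="true" stored="true"/>

With this, a query for "what" only matches values whose first characters are "what", e.g. "what is your name", and not "Is that what you want".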

I think you can use the following trick: add the following char filter to your field definition:
<charFilter class="solr.PatternReplaceCharFilterFactory"
pattern="^(.*)$" replacement="AAAA $1" />
This will add the "AAAA" token to both your index and query. This way you can match the beginning of an indexed string.
More info here:
http://robotlibrarian.billdueber.com/2012/03/boosting-on-exactish-anchored-phrase-matching-in-solr-sst-4/
example: https://github.com/billdueber/solr_stupid_tricks/blob/SST4/solr/conf/schema.xml (search for "text_l")
http://blogs.perl.org/users/mark_leighton_fisher/2012/01/stupid-lucene-tricks-exact-match-starts-with-ends-with.html
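Putting the trick together, a hedged sketch of a complete field type (the type name text_anchored is a placeholder; the "text_l" type in the linked schema is the authoritative version):

<fieldType name="text_anchored" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- prepend a sentinel token so the start of the value becomes something you can match on -->
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="^(.*)$" replacement="AAAA $1"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Because the same analyzer runs at query time, a phrase query such as q=title:"what is" is rewritten to "AAAA what is" internally and can only match values that begin with "what is".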

Related

Precise expression or word Solr match

I am looking for a way to match a very specific expression or word in my Solr collection.
Here is an example:
I want the query to return:
"Paris"
And not : "Paris is great"
And not : "I like Paris"
Thanks :)
If you only want exact matches, make sure the field type is defined as string. A string field will not do any tokenization or use any filters, and will only generate hits when the query is exactly the same as the value indexed.
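A hedged sketch of such a field, using an illustrative name:

<!-- exact match only: no tokenization, no filters -->
<field name="city" type="string" indexed="true" stored="true"/>

With this, q=city:"Paris" matches a document whose value is exactly Paris, but not "Paris is great" or "I like Paris".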
You need to use the KeywordTokenizer.
This tokenizer treats the entire text field as a single token.
https://lucene.apache.org/solr/guide/6_6/tokenizers.html#Tokenizers-KeywordTokenizer
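If you also want some normalization (for example lowercasing) on top of whole-value matching, a sketch using the KeywordTokenizer might look like this (type and field names are illustrative):

<fieldType name="text_exact" class="solr.TextField">
  <analyzer>
    <!-- the entire field value becomes a single token -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<field name="city_exact" type="text_exact" indexed="true" stored="true"/>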

SOLR: Fuzzy search on a text field with spaces

Here's my problem: I have a single text field that is indexed by Solr, which holds the usernames from our database. I'd like the search to be fuzzy and not an exact match. E.g., if the username is "krishnarayaprolu" and I search with a spelling mistake, "krishnIrayaprolu", it should still return the record.
This is working fine for me except when the usernames have a space in them. So a username: "krishna rayaprolu", and a search string "krishnI rayaprolu~0.5" is not returning the record. It is returning fine if the spelling mistake is at the end like "krishna rayaprolI~0.5". Any ideas?
For my config, I tried WhiteSpaceTokenizerFactory and StandardTokenizerFactory. On the search side, I tried quotes and escaping the space. None of them helped with my space+fuzziness problem. I'm using the admin interface for searching. Appreciate any pointers.
I have a solution for your problem; you only need to add some fields to your schema.
Create a new ngram field and copy all your title/name values into that ngram field.
When you fire a query for a misspelled word and get an empty result, split the word and fire the same query again; you should then get the expected results.
Example: suppose the user is searching for "krishna rayaprolu" but types "krishnI rayaprolu~0.5". Build the query as shown below and you should get the expected results:
**(ngram:"krishnI rayaprolu~0.5" OR ngram:"kri" OR ngram:"kris" OR ngram:"krish" OR ngram:"krishn" OR ngram:"krishnI" OR ngram:"ray" OR ngram:"raya" OR ngram:"rayap" ..... )**
We split the word into sequential grams and fire the query on the ngram field.
Hope it helps.
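A hedged sketch of the ngram field this answer relies on, assuming the source field is called username and picking arbitrary gram sizes (tune them against your data and index-size budget):

<fieldType name="text_ngram" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- index substrings of each word so misspelled or partial pieces can still match -->
    <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="10"/>
  </analyzer>
</fieldType>
<field name="ngram" type="text_ngram" indexed="true" stored="true"/>
<copyField source="username" dest="ngram"/>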

Substring match in Solr query

I have a requirement where I have to match a substring in a query.
E.g. if the field has the value:
PREFIXabcSUFFIX
I have to create a query which matches abc. I always know the length of the prefix.
I cannot use EdgeNGram or NGram because of space constraints (they will make the index considerably larger).
So I need to do this at query time, not at index time. Using a leading wildcard, something like *abc*, will have a high impact on performance.
Since I will know the length of the prefix, I am hoping for some way to do something like ....abc*, where the dots represent the exact length of the prefix, so that the query is not as bad as scanning the whole index as in the case of the wildcard query (*abc*).
Is this possible in Solr? Thanks for your time.
Solr version: 4.10
Sure, wildcard syntax is documented in the Solr reference; you could search for something like ????abc*. You could also use a regex query.
However, the performance benefit from this over *abc* will be very small. It will still have to perform a sequential search over the whole index. But if there is no way you can improve your analysis to support your search needs, there may be no getting around that (GIGO).
You could use the Regular Expression Pattern Tokenizer for this. For the sample below I guessed that the length of your prefix is 6. Your example text PREFIXabcSUFFIX would become abcSUFFIX. This way you can search for abc*.
<analyzer>
<tokenizer class="solr.PatternTokenizerFactory" pattern=".{6}(.+)" group="1"/>
</analyzer>
About the Tokenizer:
This tokenizer uses a Java regular expression to break the input text stream into tokens. The expression provided by the pattern argument can be interpreted either as a delimiter that separates tokens, or to match patterns that should be extracted from the text as tokens.
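For context, a sketch of a complete field type around that analyzer; the prefix length of 6 matches the guess above, and the separate query-time analyzer is an assumption so that search terms are not themselves stripped of their first 6 characters:

<fieldType name="text_strip_prefix" class="solr.TextField">
  <analyzer type="index">
    <!-- drop the fixed-length prefix and index the remainder as a single token -->
    <tokenizer class="solr.PatternTokenizerFactory" pattern=".{6}(.+)" group="1"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>

With PREFIXabcSUFFIX indexed as abcSUFFIX, the wildcard query abc* only has to match against the start of the indexed token.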

How to do a Solr search including special characters like (-, &, etc.)?

I need to do a Solr search for a string like BEBIL1407-GREEN with the special character (-), but it is ignoring the - and searching only for BEBIL1407. I need to search for the whole word. I'm using Solr 4.5.1.
Example Query :
q=BEBIL1407-GREEN&qt=select&start=0&rows=24&fq=clg%3A%222%22&fq=isAproved%3A%22Y%22&fl=id
Your question is about searching for BEBIL1407-GREEN but finding BEBIL1407.
You did not post your schema or your query parser.
By default Solr uses the standard query parser on the field "text" with the field type "text_general".
You can use the Solr Analysis screen to test how a word (from real text) is turned into the corresponding tokens in the index.
For "text_general" the word "BEBIL1407-GREEN" is split into two tokens: "bebil1407" and "green".
The standard parser does support escaping of special characters; this would help if your word started with a hyphen (minus sign). But in this case the tokenizer is most likely the reason for "finding unexpected documents".
Solution:
You can search with a phrase. In this case "BEBIL1407-GREEN" will also find "BEBIL1407 GREEN".
You can use another field type, e.g. one with the WhitespaceTokenizer.
Hope this helps; otherwise post your search field and its definition from schema.xml.
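Regarding the second option, a hedged sketch of a field type whose tokenizer does not split on the hyphen (type and field names are illustrative):

<fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- splits on whitespace only, so BEBIL1407-GREEN stays one token -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<field name="sku" type="text_ws" indexed="true" stored="true"/>

Quoting the term in the query (q=sku:"BEBIL1407-GREEN") is the safest way to keep the hyphen from being interpreted by the query syntax.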

Finding matches for partial words in Solr

I have a field with a value of "holmes#sible.com".
I want to get this field back if I search for "sible".
I use the edge ngram filter, which would help only if the string were "sible#holmes.com".
Which filters/tokenizers should I use for such a thing (pretty much LIKE in SQL)?
EdgeNGramFilterFactory would help only if the string was "sible#holmes.com", but NGramFilterFactory will get what you want with "holmes#sible.com" too.
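A minimal sketch of a field type built around NGramFilterFactory, assuming the field holds short values such as email addresses (gram sizes are arbitrary; a smaller minGramSize costs more index space):

<fieldType name="text_ngram_like" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- emit every substring between 3 and 15 characters of the whole value -->
    <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

With "holmes#sible.com" indexed this way, a query for sible matches one of the indexed grams, which is roughly the LIKE '%sible%' behavior asked for.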
