Precise expression or word match in Solr

I am looking for a way to match a very specific expression or word in my Solr collection.
Here is an example :
I want the query to return me :
"Paris"
And not : "Paris is great"
And not : "I like Paris"
Thanks :)

If you only want exact matches, make sure the field type is defined as string. A string field will not do any tokenization or use any filters, and will only generate hits when the query is exactly the same as the value indexed.
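To make the difference concrete, here is a toy Python model (not Solr code) of how a string field and a whitespace-tokenized text field decide whether a query hits. The function names are made up for illustration, and a real analyzer does more than split on spaces and lowercase.

```python
def string_field_matches(indexed_value: str, query: str) -> bool:
    """Toy model of a string (StrField) field: hit only on an identical value."""
    return indexed_value == query


def tokenized_field_matches(indexed_value: str, query: str) -> bool:
    """Toy model of a tokenized text field: hit when any query token was indexed."""
    indexed_tokens = set(indexed_value.lower().split())
    return any(token in indexed_tokens for token in query.lower().split())


docs = ["Paris", "Paris is great", "I like Paris"]

# Only the document whose whole value is "Paris" survives the exact comparison.
exact_hits = [d for d in docs if string_field_matches(d, "Paris")]
# Every document containing the token "paris" matches the tokenized model.
loose_hits = [d for d in docs if tokenized_field_matches(d, "Paris")]

print(exact_hits)
print(loose_hits)
```

In this model the exact comparison is also case-sensitive, so a query for "paris" would not hit the value "Paris".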

You need to use KeywordTokenizer.
This tokenizer treats the entire text field as a single token.
https://lucene.apache.org/solr/guide/6_6/tokenizers.html#Tokenizers-KeywordTokenizer

Related

SOLR - Searching record based on SOLR field in passed string

I have a CSV string field say "field1" in SOLR which can have value similar to 1,5,7
Now, I want to get this record if I pass values:
1,5,6,7
OR
1,5,7,10
OR
1,5,7
Basically any of these inputs should return me this record from SOLR.
Is there any way to achieve this? I am open to a schema change if it helps.
The Standard Tokenizer (used in text fields like text_general) will not split on a comma between digits if there is no space around it.
That means that "1,2,3" will be indexed as a single token ("1,2,3") but it will index "1, 2, 3" as three tokens ("1", "2", "3").
If you can make sure there is a space after each comma, both in the value you index and in the value you use in your search query, you might be able to achieve what you want by indexing your field as text_general.
You can use the Analysis Screen in Solr to see how your value will be indexed and searched and see if any of the built-in field types gives you what you want.
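The effect of the space can be sketched with a rough Python stand-in for the behaviour described above. This is not the real tokenizer: `toy_tokenize` and `shares_a_token` are hypothetical helpers that only split on whitespace and drop trailing commas, which is enough to show why the two spellings behave differently.

```python
def toy_tokenize(value: str) -> list[str]:
    """Rough stand-in for the behaviour described above: split on whitespace
    only, then drop a trailing comma from each piece. Not the real tokenizer."""
    return [piece.rstrip(",") for piece in value.split() if piece.rstrip(",")]


def shares_a_token(indexed_value: str, query: str) -> bool:
    """A hit under OR semantics: at least one query token was indexed."""
    return bool(set(toy_tokenize(indexed_value)) & set(toy_tokenize(query)))


print(toy_tokenize("1,5,7"))    # one token, commas kept inside it
print(toy_tokenize("1, 5, 7"))  # three separate tokens

# Without spaces, the indexed token and the query token are different strings.
print(shares_a_token("1,5,7", "1,5,6,7"))
# With spaces, the individual values line up.
print(shares_a_token("1, 5, 7", "1, 5, 6, 7"))
```

Note that under OR semantics any single shared value produces a hit in this model, so it is worth checking on the Analysis Screen and with real queries whether that is the matching rule you want.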

Solr exact match field boosting

I have this requirement: if the query text exactly matches a particular field value (the title field), that result must come first or at least be boosted.
So I need to boost the results with the exact match.
My solution is to index the title as an untokenized field, so that it only matches exactly, and to boost that field with an edismax query.
Is there any other way?
How can I index a field untokenized, that is, without splitting on spaces?
Use a KeywordTokenizer - this will index the field as a single value, but still allow you to attach filters - for example to lowercase the text before storing the token.
If you don't want to perform lowercasing either, you can use a string (StrField) field - a string field will only give a hit if the value is exactly the same.
This is usually how you give exact hits a larger boost than other hits: list the fields with their boosts in the qf parameter to dismax (which you are probably using already). Use copyField to index the content into separate fields with different definitions.
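As a minimal sketch of what such a request could look like, the Python snippet below builds the query string with the standard library. The field names `title` and `title_exact` are assumptions (the second one standing for a copyField target defined as string or with KeywordTokenizer), the boost of 10 is arbitrary, and the query is quoted on the assumption that the whole phrase should reach the exact field as a single value.

```python
from urllib.parse import urlencode

# Hypothetical field names: "title" is the tokenized field and
# "title_exact" is a copyField target defined as string or KeywordTokenizer.
params = {
    # Quoted so the text is handled as one phrase rather than separate words.
    "q": '"the great gatsby"',
    "defType": "edismax",
    # The exact field gets a higher (arbitrary) boost than the tokenized one.
    "qf": "title_exact^10 title",
}

# URL-encode the parameters so they can be appended to a select request.
query_string = urlencode(params)
print(query_string)
```

The resulting string would be appended to the collection's select URL, which is not shown here because it depends on your setup.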

How does Solr process the query string when using edismax qf parameter and specify field in query

All:
[UPDATE]
After reading the debug explain output, it seems that qf is applied only
to keywords that do not specify a field.
===================================================================
When I was learning to use the edismax query parser, the documentation said the qf parameter is:
Query Fields: specifies the fields in the index on which to perform
the query. If absent, defaults to df.
Its purpose is to combine every listed field with each query term.
However, if we already specify the field in the query (the q parameter), I wonder what happens when I specify different fields in qf?
For example:
q=title:epic
defType=edismax
qf=content
Could anyone explain how Solr interprets this query?
Thanks
When you specify qf, it means you want Solr to search for whatever is in the "q" parameter in these "qf" fields. So, your first and third lines contradict each other:
q=title:epic
defType=edismax
qf=content
If you want to find any document whose content field matches any of your search terms, put the terms in "q" as tokens separated by +OR+,
like this:
q=I+OR+like+OR+books+OR+and+OR+games
defType=edismax
qf=content
With q=title:epic you have already fixed the query field to title, so setting qf to "content" does nothing for that term. Either leave qf empty or set it to "title".

finding matches for part words in SOLR

I have a field with value of "holmes#sible.com"
I want to get this field back if I search for "sible".
I use an ngrams filter, which would help only if the string was "sible#holmes.com".
Which filters/tokenizers should I use for such a thing (pretty much the LIKE in SQL)?
EdgeNGramFilterFactory would help only if the string was "sible#holmes.com" but NGramFilterFactory will get what you want with "holmes#sible.com" too.
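The difference between the two filters can be illustrated with a small Python sketch of the underlying idea. These helpers are not Solr code, and the gram sizes 3 to 5 are an arbitrary choice for the example; in Solr the filters run on each token produced by the tokenizer, which this sketch ignores.

```python
def ngrams(text: str, min_size: int, max_size: int) -> set[str]:
    """All substrings of length min_size..max_size, from every start position."""
    return {
        text[start:start + size]
        for size in range(min_size, max_size + 1)
        for start in range(len(text) - size + 1)
    }


def edge_ngrams(text: str, min_size: int, max_size: int) -> set[str]:
    """Only the prefixes of length min_size..max_size."""
    return {text[:size] for size in range(min_size, min(max_size, len(text)) + 1)}


value = "holmes#sible.com"

# "sible" sits in the middle of the value, so only full n-grams contain it.
print("sible" in ngrams(value, 3, 5))
print("sible" in edge_ngrams(value, 3, 5))
# With the parts swapped, "sible" is a prefix and edge n-grams are enough.
print("sible" in edge_ngrams("sible#holmes.com", 3, 5))
```

Full n-grams produce many more terms than edge n-grams, so the index grows accordingly.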

Can Solr search key words precisely?

For example:
I want to search for "support", and I hope it will only return results containing "support", NOT results containing "supports" or any other related matches.
Is it possible to implement this?
Thanks.
Yes, if you search against an unanalyzed field type, matches are exact. In the default Solr schema the unanalyzed field type is named "string" (of class "solr.StrField").
EDIT: it depends on what you mean by "precisely". If your field value is "support desk" and your query is "support", should it match?
If your answer is yes, then you should look into configuring stemming.
If your answer is no, i.e. the query must match the field value and nothing else, then you should use a string (i.e. unanalyzed) field type.
Furthermore, if your query is "supports" and the field value is "Supports", should it match?
If your answer is yes, then you should use a LowerCaseFilterFactory (you can't do this on a string field type, you'll have to switch to a text field type).
If your answer is no, then it's ok to use a string field type.
In summary, the Lucene/Solr text analysis pipeline is very configurable. Take a look at the analyzer docs for a reference of all available options.
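The case question above can be sketched in Python as a toy model, not as Solr code. `keyword_lowercase` is a hypothetical stand-in for an analysis chain that keeps the whole value as one token and lowercases it, and passing no analysis function stands for a string field.

```python
def keyword_lowercase(value: str) -> str:
    """Toy analysis chain: keep the whole value as one token, then lowercase it."""
    return value.lower()


def matches(indexed_value: str, query: str, analyze=None) -> bool:
    """Compare values after running both through the same analysis (if any)."""
    if analyze is None:  # string field: no analysis at all
        return indexed_value == query
    return analyze(indexed_value) == analyze(query)


# String field: the case difference prevents a hit.
print(matches("Supports", "supports"))
# Single token plus lowercasing: case no longer matters.
print(matches("Supports", "supports", keyword_lowercase))
# The whole value is still one token, so a partial query does not hit.
print(matches("support desk", "support", keyword_lowercase))
```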
What you are describing is called stemming. There is another almost identical question on Stack Overflow, check it out: Solr exact word search
You will need to re-index and disable stemming in your configuration. I don't believe it's possible to do that at query time, since what is stored in your index is the stemmed version of the word. In your case "support" is stored in the index even if "supports" is displayed.
This should get you started: How to configure stemming in Solr?
