Is there any way to map an analyzer to query types (phrase, range), similar to the way we map an analyzer to field names?
I want to support exact matching for phrase searches, and matching on stemmed words for non-phrase searches. During indexing I index both the original token and the stemmed token at the same position.
Consider the following case:
document1 : feature flipping
document2 : feature flip
Tokens generated during indexing phase:
document1 : feature featur flipping flip
document2 : feature featur flip
feature and featur are at the same position, and flipping and flip are at the same position.
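For reference, here is a minimal sketch of an index-time analyzer that produces this kind of token stream; it is not necessarily the asker's actual chain, and it assumes a recent Lucene version (package names and the createComponents signature differ in older releases). KeywordRepeatFilter emits each token twice at the same position, the stemmer leaves the keyword-marked copy alone, and RemoveDuplicatesTokenFilter drops the duplicate when the stem equals the original:

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.LowerCaseFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.Tokenizer;
    import org.apache.lucene.analysis.en.PorterStemFilter;
    import org.apache.lucene.analysis.miscellaneous.KeywordRepeatFilter;
    import org.apache.lucene.analysis.miscellaneous.RemoveDuplicatesTokenFilter;
    import org.apache.lucene.analysis.standard.StandardTokenizer;

    // Index-time analyzer: each token is emitted twice at the same position;
    // the copy marked as a keyword stays as-is, the other copy gets stemmed.
    public class StemAndKeepOriginalAnalyzer extends Analyzer {
        @Override
        protected TokenStreamComponents createComponents(String fieldName) {
            Tokenizer source = new StandardTokenizer();
            TokenStream stream = new LowerCaseFilter(source);
            stream = new KeywordRepeatFilter(stream);          // duplicate each token, mark one copy as keyword
            stream = new PorterStemFilter(stream);             // stems only the unmarked copy
            stream = new RemoveDuplicatesTokenFilter(stream);  // drop the duplicate when stem == original
            return new TokenStreamComponents(source, stream);
        }
    }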
When I search using the phrase query "feature flipping", the generated query is:
Your Query: +matchAllDocs:true +(alltext:("feature flipping"))
Lucene's: +matchAllDocs:true +alltext:"(feature featur) (flipping flip)"
And this returns both documents. Is there any way to return only the exact match (document1)? I thought that if it were possible to map analyzers to query types, I could skip the stem filter for phrase queries.
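To make the two query forms concrete, here is a rough sketch (recent Lucene API assumed; field name alltext taken from the output above) of the MultiPhraseQuery Lucene builds from the stacked tokens versus the exact PhraseQuery that would match only document1:

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.MultiPhraseQuery;
    import org.apache.lucene.search.PhraseQuery;

    public class PhraseQuerySketch {
        // What gets built from the stacked tokens: at each position any of the
        // listed terms may match, so "feature flip" also satisfies the phrase.
        static MultiPhraseQuery expandedPhrase() {
            MultiPhraseQuery.Builder builder = new MultiPhraseQuery.Builder();
            builder.add(new Term[] { new Term("alltext", "feature"), new Term("alltext", "featur") });
            builder.add(new Term[] { new Term("alltext", "flipping"), new Term("alltext", "flip") });
            return builder.build();
        }

        // What an exact match would need: a phrase over only the original, unstemmed terms.
        static PhraseQuery exactPhrase() {
            PhraseQuery.Builder builder = new PhraseQuery.Builder();
            builder.add(new Term("alltext", "feature"));
            builder.add(new Term("alltext", "flipping"));
            return builder.build();
        }
    }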
UPDATE
https://issues.apache.org/jira/browse/LUCENE-2892 is what I'm looking for.
Thanks
Related
Has anyone worked on boosting Solr search results based on the maximum number of search keywords matched? I am querying Solr with multiple keywords and need to boost results by how many of those keywords they match.
Say my search term is field:("suresh" OR "ramesh" OR "vikas"). If a result matches all three words, it should come first; if a result matches only two words, it should come second, and so on.
Thanks !
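For what it's worth, here is a sketch of that example as a Lucene BooleanQuery with optional (SHOULD) clauses; the field name is the placeholder from the question. With default scoring, documents matching more of the optional clauses accumulate more clause scores and generally rank higher, which is roughly the ordering described above:

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanClause;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.TermQuery;

    public class KeywordMatchSketch {
        // Three optional clauses: a document matching all three terms
        // tends to score higher than one matching only two, and so on.
        static BooleanQuery buildQuery() {
            BooleanQuery.Builder builder = new BooleanQuery.Builder();
            builder.add(new TermQuery(new Term("field", "suresh")), BooleanClause.Occur.SHOULD);
            builder.add(new TermQuery(new Term("field", "ramesh")), BooleanClause.Occur.SHOULD);
            builder.add(new TermQuery(new Term("field", "vikas")), BooleanClause.Occur.SHOULD);
            return builder.build();
        }
    }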
I am trying to use the eDisMax query parser with the following requirements: a search query can be interpreted both as a phrase and as individual words, but the phrase interpretation takes precedence over individual words.
Example:
Search query: We are cool
Results should be:
Documents whose fields contain the phrase 'we are cool' appear at the top of the list
Documents whose fields contain any of 'we', 'are', 'cool' follow, with the highest number of occurrences taking precedence.
How would I go about implementing this? Thanks.
The simplest way: use pf (phrase fields) param boosting for that; check the docs here.
So for example, adding this (if you had those two fields):
q=We are cool&pf=mytitle^10 mydescription
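A rough SolrJ equivalent of that request, as a sketch only (the field names mytitle and mydescription are just the placeholders from the example; qf is assumed to list the fields the individual words should match against):

    import org.apache.solr.client.solrj.SolrQuery;

    public class EdismaxPhraseBoostSketch {
        // qf controls which fields the individual words are matched against;
        // pf boosts documents where the whole query appears as a phrase.
        static SolrQuery buildQuery() {
            SolrQuery query = new SolrQuery("We are cool");
            query.set("defType", "edismax");
            query.set("qf", "mytitle mydescription");
            query.set("pf", "mytitle^10 mydescription");
            return query;
        }
    }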
I am a newbie with Solr and I have a question about the query mechanism.
In my Solr schema.xml, a particular field (say field1) has a standard tokenizer that splits text into words, plus a couple of filters. One of the filters is a solr.KeepWordFilterFactory with an extremely short dictionary (just 10 words, say: red, orange, yellow, green, etc.). I tested the schema with the Analysis screen in the Solr admin and everything works.
That is, a document with the text "Red fox was sitting on green grass" would translate to {"red", "green"}.
However, when I submit the query field1:"red green", it fails to find such a document, as if the query were applied to the unfiltered (but tokenized) source.
Can you confirm that this is what the standard query parser actually does, i.e. that the filters are applied only when building the index but not to the actual search? (I understand that the search will only match documents whose indexed terms match the analyzed query.) If not, how does the phrase query actually work in the above example?
When you do a query like "red green", Lucene expects to find these terms at consecutive positions, so pos(green) = pos(red) + 1. When you do it like "red green"~10, you give it 10 moves to shuffle the terms around and try to make them consecutive (this is called phrase slop).
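A sketch of the two query forms (field name field1 from the question; recent Lucene API assumed). Note that if the keep-words filter preserves position increments, which is the default in recent versions, "red" and "green" end up indexed several positions apart in that document, so the exact phrase cannot match while a sloppy one can:

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.PhraseQuery;

    public class PhraseSlopSketch {
        // "red green": the two terms must sit at consecutive positions.
        static PhraseQuery exactPhrase() {
            PhraseQuery.Builder builder = new PhraseQuery.Builder();
            builder.add(new Term("field1", "red"));
            builder.add(new Term("field1", "green"));
            return builder.build();
        }

        // "red green"~10: up to 10 position moves are allowed to line the terms up.
        static PhraseQuery sloppyPhrase() {
            PhraseQuery.Builder builder = new PhraseQuery.Builder();
            builder.add(new Term("field1", "red"));
            builder.add(new Term("field1", "green"));
            builder.setSlop(10);
            return builder.build();
        }
    }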
Other than that, what a KeywordMarkerFilter does is mark tokens with the keyword flag. Filters following it can implement logic that checks whether a token is a keyword before modifying it. It does not stop Lucene from indexing tokens that are not marked as keywords, but it can stop later filters from modifying the ones that are.
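For completeness, a minimal sketch of that keyword-marking pattern (recent Lucene API assumed; the protected word list is an arbitrary example, not from the question):

    import java.util.Arrays;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.CharArraySet;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.Tokenizer;
    import org.apache.lucene.analysis.en.PorterStemFilter;
    import org.apache.lucene.analysis.miscellaneous.SetKeywordMarkerFilter;
    import org.apache.lucene.analysis.standard.StandardTokenizer;

    public class KeywordMarkerSketch extends Analyzer {
        // Tokens in this set get the keyword flag; the stemmer leaves them untouched.
        private static final CharArraySet PROTECTED =
                new CharArraySet(Arrays.asList("running", "flipping"), true);

        @Override
        protected TokenStreamComponents createComponents(String fieldName) {
            Tokenizer source = new StandardTokenizer();
            TokenStream stream = new SetKeywordMarkerFilter(source, PROTECTED);
            stream = new PorterStemFilter(stream); // stems everything except keyword-marked tokens
            return new TokenStreamComponents(source, stream);
        }
    }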
How do you set up partial (substring) fuzzy match in Solr 4.2.1?
For example, if you have a list of US cities indexed, I would like a search term "Alber" to match "Alburquerque".
I have tried using the NGramFilterFactory on the <fieldType> and rebuilt the index, but queries do not return the expected results; they still behave as if I had just used the standard text_general defaults. Exact matches work, and explicit fuzzy searches work given sufficient similarity (for example, "Alberquerque~" with one misspelling works).
I did go to the analyzer tool in the Solr admin and saw that my ngrams were indeed being generated.
Is there something I'm missing on the query side?
Or should I take a different approach altogether?
And can this work with dismax? (Multiple fields indexed like this with different weights)
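One way to check the query side, sketched with SolrJ below (the field names city and city_ngram and the boosts are placeholders, not from the question): confirm the ngrammed field is actually listed in the dismax qf, and turn on debugQuery to see how the search term is analyzed and which field clauses are generated.

    import org.apache.solr.client.solrj.SolrQuery;

    public class NgramQueryCheckSketch {
        // Query both the plain and the ngrammed copy of the field; debugQuery=true
        // shows the parsed query and which field clauses were produced.
        static SolrQuery buildQuery() {
            SolrQuery query = new SolrQuery("Alber");
            query.set("defType", "dismax");
            query.set("qf", "city^2 city_ngram");
            query.set("debugQuery", "true");
            return query;
        }
    }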
Thanks!
I have a Solr query which supports both exact and partial matches. The query terms have appropriate boost factors, with exact matches getting a higher boost than partial matches.
However, within partial matches too, we want to define the boost factors so that a partial match on a full word gets higher priority than a partial match appearing inside a word.
For example, if a user searches for the string "Annie Hall", then documents containing values like "Tanner Hall" and "Hall Pass" should have a higher weight (priority) than values like "Halloween" and "The Dog Who Saved Halloween". They are all partial matches, but "Hall" appears as a separate word in "Tanner Hall" and "Hall Pass", and hence those should score higher.
Please help.
Regards,
I am assuming you are using an ngram filter for your queries, since it can match both full and partial matches.
If so, you can always have two fields.
Non-ngrammed field with higher boost: text
Ngrammed field with normal boost: text_ngram
E.g. for dismax, qf=text^2 text_ngram would result in perfect matches getting a higher boost than partial matches.
Remember that if there is a full match there will be a partial match as well, so the boosts are cumulative.
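As a concrete sketch of that setup with SolrJ (field names text and text_ngram as in the answer above; the boost values are illustrative):

    import org.apache.solr.client.solrj.SolrQuery;

    public class ExactOverPartialBoostSketch {
        // The non-ngrammed field carries the higher weight, so a full-word match
        // on "text" adds extra score on top of the partial match on "text_ngram".
        static SolrQuery buildQuery(String userInput) {
            SolrQuery query = new SolrQuery(userInput);
            query.set("defType", "dismax");
            query.set("qf", "text^2 text_ngram");
            return query;
        }
    }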