solr dismax identify phrase match

I am searching for "i want honda bike" on a text field using the edismax query parser.
My intent is to find documents that contain the phrase "honda bike".
Currently the results include documents containing "honda", "bike", and "honda bike". I am not interested in documents that match only "honda" or only "bike"; I am interested in "honda bike".
Is there any way to identify whether the phrase in the field matched the user query?

I would investigate these parameters -- pf, pf2, and pf3.
pf -- phrase fields. This will let you boost the documents that have your q values in close proximity.
pf2 and pf3 -- chop the input into bigrams (pf2) or trigrams (pf3) and boost documents that match those shorter phrases.
There are also slop settings (such as ps) to give some leeway in matching.
http://wiki.apache.org/solr/ExtendedDisMax#pf_.28Phrase_Fields.29
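A minimal sketch of how these parameters might be combined for the query above. The field name `text`, the boost values, and the `/select` path are assumptions for illustration and are not taken from the question; the code only builds the query string and does not contact Solr.

```python
from urllib.parse import urlencode

def phrase_boost_params(user_query, field="text"):
    """Return edismax parameters that match single terms via qf and boost phrase matches."""
    return {
        "defType": "edismax",
        "q": user_query,       # unquoted, so individual terms can still match
        "qf": field,           # field searched for individual terms (assumed name)
        "pf": f"{field}^10",   # boost when all query terms appear together as a phrase
        "pf2": f"{field}^5",   # boost for matching word pairs such as "honda bike"
        "pf3": f"{field}^7",   # boost for matching word triples
        "ps": "1",             # slop allowed when matching the pf phrase
        "fl": "*,score",       # return the score so documents can be compared
    }

params = phrase_boost_params("i want honda bike")
query_string = urlencode(params)
print("/select?" + query_string)
```

With a setup like this, documents containing "honda bike" as adjacent words receive the pf2 boost on top of their single-term score, so they should rank above documents that contain only one of the words.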

The problem was that IDF was distorting the score, so I could not rely on the score alone to say confidently which documents matched the phrase exactly.
So I disabled the IDF calculation.
Take a look at:
http://lucene.472066.n3.nabble.com/Identify-exact-search-in-edismax-td4011859.html#a4011976

The mm (Minimum 'Should' Match) feature of edismax can be used here:
http://wiki.apache.org/solr/ExtendedDisMax

Related

How to use pf (Phrase Fields) and ps (Phrase Slop) of the eDisMax query parser in Solr?

What are Phrase Fields, Phrase Slop, and Query Phrase Slop in eDisMax? I have gone through many websites but still do not understand how to use them in practice. I want to know how a query using these parameters is passed to Solr and how the output differs for each of them, given the following data.
{
"id":"2",
"shipping_firstname":"Sudhanshu",
"address":"H.No. 444, Gali No.2 Jain Nagar",
"date_added":"2017-01-21T14:15:15Z",
"_version_":1562029999829024768
}

Welcome,
Phrase Fields, Phrase Slop and Query Phrase Slop in eDisMax parser are used to boost a document based on certain criteria.
Based on your use case you can give different boost values to manipulate the overall score of a document.
The pf (Phrase Fields) parameter can be used to boost the score of documents in which all of the terms in the q parameter appear in close proximity. The pf parameter takes a list of fields and optional corresponding boosts. The eDisMax query parser will attempt to make phrase queries out of all the terms in the q parameter, and if it’s able to find the exact phrase in any of the phrase fields, it will apply the specified boost to the match for that document.
The ps (Phrase Slop) parameter:
When using the pf parameter, you may not want to require all terms in the query to appear as an exact phrase. You can make use of the ps (phrase slop) parameter to specify how many term positions the terms in the query can be off by to be considered a match on the phrase fields.
The qs (Query Phrase Slop) parameter:
Just as the ps parameter allows you to define the amount of slop (edit distances) on phrases matching in the phrase fields (pf parameter), the qs parameter allows you to do the same for phrases the user explicitly specifies in the main q parameter. Think of the qs parameter as redefining what an exact match is, allowing you to change the slop from the default of 0 (terms must appear beside each other) to a higher number.
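What is your requirement here? The pf and ps parameters only help with ranking: they boost matching documents toward the top of the results, but they do not change the search criteria or which documents match. (qs is the exception, because it changes how quoted phrases in q are matched.)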
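As a rough sketch, the three parameters could be passed like this for the sample document. The query terms, boost values, and slop values are assumptions chosen for illustration, and whether a given slop is large enough depends on how the `address` field is tokenized.

```python
from urllib.parse import urlencode

# Parameters shared by all three variants; "address" comes from the sample document.
BASE = {"defType": "edismax", "qf": "address", "fl": "id,address,score"}

# pf: the terms match individually through qf, and documents that also
# contain them as a phrase in "address" get an extra boost.
pf_only = {**BASE, "q": "Jain Nagar", "pf": "address^5"}

# ps: same idea, but the boosted phrase may have its terms a few positions apart.
pf_with_slop = {**BASE, "q": "Gali Nagar", "pf": "address^5", "ps": "3"}

# qs: the user typed an explicit phrase in q; qs relaxes how strictly it must match.
quoted_with_slop = {**BASE, "q": '"Gali Nagar"', "qs": "3"}

for name, params in [("pf", pf_only), ("pf+ps", pf_with_slop), ("qs", quoted_with_slop)]:
    print(name, "/select?" + urlencode(params))
```

The first two variants can return the same set of documents as a plain qf search and differ only in score, while the third changes which documents match the quoted phrase.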

Show all the results in solr textual query

I have the following simple query in Solr, with which I want to sort all the records by how similar their name is to a text ("Olive Tasting Room"):
query: name:"Olive Tasting Room"
But when I run it, Solr returns only one document, the most similar one, whereas I want a list of all my documents sorted by their rank (similarity to my query).
How should I do this in Solr/Lucene?

When you use the `field:"Term Term2"` syntax, you're doing a phrase search, i.e. you expect the terms to appear directly after each other.
The best way to handle more "natural" queries is to use the edismax query parser. You do this by using defType=edismax in the URL. After changing to edismax, you can enter the query itself in q - q=Olive Tasting Room (escape it properly if you enter it directly into a URL), and qf=name (qf is short for "query fields", i.e. the fields the edismax handler should query).
You can also use the pf3=text parameter to give a boost to any documents that feature three words from your query after each other (and pf2 for just two) in the text.
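A small sketch of building such a URL with proper escaping. The host, core name `mycore`, and `rows` value are placeholders, and the phrase boosts are applied to `name` here as an assumption; swap in `text` or another field as needed.

```python
from urllib.parse import urlencode

params = {
    "defType": "edismax",
    "q": "Olive Tasting Room",  # plain terms, not a quoted phrase
    "qf": "name",               # search the individual terms in the name field
    "pf3": "name",              # boost documents with three query words in a row
    "pf2": "name",              # boost documents with two query words in a row
    "fl": "*,score",
    "rows": "50",               # placeholder page size
}

# urlencode takes care of escaping spaces and special characters.
url = "http://localhost:8983/solr/mycore/select?" + urlencode(params)
print(url)
```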

Boost SOLR Result Score based on search term and document type

I have a rule from my SMEs for SOLR Search relevancy. It goes like this.
When words "XX", "YY", or "ZZ" are in the User's search terms, heavily boost the document_type "MMMM" in the results. (But ONLY then, which means I can't weight the doc itself I think.)
I can imagine building a "Query Pre-Processor" that checks for the presence of the specified terms "XX", etc. and then plugs them into a pre-built query that heavily boosts document_type "MMMM".
That feels more than a little clunky to me. Doing this in code and handling a "union" situation where terms from two rules are in the search doesn't sound like something I'd like to maintain.
I'm wondering if there could be a way to leverage SOLR to do this? The first thing that comes to mind is to put those particular search terms ("XX", etc.) into every document of document_type "MMMM" when pre-processing the data to go into SOLR.
Just tossing them into the document's text is probably not going to change the weighting all that much -- especially if the term is in other documents NOT part of that document_type -- and that suggests to my mind an "important_abbreviations" field on all documents and a "standard" practice of including a boost for that general field on all queries. I say that because I don't recall ever seeing a way to boost a particular field within a doc except in a query.
I'm wondering if anyone else out there has solved this problem and if so, how -- since both of these feel a little clunky to me.
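For comparison, the "Query Pre-Processor" idea could be sketched roughly as below. The rule table, the boost value of 10, and the use of edismax with the `bq` (boost query) parameter are assumptions for illustration; the second rule exists only to show the "union" case.

```python
from urllib.parse import urlencode

# Each rule: trigger terms (lowercase) and the boost query to add when any of them appears.
RULES = [
    ({"xx", "yy", "zz"}, "document_type:MMMM^10"),
    ({"aa"}, "document_type:NNNN^10"),  # hypothetical second rule
]

def build_params(user_query, rules=RULES):
    """Return request parameters, adding one bq per rule triggered by the query."""
    terms = set(user_query.lower().split())
    params = [("defType", "edismax"), ("q", user_query)]
    for triggers, boost_query in rules:
        if terms & triggers:
            params.append(("bq", boost_query))
    return params

print(urlencode(build_params("report on XX and AA")))
print(urlencode(build_params("plain query")))
```

Because `bq` can be given more than once, queries that trigger several rules simply carry several boost queries, and queries that trigger none are sent unchanged.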

Attempting One Possible Answer: Please feel free to critique, advise or warn.
(I'm aware that an "abbreviation" field feels a bit like synonyms, Please comment if you think synonyms would be a better way to approach this.)
Step 1: Make an "abbreviation" multivalued field in SOLR on all collection docs.
Step 2: Add "XX", "YY", "ZZ" to all documents of type "MMMM" when I build the solrInputDocument to send to SOLR.
Step 3: Boost the "abbreviation" field when adding the abbreviations in step 2 so that resulting xml looks like this:
<field name="abbreviation" boost="5.0">myXXAbbreviationGoesHere</field>
[Concern: Can I boost some fields of type "abbreviation" and not others? In other words, will SOLR respect/correctly calculate the field boost value if it's "2" on one document "5" on another and there is no boost on a 3rd document?]
Step 4: Do a copyField and drop "abbreviation" into the default "text" search field. [This probably loses me my field-specific weighting, yes? -- Thus 5 or 6 below.]
Step 5: OR - add a Request Handler that forces doing search on the abbreviations field directly on every incoming search. Not totally sure on this one, but I got the idea from this stackoverflow question: Solr - Boosting result if query is found in a special field
Step 6: OR - append the query text for searching "abbreviation" on every query entered in my UI - before submission to SOLR.
[In this case, I want to search the default field AND the "abbreviation" field with this single query. I assume that's possible, I just haven't tried to write the query yet. Comments gratefully accepted.]
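One way Step 6 might look, as a sketch only: it assumes the edismax parser, a default field named `text`, and a query-time boost of 5 on `abbreviation`, none of which has been tested against this schema.

```python
from urllib.parse import urlencode

def with_abbreviation_field(user_query, abbreviation_boost=5):
    """Search the default text field and the abbreviation field in a single query."""
    return {
        "defType": "edismax",
        "q": user_query,
        # qf lists both fields; the boost is applied at query time
        "qf": f"text abbreviation^{abbreviation_boost}",
    }

params = with_abbreviation_field("XX policy")
print(urlencode(params))
```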

Solr - how to plan field boosting

I query using
qf=Name+Tag
Now I want documents that have the phrase in Tag to appear first, so I use
qf=Name+Tag^2
and they do appear first.
What should be the rule of thumb regarding the number that comes after the field?
How do I know what number to set?

The number is purely a matter of preference and is found mainly by trial and error.
It expresses how much one field weighs in comparison to the other fields.
Scoring takes various factors into account, and some of them can be considered and tested,
e.g. term frequency: if a word appears twice in Name, should that override a single occurrence in the Tag field?
Also, if you are checking for a phrase match, you should use pf with the edismax parser.
qf matches individual words, whereas pf matches the whole phrase.
For example, if you have the fields name and tag and you search for ruby rails:
qf would score name:ruby tag:ruby and name:rails tag:rails
pf would score name:"ruby rails" tag:"ruby rails"
So it is better to use qf to match the results and boost single-term matches, and to give pf higher boost values.
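A sketch of that combination for the ruby rails example. The specific boost numbers are placeholders to be tuned by trial and error, not recommended values.

```python
from urllib.parse import urlencode

params = {
    "defType": "edismax",
    "q": "ruby rails",
    "qf": "name tag^2",      # single-term matches; tag counts twice as much as name
    "pf": "name^5 tag^10",   # whole-phrase matches get the larger boosts
    "fl": "*,score",         # return the score to see the effect of each change
}

print(urlencode(params))
```

Adding debugQuery=true to the request is a common way to inspect how each clause contributes to the score while tuning.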

Which one to use for boosting bq or q.alt

As the title says: which one should I use for boosting in Solr, q.alt or bq? I tried boosting with both, but I am not clear on how the boosting works. With q.alt I got the correct results when I specified a boost value of 1000, whereas with bq I got the same results with a boost value of 2.
Can someone help me with the best practices for boosting?
My SOLR version is 3.5.

It depends on what you are trying to boost.
Use qf (query fields) to boost individual search fields that carry different weights.
For example, if the document title carries a higher weight than the description, you would use title^2 description^1.
q.alt is just an alternate query that is used in case no q is specified.
Use bq and bf to boost certain matches or ranges, or when you need to apply functions. These are usually extra boosts and not part of the search boost itself.
For example, to favor the latest documents you would boost by date; you could also boost by price range, or on the sum of several fields, etc.
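A sketch showing where each parameter would go in one request. The field names `category` and `published_date`, the query text, and the boost values are assumptions for illustration; the `recip(ms(NOW,...))` expression is the commonly used date-recency function form.

```python
from urllib.parse import urlencode

params = [
    ("defType", "edismax"),
    ("q", "solr boosting"),
    ("q.alt", "*:*"),                    # used only when q is missing
    ("qf", "title^2 description^1"),     # per-field search weights
    ("bq", "category:featured^5"),       # extra boost for documents matching a query
    ("bf", "recip(ms(NOW,published_date),3.16e-11,1,1)"),  # extra boost for recent documents
]

as_dict = dict(params)
print(urlencode(params))
```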

Use the qf parameter for boosting; see the DisMax Query Parser wiki page.
