Solr morelikethis returns no documents - solr

I try to build a product recommender using the Text from the productdescription as input for recommendations.
But for some reasons I don't get any results. I setup the productdescription as textfield in the Schema.XML . I also marked it as a vector field.
My query looks like this select?q=id:189&mlt=true&mlt.fl=productdescription&mlt.mintf=1&mlt.mindf=0
From my understanding this query should somehow alsways bring me some similar items even if the score would be very low as df is set to 0.
But the only result I get is sometimes a duplicate of a product with the same description but an different ID (the dataset is not perfect).
So my question is: how can I always get the next nearest document even if there is no 1:1 match from the whole Text

I had a similar issue where no results were showing until I added mlt.mintf=1, which I see you have, but perhaps play around with those mintf mindf parameters to see if something yields results. I've actually been looking for more in depth examples for MoreLikeThis..

Related

Display Solr results based on custom selection

I have a filed 'qualification' which is having multiple values (something like MCA, MBA, MSC, PhD, ...).
My requirement is to display results in the order MSC, MCA, PhD, MBA. So, I am using the below query to boost the field values.
&bq=(qualification: "MSC"^5 "MCA"^4 "PhD"^3 "MBA"^2)
The above query is working only when I use q=*:*
But when search with any text like q=course, I am not getting the results with specified order.
Please help what I did wrong.
Thanks & Regards
Venu
You're probably not doing anything "wrong", but when you actually search for something, the score isn't flat (i.e. it's no longer just 1) any more.
If you don't want your query to affect the score, use a filter query (fq instead). This does however not give you any actual relevance inside the results - if you still want that, you'll probably have to adjust your boosts to be far higher, so that the actual scores are only used internally within each boost level.
&bq=qualification:"MSC"^50000
&bq=qualification:"MCA"^40000
&bq=qualification:"PhD"^30000
&bq=qualification:"MBA"^20000
If you append debugQuery=true to your query string, you can see how the score is calculated for each document, and adjust your boosts accordingly.

Solr block join faceting while maintaining original query

I am attempting to implement a search in solr 5.5 which requires faceting on child document fields. I realize that flattening the data structure is the ideal solution for solr search but unfortunately because of business requirements of the search, I am required to maintain a relationship between various fields (hence the child documents).
I am experimenting with using the BlockJoinFacetComponent to facet on child document fields, and I am able to get everything working and get the counts I expect using the basic example, no problems there. The issue I am facing is that the BlockJoinFacetComponent requires a ToParentQuery, and I can't figure out how to combine this with my original search query and still get facet results.
To explain further:
I am basically following this example: http://www.slideshare.net/lucidworks/faceting-with-lucene-block-join-query-oleg-savrasov
In the example, the user originally searches for "dress", and then is shown facets to filter down by size, color. Size and Color are child fields, and the BlockJoinFacetComponent is used in the example to facet by size and color and retrieve the expected counts.
In the example, the query used to retrieve said facets (slide 22) is:
q= {!parent which="scope:product"} COLOR: Blue
child.facet.field = SIZE
Which works fine. What I am not understanding is in this example we have now lost the original search for "dress". So my question is basically how can I combine my original search (dress) with the ToParentQuery? I have tried everything I can think of to combine the queries, but I always end up getting the same exception:
"Block join faceting is allowed with ToParentBlockJoinQuery only".
I have even downloaded the solr source code and hooked up a remote debugger where this error is being thrown to try and debug this, but I still can't figure it out. No matter what I do it seems like unless the ToParentBlockJoinQuery is the only thing in the query, the BlockJoinFacetComponent will reject it. Which seems odd considering to use the component you've now lost what the user originally searched for.
After further debugging, the issue stems from the fact that BlockJoinFacetComponent seems to not be able to separate the ToParentBlockJoinQuery portion of a query if you are using a query parser other than the standard parser (I was using edismax).
For example, with the standard query parser, this works:
"original query" + _query_:"{!parent which="scope:product"} COLOR: Blue"
child.facet.field = SIZE
If you run this same query with a dismax or edismax query parser, you receive the error:
"Block join faceting is allowed with ToParentBlockJoinQuery only".
Since I am dependent on the edismax query parser, this was a show stopper for me. However, I was able to achieve the results I desired instead by using the JSON Facet API: http://yonik.com/solr-nested-objects/#faceting

solr facet counts not correct with stats and group option

I am using solr search for products-search on our web page. Since now, all works fine.
But while implementing a price slider, to filter actual results by pricerange, I'm stuck with the following issue:
There is no way to exclude filters for the stats option, same way as it is possible on facets. I use stats for getting the overall min- and max-price, no matter what price range is selected (on the slider) and which category is selected on actual search.
So best way to get this values is to exclude the range-filter on stats select, otherwise there will be max- and min-price just for the actual (ranged) result.
exclude a filter on facets (works on solr 4.4):
...&fq={!tag=cat}categories:Electronics/Computers&facet=true&facet.field={!ex=cat}categories&...
But using this for stats is not possible (see https://issues.apache.org/jira/browse/SOLR-3177)
So then I tried using a group select as suggested on that called page.
my solr call looks like this:
fq={!tag=cat}categories:Electronics/Computers&facet=true&
facet.field={!ex=cat}categories_raw&
facet.prefix=Electronics&stats=true&stats.field=minPrice&
stats.field=maxPrice&stats.field=vat&group=true&group.query=minPrice:[* TO 20]
maxPrice:[0 TO *]&group.main=true
All fine. I get the correct stats result and the correct result-count having applied the pricerange-filter. .... EXCEPT the problem, that the facet counts now were wrong, as I did not apply the price range filter.
I know there is a group.facet option, as I also tried. But using that group.facet I need to use a group.field on which the results are based on. In my opinion, usually I need to use the price-field as group.field (group.field=price).
But we do have two price fields on our products (min and max-price). I tried to set them both as group.field parameter, but still get the wrong facet-counts.
It looks like I am just a small step away from the correct solution, but I don't get it.

Solr - How do I get the number of documents for each field containing the search term within that field in Solr?

Imagine an index like the following:
id partno name description
1 1000.001 Apple iPod iPod by Apple
2 1000.123 Apple iPhone The iPhone
When the user searches for "Apple" both documents would be returned. Now I'd like to give the user the possibility to narrow down the results by limiting the search to one or more fields that have documents containing the term "Apple" within those fields.
So, ideally, the user would see something like this in the filter section of the ui after his first query:
Filter by field
name (2)
description (1)
When the user applies the filter for field "description", only documents which contain the term "Apple" within the field "description" would be returned. So the result set of that second request would be the iPod document only. For that I'd use a query like ?q=Apple&qf=description (I'm using the Extended DisMax Query Parser)
How can I accomplish that with Solr?
I already experimented with faceting, grouping and highlighting components, but did not really come to a decent solution to this.
[Update]
Just to make that clear again: The main problem here is to get the information needed for displaying the "Filter by field" section. This includes the names of the fields and the hits per field. Sending a second request with one of those filters applied already works.
Solr just plain Doesn't Do This. If you absolutely need it, I'd try it the multiple requests solution and benchmark it -- solr tends to be a lot faster than what people put in front of it, so an couple few requests might not be that big of a deal.
you could achieve this with two different search requests/queries:
name:apple -> 2 hits
description:apple -> 1 hit
EDIT:
You also could implement your own SearchComponent that executes multiple queries in the background and put it in the SearchHandler processing chain so you only will need a single query in the frontend.
if you want the term to be searched over the same fields every time, you have 2 options not breaking the "single query" requirement:
1) copyField: you group at index time all the fields that should match togheter. With just one copyfield your problem doesn't exist, if you need more than one, you're at the same spot.
2) you could filter the query each time dynamically adding the "fq" parameter at the end
http://<your_url_and_stuff>/?q=Apple&fq=name:Apple ...
this works if you'll be searching always on the same two fields (or you can setup them before querying) otherwise you'll always need at least a second query
Since i said "you have 2 options" but you actually have 3 (and i rushed my answer), here's the third:
3) the dismax plugin described by them like this:
The DisMaxQParserPlugin is designed to process simple user entered phrases
(without heavy syntax) and search for the individual words across several fields
using different weighting (boosts) based on the significance of each field.
so, if you can use it, you may want to give it a look and start from the qf parameters (that is what the option number 2 wanted to be about, but i changed it in favor of fq... don't ask me why...)
SolrFaceting should solve your problem.
Have a look at the Examples.
This can be achieved with Solr faceting, but it's not neat. For example, I can issue this query:
/select?q=*:*&rows=0&facet=true&facet.query=title:donkey&facet.query=text:donkey&wt=json
to find the number of documents containing donkey in the title and text fields. I may get this response:
{
"responseHeader":{"status":0,"QTime":1,"params":{"facet":"true","facet.query":["title:donkey","text:donkey"],"q":"*:*","wt":"json","rows":"0"}},
"response":{"numFound":3365840,"start":0,"docs":[]},
"facet_counts":{
"facet_queries":{
"title:donkey":127,
"text:donkey":4108
},
"facet_fields":{},
"facet_dates":{},
"facet_ranges":{}
}
}
Since you also want the documents back for the field-disjunctive query, something like the following works:
/select?q=donkey&defType=edismax&qf=text+titlle&rows=10&facet=true&facet.query=title:donkey&facet.query=text:donkey&wt=json

Is it possible to have SOLR MoreLikeThis use different fields for model and matches?

Let's say I have documents with two fields, A and B.
I'd like to use SOLR's MoreLikeThis, but with a twist: I'm most interested in boosting documents whose A field is like my model document's B field. (That is, extract MLT's 'interesting terms' from the model B field, but only collect MLT results based on the A field.)
I don't see a way to use the mlt.fl fields or mlt.qf boosts to achieve this effect in a single query. (It seems mlt.fl specifies fields used for both discovery of 'interesting terms' and matching to those terms.) Am I missing some option?
Or will I have to extract the 'interesting terms' myself and swap the 'field:term' details?
(Other ideas in this same vein appreciated as well.)
Two options I see are:
Use a copyField - index your original document with a copy of field A named B, and then query using B.
Extend MoreLikeThisHandler and change the fields you query.
The first option costs a bit of programming (mostly configuration changes) and some memory consumption. The second involves more programming but no memory footprint increase. Hope one of them suits your needs.
I now think there are two ways to achieve the desired effect (without customizing the MLT source code).
First option: Do an initial MLT query with the MLT handler, adding the parameter &mlt.interestingTerms=details. This includes the list of terms that were deemed interesting, ranked with their relative boosts. The usual behavior uses those discovered terms against the same mlt.fl fields to find similar documents. For example, the response will include something like:
"interestingTerms":
["field_b:foo",5.0,"field_b:bar",2.9085307,"field_b:baz",1.67070794]
(Since the only thing about this initial query that's interesting is the interestingTerms, throwing in an fq that rules out all docs could help it skip unnecessary scoring work.)
Explicitly re-composing that interestingTerms info into a new OR query field_a:foo^5.0 field_a:bar^2.9085307 field_a:baz^1.67070794 amounts to using the B field example text to find documents that are similar in field A, and may be mimicking exactly the kind of query default MLT does on its usual model field.
Second option: Grab the model document's actual field B text, and feed it directly as a ContentStream body, to be used in lieu of a query, for specifying the model document. Then target mlt.fl at field A for the sake of collecting similar results. For example, a fragment of the parameters might be …&stream.body=foo bar baz&mlt.fl=field_a&…. Again, the net effect being that model text originally from field_b is finding documents similar only in field_a.

Resources