Solr block join faceting while maintaining original query

I am attempting to implement a search in Solr 5.5 that requires faceting on child document fields. I realize that flattening the data structure is the ideal approach for Solr search, but unfortunately the business requirements of the search force me to maintain a relationship between various fields (hence the child documents).
I am experimenting with the BlockJoinFacetComponent to facet on child document fields. I can get everything working and get the counts I expect with the basic example, no problems there. The issue I am facing is that the BlockJoinFacetComponent requires a ToParentBlockJoinQuery, and I can't figure out how to combine this with my original search query and still get facet results.
To explain further:
I am basically following this example: http://www.slideshare.net/lucidworks/faceting-with-lucene-block-join-query-oleg-savrasov
In the example, the user originally searches for "dress" and is then shown facets to filter down by size and color. Size and color are child fields, and the example uses the BlockJoinFacetComponent to facet on them and retrieve the expected counts.
In the example, the query used to retrieve these facets (slide 22) is:
q= {!parent which="scope:product"} COLOR: Blue
child.facet.field = SIZE
That works fine. What I don't understand is that in this example we have now lost the original search for "dress". So my question is basically: how can I combine my original search (dress) with the ToParentBlockJoinQuery? I have tried everything I can think of to combine the queries, but I always end up with the same exception:
"Block join faceting is allowed with ToParentBlockJoinQuery only".
I have even downloaded the Solr source code and attached a remote debugger at the point where this error is thrown, but I still can't figure it out. No matter what I do, it seems that unless the ToParentBlockJoinQuery is the only thing in the query, the BlockJoinFacetComponent rejects it. That seems odd, considering that to use the component you have now lost what the user originally searched for.

After further debugging: the issue is that BlockJoinFacetComponent does not seem to be able to extract the ToParentBlockJoinQuery portion of a query if you are using a query parser other than the standard parser (I was using edismax).
For example, with the standard query parser, this works:
"original query" + _query_:"{!parent which='scope:product'} COLOR: Blue"
child.facet.field = SIZE
If you run this same query with a dismax or edismax query parser, you receive the error:
"Block join faceting is allowed with ToParentBlockJoinQuery only".
Since I depend on the edismax query parser, this was a showstopper for me. However, I was able to achieve the results I wanted by using the JSON Facet API instead: http://yonik.com/solr-nested-objects/#faceting
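For reference, the JSON Facet approach can be sketched as a set of request parameters. This is a minimal sketch assuming the slides' field names (SIZE on child documents, scope:product marking parents); the qf fields and the facet label "sizes" are hypothetical. The blockChildren domain moves faceting from the parents matched by q to their children, so the user's edismax query stays in q, and unique(_root_) counts distinct product blocks per size rather than child documents, following the nested-faceting pattern in the linked post.

```python
import json
from urllib.parse import urlencode

def build_request(user_query, parent_filter="scope:product", child_field="SIZE"):
    """Params for an edismax search whose facets count child-field values."""
    facet = {
        "sizes": {
            "type": "terms",
            "field": child_field,
            # Switch the facet domain from the matched parents to their children
            "domain": {"blockChildren": parent_filter},
            # Count distinct parent blocks per value instead of child documents
            "facet": {"productCount": "unique(_root_)"},
        }
    }
    return {
        "q": user_query,            # the user's original search stays in q
        "defType": "edismax",
        "qf": "name description",   # hypothetical parent field names
        "fq": parent_filter,        # keep only parent documents in the hits
        "json.facet": json.dumps(facet),
    }

params = build_request("dress")
print("/select?" + urlencode(params))
```

The fq restricts the main result set to parent documents, so the blockChildren domain always starts from parents.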

Related

Solr 8.8.2: reduce recall and improve precision for multi-token queries (mm, qs, shingles)

I'm facing an issue where I have a huge amount of data in Solr, and as a result, searching for a multi-token query generates a big recall set. For example, if I search for "apple watch series 4 42mm", I get back 4 million results. My parser is edismax, the minimum match (mm) setting is currently 2, and I am using the WhitespaceTokenizer with a bunch of filters. The goal is to reduce this recall set so that more relevant results are displayed.
Things I have explored:
MinimumMatch - I am trying mm set to 2>2 4>3 to see how that changes the results. I also tried to find out whether I could apply mm to individual fields, and found that this used to be possible with local params in Solr but was discontinued in Solr 7.2. I do not want to write a custom parser or tweak Solr's code, since that could lead to other problems, nor do I want to change the default parser to Lucene. Is there any other way to apply mm separately to category_name, product_name, product_description, brand_name, etc.?
Query slop - I am not using qs currently. I tried a few examples converting my query into a phrase query and applying qs. It does reduce recall, but there is a problem: suppose I have a product with "apple" in brand_name and "watch series 4 42mm" in product_name. That is a relevant result, but it will not be returned, because the phrase query requires all tokens to be in the same field. Is there a way to apply qs that suits my purpose?
ShingleFilterFactory - I'm trying this filter with outputUnigrams=true, because I still want the individual terms to be indexed. But with that, the index size would explode, and the result set wouldn't be that good either. Can I use other levers, such as mm, along with this to make it work? Also, is there a way to make outputUnigrams a query parameter?
I also explored pf2, pf3 and ps, but those are used for boosting. Right now, my aim is to filter down to the most relevant results.
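To see what different mm values mean for a 5-token query like the one above, here is a sketch that reproduces the rules described in the mm section of the Solr Reference Guide (it is not Solr's own code). It assumes each whitespace-separated token becomes one optional clause, which is only approximately true once analysis and multiple qf fields come into play. Note that the documented conditional syntax uses < (as in 2<-25% 9<-3).

```python
import re

def min_should_match(clauses, spec):
    """Number of optional clauses an mm spec requires (sketch of the documented rules)."""
    # Normalize "2 < -25%" to "2<-25%" so the spec can be split on whitespace
    spec = re.sub(r"\s*<\s*", "<", spec.strip())
    if "<" in spec:
        # Conditional spec "n<x": x applies only when there are more than n clauses
        result = clauses  # up to the first bound, all clauses are required
        for part in spec.split():
            bound, expr = part.split("<", 1)
            if clauses <= int(bound):
                return result
            result = min_should_match(clauses, expr)
        return result
    if spec.endswith("%"):
        # Percentages are rounded down; a negative one is the share allowed to be missing
        calc = clauses * int(spec[:-1]) / 100
        result = clauses + int(calc) if calc < 0 else int(calc)
    else:
        # Plain integer: a count required, or (if negative) a count allowed to be missing
        n = int(spec)
        result = clauses + n if n < 0 else n
    return max(0, min(result, clauses))

# "apple watch series 4 42mm" -> assume 5 optional clauses
for spec in ["2", "75%", "-25%", "2<-25% 9<-3"]:
    print(spec, "->", min_should_match(5, spec))
```

With mm=2, any 2 of the 5 tokens are enough for a match, which helps explain the very large recall set.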
Can someone please help me with the above? Thanks

Why doesn't dismax q.alt return any results?

I'm new to Solr.
After following exercise 1 of the tutorial (https://solr.apache.org/guide/8_9/solr-tutorial.html), I'm able to run some Solr queries on my local machine.
If I want to get results without any condition, I run a query like:
http://127.0.0.1:8983/solr/#/techproducts/query?q=*:*&q.op=OR
This works pretty fine.
But when I switch to "dismax" and try to get a similar result, I need to use "q.alt".
The query looks like:
http://127.0.0.1:8983/solr/#/techproducts/query?q.op=OR&defType=dismax&q.alt=*:*
However, this query returns no results, which is pretty weird.
Even though I specified the row parameter, it still doesn't work.
http://127.0.0.1:8983/solr/#/techproducts/query?q.op=OR&defType=dismax&q.alt=*:*&row=0
Has anyone faced the same problem before?
These parameters are not meant to be used with the user interface URLs; they're for sending directly to Solr. The user interface is a JavaScript application that talks to the Solr API behind the scenes. You can see that your URLs contain a local anchor (#); everything after it is just a reference that the JavaScript-based user interface uses to load the correct page.
The rows parameter is also named rows, not row, and when it is set to 0, no documents are returned (in the example it is shown as a way to get facets with complete counts; that only makes sense if you also ask for facets).
The actual URL to query Solr for matching documents would be:
http://127.0.0.1:8983/solr/techproducts/select?defType=edismax&q.alt=*:*
This URL is shown in the user interface over the query results when using the query page.
There is also usually no reason to use dismax and not edismax these days, as edismax does everything that the old dismax handler did and with more functionality.
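To make the difference between the two kinds of URL concrete, here is a sketch of a hypothetical helper (admin_ui_to_select_url is not part of Solr) that converts an Admin UI query-page URL into the /select API URL described above. It assumes the UI fragment has the form /<core>/query?<params>, and it also renames row to rows.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def admin_ui_to_select_url(ui_url):
    """Hypothetical helper: map an Admin UI query-page URL to the /select API URL."""
    parts = urlsplit(ui_url)
    # Everything after '#' is client-side routing for the JavaScript UI,
    # e.g. "/techproducts/query?q.op=OR&defType=dismax&q.alt=*:*"
    route, _, query = parts.fragment.partition("?")
    core = route.strip("/").split("/")[0]
    # Solr expects "rows", not "row"
    params = [("rows" if key == "row" else key, value)
              for key, value in parse_qsl(query)]
    path = parts.path.rstrip("/") + "/" + core + "/select"
    return urlunsplit((parts.scheme, parts.netloc, path, urlencode(params), ""))

print(admin_ui_to_select_url(
    "http://127.0.0.1:8983/solr/#/techproducts/query?q.op=OR&defType=dismax&q.alt=*:*"))
```

Even with the corrected name, rows=0 still returns no documents; drop it, or use a positive value, to see results.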

Solr dismax Query Over Multiple Fields

I am trying to do a solr dismax query over multiple fields, and am a little confused with the syntax.
My core contains a whole load of podcast episodes. The fields in the index are EPISODE_ID, EPISODE_TITLE, EPISODE_DESC, and EPISODE_KEYWORDS.
Now, when I do a query I would like to search for the query term in the EPISODE_TITLE, EPISODE_DESC, and EPISODE_KEYWORDS fields, with different boosts for the different fields.
So when I search for 'jedi', the query I've built looks like this:
http://localhost:8983/solr/episode_core/select?
&defType=dismax&q=jedi&fl=EPISODE_ID,EPISODE_TITLE,EPISODE_DESC,EPISODE_KEYWORDS
&qf=EPISODE_TITLE^3.0+EPISODE_DESC^2.0+EPISODE_KEYWORDS
However, this doesn't seem to work - it returns zero records.
When I add a default field like below, it works, but that isn't a real fix, because it means I'm not getting results from searching all three fields:
http://localhost:8983/solr/episode_core/select?&df=EPISODE_DESC
&defType=dismax&q=jedi&fl=EPISODE_ID,EPISODE_TITLE,EPISODE_DESC,EPISODE_KEYWORDS
&qf=EPISODE_TITLE^3.0+EPISODE_DESC^2.0+EPISODE_KEYWORDS
Is there something I am missing here? I thought that you could search over multiple fields, and I thought that the 'qf' parameter would mean you didn't need to supply the default field parameter?
All help much appreciated...
Your idea is correct. If you've defined qf (query fields) for Dismax, there shouldn't be any need to specify a df (default field).
Can you be more specific about what isn't working?
Also, read up on configuration invariants in solrconfig.xml, as your request handler configuration may be sending different parameters than the ones you've specified in the URL.
(e.g., if you're seeing a specific error message asking you to provide a df)
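As a sketch of the parameters under discussion, the hypothetical helper below builds the dismax request from a field-to-boost mapping, using the field names from the question, and sets no df. It also adds echoParams=all, a standard Solr parameter, so the response header lists the parameters actually in effect; that is one way to spot values coming from the request handler's configuration in solrconfig.xml.

```python
from urllib.parse import urlencode

def dismax_params(q, field_boosts, fl=()):
    """Hypothetical helper: dismax params with a boosted qf and no df."""
    # qf is a space-separated list of field^boost entries (urlencode turns spaces into '+')
    qf = " ".join(f"{field}^{boost}" if boost != 1 else field
                  for field, boost in field_boosts.items())
    params = {
        "defType": "dismax",
        "q": q,
        "qf": qf,
        # Echo the effective parameters, to check for handler-level overrides
        "echoParams": "all",
    }
    if fl:
        params["fl"] = ",".join(fl)
    return params

params = dismax_params(
    "jedi",
    {"EPISODE_TITLE": 3.0, "EPISODE_DESC": 2.0, "EPISODE_KEYWORDS": 1},
    fl=("EPISODE_ID", "EPISODE_TITLE", "EPISODE_DESC", "EPISODE_KEYWORDS"),
)
print("http://localhost:8983/solr/episode_core/select?" + urlencode(params))
```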

Lucene OR query not working

I am trying to query Solr with following requirement:
I would like to get all documents that do not have a particular field:
-exclusivity:[* TO *]
as well as all documents that have this field with a specific value:
exclusivity:(None)
So I am querying Solr 4 with:
fq=(-exclusivity:[* TO *]) OR exclusivity:(None)
but I only get results where the field exists in the document and its value is None; the results do not include the documents matched by the first query!
I cannot understand why it is not working.
To explain your results: the query (-exclusivity:[* TO *]) will always return no results, because you haven't specified any results to retrieve. By default, Lucene doesn't retrieve any documents unless you tell it to. A clause like exclusivity:(None) isn't a restriction placed on the full result set; it is the key used to find the documents to retrieve. This differs from a database, which by default returns all records in a table and lets you restrict that set.
(-exclusivity:[* TO *]) only specifies what NOT to get, but doesn't tell it to GET anything at all.
Solr has logic to handle pure negative queries (I believe in much the same way as below, by implicitly retrieving all documents first), but from what I gather only for the top-level query, and it does not handle queries like term1 OR -term2, as documented here.
I believe with Solr you should be able to use the query *:* to get all docs (in raw Lucene code, the equivalent is a MatchAllDocsQuery), so you could use the query:
(*:* -exclusivity:[* TO *]) exclusivity:(None)
which would mean, get (all docs except those with a value in exclusivity) or docs where exclusivity = "None"
I have found the answer to this problem. I had made a bad assumption about how "-" works in Solr. I thought that
-exclusivity:[* TO *]
adds everything without the exclusivity field to the result set, but that is not the case: '-' can only exclude things from the result set. BTW femtoRgon, you are right, but I am using it as an fq (filter query), not as the main query; I forgot to mention that.
So the solution is:
-exclusivity:([* TO *] AND -(None))
and the full query looks like:
/?q=*:*&fq=-exclusivity:([* TO *] AND -(None))
This means I get every document that either does not have the exclusivity field or has it populated with the value None.
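The set logic in both answers can be illustrated with a toy model over three hypothetical documents. This is a simplification, not Lucene's implementation: a nested Boolean query is modelled as the union of its SHOULD sets (or the intersection of its MUST sets) minus its MUST_NOT sets, and only the top level gets Solr's implicit *:* for purely negative queries.

```python
# Three hypothetical documents:
# doc 1 has no exclusivity field, doc 2 has exclusivity=None, doc 3 has exclusivity=Exclusive
ALL_DOCS = {1, 2, 3}        # *:*
HAS_FIELD = {2, 3}          # exclusivity:[* TO *]
IS_NONE = {2}               # exclusivity:(None)

def boolean(should=(), must=(), must_not=()):
    """Toy nested BooleanQuery: matches come only from SHOULD/MUST clauses."""
    if must:
        result = set.intersection(*must)
    else:
        result = set().union(*should)
    # MUST_NOT clauses can only remove documents, never add them
    return result - set().union(*must_not)

def solr_top_level(should=(), must=(), must_not=()):
    """At the top level, a purely negative query becomes *:* minus the exclusions."""
    if not should and not must:
        should = (ALL_DOCS,)
    return boolean(should, must, must_not)

# fq=(-exclusivity:[* TO *]) OR exclusivity:(None)
original = solr_top_level(should=(boolean(must_not=(HAS_FIELD,)), IS_NONE))
# fq=(*:* -exclusivity:[* TO *]) exclusivity:(None)
answer = solr_top_level(should=(boolean(should=(ALL_DOCS,), must_not=(HAS_FIELD,)), IS_NONE))
# fq=-exclusivity:([* TO *] AND -(None))
self_answer = solr_top_level(must_not=(boolean(must=(HAS_FIELD,), must_not=(IS_NONE,)),))

print(sorted(original), sorted(answer), sorted(self_answer))
```

The nested negative clause in the original query contributes nothing, so only the None document survives, while both fixes return documents 1 and 2.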

Solr Index appears to be valid - but returns no results

Solr newbie here.
I have created a Solr index and written a whole bunch of docs into it. I can see from the Solr admin page that the docs exist and that the schema is fine as well.
But when I perform a search using a test keyword I do not get any results back.
When I enter *:* into the query box (in the Solr admin page), I get all the results.
However, when I enter any other query (e.g. a term or phrase) I get no results.
I have verified that the field being queried is Indexed and contains the values I am searching for.
So I am confused what I am doing wrong.
You probably don't have <defaultSearchField> set up correctly. See this question.
Another possibility: your field is of type string instead of text. String fields, in contrast to text fields, are not analyzed, but stored and indexed verbatim.
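A toy illustration of the string-versus-text difference: the "analyzer" below is a rough approximation, not Solr's actual text_general chain, and the field value is made up. It also shows why a wildcard such as *sit* can match on a string field, since the wildcard is applied to the single verbatim term.

```python
import re
from fnmatch import fnmatchcase

def string_field_terms(value):
    # StrField ("string"): the whole value is indexed verbatim as a single term
    return {value}

def text_field_terms(value):
    # Rough stand-in for a text analyzer such as text_general:
    # split on non-word characters and lowercase each token
    return {token.lower() for token in re.findall(r"\w+", value)}

value = "iPod with Video Playback"  # hypothetical field value

print("video" in string_field_terms(value))  # no single term equals "video"
print("video" in text_field_terms(value))    # the analyzed tokens include "video"
# A wildcard is matched against the single verbatim term, case-sensitively
print(fnmatchcase(value, "*Video*"), fnmatchcase(value, "*video*"))
```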
I had the same issue with a new Solr 8 setup. The accepted answer is no longer valid, because the <defaultSearchField> configuration has been deprecated and, as of Solr 7, removed.
As I found no answer to why Solr does not return results from any fields despite them being indexed, I consulted the query documentation. What I found is the DisMax query parser:
The DisMax query parser is designed to process simple phrases (without complex syntax) entered by users and to search for individual terms across several fields using different weighting (boosts) based on the significance of each field. Additional options enable users to influence the score based on rules specific to each use case (independent of user input).
In contrast, the documentation of the default Lucene parser only talks about searching a single field. So I gave DisMax a try, and it worked very well!
Query example:
http://localhost:8983/solr/techproducts/select?defType=dismax&q=video
You can also specify which fields to search exactly to prevent unwanted side effects. Multiple fields are separated by spaces which translate to + in URLs:
http://localhost:8983/solr/techproducts/select?defType=dismax&q=video&qf=features+text
Last but not least, give the fields a weight:
http://localhost:8983/solr/techproducts/select?defType=dismax&q=video&qf=features^20.0+text^0.3
If you are using pysolr like I do, you can add those parameters to your search request like this:
import pysolr
solr = pysolr.Solr('http://localhost:8983/solr/techproducts')
results = solr.search('search term', **{
    'defType': 'dismax',
    'qf': 'features text'
})
In my case the problem was the format of the query. It seems my setup was, by default, looking for an exact match on the entire value of the field. So, to get results when searching for sit, I had to query *sit*, i.e. use wildcards to get the expected result.
With Solr 4, I had to solve this as per Mauricio's answer, by setting type="text_en" on the field.
With Solr 6, use text_general.

Resources