SOLR nested document query unexpected results

I created a query to find parent documents in SOLR by filtering on both child and parent properties. I have simplified it for this example to:
{!parent which='content_type:"parent" AND field_a="value" AND field_b="value"'}((child_field_x:("VALUE" ) AND field_y:value))
Only parent documents have 'content_type:parent'. SOLR only returns parent documents, so that works.
Now I'm creating crossings between two other fields, let's say field_c and field_d. For every combination of values of C and D I want to count the number of parent documents. For each combination of values I now run this:
{!parent which='content_type:"parent" AND field_a="value" AND field_b="value" AND field_c="value" AND field_d="value"'}((child_field_x:("value" ) AND child_field_y:value))
When I add up the results of all these queries, however, I get a much larger number than with the original query above. The original query gives me 15k results; if I add up all rows I get 80k results.
I did some testing and noticed that, for one specific value of C and one specific value of D, these were the results:
Filtering only on C: 12.522 documents
Filtering only on D: 15.205 documents
Filtering on both (AND): 12.349 documents
Filtering on C and negate D: 3.265 documents -> expected the difference between C and D, which would be 2.683
Both field_c and field_d are single value.
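As a sanity check on these counts, here is a small sketch of the arithmetic, assuming each query simply counts a plain set of parent documents (the variable names are made up for illustration). For plain sets, the number of documents matching C but not D is the count for C minus the count for both.

```python
# Counts reported above (the dots in the post are thousands separators).
only_c = 12_522
only_d = 15_205
c_and_d = 12_349
c_and_not_d_observed = 3_265

# For plain sets: |C AND NOT D| = |C| - |C AND D|.
c_and_not_d_expected = only_c - c_and_d

# The figure quoted in the post is the gap between the two single-field counts.
gap_between_d_and_c = only_d - only_c

print(c_and_not_d_expected)
print(gap_between_d_and_c)
print(c_and_not_d_observed - c_and_not_d_expected)
```

Under that assumption the set identity gives 173 for "C and not D", while 2.683 is the gap between the D and C counts. The observed 3.265 matches neither, which suggests the queries are not behaving like simple set intersections.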
If I remove the child query (everything after }), but leave it like {!parent which='(..) I do get the correct sum. It's only when I start adding the child document query that it doesn't add up anymore.
I just don't get it, why does this happen? I have a feeling I'm not getting something from the concept of child documents, but can't seem to find anything looking at examples and documentation. It does seem to correctly filter on the parent properties, but probably the child documents are not queried correctly, or so it seems.
UPDATE: I did some extra testing by looking at the results generated. There are no duplicates in the result set, and the returned parent documents are correct for the parent filters. I wasn't able to check the child documents that belong to those companies yet, but the problem seems to be there.
One thing I noticed: if I change the default query operator to 'AND' instead of 'OR' I get 0 results in every crossing. Since my query already contained 'AND' only, I didn't get why this would be the case.

I finally managed to find a solution. It's best to work with join query parsers. If you want to filter parent documents that have child documents matching a specific condition, do this:
Query: myparentfield:"value" AND myotherparentfield:"othervalue"
FQ: {!join from=_root_ to=_root_}mychildfield:"childvalue" AND myotherchildfield:"otherchildvalue"
Feel free to replace the ANDs with ORs there.
Now if you want two child conditions combined with AND (so the parent should have a child matching condition A, and also a child, possibly another one, matching condition B), use this:
Query: myparentfield:"value" AND myotherparentfield:"othervalue"
FQ: {!join from=_root_ to=_root_}mychildfield:"childvalueA" AND myotherchildfield:"otherchildvalueA"
FQ2: {!join from=_root_ to=_root_}mychildfield:"childvalueB" AND myotherchildfield:"otherchildvalueB"
If you want to get the parents that have either a child with condition A or a child with condition B use this:
Query: myparentfield:"value" AND myotherparentfield:"othervalue"
FQ: {!join from=_root_ to=_root_}(mychildfield:"childvalueA" AND myotherchildfield:"otherchildvalueA") OR (mychildfield:"childvalueB" AND myotherchildfield:"otherchildvalueB")
It is important to have a _root_ field in your schema according to the following field definition:
<field name="_root_" type="string" indexed="true" stored="false" docValues="true" />
You may also add something like content_type with either a value of 'parent' or 'child'. Use it to filter on content_type:parent in the main query if you want to only return parent documents.
I hope this helps someone, since I feel the SOLR documentation is a bit limited. The documentation is there, but not very extensive on the subject of child documents/embedded documents.
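To make the "one FQ per child condition" pattern concrete, here is a minimal Python sketch that only builds the request parameters. The helper names child_filter and build_params are hypothetical, the field names are the placeholders used above, and values are not escaped, so treat it as an illustration rather than a finished client.

```python
from urllib.parse import urlencode

# Self-join on _root_, as used in the FQ lines above.
JOIN_PREFIX = "{!join from=_root_ to=_root_}"


def child_filter(conditions, operator="AND"):
    """Wrap field:"value" child conditions in a _root_ self-join filter."""
    clauses = [f'{field}:"{value}"' for field, value in conditions]
    return JOIN_PREFIX + f" {operator} ".join(clauses)


def build_params(parent_query, child_condition_sets):
    """One q for the parent conditions, one fq per child condition set."""
    params = [("q", parent_query)]
    for conditions in child_condition_sets:
        params.append(("fq", child_filter(conditions)))
    return params


params = build_params(
    'myparentfield:"value" AND myotherparentfield:"othervalue"',
    [
        [("mychildfield", "childvalueA"), ("myotherchildfield", "otherchildvalueA")],
        [("mychildfield", "childvalueB"), ("myotherchildfield", "otherchildvalueB")],
    ],
)

# URL-encoded form, ready to append to a /select request.
query_string = urlencode(params)
print(query_string)
```

Because Solr intersects separate fq parameters, each extra condition set narrows the parents further, which matches the "child A and also child B" case above.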

Related

Nested document searches in Solr complex parentFilter syntax

We are adding nested documents to our Solr index. For this purpose, we've added a solr_record_type field to each record, but there will be an interval while we are updating the index where the original documents will have null in this field. We would like to treat all of the original documents as root documents.
In our Solr index, solr_record_type equals 1 for root documents and the child types are represented by 2-4. So, in order to stay backwards compatible with what queries currently return, I added this fq parameter:
-solr_record_type:[2 TO 4]
However, I am having trouble composing the parentFilter in the child transformer. For the fl field I've tried:
*,[child parentFilter="-solr_record_type:[2 TO 4]"]
This doesn't work because it then omits the _childDocuments_ section from the results for some reason. I don't know why. I need some way to specify that the parent filter is either "null or 1" or "anything but 2, 3, and 4". How can I do this?
I was unable to find a definitive reference for syntax for the parentFilter, only very simple examples.
A negative query needs to be prefixed with what it's going to remove the documents from. Think of it as the intersection between the two sets, and if you only have the set which are "these documents should be removed", you have nothing to remove them from.
The regular query parser (and the edismax handlers) append the set of all documents, *:* automagically in front of negative queries for you, so it appears to work - until you start with longer AND and OR statements involving negative queries, where you suddenly need to prefix *:* as well.
The same is the case in the parentFilter syntax - there is no inherent set of all documents automagically prefixed internally, so if you have a negative query, you'll have to add it yourself.
*,[child parentFilter="*:* -solr_record_type:[2 TO 4]"]
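If the filter is assembled in client code, the prefixing can be automated. The sketch below is a deliberately simple assumption: with_match_all and child_transformer are hypothetical helper names, and the check only looks at whether the whole filter starts with a minus sign, so it does not detect negative clauses nested deeper in the query.

```python
def with_match_all(query):
    """Prefix *:* when the query starts with a prohibited (negative) clause."""
    stripped = query.strip()
    if stripped.startswith("-"):
        return "*:* " + stripped
    return stripped


def child_transformer(parent_filter):
    """Build an fl value with a [child] transformer using the given parent filter."""
    return f'*,[child parentFilter="{with_match_all(parent_filter)}"]'


fl = child_transformer("-solr_record_type:[2 TO 4]")
print(fl)
```

A filter that already starts with a positive clause, such as solr_record_type:1, is passed through unchanged.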

Filter in Graphical Query parser Solr

I have a structure where I want to search my documents and filter/rank/set conditions on my parents. Example, a doc is a match because it contains my searched string, but also because its parent contains a certain value.
Using the graph parser and experimenting with the filter is the best way I have found to do this. I tried the block join child parser first, but it wouldn't do it for me.
The problem I am facing now is that I can't seem to get the filter to work in this way:
traversalFilter="(-field:x) OR (field2:y)"
Meaning: if field does not have value x, it is ok; if field has value x and field2 has y, it is also ok. All other cases are filtered away.
But it won't work. Any help is appreciated!
Edit for more information:
I have set up a test core with all my fields stored as text_general, using the default solrconfig. I have a simple chain: the from parameter uses the document id, and a field named to stores the ids of each document's children. The graph parser works fine; it's just this kind of filter that does not work for me.
I have documents whose field has the value a or b.
A query like this:
q=*
fq={!graph from=id to=to returnRoot=false traversalFilter="(field:b)" }id:0
This query filters away any document and its children that do not have b as value on field.
q=*
fq={!graph from=id to=to returnRoot=false traversalFilter="(-field:b)" }id:0
This should then do the opposite and filter away documents with b as the value. But it does not work for some reason.
Edit:
From SolrQuerySyntax:
https://wiki.apache.org/solr/SolrQuerySyntax
Pure negative queries (all clauses prohibited) are allowed. -inStock:false finds all field values where inStock is not false
Which is why q=* fq=-(field:x) works fine, returning all documents that do not contain value x in field.
So why can't I add the same filter in the graph traversal?
EDIT3:
I have now started looking at the graph parser and have noticed that, when filtering, -(-field:x) is the same as +field:x. But +(-field:x) is not the same as -field:x and does not work.

Matching parent, though value in parent OR child solr

I have been experimenting with Solr for a couple of weeks, but I've been stuck for a couple of days now on a query I would like to execute.
I have a nested data structure where I'm using a fq like this:
{!parent which="parentDoc:true"}parentDoc:false AND <searched term>
This matches my child documents and returns the parents of those children. I am very pleased with that. BUT the problem I have is when there is a match directly inside the parent and nothing in the children: I will not get a response.
I would like some kind of OR condition, so that a document matches either the searched term AND parentDoc:false, or the filter query above.
Is this even possible within one query in Solr, or do I have to make two? I have not found any information about this issue, which makes me believe I'm just missing something trivial.
<searched term> in your FQ refers to parents, not to children, since only parents are left in the results after the join.
Also, your <searched term> is a totally separate condition from your join command.
Use q=*:*, and if you want to filter by child conditions you can use this in fq:
+{!parent which=parentDoc:true v='parentDoc:false AND (<children_conditions>)'} AND <parents_conditions>
In case you want to expand the children, add these raw query parameters:
expand.field=_root_&expand=true&expand.fq=*:*
or more specific:
expand.field=_root_&expand=true&expand.fq=(parentDoc:false AND (<children_conditions>))
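The template above can be filled in programmatically. The following Python sketch only formats the strings exactly as given in the answer and has not been checked against a live Solr instance. The helper name block_join_fq and the conditions title:solr and category:books are hypothetical stand-ins for <children_conditions> and <parents_conditions>.

```python
from urllib.parse import urlencode


def child_query(child_conditions):
    """Child-side query: restrict to child documents, then apply the conditions."""
    return f"parentDoc:false AND ({child_conditions})"


def block_join_fq(child_conditions, parent_conditions):
    """Fill in the fq template from the answer above."""
    v = child_query(child_conditions)
    return f"+{{!parent which=parentDoc:true v='{v}'}} AND {parent_conditions}"


# Hypothetical example conditions, for illustration only.
fq = block_join_fq("title:solr", "category:books")

params = [
    ("q", "*:*"),
    ("fq", fq),
    ("expand", "true"),
    ("expand.field", "_root_"),
    ("expand.fq", f"({child_query('title:solr')})"),
]
query_string = urlencode(params)
print(fq)
```

Reusing child_query for both fq and expand.fq keeps the expanded children consistent with the children that caused the parent to match.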

Solr - return all groups that have certain value within group

I've got a number of products with options - let's say:
product A has options a, b and c; product B has options b, c.
In my Solr document these products are stored as: A:a, A:b, A:c, B:b, B:c (so for this situation I've 5 products stored in Solr). I'm grouping these products (so I've two groups - A and B).
How, for a given model, can I retrieve all groups that have a certain option within the group? (If searching for a product with option a, it should return group A, with products A:a, A:b, A:c.)
I can't do it using q, as it will restrict the results set to only those products that have given option (so, in this case, if I do q=field:a, I'll get group A with result A:a and no other results). Can't use group.query as it returns only one group of results that match given query (if it returned all results I think this would be what I'm looking for).
Is there any other way I can accomplish this?
When you index the items, you can define them as children, for use with Solr block join queries.
https://cwiki.apache.org/confluence/display/solr/Other+Parsers
You can use the block join query to find children of certain parents, or parents of certain children.
To quote from the documentation:
Block Join Parent Query Parser
This parser takes a query that matches child documents and returns their parents. The syntax for this parser is similar: q={!parent which=<allParents>}<someChildren>. Again, the parameter allParents is a filter that matches only parent documents; here you would define the field and value that you used to identify a document as a parent. The parameter someChildren is a query that matches some or all of the child documents. Note that the query for someChildren should match only child documents or you may get an exception.
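To illustrate what the block join parent query does with the products from the question, here is a small in-memory Python model. It does not talk to Solr. The blocks dictionary and the helpers parents_with_child and expand are hypothetical names used only to show the semantics: find parents that have at least one matching child, then list every child of those parents.

```python
# Each parent (product) owns a block of child documents (its options).
blocks = {
    "A": ["a", "b", "c"],
    "B": ["b", "c"],
}


def parents_with_child(blocks, option):
    """Return the parents that have at least one child with the given option."""
    return sorted(parent for parent, children in blocks.items() if option in children)


def expand(blocks, parents):
    """List every child of the matched parents, not only the matching child."""
    return {parent: [f"{parent}:{child}" for child in blocks[parent]] for parent in parents}


matched = parents_with_child(blocks, "a")
result = expand(blocks, matched)
print(result)
```

Searching for option a selects only product A, and the expansion step brings back A:a, A:b and A:c, which is the grouping the question asks for.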

Lucene OR query not working

I am trying to query Solr with following requirement:
I would like to get all documents which do not have a particular field
-exclusivity:[* TO *]
I would also like to get all documents which have this field with a specific value
exclusivity:(None)
So I am trying to query Solr 4 with:
fq=(-exclusivity:[* TO *]) OR exclusivity:(None)
I only get results where the field exists in the document and the value is None; the results do not contain the documents from the first query.
I cannot understand why it is not working.
To explain your results, the query (-exclusivity:[* TO *]) will always get no results, because you haven't specified any result to retrieve. By default, Lucene doesn't retrieve any results, unless you tell it to get them. exclusivity:(None) isn't a limitation placed on the full result set, it is the key used to find the documents to retrieve. This differs from a database, which by default returns all records in a table, and allows you to limit the set.
(-exclusivity:[* TO *]) only specifies what NOT to get, but doesn't tell it to GET anything at all.
Solr has logic to handle Pure negative queries (I believe, in much the same way as below, by implicitly retrieving all documents first), but from what I gather, only as the top level query, and it does not handle queries like term1 OR -term2 documented here.
I believe with solr you should be able to use the query *:* to get all docs (though that would not be available in raw lucene), so you could use the query:
(*:* -exclusivity:[* TO *]) exclusivity:(None)
which would mean, get (all docs except those with a value in exclusivity) or docs where exclusivity = "None"
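The behaviour described above can be modelled with plain Python sets. This is a toy model, not Lucene itself: the three documents are made up, and the assumption (taken from the explanation above) is that a nested pure negative clause starts from an empty set unless *:* supplies all documents first.

```python
# Three made-up documents: one with "None", one with another value, one without the field.
docs = {
    1: {"exclusivity": "None"},
    2: {"exclusivity": "Full"},
    3: {},
}

all_docs = set(docs)
has_field = {doc_id for doc_id, doc in docs.items() if "exclusivity" in doc}
is_none = {doc_id for doc_id, doc in docs.items() if doc.get("exclusivity") == "None"}

# (-exclusivity:[* TO *]) OR exclusivity:(None)
# The negative clause has nothing to subtract from, so it contributes nothing.
pure_negative = set() - has_field
broken = pure_negative | is_none

# (*:* -exclusivity:[* TO *]) exclusivity:(None)
# Starting from all documents makes the subtraction meaningful.
fixed = (all_docs - has_field) | is_none

print(sorted(broken))
print(sorted(fixed))
```

In this model the first form returns only the document whose value is None, while the second also returns the document that lacks the field.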
I have found the answer to this problem. I made a bad assumption about how "-" works in Solr. I thought that
-exclusivity:[* TO *]
adds everything without the exclusivity field to the data set, but that is not the case. The '-' can only exclude things from a data set. BTW femtoRgon, you are right, but I am using it as an fq (filter query), not as the main query; I forgot to mention that.
So the solution is
-exclusivity:([* TO *] AND -(None))
and the full query looks like
/?q=*:*&fq=-exclusivity:([* TO *] AND -(None))
That means I get every document that either does not have the exclusivity field or has it populated with the value None.
