Solr query for child documents and return parents and filtered children - solr

I'm having trouble creating a Solr query to be able to pull out the right documents, and am starting to wonder if what I am trying to do is even possible.
I'm currently on Solr 8.9 with a managed schema, and every field uses a wildcard (dynamic) field.
First, here is what the document looks like
(names changed to redact internal business language):
{
"id": "COUNTY:1",
"county_name_s": "Hertfordshire",
"coordinates_s": {
"id": "COUNTY:1COORDINATES:!",
"lat_s": "54.238948",
"long_s": "54.238948"
},
"cities": [
{
"id": "COUNTY:1CITY:1",
"city_name_s": "St Albans",
"size": {
"id": "COUNTY:1CITY:1SIZE:1",
"sq_ft_s": "100",
"sq_meters_s": "5879"
}
},
{
"id": "COUNTY:1CITY:2",
"city_name_s": "Watford",
"size": {
"id": "COUNTY:1CITY:2SIZE:2",
"sq_ft_s": "150",
"sq_meters_s": "10000"
}
}
],
"mayor": {
"title_s": "Mrs.",
"first_name_s": "Sheila",
"last_name_s": "Smith"
}
}
And what I want to return:
{
"id": "COUNTY:1",
"county_name_s": "Hertfordshire",
"coordinates": {
"id": "COUNTY:1COORDINATES:!",
"lat_s": "54.238948",
"long_s": "54.238948"
},
"cities": [
{
"id": "COUNTY:1CITY:1",
"city_name_s": "St Albans",
"size": {
"id": "COUNTY:1CITY:1SIZE:1",
"sq_ft_s": "100",
"sq_meters_s": "5879"
}
}
],
"mayor": {
"title_s": "Mrs.",
"first_name_s": "Sheila",
"last_name_s": "Smith"
}
}
Basically, my goal is to return more or less the entire thing, but with one of the cities filtered out. For example, the condition for the city would be something like city_name_s:"St Albans". In other words, I want the parent and all of its children, but if a child is in that array (i.e. the cities array), then the given field (city_name_s) must equal my defined value, or we don't want that child.
Things I've tried:
I've basically tried two approaches here:
I've tried to play around with {!child} and {!parent} to get a result that I want. Currently I can only get something from City level or the entire thing as if the filter was not there at county level.
I've tried to change values for the childFilter option, with things like:
city_name_s:"St Albans" OR (*:* NOT city_name_s:[* TO *]) to try and say 'if field exists it should be this'.
Anyhow I'm starting to run out of ideas with this; been hacking away at it for the past couple of days and not really got any closer.
Thanks in advance for any help; bashing my head against the wall currently so any suggestions are more than welcome :)

I had a similar issue in Solr 9.0.0 and this solved it for me: Apache Solr Filter on Child Documents
In your case, just add fl=*,[child childFilter=city_name_s:"St Albans"]
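As a rough illustration of how that fl value could be sent, here is a minimal Python sketch that only builds the request URL. The host, the collection name (counties) and the q value are assumptions made up for the example; only the fl expression comes from the answer above. Depending on the Solr version, the childFilter value may need extra quoting or a nest-path prefix, so treat this as a starting point rather than a verified query.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_select_url(base_url: str, query: str, child_filter: str) -> str:
    """Build a /select URL asking for all fields plus filtered children."""
    params = {
        "q": query,
        # '*' keeps the parent's own fields; [child ...] attaches children
        # that match child_filter.
        "fl": f"*,[child childFilter={child_filter}]",
    }
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical host, collection and parent query, for illustration only.
url = build_select_url(
    "http://localhost:8983/solr/counties",
    'id:"COUNTY:1"',
    'city_name_s:"St Albans"',
)
decoded = parse_qs(urlsplit(url).query)
print(url)
print(decoded["fl"][0])
```

urlencode takes care of escaping the quotes, brackets and the space in "St Albans", which is easy to get wrong when pasting the expression into a URL by hand.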

Related

Managed or manual written schema.xml for the nested json object in Solr 7.5

Here is a nested JSON object that I have to save in Solr, then retrieve as-is, with the ability to query on attributes at any level:
{
"number": "19940852773",
"details": [{
"number": "19940852773",
"pId": 70972062,
"bReviewReduction": 66.7000
},
{
"number": "19940852773",
"pId": 70972063,
"bReviewReduction": 0.0000
}
],
"line_details": [{
"number": "19940852773",
"paymentId": 70972062,
"paymentDetailId": 14972918
},
{
"number": "19940852773",
"paymentId": 70972905,
"paymentDetailId": 14973428
},
{
"number": "19940852773",
"paymentId": 70972905,
"paymentDetailId": 14973429,
"numOfServiceLines": 21
}
]
}
Solr doesn't support indexing the content as-is, but you can use the custom JSON indexing features to transform it into a form Solr can index. Submitting the document above will give you a document (or set of documents) containing a number of childDocument entries, and you'll have to handle this through the child/parent transformations when retrieving and querying.
Usually it's a better strategy to try to flatten the document structure into something suitable for answering the queries you want Solr to answer for you, instead of trying to keep the raw data structure and special case all queries to handle that.
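One possible way to flatten such a document before indexing is sketched below in Python. The function name, the underscore separator and the resulting field names (for example details_pId) are assumptions for illustration, not something Solr prescribes. Values that live inside arrays are collected into multi-valued fields.

```python
def flatten(doc: dict, prefix: str = "", sep: str = "_") -> dict:
    """Flatten nested objects and arrays of objects into one flat dict.

    Fields found inside arrays become lists (multi-valued fields).
    """
    flat = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, f"{name}{sep}", sep))
        elif isinstance(value, list):
            for item in value:
                if isinstance(item, dict):
                    for child_name, child_value in flatten(item, f"{name}{sep}", sep).items():
                        bucket = flat.setdefault(child_name, [])
                        if isinstance(child_value, list):
                            bucket.extend(child_value)
                        else:
                            bucket.append(child_value)
                else:
                    flat.setdefault(name, []).append(item)
        else:
            flat[name] = value
    return flat


# Trimmed-down version of the document from the question.
source = {
    "number": "19940852773",
    "details": [
        {"pId": 70972062, "bReviewReduction": 66.7},
        {"pId": 70972063, "bReviewReduction": 0.0},
    ],
}
flat_doc = flatten(source)
print(flat_doc)
```

The trade-off is that the link between sibling values of the same child is lost: after flattening you can no longer tell which bReviewReduction belonged to which pId, so this only suits queries that do not depend on that pairing.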

What is the meaning of list in this each call

Below are my environment file and recipe. Can you explain what list is here? I don't understand it.
{
"json_class": "Chef::Environment",
"description": "prod environment",
"default_attributes": {
},
"chef_type": "environment",
"override_attributes": {
"user": {
"mapr": {
"id": "application",
"group": "application",
},
"local" : {
"id": "chef",
"group": "chef"
},
"ldap" : {
"id": "ldap",
"sudo": true,
},
}
"name": "prod"
}
Below is the recipe. I don't understand what list is here:
node['user_create'].each do |list, user|
group user['group'] do
group_name user['group']
gid user['gid']
action [:create]
ignore_failure true
end
user user do
username user['id']
uid user['uid']
group user['gid']
home user['home']
manage_home true
end
if list !='ldap'
How is list passed into the if condition here?
You are not actually passing in any attributes via the environment, which you can see because the values of default_attributes and override_attributes are both just empty hashes { }. The data you've included there is just ignored by Chef as noise. In the future I recommend you use the Ruby DSL for environment files as it has more error checking for things like this (though not perfect error checking).
As an aside, you've been asking a lot of questions on here and seem to be struggling with Chef. Please consider joining the Chef community Slack team and asking there instead, as it's a full chat system and the community could offer real-time help rather than random blurbs here.

How can you retrieve a full nested document in Solr?

In my instance of Solr 4.10.3 I would like to index JSONs with a nested structure.
Example:
{
"id": "myDoc",
"title": "myTitle",
"nestedDoc": {
"name": "test name",
"nestedAttribute": {
"attr1": "attr1Val"
}
}
}
I am able to store it correctly through the admin interface:
/solr/#/mySchema/documents
and I'm also able to search and retrieve the document.
The problem I'm facing is that when I get the response document from my Solr search, I cannot see the nested attributes. I only see:
{
"id": "myDoc",
"title": "myTitle"
}
Is there a way to include ALL the nested fields in the returned documents?
I tried with : "fl=[child parentFilter=title:myTitle]" but it's not working (ChildDocTransformerFactory from:https://cwiki.apache.org/confluence/display/solr/Transforming+Result+Documents). Is that the right way to do it or is there any other way?
I'm using: Solr 4.10.3!!!!!!
To get the whole nested structure returned, you do indeed need to use ChildDocTransformerFactory. However, you first need to index your documents properly.
If you just pass your structure as it is, Solr will index the parts as separate documents and won't know that they're actually connected. If you want to be able to query nested documents correctly, you'll have to pre-process your data structure as described in this post, or try using (modifying as needed) a pre-processing script. Unfortunately, up to and including the latest Solr 6.0, there's no nice and smooth solution for indexing and returning nested document structures, so everything is done through "workarounds".
In your case specifically, you'll need to transform your document structure into this:
{
"type": "parentDoc",
"id": "myDoc",
"title": "myTitle",
"_childDocuments_": [
{
"type": "nestedDoc",
"name": "test name",
"_childDocuments_" :[
{
"type": "nestedAttribute",
"attr1": "attr1Val"
}]
}]
}
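A pre-processing step along those lines could look roughly like the Python sketch below. The function name and the convention of using each nested key as the child's type value are assumptions taken from the example above, not a Solr API. Depending on your schema, each child may also need its own unique id, which this sketch does not add.

```python
import json


def to_child_documents(doc: dict, doc_type: str) -> dict:
    """Turn nested objects into Solr-style _childDocuments_ lists.

    Every nested dict becomes a child document whose "type" is the key
    it was stored under; scalar fields stay on the current document.
    """
    converted = {"type": doc_type}
    children = []
    for key, value in doc.items():
        if isinstance(value, dict):
            children.append(to_child_documents(value, key))
        else:
            converted[key] = value
    if children:
        converted["_childDocuments_"] = children
    return converted


original = {
    "id": "myDoc",
    "title": "myTitle",
    "nestedDoc": {
        "name": "test name",
        "nestedAttribute": {"attr1": "attr1Val"},
    },
}
prepared = to_child_documents(original, "parentDoc")
print(json.dumps(prepared, indent=2))
```

Applied to the document from the question, this yields the same structure as the hand-written JSON above.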
Then, the following ChildDocTransformerFactory query will return all subdocuments (by the way, although the documentation says it's available since Solr 4.9, I've actually only seen it in Solr 5.3, so you need to test):
q=title:myTitle&fl=*,[child parentFilter=type:parentDoc limit=50]
Note that although it returns all nested documents, the returned document structure will be flattened (alas!), i.e., you'll get:
{
"type": "parentDoc",
"id": "myDoc",
"title": "myTitle",
"_childDocuments_": [
{
"type": "nestedDoc",
"name": "test name"
},
{
"type": "nestedAttribute",
"attr1": "attr1Val"
}]
}
Probably not really what you expected, but this is unfortunately how Solr behaves for now; it should be fixed in a near-future release.
You can put
q={!parent which=}
in the query, and in the fl parameter: fl=*,[child parentFilter=title:myTitle]
It will give you all the parent fields and child fields for title:myTitle.

fuzzy search in elasticsearch different than fuzziness match boolean

I'm trying to figure out why the following queries produce vastly different results. I've been told that a fuzzy query is almost never a good idea (per this document: Found-fuzzy), so I'm trying to use a match query with a fuzziness parameter instead. The two produce extremely different results, and I'm not sure what the best way of doing this is.
My example is a movie title containing 'batman'. The user, however, types 'bat man' (with a space). It would make sense for a fuzzy query to find batman. It would also find other variations like spider man, but for now that's OK I guess (not really, but...).
So the fuzzy search is actually returning more relevant results than the match one below. Any ideas?
--fuzzy:
{
"query":{
"bool":{
"should": [
{
"fuzzy": {
"title": {
"value": "bat man",
"boost": 4
}
}
}
], "minimum_number_should_match": 1
}
}
}
--match:
{
"query":{
"bool":{
"should": [
{
"match": {
"title": {
"query": "bat man",
"boost": 4
}
}
}
], "minimum_number_should_match": 1
}
}
}
EDIT
I'm adding examples of what gets returned.
First, nothing gets returned using the match query, even with a high fuzziness value added (fuzziness: 5).
But I do get several 'batman'-related titles using the fuzzy query, such as 'batman' or 'batman returns'.
This gets even stranger when I run the fuzzy search for 'bat man' against more than one field: if I search my 'starring' field (which contains lists of actors) in addition to the title field, I get 'jason bateman' as well as the title 'batman'.
{
"_index": "store24",
"_type": "searchdata",
"_id": "081227987909",
"_score": 4.600759,
"fields": {
"title": [
"Batman"
]
}
},
{
"_index": "store24",
"_type": "searchdata",
"_id": "883929053353",
"_score": 4.1418676,
"fields": {
"title": [
"Batman Forever"
]
}
},
{
"_index": "store24",
"_type": "searchdata",
"_id": "883929331789",
"_score": 3.5298011,
"fields": {
"title": [
"Batman Returns"
]
}
}
BEST SO FAR (STILL NOT GREAT)
What I've found works best so far is to combine both queries. This seems redundant, but I can't yet make one work like the other. So, this seems to be better:
"should": [
{
"fuzzy": {
"title": {
"boost": 6.0,
"min_similarity": 1.0,
"value": "batman"
}
}
},
{
"match": {
"title": {
"query": "batman",
"boost": 6.0
,"fuzziness": 1
}
}
}
]
Elasticsearch analyzes documents and converts them into terms, which are what is actually searched (not the documents themselves). The key difference between the two query types is that the match query analyzes the query text before searching, while the fuzzy query is a term-level query and does not. So consider the example below:
With the match query, 'bat man' is first tokenized into 'bat' and 'man', and each token is looked up separately. Neither token is within the small edit distance that fuzziness allows of 'batman', so adding fuzziness does not help. With the fuzzy query, the whole string 'bat man' is compared against the indexed terms as-is, and it is only one edit away from 'batman'. This is also why Jason Bateman showed up: 'bateman' is one edit away from 'bat man' as well.
More detailed information on the analysis of text fields when searching can be found at http://exploringelasticsearch.com/searching_data.html#sec-searching-analysis
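To make the 'bat man' example concrete, the edit distances involved can be checked with a small Python sketch. It uses plain Levenshtein distance as a stand-in; Lucene's fuzzy matching also counts a transposition as one edit, which makes no difference for these strings. It assumes the fuzzy query compares the unanalyzed string 'bat man' with indexed terms, while the match query compares the separate tokens 'bat' and 'man'.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,      # delete char_a
                    current[j - 1] + 1,   # insert char_b
                    previous[j - 1] + substitution_cost,
                )
            )
        previous = current
    return previous[-1]


# Whole query string vs. indexed terms (what an unanalyzed fuzzy query sees).
print(levenshtein("bat man", "batman"))   # drop the space
print(levenshtein("bat man", "bateman"))  # replace the space with 'e'

# Individual tokens vs. the indexed term (what an analyzed match query sees).
print(levenshtein("bat", "batman"))
print(levenshtein("man", "batman"))
```

The whole string is a single edit away from both 'batman' and 'bateman', while each separate token needs three edits to reach 'batman', which is more than fuzzy matching normally allows.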
When a search is performed on an analyzed field, the query itself is
analyzed, matching it up to the documents which are analyzed when
added to the database. Reducing words to these short tokens normalizes
the text allowing for fast efficient lookups. Whether you’re searching
for "rollerblading" in any form, internally we’re just looking for
"rollerblad".

searching an array deep inside a mongo document

In my Mongo collection called pixels, I have documents like the sample below.
I'm looking for a way to search in the actions.tags part of the documents.
db.pixelsactifs.actions.find({tags:{$in : ["Environnement"]}})
db.pixelsactifs.find({actions.tags:{$in : {Environnement}})
Neither of these works. I'm also looking for the PHP equivalent.
I'm also asking myself whether I should make an "actions" collection instead of putting everything inside one document.
I'm new to Mongo, so any good tutorial on structuring the db would be great.
Thanks for the insight.
{
"_id": { "$oid": "51b98009e4b075a9690bbc71" },
"name": "open Atlas",
"manager": "Tib Kat",
"type": "Association",
"logo": "",
"description": "OPEN ATLAS",
"actions": [
{
"name": "Pixel Humain",
"tags": [ "Toutes thémathiques" ],
"description": "le PH agit localement",
"images": [],
"origine": "oui",
"website": "www.echolocal.org"
}
],
"email": "my#gmail.com",
"adress": "102 rue",
"cp": "97421",
"city": "Saint louis",
"country": "Réunion",
"phone": "06932"
}
You can try it like this ($in expects an array of values):
$collectionName->find(array("actions.tags" => array('$in' => array("Environnement"))));
I do not think you need to keep the actions in a separate collection. NoSQL gives you the flexibility to embed the document, and it even allows subdocuments to be indexed. The real power of NoSQL comes from merging documents into each other to get faster retrieval. The only shortcoming I can see here is that you cannot get just a part of a subdocument: by default, find returns the complete parent document. If you want to show a single entry of the subdocument array, you will get the whole array back and have to filter it on the client side. So if you are planning to show each action individually to the end user, it is better to keep the actions in a separate collection.
Read here: http://docs.mongodb.org/manual/use-cases/
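The client-side filtering mentioned above could look something like this minimal Python sketch. It works on a plain dict shaped like the sample document, as a driver would return it; the function name is made up for the example, and the second action is invented test data that is not in the question.

```python
def actions_with_tag(document: dict, tag: str) -> list:
    """Return only the embedded actions whose tags list contains tag."""
    return [
        action
        for action in document.get("actions", [])
        if tag in action.get("tags", [])
    ]


# Shaped like the sample document; the second action is invented test data.
pixel = {
    "name": "open Atlas",
    "actions": [
        {"name": "Pixel Humain", "tags": ["Toutes thémathiques"]},
        {"name": "Hypothetical action", "tags": ["Environnement"]},
    ],
}
matching = actions_with_tag(pixel, "Environnement")
print([action["name"] for action in matching])
```

The query on actions.tags still decides which parent documents come back; this step only trims the embedded array once the whole document has been fetched.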
