searching an array deep inside a mongo document - arrays

in my mongo collection called pixels, I have documents like the sample
I'm looking for a way to search in the actions.tags part of the documents?
db.pixelsactifs.actions.find({tags:{$in : ["Environnement"]}})
db.pixelsactifs.find({actions.tags:{$in : {Environnement}})
doesn't work. I'm also looking for the PHP equivalent ?
I'm also asking myself should I make an "actions" collection instead of putting everything inside one document
I'm new to mongo so any good tutorial on structuring the db would be great
Thanks for the insight
{
"_id": { $oid": "51b98009e4b075a9690bbc71" },
"name": "open Atlas",
"manager": "Tib Kat",
"type": "Association",
"logo": "",
"description": "OPEN ATLAS",
"actions": [
{
"name": "Pixel Humain",
"tags": [ "Toutes thémathiques" ],
"description": "le PH agit localement",
"images": [],
"origine": "oui",
"website": "www.echolocal.org"
}
],
"email": "my#gmail.com",
"adress": "102 rue",
"cp": "97421",
"city": "Saint louis",
"country": "Réunion",
"phone": "06932"
}

you can try like this
collectionName->find(array("actions.tags" => array('$in' => "Environnement")));
I do not think you need to maintain the actions in separate collection. NoSQL gives you more flexibility to do embed th document . Event it allows sub document also be indexed . True power of NoSQL comes with merging the document into each other to get the faster retrieval. The only short coming I can see here , you can not get the part of sub document . find will always return the complete Parent document. In case you want to show one entry of subdocument array , it is not possible . It will return the whole subdocument and you have to filter in on the client side. So if you are planning to show action as individual to end user , it is better to have in separate collection
Read here : http://docs.mongodb.org/manual/use-cases/

Related

Azure Cognitive Search query specific data source

I have a Azure Cognitive Search set up with two DataSources, two Indexers indexing those DataSources and one Index.
I'd like to be able to able to query/filter by DataSource. Is that possible?
This is certainly possible. You'd just need to find a way to get a field in the index that contains the name of the data source. The best way to do this depends on the data source you're using--for example, if you're using a SQL data source, you might just be able to edit the query to return the value whereas that wouldn't work for blob storage.
Another option that would work for all data sources would be to include a skillset with a conditional skill which you can use to set a default value for a field.
Off of #Derek Legenzoff serve this is how I did it...
Add all your datasources
Create skillsets for each one data source with the following skill
{
"#odata.type": "#Microsoft.Skills.Util.ConditionalSkill",
"name": "#1",
"description": "",
"context": "/document",
"inputs": [
{
"name": "condition",
"source": "= true"
},
{
"name": "whenTrue",
"source": "= 'Production'" //name of your environment
},
{
"name": "whenFalse",
"source": "= ''"
}
],
"outputs": [
{
"name": "output",
"targetName": "Env"
}
]
}
Create single index for your data model and add a Env field, and it's filterable and queriable
Create indexers for each one of your data sources that point to the index from step 3 and datasource from step 1
IMPORTANT: make sure you indexers have the following code in the defnition.
"outputFieldMappings": [
{
"sourceFieldName": "/document/Env",
"targetFieldName": "Env"
}
],
This will connect the product of the skill to the index

Solr query for child documents and return parents and filtered children

I'm having trouble creating a Solr query to be able to pull out the right documents, and am starting to wonder if what I am trying to do is even possible.
Currently on Solr 8.9 using a managed schema and every field is using a wildcard field.
Firstly what the document looks like
(changed names due to redacting internal business language):
{
"id": "COUNTY:1",
"county_name_s": "Hertfordshire",
"coordinates_s": {
"id": "COUNTY:1COORDINATES:!",
"lat_s": "54.238948",
"long_s": "54.238948"
},
"cities": [
{
"id": "COUNTY:1CITY:1",
"city_name_s": "St Albans",
"size": {
"id": "COUNTY:1CITY:1SIZE:1",
"sq_ft_s": "100",
"sq_meters_s": "5879"
}
},
{
"id": "COUNTY:1CITY:2",
"city_name_s": "Watford",
"size": {
"id": "COUNTY:1CITY:2SIZE:2",
"sq_ft_s": "150",
"sq_meters_s": "10000"
}
}
],
"mayor": {
"title_s": "Mrs.",
"first_name_s": "Sheila",
"last_name_s": "Smith"
}
}
And what I want to return:
{
"id": "COUNTY:1",
"county_name_s": "Hertfordshire",
"coordinates": {
"id": "COUNTY:1COORDINATES:!",
"lat_s": "54.238948",
"long_s": "54.238948"
},
"cities": [
{
"id": "COUNTY:1CITY:1",
"city_name_s": "St Albans",
"size": {
"id": "COUNTY:1CITY:1SIZE:1",
"sq_ft_s": "100",
"sq_meters_s": "5879"
}
}
],
"mayor": {
"title_s": "Mrs.",
"first_name_s": "Sheila",
"last_name_s": "Smith"
}
}
Basically my goal is to return more or less the entire thing, however with filtering out one of the cities. For example, the condition for the city would be like city_name_s:"St Albans". So it's to say that I want the parent and all children, however if the child is in that array (ie cities array), then the given field (city_name_s) must equal my defined value, or we don't want that child.
Things I've tried:
I've basically tried two approaches here:
I've tried to play around with {!child} and {!parent} to get a result that I want. Currently I can only get something from City level or the entire thing as if the filter was not there at county level.
I've tried to change values for the childFilter option, with things like:
city_name_s:"St Albans" OR (*:* NOT city_name_s:[* TO *]) to try and say 'if field exists it should be this'.
Anyhow I'm starting to run out of ideas with this; been hacking away at it for the past couple of days and not really got any closer.
Thanks in advance for any help; bashing my head against the wall currently so any suggestions are more than welcome :)
I had a similar issue in solr 9.0.0 and this solved it for me: Apache Solr Filter on Child Documents
In your case, just add fl=*,[child childFilter=city_name_s:"St Albans"]

Fix data post API call to fix inconsistencies, like typos?

Ok, so this is going to be a complicated question, I hope I'm clear. Full admission, I just finished a Bootcamp yesterday so I'm not aware of a lot of technologies out there, and I think I may need additional technologies to accomplish what I'm looking for...
Right now, I have an application that uses bandsintown API call to populate a database. What I've noticed is that bandsintown isn't consistent with their data returns in each object, which makes operations after retrieving the objects difficult/seemingly impossible. An example would be that different artists performing at the same venue returns different latitude, longitude, venue name, etc. Examples:
Here is Primus playing at Bonnaroo:
{
"offers": [],
"venue": {
"country": "United States",
"city": "Manchester",
"latitude": "35.4839582",
"name": "Bonnaroo Music and Arts Festival 2020",
"location": "",
"region": "TN",
"longitude": "-86.08963169999998"
},
"datetime": "2020-09-25T12:00:00",
"on_sale_datetime": "",
"description": "",
"lineup": [
"Primus"
],
"bandsintown_plus": false,
"id": "1020701795",
"title": "",
"artist_id": "1263",
"url": "https://www.bandsintown.com/e/1020701795?app_id=451f31b2808001d069daed45c32a9dac&came_from=267&utm_medium=api&utm_source=public_api&utm_campaign=event"
}
compared to The Weeknd playing at Bonnaroo:
{
"id": "18604416",
"url": "https://www.bandsintown.com/e/18604416?app_id=451f31b2808001d069daed45c32a9dac&came_from=267&utm_medium=api&utm_source=public_api&utm_campaign=event",
"datetime": "2017-05-17T19:00:00",
"title": "",
"description": "",
"venue": {
"location": "",
"name": "Bonnaroo",
"latitude": "35.476247",
"longitude": "-86.081026",
"city": "Manchester",
"country": "United States",
"region": "TN"
},
"lineup": [
"The Weeknd"
],
"offers": [],
"artist_id": "1371750",
"on_sale_datetime": "",
"bandsintown_plus": false
}
My issue is now I wish to aggregate and $group in MongoDB because both events were at Bonnaroo, but the Object{venue.name} is not the same... Even the latitude & longitude is different so I can't use those either. I'm wondering if there is a way to alter the data of the objects automatically without having to go into the DB and edit individual objects. Both these events include the word Bonnaroo, so could I have something find and match text and then slice out the text that isn't similar? If so, can I then use the matched venue name field as a reference to change the latitude & longitude values too?
I hope I was clear, feel free to ask any clarifying questions if I wasn't. This site has helped me so many times and I appreciate all the hard work the community puts in to help each other! Thanks ahead of time!
~~~EDIT~~~
Thanks for the first reply #morad takhtameshloo.
So I was able to build something before I saw your reply that splits the data into an array, which is along the same lines as what you offered. The only thing that won't work is the $arrayToElem with the index cause there are some venues that:
Have multiple-word names (e.g. The Stone Pony)
Have words before the actual venue name (saw it in one result that was like
"Verizon Live Presents at The Stony Pony")
Using this Bonnaroo example, I have the new field returning every word as a value in the array:
"venueName": ["Bonnaroo", "Music", "and","Arts","Festival","2020"]
My next step is going to be to compare the [venueName] of the 'Primus' object and the 'The Weeknd' object, find what values in the array are the same, and return them back to the value of "venueName".
Hope this makes more sense, I appreciate your input!
actual the trick depends to your data, you should provide more data if the ones you've provided does not depict the whole problem
in other words how deep you want to dive in.
for the dumbest answer, at least for the data you've provided
db.prod4.aggregate([
{
$addFields: {
venueName: {
$arrayElemAt: [{ $split: ['$venue.name', ' '] }, 0],
},
},
},
])
but that not the case of course, something that comes to mind is that venue's geolocations for the same venue should not be far away from each other, for instance, the data you've provided two locations are in 1.16 KM of each other.
so another dummy solution that works would be writing a simple script that selects a random element from the array of all data, and finds data that their lat/lng is for example in 2km of that point, and removes those elements from array and selects another random element from the array and do the same
if you provide more data it would be much more easier, because the easiest solution is to find many patterns and plan only for them

How to use Split Skill in azure cognitive search?

I am new to Azure cognitive search. I have a docx file which is stored in azure blob storage.I am using #Microsoft.Skills.Text.SplitSkill to split the document into multiple pages(chunks).But when I index the output of this skill,I am getting the entire docx file content.how do I return the "pages" from the SplitSkill so that the user sees the portion of the original document that was found by their search instead of returning entire document?
Please assist me.Thank you in advance.
The split skill allows you to split text into smaller chunks/pages that can be then processed by additional cognitive skills.
Here is what a minimalistic skillset that does splitting and translation may look like:
"skillset": [
{
"#odata.type": "#Microsoft.Skills.Text.SplitSkill",
"textSplitMode": "pages",
"maximumPageLength": 1000,
"defaultLanguageCode": "en",
"inputs": [
{
"name": "text",
"source": "/document/content"
},
{
"name": "languageCode",
"source": "/document/language"
}
],
"outputs": [
{
"name": "textItems",
"targetName": "mypages"
}
]
},
{
"#odata.type": "#Microsoft.Skills.Text.TranslationSkill",
"name": "#2",
"description": null,
"context": "/document/mypages/*",
"defaultFromLanguageCode": null,
"defaultToLanguageCode": "es",
"suggestedFrom": "en",
"inputs": [
{
"name": "text",
"source": "/document/mypages/*"
}
],
"outputs": [
{
"name": "translatedText",
"targetName": "translated_text"
}
]
}
]
Note that the split skill generated a collection of text elements under the "\document\mypages" node in the enriched tree. Also not that by providing the context "\document\mypages*" to the translation skill, we are telling the translation skill to perform translation on "each page".
I should point out that documents will still be indexed at the document level though. Skillsets are not really built to "change the cardinality of the index". That said, a workaround for that may be to project each of the pages as separate elements into a knowledge store, and then create a separate index that is actually focused on indexing each page.
Learn more about the knowledge store projections here:
https://learn.microsoft.com/en-us/azure/search/knowledge-store-concept-intro

How can you retrieve a full nested document in Solr?

In my instance of Solr 4.10.3 I would like to index JSONs with a nested structure.
Example:
{
"id": "myDoc",
"title": "myTitle"
"nestedDoc": {
"name": "test name"
"nestedAttribute": {
"attr1": "attr1Val"
}
}
}
I am able to store it correctly through the admin interface:
/solr/#/mySchema/documents
and I'm also able to search and retrieve the document.
The problem I'm facing is that when I get the response document from my Solr search, I cannot see the nested attributes. I only see:
{
"id": "myDoc",
"title": "myTitle"
}
Is there a way to include ALL the nested fields in the returned documents?
I tried with : "fl=[child parentFilter=title:myTitle]" but it's not working (ChildDocTransformerFactory from:https://cwiki.apache.org/confluence/display/solr/Transforming+Result+Documents). Is that the right way to do it or is there any other way?
I'm using: Solr 4.10.3!!!!!!
To get returned all the nested structure, you indeed need to use ChildDocTransformerFactor. However, you first need to properly index your documents.
If you just passed your structure as it is, Solr will index them as separate documents and won't know that they're actually connected. If you want to be able to correctly query nested documents, you'll have to pre-process your data structure as described in this post or try using (modifying as needed) a pre-processing script. Unfortunately, including the latest Solr 6.0, there's no nice and smooth solution on indexing and returning nested document structures, so everything is done through "workarounds".
Particularly in your case, you'll need to transform your document structure into this:
{
"type": "parentDoc",
"id": "myDoc",
"title": "myTitle"
"_childDocuments_": [
{
"type": "nestedDoc",
"name": "test name",
"_childDocuments_" :[
{
"type": "nestedAttribute"
"attr1": "attr1Val"
}]
}]
}
Then, the following ChildDocTransformerFactor query will return you all subdocuments (btw, although it says it's available since Solr 4.9, I've actually only seen it in Solr 5.3... so you need to test):
q=title:myTitle&fl=*,[child parentFilter=type:parentDoc limit=50]
Note, although it returns all nested documents, the returned document structure will be flattend (alas!), i.e., you'll get:
{
"type": "parentDoc",
"id": "myDoc",
"title": "myTitle"
"_childDocuments_": [
{
"type": "nestedDoc",
"name": "test name"
},
{
"type": "nestedAttribute"
"attr1": "attr1Val"
}]
}
Probably, not really what you've expected but... this is the unfortunate Solr's behavior that will be fixed in a nearest future release.
You can put
q={!parent which=}
and in fl field :"fl=*,[child parentFilter=title:myTitle].
It will give you all parent field and children field of title:mytitle

Resources