Does anyone know what is complex collections in azure Search service? - azure-cognitive-search

From Microsoft doc, it says these:
Maximum complex collection fields: 40 Maximum elements across all
complex collections per document:3000
But what is complex collection? If I have a collection of string, is it a complex collection?

In terms of JSON, a complex collection is an array of objects. In terms of an index definition in Azure Cognitive Search, it’s a field of type Collection(Edm.ComplexType). The subfields of such a field correspond to the JSON properties of objects in an array.
For example, if you define a field Addresses with string subfields Street and City, then in the context of a document it might look like this:
{
...
"Addresses": [
{ "Street": "123 Main Street", "City": "Bellevue" },
{ "Street": "221B Baker Street", "City": "London" }
]
}

Related

Is there a way to auto generate ObjectIds inside arrays of MongoDB?

Below is my MongoDb collection structure:
{
"_id": {
"$oid": "61efa44933eabb748152a250"
},
"title": "my first blog",
"body": "Hello everyone,wazuzzzzzzzzzzzzzzzzzup",
"comments": [{
"comment": "fabulous work bruhv",
}]
}
}
Is there a way to auto generate ids for comments without using something like this:
db.messages.insert({messages:[{_id:ObjectId(), message:"Message 1."}]});
I found the above method from the SO question:
mongoDB : Creating An ObjectId For Each New Child Added To The Array Field
But someone in the comments pointed out that:
"I have been looking at how to generate a JSON insert using ObjectID() and in my travels have found that this solution is not very good for indexing. The proposed solution has _id values that are strings rather than real object IDs - i.e. "56a970405ba22d16a8d9c30e" is different from ObjectId("56a970405ba22d16a8d9c30e") and indexing will be slower using the string. The ObjectId is actually represented internally as 16 bytes."
So is there a better way to do this?

Fix data post API call to fix inconsistencies, like typos?

Ok, so this is going to be a complicated question, I hope I'm clear. Full admission, I just finished a Bootcamp yesterday so I'm not aware of a lot of technologies out there, and I think I may need additional technologies to accomplish what I'm looking for...
Right now, I have an application that uses bandsintown API call to populate a database. What I've noticed is that bandsintown isn't consistent with their data returns in each object, which makes operations after retrieving the objects difficult/seemingly impossible. An example would be that different artists performing at the same venue returns different latitude, longitude, venue name, etc. Examples:
Here is Primus playing at Bonnaroo:
{
"offers": [],
"venue": {
"country": "United States",
"city": "Manchester",
"latitude": "35.4839582",
"name": "Bonnaroo Music and Arts Festival 2020",
"location": "",
"region": "TN",
"longitude": "-86.08963169999998"
},
"datetime": "2020-09-25T12:00:00",
"on_sale_datetime": "",
"description": "",
"lineup": [
"Primus"
],
"bandsintown_plus": false,
"id": "1020701795",
"title": "",
"artist_id": "1263",
"url": "https://www.bandsintown.com/e/1020701795?app_id=451f31b2808001d069daed45c32a9dac&came_from=267&utm_medium=api&utm_source=public_api&utm_campaign=event"
}
compared to The Weeknd playing at Bonnaroo:
{
"id": "18604416",
"url": "https://www.bandsintown.com/e/18604416?app_id=451f31b2808001d069daed45c32a9dac&came_from=267&utm_medium=api&utm_source=public_api&utm_campaign=event",
"datetime": "2017-05-17T19:00:00",
"title": "",
"description": "",
"venue": {
"location": "",
"name": "Bonnaroo",
"latitude": "35.476247",
"longitude": "-86.081026",
"city": "Manchester",
"country": "United States",
"region": "TN"
},
"lineup": [
"The Weeknd"
],
"offers": [],
"artist_id": "1371750",
"on_sale_datetime": "",
"bandsintown_plus": false
}
My issue is now I wish to aggregate and $group in MongoDB because both events were at Bonnaroo, but the Object{venue.name} is not the same... Even the latitude & longitude is different so I can't use those either. I'm wondering if there is a way to alter the data of the objects automatically without having to go into the DB and edit individual objects. Both these events include the word Bonnaroo, so could I have something find and match text and then slice out the text that isn't similar? If so, can I then use the matched venue name field as a reference to change the latitude & longitude values too?
I hope I was clear, feel free to ask any clarifying questions if I wasn't. This site has helped me so many times and I appreciate all the hard work the community puts in to help each other! Thanks ahead of time!
~~~EDIT~~~
Thanks for the first reply #morad takhtameshloo.
So I was able to build something before I saw your reply that splits the data into an array, which is along the same lines as what you offered. The only thing that won't work is the $arrayToElem with the index cause there are some venues that:
Have multiple-word names (e.g. The Stone Pony)
Have words before the actual venue name (saw it in one result that was like
"Verizon Live Presents at The Stony Pony")
Using this Bonnaroo example, I have the new field returning every word as a value in the array:
"venueName": ["Bonnaroo", "Music", "and","Arts","Festival","2020"]
My next step is going to be to compare the [venueName] of the 'Primus' object and the 'The Weeknd' object, find what values in the array are the same, and return them back to the value of "venueName".
Hope this makes more sense, I appreciate your input!
actual the trick depends to your data, you should provide more data if the ones you've provided does not depict the whole problem
in other words how deep you want to dive in.
for the dumbest answer, at least for the data you've provided
db.prod4.aggregate([
{
$addFields: {
venueName: {
$arrayElemAt: [{ $split: ['$venue.name', ' '] }, 0],
},
},
},
])
but that not the case of course, something that comes to mind is that venue's geolocations for the same venue should not be far away from each other, for instance, the data you've provided two locations are in 1.16 KM of each other.
so another dummy solution that works would be writing a simple script that selects a random element from the array of all data, and finds data that their lat/lng is for example in 2km of that point, and removes those elements from array and selects another random element from the array and do the same
if you provide more data it would be much more easier, because the easiest solution is to find many patterns and plan only for them

Query nested elements from unknown number of arrays

Querying to get an element from an array that is nested in an object is not a big deal. But, when you have a self-repeating structure that is nested within itself, it seems to become much more complicated. Here is an example document:
{
"_id": "1",
"name": "Alpha",
"options": [
{
"_id": "2",
"name": "Bravo",
"options": [
]
},
{
"_id": "3",
"name": "Charlie",
"options": [
{
"_id": "4",
"name": "Delta",
"options": [
...etc
]
}
]
}
]
}
As you can see, here we have a number of arrays holding objects that are alike. This is useful for, say, a tabbed interface being configured out of a DB.
What I'm looking to do is actually be able to modify the documents that are further down in the structure without knowing their positions in the ancestor arrays. So far I've only found the "$" update operator helpful when you don't have to deal with more than one level of arrays. But as you can see, there are arrays within objects that are inside of arrays here, so it becomes complicated.
Pushing to an array in the first level is not difficult:
db.myCollection.update({_id: "1", "options._id": "2"}, {$push: {"options.$.options": {...object definition here...}}});
This works. However, when you try to go deeper into the structure, I don't understand how to target the deeper portions of the document.
The following query is not working for me, and I'm hoping somebody could help me out:
db.myCollection.update({_id: "1", "options.options._id": "4"}, {"options.options.$.options": {...object definition here...}});
It whines about how I can't use dot syntax. Which makes sense, because in this case "options.options" is ambiguous. I was hoping that the query selection would have pointed to the right place.
tl;dr: How do you update document portions that are multiple levels deep nested within more than one array where you do not know the index of the target element? You can select it, so you should be able to modify it.

JSON Schema: verifying object's values, without keys

Not to confuse anybody, I'll start with validating arrays...
Regarding arrays, JSON Schema can check whether elements of an (((...)sub)sub)array conform to a structure:
"type": "array",
"items": {
...
}
When validating objects, I know I can pass certain keys with their corresponding value types, such as:
"type": "object",
"properties": {
// key-value pairs, might also define subschemas
}
But what if I've got an object which I want to use to validate values only (without keys)?
My real-case example is that I'm configuring buttons: there might be edit, delete, add buttons and so on. They all have specific, rigid structure, which I do have JSON schema for. But I don't want to limit myself to ['edit', 'delete', 'add'] only, there might be publish or print in the future. But I know they all will conform to my subschema.
Each button is:
BUTTON = {
"routing": "...",
"params": { ... },
"className": "...",
"i18nLabel": "..."
}
And I've got an object (not an array) of buttons:
{
"edit": BUTTON,
"delete": BUTTON,
...
}
How can I write such JSON schema? Is there any way of combining object with items (I know there are object-properties and array-items relations).
You can use additionalProperties for this. If you set additionalProperties to a schema instead of a boolean, then any properties that aren't explicitly declared using the properties or patternProperties keywords must match the given schema.
{
"type": "object",
"additionalProperties": {
... BUTTON SCHEMA ...
}
}
http://json-schema.org/latest/json-schema-validation.html#anchor64

searching an array deep inside a mongo document

in my mongo collection called pixels, I have documents like the sample
I'm looking for a way to search in the actions.tags part of the documents?
db.pixelsactifs.actions.find({tags:{$in : ["Environnement"]}})
db.pixelsactifs.find({actions.tags:{$in : {Environnement}})
doesn't work. I'm also looking for the PHP equivalent ?
I'm also asking myself should I make an "actions" collection instead of putting everything inside one document
I'm new to mongo so any good tutorial on structuring the db would be great
Thanks for the insight
{
"_id": { $oid": "51b98009e4b075a9690bbc71" },
"name": "open Atlas",
"manager": "Tib Kat",
"type": "Association",
"logo": "",
"description": "OPEN ATLAS",
"actions": [
{
"name": "Pixel Humain",
"tags": [ "Toutes thémathiques" ],
"description": "le PH agit localement",
"images": [],
"origine": "oui",
"website": "www.echolocal.org"
}
],
"email": "my#gmail.com",
"adress": "102 rue",
"cp": "97421",
"city": "Saint louis",
"country": "Réunion",
"phone": "06932"
}
you can try like this
collectionName->find(array("actions.tags" => array('$in' => "Environnement")));
I do not think you need to maintain the actions in separate collection. NoSQL gives you more flexibility to do embed th document . Event it allows sub document also be indexed . True power of NoSQL comes with merging the document into each other to get the faster retrieval. The only short coming I can see here , you can not get the part of sub document . find will always return the complete Parent document. In case you want to show one entry of subdocument array , it is not possible . It will return the whole subdocument and you have to filter in on the client side. So if you are planning to show action as individual to end user , it is better to have in separate collection
Read here : http://docs.mongodb.org/manual/use-cases/

Resources