Index a dictionary property in azure search - azure-cognitive-search

I have a DTO with a property of type Dictionary<string, string>. It's not annotated. When I upload my DTO and call indexClient.Documents.Index(batch), I get this error back from the service:
The request is invalid. Details: parameters : A node of type 'StartObject' was read from the JSON reader when trying to read the contents of the property 'Data'; however, a 'StartArray' node was expected.
The only way I've found to avoid it is by setting it to null. This is how I created my index:
var fields = FieldBuilder.BuildForType<DTO>();
client.Indexes.Create(new Index
{
Name = indexName,
Fields = fields
});
How can I index my dictionary?

Azure Cognitive Search doesn't support fields that behave like loosely-typed property bags like dictionaries. All fields in the index must have a well-defined EDM type.
If you don't know the set of possible fields at design-time, you have a couple options, but they come with big caveats:
In your application code, add new fields to the index definition as you discover them while indexing documents. Updating the index will add latency to your overall write path, so depending on how frequently new fields are added, this may or may not be practical.
Model your "dynamic" fields as a set of name/value collection fields, one for each desired data type. For example, if a new string field "color" is discovered with value "blue", the document you upload might look like this:
{
"id": "123",
"someOtherField": 3.5,
"dynamicStringFields": [
{
"name": "color",
"value": "blue"
}
]
}
Approach #1 risks bumping into the limit on the maximum number of fields per index.
Approach #2 risks bumping into the limit on the maximum number of elements across all complex collections per document. It also complicates the query model, especially for cases where you might want correlated semantics in queries.

Related

MongoDB: deleting an array value in multiple documents in one collection using pymongo

Afternoon all. I hope you can help with a problem. I am using creating a Flask app and using pymongo and I need to delete a singular array value from multiple arrays across many documents within one collection. I have recipe documents within a 'recipes' collection, and within each of them is an array ('category') that can be between 1 and 3 values in length. I have a feature available within the admin section of my site that allows for the deletion of a category from the 'categories' collection and so, to go with that, I want to delete any instances of that category value within the 'category' array on any recipe document within the recipes 'collection'.
I have managed to do this on a single document:
mongo.db.recipes.update(
{"_id": ObjectId("60df2eec9df14e58d0c698c8")},
{"$pull": {"category": category_id}}
)
but have not managed this on the collection as a whole I have determined I need to use the $all with the $pull operator on the collection using the 'update_many' query and have so far tried:
mongo.db.recipes.update_many({"category": {"$all": {"$pull": {"category": category_id}}}})
mongo.db.recipes.update_many({"$all": {"$pull": {"category": category_id}}})
mongo.db.recipes.update_many({"category": {"$pullAll": {"category": category_id}}})
None of them work. I keep getting a the error:
TypeError: update_many() missing 1 required positional argument: 'update'
Any help would be appreciated.
Read - https://pymongo.readthedocs.io/en/stable/api/pymongo/collection.html?highlight=update_many#pymongo.collection.Collection.update_many
1st parameter is filter which will be {} if you want to update all document
mongo.db.recipes.update_many(
{} // filter filter,
{} // update
)
mongo.db.recipes.update_many({ }, {"$pull": {"category": category_id}})

Angular alphabetizes GET response

I am currently trying to make an interactive table in Angular that reflects table information from a SQL database.
The stack I am using is MSSQL, Express.js, and AngularJS. When I log the response in Node, the data is in the desired order. However, when I log the data from .success(function(data)), the fields are alphabetized and the rows are put in random order).
I am sending a JSON object (an array of rows EX. {"b":"blah","a":"aye"}). However the row is received in Angular as {"a":"aye","b":"blah"}.
Desired affect -> Use column and row ordering from SQL query in client view. Remove "magic" angular is using to order information.
In Javascript, the properties of an object do not have guaranteed order. You need to send a JSON array instead:
["blah", "aye"]
If you need the column names as well you can send down an array of objects:
[{ "col":"b", "value":"blah" }, { "col":"a", "value":"aye" }]
Or alternatively, an object of arrays:
{ "col": ["b", "a"], "value": ["blah", "aye"] }
Edit: After some more thought, you're ideal JSON structure would probably look like this:
{
"col": ["b","a"],
"row": [
["blah","aye"],
["second","row"],
["and","so on"]
]
}
Now instead of getting "blah" from accessing table[0]['b'] like you would've before, you'll need to do something like table.row[0][table.col.indexOf('b')]

MongoDB: Query and retrieve objects inside embedded array?

Let's say I have the following document schema in a collection called 'users':
{
name: 'John',
items: [ {}, {}, {}, ... ]
}
The 'items' array contains objects in the following format:
{
item_id: "1234",
name: "some item"
}
Each user can have multiple items embedded in the 'items' array.
Now, I want to be able to fetch an item by an item_id for a given user.
For example, I want to get the item with id "1234" that belong to the user with name "John".
Can I do this with mongoDB? I'd like to utilize its powerful array indexing, but I'm not sure if you can run queries on embedded arrays and return objects from the array instead of the document that contains it.
I know I can fetch users that have a certain item using {users.items.item_id: "1234"}. But I want to fetch the actual item from the array, not the user.
Alternatively, is there maybe a better way to organize this data so that I can easily get what I want? I'm still fairly new to mongodb.
Thanks for any help or advice you can provide.
The question is old, but the response has changed since the time. With MongoDB >= 2.2, you can do :
db.users.find( { name: "John"}, { items: { $elemMatch: { item_id: "1234" } } })
You will have :
{
name: "John",
items:
[
{
item_id: "1234",
name: "some item"
}
]
}
See Documentation of $elemMatch
There are a couple of things to note about this:
1) I find that the hardest thing for folks learning MongoDB is UN-learning the relational thinking that they're used to. Your data model looks to be the right one.
2) Normally, what you do with MongoDB is return the entire document into the client program, and then search for the portion of the document that you want on the client side using your client programming language.
In your example, you'd fetch the entire 'user' document and then iterate through the 'items[]' array on the client side.
3) If you want to return just the 'items[]' array, you can do so by using the 'Field Selection' syntax. See http://www.mongodb.org/display/DOCS/Querying#Querying-FieldSelection for details. Unfortunately, it will return the entire 'items[]' array, and not just one element of the array.
4) There is an existing Jira ticket to add this functionality: it is https://jira.mongodb.org/browse/SERVER-828 SERVER-828. It looks like it's been added to the latest 2.1 (development) branch: that means it will be available for production use when release 2.2 ships.
If this is an embedded array, then you can't retrieve its elements directly. The retrieved document will have form of a user (root document), although not all fields may be filled (depending on your query).
If you want to retrieve just that element, then you have to store it as a separate document in a separate collection. It will have one additional field, user_id (can be part of _id). Then it's trivial to do what you want.
A sample document might look like this:
{
_id: {user_id: ObjectId, item_id: "1234"},
name: "some item"
}
Note that this structure ensures uniqueness of item_id per user (I'm not sure you want this or not).

Using CouchDB-lucene how can I index an array of objects (not values)

Hello everyone and thanks in advance for any ideas, suggestions or answers.
First, the environment: I am using CouchDB (currently developing on 1.0.2) and couchdb-lucene 0.7. Obviously, I am using couchdb-lucene ("c-l" hereafter) to provide full-text searching within couchdb.
Second, let me provide everyone with an example couchdb document:
{
"_id": "5580c781345e4c65b0e75a220232acf5",
"_rev": "2-bf2921c3173163a18dc1797d9a0c8364",
"$type": "resource",
"$versionids": [
"5580c781345e4c65b0e75a220232acf5-0",
"5580c781345e4c65b0e75a220232acf5-1"
],
"$usagerights": [
{
"group-administrators": 31
},
{
"group-users": 3
}
],
"$currentversionid": "5580c781345e4c65b0e75a220232acf5-1",
"$tags": [
"Tag1",
"Tag2"
],
"$created": "/Date(1314973405895-0500)/",
"$creator": "administrator",
"$modified": "/Date(1314973405895-0500)/",
"$modifier": "administrator",
"$checkedoutat": "/Date(1314975155766-0500)/",
"$checkedoutto": "administrator",
"$lastcommit": "/Date(1314973405895-0500)/",
"$lastcommitter": "administrator",
"$title": "Test resource"
}
Third, let me explain what I want to do. I am trying to figure out how to index the '$usagerights' property. I am using the word index very loosely because I really do not care about being able to search it, I simply want to 'store' it so that it is returned with the search results. Anyway, the property is an array of json objects. Now, these json objects that compose the array will always have a single json property.
Based on my understanding of couchdb-lucene, I need to reduce this array to a comma separated string. I would expect something like "group-administrators:31,group-users:3" to be a final output.
Thus, my question is essentially: How can I reduce the $usagerights json array above to a comma separated string of key:value pairs within the couchdb design document as used by couchdb-lucene?
A previous question I posted regarding indexing of tagging in a similar situation, provided for reference: How-to index arrays (tags) in CouchDB using couchdb-lucene
Finally, if you need any additional details, please just post a comment and I will provide it.
Maybe I am missing something, but the only difference I see from your previous question, is that you should iterate on the objects. Then the code should be:
function(doc) {
var result = new Document(), usage, right;
for(var i in doc.$usagerights) {
usage = doc.$usagerights[i];
for(right in usage) {
result.add(right + ":" + usage[right]);
}
}
return result;
}
There's no requirement to convert to a comma-separated list of values (I'd be intrigued to know where you picked up that idea).
If you simply want the $usagerights item returned with your results, do this;
ret.add(JSON.stringify(doc.$usagerights),
{"index":"no", "store":"yes", "field":"usagerights"});
Lucene stores strings, not JSON, so you'll need to JSON.parse the string on query.

mongodb - retrieve array subset

what seemed a simple task, came to be a challenge for me.
I have the following mongodb structure:
{
(...)
"services": {
"TCP80": {
"data": [{
"status": 1,
"delay": 3.87,
"ts": 1308056460
},{
"status": 1,
"delay": 2.83,
"ts": 1308058080
},{
"status": 1,
"delay": 5.77,
"ts": 1308060720
}]
}
}}
Now, the following query returns whole document:
{ 'services.TCP80.data.ts':{$gt:1308067020} }
I wonder - is it possible for me to receive only those "data" array entries matching $gt criteria (kind of shrinked doc)?
I was considering MapReduce, but could not locate even a single example on how to pass external arguments (timestamp) to Map() function. (This feature was added in 1.1.4 https://jira.mongodb.org/browse/SERVER-401)
Also, there's always an alternative to write storedJs function, but since we speak of large quantities of data, db-locks can't be tolerated here.
Most likely I'll have to redesign the structure to something 1-level deep, like:
{
status:1,delay:3.87,ts:138056460,service:TCP80
},{
status:1,delay:2.83,ts:1308058080,service:TCP80
},{
status:1,delay:5.77,ts:1308060720,service:TCP80
}
but DB will grow dramatically, since "service" is only one of many options which will append each document.
please advice!
thanks in advance
In version 2.1 with the aggregation framework you are now able to do this:
1: db.test.aggregate(
2: {$match : {}},
3: {$unwind: "$services.TCP80.data"},
4: {$match: {"services.TCP80.data.ts": {$gte: 1308060720}}}
5: );
You can use a custom criteria in line 2 to filter the parent documents. If you don't want to filter them, just leave line 2 out.
This is not currently supported. By default you will always receive the whole document/array unless you use field restrictions or the $slice operator. Currently these tools do not allow filtering the array elements based on the search criteria.
You should watch this request for a way to do this: https://jira.mongodb.org/browse/SERVER-828
I'm attempting to do something similar. I tried your suggestion of using the GROUP function, but I couldn't keep the embedded documents separate or was doing something incorrectly.
I needed to pull/get a subset of embedded documents by ID. Here's how I did it using Map/Reduce:
db.parent.mapReduce(
function(parent_id, child_ids){
if(this._id == parent_id)
emit(this._id, {children: this.children, ids: child_ids})
},
function(key, values){
var toReturn = [];
values[0].children.forEach(function(child){
if(values[0].ids.indexOf(product._id.toString()) != -1)
toReturn.push(child);
});
return {children: toReturn};
},
{
mapparams: [
"4d93b112c68c993eae000001", //example parent id
["4d97963ec68c99528d000007", "4debbfd5c68c991bba000014"] //example embedded children ids
]
}
).find()
I've abstracted my collection name to 'parent' and it's embedded documents to 'children'. I pass in two parameters: The parent document ID and an array of the embedded document IDs that I want to retrieve from the parent. Those parameters are passed in as the third parameter to the mapReduce function.
In the map function I find the parent document in the collection (which I'm pretty sure uses the _id index) and emit its id and children to the reduce function.
In the reduce function, I take the passed in document and loop through each of the children, collecting the ones with the desired ID. Looping through all the children is not ideal, but I don't know of another way to find by ID on an embedded document.
I also assume in the reduce function that there is only one document emitted since I'm searching by ID. If you expect more than one parent_id to match, than you will have to loop through the values array in the reduce function.
I hope this helps someone out there, as I googled everywhere with no results. Hopefully we'll see a built in feature soon from MongoDB, but until then I have to use this.
Fadi, as for "keeping embedded documents separate" - group should handle this with no issues
function getServiceData(collection, criteria) {
var res=db[collection].group({
cond: criteria,
initial: {vals:[],globalVar:0},
reduce: function(doc, out) {
if (out.globalVar%2==0)
out.vals.push({doc.whatever.kind.and.depth);
out.globalVar++;
},
finalize: function(out) {
if (vals.length==0)
out.vals='sorry, no data';
return out.vals;
}
});
return res[0];
};

Resources