I am trying to write a query in the MongoDB CLI to find documents that share an arbitrary number of fields out of a specific, predetermined set of fields.
To give you an example: let's consider a database composed of documents with variable fields. These fields can be shared among documents, but not necessarily.
{
_id: ObjectId("AAA"),
field1: "value_a",
field2: "value_b",
field3: "value_l",
field6: "value_n"
}
{
_id: ObjectId("BBB"),
field1: "value_c",
field3: "value_e"
}
{
_id: ObjectId("CCC"),
field2: "value_f",
field4: "value_g"
}
{
_id: ObjectId("DDD"),
field1: "value_m",
field5: "value_h",
field2: "value_i",
field6: "value_j",
field7: "value_k"
}
{
_id: ObjectId("EEE"),
field8: "value_o"
}
The challenge is to have a query in which one can specify not only the fields of interest, but also the number of those fields a document must contain (instead of requiring an exact match, as happens with $exists in the attempt shown below). Fields that a document has but that are not in the list do not matter, whether present or not.
For clarification, let's say we are interested in the list of fields ["field1", "field3", "field6", "field8"] in the documents shown above, and we want to know which documents share X fields (regardless of which ones, as long as they are from the list we defined).
Let's call QUERY1 the query returning the documents sharing exactly X fields, as described below:
Documents with exactly one field: returns the document with ObjectId("EEE")
Documents with exactly two fields: returns the documents with ObjectId("BBB") and ("DDD")
Documents with exactly three fields: returns the document with ObjectId("AAA")
Additionally, let's call QUERY2 the query returning documents that share at least X fields:
Documents with at least one field: returns the documents with ObjectId("AAA"), ("BBB"), ("DDD"), ("EEE")
Documents with at least two fields: returns the documents with ObjectId("AAA"), ("BBB"), ("DDD")
Documents with at least three fields: returns the document with ObjectId("AAA")
I have been trying with $exists; however, the problem is that the query only returns documents containing all four fields, without the flexibility explained above: db.documents.find({'field1' : {$exists: true}, 'field3' : {$exists: true}, 'field6' : {$exists: true}, 'field8' : {$exists: true}})
Does anyone know how to write QUERY1 and QUERY2?
Moreover, as I would like to create new collections out of the results from those queries, ideally the queries would work with the following function at the end : .forEach(function(x){db.newCollection.insert(x)});
Thank you very much for your help, very appreciated.
db.documents.aggregate([
  {
    "$project": {
      "matchedFieldCount": {
        "$size": {
          "$filter": {
            "input": { "$objectToArray": "$$ROOT" },
            "cond": {
              "$in": ["$$this.k", ["field1", "field3", "field6", "field8"]]
            }
          }
        }
      }
    }
  },
  { "$match": { "matchedFieldCount": 1 } }
])
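You can adjust the $match stage to get QUERY1 or QUERY2: an equality match such as { matchedFieldCount: 2 } returns the documents sharing exactly two fields, while { matchedFieldCount: { $gte: 2 } } returns the documents sharing at least two. If you want, you can project just the _id by adding one more pipeline stage.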
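As a rough sketch of how that $match could be switched between the two queries, the helper below builds the same pipeline with a configurable last stage. The names buildSharedFieldsPipeline and countSharedFields are made up for illustration; countSharedFields is only a plain JavaScript stand-in for the $objectToArray / $filter / $size step, so the expected results from the question can be checked without a server.

```javascript
// Hypothetical helper (not a MongoDB API): builds the pipeline from the answer.
// atLeast = false gives QUERY1 (exactly x), atLeast = true gives QUERY2 (at least x).
function buildSharedFieldsPipeline(fields, x, atLeast) {
  return [
    {
      $project: {
        matchedFieldCount: {
          $size: {
            $filter: {
              input: { $objectToArray: "$$ROOT" },
              cond: { $in: ["$$this.k", fields] }
            }
          }
        }
      }
    },
    { $match: { matchedFieldCount: atLeast ? { $gte: x } : x } }
  ];
}

// Plain JavaScript stand-in for the counting step, for checking the logic only.
function countSharedFields(doc, fields) {
  return Object.keys(doc).filter(function (k) {
    return fields.includes(k);
  }).length;
}

const fields = ["field1", "field3", "field6", "field8"];
const docs = [
  { _id: "AAA", field1: "value_a", field2: "value_b", field3: "value_l", field6: "value_n" },
  { _id: "BBB", field1: "value_c", field3: "value_e" },
  { _id: "CCC", field2: "value_f", field4: "value_g" },
  { _id: "DDD", field1: "value_m", field5: "value_h", field2: "value_i", field6: "value_j", field7: "value_k" },
  { _id: "EEE", field8: "value_o" }
];

const exactlyTwo = docs
  .filter(function (d) { return countSharedFields(d, fields) === 2; })
  .map(function (d) { return d._id; });
const atLeastTwo = docs
  .filter(function (d) { return countSharedFields(d, fields) >= 2; })
  .map(function (d) { return d._id; });

console.log(exactlyTwo); // [ 'BBB', 'DDD' ]
console.log(atLeastTwo); // [ 'AAA', 'BBB', 'DDD' ]

// In the mongo shell the pipeline would be used as:
//   db.documents.aggregate(buildSharedFieldsPipeline(fields, 2, true))
```

Note that the $project stage only keeps _id and matchedFieldCount, so copying whole documents into a new collection would need the other fields to be carried through as well.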
Passing an object to an existing field in $set or $addFields merges objects rather than replaces them, e.g.
https://mongoplayground.net/p/aXe-rExjCXr
// Collection
[
{
"_id": "123",
"options": {
"size": "Large",
"color": "Red"
}
}
]
// Aggregate
db.collection.aggregate([
{
$set: {
options: {
size: "Small"
}
}
}
]);
// Expect
[
{
"_id": "123",
"options": {
"size": "Small"
}
}
]
// Actual
[
{
"_id": "123",
"options": {
"size": "Small",
"color": "Red" // <-- Not expected?
}
}
]
(It gets even weirder with arrays.)
Is it possible to have it behave like non-object values and simply replace the field?
For context, I want to use this in an aggregate update pipeline.
This is the expected behaviour, and as far as I know there is no plan to change it. As far as I remember there was a Jira ticket about this, but it was closed, which I take to mean that it will not change.
$set/$addFields always replace the field, except when:
the field is an array and I add a document => the result is an array whose members are all that document
the field is a document and I add a document => the documents are merged (this is your case here)
$project always replaces the field, except when:
the field is an array and I add a document => the result is an array whose members are all that document
Solutions
You can override this "weird" behaviour, especially in the case of arrays, by using $unset on the old field first, for example, and then $set.
Based on the Jira ticket in the comment below, we can also use $literal to avoid this, but when we use $literal we have to be sure that we don't use expressions, because they will not be evaluated
(expressions like path references, variables, operators, etc.).
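The two workarounds could look roughly like the sketch below. The helper names replaceViaUnset and replaceViaLiteral are invented for illustration; they only build the stage objects, which would then be passed to an aggregate or to a pipeline-style update.

```javascript
// Workaround 1 (hypothetical helper): drop the old field, then set the new value.
// Because the field no longer exists when $set runs, there is nothing to merge with.
function replaceViaUnset(field, value) {
  return [
    { $unset: field },
    { $set: { [field]: value } }
  ];
}

// Workaround 2 (hypothetical helper): wrap the value in $literal so it is taken as-is.
// Anything inside the value that looks like an expression is NOT evaluated.
function replaceViaLiteral(field, value) {
  return [{ $set: { [field]: { $literal: value } } }];
}

console.log(JSON.stringify(replaceViaUnset("options", { size: "Small" })));
// [{"$unset":"options"},{"$set":{"options":{"size":"Small"}}}]
console.log(JSON.stringify(replaceViaLiteral("options", { size: "Small" })));
// [{"$set":{"options":{"$literal":{"size":"Small"}}}}]

// Possible use as an update pipeline in the shell:
//   db.collection.updateMany({}, replaceViaLiteral("options", { size: "Small" }))
```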
I'm trying to figure out the best way to implement this.
If I have one large collection in my MongoDB that holds all of my "Inventory" information for my warehouse without regard to the specific "type" of inventory, what is the best way to continuously aggregate the data into separate collections? (Added after the fact: I'm using a MEAN stack, so maybe some of this is better done server-side with an Angular function rather than by actually keeping a collection updated?)
For instance, my current "Inventory" collection would have items such as
_id: something, name: item1, type: chemical, vendor: something, safetycode: ####, machineUse: n/a, equipmentUse: n/a ...
_id: something2, name: item2, type: machine, vendor: something2, safetycode: n/a, machineUse: "digging", equipmentUse: n/a ...
_id: something3, name: item3, type: equipment, vendor: something3, safetycode: n/a, machineUse: n/a, equipmentUse: "Computer" ...
I'm inclined to use $group, but is it best practice to keep a separate collection updated with the respective groups? You'll notice in the following that the aggregate function should collect each specific 'type' (chemical, machine, equipment, etc.) and store the details of each item with only the fields used for that 'type' (i.e., chemicals use 'safetycode'; machines do NOT use a safety code, so it's left out and 'machineUse' is stored instead, etc.).
db.invfulls.aggregate([
  { "$group": {
    "_id": "$itype",
    "total": { "$sum": 1 },
    "items": {
      "$push": {
        "$cond": {
          "if": { "$eq": ["$itype", "Chemical"] },
          "then": { "id": "$_id", "name": "$name", "vendor": "$vendor", "review": "$needsreview" },
          "else": { "id": "$_id", "name": "$name", "vendor": "$vendor" }
        }
      }
    }
  }},
  { "$out": "invByType" }
])
Additionally, would I have to make this a function in the database and call that function anytime there is a new "post" made?
I've read a bit about mapReduce as well, but everything I read says it's very slow and shouldn't be used?
I'm trying to do exactly what the poster in this link was trying to accomplish. I have documents with the same structure as the poster; in my documents there is an array of objects, each with many keys. I want to bring back all objects (not just the first, as you can with an $elemMatch) in that array where a key's value matches my query. I want my query's result to simply be an array of objects, where there is a key in each object that matches my query. For example, in the case of the linked question, I would want to return an array of objects where "class":"s2". I would want returned:
"FilterMetric" : [
{
"min" : "0.00",
"max" : "16.83",
"avg" : "0.00",
"class" : "s2"
},
{
"min" : "0.00",
"max" : "16.83",
"avg" : "0.00",
"class" : "s2"
}
]
I tried all the queries in the answer. The first two queries bring back an empty array in Robomongo. In the shell, the command does nothing and returns me to the next line.
On the third query in the answer, I get an unexpected token for the line where "input" is.
I'm using MongoDB version 3.0.2. It appears as if the OP was successful with the answer, so I'm wondering if there is a version issue, or if I'm approaching this the wrong way.
The only problem with the answers in that question seems to be that they're using the wrong casing for FilterMetric. For example, this works:
db.sample.aggregate([
{ "$match": { "FilterMetric.class": "s2" } },
{ "$unwind": "$FilterMetric" },
{ "$match": { "FilterMetric.class": "s2" } },
{ "$group": {
"_id": "$_id",
"FilterMetric": { "$push": "$FilterMetric" }
}}
])
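To make clear what the $unwind / $match / $group stages do to each document, here is a plain JavaScript stand-in that can be run without a server. The function name filterArrayField and the sample documents are made up for illustration (only the "s2" entries mirror the question); it is a sketch of the logic, not a replacement for the pipeline.

```javascript
// Plain JavaScript stand-in for: $unwind the array, keep matching elements,
// then $group them back per _id. Documents with no matching element drop out.
function filterArrayField(docs, field, key, value) {
  return docs
    .map(function (doc) {
      const kept = (doc[field] || []).filter(function (el) {
        return el[key] === value;
      });
      return { _id: doc._id, [field]: kept };
    })
    .filter(function (doc) {
      return doc[field].length > 0;
    });
}

// Made-up sample data.
const sample = [
  {
    _id: 1,
    FilterMetric: [
      { min: "0.00", max: "16.83", avg: "0.00", class: "s1" },
      { min: "0.00", max: "16.83", avg: "0.00", class: "s2" },
      { min: "0.00", max: "16.83", avg: "0.00", class: "s2" }
    ]
  },
  { _id: 2, FilterMetric: [{ min: "0.00", max: "16.83", avg: "0.00", class: "s1" }] }
];

const result = filterArrayField(sample, "FilterMetric", "class", "s2");
console.log(result.length);                 // 1
console.log(result[0].FilterMetric.length); // 2
```

Like the $group stage, the stand-in keeps only _id and the filtered array.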
Is there a way to match every element in a database document's array in Mongo?
For instance, with document:
{
Stuff: ['chicken', 'stock']
}
Is there a query that would take as input in some way ['chicken', 'flavored', 'stock'] and return this document, but wouldn't return it with an input of just ['chicken']?
You can do this by combining multiple operators to perform a double-negation that's a bit hard to keep track of in your head (at least it is for me!):
// Find the docs where Stuff only contains 'chicken' or 'stock' elements
db.test.find({Stuff: {$not: {$elemMatch: {$nin: ['chicken', 'stock']}}}})
So the $elemMatch with the $nin is finding the docs where a single element of the Stuff array is not in the set of strings, and then the parent $not inverts the match to return all the docs where that didn't match any elements.
Starting with docs of:
db.test.insert([
{Stuff: ['chicken']},
{Stuff: ['chicken', 'stock']},
{Stuff: ['chicken', 'flavored', 'stock']}
])
The above query returns:
{
"_id": ObjectId("53dec1c5393fa0461f92334c"),
"Stuff": [
"chicken"
]
}
{
"_id": ObjectId("53dec1c5393fa0461f92334d"),
"Stuff": [
"chicken",
"stock"
]
}
Note, however, that this will also return docs where Stuff is either missing or has no elements. To exclude those you need to $and in a qualifier that ensures Stuff has at least one element:
db.test.find({$and: [
{Stuff: {$not: {$elemMatch: {$nin: ['chicken', 'stock']}}}},
{'Stuff.0': {$exists: true}}
]})
Actually, a text index should help here, as it is an OR match, sortable by score, so that the result with the best match can easily be found. The more of the search words are found, the better the score. Plus, the text search provides stemming and stop words. In the example given that would mean that "chickens" and "the chicken" would give identical results.
db.yourCollection.createIndex({"Stuff":"text"})
db.yourCollection.find({ $text: { $search: "chicken stock", $language: "en" } }, { score: { $meta: "textScore" } }).sort({ score: { $meta: "textScore" } }).limit(whatEverYouDeemAppropriateInt)
Please see the MongoDB docs about Text Indexes for further details.
What seemed a simple task turned out to be a challenge for me.
I have the following mongodb structure:
{
(...)
"services": {
"TCP80": {
"data": [{
"status": 1,
"delay": 3.87,
"ts": 1308056460
},{
"status": 1,
"delay": 2.83,
"ts": 1308058080
},{
"status": 1,
"delay": 5.77,
"ts": 1308060720
}]
}
}}
Now, the following query returns whole document:
{ 'services.TCP80.data.ts':{$gt:1308067020} }
I wonder: is it possible to receive only those "data" array entries matching the $gt criteria (a kind of shrunken document)?
I was considering MapReduce, but could not locate even a single example on how to pass external arguments (timestamp) to Map() function. (This feature was added in 1.1.4 https://jira.mongodb.org/browse/SERVER-401)
Also, there's always an alternative to write storedJs function, but since we speak of large quantities of data, db-locks can't be tolerated here.
Most likely I'll have to redesign the structure to something 1-level deep, like:
{
status:1,delay:3.87,ts:1308056460,service:"TCP80"
},{
status:1,delay:2.83,ts:1308058080,service:"TCP80"
},{
status:1,delay:5.77,ts:1308060720,service:"TCP80"
}
but the DB will grow dramatically, since "service" is only one of many options that would have to be appended to each document.
Please advise!
Thanks in advance.
In version 2.1 with the aggregation framework you are now able to do this:
1: db.test.aggregate(
2: {$match : {}},
3: {$unwind: "$services.TCP80.data"},
4: {$match: {"services.TCP80.data.ts": {$gte: 1308060720}}}
5: );
You can use a custom criteria in line 2 to filter the parent documents. If you don't want to filter them, just leave line 2 out.
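To illustrate the shape of the result, here is a plain JavaScript stand-in for the $unwind and second $match stages, run against the sample data from the question. The function name unwindAndMatch and the _id value are assumptions made for the sketch.

```javascript
// Plain JavaScript stand-in: $unwind emits one document per array element,
// with the array replaced by that single element; $match then keeps the
// documents whose element passes the ts filter.
function unwindAndMatch(docs, minTs) {
  const out = [];
  docs.forEach(function (doc) {
    doc.services.TCP80.data.forEach(function (entry) {
      if (entry.ts >= minTs) {
        out.push({ _id: doc._id, services: { TCP80: { data: entry } } });
      }
    });
  });
  return out;
}

const monitorDocs = [
  {
    _id: 1, // made-up id
    services: {
      TCP80: {
        data: [
          { status: 1, delay: 3.87, ts: 1308056460 },
          { status: 1, delay: 2.83, ts: 1308058080 },
          { status: 1, delay: 5.77, ts: 1308060720 }
        ]
      }
    }
  }
];

const recent = unwindAndMatch(monitorDocs, 1308060720);
console.log(recent.length);                       // 1
console.log(recent[0].services.TCP80.data.delay); // 5.77
```

Each matching array entry comes back as its own document, so the client receives one result per entry rather than one trimmed parent document.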
This is not currently supported. By default you will always receive the whole document/array unless you use field restrictions or the $slice operator. Currently these tools do not allow filtering the array elements based on the search criteria.
You should watch this request for a way to do this: https://jira.mongodb.org/browse/SERVER-828
I'm attempting to do something similar. I tried your suggestion of using the GROUP function, but I couldn't keep the embedded documents separate or was doing something incorrectly.
I needed to pull/get a subset of embedded documents by ID. Here's how I did it using Map/Reduce:
db.parent.mapReduce(
  function(parent_id, child_ids){
    // emit only the requested parent, together with the wanted child ids
    if(this._id == parent_id)
      emit(this._id, {children: this.children, ids: child_ids})
  },
  function(key, values){
    var toReturn = [];
    // keep only the children whose id is in the requested list
    values[0].children.forEach(function(child){
      if(values[0].ids.indexOf(child._id.toString()) != -1)
        toReturn.push(child);
    });
    return {children: toReturn};
  },
  {
    mapparams: [
      "4d93b112c68c993eae000001", //example parent id
      ["4d97963ec68c99528d000007", "4debbfd5c68c991bba000014"] //example embedded children ids
    ]
  }
).find()
I've abstracted my collection name to 'parent' and its embedded documents to 'children'. I pass in two parameters: the parent document ID and an array of the embedded document IDs that I want to retrieve from the parent. Those parameters are passed in as the third parameter to the mapReduce function.
In the map function I find the parent document in the collection (which I'm pretty sure uses the _id index) and emit its id and children to the reduce function.
In the reduce function, I take the passed in document and loop through each of the children, collecting the ones with the desired ID. Looping through all the children is not ideal, but I don't know of another way to find by ID on an embedded document.
I also assume in the reduce function that there is only one document emitted, since I'm searching by ID. If you expect more than one parent_id to match, then you will have to loop through the values array in the reduce function.
I hope this helps someone out there, as I googled everywhere with no results. Hopefully we'll see a built in feature soon from MongoDB, but until then I have to use this.
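The selection done in the reduce step boils down to a filter over the embedded array. A minimal plain JavaScript sketch of that step follows; pickChildren, the third child id, and the name fields are made up for illustration.

```javascript
// Keep only the embedded children whose id (as a string) is in the wanted list.
function pickChildren(parent, childIds) {
  return parent.children.filter(function (child) {
    return childIds.indexOf(String(child._id)) !== -1;
  });
}

const parentDoc = {
  _id: "4d93b112c68c993eae000001",
  children: [
    { _id: "4d97963ec68c99528d000007", name: "first" },
    { _id: "4d97963ec68c99528d000008", name: "second" }, // made-up id, not requested
    { _id: "4debbfd5c68c991bba000014", name: "third" }
  ]
};

const picked = pickChildren(parentDoc, [
  "4d97963ec68c99528d000007",
  "4debbfd5c68c991bba000014"
]);
console.log(picked.map(function (c) { return c.name; })); // [ 'first', 'third' ]
```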
Fadi, as for "keeping embedded documents separate" - group should handle this with no issues
function getServiceData(collection, criteria) {
  var res = db[collection].group({
    cond: criteria,
    initial: {vals: [], globalVar: 0},
    reduce: function(doc, out) {
      // example condition: keep every other document
      if (out.globalVar % 2 == 0)
        out.vals.push(doc.whatever.kind.and.depth);
      out.globalVar++;
    },
    finalize: function(out) {
      if (out.vals.length == 0)
        out.vals = 'sorry, no data';
      return out.vals;
    }
  });
  return res[0];
}