Let's say I have a collection of documents in the following format:
{
// some fields
"name" : "some name",
"specs" : [
{
"key" : {
"en" : "English key name",
"xx" : "Other key name",
},
"value" : {
"en" : "English value",
"xx" : "Other value",
}
},
{
"key2" : {
"en" : "English key name2",
"xx" : "Other key name2",
},
"value2" : {
"en" : "English value2",
"xx" : "Other value2",
}
},
//and some more sub-documents
],
}
I'm trying to query it from the database to get it in the following format:
{
"name" : "some name",
"specs" : [
{
"key" : "English key name",
"value" : "English value",
},
{
"key2" : "English key name2",
"value2" : "English value2",
},
//and some more sub-documents
],
}
How can it be done, if it is possible at all?
Background
I'm building software that must be available in multiple languages, and I think the current document schema is the most suitable for this (if you have better ideas for the schema, I'd like to see them).
To minimize the amount of data queried from the database, I'm trying to select the data in only one language. I also want to minimize nesting of structures in the code, so I'm looking for a way to select a value out of a sub-document and replace the sub-document with it.
I've tried many ways of writing such a query. Here's one, but it doesn't work as I expect:
db.software.aggregate({
$project : {
"name" : true,
"specs" : {
"key" : "$specs.key.en",
"value" : "$specs.value.en"
}
}
});
It turns "key" into an array of all the "key.en" fields within the specs field. Is there a way to reference the current array element inside "specs" instead of the whole specs array?
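One way to reference the current array element is the $map aggregation operator, which evaluates an expression once per element of an array. The following is a minimal sketch, not a definitive answer: it assumes every sub-document in specs uses the field names "key" and "value" (the sample also shows "key2"/"value2", which a fixed projection like this would not pick up). The function name specsInLanguage and the variable name spec are made up for illustration.

```javascript
// Build a pipeline that keeps only one language per spec entry.
// Assumption: every element of "specs" has "key" and "value" sub-documents.
function specsInLanguage(lang) {
  return [
    {
      $project: {
        name: true,
        specs: {
          $map: {
            input: "$specs",   // the whole array
            as: "spec",        // name for the current element
            in: {
              key: "$$spec.key." + lang,
              value: "$$spec.value." + lang
            }
          }
        }
      }
    }
  ];
}

// In the mongo shell:
// db.software.aggregate(specsInLanguage("en"))
```

Building the pipeline in a function keeps the language code in one place, so the same query can be reused for "xx" or any other language key stored in the documents.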
Related
I am having trouble understanding why an index is not able to cover a certain query, when my interpretation of documentation suggests it should... :)
The document I am referring to is: https://docs.mongodb.com/manual/core/index-multikey/
I am creating an index on a property which is part of an array of objects. The value indexed is present in other documents. The query looks up directly for the value of the property in the array. But when I look at the plan in the profiler, it is looking through the entire collection.
The structure of the document is as follows:
{
"userEmail": "string",
"basicInformation": {
"name" : "string"
},
"events": {
"live" : [
{"eventId": "id of event 1", // <--- field indexed : "events.live.eventId"
"date" : "date of event",
"duration": n},
{"eventId": "id of event 2",
"date" : "date of event",
"duration": n},
...
],
"onDemand" : [
{"eventId": "id of event 1", // <--- field indexed : "events.onDemand.eventId"
"date" : "date of event",
"duration": n},
{"eventId": "id of event 2",
"date" : "date of event",
"duration": n},
...
]
}
}
QUERY:
{
$facet: {
"liveUsers": [
{$match: {"events.live.eventId": "id of event 1"}},
{ $project: { .... }}
],
"onDemandUsers": [
{$match: {"events.live.eventId": "id of event 1"}},
{ $project: { .... }}
]
}
}
The plan does not seem to use the index and scans the whole collection. The collection currently holds over 63K documents, so the scan triggers alerts. Can you help me understand how the indexes should be built, or how the query should be restructured, so that we can avoid the full collection scan?
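The MongoDB documentation states that $facet and its sub-pipelines cannot use indexes and always perform a collection scan, even when a sub-pipeline starts with $match. A common workaround is to place an indexable $match before $facet so that only matching documents reach it. The sketch below shows that shape under some assumptions: the second facet is taken to mean "events.onDemand.eventId" (the query above repeats "events.live.eventId"), the function name usersForEvent is hypothetical, and the elided $project stages are left out.

```javascript
// Build a pipeline that filters with the indexes first, then facets.
function usersForEvent(eventId) {
  return [
    // Runs before $facet, so the planner can use the indexes on
    // "events.live.eventId" and "events.onDemand.eventId".
    {
      $match: {
        $or: [
          { "events.live.eventId": eventId },
          { "events.onDemand.eventId": eventId }
        ]
      }
    },
    // $facet now only sees the documents that survived the $match.
    {
      $facet: {
        liveUsers: [
          { $match: { "events.live.eventId": eventId } }
          // original $project stage goes here
        ],
        onDemandUsers: [
          { $match: { "events.onDemand.eventId": eventId } }
          // original $project stage goes here
        ]
      }
    }
  ];
}

// In the mongo shell (collection name "users" is an assumption):
// db.users.aggregate(usersForEvent("id of event 1"))
```

Running the same pipeline through explain() should show whether the leading $match is served by an index scan rather than a collection scan.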
I'm fairly new to MongoDB, but I've managed to archive a load of documents into a new collection called documents_archived, in the following format, using an aggregation pipeline:
{
"_id" : ObjectId("5a0046ef2039404645a42f52"),
"archive" : [
{
"_id" : ObjectId("54e60f49e4b097cc823afe8c"),
"_class" : "xxxxxxxxxxxxx",
"fields" : [
{
"key" : "Agreement Number",
"value" : "1002465507"
},
{
"key" : "Name",
"value" : "xxxxxxxx"
},
{
"key" : "Reg No",
"value" : "xxxxxxx"
},
{
"key" : "Surname",
"value" : "xxxxxxxx"
},
{
"key" : "Workflow Id",
"value" : "xxxxxxxx"
}
],
"fileName" : "Audit_C1002465507.txt",
"type" : "Workflow Audit",
"fileSize" : NumberLong(404),
"document" : BinData(0, "xxxxx"),
"creationDate" : ISODate("2009-09-25T00:00:00.000+0000"),
"lastModificationDate" : ISODate("2015-02-19T16:28:57.016+0000"),
"expiryDate" : ISODate("2018-09-25T00:00:00.000+0000")
}
]
}
Now I'm trying to extract just the value of the Agreement Number field. I have tried everything that my limited knowledge, searching, and the documentation allow, without success. I wondered if the MongoDB experts out there can help. Thank you.
Here's a solution that uses the aggregation framework. I am assuming that each doc can have more than one entry in the archive field but only one Agreement Number in the fields array, because your design appears to be key/value. If multiple Agreement Numbers show up in the fields array, we'll have to add an additional $unwind, but for the moment this should work:
db.foo.aggregate([
{$unwind: "$archive"}
,{$project: {x: {$filter: {
input: "$archive.fields",
as: "z",
cond: {$eq: [ "$$z.key", "Agreement Number" ]}
}}
}}
,{$project: {_id:false, val: {$arrayElemAt: ["$x.value",0]} }}
]);
{ "val" : "1002465507" }
You can use the following in the mongo shell to extract only the values:
db.documents_archived.find().forEach(function(doc) {
doc.archive[0].fields.forEach(function(field) {
if (field.key == "Agreement Number") {
print(field.value)
}
})
})
I have a document in DocumentDB that has a property that is an array of objects. I would like to flatten it and get a single property from all the objects as an array.
e.g.:
{
"name" : "blah",
"address" : [
{
"type" : "home",
"location" : "123 st"
},
{
"type" : "work",
"location" : "321 st"
}
]
}
-> What I want
[
{
"name" : "blah",
"locations" : [ "123 st", "321 st" ]
}
]
You could define a user-defined function (UDF) that extracts the location values, and then call that UDF from inside your query.
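As a rough illustration of that suggestion, the UDF body could look like the sketch below. The name getLocations is hypothetical, and the function assumes each element of "address" carries a "location" property as in the sample document.

```javascript
// Body of a hypothetical UDF, registered under the name "getLocations".
function getLocations(addresses) {
  // Guard against documents where "address" is missing or not an array.
  if (!Array.isArray(addresses)) {
    return [];
  }
  // Keep only the "location" property of each address object.
  return addresses.map(function (a) {
    return a.location;
  });
}

// Query that calls the UDF (c is the collection alias):
// SELECT c.name, udf.getLocations(c.address) AS locations FROM c
```

With the sample document, the query would be expected to return the name together with a flat "locations" array, which is the shape asked for above.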
"-Kj9Penv_LMRUIPSet0b" : {
"categories" : [ "food", "fashion"],
"contact" : "profile/contact/eieiiieie888x7ww28288_x22",
"location" : "New York, United States",
"name" : "Billybob Smith",
"social" : {
"twitter" : {
"followers" : "1,002",
"nickname" : "#billybob"
}
},
"state" : "0"
},
"eieiiieie888x7ww28288_x22" : {
"categories" : [ "food", "fashion" ],
"contact" : "profile/contact/eieiiieie888x7ww28288_x22",
"location" : "New York, United States",
"name" : "Billybob Smith.",
"social" : {
"twitter" : {
"followers" : "1,002",
"nickname" : "#billybob"
}
},
"socialID" : "twitter_id|558969977",
"state" : "0",
"uniqueID" : "eieiiieie888x7ww2828"
},
This is one JSON example of a duplicate in my database, and I have a lot of them. The only piece of data I capture that uniquely identifies each user is their contact link. What is my best course of action to find and remove duplicates from my database? I'm totally stuck. The second entry in the example is the more accurate and complete one. Ideally, I could remove the first one and leave the second one behind.
Could really use some help here! Thank you so much!
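One possible approach, sketched here in plain JavaScript because the database product is not stated, is to load the records into an object keyed by record id, group them by "contact", and collect the ids of the less complete records for deletion. The function name findDuplicateIds is made up, and "more complete" is assumed to mean "has more top-level fields", which holds for the sample above but may not hold for all data.

```javascript
// users: plain object keyed by record id, shaped like the sample above.
// Returns the ids to delete, keeping one record per "contact" value.
function findDuplicateIds(users) {
  const keptByContact = new Map(); // contact link -> id of the record to keep
  const toRemove = [];

  for (const id of Object.keys(users)) {
    const contact = users[id].contact;
    if (contact === undefined) {
      continue; // cannot be matched to anything, leave it alone
    }
    const keptId = keptByContact.get(contact);
    if (keptId === undefined) {
      // First record seen for this contact link.
      keptByContact.set(contact, id);
    } else if (Object.keys(users[id]).length > Object.keys(users[keptId]).length) {
      // Assumption: more top-level fields means a more complete record.
      toRemove.push(keptId);
      keptByContact.set(contact, id);
    } else {
      toRemove.push(id);
    }
  }
  return toRemove;
}
```

The returned ids can be reviewed before anything is deleted, which is a sensible precaution given that the completeness rule is only a heuristic.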
Consider the following document:
{
"entity_id" : 10,
"features" :
[
{
"10" : "Test System 2"
},
{
"20" : "System 2 Description"
},
{
"30" : ["Free", "Monthly", "Quaterly"]
},
{
"40" : ["Day", "Swing"]
}
],
}
I need to achieve the following in as few statements as possible:
Given a document like so:
{"feature_id" : "30", "value" : ["Free"]}
get the corresponding element of the array "features" to contain ["Free"] instead of ["Free", "Monthly", "Quaterly"]
Given a document like so:
{"feature_id" : "50", "value" : ["Bonds", "Commodities"]}
create a new element of the array "features" looking like
{"50" : ["Bonds", "Commodities"]}
Given a document like so:
{"feature_id" : "40", "value" : ""}
remove the corresponding element from the array "features".
Data model
Your data model isn't easy to work with given your desired updates.
If you want to use an array, I would suggest changing the document structure to look like:
{
"entity_id" : 10,
"features" : [
{
feature_id: "10",
value : "Test System 2"
},
{
feature_id: "20",
value: "System 2 Description"
},
{
feature_id: "30",
value: ["Free", "Monthly", "Quaterly"]
},
{
feature_id: "40",
value: ["Day", "Swing"]
}
],
}
Alternatively, you could model as an embedded document:
{
"entity_id" : 10,
"features" : {
"10" : "Test System 2",
"20" : "System 2 Description",
"30" : ["Free", "Monthly", "Quaterly"],
"40" : ["Day", "Swing"]
}
}
The benefit of modeling as an array is that you can add a multikey index across all features/values.
If you model as an embedded document, you could reference fields directly (i.e. features.10). This assumes you know what the keys are going to be, and you would have to index each feature value separately.
I'll assume the first format for the examples below. Also note that your key values have to match in type (so string "10" will not match number 10).
Example 1
Given a document like so:
{"feature_id" : "30", "value" : ["Free"]}
get the corresponding element of the array "features" to contain ["Free"] instead of ["Free", "Monthly", "Quaterly"]
Sample update:
db.docs.update(
// Criteria (assumes entity_id is unique)
{
entity_id: 10,
features: {
// Using $elemMatch to find feature_id with string "30"
$elemMatch: { feature_id: "30" },
}
},
// Update
{ $set: {
"features.$.value" : ["Free"]
}}
)
Example 2
Given a document like so:
{"feature_id" : "50", "value" : ["Bonds", "Commodities"]}
create a new element of the array "features" looking like
{"50" : ["Bonds", "Commodities"]}
Sample update:
db.docs.update(
// Criteria (assumes entity_id is unique)
{
entity_id: 10,
},
// Update
{ $push: {
"features" : { "feature_id" : "50", value: ["Bonds", "Commodities"] }
}}
)
Example 3
Given a document like so:
{"feature_id" : "40", "value" : ""}
remove the corresponding element from the array "features".
Sample update:
db.docs.update(
// Criteria (assumes entity_id is unique)
{
entity_id: 10,
},
// Update
{ $pull: {
"features" : { "feature_id" : "40" }
}}
)
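The three examples can be tied together with a small helper that picks the update from the incoming change document. This is only a sketch under the assumptions used above (the array-of-{feature_id, value} model, a unique entity_id, and an empty-string value meaning "remove"); the name buildFeatureUpdate and the shape of its return value are invented for illustration.

```javascript
// Translate a change document such as {feature_id: "30", value: ["Free"]}
// into the filter/update pair for db.docs.update().
function buildFeatureUpdate(entityId, change) {
  if (change.value === "") {
    // Example 3: an empty value means "remove the feature".
    return {
      filter: { entity_id: entityId },
      update: { $pull: { features: { feature_id: change.feature_id } } }
    };
  }
  return {
    // Example 1: overwrite the value of an existing feature.
    filter: {
      entity_id: entityId,
      features: { $elemMatch: { feature_id: change.feature_id } }
    },
    update: { $set: { "features.$.value": change.value } },
    // Example 2: used only when the filter above matched nothing.
    fallback: {
      filter: { entity_id: entityId },
      update: {
        $push: { features: { feature_id: change.feature_id, value: change.value } }
      }
    }
  };
}

// In the mongo shell:
// var op = buildFeatureUpdate(10, {feature_id: "50", value: ["Bonds", "Commodities"]});
// var res = db.docs.update(op.filter, op.update);
// if (res.nMatched === 0 && op.fallback) {
//   db.docs.update(op.fallback.filter, op.fallback.update);
// }
```

Note that the set-then-push path is two separate statements, so it is not atomic; a concurrent writer could add the same feature between them.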