Mongodb get unique value from documents and add to array

I wanted to run a query that matches documents in one collection with documents in another collection based on a value that should be present in both sets of documents, but as I have been informed that Mongo does not support a JOIN, I believe I can't do what I want in the way I want to.
My alternative method is to insert a document into the collection I want to query and update (col1), containing an array of all the unique cycle numbers that are in the other collection (col2).
Collection 1 (Col 1)
{
    "_id" : ObjectId("5670961f910e1f54662c11ag"),
    "objectType" : "Account Balance",
    "Customer" : "Thomas Brown",
    "status" : "unprocessed",
    "cycle" : "1234"
},
{
    "_id" : ObjectId("5670961f910e1f54662c12fd"),
    "objectType" : "Account Balance",
    "Customer" : "Luke Underwood",
    "status" : "unprocessed",
    "cycle" : "1235"
}
Collection 2 (Col 2)
{
    "_id" : ObjectId("5670961f910e1f54662c1d9d"),
    "objectOrigin" : "Xero",
    "Value" : "500.00",
    "key" : "grossprofit",
    "cycle" : "1234",
    "company" : "e56e09ef-5c7c-423e-b699-21469bd2ea00"
},
{
    "_id" : ObjectId("5670961f910e1f54662c1d9f"),
    "objectOrigin" : "Xero",
    "Value" : "500.00",
    "key" : "grossprofit",
    "cycle" : "1234",
    "company" : "0a514db8-1428-4da6-9225-0286dc2662c1"
},
{
    "_id" : ObjectId("5670961f910e1f54662c1da0"),
    "objectOrigin" : "Xero",
    "Value" : "-127.28",
    "key" : "grossprofit",
    "cycle" : "1234",
    "company" : "c2d0561c-dc5d-44b9-beaf-d69a3472a2b8"
},
{
    "_id" : ObjectId("5670961f910e1f54662c1da1"),
    "objectOrigin" : "Xero",
    "Value" : "-127.28",
    "key" : "grossprofit",
    "cycle" : "1235",
    "company" : "c3fbe6e4-962a-45f6-9ce3-71e2a588438c"
}
So I want to create a document in collection 1 which looks like this:
{
    "_id" : ObjectId("5670961f910e1f54662c1d9f"),
    "objectType" : "Status Updater",
    "cycles" : ["1234","1235"]
}
Now what I want to do is query ALL documents where cycle matches one of the values in cycles and update "status" to "processed". I believe I would do this with findAndModify with multi: true, but I'm not entirely sure.
When finished, I will simply delete any document in Collection 1 where objectType is "Status Updater".

If I understand correctly, you want to
a) update all documents in collection #1 where the value of cycle exists in collection #2.
b) Furthermore, your document of type "objectType" : "Status Updater" is only a temporary document to keep track of all the cycle values.
I think you can skip b) and just use the following (this code needs to be executed in the mongo shell):
// get all unique cycle values from collection #2
// returns an array containing: ["1234", "1235", ...]
var values = db.coll2.distinct("cycle")
// find all documents whose cycle value is in the values array
// and update their status.
db.coll1.update({cycle: {$in: values}}, {$set: {status: "processed"}}, {multi: true})
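If you are on MongoDB 3.2 or newer, the same update can also be written with updateMany, which affects all matching documents without the multi flag. A minimal sketch against the same collections:
// get all unique cycle values from collection #2
var values = db.coll2.distinct("cycle");
// update every document in collection #1 whose cycle appears in that list
db.coll1.updateMany({cycle: {$in: values}}, {$set: {status: "processed"}});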


GrayLog not displaying messages that are Bulk inserted

I need to migrate millions of records from a SQL database to ES. Currently we insert records into ES via GELF HTTP, but doing that one record at a time just isn't feasible.
I've been working on this for a couple of days and am new to both GrayLog and ElasticSearch. I'm trying to find a way to bulk insert messages into ES and then have them display in GrayLog. I've been using Cerebro to monitor the indexes and the number of messages in each of them. When I do the bulk insert, the message count does increase in the correct index; however, I cannot see the messages in GrayLog.
Here is what I have:
var _elasticsearchContext = new ElasticsearchContext(ConnectionString, new ElasticsearchMappingResolver());
var connectionSettings = new ConnectionSettings(new Uri(ConnectionString))
.MapDefaultTypeIndices(m => m.Add(typeof(Auditing_Dev), "auditing-dev_0"));
var elasticClient = new ElasticClient(connectionSettings);
var items = new List<Auditing_Dev>();
//I loop through a DataReader creating new Auditing_Dev objects
//and add them to the items collection
var bulkResponse = elasticClient.Bulk(b => b.IndexMany(items, (d, doc) => d.Document(doc).Index("auditing-dev_0").Type("message")));
I get back a valid response and I see the document count increase in Cerebro in the auditing-dev_0 index. When I compare a message that I insert via Bulk to one that is inserted via HTTP request, the indexes and types are the same.
Message I insert:
{
    "_index" : "auditing-dev_0",
    "_type" : "message",
    "_id" : "AVsWWn-jNp2NX1vOria1",
    "_version" : 1,
    "found" : true,
    "_source" : {
        "level" : 5,
        "origin" : "10.80.3.2",
        "success" : true,
        "type" : "Company.Enterprise",
        "user" : "stupid#dropdown.test",
        "gl2_source_input" : "57193c1d0cf25a44afc31c15",
        "gl2_source_node" : "5866cc80-382e-4287-ae5b-8a0a68a9a1f1",
        "gl2_remote_ip" : "10.100.20.164",
        "gl2_remote_port" : 52273,
        "streams" : [ "578fbabe738a897c6d91336b" ]
    }
}
Compared to one inserted via HTTP:
{
    "_index" : "auditing-dev_0",
    "_type" : "message",
    "_id" : "e3d34d50-0a8a-11e7-84bb-00155d007a32",
    "_version" : 1,
    "found" : true,
    "_source" : {
        "level" : 5,
        "gl2_remote_ip" : "192.168.211.114",
        "origin" : "192.168.211.35",
        "gl2_remote_port" : 2960,
        "streams" : [ "578fbabe738a897c6d91336b" ],
        "gl2_source_input" : "57193c1d0cf25a44afc31c15",
        "success" : "True",
        "gl2_source_node" : "5866cc80-382e-4287-ae5b-8a0a68a9a1f1",
        "user" : "admin#purple-pink.test",
        "timestamp" : "2017-03-16 22:43:44.000"
    }
}
I see the _id is a different format, but does that matter?
In GrayLog there is only one Input and that is the one for GELF HTTP. Do I need to add a new Input?
Turned out to be the timestamp field not being present: the bulk-inserted message has no "timestamp" in its _source, while the HTTP-inserted one does ("timestamp" : "2017-03-16 22:43:44.000"). Who knew?

String from document meets value of array

I've got an array of project IDs, for example:
[ 'ExneN3NdwmGPgRj5o', 'hXoRA7moQhqjwtaiY' ]
And in my Questions collection, I've got a field called 'project', which holds a project ID as a string. For example:
{
    "_id" : "XPRbFupkJPmrmvcin",
    "question" : "Vraag 13",
    "answer" : "photo",
    "project" : "ExneN3NdwmGPgRj5o",
    "datetime_from" : ISODate("2017-01-10T08:01:00Z"),
    "datetime_till" : ISODate("2017-01-10T19:00:00Z"),
    "createdAt" : ISODate("2017-01-10T08:41:39.950Z"),
    "notificationSent" : true
}
{
    "_id" : "EdFH6bo2xBPht5kYW",
    "question" : "sdfadsfasdf",
    "answer" : "text",
    "project" : "hXoRA7moQhqjwtaiY",
    "datetime_from" : ISODate("2017-01-11T11:00:00Z"),
    "datetime_till" : ISODate("2017-01-11T17:00:00Z"),
    "createdAt" : ISODate("2017-01-10T10:21:42.147Z"),
    "notificationSent" : false
}
Now I want to return all documents of the Questions collection where the project (id) is one of the values from the array.
To test if it's working, I'm first trying to return one document.
I'm logging it like this:
Questions.findOne({project: { $eq: projectArray }})['_id'];
but have also tried this:
Questions.findOne({project: { $in: [projectArray] }})['_id'];
But I keep getting 'undefined'.
Please try this.
// for fetching all docs with those ids
Questions.find({project: { $in: projectArray }})
// if you want just one doc
Questions.findOne({project: { $in: projectArray }})
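If it still comes back undefined, a minimal sketch of how the two calls differ in Meteor (assuming projectArray really is a plain array of project ID strings and the Questions collection is available where this code runs):
// find() returns a cursor; fetch() turns it into an array of matching documents
var matching = Questions.find({project: {$in: projectArray}}).fetch();
console.log(matching.length);
// findOne() returns a single document, or undefined when nothing matches,
// so guard before reading _id
var doc = Questions.findOne({project: {$in: projectArray}});
console.log(doc && doc._id);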

MongoDB: Check for missing documents using a model tree structures with an array of ancestors

I'm using a model tree structures with an array of ancestors and I need to check if any document is missing.
{
    "_id" : "GbxvxMdQ9rv8p6b8M",
    "type" : "article",
    "ancestors" : [ ]
}
{
    "_id" : "mtmTBW8nA4YoCevf4",
    "parent" : "GbxvxMdQ9rv8p6b8M",
    "ancestors" : [
        "GbxvxMdQ9rv8p6b8M"
    ]
}
{
    "_id" : "J5Dg4fB5Kmdbi8mwj",
    "parent" : "mtmTBW8nA4YoCevf4",
    "ancestors" : [
        "GbxvxMdQ9rv8p6b8M",
        "mtmTBW8nA4YoCevf4"
    ]
}
{
    "_id" : "tYmH8fQeTLpe4wxi7",
    "refType" : "reference",
    "parent" : "J5Dg4fB5Kmdbi8mwj",
    "ancestors" : [
        "GbxvxMdQ9rv8p6b8M",
        "mtmTBW8nA4YoCevf4",
        "J5Dg4fB5Kmdbi8mwj"
    ]
}
My attempt would be to check whether each ancestor id exists. If this check fails, that document is missing and the data structure is corrupted.
let missing = [];
Collection.find().forEach(r => {
    if (r.ancestors) {
        r.ancestors.forEach(a => {
            if (!Collection.findOne(a))
                missing.push(r._id);
        });
    }
});
But doing it like this will need MANY db calls. Is it possible to optimize this?
Maybe I could get an array with all unique ancestor ids first and check whether these documents exist within one db call?
First take out all distinct ancestors from your collection.
var allAncestorIds = db.<collectionName>.distinct("ancestors");
Then find which of those ancestor IDs have no matching document in the collection.
var existingIds = db.<collectionName>.find({_id : {$in : allAncestorIds}}, {_id : 1}).map(function (doc) { return doc._id; });
Iterate the ancestor IDs and insert every one that has no existing document into a missing collection.
allAncestorIds.forEach(function (ancestorId) {
    if (existingIds.indexOf(ancestorId) === -1)
        db.missing.insert({_id : ancestorId});
});

mongodb - mapreducing two collections where one collection has ids in an array of array

I'm very new to MongoDB and am having some problems joining two collections.
I've read some posts on using mapReduce to perform a NoSQL-style join, but I'm still having some difficulties here.
Collection 1: attraction
{
    "_id" : "0001333b-e485-4fee-a0e2-9b7dc338d5a2",
    "types" : "Shops",
    "name" : "name",
    "geo_location" : {
        "lat" : 36.0567700000000002,
        "lon" : -112.1354520000000008
    },
    "overall_rating" : 10.0000000000000000,
    "num_of_review" : 6,
    "review" : [
        {
            "review_ids" : [
                "66ea1cd8-da34-40dc-8ad6-f30df5de9c2c",
                "76f51c8d-d2a8-4609-8b7c-c2b0c386e35c",
                "185c962a-fcfe-4d03-a3ac-86398be6312a",
                "2212535b-28c6-423e-91f7-cc1dfb407d79",
                "7e0f1d85-e79e-4bec-9e9c-7dfb03223816",
                "f19a83a6-c6ef-4cbe-b90d-f6187bd50baa"
            ]
        }
    ]
}
Collection 2: attraction_review
{
    "_id" : "7e0f1d85-e79e-4bec-9e9c-7dfb03223816",
    "user_id" : "somename",
    "review_id" : "r122796525",
    "unified_id" : "0001333b-e485-4fee-a0e2-9b7dc338d5a2",
    "source_id" : "d1057961",
    "review_url" : "someURL",
    "title" : "some title",
    "overall_rating" : 10,
    "review_date" : "dates",
    "content" : "some contents here",
    "source" : "source",
    "traval_date" : "dates",
    "sort" : ""
}
Basically I need to keep (or copy) the reviews in attraction_review whose _id appears in the review_ids array of the attraction collection.
In the example above, the matching review is the one with _id "7e0f1d85-e79e-4bec-9e9c-7dfb03223816".
It is guaranteed that the attraction_review collection contains every id in the review_ids arrays for all records in the attraction collection.
The difficulty here is that the review_ids array is nested inside the review array, and I am not sure how I would go about mapping that many ids.
I would be grateful for some suggestions.
Many thanks
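One way to sketch this in the mongo shell without mapReduce: collect the referenced review ids with distinct (which flattens both the outer review array and the inner review_ids array) and copy the matching reviews with $in. The target collection name attraction_review_matched below is only an example.
// collect every review id referenced in attraction.review.review_ids
var reviewIds = db.attraction.distinct("review.review_ids");
// copy each matching review into a separate (example) collection
db.attraction_review.find({_id: {$in: reviewIds}}).forEach(function (review) {
    db.attraction_review_matched.insert(review);
});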

update nth document in a nested array document in mongodb

I need to update a document in an array inside another document in MongoDB.
{
    "_id" : ObjectId("51cff693d342704b5047e6d8"),
    "author" : "test",
    "body" : "sdfkj dsfhk asdfjad ",
    "comments" : [
        {
            "author" : "test",
            "body" : "sdfkjdj\r\nasdjgkfdfj",
            "email" : "test#tes.com"
        },
        {
            "author" : "hola",
            "body" : "sdfl\r\nhola \r\nwork here"
        }
    ],
    "date" : ISODate("2013-06-30T09:12:51.629Z"),
    "permalink" : "jaiho",
    "tags" : [
        "jaiho"
    ],
    "title" : "JAiHo"
}
Q1) Update email of 0th element of comments array
db.posts.update({"permalink" : "haha"},{$set:{"comments.0.email":1}})
This doesn't throw any exception, but it doesn't update anything either.
Q2) Add a field number_likes to the nth element of the comments array
db.posts.update({"permalink" : "haha"},{$inc:{"comments.0.num_likes":1}})
Doesn't work either.
Am I missing something here?
Q1: If you update with permalink 'jaiho' instead of 'haha', it most certainly updates the email;
> db.posts.update({"permalink" : "jaiho"},{$set:{"comments.0.email":1}})
> db.posts.find()
..., "email" : 1 },...
Q2: Same goes for this $inc;
> db.posts.update({"permalink" : "jaiho"},{$inc:{"comments.0.num_likes":1}})
> db.posts.find()
..., "num_likes" : 1 },...
If you are trying to do it dynamically in Node.js, the following should work.
var i = 0;
var selector = {};
var operator = {};
selector['comments.' + i + '.num_likes'] = 1; // {'comments.0.num_likes' : 1}
operator['$inc'] = selector;                  // {'$inc' : {'comments.0.num_likes' : 1}}
db.posts.update({'permalink' : 'xyz'}, operator);
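If you would rather not hard-code the array index at all, the positional $ operator can address whichever comment matched the query. A small sketch (matching on the comment author here is just an example):
// match the post and the specific comment, then let $ point at the matched
// comment's position inside the comments array
db.posts.update(
    {"permalink" : "jaiho", "comments.author" : "hola"},
    {$inc : {"comments.$.num_likes" : 1}}
);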
