MongoDB - NodeJS - Mongoose Slow Query Array

I have a question about MongoDB and Node.js. A project I am currently working on has performance issues. It does work, but as far as I can see it mixes several approaches, and I would like input from someone more experienced.
I will share the flow along with the libraries and tools used. I have been trying to understand it for days, and as far as I can see there might be a better way to do it, but it would probably still be slow and inefficient.
Tools and libraries used: lodash, mongoose, async, and others.
These are the results from New Relic:
Database MongoDB preferences toArray 19.8 13.6 423ms
Database MongoDB spotifies toArray 19.4 12.6 415ms
Database MongoDB mifi toArray 19.2 12.6 409ms
Database MongoDB fifi toArray 18.9 12.6 404ms
Database MongoDB locations toArray 14.3 18.5 305ms
A sample of what is added to the model all the time:
const subObjectsOutCards = [
  { name: 'preferencesModel', model: 'Preferences' },
  { name: 'spotifyModel',     model: 'Spotify' },
  { name: 'MifiModel',        model: 'mifi' },
  { name: 'fiModel',          model: 'Fifi' }
];
Next One:
exports.fetchUsers = (callback) => {
  Account.find().populate('userTrips').exec((err, docs) => {
    if (_.isEmpty(docs)) {
      callback(null, null);
    } else {
      async.map(docs, (doc, callback) => {
        var object = doc.toObject();
        exports.fetchSubObjectsToUserForCards(object, doc._id, (result) => {
          callback(null, result);
        });
      }, (err, results1) => {
        var updatedAccounts = results1;
        async.map(updatedAccounts, (doc, callback) => {
          locationController.getMostRecentLocationById(doc._id, (result) => {
            var updatedDoc = doc;
            updatedDoc.locationModel = result;
            callback(null, updatedDoc);
          });
        }, (err, results2) => {
          callback(results2);
        });
      });
    }
  });
};
This is the function it calls next:
exports.fetchObjectSub = (account, userId, callback) => {
  var newAccount = account;
  var itemsProcessed = 0;
  // one find() per sub-collection, per user
  subObjectsOut.forEach(function (object) {
    mongoose.model(object.model).find({ 'accountId': userId }, function (err, doc) {
      if (err) {
        callback(false);
      } else {
        if (!_.isNil(doc[0]) && newAccount !== false) {
          doc = doc[0].toObject();
          newAccount[object.name] = doc;
          itemsProcessed++;
          // callback fires only once every sub-collection returned a document
          if (itemsProcessed === subObjectsOut.length) {
            callback(newAccount);
          }
        } else {
          itemsProcessed++;
        }
      }
    });
  });
};
Account Example:
{
  "_id": "59f372389f89d1cb0dbabdbad",
  "residence": "Katowice, Poland",
  "orientation": "Bisexual",
  "lastName": "Kowalczyk",
  "job": "Web Developer @ Freelance",
  "gender": "Female",
  "firstName": "Marina",
  "dob": "7/10/1998",
  "about": "Computer science student",
  "lastConnection": "2017-11-24T19:16:28.780Z",
  "created_at": "2017-10-27T20:55:06.070Z",
  "phone": {
    "number": "7183173136"
  },
  "profilePicture": {
    "url": "https://.JPG",
    "pictureType": ".JPG"
  },
  "interests": [
    "Sports", "Food", "Cycling", "Running", "Cooking",
    "Movies", "Fashion", "Business", "Travel", "Music",
    "Theatre", "Yoga", "Party", "Dancing", "Reading"
  ],
  "__v": 38
}
Preferences Example:
{
  "_id": "59a83258c7fd5b4ae586c53b",
  "visibilityLocation": true,
  "visibilityGenderPreferences": true,
  "visibilityFb": false,
  "visibilityDistance": false,
  "visibilityAge": false,
  "showMyProfileAs": "Male",
  "showMe": "Males",
  "locationAccuracy": 0,
  "accountId": "59f372389f89d1cb0dbabdbad",
  "created_at": "2017-08-31T13:29:42.462Z",
  "distance": {
    "max": 30,
    "metrics": "K",
    "min": 0
  },
  "ageRange": {
    "max": 32,
    "min": 18
  },
  "__v": 0
}
EDIT: As requested.
Indexes: the automatic one on _id, plus an index on accountId in each sub-collection referenced from the main one.
The operation is performed on 1000 elements and it is really slow: the speeds I get are 3000 ms to 9000 ms, and on average the monitoring reports about 5M ms spent on these documents, which is insane...
Example Account and Preferences documents can be found above, before the edit.
When I started with this I immediately assumed our filters were to blame, but that seems not to be the case, since those are pretty fast. The problem comes from the code above, and the tools confirm it.
The idea behind the schema, as far as I can see, is to keep each document flat and always use separate collections instead of references to the other collections. Because there are a lot of filters, for each person a separate query has to run against each of their five collections, fetching their account along the way, attaching each result as a field, and building a new document. And all of these queries wait for each other all the time. So 5 users means 5 new documents, and then for each of those there was another loop and more new documents. If anybody can help a bit with this, that would be great. Thank you.
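For illustration, a minimal sketch (not from the original thread) of how the per-user sub-document queries could be batched with $in so each sub-collection is hit once for all accounts; the model names and the accountId field come from the snippets above, but the exact merge logic is an assumption:
// Sketch: batch the sub-document lookups instead of querying once per user.
// Assumes the subObjectsOutCards array and accountId index described above.
exports.fetchUsersBatched = (callback) => {
  Account.find().populate('userTrips').lean().exec((err, accounts) => {
    if (err) return callback(err);
    const ids = accounts.map(a => a._id);
    async.map(subObjectsOutCards, (sub, cb) => {
      // one query per sub-collection for ALL users, not one per user
      mongoose.model(sub.model).find({ accountId: { $in: ids } }).lean().exec(cb);
    }, (err, resultsPerModel) => {
      if (err) return callback(err);
      resultsPerModel.forEach((docs, i) => {
        const byAccount = _.keyBy(docs, 'accountId');
        accounts.forEach(a => {
          if (byAccount[a._id]) a[subObjectsOutCards[i].name] = byAccount[a._id];
        });
      });
      callback(null, accounts);
    });
  });
};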

Related

PouchDB view returning empty result

Good morning,
I'm currently working with CouchDB and PouchDB and I'm having a problem with one query on the PouchDB side.
I have a database with documents set up like this:
{
  "_id": "fd87b66087503d760fa501fa49029f94",
  "_rev": "1-e2be19d447c98d624c2c8492eaf0a3f4",
  "type": "product",
  "name": "Blanc de Morgex et de la Salle Brut Extreme 2014",
  "category": "Wine",
  "subcategory": null,
  "zone": "Italy",
  "nation": "Valle d'Aosta",
  "province": "Morgex, AO",
  "cellar": "Cave Mont Blanc",
  "price": 30,
  "structure": null,
  "year": 2014,
  "mescita": null,
  "tag": null
}
The query I wrote should return the available years of products that match some filters. This is the view's map function, used with reduce: _count:
function (doc) {
  if (doc.category && doc.type == 'product' && doc.year != null) {
    emit(doc.year, 1);
  }
}
If I try it with Postman, adding the group=true parameter, everything works and the result is something like:
{
  "rows": [
    { "key": 2004, "value": 2 },
    { "key": 2006, "value": 2 },
    { "key": 2008, "value": 2 }
  ]
}
The problem is when I run this view with PouchDB using the following code, which returns a JSON with an empty rows array:
wine_db.query('wine_list/years', {reduce: '_count', key: "Bollicine", group: true, group_level: 2}).then(function (doc) {
  years_list = doc;
  console.log('getting year list');
  console.log(doc);
}).catch(function (err) {
  console.log(err);
});
I've tried playing a little with the parameters of the function, and even changing the function to return just a list of all the years, but no luck.
I can't find the problem, nor a different solution, so I'm open to any suggestion you may have.
Another solution (grouped result)
Working from the hints in the solution suggested by @user3405291, I finally found a way to group the results by year.
Since the map function emits a complex key (['CATEGORY', YEAR]), I can use the startkey and endkey parameters to query just a section of the returned index, while keeping the reduce function enabled to group the results.
In the end the view function is:
function (doc) {
  if (doc.category && doc.type == 'product' && doc.year) {
    emit([doc.category, doc.year], doc.year);
  }
}
And the PouchDB query:
wine_db.query('wine_list/years', {
  startkey: ['CATEGORY'],
  endkey: ['CATEGORY', {}],
  group: true
}).then(function (doc) {
  years_list = doc;
  console.log(years_list);
}).catch(function (err) {
  console.log(err);
});
The result, where value is the total number of elements with that index:
{
  "rows": [
    { "key": ["Bollicine", 2004], "value": 2 },
    { "key": ["Bollicine", 2006], "value": 2 },
    { "key": ["Bollicine", 2008], "value": 2 }
  ]
}
In your view's map function you emit the year as the key/index:
emit(doc.year, 1);
Now, I'm not sure why you are doing your query with a key like {key: "Bollicine"}:
wine_db.query('wine_list/years', {key: "Bollicine"})
  .then(res => {console.log(res)})
Of course you get an empty response, because your view is actually indexing your docs by year. I think you might want to query with a key like {key: "2014"}.
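An added caveat, hedged: the sample documents store year as a number and the map function emits doc.year unchanged, so the key likely needs to be numeric rather than the string "2014", e.g.:
// Hypothetical query matching the numeric year emitted by the view:
wine_db.query('wine_list/years', {key: 2014, group: true})
  .then(res => console.log(res))
  .catch(err => console.log(err));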
UPDATE
Based on your comments, I feel like you need to find docs based on both year and category. I'm not sure I understand what you want, but this may help: change your view's map function like this:
function (doc) {
  if (doc.category && doc.type == 'product' && doc.year) {
    emit([doc.year, doc.category], 1);
  }
}
The above view will index your docs according to both year and category. You then query your view like this:
wine_db.query('wine_list/years', {key : ['2014', 'Bollicine']})
.then(res=>{console.log(res)})
The above query will give you all the docs with year field equal to 2014 and category field equal to Bollicine.
Second Update
"Your code works, but I just get the result for the year 2014. What I'm trying to accomplish is to get all the available years given a specific category."
One solution is this:
function (doc) {
  if (doc.category && doc.type == 'product' && doc.year) {
    emit(doc.category, doc.year);
  }
}
The above view will index your docs according to category as key and will return the year as value. Therefore you can query like this:
wine_db.query('wine_list/years', {key : 'Bollicine'})
.then(res=>{console.log(res)})
You should get a response like this, by which you have all the available years for Bollicine category:
{
  "total_rows": 400,
  "offset": 0,
  "rows": [
    { "key": "Bollicine", "value": "2014" },
    { "key": "Bollicine", "value": "2015" },
    { "key": "Bollicine", "value": "2018" }
  ]
}

Cloudant DB query in Node-RED function

I want to load my tweets from the Cloudant DB in ascending order. I thought using sort: "tweet.id" would work, but it doesn't.
msg.payload = {
  "query": "*:*",
  limit: 6,
  sort: "tweet.id"
};
return msg;
Node-RED flow: (screenshot not included)
I got this to work by creating a new Cloudant Query index in the Cloudant dashboard:
{
  "index": {
    "fields": ["tweet.timestamp_ms"]
  },
  "type": "json"
}
to index the tweet.timestamp_ms field. This can then be queried to return the data in timestamp order:
{
  "selector": {
    "_id": {
      "$gt": 0
    }
  },
  "sort": [
    {
      "tweet.timestamp_ms": "asc"
    }
  ]
}
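For completeness, a sketch (an assumption, not from the original answer) of issuing that query from a Node-RED function node feeding the Cloudant node, mirroring the msg.payload shape used in the question:
// Assumes the Cloudant node accepts a Cloudant Query selector in msg.payload:
msg.payload = {
  selector: { "_id": { "$gt": 0 } },
  sort: [{ "tweet.timestamp_ms": "asc" }],
  limit: 6
};
return msg;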
I solved this issue by adding the type of the variable.
What I was trying to do is get all the documents with id=1 and then sort them by the attribute "nombre", which is a string.
My search index is:
function (doc) {
  index("id", doc.id, {"store": true});
  index("nombre", doc.nombre, {"store": true});
}
And the payload in Node-RED:
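The original screenshot of the payload is not included; what follows is a hedged reconstruction from the description above (Cloudant search sorts annotate the field with its type, e.g. nombre<string>):
// Hypothetical reconstruction of the missing payload:
msg.payload = {
  query: "id:1",
  sort: "\"nombre<string>\""
};
return msg;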

Updating a JSON array in AWS DynamoDB

My document looks like this:
{
  "data": {
    "eventId": "20161029125458-df-d",
    "name": "first",
    "purpose": "test",
    "location": "yokohama",
    "dateArray": [],
    "attendees": [
      {
        "attendeeId": "2016102973634-df",
        "attendeeName": "lakshman",
        "personalizedDateSelection": {}
      },
      {
        "attendeeId": "2016102973634-tyyu",
        "attendeeName": "diwaakar",
        "personalizedDateSelection": {}
      }
    ]
  }
}
Say I need to update the attendee entry with attendeeId 2016102973634-df. I tried many ways using update and condition expressions, but with no success.
Here is my attempt:
const params = {
  TableName: "event",
  Key: {
    "eventId": eventId
  },
  UpdateExpression: "SET attendees[???] = ",
  ConditionExpression: attendees.attendeeId = "2016102973634-df",
  ExpressionAttributeValues: {
    ":attendee": attendeeList
  },
  ReturnValues: "ALL_NEW"
};
dynamo.update(params, (err, data) => {
  if (err) {
    return reject(err);
  }
  console.log(data.Attributes);
});
dynamo.update(params, (err, data) => {
if (err) {
return reject(err);
}
console.log(data.Attributes);
});
I could not find any resources on updating a JSON object inside an array.
After @notionquest's comment:
I have not used any JsonMarshaller. Initially I added an empty array for the attendees field, like this:
{
  "eventId": "20161029125458-df-d",
  "name": "first",
  "purpose": "test",
  "location": "yokohama",
  "dateArray": [],
  "attendees": []
}
Then, when a new attendee comes, I add it to the attendees property like this:
const attendee = {
  "attendeeName": "user1",
  "personalizedDateSelection": {"today": "free"}
};
const attendeeList = [attendee];
const eventId = "20161029125458-df-d";
const params = {
  TableName: "event",
  Key: {
    "eventId": eventId
  },
  UpdateExpression: "SET attendees = list_append(attendees, :attendee)",
  ExpressionAttributeValues: {
    ":attendee": attendeeList
  },
  ReturnValues: "ALL_NEW"
};
dynamo.update(params, (err, data) => {
  if (err) {
    return reject(err);
  }
  console.log("in update dynamo");
  console.log(data.Attributes);
});
As you can see in the snippets above, I initially add an empty [] array and then append new attendees with the code above. Now, how do I update a specific JSON object in the array? If you say that is not possible, what else can I try?
Should I try this instead (a sketch follows after the list)?
Get the full JSON.
Manipulate the JSON and change what I want in Node.js.
Update DynamoDB with the new JSON.
But this takes two calls to DynamoDB, which seems inefficient. Is there any way around this?
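A minimal sketch of that two-call read-modify-write (my illustration; dynamo is assumed to be a DocumentClient as in the snippets above):
// Hypothetical read-modify-write: fetch the item, edit the attendee in
// Node.js, then write the whole attendees list back.
dynamo.get({ TableName: "event", Key: { eventId } }, (err, data) => {
  if (err) return reject(err);
  const attendees = data.Item.attendees.map(a =>
    a.attendeeId === "2016102973634-df"
      ? Object.assign({}, a, { attendeeName: "karthik" })
      : a);
  dynamo.update({
    TableName: "event",
    Key: { eventId },
    UpdateExpression: "SET attendees = :a",
    ExpressionAttributeValues: { ":a": attendees }
  }, (err2, res) => {
    if (err2) return reject(err2);
    console.log(res);
  });
});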
You can store the index of each list element and use it when updating the list. For example:
{
  "data": {
    "eventId": "20161029125458-df-d",
    "name": "first",
    "purpose": "test",
    "location": "yokohama",
    "dateArray": [],
    "attendees": [
      {
        "index": 0,
        "attendeeId": "2016102973634-df",
        "attendeeName": "lakshman",
        "personalizedDateSelection": {}
      },
      {
        "index": 1,
        "attendeeId": "2016102973634-tyyu",
        "attendeeName": "diwaakar",
        "personalizedDateSelection": {}
      }
    ]
  }
}
const params = {
  TableName: "event",
  Key: {
    "eventId": eventId
  },
  // The array index must appear as a literal in the expression string,
  // so interpolate the stored index when building it:
  UpdateExpression: `SET attendees[${attendee.index}].attendeeName = :value`,
  ExpressionAttributeValues: {
    // with the DocumentClient, plain values are used (no {"S": ...} wrapper)
    ":value": "karthik"
  },
  ReturnValues: "ALL_NEW"
};
dynamo.update(params, (err, data) => {
  if (err) {
    return reject(err);
  }
  console.log(data.Attributes);
});
An example of an update query:
Data structure (saved in DynamoDB)
{
  tenant_id: 'tenant_1',
  users: {
    user1: {
      _id: 'user1',
      email_address: 'test_email_1@gmail.com'
    },
    user2: {
      _id: 'user2',
      email_address: 'test_email_2@gmail.com'
    }
  }
}
Data for update (used in the params)
var user = {
  email_address: 'updated@gmail.com'
}
Params
var params = {
  TableName: 'tenant-Master',
  Key: {
    "tenant_id": 'tenant_1'
  },
  UpdateExpression: "set #users.user1 = :value",
  ExpressionAttributeNames: {
    "#users": "users"
  },
  ExpressionAttributeValues: {
    ":value": user,
  },
};
Explanation
By switching from an array of maps to a map of maps, we can now use UpdateExpression: "set #users.user1 = :value" to update our nested object in the users map at the id user1.
NOTE: This method, as is, will REPLACE the entire map object at users.user1. Some changes will be needed if you want to keep pre-existing data.
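For instance, a hedged variation (my illustration, not part of the original answer) that updates a single nested field so the rest of user1 is preserved:
var params = {
  TableName: 'tenant-Master',
  Key: { "tenant_id": 'tenant_1' },
  // Only email_address is replaced; other fields of user1 survive.
  UpdateExpression: "set #users.user1.email_address = :value",
  ExpressionAttributeNames: { "#users": "users" },
  ExpressionAttributeValues: { ":value": 'updated@gmail.com' }
};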
I could not find any way to query and update a JSON array. I suspect AWS has a profit motive for not allowing it: if you need to query on a particular ID other than the primary key, you need to create a secondary index, which is billed in addition to the DynamoDB table cost.
Since I did not want to pay extra bucks for a secondary index, I changed my DynamoDB schema to the following:
{
  "data": {
    "eventId": "20161029125458-df-d",
    "name": "first",
    "purpose": "test",
    "location": "yokohama",
    "dateArray": [],
    "attendees": {
      "2016102973634-df": {
        "attendeeId": "2016102973634-df",
        "attendeeName": "lakshman",
        "personalizedDateSelection": {}
      },
      "2016102973777-df": {
        "attendeeId": "2016102973777-df",
        "attendeeName": "ffff",
        "personalizedDateSelection": {}
      }
    }
  }
}
Changing attendees from [] to {} gives me the flexibility to query a particular attendeeId and change the entire JSON associated with it. Even though this is a redundant step, I do not want to spend extra bucks on my hobby project.
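A minimal sketch of an update against that map-based schema (my own illustration, assuming the DocumentClient from the earlier snippets and that attendees is a top-level attribute, as in the earlier update calls):
const params = {
  TableName: "event",
  Key: { "eventId": "20161029125458-df-d" },
  // '#aid' stands in for the attendee id, which is not a valid
  // identifier inside an update expression:
  UpdateExpression: "SET attendees.#aid.attendeeName = :name",
  ExpressionAttributeNames: { "#aid": "2016102973634-df" },
  ExpressionAttributeValues: { ":name": "lakshman updated" }
};
dynamo.update(params, (err, data) => {
  if (err) return console.error(err);
  console.log(data);
});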

Update array of subdocuments in MongoDB

I have a collection of students that have a name and an array of email addresses. A student document looks something like this:
{
  "_id": {"$oid": "56d06bb6d9f75035956fa7ba"},
  "name": "John Doe",
  "emails": [
    {
      "label": "private",
      "value": "private@johndoe.com"
    },
    {
      "label": "work",
      "value": "work@johndoe.com"
    }
  ]
}
The label in the email subdocument is set to be unique per document, so there can't be two entries with the same label.
My problem is that, when updating a student document, I want to achieve the following:
- adding an email with a new label should simply add a new subdocument with the given label and value to the array
- adding an email with a label that already exists should set the value of the existing entry to the data from the update
For example when updating with the following data:
{
  "_id": {"$oid": "56d06bb6d9f75035956fa7ba"},
  "emails": [
    {
      "label": "private",
      "value": "me@johndoe.com"
    },
    {
      "label": "school",
      "value": "school@johndoe.com"
    }
  ]
}
I would like the result of the emails array to be:
"emails": [
{
"label": "private",
"value": "me#johndoe.com"
},
{
"label": "work",
"value": "work#johndoe.com"
},
{
"label": "school",
"value": "school#johndoe.com"
}
]
How can I achieve this in MongoDB (optionally using mongoose)? Is this at all possible or do I have to check the array myself in the application code?
You could try this update, though it is only efficient for small datasets:
mongo shell:
var data = {
  "_id": ObjectId("56d06bb6d9f75035956fa7ba"),
  "emails": [
    {
      "label": "private",
      "value": "me@johndoe.com"
    },
    {
      "label": "school",
      "value": "school@johndoe.com"
    }
  ]
};
data.emails.forEach(function (email) {
  var emails = db.students.findOne({_id: data._id}).emails,
      query = { "_id": data._id },
      update = {};
  emails.forEach(function (e) {
    if (e.label === email.label) {
      query["emails.label"] = email.label;
      update["$set"] = { "emails.$.value": email.value };
    } else {
      update["$addToSet"] = { "emails": email };
    }
    db.students.update(query, update);
  });
});
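As a side note (an illustration added here, not from the original answer), the same per-email upsert can be written as two targeted statements with the positional operator, avoiding the extra findOne; a sketch for the mongo shell:
data.emails.forEach(function (email) {
  // update the entry if an element with this label already exists...
  var res = db.students.updateOne(
    { _id: data._id, "emails.label": email.label },
    { $set: { "emails.$.value": email.value } }
  );
  // ...otherwise push it as a new subdocument
  if (res.matchedCount === 0) {
    db.students.updateOne(
      { _id: data._id },
      { $push: { emails: email } }
    );
  }
});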
Suggestion: refactor your data to use the "label" as an actual field name.
There is one straightforward way in which MongoDB can guarantee unique values for a given email label - by making the label a single separate field in itself, in an email sub-document. Your data needs to exist in this structure:
{
  "_id": ObjectId("56d06bb6d9f75035956fa7ba"),
  "name": "John Doe",
  "emails": {
    "private": "private@johndoe.com",
    "work": "work@johndoe.com"
  }
}
Now, when you want to update a student's emails you can do an update like this:
db.students.update(
  {"_id": ObjectId("56d06bb6d9f75035956fa7ba")},
  {$set: {
    "emails.private": "me@johndoe.com",
    "emails.school": "school@johndoe.com"
  }}
);
And that will change the data to this:
{
  "_id": ObjectId("56d06bb6d9f75035956fa7ba"),
  "name": "John Doe",
  "emails": {
    "private": "me@johndoe.com",
    "work": "work@johndoe.com",
    "school": "school@johndoe.com"
  }
}
Admittedly there is a disadvantage to this approach: you will need to change the structure of the input data, from the emails being in an array of sub-documents to the emails being a single sub-document of single fields. But the advantage is that your data requirements are automatically met by the way that JSON objects work.
After investigating the different options posted, I decided to go with my own approach of doing the update manually in the code, using lodash's unionBy() function. Using Express and Mongoose's findById(), that basically looks like this:
Student.findById(req.params.id, function (err, student) {
  if (req.body.name) student.name = req.body.name;
  if (req.body.emails && req.body.emails.length > 0) {
    student.emails = _.unionBy(req.body.emails, student.emails, 'label');
  }
  student.save(function (err, result) {
    if (err) return next(err);
    res.status(200).json(result);
  });
});
This way I get the full flexibility of partial updates for all fields. Of course you could also use findByIdAndUpdate() or other options.
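A note added for clarity (not from the original post): _.unionBy keeps the first occurrence of each label, which is why req.body.emails is passed first, letting incoming values win over existing ones:
_.unionBy(
  [{ label: 'private', value: 'me@johndoe.com' }],          // incoming
  [{ label: 'private', value: 'private@johndoe.com' },
   { label: 'work', value: 'work@johndoe.com' }],           // existing
  'label'
);
// → [ { label: 'private', value: 'me@johndoe.com' },
//     { label: 'work', value: 'work@johndoe.com' } ]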
Alternate approach:
However, changing the schema the way Vince Bowdren suggested, making each label a separate field in an email subdocument, is also a viable option. In the end it just depends on your personal preferences and whether you need strict validation of your data or not.
If you are using mongoose like I do, you would have to define a separate schema like so:
var EmailSchema = new mongoose.Schema({
  work: { type: String, validate: validateEmail },
  private: { type: String, validate: validateEmail }
}, {
  strict: false,
  _id: false
});
In the schema you can define properties for the labels you already want to support, and add validation. By setting the strict: false option, you allow users to also post emails with custom labels. Note, however, that these would not be validated; you would have to validate them manually in your application, similar to the way I merged the emails in my approach above.

Generate query results based on tags in PouchDB

I'm new to NoSQL but have decided to use PouchDB for an Angular application I am creating.
There are going to be a series of questions (about 1000 in total), each of which has its own tags. Each object shouldn't have more than 6 or 7 tags. Example data:
{
  "text": "Question?",
  "answers": [
    { "text": "Yes", "correct": true },
    { "text": "No", "correct": false }
  ],
  "tags": ["tag1", "tag3"]
},
{
  "text": "Question?",
  "answers": [
    { "text": "Yes", "correct": true },
    { "text": "No", "correct": false }
  ],
  "tags": ["tag2", "tag3"]
}
I'm at a total loss on how to query the db to retrieve only questions that have "tag2", or questions that have both "tag1" and "tag3".
I came across the question at How to query PouchDB with SQL-like operators but can't seem to wrap my head around how it works. I tried to modify it based on my data, but I always get 0 results when querying the database.
I guess my biggest struggle is comparing it to SQL when it isn't SQL. Does anyone know how I can go about creating a query based on specific tags?
Yup, you create a map/reduce query like this:
// document that tells PouchDB/CouchDB
// to build up an index on tags
var ddoc = {
  _id: '_design/my_index',
  views: {
    my_index: {
      map: function (doc) {
        doc.tags.forEach(function (tag) {
          emit(tag);
        });
      }.toString()
    }
  }
};
// save it
pouch.put(ddoc).then(function () {
  // success!
}).catch(console.log.bind(console));
Then you query it:
pouch.query('my_index', {key: myTag, include_docs: true}).then(function (res) {
  // got a result
}).catch(console.log.bind(console));
If you want to find multiple tags, you can just use keys instead of key.
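A hedged example of the keys variant (an added illustration): note it has OR semantics, returning one row per matching tag, so an AND of several tags still needs de-duplication in application code:
pouch.query('my_index', {keys: ['tag1', 'tag3'], include_docs: true})
  .then(function (res) {
    // res.rows contains one row per (doc, matching tag) pair
    console.log(res.rows.map(function (row) { return row.doc; }));
  }).catch(console.log.bind(console));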
BTW this will be easier in the future when I add $elemMatch and $in to pouchdb-find.
