MongoDB Array Query Performance

I'm trying to figure out the best schema for a dating-site-like app. Users have a listing (possibly many), and they can view other users' listings to 'like' or 'dislike' them.
Currently I'm just storing the other person's listing id in a likedBy and dislikedBy array. When a user 'likes' a listing, their listing id is put into that listing's likedBy array. However, I would now like to track the timestamp at which a user likes a listing. This would be used for a user's 'history list' or for data analysis.
I would need to do two separate queries:
1. Find all active listings that this user has not liked or disliked before.
2. For a user's history of 'liked'/'disliked' choices, find all the listings user X has liked in chronological order.
My current schema is:
listings
_id: 'sdf3f'
likedBy: ['12ac', 'as3vd', 'sadf3']
dislikedBy: ['asdf', 'sdsdf', 'asdfas']
active: bool
Could I do something like this?
listings
_id: 'sdf3f'
likedBy: [{'12ac', date: Date}, {'ds3d', date: Date}]
dislikedBy: [{'s12ac', date: Date}, {'6fs3d', date: Date}]
active: bool
I was also thinking of making a new collection for choices.
choices
Id
userId // id of current user making the choice
userlistId // listing of the user making the choice
listingChoseId // the listing they chose yes/no
type
date
I'm not sure of the performance implications of having these choices in another collection when doing the find all active listings that this user has not liked or disliked before.
Any insight would be greatly appreciated!

Well, you presumably embedded these in the "listings" documents because that suits your other usage patterns beyond the cases presented here. With that in mind, there is no reason to throw that away.
To clarify though, the structure you seem to want is something like this:
{
"_id": "sdf3f",
"likedBy": [
{ "userId": "12ac", "date": ISODate("2014-04-09T07:30:47.091Z") },
{ "userId": "as3vd", "date": ISODate("2014-04-09T07:30:47.091Z") },
{ "userId": "sadf3", "date": ISODate("2014-04-09T07:30:47.091Z") }
],
"dislikedBy": [
{ "userId": "asdf", "date": ISODate("2014-04-09T07:30:47.091Z") },
{ "userId": "sdsdf", "date": ISODate("2014-04-09T07:30:47.091Z") },
{ "userId": "asdfas", "date": ISODate("2014-04-09T07:30:47.091Z") }
],
"active": true
}
Which is all well and fine, except that there is one catch. Because this content lives in two array fields, you cannot create a single index over both of them: only one array (multikey) field can be included in a compound index.
So, to solve the problem of your first query not being able to use an index, structure the document like this instead:
{
"_id": "sdf3f",
"votes": [
{
"userId": "12ac",
"type": "like",
"date": ISODate("2014-04-09T07:30:47.091Z")
},
{
"userId": "as3vd",
"type": "like",
"date": ISODate("2014-04-09T07:30:47.091Z")
},
{
"userId": "sadf3",
"type": "like",
"date": ISODate("2014-04-09T07:30:47.091Z")
},
{
"userId": "asdf",
"type": "dislike",
"date": ISODate("2014-04-09T07:30:47.091Z")
},
{
"userId": "sdsdf",
"type": "dislike",
"date": ISODate("2014-04-09T07:30:47.091Z")
},
{
"userId": "asdfas",
"type": "dislike",
"date": ISODate("2014-04-09T07:30:47.091Z")
}
],
"active": true
}
This allows an index that covers this form:
db.post.ensureIndex({
"active": 1,
"votes.userId": 1,
"votes.date": 1,
"votes.type": 1
})
Actually, you will probably want a few indexes to suit your usage patterns, but the point is that you can now have indexes that your queries can use.
Covering the first case you have this form of query:
db.post.find({ "active": true, "votes.userId": { "$ne": "12ac" } })
That makes sense considering that you clearly are not going to have both a like and a dislike for each user. Given the order of that index, at least active can be used to filter; the negating condition still needs to scan everything else. There is no way around that with any structure.
For the other case you probably want the userId to be in an index before the date and as the first element. Then your query is quite simple:
db.post.find({ "votes.userId": "12ac" })
.sort({ "votes.userId": 1, "votes.date": 1 })
You may be wondering whether you have lost something here: getting the count of "likes" and "dislikes" used to be as easy as testing the size of the array, and now it is a little different. That is not a problem, since it can be solved using aggregate:
db.post.aggregate([
{ "$unwind": "$votes" },
{ "$group": {
"_id": {
"_id": "$_id",
"active": "$active"
},
"likes": { "$sum": { "$cond": [
{ "$eq": [ "$votes.type", "like" ] },
1,
0
]}},
"dislikes": { "$sum": { "$cond": [
{ "$eq": [ "$votes.type", "dislike" ] },
1,
0
]}}
}}
])
So whatever your actual usage, you can keep any important parts of the document in the grouping _id and then count the "likes" and "dislikes" in an easy manner.
You may also note that changing an entry from like to dislike can be done in a single atomic update.
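As a rough sketch of that single update, assuming the "votes" layout above and at most one vote per user per listing: the positional $ operator targets the array element matched by the query condition. The helper name buildVoteChange is made up for illustration and is not part of any MongoDB API.

```javascript
// Hypothetical helper: builds the filter and update documents for switching
// one user's vote on one listing. Assumes each user has at most one entry in
// "votes", so the positional "$" operator resolves to that entry.
function buildVoteChange(listingId, userId, newType, when) {
  return {
    filter: { "_id": listingId, "votes.userId": userId },
    update: {
      "$set": {
        "votes.$.type": newType,
        "votes.$.date": when
      }
    }
  };
}

var change = buildVoteChange("sdf3f", "12ac", "dislike", new Date());
// In the mongo shell the two parts would be passed along as:
// db.post.update(change.filter, change.update)
```

Because the filter and the modification travel in one update call, the type and date of the vote change together on that document.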
There is much more you can do, but I would prefer this structure for the reasons as given.

Related

How to add child entities without id to parent in state normalized with normalizr

I've recently started using normalizr with zustand in a new React app. It's been a very good experience so far, having solved most of the painful problems I've had in the past.
I've just bumped into an issue that I haven't been able to find a clean solution for over the past few days.
Imagine I have a normalizr-normalized state looking like:
{
"entities": {
"triggers": {
"1": {
"id": 1,
"condition": "WHEN_CURRENCY_EXCHANGED",
"enabled": true,
"value": "TRY"
},
"2": {
"id": 2,
"condition": "WHEN_CURRENCY_EXCHANGED",
"enabled": true,
"value": "GBP"
},
"3": {
"id": 3,
"condition": "WHEN_TRANSACTION_CREATED",
"enabled": true,
"value": true
}
},
"campaigns": {
"19": {
"id": 19,
"name": "Some campaign name",
"triggers": [
1,
2,
3
]
}
}
},
"result": 19
}
And we have a page that allows a user to add one or more triggers to the campaign and then save them. The problem is that at the time of adding these triggers, they do not have an id until the user clicks the Save button (ids are generated by the database). When the Save button is clicked, the state is being denormalized (via normalizr's denormalize function) and sent as payload to the backend looking like the following:
{
"id": 19,
"name": "Some campaign name",
"triggers": [
{
"id": 1,
"condition": "WHEN_CURRENCY_EXCHANGED",
"enabled": true,
"value": "TRY"
},
{
"id": 2,
"condition": "WHEN_CURRENCY_EXCHANGED",
"enabled": true,
"value": "GBP"
},
{
"id": 3,
"condition": "WHEN_TRANSACTION_CREATED",
"enabled": true,
"value": true
}
]
}
The problem is that if the user adds an entity to the triggers, it does not have an id as ids are generated by the database and I cannot find a proper way to add it to the state (due to the id-based nature of normalized states).
The only workaround I can think of is generating some temporary IDs (e.g. uuid) when a trigger is added on the front-end but is not yet saved and then going over each entity upon denormalization, doing something like if (isUuid(trigger.id)) delete trigger.id, which seems too tedious and workaroundish.
Appreciate your help.
P.S. There is something similar explained here. The problem is that in our case the generateId('comment') logic is happening on the backend.

A simple solution is to split the save into two API calls: one that creates the trigger and one that adds the trigger to the campaign.
Do the first, then save the trigger into the normalized store with the id generated by the backend.
Then add it to the campaign.
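A minimal sketch of the store side of that flow, assuming the create-trigger call has already returned a trigger with its database id. The helper name addSavedTrigger is hypothetical, and the state shape is the one from the question; no normalizr or zustand API is used here.

```javascript
// Hypothetical helper: merges a trigger that the backend has already created
// (so it carries a real id) into the normalized state, then links it to its
// campaign. Returns a new state object and leaves the input untouched.
function addSavedTrigger(state, campaignId, savedTrigger) {
  const campaign = state.entities.campaigns[campaignId];
  return {
    ...state,
    entities: {
      ...state.entities,
      triggers: {
        ...state.entities.triggers,
        [savedTrigger.id]: savedTrigger
      },
      campaigns: {
        ...state.entities.campaigns,
        [campaignId]: {
          ...campaign,
          triggers: [...campaign.triggers, savedTrigger.id]
        }
      }
    }
  };
}

// Example: pretend the create-trigger call returned id 4.
const prevState = {
  entities: {
    triggers: {
      1: { id: 1, condition: "WHEN_CURRENCY_EXCHANGED", enabled: true, value: "TRY" }
    },
    campaigns: {
      19: { id: 19, name: "Some campaign name", triggers: [1] }
    }
  },
  result: 19
};
const nextState = addSavedTrigger(prevState, 19, {
  id: 4, condition: "WHEN_TRANSACTION_CREATED", enabled: true, value: true
});
console.log(nextState.entities.campaigns[19].triggers); // [ 1, 4 ]
```

With this ordering, every trigger in the store always has a real id, so nothing needs to be stripped before denormalizing.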

How to update a double nested value inside an array of multiple documents?

Imagine the following collection of city records:
{
"city": "London",
"inhabitants": [
{
"id": "34543534",
"user": {
"name": "Jonathan Deer",
"email": "john#btinternet.com"
}
},
{
"id": "0454534",
"user": {
"name": "Tanya Patel",
"email": "tanya#btinternet.com"
}
},
{
"id": "4345345",
"user": {
"name": "Catherine King",
"email": "catherine#gmail.com"
}
}
]
}
{
"city": "Manchester",
"inhabitants": [
{
"id": "980003",
"user": {
"name": "Benjamin Thaw",
"email": "benny#btinternet.com"
}
},
{
"id": "734488",
"user": {
"name": "Craig Longstone",
"email": "craig#gmail.com"
}
},
{
"id": "4400093",
"user": {
"name": "Arnold Greentree",
"email": "arnold#btinternet.com"
}
}
]
}
What I'm trying to do is loop through each inhabitants array of each city, and see if any of the people there has an email address containing btinternet.com in it. For those users I want to set a new flag isBT: true and for everyone else (e.g., gmail.com users) isBT: false:
"user": {
"name": "Tanya Patel",
"email": "tanya#btinternet.com",
"isBT": true
}
I tried the following queries - first query sets all of them to isBT: false while the second one searches for "btinternet.com" in email address and sets isBT: true:
db.city.update({ "inhabitants.user.email": {$exists: true}}, {$set: { "inhabitants.$.user.isBT": false}}, {multi: true})
db.city.update({ "inhabitants.user.email": {$regex: "btinternet.com"}}, {$set: { "inhabitants.$.user.isBT": true}}, {multi: true})
The problem is that when I execute the second query, there are several inhabitants records that are left with isBT: false even though they contain the necessary "btinternet.com" email address. It almost seems like only the first user record that matches the criteria gets updated... Is there a way to update ALL user records for all "inhabitants" arrays?
I looked at using the all-positional operator $[], but our DB is on version 2.6.3 and this operator was only introduced in 3.6...

The short answer is "no".
The long answer is "no, because your MongoDB version doesn't support such an operation". You'll need to either...
1. retrieve all matching documents and perform a full array update through server-side processing of the data (e.g. use the MongoDB cursor.forEach()),
2. extend your match with $elemMatch so that it only finds array elements whose email matches and whose "user.isBT" is not yet true, and repeatedly perform the update query until the number of modified documents is 0 (i.e. there are no more array elements to update), or
3. update your MongoDB version and any server-side code that relies on features of the current version that have changed between 2.6 and 3.6.
Any solution to this problem will require more effort than a single query. There's no getting around it. It's a tough pill to swallow, but there really isn't a nice alternative.
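Option 1 above could be sketched roughly as follows. The function name flagInhabitants is made up for illustration, the substring check mirrors the regex in the question, and the code sticks to older JavaScript syntax on the assumption that it has to run in an old shell. The commented usage is an assumption about how it would be wired up, not a tested script.

```javascript
// Sets user.isBT on every inhabitant that has an email address.
// Modifies the array in place and returns it for convenience.
function flagInhabitants(inhabitants) {
  for (var i = 0; i < inhabitants.length; i++) {
    var user = inhabitants[i].user;
    if (user && typeof user.email === "string") {
      user.isBT = user.email.indexOf("btinternet.com") !== -1;
    }
  }
  return inhabitants;
}

// Assumed mongo shell usage, rewriting the whole array once per document:
// db.city.find({ "inhabitants.user.email": { $exists: true } }).forEach(function (doc) {
//   db.city.update(
//     { _id: doc._id },
//     { $set: { inhabitants: flagInhabitants(doc.inhabitants) } }
//   );
// });
```

Note that this replaces the entire inhabitants array, so a change made to the same document between the read and the write could be overwritten.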

Query an array of users based on an array of users

Basically I'm having trouble understanding how I would figure this out.
I have a document in a mongodb collection, and that document has field called friends which is an array of usernames.
I want to query through each username in the array friends, and have an array of those user documents. I'm terrible at explaining; maybe if I draw this out it'll make sense.
mongodb document:
{
"_id": {
"$oid": "59a20e65f94cb5e924af774e"
},
"name": "Nick",
"friends": ["Jones","Mark","Mike"]
}
Now with this friends array, I want to search the same collection for an object with the "name" Jones, Mark, and Mike. When I find that object, I want to put it into an array.
Basically I want it to return this (for this example let's say Jones, Mark, and Mike only have one friend, and that friend is Nick):
[{
"_id": {
"$oid": "59a20e65f94cb5e924af774e"
},
"name": "Jones",
"friends": ["Nick"]
},
{
"_id": {
"$oid": "59a20e65f94cb5e924af774e"
},
"name": "Mark",
"friends": ["Nick"]
},
{
"_id": {
"$oid": "59a20e65f94cb5e924af774e"
},
"name": "Mike",
"friends": ["Nick"]
}]
^ an array of three objects, which are all the friends of Nick.
If you need any more explanation please let me know, I'm terrible at this type of stuff.
For the record, I'm using node, and basic mongodb (not mongoose).

I believe you are looking for the $in operator.
// doc.friends = ["Jones","Mark","Mike"]
db.collection.find({ name: { $in: doc.friends }})
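Since the question mentions Node with the plain MongoDB driver, the same idea might look like the sketch below. The helper name friendsFilter and the variable names in the commented usage are assumptions for illustration.

```javascript
// Builds the filter for "all users whose name appears in this user's
// friends list". Falls back to an empty list if the field is missing.
function friendsFilter(userDoc) {
  return { name: { $in: userDoc.friends || [] } };
}

const nick = { name: "Nick", friends: ["Jones", "Mark", "Mike"] };
console.log(JSON.stringify(friendsFilter(nick)));
// {"name":{"$in":["Jones","Mark","Mike"]}}

// Assumed usage inside an async function, where `users` is a driver Collection:
// const me = await users.findOne({ name: "Nick" });
// const friends = await users.find(friendsFilter(me)).toArray();
```

The toArray() call is what turns the cursor into the array of user documents described in the question.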

How to do a NoSql linked query

I have a noSql (Cloudant) database
-Within the database we have documents where one of the document fields represents “table” (type of document)
-Within the documents we have fields that represent links other documents within the database
For example:
{_id: 111, table:main, user_id:222, field1:value1, other1_id: 333}
{_id: 222, table:user, first:john, other2_id: 444}
{_id: 333, table:other1, field2:value2}
{_id: 444, table:other2, field3:value3}
We want a way of searching for _id:111 so that the result is one document with data from the linked tables:
{_id:111, user_id:222, field1:value1, other1_id: 333, first:john, other2_id: 444, field2:value2, field3:value3}
Is there a way to do this?
There is flexibility on the structure of how we store or get the data back—any suggestions on how to better structure the data to make this possible?

The first thing to say is that there are no joins in Cloudant. If your schema relies on lots of joining then you're working against the grain of Cloudant, which may mean extra complication for you or performance hits.
There is a way to de-reference other documents' ids in a MapReduce view. Here's how it works:
1. Create a MapReduce view to emit the main document's body and its linked documents' ids in the form { _id: 'linkedid' }.
2. Query the view with include_docs=true to pull back the document AND the de-referenced documents in one go.
In your case, a map function like this:
function(doc) {
if (doc.table === 'main') {
emit(doc._id, doc);
if (doc.user_id) {
emit(doc._id + ':user', { _id: doc.user_id });
}
}
}
would allow you to pull back the main document and its linked user document in one API call by hitting the GET /mydatabase/_design/mydesigndoc/_view/myview?startkey="111"&endkey="111z"&include_docs=true endpoint:
{
"total_rows": 2,
"offset": 0,
"rows": [
{
"id": "111",
"key": "111",
"value": {
"_id": "111",
"_rev": "1-5791203eaa68b4bd1ce930565c7b008e",
"table": "main",
"user_id": "222",
"field1": "value1",
"other1_id": "333"
},
"doc": {
"_id": "111",
"_rev": "1-5791203eaa68b4bd1ce930565c7b008e",
"table": "main",
"user_id": "222",
"field1": "value1",
"other1_id": "333"
}
},
{
"id": "111",
"key": "111:user",
"value": {
"_id": "222"
},
"doc": {
"_id": "222",
"_rev": "1-6a277581235ca01b11dfc0367e1fc8ca",
"table": "user",
"first": "john",
"other2_id": "444"
}
}
]
}
Notice how we get two rows back, the first is the main document body, the second the linked user.
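If a single combined object is wanted, as in the question, the rows could be folded together on the client. This is only a sketch: mergeLinkedRows is a hypothetical helper, and it assumes the main document's row comes first, which holds for the response above because the key "111" sorts before "111:user".

```javascript
// Hypothetical helper: folds the "doc" of every row returned by the view
// into one object. The first _id seen (the main document's) is kept;
// _rev and table are dropped because they differ per document.
function mergeLinkedRows(rows) {
  var merged = {};
  rows.forEach(function (row) {
    if (!row.doc) { return; } // a linked document may be missing
    Object.keys(row.doc).forEach(function (key) {
      if (key === "_rev" || key === "table") { return; }
      if (key === "_id" && merged._id !== undefined) { return; }
      merged[key] = row.doc[key];
    });
  });
  return merged;
}

// Usage, given the parsed JSON response shown above as `response`:
// var combined = mergeLinkedRows(response.rows);
```

Fields with the same name in two linked documents would overwrite each other here, so this only suits schemas where field names are distinct.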

Multiple search filtering is not working in cloudant, why?

Here I have quoted my code for multi-field search filtering. I cannot find the mistake in it. Please suggest how to make it work.
Employee document:
{
"_id": "527c8d9327c6f27f17df0d2e17000530",
"_rev": "24-276a8dc913559901897fd601d2f9654f",
"proj_role": "TeamMember",
"work_total_experience": "3",
"personal": {
"languages_known": [
"English","Telugu"
]},
"skills": [
{
"skill_set": "Webservices Framework",
"skill_exp": 1,
"skill_certified": "yes",
"skill_rating": 3
},
{
"skill_set": "Microsoft",
"skill_exp": 1,
"skill_certified": "yes",
"skill_rating": 3
}
],
"framework_competency": "Nasscom",
"type": "employee-docs"
}
Design Document:
{
"_id": "_design/sample",
"_rev": "86-1250f792e6e84f6f33447a00cf64d61d",
"views": {},
"language": "javascript",
"indexes": {
"search": {
"index": "function(doc){\n index(\"default\", doc._id);if(doc.type=='employee-docs'){\nif (doc.proj_role){index(\"project_role\", doc.proj_role);}if(doc.work_total_experience){\nindex(\"work_experience\", doc.work_total_experience);}\nif(doc.personal.languages_known){for(c in doc.personal.languages_known){ \n index(\"languages_known\",doc.personal.languages_known[c]);}} if(doc.skills){for (var i=0;i<doc.skills.length;i++){\nindex('skill_set',doc.skills[i].skill_set);}}}}"
}
}
}
Run using below URL : https://ideyeah4.cloudant.com/opteamize_new/_design/sample/_search/search?q=project_role:TeamMember%20AND%20work_experience:%223%22%20AND%20languages_known:Telugu%20AND%20skill_set:Microsoft&include_docs=true

A simple way to debug this is to query the top 100 results in your index:
https://ideyeah4.cloudant.com/opteamize_new/_design/sample/_search/search?q=*:*&limit=100
This will at least tell you whether there are any documents in your index at all.
Your current query (without URL encoding) looks like:
project_role:TeamMember AND work_experience:"3" AND languages_known:Telugu AND skill_set:Microsoft
I'd suggest that some of these search values require quotes - always true when you are searching string values. Next, you could try:
project_role:"TeamMember"
see if you get any results and refine from there.
Debugging this might also be easier if you store the values as well as index them (so you can see exactly what is indexed). To do this, pass an options object, { "store": true }, as the third argument to each index call. For example,
index("languages_known", doc.personal.languages_known[c], { "store": true });
Now, when you query the index it will return a list of fields which were stored with each match.
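For reference, here is the index function from the design document written out in readable form with { "store": true } applied to each field. It is a sketch rather than a drop-in replacement: the function is named only so it can be read and called on its own (in the design document it would be an anonymous function(doc) stored as a string), and the guard for a missing "personal" object is an addition that the original does not have. The index() function itself is provided by Cloudant at indexing time.

```javascript
// Readable form of the search index function, with values stored so that
// query results show exactly what was indexed for each field.
function employeeSearchIndex(doc) {
  index("default", doc._id);
  if (doc.type !== "employee-docs") { return; }
  if (doc.proj_role) {
    index("project_role", doc.proj_role, { "store": true });
  }
  if (doc.work_total_experience) {
    index("work_experience", doc.work_total_experience, { "store": true });
  }
  // Added guard: skip documents that have no "personal" object.
  var languages = (doc.personal && doc.personal.languages_known) || [];
  for (var i = 0; i < languages.length; i++) {
    index("languages_known", languages[i], { "store": true });
  }
  var skills = doc.skills || [];
  for (var j = 0; j < skills.length; j++) {
    index("skill_set", skills[j].skill_set, { "store": true });
  }
}
```

With the values stored, running the q=*:* query from earlier should show for each match which of the four fields actually made it into the index.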
