Modelling N-to-N relationship with one document per relationship - database

There are 2 collections in my app, user and friend. The friend collection will be for storing the the friendship relationship between the users of the app.
In a relational database, the friend collection will have one row for each friendship, which involves 2 users.
In MongoDB, I am advised to store all of a user's friend in an array field.
What is wrong with storing one document per friendship? Is there really no scenario where this schema would be preferred over storing friends in an array?
The goal is fast queries. The friend collection will be used as the from collection in a $lookup stage of a find query on the user collection. And the friend collection will have compound index on the friends' user ids.
What is wrong with this schema design?

Related

How to model One-To-Many relationships in DynamoDB such that all child relationships can be queried?

One common strategy for one-to-many relationships in DynamoDB is using a composite primary key; a broader Partition key as parent, and a narrower Sorting key that contains child relationships. The following example is taken from this article by data scientist Alex Debrie:
This strategy solves the most common retrieval use cases:
Retrieve all metadata and users in an organization
Retrieve all the users within an organization
Retrieve metadata about an organization
Retrieve specific users within an organization
The use case I am trying to solve does not seem to be covered by this model. What if you wanted to retrieve all the metadata from all organizations combined? Or what if you wanted to know all the users across organizations? Even by using a Global Secondary Index, you have no choice but to split all fields into their own partition; this requires a scan to retrieve. Does dynamo have any features to facilitate these kind of retrievals?
To retrieve the metadata for all organizations, you can model it the following way: The SK for the respective metadata items doesn’t need to contain the company name. Just give it the value “META” for every company. Create a GSI that uses the SK as PK. This way you can query all items from the index using the PK “META”.
To retrieve all users across organizations, add an attribute “type” with the value “USER” to all user entries. Create a GSI with this type as PK. This way you can query all users from the index using the PK “USER”.

Correct way to organize my DynamoDB tables

This is the first time I use DynamoDB. I have a question about how to organize my tables.
I own a table containing users, I will have words associated with those users.
Should I create a property with an array in the user or a new table? I will also have a history of the words searched by each user and I also have the same doubt.
What determines which way to go?

GAE: NDB query on list

Google App Engine NDB data model like so:
Users
Username
FirstName
LastName
Posts
PostID
PosterUsername
SubscribedPosts
PostID
SubscriberUsername
For a specific user, I want to return all the Posts which the user is subscribed to and display them on the page.
Since the wonderful NDB doesn't support JOINs, we do two queries:
postIDList =
SubscribedPosts.query(SubscribedPosts.SubscriberUsername == 'johndoe').fetch()
This gives us a list of SubscribedPosts. So how do I take my postIDList list and use it as a filter criteria for a Posts query?
Something like:
results = Posts.query(Post.PostID IN postIDList.PostID)
In a normal relational database, this would be a simple query using table joins. How is this done in Google's ndb?
You are going to run into lots of bottlenecks if you try to design your datastore models the same way you would design tables in a relational database as you have in this example.
Your comment goes in one possible right direction, although there are other solutions. Going that route, I would drop the "subscribedPosts" model altogether use a repeated KeyProperty entity in the User model to store subscribed posts.
See this related post: One-To-Many Example in NDB
Seems you are looking to model a many-to-many relationship, not one-to-many. Read Modelling Entity Relationships (althought this is for the older db, not newer ndb, it still gives the idea).
One of the two entities should maintain a list of keys (repeated=True) of the related other entities. Which entity should hold the list? Preferably the list should be on the side that usually has fewer relationships so that the list of keys is smaller. Another consideration is which side likely has less contention for updates.
In your specific case, lets say on average users subscribe to 10 posts and lets say on average each post has 100 users subscribed to it. In this case, we would want to put the list of keys on Users side of the relation.
class Users(ndb.Model):
user_name = ndb.StringProperty()
first_name = ndb.StringProperty()
last_name = ndb.StringProperty()
posts = ndb.KeyProperty(kind='Posts', repeated=True)
class Posts(ndb.Model)
post_id = ndb.StringProperty()
poster_user_name = ndb.StringProperty()
Establish the relationship by adding to the list in the Users instance:
current_user.posts.append(current_post.key)
For a given Users instance, getting all subscribed Posts is easy since the list of keys of the subscribed Posts is already within the given Users:
ndb.get_multi(given_user.posts)
For a given Posts instance, get all subscribing Users by ...
query = Users.query(Users.posts == given_post.key)

Which database model to store data in?

I am writing an application in Google App Engine with python and I want to sort users and user posts into groups. Users will be able to tag a post with a group ID and then that post will be displayed on the group page.
I would also like to relate the users to the groups so that only members of a group can tag a post with that group ID and so that I can display all the users of a group on the side. I am wondering if it would be more efficient to have a property on the user which will have all of the groups listed (I am thinking max 10 or so) or would it be better to have a property on the Group model which lists all of the users (possibly a few hundred).
Is there much of a difference here?
Your data model should derive from the most likely use cases. What are you going to retrieve?
A. Show a list of groups to a user.
B. Show a list of users in a group.
Solution:
If only A, store unindexed list of groups in a property of a user entity.
If both, same as above but indexed.
If only B, store unindexed list of users in a property of a group entity.
NB: If you make a property indexed, you cannot put hundreds of user ids in it - it will lead to an exploding index.

App Engine Datastore: entity design and query optimization

I have a system where users can vote on entities, if they like or hate them. It will be bazillion votes and trazillion records, hopefully, some time in the future :)
At the moment i store a vote in an Entity like this:
UserRecordVote: recordId, userId, hateOrLike
And when i want to get every Record the user liked i do a query like this:
I query the "UserRecordVote" table for all the "likes", then i take the recordIds from that resultset, create a key of that property and get the record from the Record Table.
Then i aggregate all that in a list and return it.
Here's the question:
I came up with a different approach and i want to find out if that one is 1. faster and 2. how much is the difference in cost.
I would create an Entity which's name would be userId + "likes" and the key would be the record id:
new Entity(userId + "likes", recordId)
So when i would do a query to get all the likes i could simply query for all, no filters needed. AND i could just grab the entity key! which would be much cheaper if i remember the documentation of app engine right. (can't find the pricing page anymore). Then i could take the Iterable of keys and do a single get(Iterable keys). Ok so i guess this approach is faster and cheaper right? But what if i want to grab all the votes of a user or better said, i want to grab all the records a user didn't vote on yet.
Here's the real question:
I wan't to load all the records a user didn't vote on yet:
So i would have entities like this:
new Entity(userId+"likes", recordId);
and
new Entity(userId+"hates", recordId);
I would query both vote tables for all entity keys and query the record table for all entity keys. Then i would remove all the record entity keys matching one of the vote entity keys and with the result i would get(Iterable keys) the full entities and have all the record entites which are not in one of the two voting tables.
Is that a useful approach? Is that the fastest and cost efficient way to do a datastore query? Am i totally wrong and i should store the information as list properties?
EDIT:
With that approach i would have 2 entity groups for each user, which would result in million different entity groups, how would GAE Datastore handle that? Atleast the Datastore Viewer entity select box would probably crash :) ?
To answer the Real Question, you probably want to have your hateOrLike field store an integer that indicates either hated/liked/notvoted. Then you can filter on hateOrLike=notVoted.
The other solutions you propose with the dynamically named entities make it impossible to query on other aspects of your entities, since you don't know their names.
The other thing is you expect this to be huge, you likely want to keep a running counter of your votes rather than tabulating every time you pull up a UserRecord - querying all the votes, and then calculating them on each view is very slow - especially since App Engine will only return 1000 results on each query, and if you have more than 1000 votes, you'll have to keep making repeated queries to get all the results.
If you think people will vote quickly, you should look into using a sharded counter for performance. There's examples of that with code available if you do a google search.
Consider serializing user hate/like votes in two separate TextProperties inside the entity. Use the userId as key_name.
rec = UserRecordVote.get_by_key_name(userId)
hates = len(rec.hates.split('_'))
etc.

Resources