appengine data structure - child, parent or both? - google-app-engine

I'm trying my hand at Google App Engine, using the datastore with PHP and Quercus.
I'm not familiar with Java or Python, so there is lots of learning going on here. I've got pages rendering, and I'm able to get data in and out of the datastore.
The app I am building has users, groups, topics and comments.
A group has users, and users can belong to multiple groups.
When a user logs in, I display the groups they are members of, and the topics of those groups.
I currently have this built in MySQL, and am now figuring out how to move it to App Engine.
The way I see it, a group is a parent which has topics and users as children. Topics have comments as children.
However, I have to get the groups that a user belongs to when the user logs in. Therefore, I was thinking of a separate parent entity which stores the user, contact and login info, and that user would have children containing the group id which each user belongs to, so that I know what groups to fetch.
The users are children of the group so that I can display all the users of a group, but maybe there is a more efficient way to do it.
Like this:
Groups(EntityGroup) - GroupName, Owner
  ↳ Topics - TopicName, Content, Owner
    ↳ Comments - Comment, Owner
  ↳ Users - userid
Users(EntityGroup) - userName, email, password
  ↳ userGroup - groupid
Then, when a user logs in, the logic looks like this:
SELECT groupid FROM Users where password=hashofpassword+uniqueusername
foreach(groupid as group){
    SELECT users from group;
    SELECT topics from group
    foreach(topicid as topic){
        SELECT comments;
    }
}
The reason I'm looking at it like this is because when a user logs in, I can't very well go looking through each group for the user, and I only would want to store the login info in one place.
Please don't refer me to the code.google.com documentation; I've gone through it many times already, but I still don't completely understand what's going on with App Engine.
Also, is the way I've outlined above the proper way to visualize the datastore? I think visualizing the data has been a struggle, which might be causing some of the challenges.

It looks to me like there is a many-to-many relationship between Users and Groups, yes? A User can belong to many Groups, and a Group can have many Users subscribed to it. The most logical way to represent this in App Engine is to give the User entity a ListProperty that holds the Key of each of the groups to which the user belongs. In Python, it would look like this:
from google.appengine.ext import db

class User(db.Model):
    userName = db.StringProperty()
    email = db.EmailProperty()
    password = db.StringProperty()
    groups = db.ListProperty(db.Key)  # keys of the groups this user belongs to
Whenever the User subscribes to the group, you add the Group's key to the groups list.
Likewise, the Group entity will have a ListProperty that contains the Keys of each User who belongs to it.
You wouldn't want to make the Users children of the Group, as that would make it very difficult or impossible for a User to belong to more than one Group.
The difficulty that you will have is that when a User joins a group, you will need to update the Group in a Transaction -- you can only have one User being added to a Group at a time; otherwise, you have the possibility that one write will overwrite another. Presumably, the User can be updated outside of a transaction, as he or she should only be joining one group at a time.
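To make the two-sided bookkeeping concrete, here is a minimal sketch in plain Python. It is not the datastore API: the dataclasses, `join_group` and `groups_for_login` are hypothetical names, and a `threading.Lock` merely stands in for the transaction around the Group update.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class User:
    key: str
    user_name: str
    groups: list = field(default_factory=list)    # keys of the groups joined


@dataclass
class Group:
    key: str
    group_name: str
    members: list = field(default_factory=list)   # keys of the member users
    # Assumption: a lock plays the role of the datastore transaction.
    lock: threading.Lock = field(default_factory=threading.Lock)


def join_group(user, group):
    """Record the membership on both the Group and the User."""
    # Only one writer at a time may read-modify-write the member list,
    # so concurrent joins cannot overwrite each other.
    with group.lock:
        if user.key not in group.members:
            group.members.append(user.key)
    # The user joins one group at a time, so no lock is used here.
    if group.key not in user.groups:
        user.groups.append(group.key)


def groups_for_login(user, groups_by_key):
    """At login, fetch the user's groups directly by key; no scan over all groups."""
    return [groups_by_key[k] for k in user.groups]
```

With this layout the login path reads the User once and then fetches each Group by key, while the Group's member list answers "who is in this group" without a query.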

Related

firebase firestore nosql design for chat app with groups

I am trying to think of a way to design the firestore db in a way that is efficient.
The main issue I am having is how I should define "groups". Let's say a user is invited to a group chat, so the client needs to retrieve the data for that group chat. Should I have a "groups" collection and then find the correct group document? Or should I have a "groups" property in the user document that holds an id referencing the group to retrieve?
In SQL, having a reference in a user's groups table would be the obvious answer, but I am not sure about Firestore. I don't want to look through the entire collection of groups just to find the group that the user was newly invited to. Any tips? Also, my front end is in React and I am considering using the onSnapshot method to subscribe to the collection (that seems to be the best way to get real-time updates).
What I believe is best for you is this:
First, have a collection, say groups, in which every document is keyed by the group's unique id.
Inside each group document, you can have a collection that holds all the chats for that group, along with group-related info such as the group type.
Hope it helps. Feel free to ask if you have doubts.
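The layout described above can be pictured as nested documents. The sketch below uses plain Python dicts rather than the Firestore client, and it also keeps a list of group ids on the user document (the option raised in the question), so a newly invited user can fetch the group by id instead of scanning the collection. All field names and the helpers `invite` and `groups_for_user` are assumptions for illustration.

```python
# In-memory stand-in for two collections; not the Firestore client API.
db = {
    "users": {
        "alice": {"displayName": "Alice", "groupIds": ["g1"]},
        "bob": {"displayName": "Bob", "groupIds": []},
    },
    "groups": {
        "g1": {
            "name": "Weekend hikers",
            "type": "private",
            # Would be a subcollection at the path groups/g1/messages.
            "messages": {
                "m1": {"sender": "alice", "text": "Hi all"},
            },
        },
    },
}


def invite(db, user_id, group_id):
    """Add the group id to the invited user's document."""
    group_ids = db["users"][user_id]["groupIds"]
    if group_id not in group_ids:
        group_ids.append(group_id)


def groups_for_user(db, user_id):
    """Look up each of the user's groups by id; the groups collection is never scanned."""
    return {gid: db["groups"][gid] for gid in db["users"][user_id]["groupIds"]}
```

A client would then only need to listen to its own user document and to the group documents it references, rather than to the whole groups collection.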

Sharded ancestor entities in GAE

I'm working on a GAE-based project involving a large user base (possibly millions of users). We use Datastore for persistency. Users will be identified both by username and by e-mail address, so these two properties should be unique across all entities of the kind. Because Datastore doesn't support unique fields other than ID, we need transactions to ensure uniqueness of these fields when new users are registered. And in order to have transactions, User entities need to be enclosed in entity groups.
Having large entity groups is not recommended, as pointed out here. Therefore, given a possible large number of stored users, I'm thinking of putting them into multiple smaller entity groups. Each group would have a common parent with ID generated from the two unique fields (a piece of the MD5 sum for instance). Inserting a new user could look like this (in Python):
from google.appengine.ext import ndb

@ndb.transactional
def register_new_user(login, email, full_name):
    # validation code omitted
    group_id = a_simple_hash(login, email)
    group_key = ndb.Key('UserGroup', group_id)
    # The new user must be a child of the group key; otherwise the
    # ancestor query below would never see previously stored users.
    user = User(parent=group_key, login=login, email=email, full_name=full_name)
    query = User.query(ancestor=group_key).filter(
        ndb.OR(User.login == login, User.email == email))
    if not query.get():
        user.put()
One problem I see with this solution is that it will be impossible to get a User by ID alone. We'd have to use complete entity keys.
Are there any other cons of such approach? Anyone tried something similar?
EDIT
As was pointed out to me in the comments, a hash like the one outlined above would not work properly: it would only prevent registering a user whose e-mail and username both match those of an existing user. It would work if the hash were computed from a single field.
Nevertheless, I find the concept of such sharding interesting in itself and perhaps worth discussing.
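The flaw mentioned in the edit can be reproduced without the datastore. The sketch below is plain Python with hypothetical names (`shard_for`, `ShardedRegistry`, `NUM_SHARDS`); each shard plays the role of one entity group, and the duplicate check only looks inside the shard the new user hashes to, as an ancestor query would.

```python
import hashlib

NUM_SHARDS = 16  # hypothetical number of entity groups


def shard_for(*fields):
    """Map the given fields to a shard id using a piece of the MD5 sum."""
    digest = hashlib.md5("|".join(fields).encode("utf-8")).hexdigest()
    return int(digest[:8], 16) % NUM_SHARDS


class ShardedRegistry:
    def __init__(self, shard_fields):
        self.shard_fields = shard_fields   # e.g. ("login",) or ("login", "email")
        self.shards = {}                   # shard id -> list of user dicts

    def register(self, login, email):
        """Return True if the user was stored, False if a duplicate was seen."""
        user = {"login": login, "email": email}
        shard_id = shard_for(*(user[f] for f in self.shard_fields))
        bucket = self.shards.setdefault(shard_id, [])
        # Equivalent of the ancestor query: only this one shard is checked.
        if any(u["login"] == login or u["email"] == email for u in bucket):
            return False
        bucket.append(user)
        return True
```

When sharding on `("login",)`, a repeated login always lands in the same shard and is rejected. When sharding on `("login", "email")`, the same login with a different e-mail usually hashes to another shard, so the duplicate goes unnoticed. Note that sharding on the login alone still says nothing about e-mail uniqueness; that would need its own check.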
An e-mail address is owned by a user and is unique, so there is only a very small chance that somebody will (try to) use the same e-mail address.
So my approach would be: get_or_insert a new login, which makes it easy to log in (by key), and then verify that the e-mail address is unique.
If it is not unique, you can discard the new entity or ... do something else.
Entity groups matter for transactions. I'm interested in your planned transactions, because I do not understand your entity group key hash. Which entities will be part of the entity group, and why?
If I understand your hash correctly, a user with the same login will end up in a different entity group.
It looks like your entity group holds a single entity.
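The suggested flow (reserve the login by key, then check the e-mail) might look roughly like this. It is a plain-Python sketch, not ndb: `UserStore` is a hypothetical in-memory store, and its `get_or_insert` returns a `created` flag for clarity, which is an assumption of this sketch rather than the behaviour of the real method.

```python
class UserStore:
    """In-memory stand-in for a kind whose entity key is the login."""

    def __init__(self):
        self.by_login = {}

    def get_or_insert(self, login, **props):
        """Return (entity, created); an existing entity is never overwritten."""
        if login in self.by_login:
            return self.by_login[login], False
        entity = dict(props, login=login)
        self.by_login[login] = entity
        return entity, True

    def email_taken(self, email, except_login):
        """Check whether any other user already has this e-mail."""
        return any(
            u["email"] == email and u["login"] != except_login
            for u in self.by_login.values()
        )


def register(store, login, email):
    user, created = store.get_or_insert(login, email=email)
    if not created:
        return "login taken"
    if store.email_taken(email, except_login=login):
        del store.by_login[login]   # discard the reservation
        return "email taken"
    return "ok"
```

Because the login is the key, logging in later is a single get by key, and only the e-mail check needs a query.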
In my opinion you're overthinking this: what's the probability of two users registering with the same username at the same time?
Very slim. Eventual consistency is good enough for this case, as you don't need nanosecond precision...
unless you plan to have more users than Facebook, with people registering every second.
Registering with the same email is virtually impossible for different users, since the check has already been done by the email provider for you!
Only a user could try to open two accounts with the same email address. Eventual consistency is good enough for this query too.
Your user entities each belong to their own entity group.
Actually in most use cases, your User is the most obvious root entity : people use the datastore because they need scalability, and most of the time huge scale is needed for user oriented apps.

How to prevent user to access other users' data?

PROBLEM
User authenticated into the application
Simple database schema: User ---> Document ---> Item
API to access to Document Items
If the logged-in user knows the ids of items that belong to some other user, they can access them.
I would like to prevent this behavior.
SOLUTION
The first solution I found is to add a userid field to every record in every table, and to check in every query that the record belongs to the logged-in user.
Is this a good solution? Do you know a better design pattern for preventing a user from accessing other users' data?
Thanks
If the documents belong to a user, adjust your queries so that only items that belong to the user's documents are retrieved. No need to add userIDs to the items themselves.
If you need to expose IDs to the users, make those IDs GUIDs instead of consecutive numbers. While not a perfect solution, it makes it much harder to guess the IDs of other users' items.
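As a rough illustration of both points, here is a sketch using Python's built-in sqlite3 and uuid modules. The table and column names, and the helpers `new_id` and `get_item`, are assumptions made for the example; the ownership check lives in the join through documents, so items carry no user id of their own.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users     (id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE documents (id TEXT PRIMARY KEY,
                            user_id TEXT REFERENCES users(id));
    CREATE TABLE items     (id TEXT PRIMARY KEY,
                            document_id TEXT REFERENCES documents(id),
                            body TEXT);
""")


def new_id():
    """Random GUID, so ids exposed to clients are hard to guess."""
    return str(uuid.uuid4())


def get_item(conn, item_id, current_user_id):
    """Return (id, body) only if the item's document belongs to the current user."""
    return conn.execute(
        """
        SELECT i.id, i.body
        FROM items i
        JOIN documents d ON d.id = i.document_id
        WHERE i.id = ? AND d.user_id = ?
        """,
        (item_id, current_user_id),
    ).fetchone()
```

A request for an item that exists but belongs to another user's document returns `None`, exactly as if the item did not exist.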
If you're using Oracle, there's VPD, Virtual Private Database. You can use that to restrict access for users.

Which database model to store data in?

I am writing an application in Google App Engine with python and I want to sort users and user posts into groups. Users will be able to tag a post with a group ID and then that post will be displayed on the group page.
I would also like to relate the users to the groups so that only members of a group can tag a post with that group ID and so that I can display all the users of a group on the side. I am wondering if it would be more efficient to have a property on the user which will have all of the groups listed (I am thinking max 10 or so) or would it be better to have a property on the Group model which lists all of the users (possibly a few hundred).
Is there much of a difference here?
Your data model should derive from the most likely use cases. What are you going to retrieve?
A. Show a list of groups to a user.
B. Show a list of users in a group.
Solution:
If only A, store unindexed list of groups in a property of a user entity.
If both, same as above but indexed.
If only B, store unindexed list of users in a property of a group entity.
NB: If you make a property indexed, you cannot put hundreds of user ids in it - it will lead to an exploding index.
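For the "both" case, the idea is that each user carries a short list of group ids and an index over that list answers the reverse lookup. The following is a plain-Python sketch of that behaviour, not the datastore itself; `UserIndex` and its methods are hypothetical names, and the dict of sets imitates what an indexed list property provides.

```python
from collections import defaultdict


class UserIndex:
    """Users hold a short list of group ids; an index serves the reverse lookup."""

    def __init__(self):
        self.groups_of = {}                 # user id -> list of group ids (use case A)
        self.users_in = defaultdict(set)    # index: group id -> user ids (use case B)

    def put_user(self, user_id, group_ids):
        """Store the user's group list and keep the index in step with it."""
        for gid in self.groups_of.get(user_id, []):
            self.users_in[gid].discard(user_id)
        self.groups_of[user_id] = list(group_ids)
        for gid in group_ids:
            self.users_in[gid].add(user_id)

    def can_tag(self, user_id, group_id):
        """Only members of a group may tag a post with its id."""
        return group_id in self.groups_of.get(user_id, [])

    def members(self, group_id):
        """Equivalent of querying users whose group list contains group_id."""
        return sorted(self.users_in.get(group_id, ()))
```

The list being indexed stays short (around ten groups per user), which is why the index lives on the user side rather than on a group holding hundreds of user ids.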

Which is the best way to relate some tables?

I want to make an application where there will be different users and each user will have a set of friends which will be put in categories. There will be some default categories, but the user will be able to add his own. I was wondering which will be the best way to do this.
My idea is to have 3 tables - user, friends and categories.
The user table to have fields (one to many) for friends and categories (but I don't know if the user table will need any information about the friends and the categories at all).
The friends table to have a field for categories (one to many) and a field for the user (many to one).
The category table to have fields for user (many to many?) and friends (many to many?).
I'm not sure about the relations, too. I'm using PHP with MySQL and Symfony2 and Doctrine2. Please help!
EDIT
Maybe I haven't described exactly what I need. When you open the app, you see a login form. If you don't have an account, you have to register, and the registration creates a new user. This user isn't connected with other users (I'm still new to programming and I want something a little easier, so it's something like a phonebook). Each user has a list of friends, and a friend is a row in a table with fields such as name, address, phone, email, photo, birthday and so on, but friends are added by the current user. The friends are not users. Every user is in fact an account with a password and a username, and when you log in there is just a list of friends. So each user creates categories for himself and has nothing to do with other users and their categories. A category will have only an id and a name.
So the idea is that you create an account, then create some categories and add friends to them, just to have an organiser for when your friends were born, where they live, or what their phone number is. You create the friends and add the information about them; they are not users themselves. It's not like a social network, just a notebook where each user can write down info about his friends.
First of all, you need to understand the role of intersection tables: if user A labels user B as a friend (i.e. there is a many-to-many relation from user to itself), and you create a new table to represent that relation (the friends table), any additional information about this "friendship" should be linked to that table. So, if a user categorizes his friends in some way, the category applies to friends, not to user. There's no need for a relation between category and user for this specific purpose.
Update: since friends are not users, the friends table will not be an intersection table (and thus have only one reference back to user, denoting the "owner"), but the rest of the answer still applies.
I'm assuming each category will be a row in the category table. Additional information about the category might be added, but it should be limited to that. For instance, if you want to know which user created a category, you could add a foreign key to user labeled for instance "owner" or "created_by". That might be useful if categories created by one user are not to be seen by others.
Finally, you can relate friends with category. If User A can put user B in at most one category, then a foreign key from friends to category should suffice (i.e. a one to many relation). Otherwise, you might need another many-to-many relation, so an additional intersection table should be created (for instance friend_category).
You could avoid this extra table by employing denormalization, having multiple rows in friends where both users are the same (and in the same order) but the category is different (see also this example). Whether this is advantageous or not is beyond the scope of this answer, but IMHO using an extra table is better for now (it might seem more complicated, but it will be easier to maintain in the long run). (Update: if friends is not an intersection table, denormalizing like this is not really an option, so stick with the friend_category table)
In the end, your layout would look like this:
user              friends             friend_category     category
----              -------             ---------------     --------
(user fields) <-- user (owner)    <-- friend              (category fields)
                  (friend fields)     category        --> user (owner) --+
   ^                                                                     |
   |                                                                     |
   +---------------------------------------------------------------------+
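The layout above could be written out as a schema like the one below. This is a sketch using Python's built-in sqlite3 rather than MySQL with Doctrine2; the column names and the helper `friends_in_category` are assumptions chosen to mirror the diagram.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE user (
        id       INTEGER PRIMARY KEY,
        username TEXT UNIQUE NOT NULL
    );
    CREATE TABLE friends (
        id       INTEGER PRIMARY KEY,
        owner_id INTEGER NOT NULL REFERENCES user(id),
        name     TEXT NOT NULL,
        phone    TEXT
    );
    CREATE TABLE category (
        id       INTEGER PRIMARY KEY,
        owner_id INTEGER NOT NULL REFERENCES user(id),
        name     TEXT NOT NULL
    );
    -- Intersection table: a friend may sit in several categories.
    CREATE TABLE friend_category (
        friend_id   INTEGER NOT NULL REFERENCES friends(id),
        category_id INTEGER NOT NULL REFERENCES category(id),
        PRIMARY KEY (friend_id, category_id)
    );
""")


def friends_in_category(conn, owner_id, category_name):
    """Names of the owner's friends filed under one of the owner's categories."""
    rows = conn.execute(
        """
        SELECT f.name
        FROM friends f
        JOIN friend_category fc ON fc.friend_id = f.id
        JOIN category c ON c.id = fc.category_id
        WHERE f.owner_id = ? AND c.owner_id = ? AND c.name = ?
        ORDER BY f.name
        """,
        (owner_id, owner_id, category_name),
    ).fetchall()
    return [name for (name,) in rows]
```

If each friend can be in at most one category, the friend_category table can be dropped in favour of a category_id column on friends, as described above.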
I can suggest the following table set for this (this scheme applies to the phonebook or social network tasks as well):
Table "Users" that stores all the information about users:
UserId
Name
Phone
Address
... (any other fields)
Table "Categories" that stores information about relationship categories:
CategoryId
Name
Table "Relationships" that stores information about relationships between users:
FirstUserId -> Link to Users table
SecondUserId -> Link to Users table
CategoryId -> Link to Categories table
So, any user is able to add new categories, and then reference them when adding new relationship to another person.
If you need to select all user's friends, you will have to:
select fr.* from Relationships r join Users fr on r.SecondUserId = fr.UserId where r.FirstUserId = <Current user id>
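To try the query out, the three tables can be created in an in-memory SQLite database from Python. The column types and the helper `friends_of` are assumptions made for this sketch; the join itself is the one given above, with the current user id passed as a parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (
        UserId  INTEGER PRIMARY KEY,
        Name    TEXT,
        Phone   TEXT,
        Address TEXT
    );
    CREATE TABLE Categories (
        CategoryId INTEGER PRIMARY KEY,
        Name       TEXT
    );
    CREATE TABLE Relationships (
        FirstUserId  INTEGER REFERENCES Users(UserId),
        SecondUserId INTEGER REFERENCES Users(UserId),
        CategoryId   INTEGER REFERENCES Categories(CategoryId)
    );
""")


def friends_of(conn, current_user_id):
    """Return (UserId, Name) for every friend of the given user."""
    return conn.execute(
        """
        SELECT fr.UserId, fr.Name
        FROM Relationships r
        JOIN Users fr ON r.SecondUserId = fr.UserId
        WHERE r.FirstUserId = ?
        ORDER BY fr.UserId
        """,
        (current_user_id,),
    ).fetchall()
```

Passing the id as a bound parameter rather than concatenating it into the SQL string also keeps the query safe from injection.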
