I'm currently designing an application similar to Twitter/Jaiku/Reddit in structure. Basically there are small posts with upvotes and downvotes, and they are sorted by score and time like Reddit.
I've gotten all of this working, but now our requirements have changed a bit, and we need the user to be able to mark a post as 'read'. This would make the post no longer show up in that user's feed. I can model this with a Read entity for each (User, Post) tuple, but this would require a lot of work to find posts which do not exist in that table. Alternatively, I can invert that relation so that I have one entity for each unread post, and it becomes much easier to find which posts do exist in the table. But then I'd need to create an entry in this table for every single user every time a post is made. This would not scale well.
My question is this: how would I model this sort of negative information in App Engine's datastore? I'm using the Go runtime if that matters, but answers for any runtime are fine.
This would be a many-to-many relationship. This article describes how to model different kinds of relationships, including many-to-many. The only issue is that I'm not sure whether you should store a list of read posts on the user or a list of users who have read it on the post, as both lists might get large in different situations. If posts are relatively private and not seen by many people, you could store a list of user keys on the post model. But if one post could be seen by thousands of people, it might be better to store a list of posts on the users, as there will probably not be many users with thousands of read posts. Another option might be to discard old posts, or just discard their read state.
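As a rough illustration of the "list of read posts on the user" option, here is a minimal in-memory sketch. It does not use the datastore API; the `Post` and `User` classes, the `read_post_ids` field and the `unread_feed` helper are all hypothetical names, and the ranked list stands in for whatever the score-and-time query returns.

```python
from dataclasses import dataclass, field


@dataclass
class Post:
    post_id: str
    score: int


@dataclass
class User:
    user_id: str
    # Hypothetical field: IDs of posts this user has marked as read.
    read_post_ids: set = field(default_factory=set)


def unread_feed(user, ranked_posts, limit):
    """Return up to `limit` posts from the ranked list that the user has not read."""
    feed = []
    for post in ranked_posts:
        if post.post_id in user.read_post_ids:
            continue  # skip posts the user already marked as read
        feed.append(post)
        if len(feed) == limit:
            break
    return feed


# Posts are assumed to arrive already sorted by score and time.
posts = [Post("a", 90), Post("b", 75), Post("c", 60), Post("d", 40)]
alice = User("alice")
alice.read_post_ids.update({"a", "c"})
feed_ids = [p.post_id for p in unread_feed(alice, posts, limit=2)]
print(feed_ids)
```

The filtering happens in application code after the ranked query, so the feed may need to over-fetch when a user has read many of the top posts.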
Related
I have three entities: user, post and comment. A user may have multiple posts and a post may have multiple comments.
I know I can add ancestor relations like this:
user(Grand Parent) post(parent) comment(child)
I'm a little bit confused about ancestors. I read in the documentation and in searches that ancestors are used for transactions, that all ancestors are in the same entity group, and that an entity group is stored on the same datastore node, which makes it less scalable. Is this right?
Is creating user as parent of posts and post as parent of comments a good thing?
Rather than this we can add one extra property in the post entity like user_id as shown in example and filter by it.
Which is better/more scalable: filter posts by ancestors or add an extra property user_id in the post Entity and filter by it?
I know both approaches can get the same results but I want to know which one is better in performance and scalability?
Sorry, I'm new in datastore.
Update 11/4/2017
A large number of users are using this app. It is quite possible that there will be more than one post per second overall. A single user cannot create more than one post per second, but multiple users together may. The documentation describes a maximum entity group write rate of 1/s. Is it still possible to use ancestors?
The same applies to comments. Multiple users can add comments in the same entity group, so it is quite possible to get more than one comment per second.
Are ancestor queries faster?
I have read in many places that ancestor queries are much faster than other queries.
As far as I know, they are fast because an ancestor relation creates an entity group and stores related data on the same node, so it takes less time to get data from a single node than from multiple nodes.
For example: if a post is stored on an Asia node and its comment is stored on a Europe node, and I want to get both, the datastore API needs to fetch from two nodes to complete the request, which makes it slow. If I instead create an ancestor relation and an entity group, performance is better.
But what if I don't need the post and comment data at the same time? Suppose I show posts on one web page and comments on another. In this scenario the datastore API needs to fetch from only one node at a time, so it does not matter whether the data is saved on a single node or on multiple nodes. Can an ancestor make queries faster in this case?
Yes, you are correct: all ancestry-related entities are in the same entity group, which raises 2 scalability issues: data contention and the maximum entity group write rate of 1/s. See the somewhat related question "Is there an Entity Group Max Size?"
There are advantages of using ancestries and some may be willing to sacrifice scalability for them (see What would be the purpose of putting all datastore entities in a single group?), but IMHO not for your kind of app: I think you'll agree that it's not really critical to see every new user/post/comment in random searches immediately after it is created (i.e. strong consistency) - the fact that it eventually appears is IMHO good enough.
Simply having no ancestry at all and adding additional model properties (entity keys or even just entity key IDs for entities which never have ancestors) to allow cross-referencing entities is the more scalable approach and IMHO fits well with your app.
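A minimal sketch of the cross-referencing approach, using plain Python objects rather than the datastore API. The `user_id` and `post_id` properties play the role of the extra model properties; the list comprehensions are only an in-memory stand-in for an equality filter on an indexed property, and all names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    post_id: int
    user_id: int  # reference to the author instead of an ancestor key
    title: str


@dataclass(frozen=True)
class Comment:
    comment_id: int
    post_id: int  # reference to the post instead of an ancestor key
    text: str


def posts_by_user(posts, user_id):
    """Stand-in for a query filtered on the user_id property."""
    return [p for p in posts if p.user_id == user_id]


def comments_on_post(comments, post_id):
    """Stand-in for a query filtered on the post_id property."""
    return [c for c in comments if c.post_id == post_id]


posts = [Post(1, 7, "hello"), Post(2, 8, "other"), Post(3, 7, "again")]
comments = [Comment(1, 1, "nice"), Comment(2, 3, "ok"), Comment(3, 1, "+1")]

user_post_ids = [p.post_id for p in posts_by_user(posts, 7)]
post_comment_ids = [c.comment_id for c in comments_on_post(comments, 1)]
print(user_post_ids, post_comment_ids)
```

Because no entity shares a parent here, each post and comment would live in its own entity group, so the 1/s write limit would apply per entity rather than per user or per post.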
I think the question to ask is: Are you expecting:
Users to create Posts more than once per second (I doubt it :)
People to comment on a Post more than once per second (could happen)
If not, then ancestor queries will be faster than normal queries. So it depends on your use case. I'd go for query speed unless you know you will have thousands of comments on posts.
I'm creating a prototype group list application. I want the following objects:
User
List
Item
Comment
I think that I should structure this as follows:
http://myapp.firebase.io/user/
http://myapp.firebase.io/user/uid/lists/
http://myapp.firebase.io/list/
http://myapp.firebase.io/item/listid/
http://myapp.firebase.io/comment/itemid
where http://myapp.firebase.io/user/uid/lists/ points to list unique id's, http://myapp.firebase.io/item/listid/ points to all item objects for a given list, and http://myapp.firebase.io/comment/itemid points to all comments for a given item.
Does this structure make sense? The reason I did it this way instead of nesting further (i.e. http://myapp.firebase.io/list/listid/item/ for items and http://myapp.firebase.io/list/listid/item/itemid/comment for comments) is because it says in the documentation that whenever you fetch an object you fetch all children. Sometimes (perhaps even most of the time) I want to fetch a list's items, but not each item's comments. I might only want to do that when a user clicks on the item.
In a NoSQL database you should model your data for how you intend to use it. I highly recommend reading this article on NoSQL data modeling.
The top-level structure seems fine and does not violate Firebase's recommendation to limit nesting of data. But there are many other places where you might still make mistakes (which is one of the reasons this question is a bit too broad for Stack Overflow, but I'll try to answer some of it anyway).
I'd separate out the user's lists into a separate top-level node:
/userlists/$uid/$listid
That way the /users/$uid nodes would just contain the user's profile information and you could cheaply show a list of users. You might even consider splitting the most visible aspect of the user profile into another top-level node, to make the showing of such a list even cheaper.
/usernames/$uid
You'll be duplicating data in this case. But storage is (relatively) cheap, and optimizing for the more common reading of data is one of the reasons NoSQL databases can scale so well.
As you may notice, I focus on showing a list of user names, retrieving the lists for a user and accessing the profile for a specific user. These are use-cases and we're modeling the data to fit them.
In a NoSQL database you should model your data for how your app accesses it. I highly recommend reading this article on NoSQL data modeling.
After that, write out your list of use-cases and see how you can most easily access the data for it. Liberally denormalize and occasionally duplicate the data, to fit the use-cases. Use multi-location updates to keep denormalized and duplicated data in sync with its main entity.
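To make the multi-location idea concrete, here is a small sketch that builds a single map of path to value covering both the list itself and its entry under `/userlists/$uid/$listid`. The `title` and `owner` fields are assumptions for illustration, and `apply_updates` is only an in-memory stand-in for sending the map to the database in one update call; it is not part of any Firebase SDK.

```python
def add_list_updates(uid, list_id, title):
    """Build one update map that writes a list and its per-user index entry together."""
    return {
        f"/list/{list_id}/title": title,       # hypothetical field
        f"/list/{list_id}/owner": uid,         # hypothetical field
        f"/userlists/{uid}/{list_id}": True,   # index entry for this user's lists
    }


def apply_updates(tree, updates):
    """In-memory stand-in: write every path in the map into a nested dict."""
    for path, value in updates.items():
        parts = path.strip("/").split("/")
        node = tree
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return tree


tree = apply_updates({}, add_list_updates("u1", "l1", "Groceries"))
print(tree)
```

Keeping all affected paths in one map is what lets the duplicated entries change together instead of drifting apart across separate writes.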
It's better to start with an example to illustrate this case. Let's say we have a User class and it should have a list of Post objects.
The first thought is to create this list inside the User class, but analyzing the use cases we find that most of the time we want to retrieve the user without its posts and retrieve the posts without the user. However, we need the user ID to retrieve posts. So the other way to create the data model is to drop the association and instead create Post entities indexed by user ID.
In terms of cost, what are the pros and cons of both implementations?
See the billing page, in particular the section on the datastore operations:
https://developers.google.com/appengine/docs/billing
Datastore read costs grow per entity.
Datastore write costs grow per indexed property.
The first method will be much cheaper since it only operates on one User entity, and there's no indexing required.
However, cost probably isn't your sole deciding factor. Entities are limited to 1MB each, so if you're storing your posts within your User entity, you'll likely run into a wall. Time to read/write entities also depends on size, so large entities will take longer to read/write.
My previous answer was assuming you were actually storing a list of Post objects within your User entity. It sounds like you're asking if the User and Post are both entities, and the User stores a list of keys to the Posts.
The main benefit of the first case (User with a list of keys to Post entities) is that it enables you to fetch Posts in a consistent manner. After getting the User object, you can read the list of Post keys and fetch the Posts individually. Datastore get-by-key operations are consistent. Depending on how you issue the get operations, this may be slower than a query (i.e., if you just use a for loop).
Another possible, very minor benefit is that as long as you don't index the Post list in your User, you can update your User relatively inexpensively this way. As an extreme example, if your User adds 5 Posts at once, you can add them all to the list, and then write the User once with one write operation. This isn't really all that great, since you probably have to write your Post entity anyway, but it's one less index write op per entity.
There is still the limit on the size of the User entity, so your List will have a maximum limit. There's also a maximum on the number of index entries per entity, so if you index the List, that could be a limit (but that would make the User entity more expensive to write too).
From a read perspective, the first case is not optimal.
The second case works better from a read perspective: it makes it easier to get Posts if you have the User ID, but you pay for the index write ops when you write your Post. If you don't write Posts often, this is better. Note that queries are eventually consistent.
I'm going to write simple news site on redis with supporting followers.
I can't work out how to organize a user's timeline like in Twitter. I read about Retwis ( http://redis.io/topics/twitter-clone ), but its way of building the feed seems awkward to me. What if I want to remove entries? I would have to remove all references to the entry from the followers' feeds. What if I no longer follow some users?
There are several ways to attack what you describe using a bit of imagination, here are some examples that address your questions:
What if I want to remove entries?
One could maintain a set such as post:$postid:users for each post, holding all the user IDs that may have the post in their feeds. When the post is to be deleted, one just has to extract all members from this set and iterate through the IDs to remove the post from each uid:$userid:posts set. Speaking of which, you would have to turn that last one into a set instead of a list like the original article suggests, in order to be able to extract and remove individual items, but that is trivial; the logic is pretty similar.
What if I already do not follow some users?
When the feed is being generated for each individual user you have to necessarily iterate and read each post:$postid key, from which you have access to the author userid; so before showing the post you read this id and look it up in the uid:$userid:following set, if it's there we show the post, if it's not we delete it from uid:$userid:posts and don't show it.
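The two answers above can be sketched as follows. This is an in-memory model only: a dict of Python sets stands in for Redis sets, using the key names from the text (`post:$postid:users`, `uid:$userid:posts`, `uid:$userid:following`). The `post_author` dict and the function names are hypothetical; with a real Redis server each set operation would become a set command such as SADD, SREM or SMEMBERS.

```python
from collections import defaultdict

sets = defaultdict(set)   # stand-in for Redis sets, keyed by key name
post_author = {}          # stand-in for the author field stored under post:$postid


def publish(post_id, author_id, follower_ids):
    """Fan a post out to followers and remember who received it."""
    post_author[post_id] = author_id
    for uid in follower_ids:
        sets[f"uid:{uid}:posts"].add(post_id)
        sets[f"post:{post_id}:users"].add(uid)


def delete_post(post_id):
    """Remove the post from every feed listed in its reverse set."""
    for uid in sets.pop(f"post:{post_id}:users", set()):
        sets[f"uid:{uid}:posts"].discard(post_id)
    post_author.pop(post_id, None)


def read_feed(uid):
    """Return visible posts, lazily dropping posts by authors no longer followed."""
    following = sets[f"uid:{uid}:following"]
    visible = []
    for post_id in sorted(sets[f"uid:{uid}:posts"]):  # sorted() copies, so discarding is safe
        if post_author.get(post_id) in following:
            visible.append(post_id)
        else:
            sets[f"uid:{uid}:posts"].discard(post_id)
    return visible


sets["uid:1:following"].update({10, 11})
publish(100, 10, [1])
publish(101, 11, [1])
publish(102, 10, [1])
delete_post(102)                      # author deletes a post
sets["uid:1:following"].discard(11)   # user 1 unfollows author 11
feed = read_feed(1)
print(feed)
```

The cleanup of unfollowed authors happens at read time, which keeps the unfollow operation itself cheap at the cost of a little extra work per feed read.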
In a nutshell, this is what you have to keep in mind in order to build this kind of logic in redis:
You'll need many commands, but that's ok, Redis is supposed to be fast enough to handle it well.
Data will repeat, but that is also ok; it may look insane for someone with a relational DBMS background to store a set of users for each post if each user already has a set with their posts, but this is the only way around building relationships in a non-relational data store like redis.
Generally speaking think of sets and sorted sets when designing something relational in Redis.
With redis you get to do everything yourself, but once you get your head around it it's actually pretty powerful.
I'm working on a notification feed for my mobile app and am looking for some help on an issue.
The app is a Twitter/Facebook like app where users can post statuses and other users can like, comment, or subscribe to them.
One thing I want in my app is a notifications feed where users can see who liked or commented on their post or subscribed to them.
I have the first part of this system figured out: when a user likes/comments/subscribes, a Notification entity is written to the datastore with details about the event. To show a user's Notifications, all I have to do is query for all Notifications for that user, sorted by creation date descending, and we have a nice little feed of the actions other users took on a specific user's account.
The issue I have is what to do when someone unlikes a post, unsubscribes or deletes a comment. Currently, if I were to query for that specific notification, it is possible that nothing would be returned from the datastore because of eventual consistency. We could imagine someone liking, then immediately unliking a post (b/c who hasn't done that? =P). The query to find that Notification might return null, and nothing would get deleted when calling ofy().delete().entity(notification).now(); And now the user has a notification in their feed saying Sally liked his post when in reality she liked it and then quickly unliked it!
A wrench in this whole system is that I cannot delete by Key<Notification>, because I don't really have a way to know id of the Notification when trying to delete it.
A potential solution I am experimenting with is to not delete any Notifications. Instead I would always write Notification's and simply indicate if the notification was positive or negative. Then in my query to display notifications to a specific user, I could somehow only display the sum-positive Notification's. This would save some money on datastore too because deleting entities is expensive.
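A minimal sketch of the "sum-positive" idea, in plain Python with no datastore calls. It assumes each stored event carries a `delta` of +1 (like, comment, subscribe) or -1 (the matching undo); the tuple layout and the `visible_notifications` name are hypothetical.

```python
from collections import Counter


def visible_notifications(events):
    """Sum +1/-1 deltas per (actor, action, target) and keep only the net-positive ones."""
    totals = Counter()
    for actor, action, target, delta in events:
        totals[(actor, action, target)] += delta
    return sorted(key for key, total in totals.items() if total > 0)


events = [
    ("sally", "like", "post1", +1),
    ("sally", "like", "post1", -1),   # Sally quickly unliked the post
    ("bob", "comment", "post1", +1),
]
shown = visible_notifications(events)
print(shown)
```

This only covers the summation step; how the events are fetched, and whether that fetch is itself eventually consistent, is a separate question.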
There are three main ways I've solved this problem before:
1. Deterministic key
For example:
{user-id}-{post-id}-{liked-by} for likes
{user-id}-{post-id}-{comment-by}-{comment-index} for comments
This will work for most basic use cases for the problem you defined, but you'll have some hairy edge cases to figure out (like managing indexes of comments as they get edited and deleted). This will allow get and delete by key.
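A small sketch of the deterministic key idea, assuming the individual IDs never contain a hyphen (otherwise two different tuples could produce the same key). The function names and the dict standing in for the datastore are hypothetical.

```python
def like_key(user_id, post_id, liked_by):
    """Key name for a 'like' notification: {user-id}-{post-id}-{liked-by}."""
    return f"{user_id}-{post_id}-{liked_by}"


def comment_key(user_id, post_id, comment_by, comment_index):
    """Key name for a 'comment' notification, including the comment index."""
    return f"{user_id}-{post_id}-{comment_by}-{comment_index}"


notifications = {}  # stand-in for get/put/delete by key

# Like: write the notification under its deterministic key.
key = like_key("u1", "p9", "sally")
notifications[key] = {"text": "sally liked your post"}

# Unlike: recompute the same key and delete directly, with no query involved.
notifications.pop(like_key("u1", "p9", "sally"), None)
print(key, key in notifications)
```

Because the key can be rebuilt from data already at hand, the delete never depends on an eventually consistent query.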
2. Parallel data structures
The idea here is to create more than one entity at a time in a transaction, but to make sure they have related keys. For example, when someone comments on a feed item, create a Comment entity, then create a CommentedOn entity which has the same ID, but make it have a parent key of the commenter user.
Then, you can make a strongly consistent query for the CommentedOn, and use the same id to do a get by key on the Comment. You can also just store a key, rather than having matching IDs if that's too hard. Having matching IDs in practice was easier each time I did this.
The main limitation of this approach is that you're effectively creating an index yourself out of entities, and while this can give you strongly consistent queries where you need them the throughput limitations of transactional writes can become harder to understand. You also need to manage state changes (like deletes) carefully.
3. State flags on entities
Assuming the Notification object just shows the user that something happened but links to another entity for the actual data, you could store a state flag (deleted, hidden, private etc) on that entity. Then listing your notifications would be a matter of loading the entities server side and filtering in code (or possibly subsequent filtered queries).
At the end of the day, the complexity of the solution should mirror the complexity of the problem. I would start with approach 3, then migrate to approach 2 when the fuller set of requirements is understood. Approach 2 is more robust and flexible, but the complexity of XG transaction limitations will rear its head. Ultimately, a distributed feed like this is a hard problem.
What I ended up doing, and what worked for my specific model, was that before creating a Notification entity I would first allocate an ID for it:
// Allocate an ID for a Notification
final Key<Notification> notificationKey = factory().allocateId(Notification.class);
final Long notificationId = notificationKey.getId();
Then when creating my Like or Follow Entity, I would set the property Like.notificationId = notificationId; or Follow.notificationId = notificationId;
Then I would save both Entities.
Later, when I want to delete the Like or Follow, I can do so and at the same time get the ID of the Notification, load the Notification by key (a get by key is strongly consistent), and delete it too.
Just another approach that may help someone =D
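The flow described in this answer can be modelled in a few lines of plain Python. This is only an in-memory sketch: `itertools.count` stands in for the ID allocator, two dicts stand in for the Like and Notification kinds, and the `like`/`unlike` function names are hypothetical.

```python
import itertools

_id_allocator = itertools.count(1)  # stand-in for allocating a Notification ID
notifications = {}                  # Notification entities by ID
likes = {}                          # Like entities by (post_id, liker)


def like(post_id, liker):
    """Allocate the notification ID first, then save both entities."""
    notification_id = next(_id_allocator)
    notifications[notification_id] = {"text": f"{liker} liked {post_id}"}
    likes[(post_id, liker)] = {"notification_id": notification_id}
    return notification_id


def unlike(post_id, liker):
    """Delete the Like and use its stored ID to delete the Notification by key."""
    like_entity = likes.pop((post_id, liker), None)
    if like_entity is not None:
        notifications.pop(like_entity["notification_id"], None)


first_id = like("post1", "sally")
like("post1", "bob")
unlike("post1", "sally")
print(sorted(notifications))
```

Storing the notification ID on the Like is what turns the later delete into a lookup by key instead of a query.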