Good way of implementing a twitter-like follower system? - google-app-engine

I'm trying to create a Twitter-like follower system (users can follow one another). I'm confused about a good way to store the follower relationships. I'm using JDO (on Google App Engine).
The first thing that comes to mind is to keep one Set for a user's followers and another for the people that user is following. Something like:
class User {
    private String mUsername;
    private Set<String> mFollowers;
    private Set<String> mFollowees;
}
I'm worried about what happens when those sets grow to have like 10,000+ entries in them. Viewing a user's page is going to be a common operation, and I'd hate to have to load the entire Sets every time my API needs to generate user info. I'm only going to be showing 50 followers at a time anyway, so it makes no sense to load the entire Set.
An alternative could be to use an intermediate class to store relationships, so that they are not bound to the User object. Paging should then also be easy (I think). For example, whenever I want to follow a user, I'd create an instance of this object:
class RelationshipInfo {
    private String mMyUsername;
    private String mUsernameYouAreFollowing;
}
So when I view a user's page, I could query for the first 50 such records given the user's id. Does that make any performance sense? I'm not sure if this is better than the first option above, since it would require more trips to the datastore.
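A minimal sketch of that second option, assuming a plain in-memory list as a stand-in for the datastore. The names `Relationship`, `RelationshipStore` and `followersOf` are hypothetical and not part of JDO; with JDO the same offset and limit would usually be expressed on the query itself, for example through `Query.setRange`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for one stored "follows" record.
class Relationship {
    final String follower;   // mMyUsername in the question
    final String followee;   // mUsernameYouAreFollowing in the question

    Relationship(String follower, String followee) {
        this.follower = follower;
        this.followee = followee;
    }
}

// In-memory stand-in for the datastore (an assumption for illustration only).
class RelationshipStore {
    private final List<Relationship> rows = new ArrayList<>();

    void follow(String follower, String followee) {
        rows.add(new Relationship(follower, followee));
    }

    // Returns one page of followers of `username`: skips the first `offset`
    // matching records and returns at most `limit` follower names.
    List<String> followersOf(String username, int offset, int limit) {
        List<String> page = new ArrayList<>();
        int skipped = 0;
        for (Relationship r : rows) {
            if (!r.followee.equals(username)) continue;
            if (skipped < offset) { skipped++; continue; }
            if (page.size() == limit) break;
            page.add(r.follower);
        }
        return page;
    }
}

class FollowerPagingDemo {
    public static void main(String[] args) {
        RelationshipStore store = new RelationshipStore();
        for (int i = 0; i < 120; i++) {
            store.follow("fan" + i, "alice");
        }
        // Only the requested page is materialized, never the full follower set.
        System.out.println(store.followersOf("alice", 0, 50).size());
        System.out.println(store.followersOf("alice", 100, 50).size());
    }
}
```

The point of the design is that the User object stays small no matter how many followers exist; the cost is one extra query per page of followers.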
Any thoughts would be great,
Thanks

Brett Slatkin's Building Scalable, Complex Apps on App Engine talk from last year's Google I/O actually uses a Twitter-like application as its example. Even aside from that, it's a great talk and I highly recommend it even if it didn't relate specifically to what you're asking.
Also, you may want to check out Jaiku, an open-source Twitter-like application built on App Engine.

Related

MongoDB: Single collection per user with all interactions, or multiple collections per topic?

Good evening.
I'm pretty new to MongoDB and I'm planning to build an app that will work with NoSQL (MongoDB).
The scope of the app is pretty simple:
Register a profile
Request an item from a shopper
Fulfill the request and send a payment notice.
If I were to build this with SQL, I would create a User table, a Request Item table, and a Sending Payment table.
To learn something new, I would like to build it with NoSQL instead, and I chose MongoDB.
I could create three collections, put each kind of document in its own collection, and run a search every time I need something.
OR, and this is the question, COULD I create a collection for EVERY user, and put every interaction of that user inside it?
So if I need to find the orders and payments of User10, I would look only inside the User10 collection and search for every item he/she requested.
On the other hand, how much would this hurt me if I needed to search all orders in a specific timeframe? I suppose it would be slower than SQL.
Is this an acceptable way to do it? Are there drawbacks I have not seen yet, or is it discouraged in favor of another approach?
The backend will be written in Java, while the app (for... reasons) will be written in Xamarin.Forms.
While this is possible, I would personally recommend against it, as it is considered an anti-pattern; you should read this article about this very topic.
I would ask myself what advantages I'm hoping to gain from this approach. If quick queries at the user level are what you seek, that should not be a problem with sufficient indexes (on user_id and on the timeframe field).
There are other standard solutions built to deal with scale, such as collection sharding. In my experience MongoDB deals with scale very well. It sounds like this is a personal project to learn from, which probably means you'll never really reach hyper scale. The first barrier you'll probably encounter is hardware.
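To make the trade-off concrete, here is a rough in-memory sketch of the single shared collection, assuming each document carries its user's id and a creation timestamp. `ItemRequest`, `RequestCollection` and their methods are hypothetical names for illustration and do not use the MongoDB driver; in a real deployment the two lookups would be backed by the indexes mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical document shape: every request records which user made it.
class ItemRequest {
    final String userId;
    final long createdAt;   // epoch milliseconds
    final String item;

    ItemRequest(String userId, long createdAt, String item) {
        this.userId = userId;
        this.createdAt = createdAt;
        this.item = item;
    }
}

// One shared "collection" for all users instead of one collection per user.
class RequestCollection {
    private final List<ItemRequest> docs = new ArrayList<>();

    void insert(ItemRequest doc) {
        docs.add(doc);
    }

    // Per-user lookup: the query a per-user collection was meant to speed up.
    List<ItemRequest> findByUser(String userId) {
        List<ItemRequest> out = new ArrayList<>();
        for (ItemRequest d : docs) {
            if (d.userId.equals(userId)) out.add(d);
        }
        return out;
    }

    // Cross-user lookup over the half-open timeframe [from, to). With one
    // collection per user this would need a separate query per collection.
    List<ItemRequest> findBetween(long from, long to) {
        List<ItemRequest> out = new ArrayList<>();
        for (ItemRequest d : docs) {
            if (d.createdAt >= from && d.createdAt < to) out.add(d);
        }
        return out;
    }
}
```

Both access paths work against the same data here, which is the main argument for keeping one collection per document type.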

How to make App Engine Datastore private

I'm developing an App Engine app that lets users keep a diary.
Now, I noticed that I can check all data in datastore through Developers Console.
This is not good for a diary app for privacy.
So I want to know how to make datastore private to prevent me from checking users' data.
Please help me.
This is a little bit tricky since the code can read the data in the datastore and so, by definition, anyone who can update the running code can also read the data in the datastore; however, there are ways that you can at least make it more difficult to inadvertently examine the data (though accessing the data will still be technically possible for you or any of the owners to do). The simplest way is to encrypt the data before storing it within the datastore model objects (and decrypting it when you read the data from the model objects); however, this will make indexed fields no longer work if you do that (you will need to decide whether that content really needs to be indexable or whether it is worthwhile to add manual indexing).
If you want data to not be readable by you at all, then you will need to encrypt/decrypt the data with a key that is only available to your application while the user is interacting with it (e.g. encrypting the data in the client that communicates with your server); however, you need to be aware that this will make any sort of indexing or background processing of the data impossible.
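As a rough sketch of the "encrypt before storing" idea, the following uses only the standard `javax.crypto` API with AES-GCM. The class name `DiaryCrypto`, the choice of AES-GCM, and the layout (a random 12-byte IV prepended to the ciphertext) are assumptions made for illustration; where the key lives and how it is protected is the hard part and is deliberately left out.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical helper: encrypts a diary entry before it is put into a
// datastore property and decrypts it after it is read back.
class DiaryCrypto {
    private static final int IV_BYTES = 12;   // IV size commonly used with GCM
    private static final int TAG_BITS = 128;  // authentication tag length
    private static final SecureRandom RANDOM = new SecureRandom();

    static SecretKey newKey() {
        try {
            KeyGenerator gen = KeyGenerator.getInstance("AES");
            gen.init(128);
            return gen.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns IV || ciphertext+tag, suitable for storing as a blob property.
    static byte[] encrypt(SecretKey key, String plaintext) {
        try {
            byte[] iv = new byte[IV_BYTES];
            RANDOM.nextBytes(iv);   // fresh IV for every entry
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] ct = cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
            return ByteBuffer.allocate(iv.length + ct.length).put(iv).put(ct).array();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Reads the IV from the front of the stored blob and decrypts the rest.
    static String decrypt(SecretKey key, byte[] stored) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key,
                    new GCMParameterSpec(TAG_BITS, stored, 0, IV_BYTES));
            byte[] plain = cipher.doFinal(stored, IV_BYTES, stored.length - IV_BYTES);
            return new String(plain, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because the same plaintext encrypts to different bytes each time, the stored value cannot be used for equality filters or sorting, which is the indexing limitation described above.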
The only way to prevent you from viewing data in the datastore is to remove you from the developers of the app. A developer can always extract data if he wants to, either by looking at it directly in the Datastore viewer or by writing code that can read/forward this data.

How to load seed data into app engine's datastore using golang?

I need to create a dictionary table inside of datastore and I would like to be able to have the data seeded by a script. Is there an easy way of doing this using go?
Ideally I'd like to be able to add entries to a list of "names", and the script should go through the list, check if the dictionary table contains an entry with that name, and create it if not. It would also be cool if it ran only on application restart.
So, to begin, I'll gently remind you that there are no "tables" in Datastore. Thinking using the terms of the RDBMS world will only confuse you. Please take the time to really understand the underlying storage mechanisms and data structures. I'd recommend this video for an in-depth look at what Datastore actually is.
So to get to your actual use-case, "application restart" is also something of a tricky term to use in relation to app engine. I'd recommend getting familiar with the actual infrastructure that runs your app - when instances of what scaling type are turned on and off, how they can communicate, etc...
The best way, apart from these nitpicks, is to have the list of words ("Names") to check for stored in your datastore. A cron job would run at a given time interval and fetch this archetypal list of Names. The code would then start a task queue job that runs the following process for each name: check whether the name already has a user-defined value in the "actual" data, and if so, just exit. If it doesn't exist, create it. When a user wants to actually define data on that name, the code that handles their request would retrieve that object, test a flag to see whether it was created by the automated cron job's task queue or not, and take action accordingly. The best way to initially populate the archetypal list of Names would be a bulk upload from CSV using appcfg. This feature is mostly undocumented and, although not technically deprecated, might take some thinking to get working right.
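The check-then-create step can be sketched as below. It is written in Java, like the other code on this page, rather than Go, and it uses an in-memory map as a stand-in for the datastore; `DictionaryEntry`, `DictionarySeeder` and the `seeded` flag are hypothetical names, and the logic should carry over to Go largely unchanged.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical dictionary entity. "seeded" is the flag that records whether
// the entry was created by the automated job or defined by a user.
class DictionaryEntry {
    final String name;
    final boolean seeded;

    DictionaryEntry(String name, boolean seeded) {
        this.name = name;
        this.seeded = seeded;
    }
}

class DictionarySeeder {
    // In-memory stand-in for the datastore, keyed by name (an assumption).
    private final Map<String, DictionaryEntry> store = new LinkedHashMap<>();

    // Simulates a user defining an entry through the normal request path.
    void putUserDefined(String name) {
        store.put(name, new DictionaryEntry(name, false));
    }

    DictionaryEntry get(String name) {
        return store.get(name);
    }

    // Creates an entry for every name that does not exist yet and returns
    // how many were created. Existing entries are left untouched, so the
    // job is safe to run repeatedly.
    int seed(List<String> names) {
        int created = 0;
        for (String name : names) {
            if (store.containsKey(name)) continue;
            store.put(name, new DictionaryEntry(name, true));
            created++;
        }
        return created;
    }
}
```

Making the job idempotent in this way means it does not matter whether it runs from cron, on instance start, or by hand.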
Best of luck in your coding endeavours.

Strong consistency in Datastore (HRD)... my idea

I'm hoping that this isn't flagged as "not helpful" because I think that many people are attempting to figure out a way to keep strong consistency in the HRD.
Here is the idea I'm using for my app. I'd like to get your opinions.
I have a fitness app. This is of course made up of Workouts and Exercises.
The HRD contains about 400 exercises to pick from, or the User can create their own Exercise (a UExercise).
When the User logs in, I load all of the Workout keys into a "workoutKeys" List on the User. At the same time I load all the User exercise keys (UExercise) into an "exerciseKeys" List, also on the User.
If the user wants to add/delete exercises from a specific workout, the Workout is loaded and all its Exercise keys are loaded into an "exerciseKeys" List on the Workout.
See a pattern here?
So whenever I want to view Exercises created by the user (UExercise), the user's Workouts, or the Exercises in a Workout, I do a get() using those keys.
Since a user would probably not have 1000's of Workouts, or create 1000's of Exercises, I think this is a safe and quick way to achieve strong consistency.
Not saying that this is the best way for EVERY app. But for mine I believe it will work well.
I would greatly appreciate all of your input as to if there is something I may be missing here, or not properly taking into consideration.
Ok... After some careful consideration of how my app will work, and how users actually use it, I have decided to ditch the idea above and go with Ancestor Queries.
So for the above models, I have come up with the following...
For a Workout, I make the User the parent
For an Exercise created by a user (UExercise), I make the User the parent
This allows me to use Ancestor Queries (which are strongly consistent) to pull the most recently added or modified Entities.
Since the user will not be modifying these Entities en masse, I think the limitations on the writes will not be a factor.
This also rids me of properties on Model objects that should not really be there in the first place.
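The parent/child layout can be pictured with a small in-memory model. `EntityKey`, `EntityStore` and `ancestorQuery` are hypothetical names and not the App Engine API; the sketch only shows how keys that carry a parent let one query select everything under a User, and it cannot demonstrate the consistency guarantee itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical key: a kind, a name, and an optional parent key.
class EntityKey {
    final EntityKey parent;
    final String kind;
    final String name;

    EntityKey(EntityKey parent, String kind, String name) {
        this.parent = parent;
        this.kind = kind;
        this.name = name;
    }

    // True if `ancestor` appears anywhere up this key's parent chain.
    boolean isDescendantOf(EntityKey ancestor) {
        for (EntityKey k = parent; k != null; k = k.parent) {
            if (k == ancestor) return true;
        }
        return false;
    }
}

// In-memory stand-in for the datastore (an assumption for illustration).
class EntityStore {
    private final List<EntityKey> keys = new ArrayList<>();

    EntityKey put(EntityKey parent, String kind, String name) {
        EntityKey key = new EntityKey(parent, kind, name);
        keys.add(key);
        return key;
    }

    // Returns every key of the given kind stored under `ancestor`.
    List<EntityKey> ancestorQuery(String kind, EntityKey ancestor) {
        List<EntityKey> out = new ArrayList<>();
        for (EntityKey k : keys) {
            if (k.kind.equals(kind) && k.isDescendantOf(ancestor)) out.add(k);
        }
        return out;
    }
}
```

With this layout the User no longer needs "workoutKeys" or "exerciseKeys" lists, because the relationship is encoded in each child's key.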
By the way, I also tried Memcache. I found this to be the ultimate pain. Having to keep the Memcache and the Datastore in sync seemed to inject much more complexity than was really needed.
But your site, and results may differ. This idea works well for my app.
Thanks!

App Engine entity groups: grouping all of a user's data vs avoiding them as long as possible

I'm working on a web application that allows users to create simple websites and publish them on a static web host, and I have problems deciding if and how I should use ancestors in the gray area between necessary and avoidable.
The model is rather simple: currently every User has one or more Website entities. The Website entities store all the basic information about a website, plus its nested navigation menu that refers to Page entities (the navigation tree is stored as a JSON property). The Page entity types are based on a PolyModel, and there are several page types that behave differently (there's a GalleryPage, for example).
There are no entity groups (or rather, no entities with ancestors) as of yet, and I'll only need a couple of transactions. When updating a Page's name, for example, I have to update it in the Page entity itself as well as in the navigation tree on the Website entity.
I think I understand how entity groups work and the basic implications of using them, but I have trouble deciding on the "best" way to structure my data in the absence of strong reasons for either approach. I could:
Go entirely without ancestors on my entities. As far as I understand, I can still use cross-group transactions as long as I get the entities by key and don't need more than 5 within the transaction. The downside is that I'd depend on the XG transactions, and there might come a point where I can't ninja my way around using ancestor queries anymore (and then it might be too late).
Make the user object the parent of all his Websites, Pages and other data. This would give the user a strongly consistent view of all of his data and allow me to use transactions whenever I add a feature that needs them, but it limits the sustained writes to 1-5/sec. But, as a user will only ever be updating his own data, this might actually work and behave just the same for 1000 users as it will for 1.
Try to use even smaller entity groups (like separating the navigation from the Website and making that the parent of the Website's Pages). But I'm not quite sure if there's much benefit to this, because most of the editing happens on Pages anyway.
So I guess the real question is: how do you decide when to use ancestor relationships on App Engine when there's no obvious reason for or against them? Would you go for the convenience of strongly consistent queries and being able to use transactions freely while adding features later, or would you avoid them at all costs until there's a very obvious reason for them, even if it might limit my ability to do transactions later?
I read the related documentation, read the chapter on transactions in "Programming App Engine", looked at quite a few of the Google I/O videos, but I still find it hard to make that decision.
