app engine logging to database and entity groups - google-app-engine

In my application I have a Profile entity, which has some children, like ProfileAccount, ProfileLink, etc. They're usually updated in a transaction, like:
def update_profile(key):
    profile = db.get(key)
    accounts = db.GqlQuery("SELECT * FROM ProfileAccount WHERE ANCESTOR IS :1", profile)
    # do something with accounts and profile
    profile.put()
I call it with db.run_in_transaction(update_profile, key), but I need to have an administrative log of everything that happens when the profile is updated, so I created a generic AdminLog entity which contains a reference to a Profile, the timestamp and arbitrary string data. This would be processed later to check what happened since the last user login.
The problem is that, since AdminLog doesn't belong to the same entity group as the Profile, I cannot add it in the same transaction; on the other hand, I don't think it would be wise to put all those logs under the same entity group as the Profile, since it's not essential data.
One thing I thought about would be a StringList property on the Profile that gets cleared on each login, so this way I'd have everything that happened to the profile. Do you think that's a nice approach, or is there some other workaround for this kind of situation?
Thanks in advance for any tips

Using child entities seems like the best option. It ensures you can update them transactionally, and associates the changes with the entity they apply to. If you wish, you can garbage collect old admin log entries to save space.
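For illustration, here is a rough sketch of that approach using the old db API from the question, extending the update_profile function above; the AdminLog properties and the message argument are assumptions, not the asker's actual model:

from google.appengine.ext import db

class AdminLog(db.Model):
    # Child of Profile, so the parent key doubles as the reference to the Profile.
    created = db.DateTimeProperty(auto_now_add=True)
    data = db.TextProperty()

def update_profile(key, message):
    profile = db.get(key)
    accounts = db.GqlQuery("SELECT * FROM ProfileAccount WHERE ANCESTOR IS :1", profile)
    # ... do something with accounts and profile ...
    profile.put()
    # Same entity group as the Profile, so this write joins the transaction.
    AdminLog(parent=profile, data=message).put()

# profile_key is the Profile's db.Key
db.run_in_transaction(update_profile, profile_key, "accounts updated")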

Related

Implement Terms and Conditions for IdentityServer (something similar to Consent Service)

Our applications do not require consents from the users, but they do require terms and conditions (tandc) to be accepted before accessing the application. We have two tables to track tandc, something like this:
TermsAndConditions
-TermsAndConditionsId
-Message
-IsActive
UserTermsAndConditions
-UserId
-TermsAndConditionsId
-AgreedDate
Current approach:
Every time we come up with a new tandc (which happens when there is a major change in the application), we insert a record into the TermsAndConditions table. When a user signs into our application, right after authentication we check whether there is a record in the UserTermsAndConditions table for the authenticated user. If there is, we issue tokens; if there is no record, we show the tandc page, and when the user agrees to the tandc we insert a record into the UserTermsAndConditions table.
We're planning to do the same by providing a new implementation of IConsentService instead of the DefaultConsentService. Is this the right approach?

GAE Transaction in root entity

I'm new to GAE and I have some questions about transaction with the DataStore.
For example, I have a user entity, which is created when the user adds my app on Facebook. I get some properties with the Facebook API, but I want to add a username for the user, and it needs to be unique. So in the transaction scope I call this method:
def ExistsUsernameToDiferentUser(self, user, username):
    query = User.all()
    query.filter("username =", username)
    query.filter("idFacebook !=", user.idFacebook)
    userReturned = query.get()
    return userReturned is not None
But GAE gives me this error:
BadRequestError: queries inside transactions must have ancestors
Ok, I understand, but the user doesn't have any ancestor, it's a root entity. What do I have to do?
I see what you're trying to do now.
By forcing the use of ancestors, the datastore forces you to lock down a portion of the datastore (everything under the given ancestor) so you can guarantee consistency on that portion. However, to do what you want, you essentially need to lock down all User entities to query whether a certain one exists, and then create a new one, and then unlock them.
You CAN do this: just create an entity (it can be an empty entity), make sure it has a well-known, unique key (like "user-ancestor"), save it, and make it the ancestor of every User entity.
THIS IS PROBABLY A BAD IDEA since it limits your performance on User entities, particularly on writes. Every time a new user is created, all User entities are prevented from being updated.
I'm trying to illustrate how you need to think about transactions a bit differently in the HRD world. It's up to you to structure your data (using ancestors) so that you get good performance characteristics for your particular application. In fact, you might disagree with me and say that User entities will be updated so infrequently that it's ok to lock them all.
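For illustration only, given the caveat above, a sketch of what that single-ancestor setup could look like with the db API; the UserRoot kind and the "user-ancestor" key name are made up for the example, and User is assumed to have the username and idFacebook properties from the question:

from google.appengine.ext import db

class UserRoot(db.Model):
    # Empty entity whose only job is to put every User in one entity group.
    pass

root = UserRoot.get_or_insert("user-ancestor")

def create_user(username, id_facebook):
    # The ancestor makes this query legal inside a transaction, but it also
    # serializes all writes under the root, hence the warning above.
    query = User.all().ancestor(root)
    query.filter("username =", username)
    if query.get() is None:
        user = User(parent=root, username=username, idFacebook=id_facebook)
        user.put()
        return user
    return None  # username already taken

db.run_in_transaction(create_user, "bob", "1234567")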
For illustrative purposes, another short-sighted possibility is to create multiple ancestors based on the username. ie, one for each letter of the alphabet. Then when you need to create a new User, you can search based on the appropriate ancestor. While this is an improvement from having a single ancestor (it's 26 times better), it still limits your future performance up front. This may be ok if you know right now the total number of users you will eventually have, but I suspect you want hundreds of millions of users.
The best way is to go back to the other suggestion and make the username the key. This allows you the best scalability, since getting/setting the User entity by key can be transactional and won't lock down other entities, limiting your scalability.
You'll need to find a way to work your application around this. For example, whatever information you get before the username can be stored in another entity that has a RelatedField to the User which is created later. Or you can copy that data into the User entity after the User entity is created by key, then remove the original entity.
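A rough sketch of the second variant (copy the data into the keyed User, then remove the original); the PendingSignup kind and its properties are made up for the example, and User is assumed to use the username as its key name:

from google.appengine.ext import db

class PendingSignup(db.Model):
    # Temporary holder for the Facebook data gathered before the user picks a username.
    idFacebook = db.StringProperty()
    name = db.StringProperty()

def claim_username(pending_key, username):
    pending = db.get(pending_key)
    # get_or_insert is atomic, so the username (key name) can only be claimed once.
    user = User.get_or_insert(username,
                              idFacebook=pending.idFacebook,
                              name=pending.name)
    if user.idFacebook != pending.idFacebook:
        return None  # someone else already owns this username
    pending.delete()
    return user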
If usernames are unique, why don't you make it the key?
class User(db.Model):
    @property
    def username(self):
        return self.key().name()
    ....

User.get_or_insert(username, field1=value1, ....)
Note: You will not need transactions if you use get_or_insert

What is the best strategy for mirroring a remote DB in Core Data?

Let's say that I have two tables in a DB: Expenses and Account. Expenses is the data that I'm interested in and that table has a foreign key to Account. This DB is remote, accessed via Restful-esque commands, and I want to mirror just the data I need for my app in a Core Data data store on the iPhone. The actual DB I'm working with is much bigger than this example. ~30 tables and the Expenses table has ~7 FKs. I'm working closely with the person doing the API design, so I can modify the way I make my requests or the data returned, if necessary.
What is the best strategy for loading this data into Core Data?
My first thought was to have the request for the expense bring back the ids for the FK.
<expense>
<date>1/1/2011</date>
<cost>1.50</cost>
<account_id>123</account_id>
</expense>
This works fine if I already have an account with id '123' in my data store. If I don't, then I've got to make additional web requests every time I encounter an id I don't have, which is going to be incredibly slow. I can get around this by making requests in a specific order, i.e. requesting all new accounts before requesting expenses, so that way I know all the FK rows exist. I feel this would become much too cumbersome once the DB starts reaching moderate complexity.
My second thought was to have the data returned from the request follow FKs and return data from the FK.
<expense>
<date>1/1/2011</date>
<cost>1.50</cost>
<account>
<id>123</id>
<name>Bob's Big Boy</name>
<address>1234 Main Street</address>
</account>
</expense>
This looks better and guarantees that I'll have all the data I need when I need it. If I don't already have an account '123' I can create a new account object from that XML. My concern with this method, though, is that as the database grows in complexity, these XML files could become excessively large. The Expenses table has ~7 foreign keys, each of those tables has multiple FKs. It feels like a simple request for just a single Expense could end up returning a huge chunk of data.
How have other people solved this issue?
I am assuming that at any given time you only want to cache part of the server DB in the local app and that the cached data may change over time.
You probably want to use "stub" entities to represent related objects that you haven't actually downloaded yet. You would set up the entities like this:
Expense{
date:Date
cost:Number
account<<-->AccountStub.expenses
}
AccountStub{
id:Number
expenses<-->>Expenses.account
}
Account:AccountStub{
name:String
address:String
}
The AccountStub entity has the bare minimum info needed to identify the Account in the server DB based on info provided from the Expense table. It serves as a placeholder in the object graph for the full fledged Account object (you can think of it as a type of fault if you like.)
Since Expenses has the relationship with AccountStub and Account inherits from AccountStub you can swap out an Account for an AccountStub (and vice versa) as needed.
You will need to provide a custom subclass for AccountStub and Account such that AccountStub can trigger the downloading of account data and the creation of an Account object when that data is actually required. Then the new Account object should be swapped out for AccountStub in all its relationships (that may take rather a lot of code.)
To use this, you would first obtain the data for an Expense object and create that object. You would then attempt to fetch an AccountStub with the ID provided by the Expense table data, setting the fetch to include subentities. If an AccountStub or Account object exists with that ID, you add the Expense object to the relationship. If not, you create an AccountStub object with that ID and add it to the relationship. Now you have a basic object graph showing the relationship of an Expense object to an AccountStub object. To access the account data of an Expense, you would first check if the related account is a stub or a full account. If it is a stub, you need to load the full account data before proceeding.
The advantage of this system is that you can maintain a fairly complex object graph without having to actually have all the data locally, e.g. you can maintain several relationships and walk those relationships. For example, you could expand your model like this:
AccountStub{
id:Number
expenses<-->>Expenses.account
owner<<--AccountOwnerStub.accounts
}
AccountOwnerStub{
id:Number
accounts<-->>AccountStub.owner
}
AccountOwner:AccountOwnerStub{
name:String
address:String
bill:Number
}
If you wanted to find the name of an Expense object's account owner, you would just walk the relationship across the stubs with account.owner.name; the Account object itself would remain just a stub.
If you need to conserve room locally, you can revert an object back to a stub without compromising the graph.
This would take some work and you would have to keep an eye on the stubs but it would let you mirror a complex external DB without having to keep all the data on hand.

How to structure database tables?

I am planning a database. It will track when a software program has been registered and log the information in the Registered table.
Two questions:
1: Where should I log invalid registration attempts? For example, if the user enters the wrong registration information, or if they try to register but have used all of their licenses. I want to remember this information, but where do I put it?
I was thinking a separate FailedRegistration table or a general notifications table. What do you think?
2: Also, if a user registers the same computer again I want to allow it, but I want to document that they re-registered the computer. Where should I store this information?
I was thinking of making a DateRegistered table that is linked to the Registered table. That way, for each successful registration, I can keep track of whether someone re-registers on the same computer.
Any comments are helpful as I think through this.
Thanks.
If you need to specifically act on failed registrations, or later turn one into a successful registration, store them in a separate table. If you only need to know about them, consider just storing the failures in a log table of some sort.
I think you want a separate table tracking the user and the machine registered on; that way, you know how many registrations a user performed, whether it's 1, 2, or 10, etc. Just a pointer table that points to the user ID and the registration...
My two cents.
Personally, I prefer to use logs, rather than database tables, to record "events" that are suitable for logging, and your "failed registration" event definitely seems to fall under this category (the "dates of registration" information is more debatable from this point of view).
Of course, that does depend on having a good logging system (with log rotation, etc) and a good log-processing system too -- many hosting providers, for example, may not give you those, though they'll typically let you use a relational DB.
If that's the case (you can't rely on "good logging and log processing", and whatever you do need to persist must go somewhere in the DB), then one or more "log-like tables" (more or less like you outline) are a kind-of-OK workaround, and it's hard to suggest better ones without more info about your deployment situation ;-).
I think two tables would work: one table to track users (e.g. id, username, serial, email), and one table to track registrations (id, foreign key to the users table, timestamp, a record of success or failure, and some field to identify the user's computer); see the sketch below.
The second table would be your log table, with entries for successful initial registration, successful re-registration, and failed registration attempts. No?
Depending on how much information you have on the user's machine, you can come up with various ways to identify whether it is the same machine or not. This is a hard problem, though.
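A rough sketch of the two-table layout described above, here as SQLite DDL driven from Python; all table and column names are placeholders:

import sqlite3

conn = sqlite3.connect("registrations.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS users (
    id       INTEGER PRIMARY KEY,
    username TEXT NOT NULL,
    serial   TEXT NOT NULL,
    email    TEXT
);

-- One row per attempt: initial registration, re-registration, or failure.
CREATE TABLE IF NOT EXISTS registrations (
    id           INTEGER PRIMARY KEY,
    user_id      INTEGER NOT NULL REFERENCES users(id),
    attempted_at TEXT DEFAULT CURRENT_TIMESTAMP,
    succeeded    INTEGER NOT NULL,   -- 1 = success, 0 = failure
    machine_id   TEXT                -- whatever identifies the computer
);
""")
conn.commit()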

Creating a Notifications type feed in GAE Objectify

I'm working on a notification feed for my mobile app and am looking for some help on an issue.
The app is a Twitter/Facebook-like app where users can post statuses and other users can like, comment on, or subscribe to them.
One thing I want in my app is a notifications feed where users can see who liked/commented on their post or subscribed to them.
The first part of this system I have figured out: when a user likes/comments/subscribes, a Notification entity will be written to the datastore with details about the event. To show a user's Notifications, all I have to do is query for all Notifications for that user, sorted by date created descending, and we have a nice little feed of actions other users took on a specific user's account.
The issue I have is what to do when someone unlikes a post, unsubscribes, or deletes a comment. Currently, if I were to query for that specific notification, it is possible that nothing would return from the datastore because of eventual consistency. We could imagine someone liking, then immediately unliking a post (b/c who hasn't done that? =P). The query to find that Notification might return null and nothing would get deleted when calling ofy().delete().entity(notification).now(); and now the user has a notification in their feed saying Sally liked his post when in reality she liked and then quickly unliked it!
A wrench in this whole system is that I cannot delete by Key<Notification>, because I don't really have a way to know the id of the Notification when trying to delete it.
A potential solution I am experimenting with is to not delete any Notifications. Instead I would always write Notifications and simply indicate whether the notification was positive or negative. Then in my query to display notifications to a specific user, I could somehow display only the sum-positive Notifications. This would also save some money on the datastore, because deleting entities is expensive.
There are three main ways I've solved this problem before:
deterministic key
for example
{user-Id}-{post-id}-{liked-by} for likes
{user-id}-{post-id}-{comment-by}-{comment-index} for comments
This will work for most basic use cases for the problem you defined, but you'll have some hairy edge cases to figure out (like managing indexes of comments as they get edited and deleted). It allows get and delete by key (see the sketch after these three approaches).
parallel data structures
The idea here is to create more than one entity at a time in a transaction, but to make sure they have related keys. For example, when someone comments on a feed item, create a Comment entity, then create a CommentedOn entity which has the same ID, but make it have a parent key of the commenter user.
Then, you can make a strongly consistent query for the CommentedOn, and use the same id to do a get by key on the Comment. You can also just store a key, rather than having matching IDs if that's too hard. Having matching IDs in practice was easier each time I did this.
The main limitation of this approach is that you're effectively creating an index yourself out of entities, and while this can give you strongly consistent queries where you need them, the throughput limitations of transactional writes can become harder to understand. You also need to manage state changes (like deletes) carefully.
State flags on entities
Assuming the Notification object just shows the user that something happened but links to another entity for the actual data, you could store a state flag (deleted, hidden, private etc) on that entity. Then listing your notifications would be a matter of loading the entities server side and filtering in code (or possibly subsequent filtered queries).
At the end of the day, the complexity of the solution should mirror the complexity of the problem. I would start with approach 3 then migrate to approach 2 when the fuller set of requirements is understood. It is a more robust and flexible approach, but complexity of XG transaction limitations will rear its head - but ultimately a distributed feed like this is a hard problem.
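The question uses Objectify, but the deterministic-key idea from the first approach can be sketched in Python NDB to show how delete-by-key sidesteps the eventually consistent query; all names here are illustrative:

from google.appengine.ext import ndb

class Notification(ndb.Model):
    event = ndb.StringProperty()                # "like", "comment", ...
    created = ndb.DateTimeProperty(auto_now_add=True)

def like_notification_key(owner_id, post_id, liker_id):
    # Deterministic key name in the {user-id}-{post-id}-{liked-by} style,
    # so the entity can later be fetched or deleted by key with no query.
    return ndb.Key(Notification, "%s-%s-%s" % (owner_id, post_id, liker_id))

# On like:
Notification(key=like_notification_key("u1", "p7", "u2"), event="like").put()
# On unlike: delete by the same key, no eventually consistent query involved.
like_notification_key("u1", "p7", "u2").delete()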
What I ended up doing, and what worked for my specific model, was that before creating a Notification entity I would first allocate an ID for it:
// Allocate an ID for a Notification
final Key<Notification> notificationKey = factory().allocateId(Notification.class);
final Long notificationId = notificationKey.getId();
Then when creating my Like or Follow Entity, I would set the property Like.notificationId = notificationId; or Follow.notificationId = notificationId;
Then I would save both Entities.
Later, when I want to delete the Like or Follow, I can do so and at the same time get the id of the Notification, load the Notification by key (which is strongly consistent), and delete it too.
Just another approach that may help someone =D
