Objectify: adding the @Cache annotation on existing data and removing @Parent - google-app-engine

Two questions about updating my domain diagram:
1) I am new to GAE and have just deployed my first application based on Objectify, only to discover that soon after my first users came in I had gone through the datastore read quota limit.
I had not put much thought into server-side caching until now; I thought Objectify's session cache would do the job for me. But now I realise I need to use the global memcache.
According to Objectify's documentation, I have to use Objectify's @Cache annotation on every entity that is accessed by key (and not by query).
However, I am concerned about the side effects this will have on data that I have already stored in the datastore.
2) I also realize now that I am using @Parent too much. There are a couple of entities where using @Parent has no benefit (and it has some drawbacks, since the datastore limits write operations on entities belonging to the same root).
If I go ahead and remove the @Parent annotation from the entities of my domain where it is no longer needed, will it have side effects on the already persisted entities?
Thanks!

For Objectify: the global cache is enabled by default, however you must still annotate your entity classes with @Cache.
@Parent is important if you need consistent results and want to avoid eventual consistency. Removing the ancestor will have a side effect on the already stored data, as the key will change. You will need a migration plan.
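To see why the key changes, here is a sketch with the Python datastore API (the Customer/Order kinds are made up; the same key semantics apply to the Java entities Objectify maps):

from google.appengine.ext import ndb

# The full ancestor path is part of an entity's key, so dropping the
# parent gives the "same" entity a brand-new identity:
key_with_parent = ndb.Key('Customer', 42, 'Order', 7)
key_without_parent = ndb.Key('Order', 7)
assert key_with_parent != key_without_parent
# A get() by the new key will not find entities stored under the old
# one - hence the migration plan.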
But most of all, the free quotas are quite reasonable, so if you already run into quota errors with your first users, then I would suggest installing appstats and actually measuring what the real underlying cause is, i.e. which action(s) are responsible for the bulk of the operations, and working on those. Much better than a general approach.

Related

Are Google App Engine entity groups locked when writes are not in a transaction

I have a question-answer-comment application (similar to Stack Overflow). The questions and their related answers and comments logically form part of entity groups as defined in the App Engine docs.
I want to use entity groups/ancestor paths to group my entities together for 2 reasons:
Improve query efficiency by storing Question and Answer entities together physically
Allow me to perform ancestor queries, thus eliminating the need for me to store the Answer keys on the Question entity (relationships)
I do not want strong consistency as it will eventually cause contention.
Does App Engine always lock an entity group when updating or only when the update is being done in a transaction? In other words, do entity groups force updates to happen in transactions or simply provide the option to use transactions?
About your 1st reason for choosing an ancestry-based approach: I don't think I have ever seen any kind of promise with respect to physical location in the datastore; I imagine any such constraint would collide with its high scalability. I wouldn't worry about it; IMHO the gain from such an efficiency optimisation, if any, would be negligible.
You should be aware that contention isn't directly related to (strong) consistency (consistency really boils down to just the accuracy of query results).
Contention is, however, directly related to accessing the same entity group simultaneously, even for read operations, not only for writes - see Contention problems in Google App Engine. Using ancestry only makes it worse, as all entities in an ancestry tree are in the same entity group.
For your 2nd reason (if I understand your goal correctly) you don't need to store Answer keys into your Question entity or use ancestry. If you store the Question key (or key ID) into the Answer entity you can obtain the answers to a question by making regular (non-ancestor) queries for Answer entities with the matching question key/ID.
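For instance, a minimal sketch with the Python ndb API (the models and property names are made up; the same structure applies in the other runtime APIs):

from google.appengine.ext import ndb

class Question(ndb.Model):
    title = ndb.StringProperty()
    votes = ndb.IntegerProperty(default=0)

class Answer(ndb.Model):
    # A plain key reference instead of an ancestor: Question and Answer
    # end up in separate entity groups, so their writes don't contend.
    question = ndb.KeyProperty(kind=Question)
    body = ndb.TextProperty()

def answers_for(question_key):
    # Regular (non-ancestor) query; eventually consistent, which is
    # exactly the trade-off you said you accept.
    return Answer.query(Answer.question == question_key).fetch()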
The entity group "locking" is only visible in transactions (and no, transactions aren't enforced, but think twice before attempting to write outside transactions: unintended overwrites will occur). But note that such locking is only effective as protection against conflicting write ops, not against contention.
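To make that concrete, a read-modify-write sketch reusing the hypothetical Question model above (the votes counter is equally made up):

from google.appengine.ext import ndb

@ndb.transactional
def add_vote(question_key):
    # Inside a transaction, a conflicting write to the same entity group
    # raises and is retried instead of silently overwriting.
    question = question_key.get()
    question.votes += 1
    question.put()

# The same get()/put() outside a transaction is perfectly legal (nothing
# is enforced), but two concurrent callers can both read votes == 5 and
# both write 6 - a silent lost update.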

Automatic Upgrade to Cloud Firestore, what about ancestor queries and entity groups?

Regarding the announcement of the automatic upgrade to Cloud Firestore for Google Datastore projects, the benefits include:
Queries are no longer eventually consistent; instead, they are all strongly consistent.
Transactions are no longer limited to 25 entity groups.
Writes to an entity group are no longer limited to 1 per second.
In the currently active app, some logic was implemented to ensure strong consistency, using cross-group transaction operations and the creation of ancestor queries & entity groups.
What will happen to all this app logic & DB data structure when it is automatically migrated to Firestore? Since data would be strongly consistent, it seems there will no longer be a need for entity groups & ancestor queries!
...unless they are used inside a cross-group transaction for atomic behavior across multiple entities.
Any thoughts on that and what to expect? Also, does anyone know when the automatic migration is expected to finish?
My interpretation of the announcement in your context:
Your existing cross-group transactions don't touch more than 25 entity groups, due to the current limit. Dropping the 25-group limit won't have any influence on them; they should continue to work as before.
ancestor queries remain supported
structuring/grouping your data in entity groups remains supported, regardless of the reason behind it. Your particular split may have been driven by current limits - the migration may make that reason disappear, but that's about it.
So I'm almost certain your app will continue to function unaffected (except maybe in performance/response times?). The difference will be that you will have the option of dropping workarounds for the no-longer-applicable limitations and maybe further optimizing your app.
In general I believe all existing applications will remain unaffected, otherwise Google wouldn't make the upgrades automatic - they would simply notify the app owners to make the necessary changes by a certain date, with a migration guide in place - like they did with other non-backwards-compatible changes.

Database: Repositories with NoSQL/Document Database (DDD)

Looking for any advice from anyone who has migrated their repositories from a relational DB to a NoSQL one.
We are currently building an App using a Postgres database & ORM (SQLAlchemy). However, there is a possibility that at a later date we may need to migrate the App to an environment that currently only supports a couple of NoSQL solutions.
With that in mind, we're following the Persistence-Oriented approach to repositories covered in Vaughn Vernon's Implementing Domain-Driven Design. This results in the following API:
save(aggregate)
save_all(aggregates)
remove(aggregate)
get_by_...
Without going into detail, the ORM-specific code has been hidden away in the repository itself. The Session is only used for the short span of time when data is retrieved or updated, and is then immediately committed and closed (in the repo's methods). This means lots of merging on save, and not the most efficient use of the Session.
def save(aggregate):
    try:
        # Merge the detached aggregate into the short-lived session.
        session.merge(aggregate)
        session.commit()
    except Exception:
        session.rollback()
        raise

def get(aggregate_id):
    try:
        aggregate = session.query(Aggregate).get(aggregate_id)
        # Detach so the aggregate outlives the session.
        session.expunge(aggregate)
        session.commit()
    except Exception:
        session.rollback()
        raise
    return aggregate

etc etc
The advantages:
We are limiting ourselves to updating a single Aggregate per Use Case, so the impact of not fully utilising UoW transaction control in the Application Service is minimal (outside of performance). Transaction control is enabled in the repos while the aggregate is written, to ensure the full aggregate is persisted.
No ORM-specific code leaks outside of the Repositories, which would need to be re-coded in the event of switching to a NoSQL DB anyway.
So if we do have to switch to a NoSQL DB, we should have a minimal amount of work to do.
However, almost everything I have read encourages Transactional Behaviour to live in the Application Service Layer, although I believe there is a distinction here between Business Transactional and DB Transactional.
Likewise, we're taking a performance hit, in that we are asking the session factory for a session on every call to the repository. Most services contain about 3 or so calls to a repository.
So, the questions to anyone who has migrated from a relational to a NoSQL DB:
Does the concept of a Unit of Work / Session mean anything in a NoSQL world?
Should we fully embrace the ORM in the meantime, and move the UOW/Session outside of the Repository into the Application Service?
If we do that, what would the level of effort be to re-engineer the Application Service if we need to migrate to a NoSQL solution in the end? (The repositories will need to be re-written in any instance.)
Finally, has anyone had much experience writing an implementation-agnostic repository?
PS. I understand we could drop the ORM entirely and go pure SQL in the meantime, but we have been asked to ensure we are using an ORM.
EDIT: In this answer I focus on document DBs, based on the question's title. Of course, other NoSQL stores exist with vastly different characteristics (for example graph DBs, stores using event sourcing, and others).
It should not be a problem really.
In document DBs your entire aggregate should be a single document. This way you have exactly the same guarantees that you need for transactional consistency: regardless of how many entities change within the aggregate, you're still storing a single document. You will need to enforce some form of optimistic concurrency (through an etag, version, or similar) rather than a Unit of Work pattern, but after that your transactional requirements are covered.
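A minimal sketch of that compare-and-swap, assuming pymongo and a made-up ConcurrencyError type (neither is part of the question):

from pymongo import MongoClient

class ConcurrencyError(Exception):
    pass

def save(collection, aggregate):
    # The whole aggregate is one document; matching on the stored version
    # turns replace_one() into a compare-and-swap, so a concurrent writer
    # surfaces as an explicit error instead of a silent overwrite.
    expected = aggregate["version"]
    result = collection.replace_one(
        {"_id": aggregate["_id"], "version": expected},
        {**aggregate, "version": expected + 1},
    )
    if result.matched_count == 0:
        raise ConcurrencyError("aggregate was modified concurrently")

# usage: save(MongoClient().shop.orders, order_document)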
I can't really comment on whether you should fully embrace a UoW pattern now vs. rely on the ORM implementation, etc. This depends a lot on your current situation and implementation details. What I can say, though, is that it is quite probable that you won't need to migrate from normal form (SQL) to documents all in one go. Start with a simple aggregate so that you can see what works for you and what doesn't.
I don't know if implementation-agnostic repositories exist, but the idea doesn't make a lot of sense to me. The whole point of a repository is encapsulating persistence, so you can't abstract that away: there is no other responsibility allocated to them. Also, you can't assume whether the repository will need to compose different models into the aggregate model: this is specific to the platform, so it's not agnostic.
One final comment: I see in your question that you wrote save_all(aggregates). I'm not sure what you're referring to, but at a minimum, each aggregate save should be wrapped in its own transaction; otherwise this operation violates the transactional boundary characteristic of an Aggregate.
Does the concept of a Unit of Work / Session mean anything in a NoSQL world?
Yes, it can still be an interesting concept to have. Just because you're using a NoSQL storage doesn't mean that the need for some sort of business transaction management disappears. Many NoSQL databases have drivers or third party libraries that manage change tracking. See RavenDB for instance.
Sure, if you're only ever loading one aggregate per transaction and if your NoSQL unit of storage matches an aggregate perfectly, most of a Unit of Work's features will be less important, but you'll still be facing exceptions to that rule. Besides, the part of a UoW that's relevant in any case is Commit and possibly Abort.
Should we fully embrace the ORM in the meantime, and move the UOW/Session outside of the Repository into the Application Service?
What I recommend instead is materializing the concept of Unit of Work in a full-fledged class:
class UnitOfWork {
    void Commit()
    {
        // Call your ORM persistence here
    }
}
Application Services are just the place where the Unit of Work is called, not where it is implemented.
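In Python/SQLAlchemy terms that split might look like this (a sketch; the service and repository names are hypothetical):

class UnitOfWork:
    def __init__(self, session):
        self._session = session

    def commit(self):
        # All ORM persistence happens here, behind the abstraction.
        self._session.commit()

    def abort(self):
        self._session.rollback()

def register_order(order, order_repo, uow):
    # The Application Service only *calls* the Unit of Work; it never
    # touches the ORM session directly.
    order_repo.save(order)
    uow.commit()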
If we do that, what would the level of effort be to re-engineer the Application Service if we need to migrate to a NoSQL solution in the end? (The repositories will need to be re-written in any instance.)
It depends on a lot of other parameters such as Unit of Work support by your NoSQL API or third party libraries, and similarity in shape between Aggregates and the NoSQL storage. It can range from practically no work to writing a full UoW/change tracking implementation yourself. If the latter, extracting UoW logic from the Repository to a separate class won't be the hardest part of the job.
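To give a feel for the "write it yourself" end of that range, snapshot-based dirty checking can be as small as this (purely illustrative):

class ChangeTracker:
    def __init__(self):
        # aggregate id -> copy of its state as loaded
        self._snapshots = {}

    def track(self, aggregate_id, state):
        self._snapshots[aggregate_id] = dict(state)

    def is_dirty(self, aggregate_id, state):
        # Compare current state against the snapshot taken at load time.
        return self._snapshots.get(aggregate_id) != dict(state)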
Finally, has anyone had much experience writing an implementation-agnostic repository?
I concur with SKleanthous here: implementation-agnostic repos don't make much sense IMO. You've got your repository abstractions (interfaces), which are of course agnostic, but when it comes to implementations, you have to address a particular persistent storage.

Symfony2 Doctrine Use Cache Tables

For the project I'm working on, we have a fully normalized database where no information is redundant.
I'd like to keep this method, but also add "cache" tables, which are essentially tables which have pre-computed information. I'd love to be able to have this information in separate tables (which could then be blown away and regenerated as needed).
For example, part of this involves a forum. One "cached" value would be the number of posts a user has made. There is no need to keep this in any of the normalized tables, because it can be calculated based on a count of posts linked with that user. However, this is a (relatively) expensive call, so the cache table would keep track of this value for me and I can pull from it as needed.
I'm also strongly considering using a NoSQL database like MongoDB for this, because the cached tables would essentially have no joins or foreign keys (making it perfect for MongoDB).
Any ideas how I should approach this using Doctrine in Symfony2? Anyone done this before?
Thanks a ton!
Update
As greg0ire comments, it looks like Doctrine has some built-in caching functionality: http://docs.doctrine-project.org/projects/doctrine-orm/en/latest/reference/caching.html
Does anyone know if I can employ this to cache my values without storing them in the database?
For example, if I had an unmapped property $postCount, can I use Doctrine to cache that value (or I guess, the object with that value populated)?
The only problem with this approach (caching to memory instead of a database) is that we're working in a clustered environment, so I'd either have to build the cache multiple times (once per server the user hits) or get a shared caching server set up (which is a bit tricky).
I'll continue to investigate this route, but does anyone know of any database stored methods?
Thanks.
I think you may be looking for Doctrine's result cache.
Here is the related part of the sf2 configuration.

DjangoAppEngine and Eventual Consistency Problems on the High Replication Datastore

I am using djangoappengine and I think I have run into some problems with the way it handles eventual consistency on the high replication datastore.
First, entity groups are not even implemented in djangoappengine.
Second, I think that when you do a djangoappengine get, the underlying App Engine system is doing an App Engine query, which is only eventually consistent. Therefore, you cannot assume consistency even when using keys.
Assuming those two statements are true (and I think they are), how does one build an app of any complexity using djangoappengine on the high replication datastore? Every time you save a value and then try to get the same value, there is no guarantee that it will be the same.
Take a look in djangoappengine/db/compiler.py:get_matching_pk()
If you do a djangomodel.get() by the pk, it'll translate to a Google App Engine Get().
Otherwise it'll translate to a query. There's room for improvement here. Submit a fix?
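In other words (BlogPost and its fields are hypothetical):

# Lookup by primary key -> datastore Get(): strongly consistent.
post = BlogPost.objects.get(pk=post_id)

# Any other filter -> datastore query: only eventually consistent on the
# high replication datastore.
recent = BlogPost.objects.filter(author="alice")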
I don't really know about djangoappengine, but an App Engine query that includes only the key is considered a keys-only query, and you will always get consistent results.
No matter what system you put on top of the App Engine models, it's still true that when you save an entity to the datastore you get a key, and when you look up an entity via its key in the HR datastore, you are guaranteed to get the most recent result.
