Blocking concurrent access to App Engine datastore - google-app-engine

I'm building an application on Google App Engine that uses the datastore to store information about the current state of the server. When an Android device queries the server, a servlet gets an Entity from the datastore, modifies it, and puts it back into the datastore to update the datastore entry.
However, sometimes after one instance of the servlet has fetched the data from the datastore, another instance does the same before the first puts its updated data back. This is causing synchronization issues in my application.
Is there any way to "lock" the datastore so that nothing can operate on it until the lock is released?
Thanks.

Transactions are what you're after.
Read the docs carefully though: there are strict limitations on what you can do within a transaction. Specifically, you can only query within a single entity group - that is, the set of entities with the same ancestor.
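The question describes a Java servlet, but the pattern is the same in either runtime. Here is a minimal sketch in Python with ndb, assuming a hypothetical ServerState model in place of whatever your entity looks like:

```python
from google.appengine.ext import ndb

# Hypothetical model standing in for the server-state entity.
class ServerState(ndb.Model):
    connected_devices = ndb.IntegerProperty(default=0)

@ndb.transactional
def increment_devices(state_key):
    # The get, modify, and put happen atomically: if another request
    # commits a change to this entity first, ndb retries the function.
    state = state_key.get()
    state.connected_devices += 1
    state.put()
```

In the Java runtime, the equivalent is DatastoreService.beginTransaction() with an explicit commit() and rollback().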

Related

What is a suitable DB for bulk writes

My application currently runs on App Engine. It continuously writes records (for logging and reporting).
Scenario: counting views on the website. When someone opens the website, it hits the server to add a record with the time and type of view, and these counts are shown on the user's dashboard.
These requests have become huge, currently around 40/sec, so App Engine writes are heavy and the cost is increasing rapidly.
Is there any way to reduce this, or another DB better suited to logging the views?
Google App Engine's Datastore is NOT suitable for a workload where you continuously write to the datastore but read much less often.
You need to offload this task to a third-party service (either write your own or use an existing one).
A better option for user tracking and analytics is Google Analytics (although you won't be able to show the hit counters directly on your website using Analytics).
If you want to show your user page hit count use a page hit counter: https://www.google.com/search?q=hit+counter
In this case you should avoid Datastore.
For this kind of analytics it's best to do the following:
1. Dump the data to the GAE log (yes, this sounds counter-intuitive, but it's actually advice from Google engineers). The GAE log is persistent and guaranteed not to lose data you write to it (sketched at the end of this answer).
2. Periodically parse the log for your data and export it to BigQuery.
BigQuery has quite a powerful query language, so it's capable of producing complex analytics reports.
Luckily, this has already been done before: see the Mache framework and the related video.
Note: there is now a new BigQuery feature called streaming inserts, which could potentially replace the cumbersome middle step (files on Cloud Storage) used in Mache.
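To make step 1 concrete, here is a minimal sketch, assuming a hypothetical record_view function: instead of one datastore write per hit, each view becomes a structured log line that a later job can parse and load into BigQuery.

```python
import json
import logging

def record_view(page, view_type):
    # One log line per hit; the datastore is never touched on this path.
    # The GAE request log keeps these lines, and a scheduled job can later
    # parse them out and batch-load them into BigQuery.
    logging.info('PAGEVIEW %s', json.dumps({'page': page, 'type': view_type}))
```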

In Python and GAE: How to permanently cache data from a Datastore across HTTP GET requests

I am developing an online product using GAE and Python. Certain data in my model (i.e. the Datastore) are constant across contexts: for all incoming HTTP GET requests, those data don't change.
For the sake of argument, assume that said data must live in the Datastore as opposed to static pages (e.g html).
How would I set the Google App Engine Caching policies so that the Datastore is only queried once in the life of the application -- even if the product is experiencing millions of hits per day?
DISCLAIMER: I am a complete newbie to both Python and GAE.
I am presently looking into global variables, which I would use to store said query results. Not only do I not yet know how that would work; there is another problem: different HTTP GET requests (i.e. URLs) are for different portions and views of said constant data.
Thanks for any insight.
You may want to take a look at the Memcache API. It will allow you to do basically what you want: cache the results of a query (or even the resulting page as HTML) and serve it while it is available (you can set an expiry, but you will also occasionally experience cache misses, where the datastore is queried anyway). Also, as @voscausa mentions, switching your datastore API from db to ndb will provide automatic caching, with additional options to further modify caching behavior (docs here).
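Here is a minimal sketch of the query-then-cache pattern the answer describes, assuming a hypothetical ConstantConfig model and cache key:

```python
from google.appengine.api import memcache
from google.appengine.ext import ndb

# Hypothetical model holding the constant data.
class ConstantConfig(ndb.Model):
    name = ndb.StringProperty()
    value = ndb.TextProperty()

CACHE_KEY = 'constant-config'

def get_constant_data():
    data = memcache.get(CACHE_KEY)
    if data is None:
        # Cache miss: query the datastore once, then repopulate the cache.
        data = ConstantConfig.query().fetch()
        memcache.set(CACHE_KEY, data, time=24 * 60 * 60)  # expire daily
    return data
```

Note that memcache may evict entries at any time, so "queried only once in the life of the application" is not strictly achievable; the occasional miss will fall through to the datastore.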

Replicating data from GAE data store

We have an application that we're deploying on GAE. I've been tasked with coming up with options for replicating the data that we're storing in the GAE data store to a system running in Amazon's cloud.
Ideally we could do this without having to transfer the entire data store on every sync. The replication does not need to be in anything close to real time, so something like a once or twice a day sync would work just fine.
Can anyone with some experience with GAE help me out here with what the options might be? So far I've come up with:
Use the Google-provided bulkloader.py to export the data to CSV, transfer the CSV to Amazon somehow, and process it there.
Create a Java app that runs on GAE, reads the data from the data store, and sends it to another Java app running on Amazon.
Do those options work? What would be the gotchas with those? What other options are there?
You could use logic similar to what the App Engine HRD migration and backup tools do:
1. Mark modified entities with a child entity marker.
2. Run a MapperPipeline using the App Engine mapreduce library, iterating over those entities with a Datastore Input Reader.
3. In your map function, fetch the parent entity, serialize it to Google Storage using a File Output Writer, and remove the marker.
4. Ping the remote host to import those entities from the Google Storage URL.
As an alternative to steps 3 and 4, you could make multiple urlfetch(POST) calls to send each serialized entity to the remote host directly, but this is more fragile, as a single failure could compromise the integrity of your data import.
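Here is a minimal sketch of that urlfetch alternative, assuming the entities use the db API and the remote host exposes a hypothetical /import endpoint:

```python
from google.appengine.api import urlfetch
from google.appengine.ext import db

IMPORT_URL = 'https://replica.example.com/import'  # hypothetical endpoint

def push_entity(entity):
    # Serialize the entity to its protocol-buffer wire format and POST it
    # to the remote importer.
    payload = db.model_to_protobuf(entity).Encode()
    result = urlfetch.fetch(IMPORT_URL,
                            payload=payload,
                            method=urlfetch.POST,
                            headers={'Content-Type': 'application/octet-stream'})
    if result.status_code != 200:
        # Surface the failure so the caller can retry; silently dropping an
        # entity is exactly what makes this approach fragile.
        raise RuntimeError('Import failed for %s' % entity.key())
```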
You could look at the datastore admin source code for inspiration.

Add persistent field to an App Engine Datastore class?

So if I have a class of which I have a number of instances saved in the datastore, and I then want to add a field to it, how can I prevent all my previous objects from breaking? Is there a way to retroactively set those fields so they're not null?
I'm using JDO.
It depends on how you are accessing the datastore. Which runtime (Python/Java) and which API are you using to access the datastore? The datastore itself is schemaless, so it doesn't care what is or isn't in a given entity. On the Java side, if you use the low-level datastore API, you won't have any problems accessing the "old" entities and adding in the data you want. However, if you are using JDO or JPA to access the datastore, you might get errors accessing the entities with missing data.
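The question is about JDO on Java, but for comparison, the Python runtime handles this gracefully: a property added to an ndb model simply reads as its default on old entities, and re-putting them persists the value. A minimal sketch, assuming a hypothetical Player model:

```python
from google.appengine.ext import ndb

class Player(ndb.Model):
    name = ndb.StringProperty()
    # Field added after many Player entities already exist.
    score = ndb.IntegerProperty(default=0)

def backfill_scores():
    # Old entities have no 'score' property stored; ndb supplies the default
    # when they are read. Re-putting each entity writes that value, so the
    # property is no longer missing from the datastore.
    for player in Player.query():
        player.put()
```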

app engine big table

What is Bigtable? Is any authentication required to create a table in Bigtable? Where will the data be stored? Is it possible to view the tables? Can we view all the tables in Bigtable, including those created by others?
I'll take your several questions one at a time:
Bigtable is the system on which AppEngine's datastore is built. It is effectively a distributed hashtable.
Authentication is required in that you must have a Google Account; you must have signed up for AppEngine; you must have created an application within AppEngine. Your application will be able to access the datastore, and if you are logged in to your application's Admin Console, you can use the Datastore Viewer to inspect the contents of your application's datastore.
The data will be stored on Google Servers.
There are no tables, per se, but you can use the Datastore Viewer to view entities that reside in your application's Datastore.
No, you cannot view Datastore contents created by other applications. Each application's view of the Datastore is completely siloed and has no connection to that of other AppEngine applications.
