Google app engine and application scope data - google-app-engine

I'm developing a Spring-based web application that is deployed on Google App Engine.
I have a manager that stores data in application scope. I'm using a Spring bean (singleton) that holds a map and simply performs get and remove operations on it.
However, GAE is a distributed environment, so this design has a problem: each application instance will have its own manager, and requests are not guaranteed to be routed to the same application instance.
So I've looked around and found two possible solutions:
use the datastore
use memcache
Storing the data in the datastore would cause a lot of reads and writes, and ultimately I don't need the data to be persisted.
The second option looked promising, but Google's documentation mentions:
In general, an application should not expect a cached value to always be available.
and I need a guarantee that when I ask my manager for a value it will return it.
Is there any other solution?
Did I miss anything?

A common solution is to store the values in memcache, backed by the datastore.
First fetch the application-scope value from memcache; if memcache returns no result (a cache miss), fetch the value from the datastore and put the fetched value into memcache.
That way, the next fetch from memcache will return your application-scope data, reducing the need for a (relatively) costly read from the datastore.
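A minimal sketch of that read-through pattern using the App Engine Java low-level APIs (the AppScopeValue kind and the "value" property are placeholder names, not something prescribed by the platform):
import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.EntityNotFoundException;
import com.google.appengine.api.datastore.KeyFactory;
import com.google.appengine.api.memcache.MemcacheService;
import com.google.appengine.api.memcache.MemcacheServiceFactory;

public class AppScopeValues {

    private final MemcacheService cache = MemcacheServiceFactory.getMemcacheService();
    private final DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();

    /** Read-through: try memcache first, fall back to the datastore and repopulate the cache. */
    public Object get(String name) {
        Object cached = cache.get(name);          // null on a cache miss
        if (cached != null) {
            return cached;
        }
        try {
            Entity entity = datastore.get(KeyFactory.createKey("AppScopeValue", name));
            Object value = entity.getProperty("value");
            cache.put(name, value);               // so the next read skips the datastore
            return value;
        } catch (EntityNotFoundException e) {
            return null;                          // no stored value either
        }
    }

    /** Write-through: update the datastore first, then the cache. */
    public void put(String name, String value) {
        Entity entity = new Entity("AppScopeValue", name);
        entity.setProperty("value", value);
        datastore.put(entity);
        cache.put(name, value);
    }
}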
App Engine memcache is a least-recently-used cache, so values that are read frequently will suffer few cache misses.
A more complex way to hold an application-scope value is to keep it in memory on a resident backend and have all your other instances request/update the value from/to that particular backend via a servlet handler, as sketched below.
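A rough sketch of that arrangement, assuming a resident backend named state-holder that exposes a /state handler (both names are made up here), using the Backends API together with URL Fetch:
import com.google.appengine.api.backends.BackendService;
import com.google.appengine.api.backends.BackendServiceFactory;
import com.google.appengine.api.urlfetch.HTTPResponse;
import com.google.appengine.api.urlfetch.URLFetchService;
import com.google.appengine.api.urlfetch.URLFetchServiceFactory;
import java.net.URL;

public class BackendStateClient {

    /** Ask the resident backend "state-holder" for the value associated with a key. */
    public byte[] fetchValue(String key) throws Exception {
        BackendService backends = BackendServiceFactory.getBackendService();
        String address = backends.getBackendAddress("state-holder");   // e.g. state-holder.my-app.appspot.com
        URLFetchService urlFetch = URLFetchServiceFactory.getURLFetchService();
        HTTPResponse response = urlFetch.fetch(new URL("http://" + address + "/state?key=" + key));
        return response.getContent();   // the backend's servlet handler returns its in-memory value
    }
}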

Related

Datastore issue in Google App Engine preventing the application from accepting inputs

The App Engine application is deployed to the server and is connected to the database. All the data is loaded into the list and displays properly, but if we give some input in an input box or any other form field it is not reflected in the application. The error I got from the App Engine console was this:
com.google.apphosting.api.ApiProxy$RequestTooLargeException: The request to API call datastore_v3.Put() was too large
On the App Engine server I cleared the datastore entries, but that didn't work out well. I disabled write access to the datastore file and for a few minutes it worked: I was able to give the inputs and they were reflected, but after some time the URL stopped working; I then re-enabled write access and the application was accessible again.
This error message is caused by the Datastore default limits, most likely the limit on the size of a Datastore entity, since the exception is thrown by datastore_v3.Put():
Maximum size for an entity: 1,048,572 bytes (1 MiB - 4 bytes)
However, also keep in mind the limits on the other Datastore metrics in the linked documentation.
I would recommend double-checking that the size of the entities you are trying to insert into Datastore falls under this value, as well as under the rest of the limits.
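As a rough illustration (not taken from the original app), you could do a pre-flight size check before calling put(); largeText stands in for whatever the form submits, and the 1,000,000-byte threshold simply leaves headroom below the limit:
import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.Text;
import java.nio.charset.StandardCharsets;

public class InputSaver {

    private final DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();

    public void save(String largeText) {
        byte[] payload = largeText.getBytes(StandardCharsets.UTF_8);
        if (payload.length > 1_000_000) {
            // Too close to the 1,048,572-byte entity limit: split the data across several
            // entities, or store the blob in Cloud Storage/Blobstore and keep only a reference here.
            throw new IllegalArgumentException("Input too large for a single datastore entity");
        }
        Entity entity = new Entity("UserInput");             // "UserInput" is a placeholder kind
        entity.setProperty("data", new Text(largeText));     // Text allows strings longer than 1500 bytes
        datastore.put(entity);
    }
}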

Need all requests from the same client to go to the same instance

I have a GWT-based Java web application deployed to Google App Engine, in which the servlet reads and changes state held in memory. The client code might send requests to change this state and, subsequently, to change or read the same state. So it's important that all requests from the same instance of the client page go to the same instance of the application's version.
Since I don't expect a lot of traffic, I don't mind limiting the maximum number of instances to 1. But I'd like that one instance to exist more or less permanently. (If a user takes more than, say, an hour between requests, I don't mind if their data is lost.)
In detail, the way I'm managing state is that I have a static variable that points to a hash table, which maps strings to states. On the first request from the client, a new unique string is created, a new state is created, and a new entry is made in the hash table. The string is returned in the response. On subsequent requests, the client sends the string so that the servlet can find the state it needs to mutate or read. I can't keep the state in a database because it is very complex and not at all serializable.
What are the ways to ensure that all requests from a given client instance go to the same server instance?
What are the ways to ensure that all requests from a given client instance go to the same server instance?
There are no ways, by design. If you want to persist state between requests reliably, use the datastore, with memcache as a cache.
Adding: You can also use cookies if the amount of data you need to store is small, obfuscating/encrypting them as needed.
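For the cookie route, a minimal sketch (a real app should sign or encrypt the value; the app_state name and the one-hour lifetime are arbitrary choices):
import java.util.Base64;
import javax.servlet.http.Cookie;
import javax.servlet.http.HttpServletResponse;

public class CookieState {

    /** Store a small piece of state client-side instead of in any App Engine instance. */
    public void writeState(HttpServletResponse resp, byte[] stateBytes) {
        String encoded = Base64.getUrlEncoder().encodeToString(stateBytes);  // cookie-safe encoding
        Cookie cookie = new Cookie("app_state", encoded);
        cookie.setMaxAge(3600);   // an hour of idle state is acceptable to lose, per the question
        resp.addCookie(cookie);
    }
}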
App Engine assumes that applications hold no essential state between requests. That makes spinning up/shutting down instances a non-issue.
I second Dave's answer; GAE is not exactly the right fit for what you want.
However, there could be ways around it in a few specific cases: if you're using the standard GAE environment with manual scaling, and every subsequent request is based on URLs embedded in the response to the previous request.
You could craft the URLs in a response to a request according to the targeted routing rules such that subsequent requests hit the same instance. From Targeted routing:
If you are still using backends or have manually-scaled services, you can target and send a request to an instance by including the instance ID. The instance ID is an integer in the range from 0 up to the total number of instances that are running, and can be specified as follows:
Sends a request to a specific service and version within a specific instance:
https://INSTANCE_ID-dot-VERSION_ID-dot-SERVICE_ID-dot-MY_PROJECT_ID.appspot.com
http://INSTANCE_ID.VERSION_ID.SERVICE_ID.MY_CUSTOM_DOMAIN
Note: Targeting an instance is not supported in services that are configured for auto scaling or basic scaling. The instance ID must be an integer in the range from 0, up to the total number of instances running. Regardless of your scaling type or instance class, it is not possible to send a request to a specific instance without targeting a service or version within that instance.
To determine the instance ID you could use the Modules API (com.google.appengine.api.modules), for example:
ModulesService modulesApi = ModulesServiceFactory.getModulesService();
// Get the ID of the instance handling the current request.
String currentInstance = modulesApi.getCurrentInstanceId();
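That ID can then be embedded in the links you return so that follow-up requests hit the same instance; a sketch following the URL pattern quoted above, with the version (v1), service (default) and project ID (my-project) as placeholders:
// Placeholders: replace "v1", "default" and "my-project" with your version, service and project ID.
String targetedUrl = "https://" + currentInstance + "-dot-v1-dot-default-dot-my-project.appspot.com/next-step";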
Note that if the targeted instance goes down you will keep getting errors permanently (that instance will not come back), so you might want to think about a fallback: sending the user to a non-instance-targeted page from which they can pick up the flow on another instance.
But such a solution is not available in the GAE flexible environment. From Targeted routing:
Note: In the flexible environment, targeting an instance is not supported. It is not possible to send requests directly to a specific instance.

How to sync offline database with Firebase when device is online?

I'm currently using angularJS and phonegap to build a test application for Android / iOS.
The app uses only text data stored in a Firebase database. I want the app to have its own local database (used when the device is offline) and sometimes (when the device is online) sync with the Firebase database.
The offline mode uses the storage API of phonegap/cordova. Could I just check the device's online state and back up the online database periodically?
Any clues on how I can achieve this? Last time a similar question was asked, the answer was "not yet"... (here)... because it focused on a hypothetical Firebase feature.
If Firebase is online at the start and loses its connection temporarily, then reconnects later, it will sync the local data then. So in many cases, once Firebase is online, you can simply keep pushing to Firebase during an outage.
For true offline usage, you will probably want to monitor the device's state, and also watch .info/connected to know when Firebase connects.
new Firebase('URL/.info/connected').on('value', function(ss) {
  if (ss.val() === null) {
    /* firebase disconnected */
  } else {
    /* firebase reconnected */
  }
});
The way to achieve this with the current Firebase toolset, until it supports true offline storage, would be to:
keep the local data simple and small
when the device comes online, convert the locally stored data to JSON
use set() to save the data into Firebase at the appropriate path
Additionally, if for some reason the app loads while the device is offline, you can "prime" Firebase by calling set() to "initialize" the data. Then you can use Firebase as normal (just as if it were online) until it actually comes online at some point in the future (you would also want to keep your local copy around to handle the case where it never does).
Obviously, the simpler the better. Concurrent modifications, limits of local storage size, and many other factors will quickly accumulate to make any offline storage solution complex and time consuming.
After some time, I would like to add $0.03 to @Kato's answer:
Opt to call snapshot.exists() instead of snapshot.val() === null. As the documentation points out, exists() is slightly more efficient than comparing snapshot.val() to null.
And if you want to update data, prefer the update() method rather than set(), as the latter will overwrite your Firebase data. You can read more here.

Outgoing bandwidth 2x that of total responses

I have a very simple app engine application serving 1.8Kb - 3.6Kb gzipped files stored in blobstore. A map from numeric file ID to blobkey is stored in the datastore and cached in memcache. The servlet implementation is trivial: a numeric file ID is received in the request; the blobkey is retrieved from memcache/datastore and the standard BlobstoreService.serve(blobKey, resp) is invoked to serve the response. As expected the app logs show that response sizes always match the blobstore file size that was served.
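A servlet along the lines described above might look roughly like this (names are placeholders and the datastore fallback on a cache miss is omitted for brevity):
import com.google.appengine.api.blobstore.BlobKey;
import com.google.appengine.api.blobstore.BlobstoreService;
import com.google.appengine.api.blobstore.BlobstoreServiceFactory;
import com.google.appengine.api.memcache.MemcacheService;
import com.google.appengine.api.memcache.MemcacheServiceFactory;
import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class ServeFileServlet extends HttpServlet {

    private final MemcacheService cache = MemcacheServiceFactory.getMemcacheService();
    private final BlobstoreService blobstore = BlobstoreServiceFactory.getBlobstoreService();

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        String fileId = req.getParameter("id");                       // numeric file ID from the request
        String keyString = (String) cache.get("blobkey:" + fileId);   // datastore fallback omitted
        blobstore.serve(new BlobKey(keyString), resp);                // logged response size == blob size
    }
}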
I've been doing some focused volume testing, and this has revealed that the outgoing bandwidth quota utilization is consistently reported to be roughly 2x what I expect given the requests received. I've been doing runs of 100k requests at a time, summing the bytes received at the client and comparing this with the app logs, and everything balances except for the outgoing bandwidth quota utilization.
Any help in understanding how the outgoing bandwidth quota utilization is determined for the simple application I describe above? What am I missing or not accounting for? Why would it not tally with the totals shown for response sizes in the app logs?
[Update 2013.03.04: I abandoned the use of the blobstore and reverted to storing the blobs directly in the datastore. The outgoing bandwidth utilization is now exactly as expected. It appears that the 2x multiplier is somehow related to the use of the blobstore (but it remains inexplicable). I encountered several other problems with using the blobstore service; most problematic were the additional datastore reads and writes (which are related to the blob info and blob index metadata managed in the datastore - and which is what I was originally trying to reduce by migrating my data to the blobstore). A particularly serious issue for me was this: https://code.google.com/p/googleappengine/issues/detail?id=6849. I consider this a blobstore service memory leak; once you create a blob you can never delete the blob metadata in the datastore. I will be paying for this in perpetuity since I was foolish enough to run a 24 hr volume and performance test and am now unable to free the storage used during the test. It appears that the blobstore is currently only suitable for use in very specific scenarios (i.e. permanently static data). Objects with a great deal of churn or data that is frequently refreshed or altered should not be stored in the blobstore.]
The blobstore data can be deleted (I don't recommend it since it can lead to unexpected behavior), but only if you know the kinds it is saved in: __BlobInfo__ and __BlobFileIndex__. This is done so that your uploaded files don't end up with the same name and accidentally replace an old file.
For a full list of kinds stored in the datastore you can run SELECT * FROM __kind__.
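The Java equivalent, if you prefer the low-level datastore API over GQL, would be roughly the following (it lists every kind that holds entities, so __BlobInfo__ should appear if any blobs exist):
import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entities;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.Query;

public class KindLister {

    /** Print the name of every kind currently stored in the datastore. */
    public void listKinds() {
        DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();
        Query kindQuery = new Query(Entities.KIND_METADATA_KIND);   // the special "__kind__" kind
        for (Entity kind : datastore.prepare(kindQuery).asIterable()) {
            System.out.println(kind.getKey().getName());
        }
    }
}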
I am not sure why your App Engine app consumes 2x your outgoing bandwidth, but I will test it myself.
An alternative is to use Google Cloud Storage. If you use the default bucket for your App Engine app, you get 5 GB of free storage.
Objects with a great deal of churn or data that is frequently refreshed or altered should not be stored in the blobstore.
That's true; you can use either Cloud Storage or the datastore (Cloud Storage is an immutable object storage service). Blobstore was intended more for uploading files via <input type='file' /> forms. (Writing files from inside the app has recently been deprecated in favor of Cloud Storage.)

Google App Engine: keep state of an object between HTTP-requests (Java)

A user makes an HTTP request to the server. This request is processed by an object of some class, let's call it "Processor". Then the same user makes another HTTP request two minutes later. And I want it to be processed by the same instance of Processor as the first one. So basically I want to keep the state of some object across several requests.
I know that I can save it to the datastore each time and then load it back, but this approach seems to be very slow. Is there a way to keep objects in RAM somewhere?
How about using memcache?
You can't ensure that consecutive requests to your app will go to the same instance, but memcache can help reduce or eliminate the overhead of accessing the datastore for each request.
It sounds like what you are describing is a session.
I am not sure which language runtime and web framework you are using, but it is sure to include support for sessions. (If you are using Java you will need to enable it.)
The standard session mechanism puts a small ID in a cookie that is stored in the user's browser. On every request, each of which could be go to a different application server, this ID is used as a key to read and write persistent information from the data store.
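A minimal sketch of this on the App Engine Java runtime, assuming sessions have been enabled with <sessions-enabled>true</sessions-enabled> in appengine-web.xml (session attributes must be Serializable; the "counter" attribute is just an illustration):
import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.HttpSession;

public class ProcessorServlet extends HttpServlet {

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        HttpSession session = req.getSession(true);                    // created on the first request, keyed by a cookie
        Integer counter = (Integer) session.getAttribute("counter");
        counter = (counter == null) ? 1 : counter + 1;
        session.setAttribute("counter", counter);                      // persisted between requests by App Engine
        resp.getWriter().println("Requests seen in this session: " + counter);
    }
}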
Even if the datastore accesses are too slow for you, I would suggest not using memcache for this session storage, because memcache is unreliable by design, so the user's session information could disappear at any time, which would be a bad experience for them.
If the amount of data you want to store is less than about a few kilobytes, then I recommend doing what Play Framework does, which is to encrypt your session data and store it directly in a cookie stored in the user's browser. This is fast and truly stateless.
If you have more data than can be stored in a cookie, and you don't want to use the datastore, you could use JavaScript local storage in the browser and use AJAX calls to communicate with the server. (If you want to support older browsers you may need to use the jStorage wrapper library.)
If memcache isn't enough, you could use backends to maintain state. Use a resident backend (or a set of them) and route incoming requests from the frontend to the backend machine that has the state.
Docs: Python Java
