Writing to Datastore from Backends without shutting down

Writing to Datastore from Backends without shutting down - google-app-engine

I am trying to write a program in Google App Engine (Python) to continually run a resident Backend which is working on finding what a series converges to. I want to make it so that it runs in the Backend, writes to Datastore, and at any point in time, you can tell what item the series is on and what value it is. The Backend only writes to one entity in Datastore, so it does not overload the storage or anything.The probably I run into though is that the Backend does not write the entity to the Datastore so it is accessible by my frontend webpage until the Backend is shut down, which defeats the purpose of being able to continually check in on it. If there is some way to have the Backend write to the Datastore so the frontend page can check in on it, please tell me!

Datastore writes in a backend process should behave no differently than writes in your front end app, meaning that they should be available for read in your front end (nearly) instantly (within consistency constraints). Both backend and front end interact with the same datastore.
It sounds like you just need to implement a recurring write of the current status of your series (ie. once every x cycles), instead of writing once at the end of the backend process.

You post suggests two issues.
The first is "without shutting down". We don't guarantee that backends will run indefinitely. See the docs on Shutdown for some details.
The second issue, if I'm understanding you, is that you're not seeing values written by the backend until some time after they're written. You may be running into "eventual consistency", were "eventual" is usually pretty short, but can an rare occasions be surprisingly long. Understanding Isolation and Consistency can help here.

Related

Alternatives to App Engine's native logging API?

Does anyone have any advice on making the logging in Google App Engine better? I am currently trying to use Splunk Storm, but they are finicky regarding input and go down often. Has anyone else encountered this and solved it in some capacity?
Currently I have a process that runs in a backend that reads from the LogService and pipes the logs into Splunk Storm via REST api. This often fails, or storm goes down, or the backend IP changes.
My issue is with the logging provided within App Engine, as the logs disappear when new versions are pushed and querying the logs with the provided dashboard is almost unusable. Splunk was a potential solution, but the cloud solution leaves a lot to be desired.
Anything that would provide a better interface into my logs would be appreciated.

You can export logs from GAE to BiqQuery which has quite capable query language. You can use Mache, an open-source project that already does this. You should write your own exporter, to expose (and make queryabe) fields (columns) you are interested in.

Since you've decided to use Splunk (or another external service) as permanent storage, it sounds like you need a location to buffer logs between the times when they're written to App Engine's log service and when Splunk is available to accept the logs. To avoid losing logs before version churn causes them to fall out of App Engine, this buffer needs to be fast and highly available.
One reasonable choice is the AE datastore. There's no unreliable hop to a 3rd party, it has an availability SLA, and it can be scaled arbitrarily by sharding writes. The downside would be the cost of R/W operations and the storage footprint of in-flight logs, but you'll incur a comparable cost for another backing store.
Whatever choice of service, have one batch process (e.g. backend or cronjob) write to the buffer from the logs reader API. As long as it runs more often than app updates, logs will always exist in durable storage. Then have another batch process wait for Splunk to be available then upload to it from the buffer and delete as you get receipt confirmation from Splunk.

GAE Go - "This request caused a new process to be started for your application..."

I've encountered this problem for a second time now, and I'm wondering if there is any solution to this. I'm running an application on Google App Engine that relies on frequent communication with a website through HTTP JSON RPC. It appears that GAE has a tendency to randomly display a message like this in the logs:
"This request caused a new process to be started for your application,
and thus caused your application code to be loaded for the first time.
This request may thus take longer and use more CPU than a typical
request for your application."
And reset all variables stored in RAM without warning. The same process happens over and over no matter how many times I set the variables again or upload newer code to GAE, although incrementing the app version number seems to solve the problem.
How can I get more information on this behaviour, how to avoid it and prevent data loss of my Golang applications on Google App Engine?
EDIT:
The variables stored in RAM are small classes of strings, bytes, bools and pointers. Nothing too complicated or big.
Google App Engine seems to "start a new process" in matter of seconds of heavier use, which shouldn't be long enough time for the application to be shut down for not being used. The timespan between application being uploaded to GAE, having its variable set and a new process being created is less than a minute.

Do you realize that GAE is a cloud hosting solution that automatically manages instances based on the load? This is it's main feature and reason people are using it.
When load increases, GAE creates a new instance, which , of course, has all RAM variables empty.
The solution is not to expect variables to be available or store them to permanent storage at the end of request (session, memcache, datastore) and load them if not present at the beginnig of request.

You can read about GAE instances in their documentation here, check out the performance section:
http://code.google.com/appengine/kb/java.html
In your case of having small data available, if its static then you can load it into memory on startup of a new instance. If it's dynamic data, you should be saving it to the database using their api.
My recommendation for keeping a GAE instance alive, either pay for the Always-On service or follow my recommendations for using a cron here:
http://rwyland.blogspot.com/2012/02/keeping-google-app-engine-gae-instances.html
I use what I call a "prime schedule" of a 3, 7, 11 minute cron job.

You should consider using Backends if you want long running instances with resident memory.

writing then reading entity does not fetch entity from datastore

I am having the following problem. I am now using the low-level
google datastore API rather than JDO, that way I should be in a
better position to see exactly what is happening in my code. I am
writing an entity to the datastore and shortly thereafter reading it
from the datastore using Jetty and eclipse. Sometimes the written
entity is not being read. This would be a real problem if it were to
happen in production code. I am using the 2.0 RC2 API.
I have tried this several times, sometimes the entity is retrieved
from the datastore and sometimes it is not. I am doing a simple
query on the datastore just after committing a write transaction.
(If I run the code through the debugger things run slow enough
that the entity has a chance of being read back on the second pass).
Any help with this issue would be greatly appreciated,
Regards,

The development server has the same consistency guarantees as the High Replication datastore on the live server. A "global" query uses an index that is only guaranteed to be eventually consistent with writes. To perform a query with strongly consistent guarantees, the query must be limited to an entity group, using an "ancestor" key.
A typical technique is to group data specific to a single user in a group, so the user can see changes to queries limited to the user's group with strong consistency guarantees. Another technique is to use fancier client logic to update the client's local view as soon as the change is submitted, so the user sees the change in the UI immediately while the update to the global index is in progress.
See the docs on queries and transactions.

Is a real-time multiplayer game using Google App Engine feasible?

I am currently developing a real-time multiplayer game, and have been evaluating various cloud-based hosting solutions. I am unsure whether App Engine fits my needs, and would be grateful for any feedback.
In essence, I want the system to work like this: Player A calculates round n, and generates a hash out of the game state at the end of that round. He then sends his commands for that round, and the hash, as a http POST to the server. Player B does the same thing, in parallel.
The server, while handling the POST from a player, first writes the received hash code to the memcache. If the hash from the other player is not yet in the memcache, it waits and periodically checks the memcache for the other players hash. As soon as both hashes are in the memcache, it compares them for equality. If they are equal, the server sends the commands of each player to the respectively other one as the http response.
A round like that should last around half a second, meaning two requests per player per second.
Of course, this way of doing it will only work if there are at least two instances of the application running, as two requests must be dealt with in parallel. Also, the memory cache must be consistent over all instances, be fairly reliable, and update immediately.
I cannot use XMPP because I want my game to be able to run within restricted networks, so it has to be limited to http on port 80.
Is there a way to enforce that two instances of the app are always running? Are there glaringly obvious flaws in my design? Do you think an architecture like this might work on App Engine? If not, what cloud based solution would you suggest?

I believe this could work. The key API for you to learn about / test would probably be the Channel API. That is what would allow back and forth communication between the client and server.
The next issue to worry about would be memcache. In general, it is reliable, but in the strictest sense we are supposed to assume that memcached data could disappear at any time.
If you decide that you can't risk losing the data like that, then you need to persist it in the datastore, which means you will have to experiment to make sure you can sustain 2 moves per turn. I think this is possible, but not trivially so. If you had said 1 move every 3 seconds I would say "no problem." But multiple updates to one entity per second start to bump up against the practical limit on writes per second, especially if they are transactional.
Having multiple instances running will not be a problem - you can pay to keep instances warm if necessary.

How does Google App Engine infrastructure is fault tolerant?

I am actually implementing a web application on Google App Engine. This has taken me for the moment a huge time in re-designing the database and the application through GAE requirements and best practices.
My problem is this: How can I be sure that GAE is fault tolerant, or at what degree is it fault tolerant? I didn't find any documents in GAE on this, and it is an issue that could have drawbacks for me: My app would have, for example, to read an entity from the datastore, compute it in the application, and then put it on the datastore. In this case how could we be sure that this would be correctly done and that we get the right data : if for example the machine on which the computing have be done crash ?
Thank you for your help!

If a server crashes during a request, that request is going to fail, but any new requests would be routed to a different server. So one user might see an error, but the rest would not. The data in the datastore would be fine. If you have data that needs to be kept consistent, you would do your updates in a transaction, so that either the whole set of updates was applied or none.

Transactions operating on the same entity group are executed serially, but transactions operating on different entity groups run in parallel. So, unless there is a single entity which everything in your app wants to read and write, scalability will not suffer from transactions.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight