Route request to specific instance - google-app-engine

Does anybody know if GAE provides a way to route a request to a specified instance? The startup of new instances is killing me on Facebook URL linter requests, since they sometimes time out before a new instance can start up. I have no way to control this timeout either. So what I'd like to do is keep specified instances idle for these calls without needing to hack around it with cron jobs. I think this would be more cost-effective as well.

The new Modules feature allows direct addressing of instances, much like how backends used to work.
Like so:
http://instance.version.module.app-id.appspot.com
Read more in the documentation here.
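If you need to build these hostnames in code, a minimal sketch could look like the following. The function name and the example values (instance 0, version "v1", module "linter", app id "my-app") are made up for illustration; only the dotted pattern comes from the answer above.

```python
def instance_url(instance, version, module, app_id, scheme="http"):
    """Build a URL that targets one instance, following the dotted pattern above."""
    host = ".".join([str(instance), version, module, app_id, "appspot.com"])
    return "{}://{}".format(scheme, host)


# Hypothetical values, purely for illustration.
print(instance_url(0, "v1", "linter", "my-app"))
```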

It sounds like you need a dedicated set of "always alive" instances to handle just those calls. Backends might be a good solution for that. You can set up a separate URL to route to a specific backend.
http://code.google.com/appengine/docs/python/backends/overview.html#Addressing_Backends

This is not possible for frontends, but you can have requests directed to specific backends, and you can make backends externally accessible if you choose.
I'd suggest working on your app to improve loading time, though. If it's taking so long a bot gives up, that's got to have serious implications for usability by your users. Also, make sure you've got warmup requests enabled.

Related

Using Objectify in long-running background tasks

I am planning to use Objectify to talk to Cloud Datastore from GAE Flex. The app is going to be running quite a few background threads talking to Datastore regarding which I have a couple of questions.
I am not planning to use any memcache setup, and since these threads are going to be running for a long time, I don't want the session cache to fill up either. I could not find a way to set ofy() to never cache locally, and the only option seems to be to run a clear() operation periodically. Is there a better way to avoid these caches?
As I see it, we need to wrap any such invocations of ofy() in a run() block to perform the cleanup. I wanted to confirm that this is the only way to use it outside request scope and that there is no built-in support for these longer contexts.
Thanks
You are correct. ObjectifyService.run() is the way to run requests outside of the ObjectifyFilter.
There is not currently any way to disable the session cache. The session cache is pretty deeply woven into the fabric of Objectify in order to get sane behavior for @Load operations. It's not impossible, it just hasn't risen to the top of the priority queue.
The best way to iterate large quantities of your datastore without hitting memory issues is to iterate specifying an explicit chunk() size and then clear() after processing that number of items. If you use Guava's Iterators.partition(), this is pretty much a one-liner.
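The shape of that chunk-then-clear loop, sketched in Python rather than Java so it runs without any dependencies. This is not Objectify code: `partition` mimics what Guava's Iterators.partition() does, and `process` and `clear_session` are hypothetical callbacks standing in for your per-entity work and for ofy().clear().

```python
from itertools import islice


def partition(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def process_in_chunks(entities, size, process, clear_session):
    """Process entities chunk by chunk, clearing the cache after each chunk."""
    for chunk in partition(entities, size):
        for entity in chunk:
            process(entity)
        # Hypothetical stand-in for ofy().clear(): drop cached entities
        # before the next chunk is loaded so memory stays bounded.
        clear_session()
```

The point of the design is that memory use is bounded by the chunk size rather than by the total number of entities iterated.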

How Can You Determine When a Request Started on GAE Managed VM?

On Google App Engine, there are multiple ways a request can start: a web request, a cron job, a taskqueue, and probably others as well.
How could you (especially on Managed VM) determine the time when your current request began?
One solution is to instrument all of your entry points, and save the start time somewhere, but it would be nice if there was an environment variable or something that told when the request started. The reason this is important is because many GAE requests have deadlines (either 60 seconds or 10 minutes in various scenarios), and it's helpful to determine how much time you have left in a request when you are doing some additional work.
We don't specifically expose anything that lets you know how much time is left on the current request. You should be able to do this by recording the time at the entrypoint of a request, and storing it in a thread local static.
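A rough Python sketch of that idea, assuming each request is handled on its own thread. The function names are made up, and the deadline value is something you pass in yourself (for example 60 or 600 seconds, per the question); nothing here is a GAE API.

```python
import threading
import time

# One slot per thread, so concurrent requests do not overwrite each other.
_request_state = threading.local()


def mark_request_start():
    """Call this first in every entry point (web handler, cron, task queue)."""
    _request_state.started_at = time.monotonic()


def seconds_remaining(deadline_seconds):
    """Return how much of the deadline is left for the current thread's request."""
    elapsed = time.monotonic() - _request_state.started_at
    return max(0.0, deadline_seconds - elapsed)
```

Inside a long loop you would then check something like `seconds_remaining(60) < 5` and stop or hand the rest of the work to a queue.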
The need for this sounds... questionable. Why are you doing this? It may be a better idea to use a worker / queue pattern with polling for something that could take a long time.
You can see all this information in the logs in your Developer console. You can also add more data to the logs in your code, as necessary.
See Writing Application Logs.

Appengine responses becoming slower?

My AJAX calls to App Engine, which do some very basic logic (all the actual processing happens in the background, isolated from the frontend), tend to be at least 200% slower than they used to be. For about a week now they have suddenly been taking 3 seconds instead of one.
I am wondering if you guys had a similar experience or something changed in the meantime I am not aware of, quota wise maybe. I am using the free quota.
Thanks
Zac
To my knowledge there is no particular change going on, but we can't be sure. However, slow response times can have multiple root causes.
If you have no traffic on your application, then you might have zero instances running, so when you make your request you also pay the time it takes for an instance to start up.
If you have a lot of traffic, the request can take more time depending on your configuration. You need to fine-tune whether the request waits to be handled by an "overloaded" instance or whether another instance should start.
If you use an API maybe there is something wrong with it.
I would suggest you enable appstats in your app, it will show you what takes time in your request: you will definitely see if this is something on your side or not.

how many users in a GAE instance?

I'm using the Python 2.5 runtime on Google App Engine. Needless to say I'm a bit worried about the new costs so I want to get a better idea of what kind of traffic volume I will experience.
If 10 users simultaneously access my application at myapplication.appspot.com, will that spawn 10 instances?
If no, how many users in an instance? Is it even measured that way?
I've already looked at http://code.google.com/appengine/docs/adminconsole/instances.html but I just wanted to make sure that my interpretation is correct.
"Users" is a fairly meaningless term from an HTTP point of view. What's important is how many requests you can serve in a given time interval. This depends primarily on how long your app takes to serve a given request. Obviously, if it takes 200 milliseconds for you to serve a request, then one instance can serve at most 5 requests per second.
When a request is handled by App Engine, it is added to a queue. Any time an instance is available to do work, it takes the oldest item from the queue and serves that request. If the time that a request has been waiting in the queue ('pending latency') is more than the threshold you set in your admin console, the scheduler will start up another instance and start sending requests to it.
This is grossly simplified, obviously, but gives you a broad idea how the scheduler works.
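To make the arithmetic and the pending-latency rule concrete, here is a toy Python model. It is a deliberately simplified sketch and not how the real scheduler is implemented; the class and method names are invented, and it assumes an instance handles one request at a time.

```python
from collections import deque


def max_requests_per_second(latency_ms):
    """Upper bound for one instance serving requests one at a time."""
    return 1000.0 / latency_ms


class ToyScheduler:
    """Toy model: requests wait in a FIFO queue; waiting too long adds an instance."""

    def __init__(self, max_pending_latency):
        self.max_pending_latency = max_pending_latency
        self.pending = deque()  # arrival times, oldest first
        self.instances = 1

    def enqueue(self, arrival_time):
        self.pending.append(arrival_time)

    def tick(self, now):
        """Start one more instance if the oldest request has waited too long."""
        if self.pending and now - self.pending[0] > self.max_pending_latency:
            self.instances += 1
        return self.instances

    def serve_next(self):
        """An idle instance takes the oldest pending request, if any."""
        return self.pending.popleft() if self.pending else None
```

With a 200 ms handler, `max_requests_per_second(200)` gives 5.0, which matches the figure above.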
First, no.
An instance per user is unreasonable and doesn't happen.
So you're asking how does my app scale to more instances? Depends on the load.
If you have a great many requests per second, you'll automatically get another instance so the load is distributed.
That's the core idea behind App Engine.

What's the best way for the client app to immediately react to an update in the database?

What is the best way to program an immediate reaction to an update to data in a database?
The simplest method I could think of offhand is a thread that checks the database for a particular change to some data and continually waits to check it again for some predefined length of time. This solution seems to be wasteful and suboptimal to me, so I was wondering if there is a better way.
I figure there must be some way, after all, a web application like gmail seems to be able to update my inbox almost immediately after a new email was sent to me. Surely my client isn't continually checking for updates all the time. I think the way they do this is with AJAX, but how AJAX can behave like a remote function call I don't know. I'd be curious to know how gmail does this, but what I'd most like to know is how to do this in the general case with a database.
Edit:
Please note I want to immediately react to the update in the client code, not in the database itself, so as far as I know triggers can't do this. Basically I want the USER to get a notification or have his screen updated once the change in the database has been made.
You basically have two issues here:
You want a browser to be able to receive asynchronous events from the web application server without polling in a tight loop.
You want the web application to be able to receive asynchronous events from the database without polling in a tight loop.
For Problem #1
See these wikipedia links for the type of techniques I think you are looking for:
Comet
Reverse AJAX
HTTP Server Push
EDIT: 19 Mar 2009 - Just came across ReverseHTTP which might be of interest for Problem #1.
For Problem #2
The solution is going to be specific to which database you are using and probably the database driver your server uses too. For instance, with PostgreSQL you would use LISTEN and NOTIFY. (And at the risk of being down-voted, you'd probably use database triggers to call the NOTIFY command upon changes to the table's data.)
Another possible way to do this is if the database has an interface to create stored procedures or triggers that link to a dynamic library (i.e., a DLL or .so file). Then you could write the server signalling code in C or whatever.
On the same theme, some databases allow you to write stored procedures in languages such as Java, Ruby, Python and others. You might be able to use one of these (instead of something that compiles to a machine code DLL like C does) for the signalling mechanism.
Hope that gives you enough ideas to get started.
"I figure there must be some way, after all, web application like gmail seem to update my inbox almost immediately after a new email was sent to me. Surely my client isn't continually checking for updates all the time. I think the way they do this is with AJAX, but how AJAX can behave like a remote function call I don't know. I'd be curious to know how gmail does this, but what I'd most like to know is how to do this in the general case with a database."
Take a peek with wireshark sometime... there's some google traffic going on there quite regularly, it appears.
Depending on your DB, triggers might help. An app I wrote relies on triggers but I use a polling mechanism to actually 'know' that something has changed. Unless you can communicate the change out of the DB, some polling mechanism is necessary, I would say.
Just my two cents.
Well, the best way is a database trigger. Depends on the ability of your DBMS, which you haven't specified, to support them.
Re your edit: The way applications like Gmail do it is, in fact, with AJAX polling. Install the Tamper Data Firefox extension to see it in action. The trick there is to keep your polling query blindingly fast in the "no news" case.
Unfortunately, plain HTTP request/response gives the server no way to push data to a web browser unprompted: you can only ever send data as a response to a request.
AJAX is what you want to use though: calling a web service once a second isn't excessive, provided you design the web service to ensure it receives a small amount of data, sends a small amount back, and can run very quickly to generate that response.
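One way to keep the "no news" case cheap, sketched in Python under the assumption that the server can keep a version counter that is bumped on every update. The class and field names are invented for illustration; in a real app `poll` would sit behind whatever endpoint your AJAX call hits, and the counter would live somewhere all server processes can read.

```python
class ChangeFeed:
    """Server-side state: a version counter bumped on every update."""

    def __init__(self):
        self.version = 0
        self.payload = None

    def publish(self, payload):
        """Record a change; call this wherever the data gets updated."""
        self.version += 1
        self.payload = payload

    def poll(self, last_seen_version):
        """Cheap when nothing changed: compare two integers and return."""
        if last_seen_version == self.version:
            return {"version": self.version, "changed": False}
        return {"version": self.version, "changed": True, "payload": self.payload}
```

The client sends back the last version it saw, so the common reply is a tiny "nothing changed" response and the full payload is only sent after an update.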
