How can I do the same thing over and over every 1-4 seconds in google app engine? - google-app-engine

I want to run a script every few seconds (4 or less) in google app engine to process user input and generate output. What is the best way to do this?

Run a cron job.
http://code.google.com/appengine/docs/python/config/cron.html
http://code.google.com/appengine/docs/java/config/cron.html
A cron job will invoke a URL at a
given time of day. A URL invoked by
cron is subject to the same limits and
quotas as a normal HTTP request,
including the request time limit.
.
Also consider the Task Queue - http://code.google.com/appengine/docs/python/taskqueue/overview.html

Reconsider what you're doing. As Ash Kim says, you can do it with the task queue, but first take a close look if you really need to run a process like this. Is it possible to rewrite things so the task runs only when needed, or immediately, or lazily (that is, only when the results are needed)?

Related

What's the right way to do long-running processes on app engine?

I'm working with Go on App Engine and I'm trying to build an API that needs to perform a long-running background task - in this case it needs to parse and chunk a big file out to task queues. I'd like it to return a 200 and close the user connection immediately and let the process keep running in the background until it's complete (this could take 5-10 minutes). Task queues alone don't really work for my use case because parsing the initial file can take more than the time limit for an API request.
At first I tried a Go routine as a solution for this problem. This failed because my app engine context expired as soon as the parent function closed the user connection. (I suppose I could try writing a go routine that doesn't require a context, but then I'd lose logging and I'd need to fetch the entire remote file and pass it to the go routine.)
Looking through the docs, it looks like App Engine used to have functionality to support exactly what I want to do: [runtime.RunInBackground], but that functionality is now deprecated and the replacement isn't obvious.
Is there a "right" or recommended way to do background processing now?
I suppose I could put a link to my big file into a task queue, but if I understand correctly, even functions called through task queues have to complete execution within a specified amount of time (is it 90 seconds?) I need to be able to run longer than that.
Thanks for any help.
try using:
appengine.BackgroundContext()
it should be long-lived but will only work on GAE Flex

NDB query().iter() of 1000<n<1500 entities is wigging out

I have a script that, using Remote API, iterates through all entities for a few models. Let's say two models, called FooModel with about 200 entities, and BarModel with about 1200 entities. Each has 15 StringPropertys.
for model in [FooModel, BarModel]:
print 'Downloading {}'.format(model.__name__)
new_items_iter = model.query().iter()
new_items = [i.to_dict() for i in new_items_iter]
print new_items
When I run this in my console, it hangs for a while after printing 'Downloading BarModel'. It hangs until I hit ctrl+C, at which point it prints the downloaded list of items.
When this is run in a Jenkins job, there's no one to press ctrl+C, so it just runs continuously (last night it ran for 6 hours before something, presumably Jenkins, killed it). Datastore activity logs reveal that the datastore was taking 5.5 API calls per second for the entire 6 hours, racking up a few dollars in GAE usage charges in the meantime.
Why is this happening? What's with the weird behavior of ctrl+C? Why is the iterator not finishing?
This is a known issue currently being tracked on the Google App Engine public issue tracker under Issue 12908. The issue was forwarded to the engineering team and progress on this issue will be discussed on said thread. Should this be affecting you, please star the issue to receive updates.
In short, the issue appears to be with the remote_api script. When querying entities of a given kind, it will hang when fetching 1001 + batch_size entities when the batch_size is specified. This does not happen in production outside of the remote_api.
Possible workarounds
Using the remote_api
One could limit the number of entities fetched per script execution using the limit argument for queries. This may be somewhat tedious but the script could simply be executed repeatedly from another script to essentially have the same effect.
Using admin URLs
For repeated operations, it may be worthwhile to build a web UI accessible only to admins. This can be done with the help of the users module as shown here. This is not really practical for a one-time task but far more robust for regular maintenance tasks. As this does not use the remote_api at all, one would not encounter this bug.

How to let user set time to run a task in google app engine

My customer wants to set time (ex: Dec 13, 16:00 pm) to run a certain task.
I dont think cron job fits for it because customer dont know how to use google app engine SDK.
Is there any other way to do it?
Thanks
You can create a task and set the time when you want this task to be executed. From the documentation:
X-AppEngine-TaskETA, the target execution time of the task, specified
in milliseconds since January 1st 1970.
You can use the task queue API: https://cloud.google.com/appengine/docs/python/taskqueue/ or even the defer API: https://cloud.google.com/appengine/articles/deferred
If your customer is a user of your application, and there may be several users requesting tasks to be executed at different times, then you can save these requests to the datastore. Create a cron job to run every hour (or however precise you need the timeframe) which checks the datastore to see if there are any tasks to run at that time - if so, run the proper script.
If this is just a one-time, or small number of tasks, you can do as Andrei suggested.

How do I execute code on App Engine without using servlets?

My goal is to receive updates for some service (using http request-response) all the time, and when I get a specific information, I want to push it to the users. This code must run all the time (let's say every 5 seconds).
How can I run some code doing this when the server is up (means, not by an http request that initiates the execution of this code) ?
I'm using Java.
Thanks
You need to use
Scheduled Tasks With Cron for Java
You can set your own schedule (e.g. every minute), and it will call a specified handler for you.
You may also want to look at
App Engine Modules in Java
before you implement your code. You may separate your user-facing and backend code into different modules with different scaling options.
UPDATE:
A great comment from #tx802:
I think 1 minute is the highest frequency you can achieve on App Engine, but you can use cron to put 12 tasks on a Push Queue, either with delays of 5s, 10s, ... 55s using TaskOptions.countdownMillis() or with a processing rate of 1/5 sec.

Backends logs depth

I have a long-running process in a backend and I have seen that the log only stores the last 1000 logging calls per request.
While this might not be an issue for a frontend handler, I find it very inconvenient for a backend, where a process might be running indefinitely.
I have tried flushing logs to see if it creates a new logging entry, but it didn't. This seems so wrong, that I'm sure there must be a simple solution for this. Please, help!
Thanks stackoverflowians!
Update: Someone already asked about this in the appengine google group, but there was no answer....
Edit: The 'depth' I am concerned with is not the total number of RequestLogs, which is fine, but the number of AppLogs in a RequestLog (which is limited to 1000).
Edit 2: I did the following test to try David Pope's suggestions:
def test_backends(self):
launched = self.request.get('launched')
if launched:
#Do the job, we are running in the backend
logging.info('There we go!')
from google.appengine.api.logservice import logservice
for i in range(1500):
if i == 500:
logservice.flush()
logging.info('flushhhhh')
logging.info('Call number %s'%i)
else:
#Launch the task in the backend
from google.appengine.api import taskqueue
tq_params = {'url': self.uri_for('backend.test_backends'),
'params': {'launched': True},
}
if not DEBUG:
tq_params['target'] = 'crawler'
taskqueue.add(**tq_params)
Basically, this creates a backend task that logs 1500 lines, flushing at number 500. I would expect to see two RequestLogs, the first one with 500 lines in it and the second one with 1000 lines.
The results are the following:
I didn't get the result that the documentation suggests, manually flushing the logs doesn't create a new log entry, I still have one single RequestLog with 1000 lines in it. I already saw this part of the docs some time ago, but I got this same result, so I thought I wasn't understanding what the docs were saying. Anyways, at the time, I left a logservice.flush() call in my backend code, and the problem wasn't solved.
I downloaded the logs with appcfg.py, and guess what?... all the AppLogs are there! I usually browse the logs in the web UI, I'm not sure if I could get a confortable workflow to view the logs this way... The ideal solution for me would be the one that is described in the docs.
My apps autoflush settings are set to the default, I played with them when at some time, but I saw that the problem persisted, so I left them unset.
I'm using python ;)
The Google docs suggest that flushing should do exactly what you want. If your flushing is working correctly, you will see "partial" request logs tagged with "flush" and the start time of the originating request.
A couple of things to check:
Can you post your code that flushes the logs? It might not be working.
Are you using the GAE web console to view the logs? It's possible that the limit is just a web UI limit, and that if you actually fetch the logs via the API then all the data will be there. (This should only be an issue if flushing isn't working correctly.)
Check your application's autoflush settings.
I assume there are corresponding links for Java, if that's what you're using; you didn't say.
All I can think that might help is to use a timed/cron script like the following to run every hour or so from you workstation/server
appcfg.py --oauth2 request_logs appname/ output.log --append
This should give you a complete log - I haven't tested it myself
I did some more reading and it seems CRON is already part of appcfg
https://developers.google.com/appengine/docs/python/tools/uploadinganapp#oauth
appcfg.py [options] cron_info <app-directory>
Displays information about the scheduled task (cron) configuration, including the
expected times of the next few executions. By default, displays the times of the
next 5 runs. You can modify the number of future run times displayed
with the -- num_runs=... option.
Based on your comment, I would try.
1) Write you own logger class
2) Use more than one version

Resources