What's the right way to do long-running processes on app engine? - google-app-engine

I'm working with Go on App Engine and I'm trying to build an API that needs to perform a long-running background task - in this case it needs to parse and chunk a big file out to task queues. I'd like it to return a 200 and close the user connection immediately and let the process keep running in the background until it's complete (this could take 5-10 minutes). Task queues alone don't really work for my use case because parsing the initial file can take more than the time limit for an API request.
At first I tried a Go routine as a solution for this problem. This failed because my app engine context expired as soon as the parent function closed the user connection. (I suppose I could try writing a go routine that doesn't require a context, but then I'd lose logging and I'd need to fetch the entire remote file and pass it to the go routine.)
Looking through the docs, it looks like App Engine used to have functionality to support exactly what I want to do: [runtime.RunInBackground], but that functionality is now deprecated and the replacement isn't obvious.
Is there a "right" or recommended way to do background processing now?
I suppose I could put a link to my big file into a task queue, but if I understand correctly, even functions called through task queues have to complete execution within a specified amount of time (is it 90 seconds?) I need to be able to run longer than that.
Thanks for any help.

try using:
appengine.BackgroundContext()
it should be long-lived but will only work on GAE Flex

Related

will gatling actually perform the operation or will it check only the urls' response time?

I have a gatling test for an application that will answer a survey and upon answering this survey, the application will identify possible answers that may pose a risk and create what we call riskareas. These riskareas are normally created in the background as soon as the survey answering is finished. My question is I have a gatling test with ten users who will go and answer the survey and logout, I used recorder to record the test; now after these ten users are finished I do not see any riskareas being created in the application. Am I missing something--should the survey be really answered by gatling (like it does in selenium) user or is it just the urls that the gatling test will touch ?
I am new to gatling please help.
Gatling should be indistinguishable from a user in a web browser (or Selenium) as far as the server is concerned, so the end result should be exactly the same as if you'd gone through the process yourself. However, writing a Gatling script is a little more work than writing a Selenium script.
For performance reasons, Gatling operates at a lower level than Selenium. Gatling works with the actual data that is sent and received from the server (i.e, the actual GETs and POSTs sent to the server), rather than with user-level interactions (such as clicking links and filling forms).
The recorder will generally produce a relaitvely "dumb" script. It records the exact data that was sent to the server, and makes no attempt to account for things that may change from run to run. For example, the web application you are testing might have hidden form fields that contain session information, or the link addresses might contain a unique identifier or a session id.
This means that your script may not be doing what you think it's doing.
To debug the script, the first thing to do is to add checks on each of the requests, to validate that you are getting the response you expect (for example, check that when you submit page 1 of the survey, you are taken to page 2 - check for something that you'd only expect to find on page 2, like a specific question).
Once you know which requests are failing, look at what data was sent with the request, and try to figure out where it came from. You will probably find that there are session ids, view state, or similar, that must be extracted from the previous page.
It will help to enable request and response logging, as per the documentation.
To simplify testing of web apps, we wrote some helper functions to allow tests to be written in a more Selenium-like way. Once you understand what your application is doing, you may find that it simplifies scripting for you too. However, understanding why your current script doesn't work the way you expect should be your first step.

How do I execute code on App Engine without using servlets?

My goal is to receive updates for some service (using http request-response) all the time, and when I get a specific information, I want to push it to the users. This code must run all the time (let's say every 5 seconds).
How can I run some code doing this when the server is up (means, not by an http request that initiates the execution of this code) ?
I'm using Java.
Thanks
You need to use
Scheduled Tasks With Cron for Java
You can set your own schedule (e.g. every minute), and it will call a specified handler for you.
You may also want to look at
App Engine Modules in Java
before you implement your code. You may separate your user-facing and backend code into different modules with different scaling options.
UPDATE:
A great comment from #tx802:
I think 1 minute is the highest frequency you can achieve on App Engine, but you can use cron to put 12 tasks on a Push Queue, either with delays of 5s, 10s, ... 55s using TaskOptions.countdownMillis() or with a processing rate of 1/5 sec.

GAE: is putting all handlers in main.py gonna make my app slow?

I'm building a web application using GAE.
I've been doing some research by my own on GAE python project structures,
and found out that there isn't a set trend on how to place my handlers within the project.
As of now, I'm putting all the handlers(controllers) in main.py,
and make all urls (/.*) be directed to main.application.
Is this going to make my application slower?
Thank You!
In general, this will not make your application slower, however it can potentially slow you down your instance start-up time, but it generally isn't a problem unless you have very large complicated apps.
The instance start up time comes into play whenever GAE spins up a new instance for you. For example, if your app is unused for a long period and you start it up once in a long while, or for example, if your app is very busy and need a new instance to handle the load.
python loads your modules as needed. So if you launch an instance, and the request goes to main.py, then main.py and all the modules associated with it will get loaded. If your app is large, this may take a few seconds. Let's just say for example it takes 6 seconds to load every module in your app. That's a 6 second wait for whoever is issuing that request. Subsequent requests to that loaded instance will be quick.
It's possible to break down your handlers to separate modules. If handler for \a requires very little code, then having \a in a separate file will reduce the response time for \a. But when you load \b that has all the rest of the code, that would take a while to load. So it's possible to take that 6 second load and potentially break it up into a few requests that may take 2 seconds.
This type of optimization really depends on the libraries you need to load with each request. You generally want to do this later on, when you run into the problems, rather than design your layout for this purpose up front, since it's pretty difficult to predict.
App Engine warmup requests also help alleviate this problem.
No, that doesn't affect the speed. Your code needs to be loaded anyway, so it makes no difference if it's all in one file or not. It will of course make the file more complex, but that's your problem, not GAE's.

Refactoring a Google App Engine datastore

In my datastore I had a few hundred entities of kind PlayerStatistic that I wanted to rename to GamePlayRecord. On the dev server it was easy to do this by writing a small script in the Interactive Console. However there is no Interactive Console once the app has been deployed.
Instead, I copied that script into a file and linked the file in app.yaml. I deployed the script, intending to run it once and then delete it. However, I ran into another problem, which is that the script ran for over 30 seconds. The script would always get cut off before it could complete.
My solution ended up being rewriting the script so that it creates and deletes the entities one at a time. That way, even when it timed out, the script could continue where it left off. Since I only have a few hundred entities this took about 5 refreshes.
Is there a better way to run one-time refactoring scripts on Google App Engine? Is there a good way to get around the 30 second limit in order to run these refactoring scripts?
Use the task queue.
Tasks can run for more much longer than web requests. You can also split up the work into many tasks, so they will run parallel and finish faster. When you finish the task, you can programmatically insert a new task, so the whole process is automated and you don't need to manually refresh.
appengine-mapreduce is a good way to do datastore refactoring. It takes care of a lot of the messy details that you would have to grapple with when writing task code by hand.

How can I do the same thing over and over every 1-4 seconds in google app engine?

I want to run a script every few seconds (4 or less) in google app engine to process user input and generate output. What is the best way to do this?
Run a cron job.
http://code.google.com/appengine/docs/python/config/cron.html
http://code.google.com/appengine/docs/java/config/cron.html
A cron job will invoke a URL at a
given time of day. A URL invoked by
cron is subject to the same limits and
quotas as a normal HTTP request,
including the request time limit.
.
Also consider the Task Queue - http://code.google.com/appengine/docs/python/taskqueue/overview.html
Reconsider what you're doing. As Ash Kim says, you can do it with the task queue, but first take a close look if you really need to run a process like this. Is it possible to rewrite things so the task runs only when needed, or immediately, or lazily (that is, only when the results are needed)?

Resources