Refactoring a Google App Engine datastore - google-app-engine

In my datastore I had a few hundred entities of kind PlayerStatistic that I wanted to rename to GamePlayRecord. On the dev server it was easy to do this by writing a small script in the Interactive Console. However, there is no Interactive Console once the app has been deployed.
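The script was along these lines (a simplified sketch; the exact property copying may have differed):

from models import PlayerStatistic, GamePlayRecord  # hypothetical import path

# Copy each PlayerStatistic into a new GamePlayRecord, then delete the original.
for old in PlayerStatistic.all():
    props = dict((name, getattr(old, name)) for name in old.properties())
    GamePlayRecord(**props).put()
    old.delete()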
Instead, I copied that script into a file and linked the file in app.yaml. I deployed the script, intending to run it once and then delete it. However, I ran into another problem: the script ran for over 30 seconds, so it was always cut off before it could complete.
My solution was to rewrite the script so that it creates and deletes the entities one at a time. That way, even when it timed out, the script could continue where it left off. Since I only have a few hundred entities, this took about five refreshes.
Is there a better way to run one-time refactoring scripts on Google App Engine? Is there a good way to get around the 30-second limit in order to run these refactoring scripts?

Use the task queue.
Tasks can run for much longer than web requests. You can also split the work across many tasks so they run in parallel and finish faster. When a task finishes, it can programmatically enqueue the next one, so the whole process is automated and you don't need to refresh anything manually.
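For example, a task handler along these lines converts one batch per run and re-enqueues itself until nothing is left (a sketch; the URL, batch size, and webapp2 wiring are assumptions, as are the model names taken from the question):

import webapp2
from google.appengine.api import taskqueue
from models import PlayerStatistic, GamePlayRecord  # hypothetical import path

class RefactorHandler(webapp2.RequestHandler):
    def post(self):
        # Convert one small batch per task invocation.
        batch = PlayerStatistic.all().fetch(20)
        for old in batch:
            props = dict((name, getattr(old, name)) for name in old.properties())
            GamePlayRecord(**props).put()
            old.delete()
        if batch:
            # Work may remain: chain the next task.
            taskqueue.add(url='/tasks/refactor')

app = webapp2.WSGIApplication([('/tasks/refactor', RefactorHandler)])

Kick it off by enqueuing a single task to /tasks/refactor; after that the chain runs to completion on its own.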

appengine-mapreduce is a good way to do datastore refactoring. It takes care of a lot of the messy details that you would have to grapple with when writing task code by hand.
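For the rename above, the job boils down to a single mapper function (a sketch against the Python appengine-mapreduce API; the mapreduce.yaml wiring that points the job at PlayerStatistic and this function is omitted):

from mapreduce import operation as op
from models import GamePlayRecord  # hypothetical import path

def rename_entity(entity):
    # Called once per PlayerStatistic; the framework batches the yielded operations.
    props = dict((name, getattr(entity, name)) for name in entity.properties())
    yield op.db.Put(GamePlayRecord(**props))
    yield op.db.Delete(entity)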

Related

What's the right way to do long-running processes on app engine?

I'm working with Go on App Engine and I'm trying to build an API that needs to perform a long-running background task - in this case it needs to parse and chunk a big file out to task queues. I'd like it to return a 200 and close the user connection immediately and let the process keep running in the background until it's complete (this could take 5-10 minutes). Task queues alone don't really work for my use case because parsing the initial file can take more than the time limit for an API request.
At first I tried a goroutine as a solution for this problem. This failed because my App Engine context expired as soon as the parent function closed the user connection. (I suppose I could try writing a goroutine that doesn't require a context, but then I'd lose logging, and I'd need to fetch the entire remote file and pass it to the goroutine.)
Looking through the docs, it looks like App Engine used to have functionality that supports exactly what I want to do, runtime.RunInBackground, but that functionality is now deprecated and the replacement isn't obvious.
Is there a "right" or recommended way to do background processing now?
I suppose I could put a link to my big file into a task queue, but if I understand correctly, even functions called through task queues have to complete execution within a specified amount of time (is it 90 seconds?) I need to be able to run longer than that.
Thanks for any help.
Try using:
appengine.BackgroundContext()
It should be long-lived, but it will only work on GAE Flex.

NDB query().iter() of 1000<n<1500 entities is wigging out

I have a script that, using Remote API, iterates through all entities for a few models. Let's say two models, called FooModel with about 200 entities, and BarModel with about 1200 entities. Each has 15 StringPropertys.
for model in [FooModel, BarModel]:
    print 'Downloading {}'.format(model.__name__)
    new_items_iter = model.query().iter()
    new_items = [i.to_dict() for i in new_items_iter]
    print new_items
When I run this in my console, it hangs for a while after printing 'Downloading BarModel'. It hangs until I hit ctrl+C, at which point it prints the downloaded list of items.
When this is run in a Jenkins job, there's no one to press ctrl+C, so it just runs continuously (last night it ran for 6 hours before something, presumably Jenkins, killed it). Datastore activity logs reveal that the datastore was taking 5.5 API calls per second for the entire 6 hours, racking up a few dollars in GAE usage charges in the meantime.
Why is this happening? What's with the weird behavior of ctrl+C? Why is the iterator not finishing?
This is a known issue currently being tracked on the Google App Engine public issue tracker under Issue 12908. The issue was forwarded to the engineering team and progress on this issue will be discussed on said thread. Should this be affecting you, please star the issue to receive updates.
In short, the issue appears to be with the remote_api script. When querying entities of a given kind, it will hang when fetching 1001 + batch_size entities when the batch_size is specified. This does not happen in production outside of the remote_api.
Possible workarounds
Using the remote_api
One could limit the number of entities fetched per script execution using the limit argument for queries. This is somewhat tedious, but the script can simply be executed repeatedly, for example from a wrapper script, to essentially the same effect.
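A cursor-based variant of the same idea, keeping each fetch well below the problematic threshold (a sketch; the batch size of 500 is an arbitrary choice):

def fetch_all(model, batch_size=500):
    # Page through the query with cursors instead of one unbounded iterator.
    items, cursor, more = [], None, True
    while more:
        page, cursor, more = model.query().fetch_page(batch_size, start_cursor=cursor)
        items.extend(entity.to_dict() for entity in page)
    return items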
Using admin URLs
For repeated operations, it may be worthwhile to build a web UI accessible only to admins. This can be done with the help of the users module. This is not really practical for a one-time task, but it is far more robust for regular maintenance tasks. As this does not use the remote_api at all, one would not encounter this bug.
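A sketch of such an admin-only handler (the URL, handler name, and what each request does are assumptions):

import webapp2
from google.appengine.api import users

class AdminMaintenanceHandler(webapp2.RequestHandler):
    def get(self):
        # is_current_user_admin() is False for non-admins and anonymous visitors.
        if not users.is_current_user_admin():
            self.abort(403)
        # ... perform one batch of the maintenance work here ...
        self.response.write('batch done')

app = webapp2.WSGIApplication([('/admin/maintenance', AdminMaintenanceHandler)])

Pairing the route with login: admin in app.yaml enforces the same restriction at the configuration level.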

Shell concept in cakephp

I am a novice in CakePHP. Please explain the concept of shells in CakePHP and how they are helpful in web development.
I have read http://book.cakephp.org/2.0/en/console-and-shells.html, but I am still not getting the idea of shells in relation to web development.
Shells are relevant: you can write basically any command-line script you need. There are other shells as well, like the migration shell that comes with the migrations plugin.
I've seen silly attempts where people used cron and wget to call a URL to execute a task every X minutes. That's a perfect example of where a shell is the proper solution.
There are plenty of use cases for shells: queuing (emails, for example), data conversion, data import... Anything that runs for a long time, or that periodically checks something like a queue, can be done as a shell. Shells can be utility or development tools as well, and you can even control how much CPU load a program is allowed to use with the nice command.
So, for example, if you do audio or video conversion after upload, it should run in the background: the shell task looks for new uploads, converts the data to the desired format when it finds some, and never uses more than, say, 20% CPU load, so it won't make the site unresponsive by pegging the CPU at 100%.
If you are new to CakePHP, I suggest checking out the 'bake' console command ( http://book.cakephp.org/2.0/en/console-and-shells/code-generation-with-bake.html ).
It can help you generate models, controllers, views, migrations, and many other things.
It's a powerful tool that will save you a lot of time. When you generate a model using the 'bake' command, it will automatically detect all the tables in your database, work out the relationships between them, and let you define validation for each field.
Here is a video tutorial on YouTube: http://www.youtube.com/watch?v=Kfu58OozDrM

Should I move large files generated by App Engine out of the way?

I just exported some database tables from App Engine to guard against catastrophic loss. All my generated files (.csv, .sql3, bulkloader logs) are still in my app directory.
Then I resumed development and it seemed to be taking forever to upload my updates when I did a
appcfg.py update .
In fact, I killed the update, as it was taking a long time. So should I move these large generated files out of the way? I like them where they are, because it makes keeping track of them easier, but if it's going to slow down updates, I'm willing to move them.
Yes, you should move them. The appcfg script is going to try to process them and that will slow things down.
The other option is to modify app.yaml and add them to the skip list (skip_files).
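Something along these lines, assuming the file types mentioned in the question (note that specifying skip_files overrides the default skip list, so re-add the defaults if you still need them):

skip_files:
- ^(.*/)?.*\.csv$
- ^(.*/)?.*\.sql3$
- ^(.*/)?bulkloader-.*$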

Winforms app as Scheduled task

I've got a WinForms app that I developed to do batch processing on tens of thousands of students, and now we're trying to run it nightly as a scheduled task.
I personally find it useful to be able to log in to the box and see how it's processing by looking at the GUI, though the standard way is to convert it into a command-line app (which radically limits the amount of screen real estate I can use for logging messages).
Can I run the app as a scheduled task? The IT guy who does the scheduling says it's not running because it's a WinForms app. Are there any tricks needed to get it to run well, or am I forced to rewrite it as a command-line app with its 80-character width limit?
Basically, I just echo the log file to the screen in real time to make debugging issues easier, so the GUI is output-only.
It's running as the currently logged-in user, but the issue is that it does not run if that user is not logged in on the box, so when we leave for the night it fails to run.
Thanks,
Eric-
You need to make sure it is running as the currently logged-in account. If it runs as 'system', I don't think it will show up correctly.
I have one of these myself... and despise it. It only exists because I haven't had a chance to rewrite it into a proper service. Don't forget there are more ways to log than just outputting to the console. ;)
