I want to process user data in a queue with an external program. Is that possible? First I would have to upload and install the software somewhere, then run it within the queue environment. How can I do it?
If it is impossible on GAE, can you recommend another cloud platform that can run programs behind a queue interface, for example OGE or something similar?
I guess you need both access to the GAE Datastore (to load the user data) and the ability to execute an external program (to analyse it)?
At the moment GAE does not allow executing arbitrary programs, so the answer is no.
However, there is an upcoming feature called VM-based backends, which will allow you to start Compute Engine instances (with the ability to run arbitrary programs) and have those instances access the GAE Datastore. At the moment this is a trusted-tester feature (a limited beta); I guess it'll be available in a couple of months.
I need to run some specific code that can't be run on Google AppEngine (because of restrictions).
Since these workers are asynchronous, I thought about launching a Compute Engine instance every time I need one and feeding it tasks through the App Engine Task Queue, but I cannot find any documentation on whether this is possible.
TL;DR: Is it possible to specify a Google Compute Engine instance as the worker for a Task Queue?
No, there is no way to specify a Google Compute Engine instance as the worker for a Task Queue.
But did you consider using the Flexible environment instead (possibly with a custom runtime, to work around the restrictions)? The Flexible environment has only limited Task Queue support, so you may also want to look at the alternatives suggested for it. From the Task Queue documentation:
The Task Queue service has limited availability outside of the
standard environment. If you want to use the service outside of the
standard environment, you can sign up for the Cloud Tasks alpha.
Outside of the standard environment, you can't add tasks to push
queues, but a service running in the flexible environment can be
the target of a push task. You can specify this using the
target parameter when adding a task to queue or by specifying
the default target for the queue in queue.yaml.
In many cases where you might use pull queues, such as queuing up
tasks or messages that will be pulled and processed by separate
workers, Cloud Pub/Sub can be a good alternative as it offers
similar functionality and delivery guarantees.
I would like to try out dropwizard-metrics + graphite.
For this to work I need to run a job regularly (e.g. every 5 seconds) that sends metrics from the instance to the Graphite server.
Is this even possible?
The Java documentation and the Python documentation for Cron in App Engine say that the smallest unit for a cron interval is minutes. Thus the simple answer would be: no, you cannot schedule a job every 5 seconds.
However...
Knowing that task queue tasks can run for up to 10 minutes (see deadlines), you could schedule a task every, say, 5 minutes and handle the 5-second interval yourself in your servlet (or the Python equivalent).
I'm only saying that such short intervals are possible. You should really avoid a crutch like that: this kind of behaviour will eat through your quota and make your app expensive really fast.
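A minimal sketch of the workaround described above, in Python, assuming a task that is started every 5 minutes and paces its own 5-second ticks. The names `run_every`, `interval_s`, `budget_s` and `job` are hypothetical and not part of any App Engine API; `job` stands in for whatever pushes the metrics to Graphite. The `clock` and `sleep` parameters exist only so the loop can be exercised without real waiting.

```python
import time


def run_every(interval_s, budget_s, job, clock=time.monotonic, sleep=time.sleep):
    """Call job() every interval_s seconds until budget_s seconds are used up.

    Ticks are scheduled at start, start + interval_s, start + 2 * interval_s, ...
    so a slow job() does not make the schedule drift.
    Returns the number of times job() was called.
    """
    start = clock()
    runs = 0
    while True:
        next_due = start + runs * interval_s
        # Stop before scheduling a tick that falls outside the time budget.
        if next_due - start >= budget_s:
            break
        delay = next_due - clock()
        if delay > 0:
            sleep(delay)
        job()  # hypothetical: e.g. report metrics to the Graphite server
        runs += 1
    return runs
```

Called as `run_every(5, 300, send_metrics)` from a task handler, with `send_metrics` being a hypothetical reporting function, this would make 60 calls over five minutes and stay inside the 10-minute task deadline as long as each call finishes quickly. The quota caveat above still applies.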
Edit:
Since the title of the question asks for running jobs in specific instances:
As Dmitry pointed out, and as documented here, it is possible to address specific instances when using manual or basic scaling with modules. Instances are anonymous when using automatic scaling and thus cannot be addressed. It seems this feature is only documented and available for App Engine modules.
I am using the Google Appengine remote shell, with Python. I am walking through an entire database table updating all my entities, and I am doing this in 500 entity chunks. This is all working fine. The task involves
fire up the remote shell
kick off the job
wait 10 minutes
rinse, repeat
I'd like to keep this up while I'm at work, and just do it in the background, without of course impacting my productivity :-). What's getting in my way is the firewall, which prevents this sort of transfer of data, when logged in over VPN.
So is there a way to do this, like in a separate Emacs shell? If I had two computers, I'd just run this thing on my spare, but I don't. (I do have an iPad, but I doubt that helps).
I may be misunderstanding the core issues, and hence, my question.
Rather than using the remote shell, it'll probably be easier - and certainly quicker - to run the job entirely on the server via the mapreduce API.
I've encountered this problem for a second time now, and I'm wondering if there is any solution to this. I'm running an application on Google App Engine that relies on frequent communication with a website through HTTP JSON RPC. It appears that GAE has a tendency to randomly display a message like this in the logs:
"This request caused a new process to be started for your application,
and thus caused your application code to be loaded for the first time.
This request may thus take longer and use more CPU than a typical
request for your application."
And reset all variables stored in RAM without warning. The same process happens over and over no matter how many times I set the variables again or upload newer code to GAE, although incrementing the app version number seems to solve the problem.
How can I get more information on this behaviour, how to avoid it and prevent data loss of my Golang applications on Google App Engine?
EDIT:
The variables stored in RAM are small classes of strings, bytes, bools and pointers. Nothing too complicated or big.
Google App Engine seems to "start a new process" within a matter of seconds of heavier use, which shouldn't be long enough for the application to be shut down for inactivity. The time between the application being uploaded to GAE, having its variables set, and a new process being created is less than a minute.
Do you realize that GAE is a cloud hosting solution that automatically manages instances based on the load? This is its main feature and the reason people use it.
When load increases, GAE creates a new instance, which of course has all RAM variables empty.
The solution is not to expect variables to be available in RAM: store them in permanent storage (session, memcache, datastore) at the end of each request, and load them at the beginning of a request if they are not present.
You can read about GAE instances in their documentation here, check out the performance section:
http://code.google.com/appengine/kb/java.html
In your case of having a small amount of data: if it's static, you can load it into memory when a new instance starts up. If it's dynamic data, you should be saving it to the database using their API.
My recommendation for keeping a GAE instance alive: either pay for the Always-On service or follow my recommendations for using a cron here:
http://rwyland.blogspot.com/2012/02/keeping-google-app-engine-gae-instances.html
I use what I call a "prime schedule" of a 3, 7, 11 minute cron job.
You should consider using Backends if you want long running instances with resident memory.
I'm currently developing a small hobby project (open sourced at https://github.com/grav/mailbum) which quite simply takes images from a Gmail account and puts them in albums on Picasa Web.
Since it's (currently) only dealing with Google-hosted data, I was thinking about hosting it on Google App Engine, but I'm not sure if it's well-suited for GAE:
Will the maximum execution time be a problem? It's currently 10 minutes according to http://googleappengine.blogspot.com/2010/12/happy-holidays-from-app-engine-team-140.html, but I'd think the tasks (i.e. processing a single mail) would be easy to run in parallel. I'm also guessing that dealing with Google-hosted data would be quite efficient on GAE?
Will the fact that it's written in Clojure be an obstacle? I've done some research on getting Clojure to run on GAE, but I've never tried it. Any pointers?
Thanks for any advice and thoughts on the project!
It seems like your application is doable on GAE. My points of concern would be:
Does your code ever store the images that it is processing to temporary files? If so it will need to be changed to do everything in memory, because GAE applications are sandboxed and not allowed to write to the filesystem (if you need temporary persistent storage, you might be able to work something out where you write your file data to a BLOB field in the GAE datastore).
How do you get the images into Picasa Web? If they provide a simple REST/HTTP API then all is well. If you need something more involved than that (like a raw TCP socket) then it won't work.
The 10-minute execution time limit only applies to background tasks. When actually servicing web requests the time limit is 30 seconds. So if you provide a web-based interface to your app, you need to structure things so that the interface is just scheduling jobs that run in the background (i.e. you can't fire off a job directly as part of servicing a web request).
If none of those sound like show-stoppers to you, then I think your app should work just fine on GAE.
Can't really say if Clojure will work though. I have, however, spent time in the past getting some third-party libraries to work on App-Engine. Generally all I had to do was remove/modify/disable any parts of the library that accessed features that are forbidden by the sandbox (for instance, I had to disable the automatic caching to disk to get commons-fileupload to work on GAE). Not sure if the same would apply to Clojure, or even what the scope would be on a task like that.
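To illustrate the first concern above (keeping image data in memory instead of in temporary files), here is a small sketch. It is in Python rather than Clojure and is not taken from the mailbum project; `describe_attachment` is a hypothetical helper. It assumes the attachment has already been fetched as bytes and wraps it in a file-like object, so code that expects an open file can read it without touching the filesystem.

```python
import hashlib
import io


def describe_attachment(data, chunk_size=8192):
    """Read attachment bytes through a file-like object held entirely in RAM."""
    buf = io.BytesIO(data)  # behaves like an open binary file, no disk access
    digest = hashlib.sha256()
    size = 0
    while True:
        chunk = buf.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)
        size += len(chunk)
    return {"size": size, "sha256": digest.hexdigest()}
```

The same idea carries over to the JVM, where an in-memory stream can take the place of a temporary file.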
I have been dabbling with Clojure and App Engine for a while now and I have to recommend appengine-magic. It abstracts most of the Java stuff away and is very easy to use. As a plus the project seems to be very active.