I am having trouble using Push Queues on Google App Engine's Flexible Environment (formerly named the Managed VM Environment). I am receiving numerous 404 Instance Unavailable errors (see picture below).
After a bit of sleuthing, I believe these errors occur because I add a task to a task queue, then deploy a new version of the Flexible VM instance. The task that I previously pushed is locked to the older instance and can no longer run. Is this how task queues work with the Flexible VM? If so, how does one use push task queues with the Flexible VM?
I was 90% done migrating to flexible env when I came across this same problem. After extensive research, I concluded there are three options:
REST API (experimental)
Use the beta REST API for task queues (this, like all other Google APIs called from the flexible env, is external, so you need to deal with auth appropriately).
REST API reference: https://cloud.google.com/appengine/docs/python/taskqueue/rest/
Note: this is external and experimental. You can find e.g. a Java SDK without any meaningful documentation here: https://developers.google.com/api-client-library/java/apis/ (current version: https://developers.google.com/api-client-library/java/apis/taskqueue/v1beta2)
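In Python, for instance, reaching this API boils down to building a discovery-based client and calling it with application default credentials. A minimal sketch (project and queue names are placeholders, and the exact auth flow is an assumption on my part):

```python
# Sketch: talk to the experimental Task Queue REST API (v1beta2) from
# outside the standard environment. Project/queue names are placeholders.
from googleapiclient.discovery import build
from oauth2client.client import GoogleCredentials

credentials = GoogleCredentials.get_application_default()
service = build('taskqueue', 'v1beta2', credentials=credentials)

# Sanity check: list tasks currently sitting on a pull queue.
tasks = service.tasks().list(
    project='my-project', taskqueue='my-pull-queue').execute()
print(tasks.get('items', []))
```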
Compat runtime
Build your own flexible environment based off a -compat runtime. This offers the old App Engine API in a container suitable for the flexible env:
https://cloud.google.com/appengine/docs/flexible/custom-runtimes/build (look for images with "YES" in the last column)
e.g.: https://cloud.google.com/appengine/docs/flexible/java/dev-jetty9-and-apis
https://cloud.google.com/appengine/docs/flexible/java/migrating-an-existing-app
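For what it's worth, the general shape (per the build-custom-runtimes doc above) is an app.yaml declaring a custom runtime plus a Dockerfile based on a -compat image; treat the image name below as an assumption lifted from those docs:

```dockerfile
# Hypothetical Dockerfile for a -compat custom runtime. The corresponding
# app.yaml needs `runtime: custom` (with `env: flex`, or `vm: true` on
# older SDKs) for the deploy to pick this up.
FROM gcr.io/google_appengine/python-compat
ADD . /app
```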
Note: I spent two weeks in blistered frustration, pleading with every god almighty to help me get this to work, following container rabbit holes into the depths of Lucifer's soul and across unexplored dimensions. I eventually had to give in; I just couldn't get this to work to a satisfying degree.
Proxy service
Kind of a hacky alternative, but it gets the job done: create a very thin standard environment wrapper service which proxies tasks onto / off your queue. Pass them to your own app however you want. ¯\_(ツ)_/¯
Downside is you are now spinning up extra instances and burning extra minutes.
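A minimal sketch of such a proxy as a Python 2.7 standard-env service (the paths, queue name, and flex URL below are placeholders I made up):

```python
# Thin standard-env wrapper: one endpoint enqueues on behalf of the flex
# service, the other receives push tasks and forwards them to the flex app.
import webapp2
from google.appengine.api import taskqueue, urlfetch

class EnqueueProxy(webapp2.RequestHandler):
    """The flex service POSTs a task payload here; we enqueue it."""
    def post(self):
        taskqueue.add(queue_name='my-queue',
                      url='/proxy/execute',
                      payload=self.request.body)

class ExecuteProxy(webapp2.RequestHandler):
    """The push queue delivers here; we hand the task back to flex."""
    def post(self):
        urlfetch.fetch('https://my-flex-service-dot-my-project.appspot.com/task',
                       payload=self.request.body,
                       method=urlfetch.POST,
                       deadline=60)

app = webapp2.WSGIApplication([
    ('/proxy/enqueue', EnqueueProxy),
    ('/proxy/execute', ExecuteProxy),
])
```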
I ended up with a variation of this, where I'm using a proxy service in standard env, but just ported my eventual task handler to AWS Lambda (so it's completely off GAE). It's a different disaster, but a more manageable one.
Good luck!
I'm the tech lead and manager for this product.
There are two distinct answers to your question.
For the first: it seems like you have a version routing issue. As you say, tasks cannot run against a VM because you launched a new version. By default, tasks are assigned to run on the version from which they were enqueued, to avoid version mismatches. You should be able to override this by re-configuring the target in your queue.yaml (or queue.xml); documentation for that can be found here.
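For illustration, a minimal queue.yaml that pins a push queue to a fixed target (queue and version names are placeholders):

```yaml
queue:
- name: my-push-queue
  rate: 5/s
  target: v2   # tasks run against this version, not the enqueuing one
```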
From a broader perspective, building a migration path away from standard/MVM-only support for task queues is currently our highest priority.
The replacement is Cloud Tasks, which exposes the same interface but can be used fully independently of App Engine. It exists in the same universe as App Engine Task Queues, so you will be able to add tasks to existing queues (both push and pull). It is currently available in closed alpha. You can sign up to join the alpha here.
We strongly recommend against writing new code against the REST API. It is unsupported, and the Cloud Tasks alpha is already substantially more feature-complete.
I'm upvoting hraban's answer (he did wrestle with the devil after all) but providing an additional answer here.
Keep in mind that the Flexible Environment (Managed VMs) is still just a Compute Engine instance, with Google doing a good job of extracting features from App Engine and making them reachable in a transparent manner. Task Queues didn't quite make it. Keep a sharp eye on the cloud library; that's the mechanism by which the Datastore becomes usable (for Java, go to http://googlecloudplatform.github.io/google-cloud-java/0.3.0/index.html). If you go to that link you can also select other languages. I have it on official word that Task Queues are still on the roadmap (but with no ETA).
As of now you can't use the REST API to enqueue onto push queues. The way I decided to tackle this problem was to use the REST API and create a pull queue to put tasks in. Then I poll that queue inside an App Engine service (i.e. module) and put each task into a push queue. Why do I go to all that trouble? Because I need scheduled execution, which is a feature that only task queues give you on App Engine. So I package my task in an envelope, then unpack and re-push it into a push queue. Sounds crazy? It's been working reliably for me. Don't be scared off by the fact that the REST API is "alpha".
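To make the envelope dance concrete, here is roughly what the polling loop looks like, reduced to a sketch (queue names and the handler path are placeholders):

```python
# Runs inside a standard-env service: lease tasks off the REST pull queue,
# re-enqueue each one onto an ordinary push queue, then delete the lease.
import base64
from google.appengine.api import taskqueue
from googleapiclient.discovery import build

# Auth elided here; see the REST API sketch earlier in this thread.
service = build('taskqueue', 'v1beta2')

def drain_pull_queue():
    lease = service.tasks().lease(
        project='my-project',
        taskqueue='my-pull-queue',
        leaseSecs=60,
        numTasks=10).execute()
    for task in lease.get('items', []):
        payload = base64.b64decode(task['payloadBase64'])  # unpack envelope
        taskqueue.add(queue_name='my-push-queue',          # re-push it
                      url='/tasks/handle',
                      payload=payload)
        service.tasks().delete(                            # release the lease
            project='my-project',
            taskqueue='my-pull-queue',
            task=task['id']).execute()
```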
I will say, if you're starting something new, take a good look at the Pub/Sub API.
Related
Following the golang library instructions: if you write logs with the client library, where can one see those logs when running your server locally during development (e.g. via go run main.go)?
In my case (not sure if it's relevant) I'm using the library as part of Go logic in App Engine, and even the relevant-looking instructions on "viewing logs" for those docs don't mention local development explicitly. Is that because it (running gcloud app logs tail and seeing local server logs) should "just work", or because there's no way to see logs from a local logging SDK interaction?
It's a good question. The Cloud Logging libraries do appear bound to Google's Cloud Logging service, but for local development (your question), and for loose coupling as a generally good principle, these libraries really ought to be pluggable. Why shouldn't services running on e.g. GCP route logs to e.g. AWS?
With OpenTelemetry (née OpenCensus), Google (and others) promote the ability to disconnect metric and trace production from the consuming services, and logs aren't distinctly different.
Logrus, a popular logging library in Go, supports pluggable logging via Hooks, and an old (!) Stackdriver Logging implementation exists; it should be straightforward to upgrade to the current API version.
Meantime, I think your question would benefit from being posted to Google's public issue tracker for Stackdriver (sic) logging (link), and I'm going to ask someone who's very familiar with Cloud Logging, as she may have some insight into this for us.
Update
I emailed some former colleagues at Google and learned that OpenTelemetry will eventually encompass logging. This is mentioned briefly on the project's About page.
tl;dr Tentatively answering myself: that's not supported. Instead, one has to conditionally swap out calls to a regular logger when the environment (e.g. an empty GAE_INSTANCE env variable) indicates you're on localhost.
Walking through the code under the NewClient(...) call in the logging package, I ended up at the spot where the upstream API is actually called (note the RPC context used by the very last turtle; I never saw any logic along the way that seemed to switch to something for local development), so I suspect there really is no local emulation capturing the logs.
EDIT: See DazWilkin's helpful answer below for more context
While trying to deploy a Laravel project to a flexible environment in App Engine, I was hit with this message.
It has been two days now. Is there any way to solve this while keeping the same region for my project?
You can try deleting old versions of your deployment and/or reducing the resources you request for your deploys (including max_instances). If you can't change these, you should either wait until resources become available or move your project to another region.
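For the first suggestion, something along these lines from the Cloud SDK should do it (the version ID is a placeholder):

```sh
# List all deployed versions, then delete old ones to free up resources.
gcloud app versions list
gcloud app versions delete my-old-version-id
```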
In addition, if your application allows it, you can try the App Engine Standard Environment, which does not use a Compute Engine VM as the backend; in my opinion this is the best option for keeping the same region for your project.
Using Python 3.4 Google App Engine Flex.
Google's documentation on using pull queues with Python says to "from google.appengine.api import taskqueue", but does not explain how to make taskqueue available to the Python runtime.
They do link to "Easily Access Google API's from Python", where it explains how to install the api client via "pip install google-api-python-client"
This does not install the taskqueue lib.
From the previous doc, there is a link to "Installation", where it says:
Because the Python client libraries are not installed in the App Engine Python runtime environment, they must be vendored into your application just like third-party libraries.
This links to another page, "Using third-party libraries", which states you need to either install a lib to /lib or use requirements.txt. Neither of these makes taskqueue available.
Searching for taskqueue.py in Google's GitHub shows only an example module with the same name.
There is a documentation page on the module, but no information on how to install it.
There is a Python 2.7 example that Google points to here, but it doesn't work: there is no setup of taskqueue, no requirements.txt, no instructions.
There is a Stack Overflow question on this topic here, and the accepted answer says to install the SDK. That takes you here, which takes you here, which takes you here, which takes you here, which provides the gcloud SDK download for deploying and managing gcloud. This does not include the Python lib for taskqueue.
There is another similar stackoverflow question here, which says:
... this is now starting to feel like an infinite loop. Yes, it's been made crystal clear you need to import the taskqueue. But how do you make it available?
I've asked Google Support, and they haven't been able to answer for four days.
I've opened two issues, one here and another here. No answers yet.
Do not want to use Python < 3.4.
Do not want to use HTTP REST API.
Just want a simple pull queue.
Many of the docs you mentioned are standard environment docs and do not apply to the flexible environment.
From the Task Queue section in Migrating Services from the Standard Environment to the Flexible Environment:
The Task Queue service has limited availability outside of the standard environment. If you want to use the service outside of the standard environment, you can sign up for the Cloud Tasks alpha.
Outside of the standard environment, you can't add tasks to push queues, but a service running in the flexible environment can be the target of a push task. You can specify this using the target parameter when adding a task to a queue or by specifying the default target for the queue in queue.yaml.
In many cases where you might use pull queues, such as queuing up tasks or messages that will be pulled and processed by separate workers, Cloud Pub/Sub can be a good alternative as it offers similar functionality and delivery guarantees.
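If you go the Pub/Sub route, the pull-worker pattern is only a few lines with the google-cloud-pubsub client. A minimal sketch, assuming the topic and subscription already exist and a recent client version (all names are placeholders):

```python
# Producer publishes "tasks"; a worker pulls and acks them as they arrive.
from google.cloud import pubsub_v1

project = 'my-project'

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path(project, 'work-items')
publisher.publish(topic_path, b'process order 42')  # enqueue a task

subscriber = pubsub_v1.SubscriberClient()
sub_path = subscriber.subscription_path(project, 'work-items-sub')

def handle(message):
    print('got task: %s' % message.data)  # do the actual work here
    message.ack()                         # remove it from the queue

future = subscriber.subscribe(sub_path, callback=handle)
future.result()  # block, processing tasks via the callback
```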
I have started trying to use Google Cloud Datalab. While I understand it is a beta product, I find the docs very frustrating, to say the least.
The questions here and the lack of responses, as well as the lack of new revisions or docs over the several months the project has been available, make me wonder if there is any commitment to the product.
A beginning would be a notebook that shows data ingestion from external sources into both the Datastore system and the BigQuery system. That is a common use case. I'd like to use my own data; it would be great to have a notebook to ingest it. It seems that should be doable without huge effort. And it would get me (and others) out of this mess of trying to link the various terse docs from various products and workspaces up and working together...
In addition, a better explanation of the GitHub connection process would help (prior question).
For BigQuery, see here: https://github.com/GoogleCloudPlatform/datalab/blob/master/content/datalab/tutorials/BigQuery/Importing%20and%20Exporting%20Data.ipynb
For GCS, see here: https://github.com/GoogleCloudPlatform/datalab/blob/master/content/datalab/tutorials/Storage/Storage%20Commands.ipynb
Those are the only two storage options currently supported in Datalab (which should not, in any event, be used for large-scale data transfers; these are for small-scale transfers that can fit in memory in the Datalab VM).
For Git support, see https://github.com/GoogleCloudPlatform/datalab/blob/master/content/datalab/intro/Using%20Datalab%20-%20Managing%20Notebooks%20with%20Git.ipynb. Note that this has nothing to do with GitHub, however.
As for the low level of activity recently, that is because we have been heads down getting ready for GCP Next (which happens this coming week). Once that is over we should be able to migrate a number of new features over to Datalab and get a new public release out soon.
Datalab isn't running on your local machine; just the presentation part is in your browser. So if you mean the browser client machine, that wouldn't be a good solution: you'd be moving data from the local machine to a VM which runs the Datalab Python code (and this VM has limited storage space), and then moving it again to the real destination. Instead, you should use the cloud console or (preferably) the gcloud command line on your local machine for this.
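For example, from your local machine (bucket, dataset, and table names are placeholders):

```sh
# Stage the file in GCS, then load it into BigQuery, entirely from your
# own machine; Datalab is not involved in the transfer.
gsutil cp ./data.csv gs://my-bucket/data.csv
bq load --autodetect my_dataset.my_table gs://my-bucket/data.csv
```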
I have a couple of questions about the App Engine Map Reduce API. First of all there's a mapreduce package in the SDK, and there's a separate mapreduce bundle here:
https://developers.google.com/appengine/downloads
Which one should I be using? Should I be using the bundle, or is the documentation out of date and I should actually use the SDK version?
Second, I'd like to be able to run mapreduces on a non-default version to make sure that the requests from the mapreduce don't interfere with user requests.
What's the best way to do this? Can I start the pipeline with a task queue, and set the target version of that queue to be my non-default version?
We recommend using the open source version of Map Reduce for GAE at http://code.google.com/p/appengine-mapreduce/
The stale bundle link in the docs is a bug. That'll get cleaned up soon.
A few of our SDKs have bits of MapReduce (for historic reasons), but the open source version is the way to go for now.
As for using a separate version, this is kind of "it depends". If you're thinking of interference in terms of competition for the processor, that's not likely to be a noticeable issue. Depending on queue processing rates you've set up, more instances of your app will be spun up to handle mapping tasks as needed. I'd try some experiments first. Make sure you have a problem before you invest time and effort solving it.
A mapreduce can be started on a non-default version, and after it starts it will continue to run on that version automatically.
In my case I just deploy the code to a non-default version and trigger the mapreduce via version_id.app_id.appspot.com/path_to_start_a_job.
A cron job can also trigger the mapreduce on a non-default version without problems.
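A minimal cron.yaml sketch for that last point (the URL mirrors the trigger path above; the version name is a placeholder):

```yaml
cron:
- description: kick off the mapreduce on a non-default version
  url: /path_to_start_a_job
  schedule: every 24 hours
  target: my-mapreduce-version   # routes the request to this version
```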