I have a couple of questions about the App Engine Map Reduce API. First of all there's a mapreduce package in the SDK, and there's a separate mapreduce bundle here:
https://developers.google.com/appengine/downloads
Which one should I be using? Should I be using the bundle, or is the documentation out of date and I should actually use the SDK version?
Second, I'd like to be able to run mapreduces on a non-default version to make sure that the requests from the mapreduce don't interfere with user requests.
What's the best way to do this? Can I start the pipeline with a task queue, and set the target version of that queue to be my non-default version?
We recommend using the open source version of Map Reduce for GAE at http://code.google.com/p/appengine-mapreduce/
The stale bundle link in the docs is a bug. That'll get cleaned up soon.
A few of our SDKs have bits of MapReduce (for historic reasons), but the open source version is the way to go for now.
As for using a separate version, this is kind of "it depends". If you're thinking of interference in terms of competition for the processor, that's not likely to be a noticeable issue. Depending on queue processing rates you've set up, more instances of your app will be spun up to handle mapping tasks as needed. I'd try some experiments first. Make sure you have a problem before you invest time and effort solving it.
A mapreduce can be started on a non-default version. Once it starts, it keeps running on that version automatically.
In my case I just deploy the code to a non-default version and trigger the mapreduce with version_id.app_id.appspot.com/path_to_start_a_job.
A cron job can also trigger the mapreduce on a non-default version without problems.
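As a minimal sketch of the hostname pattern mentioned above, the helper below builds the version-specific URL. The function name and the example version, app ID and path are placeholders for illustration, not part of any App Engine API.

```python
def versioned_url(version_id, app_id, path, scheme="http"):
    """Build a URL that addresses one specific deployed version of an app."""
    # Hostname pattern taken from the answer: version_id.app_id.appspot.com
    host = "{}.{}.appspot.com".format(version_id, app_id)
    # Normalise the path so callers may pass it with or without a leading slash.
    return "{}://{}/{}".format(scheme, host, path.lstrip("/"))


if __name__ == "__main__":
    # "mr-batch" and "my-app" are hypothetical names.
    print(versioned_url("mr-batch", "my-app", "/path_to_start_a_job"))
```

Requesting that URL (by hand or from a cron job) starts the job on the named version rather than on the default one.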
I want to understand the limitations of LocalExecutionEnvironment and whether it can be used in production.
Appreciate any help/insight. Thanks
LocalExecutionEnvironment spins up a Flink MiniCluster, which runs the entire Flink system (JobManager, TaskManager) in a single JVM. So you're limited to CPU cores and memory available on that one machine. You also don't have HA from multiple JobManagers. I haven't looked at other limitations of the MiniCluster environment, but I'm sure more exist.
A LocalExecutionEnvironment doesn't load a config file on startup, so you have to do all of the configuration in the application. By default it also doesn't offer a REST endpoint. You can solve both these issues by doing something like this:
String cwd = Paths.get(".").toAbsolutePath().normalize().toString();
Configuration conf = GlobalConfiguration.loadConfiguration(cwd);
env = StreamExecutionEnvironment.createLocalEnvironmentWithWebUI(conf);
Logging may be another issue that will require a workaround.
I don't believe you'll be able to use the Flink CLI to control the job, but if you create the Web UI (as shown above) you can at least use the REST API to do things like triggering savepoints (after first using the REST API to get the job ID).
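A rough sketch of that REST workflow, written in Python for brevity. It assumes the Web UI listens on localhost:8081 and that the endpoints are GET /jobs and POST /jobs/<job_id>/savepoints with a "target-directory" field; check both against the REST API documentation for your Flink version. The function names are made up for this example.

```python
import json
import urllib.request

# Assumption: default REST port; adjust to match your configuration.
BASE_URL = "http://localhost:8081"


def list_jobs_request(base_url=BASE_URL):
    """Build the request that lists job IDs together with their status."""
    return urllib.request.Request(base_url + "/jobs", method="GET")


def savepoint_request(job_id, target_directory, base_url=BASE_URL):
    """Build the request that triggers a savepoint for one job."""
    body = json.dumps({"target-directory": target_directory, "cancel-job": False})
    return urllib.request.Request(
        "{}/jobs/{}/savepoints".format(base_url, job_id),
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send a request and decode the JSON reply (needs a running job)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

In use, you would call send(list_jobs_request()) first, pick the job ID out of the reply, and then pass it to savepoint_request.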
I have a website on AppEngine that is 99% static. It is running on the Python 2.7 runtime. Now the time has come to evolve this webapp, and since I have almost no Python code in it, I'd prefer to write it in Go instead.
Can I change runtime from Python 2.7 to Go, while keeping the project intact? Specifically, I want to keep the same app-ID, the same custom domain attached to it, the same SSL certificate, and so on.
What do I have to do in order to do that? I surely have to change runtime in the app.yaml. Is there anything else?
Bonus question: will such a change happen without downtime?
I'd be grateful for any links to documentation on exactly that (swapping runtime on a live app). I can't find any.
Specify a runtime as well as a new value for version. When deployed you'll have an older version that is Python and a newer version that is Go. There won't be any downtime (same as when deploying a newer version of Python).
Rather than trusting links/docs (that may be out of date or not 100% exactly what you're trying to do), why not create a new GAE-Std project for testing purposes and try it yourself? Having a GAE-Std test project is good for testing new functionality (especially by other testers who won't have access to the dev environ on your laptop).
The GAE services offer complete code isolation. So it should be possible to simply deploy a new version of the service, which can be written in a different language or even use a different GAE (standard/flex) environment. Personally I didn't go through a language change, but I did go through a split of a single-service app into a multi-service one, and I see no reason why the same principles wouldn't apply.
Maybe develop the new version as a separate app first, to be able to test it properly without risking an accidental impact on the old version and only after that bring the code as a new version in the old app. That'd be using the GAE project isolation. You can, in fact, test the entire version migration as a separate app if you so desire without even touching the existing app. I am using this technique - a separate app ID - to implement a staging environment for my app, completely isolated from my production app, see How to copy / clone entire Google App Engine Project
Make sure to not switch traffic to the new version at deployment time. This keeps the app working with the old version. Test first that the new version works as expected using Targeted routing. Then maybe use Splitting traffic across multiple versions to perform A/B testing with just a small percentage of the traffic going to the new version. Finally, when happy with the results, switch all traffic to the new version.
You need to pay special attention to the app-level configs (dispatch, cron, queue, datastore indexes), shared by all services/versions. They need to be functionally equivalent in the 2 versions. The service isolation doesn't apply to them, only project isolation can ensure no impact to the old version.
There should be no need to make any change to the app ID, custom domain mapping or SSL config. The above mentioned tests should confirm that.
A few potentially interesting posts related to re-working services/modules:
Converting App Engine frontend versions to modules
Google App Engine upgrading part by part
Migrating to app engine modules, test versions first?
Advantages of implementing CI/CD environments at GAE project/app level vs service/module level?
I am having trouble using Push Queues on Google App Engine's Flexible Environment (formerly named the Managed VM Environment). I am receiving numerous 404 Instance Unavailable errors.
After a bit of sleuthing, I believe these errors may be because I am adding a task to a task queue, then deploying a new version of the Flexible VM instance. The task that I previously pushed is locked to the older instance, and can no longer run. Is this how task queues work with Flexible VM? If so, how does one use push task queues with the Flexible VM?
I was 90% done migrating to flexible env when I came across this same problem. After extensive research, I concluded there are three options:
REST API (experimental)
Use the beta REST API for task queues (this, as all other google APIs from flexible env, is external, so you need to deal with auth appropriately).
REST API reference: https://cloud.google.com/appengine/docs/python/taskqueue/rest/
Note, this is external and experimental. Find e.g. a java sdk without any meaningful documentation here: https://developers.google.com/api-client-library/java/apis/ (current version: https://developers.google.com/api-client-library/java/apis/taskqueue/v1beta2)
Compat runtime
Build your own flexible environment, based off a -compat runtime. This offers the old appengine api in a container suitable for the flexible env:
https://cloud.google.com/appengine/docs/flexible/custom-runtimes/build (look for images with "YES" in the last column)
e.g.: https://cloud.google.com/appengine/docs/flexible/java/dev-jetty9-and-apis
https://cloud.google.com/appengine/docs/flexible/java/migrating-an-existing-app
Note: I spent two weeks in blistered frustration pleading every God almighty help me get this to work, following container rabbit holes into the depths of Lucifer's soul and across unexplored dimensions. I eventually had to give in. I just can't get this to work to a satisfying degree.
Proxy service
Kind of a hacky alternative, but it gets the job done: create a very thin standard environment wrapper service which proxies tasks onto / off your queue. Pass them to your own app however you want. ¯\_(ツ)_/¯
Downside is you are now spinning up extra instances and burning extra minutes.
I ended up with a variation of this, where I'm using a proxy service in standard env, but just ported my eventual task handler to AWS Lambda (so it's completely off GAE). It's a different disaster, but a more manageable one.
Good luck!
I'm the tech lead and manager for this product.
There are two distinct answers to your question.
For the first, it seems like you have a version routing issue -- as you say, tasks cannot run against a VM because you launched a new version. By default, tasks are assigned to run on the version from which they were enqueued to avoid version mismatches. You should be able to override the version by re-configuring the target in your queue.yaml (or queue.xml). Documentation for that can be found here. You might also need to look at your
From a broader perspective, building a migration path away from standard/MVM-only support for task queues is currently our highest priority.
The replacement is Cloud Tasks, which exposes the same interface but can be used fully independently from App Engine. It exists in the same universe as AppEngine Task Queues, so you will be able to add tasks to existing queues (both push and pull). It is currently available in closed alpha. You can sign up to join the alpha here.
We strongly recommend against writing new code against the REST API. It is unsupported and the cloud tasks alpha is already substantially more feature complete.
I'm upvoting hraban's answer (he did wrestle with the devil after all) but providing an additional answer here.
Keep in mind that the Flexible Environment (Managed VMs) is still just a compute engine instance... with Google doing a good job of extracting features from AppEngine to make them reachable in a transparent manner. TaskQueues didn't quite make it. Keep a sharp eye on the cloud library--that's the mechanism by which the DataStore becomes usable (for Java go to http://googlecloudplatform.github.io/google-cloud-java/0.3.0/index.html). If you go to that link you can also select other languages. I have it on official word that TaskQueues are still on the roadmap (but no ETA).
As of now you can't use the REST api to enqueue onto PUSH queues. Now the way that I decided to tackle this problem was to use the REST API and create a PULL queue to put tasks in. Then I poll that queue inside an AppEngine service (i.e. module) and put it into a PUSH task queue. Why do I go to all that trouble? Because I need scheduled execution... which is a feature that TaskQueues alone can give you on AppEngine. So I package my task in an envelope and then unpack and re-push it into a task queue. Sounds crazy? It's been working reliably for me. Don't be scared off by the fact that the REST api is "alpha".
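To make the envelope idea concrete, here is a minimal sketch. It uses in-memory deques as stand-ins for the pull and push queues, so none of it touches the real Task Queue or REST APIs; the names wrap and relay and the envelope fields are invented for the example.

```python
import json
import time
from collections import deque


def wrap(url, payload, run_at):
    """Pack a task into a JSON envelope suitable for a pull-queue payload."""
    return json.dumps({"url": url, "payload": payload, "run_at": run_at})


def relay(pull_queue, push_queue, now=None):
    """Drain the pull queue and re-enqueue each envelope as a push task."""
    now = time.time() if now is None else now
    moved = 0
    while pull_queue:
        envelope = json.loads(pull_queue.popleft())
        # Turn the scheduled time into a delay, never a negative one.
        countdown = max(0, envelope["run_at"] - now)
        push_queue.append((envelope["url"], envelope["payload"], countdown))
        moved += 1
    return moved
```

In the real setup, the producer in the flexible environment would write the envelope to the pull queue, and relay would run inside the standard-environment service that polls it.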
I will say if you're starting something new take a good look at the Pub/Sub API.
I'm working on a very simple web app, written in Go language.
I have a standalone version and am now porting it to GAE. It seems like there are very few changes, mainly concerning the datastore API (in the standalone version I need just files).
I also need to include appengine packages and use init() instead of main().
Is there any simple way to merge both versions? As there is no preprocessor in Go, it seems like I must write a GAE-compatible API for the standalone version and use this mock module for standalone build and use real API for GAE version. But it sounds like an overkill to me.
Another problem is that GAE might be using an older Go version (e.g. the recent Go release uses a new template package, but GAE uses the older one, and they are incompatible). So, is there any way to handle such differences at build time or at runtime?
Thanks,
Serge
UPD: Now GAE uses the same Go version (r60), as the stable standalone compiler, so the abstraction level is really simple now.
In broad terms, use abstraction. Provide interfaces for persistence, and write two implementations for that, one based on the datastore, and one based on local files. Then, write a separate main/init module for each platform, which instantiates the appropriate persistence interface, and passes it to your main application to use.
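A minimal sketch of that layout follows. It is written in Python, but the shape carries over to a Go interface with two implementing types. All names are hypothetical, and MemoryStore is only a stand-in for the datastore-backed implementation, which would call the real datastore API instead.

```python
import json
import os
import tempfile
from abc import ABC, abstractmethod


class Store(ABC):
    """Persistence interface; the application depends only on this."""

    @abstractmethod
    def save(self, key, value): ...

    @abstractmethod
    def load(self, key): ...


class FileStore(Store):
    """Standalone build: one JSON file per key inside a directory."""

    def __init__(self, directory):
        self.directory = directory

    def _path(self, key):
        return os.path.join(self.directory, key + ".json")

    def save(self, key, value):
        with open(self._path(key), "w") as handle:
            json.dump(value, handle)

    def load(self, key):
        with open(self._path(key)) as handle:
            return json.load(handle)


class MemoryStore(Store):
    """Stand-in for the datastore-backed implementation."""

    def __init__(self):
        self.items = {}

    def save(self, key, value):
        self.items[key] = value

    def load(self, key):
        return self.items[key]


def run_app(store):
    """Application logic: it never learns which Store it was given."""
    store.save("greeting", {"text": "hello"})
    return store.load("greeting")["text"]
```

Each platform's entry point then reduces to one line that picks the implementation, for example run_app(FileStore(tempfile.mkdtemp())) in the standalone build.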
My immediate answer would be (if you want to maintain both GAE and non-GAE versions) that you use a reliable VCS which is good at merging (probably git or hg), and maintain separate branches for each version. The GAE API fits in reasonably well with Go, so there shouldn't be too many changes.
As for the issue of different versions, you should probably maintain code in the GAE version and use gofix (which is unfortunately one-way) to make a release-compatible version. The only place where this is likely to cause trouble is if you use the template package, which is in the process of being deprecated; if necessary you could include the new template package in your GAE bundle.
If you end up with GAE code which you don't want to run on Google's servers, you can also look into AppScale.
Recently I've started using limited staging on my Google App Engine project. The data is still shared between all versions, but behaviour (especially user facing behaviour) is different.
Naturally when I implement something incredibly new it only runs on the latest version of my code and I don't feel like it should be backported to the older versions.
Some of this new functionality requires cron jobs to be run periodically, but I'm hitting a problem. I have to run a cron job to call the latest code, but this is what Google's documentation has to say about the issue:
Cron requests are always sent to the default version of the application.
The default version is the oldest because the first versions of the client code that went out to users weren't future proof and don't know how to select which API version to call.
So my question is, how can I get around this limitation and make a cron job that will call the latest rather than the default version of the application?
You can now specify a version using the target tag.
<target>version-2</target>
You can't change the cron jobs to run on a different version than the default.
Depending on how much time your cron job takes to run you could change your cron job script to do a URLFetch to "http://latest.appname.appspot.com/cron_job_endpoint".
If your cron job takes longer than 10 minutes to run, then I would design it in a way that you can chain the different tasks using task queues.
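The chaining idea can be sketched as follows. A deque stands in for the task queue and the doubling step is a placeholder for the real work; the function names are invented. The point is only that each task handles one batch and enqueues its successor, so no single request runs long.

```python
from collections import deque


def process_batch(items, cursor, batch_size):
    """Handle one slice of the work and return the cursor for the next slice."""
    batch = items[cursor:cursor + batch_size]
    results = [item * 2 for item in batch]  # placeholder for the real work
    next_cursor = cursor + len(batch)
    # None signals that nothing is left to process.
    return results, (next_cursor if next_cursor < len(items) else None)


def run_chain(items, batch_size):
    """Drive the chain: each task enqueues its successor until done."""
    queue = deque([0])  # stand-in for the task queue; it holds cursors
    output = []
    while queue:
        cursor = queue.popleft()
        results, next_cursor = process_batch(items, cursor, batch_size)
        output.extend(results)
        if next_cursor is not None:
            queue.append(next_cursor)  # in a real app: enqueue the next task
    return output
```

In a real app the loop in run_chain disappears: the cron request enqueues the first task, and each task handler calls process_batch and then adds the next task itself.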