What is the best practice to roll out a new version in google app engine - google-app-engine

When we switch the server code to a new version in google app engine console, lots of new instances need to be spawned. Because of that, we see some 500 errors and long response time.
What is the best practice to mitigate those problems?

500 responses have not always occurred to requests during a deployment. Previously the new version of your app was able to take over traffic from the old without interruption, however that seemed to stop quite some time ago. These 500s don't appear to go to your application at all (as in no requests will show in your logs, and they won't be served by your applications 500 page). The time window also seems to vary from between none, to up to a minute.
I'm not aware of any indication that the appengine team is looking at solving this, although it seems like a bug (or at the least a reasonable feature request).
To get around this issue, we generally deploy to a different version and switch that to be the default version. Once that is serving traffic, we deploy to the previous version, then switch it back to default. This allows customers to be served uninterrupted, but it does require (at least in java land) a new build.

In addition to the other persons answer re: warmup requests, you should also look at traffic splitting - "App Engine's Traffic Splitting tool allows you to roll out features for your app slowly over a period of time, similar to what Google does when rolling out a new feature over a few days or weeks. Traffic Splitting also allows you to do A/B Testing. Traffic Splitting works by splitting incoming requests to different versions of your app."
docs here https://developers.google.com/appengine/docs/adminconsole/trafficsplitting

Set up warmup requests to load your application before actual traffic is directed to the instance:
Python: https://developers.google.com/appengine/docs/python/config/appconfig#Warmup_Requests
Java: https://developers.google.com/appengine/docs/java/config/appconfig#Warmup_Requests
Go: https://developers.google.com/appengine/docs/go/config/appconfig#Inbound_Services
PHP: https://developers.google.com/appengine/docs/php/config/appconfig#Warmup_Requests

Related

Google App Engine for long running but low CPU tasks, or long-polling?

App Engine has been great for requests that process quickly with no external API calls to databases or caches or third-party resources, but we've found that introducing any sort of "longer running" component or external latency (for example in a HTTP POST operation that runs asynchronously in the background and might take a second or two to process a few more intense database queries... totally invisible and OK from a UX perspective on the client-side because it's asynchronous but expensive to App Engine billing since it's long running) ... the "instance hours" compound and drive costs up considerably.
These sorts of expense inducing situations where a request is literally just waiting for a response from an external resource and requiring almost zero CPU during their idling seem avoidable, but I'm not sure if it's avoidable with App Engine.
It's almost like a "long poll" where the response might be left open but doing nothing.
Is there a way to do this on App Engine without just paying an insane amount for instance hours, or would we be better off moving to Compute Engine or EC2? Does it scale automatically based on CPU load, or is it based solely on open and perhaps inactive requests in total count? — threadsafe is indeed enabled.
There are really two ways to go about this one (top of mind).
Use Task Queues!
If the work doesn't need to be exactly at the same time of the request, this is exactly what [task queues] in App Engine are for. They allow you to put a job on a queue, and have another module pick up the work. They're kind of great because you can separately scale your front end and back end processes.
If that doesn't work....
Use App Engine Flexible
Under the hood App Engine Flexible is just running GCE instances. The cost structure is entirely different, since you persistently have a VM running in the background serving your requests.
Hope this helps!
What you're really worried about here is how App Engine scales your instances. Because many of your requests require few resources, your app might be able to handle many more concurrent requests on a single instance than normal. You can look into parameters that shape scaling here. Of particular interest:
max_concurrent_requests The number of concurrent requests an automatic scaling instance can accept before the scheduler spawns a new instance (Default: 8, Maximum: 80).
There is a danger here, where an instance may fill up with non-long-polling requests and become overburdened. To prevent that, you could isolate your long-polling requests into their own service and set its scaling parameters separately from the rest of your app.

Fastest Open Source Content Management System for Cloud/Cluster deployment

Currently clouds are mushrooming like crazy and people start to deploy everything to the cloud including CMS systems, but so far I have not seen people that have succeeded in deploying popular CMS systems to a load balanced cluster in the cloud. Some performance hurdles seem to prevent standard open-source CMS systems to be deployed to the cloud like this.
CLOUD: A cloud, better load-balanced cluster, has at least one frontend-server, one network-connected(!) database-server and one cloud-storage server. This fits well to Amazon Beanstalk and Google Appengine. (This specifically excludes CMS on a single computer or Linux server with MySQL on the same "CPU".)
To deploy a standard CMS in such a load balanced cluster needs a cloud-ready CMS with the following characteristics:
The CMS must deal with the latency of queries to still be responsive and render pages in less than a second to be cached (or use a precaching strategy)
The filesystem probably must be connected to a remote storage (Amazon S3, Google cloudstorage, etc.)
Currently I know of python/django and Wordpress having middleware modules or plugins that can connect to cloud storages instead of a filesystem, but there might be other cloud-ready CMS implementations (Java, PHP, ?) and systems.
I myself have failed to deploy django-CMS to the cloud, finally due to query latency of the remote DB. So here is my question:
Did you deploy an open-source CMS that still performs well in rendering pages and backend admin? Please post your average page rendering access stats in microseconds for uncached pages.
IMPORTANT: Please describe your configuration, the problems you have encountered, which modules had to be optimized in the CMS to make it work, don't post simple "this works", contribute your experience and knowledge.
Such a CMS probably has to make fewer than 10 queries per page, if more, the queries must be made in parallel, and deal with filesystem access times of 100ms for a stat and query delays of 40ms.
Related:
Slow MySQL Remote Connection
Have you tried Umbraco?
It relies on database, but it keeps layers of cache so you arent doing selects on every request.
http://umbraco.com/azure
It works great on azure too!
I have found an excellent performance test of Wordpress on Appengine. It appears that Google has spent some time to optimize this system for load-balanced cluster and remote DB deployment:
http://www.syseleven.de/blog/4118/google-app-engine-php/
Scaling test from the report.
parallel
hits GAE 1&1 Sys11
1 1,5 2,6 8,5
10 9,8 8,5 69,4
100 14,9 - 146,1
Conclusion from the report the system is slower than on traditional hosting but scales much better.
http://developers.google.com/appengine/articles/wordpress
We have managed to deploy python django-CMS (www.django-cms.org) on GoogleAppEngine with CloudSQL as DB and CloudStore as Filesystem. Cloud store was attached by forking and fixing a django.storage module by Christos Kopanos http://github.com/locandy/django-google-cloud-storage
After that, the second set of problems came up as we discovered we had access times of up to 17s for a single page access. We have investigated this and found that easy-thumbnails 1.4 accessed the normal file system for mod_time requests while writing results to the store (rendering all thumb images on every request). We switched to the development version where that was already fixed.
Then we worked with SmileyChris to fix unnecessary access of mod_times (stat the file) on every request for every image by tracing and posting issues to http://github.com/SmileyChris/easy-thumbnails
This reduced access times from 12-17s to 4-6s per public page on the CMS basically eliminating all storage/"file"-system access. Once that was fixed, easy-thumbnails replaced (per design) file-system accesses with queries to the DB to check on every request if a thumbnail's source image has changed.
One thing for the web-designer: if she uses a image.width statement in the template this forces a ugly slow read on the "filesystem", because image widths are not cached.
Further investigation led to the conclusion that DB accesses are very costly, too and take about 40ms per roundtrip.
Up to now the deployment is unsuccessful mostly due to DB access times in the cloud leading to 4-5s delays on rendering a page before caching it.

Identify why Google app engine is slow

I developed an application for client that uses Play framework 1.x and runs on GAE. The app works great, but sometimes is crazy slow. It takes around 30 seconds to load simple page but sometimes it runs faster - no code change whatsoever.
Are there any way to identify why it's running slow? I tried to contact support but I couldnt find any telephone number or email. Also there is no response on official google group.
How would you approach this problem? Currently my customer is very angry because of slow loading time, but switching to other provider is last option at the moment.
Use GAE Appstats to profile your remote procedure calls. All of the RPCs are slow (Google Cloud Storage, Google Cloud SQL, ...), so if you can reduce the amount of RPCs or can use some caching datastructures, use them -> your application will be much faster. But you can see with appstats which parts are slow and if they need attention :) .
For example, I've created a Google Cloud Storage cache for my application and decreased execution time from 2 minutes to under 30 seconds. The RPCs are a bottleneck in the GAE.
Google does not usually provide a contact support for a lot of services. The issue described about google app engine slowness is probably caused by a cold start. Google app engine front-end instances sleep after about 15 minutes. You could write a cron job to ping instances every 14 minutes to keep the nodes up.
Combining some answers and adding a few things to check:
Debug using app stats. Look for "staircase" situations and RPC calls. Maybe something in your app is triggering RPC calls at certain points that don't happen in your logic all the time.
Tweak your instance settings. Add some permanent/resident instances and see if that makes a difference. If you are spinning up new instances, things will be slow, for probably around the time frame (30 seconds or more) you describe. It will seem random. It's not just how many instances, but what combinations of the sliders you are using (you can actually hurt yourself with too little/many).
Look at your app itself. Are you doing lots of memory allocations in the JVM? Allocating/freeing memory is inherently a slow operation and can cause freezes. Are you sure your freezing is not a JVM issue? Try replicating the problem locally and tweak the JVM xmx and xms settings and see if you find similar behavior. Also profile your application locally for memory/performance issues. You can cut down on allocations using pooling, DI containers, etc.
Are you running any sort of cron jobs/processing on your front-end servers? Try to move as much as you can to background tasks such as sending emails. The intervals may seem random, but it can be a result of things happening depending on your job settings. 9 am every day may not mean what you think depending on the cron/task options. A corollary - move things to back-end servers and pull queues.
It's tough to give you a good answer without more information. The best someone here can do is give you a starting point, which pretty much every answer here already has.
By making at least one instance permanent, you get a great improvement in the first use. It takes about 15 sec. to load the application in the instance, which is why you experience long request times, when nobody has been using the application for a while

Latency accessing Google App Engine overseas

I am about to begin development of a web app in New Zealand for a NZ market for which scalability is a key requirement. I am contemplating using Google Apps Engine which I have used in the past for smaller projects where latency was not a big issue, because half the apps are client side Java script.
However, the new project requires fast AJAX response times. The local web-app companies charge about $175/month (much more than in the US I would imagine) for a dedicated server.
Is there likely to be a significant difference between the latency for AJAX requests if I use Google Apps Engine (hosted in the US I presume??) vs the local hosting company who host here in New Zealand? If so how big?
A service which may interest you in this context is CloudSleuth. They measure page load times from multiple locations. But select Asia/Oceania for Location. Then drill down for GAE to see page load time from various location. Unfortunately the closest will be Sydney, where page load for GAE currently is almost 20s.
From your explanation you would like to use App Engine as your backend, there should not be any latency problems other that the time your app would take to load and serve a request. But as they say, there is no better test like the one you do it yourself, so go ahead play with App Engine and see it for yourself!
Happy coding!
It's unavoidably the case that the latency for a request within New Zealand is going to be lower than the latency for a request to the US and back, all else being equal. There are several mitigating factors to consider, though:
The speed-of-light delay may not be significant for your application. The round trip time to the US and back is under 100 milliseconds; the latency generated by your app serving the request may be large enough that this is not a significant factor on the end-user latency.
Although your app is only in a single location at any one time, Google has caching frontends all around the world. Requests typically get routed to the closest one, and if your app generates cacheable responses, the frontend may be able to return a response from its cache immediately, without having to ever send the request to your app.
Some ISPs, particularly in places like NZ where international bandwidth is expensive, run transparent proxies. Likewise, so do organisations, and your browser itself has a cache. Any of these can satisfy the request in less time than a roundtrip, if the response is cacheable.
In the end, the question is whether or not the extra 100 milliseconds or so is acceptable. More often than not, the answer is yes, and it's worth the tradeoff of not having to handle machine provisioning, maintenance, etc etc yourself.
App Engine is not globally distributed.
The whole application is hosted around North America by default.
It you pay for the service you may request hosting within Europe instead, but there is no option to select any other regions (from https://developers.google.com/appengine/docs/python/gettingstartedpython27/uploading).

Relative advantages of storage using Amazon Web Services S3 vs Google Application Engine

What do you see as the advantages and disadvantages of Amazon Web Services S3 compared with Google Application Engine? The cost per gigabyte for the two is, at the time I ask, roughly similar; I have not seen any widespread complaints about the quality of service; so I think the decision of which one to use may depend on the API (of all things).
Google's API breaks your content into what they call static content, such as your CSS files, favicons, images, etc and non-static dynamically-generated HTTP responses. Requests for static stuff will be served to whoever requests it until your bandwidth limit is reached; non-static requests will be fulfilled until your bandwidth or CPU limit is reached. With respect to your non-static requests, you can provide any logic you are able to express in Python, so you can be choosy about who you serve.
Amazon's API treats all your content as blobs in a bucket, and provides an access protocol that lets you distinguish between a variety of fulfillable requests ranging from world-readable to owner-only. If you want to something that's not in the kit, though, I don't know what you do beyond being thoughtful about distributing your URIs.
What differences do you see between the two? Are there other cloud storage services you like? Zetta had a press release today, but they're looking for a minimum of ten terabytes on the beta application, and none of my clients are there (yet); and Joyent will probably do something in the near future.
The way I see it is the Google App Engine basically provides a sandbox for you to deploy your app as long as it is written with their requirements (Python etc). Amazon gives you a virtual machine with a lot more flexibility in what can be done but probably more work on your side needed. MS new Azure seems to be going down the GAE route, but replace Python with .NET.
GAE has a limit of 10MB each on static files uploaded through appcfg.py (look right at the bottom of http://code.google.com/appengine/docs/python/tools/uploadinganapp.html). Obviously you can write code to slice large files into bits and reassemble at download time, but it suggests to me that Google doesn't expect App Engine to be used just as a simple CDN, and that if you want to use it as one you'll have to do some work. S3 does the job out of the box, all you have to do is grab a third-party interface app.
If you want to do something non-standard with file access on S3, then probably Amazon expects you to spring for a server instance on EC2. Once this is done, you have much more flexibility than GAE's environment, but you pay more (in cash and probably in maintenance).
The plus point for GAE is that it has "cheap" on its side for small apps (up to 1GB storage, 1GB bandwidth and 1.3 million hits a day are free: http://code.google.com/appengine/docs/quotas.html). Depending on your use, this might be significant, or it might be irrelevant on the scale of your total bandwidth costs.
Coincidentally, I have just this last couple of days looked at GAE for the first time. I took an old Perl CGI script and turned it into a GAE app, which is up and running. About 10 hours total, including reading the GAE introductory docs and remembering how Python is supposed to work enough to write a couple of hundred lines. I'd speculate that's more effort than loading a bunch of files onto S3, but less effort than maintaining EC2 server(s). However, I haven't used Amazon.
[Edited to add: this sounds like the advantages are all with Amazon for commercial purposes. This may well be true, but then GAE is not yet mature and presumably will get better from here fairly rapidly. They only let people start paying in December or so, before that it was free-quota-only except by special arrangement with Google. While Google sometimes takes flack for its claims of "perpetual beta", I think GAE genuinely is still starting up. If your app is a good fit for the BigTable data paradigm, then it might scale better on GAE than EC2. For storage I assume that S3 is already good enough for all reasonable purposes, and Google's clever architecture gives GAE no advantages to compensate when all you're doing is serving files.]
* Except that Google has just offered me a preview of GAE's Java support.
** Just noticed that you can set up chron jobs, but they're limited by the same rules as any other request (30 second runtime, can't modify files, etc).

Resources