Feasibility/value of caching with AppEngine backends? - google-app-engine

I've just finished watching the Google IO 2011 presentation on AppEngine backends (http://www.google.com/events/io/2011/sessions/app-engine-backends.html) which piqued my curiosity about using a backend instance for somewhat more reliable and configurable in-memory caching. It could be an interesting option as a third layer of cache, under in-app caching and memcache, or perhaps as a substitute for some cases where higher reliability is desirable.
Can anyone share any experience with this? Googling around doesn't reveal much experimentation here. Does the latency of a URLfetch to retrieve a value from a backend's in-memory dictionary render it less attractive, or is it not much worse than a memcache RPC?
I am thinking of whipping up some tests to see for myself, but if I can build on the shoulder of giants...thanks for any help :)

Latency between a backend and frontend instance is extremely low.
If you think about it, all App Engine RPC's are fulfilled with "backend instances". The backends for the Datastore and Memcache are just run by Google for your convenience.
Most requests, according to the App Engine team, stay within the same datacenter - meaning latency is inter-rack and much lower than outside URLFetches.
A simple request handler and thin API layer for coordinating the in memory storage is all you need - in projects where I've set up backend caching, it's done a good job of fulfilling the need for more flexible in-memory storage - centralizing things definitely helps. The load balancing doesn't hurt either ;)

Related

Caching after GAE standard migration to Go 1.11/1.12

I've almost completed migrating based on google's instructions.
It's very nice to not have to call into the app-engine libraries whatsoever.
However, now I must replace my calls to app-engine-standard memcached.
Here's what the guide says: "To use a memcache service on App Engine, use Redis Labs Memcached Cloud instead of App Engine Memcache."
So is this my only option; a third party? They don't even list pricing on their page if GCE is selected.
I also see in the standard environment how-to guides there is a guide on Connecting to internal resources in a VPC network.
From that link it mentions Cloud Memorystore. I can't find any examples if this is advisable or possible to do on GAE standard. Of course it wasn't previously possible but now that GAE standard has become much more "standard", I think it should be possible?
Thanks for any advice on the best way forward.
Memorystore appears to be Google's replacement:
https://cloud.google.com/memorystore/
You connect to it using this guide:
https://cloud.google.com/appengine/docs/standard/go/using-memorystore
Alas it costs about $1.20/GB per day with no free quota.
Thus, if your data doesn't change, and requires less than 100MB of cache at a time, the first answer might be better (free). Also, your data won't explode the instance as you can control the max size of the cache.
However, if your data changes or you need more cache, MemoryStore is a more direct replacement to MemCache - just costs money.
I've been thinking about this. 2nd gen instances have twice the ram, so if global cache isn't required (as in items don't change once created - (name items using their sha256)), you can run your own local threadsafe memcache (such as https://github.com/dgraph-io/ristretto) and allocate some of the extra ram to it. It'll be faster than Memcache was, so requests can be serviced even faster, keeping the number of instances low.
You could make it global for data that does change, by using pub/sub between instances, but I think that's significantly more work.
To ease the migration to 1.12, I have been thinking of using this solution:
create a dedicated app using the 1.11 runtime.
setup twirp endpoints to act as a proxy for all the deprecated app engine services (memcache, mail, search...)

Google App Engine for long running but low CPU tasks, or long-polling?

App Engine has been great for requests that process quickly with no external API calls to databases or caches or third-party resources, but we've found that introducing any sort of "longer running" component or external latency (for example in a HTTP POST operation that runs asynchronously in the background and might take a second or two to process a few more intense database queries... totally invisible and OK from a UX perspective on the client-side because it's asynchronous but expensive to App Engine billing since it's long running) ... the "instance hours" compound and drive costs up considerably.
These sorts of expense inducing situations where a request is literally just waiting for a response from an external resource and requiring almost zero CPU during their idling seem avoidable, but I'm not sure if it's avoidable with App Engine.
It's almost like a "long poll" where the response might be left open but doing nothing.
Is there a way to do this on App Engine without just paying an insane amount for instance hours, or would we be better off moving to Compute Engine or EC2? Does it scale automatically based on CPU load, or is it based solely on open and perhaps inactive requests in total count? — threadsafe is indeed enabled.
There are really two ways to go about this one (top of mind).
Use Task Queues!
If the work doesn't need to be exactly at the same time of the request, this is exactly what [task queues] in App Engine are for. They allow you to put a job on a queue, and have another module pick up the work. They're kind of great because you can separately scale your front end and back end processes.
If that doesn't work....
Use App Engine Flexible
Under the hood App Engine Flexible is just running GCE instances. The cost structure is entirely different, since you persistently have a VM running in the background serving your requests.
Hope this helps!
What you're really worried about here is how App Engine scales your instances. Because many of your requests require few resources, your app might be able to handle many more concurrent requests on a single instance than normal. You can look into parameters that shape scaling here. Of particular interest:
max_concurrent_requests The number of concurrent requests an automatic scaling instance can accept before the scheduler spawns a new instance (Default: 8, Maximum: 80).
There is a danger here, where an instance may fill up with non-long-polling requests and become overburdened. To prevent that, you could isolate your long-polling requests into their own service and set its scaling parameters separately from the rest of your app.

google cloud storage performance characteristics (latency / request response time)

I'm considering building an app on App Engine, and I'm trying to decide if I should store data in the datastore or on google cloud storage.
Each object is going to be typically no more than around a kilobyte, perhaps a few kilobytes at most (and often less). It won't change too often.
I could have the client directly access the data, but though I might live without there would be some benefit to the app engine app accessing the data and using it as part of serving a response.
What are the performance characteristics of google cloud storage? How quickly do requests come back? I was able to find a status dashboard for the datastore which indicates that they are usually reasonably quick at handling requests but I've had trouble getting guidance on how fast GCS is.
Under the most recent price reductions, it seems like the datastore might actually be cheaper for my use case of relatively small chunks of data ($0.06/100,000 requests vs $0.01/10,000 class b operations). Am I interpreting that correctly?
This thread might give you some insights. Back in september 2013 I had about 200-250ms on average for "blank" sequential inserts. You can get a great speedup if you combine your requests. You can insert up to 500 entities in a single request. Which takes roughly 500ms-900ms if I remember correctly.

What are the rules of thumb when trying to decide if developing on Google App Engine platform is worthwhile

I have an idea for a web application and I am currently researching different platforms. I am really interested in Google App Engine, but it looks like it works pretty good for certain application types while it is less suitable for others (there are horror as well as success stories e.g. Goodbye Google App Engine vs. Why we are really happy with Google App Engine
There is also a similar negative story in this thread from 1 year ago, concluding GAE was not ready for commercial production platform: GAE as Production Platform. There are also other threads from 2009 talking about data select limits (1000 rows) that has since been lifted.
My app will essentially perform some mathematical analysis based on data pulled from external data feeds (could be some substantial amount of data), it would be real time only the first time data is downloaded for a specific item at hand and then stored and retrieved locally from the database at that point. There will be some additional external data pulls as scheduled intervals.
Based on this brief description, should I even bother starting on GAE? In general, what are the rules of thumb to try and decide if developing on GAE is suitable for a problem at hand? Also, what are the good examples of Apps in Production that use GAE. It looks like GAE App Gallery is not around anymore, but I would definitely appreciate any Web 2.0 App examples running on the app engine.
In your specific case I would double check these factors:
a. Is the mathematical analysis a long running CPU intensive job?
GAE is not designed for long running CPU intensive computational Jobs; this would lead to have an high billing cost and would force you to design your application to avoid some GAE limitations (10 minutes max per job, limited soft memory, CPU quota, etc. etc.).
b. Are you planning to retrieve external data using a mainstream API (twitter, yahoo, facebook)?
Your application shares the same pool of IPs with other applications; if the API you want to adopt does not allow authenticated request, your application will suffer hiccups caused by throttling/quota limits errors. I faced this problem here.
App Engine should work fine for your application. It's generally designed to serve, and to scale, sites that serve mostly user-facing traffic. Applications that it's not suitable for are things such as video transcoding, which rely heavily on backend processing, or things that have to shell out to native code, such as 3D graphics, etcetera.
Depends on what type of mathematical analysis are you doing. If your application is heavy in I/O, I would give it some pause. On GAE, you're kind of limited in your I/O options. You basically have the following:
RAM: I can't recall exactly, but GAE imposes a hard limit of around 200MB of RAM.
Datastore: You get plenty of space here, but it's slow compared to a cached local file system.
Memcache: Faster than datastore, but not nearly as fast as a cached disk. And worse, it's a cache, so there's no guarantee that it won't get wiped out.
External sources: These include calling out to external web-pages. Lots of flexibility, but very slow.
In sum, I would perhaps look at other options if you're doing heavy I/O on a medium-size dataset (>20MB and ~<2GB). These are probably non-issues for 90% of web-apps, although you should be aware of them.
All the negatives aside, working on GAE is a joyous experience. You spend more time programming and less time configuring. And it's really cheap.

Relative advantages of storage using Amazon Web Services S3 vs Google Application Engine

What do you see as the advantages and disadvantages of Amazon Web Services S3 compared with Google Application Engine? The cost per gigabyte for the two is, at the time I ask, roughly similar; I have not seen any widespread complaints about the quality of service; so I think the decision of which one to use may depend on the API (of all things).
Google's API breaks your content into what they call static content, such as your CSS files, favicons, images, etc and non-static dynamically-generated HTTP responses. Requests for static stuff will be served to whoever requests it until your bandwidth limit is reached; non-static requests will be fulfilled until your bandwidth or CPU limit is reached. With respect to your non-static requests, you can provide any logic you are able to express in Python, so you can be choosy about who you serve.
Amazon's API treats all your content as blobs in a bucket, and provides an access protocol that lets you distinguish between a variety of fulfillable requests ranging from world-readable to owner-only. If you want to something that's not in the kit, though, I don't know what you do beyond being thoughtful about distributing your URIs.
What differences do you see between the two? Are there other cloud storage services you like? Zetta had a press release today, but they're looking for a minimum of ten terabytes on the beta application, and none of my clients are there (yet); and Joyent will probably do something in the near future.
The way I see it is the Google App Engine basically provides a sandbox for you to deploy your app as long as it is written with their requirements (Python etc). Amazon gives you a virtual machine with a lot more flexibility in what can be done but probably more work on your side needed. MS new Azure seems to be going down the GAE route, but replace Python with .NET.
GAE has a limit of 10MB each on static files uploaded through appcfg.py (look right at the bottom of http://code.google.com/appengine/docs/python/tools/uploadinganapp.html). Obviously you can write code to slice large files into bits and reassemble at download time, but it suggests to me that Google doesn't expect App Engine to be used just as a simple CDN, and that if you want to use it as one you'll have to do some work. S3 does the job out of the box, all you have to do is grab a third-party interface app.
If you want to do something non-standard with file access on S3, then probably Amazon expects you to spring for a server instance on EC2. Once this is done, you have much more flexibility than GAE's environment, but you pay more (in cash and probably in maintenance).
The plus point for GAE is that it has "cheap" on its side for small apps (up to 1GB storage, 1GB bandwidth and 1.3 million hits a day are free: http://code.google.com/appengine/docs/quotas.html). Depending on your use, this might be significant, or it might be irrelevant on the scale of your total bandwidth costs.
Coincidentally, I have just this last couple of days looked at GAE for the first time. I took an old Perl CGI script and turned it into a GAE app, which is up and running. About 10 hours total, including reading the GAE introductory docs and remembering how Python is supposed to work enough to write a couple of hundred lines. I'd speculate that's more effort than loading a bunch of files onto S3, but less effort than maintaining EC2 server(s). However, I haven't used Amazon.
[Edited to add: this sounds like the advantages are all with Amazon for commercial purposes. This may well be true, but then GAE is not yet mature and presumably will get better from here fairly rapidly. They only let people start paying in December or so, before that it was free-quota-only except by special arrangement with Google. While Google sometimes takes flack for its claims of "perpetual beta", I think GAE genuinely is still starting up. If your app is a good fit for the BigTable data paradigm, then it might scale better on GAE than EC2. For storage I assume that S3 is already good enough for all reasonable purposes, and Google's clever architecture gives GAE no advantages to compensate when all you're doing is serving files.]
* Except that Google has just offered me a preview of GAE's Java support.
** Just noticed that you can set up chron jobs, but they're limited by the same rules as any other request (30 second runtime, can't modify files, etc).

Resources