I've almost completed migrating based on Google's instructions.
It's very nice to not have to call into the app-engine libraries whatsoever.
However, now I must replace my calls to app-engine-standard memcached.
Here's what the guide says: "To use a memcache service on App Engine, use Redis Labs Memcached Cloud instead of App Engine Memcache."
So is this my only option: a third party? They don't even list pricing on their page if GCE is selected.
I also see in the standard environment how-to guides there is a guide on Connecting to internal resources in a VPC network.
That guide mentions Cloud Memorystore. I can't find any examples showing whether this is advisable or even possible on GAE standard. Of course it wasn't previously possible, but now that GAE standard has become much more "standard", I think it should be possible?
Thanks for any advice on the best way forward.
Memorystore appears to be Google's replacement:
https://cloud.google.com/memorystore/
You connect to it using this guide:
https://cloud.google.com/appengine/docs/standard/go/using-memorystore
Alas it costs about $1.20/GB per day with no free quota.
Thus, if your data doesn't change, and requires less than 100MB of cache at a time, the first answer might be better (free). Also, your data won't explode the instance as you can control the max size of the cache.
However, if your data changes or you need more cache, Memorystore is a more direct replacement for Memcache; it just costs money.
I've been thinking about this. 2nd gen instances have twice the RAM, so if a global cache isn't required (for example, because items don't change once created and can be named by their sha256), you can run your own local thread-safe cache (such as https://github.com/dgraph-io/ristretto) and allocate some of the extra RAM to it. It'll be faster than Memcache was, so requests can be serviced even faster, keeping the number of instances low.
You could make it global for data that does change, by using pub/sub between instances, but I think that's significantly more work.
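To make the local-cache idea concrete, here is a minimal sketch of a size-bounded, thread-safe, content-addressed cache. It is written in Python for brevity rather than Go, it is not ristretto's API, and the `LocalCache` name and the simple LRU eviction policy are my own assumptions.

```python
import hashlib
import threading
from collections import OrderedDict


class LocalCache:
    """Hypothetical in-process cache: keys are the sha256 of the value,
    so an entry can never go stale and needs no cross-instance invalidation."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes      # hard cap on cached payload size
        self.used_bytes = 0
        self._items = OrderedDict()     # oldest (least recently used) first
        self._lock = threading.Lock()

    def put(self, value):
        """Store immutable bytes and return their content-derived key."""
        key = hashlib.sha256(value).hexdigest()
        with self._lock:
            if key in self._items:
                self._items.move_to_end(key)
                return key
            if len(value) > self.max_bytes:
                return key              # too large to cache; key is still valid
            self._items[key] = value
            self.used_bytes += len(value)
            # Evict least recently used entries until back under the cap.
            while self.used_bytes > self.max_bytes:
                _, evicted = self._items.popitem(last=False)
                self.used_bytes -= len(evicted)
        return key

    def get(self, key):
        """Return the cached bytes, or None on a miss."""
        with self._lock:
            value = self._items.get(key)
            if value is not None:
                self._items.move_to_end(key)
            return value
```

Because the cap is in bytes, the cache cannot grow past the share of instance RAM you decide to give it; on a miss you fall back to the original data source and `put` the result.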
To ease the migration to 1.12, I have been thinking of using this solution:
Create a dedicated app using the 1.11 runtime.
Set up Twirp endpoints to act as a proxy for all the deprecated App Engine services (memcache, mail, search...)
I am using App Engine to serve a bunch of sklearn models. These models are around 100 MB in size, and there are around 25 of them.
Downloading them can take up to 15s at times, despite being in the designated app engine bucket, and is often dominating request times.
I currently use a FIFO cache layer wrapped around the GCS storage client, but the cache hit rate isn't great because the different models are used in an interleaved fashion and App Engine memory is limited.
Memcache seems too small for this, and /tmp is also stored in RAM.
Is there a better solution for caching such files?
There are several ways to solve this.
You can embed your models in your deployment. That way, the models are already there with the service. When a new model version is released, you deploy a new App Engine service revision.
The problem with the previous solution is the deployment frequency: when one of the models is updated, you need to repackage and redeploy your App Engine service. The solution is microservices. You can have one model per App Engine service and therefore only deploy the one that has been updated. If you want a single entry point, you can have a 26th App Engine service which is your entry point and routes each request to the correct model service.
You can also do the same thing with Cloud Run, where you manage the container packaging yourself and can customize it if you need anything special. You also have more flexibility on the number of CPUs and the memory size.
Last point: after solving the download issue, you could have a cold start issue, which is the time your server takes to start and to load your model into memory (at the first request, when the instance starts). Cloud Run offers a min-instances feature to keep a certain number of instances warm and therefore eliminate the cold start issue.
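For the one-service-per-model layout, the entry-point service mostly needs a lookup from model name to the service that hosts it. A rough sketch follows; the model names, service names, and `/predict` path are hypothetical, and the URL shape assumes App Engine's `SERVICE-dot-PROJECT_ID.REGION_ID.r.appspot.com` addressing.

```python
# Hypothetical mapping from model name to the App Engine service hosting it.
MODEL_SERVICES = {
    "churn": "model-churn",
    "pricing": "model-pricing",
}


def model_service_url(model_name, project_id, region_id, path="/predict"):
    """Build the URL the entry-point service would forward a request to."""
    try:
        service = MODEL_SERVICES[model_name]
    except KeyError:
        raise ValueError("unknown model: %r" % (model_name,)) from None
    # Assumed App Engine service URL format: SERVICE-dot-PROJECT.REGION.r.appspot.com
    return "https://%s-dot-%s.%s.r.appspot.com%s" % (
        service, project_id, region_id, path)
```

The entry point would then forward the request body to that URL with whatever HTTP client it already uses. Updating one model only means redeploying the one service behind its entry in the mapping.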
Regarding a production environment, I would like to know whether the Python standard environment (2.7) on Google App Engine supports code that uses multiprocessing and pooling with Google's Datastore. Or should MapReduce be used instead?
Regarding a development environment on localhost, I would also like to know how to avoid a database lock when writing to the same database from processes started from different shell terminals.
Thanks
You can have a look at this post on Google Groups, where it is confirmed that multiprocessing is not available in Google App Engine (GAE) Standard environment, but you can implement it in GAE Flexible. You might also be interested in this post about parallel execution in GAE, and Tasklets in particular with a Cloud Datastore example.
Regarding database lock:
Updates are actually done within a datastore transaction and NDB by default will retry the operation three times before failing altogether. It is recommended you only update an entity group once per second at the most. If you are seeing database locks, then you're probably doing something wrong. We implemented a version of the "fork-join queue" described by Brett Slatkin in his 2010 data pipelines talk, which is a method of "joining" many updates to the same entity such that they can all be applied at once at a controlled rate: https://www.youtube.com/watch?v=zSDC_TU7rtc&feature=youtu.be&t=33m37s
Also, see the discussion going on here:
How to deal with eventual consistency in fork-join-queue
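To illustrate only the "joining" idea, here is a heavily simplified in-process sketch. It is not the fork-join queue from the talk, which coordinates work across requests and instances; the `UpdateJoiner` name and the `apply_batch` callback are hypothetical stand-ins for a real datastore write.

```python
import threading
from collections import defaultdict


class UpdateJoiner:
    """Collects many small deltas per entity and applies each entity's
    total in a single write when flush() is called."""

    def __init__(self, apply_batch):
        self._apply_batch = apply_batch   # hypothetical: one write per entity
        self._pending = defaultdict(int)
        self._lock = threading.Lock()

    def add(self, entity_key, delta):
        """Record an update without touching the datastore."""
        with self._lock:
            self._pending[entity_key] += delta

    def flush(self):
        """Apply all joined updates; returns the number of writes issued."""
        with self._lock:
            batch, self._pending = dict(self._pending), defaultdict(int)
        for entity_key, total in batch.items():
            self._apply_batch(entity_key, total)
        return len(batch)
```

Calling `flush()` on a timer (for example, once per second) keeps the write rate per entity group under the recommended limit no matter how many `add()` calls arrive in between.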
I have implemented instance mem-caches because we have very static data and the memcache is not very reliable and rather slow compared to an instance cache.
However, there are some situations where I would like to invalidate the instance caches. Is there any way to look up the other instances?
Example
Admin A updates a large gamesheet on instance A, and that instance looks up all other instances and updates their data using a simple REST API.
TL;DR: you can't.
Unlike backends, frontend instances are not individually addressable; that is, there is no way for you to make a RESTy URLFetch call to a specific frontend instance. Even if they were, there is no builtin mechanism for enumerating frontend instances, so you would need to roll your own, e.g. keeping a list of live instances in the datastore and adding to it in a warmup request and removing on repeated connect failure. But at that point you've just implemented a slower, more costly, and less available memcache service.
If you moved all the cache services to backends (using your instance-local static, or, for instance, running a memcached written in Go as a different app version), it's true you would gain a degree of control (or at least transparency) regarding evictions. Availability, speed, and cost would still likely suffer.
I've just finished watching the Google IO 2011 presentation on AppEngine backends (http://www.google.com/events/io/2011/sessions/app-engine-backends.html) which piqued my curiosity about using a backend instance for somewhat more reliable and configurable in-memory caching. It could be an interesting option as a third layer of cache, under in-app caching and memcache, or perhaps as a substitute for some cases where higher reliability is desirable.
Can anyone share any experience with this? Googling around doesn't reveal much experimentation here. Does the latency of a URLfetch to retrieve a value from a backend's in-memory dictionary render it less attractive, or is it not much worse than a memcache RPC?
I am thinking of whipping up some tests to see for myself, but if I can build on the shoulders of giants... thanks for any help :)
Latency between a backend and frontend instance is extremely low.
If you think about it, all App Engine RPCs are fulfilled with "backend instances". The backends for the Datastore and Memcache are just run by Google for your convenience.
Most requests, according to the App Engine team, stay within the same datacenter - meaning latency is inter-rack and much lower than outside URLFetches.
A simple request handler and thin API layer for coordinating the in memory storage is all you need - in projects where I've set up backend caching, it's done a good job of fulfilling the need for more flexible in-memory storage - centralizing things definitely helps. The load balancing doesn't hurt either ;)
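As a rough idea of how thin that layer can be, here is a sketch of an in-memory store exposed as a plain WSGI app. The `CacheBackend` name and the "PUT to store, GET to fetch, key in the URL path" convention are my own assumptions, not anything App Engine prescribes.

```python
import threading


class CacheBackend:
    """In-memory key/value store served over HTTP as a WSGI callable."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def __call__(self, environ, start_response):
        key = environ.get("PATH_INFO", "").lstrip("/")
        method = environ.get("REQUEST_METHOD", "GET")
        if method == "PUT":
            # Store the raw request body under the key from the path.
            length = int(environ.get("CONTENT_LENGTH") or 0)
            body = environ["wsgi.input"].read(length)
            with self._lock:
                self._data[key] = body
            start_response("204 No Content", [])
            return [b""]
        if method == "GET":
            with self._lock:
                value = self._data.get(key)
            if value is None:
                start_response("404 Not Found", [("Content-Type", "text/plain")])
                return [b"miss"]
            start_response("200 OK", [("Content-Type", "application/octet-stream")])
            return [value]
        start_response("405 Method Not Allowed", [("Allow", "GET, PUT")])
        return [b""]
```

Frontends would then call this over URLFetch and treat a 404 as a cache miss. Eviction, size limits, and authentication are deliberately left out here.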
What do you see as the advantages and disadvantages of Amazon Web Services S3 compared with Google Application Engine? The cost per gigabyte for the two is, at the time I ask, roughly similar; I have not seen any widespread complaints about the quality of service; so I think the decision of which one to use may depend on the API (of all things).
Google's API breaks your content into what they call static content, such as your CSS files, favicons, images, etc and non-static dynamically-generated HTTP responses. Requests for static stuff will be served to whoever requests it until your bandwidth limit is reached; non-static requests will be fulfilled until your bandwidth or CPU limit is reached. With respect to your non-static requests, you can provide any logic you are able to express in Python, so you can be choosy about who you serve.
Amazon's API treats all your content as blobs in a bucket, and provides an access protocol that lets you distinguish between a variety of fulfillable requests ranging from world-readable to owner-only. If you want to do something that's not in the kit, though, I don't know what you do beyond being thoughtful about distributing your URIs.
What differences do you see between the two? Are there other cloud storage services you like? Zetta had a press release today, but they're looking for a minimum of ten terabytes on the beta application, and none of my clients are there (yet); and Joyent will probably do something in the near future.
The way I see it, Google App Engine basically provides a sandbox for you to deploy your app as long as it is written to their requirements (Python etc). Amazon gives you a virtual machine with a lot more flexibility in what can be done, but probably more work is needed on your side. Microsoft's new Azure seems to be going down the GAE route, but replacing Python with .NET.
GAE has a limit of 10MB each on static files uploaded through appcfg.py (look right at the bottom of http://code.google.com/appengine/docs/python/tools/uploadinganapp.html). Obviously you can write code to slice large files into bits and reassemble at download time, but it suggests to me that Google doesn't expect App Engine to be used just as a simple CDN, and that if you want to use it as one you'll have to do some work. S3 does the job out of the box, all you have to do is grab a third-party interface app.
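The slice-and-reassemble workaround is only a few lines. A minimal sketch, where the function names are mine and the limit constant simply takes the 10MB figure above as 10 * 1024 * 1024 bytes:

```python
# Assumed reading of the 10MB per-file limit mentioned above.
STATIC_FILE_LIMIT = 10 * 1024 * 1024


def split_into_chunks(data, limit=STATIC_FILE_LIMIT):
    """Slice bytes into pieces no larger than `limit`, in order."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return [data[i:i + limit] for i in range(0, len(data), limit)]


def reassemble(chunks):
    """Concatenate the pieces back into the original bytes."""
    return b"".join(chunks)
```

Each chunk would be uploaded as its own static file, and the download handler would stream them back in order.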
If you want to do something non-standard with file access on S3, then probably Amazon expects you to spring for a server instance on EC2. Once this is done, you have much more flexibility than GAE's environment, but you pay more (in cash and probably in maintenance).
The plus point for GAE is that it has "cheap" on its side for small apps (up to 1GB storage, 1GB bandwidth and 1.3 million hits a day are free: http://code.google.com/appengine/docs/quotas.html). Depending on your use, this might be significant, or it might be irrelevant on the scale of your total bandwidth costs.
Coincidentally, I have just this last couple of days looked at GAE for the first time. I took an old Perl CGI script and turned it into a GAE app, which is up and running. About 10 hours total, including reading the GAE introductory docs and remembering how Python is supposed to work enough to write a couple of hundred lines. I'd speculate that's more effort than loading a bunch of files onto S3, but less effort than maintaining EC2 server(s). However, I haven't used Amazon.
[Edited to add: this sounds like the advantages are all with Amazon for commercial purposes. This may well be true, but then GAE is not yet mature and presumably will get better from here fairly rapidly. They only let people start paying in December or so, before that it was free-quota-only except by special arrangement with Google. While Google sometimes takes flak for its claims of "perpetual beta", I think GAE genuinely is still starting up. If your app is a good fit for the BigTable data paradigm, then it might scale better on GAE than EC2. For storage I assume that S3 is already good enough for all reasonable purposes, and Google's clever architecture gives GAE no advantages to compensate when all you're doing is serving files.]
* Except that Google has just offered me a preview of GAE's Java support.
** Just noticed that you can set up cron jobs, but they're limited by the same rules as any other request (30 second runtime, can't modify files, etc).