Does memcached share across servers in google app engine? - google-app-engine

On the memcached website it says that memcached is a distributed memory cache. It implies that it can run across multiple servers and maintain some sort of consistency. When I make a request in Google App Engine, there is a high probability that requests in the same entity group will be serviced by the same server.
My question is: say there were two servers servicing my requests, is the view of memcached from these two servers the same? That is, are things I put in memcached on one server reflected in the memcached instance for the other server, or are these two completely separate memcached instances (one for each server)?
Specifically, I want each server to actually run its own instance of memcached (no replication in other memcached instances). If it is the case that these two memcached instances update one another concerning changes made to them, is there a way to disable this?
I apologize if these questions are stupid, as I just started reading about it, but these are initial questions I have run into. Thanks.

App Engine does not really use memcached, but rather an API-compatible reimplementation (chiefly by the same guy, I believe -- in his "20% time";-).
Cached values may disappear at any time (via explicit expiration, a crash in one server, or due to memory scarcity in which case they're evicted in least-recently-used order, etc), but if they don't disappear they are consistent when viewed by different servers.

The memcached server chosen doesn't depend on the entity group that you're using (the entity group is a concept from the datastore, a different beast).
Each server runs its own instance of memcached, and each server will store a percentage of the objects that you store in memcache. The way it works is that when you use the memcached API to store something under a given key, a memcached server is chosen (based on the key).
There is no replication between memcached instances. If one of those boxes goes down, you lose 1/N of your memcached data (N being the number of memcached instances running in App Engine).

Typically, memcached does not share data between servers. The application server hashes the key to choose a memcached server, and then communicates with that server to get or set the data.
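The key-to-server selection described above can be sketched as follows. This is a minimal illustration, not App Engine's or any particular client's actual scheme: the server names are hypothetical, each "server" is just a local dict, and a plain hash-modulo mapping is assumed (real clients often use consistent hashing instead).

```python
import hashlib

# Hypothetical server addresses, for illustration only.
SERVERS = ["cache-a:11211", "cache-b:11211", "cache-c:11211"]


def pick_server(key, servers=SERVERS):
    """Map a key to exactly one server by hashing the key (simple modulo scheme)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(servers)
    return servers[index]


class ShardedCache:
    """Toy client: every 'server' is a separate dict, with no replication between them."""

    def __init__(self, servers=SERVERS):
        self.servers = list(servers)
        self.stores = {server: {} for server in self.servers}

    def set(self, key, value):
        # The value lands on one server only, chosen from the key.
        self.stores[pick_server(key, self.servers)][key] = value

    def get(self, key):
        # Any application server computes the same hash, so it asks the same server.
        return self.stores[pick_server(key, self.servers)].get(key)
```

Because every application server runs the same hash over the same server list, they all agree on where a key lives, which is why the view looks consistent even though no data is copied between cache servers.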

Based on what I know, there is only ONE instance of Memcache for your entire application. There could be many instances of your code running, each with its own memory, and many datastores around the world, but there is only one Memcache server at a time. Keep in mind that this service is susceptible to failure; there is not even an SLA for it.

Related

How to sync two postgresql databases in real time

I have two postgresql databases configured for replication. Master node is sending new data in real time to secondary node and it works just fine. But there is one disadvantage - secondary node is read-only, so no write requests are accepted.
What I need is to be able to perform write/read operations on both of the databases and still be able to have them in perfect sync, so they are identical.
What is the best solution for such requirements?
Just for the context of this question:
I have two web app instances that are deployed in two different locations in the world (the very high delay in sending requests is the reason I decided to deploy one instance locally in each location). Both instances fetch the same data, but they are also able to generate some data and insert it into the DB. It is impossible to have only one DB because the delay when fetching data would be too large.
Maybe my solution is not perfect, I'm open for any suggestion really because I'm out of ideas how to make it work smoothly and maybe I'm lacking some knowledge.
Thanks

Writing to many replicas of MongoDB

Let's say I have a distributed application that writes to the database. To decrease latency one of the instances (app + database) is hosted in Australia and another one is hosted in Europe. Both instances of database need to share the same data.
So what we are after here is data locality. The reason for it is obvious: we don't want users in Australia shooting requests to our database in Europe because that would increase latency.
The natural choice would be to deploy both instances of the database in one replica set. But it seems that with MongoDB you can write to only one Mongo instance within a replica set.
What are the strategies with MongoDB for having two instances of a database that share the same data and both accept writes? Or is MongoDB just the wrong choice for this requirement?
Huge subject, but I'll try to give you a short and simple answer:
As your two instances must share the same data, you can't use a sharded cluster with zones. But a replica set can be your solution:
Create a replica set with at least the following:
a server in a 'neutral' zone. It will be the primary server (set a higher priority). This server, as long as it is still primary, will handle your write operations.
your two existing servers with lower priority.
Set the Read Preference in your application to 'nearest'. This way, your read operations will be handled by the server with the lowest network latency, regardless of the primary/secondary roles of the servers.
But I highly recommend you check the documentation to see how to deploy this architecture correctly. Here's a good start
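As a rough sketch of the read preference setting, the snippet below builds a MongoDB connection string that carries the standard `replicaSet` and `readPreference=nearest` options. The host names, the replica set name `rs0`, and the database name `app` are all made up for illustration; writes would still go to whichever member is currently primary.

```python
from urllib.parse import urlencode

# Hypothetical hosts: one 'neutral' member plus the two regional ones.
HOSTS = [
    "neutral.example.com:27017",
    "au.example.com:27017",
    "eu.example.com:27017",
]


def build_uri(hosts, replica_set, database="app"):
    """Build a connection string that asks the driver to read from the nearest member."""
    options = urlencode({"replicaSet": replica_set, "readPreference": "nearest"})
    return "mongodb://{}/{}?{}".format(",".join(hosts), database, options)


uri = build_uri(HOSTS, "rs0")
print(uri)
```

A driver given this string would be expected to route reads to the lowest-latency member, so reads from a secondary may lag slightly behind the primary.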
EDIT
Some consideration about this solution :
This is one of the rare use cases where it's better to read from secondaries. In general, prefer reading your data from the primary, since a replica set is designed for high availability, not for scalability.
If some of your data can be 'located' so that it is accessed faster, consider sharding collections as a better solution.

How to programmatically look up google app instances

I have implemented instance mem-caches because we have very static data and the memcache is not very reliable and rather slow compared to an instance cache.
However, there are some situations where I would like to invalidate the instance caches. Is there any way to look them up?
Example
Admin A updates a large gamesheet on instance A, and that instance looks up all other instances and updates the data using a simple REST API.
TL;DR: you can't.
Unlike backends, frontend instances are not individually addressable; that is, there is no way for you to make a RESTy URLFetch call to a specific frontend instance. Even if they were, there is no builtin mechanism for enumerating frontend instances, so you would need to roll your own, e.g. keeping a list of live instances in the datastore and adding to it in a warmup request and removing on repeated connect failure. But at that point you've just implemented a slower, more costly, and less available memcache service.
If you moved all the cache services to backends (using your instance-local static, or, for instance, running a memcached written in Go as a different app version), it's true you would gain a degree of control (or at least transparency) regarding evictions. Availability, speed, and cost would still likely suffer.

Is there an App Engine shared memory or equivalent solution?

I'm building a GAE app that requires a cryptographic key to operate. I would like to avoid storing the key in code or in a persistent datastore, and instead upload the key whenever I start my app so that it will only reside in memory for the duration of the app's lifetime (from the time I upload the key until no instances are running.)
I understand that this is possible to do with a resident backend, but this seems too expensive (the cheapest backend is currently $58/month) just to keep one value in memory and serve it to other instances on demand.
Note that I'm not looking for a general robust shared-memory solution, just one value that is basically written once and read many times. Thanks.
I don't think that this can work the way you hope. The sources of data in GAE:
1. Files deployed with your app (war or whatever).
2. Per-instance memory (front-end or back-end).
3. Memcache.
4. Datastore (or SQL now, I suppose).
5. Blobstore.
6. Information retrieved via http requests (i.e. store it somewhere else).
1 and 4 are out, as per your question. 2 doesn't work by itself because the starting and stopping of instances is out of your control (it wouldn't scale otherwise), and persistent instances are expensive. 3 doesn't work by itself because Memcache can be cleared at any time. 5 is really no different from the datastore, as it is permanently stored on Google's servers. Maybe you could try 6 (store it somewhere else), and retrieve it into per-instance memory during instance startup. But I suspect that is no better security-wise (and, for that matter, doesn't match what you said you wanted).
It seems that a Memcache and local memory solution might work if you:
have your server instances clear the memcached key on exit and
existing server instances write/refresh the key regularly (for example on every request).
That way the key will likely be there as long as an instance is operational and most likely not be there when starting up cold.
The same mechanism could also be used to propagate a new key and/or cycle server instances in the event of a key change.
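The clear-on-exit and refresh-on-request idea above might look roughly like this. It is only a sketch under stated assumptions: `FakeMemcache` is an in-process stand-in for the shared cache, and `SECRET_NAME`, `Instance`, `upload_key`, `handle_request`, and `shutdown` are hypothetical names, not App Engine APIs.

```python
SECRET_NAME = "crypto-key"  # hypothetical cache key name


class FakeMemcache:
    """Stand-in for a shared cache whose entries may disappear at any time."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value

    def delete(self, key):
        self._data.pop(key, None)


class Instance:
    """One application instance holding the secret in its own memory."""

    def __init__(self, cache):
        self.cache = cache
        self.local_key = None

    def upload_key(self, key):
        # The operator uploads the key once; it is never written to persistent storage.
        self.local_key = key
        self.cache.set(SECRET_NAME, key)

    def handle_request(self):
        # A cold instance tries to pick the key up from the shared cache.
        if self.local_key is None:
            self.local_key = self.cache.get(SECRET_NAME)
        # A warm instance refreshes the cache entry in case it was evicted.
        if self.local_key is not None:
            self.cache.set(SECRET_NAME, self.local_key)
        return self.local_key

    def shutdown(self):
        # Clear the shared copy on exit; other live instances restore it on their next request.
        self.cache.delete(SECRET_NAME)
        self.local_key = None
```

In this sketch the key survives as long as at least one warm instance keeps serving requests, and a cold start after every instance has exited finds nothing, which matches the behaviour described above.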

Caching in Google App Engine/Cloud Based Hosting

I am curious as to how caching works in Google App Engine or any cloud based application. Since there is no guarantee that requests are sent to the same server, does that mean that if data is cached on the 1st request on Server A, then the 2nd request, which is processed by Server B, will not be able to access the cache?
If that's the case (cache only local to server), won't it be unlikely (depending on number of users) that a request uses the cache? E.g. Google probably has thousands of servers.
With App Engine you cache using memcached. This means that a cache server will hold the data in memory (rather than each application server). The application servers (for a given application) all talk to the same cache server (conceptually; there could be sharding or replication going on under the hood).
In-memory caching on the application server itself will potentially not be very effective, because there is more than one of those (although for your given application there are only a few instances active, it is not spread out over all of Google's servers), and also because Google is free to shut them down at any time (which is a real problem for Java apps that take some time to boot up again, so now you can pay to keep idle instances alive).
In addition to these performance/effectiveness issues, in-memory caching on the application server could lead to consistency problems (every refresh shows different data when the caches are not in sync).
Depends on the type of caching you want to achieve.
Caching on the application server itself can be interesting if you have complex in-memory object structure that takes time to rebuild from data loaded from the database. In that specific case, you may want to cache the result of the computation. It will be faster to use a local cache than a shared memcache to load if the structure is large.
If keeping the in-memory value consistent with the database is paramount, you can do a checksum/timestamp check against a value stored in the datastore every time you use the cached value. Storing the checksum/timestamp on a small object or in a global cache will speed up the process.
One big issue with using a global memcache is ensuring proper synchronization when "refilling" it, i.e. when a value is not yet present or has been flushed. If you have multiple servers doing the check at the exact same time and refilling the value in the cache, you may end up with several distinct servers doing the refill at the same time. If the operation is idempotent, this is not a problem; if not, it is a potential and very hard-to-trace bug.
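A minimal sketch of such a timestamp/version check is shown below. The class name `VersionedLocalCache` and its two callbacks are hypothetical: `load_version` stands for a cheap read of the stored checksum or timestamp, and `rebuild` stands for the expensive reconstruction of the in-memory structure.

```python
_MISSING = object()  # sentinel meaning "nothing cached yet"


class VersionedLocalCache:
    """Local cache that rebuilds its value only when a cheap version check changes."""

    def __init__(self, load_version, rebuild):
        self._load_version = load_version  # cheap: reads a small checksum/timestamp
        self._rebuild = rebuild            # expensive: rebuilds the full structure
        self._version = _MISSING
        self._value = None

    def get(self):
        current = self._load_version()
        if self._version is _MISSING or self._version != current:
            # The stored version moved on (or nothing is cached yet): rebuild.
            self._value = self._rebuild()
            self._version = current
        return self._value
```

Every call still pays for one small read, so this only helps when the version lookup is much cheaper than rebuilding the structure.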
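One common way to limit concurrent refills, offered here as a swapped-in technique rather than something the answer above prescribes, is to use the cache's atomic "add" operation (store only if the key is absent) as a short-lived lock. The sketch below uses an in-process `FakeSharedCache` stand-in; the `:refill-lock` key suffix and `get_or_refill` are hypothetical names, and a real lock entry would also need an expiry so that a crashed refiller does not block everyone.

```python
import threading


class FakeSharedCache:
    """In-process stand-in for a shared cache with an atomic 'add' operation."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            return self._data.get(key)

    def set(self, key, value):
        with self._lock:
            self._data[key] = value

    def add(self, key, value):
        """Store only if the key is absent; return True if this caller stored it."""
        with self._lock:
            if key in self._data:
                return False
            self._data[key] = value
            return True

    def delete(self, key):
        with self._lock:
            self._data.pop(key, None)


def get_or_refill(cache, key, compute):
    """Return the cached value, letting only the caller that wins 'add' run compute()."""
    value = cache.get(key)
    if value is not None:
        return value
    lock_key = key + ":refill-lock"
    if cache.add(lock_key, 1):
        try:
            value = compute()
            cache.set(key, value)
        finally:
            cache.delete(lock_key)
        return value
    # Another caller is already refilling; the caller can retry or fall back.
    return None
```

Callers that lose the race get `None` here and have to decide whether to wait and retry or to read from the underlying store directly.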

Resources