Dedicated memcache vs shared memcache (python, google appengine)

So basically, I am trying to decide if I should go for dedicated memcache.
My scenario is as follows:
I am working on an application that provides real-time analysis of some public data.
I am going to maintain a total of about 15 KB of key/value data in memcache (20 keys, variable values).
The values change constantly (around 20 key/value updates every 3 seconds).
Each hit to the website requests these keys (also around one request every 3 seconds per user).
Assuming 10,000 users hit the website at the same time, this produces about 20 * 10,000 key requests every 3 seconds.
Considering the relatively small size of the cache but the large number of accesses it must serve (roughly 66,000 key/value accesses per second), would dedicated memcache be the more risk-averse choice for this situation?
Thanks,

The cached data seems crucial to the correct operation of your web application. If you lost data it might be a disservice to thousands of users! Hopefully your app also periodically persists the cached data and automatically recovers from cache erasure.
Although the data is small, shared memcache carries a higher risk than dedicated memcache of evicting some or all of it at unpredictable times, so the design must also deal correctly with partial data loss. With shared memcache, it is not only your own memory pressure that matters: pressure from other applications and routine cloud operations make App Engine more likely to discard your cache.
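A minimal sketch of that persist-and-recover pattern, assuming the values are JSON-serializable and using a hypothetical Snapshot model as the durable copy:

    from google.appengine.api import memcache
    from google.appengine.ext import ndb

    class Snapshot(ndb.Model):
        # Hypothetical durable copy of one cached value, keyed by cache key.
        value = ndb.JsonProperty()

    def write_value(key, value):
        # Update the cache and persist, so an eviction is recoverable.
        memcache.set(key, value)
        Snapshot(id=key, value=value).put()

    def read_value(key):
        # Serve from memcache; on a miss, recover from the datastore copy.
        value = memcache.get(key)
        if value is None:
            snap = Snapshot.get_by_id(key)
            if snap is not None:
                value = snap.value
                memcache.set(key, value)  # repopulate after eviction
        return value

With 20 keys changing every 3 seconds, the datastore writes dominate the cost of this sketch; persisting all 20 values in a single entity, or persisting periodically rather than on every update, would cut that down considerably.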

With this data size you will not get any benefit from dedicated memcache.
The rate at which you access memcache is not relevant to this decision.


Memory usage of Google App Engine instance

I am developing an application using App Engine to collect, store and deliver data to users.
During my tests, I have 4 data sources which send HTTP POST requests to the server every 5s (all requests are exactly uniform).
The server stores received data to the datastore using Objectify.
At the beginning, all requests are handled by one instance (class F1) at 0.8 QPS, with a latency of 80 ms and 80 MB of memory.
But over the following hours, the memory used increases and goes over the limit of an F1 instance.
However, the scheduler doesn't start another instance, and when I stop all traffic, the average memory never decreases.
Now the instance reports 150 MB of memory instead of the 128 MB F1 limit, even though I have stopped all traffic.
I tried setting the performance settings both manually and automatically, and disabling Appstats, without any improvement.
I use memcache and the datastore, have no cron jobs or task queues, and the traffic is always the same.
What are the possible reasons for the average memory increase?
Is it a bug in the admin console?
What determines the amount of memory used per request?
Another question:
Does Google offer a special discount for datastore reads/writes (>30 million ops/day)?
Thank you,
Joel
Regarding special pricing, I don't think there is any. If your app needs that much read/write quota, you should look into optimizing to minimize writes, and perhaps implement some sort of bulk writing if possible.
On the memory issue: you should post your code, since there are too many things to look into when discussing memory usage in the abstract; knowing more about your case will help in producing a straight answer.
Cheers,
Kjartan

Memcache System - Keys getting dropped frequently

We currently have our application hosted on Google App Engine, with billing enabled. The application is still in beta and we are using it for testing purposes. Our logic is to serve data from memcache if present; if not, we fetch the data from the datastore, update memcache, and serve it. We are encountering strange behaviour with memcache: the data for some keys is getting dropped a few minutes after being set. We tried setting an expiration time for the keys, but even that does not seem to work. Since the data is dropped from memcache, it is fetched again from the datastore, which is increasing the billing for our application.
Currently nearly 80% of the billing is for datastore reads. The datastore reads are high because memcache is not working as efficiently as it should. Any insight into why we are facing this issue would be really helpful.
Just an FYI, we have around 75,000 keys in memcache with a total size of 100 MB of data. Our structure demands keeping such a large number of keys in memcache, which I think should not be an issue.
Our application is being used by 10 users, and the billing amount per day comes to around $40.
Thanks,
Krish
Unfortunately, memcache will evict keys as and when it needs to. Setting an expiry time only means the item will stay in memcache for at most that long.
Take a look at the docs regarding eviction.
Also, take a look at this for some more insight into ways around memcache issues.
Regarding your data structure, perhaps you could post a new question and we can see if others have advice for you.
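For reference, the read-through pattern described in the question typically looks like the sketch below; the Record model and one-hour expiry are placeholders, and (per the above) the expiry is only an upper bound:

    from google.appengine.api import memcache
    from google.appengine.ext import ndb

    class Record(ndb.Model):  # placeholder for your actual model
        data = ndb.TextProperty()

    def get_record(record_id):
        # Serve from memcache when present, else fall back to the datastore.
        cached = memcache.get(record_id)
        if cached is not None:
            return cached
        record = Record.get_by_id(record_id)
        if record is not None:
            # time= is an upper bound; the item may still be evicted sooner.
            memcache.set(record_id, record.data, time=3600)
            return record.data
        return None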

Caching in Google App Engine/Cloud Based Hosting

I am curious as to how caching works in Google App Engine or any cloud-based application. Since there is no guarantee that requests are sent to the same server, does that mean that if data is cached on the 1st request on server A, then a 2nd request processed by server B will not be able to access the cache?
If that's the case (cache only local to each server), won't it be unlikely (depending on the number of users) that a request hits the cache? E.g. Google probably has thousands of servers.
With App Engine you cache using memcache. This means that a cache server holds the data in memory (rather than each application server). The application servers (for a given application) all talk to the same cache server (conceptually; there could be sharding or replication going on under the hood).
In-memory caching on the application server itself will potentially not be very effective, because there is more than one of them (although for a given application only a few instances are active; it is not spread out over all of Google's servers), and also because Google is free to shut them down at any time. The latter is a real problem for Java apps that take some time to boot up again, which is why you can now pay to keep idle instances alive.
In addition to these performance/effectiveness issues, in-memory caching on the application server could lead to consistency problems (every refresh shows different data when the caches are not in sync).
Depends on the type of caching you want to achieve.
Caching on the application server itself can be interesting if you have a complex in-memory object structure that takes time to rebuild from data loaded from the database. In that specific case, you may want to cache the result of the computation. If the structure is large, it will be faster to load from a local cache than from a shared memcache.
If keeping the in-memory value consistent with the database is paramount, you can compare a checksum/timestamp against one stored in the datastore every time you use the cached value. Storing the checksum/timestamp on a small entity or in a global cache speeds this check up.
One big issue with a global memcache is ensuring proper synchronization when "refilling" it, when a value is not yet present or has been flushed. If multiple servers do the check at exactly the same time, you may end up with several distinct servers doing the refill simultaneously. If the operation is idempotent, this is not a problem; if it is not, you have a potential and very hard-to-trace bug.
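One common mitigation (a sketch, not the only option) is to use memcache.add(), which fails if the key already exists, as a short-lived lock so that only one server performs the refill:

    from google.appengine.api import memcache

    LOCK_SECONDS = 10  # assumed upper bound on the duration of one refill

    def get_or_refill(key, compute):
        value = memcache.get(key)
        if value is not None:
            return value
        # add() is atomic: only one server wins the lock per window.
        if memcache.add(key + ':lock', 1, time=LOCK_SECONDS):
            try:
                value = compute()
                memcache.set(key, value)
            finally:
                memcache.delete(key + ':lock')
            return value
        # Lost the race: another server is already refilling. This fallback
        # assumes compute() is idempotent and safe to run locally; if it is
        # not, poll memcache.get(key) briefly instead.
        return compute()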

using memcached for very small data, good idea?

I have a very small amount of data (~200 bytes) that I retrieve from the database very often. The write rate is insignificant.
I would like to get away from all the unnecessary database calls to fetch this almost-static data. Is memcached a good fit for this? Something else?
If it's of any relevance I'm running this on GAE using python. The data in question could be easily (de)serialized as json.
Memcache is well-suited for this - reading from the datastore is much more expensive than reading from memcache. This is especially true for small amounts of data for which the cost to retrieve is dominated by latency to the datastore.
If your app receives enough requests that instances typically stay alive for a little while, then you could go one step further and use App Caching to largely avoid memcache too. (Basically, cache the value in a global variable, and also app-cache the time the value was last updated. Provide an accessor for the value which retrieves the latest from memcache/db if it hasn't been updated in X minutes). Memcache is pretty cheap though, so this extra work might only make sense if you access this variable rather frequently.
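A sketch of that accessor, with the refresh interval, key name, and datastore fallback as stand-ins:

    import time
    from google.appengine.api import memcache

    REFRESH_SECONDS = 300  # the "X minutes" above; tune to taste
    _value = None          # app-cached copy, held per instance
    _fetched_at = 0

    def get_setting():
        """Serve from a module-level global, refreshing from memcache (or
        the db) at most once per REFRESH_SECONDS on each instance."""
        global _value, _fetched_at
        now = time.time()
        if _value is None or now - _fetched_at > REFRESH_SECONDS:
            value = memcache.get('setting')
            if value is None:
                value = load_from_datastore()  # hypothetical fallback
                memcache.set('setting', value)
            _value, _fetched_at = value, now
        return _value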
If it changes less often than once per day, you could just hardcode it in webapp code, and reupload the file each time it changes.

How often does memcache on Google AppEngine lose data?

Memcache in general, and on App Engine in particular, is unreliable in the sense that my data may be deleted from the cache for whatever reason at any point in time. However, there may be cases where a small risk is worth the added performance memcache can give, such as updating some data in memcache that gets saved periodically to other, more reliable storage. Are there any numbers from Google that could give me an indication of the actual probability that a memcache entry will be lost from the cache before its expiration time, given that I stay within my quotas?
Are there any reasons other than hardware failure and administrative operations such as machines at the data centers being upgraded/moved/replaced that would cause entries to be removed from memcache prematurely?
Memcache, like any cache, should be used as... a cache. If you can't find something in the cache, there must be a strategy to find it in permanent storage.
In addition to the reasons you mention, Memcache and other caching approaches have limits to the amount of items they will hold (discarding usually the least recently used ones when the cache is full), and often also set other cache invalidation policies (e.g. flush everything unused for one hour).
If you don't configure and operate the cache yourself, you have NO guarantee of when and how items might be removed from the cache intentionally / by design.
Any concrete answer you get to this question is 100% subject to change.
That said, I've used memcache under light loads to accumulate data for 15 minutes or so before writing it all to the Datastore. This was for totally non-critical analytic data though. Do not depend on it.
It's not that data can be lost, but that if it is lost, it can be easily regained.
For example, using it to store data from the datastore is ideal, in that if a piece of data is not in the cache, it can be easily fetched.
If you're storing data such as a hit counter in the cache, it can't be regained if the cache is cleared, so you'll lose data.
If you're concerned about load for a common job, how about setting a job to update the counter later, using the task queue?
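A sketch of that approach: buffer hits in memcache with incr(), then let a cron job or task move them to the datastore. The HitCount model and flush mechanics are assumptions, and (per this whole thread) anything still buffered can be evicted before the flush runs:

    from google.appengine.api import memcache
    from google.appengine.ext import ndb

    class HitCount(ndb.Model):  # hypothetical durable counter
        total = ndb.IntegerProperty(default=0)

    def record_hit(name):
        # initial_value recreates the key if memcache has evicted it.
        memcache.incr(name, initial_value=0)

    @ndb.transactional
    def _persist(name, n):
        counter = HitCount.get_by_id(name) or HitCount(id=name)
        counter.total += n
        counter.put()

    def flush(name):
        # Run periodically from cron or the task queue.
        buffered = memcache.get(name) or 0
        if buffered:
            _persist(name, buffered)
            # decr rather than delete, so hits that raced the flush survive.
            memcache.decr(name, delta=buffered)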
I have implemented a shared-memcache-based stats counter that collects hourly to the DB and can identify (and log) cache loss. So far I consistently see less than 10% total cache loss per day, with at most 1 hour (average 30 minutes) cache time and about 60 active counters. Losses appear to be random single counters. I suspect that counters incremented only once (which happens quite often in my case) have a higher probability of being dropped.
My app uses less than 1 MB of total memcache in the shared system. Unfortunately, dedicated memcache, with its 1 GB minimum and substantial yearly cost, is out of the question.
I have also created a Stackdriver metric that records memcache losses for a counter saved every full hour (with a few counts per hour); its graph shows successful saves in red and memcache failures in blue.
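One way to log such losses (not necessarily what was done here) is to pair each counter with a sentinel key written alongside it; at save time, a surviving sentinel with a missing counter indicates an eviction:

    import logging
    from google.appengine.api import memcache

    def bump(name):
        memcache.incr(name, initial_value=0)
        # Sentinel lives as long as the save interval (here one hour).
        memcache.add(name + ':seen', 1, time=3600)

    def collect(name):
        # Hourly save: sentinel present but counter gone means a cache loss.
        count = memcache.get(name)
        if count is None and memcache.get(name + ':seen') is not None:
            logging.warning('memcache dropped counter %s', name)
            count = 0
        memcache.delete_multi([name, name + ':seen'])
        return count

    # Note this undercounts: if counter and sentinel are evicted together,
    # or the counter is dropped and re-created mid-hour, no loss is logged.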
