Should programmers avoid writing to the local file system when writing applications to be deployed to the cloud?
Does this recommendation apply only to this particular cloud provider (Cloud Foundry)?
In short, yes, you probably should avoid it.
Most cloud providers - Cloud Foundry included - recommend that you keep only ephemeral data (like caches) on local disk, since a single machine may fail, or be rebooted for an upgrade or re-balancing, at any time, and you don't necessarily get the same machine back after a restart.
Many providers offer alternative SAN/SMB-mountable disks which you can use for persistent data.
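To make that concrete, here is a minimal Go sketch of the pattern, assuming the data can always be rebuilt from a durable source (the helper name and layout are illustrative, not provider-specific):

```go
package cache

import (
	"os"
	"path/filepath"
)

// loadOrRebuild treats the local filesystem as a disposable cache: if the
// cached file is missing (e.g. after the instance was recycled), it rebuilds
// the data from the authoritative source instead of failing.
func loadOrRebuild(name string, rebuild func() ([]byte, error)) ([]byte, error) {
	// os.TempDir() is ephemeral by definition; never keep the only copy here.
	path := filepath.Join(os.TempDir(), name)

	if data, err := os.ReadFile(path); err == nil {
		return data, nil // cache hit
	}

	// Cache miss: regenerate from a durable source (DB, object store, ...).
	data, err := rebuild()
	if err != nil {
		return nil, err
	}

	// Best-effort write-back; losing this file must always be safe.
	_ = os.WriteFile(path, data, 0o600)
	return data, nil
}
```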
Related
I am thinking of using a Cloud Memorystore Redis database with the eviction policy set to noeviction, as a sort of persistent database to serve clients. I'm wondering what the downsides of this could be.
Of course, we will keep the instance memory on the higher side to make sure incoming keys can be accommodated. Is there any chance keys can be lost when infrastructure restructuring, failover, or patching happens on the cloud provider's end?
Thanks in advance
There is still a chance that keys will be lost in the case of unplanned restarts. Failovers only work during instance crashes or scheduled maintenance, and will not work on manual restarts. GCP also offers two Redis tiers; only the Standard tier supports failovers.
Both tiers offer a maximum instance size of 300 GB and a maximum network bandwidth of 12 Gbps. The advantage of the Standard tier is that it provides redundancy and availability through replication, cross-zone replication, and automatic failover.
noeviction is only a policy that ensures keys are never evicted or replaced, regardless of how old they are; once the Redis instance reaches maxmemory, further writes simply return an error. It still doesn't cover the other persistence features, such as point-in-time snapshots and AOF persistence, which unfortunately Memorystore doesn't support yet.
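To illustrate what that looks like from the application side, here is a minimal Go sketch using the github.com/go-redis/redis/v8 client (the address and key are placeholder assumptions):

```go
package main

import (
	"context"
	"fmt"
	"strings"

	"github.com/go-redis/redis/v8"
)

func main() {
	ctx := context.Background()

	// Placeholder address; for Memorystore this would be the instance's
	// private IP and port from the GCP console.
	rdb := redis.NewClient(&redis.Options{Addr: "10.0.0.3:6379"})

	err := rdb.Set(ctx, "some-key", "some-value", 0).Err()
	if err != nil {
		// With maxmemory-policy=noeviction, Redis rejects writes once
		// maxmemory is reached with an error like:
		//   OOM command not allowed when used memory > 'maxmemory'.
		if strings.HasPrefix(err.Error(), "OOM") {
			fmt.Println("instance is full; nothing was evicted, write was refused")
			return
		}
		panic(err)
	}
	fmt.Println("write accepted")
}
```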
Since Memorystore does not cover your entire use case, my suggestion is to use open-source Redis instead. You can quickly provision and deploy a Redis VM instance from the GCP Marketplace.
You can check out the full features in the documentation.
I would like to switch to a different VPS host. How would I transfer a DigitalOcean droplet to another host?
It's a pain. I use DigitalOcean, AWS, Google Cloud, and Vultr. It's hypothetically possible, and the best explanation is here; however, as they point out, while it may be difficult for servers that have been active for a very long time, a fresh start is at least worthy of consideration, given the incompatibility of the file formats the big cloud services use for snapshots.
Also, keep in mind that even if you can get the snapshot to boot on AWS or Azure or wherever you're going, the virtual network configuration will likely be totally different. You're going to be stuck redoing the network configuration, and possibly new reverse proxies and the like, probably only accessible through the slow browser shell your new provider offers. I don't recommend it, from experience.
I've almost completed migrating based on Google's instructions.
It's very nice to not have to call into the app-engine libraries whatsoever.
However, now I must replace my calls to App Engine standard memcache.
Here's what the guide says: "To use a memcache service on App Engine, use Redis Labs Memcached Cloud instead of App Engine Memcache."
So is this my only option, a third party? They don't even list pricing on their page if GCE is selected.
I also see that the standard environment how-to guides include one on Connecting to internal resources in a VPC network.
From that link it mentions Cloud Memorystore. I can't find any examples showing whether this is advisable or possible to do on GAE standard. Of course it wasn't previously possible, but now that GAE standard has become much more "standard", I think it should be possible?
Thanks for any advice on the best way forward.
Memorystore appears to be Google's replacement:
https://cloud.google.com/memorystore/
You connect to it using this guide:
https://cloud.google.com/appengine/docs/standard/go/using-memorystore
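For what it's worth, a minimal sketch along the lines of that guide, using the github.com/gomodule/redigo/redis client (REDISHOST and REDISPORT are assumed to be set as environment variables in app.yaml, as the guide describes):

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"os"

	"github.com/gomodule/redigo/redis"
)

var redisPool *redis.Pool

func visitHandler(w http.ResponseWriter, r *http.Request) {
	conn := redisPool.Get()
	defer conn.Close()

	// INCR doubles as a connectivity check and a trivial counter.
	visits, err := redis.Int(conn.Do("INCR", "visits"))
	if err != nil {
		http.Error(w, "error incrementing counter", http.StatusInternalServerError)
		return
	}
	fmt.Fprintf(w, "Visits: %d\n", visits)
}

func main() {
	addr := fmt.Sprintf("%s:%s", os.Getenv("REDISHOST"), os.Getenv("REDISPORT"))
	redisPool = &redis.Pool{
		MaxIdle: 10,
		Dial:    func() (redis.Conn, error) { return redis.Dial("tcp", addr) },
	}

	http.HandleFunc("/", visitHandler)

	port := os.Getenv("PORT") // provided by the App Engine runtime
	if port == "" {
		port = "8080"
	}
	log.Fatal(http.ListenAndServe(":"+port, nil))
}
```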
Alas it costs about $1.20/GB per day with no free quota.
Thus, if your data doesn't change and requires less than 100MB of cache at a time, the first answer might be better (free). Also, your data won't blow out the instance's memory, as you can control the max size of the cache.
However, if your data changes or you need more cache, Memorystore is a more direct replacement for Memcache; it just costs money.
I've been thinking about this. 2nd-gen instances have twice the RAM, so if a global cache isn't required (i.e. items don't change once created; name items using their SHA-256), you can run your own local thread-safe memcache (such as https://github.com/dgraph-io/ristretto) and allocate some of the extra RAM to it. It'll be faster than Memcache was, so requests can be serviced even faster, keeping the number of instances low.
You could make it global for data that does change by using pub/sub between instances, but I think that's significantly more work.
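As a rough sketch of that local-cache idea with ristretto (the cache sizes and the content-hash keying are illustrative assumptions, to be tuned to your instance class):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"

	"github.com/dgraph-io/ristretto"
)

func main() {
	// Dedicate a slice of the instance's extra RAM to a local cache.
	cache, err := ristretto.NewCache(&ristretto.Config{
		NumCounters: 1e6,       // keys to track frequency for (~10x max items)
		MaxCost:     256 << 20, // 256 MB budget, counted via per-item cost
		BufferItems: 64,        // recommended default
	})
	if err != nil {
		panic(err)
	}

	// Items never change once created, so key them by content hash.
	item := []byte("rendered page, thumbnail, etc.")
	sum := sha256.Sum256(item)
	key := hex.EncodeToString(sum[:])

	cache.Set(key, item, int64(len(item)))
	cache.Wait() // Sets are buffered; wait so the Get below can observe it.

	if v, found := cache.Get(key); found {
		fmt.Printf("hit: %d bytes\n", len(v.([]byte)))
	}
}
```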
To ease the migration to 1.12, I have been thinking of using this solution (rough sketch below):
create a dedicated app using the 1.11 runtime.
set up Twirp endpoints to act as a proxy for all the deprecated App Engine services (memcache, mail, search...)
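Here is a hedged sketch of the proxy idea on the 1.11 runtime, shown with a plain HTTP handler for brevity (a Twirp service would wrap the same memcache call behind a generated interface; the URL scheme is an assumption):

```go
package main

import (
	"net/http"

	"google.golang.org/appengine"
	"google.golang.org/appengine/memcache"
)

// getHandler proxies GET /memcache/get?key=... to App Engine memcache,
// so apps on newer runtimes can keep using the service indirectly.
func getHandler(w http.ResponseWriter, r *http.Request) {
	ctx := appengine.NewContext(r)

	item, err := memcache.Get(ctx, r.URL.Query().Get("key"))
	if err == memcache.ErrCacheMiss {
		http.Error(w, "not found", http.StatusNotFound)
		return
	}
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	w.Write(item.Value)
}

func main() {
	http.HandleFunc("/memcache/get", getHandler)
	appengine.Main() // required on the 1.11 runtime
}
```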
In our application we have a server which contains entities, along with their relations and processing rules, stored in a DB. Connected to that server are any number of clients, such as Raspberry Pis, gateways, and Android apps.
I want to push configuration and processing rules to those clients, so that when they read some data they can process it on their own. This is to make the edge devices self-sufficient and to avoid outages when the server or network is down.
How do I push/pull the configuration? I don't want to maintain DBs at the clients and configure replication, since maintenance and patching of DBs across that many clients will be tough.
Is there any better alternative?
At the same time, I have to push logs upstream to the server.
Thanks in advance.
I have been there. You need an on-device data store. For this range of embedded Linux devices, in order of growing development complexity:
Variables: Fast to change and retrieve, makes sense if the data fits in memory. Lost if the process ends.
Filesystem: Requires no special libraries, just read/write access somewhere. Workable if the data is small enough to fit in memory and does not change much during execution (read on startup when the network is unavailable, write on updates from the server). If your data can be structured as a few object variables, you could write them to JSON files, and there is plenty of documentation on other file-storage options for Android apps.
In-memory datastore like Redis: Lightweight dependency, can automate messaging and filesystem-stored backup. Provides a managed framework/hybrid of the previous two.
Lightweight databases, especially SQLite: A lightweight SQL database, stored in a single file and popular with Android apps (probably already installed on many of the target devices). It could work for frequent changes to a larger block of data in a memory-constrained environment, but does not look like a great fit here. It gets worse for anything heavier.
Redis replication is easy but indiscriminate, so it is mainly sensible if your devices receive a changing but identical ruleset. Otherwise, in all these cases, the easiest transfer option may be to request and receive the whole configuration (GET a string, download a JSON file, etc.) and parse the received values.
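A minimal Go sketch of that whole-configuration transfer, with an on-disk fallback so the device stays self-sufficient offline (the URL, the Config fields, and the cache path are illustrative assumptions):

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"os"
)

// Config is an illustrative shape for the pushed ruleset.
type Config struct {
	Version int               `json:"version"`
	Rules   map[string]string `json:"rules"`
}

// fetchConfig downloads the whole configuration from the server; if the
// network is down, it falls back to the last copy persisted on local disk.
func fetchConfig(url, cachePath string) (*Config, error) {
	if resp, err := http.Get(url); err == nil {
		defer resp.Body.Close()
		if resp.StatusCode == http.StatusOK {
			var cfg Config
			if err := json.NewDecoder(resp.Body).Decode(&cfg); err != nil {
				return nil, err
			}
			// Persist for the next offline start; a failed write is not
			// fatal, since the in-memory copy still serves this run.
			if raw, err := json.Marshal(cfg); err == nil {
				_ = os.WriteFile(cachePath, raw, 0o600)
			}
			return &cfg, nil
		}
	}

	// Offline: load the last known-good configuration.
	raw, err := os.ReadFile(cachePath)
	if err != nil {
		return nil, fmt.Errorf("no network and no cached config: %w", err)
	}
	var cfg Config
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return nil, err
	}
	return &cfg, nil
}

func main() {
	cfg, err := fetchConfig("https://server.example/config.json", "/var/lib/edge/config.json")
	if err != nil {
		fmt.Println("cannot start:", err)
		return
	}
	fmt.Printf("running with config version %d (%d rules)\n", cfg.Version, len(cfg.Rules))
}
```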
I am wondering what the stats are for different ways of storing (and therefore retrieving) content. Are there any charts out there, or do you have any quick tests to show the requests per second, etc., of:
Direct (local) database access, vs.
HTTP Access to cached data, vs.
HTTP Access to uncached data (remote database), vs.
Direct File access
I am trying to judge how necessary it is to cache data locally if I'm using remote services.
Thanks!
.. what the stats are ...
Although some people may have published their findings, these will not map directly onto your experience; you may find the opposite of what they discovered.
Sometimes it may be faster to retrieve a file from a database than from the filesystem. It depends on the size of the file, the filesystem or DBMS it resides on, the other data that affects the access path (e.g. indexes, the number of I/O operations needed to dereference the start of the file...), the underlying hardware, the amount of caching available, whether the data (or information about its location) is already in the cache, and the interaction between each of these factors.
And that's before considering the additional variables introduced once you start talking about HTTP, which also implies remote network access.
While ultimately any file needs to be read from the filesystem at some point, this suggests that direct file access would be the fastest method, but only on the local machine; once you consider centralized caching and concurrency, that is not necessarily the case.
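Rather than relying on published charts, you can measure your own workload directly. Here is a minimal Go benchmark sketch comparing two of the options (the 64 KB payload and the in-memory HTTP handler are placeholder assumptions); go test -bench=. reports ns/op, from which requests per second in your environment follow directly:

```go
package storagebench

import (
	"io"
	"net/http"
	"net/http/httptest"
	"os"
	"path/filepath"
	"testing"
)

var payload = make([]byte, 64<<10) // 64 KB placeholder document

// Direct (local) file access.
func BenchmarkFileRead(b *testing.B) {
	path := filepath.Join(b.TempDir(), "doc")
	if err := os.WriteFile(path, payload, 0o600); err != nil {
		b.Fatal(err)
	}
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		if _, err := os.ReadFile(path); err != nil {
			b.Fatal(err)
		}
	}
}

// HTTP access to data served straight from memory (i.e. already cached).
func BenchmarkHTTPRead(b *testing.B) {
	srv := httptest.NewServer(http.HandlerFunc(
		func(w http.ResponseWriter, r *http.Request) { w.Write(payload) }))
	defer srv.Close()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		resp, err := http.Get(srv.URL)
		if err != nil {
			b.Fatal(err)
		}
		io.Copy(io.Discard, resp.Body)
		resp.Body.Close()
	}
}
```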
I am trying to judge how necessary it is to cache data locally if I'm using remote services.
Rather hard to say. How remote? What are your bandwidth costs? What's the latency? What level of service do you hope to provide? Does the remote system provide caching information already? How do you deal with cache invalidation?
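On that last-but-one point, a quick hedged Go sketch for checking whether the remote system already publishes caching information (the URL is a placeholder):

```go
package main

import (
	"fmt"
	"net/http"
)

func main() {
	// Placeholder endpoint; substitute the remote service you depend on.
	resp, err := http.Head("https://example.com/api/resource")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// If these headers are present, the remote system is already telling
	// you how long responses may be cached and how to revalidate them.
	fmt.Println("Cache-Control:", resp.Header.Get("Cache-Control"))
	fmt.Println("ETag:", resp.Header.Get("ETag"))
	fmt.Println("Last-Modified:", resp.Header.Get("Last-Modified"))
}
```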
If we knew everything about your application, the data source, your customers, the networks connecting them, and your budget for implementing the service, then we might hazard a guess. And yes, caching on the intermediate (MITM) server probably is a good idea, but only if you know that you're not breaking anything by using caching.
C.