Performance and use case between Bucket and App Engine Directory - google-app-engine

I have two objects stored as pickles in files. The files need to be modified and updated roughly bi-weekly (not on a regular schedule). I wonder if it is better to just store them in an App Engine folder or to upload them into a bucket? Note that I intend to automate the file modification through cron or App Engine. How does the read-write speed compare between the two options?

The read-write speed will indeed be higher from a disk in App Engine Flexible (App Engine Standard doesn't let you write to the filesystem). On the other hand, it'll be tricky to manage if you want to rely on App Engine's scalability features: say you have one instance with the files on disk; when App Engine spawns a new instance to handle load, that instance won't have those files on its disk.
By storing those files in Cloud Storage, they'll be available to all App Engine instances, no matter how many, and you ensure they all see the same version of the files. The downside is that access will be slower. If this is critical for your app, you may want to implement a cache so you avoid retrieving the file from Storage on every request. You can invalidate the cache on file update by implementing Cloud Pub/Sub Notifications for Cloud Storage.
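As an illustration, here is a minimal sketch of that pattern using the google-cloud-storage Python client. Instead of Pub/Sub invalidation, this simpler variant checks the object's generation number on each read; the bucket and object names are hypothetical placeholders.

```python
# Sketch: cache a pickled object in process memory, refreshing it only
# when Cloud Storage reports a new object generation.
import pickle
from google.cloud import storage

BUCKET = "my-app-data"        # hypothetical bucket name
OBJECT = "state/objects.pkl"  # hypothetical object name

_client = storage.Client()
_cache = {"generation": None, "obj": None}

def load_pickle():
    """Return the unpickled object, re-downloading only on a new generation."""
    blob = _client.bucket(BUCKET).blob(OBJECT)
    blob.reload()  # fetch current metadata, including the generation number
    if blob.generation != _cache["generation"]:
        _cache["obj"] = pickle.loads(blob.download_as_bytes())
        _cache["generation"] = blob.generation
    return _cache["obj"]

def save_pickle(obj):
    """Serialize and upload the object; every instance will see the new version."""
    blob = _client.bucket(BUCKET).blob(OBJECT)
    blob.upload_from_string(pickle.dumps(obj))
```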

Related

Caching cloud storage files on app engine

I am using App Engine to serve a bunch of sklearn models. These models are around 100 MB in size, and there are around 25 of them.
Downloading one can take up to 15 s at times, despite the models living in the designated App Engine bucket, and the download often dominates request times.
I currently use a FIFO cache layer wrapped around the GCS storage client, but the cache hit rate isn't great, since the different models are used in an interleaved way and App Engine memory is limited.
Memcache seems too small for this, and /tmp is also stored in RAM.
Is there a better solution for caching such files?
There are several solutions you can consider to solve your issue.
You can embed your models in your deployment. That way, the models ship with the service. When a new model version is released, you deploy a new App Engine service version.
The problem with the previous solution is deployment frequency: whenever one of the models is updated, you need to repackage and redeploy your App Engine service. The fix is microservices: host one model per App Engine service, so you only redeploy the service whose model changed. If you want a single entry point, add a 26th App Engine service that routes each request to the correct model service (a sketch of this router follows at the end of this answer).
You can also do the same thing with Cloud Run, where you manage the container packaging and its details if you need anything special. You also get more flexibility in the number of CPUs and the amount of memory.
Last point: after solving the download issue, you may still face cold starts, i.e. the time your server takes to start and load the model into memory (on the first request, when an instance starts). Cloud Run offers a min-instances feature to keep a certain number of instances warm and thereby eliminate cold starts.
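For the routing idea above, here is a minimal sketch of what the entry-point service could look like, assuming Flask and requests; the model names and service URLs are hypothetical placeholders.

```python
# Sketch: an entry-point service that forwards prediction requests to
# per-model App Engine services.
import requests
from flask import Flask, request, abort

app = Flask(__name__)

# Hypothetical mapping from model name to its dedicated service URL.
MODEL_SERVICES = {
    "model-a": "https://model-a-dot-my-project.appspot.com",
    "model-b": "https://model-b-dot-my-project.appspot.com",
    # ... one entry per model service
}

@app.route("/predict/<model_name>", methods=["POST"])
def predict(model_name):
    base_url = MODEL_SERVICES.get(model_name)
    if base_url is None:
        abort(404, description=f"Unknown model: {model_name}")
    # Forward the request body unchanged to the model's own service.
    resp = requests.post(
        f"{base_url}/predict",
        data=request.get_data(),
        headers={"Content-Type": request.content_type or "application/octet-stream"},
        timeout=30,
    )
    return resp.content, resp.status_code
```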

Best way to store many small files from memory to cloud storage

I wish to store a lot of images from memory in the cloud. I'm looking for the best cloud storage option for these images, one where I would have a minimum of HTTP calls to make.
I currently use Google Cloud Storage, but batch upload from memory is not possible. To use Google Cloud Storage I have to either use their client libraries and make one HTTP request per item I want to upload (very slow), or write all my files to disk (also very slow) to upload them in batch.
What would be my best option?
Data Transfer Services could be the best choice. Whether you have access to a T1 line or a 10 Gbps network connection, Google offers solutions to meet your data transfer needs and get your data onto the cloud quickly and securely, covering fast offline, online, and cloud-to-cloud transfers.
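That said, for uploads that must stay in your own code path, note that the client library can upload directly from in-memory bytes, with no disk write. A minimal sketch, assuming google-cloud-storage and a hypothetical bucket name, parallelizing the per-object requests with a thread pool (each object is still one HTTP request, as the question notes):

```python
# Sketch: upload many in-memory images to Cloud Storage without touching disk,
# using a thread pool to overlap the per-object HTTP requests.
import io
from concurrent.futures import ThreadPoolExecutor
from google.cloud import storage

BUCKET = "my-image-bucket"  # hypothetical bucket name

client = storage.Client()
bucket = client.bucket(BUCKET)

def upload_one(item):
    name, data = item  # (object_name, raw_bytes)
    bucket.blob(name).upload_from_file(io.BytesIO(data), content_type="image/png")

def upload_many(images):
    """images: iterable of (object_name, raw_bytes) pairs held in memory."""
    with ThreadPoolExecutor(max_workers=16) as pool:
        list(pool.map(upload_one, images))
```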

To save big files, why use Google Storage rather than Java project?

When making a content-file based bulletin board with Java, why is it necessary to use Google Storage, a separate storage service, rather than just creating a folder within the Java project itself?
Is there a limit on how large a Java project can become?
Is there a file size limit for a Java project uploaded to Google App Engine?
In App Engine, the local filesystem that your application is deployed to is not writable. The flexible environment does allow you to write to a directory on a local disk, but written files are ephemeral: the disk gets initialized on each VM startup, so your files will be deleted on that occasion. You can read more in the "Choosing an App Engine Environment" documentation page.
You can take advantage of the facilities offered by Compute Engine if writing to disk is a must. By default, each Compute Engine instance has a single root persistent disk that contains the operating system, as stated on the "Storage Options" page. For more insight, you may implement the proposed example from the "Running the Java Bookshelf on Compute Engine" page.
Compute Engine does not scale your application automatically, though, so you'll have to take care of that yourself, as described in the "Autoscaling Groups of Instances" document.
As for size limits: an individual object stored in Cloud Storage can be up to 5 TB. An App Engine application, on the other hand, is limited to 10,000 uploaded files per version, and each file is limited to a maximum size of 32 megabytes. You can find the details on the "Quotas | App Engine Documentation | Google Cloud" page.

Should programmers avoid writing to the local file system in the cloud?

Should programmers avoid writing to the local file system when writing applications to be deployed to a cloud?
Does this recommendation apply only to this particular cloud provider (Cloud Foundry)?
In short, yes, you probably should avoid it.
Most cloud providers - just like Cloud Foundry - recommend that you keep only ephemeral data (like caches) on your local disk, since a single machine may fail or be rebooted for an upgrade or rebalancing at any time, and you won't necessarily get the same machine back after a restart.
Many providers also offer alternative SAN/SMB-mountable disks which you can use for persistent data.

Incrementally creating a 100k+ lines file with App Engine

I want to export some data from our App Engine app; the current data set is around 70k entities (and growing) which need to be filtered.
The filtering is done with a cron job (an App Engine task), one 1k batch at a time. Is there a mechanism that will let me append lines to an existing file, rather than uploading it in one piece (as Google Cloud Storage requires)?
You can use the Datastore API to access the Datastore from your own PC or from a Compute Engine instance and write all the entities to your hard drive (or to the Compute Engine instance's disk). It's only slightly different from using the Datastore from within App Engine instances, so you should have no problem writing the code.
I must observe, however, that writing 100 files to Cloud Storage with 1,000 entities each sounds like a good solution to me. Whatever you want to do with these records later, having 100 smaller files instead of one large super-file may be a good idea.
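A minimal sketch of that batching approach, assuming the google-cloud-datastore and google-cloud-storage clients; the kind, bucket name, and passes_filter() helper are hypothetical placeholders for your own setup.

```python
# Sketch: query entities, filter them, and flush every 1,000 surviving
# entities to its own numbered Cloud Storage object.
import json
from google.cloud import datastore, storage

BUCKET = "my-export-bucket"  # hypothetical bucket name
KIND = "Record"              # hypothetical Datastore kind

ds = datastore.Client()
bucket = storage.Client().bucket(BUCKET)

def passes_filter(entity):
    return True  # placeholder for your own filtering logic

def flush(lines, batch_num):
    name = f"export/part-{batch_num:04d}.jsonl"
    bucket.blob(name).upload_from_string("\n".join(lines))

query = ds.query(kind=KIND)
batch_num, lines = 0, []
for entity in query.fetch():
    if passes_filter(entity):
        lines.append(json.dumps(dict(entity), default=str))
    if len(lines) >= 1000:
        flush(lines, batch_num)
        batch_num, lines = batch_num + 1, []
if lines:  # flush the final partial batch
    flush(lines, batch_num)
```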
