Static file hosting limits on App Engine with Go - google-app-engine

I've been trying to find documentation on static file hosting with App Engine and Go for a while now, but can't find anything current.
I've found a number of (unofficial) references from about 5 years ago to a 10,000 file limit with a max of 1,000 per directory, but I haven't been able to find any current official documentation on this other than information on billing for static files.
So what are the static file hosting limits on App Engine (using Go if that changes things)? Any links to official documentation will be appreciated.

It's currently in the Quotas documentation, under Deployment:
The number of times the application has been uploaded by a developer. The current quota is 10,000 per day.
An application is limited to 10,000 uploaded files per version. Each file is limited to a maximum size of 32 megabytes. Additionally, if the total size of all files for all versions exceeds the initial free 1 gigabyte, then there will be a $0.026 per GB per month charge.
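For context, serving static files from the Go standard environment is configured entirely in app.yaml, so the 10,000-files-per-version and 32 MB-per-file limits quoted above are the ones that apply. A minimal sketch, assuming a recent Go standard runtime and a local static/ directory (the runtime version and paths are placeholders, not taken from the docs quoted above):

runtime: go121   # any current Go standard runtime

handlers:
- url: /static
  static_dir: static   # files under static/ are uploaded and served as static files

- url: /.*
  script: auto         # everything else is handled by the Go app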

Related

Google App Engine (GAE) - do instances use a shared disk?

I'm using Google App Engine (GAE) and my app.yaml looks like this:
runtime: custom # uses Dockerfile
env: flex
manual_scaling:
  instances: 2
resources:
  cpu: 2
  memory_gb: 12
  disk_size_gb: 50
Is the 50GB disk shared between the instances? The docs are silent on this. I'm downloading files to the disk, and each instance will need to be able to access the files I am downloading.
If the disk is not shared, how can I share files between instances?
I know I could download them from Google Cloud storage on demand, but these are video files and instant access is needed for every instance. Downloading the video files on demand would be too slow.
Optional Reading
The reason instant access is needed is because I am using ffmpeg to produce a photo from the video at frame X (or time X). When a photo of the video is taken, these photos need to be made available to the user as quickly as possible.
You are using a 50 GB disk in GAE; be it standard or flex, there is no way to share that disk between instances, as the storage is dedicated to each one.
You have already tried GCS, which makes sense since video processing is involved and GCS is object-based storage.
So the alternative could be Filestore, but it is not yet supported for GAE Flex, despite the possibility of SSHing into its underlying fully managed machine.
There is a way if you use the /tmp folder. However, it stores files in the instance's RAM, so note that it takes up memory and is temporary (as the folder's name suggests).
For more details, see the documentation.
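If Cloud Storage ends up being the only practical option, one way to soften the latency concern is to download each video once per instance and cache it locally, so only the first request pays the download cost. Below is a minimal sketch in Go using the cloud.google.com/go/storage client; the bucket, object, and cache-directory names are placeholders, and this illustrates the caching idea rather than the asker's actual setup.

package videocache

import (
    "context"
    "io"
    "os"
    "path/filepath"

    "cloud.google.com/go/storage"
)

// FetchOnce downloads gs://<bucket>/<object> into cacheDir the first time it
// is requested and reuses the local copy afterwards, so ffmpeg can read a
// local file instead of streaming from Cloud Storage on every request.
func FetchOnce(ctx context.Context, client *storage.Client, bucket, object, cacheDir string) (string, error) {
    local := filepath.Join(cacheDir, filepath.Base(object))
    if _, err := os.Stat(local); err == nil {
        return local, nil // already cached on this instance
    }

    r, err := client.Bucket(bucket).Object(object).NewReader(ctx)
    if err != nil {
        return "", err
    }
    defer r.Close()

    f, err := os.Create(local)
    if err != nil {
        return "", err
    }
    defer f.Close()

    if _, err := io.Copy(f, r); err != nil {
        return "", err
    }
    return local, nil
}

Each instance still keeps its own copy; nothing here shares a disk between instances.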

What is the actual app binary limit in Google App Engine?

When I deploy an app to the Go standard environment, the size limit error below occurs:
ERROR: (gcloud.app.deploy) Error Response: [9] Deployment contains files that cannot be compiled: App binary too big: 69351840 > 67108864
67108864 is 64MB. But this limit differs from what the documentation says, which is 32MB:
Each file is limited to a maximum size of 32 megabytes.
Is the documentation outdated? Or does only Go have a higher limit than other languages? I want to find documentation of the actual app binary limits.
Yes, Go has a higher max static data file size limit than other languages. This change is not in the documentation and it needs to be updated to reflect the correct value for Go.
I've filed a public issue about it here for the documentation update.

Deploying and reading static files on GAE

Due to limitations of the experimental Search API I've decided to use Apache Lucene for my fulltext search needs. I have looked at the AppEngine ports of Lucene but they do not suit my needs (ones using RAMIndex will not support the size of my index and ones using the datastore are too slow performance-wise), so I've tested out Lucene using my local filesystem and found that it works perfectly for me.
Now my problem is how to get it to work on AppEngine. We are not allowed to write to the filesystem, but that is fine because the index is created on my dev machine and is read-only on the server (periodically I will update the index and need to push the new one up). Reading from the filesystem is allowed, so I figured I would be able to bundle up my index along with my other static files and have access to it.
The problem I've run up against is the AppEngine static file quotas (https://developers.google.com/appengine/docs/java/runtime at the bottom of the page). My index is only around 750MB, so I am fine on the "total files < 1GB" front; however, some of my index files are several hundred MB and therefore would not be allowed on AppEngine due to the 32 MB per-file maximum.
Is there any way to deploy and read static files larger than 32 MB on AppEngine? Or will I be stuck having to setup some other server (for instance Amazon) just to read my Lucene index?
With a 750MB index, you must use the Blobstore or Google Cloud Storage.
If you can change how Lucene accesses its files, you can read the index from the Blobstore or Cloud Storage with requests. If static files are your only option, you must split the index into 32MB pieces.
If you do change Lucene's file access, each read request is still limited to 32MB, so the file must be read in pieces either way.
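To make the "read it in pieces" suggestion concrete, here is a rough sketch in Go (the main runtime in this compilation; the original question is Java/Lucene) of pulling one 32 MB chunk of a large index file at a time from Cloud Storage with a range read. The bucket and object names are placeholders.

package indexreader

import (
    "context"
    "io"

    "cloud.google.com/go/storage"
)

const chunkSize = 32 << 20 // 32 MB, matching the per-request read limit mentioned above

// ReadChunk fetches bytes [offset, offset+chunkSize) of a large index file,
// so the caller can walk through the object in 32 MB pieces.
func ReadChunk(ctx context.Context, client *storage.Client, bucket, object string, offset int64) ([]byte, error) {
    r, err := client.Bucket(bucket).Object(object).NewRangeReader(ctx, offset, chunkSize)
    if err != nil {
        return nil, err
    }
    defer r.Close()
    return io.ReadAll(r)
}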

Google App Engine log download takes too long

I wrote an application running on Google App Engine that generates logs at a very high rate. Now I need to download and process them. The Admin Console says the log is around 300MB. Ideally I would like to process only small parts of the logs. Could anybody give me some pointers on how to:
Download only log entries in a specific time range (i.e. the timestamps of the log records fall into this range).
OR
If I have about 300MB of logs, what is the quickest way to download them? I've been running the appcfg.sh command for almost an hour now and it's still going (around 250,000 log records so far). Is there a way to break the logs down into small chunks and download them in parallel?
Many thanks,
Anh.
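As a starting point, appcfg's request_logs action does take flags that narrow the download; from memory it supports at least --num_days, --severity, and --append (which fetches only new records into an existing file), but check appcfg.sh help request_logs for the exact flag set before relying on this line:

appcfg.sh --num_days=1 --severity=3 --append request_logs /path/to/your/war mylogs.txt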

Best way to get CSV data into App Engine when bulkloader takes too long/generates errors?

I have a 10 MB CSV file of geolocation data that I tried to upload to my App Engine datastore yesterday. I followed the instructions in this blog post and used the bulkloader/appcfg tool. The datastore indicated that records were uploaded, but it took several hours and used up my entire CPU quota for the day. The process broke down with errors towards the end, before I actually exceeded my quota. But needless to say, 10 MB of data shouldn't require this much time and power.
So, is there some other way to get this CSV data into my App Engine datastore (for a Java app)?
I saw a post by Ikai Lan about using a mapper tool he created for this purpose but it looks rather complicated.
Instead, what about uploading the CSV to Google Docs - is there a way to transfer it to the App Engine datastore from there?
I do daily uploads of 100000 records (20 megs) through the bulkloader. Settings I played with:
- bulkloader.yaml config: set to auto generate keys.
- include header row in raw csv file.
- speed parameters are set to max (not sure if reducing them would reduce the CPU consumed)
These settings burn through my 6.5 hrs of free quota in about 4 minutes -- but it gets the data loaded (maybe it's from the indexes being generated).
appcfg.py upload_data --config_file=bulkloader.yaml --url=http://yourapp.appspot.com/remote_api --filename=data.csv --kind=yourtablename --bandwidth_limit=999999 --rps_limit=100 --batch_size=50 --http_limit=15
(I autogenerate this line with a script and use Autohotkey to send my credentials).
I wrote this gdata connector to pull data out of a Google Docs Spreadsheet and insert it into the datastore, but it uses Bulkloader, so it kind of takes you back to square one of your problem.
http://code.google.com/p/bulkloader-gdata-connector/source/browse/gdata_connector.py
What you could do, however, is take a look at the source to see how I pull data out of gdocs, and create a task (or tasks) that does the same, instead of going through the bulkloader.
Also, you could upload your document to the Blobstore and similarly create a task that reads the CSV data out of the Blobstore and creates entities. (I think this would be easier and faster than working with gdata feeds.)
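To sketch what "a task that reads the CSV and creates entities" could look like, here is a rough illustration in Go with the cloud.google.com/go/datastore client (the question concerns a Java app, so this only shows the shape of the approach; the GeoPoint kind and the name,lat,lng column layout are made up):

package csvload

import (
    "context"
    "encoding/csv"
    "io"
    "strconv"

    "cloud.google.com/go/datastore"
)

// GeoPoint is a made-up entity kind matching a CSV layout of "name,lat,lng".
type GeoPoint struct {
    Name string
    Lat  float64
    Lng  float64
}

// LoadCSV reads rows from r and writes them to the datastore in small batches,
// doing by hand roughly what the bulkloader does for you.
func LoadCSV(ctx context.Context, client *datastore.Client, r io.Reader) error {
    cr := csv.NewReader(r)
    var keys []*datastore.Key
    var rows []*GeoPoint
    flush := func() error {
        if len(rows) == 0 {
            return nil
        }
        _, err := client.PutMulti(ctx, keys, rows)
        keys, rows = nil, nil
        return err
    }
    for {
        rec, err := cr.Read()
        if err == io.EOF {
            break
        }
        if err != nil {
            return err
        }
        lat, _ := strconv.ParseFloat(rec[1], 64)
        lng, _ := strconv.ParseFloat(rec[2], 64)
        keys = append(keys, datastore.IncompleteKey("GeoPoint", nil))
        rows = append(rows, &GeoPoint{Name: rec[0], Lat: lat, Lng: lng})
        if len(rows) == 50 { // batch size comparable to the --batch_size flag above
            if err := flush(); err != nil {
                return err
            }
        }
    }
    return flush()
}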
