How to set CacheControl in the new Cloud Storage Api (Python)? - google-app-engine

I'm following the guidelines and updating my code to use the new Cloud Storage API in GAE. I need to set the Cache-Control header; previously it was easy:
files.gs.create(filename, mime_type='image/png', acl='public-read', cache_control='public, max-age=100000, must-revalidate' )
But with the new API, the guidelines say that "cache_control" is not available...
I get this error when I try to put cache_control inside the options:
ValueError: option cache_control is not supported.
I tried with Cache-Control and got the same error...
As usual, the documentation of the new API is not good.
Can someone show me how to set the cache headers in the new Cloud Storage API using Python? In case it's not possible, can I still use the old API for my project?
Thanks.

You are right. As documented here,
the open function only supports x-goog-acl and x-goog-meta headers.
Cache control is likely to be added in the near future to make migration easier. Please note that the main value of the GCS client lib is buffered reads, buffered resumable writes, and automatic retries to overcome transient errors. Many other simple REST operations on GCS (e.g. cache control, file copy, create bucket ...) can already be done with the Google API Client, as sketched below. The "downside" of the Google API Client is that, since it doesn't come directly from/for App Engine, it does not have dev appserver support.
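For illustration, here is a minimal sketch of one way to set cacheControl on an existing object through the GCS JSON API with the Google API Python Client. The bucket name, object name and Cache-Control value are placeholders, and the credential class and import paths vary between library versions, so treat this as an outline rather than drop-in code:

import httplib2
from googleapiclient.discovery import build
from oauth2client.contrib.appengine import AppAssertionCredentials

# App Engine service-account credentials with full control over GCS.
credentials = AppAssertionCredentials(
    scope='https://www.googleapis.com/auth/devstorage.full_control')
storage = build('storage', 'v1', http=credentials.authorize(httplib2.Http()))

# Patch only the metadata; the object contents are left untouched.
# 'my-bucket' and 'images/logo.png' are placeholder names.
storage.objects().patch(
    bucket='my-bucket',
    object='images/logo.png',
    body={'cacheControl': 'public, max-age=100000, must-revalidate'},
).execute()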

Related

Dealing with large zip uploads and extracting using google cloud

I am trying to create a site that e-learning courses (zips of html/css/js/media) can be uploaded to.
I am using go on google app engine with google cloud storage to store the zips and extracted courses.
I will explain the development dead ends I have encountered.
My first thought was to use the resumable upload functionality of cloud storage to send the zip file, then read it using go on app engine, unzip the files and write them back to cloud storage.
It took a while to read and understand the documentation, but this worked perfectly for my 2MB test zip. It failed when I tried it with a modest 67MB zip. I had encountered a hidden limitation when accessing cloud storage from app engine: no matter which client I used, there was a 10MB/32MB limit.
I tried both the old and new libraries as well as blobstore.
I also looked into creating a custom oauth2 supporting client library using sockets but hit too many dead ends.
Giving up on that approach, I thought that even though it would mean more uploading, perhaps just extracting on the client (browser) side and then uploading each file with its own resumable upload would make the most sense. After exploring a few libraries I had in-browser extraction working and ready to upload.
I wrote my handler that created the datastore entry for the upload, selected a location for the upload and created all the upload urls.
When testing this I found that it would take a while to go through generating the long lists of files (anything over 100). Since I was using Go, I decided it would make sense to make the requests concurrently. I spent a day or two getting that working. After dealing with some CORS issues that weirdly did not show up earlier, I had everything working.
Then I started getting errors when stress testing my approach with a large (500MB) zip/course. The uploads would fail, and I discovered that when trying to send 300+ files to generate upload urls I was getting the following error:
Post http://localhost:62394: dial tcp [::1]:62394: connectex: No connection could be made because the target machine actively refused it.
Now I have no idea how to diagnose this. I don't know if I am hitting a rate limit, and if I am, I don't know how to avoid it.
It seems like this should be simple, but it is anything but.
I have a few options I can pursue:
Try to create the resumable uploads with a batch operation (https://cloud.google.com/storage/docs/json_api/v1/how-tos/batch), though batch operations to /upload are not supported.
Make requesting each URL a one-by-one API call.
Make requesting the URL happen over a channel (https://cloud.google.com/appengine/docs/go/channel/reference).
Spend the next week or more adding layers of retries and fallback error handling.
Try another solution.
This should be simple. How should this be done?

Possible to access Piwik getVisitorLog through HTTP API?

I'm building a reporting tool. Ideally I want to avoid going through web server logs myself and use (some of) the power of Piwik.
The stuff I get from the visitor log would be a good start; it is at http://example.com/piwik/index.php?module=CoreHome&action=index&idSite=1&period=day&date=yesterday#/module=Live&action=getVisitorLog&idSite=1&period=day&date=yesterday
Unfortunately I can't find a getVisitorLog action in the HTTP API docs at
http://developer.piwik.org/api-reference/reporting-api#Actions, and it's not an undocumented feature either; method=Actions.getVisitorLog gives me
The method 'getVisitorLog' does not exist or is not available in the module '\Piwik\Plugins\Actions\API'.
Is there another way to get to this? Or should I write a plugin for Piwik?
Apparently it is possible through the Live plugin API:
http://developer.piwik.org/api-reference/reporting-api#Live
This works as desired:
http://example.com/piwik/index.php?module=API&method=Live.getLastVisitsDetails&format=JSON&idSite=1&period=day&date=2015-07-21&expanded=1&token_auth=XXXXX&filter_limit=100
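As a small usage sketch (assuming Python with the requests library; the Piwik URL and token are the placeholders from above, and the printed field names are examples of what the Live API's JSON output contains):

import requests

params = {
    'module': 'API',
    'method': 'Live.getLastVisitsDetails',
    'format': 'JSON',
    'idSite': 1,
    'period': 'day',
    'date': '2015-07-21',
    'expanded': 1,
    'token_auth': 'XXXXX',  # placeholder auth token
    'filter_limit': 100,
}
response = requests.get('http://example.com/piwik/index.php', params=params)
for visit in response.json():
    # Each entry describes one visit; 'visitorId' and 'visitIp' are among the
    # fields returned by Live.getLastVisitsDetails.
    print(visit['visitorId'], visit['visitIp'])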

Download large file on Google App Engine Python

On my appspot website, I use a third-party API to query a large amount of data. The user then downloads the data as CSV. I know how to generate a CSV and download it. The problem is that, because the file is huge, I get the DeadlineExceededError.
I have tried increasing the fetch deadline to 60 seconds (urlfetch.set_default_fetch_deadline(60)). It doesn't seem reasonable to increase it any further.
What is the appropriate way to tackle this problem on Google App Engine? Is this something where I have to use Task Queue?
Thanks.
DeadlineExceededError means that your incoming request took longer than 60 seconds; it is not about your URL Fetch call.
Deploy the code that generates the CSV file into a different module that you set up with basic or manual scaling. The URL to download your CSV will become http://module.domain.com
Requests can run much longer (up to 24 hours) on modules with basic or manual scaling.
Alternatively, consider creating the file dynamically in Google Cloud Storage (GCS) with your CSV content. The file then resides in GCS, and you can generate a URL from which the user can download it directly. There are also other options for different auth methods.
You can see documentation on doing this at
https://cloud.google.com/appengine/docs/python/googlecloudstorageclient/
and
https://cloud.google.com/appengine/docs/python/googlecloudstorageclient/functions
Important note: do not use the Files API (which was a common way of dynamically creating files in blobstore/GCS), as it has been deprecated. Use the Google Cloud Storage Client API referenced above instead.
Of course, you can delete the generated files after they've been successfully downloaded and/or you could run a cron job to expire links/files after a certain time period.
Depending on your specific use case, this might be a more effective path.
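As a rough sketch of the GCS approach (assuming the Python 2.7 runtime and the GoogleAppEngineCloudStorageClient library referenced above; the bucket name, object path and rows are placeholders):

import csv
import cloudstorage  # GoogleAppEngineCloudStorageClient

def write_report_to_gcs(rows):
    # Placeholder bucket/object name; 'x-goog-acl: public-read' makes the
    # object publicly readable so the user can download it directly.
    filename = '/my-bucket/reports/report.csv'
    with cloudstorage.open(filename, 'w',
                           content_type='text/csv',
                           options={'x-goog-acl': 'public-read'}) as gcs_file:
        writer = csv.writer(gcs_file)
        for row in rows:
            writer.writerow(row)
    # Direct download URL for a public GCS object.
    return 'https://storage.googleapis.com/my-bucket/reports/report.csv'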

detect AppEngine vs basic server

I am developing a web app to run on either Google's AppEngine or a basic server with file storage (it may not stay that way but that's the current status).
How do I detect whether the AppEngine services (most importantly blobstore) are available at runtime?
I have tried using code like the following:
try {
    Class.forName( "com.google.appengine.api.blobstore.BlobstoreServiceFactory" );
    logger.info( "Using GAE blobstore backend" );
    return new GAEBlobService();
} catch( ClassNotFoundException e ) {
    logger.info( "Using filesystem-based backend" );
    return new FileBlobService();
}
but it doesn't work, because BlobstoreServiceFactory is available at compile time (so the class is always found). What actually fails when trying to use GAE's blobstore without a GAE server is the following:
com.google.apphosting.api.ApiProxy$CallNotFoundException: The API package 'blobstore' or call 'CreateUploadURL()' was not found.
There are a few things you can use.
You can check the runtime environment to check the running version of App Engine. Check the section about "The Environment" in the runtime docs: https://developers.google.com/appengine/docs/java/runtime
You could also do what you were doing, but attempt to make a call that uses the SDK API functions (instead of just checking for the existence of a class) and catch the exception. This may negatively impact performance, since you're making an extra RPC.
You could also check for GAE-specific request headers.

Any way to get the pending_ms value for the current (or a specific) request from App Engine?

I'd like to know, in a particular request, what the pending_ms value is (assuming it exists for the given request).
I know that the App Engine logs include this value, but I'm hoping to find it elsewhere for use in gae_mini_profiler.
I've searched around the App Engine source, but no luck -- this is being added elsewhere in the GAE pipeline.
There's not currently any way to access this programmatically, either from within the request or outside it. Please do file a feature request for it, though.
