Set headers on file upload to Google Cloud Storage - google-app-engine

According to the documentation, one should be able to set an object's headers on upload to Google Cloud Storage.
Implementation Details
You should specify cache-control only for objects that are accessible
to all anonymous users. To be anonymously accessible, an object's ACL
must grant READ or FULL_CONTROL permission to AllUsers. If an object
is accessible to all anonymous users and you do not specify a
cache-control setting, Cloud Storage applies a cache-control setting
of 3600 seconds. When serving via XML, Cloud Storage respects the
cache-control of the object as set by its metadata.
However, adding headers through the Google API doesn't seem to have any effect when fetching the image back with google.appengine.api.images.get_serving_url.
Changing the Cache-Control header via gsutil does take effect, but it takes several days for the change to become visible on the object (when checking via gsutil again), and even then it has no effect when fetching the image back with the API.

After 2 months of going back and forth with Google's support, we found out that the file is sent to Google Cloud Storage with the proper headers (this can be checked via the gsutil command).
However, the get_serving_url function does not respect the blob's headers (confirmed by Google's engineers).
As of August 17, 2017, there are no plans to fix this.
Posting this in case someone encounters a similar problem, as there is nothing about it in the documentation.
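For reference, the Cache-Control metadata discussed above can be set and inspected with gsutil (the bucket and object names here are placeholders):
gsutil setmeta -h "Cache-Control: public, max-age=3600" gs://bucket_name/image.png
gsutil stat gs://bucket_name/image.png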

Related

How to send custom data with Cloud Pub/Sub when GCS object is uploaded via a Signed URL

I was able to set up Google Cloud Storage Cloud Pub/Sub notifications using:
gsutil notification create -t [TOPIC_NAME] -m my-key:my-value -f json gs://[BUCKET_NAME]
My App Engine servlet correctly gets a message every time an object is uploaded to GCS. I upload my object to GCS with a Signed URL.
However, I'm not sure how to set custom key-value pairs from my client when uploading an object with the Signed URL. The above gsutil command lets you set a key:value pair, but it hard-codes the value in the notification config, so that is not useful here. In my client I want to set a key:value pair like user: some-user, so that in my servlet I can do some extra App Engine work such as writing to a database.
I tried sending some headers in the metadata tag as shown here, but reading those headers from the HttpServletRequest in the servlet didn't seem to work.
Also, how would I send the subscriptionUniqueToken, since there is no explanation of how to do that?
Note: using Java
The notion of a unique token is no longer necessary in most cases. Object change notifications offered them because they worked by sending unauthenticated HTTPS calls to a configurable endpoint. If that endpoint were discovered, a malicious user could also send such calls. However, Cloud Pub/Sub notifications publish messages to a topic as a known service account, and so long as no malicious third party is also granted publishing permission to that topic, they cannot interfere. If you want, you can include a unique token as a second protection mechanism, but it's generally not necessary.
"Client tokens" are a feature specific to Object Change Notifications. The equivalent with Cloud Pub/Sub integration are "custom_attributes", user-specified properties of a notification config that are attached as extra attributes to each notification. You could add a "unique_token" attribute and attach a value, if you wish.
When using signed URLs, setting custom metadata is done with HTTP headers beginning x-goog-meta-. For example, x-goog-meta-stuff: Foo will create a custom attribute pair "stuff: Foo".
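A rough sketch of such an upload (shown in Python for consistency with the rest of this page; the signed URL, file name, and metadata value are placeholders, and any x-goog-meta-* headers must be covered by the signature when the URL is generated):

import requests

signed_url = "https://storage.googleapis.com/bucket_name/object_name?X-Goog-Signature=..."  # placeholder

with open("upload.bin", "rb") as f:
    resp = requests.put(
        signed_url,
        data=f,
        headers={
            # Becomes custom metadata "user: some-user" on the object and is
            # attached as an attribute on the resulting Pub/Sub notification.
            "x-goog-meta-user": "some-user",
        },
    )
resp.raise_for_status()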

Google Cloud Storage public link does not become invalid when unchecking

I am using Google Cloud Storage to upload images. I am now testing it from the cloud console.
After I upload a picture, if I check the Share publicly checkbox to obtain a public link, I get (obviously) a publicly accessible URL: https://storage.googleapis.com/bucket_name/pictureName.
Then, if I uncheck the Share Publicly checkbox, it makes a request
Request URL:https://clients6.google.com/storage/v1_internal/b/bucketName/o/pictureName.jpg/acl/allUsers?key=AIzaSyCI-yuie5UVOi0DjtiCwWBwQ1djkiuo1g
Request Method:DELETE
The request succeeds, but the public URL remains publicly accessible. I thought it might stay valid for some time, but after one hour it is still available.
So, what is the right way to remove the public url? How do I restrict access to a stored file after I made it public?
See the documentation on cache control and consistency. In particular:
Note: If you do not specify a cache lifetime, a publicly accessible
object can be cached for up to 60 minutes.
So I'm guessing this is working as intended and your object is cached. Have you tried waiting a little longer?
In Sharing your data publicly, it's shown that there are 2 ways to stop sharing an object publicly.
Deselect the checkbox under Shared Publicly as you've mentioned already.
Edit the object permissions and remove the entry with ID allUsers.
The reason you are still able to access the object publicly is indeed because of caching, as mentioned by @jterrace. The Cache control and consistency article referenced explains the effect of this eventual consistency.
One can test this behavior by sharing an object publicly and unsharing immediately after. In most cases, the object will be publicly accessible for the cache duration. One can shorten this duration by specifying the Cache-Control headers such as max-age.
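A minimal sketch of setting a short max-age with the google-cloud-storage Python client (the bucket and object names are placeholders):

from google.cloud import storage

client = storage.Client()  # assumes default credentials and project
blob = client.bucket("bucket_name").blob("pictureName")

# A short max-age narrows the window during which an un-shared object
# can still be served from caches.
blob.cache_control = "public, max-age=60"
blob.patch()  # pushes the metadata update to Cloud Storage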
When your publicly shared URL is of the form https://storage.googleapis.com/bucket_name/pictureName and you delete the file or uncheck the Share Publicly checkbox, the object remains available for up to 60 minutes, which is the default cache time in Google Cloud Storage.
To avoid the issue, pass a cache-busting query parameter, e.g.:
https://storage.googleapis.com/bucket_name/pictureName?avoidCache=1
passing a different random number in the query string on every request.
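A one-line illustration of that scheme (the parameter name is arbitrary):

import random

url = "https://storage.googleapis.com/bucket_name/pictureName?avoidCache=%d" % random.randint(0, 10**9)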

Google Storage Image Serving Cache

Using the CloudStorageTools class's getImageServingUrl method and then replacing the image's storage object with another image of the same name, the old image is still displayed on subsequent calls to getImageServingUrl.
I tried using CloudStorageTools::deleteImageServingUrl and then CloudStorageTools::getImageServingUrl again, but this doesn't work.
Is there any way to interact with Cloud Storage and tell it to refresh the image or the image URL? I'm guessing not, and am going to ensure the filenames are unique instead, but it feels like there ought to be a way.
If you refresh the image, does the new image show up? It's possible there's a cache-control policy set on the image. Google Cloud Storage allows users to specify what Cache-Control headers should be sent to browsers, but I'm not sure whether App Engine's getImageServingUrl respects that value.
As an experiment, could you try going to console.developers.google.com, heading over to "Storage > Cloud Storage > Storage browser", choosing the appropriate object, choosing "Edit metadata", and then seeing whether there's a Cache-Control policy on the object? Try changing the Cache-Control section to "max-age=0,no-cache".
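The same change can also be made from the command line with gsutil, if that is easier than the console (names are placeholders):
gsutil setmeta -h "Cache-Control: max-age=0,no-cache" gs://bucket_name/image.png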

How to set CacheControl in the new Cloud Storage Api (Python)?

I'm following the guidelines and updating my code to use the new Cloud Storage API in GAE. I need to set the Cache-Control headers; previously it was easy:
files.gs.create(filename, mime_type='image/png', acl='public-read', cache_control='public, max-age=100000, must-revalidate' )
BUT, with the new API, the guidelines say that "cache_control" is not available...
I get this error when I try to put cache_control inside the options:
ValueError: option cache_control is not supported.
I tried with Cache-Control and got the same error...
As usual, the documentation of the new API is not good.
Can someone help me set the cache headers in the new Cloud Storage API using Python? If it's not possible, can I still use the old API for my project?
Thanks.
You are right. As documented here,
the open function only supports the x-goog-acl and x-goog-meta headers.
Cache control is likely to be added in the near future to make migration easier. Please note that the main value of the GCS client lib is buffered reads, buffered resumable writes, and automatic retries to overcome transient errors. Many other simple REST operations on GCS (e.g. cache control, file copy, bucket creation...) can already be done with the Google API Client. The "downside" of the Google API Client is that, since it doesn't come directly from/for App Engine, it does not have dev_appserver support.
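For illustration, a sketch of setting cacheControl through the JSON API with google-api-python-client (the bucket and object names are placeholders, and application default credentials are assumed):

from googleapiclient.discovery import build

service = build("storage", "v1")  # uses application default credentials

# objects.patch updates only the metadata fields supplied in the body.
service.objects().patch(
    bucket="bucket_name",
    object="image.png",
    body={"cacheControl": "public, max-age=100000, must-revalidate"},
).execute()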

Caching & GZip on GAE (Community Wiki)

Why does it seem like Google App Engine isn’t setting appropriate cache-friendly headers (like far-future expiration dates) on my CSS stylesheets and JavaScript files? When does GAE gzip those files? My app.yaml marks the respective directories as static_dirs, so the lack of far-future expiration dates is kind of surprising to me.
This is a community wiki to showcase the best practices regarding static file caching and gzipping on GAE!
How does GAE handle caching?
It seems GAE sets near-future cache expiration times, but it does use the etag header. This is used so browsers can ask, “Has this file changed since it had an etag of X68f0o?” and hear “Nope – 304 Not Modified” back in response.
As opposed to far-future expiration dates, this has the following trade-offs:
Your end users will get the latest copies of your resources, even if they have the same name (unlike far-future expiration). This is good.
Your end users will however still have to make a request to check on the status of that file. This does slow down your site, and is “pure overhead” when the content hasn’t changed. This is not ideal.
Opting for far-future cache expiration instead of (just) etag
To use far-future expiration dates takes two steps and a bit of understanding.
You have to manually update your app to request new versions of resources, e.g. by naming files mysitestyles.2011-02-11T0411.css instead of mysitestyles.css. There are tools to help automate this, but I’m not aware of any that directly relate to GAE.
Configure GAE to set the expiration times you want by using default_expiration and/or expiration in app.yaml, as in the sketch below. GAE docs on static files
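A minimal app.yaml sketch (the directory name and durations are illustrative):

default_expiration: "7d"

handlers:
- url: /static
  static_dir: static
  expiration: "365d"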
A third option: Application manifests
Cache manifests are an HTML5 feature that overrides cache headers. MDN article, DiveIntoHTML5, W3C. This affects more than just your script and style files' caching, however. Use with care!
When does GAE gzip?
According to Google’s FAQ,
Google App Engine does its best to serve gzipped content to browsers that support it. Taking advantage of this scheme is automatic and requires no modifications to applications.
We use a combination of request headers (Accept-Encoding, User-Agent) and response headers (Content-Type) to determine whether or not the end-user can take advantage of gzipped content. This approach avoids some well-known bugs with gzipped content in popular browsers. To force gzipped content to be served, clients may supply 'gzip' as the value of both the Accept-Encoding and User-Agent request headers. Content will never be gzipped if no Accept-Encoding header is present.
This is covered further in the runtime environment documentation (Java | Python).
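One way to check this from a script, using the forcing trick described in the FAQ (the URL is a placeholder):

import requests

resp = requests.get(
    "https://your-app.appspot.com/static/app.js",  # placeholder URL
    # Supplying "gzip" in both headers forces gzipped content, per the FAQ.
    headers={"Accept-Encoding": "gzip", "User-Agent": "gzip"},
)
print(resp.headers.get("Content-Encoding"))  # "gzip" if it was compressed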
Some real-world observations do show this to generally be true. Assuming a gzip-capable browser:
GAE gzips actual pages (if they have proper content-type headers like text/html; charset=utf-8)
GAE gzips scripts and styles in static_dirs (defined in app.yaml).
Note that you should not expect GAE to gzip images like GIFs or JPEGs, as they are already compressed.
