Decode an App Engine Blobkey to a Google Cloud Storage Filename - google-app-engine

I've got a database full of BlobKeys that were previously uploaded through the standard Google App Engine create_upload_url() process, and each of the uploads went to the same Google Cloud Storage bucket by setting the gs_bucket_name argument.
What I'd like to do is be able to decode the existing blobkeys so I can get their Google Cloud Storage filenames. I understand that I could have been using the gs_object_name property from the FileInfo class, except:
You must save the gs_object_name yourself in your upload handler or this data will be lost. (The other metadata for the object in GCS is stored in GCS automatically, so you don't need to save that in your upload handler.)
Meaning the gs_object_name property is only available in the upload handler, and if I haven't been saving it at that time then it's lost.
Also, create_gs_key() doesn't do the trick because it goes the other way: it takes a Google Cloud Storage filename and creates a blob key.
So, how can I take a blob key for something that was previously uploaded to a Google Cloud Storage bucket through App Engine, and get its Google Cloud Storage filename? (Python)

You can get the Cloud Storage filename only in the upload handler (FileInfo.gs_object_name), and you have to store it in your database there. After that it is lost; it does not seem to be preserved in BlobInfo or any other metadata structure.
Google says: Unlike BlobInfo metadata, FileInfo metadata is not persisted to datastore. (There is no blob key either, but you can create one later if needed by calling create_gs_key.) You must save the gs_object_name yourself in your upload handler or this data will be lost.
https://developers.google.com/appengine/docs/python/blobstore/fileinfoclass
Update: I was able to decode an SDK BlobKey in the Blobstore viewer: "encoded_gs_file:base64-encoded-filename-here". However, the real thing is not base64-encoded.
create_gs_key(filename, rpc=None) ... Google says: "Returns an encrypted blob key as a string." Does anyone have a guess why this is encrypted?
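For reference, a minimal upload-handler sketch (assuming webapp2, ndb, and the standard BlobstoreUploadHandler; the UploadRecord model and the redirect path are made up for illustration) that captures gs_object_name at the only moment it is available and stores it next to the blob key:

from google.appengine.ext import ndb
from google.appengine.ext.webapp import blobstore_handlers

class UploadRecord(ndb.Model):
    # Hypothetical model mapping a blob key to its GCS object name.
    blob_key = ndb.StringProperty()
    gs_object_name = ndb.StringProperty()

class UploadHandler(blobstore_handlers.BlobstoreUploadHandler):
    def post(self):
        blob_info = self.get_uploads()[0]      # BlobInfo for the uploaded file
        file_info = self.get_file_infos()[0]   # FileInfo; only valid in this handler
        UploadRecord(blob_key=str(blob_info.key()),
                     gs_object_name=file_info.gs_object_name).put()
        self.redirect('/done')

With such a record in place, looking up the GCS filename for an existing blob key becomes a simple datastore query.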

From the statement in the docs, it looks like the generated GCS filenames are lost. You'll have to use gsutil to manually browse your bucket.
https://developers.google.com/storage/docs/gsutil/commands/ls
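If you would rather enumerate the bucket from code instead of gsutil, here is a rough sketch using the GCS client library (the bucket name is a placeholder):

import cloudstorage as gcs

# List every object in the bucket and print its name and size.
for entry in gcs.listbucket('/my-bucket'):
    print entry.filename, entry.st_size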

If you have blobKeys you can use: ImagesServiceFactory.makeImageFromBlob

Related

How can I generate thumbnails in GAE using GCS instead of Blobstore?

I am making an application in which the users can upload some pictures so that others can see them. Since some of these can be a bit large, I need to generate smaller images to give a preview of the content.
I already have the uploaded images in GCS, in urls with the form: "https://storage.googleapis.com/...", but from what I can see in the Images API docs, it uses the blobstore, which I am not using (it's been superseded). How can I serve the thumbnails from the gcs link to avoid making the users load the full image? I would really appreciate any code example.
UPDATE:
I tried to copy the example with an image from my app using images.Image with filename as suggested, but it gives me a TransformationError, and a NotImageError if I don't try any transformations:
def get(self):
    teststr = '/gs/staging.trn-test2.appspot.com/TestContainer/Barcos-2017-02-12-145657.jpg'
    img = images.Image(filename=teststr)
    img.resize(width=80, height=100)
    thumbnail = img.execute_transforms(output_encoding=images.JPEG)
    self.response.headers['Content-Type'] = 'image/jpeg'
    self.response.out.write(thumbnail)
What am I missing?
In general you can use the Blobstore API, but with GCS as the underlying storage instead of the Blobstore; see Using the Blobstore API with Google Cloud Storage. IMHO only the storage is superseded, not the API itself:
Note: You should consider using Google Cloud Storage rather than Blobstore for storing blob data.
For the Image class in particular (from your link) you can use the filename optional constructor argument instead of the blob_key one (which triggers the above-mentioned blobstore API + GCS usage under the hood):
filename: String of the file name of a Google Storage file that contains the image data. Must be in the format `/gs/bucket_name/object_name`.
From its __init__() function:
if filename:
    self._blob_key = blobstore.create_gs_key(filename)
else:
    self._blob_key = _extract_blob_key(blob_key)
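As an alternative for thumbnails specifically, you can generate a serving URL from the GCS path; a sketch, with the bucket and object names below as placeholders:

from google.appengine.api import images
from google.appengine.ext import blobstore

gs_path = '/gs/my-bucket/my-object.jpg'        # placeholder path
blob_key = blobstore.create_gs_key(gs_path)
# size=80 asks the image service to serve a resized copy on the fly.
thumbnail_url = images.get_serving_url(blob_key, size=80)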

Limit upload size for appengine interface to cloud store

Consider an image (avatar) uploader to Google Cloud Storage which starts in the user's web browser, passes through a Go App Engine instance that handles standard compression/cropping etc., and then stores the resulting image as an object in Cloud Storage.
How can I ensure that the appengine instance isn't overloaded by too much or bad data? In other words, I think I'm asking two questions (or possibly not):
How can I limit the amount of data allowed to be sent to an appengine instance in a single request, or is there already a default safe limit?
How can I validate the data to make sure it's proper jpg/png/gif before attempting to process it with standard go image libraries?
All App Engine requests are limited to 32MB.
You can check the size of the file being uploaded before the upload starts.
You can verify the file's mime-type and only allow correct files to be uploaded.
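For illustration (in Python, since that is what the rest of these examples use), a rough sketch of both checks; the 10 MB limit and the handler are arbitrary assumptions, and the same idea carries over to Go:

import imghdr
import webapp2

MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # assumed app-level limit, well under the 32MB cap

class AvatarHandler(webapp2.RequestHandler):
    def post(self):
        data = self.request.body
        if len(data) > MAX_UPLOAD_BYTES:
            self.abort(413)  # request entity too large
        # imghdr inspects the magic bytes, not the file extension.
        if imghdr.what(None, h=data) not in ('jpeg', 'png', 'gif'):
            self.abort(400)
        # ... compress/crop and write the result to Cloud Storage here ...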

Uploading to Google Cloud Storage using Blobstore: Blobstore doesn't retain file name upon upload

I'm trying to upload to GCS using the Blobstore. I have set the GCS bucket name while generating the upload URL, and the file gets uploaded successfully.
In the upload handler, blobInfo.getFilename() returns the right file name, but the file actually got saved in the GCS bucket under a different name. Each time, the file name is some random hash like this one:
L2FwcGhvc3RpbmdfcHJvZC9ibG9icy9BRW5CMlVvbi1XNFEyWEJkNGlKZHNZRlJvTC0wZGlXVS13WTF2c0g0LXdzcEVkaUNEbEEyc3daS3Vham1MVlZzNXlCSk05ZnpKc1RudDJpajF1TmxwdWhTd2VySVFLdUw3US56ZXFHTEZSLVoxT3lablBI
Is this how it will work? Is this an anomaly?
I store the file name in the datastore based on the value returned by blobInfo.getFilename(), which is the correct file name. But I'm unable to access the file using GcsFilename, since the file is stored in GCS with that random hash as its name.
Any pointers would be greatly helpful.
Thanks!
PS: The blobstore page says that BlobInfo is currently not available for GCS objects, yet BlobInfo.getFilename returns the right value for me. Is something wrong on my end?
That's how it works, see https://cloud.google.com/appengine/docs/python/blobstore/fileinfoclass:
FileInfo metadata is not persisted to datastore [...] You must save
the gs_object_name yourself in your upload handler or this data will
be lost
I personally recommend that new applications use https://cloud.google.com/appengine/docs/python/googlecloudstorageclient/ directly, rather than the blobstore emulation on top of it.
The latter is currently provided essentially only for (limited, partial) backwards compatibility: it's not really all that suitable for new applications.
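A minimal sketch of writing the object directly with the GCS client library, so the object is stored under a name you choose rather than a generated one (the bucket name is a placeholder):

import cloudstorage as gcs

def save_upload(filename, data, content_type):
    # Write under a name we control instead of the random name
    # the blobstore emulation generates.
    gcs_path = '/my-bucket/' + filename
    with gcs.open(gcs_path, 'w', content_type=content_type) as f:
        f.write(data)
    return gcs_path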

Location of GS File in Local/Dev AppEngine

I'm trying to troubleshoot some issues I'm having with an export task I have created. I'm attempting to export CSV data using Google Cloud Storage and I seem to be unable to export all my data. I'm assuming it has something to do with the (FAR TOO LOW) 30-second file limit when I attempt to restart the task.
I need to troubleshoot, but I can't seem to find where my local/development server is writing the files out. I see numerous entries in the GsFileInfo table, so I assume something is going on, but I can't seem to find the actual output file.
Can someone point me to the location of the Google Cloud Storage files in the local AppEngine development environment?
Thanks!
Looking at the dev_appserver code, it looks like you can specify a path, or it will calculate a default based on the OS you are using.
blobstore_path = options.blobstore_path or os.path.join(storage_path, 'blobs')
Then it passes this path to blobstore_stub (GCS storage is backed by the blobstore stub), which seems to shard files by their blobstore key.
def _FileForBlob(self, blob_key):
    """Calculate full filename to store blob contents in.

    This method does not check to see if the file actually exists.

    Args:
      blob_key: Blob key of blob to calculate file for.

    Returns:
      Complete path for file used for storing blob.
    """
    blob_key = self._BlobKey(blob_key)
    return os.path.join(self._DirectoryForBlob(blob_key), str(blob_key)[1:])
For example, I'm using Ubuntu and started with dev_appserver.py --storage_path=~/tmp; I was then able to find files under ~/tmp/blobs and the datastore under ~/tmp/datastore.db. Alternatively, you can go to the local admin console; the blobstore viewer link will also display GCS files.
As tkaitchuck mentions above, you can use the included LocalRawGcsService to pull the data out of the local.db. This is the only way to get the files, as they are stored in the local DB via the blobstore. Here's the original answer:
which are the files uri on GAE java emulating cloud storage with GCS client library?

How to mapreduce over google cloud storage file?

From the App Engine mapreduce console (myappid.appspot.com/mapreduce/status) I have a mapreduce defined with input_reader: mapreduce.input_readers.BlobstoreLineInputReader, which I have used successfully with a regular blobstore file, but it doesn't work with a BlobKey created from Cloud Storage with create_gs_key. When I run it, I get the error "BadReaderParamsError: Could not find blobinfo for key THEKEY". The input reader checks for the existence of a BlobInfo. Is there any workaround for this? Shouldn't BlobInfo.get(BLOBKEY FROM CS) return a BlobInfo?
To get a blob_key from a Google Cloud Storage file, I run this:
from google.appengine.ext import blobstore
READ_PATH = '/gs/mybucket/myfile.json'
blob_key = blobstore.create_gs_key(READ_PATH)
print blob_key
A community member created a LineInputReader for Cloud Storage as an issue on the appengine-mapreduce library: http://code.google.com/p/appengine-mapreduce/issues/detail?id=140
We've posted our modifications here: https://github.com/thinkjson/CloudStorageLineInputReader
We're using this to do MapReduce over about 4TB of data, and have been happy with it so far.
Cloud Storage and the Blobstore are two different storage systems; you can't pass a Cloud Storage key as a Blobstore key.
You will need to implement your own line reader over the Cloud Storage file.
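Until such a reader exists in your project, a bare-bones sketch of reading a GCS file line by line with the client library (the path is a placeholder, and process() stands in for your map logic):

import cloudstorage as gcs

with gcs.open('/mybucket/myfile.json') as f:
    line = f.readline()
    while line:
        process(line)
        line = f.readline()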
