Serving a large file from Google Cloud Storage in Google App Engine - google-app-engine

Running a Dart server in the App Engine flexible environment, there seems to be a limit that prevents serving files larger than 32MB.
There are a few requirements for the files I want to serve:
the file size can be larger than 32MB
the files cannot be publicly accessible (authorization is done on the server)
At the moment I try to read the file from the bucket using the gcloud library and then pipe it into the request.response. This fails because of the limit, e.g.: HTTP response was too large: 33554744. The limit is: 33554432.
Is there a way to serve larger files from storage? The documentation on this topic is quite confusing (I don't think there is Dart-specific documentation at all). I keep reading about the Blobstore, but I am not sure whether that solution is applicable to Dart.

As @filiph suggests, you can use signed URLs from Google Cloud Storage.
On the server side I have this code in Python:
import time
import base64
from oauth2client.service_account import ServiceAccountCredentials

def create_signed_url(file_location):
    # Get credentials
    creds = ServiceAccountCredentials.from_json_keyfile_name('filename_of_private_key.json')
    client_id = creds.service_account_email
    # Set the time limit to two hours from now
    timestamp = int(time.time() + 2 * 3600)
    # Generate the signature string
    signature_string = "GET\n\n\n%d\n%s" % (timestamp, file_location)
    signature = creds.sign_blob(signature_string)[1]
    encoded_signature = base64.b64encode(signature)
    encoded_signature = encoded_signature.replace("+", "%2B").replace("/", "%2F")
    # Generate the URL (file_location must have the form '/bucket-name/object-name')
    return "https://storage.googleapis.com%s?GoogleAccessId=%s&Expires=%d&Signature=%s" % (
        file_location, client_id, timestamp, encoded_signature)
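If you are on a newer stack, the current google-cloud-storage client can generate the signed URL for you. A minimal sketch, assuming a service-account key file and placeholder bucket/object names of your own:
import datetime

from google.cloud import storage

def create_signed_url(bucket_name, blob_name):
    # Placeholder key file and names; substitute your own.
    client = storage.Client.from_service_account_json('filename_of_private_key.json')
    blob = client.bucket(bucket_name).blob(blob_name)
    # V4 signed URL valid for two hours; the server hands this URL to the
    # client only after performing its own authorization check.
    return blob.generate_signed_url(
        version='v4',
        expiration=datetime.timedelta(hours=2),
        method='GET',
    )
Either way, the server then redirects the authorized client to the signed URL instead of piping the bytes through App Engine, which avoids the 32MB response limit.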

Related

google cloud storage: access cloud storage and provide download link for users

On my development server, I am able to use the blob key to download a CSV object. The problem is that in production the blob key does not download anything (it returns a 404), presumably because the blob key is inaccurate. I think this is because of Google's deprecation of the Blobstore, which no longer uses blob keys. This means I need to try to download from a Google Storage bucket. I am not sure how to do this; on the development server, I would go to the endpoint /data?key=<blob_key> to download the blob object.
I can also download the CSV object if I navigate to the bucket, then to the item, and click download. Are there some minor adjustments I can make to get the download to occur? I would appreciate it if someone could point me in a particular direction.
To download objects from your Cloud Storage buckets, depending on your preferences, you can check the following code sample (Python):
from google.cloud import storage

def download_blob(bucket_name, source_blob_name, destination_file_name):
    """Downloads a blob from the bucket."""
    # bucket_name = "your-bucket-name"
    # source_blob_name = "storage-object-name"
    # destination_file_name = "local/path/to/file"
    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(source_blob_name)
    blob.download_to_filename(destination_file_name)
    print(
        "Blob {} downloaded to {}.".format(
            source_blob_name, destination_file_name
        )
    )
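For example, with hypothetical names (substitute your own bucket, object, and local path):
download_blob("my-bucket", "exports/report.csv", "/tmp/report.csv")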
Be sure that you are no longer using Python 2.7, since it is deprecated and no longer supported. If you are still on Python 2.7, please upgrade to Python 3.7.

400-500ms latency for simple Google Drive API file query from App Engine

I'm running a Python 3.7 App Engine application in the standard environment. In general, my time-to-first-byte for requests to this application ranges from 70-150ms when doing simple server-side rendering of text. My app runs in the us-central region; I'm testing from California.
I've been benchmarking requests to Google Drive API v3. I have a request which simply takes the drive file ID in the URL and returns some metadata about the file. The simplified code looks like:
import json

from googleapiclient.discovery import build

def get(file_id: str):
    credentials = oauth2.get_credentials()
    service = build("drive", "v3", cache_discovery=False, credentials=credentials)
    data = service.files().get(fileId=file_id, fields=",".join(FIELDS),
                               supportsTeamDrives=True).execute()
    return json.dumps(data)
My understanding is that this request should be making a single request to the Drive API. I'm seeing 400-500ms time-to-first-byte coming back from the server. That implies ~300ms to get data from the Drive API. This seems quite high to me.
My questions:
* Is this kind of per-API-call latency common for the Google Drive API?
* If this is not common, what is common?
* What steps, if any, could I take to reduce the amount of time talking to the Drive API?
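One way to narrow the ~300ms figure down is to time the Drive call in isolation and to reuse the built service across requests, since build() itself does work on each call. A rough sketch, reusing the oauth2 helper and FIELDS constant from the question above:
import logging
import time

from googleapiclient.discovery import build

# Build the service once at module import so per-request latency measures
# only the files().get() round trip (oauth2 and FIELDS as in the question).
_service = build("drive", "v3", cache_discovery=False,
                 credentials=oauth2.get_credentials())

def get_timed(file_id: str):
    start = time.monotonic()
    data = _service.files().get(fileId=file_id, fields=",".join(FIELDS),
                                supportsTeamDrives=True).execute()
    elapsed_ms = (time.monotonic() - start) * 1000
    logging.info("drive.files.get took %.0f ms", elapsed_ms)
    return data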

Total CPU utilization of running App Engine flexible instances

I need to make decisions in an external system based on the current CPU utilization of my App Engine Flexible service. I can see the exact values / metrics I need to use in the dashboard charting in my Google Cloud Console, but I don't see a direct, easy way to get this information from something like a gcloud command.
I also need to know the count of running instances, but I think I can use gcloud app instances list -s default to get a list of my running instances in the default service and then count the lines to get this info easily. I intend to write a Python function which returns a tuple like (instance_count, cpu_utilization).
I'd appreciate it if anyone could direct me to an easy way to get this. I am currently exploring the Stackdriver Monitoring service for the same information, but so far it looks super complicated to me.
You can use the gcloud app instances list -s default command to get the list of running instances, as you said. To retrieve CPU utilization, have a look at the Python Client for Stackdriver Monitoring. To list available metric types:
from google.cloud import monitoring

client = monitoring.Client()
for descriptor in client.list_metric_descriptors():
    print(descriptor.type)
Metric descriptors are described here. To display utilization across your GCE instances during the last five minutes:
metric = 'compute.googleapis.com/instance/cpu/utilization'
query = client.query(metric, minutes=5)
print(query.as_dataframe())
Do not forget to add google-cloud-monitoring==0.28.1 to “requirements.txt” before installing it.
Check this code, which runs locally for me:
import logging

from flask import Flask
from google.cloud import monitoring as mon

app = Flask(__name__)

@app.route('/')
def list_metric_descriptors():
    """Return all metric descriptors"""
    # Instantiate a client
    client = mon.Client()
    for descriptor in client.list_metric_descriptors():
        print(descriptor.type)
    return descriptor.type

@app.route('/CPU')
def cpuUtilization():
    """Return CPU utilization"""
    client = mon.Client()
    metric = 'compute.googleapis.com/instance/cpu/utilization'
    query = client.query(metric, minutes=5)
    print(type(query.as_dataframe()))
    print(query.as_dataframe())
    data = str(query.as_dataframe())
    return data

@app.errorhandler(500)
def server_error(e):
    logging.exception('An error occurred during a request.')
    return """
    An internal error occurred: <pre>{}</pre>
    See logs for full stacktrace.
    """.format(e), 500

if __name__ == '__main__':
    # This is used when running locally. Gunicorn is used to run the
    # application on Google App Engine. See entrypoint in app.yaml.
    app.run(host='127.0.0.1', port=8080, debug=True)
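Putting the pieces together, here is a rough sketch of the (instance_count, cpu_utilization) helper the question describes, counting instances via gcloud and averaging the utilization time series; the averaging strategy and the gcloud format expression are assumptions, so adapt them to your decision logic:
import subprocess

from google.cloud import monitoring  # google-cloud-monitoring==0.28.1, as above

def instances_and_cpu(service='default'):
    # Count running instances by listing their IDs with gcloud.
    output = subprocess.check_output(
        ['gcloud', 'app', 'instances', 'list', '-s', service,
         '--format=value(id)'])
    instance_count = len(output.splitlines())

    # Mean CPU utilization over the last five minutes, across all series.
    client = monitoring.Client()
    query = client.query('compute.googleapis.com/instance/cpu/utilization',
                         minutes=5)
    df = query.as_dataframe()
    cpu_utilization = float(df.mean().mean()) if not df.empty else 0.0
    return instance_count, cpu_utilization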

How to migrate from The Files API to Google Cloud Storage?

I got a message yesterday from Google saying that the Files API will be disabled on July 28th and that it is recommended to migrate to Google Cloud Storage.
Currently I use the Files API in the following way: once an email is received, I save its attachments (images only) to the Blobstore:
from google.appengine.api import files

bs_file = files.blobstore.create(mime_type=ctype,
                                 _blobinfo_uploaded_filename='screenshot_' + image_file_name)
try:
    with files.open(bs_file, 'a') as f:
        f.write(image_file)
    files.finalize(bs_file)
    blob_key = files.blobstore.get_blob_key(bs_file)
except Exception:
    # Exception handling elided in the question.
    raise
Later on, I access the Blobstore and attach the same images to another email I send:
attachments = []
for at_blob_key in message.attachments:
    blob_reader = blobstore.BlobReader(at_blob_key)
    blob_info = blobstore.BlobInfo.get(at_blob_key)
    if blob_reader and blob_info:
        filename = blob_info.filename
        attachments.append((filename, blob_reader.read()))
if len(attachments) > 0:
    email.attachments = attachments
    email.send()
Now I am supposed to use Google Cloud Storage instead of the Blobstore. Google Cloud Storage is not free, so I have to enable billing. Currently my Blobstore stored data is 0.27GB, which is small, so it looks like I will not have to pay a lot. But I am afraid to enable billing, since some other parts of my code could result in a huge bill (and there seems to be no way to enable billing just for Google Cloud Storage).
So, is there any way to continue using the Blobstore for file storage in my case? What else can I use for free instead of Google Cloud Storage (what about Google Drive)?
The below example uses the GCS default bucket to store your screenshots. The default bucket has free quota.
import datetime

from google.appengine.api import app_identity
from google.appengine.ext import blobstore
import cloudstorage as gcs

default_bucket = app_identity.get_default_gcs_bucket_name()
# GCS filenames should be unique, so prefix with a timestamp
image_file_name = datetime.datetime.utcnow().strftime('%Y%m%d%H%M%S') + '_' + image_file_name
gcs_filename = '/%s/screenshot_%s' % (default_bucket, image_file_name)

with gcs.open(gcs_filename, 'w', content_type=ctype) as f:
    f.write(image_file)

blob_key = blobstore.create_gs_key('/gs' + gcs_filename)
blob_key = blobstore.BlobKey(blob_key)  # if it should be stored in NDB
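Because create_gs_key gives you a regular blob key, the attachment code in the question keeps working unchanged. Alternatively, here is a sketch that reads the screenshots straight from GCS when building the outgoing email; message.gcs_filenames is a hypothetical field holding the stored '/bucket/screenshot_...' paths:
import cloudstorage as gcs

attachments = []
for gcs_filename in message.gcs_filenames:  # hypothetical list of stored GCS paths
    with gcs.open(gcs_filename, 'r') as f:
        attachments.append((gcs_filename.rsplit('/', 1)[-1], f.read()))
if attachments:
    email.attachments = attachments
    email.send()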

Location of GS File in Local/Dev AppEngine

I'm trying to troubleshoot some issues I'm having with an export task I have created. I'm attempting to export CSV data using Google Cloud Storage, and I seem to be unable to export all my data. I'm assuming it has something to do with the (FAR TOO LOW) 30-second file limit when I attempt to restart the task.
I need to troubleshoot, but I can't seem to find where my local/development server is writing the files. I see numerous entries in the GsFileInfo table, so I assume something is going on, but I can't seem to find the actual output file.
Can someone point me to the location of the Google Cloud Storage files in the local App Engine development environment?
Thanks!
Looking at the dev_appserver code, it looks like you can specify a path, or it will calculate a default based on the OS you are using.
blobstore_path = options.blobstore_path or os.path.join(storage_path,
                                                        'blobs')
Then it passes this path to blobstore_stub (GCS storage is backed by the blobstore stub), which seems to shard files by their blobstore key.
def _FileForBlob(self, blob_key):
    """Calculate full filename to store blob contents in.

    This method does not check to see if the file actually exists.

    Args:
      blob_key: Blob key of blob to calculate file for.

    Returns:
      Complete path for file used for storing blob.
    """
    blob_key = self._BlobKey(blob_key)
    return os.path.join(self._DirectoryForBlob(blob_key), str(blob_key)[1:])
For example, I'm using Ubuntu and started with dev_appserver.py --storage_path=~/tmp; I was then able to find the files under ~/tmp/blobs and the datastore under ~/tmp/datastore.db. Alternatively, you can go to the local admin console; the blobstore viewer link will also display GCS files.
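If you prefer to poke at the files directly, a small sketch that walks the blobs directory under the storage path; the ~/tmp path matches the --storage_path example above, so substitute whatever you passed:
import os

storage_path = os.path.expanduser('~/tmp')  # whatever you passed to --storage_path
for root, _dirs, filenames in os.walk(os.path.join(storage_path, 'blobs')):
    for name in filenames:
        print(os.path.join(root, name))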
As tkaitchuck mentions above, you can use the included LocalRawGcsService to pull the data out of the local.db. This is the only way to get the files, as they are stored in the local DB using the blobstore. Here's the original answer:
which are the files uri on GAE java emulating cloud storage with GCS client library?
