Using bottle.py and the GAE Blobstore - google-app-engine

I recently started using bottle and GAE blobstore and while I can upload the files to the blobstore I cannot seem to find a way to download them from the store.
I followed the examples from the documentation but was only successful with the uploading part. I cannot integrate the example into my app since I'm using a different framework from webapp/webapp2.
How would I go about creating an upload handler and download handler so that I can get the key of the uploaded blob and store it in my data model and use it later in the download handler?
I tried using BlobInfo.all() to query the blobstore, but I'm not able to get the key name field value of the entity.
This is my first interaction with the blobstore so I wouldn't mind advice on a better approach to the problem.

For serving a blob, I recommend looking at the source code of BlobstoreDownloadHandler. It should be easy to port to bottle, since there's nothing very framework-specific about it.
Here is an example of how to use BlobInfo.all():
for info in blobstore.BlobInfo.all():
    self.response.out.write('Name: %s Key: %s Size: %s Creation: %s ContentType: %s<br>' %
                            (info.filename, info.key(), info.size, info.creation, info.content_type))

For downloads, you only really need to generate a response that includes the header X-AppEngine-BlobKey: [your blob_key], along with anything else you need, like a Content-Disposition header if desired. Or, if it's an image, you should probably just use the high-performance image serving API: generate a URL and redirect to it. Done.
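Here's a minimal sketch of what that could look like in bottle (untested, and it assumes the blob key is passed in the URL):

from bottle import route, response, abort
from google.appengine.ext import blobstore

@route('/download/<blob_key>')
def download(blob_key):
    blob_info = blobstore.BlobInfo.get(blob_key)
    if blob_info is None:
        abort(404, 'No such blob')
    # App Engine's serving infrastructure sees this header and streams
    # the blob itself, so the response body can stay empty.
    response.content_type = blob_info.content_type or 'application/octet-stream'
    response.set_header('X-AppEngine-BlobKey', str(blob_info.key()))
    response.set_header('Content-Disposition',
                        'attachment; filename="%s"' % blob_info.filename)
    return ''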
For uploads, besides writing a handler for App Engine to call once the upload is safely in the blobstore (that's in the docs),
you need a way to find the blob info in the incoming request. I have no idea what the request looks like in bottle. BlobstoreUploadHandler has a get_uploads method, and there's really no reason it needs to be an instance method as far as I can tell. So here's an example generic implementation of it that expects a WebOb request. For bottle you would need to write something similar that is compatible with bottle's request object.
import cgi

from google.appengine.ext import blobstore

def get_uploads(request, field_name=None):
    """Get uploads for this request.

    Args:
        request: A WebOb-style request object.
        field_name: Only select uploads that were sent as a specific field.

    Returns:
        A list of BlobInfo records corresponding to each upload.
        Empty list if there are no blob-info records for field_name.

    Stolen from the SDK since they only provide a way to get to this
    crap through their crappy webapp framework.
    """
    if not getattr(request, '__uploads', None):
        request.__uploads = {}
        for key, value in request.params.items():
            if isinstance(value, cgi.FieldStorage):
                if 'blob-key' in value.type_options:
                    request.__uploads.setdefault(key, []).append(
                        blobstore.parse_blob_info(value))
    if field_name:
        try:
            return list(request.__uploads[field_name])
        except KeyError:
            return []
    else:
        results = []
        for uploads in request.__uploads.itervalues():
            results += uploads
        return results
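For example, in the handler App Engine POSTs to once the upload completes, usage might look like this (the 'file' field name is just whatever your form used):

# In your upload-complete handler (the URL passed to create_upload_url):
blob_infos = get_uploads(request, field_name='file')
if blob_infos:
    blob_key = blob_infos[0].key()
    # Store str(blob_key) on your model so the download handler can find it.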

For anyone looking for this answer in future, to do this you need bottle (d'oh!) and defnull's multipart module.
Since creating upload URLs is generally simple enough and as per GAE docs, I'll just cover the upload handler.
from bottle import request
from multipart import parse_options_header
from google.appengine.ext.blobstore import BlobInfo

def get_blob_info(field_name):
    try:
        field = request.files[field_name]
    except KeyError:
        # Maybe form isn't multipart or file wasn't uploaded, or some such error
        return None
    blob_data = parse_options_header(field.content_type)[1]
    try:
        return BlobInfo.get(blob_data['blob-key'])
    except KeyError:
        # Malformed request? Wrong field name?
        return None
Sorry if there are any errors in the code, it's off the top of my head.
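To round it out, here is a hedged sketch of the surrounding pieces, an upload form route plus a handler that uses get_blob_info (paths and field names are illustrative, untested):

from bottle import route, post, abort
from google.appengine.ext import blobstore

@route('/upload-form')
def upload_form():
    upload_url = blobstore.create_upload_url('/upload-handler')
    return ('<form action="%s" method="POST" enctype="multipart/form-data">'
            '<input type="file" name="file">'
            '<input type="submit"></form>' % upload_url)

@post('/upload-handler')
def upload_handler():
    blob_info = get_blob_info('file')
    if blob_info is None:
        abort(400, 'Upload failed')
    # Persist str(blob_info.key()) in your model for later serving.
    return 'Uploaded %s (%d bytes)' % (blob_info.filename, blob_info.size)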

Related

Flask Restplus multipart/form-data model

Is it possible with Flask Restplus to create a model for a multipart/form-data request so that I can use it to validate the input with @api.expect?
I have this complex data structure for which I've created a api.namespace().model that has to be received together with a file. However when I tried to document the endpoint I noticed that this doesn't seem to be supported by Flask Restplus.
I've tried to find something along the lines of
parser = ns.parser()
parser.add_argument("jsonModel", type=Model, location="form")
parser.add_argument("file", type=FileStorage, location="files")
and
formModel = ns.model("myForm", {"jsonModel": fields.Nested(myModel), "file": fields.File})
But neither method seems to support this kind of behavior.
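One workaround, a sketch only and not a confirmed fix: document the endpoint with a RequestParser, accept the JSON part as a plain form string, and validate it manually inside the handler (the ns namespace and field names come from the question; the manual json.loads step is my assumption):

import json
from flask_restplus import Resource, reqparse
from werkzeug.datastructures import FileStorage

parser = reqparse.RequestParser()
parser.add_argument('jsonModel', type=str, location='form', required=True)
parser.add_argument('file', type=FileStorage, location='files', required=True)

@ns.route('/upload')
class Upload(Resource):
    @ns.expect(parser)
    def post(self):
        args = parser.parse_args()
        payload = json.loads(args['jsonModel'])  # validate against myModel by hand
        uploaded = args['file']                  # a werkzeug FileStorage
        return {'received': uploaded.filename}, 201

This gets the multipart request documented in Swagger, at the cost of losing automatic model validation for the JSON part.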

How to setup uploading binary objects through flask_restless endpoint?

I am working on a REST python application and I have picked flask_restless to build endpoints connected to the database. One of the tables I would like to manage is storing binary files as blobs (LargeBinary).
I have noticed, though, that flask_restless requires json data for POST requests. I tried to apply base64 to the binary file contents and wrap it with json, but ultimately flask_restless passed file contents to sqlalchemy as a string and the SQLite backend complained that it requires bytes input (quite rightly so).
I tried searching the interwebs for a solution, but either I am formulating my query incorrectly, or actually there is none.
So, is there a way to configure the endpoint managed with flask_restless to accept binary file as an attachment? Or rather the suggested solution would be to setup the endpoint for that particular table directly with flask (I did that before in another app), away from flask_restless?
It turns out that sending an attachment is not possible.
So I dug deeper into how to send base64-encoded attachments which would then be saved as blobs.
For that I used pre- and post-processing facility of flask_restless:
import base64

def pp_get_single_image(result=None, **kw):
    result['image'] = base64.b64encode(result['image']).decode('utf8')

def pp_get_many_images(result=None, search_params=None, **kw):
    result['objects'] = [pp_get_single_image(d) or d for d in result['objects']]

def pp_post_image_in(data=None, **kw):
    data['image'] = base64.b64decode(data['image'])

def pp_post_image_out(result=None, **kw):
    result['image'] = base64.b64encode(result['image']).decode('utf8')

postprocessors = dict(GET_SINGLE=[pp_get_single_image],
                      GET_MANY=[pp_get_many_images],
                      POST=[pp_post_image_out])
preprocessors = dict(POST=[pp_post_image_in])

# The pre/postprocessors above live in a module imported here as pp_image:
manager = flask_restless.APIManager(app, flask_sqlalchemy_db=db)
manager.create_api(Image, methods=['GET', 'POST', 'DELETE'],
                   postprocessors=pp_image.postprocessors,
                   preprocessors=pp_image.preprocessors)
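A quick client-side sketch of a matching POST (the /api/image endpoint name follows flask_restless's default of /api/<tablename>; the file name is illustrative):

import base64
import requests

with open('photo.png', 'rb') as f:
    encoded = base64.b64encode(f.read()).decode('utf8')

resp = requests.post('http://localhost:5000/api/image',
                     json={'image': encoded})
print(resp.status_code, resp.json())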

How do we get the document file url using the Watson Discovery Service?

I don't see a solution to this using the available api documentation.
It is also not available on the web console.
Is it possible to get the file url using the Watson Discovery Service?
If you need to store the original source/file URL, you can include it as a field within your documents in the Discovery service, then you will be able to query that field back out when needed.
I also struggled with this request but ultimately got it working using Python bindings into Watson Discovery. The online documentation and API reference is very poor; here's what I used to get it working:
(Assume you have a Watson Discovery service and have created a collection.)
# Programmatic upload and retrieval of documents and metadata with Watson Discovery
import os
import json
from watson_developer_cloud import DiscoveryV1

discovery = DiscoveryV1(
    version='2017-11-07',
    iam_apikey='xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx',
    url='https://gateway-syd.watsonplatform.net/discovery/api'
)

environments = discovery.list_environments().get_result()
print(json.dumps(environments, indent=2))
This gives you your environment ID. Now append to your code:
collections = discovery.list_collections('{environment-id}').get_result()
print(json.dumps(collections, indent=2))
This will show you the collection ID for uploading documents into programmatically. You should have a document to upload (in my case, an MS Word document), and its accompanying URL from your own source document system. I'll use a trivial fictitious example.
NOTE: the documentation DOES NOT tell you to append , 'rb' to the end of the open statement, but it is required when uploading a Word document, as in my example below. Raw text / HTML documents can be uploaded without the 'rb' parameter.
url = {"source_url": "http://mysite/dis030.docx"}
with open(os.path.join(os.getcwd(), '{path to your document folder with trailing / }', 'dis030.docx'), 'rb') as fileinfo:
    add_doc = discovery.add_document('{environment-id}', '{collections-id}',
                                     metadata=json.dumps(url),
                                     file=fileinfo).get_result()
print(json.dumps(add_doc, indent=2))
print(add_doc["document_id"])
Note the setting up of the metadata as a JSON dictionary, and then encoding it using json.dumps within the parameters. So far I've only wanted to store the original source URL but you could extend this with other parameters as your own use case requires.
This call to Discovery gives you the document ID.
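For instance, the metadata dictionary can carry whatever extra fields you need (the extra keys here are purely illustrative):

# Any JSON-serializable fields can ride along as metadata.
metadata = json.dumps({
    "source_url": "http://mysite/dis030.docx",
    "department": "engineering",   # illustrative extra field
    "revision": 3                  # illustrative extra field
})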
You can now query the collection and extract the metadata using something like a Discovery query:
my_query = discovery.query('{environment-id}', '{collection-id}', natural_language_query="chlorine safety")
print(json.dumps(my_query.result["results"][0]["metadata"], indent=2))
Note: I'm extracting just the stored metadata here from within the overall returned results. If you instead just did print(my_query), you'd get the full response from Discovery ... but ... there's a lot to go through to identify just your own custom metadata.

Provide a callback URL in Google Cloud Storage signed URL

When uploading to GCS (Google Cloud Storage) using the BlobStore's createUploadURL function, I can provide a callback together with header data that will be POSTed to the callback URL.
There doesn't seem to be a way to do that with GCS's signed URLs.
I know there is Object Change Notification but that won't allow the user to provide upload specific information in the header of a POST, the way it is possible with createUploadURL's callback.
My feeling is, if createUploadURL can do it, there must be a way to do it with signed URLs, but I can't find any documentation on it. I was wondering if anyone may know how createUploadURL achieves that callback-calling behavior.
PS: I'm trying to move away from createUploadURL because of the __BlobInfo__ entities it creates, which for my specific use case I do not need, and somehow seem to be indelible and are wasting storage space.
Update: It worked! Here is how:
Short Answer: It cannot be done with PUT, but can be done with POST
Long Answer:
If you look at the signed-URL page, in front of HTTP_Verb, under Description, there is a subtle note that this page is only relevant to GET, HEAD, PUT, and DELETE, but POST is a completely different game. I had missed this, but it turned out to be very important.
There is a whole page of HTTP Headers that does not list an important header that can be used with POST; that header is success_action_redirect, as voscausa correctly answered.
In the POST page Google "strongly recommends" using PUT, unless dealing with form data. However, POST has a few nice features that PUT does not have. They may worry that POST gives us too many strings to hang ourselves with.
But I'd say it is totally worth dropping createUploadURL, and writing your own code to redirect to a callback. Here is how:
Code:
If you are working in Python voscausa's code is very helpful.
I'm using apejs to write javascript in a Java app, so my code looks like this:
var exp = new Date();
exp.setTime(exp.getTime() + 1000 * 60 * 100); // 100 minutes

json['GoogleAccessId'] = String(appIdentity.getServiceAccountName());
json['key'] = keyGenerator();
json['bucket'] = bucket;
json['Expires'] = exp.toISOString();
json['success_action_redirect'] = "https://" + request.getServerName() + "/test2/";
json['uri'] = 'https://' + bucket + '.storage.googleapis.com/';

var policy = {
    'expiration': json.Expires,
    'conditions': [
        ["starts-with", "$key", json.key],
        {'Expires': json.Expires},
        {'bucket': json.bucket},
        {"success_action_redirect": json.success_action_redirect}
    ]
};

var plain = StringToBytes(JSON.stringify(policy));
json['policy'] = String(Base64.encodeBase64String(plain));
var result = appIdentity.signForApp(Base64.encodeBase64(plain, false));
json['signature'] = String(Base64.encodeBase64String(result.getSignature()));
The code above first fills in the relevant fields.
Then it creates a policy object, stringifies it, and converts it into a byte array (you can use .getBytes in Java; I had to write a function for JavaScript).
A base64-encoded version of this array populates the policy field.
Then it is signed using the appIdentity package. Finally, the signature is base64 encoded, and we are done.
On the client side, all members of the json object will be added to the Form, except the uri which is the form's address.
var formData = new FormData(document.forms.namedItem('upload'));
var blob = new Blob([thedata], {type: 'application/json'});
var keys = ['GoogleAccessId', 'key', 'bucket', 'Expires',
            'success_action_redirect', 'policy', 'signature'];
for (field in keys)
    formData.append(keys[field], url[keys[field]]);
formData.append('file', blob);

var rest = new XMLHttpRequest();
rest.open('POST', url.uri);
rest.onload = callback_function;
rest.send(formData);
If you do not provide a redirect, the response status will be 204 on success; if you do redirect, the status will be 200. If you get a 403 or 400, something about the signature or policy is probably wrong. Look at the responseText; it is often helpful.
A few things to note:
Both POST and PUT have a signature field, but they mean slightly different things. In the case of POST, it is a signature of the policy.
PUT has a base URL which contains the key (object name), but the URL used for POST may only include the bucket name.
PUT requires the expiration as seconds from the UNIX epoch, but POST wants it as an ISO string.
A PUT signature should be URL encoded (Java: by wrapping it with a URLEncoder.encode call). But for POST, Base64 encoding suffices.
By extension, for POST use Base64.encodeBase64String(result.getSignature()), and do not use the Base64.encodeBase64URLSafeString function.
You cannot pass extra headers with the POST; only those listed in the POST page are allowed.
If you provide a URL for success_action_redirect, it will receive a GET with the key, bucket and eTag.
The other benefit of using POST is that you can provide size limits. With PUT, if a file breaches your size restriction, you can only delete it after it has been fully uploaded, even if it is multiple terabytes (see the policy sketch below).
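As a sketch of that last point, the size cap goes into the POST policy document as a content-length-range condition (bucket name, key prefix, and bounds below are illustrative; shown in Python for brevity):

import base64
import json

# Rejects any upload outside 0..10 MB before the object is stored.
policy = {
    "expiration": "2024-01-01T00:00:00Z",
    "conditions": [
        ["starts-with", "$key", "uploads/"],
        {"bucket": "my-bucket"},
        {"success_action_redirect": "https://example.com/callback"},
        ["content-length-range", 0, 10 * 1024 * 1024],
    ],
}
# This base64 string is what gets signed and placed in the form's policy field.
encoded_policy = base64.b64encode(json.dumps(policy).encode('utf-8'))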
What is wrong with createUploadURL?
The method above is a manual createUploadURL.
But:
You don't get those __BlobInfo__ objects which create many indexes and are indelible. This irritates me as it wastes a lot of space (which reminds me of a separate issue: issue 4231. Please go give it a star)
You can provide your own object name, which helps create folders in your bucket.
You can provide different expiration dates for each link.
For the very very few javascript app-engineers:
function StringToBytes(sz) {
    var map = function(x) { return x.charCodeAt(0); };
    return sz.split('').map(map);
}
You can include success_action_redirect in a policy document when you use GCS POST Object.
Docs here: https://cloud.google.com/storage/docs/xml-api/post-object
Python example here: https://github.com/voscausa/appengine-gcs-upload
Example callback result:
def ok(self):
    """ GCS upload success callback """
    logging.debug('GCS upload result : %s' % self.request.query_string)
    bucket = self.request.get('bucket', default_value='')
    key = self.request.get('key', default_value='')
    key_parts = key.rsplit('/', 1)
    folder = key_parts[0] if len(key_parts) > 1 else None
A solution I am using is to turn on Object Change Notifications. Any time an object is added, a POST is sent to a URL, in my case a servlet in my project.
In the doPost() I get all the info about the object added to GCS, and from there I can do whatever I need.
This worked great in my App Engine project.

Google App Engine Example of Uploading and Serving Arbitrary Files

I'd like to use GAE to allow a few users to upload files and later retrieve them. Files will be relatively small (a few hundred KB), so just storing stuff as a blob should work. I haven't been able to find any examples of something like this. There are a few image uploading examples out there but I'd like to be able to store word documents, pdfs, tiffs, etc. Any ideas/pointers/links? Thanks!
The same logic used for image uploads applies to other file types. To make the file downloadable, you add a Content-Disposition header so the user is prompted to download it. A simple webapp example:
class DownloadHandler(webapp.RequestHandler):
    def get(self, file_id):
        # Files is a model.
        f = Files.get_by_id(file_id)
        if not f:
            return self.error(404)
        # Set headers to prompt for download.
        headers = self.response.headers
        headers['Content-Type'] = f.content_type or 'application/octet-stream'
        headers['Content-Disposition'] = 'attachment; filename="%s"' % f.filename
        # Add the file contents to the response.
        self.response.out.write(f.contents)
(untested code, but you get the idea :)
It sounds like you want to use the Blobstore API.
You don't mention if you are using Python or Java so here are links to both.
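For the Python side, here's a minimal sketch built on the Blobstore handlers from the docs (URL paths are illustrative):

from google.appengine.ext import blobstore
from google.appengine.ext.webapp import blobstore_handlers
import webapp2

class UploadHandler(blobstore_handlers.BlobstoreUploadHandler):
    def post(self):
        upload = self.get_uploads('file')[0]  # BlobInfo for the uploaded file
        # Store str(upload.key()) on your own model here if you need to.
        self.redirect('/serve/%s' % upload.key())

class ServeHandler(blobstore_handlers.BlobstoreDownloadHandler):
    def get(self, blob_key):
        if not blobstore.get(blob_key):
            self.error(404)
        else:
            self.send_blob(blob_key, save_as=True)  # prompts a download

app = webapp2.WSGIApplication([
    ('/upload', UploadHandler),
    (r'/serve/([^/]+)?', ServeHandler),
], debug=True)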
I use the Blobstore API, which allows any file upload/download up to 50 MB.
