Programmatically emulating "gsutil mv" on appengine cloudstorage in python - google-app-engine

I would like to implement a mv (copy-in-the-cloud) operation on google cloud storage that is similar to how gsutil does it (http://developers.google.com/storage/docs/gsutil/commands/mv).
I read somewhere earlier that this involves a read and write (download and reupload) of the data, but I cannot find the passages again.
Is this the correct way to move a file in cloud storage, or does one have to go a level down to the boto library to avoid copying the data over the network for renaming the file?
istream = cloudstorage.open(src, mode='r')
ostream = cloudstorage.open(dst, content_type=src_content, mode='w')
while True:
buf = istream.read(500000)
if not buf:
break
ostream.write(buf)
istream.close()
ostream.close()
Update: I found the rest api that supports copy and compose operations and much more. It seems that there is hope that we do not have to copy data across continents to rename something.
Useful Links I have found sofar ...
Boto based approach: https://developers.google.com/storage/docs/gspythonlibrary
GCS Clinet Lib: https://developers.google.com/appengine/docs/python/googlecloudstorageclient/
GCS Lib: https://code.google.com/p/appengine-gcs-client
RAW JSON API: https://developers.google.com/storage/docs/json_api

Use the JSON API, there is a copy method. Here is the official example for Python, using the Python Google Api Client lib :
# The destination object resource is entirely optional. If empty, we use
# the source object's metadata.
if reuse_metadata:
destination_object_resource = {}
else:
destination_object_resource = {
'contentLanguage': 'en',
'metadata': {'my-key': 'my-value'},
}
req = client.objects().copy(
sourceBucket=bucket_name,
sourceObject=old_object,
destinationBucket=bucket_name,
destinationObject=new_object,
body=destination_object_resource)
resp = req.execute()
print json.dumps(resp, indent=2)

Related

How to setup uploading binary objects through flask_restless endpoint?

I am working on a REST python application and I have picked flask_restless to build endpoints connected to the database. One of the tables I would like to manage is storing binary files as blobs (LargeBinary).
I have noticed, though, that flask_restless requires json data for POST requests. I tried to apply base64 to the binary file contents and wrap it with json, but ultimately flask_restless passed file contents to sqlalchemy as a string and the SQLite backend complained that it requires bytes input (quite rightly so).
I tried searching the interwebs for a solution, but either I am formulating my query incorrectly, or actually there is none.
So, is there a way to configure the endpoint managed with flask_restless to accept binary file as an attachment? Or rather the suggested solution would be to setup the endpoint for that particular table directly with flask (I did that before in another app), away from flask_restless?
It turns out that sending an attachment is not possible.
So I dug deeper into how to send base64-encoded attachments which would then be saved as blobs.
For that I used pre- and post-processing facility of flask_restless:
def pp_get_single_image(result=None, **kw):
import base64
result['image'] = base64.b64encode(result['image']).decode('utf8')
def pp_get_many_images(result=None, search_params=None, **kw):
result['objects'] = [pp_get_single_image(d) or d for d in result['objects']]
def pp_post_image_in(data=None, **kw):
import base64
data['image'] = base64.b64decode(data['image'])
def pp_post_image_out(result=None, **kw):
import base64
result['image'] = base64.b64encode(result['image']).decode('utf8')
postprocessors=dict(GET_SINGLE=[pp_get_single_image], GET_MANY=[pp_get_many_images], POST=[pp_post_image_out])
preprocessors=dict(POST=[pp_post_image_in])
manager = flask_restless.APIManager(app, flask_sqlalchemy_db=db)
manager.create_api(Image, methods=['GET', 'POST', 'DELETE'],
postprocessors=pp_image.postprocessors,
preprocessors=pp_image.preprocessors)

Parsing Swagger JSON data and storing it in .net class

I want to parse Swagger data from the JSON I get from {service}/swagger/docs/v1 into dynamically generated .NET class.
The problem I am facing is that different APIs can have different number of parameters and operations. How do I dynamically parse Swagger JSON data for different services?
My end result should be list of all APIs and it's operations in a variable on which I can perform search easily.
Did you ever find an answer for this? Today I wanted to do the same thing, so I used the AutoRest open source project from MSFT, https://github.com/Azure/autorest. While it looks like it's designed for generating client code (code to consume the API documented by your swagger document), at some point on the way producing this code it had to of done exactly what you asked in your question - parse the Swagger file and understand the operations, inputs and outputs the API supports.
In fact we can get at this information - AutoRest publically exposes this information.
So use nuget to install AutoRest. Then add a reference to AutoRest.core and AutoRest.Model.Swagger. So far I've just simply gone for:
using Microsoft.Rest.Generator;
using Microsoft.Rest.Generator.Utilities;
using System.IO;
...
var settings = new Settings();
settings.Modeler = "Swagger";
var mfs = new MemoryFileSystem();
mfs.WriteFile("AutoRest.json", File.ReadAllText("AutoRest.json"));
mfs.WriteFile("Swagger.json", File.ReadAllText("Swagger.json"));
settings.FileSystem = mfs;
var b = System.IO.File.Exists("AutoRest.json");
settings.Input = "Swagger.json";
Modeler modeler = Microsoft.Rest.Generator.Extensibility.ExtensionsLoader.GetModeler(settings);
Microsoft.Rest.Generator.ClientModel.ServiceClient serviceClient;
try
{
serviceClient = modeler.Build();
}
catch (Exception exception)
{
throw new Exception(String.Format("Something nasty hit the fan: {0}", exception.Message));
}
The swagger document you want to parse is called Swagger.json and is in your bin directory. The AutoRest.json file you can grab from their GitHub (https://github.com/Azure/autorest/tree/master/AutoRest/AutoRest.Core.Tests/Resource). I'm not 100% sure how it's used, but it seems it's needed to inform the tool about what is supports. Both JSON files need to be in your bin.
The serviceClient object is what you want. It will contain information about the methods, model types, method groups
Let me know if this works. You can try it with their resource files. I used their ExtensionLoaderTests for reference when I was playing around(https://github.com/Azure/autorest/blob/master/AutoRest/AutoRest.Core.Tests/ExtensionsLoaderTests.cs).
(Also thank you to the Denis, an author of AutoRest)
If still a question you can use Swagger Parser library:
https://github.com/swagger-api/swagger-parser
as simple as:
// parse a swagger description from the petstore and get the result
SwaggerParseResult result = new OpenAPIParser().readLocation("https://petstore3.swagger.io/api/v3/openapi.json", null, null);

Google Cloud Storage: download a file with a different name

I'm wondering if it's possible to download a file from Google Cloud Storage with a different name than the one that has in the bucket.
For example, in Google Cloud Storage I have stored a file named 123-file.txt but when I download it I would like choose a different name, let's say file.txt
I've noticed that the link for download it is like:
https://storage.cloud.google.com/bucket_name%2F123-file.txt?response-content-disposition=attachment;%20filename=123-file.txt
So I've tried to change it to:
https://storage.cloud.google.com/bucket_name%2F123-file.txt?response-content-disposition=attachment;%20filename=file.txt
But it still keeps downloading as 123-file.txt instead of file.txt
The response-content-disposition parameter can only be used by authorized requests. Anonymous links don't work with it. You have a few options:
The content-disposition of a particular object is part of its metadata and can be permanently set. If you always want a specific file to be downloaded with a specific name, you can just permanently set the content-disposition metadata for the object.
You can also generate signed URLs that include the response-content-disposition query parameter. Then the users will be making authorized requests to download the resource.
example (first option Brandon Yarbrough) with javascript library:
const storage = new Storage()
const fileBucket = storage.bucket('myBucket')
const file = fileBucket.file('MyOriginalFile.txt')
const newName = "NewName.txt"
await file.save(content, {
metadata: {
contentDisposition: `inline; filename="${newName}"`
}
})
the following is a part of a python script i've used to remove the forward slashes - added by google cloud buckets when to represent directories - from multiple objects, it's based on this blog post, please keep in mind the double quotes around the content position "file name"
def update_blob_download_name(bucket_name):
""" update the download name of blobs and remove
the path.
:returns: None
:rtype: None
"""
# Storage client, not added to the code for brevity
client = initialize_google_storage_client()
bucket = client.bucket(bucket_name)
for blob in bucket.list_blobs():
if "/" in blob.name:
remove_path = blob.name[blob.name.rfind("/") + 1:] # rfind gives that last occurence of the char
ext = pathlib.Path(remove_path).suffix
remove_id = remove_path[:remove_path.rfind("_id_")]
new_name = remove_id + ext
blob.content_disposition = f'attachment; filename="{new_name}"'
blob.patch()

Using bottle.py and blobstore GAE

I recently started using bottle and GAE blobstore and while I can upload the files to the blobstore I cannot seem to find a way to download them from the store.
I followed the examples from the documentation but was only successful on the uploading part. I cannot integrate the example in my app since I'm using a different framework from webapp/2.
How would I go about creating an upload handler and download handler so that I can get the key of the uploaded blob and store it in my data model and use it later in the download handler?
I tried using the BlobInfo.all() to create a query the blobstore but I'm not able to get the key name field value of the entity.
This is my first interaction with the blobstore so I wouldn't mind advice on a better approach to the problem.
For serving a blob I would recommend you to look at the source code of the BlobstoreDownloadHandler. It should be easy to port it to bottle, since there's nothing very specific about the framework.
Here is an example on how to use BlobInfo.all():
for info in blobstore.BlobInfo.all():
self.response.out.write('Name:%s Key: %s Size:%s Creation:%s ContentType:%s<br>' % (info.filename, info.key(), info.size, info.creation, info.content_type))
for downloads you only really need to generate a response that includes the header "X-AppEngine-BlobKey:[your blob_key]" along with everything else you need like a Content-Disposition header if desired. or if it's an image you should probably just use the high performance image serving api, generate a url and redirect to it.... done
for uploads, besides writing a handler for appengine to call once the upload is safely in blobstore (that's in the docs)
You need a way to find the blob info in the incoming request. I have no idea what the request looks like in bottle. The Blobstoreuploadhandler has a get_uploads method and there's really no reason it needs to be an instance method as far as I can tell. So here's an example generic implementation of it that expects a webob request. For bottle you would need to write something similar that is compatible with bottles request object.
def get_uploads(request, field_name=None):
"""Get uploads for this request.
Args:
field_name: Only select uploads that were sent as a specific field.
populate_post: Add the non blob fields to request.POST
Returns:
A list of BlobInfo records corresponding to each upload.
Empty list if there are no blob-info records for field_name.
stolen from the SDK since they only provide a way to get to this
crap through their crappy webapp framework
"""
if not getattr(request, "__uploads", None):
request.__uploads = {}
for key, value in request.params.items():
if isinstance(value, cgi.FieldStorage):
if 'blob-key' in value.type_options:
request.__uploads.setdefault(key, []).append(
blobstore.parse_blob_info(value))
if field_name:
try:
return list(request.__uploads[field_name])
except KeyError:
return []
else:
results = []
for uploads in request.__uploads.itervalues():
results += uploads
return results
For anyone looking for this answer in future, to do this you need bottle (d'oh!) and defnull's multipart module.
Since creating upload URLs is generally simple enough and as per GAE docs, I'll just cover the upload handler.
from bottle import request
from multipart import parse_options_header
from google.appengine.ext.blobstore import BlobInfo
def get_blob_info(field_name):
try:
field = request.files[field_name]
except KeyError:
# Maybe form isn't multipart or file wasn't uploaded, or some such error
return None
blob_data = parse_options_header(field.content_type)[1]
try:
return BlobInfo.get(blob_data['blob-key'])
except KeyError:
# Malformed request? Wrong field name?
return None
Sorry if there are any errors in the code, it's off the top of my head.

Google App Engine Example of Uploading and Serving Arbitrary Files

I'd like to use GAE to allow a few users to upload files and later retrieve them. Files will be relatively small (a few hundred KB), so just storing stuff as a blob should work. I haven't been able to find any examples of something like this. There are a few image uploading examples out there but I'd like to be able to store word documents, pdfs, tiffs, etc. Any ideas/pointers/links? Thanks!
The same logic used for image uploads apply for other archive types. To make the file downloadable, you add a Content-Disposition header so the user is prompted to download it. A webapp simple example:
class DownloadHandler(webapp.RequestHandler):
def get(self, file_id):
# Files is a model.
f = Files.get_by_id(file_id)
if not f:
return self.error(404)
# Set headers to prompt for download.
headers = self.response.headers
headers['Content-Type'] = f.content_type or 'application/octet-stream'
headers['Content-Disposition'] = 'attachment; filename="%s"' % f.filename
# Add the file contents to the response.
self.response.out.write(f.contents)
(untested code, but you get the idea :)
It sounds like you want to use the Blobstore API.
You don't mention if you are using Python or Java so here are links to both.
I use blobstore API that admits any file upload/download up to 50 MB.

Resources