I have an API endpoint I'm trying to write unit tests for and I can't seem to figure out how to unit test the Python Google Cloud Storage client library calls (https://cloud.google.com/appengine/docs/python/googlecloudstorageclient/).
I was hoping to find a stub somewhere in the library and have it be as simple as unit testing the mail API would be (https://cloud.google.com/appengine/docs/python/tools/localunittesting?hl=en), but haven't found anything yet. Any idea how to go about this?
The list of available unit test stubs does not include GCS. You can file a feature request on their GitHub to add that functionality.
In the meantime, using setUp in your tests to create files is probably your best bet.
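For example, a setUp along these lines should work - this is only a sketch, and it assumes the GCS client library's local emulation rides on the SDK testbed stubs (the exact set of stubs needed can vary by SDK version):
import unittest
import cloudstorage
from google.appengine.ext import testbed

class GcsEndpointTest(unittest.TestCase):
    def setUp(self):
        self.testbed = testbed.Testbed()
        self.testbed.activate()
        # The local GCS emulation is layered on top of these SDK stubs.
        self.testbed.init_app_identity_stub()
        self.testbed.init_blobstore_stub()
        self.testbed.init_datastore_v3_stub()
        self.testbed.init_memcache_stub()
        self.testbed.init_urlfetch_stub()
        # Create a fixture file for the code under test to read.
        with cloudstorage.open('/test-bucket/fixture.txt', 'w',
                               content_type='text/plain') as gcs_file:
            gcs_file.write('hello')

    def tearDown(self):
        self.testbed.deactivate()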
Since we have been bitten by several Google API transitions so far (blobstore, blobstore with gs:/, cloudstorage, google-cloud-storage), we long ago created our own thin wrapper around all GCS access. This also includes stubbing for tests, like this:
import datetime
import logging
import os
from StringIO import StringIO
import cloudstorage
from google.appengine.api import app_identity

LOGGER = logging.getLogger(__name__)

def open(path, mode='w', bucket=None, content_type=None):
    if not bucket:
        bucket = app_identity.get_default_gcs_bucket_name()
    jsonpath = '/{}'.format(os.path.join(bucket, path))
    jsonpath = jsonpath.replace('*', str(datetime.date.today()))
    # During unit tests, skip GCS and return an in-memory buffer instead.
    if os.environ.get('GAETK2_UNITTEST'):
        LOGGER.info('running unittest, GCS disabled')
        return StringIO()
    return cloudstorage.open(jsonpath, mode, content_type=content_type)
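A test can then flip that flag and get a plain in-memory buffer back; for example (gcs_wrapper is just a made-up name for wherever the wrapper lives):
import os
import unittest
import gcs_wrapper  # hypothetical module holding the open() wrapper above

class ReportTest(unittest.TestCase):
    def setUp(self):
        # Flag checked by the wrapper so it returns a StringIO instead of GCS.
        os.environ['GAETK2_UNITTEST'] = '1'

    def test_write_report(self):
        fileobj = gcs_wrapper.open('reports/*.json')
        fileobj.write('{"ok": true}')
        self.assertIn('ok', fileobj.getvalue())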
Still a lot of work if you want to retrofit that onto a big application, but it might be worth it - the next Google API deprecation will come.
I've just started on Google Cloud and I'm creating an iOS app that interacts with Google Cloud services via a mobile backend. I'm using Python to write the backend on App Engine. I've gone through the tutorials on creating an API based on endpoints, but I have a question.
Do I have to create a Cloud Endpoints API and then an app on App Engine? Basically, I want to be able to register accounts in my iOS app and call an API which then makes use of Google Datastore to store the account details. From looking at the tutorials (both the Cloud Endpoints one and the guestbook one), am I meant to expose Google Datastore, Cloud Storage, etc. within the Endpoints API? Or does that link into another app where all of that is done?
Sorry if this sounds a bit silly, but I just want to make sure!
Thanks in advance.
In a nutshell, your Cloud Endpoints API is your application. Some of the documentation regarding Cloud Endpoints can be a bit confusing (or vague), but on the server side it's essentially a bunch of Python decorators or Java annotations that allow you to expose your application logic as a REST API.
I find the Java implementation of Cloud Endpoints more intuitive than the Python one, which requires a bit more work to (de-)serialise your objects. You could look at endpoints_proto_datastore.ndb.EndpointsModel which might take some of the boilerplate stuff out of the equation (defining messages).
Essentially, when you write your API, each endpoint maps to a Python function. Inside that function you can do what you like, but typically it will do one of two things:
Deserialise your POSTed JSON, validate it, and write some entities to Datastore (or Cloud SQL, Bigtable, wherever).
Read one or more entities from Datastore, serialise them to JSON, and return them to the client.
For example, you might define your API (the whole collection of endpoint functions) as
@endpoints.api(name='cafeApi', version='v1', description='Cafe API', audiences=[endpoints.API_EXPLORER_CLIENT_ID])
class CafeApi(remote.Service):
    # endpoints here
For example, you might have an endpoint to get nearby cafes:
@endpoints.method(GEO_RESOURCE, CafeListResponse, path='cafes/nearby', http_method='GET', name='cafes.nearby')
def get_nearby_cafes(self, request):
    """Get cafes close to specified lat,long"""
    cafes = list()
    for c in search.get_nearby_cafes(request.lat, request.lon):
        cafes.append(c.response_message())
    return CafeListResponse(cafes=cafes)
A couple of things to highlight here. With the Python Endpoints implementation, you need to define your resource and message classes - these are used to encapsulate request data and response bodies.
So, in the above example, GEO_RESOURCE encapsulates the fields required to make a GeoPoint (so we can search by location using Search API, but you might just search Datastore for Cafes with a 5-star rating):
GEO_RESOURCE = endpoints.ResourceContainer(
    message_types.VoidMessage,
    lat=messages.FloatField(1, required=True),
    lon=messages.FloatField(2, required=True)
)
and the CafeListResponse would just encapsulate a list of CafeResponse objects (with Cloud Endpoints you return a single object):
class CafeListResponse(messages.Message):
    cafes = messages.MessageField(CafeResponse, 1, required=False, repeated=True)
where CafeResponse is the message that defines how you want your objects (typically Datastore entities) serialised by your API, e.g.:
class CafeResponse(messages.Message):
    id = messages.StringField(1, required=False)
    coordinates = messages.MessageField(GeoMessage, 3, required=True)
    name = messages.StringField(4, required=False)
With that endpoint signature, you can access it via an HTTP GET at /cafeApi/v1/cafes/nearby?lat=...&lon=... or via, say, the JavaScript API client with cafeApi.cafes.nearby(...).
Personally, I found Flask a bit more flexible when working with Python to create a REST API.
On my appspot website, I use a third party API to query a large amount of data. The user then downloads the data in CSV. I know how to generate a csv and download it. The problem is that because the file is huge, I get the DeadlineExceededError.
I have tried increasing the fetch deadline to 60 seconds (urlfetch.set_default_fetch_deadline(60)). It doesn't seem reasonable to increase it any further.
What is the appropriate way to tackle this problem on Google App Engine? Is this something where I have to use Task Queue?
Thanks.
DeadlineExceededError means that your incoming request took longer than 60 seconds to complete; it is not about your URLFetch call.
Deploy the code that generates the CSV file into a different module that you set up with basic or manual scaling. The URL to download your CSV will then become http://module.domain.com
Requests can run indefinitely on modules with basic or manual scaling.
Alternatively, consider creating a file dynamically in Google Cloud Storage (GCS) with your CSV content. At that point, the file resides in GCS and you can generate a URL from which the user can download the file directly. There are also other options for different auth methods.
You can see documentation on doing this at
https://cloud.google.com/appengine/docs/python/googlecloudstorageclient/
and
https://cloud.google.com/appengine/docs/python/googlecloudstorageclient/functions
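For illustration, writing the CSV into GCS might look roughly like this (the object name and helper below are made up):
import cloudstorage
from google.appengine.api import app_identity

def write_csv_to_gcs(rows, object_name='exports/report.csv'):
    # Stream CSV rows into a GCS object and return its GCS filename.
    bucket = app_identity.get_default_gcs_bucket_name()
    filename = '/{}/{}'.format(bucket, object_name)
    with cloudstorage.open(filename, 'w', content_type='text/csv') as gcs_file:
        for row in rows:
            gcs_file.write(','.join(row) + '\n')
    return filename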
Important note: do not use the Files API (which was a common way of dynamically creating files in Blobstore/GCS) as it has been deprecated. Use the above-referenced Google Cloud Storage Client API instead.
Of course, you can delete the generated files after they've been successfully downloaded and/or you could run a cron job to expire links/files after a certain time period.
Depending on your specific use case, this might be a more effective path.
I want to produce a Google Apps document based on a (Google Docs) template stored on the user's Google Drive and some XML data held by a servlet running on Google App Engine.
Preferably I want to run as much as possible on GAE. Is it possible to run the Apps Service APIs on GAE, or to download/manipulate a Google Doc on GAE? I have not been able to find anything suitable.
One alternative is obviously to implement the merge functionality in an Apps Script, transferring the XML as parameters and initiating the script via HTTP from GAE, but that just seems somewhat awkward in comparison.
EDIT:
Specifically, I am looking for the replaceText script functionality, as shown in the Apps Script snippet below, to be implemented on GAE. The remaining code is supported through the Drive/Mail APIs, I guess.
// Get the document template, copy it as a new temp doc, and save the Doc's id
var copyId = DocsList.getFileById(providedTemplateId)
    .makeCopy('My-title')
    .getId();
var copyDoc = DocumentApp.openById(copyId);
var copyBody = copyDoc.getActiveSection();
// Replace placeholder keys
copyBody.replaceText("CustomerAddressee", fullName);
var todaysDate = Utilities.formatDate(new Date(), "GMT+2", "dd/MM-yyyy");
copyBody.replaceText("DateToday", todaysDate);
// Save and close the temporary document
copyDoc.saveAndClose();
// Convert the temporary document to PDF by using the getAs blob conversion
var pdf = DocsList.getFileById(copyId).getAs("application/pdf");
// Attach the PDF and send the email
MailApp.sendEmail({
  to: email_address,
  subject: "Proposal",
  htmlBody: "Hi,<br><br>Here is my file :)<br>Enjoy!<br><br>Regards Tony",
  attachments: pdf
});
As you already found out, Apps Script is currently the only way to access an API that can modify Google Docs. All other approaches cannot do it unless you export to another format (like PDF or .doc), use libraries that can modify those, and then re-upload the new file asking to convert it to the Google Docs native format, which in some cases would lose some formatting/comments/named ranges and other Google Docs features. So, as you said, if you must use the Google Docs API you must call Apps Script (as a content service). Also note that the sample Apps Script code you show is old and uses the deprecated DocsList, so you need to port it to the Drive API.
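For what it's worth, calling a script that is published as a web app from GAE can be as simple as a URL Fetch call - a rough sketch (the script URL and parameter names below are made up):
import urllib
from google.appengine.api import urlfetch

def call_merge_script(full_name, template_id):
    # Hypothetical /exec URL of an Apps Script web app that does the
    # copy + replaceText merge and returns the new document id.
    script_url = 'https://script.google.com/macros/s/YOUR_DEPLOYMENT_ID/exec'
    params = urllib.urlencode({'fullName': full_name, 'templateId': template_id})
    result = urlfetch.fetch('{}?{}'.format(script_url, params), deadline=30)
    return result.content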
Apps Script pretty much piggybacks on top of the standard published Google APIs. Increasingly, the behaviours are becoming more familiar.
Obviously Apps Script is JS-based and GAE is not. All the APIs, apart from those related to script running, are available in the standard GAE client runtimes.
No code to check here, so I'm afraid a generic answer is all I have.
I see now that it can be solved by using the Google Drive API to export (download) the Google Apps Doc file as PDF (or other formats) to GAE, and then doing simple replace-text editing using e.g. the iText library.
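A rough sketch of that export step, assuming a Drive v3 service object built with google-api-python-client (the helper name is made up):
from io import BytesIO
from googleapiclient.http import MediaIoBaseDownload

def export_doc_as_pdf(drive_service, file_id):
    # drive_service is an authorised Drive v3 client; returns the PDF bytes.
    request = drive_service.files().export_media(fileId=file_id,
                                                 mimeType='application/pdf')
    buf = BytesIO()
    downloader = MediaIoBaseDownload(buf, request)
    done = False
    while not done:
        _, done = downloader.next_chunk()
    return buf.getvalue()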
I'm following the guidelines and updating my code to use the new Cloud Storage API in GAE. I do need to set the cache-control headers; previously it was easy:
files.gs.create(filename, mime_type='image/png', acl='public-read', cache_control='public, max-age=100000, must-revalidate')
BUT, with the new API, the guidelines say that "cache_control" is not available...
I get this error when I tried to put cache_control inside the options:
ValueError: option cache_control is not supported.
I tried with Cache-Control and got the same error...
As usual, the documentation of the new API is not good.
Can someone help me with how to set the cache headers in the new Cloud Storage API using Python? In case it's not possible, can I still use the old API for my project?
Thanks.
You are right. As documented here,
the open function only supports x-goog-acl and x-goog-meta headers.
Cache control is likely to be added in the near future to make migration easier. Please note that the main value of the GCS client lib is buffered reads, buffered resumable writes, and automatic retries to overcome transient errors. Many other simple REST operations on GCS (e.g. cache, file copy, create bucket ...) can already be done with the Google API Client. The "downside" of the Google API Client is that, since it doesn't come directly from/for App Engine, it does not have dev appserver support.
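For example, the supported headers go into the options dict of cloudstorage.open (the bucket and metadata key below are made up), while cache control itself currently has to go through the Google API Client:
import cloudstorage

png_bytes = open('logo.png', 'rb').read()  # whatever image data you have

with cloudstorage.open('/my-bucket/images/logo.png', 'w',
                       content_type='image/png',
                       options={'x-goog-acl': 'public-read',
                                'x-goog-meta-source': 'upload-form'}) as gcs_file:
    gcs_file.write(png_bytes)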
I want to put a scraping service using Apache HttpClient in the cloud. I read that problems are possible with Google App Engine, as direct network access and thread creation are prohibited there. What about other cloud hosting providers? Does anyone have experience with Apache HttpClient in the cloud?
AppEngine has threads and direct network access (HTTP only). There is a workaround to make it work with HttpClient.
Also, if you plan to use many parse tasks in parallel, you might check out Task Queue or even mapreduce.
Btw, there is a "misfeature" in GAE: you cannot fully set a custom User-Agent header on your requests - GAE always appends "AppEngine" to the end of it (this breaks requests to certain sites - most notably iTunes).
It's certainly possible to create threads and access other websites from CloudFoundry; you're just time-limited for each process. For example, if you take a look at http://rack-scrape.cloudfoundry.com/, it's a simple Rack application that inspects the 'a' tags from Google.com:
require 'rubygems'
require 'open-uri'
require 'hpricot'

run Proc.new { |env|
  doc = Hpricot(open("http://www.google.com"))
  anchors = (doc/"a")
  [200, {"Content-Type" => "text/html"}, [anchors.inspect]]
}
As for Apache HttpClient, I have no experience with it, but I understand it isn't maintained any more.