Local unit testing for Google App Engine + Python WebApp2 - google-app-engine

I have a really simple web app. All the important stuff happens in index.py:
from google.appengine.api import users
import webapp2
import os
import jinja2
JINJA_ENVIRONMENT = jinja2.Environment(
loader=jinja2.FileSystemLoader(os.path.dirname(__file__)),
extensions=['jinja2.ext.autoescape'],
autoescape=True)
def get_user():
user = {}
user['email'] = str(users.get_current_user())
user['name'], user['domain'] = user['email'].split('#')
user['logout_link'] = users.create_logout_url('/')
return user
class BaseHandler(webapp2.RequestHandler):
def dispatch(self):
user = get_user()
template_values = {'user': user}
if user['domain'] != 'foo.com':
template_values['page_title'] = 'Access Denied'
template = '403'
else:
template_values['page_title'] = 'Home'
template = 'index'
template_engine = JINJA_ENVIRONMENT.get_template('%s.html' % template)
self.response.write(template_engine.render(template_values))
app = webapp2.WSGIApplication([
('/', BaseHandler),
], debug=True)
I'm trying to be a good person and write some local unit tests but - after looking at the documentation - I am totally out of my depth. All I want is a basic framework where I can do something like:
python test_security.py
and simulate two users hitting the domain - one #foo.com who should get the index template, and one #bar.com who should get the 403 template.
Here's where I've got so far:
import sys
# I don't want to talk about it, let's just ignore this block
sys.path.append('C:\Program Files (x86)\Google\google_appengine\lib\webapp2-2.5.2')
sys.path.append('C:\Program Files (x86)\Google\google_appengine\lib\webob-1.2.3')
sys.path.append('C:\Program Files (x86)\Google\google_appengine\lib\jinja2-2.6')
sys.path.append('C:\Program Files (x86)\Google\google_appengine\lib\yaml-3.10')
sys.path.append('C:\Program Files (x86)\Google\google_appengine\lib\jinja2-2.6')
sys.path.append('C:\Program Files (x86)\Google\google_appengine')
sys.path.append('C:\pytest')
# A few proper imports
import unittest
import webapp2
from google.appengine.ext import testbed
# Import the module I'd like to test
import index
class TestHandlers(unittest.TestCase):
def test_hello(self):
self.testbed = testbed.Testbed()
self.testbed.init_user_stub()
self.testbed.setup_env(USER_EMAIL='test#foo.com',USER_ID='1', USER_IS_ADMIN='0')
request = webapp2.Request.blank('/')
response = request.get_response(main.app)
print "running test"
self.assertEqual(response.status_int, 200)
self.assertEqual(response.body, 'Hello, world!')
Predictably, this doesn't work at all. What am I missing? Am I just wildly overestimating how simple this should be?

If you're planning on invoking this with "python test_security.py", the magic words you are looking for are:
if __name__ == '__main__':
unittest.main()
This will make your unit test run - at the moment all you're doing is defining it.
Note also that you'll need to change your request.get_response from "main.app" to "index.app".

I suspect (primarily based on the function names) that you should call self.testbed.init_user_stub() before calling self.testbed.setup_env(), not after.
Also you seem to be missing an initial self.testbed = testbed.Testbed() and possibly a testbed.activate() call.
You might want to check out this answer: https://stackoverflow.com/a/21139805/4495081

Related

sagemaker inference container ModuleNotFoundError: No module named 'model_handler'

I am trying to deploy a model using my own custom inference container on sagemaker. I am following the documentation here https://docs.aws.amazon.com/sagemaker/latest/dg/adapt-inference-container.html
I have an entrypoint file:
from sagemaker_inference import model_server
#HANDLER_SERVICE = "/home/model-server/model_handler.py:handle"
HANDLER_SERVICE = "model_handler.py"
model_server.start_model_server(handler_service=HANDLER_SERVICE)
I have a model_handler.py file:
from sagemaker_inference.default_handler_service import DefaultHandlerService
from sagemaker_inference.transformer import Transformer
from CustomHandler import CustomHandler
class ModelHandler(DefaultHandlerService):
def __init__(self):
transformer = Transformer(default_inference_handler=CustomHandler())
super(HandlerService, self).__init__(transformer=transformer)
And I have my CustomHandler.py file:
import os
import json
import pandas as pd
from joblib import dump, load
from sagemaker_inference import default_inference_handler, decoder, encoder, errors, utils, content_types
class CustomHandler(default_inference_handler.DefaultInferenceHandler):
def model_fn(self, model_dir: str) -> str:
clf = load(os.path.join(model_dir, "model.joblib"))
return clf
def input_fn(self, request_body: str, content_type: str) -> pd.DataFrame:
if content_type == "application/json":
items = json.loads(request_body)
for item in items:
processed_item1 = process_item1(items["item1"])
processed_item2 = process_item2(items["item2])
all_item1 += [processed_item1]
all_item2 += [processed_item2]
return pd.DataFrame({"item1": all_item1, "comments": all_item2})
def predict_fn(self, input_data, model):
return model.predict(input_data)
Once I deploy the model to an endpoint with these files in the image, I get the following error: ml.mms.wlm.WorkerLifeCycle - ModuleNotFoundError: No module named 'model_handler'.
I am really stuck what to do here. I wish there was an example of how to do this in the above way end to end but I don't think there is. Thanks!
This is because of the path mismatch. The entrypoint is trying to look for "model_handler.py" in WORKDIR directory of the container.
To avoid this, always specify absolute path when working with containers.
Moreover your code looks confusing. Please use this sample code as the reference:
import subprocess
from subprocess import CalledProcessError
import model_handler
from retrying import retry
from sagemaker_inference import model_server
import os
def _retry_if_error(exception):
return isinstance(exception, CalledProcessError or OSError)
#retry(stop_max_delay=1000 * 50, retry_on_exception=_retry_if_error)
def _start_mms():
# by default the number of workers per model is 1, but we can configure it through the
# environment variable below if desired.
# os.environ['SAGEMAKER_MODEL_SERVER_WORKERS'] = '2'
print("Starting MMS -> running ", model_handler.__file__)
model_server.start_model_server(handler_service=model_handler.__file__ + ":handle")
def main():
_start_mms()
# prevent docker exit
subprocess.call(["tail", "-f", "/dev/null"])
main()
Further, notice this line - model_server.start_model_server(handler_service=model_handler.__file__ + ":handle")
Here we are starting the server, and telling it to call handle() function in model_handler.py to invoke your custom logic for all incoming requests.
Also remember that Sagemaker BYOC requires model_handler.py to implement another function ping()
So your "model_handler.py" should look like this -
custom_handler = CustomHandler()
# define your own health check for the model over here
def ping():
return "healthy"
def handle(request, context): # context is necessary input otherwise Sagemaker will throw exception
if request is None:
return "SOME DEFAULT OUTPUT"
try:
response = custom_handler.predict_fn(request)
return [response] # Response must be a list otherwise Sagemaker will throw exception
except Exception as e:
logger.error('Prediction failed for request: {}. \n'
.format(request) + 'Error trace :: {} \n'.format(str(e)))

How can I run local unit tests on models that use Google Cloud Storage on GAE (python)

I'm trying to write unit tests for a model that calls on files stored on google cloud storage, but I haven't found any examples on how to simulate the GCS service for unit testing.
It seems that there should be a stub service I can use, and I see some references to gcs within the testbed docs described there, but haven't nailed down an example using it I can work off of.
Here's a condensed/example version of the model I have:
import peewee
from google.appengine.api import app_identity
import cloudstorage
class Example(Model):
uuid = peewee.CharField(default=uuid4)
some_property = peewee.CharField()
#property
def raw_file_one(self):
bucket = app_identity.get_default_gcs_bucket_name()
filename = '/{0}/repo_one/{1}'.format(bucket, self.uuid)
with cloudstorage.open(filename, 'r') as f:
return f.read()
def store_raw_file(self, raw_file_one):
bucket = app_identity.get_default_gcs_bucket_name()
filename = '/{0}/stat_doms/{1}'.format(bucket, self.uuid)
with cloudstorage.open(filename, 'w') as f:
f.write(raw_file_one)
I'll build test cases with:
import unittest
from google.appengine.ext import testbed
class TestCase(unittest.TestCase):
def run(self, *args, **kwargs):
self.stub_google_services()
result = super(TestCase, self).run(*args, **kwargs)
self.unstub_google_services()
return result
def stub_google_services(self):
self.testbed = testbed.Testbed()
self.testbed.activate()
self.testbed.init_all_stubs()
def unstub_google_services(self):
self.testbed.deactivate()
Into module tests like:
from application.tests.testcase import TestCase
from application.models import Example
class ExampleTest(TestCase):
def test_store_raw_file(self):
...
[assert something]
I presume I'd do something like blobstore = self.testbed.get_stub('blobstore') to create a service I can perform tests on (e.g. blobstore.CreateBlob(blob_key, image)) -- but I don't see a service for GCS in the testbed ref docs.
Thoughts on how to implement unit tests with GCS?
I think you're looking for:
from google.appengine.ext.cloudstorage import cloudstorage_stub
from google.appengine.api.blobstore import blobstore_stub
and:
blob_stub = blobstore_stub.BlobstoreServiceStub(blob_storage)
storage_stub = cloudstorage_stub.CloudStorageStub(blob_storage)
testbed._register_stub('blobstore', self.blob_stub)
testbed._register_stub("cloudstorage", self.storage_stub)

How to call an instance of webapp.RequestHandler class inside other module?

I am trying to create a simple web application using Google App Engine. I use jinja2 to render a frontend html file. User enters their AWS credentials and gets the output of regions and connected with them virtual machines.
I have a controller file, to which I import a model file and it looks like this:
import webapp2
import jinja2
import os
import model
jinja_environment = jinja2.Environment(loader=jinja2.FileSystemLoader(os.path.dirname(__file__)))
class MainPage(webapp2.RequestHandler):
def get(self):
template = jinja_environment.get_template('index.html')
self.response.out.write(template.render())
def request_a(self):
a = self.reguest.get('a')
return a
def request_b(self):
b = self.reguest.get('b')
return b
class Responce(webapp2.RequestHandler):
def get(self):
self.response.headers['Content-Type'] = 'text/plain'
self.response.write(testing_ec2.me.get_machines())
app = webapp2.WSGIApplication([('/2', MainPage), ('/responce', Responce)], debug=True)
then I have a model file, to which I import controller file and it looks like this:
import boto.ec2
import controller
import sys
if not boto.config.has_section('Boto'):
boto.config.add_section('Boto')
boto.config.set('Boto', 'https_validate_certificates', 'False')
a = controller.MainPage.get()
b = controller.MainPage.get()
class VM(object):
def __init__(self, a, b):
self.a = a
self.b = b
self.regions = boto.ec2.regions(aws_access_key_id = a, aws_secret_access_key = b)
def get_machines(self):
self.region_to_instance = {}#dictionary, which matches regions and instances for this region
for region in self.regions:
conn = region.connect(aws_access_key_id = self.a, aws_secret_access_key = self.b)
reservations = conn.get_all_instances()#get reservations(need understand them better)
if len(reservations) > 0:#if there are reservations in this region
self.instances = [i for r in reservations for i in r.instances]#creates a list of instances for that region
self.region_to_instance[region.name] = self.instances
return self.region_to_instance
me = VM(a, b)
me.get_machines()
When I run this, it throws an error: type object 'MainPage' has no attribute 'request_a'
I assume, that it happens, because I do not call an instance of MainPage class and instead call a class itself. What is an instance of MainPage(and it`s parent webapp.RequestHandler) class? How do I call it inside another module?
Your code looks very strange to me. I do not understand your coding practice.
The general answer is : if you like to use methods of your MainPage, you can use inheritance.
But. If I understand the goal of your code. Why not call boto from your Responce class. But, here you use a get, where you should use a post, because you post a form with AWS credentials.
So I suggest :
create a MainPage with get and post methods to handle the form
in the the post method make the boto requests and send the result with jinja to the user.
See also Getting started with GAE Python27 and handling forms:
https://developers.google.com/appengine/docs/python/gettingstartedpython27/handlingforms?hl=nl

<class 'google.appengine.runtime.DeadlineExceededError'>: how to get around?

Ok guys I am having tons of problems getting my working dev server to a working production server :). I have a task that will go through and request urls and collect and update data. It takes 30 minutes to run.
I uploaded to production server and going to the url with its corresponding .py script appname.appspot.com/tasks/rrs after 30 seconds I am getting the class google.appengine.runtime.DeadlineExceededError' Is there any way to get around this? Is this a 30 second deadline for a page? This script works fine in development server I go to the url and the associate .py script runs until completion.
import time
import random
import string
import cPickle
from StringIO import StringIO
try:
import json
except ImportError:
import simplejson as json
import urllib
import pprint
import datetime
import sys
sys.path.append("C:\Program Files (x86)\Google\google_appengine")
sys.path.append("C:\Program Files (x86)\Google\google_appengine\lib\yaml\lib")
sys.path.append("C:\Program Files (x86)\Google\google_appengine\lib\webob")
from google.appengine.api import users
from google.appengine.ext import webapp
from google.appengine.ext.webapp.util import run_wsgi_app
from google.appengine.ext import db
class SR(db.Model):
name = db.StringProperty()
title = db.StringProperty()
url = db.StringProperty()
##request url and returns JSON_data
def overview(page):
u = urllib.urlopen(page)
bytes = StringIO(u.read())
##print bytes
u.close()
try:
JSON_data = json.load(bytes)
return JSON_data
except ValueError,e:
print e," Couldn't get .json for %s" % page
return None
##specific code to parse particular JSON data and append new SR objects to the given url list
def parse_json(JSON_data,lists):
sr = SR()
sr.name = ##data gathered
sr.title = ##data gathered
sr.url = ##data gathered
lists.append(sr)
return lists
## I want to be able to request lets say 500 pages without timeing out
page = 'someurlpage.com'##starting url
url_list = []
for z in range(0,500):
page = 'someurlpage.com/%s'%z
JSON_data = overview(page)##get json data for a given url page
url_list = parse_json(JSON_data,url_list)##parse the json data and append class objects to a given list
db.put(url_list)##finally add object to gae database
Yes, the App Engine imposes a 30 seconds deadline. One way around it might be a try/except DeadlineExceededError and putting the rest in a taskqueue.
But you can't make your requests run for a longer period.
You can also try Bulkupdate
Example:
class Todo(db.Model):
page = db.StringProperty()
class BulkPageParser(bulkupdate.BulkUpdater):
def get_query(self):
return Todo.all()
def handle_entity(self, entity):
JSON_data = overview(entity.page)
db.put(parse_json(JSON_data, [])
entity.delete()
# Put this in your view code:
for i in range(500):
Todo(page='someurlpage.com/%s' % i).put()
job = BulkPageParser()
job.start()
ok so if I am dynamically adding links as I am parsing the pages, I would add to the todo queue like so I believe.
def handle_entity(self, entity):
JSON_data = overview(entity.page)
data_gathered,new_links = parse_json(JSON_data, [])##like earlier returns the a list of sr objects, and now a list of new links/pages to go to
db.put(data_gathered)
for link in new_links:
Todo(page=link).put()
entity.delete()

Using Google App Engine's Cron service to extract data from a URL

I need to scrape a simple webpage which has the following text:
Value=29
Time=128769
The values change frequently.
I want to extract the Value (29 in this case) and store it in a database. I want to scrape this page every 6 hours. I am not interested in displaying the value anywhere, I just am interested in the cron. Hope I made sense.
Please advise me if I can accomplish this using Google's App Engine.
Thank you!
Please advise me if I can accomplish
this using Google's App Engine.
Sure! E.g., in Python, urlfetch (with the URL as argument) to get the contents, then a simple re.search(r'Value=(\d+)').group(1) (if the contents are as simple as you're showing) to get the value, and a db.put to store it. Do you want the Python details spelled out, or do you prefer Java?
Edit: urllib / urllib2 would also be feasible (GAE does support them now).
So cron.yaml should be something like:
cron:
- description: refresh "value"
url: /refvalue
schedule: every 6 hours
and app.yaml something like:
application: valueref
version: 1
runtime: python
api_version: 1
handlers:
- url: /refvalue
script: refvalue.py
login: admin
You may have other entries in either or both, of course, but this is the subset needed to "refresh the value". A possible refvalue.py might be:
import re
import wsgiref.handlers
from google.appengine.ext import db
from google.appengine.ext import webapp
from google.appengine.api import urlfetch
class Value(db.Model):
thevalue = db.IntegerProperty()
when = db.DateTimeProperty(auto_now_add=True)
class RefValueHandler(webapp.RequestHandler):
def get(self):
resp = urlfetch.fetch('http://whatever.example.com')
mo = re.match(r'Value=(\d+)', resp.content)
if mo:
val = int(mo.group(1))
else:
val = None
valobj = Value(thevalue=val)
valobj.put()
def main():
application = webapp.WSGIApplication(
[('/refvalue', RefValueHandler),], debug=True)
wsgiref.handlers.CGIHandler().run(application)
if __name__ == '__main__':
main()
Depending on what else your web app is doing, you'll probably want to move the class Value to a separate file (e.g. models.py with other models) which of course you'll then have to import (from this .py file and from others which do something interesting with all of your saved values). Here I've taken some possible anomalies into account (no Value= found on the target page) but not others (the target page's server does not respond or gives an error); it's hard to know exactly what anomalies you need to consider and what you want to do if they occur (what I'm doing here is very simply recording None as the value at the anomaly's time, but you may want to do more... or less -- I'll leave that up to you!-)

Resources