GAE Python Testing - google-app-engine

First I'm quite new to GAE/Python, please bear with me. Here's the situation
Have a test_all.py which tests all the test suites in my package. The TestCase's setUp current look like;
import test_settings as policy # has consistency variables <- does not work
...
def setUp(self):
# First, create an instance of the Testbed class.
self.testbed = testbed.Testbed()
# Then activate the testbed, which prepares the service stubs for use.
self.testbed.activate()
# Consistency policy nomal operations
self.policy = None
if policy.policy_flag == 'STRICT':
# Create a consistency policy that will simulate the High Replication consistency model.
self.policy = datastore_stub_util.PseudoRandomHRConsistencyPolicy(probability=0)
# Initialize the datastore stub with this policy.
self.testbed.init_datastore_v3_stub(consistency_policy=self.policy)
This is my primitive attempt to setup the datastore with different consistency policies to run tests against them. Total fail.
What I want is to run my test cases against different datastore consistencies at one go from my root test_all.py Using the above or other, how can I do this? How do I pass in parameters to the TestCase from the test runner?

Python, unit test - Pass command line arguments to setUp of unittest.TestCase
The above thread shows exactly how to best pass runtime arguments to the test case. Specifically answer 2.

Related

how to run a batch transform job in sagemaker pipeline via custom inference code?

based on the aws documentation/example provided here , https://sagemaker-examples.readthedocs.io/en/latest/sagemaker-pipelines/tabular/abalone_build_train_deploy/sagemaker-pipelines-preprocess-train-evaluate-batch-transform.html#Define-a-Transform-Step-to-Perform-Batch-Transformation, a model is created and a batch transform inference can be run on the trained model. it works for this example but if we need a custom inference script, How do we include a custom inference script in the model or model package before we run the batch transformation ?
from sagemaker.transformer import Transformer
from sagemaker.inputs import TransformInput
from sagemaker.workflow.steps import TransformStep
transformer = Transformer(
model_name=step_create_model.properties.ModelName,
instance_type="ml.m5.xlarge",
instance_count=1,
output_path=f"s3://{default_bucket}/AbaloneTransform",
)
step_transform = TransformStep(
name="AbaloneTransform", transformer=transformer, inputs=TransformInput(data=batch_data)
)
You need a "model repacking step".
From the Amazon SageMaker Workflows FAQ:
Model repacking happens when the pipeline needs to include a custom
script in the compressed model file (model.tar.gz) to be uploaded to
Amazon S3 and used to deploy a model to a SageMaker endpoint. When
SageMaker pipeline trains a model and registers it to the model
registry, it introduces a repack step if the trained model output from
the training job needs to include a custom inference script. The
repack step uncompresses the model, adds a new script, and
recompresses the model. Running the pipeline adds the repack step as a
training job.
Basically, you can redefine a sagemaker Model by calling the training output as model_data and pass the inference script as entry_point.
Then sequentially, after training the model, redefine the model by changing the entry_point and on the latter you can use the transformer.
This is an example flow taken from a tested code:
my_model = Model(
image_uri=your_img_uri,
model_data=step_train.properties.ModelArtifacts.S3ModelArtifacts,
role=role,
entry_point='inference_script.py',
name="your_inference_step_name"
)
step_create_model = ModelStep(
name="YourInfName",
step_args=my_model.create(instance_type="ml.m5.xlarge")
)
transformer = Transformer(
model_name=step_create_model.properties.ModelName,
instance_count=your_instance_count,
instance_type=your_instance_type,
output_path=your_path
)
Of course, instead of the generic Model, you can directly use the one that best suits your requirements (e.g. PyTorchModel etc...).

How to clear/teardown db with pytest in Flask app

I know this might seem like duplication, I have found similar questions on SO on this topic, but none of them really worked for me. I simply need to clear out (or teardown) the database after each test, so every test works with a new empty one.
I am using fixture and my code looks like this:
#pytest.fixture(scope="module", autouse=True)
def test_client_db():
# set up
app.config["SQLALCHEMY_DATABASE_URI"] = "sqlite:///"
with app.app_context():
db.init_app(app)
db.create_all()
testing_client = app.test_client()
ctx = app.app_context()
ctx.push()
# do the testing
yield testing_client
# tear down
with app.app_context():
db.session.remove()
db.drop_all()
ctx.pop()
I am new to pytest and from what I have learnt, whatever goes before yield works as sort of a "set up", whatever goes after works as "teardown". Yet, when I run several tests, the database is not clear for each test, it holds data between them.
Why is it so? What is wrong with this fixture? What am i missing?
You have set the scope to module - this means the fixture will only be reset after all tests in a module had run.
Either set the scope to function or leave it completely, as function is the default.
See https://docs.pytest.org/en/stable/fixture.html#fixture-scopes

Persistant login during django selenium tests

Here the snippet I'm using for my end-to-end tests using selenium (i'm totally new in selenium django testing) ;
from django.contrib.auth.models import User
from django.contrib.staticfiles.testing import StaticLiveServerTestCase
from selenium.webdriver.chrome.webdriver import WebDriver
class MyTest(StaticLiveServerTestCase):
#classmethod
def setUpClass(cls):
super(DashboardTest, cls).setUpClass()
cls.selenium = WebDriver()
cls.user = User.objects.create_superuser(username=...,
password=...,
email=...)
time.sleep(1)
cls._login()
#classmethod
def _login(cls):
cls.selenium.get(
'%s%s' % (cls.live_server_url, '/admin/login/?next=/'))
...
def test_login(self):
self.selenium.implicitly_wait(10)
self.assertIn(self.username,
self.selenium.find_element_by_class_name("fixtop").text)
def test_go_to_dashboard(self):
query_json, saved_entry = self._create_entry()
self.selenium.get(
'%s%s' % (
self.live_server_url, '/dashboard/%d/' % saved_entry.id))
# assert on displayed values
def self._create_entry():
# create an entry using form and returns it
def test_create(self):
self.maxDiff = None
query_json, saved_entry = self._create_entry()
... assert on displayed values
I'm noticed that between each test the login is not persistant. So i can use _login in the setUp but make my tests slower.
So how to keep persistant login between test ? What are the best practices for testing those tests (djnago selenium tests) ?
Through-the-browser tests with Selenium are slow, period. They are, however, very valuable as they're the best shot you have at automating the true user experience.
You shouldn't try to write true unit tests with Selenium. Instead, use it to write one or two large functional tests. Try to capture an entire user interaction from start to finish. Then structure your test suite so that you can run your fast, non-Selenium unit tests separately, and only have to run the slow functional tests on occasion.
Your code looks fine, but in this scenario you'd combine test_go_to_dashboard and test_create into one method.
kevinharvey pointed me to the solution! Finally found out a way to reduce time of testing and keeping track of all tests:
I have renamed all methods starting with test.. to _test_.. and added a main method that calls each _test_ method:
def test_main(self):
for attr in dir(self):
# call each test and avoid recursive call
if attr.startswith('_test_') and attr != self.test_main.__name__:
with self.subTest("subtest %s " % attr):
self.selenium.get(self.live_server_url)
getattr(self, attr)()
This way, I can test (debug) individually each method :)

App Engine + NoseGAE bizarre broken test

I'm using the NoseGAE to write local unit tests for my App Engine application, however something is suddenly going wrong with one of my tests. I have standard setUp and tearDown functions, but one test seemingly broke for a reason I can't discern. Even stranger, setUp and tearDown are NOT getting called each time. I added global variables to count setUp/tearDown calls, and on my 4th test (the now seemingly broken one), setUp has been called twice and tearDown has been called once. Further, one of the objects from the third test exists when I query it by id, but not in a general query for its type. Here's some code that gives the bizarre picture:
class GameTest(unittest.TestCase):
def setUp(self):
self.testapp = webtest.TestApp(application)
self.testbed = testbed.Testbed()
self.testbed.activate()
self.testbed.init_datastore_v3_stub(
consistency_policy=datastore_stub_util.PseudoRandomHRConsistencyPolicy(probability=1),
require_indexes=True,
root_path="%s/../../../" % os.path.dirname(__file__)
)
def tearDown(self):
self.testbed.deactivate()
self.testapp.cookies.clear()
def test1(self):
...
def test2(self):
...
def test3(self):
...
# I create a Game object with the id 123 in this particular test
Game(id=123).put()
...
def test4(self):
print "id lookup: ", Game.get_by_id(123)
print "query: ", Game.query().get()
self.assertIsNone(Game.get_by_id(123))
This is an abstraction of the tests, but illustrates the issue.
The 4th test fails because it asserts that an object with that id does not exist. When I print out the two statements:
id lookup: Game(key=Key('Game', 123))
query: None
The id lookup shows the object created in test3, but the query lookup is EMPTY. This makes absolutely no sense to me. Further, I am 100% sure the test was working earlier. Does anyone have any idea how this is even possible? Could I possibly have some local corrupted file causing an issue?
I somewhat "solved" this. This issue only reproduced when I had other test cases in other files that were failing. Once I solved those, all my tests passed. I still don't fully understand why other failing tests should cause these bizarre issues with the testbed, but to anyone else having this issue, try fixing your other test cases first and see if that doesn't cause it to go away.

Automatically cached models in App Engine

I've been working on creating a subclass of db.Model that is automatically cached, i.e.:
instance.put would store the entity in memcache before persisting it to the datastore
class.get_by_key_name would first check the cache, and if missed, would go to the datastore to retrieve it and cache it after retrieval
I developed the approach below (which appears to work for me), but I have a few questions:
I had read Nick Johnson's article on efficient model memcaching which suggests implementing the serialization for memcache through protocol buffers. Looking at the memcache API source code in the SDK, it looks like Google has already implemented protobuf serialization by default. Is my interpretation correct?
Am I missing some important details (which could get me in the future) in the way I am subclassing db.Model or overriding the two methods?
Is there a more efficient way of implementing what I've done below?
Are there guidelines, benchmarks or best practices for when such entity caching would make sense from a performance perspective? Or would it always make sense to cache entities? On a related note, should I be reading anything into the fact that Google hasn't provided a cached model in the modeling API? Are there too many special cases to be thinking about?
Below is my current implementation. I would really appreciate any and all guidance/suggestions on caching entities (even if your response is not a direct answer to one of the 4 questions above, but relevant to the topic overall).
from google.appengine.ext import db
from google.appengine.api import memcache
import os
import logging
class CachedModel(db.Model):
'''Subclass of db.Model that automatically caches entities for put and
attempts to load from cache for get_by_key_name
'''
#classmethod
def get_by_key_name(cls, key_names, parent=None, **kwargs):
cache = memcache.Client()
# Ensure that every new deployment of the application results in a cache miss
# by including the application version ID in the namespace of the cache entry
namespace = os.environ['CURRENT_VERSION_ID'] + '_' + cls.__name__
if not isinstance(key_names, list):
key_names = [key_names]
entities = cache.get_multi(key_names, namespace=namespace)
if entities:
logging.info('%s (namespace=%s) retrieved from memcache' % (str(entities.keys()), namespace))
missing_key_names = list(set(key_names) - set(entities.keys()))
# For keys missed in memcahce, attempt to retrieve entities from datastore
if missing_key_names:
missing_entities = super(CachedModel, cls).get_by_key_name(missing_key_names, parent, **kwargs)
missing_mapping = zip(missing_key_names, missing_entities)
# Determine entities that exist in datastore and store them to memcache
entities_to_cache = dict()
for key_name, entity in missing_mapping:
if entity:
entities_to_cache[key_name] = entity
if entities_to_cache:
logging.info('%s (namespace=%s) cached by get_by_key_name' % (str(entities_to_cache.keys()), namespace))
cache.set_multi(entities_to_cache, namespace=namespace)
non_existent = set(missing_key_names) - set(entities_to_cache.keys())
if non_existent:
logging.info('%s (namespace=%s) missing from cache and datastore' % (str(non_existent), namespace))
# Combine entities retrieved from cache and entities retrieved from datastore
entities.update(missing_mapping)
if len(key_names) == 1:
return entities[key_names[0]]
else:
return [entities[key_name] for key_name in key_names]
def put(self, **kwargs):
cache = memcache.Client()
namespace = os.environ['CURRENT_VERSION_ID'] + '_' + self.__class__.__name__
cache.set(self.key().name(), self, namespace=namespace)
logging.info('%s (namespace=%s) cached by put' % (self.key().name(), namespace))
return super(CachedModel, self).put(**kwargs)
Rather than reinventing the wheel, why not switch to NDB, which already implements memcaching of model instances?
You might check out Nick Johnson's article on adding pre and post hooks for data model classes as an alternative to overriding get_by_key_name. That way your hook could work even when using db.get and db.put.
That said, I've found in my app that I've had more dramatic performance improvements caching things at a higher level - like all the content I need to render an entire page, or the page's html itself if possible.
You also might check out the asynctools library which can help you run datastore queries in parallel and cache the results.
I lot of good tips from Nick Johnson's you want implement are already implemented in the module appengine-mp. like serialization via protocolbuf or prefetching entities.
About your method get_by_key_names you can check the code. If you want create your own db.Model layer, maybe that can help you but you can also contribute to improve the existing model. ;)

Resources