What is the best way to implement a context_task for periodic_tasks in Huey? - peewee

I'm looking for the best way to implement Peewee's context manager with periodic tasks in Huey. Normal tasks have that nice little Huey.context_task() decorator, but there doesn't seem to be anything similar for periodic tasks.
Am I correct to assume I will just have to use an (uglier) with statement within periodic tasks?

Should be able to do something like this:

from functools import wraps

from huey import RedisHuey, crontab
from peewee import PostgresqlDatabase

huey = RedisHuey()
db = PostgresqlDatabase(...)

def db_periodic_task(*args, **kwargs):
    def decorator(fn):
        @wraps(fn)
        def new_task():
            with db:
                fn()
        return huey.periodic_task(*args, **kwargs)(new_task)
    return decorator

@db_periodic_task(crontab('0', '0'))
def my_periodic_task():
    # db will have been opened at start.
    # db will then be closed upon return.
    pass
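For what it's worth, the wrapping pattern above can be sketched in isolation, without Huey or peewee, using a stand-in context manager (FakeDB here is a hypothetical stand-in for the peewee database, recording open/close events so the behavior is visible):

```python
from functools import wraps

class FakeDB:
    """Hypothetical stand-in for a peewee database; records lifecycle events."""
    def __init__(self):
        self.events = []
    def __enter__(self):
        self.events.append('open')
        return self
    def __exit__(self, *exc):
        self.events.append('close')
        return False

db = FakeDB()

def with_db(fn):
    @wraps(fn)
    def wrapper(*args, **kwargs):
        with db:  # connection opened on entry, closed on exit, even on error
            return fn(*args, **kwargs)
    return wrapper

@with_db
def my_task():
    db.events.append('work')

my_task()
print(db.events)  # → ['open', 'work', 'close']
```

The real decorator does exactly this, then hands the wrapper to huey.periodic_task for scheduling.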

Related

Persistent login during Django Selenium tests

Here is the snippet I'm using for my end-to-end tests with Selenium (I'm totally new to Selenium testing in Django):
import time

from django.contrib.auth.models import User
from django.contrib.staticfiles.testing import StaticLiveServerTestCase
from selenium.webdriver.chrome.webdriver import WebDriver

class MyTest(StaticLiveServerTestCase):
    @classmethod
    def setUpClass(cls):
        super(MyTest, cls).setUpClass()
        cls.selenium = WebDriver()
        cls.user = User.objects.create_superuser(username=...,
                                                 password=...,
                                                 email=...)
        time.sleep(1)
        cls._login()

    @classmethod
    def _login(cls):
        cls.selenium.get(
            '%s%s' % (cls.live_server_url, '/admin/login/?next=/'))
        ...

    def test_login(self):
        self.selenium.implicitly_wait(10)
        self.assertIn(self.username,
                      self.selenium.find_element_by_class_name("fixtop").text)

    def test_go_to_dashboard(self):
        query_json, saved_entry = self._create_entry()
        self.selenium.get(
            '%s%s' % (
                self.live_server_url, '/dashboard/%d/' % saved_entry.id))
        # assert on displayed values

    def _create_entry(self):
        # create an entry using the form and return it
        ...

    def test_create(self):
        self.maxDiff = None
        query_json, saved_entry = self._create_entry()
        # assert on displayed values
I noticed that the login does not persist between tests. I could call _login in setUp, but that makes my tests slower.
So how can I keep a persistent login between tests? And what are the best practices for writing these Django Selenium tests?
Through-the-browser tests with Selenium are slow, period. They are, however, very valuable as they're the best shot you have at automating the true user experience.
You shouldn't try to write true unit tests with Selenium. Instead, use it to write one or two large functional tests. Try to capture an entire user interaction from start to finish. Then structure your test suite so that you can run your fast, non-Selenium unit tests separately, and only have to run the slow functional tests on occasion.
Your code looks fine, but in this scenario you'd combine test_go_to_dashboard and test_create into one method.
kevinharvey pointed me to the solution! I finally found a way to reduce testing time while keeping track of all the tests:
I renamed all methods starting with test... to _test_... and added a main method that calls each _test_ method:
def test_main(self):
    for attr in dir(self):
        # call each test and avoid a recursive call
        if attr.startswith('_test_') and attr != self.test_main.__name__:
            with self.subTest("subtest %s " % attr):
                self.selenium.get(self.live_server_url)
                getattr(self, attr)()
This way, I can test (debug) individually each method :)

GAE Python Testing

First I'm quite new to GAE/Python, please bear with me. Here's the situation
I have a test_all.py which tests all the test suites in my package. The TestCase's setUp currently looks like this:
import test_settings as policy  # has consistency variables <- does not work
...

def setUp(self):
    # First, create an instance of the Testbed class.
    self.testbed = testbed.Testbed()
    # Then activate the testbed, which prepares the service stubs for use.
    self.testbed.activate()
    # Consistency policy for normal operations
    self.policy = None
    if policy.policy_flag == 'STRICT':
        # Create a consistency policy that will simulate the High Replication consistency model.
        self.policy = datastore_stub_util.PseudoRandomHRConsistencyPolicy(probability=0)
    # Initialize the datastore stub with this policy.
    self.testbed.init_datastore_v3_stub(consistency_policy=self.policy)
This is my primitive attempt to set up the datastore with different consistency policies to run tests against them. Total fail.
What I want is to run my test cases against different datastore consistencies in one go from my root test_all.py. Using the above or some other approach, how can I do this? How do I pass parameters to the TestCase from the test runner?
Python, unit test - Pass command line arguments to setUp of unittest.TestCase
The above thread shows exactly how best to pass runtime arguments to the test case; see the second answer in particular.

"ExistenceError" in simple AppEngine + Google Cloud Storage application

I have a simple AppEngine handler as follows:
class TestGS(webapp2.RequestHandler):
    def get(self):
        file_name = '/gs/ds_stats/testfile'
        files.gs.create(file_name, mime_type='text/html')
        with files.open(file_name, 'a') as file_handle:
            file_handle.write("foo")
        files.finalize(file_name)
However, when I call this handler, I get ExistenceError: ApplicationError: 105 at the line with files.open(....
This seems like a super simple scenario, and there's no indication at all as to why this is failing (especially since the files.gs.create call right above it seems to have succeeded; is there any way to verify that?).
Looking through the source code, I see the following problems can cause this error:
if (e.application_error in
    [file_service_pb.FileServiceErrors.EXISTENCE_ERROR,
     file_service_pb.FileServiceErrors.EXISTENCE_ERROR_METADATA_NOT_FOUND,
     file_service_pb.FileServiceErrors.EXISTENCE_ERROR_METADATA_FOUND,
     file_service_pb.FileServiceErrors.EXISTENCE_ERROR_SHARDING_MISMATCH,
     file_service_pb.FileServiceErrors.EXISTENCE_ERROR_OBJECT_NOT_FOUND,
     file_service_pb.FileServiceErrors.EXISTENCE_ERROR_BUCKET_NOT_FOUND,
    ]):
  raise ExistenceError()
That's a pretty large range of issues... Of course it doesn't tell me which one! And again, strange that the 'create' seems to work.
The problem turned out to be a lack of clarity in the documentation: files.gs.create returns a special 'writable file path' which you need to feed to open and finalize. A correct example looks like this:
class TestGS(webapp2.RequestHandler):
    def get(self):
        file_name = '/gs/ds_stats/testfile'
        writable_file_name = files.gs.create(file_name, mime_type='text/html')
        with files.open(writable_file_name, 'a') as file_handle:
            file_handle.write("foo")
        files.finalize(writable_file_name)

How to pass a google mapreduce parameter to done_callback

I'm having trouble setting a parameter when kicking off a mapreduce via start_map so I can access it in done_callback. Numerous things I've read imply that it's possible, but somehow I've not got the earth-moon-stars properly aligned. Ultimately, what I'm trying to accomplish is to delete the temporary blob I created for the mapreduce job.
Here's how I kick it off:
mrID = control.start_map(
    "Find friends",
    "findfriendshandler.findFriendHandler",
    "mapreduce.input_readers.BlobstoreLineInputReader",
    {"blob_keys": blobKey},
    shard_count=7,
    mapreduce_parameters={'done_callback': '/fnfrdone',
                          'blobKey': blobKey})
In done_callback, the context object isn't available:
class FindFriendsDoneHandler(webapp.RequestHandler):
    def post(self):
        ctx = context.get()
        if ctx is not None:
            params = ctx.mapreduce_spec.mapper.params
            try:
                blobKey = params['blobKey']
                logging.info(['BLOBKEY ' + blobKey])
            except KeyError:
                logging.info('blobKey key not found in params')
        else:
            logging.info('context.get did not work')  # THIS IS WHAT GETS OUTPUT
Thanks!
EDIT: It seems like there may be more than one MR library, so I wanted to include my various imports:
from mapreduce import control
from mapreduce import operation as op
from mapreduce import context
from mapreduce import model
Below is the code I used in my done_callback handler to retrieve my blobKey user parameter:
class FindFriendsDoneHandler(webapp.RequestHandler):
    def post(self):
        mrID = self.request.headers['Mapreduce-Id']
        try:
            mapreduceState = MapreduceState.get_by_key_name(mrID)
            mrSpec = mapreduceState.mapreduce_spec
            jsonSpec = mrSpec.to_json()
            jsonParams = jsonSpec['params']
            blobKey = jsonParams['blobKey']
            blobInfo = BlobInfo.get(blobKey)
            blobInfo.delete()
            logging.info('Temp blob deleted successfully for mapreduce: ' + mrID)
        except:
            logging.warning('Unable to delete temp blob for mapreduce: ' + mrID)
This uses the mapreduce ID, passed to the done callback via the Mapreduce-Id header, to retrieve the mapreduce state model object from the mapreduce state table. The model stores any user params sent via start_map in a mapreduce_spec property, which is in JSON format.
Note that MR itself actually stores the blob_key elsewhere in mapreduce_spec.
Thanks again to @Nick for pointing me to the model.py source file.
I'd love to hear if there's a simpler way to get at MR user params...
Context is only available to mappers/reducers - it's largely concerned with things that don't make sense outside the context of one. As you can see from the source, however, the "Mapreduce-Id" header is set, from which you can get the ID of the mapreduce job.
You shouldn't have to do your own cleanup, though - mapreduce has a handler that does it for you.

Automatically cached models in App Engine

I've been working on creating a subclass of db.Model that is automatically cached, i.e.:
instance.put would store the entity in memcache before persisting it to the datastore
class.get_by_key_name would first check the cache, and if missed, would go to the datastore to retrieve it and cache it after retrieval
I developed the approach below (which appears to work for me), but I have a few questions:
I had read Nick Johnson's article on efficient model memcaching which suggests implementing the serialization for memcache through protocol buffers. Looking at the memcache API source code in the SDK, it looks like Google has already implemented protobuf serialization by default. Is my interpretation correct?
Am I missing some important details (which could get me in the future) in the way I am subclassing db.Model or overriding the two methods?
Is there a more efficient way of implementing what I've done below?
Are there guidelines, benchmarks or best practices for when such entity caching would make sense from a performance perspective? Or would it always make sense to cache entities? On a related note, should I be reading anything into the fact that Google hasn't provided a cached model in the modeling API? Are there too many special cases to be thinking about?
Below is my current implementation. I would really appreciate any and all guidance/suggestions on caching entities (even if your response is not a direct answer to one of the 4 questions above, but relevant to the topic overall).
from google.appengine.ext import db
from google.appengine.api import memcache
import os
import logging
class CachedModel(db.Model):
    '''Subclass of db.Model that automatically caches entities for put and
    attempts to load from cache for get_by_key_name
    '''
    @classmethod
    def get_by_key_name(cls, key_names, parent=None, **kwargs):
        cache = memcache.Client()
        # Ensure that every new deployment of the application results in a cache miss
        # by including the application version ID in the namespace of the cache entry
        namespace = os.environ['CURRENT_VERSION_ID'] + '_' + cls.__name__
        if not isinstance(key_names, list):
            key_names = [key_names]
        entities = cache.get_multi(key_names, namespace=namespace)
        if entities:
            logging.info('%s (namespace=%s) retrieved from memcache' % (str(entities.keys()), namespace))
        missing_key_names = list(set(key_names) - set(entities.keys()))
        # For keys missed in memcache, attempt to retrieve entities from datastore
        if missing_key_names:
            missing_entities = super(CachedModel, cls).get_by_key_name(missing_key_names, parent, **kwargs)
            missing_mapping = zip(missing_key_names, missing_entities)
            # Determine entities that exist in datastore and store them to memcache
            entities_to_cache = dict()
            for key_name, entity in missing_mapping:
                if entity:
                    entities_to_cache[key_name] = entity
            if entities_to_cache:
                logging.info('%s (namespace=%s) cached by get_by_key_name' % (str(entities_to_cache.keys()), namespace))
                cache.set_multi(entities_to_cache, namespace=namespace)
            non_existent = set(missing_key_names) - set(entities_to_cache.keys())
            if non_existent:
                logging.info('%s (namespace=%s) missing from cache and datastore' % (str(non_existent), namespace))
            # Combine entities retrieved from cache and entities retrieved from datastore
            entities.update(missing_mapping)
        if len(key_names) == 1:
            return entities[key_names[0]]
        else:
            return [entities[key_name] for key_name in key_names]

    def put(self, **kwargs):
        cache = memcache.Client()
        namespace = os.environ['CURRENT_VERSION_ID'] + '_' + self.__class__.__name__
        cache.set(self.key().name(), self, namespace=namespace)
        logging.info('%s (namespace=%s) cached by put' % (self.key().name(), namespace))
        return super(CachedModel, self).put(**kwargs)
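As an aside, the version-stamped namespace trick in the code above (baking the deployment version into the cache namespace so every new deploy is an implicit cache flush) can be sketched in isolation. FakeCache and namespace_for below are hypothetical stand-ins for memcache.Client and the CURRENT_VERSION_ID logic:

```python
class FakeCache:
    """Dict-backed stand-in for memcache, keyed by (namespace, key)."""
    def __init__(self):
        self.store = {}
    def set(self, key, value, namespace=''):
        self.store[(namespace, key)] = value
    def get(self, key, namespace=''):
        return self.store.get((namespace, key))

cache = FakeCache()

def namespace_for(version, model_name):
    # Every new deployment gets a fresh namespace, so entries written
    # by the previous version are simply never looked up again.
    return version + '_' + model_name

cache.set('alice', {'name': 'Alice'}, namespace=namespace_for('v1', 'User'))
print(cache.get('alice', namespace=namespace_for('v1', 'User')))  # → {'name': 'Alice'}
print(cache.get('alice', namespace=namespace_for('v2', 'User')))  # → None
```

The stale v1 entries still occupy memcache until evicted, but they can never be served to the new version, which is usually the property you actually need.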
Rather than reinventing the wheel, why not switch to NDB, which already implements memcaching of model instances?
You might check out Nick Johnson's article on adding pre and post hooks for data model classes as an alternative to overriding get_by_key_name. That way your hook could work even when using db.get and db.put.
That said, I've found in my app that I've had more dramatic performance improvements caching things at a higher level - like all the content I need to render an entire page, or the page's html itself if possible.
You also might check out the asynctools library which can help you run datastore queries in parallel and cache the results.
A lot of the good tips from Nick Johnson that you want to implement are already available in the appengine-mp module, like serialization via protocol buffers or prefetching entities.
Regarding your get_by_key_name method, you can check that module's code. If you want to create your own db.Model layer, maybe that can help you, but you can also contribute to improving the existing model. ;)
