Python/Nose/Testing: Why does pickling an OrderedDict fail? - google-app-engine

I've come across some very strange behavior in one of my nose tests for GAE, and I'm not quite sure how to debug it further... Any idea why it fails would be appreciated...
# Main testing file, stripped to the basics
# -*- coding: utf-8 -*-
import unittest
import pickle
from collections import OrderedDict

from ptest import SomeClass


class PickleTest(unittest.TestCase):
    def runTest(self):
        res = OrderedDict()
        for item in [1, 2, 3]:
            res[item] = "test"
        # works
        pickle.dumps(res)
        # fails
        otherClass = SomeClass()
        test = otherClass.pTest("Nav")


if __name__ == '__main__':
    unittest.main()
The imported class file:
import pickle
from collections import OrderedDict


class SomeClass:
    def pTest(self, tableName=None, rightsTrimmed=True):
        return pickle.dumps(OrderedDict())
leads to
PicklingError: Can't pickle <class 'collections.OrderedDict'>: it's not the same object as collections.OrderedDict
But strangely enough, it fails only for the statement in the imported class, not for the one in the main file.
I'm at my wits' end. When executed in the normal GAE dev/production environment, the code works... The system Python version is 2.7.5.

Thanks to Oleksiy, I've figured out that when I insert a breakpoint, the test runs through without any modification to the code. Strange. I can't really imagine why, but this gave me the idea that something odd in terms of timing is going on. And, to confirm that suspicion, I tried a late import of OrderedDict, which works.
It's a first find; changing my production code to late imports just to allow tests to run seems crazy. I'll read up on late imports and think about how to proceed...
# -*- coding: utf-8 -*-
import pickle
#from collections import OrderedDict


class SomeClass:
    def pTest(self, tableName=None, rightTrimmed=True):
        # The demons seem to be pleased by the late import...
        from collections import OrderedDict
        return pickle.dumps(OrderedDict())
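Before the late-import change, a minimal diagnostic sketch along these lines might confirm the suspicion (my assumption being that something, a nose plugin or the GAE sandbox, reloads collections so that two distinct OrderedDict class objects exist); run it inside the failing test:

import collections
import ptest  # the imported class file above, with its module-level OrderedDict import in place

# pickle requires that the class found under collections.OrderedDict is the very
# same object as the class of the instance being dumped; False here would explain
# the "not the same object" PicklingError.
print(ptest.OrderedDict is collections.OrderedDict)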

Related

Unittest mock.patch.object(autospec=True) broken for staticmethod?

I want to ensure that my class's staticmethod is called with the correct arguments without actually calling it, so I am mocking it. E.g.:
import unittest
from unittest.mock import patch


class FooStatic:
    @staticmethod
    def bar_static(self, baz_static):
        print(baz_static)
        pass


class TestFooStatic(unittest.TestCase):
    def test_foo_static(self):
        with patch.object(FooStatic, 'bar_static', autospec=True):
            FooStatic.bar_static()

    def test_foo_static_instance(self):
        with patch.object(FooStatic, 'bar_static', autospec=True):
            foo_s = FooStatic()
            foo_s.bar_static()
Both these tests should complain that FooStatic.bar_static cannot be called without the argument 'baz_static'. Unfortunately they don't; the tests succeed.
Without the staticmethod decorator, patch behaves as I expect:
class Foo:
    def bar(self, baz):
        print(baz)
        pass


class TestFoo(unittest.TestCase):
    def test_foo(self):
        with patch.object(Foo, 'bar', autospec=True):
            foo = Foo()
            foo.bar()  # raises TypeError: missing a required argument: 'baz'
I have found a loosely related issue in Python that was fixed from Python 3.7 onwards: merged PR.
I am on Python 3.8.5 (the default on Ubuntu 20).
I am not opposed to investing some time to try and propose a fix myself. However, I first want to make sure I am not overlooking anything. Any thoughts?
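In the meantime, one possible workaround sketch (this is only my assumption about what would satisfy the requirement, not a fix for the autospec behaviour itself): skip signature enforcement and assert the exact call arguments on the mock instead, using a simplified staticmethod without the stray self:

import unittest
from unittest.mock import patch


class FooStatic:
    @staticmethod
    def bar_static(baz_static):
        print(baz_static)


class TestFooStaticWorkaround(unittest.TestCase):
    def test_call_args_are_checked_explicitly(self):
        with patch.object(FooStatic, 'bar_static') as mock_bar:
            FooStatic.bar_static('some-value')  # stands in for the production code under test
            mock_bar.assert_called_once_with('some-value')  # fails if the arguments differ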

Scala Slick-Extensions SQLServerDriver 2.1.0 usage - can't get it to compile

I am trying to use Slick-Extensions to connect to an SQL Server database from Scala. I use Slick 2.1.0 and slick-extensions 2.1.0.
I can't seem to get the code I wrote to compile. I followed the examples from Slick's website, and the code compiled fine when the driver was H2. Please see below:
package com.example

import com.typesafe.slick.driver.ms.SQLServerDriver.simple._
import scala.slick.direct.AnnotationMapper.column
import scala.slick.lifted.TableQuery
import scala.slick.model.Table

class DestinationMappingsTable(tag: Tag) extends Table[(Long, Int, Int)](tag, "DestinationMappings_tbl") {
  def id = column[Long]("id", O.PrimaryKey, O.AutoInc)
  def mltDestinationType = column[Int]("mltDestinationType")
  def mltDestinationId = column[Int]("mltDestinationId")
  def * = (id, mltDestinationType, mltDestinationId)
}
I am getting a wide range of errors: scala.slick.model.Table does not take type parameters, column does not take type parameters, and O is not found.
If the SQLServerDriver does not use the same syntax as slick, where do I find its documentation?
Thank you!
I think your import of scala.slick.model.Table shadows your import of com.typesafe.slick.driver.ms.SQLServerDriver.simple.Table.
Try just removing the:
import scala.slick.model.Table

How transactions influence read consistency for the next non-ancestor query in NDB

The apply phase of a save may fail, and/or may still be running asynchronously when the next not-strongly-consistent read (a non-ancestor query) happens.
Based on the local testing article, I have written a test that should simulate inconsistent reads:
import dev_appserver
dev_appserver.fix_sys_path()

import unittest

from google.appengine.ext import ndb
from google.appengine.ext import testbed
from google.appengine.datastore import datastore_stub_util


class SomeModel(ndb.Model):
    pass


class SingleEntityConsistency(unittest.TestCase):
    def setUp(self):
        # Set up the App Engine environment
        self.testbed = testbed.Testbed()
        self.testbed.activate()
        self.policy = datastore_stub_util.PseudoRandomHRConsistencyPolicy(probability=0)
        self.testbed.init_datastore_v3_stub(consistency_policy=self.policy)
        self.testbed.init_memcache_stub()
        # A test key
        self.key = ndb.Key('SomeModel', 'test')

    def tearDown(self):
        self.testbed.deactivate()

    def test_tx_get_or_insert(self):
        p = SomeModel.get_or_insert('test')
        self.assertEqual(0, SomeModel.query().count(1), "Shouldn't be applied yet")
        self.assertEqual(1, SomeModel.query(ancestor=self.key).count(1), "Ancestor query read should be consistent")

    def test_no_tx_insert(self):
        p = SomeModel(id='test')
        p.put()
        self.assertEqual(0, SomeModel.query().count(2), "Shouldn't be applied yet")
        self.assertEqual(1, SomeModel.query(ancestor=self.key).count(1), "Ancestor query read should be consistent")

    def test_with_ancestor(self):
        p = SomeModel(id='test')
        p.put()
        self.assertEqual(p, SomeModel.query(ancestor=self.key).get())

    def test_key(self):
        p = SomeModel(id='test')
        p.put()
        self.assertEqual(p, self.key.get())


if __name__ == '__main__':
    unittest.main()
Actual questions…
Does wrapping put() in a transaction change the behaviour described at the beginning? Do I still need a strongly consistent query to make sure that I'll read what was written in the txn? (The tests suggest that I do still need a strongly consistent query.)
Is key.get() considered to be strongly consistent? (The tests suggest that it is.)
UPDATE
I have updated the test code as Guido mentioned; now all tests pass:
self.testbed.init_datastore_v3_stub(consistency_policy=self.policy)
I believe you must do something to activate the policy. That would explain the test failures. Also, I believe only queries are affected, and a lone put is effectively a transaction. Finally, beware of NDB's caches.
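On that last point, here is a minimal sketch (my own addition, not from the answer) of switching NDB's caches off in a test, so reads really hit the eventually consistent datastore stub rather than the context cache or memcache:

from google.appengine.ext import ndb

# Disable the in-context cache and the memcache layer for this request/test.
ctx = ndb.get_context()
ctx.set_cache_policy(False)
ctx.set_memcache_policy(False)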

Simple / Smart, Pythonic database solution, can use Python types + syntax? (Key / Value Dict, Array, maybe Ordered Dict)

Looking for solutions that push the envelope and:
Avoid
Manually writing SQL queries (Python can be more OO, not passing DSL strings)
Using non-Python datatypes for a supposedly required model definition
Using a new class of types rather than perfectly good native Python types
Boast
Using Python objects
Using Object Oriented and key based retrieval and creation
Quick prototyping
No SQL table to make
Model /Type inference or no model
Less lines and characters to type
Easily output to and from JSON, maybe XML or even Protocol Buffers.
I do web, desktop, and mobile software development, so the more portable the better.
python
>> from someAmazingDB import *
>> db.taskList = []
>> db['taskList'].append({title:'Beat old sql interfaces','done':False})
>> db.taskList.append({title:'Illustrate different syntax modes','done':True})
#at this point it should autosave
#we should be able to reload the console and access like:
python
>> from someAmazingDB import *
>> print 'Done tasks'
>> for task in db.taskList:
>>     if task.done:
>>         print task
'Illustrate different syntax modes'
Here is the challenge: the above code should work with very little modification or thinking required, like a different import statement and maybe a little more. Django models and SQLAlchemy DO NOT CUT IT.
I'm looking for more interesting library suggestions than just "Try Shelve" or "use pickle"
I'm not opposed to Python classes being used for models, but they should be really straightforward, unlike the stuff you see with Django and similar.
I was actually working on something like this earlier today. There is no readme or sufficient tests yet, but... http://github.com/mikeboers/LiteMap/blob/master/litemap.py
The LiteMap class behaves much like the builtin dict, but it persists into a SQLite database. You did not indicate what particular database you were interested in, but this could be almost trivially modified to any back end.
It also does not track changes to mutable classes (e.g. like appending to the list in your example), but the API is really simple.
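A hypothetical usage sketch, assuming LiteMap takes a path to its SQLite file and otherwise follows the dict-like API described above (the constructor argument and value handling are assumptions on my part):

from litemap import LiteMap

db = LiteMap('tasks.sqlite')  # persists into this SQLite file (argument is an assumption)
db['taskList'] = ['Beat old sql interfaces']

# Changes to mutable values are not tracked, so re-assign after modifying:
tasks = db['taskList']
tasks.append('Illustrate different syntax modes')
db['taskList'] = tasks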
Database access doesn't get better than SQLAlchemy.
Care to explain what about Django's models you don't find straightforward? Here's how I'd do what you have in Django:
from django.db import models


class Task(models.Model):
    title = models.CharField(max_length=...)
    is_done = models.BooleanField()

    def __unicode__(self):
        return self.title
----
from mysite.tasks.models import Task

t = Task(title='Beat old sql interfaces', is_done=True)
t.save()
----
from mysite.tasks.models import Task

print 'Done tasks'
for task in Task.objects.filter(is_done=True):
    print task
Seems pretty straightforward to me! Also, results in a slightly cleaner table/object naming scheme IMO. The trickier part is using Django's DB module separate from the rest of Django, if that's what you're after, but it can be done.
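For the "using Django's DB module separate from the rest of Django" part, a rough sketch of how that can look on recent Django versions (the app and module names here are hypothetical, not from the answer):

import django
from django.conf import settings

settings.configure(
    DATABASES={'default': {'ENGINE': 'django.db.backends.sqlite3',
                           'NAME': 'tasks.sqlite3'}},
    INSTALLED_APPS=['tasks'],  # hypothetical app that contains the Task model above
)
django.setup()

from tasks.models import Task

# assumes the table has been created first (e.g. via migrate)
Task.objects.create(title='Beat old sql interfaces', is_done=True)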
Using web2py:
>>> from gluon.sql import DAL, Field
>>> db = DAL('sqlite://storage.db')
>>> db.define_table('taskList', Field('title'), Field('done', 'boolean'))  # creates the table
>>> db['taskList'].insert(title='Beat old sql interfaces', done=False)
>>> db.taskList.insert(title='Beat old sql interfaces', done=False)
>>> for task in db(db.taskList.done == True).select():
...     print task.title
Supports 10 different database back-ends, plus Google App Engine.
The question looks strikingly similar to http://api.mongodb.org/python/1.9%2B/tutorial.html
So the answer is pymongo, what else ;)
from pymongo import Connection

connection = Connection()
connection = Connection('localhost', 27017)
db = connection['test-database']
tasklist = db['test-tasklist']

tasklist.insert({'title': 'Beat old sql interfaces', 'done': False})
tasklist.insert({'title': 'Illustrate different syntax modes', 'done': True})

for task in tasklist.find({'done': True}):
    print task['title']
I haven't tested the code, but it won't be very different from this.
BTW Redish is also interesting and fun.

What is the best way to do AppEngine Model Memcaching?

Currently my application caches models in memcache like this:
memcache.set("somekey", aModel)
But Nick's post at http://blog.notdot.net/2009/9/Efficient-model-memcaching suggests that first converting it to protocol buffers is a lot more efficient. But after running some tests I found that it is indeed smaller in size, but actually slower (~10%).
Do others have the same experience or am I doing something wrong?
Test results: http://1.latest.sofatest.appspot.com/?times=1000
import pickle
import time
import uuid

from google.appengine.ext import webapp
from google.appengine.ext import db
from google.appengine.ext.webapp import util
from google.appengine.datastore import entity_pb
from google.appengine.api import memcache


class Person(db.Model):
    name = db.StringProperty()

times = 10000


class MainHandler(webapp.RequestHandler):
    def get(self):
        self.response.headers['Content-Type'] = 'text/plain'
        m = Person(name='Koen Bok')

        t1 = time.time()
        for i in xrange(int(self.request.get('times', 1))):
            key = uuid.uuid4().hex
            memcache.set(key, m)
            r = memcache.get(key)
        self.response.out.write('Pickle took: %.2f' % (time.time() - t1))

        t1 = time.time()
        for i in xrange(int(self.request.get('times', 1))):
            key = uuid.uuid4().hex
            memcache.set(key, db.model_to_protobuf(m).Encode())
            r = db.model_from_protobuf(entity_pb.EntityProto(memcache.get(key)))
        self.response.out.write('Proto took: %.2f' % (time.time() - t1))


def main():
    application = webapp.WSGIApplication([('/', MainHandler)], debug=True)
    util.run_wsgi_app(application)


if __name__ == '__main__':
    main()
The memcache call still pickles the object, with or without protobuf; pickling is faster with a protobuf object since it has a very simple model.
Plain pickled objects are larger than protobuf+pickle objects, so the protobuf version saves time on memcache, but there is more processor time spent doing the protobuf conversion.
Therefore, in general, either method works out about the same... but
The reason you should use protobuf is that it can handle changes between versions of the models, whereas pickle will error. This problem will bite you one day, so it's best to handle it sooner.
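A small pair of helpers sketching that protobuf route (the helper names are mine; the calls are the same ones used in the question's test handler):

from google.appengine.api import memcache
from google.appengine.datastore import entity_pb
from google.appengine.ext import db


def cache_model(key, model):
    # Store the protobuf encoding; it tolerates model schema changes better than a pickle.
    memcache.set(key, db.model_to_protobuf(model).Encode())


def get_cached_model(key):
    data = memcache.get(key)
    if data is None:
        return None
    return db.model_from_protobuf(entity_pb.EntityProto(data))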
Both pickle and protobufs are slow in App Engine since they're implemented in pure Python. I've found that writing my own, simple serialization code using methods like str.join tends to be faster since most of the work is done in C. But that only works for simple datatypes.
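A minimal sketch of that hand-rolled idea, for the question's single-property Person model (the separator choice and helper names are assumptions of mine):

from google.appengine.ext import db


class Person(db.Model):
    name = db.StringProperty()

SEP = u'\x1f'  # unit separator, assumed never to occur in the stored values


def serialize_person(person):
    # str.join runs in C, so this stays cheap even with many properties
    return SEP.join([person.name or u''])


def deserialize_person(blob):
    (name,) = blob.split(SEP)
    return Person(name=name)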
One way to do it more quickly is to turn your model into a dictionary and use the native eval / repr functions as your (de)serializers, with caution of course, as always with the evil eval, but it should be safe here given that there is no external step.
Below is an example of a class Fake_entity implementing exactly that.
You first create your dictionary through fake = Fake_entity(entity), then you can simply store your data via memcache.set(key, fake.serialize()). serialize() is a simple call to the native dictionary repr, with some additions if you need them (e.g. adding an identifier at the beginning of the string).
To fetch it back, simply use fake = Fake_entity(memcache.get(key)). The Fake_entity object is a simple dictionary whose keys are also accessible as attributes. You can access your entity properties normally, except that ReferenceProperties give keys instead of fetching the object (which is actually quite useful). You can also get() the actual entity with fake.get(), or, more interestingly, change it and then save it with fake.put().
It does not work with lists (if you fetch multiple entities from a query), but could easily be adjusted with join/split functions using an identifier like '### FAKE MODEL ENTITY ###' as the separator. Use with db.Model only; it would need small adjustments for Expando.
class Fake_entity(dict):
    def __init__(self, record):
        # simple case: a string, we eval it to rebuild our fake entity
        if isinstance(record, basestring):
            import datetime  # <----- put all relevant eval imports here
            from google.appengine.api import datastore_types
            self.update(eval(record))  # careful with external sources, eval is evil
            return None
        # serious case: we build the instance from the actual entity
        for prop_name, prop_ref in record.__class__.properties().items():
            self[prop_name] = prop_ref.get_value_for_datastore(record)  # to avoid fetching entities
        self['_cls'] = record.__class__.__module__ + '.' + record.__class__.__name__
        try:
            self['key'] = str(record.key())
        except Exception:  # the key may not exist if the entity has not been stored
            pass

    def __getattr__(self, k):
        return self[k]

    def __setattr__(self, k, v):
        self[k] = v

    def key(self):
        from google.appengine.ext import db
        return db.Key(self['key'])

    def get(self):
        from google.appengine.ext import db
        return db.get(self['key'])

    def put(self):
        _cls = self.pop('_cls')  # gets and removes the class name from the passed arguments
        # import xxxxxxx ---> put your model imports here if necessary
        Cls = eval(_cls)  # make sure that your model declarations are in scope here
        real_entity = Cls(**self)  # creates the entity
        real_entity.put()  # self explanatory
        self['_cls'] = _cls  # puts back the class name afterwards
        return real_entity

    def serialize(self):
        return '### FAKE MODEL ENTITY ###\n' + repr(self)
        # or simply repr, but I use the initial identifier to test and eval directly when getting from memcache
I would welcome speed tests on this; I would assume it is quite a bit faster than the other approaches. Plus, you do not run any risk if your models have changed somehow in the meantime.
Below is an example of what the serialized fake entity looks like. Take a particular look at the datetime (created) as well as the reference properties (subdomain):
### FAKE MODEL ENTITY ###
{'status': u'admin', 'session_expiry': None, 'first_name': u'Louis', 'last_name': u'Le Sieur', 'modified_by': None, 'password_hash': u'a9993e364706816aba3e25717000000000000000', 'language': u'fr', 'created': datetime.datetime(2010, 7, 18, 21, 50, 11, 750000), 'modified': None, 'created_by': None, 'email': u'chou@glou.bou', 'key': 'agdqZXJlZ2xlcgwLEgVMb2dpbhjmAQw', 'session_ref': None, '_cls': 'models.Login', 'groups': [], 'email___password_hash': u'chou@glou.bou+a9993e364706816aba3e25717000000000000000', 'subdomain': datastore_types.Key.from_path(u'Subdomain', 229L, _app=u'jeregle'), 'permitted': [], 'permissions': []}
Personally, I also use static variables (faster than memcache) to cache my entities in the short term, and fetch from the datastore when the server has changed or its memory has been flushed for some reason (which happens quite often, in fact).
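A bare-bones sketch of that instance-memory layer (the names and fallback order are my own assumptions): a module-level dict consulted before memcache, which in turn falls back to the datastore:

from google.appengine.api import memcache
from google.appengine.ext import db

_local_cache = {}  # module-level, so it lives as long as this instance's process


def get_entity(key_str):
    entity = _local_cache.get(key_str)
    if entity is None:
        entity = memcache.get(key_str)  # slower, but shared across instances
    if entity is None:
        entity = db.get(db.Key(key_str))  # authoritative fallback
        memcache.set(key_str, entity)
    if entity is not None:
        _local_cache[key_str] = entity
    return entity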
