I found How can I represent an 'Enum' in Python? for how to create an enum in Python. I have a field in my ndb.Model that I want to accept one of my enum values. Do I simply set the field to StringProperty? My enum is:
def enum(**enums):
    return type('Enum', (), enums)

ALPHA = enum(A="A", B="B", C="C", D="D")
Enums are fully supported in the ProtoRPC Python API, so it's not worth rolling your own.
A simple Enum would look like the following:
from protorpc import messages

class Alpha(messages.Enum):
    A = 0
    B = 1
    C = 2
    D = 3
As it turns out, ndb has a msgprop module for storing ProtoRPC objects, and this is documented.
So to store your Alpha enum, you'd do the following:
from google.appengine.ext import ndb
from google.appengine.ext.ndb import msgprop

class Part(ndb.Model):
    alpha = msgprop.EnumProperty(Alpha, required=True)
    ...
EDIT: As pointed out by hadware, a msgprop.EnumProperty is not indexed by default. If you want to perform queries over such properties, you'd need to define the property as
alpha = msgprop.EnumProperty(Alpha, required=True, indexed=True)
and then perform queries like
Part.query(Part.alpha == Alpha.B)
using Alpha.B or any other enum value.
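For completeness, a minimal usage sketch under the definitions above (the values and variable names are made up for illustration):
# hypothetical usage of the Part/Alpha definitions above
part = Part(alpha=Alpha.B)
part.put()  # stores the entity with its enum value

# querying by enum value requires indexed=True on the property
matches = Part.query(Part.alpha == Alpha.B).fetch()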
I am having trouble with Schema inference from Scala case classes during conversion from DataStreams to Tables in Flink. I've tried reproducing the examples given in the documentation but cannot get them to work. I'm wondering whether this might be a bug?
I have commented on a somewhat related issue in the past. My workaround is not using case classes but, somewhat laboriously, defining a DataStream[Row] with return type annotations.
Still I would like to learn if it is somehow possible to get the Schema inference from case classes working.
I'm using Flink 1.15.2 with Scala 2.12.7. I'm using the Java libraries but install flink-scala separately.
This is my implementation of Example 1 as a quick sanity check:
import org.apache.flink.runtime.testutils.MiniClusterResourceConfiguration
import org.apache.flink.test.util.MiniClusterWithClientResource
import org.scalatest.BeforeAndAfter
import org.scalatest.funsuite.AnyFunSuite
import org.apache.flink.api.scala._
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment

import java.time.Instant

class SanitySuite extends AnyFunSuite with BeforeAndAfter {
  val flinkCluster = new MiniClusterWithClientResource(
    new MiniClusterResourceConfiguration.Builder()
      .setNumberSlotsPerTaskManager(2)
      .setNumberTaskManagers(1)
      .build
  )

  before {
    flinkCluster.before()
  }

  after {
    flinkCluster.after()
  }

  test("Verify that table conversion works as expected") {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val tableEnv = StreamTableEnvironment.create(env)

    case class User(name: String, score: java.lang.Integer, event_time: java.time.Instant)

    // create a DataStream
    val dataStream = env.fromElements(
      User("Alice", 4, Instant.ofEpochMilli(1000)),
      User("Bob", 6, Instant.ofEpochMilli(1001)),
      User("Alice", 10, Instant.ofEpochMilli(1002))
    )

    val table = tableEnv.fromDataStream(dataStream)
    table.printSchema()
  }
}
According to the documentation, this should result in:
(
  `name` STRING,
  `score` INT,
  `event_time` TIMESTAMP_LTZ(9)
)
What I get:
(
  `f0` RAW('SanitySuite$User$1', '...')
)
If I instead modify my code in line with Example 5, that is, explicitly define a Schema that mirrors the case class, I instead get an error that very much looks like it results from an inability to extract the case class fields:
Unable to find a field named 'event_time' in the physical data type derived from the given type information for schema declaration. Make sure that the type information is not a generic raw type. Currently available fields are: [f0]
The issue is with the imports: you are importing the Java classes while using Scala case classes as POJOs. Using the following imports works:
import org.apache.flink.api.common.eventtime.WatermarkStrategy
import org.apache.flink.api.common.serialization.SimpleStringSchema
import org.apache.flink.configuration.Configuration
import org.apache.flink.connector.kafka.source.KafkaSource
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer
import org.apache.flink.streaming.api.scala.{DataStream, StreamExecutionEnvironment}
import org.apache.flink.table.api.bridge.scala.StreamTableEnvironment
import org.apache.flink.streaming.api.scala._
I am developing in Python on GAE. When I try to use ProtoRPC for a web service, I cannot find a way to let my request contain JSON-format data in a message. For example:
request format:
{"owner_id":"some id","jsondata":[{"name":"peter","dob":"1911-1-1","aaa":"sth str","xxx":sth int}, {"name":...}, ...]}'
Python:
from protorpc import messages

class some_function_name(messages.Message):
    owner_id = messages.StringField(1, required=True)
    jsondata = messages.StringField(2, required=True)  # is there a JSON field instead of StringField?
Any other suggestions?
What you'd probably want to do here is use a MessageField. You can define your nested message above or within your class definition and use that as the first parameter to the field definition. For example:
from protorpc.messages import Message, MessageField, StringField

class Person(Message):
    name = StringField(1)
    dob = StringField(2)

class ClassRoom(Message):
    teacher = MessageField(Person, 1)
    students = MessageField(Person, 2, repeated=True)
Alternatively:
class ClassRoom(Message):
    class Person(Message):
        ...
    ...
That will work too.
Unfortunately, if you want to store arbitrary JSON, that is, JSON whose structure isn't known ahead of time, this will not work: all fields must be predefined.
I hope MessageField is still helpful to you.
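Applied to the request format from the question, a sketch might look like this (PersonRecord and OwnerRequest are hypothetical names; the fields mirror the JSON keys in the question):
from protorpc import messages

class PersonRecord(messages.Message):
    # mirrors one element of the 'jsondata' list in the question
    name = messages.StringField(1)
    dob = messages.StringField(2)
    aaa = messages.StringField(3)
    xxx = messages.IntegerField(4)

class OwnerRequest(messages.Message):
    owner_id = messages.StringField(1, required=True)
    jsondata = messages.MessageField(PersonRecord, 2, repeated=True)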
Currently my application caches models in memcache like this:
memcache.set("somekey", aModel)
But Nick's post at http://blog.notdot.net/2009/9/Efficient-model-memcaching suggests that first converting the model to protocol buffers is much more efficient. After running some tests, I found it's indeed smaller in size, but actually slower (~10%).
Do others have the same experience or am I doing something wrong?
Test results: http://1.latest.sofatest.appspot.com/?times=1000
import pickle
import time
import uuid

from google.appengine.ext import webapp
from google.appengine.ext import db
from google.appengine.ext.webapp import util
from google.appengine.datastore import entity_pb
from google.appengine.api import memcache

class Person(db.Model):
    name = db.StringProperty()

times = 10000

class MainHandler(webapp.RequestHandler):
    def get(self):
        self.response.headers['Content-Type'] = 'text/plain'
        m = Person(name='Koen Bok')

        t1 = time.time()
        for i in xrange(int(self.request.get('times', 1))):
            key = uuid.uuid4().hex
            memcache.set(key, m)
            r = memcache.get(key)
        self.response.out.write('Pickle took: %.2f' % (time.time() - t1))

        t1 = time.time()
        for i in xrange(int(self.request.get('times', 1))):
            key = uuid.uuid4().hex
            memcache.set(key, db.model_to_protobuf(m).Encode())
            r = db.model_from_protobuf(entity_pb.EntityProto(memcache.get(key)))
        self.response.out.write('Proto took: %.2f' % (time.time() - t1))

def main():
    application = webapp.WSGIApplication([('/', MainHandler)], debug=True)
    util.run_wsgi_app(application)

if __name__ == '__main__':
    main()
The memcache call still pickles the object, with or without protobuf. Pickling is faster with a protobuf-encoded object, since the encoded form is a very simple value.
Plain pickled objects are larger than protobuf+pickle objects, so the protobuf version saves time on memcache, but spends more processor time doing the protobuf conversion.
Therefore, in general, either method works out about the same... but:
The reason you should use protobuf is that it can handle changes between versions of your models, whereas pickle will raise an error. This problem will bite you one day, so it's best to handle it sooner.
Both pickle and protobufs are slow in App Engine since they're implemented in pure Python. I've found that writing my own, simple serialization code using methods like str.join tends to be faster since most of the work is done in C. But that only works for simple datatypes.
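For illustration, a minimal sketch of that idea (the helper names and separator are made up; it only handles flat lists of strings, which is exactly the "simple datatypes" caveat above):
SEP = '\x1f'  # unit separator; assumed never to occur in the values

def simple_serialize(values):
    # str.join runs in C, so this is cheap compared to pickle
    return SEP.join(values)

def simple_deserialize(blob):
    return blob.split(SEP)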
One way to do it more quickly is to turn your model into a dictionary and use the native eval/repr functions as your (de)serializers, with caution of course, as always with the evil eval, but it should be safe here given that there is no external step.
Below is an example of a class Fake_entity implementing exactly that.
You first create your dictionary through fake = Fake_entity(entity); then you can simply store your data via memcache.set(key, fake.serialize()). serialize() is a simple call to the dictionary's native repr, with some additions if you need them (e.g. adding an identifier at the beginning of the string).
To fetch it back, simply use fake = Fake_entity(memcache.get(key)). The Fake_entity object is a simple dictionary whose keys are also accessible as attributes. You can access your entity properties normally, except that ReferenceProperties give keys instead of fetching the object (which is actually quite useful). You can also get() the actual entity with fake.get(), or, more interestingly, change it and then save it with fake.put().
It does not work with lists (if you fetch multiple entities from a query), but could easily be adjusted with join/split functions using an identifier like '### FAKE MODEL ENTITY ###' as the separator. Use with db.Model only; it would need small adjustments for Expando.
class Fake_entity(dict):
    def __init__(self, record):
        # simple case: a string, we eval it to rebuild our fake entity
        if isinstance(record, basestring):
            import datetime  # <----- put all relevant eval imports here
            from google.appengine.api import datastore_types
            # strip the serialization identifier added by serialize(), if present
            record = record.split('### FAKE MODEL ENTITY ###\n')[-1]
            self.update(eval(record))  # careful with external sources, eval is evil
            return None
        # serious case: we build the instance from the actual entity
        for prop_name, prop_ref in record.__class__.properties().items():
            self[prop_name] = prop_ref.get_value_for_datastore(record)  # to avoid fetching entities
        self['_cls'] = record.__class__.__module__ + '.' + record.__class__.__name__
        try:
            self['key'] = str(record.key())
        except Exception:  # the key may not exist if the entity has not been stored
            pass

    def __getattr__(self, k):
        return self[k]

    def __setattr__(self, k, v):
        self[k] = v

    def key(self):
        from google.appengine.ext import db
        return db.Key(self['key'])

    def get(self):
        from google.appengine.ext import db
        return db.get(self['key'])

    def put(self):
        _cls = self.pop('_cls')  # gets and removes the class name from the stored keys
        # import xxxxxxx ---> put your model imports here if necessary
        Cls = eval(_cls)  # make sure that your model declarations are in scope here
        real_entity = Cls(**self)  # creates the entity
        real_entity.put()  # saves it to the datastore
        self['_cls'] = _cls  # puts the class name back afterwards
        return real_entity

    def serialize(self):
        # the initial identifier makes it easy to test and eval directly
        # when getting from memcache
        return '### FAKE MODEL ENTITY ###\n' + repr(self)
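For instance, a round-trip sketch using the class above (the key name is made up, and entity stands for any stored db.Model instance with a name property):
from google.appengine.api import memcache

fake = Fake_entity(entity)  # snapshot the entity as a dict
memcache.set('entity:somekey', fake.serialize())

restored = Fake_entity(memcache.get('entity:somekey'))
print restored.name  # properties are readable as attributes
real_entity = restored.put()  # rebuilds and saves the actual entity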
I would welcome speed tests on this; I would assume it is quite a bit faster than the other approaches. Plus, you do not run any risk if your models have changed somehow in the meantime.
Below is an example of what the serialized fake entity looks like. Take a particular look at the datetime (created) as well as the reference properties (subdomain):
### FAKE MODEL ENTITY ###
{'status': u'admin', 'session_expiry': None, 'first_name': u'Louis', 'last_name': u'Le Sieur', 'modified_by': None, 'password_hash': u'a9993e364706816aba3e25717000000000000000', 'language': u'fr', 'created': datetime.datetime(2010, 7, 18, 21, 50, 11, 750000), 'modified': None, 'created_by': None, 'email': u'chou#glou.bou', 'key': 'agdqZXJlZ2xlcgwLEgVMb2dpbhjmAQw', 'session_ref': None, '_cls': 'models.Login', 'groups': [], 'email___password_hash': u'chou#glou.bou+a9993e364706816aba3e25717000000000000000', 'subdomain': datastore_types.Key.from_path(u'Subdomain', 229L, _app=u'jeregle'), 'permitted': [], 'permissions': []}
Personally I also use static variables (faster than memcache) to cache my entities in the short term, and fetch from the datastore when the server has changed or its memory has been flushed for some reason (which happens quite often, in fact).
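As a sketch of that short-term cache (names are hypothetical; module-level globals persist only per server instance and can be flushed at any time, hence the memcache fallback):
from google.appengine.api import memcache

_local_cache = {}  # module-level dict, lives as long as this instance

def get_cached_fake(key):
    fake = _local_cache.get(key)
    if fake is None:
        data = memcache.get(key)
        if data is not None:
            fake = Fake_entity(data)  # Fake_entity as defined above
            _local_cache[key] = fake
    return fake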
I want to be able to take a dynamically created string, say "Pigeon", and determine at runtime whether Google App Engine has a Model class defined in this project named "Pigeon". If "Pigeon" is the name of an existing model class, I would like to then get a reference to the Pigeon class so defined.
Also, I don't want to use eval at all, since the dynamic string "Pigeon" in this case comes from outside.
You could try, although probably very, very bad practice:
def get_class_instance(nm):
    try:
        return eval(nm + '()')
    except:
        return None
Also, to make that safer, you could give eval a restricted namespace dict: eval(nm+'()', {'Pigeon': Pigeon})
I'm not sure if that would work, and it definitely has an issue: if there is a function whose name matches the value of nm, it would return that function's result:
def Pigeon():
    return "Pigeon"

print(get_class_instance('Pigeon'))  # >> 'Pigeon'
EDIT: Another possible way of doing it (untested), if you know the module:
(Sorry, I keep forgetting it's not obj.hasattr, it's hasattr(obj)!)
import models as m

def get_class_instance(nm):
    if hasattr(m, nm):
        return getattr(m, nm)()
    else:
        return None
EDIT 2: Yes, it does work! Woo!
Actually, looking through the source code and the interweb, I found an undocumented method that seems to fit the bill.
from google.appengine.ext import db

key = "ModelObject"  # this is a dynamically generated string
klass = db.class_for_kind(key)
This method will throw a descriptive exception if the class does not exist, so you should probably catch it if the key string comes from the outside.
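For instance, a small defensive wrapper (the helper name is hypothetical):
from google.appengine.ext import db

def model_class_for(kind_name):
    # db.class_for_kind raises db.KindError for unknown kinds,
    # so catch it when kind_name comes from outside
    try:
        return db.class_for_kind(kind_name)
    except db.KindError:
        return None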
There are two fairly easy ways to do this without relying on internal details:
Use the google.appengine.api.datastore API, like so:
from google.appengine.api import datastore

q = datastore.Query('EntityType')
if q.get(1):
    print "EntityType exists!"
The other option is to use the db.Expando class:
def GetEntityClass(entity_type):
    class Entity(db.Expando):
        @classmethod
        def kind(cls):
            return entity_type
    return Entity

cls = GetEntityClass('EntityType')
if cls.all().get():
    print "EntityType exists!"
The latter has the advantage that you can use GetEntityClass to generate an Expando class for any entity type, and interact with it the same way you would a normal class.
I'm unable to work out how you can get objects from the Google App Engine Datastore using get_by_id. Here is the model:
from google.appengine.ext import db

class Address(db.Model):
    description = db.StringProperty(multiline=True)
    latitude = db.FloatProperty()
    longitude = db.FloatProperty()
    date = db.DateTimeProperty(auto_now_add=True)
I can create them, put them, and retrieve them with gql.
address = Address()
address.description = self.request.get('name')
address.latitude = float(self.request.get('latitude'))
address.longitude = float(self.request.get('longitude'))
address.put()
A saved address has values for
>> address.key()
aglndWVzdGJvb2tyDQsSB0FkZHJlc3MYDQw
>> address.key().id()
14
I can find them using the key
from google.appengine.ext import db
address = db.get('aglndWVzdGJvb2tyDQsSB0FkZHJlc3MYDQw')
But I can't find them by ID:
>> from google.appengine.ext import db
>> address = db.Model.get_by_id(14)
The address is None. When I try
>> Address.get_by_id(14)
AttributeError: type object 'Address' has no attribute 'get_by_id'
How can I find by id?
EDIT: It turns out I'm an idiot and was trying to find an Address model inside a function called Address. Thanks for your answers; I've marked Brandon as the correct answer as he got in first and demonstrated that it should all work.
I just tried it on shell.appspot.com and it seems to work fine:
Google Apphosting/1.0
Python 2.5.2 (r252:60911, Feb 25 2009, 11:04:42)
[GCC 4.1.0]
>>> class Address(db.Model):
...     description = db.StringProperty(multiline=True)
...     latitude = db.FloatProperty()
...     longitude = db.FloatProperty()
...     date = db.DateTimeProperty(auto_now_add=True)
>>> addy = Address()
>>> addyput = addy.put()
>>> addyput.id()
136522L
>>> Address.get_by_id(136522)
<__main__.Address object at 0xa6b33ae3bf436250>
An entity's key is a list of (kind, id_or_name) tuples; for root entities, it is always exactly one element long. Thus, an ID alone doesn't identify an entity; the kind is also required. When you call db.Model.get_by_id(x), you're asking for the entity with key (Model, x). What you want is to call Address.get_by_id(x), which fetches the entity with key (Address, x).
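Equivalently, a sketch that makes the (kind, id) pairing explicit via db.Key.from_path (assuming the Address model and ID from the question):
from google.appengine.ext import db

# build the key (Address, 14) explicitly, then fetch it
address = db.get(db.Key.from_path('Address', 14))

# the same lookup through the model class
address = Address.get_by_id(14)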
You should use the long type for the ID you pass to get_by_id(). Passing an int may give an error message.