Google App Engine Datastore - query with StructuredProperty in projection and filter

I have a couple of ndb models that look like this:
class Product(ndb.Model):
    manufacturer = ndb.StringProperty()
    category = ndb.StringProperty()
    price = ndb.FloatProperty()
class Customer(ndb.Model):
    customerId = ndb.StringProperty()
    name = ndb.StringProperty()
    products = ndb.StructuredProperty(Product, repeated=True)
I'd like to query customers by the 'manufacturer' and 'category' of the products they own. This query works as expected:
query = Customer.query(Customer.products == Product(manufacturer=data_json["product"]["manufacturer"],
                                                     category=data_json["product"]["category"]))
results = query.fetch()
However, I cannot get a projection to work with this query. The following query returns nothing:
query = Customer.query(Customer.products == Product(manufacturer=data_json["product"]["manufacturer"],
                                                     category=data_json["product"]["category"]))
results = query.fetch(projection=[Customer.products.price])
If I use the projection without the filter, it works fine. The following query returns all entities, but only the 'price' property:
results = Customer.query().fetch(projection=[Customer.products.price])
Any thoughts? Thanks.
BTW, my queries were developed based on this article.
https://cloud.google.com/appengine/docs/standard/python/ndb/queries#filtering_structured_properties

The NDB Client Library documentation describes how to combine AND and OR operations.
Your query expresses the AND inside a single Product(...) value. Instead, try combining two separate filters with ndb.AND(), as in the second query below.
# Your query
query = Customer.query(Customer.products == Product(manufacturer=data_json["product"]["manufacturer"], category=data_json["product"]["category"]))
# Query using ndb.AND
query = Customer.query(ndb.AND(Customer.products == Product(manufacturer=data_json["product"]["manufacturer"]), Customer.products == Product(category=data_json["product"]["category"])))
Also, it turns out that if you perform the filtering in multiple steps, the query also works:
# Your request
query = Customer.query(Customer.products == Product(manufacturer=data_json["product"]["manufacturer"], category=data_json["product"]["category"]))
results = query.fetch(projection=[Customer.products.price])
# Request performing filter in multiple steps
query = Customer.query(Customer.products == Product(category=data_json["product"]["category"]))
query1 = query.filter(Customer.products == Product(manufacturer=data_json["product"]["manufacturer"]))
results = query1.fetch(projection=[Customer.products.price])
You can use either of the proposed alternatives, although I would suggest using ndb.AND() as it minimizes the code and is also the best way to combine AND operations.
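One caveat before swapping the filter. As I read the NDB documentation on filtering for structured property values, the two forms are not strictly equivalent. A single Product(...) value with several fields only matches when one product carries all of those values, whereas separate filters can be satisfied by different products of the same customer. The sketch below is a toy model in plain Python, not ndb code. The dictionaries and helper names are hypothetical and only illustrate that difference:

```python
# Toy model in plain Python (no App Engine SDK needed). The dictionaries and
# helper names below are hypothetical stand-ins, not part of the ndb API.

def matches_single_subentity(customer, **wanted):
    """Models Customer.products == Product(a=..., b=...): one product must
    carry every requested value."""
    return any(
        all(product.get(field) == value for field, value in wanted.items())
        for product in customer["products"]
    )


def matches_separate_filters(customer, **wanted):
    """Models ndb.AND(products == Product(a=...), products == Product(b=...)):
    each value may be satisfied by a different product."""
    return all(
        any(product.get(field) == value for product in customer["products"])
        for field, value in wanted.items()
    )


# A customer whose "Google" product and "GCP" product are two different items.
mixed = {
    "customerId": "Customer4",
    "products": [
        {"manufacturer": "Google", "category": "Drive", "price": 10.38},
        {"manufacturer": "Acme", "category": "GCP", "price": 99.0},
    ],
}

same_product = matches_single_subentity(mixed, manufacturer="Google", category="GCP")
across_products = matches_separate_filters(mixed, manufacturer="Google", category="GCP")
```

Here same_product is False while across_products is True. If your data can contain such mixed customers, the ndb.AND() and chained-filter alternatives may return more entities than the original single-value filter.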
UPDATE with some code:
app.yaml
runtime: python27
api_version: 1
threadsafe: true
handlers:
- url: /.*
  script: main.app
main.py
import webapp2
from google.appengine.ext import ndb
# Datastore Models
class Product(ndb.Model):
    manufacturer = ndb.StringProperty()
    category = ndb.StringProperty()
    price = ndb.FloatProperty()
class Customer(ndb.Model):
    customerId = ndb.StringProperty()
    name = ndb.StringProperty()
    products = ndb.StructuredProperty(Product, repeated=True)
# Create entities for testing purposes
class CreateEntities(webapp2.RequestHandler):
    def get(self):
        prod1 = Product(manufacturer="Google", category="GCP", price=105.55)
        prod2 = Product(manufacturer="Google", category="GCP", price=123.45)
        prod3 = Product(manufacturer="Google", category="Drive", price=10.38)
        prod1.put()
        prod2.put()
        prod3.put()
        cust1 = Customer(customerId="Customer1", name="Someone", products=[prod1,prod2,prod3])
        cust2 = Customer(customerId="Customer2", name="Someone else", products=[prod1])
        cust3 = Customer(customerId="Customer3", name="Noone", products=[prod3])
        cust1.put()
        cust2.put()
        cust3.put()
        # Response text
        self.response.headers['Content-Type'] = 'text/plain'
        self.response.write('Done creating entities')
class GetEntities(webapp2.RequestHandler):
    def get(self):
        # This will not work
        #query = Customer.query(Customer.products == Product(category="GCP", manufacturer="Google"))
        #results = query.fetch(projection=[Customer.products.price])
        # Alternative 1 - WORKS
        #query = Customer.query(Customer.products == Product(category="GCP"))
        #query1 = query.filter(Customer.products == Product(manufacturer="Google"))
        #results = query1.fetch(projection=[Customer.products.price])
        # Alternative 2 - WORKS
        query = Customer.query(ndb.AND(Customer.products == Product(manufacturer="Google"), Customer.products == Product(category="GCP")))
        results = query.fetch(projection=[Customer.products.price])
        self.response.out.write('<html><body>')
        for result in results:
            self.response.out.write("%s<br><br>" % result)
        self.response.out.write('</body></html>')
app = webapp2.WSGIApplication([
    ('/createEntities', CreateEntities),
    ('/getEntities', GetEntities),
], debug=True)

Related

Django: executing UPDATE query always returns rowcount 0

I'm new to programming and I'm not sure whether the problem is with me or with the Django code. I call the link method from my view to update the MatchId field on the Record model. The database is SQL Server 2017.
My view:
class RecordViewSet(viewsets.ModelViewSet):
    """
    API for everything that has to do with Records.
    Additionally we provide an extra `link` action.
    """
    queryset = Record.objects.all().order_by("Id")
    serializer_class = RecordSerializer
    permission_classes = [permissions.IsAuthenticated]
    @action(methods=["post"], detail=False)
    def link(self, request, *args, **kwargs):
        idToMatch = request.POST.getlist("Id")
        recordsToMatch = Record.objects.filter(Id__in=idToMatch)
        lastMatchId = Record.objects.latest("MatchId").MatchId
        matchedSuccesfully = recordsToMatch.update(MatchId=lastMatchId + 1)
        if matchedSuccesfully > 1:
            return Response(data=matchedSuccesfully, status=status.HTTP_200_OK)
        else:
            return Response(data=matchedSuccesfully, status=status.HTTP_404_NOT_FOUND)
For some reason matchedSuccesfully is always zero. Relevant Django code:
def execute_sql(self, result_type):
    """
    Execute the specified update. Return the number of rows affected by
    the primary update query. The "primary update query" is the first
    non-empty query that is executed. Row counts for any subsequent,
    related queries are not available.
    """
    cursor = super().execute_sql(result_type)
    try:
        rows = cursor.rowcount if cursor else 0
        is_empty = cursor is None
    finally:
        if cursor:
            cursor.close()
    for query in self.query.get_related_updates():
        aux_rows = query.get_compiler(self.using).execute_sql(result_type)
        if is_empty and aux_rows:
            rows = aux_rows
            is_empty = False
    return rows
I rewrote execute_sql as follows:
def execute_sql(self, result_type):
    """
    Execute the specified update. Return the number of rows affected by
    the primary update query. The "primary update query" is the first
    non-empty query that is executed. Row counts for any subsequent,
    related queries are not available.
    """
    cursor = super().execute_sql(result_type)
    try:
        if cursor:
            cursor.execute("select @@rowcount")
            rows = cursor.fetchall()[0][0]
        else:
            rows = 0
        is_empty = cursor is None
    finally:
        if cursor:
            cursor.close()
    for query in self.query.get_related_updates():
        aux_rows = query.get_compiler(self.using).execute_sql(result_type)
        if is_empty and aux_rows:
            rows = aux_rows
            is_empty = False
    return rows
and now it works, but I'm unsure if there is a more elegant way to resolve this since now I have to ship this exact code everywhere. Source code at:
https://github.com/django/django/blob/main/django/db/models/sql/compiler.py
I faced the same issue and ended up at the same place in Django's internals.
In my case, the problem was a trigger configured for UPDATE.
It should have returned @@ROWCOUNT as a result, but in my case it didn't.
Because I was not allowed to edit the triggers, I overrode the save method in a base model for the affected models so that it passes force_update=True:
from django.db import DatabaseError, models
class BaseModel(models.Model):
    def save(self, force_insert=False, force_update=False, using=None, update_fields=None):
        if self._state.adding:
            super().save(force_insert=force_insert, force_update=force_update, using=using, update_fields=update_fields)
        else:
            try:
                super().save(force_insert=force_insert, force_update=True, using=using, update_fields=update_fields)
            except DatabaseError as e:
                # Ignore only the "no rows affected" error caused by the bad row count
                if str(e) == 'Forced update did not affect any rows.':
                    pass
                else:
                    raise
    class Meta:
        managed = False
        abstract = True
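For reference, the number that execute_sql returns in the quoted Django code is the DB-API cursor.rowcount attribute. The following is a minimal sketch of what a driver normally reports after an UPDATE. It uses the standard-library sqlite3 module rather than SQL Server, and the table and column names are hypothetical, so it shows the expected behaviour and does not reproduce the trigger problem:

```python
import sqlite3

# Hypothetical table and column names, loosely mirroring the Record model.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE record (id INTEGER PRIMARY KEY, match_id INTEGER)")
conn.executemany(
    "INSERT INTO record (id, match_id) VALUES (?, ?)",
    [(1, 0), (2, 0), (3, 0)],
)

cursor = conn.cursor()

# Two rows match, so the driver reports two affected rows.
cursor.execute("UPDATE record SET match_id = ? WHERE id IN (?, ?)", (1, 1, 2))
updated = cursor.rowcount

# No row has id 99, so nothing is affected.
cursor.execute("UPDATE record SET match_id = ? WHERE id = ?", (1, 99))
updated_none = cursor.rowcount

cursor.close()
conn.close()
```

Running the equivalent raw UPDATE through your own SQL Server connection and inspecting cursor.rowcount may help tell whether the zero originates in the driver or trigger layer rather than in Django.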

Google App Engine - query vs. filter clarification

My model:
class User(ndb.Model):
    name = ndb.StringProperty()
Is there any difference in terms of efficiency/cost/speed between the following two queries?
u = User.query(User.name == name).get()
u = User.query().filter(User.name == name).get()
Should I use one of them over the other? I assume the second one is worse because it first gets the entire User queryset and then applies the filter.
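There is no difference in functionality between the two, so you can choose whichever you like best. Building a query does not contact the datastore; nothing is fetched until you call get() or fetch(), so the second form does not load every User first. The Google documentation shows these two examples:
query = Account.query(Account.userid >= 40, Account.userid < 50)
and
query1 = Account.query() # Retrieve all Account entities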
query2 = query1.filter(Account.userid >= 40) # Filter on userid >= 40
query3 = query2.filter(Account.userid < 50) # Filter on userid < 50 too
and state:
query3 is equivalent to the query variable from the previous example.
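To make the equivalence concrete, here is a toy query builder in plain Python. ToyQuery and the filter functions are hypothetical stand-ins, not the ndb implementation. They only model the idea that filter() returns a new query object and that data is touched only when fetch() is called:

```python
class ToyQuery(object):
    """Hypothetical stand-in for a query object; not the ndb implementation."""

    def __init__(self, *filters):
        self.filters = tuple(filters)

    def filter(self, *more):
        # Returns a new query with the extra filters; nothing is fetched here.
        return ToyQuery(*(self.filters + more))

    def fetch(self, rows):
        # Only this call looks at the data.
        return [row for row in rows if all(check(row) for check in self.filters)]


def at_least_40(row):
    return row["userid"] >= 40


def below_50(row):
    return row["userid"] < 50


rows = [{"userid": n} for n in (10, 40, 45, 50, 60)]

direct = ToyQuery(at_least_40, below_50)
chained = ToyQuery().filter(at_least_40).filter(below_50)

same_filters = direct.filters == chained.filters
direct_result = direct.fetch(rows)
chained_result = chained.fetch(rows)
```

Both objects end up holding the same filters and return the same rows, which mirrors the statement that query3 is equivalent to query.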

GAE-NDB: how to prevent projection from changing the results

I used an ndb projection, but it changed the number of results. How can I keep the results from being affected by the projection?
class T(ndb.Model):
    name = ndb.StringProperty()
    name2 = ndb.StringProperty(repeated=True)
    @classmethod
    def test(cls):
        for i in range(0, 10):
            t = T(name=str(i))
            if i % 2 == 0:
                t.name2 = ["zzz"]
            t.put()
        qr = T.query()
        qo = ndb.QueryOptions(projection=['name', 'name2'])
        items, cursor, more = qr.fetch_page(20, options=qo)
        print len(items)
        qo = ndb.QueryOptions(projection=['name'])
        items, cursor, more = qr.fetch_page(20, options=qo)
        print len(items)
The result is 5, 10.
How can I make the result 10, 10?
Thanks
An empty list property (repeated=True) is not indexed, and because projection queries return their results straight from the index, entities with no values for that property are not returned.
Your test case is susceptible to the eventual-consistency that Tim's comment mentions, but it isn't the only issue.
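The counts 5 and 10 can be reproduced with a simplified model of index rows. This is a plain-Python sketch under the assumption stated above (one index row per combination of projected values, and no row when a repeated property is empty). It is not how Datastore is implemented, and the function names are hypothetical:

```python
def index_rows(entity, properties):
    """Toy model of index rows: one row per combination of property values."""
    rows = [()]
    for prop in properties:
        value = entity.get(prop)
        values = value if isinstance(value, list) else [value]
        # An empty repeated property contributes no values, so no rows survive.
        rows = [row + (v,) for row in rows for v in values]
    return rows


def projection_results(entities, properties):
    """Collect the index rows a projection over `properties` would read."""
    results = []
    for entity in entities:
        results.extend(index_rows(entity, properties))
    return results


# Same data as the test case: name2 is set only for even i.
entities = []
for i in range(10):
    entities.append({"name": str(i), "name2": ["zzz"] if i % 2 == 0 else []})

with_name2 = len(projection_results(entities, ["name", "name2"]))
name_only = len(projection_results(entities, ["name"]))
```

In this model with_name2 is 5 and name_only is 10, matching the output in the question: the five entities with an empty name2 have no row in an index that includes name2.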

Between query equivalent on App Engine datastore?

I have a model containing ranges of IP addresses, similar to this:
class Country(db.Model):
    begin_ipnum = db.IntegerProperty()
    end_ipnum = db.IntegerProperty()
On a SQL database, I would be able to find rows which contained an IP in a certain range like this:
SELECT * FROM Country WHERE ipnum BETWEEN begin_ipnum AND end_ipnum
or this:
SELECT * FROM Country WHERE begin_ipnum < ipnum AND end_ipnum > ipnum
Sadly, GQL only allows inequality filters on one property, and doesn't support the BETWEEN syntax. How can I work around this and construct a query equivalent to these on App Engine?
Also, can a ListProperty be 'live' or does it have to be computed when the record is created?
question updated with a first stab at a solution:
So based on David's answer below and articles such as these:
http://appengine-cookbook.appspot.com/recipe/custom-model-properties-are-cute/
I'm trying to add a custom field to my model like so:
class IpRangeProperty(db.Property):
    def __init__(self, begin=None, end=None, **kwargs):
        if not isinstance(begin, db.IntegerProperty) or not isinstance(end, db.IntegerProperty):
            raise TypeError('Begin and End must be Integers.')
        self.begin = begin
        self.end = end
        super(IpRangeProperty, self).__init__(self.begin, self.end, **kwargs)
    def get_value_for_datastore(self, model_instance):
        begin = self.begin.get_value_for_datastore(model_instance)
        end = self.end.get_value_for_datastore(model_instance)
        if begin is not None and end is not None:
            return range(begin, end)
class Country(db.Model):
    begin_ipnum = db.IntegerProperty()
    end_ipnum = db.IntegerProperty()
    ip_range = IpRangeProperty(begin=begin_ipnum, end=end_ipnum)
The thinking is that after I add the custom property I can just import my dataset as is and then run queries based on the ListProperty like so:
q = Country.gql('WHERE ip_range = :1', my_num_ipaddress)
When I try to insert new Country objects this fails, though, complaining about not being able to create the name:
...
File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/ext/db/__init__.py", line 619, in _attr_name
return '_' + self.name
TypeError: cannot concatenate 'str' and 'IntegerProperty' objects
I tried defining an attr_name method for the new property or just setting self.name but that does not seem to help. Hopelessly stuck or heading in the right direction?
Short answer: Between queries aren't really supported at the moment. However, if you know a priori that your range is going to be relatively small, then you can fake it: just store a list on the entity with every number in the range. Then you can use a simple equality filter to get entities whose ranges contain a particular value. Obviously this won't work if your range is large. But here's how it would work:
class M(db.Model):
    r = db.ListProperty(int)
# create an instance of M which has a range from `begin` to `end` (inclusive)
M(r=range(begin, end+1)).put()
# query to find instances of M which contain a value `v`
q = M.gql('WHERE r = :1', v)
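The expand-the-range idea can be tried out without the datastore. The sketch below is plain Python 3 (it uses the standard-library ipaddress module, which is an assumption on my part since the thread targets the Python 2 runtime). The helper names are hypothetical, and equality_filter_matches only models the "any element may match" behaviour of an equality filter on a list property:

```python
import ipaddress


def ip_to_int(ip):
    """Convert a dotted IPv4 string to the integer form the question stores."""
    return int(ipaddress.ip_address(ip))


def expand_range(begin, end):
    """Every integer from begin to end inclusive, as stored in M.r above."""
    return list(range(begin, end + 1))


def equality_filter_matches(stored_values, v):
    """Models 'WHERE r = :1' on a list property: any element may match."""
    return v in stored_values


begin = ip_to_int("1.2.3.0")
end = ip_to_int("1.2.3.255")
stored = expand_range(begin, end)

inside = equality_filter_matches(stored, ip_to_int("1.2.3.77"))
outside = equality_filter_matches(stored, ip_to_int("1.2.4.1"))
size = len(stored)
```

Even this single /24 block needs 256 stored values per entity, which is why the approach is only practical for small ranges.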
The better solution (eventually; for now the following only works on the development server due to a bug, see issue 798): in theory, you can work around the limitations you mentioned and perform a range query by taking advantage of how db.ListProperty is queried. The idea is to store both the start and end of your range in a list (in your case, integers representing IP addresses). Then, to get entities whose ranges contain some value v (i.e., between the two values in your list), you perform a query with two inequality filters on the list: one to ensure that v is at least as big as the smallest element in the list, and one to ensure that v is no bigger than the biggest element in the list.
Here's a simple example of how to implement this technique:
class M(db.Model):
    r = db.ListProperty(int)
# create an instance of M which has a range from `begin` to `end` (inclusive)
M(r=[begin, end]).put()
# query to find instances of M which contain a value `v`
q = M.gql('WHERE r >= :1 AND r <= :1', v)
My solution doesn't follow the pattern you have requested, but I think it would work well on app engine. I'm using a list of strings of CIDR ranges to define the IP blocks instead of specific begin and end numbers.
from google.appengine.ext import db
class Country(db.Model):
    subnets = db.StringListProperty()
    country_code = db.StringProperty()
c = Country()
c.subnets = ['1.2.3.0/24', '1.2.0.0/16', '1.3.4.0/24']
c.country_code = 'US'
c.put()
c = Country()
c.subnets = ['2.2.3.0/24', '2.2.0.0/16', '2.3.4.0/24']
c.country_code = 'CA'
c.put()
# Search for 1.2.4.5 starting with most specific block and then expanding until found
result = Country.all().filter('subnets =', '1.2.4.5/32').fetch(1)
result = Country.all().filter('subnets =', '1.2.4.4/31').fetch(1)
result = Country.all().filter('subnets =', '1.2.4.4/30').fetch(1)
result = Country.all().filter('subnets =', '1.2.4.0/29').fetch(1)
# ... repeat until found
# optimize by starting with the largest routing prefix actually found in your data (probably not 32)
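The list of blocks to try does not have to be written by hand. A minimal sketch, assuming Python 3 and its standard-library ipaddress module (the thread itself targets the Python 2 runtime), with a hypothetical helper name candidate_blocks:

```python
import ipaddress


def candidate_blocks(ip, max_prefix=32, min_prefix=0):
    """List the CIDR blocks containing `ip`, most specific first."""
    blocks = []
    for prefix in range(max_prefix, min_prefix - 1, -1):
        # strict=False masks off the host bits, e.g. 1.2.4.5/30 -> 1.2.4.4/30
        network = ipaddress.ip_network("%s/%d" % (ip, prefix), strict=False)
        blocks.append(str(network))
    return blocks


blocks = candidate_blocks("1.2.4.5")
first_four = blocks[:4]
```

Each string in blocks could then be passed in turn to the 'subnets =' filter shown above until a result is found. Lowering max_prefix to the longest prefix that actually occurs in your data skips the lookups that cannot match.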

What's the raw GQL to check a ReferenceProperty?

I have the following models:
class Author(db.Model):
    name = db.StringProperty()
class Story(db.Model):
    author = db.ReferenceProperty(Author)
What's the raw GQL to find all Stories by a certain author? In regular SQL I would use joins, but I don't think they are available in GQL.
Edit:
I'm looking for the raw GQL way; I know how to do it the Pythonic way. For instance, something like this (the following is probably totally wrong):
"SELECT * FROM Story WHERE author = :1", "Shakespeare"
I want to run the above from the GAE admin Data > Data Viewer > Query the Datastore. I want the raw SQL that someone could run from a typical mysql or psql shell.
Edit2: Ah, the raw-GQL for use in the data-viewer...
Here's one way:
1) Run this and get the ID number:
SELECT * FROM Author where name = 'shakespeare'
2) Using ID number from previous query, run this:
SELECT * FROM Story where author = key('Author', 12345)
Edit: at long last, the raw GQL:
(Easiest way: Use the implicit backreference property name; in the form "modelname_set".)
qry = GqlQuery("SELECT * FROM Author WHERE name = :1", "shakespeare")
shakespeare = qry.get()
shakespeare.story_set # this property now contains all Shakespeare's stories
or
qry0 = GqlQuery("SELECT * FROM Author WHERE name = :1", "shakespeare")
shakespeare = qry0.get()
qry1 = GqlQuery("SELECT * FROM Story WHERE author = :1", shakespeare.key())
shakespeare_stories = qry1.fetch(10) # probably good to have some limit here
I prefer this way:
qry = Author.all()
qry.filter('name = ', 'shakespeare')
shakespeare = qry.get()
shakespeare.story_set # this property now contains all Shakespeare's stories
The more involved way may sometimes be necessary:
qry0 = Author.all()
qry0.filter('name = ', 'shakespeare')
shakespeare = qry0.get()
qry1 = Story.all()
qry1.filter('author = ', shakespeare.key())
shakespeare_stories = qry1.fetch(10) # probably good to have some limit here
