Can't execute a distinct projection query - google-app-engine

I have a simple little "Observation" class:
from google.appengine.ext import ndb
class Observation(ndb.Model):
    remote_id = ndb.StringProperty()
    dimension_id = ndb.IntegerProperty()
    metric = ndb.StringProperty()
    timestamp_observed = ndb.StringProperty()
    timestamp_received = ndb.DateTimeProperty(auto_now_add=True)
    @classmethod
    def query_book(cls):
        return cls.query()
I can run projection queries against the Datastore to return only certain columns. For example:
observations = Observation.query().fetch(projection=[Observation.dimension_id])
This works nicely, but I only want unique results. The documentation makes this sound easy:
# Functionally equivalent
Article.query(projection=[Article.author], group_by=[Article.author])
Article.query(projection=[Article.author], distinct=True)
But when I do this:
observations = Observation.query().fetch(projection=[Observation.dimension_id], group_by=[Observation.dimension_id])
observations = Observation.query().fetch(projection=[Observation.dimension_id], distinct=True)
I get errors for both variants.
TypeError: Unknown configuration option ('group_by')
TypeError: Unknown configuration option ('distinct')
This happens on localhost and in the prod environment too. What am I missing?

Silly me: all of these parameters need to go in the query() call, not in fetch(). The projection argument on its own does work in fetch(), but to get distinct results you need to move both the projection and distinct arguments into query(), e.g. Observation.query(projection=[Observation.dimension_id], distinct=True).fetch().
From the Grouping section of the docs:
Projection queries can use the distinct keyword to ensure that only
completely unique results will be returned in a result set. This will
only return the first result for entities which have the same values
for the properties that are being projected.
Article.query(projection=[Article.author], group_by=[Article.author])
Article.query(projection=[Article.author], distinct=True)
Both queries are equivalent and will produce each author's name only
once.
Hope this helps anyone else with a similar problem :)
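To make the quoted semantics concrete, here is a plain-Python sketch of what a projection query with distinct=True returns conceptually: each entity is reduced to its projected values, and only the first row for each combination is kept. It does not touch the Datastore; the function name distinct_projection and the sample rows are hypothetical, and the real Datastore returns results in index order rather than in input order as this sketch does.

```python
from typing import Dict, Iterable, List, Tuple


def distinct_projection(rows: Iterable[Dict], fields: List[str]) -> List[Tuple]:
    """Project each row onto `fields` and keep only the first occurrence
    of each combination of projected values (emulation, not ndb)."""
    seen = set()
    result = []
    for row in rows:
        key = tuple(row[f] for f in fields)  # the projected values only
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result


# Hypothetical sample data shaped like Observation entities.
observations = [
    {"remote_id": "a", "dimension_id": 1, "metric": "cpu"},
    {"remote_id": "b", "dimension_id": 2, "metric": "cpu"},
    {"remote_id": "c", "dimension_id": 1, "metric": "mem"},
]

print(distinct_projection(observations, ["dimension_id"]))  # [(1,), (2,)]
```

Projecting on more properties widens the uniqueness key, so distinct only collapses rows whose projected values all match.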

Related

Read embedded entity from python ndb client

I am using the google cloud datastore python client to write an entity into the datastore which contains an embedded entity. An example entity might look like:
data_type: 1
raw_bytes: <unindexed blob>
values: <indexed embedded entity>
I checked the data from the console and the data is getting saved correctly and the values are present.
Next, I need to run a query from a python app engine application. I have represented the above as the following entity in my app engine code:
class DataValues(ndb.Model):
    param1 = ndb.BooleanProperty()
    param2 = ndb.IntegerProperty()
    param3 = ndb.IntegerProperty()
class MyEntity(ndb.Expando):
    data_type = ndb.IntegerProperty(required=True)
    raw_bytes = ndb.BlobProperty()
    values = ndb.StructuredProperty(DataValues)
One of the filters in the query depends on a property in values. Sample query code is as below:
MyEntity.query().filter(MyEntity.data_type == 1).filter(MyEntity.values.param1 == True).get()
I have created the corresponding composite index in my index.yaml.
The query runs successfully, but in the resulting entity the embedded entity values is None. All other property values are present.
What could be the issue here?
Add the properties of the DataValues entity as properties of MyEntity itself.
This is a bit of a guess, but since Datastore properties are, roughly speaking, keyed by both their name (in this case values) and the name of the property type/class (i.e. StructuredProperty), this might fix your problem:
class EmbeddedProperty(ndb.StructuredProperty):
    pass
class MyEntity(ndb.Expando):
    data_type = ndb.IntegerProperty(required=True)
    raw_bytes = ndb.BlobProperty()
    values = EmbeddedProperty(DataValues)
Give it a shot and let me know if values starts coming back non-null.
I struggled with the same problem, wanting to convert the embedded entity into a Python dictionary. One possible solution, although not a very elegant one, is to use a GenericProperty:
class MyEntity(ndb.Model):
    data_type = ndb.IntegerProperty(required=True)
    raw_bytes = ndb.BlobProperty()
    values = ndb.GenericProperty()
values will then be read as an "Expando" object: Expando(param1=False,...). You can access the individual values with values.param1, values.param2 etc. I would prefer having a custom model class, but this should do the job.

How to get result based on specific keywords using Google App Engine NO-SQL datastore query in java?

We use a GoogleInfo table in which we store the application's scopes, e.g. "https://www.googleapis.com/auth/drive.appdata,https://www.googleapis.com/auth/drive.file".
I am trying to get results based on a keyword such as "drive" appearing in the scopes. Here is my code; any suggestions?
String keywords[] = {"admin","drive","gmail","userinfo"};
Query query = pm.newQuery("SELECT scopes FROM com.cloudcodes.gcontrol.dataaccesslayer.insights.google.drive.GoogleInfo where :p.contains(scopes)");
result = (List) query.execute(Arrays.asList(keywords));
List tempResult = new ArrayList();
tempResult.addAll(result);
return tempResult;
It seems that you are using JDO [1] and the PersistenceManager [2].
As far as I can see the query you are trying to execute might be wrong. Check how to perform queries with JDO [3].
Your query is:
String keywords[] = {"admin","drive","gmail","userinfo"};
Query query = pm.newQuery("SELECT scopes FROM com.cloudcodes.gcontrol.dataaccesslayer.insights.google.drive.GoogleInfo where :p.contains(scopes)");
result = (List) query.execute(Arrays.asList(keywords));
Looks like you want to do something like:
//Query for all persons with lastName equal to Smith or Jones
Query q = pm.newQuery(Person.class, ":p.contains(lastName)");
q.execute(Arrays.asList("Smith", "Jones"));
Person.class is the kind.
Arrays.asList("Smith", "Jones") is the collection bound to the parameter :p.
The filter ":p.contains(lastName)" makes lastName the property whose value is checked against that collection.
You are setting the class as com.cloudcodes.gcontrol.dataaccesslayer.insights.google.drive.GoogleInfo. I suppose this is the fully qualified Java name, i.e. the package where the class lives plus the class name GoogleInfo. So you could try:
Query q = pm.newQuery(GoogleInfo.class, ":p.contains(scopes)");
q.execute(Arrays.asList("admin","drive","gmail","userinfo"));
You want to retrieve scopes. I assume that you want to use the REST API, so inside :p.contains(scopes) you may need a different property of the entity you want to retrieve, one whose values correspond to your keywords; maybe an array property?
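A plain-Python emulation may show why the original query finds nothing: ":p.contains(scopes)" tests whether the whole scopes value is an element of the keyword list, so a comma-separated URL string never equals "drive". The sketch also illustrates the array-property idea, assuming a hypothetical helper scope_keywords that extracts a short keyword from each scope URL and stores the results as a multi-valued property (for which, as I understand Datastore semantics, a match on any one value is enough). None of this is JDO or Datastore code.

```python
from typing import List

KEYWORDS = ["admin", "drive", "gmail", "userinfo"]

# The stored value as described in the question: one comma-separated string.
stored_scopes = ("https://www.googleapis.com/auth/drive.appdata,"
                 "https://www.googleapis.com/auth/drive.file")


def contains_filter(param: List[str], value: str) -> bool:
    """Emulates ':p.contains(field)': true only if the whole field value
    is an element of the collection bound to :p."""
    return value in param


def scope_keywords(scopes: str) -> List[str]:
    """Hypothetical helper: '.../auth/drive.appdata' -> 'drive'."""
    keywords: List[str] = []
    for url in scopes.split(","):
        name = url.rsplit("/", 1)[-1]    # e.g. 'drive.appdata'
        keyword = name.split(".", 1)[0]  # e.g. 'drive'
        if keyword not in keywords:
            keywords.append(keyword)
    return keywords


# Whole-string comparison: the long scopes string is not in the keyword list.
print(contains_filter(KEYWORDS, stored_scopes))  # False

# With a list-valued property, one matching element is enough.
stored_keywords = scope_keywords(stored_scopes)
print(stored_keywords)  # ['drive']
print(any(contains_filter(KEYWORDS, k) for k in stored_keywords))  # True
```

The design choice is to do the substring work once at write time, so the read-time filter is a plain equality match that an index can serve.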
Here I share with you some docs that might be useful [4][5].
Hope this helps!
[1] https://cloud.google.com/appengine/docs/standard/java/datastore/jdo/overview-dn2
[2] http://massapi.com/method/javax/jdo/PersistenceManager.newQuery-4.html
[3] https://cloud.google.com/appengine/docs/standard/java/datastore/jdo/queries
[4] https://cloud.google.com/datastore/docs/concepts/overview#comparison_with_traditional_databases
[5] https://cloud.google.com/datastore/docs/reference/gql_reference

Google NDB: Best way to read child entities from an entity, repeated property vs regular query?

Let's say I have this really simple parent/child relationship (every Answer instance always has a Question parent):
class Answer(ndb.Model):
    content = ndb.StringProperty()
    timestamp = ndb.DateTimeProperty()
    def to_message(self):
        """Returns a protoRPC message object of the answer"""
class Question(ndb.Model):
    content = ndb.StringProperty()
    answers = ndb.KeyProperty(repeated=True, kind='Answer')
    def to_message(self):
        """Returns a protoRPC message object of the question"""
The two to message methods are simply used to return a protoRPC object.
The question is: in the to_message method of the Question class, if I want to fetch all child Answer instances and use their own to_message method to turn them into a nice RPC message, is it better to:
Iterate over the answers repeated KeyProperty list
Do a query using a filter on the "parent" property, and iterate over the list it outputs
In terms of NDB access, the first method seems best, but since we're going to go over the free limit anyway, I'm more wondering whether the Datastore is more efficient at fetching things than I am at iterating over that list.
Edit: The original question has actually a very simple and obvious answer: the first way.
The real question would be, in case I have to filter out some Answer entities based on their attributes (for instance timestamp): is it better to query using a filter, or iterate over the list and use a condition to gather only the "interesting" entities?
With that schema you don't have to query anything, because you already have the key of each answer in the list question_entity.answers.
So you only have to fetch the answers using those keys. It is better to get all the answers in a single operation:
list_of_answers = ndb.get_multi(question_entity.answers)
(More info at NDB Entities and Keys)
On the other hand, if you model that relationship with a KeyProperty in Answer:
class Answer(ndb.Model):
    question = ndb.KeyProperty(Question)
    content = ndb.StringProperty()
    timestamp = ndb.DateTimeProperty()
    def to_message(self):
        """Returns a protoRPC message object of the answer"""
or with ancestors:
answer = Answer(parent=question_entity.key)
In these cases you should use a normal query to retrieve the answers:
answers = Answer.query(Answer.question == question_entity.key)
or an ancestor query:
answers = Answer.query(ancestor = question_entity.key)
respectively.
This means two jobs: querying the index and then fetching the entities from the Datastore. In conclusion, in this case the first approach is cheaper for retrieving Datastore data.
Using ndb.get_multi on the list of keys to fetch the Answers, and then iterating to call their to_message methods will be the most efficient.
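For the follow-up in the question's edit (keeping only some Answers, e.g. by timestamp), a minimal in-memory sketch is below. Because ndb.get_multi already reads every Answer in the key list, filtering the returned list in Python adds no further Datastore operations. The Answer dataclass here is a plain stand-in, not the ndb model, and the helper name answers_since is hypothetical. It also skips None entries, since ndb.get_multi returns None for keys whose entity no longer exists.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Answer:
    """Plain stand-in for the ndb Answer model (not an ndb class)."""
    content: str
    timestamp: datetime


def answers_since(answers: List[Optional[Answer]], cutoff: datetime) -> List[Answer]:
    """Keep the answers whose timestamp is at or after `cutoff`,
    skipping None entries for keys that had no entity."""
    return [a for a in answers if a is not None and a.timestamp >= cutoff]


# In the app this list would come from ndb.get_multi(question_entity.answers).
fetched = [
    Answer("first", datetime(2013, 1, 1)),
    Answer("second", datetime(2013, 6, 1)),
    None,
    Answer("third", datetime(2014, 1, 1)),
]
recent = answers_since(fetched, datetime(2013, 5, 1))
print([a.content for a in recent])  # ['second', 'third']
```

If the key list grows large and only a small fraction passes the filter, a query on Answer.timestamp may read fewer entities, at the cost of the index lookup described above.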

Google app engine: ndb sort property

I have the following model:
class Product(ndb.Model):
    name = ndb.StringProperty()
    bidTime = ndb.DateTimeProperty()
    price = ndb.IntegerProperty()
    ...
I'd like to use the following query:
productRanks = Product.query(Product.bidTime>=startDate,
Product.bidTime<endDate).order(-Product.price).fetch()
where startDate and endDate are datetime objects. But I got the following error message:
The first sort property must be the same as the property to which the inequality filter is applied
If I add Product.bidTime in the order then there will be no error:
.order(Product.bidTime, -Product.price)
However, the sorted result would be wrong (according to date, not price). So, what is the problem?
There is no problem as far as App Engine is concerned. It is behaving as documented. From the docs:
Note: Because of the way the App Engine Datastore executes queries, if
a query specifies inequality filters on a property and sort orders on
other properties, the property used in the inequality filters must be
ordered before the other properties.
See https://developers.google.com/appengine/docs/python/datastore/queries#Sort_Orders
You may need to sort in memory after you get your result set.
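A minimal sketch of that two-step approach, using plain dicts in place of Product entities so it runs without App Engine: step 1 mirrors what the Datastore query keeps (Product.query(Product.bidTime >= startDate, Product.bidTime < endDate).fetch()), and step 2 is the in-memory sort by price, highest first. The helper name rank_by_price and the sample data are hypothetical.

```python
from datetime import datetime
from typing import Dict, List

# Hypothetical sample data shaped like Product entities.
products: List[Dict] = [
    {"name": "lamp", "bidTime": datetime(2013, 3, 2), "price": 40},
    {"name": "desk", "bidTime": datetime(2013, 3, 5), "price": 120},
    {"name": "chair", "bidTime": datetime(2013, 3, 9), "price": 75},
]


def rank_by_price(rows: List[Dict], start: datetime, end: datetime) -> List[Dict]:
    # Step 1: the inequality filter on bidTime (done by the Datastore in the app).
    in_range = [p for p in rows if start <= p["bidTime"] < end]
    # Step 2: sort in memory by price, descending.
    return sorted(in_range, key=lambda p: p["price"], reverse=True)


ranked = rank_by_price(products, datetime(2013, 3, 1), datetime(2013, 3, 8))
print([p["name"] for p in ranked])  # ['desk', 'lamp']
```

With real entities the key would be lambda p: p.price. This works well as long as the date range returns a result set small enough to hold in memory.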

NDB projection & caching questions

I have a couple of doubts about how NDB projection queries work and how caching behaves behind the scenes.
So given a model similar to:
class Users(ndb.Model):
    user_name = ndb.StringProperty(required=True)
    user_email = ndb.StringProperty(required=True)
    user_password = ndb.StringProperty(required=True)
    @classmethod  # THIS ONE DOES NOT WORK
    def get_profile_info(cls, id):
        return ndb.Key(Users, id).get(projection=[Users.user_name])
    @classmethod  # THIS ONE WORKS
    def get_profile_info(cls, id):
        return Users.query(Users.key == ndb.Key(Users, id)).get(projection=[Users.user_name])
Why does the first classmethod raise a "TypeError: Unknown configuration option ('projection')"? Can't I simply call a projection on a direct get of a key, instead of having to query for a key?
Secondly, regarding caching, I'm not sure if I correctly understood this thread: NDB Caching When Using Projected Queries
Aren't projected queries cached? Does this mean it's better to simply call get() (and fetch the whole instance) so it is cached, instead of projecting?
Thanks in advance!
As the error indicates, a projection makes no sense when using get(). From the docs: "It only gets values for those properties in the projection. It gets this data from the query index (and thus, properties in the projection must be indexed)". A get() by key reads the entity itself rather than reading property values from an index, so there is nothing to project. Note gvr's comment on caching in the question you referenced.
