collection_name for ndb.KeyProperty


db.ReferenceProperty allowed for a collection_name argument that specified how the reference object R could query for the objects that contain R as a reference.
I don't see a similar argument for ndb.KeyProperty. How do folks get around that?

It took a little getting used to when I made the switch to ndb, but it is actually simpler. The collection_name is just syntactic sugar for a query, and you can do the query yourself:
from google.appengine.ext import ndb

class MyEntity(ndb.Model):
    a_key = ndb.KeyProperty()

entities = MyEntity.query(MyEntity.a_key == some_key)

Related

How to get result based on specific keywords using Google App Engine NO-SQL datastore query in java?

We have a GoogleInfo table in which we store the scopes of an application, e.g. "https://www.googleapis.com/auth/drive.appdata,https://www.googleapis.com/auth/drive.file"
I am trying to get results based on a keyword such as "drive" that appears in the scopes. The following is my code. Please suggest a fix.
String keywords[] = {"admin","drive","gmail","userinfo"};
Query query = pm.newQuery("SELECT scopes FROM com.cloudcodes.gcontrol.dataaccesslayer.insights.google.drive.GoogleInfo where :p.contains(scopes)");
result = (List) query.execute(Arrays.asList(keywords));
List tempResult = new ArrayList();
tempResult.addAll(result);
return tempResult;

It seems that you are using JDO [1] and the PersistenceManager [2].
As far as I can see, the query you are trying to execute might be wrong. Check how to perform queries with JDO [3].
Your query is:
String keywords[] = {"admin","drive","gmail","userinfo"};
Query query = pm.newQuery("SELECT scopes FROM com.cloudcodes.gcontrol.dataaccesslayer.insights.google.drive.GoogleInfo where :p.contains(scopes)");
result = (List) query.execute(Arrays.asList(keywords));
Looks like you want to do something like:
//Query for all persons with lastName equal to Smith or Jones
Query q = pm.newQuery(Person.class, ":p.contains(lastName)");
q.execute(Arrays.asList("Smith", "Jones"));
Here Person.class is the kind, Arrays.asList("Smith", "Jones") is the list of values bound to the parameter :p, and the filter ":p.contains(lastName)" names lastName as the property to check against that list.
You are setting the class as com.cloudcodes.gcontrol.dataaccesslayer.insights.google.drive.GoogleInfo. I suppose that this is the fully qualified Java name of the class, and that the class name is GoogleInfo. So you could try:
Query q = pm.newQuery(GoogleInfo.class, ":p.contains(scopes)");
q.execute(Arrays.asList("admin","drive","gmail","userinfo"));
You want to retrieve scopes. I assume that you want to use the REST API. So the property inside :p.contains(scopes) might need to be a different property of the entity you want to retrieve, one that relates to your keywords, maybe an array property?
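One way to read the array-property suggestion is to derive keyword tokens from each scope URL when the entity is written, so that an exact-match membership filter has something to match. The following is a minimal sketch of that token extraction in Python, not JDO code; the function name scope_keywords and the rule of splitting the last path segment on dots are assumptions made for illustration.

```python
from urllib.parse import urlparse


def scope_keywords(scopes_csv):
    """Split a comma-separated scopes string into sorted lowercase keyword tokens."""
    keywords = set()
    for scope in scopes_csv.split(","):
        path = urlparse(scope.strip()).path      # e.g. "/auth/drive.appdata"
        last_segment = path.rsplit("/", 1)[-1]   # e.g. "drive.appdata"
        for token in last_segment.split("."):
            if token:
                keywords.add(token.lower())
    return sorted(keywords)


scopes = (
    "https://www.googleapis.com/auth/drive.appdata,"
    "https://www.googleapis.com/auth/drive.file"
)
print(scope_keywords(scopes))  # ['appdata', 'drive', 'file']
```

Storing such a list next to the raw scopes string would let a keyword like "drive" be matched as a whole value rather than as a substring.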
Here I share with you some docs that might be useful [4][5].
Hope this helps!
[1] https://cloud.google.com/appengine/docs/standard/java/datastore/jdo/overview-dn2
[2] http://massapi.com/method/javax/jdo/PersistenceManager.newQuery-4.html
[3] https://cloud.google.com/appengine/docs/standard/java/datastore/jdo/queries
[4] https://cloud.google.com/datastore/docs/concepts/overview#comparison_with_traditional_databases
[5] https://cloud.google.com/datastore/docs/reference/gql_reference

Django query filter using large array of ids in Postgres DB

I want to run a query from Django against my PostgreSQL database. When I filter the query using a large array of ids, the query is very slow and takes up to 70 s.
After looking for an answer I saw this post, which gives a solution to my problem: simply replace the ARRAY[ids] in the IN statement with VALUES (id1), (id2), ....
I tested the solution with a raw query in pgadmin, the query goes from 70s to 300ms...
How can I do the same command (i.e. not using an array of ids but a query with VALUES) in Django?

I found a solution building on @erwin-brandstetter's answer, using a custom lookup:
from django.db.models import Lookup
from django.db.models.fields import Field

@Field.register_lookup
class EfficientInLookup(Lookup):
    lookup_name = "ineff"

    def as_sql(self, compiler, connection):
        lhs, lhs_params = self.process_lhs(compiler, connection)
        rhs, rhs_params = self.process_rhs(compiler, connection)
        params = lhs_params + rhs_params
        return "%s IN (SELECT unnest(%s))" % (lhs, rhs), params
This lets you filter like this:
MyModel.objects.filter(id__ineff=<list-of-values>)

The trick is to transform the array to a set somehow.
Instead of (this form is only good for a short array):
SELECT *
FROM tbl t
WHERE t.tbl_id = ANY($1);
-- WHERE t.tbl_id IN($1); -- equivalent
$1 being the array parameter.
You can still pass an array like you had it, but unnest and join. Like:
SELECT *
FROM tbl t
JOIN unnest($1) arr(id) ON arr.id = t.tbl_id;
Or you can keep your query, too, but replace the array with a subquery unnesting it:
SELECT * FROM tbl t
WHERE t.tbl_id = ANY (SELECT unnest($1));
Or:
SELECT * FROM tbl t
WHERE t.tbl_id IN (SELECT unnest($1));
Same effect for performance as passing a set with a VALUES expression. But passing the array is typically much simpler.
Detailed explanation:
IN vs ANY operator in PostgreSQL
How to use ANY instead of IN in a WHERE clause with Rails?
Optimizing a Postgres query with a large IN
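For the VALUES form that the question asks about, a rough sketch of building the statement and its parameter list in Python is shown below. The helper name build_values_in_query is hypothetical, and the table and column names are the placeholders from the SQL above. The resulting pair uses %s placeholders, so it could presumably be handed to something like Model.objects.raw(sql, params) or cursor.execute(sql, params), though that has not been tested here.

```python
def build_values_in_query(table, column, ids):
    """Return (sql, params) for an IN (VALUES ...) filter, one placeholder per id."""
    if not ids:
        raise ValueError("ids must not be empty")
    # Identifiers cannot be bound as query parameters, so accept plain names only.
    for name in (table, column):
        if not name.isidentifier():
            raise ValueError("unsafe identifier: %r" % name)
    placeholders = ", ".join(["(%s)"] * len(ids))
    sql = "SELECT * FROM {table} WHERE {column} IN (VALUES {values})".format(
        table=table, column=column, values=placeholders
    )
    return sql, list(ids)


sql, params = build_values_in_query("tbl", "tbl_id", [1, 2, 3])
print(sql)     # SELECT * FROM tbl WHERE tbl_id IN (VALUES (%s), (%s), (%s))
print(params)  # [1, 2, 3]
```

Keeping the ids as bound parameters instead of formatting them into the string leaves value quoting to the database driver.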

Is this an example of the first thing you're asking?
relation_list = list(ModelA.objects.filter(id__gt=100))
obj_query = ModelB.objects.filter(a_relation__in=relation_list)
That would be an "IN" command because you're first evaluating relation_list by casting it to a list, and then using it in your second query.
If instead you do the exact same thing without casting the first queryset to a list, Django will only make one query, and do SQL optimization for you. So it should be more efficient that way.
You can always see the SQL command you'll be executing with obj_query.query if you're curious what's happening under the hood.
Hope that answers the question, sorry if it doesn't.

I had a lot of trouble getting the custom lookup 'ineff' to work.
I may have solved it, but would love some validation from Django and Postgres experts.
1) Using it 'directly' on a ForeignKey field (ModelB)
ModelA.objects.filter(ModelB__ineff=queryset_ModelB)
Throws the following exception:
"Related Field got invalid lookup: ineff"
ForeignKey fields cannot be used with custom lookups.
A similar issue is reported here:
Custom lookup is not being registered in Django
2) Using it 'indirectly' on the pk field of related model (ModelB.id)
ModelA.objects.filter(ModelB__id__ineff=queryset_ModelB.values_list('id', flat=True))
Throws the following exception:
"can only concatenate list (not "tuple") to list"
Looking at Django Traceback, I noticed that rhs_params is a tuple.
Yet we try to add it to lhs_params (a list) in our custom lookup.
Hence I changed:
params = lhs_params + rhs_params
into:
params = lhs_params + list(rhs_params)
3) I then got a Postgres error (so at least I had got past the Django ORM)
"function unnest(uuid) does not exist"
"HINT: No function matches the given name and argument types. You might need to add explicit type casts."
I apparently solved it by changing the sql:
from:
return "%s IN (SELECT unnest(%s))" % (lhs, rhs), params
to:
return "%s IN (SELECT unnest(ARRAY(%s)))" % (lhs, rhs), params
Hence my final as_sql method looks like this:
def as_sql(self, compiler, connection):
    lhs, lhs_params = self.process_lhs(compiler, connection)
    rhs, rhs_params = self.process_rhs(compiler, connection)
    params = lhs_params + list(rhs_params)
    return "%s IN (SELECT unnest(ARRAY(%s)))" % (lhs, rhs), params
It seems to work, and is indeed faster than __in (tested with EXPLAIN ANALYZE in Postgres).
But I would love to have some validation from experts, perhaps Erwin Brandstetter?
Thanks for your input.
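The list/tuple error from step 2 can be reproduced in plain Python, independent of Django. This is only a small sketch: combine_params is a hypothetical helper, and normalizing both sides with list() is one possible way to avoid depending on which sequence type each side happens to be.

```python
def combine_params(lhs_params, rhs_params):
    """Concatenate two parameter sequences whether they are lists or tuples."""
    return list(lhs_params) + list(rhs_params)


lhs_params = ["a"]
rhs_params = ("b", "c")

try:
    lhs_params + rhs_params  # list + tuple is not allowed
except TypeError as exc:
    print("fails:", exc)

print(combine_params(lhs_params, rhs_params))  # ['a', 'b', 'c']
```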

Can't execute a distinct projection query

I have a simple little "Observation" class:
from google.appengine.ext import ndb

class Observation(ndb.Model):
    remote_id = ndb.StringProperty()
    dimension_id = ndb.IntegerProperty()
    metric = ndb.StringProperty()
    timestamp_observed = ndb.StringProperty()
    timestamp_received = ndb.DateTimeProperty(auto_now_add=True)

    @classmethod
    def query_book(cls):
        return cls.query()
I can run projection queries against the Datastore to return only certain columns. E.g:
observations = Observation.query().fetch(projection=[Observation.dimension_id])
This works nicely, but I only want unique results. The documentation makes this sound easy:
# Functionally equivalent
Article.query(projection=[Article.author], group_by=[Article.author])
Article.query(projection=[Article.author], distinct=True)
But when I do this:
observations = Observation.query().fetch(projection=[Observation.dimension_id], group_by=[Observation.dimension_id])
observations = Observation.query().fetch(projection=[Observation.dimension_id], distinct=True)
I get errors for both variants.
TypeError: Unknown configuration option ('group_by')
TypeError: Unknown configuration option ('distinct')
This happens on localhost and in the prod environment too. What am I missing?

Silly me: all of these parameters need to go in the query() call, not in fetch(). The projection argument does actually work in fetch(), but you need to move both the projection and the distinct (or group_by) arguments into query() to get it to work.
From Grouping:
Projection queries can use the distinct keyword to ensure that only
completely unique results will be returned in a result set. This will
only return the first result for entities which have the same values
for the properties that are being projected.
Article.query(projection=[Article.author], group_by=[Article.author])
Article.query(projection=[Article.author], distinct=True)
Both queries are equivalent and will produce each author's name only
once.
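To make the quoted rule concrete ("only return the first result for entities which have the same values for the properties that are being projected"), here is an in-memory emulation in plain Python. It is not the ndb API: distinct_projection and the sample dictionaries are hypothetical, and "first" here follows list order, whereas the Datastore decides ordering itself.

```python
def distinct_projection(entities, projected):
    """Keep the first entity seen for each combination of projected values."""
    seen = set()
    results = []
    for entity in entities:
        key = tuple(entity[name] for name in projected)
        if key not in seen:
            seen.add(key)
            # Return only the projected properties, as a projection query would.
            results.append({name: entity[name] for name in projected})
    return results


observations = [
    {"dimension_id": 1, "metric": "temp"},
    {"dimension_id": 2, "metric": "temp"},
    {"dimension_id": 1, "metric": "humidity"},
]
print(distinct_projection(observations, ["dimension_id"]))
# [{'dimension_id': 1}, {'dimension_id': 2}]
```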
Hope this helps anyone else with a similar problem :)

How do Django queries 'cast' argument strings into the appropriate field-matching types?

Let's take the Django tutorial. In the first part we can find this model:
class Poll(models.Model):
    question = models.CharField(max_length=200)
    pub_date = models.DateTimeField('date published')
with which Django generates the following SQL:
CREATE TABLE "polls_poll" (
"id" serial NOT NULL PRIMARY KEY,
"question" varchar(200) NOT NULL,
"pub_date" timestamp with time zone NOT NULL
);
One can note that Django automatically added an AutoField, gloriously named id, which is akin to an IntegerField in that it handles integers.
In part 3, we build a custom view, reachable through the following url pattern:
(r'^polls/(?P<poll_id>\d+)/$', 'polls.views.detail'),
The tutorial helpfully explains that a subsequent HTTP request will result in the following call:
detail(request=<HttpRequest object>, poll_id='23')
A few scrolls later, we can find this snippet:
def detail(request, poll_id):
    try:
        p = Poll.objects.get(pk=poll_id)
Notice how the URL tail component becomes the poll_id argument with a string value of '23'. The Manager (and therefore QuerySet) get method happily accepts it and produces the result of an SQL query whose WHERE clause contains the integer value 23, presumably looking like this:
SELECT * FROM polls_poll WHERE id=23
Django certainly performed the conversion based on the fact that the id field is an AutoField. The question is how, and when. Specifically, I want to know which internal methods are called, and in what order (kind of like what the doc explains for form validation).
Note: I took a look at the sources in django.db.models and found a few *prep* methods, but I know neither when nor where they are called, let alone whether they're what I'm looking for.
PS: I know it's not casting stricto sensu, but I think you get the idea.

I think it's in django.db.models.query.get_where_clause
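As a rough illustration of the kind of hook those *prep* methods provide, here is a toy model in plain Python. It is not Django's source: ToyIntegerField and build_where are hypothetical stand-ins, and the assumption is that the field, not the view or the manager, turns the string into an integer while the query's parameters are being prepared. In recent Django versions the method playing this role is, as far as I know, the field's get_prep_value().

```python
class ToyIntegerField:
    """Hypothetical stand-in for an integer model field (not Django's class)."""

    def get_prep_value(self, value):
        # Convert a Python value into the type the database column expects.
        if value is None:
            return None
        return int(value)


def build_where(column, field, raw_value):
    """Return a parameterized WHERE fragment and its prepared parameters."""
    prepared = field.get_prep_value(raw_value)
    return '"%s" = %%s' % column, [prepared]


sql, params = build_where("id", ToyIntegerField(), "23")
print(sql, params)  # "id" = %s [23]
```

In this sketch the string '23' from the URL never reaches the SQL text; only the converted integer is passed along as a parameter.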

Querying on keys only from a back-reference

If I have a class that has a back reference (e.g. something_set), how do I query for keys only on that set? The Query() constructor allows you to do this by setting keys_only=True, but as far as I can tell, filtering directly on the back reference always de-references the entities when it returns them.

You can't - keys_only needs to be set when the Query is constructed, and that's already done for you when you access something_set.
That said, foo.bar_set is just syntactic sugar for:
q = Foo.all().filter('bar =', foo_instance)
So you can build that query yourself and pass keys_only=True to the all() method.
