Django-queryset get one object per field=foo - database

I have a basic FK to user, call it owner
class Baz(models.Model):
    owner = models.ForeignKey(User)
    ....
    ....
Now, with a queryset of Baz's, is there something that I can chain that will give me only one Baz per owner?

What you are really looking for is GROUP BY. However, Django does not typically support building querysets that do not directly output model instances. In this situation, you have two approaches:
Baz.objects.values('owner').distinct()
This will net you each distinct owner, but not the Baz object itself.
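Baz.objects.filter(pk__in=Baz.objects.values('owner').annotate(latest=Max('pk')).values('latest'))
The above (with Max imported from django.db.models) selects the highest pk for each owner in a subquery (at least in MySQL) and should give the intended results, but isn't the most efficient way to retrieve them.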
Lastly, since aggregates have been added, it may be possible for you to write a custom aggregate class which would work as a kind of "Distinct" and simply "GROUP BY ".

I believe the question is how to run the equivalent of this:
SELECT * FROM myapp_baz GROUP BY owner_id;
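This returns one row for each unique owner_id in MySQL (with ONLY_FULL_GROUP_BY disabled) and SQLite; PostgreSQL rejects it because the other selected columns are neither grouped nor aggregated.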
It looks like this does the trick:
qs = Baz.objects.all()
qs.query.group_by = ['owner_id']
# Seems to do the trick
print(list(qs))

This is probably not the best solution (would like to keep it out of memory and in querysets) but:
>>> d={}
>>> [d.setdefault(str(a.owner),a) for a in qs ]
>>> d.values()
does return a list of objects, one for each owner: the first one encountered, so the latest only if qs is ordered newest-first. I have real reservations about the scalability of this solution.
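A minimal sketch of the same in-memory idea, using plain tuples as hypothetical stand-ins for Baz instances (the `Row` type and the sample data are assumptions for illustration, not part of the original models). It shows that `setdefault` keeps the first row seen per owner, so the iteration order decides which row survives:

```python
from collections import namedtuple

# Hypothetical stand-in for Baz rows; a real queryset would yield model instances.
Row = namedtuple("Row", ["id", "owner"])


def one_per_owner(rows):
    """Return one row per owner: the first one seen for each owner."""
    seen = {}
    for row in rows:
        # setdefault stores the row only if this owner is not already a key.
        seen.setdefault(row.owner, row)
    return list(seen.values())


rows = [Row(1, "alice"), Row(2, "bob"), Row(3, "alice")]

# In the original order, the earliest row per owner is kept.
first_per_owner = one_per_owner(rows)

# Iterating newest-first (highest id first) keeps the latest row per owner.
latest_per_owner = one_per_owner(sorted(rows, key=lambda r: r.id, reverse=True))
```

Everything is still loaded into memory, so the scalability concern stands.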

EDIT: This has a higher chance of working:
CheckIn.objects.extra(where=['checkins_checkin.id = (SELECT MAX(temp.id) FROM checkins_checkin temp, auth_user WHERE auth_user.id = temp.user_id AND temp.user_id = checkins_checkin.user_id)']).count()
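The idea behind that kind of correlated subquery (keep a row only if its id is the highest among rows with the same owner) can be tried outside Django. The sketch below uses Python's built-in sqlite3 module with a hypothetical `baz` table and made-up rows; the table and column names are assumptions, not the asker's schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table mirroring the question: an id and an owner_id column.
conn.execute("CREATE TABLE baz (id INTEGER PRIMARY KEY, owner_id INTEGER)")
conn.executemany(
    "INSERT INTO baz (id, owner_id) VALUES (?, ?)",
    [(1, 10), (2, 20), (3, 10), (4, 20), (5, 30)],
)

# Keep only the row whose id is the maximum among rows with the same owner.
latest_rows = conn.execute(
    """
    SELECT id, owner_id FROM baz
    WHERE id = (SELECT MAX(t.id) FROM baz AS t WHERE t.owner_id = baz.owner_id)
    ORDER BY owner_id
    """
).fetchall()
conn.close()
```

Unlike a bare `SELECT * ... GROUP BY owner_id`, this form does not rely on database-specific leniency about ungrouped columns.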

Related

Query given keys

I would like to accomplish some sort of hybrid solution between ndb.get_multi() and Query().
I have a set of keys, that I can use with:
entities = ndb.get_multi(keys)
I would like to query, filter, and order these entities using Query() or some way more efficient than doing it all myself in Python code.
How do people go about doing this? I want something like this:
query = Entity.gql('WHERE __key__ in :1 AND prop1 = :2 ORDER BY prop2', keys, 'hello')
entities = query.fetch()
Edit:
The above code works just fine, but it seems like fetch() never uses values from cache, whereas ndb.get_multi() does. Am I correct about this? If not, is the gql+fetch method much worse than get_multi+manual processing?
There is no way to run a query over already-fetched entities unless you write it yourself, but all of this can easily be done with Python's built-in filtering and sorting. Note that it's more efficient to run a query if you have a big dataset than to get_multi hundreds of keys only to keep 5 entities.
entities = ndb.get_multi(keys)
# filtering
entities = [e for e in entities if e.prop1 == 'bla' and e.prop2 > 3]
#sorting by multiple properties
entities = sorted(entities, key=lambda x: (x.prop1, x.prop2))
UPDATE: And yes, the cache is only used when you retrieve an entity by key; it is not used when you query for entities.
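A minimal sketch of the filter-then-sort step as a reusable helper, using a namedtuple as a hypothetical stand-in for ndb entities (the `Entity` type, the helper name, and the sample values are assumptions). Since get_multi returns None for keys that have no entity, the helper skips those before comparing properties:

```python
from collections import namedtuple

# Hypothetical stand-in for ndb entities; the None below mimics a key
# for which get_multi found no entity.
Entity = namedtuple("Entity", ["prop1", "prop2"])


def filter_and_sort(entities, prop1_value):
    """Drop missing entities, keep those matching prop1, order by prop2."""
    matching = [e for e in entities if e is not None and e.prop1 == prop1_value]
    return sorted(matching, key=lambda e: e.prop2)


fetched = [Entity("hello", 3), None, Entity("bye", 1), Entity("hello", 2)]
result = filter_and_sort(fetched, "hello")
```

This mirrors `WHERE prop1 = :2 ORDER BY prop2` from the question, applied to entities that were already fetched by key.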

Why can't properties referenced in an equality (EQUAL) or membership (IN) filter be projected?

https://developers.google.com/appengine/docs/java/datastore/projectionqueries
Why is a projection query such as SELECT A FROM kind WHERE A = 1 not supported?
Because it makes no sense. You are asking
SELECT A FROM kind WHERE A = 1
so, give me A where A = 1. Well, you already know that A = 1. It makes no sense for DB to allow that.
The IN query is internally just a series of equals queries merged together, so the same logic applies to it.
The reasoning behind this could be that since you already have the values of the properties you are querying you don't need them returned by the query. This is probably a good thing in the long run, but honestly, it's something that App Engine should allow anyway. Even if it didn't actually fetch these values from the datastore, it should add them to the entities returned to you behind the scenes so you can go about your business.
Anyway, here's what you can do...
query = MyModel.query().filter(MyModel.prop1 == 'value1', MyModel.prop2 == 'value2')
results = query.fetch(projection=[MyModel.prop3])
for r in results:
    r.prop1 = 'value1'  # the value you KNOW is correct
    r.prop2 = 'value2'
Again, would be nice for this to happen behind the scenes because I don't think it's something anybody should ever care about. If I mention a property in a projection list, I'm already stating that I want that property as part of my entities. I shouldn't have to do any more computation to get that to happen.
On the other hand, it's just an extra for-loop. :)

Rails 3, ActiveRecord, PostgreSQL - ".uniq" command doesn't work?

I have following query:
Article.joins(:themes => [:users]).where(["articles.user_id != ?", current_user.id]).order("Random()").limit(15).uniq
and gives me the error
PG::Error: ERROR: for SELECT DISTINCT, ORDER BY expressions must appear in select list
LINE 1: ...s"."user_id" WHERE (articles.user_id != 1) ORDER BY Random() L...
When I update the original query to
Article.joins(:themes => [:users]).where(["articles.user_id != ?", current_user.id]).order("Random()").limit(15)#.uniq
the error is gone. In MySQL .uniq works, but in PostgreSQL it does not. Is there an alternative?
As the error states, for SELECT DISTINCT, ORDER BY expressions must appear in the select list.
Therefore, you must explicitly select the expression you are ordering by.
Here is an example; it is similar to your case but generalized a bit.
Article.select('articles.*, RANDOM()')
  .joins(:users)
  .where(:column => 'whatever')
  .order('Random()')
  .uniq
  .limit(15)
So, explicitly include your ORDER BY clause (in this case RANDOM()) using .select(). As shown above, in order for your query to return the Article attributes, you must explicitly select them also.
I hope this helps; good luck
Just to enrich the thread with more examples, in case you have nested relations in the query, you can try with the following statement.
Person.find(params[:id]).cars.select('cars.*, lower(cars.name)').order("lower(cars.name) ASC")
In the given example, you're asking all the cars for a given person, ordered by model name (Audi, Ferrari, Porsche)
I don't think this is a better way, but it may help with this kind of situation if you prefer to think in terms of objects and collections rather than in relational (database) terms.
Thanks!
I assume that the .uniq method is translated to a DISTINCT clause in the SQL. PostgreSQL is pickier than MySQL: when using DISTINCT, every expression in the ORDER BY clause must also appear in the select list (and, similarly, every non-aggregated column in the select list must appear in GROUP BY).
It's a little unclear what you are attempting to do (a random ordering?). In addition to posting the full SQL sent, if you could explain your objective, that might be helpful in finding an alternative.
I just upgraded my 100% working and tested application from 3.1.1 to 3.2.7 and now have this same PG::Error.
I am using Cancan...
@users = User.accessible_by(current_ability).order('lname asc').uniq
Removing the .uniq solves the problem and it was not necessary anyway for this simple query.
Still looking through the change notes between 3.1.1 and 3.2.7 to see what caused this to break.

How to debug GQL queries in GAE?

I have some custom user model, and I count the number of users with name Joe:
c = UserModel.all().filter('name =', 'Joe').count()
Even though I know there is a Joe in the datastore, there is some mistake which makes c == 0.
This is a problem I'm dealing with, however the biggest problem is that I don't know how to debug this.
I would like to get some query and visualise it somehow, so that I can understand what is there and why Joe is not there:
v = magically_visualise_contents_of(UserModel.all().filter('name =','Joe'))
handler.response.out.write(v)
Try running the query directly in the datastore viewer by GQL.
That usually helps identify minor issues, for example:
SELECT * FROM UserModel WHERE name = 'Joe'
Also, one common mistake with string matching is whitespace characters in the data, like "Joe ".
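One way to spot that kind of near-miss is to print the `repr()` of each stored value, which makes trailing spaces and case differences visible. The sketch below is plain Python over a hypothetical list of names (the helper and the sample values are assumptions; in the real handler the names would come from iterating over `UserModel.all()`):

```python
def describe_names(names, wanted):
    """For each stored name, report its repr and how it compares to `wanted`."""
    report = []
    for name in names:
        if name == wanted:
            status = "exact match"
        elif name.strip() == wanted:
            status = "matches only after strip()"
        else:
            status = "no match"
        # repr() shows quotes around the value, so stray whitespace is visible.
        report.append((repr(name), status))
    return report


# Hypothetical values as they might be stored in the datastore.
stored = ["Joe ", "Joe", "joe"]
report = describe_names(stored, "Joe")
```

Writing such a report to the response shows immediately whether the stored value is really `'Joe'` or something that only looks like it.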

Datastore Query filtering on list

I want to select all records whose ID is not in a list.
How can I do something like this:
query = Story.all()
query.filter('ID **NOT IN** =', [100,200,..,..])
There's no way to do this efficiently in App Engine. You should simply select everything without that filter, and filter out any matching entities in your code.
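A minimal sketch of that in-code filtering, using dictionaries as hypothetical stand-ins for Story entities (the helper name, the `get_id` accessor, and the sample data are assumptions for illustration):

```python
def exclude_ids(records, excluded_ids, get_id=lambda r: r["ID"]):
    """Return the records whose ID is not in excluded_ids."""
    excluded = set(excluded_ids)  # a set gives constant-time membership checks
    return [r for r in records if get_id(r) not in excluded]


# Hypothetical records standing in for the results of Story.all().
stories = [
    {"ID": 100, "title": "a"},
    {"ID": 150, "title": "b"},
    {"ID": 200, "title": "c"},
]
kept = exclude_ids(stories, [100, 200])
```

With real entities, `get_id` could be something like `lambda s: s.ID`, assuming ID is a property on the model.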
This is now supported via GQL query
The 'IN' and '!=' operators in the Python runtime are actually
implemented in the SDK and translate to multiple queries 'under the
hood'.
For example, the query "SELECT * FROM People WHERE name IN ('Bob',
'Jane')" gets translated into two queries, equivalent to running
"SELECT * FROM People WHERE name = 'Bob'" and "SELECT * FROM People
WHERE name = 'Jane'" and merging the results. Combining multiple
disjunctions multiplies the number of queries needed, so the query
"SELECT * FROM People WHERE name IN ('Bob', 'Jane') AND age != 25"
generates a total of four queries, for each of the possible conditions
(age less than or greater than 25, and name is 'Bob' or 'Jane'), then
merges them together into a single result set.
source: appengine blog
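The multiplication described in that quote can be illustrated with a small sketch. This is not the SDK's implementation, only a hypothetical model of it: each IN filter contributes one equality per value, each != filter contributes a less-than and a greater-than branch, and the subqueries are the Cartesian product of those alternatives:

```python
from itertools import product


def expand_filters(in_filters, not_equal_filters):
    """Expand IN and != filters into a list of simple subqueries.

    Each subquery is a list of (property, operator, value) tuples.
    """
    alternatives = []
    for prop, values in in_filters.items():
        # IN becomes one equality filter per listed value.
        alternatives.append([(prop, "=", v) for v in values])
    for prop, value in not_equal_filters.items():
        # != becomes "less than" OR "greater than".
        alternatives.append([(prop, "<", value), (prop, ">", value)])
    return [list(combo) for combo in product(*alternatives)]


# name IN ('Bob', 'Jane') AND age != 25
subqueries = expand_filters({"name": ["Bob", "Jane"]}, {"age": 25})
```

For the quoted example this yields four subqueries, matching the count given in the blog post.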
This is an old question, so I'm not sure if the ID is a non-key property. But in order to answer this:
query = Story.all()
query.filter('ID **NOT IN** =', [100,200,..,..])
...With ndb models, you can definitely query for items that are in a list. For example, see the docs for IN and !=. Here's how to filter on membership (note that this is IN, not the NOT IN the OP asked for):
query = Story.query().filter(Story.id.IN([100,200,..,..]))
We can even query for items that are in a list of repeated keys:
def all(user_id):
    # See if my user_id is associated with any Group.
    groups_belonged_to = Group.query().filter(Group.members == user_id)
    print([group.to_dict() for group in groups_belonged_to])
Some caveats:
There are docs out there that mention that, in order to perform these types of queries, Datastore runs multiple queries behind the scenes, which (1) might take a while to execute, (2) take longer if you are searching in repeated properties, and (3) will increase your costs with more operations.

Resources