BadArgumentError: _MultiQuery with cursors requires __key__ order in ndb - google-app-engine

I can't understand what this error means and apparently, no one ever got the same error on the internet
BadArgumentError: _MultiQuery with cursors requires __key__ order
This happens here:
return SocialNotification.query().order(-SocialNotification.date).filter(SocialNotification.source_key.IN(nodes_list)).fetch_page(10)
The property source_key is obviously a key and nodes_list is a list of entity keys previously retrieved.
What I need is to find all the SocialNotifications that have a field source_key that match one of the keys in the list.

The error message tries to tell you you that queries involving IN and cursors must be ordered by __key__ (which is the internal name for the key of the entity). (This is needed so that the results can be properly merged and made unique.) In this case you have to replace your .order() call with .order(SocialNotification._key).

It seems that this also happens when you filter for an inequality and try to fetch a page.
(e.g. MyModel.query(MyModel.prop != 'value').fetch_page(...) . This basically means (unless i missed something) that you can't fetch_page when using an inequality filter because on one hand you need the sort to be MyModel.prop but on the other hand you need it to be MyModel._key, which is hard :)

I found the answer here: https://developers.google.com/appengine/docs/python/ndb/queries#cursors
You can change your query to:
SocialNotification.query().order(-SocialNotification.date, SocialNotification.key).filter(SocialNotification.source_key.IN(nodes_list)).fetch_page(10)
in order to get this to work. Note that it seems to be slow (18 seconds) when nodes_list is large (1000 entities), at least on the Development server. I don't have a large amount of test
data on a test server.

You need the property you want to order on and key.
.order(-SocialNotification.date, SocialNotification.key)

I had the same error when filtering without a group.
The error occurred every time my filter returned more than one result.
To fix it I actually had to add ordering by key.

Related

Correct way to retrieve a single object from Realm database

I am absolutely loving Realm (0.92) in combination with Swift but have a question about reading an object from the database. My goal is to retrieve a single object with a known, unique ID (which also happens to be the primary key.
All the documentation appears to be oriented around queries for multiple objects which are then filtered. In this case I know the object ID and, since it is known to be unique, would like to retrieve it directly.
My current approach is as follows:
Realm().objects(Book).filter("id == %#", prevBook.nextID).first
This seems heavy-handed. Documentation from prior versions suggest that there is a more direct way but I can't seem to locate it in the documentation.
The problem with my current approach is that it is crashing with an exception on the following function:
public func filter(predicateFormat: String, _ args: CVarArgType...) -> Results<T>
The exception is mysteriously reported as:
EXC_BAD_ACCESS (code=1, address=0xedf)
Any suggestions are very welcome.
Anticipating one line of questioning: I have confirmed that replacing prevBook.nextID with a known, good ID does not solve the problem
object(ofType:forPrimaryKey:) is what you're looking for: Realm().object(ofType: Book.self, forPrimaryKey: prevBook.nextId). There's no simpler way than filter().first if you need to search for the object by something other than the primary key.

Django Query Optimisation

I am working currently on telecom analytics project and newbie in query optimisation. To show result in browser it takes a full minute while just 45,000 records are to be accessed. Could you please suggest on ways to reduce time for showing results.
I wrote following query to find call-duration of a person of age-group:
sigma=0
popn=len(Demo.objects.filter(age_group=age))
card_list=[Demo.objects.filter(age_group=age)[i].card_no
for i in range(popn)]
for card in card_list:
dic=Fact_table.objects.filter(card_no=card.aggregate(Sum('duration'))
sigma+=dic['duration__sum']
avgDur=sigma/popn
Above code is within for loop to iterate over age-groups.
Model is as follows:
class Demo(models.Model):
card_no=models.CharField(max_length=20,primary_key=True)
gender=models.IntegerField()
age=models.IntegerField()
age_group=models.IntegerField()
class Fact_table(models.Model):
pri_key=models.BigIntegerField(primary_key=True)
card_no=models.CharField(max_length=20)
duration=models.IntegerField()
time_8bit=models.CharField(max_length=8)
time_of_day=models.IntegerField()
isBusinessHr=models.IntegerField()
Day_of_week=models.IntegerField()
Day=models.IntegerField()
Thanks
Try that:
sigma=0
demo_by_age = Demo.objects.filter(age_group=age);
popn=demo_by_age.count() #One
card_list = demo_by_age.values_list('card_no', flat=True) # Two
dic = Fact_table.objects.filter(card_no__in=card_list).aggregate(Sum('duration') #Three
sigma = dic['duration__sum']
avgDur=sigma/popn
A statement like card_list=[Demo.objects.filter(age_group=age)[i].card_no for i in range(popn)] will generate popn seperate queries and database hits. The query in the for-loop will also hit the database popn times. As a general rule, you should try to minimize the amount of queries you use, and you should only select the records you need.
With a few adjustments to your code this can be done in just one query.
There's generally no need to manually specify a primary_key, and in all but some very specific cases it's even better not to define any. Django automatically adds an indexed, auto-incremental primary key field. If you need the card_no field as a unique field, and you need to find rows based on this field, use this:
class Demo(models.Model):
card_no = models.SlugField(max_length=20, unique=True)
...
SlugField automatically adds a database index to the column, essentially making selections by this field as fast as when it is a primary key. This still allows other ways to access the table, e.g. foreign keys (as I'll explain in my next point), to use the (slightly) faster integer field specified by Django, and will ease the use of the model in Django.
If you need to relate an object to an object in another table, use models.ForeignKey. Django gives you a whole set of new functionality that not only makes it easier to use the models, it also makes a lot of queries faster by using JOIN clauses in the SQL query. So for you example:
class Fact_table(models.Model):
card = models.ForeignKey(Demo, related_name='facts')
...
The related_name fields allows you to access all Fact_table objects related to a Demo instance by using instance.facts in Django. (See https://docs.djangoproject.com/en/dev/ref/models/fields/#module-django.db.models.fields.related)
With these two changes, your query (including the loop over the different age_groups) can be changed into a blazing-fast one-hit query giving you the average duration of calls made by each age_group:
age_groups = Demo.objects.values('age_group').annotate(duration_avg=Avg('facts__duration'))
for group in age_groups:
print "Age group: %s - Average duration: %s" % group['age_group'], group['duration_avg']
.values('age_group') selects just the age_group field from the Demo's database table. .annotate(duration_avg=Avg('facts__duration')) takes every unique result from values (thus each unique age_group), and for each unique result will fetch all Fact_table objects related to any Demo object within that age_group, and calculate the average of all the duration fields - all in a single query.

ndb retrieving entity key by ID without parent

I want to get an entity key knowing entity ID and an ancestor.
ID is unique within entity group defined by the ancestor.
It seems to me that it's not possible using ndb interface. As I understand datastore it may be caused by the fact that this operation requires full index scan to perform.
The workaround I used is to create a computed property in the model, which will contain the id part of the key. I'm able now to do an ancestor query and get the key
class SomeModel(ndb.Model):
ID = ndb.ComputedProperty( lambda self: self.key.id() )
#classmethod
def id_to_key(cls, identifier, ancestor):
return cls.query(cls.ID == identifier,
ancestor = ancestor.key ).get( keys_only = True)
It seems to work, but are there any better solutions to this problem?
Update
It seems that for datastore the natural solution is to use full paths instead of identifiers. Initially I thought it'd be too burdensome. After reading dragonx answer I redesigned my application. To my suprise everything looks much simpler now. Additional benefits are that my entities will use less space and I won't need additional indexes.
I ran into this problem too. I think you do have the solution.
The better solution would be to stop using IDs to reference entities, and store either the actual key or a full path.
Internally, I use keys instead of IDs.
On my rest API, I used to do http://url/kind/id (where id looked like "123") to fetch an entity. I modified that to provide the complete ancestor path to the entity: http://url/kind/ancestor-ancestor-id (789-456-123), I'd then parse that string, generate a key, and then get by key.
Since you have full information about your ancestor and you know your id, you could directly create your key and get the entity, as follows:
my_key = ndb.Key(Ancestor, ancestor.key.id(), SomeModel, id)
entity = my_key.get()
This way you avoid making a query that costs more than a get operation both in terms of money and speed.
Hope this helps.
I want to make a little addition to dargonx's answer.
In my application on front-end I use string representation of keys:
str(instance.key())
When I need to make some changes with instence even if it is a descendant I use only string representation of its key. For example I have key_str -- argument from request to delete instance':
instance = Kind.get(key_str)
instance.delete()
My solution is using urlsafe to get item without worry about parent id:
pk = ndb.Key(Product, 1234)
usafe = LocationItem.get_by_id(5678, parent=pk).key.urlsafe()
# now can get by urlsafe
item = ndb.Key(urlsafe=usafe)
print item

Quickly finding the last item in a database Cakephp

I just inherited some cakePHP code and I am not very familiar with it (or any other php/serverside language). I need to set the id of the item I am adding to the database to be the value of the last item plus one, originally I did a call like this:
$id = $this->Project->find('count') + 1;
but this seems to add about 8 seconds to my page loading (which seems weird because the database only has about 400 items) but that is another problem. For now I need a faster way to find the id of the last item in the database, is there a way using find to quickly retrieve the last item in a given table?
That's a very bad approach on setting the id.
You do know that, for example, MySQL supports auto-increment for INT-fields and therefore will set the id automatically for you?
The suggested functions getLastInsertId and getInsertId will only work after an insert and not always.
I also can't understand that your call adds 8 seconds to your siteload. If I do such a call on my table (which also has around 400 records) the call itself only needs a few milliseconds. There is no delay the user would notice.
I think there might be a problem with your database-setup as this seems very unlikely.
Also please have a look if your database supports auto-increment (I can't imagine that's not possible) as this would be the easiest way of adding your wanted functionality.
I would try
$id = $this->Project->getLastInsertID();
$id++;
The method can be found in cake/libs/model/model.php in line 2768
As well as on this SO page
Cheers!
If you are looking for the cakePHP3 solution to this you simply use last().
ie:
use Cake\ORM\TableRegistry;
....
$myrecordstable=Tableregistry::get('Myrecords');
$myrecords=$myrecordstable->find()->last();
$lastId = $myrecords->id;
....

Autocomplete Dropdown - too much data, timing out

So, I have an autocomplete dropdown with a list of townships. Initially I just had the 20 or so that we had in the database... but recently, we have noticed that some of our data lies in other counties... even other states. So, the answer to that was buy one of those databases with all towns in the US (yes, I know, geocoding is the answer but due to time constraints we are doing this until we have time for that feature).
So, when we had 20-25 towns the autocomplete worked stellarly... now that there are 80,000 it's not as easy.
As I type I am thinking that the best way to do this is default to this state, then there will be much less. I will add a state selector to the page that defaults to NJ then you can pick another state if need be, this will narrow down the list to < 1000. Though, I may have the same issue? Does anyone know of a work around for an autocomplete with a lot of data?
should I post teh codez of my webservice?
Are you trying to autocomplete after only 1 character is typed? Maybe wait until 2 or more...?
Also, can you just return the top 10 rows, or something?
Sounds like your application is suffocating on the amount of data being returned, and then attempted to be rendered by the browser.
I assume that your database has the proper indexes, and you don't have a performance problem there.
I would limit the results of your service to no more than say 100 results. Users will not look at any more than that any how.
I would also only being retrieving the data from the service once 2 or 3 characters are entered which will further reduce the scope of the query.
Good Luck!
Stupid question maybe, but... have you checked to make sure you have an index on the town name column? I wouldn't think 80K names should be stressing your database...
I think you're on the right track. Use a series of cascading inputs, State -> County -> Township where each succeeding one grabs the potential population based on the value of the preceding one. Each input would validate against its potential population to avoid spurious inputs. I would suggest caching the intermediate results and querying against them for the autocomplete instead of going all the way back to the database each time.
If you have control of the underlying SQL, you may want to try several "UNION" queries instead of one query with several "OR like" lines in its where clause.
Check out this article on optimizing SQL.
I'd just limit the SQL query with a TOP clause. I also like using a "less than" instead of a like:
select top 10 name from cities where #partialname < name order by name;
that "Ce" will give you "Cedar Grove" and "Cedar Knolls" but also "Chatham" & "Cherry Hill" so you always get ten.
In LINQ:
var q = (from c in db.Cities
where partialname < c.Name
orderby c.Name
select c.Name).Take(10);

Resources