Can't delete Datastore entities in google-app-engine

I tried to delete all datastore entities in two different ways but I get an error:
Try 1:
results = myDS().query().fetch()
for res in results:
    res.delete()
Try 2:
results = myDS().query().fetch()
ndb.delete_multi(results)
In both cases it fails and I get the error:
The server encountered an error and could not complete your request.
Any idea why?

The results you get back from those queries are the actual entities, not their keys.
In the first try, to delete an entity you need to call .delete() on the entity's key, not on the entity itself (see also Deleting entities):
res.key.delete()
Similarly, in the second try you need to pass entity keys, not entities, to ndb.delete_multi() (see also Using batch operations):
ndb.delete_multi([r.key for r in results])
But in both cases it's more efficient to directly obtain just the entity keys from the queries (you don't actually need the entities themselves to delete them). It's also cheaper as you'd be skipping datastore read ops. Your tries would look like this:
keys = myDS().query().fetch(keys_only=True)
for key in keys:
    key.delete()

keys = myDS().query().fetch(keys_only=True, limit=500)  # up to 500 keys at a time
ndb.delete_multi(keys)
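If there are more entities than fit in one call, a minimal sketch of working through them in batches (calling query() on the model class directly, and reusing the 500-keys-per-call figure from above) could look like this:
# Keep fetching and deleting batches of keys until nothing is left.
while True:
    keys = myDS.query().fetch(500, keys_only=True)
    if not keys:
        break
    ndb.delete_multi(keys)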

Related

neo4j 2.1.7: How can I index my current data that was there before the index was created?

I had an old schema and imported a new schema into the db, and tried to apply the indexes from the new schema to the old schema's data as well.
I created the indexes and wrote a Cypher query that should be helped by those indexes, but it took a long time to return an answer.
What I've tried to do in order to solve it:
I understand that after I create an index, only data added after the index was created gets indexed.
I also tried to write some Cypher queries intended to trigger the index, but it didn't work.
tl;dr:
How can I trigger those indexes on the old schema?
You can use the :SCHEMA command in the Neo4j browser to see what indexes have been created and their current status (if they are online or still being built).
For example, this is the result of running :SCHEMA in the Neo4j browser for a simple graph with only one index:
:SCHEMA
Indexes
ON :Tag(title) ONLINE
No constraints

Generating and using entity keys within an ndb.Transaction

I'm trying to understand this code snippet taken from a sample app showing how to use the App Engine Search API:
rid = models.Review.allocate_ids(size=1)[0]
key = ndb.Key(models.Review._get_kind(), rid)

def _tx():
    review = models.Review(
        key=key,
        product_key=prod.key,
        username=username, rating=rating,
        comment=comment)
    review.put()
    # in a transactional task, update the parent product's average
    # rating to include this review's rating, and flag the review as
    # processed.
    defer(utils.updateAverageRating, key, _transactional=True)
    return review

return ndb.transaction(_tx)
So basically it creates a Review entity for a Product and adds a deferred task to update the average rating of that product. I have two questions:
Why does this need to be wrapped in a transaction in the first place? What could go wrong if it wasn't placed in a transaction?
Why does the Review entity key need to be generated outside the transaction? I wasn't even aware of the allocate_ids() method before this example. Wouldn't this simpler alternative work as well?
def _tx():
    review = models.Review(
        product_key=prod.key,
        username=username, rating=rating,
        comment=comment)
    review.put()
    # in a transactional task, update the parent product's average
    # rating to include this review's rating, and flag the review as
    # processed.
    defer(utils.updateAverageRating, review.key, _transactional=True)
    return review

return ndb.transaction(_tx)
I don't get it either. As far as I can tell there's absolutely no reason to manually allocate the key outside of the transaction. model.put() returns the entity's key and you can grab it from the instance as well (as you do in your example). If the put() had been asynchronous, I could've seen the logic (I wouldn't have done it that way myself though), but it's not.
There doesn't seem to be any good reason for the transaction either. If you're only updating a single entity with data that does not depend on data that's already persisted to the datastore, you don't need a transaction. The only reason I can think of is ensuring that the deferred task is run only if the transaction applies successfully, but since there's no real reason for having the transaction in the first place you can just get rid of it and skip adding the deferred task if the write operation fails.
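If you do drop both the manual key allocation and the transaction, a minimal sketch of the simplified flow (assuming the same models.Review, prod, username, rating, comment and utils.updateAverageRating as in the snippet above) might look like this:
def create_review(prod, username, rating, comment):
    # put() assigns the ID and fills in review.key for us.
    review = models.Review(
        product_key=prod.key,
        username=username, rating=rating,
        comment=comment)
    review.put()
    # Only enqueue the rating update once the write has succeeded;
    # put() raises on failure, so we never get here otherwise.
    defer(utils.updateAverageRating, review.key)
    return review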

GAE Datastore Contention Issue

Our GAE app makes a local copy of another website's relational database in the NDB. There are 4 entity types - User, Table, Row, Field. Each user has a bunch of tables, each table has a bunch of rows, each row has a bunch of fields.
SomeUser > SomeTable > ARow > AField
Thus, each User becomes one entity group. I need a feature where I can clear out all the tables (and their rows) for a certain user. What is the right way to delete all the tables and all the rows while avoiding the contention limit of ~5 operations/second?
The current code is getting TransactionFailedErrors because of contention on the Entity Group.
(A detail I overlooked: we only want to delete tables whose 'service' attribute is set to a certain value.)
def delete_tables_for_service(user, service):
    tables = Tables.query(Tables.service == service, ancestor=user.key).fetch(keys_only=True)
    for table in tables:
        keys = []
        keys += Fields.query(ancestor=table).fetch(keys_only=True)
        keys += TableRows.query(ancestor=table).fetch(keys_only=True)
        keys.append(table)
        ndb.delete_multi(keys)
If all of the entities you're deleting are in one entity group, try deleting them all in one transaction. Without an explicit transaction, each delete is occurring in its own transaction, and all of the transactions have to line up (via contention and retries) to change the entity group.
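A minimal sketch of that suggestion, reusing the keys gathered in the question's code (keep in mind the roughly 500-entities-per-commit limit mentioned earlier if the group is large):
@ndb.transactional
def delete_keys_in_group(keys):
    # All keys share the same ancestor, so this is a single commit
    # for the whole entity group instead of one commit per key.
    ndb.delete_multi(keys)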
Are you sure it's contention-based, or perhaps because the code above is executed within a transaction? A quick fix might be to increase the number of retries and turn on cross-group transactions for this method:
@ndb.transactional(retries=5, xg=True)
You can read more about that here: https://developers.google.com/appengine/docs/python/ndb/transactions. If that's not the culprit, maybe consider deferring or running the deletes asynchronously so they execute over time and in smaller batches. The trick with NDB is to do small bursts of work regularly, versus a large chunk of work infrequently. Here is one way to turn that code into an asynchronous unit of work:
def delete_tables_for_service(user, service):
    tables = Tables.query(Tables.service == service, ancestor=user.key).fetch(keys_only=True)
    for table in tables:
        # Delete fields
        fields_keys = Fields.query(ancestor=table).fetch(keys_only=True)
        ndb.delete_multi_async(fields_keys)
        # Delete table rows
        table_rows_keys = TableRows.query(ancestor=table).fetch(keys_only=True)
        ndb.delete_multi_async(table_rows_keys)
        # Finally delete the table itself (table is already a key,
        # since the query above was keys_only)
        ndb.delete_async(table)
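The _async calls above return futures; if you want to be sure the deletes have actually completed before the request ends (and to surface any errors), you can collect the futures and wait on them. A sketch, assuming the same models as above:
def delete_tables_for_service(user, service):
    futures = []
    tables = Tables.query(Tables.service == service, ancestor=user.key).fetch(keys_only=True)
    for table in tables:
        futures.extend(ndb.delete_multi_async(
            Fields.query(ancestor=table).fetch(keys_only=True)))
        futures.extend(ndb.delete_multi_async(
            TableRows.query(ancestor=table).fetch(keys_only=True)))
        futures.append(ndb.delete_async(table))
    # Block until every asynchronous delete has finished.
    ndb.Future.wait_all(futures)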
If you want more control over the deletes, retries, failures, you can either use Task Queues, or simply use the defer library (https://developers.google.com/appengine/articles/deferred):
Turn deferred on in your app.yaml
Change the calls to ndb.delete_multi to deferred:
def delete_tables_for_service(user, service):
    tables = Tables.query(Tables.service == service, ancestor=user.key).fetch(keys_only=True)
    for table in tables:
        keys = []
        keys += Fields.query(ancestor=table).fetch(keys_only=True)
        keys += TableRows.query(ancestor=table).fetch(keys_only=True)
        keys.append(table)
        deferred.defer(_deferred_delete_tables_for_keys, keys)

def _deferred_delete_tables_for_keys(keys):
    ndb.delete_multi(keys)
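For reference, enabling the deferred builtin in app.yaml (Python 2.7 runtime) looks roughly like this:
builtins:
- deferred: on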

Only Ancestor queries are allowed inside transactions, how to deal with it?

I need to do a query inside a transaction, but I don't know the entity's @Id; what I have is the value of a field, like a username, not the ID.
So, in other words, I can't create a Key to do the query. How can I do a query to get an entity inside a transaction?
Without delving into deeper design issues, there are really two options:
1) Run the query outside of a transaction.
Objectify (which you tagged this post with) makes it easy to execute non-transactional queries even while inside a transaction. Just spawn a new ofy instance not associated with a transaction and use that to run the query... then go back to working in your transaction. Keep in mind that this does break out of the transaction and could have effects on the integrity of the operation. Often it doesn't matter.
If you're using Objectify 4 you can just run the operation like this:
ofy.transactionless.load().type(Thing.class).filter("field", value)...etc
2) Use a lookup entity
This is typically the right answer when dealing with things like usernames. Create a separate entity which maps the username to your User object like this:
class Username {
    @Id String username;
    Key<User> user;
}
Use XG transactions to create a Username every time you create a User, and update it if you allow usernames to change. Now, to perform a transactional lookup of a User by username, first look up the Username and then use that to look up the User.
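For comparison, the same lookup-entity pattern expressed with Python ndb (a sketch; the model and property names are just for illustration) might look like:
class Username(ndb.Model):
    # The username itself is the key id, so lookups are by key.
    user_key = ndb.KeyProperty(kind='User')

@ndb.transactional(xg=True)
def get_user_by_username(username):
    lookup = ndb.Key(Username, username).get()
    return lookup.user_key.get() if lookup else None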
I've had a similar problem, and the simplest method I've come up with is to use one dummy parent entity. Instead of using XG transactions and inserting another Username entity for every other User entity, just create one dummy parent entity and set that as the ancestor of every User entity you create from there on. I think this method saves a lot of space and data management issues.

Safely deleting a Django model from the database using a transaction

In my Django application, I have code that deletes a single instance of a model from the database. There is a possibility that two concurrent requests could both try to delete the same model at the same time. In this case, I want one request to succeed and the other to fail. How can I do this?
The problem is that when deleting an instance with delete(), Django doesn't return any information about whether the command was successful or not. This code illustrates the problem:
b0 = Book.objects.get(id=1)
b1 = Book.objects.get(id=1)
b0.delete()
b1.delete()
Only one of these two delete() commands actually deleted the object, but I don't know which one. No exceptions are thrown and nothing is returned to indicate the success of the command. In pure SQL, the command would return the number of rows deleted and if the value was 0, I would know my delete failed.
I am using PostgreSQL with the default Read Committed isolation level. My understanding of this level is that each command (SELECT, DELETE, etc.) sees a snapshot of the database, but the next command could see a different snapshot. I believe this means I can't do something like this:
# I believe this won't work
@commit_on_success
def view(request):
    try:
        book = Book.objects.get(id=1)
        # Possibility that the instance is deleted by the other request
        # before we get to the next delete()
        book.delete()
    except ObjectDoesNotExist:
        # Already been deleted
        pass
Any ideas?
You can put the constraint right into the SQL DELETE statement by using QuerySet.delete instead of Model.delete:
Book.objects.filter(pk=1).delete()
This will never issue the SELECT query at all, just something along the lines of:
DELETE FROM Book WHERE id=1;
That handles the race condition of two concurrent requests deleting the same record at the same time, but it doesn't let you know whether your delete got there first. For that you would have to get the raw cursor (which django lets you do), .execute() the above DELETE yourself, and then pick up the cursor's rowcount attribute, which will be 0 if you didn't wind up deleting anything.
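A minimal sketch of that cursor-based approach (the table name app_book is an assumption based on Django's default <app>_<model> naming):
from django.db import connection

def delete_book(book_id):
    cursor = connection.cursor()
    # Issue the DELETE directly so the database reports how many rows went away.
    cursor.execute("DELETE FROM app_book WHERE id = %s", [book_id])
    # rowcount is 0 if another request already deleted the row.
    return cursor.rowcount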
