Generating and using entity keys within an ndb.Transaction - google-app-engine

I'm trying to understand this code snippet taken from a sample app showing how to use the App Engine Search API:
rid = models.Review.allocate_ids(size=1)[0]
key = ndb.Key(models.Review._get_kind(), rid)
def _tx():
    review = models.Review(
        key=key,
        product_key=prod.key,
        username=username, rating=rating,
        comment=comment)
    review.put()
    # in a transactional task, update the parent product's average
    # rating to include this review's rating, and flag the review as
    # processed.
    defer(utils.updateAverageRating, key, _transactional=True)
    return review
return ndb.transaction(_tx)
So basically it creates a Review entity for a Product and adds a deferred task to update the average rating of that product. I have two questions:
Why does this need to be wrapped in a transaction in the first place? What could go wrong if it wasn't placed in a transaction?
Why does the Review entity key need to be generated outside the transaction? I wasn't even aware of the allocate_ids() method before this example. Wouldn't this simpler alternative work as well?
def _tx():
    review = models.Review(
        product_key=prod.key,
        username=username, rating=rating,
        comment=comment)
    review.put()
    # in a transactional task, update the parent product's average
    # rating to include this review's rating, and flag the review as
    # processed.
    defer(utils.updateAverageRating, review.key, _transactional=True)
    return review
return ndb.transaction(_tx)

I don't get it either. As far as I can tell there's absolutely no reason to manually allocate the key outside of the transaction. model.put() returns the entity's key and you can grab it from the instance as well (as you do in your example). If the put() had been asynchronous, I could've seen the logic (I wouldn't have done it that way myself though), but it's not.
There doesn't seem to be any good reason for the transaction either. If you're only updating a single entity with data that does not depend on data already persisted to the datastore, you don't need a transaction. The only reason I can think of is ensuring that the deferred task is enqueued only if the transaction commits successfully, but since there's no real reason for having the transaction in the first place, you can just get rid of it and skip adding the deferred task if the write operation fails.
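For illustration, here's a minimal sketch of that transaction-free alternative (the add_review wrapper is hypothetical; models, defer and utils.updateAverageRating are the helpers from the question):
def add_review(prod, username, rating, comment):
    review = models.Review(
        product_key=prod.key,
        username=username, rating=rating,
        comment=comment)
    review.put()  # raises on failure, so we only reach the next line on success
    defer(utils.updateAverageRating, review.key)  # no _transactional flag needed
    return review
If put() raises, the deferred task is never enqueued, which gives you the same "task only on a successful write" guarantee the transaction was providing.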

Related

Displaying queries in CakePHP without running them

Is there any way to run a model command such as $this->MyModel->saveAll($rows) but without it actually performing any action on the database, just displaying all the queries it would run, the way it does when one of the queries has an error?
Yes, you can; have a look at "Transactions": http://book.cakephp.org/2.0/en/models/transactions.html
// get the datasource and store it in a local variable
$ds = $this->MyModel->getDataSource();
// begin a "transaction"
$ds->begin();
// do your saving
$this->MyModel->saveAll($rows); // you can add more queries here, that's what transactions are all about! :)
// roll back; in a normal situation you would check whether the save was successful and call commit()/rollback() depending on the outcome.
$ds->rollback();
Please note: auto-increment fields WILL increment, because MySQL (or any other database engine) "reserves" those IDs while the transaction is running, in order to prevent duplicate IDs. This shouldn't be of any concern, but when you are debugging and you are remembering an ID, it could give you a headache if it's Monday morning (been there, done that)... ;-)

Only Ancestor queries are allowed inside transactions, how to deal with it?

I need to do a query inside a transaction; however, I don't know the entity's @Id. What I have is the value of a field, like a username, but not the ID.
So, in other words, I can't create a Key to do the query. How can I do a query to get an entity inside a transaction?
Without delving into deeper design issues, there are really two options:
1) Run the query outside of a transaction.
Objectify (which you tagged this post with) makes it easy to execute non-transactional queries even while inside a transaction. Just spawn a new ofy instance not associated with a transaction and use that to run the query... then go back to working in your transaction. Keep in mind that this does break out of the transaction and could have effects on the integrity of the operation. Often it doesn't matter.
If you're using Objectify 4, you can just run the operation like this:
ofy.transactionless.load().type(Thing.class).filter("field", value)...etc
2) Use a lookup entity
This is typically the right answer when dealing with things like usernames. Create a separate entity which maps the username to your User object like this:
class Username {
    @Id String username;
    Key<User> user;
}
Use XG transactions to create a Username every time you create a User, and update it if you allow usernames to change. Now, to perform a transactional lookup of a User by username, first look up the Username and then use that to look up the User.
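For comparison, here's a rough Python/ndb sketch of the same lookup-entity pattern, since this thread started out in ndb (all names below are hypothetical, not taken from the answer above):
from google.appengine.ext import ndb

class User(ndb.Model):
    display_name = ndb.StringProperty()

class Username(ndb.Model):
    # The username itself serves as the entity's string id.
    user = ndb.KeyProperty(kind=User)

@ndb.transactional(xg=True)  # cross-group: touches two entity groups
def create_user(username, display_name):
    if Username.get_by_id(username) is not None:
        raise ValueError('username already taken')
    user = User(display_name=display_name)
    user.put()
    Username(id=username, user=user.key).put()
    return user

@ndb.transactional(xg=True)
def get_user_by_username(username):
    lookup = Username.get_by_id(username)
    return lookup.user.get() if lookup else None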
I've had a similar problem, and the simplest method I've come up with is to use one dummy parent entity. Instead of using XG transactions and inserting another Username entity for every User entity, just create one dummy parent entity and set it as the ancestor of every User entity you create from then on. I think this method saves a lot of space and avoids data-management issues.
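A minimal ndb sketch of that dummy-parent idea (again, the names are hypothetical):
from google.appengine.ext import ndb

ROOT = ndb.Key('UserRoot', 'root')  # one well-known dummy parent; it never needs to be put()

class User(ndb.Model):
    username = ndb.StringProperty()

@ndb.transactional
def get_user_by_username(username):
    # Ancestor queries are allowed inside transactions.
    return User.query(User.username == username, ancestor=ROOT).get()

# New users must be created as children of ROOT for this to work:
# User(parent=ROOT, username='alice').put()
The trade-off is that every User then lives in a single entity group, which caps sustained writes to roughly one per second across all users.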

Safely deleting a Django model from the database using a transaction

In my Django application, I have code that deletes a single instance of a model from the database. There is a possibility that two concurrent requests could both try to delete the same model at the same time. In this case, I want one request to succeed and the other to fail. How can I do this?
The problem is that when deleting an instance with delete(), Django doesn't return any information about whether the command was successful. This code illustrates the problem:
b0 = Book.objects.get(id=1)
b1 = Book.objects.get(id=1)
b0.delete()
b1.delete()
Only one of these two delete() commands actually deleted the object, but I don't know which one. No exceptions are thrown and nothing is returned to indicate the success of the command. In pure SQL, the command would return the number of rows deleted, and if the value was 0, I would know my delete had failed.
I am using PostgreSQL with the default Read Committed isolation level. My understanding of this level is that each command (SELECT, DELETE, etc.) sees a snapshot of the database, but the next command could see a different snapshot. I believe this means I can't do something like this:
# I believe this won't work
@commit_on_success
def view(request):
    try:
        book = Book.objects.get(id=1)
        # Possibility that the instance is deleted by the other request
        # before we get to the next delete()
        book.delete()
    except ObjectDoesNotExist:
        # Already been deleted
        pass
Any ideas?
You can put the constraint right into the SQL DELETE statement by using QuerySet.delete instead of Model.delete:
Book.objects.filter(pk=1).delete()
This will never issue the SELECT query at all, just something along the lines of:
DELETE FROM Book WHERE id=1;
That handles the race condition of two concurrent requests deleting the same record at the same time, but it doesn't let you know whether your delete got there first. For that you would have to get a raw cursor (which Django lets you do), .execute() the above DELETE yourself, and then pick up the cursor's rowcount attribute, which will be 0 if you didn't wind up deleting anything.
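A minimal sketch of that raw-cursor approach; the table name myapp_book is an assumption (Django derives the real name from the app and model):
from django.db import connection

def delete_book(book_id):
    with connection.cursor() as cursor:
        cursor.execute("DELETE FROM myapp_book WHERE id = %s", [book_id])
        # rowcount is 1 if this request won the race,
        # 0 if the row had already been deleted.
        return cursor.rowcount == 1
If it returns False, the other request deleted the row first.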

Ensuring Database Integrity when Adding and Deleting

As I am developing my database, I am working to ensure data integrity. So, for example, a Book should be deleted when its Author is deleted from the database (but not vice-versa), assuming one author.
When I set up the foreign key, I did set up a CASCADE, so I feel like this should happen automatically if I perform a delete from LINQ. Is this true? If not, do I need to perform all the deletes on my own, or how is this accomplished?
My second question, which goes along with that, is: does the database ensure that I have all the appropriate information I need for a row when I add it to the table (e.g. I can't add a book that doesn't have an author), or do I need to ensure this myself in the business logic? What would happen if I did try to do this using LINQ to SQL? Would I get an exception?
Thanks for the help.
A cascading foreign key will cascade the delete automatically for you.
Referential integrity will be enforced by the database; in this case, you should add the Author first and then the Book. If you violate referential integrity, you will get an exception.
It sounds like, for your second question, you may be interested in using a transaction: for example, when you need to add several objects to the database and want to make sure either all of them get added or none do. That is what a database transaction accomplishes. And yes, you should do this in your data/business layer; you can do it by adding a partial class to your DataContext classes. Whether, for example, every user must have an address is up to your business rules to define.
LINQ to SQL automatically uses a transaction provided you stay within a single using block, i.e. you perform everything in that one step.
If you need to perform multiple steps, or combine LINQ with non-LINQ database actions, you can use a transaction scope. You need to enable the Distributed Transaction Coordinator service; this allows transactions across, for example, files and databases.
See TransactionScope
using (TransactionScope scope = new TransactionScope())
{
    // do stuff here
    scope.Complete();
}

NHibernate session.flush() fails but makes changes

We have a SQL Server database table that consists of a user id, some numeric value (e.g. a balance), and a version column.
We have multiple threads updating this table's value column in parallel, each in its own transaction and session (we're using a session-per-thread model). Since we want every logical transaction to be applied, each thread does the following:
1. Load the current row (mapped to a type).
2. Make the change to the value, based on the old value (e.g. add 50).
3. session.update(obj)
4. session.flush() (since we're optimistic, we want to make sure we had the correct version value prior to the update)
5. If step 4 (the flush) threw a StaleStateException, refresh the object (with LockMode.Read) and go back to step 1.
We only do this a certain number of times per logical transaction; if we can't commit it after X attempts, we reject the logical transaction.
Each such thread commits periodically, e.g. after 100 successful logical transactions, to keep commit-induced I/O at manageable levels. Meaning: we have a single database transaction (per thread) with multiple flushes, at least one per logical change.
What's the problem here, you ask? Well, on commit we see changes from failed logical transactions.
Specifically, if the value was 50 when we went through step 1 (for the first time), and we tried to update it to 100 (but failed because, e.g., another thread changed it to 70 in the meantime), then the value of 50 is committed for this row. Obviously this is incorrect.
What are we missing here?
Well, I don't have a ton of experience here, but one thing I remember reading in the documentation is that if an exception occurs, you are supposed to immediately roll back the transaction and dispose of the session. Perhaps your issue is related to the session being in an inconsistent state?
Also, calling update in your code here is not necessary. Since you loaded the object in that session, it is already being tracked by NHibernate.
If you want to make your changes anyway, why do you bother with row versioning? It sounds like you would get the same result if you simply always updated the data and let the last transaction win.
As to why the update becomes permanent, that depends on what the SQL statements for the version check/update look like and on your transaction control, which you left out of the code example. If you turn on the NHibernate SQL logging, it will probably become obvious how this is happening.
I'm not an NHibernate guru, but the answer seems simple.
When NHibernate loads an object, it expects it not to change in the database for as long as it's in the NHibernate session cache.
As you mentioned, you have a multi-threaded app. This is what happens:
1. The 1st thread loads an entity.
2. The 2nd thread loads the same entity.
3. The 1st thread changes the entity.
4. The 2nd thread changes the entity and finds out that the entity it loaded has been changed by something else; afraid that it has clobbered the changes the 1st thread made, it throws an exception to make the programmer aware of that.
You are missing a locking mechanism. I can't tell you much about how to apply one properly and elegantly; maybe a transaction would help.
We had similar problems when we used NHibernate and raw ADO.NET concurrently (luckily just for querying, at least in production code). All we had to do was force updates to the DB on insert/update so we could actually query some specific entities through full-text search.
We got StaleStateException in integration tests when we used raw ADO.NET to reset the DB. The NHibernate session stayed alive through a bunch of tests, but every test tried to clean up the DB without NHibernate being aware of it.
Here is the documentation for exceptions in the session:
http://nhibernate.info/doc/nhibernate-reference/best-practices.html
