Django: Avoiding ABA scenario in Database and ORM’s - database

I have a situation where I need to update votes for a candidate.
Citizens can vote for this candidate, with more than one vote per candidate. i.e. one person can vote 5 votes, while another person votes 2. In this case this candidate should get 7 votes.
Now, I use Django. And here how the pseudo code looks like
votes = candidate.votes
vote += citizen.vote
The problem here, as you can see is a race condition where the candidate’s votes can get overwritten by another citizen’s vote who did a select earlier and set now.
How can avoid this with an ORM like Django?

If this is purely an arithmetic expression then Django has a nice API called F expressions
Updating attributes based on existing fields
Sometimes you'll need to perform a simple arithmetic task on a field, such as incrementing or decrementing the current value. The obvious way to achieve this is to do something like:
>>> product = Product.objects.get(name='Venezuelan Beaver Cheese')
>>> product.number_sold += 1
>>> product.save()
If the old number_sold value retrieved from the database was 10, then the value of 11 will be written back to the database.
This can be optimized slightly by expressing the update relative to the original field value, rather than as an explicit assignment of a new value. Django provides F() expressions as a way of performing this kind of relative update. Using F() expressions, the previous example would be expressed as:
>>> from django.db.models import F
>>> product = Product.objects.get(name='Venezuelan Beaver Cheese')
>>> product.number_sold = F('number_sold') + 1
>>> product.save()
This approach doesn't use the initial value from the database. Instead, it makes the database do the update based on whatever value is current at the time that the save() is executed.
Once the object has been saved, you must reload the object in order to access the actual value that was applied to the updated field:
>>> product = Products.objects.get(pk=product.pk)
>>> print product.number_sold
42

Perhaps the select_for_update QuerySet method is helpful for you.
An excerpt from the docs:
All matched entries will be locked until the end of the transaction block, meaning that other transactions will be prevented from changing or acquiring locks on them.
Usually, if another transaction has already acquired a lock on one of the selected rows, the query will block until the lock is released. If this is not the behavior you want, call select_for_update(nowait=True). This will make the call non-blocking. If a conflicting lock is already acquired by another transaction, DatabaseError will be raised when the queryset is evaluated.
Mind that this is only available in the Django development release (i.e. > 1.3).

Related

Eiffel: how to use old keyword in ensure contract clause with detached

How to use the old keyword into ensure clause of a feature, the current statement doesn't seem to be valid at runtime
relationships: CHAIN -- any chain
some_feature
do
(...)
ensure
relationship_added: attached relationships as l_rel implies
attached old relationships as l_old_rel and then
l_rel.count = l_old_rel.count
end
For the whole case, following screenshots demonstrate the case
start of routine execution
end of routine execution
There are 2 cases to consider: when the original reference is void, and when — not:
attached relationships as r implies r.count =
old (if attached relationships as o then o.count + 1 else 1 end)
The intuition behind the assertion is as follows:
If relationships is Void on exit, we do not care about count.
If relationships is attached on exit, it has one more item compared to the old value. The old value may be attached or Void:
if the old value is attached, the new number of items is the old number plus 1;
if the old value is Void, the new number of items is 1.

Google NDB update only specific property?

Is there a way to update only specific property on NDB entity?
Consider this example.
Entity A has following property:
property B
property C
Let's assume both of these properties have values of 1 at the moment.
Two different request are trying to update same entity and they are happening at the same time.
So when Request#1 and #2 are retrieving this entity, value of B and C were 1.
Now Request #1 tries to update property B so it sets the value B to 2 and put into Datastore. Now B = 2 and C = 1 in the datastore.
But, Request #2 has B=1 and C=1 in the memory and when it change C to 2 and put into DB, it put's B=1 and C=2 which overwrites B value written by Request #1.
How do you get around this? Is there way to only write specific property into datastore?
I believe you may want to look into transactions.
As per the documentation:
If the transaction "collides" with another, it fails; NDB automatically retries such failed transactions a few times. Thus, the function may be called multiple times if the transaction is retried.
Link: https://developers.google.com/appengine/docs/python/ndb/transactions

Google Datastore queries and eventual consistency

I would like to confirm my understanding of eventual consistency in the Google datastore. Suppose that I have an entity defined as follows (using ndb):
class Record(ndb.Model):
name = ndb.StringProperty()
content = ndb.BlobProperty()
I think I understand Scenarios 1, but I have doubts about Scenarios 2 and 3, so some advice would be highly appreciated.
Scenario 1: I insert a new Record with name "Luca" and a given content. Then, I query the datastore:
qry = Record.query(name=="Luca")
for r in qry.iter():
logger.info("I got this content: %r" % r.content)
I understand that, due to eventual consistency, the just-inserted record might not be part of the result set. I know about using ancestor queries in order to over come this if needed.
Scenario 2: I read an existing Record with name "Luca", update the content, and write it back. For instance, assuming I have the key "k" of this record:
r = k.get()
r.content = "new content"
r.put()
Then, I run the same query as in Scenario 1. When I get the results, assume that the record is part of the result set (for instance, because the index already contained the record with name "Luca" and key k). Am I then guaranteed that the field content will have its new value "new content"?
In other words, if I update a record, leaving its key and indexed fields alone, am I guaranteed to read the most recent value?
Scenario 3: I do similarly to Scenario 2, again where k is the key of a record with name "Luca":
r = k.get()
r.content = "new content"
r.put()
but then I run a modified version of the query:
qry = Record.query(name=="Luca")
for k in qry.iter(keys_only=True):
r = k.get()
logger.info("I got this content: %r" % r.content)
In this case, logic tells me I should be getting the latest value of the content, because reading by key guarantees strong consistency. I would appreciate confirmation.
Scenario 1. Yes, your understanding is correct.
Scenario 2. No, same query, so still eventually consistent.
Scenario 3. Yes, your understanding is correct.
Also you can avoid eventual consistency by doing everything in the same transaction, but of course this may not be applicable.

Cost of updating entities in datastore (and, possible to append properties)?

I have a two part question.
Let's say I have a entity with a blob property...
# create entity
Entity(ndb.Model):
blob = ndb.BlobProperty(indexed=False)
e = Entity()
e.blob = 'abcd'
e_key = e.put()
# update entity
e = e_key.get()
e.blob += 'efg'
e.put()
So questions are:
The first time I put() that entity, the cost is 2 Write Ops; how many Ops does it cost to update the entity, as in the above example?
When I added 'efg' to the property, the old property had to be read into memory first, does app engine provide a way to append the old value without reading it first?
There are no partial updates. Every time you overwrite the whole entity. Numbers of indexes will also have an impact on cost. You might like to have a look at https://developers.google.com/appengine/articles/life_of_write for a detailed breakdown of what happens.

How to filter rows with null refrences in Google app engine DB

I have a Model UnitPattern, which reference another Model UnitPatternSet
e.g.
class UnitPattern(db.Model):
unit_pattern_set = db.ReferenceProperty(UnitPatternSet)
in my view I want to display all UnitPatterns having unit_pattern_set refrences as None, but query UnitPattern.all().filter("unit_pattern_set =", None) returns nothing, though I have total 5 UnitPatterns, out of which 2 have 'unit_pattern_set' set and 3 doesn't have
e.g.
print 'Total',UnitPattern.all().count()
print 'ref set',UnitPattern.all().filter("unit_pattern_set !=", None).count()
print 'ref not set',UnitPattern.all().filter("unit_pattern_set =", None).count()
outputs:
Total 5
ref set 2
ref not set 0
Shouldn't sum of query 2 and 3 be equal to query 1 ?
Reason seems to be that I added reference property unit_pattern_set later on, and these UnitPattern objects existed before that, but then how can I filter such entities?
This is described succinctly in the docs:
An index only contains entities that
have every property referred to by the
index. If an entity does not have a
property referred to by an index, the
entity will not appear in the index,
and will never be a result for the
query that uses the index.
Note that
the App Engine datastore makes a
distinction between an entity that
does not possess a property and an
entity that possesses the property
with a null value (None). If you want
every entity of a kind to be a
potential result for a query, you can
use a data model that assigns a
default value (such as None) to
properties used by query filters.
In your case, you have 3 entities that don't have the unit_pattern_set property set at all (because that property wasn't defined in the Model at the time those entities were created) - therefore those properties doesn't exist in the database representation of that entity, therefore that entity does not appear in the index of that property for that kind of entity.
Dan Sanderson's book Programming Google App Engine explains this in great detail on ~page 150 (unfortunately not available in the Google Books preview)
To fix the models you already have, you'll have to iterate over a query on UnitPattern (I've not tested the following code, please check it before you run it on your live data):
patterns = UnitPattern.all()
for pattern in patterns:
if not pattern.unit_pattern_set:
pattern.unit_pattern_set = None
pattern.put()
Edit: Also, the Updating you model's schema article discuss strategies you can use to handle schema changes such as this in future. However, that article is quite old and its method requires a web browser to keep hitting a url to trigger the next job to update more records - now that Task Queues exist, you could use a series of Tasks to make the change. The article on using deferred.defer has a framework you could utilise - it does a small amount of work, catches the DeadlineExceededError, and uses the handler to queue a new task which picks up where the current task left off.

Resources