I'm working on a distributed database. During the healing of a partition (nodes are beginning to recognize the ones they were split from), two different clients try to commit a Compare-and-Set of 3 to 4, and both succeed. Logically, this should not be possible, but I'm curious whether there is any functional problem with both returning success. Both clients correctly believe what the final state is, and the command they sent out was successful. I can't think of any serious problems. Are there any?
The "standard" definition of CAS (to the extent that there is such a thing) guarantees that at most one writer will see a successful response for a particular transition. A couple of examples depend on this guarantee:
// generating a unique id
while (true) {
    unique_id = read(key)
    if (compare_and_set(key, unique_id, unique_id + 1)) {
        return unique_id
    }
}
If two clients both read 3 and successfully execute compare_and_set(key, 3, 4), they'll both think they've "claimed" 3 as their unique id and may end up colliding down the road.
// distributed leases/leader election
while (true) {
    locked_until = read(key)
    if (locked_until < now()) {
        if (compare_and_set(key, locked_until, now() + TEN_MINUTES)) {
            // I'm now the leader for ~10 minutes.
            return;
        }
    }
    sleep(TEN_MINUTES)
}
Similar problem here: if two clients see that the lock is available and both successfully CAS to acquire it, they'll both believe that they are the leader at the same time.
Related
My Google App Engine application (Python 3, standard environment) serves requests from users: if the wanted record does not exist in the database, it creates it.
Here is the problem with database overwriting:
When one user (via browser) sends a request to the database, the running GAE instance may temporarily fail to respond, and GAE then creates a new process to handle the request. The result is that two instances respond to the same request. Both instances query the database at almost the same time, each finds that the wanted record does not exist, and each creates a new record. This produces two duplicate records.
Another scenario is that, for some reason, the user's browser sends two requests less than 0.01 second apart, which are processed by two instances on the server side, and duplicate records are again created.
I am wondering how one instance can temporarily lock the database to prevent another instance from overwriting it.
I have considered the following schemes but have no idea whether they are efficient.
For Python 2, Google App Engine provides memcache, which can be used to mark the status of a query for the purpose of database locking. But for Python 3, it seems that one has to set up a Redis server to rapidly exchange database status among different instances. So how efficient is database locking using Redis? (A sketch of this approach follows the list.)
Using Flask's session module. The session module can be used to share data (in most cases, users' login status) among different requests and thus different instances. I am wondering how fast the data can be exchanged between different instances.
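For reference, the Redis variant of the first scheme usually boils down to an atomic SET with NX (set only if absent) plus an expiry, so a crashed instance cannot hold the lock forever. A minimal sketch with the redis-py client; the key name, timeout, and retry count are made up for illustration:

import time
import redis

r = redis.Redis(host="localhost", port=6379)

def with_db_lock(lock_key, work, timeout_s=10, retries=500):
    for _ in range(retries):
        # nx=True: set only if the key does not exist; ex: auto-expire the lock
        if r.set(lock_key, "LOCKED", nx=True, ex=timeout_s):
            try:
                return work()          # do the query-then-create under the lock
            finally:
                r.delete(lock_key)     # release as soon as the work is done
        time.sleep(0.01)
    raise TimeoutError("could not acquire lock " + lock_key)

# Usage: with_db_lock("test_123_456", lambda: create_record_if_missing())

Here create_record_if_missing() is just a stand-in for whatever query-then-create logic needs to be serialized.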
Appended information (1)
I followed the advice to use a transaction, but it did not work.
Below is the code I used to verify the transaction.
The reason for the failure may be that the transaction only works for the CURRENT client. For multiple simultaneous requests, the GAE server side will create different processes or instances to respond to them, and each process or instance has its own independent client.
@staticmethod
def get_test(test_key_id, unique_user_id, course_key_id, make_new=False):
    client = ndb.Client()
    with client.context():
        from google.cloud import datastore
        from datetime import datetime
        client2 = datastore.Client()
        print("transaction started at: ", datetime.utcnow())
        with client2.transaction():
            print("query started at: ", datetime.utcnow())
            # note: this ndb query and the put() below go through the ndb context,
            # not through client2's transaction
            my_test = MyTest.query(MyTest.test_key_id==test_key_id, MyTest.unique_user_id==unique_user_id).get()
            import time
            time.sleep(5)
            if make_new and not my_test:
                print("data to create started at: ", datetime.utcnow())
                my_test = MyTest(test_key_id=test_key_id, unique_user_id=unique_user_id, course_key_id=course_key_id, status="")
                my_test.put()
                print("data created at: ", datetime.utcnow())
        print("transaction ended at: ", datetime.utcnow())
        return my_test
Appended information (2)
Here is new information about the usage of memcache (Python 3).
I have tried the following code to lock the database using memcache, but it still failed to prevent overwriting.
@user_student.route("/run_test/<test_key_id>/<user_key_id>/")
def run_test(test_key_id, user_key_id=0):
    from google.appengine.api import memcache
    import time
    cache_key_id = test_key_id + "_" + user_key_id
    print("cache_key_id", cache_key_id)
    counter = 0
    client = memcache.Client()
    while True:  # retry loop
        result = client.gets(cache_key_id)
        if result is None or result == "":
            # the return value of cas() is not checked here, so two concurrent
            # requests can both fall through and proceed
            client.cas(cache_key_id, "LOCKED")
            print("memcache added new value: counter = ", counter)
            break
        time.sleep(0.01)
        counter += 1
        if counter > 500:
            print("failed after 500 tries.")
            break
    my_test = MyTest.get_test(int(test_key_id), current_user.unique_user_id, current_user.course_key_id, make_new=True)
    client.cas(cache_key_id, "")
    memcache.delete(cache_key_id)
If the problem is duplication rather than overwriting, maybe you should specify the data id when creating new entries rather than letting GAE generate a random one for you. Then the application will write to the same entry twice instead of creating two entries. The data id can be anything unique, such as a session id, a timestamp, etc.
The problem with a transaction is that it prevents you from modifying the same entry in parallel, but it does not stop you from creating two new entries in parallel.
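For example, here is a minimal sketch of that idea with google.cloud.ndb; the id scheme and the property types are only illustrative. Because the key is derived from the test and user ids, two racing requests address the same entity, and get_or_insert performs the existence check and the create inside a transaction on that single key:

from google.cloud import ndb

class MyTest(ndb.Model):
    test_key_id = ndb.IntegerProperty()
    unique_user_id = ndb.StringProperty()
    course_key_id = ndb.IntegerProperty()
    status = ndb.StringProperty()

client = ndb.Client()

def get_or_create_test(test_key_id, unique_user_id, course_key_id):
    with client.context():
        # Deterministic key: both racing requests target the *same* entity,
        # so the worst case is the same record written twice, not two records.
        entity_id = "{}_{}".format(test_key_id, unique_user_id)
        return MyTest.get_or_insert(
            entity_id,
            test_key_id=test_key_id,
            unique_user_id=unique_user_id,
            course_key_id=course_key_id,
            status="",
        )

With this, the second racing request simply returns the entity the first one created.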
I used memcache in the following way (using get/set) and succeeded in locking the database writes.
It seems that gets/cas does not work well. In a test, I set the value with cas(), but then gets() failed to read the value later.
Memcache API: https://cloud.google.com/appengine/docs/standard/python3/reference/services/bundled/google/appengine/api/memcache
@user_student.route("/run_test/<test_key_id>/<user_key_id>/")
def run_test(test_key_id, user_key_id=0):
    from google.appengine.api import memcache
    import time
    cache_key_id = test_key_id + "_" + user_key_id
    print("cache_key_id", cache_key_id)
    counter = 0
    client = memcache.Client()
    while True:  # retry loop
        result = client.get(cache_key_id)
        if result is None or result == "":
            client.set(cache_key_id, "LOCKED")
            print("memcache added new value: counter = ", counter)
            break
        time.sleep(0.01)
        counter += 1
        if counter > 500:
            return "failed after 500 tries of memcache checking."
    my_test = MyTest.get_test(int(test_key_id), current_user.unique_user_id, current_user.course_key_id, make_new=True)
    client.delete(cache_key_id)
...
Transactions:
https://developers.google.com/appengine/docs/python/datastore/transactions
When two or more transactions simultaneously attempt to modify entities in one or more common entity groups, only the first transaction to commit its changes can succeed; all the others will fail on commit.
You should be updating your values inside a transaction. App Engine's transactions will prevent two updates from overwriting each other as long as your read and write are within a single transaction. Be sure to pay attention to the discussion about entity groups.
You have two options:
Implement your own logic for transaction failures (how many times to retry, etc.).
Instead of writing to the datastore directly, create a task to modify an entity. Run a transaction inside the task. If it fails, App Engine will retry the task until it succeeds.
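As an illustration of the first option, here is a rough sketch with google.cloud.ndb (the Rating model and key are made up): the transactional decorator re-runs the body when the commit collides, and anything beyond those retries is your own error-handling policy.

from google.cloud import ndb

class Rating(ndb.Model):              # illustrative entity
    value = ndb.IntegerProperty()

client = ndb.Client()

@ndb.transactional(retries=3)         # ndb re-runs the body if the commit collides
def upsert_rating(key_id, value):
    # Read and write inside one transaction, so concurrent commits against
    # the same entity cannot silently overwrite each other.
    rating = Rating.get_by_id(key_id) or Rating(id=key_id)
    rating.value = value
    rating.put()

# Usage (inside a request handler):
# with client.context():
#     upsert_rating("test123_user456", 5)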
Inspired by this, I wrote a simple mutex on Cassandra 2.1.4.
Here is how the lock/unlock (pseudo)code looks:
public boolean lock(String uuid) {
    try {
        Statement stmt = new SimpleStatement("INSERT INTO LOCK (id) VALUES (?) IF NOT EXISTS", uuid);
        stmt.setConsistencyLevel(ConsistencyLevel.QUORUM);
        ResultSet rs = session.execute(stmt);
        if (rs.wasApplied()) {
            return true;
        }
    } catch (Throwable t) {
        Statement stmt = new SimpleStatement("DELETE FROM LOCK WHERE id = ?", uuid);
        stmt.setConsistencyLevel(ConsistencyLevel.QUORUM);
        session.execute(stmt); // DATA DELETED HERE REAPPEARS!
    }
    return false;
}

public void unlock(String uuid) {
    try {
        Statement stmt = new SimpleStatement("DELETE FROM LOCK WHERE id = ?", uuid);
        stmt.setConsistencyLevel(ConsistencyLevel.QUORUM);
        session.execute(stmt);
    } catch (Throwable t) {
    }
}
Now, I am able to recreate at will a situation where a WriteTimeoutException is thrown in lock() in a high load test. This means the data may or may not be written. After this my code deletes the lock - and again a WriteTimeoutException is thrown. However, the lock remains (or reappears).
Why is this?
Now I know I can easily put a TTL on this table (for this use case), but how do I reliably delete that row?
My guess on seeing this code is that it makes a common error in distributed-systems programming: assuming that, in case of failure, your attempt to correct the failure will succeed.
In the above code you check that the initial write was successful, but you don't make sure that the "rollback" is also successful. This can lead to a variety of unwanted states.
Let's imagine a few scenarios with Replicas A, B and C.
The client creates the lock but an error is thrown. The lock is present on all replicas, but the client gets a timeout because the connection is lost or broken.
State of System
A[Lock], B[Lock], C[Lock]
We get an exception on the client and attempt to undo the lock by issuing a delete, but this also fails with an exception back at the client. This means the system can be in a variety of states.
0 Successful Writes of the Delete
A[Lock], B[Lock], C[Lock]
All quorum requests will see the Lock. There exists no combination of replicas which would show us the Lock has been removed.
1 Successful Write of the Delete
A[Lock], B[Lock], C[]
In this case we are still vulnerable. Any request which excludes C from the quorum call will miss the deletion. If only A and B are polled, then we'll still see the lock existing.
2/3 Successful Writes of the Delete (Quorum CL Is Met)
A[Lock] or A[], B[], C[]
In this case we have once more lost the connection to the driver but somehow succeeded internally in replicating the delete request. These are the only scenarios in which we are actually safe and in which future quorum reads will not see the Lock.
Conclusion
One of the tricky things with situations like this is that if you fail to make your lock correctly because of network instability, it is also unlikely that your correction will succeed, since it has to work in the exact same environment.
This may be an instance where CAS operations can be beneficial. But in most cases it is better not to use distributed locking at all if you can avoid it.
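One mitigation, hinted at by the TTL idea in the question, is to attach the TTL to the lock row itself, so that a row whose compensating delete was lost simply expires on its own. A rough sketch, shown with the Python cassandra-driver purely for illustration (the thread's code is Java, and the single-column LOCK table is assumed from the question):

from cassandra import ConsistencyLevel, WriteTimeout
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("my_keyspace")   # keyspace name is an assumption

LOCK_TTL_SECONDS = 30

def try_lock(lock_id):
    # Lightweight transaction: at most one contender sees applied == True.
    stmt = SimpleStatement(
        "INSERT INTO lock (id) VALUES (%s) IF NOT EXISTS USING TTL "
        + str(LOCK_TTL_SECONDS),
        consistency_level=ConsistencyLevel.QUORUM,
    )
    try:
        return session.execute(stmt, (lock_id,)).was_applied
    except WriteTimeout:
        # Don't attempt a compensating delete here: the write may or may not
        # have landed, and the TTL will expire it either way.
        return False

def unlock(lock_id):
    stmt = SimpleStatement(
        "DELETE FROM lock WHERE id = %s",
        consistency_level=ConsistencyLevel.QUORUM,
    )
    session.execute(stmt, (lock_id,))

The trade-off is that a timed-out acquisition may still have taken the lock, which then stays unavailable until the TTL expires; that is usually preferable to a lock that never goes away.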
I am trying to handle near-simultaneous input to my Entity Framework application. Members (users) can rate things, so I have a table for their ratings, where one column is the member's ID, one is the ID of the thing they're rating, one is the rating, and another is the time they rated it. The most recent rating is supposed to override the earlier ratings. When I receive input, I check whether the member has already rated a thing: if they have, I update the rating in the existing row, and if they haven't, I add a new row. I noticed that when input comes in from the same user for the same item at nearly the same time, I end up with two ratings for that user for the same thing.
Earlier I asked this question: How can I avoid duplicate rows from near-simultaneous SQL adds? and I followed the suggestion to add a SQL constraint requiring unique combinations of MemberID and ThingID, which makes sense, but I am having trouble getting this technique to work, probably because I don't know the syntax for doing what I want to do when an exception occurs. The exception says the constraint was violated, and what I would like to do then is forget the attempted illegal addition of a row with the same MemberID and ThingID, and instead fetch the existing row and simply set its values to this slightly more recent data. However, I have not been able to come up with a syntax that will do that. I have tried a few things, and I always get an exception when I try to SaveChanges after catching the first exception: either the unique constraint violation comes up again, or I get a deadlock exception.
The latest version I tried was like this:
// Get the member's rating for the thing, or create it.
Member_Thing_Rating memPref = (from mip in _myEntities.Member_Thing_Rating
                               where mip.thingID == thingId
                               where mip.MemberID == memberId
                               select mip).FirstOrDefault();
bool RetryGet = false;
if (memPref == null)
{
    using (TransactionScope txScope = new TransactionScope())
    {
        try
        {
            memPref = new Member_Thing_Rating();
            memPref.MemberID = memberId;
            memPref.thingID = thingId;
            memPref.EffectiveDate = DateTime.Now;
            _myEntities.Member_Thing_Rating.AddObject(memPref);
            _myEntities.SaveChanges();
        }
        catch (Exception ex)
        {
            Thread.Sleep(750);
            RetryGet = true;
        }
    }
    if (RetryGet == true)
    {
        memPref = (from mip in _myEntities.Member_Thing_Rating
                   where mip.thingID == thingId
                   where mip.MemberID == memberId
                   select mip).FirstOrDefault();
    }
}
After writing the above, I also tried wrapping the logic in a function call, because it seems like Entity Framework cleans up database transactions when leaving the scope from which changes were submitted. So instead of using TransactionScope and managing the exception at the same level as above, I wrapped the whole thing inside a managing function, like this:
bool Succeeded = false;
while (Succeeded == false)
{
    Thread.Sleep(750);
    Exception Problem = AttemptToSaveMemberIngredientPreference(memberId, ingredientId, rating);
    if (Problem == null)
        Succeeded = true;
    else
    {
        Exception BaseEx = Problem.GetBaseException();
    }
}
But this only results in an unending string of exceptions on the unique constraint, handled forever by the higher-level function. I have a 3/4-second delay between attempts, so I am surprised that a conflict can be reported and yet nothing is found when I query for the row. I suppose that indicates that all of the threads are failing because they are running at the same time, and Entity Framework notices them all and fails them all before any succeed. So I suppose there should be a way to respond to the exception by looking at all the submissions and adjusting them? I don't know or see the syntax for that. So again, what is the way to handle this?
Update:
Paddy makes three good suggestions below. I expect his Stored Procedure technique would work around the problem, but I am still interested in the answer to the question. That is, surely one should be able to respond to this exception by manipulating the submission, but I haven't yet found the syntax to get it to insert one row and use the latest value.
To quote Eric Lippert, "if it hurts, stop doing it". If you are anticipating very high volumes and you want to do an 'insert or update', then you may want to consider handling this within a stored procedure instead of using the methods outlined above.
Your problem comes from the small gap between your call to the DB to check for existence and your insert/update.
The sproc could use a MERGE to do the insert or update in a single pass on the table, guaranteeing that you will only see a single row for a rating and that it will be the most recent update you receive.
Note - you can include the sproc in your EF model and call it using similar EF syntax.
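As a rough illustration of what the sproc's core statement could look like: the table and column names below are guesses based on the question's code, the snippet is driven from Python/pyodbc only as a stand-in client, and the HOLDLOCK hint is included because a plain MERGE can still race with itself under concurrent callers.

import pyodbc

MERGE_SQL = """
MERGE dbo.Member_Thing_Rating WITH (HOLDLOCK) AS target
USING (SELECT ? AS MemberID, ? AS thingID, ? AS Rating) AS source
    ON target.MemberID = source.MemberID AND target.thingID = source.thingID
WHEN MATCHED THEN
    UPDATE SET Rating = source.Rating, EffectiveDate = SYSDATETIME()
WHEN NOT MATCHED THEN
    INSERT (MemberID, thingID, Rating, EffectiveDate)
    VALUES (source.MemberID, source.thingID, source.Rating, SYSDATETIME());
"""

def upsert_rating(conn, member_id, thing_id, rating):
    # One round trip; the row ends up holding the most recent rating whether
    # it already existed or not.
    cur = conn.cursor()
    cur.execute(MERGE_SQL, member_id, thing_id, rating)
    conn.commit()

# conn = pyodbc.connect("DRIVER={ODBC Driver 17 for SQL Server};SERVER=...;DATABASE=...")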
Note 2 - Looking at your code, you don't roll back the transaction scope prior to sleeping your thread in the case of an exception. This is a relatively long time to be holding a transaction open, particularly when you are expecting very high volumes. You may want to update your code to something like this:
try
{
    memPref = new Member_Thing_Rating();
    memPref.MemberID = memberId;
    memPref.thingID = thingId;
    memPref.EffectiveDate = DateTime.Now;
    _myEntities.Member_Thing_Rating.AddObject(memPref);
    _myEntities.SaveChanges();
    txScope.Complete();
}
catch (Exception ex)
{
    txScope.Dispose();
    Thread.Sleep(750);
    RetryGet = true;
}
This may be why you seem to be suffering from deadlocks when you retry, particularly if you are getting rapid concurrent requests.
This may be a trivial question, but I was just hoping to get some practical experience from people who may know more about this than I do.
I wanted to generate a database in GAE from a very large series of XML files. As a form of validation, I am calculating statistics on the GAE datastore: I know there should be ~16,000 entities, but when I perform a count, I get something more on the order of 12,000.
The way I'm counting is basically this: I perform a filtered query, fetch a page of 1,000 entities, and then spin up a task for each entity (using its key). Each task then adds "1" to a counter that I'm storing.
I think I may have juiced the datastore writes too much; I set the rate of my task queues to 50/s. I did get some write errors, but not nearly enough to justify the 4,000 difference. Could it be that I was rushing the counting calls so much that it led to inconsistency? Would slowing the rate at which I process task queues to something like 5/s solve the problem? Thanks.
You can count your entities very easily (no tasks and almost for free):
int total = 0;
Query q = new Query("entity_kind").setKeysOnly();
// set your filter on this query
QueryResultList<Entity> results;
Cursor cursor = null;
FetchOptions queryOptions = FetchOptions.Builder.withLimit(1000).chunkSize(1000);
do {
    if (cursor != null) {
        queryOptions.startCursor(cursor);
    }
    results = datastore.prepare(q).asQueryResultList(queryOptions);
    total += results.size();
    cursor = results.getCursor();
} while (results.size() == 1000);
System.out.println("Total entities: " + total);
UPDATE:
If looping like I suggested takes too long, you can spin up a task for every 100/500/1000 entities - it's definitely more efficient than creating a task for each entity. Even very complex calculations should take milliseconds in Java if done right.
For example, each task can retrieve a batch of entities, spin up a new task (passing a query cursor to this new task), and then proceed with your calculations.
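A sketch of that cursor-passing pattern (the answer above is Java; this uses Python with google.cloud.ndb purely for illustration): the Record model is a stand-in for your entity kind, and enqueue_next_batch is a hypothetical helper for whatever task-queue mechanism you use.

from google.cloud import ndb

class Record(ndb.Model):           # stand-in for your entity kind
    processed = ndb.BooleanProperty()

client = ndb.Client()

BATCH = 1000

def count_batch(cursor_urlsafe=None, running_total=0):
    """Count one batch of keys, then hand the cursor to the next task."""
    with client.context():
        start = ndb.Cursor(urlsafe=cursor_urlsafe) if cursor_urlsafe else None
        keys, cursor, more = Record.query().fetch_page(
            BATCH, start_cursor=start, keys_only=True)
        running_total += len(keys)
        if more and cursor:
            # enqueue_next_batch is hypothetical: enqueue a task that calls
            # count_batch(cursor.urlsafe(), running_total)
            enqueue_next_batch(cursor.urlsafe(), running_total)
        else:
            print("Total entities:", running_total)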
I am getting lost on the following regarding the Datastore:
It is recommended to denormalize data, as the Datastore does not support join queries. This means that the same information is copied into several entities.
Denormalization means that whenever you have to update data, it must be updated in several different entities.
But there is a limit of 1 write per second within a single entity group.
The problem I have is therefore the following:
In order to update records, I open a transaction, then
update all the required entities. The entities to be updated are within the same entity group but belong to different kinds.
I am getting a "resource contention" exception.
==> It seems therefore that the only way to update denormalized data is outside of a transaction. But doing this is really bad, as some entities could be updated while others wouldn't be.
Am I the only one having this problem? How did you solve it?
Thanks,
Hugues
The (simplified) code is as follows:
Objectify ofy=ObjectifyService.beginTransaction();
try {
    Key<Party> partyKey=new Key<Party>(realEstateKey, Party.class, partyDTO.getId());
    //--------------------------------------------------------------------------
    //-- 1 - We update the party
    //--------------------------------------------------------------------------
    Party party=ofy.get(partyKey);
    party.update(partyDTO);
    //---------------------------------------------------------------------------------------------
    //-- 2 - We update the kinds which have Party as embedded field, all in the same entity group
    //---------------------------------------------------------------------------------------------
    //2.1 Invoices
    Query<Invoice> q1=ofy.query(Invoice.class).ancestor(realEstateKey).filter("partyKey", partyKey);
    for (Invoice invoice: q1) {
        invoice.setParty(party);
        ofy.put(invoice);
    }
    //2.2 Payments
    Query<Payment> q2=ofy.query(Payment.class).ancestor(realEstateKey).filter("partyKey", partyKey);
    for (Payment payment: q2) {
        payment.setParty(party);
        ofy.put(payment);
    }
    ofy.getTxn().commit();
    return (RPCResults.SUCCESS);
}
catch (Exception e) {
    final Logger log = Logger.getLogger(InternalServiceImpl.class.getName());
    log.severe("Problem while updating party : " + e.getLocalizedMessage());
    return (RPCResults.FAILURE);
}
finally {
    if (ofy.getTxn().isActive()) {
        ofy.getTxn().rollback();
        partyDTO.setCreationResult(RPCResults.FAILURE);
        return (RPCResults.FAILURE);
    }
}
This is happening because multiple requests to update the same entity group are occurring in a short period of time, not because you are updating many entities in the same entity group at once.
Since you have not shown your code, I can assume one of two things is happening:
The method you describe above is not actually using a transaction and you are running put_multi() with many entities of the same entity group. (If I had to guess, it'd be this.)
You have a high-traffic site and many other updates are occurring at the same time.
Just in case someone runs into the same issue:
The problem was in party.update(partyDTO), where under some specific conditions I was initiating another transaction.
What I learned today is that:
--> Inside a transaction, you are allowed to include multiple puts, even if that goes beyond 1 write per second to an entity group.
--> However, you should take care not to initiate another transaction within your transaction.
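To illustrate that lesson with a sketch (google.cloud.ndb rather than the Objectify/Java code above, with the models cut down to a bare minimum): several puts inside one transaction commit together, and the thing to avoid is opening a second transaction from inside the first.

from google.cloud import ndb

class Party(ndb.Model):
    name = ndb.StringProperty()

class Invoice(ndb.Model):
    party_key = ndb.KeyProperty(kind=Party)
    party = ndb.StructuredProperty(Party)    # denormalized copy

class Payment(ndb.Model):
    party_key = ndb.KeyProperty(kind=Party)
    party = ndb.StructuredProperty(Party)    # denormalized copy

client = ndb.Client()

@ndb.transactional()
def update_party_everywhere(real_estate_key, party_key, new_name):
    # Several puts inside one transaction are fine; what to avoid is
    # calling anything that starts another transaction from in here.
    party = party_key.get()
    party.name = new_name
    dirty = [party]
    for invoice in Invoice.query(ancestor=real_estate_key).filter(Invoice.party_key == party_key):
        invoice.party = party
        dirty.append(invoice)
    for payment in Payment.query(ancestor=real_estate_key).filter(Payment.party_key == party_key):
        payment.party = party
        dirty.append(payment)
    ndb.put_multi(dirty)                     # one commit for all the copies

# Usage (inside a request handler):
# with client.context():
#     update_party_everywhere(real_estate_key, party_key, "New name")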