My Google App Engine application (Python 3, standard environment) handles user requests as follows: if the wanted record does not exist in the database, it creates one.
Here is the problem with database overwriting:
When a user (via a browser) sends a request, the running GAE instance may temporarily fail to respond, and GAE then starts a new process to handle the request. As a result, two instances respond to the same request. Both instances query the database at almost the same time, each finds that the wanted record does not exist, and each creates a new one. The result is two duplicate records.
Another scenario is that, for some reason, the user's browser sends the request twice less than 0.01 seconds apart; the two requests are processed by two instances on the server side, and duplicate records are created.
I am wondering how one instance can temporarily lock the database to prevent another instance from overwriting it.
I have considered the following schemes but have no idea whether they are efficient.
For Python 2, Google App Engine provides memcache, which can be used to mark the status of a query for the purpose of database locking. For Python 3, it seems that one has to set up a Redis server to exchange database status quickly among instances. How efficient is database locking with Redis?
Using the session module of Flask. The session module can be used to share data (in most cases, the login status of users) among different requests and thus different instances. I am wondering how fast data is exchanged between instances.
Appended information (1)
I followed the advice to use a transaction, but it did not work.
Below is the code I used to verify the transaction.
The failure may be because the transaction only applies to the current client. For multiple simultaneous requests, the GAE server side creates different processes or instances to respond, and each process or instance has its own independent client.
    @staticmethod
    def get_test(test_key_id, unique_user_id, course_key_id, make_new=False):
        client = ndb.Client()
        with client.context():
            from google.cloud import datastore
            from datetime import datetime
            client2 = datastore.Client()
            print("transaction started at: ", datetime.utcnow())
            with client2.transaction():
                print("query started at: ", datetime.utcnow())
                my_test = MyTest.query(MyTest.test_key_id==test_key_id, MyTest.unique_user_id==unique_user_id).get()
                import time
                time.sleep(5)
                if make_new and not my_test:
                    print("data to create started at: ", datetime.utcnow())
                    my_test = MyTest(test_key_id=test_key_id, unique_user_id=unique_user_id, course_key_id=course_key_id, status="")
                    my_test.put()
                    print("data to created at: ", datetime.utcnow())
            print("transaction ended at: ", datetime.utcnow())
            return my_test
Appended information (2)
Here is new information about using memcache (Python 3).
I tried the following code to lock the database with memcache, but it still failed to prevent overwriting.
    @user_student.route("/run_test/<test_key_id>/<user_key_id>/")
    def run_test(test_key_id, user_key_id=0):
        from google.appengine.api import memcache
        import time
        cache_key_id = test_key_id+"_"+user_key_id
        print("cache_key_id", cache_key_id)
        counter = 0
        client = memcache.Client()
        while True: # Retry loop
            result = client.gets(cache_key_id)
            if result is None or result == "":
                client.cas(cache_key_id, "LOCKED")
                print("memcache added new value: counter = ", counter)
                break
            time.sleep(0.01)
            counter+=1
            if counter>500:
                print("failed after 500 tries.")
                break
        my_test = MyTest.get_test(int(test_key_id), current_user.unique_user_id, current_user.course_key_id, make_new=True)
        client.cas(cache_key_id, "")
        memcache.delete(cache_key_id)
If the problem is duplication rather than overwriting, you could specify the entity id yourself when creating new entries instead of letting GAE generate a random one. The application will then write to the same entry twice instead of creating two entries. The id can be anything unique to the request, such as a session id or a timestamp.
The problem with transactions is that they prevent you from modifying the same entry in parallel, but they do not stop you from creating two new entries in parallel.
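A minimal sketch of the deterministic-id idea, assuming the pair (test_key_id, unique_user_id) is what must be unique. A plain dict stands in for the datastore so the example runs anywhere, and the helper names `make_entity_id` and `get_or_create` are hypothetical. In ndb, `Model.get_or_insert()` provides a similar get-or-create-by-key operation.

```python
def make_entity_id(test_key_id, unique_user_id):
    """Build a deterministic id from the fields that must be unique together."""
    return f"{test_key_id}:{unique_user_id}"


def get_or_create(store, test_key_id, unique_user_id, course_key_id):
    """Return the record for this (test, user) pair, creating it at most once.

    `store` is a plain dict standing in for the datastore (an assumption).
    """
    entity_id = make_entity_id(test_key_id, unique_user_id)
    # setdefault writes only if the id is absent, so a repeated request
    # lands on the same entry instead of creating a second one.
    return store.setdefault(entity_id, {
        "test_key_id": test_key_id,
        "unique_user_id": unique_user_id,
        "course_key_id": course_key_id,
        "status": "",
    })


store = {}
first = get_or_create(store, 42, "user-1", 7)
second = get_or_create(store, 42, "user-1", 7)  # duplicate request
```

Because both requests compute the same id, the second call finds the existing entry, and the store ends up with a single record.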
I used memcache in the following way (using get/set) and succeeded in locking database writes.
It seems that gets/cas does not work well: in a test, I set the value with cas(), but a later gets() failed to read it.
Memcache API: https://cloud.google.com/appengine/docs/standard/python3/reference/services/bundled/google/appengine/api/memcache
    @user_student.route("/run_test/<test_key_id>/<user_key_id>/")
    def run_test(test_key_id, user_key_id=0):
        from google.appengine.api import memcache
        import time
        cache_key_id = test_key_id+"_"+user_key_id
        print("cache_key_id", cache_key_id)
        counter = 0
        client = memcache.Client()
        while True: # Retry loop
            result = client.get(cache_key_id)
            if result is None or result == "":
                client.set(cache_key_id, "LOCKED")
                print("memcache added new value: counter = ", counter)
                break
            time.sleep(0.01)
            counter+=1
            if counter>500:
                return "failed after 500 tries of memcache checking."
        my_test = MyTest.get_test(int(test_key_id), current_user.unique_user_id, current_user.course_key_id, make_new=True)
        client.delete(cache_key_id)
        ...
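One caveat with the get/set loop is that the check and the write are two separate calls, so two instances could both read an empty value before either sets it. The memcache API also has `add()`, which stores a value only if the key is absent, making the check and the write a single step. The sketch below illustrates that pattern with an in-memory stand-in class (`InMemoryCache`, hypothetical, not the real memcache client) so that it runs without App Engine; a real lock would likely also want an expiry time.

```python
import threading
import time


class InMemoryCache:
    """Stand-in for a shared cache; only `add` and `delete` are modelled."""

    def __init__(self):
        self._data = {}
        self._guard = threading.Lock()

    def add(self, key, value):
        """Store value only if key is absent; return True if stored."""
        with self._guard:
            if key in self._data:
                return False
            self._data[key] = value
            return True

    def delete(self, key):
        with self._guard:
            self._data.pop(key, None)


def acquire_lock(cache, key, retries=500, delay=0.01):
    """Try to take the lock with an atomic add; return True on success."""
    for _ in range(retries):
        if cache.add(key, "LOCKED"):
            return True
        time.sleep(delay)
    return False


cache = InMemoryCache()
created = []


def handle_request(name):
    if not acquire_lock(cache, "test_user"):
        return
    try:
        # Critical section: check-then-create runs for one caller at a time.
        if not created:
            created.append(name)
    finally:
        cache.delete("test_user")


threads = [threading.Thread(target=handle_request, args=(f"r{i}",)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Five simulated requests compete for the lock, but only one of them finds the record missing and creates it.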
Transactions:
https://developers.google.com/appengine/docs/python/datastore/transactions
When two or more transactions simultaneously attempt to modify entities in one or more common entity groups, only the first transaction to commit its changes can succeed; all the others will fail on commit.
You should be updating your values inside a transaction. App Engine's transactions will prevent two updates from overwriting each other as long as your read and write are within a single transaction. Be sure to pay attention to the discussion about entity groups.
You have two options:
1. Implement your own logic for transaction failures (how many times to retry, etc.)
2. Instead of writing to the datastore directly, create a task to modify an entity. Run a transaction inside a task. If it fails, the App Engine will retry this task until it succeeds.
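The first option could look roughly like the sketch below. It is not tied to any datastore client: `TransactionConflict` is a hypothetical exception standing in for whatever error the client raises when a commit fails, and `flaky_transaction` only simulates two failed commits.

```python
import random
import time


class TransactionConflict(Exception):
    """Hypothetical error raised when a transaction fails to commit."""


def run_with_retries(txn_func, max_attempts=5, base_delay=0.01):
    """Call txn_func until it commits or max_attempts is exhausted."""
    for attempt in range(1, max_attempts + 1):
        try:
            return txn_func()
        except TransactionConflict:
            if attempt == max_attempts:
                raise
            # Exponential backoff with jitter so retries do not collide again.
            time.sleep(base_delay * (2 ** (attempt - 1)) * random.random())


attempts = {"count": 0}


def flaky_transaction():
    """Fails twice, then succeeds, to exercise the retry loop."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TransactionConflict()
    return "committed"


result = run_with_retries(flaky_transaction)
```

The read and the write both belong inside `txn_func`, so that each retry re-reads the current state before writing.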
I am building a Redis cache to store product data, e.g. key-value pairs such as:
key -> testKey
value [json] ->
{
    "testA" : "A",
    "testB" : "B",
    "testC" : "C"
}
The problem I am struggling with is what happens if I get two requests to update the value for this key:
request1 changes -> "testB" = "Bx"
request2 changes -> "testC" = "Cx"
How do I handle the inconsistency?
As I understand it, one request will read the data above and update only the testB value, while the other will update only the testC value, because they run in parallel and neither waits for the other's update to reach the cache.
How do we maintain data consistency with Redis?
I could put a lock in front using a transactional DB, but that would hurt the latency of real-time data.
It depends on which data structure you choose in Redis.
In your case a Hash is a good way to store all the fields of your value. Use the HSET command to update the target field, which guarantees that each update request changes only a single field. Redis executes commands sequentially, so you will not have concurrency issues.
You can also use a String to store the raw JSON and serialize/deserialize it for each query and update. In that case you need to consider concurrency, because your read and update will not be a single atomic operation (a distributed lock may be a solution).
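The difference between the two approaches can be illustrated without a Redis server. In the sketch below, plain dicts stand in for Redis and the `hset` function is a simplified imitation of the HSET command, both assumptions for illustration only. The interleaving is written out by hand: both requests read the JSON string before either writes it back.

```python
import json

# Stand-ins for Redis: one dict of string values, one dict of hashes.
strings = {"testKey": json.dumps({"testA": "A", "testB": "B", "testC": "C"})}
hashes = {"testKey": {"testA": "A", "testB": "B", "testC": "C"}}

# String approach: both requests read before either writes.
snapshot1 = json.loads(strings["testKey"])
snapshot2 = json.loads(strings["testKey"])
snapshot1["testB"] = "Bx"
snapshot2["testC"] = "Cx"
strings["testKey"] = json.dumps(snapshot1)
strings["testKey"] = json.dumps(snapshot2)  # overwrites request1's change
string_result = json.loads(strings["testKey"])


# Hash approach: each request touches only its own field (like HSET).
def hset(key, field, value):
    hashes[key][field] = value


hset("testKey", "testB", "Bx")
hset("testKey", "testC", "Cx")
hash_result = hashes["testKey"]
```

With the string approach, request1's change to testB is lost; with per-field updates, both changes survive regardless of the order in which they arrive.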
I am not getting the expected behavior. My Flink application receives live events, and my trigger condition depends on two events, ABC and XYZ: when both events have arrived, the notification should be triggered.
The application uses StreamTableEnvironment.
Here is the SQL query I am using:
SELECT *
from EventTable
where eventName in ('ABC','XYZ')
and 1 IN (select 1 from EventTable where name='XYZ')
and 1 IN (select 1 from EventTable where name='ABC')
Use case 1:
An ABC event arrives --> nothing happens (as expected; it waits for the XYZ event).
An XYZ event arrives --> the condition matches, the SQL query returns two event records (ABC and XYZ), and the notification is triggered (as expected).
Now if I send another 'ABC' event, the SQL query returns the ABC event and the notification is triggered again.
I was expecting the query to return nothing, since only one ABC event arrived, and to wait for an XYZ event. Could you please help me understand this behaviour? Am I missing something needed to get the expected result?
When the second ABC is added to the dynamic table, the first XYZ is already there, so the conditions are met. The addition of this third row to the input table causes one new row to be appended to the output table.
See Dynamic Tables in the documentation for more information about the model underlying stream SQL.
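The behaviour can be reproduced with a very simplified model of an append-only dynamic table. This is only a sketch of the semantics, not how Flink executes the query; the functions `query` and `rows_emitted` are hypothetical, and a single column called `name` is assumed.

```python
def query(table):
    """Evaluate the question's query over the full table contents."""
    names = {row["name"] for row in table}
    if "ABC" in names and "XYZ" in names:
        return [row for row in table if row["name"] in ("ABC", "XYZ")]
    return []


def rows_emitted(table, new_event):
    """Append new_event and return the rows newly added to the query result."""
    before = query(table)
    table.append(new_event)
    after = query(table)
    # The result only grows here, so the new rows are the tail of `after`.
    return after[len(before):]


table = []
first = rows_emitted(table, {"name": "ABC"})   # no XYZ yet
second = rows_emitted(table, {"name": "XYZ"})  # both present now
third = rows_emitted(table, {"name": "ABC"})   # the earlier XYZ is still in the table
```

The table never forgets the earlier XYZ row, so the second ABC satisfies the condition on its own and is appended to the result.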
I have a procedure that generates a tab-delimited text file and also sends an email with a list of students as an attachment using msdb.dbo.sp_send_dbmail.
When I execute the procedure through SQL Server Management Studio, it sends only one email.
But when I created an SSIS package and scheduled the job to run nightly, the job sends 4 copies of the email to each recipient.
EXEC msdb.dbo.sp_send_dbmail @profile_name = 'A'
    ,@recipients = @email_address
    ,@subject = 'Error Records'
    ,@query = 'SELECT * FROM ##xxxx'
    ,@attach_query_result_as_file = 1
    ,@query_attachment_filename = 'results.txt'
    ,@query_result_header = 1
    ,@query_result_width=8000
    ,@body = 'These students were not imported'
I set the following parameters to 0 (in the Database Mail Configuration Wizard) to see if it made any difference, but it didn't resolve the problem.
AccountRetryAttempts 0
AccountRetryDelay 0
DatabaseMailExeMinimumLifeTime 0
Any suggestions?
I assume you have this email wired up to an event, like OnError/OnTaskFailed, probably at the root level.
Every item you add to a Control Flow adds another layer of potential events. Imagine a Control Flow with a Sequence Container which contains a ForEach Enumerator which contains a Data Flow Task. That's a fairly common design. Each of those objects has the ability to raise/handle events based on the objects it contains. The distance between the Control Flow's OnTaskFailed event handler and the Data Flow's OnTaskFailed event handler is 5 objects deep.
The Data Flow fails and raises the OnTaskFailed message. That message bubbles all the way up to the Control Flow, resulting in email 1 being fired. The Data Flow then terminates. The ForEach loop receives the signal that the Data Flow has completed with a failure status, so now the OnTaskFailed event fires for the ForEach loop. Repeat this pattern ad nauseam until every task/container has raised its own event.
Resolution depends, but usually folks get around this by either only putting the notification at the innermost objects (data flow in my example) or disabling the percolation of event handlers.
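The bubbling pattern described above can be modelled in a few lines. This is a toy simulation, not SSIS itself: the `Executable` class, the `fail` function and the four-level hierarchy are assumptions for illustration, since the actual package layout is not shown.

```python
class Executable:
    """A task or container in a hypothetical package hierarchy."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent


def fail(executable, on_task_failed):
    """Model the described behaviour: the failure propagates to each parent,
    and every level raises its own OnTaskFailed, handled at the root."""
    node = executable
    while node is not None:
        on_task_failed(node.name)
        node = node.parent


sent = []
package = Executable("Package")
sequence = Executable("Sequence Container", package)
loop = Executable("ForEach Loop", sequence)
data_flow = Executable("Data Flow Task", loop)

# A single root-level handler that sends one email per event it receives.
fail(data_flow, lambda source: sent.append(f"email for {source}"))
```

With four nested levels, the single root handler fires four times, which would be consistent with receiving four copies of the email.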
Check the solution here (it worked for me as I was getting 2 at a time) - Stored procedure using SP_SEND_DBMAIL sending duplicate emails to all recipients
Change the number of retries from X to 0. Now I only get 1 email. It'll be more obvious if your users are getting 4 emails, exactly 1 minute apart.
Here is what I'm trying to accomplish:
1. Retrieve 1 record from the database through TSQLDataset's CommandText: SELECT * FROM myTable WHERE ID = 1
2. Use TClientDataset to modify the record. (1 pending update)
3. Retrieve next record. SELECT * FROM myTable WHERE ID = 2
4. Modify the record. (now 2 pending updates)
5. Finally, send the 2 pending updates back to the database through ApplyUpdates function.
When I do step 3 I got "Must apply updates before refreshing data."
How can I refresh a TClientDataSet without applying pending updates?
You can append data packets manually to your DataSet calling the AppendData method.
In an application where the provider is in the same application with the ClientDataSet you can code something like this:
begin
  ConfigureProviderToGetRecordWithID(1);
  //make the ClientDataSet fetch this single record and not hit the EOF.
  ClientDataSet1.PacketRecords := 1;
  ClientDataSet1.Open;
  ClientDataSet1.Edit;
  ModifyFirstRecord;
  ClientDataSet1.Post;
  ConfigureProviderToGetRecordWithID(2);
  ClientDataSet1.AppendData(DataSetProvider1.Data, False);
  //now you have two records in your DataSet without losing the delta.
end;
This is kind of pseudo-code, but shows the general technique you could use.