On my website, people can add an item to their wishlist. When X number of people have added it to their list, all those people's credit cards are charged.
The problem I'm facing is how to ensure that if two customers add it to their wishlist at the same time, then the payment processing code won't run twice. Any ideas?
An example of what can happen is:
We are waiting for 20 people to add the item to their wishlist, and we have 19.
Bob and Sally visit the site and click the 'add to wishlist' button
The server receives Bob's request, sees that 20 requests are now met, and charges the payments.
At the same time the server receives Sally's request and, still seeing 19 requests in the database because Bob's order was received simultaneously, begins to process the payments. Hence, the payments are charged twice.
Any ideas on how to avoid this?
I am using a MySQL database and PHP for the programming.
This is the type of thing for which transactions are designed. The charging of the cards and the resetting of the wishlist count must be in the same transaction so that they occur as an atomic unit. Furthermore, to avoid the problem you are describing, you must set the transaction isolation level to at least "Repeatable Read".
Additional information:
Here's how to do it:
1. The app opens a transaction on the database.
2. The app does a select on the wishlist tables to retrieve the count.
3. If the count is >= n, the app does another select on the wishlist and related tables to retrieve the pending wishlist orders, users, card info, etc.
4. Depending on the business rules regarding card transactions, the app then deletes the pending orders, or whatever is needed to reset the wishlist count back to zero.
5. The app then closes the transaction.
Here's why it works: when the app does a select on the wishlist tables to retrieve the count inside a transaction, the db places a read lock on the tables associated with this query. If another transaction that opened during the pendency of the prior transaction tries to read those same tables, it must wait until the prior transaction has issued either a COMMIT or a ROLLBACK. If the prior transaction COMMITs, then the next transaction will see a count of 0 and all the other modifications. Otherwise, if the app executes a ROLLBACK for any reason, none of the data changes and the next transaction sees the data as it existed prior to the first transaction.
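Note that with MySQL/InnoDB a plain SELECT inside a transaction does not actually block other sessions, so the usual way to get the blocking behaviour described above is a locking read (SELECT ... FOR UPDATE). A minimal sketch, with hypothetical table and column names:

START TRANSACTION;

-- Locking read: any other transaction doing the same locking read on
-- these rows must wait until this transaction commits or rolls back.
SELECT COUNT(*) AS pending_count
FROM wishlist_entries
WHERE item_id = 123 AND status = 'pending'
FOR UPDATE;

-- If pending_count >= 20, charge the cards in application code, then:
UPDATE wishlist_entries
SET status = 'charged'
WHERE item_id = 123 AND status = 'pending';

COMMIT;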
I am doing a similar site at the moment. Seems to be popular...
It is important that your processes are idempotent. In this context, this means that if you run your charging service multiple times, the orders which have already been charged are not charged twice.
I accomplish this by setting the OrderStatus to 'NotProcessed' when the order is placed.
Once the service runs and charges for an order the OrderStatus changes to 'PaymentPending'.
I charge for the order only if the OrderStatus is 'NotProcessed'.
PSEUDO CODE:
void ProcessPendingOrders()
{
    var orders = getAllOrders();
    foreach (Order order in orders)
    {
        // Only charge orders that have not yet been processed
        if (order.OrderStatus == NotProcessed)
            ChargeOrder(order);
    }
}
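If two instances of this service run at the same time, the status check and the status change can still race between the read and the write. One way to close that gap at the database level is a guarded UPDATE that claims the order atomically; a sketch, with a hypothetical Orders table:

-- Claim the order: only one caller can flip the status from
-- 'NotProcessed' to 'PaymentPending'.
UPDATE Orders
SET OrderStatus = 'PaymentPending'
WHERE OrderId = @orderId
  AND OrderStatus = 'NotProcessed';

-- Charge the card only if exactly one row was affected
-- (@@ROWCOUNT in SQL Server, ROW_COUNT() in MySQL); otherwise
-- another instance has already claimed this order.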
I've been struggling with this for some days now. I hope I get it now, but I wanted to check it with you.
For every transaction there is an SKPaymentTransaction. In a regular purchase, the originalTransaction property is empty. In a restore or auto-renewal, originalTransaction is the SKPaymentTransaction of the original transaction.
The tricky part in my opinion is the receipt. Every transaction in the receipt contains a transaction_id and an original_transaction_id. In a one-time purchase they are the same; in a subscription, the original_transaction_id is the transaction_id of the transaction with which the user first subscribed.
So my first question: if I want to check the validity of a purchase against the receipt, the transactionID of the SKPaymentTransaction appears in the receipt ONLY if it is not a restore or renewal. Otherwise the SKPaymentTransaction's transactionID is NOT in the receipt, but since in these cases the SKPaymentTransaction has an originalTransaction property, originalTransaction.transactionID appears in the receipt. Correct?
And now the thing I have been struggling with, the 2nd question: the originalTransaction property of the SKPaymentTransaction doesn't necessarily have anything to do with the original_transaction_id in the receipt, correct? I mean, for a subscription with several renewals: if I restore them, I get an SKPaymentTransaction with a transaction ID which isn't in the receipt. Then I instead take the originalTransaction.transactionID of this SKPaymentTransaction and look for it in the receipt, but NOT in the original_transaction_id field, rather in the transaction_id field of the receipt, correct?
I hope I get it now. I really think Apple's documentation is rather confusing here.
Restoring the transactions on your device will generate unique transaction_ids, so the original transaction_id will not be found after this if you do it. The same happens on different devices, e.g. iPad and iPhone. The web_order_line_item_id will not change for these transactions, if you need a stable identifier.
Yes, in your SKPaymentTransaction there is a property originalTransaction, and you can find your original_transaction_id in the receipt. However, this is not a good way to validate a receipt, because validation should be done on a server to avoid man-in-the-middle attacks.
I would recommend validating the receipt through a server, as Apple recommends.
There are a few ready-to-go solutions, like ours (Apphud) or RevenueCat.
I would also recommend reading about what receipt validation is and why it's needed: https://blog.apphud.com/receipt-validation/
We use a table for a mail queue. When new mail needs to be sent, it is inserted into this table. There is a field in the table called status with an index on it.
A script runs every 10 seconds and checks whether there is new mail with status=0, sends this mail, and then updates the status to 1 (the actual mail content is saved in an nvarchar(max) column).
My question: is there any benefit to immediately "cleaning" the table, meaning once an email is sent, copy the record into a different "sent" table and delete it from the mail queue table? Right now we are performing this cleaning process only once a month, removing about 500,000 emails each month.
You must consider performance of a few operations here:
selects of e-mails to be sent
inserts of new e-mails
updates of sent e-mails
deletes of sent e-mails
To address the deletion issue right away: if you have no issues with disk space, the fastest way to delete the data is to truncate the whole table, e.g. once a month.
Insert performance should not be affected by choosing a different approach: insert time depends more on the table's availability (speaking of locks) and physical structure than on the size of the table. So when it comes to locking, it would be better not to clean the rows right away (looking just at inserts for now).
Updating the 'sent' flag is the trickiest part here. The size of the table plays a big role, because you first need to select the rows with new e-mails and then update each row once you actually send it. I assume you use a cursor or loop to send the e-mails one by one, so all you need is one select statement to find the IDs of all the items to be sent. That should not be resource-consuming if you have a non-clustered index on the flag column. You must remember to maintain the index, but that can be taken care of outside business hours. Once you have sent an e-mail, you can update the flag by accessing the row via its ID, so if you have a clustered index on the ID field (which I hope is the case), that is a fast operation.
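A minimal sketch of that flow, assuming a MailQueue table with a clustered primary key on Id and a Status flag (names are hypothetical):

-- Non-clustered index so finding new mail doesn't scan the whole table.
CREATE NONCLUSTERED INDEX IX_MailQueue_Status ON dbo.MailQueue (Status);

-- One select to get the IDs of all mails still to be sent.
SELECT Id
FROM dbo.MailQueue
WHERE Status = 0;

-- After each mail is actually sent, flip its flag via the clustered key.
UPDATE dbo.MailQueue
SET Status = 1
WHERE Id = @Id;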
All in all, I would say that you would find no benefit in cleaning the table right away: you lock the table just for the sake of doing it, you fragment the indexes with the deletes, and the selects and updates would not benefit that much from the smaller table.
If 10 seconds is a justified interval for your script and the volume of mail is large, then what you are doing is correct, because sending mail is more important than cleaning the table.
The cleaning process requires a select + insert (to archive) and a select + delete (to purge).
You say about 500,000 records per month, which means you send roughly that many mails per month.
Per 10 seconds that is: 500,000 / 30 ≈ 16,666 mails per day, and 16,666 / (24 * 60 * 60) * 10 ≈ 2 mails.
So you send roughly 2 mails per 10 seconds.
I think you can do all the operations in one go, e.g. by writing a trigger that moves the row to an archive table when Status is updated to 1, as sketched below.
Then there is no need for a second scheduler/cleanup job.
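A rough sketch of such a trigger in T-SQL (table and column names are hypothetical):

CREATE TRIGGER trg_MailQueue_Archive
ON dbo.MailQueue
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    -- Copy rows whose Status just changed from 0 to 1 into the archive...
    INSERT INTO dbo.MailQueueArchive (Id, Recipient, Body, Status, SentAt)
    SELECT i.Id, i.Recipient, i.Body, i.Status, GETUTCDATE()
    FROM inserted i
    JOIN deleted d ON d.Id = i.Id
    WHERE i.Status = 1 AND d.Status = 0;

    -- ...and remove them from the queue table.
    DELETE q
    FROM dbo.MailQueue q
    JOIN inserted i ON i.Id = q.Id
    JOIN deleted d ON d.Id = i.Id
    WHERE i.Status = 1 AND d.Status = 0;
END;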
We have parent/child tables:
Delivery
DeliveryItems
The Delivery table contains status updates. Based on the Delivery status update, we have a trigger that inserts the items into another system.
From my mobile application I insert the DeliveryItems first (offline), and then update the Delivery status (offline).
Now, when I sync with the Azure mobile service, the Delivery record gets updated before the insertion of all the items has completed.
I want the insert/update/delete operations to be done sequentially. How do I achieve this?
That's based on your sync order. If you want to sync the delivery items first, place the sync for "delivery items" above the one for "delivery".
It's not currently possible to guarantee order, for two main reasons:
An error in inserting delivery item 2 will not inherently stop the attempt to insert delivery item 3. (You can address this via a handler.)
Multiple actions taken on the same item are combined (so an offline insert followed by an update will go to the server as one insert when you come online).
If it's the first case that is tripping you up, you can have the sync handler abort the sync (so items 3, 4, ... and the Delivery don't go up).
Handling the second case is more complex; the simplest (but maybe unreasonable) approach is not to edit the Delivery until after you have inserted/edited all the items.
I need to develop a server application (in C#) that will read rows from a simple table (in SQL Server 2005 or 2008), do some work, such as calling a web service, and then update the rows with the resulting status (success, error).
Looks quite simple, but things get tougher when I add the following application requirements:
Multiple application instances must be running at the same time, for Load Balancing and Fault Tolerance purposes. Typically, the application will be deployed on two or more servers, and will concurrently access the same database table. Each table row must be processed only once, so a common synchronization/locking mechanism must be used between multiple application instances.
When an application instance is processing a set of rows, other application instances shouldn't have to wait for it to end in order to read a different set of rows waiting to be processed.
If an application instance crashes, no manual intervention should need to take place on the table rows that were being processed (such as removing temporary status used for application locking on rows that the crashing instance was processing).
The rows should be processed in a queue-like fashion, i.e., the oldest rows should be processed first.
Although these requirements don't look too complex, I'm having some trouble coming up with a solution.
I've seen locking hint suggestions, such as XLOCK, UPDLOCK, ROWLOCK, READPAST, etc., but I see no combination of locking hints that will allow me to implement these requisites.
Thanks for any help.
Regards,
Nuno Guerreiro
This is a typical table-as-queue pattern, as described in Using Tables as Queues. You would use a pending queue, and the dequeue transaction should also schedule a retry after a reasonable timeout. It is not realistically possible to hold on to locks for the duration of the web calls. On success, you would remove the pending item.
You also need to be able to dequeue in batches; dequeuing one by one is too slow under serious load (hundreds and thousands of operations per second). So, taking the pending queue example from the linked article:
create table PendingQueue (
    id int not null identity(1,1),
    DueTime datetime not null,
    Payload varbinary(max),
    constraint pk_pending_id nonclustered primary key(id));

create clustered index cdxPendingQueue on PendingQueue (DueTime);
go

create procedure usp_enqueuePending
    @dueTime datetime,
    @payload varbinary(max)
as
    set nocount on;
    insert into PendingQueue (DueTime, Payload)
        values (@dueTime, @payload);
go

create procedure usp_dequeuePending
    @batchsize int = 100,
    @retryseconds int = 600
as
    set nocount on;
    declare @now datetime;
    set @now = getutcdate();
    with cte as (
        select top(@batchsize)
            id,
            DueTime,
            Payload
        from PendingQueue with (rowlock, readpast)
        where DueTime < @now
        order by DueTime)
    update cte
        set DueTime = dateadd(second, @retryseconds, DueTime)
        output deleted.Payload, deleted.id;
go
On successful processing you would remove the item from the queue using the ID, as shown below. On failure, or on a crash, it would be retried automatically in 10 minutes. One thing you must internalize is that since HTTP does not offer transactional semantics, you will never be able to do this with 100% consistent semantics (e.g. guarantee that no item is processed twice). You can reduce the margin for error to something very small, but there will always be a moment when the system can crash after the HTTP call succeeded but before the database is updated, which will cause the same item to be retried, since you cannot distinguish this case from the case where the system crashed before the HTTP call.
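The success path can be a simple keyed delete; a sketch (the procedure name is hypothetical, the table is the PendingQueue from above):

create procedure usp_completePending
    @id int
as
    set nocount on;
    -- The item was processed successfully, so remove it from the queue
    -- before its rescheduled DueTime comes around and it is retried.
    delete from PendingQueue where id = @id;
go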
I initially suggested SQL Server Service Broker for this. However, after some research it turns out this is probably not the best way of handling the problem.
What you're left with is the table architecture you've asked for. However, as you've been finding, it is unlikely that you will be able to come up with a solution that meets all the given criteria, due to the great complexity of locking, transactions, and the pressures placed on such a scheme by high concurrency and high transactions per second.
Note: I am currently researching this issue and will get back to you with more later. The following script was my attempt to meet the given requirements. However, it suffers from frequent deadlocks and processes items out of order. Please stay tuned, and in the meantime consider a destructive reads method (DELETE with OUTPUT or OUTPUT INTO).
SET XACT_ABORT ON; -- blow up the whole tran on any errors
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;

BEGIN TRAN

UPDATE X
SET X.StatusID = 2 -- in process
OUTPUT Inserted.*
FROM (
    SELECT TOP 1 * FROM dbo.QueueTable WITH (READPAST, ROWLOCK)
    WHERE StatusID = 1 -- ready
    ORDER BY QueuedDate, QueueID -- in case of items with the same date
) X;

-- Do work in application, holding open the tran.

DELETE dbo.QueueTable WHERE QueueID = @QueueID; -- value taken from recordset that was output earlier

COMMIT TRAN;
In the case of several/many rows being locked at once by a single client, there is a possibility of the rowlock escalating to an extent, page, or table lock, so be aware of that. Also, normally holding long-running transactions that maintain locks is a big no-no. It may work in this special usage case, but I fear that high tps by multiple clients will make the system break down. Note that normally, the only processes querying your queue table should be those that are doing queue work. Any processes doing reporting should use READ UNCOMMITTED or WITH NOLOCK to avoid interfering with the queue in any way.
What is the implication of rows being processed out of order? If an application instance crashes while another instance is successfully completing rows, this delay will likely cause at least one row to be delayed in its completion, causing the processing order to be incorrect.
If the transaction/locking method above is not to your satisfaction, another way to handle your application crashing would be to give your instances names, then set up a monitor process that periodically checks whether those named instances are running. When a named instance starts up, it would always reset any unprocessed rows that carry its instance identifier (something as simple as "instance A" and "instance B" would work). Additionally, the monitor process would check whether the instances are running and, if one of them is not, reset the rows for that missing instance so that the other instances can pick them up. There would be a small lag between crash and recovery, but with proper architecture it could be quite reasonable.
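A sketch of that reset, assuming the queue table carries a hypothetical InstanceName column alongside the StatusID values used above:

-- Run by the monitor process (or by "instance A" itself on startup) to
-- release rows that a crashed instance claimed but never finished.
UPDATE dbo.QueueTable
SET StatusID = 1,         -- back to 'ready'
    InstanceName = NULL
WHERE InstanceName = 'instance A'
  AND StatusID = 2;       -- still marked 'in process'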
Note: The following links should be edifying:
info about XLOCK
Tables as Queues
You can't do this with SQL transactions (or relying on transactions as your main component here). Actually, you can do this, but you shouldn't. Transactions are not meant to be used this way, for long locks, and you shouldn't abuse them like this.
Keeping a transaction open for that long (retrieve rows, call the web service, get back to make some updates) is simply not good. And there's no optimistic locking isolation level that will allow you to do what you want.
Using ROWLOCK is also not a good idea, because it's just that. A hint. It's subject to lock escalation, and it can be converted to a table lock.
May I suggest a single entry point to your database? I think it fits in the pub/sub design.
So there would be only one component that reads/updates these records:
Reads batches of messages (enough for all your other instances to consume): 1000, 10,000, whatever you see fit. It makes these batches available to the other (concurrent) components through some kind of queue. I'm not going to say MSMQ :) (it would be the second time today I recommend it, but it's really suitable in your case too).
It marks the messages as in progress or something similar.
Your consumers are all bound, transactionally, to the inbound queue and do their stuff.
When ready, after the web service call, they put the messages in an outbound queue.
The central component picks them up and, inside a distributed transaction, does an update on the database (if it fails the messages will stay in the queue). Since it is the only one that could do that operation you won't have any concurrency issues. At least not on the database.
In the meantime it can read the next pending batch, and so on.
First I'd like to describe the mechanism of a locking solution I'd like to implement. Basically, an item can be opened in read or write mode, but if a user opens the item in write mode, no other user should be able to open it in edit mode. An item here means a case in a customer service application.
In order to do this I came up with the following: the table will contain a flag which indicates whether an item is checked out for editing, and an 'end time' until which the flag is valid. The default value for this is 3 minutes; if no user interaction happens during this time, the flag can be ignored the next time a user tries to open the same item.
On the UI side, I use jQuery to monitor if an user is active. If he or she is, a periodic AJAX call extends his or her time frame so he or she can continue working on the item. When the user saves the item, the flag will be removed. The end time is necessary to handle situations when the browser crashes or when the user goes to drink a coffee and leaves the item open for an hour.
So, the question. :) If a user opens the item in edit mode, first I have to read the flag and end-time values for the item, and if I find them valid (the flag is not set, or it is set but no longer valid because the end time has passed), I have to update them with new values.
What kind of transaction level should I use for this in EF, if any? Or should I write stored procedures to handle the select & update in a transaction? If so, what kind of locking method should I use?
You are describing pessimistic locking; there is really no debate on that. There are detailed instructions on what you want to do in the excellent MVC/EF tutorial http://www.asp.net/mvc/tutorials/getting-started-with-ef-using-mvc/handling-concurrency-with-the-entity-framework-in-an-asp-net-mvc-application
There's a chapter early on about pessimistic concurrency.
Optimistic locking is still OK in this case. You can use a timestamp/rowversion column and your flag together. The flag handles your application logic (only a single user can edit the record), and the timestamp is used to avoid a race condition when setting the flag, because only a single thread will be able to read the record and write it back. If any other thread reads the record concurrently and saves it after the first thread, it will get a concurrency exception.
If you don't want to use a timestamp, a different transaction isolation level will not help you, because the isolation level doesn't force queries to lock records. You must manually write a SQL query that uses the UPDLOCK hint to lock the record when querying, and after that execute the update. You can do this in a stored procedure, as sketched below.
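A rough sketch of such a procedure, with hypothetical table and column names:

CREATE PROCEDURE dbo.TryCheckOutItem
    @ItemId int,
    @UserId int,
    @LockMinutes int = 3,
    @Success bit OUTPUT
AS
BEGIN
    SET NOCOUNT ON;
    SET @Success = 0;

    BEGIN TRAN;

    -- UPDLOCK + HOLDLOCK keeps an update lock on the row until commit,
    -- so two concurrent callers cannot both see the item as free.
    IF EXISTS (
        SELECT 1
        FROM dbo.Items WITH (UPDLOCK, HOLDLOCK)
        WHERE ItemId = @ItemId
          AND (IsCheckedOut = 0 OR LockEndTime < GETUTCDATE()))
    BEGIN
        UPDATE dbo.Items
        SET IsCheckedOut = 1,
            CheckedOutBy = @UserId,
            LockEndTime = DATEADD(minute, @LockMinutes, GETUTCDATE())
        WHERE ItemId = @ItemId;

        SET @Success = 1;
    END

    COMMIT TRAN;
END;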
The answer below is not a good way to implement pessimistic concurrency. You should not implement this at the application level; the RDBMS has better tools for it.
If you are locking a row in the db, this is by definition pessimistic.
Since you are controlling the pessimistic concurrency at the application level, I don't think it matters which transaction scope EF uses. EF will automatically start a db-level transaction when you SaveChanges.
To prevent multiple threads from executing the lock / unlock from your app, you can lock the section of code that queries & updates like so:
public class MyClassThatManagesConcurrency
{
    // Shared lock object; in C# a field must live inside a class.
    private static readonly object _lock = new object();

    public void MyMethodThatManagesConcurrency()
    {
        lock (_lock)
        {
            // query for the data
            // determine if item should be unlocked
            // dbContext.SaveChanges();
        }
    }
}
With the above, no two threads will ever execute code inside the lock section at the same time. However, I am not sure why this is necessary: if all you are doing is reading the object and unlocking it when the time has expired, and two threads enter the method at the same time, the item will become unlocked either way.
On the other hand, if your db row for this object has a timestamp column (not a datetime column, but a column for versioning rows), and two threads enter the method at the same time, the second will receive a concurrency exception. But unless you are versioning rows at the db level, I don't think you need to do any locking.
Reply to comment
OK, I get it now, you are right. But you are still locking at the application level, which means it should not matter which db transaction EF chooses. To prevent two users from unlocking the same object, use the C# lock block I posted above.