I'm implementing a Neo4j client for the BG benchmark. There are 11 functions, each simulating a different social networking action, and each function has its own transaction body. But when I run with 100 threads, it sometimes throws a deadlock-detection exception.
I have users as nodes and friendships as relationships. Invite friend, reject friend, accept friend and thaw friendship all take two users as input. They work by fetching all relationships of one user node and then finding the relationship that connects to the other user node.
Is anyone aware of the locking mechanism of Neo4j?
You can read about deadlocks in the Neo4j documentation. They can appear when you have concurrent modifications of the same entities (nodes or relationships). Note that when modifying an entity, several locks may be taken: for a relationship, for instance, the locks on the two nodes connected by the relationship are taken as well.
Default locking behaviour:
* When adding, changing or removing a property on a node or relationship a write lock will be taken on the specific node or relationship.
* When creating or deleting a node a write lock will be taken for the specific node.
* When creating or deleting a relationship a write lock will be taken on the specific relationship and both its nodes.
The locks will be added to the transaction and released when the transaction finishes.
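To make the last rule concrete, here is a rough sketch of what one of your functions (thaw friendship) might look like with the Neo4j Python driver; the :User label, FRIEND relationship type and id property are hypothetical stand-ins for your model. The point is that deleting the friendship write-locks the relationship and both user nodes until commit, so two threads touching overlapping user pairs in opposite order can deadlock:

def thaw_friendship(tx, user_a_id, user_b_id):
    # Deleting the FRIEND relationship takes write locks on the relationship
    # and on BOTH user nodes until the transaction commits.
    tx.run(
        "MATCH (a:User {id: $a})-[f:FRIEND]-(b:User {id: $b}) DELETE f",
        a=user_a_id, b=user_b_id,
    )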
Design the database so that as little locking as possible is needed.
Avoid having the same nodes and relationships used by many users at the same instant, and keep transactions on those nodes and relationships as short as possible.
Maybe serializing the parallel write queries would help. You have 11 functions simulating 11 different social networking actions, and from your description some of those actions (transactions) must happen in sequential order (e.g. you can only accept a friend request after your friend has sent an invitation). You could serialize some of the write transactions; in other words, some queries will be blocked until the previous ones have finished.
With the help of causal chaining and bookmarks, you can serialize the operations across sessions. For example, if you have three functions, sendInvitationToFriend, rejectFriend and acceptFriend, the reject/accept transactions will be blocked until the sendInvitationToFriend transaction has finished.
Some code snippets (neo4j-java-driver 4.1):
List<Bookmark> savedBookmarks = new ArrayList<>();

// Send the invitation.
try ( Session session = driver.session( builder().withDefaultAccessMode( AccessMode.WRITE ).build() ) ) {
    session.writeTransaction( tx -> this.sendInvitationToFriend( tx, "friendId", "yourId" ) );
    savedBookmarks.add( session.lastBookmark() );
}

// Accept the invitation, chained after the send-invitation transaction via the bookmark.
try ( Session session = driver.session( builder().withDefaultAccessMode( AccessMode.WRITE ).withBookmarks( savedBookmarks ).build() ) ) {
    session.writeTransaction( tx -> this.acceptFriend( tx, "friendId", "yourId" ) );
}

// Or reject the invitation, also chained after the send-invitation transaction.
try ( Session session = driver.session( builder().withDefaultAccessMode( AccessMode.WRITE ).withBookmarks( savedBookmarks ).build() ) ) {
    session.writeTransaction( tx -> this.rejectFriend( tx, "friendId", "yourId" ) );
}
You also mentioned that your simulation is somewhat random. My suggestion is to define a retry strategy in your program: re-attempt the query several times until it succeeds, and let the thread sleep for a while between any two retries. You can find more detailed information in the linked documentation.
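Deadlock detection is reported as a transient error, so a backoff-and-retry wrapper is usually enough. Here is a minimal sketch with the Neo4j Python driver; the connection details and the invite_friend function are placeholders, and note that the driver's managed transaction functions (writeTransaction / write_transaction) already retry transient errors on their own, so this just makes the backoff explicit with an unmanaged transaction:

import time
from neo4j import GraphDatabase
from neo4j.exceptions import TransientError  # deadlock detection surfaces as a transient error

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))  # hypothetical

def run_with_retry(work, max_retries=5, base_delay=0.1):
    # Run a write transaction, backing off and retrying when a deadlock is detected.
    for attempt in range(max_retries):
        try:
            with driver.session() as session:
                tx = session.begin_transaction()
                result = work(tx)
                tx.commit()
                return result
        except TransientError:
            # Another transaction held the locks we needed; sleep, then try again.
            time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError("transaction still deadlocking after %d attempts" % max_retries)

# Usage, with invite_friend standing in for one of your 11 transaction functions:
# run_with_retry(lambda tx: invite_friend(tx, "friendId", "yourId"))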
Hope this post is helpful to you.
Related
How do you handle concurrency-related issues on a DB table when multiple applications are reading from and writing to it? This case may not be specific to microservices.
OPERATION        STATUS
GET_ORDER        COMPLETE
CALCULATE_PRICE  RUNNING
A very basic use case: multiple applications write to the above table. Before writing, each one checks whether the same operation is already present with RUNNING status. If it is not present, the application inserts the entry; otherwise it just skips. Both the read and the write are simple SQL queries.
The problem is that two different applications can read at the same time, both find that there is no 'CREATE_INVOICE' operation RUNNING, and both insert it, so the table will then look like:
OPERATION        STATUS
GET_ORDER        COMPLETE
CALCULATE_PRICE  RUNNING
CREATE_INVOICE   RUNNING
CREATE_INVOICE   RUNNING
As a result the table has two duplicate CREATE_INVOICE records. Besides applying a unique constraint on the table, what are the ways to resolve this?
By "2 different applications" do you mean that there are two completely separate applications which create invoices, or just 2 instances of the same application?
If the former, I'd be curious why there are two applications doing the same thing writing to the same DB.
If the latter, those instances will need to coordinate in some way (a uniqueness constraint on the table is an example of such coordination), and it's important to note that this coordination makes the application a little more stateful.
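As a rough sketch of that coordination-via-constraint route (assuming PostgreSQL, an operations table matching the one in the question, and a partial unique index that allows only one RUNNING row per operation; all names are made up):

import psycopg2

# Assumes something like:
#   CREATE UNIQUE INDEX one_running_per_operation
#   ON operations (operation) WHERE status = 'RUNNING';
conn = psycopg2.connect("dbname=orders")  # hypothetical connection string

def try_claim(operation):
    # Insert the RUNNING row atomically; return False if another instance already claimed it.
    with conn, conn.cursor() as cur:
        cur.execute(
            "INSERT INTO operations (operation, status) "
            "VALUES (%s, 'RUNNING') ON CONFLICT DO NOTHING",
            (operation,),
        )
        return cur.rowcount == 1  # 0 means a concurrent insert won the race

if try_claim('CREATE_INVOICE'):
    create_invoice()  # hypothetical; mark the row COMPLETE when done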
My preferred way of dealing with this would be to be event driven (e.g. by tapping into database change data capture) and sharding: for instance, when a GET_ORDER record is marked COMPLETE in the DB (resulting in a CDC record being published), based on the order ID, that CDC record is always routed to the same shard in the invoice creation application (or the price calculation application for that matter; your second table seems to imply that invoice creation can be simultaneous with price calculation), thus avoiding the conflict.
I am running three PostgreSQL instances using replication (1 master, 2 slaves) which are accessed by two separate servers:
The first (unexposed) server basically iterates over every row in a particular table and continuously updates specific columns (resources) every tick (based on production rate of those resources) for each user.
The second server is a public API that exposes various functions such as spending a certain amount of those resources.
In order to access and manipulate the data I am using an ORM library which allows me to write code as follows:
const resources = await repository.findById(1337);
// some complex computation
resources.iron = computeNewIron(resources.iron);
await repository.save(resources);
Of course it can happen that the API wants to deduct a specific amount of resources right when the tick-handling server is updating the same values, which can lead either server to assume a resource amount that is incorrect, basically your typical lost-update anomaly.
My problem is that I am not just writing a "simple" atomic query such as UPDATE table SET iron = iron + 42 WHERE id = :id. The ORM library internally uses a direct assignment that does not self-reference the respective columns, which yields something akin to UPDATE table SET iron = 123 WHERE id = :id, where the amount has been computed beforehand.
I assume the anomaly could be prevented if I used hand-written queries that increment/decrement the values atomically with self-references, but I'd like to know which other options can alleviate the issue. Should I wrap my SELECT/computation/UPDATE in a transaction? Does that suffice?
Your question is a bit unclear, but if your transaction spans several statements, yet needs to have a consistent state of the database, there are basically two options:
Use pessimistic locking: when you read values from the database, do it with SELECT ... FOR UPDATE. Then the rows are locked for the duration of your transaction, and no concurrent transaction can modify them.
Use optimistic locking: start your transaction in REPEATABLE READ isolation level. Then you see a consistent snapshot of the database for the whole duration of your transaction. If somebody else modifies your data after you read them, your UPDATE will cause a serialization error and you'll have to retry the transaction.
Optimistic locking is better if conflicts are rare, while pessimistic locking is preferable if conflicts are likely.
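A rough sketch of the pessimistic variant with plain psycopg2 (table and column names mirror the question; most ORMs expose the same thing as a "pessimistic lock" option on findById):

import psycopg2

conn = psycopg2.connect("dbname=game")  # hypothetical connection string

def spend_iron(resource_id, amount):
    with conn:  # one transaction: commits on success, rolls back on error
        with conn.cursor() as cur:
            # FOR UPDATE locks the row until the transaction ends;
            # the tick server's concurrent UPDATE will block until then.
            cur.execute("SELECT iron FROM resources WHERE id = %s FOR UPDATE", (resource_id,))
            (iron,) = cur.fetchone()
            if iron < amount:
                raise ValueError("not enough iron")
            # Safe: nobody can have changed the row since the locked read.
            cur.execute("UPDATE resources SET iron = %s WHERE id = %s",
                        (iron - amount, resource_id))

For the optimistic variant you would instead run the whole read/compute/write in a transaction at REPEATABLE READ (or SERIALIZABLE) isolation and retry it whenever the commit fails with a serialization error.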
I have two tables in DynamoDB. One has data about homes, one has data about businesses. The homes table has a list of the closest businesses to it, with walking times to each of them. That is, the homes table has a list of IDs which refer to items in the businesses table. Since businesses are constantly opening and closing, both these tables need to be updated frequently.
The problem I'm facing is that, when either one of the tables is updated, the other table will have incorrect data until it is updated itself. To make this clearer: let's say one business closes and another one opens. I could update the businesses table first to remove the old business and add the new one, but the homes table would then still refer to the now-removed business. Similarly, if I updated the homes table first to refer to the new business, the businesses table would not yet have that new business's data. Whichever table I update first, there will always be a period of time where the two tables are out of sync.
What's the best way to deal with this problem? One way I've considered is to do all the updates to a secondary database and then swap it with my primary database, but I'm wondering if there's a better way.
Thanks!
Dynamo only offers atomic operations on the item level, not transaction level, but you can have something similar to an atomic transaction by enforcing some rules in your application.
Let's say you need to run a transaction with two operations:
Delete Business(id=123) from the table.
Update Home(id=456) to remove association with Business(id=123) from the home.businesses array.
Here's what you can do to mimic a transaction:
Generate a timestamp for locking the items
Let's say our current timestamp is 1234567890. Using a timestamp will allow you to clean up failed transactions (I'll explain later).
Lock the two items
Update both Business-123 and Home-456 and set an attribute lock=1234567890.
Do not change any other attributes yet on this update operation!
Use a ConditionExpression (check the Developer Guide and API) to verify attribute_not_exists(lock) before updating. This way you're sure no other process is using the same items.
Handle update lock responses
Check whether both updates, to Home-456 and to Business-123, succeeded. If both did, you can proceed with the actual changes you need to make: delete Business-123 and update Home-456 to remove the Business association.
For extra care, also use a ConditionExpression in both updates again, but now ensuring that lock == 1234567890. This way you're extra sure no other process overwrote your lock.
If both updates succeed again, you can consider the two items updated and consistent to be read by other processes. To do this, run a third update removing the lock attribute from both items.
When one of the operations fails, you may try again X times, for example. If it fails all X times, make sure the process cleans up the other lock that succeeded previously.
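A minimal boto3 sketch of the lock-acquisition step (the table names, key schema and epoch-seconds lock value are assumptions chosen to match the example above):

import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.resource("dynamodb")
business_table = dynamodb.Table("Business")  # hypothetical table names
home_table = dynamodb.Table("Home")

def acquire_lock(table, key, lock_value):
    # Set lock=<timestamp> only if no lock attribute is present; returns True on success.
    try:
        table.update_item(
            Key=key,
            UpdateExpression="SET #lock = :ts",
            ConditionExpression="attribute_not_exists(#lock)",
            ExpressionAttributeNames={"#lock": "lock"},
            ExpressionAttributeValues={":ts": lock_value},
        )
        return True
    except ClientError as e:
        if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return False  # another process holds the lock on this item
        raise

# Lock both items before making the real changes, e.g.:
# locked = acquire_lock(business_table, {"id": "123"}, 1234567890) and \
#          acquire_lock(home_table, {"id": "456"}, 1234567890)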
Enforce the transaction lock throughout your code
Always use a ConditionExpression in any part of your code that may update/delete Home and Business items. This is crucial for the solution to work.
When reading Home and Business items, you'll need to do the following (this may not be necessary for all reads; you'll have to decide whether you need to ensure consistency from start to finish while working with an item read from the DB):
Retrieve the item you want to read
Generate a lock timestamp
Update the item with lock=timestamp using a ConditionExpression
If the update succeeds, continue using the item normally; if not, wait one or two seconds and try again.
When you're done, update the item removing the lock
Regularly clean up failed transactions
Every minute or so, run a background process that looks for potentially failed transactions. If your processes take at most 60 seconds to finish and there is an item whose lock value is older than, say, 5 minutes (remember the lock value is the time the transaction started), it's safe to say that the transaction failed at some point and whatever process was running it didn't clean up the locks properly.
This background job ensures that no item stays locked for eternity.
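A sketch of that cleanup job with boto3, under the same assumptions (epoch-seconds lock values and an "id" key attribute):

import time

def clean_stale_locks(table, max_age_seconds=300):
    # Remove lock attributes older than max_age_seconds (presumed failed transactions).
    cutoff = int(time.time()) - max_age_seconds
    resp = table.scan(  # pagination via LastEvaluatedKey omitted for brevity
        FilterExpression="#lock < :cutoff",
        ExpressionAttributeNames={"#lock": "lock"},
        ExpressionAttributeValues={":cutoff": cutoff},
    )
    for item in resp["Items"]:
        table.update_item(
            Key={"id": item["id"]},                     # hypothetical key schema
            UpdateExpression="REMOVE #lock",
            ConditionExpression="#lock = :ts",          # only clear the lock we actually saw
            ExpressionAttributeNames={"#lock": "lock"},
            ExpressionAttributeValues={":ts": item["lock"]},
        )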
Beware that this implementation does not ensure a truly atomic and consistent transaction in the sense traditional ACID DBs do. If this is mission-critical for you (e.g. you're dealing with financial transactions), do not attempt it. But since you said you're OK with atomicity being broken on rare failure occasions, you may live with it happily. ;)
Hope this helps!
I'm using the App Engine MapReduce library (http://code.google.com/p/appengine-mapreduce/) to perform an operation over a set of entities. However, I'm finding that my operations are being duplicated.
Are mapper functions sometimes called more than once for a specific entity? Is this the case even if they don't fail the first time?
edit: here are some more details.
def reparent_request(entity):
    # check if the entity has a parent
    if not is_valid_to_reparent(entity):
        return
    # copy it
    try:
        copy = clone_entity(Request, entity, parent=entity.user)
        copy.put()  # we hard put here so we can use the reference later in this function
    except:
        ...
    # ... update some references to the copied object ...
    # delete the original
    yield op.db.Delete(entity)
At the end, I am non-deterministically left with two entities, both with the new parent.
I've reparented a load of entities before - it was a nightmare because of the exact problem you're facing.
What I would do instead is:
Create a new queue. Ensure it's paused and that you have a lot of storage space dedicated to queues. It's only temporary, but you'll need it.
Instead of editing your entities in your map reduce job, add a task to the queue for each entity, with a name that will be unique per entity; the entity's key works fine.
When adding to the queue, you'll get an error if you try to add a task with the same name twice - so catch the error and skip it, because you know that entity must already have been touched by the map reduce job.
When you're confident that every entity has a matching queue task and the map reduce job has finished, unpause your queue. The queue will do the reparenting.
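A rough sketch of the enqueue step with the App Engine taskqueue API (the queue name and handler URL are made up; the handler would do the actual reparenting):

from google.appengine.api import taskqueue

def enqueue_reparent(entity):
    # Map step: enqueue one named task per entity; duplicate names from mapper
    # re-runs are rejected by the task queue, so each entity is handled once.
    try:
        taskqueue.add(
            queue_name='reparent',              # the paused queue created beforehand
            name=str(entity.key()),             # deterministic, unique task name
            url='/tasks/reparent',              # hypothetical handler doing the reparenting
            params={'key': str(entity.key())},
        )
    except (taskqueue.TaskAlreadyExistsError, taskqueue.TombstonedTaskError):
        pass  # this entity was already enqueued by an earlier mapper attempt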
A couple of notes:
* the task queue size can get pretty big. Can't remember the numbers, but it was gigs. Also, the size of the queue doesn't update in real time - so don't worry if it still says gigs of tasks when the queue is nearly empty.
* the reliability of the queue storage is an unknown, I believe. It didn't happen to us, but queue items could disappear, I guess. Fortunately, you can rerun this process multiple times to ensure it worked, especially if you're deleting entities.
* you may want to ensure your queue has a concurrency limit on it. Without one, a delay in the execution of a couple of tasks can absolutely cripple your application. Learnt that the hard way! I think 30 concurrent tasks went quite well for us.
Hope that's useful, let me know if you come up with any improvements!
App Engine mapreduce runs on the task queue, and like anything else that uses the task queue, tasks have to be idempotent - that is, running them multiple times should have the same effect as running them once. Tasks will occasionally be run more than once; the mapreduce library may have its own reasons for rerunning mapper tasks, too.
In your situation, I'd suggest creating the new entity with a key whose ID is the same as the old entity; that way running it multiple times will just overwrite the same entity.
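For example, something along these lines (a sketch with the old db API; it assumes entity.user is a reference you can take a key from, and that your clone_entity helper can pass a key through to the model constructor):

from google.appengine.ext import db

def reparent_request(entity):
    if not is_valid_to_reparent(entity):
        return
    # Reuse the old entity's ID/name under the new parent, so re-running the
    # mapper overwrites the same copy instead of creating another one.
    new_key = db.Key.from_path(
        'Request', entity.key().id_or_name(), parent=entity.user.key())
    copy = clone_entity(Request, entity, key=new_key)  # assumes clone_entity forwards 'key'
    copy.put()  # idempotent: a second run just rewrites the same entity
    # ... update references, then delete the original as before ...
    yield op.db.Delete(entity)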
I’m building a system that generates “work items” that are queued up for back-end processing. I recently completed a system that had the same requirements and came up with an architecture that I don’t feel is optimal and was hoping for some advice for this new system.
Work items are queued up centrally and need to be processed in an essentially FIFO order. If this were the only requirement, then I would probably favor an MSMQ or SQL Server Service Broker solution. However, in reality, I need to select work items in a modified FIFO order. A work item has several attributes, and they need to be assigned in FIFO order where certain combinations of attribute values exist.
As an example, a work item may have the following attributes: Office, Priority, Group Number and Sequence Number (within group). When multiple items are queued for the same Group Number, they are guaranteed to be queued in Sequence Number order and will have the same priority.
There are several back-end processes (currently implemented as Windows Services) that pull work items in modified FIFO order given certain configuration parameters for the given service. The service running in Washington, DC is configured to process only work items for DC, while the service in NY may be configured to process both NY and DC items (mainly to increase overall throughput). In addition to this type of selectivity, higher-priority items should be processed first, and items that share the same "Group Number" must be processed in Sequence Number order. So if the NY service is working on a DC item in group 100 with sequence 1, I don't want the DC service to pull off the DC item in group 100 with sequence 2, because sequence 1 is not yet complete. Items in other groups should remain eligible for processing.
In the last system, I implemented the queues with SQL tables. I created stored procedures to submit items and, more importantly, to “assign” items to the Windows Services that were responsible for processing them. The assignment stored procedures contain the selection logic I described above. Each Windows Service would call the assignment stored procedure, passing it the parameters that were unique to that instance of the service (e.g. the eligible offices). This assignment stored procedure stamps the work item as assigned (in process) and when the work is complete, a final stored procedure is called to remove the item from the “queue” (table).
This solution does have some advantages in that I can quickly examine the state of these “queues” by a simple SQL select statement. I’m also able to manipulate the queues easily (e.g. I can bump priorities with a simple SQL update statement). However, on the downside, I occasionally have to deal with deadlocks on these queue tables and have the burden of writing these stored procedures (which gets tedious after a while).
Somehow I think that either MSMQ (with or without WCS) or Service Broker should be able to provide a more elegant solution. Rolling my own queuing/work-item-processing system just feels wrong. But as far as I know, these technologies don’t offer the flexibility that I need in the assignment process. I am hoping that I am wrong. Any advice would be welcome.
It seems to me that your concept of an atomic unit of work is a Group. So I would suggest that you only queue up a message that identified a Group Id, and then your worker will have to go to a table that maps Group Id to 1 or more Work Items.
You can handle your other problems by using more than one queue - NY-High, NY-Low, DC-High, DC-Low, etc.
In all honesty, though, I think you are better served by fixing the deadlock issues in your current architecture. You should be reading the TOP 1 message from your queue table with Update Lock and Read Past hints, ordered by your priority logic and whatever filter criteria you want (Office/Location). Then you process that one message and change its status or move it to another table. You should be able to call that stored procedure in parallel without a deadlock issue.
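A sketch of that dequeue, assuming SQL Server, a WorkItems table with columns matching the attributes in the question, and pyodbc as the caller (all names are made up):

import pyodbc

conn = pyodbc.connect("DSN=WorkQueueDb")  # hypothetical DSN

DEQUEUE = """
WITH next AS (
    SELECT TOP (1) *
    FROM WorkItems WITH (UPDLOCK, READPAST, ROWLOCK)
    WHERE Status = 'Queued'
      AND Office IN ('DC', 'NY')      -- offices this service instance is eligible for
      -- plus a NOT EXISTS check to skip groups with an earlier sequence still in process
    ORDER BY Priority DESC, GroupNumber, SequenceNumber
)
UPDATE next SET Status = 'InProcess'
OUTPUT inserted.*;
"""

def dequeue_one():
    # Claim the next eligible item; READPAST makes concurrent callers skip
    # rows that are already locked instead of blocking or deadlocking.
    cur = conn.cursor()
    cur.execute(DEQUEUE)
    row = cur.fetchone()   # None when nothing is eligible right now
    conn.commit()
    cur.close()
    return row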
Queues are for FIFO order, not random access order. Even though you are saying that you want FIFO order, you want FIFO order with respect to a random set of variables, which is essentially random order. If you want to use queues, you need to be able to determine order before the message goes in the queue, not after it goes in.