I am building a distributed KV store to learn a little more about distributed systems and concurrency. The KV store I am building is fully transactional, with an in-memory transaction log. The storage is also entirely in-memory, just for simplicity. The API exposes GET, INSERT, UPDATE, and REMOVE endpoints. Note that all endpoints operate on a single key, not a range of keys.
I am managing concurrency via locks. However, I have a single global lock over the entire datastore. This sounds terribly inefficient: if I want to read the value for K1 while I update K2, I must wait for K2 to finish updating even though the two keys are unrelated.
I know there are databases that use more granular locking. For example, MySQL has row-level locks. How can key-level locks be implemented?
I have
type Storage struct {
store map[string]int32
}
Should I add something like this?:
type Storage struct {
store map[string]int32
locks map[string]*sync.Mutex
}
The issue with this is that the locks map has to be kept in sync with the store. Another option would be to combine the two maps, but even then I run into the problem of deleting an entry from the map while its lock is still held, e.g. if a REMOVE request comes in before a GET finishes.
Conceptual part
Transactions
First, a transaction log is not needed for strong consistency; transaction logs are useful for upholding ACID properties.
Transactions themselves are also not strictly required for strong consistency in a database, but they can be a useful tool for ensuring consistency in many situations.
Strong consistency is the property that every read of the database returns the most recent write, regardless of where the read is performed. In other words, strong consistency guarantees that all clients see the same data, and that the data is up-to-date and consistent across the entire system.
You can use a consensus algorithm, such as Paxos or Raft, to ensure strong consistency. When storing data, you can attach a version to each value and use that as the ID in Paxos.
Locking in KV Stores
In a key-value (KV) store, keys are typically locked using some kind of locking mechanism, such as a mutex or a reader-writer lock (as suggested by #paulsm4). This allows multiple threads or processes to access and modify the data in the KV store concurrently, while still ensuring that the data remains consistent and correct.
For example, when a thread or process wants to read or modify a particular key in the KV store, it can acquire a lock for that key. This prevents other threads or processes from concurrently modifying the same key, which can lead to race conditions and other problems. Once the thread or process has finished reading or modifying the key, it can release the lock, allowing other threads or processes to access the key.
The specific details of how keys are locked in a KV store can vary depending on the implementation of the KV store. Some KV stores may use a global lock (as you were already doing, which is sometimes inefficient) that locks the entire data store, while others may use more granular locking mechanisms, such as row-level or key-level locks, to allow more concurrent access to the data.
So, tl;dr: conceptually you were right; the devil is in the details of how the locking is implemented.
Coding
To strictly answer the question about locking, consider readers-writer locks, as suggested by #paulsm4. In Go, the corresponding type is sync.RWMutex. The standard library also offers sync.Map, a concurrent map.
Here is a short example:
package storage

import (
	"fmt"
	"sync"
)

type Storage struct {
	store sync.Map // a concurrent map; safe for use by multiple goroutines
}

// GET retrieves the value for the given key.
func (s *Storage) GET(key string) (int32, error) {
	// Load is safe for concurrent use; no explicit lock is needed here.
	v, ok := s.store.Load(key)
	if !ok {
		return 0, fmt.Errorf("key not found: %s", key)
	}
	return v.(int32), nil
}

// INSERT inserts the given key-value pair into the data store.
// Note that Store simply overwrites any existing value for the key.
func (s *Storage) INSERT(key string, value int32) error {
	s.store.Store(key, value)
	return nil
}

// UPDATE updates the value for the given key.
func (s *Storage) UPDATE(key string, value int32) error {
	s.store.Store(key, value)
	return nil
}

// REMOVE removes the key-value pair for the given key from the data store.
func (s *Storage) REMOVE(key string) error {
	s.store.Delete(key)
	return nil
}
You would need Paxos (or Raft) on top of this to ensure consistency across replicas.
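If you want explicit key-level reader-writer locks instead of sync.Map, a common compromise is lock striping: rather than one lock per key (which reintroduces the problem of keeping a locks map in sync with the store), hash each key to one of N shards, each guarded by its own RWMutex. A minimal sketch under those assumptions, not a definitive implementation:
package storage

import (
	"hash/fnv"
	"sync"
)

const nShards = 32

// shard owns a slice of the key space, guarded by its own RWMutex, so
// operations on keys in different shards never contend with each other.
type shard struct {
	mu    sync.RWMutex
	items map[string]int32
}

// ShardedStorage approximates key-level locking without a separate locks map.
type ShardedStorage struct {
	shards [nShards]*shard
}

func NewShardedStorage() *ShardedStorage {
	s := &ShardedStorage{}
	for i := range s.shards {
		s.shards[i] = &shard{items: make(map[string]int32)}
	}
	return s
}

func (s *ShardedStorage) shardFor(key string) *shard {
	h := fnv.New32a()
	h.Write([]byte(key))
	return s.shards[h.Sum32()%nShards]
}

// GET takes only a read lock, so concurrent reads in the same shard proceed in parallel.
func (s *ShardedStorage) GET(key string) (int32, bool) {
	sh := s.shardFor(key)
	sh.mu.RLock()
	defer sh.mu.RUnlock()
	v, ok := sh.items[key]
	return v, ok
}

// INSERT (and UPDATE) reduce to a write under the shard's write lock.
func (s *ShardedStorage) INSERT(key string, value int32) {
	sh := s.shardFor(key)
	sh.mu.Lock()
	defer sh.mu.Unlock()
	sh.items[key] = value
}

// REMOVE is safe even while other requests touch the key, because the shard
// lock also protects map membership, which resolves the REMOVE-before-GET worry.
func (s *ShardedStorage) REMOVE(key string) {
	sh := s.shardFor(key)
	sh.mu.Lock()
	defer sh.mu.Unlock()
	delete(sh.items, key)
}
Raising nShards gets you closer to true per-key locking at the cost of a little memory; with nShards equal to 1 you are back to a single global lock.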
Related
Stages of a MongoDB aggregation pipeline are always executed sequentially. Can the documents that the pipeline processes be changed between the stages? E.g. if stage1 matches some docs from collection1 and stage2 matches some docs from collection2, can some documents from collection2 be written to during or just after stage1 (i.e. before stage2)? If so, can such behavior be prevented?
Why this is important: say stage2 is a $lookup stage. Lookup is the NoSQL equivalent of an SQL join. In a typical SQL database, a join query is isolated from writes, meaning that while the join is being resolved, the data affected by the join cannot change. I would like to know whether I have the same guarantee in MongoDB. Please note that I come from the NoSQL world (just not MongoDB) and I understand the paradigm well. No need to suggest e.g. duplicating the data; if there were such a solution, I would not be asking on SO.
Based on my research, a MongoDB read query acquires a shared (read) lock that prevents writes on the same collection until it is resolved. However, the MongoDB documentation does not say anything about aggregation pipeline locks. Does the aggregation pipeline hold read (shared) locks on all the collections it reads, or just on the collection used by the current pipeline stage?
More context: I need to run a "query" with multiple "joins" through several collections. The query is generated dynamically, I do not know upfront what collections will be "joined". Aggregation pipeline is the supposed way to do that. However, to get consistent "query" data, I need to ensure that no writes are interleaved between the stages of the pipeline.
E.g. a delete between $match and $lookup stage could remove one of the joined ("lookuped") documents making the entire result incorrect/inconsistent. Can this happen? How to prevent it?
#user20042973 already provided a link to https://www.mongodb.com/docs/manual/reference/read-concern-snapshot/#mongodb-readconcern-readconcern.-snapshot- in the very first comment, but considering the follow-up comments and questions from the OP regarding transactions, it seems a full answer is needed for clarity.
So first of all, transactions are all about writes, not reads. I can't stress this enough, so please read it again: transactions, or what MongoDB calls "multi-document transactions", are there to ensure that multiple updates are committed as a single atomic operation. No changes made within a transaction are visible outside of the transaction until it is committed, and all of the changes become visible at once when the transaction is committed. The docs: https://www.mongodb.com/docs/manual/core/transactions/#transactions-and-atomicity
The OP is concerned that concurrent writes to the database can affect the results of his aggregation, especially for $lookup stages that query other collections for each matching document from the main collection.
It's a very reasonable concern, as MongoDB has always been eventually consistent and does not guarantee that such lookups will return the same results if the linked collection changes during aggregation. Generally speaking, it doesn't even guarantee that a unique key is unique within a cursor that uses its index: if a document is deleted and a new one with the same unique key is inserted, there is a non-zero chance of retrieving both.
The instrument to work around this limitation is called "read concern", not "transaction". There are a number of read concerns available to balance speed against reliability/consistency: https://www.mongodb.com/docs/v6.0/reference/read-concern/ The OP is after the most expensive one, "snapshot", as https://www.mongodb.com/docs/v6.0/reference/read-concern-snapshot/ puts it:
A snapshot is a complete copy of the data in a mongod instance at a specific point in time.
mongod in this context means "the whole thing": all databases, the collections within those databases, and the documents within those collections.
All operations within a query with "snapshot" concern are executed against the same version of the data as it was when the node accepted the query.
Transactions use this snapshot read isolation under the hood, so they can be used to guarantee consistent results for $lookup queries even if there are no writes within the transaction. I'd recommend using the read concern explicitly instead: less overhead, and more importantly, it clearly shows the intent to the devs who will maintain your app.
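For illustration, this is roughly how an explicit snapshot read concern on an aggregation looks with the official Go driver (go.mongodb.org/mongo-driver, v1.x API); the database, collection, and field names here are made up:
package main

import (
	"context"
	"log"

	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
	"go.mongodb.org/mongo-driver/mongo/readconcern"
)

func main() {
	ctx := context.Background()
	client, err := mongo.Connect(ctx, options.Client().ApplyURI("mongodb://localhost:27017"))
	if err != nil {
		log.Fatal(err)
	}
	defer client.Disconnect(ctx)

	// Hypothetical "orders" collection; the important part is the read
	// concern, which applies to every stage of the pipeline, including $lookup.
	orders := client.Database("shop").Collection("orders",
		options.Collection().SetReadConcern(readconcern.Snapshot()))

	pipeline := mongo.Pipeline{
		{{Key: "$match", Value: bson.D{{Key: "status", Value: "active"}}}},
		{{Key: "$lookup", Value: bson.D{
			{Key: "from", Value: "users"},
			{Key: "localField", Value: "userId"},
			{Key: "foreignField", Value: "_id"},
			{Key: "as", Value: "user"},
		}}},
	}

	cur, err := orders.Aggregate(ctx, pipeline)
	if err != nil {
		log.Fatal(err)
	}
	defer cur.Close(ctx)

	var results []bson.M
	if err := cur.All(ctx, &results); err != nil {
		log.Fatal(err)
	}
	log.Printf("got %d documents", len(results))
}
Note that snapshot read concern outside a multi-document transaction requires MongoDB 5.0+ and a replica set or sharded cluster; on older versions you have to wrap the aggregation in a transaction instead.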
Now, regarding this part of the question:
Based on my research, a MongoDB read query acquires a shared (read) lock that prevents writes on the same collection until it is resolved.
It would be nice to have a source for this claim. As of today (v5.0+), aggregation is lock-free, i.e. it is not blocked even if another operation holds an exclusive X lock on the collection: https://www.mongodb.com/docs/manual/faq/concurrency/#what-are-lock-free-read-operations-
When it cannot use a lock-free read, it takes an intent shared (IS) lock on the collection. This lock blocks only collection-level write locks, like these ones: https://www.mongodb.com/docs/manual/faq/concurrency/#which-administrative-commands-lock-a-collection-
An IS lock on a collection still allows X locks on documents within the collection: inserting, updating, or deleting a document requires only an intent exclusive (IX) lock on the collection, plus an exclusive X lock on the single document affected by the write.
A final note: if such read isolation is critical to the business and you must guarantee strict consistency, I'd advise considering SQL databases; they might be more performant than snapshot queries. There are many more factors to consider, so I'll leave that to you. The point is that Mongo shines where eventual consistency is acceptable. It does pretty well with causal consistency within a server session, which gives enough of a guarantee for a much wider range of use cases. I encourage you to test how well it does with snapshot queries, especially if you are running multiple lookups, which can on their own be slow enough on larger datasets and might not even work without allowing disk use.
Q: Can MongoDB documents processed by an aggregation pipeline be affected by external write during pipeline execution?
A: It depends on how the transactions are isolated from each other.
Snapshot isolation refers to transactions seeing a consistent view of data: transactions can read data from a “snapshot” of data committed at the time the transaction starts. Any conflicting updates will cause the transaction to abort.
MongoDB transactions support a transaction-level read concern and transaction-level write concern. Clients can set an appropriate level of read & write concern, with the most rigorous being snapshot read concern combined with majority write concern.
To achieve it, set readConcern=snapshot and writeConcern=majority on the connection string/session/transaction (but not on the database/collection/operation, as read and write concern settings at those levels are ignored inside a transaction).
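For reference, a rough sketch of setting those transaction-level concerns with the official Go driver (go.mongodb.org/mongo-driver); writeconcern.Majority() assumes driver v1.10+, and the callback body is left out:
package example

import (
	"context"

	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
	"go.mongodb.org/mongo-driver/mongo/readconcern"
	"go.mongodb.org/mongo-driver/mongo/writeconcern"
)

// runInSnapshotTxn runs work inside a transaction whose read concern is
// "snapshot" and whose write concern is "majority".
func runInSnapshotTxn(ctx context.Context, client *mongo.Client,
	work func(sc mongo.SessionContext) (interface{}, error)) (interface{}, error) {

	sess, err := client.StartSession()
	if err != nil {
		return nil, err
	}
	defer sess.EndSession(ctx)

	// Transaction-level options; per-operation concern settings inside the
	// callback are ignored, only these apply.
	txnOpts := options.Transaction().
		SetReadConcern(readconcern.Snapshot()).
		SetWriteConcern(writeconcern.Majority()) // older drivers: writeconcern.New(writeconcern.WMajority())

	return sess.WithTransaction(ctx, work, txnOpts)
}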
Q: Do transactions apply to all aggregation pipeline stages as well?
A: Not all operations are allowed in a transaction.
For example, according to the MongoDB docs, db.collection.aggregate() is allowed in a transaction, but some stages (e.g. $merge) are excluded.
For the full list of operations supported inside a transaction, refer to the MongoDB docs.
Yes, MongoDB documents processed by an aggregation pipeline can be affected by external writes during pipeline execution. This is because the MongoDB aggregation pipeline operates on the data at the time it is processed, and it does not take into account any changes made to the data after the pipeline has started executing.
For example, if a document is being processed by the pipeline and an external write operation modifies or deletes that same document, the pipeline's results may not reflect those changes consistently. In some cases, this may result in incorrect or incomplete data being returned by the pipeline.
To avoid this situation, you can use MongoDB's snapshot read concern, which guarantees that the documents returned by the pipeline are a snapshot of the data as it existed at the start of the pipeline execution, regardless of any external writes that occur during execution. However, this option can affect the performance of the pipeline.
Alternatively, in MongoDB 4.0 and later you can use a transaction, which provides atomicity and consistency for write operations on the documents during pipeline execution.
I'm implementing a Neo4j client for the BG benchmark. There are 11 functions, simulating 11 different social networking actions. Each of these functions has its own transaction body. But when I'm running with 100 threads, it sometimes throws a deadlock detection exception.
I have users as nodes and friendships as relationships. I have invite friend, reject friend, accept friend and thaw friendship, which all take two users as their input. The way they work is that they get all relationships of one user node and find the relationship with the other user node.
Is anyone aware of the locking mechanism of Neo4j?
You can read about the deadlocks in the Neo4j documentation. These can appear when you have concurrent modifications of the same entities (nodes or relationships). Note that when modifying an entity, several locks may be taken: for instance, for a relationship, the locks on the two nodes connected by the relationship are also taken.
Default locking behaviour:
When adding, changing or removing a property on a node or relationship a write lock will be taken on the specific node or relationship.
When creating or deleting a node a write lock will be taken for the specific node.
When creating or deleting a relationship a write lock will be taken on the specific relationship and both its nodes.
The locks will be added to the transaction and released when the transaction finishes.
Design the database in a way that requires minimal locking.
Avoid situations where the same nodes and relationships are used by many users at the same instant, and keep the transaction period for those nodes and relationships as short as possible.
Maybe serializing the parallel write queries would help in your situation. You have 11 functions, simulating 11 different social networking actions, and from your description some of those actions (transactions) should be executed in sequential order (e.g. you can only accept a friend request after your friend has sent an invitation). You may serialize some write transactions; in other words, some queries will be blocked until the previous ones have finished.
With the help of causal chaining and bookmarks, you can serialize the operations across sessions. For example, if you have the three functions sendInvitationToFriend, rejectFriend, and acceptFriend, the reject/accept transactions will be blocked until the sendInvitationToFriend transaction has finished.
Some code snippets (neo4j-java-driver 4.1):
List<Bookmark> savedBookmarks = new ArrayList<>();

// Send the invitation.
try (Session session = driver.session(builder().withDefaultAccessMode(AccessMode.WRITE).build())) {
    session.writeTransaction(tx -> this.sendInvitationToFriend(tx, "friendId", "yourId"));
    savedBookmarks.add(session.lastBookmark());
}

// Accept the invitation (chained after the invitation via the bookmark).
try (Session session = driver.session(builder().withDefaultAccessMode(AccessMode.WRITE).withBookmarks(savedBookmarks).build())) {
    session.writeTransaction(tx -> this.acceptFriend(tx, "friendId", "yourId"));
}

// Or reject the invitation (also chained after the invitation via the bookmark).
try (Session session = driver.session(builder().withDefaultAccessMode(AccessMode.WRITE).withBookmarks(savedBookmarks).build())) {
    session.writeTransaction(tx -> this.rejectFriend(tx, "friendId", "yourId"));
}
You also mentioned that your simulation is somewhat random. My suggestion is to define a retry strategy in your program: re-attempt the query several times until it succeeds, and let the thread sleep for a while between any two retries (see the sketch below). You can find more detailed information in the link.
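The retry idea itself is language-agnostic; here is a minimal sketch (written in Go for brevity, since the exact retry helper depends on your driver version; the backoff figures are arbitrary):
package main

import (
	"fmt"
	"time"
)

// retryOnDeadlock runs fn up to maxAttempts times, sleeping with a growing
// backoff between attempts. fn should wrap exactly one write transaction.
func retryOnDeadlock(maxAttempts int, fn func() error) error {
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err = fn(); err == nil {
			return nil
		}
		// In a real client, check that err really is a transient/deadlock
		// error before retrying; other errors should fail immediately.
		time.Sleep(time.Duration(attempt) * 100 * time.Millisecond)
	}
	return fmt.Errorf("giving up after %d attempts: %w", maxAttempts, err)
}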
I hope this post is helpful to you.
How can we make sure only one JVM instance is modifying a given memcache key at any one time, and also that a read isn't happening in the middle of a write? I can use either the low-level store or the JSR wrapper.
List<Foo> foos = cache.get("foos");
foos.add(new Foo());
cache.put("foos", foos);
Is there any way for us to protect against concurrent writes, etc.?
Thanks
Fh. is almost on the right track, but not completely. He is right that memcache works atomically, but that does not save you from taking care of 'synchronization'. As we have many JVM instances running, we cannot speak of real synchronization in the sense we usually mean in multi-threaded environments. Here is how to 'synchronize' memcache access across one or many instances:
With the method MemcacheService.getIdentifiable(Object key) you get an Identifiable object instance. You can later put it back in memcache using MemcacheService.putIfUntouched(...).
Check the API at MemCache API: getIdentifiable().
An Identifiable is a wrapper containing the object you fetched via its key.
Here's how it works:
Instance A fetches Identifiable X.
Instance B fetches Identifiable X at the 'same' time.
Instances A and B each update X's wrapped object (your object, actually).
The first instance to call putIfUntouched(...) to store the object back will succeed; putIfUntouched returns true.
The second instance trying this will fail to do so, with putIfUntouched returning false.
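The same optimistic compare-and-swap loop exists in the Go App Engine runtime (google.golang.org/appengine/memcache), if that is easier to read; the key name and the way the value is mutated here are made up:
package example

import (
	"context"

	"google.golang.org/appengine/memcache"
)

// appendFoo reads the "foos" item, mutates it, and writes it back only if
// nobody else has touched it in the meantime; otherwise it retries.
func appendFoo(ctx context.Context, encoded []byte) error {
	for {
		item, err := memcache.Get(ctx, "foos") // analogous to getIdentifiable
		if err != nil {
			return err
		}
		item.Value = append(item.Value, encoded...) // update the wrapped value
		err = memcache.CompareAndSwap(ctx, item)    // analogous to putIfUntouched
		if err == memcache.ErrCASConflict {
			continue // another instance won the race; fetch the fresh value and retry
		}
		return err
	}
}
The loop pattern is the same as the Java steps above: whoever calls CompareAndSwap / putIfUntouched first wins, and the loser retries with a fresh read.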
Are you worried about corrupting memcache by reading a key while a parallel operation is writing to the same key? Don't worry about this; on the memcache server both operations run atomically.
Memcache will never output a 'corrupt' result, nor have corruption issues while writing.
Let's say we have a lot of instances of the following class (millions):
class WordInfo
{
string Value;
string SomeOtherFeatures;
List<Point> Points;
}
And following code
private Dictionary<string, WordInfo> _dict;
public void ProcessData(IEnumerable<Tuple<string,int,int>> words)
{
foreach(var word in words)
{
if(_dict.ContainsKey(word.Item1))
{
_dict[word.Item1].Points.Add(new Point(word.Item2,word.Item3));
}
else
{
_dict.Add(word.Item1, new WordInfo(....));
}
}
}
Main()
{
while(true)
{
IEnumerable<Tuple<string,int,int>> data = GetDataSomewhere();
ProcessData(data);
}
}
As you can see, this code must work 24/7. The main problem is that I don't know how to represent _dict (the place where I store the information) in a database. I need to process 1000-5000 words per second. A relational DB is not good for my task, right? What about NoSQL? I need fast UPDATE and INSERT operations, and I also need a fast check for whether a word exists (SELECT) in the DB. Because I have millions of records, that is also not trivial. What can you suggest? Maybe I should write a custom solution based on files?
A relational database should be able to insert/update 1000-5000 words per second easily, assuming you don't create too many transactions.
Transactions are ACID and "D" means durable: when the client receives a notification that the transaction is committed, it is guaranteed that the effects of the transaction are already in the permanent storage (so even if a power cut happens at that exact moment, the transaction won't be "erased"). In practice, this means the DBMS must wait for the disk to finish the physical write.
If you wrap each and every insert/update in its own transaction, you'll also have to perform this wait for each and every one of them. OTOH, if you wrap many inserts/updates in a single transaction, you'll have to pay this price only once per whole "chunk".
Also, checking for the existence of a specific row within millions of others is a task databases are very good at, thanks to the power of B-Tree indexes.
As for the database structure, you'd need something similar to this: a WORD table with WORD_VALUE as its primary key plus the SOME_OTHER_FEATURES column, and a POINT table keyed by (WORD_VALUE, X, Y), with WORD_VALUE a foreign key to WORD.
And you'd process it like this (pseudocode):
BEGIN TRANSACTION;
foreach(var word in words)
{
try {
INSERT INTO WORD (WORD_VALUE, SOME_OTHER_FEATURES) VALUES (word.Item1, ...);
}
catch (PK violation) {
// Ignore it.
}
try {
INSERT INTO POINT (WORD_VALUE, X, Y) VALUES (word.Item1, word.Item2, word.Item3);
}
catch (PK violation) {
// Ignore it.
}
}
COMMIT;
(NOTE: I'm assuming you never update the SOME_OTHER_FEATURES after it has been initially inserted. If you do, the logic above would be more complicated.)
If your DBMS supports it, consider making both of these tables clustered (aka. index-organized). Also, if your DBMS supports it, compress the leading edge of the POINT's primary index (WORD_VALUE), since all points related to the same word contain same value there.
BTW, the model above uses so-called identifying relationships and natural keys. An alternate model that uses surrogate keys and non-identifying relationships is possible, but would complicate the kind of processing you need.
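For concreteness, the one-transaction-per-chunk idea looks roughly like this with Go's database/sql (a sketch assuming PostgreSQL, whose ON CONFLICT DO NOTHING clause replaces the "catch PK violation and ignore" pseudocode; the Word struct is made up and SOME_OTHER_FEATURES is omitted for brevity):
package main

import "database/sql"

// Word mirrors one Tuple<string,int,int> from the original code.
type Word struct {
	Value string
	X, Y  int
}

// processBatch inserts a whole chunk of words inside a single transaction,
// so the durable commit (the disk sync) is paid once per chunk, not per row.
func processBatch(db *sql.DB, words []Word) error {
	tx, err := db.Begin()
	if err != nil {
		return err
	}
	defer tx.Rollback() // no-op once Commit has succeeded

	for _, w := range words {
		// Ignore duplicate words instead of failing the whole chunk.
		if _, err := tx.Exec(
			`INSERT INTO word (word_value) VALUES ($1)
			 ON CONFLICT (word_value) DO NOTHING`, w.Value); err != nil {
			return err
		}
		if _, err := tx.Exec(
			`INSERT INTO point (word_value, x, y) VALUES ($1, $2, $3)
			 ON CONFLICT DO NOTHING`, w.Value, w.X, w.Y); err != nil {
			return err
		}
	}
	return tx.Commit() // one durable write for the entire chunk
}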
Background
Given:
a set of threads
each thread has its own data source
objects in each data source reference objects in other data sources
there is a possibility for duplicate objects across various data sources
the threads are writing to a database with an engine that enforces the foreign key constraint
each type of object gets its own table and references the other objects through a foreign key
each thread has its own connection to the database
Proposed Solution
A register class which tracks the IDs of the objects that have been written. The interface of the register class has the following public methods (represented in Java):
public interface Register
{
synchronized boolean requestObjectLock(int id);
synchronized boolean saveFinalized(int id);
synchronized boolean checkSaved(int id);
}
The method requestObjectLock checks whether the object has already been locked by another thread, and returns false if it has. Otherwise, it locks that ID and returns true. It is then the responsibility of the calling thread to call saveFinalized when the object has been successfully written to the database, and the responsibility of all other threads to check whether it has already been written with checkSaved before writing an object that references it. In other words, there are three states an object can be in: unregistered, locked (registered but unwritten), and saved (registered and written).
Reasoning
As far as I know, there is no way to guarantee that one SQL query will finish before another when they are issued by different threads. Thus, if an object could only be registered or unregistered, it seems possible that a thread could check whether an object had been written, start writing an object that references it, and have its query complete (and fail) before the query that actually wrote the referenced object did.
Questions
Is it possible to guarantee the sequence of execution of queries being executed by different threads? And therefore, is this solution overengineered? Is there a simpler solution? On the other hand, is it safe?
The terms you need to research on the database side are "transaction isolation level" and "concurrency control". DBMS platform support varies. Some platforms implement a subset of the isolation levels defined in the SQL standards. (The SQL standards allow this. They're written in terms of what behavior isn't allowed.) And different platforms approach concurrency control in different ways.
Wikipedia, although not authoritative, has a good introduction to isolation levels, and also a good introduction to concurrency control.
As far as I know there is no way to guarantee that one SQL query will finish before another when called by different threads.
That's kind of true. It's also kind of not true. In SQL standards, transaction isolation levels aren't concerned with who finishes first. They're concerned with behavior that's not allowed.
dirty read: Transaction A can read data written by concurrent, uncommitted transaction B.
nonrepeatable read: Transaction A reads data twice. A concurrent transaction, B, commits between the two reads. The data transaction A read first is different from the data it read second, because of transaction B. (Some people describe transaction A as seeing "same rows, different column values".)
phantom read: Transaction A reads data twice. A concurrent transaction, B, commits between the two reads. Transaction A's two reads return two different sets of rows, because transaction B has affected the evaluation of transaction A's WHERE clause. (Some people describe transaction A as seeing "same column values, different rows".)
You control transaction behavior in SQL using SET TRANSACTION. So SET TRANSACTION ISOLATION LEVEL SERIALIZABLE means dirty reads, nonrepeatable reads, and phantom reads are impossible. SET TRANSACTION ISOLATION LEVEL REPEATABLE READ allows phantom reads, but dirty reads and nonrepeatable reads are impossible.
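From application code, the same thing is usually expressed through the driver's transaction options rather than raw SQL. For example, with Go's database/sql (a sketch; the driver issues the appropriate SET TRANSACTION statement for you):
package example

import (
	"context"
	"database/sql"
)

// doSerializable runs work inside a SERIALIZABLE transaction, so dirty,
// nonrepeatable, and phantom reads are all ruled out.
func doSerializable(ctx context.Context, db *sql.DB, work func(*sql.Tx) error) error {
	tx, err := db.BeginTx(ctx, &sql.TxOptions{Isolation: sql.LevelSerializable})
	if err != nil {
		return err
	}
	defer tx.Rollback() // no-op once Commit has succeeded

	if err := work(tx); err != nil {
		return err
	}
	return tx.Commit()
}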
You'll have to check your platform's documentation to find out what it supports. PostgreSQL, for example, supports all four isolation levels syntactically. But internally it only has two levels: read committed and serializable. That means you can SET TRANSACTION READ UNCOMMITTED, but you'll get "read committed" behavior.
Important for you: The effect of a serializable isolation level is to guarantee that transactions appear to have been issued one at a time by a single client. But that's not quite the same thing as saying that if transaction A starts before transaction B, it will commit before transaction B. If they don't affect each other, the dbms is allowed to commit transaction B first without violating the serializable isolation level semantics.
When I have questions myself about how these work, I test them by opening two command-line clients connected to the same database.