We are accessing a shared database from multiple pods in Kubernetes. All the pods may write to, update, and read from the db. How do we handle data integrity in this case? Are JPA isolation levels enough to handle this, or do we need to go with pessimistic locking? Can you also help me with the difference between the two?
Your question has nothing to do with Kubernetes. This is simply the concurrent database access you get whenever more than one connection is accessing your database.
If you want to avoid problems like lost updates, then you need locking.
There are two types of locking.
Optimistic
Pessimistic
Optimistic locking is handled by Hibernate via a version field; pessimistic locking is the locking mechanism of the database itself.
You should also read Hibernate's documentation about locking:
https://docs.jboss.org/hibernate/orm/5.5/userguide/html_single/Hibernate_User_Guide.html#locking
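As a minimal sketch of the two approaches in JPA/Hibernate terms (the Account entity and the surrounding variables are made-up placeholders, not a prescription):

import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.LockModeType;
import javax.persistence.Version;

@Entity
public class Account {
    @Id
    private Long id;

    private long balance;

    // Optimistic locking: Hibernate increments this on every update and
    // throws an OptimisticLockException if another transaction changed
    // the row in the meantime.
    @Version
    private int version;
}

// Pessimistic locking: asks the database for a row lock (SELECT ... FOR
// UPDATE on most databases) that is held until the transaction ends.
// entityManager and accountId are placeholders here.
Account account = entityManager.find(Account.class, accountId,
        LockModeType.PESSIMISTIC_WRITE);

With the optimistic variant nothing is blocked; a conflicting writer simply fails at flush/commit time and has to retry. With the pessimistic variant, conflicting writers wait on the row lock.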
I'm referring to the scenario of applying 2PC to heterogeneous distributed transactions. Say I want to write to both database A and database B together in an atomic fashion. Reads and writes to databases A and B could each themselves come from multiple concurrent users. Normally, in a high-throughput environment, we'd prefer A and B to use optimistic locking (row versioning, for example) instead of pessimistic locking, so that high-volume concurrent read-only operations are not blocked.
But if A and B are also involved as a whole in a 2PC protocol, does that mean they HAVE to take locks during each of their PREPARE phases for the relevant data changes, and essentially hold those locks until COMMIT is done? Because the transaction on each participant is forcibly split into two parts, you can't just "prepare to write a value that I read now as the most up-to-date value": by the time you actually commit it later, it could very well have changed if the resource isn't locked.
Does that mean, for example, that in distributed transaction environments involving multiple databases, each database's own concurrent throughput will be limited simply because it needs to be coordinated with the other databases, even though on its own it could have used optimistic locking?
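To make the concern concrete, here is a hedged sketch of one participant's PREPARE phase over plain JDBC (the accounts table, its version column, and all names are made up). The version check itself stays optimistic, but the UPDATE takes a row lock as a side effect, and that lock is held until the coordinator's COMMIT/ROLLBACK arrives, which is exactly the throughput limit suspected above:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// One participant's PREPARE phase; returns the vote sent to the coordinator.
boolean prepare(Connection conn, long accountId, int expectedVersion,
                long newBalance) throws SQLException {
    conn.setAutoCommit(false);
    try (PreparedStatement ps = conn.prepareStatement(
            "UPDATE accounts SET balance = ?, version = version + 1 " +
            "WHERE id = ? AND version = ?")) {
        ps.setLong(1, newBalance);
        ps.setLong(2, accountId);
        ps.setInt(3, expectedVersion);
        if (ps.executeUpdate() == 0) {
            conn.rollback();     // version moved on: vote NO
            return false;
        }
    }
    // Vote YES. The row lock taken by the UPDATE is now held until
    // conn.commit() or conn.rollback(), i.e. until the coordinator decides,
    // so other writers to this row block in the meantime.
    return true;
}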
I am making an application which reads and writes from a database and is accessed by multiple users. To avoid concurrency issues I am using a mutex. The database I am using is PostgreSQL. Its documentation says it is ACID compliant and provides various isolation levels such as Read Committed. So I could avoid using the mutex, put all my statements in a transaction block, and let the database take care of it. But I am not fully confident in this transaction-based approach, as I have trust issues with the database's automatic mechanisms.
My current approach:
mutex.lock();
// perform database operations
mutex.unlock();
Alternative approach:
begin transaction
// perform database operations
end transaction
Is it wise to handle this with a mutex, or should I rely on the database mechanism?
Each user is accessing the database in a separate thread. And database operations are simple. One read and one write. That is all.
If the database is being accessed by multiple users simultaneously, an application-level mutex does absolutely nothing to prevent them from stepping on each other on the database side [1]. You must use the locking constructs provided at the database level (transactions) to achieve what you are after.
A better use case for an application-level mutex is resource locking between threads running within the application (which may also be achievable with database transactions, but use the right tool for the job).
[1]: I have to be careful here: if an application handles multiple users in a single instance, or otherwise shares database objects outside of the database, then a mutex might be a good way to do locking. Even then, it won't protect anything on the database side (it's not functionality built into the DBMS), and it's still probably better to let the database take care of its own locks.
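For comparison, a minimal sketch of the transactional alternative over JDBC, where readValue and writeValue stand in for the question's one read and one write:

import java.sql.Connection;
import java.sql.SQLException;
import javax.sql.DataSource;

// No application mutex: the database serializes conflicting access itself.
void readAndWrite(DataSource ds) throws SQLException {
    try (Connection conn = ds.getConnection()) {
        conn.setAutoCommit(false);          // begin transaction
        try {
            long value = readValue(conn);   // the one read
            writeValue(conn, value + 1);    // the one write
            conn.commit();                  // end transaction
        } catch (SQLException e) {
            conn.rollback();                // leave no partial state behind
            throw e;
        }
    }
}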
I am using Hibernate with PostgreSQL in my application, and I am using an application-managed entity manager to perform database operations.
I have a very basic question: is it required/recommended to start a transaction for reading records from the database?
If yes, won't it add extra work to manage the transaction? We need to make sure a rollback is done after reading, otherwise it will cause a connection leak; we need to make sure the transaction is not kept open for long, as that can cause unnecessary resource usage; and so on.
Hibernate's documentation says "Always use clear transaction boundaries, even for read-only operations." What would be the advantage of starting a transaction if I just want to read some records from the database?
Is it only for batching multiple read operations using single transaction?
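For what it's worth, the boundary management you're worried about comes down to a few lines with an application-managed, resource-local EntityManager. A hedged sketch (the Order entity is made up):

import java.util.List;
import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;

// Explicit transaction boundaries around a pure read.
List<Order> readOrders(EntityManagerFactory emf) {
    EntityManager em = emf.createEntityManager();
    try {
        em.getTransaction().begin();
        List<Order> orders = em
                .createQuery("select o from Order o", Order.class)
                .getResultList();
        em.getTransaction().commit();
        return orders;
    } finally {
        if (em.getTransaction().isActive()) {
            em.getTransaction().rollback();  // prevents the connection leak you mention
        }
        em.close();
    }
}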
I know SQL Server is very robust in this sense (transactions and locking), but how does that work with NoSQL databases like AWS DocumentDB with the Mongo API?
There's no shortcut to diving in and learning each individual system's concurrency model and offerings :/
These guarantees can be found by searching for "Isolation Levels" or "Default Isolation Levels" for your target database.
https://docs.mongodb.com/manual/core/read-isolation-consistency-recency/
https://www.postgresql.org/docs/7.2/xact-read-committed.html
https://dev.mysql.com/doc/refman/8.0/en/innodb-transaction-isolation-levels.html
One thing to note is that PostgreSQL's default isolation level is "Read Committed" (MySQL's InnoDB actually defaults to "Repeatable Read"), and Read Committed can lead to incorrect applications in concurrent environments for common types of queries.
For example, suppose you have a multi-threaded web application that lets users update their account balance. If two threads both fetch the account balance and then write back a new value, you get a logical race where the last thread overwrites the result of the first (a lost update). This is described in detail in each of the documents above.
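As a hedged sketch of that race and one common fix over JDBC (the accounts table and method names are assumptions):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Lost update under Read Committed: two threads can both read the same
// balance, and the second UPDATE silently overwrites the first.
void racyDeposit(Connection conn, long id, long amount) throws SQLException {
    long balance;
    try (PreparedStatement ps = conn.prepareStatement(
            "SELECT balance FROM accounts WHERE id = ?")) {
        ps.setLong(1, id);
        try (ResultSet rs = ps.executeQuery()) {
            rs.next();
            balance = rs.getLong(1);
        }
    }
    try (PreparedStatement ps = conn.prepareStatement(
            "UPDATE accounts SET balance = ? WHERE id = ?")) {
        ps.setLong(1, balance + amount);  // stale if another thread ran in between
        ps.setLong(2, id);
        ps.executeUpdate();
    }
}

// Safe even at Read Committed: do the arithmetic inside the UPDATE, so the
// database applies increments one at a time under its own row lock.
void safeDeposit(Connection conn, long id, long amount) throws SQLException {
    try (PreparedStatement ps = conn.prepareStatement(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?")) {
        ps.setLong(1, amount);
        ps.setLong(2, id);
        ps.executeUpdate();
    }
}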
In Amazon DocumentDB, all CRUD statements (findAndModify, update, insert, delete) guarantee atomicity and consistency, even for operations that modify multiple documents. For more information, see Implicit Transactions.
Additionally, reads from an Amazon DocumentDB cluster’s primary instance are strongly consistent under normal operating conditions and have read-after-write consistency. For more information, see Read Preference Options.
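In practice that means a read-modify-write can often be collapsed into one atomic statement through the Mongo API. A hedged sketch with the MongoDB Java driver, where the accounts collection, field names, and variables are assumptions:

import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.Updates;
import org.bson.Document;

// One atomic findOneAndUpdate instead of a separate read and write;
// accounts is a MongoCollection<Document>.
Document updated = accounts.findOneAndUpdate(
        Filters.eq("_id", accountId),
        Updates.inc("balance", amount));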
Hello,
I want to get data into a database on a multicore system with active WAL, using JDBC. I was thinking about spawning multiple threads in my application to insert data in parallel.
If the application has multiple threads, I will have to increase the isolation level to Repeatable Read, which on MVCC databases should map to snapshot isolation.
If I were using one thread I wouldn't need isolation levels. As far as I know, most snapshot isolation databases analyze the write sets of all transactions that could conflict and then roll back all but one of the transactions in a real conflict. Specifically, I'm talking about Oracle, InnoDB, and PostgreSQL.
1.) Is this analysis of the write sets expensive?
2.) Is it a good idea to multithread the inserts for higher total throughput? Real conflicts are nearly impossible, because the application layer feeds the threads conflict-free work, but the database shall be a safety net.
Oracle does not support Repeatable Read; it supports only Read Committed and Serializable. I might be mistaken, but setting an isolation level of Repeatable Read on Oracle might result in a transaction with an isolation level of Serializable. In short, you are at the mercy of the database's support for the isolation levels you desire.
I cannot speak for InnoDB and PostgreSQL, but the same would apply if they do not support the required isolation levels: the database could automatically upgrade the isolation level to a higher one to meet the desired isolation characteristics. You ought to rethink this approach if your application's desired isolation level has to be exactly Repeatable Read.
The problem, as you've rightly inferred, is that optimistic locking can result in transaction rollbacks when a conflict is detected. Oracle signals this by reporting the ORA-08177 SQL error. Since this error is reported when two transactions access the same data range, it can be avoided if the threads work against data sets involving different data ranges. You will have to ensure that this is the case when dividing work across threads.
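If you do go multi-threaded, the usual pattern is to treat those rollbacks as retryable. A hedged sketch (doInserts is a placeholder for your transactional work; the assumption is that Oracle surfaces ORA-08177 as vendor error code 8177, while PostgreSQL uses SQLState 40001 for serialization failures):

import java.sql.Connection;
import java.sql.SQLException;

// The database stays the safety net; the application just replays the
// transaction when the conflict check fires.
void runWithRetries(Connection conn, int maxAttempts) throws SQLException {
    conn.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE);
    conn.setAutoCommit(false);
    for (int attempt = 1; ; attempt++) {
        try {
            doInserts(conn);                 // placeholder for the real work
            conn.commit();
            return;
        } catch (SQLException e) {
            conn.rollback();
            boolean serializationFailure =
                    e.getErrorCode() == 8177             // Oracle ORA-08177 (assumed mapping)
                    || "40001".equals(e.getSQLState());  // PostgreSQL / SQL standard
            if (!serializationFailure || attempt == maxAttempts) {
                throw e;
            }
            // else: conflict detected, loop and replay the whole transaction
        }
    }
}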
I think the limiting factor here will be disk IO, not the overhead of moving to Repeatable Read.
Even a single thread may be able to max out the disks on the DB server especially with the amount of DB logging required on insert / update. Are you sure that's not already the case?
Also, in any multi-user system, you probably want to be running with Repeatable Read isolation anyway (on PostgreSQL, Repeatable Read is implemented as snapshot isolation, with Serializable above it). So I don't think of this as adding any "overhead" above what I would normally see.