During the webinar I heard that there are three ways to handle database access for a number of users:
Using a process per user
A thread per user
And a pool of processes
The lecturer said that process-per-user avoids the need to take care of parallelization/locking/etc., but is too heavy and complex. Thread-per-user is lightweight, but requires a lot of locking overhead. A pool of processes has a shared data structure for accessing the DB.
What is not clear to me is: don't users always access the same data structure, so that regardless of whether we have a process, a thread or a pool, we still need to implement locking? Why would processes not require locking? What is the difference between process-per-user and a pool of processes? As the lecturer said - shared data structures. But what does it mean if processes don't share the DB? Is the DB replicated for each user, assuming we are in the process-per-user situation?
I really want to get this clarified, but I could not ask during the webinar.
Thank you!
Locking is required only if you have a shared resource. When connecting to a DB, you first create a connection object and then connect and send your queries through that object. In MySQL, InnoDB performs row-level locking rather than locking the entire table, so if multiple processes are accessing different rows, they will not block each other.
Coming to threads: what people mostly do is create a connection pool that multiple threads share. Say you have 50 threads and a connection pool of 5 objects. All 50 threads cannot use these 5 connection objects at once; a thread has to wait for a connection object to become free, and once it is, it can use that object to fire its query. Threads share the same memory space, so every resource they share must be thread-safe.
Since creating a process is quite heavy, you would want to keep the count low - at most 10-20 processes on a 4 GB machine. Threads are much cheaper to create, so you can have them in larger numbers (~50). So if there is nothing to be shared, threads will give you more parallelism.
Again, everything boils down to how good your design is, and that is very problem-specific.
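The connection-pool idea described above can be sketched in Python with a queue.Queue guarding a fixed set of connection objects; Connection here is a hypothetical stand-in for a real driver connection:

```python
import queue
import threading

# Hypothetical stand-in for a real DB driver connection object.
class Connection:
    def __init__(self, conn_id):
        self.conn_id = conn_id

    def execute(self, sql):
        return f"conn {self.conn_id} ran: {sql}"

# A pool of 5 connections shared by 50 worker threads.
pool = queue.Queue()
for i in range(5):
    pool.put(Connection(i))

results = []
results_lock = threading.Lock()

def worker(n):
    conn = pool.get()          # blocks until a connection is free
    try:
        r = conn.execute(f"SELECT {n}")
    finally:
        pool.put(conn)         # always return the connection to the pool
    with results_lock:
        results.append(r)

threads = [threading.Thread(target=worker, args=(n,)) for n in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(results))  # all 50 threads completed using only 5 connections
```

The pool itself is the only shared, lock-protected structure; queue.Queue handles that locking internally, which is why the workers need no explicit synchronization around get/put.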
Related
I have a case where multiple Linux processes need to link against the RocksDB library and concurrently read (under high load) the same database.
Only one process updates the database, several times a day.
Is it possible to read concurrently from RocksDB from within multiple processes?
Unfortunately I can't find this information on the Internet.
It seems that RocksDB supports multiple read-only or secondary instances (two variations of read-only mode):
Read-only Instance - Opens the database in read-only mode. When the read-only instance is created, it gets a static read-only view of the Primary Instance's database contents.
Secondary Instance - Opens the database in read-only mode. Supports the extra ability to dynamically catch up with the Primary Instance (through a manual call by the user, based on their delay/frequency requirements).
But only one read-write instance is allowed:
The Primary Instance is a regular RocksDB instance capable of read, write, flush and compaction. The Read-only and Secondary Instances support read operations alone. Only a single Primary instance is allowed, but many concurrent Read-only and Secondary Instances are allowed.
https://github.com/facebook/rocksdb/wiki/Basic-Operations#concurrency indicates that:
A database may only be opened by one process at a time. The rocksdb implementation acquires a lock from the operating system to prevent misuse. Within a single process, the same rocksdb::DB object may be safely shared by multiple concurrent threads. I.e., different threads may write into or fetch iterators or call Get on the same database without any external synchronization (the rocksdb implementation will automatically do the required synchronization).
I'm currently supporting a multi-threaded app (Delphi 10.4) that spawns about 20 threads to take data from remote punch-clocks and enter it into a DB table (the same one for all threads), but it does so by having each thread create its own (MSADO) connection on construction. While this does make the application thread-safe, since the threads don't share a resource, would it be a better idea efficiency-wise to make one connection that the threads share, ensuring thread-safety by using TMonitor, critical sections or something similar?
It depends on how many writes you make to the database. If you use a single connection but still issue one INSERT statement per thread, you are not solving anything. Worse, your application will slow down due to the synchronization between threads waiting for their turn on the database connection.
To do this properly, you will need to apply something like a producer-consumer pattern, with a queue between the producers (the threads fetching data from the punch-clocks) and the consumer (the DB writer thread).
Once a reader thread fetches the data, it will:
lock the queue access
add the data to the queue
unlock the queue
Once the writer thread starts running, it will:
lock the queue access
gather all the data from the queue
remove the messages from the queue
unlock the queue
prepare a single INSERT statement for all rows taken from the queue
execute the transaction on the DB
sleep for a short period of time (allowing the other threads to work)
This is not a data-safe approach: if there is a failure between removing the data from the queue and committing it to the database, that data will be lost.
As you can see, this is a more complex approach, so if you don't experience DB congestion, it's not worth switching to a single connection.
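A minimal sketch of the steps above, with sqlite3 (stdlib) standing in for the real database and queue.Queue providing the locking around the queue:

```python
import queue
import sqlite3
import threading
import time

# sqlite3 (stdlib) stands in for the real database in this sketch.
db = sqlite3.connect(":memory:", check_same_thread=False)
db.execute("CREATE TABLE punches (clock_id INTEGER, stamp TEXT)")

q = queue.Queue()          # queue.Queue handles its own locking
done = threading.Event()

def reader(clock_id):
    # Producer: pretend to fetch one punch from a remote clock.
    q.put((clock_id, f"2024-01-01T08:{clock_id:02d}"))

def writer():
    # Consumer: drain the queue and write all rows in one transaction.
    while not (done.is_set() and q.empty()):
        rows = []
        while True:
            try:
                rows.append(q.get_nowait())
            except queue.Empty:
                break
        if rows:
            with db:       # one transaction, one multi-row INSERT
                db.executemany("INSERT INTO punches VALUES (?, ?)", rows)
        else:
            time.sleep(0.001)   # let the producers work

readers = [threading.Thread(target=reader, args=(i,)) for i in range(20)]
w = threading.Thread(target=writer)
w.start()
for t in readers:
    t.start()
for t in readers:
    t.join()
done.set()
w.join()

print(db.execute("SELECT COUNT(*) FROM punches").fetchone()[0])  # 20
```

Note the data-loss caveat from the answer applies here too: rows removed from the queue but not yet committed vanish if the writer dies mid-batch.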
Note: there is also the approach of using a pool of DB connections, which is the most common pattern in such cases. With a DB pool, a small number of connections is shared among a large number of threads reading and writing the database.
We have a typical scenario in which we have to load-balance a set of cloned consumer applications, each running on a different physical server. We should be able to dynamically add more servers for scalability.
We were thinking of using round-robin load balancing here, but we don't want a long-running job on one server to cause a message to wait in that server's queue for consumption.
To solve this, we thought of configuring 2 concurrentConsumers for each server application. When an older message is being processed by one thread and a new message arrives, the latter will be consumed from the queue by the second thread. While processing the new message, the second thread checks a class (global) variable shared by the threads. If it is 'ON', it can assume that the other thread is active (i.e. a job is already in progress); in that case, it re-routes the message back to its source queue. If the class variable is 'OFF', it can start the job with the message data.
The jobs are themselves heavyweight and so we want only one job to be processed at a time. That's why the second thread re-routes the message, if another thread is active.
So, the question is 'Any simple way that concurrent consumers can share data in Camel?'. Or, can we solve this problem in an entirely different way?
For a JMS broker like ActiveMQ you should be able to simply use concurrent listeners on the same queue. It should do round-robin, but only among the consumers that are idle, so basically this should just work. You may also have to set the prefetch size to 1, as prefetching might cause a consumer to take messages even while a long-running job would block them.
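Assuming ActiveMQ, that setup is mostly configuration. A sketch of what it might look like (queue and bean names are placeholders; the prefetch policy is set on the broker connection URL):

```java
// Broker connection URL (placeholder host): a prefetch of 1 stops an
// idle consumer from grabbing messages ahead of a busy one.
//   tcp://broker:61616?jms.prefetchPolicy.queuePrefetch=1

// Camel route: two concurrent listeners on the shared queue.
from("activemq:queue:jobs?concurrentConsumers=2")
    .to("bean:jobProcessor");
```

With prefetch at 1 the broker only hands a message to a consumer that is actually free, which removes the need for the shared 'ON'/'OFF' flag described in the question.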
How should one ensure correctness when multiple processes access one single SQLite database file?
First, avoid concurrent access to sqlite database files. Concurrency is one of sqlite's weak points and if you have a highly concurrent application, consider using another database engine.
If you cannot avoid concurrency or drop sqlite, wrap your write transactions in BEGIN IMMEDIATE; ... END;. The default transaction mode in sqlite is DEFERRED which means that a lock is acquired only on first actual write attempt. With IMMEDIATE transactions, the lock is acquired immediately, or you get SQLITE_BUSY immediately. When someone holds a lock to the database, other locking attempts will result in SQLITE_BUSY.
Dealing with SQLITE_BUSY is something you have to decide for yourself. For many applications, waiting for a second or two and then retrying, giving up after n failed attempts, works quite well. There are sqlite3 API helpers that make this easy, e.g. sqlite3_busy_handler() and sqlite3_busy_timeout(), but it can be done manually as well.
You could also use OS level synchronization to acquire a mutex lock to the database, or use OS level inter-thread/inter-process messaging to signal when one thread is done accessing the database.
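For example, with Python's sqlite3 module, a BEGIN IMMEDIATE write with a manual retry loop might look like this (a rough sketch; real code would distinguish "database is locked" from other OperationalErrors):

```python
import os
import sqlite3
import tempfile
import time

def write_with_retry(path, sql, params, attempts=5, wait=0.2):
    """Run one write transaction, retrying on SQLITE_BUSY."""
    for attempt in range(attempts):
        # isolation_level=None: we manage transactions ourselves;
        # timeout=0: fail immediately so our retry loop stays in control.
        conn = sqlite3.connect(path, timeout=0, isolation_level=None)
        try:
            conn.execute("BEGIN IMMEDIATE")  # take the write lock up front
            conn.execute(sql, params)
            conn.execute("COMMIT")
            return True
        except sqlite3.OperationalError:
            time.sleep(wait)  # "database is locked" maps to SQLITE_BUSY
        finally:
            conn.close()
    return False

# Usage: in a real multi-process setup this on-disk file would be shared.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
init = sqlite3.connect(path, isolation_level=None)
init.execute("CREATE TABLE t (x INTEGER)")
init.close()
print(write_with_retry(path, "INSERT INTO t VALUES (?)", (1,)))  # True
```

The fail-fast-and-retry shape mirrors what sqlite3_busy_timeout() does internally, but keeps the back-off policy in your hands.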
Any SQLite primitive will return SQLITE_BUSY if it tries to access a database that another process is accessing at the same time. You could check for that error code and simply repeat the action.
Alternatively you could use OS-level synchronization - a mutex on MS Windows or something similar on other OSes. The process tries to acquire the mutex, and if someone else already holds it, the process is blocked until the other process finishes its operation and releases the mutex. Care should be taken to prevent the case where a process acquires the mutex and then never releases it.
The SQLite FAQ covers exactly this.
Basically, you need to wrap your data-access code in transactions. This will keep your data consistent; nothing else is required.
In SQLite you use
BEGIN TRANSACTION
COMMIT TRANSACTION
pairs to delimit your transactions. Put your SQL code in between in order to have it execute in a single transaction.
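For instance, through Python's sqlite3 module (with isolation_level=None so the module does not manage transactions itself), the pairing looks like this; the table and values are illustrative only:

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # manual transactions
conn.execute("CREATE TABLE accounts (name TEXT, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('a', 100), ('b', 0)")

# Both updates commit together or not at all.
conn.execute("BEGIN TRANSACTION")
conn.execute("UPDATE accounts SET balance = balance - 40 WHERE name = 'a'")
conn.execute("UPDATE accounts SET balance = balance + 40 WHERE name = 'b'")
conn.execute("COMMIT TRANSACTION")

print(conn.execute("SELECT SUM(balance) FROM accounts").fetchone()[0])  # 100
```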
However, as others have already commented, you need to pay close attention to concurrency issues. SQLite can work reasonably fast when used for read access (multiple readers are not blocked and can run concurrently).
The picture changes considerably, however, if your code interleaves write and read access: with SQLite, the entire database file is locked whenever even a single writer is active.
I have a VB6 application accessing a single table on an MSSQL2000 server via ADO, using read-only access (adOpenStatic, adLockReadOnly). There are other applications on the network which do make changes to the table.
For some reason I'm getting errors about my application being chosen as a deadlock victim.
I'm really confused: why can there be a deadlock when I'm just reading from a single table? I'd expect timeouts because of the other applications' writes, but not a deadlock...
Can someone shed some light on this?
UPDATE: 2009-06-15 I'm still interested in a solution to this problem. So I'm providing some more information:
It makes no difference if I choose adOpenForwardOnly or adOpenStatic
It makes no difference if the cursor position is client or server.
It is possible for a single SELECT statement to deadlock against a single UPDATE or DELETE statement due to the presence of a non-clustered index. Consider the following scenario:
The reader (your app) first obtains a shared lock on the non-clustered index in order to perform a lookup, and then attempts to obtain a shared lock on the page containing the data in order to return the data itself.
The writer (the other app) first obtains an exclusive lock on the database page containing the data, and then attempts to obtain an exclusive lock on the index in order to update the index.
You can find more information on this (and other) type of deadlock in the Microsoft KB article Q169960 (http://support.microsoft.com/kb/q169960/)
You might also want to look into obtaining deadlock trace information (trace flag 1222): this will report exactly which SQL statements are conflicting over which objects whenever a deadlock occurs. This is a fairly decent article on the topic: http://blogs.msdn.com/bartd/archive/2006/09/09/747119.aspx
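For reference, that trace flag can be turned on server-wide with a one-line command (run with sysadmin rights; the deadlock graphs then appear in the SQL Server error log):

```sql
-- -1 applies the flag globally rather than to the current session only.
DBCC TRACEON (1222, -1);
```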
I think there are a number of possibilities in the answers already provided here. Since you only take shared locks, the deadlock can't be due to lock escalation; it must simply be that your process and another are acquiring incompatible locks on the same resources, in a different order...
Your shared locks are incompatible with another process taking exclusive locks. The scenario might run something like this...
You take shared lock on resource A
Other process takes exclusive lock on resource B
Other process tries to take exclusive lock on resource A, and blocks waiting for you to release your shared lock on A.
You try to take shared lock on resource B, and would block waiting for the other process to release its exclusive lock on B, except that you're now in a deadlock situation, which is identified by the server and it chooses a process to kill.
N.B. deadlocks can have more players than just 2. Sometimes there's a whole chain of interwoven activity that results in a deadlock, but the principle is the same.
Often, when multiple applications access the same database, there is a DBA who manages all access via stored procedures, so they can ensure resources are always locked in the same order. If you're not in that situation, and the other applications use ad-hoc SQL statements, you'd have to inspect their code to find out whether they might conflict with your app in the way I've described. That doesn't sound like fun.
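The "always lock in the same order" discipline can be illustrated with plain locks; threading.Lock here stands in for database resource locks:

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()

def transaction(first, second, log, name):
    # Acquire locks in a fixed global order (by id() here), no matter
    # which order the caller names them in. Two transactions can then
    # never hold one lock each while waiting for the other.
    lo, hi = sorted([first, second], key=id)
    with lo:
        with hi:
            log.append(name)

log = []
t1 = threading.Thread(target=transaction, args=(lock_a, lock_b, log, "t1"))
t2 = threading.Thread(target=transaction, args=(lock_b, lock_a, log, "t2"))
t1.start(); t2.start()
t1.join(); t2.join()
print(sorted(log))  # ['t1', 't2'] - both finish, no deadlock possible
```

If each transaction instead acquired its locks in the order given, the A-then-B and B-then-A pair above could interleave into exactly the circular wait described in this answer.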
A pragmatic solution might be to catch the error when your transaction is killed as a deadlock victim, and simply re-try the transaction several times. Depending on how much activity the other apps are generating, you might achieve acceptable results this way.
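A minimal sketch of that retry approach; DeadlockVictim is a hypothetical placeholder for however your driver surfaces SQL Server error 1205:

```python
import time

class DeadlockVictim(Exception):
    """Placeholder for the driver error raised when the server
    reports error 1205 ('chosen as deadlock victim')."""

def run_with_retry(txn, attempts=3, wait=0.1):
    # Re-run the whole transaction; a deadlock victim is rolled back
    # by the server, so it is safe to retry from the beginning.
    for attempt in range(attempts):
        try:
            return txn()
        except DeadlockVictim:
            if attempt == attempts - 1:
                raise
            time.sleep(wait)  # brief back-off before retrying

# Simulated transaction that is killed twice, then succeeds.
calls = {"n": 0}
def flaky_txn():
    calls["n"] += 1
    if calls["n"] < 3:
        raise DeadlockVictim()
    return "ok"

print(run_with_retry(flaky_txn))  # ok
```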
several cases described here:
http://sqlblog.com/blogs/alexander_kuznetsov/archive/2009/01/01/reproducing-deadlocks-involving-only-one-table.aspx
http://sqlblog.com/blogs/alexander_kuznetsov/archive/2008/05/03/when-index-covering-prevents-deadlocks.aspx
Reads can still incur locks, so that the DB can ensure a write isn't done in the middle of a non-atomic read. In other words, the read lock ensures that you get an accurate, consistent snapshot of whatever data you are selecting.
Do you get the same behaviour with adOpenForwardOnly ?
You might want to check that your SQL Server statistics are up to date. Or you could get your DBA to rebuild all indexes. Many locking problems are due to out of date statistics/indexes.
It depends on both applications' behavior.
Your app can surely end up waiting on the other to release resources.
A deadlock refers to a condition where two or more processes are waiting for each other to release a resource, or where more than two processes are waiting for resources in a circular chain. You can certainly get a deadlock with read-only access, because a read still has to wait for a writer's exclusive lock.
There is a nice explanation of the deadlock conditions on Wikipedia.
Wouldn't it be something like this?
Other application: writes to the table (acquires a write lock on the table).
Your application: reads from the table (tries to acquire a read lock on the table, but can't due to the write lock).