I'm currently supporting a multi-threaded app (Delphi 10.4) that spawns about 20 threads to pull data from remote punch-clocks and insert it into a DB table (the same one for all threads), with each thread creating its own (MSADO) connection on construction. While this keeps the application thread-safe, since the threads don't share a resource, would it be a better idea efficiency-wise to create one connection that the threads share, and ensure thread-safety with TMonitor, critical sections or something similar?
It depends on how many writes you make to the database. If you use a single connection but still issue one INSERT statement per thread, you are not solving anything. In addition, your application will slow down because of the synchronization between the threads waiting for their turn on the database connection.
To do this properly, you will need to apply something like a producer-consumer pattern, with a queue between the producers (the threads fetching data from the punch-clocks) and the consumer (the DB writer thread). A short sketch follows the steps below.
Once the reader thread fetches the data, it will:
lock the queue access,
add the data to the queue,
unlock the queue.
Once the writer thread starts running, it will:
lock the queue access,
gather all the data from the queue,
remove the messages from the queue,
unlock the queue,
prepare a single INSERT statement for all rows taken from the queue,
execute the transaction on the DB,
sleep for a short period of time (allowing the other threads to work).
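A minimal sketch of that scheme, using Python purely as an illustration (the original application is Delphi/MSADO; fetch_from_clock() and the punch_events table are hypothetical placeholders):

import queue
import sqlite3          # stand-in for the real DB connection
import threading
import time

punch_queue = queue.Queue()   # thread-safe queue: the lock/unlock steps happen inside put()/get()

def reader_thread(clock_id):
    """Producer: fetch punches from one clock and enqueue them."""
    for row in fetch_from_clock(clock_id):       # hypothetical fetch helper
        punch_queue.put(row)                     # lock, add, unlock in one call

def writer_thread(db_path, stop_event):
    """Consumer: drain the queue and write all pending rows in one transaction."""
    conn = sqlite3.connect(db_path)              # the single DB connection, owned by this thread
    while not stop_event.is_set():
        batch = []
        while True:                              # gather everything currently queued
            try:
                batch.append(punch_queue.get_nowait())
            except queue.Empty:
                break
        if batch:
            with conn:                           # one transaction for the whole batch
                conn.executemany(
                    "INSERT INTO punch_events (clock_id, employee_id, stamp) VALUES (?, ?, ?)",
                    batch)
        time.sleep(0.2)                          # allow the reader threads to work

# usage sketch: one writer plus one reader thread per clock, e.g.
#   stop = threading.Event()
#   threading.Thread(target=writer_thread, args=("punches.db", stop)).start()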
Note that this approach is not data-safe: if a failure occurs after the data has been removed from the queue but before it has been committed to the database, that data will be lost.
As you can see, this is a more complex approach, so if you are not experiencing DB congestion it is not worth switching to a single connection.
Note: there is also the approach of using a pool of DB connections, which is the most common pattern in these cases. With a DB pool, a smaller number of connections is shared among a large number of threads reading from and writing to the database.
During the webinar I heard that there are three ways to access a database for a number of users:
Using process per user
Thread per user
And pool of processes
The lecturer said that process-per-user avoids the need to take care of parallelization/locking/etc., but is too heavy and complex. Thread-per-user is lightweight, but requires a lot of locking overhead. A pool of processes has a shared data structure when accessing the DB.
What is not clear to me is: don't users always access the same data structure, so regardless of whether we have a process, a thread or a pool, we still need to implement locking? Why would processes not require locking? What is the difference between process-per-user and a pool of processes? As the lecturer said, shared data structures. But what does that mean if processes don't share the DB? Is the DB replicated for each user, assuming we are in the process-per-user situation?
I really want this to get clarified, but I could not ask this during the webinar.
Thank you!
Locking is required only if you have some shared resource. When connecting to a DB you first create a connection object, and then through that object you connect and send your queries. In MySQL InnoDB the database performs row-level locking rather than locking the entire table, so if multiple processes are trying to access different rows no extra locking is required.
Coming to threads, what people mostly do is create a connection pool that multiple threads access. Say you have 50 threads and a connection pool of 5 objects. All 50 threads cannot use these 5 connection objects at once; a thread has to wait for a connection object to become free, and once it is free the thread can use it to fire its query. Threads share the same memory space, so essentially all shared resources must be thread-safe.
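A minimal sketch of that pool idea in Python (SQLite is used here only as a stand-in database; the pool size and table are just placeholders):

import queue
import sqlite3

POOL_SIZE = 5   # 5 connections shared by many threads

def make_pool(db_path, size=POOL_SIZE):
    pool = queue.Queue(maxsize=size)
    for _ in range(size):
        # check_same_thread=False allows the connection to be borrowed by any thread
        pool.put(sqlite3.connect(db_path, check_same_thread=False))
    return pool

def run_query(pool, sql, params=()):
    conn = pool.get()           # blocks here if all connections are in use
    try:
        with conn:              # commit/rollback handled by the context manager
            return conn.execute(sql, params).fetchall()
    finally:
        pool.put(conn)          # return the connection so a waiting thread can proceed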
Since creating a process is quite heavy, you would want to keep the number low, say a maximum of 10-20 processes on a 4 GB machine. Threads are less costly to create, so you can have them in larger numbers (~50). So if there is nothing to be shared, threads will give you more parallelism.
Again, everything boils down to how good your design is, and that is very problem-specific.
I am using SQLite 3.6.23.1 on Fedora Linux. I have two threads running that access the database, and there is a chance that both threads will try to perform a write operation on the same table at the same time.
The table gets locked while the first thread is performing its write operation. How do I handle this case?
Is there a C sqlite3 API mechanism to wait until the other thread has completed its write operation and then write to the table?
Thanks & Regards.
-praveen
There is a "shared cache" mode which can be set via the C API as described here.
Sqlite3 does a good job of maximizing concurrency and it is also thread-safe. See the File Locking and Concurrency document for more details.
The table does indeed get locked for the duration of the write operation, but sqlite3 is able to navigate this condition by waiting for the lock to be released and then the lock is granted to the second process/thread wanting to perform a write (or read). The time-out for waiting for a lock can be configured in your sqlite connection code. Here is the syntax for Python 2.7:
sqlite3.connect(database[, timeout, detect_types, isolation_level,
check_same_thread, factory, cached_statements])
The default timeout is 5.0 seconds. So it would take a fairly bulky SELECT or COMMIT transaction to hold a lock for that amount of time. Based on your use-case you could, however, tweak the timeout OR include code to catch timeout exceptions.
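For example, a short sketch of the Python variant (the database file and table names here are made up; in the C API the equivalent knob is sqlite3_busy_timeout()):

import sqlite3

# wait up to 30 seconds for a competing writer to release its lock
conn = sqlite3.connect("punch.db", timeout=30.0)
try:
    with conn:   # runs the statement inside a transaction and commits it
        conn.execute("UPDATE punches SET processed = 1 WHERE id = ?", (42,))
except sqlite3.OperationalError as exc:
    # raised with "database is locked" if the timeout expires
    print("write failed, try again later:", exc)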
A final option would be to incorporate some kind of flagging mechanism in your code that requires competing threads to wait until a flag clears before attempting to access the DB, but this duplicates the effort of the sqlite3 developers who have catered for concurrency scenarios as a major part of their job.
Here is an interesting article outlining a problem whereby older versions of sqlite may sleep for a whole second when unable to acquire a lock.
We have a typical scenario in which we have to load-balance a set of cloned consumer applications, each running in a different physical server. Here, we should be able to dynamically add more servers for scalability.
We were thinking of using round-robin load balancing here, but we don't want a long-running job in one server to cause a message to wait in its queue for consumption.
To solve this, we thought of configuring 2 concurrentConsumers for each of the server applications. When an older message is being processed by one thread and a new message arrives, the latter will be consumed from the queue by the second thread. While processing the new message, the second thread has to check a class (global) variable shared by the threads. If it is 'ON', it can assume that the other thread is active (i.e. a job is already in progress); in that case, it re-routes the message back to its source queue. But if the class variable is 'OFF', it can start the job with the message data.
The jobs are themselves heavyweight and so we want only one job to be processed at a time. That's why the second thread re-routes the message, if another thread is active.
So, the question is 'Any simple way that concurrent consumers can share data in Camel?'. Or, can we solve this problem in an entirely different way?
For a JMS broker like ActiveMQ you should be able to simply use concurrent listeners on the same queue. It will effectively do round-robin, but only among the consumers that are idle, so basically this should just work. You may also have to set the prefetch size to 1, as prefetching might cause a consumer to take messages even while a long-running job is blocking it.
We are writing a simple application:
build thousands of SQL select statements
run each select using BeginExecuteReader
put the results into another database
We've tried a few things that either leave connections in a SUSPENDED state (as verified by sp_who2), or take a much longer time to complete than just the SQL query itself (maybe some kind of deadlocking?).
We are:
calling EndExecuteReader in the callback handler.
calling conn.Close() and conn.Dispose()
recursively starting another call
public static void StartQuery() {
    // build the query for array[i] into a SqlCommand 'cmd' that uses conn
    // ...
    SqlConnection conn = new SqlConnection(AsyncConnectionString);
    conn.Open();
    cmd.BeginExecuteReader(CallbackHandler, cmd);   // pass the command as the async state
    i++;
}
public static void CallbackHandler(IAsyncResult ar) {
    // unpack the cmd from the async state
    SqlCommand cmd = (SqlCommand)ar.AsyncState;
    SqlDataReader reader = cmd.EndExecuteReader(ar);
    // read some stuff into a DataTable...
    // SqlBulkCopy to another database (synchronously)
    cmd.Connection.Close();
    cmd.Connection.Dispose();
    StartQuery();
}
Does anyone have recommendations or links on a robust pattern to solve this type of problem?
Thanks!
I assume you did set Asynchronous Processing=true on the connection string. Thousands of BeginExecute queries queued up in the CLR are a recipe for disaster:
You'll quickly be capped by the max worker threads in SQL Server and start experiencing long connection-open times and frequent timeouts.
Running 1000 loads in parallel is guaranteed to be much slower than running those 1000 loads sequentially over N connections, where N is given by the number of cores on the server. Thousands of parallel requests will simply create excessive contention on shared resources and slow each other down.
You have absolutely no reliability with thousands of requests queued up in the CLR. If the process crashes, you lose all the work without any trace.
A much better approach is to use a queue from which a pool of workers dequeue loads and execute them: a typical producer-consumer. The number of workers (consumers) should be tuned to the SQL Server resources (CPU cores, memory, IO pattern of the loads), but a safe number is 2 times the number of server cores. Each worker uses a dedicated connection for its work. The role of the workers and the queue is not to speed up the work but, on the contrary, to act as a throttling mechanism that prevents you from swamping the server.
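As an illustration only (the original context is .NET, but the shape of the pattern is the same in any language), here is a rough sketch in Python; the SQLite connection stands in for the real server connection, and the list of statements is a placeholder:

import os
import queue
import sqlite3
import threading

NUM_WORKERS = 2 * (os.cpu_count() or 4)    # ~2x the cores (use the server's core count in practice)
work_queue = queue.Queue()

def worker(db_path):
    conn = sqlite3.connect(db_path)        # dedicated connection per worker
    while True:
        sql = work_queue.get()
        if sql is None:                    # sentinel: no more work
            break
        with conn:                         # one transaction per load
            conn.execute(sql)

def run_loads(db_path, statements):
    threads = [threading.Thread(target=worker, args=(db_path,)) for _ in range(NUM_WORKERS)]
    for t in threads:
        t.start()
    for sql in statements:
        work_queue.put(sql)                # the queue is the throttle: only NUM_WORKERS run at once
    for _ in threads:
        work_queue.put(None)               # one sentinel per worker
    for t in threads:
        t.join()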
An even better approach is to have the queue persisted in the database, as a means to recover from a crash. See Using Tables as Queues for the proper way of doing it, since table based queuing is notoriously error prone.
And finally, you can just let SQL Server handle everything, the queueing, the throttling and the processing itself via Activation. See Asynchronous Procedure Execution and the follow up article Passing Parameters to a Background Procedure.
Which one is the proper solution depends on lots of factors that you know about your problem and I don't, so I can't recommend which way you should go.
How should one ensure correctness when multiple processes access one single SQLite database file?
First, avoid concurrent access to sqlite database files. Concurrency is one of sqlite's weak points and if you have a highly concurrent application, consider using another database engine.
If you cannot avoid concurrency or drop sqlite, wrap your write transactions in BEGIN IMMEDIATE; ... END;. The default transaction mode in sqlite is DEFERRED which means that a lock is acquired only on first actual write attempt. With IMMEDIATE transactions, the lock is acquired immediately, or you get SQLITE_BUSY immediately. When someone holds a lock to the database, other locking attempts will result in SQLITE_BUSY.
Dealing with SQLITE_BUSY is something you have to decide for yourself. For many applications, waiting for a second or two and then retrying works quite all right, giving up after n failed attempts. There are sqlite3 API helpers that make this easy, e.g. sqlite3_busy_handler() and sqlite3_busy_timeout() but it can be done manually as well.
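A minimal retry sketch (Python shown here; in C the same effect is obtained with sqlite3_busy_timeout() or sqlite3_busy_handler(), and the accounts table is a made-up example):

import sqlite3
import time

def write_with_retry(db_path, attempts=5):
    # isolation_level=None leaves the connection in autocommit mode so we can
    # issue BEGIN IMMEDIATE ourselves; timeout maps to sqlite3_busy_timeout()
    conn = sqlite3.connect(db_path, timeout=2.0, isolation_level=None)
    for _ in range(attempts):
        try:
            conn.execute("BEGIN IMMEDIATE")   # take the write lock up front
            conn.execute("UPDATE accounts SET balance = balance - 1 WHERE id = 1")
            conn.execute("COMMIT")
            return True
        except sqlite3.OperationalError:      # "database is locked" corresponds to SQLITE_BUSY
            try:
                conn.execute("ROLLBACK")
            except sqlite3.OperationalError:
                pass                          # no transaction was started
            time.sleep(1)                     # wait a bit before the next attempt
    return False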
You could also use OS level synchronization to acquire a mutex lock to the database, or use OS level inter-thread/inter-process messaging to signal when one thread is done accessing the database.
Any SQLite primitive will return SQLITE_BUSY if it tries to access a database that another process is accessing at the same time. You can check for that error code and simply repeat the action.
Alternatively, you can use OS synchronization: a mutex on MS Windows, or something similar on other OSes. The process tries to acquire the mutex, and if someone else already holds it the process is blocked until the other process finishes its operation and releases the mutex. Care should be taken to prevent cases where a process acquires the mutex and then never releases it.
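A rough sketch of that OS-level approach on Linux (the file names are placeholders), using an advisory lock on a separate lock file; on Windows the analogue would be a named mutex. The try/finally guarantees the lock is always released:

import fcntl
import sqlite3

def locked_write(db_path, lock_path, sql, params=()):
    # the lock file serializes writers across processes
    with open(lock_path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)        # blocks until the lock is free
        try:
            conn = sqlite3.connect(db_path)
            with conn:                               # commit on success, rollback on error
                conn.execute(sql, params)
            conn.close()
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)    # always release the lock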
The SQLite FAQ about exactly this
Basically you need to wrap your data access code with transactions. This will keep your data consistent. Nothing else is required.
In SQLite you are using
BEGIN TRANSACTION
COMMIT TRANSACTION
pairs to delimit your transactions. Put your SQL code in between in order to have it execute in a single transaction.
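For instance, in Python (the tables and values are made up), everything between the BEGIN and the COMMIT executes as a single transaction:

import sqlite3

conn = sqlite3.connect("clock.db", isolation_level=None)   # manage transactions manually
conn.execute("BEGIN TRANSACTION")
conn.execute("INSERT INTO punches (employee_id, stamp) VALUES (?, ?)", (7, "2013-05-01 08:59"))
conn.execute("UPDATE employees SET last_seen = ? WHERE id = ?", ("2013-05-01 08:59", 7))
conn.execute("COMMIT TRANSACTION")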
However, as others have commented before me, you need to pay close attention to concurrency issues. SQLite can be reasonably fast when it is used mostly for read access (multiple readers are not blocked and can run concurrently).
The picture changes considerably, however, if your code interleaves write and read access: with SQLite, the entire database file is locked whenever even a single writer is active.