Are there any local DB that support multi-threading? - database

I tried sqlite,
by using multi-thread, only one thread can update db at the same time..
I need multi-thread updating the db at same time.
Is there are any DB can do the job?
ps: I use delphi6.
I found that sqlite can support multi-threading,
But in my test of asgsqlite, when one thread inserting, others will fail to insert.
I'm still in testing.

SQLite can be used in multi-threaded environments.
Check out this link.

Firebird can be used in an embedded version, but it's no problem to use the standard (server) installation locally as well. Very small, easy to deploy, concurrent access. Works good with Delphi, you should look into it as an option.
See also the StackOverflow question "Which embedded database to use in a Delphi application?"

Sqlite locks the entire database when updating (unless this has changed since I last used it). A second thread cannot update the database at the same time (even using entirely separate tables). However there is a timeout parameter that tells the second thread to retry for x milliseconds before failing. I think ASqlite surfaces this parameter in the database component (I think I actually wrote that bit of code, all 3 lines, but it was a couple of years ago).
Setting the timeout to a larger value than 0 will allow multiple threads to update the database. However there may be performance implications.

since version 3.3.1, SQLite's threading requirements have been greatly relaxed. in most cases, it means that it simply works. if you really need more concurrency than that, it might be better to use a DB server.

SQL Server 2008 Express supports concurrency, as well as most other features of SQL Server. And it's free.

Why do you need multiple threads to update it at the same time? I'm sure sqlite will ensure that the updates get done correctly, even if that means one thread waiting for the other one to finish; this is transparent to the application.
Indeed, having several threads updating concurrently would, in all likelihood, not be beneficial to performance. That's to say, it might LOOK like several threads were updating concurrently, but actually the result would be that the updates get done slower than if they weren't (due to the fact that they need to hold many page locks etc to avoid problems).

DBISAM from ElevateSoft works very nicely in multi-threaded mode, and has auto-session naming to make this easy. Be sure to follow the page in the help on how to make it all safe, and job done.

I'm actually at the moment doing performance testing with a multi-threaded Java process on Sybase ASE. The process parses a 1GB file and does inserts into a table.
I was afraid at first, because many of the senior programmers warned me about "table locking" and how dangerous it is to do concurrent access to DB. But I went ahead and did testing (because I wanted to find out for myself).
I created and compared a single threaded process to a process using 4 threads. I only received a 20% reduction in total execution time. I retried the the process using different thread counts and batch insert sizes. The maximum I could squeeze was 20%.
We are going to be switching to Oracle soon, so I'll share how Oracle handles concurrent inserts when that happens.

Related

Should I keep this "GlobalConnection" or create connection for every query?

I have inherited a legacy Delphi application that uses ADO to connect to SQL Server.
The application has a notion of a "Global Connection" -- that is a single connection that it opens at the start, and then keeps open all throughout the running of the application (which can be days, weeks, or longer....)
So my question is this: Should I keep this way of doing things or should I switch to a "connect-query-disconnect" mode of doing things? Does it matter?
Switching would be a non-trivial task, but I'll do it if it means better performance, data management, etc.
Well, it depends on what you're expecting to get out of it, and what kind of application it is.
There's nothing in particular wrong with using a single long-running connection, as long as the application can gracefully handle disconnections and recover or log/notify when it can't reconnect.
The problem with a connect-query-disconnect setup is that you're adding the overhead of connecting and disconnecting on every query. That's going to slow things down, and in an interactive GUI application users may notice the additional overhead. You also have to make sure that authorization is transparently handled if it isn't already.
At the same time, there may be interactive performance gains to be had if you can push all the queries off onto background threads and asynchronously update the GUI. If contention appears because the queries are serialized, you can migrate to a connection-pool system fairly readily as well and improve things even more. This has a fairly high complexity cost to it though, so now you're looking to balancing what the gains are compared to the work involved.
Right now, my ultimate response is "if it ain't broke, don't fix it." Changes along the lines you propose are a lot of work -- how much do the users of this application stand to gain? Are there other problems to solve that might benefit them more?
Edit: Okay, so it's broke. Well, slow at least, which is all the same to me. If you've ruled out problems with the SQL Server itself, and the queries are performing as fast as they can (i.e. DB schema is sane, the right indexes are available, queries aren't completely braindead, server has enough RAM and fast enough I/O, network isn't flaky, etc.), then yes, it's time to find ways to improve the performance of the app itself.
Simply moving to a connect-query-disconnect is going to make things worse, and the more queries you're issuing the bigger the drop off is going to be. It sounds like you're going to need to rearchitect the app so that you can run fewer queries, run them in the background, cache more aggressively on the client, or some combination of all 3.
Don't forget the making the clients perform better means that server side performance gets more important since it's probably going to be handling a higher load if clients start making multiple connections and issuing multiple queries in parallel.
As mr Frazier told before - the one global connection is not bad per se.
If you intend to change, first detect WHAT is the problem. Let's see some scenarios:
1
Some screens(IOW: an set of 1..n forms to operate in a business entity) are slow. Possible causes:
insuficient filtering resulting in a pletora of records being pulled from database without necessity.
the number of records are ok, but takes too much to render it. Solution: faster controls or intelligent rendering (ex.: Virtual list views)
too much queries each time you open an screen. Possible solutions: use TClientDatasets (or any in-memory dataset) to hold infrequently modified lookup tables. An more sophisticated cache for more extensive tables or opening those datasets in other threads can improve response times.
Scrolling on datasets with controls bound can be slow (just to remember, because those little details can be easily forgotten).
2
Whole app simply slows down. Checklist:
Network cards are ok? An few net cards mal-functioning can wreak havoc even on good structured networks as they create unnecessary noise on the line.
[MSSQL DBA HAT ON] The next on the line of attack is SQL Server. Ask the DBA to trace blocks and deadlocks. Register slow queries and work on them speed up. This relate directly to #1.1 and #1.3
Detect if some naive developer have done SELECT inside transactions. In read committed isolation, it's just overhead, as it'll create more network traffic. Open the query, retrieve the data and close the dataset.
Review the database schema, if you can.
Are any data-bound operations on a bulk of records (let's say, remarking the price of some/majority/all products) being done on the app? Make an SP or refactor the operation on an query, it'll be much faster and will reduce the load of the entire server.
Extensive operations on a group of records? Learn how to do that operations at once on the server instead of one-by-one record. See an examination of most used alternatives on the MSSQL MVP Erland Sommarskog's article on array and list on MSSQL.
Beware of queries with WHERE like : WHERE SomeFunction(table1.blabla) = #SomeParam . Most of time, that ones will not use an index causing to read the entire table to select the desired data. If is a big table.... Indexing on a persisted computed columns can make miracles...[MSSQL HAT OFF]
That's what I can think of without a little more detail... ;-)
Database connections are time consuming resources to create and the rule of thumb should be create as little as possible and reuse as much as possible. That's why some other technologies have database connection pools, which are typically established at application/service startup and then kept as long as possible and shared among threads.
From your comment, the application has performances issues, but it's difficult without more details to make any recommendation.
Should try to nail down what is slow - are all queries slow or just some specific ones?
If just some specific ones is there some correlation.
My 2 cents.

Using Haskell's HDBC, can I use a single SQLite connection across multiple threads?

If I want to query (read) from an SQLite database using Haskell's HDBC across multiple threads, can I use a single connection, or should each thread have its own connection? Thanks.
I searched through the hdbc-sqlite code and found this comment:
Logic for handling counts of changes: look at the total changes before
and after the query. If they differ, then look at the local changes.
(The local change counter appears to not be updated unless really
running a query that makes a change, according to the docs.)
This is OK thread-wise because SQLite doesn't support using a given
dbh in more than one thread anyway.
The official Sqlite documentation has a whole page about this topic
The FAQ says:
(5) Can multiple applications or multiple instances of the same
application access a single database file at the same time?
Multiple processes can have the same database open at the same time.
Multiple processes can be doing a SELECT at the same time. But only
one process can be making changes to the database at any moment in
time, however.
This information would exclude both of your approaches. Perhaps you can write some tests for this to show that this information are wrong.

How does DBContext SaveChanges work internally?

Using Entity Framework with SQL Server 2008, we've got an application that writes high volumes of data, say 1000 new rows per minute, each being in their own DBContext.saveChanges call (we're not batching them together)
The issue is that our writes fall way, way behind. To the point that it seems like the thing is thrashing. For example, we'll call saveChanges with new rows a couple thousand times over two minutes, and not a single write will be made, then all of a sudden we'll get a handful of writes (but many are completely lost).
We've taken a SQL trace, and seen that SQL doesn't receive a command to write for even 10% of our saveChanges calls.
So it would seem there's an issue somewhere in between saveChanges and SQL Server. I'm wondering how this call works. Does it use thread pooling? Queueing? some buffer that we could be overrunning? Maybe its silently failing due to the volume of writes?
MSDN is pretty useless on explaining how this stuff actually works
Read the performance considerations in the msdn and also have a look at Fastest Way of Inserting in Entity Framework.
I don't know how it works internally, but with this kind of overload you better insert the data into a queue and use one or more (but limited) threads to empty the queue and write to the database. You can test and adjust the amount of threads so you won't lose data.

Implementing multithreaded application under C

I am implementing a small database like MySQL.. Its a part of a larger project..
Right now i have designed the core database, by which i mean i have implemented a parser and i can now execute some basic sql queries on my database.. it can store, update, delete and retrieve data from files.. As of now its fine.. however i want to implement this on network..
I want more than one user to be able to access my database server and execute queries on it at the same time... I am working under Linux so there is no issue of portability right now..
I know i need to use Sockets which is fine.. I also know that i need to use a concept like Thread Pool where i will be required to create a maximum number of threads initially and then for each client request wake up a thread and assign it to the client..
As for now what i am unable to figure out is how all this is actually going to be bundled together.. Where should i implement multithreading.. on client side / server side.? how is my parser going to be configured to take input from each of the clients separately?(mostly via files i think?)
If anyone has idea about how i can implement this pls do tell me bcos i am stuck here in this project...
Thanks.. :)
If you haven't already, take a look at Beej's Guide to Network Programming to get your hands dirty in some socket programming.
Next I would take his example of a stream client and server and just use that as a single threaded query system. Once you have got this down, you'll need to choose if you're going to actually use threads or use select(). My gut says your on disk database doesn't yet support parallel writes (maybe reads), so likely a single server thread servicing requests is your best bet for starters!
In the multiple client model, you could use a simple per-socket hashtable of client information and return any results immediately when you process their query. Once you get into threading with the networking and db queries, it can get pretty complicated. So work up from the single client, add polling for multiple clients, and then start reading up on and tackling threaded (probably with pthreads) client-server models.
Server side, as it is the only person who can understand the information. You need to design locks or come up with your own model to make sure that the modification/editing doesn't affect those getting served.
As an alternative to multithreading, you might consider event-based single threaded approach (e.g. using poll or epoll). An example of a very fast (non-SQL) database which uses exactly this approach is redis.
This design has two obvious disadvantages: you only ever use a single CPU core, and a lengthy query will block other clients for a noticeable time. However, if queries are reasonably fast, nobody will notice.
On the other hand, the single thread design has the advantage of automatically serializing requests. There are no ambiguities, no locking needs. No write can come in between a read (or another write), it just can't happen.
If you don't have something like a robust, working MVCC built into your database (or are at least working on it), knowing that you need not worry can be a huge advantage. Concurrent reads are not so much an issue, but concurrent reads and writes are.
Alternatively, you might consider doing the input/output and syntax checking in one thread, and running the actual queries in another (query passed via a queue). That, too, will remove the synchronisation woes, and it will at least offer some latency hiding and some multi-core.

Read-only, rather small database - ways to optimise?

I have a following design. There's a pool of identical worker processes (max 64 of them, on average 15) that uses a shared database for reading only. The database is about 25 MB. Currently, it's implemented as a MySQL database, and all the workers connect to it. This works for now, but I'd like to:
eliminate cross-process data transfer - i. e. execute SQL in-process
keep the data completely in memory at all time (I mean, 25 MB!)
not load said 25 MB separately into each process (i. e. keep it in shared memory somehow)
Since it's all reading, concurrent access issues are nonexistent, and locking is not necessary. Data refreshes happen from time to time, but these are unfrequent and I'm willing to shut down the whole shebang for those.
Access is performed via pretty vanilla SQL SELECTs. No subqueries, no joins. LIKE conditions are the fanciest feature ever used. Indices, however, are very much needed.
Question - can anyone think of a database library that would provide the goals outlined above?
You can use SQLite with its in-memory database.
I would look at treating like a cache. MEMCACHED is easy and very fast as all in memory. Fan of MongoDB or similar will also be faster although disk based.

Resources