I have a project right now where I'd like to be able to pull rows out of an Access database that a 3rd party product uses to store its information. There will likely be a small number of users hitting this database at the same time my "export" process does, so I'm a little concerned about data integrity and concurrent access.
Will I likely run into problems with my .NET import process (using LINQ/ADO.NET/?) when it is trying to pull data out of the MDB at the same time someone else is saving a row? How does Access's locking work?
There should be no problem. Problems can occur only on concurrent write operations. MS Access's locking is based on file locks recorded in the .ldb file, and the locks are taken on pages, not on the whole file. Because the locks live in the .ldb file and not in the .mdb file itself, there are no problems with parallel reading.
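A minimal sketch of such a concurrent read, shown in Python via pyodbc for brevity (the same idea applies with ADO.NET and the Jet/ACE provider mentioned in the question). The driver name is the standard Access ODBC driver; the path and table name are placeholders.

```python
# A minimal sketch, assuming the Microsoft Access ODBC driver and the pyodbc
# package are installed. The path and table name are placeholders.
import pyodbc

conn_str = (
    r"Driver={Microsoft Access Driver (*.mdb, *.accdb)};"
    r"Dbq=C:\data\thirdparty.mdb;"   # hypothetical path to the 3rd-party MDB
)

# A plain (shared) open: page locks live in the .ldb, so this read does not
# block, and is not blocked by, other readers of the same file.
conn = pyodbc.connect(conn_str)
cur = conn.cursor()
cur.execute("SELECT * FROM Orders")   # hypothetical table
for row in cur.fetchall():
    print(row)
conn.close()
```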
In previous work with Access (back when I was using Access 2003), the only thing I ran into was that occasionally a read would lock the rows just above and below the row being read. However, I believe this may have been an isolated issue with our application.
When you open the database, do not attempt to open it in read-only mode (although you might think that makes sense). If you are the first user in, Access opens the .mdb file read-only and does not create an .ldb file, forcing all subsequent users into read-only mode as well.
Related
I have a database file; only one user will open it, read-only.
I'm looking for the best performance.
Is it better to open it in exclusive read-only mode, or just open it normally?
If a network is NOT involved (multi-user), then you will not see much of any difference in performance.
However, on a network? Then exclusive can make a large difference.
In fact, this is why for 18+ years we have seen the OFTEN repeated performance tip to create and keep a persistent connection from the front end application to the back end file share open at ALL times.
The reason for the above is that when an Access front end attempts to open a table (not the back end file itself), Access will attempt to gain exclusive use of the back end file. It does this because in multi-user mode several users can have pending updates, so buffers etc. require additional management and "op-locks" on the back end occur. Op-locks (opportunistic locks) are part of the Windows file system and allow multiple users to open and work on the file at the same time. While this setup allows multiple users to work on the file at the same time, there are significant slowdowns, because each workstation potentially has buffers of active data that must be written back to the back end and, MORE important, it must be ensured that two users don't update the same records at the same time. So this flipping into multi-user mode can take significant amounts of time, and EACH TIME Access tries to grab data, it will attempt to flip OUT of multi-user mode to ensure really great performance.
However, this attempt to change from multi-user mode to single-user mode can take HUGE amounts of time. So, by using a persistent connection (forced to stay open) to the back end, this LONG delay does not occur. In addition, with virus scanning software, opening the file can itself take HUGE amounts of time.
The above issue only really applies to a shared back end in a multi-user environment. And of course in such scenarios, opening the database as exclusive would eliminate the above issues.
So Access does in fact flip between single-user mode and multi-user mode. This use of the Windows file locking system will, as noted, MOST certainly slow down Access when it flips to multi-user mode (and worse, as noted, the attempt to flip back and forth actually costs more than the performance gain you get when only one user has the file open). Again, with one user you will not notice any difference, but if 2 or more users have the back end file open, you will see significant delays. (But the persistent connection will usually remove those delays.)
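The persistent-connection tip above is normally applied inside an Access front end, usually by keeping a DAO Database object or a recordset open against the back end for the life of the application. Purely as a rough analogy for an external process (not the author's method), a sketch in Python via pyodbc, with driver name, share path, and table name as placeholders:

```python
# A rough analogy only: open one connection to the back end at startup and
# reuse it, instead of connecting/disconnecting around every query.
import pyodbc

BACKEND = r"\\server\share\backend.mdb"   # hypothetical file share path

# Opened once and kept for the life of the process.
persistent_conn = pyodbc.connect(
    r"Driver={Microsoft Access Driver (*.mdb, *.accdb)};Dbq=" + BACKEND
)

def export_table(table_name):
    # Reuse the already-open connection instead of re-opening the back end
    # (and re-negotiating its file locks) for every export run.
    cur = persistent_conn.cursor()
    cur.execute("SELECT * FROM [" + table_name + "]")
    return cur.fetchall()
```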
So, for single user mode on a single machine?
Opening normally or exclusively likely will not change anything, and you will likely not notice any performance difference. (Even read-only as opposed to read/write will not be noticed.)
However, the instant you introduce a network, and multiple users opening that file share from a network folder, then significant advantages can be had by using exclusive mode. As noted, exclusive mode helps prevent the switching between single-user mode on the network (for better performance) and multi-user mode (with client workstation buffers that hold pending write data). So do keep in mind that Access WILL attempt to open the back end in single-user mode (this is not exclusive mode, but single-user mode, for better performance over the network).
However, overall, the cost of switching between the two modes tends to be a greater slowdown than the speed advantage gained when working with a file share over a network.
So, exclusive mode when a network is involved will often result in substantial performance increases.
On a local machine with no network? Because you have so much processing power, opening as full read/write or as read-only does not usually result in any noticeable performance gain.
So, the answer here only really applies if you are working with Access files over a network. For local un-split databases on a single workstation you will not gain anything of note regardless of what mode you open the database in, or even if you use a snapshot query (read-only) as opposed to a query that returns full read/write results.
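For completeness, a sketch of requesting exclusive mode from code, assuming the Access ODBC driver accepts an "Exclusive=1" connection-string attribute (that attribute name is my assumption; verify it against your driver version). Path and table name are placeholders.

```python
# Sketch: open the back end exclusively for a single read-only user.
import pyodbc

conn = pyodbc.connect(
    r"Driver={Microsoft Access Driver (*.mdb, *.accdb)};"
    r"Dbq=\\server\share\data.mdb;"   # hypothetical back end on a file share
    r"Exclusive=1;"                   # assumed attribute for exclusive mode
)
cur = conn.cursor()
cur.execute("SELECT COUNT(*) FROM SomeTable")   # hypothetical table
print(cur.fetchone()[0])
conn.close()
```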
Question: Is keeping a lock on a record for a long period of time common practice with modern database systems?
My understanding is that locking records in a database (optimistically or pessimistically) is usually done for a very short period of time, during a transaction.
The software I'm working with right now keeps locks on records for long periods of time:
A lock is kept on the record of the logged-in user (in the ACTIVE_USERS table) for the whole time the user is logged in to the software.
Let's say USER A is working on a file. The record corresponding to the file is locked until USER A saves the file or exits it. So if a colleague, USER B, tries to work on the same file, a popup shows up saying 'You can't work on this file because USER A is working on it right now'.
The company I'm working for wants the changes needed to implement compatibility with Microsoft SQL Server to be minimal, so I need to implement such a locking mechanism. I've hacked together something that works in a minimal test project, but I'm not sure it is up to industry and MSSQL standards...
This is a bit long for a comment.
Using the database locking mechanism for this application-level locking seems unusual. Database locks could be on the row, page, or table level, and they also affect indexes, so there could be unexpected side effects. Obviously, a proliferation of locks also makes deadlocks much more likely.
Normally, application locks would be handled on the record level. Using flags (of some sort) in the record, the application would ensure that only one user has access to the file at a time.
I would say, it might work. But I would never design a system that way and I'd be wary of unexpected consequences.
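A minimal sketch of the flag-based, record-level approach mentioned above, using pyodbc against SQL Server. The table and column names (FILES, locked_by, locked_at) are hypothetical; the point is that the "lock" is just data in the row, claimed with one atomic UPDATE, rather than a database lock held for the whole session.

```python
# Sketch: claim/release a file with a flag column instead of a held DB lock.
import pyodbc

def try_claim_file(conn, file_id, user_name):
    cur = conn.cursor()
    cur.execute(
        "UPDATE FILES SET locked_by = ?, locked_at = SYSUTCDATETIME() "
        "WHERE file_id = ? AND locked_by IS NULL",
        user_name, file_id,
    )
    conn.commit()
    # rowcount == 1 means we got the file; 0 means someone else already holds it.
    return cur.rowcount == 1

def release_file(conn, file_id, user_name):
    cur = conn.cursor()
    cur.execute(
        "UPDATE FILES SET locked_by = NULL WHERE file_id = ? AND locked_by = ?",
        file_id, user_name,
    )
    conn.commit()
```

If USER B calls try_claim_file while USER A holds the row, the UPDATE matches nothing and the function returns False, which is where the "file in use" popup would be shown.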
I have a big SQLite database to process, so I would like to use MPI for parallelization to speed things up. What I want to do is send the database from the root to every slave, and send the modified databases back to the root after each slave adds some tables to it. I want to use MPI_Type_create_struct to create a datatype to store the database, but the database is too complicated. Is there any other way to handle this situation? Thank you in advance!
I recently dealt with a similar problem - I have a large MPI application that uses SQLite as a configuration store. Handling multi-process writes is a challenge with an embedded SQL database. My experience with this involves a massively parallel application (running up to 65,535 ranks) with a shared filesystem.
Based on the FAQ from SQLite and some experience with database engines, there are a few ways to approach this problem. I am making the assumption that you are operating with a shared distributed file system, and multiple separate computers (a standard HPC cluster setup).
Since SQLite will block when multiple processes write to the database (but not read), reads will most likely not be an issue. Each process can run multiple SELECT commands at the same time without issue.
The challenge will be in the writing. Disk I/O is several orders of magnitude slower than computation, so generally this will be the bottleneck. Having said that, network communication may also be a significant slowdown, so how you approach the problem really depends on where the weakest link of your running environment will be.
If you have a fast network and slow disk speed, or if you want to implement this in the most straightforward way possible, your best bet is to have a single MPI rank in charge of writing to the database. Your compute processes would independently run SELECT commands until computation was complete, then send the new data to the MPI database process. The database control process would then write the new data to disk. I would not try to send the structure of the database across the network, rather I would send the data that should be written, along with (possibly) a flag that would identify what table/insert query the data should be written with. This technique is sort of similar to how a RDBMS works - while RDBMS servers do support concurrent writes, there is a "central" process in control of the ordering of write operations.
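A minimal sketch of that single-writer pattern, shown with mpi4py and sqlite3 for brevity (the question's MPI code may well be C; the structure is the same). The table names, schema, and the stand-in "computation" are placeholders.

```python
# Sketch: rank 0 owns the database file and is the only writer; the other
# ranks read freely and send rows to be inserted.
from mpi4py import MPI
import sqlite3

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
DB_PATH = "results.db"   # hypothetical database path

if rank == 0:
    conn = sqlite3.connect(DB_PATH)
    finished = 0
    while finished < comm.Get_size() - 1:
        msg = comm.recv(source=MPI.ANY_SOURCE, tag=0)
        if msg is None:                       # a worker signals it is done
            finished += 1
        else:
            conn.executemany("INSERT INTO results VALUES (?, ?)", msg)
            conn.commit()
    conn.close()
else:
    # Workers only read; writes go through rank 0.
    conn = sqlite3.connect(DB_PATH)
    rows = conn.execute("SELECT id, value FROM inputs").fetchall()
    conn.close()
    new_rows = [(i, v * 2) for i, v in rows]  # stand-in for real computation
    comm.send(new_rows, dest=0, tag=0)
    comm.send(None, dest=0, tag=0)            # done marker
```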
One thing to note is that if a process writes to the SQLite database, the file is locked for all processes that are trying to read or write to it. You will need to either handle the SQLITE_BUSY return code in your worker processes, register a callback to handle this, change the busy behavior, or use an alternate technique. In my application, I found that loading the database as an in-memory database (https://www.sqlite.org/inmemorydb.html) for the readers provided a good workaround. Readers access the in-memory database but send results to the controlling process for writes. The downside is that you will have multiple copies of the database in memory.
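A short sketch of two of those mitigations in Python's sqlite3: a busy timeout so reads retry instead of failing immediately, and copying the on-disk database into an in-memory copy for the readers (the backup API used here is available in Python 3.7+). The path and table name are placeholders.

```python
# Sketch: (1) busy timeout, (2) per-process in-memory copy for readers.
import sqlite3

DB_PATH = "results.db"   # hypothetical database path

# (1) Retry for up to 5 seconds when the file is locked by a writer,
# instead of raising "database is locked" (SQLITE_BUSY) right away.
conn = sqlite3.connect(DB_PATH, timeout=5.0)

# (2) Copy the whole database into memory; reads then never touch the locked
# file. The cost is one full copy of the database per reader process.
mem = sqlite3.connect(":memory:")
conn.backup(mem)
conn.close()

print(mem.execute("SELECT COUNT(*) FROM results").fetchone()[0])
```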
Another option that might be less network intensive is to do the reads concurrently and have each worker process write out to their own file. You could write out to separate SQLite database files, or even export something like CSV (depending on the complexity of the data). When writes are complete, you would then have a single process merge the individual files into a single result database file - see How can I merge many SQLite databases?. This method has its own issues, but depending on where your bottlenecks are and how the system as a whole is laid out, this technique may work.
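A sketch of that final merge step using ATTACH DATABASE, assuming every worker wrote a table named "results" with the same schema (names and file list are placeholders).

```python
# Sketch: one process attaches each worker's output file and copies its rows
# into a single merged database.
import sqlite3

worker_files = ["worker_%d.db" % r for r in range(1, 4)]   # hypothetical names

main = sqlite3.connect("merged.db")
main.execute("CREATE TABLE IF NOT EXISTS results (id INTEGER, value REAL)")

for path in worker_files:
    main.execute("ATTACH DATABASE ? AS src", (path,))
    main.execute("INSERT INTO results SELECT id, value FROM src.results")
    main.commit()                       # close the transaction before DETACH
    main.execute("DETACH DATABASE src")

main.close()
```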
Finally, you might consider reading from the SQLite database and saving the data to a proper distributed file format, such as HDF5 (or using MPI IO). Once the computation is done, it would be pretty straightforward to write a script that would create a new SQLite database from this foreign file format.
We want to add a load of records to an AS400 database file using COBOL several times a day. This file is also continually being updated and added to by 30 users through an online COBOL screen (albeit different records). My initial thought on this is one of horror, but is the file sharing on the AS400 robust enough to handle this kind of multi-threading, or does one task lock the file and release it when it has finished?
I'm an RPG programmer. I routinely have several hundred jobs adding, changing and deleting records from the same table all day long, for decades.
IBM i file sharing works very well - so well I never even think about it. There are a few tasks which require exclusive access to a file - backup & restore, for instance - but the sort of I/O that application programs perform works quite well with the typical 'shared update' access.
The COBOL application might lock records it has read until they are updated or released, or if processing under commitment control, until a commit or rollback.
I have an application that uses SQL FILESTREAM to store images. I insert a LOT of images (several million images per day).
After a while, the machine stops responding and seems to be out of memory... Looking at the memory usage of the PC, we don't see any process taking a lot of memory (neither SQL nor our application). We tried to kill our process and it didn't restore the machine... We then killed the SQL services and that did not restore the system either. As a last resort, we even killed all processes (except the system ones) and the memory still remained high (we are looking at the Task Manager's Performance tab). Only a reboot does the job at that point. We have tried on Win7, WinXP, and Win2K3 Server, always with the same results.
Unfortunately, this isn't a one-shot deal, it happens every time.
Has anybody seen that kind of behaviour before? Are we doing something wrong using the SQL FILESTREAMS?
You say you insert a lot of images per day. What else do you do with the images? Do you update them, many reads?
Is your file system optimized for FILESTREAMs?
How do you read out the images?
If you do a lot of updates, remember that SQL Server will not modify the filestream object but create a new one and mark the old for deletion by the garbage collector. At some time the GC will trigger and start cleaning up the old mess. The problem with FILESTREAM is that it doesn't log a lot to the transaction log and thus the GC can be seriously delayed. If this is the problem it might be solved by forcing GC more often to maintain responsiveness. This can be done using the CHECKPOINT statement.
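If forcing the garbage collector this way helps, it can be done from the loader itself; a sketch via pyodbc, where the connection string and the idea of checkpointing per batch are placeholders for whatever your insert loop looks like (CHECKPOINT is the T-SQL statement mentioned above).

```python
# Sketch: issue a CHECKPOINT after each batch of FILESTREAM inserts so the
# garbage collector can reclaim replaced/deleted filestream data sooner.
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 17 for SQL Server};Server=myserver;"
    "Database=Images;Trusted_Connection=yes;",   # hypothetical server/database
    autocommit=True,
)
cur = conn.cursor()

def end_of_batch():
    # Flush dirty pages and let FILESTREAM garbage collection catch up,
    # instead of letting dead blobs pile up on disk between checkpoints.
    cur.execute("CHECKPOINT")
```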
UPDATE: You shouldn't use FILESTREAM for small files (less than 1 MB). Millions of small files will cause problems for the file system and the Master File Table. Use varbinary instead. See also Designing and implementing FILESTREAM storage.
UPDATE 2: If you still insist on using the FILESTREAM for storage (you shouldn't for large amounts of small files), you must at least configure the file system accordingly.
Optimize the file system for a large number of small files (use these as tips and make sure you understand what they do before you apply them):
Change the Master File Table reservation to maximum in the registry (fsutil.exe behavior set mftzone 4)
Disable 8.3 file names (fsutil.exe behavior set disable8dot3 1)
Disable last access update (fsutil.exe behavior set disablelastaccess 1)
Reboot and create a new partition
Format the storage volumes using a block size that will fit most of the files (2k or 4k depending on your image files).