I have a single-threaded application that only reads data from an encrypted SQLite database. The encryption is provided by the SQLCipher security extension. The application does no writing to this database, as it holds reference data only. We run multiple copies of this application in parallel, all using the same database file. With two simultaneous copies we had no problems. When we increased to eight copies we began to get consistent "database is locked" errors. We were opening the database with the SQLITE_OPEN_READWRITE flag; we switched to the SQLITE_OPEN_READONLY flag and saw no fewer errors.
This just doesn't seem right. Why is there an exclusive database lock when there are no write operations at all?
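For illustration, here is a minimal sketch of a read-only open of the kind described above, written with Python's built-in sqlite3 module rather than the C API. The path and table name are placeholders, and the key pragma is only shown as a comment because the standard module has no SQLCipher support:

```python
import sqlite3

DB_PATH = "reference.db"  # placeholder path

# Open read-only via a URI; this is roughly equivalent to passing
# SQLITE_OPEN_READONLY to sqlite3_open_v2() in the C API.
conn = sqlite3.connect(f"file:{DB_PATH}?mode=ro", uri=True)

# conn.execute("PRAGMA key = '...'")  # only meaningful with a SQLCipher-enabled build

# Placeholder query against a hypothetical reference table.
for row in conn.execute("SELECT * FROM reference_table LIMIT 5"):
    print(row)

conn.close()
```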
I'm currently learning how to create databases & tables using ADO from VBA (running from Excel). I am able to create a database either of format .mdb (using connection "Microsoft.Jet.OLEDB.4.0;") or .accdb (using connection "Microsoft.Ace.OLEDB.12.0;") - and I am able to create tables and insert records. So far so good.
I notice, however, that after I create such a database via ADO, if I open it in MS Access, the file size reduces dramatically. Note: just the action of opening it in Access has this effect. I'm not making any changes to the data there.
An example: a freshly-created database from ADO is 1304k; then after opening in Access it's 816k.
As far as I can see, no data is being lost when it is opened in Access. I can only guess that Access is somehow automatically optimising the database when it's opened, or otherwise getting rid of unnecessary stuff which was inserted by ADO. Obviously, I would prefer the file size to be as small as possible. So I'm wondering:
1) Does anyone know what's going on here?
2) If ADO is inserting unnecessary data, is there any way I can optimise it programmatically? My end users will be running Excel only, so I can't ask them to periodically open the database in Access just to optimise the data.
Without knowing specifically how you are creating tables and importing records, what structures and schema you use, or your environment (2000/02/03 .mdb or 2007/10/13 .accdb format, 32-bit or 64-bit machine, laptop or desktop, Windows version, single PC or LAN), one can only speculate.
First, it is important to understand the strange, hard-to-define program that is MS Access. In essence, MS Access is a suite of connected objects: the Jet/ACE SQL engine, a GUI front-end application, a report generator, and an IDE coding interface. It is not actually a database itself but ships with the ACE engine as its default, and that engine is not restricted to MS Access; it is a Windows technology (a set of .dll files) available to all Office and other PC applications. Excel can work with accdb/mdb files without Access even being installed! When Access is compared to other databases, whether its popular open-source file-server counterpart SQLite or client-server systems (SQL Server, MySQL, PostgreSQL), it is really the Jet/ACE component that is the relational engine being compared. Interestingly, Access can connect to all of the aforementioned RDBMSs by switching out its default engine.
With that said, the multifaceted nature of MS Access means it makes use of temporary objects in both the SQL engine back end and the VBA front end, interacting with both the hard disk and memory, especially when importing records and running various queries. Even Microsoft's Compact and Repair support page mentions it:
Access creates temporary, hidden objects to accomplish various tasks. Sometimes, these temporary objects remain in your database after Access no longer needs them.
Furthermore, make-table and action queries (append/update/delete) actually copy the entire result set before committing final changes. Hence, users are prompted with the number of records to be added before changes are finalized, with the opportunity to roll back the action. So after migrating data, your database may have partly returned to its former (smaller) state. Then there is garbage collection in VBA, which releases memory once objects are no longer in use and the OLEDB driver has closed its connection. It may be that the difference in file size you see is simply space that was recovered after your VBA processing finished and its objects went out of scope. I wonder at what point you observed the larger Access file. Would refreshing Windows Explorer or restarting the machine change what you see? Did you read the size from the file's Properties dialog (right-click) or from the details section of Explorer, which round differently? Do you see similar file-size changes with, say, Excel workbooks? Is this a regular occurrence or an anomaly?
Managing the database creation process purely in code may be the most efficient way to use the database, rather than going through Access's graphical user interface, since multi-user locking and application objects add some overhead.
All in all, it is not likely that ADO adds any data or components without your knowledge. Regularly decompile and compact the database, and explicitly release VBA objects (i.e., Set obj = Nothing). See the helpful performance tips. Also, don't focus too much on file size; focus on performance and integrity, since over the life of the app the file size will fluctuate. One final note: although Excel is very popular and easy to use, given the powerful native components of MS Access mentioned above, consider building what your end users need in Access itself (free runtimes are available from Microsoft, which is possible precisely because Jet/ACE is a Windows technology). In every respect, Access provides a more stable multi-user application and automation environment.
When you open an "Access database file" in Access, it adds/creates/updates an "Access application project" inside the database file. It's basically just adding some extra blobs/tables/data inside the database if they are missing. And it checks the file and truncates it to the correct file length.
When it does this, it may be correcting a file-length error, discarding some unused cruft, or just adjusting the amount of space that was reserved for this purpose. Whatever the case, it is actually doing something that can't be done any other way, and it uses different defaults than you get any other way.
In older versions, when it did this, it always made the file bigger. Now, on your computer, using your versions, it's making the file smaller. This may be because of bugs in the original file creation, or it may be "behavior is by intent". You can't change that, so you shouldn't worry about it.
But it's not always a good idea to truncate unused file space: if you are going to add data to the file later, you will just have to request more file space again. And the different version libraries for mdb files had different ideas about what the "best" way to compact a database file was, and would produce different file lengths.
Having said that, it used to be possible to compact & repair a database file (mdb) using JRO or DAO. That would discard unused objects (compact) and correct the file length and delete hanging references (repair). I'm not aware of any similar functionality for ACE, but I haven't looked.
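For what it's worth, here is a rough sketch of that JRO compact, driven from Python through pywin32 instead of VBA (the paths are placeholders, this applies to .mdb files only, and it is a sketch rather than a definitive recipe):

```python
import os
import win32com.client  # pip install pywin32 (Windows only)

SRC = r"C:\data\mydb.mdb"          # placeholder source path
DST = r"C:\data\mydb_compact.mdb"  # placeholder destination path

# JRO.JetEngine.CompactDatabase writes a compacted copy to a new file.
jet = win32com.client.Dispatch("JRO.JetEngine")
jet.CompactDatabase(
    f"Provider=Microsoft.Jet.OLEDB.4.0;Data Source={SRC}",
    f"Provider=Microsoft.Jet.OLEDB.4.0;Data Source={DST}",
)

# Swap the compacted copy into place once it has been created.
os.replace(DST, SRC)
```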
Access might be compacting the database when you close it. There is a setting called Compact on Close that determines whether the database will automatically be compacted whenever you exit out of it.
Microsoft Office Support - Compact and repair a database
I am taking a backup of an SQLite DB using the cp command after running wal_checkpoint(FULL). The DB is used in WAL mode, so there are additional files (-shm and -wal) alongside it in my folder.
When I run wal_checkpoint(FULL), the changes in the WAL file get committed to the database. I was wondering whether the -wal and -shm files get deleted after running a checkpoint. If not, what do they contain?
I know my backup process is not good since I am not using the SQLite backup APIs; that is a bug in my code.
After searching through numerous sources, I believe the following to be true:
The -shm file contains an index to the -wal file. The -shm file improves performance when reading the -wal file.
If the -shm file gets deleted, it gets created again on the next database access.
If a checkpoint has been run, the -wal file can be deleted.
To perform safe backups:
It is recommended that you use the SQLite backup functions for making backups; the SQLite library can even back up an online (live) database (see the sketch just after this list).
If you don't want to use (1), then the best way is to close the database handles. This ensures a clean and consistent state of the database file, and deletes the -shm and -wal files. A backup can then be made using cp, scp etc.
If the SQLite database file is intended to be transmitted over a network, then the VACUUM command should be run after the checkpoint. This removes fragmentation in the database file, thereby reducing its size, so you transfer less data over the network.
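A minimal sketch of options (1) and (3), using Python's built-in sqlite3 module; the file names are placeholders, and Connection.backup requires Python 3.7+:

```python
import sqlite3

SRC_PATH = "app.db"         # placeholder: live WAL-mode database
DST_PATH = "app_backup.db"  # placeholder: backup destination

src = sqlite3.connect(SRC_PATH)
dst = sqlite3.connect(DST_PATH)

# Optionally fold the -wal contents back into the main database file first.
src.execute("PRAGMA wal_checkpoint(FULL)")

# The backup API copies a consistent snapshot, even while other
# connections continue to use the source database.
src.backup(dst)

# Optionally defragment the copy before shipping it over the network.
dst.execute("VACUUM")

dst.close()
src.close()
```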
The -shm file does not contain any permanent data.
When the last connection is closed, the database is automatically checkpointed, and the -wal file is then deleted.
This implies that after a checkpoint, and if no other connections exist, the -wal file does not contain any important data.
If possible, you should close the connection before taking the backup.
I have studied a lot about how durability is achieved in databases, and if I understand it correctly, it works like this (simplified):
Client's point of view:
start transaction.
insert into table values...
commit transaction
DB engine point of view:
write transaction start indicator to log file
write changes done by client to log file
write transaction commit indicator to log file
flush log file to HDD (this ensures durability of data)
return 'OK' to client
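A toy sketch of that engine-side sequence (purely illustrative; the file name and record format are invented, and real engines use far more elaborate log formats):

```python
import os

class ToyLog:
    """Append-only log that acknowledges a commit only after fsync."""

    def __init__(self, path="toy.log"):  # placeholder file name
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND)

    def commit(self, txn_id, changes):
        os.write(self.fd, f"BEGIN {txn_id}\n".encode())
        for change in changes:
            os.write(self.fd, f"{txn_id} {change}\n".encode())
        os.write(self.fd, f"COMMIT {txn_id}\n".encode())
        os.fsync(self.fd)   # durability point: log records are on disk
        return "OK"         # only now is the client told the commit succeeded

log = ToyLog()
print(log.commit(1, ["insert into t values (42)"]))
```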
What I observed:
The client application is single-threaded (one DB connection). I'm able to perform 400 transactions/sec, while a simple test that writes something to a file and then fsyncs that file to the HDD achieves only 150 syncs/sec. If the client were multi-threaded/multi-connection, I could imagine the DB engine grouping transactions and doing one fsync per several transactions, but that is not the case here.
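A rough version of the simple fsync test mentioned above (a sketch; the file name, write size, and iteration count are arbitrary, and results vary wildly with hardware):

```python
import os
import time

N = 1000
fd = os.open("fsync_test.dat", os.O_WRONLY | os.O_CREAT)

start = time.time()
for _ in range(N):
    os.write(fd, b"x" * 128)  # small write, like a short log record
    os.fsync(fd)              # force it to stable storage
elapsed = time.time() - start

os.close(fd)
print(f"{N / elapsed:.0f} syncs/sec")
```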
My question is whether, for example, MS SQL Server really synchronizes the log file (fsync, FlushFileBuffers, etc.) on every transaction commit, or whether there is some other kind of magic behind it.
The short answer is that, for a transaction to be durable, the log file has to be written to stable storage before changes to the database are written to disk.
Stable storage is more complicated than you might think. Disks, for example, are not usually considered to be stable storage. (Not by people who write code for transactional database engines, anyway.)
To see how a particular open-source DBMS writes to stable storage, you'll need to read the source code. The PostgreSQL source code is online (the relevant file is xlog.c). I don't know about the MySQL source.
In SQL Server, I find that when I mark a database as read-only, its existing large transaction log remains. To fix this I have to set it back to writable, run DBCC SHRINKFILE on the log file, then set it read-only again.
What is the use of a transaction log if the database is read only? Is there a reason it doesn't just get deleted/flushed?
If your database is still in the FULL recovery model, the log will not shrink without a proper log backup. You should switch your read-only database to the SIMPLE recovery model.
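A sketch of those steps driven through pyodbc (the database name, log file name, and connection string are placeholders; autocommit is needed because ALTER DATABASE and DBCC cannot run inside a user transaction):

```python
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=myserver;"
    "DATABASE=master;Trusted_Connection=yes;",  # placeholder connection string
    autocommit=True,
)
cur = conn.cursor()

cur.execute("ALTER DATABASE MyRefDb SET READ_WRITE WITH ROLLBACK IMMEDIATE")
cur.execute("ALTER DATABASE MyRefDb SET RECOVERY SIMPLE")
cur.execute("USE MyRefDb; DBCC SHRINKFILE (MyRefDb_log, 1); USE master;")
cur.execute("ALTER DATABASE MyRefDb SET READ_ONLY WITH ROLLBACK IMMEDIATE")

conn.close()
```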
Also, the log file is needed should you ever decide to detach/attach this database. As noted here:
In contrast, for a read-only database, the log cannot be rebuilt because the primary file cannot be updated. Therefore, when you attach a read-only database whose log is unavailable, you must provide the log file or files in the FOR ATTACH clause.
There is no use for a transaction log on a read-only DB :-) assuming you are past the point where you no longer care about the transactions that populated it. I imagine it's not flushed because that would not be good default behaviour: imagine if you wanted to go back to write mode, or discovered during read-only operation that there was a problem and you needed the logs.