Does SQLite need an explicit lock? - database

I have read the sqlite.org documentation on the SQLite locking mechanism.
As I understand it, SQLite uses its own locking mechanism to prevent multiple processes/threads from writing to the same file at once.
https://www.sqlite.org/lockingv3.html
https://www.sqlite.org/wal.html
Do we need an explicit lock (such as a semaphore or mutex) for write queries across one or more processes?
With WAL mode enabled, if process 1 is executing a read query while process 2 is writing to the DB, will process 1 get a db_locked/db_busy error?

Related

Multithreaded reads and writes to a single SQLite database using the C API

My application consists of a number of threads (typically 5-10) and each is responsible for reading a value from an SQLite database, working on it for an amount of time and then writing a new value back to the database.
Each of the threads is running by itself without any synchronization.
My question is: do I need to write the synchronization code myself, or is it possible to interact with the SQLite C API in such a way that this gets taken care of for you? I.e. if there is a transaction in progress to write a value and a different thread tries to write or read the same row, will SQLite block until it's okay to do so?
Do I need to write the synchronization code myself or is it possible to interact with the SQLite C API in such a way that this gets taken care of for you?
The SQLite documentation covers this. In a nutshell, SQLite has three different threading models available:
single-thread, in which all internal mutexes are disabled, and the SQLite API cannot safely be used by multiple threads at the same time;
multi-thread, in which the library can safely be used concurrently by multiple threads, but database connections can be used only by a single thread at a time; and
serialized, in which multiple threads can use the API concurrently without restriction.
There is a built-in default selected at compile time (which is "serialized" mode in a standard build). A different mode can be selected during library initialization (sqlite3_config()), and a per-connection thread mode can be specified when you open a new connection, except that single-thread mode cannot be overridden.
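As a minimal sketch (not from the SQLite docs, and the file name is a placeholder), the global default can be set with sqlite3_config() before the library is initialized, and a per-connection mode can be requested through the open flags:

#include <sqlite3.h>
#include <stdio.h>

int main(void)
{
    sqlite3 *db;

    /* Global default: "multi-thread" mode. Must be called before any
       other SQLite API call initializes the library. */
    if (sqlite3_config(SQLITE_CONFIG_MULTITHREAD) != SQLITE_OK)
        return 1;

    /* Per-connection override: SQLITE_OPEN_FULLMUTEX asks for
       "serialized" mode for this one connection. */
    int rc = sqlite3_open_v2("example.db", &db,
                             SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE |
                             SQLITE_OPEN_FULLMUTEX, NULL);
    if (rc != SQLITE_OK) {
        fprintf(stderr, "open failed: %s\n", sqlite3_errmsg(db));
        sqlite3_close(db);
        return 1;
    }

    sqlite3_close(db);
    return 0;
}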
But note well that under all circumstances, SQLite provides a single transaction at a time per connection. Thus, you may need your own synchronization even in "serialized" mode.
If there is a transaction in progress to write a value and a different thread tries to write or read the same row, will SQLite block until it's okay to do so?
If an SQLite connection is in serialized mode and has autocommit enabled, then you're fine, in the sense that each statement will be executed in its own transaction, and different threads will not interfere with each other (but they may counteract each other). If a connection is instead in the weaker "multi-thread" mode then you must provide your own synchronization, so that different threads do not attempt to use the connection concurrently. If autocommit is disabled, then you will probably need to synchronize even in "serialized" mode to accommodate multiple threads using the same connection, else you will be unable to effectively control transaction boundaries and the contents of each transaction.
With respect to a single, established connection, there is no meaningful difference between "multi-thread" mode and "single-thread" mode.
If there is a transaction in progress to write a value and a different thread tries to write or read the same row, will SQLite block until it's okay to do so?
The easy way to do this is to use multi-thread mode, with one connection object per thread. Threads can then acquire a lock on the database with a BEGIN IMMEDIATE transaction. If you get an SQLITE_BUSY error, the thread can either do something else for a while before trying again, or, if you set up a busy timeout ahead of time with a value reasonable for your needs, the thread will periodically retry acquiring the transaction lock for up to that length of time before giving up.
If you use serialized mode and a single connection for the entire program, you have to write all the logic and locking yourself to make sure that only one particular thread can access the database while a transaction is active. It is much easier to use SQLite's native, well-tested support for that functionality.
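A rough sketch of that pattern, assuming one connection per thread (the function and the "counters" table are invented for illustration): set a busy timeout, take the write lock up front with BEGIN IMMEDIATE, and treat SQLITE_BUSY as "retry later":

/* Sketch only: one connection per thread; "counters" is a made-up table. */
int update_value(sqlite3 *db)
{
    int rc;

    /* Wait up to 5 seconds for a competing writer before giving up. */
    sqlite3_busy_timeout(db, 5000);

    rc = sqlite3_exec(db, "BEGIN IMMEDIATE;", NULL, NULL, NULL);
    if (rc == SQLITE_BUSY)
        return rc;            /* still locked after the timeout; retry later */
    if (rc != SQLITE_OK)
        return rc;

    rc = sqlite3_exec(db, "UPDATE counters SET value = value + 1;",
                      NULL, NULL, NULL);
    if (rc != SQLITE_OK) {
        sqlite3_exec(db, "ROLLBACK;", NULL, NULL, NULL);
        return rc;
    }

    return sqlite3_exec(db, "COMMIT;", NULL, NULL, NULL);
}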

Will database file be corrupted if concurrent write happened in SQLite3 WAL mode

As SQLite Doc mentioned,
WAL provides more concurrency as readers do not block writers and a writer does not block readers. Reading and writing can proceed concurrently.
Writers merely append new content to the end of the WAL file. Because writers do nothing that would interfere with the actions of readers, writers and readers can run at the same time. However, since there is only one WAL file, there can only be one writer at a time.
I've tried to simulate the scenario by sending multiple requests via JMeter, as follows:
https://i.stack.imgur.com/HdKBf.png
Nothing went wrong after the simulation. But my concerns are:
Does it mean in WAL mode, SQLite will handle multiple write requests itself?
Did I simulate the scenario in the right way? (Is there any better tool to do it?)
To answer your title question: no, SQLite3 in WAL mode will not corrupt your database file.
Write-ahead logging lets you read and write independently, but writes will still be executed one after another, even if you send them to the SQLite database at the same time.
You can look up more advantages and disadvantages of WAL mode in the SQLite wiki - WAL
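For reference, WAL mode is enabled with a pragma, and the setting is persistent in the database file; a minimal C sketch (the filename is a placeholder) could look like:

#include <sqlite3.h>
#include <stdio.h>

int main(void)
{
    sqlite3 *db;
    char *errmsg = NULL;

    if (sqlite3_open("example.db", &db) != SQLITE_OK) {
        fprintf(stderr, "open failed: %s\n", sqlite3_errmsg(db));
        return 1;
    }

    /* The journal mode sticks to the database file, so later
       connections will also open it in WAL mode. */
    if (sqlite3_exec(db, "PRAGMA journal_mode=WAL;", NULL, NULL, &errmsg) != SQLITE_OK) {
        fprintf(stderr, "could not enable WAL: %s\n", errmsg);
        sqlite3_free(errmsg);
    }

    sqlite3_close(db);
    return 0;
}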

exclusive lock file on windows c webserver

I'm writing a (very small) webserver in C on Windows.
I need an exclusive file lock for both reading and writing files. I've read the MSDN documentation about locking and found the function LockFileEx with the OVERLAPPED structure and an event handle hEvent; I also read about how they work, but the question is:
- In a web server we have lots of files. When a thread takes an exclusive lock on, for example, the file "test.txt" because there was a request for that file, how can I synchronize another thread that wants to acquire the lock on the same file?
Thank you.
Take a look at the use of mutex objects. They should solve that problem for you.
Threads that need access to the lock file can request a lock for it and be queued. When the current thread is done, it releases its lock and the next requesting thread is granted the lock.
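A minimal sketch of that idea (names and error handling are illustrative, and a real server would want one mutex per file rather than a single global one): each thread waits on the mutex before opening the file, so access is serialized:

#include <windows.h>

/* One mutex guarding "test.txt"; a real server would keep one per file. */
static HANDLE g_fileMutex;

void init_lock(void)
{
    g_fileMutex = CreateMutexW(NULL, FALSE, NULL);  /* unnamed, not owned */
}

void serve_file(void)
{
    /* Threads block here until the file is free. */
    WaitForSingleObject(g_fileMutex, INFINITE);

    HANDLE hFile = CreateFileW(L"test.txt", GENERIC_READ,
                               0 /* no sharing: exclusive */, NULL,
                               OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (hFile != INVALID_HANDLE_VALUE) {
        /* ... read and send the file ... */
        CloseHandle(hFile);
    }

    ReleaseMutex(g_fileMutex);
}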

How do filesystems handle concurrent read/write?

User A asks the system to read file foo and at the same time user B wants to save his or her data onto the same file. How is this situation handled on the filesystem level?
Most filesystems (but not all) use locking to guard concurrent access to the same file. The lock can be exclusive, so the first user to get the lock gets access - subsequent users get an "access denied" error. In your example scenario, user A gets the file lock and is able to read the file, but user B will not be able to write while user A is reading.
Some filesystems (e.g. NTFS) allow the level of locking to be specified, to allow for example concurrent readers, but no writers. Byte-range locks are also possible.
Unlike databases, filesystems typically are not transactional, not atomic and changes from different users are not isolated (if changes can even be seen - locking may prohibit this.)
Using whole-file locks is a coarse grained approach, but it will guard against inconsistent updates. Not all filesystems support whole-file locks, and so it is common practice to use a lock file - a typically empty file whose presence indicates that its associated file is in use. (Creating a file is an atomic operation on most file systems.)
Wikipedia - File Locking
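As an illustration of the lock-file approach described above (POSIX-style C; the path and function names are placeholders): O_CREAT | O_EXCL makes the creation atomic, so exactly one process can "own" the lock file at a time:

#include <fcntl.h>
#include <unistd.h>
#include <errno.h>

/* Returns an open descriptor on success, -1 if someone else holds the lock. */
int acquire_lock_file(const char *path)
{
    /* O_CREAT | O_EXCL: fails with EEXIST if the file already exists. */
    int fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0644);
    if (fd < 0 && errno == EEXIST)
        return -1;            /* associated file is in use */
    return fd;
}

void release_lock_file(const char *path, int fd)
{
    close(fd);
    unlink(path);             /* removing the lock file releases the "lock" */
}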
For Linux, the short answer is you could get some strange information back from a file if there is a concurrent writer. The kernel does use locking internally to run each read() and write() operation serially. (Although, I forget whether the whole file is locked or if it's on a per-page granularity.) But if the application uses multiple write() calls to write information to the file, a read() could happen between any of those calls, so it could see inconsistent data. This is an atomicity violation in the operating system.
As mdma has mentioned, you could use file locking to make sure there is only one reader and one writer at a time. It sounds like NTFS uses mandatory locking, where if one program locks the file, all other programs get error messages when they try to access it.
Unix programs generally don't use locking at all, and when they do, the lock is usually advisory. An advisory lock only prevents other processes from getting an advisory lock on the same file; it doesn't actually prevent the read or write. (That is, it only locks the file for those who check the lock.)
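To make "advisory" concrete, here is a small sketch using flock (just one of the Unix locking interfaces; the names are illustrative). A process that never calls flock is not stopped by this lock at all:

#include <sys/file.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>

int write_with_advisory_lock(const char *path, const char *data)
{
    int fd = open(path, O_WRONLY | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    /* Blocks until no other *cooperating* process holds the lock;
       processes that skip flock() can still read or write freely. */
    if (flock(fd, LOCK_EX) == 0) {
        (void)write(fd, data, strlen(data));
        flock(fd, LOCK_UN);
    }

    close(fd);
    return 0;
}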

fcntl, lockf, which is better to use for file locking?

Looking for information regarding the advantages and disadvantages of both fcntl and lockf for file locking. For example, which is better to use for portability? I am currently coding a Linux daemon and wondering which is better suited for enforcing mutual exclusion.
What is the difference between lockf and fcntl:
On many systems, the lockf() library routine is just a wrapper around fcntl(). That is to say lockf offers a subset of the functionality that fcntl does.
Source
But on some systems, fcntl and lockf locks are completely independent.
Source
Since it is implementation dependent, make sure to always use the same convention. So either always use lockf from both your processes or always use fcntl. There is a good chance that they will be interchangeable, but it's safer to use the same one.
Which one you choose doesn't matter.
Some notes on mandatory vs advisory locks:
Locking in unix/linux is by default advisory, meaning other processes don't need to follow the locking rules that are set. So it doesn't matter which way you lock, as long as your co-operating processes also use the same convention.
Linux does support mandatory locking, but only if your file system is mounted with the option on and the file special attributes set. You can use mount -o mand to mount the file system and set the file attributes g-x,g+s to enable mandatory locks, then use fcntl or lockf. For more information on how mandatory locks work see here.
Note that locks are applied not to the individual file, but to the inode. This means that 2 filenames that point to the same file data will share the same lock status.
In Windows, on the other hand, you can open a file for exclusive access, and that will completely block other processes from opening it, even if they want to - i.e. the locks are mandatory. The same goes for Windows file locks: any process with an open file handle with appropriate access can lock a portion of the file, and no other process will be able to access that portion.
How mandatory locks work in Linux:
Concerning mandatory locks, if a process locks a region of a file with a read lock, then other processes are permitted to read but not write to that region. If a process locks a region of a file with a write lock, then other processes are permitted neither to read nor to write to that region. What happens when a process is not permitted to access part of the file depends on whether you specified O_NONBLOCK: if the descriptor is blocking (the default), the operation waits until the lock is released; if O_NONBLOCK is set, the operation fails with EAGAIN.
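A sketch of taking and releasing such a region lock with fcntl (the code is the same whether the lock ends up advisory or mandatory; only the enforcement differs). F_SETLK fails immediately with EAGAIN or EACCES if the region is already locked, while F_SETLKW would block instead:

#include <fcntl.h>
#include <unistd.h>
#include <errno.h>

int lock_first_100_bytes(const char *path)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    struct flock fl = {0};
    fl.l_type   = F_WRLCK;    /* write (exclusive) lock; F_RDLCK for a read lock */
    fl.l_whence = SEEK_SET;
    fl.l_start  = 0;
    fl.l_len    = 100;        /* lock only the first 100 bytes */

    /* F_SETLK is non-blocking; use F_SETLKW to wait for the lock instead. */
    if (fcntl(fd, F_SETLK, &fl) == -1) {
        close(fd);
        return -1;            /* EAGAIN/EACCES: region locked by someone else */
    }

    /* ... access the locked region ... */

    fl.l_type = F_UNLCK;      /* release (also released when fd is closed) */
    fcntl(fd, F_SETLK, &fl);
    close(fd);
    return 0;
}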
NFS warning:
Be careful if you are using locking commands on an NFS mount. The behavior is undefined, and implementations vary widely in whether they use a local lock only or support remote locking.
Both interfaces are part of the POSIX standard, and nowadays both are available on most systems (I just checked Linux, FreeBSD, Mac OS X, and Solaris). Therefore, choose the one that better fits your requirements and use it.
One word of caution: it is unspecified what happens when one process locks a file using fcntl and another using lockf. In most systems these are equivalent operations (in fact under Linux lockf is implemented on top of fcntl), but POSIX says their interaction is unspecified. So, if you are interoperating with another process that uses one of the two interfaces, choose the same one.
Others have written that the locks are only advisory: you are responsible for checking whether a region is locked. Also, don't use stdio functions if you want to use the locking functionality.
Your main concerns, in this case (i.e. when "coding a Linux daemon and wondering which is better suited to use for enforcing mutual exclusion"), should be:
will the locked file be local or can it be on NFS?
e.g. can the user trick you into creating and locking your daemon's pid file on NFS?
how will the lock behave when forking, or when the daemon process is terminated with extreme prejudice e.g. kill -9?
The flock and fcntl commands behave differently in both cases.
My recommendation would be to use fcntl. You may refer to the File locking article on Wikipedia for an in-depth discussion of the problems involved with both solutions:
Both flock and fcntl have quirks which occasionally puzzle programmers from other operating systems. Whether flock locks work on network filesystems, such as NFS, is implementation dependent. On BSD systems flock calls are successful no-ops. On Linux prior to 2.6.12, flock calls on NFS files would only act locally. Kernel 2.6.12 and above implement flock calls on NFS files using POSIX byte range locks. These locks will be visible to other NFS clients that implement fcntl()/POSIX locks.
Lock upgrades and downgrades release the old lock before applying the new lock. If an application downgrades an exclusive lock to a shared lock while another application is blocked waiting for an exclusive lock, the latter application will get the exclusive lock and the first application will be locked out.
All fcntl locks associated with a file for a given process are removed when any file descriptor for that file is closed by that process, even if a lock was never requested for that file descriptor. Also, fcntl locks are not inherited by a child process. The fcntl close semantics are particularly troublesome for applications which call subroutine libraries that may access files.
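Coming back to the daemon/mutual-exclusion case from the question, one common pattern (sketched here; the pid-file path is a placeholder) is to hold an fcntl write lock on the pid file for the daemon's lifetime. Because the kernel releases the lock when the process exits, even kill -9 cannot leave it stuck, though the fork and close caveats quoted above still apply:

#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>

/* Sketch: single-instance lock; keep the returned descriptor open for
   the daemon's whole lifetime. */
int lock_pid_file(const char *path)   /* e.g. "/var/run/mydaemon.pid" */
{
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    struct flock fl = {0};
    fl.l_type   = F_WRLCK;
    fl.l_whence = SEEK_SET;            /* l_start = l_len = 0: whole file */

    if (fcntl(fd, F_SETLK, &fl) == -1) {
        close(fd);
        return -1;                     /* another instance already running */
    }

    char buf[32];
    int n = snprintf(buf, sizeof buf, "%ld\n", (long)getpid());
    ftruncate(fd, 0);
    (void)write(fd, buf, (size_t)n);

    return fd;                         /* lock dies with the process */
}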
I came across an issue while using fcntl and flock recently that I felt I should report here as searching for either term shows this page near the top on both.
Be advised that BSD locks, as mentioned above, are advisory. For those who do not know, OS X (Darwin) is BSD. This must be remembered when opening a file to write into.
To use fcntl/flock you must first open the file and get its descriptor. However, if you open the file with "w", the file is instantly zeroed out. If your process then fails to get the lock because the file is in use elsewhere, it will most likely return, leaving the file at 0 kB. The process which held the lock will now find the file has vanished from underneath it; catastrophic results normally follow.
To remedy this situation, when using file locking, never open the file with "w"; instead open it with "a", to append. Then, if the lock is successfully acquired, you can safely clear the file as "w" would have, i.e.:
fseek(fileHandle, 0, SEEK_SET);               // move to the start
ftruncate(fileno((FILE *) fileHandle), 0);    // clear it out
This was an unpleasant lesson for me.
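Putting that advice together, a sketch of the safe pattern (function and file names are invented, and this follows the answer's own stdio-plus-flock style): open for append so nothing is destroyed before the lock is held, and truncate only after the lock succeeds:

#include <stdio.h>
#include <unistd.h>
#include <sys/file.h>

int rewrite_file_safely(const char *path, const char *contents)
{
    /* "a" never truncates, so losing the race costs nothing. */
    FILE *fp = fopen(path, "a");
    if (fp == NULL)
        return -1;

    if (flock(fileno(fp), LOCK_EX | LOCK_NB) == -1) {
        fclose(fp);            /* someone else holds the lock; file untouched */
        return -1;
    }

    /* Now that we own the lock, it is safe to clear and rewrite the file. */
    ftruncate(fileno(fp), 0);
    fseek(fp, 0, SEEK_SET);
    fputs(contents, fp);
    fflush(fp);

    flock(fileno(fp), LOCK_UN);
    fclose(fp);
    return 0;
}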
As you're only coding a daemon which uses it for mutual exclusion, they are equivalent; after all, your application only needs to be compatible with itself.
The trick with the file locking mechanisms is to be consistent - use one and stick to it. Varying them is a bad idea.
I am assuming here that the filesystem will be a local one - if it isn't, then all bets are off; NFS and other network filesystems handle locking with varying degrees of effectiveness (in some cases none).
