I am taking an operating systems class where we just learned about the 'readers and writers' problem: how do you deal with multiple processes that want to read and write the same memory at the same time? I'm also dealing with a version of this problem at work: I am writing an application that requires multiple users to read and write to a shared SQL Server database. Because the 'readers and writers' problem seems so well understood and discussed, I'm assuming that Microsoft has solved it for me. Meaning that I don't need to worry about setting permissions or configuring SQL Server to ensure that people are not reading and writing to the database at the same time. Specifically, can I assume that, with SQL Server 2005, by default:
No process reads while another process is writing
No two processes write at the same time
A writer will take an exclusive (X) lock on at least the row(s) they are modifying and will hold it until their transaction commits. X locks are incompatible with other X locks, so two writers cannot concurrently modify the same row.
A reader will (at the default READ COMMITTED isolation level) take a shared lock and release it as soon as the data is read. This is incompatible with an X lock, so readers must wait for writing transactions to finish before reading modified data. SQL Server also has snapshot isolation, in which readers are not blocked by writers but instead read an earlier version of the row.
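To make the default blocking behaviour concrete, here is a minimal two-session sketch; the dbo.Accounts table and its columns are made up for illustration.

    -- Session 1: update a row and keep the transaction open,
    -- so the exclusive (X) lock on that row is still held.
    BEGIN TRANSACTION;
    UPDATE dbo.Accounts
    SET Balance = Balance - 100
    WHERE AccountId = 1;

    -- Session 2 (separate connection, default READ COMMITTED):
    -- this SELECT needs a shared (S) lock on the same row, which is
    -- incompatible with the X lock, so it blocks here.
    SELECT Balance
    FROM dbo.Accounts
    WHERE AccountId = 1;

    -- Session 1: committing releases the X lock and unblocks session 2.
    COMMIT TRANSACTION;

Under snapshot isolation (or READ_COMMITTED_SNAPSHOT), session 2 would instead read the last committed version of the row without waiting.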
Traditional relational databases such as MS SQL Server use a pessimistic approach and lock rows, pages or tables until a write operation is done. You really don't have to deal with that yourself, because, as you said, the vendor has already solved the problem. Have a look at this article for an introduction; any database book will cover the problem in depth. If you are interested in the topic, I would suggest reading "Database Systems" by Connolly and Begg, for example.
I haven't seen any documentation around this; the high-level question is: given a Postgres instance (hosted on AWS RDS) that contains many databases, can an issue with query execution and locking in one of the databases cause an issue across the entire instance?
I am specifically looking for information about query execution and locking; I appreciate that issues around memory and CPU usage could be shared, since those are shared resources.
Earlier today one of our databases had an issue where, essentially, all query execution ground to a halt; upon further investigation it turned out that we had 8 'blocking' queries that were blocking cyclically (i.e. each depended on another to execute).
For some reason, not only did the affected database become stuck, but so did other databases on the instance, even those not involved in the lock cycle.
What parameters / constraints should we be aware of when sharing an RDS Instance between databases?
We are using AWS Postgres 11
What you describe doesn't sound like locks in the database. If several processes block one another in a cycle, that is called a deadlock, and PostgreSQL resolves it automatically by canceling one of the involved transactions.
So maybe there was some lock outside the database involved, or you simply forgot to commit a transaction, and you had a live lock in the database.
Now normally this will only affect a single database, because you cannot access objects in a database different from the one you are connected to. The only exception is if you have a lock on one of the shared tables: pg_database, pg_authid, pg_tablespace and pg_shdepend. But to get such a lock you must be doing something rather unusual that typically requires superuser privileges, and you don't have those privileges on a hosted database.
The upshot is: there is nothing to consider, and what you describe should not happen. You should investigate more closely what exactly causes such a hang.
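If it does happen again, a query along these lines can show which backends are blocked and which PIDs are blocking them; this is only a rough sketch against pg_stat_activity, so adjust the columns to taste.

    -- Sessions that are currently blocked on a lock, and who blocks them.
    SELECT pid,
           pg_blocking_pids(pid) AS blocked_by,
           state,
           wait_event_type,
           wait_event,
           now() - xact_start    AS transaction_age,
           query
    FROM   pg_stat_activity
    WHERE  cardinality(pg_blocking_pids(pid)) > 0
    ORDER  BY transaction_age DESC;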
In other words, what are the steps to acquire locks? Also, when the WITH (NOLOCK) hint is added to a query, or the READ UNCOMMITTED isolation level is used, does this avoid all or some of the overhead associated with acquiring locks?
This is too big for a specific answer, but in a nutshell SQL Server will employ various types of locks depending on the request being made of it. A SELECT might acquire one type of lock and an UPDATE will acquire another.
This link has a good 101 on the subject. SQL Server locking Basics
And this one too
Another good locking read
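On the second half of the question: WITH (NOLOCK) and READ UNCOMMITTED amount to the same thing; they skip the shared row/page locks a normal read would take (a schema stability lock is still acquired), at the price of possibly reading uncommitted data. A hedged sketch, with dbo.Orders as a made-up table:

    -- Table hint on a single query:
    SELECT OrderId, Status
    FROM dbo.Orders WITH (NOLOCK)
    WHERE CustomerId = 42;

    -- Equivalent behaviour for the whole session:
    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
    SELECT OrderId, Status
    FROM dbo.Orders
    WHERE CustomerId = 42;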
I have gone through the entire Microsoft site to understand the isolation levels in SQL Server 2008 R2. However, before adopting one I would like to get suggestions from the experts at SO.
I have a PHP-based web page primarily used as a dashboard. Users (not more than 5) will upload bulk data (around 40,000 rows) each day, and around 70 users will have read-only access to the database. Please note that I have a fixed upload schedule for these 5 users, but I want to mistake-proof it against any data loss. Help me with the questions below:
What is the best isolation level I can use?
Will the default READ COMMITTED isolation help me here?
Also, is there a way to set the isolation level through SSMS for a particular database, other than T-SQL statements? (a universal isolation level for a database)
The 70 users will have the download option; is there a chance that the database will get corrupted if all or most of them try to download at the same time? How do I avoid that?
Any suggestion from experts....
Regards,
Yuvraj S
Isolation levels are really about how long shared locks on data being read are kept. But as Lieven already mentioned: those are NOT about preventing "corruption" in the database - those are about preventing readers and writers from getting in each other's way.
First up: any write operation (INSERT, UPDATE) will always require an exclusive lock on that row, and exclusive locks are not compatible with anything else - so if a given row to be updated is already locked, any UPDATE operation will have to wait - no way around this.
For reading data, SQL Server takes out shared locks - and the isolation levels are about how long those are held.
The default isolation level (READ COMMITTED) means: SQL Server will try to get a shared lock on a row and, if successful, read the contents of the row and release that lock right away. So the lock exists only for the brief period of the row being read. Shared locks are compatible with other shared locks, so any number of readers can read the same rows at the same time. Shared locks, however, block exclusive locks, so shared locks mostly prevent UPDATEs on the same rows.
And then there's also the READ UNCOMMITTED isolation level, which basically takes out no locks; this means it can also read rows that are currently being updated and exclusively locked - so you might get uncommitted data - data that might not even end up in the database in the end (if the transaction updating it gets rolled back) - be careful with this one!
The next level up is REPEATABLE READ, in which case shared locks, once acquired, are held until the current transaction terminates. This locks more rows and for a longer period of time - reads are repeatable since the rows you have read are locked against updates "behind your back".
And the ultimate level is SERIALIZABLE, in which entire ranges of rows (defined by the WHERE clause of your SELECT) are locked until the current transaction terminates.
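On the question of where these levels are set: the isolation level is a per-connection setting, so there is no SSMS switch that changes it for every future connection to one database. What you can set per database is READ_COMMITTED_SNAPSHOT, so that plain READ COMMITTED readers use row versions instead of shared locks. A sketch, with YourDatabase as a placeholder name:

    -- Per connection/session:
    SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;

    -- Per database: READ COMMITTED readers no longer take shared locks,
    -- they read the last committed version of each row instead.
    -- (Switching this on needs a moment with no other open connections.)
    ALTER DATABASE YourDatabase SET READ_COMMITTED_SNAPSHOT ON;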
Update:
More than the download part (secondary for me) I am worried about 5 users trying to update one database at the same time.
Well, don't worry - SQL Server will definitely handle this without any trouble!
If those 5 (or even 50) concurrent users are updating different rows - they won't even notice someone else is around. The updates will happen, no data will be hurt in the process - all is well.
If some of those users try to update the same row - they will be serialized. The first one will be able to get the exclusive lock on the row, do its update, release the lock and then go on. Now the second user would get its chance at it - get the exclusive lock, update the data, release lock, go on.
Of course: if you don't do anything about it, the second user's data will simply overwrite the first update. That's why there is a need for concurrency checks. You should check whether the data has changed between the time you read it and the time you want to write it; if it has changed, it means someone else already updated it in the meantime, and you need to think of a concurrency-conflict resolution strategy for this case (but that's a whole other question in itself...).
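One common way to implement that check in SQL Server is a ROWVERSION column: the application remembers the version it read, and the UPDATE only succeeds if the row is still unchanged. A rough sketch; the table, column and procedure names are made up.

    -- Assumes dbo.Orders has a ROWVERSION column, e.g.:
    --   ALTER TABLE dbo.Orders ADD RowVer ROWVERSION;
    CREATE PROCEDURE dbo.UpdateOrderStatus
        @OrderId        INT,
        @NewStatus      NVARCHAR(20),
        @OriginalRowVer BINARY(8)      -- the RowVer value read earlier
    AS
    BEGIN
        UPDATE dbo.Orders
        SET    Status = @NewStatus
        WHERE  OrderId = @OrderId
          AND  RowVer  = @OriginalRowVer;

        -- Zero rows updated means someone else changed (or deleted) the row.
        IF @@ROWCOUNT = 0
            RAISERROR('Row was changed by another user; reload and retry.', 16, 1);
    END;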
Here's my situation (SQL Server):
I have a web application that uses nHibernate for data access, and three other desktop applications. All access the same database and are likely to use the same tables at any one time.
Now, with the help of NH I'm batching selects in order to load an aggregate with all of its hierarchy - so I would see 4 to maybe 7 selects being issued at once (not sure if it matters).
Every few days one of the applications will get a "Transaction has been chosen as the deadlock victim." error (this usually appears on a SELECT).
I tried switching to snapshot isolation on the database, but that didn't help - I ended up with:
    Snapshot isolation transaction aborted due to update conflict. You cannot use snapshot isolation to access table '...' directly or indirectly in database '...' to update, delete, or insert the row that has been modified or deleted by another transaction. Retry the transaction or change the isolation level for the update/delete statement.
What suggestions do you have for this situation? What should I try, or what should I read, in order to find a solution?
EDIT:
Actually there's no RAID in there :). The number of users per day is small (I'd say 100 per day, with hundreds of small orders on a busy day); the database is a bit bigger, at about 2 GB, and growing faster every day.
It's a business app, that handles orders, emails, reports, invoices and stuff like that.
Lazy loading would not be an option in this case.
I guess taking a very close look at those indexes is my best bet.
Deadlocks are complicated. A deadlock means that at least two sessions each hold a lock that the other one needs; since both are waiting, the locks never get released and neither session can continue.
In other words, A has lock X, B has lock Y, and now A wants Y and B wants X. Neither will give up the lock they have until they are finished with their transaction. Both will wait indefinitely until they get the other lock. SQL Server sees that this is happening and kills one of the transactions in order to break the deadlock. Snapshot isolation won't help you - the DB still needs to preserve the atomicity of transactions.
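As an illustration only (dbo.Orders and dbo.Invoices are made-up tables), the classic pattern is two sessions taking the same two locks in opposite order:

    -- Session 1:
    BEGIN TRANSACTION;
    UPDATE dbo.Orders   SET Status = 'A' WHERE OrderId   = 1;  -- holds lock X

    -- Session 2:
    BEGIN TRANSACTION;
    UPDATE dbo.Invoices SET Status = 'B' WHERE InvoiceId = 1;  -- holds lock Y
    UPDATE dbo.Orders   SET Status = 'B' WHERE OrderId   = 1;  -- waits for lock X

    -- Session 1 again:
    UPDATE dbo.Invoices SET Status = 'A' WHERE InvoiceId = 1;  -- waits for lock Y
    -- Deadlock: SQL Server picks one session as the victim and raises error 1205.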
There is no simple answer anyone can give as to why a deadlock would be occurring. You'll need to profile your application to find out.
Start here: How to debug SQL deadlocks. That's a good intro.
Next, look at Detecting and Ending Deadlocks on MSDN. That will give you a lot of good background information on why deadlocks occur, and help you understand what you're looking at/for.
There are also some previous SO questions that you might want to look at:
Diagnosing Deadlocks in SQL Server 2005
Zero SQL deadlock by design
Or, if the deadlocks are very infrequent, just write some exception-handling code into your application to retry the transaction if a deadlock occurs. Sometimes it can be extremely hard (if not nearly impossible) to prevent certain deadlocks. As long as you write transactionally-safe code, it's not the end of the world; it's completely safe to just try the transaction again.
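As a rough sketch of that retry idea in T-SQL (the same pattern works just as well in application code), where dbo.DoWork stands in for your real statements and 1205 is the deadlock-victim error number:

    DECLARE @attempt INT, @maxAttempts INT, @msg NVARCHAR(2048);
    SET @attempt = 1;
    SET @maxAttempts = 3;

    WHILE @attempt <= @maxAttempts
    BEGIN
        BEGIN TRY
            BEGIN TRANSACTION;
            EXEC dbo.DoWork;                  -- placeholder for the real work
            COMMIT TRANSACTION;
            BREAK;                            -- success: stop retrying
        END TRY
        BEGIN CATCH
            IF XACT_STATE() <> 0
                ROLLBACK TRANSACTION;

            IF ERROR_NUMBER() = 1205 AND @attempt < @maxAttempts
                SET @attempt = @attempt + 1;  -- deadlock victim: try again
            ELSE
            BEGIN
                SET @msg = ERROR_MESSAGE();
                RAISERROR(@msg, 16, 1);       -- not a deadlock, or out of retries
                BREAK;
            END
        END CATCH;
    END;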
Is your hardware properly configured (specifically RAID configuration)? Is it capable of matching your workload?
If hardware is all good and humming, you should ensure you have the 'right' indexes to match your query workload.
Many locking/deadlock problems can be eliminated with the correct indexes (covering indexes can take pressure off the clustered index during inserts).
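For example, a covering index for a frequent lookup might look like the sketch below (table and column names are invented); the key column serves the WHERE clause and the INCLUDE columns let the query be answered from the index alone.

    CREATE NONCLUSTERED INDEX IX_Orders_CustomerId
    ON dbo.Orders (CustomerId)
    INCLUDE (OrderDate, Status, Total);   -- columns the query selects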
BTW: turning on snapshot isolation will put increased pressure on your tempdb. How is tempdb configured? RAID 0 is preferred (and even better, use an SSD if tempdb is a bottleneck).
While it's not uncommon to find this error in NHibernate sessions with large numbers of users, it seems to be happening too often in your case.
Perhaps your objects are very large, resulting in long-running selects? And if your selects are taking too long, that might indicate problems with your indexes (as Mitch Wheat explains).
If everything is in order, you could also try lazy loading to postpone your selects until you really need your data. This might not be appropriate for your exact situation, so you will have to see if it works.
Many databases I've encountered (like SQL Server) use a single file to store the entire database. This seems to be a pretty common approach. What are the advantages of storing the entire database in a single file, as opposed to breaking up the data into more logical units, such as one file per table?
Also, how does a database work internally? How does it handle concurrent writes to the same file by different threads? In most applications I've seen, you can only have one open write handle on a file at a time. How do the various database engines handle concurrent writes?
A single non-fragmented large file can be treated by the server application much like a raw disk is treated by the operating system: a random-seekable block of bytes. The database server could, if it chose to, implement an entire file system on top of that block of bytes, if there was a benefit to implementing tables as separate files.
Concurrent writes to different sections of the same file are not a problem. The database uses locking strategies to make sure that multiple threads aren't trying to access the same section of the file, and this is one of the main reasons that database transactions exist: to isolate the visible effects of one transaction from another.
For example, a database server might keep track of which rows in which tables have been accessed by which in-flight transactions; when a transaction retires, the rows which it had touched are released so that they can be freely accessed by other transactions. In this scenario, other transactions might simply block - i.e. wait - when they try to access rows that are currently part of another transaction. If the other transaction doesn't complete within a reasonable (configurable) time, the waiting transaction might be aborted. Often the reason for this is a deadlock. The application using the database can then choose, if it wants, to retry the transaction.
This locking could be implemented using semaphores or other synchronization mechanisms, depending on the performance tradeoffs.
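In SQL Server, for instance, that 'reasonable (configurable) time' corresponds to the session's lock timeout (the default of -1 means wait forever). A small sketch with a made-up table:

    SET LOCK_TIMEOUT 5000;      -- give up after 5 seconds of waiting for a lock

    SELECT Balance
    FROM dbo.Accounts           -- if another transaction holds an incompatible
    WHERE AccountId = 1;        -- lock for longer than 5s, error 1222 is raised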
Right, a given file might only have one process with an open file descriptor, otherwise the different processes could overwrite each other's work. Typically all I/O on a database must be done by the RDBMS process. All applications then submit their queries through some inter-process communication (including network), and get results. The physical I/O of the database file is therefore centralized.
It's also pretty common, in practice, for RDBMS implementations to have a lock manager thread to govern access to subsections of the file, at the table, page, or row level, depending on the implementation. That creates a "bottleneck": while the RDBMS might have many threads executing queries and doing network communication, concurrent access to a given section of the database still has to queue up to acquire locks. It'd be very tricky to make lock management fully parallel.
As for single file versus multiple file, the pros and cons also depend on the RDBMS implementation. One example is MySQL's InnoDB which by default uses the single-file approach. But it doesn't know how to shrink the file if you delete a bunch of data; it just marks some space in the file as "free," to be used by subsequent inserts. Even if you drop a whole table, the file never shrinks. But if you had chosen the file-per-table option when you set up your InnoDB table space, and you drop a table, InnoDB can remove the file for that table, and therefore free the disk space.
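For instance, in MySQL you can check and (for newly created tables) change that behaviour at runtime; whether this is already the default depends on your version, so treat this as a sketch.

    -- Is each new InnoDB table stored in its own .ibd file?
    SHOW VARIABLES LIKE 'innodb_file_per_table';

    -- Turn it on for tables created from now on (existing tables keep
    -- the tablespace they were created in).
    SET GLOBAL innodb_file_per_table = ON;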
I think Barry's answer is quite excellent. I'll just add a few more thoughts. Note that this kind of blurs the line between file systems and raw devices, which are quite different but can conceptually be treated as the same thing.
Why would a DBMS vendor roll their own I/O management, and so on?
Control
When most DBMS systems grew up (Oracle, DB2, Sybase ASE {SQL Server is a cousin of Sybase ASE}), operating systems' file systems were not as advanced as they are today, but they were progressing rapidly (Oracle was written in 1979, Sybase in 1987). It was not a given that the OS could do all sorts of fancy things both quickly and safely. DBMS vendors wrote their own I/O libraries to reduce the likelihood that they would be affected by operating system quirks or become obsolete as technology progressed.
This is much less prevalent now (MySQL, PostgreSQL, SQLite, etc. don't do this); even SQL Server turned a large portion of this management back over to Windows, because the Windows team worked closely with the SQL Server team to optimize for a DBMS workload.
Security
Keeping tight control of the entire data file allows the DBMS to ensure that writes happen when it wants them to and not when the OS feels like it. Keeping its own data caches ensures that the OS won't page out important database data just because some low-level log rotation job needs memory.
Consistency
Oracle, Sybase ASE, etc. are very expensive and very complex systems. If you spent $10M on a DBMS install and it ran slowly (or worse, corrupted data!) because of some crazy bug in your particular revision of your OS's kernel, who would you blame? The DBMS vendor. Rolling your own I/O, lock management, concurrency control, threading, etc. is certainly the hard way to do it, but when you absolutely need repeatable, consistent behavior from your DBMS across a wide range of operating systems, you have to take the OS out of the equation as much as possible.
Again, as operating systems have matured and grown, many of the newer systems have tried to use OS-level features as much as possible, but even MySQL has buffer pools that you can configure in my.cnf.
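For example, the InnoDB buffer pool mentioned above is just a line in my.cnf (something like innodb_buffer_pool_size = 2G), and you can inspect the current value at runtime:

    SHOW VARIABLES LIKE 'innodb_buffer_pool_size';   -- value is reported in bytes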
A related note.
I believe it is an MSFT recommendation that you create a filegroup for your system tables and one for your other objects. Another may also be created to store indexes. We don't do this, as none of our applications demand such high performance, and it would also increase the complexity of maintenance.
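For completeness, a hedged sketch of what that looks like in T-SQL; the database name, file paths and object names are placeholders.

    -- Add a filegroup for user data and one for indexes, each with one file.
    ALTER DATABASE YourDatabase ADD FILEGROUP UserData;
    ALTER DATABASE YourDatabase ADD FILE
        (NAME = 'UserData1', FILENAME = 'D:\SQLData\YourDatabase_UserData1.ndf')
        TO FILEGROUP UserData;

    ALTER DATABASE YourDatabase ADD FILEGROUP Indexes;
    ALTER DATABASE YourDatabase ADD FILE
        (NAME = 'Indexes1', FILENAME = 'E:\SQLData\YourDatabase_Indexes1.ndf')
        TO FILEGROUP Indexes;

    -- New objects can then be placed on a specific filegroup:
    CREATE TABLE dbo.Orders
    (
        OrderId INT NOT NULL PRIMARY KEY,
        Status  NVARCHAR(20) NOT NULL
    ) ON UserData;

    CREATE NONCLUSTERED INDEX IX_Orders_Status
    ON dbo.Orders (Status) ON Indexes;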