I have a lot of small transactions. I perform them one by one, in sequence, using the XA JDBC driver against Oracle RAC. Is there a guarantee that they are committed in the same order in which I call them?
Update:
Everything happens in a single database session.
Yes, they are.
In a RAC or non-RAC environment, a successful commit means that the database change is guaranteed to have been recorded on disk (in the redo log). When the log writer writes to the redo log, it increments the SCN (System Change Number), which is global across all instances in the RAC. If you have sufficient privileges, you can see the database's current SCN by querying select current_scn from v$database;
There are some non-default options for commit where it doesn't necessarily wait until the redo log write is complete before it returns a success. If you use them, then possibly multiple transactions can be batched within the same SCN. That would permit a commit on a different instance to grab an SCN prior to one being issued for the batch. Avoid those options, and you'll be fine.
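To make the ordering argument concrete, here is a toy Python model (not Oracle's actual implementation) of a cluster-wide counter that every synchronous commit draws from. The class names GlobalSCN and Session are hypothetical; the point is only that commits issued one after another in a single session receive strictly increasing numbers, even when another instance commits in between.

```python
import itertools
import threading


class GlobalSCN:
    """Toy stand-in for a cluster-wide System Change Number (not Oracle's real mechanism)."""

    def __init__(self, start: int = 1000):
        self._counter = itertools.count(start)
        self._lock = threading.Lock()

    def next_scn(self) -> int:
        # Every commit, on any instance, draws from the same monotonically increasing counter.
        with self._lock:
            return next(self._counter)


class Session:
    """Hypothetical session that commits its transactions one after another."""

    def __init__(self, scn_source: GlobalSCN):
        self.scn_source = scn_source
        self.commit_log = []  # (transaction name, SCN) in the order the commits returned

    def commit(self, txn_name: str) -> int:
        # A synchronous commit only returns after it has been assigned an SCN,
        # so the next commit in the same session can only receive a later one.
        scn = self.scn_source.next_scn()
        self.commit_log.append((txn_name, scn))
        return scn


scn = GlobalSCN()
node1 = Session(scn)  # session on instance 1
node2 = Session(scn)  # session on instance 2
node1.commit("t1")
node2.commit("other")  # another instance commits in between
node1.commit("t2")
node1.commit("t3")
print([s for _, s in node1.commit_log])  # [1000, 1002, 1003]
```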
I have gone through the entire Microsoft site to understand the isolation levels in SQL Server 2008 R2. However, before adopting one I would like to take suggestions from the experts at SO.
I have a PHP-based web page that is primarily used as a dashboard. Users (no more than 5) will upload bulk data (around 40,000 rows) each day, and around 70 users will have read-only access to the database. Please note that I have a fixed upload schedule for these 5 users, but I want to mistake-proof the process against any data loss. Help me with the questions below:
What is the best isolation level I can use?
Will the default READ COMMITTED isolation help me here?
Also, is there a way to set the isolation level through SSMS for a particular database, other than with T-SQL statements? (a universal isolation level for a database)
70 users will have the download option. Is there a chance that the DB will get corrupted if all or most of them try to download at the same time? How do I avoid that?
Any suggestions from the experts are welcome.
Regards,
Yuvraj S
Isolation levels are really about how long shared locks on data being read are kept. But as Lieven already mentioned: they are NOT about preventing "corruption" in the database; they are about preventing readers and writers from getting in each other's way.
First up: any write operation (INSERT, UPDATE) will always require an exclusive lock on that row, and exclusive locks are not compatible with anything else - so if a given row to be updated is already locked, any UPDATE operation will have to wait - no way around this.
For reading data, SQL Server takes out shared locks - and the isolation levels are about how long those are held.
The default isolation level (READ COMMITTED) means: SQL Server will try to get a shared lock on a row and, if successful, read the contents of the row and release that lock again right away. So the lock only exists for the brief period during which the row is being read. Shared locks are compatible with other shared locks, so any number of readers can read the same rows at the same time. Shared locks do, however, block exclusive locks, so while they are held they mainly prevent UPDATEs on the same rows.
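A minimal sketch of the two lock modes described here, assuming only shared (S) and exclusive (X) locks; SQL Server has more modes (update, intent, and so on), and the Row class is hypothetical. It shows the compatibility rules and a READ COMMITTED-style read that releases its shared lock as soon as the row has been read.

```python
# Simplified lock modes: S = shared (readers), X = exclusive (writers).
# Real lock managers have more modes; this sketch keeps only the two discussed above.
COMPATIBLE = {
    ("S", "S"): True,   # any number of readers can share a row
    ("S", "X"): False,  # a reader's shared lock blocks a writer
    ("X", "S"): False,  # a writer's exclusive lock blocks readers
    ("X", "X"): False,  # two writers never share a row
}


def can_grant(held_modes, requested):
    """Return True if `requested` is compatible with every lock already held on the row."""
    return all(COMPATIBLE[(held, requested)] for held in held_modes)


class Row:
    """Hypothetical row with the list of lock modes currently held on it."""

    def __init__(self, value):
        self.value = value
        self.held = []

    def read_committed(self):
        """READ COMMITTED-style read: take S, read the value, release S immediately."""
        if not can_grant(self.held, "S"):
            raise RuntimeError("would block: row is exclusively locked")
        self.held.append("S")
        try:
            return self.value
        finally:
            self.held.remove("S")  # released as soon as the row has been read
```

Because the shared lock is gone by the time read_committed returns, a writer waiting for an X lock only has to wait for the duration of the read itself, not for the reader's whole transaction.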
And then there's also the READ UNCOMMITTED isolation level, which basically takes out no locks; this means it can also read rows that are currently being updated and are exclusively locked. So you might get uncommitted data: data that might never actually end up in the database (if the transaction updating it gets rolled back). Be careful with this one!
The next level up is REPEATABLE READ, in which case the shared locks once acquired are held until the current transaction terminates. This locks more rows and for a longer period of time - reads are repeatable since those rows you have read are locked against updates "behind your back".
And the ultimate level is SERIALIZABLE, in which entire ranges of rows (defined by the WHERE clause in your SELECT) are locked until the current transaction terminates.
Update:
More than the download part (which is secondary for me), I am worried about 5 users trying to update one database at the same time.
Well, don't worry - SQL Server will definitely handle this without any trouble!
If those 5 (or even 50) concurrent users are updating different rows - they won't even notice someone else is around. The updates will happen, no data will be hurt in the process - all is well.
If some of those users try to update the same row - they will be serialized. The first one will be able to get the exclusive lock on the row, do its update, release the lock and then go on. Now the second user would get its chance at it - get the exclusive lock, update the data, release lock, go on.
Of course, if you don't do anything about it, the second user's data will simply overwrite the first update. That's why concurrency checks are needed. You should check whether the data has changed between the time you read it and the time you want to write it; if it has changed, someone else has already updated it in the meantime, and you need a concurrency conflict resolution strategy for this case (but that's a whole other question in itself....)
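One common concurrency check is optimistic concurrency with a version column: remember the version you read, and make the UPDATE conditional on it still being the same. The sketch below uses SQLite from Python's standard library as a stand-in for SQL Server (where a rowversion column is the usual choice); the table and column names are made up for illustration.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER, version INTEGER)")
    conn.execute("INSERT INTO account VALUES (1, 100, 0)")
    conn.commit()
    return conn


def read_row(conn, row_id):
    return conn.execute("SELECT balance, version FROM account WHERE id = ?", (row_id,)).fetchone()


def update_if_unchanged(conn, row_id, new_balance, expected_version):
    """Write only if nobody changed the row since we read it; return False on conflict."""
    cur = conn.execute(
        "UPDATE account SET balance = ?, version = version + 1 WHERE id = ? AND version = ?",
        (new_balance, row_id, expected_version),
    )
    conn.commit()
    # 0 rows touched means the version has moved on: a concurrency conflict.
    return cur.rowcount == 1


conn = make_db()
_, v_a = read_row(conn, 1)  # user A reads version 0
_, v_b = read_row(conn, 1)  # user B also reads version 0
print(update_if_unchanged(conn, 1, 150, v_a))  # True: A's write succeeds, version becomes 1
print(update_if_unchanged(conn, 1, 80, v_b))   # False: B's stale write is rejected
```

What to do on a False result (re-read and retry, merge, or ask the user) is the conflict resolution strategy mentioned above.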
I have n machines writing to a DB (SQL Server) at exactly the same time (each initiating a transaction). I'm setting the isolation level to SERIALIZABLE. My understanding is that whichever machine's transaction gets to the DB first gets executed, and the other transactions will be blocked while it completes.
Is that correct?
It depends - are they all performing the same activities? That is, exactly the same statements executing in the same order, with no flow control?
If not, and two connections are accessing independent objects in the DB, they can run in parallel.
If there's some overlap of resources, then some progress may be made by multiple connections until one of them wants to take a lock that another already has - at which point it will wait. There is then the possibility of deadlocks.
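Conceptually, a lock manager spots a deadlock as a cycle in a wait-for graph (transaction A waits for B, B waits for A) and then rolls back one of the participants as the victim. The following is a generic cycle-detection sketch of that idea, not SQL Server's actual lock monitor; the transaction names are placeholders.

```python
def find_deadlock(waits_for):
    """Return the transactions forming a cycle in the wait-for graph, or None.

    waits_for maps each transaction to the set of transactions it is waiting on.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {t: WHITE for t in waits_for}
    stack = []

    def visit(t):
        color[t] = GREY
        stack.append(t)
        for u in waits_for.get(t, ()):
            if color.get(u, WHITE) == GREY:
                return stack[stack.index(u):]  # back edge: the cycle is on the stack
            if color.get(u, WHITE) == WHITE:
                cycle = visit(u)
                if cycle:
                    return cycle
        stack.pop()
        color[t] = BLACK
        return None

    for t in list(waits_for):
        if color[t] == WHITE:
            cycle = visit(t)
            if cycle:
                return cycle
    return None


# Machine A holds row 1 and wants row 2; machine B holds row 2 and wants row 1.
print(find_deadlock({"A": {"B"}, "B": {"A"}}))  # ['A', 'B']
# B is not waiting on anyone, so A simply waits until B finishes.
print(find_deadlock({"A": {"B"}, "B": set()}))  # None
```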
SERIALIZABLE:
Statements cannot read data that has been modified but not yet committed by other transactions.
No other transactions can modify data that has been read by the current transaction until the current transaction completes.
Other transactions cannot insert new rows with key values that would fall in the range of keys read by any statements in the current transaction until the current transaction completes.
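The third rule is what prevents phantoms. As a rough sketch, assuming a simplified model where each transaction's reads are recorded as closed key intervals (SQL Server really locks ranges between index keys), an insert by another transaction is blocked whenever its key falls into a locked interval. The RangeLockTable class is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RangeLockTable:
    """Toy model of SERIALIZABLE range locks: txn id -> list of (low, high) key ranges read."""

    ranges: dict = field(default_factory=dict)

    def record_read(self, txn, low, high):
        # Under SERIALIZABLE, the range read by a WHERE clause stays locked until txn ends.
        self.ranges.setdefault(txn, []).append((low, high))

    def can_insert(self, txn, key):
        # An insert is blocked if its key falls inside a range that another open txn has read.
        return not any(
            low <= key <= high
            for owner, spans in self.ranges.items() if owner != txn
            for low, high in spans
        )

    def end(self, txn):
        # Commit or rollback releases all of the transaction's range locks.
        self.ranges.pop(txn, None)


locks = RangeLockTable()
locks.record_read("T1", 10, 20)    # T1: SELECT ... WHERE id BETWEEN 10 AND 20
print(locks.can_insert("T2", 15))  # False: would create a phantom for T1
print(locks.can_insert("T2", 25))  # True: outside the locked range
locks.end("T1")
print(locks.can_insert("T2", 15))  # True once T1 has finished
```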
whichever machine's transaction gets to the DB first, gets executed and the other transactions will be blocked while this completes
No, this is incorrect. The results should be as if each transaction were executed one after another (serially, hence the isolation level's name). But the engine is free to use any implementation it likes, as long as it honors the guarantees of the serializable isolation model. Some engines actually implement it pretty much as you describe, e.g. Redis transactions (although Redis has no 'isolation level' concept).
For SQL Server, the transactions will execute in parallel until they hit a lock conflict. When a conflict occurs, the transaction that has been granted the lock continues undisturbed, while the one that requests the lock in a conflicting mode has to wait for the lock to be freed (i.e. for the granted transaction to commit). Which transaction ends up as the requester and which one holds the grant depends entirely on what is being executed. That means it may well be that the second machine gets the grant first and finishes first, while the first machine waits.
For a better understanding of how the locking behavior differs under the SERIALIZABLE isolation level, see Key-Range Locking.
Yes, this will be true for write operations at any isolation level: "My understanding is that whichever machine's transaction gets to the DB first, gets executed and the other transactions will be blocked while this completes."
The isolation level helps determine what happens when you READ data while this is going on. Serializable read operations will block your write operations, which might be the behavior you want.
SQL CE has a connection-string parameter called Flush Interval. It is defined as:
The interval time (in seconds) before all committed transactions are flushed to disk. If not specified, the default value is 10.
I thought that a committed transaction is, by definition, one that has been flushed to disk, specifically to the database file. If a transaction is only stored in RAM, couldn't it easily be lost?
I thought that transactions were first written to a log file and then applied to the database file itself, so perhaps this parameter could mean the time to wait until the transaction log is applied to the database file?
I would have thought that this parameter should be 0.
UPDATE:
Let me put my database-internals hat on. As I understand it, when an application starts a transaction, a start-of-transaction record is written to a database LOG; then each added, changed, or deleted record is written to the LOG, followed by an end-of-transaction record. A separate thread detects the end-of-transaction record and moves the records from the LOG to the DATABASE. When this is complete, a transaction ID is incremented to indicate that the transaction is complete. If the process crashes at any point, the database checks the LOG when it starts up to ascertain its state, and either finishes or rolls back open transactions. All of this implies that work is written to disk at every step of the process.
If Flush Interval were the time to write from the LOG to the DATABASE, then everything would make sense, but if the transaction is held in RAM rather than in a LOG, then the database cannot be ACID compliant.
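The log-then-apply model described in the update can be sketched as a redo-only recovery routine. This is a simplification that assumes uncommitted changes never reach the data file before commit (a "no-steal" policy); full ARIES-style recovery also needs an undo pass. The record format is invented for illustration.

```python
def recover(log):
    """Replay a write-ahead log after a crash.

    log is a list of records: ("BEGIN", txn), ("WRITE", txn, key, value), ("COMMIT", txn).
    Only transactions with a COMMIT record are redone; the rest are treated as rolled back.
    """
    committed = {rec[1] for rec in log if rec[0] == "COMMIT"}
    database = {}
    for rec in log:
        if rec[0] == "WRITE" and rec[1] in committed:
            _, _, key, value = rec
            database[key] = value  # redo committed changes in log order
    return database


log = [
    ("BEGIN", "t1"), ("WRITE", "t1", "a", 1), ("COMMIT", "t1"),
    ("BEGIN", "t2"), ("WRITE", "t2", "b", 2),  # crash before t2 commits
]
print(recover(log))  # {'a': 1}
```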
With newer versions, the Commit operation is overloaded. If you call Commit with a CommitMode.Immediate parameter, the Flush-Interval setting is ignored, and changes are persisted to the file immediately. The default option is CommitMode.Deferred (in the parameter-less call) which is based on the Flush-Interval value.
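A toy model of the difference, assuming a store that keeps committed-but-unflushed transactions in memory; the ToyStore class and its immediate flag are hypothetical and only loosely mirror CommitMode.Immediate versus the deferred default. For simplicity the interval is only checked when a commit arrives, whereas a real engine would flush on a background timer.

```python
class ToyStore:
    """Hypothetical model of deferred vs immediate commit (not the SQL CE implementation)."""

    def __init__(self, flush_interval_s=10):
        self.flush_interval_s = flush_interval_s
        self.on_disk = []   # transactions that survive a crash
        self.pending = []   # committed, but still only in memory
        self.last_flush = 0.0

    def commit(self, txn, now, immediate=False):
        self.pending.append(txn)
        # Immediate mode ignores the interval; deferred mode waits for it to elapse.
        if immediate or now - self.last_flush >= self.flush_interval_s:
            self.flush(now)

    def flush(self, now):
        self.on_disk.extend(self.pending)
        self.pending.clear()
        self.last_flush = now

    def crash(self):
        # Anything not yet flushed is lost: missing data rather than a corrupted file.
        self.pending.clear()
        return list(self.on_disk)


store = ToyStore(flush_interval_s=10)
store.commit("t1", now=1.0)                  # deferred: stays in memory
store.commit("t2", now=2.0, immediate=True)  # flushed at once (and takes t1 with it)
store.commit("t3", now=3.0)                  # deferred again
print(store.crash())  # ['t1', 't2']
```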
Refer to my post on SQLCE Corruptions: resolving corruption in SQL Server Compact Edition database files
With older versions there were indeed corruption problems. Worst case scenario with the more recent versions is missing data.
You cannot set the parameter to zero, though. 1 is the minimum.
I am looking into using log shipping in a SQL Server 2005 environment. The idea was to set up frequent log shipping to a secondary server. The intent: Use the secondary server to serve report queries, thereby offloading the primary db server.
I came across this on a sqlservercentral forum thread:
When you create the log shipping you have 2 choices. You can configure restore log operation to be done with norecovery or with standby option. If you use the norecovery option, you can not issue select statements on it. If instead of norecovery you use the standby option, you can run select queries on the database.
Bear in mind that with the standby option, users will be kicked out without warning by the restore process when log file restores occur. Actually, when you configure log shipping with the standby option, you can also select between 2 choices: kill all processes in the secondary database and perform the log restore, or don’t perform the log restore if the database is being used. Of course, if you select the second option, the restore operation might never run if someone opens a connection to the database and doesn’t close it, so it is better to use the first option.
So my questions are:
Is the above true? Can you really not use log shipping in the way I intend?
If it is true, could someone explain why you cannot execute SELECT statements to a database while the transaction log is being restored?
EDIT:
The first question is a duplicate of this Server Fault question. But I would still like the second question answered: why is it not possible to execute SELECT statements while the transaction log is being restored?
could someone explain why you cannot execute SELECT statements to a database while the transaction log is being restored?
Short answer is that RESTORE statement takes an exclusive lock on the database being restored.
For writes, I hope there is no need to explain why they are incompatible with a restore. Why does it not allow reads either? First of all, there is no way to know whether a session that holds a lock on a database is going to read or write. But even if that were possible, a restore (of a log backup or a database backup) is an operation that directly updates the data pages in the database. Since these updates go straight to the physical location (the page) and do not follow the logical hierarchy (metadata-partition-page-row), they would not honor intent locks held by data readers, and could therefore change structures while they are being read. A SELECT table scan following the pages' next/prev pointers would be thrown into disarray, resulting in a corrupted read.
Well yes and no.
You can do exactly what you wish to do, in that you may offload reporting workloads to a secondary server by configuring Log Shipping to a read only copy of a database. I have set this type of architecture up on a number of occasions previously and it works very well indeed.
The caveat is that in order to restore a transaction log backup file, there must be no other connections to the database in question. Hence the two choices: when the restore process runs, it will either fail, thereby prioritising user connections, or succeed by disconnecting all user connections in order to perform the restore.
Depending on your restore frequency, this is not necessarily a problem. You simply educate your users to the fact that, say, every hour at 10 past the hour there is a possibility that their report may fail. If this happens, simply re-run the report.
EDIT: You may also want to evaluate alternative architecture solutions for your business need, for example Transactional Replication or Database Mirroring with a Database Snapshot.
If you have Enterprise Edition, you can use database mirroring plus a database snapshot to create a read-only copy of the database that is available for reporting, etc. Mirroring uses "continuous" log shipping "under the hood". It is frequently used in the scenario you have described.
Yes it's true.
I think the following happens:
While the transaction log is being restored, the database is locked, as large portions of it are being updated.
This is for performance reasons more than anything else.
I can see two options:
Use database mirroring.
Schedule the log shipping to only occur when the reporting system is not in use.
There is a slight confusion here: the NORECOVERY flag on the restore means your database is not going to be brought out of the recovering state and into an online state. That is why SELECT statements will not work: the database is offline. The NORECOVERY flag exists so that you can restore multiple log files in a row (in a DR-type scenario) without bringing the database back online.
If you do not want to use log shipping and live with those disadvantages, you could switch to one-way transactional replication, but the overhead and setup will be more complex overall.
Would peer-to-peer replication work? Then you could run queries on one instance and so reduce the load on the original instance.
So I know that autocommit commits every SQL statement, but do updates to the database go directly to disk, or do they remain in cache until flushed?
I realize it's dependent on the database implementation.
Does auto-commit mean
a) every statement is a complete transaction AND it goes straight to disk or
b) every statement is a complete transaction and it may go to cache where it will be flushed later or it may go straight to disk
Clarification would be great.
Auto-commit simply means that each statement is in its own transaction which commits immediately. This is in contrast to the "normal" mode, where you must explicitly BEGIN a transaction and then COMMIT once you are done (usually after several statements).
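A small illustration using Python's built-in sqlite3 module (SQLite here, not any particular server database): with isolation_level=None the module issues no implicit BEGIN, so each statement commits on its own, while an explicit BEGIN groups statements so that they commit or roll back together.

```python
import sqlite3


def autocommit_demo():
    # isolation_level=None: no implicit BEGIN, so each statement is its own transaction.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE t (x INTEGER)")

    conn.execute("INSERT INTO t VALUES (1)")  # committed as soon as it finishes
    conn.rollback()                           # no open transaction, so nothing is undone

    conn.execute("BEGIN")                     # explicit transaction spanning several statements
    conn.execute("INSERT INTO t VALUES (2)")
    conn.execute("INSERT INTO t VALUES (3)")
    conn.execute("ROLLBACK")                  # undoes both inserts together

    rows = [r[0] for r in conn.execute("SELECT x FROM t")]
    conn.close()
    return rows


print(autocommit_demo())  # [1]
```

Note that this says nothing about when SQLite actually writes to disk; as the next paragraph points out, that is a separate implementation detail.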
The phrase "auto-commit" has nothing to do with disk access or caching. As an implementation detail, most databases will write to disk on commit so as to avoid data loss, but this isn't mandatory in the spec.
For ARIES-based protocols, committing a transaction involves logging all modifications made within that transaction. Changes are flushed immediately to the log file, but not necessarily to the data file (that depends on the implementation). That is enough to ensure that the changes can be recovered in the event of a failure. So, (b).
Commit provides no guarantee that something has been written to disk, only that your transaction has been completed and the changes are now visible to other users.
Permanent does not necessarily mean written to disk (i.e. durable)... In some databases, even whether a "commit" waits for the transaction's changes to be written can be configured.
For example, Oracle 10gR2 has several commit modes, including IMMEDIATE, WAIT, BATCH, and NOWAIT. BATCH will buffer the changes, and the log writer will write them to disk at some future time. NOWAIT will return immediately, without waiting for the I/O to complete.
The exact behavior of commit is very database-specific and can often be configured depending on your tolerance for data loss.
It depends on the DBMS you're using. For example, Firebird has it as an option in configuration file. If you turn Forced Writes on, the changes go directly to the disk. Otherwise they are submitted to the filesystem, and the actual write time depends on the operating system caching.
If the database transaction is claimed to be ACID, then the D (durability) mandates that a committed transaction survive a crash that happens immediately after the successful commit. For a single-server database, that means it's on disk (disk commit). For some modern multi-server databases, it can also mean that the transaction has been sent to one or more other servers (network commit, which is typically much faster than disk), under the assumption that the probability of multiple servers crashing at the same time is much smaller.
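The network-commit idea can be sketched as a quorum check: the commit is acknowledged to the client only once enough replicas have confirmed the commit record. The function name and the quorum of 2 out of 3 replicas are assumptions for illustration; real systems differ in how they choose and wait for the quorum.

```python
def network_commit_durable(acks, replicas, quorum):
    """Return True once at least `quorum` known replicas have acknowledged the commit record."""
    confirmed = set(acks) & set(replicas)  # ignore duplicate acks and acks from unknown nodes
    return len(confirmed) >= quorum


replicas = ["r1", "r2", "r3"]
# With 2 of 3 copies confirmed, losing any single server still leaves one copy of the commit.
print(network_commit_durable(["r1", "r2"], replicas, quorum=2))  # True
# Only 1 copy: a crash of r1 right now would lose the transaction, so keep waiting.
print(network_commit_durable(["r1"], replicas, quorum=2))        # False
```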
It's impossible to guarantee that commits are atomic, so modern databases use two-phase or three-phase commit strategies. See Atomic Commit.
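For reference, here is a minimal two-phase commit sketch (the protocol that XA transaction managers coordinate): every participant votes in a prepare phase, and the coordinator commits only if all votes are yes. The Participant class is hypothetical, and the sketch leaves out the durable logging, timeouts, and coordinator-failure handling that a real implementation needs (the last being what three-phase commit tries to address).

```python
from enum import Enum


class Vote(Enum):
    YES = "yes"
    NO = "no"


class Participant:
    """Hypothetical resource manager taking part in a distributed transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Phase 1: make the work recoverable, then vote.
        self.state = "prepared" if self.can_commit else "aborted"
        return Vote.YES if self.can_commit else Vote.NO

    def finish(self, commit):
        # Phase 2: apply the coordinator's global decision.
        self.state = "committed" if commit else "aborted"


def two_phase_commit(participants):
    votes = [p.prepare() for p in participants]   # phase 1: collect votes
    decision = all(v is Vote.YES for v in votes)  # commit only if everyone voted yes
    for p in participants:                        # phase 2: broadcast the decision
        p.finish(decision)
    return decision


ok = [Participant("db1"), Participant("db2")]
print(two_phase_commit(ok))  # True
mixed = [Participant("db1"), Participant("db2", can_commit=False)]
print(two_phase_commit(mixed), [p.state for p in mixed])  # False ['aborted', 'aborted']
```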