Approach for a change tracking mechanism with concurrent writes - sql-server

I need to implement data synchronization in a distributed system, taking concurrent writes to the data table into account.
The export from the main DB should read only changed rows.
The common advice is to use triggers, marking rows with a timestamp of the data update or a sequential revision number, and to report this timestamp/revision number to the remote system.
E.g. What is the best approach to pull "Delta" data into Analytics DB from a highly transactional DB?
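For illustration, a minimal sketch of that trigger/timestamp approach in T-SQL (table, column, and trigger names are hypothetical):

    ALTER TABLE dbo.Orders ADD UpdatedAt datetime2 NULL;
    GO
    CREATE TRIGGER trg_Orders_Stamp ON dbo.Orders
    AFTER INSERT, UPDATE
    AS
    BEGIN
        SET NOCOUNT ON;
        -- Stamp every affected row with the current time
        -- (assumes the default RECURSIVE_TRIGGERS OFF setting).
        UPDATE o
        SET UpdatedAt = SYSUTCDATETIME()
        FROM dbo.Orders AS o
        JOIN inserted AS i ON i.OrderId = o.OrderId;
    END
    GO
    -- The export then reads everything newer than the last exported stamp:
    -- SELECT * FROM dbo.Orders WHERE UpdatedAt > @LastExportedAt;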
But with concurrent writes, the problem lies in the moment the commit takes place. Here is the problem:
[time 00:00] Transaction A starts writing a big batch of data;
marking rows with timestamp [00:00].
[time 00:02] Transaction B starts writing a small amount of data;
going to mark rows with timestamp [00:02].
[time 00:03] Transaction B finishes writing and commit happens.
[time 00:10] Export is done to the remote part of the system.
The isolation level is Read Committed, so it sees only the data from transaction B (timestamp 00:02).
[time 00:15] Transaction A finishes writing and commit happens.
The remote system never receives this data(!)
I like the Change Tracking solution in MSSQL, but I have to keep in mind our intention to migrate to PostgreSQL soon, so I need a general solution.
What is the correct approach to solve this problem?

Related

How exactly does streaming data to PostgreSQL through STDIN work?

Let's say I am using COPY to stream data into my database.
COPY some_table FROM STDIN
I noticed that AFTER the stream had finished, the database needs a significant amount of time to process this data and insert the rows into the table. In pgAdmin's monitoring I can see that there are nearly 0 table writes throughout the streaming process, and then suddenly everything is written in one peak.
Some statistics:
I am inserting 450k rows into one table without indexes or keys,
the table has 28 fields,
I am sending all NULLs to every field.
I am worried that there is a problem with my implementation of streaming. Is this how streaming works? Is the database waiting to gather all the text and then executing one gigantic command?
COPY inserts the rows as they are sent, so the data are really streamed. But PostgreSQL doesn't write them to disk immediately: rather, it only writes transaction log (WAL) information to disk, and the actual rows are written to the shared memory cache. The data are persisted later, during the next checkpoint. There is a delay between the start of COPY and actual writing to disk, which could explain what you observe.
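If you want to see this for yourself, a small sketch (the value shown is the stock default; CHECKPOINT needs superuser rights, or the pg_checkpoint role from v15 on):

    SHOW checkpoint_timeout;   -- default 5min: how often dirty pages get flushed
    SHOW shared_buffers;       -- the memory cache that holds the not-yet-flushed rows
    CHECKPOINT;                -- force the flush now; the write spike should follow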
The monitoring charts provided in pgAdmin are not fit for the purpose you are putting them to. Most of that data is coming from the stats collector, and that is generally only updated once per statement or transaction, at the end of the statement or transaction. So they are pretty much useless for monitoring rare, large, ongoing operations.
The type of monitoring you want to do is best done with OS tools, like top, vmstat, or sar.
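That said, if you would rather watch the COPY from inside the database, PostgreSQL 14 and later expose a progress view; a sketch to run in a second session while the COPY is running:

    SELECT relid::regclass AS table_name,
           command,
           bytes_processed,
           tuples_processed
    FROM pg_stat_progress_copy;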

Clarification in regards to using safe_time in YugabyteDB

The document https://docs.yugabyte.com/latest/architecture/transactions/transactional-io-path/ says that a distributed txn can choose the safe_time from one of the involved tablets, and that safe_time considers the hybrid timestamp of the first uncommitted Raft log record. Does this mean that YugabyteDB guarantees that every txn can read the data written by txns committed before it starts?
[Disclaimer]: This question was first asked on the YugabyteDB Community Slack channel.
There are two components to choosing a read timestamp for a snapshot isolation transaction: (1) it needs to be recent enough to capture everything that was committed before the transaction started; and (2) it needs to be as low as possible to avoid unnecessary waits. Choosing the safe time from the first tablet that a transaction reads from or writes to is just a heuristic towards that goal.

Safe time considers the timestamp of the first uncommitted (in the Raft sense) record in that tablet's Raft log as one of its inputs. What actually goes into the safe time calculation is that uncommitted timestamp minus "epsilon" (the smallest possible hybrid time step), so that the record committing will not change the view of the data as of this timestamp. Safe time is also capped by the hybrid time leader lease of the tablet's leader, so that we are safe against leader changes and a new leader trying to read at a new timestamp past the leader lease expiration.

All of the above concerns "snapshot safety", i.e. the property that if we are reading at some time read_ht, we are guaranteed that no writes will be done to that data with timestamps <= read_ht. If safe time on a tablet has not reached a particular read_ht when a read request arrives at that tablet, the tablet will wait for it to reach read_ht before starting the read operation.

Now, let's address the question of how we guarantee that all the data written prior to a transaction starting is visible to that transaction. This is done through a mechanism called "read restarts" and a clock skew assumption. If a read request on behalf of a snapshot isolation transaction with a read time read_ht encounters a committed record with a commit timestamp between read_ht and read_ht + max_clock_skew, that record might have been committed prior to the transaction starting (due to clock skew), and we have to restart the read request at the timestamp of that record.

The way this is implemented has some optimizations: the value read_ht + max_clock_skew is only computed once per transaction and does not change with read restarts; we call it global_limit. It is an upper bound on the commit timestamp of any transaction that could have committed prior to our transaction starting, and by setting read_ht = global_limit (which is actually suitable in some cases, like long-running reporting queries), we can safely avoid any read restarts. There is also a similar mechanism called local_limit, which limits the number of restarts to one per tablet.

So, with read restarts, we can be sure that a read request will capture all records that were written prior to the transaction starting, even in the presence of clock skew.

How do Narayana/XA recover from TM failures?

I was trying to reason about the failure recovery actions that can be taken by systems/frameworks that guarantee data sources stay in sync. I've been unable to find a clear explanation of Narayana's recovery mechanism.
Q1: Does Narayana essentially employ a 2-phase commit to ensure distributed transactions across 2 datasources?
Q2: Can someone explain Narayana's behavior in this scenario?
Application wants to save X to 2 data stores
Narayana's transaction manager (TM) generates a transaction ID and writes info to disk
TM now sends a prepare message to both data stores
Each data store responds back with prepare_success
TM updates local transaction log and sends a commit message to both data stores
TM fails (permanently). Because of packet loss on the network, one data store never receives the commit message, while the other data store receives and successfully processes it.
The two data stores are now out of sync with each other (one source has an additional transaction that is not present in the other source).
When a new TM is brought up, it does not have access to the old transaction state records. So the TM cannot initiate the recovery of the missing transaction in one of the data stores.
So how can 2PC/Narayana/XA claim that they guarantee distributed transactions that keep 2 data stores in sync? From where I stand, they can only keep data stores in sync with a very high probability; they cannot guarantee it.
Q3: Another scenario where I'm unclear on the behavior of the application/framework. Consider the following interleaved transactions (both on the same record - or at least with a partially overlapping set of records):
Di = Data source i
Ti = Transaction i
Pi = prepare message for transaction i
D1 receives P1; responds P1_success
D2 receives P2; responds P2_success
D1 receives P2; responds P2_failure
D2 receives P1; responds P1_failure
The order in which the network packets arrive at the different data sources can determine which prepare request succeeds. Does this not mean that, at high transaction rates on a contentious record, it is possible that all transactions will keep failing (until the record experiences a lower transaction request rate)?
One could argue that we are choosing consistency over availability, but unlike ACID systems, there is no guarantee that at least one of the transactions will succeed (thus avoiding a potentially long-lasting deadlock).
I would refer you to my article on how Narayana 2PC works
https://developer.jboss.org/wiki/TwoPhaseCommit2PC
To your questions
Q1: You already mentioned it in the comment - yes, Narayana uses 2PC; that is, Narayana implements the XA specification (pubs.opengroup.org/onlinepubs/009680699/toc.pdf).
Q2: The steps in the scenario are not precise. Narayana writes to disk at the time prepare is called, not at the time the transaction is started.
Application saves X to 2 data stores
TM now sends a prepare message to both data stores
Each data store responds back with prepare_success
TM saves permanently info about the prepared transaction and its ID to transaction log store
TM sends a commit message to both data stores
...
I don't agree that 2PC claims to guarantee to maintain 2 data stores in sync.
I was wondering about this too (e.g. asked here https://developer.jboss.org/message/954043).
2PC claims to guarantee ACID properties. Having 2 stores in sync is more a matter of CAP consistency.
Here Narayana strictly depends on the capabilities of the particular resource managers (the data stores or their JDBC drivers).
ACID declares:
atomicity - the whole transaction is committed or rolled back (it says nothing about when that happens, nor about resources being in sync)
consistency - before the transaction starts and after it ends, the system is in a consistent state
durability - everything is stored durably even when a crash occurs
isolation - (the tricky one, left for the end) - to be ACID we have to be serializable; that is, you can observe transactions happening "one by one"
Take a pretty simplified example to show my point - assume a DB implemented in the naive way of locking the whole database when a transaction starts. You committed a JMS message, which is processed, and now you don't commit the DB record. When the DB works at the serializable isolation level (which is what ACID requires!), your next write/read operation has to wait until the 'in-flight prepared' transaction is resolved. The DB is just stuck and waiting. If you read, you won't get an answer, so you can't say what the value is.
Narayana's recovery manager then comes to that prepared transaction once a connection is established and commits it. Your read action then returns information that is 'correct'.
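As an illustration (my sketch, not part of the original answer): in PostgreSQL terms, the in-doubt branch left behind by a failed TM is a prepared transaction, and a recovery manager resolves it roughly like this (table name and transaction id are hypothetical; requires max_prepared_transactions > 0):

    -- Phase 1: the branch is hardened to disk; it survives crashes
    -- and keeps holding its locks until it is resolved.
    BEGIN;
    UPDATE account SET balance = balance - 100 WHERE id = 1;
    PREPARE TRANSACTION 'xa-branch-42';

    -- Later, the recovery manager lists in-doubt branches and resolves them:
    SELECT gid, prepared FROM pg_prepared_xacts;
    COMMIT PREPARED 'xa-branch-42';   -- or ROLLBACK PREPARED 'xa-branch-42'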
Q3: I don't understand the question, sorry. But if you state that "the order in which the network packets arrive at the different data sources can determine which prepare request succeeds", then you are right: you are doomed to get failing transactions until the network becomes more stable.

SQL Server: how to read data from a table that is still being populated

I have two processes working simultaneously: one is generating a big table, and the other reads data sequentially from the table being generated. I noticed that the second process has to wait for the first one to finish before it can get started, i.e., the read operation blocks.
My question is whether there is any way to allow the second process to read data from a table even while it is still being written. Thanks!
Please, do not suggest using (nolock) for data that is not static. You may skip records or read records twice. Not good if it is financial data.
http://www.jasonstrate.com/2012/06/the-side-effect-of-nolock/
You should really take a look at my presentation 'How isolated are your sessions' - http://craftydba.com/?page_id=880.
What you are describing above is one of the isolation levels, read committed - writers block readers. It is just a fact of life when dealing with transactions (sessions).
There are a bunch more isolation levels.
http://technet.microsoft.com/en-us/library/ms189122(v=sql.105).aspx
The (NOLOCK) or (READUNCOMMITTED) hint has three side effects: dirty reads, non-repeatable reads, and phantom reads.
How about using Read Committed Snapshot Isolation (RCSI)?
It is a version of Read Committed in which readers are not blocked, since the version store (tempdb) keeps a copy of the records. It does not have as much of an impact as SNAPSHOT ISOLATION. Put in place some type of monitoring on version store growth.
http://www.brentozar.com/archive/2013/01/implementing-snapshot-or-read-committed-snapshot-isolation-in-sql-server-a-guide/
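Turning RCSI on is a one-time database setting; a sketch (the database name is hypothetical, and WITH ROLLBACK IMMEDIATE kicks out open sessions, so schedule accordingly):

    ALTER DATABASE MyDb SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE;
    -- Verify:
    SELECT name, is_read_committed_snapshot_on
    FROM sys.databases
    WHERE name = 'MyDb';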
Like any advice at the transaction level, first test this change in a lower environment. Have a full understanding of the six isolation levels and how they affect your database & application.

To NOLOCK or NOT to NOLOCK, that is the question

This is really more of a discussion than a specific question about nolock.
I recently took over an app where almost every query (and there are lots of them) has the nolock hint on it. Now I am pretty new to SQL Server (I used Oracle for 10 years), but I find this pretty disturbing. This weekend I was talking with one of my friends who runs a rather large ecommerce site (the name will be withheld to protect the guilty), and he says he has to do this with all of his SQL Servers because otherwise he always ends up in deadlocks.
Is this just a huge shortfall of SQL Server? Is this just a failure in the DB design (mine is not third normal form, but it's close)? Is anybody out there running a SQL Server app without nolocks? These are issues that Oracle handles better with more granular record locks.
Is SQL Server just not able to handle big loads? Is there some better workaround than reading uncommitted data? I would love to hear what people think.
Thanks
SQL Server added snapshot isolation in SQL Server 2005; it enables you to read the latest correct value without having to wait for locks. Stack Overflow is also using snapshot isolation. The snapshot isolation level is more or less the same as what Oracle uses, which is why deadlocks are not very common on an Oracle box. Just be aware that you need plenty of tempdb space if you do enable it.
From Books Online:
When the READ_COMMITTED_SNAPSHOT database option is set ON, read committed isolation uses row versioning to provide statement-level read consistency. Read operations require only SCH-S table level locks and no page or row locks. When the READ_COMMITTED_SNAPSHOT database option is set OFF, which is the default setting, read committed isolation behaves as it did in earlier versions of SQL Server. Both implementations meet the ANSI definition of read committed isolation.
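To use the full snapshot isolation level this answer mentions (as opposed to the READ_COMMITTED_SNAPSHOT option quoted above), you enable it at the database level and then opt in per session; a sketch with hypothetical names:

    ALTER DATABASE MyDb SET ALLOW_SNAPSHOT_ISOLATION ON;
    -- Then, in any session that wants versioned reads:
    SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
    BEGIN TRAN;
    SELECT * FROM dbo.Orders;   -- reads a consistent snapshot; never blocks on writers
    COMMIT;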
If somebody says that without NOLOCK their application always gets deadlocked, then there is (more than likely) a problem with their queries. A deadlock means that two transactions cannot proceed because of resource contention and the problem cannot be resolved. An example:
Consider Transactions A and B. Both are in-flight. Transaction A has inserted a row into table X and Transaction B has inserted a row into table Y, so Transaction A has an exclusive lock on X and Transaction B has an exclusive lock on Y.
Now, Transaction A needs to run a SELECT against table Y, and Transaction B needs to run a SELECT against table X.
The two transactions are deadlocked: A needs resource Y and B needs resource X. Since neither transaction can proceed until the other completes, the situation cannot be resolved: neither transaction's demand for a resource can be satisfied until the other transaction releases its lock on the resource in contention (either by ROLLBACK or COMMIT, it doesn't matter).
SQL Server identifies this situation, selects one transaction or the other as the deadlock victim, aborts and rolls back that transaction, leaving the other free to proceed to its presumable completion.
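A sketch of that example as two interleaved T-SQL sessions (table names hypothetical):

    -- Session A:
    BEGIN TRAN;
    INSERT INTO dbo.X (id) VALUES (1);   -- A now holds an exclusive lock on the new row in X

    -- Session B:
    BEGIN TRAN;
    INSERT INTO dbo.Y (id) VALUES (1);   -- B now holds an exclusive lock on the new row in Y

    -- Session A:
    SELECT * FROM dbo.Y;                 -- blocks, waiting on B's lock

    -- Session B:
    SELECT * FROM dbo.X;                 -- blocks, waiting on A's lock: deadlock.
    -- SQL Server picks a victim, which fails with error 1205 and is rolled back.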
Deadlocks are rare in real life (IMHO). One rectifies them by
ensuring that transaction scope is as small as possible, something SQL Server does automatically (SQL Server's default transaction scope is a single statement with an implicit COMMIT), and
ensuring that transactions access resources in the same sequence. In the example above, if transactions A and B both locked resources X and Y in the same sequence, there would not be a deadlock.
Timeouts
A timeout, on the other hand, occurs when a transaction exceeds its wait time and is rolled back due to resource contention. For instance, Transaction A needs resource X. Resource X is locked by Transaction B, so Transaction A waits for the lock to be released. If the lock isn't released within the query's timeout limit, the waiting transaction is aborted and rolled back. Every query has a query timeout associated with it (the default value is 30s, I believe), after which the transaction is aborted and rolled back. The query timeout can be set to 0s, in which case SQL Server will let the query wait forever.
This is probably what they are talking about. In my experience, timeouts like this usually occur in big databases when large batch jobs are updating thousands and thousands of records in a single transaction, although they can also happen because a transaction goes on too long (connect to your production database in Query Analyzer, execute BEGIN TRANSACTION, update a single row in a frequently hit table, and go to lunch without executing ROLLBACK or COMMIT TRANSACTION, and see how long it takes for the production DBAs to go apes**t on you. Don't ask me how I know this).
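Note that the query timeout above is enforced by the client; SQL Server also has a separate per-session knob for how long a statement waits on locks. A sketch:

    SET LOCK_TIMEOUT 5000;   -- wait at most 5 seconds for a lock, then fail with error 1222
    SET LOCK_TIMEOUT 0;      -- fail immediately if a lock cannot be granted
    SET LOCK_TIMEOUT -1;     -- wait indefinitely (the default)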
This sort of timeout is usually what results in splattering perfectly innocent SQL with all sorts of NOLOCK hints.
[TIP: if you're going to do that, just execute SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED as the first statement in your stored procedure and be done with it.]
The problem with this approach (NOLOCK/READ UNCOMMITTED) is that you can read uncommitted data from other transactions: stuff that is incomplete or that may get rolled back later, so your data integrity is compromised. You might be sending out a bill based on data with a high level of bogosity.
My general rule is that one should avoid the use of table hints insofar as possible. Let SQL Server and its query optimizer do their jobs.
The right way to avoid this sort of issue is to avoid the sort of transactions (inserting a million rows in one fell swoop, for instance) that cause the problems. The locking strategy implicit in relational database SQL is designed around small transactions of short scope. Locks should be small in scope and short in duration. Think "bank teller updating somebody's checking account with a deposit" as the underlying use case. Design your processes to work in that model and you'll be much happier all the way 'round.
Instead of inserting a million rows in one mondo insert statement, do the work in independent chunks and commit each chunk independently. If your million-row insert dies after processing 999,000 rows, all the work done is lost (not to mention that the rollback can be a b*tch, and the table is still locked during the rollback as well). If you insert the million rows in blocks of 1,000 rows each, committing after each block, you avoid the lock contention that causes deadlocks, as locks will be obtained and released and things will keep moving. If something goes south in the 999th block of 1,000 rows and the transaction gets aborted and rolled back, you've still gotten 998,000 rows inserted; you've only lost 1,000 rows of work. Restart/retry is much easier.
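A sketch of that chunked pattern in T-SQL (table and column names are hypothetical):

    DECLARE @batch int = 1000, @rows int = 1;
    WHILE @rows > 0
    BEGIN
        BEGIN TRAN;
        -- Move the next chunk of rows that have not been copied yet.
        INSERT INTO dbo.Target (id, payload)
        SELECT TOP (@batch) s.id, s.payload
        FROM dbo.Staging AS s
        WHERE NOT EXISTS (SELECT 1 FROM dbo.Target AS t WHERE t.id = s.id);
        SET @rows = @@ROWCOUNT;   -- capture before COMMIT resets it
        COMMIT;                   -- locks are released after every chunk
    END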
Also, lock escalation occurs in large transactions. For efficiency, locks escalate to larger and larger scope as the number of locks held by a transaction increases. If a single transaction inserts/updates/deletes a single row in a table, I get a row lock. Keep doing that, and once the number of row locks held by that transaction against that table hits a threshold value, SQL Server will escalate the locking strategy: the row locks will be consolidated and converted into a smaller number of page locks, thus increasing the scope of the locks held. From that point forward, an insert/delete/update of a single row will lock that page in the table. Once the number of page locks held hits its threshold value, the page locks are again consolidated and the locking strategy escalates to table locks: the transaction now locks the entire table and nobody else may play until the transaction commits or rolls back.
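For what it's worth, SQL Server 2008 and later let you constrain escalation per table; a sketch with a hypothetical table name:

    -- TABLE is the default; AUTO escalates per partition; DISABLE mostly prevents escalation.
    ALTER TABLE dbo.Target SET (LOCK_ESCALATION = DISABLE);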
Whether you can functionally avoid the use of NOLOCK/READ UNCOMMITTED is entirely dependent on the nature of the processes hitting the underlying database (and the culture of the organization that owns it).
Myself, I try to avoid its use as much as possible.
Hope this helps.
No, there is no need to use NOLOCK. Links: SO 1
As for load, we deal with 2,000 rows per second, which is small change compared to 35k TPS.
Deadlocks are caused by lock contention, usually by an inconsistent write order on tables across transactions. ORMs especially are rubbish at this. We get them very infrequently. A well-written DAL should also retry, as per MSDN.
In a traditional normalized OLTP environment, NOLOCK is a code smell and almost certainly unnecessary in a properly designed system.
In a dimensional model, I used NOLOCK extensively to avoid locking very large fact and dimension tables which were being populated with later fact data (and dimensions may have been expiring). In the dimensional model, the facts either never change or never change after a certain point. Similarly, any dimension which is referenced will also be static, so for example, the NOLOCK will stop your long analysis operation on yesterday's data from blocking a dimension expiration during a data load for today's data.
You should only use nolock on an unchanging table. Of course, it will then be the same as Read Committed Snapshot. Without the snapshot, you are only saving the time it takes to apply a shared lock and then remove it, which in most cases isn't necessary.
As for a changing table... NOLOCK doesn't just mean getting a row before a transaction is done updating all of its rows. You can get ghost data as data pages split, or even as index pages split. Or no data at all. That alone scared me away, but I think there may be even more scenarios where you simply get the wrong data.
Of course, nolock for getting rough estimates or to just check in on a process might be reasonable.
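E.g. a sketch of the rough-estimate case (table name hypothetical):

    -- Tolerable for a ballpark count during a load; not for anything you bill from.
    SELECT COUNT(*) FROM dbo.BigFactTable WITH (NOLOCK);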
Basic rule of thumb -- if you care about the data at all, and the data is changing, then do not use NOLOCK.
