Synchronous Replication Setup on 2 PostgreSQL 9.2.1.4 machines

I am running synchronous replication on two PostgreSQL 9.2.1.4 machines (master and slave).
Here is the configuration:
Master Parameters
synchronous_commit=on
synchronous_standby_names = '*'
no synchronous_replication_timeout parameter, so 10 sec by default
no synchronous_replication parameter, so async by default
wal_level = hot_standby
max_wal_senders = 5
wal_keep_segments = 32
hot_standby = on
Slave Parameters
no synchronous_commit parameter, so on by default
no synchronous_replication_service parameter, so async by default
max_wal_senders = 5
wal_keep_segments = 32
hot_standby = on
The application inserts records on the Master and reads them from the Master or the Slave through pgpool. Sometimes, just after inserting records, the application does not see them (probably because it reads from a different DB host than the one it inserted into),
but when we check afterwards the records are there in the database.
On
http://wiki.postgresql.org/wiki/Synchronous_replication#SYNCHRONOUS_REPLICATION_OVERVIEW
I found:
"If no reply is received within the timeout we raise a NOTICE and then return successful commit (no other action is possible)."
My Questions
a) Does it really mean that if the synchronous_replication_timeout
(10 seconds by default) on the Master is exceeded, and in any of the three cases where
the data did not reach the Slave, or
the transaction was not committed on the Slave, or
the transaction was rolled back on the Slave,
the Master commits the transaction but the Slave does not at all?
If so, the replication does not seem to be truly synchronous...
b) What if I set synchronous_replication_timeout=0 on the Master? Will
the Master wait indefinitely for the Slave to commit or roll back, so that
if the Slave commits the Master commits too, and if the Slave rolls back
the Master rolls back too?
What values should I set in
synchronous_replication (on the Master)
= async (def) | recv | fsync | apply
and
synchronous_replication_service (on the Slave)
= async (def) | recv | fsync | apply
in order to ensure I have a proper synchronous replication setup
(so that data is committed on both servers or rolled back on both)?
Should they both be set to apply?
Is there any option to ensure that, by using synchronous replication
on PostgreSQL 9.1.4, the data is committed on both master and slave
at the same time?

The wiki page you referenced describes a patch implementing synchronous replication that was never committed; see here if you're interested:
http://archives.postgresql.org/pgsql-hackers/2010-12/msg02484.php
So your questions about the GUCs "synchronous_replication_timeout" and "synchronous_replication_service" aren't relevant to released versions of PostgreSQL, since the version of synchronous replication that was eventually committed differs substantially from the one described on that wiki page. Sorry about that; I'll see about getting that wiki page cleaned up. The information you want is at:
http://www.postgresql.org/docs/current/static/warm-standby.html#SYNCHRONOUS-REPLICATION
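For reference, in the synchronous replication that actually shipped in 9.1 and later, the master-side control is synchronous_standby_names, matched against the application_name each standby sends in its primary_conninfo; the recv/fsync/apply GUC values from the wiki page never shipped. A minimal sketch, with hypothetical host, user and standby name:

# on the master, postgresql.conf
wal_level = hot_standby
max_wal_senders = 5
synchronous_commit = on
synchronous_standby_names = 'standby1'   # must match the standby's application_name

# on the standby, recovery.conf
standby_mode = 'on'
primary_conninfo = 'host=master.example.com user=replicator application_name=standby1'

With this configuration a commit on the master does not return until the named standby reports the WAL flushed to its disk; and if no synchronous standby is connected, commits block rather than silently falling back to asynchronous, which is the behaviour the released implementation chose instead of a timeout.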

Related

Row locking behaviour while updating

In Oracle databases I can start a transaction and update a row without committing. Selecting this row in another session still returns the current ("old") value.
How can I get this behaviour in SQL Server? Currently the row is locked until the transaction ends. WITH (NOLOCK) in the select statement returns the new value from the uncommitted transaction, which is potentially dangerous.
Starting the transaction without committing:
BEGIN TRAN;
UPDATE test SET val = 'Updated' WHERE id = 1;
This works:
SELECT * FROM test WHERE id = 2;
This waits for the transaction to be committed:
SELECT * FROM test WHERE id = 1;
With Read Committed Snapshot Isolation (RCSI), row versions are kept in a version store, so while a transaction is open, readers can see the version of a row as it existed when their statement began, before any uncommitted changes, without taking shared locks on rows or pages and without blocking writers or other readers. From this post by Paul White:
To summarize, locking read committed sees each row as it was at the time it was briefly locked and physically read; RCSI sees all rows as they were at the time the statement began. Both implementations are guaranteed to never see uncommitted data.
One cost, of course, is that if you read a prior version of the row, it can change (even many times) before you're done doing whatever it is you plan to do with it. If you're making important decisions based on some past version of the row, it may be the case that you actually want an isolation level that forces you to wait until all changes have been committed.
Another cost is that version store is not free... it requires space and I/O in tempdb, so if tempdb is already a bottleneck on your system, this is something worth testing.
(In SQL Server 2019, with Accelerated Database Recovery, the version store shifts to the user database, which increases database size but mitigates some of the tempdb contention.)
Paul's post goes on to explain some other risks and caveats.
In almost all cases, this is still way better than NOLOCK, IMHO. Lots of links about the dangers there (and why RCSI is better) here:
I'm using NOLOCK; is that bad?
And finally, from the documentation (adding one clarification from the comments):
When the READ_COMMITTED_SNAPSHOT database option is set ON, read committed isolation uses row versioning to provide statement-level read consistency. Read operations require only SCH-S table level locks and no page or row locks. That is, the SQL Server Database Engine uses row versioning to present each statement with a transactionally consistent snapshot of the data as it existed at the start of the statement. Locks are not used to protect the data from updates by other transactions. A user-defined function can return data that was committed after the time the statement containing the UDF began.
When the READ_COMMITTED_SNAPSHOT database option is set OFF, which is the default setting [on-prem, but not in Azure SQL Database], read committed isolation uses shared locks to prevent other transactions from modifying rows while the current transaction is running a read operation. The shared locks also block the statement from reading rows modified by other transactions until the other transaction is completed. Both implementations meet the ISO definition of read committed isolation.
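For completeness, a minimal sketch of turning RCSI on and re-running your test (the database name is hypothetical; WITH ROLLBACK IMMEDIATE kicks out open transactions so the option can take effect):

ALTER DATABASE TestDb SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE;

-- session 1: open transaction, uncommitted update
BEGIN TRAN;
UPDATE test SET val = 'Updated' WHERE id = 1;

-- session 2: no longer blocks; returns the last committed ("old") value
SELECT * FROM test WHERE id = 1;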

Which transaction level is best suited to read records from an application event log table?

I'm implementing a background process that moves event log records from an SQL database to MongoDB.
Event log / audit trail entries are known to change only once, at the end of the event. The process is like this:
1) an event log entry gets created to fixate that a business process has been initiated
2) a new transaction is started for the business process
3) audit trail entries are created
4) attempts to update the event log entry with successful status
5) the transaction completes
6) if the transaction fails - updates the event log entry as failed
So, theoretically, the background process could safely read all the log entries that have already been marked as failed/successful, and, most probably, after a specific timeout it could also safely read the entries that for unknown reasons are stuck in the "Process was started" state.
To avoid any locks on the event log table while the background process reads event log records in batches, I would like to use a relaxed isolation level, but I'm not sure which one would be safe on a table with lots of parallel inserts and updates occurring constantly (albeit updates only on the records that my background process will ignore, so I don't care about dirty reads).
In my case it seems acceptable to miss the records that are being inserted right now (I'll get them in the next background job run anyway), but it is not acceptable to get duplicate or missing records among the older records that aren't being updated right now.
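For illustration, one pattern I'm considering is adding an Exported flag and claiming batches with an UPDATE ... OUTPUT, so a watermark can't skip rows that are still being finalized (table and column names are hypothetical):

-- claim up to 1000 finalized, not-yet-exported entries in one atomic statement
UPDATE TOP (1000) dbo.EventLog
SET Exported = 1
OUTPUT inserted.Id, inserted.EventData, inserted.Status
WHERE Exported = 0
  AND Status IN ('Success', 'Failed');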
P.S. You might ask: why not log directly to MongoDB? There are two reasons: 1) the database has many triggers and stored procs that log to the SQL table, and the customer doesn't want to reimplement all of that; 2) the customer wants the event log/audit trail to be atomic with the transaction of the business process, and he's afraid that with direct journaling to MongoDB there might be cases when, for some (most probably very critical) reason, event log entries go missing even though the SQL transaction succeeded and the data was changed. It's not trivially possible to include a write to MongoDB in a single atomic unit of work with an SQL transaction.

Distribution Cleanup in Transactional Replication

I have set up pull transactional replication between SQL Servers.
But my distribution cleanup job is not removing any data from the MSrepl_commands and MSrepl_transactions tables.
I have set immediate_sync and allow_anonymous to 0.
Distribution Job Detail:
Query:
EXEC dbo.sp_MSdistribution_cleanup @min_distretention = 0, @max_distretention = 72
JOB result:
Executed as user: NT SERVICE\SQLSERVERAGENT. Removed 0 replicated transactions consisting of 0 statements in 0 seconds (0 rows/sec). [SQLSTATE 01000] (Message 21010). The step succeeded.
Note: when I set immediate_sync to 1 and tried again, it worked. But why not with 0, when on another server I have it set to 0 and it works?
Please help me.
Strange: the expected behaviour is that if immediate_sync is "true", the distribution database holds transaction data for the full maximum retention period, so that current and new subscribers can get the baseline snapshot plus the transactions necessary to "catch up". You'd expect the distribution database to hold data for the max retention period (72 hours in your case).
If it's set to "false", any new subscriber will need a new snapshot, but distributed commands will be cleared from the distribution database by the cleanup job.
Double check that all your subscribers are receiving transactions, and do you have anonymous subscriptions enabled?
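If you want to verify and correct the publication properties, a sketch with a hypothetical publication name (run in the publication database):

-- inspect current settings; check the immediate_sync and allow_anonymous columns
EXEC sp_helppublication @publication = N'MyPublication';

-- allow_anonymous depends on immediate_sync, so disable it first
EXEC sp_changepublication @publication = N'MyPublication',
    @property = N'allow_anonymous', @value = 'false';
EXEC sp_changepublication @publication = N'MyPublication',
    @property = N'immediate_sync', @value = 'false';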

What SQL Server 2005/2008 locking approach should I use to process individual table rows in multiple server application instances?

I need to develop a server application (in C#) that will read rows from a simple table (in SQL Server 2005 or 2008), do some work, such as calling a web service, and then update the rows with the resulting status (success, error).
Looks quite simple, but things get tougher when I add the following application requirements:
Multiple application instances must be running at the same time, for Load Balancing and Fault Tolerance purposes. Typically, the application will be deployed on two or more servers, and will concurrently access the same database table. Each table row must be processed only once, so a common synchronization/locking mechanism must be used between multiple application instances.
When an application instance is processing a set of rows, other application instances shouldn't have to wait for it to end in order to read a different set of rows waiting to be processed.
If an application instance crashes, no manual intervention should need to take place on the table rows that were being processed (such as removing temporary status used for application locking on rows that the crashing instance was processing).
The rows should be processed in a queue-like fashion, i.e., the oldest rows should be processed first.
Although these requirements don't look too complex, I'm having trouble coming up with a solution.
I've seen locking hint suggestions, such as XLOCK, UPDLOCK, ROWLOCK, READPAST, etc., but I see no combination of locking hints that will let me implement them.
Thanks for any help.
Regards,
Nuno Guerreiro
This is a typical table-as-queue pattern, as described in Using tables as Queues. You would use a Pending Queue, and the dequeue transaction should also schedule a retry after a reasonable timeout. It is not realistically possible to hold locks for the duration of the web calls. On success, you would remove the pending item.
You also need to be able to dequeue in batches; dequeuing one-by-one is too slow under serious load (hundreds and thousands of operations per second). So, taking the Pending Queue example from the linked article:
create table PendingQueue (
    id int not null identity(1,1), -- identity so the enqueue procedure can omit it
    DueTime datetime not null,
    Payload varbinary(max),
    constraint pk_pending_id primary key nonclustered (id));
create clustered index cdxPendingQueue on PendingQueue (DueTime);
go
create procedure usp_enqueuePending
    @dueTime datetime,
    @payload varbinary(max)
as
    set nocount on;
    insert into PendingQueue (DueTime, Payload)
    values (@dueTime, @payload);
go
create procedure usp_dequeuePending
    @batchsize int = 100,
    @retryseconds int = 600
as
    set nocount on;
    declare @now datetime;
    set @now = getutcdate();
    -- claim a batch of due items and push their DueTime into the future;
    -- rowlock/readpast lets concurrent callers skip rows already claimed
    with cte as (
        select top(@batchsize)
            id,
            DueTime,
            Payload
        from PendingQueue with (rowlock, readpast)
        where DueTime < @now
        order by DueTime)
    update cte
    set DueTime = dateadd(second, @retryseconds, DueTime)
    output deleted.Payload, deleted.id;
go
On successful processing you would remove the item from the queue using the ID. On failure, or after a crash, it will be retried automatically in 10 minutes. One thing you must internalize is that since HTTP does not offer transactional semantics, you will never be able to do this with 100% consistent semantics (e.g. guarantee that no item is processed twice). You can drive the margin for error very low, but there will always be a moment when the system can crash after the HTTP call succeeded but before the database is updated, causing the same item to be retried, since you cannot distinguish this case from a crash before the HTTP call.
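The caller's success path would look roughly like this (a sketch; the web service call stands in for your real processing):

-- claim a batch; each row returned carries its id and payload
exec usp_dequeuePending @batchsize = 100, @retryseconds = 600;

-- ... process each returned item in the application ...

-- on success, remove the item so the scheduled retry never fires
delete from PendingQueue where id = @id; -- @id taken from the dequeued row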
I initially suggested SQL Server Service Broker for this. However, after some research it turns out this is probably not the best way of handling the problem.
What you're left with is the table architecture you've asked for. However, as you've been finding, it is unlikely that you will be able to come up with a solution that meets all the given criteria, due to the great complexity of locking, transactions, and the pressures placed on such a scheme by high concurrency and high transactions per second.
Note: I am currently researching this issue and will get back to you with more later. The following script was my attempt to meet the given requirements. However, it suffers from frequent deadlocks and processes items out of order. Please stay tuned, and in the meantime consider a destructive reads method (DELETE with OUTPUT or OUTPUT INTO).
SET XACT_ABORT ON; -- roll back the whole tran on any error
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
BEGIN TRAN;
UPDATE X
SET X.StatusID = 2 -- in process
OUTPUT Inserted.*
FROM (
    SELECT TOP 1 * FROM dbo.QueueTable WITH (READPAST, ROWLOCK)
    WHERE StatusID = 1 -- ready
    ORDER BY QueuedDate, QueueID -- in case of items with the same date
) X;
-- Do work in application, holding the tran open.
DELETE dbo.QueueTable WHERE QueueID = @QueueID; -- value taken from the recordset output earlier
COMMIT TRAN;
In the case of several/many rows being locked at once by a single client, there is a possibility of the row locks escalating to a table lock, so be aware of that. Also, holding long-running transactions that maintain locks is normally a big no-no. It may work in this special usage case, but I fear that a high transaction rate from multiple clients will make the system break down. Note that normally the only processes querying your queue table should be those doing queue work; any process doing reporting should use READ UNCOMMITTED or WITH (NOLOCK) to avoid interfering with the queue.
What is the implication of rows being processed out of order? If an application instance crashes while another instance is completing rows successfully, the retry delay will cause at least one row to complete late, so the processing order will not be strictly preserved.
If the transaction/locking method above is not to your satisfaction, another way to handle your application crashing would be to give your instances names, then set up a monitor process that periodically checks whether those named instances are running. When a named instance starts up, it would reset any unprocessed rows that carry its instance identifier (something as simple as "instance A" and "instance B" would work). If the monitor finds that an instance is down, it would reset that instance's rows so the others can process them. There would be a small lag between crash and recovery, but with proper architecture it could be quite reasonable.
Note: The following links should be edifying:
info about XLOCK
Tables as Queues
You can't do this with SQL transactions (or by relying on transactions as your main component here). Actually, you can, but you shouldn't: transactions are not meant to be used this way, for long-held locks, and you shouldn't abuse them like this.
Keeping a transaction open for that long (retrieve rows, call the web service, come back to make some updates) is simply not good, and there is no optimistic locking isolation level that will let you do what you want.
Using ROWLOCK is also not a good idea, because it is just that: a hint. It is subject to lock escalation, and it can be converted to a table lock.
May I suggest a single entry point to your database? I think it fits in the pub/sub design.
So there would be only one component that reads/updates these records:
Reads batches of messages (enough for all your other instances to consume) - 1000, 10000, whatever you see fit. It makes these batches available to the other (concurrent) components through some queued way. I'm not going to say MSMQ :) (it would be the second time today I recommend it, but it's really suitable in your case too).
It marks the messages as in progress or something similar.
Your consumers are all bound, transactionally, to the inbound queue and do their stuff.
When ready, after the web service call, they put the messages in an outbound queue.
The central component picks them up and, inside a distributed transaction, updates the database (if that fails, the messages stay in the queue). Since it is the only component performing that operation, you won't have any concurrency issues, at least not on the database.
In the meantime it can read the next pending batch, and so on.

Is it possible in DB2 or in any Database to detect if the table is locked or not?

Is it possible in DB2 to detect whether a table is locked? Whenever we issue a SELECT statement against a locked table (perhaps because of an ongoing insert or delete), we have to wait until the table is unlocked.
In our application this sometimes takes 2-3 minutes. What I'm thinking is: if I had some mechanism to detect that the table is locked, I would not even try to fetch the records; instead I would display a message.
Not only in DB2: is it possible to detect this in any database?
I've never used DB2, but according to the documentation it seems you can use the following to make queries not wait for a lock:
SET CURRENT LOCK TIMEOUT NOT WAIT
Alternatively, you can set the lock timeout value to 0
SET CURRENT LOCK TIMEOUT 0
Both statements have the same effect.
Once you have this, you can try to select from the table and catch the error.
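A sketch of what catching the error looks like, with a hypothetical table name (if I read the documentation correctly, the blocked statement then fails fast with SQL0911N, SQLSTATE 40001, rather than waiting):

SET CURRENT LOCK TIMEOUT NOT WAIT;
-- fails immediately with SQL0911N if another transaction holds a conflicting lock
SELECT COUNT(*) FROM myschema.orders;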
I would recommend against NOT WAIT; rather, specify a low LOCK TIMEOUT (10-30 seconds). If the target table is only locked briefly (a small update, say for 1 second), with NOT WAIT your second app fails immediately, whereas with a 10-second timeout it would simply wait for the first app to COMMIT or ROLLBACK (1 second) and then move forward.
Also consider there's a bit of a "first come, first served" policy when it comes to handing out locks - if the second app "gives up", a third app could get in and grab the locks needed by the second. It's possible that the second app experiences lock starvation because it keeps giving up.
If you are experiencing ongoing concurrency issues, consider lock monitoring to get a handle on how the database is being accessed. There's lots of useful statistics (such as average lock-wait time, etc.) that can help you tune your parameters and application behaviour.
DB2 V9.7 Infocenter - Database Monitoring
