Different query results in SQL Server availability group nodes - sql-server

I have an availability group with two nodes, and I have made sure they use synchronous-commit mode, so I assumed the tables' records are identical on both.
A developer reported a bug: a stored procedure behind a report returns different results on the primary and the secondary node. At first this seemed ridiculous to me, since I knew which availability mode I was using, but it is true: the results are sometimes different.
I tried to log this difference, so I created a table in the master database (which is not part of the availability group) and saved the row count of every table involved in the query. I save two numbers: a plain COUNT(*) and a count with the NOLOCK hint, to catch dirty reads. Some log records show the tables are identical on both nodes, yet the result of joining them differs between the two nodes! Do you have any suggestions on how to discover what causes this discrepancy?
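For illustration, here is a minimal sketch of the kind of count logging described above; the object names (master.dbo.RowCountLog, dbo.SomeTable) are assumptions, not the actual tables involved:
INSERT INTO master.dbo.RowCountLog (TableName, NormalCount, NolockCount, CapturedAt)
SELECT N'dbo.SomeTable',                                    -- a table used by the report query
       (SELECT COUNT(*) FROM dbo.SomeTable),                -- regular count
       (SELECT COUNT(*) FROM dbo.SomeTable WITH (NOLOCK)),  -- dirty-read count
       SYSUTCDATETIME();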

You'll want to read this documentation, which covers a wide variety of items that can all contribute to this. The long and the short of it is: if you need exact, point-in-time, 100% up-to-date information, use the primary replica.
Notable excerpt, though:
The primary replica sends log records of changes on primary database to the secondary replicas. On each secondary database, a dedicated redo thread applies the log records. On a read-access secondary database, a given data change does not appear in query results until the log record that contains the change has been applied to the secondary database and the transaction has been committed on primary database.
Don't forget that secondary replicas also remap the isolation level to snapshot.
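As a starting point for checking how far a readable secondary is behind, here is a hedged sketch against the availability group DMVs (run on the primary; filter as needed):
SELECT DB_NAME(drs.database_id)       AS database_name,
       ar.replica_server_name,
       drs.synchronization_state_desc,
       drs.redo_queue_size,           -- KB of log received but not yet redone on the secondary
       drs.last_redone_time,
       drs.last_commit_time
FROM sys.dm_hadr_database_replica_states AS drs
JOIN sys.availability_replicas AS ar
  ON ar.replica_id = drs.replica_id;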

Related

Temporal Tables and Log shipping

We are building a system in our company which will need temporal tables in SQL Server and might need log shipping as well. I wanted to know whether there are any unexpected impacts of log shipping on a temporal table that wouldn't happen on a normal table.
I would expect no impact in either direction (that is, log shipping won't change the temporal table, nor will the temporal table change log shipping). At its core, log shipping is just restoring transaction logs on another server. And temporal tables are (more or less) a trigger that maintains another table on data mutations. That extra work will be present in the log backup and will restore just fine at the log shipping secondary.
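For readers who haven't used the feature, a minimal sketch of a system-versioned temporal table (the table and column names are made up); the history table is exactly the kind of extra object whose changes ride along in the log backups:
CREATE TABLE dbo.Product
(
    ProductId    int           NOT NULL PRIMARY KEY CLUSTERED,
    Name         nvarchar(100) NOT NULL,
    SysStartTime datetime2 GENERATED ALWAYS AS ROW START NOT NULL,
    SysEndTime   datetime2 GENERATED ALWAYS AS ROW END   NOT NULL,
    PERIOD FOR SYSTEM_TIME (SysStartTime, SysEndTime)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.ProductHistory));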
Previously in the company we used temporary tables, but when the volume of business data grew, we had to change the queries to use common table expressions (WITH):
WITH table_alias AS (SELECT ...)

Disable transactions on SQL Server

I need some light here. I am working with SQL Server 2008.
I have a database for my application. Each table has a trigger that stores all changes in another database (on the same server), in one single table, tbSysMasterLog. Yes, the application's log is stored in another database.
The problem is that before any insert/update/delete command on the application database, a transaction is started, and therefore the log database's table is locked until the transaction is committed or rolled back. So anyone else who tries to write to any other table of the application will be blocked.
So... is there any way to disable transactions on a particular database or on a particular table?
You cannot turn off the log; everything gets logged. You can set the recovery model to SIMPLE, which limits how much log is retained once transactions are committed.
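For completeness, switching the recovery model is a one-liner (the database name is an assumption); note this affects point-in-time restore, not locking:
ALTER DATABASE LogDb SET RECOVERY SIMPLE;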
" the table of the log database is locked": why that?
Normally you log changes by inserting records. The insert of records should not lock the complete table, normally there should not be any contention in insertion.
If you do more than inserts, perhaps you should consider changing that. Perhaps you should look at the indices defined on log, perhaps you can avoid some of them.
It sounds from the question as if you have a BEGIN TRANSACTION at the start of your triggers, and that you are logging to the other database before the COMMIT TRANSACTION.
Normally you do not need explicit transactions in SQL Server.
If you do need explicit transactions, you could put the data to be logged into variables, commit the transaction, and then insert the data into your log table.
Normally inserts are fast and can happen in parallel without locking. Certain things, such as identity columns, do require ordering, but they are very lightweight structures and can be avoided by generating GUIDs, so inserts become non-blocking; for something like your log table, though, an identity primary key gives you a clear sequence that is probably helpful in working out the order of events.
Obviously, if you log after the transaction, the log entries may not be in the same order in which the transactions occurred, because transactions take different amounts of time to commit.
We normally log into individual tables with a name similar to the base table, e.g. FooHistory or AuditFoo.
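A rough sketch of that per-table history pattern, with assumed names (dbo.Foo with a FooId column, dbo.FooHistory); note there is no explicit transaction in the trigger, since it already runs inside the calling statement's transaction:
CREATE TABLE dbo.FooHistory
(
    FooHistoryId bigint IDENTITY(1,1) PRIMARY KEY,  -- identity gives a clear sequence
    FooId        int       NOT NULL,
    Action       char(1)   NOT NULL,                -- 'I', 'U' or 'D'
    ChangedAt    datetime2 NOT NULL DEFAULT SYSUTCDATETIME()
);
GO
CREATE TRIGGER dbo.trg_Foo_Audit ON dbo.Foo
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;
    -- Classify the change by comparing the inserted and deleted pseudo-tables.
    INSERT INTO dbo.FooHistory (FooId, Action)
    SELECT COALESCE(i.FooId, d.FooId),
           CASE WHEN i.FooId IS NULL THEN 'D'
                WHEN d.FooId IS NULL THEN 'I'
                ELSE 'U' END
    FROM inserted AS i
    FULL OUTER JOIN deleted AS d ON d.FooId = i.FooId;
END;
GO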
There are other options. A very lightweight method is to use a trace; this is what is used for performance tuning and gives you a copy of every statement run on the database (including triggers), and you can log it to a different database server. It is a good idea to log to a different server if you are tracing a heavily used server, since the volume of data is massive if you are tracing, say, 1,000 simultaneous sessions.
https://learn.microsoft.com/en-us/sql/tools/sql-server-profiler/save-trace-results-to-a-table-sql-server-profiler?view=sql-server-ver15
You can also trace to a file and then load it into a table (better performance), and script up starting, stopping, and loading traces.
The load on the server that receives the trace log is minimal, and I have never had a locking problem on the server receiving the trace, so I am pretty sure that something you are doing is causing the locks.
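A hedged sketch of loading a trace file into a table with the built-in sys.fn_trace_gettable function (the file path and target table name are assumptions):
SELECT *
INTO dbo.TraceData   -- created on whichever server you run this from
FROM sys.fn_trace_gettable(N'\\traceserver\traces\app_trace.trc', DEFAULT);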

Azure SQL - Automatic Tuning with Geo-replication - Server in unspecified state and query store has reached its capacity limit

I have a primary db and a secondary geo-replicated db.
On the primary, automatic tuning is turned on at the server level.
On the replica, when I try to do the same, I encounter the following issues.
The database is inheriting settings from the server, but the server is in the unspecified state. Please specify the automatic tuning state on the server.
And
Automated recommendation management is disabled because Query Store has reached its capacity limit and is not collecting new data. Learn more about the retention policies to maintain Query Store so new data can be collected.
However, on the server the tuning options are on, so I don't understand that "unspecified state". Moreover, when I look at the Query Store setup in both databases' properties in SSMS, they are exactly the same, with 9 MB of space available out of 10 MB.
Note: both databases are set up on the Basic pricing tier (5 DTUs).
UPDATE
While the primary database's Query Store operation mode is Read Write, the replica's is Read Only. It seems I cannot change it (I couldn't from the database's properties dialog in SSMS).
Fair enough, but how can the same query be 10 times faster on the primary than on the replica? Aren't optimizations copied across?
UPDATE 2
Actually, the Query Store contents are viewable in SSMS, and I can see that they are identical in both databases. I think the difference in response times that I observe is not related.
UPDATE 3
I marked #vCillusion's post as the answer, as they deserve the credit. However, it's more detailed than the actual issue requires.
My replica is read-only and as such cannot be auto-tuned, as that would require writing to the Query Store. Azure not being able to collect any data into the read-only Query Store led to a misleading (and wrong) error message about the Query Store reaching its capacity.
We get this message only when the Query Store is in read-only mode. Double-check your Query Store configuration.
According to MSDN, you might need to consider the following:
To recover the Query Store, try explicitly setting read-write mode and recheck the actual state.
ALTER DATABASE [QueryStoreDB]
SET QUERY_STORE (OPERATION_MODE = READ_WRITE);
GO
SELECT actual_state_desc, desired_state_desc, current_storage_size_mb,
max_storage_size_mb, readonly_reason, interval_length_minutes,
stale_query_threshold_days, size_based_cleanup_mode_desc,
query_capture_mode_desc
FROM sys.database_query_store_options;
If the problem persists, it indicates that corrupted Query Store data is persisted on disk. The Query Store can be recovered by executing the sp_query_store_consistency_check stored procedure within the affected database.
If that does not help, you can try to clear the Query Store before requesting read-write mode.
ALTER DATABASE [QueryStoreDB]
SET QUERY_STORE CLEAR;
GO
ALTER DATABASE [QueryStoreDB]
SET QUERY_STORE (OPERATION_MODE = READ_WRITE);
GO
SELECT actual_state_desc, desired_state_desc, current_storage_size_mb,
max_storage_size_mb, readonly_reason, interval_length_minutes,
stale_query_threshold_days, size_based_cleanup_mode_desc,
query_capture_mode_desc
FROM sys.database_query_store_options;
If you have checked it and it is in read-write mode, then we might be dealing with a bug here. Please provide feedback to Microsoft on this.
Additional Query Store limitations:
Also note that the Query Store feature was introduced to monitor performance and is still evolving; there are certain known limitations around it.
As of now, it does not work on read-only databases (including readable AG secondaries). Since readable secondary replicas are read-only, the Query Store on those secondary replicas is also read-only, which means runtime statistics for queries executed on those replicas are not recorded into the Query Store. So check that the database is not read-only.
Query Store does not work for system databases such as master or tempdb.
Check that the PRIMARY filegroup has enough space, since Query Store data is stored only in the PRIMARY filegroup.
The supported scenario is that automatic tuning needs to be enabled on the primary only. Indexes automatically created on the primary are automatically replicated to the secondary; this process takes the usual sync-up time between the primary and the secondary. At this time it is not possible to have a read-only secondary replica tuned differently for performance than the primary. The Query Store error message is due to its read-only state, as noted above, and should be disregarded. The performance difference on your secondary replica most likely has some other cause and needs to be explored separately.
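One way to see the effective automatic tuning state on each database, rather than relying on the portal message, is the tuning options view; a hedged sketch, run in the context of each database:
SELECT name,               -- e.g. FORCE_LAST_GOOD_PLAN, CREATE_INDEX, DROP_INDEX
       desired_state_desc,
       actual_state_desc,
       reason_desc         -- why desired and actual differ, if they do
FROM sys.database_automatic_tuning_options;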

Major performance difference between two Oracle database instances

I am working with two instances of an Oracle database, call them one and two. two is running on better hardware (hard disk, memory, CPU) than one, and two is one minor version behind one in terms of Oracle version (both are 11g). Both have the exact same table table_name with exactly the same indexes defined. I load 500,000 identical rows into table_name on both instances. I then run, on both instances:
delete from table_name;
This command takes 30 seconds to complete on one and 40 minutes to complete on two. Doing INSERTs and UPDATEs on the two tables has similar performance differences. Does anyone have any suggestions on what could have such a drastic impact on performance between the two databases?
I'd first compare the instance configurations - SELECT NAME, VALUE from V$PARAMETER ORDER BY NAME and spool the results into text files for both instances and use some file comparison tool to highlight differences. Anything other than differences due to database name and file locations should be investigated. An extreme case might be no archive logging on one database and 5 archive destinations defined on the other.
If you don't have access to the filesystem on the database host, find someone who does and have them obtain the trace files and tkprof results from when you start a session, run ALTER SESSION SET sql_trace=true, and then do your deletes. This will expose any recursive SQL due to triggers on the table (which you may not own), auditing, etc.
If you can monitor the wait_class and event columns in v$session for the deleting session, you'll get a clue as to the cause of the delay. Generally I'd expect a full-table DELETE to be disk bound (a wait class indicating I/O, or maybe Configuration). It has to read the data from the table (so it knows what to delete) and update the data blocks and index blocks to remove the entries, which generates a lot of entries for the UNDO tablespace and the redo log.
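For example, something along these lines while the delete is running (the SID bind variable is an assumption):
SELECT sid, state, wait_class, event, seconds_in_wait
FROM   v$session
WHERE  sid = :deleting_sid;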
In a production environment, the underlying files may be spread over multiple disks (even SSDs). Dev/test environments may have them all on one device, with a lot of head movement on the disk slowing things down. I could see that slowing a statement maybe tenfold; yours is worse than that.
If there is concurrent activity on the table [wait_class of 'Concurrency'] (e.g. other sessions inserting), you may get locking contention, or the sessions may both be trying to hammer the index.
Something is obviously wrong in instance two. I suggest you take a look at these SO questions and their answers:
Oracle: delete suddenly taking a long time
oracle delete query taking too much time
In particular:
Do you have unindexed foreign key references (reason #1 for a delete taking a looong time; look at this script from AskTom, and see the rough check sketched after this list)?
Do you have any ON DELETE triggers on the table?
Is there other activity on instance two (if this table is continuously updated, you may be blocked by other sessions)?
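A rough, single-column-only sketch of the unindexed foreign key check from the first bullet (the AskTom script linked above handles composite keys properly):
SELECT c.table_name, c.constraint_name, cc.column_name
FROM   user_constraints  c
JOIN   user_cons_columns cc ON cc.constraint_name = c.constraint_name
WHERE  c.constraint_type = 'R'                     -- referential (foreign key) constraints
AND NOT EXISTS (SELECT 1
                FROM   user_ind_columns ic
                WHERE  ic.table_name      = cc.table_name
                AND    ic.column_name     = cc.column_name
                AND    ic.column_position = 1);    -- no index leads on this column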
Please note: I am not a DBA...
I have the following written on my office window:
In case of emergency, ask the on-call DBA to:
1. Check the plan
2. Run stats
3. Flush the shared buffer pool
Numbers 2 and/or 3 (sketched below) normally fix queries which work in one database but not the other, or which worked yesterday but not today....
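A hedged sketch of items 2 and 3 (the schema and table names are placeholders; "shared buffer pool" above conflates the shared pool and the buffer cache, so both flushes are shown):
EXEC DBMS_STATS.GATHER_TABLE_STATS(ownname => 'APP_SCHEMA', tabname => 'TABLE_NAME');
ALTER SYSTEM FLUSH SHARED_POOL;    -- discards cached statements and plans
ALTER SYSTEM FLUSH BUFFER_CACHE;   -- rarely advisable outside test systems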

How do I ensure SQL Server replication is running?

I have two SQL Server 2005 instances that are geographically separated. Important databases are replicated from the primary location to the secondary using transactional replication.
I'm looking for a way that I can monitor this replication and be alerted immediately if it fails.
We've had occasions in the past where the network connection between the two instances has gone down for a period of time. Because replication couldn't occur and we didn't know, the transaction log blew out and filled the disk causing an outage on the primary database as well.
My google searching some time ago led to us monitoring the MSrepl_errors table and alerting when there were any entries but this simply doesn't work. The last time replication failed (last night hence the question), errors only hit that table when it was restarted.
Does anyone else monitor replication and how do you do it?
Just a little bit of extra information:
It seems that last night the problem was that the Log Reader Agent died and didn't start up again. I believe this agent is responsible for reading the transaction log and putting records in the distribution database so they can be replicated on the secondary site.
As this agent runs inside SQL Server, we can't simply make sure a process is running in Windows.
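As a hedged aside on the "transaction log blew out" symptom above: the log reuse wait reason on the published database shows whether replication is what is holding the log (the database name is an assumption):
SELECT name, log_reuse_wait_desc   -- REPLICATION here means the Log Reader Agent is behind
FROM sys.databases
WHERE name = N'YourPublishedDb';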
We have emails sent to us for Merge Replication failures. I have not used Transactional Replication but I imagine you can set up similar alerts.
The easiest way is to set it up through Replication Monitor.
Go to Replication Monitor and select a particular publication. Then select the Warnings and Agents tab and then configure the particular alert you want to use. In our case it is Replication: Agent Failure.
For this alert, we have the Response set up to Execute a Job that sends an email. The job can also do some work to include details of what failed, etc.
This works well enough for alerting us to the problem so that we can fix it right away.
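If you prefer scripting the alert rather than using the Replication Monitor dialog, here is a hedged sketch using the predefined replication alerts in msdb (the alert and operator names are assumptions; check msdb.dbo.sysalerts for the exact names on your distributor):
EXEC msdb.dbo.sp_add_notification
     @alert_name          = N'Replication: agent failure',
     @operator_name       = N'DBA Team',
     @notification_method = 1;   -- 1 = e-mail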
You could run a regular check that data changes are taking place, though this could be complex depending on your application.
If you have some form of audit trail table that is very regularly updated (e.g. our main product has a base audit table that lists all actions that result in data being updated or deleted), then you could query that table on both servers and make sure the results you get back are the same. Something like:
SELECT CHECKSUM_AGG(BINARY_CHECKSUM(*))
FROM audit_base
WHERE action_timestamp BETWEEN <time1> AND <time2>
where <time1> and <time2> are round values to allow for different delays in contacting the databases. For instance, if you are checking at ten past the hour you might check items from the start of the last hour to the start of this hour. You now have two small values that you can transmit somewhere and compare. If they are different then something has most likely gone wrong in the replication process - have whatever process does the check/comparison send you a mail and an SMS so you know to check and fix any problem that needs attention.
By using CHECKSUM_AGG(BINARY_CHECKSUM(*)) the amount of data for each table is very small, so the bandwidth used by the checks will be insignificant. You just need to make sure your checks are not too expensive in the load they apply to the servers, and that you don't check data that might be part of open replication transactions and so might be expected to differ at that moment (hence checking the audit trail a few minutes back in time instead of right now in my example); otherwise you'll get too many false alarms.
Depending on your database structure the above might be impractical. For tables that are not insert-only within the timeframe of your check (i.e. that see updates or deletes, unlike the audit trail above), working out what can safely be compared while avoiding false alarms is likely to be both complex and expensive, if not actually impossible to do reliably.
You could manufacture a rolling insert-only table if you do not already have one, by having a small table (containing just an indexed timestamp column) to which you add one row regularly - this data serves no purpose other than to exist so you can check that updates to the table are getting replicated. You can delete data older than your checking window, so the table shouldn't grow large. Testing only this one table does not prove that the other tables are replicating, but finding an error in it would be a good "canary" check (if this table isn't updating in the replica, then the others probably aren't either).
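A minimal sketch of such a rolling "canary" table and its feeder statements (the names and retention window are assumptions; schedule the insert and purge from a SQL Agent job on the publisher):
CREATE TABLE dbo.ReplicationHeartbeat
(
    HeartbeatAt datetime NOT NULL
        CONSTRAINT PK_ReplicationHeartbeat PRIMARY KEY CLUSTERED
);
GO
-- Run every minute on the publisher:
INSERT INTO dbo.ReplicationHeartbeat (HeartbeatAt) VALUES (GETUTCDATE());
-- Keep the table small by purging rows outside the checking window:
DELETE FROM dbo.ReplicationHeartbeat
WHERE HeartbeatAt < DATEADD(DAY, -1, GETUTCDATE());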
This sort of check has the advantage of being independent of the replication process - you are not waiting for the replication process to record exceptions in logs, you are instead proactively testing some of the actual data.
