How to resolve a DMS "failed to access LSN" error in SQL Server? - sql-server

I am trying to use DMS to capture change logs from SQL Server and write them to S3. I have set up a long polling period of 6 hours (AWS recommends > 1 hour). DMS fails with the error below when the database is idle for a few hours during the night.
DMS Error:
Last Error AlwaysOn BACKUP-ed data is not available Task error notification received from subtask 0, thread 0
Error from CloudWatch - Failed to access LSN '000033fc:00005314:01e6' in the backup log sets since BACKUP/LOG-s are not available
I am currently using DMS version 3.4.6 with Multi-AZ.
I always thought DMS reads the change data immediately after the transaction log is updated with DML changes. Why do we see this error even with a long polling period? Can someone explain why this issue occurs and how we can handle it?
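The error suggests DMS tried to resume from an LSN that is no longer in the online transaction log and could not find it in any accessible log backup. As a starting point, it may help to confirm whether the log range DMS needs still exists on the server. A minimal T-SQL sketch ('MySourceDb' is a placeholder for the source database name):

-- 1) Is anything holding up (or permitting) log truncation?
SELECT name, log_reuse_wait_desc
FROM sys.databases
WHERE name = N'MySourceDb';

-- 2) When did transaction log backups run? If a backup job truncated the
--    log while DMS was idle, the LSN DMS resumes from may only exist in
--    those backups, and DMS must be able to read them.
SELECT TOP (20) database_name, backup_start_date, backup_finish_date
FROM msdb.dbo.backupset
WHERE database_name = N'MySourceDb'
  AND type = 'L'   -- 'L' = transaction log backup
ORDER BY backup_finish_date DESC;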

Related

DMS is failing sometimes and it's not reading the data in the next cycle

Sometimes the CDC DMS task fails with the error 'Failure in resolving stream position by timestamp'
I have a DMS CDC task that runs every 4 hours. DMS is able to read the transaction log and place the file in S3, but some runs fail with the error 'Failure in resolving stream position by timestamp'; after that, no data comes through in the next cycle. We have set the polling interval to 24 hours.
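If the task uses MS-CDC on the source (an assumption; DMS can also use MS-Replication), the mitigation AWS documents for long gaps between runs is to lengthen the capture job's polling interval so change data remains scannable between runs. A sketch ('MySourceDb' is a placeholder):

-- Assumes MS-CDC is enabled on the source database.
USE MySourceDb;
-- 86399 seconds (~24 h) covers the 4-hour gaps between task runs.
EXEC sys.sp_cdc_change_job
     @job_type        = N'capture',
     @pollinginterval = 86399;

-- Restart the capture job so the new setting takes effect.
EXEC sys.sp_cdc_stop_job  @job_type = N'capture';
EXEC sys.sp_cdc_start_job @job_type = N'capture';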

Couldn't commit processed log positions with the source database due to a concurrent connector shutdown or restart

I am continuously receiving the warning above from the Debezium Connector for SQL Server, which I am running using connect-standalone. At one point I tried to start two concurrent connectors that connect to the same database, but that was yesterday; since then I have restarted this connector several times and the other connector is stopped, so I don't know where that information is persisted, or why this is still being logged when only one connector is running. Since CDC does not work, this seems like a real problem, even though it is logged only as a warning, because no (other) error is logged, just lines like:
WARN Couldn't commit processed log positions with the source database due to a concurrent connector shutdown or restart (io.debezium.connector.common.BaseSourceTask:238)

SQL Server Trace Files Filling Up Agent Drive

Background:
SQL Compliance Manager collects trace files on an Agent server for auditing; once the trace files accumulate on the Agent, the Compliance Manager agent service account moves them to the Collection Server folder, processes them, and deletes them.
Problem:
More than 5 times in the last month, the trace files have filled the Agent drive to the point where the traces had to be stopped by running a SQL query to change their status (a sketch of such a query follows below). This has had a knock-on effect on the Collection Server as well: the folder there starts to fill up excessively and the Collection Server agent is unable to process the audit trace files. Four of the five times, the issue occurred shortly after a SQL Server failover; however, the last time this trace error occurred there had been no failover. The only thing noticeable in the event logs was that 3 SQL jobs ran around the time the traces started acting up.
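For reference, stopping a runaway server-side trace typically looks something like this (the trace id is a placeholder; check sys.traces first):

-- List active server-side traces to find the offending one.
SELECT id, status, path, max_size, max_files
FROM sys.traces;

-- Stop the trace, then close it and delete its definition (id 2 is a placeholder).
EXEC sp_trace_setstatus @traceid = 2, @status = 0;  -- 0 = stop
EXEC sp_trace_setstatus @traceid = 2, @status = 2;  -- 2 = close and delete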
Behaviour:
A pattern has been identified which shows on Windows Event Viewer that there is an execution timeout close or at the time the trace files start becoming unwieldy.
Error: An error occurred starting traces for instance XXXXXXXXX. Error: Execution Timeout Expired.
The timeout period elapsed prior to completion of the operation or the server is not responding..
The trace start timeout value can be modified on the Trace Options tab of the Agent Properties dialog in the SQLcompliance Management Console.
That said, I do not believe that simply adjusting the timeout settings will stop the traces from behaving this way, as these are the recommended settings and other audited servers have the same settings but do not act the same way. The issue persists on only one box.
Questions:
I want to find out if anyone else has experienced a similar issue and, if so, whether the environment it happened in was dealing with a heavy load. Did reducing the load help, or were there other remediation steps to take? Or does anyone know of a lightweight database auditing tool that doesn't create these issues?
Any help or advice appreciated!

SignalR SQL Server Broker - Orphaned Service Broker Queue Errors

I am using SQL Server Service Broker on SQL Server 2008 for scaleout with SignalR v2.1.2. It was recently discovered that we are producing 50k+ errors per day in our DB logs. After some research, I found 3 orphaned Service Broker queues from December. Error example:
2016-02-27 23:58:01.79 spid30s The activated proc '[dbo].[SqlQueryNotificationStoredProcedure-2ffbddba-6ddc-4ad0-88b4-45a405e975e0]' running on queue 'MY_SIGNALR_DB.dbo.SqlQueryNotificationService-2ffbddba-6ddc-4ad0-88b4-45a405e975e0' output the following: 'Could not find stored procedure 'dbo.SqlQueryNotificationStoredProcedure-2ffbddba-6ddc-4ad0-88b4-45a405e975e0'.'
These queues were created in December and were NOT dropped for some reason. The corresponding stored procedures were apparently dropped as expected. The DB produces an error every 5 seconds for this (which equates to 50k per day across the 3 queues). Each queue DOES contain a message.
Questions:
What can cause this?
Are there additional SignalR settings that can be implemented to ensure these are cleaned up?
Is this a bug in SQL Server Service Broker?
Is there a document which describes SignalR's expected behavior with regards to Queues and their expiration?
Thank you for your time.
These are left over from SqlDependency. SqlDependency.Start() creates a just-in-time service, queue, and activated procedure (see the reference source). This approach has some issues, and even a simple Visual Studio debugging session can leave stranded queues/activated procedures.
You can clean up these left-over service/queue/procedures as they happen, or you can choose to use the lower level SqlNotificationRequest class and handle the service/queue deployment on your own. Pick your poison.
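A minimal cleanup sketch, assuming the default SqlDependency naming pattern and that the objects live in dbo (both assumptions): find queues whose activated procedure no longer exists, then drop the matching service and queue.

-- Find SqlDependency queues whose activation procedure was dropped,
-- then remove the orphaned service and queue (they share a name).
DECLARE @name sysname, @sql nvarchar(max);
DECLARE cur CURSOR FOR
    SELECT q.name
    FROM sys.service_queues AS q
    WHERE q.name LIKE 'SqlQueryNotificationService-%'
      AND q.activation_procedure IS NOT NULL
      AND OBJECT_ID(q.activation_procedure) IS NULL;  -- proc is gone
OPEN cur;
FETCH NEXT FROM cur INTO @name;
WHILE @@FETCH_STATUS = 0
BEGIN
    SET @sql = N'DROP SERVICE ' + QUOTENAME(@name) + N'; '
             + N'DROP QUEUE dbo.' + QUOTENAME(@name) + N';';
    EXEC sys.sp_executesql @sql;
    FETCH NEXT FROM cur INTO @name;
END
CLOSE cur;
DEALLOCATE cur;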

Getting Alert that Backup Log Failed but it didn't

I'm migrating databases from SQL Server 2008 R2 to a new server running SQL Server 2012. I set up an alert for any severity >= 16. I have a maintenance plan that includes a log backup of all user databases every 5 minutes. After restoring about 10 databases to the new server, I started getting an alert every 30 minutes that says:
DESCRIPTION: BACKUP failed to complete the command BACKUP LOG MyDatabaseName. Check the backup application log for detailed messages.
COMMENT: (None)
JOB RUN: (None)
I searched the logs and there is nothing about a failed backup, and all the backups are fine. I get the alert every 30 minutes, so it's not happening on all of the log backups because they run every 5 minutes. And it's only for one or sometimes two databases out of the 10 that have been restored onto the new server.
I would greatly appreciate anyone that can point me in the right direction to start troubleshooting this.
The maintenance plan runs via a SQL Server Agent job. Check the history of the job. Any failures might show there.
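You can also cross-check the backup history that msdb records directly; a small sketch ('MyDatabaseName' is the placeholder name from the alert text):

-- Recent transaction log backups for the database named in the alert.
SELECT TOP (20)
       bs.database_name,
       bs.backup_start_date,
       bs.backup_finish_date,
       bmf.physical_device_name
FROM msdb.dbo.backupset AS bs
JOIN msdb.dbo.backupmediafamily AS bmf
     ON bmf.media_set_id = bs.media_set_id
WHERE bs.type = 'L'   -- 'L' = transaction log backup
  AND bs.database_name = N'MyDatabaseName'
ORDER BY bs.backup_finish_date DESC;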
Error level 16 is not considered critical and can be fixed by the user.
Just set up the following to monitor all alerts > level 11.
1 - Database mail
http://craftydba.com/?p=1025
2 - Operator
http://craftydba.com/?p=1085
3 - Alerts
http://craftydba.com/?p=1099
The next time you get an alert, you should get an email with the details.
If you want to be really fancy, you can have the alert call a job, log the alert in the APPLICATION log, and then send the email.
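A minimal T-SQL sketch of that setup (the operator name and email address are placeholders; repeat sp_add_alert for each severity level you want to cover):

USE msdb;
-- 2 - Operator to notify (assumes Database Mail from step 1 is configured).
EXEC dbo.sp_add_operator
     @name = N'DBA Team',
     @enabled = 1,
     @email_address = N'dba-team@example.com';

-- 3 - Alert on severity 16 (one sp_add_alert call per severity level).
EXEC dbo.sp_add_alert
     @name = N'Severity 16 errors',
     @severity = 16,
     @enabled = 1,
     @delay_between_responses = 60,
     @include_event_description_in = 1;  -- 1 = include description in email

-- Wire the alert to the operator.
EXEC dbo.sp_add_notification
     @alert_name = N'Severity 16 errors',
     @operator_name = N'DBA Team',
     @notification_method = 1;           -- 1 = email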
