How to fix jobs conflict in transactional replication - sql-server

I have two geographically distributed SQL Server databases with transactional replication. SQL Server Agent syncs the two servers every minute. After working for 10 minutes, server B (the subscriber) reports an error that lasts 10-15 minutes and then corrects itself. It then works for another 10 minutes and fails again. On server A (the publisher) I have a log backup schedule that runs every 10 minutes. Maybe there is a conflict between the two jobs?
SQLServerAgent Error: Request to run job XXX (from User sa)
refused because the job is already running from a request by Schedule 14 (Replication agent schedule.).
Changed database context to 'XXX'. (.Net SqlClient Data Provider)
How can I fix it?
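The "refused because the job is already running" message is logged when something asks SQL Server Agent to start a job that is still executing. One hedged way to see which jobs are involved is to list the jobs Agent considers running at that moment. The sketch below assumes the standard msdb tables sysjobs, sysjobactivity and syssessions; it only reads state and does not change any job or schedule.

```sql
-- List jobs that SQL Server Agent currently reports as running.
-- Only rows from the most recent Agent session are considered, because
-- older sessions can leave rows with no stop_execution_date behind.
SELECT j.name AS job_name,
       ja.start_execution_date,
       DATEDIFF(SECOND, ja.start_execution_date, GETDATE()) AS running_seconds
FROM msdb.dbo.sysjobactivity AS ja
JOIN msdb.dbo.sysjobs AS j
  ON j.job_id = ja.job_id
WHERE ja.session_id = (SELECT MAX(session_id) FROM msdb.dbo.syssessions)
  AND ja.start_execution_date IS NOT NULL
  AND ja.stop_execution_date IS NULL
ORDER BY ja.start_execution_date;
```

If the replication agent job appears in this list each time the error is logged, that would suggest the agent is still running from its schedule when a second start request (here from user sa) arrives, rather than a clash with the log backup job.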

Related

SQL Server Managed Backup for Windows Azure (SSMBackup2WA) stuck waiting for progress update

I have a database running on an Azure VM with SQL Server. The database is in full recovery mode. The backup is configured through the web interface. Database and log backups have been working flawlessly for years. But recently the log backup was interrupted halfway through and the log backup process somehow got stuck. The following event has been logged every 5 minutes since then (reading the log with managed_backup.sp_get_backup_diagnostics):
[SSMBackup2WAAdminXevent] Database Name = DB, Database ID = 777, Stage =
VerifyJobOutcome, Error Code = 0, Error Message = Warning, Additional Info = A
progress update hasn't been received from SQL Server in more than 30 minutes
for log backup. SSMBackup2WA will continue to wait.
SSMBackup2WA seems to be stuck waiting for a progress update that is never received. This has resulted in no log backups being taken. The database backups have continued running without problems.
I have trouble finding the job/task used by SSMBackup2WA. I understand it's not in the usual batch of SQL Server Agent jobs but is somehow hidden.
My idea is to somehow cancel the existing job that is stuck in waiting loop but I have not figured out how.
I have tried to "reset" the backup process by turning off the backup and then turning it on again but that did not help.
I have no possibility to restart the sql server (and I don't know if that would help).
So since no one seemed to have an answer to this one I resorted to restarting the SQL-server. And after the restart the transaction log backup started working again!
What is interesting is the following entry that appeared in the application event log during the restart. It does seem like there was a thread hanging indefinitely, waiting for a status update that never arrived. The restart seems to have taken care of it by killing this status thread and not restarting it in the erroneous state it had ended up in.
Log Name: Application
Source: Microsoft SQL Server Automated Backup
Date: 1/15/2022 11:16:20 AM
Event ID: 57007
Task Category: None
Level: Warning
Keywords: Classic
User: N/A
Computer: wn-sqlserver1
Description:
[Warning] AutomatedBackupStatusMonitorError:
System.Exception:
Error in auto-backup status monitor thread --->
Microsoft.SqlServer.Management.IaaSAgentSqlQuery.Contract.IaaSAgentSqlQueryException:
A network-related or instance-specific error occurred while
establishing a connection to SQL Server. The server was not
found or was not accessible. Verify that the instance name
is correct and that SQL Server is configured to allow remote
connections. (provider: Named Pipes Provider, error: 40 - Could
not open a connection to SQL Server) --->

BizTalk SQL Server Job always fails after some hours

We have BizTalk 2016 running on SQL Server 2016 AlwaysOn.
The SQL Server Agent Job MessageBox_Message_ManageRefCountLog_BizTalkMsgBoxDb is doing its thing but after some hours it fails. Sometimes 10 hours sometimes 90 hours or anything in between. I know, the job is designed to run forever and in a case of an error restarts itself within a minute. But I would like to know the actual error message for this failed job. The job history is not helpful because the job log entry is truncated.
A failover is not happening. The question is: WHY is this job failing and ultimately: how do I stop it from doing that?
I have set up extended monitoring of the failing step, and it revealed that the job failed because of a deadlock in which it was chosen as the deadlock victim. So now the question is: why is there a deadlock? Is MessageBox_Message_ManageRefCountLog_BizTalkMsgBoxDb known for deadlock issues?
Check the documentation at Description of the SQL Server Agent Jobs in BizTalk Server, it says:
Important At first, the MessageBox_Message_ManageRefCountLog_BizTalkMsgBoxDb job status icon displays a status of Success. However, there will be no corresponding success entry in the job history. If one of the jobs in the MessageBox_Message_ManageRefCountLog_BizTalkMsgBoxDb job fails, a failure entry appears in the job history and the status icon displays a status of Failure. The job will always display a status of Failure after the first failure. To verify that the other BizTalk Server SQL Server Agent jobs run correctly, check the status of the other BizTalk Server SQL Server Agent jobs.
Hope this answers your question.
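Since the job turned out to be a deadlock victim, one way to see what it was deadlocking with is to read the deadlock reports that the built-in system_health Extended Events session captures. This is a minimal sketch, assuming SQL Server 2012 or later and that the system_health .xel files are in their default location; it is not specific to BizTalk.

```sql
-- Read deadlock reports captured by the built-in system_health session.
-- The file pattern assumes the default .xel location (the instance LOG folder).
SELECT CAST(x.event_data AS xml) AS deadlock_report
FROM sys.fn_xe_file_target_read_file(N'system_health*.xel', NULL, NULL, NULL) AS x
WHERE x.object_name = N'xml_deadlock_report';
```

Each returned XML document lists the processes and resources involved, which should show which other session the ManageRefCountLog job collided with. The system_health files roll over, so older deadlocks may no longer be available.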

Getting Alert that Backup Log Failed but it didn't

I'm migrating databases from SQL Server 2008 R2 to a new server running SQL Server 2012. I set up an alert for any severity >= 16. I have a maintenance plan that includes a log backup of all user databases every 5 minutes. After restoring about 10 databases to the new server, I started getting an alert every 30 minutes that says:
DESCRIPTION: BACKUP failed to complete the command BACKUP LOG MyDatabaseName. Check the backup application log for detailed messages.
COMMENT: (None)
JOB RUN: (None)
I searched the logs and there is nothing about a failed backup, and all the backups are fine. I get the alert every 30 minutes, so it's not happening on all of the log backups because they run every 5 minutes. And it's only for one or sometimes two databases out of the 10 that have been restored onto the new server.
I would greatly appreciate anyone that can point me in the right direction to start troubleshooting this.
The maintenance plan runs via a SQL Server Agent job. Check the history of the job. Any failures might show there.
Error level 16 is not considered critical and can be fixed by the user.
Just set up the following to monitor all alerts above level 11.
1 - Database mail
http://craftydba.com/?p=1025
2 - Operator
http://craftydba.com/?p=1085
3 - Alerts
http://craftydba.com/?p=1099
Next time you get an alert, you should get an email with details.
If you want to be real fancy, you can have the alert call a job. Log the alert in the APPLICATION log and then send the email.
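To cross-check the alert against what was actually backed up, the backup history in msdb can be queried directly. The sketch below assumes the standard msdb.dbo.backupset table and uses the placeholder database name from the alert text; adjust the name and row count as needed.

```sql
-- Recent log backups recorded in msdb for one database.
-- 'MyDatabaseName' is the placeholder name used in the alert text above.
SELECT TOP (50)
       bs.database_name,
       bs.backup_start_date,
       bs.backup_finish_date,
       bs.backup_size
FROM msdb.dbo.backupset AS bs
WHERE bs.database_name = N'MyDatabaseName'
  AND bs.type = 'L'   -- L = transaction log backup
ORDER BY bs.backup_start_date DESC;
```

Only completed backups are recorded here, so an unbroken 5-minute cadence around the alert times would suggest the failing BACKUP LOG command comes from something other than the maintenance plan.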

merge replication - can't create snapshot - timeout - sql server 2008

I have a SQL Server 2008 database, and I need merge replication because I want to sync with mobile devices afterwards.
So I set up replication, but when it comes to starting the Snapshot Agent, the agent tries to start for about 20 minutes and then shows the message
The replication agent has not logged a progress message in 10 minutes.
This might indicate an unresponsive agent or high system activity.
Verify that records are being replicated to the destination and that
connections to the Subscriber, Publisher, and Distributor are still
active.
There aren't any other error messages, neither in the Snapshot Agent status window nor in the agent log window.
I don't have the domain administrator account, but I do have the local administrator and a domain user with admin privileges. Both have full rights to the database and are in the access list of the replication.
SQL Server Agent runs under the local administrator account, and there are 3 merge replications on the server that are working.
The job also runs under the local administrator.
Thank you for your help, Karl
So it works again...
Maybe someone else will have the same issue one day, so I'm posting the solution here:
I looked into the server and found that the SQL Server service is running under a local user. The reason is that there were problems with the backup system used by our customers, so they changed it years ago.
Because of the local user account, a 15404 error occurs.
Knowing that I mustn't use domain accounts, I also solved the initial problem with my Snapshot Agent. I searched for hours (nearly days ;) ) and it was just this little change:
When the replication is created, the job is created too. The job has three steps. The job owner is the local admin, which is also the account for the Server Agent service. But the second step of my job (replictionsnapshot) has one setting: run as. And by default this isn't the job owner but the user who created the replication, in my case my domain account.
Now that I have set it to the local administrator as well, everything works fine again.
Thanks, Karl
I had the same issue, and the following fixed it. The replication agent was timing out after 10 minutes, and changing the heartbeat interval from 10 to 30 minutes solved the issue.
Run the command below
exec sp_changedistributor_property @property = 'heartbeat_interval', @value = 30;
and then restart SQL Server Agent on the subscriber to continue syncing.

Agent message code 20084. The process could not connect to Subscriber

All servers running SQL 2005
SQL server (NOLA) replicates to 35 remote locations (StoreXX).
Earlier this week, one publication started having problems connecting
to 30 of the 35 remote locations, with an error of:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date 2/4/2010 10:00:01 AM
Log Job History (NOLA-Closing_Balance-CB Defaults to Stores-R99S-
Store1-2790)
Step ID 2
Server NOLA
Job Name NOLA-Closing_Balance-CB Defaults to Stores-R99S-Store1-2790
Step Name Run agent.
Duration 00:31:47
Sql Severity 0
Sql Message ID 0
Operator Emailed
Operator Net sent
Operator Paged
Retries Attempted 0
Message
2010-02-04 16:31:48.081 Parameter values obtained from agent profile:
-bcpbatchsize 2147473647
-commitbatchsize 100
-commitbatchthreshold 1000
-historyverboselevel 2
-keepalivemessageinterval 300
-logintimeout 15
-maxbcpthreads 1
-maxdeliveredtransactions 0
-pollinginterval 5000
-querytimeout 1800
-skiperrors
-transactionsperhistory 100
2010-02-04 16:31:48.081 Connecting to Subscriber 'R99S-Store1'
2010-02-04 16:31:48.440 Agent message code 20084. The process could
not connect to Subscriber 'R99S-Store1'.
2010-02-04 16:31:48.472 Category:NULL
Source: Microsoft SQL Native Client
Number: 10054
Message: TCP Provider: An existing connection was forcibly closed by
the remote host.
2010-02-04 16:31:48.472 Category:NULL
Source: Microsoft SQL Native Client
Number: 10054
Message: Communication link failure
2010-02-04 16:31:48.472 The agent failed with a 'Retry' status. Try to
run the agent at a later time.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The problem is only this one publication & not all subscriptions. I
have deleted a problem subscription & readded, same problem.
I have created another publication (same dB) & subscription (same &
new subscribers) with the same results (error above).
Now is where it gets weird…..
I created a new publication using a different dB on both the publisher
& subscriber & everything works fine.
I have had the network folks check what they need to check & have
googled until I am blue in the face.
Can anyone give me any insight into this issue?
AHIA,
Larry….
