SQL unable to resume data movement after manual failover - sql-server

We have a 3 node availability group, PRIMARY, a synchronized none readable replica and a async readable replica. We are currently doing an in place upgrade to 2016, both secondaries have been upgraded and patched so I am manually failing over the primary to the none readable secondary sync replica. When I fail over, the new PRIMARY appears fine, the readable async replica appears fine but the now new secondary goes into a none synchronized state. I have attempted to resume data movement but it fails with an error message about object explorer unable to refresh.
In the error logs I can see these errors:
***The database 'X' cannot be opened because it is version 852. This server supports version 706 and earlier. A downgrade path is not supported.
AlwaysOn Availability Groups data movement for database 'X' has been suspended for the following reason: "system" (Source ID 7; Source string: 'SUSPEND_FROM_REVALIDATION'). To resume data movement on the database, you will need to resume the database manually. For information about how to resume an availability database, see SQL Server Books Online.
Error: 948, Severity: 20, State: 102.***
I assumed this might be due to the primary being 2016 and this server being 2012 but apparently this is a legacy issue on this box, it a DEV box so not been a major concern.
Does anybody know why this is happening and what to do to resolve this?

Related

MSSQL error: "Script level upgrade for database 'master' failed ... upgrade step 'msdb110_upgrade.sql' encountered error 200, state 7, severity 25."

All of a sudden one day (on my DEV PC) my Microsoft SQL Server 2012 instance (installed as instance name "SQL2012") would not start (all my other installed instances did). Trying to start it manually under Services failed. I don't recall making any recent changes prior to this. The cause of the failure was a mystery.
On inspecting Event Viewer, under System it showed a rather amusing error message [emphasis mine]:
The SQL Server (SQL2012) service terminated with the following service-specific error:
WARNING: You have until SQL Server (SQL2012) to logoff. If you have not logged off at this time, your session will be disconnected, and any open files or devices you have open may lose data.
checking under Application Event Log, I found these 2 error messages (preceded by a number of MSSQL$SQL2012 informational messages):
Script level upgrade for database 'master' failed because upgrade step 'msdb110_upgrade.sql' encountered error 200, state 7, severity 25. This is a serious error condition which might interfere with regular operation and the database will be taken offline. If the error happened during upgrade of the 'master' database, it will prevent the entire SQL Server instance from starting. Examine the previous errorlog entries for errors, take the appropriate corrective actions and re-start the database so that the script upgrade steps run to completion.
followed by:
Cannot recover the master database. SQL Server is unable to run. Restore master from a full backup, repair it, or rebuild it. For more information about how to rebuild the master database, see SQL Server Books Online.
Fearing having lost my system databases (and not having a backup of them to restore - who makes backups of their system dbs anyway??) and needing to access the instance, and attached databases - I was willing to try anything. Even the possible restore of the system databases: Restoring the SQL Server Master Database Even Without a Backup - but that looked quite complex.
Fortunately, I was eventually able to start the instance (thank you to this answer: https://stackoverflow.com/a/59676743/4993856 which I trusted, because Pinal Dave also mentions that particular switch in: SQL SERVER – Script level upgrade for database ‘master’ failed because upgrade step msdb110_upgrade.sql encountered error 926, state 1, severity 25) if I ran:
net start mssqlserver$SQL2012 /T902
This pointed to some issue with the upgrade script... (Remember SQL is installed with instance name: SQL2012, hence the mssqlserver$SQL2012 used above for the named instance).
After some more searching I discovered this post: Installing service pack / cumulative update on SQL Server 2016 / 2017 breaks database engine (not exactly the same SQL version as mine) which pointed to the following possible Region Settings setting (Control Panel [when viewed by 'icons'] > All Control Panel Items > Region > Administrative > "Change system locale..."):
"Beta: Use Unicode UTF-8 for worldwide language support" in Region Settings
THAT WAS IT!!! After de-selecting that option (and possibly restarting my computer), the MSSQL Server 2012 Instance started up without any issue, and I was able to access all my previously attached databases.
I assume the pending upgrade scripts ran successfully. Thinking back about it now, it is possible that I agreed to installing a SQL Update, and never bothered to test access to the instance afterwards.
I also don't recall exactly why I chose to enable that specific setting under Region Settings, possibly due to some Linux compatibility, but it looks like it has become defaulted 'on' in recent Windows builds.
I got the same problem SQL2017 after update Windows Patch Hotfix3391(KB5001228)
after restart server MSSQL Fail to start and event viewer shown error below
Script level upgrade for database 'master' failed because upgrade step 'msdb110_upgrade.sql' encountered error 200, state 7, severity 25. This is a serious error condition which might interfere with regular operation and the database will be taken offline. If the error happened during upgrade of the 'master' database, it will prevent the entire SQL Server instance from starting. Examine the previous errorlog entries for errors, take the appropriate corrective actions and re-start the database so that the script upgrade steps run to completion.
Solution
Fix by remove Beta:Use Unicode UTF-8 for Worldwide lang.. in the Region Settings
Then it require restart server. After restart MSSQL can start as normal.
The problem is the msdb_110.sql update script, the script is a bit of a mess, with mixed tabs and spaces (wtf?).
It tries to run a couple of procedures that fail, on startup of sql-server. They fail when the code-page is 65001 (usually because the BETA utf-8 code page option has been selected) and so SQL server fails to start.
This appears to happen any time a SQL Server update is installed. I only experience this error with SQL Server 2017, not 2019
Why?
Don't know? The script is a mess.
Solution
Deselect the use utd-8 code page option
Restart the machine
Start sql server and let it run the script
(optional) reselect the use utd-8 code page option
Restart machine again and sql server
(optinal but recommended) uninstall windows, install a unix and run postgres

Database 'wss_content_1' cannot be opened. It is in the middle of a restore

This is the scenario:
I am testing redundancy and fail over in my dev machine by creating two SQL Servers.
I have created two SQL Servers. One with an extra instance:
-SQL1 : primary server
-SQL2 : mirror server
-SQL2\wtn : witness instance
First I made a full backup of the database and transaction log in my primary server, then I restored the database and transaction log file.
I used option "Restore with no recovery".
In the database node "restoring" is being shown. I believe this is normal when you would like to keep pulling data.
Then in the primary server, I tried to create mirroring on the database. After the wizard completed, I clicked on start Mirroring, but now I am getting the following error:
An error occurred while starting mirroring.
------------------------------
ADDITIONAL INFORMATION:
Alter failed for Database 'WSS_Content_1'. (Microsoft.SqlServer.Smo)
An exception occurred while executing a Transact-SQL statement or batch. (Microsoft.SqlServer.ConnectionInfo)
------------------------------
Database 'wss_content_1' cannot be opened. It is in the middle of a restore. (Microsoft SQL Server, Error: 927)
I see there is another post here on stack overflow, but that does not have much information. I have been waiting more than 10 minutes.
Update
I am using SQL Enterprise 2016 by the way. Perhaps there is a difference in 2016 version.
I was watching this youtube video and my steps were exactly as this guy's.
Update 2:
I followed this as well but did not help.
How to: Prepare a Mirror Database for Mirroring (Transact-SQL)

The MSDTC transaction manager was unable to push the transaction to the destination transaction manager due to communication problems

I have a BizTalk server and a SQL server which BizTalk sends messages via WCF-SQL to. The BizTalk server has been calling to this server for over a year with no problems. I came in this morning any suddenly it can't (it was working on Friday).
The full error I'm getting when calling the WCF-SQL endpoint is:
A message sent to adapter "WCF-SQL" on send port "MyPort" with URI "mssql://mySQLServer" is suspended.
Error details: System.Transactions.TransactionManagerCommunicationException: Communication with the underlying transaction manager has failed. ---> System.Runtime.InteropServices.COMException:
The MSDTC transaction manager was unable to push the transaction to the destination transaction manager due to communication problems.
Possible causes are: a firewall is present and it doesn't have an exception for the MSDTC process, the two machines cannot find each other by their NetBIOS names, or the support for network transactions is not enabled for one of the two transaction managers. (Exception from HRESULT: 0x8004D02A)
at System.Transactions.Oletx.ITransactionShim.Export(UInt32 whereaboutsSize, Byte[] whereabouts, Int32& cookieIndex, UInt32& cookieSize, CoTaskMemHandle& cookieBuffer)
at System.Transactions.TransactionInterop.GetExportCookie(Transaction transaction, Byte[] whereabouts)
I've followed instructions from the following thread:
MSDTC on server 'server is unavailable
I've run msdtc -uninstall then msdtc -install and restarted the service several times.
I've rebooted the server several times.
I can connect to the database using Sql Server Management Studio
DTCPing when trying to connect from the SQL server to the Biztalk server results in (when DTCPing is running on the BizTalk):
Problem:fail to invoke remote RPC method
Error(0x6BA) at dtcping.cpp #303
-->RPC pinging exception
-->1722(The RPC server is unavailable.)
RPC test failed
when going from Biztalk to SQL I get this (even thought DTCPing is running on the other end)
Please refer to following log file for details:
C:\Temp\DTCPing\myserv.log
Invoking RPC method on dbaditest
RPC test is successful
++++++++++++RPC test completed+++++++++++++++
Please start PING from dbaditest to complete the test
neither server is running a firewall at all
I'm all out of things to try.
Edit: I can confirm that other servers/computers can connect to the SQL server. So I have to assume that it's the BizTalk server that is the problem.
Edit 2: I tried connecting from BizTalk Server to another SQL server on the network and got the same error. I'm moments away from throwing my hands up and rebuilding my dev environment -- ugg :(
Edit 3: I can telnet to port 135 from BizTalk to SQL Server, so there's nothing blocking it.
Edit 4: DTCTester results in:
tablename= #dtc24449
Creating Temp Table for Testing: #dtc24449
Warning: No Columns in Result Set From Executing: 'create table #dtc24449 (ival int)'
Initializing DTC
Beginning DTC Transaction
Enlisting Connection in Transaction
Error:
SQLSTATE=25S12,Native error=-2147168242,msg='[Microsoft][ODBC SQL Server Driver]Distributed transaction error'
Error:
SQLSTATE=24000,Native error=0,msg=[Microsoft][ODBC SQL Server Driver]Invalid cursor state
Typical Errors in DTC Output When
a. Firewall Has Ports Closed
-OR-
b. Bad WINS/DNS entries
-OR-
c. Misconfigured network
-OR-
d. Misconfigured SQL Server machine that has multiple netcards.
Aborting DTC Transaction
Releasing DTC Interface Pointers
Successfully Released pTransaction Pointer.
You've already taken some steps here, but carefully go through the MSDN Article on Troubleshooting MSDTC.
I'd be concerned that someone imaged another server off of yours, but uninstalling and reinstalling MSDTC should have fixed that. It might be worth checking on these registry values as well (from the above link):
Windows enhances security by requiring authenticated calls to the RPC interface. This functionality is configurable through the EnableAuthEpResolution and RestrictRemoteClients registry keys. To ensure that remote computers are able to access the RPC interface, follow these steps:
Click Start, click Run, type regedit.exe, and then click OK to start Registry Editor.
Navigate to HKEY_LOCAL_MACHINE\SOFTWARE\Policies\Microsoft\Windows NT
Under the RPC key, create the following DWORD entries with the indicated values. If the RPC key does not exist then it must be created.
DWORD entry Default value Recommended value
EnableAuthEpResolution 0 (disabled) 1
RestrictRemoteClients 1 (enabled) 0
Close Registry Editor.
Restart the MSDTC Service.
Is your BizTalk/SQL computer name unique? (no conflicts with other machine)
Can you DTC connect to another SQL server from your BizTalk server? I would suggest you to use DTCTester
testing the DTC connection instead of DTCPing.
Not sure if this will help but thought I'd mention it.
From BOTH servers:
Start -> Admin Tools -> Component Services
Expand Component Services -> Computers -> My Computer -> Distributed Transaction Coordinator and right-click Local DTC. Go to Security tab and check over the settings there.
Enable Network DTC Access
Allow Remote Clients
Allow Inbound/Outbound as required
Select correct authentication
Enable XA Transactions as required
MSDTC Service should auto restart. These settings could perhaps have changed since Friday? I have had this happen before for reasons unknown
Wow, I finally figured it out. As most people said, it MUST be some kind of network issue (and I didn't disagree). The kicker was that my PC was allowed DTC from it to SQL, but the VM running on my PC didn't. What it ended up being was that we were pushed to install Symantec Endpoint Protection just last week (right before I left for the weekend).
I uninstalled it and all it working now.

Merge replication unintialized subcription is expired or does not exist

I am trying to set up a merge replication using web synchronization between a publishing SQL Server 2012 standard and subscribing SQL Server 2012 Express. After following the instructions provided at Technet, I am stuck on this:
Source: Merge Process(Web Sync Server)
Number: -2147200985
Message: The subscription to publication 'MyMergePublication' has expired or does not exist.
I already verified that SSL certification are good, that I can browse to the publishing machine's URL https:\\mycomputer\replisapi.dll and get the expected output. I already verified that snapshot was set up and I took a giant hammer & use an administrator account to run the pool identity which is really bad security-wise but wanted to validate that it was not security that was tripping me up.
To further the mystery, when I try and fail to sync, the publisher acknowledges that a new subscriber has been registered, but it cannot get the snapshot at all and thus subscriber database is still empty.
On the replication monitor, there are no failed synchronization history, or any errors; all it has to say is that the subscriber is uninitialized, and no more.
Turning up the verbosity of the merge agent, I saw some sql being executed and tried replicating the sql and i found this was failing with same error:
{call sys.sp_MSgetreplicainfo(?,?,?,?,?,?,?,90)}
I called it with only the 3 mandatory parameters supplied and it would fail. That is despite the prior call sp_helpmergepublication does return a row for that publication. Oddly, the content of sp_helpmergepublication does not match what I configured for the subscription (e.g. it says web url is null when viewing the properties correctly shows the web url being set). Not sure that is significant.
The content of sp_MSgetreplicainfo contains a call to another system sprocs that I cannot run for some reason (says not found) so I'm not sure what is actually going on here.
Any clues would be greatly appreciated.

SQL Server 2008 R2 Cluster - SQL Server instance Fails after moving it to another node

I have 2 nodes in a SQL Server 2008 R2 Cluster that I inherited. Looking at the 'Failover Cluster Manager', under 'Services and applications', I see 13 SQL Server instances. It, and all of its resources, are owned by one node. My thought is that they should be evenly distributed between the 2 servers.
When I try to move one of these instances to the other node, everything comes back online with the exception of 'SQL Server (Name)' under Other Resources. It says 'Failed'. When I try to manually bring it online, I get an error message
The operation has failed. An error occurred while attempting to bring the resource 'SQL Server (NAME)' online.
Under see details, it says
Error Code: 0X8007139a The cluster resource could not be brought online by the resource monitor
In system event viewer on the target server, I see the events 1069 and 1205, which both basically say "cluster service or application failed". Under the folder 'FailoverClustering-Manager' > Admin, I see event 4683 - The error was 'The IP Address 10.10.9.150' is already used'. Not sure why that would make SQL Server fail, but none of the other resources. For all the 'Failover' folders in Event Viewer, none of the 'Diagnostic' logs have any events.
Generated and checked the cluster.log file on both servers. For some reason, the time is off in that log, so it's hard for me to pinpoint when the errors below occurred:
[RES] Physical Disk: Resource SQL Network Name (CSDBNAME) is not in online or pending state.
[RES] SQL Server : ResUtilSetResourceServiceEnvironment: Failed to open service key for MSSQL$NAME, error = 2.
[RES] SQL Server : [sqsrvres] OnlineThread: ResUtilSetResourceServiceEnvironment failed (status cb)
[RES] SQL Server : [sqsrvres] OnlineThread: Error cb bringing resource online.
[RHS] Online for resource SQL Server (NAME) failed.
[RCM] rcm::RcmResource::HandleFailure: (SQL Server (NAME))
That's all the log info I can find. Any other ideas to successfully move resources from one node to the other?
First
For the cluster disk you can see in "Disk manager" if it is "reserved".
Try to add a disk cluster like you have done before and see if it move.
Second
For de IP Address, you can try to stop this resource and test a ping from node 1 and form node 2 to check if this IP exist or not
Third
Check in Active Directory, to see if it didn't have a problem with security autorisation in the failover ressources
Check if cluster Name had some autorisation in the network name service

Resources