Adding a new node to an existing SQL cluster - sql-server

I am new to failover clustering. I have a two-node Windows failover cluster (Windows Server 2016) with SQL Server 2016 installed in cluster mode, which was configured successfully, and everything is working fine. I needed to configure an Always On availability group to a DR site, but got an error while adding the third node.
I have uninstalled the antivirus and checked for duplicate names in AD, of which there are none, but that didn't fix my problem.
Cluster service on node xxxxxxxx did not reach the running state. The error code is 0x5b4. For more information check the cluster log and the system event log from node xxxxxx. This operation returned because the timeout period expired.
Operation failed, attempting cleanup.
The server 'xxxxxx.xxx.net' could not be added to the cluster. An error occurred while adding node 'xxxxxx.xxxx.net' to cluster 'xxxxxx'.
This operation returned because the timeout period expired
The event viewer in Failover Cluster Manager says xxxxx has been evicted from the cluster.
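A small aside on the error code: 0x5b4 is hexadecimal, and converting it to decimal gives 1460, which to my knowledge is the Win32 ERROR_TIMEOUT code. That matches the "timeout period expired" text, so the code itself says nothing beyond "timed out", and the actual cause has to come from the cluster log. A minimal Python sketch of the conversion (the function name is just for illustration):

```python
import re
from typing import Optional


def extract_error_code(message: str) -> Optional[int]:
    """Pull the hex code out of an 'error code is 0x...' message and return it as an int."""
    match = re.search(r"error code is (0x[0-9a-fA-F]+)", message)
    return int(match.group(1), 16) if match else None


message = (
    "Cluster service on node xxxxxxxx did not reach the running state. "
    "The error code is 0x5b4."
)
print(extract_error_code(message))  # 1460
```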

You should first run the following PowerShell command on the node that failed to join:
Clear-ClusterNode -Name nodeName -Force
After running this command, try to add the node back to the cluster.

Have you tried adding the third server from Failover Cluster Manager (FCM)? That is, did you spin up the role/server from FCM or from Hyper-V Manager? Please try to spin up the VM from FCM.

Related

SQL Server 2019 Always On AG - DBEngine Service account locking

I am having a strange issue and am hoping you guys might be able to help!
Problem: I have a 2-node SQL Server 2019 Availability Group cluster utilising a FSW. Both nodes use the same DBEngine service account, and it has been working fine for quite some time.
Today I restarted the DBEngine service on the passive node. When the node came back up, it was no longer synchronising with node 1. The state of the replica was disconnected, and I could see lots of login failures in the SQL logs on node 1 (the active node).
I found that the DBEngine service account had been locked out. I had it unlocked, but it soon locked again.
Has anyone got any ideas? Any input would be gratefully received!
Steps I tried:
Created a new service account to rule out the account being used elsewhere, and started both nodes under the new account. The account locked out when node 2 started.
Unlocked the account, stopped node 2, restarted node 1. Account fine. Waited; account still fine. Started the node 2 service; account locked out.
Recreated the mirroring endpoints on both nodes and reapplied CONNECT permissions for the DBEngine service account. This didn't fix it.
Restarted both servers.
Removed the node 2 replica from the availability group, removed all databases (from node 2) and dropped the mirroring endpoint on node 2. Restarted the node 2 service. At this point both nodes were happily running under the same service account.
Tried re-adding node 2 as a replica using the wizard. It added it, backed up the database, restored it to node 2, and got to the very last step where it connects the replica, and the account locked out again!
The account gets locked when something repeatedly tries to use it with a wrong password.
Check Task Scheduler for any task that runs under the service account.
If an application uses the same service account, the lockout could be caused by cached old credentials.

SQL Server Cluster Failover -

I am working on a project that requires reconfiguring the current failover cluster settings. The project requires the report server to be available, as well as login with a Windows-authenticated account.
The current situation is: the server runs perfectly while on the first node, but whenever a failover occurs and the cluster switches over to the second node, there is an issue with gaining access to the report server. When the cluster is running on the second node, SSMS shows the availability group in the (Secondary) position, while the replicas themselves are as follows: Node1 (secondary), role: Secondary; node2: [blank], role: Unknown.
This also brings up an error about problems logging in to the report server from node2.
If anyone knows of any approach, or of settings to change, I would be greatly appreciative.
Thank you!

The MSDTC transaction manager was unable to push the transaction to the destination transaction manager due to communication problems

I have a BizTalk server and a SQL server to which BizTalk sends messages via WCF-SQL. The BizTalk server has been calling this server for over a year with no problems. I came in this morning and suddenly it can't (it was working on Friday).
The full error I'm getting when calling the WCF-SQL endpoint is:
A message sent to adapter "WCF-SQL" on send port "MyPort" with URI "mssql://mySQLServer" is suspended.
Error details: System.Transactions.TransactionManagerCommunicationException: Communication with the underlying transaction manager has failed. ---> System.Runtime.InteropServices.COMException:
The MSDTC transaction manager was unable to push the transaction to the destination transaction manager due to communication problems.
Possible causes are: a firewall is present and it doesn't have an exception for the MSDTC process, the two machines cannot find each other by their NetBIOS names, or the support for network transactions is not enabled for one of the two transaction managers. (Exception from HRESULT: 0x8004D02A)
at System.Transactions.Oletx.ITransactionShim.Export(UInt32 whereaboutsSize, Byte[] whereabouts, Int32& cookieIndex, UInt32& cookieSize, CoTaskMemHandle& cookieBuffer)
at System.Transactions.TransactionInterop.GetExportCookie(Transaction transaction, Byte[] whereabouts)
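For reading the HRESULT in the exception above, it can help to split it into its standard fields. The sketch below assumes the usual HRESULT layout (failure bit at bit 31, an 11-bit facility at bits 16 to 26, a 16-bit code in the low word); the function name is hypothetical. For 0x8004D02A the facility comes out as 4, which as far as I know is FACILITY_ITF, meaning the code is defined by the component that raised it (here MSDTC) rather than being a generic Win32 error.

```python
def decode_hresult(hresult: int) -> dict:
    """Split an HRESULT into its failure flag, facility and code fields."""
    hresult &= 0xFFFFFFFF  # normalise negative (signed) inputs to 32 bits
    return {
        "failure": bool(hresult >> 31),      # top bit set means failure
        "facility": (hresult >> 16) & 0x7FF,  # 11-bit facility field
        "code": hresult & 0xFFFF,             # low 16 bits
    }


fields = decode_hresult(0x8004D02A)
print(fields["failure"], fields["facility"], hex(fields["code"]))
```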
I've followed instructions from the following thread:
MSDTC on server 'server is unavailable
I've run msdtc -uninstall then msdtc -install and restarted the service several times.
I've rebooted the server several times.
I can connect to the database using Sql Server Management Studio
Running DTCPing from the SQL server to the BizTalk server (with DTCPing also running on the BizTalk server) results in:
Problem:fail to invoke remote RPC method
Error(0x6BA) at dtcping.cpp #303
-->RPC pinging exception
-->1722(The RPC server is unavailable.)
RPC test failed
When going from BizTalk to SQL I get this (even though DTCPing is running on the other end):
Please refer to following log file for details:
C:\Temp\DTCPing\myserv.log
Invoking RPC method on dbaditest
RPC test is successful
++++++++++++RPC test completed+++++++++++++++
Please start PING from dbaditest to complete the test
Neither server is running a firewall at all.
I'm all out of things to try.
Edit: I can confirm that other servers/computers can connect to the SQL server. So I have to assume that it's the BizTalk server that is the problem.
Edit 2: I tried connecting from BizTalk Server to another SQL server on the network and got the same error. I'm moments away from throwing my hands up and rebuilding my dev environment -- ugg :(
Edit 3: I can telnet to port 135 from BizTalk to SQL Server, so there's nothing blocking it.
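The telnet check in Edit 3 can be scripted so it is easy to repeat from both machines. This is a minimal sketch using only Python's standard library; the function name is made up for illustration. Note that a successful connection to port 135 only shows that the RPC endpoint mapper is reachable. MSDTC also uses dynamically assigned RPC ports, so this check passing does not rule out something filtering DTC traffic.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


# Example call (host name taken from the question's send port URI):
# tcp_port_open("mySQLServer", 135)
```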
Edit 4: DTCTester results in:
tablename= #dtc24449
Creating Temp Table for Testing: #dtc24449
Warning: No Columns in Result Set From Executing: 'create table #dtc24449 (ival int)'
Initializing DTC
Beginning DTC Transaction
Enlisting Connection in Transaction
Error:
SQLSTATE=25S12,Native error=-2147168242,msg='[Microsoft][ODBC SQL Server Driver]Distributed transaction error'
Error:
SQLSTATE=24000,Native error=0,msg=[Microsoft][ODBC SQL Server Driver]Invalid cursor state
Typical Errors in DTC Output When
a. Firewall Has Ports Closed
-OR-
b. Bad WINS/DNS entries
-OR-
c. Misconfigured network
-OR-
d. Misconfigured SQL Server machine that has multiple netcards.
Aborting DTC Transaction
Releasing DTC Interface Pointers
Successfully Released pTransaction Pointer.
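The native error in the DTCTester output is a signed 32-bit number, which makes it hard to compare with the HRESULT in the BizTalk exception. Masking it to 32 bits gives the familiar hex form. A short sketch (the function name is hypothetical):

```python
def native_error_to_hresult(native_error: int) -> str:
    """Format a signed 32-bit native error as an unsigned hex HRESULT string."""
    return f"0x{native_error & 0xFFFFFFFF:08X}"


print(native_error_to_hresult(-2147168242))  # 0x8004D00E
```

The result, 0x8004D00E, sits in the same 0x8004Dxxx range as the 0x8004D02A from the WCF-SQL error, which is consistent with both failures coming from the distributed transaction layer rather than from SQL Server itself. As far as I recall, 0x8004D00E corresponds to XACT_E_NOTRANSACTION, but treat that name as an assumption to verify.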
You've already taken some steps here, but carefully go through the MSDN Article on Troubleshooting MSDTC.
I'd be concerned that someone imaged another server off of yours, but uninstalling and reinstalling MSDTC should have fixed that. It might be worth checking on these registry values as well (from the above link):
Windows enhances security by requiring authenticated calls to the RPC interface. This functionality is configurable through the EnableAuthEpResolution and RestrictRemoteClients registry keys. To ensure that remote computers are able to access the RPC interface, follow these steps:
Click Start, click Run, type regedit.exe, and then click OK to start Registry Editor.
Navigate to HKEY_LOCAL_MACHINE\SOFTWARE\Policies\Microsoft\Windows NT
Under the RPC key, create the following DWORD entries with the indicated values. If the RPC key does not exist then it must be created.
DWORD entry              Default value   Recommended value
EnableAuthEpResolution   0 (disabled)    1
RestrictRemoteClients    1 (enabled)     0
Close Registry Editor.
Restart the MSDTC Service.
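If the same two values have to be applied on more than one machine, the regedit steps above could be captured in a .reg file instead of clicking through them. The sketch below only generates the file text; it does not touch the registry. The key path and values are the ones quoted above, while the function name and the idea of importing the result with regedit are my own assumptions, so review the output before using it.

```python
RPC_POLICY_KEY = r"HKEY_LOCAL_MACHINE\SOFTWARE\Policies\Microsoft\Windows NT\RPC"

# Recommended values from the steps quoted above.
RECOMMENDED_VALUES = {
    "EnableAuthEpResolution": 1,
    "RestrictRemoteClients": 0,
}


def render_reg_file(key: str, values: dict) -> str:
    """Render DWORD values for one registry key in .reg file syntax."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{key}]"]
    for name, value in values.items():
        lines.append(f'"{name}"=dword:{value:08x}')  # DWORDs are 8 hex digits
    return "\r\n".join(lines) + "\r\n"


print(render_reg_file(RPC_POLICY_KEY, RECOMMENDED_VALUES))
```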
Is your BizTalk/SQL computer name unique (no conflicts with another machine)?
Can you make a DTC connection to another SQL server from your BizTalk server? I would suggest using DTCTester to test the DTC connection instead of DTCPing.
Not sure if this will help, but I thought I'd mention it.
From BOTH servers:
Start -> Admin Tools -> Component Services
Expand Component Services -> Computers -> My Computer -> Distributed Transaction Coordinator and right-click Local DTC. Go to Security tab and check over the settings there.
Enable Network DTC Access
Allow Remote Clients
Allow Inbound/Outbound as required
Select correct authentication
Enable XA Transactions as required
The MSDTC service should restart automatically. These settings could perhaps have changed since Friday? I have had this happen before for reasons unknown.
Wow, I finally figured it out. As most people said, it MUST be some kind of network issue (and I didn't disagree). The kicker was that my PC was allowed to use DTC to the SQL server, but the VM running on my PC wasn't. It turned out that we had been pushed to install Symantec Endpoint Protection just last week (right before I left for the weekend).
I uninstalled it and all is working now.

add node fails w/ Azure WSFC 2012 for SQL2012 AlwaysOn Availability Grps

Adding a node to a Windows Server 2012 failover cluster for AlwaysOn Availability Groups, all in Azure, fails and leaves an apparent phantom VM node. How can I clean this up?
The server property for the target server VM is flagged as "clustered", but it is not. Another node was added successfully, but trying again to add the node that failed earlier does not work: Cluster Manager reports that the target "xxxxx server is already clustered".
I evicted the single node, then chose "destroy cluster". Then I created a newly named cluster and added one node, but when trying to add the "problem" SQL Server VM, I get the same message: "server is already in a cluster". When I remote into the target SQL Azure VM, Server Manager shows the server as "Clustered". I cannot find any way to clean up this failed operation.
When I open Failover Cluster Manager on the SQL VM, I see the name of the cluster I had previously "destroyed", marked with a red X. Same VNET, same subnet. Validation passes during the cluster build, up to the point of failure when trying to add the 2nd SQL VM node to the cluster.
No solution was found after rambling through MSDN and TechNet. I had to delete the Azure VMs completely, but note that in the same cloud, viewed through Microsoft's new portal, parts are still displayed on those pages. Like loose random balloons drifting around Azure chaos...

SQL Server 2008 R2 Cluster - SQL Server instance Fails after moving it to another node

I have 2 nodes in a SQL Server 2008 R2 cluster that I inherited. Looking at 'Failover Cluster Manager', under 'Services and applications', I see 13 SQL Server instances. They, and all of their resources, are owned by one node. My thought is that they should be evenly distributed between the 2 servers.
When I try to move one of these instances to the other node, everything comes back online with the exception of 'SQL Server (Name)' under Other Resources. It says 'Failed'. When I try to manually bring it online, I get this error message:
The operation has failed. An error occurred while attempting to bring the resource 'SQL Server (NAME)' online.
Under 'See details', it says:
Error Code: 0X8007139a The cluster resource could not be brought online by the resource monitor
In the System event log on the target server, I see events 1069 and 1205, which both basically say "cluster service or application failed". Under the folder 'FailoverClustering-Manager' > Admin, I see event 4683: the error was 'The IP Address 10.10.9.150 is already used'. I'm not sure why that would make SQL Server fail but none of the other resources. For all the 'Failover' folders in Event Viewer, none of the 'Diagnostic' logs have any events.
I generated and checked the cluster.log file on both servers. For some reason, the time is off in that log, so it's hard for me to pinpoint when the errors below occurred:
[RES] Physical Disk: Resource SQL Network Name (CSDBNAME) is not in online or pending state.
[RES] SQL Server : ResUtilSetResourceServiceEnvironment: Failed to open service key for MSSQL$NAME, error = 2.
[RES] SQL Server : [sqsrvres] OnlineThread: ResUtilSetResourceServiceEnvironment failed (status cb)
[RES] SQL Server : [sqsrvres] OnlineThread: Error cb bringing resource online.
[RHS] Online for resource SQL Server (NAME) failed.
[RCM] rcm::RcmResource::HandleFailure: (SQL Server (NAME))
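The status values in these log lines are easier to read once converted to Win32 error numbers. The sketch below is a rough parser written for the lines quoted above only; it assumes "error = N" is decimal and that "status xx" / "Error xx" are hexadecimal, and the two names in the lookup table are the ones I am confident of. Everything else (function and variable names) is hypothetical.

```python
import re
from typing import Optional, Tuple

LINE_RE = re.compile(r"^\[(?P<component>[A-Z]+)\]\s*(?P<message>.*)$")
DECIMAL_RE = re.compile(r"error = (\d+)")
HEX_RE = re.compile(r"(?:status|Error) ([0-9a-fA-F]+)\b")

# Only the two Win32 codes that appear in the quoted log lines.
WIN32_NAMES = {2: "ERROR_FILE_NOT_FOUND", 203: "ERROR_ENVVAR_NOT_FOUND"}


def parse_line(line: str) -> Optional[Tuple[str, Optional[int], Optional[str]]]:
    """Return (component, win32_code, win32_name) for one cluster.log line."""
    match = LINE_RE.match(line.strip())
    if not match:
        return None
    message = match.group("message")
    code = None
    decimal = DECIMAL_RE.search(message)
    hexadecimal = HEX_RE.search(message)
    if decimal:
        code = int(decimal.group(1))
    elif hexadecimal:
        code = int(hexadecimal.group(1), 16)
    return match.group("component"), code, WIN32_NAMES.get(code)


print(parse_line("[RES] SQL Server : [sqsrvres] OnlineThread: Error cb bringing resource online."))
```

Read this way, "error = 2" while opening the service key for MSSQL$NAME is a "not found" error, and status cb (203) follows from it. One possible reading, which is an assumption to verify rather than a diagnosis, is that the SQL Server service for that instance is not registered on the target node, for example if the instance was never added to that node through SQL Server setup.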
That's all the log info I can find. Any other ideas to successfully move resources from one node to the other?
First
For the cluster disk, check in Disk Management whether it is shown as "Reserved".
Try adding a cluster disk as you have done before and see whether it moves.
Second
For the IP address, try stopping this resource and then pinging the address from node 1 and from node 2 to check whether the IP is already in use.
Third
Check in Active Directory that there is no problem with the security permissions on the failover resources.
Check whether the cluster name has the required permissions on the network name resource.

Resources