I have 3 SQL Servers in a mirroring setup, one acting as witness, with a database mirrored in high-safety mode. When I fail over from principal to mirror, the app goes down; sometimes it comes back up after 2 refreshes, sometimes only after 5 minutes. It keeps trying to connect to the principal (I can see the denied attempts in the log for the user). These are not named instances.
I've tried with and without "Failover Partner" in the connection string (I don't think it should be needed with a witness). I am using DNS names for the principal and mirror, matching what is in "mirroring_partner_instance". I've also tried adding a timeout to the connection string and forcing TCP/IP, and I've turned on the SQL Browser service. These are just things I tried based on other posts.
I remember mirror failovers being very fast, almost instant. Am I just being too impatient? Is there a way to speed up the retry?
I am in an unusual situation: I need to test connectivity from my servers to Oracle databases, but I do not have access to any account or password.
Connectivity needs to be tested because there are often multiple layers of firewalls between my servers and the database. Recently, while trying to access RAC/Exadata databases, we also found that a telnet test against the "scan" IP range (the only range visible to me) was not enough: the underlying physical/virtual IPs that are actually used to connect were blocked. If I can test connectivity, I can at least confirm the database is reachable.
I thought about connecting with sqlplus test@DB, where the "test" account doesn't actually exist. If the reply says the username/password is incorrect and the logon is denied, then I know connectivity works, because the request reached the database for authentication. But I have audit concerns (DBAs may think someone is trying to hack the system), and I would like to know whether there is a proper command for this test.
As @OldProgrammer pointed out, this is pretty much an ideal case for tnsping from the command line:
tnsping MY_SERVICE_NAME
Here's a good post showing the basic options. Oh, and I'm pretty sure the DBAs can still see the traffic if they want to.
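If tnsping is not installed on the client, a plain TCP connection attempt against each listener address gives a rough equivalent of the telnet test without sending any credentials. The following is only a minimal sketch in Python: the function name `can_reach` is made up for illustration, and it proves only that the port accepts connections, not that the listener serves your service name.

```python
import socket


def can_reach(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the name and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For RAC you would call it once per address you were given, for example `can_reach("node1-vip.example.com", 1521)` for each node VIP as well as the SCAN addresses. The host name here is hypothetical, and 1521 is only the default listener port, so check what your DBAs actually use.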
I have had this error in my log for a few weeks. I searched a lot but couldn't find a useful answer.
I closed the SQL Server port on the public IP, but I still have the problem.
Error: 17806, Severity: 20, State: 14.
SSPI handshake failed with error code 0x8009030c, state 14 while establishing a connection with integrated security; the connection has been closed. Reason: AcceptSecurityContext failed. The Windows error code indicates the cause of failure. The logon attempt failed [CLIENT: 10.10.3.25]
Time raised: 27 Jan 2015 2:23 PM
The error was raised while this system was off.
The Scenario –
A couple of separate individual Windows IDs started generating these errors while attempting connections; all other Windows logins were working properly. The connections were initially made through applications, but the errors also occurred through sqlcmd. When logged in to the server locally with the offending IDs, the connections to SQL would succeed.
The Troubleshooting process –
Check all the regular SSPI issues. I won't bore you with the details, as they are easily searchable.
A relatively easy way of checking the “easy” authentication issues, if possible/appropriate, is to log into the SQL Server locally with the offending ID, fire up sqlcmd, and connect to the server with sqlcmd -S servername,port -E (by specifying the port you force TCP/IP instead of LPC, thereby bringing the network into the equation).
Verify whether the login is trying to use NTLM or Kerberos (there are many ways to do this, but the simplest is to see if there are any other KERBEROS connections on the machine):
SELECT DISTINCT auth_scheme FROM sys.dm_exec_connections
If Kerberos is in use, there are a few additional things to verify related to SPNs. Since only NTLM was in use on this server, I skipped that.
Determine whether the accounts were excluded from connecting to the machine over the network by a group policy or some other AD setting.
After all of these checked out OK, I began to try to figure out what the error code 0x8009030c meant. It turns out the description is fairly obvious: SEC_E_LOGON_DENIED. This description was so helpful I thought about making this server into a boat anchor, but luckily for my employer the server room is located many miles away and has armed guards.
Since I knew we could log on locally to the SQL Server with the ID that SQL was rejecting with logon denied, something else was trying to make my life miserable.
We didn’t have logon failure security auditing turned on, so I had no way of getting a better error description. As luck would have it, though, this would prove instrumental in finding the root cause. To get a better error message, I found this handy KB article detailing the steps needed to put Netlogon into debug mode.
Say hello to my new best friend! — nltest.exe
After downloading nltest & using it to enable netlogon debugging on the SQL Server, I got this slightly better message in the netlogon.log file
06/15 14:15:39 [LOGON] SamLogon: Network logon of DOMAIN\USER from Laptop Entered
06/15 14:15:39 [CRITICAL] NlPrintRpcDebug: Couldn’t get EEInfo for I_NetLogonSamLogonEx: 1761 (may be legitimate for 0xc0000064)
06/15 14:15:39 [LOGON] SamLogon: Network logon of DOMAIN\USER from Laptop Returns 0xC0000064
The error code 0xC0000064 maps to “NO_SUCH_USER”.
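When netlogon.log gets long, picking out the failing logons by hand is tedious. Here is a rough Python sketch that scans for the "Returns" lines and translates a few common NTSTATUS codes. It assumes the log lines look exactly like the ones quoted above; the helper name `failed_logons` and the short lookup table are mine, not part of nltest or any Microsoft tool.

```python
import re

# NTSTATUS codes commonly seen for failed logons (values from ntstatus.h).
NTSTATUS_NAMES = {
    0xC0000064: "STATUS_NO_SUCH_USER",
    0xC000006A: "STATUS_WRONG_PASSWORD",
    0xC000006D: "STATUS_LOGON_FAILURE",
    0xC0000234: "STATUS_ACCOUNT_LOCKED_OUT",
}

# Assumed line shape, taken from the excerpt above:
# "... SamLogon: Network logon of <account> from <client> Returns 0x<code>"
RETURNS_LINE = re.compile(
    r"SamLogon: Network logon of (?P<account>\S+) from (?P<client>\S+) "
    r"Returns (?P<status>0x[0-9A-Fa-f]+)"
)


def failed_logons(lines):
    """Yield (account, client, status_name) for each non-zero logon result."""
    for line in lines:
        match = RETURNS_LINE.search(line)
        if not match:
            continue
        code = int(match.group("status"), 16)
        if code == 0:
            continue  # 0x0 is STATUS_SUCCESS
        name = NTSTATUS_NAMES.get(code, f"unknown (0x{code:08X})")
        yield match.group("account"), match.group("client"), name
```

Feed it the open log file, e.g. `for row in failed_logons(open(path)): print(row)`, where `path` points at wherever netlogon.log lives on your server.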
Since I was currently logged in to the server with the ID that was returning no such user, something else was obviously wrong, and luckily at this point I knew it wasn’t SQL.
Running “set log” on the server (which lists the environment variables starting with “log”, including LOGONSERVER) revealed that a local DC (call it DC1) was servicing the local logon request.
After asking our AD guys about DC1 and its synchronization status, as well as whether the user actually existed there, everything still looked OK.
After looking around a bit more I discovered this gem of a command for nltest to determine which DC will handle a logon request
C:\>nltest /whowill:Domain Account
[16:32:45] Mail message 0 sent successfully (\MAILSLOT\NET\GETDC579)
[16:32:45] Response 0: DC2 D:Domain A:Account (Act found)
The command completed successfully
Even though this command returned “act found”, the response came from DC2. (I don't exactly understand why the same account would authenticate against 2 different DCs depending on whether it is a local desktop login or a SQL login, but apparently it can.)
After asking the AD guys about DC2, the light bulbs apparently went off for them, as that server actually exists behind a different set of firewalls, in a totally different location. While DC2 would return a ping, the console wouldn’t allow logons for some reason. After a quick reboot of DC2 and some magic AD pixie dust (I am not an AD admin, if it wasn’t totally obvious from my newfound friend nltest), the Windows IDs that were having trouble started authenticating against DC3 and our SSPI errors went away.
Interesting tidbit: during troubleshooting, I found that this particular SQL Server was authenticating accounts against at least 5 different DCs. Some of this might be expected since there are different domains at play, but I haven’t heard a final answer from the AD guys about whether it should work that way.
The solution
Reboot the misbehaving DC. Of course, there may be other ways to fix this by redirecting requests to a different DC without a reboot, but since it was misbehaving anyway and the AD experts wanted to reboot, we went with that. A reboot of SQL would likely have solved this problem too, but I hate reboot fixes; the issues always seem to come back!
I have 3 servers set up for SQL mirroring and automatic failover using a witness server. This works as expected.
Now my application that connects to the database seems to have a problem when a failover occurs: I need to manually intervene and change connection strings for it to connect again.
The best solution I've found so far involves using the Failover Partner parameter of the connection string; however, it's neither intuitive nor complete: Data Source="Mirror";Failover Partner="Principal" found here.
From the example in the blog above (scenario #3): when the first failover occurs and the principal (failover partner) is unavailable, the data source is used instead (which is the new principal). If it fails over again (and I only tried within a limited period), an error message appears. This happens because the connection string is cached, so until it is refreshed the error keeps occurring (it seems the connection string refreshes ~5 mins after it encounters an error). If I swap data source and failover partner after a failover, I get one more silent failover.
Is there a way to achieve fully automatic failover for applications that use mirroring databases too (without ever seeing the error)?
I can see potential workarounds using custom scripts that would poll the name of the currently active database node and adjust the connection string accordingly; however, that seems like overkill at the moment.
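To make the workaround idea concrete, here is roughly what I have in mind, sketched in Python rather than in my real application code. Everything in it is hypothetical: `connect` stands in for whatever driver call actually opens a connection, and the server names are placeholders.

```python
import time


def connect_with_failover(connect, servers, attempts=3, delay=1.0):
    """Try each server in order, repeating the whole list up to `attempts` times.

    `connect` is any callable that takes a server name and returns a
    connection or raises an exception; it stands in for the real driver call.
    """
    last_error = None
    for attempt in range(attempts):
        for server in servers:
            try:
                return server, connect(server)
            except Exception as exc:  # a real driver raises a narrower type
                last_error = exc
        if attempt < attempts - 1:
            time.sleep(delay)  # give the failover time to complete
    raise ConnectionError(f"no partner accepted a connection: {last_error}")
```

Called as `connect_with_failover(my_connect, ["Principal", "Mirror"])`, it returns the name of whichever partner answered together with the connection, so the caller can remember the active node for the next attempt. It does nothing about connections already sitting in a pool.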
Read the blog post here
http://blogs.msdn.com/b/spike/archive/2010/12/15/running-a-database-mirror-setup-with-the-sqlbrowser-service-off-may-produce-unexpected-results.aspx
It explains what is happening: the failover partner is actually being read from SQL Server, not from your config. Run the query in that post to find out what is actually being used as the failover server. It will probably be a machine name that is not discoverable from where your client is running.
You can clear the connection pool when a failover has happened. Not very nice, I know ;-)
// ClearAllPools resets (or empties) the connection pool.
// If there are connections in use at the time of the call,
// they are marked appropriately and will be discarded
// (instead of being returned to the pool) when Close is called on them.
System.Data.SqlClient.SqlConnection.ClearAllPools();
We use it when we change an underlying server via SQL Server alias, to enforce a "refresh" of the server name.
http://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlconnection.clearallpools.aspx
The solution is to turn connection pooling off by adding Pooling="false" to the connection string.
Whilst this has minimal impact on small applications, I haven't tested it with applications that receive hundreds of requests per minute (or more), and I am not sure what the implications are. Anyone care to comment?
Try this connectionString:
connectionString="Data Source=[MSSQLPrincipalServerIP,MSSQLPORT];Failover Partner=[MSSQLMirrorServerIP,MSSQLPORT];Initial Catalog=DatabaseName;Persist Security Info=True;User Id=userName; Password=userPassword.; Connection Timeout=15;"
If you are developing in .NET, you can try ObjAdoDBLib, or PigSQLSrvLib and PigSQLSrvCoreLib, which make the code simpler.
Example code:
New object
ObjAdoDBLib
Me.ConnSQLSrv = New ConnSQLSrv(Me.DBSrv, Me.MirrDBSrv, Me.CurrDB, Me.DBUser, Me.DBPwd, Me.ProviderSQLSrv)
PigSQLSrvLib or PigSQLSrvCoreLib
Me.ConnSQLSrv = New ConnSQLSrv(Me.DBSrv, Me.MirrDBSrv, Me.CurrDB, Me.DBUser, Me.DBPwd)
Execute this method to automatically connect to the online database after the mirror database fails over.
Me.ConnSQLSrv.OpenOrKeepActive
For more information, see the relevant links.
https://www.nuget.org/packages/ObjAdoDBLib/
https://www.nuget.org/packages/PigSQLSrvLib/
https://www.nuget.org/packages/PigSQLSrvCoreLib/
I'm trying to set up mirroring between two SQL Server 2008 databases on different servers in my internal network, as a test run before doing the same thing with two live servers in different locations.
When I actually try and switch the mirroring on the target DB (with
ALTER DATABASE testdb SET PARTNER = N'TCP://myNetworkAddress:5022') I'm getting an error telling me that the server network address can not be reached or does not exist. A little research suggests this is a fairly unhelpful message that pops up due to a number of possible causes, some of which are not directly related to the server existing or otherwise.
So far I've checked and tried the following to solve this problem:
On the target server, I've verified in SQL Server Configuration Manager that "Protocols for SQLEXPRESS" (my local installation is labelled SQLEXPRESS for some reason, even though querying SERVERPROPERTY('Edition') reveals that it's 64-bit Enterprise) and Client Protocols for SQL Native Client 10 both have TCP/IP enabled.
I'm using a utility program called CurrPorts to verify that a TCP/IP port with the number specified in the mirroring setup (5022) is open and listening on my machine. Netstat confirms that both machines are listening on this port.
I've run SELECT type_desc, port FROM sys.tcp_endpoints; and
SELECT state_desc, role FROM sys.database_mirroring_endpoints to ensure that everything is set up as it should be. The only thing that confused me was that "role" returns 1; I'm not entirely sure what that means.
I've tried to prepare the DB correctly. I've taken backups of the database and the log file from the principal DB and restored them on the target database with NORECOVERY. I've tried turning mirroring on both while leaving the target in the NORECOVERY state and after running an empty RESTORE; neither seems to make much difference. Just as a test I also tried to mirror an inactive, nearly empty database that I created, but that didn't work either.
I've verified that neither server is behind a firewall (they're both on the same network, although on different machines).
I've no idea where to turn next. I've seen these two troubleshooting help pages:
http://msdn.microsoft.com/en-us/library/ms189127.aspx
http://msdn.microsoft.com/en-us/library/aa337361.aspx
And as far as I can tell I've run through all the points to no avail.
One other thing I'm unsure of is the service accounts box in the wizard. For both databases I've been putting in our high-level access account name which should have full admin permissions on the database - I assumed this was the right thing to do.
I'm not sure where to turn next to try and troubleshoot this problem. Suggestions gratefully received.
Cheers,
Matt
I think that SQL Express can only act as a witness server for this feature. You might get better mileage on ServerFault, though.
Mike.
Your network settings might be OK. MS SQL gives quite uninformative error messages here: the problem might be an authorization issue and the server will still say "network address can not be reached".
By the way, how is the authentication performed? The MSSQL service itself (on server1) must run as a valid DB user (on server2, and vice versa) for mirroring to work.
ActiveDirectory Server 2003
I am using the ActiveDirectoryMembershipProvider and ADroleProvider. They work great, until my Active Directory server restarts in the middle of the day to get updates (I'm not in charge of the server and can't change this). When this happens, for the five minutes the server is rebooting, my users can't use my website because I've tied my menu to the Role Provider. So, here are my questions:
Is it possible to tell my RoleProvider to use the "next" available ADS? If so, how do I do it, so that I don't frustrate my users with ADS connection messages while the initial one reboots?
Should I be using some kind of connection pool that automatically reconnects to the available server? If so, how?
Let's imagine that all my active directory servers go down. Is there a way to keep my web application running? Obviously there are bigger problems if all servers are down, but what I'm after is a possible "disconnected" active directory authentication that will still move forward if the server somehow goes kaput. Is this wise AND possible?
You probably have the server connection string set to "server01.domain.local". If you change it to just "domain.local" you're no longer depending on "server01" being online. Instead you will use the Round Robin feature of Active Directory DNS to get a list of all domain controllers and use one that's online. (I don't think your admins reboot all of the domain controllers at the same time...)
Also try running nslookup domain.local a couple of times in succession in a command prompt to see the order changing.
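If you would rather check this from a script than from nslookup, something like the following Python sketch lists every address the resolver currently hands back for the domain name. The function name `resolve_all` is made up for the example, 389 is the standard LDAP port, and the operating system's resolver may cache or reorder results, so the order will not necessarily rotate the way nslookup shows it.

```python
import socket


def resolve_all(name, port=389):
    """Return the distinct IP addresses the resolver lists for `name`."""
    infos = socket.getaddrinfo(name, port, proto=socket.IPPROTO_TCP)
    addresses = []
    for _family, _type, _proto, _canon, sockaddr in infos:
        ip = sockaddr[0]
        if ip not in addresses:
            addresses.append(ip)  # keep the order the resolver returned
    return addresses
```

Calling `resolve_all("domain.local")` from the web server should return one entry per domain controller registered in DNS; if it returns only one address, pointing the provider at the domain name will not buy you much.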