Do connection string DNS lookups get cached? - sql-server

Suppose the following:
I have a database set up on database.mywebsite.com, which resolves to IP 111.111.1.1, running from a local DNS server on our network.
I have countless ASP, ASP.NET and WinForms applications that use a connection string utilising database.mywebsite.com as the server name, all running from the internal network.
Then the box running the database dies, and I switch over to a new box with an IP of 222.222.2.2.
So, I update the DNS for database.mywebsite.com to point to 222.222.2.2.
Will all the applications and computers running them have cached the old resolved IP address?
I'm assuming they will have.
Any suggestions along the lines of "don't have your IP change each time you switch box" are not too welcome as I cannot control this aspect of the situation, unfortunately. We are currently using the machine name of the box, which changes every time it dies and all apps etc. have to be updated with the new machine name. It hurts.

Even if the DNS is not cached local to the machine, it will likely be cached somewhere along the DNS chain between the machine and the name servers, at least for a short while. My understanding is this situation would usually be handled with IP takeover where you just make the new machine 111.111.1.1.
Probably a question for serverfault.

You're looking for DNS TTL (Time To Live) I guess.. In my opinion applications may cache the IP for at most the value of the TTL. I'm afraid however that some applications/technologies might actually cache it longer (agian in my opinion completely wrong)

Each machine will cache the ip address.
The length of time it is cached is the TTL (Time To Live). This is a setting on your DNS server, if you set it very low say 5 mins, then you show be up and running fairly quikly. A bit of a hack but it should work.

Yes, the other comments are correct in that what controls this is the DNS TTL set for the hostname database.mywebsite.com.
You'll have to decide what the maximum amount of time you're willing to wait for if you have a failure on your primary address (111.111.1.1) after you make the switch to the secondary address. Lower settings will give you a quicker recovery time, but will also increase the load and bandwidth to your DNS server because clients will have to re-query it to refresh their cache more often.
You can use nslookup using the -d option from your cmd prompt to see what your default TTL times and remaining TTL times are for the DNS server you are querying.
%> nslookup -d google.com

You should assume that they are cashed for two reasons not clearly mentioned before:
1- Many "modern" versions of OS families do DNS caching.
2- Many applications do DNS caching or have poor error/failure detection on live connections and/or opening new connections. This would possibly include your database client.
Also, this is probably not well documented. I did some googling, and found this for MySQL:
http://dev.mysql.com/doc/refman/5.0/en/connector-net-programming-connecting-connection-string.html#connector-net-programming-connecting-errors
It does not clearly explain its behavior in this regard.

I had a similar issue with a web site that disables the application pool recycling features and runs for weeks on end. Sometimes, a clustered SQL Server box would restart and for some reason, my SqlConnection's were not reconnecting. I was getting the error:
A network-related or instance-specific
error occurred while establishing a
connection to SQL Server. The server
was not found or was not accessible.
Verify that the instance name is
correct and that SQL Server is
configured to allow remote
connections. (provider: Named Pipes
Provider, error: 40 - Could not open a
connection to SQL Server)
The server was there - and running - in fact, if I just recycled the app pool, the app would work fine - but I don't like recycling app pools!
The connections that were being held in the connection pool were somehow using old connection information, and that could have been old IP addresses. This is what seems so similar to the poster's question, that it appears to be cached DNS information, because as soon as some sort of a cache is cleared, the app works fine.
This is how I solved it - by forcing all of the connections in the pool to be re-created:
Try
' Example: SqlDependency, but this could also be any SqlConnection.Open call
Dim result As Boolean = SqlClient.SqlDependency.Start(ConnStr)
Catch sqlex As SqlClient.SqlException
SqlClient.SqlConnection.ClearAllPools()
End Try
The code sample is just the boiled-down basics - it should be tweaked for your situation!

The DNS gets cached, but for any server that resolves to the wrong ip address, you can update the HOSTS file of the server and the ip should be updated immediately. This could be a solution if you have a limited amount of servers accessing your database server.

Related

Very slow SQL query when using IP in connection string

I am having a real big headache with slow SQL query. I have two connection string which are as follows :
ConnectString1 = "Driver={SQL Server};Server=MSSQLSERVER5;Database=SchoolMain;Uid=Admin;Pwd=admin101;"
and
ConnectString2 = "Provider=SQLOLEDB.1;Persist Security Info=False;User ID=Admin;Password=admin101;Initial Catalog=SchoolMain;Data Source=192.168.1.2,1433"
Both are connecting to a SQL Server database instance on the same system. The first connects using the database instance name, while second connects using IP address (over the internet to the system) and port number.
The query with ConnectString1 is very first, takes less than 2 seconds to execute while query using ConnectString2 is extremely very slow and most times comes back with timeout expired error.
I have searched everywhere on the internet and still cannot find where the issue is. I read about turning off the LLMNR protocol and adding entries in the hosts file to tackle Reverse DNS, followed the steps but it's still the same as query with ConnectString2 is still very slow.
Though when I changed the IP address in ConnectString2 from 192.168.1.2 to 127.0.0.1, the query works very fast, just exactly as it is with ConnectString1.
Is there a way to route all IP address to 127.0.0.1 on the machine?
I need ConnectString2 to work query will be pushed over the IP address from other systems outside the LAN.
Note: I am using SQL Server 2008
Please help.
The main reason this may happen is reverse DNS, meaning the sql needs to translate the IP to a physical address and it takes time.
To fix this problem use your host file to add the address to your machine and than use the name given at the host file instead of the IP.
If you do not want to temper with the host file try to look at the alternate solution to enable the TCP/IP which is disabled by default.

Strange mikrotik dns relation to firebird database

In one company there is windows server 2008 hosting firebird 2 database.
Clients are using some software to connect from local machines to this database.
Network is running on few mikrotik routers.
When i change main gateway mikrotik router dns to cleanbrowsing ip addresses (185.228.168.10 and 185.228.169.11), software can not connect fo this firebird database.
When i use 8.8.8.8 dns or 1.1.1.1 - no such problems.
Software does not relate to dns, i know this because it is written by me in c#.
How possible is that and why it happens?
Changing the main gateway router's DNS server to another upstream server means you are potentially getting different responses to DNS queries. Assuming that nothing else has changed on your network, I imagine one of the following:
Your new DNS provider does not have special config for the dns entries you are querying
Your new DNS provider is located somewhere else physically, and you are running into a situation where geolocation matters (different dns responses to differently located users)
There is another gadget on the network intercepting DNS and is unaware of the change you are making. For example a NAT rule on a router that redirects 8.8.8.8 to an internal DNS server.
I agree with your assessment that the software is probably not causing this, because you changed infrastructure, I think that this is an infrastructure problem.
With 15+ years of experience running FirebirdSQL in small networks, I always set following things to prevent such problems:
The first DNS at the router's DHCP should point to the router's IP (gateway) itself, so it resolves local pc names easier
Setting a (random?) DHCP domain name at router's setup is recommended too
Edit/replace the firebird.conf file with one of fixed default port (3050) + event port (3051).
Opening those ports on each PC's firewall is a MUST. Both incoming and outgoing. You may narrow it to local IP range to prevent outside attacks. (Create a script once, run it on each PC as Admin once.)
Usually I also add "fbserver.exe" to firewall exception too
Restart FirebirdSQL service (or the whole PC) after changing gateway or DNS or firebird.conf

The MSDTC transaction manager was unable to pull the transaction from the source transaction manager due to communication problems

I have hosted my WebApp on server 1 and my database on server 2
But I'm getting following error
Communication with the underlying transaction manager has failed.
I googled and found a post which mentioned that it is the issue of DTC(Distributed Transaction)
I enabled DTC on server2(DB server) and made an exception of it in Firewall.
But still same error.
Here is the full stack trace
Message: System.Transactions.TransactionManagerCommunicationException: Communication with the underlying transaction manager has failed. ---> System.Runtime.InteropServices.COMException: The MSDTC transaction manager was unable to pull the transaction from the source transaction manager due to communication problems. Possible causes are: a firewall is present and it doesn't have an exception for the MSDTC process, the two machines cannot find each other by their NetBIOS names, or the support for network transactions is not enabled for one of the two transaction managers. (Exception from HRESULT: 0x8004D02B)
at System.Transactions.Oletx.IDtcProxyShimFactory.ReceiveTransaction(UInt32 propgationTokenSize, Byte[] propgationToken, IntPtr managedIdentifier, Guid& transactionIdentifier, OletxTransactionIsolationLevel& isolationLevel, ITransactionShim& transactionShim)
at System.Transactions.TransactionInterop.GetOletxTransactionFromTransmitterPropigationToken(Byte[] propagationToken)
Kindly advice
We had the exact same situation, and more than once. Each time, it was one of the following:
The IP address in the DNS for the server is outdated (as said in error message: "two machines cannot find each other by their NetBIOS names"). You can check if this is the case by trying ping servername from one server to another in the command prompt. If the ping by name fails and ping by IP succeeds (or ping by name returns the wrong IP), than you should talk to the System Admins to take a look at DNS/DHCP.
The servers are created as an image of preconfigured server (for example, if you are working with virtual machines, and instead of doing a fresh install for each of the servers, you simply clone the image). This is a problem because DTC has an internal "Identifier" - and in case of image cloning both your installations now have same DTC ID, and won't be able to communicate with each other. The solution is to simply uninstall and install the DTC again.
Hope it helps.
Things to check:
Have you done this configuration on both servers?
Are both servers members of the same domain?
Have you checked the event log?
I had the same problem while connecting to a remote SQl Server.
The solution in my case was to add "enlist=false" to the connection string.
I was missing quite a lot of things:
No authentication (as DB server and APP server and not within same AD domain)
Rule to Windows Firewall enabling msdtc.exe
Rule to firewall between DMZ and internal zone TCP 135,1024-65535 in both directions. The link tell you how to restrict the firewall policy to few ports only.
short / long server names to hosts or a shared DNS server. Eg. 192.168.1.1 app1 as well as 192.168.1.1 app1.domain.local
On the other hand based on this link my setup doesn't require:
Allow Remote Clients
Allow Remote Administration
Enable XA Transactions (required prior Windows Server 2003 SP1)
Solved after adding remote IP\machine name to files on server:
hosts, lmhosts
in folder
C:\Windows\System32\drivers\etc
One of our servers displayed this error after the Virtual Machine (VM) controlling our Domain Controller froze. Several related communication problems also started to pop up (like failed password resets). Resetting the frozen VM fixed the issue.
Lots of helpful answers already given.
One problem for me was the presence of invalid (cyrillic) characters in the computer name.
And there is also a way to validate the connection between two servers (or between a server and a computer) using a small tool from Microsoft called DTCPing.

Sql Server JDBC Connection Reset Error : Only on Amazon EC2

Context: The Cloud
We have a java-based web application that we normally host on our own servers. Recently we used Amazon Web Services (AWS EC2) cloud to host an instance.
This "cloud setup" matches our typical "on site" setup: one server for the app server, another server for the database server. (Several app servers point to the same database server)
The problem
In this cloud setup, we receive intermittent "connection reset by peer errors" between the database and the jdbc driver, where at (seemingly) random intervals and at random points in the codebase, the database connection fails.
Here are a few error excerpts for the log
Stack Trace Example 1:
at com.participate.pe.genericdisplay.client.taglib.GenDisplayViewTag.doStartTag(GenDisplayViewTag.java:77)
... 75 more
Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: The connection is closed.
at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDriverError(SQLServerException.java:170)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.checkClosed(SQLServerConnection.java:304)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.getMetaData(SQLServerConnection.java:1734)
at org.jboss.resource.adapter.jdbc.WrappedConnection.getMetaData(WrappedConnection.java:354)
Stack Trace Example 2
at java.lang.Thread.run(Thread.java:619)
Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: Connection reset
at com.microsoft.sqlserver.jdbc.SQLServerConnection.terminate(SQLServerConnection.java:1368)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.terminate(SQLServerConnection.java:1355)
at com.microsoft.sqlserver.jdbc.TDSChannel.read(IOBuffer.java:1532)
at com.microsoft.sqlserver.jdbc.TDSReader.readPacket(IOBuffer.java:3274)
at com.microsoft.sqlserver.jdbc.TDSCommand.startResponse(IOBuffer.java:4437)
at com.microsoft.sqlserver.jdbc.TDSCommand.startResponse(IOBuffer.java:4389)
at com.microsoft.sqlserver.jdbc.SQLServerConnection$1ConnectionCommand.doExecute(SQLServerConnection.java:1457)
at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:4026)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:1416)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectionCommand(SQLServerConnection.java:1462)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.setAutoCommit(SQLServerConnection.java:1610)
at org.jboss.resource.adapter.jdbc.BaseWrapperManagedConnection.checkTransaction(BaseWrapperManagedConnection.java:429)
Technical Environment
Jboss 4.2.2.GA (Jboss-Web 2.0/ Tomcat 6)
MSSQL 2005 2.0 jdbc driver
Some points
We have never seen this problem in
our own environment (i.e. own data centers) running the application for several years
This led me to conclude "something funny is going on with Amazon network environment". I may be wrong/missing something/etc.
This problem only occurs with our application. We have other java and php applications which have not had this problem. The other java application uses a different jdbc driver (jtds, afaik)
It doesn't seem like a simple connection timeout
Questions
-Has anyone seen this before?
-If it's an EC2 "known issue", can we configure our way around the problem (i.e. make sure everything is on its own subnet or virtual private cloud (vpc) ?
-Any jdbc driver settings to get past this problem?
** Update **
I've extended and increased the bounty on this question.
On extra bit of information: the two virtual servers (database and application server) were on different subnets--i.e. one hop between the two servers.
In a non-cloud environment we have "zero hops" bewtewn the two servers.
Our hosting admins said we had no control over the subnets of our EC2 instances. This made me wonder if virtual private cloud would help.
thanks in advance
will
Not sure if this is related or not. We experienced something similar with an app that we were running in the EC2 environment. Same symptom, that the database connection would intermittently close. We were using MSSQL 1.2 driver. Also, we would see the errors usually after a delay or idle time with the connection. Our assumption (never proven) was that something in the network layer was closing the connection and the client wasn't detecting it, so it became stale.
We were able to work around it because we were using commons connection pools, and had the pool recreate the connection on failure. We eventually moved the application out of EC2 and didn't see the issue again.
Just a word of caution on usind DBCP/connection pool features to mitigate the issue - the more you enable 'testOnBorrow' and other features, the more you can introduce latency or other performance changing affects on the system. I don't know if DBCP still does this or not, but a few years ago it would generate actual test queries to test the connection - full stack, database responses - not just at the network layer. The above link from Brian brings back horrific memories from the early 2000s on surrounding re-try logic for JDBC connection management.
Anyway, it's tough to really root cause this, other than gather evidence and eliminate the 'seemingly random' to a specific set of conditions:
You could try to throw up a Wireshark/PCAP trace, find when it happens, and send the results to both Amazon and Microsoft to see if they can root cause it
You could try the above with certain test harnesses to isolate the problem (JMeter tests to get concurrency up), bounce the network connection, watch for recovery, etc
You could try alternative versions of SQL Server to discount a SQL Server/JDBC driver bug that has since been fixed.
If DNS is used in connection strings, could use IP addresses to validate nslookup issues
I'm not a SQL Server expert, but another route for research could be within the related products domain - e.g. see if anyone experienced similar issues with TFS/Sharepoint (e.g. such as http://nickhoggard.wordpress.com/2009/12/07/further-experiences-with-tfs-2010-beta-2-on-amazon-ec2/ )
I have seen this issue in both the EC2 environment and the Windows Azure environment. I think connection retry logic needs to be a standard part of your design when working in a distributed computing environment.
This article is for SQL Azure - but I think it equally applies to EC2 and all drivers.
I can also confirm that this happens and will spin up a lower priority investigation since it's not production critical.
Our production servers are in our data center. We use developer laptops to run our applications. Neither of these get this issue once we configured c3p0 connection pool timeouts and test period (see article: http://www.codefin.net/2007/05/hibernate-and-mysql-connection-timeouts.html).
However, we do have a development staging server that is in EC2 and it does indeed happen there. If I find something that seems to work, I'll ping back. Also, I'm using mysql. I see that you are using MS SQL Server so it is across database vendors.

Mirroring in SQL Server 2008

I'm trying to set up mirroring between two sql 2008 databases on different servers in my internal network, as a test run before doing the same thing with two live servers in different locations.
When I actually try and switch the mirroring on the target DB (with
ALTER DATABASE testdb SET PARTNER = N'TCP://myNetworkAddress:5022') I'm getting an error telling me that the server network address can not be reached or does not exist. A little research suggests this is a fairly unhelpful message that pops up due to a number of possible causes, some of which are not directly related to the server existing or otherwise.
So far I've checked and tried the following to solve this problem:
On the target server, I've verified that in SQL Configuration Manager that "Protocols for SQLEXPRESS" (my local installation is labelled SQLEXPRESS for some reason, even though querying SERVERPROPERTY('Edition') reveals that it's 64-bit Enterprise), and Client Protocols for SQL Native Client 10 all have TCP/IP enabled
I'm using a utility program called CurrPorts to verify that there is a TCP/IP port with the same number specified by the mirroring setup (5022) is open and listening on my machine. Netstat verifies that both machines are listening on this port.
I've run SELECT type_desc, port FROM sys.tcp_endpoints; and
SELECT state_desc, role FROM sys.database_mirroring_endpoints to ensure that everything is set up as it should be. The only thing that confused me was the "role" returns 1 .. not entirely sure what that means.
I've tried to prepare the DB correctly. I've taken backups of the database and the log file from the master DB and restored them on the target database with NORESTORE. I've tried turning mirroring on both while leaving them in the NORESTORE state and running an empty RESTORE ... neither seems to make much difference. Just as a test I also tried to mirror an inactive, nearly empty database that I created but that didn't work either.
I've verified that neither server is behind a firewall (they're both on the same network, although on different machines)
I've no idea where to turn next. I've seen these two troubleshooting help pages:
http://msdn.microsoft.com/en-us/library/ms189127.aspx
http://msdn.microsoft.com/en-us/library/aa337361.aspx
And as far as I can tell I've run through all the points to no avail.
One other thing I'm unsure of is the service accounts box in the wizard. For both databases I've been putting in our high-level access account name which should have full admin permissions on the database - I assumed this was the right thing to do.
I'm not sure where to turn next to try and troubleshoot this problem. Suggestions gratefully received.
Cheers,
Matt
I think that SQL Express can only act as a witness server with this SQL feature, you might get better mileage on ServerFault though.
Mike.
Your network settings might be OK. We got quite non-informative error messages in MS SQL - the problem might be an authorization issue and the server still will be saying "network address can not be reached".
By the way, how the authentication is performed? A MSSQL service (on server1) itself must be runned as a valid db user (on server2, and vice versa) in order to make the mirroring work.

Resources