chrony - local stratum & orphan - when there's no Internet - ntp

Most chrony server configurations I've found so far are about commenting out and setting the allow directive.
I'd like several servers in my network to be synchronized to one of my 2 dedicated chrony NTP servers, even if there's no Internet, and when each chronyd is not able to synchronize to any of X.pl.pool.ntp.org servers.
According to the man page, the local stratum directive allows chrony to serve time 'even if not synchronized to a time source'. I'm wondering if it works in the following way: as long as chronyd is able to sync to one of the X.pl.pool.ntp.org servers, everything is fine; when chronyd is not able to sync to any external source, it still acts as a valid NTP server for the local clients thanks to the local stratum directive. Am I right? Isn't it like telling the clients that chrony is always healthy regardless of the Internet connection status, so they can always sync to it?
Question 1: should I comment out local stratum 10 to meet my requirements?
Question 2: I'm also considering using orphan mode on my NTP servers in the following way: local stratum 10 orphan
As far as I understand, it works like this: assume my NTP servers lose their connections to all X.pl.pool.ntp.org servers (I configured them to poll the same external sources). Then, thanks to local stratum 10 orphan, the clients will always sync to the server with the lowest reference ID first (we assume my production servers are set to poll only my local NTP servers). Am I right about that?
Regards,

Related

Finding out sources of connections to MongoDB cluster

The "Real Time Metrics" panel of my MongoDB Atlas cluster shows 36 connections, even though I terminated all server apps that were supposed to be connected to it. Currently nothing should be connected to it, but I still see those 36 connections. I tried pausing the cluster and then resuming it, and the connections came back. Is there any way for me to find out where they are coming from, or to terminate all connections?
Each connection is supposed to send what is called "app metadata" when it is established. This is supposed to always include:
The driver identifier (e.g. pymongo 1.2.3)
The platform of the client (e.g. linux amd64)
Additionally, you can provide your own information to be sent as part of client metadata which you can use to identify your application. See e.g. https://docs.mongodb.com/ruby-driver/master/tutorials/ruby-driver-create-client/ :app_name option.
Atlas has internal processes that connect to cluster nodes and cluster nodes communicate with each other also. All of these add to connection count seen on each node.
To figure out where connections are coming from:
Read the server logs (which you have to download first) to obtain the client metadata sent with each connection.
Hopefully this will provide enough clues to identify cluster to cluster connections. You should also be able to tell those by source IPs which you should be able to dig out of cluster configuration.
Atlas connections should be using either Go or Java drivers; if you don't use those in your own applications, this is an easy way of telling them apart.
Add app name to all of your application connections to eliminate those from the unknown ones.
There is no facility provided by MongoDB server to terminate connections from clients. You can kill operations and sessions but connections used for those operations would remain until the clients close them. When clients close connections depends on the particular driver used and connection pool settings, see e.g. https://docs.mongodb.com/ruby-driver/master/tutorials/ruby-driver-create-client/#connection-pooling.
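As a rough sketch of the log-reading step, the snippet below counts connections by driver, app name, and source host. It assumes the downloaded log uses the structured JSON format of recent server versions, where each connection produces a "client metadata" entry with the details under attr.doc; the field names, sample lines, addresses, and the app name "billing-api" are illustrative assumptions, and older servers that log plain text would need a different parser.

```python
import json
from collections import Counter

# Illustrative log lines in the structured JSON format (assumed shape; real
# entries carry more fields). Addresses and the app name are made up.
SAMPLE_LOG = """\
{"t":{"$date":"2021-01-01T00:00:00.000+00:00"},"s":"I","c":"NETWORK","id":51800,"ctx":"conn11","msg":"client metadata","attr":{"remote":"10.0.0.5:50312","client":"conn11","doc":{"driver":{"name":"PyMongo","version":"3.11.0"},"os":{"type":"Linux","architecture":"x86_64"},"application":{"name":"billing-api"}}}}
{"t":{"$date":"2021-01-01T00:00:01.000+00:00"},"s":"I","c":"NETWORK","id":51800,"ctx":"conn12","msg":"client metadata","attr":{"remote":"10.0.0.9:41877","client":"conn12","doc":{"driver":{"name":"mongo-go-driver","version":"v1.4.0"},"os":{"type":"linux","architecture":"amd64"}}}}
{"t":{"$date":"2021-01-01T00:00:02.000+00:00"},"s":"I","c":"NETWORK","id":22943,"ctx":"listener","msg":"Connection accepted","attr":{"remote":"10.0.0.9:41878"}}
"""


def summarize_clients(log_text):
    """Count 'client metadata' entries by (driver, app name, source host)."""
    counts = Counter()
    for line in log_text.splitlines():
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip blank or non-JSON lines
        if not isinstance(entry, dict) or entry.get("msg") != "client metadata":
            continue
        attr = entry.get("attr", {})
        doc = attr.get("doc", {})
        driver = doc.get("driver", {}).get("name", "unknown")
        app = doc.get("application", {}).get("name", "(no app name)")
        host = attr.get("remote", "unknown").rsplit(":", 1)[0]  # drop the port
        counts[(driver, app, host)] += 1
    return counts


if __name__ == "__main__":
    for (driver, app, host), n in summarize_clients(SAMPLE_LOG).most_common():
        print(n, driver, app, host)
```

Connections that show a driver you never use, or that lack the app name you set on your own clients, are the ones worth investigating further.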

How do I know the hostname of my NTP server?

I set up an NTP server on my Windows machine using the Meinberg NTP server setup.
I think I have it working, but where do I find the name of the server so I can add it to the config file of the device I want to sync to the server?
You access all network services a computer hosts by its hostname or IP, independent of the protocol. Some services can also be registered in the DNS to make them "discoverable" but normally only networks of a certain size justify the effort involved in setting this up.
Simply determine the hostname of your computer and specify it as the NTP host on the device you want to sync. Perhaps the easiest way to get the hostname is to press Windows + Pause/Break, which opens the system properties. This should work on most current Windows versions.
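If Python happens to be installed on the server, the same information can be read programmatically. This is only a small sketch; the function name local_ntp_target is made up for illustration, and the address lookup may fail or return a loopback address depending on how name resolution is configured on the machine.

```python
import socket


def local_ntp_target():
    """Return (hostname, address) of this machine; address is None if unresolved."""
    hostname = socket.gethostname()
    try:
        address = socket.gethostbyname(hostname)
    except OSError:
        address = None  # name resolution is not set up for this hostname
    return hostname, address


if __name__ == "__main__":
    name, ip = local_ntp_target()
    print("hostname:", name)
    print("address:", ip if ip else "(could not resolve)")
```

Either value can go into the device's configuration; the IP address avoids any dependency on the device being able to resolve the hostname.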

Sql Server JDBC Connection Reset Error : Only on Amazon EC2

Context: The Cloud
We have a java-based web application that we normally host on our own servers. Recently we used Amazon Web Services (AWS EC2) cloud to host an instance.
This "cloud setup" matches our typical "on site" setup: one server for the app server, another server for the database server. (Several app servers point to the same database server)
The problem
In this cloud setup, we receive intermittent "connection reset by peer errors" between the database and the jdbc driver, where at (seemingly) random intervals and at random points in the codebase, the database connection fails.
Here are a few error excerpts from the log:
Stack Trace Example 1:
at com.participate.pe.genericdisplay.client.taglib.GenDisplayViewTag.doStartTag(GenDisplayViewTag.java:77)
... 75 more
Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: The connection is closed.
at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDriverError(SQLServerException.java:170)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.checkClosed(SQLServerConnection.java:304)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.getMetaData(SQLServerConnection.java:1734)
at org.jboss.resource.adapter.jdbc.WrappedConnection.getMetaData(WrappedConnection.java:354)
Stack Trace Example 2
at java.lang.Thread.run(Thread.java:619)
Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: Connection reset
at com.microsoft.sqlserver.jdbc.SQLServerConnection.terminate(SQLServerConnection.java:1368)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.terminate(SQLServerConnection.java:1355)
at com.microsoft.sqlserver.jdbc.TDSChannel.read(IOBuffer.java:1532)
at com.microsoft.sqlserver.jdbc.TDSReader.readPacket(IOBuffer.java:3274)
at com.microsoft.sqlserver.jdbc.TDSCommand.startResponse(IOBuffer.java:4437)
at com.microsoft.sqlserver.jdbc.TDSCommand.startResponse(IOBuffer.java:4389)
at com.microsoft.sqlserver.jdbc.SQLServerConnection$1ConnectionCommand.doExecute(SQLServerConnection.java:1457)
at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:4026)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:1416)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectionCommand(SQLServerConnection.java:1462)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.setAutoCommit(SQLServerConnection.java:1610)
at org.jboss.resource.adapter.jdbc.BaseWrapperManagedConnection.checkTransaction(BaseWrapperManagedConnection.java:429)
Technical Environment
Jboss 4.2.2.GA (Jboss-Web 2.0/ Tomcat 6)
MSSQL 2005 2.0 jdbc driver
Some points
We have never seen this problem in our own environment (i.e. own data centers) running the application for several years
This led me to conclude "something funny is going on with Amazon network environment". I may be wrong/missing something/etc.
This problem only occurs with our application. We have other java and php applications which have not had this problem. The other java application uses a different jdbc driver (jtds, afaik)
It doesn't seem like a simple connection timeout
Questions
-Has anyone seen this before?
-If it's an EC2 "known issue", can we configure our way around the problem (i.e. make sure everything is on its own subnet or virtual private cloud (VPC))?
-Any jdbc driver settings to get past this problem?
** Update **
I've extended and increased the bounty on this question.
One extra bit of information: the two virtual servers (database and application server) were on different subnets, i.e. one hop between the two servers.
In a non-cloud environment we have "zero hops" between the two servers.
Our hosting admins said we had no control over the subnets of our EC2 instances. This made me wonder if virtual private cloud would help.
thanks in advance
will
Not sure if this is related or not. We experienced something similar with an app that we were running in the EC2 environment. Same symptom, that the database connection would intermittently close. We were using MSSQL 1.2 driver. Also, we would see the errors usually after a delay or idle time with the connection. Our assumption (never proven) was that something in the network layer was closing the connection and the client wasn't detecting it, so it became stale.
We were able to work around it because we were using commons connection pools, and had the pool recreate the connection on failure. We eventually moved the application out of EC2 and didn't see the issue again.
Just a word of caution on using DBCP/connection pool features to mitigate the issue: the more you enable 'testOnBorrow' and other features, the more you can introduce latency or other performance-changing effects on the system. I don't know if DBCP still does this or not, but a few years ago it would generate actual test queries to test the connection (full stack, database responses), not just at the network layer. The above link from Brian brings back horrific memories from the early 2000s of re-try logic for JDBC connection management.
Anyway, it's tough to really root cause this, other than gather evidence and eliminate the 'seemingly random' to a specific set of conditions:
You could try to throw up a Wireshark/PCAP trace, find when it happens, and send the results to both Amazon and Microsoft to see if they can root cause it
You could try the above with certain test harnesses to isolate the problem (JMeter tests to get concurrency up), bounce the network connection, watch for recovery, etc
You could try alternative versions of SQL Server to discount a SQL Server/JDBC driver bug that has since been fixed.
If DNS is used in connection strings, could use IP addresses to validate nslookup issues
I'm not a SQL Server expert, but another route for research could be within the related products domain, e.g. see if anyone experienced similar issues with TFS/SharePoint (such as http://nickhoggard.wordpress.com/2009/12/07/further-experiences-with-tfs-2010-beta-2-on-amazon-ec2/ )
I have seen this issue in both the EC2 environment and the Windows Azure environment. I think connection retry logic needs to be a standard part of your design when working in a distributed computing environment.
This article is for SQL Azure - but I think it equally applies to EC2 and all drivers.
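A minimal sketch of such retry logic, written in Python for brevity rather than against any particular JDBC pool: the names run_with_retry and StaleConnectionError are hypothetical, and the exception stands in for whatever "connection reset" or "connection is closed" error the real driver raises. The broken connection is discarded and a fresh one is opened before the next attempt.

```python
import time


class StaleConnectionError(Exception):
    """Hypothetical stand-in for a 'connection reset' / 'connection is closed' error."""


def run_with_retry(connect, operation, attempts=3, delay_seconds=0.0):
    """Run operation(conn), reconnecting and retrying when the connection is stale."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    conn = None
    for attempt in range(attempts):
        try:
            if conn is None:
                conn = connect()  # open a fresh connection when none is held
            return operation(conn)
        except StaleConnectionError as exc:
            last_error = exc
            conn = None  # discard the broken connection
            if attempt < attempts - 1:
                time.sleep(delay_seconds * (2 ** attempt))  # exponential backoff
    raise last_error


if __name__ == "__main__":
    opened = []

    def connect():
        opened.append(len(opened) + 1)
        return opened[-1]

    def query(conn):
        if conn == 1:  # pretend the first connection was silently dropped
            raise StaleConnectionError("Connection reset")
        return "result via connection %d" % conn

    print(run_with_retry(connect, query))
```

Retrying like this is only safe for operations that can be repeated without side effects, so writes inside a transaction usually need to restart the whole transaction rather than a single statement.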
I can also confirm that this happens and will spin up a lower priority investigation since it's not production critical.
Our production servers are in our data center. We use developer laptops to run our applications. Neither of these get this issue once we configured c3p0 connection pool timeouts and test period (see article: http://www.codefin.net/2007/05/hibernate-and-mysql-connection-timeouts.html).
However, we do have a development staging server that is in EC2 and it does indeed happen there. If I find something that seems to work, I'll ping back. Also, I'm using mysql. I see that you are using MS SQL Server so it is across database vendors.

SMO some times doesn't display the instances in sql2008 cluster

I have used the SMO API, calling SmoApplication.EnumAvailableServers(false) and filtering the local instances from the result. I used this approach instead of passing true so that it is also convenient for remote SQL discovery. Using that API, I created a DLL and use that DLL from C++.
This works in all combinations, but it sometimes fails to retrieve the instances in a Windows 2008 / SQL Server 2008 cluster combination. If I run the exe 5 times, it succeeds 3 times and fails twice.
What is wrong with the Windows 2008 / SQL Server 2008 cluster? Are any additional changes needed to make it work properly? My firewall is off, and I have also added an exception for TCP port 1433.
Any help is greatly appreciated.
Thanks in Advance.
SMO finds the instances through the SQL Browser Service if I recall correctly. The SQL Browser is listening on UDP 1434 which should be opened in the firewall, but for a cluster the service would be set to manual start up, again if I recall correctly, I don't have a SQL 2008/Windows 2008 cluster to check immediately. Check that the SQL Browser Service is started on the nodes owning instances and then that each node has UDP 1434 open in the firewall.
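To check each node independently of SMO, one option is to query the SQL Browser service directly over UDP 1434. The sketch below is based on my understanding of the SQL Server Resolution Protocol: a single 0x03 byte asks one host for all its instances, and the reply starts with 0x05, a 2-byte little-endian length, and a semicolon-separated text listing. Treat those byte values and the function names as assumptions to verify against the protocol specification; the parser is also simplified and does not handle empty field values.

```python
import socket


def parse_browser_response(data):
    """Parse a SQL Browser reply: 0x05, 2-byte little-endian length, then text."""
    if len(data) < 3 or data[0] != 0x05:
        raise ValueError("not a SQL Browser response")
    size = int.from_bytes(data[1:3], "little")
    text = data[3:3 + size].decode("ascii", errors="replace")
    instances = []
    for chunk in text.split(";;"):  # instances are separated by ';;'
        fields = chunk.strip(";").split(";")
        if len(fields) < 2:
            continue
        # Fields alternate key;value;key;value...
        instances.append(dict(zip(fields[0::2], fields[1::2])))
    return instances


def query_browser(host, timeout=2.0):
    """Ask the SQL Browser on one host for its instances (raises on timeout)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(b"\x03", (host, 1434))
        data, _ = sock.recvfrom(65535)
    return parse_browser_response(data)
```

Calling query_browser repeatedly against each cluster node (the host names are yours to fill in) shows whether a particular node intermittently fails to answer; a timeout suggests the Browser service is stopped on that node or UDP 1434 is blocked.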

Do connection string DNS lookups get cached?

Suppose the following:
I have a database set up on database.mywebsite.com, which resolves to IP 111.111.1.1, running from a local DNS server on our network.
I have countless ASP, ASP.NET and WinForms applications that use a connection string utilising database.mywebsite.com as the server name, all running from the internal network.
Then the box running the database dies, and I switch over to a new box with an IP of 222.222.2.2.
So, I update the DNS for database.mywebsite.com to point to 222.222.2.2.
Will all the applications and computers running them have cached the old resolved IP address?
I'm assuming they will have.
Any suggestions along the lines of "don't have your IP change each time you switch box" are not too welcome as I cannot control this aspect of the situation, unfortunately. We are currently using the machine name of the box, which changes every time it dies and all apps etc. have to be updated with the new machine name. It hurts.
Even if the DNS is not cached local to the machine, it will likely be cached somewhere along the DNS chain between the machine and the name servers, at least for a short while. My understanding is this situation would usually be handled with IP takeover where you just make the new machine 111.111.1.1.
Probably a question for serverfault.
You're looking for the DNS TTL (Time To Live), I guess. In my opinion, applications may cache the IP for at most the value of the TTL. I'm afraid, however, that some applications/technologies might actually cache it longer (again, in my opinion, completely wrong).
Each machine will cache the ip address.
The length of time it is cached is the TTL (Time To Live). This is a setting on your DNS server; if you set it very low, say 5 minutes, then you should be up and running fairly quickly. A bit of a hack, but it should work.
Yes, the other comments are correct in that what controls this is the DNS TTL set for the hostname database.mywebsite.com.
You'll have to decide what the maximum amount of time you're willing to wait for if you have a failure on your primary address (111.111.1.1) after you make the switch to the secondary address. Lower settings will give you a quicker recovery time, but will also increase the load and bandwidth to your DNS server because clients will have to re-query it to refresh their cache more often.
You can use nslookup using the -d option from your cmd prompt to see what your default TTL times and remaining TTL times are for the DNS server you are querying.
%> nslookup -d google.com
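The effect of the TTL on failover time can be illustrated with a small simulation. This is a sketch of a TTL-honouring cache, not of how any specific OS or driver caches lookups; the class name TtlResolverCache and the 300-second TTL are assumptions chosen to match the 5-minute example, and the clock is injected so the behaviour can be shown without waiting.

```python
import time


class TtlResolverCache:
    """Cache name lookups and re-resolve only after the TTL has expired."""

    def __init__(self, resolve, ttl_seconds, clock=time.monotonic):
        self._resolve = resolve
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # name -> (address, expires_at)

    def lookup(self, name):
        now = self._clock()
        cached = self._entries.get(name)
        if cached and now < cached[1]:
            return cached[0]  # still within TTL: may be a stale address
        address = self._resolve(name)
        self._entries[name] = (address, now + self._ttl)
        return address


if __name__ == "__main__":
    records = {"database.mywebsite.com": "111.111.1.1"}
    now = [0.0]  # fake clock so the example runs instantly
    cache = TtlResolverCache(records.__getitem__, 300, clock=lambda: now[0])

    print(cache.lookup("database.mywebsite.com"))  # resolved and cached
    records["database.mywebsite.com"] = "222.222.2.2"  # DNS record is updated
    now[0] = 299.0
    print(cache.lookup("database.mywebsite.com"))  # old address until TTL expires
    now[0] = 300.0
    print(cache.lookup("database.mywebsite.com"))  # re-resolved after expiry
```

In the worst case a client that resolved the name just before the switch keeps the old address for the full TTL, which is why the TTL effectively bounds the recovery time for well-behaved caches.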
You should assume that they are cached, for two reasons not clearly mentioned before:
1- Many "modern" versions of OS families do DNS caching.
2- Many applications do DNS caching or have poor error/failure detection on live connections and/or opening new connections. This would possibly include your database client.
Also, this is probably not well documented. I did some googling, and found this for MySQL:
http://dev.mysql.com/doc/refman/5.0/en/connector-net-programming-connecting-connection-string.html#connector-net-programming-connecting-errors
It does not clearly explain its behavior in this regard.
I had a similar issue with a web site that disables the application pool recycling features and runs for weeks on end. Sometimes, a clustered SQL Server box would restart and for some reason, my SqlConnection's were not reconnecting. I was getting the error:
A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: Named Pipes Provider, error: 40 - Could not open a connection to SQL Server)
The server was there - and running - in fact, if I just recycled the app pool, the app would work fine - but I don't like recycling app pools!
The connections that were being held in the connection pool were somehow using old connection information, and that could have been old IP addresses. This is what seems so similar to the poster's question, that it appears to be cached DNS information, because as soon as some sort of a cache is cleared, the app works fine.
This is how I solved it - by forcing all of the connections in the pool to be re-created:
Try
    ' Example: SqlDependency, but this could also be any SqlConnection.Open call
    Dim result As Boolean = SqlClient.SqlDependency.Start(ConnStr)
Catch sqlex As SqlClient.SqlException
    ' Drop every pooled connection so the next Open creates fresh ones
    SqlClient.SqlConnection.ClearAllPools()
End Try
The code sample is just the boiled-down basics - it should be tweaked for your situation!
The DNS gets cached, but for any server that resolves to the wrong ip address, you can update the HOSTS file of the server and the ip should be updated immediately. This could be a solution if you have a limited amount of servers accessing your database server.
