I've constructed a Nagios remote-host monitoring setup (non-NRPE), and it's functional and useful, except:
I've noticed that the Nagios host logs in to various remote hosts, only to log out a second later (if not within the same second), roughly every 3 minutes; the interval doesn't appear to be deterministic. These logins don't coincide with any check periods I've defined.
From an arbitrary member of my remote host array's auth.log:
Feb 25 10:51:11 MACHINE sshd[3590]: Accepted publickey for nagios from 10.1.2.110 port 54069 ssh2
Feb 25 10:51:11 MACHINE sshd[3590]: pam_unix(sshd:session): session opened for user nagios by (uid=0)
Feb 25 10:51:11 MACHINE sshd[3599]: Received disconnect from 10.1.2.110: 11: disconnected by user
Feb 25 10:51:11 MACHINE sshd[3590]: pam_unix(sshd:session): session closed for user nagios
And then, three minutes later:
Feb 25 10:54:10 MACHINE sshd[3632]: Accepted publickey for nagios from 10.1.2.110 port 54176 ssh2
Feb 25 10:54:10 MACHINE sshd[3632]: pam_unix(sshd:session): session opened for user nagios by (uid=0)
Feb 25 10:54:10 MACHINE sshd[3642]: Received disconnect from 10.1.2.110: 11: disconnected by user
Feb 25 10:54:10 MACHINE sshd[3632]: pam_unix(sshd:session): session closed for user nagios
I can't figure it out. My service follows the generic-service template, which I've modified for a slightly longer check_interval and max_check_attempts. Why is Nagios on this serial login spree?
Have you checked your host definitions? What do you use for the host check command? If that check runs 'through' the remote host (e.g. via check_by_ssh rather than something like a 'local' check-ping), then it could be logging in as well.
Also, you can check your Nagios log file to see which checks are actually being performed. I usually run 'tail -f nagios.log | grep [IP_ADDRESS_of_target_host]' to narrow the results to a specific machine.
If nothing shows up there, as a last-ditch effort you can enable debugging and check the Nagios debug file - EVERYTHING Nagios does will go into it. As the debug file tends to roll very quickly (at least in our install - >6.8K checks), you may have to get creative with 'grep' to find what you're looking for.
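If you do enable debugging, the relevant options live in nagios.cfg; a minimal sketch (the path and level here are illustrative, adjust to your install):
# 16 = host/service check debugging; -1 logs everything
debug_level=16
debug_verbosity=1
debug_file=/usr/local/nagios/var/nagios.debug
max_debug_file_size=1000000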
If the check is returning a CRITICAL/WARNING state, it could be that your retry_interval is set to 3 minutes, which I believe is the default. Double-check your service template in nagios/etc/objects/templates
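For reference, here is roughly what those directives look like in a service template (the name and values are illustrative, not the asker's actual config):
define service{
        name                    generic-service-longer  ; illustrative template name
        max_check_attempts      4       ; SOFT retries before the state goes HARD
        check_interval          10      ; minutes between regular checks
        retry_interval          3       ; minutes between re-checks while in a SOFT non-OK state
        register                0
        }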
Most chrony server configurations I've found so far are about commenting out and setting the allow directive.
I'd like several servers in my network to stay synchronized to one of my 2 dedicated chrony NTP servers even if there's no Internet, i.e. when each chronyd is unable to synchronize to any of the X.pl.pool.ntp.org servers.
According to the man page, the local stratum directive allows chronyd to serve time 'even if not synchronized to a time source'. I'm wondering if it works in the following way: as long as chronyd is able to sync to one of the X.pl.pool.ntp.org servers, everything is fine; when chronyd is not able to sync to any external source, it still serves as a valid NTP server for the local clients thanks to the local stratum directive - am I right? Isn't it like telling the clients that chrony is always healthy regardless of the Internet connection status, so they can always sync to it?
Question 1: should I enable (un-comment) local stratum 10 to meet my requirements?
Question 2: I'm also considering using orphan mode on my NTP servers in the following way: local stratum 10 orphan
As far as I understand, it works in the following way: let's assume my NTP servers lose their connections to all the X.pl.pool.ntp.org servers (I configured them to poll the same external sources) - now, thanks to local stratum 10 orphan, the clients will always sync to the server with the lowest Reference ID first (we assume my production servers are set to poll only my local NTP servers) - am I right about it?
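To make it concrete, this is roughly what I'm considering on each of the two NTP servers (the peer hostname and LAN subnet below are placeholders):
# external sources (same pool on both servers)
pool X.pl.pool.ntp.org iburst
# the other local NTP server, so the two agree while isolated
server ntp2.example.local iburst
# keep serving the LAN (at the local stratum) when no external source is selectable
local stratum 10 orphan
allow 192.168.0.0/24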
We have been using JBoss Wildfly 8.2 for about 9 months and never had this issue until about two weeks ago (approximately Nov 9th, 2015). We use IIS 7.5 on Windows 2008 R2. We serve .war files with Java/JavaScript and also serve ColdFusion separately. We connect to MSSQL Server 2012. All of this has been the same, without the errors. We also use Spring and SOLR. We use SiteMinder.
What happens:
A user goes to our website and sees a blank screen with the IE11 tab stating a 500 error. After some minutes, the user can refresh and the site comes up fine. We have confirmed that two users at once will experience the same thing. As for suspected browser configuration issues, the user does not need to do anything; they just wait 5-15 minutes and then click refresh. Now that I think of it ... maybe the 500 error resolves after I have logged in remotely, but I am not 100% sure.
This does not happen every day, and we have 4 different sites with the exact same software and VM setup (IIS / .war file / etc is the same) and it happens to different sites randomly.
So far it has ALWAYS been the first users of the morning.
There are no server logs after 02:00 AM, yet the site starts being accessed at around 6:00 AM. JBoss shows no indication of any errors and everything looks just fine in the back end. Our last error is: 23 Nov 15 02:00:03,915 ERROR [stderr] (Timer-6) java.net.ConnectException: Connection refused: connect -- this is expected since that is DB maintenance time. After that ... nothing.
The Application Logs in Server Manager show the following at the time of the 500 error:
New virus definition file loaded
Then 6+ occurrences:
Failed to initialize the message bus
SiteMinder agent has encountered initialization errors and will not service requests
Server already running Duplicate LLAWP processes not allowed, exiting
Then I log into the server successfully and shortly after I see
The Software Protection service is starting
Then, random or not, the site seems to be up and logs say Software Protection service has started.
What the heck is going on and how do I fix this? It seems coincidental that the 500 server error goes away after I log in, but still...
I am trying to understand the mapping between the JVMID present in the JSESSIONID cookie and the ipaddr:port of the managed server. A few questions below:
Who generates the JVMID, and how does the Apache plugin know the JVMID of a given node? Does it get it back in the response from the server (maybe as part of the Dynamic Server List)?
If we send a request to an Apache with a JSESSIONID cookie containing a JVMID, and that Apache hasn't handled any requests yet, what would the behavior be?
Assuming that Apache maintains a local mapping between JVMIDs and node addresses, how does this mapping get updated (especially in the case of an Apache restart or a managed server restart)?
See more at: http://middlewaremagic.com/weblogic/?p=654#comment-9054
1) The JVMID is generated by each WebLogic server and appended to the JSESSIONID.
Apache logs the individual server hash and maps it to the respective managed server, so it is able to send subsequent requests to the same WebLogic managed server as the previous request.
Here is an Example log from http://www.bea-weblogic.com/weblogic-server-support-pattern-common-diagnostic-process-for-proxy-plug-in-problems.html
Mon May 10 13:14:40 2004 getpreferredServersFromCookie: -2032354160!-457294087
Mon May 10 13:14:40 2004 GET Primary JVMID1: -2032354160
Mon May 10 13:14:40 2004 GET Secondary JVMID2: -457294087
Mon May 10 13:14:40 2004 [Found Primary]: 172.18.137.50:38625:65535
Mon May 10 13:14:40 2004 list[0].jvmid: -2032354160
Mon May 10 13:14:40 2004 secondary str: -457294087
Mon May 10 13:14:40 2004 list[1].jvmid: -457294087
Mon May 10 13:14:40 2004 secondary str: -457294087
Mon May 10 13:14:40 2004 [Found Secondary]: 172.18.137.54:38625:65535
Mon May 10 13:14:40 2004 Found 2 servers
2) If the plugin is installed on the new Apache as well, the moment Apache starts up it will ping all available WebLogic servers to mark them as live or dead (my terms, not official). While doing that health check it gets the JVMID of each available WebLogic server. After that, when it receives the first request with a pre-existing JVMID, it can direct it correctly.
3) There are some parameters like DynamicServerList ON: if it's ON, the plugin keeps polling for healthy WebLogic servers; if it's OFF, it sends requests to the hard-coded list only. So with ON the list is pretty dynamic.
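For illustration, a minimal plugin configuration sketch in httpd.conf (hostnames, ports and paths are placeholders, not taken from the question):
<IfModule mod_weblogic.c>
    WebLogicCluster wls1.example.com:7001,wls2.example.com:7001
    DynamicServerList ON
    # Debug/WLLogFile make the JVMID lookups visible, as in the log excerpt above
    Debug ON
    WLLogFile /tmp/wl_proxy.log
</IfModule>
<Location /app>
    SetHandler weblogic-handler
</Location>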
I am trying to set up a local NTP server without an Internet connection.
Below is my ntp.conf on Server
# Server
server 127.127.1.0
fudge 127.127.1.0 stratum 5
broadcast 10.108.190.255
Below is my ntp.conf on Clients
# Clients
server 10.108.190.14
broadcastclient
but my clients are not syncing with the server. The output of ntpq -p on the clients shows that they are not taking time from the server, and the server IP is shown at stratum 16.
Could anyone please help with this issue?
The server should use its local clock as the source. A better setup for isolated networks is orphan mode, which gives you fail-over. Check out the documentation:
http://www.eecis.udel.edu/~mills/ntp/html/orphan.html
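A minimal sketch of what that looks like in the server's ntp.conf (it replaces the 127.127.1.0 local-clock lines; the stratum value is just an example):
# serve time at stratum 5 to the LAN when no upstream source is reachable
tos orphan 5
broadcast 10.108.190.255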
You need to configure the clients with the prefer keyword. ntpd tries its hardest not to honor local undisciplined clocks in order to prevent screwups.
server 10.108.190.14 prefer
For more information see: http://www.ntp.org/ntpfaq/NTP-s-config-adv.htm#AEN3658
This is all assuming that you have included the full and entire ntp.conf and did not leave out any bits about restrict lines.
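For completeness, a minimal client ntp.conf along those lines (the driftfile path and restrict lines are typical defaults, adjust as needed):
driftfile /var/lib/ntp/ntp.drift
# allow full access from localhost only; remote hosts may query but not modify
restrict default kod nomodify notrap nopeer noquery
restrict 127.0.0.1
# the local NTP server, preferred even though it is undisciplined upstream
server 10.108.190.14 prefer iburst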
How about using chrony?
Steps
Install chrony on both of your devices:
sudo apt install chrony
Let's assume the server IP address is 192.168.1.87; the client configuration (/etc/chrony/chrony.conf) is then as follows:
server 192.168.1.87 iburst
keyfile /etc/chrony/chrony.keys
driftfile /var/lib/chrony/chrony.drift
log tracking measurements statistics
logdir /var/log/chrony
Server configuration (/etc/chrony/chrony.conf), assuming your client IP is 192.168.1.14:
keyfile /etc/chrony/chrony.keys
driftfile /var/lib/chrony/chrony.drift
log tracking measurements statistics
logdir /var/log/chrony
local stratum 8
manual
allow 192.168.1.0/24
allow 192.168.1.14
Restart chrony on both computers:
sudo systemctl stop chrony
sudo systemctl start chrony
5.1 Checking on the client side:
sudo systemctl status chrony
Output:
июн 24 13:26:42 op-desktop systemd[1]: Starting chrony, an NTP client/server...
июн 24 13:26:42 op-desktop chronyd[9420]: chronyd version 3.2 starting (+CMDMON +NTP +REFCLOCK +RTC +PRIVDROP +SCFILTER +SECHASH +SIGND +ASYNCDNS +IPV6 -DEBUG)
июн 24 13:26:42 op-desktop chronyd[9420]: Frequency -6.446 +/- 1.678 ppm read from /var/lib/chrony/chrony.drift
июн 24 13:26:43 op-desktop systemd[1]: Started chrony, an NTP client/server.
июн 24 13:26:49 op-desktop chronyd[9420]: Selected source 192.168.1.87
5.2 chronyc tracking output:
Reference ID : C0A80157 (192.168.1.87)
Stratum : 9
Ref time (UTC) : Thu Jun 24 10:50:34 2021
System time : 0.000002018 seconds slow of NTP time
Last offset : -0.000000115 seconds
RMS offset : 0.017948076 seconds
Frequency : 5.491 ppm slow
Residual freq : +0.000 ppm
Skew : 0.726 ppm
Root delay : 0.002031475 seconds
Root dispersion : 0.000664742 seconds
Update interval : 65.2 seconds
Leap status : Normal
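You can also cross-check from the other side, for example (chronyc clients needs root on the server):
sudo chronyc clients      # on the server: lists the NTP clients that have polled it
chronyc sources -v        # on the client: 192.168.1.87 should show a '*' once selected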
I have a Jenkins master running on Windows 2008 SP2 set up with Active Directory authentication. The authentication is working fine and normally there is no issue with Login.
Occasionally however Jenkins will take 4 to 5 minutes to log a user in. This seems to correlate with the amount of time a user has been inactive (i.e. A user who has not logged in for 2 or 3 weeks will experience extremely slow response when trying to log in).
Has anyone else experienced this behavior? I'm really not sure if I should start looking at active directory or Jenkins to troubleshoot this.
The plugin maintainers actively suggest enabling logging (setting "hudson.plugins.active_directory" to ALL) and filing a bug if a problem happens.
Slow Jenkins login with Active Directory seems to be almost always related to a DNS issue. Check the DNS service (SRV), NS, DomainDnsZones, _ldap._tcp.domaine.com, _gc._tcp.domaine.com and ForestDnsZones responses from your AD/DNS server. If you can't reach all of the IPs/ports listed, you will face random slow logins (30 or 60 seconds, depending on the query) when the Jenkins AD plugin tries to query those servers/services.
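A quick way to verify those records from the Jenkins box (substitute your real AD domain for domaine.com):
dig _ldap._tcp.domaine.com SRV
dig _gc._tcp.domaine.com SRV
nslookup -type=SRV _ldap._tcp.domaine.com     # on Windows hosts without dig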
You can easily capture the DNS queries with tcpdump or Wireshark on your Jenkins server or on the DNS server:
tcpdump -i interface port 53
Specifying sites can give you a workable workaround, as that limits the slow logins to only the cases where ForestDnsZones returns nonexistent/unreachable IPs/ports.