Dynamically modifying Nagios config - nagios

We have a few services which start up and shut down at various times. Because of this it is not possible to include them in the Nagios configuration up front.
I would like to periodically poll a DB or a file to list the dynamic services and modify the Nagios config so that they show up on the Nagios dashboard.
Is there built-in support for such a thing in Nagios? If not, I would be restarting Nagios every time the config is changed by a background process.
Thanks,
Yash

Yes. This can be done using the Nagios 'Command File' interface. For instance, I wrote an NRPE wrapper, called check_nrpe_retime, that uses the 'SCHEDULE_FORCED_SVC_CHECK' external command to reschedule the next active check based on check results and other (external to Nagios) information. I set the Nagios config up with the 24x7 time period, and all the timing is then controlled externally by check_nrpe_retime, making it WAY easier to manage dynamic processes. External commands are described here and a list of commands is here.
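External commands are just lines of text written to the Nagios command file, a named pipe (typically /usr/local/nagios/var/rw/nagios.cmd; check the command_file setting in nagios.cfg). A minimal shell sketch, with a placeholder host and service name:
now=$(date +%s)
# force the 'Dynamic Service' check on host 'apphost1' to run five minutes from now
echo "[$now] SCHEDULE_FORCED_SVC_CHECK;apphost1;Dynamic Service;$((now + 300))" > /usr/local/nagios/var/rw/nagios.cmd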

Can systemd restart a service when a specific binary on the server is updated?

On Ubuntu 18.04 I have Unattended Upgrades updating apps regularly, including a 3rd-party PPA that installs a binary /usr/bin/some_app. My systemd unit file runs that service via ExecStart=/usr/bin/some_app. I can verify in /var/log/apt/history.log that the updates run on schedule.
However, even when the binary is updated via Unattended Upgrades, systemd doesn't restart the app, I assume because some_app is started via a custom unit file unrelated to that PPA. So from the CLI, some_app --version shows v2.0.0 but systemd is still running v1.0.0.
Does systemd have a way to watch a file, or to detect that the binary referenced in ExecStart has changed on disk, and then restart the service? A fallback hack for me would be RuntimeMaxSec=, which would get the job done, but I was hoping something more elegant existed.
You could try adding a .path unit (man systemd.path) to watch for a close() after write() change to the file, which then restarts your app service. Not tested:
/etc/systemd/system/myappwatch.path
[Unit]
Description=watch for changed file
[Path]
PathChanged=/usr/bin/some_app
#Unit=myappwatch.service
/etc/systemd/system/myappwatch.service
[Service]
ExecStart=/bin/systemctl restart myapp.service
You might be able to replace the systemctl restart with something magic like Conflicts=myapp but I'll let you experiment. You also need to enable the .path unit as usual with an appropriate WantedBy=. I'm not sure what happens if the path is a symbolic link, so perhaps you should resolve the path to the real file if that is the case.
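As a hedged sketch of the enablement piece: give the .path unit an [Install] section (multi-user.target is one reasonable target choice), then reload and enable it:
# append to /etc/systemd/system/myappwatch.path
[Install]
WantedBy=multi-user.target
# then pick up the new units and start the watcher
sudo systemctl daemon-reload
sudo systemctl enable --now myappwatch.path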

pnp4nagios not logging performance data for new host

We've just updated Nagios from 3.5.x to the current version (4.0.7) and subsequently added a new host for monitoring.
The new host shows as 'Down' in Nagios, and this seems to be related to the fact that pnp4nagios is not logging performance data (the individual checks for users, HTTP, etc. are all fine).
Initially there was an error that the directory
/usr/local/pnp4nagios/var/perfdata/newhost.com
(which contains the XML setup and RRD files for the new host) was missing, so I manually created the directory, but now it complains that the files are missing.
Does anyone know the appropriate steps to overcome this issue?
Thanks,
Toby
PS I'd tag this 'pnp4nagios', but that tag doesn't exist and I can't create it
UPDATE
It's possible that pnp4nagios is a red herring/symptom. Looking more closely I realise that Nagios actually believes the host is down, even though all services are up. The host status information is '(Host check timed out after 30.01 seconds)'...does this make any more sense?
It's indeed very unlikely that pnp4nagios has anything to do with your host being down. pnp4nagios only exports the check output and performance data to feed the RRD databases and XML files (via the NPCD module or an event handler command).
The fact that Nagios reports the host check timed out after 30 seconds means that:
- you have a problem with your host check command, so double-check the syntax; or
- the check command exceeds the timeout (most likely defined in nagios.cfg, see the snippet below) because the plugin is still running.
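For reference, the 30-second figure matches the stock host check timeout; both limits live in the main configuration file and can be raised if the plugin legitimately needs longer (values shown are the defaults, the path may differ on your install):
# /usr/local/nagios/etc/nagios.cfg
host_check_timeout=30
service_check_timeout=60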
I'd recommend running the check command manually from the server's prompt. You want to do something like:
/path/to/libexec/check_command -H ipaddress -args
For example:
/usr/local/libexec/nagios/check_ping -H 192.168.1.1 -w 200,40% -c 500,80% -t 120
See if something might be hanging. Having the output would be helpful.
Once your host check returns correct output and performance data to Nagios, pnp4nagios will hopefully do the rest.
In the unlikely event it helps anyone, pnp4nagios was indeed a red herring. The problem was that ping wasn't enabled for the host being checked, and this is the test for whether a host is up or not. Hence this was failing, despite other services being reported as working.
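For anyone who hits the same thing: the host check is usually wired to ping via the stock check-host-alive command, roughly as below (thresholds are the sample-config defaults; the host template and address are placeholders):
define command{
    command_name    check-host-alive
    command_line    $USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 5
    }
define host{
    use             linux-server
    host_name       newhost.com
    address         192.0.2.10    ; placeholder address
    check_command   check-host-alive
    }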

How some services start without a restart whereas others require one

Some Windows services will start only after restarting the PC, whereas others start as soon as the software is installed.
For example, SQL Server (instance name) starts as soon as it is installed, while some other services require a restart; only after restarting the computer do they appear in services.msc. Is this done using the registry? I found a link about the registry entries for services, but I cannot work out which value controls this. Is it the registry or something else?
(Setting a service to Manual or Automatic is a different matter; my question is about a service being added for the first time during the installation of software.)
You shouldn't be directly manipulating the registry to create a service. You should be using the Service Control Manager APIs to create, and if desired start, the service. The registry values are documented, but they are still private to the API and only take effect upon reboot. Using the API takes effect immediately, and the registry changes are made for you by the API.
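If you just want to see that immediate effect from a command prompt, the sc.exe front end to the Service Control Manager creates and starts a service without a reboot (the service name and binary path here are placeholders):
sc.exe create MyService binPath= "C:\Program Files\MyApp\myservice.exe" start= auto
sc.exe start MyService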
If you are using Windows Installer you can let the installer handle all of this for you by using the Windows Installer's ServiceInstall and ServiceControl tables.
Some services have dependencies on resources that aren't available until after a reboot. One example might be a locked file that will be overwritten during startup via the PendingFileRenameOperations mechanism. Another gotcha is a service that depends on a system environment variable: after updating the registry to set the environment, you are supposed to broadcast a WM_SETTINGCHANGE message informing all processes of the settings change. Unfortunately, the Service Control Manager ignores these messages, so it takes a reboot for services to catch up.
Other examples would be on a case by case basis.
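As an aside, you can check whether such a rename is actually queued with a read-only registry query:
reg query "HKLM\SYSTEM\CurrentControlSet\Control\Session Manager" /v PendingFileRenameOperations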

Building a centralized configuration repository

I'm trying to develop an open-source application to act as a sort of centralized configuration management system for all Unix platforms, covering for example changing the root password, SSH configuration, DNS settings, /etc/hosts management, and so on.
I need your feedback on what you would recommend as the interface for all the configuration (a set of scripts running on the Unix servers as clients would read the configuration and apply it on each system, in a client-to-server mode).
Should I use LDAP to host the configurations, so that any Unix OS can talk to the LDAP server to get its configuration?
Or should I just save the configuration in a database (e.g. MySQL) and build a web interface that reads the database and returns the configuration to the client?
Or do you have any other ideas?
You might look into something like Chef or Puppet instead. Why re-invent the wheel?
Curl can download a file from a URL and write that file to standard output. For example, executing curl -sS http://someHost/file.cfg will download "file.cfg" from the specified web server. The "-sS" options instruct Curl to print error messages but not any progress diagnostics. By the way, Curl supports many protocols including HTTP, FTP and LDAP, so you have flexibility in the technology you want to use to host your centralised configuration repository (CCR).
You could use curl to retrieve a configuration file from the CCR, store the result in a local file and then parse that local file.
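A minimal sketch of that pattern (the URL, file name and target path are placeholders):
#!/bin/sh
# pull the config from the central repository; abort if the download fails
curl -sS -o /tmp/file.cfg http://someHost/file.cfg || exit 1
# apply the local copy, e.g. install it where the client expects it
install -m 0644 /tmp/file.cfg /etc/myapp/file.cfg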
Check out Blueprint from DevStructure. It sounds like something along the lines of what you're trying to do. Basically it reverse engineers servers and detects everything that has changed from the install state. Open-source too.
https://github.com/devstructure/blueprint (Blueprint # Github)
We are also about to launch ConfigChief, which is a central configuration repository that would do what you want: a central point to store configuration (with all the features like versioning, audit, ACLs, inheritance, etc.).
Once you have that, combined with change notification, you can just run a curl as Ciaran McHale says against the CCR and get your parsed configuration file back. This would eliminate the need for writing scripts to generate config files from the outside.
If you are interested, you can signup for a beta at http://woot.configchief.com
DISCLAIMER: I guess it is obvious from the first word!

Suspend weblogic datasource on command line

I was wondering if there is any way of suspending/resuming WebLogic 10 JDBC datasources via the command line. I am aware that I can do this in the admin console, but because our app has many different datasources it is a bit of a pain.
The reason behind this is that our testers are doing error-flow tests and have to simulate the DB going down. Ideally I would like to give them a .bat file for suspending all datasources and another one for resuming them all.
Any ideas?
Thanks
You can use WLST scripting to do that. From the command line, run $BEA_HOME/wlserver10.0/common/bin/wlst.sh (.cmd on Windows):
Connect to the running server. Use the managed server port as this is a server runtime property:
wls:/offline> connect('weblogic','weblogic','t3://localhost:7002')
Go to the serverRuntime tree:
wls:/mydomain/serverConfig> serverRuntime()
Navigate to the JDBCService, to your managed server name, the JDBCDataSource Runtime and finally to your datasource name:
wls:/mydomain/serverRuntime> cd('JDBCServiceRuntime/managedsrv1/JDBCDataSourceRuntimeMBeans/MyDS')
Then just suspend and resume it:
wls:/mydomain/serverRuntime/JDBCServiceRuntime/managedsrv1/JDBCDataSourceRuntimeMBeans/MyDS> cmo.suspend()
wls:/mydomain/serverRuntime/JDBCServiceRuntime/managedsrv1/JDBCDataSourceRuntimeMBeans/MyDS> cmo.resume()
Use the ls() command to see the other variables and operations.
You can record your script... might be easier than writing the batch file in some cases.
You can get help with the methods via javadocs.
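Since the goal is a single script that suspends (or resumes) every datasource, you could wrap the steps above in a small WLST script and call it from your .bat file, e.g. wlst.cmd suspend_all.py. A sketch, reusing the placeholder credentials and managed-server name from above:
# suspend_all.py - suspend every JDBC datasource on one managed server
connect('weblogic', 'weblogic', 't3://localhost:7002')
serverRuntime()
cd('JDBCServiceRuntime/managedsrv1')
for ds in cmo.getJDBCDataSourceRuntimeMBeans():
    print 'Suspending ' + ds.getName()
    ds.suspend()    # use ds.resume() in the companion resume script
disconnect()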
