Nagios notification when no message is received within 48 hours - nagios

In Nagios it is easy to check that a LogMessage happened in the last 48 hours and sound an alarm. What I would like, though, is to instead configure Nagios to sound an alarm when a specific message did not occur within 48 hours.
Can anyone point me in the right direction?
I am using the "Check WMI Plus" plugin (no agent required) in order to check the event log on a windows box.

Without knowing what your exact "specific message" is, it's hard to give a specific answer, but we can do this:
I'm going to raise a CRITICAL event when I haven't seen a "processing of windows Group Policy failed" error or warning event in the last 48 hours.
You use the -w and -c options to define criteria for WARNING and CRITICAL events in check_wmi_plus.
From check_wmi_plus.pl --help | less -i we get the help and we can find the checkeventlog options.
There are two tricks:
checkeventlog only has one field, _ItemCount, so you don't need to specify it.
You want to specify a range of values that includes only 0, so use @0:0 (the leading @ means "alert when the value is inside the range").
First, define a specific section in the events.ini file. Mine is: /opt/nagios/bin/plugins/check_wmi_plus.d/events.ini
I added this:
[eventSpecial]
im=Group Policy failed
I added that just below the [eventdefault] section.
Basically, the im= means 'include message' - if it's not specified everything is included, so by specifying it, I've said "only include messages that match this regular expression."
Then you need the command for checkeventlog
I use:
/opt/nagios/bin/plugins/check_wmi_plus.pl -H HOST -u USER -p PASS -m checkeventlog -a % -o 2 -3 48 -4 eventSpecial -c @0:0
So for the optional arguments (again with the --help option):
-a % == search all event logs
-o 2 == Warning and error severity only
-3 48 == last 48 hours
-4 eventSpecial == refer to the section in events.ini that I just created
-c @0:0 == raise a CRITICAL if there are exactly 0 occurrences
With this command, if there ARE messages during the period, I get:
OK - 3 event(s) of Severity Level: "Error,Warning", were recorded in
the last 48 hours from the % Event Log. (List is on next line. Fields
shown are -
Logfile:TimeGenerated:SeverityLevel:EventId:Type:SourceName:Message)|'Event
Count'=3;0;
System:20130604195600.378642-000|Error:1129:0:Microsoft-Windows-GroupPolicy:The processing of Group Policy failed because of lack of network
connectivity to a domain controller. This may be a transient
condition. A success message would be generated once the machine gets
connected to the domain controller and Group Policy has succesfully
processed. If you do not see a success message for several hours, then
contact your administrator.
System:20130604055521.084809-000|Error:1129:0:Microsoft-Windows-GroupPolicy:The processing of Group Policy failed because of lack of network
connectivity to a domain controller. This may be a transient
condition. A success message would be generated once the machine gets
connected to the domain controller and Group Policy has succesfully
processed. If you do not see a success message for several hours, then
contact your administrator.
System:20130603220259.894040-000|Error:1055:0:Microsoft-Windows-GroupPolicy:The processing of Group Policy failed. Windows could not resolve the
computer name. This could be caused by one of more of the following:
a) Name Resolution failure on the current domain controller. b)
Active Directory Replication Latency (an account created on another
domain controller has not replicated to the current domain
controller).
Which does not include a critical event.
If there are none, I get this:
CRITICAL - [Triggered by _ItemCount in the range 0:0] - 0 event(s) of
Severity Level: "Error,Warning", were recorded in the last 4 hours
from the % Event Log.|'Event Count'=0;0;
Which does include the critical event, because there were no entries in the log file to match my criteria.
And you can just define a standard Nagios command using the appropriate $USER8$ macros to include it in your configuration.
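For reference, the range 0:0 reported in the CRITICAL output follows the Nagios plugin threshold convention, where a leading @ means "alert when the value falls inside the range" and a plain range alerts when the value falls outside it. The Python sketch below only illustrates that convention; it is not check_wmi_plus's own code, and the function names parse_range and triggers are made up for this example.

```python
import math


def parse_range(spec):
    """Parse a Nagios-style range such as '10', '10:', '~:10', '10:20' or '@0:0'.

    Returns (inside, low, high); inside is True when the spec starts with '@'.
    """
    inside = spec.startswith("@")
    if inside:
        spec = spec[1:]
    if ":" in spec:
        low_s, high_s = spec.split(":", 1)
    else:
        # A bare number N means the range 0..N
        low_s, high_s = "0", spec
    low = -math.inf if low_s == "~" else float(low_s or 0)
    high = math.inf if high_s == "" else float(high_s)
    return inside, low, high


def triggers(spec, value):
    """Return True when value should raise an alert for the given range spec."""
    inside, low, high = parse_range(spec)
    in_range = low <= value <= high
    # '@' inverts the default: alert inside the range instead of outside it
    return in_range if inside else not in_range
```

With this reading, triggers("@0:0", 0) is True (no matching events, so raise the alert), while triggers("@0:0", 3) is False.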

You could try creating a simple DOS script, kicked off every hour, that monitors Nagios and restarts it when it sees more than one nagios.exe process. Here is the DOS script that kills the nagios.exe processes and restarts the service.
-------- CheckNagios.bat --------
@echo off
set mypgm=nagios.exe
REM GET date/time stamp
For /f "tokens=2-4 delims=/ " %%a in ('date /t') do (set mydate=%%c-%%a-%%b)
For /f "tokens=1-2 delims=/:" %%a in ('time /t') do (set mytime=%%a%%b)
:checkNagios
rem get number of nagios processes
for /f %%i in ('c:\windows\system32\tasklist.exe ^| find /i /c "%mypgm%"') do set /a numProc=%%i
echo Last Check: %mydate%_%mytime%
ECHO # of processes = %numProc%
if %numProc% GTR 1 (goto kill) else (goto end)
:kill
c:\windows\system32\taskkill.exe /f /IM %mypgm%
REM restart nagios
net start Nagwin_Nagios
REM restart other nagios processes
rem for /f %%x in ('net start ^| findstr /i "nagwin_"') do net stop %%x
:end
echo Exiting program.
echo =================
rem SCHEDULE TASK TO RUN EVERY HOUR and pipe to a logfile
rem SCHTASKS /create /TN "Check Nagios" /TR "c:\icw\bin\checkNagios.bat >> c:\checknagios.log 2>&1" /SC HOURLY /ST 16:00 /MO 1 /RU DOMAIN\USERNAME /RP PASSWORD
REM store last check that will be used by emailNagios.bat using blat.exe
set LAST_NAGIOS_CHECK=%mydate%_%mytime%

Related

Script to Kill All RDP Sessions

I am trying to create a simple script to kill ALL remote desktop sessions, active or disconnected, without rebooting the server. My server OS is Windows Server 2012R2 with Remote Desktop Services enabled and licensed.
I found a simple batch file script to do this task.
When I run this script locally on my terminal server, the batch file reports an error: Session Disc not found
Only the console user is logged off. Can anyone tell me what is wrong with this script?
query session >session.txt
for /f "skip=1 tokens=3," %%i in (session.txt) DO logoff %%i
del session.txt
My Session text file looks like this:
 SESSIONNAME       USERNAME                 ID  STATE   TYPE        DEVICE
 services                                    0  Disc
>console           Administrator            13  Active
                   wcunningham              18  Disc
                   kstarkey                 25  Disc
 rdp-tcp#11        cyannone                 52  Active
 rdp-tcp                                 65536  Listen
A good workaround I found is to add an extra line to handle the disconnected sessions. Since disconnected sessions don't list a session name, the ID is at token position 2, not position 3 as it is for active sessions. Adding a line that specifies tokens=2 did the trick:
query session >session.txt
for /f "skip=1 tokens=2," %%i in (session.txt) DO logoff.exe %%i
for /f "skip=1 tokens=3," %%i in (session.txt) DO logoff.exe %%i
del session.txt
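The column-shift problem described above can also be avoided by not relying on a fixed token position at all. The Python sketch below is only an illustration under a few assumptions: it treats the first all-digit token that is immediately followed by a known state as the session ID, it skips ID 0 (the services session) and listener rows, and the function name session_ids is made up for this example. A user name that is purely numeric would confuse it.

```python
def session_ids(query_output, states=("Active", "Disc")):
    """Extract session IDs from 'query session' text output.

    The ID is taken to be the first all-digit token directly followed by one
    of the given states, so a missing SESSIONNAME column does not matter.
    """
    ids = []
    for line in query_output.splitlines()[1:]:  # skip the header row
        tokens = line.lstrip(" >").split()      # drop the '>' current-session marker
        for i in range(len(tokens) - 1):
            if tokens[i].isdigit() and tokens[i + 1] in states:
                if tokens[i] != "0":            # assumption: leave the services session alone
                    ids.append(int(tokens[i]))
                break
    return ids


sample = """SESSIONNAME USERNAME ID STATE TYPE DEVICE
services 0 Disc
>console Administrator 13 Active
wcunningham 18 Disc
kstarkey 25 Disc
rdp-tcp#11 cyannone 52 Active
rdp-tcp 65536 Listen"""

print(session_ids(sample))
```

Note that the result still includes the console session (13 in the sample), so logging off every returned ID would also end the session that runs the script.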
Skip the first 3 lines and the active user:
query session >session.txt
for /f "skip=3 eol=> tokens=2," %%i in (session.txt) DO echo %%i
for /f "skip=3 eol=> tokens=3," %%i in (session.txt) DO echo %%i
del session.txt

How to debug task scheduler on win 10?

I'm trying to execute the following bat every 15 minutes on my PC:
@ECHO OFF
SETLOCAL enabledelayedexpansion
SET host=http://dnsad.de/rest/
SET slideshowurl=http://dnsad.de/display/currentSlideshow/mac/
SET slideshowfolder=C:\Slideshow
SET ieprocess="iexplore.exe"
SET ignore_result=INFORMATION:
FOR /f "delims=" %%a IN ('getmac /v ^|find /i "Realtek"') DO (
FOR %%b IN (%%a) DO (
SET element=%%b
IF "!element:~2,1!!element:~5,1!!element:~8,1!"=="---" set mac=%%b
)
)
SET formattedmac=%mac:-=:%
SET macpath=%mac:-=_%
FOR /f "delims=" %%a IN ('curl -X GET %host%%formattedmac%') DO (
FOR %%b IN (%%a) DO (
SET update=%%b
)
)
IF "%update%"=="[true]" (
CD %ProgramFiles%\WinHTTrack\
httrack %slideshowurl%%formattedmac% -q -O "C:\Slideshow" -s0 -B -a
curl -X PUT %host%%formattedmac%
START iexplore -k %slideshowfolder%\dnsad.de\display\currentSlideshow\mac\%macpath%.html
)
EXIT
The script works as it should when executed manually. I get the device's MAC address and the expected server responses from curl, WinHTTrack backs up the data correctly, curl updates the server fields, and then Internet Explorer opens with the updated local HTML.
When scheduled as a task on Win 7 it works as it should as well. When the bat is run from Task Scheduler on Win 10, the last thing it does is the curl PUT, but Internet Explorer is never opened. The task is marked as successful.
I am logged in as admin on Win 7 and Win 10. I tested pretty much every setting within Task Scheduler. Nothing seems to work. Why doesn't Internet Explorer start?
[EDIT]
It seems that the option "Run whether user is logged on or not" causes the problem. But here is the catch:
I'm displaying slideshows in Internet Explorer kiosk mode and need to get updated data from my server to display new slideshows regularly. The mentioned option prevents the console from popping up when executing the bat file. If I "only execute if user is logged on", I do get the updated data to display in Internet Explorer, but every 15 minutes a console window pops up for a second.
I tried executing with cmd /c "update" /min "PATH TO BAT", which doesn't solve the problem.
As mentioned in my [Edit], the problem was the option "Run whether user is logged on or not", which I used to prevent the command-line window from popping up. When I use the option "only execute if user is logged on", I need to wrap the bat in a VBScript that doesn't open a console window. So by scheduling a .vbs with the following lines:
Set objShell = WScript.CreateObject("WScript.Shell")
objShell.Run "cmd /c PATH\TO\BAT", 0, True
I can execute the bat without a flashing console window.

Create a batch file to control services from multiple servers

I would like to create a .bat file that gives me the option to choose between different servers and perform actions such as stopping/starting services. So far I am getting "System error 67 has occurred. The network name cannot be found." Is there a better way to do this? Or can I choose the server name from a pop-up option?
@ECHO OFF
CLS
ECHO 1.server1
ECHO 2.server2
ECHO 3.server3
ECHO 4.server4
ECHO.
set /p server_name=Enter server name:
IF %server_name%== 1 GOTO app1
IF %server_name%== 2 GOTO app2
IF %server_name%== 3 GOTO app3
IF %server_name%== 4 GOTO app4
:app1
call:restart "server1"
GOTO End
:app2
call:restart "server2"
GOTO End
:app3
call:restart "server3"
GOTO End
:app4
call:restart "server4"
GOTO End
:restart
net use \\%~1/User:%username%
SC \\%~1 Stop service
timeout 10
SC \\%~1 Start service
GOTO End
Instead of debugging your file, I'd like to present you a different solution:
On the servers where you want to restart your services create this bat file:
@ECHO OFF
:STOP
SC STOP <your_service>
ping 127.0.0.1 -n 6 > nul
SC QUERY <your_service> | find /I "STATE" | find "STOPPED"
if errorlevel 1 goto :STOP
SC START <your_service>
Now create a task on each of your servers with the Task Scheduler. Don't set any triggers, but make the task execute your newly created bat file. Select any options you need (such as a specific user with a specific password, whether to run the script when no one is logged in, whether to run it with admin privileges, etc.) and give it a name (your_task).
Finally, modify your script like this:
@ECHO OFF
CLS
ECHO 1.server1
ECHO 2.server2
ECHO 3.server3
ECHO 4.server4
ECHO.
:SELECT
set /p server_name=Enter server name:
IF %server_name%==1 GOTO app1
IF %server_name%==2 GOTO app2
IF %server_name%==3 GOTO app3
IF %server_name%==4 GOTO app4
GOTO SELECT
:app1
SCHTASKS /RUN /S "server1" /TN "your_task"
GOTO End
:app2
SCHTASKS /RUN /S "server2" /TN "your_task"
GOTO End
:app3
SCHTASKS /RUN /S "server3" /TN "your_task"
GOTO End
:app4
SCHTASKS /RUN /S "server4" /TN "your_task"
GOTO End
:End
You can also pass a user and a password to the SCHTASKS command if needed.
It should be clear how it works. I'd recommend taking a closer look at the first bat file. Using this:
SC \\%~1 Stop service
timeout 10
SC \\%~1 Start service
might cause trouble. Sometimes a service won't stop within 10 seconds, for various reasons. Trying to start the service while it hasn't stopped yet will result in an error, and the service will remain stopped.
Instead, our bat script within the scheduled task will try to stop the service, then wait for 5 seconds (yes, -n 6 means 5 seconds ^^) and then it will check whether the service has the state "STOPPED". If it is, it will start it, otherwise, it will re-try to stop it, wait 5 more seconds and so on.
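The stop, wait, verify, start logic described above can be sketched in a language-neutral way. The Python below is only an illustration of that retry loop, not a replacement for the bat file: the function name restart_service and its callable parameters (stop, is_stopped, start) are hypothetical stand-ins for SC STOP, the SC QUERY check and SC START, and max_attempts is an extra safety limit that the bat file does not have.

```python
import time


def restart_service(stop, is_stopped, start, wait_seconds=5,
                    sleep=time.sleep, max_attempts=None):
    """Keep requesting a stop until the service reports STOPPED, then start it.

    Returns the number of stop attempts that were needed.
    """
    attempts = 0
    while True:
        stop()                 # like: SC STOP <your_service>
        sleep(wait_seconds)    # like: ping 127.0.0.1 -n 6 > nul
        attempts += 1
        if is_stopped():       # like: SC QUERY ... | find "STOPPED"
            break
        if max_attempts is not None and attempts >= max_attempts:
            raise TimeoutError("service did not stop in time")
    start()                    # like: SC START <your_service>
    return attempts
```

The key point is the same as in the bat file: the start command is only issued after the stopped state has actually been observed, rather than after a fixed delay.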

Get service date and time from Event Viewer using Batch script

I have come up with a script that will restart a specific service and now I would like to know if there is a way I can get a service start time from event viewer using batch files.
Appreciate if anyone could give me the answer. Thanks!
Use wevtutil.
The service start/stop events are logged in the system event log, there are several ways to open it (use google). Clicking the events we can see a "service entered the running state" event with an ID 7036, let's use it to find the last start time of Application Experience service.
Only one event is needed /c:1 and since it's the last in the log let's reverse the direction with /rd:true:
wevtutil qe system /rd:true /c:1 /q:"Event[EventData[Data[@Name='param1']='Application Experience'] and System[EventID=7036]]"
The output is this xml blob:
<Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'><System><Provider Name='Service Control Manager' Guid='{555908d1-a6d7-4695-8e1e-26931d2012f4}' EventSourceName='Service Control Manager'/><EventID Qualifiers='16384'>7036</EventID><Version>0</Version><Level>4</Level><Task>0</Task><Opcode>0</Opcode><Keywords>0x8080000000000000</Keywords><TimeCreated SystemTime='2015-10-12T10:43:13.841899000Z'/><EventRecordID>4287264</EventRecordID><Correlation/><Execution ProcessID='800' ThreadID='1804'/><Channel>System</Channel><Computer>zOo</Computer><Security/></System><EventData><Data Name='param1'>Application Experience</Data><Data Name='param2'>running</Data><Binary>410065004C006F006F006B00750070005300760063002F0034000000</Binary></EventData></Event>
Let's extract the date and time.
First remove everything from the beginning up to SystemTime= with string replacement set "xml=!xml:*SystemTime=!":
='2015-10-12T10:43:13.841899000Z'/>.....................................(the rest of the string)
Then split at ' and T and . into tokens: =, 2015-10-12, 10:43:13, 841899000Z, />.... and grab the 2nd and the 3rd:
@echo off
setlocal enableDelayedExpansion
for /f "tokens=*" %%a in ('
wevtutil qe system /rd:true /c:1 ^
/q:"Event[EventData[Data[@Name='param1']='Application Experience'] and System[EventID=7036]]"
') do (
set "xml=%%a" & set "xml=!xml:*SystemTime=!"
for /f "delims='T. tokens=2,3" %%b in ("!xml!") do (
echo Started at date: %%b time: %%c
)
)
endlocal
pause
The date uses YYYY-MM-DD format, the time is 24-hour:
Started at date: 2015-10-12 time: 10:43:13
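As a cross-check of the string splitting above, the same date and time can be pulled out of the event XML with a real XML parser. The Python sketch below is only an illustration and assumes the XML text has already been captured from wevtutil; the function name event_date_time is made up, and the sample string is a shortened version of the blob shown earlier.

```python
import xml.etree.ElementTree as ET

# Namespace used by Windows event XML, as seen in the wevtutil output
EVENT_NS = "{http://schemas.microsoft.com/win/2004/08/events/event}"


def event_date_time(event_xml):
    """Return (date, time) strings from the TimeCreated element of one event."""
    root = ET.fromstring(event_xml)
    created = root.find(f"{EVENT_NS}System/{EVENT_NS}TimeCreated")
    stamp = created.get("SystemTime")        # e.g. 2015-10-12T10:43:13.841899000Z
    day, clock = stamp.split("T", 1)
    # Drop fractional seconds and the trailing Z (UTC marker)
    return day, clock.split(".", 1)[0].rstrip("Z")


sample = (
    "<Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'>"
    "<System><EventID Qualifiers='16384'>7036</EventID>"
    "<TimeCreated SystemTime='2015-10-12T10:43:13.841899000Z'/></System>"
    "<EventData><Data Name='param1'>Application Experience</Data></EventData>"
    "</Event>"
)

print(event_date_time(sample))
```

The trailing Z in SystemTime indicates that the timestamp is in UTC, so it may differ from the local time shown in Event Viewer.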

Finding date on a wildcard file name and comparing to current date

We use a piece of software that runs and on completion it creates a file with a completion date and time in the title.
In a set folder there is a file for each day, the main name of the file is the same and it's just the date and time in the file name that is changed.
Once this file is created we know the process has finished and we then want to run a series of commands.
We want to run a batch file to search for the file in the specific folder with the current date in the file name and then find the date of that file, then compare that to a current date. If the test is positive we know the process has finished for that day and then we can proceeds further. If the date doesn't match then it waits 4 minutes and tries again, until it finds the file that was created today.
The problem I have is that the file name is created with a date in a different sequence than the files creation date, so I can't compare. I don't know how to change this so the command can find the file and compare the dates.
The commands for stopping/starting services from :same down are working OK, as tested in a separate batch file. It's the file-finding and date comparison part I can't figure out.
As you can probably see I'm a bit rusty on this sort of command sequence, and maybe I'm going about this the wrong way, so some assistance appreciated.
For testing purposes I've put in the Echo's and pauses so I can see where things are up to while I test. The file name has a date while I try and test this, but ultimately this will need to be some sort of wildcard that inserts the current date, to search/compare by.
This is what I have put together so far:
REM .............Start Script.................
@echo on
:LOOP
set currentDate=%date%
SET filename="x:\DATA\File Upload Summary Report 2014-09-25*.*"
pause
FOR %%f IN (%filename%) DO SET filedatetime=%%~tf
Pause
REM next command displays date of screen so I can compare
ECHO %filedatetime:~0,-6% >> %destination%
Pause
IF %filedatetime:~0, 10% == %currentDate% goto same
goto notsame
:same
REM service stop & start commands
Echo Same
pause
net stop nxServerV3
REM wait for 5 seconds by using ping, then next line returns y
@ping -n 4 -w 1000 0.0.0.1 > NUL
CHOICE/cyn t:Y,5
REM start nexus server
@ping -n 4 -w 1000 0.0.0.1 > NUL
net start nxServerV3
REM wait for 5 seconds by using ping
@ping -n 4 -w 1000 0.0.0.1 > NUL
net start ConnectorService
@ping -n 4 -w 1000 0.0.0.1 > NUL
goto end
:notsame
REM Loop script after 4 minutes
Echo Not Same
echo Press any key to exit...
if ERRORLEVEL 1 goto end
timeout /t 240
goto :LOOP
:end
endlocal
Thanks for the help.
Change of approach
OK, I've modified my thinking. As the file is created with the date and time in the filename, rather than comparing the dates, instead I now just add the date into the filename, then search for that filename. The only issue being the filename needs a wild card as there is some extra details in the filename, but don't want to match that part of the search. I've just forgotten the sequence for this, as it doesn't appear to be using the wild card when looking for the file, it seems to be taking the wild card as part of the filename. Other than that the new approach seems to work OK.
REM .............Start Script.................
Echo on
for /f "tokens=2 delims==" %%a in ('wmic OS Get localdatetime /value') do set "dt=%%a"
set "YY=%dt:~2,2%" & set "YYYY=%dt:~0,4%" & set "MM=%dt:~4,2%" & set "DD=%dt:~6,2%"
set currentDate=%date%
set fulldate=%YYYY%-%MM%-%DD%
pause
SET filename="x:\DATA\File Upload Summary Report %fulldate%*.pdf"
pause
:LOOP
if exist filename goto restart
goto notexist
First, you can use timeout 1 > nul to pause your code for a second. Also, for comparing files, take a look at the forfiles /? command. It's very useful, and it supports date/time comparison.
Since you're using Nexus (but should be considering v4) then it's safe to provide you with a formula for manipulating substrings. Since you don't say what the formats of the two strings to be compared are, you'll have to do the hard yakka yourself.
Simply, to extract a substring from var, use
set newvar=%var:~m,n%
where m is the start position (counts from 0) if positive, count-from-end if negative.
n is the count-of-characters if positive, count-from-end if negative.
[,n] is optional.
Then bang the substrings together in any desired order as though they were ordinary environment variables.
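To make the substring idea concrete, here is a small Python sketch of the same two steps: reordering a date by slicing, and matching a file name that contains today's date followed by extra characters. It is only an illustration. The DD/MM/YYYY input format and the "1830" suffix in the sample names are assumptions, since the question does not state the exact formats, and the function names reorder and report_pattern are made up.

```python
from datetime import date
import fnmatch


def reorder(ddmmyyyy):
    """Turn 'DD/MM/YYYY' into 'YYYY-MM-DD' by slicing.

    Same idea as %var:~6,4%-%var:~3,2%-%var:~0,2% in a batch file.
    """
    return f"{ddmmyyyy[6:10]}-{ddmmyyyy[3:5]}-{ddmmyyyy[0:2]}"


def report_pattern(day):
    """Build the wildcard pattern for the report created on the given day."""
    return f"File Upload Summary Report {day:%Y-%m-%d}*.pdf"


# Hypothetical directory listing; the time suffix is an assumed format
names = [
    "File Upload Summary Report 2014-09-25 1830.pdf",
    "File Upload Summary Report 2014-09-24 1829.pdf",
]

matches = fnmatch.filter(names, report_pattern(date(2014, 9, 25)))
print(matches)
```

In the batch version, the equivalent check needs the variable expanded, for example if exist %filename%, so that the wildcard pattern rather than the literal word filename is tested.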

Resources