Check database connectivity - database

I'm writing a Unix script to check database connectivity on a server. When the connection fails with an error, or when connecting is delayed, I want the output to be "Not connected". When it connects, the output should be "Connected". It is an Oracle database.
When connecting is delayed, my code does not work and the script hangs. What changes should I make so that it handles both conditions (an error connecting to the database, and a delay in connecting to the database)?
if sqlplus $DB_USER/$DB_PASS@$DB_INSTANCE < /dev/null | grep 'Connected to'; then
echo "Connectivity is OK"
else
echo "No Connectivity"
fi

The first thing to add to your code is a timeout. Checking database connectivity is not easy: problems can arise in any of the layers the connection passes through. A timeout lets you break out of a hanging session and carry on by reporting that the connection failed.
A quick search turned up a few good examples, such as:
Timeout a command in bash without unnecessary delay
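For systems where no timeout command is installed, the idea can be sketched in plain bash with a background watchdog. This is a minimal sketch, not the code from the linked question: run_with_timeout is a hypothetical helper name, and returning 124 on timeout is a convention borrowed from GNU timeout.

```shell
#!/bin/bash
# Hypothetical helper (not from the original answers):
#   run_with_timeout SECONDS COMMAND [ARGS...]
# Runs COMMAND in the background and terminates it if it is still running
# after SECONDS. Returns COMMAND's exit status, or 124 if it was terminated,
# mirroring the convention of GNU timeout.
run_with_timeout() {
    local limit=$1
    shift

    "$@" &
    local cmd_pid=$!

    # Watchdog: sleeps, then terminates the command. Its output is discarded
    # so that a leftover sleep cannot hold a $(...) capture open.
    ( sleep "$limit"; kill -TERM "$cmd_pid" 2>/dev/null ) >/dev/null 2>&1 &
    local watchdog_pid=$!

    wait "$cmd_pid"
    local rc=$?

    # The command has finished or was terminated; stop the watchdog.
    kill -TERM "$watchdog_pid" 2>/dev/null
    wait "$watchdog_pid" 2>/dev/null

    # 143 = 128 + SIGTERM (15). Caveat: a command that exits with 143 on its
    # own is indistinguishable from a timeout in this sketch.
    if [ "$rc" -eq 143 ]; then
        rc=124
    fi
    return "$rc"
}
```

With this in place, the connectivity check could be wrapped as run_with_timeout 30 sqlplus -L "$DB_USER/$DB_PASS@$DB_INSTANCE", where the 30-second limit is an arbitrary example value. Because the command runs in the background, its standard input is not the terminal, which suits a non-interactive check.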

If you are using Linux, you can use the timeout command to do what you want. So the following will have three outcomes, setting the variable RC as follows:
"Connected to" successful: RC set to 0
"Connected to" not found: RC set to 1
sqlplus command timed out after 5 minutes: RC set to 124
WAIT_MINUTES=5
SP_OUTPUT=$(timeout "${WAIT_MINUTES}m" sqlplus "$DB_USER/$DB_PASS@$DB_INSTANCE" < /dev/null)
CMD_RC=$?
if [ $CMD_RC -eq 124 ]
then
    # timeout exits with 124 when it had to stop the command
    ERR_MSG="Connection attempt timed out after $WAIT_MINUTES minutes"
    RC=$CMD_RC
else
    echo "$SP_OUTPUT" | grep -q 'Connected to'
    GREP_RC=$?
    if [ $GREP_RC -eq 0 ]
    then
        echo "Connectivity is OK"
        RC=0
    else
        ERR_MSG="Connectivity or user information is bad"
        RC=1
    fi
fi
if [ $RC -gt 0 ]
then
    # Add code to send email with subject of $ERR_MSG and body of $SP_OUTPUT
    echo "Need to email someone about $ERR_MSG"
fi
exit $RC
I'm sure there are several improvements to this, but this will get you started.
Briefly, we use the timeout command to wait the specified time for the sqlplus command to run. I separated out the grep as a separate command to allow the use of timeout and to allow more flexibility in checking additional text messages.
There are several examples on StackOverflow on sending email from a Linux script.
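As an illustration of checking additional text messages in the captured output, here is a minimal sketch. diagnose_sqlplus_output is a hypothetical helper name and the diagnosis wording is my own; the ORA- numbers are standard Oracle error codes, but the list is far from complete, and the "Connected to" text assumes an English locale.

```shell
#!/bin/bash
# Hypothetical helper: maps captured sqlplus output to a short diagnosis.
# The ORA- codes are standard Oracle error numbers; the wording of each
# diagnosis is illustrative only.
diagnose_sqlplus_output() {
    local output=$1
    case "$output" in
        *"Connected to"*) echo "connected" ;;
        *ORA-01017*)      echo "invalid username or password" ;;
        *ORA-12154*)      echo "connect identifier could not be resolved" ;;
        *ORA-12541*)      echo "no listener at the target host and port" ;;
        *ORA-12170*)      echo "connect timeout" ;;
        *)                echo "unknown failure" ;;
    esac
}
```

It could be called as ERR_MSG=$(diagnose_sqlplus_output "$SP_OUTPUT") once the sqlplus output has been captured, which gives the email a more specific subject than a single catch-all message.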

Related

Nomad task getting killed

I have two tasks in task group
1) a db task to bring up a db and
2) the app that needs the db to be up.
Both start in parallel and the db task takes a little time, but by then the app has recognized that the db is not up and kills the db task. Any solutions? Please advise.
It's fairly common to have an entrypoint script that checks whether the db is healthy. Here's a script I've used before:
#!/bin/sh
set -e
postgres_ready() {
    if test -z "${NO_DB}"
    then
        # Listing the databases only succeeds once the server accepts connections
        PGPASSWORD="${RDS_PASSWORD}" psql -h "${RDS_HOSTNAME}" -U "${RDS_USERNAME}" -d "${RDS_DB_NAME}" -c '\l'
        return $?
    else
        echo "NO_DB Postgres will pretend to be up"
        return 0
    fi
}
until postgres_ready
do
    >&2 echo "Postgres is unavailable - sleeping"
    sleep 1
done
>&2 echo "Postgres is up - continuing..."
# "$@" keeps each argument separate; one quoted string would be taken as a single command name
exec "$@"
You could save it as entrypoint.sh and run it with your application start command as its arguments, e.g.: entrypoint.sh python main.py
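The loop above waits forever if the database never comes up. A bounded variant can be sketched as follows; retry_until_ready is a hypothetical helper name, and the attempt limit and delay are values you would pick yourself.

```shell
#!/bin/sh
# Hypothetical helper:
#   retry_until_ready MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds, at most MAX_ATTEMPTS times, sleeping
# DELAY_SECONDS between attempts. Returns 0 on success, 1 after giving up.
retry_until_ready() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    until "$@"
    do
        if [ "$attempt" -ge "$max_attempts" ]
        then
            >&2 echo "Giving up after $attempt attempts"
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
    return 0
}
```

In the entrypoint script this might replace the until loop with retry_until_ready 60 1 postgres_ready || exit 1, so that the task fails visibly instead of sleeping indefinitely.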

Catch invalid password on Sudo

Is there a way to trap/catch an invalid password when you use sudo? Basically I want to return a specific exit code if the sudo password is invalid. I don't want to avoid sudo or get around it, I just want to close/exit a script in a manner of my choosing.
Based on the man page of sudo(8), there is no easy way for evaluating the exact error reasons for a failure:
Exit Value
Upon successful execution of a program, the exit status from sudo will
simply be the exit status of the program that was executed.
Otherwise, sudo exits with a value of 1 if there is a
configuration/permission problem or if sudo cannot execute the given
command. In the latter case the error string is printed to the
standard error. If sudo cannot stat(2) one or more entries in the
user's PATH, an error is printed on stderr. (If the directory does not
exist or if it is not really a directory, the entry is ignored and no
error is printed.) This should not happen under normal circumstances.
The most common reason for stat(2) to return ''permission denied'' is
if you are running an automounter and one of the directories in your
PATH is on a machine that is currently unreachable.
The only (admittedly ugly) approach that comes to mind is to parse stderr to determine the reason for the error:
#!/bin/bash
tmpfile=$(mktemp)
sudo echo "dummy" 2>"$tmpfile"
if [ $? -eq 1 ]; then
    # Exit status 1 can have several causes; inspect stderr for the reason
    if grep -qx "sudo.*incorrect password attempts" "$tmpfile"; then
        # exit due to failed password attempts
        echo "too many failed password attempts"
    else
        # other reason, for instance configuration
        echo "other reason"
    fi
fi
rm "$tmpfile"
Note, however, that this approach is neither upgrade-safe nor language-independent: if a patch to sudo changes the text shown for a wrong password, or the user logs on with a different language, this code will not handle it properly.
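One way to reduce the language dependence might be to run sudo with LC_ALL=C so that its messages come out untranslated. This is a sketch under the assumption that sudo takes the language of its user-facing messages from the caller's locale; classify_sudo_stderr is a hypothetical helper name, and the pattern is still tied to the English message text.

```shell
#!/bin/bash
# Hypothetical helper: classify_sudo_stderr FILE
# Prints "bad_password" if FILE holds sudo's failed-password message,
# otherwise "other". The pattern covers both "1 incorrect password attempt"
# and "3 incorrect password attempts".
classify_sudo_stderr() {
    if grep -q "incorrect password attempt" "$1"; then
        echo "bad_password"
    else
        echo "other"
    fi
}

# Intended use (commented out because it prompts for a password):
#   tmpfile=$(mktemp)
#   LC_ALL=C sudo -k true 2>"$tmpfile" || classify_sudo_stderr "$tmpfile"
#   rm -f "$tmpfile"
```

The -k option makes sudo ignore cached credentials for that one invocation, so the password is always requested. The upgrade-safety caveat still applies.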

Nagios bash script returns no output when executed through check_nrpe

My nagios bash script works fine from the client's command line.
When I execute the same script through check_nrpe from the nagios server it returns the following message "CHECK_NRPE: No output returned from daemon."
Seems like a command in the bash script is not being executed.
arrVars=(`/usr/bin/ipmitool sensor | grep "<System sensor>"`)
#echo "Hello World!!"
myOPString=""
<Process array and determine string to echo along with exit code>
echo $myOPString
if [[ $flag == "False" ]]; then
exit 1
else
exit 0
fi
"Hello World" shows up on the nagios monitoring screen if I uncomment the echo statement.
I am new to Linux, but it seems like the nagios user isn't able to execute ipmitool.
arrVars=(`/usr/bin/ipmitool sensor | grep "<System sensor>"`)
Check the output of the above: you can echo it and inspect the values. If it still does not work, have this script call a second script that produces the output, and assign that output to a variable.
exit 1
This refers to the severity (in the Nagios plugin convention, 0 is OK, 1 is WARNING, 2 is CRITICAL and 3 is UNKNOWN), so you would have to define the conditions under which the severity changes.
Add this line to the sudoers file:
nagios ALL=(root) NOPASSWD: /usr/bin/ipmitool
Then use "sudo /usr/bin/ipmitool" in your script.
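To make a failure like this visible instead of ending up with "No output returned from daemon", the plugin can always print one status line and map an empty reading to UNKNOWN. A minimal sketch, where report_sensor and its two parameters are hypothetical names and the message wording is my own:

```shell
#!/bin/bash
# Hypothetical helper: report_sensor READING PROBLEM
#   READING  text extracted from the ipmitool output (may be empty)
#   PROBLEM  "yes" if the caller decided the reading is out of range
# Always prints one status line and returns the matching Nagios plugin
# exit code (0 OK, 1 WARNING, 3 UNKNOWN).
report_sensor() {
    local reading=$1
    local problem=$2
    if [ -z "$reading" ]; then
        # No data at all: say so rather than printing an empty line
        echo "UNKNOWN - ipmitool returned no data (check permissions)"
        return 3
    fi
    if [ "$problem" = "yes" ]; then
        echo "WARNING - $reading"
        return 1
    fi
    echo "OK - $reading"
    return 0
}
```

At the end of the check script this would be used as report_sensor "$myOPString" "$problem"; exit $?, with $problem set by whatever logic processes the array.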

Bash: Breaking out of IF loop in FOR loop

I am trying to combine a FOR loop (that iterates over IP addresses) and an IF loop (that uses nc to check for a successful ssh connection before moving on).
I have an array ${INSTANCE_IPS[@]} with the IP addresses in it (at the moment it contains 2 IP addresses). Here is the code:
while [ $ITERATION -le 30 ]
do
for instance in ${INSTANCE_IPS[@]}
do
nc -w 2 $instance 22 > /dev/null
if [ $? -eq 0 ]
then echo "connection succeeded to $instance"
else
ITERATION=$((ITERATION+1))
echo ITERATION=$ITERATION
echo "[info] connection to $instance unsuccessful. trying again. iteration=$ITERATION"
sleep 20
fi
done
done
The 'else' statement in the IF loop works fine. It is the 'then' statement I am having problems with... I don't know how to break out of the IF loop once the connections are successful. Here's an example output when I run the above:
connection succeeded to 10.11.143.171
connection succeeded to 10.11.143.170
connection succeeded to 10.11.143.171
connection succeeded to 10.11.143.170
connection succeeded to 10.11.143.171
connection succeeded to 10.11.143.170
If I use break after then echo "connection succeeded to $instance" then it only iterates through 1 IP address and never breaks out:
connection succeeded to 10.11.143.171
connection succeeded to 10.11.143.171
connection succeeded to 10.11.143.171
Ideally I think the best thing to do would be to query the number of elements in the array, then perform a netcat connection and increment some value by 1 until it equals the number of elements in the array, but I'm really not sure how to do that.
Any help is appreciated :) Please let me know if you need any more information.
Cheers
Reformulate your logic. You can't break if something succeeds, because you don't know whether another item might fail.
Instead, keep a flag saying whether you've successfully gone through all of them, and set it to false if something fails. At this point, you can also break and wait.
ITERATION=0
all_succeeded=false
while [ "$all_succeeded" = "false" ] && [ "$ITERATION" -le 30 ]
do
    # Assume success until one host fails in this pass
    all_succeeded=true
    for instance in "${INSTANCE_IPS[@]}"
    do
        if nc -w 2 "$instance" 22 > /dev/null
        then
            echo "connection succeeded to $instance"
        else
            all_succeeded=false
            echo "[info] connection to $instance unsuccessful."
            sleep 20
            break
        fi
    done
    ITERATION=$((ITERATION+1))
done
if [ "$all_succeeded" = "true" ]
then
    echo "It worked"
else
    echo "Giving up"
fi
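The counting idea from the question (compare the number of successful connections with the number of elements in the array) can be sketched like this. count_reachable and ssh_port_open are hypothetical helper names; the nc call is the one from the question.

```shell
#!/bin/bash
# Hypothetical helper: count_reachable CHECK_CMD HOST...
# Runs CHECK_CMD once per host and prints how many hosts passed.
count_reachable() {
    local check=$1
    shift
    local ok=0 host
    for host in "$@"; do
        if "$check" "$host" > /dev/null 2>&1; then
            ok=$((ok + 1))
        fi
    done
    echo "$ok"
}

# Hypothetical check wrapping the nc call from the question.
ssh_port_open() {
    nc -w 2 "$1" 22
}

# Intended use: ${#INSTANCE_IPS[@]} is the number of elements in the array.
#   reachable=$(count_reachable ssh_port_open "${INSTANCE_IPS[@]}")
#   [ "$reachable" -eq "${#INSTANCE_IPS[@]}" ] && echo "all hosts reachable"
```

Unlike the flag version, this probes every host on each pass, which takes longer when an early host is down but tells you how many hosts are still missing.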

Check database connectivity using Shell script

I am trying to write a shell script to check database connectivity. Within my script I am using the command
sqlplus uid/pwd@database-schemaname
to connect to my Oracle database.
Now I want to save the output generated by this command (before it drops to SQL prompt) in a temp file and then grep / find the string "Connected to" from that file to see if the connectivity is fine or not.
Can anyone please help me to catch the output and get out of that prompt and test whether connectivity is fine?
Use a script like this:
#!/bin/sh
echo "exit" | sqlplus -L uid/pwd@dbname | grep Connected > /dev/null
if [ $? -eq 0 ]
then
echo "OK"
else
echo "NOT OK"
fi
echo "exit" ensures that your program exits immediately (this gets piped to sqlplus).
-L ensures that sqlplus won't ask for a password again if the credentials are wrong (which would make it get stuck as well).
(> /dev/null just hides the output from grep, which we don't need because the result is read from $? in this case.)
You can avoid the SQL prompt by doing:
sqlplus uid/pwd@database-schemaname < /dev/null
SQL*Plus exits immediately.
Now just grep the output of the above:
if sqlplus uid/pwd@database-schemaname < /dev/null | grep 'Connected to'; then
    # have connectivity to Oracle
    echo "Connected"
else
    # no connectivity
    echo "Not connected"
fi
#! /bin/sh
if echo "exit;" | sqlplus UID/PWD@database-schemaname 2>&1 | grep -q "Connected to"
then echo connected OK
else echo connection FAIL
fi
Not knowing whether the "Connected to" message goes to standard output or standard error, this checks both. Using "grep -q" instead of "grep ... >/dev/null" assumes a grep that supports -q, such as the one on Linux.
#!/bin/bash
# -s suppresses the banner, so only query results or error messages are captured
output=$(sqlplus -s "user/pass@POLIGON.TEST" <<EOF
set heading off feedback off verify off
select distinct machine from v\$session;
exit
EOF
)
echo "$output"
if [[ $output =~ ERROR ]]; then
    echo "ERROR"
else
    echo "OK"
fi
Here's a good option which does not expose the password on the command line:
#!/bin/bash
# Quoted so the shell does not treat < and > as redirections; fill in real values
CONNECT_STRING='<USERNAME>/<PASS>@<SID>'
sqlplus -s -L /NOLOG <<EOF
whenever sqlerror exit 1
whenever oserror exit 1
CONNECT $CONNECT_STRING
exit
EOF
SQLPLUS_RC=$?
echo "RC=$SQLPLUS_RC"
[ $SQLPLUS_RC -eq 0 ] && echo "Connected successfully"
[ $SQLPLUS_RC -ne 0 ] && echo "Failed to connect"
exit $SQLPLUS_RC
None of the proposed solutions works for me: my script runs on machines in several countries with different locales, so I can't check for a fixed string, because on another machine that string is translated into a different language. As a solution I'm using SQLcl
https://www.oracle.com/database/technologies/appdev/sqlcl.html
which is compatible with all SQL*Plus scripts and lets you test database connectivity like this:
echo "disconnect" | sql -L $DB_CONNECTION_STRING > /dev/null || fail "cannot check connectivity with the database, check your settings"
#!/bin/sh
echo "exit" | sqlplus -S -L uid/pwd@dbname
if [ $? -eq 0 ]
then
echo "OK"
else
echo "NOT OK"
fi
For connection validation -S would be sufficient.
The "silent" mode doesn't prevent terminal output. All it does is:
-S Sets silent mode which suppresses the display of
the SQL*Plus banner, prompts, and echoing of
commands.
If you want to suppress all terminal output, then you'll need to do something like:
sqlplus ... > /dev/null 2>&1
This was my one-liner for a Docker container to wait until the DB is ready:
until sqlplus -s sys/Oracle18@oracledbxe/XE as sysdba <<< "SELECT 13376411 FROM DUAL; exit;" | grep "13376411"; do echo "Could not connect to oracle... sleep for a while"; sleep 3; done
And the same in multiple lines:
until sqlplus -s sys/Oracle18@oracledbxe/XE as sysdba <<< "SELECT 13376411 FROM DUAL; exit;" | grep "13376411";
do
    echo "Could not connect to oracle... sleep for a while";
    sleep 3;
done
So it basically runs a select with a magic number and checks that the correct number was actually returned.

Resources