BASH store values in an array and check difference of each value - arrays

[CentOS, BASH, cron] Is there a method to declare variants that would keep even when system restarts?
The scenario is to snmpwalk interface I/O errors and store the values in an array. A cron job to snmpwalk again, say 5 mins later, would have another set of values. I would like to compare them with previous corresponding value of each interface. If the difference exceeds the threshold (50), an alert would generate.
So the question is: how to store an array variable that would lost in the system? and how to check the difference of each value in two arrays?
UPDATE Mar 16, 2012 I attach my final script here for your reference.
#!/bin/bash
# This script is to monitor interface Input/Output Errors of Cisco devices, by snmpwalk the error values every 5 mins, and send email alert if incremental value exceeds threshold (e.g. 500).
# Author: Wu Yajun | Created: 12Mar2012 | Updated: 16Mar2012
##########################################################################
DIR="$( cd "$( dirname "$0" )" && pwd )"
host=device.ip.addr.here
# Check and initiate .log file storing previous values, create .tmp file storing current values.
test -e $DIR/host1_ifInErrors.log || snmpwalk -c public -v 1 $host IF-MIB::ifInErrors > $DIR/host1_ifInErrors.log
snmpwalk -c public -v 1 $host IF-MIB::ifInErrors > $DIR/host1_ifInErrors.tmp
# Compare differences of the error values, and alert if diff exceeds threshold.
# To exclude checking some interfaces, e.g. Fa0/6, Fa0/10, Fa0/11, change the below "for loop" to style as:
# for i in {1..6} {8..10} {13..26}
totalIfNumber=$(echo $(wc -l $DIR/host1_ifInErrors.tmp) | sed 's/ \/root.*$//g')
for (( i=1; i<=$totalIfNumber; i++))
do
currentValue=$(cat $DIR/host1_ifInErrors.tmp | sed -n ''$i'p' | sed 's/^.*Counter32: //g')
previousValue=$(cat $DIR/host1_ifInErrors.log | sed -n ''$i'p' | sed 's/^.*Counter32: //g')
diff=$(($currentValue-$previousValue))
[ $diff -ge 500 ] && (ifName=$(echo $(snmpwalk -c public -v 1 $host IF-MIB::ifName.$i) | sed 's/^.*STRING: //g') ; echo "ATTENTION - Input Error detected from host1 interface $ifName" | mutt -s "ATTENTION - Input Error detected from host1 interface $ifName" <email address here>)
done
# Store current values for next time checking.
snmpwalk -c public -v 1 $host IF-MIB::ifInErrors > $DIR/host1_ifInErrors.log

Save the variables in a file. Add a date stamp:
echo "$(date)#... variables here ...." >> "$file"
Read the last values from the file:
tail -1 "$file" | cut "-d#" -f2 | read ... variables here ....
That also gives you a nice log file where you can monitor the changes. I suggest to always append to the file, so you can easily see when the service is down/didn't run for some reason.
To check for changes, you can use an simple if
if [[ "...old values..." != "...new values..." ]]; then
send mail
fi

Related

Putting files in directory into array variable

I'm writing bash code that will search for specific files in the directory it is run in and add them into an array variable. The problem I am having is formatting the results. I need to find all the compressed files in the current directory and display both the names and sizes of the files in order of last modified. I want to take the results of that command and put them into an array variable with each line element containing the file's name and corresponding size but I don't know how to do that. I'm not sure if I should be using command "find" instead of "ls" but here is what I have so far:
find_files="$(ls -1st --block-size=MB)"
arr=( ($find_files) )
I'm not sure exactly what format you want the array to be in, but here is a snippet that creates an associative array keyed by filename with the size as the value:
$ ls -l test.{zip,bz2}
-rw-rw-r-- 1 user group 0 Sep 10 13:27 test.bz2
-rw-rw-r-- 1 user group 0 Sep 10 13:26 test.zip
$ declare -A sizes; while read SIZE FILENAME ; do sizes["$FILENAME"]="$SIZE"; done < <(find * -prune -name '*.zip' -o -name *.bz2 | xargs stat -c "%Y %s %N" | sort | cut -f 2,3 -d " ")
$ echo "${sizes[#]#A}"
declare -A sizes=(["'test.zip'"]="0" ["'test.bz2'"]="0" )
$
And if you just want an array of literally "filename size" entries, that's even easier:
$ while read SIZE FILENAME ; do sizes+=("$FILENAME $SIZE"); done < <(find * -prune -name '*.zip' -o -name *.bz2 | xargs stat -c "%Y %s %N" | sort | cut -f 2,3 -d " ")
$ echo "${sizes[#]#A}"
declare -a sizes=([0]="'test.zip' 0" [1]="'test.bz2' 0")
$
Both of these solutions work, and were tested via copy paste from this post.
The first is fairly slow. One problem is external program invocations within a loop - date for example, is invoked for every file. You could make it quicker by not including the date in the output array (see Notes below). Particularly for method 2 - that would result in no external command invocations inside the while loop. But method 1 is really the problem - orders of magnitude slower.
Also, somebody probably knows how to convert an epoch date to another format in awk for example, which could be faster. Maybe you could do the sort in awk too. Perhaps just keep the epoch date?
These solutions are bash / GNU heavy and not portable to other environments (bash here strings, find -printf). OP tagged linux and bash though, so GNU can be assumed.
Solution 1 - capture any compressed file - using file to match (slow)
The criteria for 'compressed' is if file output contains the word compress
Reliable enough, but perhaps there is a conflict with some other file type description?
file -l | grep compress (file 5.38, Ubuntu 20.04, WSL) indicates for me there are no conflicts at all (all files listed are compression formats)
I couldn't find a way of classifying any compressed file other than this
I ran this on a directory containing 1664 files - time (real) was 40 seconds
#!/bin/bash
# Capture all files, recursively, in $TARGET, that are
# compressed files. In an indexed array. Using file name
# extensions to match.
# Initialise variables, and check the target is valid
declare -g c= compressed_files= path= TARGET=$1
[[ -r "$TARGET" ]] || exit 1
# Make the array
# A here string (<<<) must be used, to keep array in the global environment
while IFS= read -r -d '' path; do
[[ "$(file --brief "${path%% *}")" == *compress* ]] &&
compressed_files[c++]="${path% *} $(date -d #${path##* })"
done < \
<(
find "$TARGET" -type f -printf '%p %s %T#\0' |
awk '{$2 = ($2 / 1024); print}' |
sort -n -k 3
)
# Print results - to test
printf '%s\n' "${compressed_files[#]}"
Solution 2 - use file extensions - orders of magnitude faster
If you know exactly what extensions you are looking for, you can
compose them in a find command
This is alot faster
On the same directory as above, containing 1664 files - time (real) was 200 miliseconds
This example looks for .gz, .zip, and .7z (gzip, zip and 7zip respectively)
I'm not sure if -type f -and -regex '.*[.]\(gz\|zip\|7z\) -and printf may be faster again, now I think of it. I started with globs cause I assumed that was quicker
That may also allow for storing the extension list in a variable..
This method avoids a file analysis on every file in your target
It also makes the while loop shorter - you're only iterating matches
Note the repetition of -printf here, this is due to the logic that
find uses: -printf is 'True'. If it were included by itself, it would
act as a 'match' and print all files
It has to be used as a result of a name match being true (using -and)
Perhaps somebody has a better composition?
#!/bin/bash
# Capture all files, recursively, in $TARGET, that are
# compressed files. In an indexed array. Using file name
# extensions to match.
# Initialise variables, and check the target is valid
declare -g c= compressed_files= path= TARGET=$1
[[ -r "$TARGET" ]] || exit 1
while IFS= read -r -d '' path; do
compressed_files[c++]="${path% *} $(date -d #${path##* })"
done < \
<(
find "$TARGET" \
-type f -and -name '*.gz' -and -printf '%p %s %T#\0' -or \
-type f -and -name '*.zip' -and -printf '%p %s %T#\0' -or \
-type f -and -name '*.7z' -and -printf '%p %s %T#\0' |
awk '{$2 = ($2 / 1024); print}' |
sort -n -k 3
)
# Print results - for testing
printf '%s\n' "${compressed_files[#]}"
Sample output (of either method):
$ comp-find.bash /tmp
/tmp/comptest/websters_english_dictionary.tmp.tar.gz 265.148 Thu Sep 10 07:53:37 AEST 2020
/tmp/comptest/What_is_Systems_Architecture_PART_1.tar.gz 1357.06 Thu Sep 10 08:17:47 AEST 2020
Note:
You can add a literal K to indicate the block size / units (kilobytes)
If you want to print the path only from this array, you can use suffix removal: printf '%s\n' "${files[#]&& *}"
For no date in the array (it's used to sort, but then its job may be done), simply remove $(date -d #${path##* }) (incl. the space).
Kind of tangential, but to use different date formats, replace $(date -d #${path##* }) with:
$(date -I -d #${path##* }) ISO format - note that short opts style: date -Id #[date] did not work for me
$(date -d #${path##* } +%Y-%M-%d_%H-%m-%S) like ISO, but w/ seconds
$(date -d #${path##* } +%Y-%M-%d_%H-%m-%S) same again, but w/ nanoseconds (find gives you nano seconds)
Sorry for the long post, hopefully it's informative.

How to access last item in bash array on Mac OS?

I had to create a bash array on Mac OS as follows. The $1 represents # of git commits you want to store in the array.
IFS=$'\n' read -rd '' -a array<<< "$(git log -n $1 | grep commit | awk '{print $2}')"
I can't access last array item as ${array[-1]}. I get the error "array: bad array subscript".
However, when I create the array on linux OS, I can access the last array item in the same way successfully.
readarray -t array <<< "$(git log -n $1 | grep commit | awk '{print $2}')"
echo ${array[-1]} is successful on Linux machine but not on Mac OS machine.
In a bash too old to support negative subscripts, you end up needing to do something like:
echo "${array[$((${#array[#]} - 1))]}"

Count ip repeat in log from bash

bash as I can tell from the repetition of an IP within a log through a specific search?
By example:
#!/bin/bash
# Log line: [Sat Jul 04 21:55:35 2015] [error] [client 192.168.1.39] Access denied with status code 403.
grep "status\scode\s403" /var/log/httpd/custom_error_log | while read line ; do
pattern='^\[.*?\]\s\[error\]\s\[client\s(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\].*?403'
[[ $line =~ $pattern ]]
res_remote_addr="${BASH_REMATCH[1]}.${BASH_REMATCH[2]}.${BASH_REMATCH[3]}.${BASH_REMATCH[4]}"
echo "Remote Addr: $res_remote_addr"
done
I need to know the end results obtained a few times each message 403 ip, if possible sort highest to lowest.
By example output:
200.200.200.200 50 times.
200.200.200.201 40 times.
200.200.200.202 30 times.
... etc ...
This we need to create an html report from a monthly log of apache in a series of events (something like awstats).
there are better ways. following is my proposal, which should be more readable and easier to maintain:
grep -P -o '\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}' log_file | sort | uniq -c | sort -k1,1 -r -n
output should be in a form of:
count1 ip1
count2 ip2
update:
filter only 403:
grep -P -o '\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}(?=.*403)' log_file | sort | uniq -c | sort -k1,1 -r -n
notice that a look ahead would suffice.
If log file is in the format as mentioned in question, the best is to use awk to filter out the status code needed plus output only the IP. Then use the uniq command to count each occurence:
awk '/code 403/ {print $8}' error.log | sort | uniq -c |sort -n
In awk, we filter by regexp /code 403/ and then for matching lines we print the 8th value (values are separated by whitespace), which is the IP.
Then we need to sort the output, so that the same IPs are one after another - this is requirement of the uniq program.
uniq -c prints each unique line from input only once - and preceded by the number of occurences. Finnaly we sort this list numericaly to get the IPs sorted by count.
Sample output (first is no. of occurences, second is IP):
1 1.1.1.1
10 2.2.2.2
12 3.3.3.3

importing data from a CSV in Bash

I have a CSV file that I need to use in a bash script. The CSV is formatted like so.
server1,file.name
server1,otherfile.name
server2,file.name
server3,file.name
I need to be able to pull this information into either an array or in some other way so that I can then filter the information and only pull out data for a single server that i can then pass to another command within the script.
I need it to go something like this.
Import workfile.csv
check hostname | return only lines from workfile.csv that have the hostname as column one and store column 2 as a variable.
find / -xdev -type f -perm -002 | compare to stored info | chmod o-w all files not in listing
I'm stuck using bash because of the environment that I'm working in.
The csv can be to big for adding all filenames in the find parameter list.
You also do not want to call find in a loop for every line in the csv.
Solution:
First make a complete list of files in a tmp file.
Second parse the csv and filter the files.
Third is chmod -w.
The next solution stores the files in a tmp
Make a script that gets the servername as a parameter.
See comment in the code:
# Before EDIT:
# Hostname by parameter 1
# Check that you have a hostname
if [ $# -ne 1 ]; then
echo "Usage: $0 hostname"
# Exit script, failure
exit 1
fi
hostname=$1
# Edit, get hostname by system call
hostname=$(hostname)
# Or: hostname=$(hostname -s)
# Additional check
if [ ! -f workfile.csv ]; then
echo "inputfile missing"
exit 1
fi
# After edits, ${hostname} is now filled.
find / -xdev -type f -perm -002 -name "${file}" > /tmp/allfiles.tmp
# Do not use cat workfile.csv | grep ..., you do not need to call cat
# grep with ^ for beginning of line, add a , for a complete first field
# grep "^${hostname}," workfile.csv
# cut for selecting second field with delimiter ','
# cut -d"," -f2
# while read file => can be improved with xargs but lets start with this.
grep "^${hostname}," workfile.csv | cut -d"," -f2 | while read file; do
# Using sed with #, not /, since you need / in the search string
# Variable in sed mist be outside the single quotes and in double quotes
# Add $ after the file for end-of-line
# delete the line with the file (#searchstring#d)
sed -i '#/'"${file}"'$#d' /tmp/allfiles.tmp
done
echo "Review /tmp/allfiles.tmp before chmodding all these files"
echo "Delete the echo and exit when you are happy"
# Just an exit for testing
exit
# Using < is for avoiding a call to cat
</tmp/allfiles.tmp xargs chmod -w
It might be easier when you can chmod -w all the files and chmod +w the files in the csv. This is a little different than you asked, since all files from the csv are writable after this process, maybe you do not want that.

Execute bash command stored in associative array over SSH, store result

For a larger project that's not relevant, I need to collect system stats from the local system or a remote system. Since I'm collecting the same stats either way, I'm preventing code duplication by storing the stats-collecting commands in a Bash associative array.
declare -A stats_cmds
# Actually contains many more key:value pairs, similar style
stats_cmds=([total_ram]="$(free -m | awk '/^Mem:/{print $2}')")
I can collect local system stats like this:
get_local_system_stats()
{
# Collect stats about local system
complex_data_structure_that_doesnt_matter=${stats_cmds[total_ram]}
# Many more similar calls here
}
A precondition of my script is that ~/.ssh/config is setup such that ssh $SSH_HOSTNAME works without any user input. I would like something like this:
get_remote_system_stats()
{
# Collect stats about remote system
complex_data_structure_that_doesnt_matter=`ssh $SSH_HOSTNAME ${stats_cmds[total_ram]}`
}
I've tried every combination of single quotes, double quotes, backticks and such that I can imagine. Some combinations result in the stats command getting executed too early (bash: 7986: command not found), others cause syntax errors, others return null (single quotes around the stats command) but none store the proper result in my data structure.
How can I evaluate a command, stored in an associative array, on a remote system via SSH and store the result in a data structure in my local script?
Make sure that the commands you store in your array don't get expanded when you assign your array!
Also note that the complex-looking quoting style is necessary when nesting single quotes. See this SO post for an explanation.
stats_cmds=([total_ram]='free -m | awk '"'"'/^Mem:/{print $2}'"'"'')
And then just launch your ssh as:
sh "$ssh_hostname" "${stats_cmds[total_ram]}"
(yeah, I lowercased your variable name because uppercase variable names in Bash are really sick). Then:
get_local_system_stats() {
# Collect stats about local system
complex_data_structure_that_doesnt_matter=$( ${stats_cmds[total_ram]} )
# Many more similar calls here
}
and
get_remote_system_stats() {
# Collect stats about remote system
complex_data_structure_that_doesnt_matter=$(ssh "$ssh_hostname" "${stats_cmds[total_ram]}")
}
First, I'm going to suggest an approach that makes minimal changes to your existing implementation. Then, I'm going to demonstrate something closer to best practices.
Smallest Modification
Given your existing code:
declare -A remote_stats_cmds
remote_stats_cmds=([total_ram]='free -m | awk '"'"'/^Mem:/{print $2}'"'"''
[used_ram]='free -m | awk '"'"'/^Mem:/{print $3}'"'"''
[free_ram]='free -m | awk '"'"'/^Mem:/{print $4}'"'"''
[cpus]='nproc'
[one_min_load]='uptime | awk -F'"'"'[a-z]:'"'"' '"'"'{print $2}'"'"' | awk -F "," '"'"'{print $1}'"'"' | tr -d " "'
[five_min_load]='uptime | awk -F'"'"'[a-z]:'"'"' '"'"'{print $2}'"'"' | awk -F "," '"'"'{print $2}'"'"' | tr -d " "'
[fifteen_min_load]='uptime | awk -F'"'"'[a-z]:'"'"' '"'"'{print $2}'"'"' | awk -F "," '"'"'{print $3}'"'"' | tr -d " "'
[iowait]='cat /proc/stat | awk '"'"'NR==1 {print $6}'"'"''
[steal_time]='cat /proc/stat | awk '"'"'NR==1 {print $9}'"'"'')
...one can evaluate these locally as follows:
result=$(eval "${remote_stat_cmds[iowait]}")
echo "$result" # demonstrate value retrieved
...or remotely as follows:
result=$(ssh "$hostname" bash <<<"${remote_stat_cmds[iowait]}")
echo "$result" # demonstrate value retrieved
No separate form is required.
The Right Thing
Now, let's talk about an entirely different way to do this:
# no awful nested quoting by hand!
collect_total_ram() { free -m | awk '/^Mem:/ {print $2}'; }
collect_used_ram() { free -m | awk '/^Mem:/ {print $3}'; }
collect_cpus() { nproc; }
...and then, to evaluate locally:
result=$(collect_cpus)
...or, to evaluate remotely:
result=$(ssh "$hostname" bash <<<"$(declare -f collect_cpus); collect_cpus")
...or, to iterate through defined functions with the collect_ prefix and do both of these things:
declare -A local_results
declare -A remote_results
while IFS= read -r funcname; do
local_results["${funcname#collect_}"]=$("$funcname")
remote_results["${funcname#collect_}"]=$(ssh "$hostname" bash <<<"$(declare -f "$funcname"); $funcname")
done < <(compgen -A function collect_)
...or, to collect all the items into a single remote array in one pass, avoiding extra SSH round-trips and not eval'ing or otherwise taking security risks with results received from the remote system:
remote_cmd=""
while IFS= read -r funcname; do
remote_cmd+="$(declare -f "$funcname"); printf '%s\0' \"$funcname\" \"\$(\"$funcname\")\";"
done < <(compgen -A function collect_)
declare -A remote_results=( )
while IFS= read -r -d '' funcname && IFS= read -r -d '' result; do
remote_results["${funcname#collect_}"]=$result
done < <(ssh "$hostname" bash <<<"$remote_cmd")

Resources