Count IP repeats in a log from bash - arrays

How can I count, from bash, the repetitions of an IP within a log using a specific search? For example:
#!/bin/bash
# Log line: [Sat Jul 04 21:55:35 2015] [error] [client 192.168.1.39] Access denied with status code 403.
grep "status\scode\s403" /var/log/httpd/custom_error_log | while read -r line ; do
    pattern='^\[.*\][[:space:]]\[error\][[:space:]]\[client[[:space:]](25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\].*403'
    [[ $line =~ $pattern ]]
    res_remote_addr="${BASH_REMATCH[1]}.${BASH_REMATCH[2]}.${BASH_REMATCH[3]}.${BASH_REMATCH[4]}"
    echo "Remote Addr: $res_remote_addr"
done
I need to know, in the end results, how many times each IP produced a 403 message, if possible sorted from highest to lowest.
Example output:
200.200.200.200 50 times.
200.200.200.201 40 times.
200.200.200.202 30 times.
... etc ...
I need this in order to create an HTML report from a monthly Apache log, as a series of events (something like awstats).

There are better ways. The following is my proposal, which should be more readable and easier to maintain:
grep -P -o '\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}' log_file | sort | uniq -c | sort -k1,1 -r -n
output should be in a form of:
count1 ip1
count2 ip2
Update:
To filter only 403s:
grep -P -o '\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}(?=.*403)' log_file | sort | uniq -c | sort -k1,1 -r -n
Notice that a lookahead suffices.
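As a quick sanity check, here is that pipeline run against a couple of fabricated log lines (the IPs and counts are made up for the demo; requires GNU grep for -P):

```shell
# Fabricated sample lines standing in for the real log file
printf '%s\n' \
  '[Sat Jul 04 21:55:35 2015] [error] [client 192.168.1.39] Access denied with status code 403.' \
  '[Sat Jul 04 21:56:01 2015] [error] [client 192.168.1.39] Access denied with status code 403.' \
  '[Sat Jul 04 21:57:12 2015] [error] [client 10.0.0.7] Access denied with status code 403.' \
  > log_file

# Only IPs on lines that also contain 403 are extracted, then counted
grep -P -o '\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}(?=.*403)' log_file | sort | uniq -c | sort -k1,1 -r -n
```

This prints 2 192.168.1.39 followed by 1 10.0.0.7 (uniq -c left-pads the counts with spaces).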

If the log file is in the format mentioned in the question, the best approach is to use awk to filter on the status code and output only the IP. Then use the uniq command to count each occurrence:
awk '/code 403/ {sub(/\]$/, "", $8); print $8}' error.log | sort | uniq -c | sort -n
In awk, we filter by the regexp /code 403/, and for matching lines we print the 8th field (fields are separated by whitespace), which is the IP; the sub() strips the trailing ] that is part of that field.
Then we need to sort the output so that identical IPs are adjacent - this is a requirement of the uniq program.
uniq -c prints each unique input line only once, preceded by the number of occurrences. Finally we sort the list numerically to get the IPs ordered by count.
Sample output (first is the no. of occurrences, second is the IP):
1 1.1.1.1
10 2.2.2.2
12 3.3.3.3
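To get the exact "IP N times." format asked for in the question, the count/IP pairs can be reshaped with one more awk stage. A sketch with fabricated sample data (the sub() strips the ] that is part of the 8th whitespace-separated field):

```shell
# Fabricated sample of the Apache error log format from the question
printf '%s\n' \
  '[Sat Jul 04 21:55:35 2015] [error] [client 192.168.1.39] Access denied with status code 403.' \
  '[Sat Jul 04 21:56:01 2015] [error] [client 192.168.1.39] Access denied with status code 403.' \
  '[Sat Jul 04 21:57:12 2015] [error] [client 10.0.0.7] Access denied with status code 403.' \
  > error.log

# Filter 403s, count per IP, sort descending, then print "ip count times."
awk '/code 403/ {sub(/\]$/, "", $8); print $8}' error.log |
  sort | uniq -c | sort -rn |
  awk '{print $2, $1, "times."}'
# 192.168.1.39 2 times.
# 10.0.0.7 1 times.
```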

Related

How can I get the names of all namespaces containing the word "nginx" and store those names in an array

Basically I want to automate a task where I have some namespaces in Kubernetes that I need to delete and others I want to leave alone. The namespaces I need to delete contain the word nginx. So I was thinking I could parse the output of get namespace with some regex, store those namespaces in an array, then iterate through the array deleting them one by one.
array=($(kubectl get ns | jq -r 'keys[]'))
declare -p array
for n in {array};
do
kubectl delete $n
done
I tried doing something like this but this is very basic and doesn't even have the regex. But I just left it here as an example to show what I'm trying to achieve. Any help is appreciated and thanks in advance.
kubectl get ns doesn't output JSON unless you add -o json. This:
array=($(kubectl get ns | jq -r 'keys[]'))
Should result in an error like:
parse error: Invalid numeric literal at line 1, column 5
kubectl get ns -o json emits a JSON response that contains a list of Namespace resources in the items key. You need to get the metadata.name attribute from each item, so:
kubectl get ns -o json | jq -r '.items[].metadata.name'
You only want namespaces that contain the word "nginx". We could filter the above list with grep, or we could add that condition to our jq expression:
kubectl get ns -o json | jq -r '.items[]|select(.metadata.name|test("nginx"))|.metadata.name'
This will output your desired namespaces. At this point, there's no reason to store them in an array and use a for loop; you can just pipe the output to xargs:
kubectl get ns -o json |
jq -r '.items[]|select(.metadata.name|test("nginx"))|.metadata.name' |
xargs kubectl delete ns
kubectl get ns
output
NAME STATUS AGE
default Active 75d
kube-node-lease Active 75d
kube-public Active 75d
kube-system Active 75d
oci-service-operator-system Active 31d
olm Active 31d
command
kubectl get ns --no-headers | awk '{if ($1 ~ "de") print $1}'
Output
default
kube-node-lease
This will give you a list of namespaces:
array=$(kubectl get ns --no-headers | awk '{if ($1 ~ "de") print $1}')
Testing
bash-4.2$ array=$(kubectl get ns --no-headers | awk '{if ($1 ~ "de") print $1}')
bash-4.2$ echo $array
default kube-node-lease
bash-4.2$ for n in $array; do echo $n; done
default
kube-node-lease
bash-4.2$
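If you do want the names in a proper bash array, mapfile (bash 4+) is cleaner than word-splitting an unquoted command substitution. A sketch using a static list in place of live kubectl output (the namespace names here are made up for the demo):

```shell
# Hypothetical stand-in for `kubectl get ns --no-headers | awk '{print $1}'` output
ns_names='default
nginx-ingress
kube-system
my-nginx-app'

# mapfile stores one namespace per array element, safe against word splitting
mapfile -t matches < <(printf '%s\n' "$ns_names" | grep nginx)

printf '%s\n' "${matches[@]}"
# nginx-ingress
# my-nginx-app
```

From there, "${matches[@]}" can be passed straight to kubectl delete ns.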

Putting files in directory into array variable

I'm writing bash code that will search for specific files in the directory it is run in and add them to an array variable. The problem I'm having is formatting the results. I need to find all the compressed files in the current directory and display both the names and sizes of the files in order of last modification. I want to take the results of that command and put them into an array variable, with each element containing a file's name and corresponding size, but I don't know how to do that. I'm not sure whether I should be using find instead of ls, but here is what I have so far:
find_files="$(ls -1st --block-size=MB)"
arr=( ($find_files) )
I'm not sure exactly what format you want the array to be in, but here is a snippet that creates an associative array keyed by filename with the size as the value:
$ ls -l test.{zip,bz2}
-rw-rw-r-- 1 user group 0 Sep 10 13:27 test.bz2
-rw-rw-r-- 1 user group 0 Sep 10 13:26 test.zip
$ declare -A sizes; while read SIZE FILENAME ; do sizes["$FILENAME"]="$SIZE"; done < <(find * -prune -name '*.zip' -o -name '*.bz2' | xargs stat -c "%Y %s %N" | sort | cut -f 2,3 -d " ")
$ echo "${sizes[@]@A}"
declare -A sizes=(["'test.zip'"]="0" ["'test.bz2'"]="0" )
$
And if you just want an array of literally "filename size" entries, that's even easier:
$ while read SIZE FILENAME ; do sizes+=("$FILENAME $SIZE"); done < <(find * -prune -name '*.zip' -o -name '*.bz2' | xargs stat -c "%Y %s %N" | sort | cut -f 2,3 -d " ")
$ echo "${sizes[@]@A}"
declare -a sizes=([0]="'test.zip' 0" [1]="'test.bz2' 0")
$
Both of these solutions work, and were tested via copy paste from this post.
The first is fairly slow. One problem is the external program invocations within a loop - date, for example, is invoked for every file. You could make it quicker by not including the date in the output array (see Notes below). Particularly for method 2, that would result in no external command invocations inside the while loop. But method 1 is really the problem - orders of magnitude slower.
Also, somebody probably knows how to convert an epoch date to another format in awk for example, which could be faster. Maybe you could do the sort in awk too. Perhaps just keep the epoch date?
These solutions are bash / GNU heavy and not portable to other environments (bash process substitution, find -printf). OP tagged linux and bash though, so GNU can be assumed.
Solution 1 - capture any compressed file - using file to match (slow)
The criterion for 'compressed' is whether the file output contains the word compress
Reliable enough, but perhaps there is a conflict with some other file type description?
file -l | grep compress (file 5.38, Ubuntu 20.04, WSL) indicates for me there are no conflicts at all (all files listed are compression formats)
I couldn't find a way of classifying any compressed file other than this
I ran this on a directory containing 1664 files - time (real) was 40 seconds
#!/bin/bash
# Capture all files, recursively, in $TARGET, that are
# compressed files. In an indexed array. Using file name
# extensions to match.
# Initialise variables, and check the target is valid
declare -g c= compressed_files= path= TARGET=$1
[[ -r "$TARGET" ]] || exit 1
# Make the array
# Process substitution (< <(...)) is used, to keep the array in the global environment
while IFS= read -r -d '' path; do
[[ "$(file --brief "${path%% *}")" == *compress* ]] &&
compressed_files[c++]="${path% *} $(date -d "@${path##* }")"
done < \
<(
find "$TARGET" -type f -printf '%p %s %T#\0' |
awk '{$2 = ($2 / 1024); print}' |
sort -n -k 3
)
# Print results - to test
printf '%s\n' "${compressed_files[@]}"
Solution 2 - use file extensions - orders of magnitude faster
If you know exactly what extensions you are looking for, you can
compose them in a find command
This is a lot faster
On the same directory as above, containing 1664 files - time (real) was 200 milliseconds
This example looks for .gz, .zip, and .7z (gzip, zip and 7zip respectively)
I'm not sure if -type f -and -regex '.*[.]\(gz\|zip\|7z\)' -and -printf may be faster again, now that I think of it. I started with globs because I assumed that was quicker
That may also allow for storing the extension list in a variable.
This method avoids a file analysis on every file in your target
It also makes the while loop shorter - you're only iterating matches
Note the repetition of -printf here, this is due to the logic that
find uses: -printf is 'True'. If it were included by itself, it would
act as a 'match' and print all files
It has to be used as a result of a name match being true (using -and)
Perhaps somebody has a better composition?
#!/bin/bash
# Capture all files, recursively, in $TARGET, that are
# compressed files. In an indexed array. Using file name
# extensions to match.
# Initialise variables, and check the target is valid
declare -g c= compressed_files= path= TARGET=$1
[[ -r "$TARGET" ]] || exit 1
while IFS= read -r -d '' path; do
compressed_files[c++]="${path% *} $(date -d "@${path##* }")"
done < \
<(
find "$TARGET" \
-type f -and -name '*.gz' -and -printf '%p %s %T#\0' -or \
-type f -and -name '*.zip' -and -printf '%p %s %T#\0' -or \
-type f -and -name '*.7z' -and -printf '%p %s %T#\0' |
awk '{$2 = ($2 / 1024); print}' |
sort -n -k 3
)
# Print results - for testing
printf '%s\n' "${compressed_files[@]}"
Sample output (of either method):
$ comp-find.bash /tmp
/tmp/comptest/websters_english_dictionary.tmp.tar.gz 265.148 Thu Sep 10 07:53:37 AEST 2020
/tmp/comptest/What_is_Systems_Architecture_PART_1.tar.gz 1357.06 Thu Sep 10 08:17:47 AEST 2020
Note:
You can add a literal K to indicate the block size / units (kilobytes)
If you want to print the path only from this array, you can use suffix removal: printf '%s\n' "${files[@]%% *}"
For no date in the array (it's used to sort, but then its job may be done), simply remove $(date -d @${path##* }) (incl. the space).
Kind of tangential, but to use different date formats, replace $(date -d @${path##* }) with:
$(date -I -d @${path##* }) ISO format - note that the short opts style date -Id @[date] did not work for me
$(date -d @${path##* } +%Y-%m-%d_%H-%M-%S) like ISO, but w/ seconds
$(date -d @${path##* } +%Y-%m-%d_%H-%M-%S.%N) same again, but w/ nanoseconds (find gives you nanoseconds)
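For reference, here is what those GNU date invocations produce for a fixed epoch (epoch 0, printed with -u so the timezone does not change the output):

```shell
# Epoch 0 in UTC; -u pins the timezone so the output is reproducible
date -u -d @0 +%Y-%m-%d_%H-%M-%S
# 1970-01-01_00-00-00

date -u -I -d @0
# 1970-01-01
```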
Sorry for the long post, hopefully it's informative.

Bash RegEx and Storing Commands into a Variable

In Bash I have an array names that contains the string values
Dr. Praveen Hishnadas
Dr. Vij Pamy
John Smitherson,Dr.,Service Account
John Dinkleberg,Dr.,Service Account
I want to capture only the names
Praveen Hishnadas
Vij Pamy
John Smitherson
John Dinkleberg
and store them back into the original array, overwriting their unsanitized versions.
I have the following snippet of code note that I'm executing the regex in Perl (-P)
for i in "${names[@]}"
do
echo $i|grep -P '(?:Dr\.)?\w+ \w+|$' -o | head -1
done
Which yields the output
Dr. Praveen Hishnadas
Dr. Vij Pamy
John Smitherson
John Dinkleberg
Questions:
1) Am I using the non-capturing group ?: incorrectly? I'm trying to optionally match "Dr." while not capturing it.
2) How would I store the result of that echo back into the array names? I have tried setting it to
i=echo $i|grep -P '(?:Dr\.)?\w+ \w+|$' -o | head -1
i=$(echo $i|grep -P '(?:Dr\.)?\w+ \w+|$' -o | head -1)
i=`echo $i|grep -P '(?:Dr\.)?\w+ \w+|$' -o | head -1`
but to no avail. I only started learning bash 2 days ago and I feel like my syntax is slightly off. Any help is appreciated.
Your (?:Dr\.)? makes "Dr." optional, but when it is present it is still part of the overall match. You probably want a negative lookahead like (?!Dr\.)\w+ \w+. I'll throw in a leading \b anchor as a bonus.
names=('Dr. Praveen Hishnadas' 'Dr. Vij Pamy' 'John Smitherson,Dr.,Service Account' 'John Dinkleberg,Dr.,Service Account')
for i in "${names[@]}"
do
grep -P '\b(?!Dr\.)\w+ \w+' -o <<<"$i" |
head -n 1
done
It doesn't matter for the examples you provided, but you should basically always quote your variables. See When to wrap quotes around a shell variable?
Maybe also google "falsehoods programmers believe about names".
To update your array, loop over the array indices and assign back into the array.
for ((i=0; i<${#names[@]}; ++i)); do
names[$i]=$(grep -P '\b(?!Dr\.)\w+ \w+|$' -o <<<"${names[i]}" | head -n 1)
done
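As an aside, for exactly the four formats shown, the same cleanup can be done without grep at all, using only bash parameter expansion (this swaps the regex approach for plain prefix/suffix stripping, so it only covers the patterns in the sample data):

```shell
names=('Dr. Praveen Hishnadas' 'Dr. Vij Pamy' \
       'John Smitherson,Dr.,Service Account' 'John Dinkleberg,Dr.,Service Account')

for ((i = 0; i < ${#names[@]}; ++i)); do
  n=${names[$i]#Dr. }   # drop a leading "Dr. " prefix, if present
  n=${n%%,*}            # drop everything from the first comma on
  names[$i]=$n
done

printf '%s\n' "${names[@]}"
# Praveen Hishnadas
# Vij Pamy
# John Smitherson
# John Dinkleberg
```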
How about something like this for the regex?
(?:^|\.\s)(\w+)\s+(\w+)
Regex Demo
(?: # Non-capturing group
^|\.\s # Start match if start of line or following dot+space sequence
)
(\w+) # Group 1 captures the first name
\s+ # Match unlimited number of spaces between first and last name (take + off to match 1 space)
(\w+) # Group 2 captures surname.

(pdksh) Looping through n files and store their dates in array

Using pdksh
stat() command is unavailable on system.
I need to loop through the files found and store their dates in an array. $COMMAND stores the number of files found in $location, as can be seen below.
Can someone help me please?
COMMAND=`find $location -type f | wc -l`
CMD_getDate=$(find $location -type f | xargs ls -lrt | awk '{print $6} {print $7}')
Well, first, you don't need to do the wc. The size of the array will tell you how many there are. Here's a simple way to build an array of dates and names (designed for pdksh; there are better ways to do it in AT&T ksh or bash):
set -A files
set -A dates
find "$location" -type f -ls |&
while read -p inum blocks symode links owner group size rest; do
set -A files "${files[@]}" "${rest##* }"
set -A dates "${dates[@]}" "${rest% *}"
done
Here's one way to examine the results:
print "Found ${#files[@]} files:"
let i=0
while (( i < ${#files[@]} )); do
print "The file '${files[i]}' was modified on ${dates[i]}."
let i+=1
done
This gives you the full date string, not just the month and day. That might be what you want - the date output of ls -l (or find -ls) is variable depending on how long ago the file was modified. Given these files, how does your original format distinguish between the modification times of a and b?
$ ls -l
total 0
-rw-rw-r--+ 1 mjreed staff 0 Feb 3 2014 a
-rw-rw-r--+ 1 mjreed staff 0 Feb 3 04:05 b
As written, the code above would yield this for the above directory with location=.:
Found 2 files:
The file './a' was modified on Feb 3 2014.
The file './b' was modified on Feb 3 00:00.
It would help if you indicated what the actual end goal is.

BASH store values in an array and check difference of each value

[CentOS, BASH, cron] Is there a method to declare variables that persist even when the system restarts?
The scenario is to snmpwalk interface I/O errors and store the values in an array. A cron job would snmpwalk again, say 5 mins later, producing another set of values. I would like to compare each of them with the previous value for the same interface. If the difference exceeds the threshold (50), an alert should be generated.
So the questions are: how do I store an array variable that would otherwise be lost when the system restarts? And how do I check the difference of each value in two arrays?
UPDATE Mar 16, 2012 I attach my final script here for your reference.
#!/bin/bash
# This script is to monitor interface Input/Output Errors of Cisco devices, by snmpwalk the error values every 5 mins, and send email alert if incremental value exceeds threshold (e.g. 500).
# Author: Wu Yajun | Created: 12Mar2012 | Updated: 16Mar2012
##########################################################################
DIR="$( cd "$( dirname "$0" )" && pwd )"
host=device.ip.addr.here
# Check and initiate .log file storing previous values, create .tmp file storing current values.
test -e $DIR/host1_ifInErrors.log || snmpwalk -c public -v 1 $host IF-MIB::ifInErrors > $DIR/host1_ifInErrors.log
snmpwalk -c public -v 1 $host IF-MIB::ifInErrors > $DIR/host1_ifInErrors.tmp
# Compare differences of the error values, and alert if diff exceeds threshold.
# To exclude checking some interfaces, e.g. Fa0/6, Fa0/10, Fa0/11, change the below "for loop" to style as:
# for i in {1..6} {8..10} {13..26}
totalIfNumber=$(echo $(wc -l $DIR/host1_ifInErrors.tmp) | sed 's/ \/root.*$//g')
for (( i=1; i<=$totalIfNumber; i++))
do
currentValue=$(cat $DIR/host1_ifInErrors.tmp | sed -n ''$i'p' | sed 's/^.*Counter32: //g')
previousValue=$(cat $DIR/host1_ifInErrors.log | sed -n ''$i'p' | sed 's/^.*Counter32: //g')
diff=$(($currentValue-$previousValue))
[ $diff -ge 500 ] && (ifName=$(echo $(snmpwalk -c public -v 1 $host IF-MIB::ifName.$i) | sed 's/^.*STRING: //g') ; echo "ATTENTION - Input Error detected from host1 interface $ifName" | mutt -s "ATTENTION - Input Error detected from host1 interface $ifName" <email address here>)
done
# Store current values for next time checking.
snmpwalk -c public -v 1 $host IF-MIB::ifInErrors > $DIR/host1_ifInErrors.log
Save the variables in a file. Add a date stamp:
echo "$(date)#... variables here ...." >> "$file"
Read the last values from the file - note that in bash, a read at the end of a pipeline runs in a subshell and its variables are lost, so feed read with a here-string instead:
read ... variables here .... <<< "$(tail -1 "$file" | cut "-d#" -f2)"
That also gives you a nice log file where you can monitor the changes. I suggest to always append to the file, so you can easily see when the service is down/didn't run for some reason.
To check for changes, you can use a simple if:
if [[ "...old values..." != "...new values..." ]]; then
send mail
fi
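For the "check the difference of each value in two arrays" part, here is a minimal bash sketch. The interface indices and error counts are made up for the demo, with a threshold of 500 as in the final script above:

```shell
# Hypothetical previous and current ifInErrors values; index = interface number
prev=(10 200 5)
curr=(12 800 5)
threshold=500

# Alert for every interface whose delta meets or exceeds the threshold
for ((i = 0; i < ${#prev[@]}; ++i)); do
  diff=$(( curr[i] - prev[i] ))
  if (( diff >= threshold )); then
    echo "interface $i: error delta $diff exceeds $threshold"
  fi
done
# interface 1: error delta 600 exceeds 500
```

In the real script the echo would be replaced by the mutt alert, and prev/curr would be read from the .log and .tmp snmpwalk snapshots.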