I am writing a script that stores client connection data such as session ID, server, and port in an associative array. I need to ssh onto these hosts and run an lsof command to find which process is using each port.
declare -A HOSTMAP
HOSTMAP[session]=$session_ids
HOSTMAP[cname]=$cnames
HOSTMAP[port]=$ports
When printed, the data in the array is displayed like so (host names and ports have been changed to protect the innocent):
for g in "${!HOSTMAP[@]}"
do
printf "[%s]=%s\n" "$g" "${HOSTMAP[$g]}"
done
[cname]=hostname1
hostname2
hostname3
hostname4
hostname5
hostname6
hostname7
hostname8
[session]=44
5
3
9
14
71
65
47
[port]=11111
22222
33333
44444
55555
66666
77777
88888
I would like to do an operation akin to the following:
for session in $session_id
do
echo "Discovering application mapped to session ${session} on ${cname}:${port}"
ssh -tq ${cname} "lsof -Tcp | grep ${port}"
done
Many thanks in advance for advising on an elegant solution
bash doesn't allow nesting of arrays. Here, I would just use separate indexed arrays. However, since your session IDs appear to be integers, and indexed arrays are sparse, you can use the session id as the index for the cnames and the ports.
cnames=([44]=hostname1 [5]=hostname2 [3]=hostname3)
ports=([44]=1111 [5]=2222 [3]=3333)
for session in "${!cnames[@]}"; do
cname=${cnames[$session]}
port=${ports[$session]}
echo "Discovering application mapped to session ${session} on ${cname}:${port}"
ssh -tq ${cname} "lsof -Tcp | grep ${port}"
done
You can assign the output of ssh to a variable
result=$(ssh -tq ${cname} "lsof -Tcp | grep ${port}")
Then you can extract the data you want from $result.
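For example, here is a minimal sketch of pulling the process name and PID back out of $result (it assumes the usual lsof column layout of COMMAND PID USER ..., reuses your lsof invocation as-is, and only looks at the first matching line):
result=$(ssh -tq "${cname}" "lsof -Tcp | grep ${port}")
# read the first line of the output; the trailing _ soaks up the remaining columns
read -r proc pid _ <<< "${result}"
echo "Session ${session}: ${proc} (pid ${pid}) is using port ${port} on ${cname}"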
Related
I have a very large CSV file (~10 mil rows) with 2 numeric columns representing ids. The requirement is: given the first id, return the second id very fast.
I need to get the CSV to behave like a map structure, and it has to be in memory. I couldn't find a way to expose awk variables back to the shell, so I thought of using bash associative arrays.
The problem is that loading the csv into an associative array gets very slow/stuck after ~8 mil rows. I've been trying to eliminate the causes of slowdown that I could think of: file reading/IO, associative array limitations. So, I have a couple of functions that read the file into an associative array, but all of them have the same slowness problem.
Here is the test data
loadSplittedFilesViaMultipleArrays -> assumes the original file was split into smaller files (1 mil rows) and uses a while read loop to build 4 associative arrays (max 3 mil records each)
loadSingleFileViaReadarray -> uses readarray to read the original file into a temp array and then goes through that to build the associative array
loadSingleFileViaWhileRead -> uses a while read loop to build the associative array (a minimal sketch of this variant is shown below)
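For reference, a minimal sketch of what the while read variant looks like (the file name mapping.csv and the array name are placeholders):
#!/bin/bash
# minimal sketch of a while-read CSV loader; file and array names are placeholders
declare -A id_map
while IFS=, read -r first second; do
    id_map[$first]=$second
done < mapping.csv
echo "loaded ${#id_map[@]} entries"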
But I can't seem to figure it out. Maybe this way of doing it is completely wrong... Can anyone pitch in with some suggestions?
Bash is the wrong tool for an associative array of this size. Consider using a language better suited to it (Perl, Python, Ruby, PHP, JavaScript, etc.).
For a Bash-only environment, you could use a sqlite3 SQL database, which is usually available on systems that ship Bash. (It is not POSIX, however.)
First you would create the database from your csv file. There are many ways to do this (Perl, Python, Ruby, GUI tools), but it is simple enough to do interactively in the sqlite3 command-line shell (exp.db must not exist at this point):
$ sqlite3 exp.db
SQLite version 3.19.3 2017-06-27 16:48:08
Enter ".help" for usage hints.
sqlite> create table mapping (id integer primary key, n integer);
sqlite> .separator ","
sqlite> .import /tmp/mapping.csv mapping
sqlite> .quit
Or, pipe in the sql statements:
#!/bin/bash
cd /tmp
[[ -f exp.db ]] && rm exp.db # must be a new db as written
echo 'create table mapping (id integer primary key, n integer);
.separator ","
.import mapping.csv mapping' | sqlite3 exp.db
(Note: as written, exp.db must not exist or you will get INSERT failed: UNIQUE constraint failed: mapping.id. You can write it so the database exp.db is updated rather than recreated from the csv file, but you would probably want to use a language like Python, Perl, Tcl, Ruby, etc. to do that.)
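That said, if you would rather refresh an existing exp.db from a newer csv without deleting it, here is a hedged sketch that stays within Bash and the sqlite3 shell (it assumes the same headerless mapping.csv and the same table layout as above), importing into a staging table and then upserting:
#!/bin/bash
cd /tmp
sqlite3 exp.db <<'SQL'
create table if not exists mapping (id integer primary key, n integer);
create temp table staging (id integer, n integer);
.separator ","
.import mapping.csv staging
insert or replace into mapping select id, n from staging;
SQL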
Either way, that will create an indexed database mapping the first column onto the second. The import will take a little while (15-20 seconds with the 198 MB example), but it creates a new persistent database from the imported csv:
$ ls -l exp.db
-rw-r--r-- 1 dawg wheel 158105600 Nov 19 07:16 exp.db
Then you can quickly query that new database from Bash:
$ time sqlite3 exp.db 'select n from mapping where id=1350044575'
1347465036
real 0m0.004s
user 0m0.001s
sys 0m0.001s
That takes 4 milliseconds on my older iMac.
If you want to use Bash variables for your query you can concatenate or construct the query string as needed:
$ q=1350044575
$ sqlite3 exp.db 'select n from mapping where id='"$q"
1347465036
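A small convenience wrapper (the function name is hypothetical; it assumes the ids are numeric, as in the example) keeps the quoting in one place:
lookup() {
    sqlite3 exp.db "select n from mapping where id=$1"
}

lookup 1350044575    # prints 1347465036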
And since the db is persistent, you can just compare file times of the csv file to the db file to test whether you need to recreate it:
if [[ ! -f "$db_file" || "$csv_file" -nt "$db_file" ]]; then
[[ -f "$db_file" ]] && rm "$db_file"
echo "creating $db_file"
# create the db as above...
else
echo "reusing $db_file"
fi
# query the db...
More:
sqlite tutorial
sqlite home
Inspired by @HuStmpHrrr's comment, I thought about another, maybe simpler alternative.
You can use GNU Parallel to split the file up into 1MB (or other) sized chunks and then use all your CPU cores to search each of the resulting chunks in parallel:
parallel --pipepart -a mapping.csv --quote awk -F, -v k=1350044575 '$1==k{print $2;exit}'
1347465036
Takes under a second on my iMac and that was the very last record.
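If you want to reuse it, a hypothetical wrapper around the same invocation lets the key come from a Bash variable instead of being hard-coded:
# hypothetical wrapper; same GNU Parallel invocation as above
lookup_parallel() {
    local key=$1
    parallel --pipepart -a mapping.csv --quote \
        awk -F, -v k="$key" '$1==k{print $2; exit}'
}

lookup_parallel 1350044575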
I made a little Perl-based TCP server that reads the CSV into a hash and then sits looping forever doing lookups for requests coming via TCP from clients. It is pretty self-explanatory:
#!/usr/bin/perl
use strict;
use warnings;
################################################################################
# Load hash from CSV at startup
################################################################################
open DATA, "mapping.csv" or die "Cannot open mapping.csv: $!";
my %hash;
while( <DATA> ) {
chomp $_;
my ($field1,$field2) = split /,/, $_;
if( $field1 ne '' ) {
$hash{$field1} = $field2;
}
}
close DATA;
print "Ready\n";
################################################################################
# Answer queries forever
################################################################################
use IO::Socket::INET;
# auto-flush on socket
$| = 1;
my $port=5000;
# creating a listening socket
my $socket = new IO::Socket::INET (
LocalHost => '127.0.0.1',
LocalPort => $port,
Proto => 'tcp',
Listen => 5,
Reuse => 1
);
die "cannot create socket $!\n" unless $socket;
while(1)
{
# waiting for a new client connection
my $client_socket = $socket->accept();
my $data = "";
$client_socket->recv($data, 1024);
my $key=$data;
chomp $key;
my $reply = "ERROR: Not found $key";
if (defined $hash{$key}){
$reply=$hash{$key};
}
print "DEBUG: Received $key: Replying $reply\n";
$client_socket->send($reply);
# notify client that response has been sent
shutdown($client_socket, 1);
}
So, you save the code above as go.pl and then make it executable with:
chmod +x go.pl
then start the server in the background with:
./go.pl &
Then, when you want to do a lookup as a client, you send your key to localhost:5000 using the standard socat utility like this:
socat - TCP:127.0.0.1:5000 <<< "1350772177"
1347092335
As a quick benchmark, it does 1,000 lookups in 8 seconds.
START=$SECONDS; tail -1000 *csv | awk -F, '{print $1}' |
while read a; do echo $a | socat - TCP:127.0.0.1:5000 ; echo; done; echo $START,$SECONDS
It could probably be sped up by a slight change that handles multiple keys per request, to reduce socket connection and teardown overhead.
New to StackOverflow and new to bash scripting. I have a shell script that is attempting to do the following:
cd into a directory on a remote machine. Assume I have already established a successful SSH connection.
Save the email addresses from the command line input (these could range from 1 to X number of email addresses entered) into an array called 'emails'
Save the brand IDs (integers) from the command line input (these could range from 1 to X number of brand IDs entered) into an array called 'brands'
Use nested for loops to iterate over the 'emails' and 'brands' arrays and add each email address to each brand via add.py
I am running into trouble splitting up and saving data into each array, because I do not know where the command line indices of the emails will stop, and where the indices of the brands will begin. Is there any way I can accomplish this?
command line input I expect to look as follows:
me@some-remote-machine:~$ bash script.sh person1@gmail.com person2@gmail.com person3@gmail.com ... personX@gmail.com brand1 brand2 brand3 ... brandX
The contents of script.sh look like this:
#!/bin/bash
cd some/directory
emails= ???
brands= ???
for i in $emails
do
for a in $brands
do
python test.py add --email=$i --brand_id=$a --grant=manage
done
done
Thank you in advance, and please let me know if I can clarify or provide more information.
Use a sentinel argument that cannot possibly be a valid e-mail address. For example:
$ bash script.sh person1@gmail.com person2@gmail.com '***' brand1 brand2 brand3
Then in a loop, you can read arguments until you reach the non-email; everything after that is a brand.
#!/bin/bash
cd some/directory
while [[ $1 != '***' ]]; do
emails+=("$1")
shift
done
shift # Ignore the sentinel
brands=( "$#" ) # What's left
for i in "${emails[#]}"
do
for a in "${brands[#]}"
do
python test.py add --email="$i" --brand_id="$a" --grant=manage
done
done
If you can't modify the arguments that will be passed to script.sh, then perhaps you can distinguish between an address and a brand by the presence or absence of a @:
while [[ $1 = *@* ]]; do
emails+=("$1")
shift
done
brands=("$@")
I'm assuming that the numbers of addresses and brands are independent. Otherwise, you can simply look at the total number of arguments, $#. Say there are N of each. Then
emails=( "${#:1:$#/2}" ) # First half
brands=( "${#:$#/2+1}" ) # Second half
How can I tell, in bash, how many times an IP is repeated within a log for a specific search?
For example:
#!/bin/bash
# Log line: [Sat Jul 04 21:55:35 2015] [error] [client 192.168.1.39] Access denied with status code 403.
grep "status\scode\s403" /var/log/httpd/custom_error_log | while read line ; do
pattern='^\[.*?\]\s\[error\]\s\[client\s(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\].*?403'
[[ $line =~ $pattern ]]
res_remote_addr="${BASH_REMATCH[1]}.${BASH_REMATCH[2]}.${BASH_REMATCH[3]}.${BASH_REMATCH[4]}"
echo "Remote Addr: $res_remote_addr"
done
In the end, I need to know how many times each IP produced the 403 message, sorted from highest to lowest if possible.
Example output:
200.200.200.200 50 times.
200.200.200.201 40 times.
200.200.200.202 30 times.
... etc ...
We need this to create an HTML report of a series of events from a monthly Apache log (something like AWStats).
There are better ways. The following is my proposal, which should be more readable and easier to maintain:
grep -P -o '\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}' log_file | sort | uniq -c | sort -k1,1 -r -n
The output will be in the form:
count1 ip1
count2 ip2
Update:
To filter only 403:
grep -P -o '\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}(?=.*403)' log_file | sort | uniq -c | sort -k1,1 -r -n
Notice that a lookahead suffices here.
If the log file is in the format mentioned in the question, the best approach is to use awk to filter on the needed status code and output only the IP. Then use the uniq command to count each occurrence:
awk '/code 403/ {gsub(/\]/, "", $8); print $8}' error.log | sort | uniq -c | sort -n
In awk, we filter by the regexp /code 403/ and then, for matching lines, print the 8th whitespace-separated field, which is the IP (the gsub strips the trailing ] left over from the [client ...] bracket).
Then we need to sort the output so that identical IPs appear one after another; this is a requirement of the uniq program.
uniq -c prints each unique line from the input only once, preceded by the number of occurrences. Finally, we sort this list numerically to get the IPs ordered by count.
Sample output (first is the number of occurrences, second is the IP):
1 1.1.1.1
10 2.2.2.2
12 3.3.3.3
I need to make a new array, or just delete the duplicate elements from the existing array.
#The NTP IPS are the following ones:
#10.30.10.0, 10.30.10.0, 10.30.20.0, 10.30.20.0, 10.30.20.0
#!/bin/bash
ips_networks=()
for ip in "${ips_for_ntp[@]}"; do
ips_networks+=("${ip%.*}.0")
done
So I'll get ips_networks with duplicate IPs, but I need just one of each IP, either in another array or the same one. I have tried awk, set -A (which is not working on my Linux), and cut, but with no luck. Is there any way to make an array of unique values?
ips="10.30.10.0, 10.30.10.0, 10.30.20.0, 10.30.20.0, 10.30.20.0"
unique_ips=$(echo "$ips" | sed -e "s/\s\+//g" | sed -e "s/,/\n/g" | sort | uniq)
echo $unique_ips #10.30.10.0 10.30.20.0
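If you want the result in a Bash array rather than a space-separated string, one hedged alternative (it requires Bash 4+ for associative arrays; the sample input values below are illustrative) is to use an associative array as a "seen" set while building the list:
#!/bin/bash
# illustrative input; in your script this would be the existing ips_for_ntp array
ips_for_ntp=(10.30.10.1 10.30.10.2 10.30.20.1 10.30.20.2 10.30.20.3)
declare -A seen
ips_networks=()
for ip in "${ips_for_ntp[@]}"; do
    net=${ip%.*}.0
    if [[ -z ${seen[$net]} ]]; then
        seen[$net]=1
        ips_networks+=("$net")
    fi
done
printf '%s\n' "${ips_networks[@]}"    # prints 10.30.10.0 and 10.30.20.0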
How could I redirect output to multiple targets, say stdout, a file, a socket, and so on?
Say I have a system here, connected to some network. When it fails, the person supervising it via ssh should be able to notice it, or the GUI client should receive the error info, or, in the worst case, we can still find something in the log.
Or even more targets. Atomicity may or may not need to be guaranteed.
So how do I do this in bash and/or in C?
I think you are looking for the "tee" command.
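For example, to show a command's output on the terminal (say, the supervising ssh session) while also appending it to a log file and sending it to the system logger (the command name, log path, and tag here are placeholders):
# placeholders: some_command, the log path, and the logger tag are examples only
some_command 2>&1 | tee -a /var/log/myapp.log >(logger -t myapp)
tee always copies its input to stdout as well as to the files (and process substitutions) you name, which is what makes the fan-out work.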
You can redirect with tee to any number of files and to any commands too, like:
seq 50 | tee copy1 copy2 >((echo Original linecount: $(grep -c ''))>&2) | grep '9'
which prints:
9
19
29
39
49
Original linecount: 50 #printed to stderr
or
seq 50 | tee copy1 copy2 >((echo Original linecount: $(grep -c ''))>&2) | grep '9' | wc -l
which prints the count of numbers containing the digit 9 among the first 50 numbers, while making two copies of the original sequence...
Original linecount: 50 #stderr
5