I've seen quite a few solutions for keeping an array populated from a pipe, but none of them did the trick for me. My script currently works correctly, except that the array "databasesarray" is lost after "done". How would I go about keeping this information with my complex pipe scheme?
databasesarray=()
N=0
dbs -d 123123 | grep db | awk '{print $2}' | while read db; do
    databasesarray[$N]="$db"
    databasesarray[$N]+=$(gdb $db | grep dn)
    echo ${N} ${databasesarray[$N]}
    N=$(($N + 1))
done
A better and more efficient way to fill the array in a loop:
databasesarray=()
while read -r db; do
    databasesarray+=( "$db $(gdb "$db" | grep "dn")" )
done < <(dbs -d 123123 | awk '/db/{print $2}')
Your grep and awk can be combined into one.
Instead of piping into while, it is better to use the process substitution < <(...) syntax.
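To see why the pipe version loses the array: every stage of a pipeline runs in its own subshell, so assignments made inside the loop vanish when the pipeline ends. A minimal sketch of the difference, using printf as a stand-in for the dbs pipeline:

arr=()
printf 'a\nb\n' | while read -r x; do arr+=("$x"); done
echo "${#arr[@]}"    # prints 0 -- the loop ran in a subshell

arr=()
while read -r x; do arr+=("$x"); done < <(printf 'a\nb\n')
echo "${#arr[@]}"    # prints 2 -- the loop ran in the current shell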
PS: You could use readarray to fill the array directly (read -a would only consume the first line of input):
readarray -t databasesarray < <(dbs -d 123123 | awk '/db/{print $2}')
Data in log file is something like:
"ipHostNumber: 127.0.0.1
ipHostNumber: 127.0.0.2
ipHostNumber: 127.0.0.3"
That's my code snippet:
readarray -t iparray < "$destlog"
unset iparray[0]
for ip in "${iparray[@]}"
do
    IFS=$'\n' read -r -a iparraychanged <<< "${iparray[@]}"
done
What I want is to transfer the IPs to another array and then read every line from that array and ping it.
UPDATE: I've realized something: I probably want to copy one array into another, but without part of the string; in this case, cut off "ipHostNumber: " so that only the IPs remain.
Thanks in advance; if there's anything missing, please let me know.
Why do you need arrays at all? Just read the input and do the work.
while IFS=' ' read -r _ ip; do
    ping -c1 "$ip"
done < "$destlog"
See https://mywiki.wooledge.org/BashFAQ/001 .
Another way is to use xargs and filtering:
awk '{print $2}' "$destlog" | xargs -P0 -n1 ping -c1
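A note on the flags, assuming GNU xargs: -P0 runs as many pings in parallel as possible, and -n1 passes one address per invocation. If you'd rather cap the concurrency, give -P a number:

awk '{print $2}' "$destlog" | xargs -P4 -n1 ping -c1    # at most four pings at a time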
I prefer KamilCuk's xargs solution, but if you just wanted the array because you plan to reuse it -
$: readarray -t ip < <( sed '1d; s/^.* //' file )
Now it's loaded.
$: declare -p ip
declare -a ip=([0]="127.0.0.2" [1]="127.0.0.3")
$: for addr in "${ip[@]}"; do ping -c 1 "$addr"; done
I am trying to write a script to grab the users from the passwd file:
USERS_LIST=( $( cat /etc/passwd | cut -d":" -f1 ) )
The above did the trick up until now, because I only had users with no spaces in their names.
However, this is not the case anymore. I need to be able to resolve usernames that may very well have spaces in their names.
I tried reading line by line the file, but the same problem exists (this is one line but I have indented it for clarity here):
tk=($( while read line; do
        j=$(echo ${line} | cut -d":" -f1)
        echo "$j"
    done < /etc/passwd ))
Unfortunately, if I try to print the array, usernames with spaces are split into two array cells.
So the username "named user" will occupy array locations [0] and [1].
How can I fix that in sh shell?
thank you for your help!
Arrays are bash (and ksh, and zsh) features not present in POSIX sh, so I'm assuming that you mean to ask about bash. You can't store anything in an array in sh, since sh doesn't have arrays.
Don't populate an array that way.
users_list=( $( cat /etc/passwd | cut -d":" -f1 ) )
...string-splits and glob-expands contents. Instead:
# This requires bash 4.0 or later
mapfile -t users_list < <(cut -d: -f1 </etc/passwd)
...or...
IFS=$'\n' read -r -d '' -a users_list < <(cut -d: -f1 </etc/passwd)
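To see the splitting problem concretely, here is a quick sketch using a hypothetical passwd line whose username contains a space:

printf 'named user:x:999:999::/home/nu:/bin/sh\n' > /tmp/passwd.test
users_list=( $(cut -d: -f1 </tmp/passwd.test) )
echo "${#users_list[@]}"    # prints 2 -- the name was split on the space
mapfile -t users_list < <(cut -d: -f1 </tmp/passwd.test)
echo "${#users_list[@]}"    # prints 1 -- one array entry per line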
Now, if you really want POSIX sh compatibility, there is one array -- exactly one, the argument list. You can overwrite it if you see fit.
set --
cut -d: -f1 </etc/passwd >tempfile
while read -r username; do
set -- "$#" "$username"
done <tempfile
At that point, "$@" is an array of usernames.
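A brief usage sketch of that list (the printf is just a placeholder for whatever per-user work you need):

for username in "$@"; do
    printf 'found user: %s\n' "$username"
done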
I have tried everything I can think of to split this into separate elements for my array, but I am struggling.
Here is what I am trying to do:
(This command just rips out the IP addresses on the first element returned.)
$ IFS=$"\n"
$ aaa=( $(netstat -nr | grep -v '^0.0.0.0' | grep -v 'eth' | grep "UGH" | sed 's/ .*//') )
$ echo "${#aaa[#]}"
1
$ echo "${aaa[0]}"
4.4.4.4
5.5.5.5
This shows more than one value in ${aaa[0]}, when I want the array to separate 4.4.4.4 into ${aaa[0]} and 5.5.5.5 into ${aaa[1]}.
I have tried:
IFS="\n"
IFS=$"\n"
IFS=" "
Very confused, as I have been working with arrays a lot recently and have never run into this particular issue.
Can someone tell me what I am doing wrong?
There is a very good example of how to use IFS + read -a to split a string into an array on this other Stack Overflow page:
How does splitting string to array by 'read' with IFS word separator in bash generated extra space element?
netstat is deprecated, replaced by ss, so I'm not sure how to reproduce your exact problem.
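One likely culprit in the attempts above: $"\n" is Bash's locale-translation quoting and leaves a literal backslash and n in IFS; ANSI-C quoting $'\n' is what produces an actual newline. A minimal sketch of splitting newline-separated output into an array, with hard-coded addresses standing in for the netstat output:

routes=$'4.4.4.4\n5.5.5.5'
IFS=$'\n' read -r -d '' -a aaa <<< "$routes"
echo "${#aaa[@]}"    # prints 2
echo "${aaa[0]}"     # 4.4.4.4
echo "${aaa[1]}"     # 5.5.5.5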
For a larger project whose details aren't relevant here, I need to collect system stats from the local system or a remote system. Since I'm collecting the same stats either way, I'm preventing code duplication by storing the stats-collecting commands in a Bash associative array.
declare -A stats_cmds
# Actually contains many more key:value pairs, similar style
stats_cmds=([total_ram]="$(free -m | awk '/^Mem:/{print $2}')")
I can collect local system stats like this:
get_local_system_stats()
{
# Collect stats about local system
complex_data_structure_that_doesnt_matter=${stats_cmds[total_ram]}
# Many more similar calls here
}
A precondition of my script is that ~/.ssh/config is set up such that ssh $SSH_HOSTNAME works without any user input. I would like something like this:
get_remote_system_stats()
{
# Collect stats about remote system
complex_data_structure_that_doesnt_matter=`ssh $SSH_HOSTNAME ${stats_cmds[total_ram]}`
}
I've tried every combination of single quotes, double quotes, backticks and such that I can imagine. Some combinations result in the stats command getting executed too early (bash: 7986: command not found), others cause syntax errors, others return null (single quotes around the stats command) but none store the proper result in my data structure.
How can I evaluate a command, stored in an associative array, on a remote system via SSH and store the result in a data structure in my local script?
Make sure that the commands you store in your array don't get expanded when you assign your array!
Also note that the complex-looking quoting style is necessary when nesting single quotes. See this SO post for an explanation.
stats_cmds=([total_ram]='free -m | awk '"'"'/^Mem:/{print $2}'"'"'')
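An equivalent assignment that some find easier to read keeps the whole value in double quotes and escapes the dollar sign instead; this is only an alternative spelling that stores the same command string:

stats_cmds=([total_ram]="free -m | awk '/^Mem:/{print \$2}'")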
And then just launch your ssh as:
sh "$ssh_hostname" "${stats_cmds[total_ram]}"
(Yeah, I lowercased your variable name: all-uppercase names in Bash are best reserved for environment and shell variables.) Then:
get_local_system_stats() {
# Collect stats about local system
complex_data_structure_that_doesnt_matter=$( ${stats_cmds[total_ram]} )
# Many more similar calls here
}
and
get_remote_system_stats() {
# Collect stats about remote system
complex_data_structure_that_doesnt_matter=$(ssh "$ssh_hostname" "${stats_cmds[total_ram]}")
}
First, I'm going to suggest an approach that makes minimal changes to your existing implementation. Then, I'm going to demonstrate something closer to best practices.
Smallest Modification
Given your existing code:
declare -A remote_stats_cmds
remote_stats_cmds=([total_ram]='free -m | awk '"'"'/^Mem:/{print $2}'"'"''
[used_ram]='free -m | awk '"'"'/^Mem:/{print $3}'"'"''
[free_ram]='free -m | awk '"'"'/^Mem:/{print $4}'"'"''
[cpus]='nproc'
[one_min_load]='uptime | awk -F'"'"'[a-z]:'"'"' '"'"'{print $2}'"'"' | awk -F "," '"'"'{print $1}'"'"' | tr -d " "'
[five_min_load]='uptime | awk -F'"'"'[a-z]:'"'"' '"'"'{print $2}'"'"' | awk -F "," '"'"'{print $2}'"'"' | tr -d " "'
[fifteen_min_load]='uptime | awk -F'"'"'[a-z]:'"'"' '"'"'{print $2}'"'"' | awk -F "," '"'"'{print $3}'"'"' | tr -d " "'
[iowait]='cat /proc/stat | awk '"'"'NR==1 {print $6}'"'"''
[steal_time]='cat /proc/stat | awk '"'"'NR==1 {print $9}'"'"'')
...one can evaluate these locally as follows:
result=$(eval "${remote_stats_cmds[iowait]}")
echo "$result" # demonstrate value retrieved
...or remotely as follows:
result=$(ssh "$hostname" bash <<<"${remote_stats_cmds[iowait]}")
echo "$result" # demonstrate value retrieved
No separate form of the commands is required for the remote case.
The Right Thing
Now, let's talk about an entirely different way to do this:
# no awful nested quoting by hand!
collect_total_ram() { free -m | awk '/^Mem:/ {print $2}'; }
collect_used_ram() { free -m | awk '/^Mem:/ {print $3}'; }
collect_cpus() { nproc; }
...and then, to evaluate locally:
result=$(collect_cpus)
...or, to evaluate remotely:
result=$(ssh "$hostname" bash <<<"$(declare -f collect_cpus); collect_cpus")
...or, to iterate through defined functions with the collect_ prefix and do both of these things:
declare -A local_results
declare -A remote_results
while IFS= read -r funcname; do
local_results["${funcname#collect_}"]=$("$funcname")
remote_results["${funcname#collect_}"]=$(ssh "$hostname" bash <<<"$(declare -f "$funcname"); $funcname")
done < <(compgen -A function collect_)
...or, to collect all the items into a single remote array in one pass, avoiding extra SSH round-trips and not eval'ing or otherwise taking security risks with results received from the remote system:
remote_cmd=""
while IFS= read -r funcname; do
remote_cmd+="$(declare -f "$funcname"); printf '%s\0' \"$funcname\" \"\$(\"$funcname\")\";"
done < <(compgen -A function collect_)
declare -A remote_results=( )
while IFS= read -r -d '' funcname && IFS= read -r -d '' result; do
remote_results["${funcname#collect_}"]=$result
done < <(ssh "$hostname" bash <<<"$remote_cmd")
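A short usage sketch once that loop finishes, dumping whatever was collected:

for stat in "${!remote_results[@]}"; do
    printf '%s: %s\n' "$stat" "${remote_results[$stat]}"
done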
I have a file (5MB) containing specific strings, and I have to grep those same strings (and other information) out of a big file (27GB).
To speed up the analysis, I split the 27GB file into 1GB files and then applied the following script (with the help of some people here). However, it is not very efficient: producing a 180KB file takes 30 hours!
Here's the script. Is there a more appropriate tool than grep? Or a more efficient way to use grep?
#!/bin/bash
NR_CPUS=4
count=0
for z in `echo {a..z}`; do
    for x in `echo {a..z}`; do
        for y in `echo {a..z}`; do
            for ids in $(cat input.sam | awk '{print $1}'); do
                grep $ids sample_"$z""$x""$y" | awk '{print $1" "$10" "$11}' >> output.txt &
                let count+=1
                [[ $((count%NR_CPUS)) -eq 0 ]] && wait
            done
        done
    done
done
A few things you can try:
1) You are reading input.sam multiple times. It only needs to be read once before your first loop starts. Save the ids to a temporary file which will be read by grep.
2) Prefix your grep command with LC_ALL=C to use the C locale instead of UTF-8. This will speed up grep.
3) Use fgrep because you're searching for a fixed string, not a regular expression.
4) Use -f to make grep read patterns from a file, rather than using a loop.
5) Don't write to the output file from multiple processes as you may end up with lines interleaving and a corrupt file.
After making those changes, this is what your script would become:
awk '{print $1}' input.sam > idsFile.txt
for z in {a..z}
do
    for x in {a..z}
    do
        for y in {a..z}
        do
            LC_ALL=C fgrep -f idsFile.txt sample_"$z""$x""$y" | awk '{print $1,$10,$11}'
        done
    done
done >> output.txt
Also, check out GNU Parallel which is designed to help you run jobs in parallel.
My initial thought is that you're repeatedly spawning grep. Spawning processes is very expensive (relatively), and I think you'd be better off with some sort of scripted solution (e.g. Perl) that doesn't require the continual process creation.
E.g. for each inner loop you're kicking off cat and awk (you won't need cat, since awk can read files; and in fact, doesn't this cat/awk combination return the same thing each time?) and then grep. Then you wait for four greps to finish and you go around again.
If you have to use grep, you can use
grep -f filename
to specify the set of patterns to match in filename, rather than a single pattern on the command line. I suspect from the above that you can pre-generate such a list.
OK, I have a test file containing 4-character strings, i.e. aaaa aaab aaac etc.
ls -lh test.txt
-rw-r--r-- 1 root pete 1.9G Jan 30 11:55 test.txt
time grep -e aaa -e bbb test.txt
<output>
real 0m19.250s
user 0m8.578s
sys 0m1.254s
time grep --mmap -e aaa -e bbb test.txt
<output>
real 0m18.087s
user 0m8.709s
sys 0m1.198s
So using the --mmap option shows a clear improvement on a 2GB file with two search patterns. If you take @BrianAgnew's advice and use a single invocation of grep, try the --mmap option.
Though it should be noted that mmap can be a bit quirky if the source file changes during the search.
From man grep:
--mmap
If possible, use the mmap(2) system call to read input, instead of the default read(2) system call. In some situations, --mmap yields better performance. However, --mmap can cause undefined behavior (including core dumps) if an input file shrinks while grep is operating, or if an I/O error occurs.
Using GNU Parallel it would look like this:
awk '{print $1}' input.sam > idsFile.txt
doit() {
LC_ALL=C fgrep -f idsFile.txt sample_"$1" | awk '{print $1,$10,$11}'
}
export -f doit
parallel doit {1}{2}{3} ::: {a..z} ::: {a..z} ::: {a..z} > output.txt
If the order of the lines is not important, this will be a bit faster:
parallel --line-buffer doit {1}{2}{3} ::: {a..z} ::: {a..z} ::: {a..z} > output.txt