"Push" onto bash associative array - arrays

I'm trying to run a script for all files in a directory with a common ID.
ls -1 *.vcf
a1.sourceA.vcf
a1.sourceB.vcf
a1.sourceC.vcf
a2.sourceA.vcf
a2.sourceB.vcf
a2.sourceC.vcf
a3.sourceA.vcf
a3.sourceC.vcf
The ID in each case precedes the first . (a1, a2 or a3) and for each ID I want to have all the sources for that ID in an associative array, keyed by the ID, e.g.;
a1 => [a1.sourceA.vcf, a1.sourceB.vcf, a1.sourceC.vcf]
I've attempted this as follows:
for file in $(ls *.vcf | sort)
do
id=$(echo $file | cut -d '.' -f 1)
vcfs[$id]+=$file
done
for i in "${!vcfs[@]}"
do
echo "key : $i"
echo "value: ${vcfs[$i]}"
echo " "
done
But I can't figure out how to get it working.
In Perl I would push values onto a hash of arrays in the loop:
push @{$vcfs{$id}}, $file;
to give me a data structure like this:
'a1' => [
'a1.sourceA.vcf',
'a1.sourceB.vcf',
'a1.sourceC.vcf'
],
'a3' => [
'a3.sourceA.vcf',
'a3.sourceC.vcf'
],
'a2' => [
'a2.sourceA.vcf',
'a2.sourceB.vcf',
'a2.sourceC.vcf'
]
How can I achieve this in bash?

From another answer given in question's comments
unset a1 a2 a3
function push {
local arr_name=$1
shift
if [[ $(declare -p "$arr_name" 2>&1) != "declare -a "* ]]
then
declare -g -a "$arr_name"
fi
declare -n array=$arr_name
array+=("$@")
}
for file in *.vcf; do [[ -e $file ]] && push "${file%%.*}" "$file"; done
(IFS=,;echo "${a1[*]}")
(IFS=,;echo "${a2[*]}")
(IFS=,;echo "${a3[*]}")
But depending on your needs, a for loop with a glob pattern may be sufficient:
for file in a1.*.vcf; do ... ; done
Finally, $(ls) should not be used in for loops, as some other answers do.
See: Why you shouldn't parse the output of ls
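Bash arrays cannot hold other arrays, which is why the `vcfs[$id]+=$file` attempt in the question ends up concatenating the names with no separator. Here is a minimal sketch of one workaround, assuming the file names contain no whitespace: keep each group as a space-separated string in an associative array (bash 4 or newer) and split it back into an array when needed. The `files` list is only a stand-in for the `*.vcf` glob.

```bash
#!/usr/bin/env bash
# Stand-in for: files=( *.vcf )
files=(a1.sourceA.vcf a1.sourceB.vcf a2.sourceA.vcf a3.sourceC.vcf)

declare -A vcfs=()
for file in "${files[@]}"; do
    id=${file%%.*}                       # text before the first "."
    vcfs[$id]+="${vcfs[$id]:+ }$file"    # add a separating space unless the value is still empty
done

for id in "${!vcfs[@]}"; do
    read -r -a group <<<"${vcfs[$id]}"   # split the string back into an array
    printf '%s => %d file(s): %s\n' "$id" "${#group[@]}" "${group[*]}"
done
```

The keys of an associative array come back in no particular order, so sort them first if the order matters.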

Related

Fetching data into an array

I have a file like this below:
-bash-4.2$ cat a1.txt
0 10.95.187.87 5444 up 0.333333 primary 0 false 0
1 10.95.187.88 5444 up 0.333333 standby 1 true 0
2 10.95.187.89 5444 up 0.333333 standby 0 false 0
I want to fetch the data from the above file into a 2D array.
Can you please help me with a suitable way to put into an array.
Also, after loading the data, we need a condition that checks whether the value in the 4th column is up or down. If it's up, that's OK; if it's down, the command below needs to be executed.
-bash-4.2$ pcp_attach_node -w -U pcpuser -h localhost -p 9898 0
(The value at the end is taken from the 1st column.)
You could try something like that:
while read -r line; do
declare -a array=( $line ) # use IFS
echo "${array[0]}"
echo "${array[1]}" # and so on
if [[ "${array[3]}" == down ]]; then
echo execute command...
fi
done < a1.txt
Or:
while read -r -a array; do
if [[ "${array[3]}" == down ]]; then
echo execute command...
fi
done < a1.txt
This works only if the fields are separated by whitespace (spaces or tabs).
You could probably mix that with regexp if you need more precise control of the format.
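Putting the pieces together, here is a minimal sketch of the read loop applied to the question's data. It makes a few assumptions: the sample rows are embedded in a variable instead of being read from a1.txt, the second row has been changed to `down` so that the condition fires, and the command is only echoed (a dry run) rather than executed.

```bash
#!/usr/bin/env bash
# Sample rows in the same layout as a1.txt; row 1 is set to "down" for the demo.
data='0 10.95.187.87 5444 up 0.333333 primary 0 false 0
1 10.95.187.88 5444 down 0.333333 standby 1 true 0
2 10.95.187.89 5444 up 0.333333 standby 0 false 0'

down_nodes=()
while read -r -a fields; do
    # fields[0] is the node ID, fields[3] is the status column
    if [[ ${fields[3]} == down ]]; then
        down_nodes+=("${fields[0]}")
    fi
done <<<"$data"

for node in "${down_nodes[@]}"; do
    # echo keeps this a dry run; remove it to really run the command
    echo pcp_attach_node -w -U pcpuser -h localhost -p 9898 "$node"
done
```

To read from the real file, replace `<<<"$data"` with `< a1.txt`.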
First, bash does not have true 2D arrays. You can, however, store the lines in a 1-D array.
Here is a script, parse1a.sh, that demonstrates emulating 2D arrays for the type of data you included:
#!/bin/bash
function get_element () {
line=${ARRAY[$1]}
echo $line | awk "{print \$$(($2+1))}" #+1 since awk is one-based
}
function set_element () {
line=${ARRAY[$1]}
declare -a SUBARRAY=($line)
SUBARRAY[$(($2))]=$3
ARRAY[$1]="${SUBARRAY[@]}"
}
ARRAY=()
while IFS='' read -r line || [[ -n "$line" ]]; do
#echo $line
ARRAY+=("$line")
done < "$1"
echo "Full array contents printout:"
printf "%s\n" "${ARRAY[@]}" # Full array contents printout.
for line in "${ARRAY[@]}"; do
#echo $line
if [ "$(echo $line | awk '{print $4}')" == "down" ]; then
echo "Replace this with what to do for down"
else
echo "...and any action for up - if required"
fi
done
echo "Element access of [2,3]:"
echo "get_element 2 3 : "
get_element 2 3
echo "set_element 2 3 left: "
set_element 2 3 left
echo "get_element 2 3 : "
get_element 2 3
echo "Full array contents printout:"
printf "%s\n" "${ARRAY[@]}" # Full array contents printout.
It can be executed by:
./parse1a.sh a1.txt
Hope this is close to what you are looking for. Note that this code will lose all indenting spaces during manipulation, but a formatted update of the lines could solve that.

Pass multiple arrays as arguments to a Bash script?

I've looked, but have only seen answers to one array being passed in a script.
I want to pass multiple arrays to a bash script that assigns them as individual variables as follows:
./myScript.sh ${array1[@]} ${array2[@]} ${array3[@]}
such that: var1=array1 and var2=array2 and var3=array3
I've tried multiple options, but doing variableName=("$@") combines all arrays together into each variable. I hope to have in my bash script a variable that represents each array.
The shell passes a single argument vector (that is to say, a simple C array of strings) off to a program being run. This is an OS-level limitation: There exists no method to pass structured data between two programs (any two programs, written in any language!) in an argument list, except by encoding that structure in the contents of the members of this array of C strings.
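The flattening described above is easy to see with a function, since function arguments behave like script arguments in this respect. This is only an illustrative sketch; `count_args` is a hypothetical helper, not part of any answer here.

```bash
#!/usr/bin/env bash
count_args() {
    # Reports how many separate arguments arrived; the array boundaries are gone.
    echo "$#"
}

array1=(one two three)
array2=("hello world" bee)

quoted=$(count_args "${array1[@]}" "${array2[@]}")   # one argument per element
unquoted=$(count_args ${array1[@]} ${array2[@]})     # "hello world" is split as well
echo "quoted=$quoted unquoted=$unquoted"
```

The callee sees five (or, unquoted, six) plain strings and has no way of telling where one array ended and the next began, which is why some extra encoding is needed.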
Approach: Length Prefixes
If efficiency is a goal (both in terms of ease-of-parsing and amount of space used out of the ARG_MAX limit on command-line and environment storage), one approach to consider is prefixing each array with an argument describing its length.
By providing length arguments, however, you can indicate which sections of that argument list are supposed to be part of a given array:
./myScript \
"${#array1[@]}" "${array1[@]}" \
"${#array2[@]}" "${array2[@]}" \
"${#array3[@]}" "${array3[@]}"
...then, inside the script, you can use the length arguments to split content back into arrays:
#!/usr/bin/env bash
array1=( "${@:2:$1}" ); shift "$(( $1 + 1 ))"
array2=( "${@:2:$1}" ); shift "$(( $1 + 1 ))"
array3=( "${@:2:$1}" ); shift "$(( $1 + 1 ))"
declare -p array1 array2 array3
If run as ./myScript 3 a b c 2 X Y 1 z, this has the output:
declare -a array1='([0]="a" [1]="b" [2]="c")'
declare -a array2='([0]="X" [1]="Y")'
declare -a array3='([0]="z")'
Approach: Per-Argument Array Name Prefixes
Incidentally, a practice common in the Python world (particularly with users of the argparse library) is to allow an argument to be passed more than once to append to a given array. In shell, this would look like:
./myScript \
"${array1[@]/#/--array1=}" \
"${array2[@]/#/--array2=}" \
"${array3[@]/#/--array3=}"
and then the code to parse it might look like:
#!/usr/bin/env bash
declare -a args array1 array2 array3
while (( $# )); do
case $1 in
--array1=*) array1+=( "${1#*=}" );;
--array2=*) array2+=( "${1#*=}" );;
--array3=*) array3+=( "${1#*=}" );;
*) args+=( "$1" );;
esac
shift
done
Thus, if your original value were array1=( one two three ) array2=( aye bee ) array3=( "hello world" ), the calling convention would be:
./myScript --array1=one --array1=two --array1=three \
--array2=aye --array2=bee \
--array3="hello world"
Approach: NUL-Delimited Streams
Another approach is to pass a filename for each array from which a NUL-delimited list of its contents can be read. One chief advantage of this approach is that the size of array contents does not count against ARG_MAX, the OS-enforced command-line length limit. Moreover, with an operating system where such is available, the below does not create real on-disk files but instead creates /dev/fd-style links to FIFOs written to by subshells writing the contents of each array.
./myScript \
<( (( ${#array1[@]} )) && printf '%s\0' "${array1[@]}") \
<( (( ${#array2[@]} )) && printf '%s\0' "${array2[@]}") \
<( (( ${#array3[@]} )) && printf '%s\0' "${array3[@]}")
...and, to read (with bash 4.4 or newer, providing mapfile -d):
#!/usr/bin/env bash
mapfile -d '' array1 <"$1"
mapfile -d '' array2 <"$2"
mapfile -d '' array3 <"$3"
...or, to support older bash releases:
#!/usr/bin/env bash
declare -a array1 array2 array3
while IFS= read -r -d '' entry; do array1+=( "$entry" ); done <"$1"
while IFS= read -r -d '' entry; do array2+=( "$entry" ); done <"$2"
while IFS= read -r -d '' entry; do array3+=( "$entry" ); done <"$3"
Charles Duffy's response works perfectly well, but I would go about it a different way that makes it simpler to initialize var1, var2 and var3 in your script:
./myScript.sh "${#array1[@]} ${#array2[@]} ${#array3[@]}" \
"${array1[@]}" "${array2[@]}" "${array3[@]}"
Then in myScript.sh
#!/bin/bash
declare -ai lens=($1);
declare -a var1=("${@:2:lens[0]}") var2=("${@:2+lens[0]:lens[1]}") var3=("${@:2+lens[0]+lens[1]:lens[2]}");
Edit: Since Charles has simplified his solution, it is probably a better and more clear solution than mine.
Here is a code sample, which shows how to pass 2 arrays to a function. There is nothing more than in previous answers except it provides a full code example.
This was written for bash 4.4.12. It relies on namerefs (local -n), which were introduced in bash 4.3, so older versions would require a different coding approach. One array contains the texts to be colorized, and the other array contains the colors to be used for each of the text elements:
function cecho_multitext () {
# usage : cecho_multitext message_array color_array
# what it does : Multiple Colored-echo.
local -n array_msgs=$1
local -n array_colors=$2
# printf '1: %q\n' "${array_msgs[@]}"
# printf '2: %q\n' "${array_colors[@]}"
local i=0
local coloredstring=""
local normalcoloredstring=""
# check array counts
# echo "msg size : "${#array_msgs[@]}
# echo "col size : "${#array_colors[@]}
[[ "${#array_msgs[@]}" -ne "${#array_colors[@]}" ]] && exit 2
# build the colored string
for msg in "${array_msgs[@]}"
do
color=${array_colors[$i]}
coloredstring="$coloredstring $color $msg "
normalcoloredstring="$normalcoloredstring $msg"
# echo -e "coloredstring ($i): $coloredstring"
i=$((i+1))
done
# DEBUG
# echo -e "colored string : $coloredstring"
# echo -e "normal color string : $normal $normalcoloredstring"
# use either echo or printf as follows :
# echo -e "$coloredstring"
printf '%b\n' "${coloredstring}"
return
}
Calling the function :
#!/bin/bash
green='\E[32m'
cyan='\E[36m'
white='\E[37m'
normal=$(tput sgr0)
declare -a text=("one" "two" "three" )
declare -a color=("$white" "$green" "$cyan")
cecho_multitext text color
Job done :-)
I do prefer using base64 to encode and decode arrays like:
encode_array(){
local array=("$@")
echo -n "${array[@]}" | base64
}
decode_array(){
echo -n "$@" | base64 -d
}
some_func(){
local arr1=($(decode_array $1))
local arr2=($(decode_array $2))
local arr3=($(decode_array $3))
echo arr1 has ${#arr1[@]} items, the second item is ${arr1[1]}
echo arr2 has ${#arr2[@]} items, the third item is ${arr2[2]}
echo arr3 has ${#arr3[@]} items, the here the contents ${arr3[@]}
}
a1=(ab cd ef)
a2=(gh ij kl nm)
a3=(op ql)
some_func "$(encode_array "${a1[@]}")" "$(encode_array "${a2[@]}")" "$(encode_array "${a3[@]}")"
The output is
arr1 has 3 items, the second item is cd
arr2 has 4 items, the third item is kl
arr3 has 2 items, the here the contents op ql
However, that will not work with values that contain tabs or spaces. If that is required, we need a more elaborate solution, something like:
encode_array()
{
for item in "$#";
do
echo -n "$item" | base64
done | paste -s -d , -
}
decode_array()
{
local IFS=$'\2'
local -a arr=($(echo "$1" | tr , "\n" |
while read encoded_array_item;
do
echo "$encoded_array_item" | base64 -d;
echo "$IFS"
done))
echo "${arr[*]}";
}
test_arrays_step1()
{
local IFS=$'\2'
local -a arr1=($(decode_array $1))
local -a arr2=($(decode_array $2))
local -a arr3=($(decode_array $3))
unset IFS
echo arr1 has ${#arr1[@]} items, the second item is ${arr1[1]}
echo arr2 has ${#arr2[@]} items, the third item is ${arr2[2]}
echo arr3 has ${#arr3[@]} items, the here the contents ${arr3[@]}
}
test_arrays()
{
local a1_2="$(echo -en "c\td")";
local a1=("a b" "$a1_2" "e f");
local a2=(gh ij kl nm);
local a3=(op ql );
a1_size=${#a1[@]};
resp=$(test_arrays_step1 "$(encode_array "${a1[@]}")" "$(encode_array "${a2[@]}")" "$(encode_array "${a3[@]}")");
echo -e "$resp" | grep arr1 | grep "arr1 has $a1_size items, the second item is $a1_2" || echo but it should have only $a1_size items, with the second item as $a1_2
echo "$resp"
}
Based on the answers to this question you could try the following.
Define the arrays as variable on the shell:
array1=(1 2 3)
array2=(3 4 5)
array3=(6 7 8)
Have a script like this:
arg1=("${!1}")
arg2=("${!2}")
arg3=("${!3}")
echo "arg1 array=${arg1[@]}"
echo "arg1 #elem=${#arg1[@]}"
echo "arg2 array=${arg2[@]}"
echo "arg2 #elem=${#arg2[@]}"
echo "arg3 array=${arg3[@]}"
echo "arg3 #elem=${#arg3[@]}"
And call it like this:
. ./test.sh "array1[@]" "array2[@]" "array3[@]"
Note that the script will need to be sourced (. or source) so that it is executed in the current shell environment and not in a separate process, where the arrays would not be visible.
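The same indirect-expansion trick also works for a function called in the current shell, which avoids having to source a separate file. A minimal sketch, with `show_arrays` as a hypothetical name:

```bash
#!/usr/bin/env bash
show_arrays() {
    # "${!1}" expands the string held in $1 (e.g. "array1[@]") as if it had been written inline
    local first=("${!1}")
    local second=("${!2}")
    echo "first has ${#first[@]} elements, second has ${#second[@]} elements"
}

array1=(1 2 3)
array2=("x y" z)
result=$(show_arrays "array1[@]" "array2[@]")
echo "$result"
```

One caveat: the caller's array names must not collide with the function's local names (`first` and `second` here), or the indirect expansion would pick up the local variable instead.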

Remove (not just unset) multiple strings from an array without knowing their positions

Say I have arrays
a1=(cats cats.in catses dogs dogs.in dogses)
a2=(cats.in dogs.in)
I want to remove everything from a1 that matches the strings in a2 after removing ".in", in addition to the ones that match completely (including ".in").
So from a1, I want to remove cats, cats.in, dogs, dogs.in, but not catses or dogses.
I think I'll have to do this in 2 steps. I found how to cut the ".in" away:
for elem in "${a2[@]}" ; do
var="${elem}"
len="${#var}"
pref=${var:0:len-3}
done
^ this gives me "cats" and "dogs"
What command do I need to add to the loop to remove each elem from a1?
Seems to me that the easiest way to solve this is with nested for loops:
#!/usr/bin/env bash
a1=(cats cats.in catses dogs dogs.in dogses)
a2=(cats.in dogs.in)
for x in "${!a1[@]}"; do # step through a1 by index
for y in "${a2[@]}"; do # step through a2 by content
if [[ "${a1[x]}" = "$y" || "${a1[x]}" = "${y%.in}" ]]; then
unset 'a1[x]' # quoted so the shell cannot glob-expand the subscript
fi
done
done
declare -p a1
But depending on your actual data, the following might be better, using two separate for loops instead of nesting.
#!/usr/bin/env bash
a1=(cats cats.in catses dogs dogs.in dogses)
a2=(cats.in dogs.in)
# Flip "a2" array to "b", stripping ".in" as we go...
declare -A b=()
for x in "${!a2[@]}"; do
b[${a2[x]%.in}]="$x"
done
# Check for the existence of the stripped version of the array content
# as an index of the associative array we created above.
for x in "${!a1[@]}"; do
[[ -n "${b[${a1[x]%.in}]}" ]] && unset 'a1[x]'
done
declare -p a1
The advantage here is that instead of looping through all of a2 for each item in a1, you loop just once over each array. The downsides depend on your data. For example, if the contents of a2 are very large, you might hit memory limits. Of course, I can't know that from what you included in your question; this solution works with the data you provided.
NOTE: this solution also depends on an associative array, which is a feature introduced to bash in version 4. If you're running an old version of bash, now might be a good time to upgrade. :)
This is the solution I went with:
for elem in "${a2[@]}" ; do
var="${elem}"
len="${#var}"
pref=${var:0:len-3}
#set 'cats' and 'dogs' to ''
for i in "${!a1[@]}" ; do
if [ "${a1[$i]}" = "$pref" ] ; then
a1[$i]=''
fi
#set 'cats.in' and 'dogs.in' to ''
if [ "${a1[$i]}" = "$var" ] ; then
a1[$i]=''
fi
done
done
Then I created a new array from a1 without the empty elements:
a1new=( )
for filename in "${a1[@]}" ; do
if [[ $filename != '' ]] ; then
a1new+=("${filename}")
fi
done
A naive approach would be:
#!/bin/bash
# Checks whether a value is in an array.
# Usage: inarray "$value" "${array[@]}"
inarray () {
local n=$1 h
shift
for h in "$@";do
[[ $n = "$h" ]] && return
done
return 1
}
a1=(cats cats.in catses dogs dogs.in dogses)
a2=(cats.in dogs.in)
result=()
for i in "${a1[@]}";do
if ! inarray "$i" "${a2[@]}" && ! inarray "$i" "${a2[@]%.in}"; then
result+=("$i")
fi
done
# Checking.
printf '%s\n' "${result[@]}"
If you only want to print the values to stdout, you might instead want to use comm:
comm -23 <(printf '%s\n' "${a1[@]}"|sort -u) <(printf '%s\n' "${a2[@]%.in}" "${a2[@]}"|sort -u)

Check multiple var if exists in array with grep

I am using this code to check one $var if exists in array :
if echo ${myArr[@]} | grep -qw $myVar; then echo "Var exists on array"; fi
How could I combine more than one $vars to my check? Something like grep -qw $var1,$var2; then ... fi
Thank you in Advance.
if echo "${myArr[@]}" | grep -qw -e "$myVar" -e "$otherVar"
then
echo "Var exists on array"
fi
From the man-page:
-e PATTERN, --regexp=PATTERN
Use PATTERN as the pattern.
This can be used to specify multiple search patterns, or to protect a pattern beginning with a hyphen (-). (-e is specified by POSIX.)
But if you want to use arrays like this you might as well use the bash built-in associative arrays.
To implement AND logic:
myVar1=home1
myVar2=home2
myArr[0]=home1
myArr[1]=home2
if echo "${myArr[@]}" | grep -qw -e "$myVar1.*$myVar2" -e "$myVar2.*$myVar1"
then
echo "Var exists on array"
fi
# using associative arrays
declare -A assoc
assoc[home1]=1
assoc[home2]=1
if [[ ${assoc[$myVar1]} && ${assoc[$myVar2]} ]]; then
echo "Var exists on array"
fi
Actually you don't need grep for this, Bash is perfectly capable of doing Extended Regular Expressions itself (Bash 3.0 or later).
pattern="$var1|$var2|$var3"
for element in "${myArr[@]}"
do
if [[ $element =~ $pattern ]]
then
echo "$pattern exists in array"
break
fi
done
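One thing to watch with the loop above: `=~` matches anywhere inside the element, so a value such as `home1` would also match an element `home10`. A minimal sketch of anchoring the alternation to whole elements, assuming the values contain no regex metacharacters (the sample values are made up for the demo):

```bash
#!/usr/bin/env bash
myArr=(home10 office)
var1=home1
var2=garden

loose="$var1|$var2"        # unanchored, as in the loop above
exact="^($var1|$var2)$"    # must match the whole element

loose_hit=no
exact_hit=no
for element in "${myArr[@]}"; do
    [[ $element =~ $loose ]] && loose_hit=yes   # substring match: home10 contains home1
    [[ $element =~ $exact ]] && exact_hit=yes   # whole-element match only
done
echo "loose=$loose_hit exact=$exact_hit"
```

Here the unanchored pattern reports a hit even though neither value is actually an element of the array.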
Something quadratic, but aware of spaces:
myArr=(aa "bb c" ddd)
has_values(){
for e in "${myArr[@]}" ; do
for f ; do
if [ "$e" = "$f" ]; then return 0 ; fi
done
done
return 1
}
if has_values "ee" "bb  c" ; then echo yes ; else echo "no" ; fi
This example prints no, because "bb c" (one space) != "bb  c" (two spaces).

Is there a way to search an entire array inside of an argument?

Posted my code below, wondering if I can search one array for a match... or if there's a way I can search a Unix file inside of an argument.
#!/bin/bash
# store words in file
cat $1 | ispell -l > file
# move words in file into array
array=($(< file))
# remove temp file
rm file
# move already checked words into array
checked=($(< .spelled))
# print out words & ask for corrections
for ((i=0; i<${#array[@]}; i++ ))
do
if [[ ! ${array[i]} = ${checked[@]} ]]; then
read -p "' ${array[i]} ' is mispelled. Press "Enter" to keep
this spelling, or type a correction here: " input
if [[ ! $input = "" ]]; then
correction[i]=$input
else
echo ${array[i]} >> .spelled
fi
fi
done
echo "MISPELLED: CORRECTIONS:"
for ((i=0; i<${#correction[@]}; i++ ))
do
echo ${array[i]} ${correction[i]}
done
Otherwise, I would need to write a for loop to check each array index, and then somehow decide whether to go through the loop and print/take input.
The usual shell idiom for this is:
ispell -l < "$1" | while read -r ln
do
# read the answer from the terminal, because stdin here is the pipe from ispell
read -r -p "$ln is misspelled. Enter correction: " corrected </dev/tty
if [ -n "$corrected" ] ; then
ln=$corrected
fi
echo "$ln"
done >correctedwords.txt
The while;do;done is kind of like a function and you can pipe data into and out of it.
P.S. I didn't test the above code so there may be syntax errors
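To illustrate the point about piping data into and out of a while loop, here is a small non-interactive sketch. It rests on two assumptions: `printf` stands in for the word list that `ispell -l` would produce, and a fixed lookup replaces the interactive prompt.

```bash
#!/usr/bin/env bash
result=$(
    printf '%s\n' teh quick borwn |
    while read -r word; do
        # A fixed lookup stands in for asking the user for a correction.
        case $word in
            (teh)   echo the ;;
            (borwn) echo brown ;;
            (*)     echo "$word" ;;
        esac
    done | sort
)
echo "$result"
```

The loop reads its input from the pipe on the left, and everything it prints flows into `sort` on the right, just as it would for a function or an external command.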

Resources