Strange behaviour while subtracting 2 string arrays - arrays

I am subtracting array1 from array2
My 2 arrays are
array1=(apps argocd cache core dev-monitoring-busk test-ci-cd)
array2=(apps argocd cache core default kube-system kube-public kube-node-lease monitoring)
And the way I'm subtracting them is
for i in "${array2[@]}"; do
    array1=(${array1[@]//$i})
done
echo ${array1[@]}
Now my expected result should be
dev-monitoring-busk test-ci-cd
But my actual result is
dev--busk test-ci-cd
The subtraction mostly looks right, but it is also deleting the string monitoring from dev-monitoring-busk, and I don't understand why. Can someone point out what's wrong here?
I know that there are other solutions out there for a diff between 2 arrays like
echo ${Array1[@]} ${Array2[@]} | tr ' ' '\n' | sort | uniq -u
But this is more of a diff and not a subtraction. So this does not work for me.

Bit of a kludge but it works ...
use comm to find those items unique to a (sorted) data set
use tr to convert between spaces (' ' == array element separator) and newlines ('\n' ; comm works on individual lines)
echo "${array1[@]}" | tr ' ' '\n' | sort : convert an array's elements into separate lines and sort
comm -23 (sorted data set #1) (sorted data set #2) : compare sorted data sets and return the rows that only exist in data set #1
Pulling this all together gives us:
$ array1=(apps argocd cache core dev-monitoring-busk test-ci-cd)
$ array2=(apps argocd cache core default kube-system kube-public kube-node-lease monitoring)
# find rows that only exist in array1
$ comm -23 <(echo "${array1[@]}" | tr ' ' '\n' | sort) <(echo "${array2[@]}" | tr ' ' '\n' | sort)
dev-monitoring-busk
test-ci-cd
# same thing but this time replace newlines with spaces (ie, pull all items onto a single line of output):
$ comm -23 <(echo "${array1[@]}" | tr ' ' '\n' | sort) <(echo "${array2[@]}" | tr ' ' '\n' | sort) | tr '\n' ' '
dev-monitoring-busk test-ci-cd
NOTEs about comm:
- takes 2 sorted data sets as input
- generates 3 columns of output:
  - (output column #1) rows only in data set #1
  - (output column #2) rows only in data set #2
  - (output column #3) rows in both data sets #1 and #2
- `comm -xy` ==> discard output columns 'x' and 'y'
  - `comm -12` => discard output columns #1 and #2 => only show lines common to both data sets (output column #3)
  - `comm -23` => discard output columns #2 and #3 => only show lines that exist in data set #1 (output column #1)
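To make the column notes above concrete, here is a small sketch that runs the three common flag combinations on two throwaway lists. The array names left and right and their contents are invented for illustration; both lists are already sorted, which comm requires. It assumes bash (for process substitution) and a standard paste.

```bash
# Hypothetical sample data, already sorted.
left=(apple banana cherry)
right=(banana cherry date)

# -23 hides columns 2 and 3: lines only in the first input.
only_left=$(comm -23 <(printf '%s\n' "${left[@]}") <(printf '%s\n' "${right[@]}"))
# -13 hides columns 1 and 3: lines only in the second input.
only_right=$(comm -13 <(printf '%s\n' "${left[@]}") <(printf '%s\n' "${right[@]}"))
# -12 hides columns 1 and 2: lines common to both; paste joins them with spaces.
both=$(comm -12 <(printf '%s\n' "${left[@]}") <(printf '%s\n' "${right[@]}") | paste -sd' ' -)

echo "only in left:  $only_left"
echo "only in right: $only_right"
echo "in both:       $both"
```

Using printf '%s\n' instead of echo plus tr keeps elements that contain spaces on a single line.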

If I'm understanding correctly, what you want is not to subtract array1
from array2, but to subtract array2 from array1.
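As others are pointing out, bash pattern substitution does not compare whole array elements: ${array1[@]//$i} deletes every occurrence of $i inside each element, which is why monitoring disappears from dev-monitoring-busk.
Instead you can make use of an associative array if your bash version is 4.0 or later.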
Please try the following:
declare -a array1=(apps argocd cache core dev-monitoring-busk test-ci-cd)
declare -a array2=(apps argocd cache core default kube-system kube-public kube-node-lease monitoring)
declare -A mark
declare -a ans
for e in "${array2[@]}"; do
    mark[$e]=1
done
for e in "${array1[@]}"; do
    [[ ${mark[$e]} ]] || ans+=( "$e" )
done
echo "${ans[@]}"
It first iterates over array2 and marks its elements by using
an associative array mark.
It then iterates over array1 and adds an element to the answer
if it has not been marked.
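The difference between the substitution used in the question and an exact comparison can be reproduced in isolation. The sketch below uses a made-up two-element array called names; it is only meant to show the behaviour, not to replace the loop above.

```bash
# Hypothetical sample: one exact match and one element that merely contains the word.
names=(monitoring dev-monitoring-busk)

# The substitution deletes the substring from every element; the unquoted
# expansion then drops the element that became an empty string.
stripped=(${names[@]//monitoring})
declare -p stripped

# An exact string comparison only removes the element that equals the word.
kept=()
for n in "${names[@]}"; do
    [[ $n == "monitoring" ]] || kept+=("$n")
done
declare -p kept
```

The first array ends up holding only dev--busk, while the second keeps dev-monitoring-busk intact.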

Related

BASH: sorting an associative array with their keys

So I am quite struggling with arrays in shell scripting, especially dealing with sorting the key values. Here's what I have
declare -A array
array[0]=0
array[1]=4
array[2]=6
array[3]=1
So the array holds the values (0,4,6,1); sorted from largest to smallest they would be (6,4,1,0). Now, I wonder if I could sort the keys by their values and put them in a new array like this (sort of like ranking them):
newArray[0]=2 # 2 which was the key for 6
newArray[1]=1 # 1 which was the key for 4
newArray[2]=3 # 3 which was the key for 1
newArray[3]=0 # 0 which was the key for 0
I've tried some solutions, but they are heavily hard-coded and don't work in some situations. Any help would be appreciated.
Create a tuple of index+value.
Sort over value.
Remove values.
Read into an array.
array=(0 4 6 1)
tmp=$(
# for every index in the array
for ((i = 0; i < ${#array[@]}; ++i)); do
# output the index, space, an array value on every line
echo "$i ${array[i]}"
done |
# sort lines using Key as second column Numeric Reverse
sort -k2nr |
# using space as Delimiter, extract first Field from each line
cut -d' ' -f1
)
# Load tmp into an array separated by newlines.
readarray -t newArray <<<"$tmp"
# output
declare -p newArray
outputs:
declare -a newArray=([0]="2" [1]="1" [2]="3" [3]="0")
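The same index, sort, cut pipeline can be adapted to a real associative array by iterating over "${!name[@]}" instead of counting indexes. The following is a sketch under assumptions: the array score and its keys are invented, the keys contain no spaces or newlines, and bash 4 or later is available for declare -A and readarray.

```bash
# Hypothetical associative array; keys are arbitrary labels without spaces.
declare -A score=([alice]=0 [bob]=4 [carol]=6 [dave]=1)

# Emit "key value" lines, sort numerically on the value in descending order,
# keep only the key column, and read the keys into an indexed array.
readarray -t ranked < <(
    for key in "${!score[@]}"; do
        printf '%s %s\n' "$key" "${score[$key]}"
    done | sort -k2,2nr | cut -d' ' -f1
)

declare -p ranked
```

Because all values in this sample are distinct, the resulting order is carol, bob, dave, alice; with ties, the relative order of the tied keys depends on sort.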

How does bash array slicing work when start index is not provided?

I'm looking at a script, and I'm having trouble determining what is going on.
Here is an example:
# Command to get the last 4 occurrences of a pattern in a file
lsCommand="ls /my/directory | grep -i my_pattern | tail -4"
# Load the results of that command into an array
dirArray=($(echo $(eval $lsCommand) | tr ' ' '\n'))
# What is this doing?
yesterdaysFileArray=($(echo ${x[@]::$((${#x[@]} / 2))} | tr ' ' '\n'))
There is a lot going on here. I understand how arrays work, but I don't know how $x is getting referenced if it was never declared.
I see that the $((${#x[@]} / 2)) is taking the number of elements and dividing it in half, and the tr is used to create the array. But what else is going on?
I think the last line is an array slice in bash of the form ${array[@]:1:2}, where array[@] expands to the contents of the array and :1:2 takes a slice of length 2, starting at index 1.
In your case the start index is left empty, which bash treats as 0, and the length is half the number of elements in the array.
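A short sketch of the slice forms described above, using a made-up array called letters. The variable names are only for illustration.

```bash
# Hypothetical sample array.
letters=(a b c d e f)

# Start at index 1 and take 2 elements.
middle=("${letters[@]:1:2}")
# Empty start index is treated as 0; the length is evaluated arithmetically.
first_half=("${letters[@]::${#letters[@]}/2}")
# No length given: take everything from index 4 to the end.
tail_part=("${letters[@]:4}")

declare -p middle first_half tail_part
```

Quoting the expansion keeps each element intact even if it contains whitespace, so no tr round trip is needed to rebuild the array.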
But there is a much better way to do this in bash, shown below. Don't use eval; use the shell's built-in globbing support instead
cd /my/directory
fileArray=()
for file in *my_pattern*; do
    # return assumes this runs inside a function; use exit 1 in a plain script
    [[ -f "$file" ]] || { printf '%s\n' 'no file found'; return 1; }
    fileArray+=( "$file" )
done
and do
printf '%s\n' "${fileArray[@]::${#fileArray[@]}/2}"

Removing list of comma separated words from a sentence

I have two variables as follows :
sentence="name string,age int,address string,dob timestamp,job string"
ignore="age int,dob timestamp"
Basically I need to iterate through the comma separated variable $ignore and remove each entry from the above variable $sentence.
After performing this the output sentence should be as below:
echo $outputsentence
name string,address string,job string
Should I create an array of the words to be ignored, iterate through it and perform a sed operation? Is there any other way around?
With GNU sed:
pattern=$(sed "s/,/|/g" <<< "$ignore")
outputsentence=$(sed -r 's/('"$pattern"'),*//g' <<< "$sentence")
The first sed command replaces every , in the ignore list with the alternation operator |.
The result is then used as the pattern that removes the matching strings from $sentence.
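To see the intermediate pattern, the two steps can be run with the values from the question. This sketch swaps the first sed call for a bash parameter expansion and uses sed -E rather than -r (both select extended regular expressions in GNU sed). It assumes the ignore entries contain no regex metacharacters and no / characters.

```bash
sentence="name string,age int,address string,dob timestamp,job string"
ignore="age int,dob timestamp"

# Turn the comma-separated ignore list into an alternation.
pattern=${ignore//,/|}
echo "pattern: $pattern"

# Delete each ignored entry together with any commas that follow it.
outputsentence=$(sed -E "s/($pattern),*//g" <<< "$sentence")
echo "$outputsentence"
```

One limitation of this approach: if the last entry of the sentence is ignored, the comma in front of it is left behind as a trailing comma.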
This is a situation that requires sets: you want to know which members of set A are not present in set B.
For this we have a beautiful article Set Operations in the Unix Shell that describes all of them.
If you want to check the intersection of sets, say:
$ comm -12 <(tr ',' '\n' <<< "$sentence" | sort) <(tr ',' '\n' <<< "$ignore" | sort)
age int
dob timestamp
For complement, use comm -23:
$ comm -23 <(tr ',' '\n' <<< "$sentence" | sort) <(tr ',' '\n' <<< "$ignore" | sort)
address string
job string
name string
Note that tr ',' '\n' <<< "$var" | sort just splits the ,-separated string into one item per line and sorts the lines. The <( ) construct is a process substitution.
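If the result is needed as a single comma-separated string again, the output of comm -23 can be rejoined with paste. This is a sketch with the variables from the question; the name remaining is invented, and the entries come back in sorted order rather than in their original order.

```bash
sentence="name string,age int,address string,dob timestamp,job string"
ignore="age int,dob timestamp"

# Split both lists into sorted lines, keep the lines unique to the first,
# then glue the survivors back together with commas.
remaining=$(
    comm -23 <(tr ',' '\n' <<< "$sentence" | sort) \
             <(tr ',' '\n' <<< "$ignore" | sort) |
    paste -sd, -
)
echo "$remaining"
```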

Bash: find non-repeated elements in an array

I'm looking for a way to find non-repeated elements in an array in bash.
Simple example:
joined_arrays=(CVE-2015-4840 CVE-2015-4840 CVE-2015-4860 CVE-2015-4860 CVE-2016-3598)
<magic>
non_repeated=(CVE-2016-3598)
To give context, the goal here is to end up with an array of all package update CVEs that aren't generally available via 'yum update' on a host due to being excluded. The way I came up with doing such a thing is to populate 3 preliminary arrays:
available_updates=() #just what 'yum update' would provide
all_updates=() #including excluded ones
joined_updates=() # contents of both prior arrays
Then apply logic to joined_updates=() that would return only elements that are included exactly once. Any element with two occurrences is one that can be updated normally and doesn't need to end up in the 'excluded_updates=()' array.
Hopefully this makes sense. As I was typing it out I'm wondering if it might be simpler to just remove all elements found in available_updates=() from all_updates=(), leaving the remaining ones as the excluded updates.
Thanks!
One pure-bash approach is to store a counter in an associative array, and then look for items where the counter is exactly one:
declare -A seen=( )                    # create an associative array (requires bash 4)
for item in "${joined_arrays[@]}"; do  # iterate over original items
    (( seen[$item] += 1 ))             # increment value associated with item
done
declare -a non_repeated=( )
for item in "${!seen[@]}"; do          # iterate over keys
    if (( ${seen[$item]} == 1 )); then # if counter for that key is 1...
        non_repeated+=( "$item" )      # ...add that item to the output array.
    fi
done
declare -p non_repeated                # print result
Another, terser (but buggier -- doesn't work with values containing newline literals) approach is to take advantage of standard text manipulation tools:
non_repeated=( ) # setup
# use uniq -c to count; filter for results with a count of 1
while read -r count value; do
(( count == 1 )) && non_repeated+=( "$value" )
done < <(printf '%s\n' "${joined_arrays[@]}" | sort | uniq -c)
declare -p non_repeated # print result
...or, even terser (and buggier, requiring that the array value split into exactly one field in awk):
readarray -t non_repeated \
  < <(printf '%s\n' "${joined_arrays[@]}" | sort | uniq -c | awk '$1 == 1 { print $2; }')
To crib an answer I really should have come up with myself from @Aaron (who deserves an upvote from anyone using this; do note that it retains the doesn't-work-with-values-with-newlines bug), one can also use uniq -u:
readarray -t non_repeated < <(printf '%s\n' "${joined_arrays[@]}" | sort | uniq -u)
I would rely on uniq.
Its -u option is made for this exact case, outputting only the lines that occur exactly once. It relies on the input being a sorted, linefeed-separated list of tokens, hence the printf '%s\n' and sort:
$ my_test_array=( 1 2 3 2 1 0 )
$ printf '%s\n' "${my_test_array[@]}" | sort | uniq -u
0
3
Here is a single awk based solution that doesn't require sort:
arr=( 1 2 3 2 1 0 )
printf '%s\n' "${arr[@]}" |
awk '{++fq[$0]} END{for(i in fq) if (fq[i]==1) print i}'
0
3
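The order in which awk walks an array with for (i in fq) is unspecified, so the lines above may come out in a different order on another awk. If the original order matters, one possible variant records the first-seen position of each line. This is a sketch; the names count, order and singles are invented, and readarray needs bash 4 or later.

```bash
arr=( 1 2 3 2 1 0 )

# Count every line, remember the order in which distinct lines first appear,
# then print only the lines that were counted exactly once.
readarray -t singles < <(
    printf '%s\n' "${arr[@]}" |
    awk '!count[$0]++ { order[++n] = $0 }
         END { for (i = 1; i <= n; i++) if (count[order[i]] == 1) print order[i] }'
)

declare -p singles
```

For this sample the result is 3 followed by 0, matching their positions in arr.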

comparing 2 strings element of an array

a=0
b=1
for a in ${my_array[@]}
do
    for b in ${my_array[@]}
    do
        if [ " ${my_array[a]} " = " ${my_array[b]} " ]
        then
            continue
            ((a++))
        fi
    done
    ((b++))
done
Hi. I want to compare 2 strings that are in the same array. If they are the same, I want to print just one of them. How can I do that? I wrote some code. There are 2 things (a and b): a starts at 0 and refers to the first element of the array; b starts at 1 and refers to the element at index 1. I want to compare them, and if they are the same string, print just one of them, so I use "continue". I think my code is right, but it doesn't work. There is a mistake which I can't see. Can you help me?
for example it runs like that .
Enter words :
erica 17
sally 16
john 18
henry 17
john 18
jessica 19
As you see, there are 2 john 18 entries. I don't want both of them. My program should check whether 2 strings are the same; if they are, I will use just one of them.
In an if statement using [ ], a single "=" compares strings ("==" is a bash synonym for it); neither one assigns.
If I understand correctly, you want to uniquify the elements of an array. If this is right, then the following stackoverflow question (How can I get unique values from an array in Bash?) appears to answer it in the following one-liner:
echo "${my_array[@]}" | tr ' ' '\n' | sort | uniq
Unfortunately, since your input words (or elements of array) contain spaces, the above will not work as expected. The issue is because the first echo will flatten-out the array into space-separated elements. The solution to this would be to use printf and remove 'tr'. Here it is...
printf '%s\n' "${my_array[@]}" | sort | uniq -c
But this alters the position of elements with respect to the original array. Hope that is fine?
You can use sort and uniq to be able to get your desired output:
echo "${my_array[@]}" | tr ' ' '\n' | sort | uniq -c | tr '\n' ' '
And if you have spaces in your input you can use this:
printf '%s\n' "${my_array[@]}" | sort | uniq -c
That way you will get the number of times the string occurred but it will be printed just once.
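Since sort changes the order of the entries, a pure-bash alternative is to remember what has already been seen in an associative array and keep only the first occurrence. This is a sketch using the sample input from the question; the names seen and unique are invented, entries are assumed to be non-empty, and bash 4 or later is required.

```bash
my_array=("erica 17" "sally 16" "john 18" "henry 17" "john 18" "jessica 19")

declare -A seen=()
unique=()
for entry in "${my_array[@]}"; do
    # Skip entries that were already recorded; otherwise record and keep them.
    [[ -n ${seen[$entry]+x} ]] && continue
    seen[$entry]=1
    unique+=("$entry")
done

printf '%s\n' "${unique[@]}"
```

The second john 18 is dropped and the remaining five entries keep their original order.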

Resources