Bash find unique values in array1 not in array2 (and vice-versa) - arrays

In bash, I know that the values unique to either of two arrays can be found with:
echo "${array1[@]} ${array2[@]}" | tr ' ' '\n' | sort | uniq -u
However, this gives the values that are unique across BOTH arrays combined. What if I want the elements that are unique only to array1 and, separately, the elements that are unique only to array2? For example:
array1=(1 2 3 4 5)
array2=(2 3 4 5 6)
original_command_output = 1 6
new_command_output1 = 1
new_command_output2 = 6

You could use the comm command.
To get elements unique to the first array:
comm -23 \
<(printf '%s\n' "${array1[@]}" | sort) \
<(printf '%s\n' "${array2[@]}" | sort)
and elements unique to the second array:
comm -13 \
<(printf '%s\n' "${array1[@]}" | sort) \
<(printf '%s\n' "${array2[@]}" | sort)
Or, more robustly, to allow any character (including newlines) in the elements, delimit on the null byte instead. The -z option requires GNU comm and GNU sort:
comm -z -23 \
<(printf '%s\0' "${array1[@]}" | sort -z) \
<(printf '%s\0' "${array2[@]}" | sort -z)
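As a rough sketch, the two comm calls can be wrapped in a small helper so that each direction is a single call. The function name array_diff and the result arrays only1 and only2 are made up for this illustration; the nameref (local -n) assumes bash 4.3 or later, and the elements are assumed not to contain newlines.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the answer above): print the elements of the
# first named array that are absent from the second named array, one per line.
array_diff() {
    local -n _a=$1 _b=$2    # namerefs: assumes bash >= 4.3
    comm -23 <(printf '%s\n' "${_a[@]}" | sort -u) \
             <(printf '%s\n' "${_b[@]}" | sort -u)
}

array1=(1 2 3 4 5)
array2=(2 3 4 5 6)

# Collect each direction into its own array.
mapfile -t only1 < <(array_diff array1 array2)
mapfile -t only2 < <(array_diff array2 array1)

echo "new_command_output1 = ${only1[*]}"
echo "new_command_output2 = ${only2[*]}"
```

Swapping the argument order is all it takes to get the other direction, so comm -13 is not needed in this form.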

comm is probably the way to go but if you're running bash >= 4 then you can do it with associative arrays:
#!/bin/bash
declare -a array1=(1 2 3 4 5) array2=(2 3 4 5 6)
declare -A uniq1=() uniq2=()
for e in "${array1[@]}"; do uniq1[$e]=; done
for e in "${array2[@]}"; do
    if [[ ${uniq1[$e]-1} ]]
    then
        uniq2[$e]=
    else
        unset "uniq1[$e]"
    fi
done
echo "new_command_output1 = ${!uniq1[*]}"
echo "new_command_output2 = ${!uniq2[*]}"
new_command_output1 = 1
new_command_output2 = 6

Bash builtins can handle this without spawning any external commands. The loop below visits every element of both arrays and tests it against each array with a regex match. An element that fails to match in one of the arrays is unique and gets printed:
arr1=(1 2 3 4 5)
arr2=(1 3 2 4)
for i in "${arr1[@]}" "${arr2[@]}" ; do
    [[ ${arr2[@]} =~ $i ]] || echo $i
    [[ ${arr1[@]} =~ $i ]] || echo $i
done
output: 5
If either of your arrays has multi-character elements, e.g. 152, a plain substring match gives false positives (5 would match inside 152). In that case you must convert the arrays first, adding a literal character before and after each element, so that only an exact match succeeds. Because the right-hand side of =~ is quoted, ^ and $ are matched literally here and simply act as delimiters:
arr1=(1 2 3 4 5)
arr2=(1 3 2 4 152)
for i in "${arr1[@]}" ; do
    var1+="^$i$"
done
for i in "${arr2[@]}" ; do
    var2+="^$i$"
done
for i in "${arr1[@]}" "${arr2[@]}" ; do
    [[ $var1 =~ "^$i$" ]] || echo $i
    [[ $var2 =~ "^$i$" ]] || echo $i
done
output: 5 152
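A hedged variation on the same idea, shown only as a sketch: pad the joined array with spaces and use a glob match instead of a regex, collecting the two sides separately. It assumes the default IFS and elements that contain no spaces; the names only1 and only2 are invented for this example.

```shell
#!/usr/bin/env bash
arr1=(1 2 3 4 5)
arr2=(1 3 2 4 152)
only1=()   # hypothetical name: elements found only in arr1
only2=()   # hypothetical name: elements found only in arr2

# " ${arr[*]} " joins the elements with spaces and pads both ends, so the
# pattern *" $i "* can only match a whole element, never a substring.
for i in "${arr1[@]}"; do
    [[ " ${arr2[*]} " == *" $i "* ]] || only1+=("$i")
done
for i in "${arr2[@]}"; do
    [[ " ${arr1[*]} " == *" $i "* ]] || only2+=("$i")
done

echo "only in arr1: ${only1[*]}"
echo "only in arr2: ${only2[*]}"
```

Keeping two result arrays answers the original question directly, since the loop in the answer prints both kinds of unique element into one stream.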

Related

Picking input record fields with AWK

Let's say we have a shell variable $x containing a space separated list of numbers from 1 to 30:
$ x=$(for i in {1..30}; do echo -n "$i "; done)
$ echo $x
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
We can print the first three input record fields with AWK like this:
$ echo $x | awk '{print $1 " " $2 " " $3}'
1 2 3
How can we print all the fields starting from the Nth field with AWK? E.g.
4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
EDIT: I can use cut, sed etc. to do the same but in this case I'd like to know how to do this with AWK.
Converting my comment to answer so that solution is easy to find for future visitors.
You may use this awk:
awk '{for (i=3; i<=NF; ++i) printf "%s", $i (i<NF?OFS:ORS)}' file
or pass start position as argument:
awk -v n=3 '{for (i=n; i<=NF; ++i) printf "%s", $i (i<NF?OFS:ORS)}' file
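For use from a script, the awk loop above can be wrapped in a shell function that takes the starting field as its argument. This is only a sketch: the function name fields_from is hypothetical, and it reads its input from stdin rather than from a file.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: print fields n..NF of every input line.
fields_from() {
    awk -v n="$1" '{
        for (i = n; i <= NF; ++i)
            printf "%s%s", $i, (i < NF ? OFS : ORS)
    }'
}

x=$(echo {1..30})                   # "1 2 3 ... 30"
result=$(fields_from 4 <<< "$x")    # start at the 4th field, as in the question
echo "$result"
```

Note that a line with fewer than n fields produces no output at all with this loop, not even an empty line.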
Version 4: Shortest is probably using sub to cut off the first three fields and their separators:
$ echo $x | awk 'sub(/^ *([^ ]+ +){3}/,"")'
Output:
4 5 6 7 8 9 ...
This will, however, preserve all space after $4:
$ echo "1 2 3 4   5" | awk 'sub(/^ *([^ ]+ +){3}/,"")'
4   5
so if you wanted the space squeezed, you'd need to, for example:
$ echo "1 2 3 4   5" | awk 'sub(/^ *([^ ]+ +){3}/,"") && $1=$1'
4 5
with the exception that if there are only 4 fields and the 4th field happens to be a 0:
$ echo "1 2 3 0" | awk 'sub(/^ *([^ ]+ +){3}/,"")&&$1=$1'
$ [no output]
in which case you'd need to:
$ echo "1 2 3 0" | awk 'sub(/^ *([^ ]+ +){3}/,"") && ($1=$1) || 1'
0
Version 1: cut is better suited for the job:
$ cut -d' ' -f 4- <<<"$x"
Version 2: Using awk you could:
$ echo -n $x | awk -v RS=' ' -v ORS=' ' 'NR>=4;END{printf "\n"}'
Version 3: If you want to preserve those varying amounts of space, using GNU awk you could use split's fourth parameter seps:
$ echo "1 2 3 4 5 6 7" |
gawk '{
n=split($0,a,FS,seps) # actual separators goes to seps
for(i=4;i<=n;i++) # loop from 4th
printf "%s%s",a[i],(i==n?RS:seps[i]) # get fields from arrays
}'
Here is one more approach: append each field to a variable and print that variable once all the fields have been read. Set n to the field from which you want the output to start.
echo "$x" |
awk -v n=3 '{val="";for(i=n; i<=NF; i++){val=(val?val OFS:"")$i};print val}'
With GNU awk, you can use the join function which has been a built-in include since gawk 4.1:
x=$(seq 30 | tr '\n' ' ')
echo "$x" | gawk '@include "join"
{split($0, arr)
print join(arr, 4, length(arr), "|")}
'
4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|26|27|28|29|30
(Shown here with a '|' instead of a ' ' for clarity...)
Alternative way of including join:
echo "$x" | gawk -i join '{split($0, arr); print join(arr, 4, length(arr), "|")}'
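Using GNU awk and gensub:
echo $x | gawk '{ print gensub(/^([[:digit:]]+[[:space:]]){3}(.*$)/,"\\2",1) }'
Here gensub matches the first three fields as one group and the rest of the line as a second group, and the replacement keeps only the second group. The third argument (1) tells gensub to replace the first match; with no fourth argument it operates on $0.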

Find items common between two Bash arrays

I have the shell script below, with two arrays, number1 and number2, and a variable range that holds a list of numbers.
I need to find which numbers in the number1 array are also present in range, and likewise for the number2 array. Below is my shell script, and it works fine.
number1=(1220 1374 415 1097 1219 557 401 1230 1363 1116 1109 1244 571 1347 1404)
number2=(411 1101 273 1217 547 1370 286 1224 1362 1091 567 561 1348 1247 1106 304 435 317)
range=90,197,521,540,552,554,562,569:570,573,576,579,583,594,597,601,608:609,611,628,637:638,640:641,644:648
range_f=" "$(eval echo $(echo $range | perl -pe 's/(\d+):(\d+)/{$1..$2}/g;s/,/ /g;'))" "
echo "$range_f"
for item in "${number1[@]}"; do
    if [[ $range_f =~ " $item " ]] ; then
        new_number1+=($item)
    fi
done
echo "new list: ${new_number1[@]}"
for item in "${number2[@]}"; do
    if [[ $range_f =~ " $item " ]] ; then
        new_number2+=($item)
    fi
done
echo "new list: ${new_number2[@]}"
Is there a better way to write this? Right now I iterate with two for loops to build the new_number1 and new_number2 arrays.
Note:
A value like 644:648 is shorthand for the range that starts at 644 and ends at 648.
You can use comm with process substitution instead of looping:
mapfile -t new_number1 < <(comm -12 <(printf '%s\n' "${number1[@]}" | sort) <(printf '%s\n' $range_f | sort))
mapfile -t new_number2 < <(comm -12 <(printf '%s\n' "${number2[@]}" | sort) <(printf '%s\n' $range_f | sort))
mapfile -t name reads from the nested process substitution into the named array
printf ... | sort pair provides the sorted input streams for comm
comm -12 emits the items common to the two streams
Aside from codeforester's answer, I can think of two other ways of doing this:
1. Load the values of $range as keys of an associative array, with 1 as each value. Loop through each member of ${number1[@]} and ${number2[@]}, testing them against the keys of the associative array.
2. Use codeforester's printf ... | sort trick, but pipe both the list and the range through sort | uniq -c, then grep for the duplicates.
I'm not sure if either one of these is an actual improvement on your code. ... I would create a 'find duplicates' shell function, but otherwise your code looks solid.
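The associative-array idea might look roughly like the sketch below. The sample data is shortened and partly invented so that there is something to match, the variable names in_range, parts, lo and hi are hypothetical, and associative arrays assume bash 4 or later. The range is expanded with a plain loop, so no eval or perl is involved.

```shell
#!/usr/bin/env bash
# Shortened, partly invented sample data (assumption, not the asker's lists).
number1=(1220 562 415 570 557)
range=552,554,562,569:570,573

declare -A in_range=()

# Split on commas, then expand each "lo:hi" pair (or single number) into keys.
IFS=, read -ra parts <<< "$range"
for part in "${parts[@]}"; do
    lo=${part%%:*}    # text before the colon, or the whole value
    hi=${part##*:}    # text after the colon, or the whole value
    for ((n = lo; n <= hi; n++)); do
        in_range[$n]=1
    done
done

# Membership is now a single key lookup per item.
new_number1=()
for item in "${number1[@]}"; do
    if [[ -n ${in_range[$item]:-} ]]; then
        new_number1+=("$item")
    fi
done
echo "new list: ${new_number1[*]}"
```

The lookup table is built once and can be reused for number2, which is where this form may pay off compared with repeating the regex test per array.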

Generate lowest possible number from array in bash

Let's say I have an array with some numbers, ordered from the lowest to the highest numeral:
root@blubb:~# min=1
root@blubb:~# echo $array
1 2 3 6 16 26 27
I want a bash script to always try the minimum number defined first (in this case min=1) and if that is not possible add 1 and try it again, until it finally works (in this case it would take 4)
I tried a lot with shuf and while-/for loops but could not get it to run properly.
min=1
array="1 2 3 6 16 26 27"
found=0
res=$min
for elem in $array; do
if [[ $elem = $res ]]; then
found=1
continue
fi
if [[ $found = 1 ]]; then
((res+=1))
if ((elem>res)); then
break
fi
fi
done
echo $res
Or with a function
function min_array {
local min array res found
min=$1
shift
array=$*
found=0
res=$min
for elem in $array; do
if [[ $elem = $res ]]; then
found=1
continue
fi
if [[ $found = 1 ]]; then
((res+=1))
if ((elem>res)); then
break
fi
fi
done
echo $res
}
$ min_array 1 1 2 3 6 16 26
4
$ min_array 6 1 2 3 6 16 26
7
$ min_array 8 1 2 3 6 16 26
8
EDIT one case was missing, another version
function min_array {
local min array res found
min=$1
shift
array=$*
found=0
res=$min
for elem in $array; do
if [[ $elem = $res ]]; then
found=1
((res+=1))
continue
fi
if [[ $found = 1 ]]; then
if ((elem>res)); then
break
else
((res+=1))
fi
fi
done
echo $res
}
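A different way to get the same result, offered only as a sketch: record the numbers that are taken as keys of an associative array, then count upwards from the minimum until a free number turns up. The function name lowest_free is hypothetical, and the associative array assumes bash 4 or later. Unlike the loops above, this version does not depend on the list being sorted.

```shell
#!/usr/bin/env bash
# Hypothetical helper: lowest_free MIN NUM... prints the smallest number
# that is >= MIN and does not appear among the given numbers.
lowest_free() {
    local min=$1; shift
    local -A taken=()
    local n
    for n in "$@"; do
        taken[$n]=1               # mark every listed number as taken
    done
    n=$min
    while [[ -n ${taken[$n]:-} ]]; do
        n=$((n + 1))              # try the next candidate
    done
    echo "$n"
}

lowest_free 1 1 2 3 6 16 26 27
lowest_free 6 1 2 3 6 16 26
lowest_free 8 1 2 3 6 16 26
```

The three calls reuse the argument lists from the min_array examples above, so the results can be compared directly.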
Try this.
array=(7 5 3 1)
min()
{
local min=$1; shift
local n
for n in "$@"; do
if ((n<min)); then
min=$n
fi
done
echo "$min"
}
min "${array[#]}"

How to add a value into the middle of an array?

Hello I have the following array:
array=(1 2 3 4 5 7 8 9 12 13)
I execute this for loop:
for w in ${!array[@]}
do
comp=$(echo "${array[w+1]} - ${array[w]} " | bc)
if [ $comp = 1 ]; then
/*???*/
else
/*???*/
fi
done
What I would like to do is to insert a value when the difference between two consecutive elements is not = 1
How can I do it?
Many thanks.
Just create a loop from the minimum to the maximum values and fill the gaps:
array=(1 2 3 4 5 7 8 9 12 13)
min=${array[0]}
max=${array[-1]}
new_array=()
for ((i=min; i<=max; i++)); do
    if [[ " ${array[@]} " =~ " $i " ]]; then
        new_array+=($i)
    else
        new_array+=(0)
    fi
done
echo "${new_array[@]}"
This creates a new array $new_array with the values:
1 2 3 4 5 0 7 8 9 0 0 12 13
This uses the trick in Check if an array contains a value.
You can select parts of the original array with ${arr[@]:index:count}.
Select the start, insert a new element, add the end.
To insert an element at index i=5 (that is, after the fifth element):
$ array=(1 2 3 4 5 7 8 9 12 13)
$ i=5
$ arr=("${array[@]:0:i}") ### take the start of the array.
$ arr+=( 0 ) ### add a new value ( may use $((i+1)) )
$ arr+=("${array[@]:i}") ### copy the tail of the array.
$ array=("${arr[@]}") ### transfer the corrected array.
$ printf '<%s>' "${array[@]}"; echo
<1><2><3><4><5><0><7><8><9><12><13>
To process all the elements, just do a loop:
#!/bin/bash
array=(1 2 3 4 5 7 8 9 12 13)
for (( i=1;i<${#array[@]};i++)); do
    if (( array[i] != i+1 )); then
        arr=("${array[@]:0:i}") ### take the start of the array.
        arr+=( "0" ) ### add a new value
        arr+=("${array[@]:i}") ### copy the tail of the array.
        # echo "head $i ${array[@]:0:i}" ### see the array.
        # echo "tail $i ${array[@]:i}"
        array=("${arr[@]}") ### transfer the corrected array.
    fi
done
printf '<%s>' "${array[@]}"; echo
$ chmod u+x ./script.sh
$ ./script.sh
<1><2><3><4><5><0><7><8><9><0><0><12><13>
There does not seem to be a way to insert directly into the middle of an array. You can append the elements to another array instead, though:
result=()
for w in ${!array[@]}; do
    result+=("${array[w]}")
    comp=$(echo "${array[w+1]} - ${array[w]} " | bc)
    if [ "$comp" = 1 ]; then
        : # consecutive values, probably nothing to do
    else
        : # handle the missing numbers here
    fi
done
As a final step, you can assign result back to the original array.
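One way the "handle the missing numbers" branch could be filled in is sketched below, using shell arithmetic instead of bc. It assumes a non-empty array of integers sorted in ascending order, and it uses 0 as the placeholder to match the other answers; the loop variable missing is a name invented here.

```shell
#!/usr/bin/env bash
array=(1 2 3 4 5 7 8 9 12 13)

result=("${array[0]}")
for ((w = 1; w < ${#array[@]}; w++)); do
    # Append one placeholder for every integer skipped between neighbours.
    for ((missing = array[w-1] + 1; missing < array[w]; missing++)); do
        result+=(0)
    done
    result+=("${array[w]}")
done

array=("${result[@]}")    # assign the result back to the original array
echo "${array[*]}"
```

Replacing result+=(0) with result+=("$missing") would insert the missing numbers themselves instead of a placeholder.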

bash, return position of the smallest entry in an array

I have an array in bash. For example
array=(1 3 4e-10 6 4 2e-4 7 5 2 9)
I would like to know how to return position of the smallest number, in this case 3.
Try this:
arr=(1 3 4e-10 6 4 2e-4 7 5 2 9)
For the minimum value and its position:
echo "${arr[@]}" | tr -s ' ' '\n' | awk '{print($0" "NR)}' |
sort -g -k1,1 | head -1
4e-10 3
For just the position of the minimum value:
echo "${arr[@]}" | tr -s ' ' '\n' | awk '{print($0" "NR)}' |
sort -g -k1,1 | head -1 | cut -f2 -d' '
3
$ array=(1 3 4e-10 6 4 2e-4 7 5 2 9)
$ echo "${array[*]}" | tr ' ' '\n' | awk 'NR==1{min=$1;pos=1}NR>1 && $1<min{min=$1;pos=NR}END{print min,pos}'
4e-10 3
or just
$ echo "${array[*]}" | tr ' ' '\n' | awk 'NR==1{min=$1;pos=1}NR>1 && $1<min{min=$1;pos=NR}END{print pos}'
3
to get just the position.
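The same awk comparison can be wrapped in a small shell function, shown here as a sketch rather than a drop-in solution. The name min_index is hypothetical; the position is 1-based as in the question, and adding 0 to each field forces a numeric comparison so that values such as 4e-10 are treated as numbers.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the 1-based position of the smallest argument.
min_index() {
    printf '%s\n' "$@" |
        awk 'NR == 1 || $1 + 0 < min { min = $1 + 0; pos = NR }
             END { print pos }'
}

array=(1 3 4e-10 6 4 2e-4 7 5 2 9)
pos=$(min_index "${array[@]}")
echo "$pos"
```

When the minimum occurs more than once, this reports the first occurrence, because later equal values do not satisfy the strict comparison.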
Like the comments say, Bash doesn't do floats. I'd loop through the array and use perl or awk. Something like this should work:
for i in ${array[@]}; do echo $i; done | perl -e 'use strict; my @array; while(<STDIN>) { chomp $_; push (@array,$_+0); } foreach my $number (sort {$a <=> $b} @array) { print "$number\n"; } '
A very "line-noisy" combination of bash and perl
array=(1 3 4e-10 6 4 2e-4 7 5 2 9)
perl -lanE '
say 1 + (sort {$a->[1] <=> $b->[1]} map {[$_, $F[$_]]} 0..$#F)[0]->[0]
' <<< "${array[@]}"
outputs 3

Resources