How can I use a Bash array as input to a command? - arrays

Say, for example, I have the following array:
files=( "foo" "bar" "baz fizzle" )
I want to pipe the contents of this array through a command, say sort, as though each element where a line in a file. Sure, I could write the array to a temporary file, then use the temporary file as input to sort, but I'd like to avoid using a temporary file if possible.
If "bar fizzle" didn't have that space character, I could do something like this:
echo ${files[#]} | tr ' ' '\012' | sort
Any ideas? Thanks!

sort <(for f in "${files[#]}" ; do echo "$f" ; done)

Yet another solution:
printf "%s\n" "${files[#]}" | sort

SAVE_IFS=$IFS
IFS=$'\n'
echo "${files[*]}" | sort
IFS=$SAVE_IFS
Of course it won't work properly if there are any newlines in array values.

For sorting files I would recommend sorting in zero-terminated mode (to avoid errors in case of embedded newlines in file names or paths):
files=(
$'fileNameWithEmbeddedNewline\n.txt'
$'saneFileName.txt'
)
echo ${#files[#]}
sort <(for f in "${files[#]}" ; do printf '%s\n' "$((i+=1)): $f" ; done)
sort -z <(for f in "${files[#]}" ; do printf '%s\000' "$((i+=1)): $f" ; done) | tr '\0' '\n'
printf "%s\000" "${files[#]}" | sort -z | tr '\0' '\n'
find . -type f -print0 | sort -z | tr '\0' '\n'
sort -z reads & writes zero-terminated lines!

Related

How to split string to array with specific word in bash

I have a string after I do a command:
[username#hostname ~/script]$ gsql ls | grep "Graph graph_name"
- Graph graph_name(Vertice_1:v, Vertice_2:v, Vertice_3:v, Vertice_4:v, Edge_1:e, Edge_2:e, Edge_3:e, Edge_4:e, Edge_5:e)
Then I do
IFS=", " read -r -a vertices <<< "$(gsql use graph ifgl ls | grep "Graph ifgl(" | cut -d "(" -f2 | cut -d ")" -f1)" to make the string splitted and append to array. But, what I want is to split it by delimiter ", " then append each word that contain ":v" to an array, its mean word that contain ":e" will excluded.
How to do it? without do a looping
Like this, using grep
mapfile -t array < <(gsql ls | grep "Graph graph_name" | grep -oP '\b\w+:v')
The regular expression matches as follows:
Node
Explanation
\b
the boundary between a word char (\w) and something that is not a word char
\w+
word characters (a-z, A-Z, 0-9, _) (1 or more times (matching the most amount possible))
:v
':v'
This bash script should work:
declare arr as array variable
arr=()
# use ", " as delimiter to parse the input fed through process substituion
while read -r -d ', ' val || [[ -n $val ]]; do
val="${val%)}"
val="${val#*\(}"
[[ $val == *:v ]] && arr+=("$val")
done < <(gsql ls | grep "Graph graph_name")
# check array content
declare -p arr
Output:
declare -a arr='([0]="Vertice_1:v" [1]="Vertice_2:v" [2]="Vertice_3:v" [3]="Vertice_4:v")'
Since there is a condition per element the logical way is to use a loop. There may be ways to do it, but here is a solution with a for loop:
#!/bin/bash
input="Vertice_1:v, Vertice_2:v, Vertice_3:v, Vertice_4:v, Edge_1:e, Edge_2:e, Edge_3:e, Edge_4:e, Edge_5:e"
input="${input//,/ }" #replace , with SPACE (bash array uses space as separator)
inputarray=($input)
outputarray=()
for item in "${inputarray[#]}"; do
if [[ $item =~ ":v" ]]; then
outputarray+=($item) #append the item to the output array
fi
done
echo "${outputarray[#]}"
will give output: Vertice_1:v Vertice_2:v Vertice_3:v Vertice_4:v
since the elements don't have space in them this works

Does array 1 contain any of the strings in array 2

I am trying to make an if statement where if array1 contains any of the strings in array2 it should print "match" else print "no match"
So far I have the following. Not sure how to complete it. Both loops should break as soon as the first match is found.
#!/bin/bash
array1=(a b c 1 2 3)
array2=(b 1)
for a in "${array1[#]}"
do
for b in "${array2[#]}"
do
if [ "$a" == "$b" ]; then
echo "Match!"
break
fi
done
done
Maybe this isn't even the best way to do it?
This illustrates the desired result
if [ array1 contains strings in array2 ]
then
echo "match"
else
echo "no match"
fi
To check whether array1 contains any entry from array2 you can use grep. This will be way faster and shorter than loops in bash.
The following commands exit with status code 0 if and only if there is a match. Use them as ...
if COMMAND FROM BELOW; then
echo match
else
echo no match
fi
Single-Line Array Entries
The simple version for strings without linebreaks is
printf %s\\n "${array1[#]}" | grep -qFxf <(printf %s\\n "${array2[#]}")
Multiline Array Entries
Sadly there doesn't seem to be a straightforward way to make this work for array entries with linebreaks. GNU grep has the option -z to set the "line" delimiters in the input to null, but apparently no option to do the same for the file provided to -f. Listing the entries from array2 as -e arguments to grep is not working either -- grep -F seems to be unable to match multiline patterns. However, we can use the following hack:
printf %q\\n "${array1[#]}" | grep -qFxf <(printf %q\\n "${array2[#]}")
Here we assume that bash's built-in printf %q always prints a unique single line -- which it currently does. However, future implementations of bash may change this. The documentation help printf only states that the output thas to be correctly quoted for bash.
For a fast solution, you're better off using an external tool that can process the entire array as a whole (such as the grep-based answers). Doing nested loops in pure bash is likely to be slower for any substantial amount of data (where the item-by-item processing in bash is likely to be more expensive than the external process start-up time).
However, if you do need a pure bash solution, I see that your current solution has no way to print out the "no match" scenario. In addition, it may print out "match" multiple times.
To fix that, you can just store the fact that a match has been found, and use that to both:
exit the outer loop early as well as the inner loop; and
print the correct string at the end.
To do this, you can use something like:
#!/bin/bash
# Test data.
array1=(a b c 1 2 3)
array2=(b 1)
# Default to not-found state.
foundMatch=false
for a in "${array1[#]}" ; do
for b in "${array2[#]}" ; do
# Any match switches to found state and exits inner loop.
[[ "$a" == "$b" ]] && foundMatch=true && break
done
# If found, exit outer loop as well.
${foundMatch} && break
done
# Output appropriate message for found/not-found state.
$foundMatch && echo "Match" || echo "No match"
For array elements which does not contain newlines, the grep -qf with printf "%s\n" would be a good option. For comparing arrays with any elements, I ended with this:
cmp -s /dev/null <(comm -z12 <(printf "%s\0" "${array1[#]}" | sort -z) <(printf "%s\0" "${array2[#]}" | sort -z))
The printf "%s\0" "${array[#]}" | sort -z print a sorted list of zero terminated array elements. The comm -z12 then extracts common elements in both lists. The cmp -s /dev/null checks if the output of comm is empty, which will not be empty if any element is in both lists. You could use [ -z "$(comm -z ...)" ] to check if the output of comm would be empty, but bash will complain that the output of a command captured with $(..) contains a null byte, so it's better to cmp -s /dev/null.
I think | is faster then <(), so your if could be:
if ! printf "%s\0" "${array1[#]}" | sort -z |
comm -z12 - <(printf "%s\0" "${array2[#]}" | sort -z) |
cmp -s /dev/null -; then
echo "Some elements are in both array1 and array2"
fi
The following could work:
printf "%s\0" "${array1[#]}" | eval grep -qzFx "$(printf " -e %q" "${array2[#]}")"
But I believe I found a bug in grepv3.1 when matching a newline character with -x flag. If you don't use the newline character, the above line works.
Would you try the following:
array1=(a b c 1 2 3)
array2=(b 1)
declare -A seen # set marks of the elements of array1
for b in "${array2[#]}"; do
(( seen[$b]++ ))
done
for a in "${array1[#]}"; do
(( ${seen[$a]} )) && echo "match" && exit
done
echo "no match"
It may be efficient by avoiding the double loop, although the discussion of efficiency may be meaningless as long as using bash :)

How to replace a word from a file with elements in an array

I am trying to replace all the word "null" to elements in array. The problem is that after replacing one word of "null", I would like to replace the next "null" with next element in the array.
I am not very good with bash and I feel like this is quite a basic question.
Here is what I have so far:
for m in $(cat finalfile.csv)
do
if [ "$m" = "null" ]
then
m=cwearray[$counter]
let counter++
fi
done
This doesn't replace anything in the finalfile.csv.
For example if the file has:
"value1","value2","null","value3"\n
"value1","value2","null","value3"...
and the array has ["foo","bar"]
I would like it to be:
"value1","value2","foo","value3"\n
"value1","value2","bar","value3"...
can be done with bash, even with multiple nulls per line:
$ cat finalfile.csv
"value1","value2","null","null"
"value1","value2","null","value3"
$ cwearray=( foo bar baz )
$ idx=0
$ while read -r line; do
while [[ $line == *null* ]]; do
line=${line/null/${cwearray[idx++]}}
# ...............^^^^^^^^^^^^^^^^^^
# replace the _first_ "null" with the _next_ array element
done
echo "$line"
done < finalfile.csv > updatedfinalfile.csv
$ cat updatedfinalfile.csv
"value1","value2","foo","bar"
"value1","value2","baz","value3"
It's easier in Perl where you can increase the index directly in the replacement part of a substitution:
printf '%s\n' 1,2,3,null null,2,3,4 null,null,null,null \
| perl -pe 'BEGIN { #cwe = qw( A B C D E F ) }
s/(?:^|(?<=,))null(?=,|$)/$cwe[$i++]/g'
Update: It seems you've updated your question with a sample input. If nulls are double quoted, it gets even easier, as there's no need to check whether they're surrounded with commas or beginning/end of the line.
perl -pe 'BEGIN{ #cwe = qw( foo bar ) }
s/"null"/"$cwe[$i++]"/g'
An awk solution :
declare -a cwearray
cwearray=(foo bar)
awk -F, 'NR==FNR{repl[NR]=$0; next}{for(i=1;i<=NF;i++){if($i=="\"null\""){$i="\""repl[++counter]"\""}}}1' OFS="," <(for i in "${cwearray[#]}"; do echo "$i"; done) <file>
Read the file line by line. If a line contains null, then use sed to replace all occurrences of null with the corresponding value, retrieved via array index.
#!/bin/bash
file="finalfile.csv"
counter=0
array=(
"foo"
"bar"
)
while read -r line; do
item="${array[$counter]}"
echo "$line" | sed "s/null/$item/g"
((counter++))
done < "$file"

Prepending a variable to all items in a bash array

CURRENTFILENAMES=( "$(ls $LOC -AFl | sed "1 d" | grep "[^/]$" | awk '{ print $9 }')" )
I have written the above code, however it is not behaving as I expect it to in a for-loop, which I wrote as so
for a in "$CURRENTFILENAMES"; do
CURRENTFILEPATHS=( "${LOC}/${a}" )
done
Which I expected to prepend the value in the variable LOC to all of the items in the CURRENTFILENAMES array, however it has just prepended it to the beginning of the array, how can I remedy this?
You need to use += operator for appending into an array:
CURRENTFILEPATHS+=( "${LOC}/${a}" )
However parsing ls output is not advisable, use find instead.
EDIT: Proper way to run this loop:
CURRENTFILEPATHS=()
while IFS= read -d '' -r f; do
CURRENTFILEPATHS+=( "$f" )
done < <(find "$LOC" -maxdepth 1 -type f -print0)

Check if each element of an array is present in a string in bash, ignoring certain characters and order

On the web I found answers to find if an element of array is present in the string. But I want to find if each element in the array is present in the string.
eg. str1 = "This_is_a_big_sentence"
Initially str2 was like
str2 = "Sentence_This_big"
Now I wanted to search if string str1 contains "sentence"&"this"&"big" (All 3, ignore alphabetic order and case)
So I used arr=(${str2//_/ })
How do i proceed now, I know comm command finds intersection, but it needs a sorted list, also I need to ignore _ underscores.
I get my str2 by finding the extension of a particular type of file using the command
for i in `ls snooze.*`; do echo $i | cut -d "." -f2
# Till here i get str2 and need to check as mentioned above. Not sure how to do this, i tried putting str2 as array and now just need to check if all elements of my array occur in str1 (ignore case,order)
Any help would be highly appreciated. I did try to use This link
Now I wanted to search if string a contains "sentence"&"this"&"big"
(All 3, ignore alphabatic order and case)
Here is one approach:
#!/bin/bash
str1="This_is_a_big_sentence"
str2="Sentence_This_big"
if ! grep -qvwFf <(sed 's/_/\n/g' <<<${str1,,}) <(sed 's/_/\n/g' <<<${str2,,})
then
echo "All words present"
else
echo "Some words missing"
fi
How it works
${str1,,} returns the string str1 with all capitals replaced by lower case.
sed 's/_/\n/g' <<<${str1,,} returns the string str1, all converted to lower case and with underlines replaced by new lines so that each word is on a new line.
<(sed 's/_/\n/g' <<<${str1,,}) returns a file-like object containing all the words in str1, each word lower case and on a separate line.
The creation of file-like objects is called process substitution. It allows us, in this case, to treat the output of a shell command as if it were a file to read.
<(sed 's/_/\n/g' <<<${str2,,}) does the same for str2.
Assuming that file1 and file2 each have one word per line, grep -vwFf file1 file2 removes from file2 every occurrence of a word in file2. If there are no words left, that means that every word in file2 appears in file1.
By adding the option -q, grep will return no output but will set an exit code that we can use in our if statement.
In the actual command, file1 and file2 are replaced by our file-like objects.
The remaining grep options can be understood as follows:
-w tells grep to look for whole words only.
-F tells grep to look for fixed strings, not regular expressions.
-f tells grep to look for the patterns to match in the file (or file-like object) which follows.
-v tells grep to remove (the default is to keep) the words which match.
Here is an awk solution to check existence of all the words from a string in another string:
str1="This_is_a_big_sentence"
str2="Sentence_This_big"
awk -v RS=_ 'FNR==NR{a[tolower($1)]; next} {delete a[tolower($1)]} END{print (length(a)) ? "Not all words" : "All words"}' <(echo "$str2") <(echo "$str1")
With indentation:
awk -v RS=_ 'FNR==NR {
a[tolower($1)];
next
}
{ delete a[tolower($1)] }
END {
print (length(a)) ? "Not all words" : "All words"
}' <(echo "$str2") <(echo "$str1")
Explanation:
-v RS=_ We use record separator as _
FNR==NR - Execute this block for str2
a[tolower($1)]; next - Populate an array a with each lowercase word as key
{delete a[tolower($1)]} - For each word in str1 delete key in array a
END - If length of array a is still not 0 then there are some words left.
Here's another solution:
#!/bin/bash
str1="This_is_a_big_sentence"
str2="sentence_This_big"
var=0
var2=0
while read in
do
if [ $(echo $str1 | grep -ioE $in) ]
then
var=$((var+1))
fi
var2=$((var2+1))
done < <(echo $str2 | sed -e 's/\(.*\)/\L\1/' -e 's/_/\n/g')
if [[ $var -eq $var2 && $var -ne 0 ]]
then
echo "matched"
else
echo "not matched"
What this script does make str2 all lower case with sed -e 's/\(.*\)/\L\1/' which is a substitution of any character with its lower case, then replace underscores _ with return lines \n with the following sed expression: sed -e 's/_/\n/g', which is another substitution.
Now the individual words are fed into a while loop that compares str1 with the word that was fed in. Every time there's a match, increment var and every time we iterate though the while, we increment var2. If var == var2, then all the words of str2 were found in str1. Hope that helps.
Here's an approach.
if [ "$(echo "This_BIG_senTence" | grep -ioE 'this|big|sentence' | wc -l)" == "3" ]; then echo "matched"; fi
How it works.
grep options -i makes the grep case insensitive, -E for extended regular expressions, and -o separates the matches by line. Now that it is separated by line use wc with -l for line count. Since we had 3 conditions we check if it equals 3. Grep will return the lines where the match occurred, so if you are only working with a string, the example above will return the string for each condition, in this case 3, so there won't be any problems.
Note you can also create a grep chain and see if its empty.
if [ $(echo "This_BIG_SenTence" | grep -i this | grep -i big | grep -i sentence) ]; then echo matched; else echo not_matched; fi
Now I know what you mean. Try this:
#!/bin/bash
# add 4 non-matching examples
> snooze.foo_bar
> snooze.bar_go
> snooze.go_foo
> snooze.no_match
# add 3 matching examples
> snooze.foo_bar_go
> snooze.goXX_XXfoo_XXbarXX
> snooze.bar_go_foo_Ok
str1=("foo" "bar" "go")
for i in `ls snooze.*`; do
str2=${i#snooze.}
j=0
found=1
while [[ $j -lt ${#str1[#]} ]]; do
if ! echo $str2 | eval grep \${str1[$j]} >& /dev/null; then
found=0
break
fi
((j++))
done
if [[ $found -ne 0 ]]; then
echo Match found: $str2
fi
done
Resulting print of this script:
Match found: bar_go_foo_Ok
Match found: foo_bar_go
Match found: goXX_XXfoo_XXbarXX
alternatively, the if..grep line above can be replaced by
if [[ ! $str2 =~ `eval echo \${str1[$j]}` ]]; then
utilizing bash's regular expression match.
Note: I am not too careful about special characters in the search string, such as "\" or " " (space), which may cause problem.
--- Some explanations ---
In the if .. grep line, $j is first evaluated to the running index, from 0 to the number of elements in $str1 minus 1. Then, eval will re-evaluate the whole grep command again, causing ${str1[jjj]} to be re-evaluated (Here, jjj is the already evaluated index)
The strategy is to set found=1 (found by default), and then when any grep fails, we set found to 0 and break the inner j-loop.
Everything else should be straightforward.

Resources