Comment out items that do not match pattern in array

I have a log file in which I am trying to comment out the lines that do not match any item in my array. I learned how to create an array and I can echo its items, but I am having trouble taking the lines that don't match the array and adding something in front of them. Here is my code; suggestions for another approach or ways to improve it are welcome:
for itsSaturday in $(find "$LOCATION" -mindepth 1 -maxdepth 1 -name "*.log" ); do
TEMPFILE="$itsSaturday.$$"
declare -a someArray=( "breakfast" "scrambled eggs" "Bloody Mary" )
theCall='some_additional_text_'
commentOn="## You_need_"
for arrayItem in "${someArray[@]}"; do
merged="$theCall$arrayItem"
if ! grep -q "$merged" "$itsSaturday"; then
sed -e '/$merged/! s:$commentOn$theCall::g' "$itsSaturday" > $TEMPFILE && mv $TEMPFILE "$itsSaturday"
fi
done
done
Input file:
some_additional_text_breakfast
some_additional_text_bacon
some_additional_text_scrambled eggs
some_additional_text_Bloody Mary
some_additional_text_orange juice
some_additional_text_breakfast
Desired result:
some_additional_text_breakfast
## You_need_some_additional_text_bacon
some_additional_text_scrambled eggs
some_additional_text_Bloody Mary
## You_need_some_additional_text_orange juice
some_additional_text_breakfast
How can I add a variable before items that do not match my array?

I don't like doing this using bash and sed, but I think the following might be enough:
#! /bin/bash
declare -a someArray=( "breakfast" "scrambled eggs" "Bloody Mary" )
theCall='some_additional_text_'
commentOn="## You_need_"
OIFS="$IFS"
IFS='|' mergedLines="${someArray[*]/#/$theCall}"
IFS="$OIFS"
for i in *.txt
do
TEMPFILE="$i.$$"
sed -r "/$mergedLines/!s/^/$commentOn/" "$i" >> "$TEMPFILE"
done
1. I moved the array and the other constants out of the loop.
2. "${someArray[*]/#/$theCall}" uses bash pattern substitution to prepend the contents of $theCall to every element in the array.
3. IFS='|' mergedLines="${someArray[*]}" is a convenient trick to combine the elements of an array into a pipe-separated string. Since this assignment changes IFS for the rest of the script, the original value is saved in OIFS and restored afterwards.
Combined, (2) and (3) give me
some_additional_text_breakfast|some_additional_text_scrambled eggs|some_additional_text_Bloody Mary
in mergedLines.
4. Then it's just a matter of using extended regular expressions in sed (-r, needed for |) and prefixing the non-matching lines with $commentOn.
5. Your sed pattern used single quotes, so the variables within were not expanded.
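The prefix-and-join trick described above can be sketched as follows. This is only an illustration under a few assumptions: a sed that accepts -E, a made-up three-line sample instead of a real log file, and the hypothetical names prefixed, sample and result. The join happens in a subshell, so IFS never has to be restored.

```shell
#!/bin/bash
someArray=( "breakfast" "scrambled eggs" "Bloody Mary" )
theCall='some_additional_text_'
commentOn='## You_need_'

# Prepend $theCall to every element (an empty pattern anchored with # matches the start).
prefixed=( "${someArray[@]/#/$theCall}" )

# Join with | inside a subshell so the IFS change does not leak out.
mergedLines=$(IFS='|'; printf '%s' "${prefixed[*]}")

# Hypothetical sample input standing in for one log file.
sample=$(printf '%s\n' "${theCall}breakfast" "${theCall}bacon" "${theCall}Bloody Mary")

# Prefix every line that matches none of the alternatives.
result=$(sed -E "/$mergedLines/!s/^/$commentOn/" <<< "$sample")
printf '%s\n' "$result"
```

Only the bacon line should come out prefixed with $commentOn. Note that the array items are used as regular expressions here, so items containing characters such as . or / would need escaping first.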

Try replacing the inner for-loop with:
PROG=$(printf '%s\n' "${COMMENT[@]}" | while read comment ; do
/bin/echo -n '$0 !~ /'"$comment"'$/ && '
done
echo '1 { printf commentOn } ; { print }')
awk -v commentOn="$commentOn" "$PROG" $itsSaturday > $TEMPFILE && mv $TEMPFILE $itsSaturday
On each file, this creates an awk program that does the work.

Related

how to parse json in bash script as an array? array value should have both key:value format

my json.file looks like
{
"price1" : "120.10",
"price2" : "110.30",
"price3" : "244.45"
}
I have used below sed command in my bash script to declare array that reads from json
array=( $(sed -n '/{/,/}/{s/[^:]*:[^"]*"\([^"]*\).*/\1/p;}' json.file) )
This gave me the following values for echo ${array[*]}:
120.10 110.30 244.45
I would like the array values to include the key name as well (key:value).
My desired output is in key:value format:
price1:120.10 price2:110.30 price3:244.45
can someone please help or guide me?

Parsing json with sed is probably a bad idea. But let's assume it is a super-simple and regular kind of json format. You were not too far from a working solution:
$ array=($(sed -nr 's/.*"(.*)".*"(.*)".*/\1:\2/p' json.file))
$ echo ${array[*]}
price1:120.10 price2:110.30 price3:244.45
The p flag of the substitute command tells sed to print the result if there was a match, which is good to know. If you have only two quoted strings per line of interest, it should work. The regular expression .*"(.*)".*"(.*)".* matches anything - .* - followed by a double quote, anything again, another double quote... The parentheses - (.*) - do not change what is matched. They just record the part matched between them in one of nine buffers that can be used in the replacement string - \1:\2. So here, the replacement string corresponds to:
<first recorded match>:<second recorded match>
If you want to be more specific about the matching lines you can. For instance:
sed -nr 's/^  "(.*)" : "(.*)",?$/\1:\2/p' json.file
This also requires that there are 2 leading spaces, that the colon is preceded and followed by exactly one space, and that there is an optional comma after the last quote and before the end of the line.
But there is also the much simpler:
sed -nr 's/[[:space:]",]//gp' json.file
that just removes all spaces, double quotes and commas, printing only the matching lines. And guess what? It is what you (apparently) want.
Anyway, remember that if your json files are more complex than what you show sed is definitely not the right tool.
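As a quick way to try the first substitution, here is a minimal sketch. It assumes the same flat, one-pair-per-line layout as in the question, bash 4 or later for readarray, and a sed that accepts -E; the temporary file and the name pairs are made up for the example.

```shell
#!/bin/bash
# Hypothetical copy of the question's json.file in a temporary file.
tmp=$(mktemp)
printf '%s\n' '{' '"price1" : "120.10",' '"price2" : "110.30",' '"price3" : "244.45"' '}' > "$tmp"

# Capture the two quoted strings on each line and print them as key:value.
# readarray -t stores one output line per element without word splitting.
readarray -t pairs < <(sed -nE 's/.*"(.*)".*"(.*)".*/\1:\2/p' "$tmp")
rm -f "$tmp"

printf '%s\n' "${pairs[@]}"
```

The lines holding only { or } contain no quoted strings, so the substitution does not match and they are not printed.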

Your solution just needs to extract the extra data:
array=( $(sed -n '/{/,/}/{s/"\([^"]*\)"[^:]*:[^"]*"\([^"]*\).*/\1:\2/p;}' <<< "$j") )
echo ${array[@]}
price1:120.10 price2:110.30 price3:244.45
Simpler looking solutions may work too, but I'm assuming that you wrote your pattern like that deliberately.
The key trick here is to enclose part of your pattern in escaped parentheses \(...\) and then use a numbered back-reference for each group in the replacement part: \1, \2, etc.

json='{"price1":"120.10","price2":"110.30","price3":"244.45"}'
array=($(jq -r 'to_entries[] | ( "\(.key):\(.value)")' <<< "$json"))
echo ${array[@]}
printf '%s\n' "${array[@]}"
declare -p array
echo
array=($(jq -r 'to_entries[] | ( "\(.key):\(.value)") | @sh' <<< "$json"))
echo ${array[@]}
printf '%s\n' "${array[@]}"
declare -p array
Output:
price1:120.10 price2:110.30 price3:244.45
price1:120.10
price2:110.30
price3:244.45
declare -a array='([0]="price1:120.10" [1]="price2:110.30" [2]="price3:244.45")'
'price1:120.10' 'price2:110.30' 'price3:244.45'
'price1:120.10'
'price2:110.30'
'price3:244.45'
declare -a array='([0]="'\''price1:120.10'\''" [1]="'\''price2:110.30'\''" [2]="'\''price3:244.45'\''")'

For bash scripting, you might want to use an Associative array:
declare -A prices
while IFS=$'\t' read -r key value; do
prices[$key]=$value
done < <(
jq -r 'to_entries[] | [.key, .value] | @tsv' json.file
)
Then, inspect the array
declare -p prices
declare -A prices='([price1]="120.10" [price3]="244.45" [price2]="110.30" )'
or iterate over it
for key in "${!prices[@]}"; do
printf '%s => %s\n' "$key" "${prices[$key]}"
done
price1 => 120.10
price3 => 244.45
price2 => 110.30
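If the key:value strings already exist (for example from one of the sed answers), an associative array can also be filled without jq. The following is a small sketch assuming bash 4 or later and keys that contain no colon; the array name pairs is hypothetical.

```shell
#!/bin/bash
# Hypothetical input: key:value strings as produced by the answers above.
pairs=( "price1:120.10" "price2:110.30" "price3:244.45" )

declare -A prices
for pair in "${pairs[@]}"; do
  key=${pair%%:*}    # everything before the first colon
  value=${pair#*:}   # everything after the first colon
  prices[$key]=$value
done

echo "${prices[price2]}"
```

Looking up a single key then needs no loop, which is the main benefit over an indexed array of key:value strings.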

How to replace a word from a file with elements in an array

I am trying to replace every occurrence of the word "null" with elements of an array. The problem is that after replacing one "null", I would like to replace the next "null" with the next element in the array.
I am not very good with bash and I feel like this is quite a basic question.
Here is what I have so far:
for m in $(cat finalfile.csv)
do
if [ "$m" = "null" ]
then
m=cwearray[$counter]
let counter++
fi
done
This doesn't replace anything in the finalfile.csv.
For example if the file has:
"value1","value2","null","value3"\n
"value1","value2","null","value3"...
and the array has ["foo","bar"]
I would like it to be:
"value1","value2","foo","value3"\n
"value1","value2","bar","value3"...

This can be done with bash, even with multiple nulls per line:
$ cat finalfile.csv
"value1","value2","null","null"
"value1","value2","null","value3"
$ cwearray=( foo bar baz )
$ idx=0
$ while read -r line; do
while [[ $line == *null* ]]; do
line=${line/null/${cwearray[idx++]}}
# ...............^^^^^^^^^^^^^^^^^^
# replace the _first_ "null" with the _next_ array element
done
echo "$line"
done < finalfile.csv > updatedfinalfile.csv
$ cat updatedfinalfile.csv
"value1","value2","foo","bar"
"value1","value2","baz","value3"

It's easier in Perl where you can increase the index directly in the replacement part of a substitution:
printf '%s\n' 1,2,3,null null,2,3,4 null,null,null,null \
| perl -pe 'BEGIN { @cwe = qw( A B C D E F ) }
s/(?:^|(?<=,))null(?=,|$)/$cwe[$i++]/g'
Update: It seems you've updated your question with a sample input. If nulls are double quoted, it gets even easier, as there's no need to check whether they're surrounded with commas or beginning/end of the line.
perl -pe 'BEGIN{ @cwe = qw( foo bar ) }
s/"null"/"$cwe[$i++]"/g'

An awk solution :
declare -a cwearray
cwearray=(foo bar)
awk -F, 'NR==FNR{repl[NR]=$0; next}{for(i=1;i<=NF;i++){if($i=="\"null\""){$i="\""repl[++counter]"\""}}}1' OFS="," <(for i in "${cwearray[@]}"; do echo "$i"; done) <file>

Read the file line by line. If a line contains null, use sed to replace all occurrences of null on that line with the next array value, then advance the array index.
#!/bin/bash
file="finalfile.csv"
counter=0
array=(
"foo"
"bar"
)
while IFS= read -r line; do
# only consume an array element when the line actually contains "null"
if [[ $line == *null* ]]; then
item="${array[$counter]}"
line=$(echo "$line" | sed "s/null/$item/g")
((counter++))
fi
echo "$line"
done < "$file"
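The line-by-line approaches above can be combined into a sketch that replaces one quoted "null" at a time and stops substituting once the array is exhausted. It assumes the nulls are double-quoted as in the question's sample; the names input, output and repl are made up for the example, and the input is held in a variable rather than in finalfile.csv.

```shell
#!/bin/bash
cwearray=( foo bar baz )
idx=0

# Hypothetical input standing in for finalfile.csv.
input=$'"value1","value2","null","null"\n"value1","value2","null","value3"'

output=""
while IFS= read -r line; do
  # Replace the first remaining "null" while replacement values are left.
  while [[ $line == *\"null\"* ]] && (( idx < ${#cwearray[@]} )); do
    repl="\"${cwearray[idx]}\""
    line=${line/\"null\"/$repl}
    (( idx += 1 ))
  done
  output+="$line"$'\n'
done <<< "$input"

printf '%s' "$output"
```

The index check keeps the inner loop from running forever if there are more nulls than array elements; any surplus nulls are simply left in place.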

Check if each element of an array is present in a string in bash, ignoring certain characters and order

On the web I found answers to find if an element of array is present in the string. But I want to find if each element in the array is present in the string.
eg. str1 = "This_is_a_big_sentence"
Initially str2 was like
str2 = "Sentence_This_big"
Now I wanted to search if string str1 contains "sentence"&"this"&"big" (All 3, ignore alphabetic order and case)
So I used arr=(${str2//_/ })
How do i proceed now, I know comm command finds intersection, but it needs a sorted list, also I need to ignore _ underscores.
I get my str2 by finding the extension of a particular type of file using the command
for i in `ls snooze.*`; do echo $i | cut -d "." -f2
# Till here i get str2 and need to check as mentioned above. Not sure how to do this, i tried putting str2 as array and now just need to check if all elements of my array occur in str1 (ignore case,order)
Any help would be highly appreciated. I did try to use This link

Now I wanted to search if string a contains "sentence"&"this"&"big"
(All 3, ignore alphabetic order and case)
Here is one approach:
#!/bin/bash
str1="This_is_a_big_sentence"
str2="Sentence_This_big"
if ! grep -qvwFf <(sed 's/_/\n/g' <<<${str1,,}) <(sed 's/_/\n/g' <<<${str2,,})
then
echo "All words present"
else
echo "Some words missing"
fi
How it works
${str1,,} returns the string str1 with all capitals replaced by lower case.
sed 's/_/\n/g' <<<${str1,,} returns the string str1, all converted to lower case and with underlines replaced by new lines so that each word is on a new line.
<(sed 's/_/\n/g' <<<${str1,,}) returns a file-like object containing all the words in str1, each word lower case and on a separate line.
The creation of file-like objects is called process substitution. It allows us, in this case, to treat the output of a shell command as if it were a file to read.
<(sed 's/_/\n/g' <<<${str2,,}) does the same for str2.
Assuming that file1 and file2 each have one word per line, grep -vwFf file1 file2 removes from file2 every occurrence of a word in file1. If there are no words left, that means that every word in file2 appears in file1.
By adding the option -q, grep will return no output but will set an exit code that we can use in our if statement.
In the actual command, file1 and file2 are replaced by our file-like objects.
The remaining grep options can be understood as follows:
-w tells grep to look for whole words only.
-F tells grep to look for fixed strings, not regular expressions.
-f tells grep to look for the patterns to match in the file (or file-like object) which follows.
-v tells grep to remove (the default is to keep) the words which match.
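For comparison, the same check can be sketched in pure bash with an associative array. This is an alternative technique, not the grep approach itself. It assumes bash 4 or later and words without glob characters; the function name all_words_present is made up.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if every _-separated word of $2 occurs in $1,
# ignoring case and order.
all_words_present() {
  local haystack=${1,,} needles=${2,,} word
  local -A seen=()
  local IFS=_          # split the unquoted expansions below on underscores
  for word in $haystack; do
    [[ -n $word ]] && seen[$word]=1
  done
  for word in $needles; do
    [[ -z $word ]] && continue
    [[ -n ${seen[$word]:-} ]] || return 1
  done
  return 0
}

if all_words_present "This_is_a_big_sentence" "Sentence_This_big"; then
  echo "All words present"
else
  echo "Some words missing"
fi
```

Like grep -w, this compares whole words, so "big" would not match "bigger".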

Here is an awk solution to check existence of all the words from a string in another string:
str1="This_is_a_big_sentence"
str2="Sentence_This_big"
awk -v RS=_ 'FNR==NR{a[tolower($1)]; next} {delete a[tolower($1)]} END{print (length(a)) ? "Not all words" : "All words"}' <(echo "$str2") <(echo "$str1")
With indentation:
awk -v RS=_ 'FNR==NR {
a[tolower($1)];
next
}
{ delete a[tolower($1)] }
END {
print (length(a)) ? "Not all words" : "All words"
}' <(echo "$str2") <(echo "$str1")
Explanation:
-v RS=_ We use record separator as _
FNR==NR - Execute this block for str2
a[tolower($1)]; next - Populate an array a with each lowercase word as key
{delete a[tolower($1)]} - For each word in str1 delete key in array a
END - If length of array a is still not 0 then there are some words left.

Here's another solution:
#!/bin/bash
str1="This_is_a_big_sentence"
str2="sentence_This_big"
var=0
var2=0
while read in
do
if [ $(echo $str1 | grep -ioE $in) ]
then
var=$((var+1))
fi
var2=$((var2+1))
done < <(echo $str2 | sed -e 's/\(.*\)/\L\1/' -e 's/_/\n/g')
if [[ $var -eq $var2 && $var -ne 0 ]]
then
echo "matched"
else
echo "not matched"
fi
This script first makes str2 all lower case with sed -e 's/\(.*\)/\L\1/', a substitution that replaces each character with its lower-case form, and then replaces the underscores _ with newlines \n using sed -e 's/_/\n/g', which is another substitution.
The individual words are then fed into a while loop that searches str1 for each word. Every time there's a match, we increment var, and every time we iterate through the loop, we increment var2. If var == var2, then all the words of str2 were found in str1. Hope that helps.

Here's an approach.
if [ "$(echo "This_BIG_senTence" | grep -ioE 'this|big|sentence' | wc -l)" == "3" ]; then echo "matched"; fi
How it works.
The grep options: -i makes grep case-insensitive, -E enables extended regular expressions, and -o prints only the matched parts, each on its own line. With each match on its own line, wc -l counts the matches. Since we have 3 conditions, we check whether the count equals 3. Note that this counts matches, not distinct words, so a string that repeats one word (for example "this_this_this") would also give 3.
Note you can also create a grep chain and check whether its output is empty.
if [ $(echo "This_BIG_SenTence" | grep -i this | grep -i big | grep -i sentence) ]; then echo matched; else echo not_matched; fi

Now I know what you mean. Try this:
#!/bin/bash
# add 4 non-matching examples
> snooze.foo_bar
> snooze.bar_go
> snooze.go_foo
> snooze.no_match
# add 3 matching examples
> snooze.foo_bar_go
> snooze.goXX_XXfoo_XXbarXX
> snooze.bar_go_foo_Ok
str1=("foo" "bar" "go")
for i in `ls snooze.*`; do
str2=${i#snooze.}
j=0
found=1
while [[ $j -lt ${#str1[@]} ]]; do
if ! echo $str2 | eval grep \${str1[$j]} >& /dev/null; then
found=0
break
fi
((j++))
done
if [[ $found -ne 0 ]]; then
echo Match found: $str2
fi
done
Resulting print of this script:
Match found: bar_go_foo_Ok
Match found: foo_bar_go
Match found: goXX_XXfoo_XXbarXX
Alternatively, the if..grep line above can be replaced by
if [[ ! $str2 =~ `eval echo \${str1[$j]}` ]]; then
utilizing bash's regular expression match.
Note: I am not too careful about special characters in the search string, such as "\" or " " (space), which may cause problem.
--- Some explanations ---
In the if .. grep line, $j is first evaluated to the running index, from 0 to the number of elements in $str1 minus 1. Then, eval will re-evaluate the whole grep command again, causing ${str1[jjj]} to be re-evaluated (Here, jjj is the already evaluated index)
The strategy is to set found=1 (found by default), and then when any grep fails, we set found to 0 and break the inner j-loop.
Everything else should be straightforward.
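The same found-by-default strategy can also be written without eval, because "${str1[$j]}" expands directly inside the loop. Below is a sketch using a plain substring test. The function name contains_all and the sample names are assumptions; the search strings are treated literally, not as patterns.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if $1 contains every remaining argument
# as a literal substring.
contains_all() {
  local candidate=$1 needle
  shift
  for needle in "$@"; do
    [[ $candidate == *"$needle"* ]] || return 1
  done
  return 0
}

str1=( "foo" "bar" "go" )
for name in foo_bar bar_go foo_bar_go goXX_XXfoo_XXbarXX; do
  if contains_all "$name" "${str1[@]}"; then
    echo "Match found: $name"
  fi
done
```

Quoting "$needle" inside the pattern makes characters such as * or a space match literally, which sidesteps the special-character concern noted above.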

Append elements of an array to the end of a line

First let me say I followed questions on stackoverflow.com that relate to my question and it seems the rules are not applying. Let me show you.
The following script:
#!/bin/bash
OUTPUT_DIR=/share/es-ops/Build_Farm_Reports/WorkSpace_Reports
TODAY=`date +"%m-%d-%y"`
HOSTNAME=`hostname`
WORKSPACES=( "bob" "mel" "sideshow-ws2" )
if ! [ -f $OUTPUT_DIR/$HOSTNAME.csv ] && [ $HOSTNAME == "sideshow" ]; then
echo "$TODAY","$HOSTNAME" > $OUTPUT_DIR/$HOSTNAME.csv
echo "${WORKSPACES[0]}," >> $OUTPUT_DIR/$HOSTNAME.csv
sed -i "/^'"${WORKSPACES[0]}"'/$/'"${WORKSPACES[1]}"'/" $OUTPUT_DIR/$HOSTNAME.csv
sed -i "/^'"${WORKSPACES[1]}"'/$/${WORKSPACES[2]}"'/" $OUTPUT_DIR/$HOSTNAME.csv
fi
I want the output to look like:
09-20-14,sideshow
bob,mel,sideshow-ws2
the sed statements are supposed to append successive array elements to preceding ones on the same line. Now I know there's a simpler way to do this like:
echo "${WORKSPACES[0]},${WORKSPACES[1]},${WORKSPACES[2]}" >> $OUTPUT_DIR/$HOSTNAME.csv
But let's say I had 30 elements in the array and I wanted to append them one after the other on the same line. Can you show me how to loop through the elements in an array and append them one after the other on the same line?
Also let's say I had the output of a command like:
df -m /export/ws/$ws | awk '{if (NR!=1) {print $3}}'
and I wanted to append that to the end of the same line.
But when I run it I get:
+ OUTPUT_DIR=/share/es-ops/Build_Farm_Reports/WorkSpace_Reports
++ date +%m-%d-%y
+ TODAY=09-20-14
++ hostname
+ HOSTNAME=sideshow
+ WORKSPACES=("bob" "mel" "sideshow-ws2")
+ '[' -f /share/es-ops/Build_Farm_Reports/WorkSpace_Reports/sideshow.csv ']'
And the file right now looks like:
09-20-14,sideshow
bob,
I am happy to report that user syme solved this (see below) but then I realized I need the date in the first column:
09-7-14,bob,mel,sideshow-ws2
Can I do this using syme's for loop?
Okay user syme solved this too he said "Just add $TODAY to the for loop" like this:
for v in "$TODAY" "${WORKSPACES[@]}"
Okay now the output looks like this I changed the elements in the array btw:
sideshow
09-20-14,bob_avail,bob_used,mel_avail,mel_used,sideshow-ws2_avail,sideshow-ws2_used
Now below that the next line will be populated by a , in the first column skipping the date and then:
df -m /export/ws/$v | awk '{if (NR!=1) {print $3}}'
which equals the value of available space on bob in the first iteration
and then:
df -m /export/ws/$v | awk '{if (NR!=1) {print $2}}'
which equals the value of used space on bob in the 2nd iteration
and then we just move on to the next value in ${WORKSPACE[@]}
which will be mel and do the available and used as we did with bob or $v above.
I know you geniuses on here will make child's play out of this.
I solved my own last question on this thread:
WORKSPACES2=( "bob" "mel" "sideshow-ws2" )
separator="," # comma before every value, including the first
for v in "${WORKSPACES2[@]}"
do
available=`df -m /export/ws/$v | awk '{if (NR!=1) {print $3}}'`
used=`df -m /export/ws/$v | awk '{if (NR!=1) {print $2}}'`
echo -n "$separator$available$separator$used" >> $OUTPUT_DIR/$HOSTNAME.csv # append, concatenated, the separator and the value to the file
done
produces:
sideshow
09-20-14,bob_avail,bob_used,mel_avail,mel_used,sideshow-ws2_avail,sideshow-ws2_used
,470400,1032124,661826,1032124,43443,1032108

echo -n prints text without the trailing line break.
To loop over the values of the array, you can use a for-loop:
echo "$TODAY,$HOSTNAME" > $OUTPUT_DIR/$HOSTNAME.csv # with a linebreak
separator="" # defined empty for the first value
for v in "${WORKSPACES[@]}"
do
echo -n "$separator$v" >> $OUTPUT_DIR/$HOSTNAME.csv # append, concatenated, the separator and the value to the file
separator="," # comma for the next values
done
echo >> $OUTPUT_DIR/$HOSTNAME.csv # add a linebreak (if you want it)
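As an alternative to the separator loop, the whole array can be joined in one step by expanding it with [*] while IFS is a comma. This is a sketch that builds the line in a variable instead of appending to the CSV file; the variable name line is made up.

```shell
#!/bin/bash
WORKSPACES=( "bob" "mel" "sideshow-ws2" )

# "${WORKSPACES[*]}" joins the elements with the first character of IFS.
# The subshell keeps the IFS change from affecting the rest of the script.
line=$(IFS=,; printf '%s' "${WORKSPACES[*]}")

echo "$line"
```

This works the same way for 3 or 30 elements, and the result could then be appended to the file with a single echo "$line" >> redirect.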

How do I assign the output of a command into an array?

I need to assign the results from a grep to an array... for example
grep -n "search term" file.txt | sed 's/:.*//'
This resulted in a bunch of lines with line numbers in which the search term was found.
1
3
12
19
What's the easiest way to assign them to a bash array? If I simply assign them to a variable they become a space-separated string.

To assign the output of a command to an array, you need to use a command substitution inside of an array assignment. For a general command command this looks like:
arr=( $(command) )
In the example of the OP, this would read:
arr=($(grep -n "search term" file.txt | sed 's/:.*//'))
The inner $() runs the command while the outer () causes the output to be an array. The problem with this is that it will not work when the output of the command contains spaces. To handle this, you can set IFS to \n.
IFS=$'\n' arr=($(grep -n "search term" file.txt | sed 's/:.*//'))
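You can also cut out the need for sed by performing an expansion on each element of the array (IFS must still be a newline here, otherwise matching lines that contain spaces are split into several elements):
IFS=$'\n' arr=($(grep -n "search term" file.txt))
arr=("${arr[@]%%:*}")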

Space-separated strings are easily traversable in bash.
# save the output
output=$(grep -n "search term" file.txt | sed 's/:.*//')
# iterating by for.
for x in $output; do echo $x; done;
# awk
echo $output | awk '{for(i=1;i<=NF;i++) print $i;}'
# convert to an array
ar=($output)
echo ${ar[3]} # echoes the 4th element
If you are worried about spaces in file names, use find . -printf "\"%p\"\n" to print each name in double quotes.

@Charles Duffy linked the Bash anti-pattern docs in a comment, and those give the most correct answer:
readarray -t arr < <(grep -n "search term" file.txt | sed 's/:.*//')
His comment:
Note that array=( $(command) ) is considered an antipattern, and is the topic of BashPitfalls #50. – Charles Duffy Nov 16, 2020 at 14:07
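To see the readarray form in action, here is a small self-contained sketch. It assumes bash 4 or later; the temporary file and its three lines are made up to stand in for file.txt.

```shell
#!/bin/bash
# Hypothetical stand-in for file.txt.
tmp=$(mktemp)
printf '%s\n' 'a search term here' 'nothing to see' 'another search term' > "$tmp"

# -t strips the trailing newline from each line; no word splitting or globbing occurs.
readarray -t arr < <(grep -n "search term" "$tmp" | sed 's/:.*//')
rm -f "$tmp"

printf 'line %s\n' "${arr[@]}"
echo "count: ${#arr[@]}"
```

Each output line becomes exactly one array element, so the result does not depend on the current value of IFS.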