Removing list of comma separated words from a sentence

I have two variables as follows :
sentence="name string,age int,address string,dob timestamp,job string"
ignore="age int,dob timestamp"
Basically I need to iterate through the comma-separated variable $ignore and remove each of its entries from the variable $sentence above.
After performing this the output sentence should be as below:
echo $outputsentence
name string,address string,job string
Should I create an array of the words to be ignored, iterate through it, and perform a sed operation for each one? Is there any other way?

With GNU sed:
pattern=$(sed "s/,/|/g" <<< "$ignore")
outputsentence=$(sed -r 's/('"$pattern"'),*//g' <<< "$sentence")
The first sed command replaces every , in the ignore list with the alternation operator |.
The result is then used as a pattern to remove the matching strings, together with any commas that follow them, from $sentence.
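A sketch of a related variant, assuming bash and a grep that supports -F, -x, -v and -f: put one field per line and drop the fields that exactly equal an ignored entry. Compared with the alternation pattern, this matches whole fields only, treats the ignore entries as fixed strings rather than regular expressions, and leaves no trailing comma when the last field is removed. The variable values are the ones from the question.

```shell
sentence="name string,age int,address string,dob timestamp,job string"
ignore="age int,dob timestamp"

# One field per line, drop lines that exactly equal an ignored field,
# then join the surviving lines back together with commas.
outputsentence=$(tr ',' '\n' <<< "$sentence" \
    | grep -vxFf <(tr ',' '\n' <<< "$ignore") \
    | paste -sd, -)

echo "$outputsentence"   # name string,address string,job string
```

The original field order is preserved because nothing is sorted.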

This is a situation that requires sets: you want to know which members of set A are not present in set B.
For this we have a beautiful article Set Operations in the Unix Shell that describes all of them.
If you want to check the intersection of sets, say:
$ comm -12 <(tr ',' '\n' <<< "$sentence" | sort) <(tr ',' '\n' <<< "$ignore" | sort)
age int
dob timestamp
For complement, use comm -23:
$ comm -23 <(tr ',' '\n' <<< "$sentence" | sort) <(tr ',' '\n' <<< "$ignore" | sort)
address string
job string
name string
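Note that tr ',' '\n' <<< "$var" | sort splits the comma-separated string into one item per line and sorts the lines, which comm requires. The <( ) syntax is process substitution. Because of the sort, the result comes out in sorted order rather than in the original order.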
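To get back to a single comma-separated string, the comm output can be joined again with paste. This is a minimal sketch, assuming bash; LC_ALL=C is set only so that sort and comm agree on the collation order, and the values are the ones from the question.

```shell
sentence="name string,age int,address string,dob timestamp,job string"
ignore="age int,dob timestamp"

export LC_ALL=C   # same collation for sort and comm

# Complement (lines only in the first set), joined with commas.
outputsentence=$(comm -23 \
    <(tr ',' '\n' <<< "$sentence" | sort) \
    <(tr ',' '\n' <<< "$ignore" | sort) \
    | paste -sd, -)

echo "$outputsentence"   # address string,job string,name string
```

The fields come out sorted, so this only fits when the original order does not matter.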

Related

How to split string to array with specific word in bash

I have a string after I do a command:
[username@hostname ~/script]$ gsql ls | grep "Graph graph_name"
- Graph graph_name(Vertice_1:v, Vertice_2:v, Vertice_3:v, Vertice_4:v, Edge_1:e, Edge_2:e, Edge_3:e, Edge_4:e, Edge_5:e)
Then I run
IFS=", " read -r -a vertices <<< "$(gsql use graph ifgl ls | grep "Graph ifgl(" | cut -d "(" -f2 | cut -d ")" -f1)"
to split the string and store the pieces in an array. But what I want is to split it on the delimiter ", " and then add only the words that contain ":v" to the array, which means that the words containing ":e" are excluded.
How can I do this without a loop?

Like this, using grep
mapfile -t array < <(gsql ls | grep "Graph graph_name" | grep -oP '\b\w+:v')
The regular expression matches as follows:
Node    Explanation
\b      the boundary between a word character (\w) and something that is not a word character
\w+     word characters (a-z, A-Z, 0-9, _), one or more times, matching as many as possible
:v      the literal string ':v'
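The -P option needs a grep built with PCRE support. As a sketch for when it is not available, the same idea can be written with -E and a POSIX bracket expression (-o is still needed, which GNU and BSD grep provide). The line variable below is sample text copied from the question in place of the gsql output, and vertices is just a name chosen for this example.

```shell
line='- Graph graph_name(Vertice_1:v, Vertice_2:v, Vertice_3:v, Vertice_4:v, Edge_1:e, Edge_2:e, Edge_3:e, Edge_4:e, Edge_5:e)'

# -o prints each match on its own line; mapfile -t stores one line per element.
mapfile -t vertices < <(grep -oE '[[:alnum:]_]+:v' <<< "$line")

printf '%s\n' "${vertices[@]}"
```

Only the four Vertice_N:v items end up in the array, since the Edge_N:e items never match the ':v' suffix.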

This bash script should work:
# declare arr as an array variable
arr=()
# read -d uses only the first character of its argument, so the input fed
# through process substitution is split on ","; read trims the spaces itself
while read -r -d ',' val || [[ -n $val ]]; do
    val="${val%)}"      # strip a trailing ")"
    val="${val#*\(}"    # strip everything up to and including "("
    [[ $val == *:v ]] && arr+=("$val")
done < <(gsql ls | grep "Graph graph_name")
# check array content
declare -p arr
Output:
declare -a arr='([0]="Vertice_1:v" [1]="Vertice_2:v" [2]="Vertice_3:v" [3]="Vertice_4:v")'

Since there is a condition to check for each element, the logical way is to use a loop. There may be ways to do it without one, but here is a solution with a for loop:
#!/bin/bash
input="Vertice_1:v, Vertice_2:v, Vertice_3:v, Vertice_4:v, Edge_1:e, Edge_2:e, Edge_3:e, Edge_4:e, Edge_5:e"
input="${input//,/ }" # replace each , with a space (the unquoted expansion below splits on whitespace)
inputarray=($input)
outputarray=()
for item in "${inputarray[@]}"; do
    if [[ $item =~ ":v" ]]; then
        outputarray+=("$item") # append the item to the output array
    fi
done
echo "${outputarray[@]}"
This prints: Vertice_1:v Vertice_2:v Vertice_3:v Vertice_4:v
This works because the elements do not contain spaces.

Strange behaviour while subtracting 2 string arrays

I am subtracting array1 from array2
My 2 arrays are
array1=(apps argocd cache core dev-monitoring-busk test-ci-cd)
array2=(apps argocd cache core default kube-system kube-public kube-node-lease monitoring)
And the way I'm subtracting them is
for i in "${array2[@]}"; do
    array1=(${array1[@]//$i})
done
echo ${array1[@]}
Now my expected result should be
dev-monitoring-busk test-ci-cd
But my actual result is
dev--busk test-ci-cd
The subtraction mostly looks good, but it is also deleting the string monitoring from dev-monitoring-busk, and I don't understand why. Can someone point out what's wrong here?
I know that there are other solutions out there for a diff between 2 arrays, like
echo ${Array1[@]} ${Array2[@]} | tr ' ' '\n' | sort | uniq -u
But this is more of a diff than a subtraction, so it does not work for me.

A bit of a kludge, but it works:
- use comm to find the items unique to a (sorted) data set
- use tr to convert between spaces (' ' == array element separator) and newlines ('\n'; comm works on individual lines)
- echo "${array1[@]}" | tr ' ' '\n' | sort : convert an array's elements into separate lines and sort them
- comm -23 (sorted data set #1) (sorted data set #2) : compare the sorted data sets and return the rows that exist only in data set #1
Pulling this all together gives us:
$ array1=(apps argocd cache core dev-monitoring-busk test-ci-cd)
$ array2=(apps argocd cache core default kube-system kube-public kube-node-lease monitoring)
# find rows that only exist in array1
$ comm -23 <(echo "${array1[@]}" | tr ' ' '\n' | sort) <(echo "${array2[@]}" | tr ' ' '\n' | sort)
dev-monitoring-busk
test-ci-cd
# same thing but this time replace newlines with spaces (ie, pull all items onto a single line of output):
$ comm -23 <(echo "${array1[@]}" | tr ' ' '\n' | sort) <(echo "${array2[@]}" | tr ' ' '\n' | sort) | tr '\n' ' '
dev-monitoring-busk test-ci-cd
Notes about comm:
- takes 2 sorted data sets as input
- generates 3 columns of output:
  - (output column #1) rows only in data set #1
  - (output column #2) rows only in data set #2
  - (output column #3) rows in both data sets #1 and #2
- `comm -xy` ==> discard output columns 'x' and 'y'
- `comm -12` => discard output columns #1 and #2 => only show lines common to both data sets (output column #3)
- `comm -23` => discard output columns #2 and #3 => only show lines that exist in data set #1 (output column #1)

If I'm understanding correctly, what you want is not to subtract array1
from array2, but to subtract array2 from array1.
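As others are pointing out, bash pattern substitution does not do what you want here: applied to an array, it removes the matching substring from every element rather than removing whole elements.
Instead you can make use of an associative array if your bash version is 4.0 or later.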
Please try the following:
declare -a array1=(apps argocd cache core dev-monitoring-busk test-ci-cd)
declare -a array2=(apps argocd cache core default kube-system kube-public kube-node-lease monitoring)
declare -A mark
declare -a ans
for e in "${array2[@]}"; do
    mark[$e]=1
done
for e in "${array1[@]}"; do
    [[ ${mark[$e]} ]] || ans+=( "$e" )
done
echo "${ans[@]}"
It first iterates over array2 and marks its elements using
the associative array mark.
It then iterates over array1 and adds each element to the answer
if it has not been marked.
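To see why the original loop produced dev--busk, and as a fallback for shells without associative arrays, here is a minimal sketch. It shows that ${var//pattern} removes a substring from inside an element, then does the subtraction with a whole-element comparison in a nested loop. The names substring_demo, keep and result are made up for this example.

```shell
array1=(apps argocd cache core dev-monitoring-busk test-ci-cd)
array2=(apps argocd cache core default kube-system kube-public kube-node-lease monitoring)

# Pattern substitution works on substrings: "monitoring" is cut out of
# the middle of the element instead of the element being left alone.
substring_demo="${array1[4]//monitoring}"
echo "$substring_demo"   # dev--busk

# Whole-element comparison: keep an element of array1 only if no
# element of array2 is exactly equal to it.
result=()
for a in "${array1[@]}"; do
    keep=1
    for b in "${array2[@]}"; do
        if [[ $a == "$b" ]]; then
            keep=0
            break
        fi
    done
    if (( keep )); then
        result+=("$a")
    fi
done
echo "${result[@]}"      # dev-monitoring-busk test-ci-cd
```

The nested loop is quadratic in the array sizes, so the associative-array version is the better choice for large arrays.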

Bash: set IFS to Space after specific character only?

I'm using IFS=', ' to split a string of comma-delimited text into an array. The problem is that occasionally one of the comma-delimited items contains a space following a :. The resulting array contains that item as two separate array elements. Is it possible to set IFS to only split ', ' and ignore a comma-delimited item that contains ': ' (or any other character for that matter)?
See the comma-delimited string returned from the first command below, note the second item has the :. See the MarkerNames[1] and MarkerNames[2] to see the unwanted split in the second command below.
$ exiftool -s3 -TracksMarkersName audioFile.wav
Marker1, Tempo: 120.0, Silence, Marker2, Silence.1, Marker3, Silence.2, Marker4, Silence.3, Marker5
$ IFS=', ' read -r -a MarkerNames <<< $(exiftool -s3 -TracksMarkersName audioFile.wav)
$ declare -p MarkerNames
declare -a MarkerNames='([0]="Marker1" [1]="Tempo:" [2]="120.0" [3]="Silence" [4]="Marker2" [5]="Silence.1" [6]="Marker3" [7]="Silence.2" [8]="Marker4" [9]="Silence.3" [10]="Marker5")'

IFS holds a set of characters, each of which acts as a field separator on its own. So ", " means "split on any comma or space", not "split on the two-character sequence comma-space".
The simplest workaround, I think, is to preprocess the output so you get the breaks where you want them.
IFS='~' MarkerNames=($(exiftool -s3 -TracksMarkersName audioFile.wav | sed 's/, /~/g'))
This of course requires you to find another IFS value which doesn't occur in your data. If Bash 4+ is available, maybe use a newline and readarray.
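A minimal sketch of the newline-plus-readarray idea, assuming Bash 4+. The markers string is sample text copied from the question in place of running exiftool, and nl is just a helper variable for this example.

```shell
markers='Marker1, Tempo: 120.0, Silence, Marker2, Silence.1, Marker3, Silence.2, Marker4, Silence.3, Marker5'

nl=$'\n'
# Turn every ", " into a newline, then read one array element per line.
readarray -t MarkerNames <<< "${markers//, /$nl}"

echo "${#MarkerNames[@]}"   # 10
echo "${MarkerNames[1]}"    # Tempo: 120.0
```

Since IFS is never touched, there is nothing to restore afterwards, and the space after the colon survives because only the exact sequence ", " is replaced.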

You could split on commas and remove the leading / trailing spaces afterwards:
IFS=',' read -r -a MarkerNames <<< "$(exiftool -s3 -TracksMarkersName audioFile.wav)"
shopt -s extglob # Needed for extended glob
MarkerNames=( "${MarkerNames[@]/#*( )}" ) # Remove leading spaces
MarkerNames=( "${MarkerNames[@]/%*( )}" ) # Remove trailing spaces

Spaces in array content getting broken with grep

I am using an array to cope with spaces in the lines of my file. But when I use grep to filter by the value of the array, it breaks because of the spaces.
For example, my line is as below:
bbbh.cone.abc.com:/home 'bbbh.cone.abc.com
As it has spaces, I am using an array as below.
object1=$(echo "$line" | awk '{print $1}' )
object2=$(echo "$line" | awk '{print $2}' )
object3=$(echo "$line" | awk '{print $3}' )
object4=$(echo "$line" | awk '{print $4}' )
hiteshcharry=("$object1" "$object2" "$object3" "$object4")
grep "${hiteshcharry[@]}" <filename>
It gives me an error because of the spaces.
Here is an example.
I have the line below in my file.
st.cone.abc.com:/platform/sun4v/lib/sparcv9/libc_psr.so.1 space 'st.cone.abc.com space [/platform/sun4v/lib/sparcv9/libc_psr.so.1]'
So I have 2 spaces in the line above. I have written my script in such a way that it can handle a line with at most 4 spaces.
When I run the command below
omnidb -session "$sessionid" -detail | grep "${hiteshcharry[@]}"
it gives me an error because of the spaces. However, when I print the value of the array, it shows the correct value.
Example:
One of the lines from my file is as below (it has 2 spaces):
st.cone.abc.com:/platform/sun4v/lib/sparcv9/libc_psr.so.1 space 'st.cone.abc.com space [/platform/sun4v/lib/sparcv9/libc_psr.so.1]'
I am putting this value in my array named hiteshcharry. When I run the command below
omnidb -session "$sessionid" -detail | grep "${hiteshcharry[@]}"
it gives me an error because of the spaces in the value of the array. The output should contain only the lines whose value is equal to the array named hiteshcharry.
I hope this is clear now.
The output of the omnidb command is in a picture. I want to grep the lines containing
"st.cone.abc.com:/platform/sun4v/lib/sparcv9/libc_psr.so.1 space
'st.cone.abc.com space [/platform/sun4v/lib/sparcv9/libc_psr.so.1]'" from
the output of the omnidb command in that picture.
Thanks. I have added declare -p hiteshcharry and it now prints each element of the array. But I am still getting the error shown in the picture.

When you pass your array to grep through "${array[@]}", grep sees each array element as a separate argument. So the first element becomes the pattern to search for, and the second element onwards become the names of the files to search. Obviously, that's not what you want.
You can use process substitution to make grep match the strings contained in your array, like this:
omnidb -session "$sessionid" -detail | grep -Fxf <(printf '%s\n' "${hiteshcharry[@]}")
- printf prints your array elements, one element per line
- grep -Fxf treats the above output as a file containing the strings to search for (-F treats them as fixed strings, not patterns; -x matches only whole lines of the omnidb output, preventing partial matches; -f reads the strings from the given file)
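A small self-contained sketch of how -F, -x and -f behave together, assuming bash. The input lines are fed in with printf in place of the omnidb output, and the patterns array and its contents are invented for this example.

```shell
patterns=("st.cone.abc.com:/platform space" "other [x]")

# Only input lines that are exactly equal to one array element survive.
matched=$(printf '%s\n' \
    "line one" \
    "st.cone.abc.com:/platform space" \
    "other [x] extra" \
    | grep -Fxf <(printf '%s\n' "${patterns[@]}"))

echo "$matched"   # st.cone.abc.com:/platform space
```

The third input line is rejected because -x demands a whole-line match, and the brackets in "other [x]" are taken literally because of -F.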

put input with spaces as a single element in array in bash

I have a file from which I extract the first three columns using the cut command and write them into an array.
When I check the length of the array, it gives me four. I need the array to have only 3 elements.
I think it's taking the space as a delimiter for array elements.
aaa|111|ADAM|1222|aauu
aaa|222|MIKE ALLEN|5678|gggg
aaa|333|JOE|1222|eeeee
target=($(cut -d '|' -f1-3 sample_file2.txt| sort -u ))

In bash 4 or later, use readarray with process substitution to populate the array. As is, your code cannot distinguish between the whitespace separating each line in the output and the whitespace occurring in "MIKE ALLEN". The readarray command puts each line of the input into a separate array element.
readarray -t target < <(cut -d '|' -f1-3 sample_file2.txt| sort -u)
Prior to bash 4, you need a loop to read each line individually to assign to the array.
while IFS='' read -r line; do
    target+=("$line")
done < <(cut -d '|' -f1-3 sample_file2.txt | sort -u)

This should work:
IFS=$'\n' target=($(cut -d '|' -f1-3 sample_file2.txt| sort -u ))
Example:
#!/bin/bash
IFS=$'\n' target=($(cut -d '|' -f1-3 sample_file2.txt| sort -u ))
echo ${#target[@]}
echo "${target[1]}"
Output:
3
aaa|222|MIKE ALLEN
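One thing to keep in mind is that IFS=$'\n' target=(...) is a plain assignment, so IFS stays changed for the rest of the script, and the unquoted expansion is also subject to globbing. A hedged sketch of a more defensive form, assuming bash; the data variable holds the three expected cut/sort lines in place of reading sample_file2.txt, and old_ifs is a helper name for this example.

```shell
data=$'aaa|111|ADAM\naaa|222|MIKE ALLEN\naaa|333|JOE'

old_ifs=$IFS     # remember the current separator list
IFS=$'\n'        # split on newlines only
set -f           # disable globbing so * or ? in the data stay literal
target=($data)
set +f
IFS=$old_ifs     # restore the previous value

echo "${#target[@]}"   # 3
echo "${target[1]}"    # aaa|222|MIKE ALLEN
```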

As an alternative, using the infamous eval,
eval target=($(cut -sd '|' -f1-3 sample_file2.txt | sort -u | \
xargs -d\\n printf "'%s'\n"))
