How to properly pass a $string with spaces into grep - arrays

I tried to write a bash script that finds a "keyword" inside *.desktop files. My approach is to store the keywords in an array and pass each one to grep. It works flawlessly until a keyword consists of two or more words separated by a space.
What it should be:
cat /usr/share/applications/*.desktop | grep -i "Mail Reader"
What I have tried:
search=$(printf 'Name=%s' "${appsx[$index]}")
echo \""$search\"" #debug
cat /usr/share/applications/*.desktop | grep -i $search
search=$(printf 'Name=%s' "${appsx[$index]}")
echo \""$search\"" #debug
cat /usr/share/applications/*.desktop | grep -i \""$search\""
search=$(printf '"Name=%s"' "${appsx[$index]}")
echo $search #debug
cat /usr/share/applications/*.desktop | grep -i $search
Any suggestions are highly appreciated.

If you simply assign Mail Reader to the variable search like below
search=Mail Reader
bash would complain that the command Reader is not found: the unquoted space ends the assignment, so search=Mail becomes a temporary environment setting for a command named Reader. What you need is
search="Mail Reader" # 'Mail Reader' would also do.
The same applies to your command substitution: wrap it in double quotes (not single quotes, inside which the substitution would not happen), and quote the variable again wherever you use it, because an unquoted $search is split at the space into two arguments.
search="$(command)"
In your case the command substitution is overkill, though. It can be simplified to:
search="Name=${appsx[$index]}"
# Then do the grep.
# Note that cat-grep combo could be simplified to
# -h suppresses printing filenames to get same result as cat .. | grep
grep -ih "$search" /usr/share/applications/*.desktop
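To tie this back to the array in the question, here is a minimal sketch of looping over several keywords. The function name find_desktop_names and the sample keywords are made up for illustration, and -F (treat the keyword as a literal string, not a regex) is my own addition, not something the question asked for.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the Name= lines matching each keyword.
# Usage: find_desktop_names DIR KEYWORD...
find_desktop_names() {
    local dir=$1; shift
    local keyword
    for keyword in "$@"; do
        # -h: no filename prefix, -i: ignore case, -F: literal string,
        # --: end of options, in case a pattern ever starts with "-"
        grep -hiF -- "Name=$keyword" "$dir"/*.desktop
    done
}

# Example keyword list; the values are assumptions for illustration.
appsx=("Mail Reader" "Web Browser")
# find_desktop_names /usr/share/applications "${appsx[@]}"
```

Because every expansion is double-quoted, "Mail Reader" reaches grep as a single argument.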

Related

"basename" command won't include multiple files

I have a problem with “basename” command as follow:
In my host directory I have two samples’ fastq.gz files, named as:
A29_WES_S3_R1_001.fastq.gz
A29_WES_S3_R2_001.fastq.gz
A30_WES_S1_R1_001.fastq.gz
A30_WES_S1_R2_001.fastq.gz
Now I need to have their basename without suffix like:
A29_WES_S3_R1_001
A29_WES_S3_R2_001
A30_WES_S1_R1_001
A30_WES_S1_R2_001
I used the bash pipeline as follow:
#!/bin/bash
FILES1=(*R1_001.fastq.gz)
FILES2=(*R2_001.fastq.gz)
read1="${FILES1[@]}"
read2="${FILES2[@]}"
Ffile=$read1
Ffileprevix=$(basename "$Ffile" .fastq.gz)
Mfile=$read2
Mfileprevix=$(basename "$Mfile" .fastq.gz)
echo $Ffileprevix
echo $Mfileprevix
exit;
But every time I just get this output:
A29_WES_S3_R1_001.fastq.gz A30_WES_S1_R1_001
A29_WES_S3_R2_001.fastq.gz A30_WES_S1_R2_001
Only the last file (A30) would be included in the command!
I checked my pipeline in this way:
echo $read1
echo $read2
The result:
A29_WES_S3_R1_001.fastq.gz A30_WES_S1_R1_001.fastq.gz
A29_WES_S3_R2_001.fastq.gz A30_WES_S1_R2_001.fastq.gz
Then I did:
echo $Ffile
echo $Mfile
The result:
A29_WES_S3_R1_001.fastq.gz A30_WES_S1_R1_001.fastq.gz
A29_WES_S3_R2_001.fastq.gz A30_WES_S1_R2_001.fastq.gz
So $read1, $read2, $Ffile, and $Mfile work well.
Then I put “-a” in my basename command as it will take multiple files:
Ffileprevix=$(basename -a "$Ffile" .fastq.gz)
Mfileprevix=$(basename -a "$Mfile" .fastq.gz)
But it got worse! The result was like:
A29_WES_S3_R1_001.fastq.gz A30_WES_S1_R1_001.fastq.gz .fastq.gz
A29_WES_S3_R2_001.fastq.gz A30_WES_S1_R2_001.fastq.gz .fastq.gz
Finally, I tried a "for ... do ..." loop around the basename command. Again, nothing changed!
Can anybody help me obtain what I want:
A29_WES_S3_R1_001
A29_WES_S3_R2_001
A30_WES_S1_R1_001
A30_WES_S1_R2_001
I'd leave basename out of this entirely, but that's entirely personal preference. You could do something more like:
FILES_PATTERN_1=".*R1_001.fastq.gz"
FILES_PATTERN_2=".*R2_001.fastq.gz"
# Get FILE PATTERN 1
echo "Pattern 1:"
for FILE in $(find . | grep "${FILES_PATTERN_1}" | cut -d. -f2 | tr -d /); do
echo $FILE
done
# Get FILE PATTERN 2
echo "Pattern 2:"
for FILE in $(find . | grep "${FILES_PATTERN_2}" | cut -d. -f2 | tr -d /); do
echo $FILE
done
Output should be:
Pattern 1:
A30_WES_S1_R1_001
A29_WES_S3_R1_001
Pattern 2:
A29_WES_S3_R2_001
A30_WES_S1_R2_001
You could also play with awk to parse things instead:
# Get FILE PATTERN 1
echo "Pattern 1:"
for FILE in $(find . | grep "${FILES_PATTERN_1}" | awk -F '[/.]' '{print $3}'); do
echo $FILE
done
There are a number of ways to approach this. If you had a lot more patterns to test you could make more use of functions here to reduce code duplication.
Also note, I'm doing this from a shell on Mac OSX, so if you're doing this from a Linux box some of these commands may need to be tweaked due to differences in output for some commands, like find. (ex: print $1 instead of print $3)
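For what it's worth, the original script fails because assigning "${FILES1[@]}" to a plain variable joins all array elements into one space-separated string, so basename sees a single odd "file name" and strips only the final suffix. A minimal sketch that keeps the array and strips the suffix per element with parameter expansion follows; the function name strip_suffix is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each argument without directory and suffix.
# Usage: strip_suffix SUFFIX FILE...
strip_suffix() {
    local suffix=$1; shift
    local f
    for f in "$@"; do
        f=${f##*/}                      # drop any leading directory part
        printf '%s\n' "${f%"$suffix"}"  # drop the trailing suffix
    done
}

strip_suffix .fastq.gz A29_WES_S3_R1_001.fastq.gz A30_WES_S1_R1_001.fastq.gz
# prints:
# A29_WES_S3_R1_001
# A30_WES_S1_R1_001
```

With the arrays from the question this would be called as strip_suffix .fastq.gz "${FILES1[@]}", keeping one element per file.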

Counting the number of files in a directory that contain the different variables in my array - bash script

I have a bash script, which needs to check certain files for certain variables, and count how many files come back containing those variables.
As there is more than one variable I need to look for, I decided to use an array for the variables.
The code I am using is below:
#!/bin/bash
declare -a MYARRAY=('Variable One' 'Variable Two' 'Variable Three');
COUNT_MYARRAY=$(find $DIRECTORY -mtime -1 -exec grep -ln $MYARRAY {} \; | wc -l)
I have declared the $DIRECTORY in my real script.
However, it does not seem to pick up files if they have the second and third variable within?
Can anyone see where I might be going wrong?
You can use grep's regex support and pass multiple expressions as 'var1\|var2'. First construct the grep argument, then execute grep.
You don't need line numbers (-n) from grep to count the files.
grep can handle multiple files: it is faster to pass many files to one grep with -exec ... +, rather than spawning grep for each file.
UPPER_CASE_VARIABLES are shouting at me, and by convention upper case variables are reserved for exported variables.
myarray=('Variable One' 'Variable Two' 'Variable Three')
arg=$(printf "%s\|" "${myarray[@]}" | sed 's/\\|$//')
directory=.
count_myarray=$(find "$directory" -type f -mtime -1 -exec grep -l "$arg" {} + | wc -l)
Alternatively: you can pass multiple -exec arguments to find. So first from myarray construct arguments to find in the form -exec grep -l <the var>. Note that multiple variables can be in same files, so get unique filenames after grepping.
myarray=('Variable One' 'Variable Two' 'Variable Three');
findargs=()
for i in "${myarray[@]}"; do
findargs+=(-exec grep -l "$i" {} +)
done
directory=.
count_myarray=$(find "$directory" -type f -mtime -1 "${findargs[@]}" | sort -u | wc -l)
or similar:
count_myarray=$(printf '-exec\0grep\0-l\0%s\0{}\0+\0' "${myarray[@]}" | xargs -0 find "$directory" -type f -mtime -1 | sort -u | wc -l)
Remember to quote your variable expansions to protect against whitespaces or special characters in filenames and directory names.
Going wrong:
echo $MYARRAY shows only Variable One: without an index, $MYARRAY expands to the first element, which is not the pattern you want to give grep.
Also note that it is better to use lowercase for your variable names. I will use ${directory} and not $DIRECTORY (and in double quotes for directories with a space).
You have more options with grep. When a file with 8 occurrences should be counted once, you cannot use the grep option -c. A useful option is -r. You are looking for something like
grep -Erl "Variable One|Variable Two|Variable Three" | wc -l
This is difficult when the variables might contain special characters like $ or |.
Another option of grep is using the option
-f FILE, Obtain patterns from FILE, one per line
So you should make a function that writes the variables to a file, and use something like
grep -rlFf "myVariablesFile" "${directory}" | wc -l
When the content of the file is changing rapidly, you might want to avoid the temporary file with
grep -rlFf <(function_that_writes_variables_to_stdout) "${directory}"| wc -l
or directly
grep -rlFf <(printf "%s\n" "${var1}" "${var2}" "${var3}") "${directory}" | wc -l
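Applied to the array from the question, the process-substitution variant could look like the sketch below. The function name count_files_with_any is made up, and note that this version does not apply the -mtime -1 filter from the original find command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count files under DIR containing any of the strings.
# Usage: count_files_with_any DIR STRING...
count_files_with_any() {
    local dir=$1; shift
    # -r recurse, -l list each matching file once, -F fixed strings,
    # -f read the patterns (one per line) from the process substitution
    grep -rlFf <(printf '%s\n' "$@") -- "$dir" | wc -l
}

myarray=('Variable One' 'Variable Two' 'Variable Three')
# count_files_with_any "$directory" "${myarray[@]}"
```

Since -l prints each file name once, a file containing several of the strings is still counted a single time, and -F means characters like $ or | need no escaping.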

Shell Script regex matches to array and process each array element

While I've handled this task in other languages easily, I'm at a loss for which commands to use when Shell Scripting (CentOS/BASH)
I have some regex that provides many matches in a file I've read to a variable, and would like to take the regex matches to an array to loop over and process each entry.
For regex I typically use https://regexr.com/ to form my capture groups, and throw that at JS/Python/Go to get an array and loop, but in Shell Scripting I'm not sure what I can use.
So far I've played with "sed" to find all matches and replace, but don't know if it's capable of returning an array to loop from matches.
Take regex, run on file, get array back. I would love some help with Shell Scripting for this task.
EDIT:
Based on comments, put this together (not working via shellcheck.net):
#!/bin/sh
examplefile="
asset('1a/1b/1c.ext')
asset('2a/2b/2c.ext')
asset('3a/3b/3c.ext')
"
examplearr=($(sed 'asset\((.*)\)' $examplefile))
for el in ${!examplearr[*]}
do
echo "${examplearr[$el]}"
done
This works in bash on a mac:
#!/bin/sh
examplefile="
asset('1a/1b/1c.ext')
asset('2a/2b/2c.ext')
asset('3a/3b/3c.ext')
"
examplearr=(`echo "$examplefile" | sed -e '/.*/s/asset(\(.*\))/\1/'`)
for el in ${examplearr[*]}; do
echo "$el"
done
output:
'1a/1b/1c.ext'
'2a/2b/2c.ext'
'3a/3b/3c.ext'
Note the wrapping of $examplefile in quotes, and the use of sed to replace the entire line with the match. If there will be other content in the file, either on the same lines as the "asset" string or in other lines with no assets at all you can refine it like this:
#!/bin/sh
examplefile="
fooasset('1a/1b/1c.ext')
asset('2a/2b/2c.ext')bar
foobar
fooasset('3a/3b/3c.ext')bar
"
examplearr=(`echo "$examplefile" | grep asset | sed -e '/.*/s/^.*asset(\(.*\)).*$/\1/'`)
for el in ${examplearr[*]}; do
echo "$el"
done
and achieve the same result.
There are several ways to do this. I'd do with GNU grep with perl-compatible regex (ah, delightful line noise):
mapfile -t examplearr < <(grep -oP '(?<=[(]).*?(?=[)])' <<<"$examplefile")
for i in "${!examplearr[@]}"; do printf "%d\t%s\n" $i "${examplearr[i]}"; done
0 '1a/1b/1c.ext'
1 '2a/2b/2c.ext'
2 '3a/3b/3c.ext'
This uses the bash mapfile command to read lines from stdin and assign them to an array.
The bits you're missing from the sed command:
$examplefile is text, not a filename, so you have to send it to sed's stdin
sed's a funny little language with 1-character commands: you've given it the "a" command, which is inappropriate in this case.
you only want to output the captured parts of the matches, not every line, so you need the -n option, and you need to print somewhere: the p flag in s///p means "print the [line] if a substitution was made".
sed -n 's/asset\(([^)]*)\)/\1/p' <<<"$examplefile"
# or
echo "$examplefile" | sed -n 's/asset\(([^)]*)\)/\1/p'
Note that this returns values like ('1a/1b/1c.ext') -- with the parentheses. If you don't want them, add the -r or -E option to sed: among other things, that flips the meaning of ( and \(
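Putting the pieces together, a minimal sketch with sed -E and mapfile might look like this. It assumes bash 4 or later (for mapfile), and unlike the examples above the capture group here also drops the single quotes around each path, which is my own choice.

```shell
#!/usr/bin/env bash
examplefile="
fooasset('1a/1b/1c.ext')
asset('2a/2b/2c.ext')bar
foobar
"

# -n suppresses default output; the p flag prints only lines where the
# substitution succeeded. With -E, \( and \) are literal parentheses and
# ( ) form the capture group.
mapfile -t assets < <(sed -nE "s/.*asset\('([^']*)'\).*/\1/p" <<<"$examplefile")

# Loop over the matches, one array element per match.
for asset in "${assets[@]}"; do
    printf 'processing %s\n' "$asset"
done
```

Lines without an asset(...) call, such as foobar, produce no output from sed and therefore no array element.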

Content of array in bash is OK when called directly, but lost when called from function

I am trying to use xmllint to search an xml file and store the values I need into an array. Here is what I am doing:
#!/bin/sh
function getProfilePaths {
unset profilePaths
unset profilePathsArr
profilePaths=$(echo 'cat //profiles/profile/@path' | xmllint --shell file.xml | grep '=' | grep -v ">" | cut -f 2 -d "=" | tr -d \")
profilePathsArr+=( $(echo $profilePaths))
return 0
}
In another function I have:
function useProfilePaths {
getProfilePaths
for i in ${profilePathsArr[@]}; do
echo $i
done
return 0
}
useProfilePaths
The behavior of the function differs depending on whether I run the commands manually on the command line or call them from another function as part of a wrapper script. When I call my function from a wrapper script, the array contains 1 item; when I do it from the command line, it contains 2:
$ echo ${#profilePathsArr[@]}
2
The content of profilePaths looks like this when echoed:
$ echo ${profilePaths}
/Profile/Path/1 /Profile/Path/2
I am not sure what the separator is for an xmllint call.
When I call my function from my wrapper script, the content of the first iteration of the for loop looks like this:
for i in ${profilePathsArr[@]}; do
echo $i
done
the first echo looks like:
/Profile/Path/1
/Profile/Path/2
... and the second echo is empty.
Can anyone help me debug this issue? If I could find out what is the separator used by xmllint, maybe I could parse the items correctly in the array.
FYI, I have already tried the following approach, with the same result:
profilePaths=($(echo 'cat //profiles/profile/@path' | xmllint --shell file.xml | grep '=' | grep -v ">" | cut -f 2 -d "=" | tr -d \"))
Instead of using the --shell switch and many pipes, you should use the proper --xpath switch.
But as far as I know, when you have multiple values, there's no simple way to split the different nodes.
So a solution is to iterate like this:
profilePaths=(
$(
for i in {1..100}; do
xmllint --xpath "//profiles[$i]/profile/@path" file.xml || break
done
)
)
or use xmlstarlet:
profilePaths=( $(xmlstarlet sel -t -v "//profiles/profile/@path" file.xml) )
It prints the values separated by newlines by default.
The problem you're having is related to data encapsulation; specifically, variables defined in a function are local, so you can't access them outside that function unless you define them otherwise.
Depending on the implementation of sh you're using, you may be able to get around this by using eval on your variable definition or with a modifier like global for mksh and declare -g for zsh and bash. I know that mksh's implementation definitely works.
Thank you for providing feedback on how I can resolve this problem. After investigating more, I was able to make this work by changing the way I was iterating the content of my 'profilePaths' variable to insert its values into the 'profilePathsArr' array:
# Retrieve the profile paths from file.xml and assign to 'profilePaths'
profilePaths=$(echo 'cat //profiles/profile/@path' | xmllint --shell file.xml | grep '=' | grep -v ">" | cut -f 2 -d "=" | tr -d \")
# Insert them into the array 'profilePathsArr'
IFS=$'\n' read -rd '' -a profilePathsArr <<<"$profilePaths"
For some reason, with all the different function calls from my master script and calls to other scripts, it seemed like the separators were lost along the way. I am unable to find the root cause, but I know that by using "\n" as the IFS together with read -a, it worked like a charm.
If anybody wishes to add more comments on this, you are more than welcome.
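One plausible explanation (an assumption, since the wrapper script is not shown) is that something in the wrapper changes IFS, so the unquoted $(...) no longer splits on newlines and both paths end up in one element. A sketch that does not depend on IFS at all uses mapfile, assuming bash 4 or later; list_profile_paths is a made-up stand-in for the real xmllint pipeline.

```shell
#!/usr/bin/env bash
# Stand-in for the xmllint pipeline: prints one path per line.
# (Hypothetical; replace with the real command.)
list_profile_paths() {
    printf '%s\n' "/Profile/Path/1" "/Profile/Path/2"
}

getProfilePaths() {
    # mapfile splits on newlines only, regardless of the current IFS,
    # and the array is global because it is not declared local.
    mapfile -t profilePathsArr < <(list_profile_paths)
}

useProfilePaths() {
    getProfilePaths
    local p
    for p in "${profilePathsArr[@]}"; do
        printf '%s\n' "$p"
    done
}

useProfilePaths
```

Quoting "${profilePathsArr[@]}" in the loop keeps each element intact even if a path contains spaces.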

Using a variable to pass grep pattern in bash

I am struggling with passing several grep patterns that are contained within a variable. This is the code I have:
#!/bin/bash
GREP="$(which grep)"
GREP_MY_OPTIONS="-c"
for i in {-2..2}
do
GREP_MY_OPTIONS+=" -e "$(date --date="$i day" +'%Y-%m-%d')
done
echo $GREP_MY_OPTIONS
IFS=$'\n'
MYARRAY=( $(${GREP} ${GREP_MY_OPTIONS} "/home/user/this path has spaces in it/"*"/abc.xyz" | ${GREP} -v :0$ ) )
This is what I wanted it to do:
determine/define where grep is
assign a variable (GREP_MY_OPTIONS) holding parameters I will pass to grep
assign several patterns to GREP_MY_OPTIONS
using grep and the patterns I have stored in $GREP_MY_OPTIONS search several files within a path that contains spaces and hold them in an array
When I use "echo $GREP_MY_OPTIONS" it is generating what I expected but when I run the script it fails with an error of:
/bin/grep: invalid option -- ' '
What am I doing wrong? If the path does not have spaces in it everything seems to work fine so I think it is something to do with the IFS but I'm not sure.
If you want to grep some content in a set of paths, you can do the following:
find <directory> -type f |
grep "/home/user/this path has spaces in it/.*/abc.xyz" |
xargs -I {} grep <your_options> -f <patterns> {}
So that <patterns> is a file containing the patterns you want to search for in each file from directory.
Considering your answer, this shall do what you want:
find "/path with spaces/" -type f | xargs -I {} grep -H -c -e 2013-01-17 {}
From man grep:
-H, --with-filename
Print the file name for each match. This is the default when
there is more than one file to search.
Since you want to insert the elements into an array, you can do the following:
IFS=$'\n'; array=( $(find "/path with spaces/" -type f -print0 |
xargs -0 -I {} grep -H -c -e 2013-01-17 "{}") )
And then use the values as:
echo ${array[0]}
echo ${array[1]}
echo ${array[...]}
When using variables to pass the parameters, use eval to evaluate the entire line. Do the following:
parameters="-H -c"
eval "grep ${parameters} file"
If you build the GREP_MY_OPTIONS as an array instead of as a simple string, you can get the original outline script to work sensibly:
#!/bin/bash
path="/home/user/this path has spaces in it"
GREP="$(which grep)"
GREP_MY_OPTIONS=("-c")
j=1
for i in {-2..2}
do
GREP_MY_OPTIONS[$((j++))]="-e"
GREP_MY_OPTIONS[$((j++))]=$(date --date="$i day" +'%Y-%m-%d')
done
IFS=$'\n'
MYARRAY=( $(${GREP} "${GREP_MY_OPTIONS[@]}" "$path/"*"/abc.xyz" | ${GREP} -v :0$ ) )
I'm not clear why you use GREP="$(which grep)" since you will execute the same grep as if you wrote grep directly — unless, I suppose, you have some alias for grep (which is then the problem; don't alias grep).
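The same idea can be written a little more compactly with += and without touching IFS. This is only a sketch: the function name count_matching and its "patterns, then --, then files" calling convention are made up, and -H is added so that the ":0" filter also works when only one file matches the glob.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run "grep -c" with one -e per pattern and keep
# only the files whose count is not zero.
# Usage: count_matching PATTERN... -- FILE...
count_matching() {
    local opts=(-c -H)
    while [ "$#" -gt 0 ] && [ "$1" != "--" ]; do
        opts+=(-e "$1")   # each pattern becomes its own pair of arguments
        shift
    done
    shift                 # drop the "--" separator
    grep "${opts[@]}" -- "$@" | grep -v ':0$'
}

# Possible usage with the dates from the question (date --date is GNU date):
# path="/home/user/this path has spaces in it"
# patterns=()
# for i in {-2..2}; do patterns+=("$(date --date="$i day" +%Y-%m-%d)"); done
# mapfile -t MYARRAY < <(count_matching "${patterns[@]}" -- "$path"/*/abc.xyz)
```

Because the options live in an array and are expanded as "${opts[@]}", neither the spaces in the path nor the value of IFS affect how grep receives them.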
You can do one thing without making things complex:
First do a change directory in your script like following:
cd /home/user/this\ path\ has\ spaces\ in\ it/
$ pwd
/home/user/this path has spaces in it
or
$ cd "/home/user/this path has spaces in it/"
$ pwd
/home/user/this path has spaces in it
Then do whatever you want in your script.
$(${GREP} ${GREP_MY_OPTIONS} */abc.xyz)
EDIT :
[sgeorge@sgeorge-ld stack1]$ ls -l
total 4
drwxr-xr-x 2 sgeorge eng 4096 Jan 19 06:05 test tesd
[sgeorge@sgeorge-ld stack1]$ cat test\ tesd/file
SUKU
[sgeorge@sgeorge-ld stack1]$ grep SUKU */file
SUKU
EDIT :
[sgeorge@sgeorge-ld stack1]$ find */* -print | xargs -I {} grep SUKU {}
SUKU
