Saving egrep output containing '*' to bash array - arrays

I would like to save egrep output to a bash array:
arr=( $(egrep -Rn 'regex') )
If there so happens to be a '*' in the egrep result, it appears like bash is expanding the '*' to be all files in current directory. And the expansion plus results of egrep are then saved into arr.
How do I fix this? I want the '*' in the grep results to be unaltered.

To use the idiom attempted in the question "correctly" might look something like this:
# DON'T DO THIS.
set -f # turn off globbing
IFS=$'\n' # word-split only on newlines
arr=( $(...) ) # populate array
unset IFS # return IFS to defaults (assuming it was there before)
set +f # turn globbing back on
Obviously, there's a lot of room to get this wrong and leave your shell in a state other than the way it started (What if your script had a different initial IFS value? What if this code is sourced from a script that wants globbing to be disabled to work correctly?). Don't do it.
One approach, compatible with bash 3.x, is to use read -a (reading into an array) with IFS (used to separate fields) containing a newline, and -d (used to separate records) set to a NUL:
IFS=$'\n' read -r -d '' -a arr < <(egrep -Rn 'regex' && printf '\0')
The trailing NUL added to the input is present to ensure that read exits successfully; otherwise, this could trigger an abrupt exit if using set -e.
A longer but more explicit approach is to do the iteration yourself:
arr=( )
while IFS= read -r; do
arr+=( "$REPLY" )
done < <(egrep -Rn 'regex')
Another, using bash 4.x features (readarray, AKA mapfile):
readarray -t arr < <(egrep -Rn 'regex')

Related

split more than one maching output of awk '{print $1}' and store it in variables [duplicate]

I need to read the output of a command in my script into an array. The command is, for example:
ps aux | grep | grep | x
and it gives the output line by line like this:
10
20
30
I need to read the values from the command output into an array, and then I will do some work if the size of the array is less than three.
The other answers will break if output of command contains spaces (which is rather frequent) or glob characters like *, ?, [...].
To get the output of a command in an array, with one line per element, there are essentially 3 ways:
With Bash≥4 use mapfile—it's the most efficient:
mapfile -t my_array < <( my_command )
Otherwise, a loop reading the output (slower, but safe):
my_array=()
while IFS= read -r line; do
my_array+=( "$line" )
done < <( my_command )
As suggested by Charles Duffy in the comments (thanks!), the following might perform better than the loop method in number 2:
IFS=$'\n' read -r -d '' -a my_array < <( my_command && printf '\0' )
Please make sure you use exactly this form, i.e., make sure you have the following:
IFS=$'\n' on the same line as the read statement: this will only set the environment variable IFS for the read statement only. So it won't affect the rest of your script at all. The purpose of this variable is to tell read to break the stream at the EOL character \n.
-r: this is important. It tells read to not interpret the backslashes as escape sequences.
-d '': please note the space between the -d option and its argument ''. If you don't leave a space here, the '' will never be seen, as it will disappear in the quote removal step when Bash parses the statement. This tells read to stop reading at the nil byte. Some people write it as -d $'\0', but it is not really necessary. -d '' is better.
-a my_array tells read to populate the array my_array while reading the stream.
You must use the printf '\0' statement after my_command, so that read returns 0; it's actually not a big deal if you don't (you'll just get an return code 1, which is okay if you don't use set -e – which you shouldn't anyway), but just bear that in mind. It's cleaner and more semantically correct. Note that this is different from printf '', which doesn't output anything. printf '\0' prints a null byte, needed by read to happily stop reading there (remember the -d '' option?).
If you can, i.e., if you're sure your code will run on Bash≥4, use the first method. And you can see it's shorter too.
If you want to use read, the loop (method 2) might have an advantage over method 3 if you want to do some processing as the lines are read: you have direct access to it (via the $line variable in the example I gave), and you also have access to the lines already read (via the array ${my_array[#]} in the example I gave).
Note that mapfile provides a way to have a callback eval'd on each line read, and in fact you can even tell it to only call this callback every N lines read; have a look at help mapfile and the options -C and -c therein. (My opinion about this is that it's a little bit clunky, but can be used sometimes if you only have simple things to do — I don't really understand why this was even implemented in the first place!).
Now I'm going to tell you why the following method:
my_array=( $( my_command) )
is broken when there are spaces:
$ # I'm using this command to test:
$ echo "one two"; echo "three four"
one two
three four
$ # Now I'm going to use the broken method:
$ my_array=( $( echo "one two"; echo "three four" ) )
$ declare -p my_array
declare -a my_array='([0]="one" [1]="two" [2]="three" [3]="four")'
$ # As you can see, the fields are not the lines
$
$ # Now look at the correct method:
$ mapfile -t my_array < <(echo "one two"; echo "three four")
$ declare -p my_array
declare -a my_array='([0]="one two" [1]="three four")'
$ # Good!
Then some people will then recommend using IFS=$'\n' to fix it:
$ IFS=$'\n'
$ my_array=( $(echo "one two"; echo "three four") )
$ declare -p my_array
declare -a my_array='([0]="one two" [1]="three four")'
$ # It works!
But now let's use another command, with globs:
$ echo "* one two"; echo "[three four]"
* one two
[three four]
$ IFS=$'\n'
$ my_array=( $(echo "* one two"; echo "[three four]") )
$ declare -p my_array
declare -a my_array='([0]="* one two" [1]="t")'
$ # What?
That's because I have a file called t in the current directory… and this filename is matched by the glob [three four]… at this point some people would recommend using set -f to disable globbing: but look at it: you have to change IFS and use set -f to be able to fix a broken technique (and you're not even fixing it really)! when doing that we're really fighting against the shell, not working with the shell.
$ mapfile -t my_array < <( echo "* one two"; echo "[three four]")
$ declare -p my_array
declare -a my_array='([0]="* one two" [1]="[three four]")'
here we're working with the shell!
You can use
my_array=( $(<command>) )
to store the output of command <command> into the array my_array.
You can access the length of that array using
my_array_length=${#my_array[#]}
Now the length is stored in my_array_length.
Here is a simple example. Imagine that you are going to put the files and directory names (under the current folder) to an array and count them. The script would be like;
my_array=( `ls` )
my_array_length=${#my_array[#]}
echo $my_array_length
Or, you can iterate over this array by adding the following script:
for element in "${my_array[#]}"
do
echo "${element}"
done
Please note that this is the core concept and the input must be sanitized before the processing, i.e. removing extra characters, handling empty Strings, and etc. (which is out of the topic of this thread).
It helps me all the time suppose you want to copy whole list of directories into current directory into an array
bucketlist=($(ls))
#then print them one by one
for bucket in "${bucketlist[#]}"; do
echo " here is bucket: ${bucket}"
done

Creating an array from a text file in Bash

A script takes a URL, parses it for the required fields, and redirects its output to be saved in a file, file.txt. The output is saved on a new line each time a field has been found.
file.txt
A Cat
A Dog
A Mouse
etc...
I want to take file.txt and create an array from it in a new script, where every line gets to be its own string variable in the array. So far I have tried:
#!/bin/bash
filename=file.txt
declare -a myArray
myArray=(`cat "$filename"`)
for (( i = 0 ; i < 9 ; i++))
do
echo "Element [$i]: ${myArray[$i]}"
done
When I run this script, whitespace results in words getting split and instead of getting
Desired output
Element [0]: A Cat
Element [1]: A Dog
etc...
I end up getting this:
Actual output
Element [0]: A
Element [1]: Cat
Element [2]: A
Element [3]: Dog
etc...
How can I adjust the loop below such that the entire string on each line will correspond one-to-one with each variable in the array?
Use the mapfile command:
mapfile -t myArray < file.txt
The error is using for -- the idiomatic way to loop over lines of a file is:
while IFS= read -r line; do echo ">>$line<<"; done < file.txt
See BashFAQ/005 for more details.
mapfile and readarray (which are synonymous) are available in Bash version 4 and above. If you have an older version of Bash, you can use a loop to read the file into an array:
arr=()
while IFS= read -r line; do
arr+=("$line")
done < file
In case the file has an incomplete (missing newline) last line, you could use this alternative:
arr=()
while IFS= read -r line || [[ "$line" ]]; do
arr+=("$line")
done < file
Related:
Need alternative to readarray/mapfile for script on older version of Bash
You can do this too:
oldIFS="$IFS"
IFS=$'\n' arr=($(<file))
IFS="$oldIFS"
echo "${arr[1]}" # It will print `A Dog`.
Note:
Filename expansion still occurs. For example, if there's a line with a literal * it will expand to all the files in current folder. So use it only if your file is free of this kind of scenario.
Use mapfile or read -a
Always check your code using shellcheck. It will often give you the correct answer. In this case SC2207 covers reading a file that either has space separated or newline separated values into an array.
Don't do this
array=( $(mycommand) )
Files with values separated by newlines
mapfile -t array < <(mycommand)
Files with values separated by spaces
IFS=" " read -r -a array <<< "$(mycommand)"
The shellcheck page will give you the rationale why this is considered best practice.
You can simply read each line from the file and assign it to an array.
#!/bin/bash
i=0
while read line
do
arr[$i]="$line"
i=$((i+1))
done < file.txt
This answer says to use
mapfile -t myArray < file.txt
I made a shim for mapfile if you want to use mapfile on bash < 4.x for whatever reason. It uses the existing mapfile command if you are on bash >= 4.x
Currently, only options -d and -t work. But that should be enough for that command above. I've only tested on macOS. On macOS Sierra 10.12.6, the system bash is 3.2.57(1)-release. So the shim can come in handy. You can also just update your bash with homebrew, build bash yourself, etc.
It uses this technique to set variables up one call stack.
Make sure set the Internal File Separator (IFS)
variable to $'\n' so that it does not put each word
into a new array entry.
#!/bin/bash
# move all 2020 - 2022 movies to /backup/movies
# put list into file 1 line per dir
# dirs are "movie name (year)/"
ls | egrep 202[0-2] > 2020_movies.txt
OLDIFS=${IFS}
IFS=$'\n' #fix separator
declare -a MOVIES # array for dir names
MOVIES=( $( cat "${1}" ) ) // load into array
for M in ${MOVIES[#]} ; do
echo "[${M}]"
if [ -d "${M}" ] ; then # if dir name
mv -v "$M" /backup/movies/
fi
done
IFS=${OLDIFS} # restore standard separators
# not essential as IFS reverts when script ends
#END

How can I store the "find" command results as an array in Bash

I am trying to save the result from find as arrays.
Here is my code:
#!/bin/bash
echo "input : "
read input
echo "searching file with this pattern '${input}' under present directory"
array=`find . -name ${input}`
len=${#array[*]}
echo "found : ${len}"
i=0
while [ $i -lt $len ]
do
echo ${array[$i]}
let i++
done
I get 2 .txt files under current directory.
So I expect '2' as result of ${len}. However, it prints 1.
The reason is that it takes all result of find as one elements.
How can I fix this?
P.S
I found several solutions on StackOverFlow about a similar problem. However, they are a little bit different so I can't apply in my case. I need to store the results in a variable before the loop. Thanks again.
Update 2020 for Linux Users:
If you have an up-to-date version of bash (4.4-alpha or better), as you probably do if you are on Linux, then you should be using Benjamin W.'s answer.
If you are on Mac OS, which —last I checked— still used bash 3.2, or are otherwise using an older bash, then continue on to the next section.
Answer for bash 4.3 or earlier
Here is one solution for getting the output of find into a bash array:
array=()
while IFS= read -r -d $'\0'; do
array+=("$REPLY")
done < <(find . -name "${input}" -print0)
This is tricky because, in general, file names can have spaces, new lines, and other script-hostile characters. The only way to use find and have the file names safely separated from each other is to use -print0 which prints the file names separated with a null character. This would not be much of an inconvenience if bash's readarray/mapfile functions supported null-separated strings but they don't. Bash's read does and that leads us to the loop above.
[This answer was originally written in 2014. If you have a recent version of bash, please see the update below.]
How it works
The first line creates an empty array: array=()
Every time that the read statement is executed, a null-separated file name is read from standard input. The -r option tells read to leave backslash characters alone. The -d $'\0' tells read that the input will be null-separated. Since we omit the name to read, the shell puts the input into the default name: REPLY.
The array+=("$REPLY") statement appends the new file name to the array array.
The final line combines redirection and command substitution to provide the output of find to the standard input of the while loop.
Why use process substitution?
If we didn't use process substitution, the loop could be written as:
array=()
find . -name "${input}" -print0 >tmpfile
while IFS= read -r -d $'\0'; do
array+=("$REPLY")
done <tmpfile
rm -f tmpfile
In the above the output of find is stored in a temporary file and that file is used as standard input to the while loop. The idea of process substitution is to make such temporary files unnecessary. So, instead of having the while loop get its stdin from tmpfile, we can have it get its stdin from <(find . -name ${input} -print0).
Process substitution is widely useful. In many places where a command wants to read from a file, you can specify process substitution, <(...), instead of a file name. There is an analogous form, >(...), that can be used in place of a file name where the command wants to write to the file.
Like arrays, process substitution is a feature of bash and other advanced shells. It is not part of the POSIX standard.
Alternative: lastpipe
If desired, lastpipe can be used instead of process substitution (hat tip: Caesar):
set +m
shopt -s lastpipe
array=()
find . -name "${input}" -print0 | while IFS= read -r -d $'\0'; do array+=("$REPLY"); done; declare -p array
shopt -s lastpipe tells bash to run the last command in the pipeline in the current shell (not the background). This way, the array remains in existence after the pipeline completes. Because lastpipe only takes effect if job control is turned off, we run set +m. (In a script, as opposed to the command line, job control is off by default.)
Additional notes
The following command creates a shell variable, not a shell array:
array=`find . -name "${input}"`
If you wanted to create an array, you would need to put parens around the output of find. So, naively, one could:
array=(`find . -name "${input}"`) # don't do this
The problem is that the shell performs word splitting on the results of find so that the elements of the array are not guaranteed to be what you want.
Update 2019
Starting with version 4.4-alpha, bash now supports a -d option so that the above loop is no longer necessary. Instead, one can use:
mapfile -d $'\0' array < <(find . -name "${input}" -print0)
For more information on this, please see (and upvote) Benjamin W.'s answer.
Bash 4.4 introduced a -d option to readarray/mapfile, so this can now be solved with
readarray -d '' array < <(find . -name "$input" -print0)
for a method that works with arbitrary filenames including blanks, newlines, and globbing characters. This requires that your find supports -print0, as for example GNU find does.
From the manual (omitting other options):
mapfile [-d delim] [array]
-d
The first character of delim is used to terminate each input line, rather than newline. If delim is the empty string, mapfile will terminate a line when it reads a NUL character.
And readarray is just a synonym of mapfile.
The following appears to work for both Bash and Z Shell on macOS.
#! /bin/sh
IFS=$'\n'
paths=($(find . -name "foo"))
unset IFS
printf "%s\n" "${paths[#]}"
If you are using bash 4 or later, you can replace your use of find with
shopt -s globstar nullglob
array=( **/*"$input"* )
The ** pattern enabled by globstar matches 0 or more directories, allowing the pattern to match to an arbitrary depth in the current directory. Without the nullglob option, the pattern (after parameter expansion) is treated literally, so with no matches you would have an array with a single string rather than an empty array.
Add the dotglob option to the first line as well if you want to traverse hidden directories (like .ssh) and match hidden files (like .bashrc) as well.
you can try something like
array=(`find . -type f | sort -r | head -2`) , and in order to print the array values , you can try something like echo "${array[*]}"
None of these solutions suited me because I didn't feel like learning readarray and mapfile. Here is what I came up with.
#!/bin/bash
echo "input : "
read input
echo "searching file with this pattern '${input}' under present directory"
# The only change is here. Append to array for each non-empty line.
array=()
while read line; do
[[ ! -z "$line" ]] && array+=("$line")
done; <<< $(find . -name ${input} -print)
len=${#array[#]}
echo "found : ${len}"
i=0
while [ $i -lt $len ]
do
echo ${array[$i]}
let i++
done
You could do like this:
#!/bin/bash
echo "input : "
read input
echo "searching file with this pattern '${input}' under present directory"
array=(`find . -name '*'${input}'*'`)
for i in "${array[#]}"
do :
echo $i
done
In bash, $(<any_shell_cmd>) helps to run a command and capture the output. Passing this to IFS with \n as delimiter helps to convert that to an array.
IFS='\n' read -r -a txt_files <<< $(find /path/to/dir -name "*.txt")

How to write an array ignoring space characters in shell scripting?

I have a text file which consists of say ..following information say test.text:
an
apple of
one's eye
I want to read these lines in an array using shell scripting by doing a cat test.text. I have tried using a=(`cat test.text`), but that doesn't work as it considers space as a delimiter. I need the values as a[0]=an , a[1]=apple of , a[2]=one's eye. I don't want to use IFS. Need help, thanks in advance..!!
In bash 4 or later
readarray a < test.text
This will include an empty element for each blank line, so you might want to remove the empty lines from the input file first.
In earlier versions, you'll need to build the array manually.
a=()
while read; do a+=("$REPLY"); done < test.text
One of various options you have is to use read with bash. Set IFS to the newline and line separator to NUL
IFS=$'\n' read -d $'\0' -a a < test.txt
Plain sh
IFS='
'
set -- $(< test.txt)
unset IFS
echo "$1"
echo "$2"
echo "$#"
bash
IFS=$'\n' a=($(< test.txt))
echo "${a[0]}"
echo "${a[1]}"
echo "${a[#]}"
I'm inclined to say these are the best of the available solutions because they do not involve looping.
Let's say:
cat file
an
apple of
one's eye
Use this while loop:
arr=()
while read -r l; do
[[ -n "$l" ]] && arr+=("$l")
done < file
TEST
set | grep arr
arr=([0]="an" [1]="apple of" [2]="one's eye")

Reading output of a command into an array in Bash

I need to read the output of a command in my script into an array. The command is, for example:
ps aux | grep | grep | x
and it gives the output line by line like this:
10
20
30
I need to read the values from the command output into an array, and then I will do some work if the size of the array is less than three.
The other answers will break if output of command contains spaces (which is rather frequent) or glob characters like *, ?, [...].
To get the output of a command in an array, with one line per element, there are essentially 3 ways:
With Bash≥4 use mapfile—it's the most efficient:
mapfile -t my_array < <( my_command )
Otherwise, a loop reading the output (slower, but safe):
my_array=()
while IFS= read -r line; do
my_array+=( "$line" )
done < <( my_command )
As suggested by Charles Duffy in the comments (thanks!), the following might perform better than the loop method in number 2:
IFS=$'\n' read -r -d '' -a my_array < <( my_command && printf '\0' )
Please make sure you use exactly this form, i.e., make sure you have the following:
IFS=$'\n' on the same line as the read statement: this will only set the environment variable IFS for the read statement only. So it won't affect the rest of your script at all. The purpose of this variable is to tell read to break the stream at the EOL character \n.
-r: this is important. It tells read to not interpret the backslashes as escape sequences.
-d '': please note the space between the -d option and its argument ''. If you don't leave a space here, the '' will never be seen, as it will disappear in the quote removal step when Bash parses the statement. This tells read to stop reading at the nil byte. Some people write it as -d $'\0', but it is not really necessary. -d '' is better.
-a my_array tells read to populate the array my_array while reading the stream.
You must use the printf '\0' statement after my_command, so that read returns 0; it's actually not a big deal if you don't (you'll just get an return code 1, which is okay if you don't use set -e – which you shouldn't anyway), but just bear that in mind. It's cleaner and more semantically correct. Note that this is different from printf '', which doesn't output anything. printf '\0' prints a null byte, needed by read to happily stop reading there (remember the -d '' option?).
If you can, i.e., if you're sure your code will run on Bash≥4, use the first method. And you can see it's shorter too.
If you want to use read, the loop (method 2) might have an advantage over method 3 if you want to do some processing as the lines are read: you have direct access to it (via the $line variable in the example I gave), and you also have access to the lines already read (via the array ${my_array[#]} in the example I gave).
Note that mapfile provides a way to have a callback eval'd on each line read, and in fact you can even tell it to only call this callback every N lines read; have a look at help mapfile and the options -C and -c therein. (My opinion about this is that it's a little bit clunky, but can be used sometimes if you only have simple things to do — I don't really understand why this was even implemented in the first place!).
Now I'm going to tell you why the following method:
my_array=( $( my_command) )
is broken when there are spaces:
$ # I'm using this command to test:
$ echo "one two"; echo "three four"
one two
three four
$ # Now I'm going to use the broken method:
$ my_array=( $( echo "one two"; echo "three four" ) )
$ declare -p my_array
declare -a my_array='([0]="one" [1]="two" [2]="three" [3]="four")'
$ # As you can see, the fields are not the lines
$
$ # Now look at the correct method:
$ mapfile -t my_array < <(echo "one two"; echo "three four")
$ declare -p my_array
declare -a my_array='([0]="one two" [1]="three four")'
$ # Good!
Then some people will then recommend using IFS=$'\n' to fix it:
$ IFS=$'\n'
$ my_array=( $(echo "one two"; echo "three four") )
$ declare -p my_array
declare -a my_array='([0]="one two" [1]="three four")'
$ # It works!
But now let's use another command, with globs:
$ echo "* one two"; echo "[three four]"
* one two
[three four]
$ IFS=$'\n'
$ my_array=( $(echo "* one two"; echo "[three four]") )
$ declare -p my_array
declare -a my_array='([0]="* one two" [1]="t")'
$ # What?
That's because I have a file called t in the current directory… and this filename is matched by the glob [three four]… at this point some people would recommend using set -f to disable globbing: but look at it: you have to change IFS and use set -f to be able to fix a broken technique (and you're not even fixing it really)! when doing that we're really fighting against the shell, not working with the shell.
$ mapfile -t my_array < <( echo "* one two"; echo "[three four]")
$ declare -p my_array
declare -a my_array='([0]="* one two" [1]="[three four]")'
here we're working with the shell!
You can use
my_array=( $(<command>) )
to store the output of command <command> into the array my_array.
You can access the length of that array using
my_array_length=${#my_array[#]}
Now the length is stored in my_array_length.
Here is a simple example. Imagine that you are going to put the files and directory names (under the current folder) to an array and count them. The script would be like;
my_array=( `ls` )
my_array_length=${#my_array[#]}
echo $my_array_length
Or, you can iterate over this array by adding the following script:
for element in "${my_array[#]}"
do
echo "${element}"
done
Please note that this is the core concept and the input must be sanitized before the processing, i.e. removing extra characters, handling empty Strings, and etc. (which is out of the topic of this thread).
It helps me all the time suppose you want to copy whole list of directories into current directory into an array
bucketlist=($(ls))
#then print them one by one
for bucket in "${bucketlist[#]}"; do
echo " here is bucket: ${bucket}"
done

Resources