How to write an array ignoring space characters in shell scripting? - arrays

I have a text file, say test.text, that contains the following:
an
apple of
one's eye
I want to read these lines into an array in a shell script by doing a cat test.text. I have tried a=(`cat test.text`), but that doesn't work because it treats spaces as delimiters. I need the values as a[0]=an, a[1]=apple of, a[2]=one's eye. I don't want to use IFS. Thanks in advance!

In bash 4 or later
readarray -t a < test.text
The -t option strips the trailing newline from each element. This will include an empty element for each blank line, so you might want to remove the empty lines from the input file first.
In earlier versions, you'll need to build the array manually.
a=()
while IFS= read -r; do a+=("$REPLY"); done < test.text
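A minimal sketch of both approaches, assuming bash for the arrays and bash 4 or later for readarray. The temporary file and the array names lines and kept are made up for illustration. The loop filters blank lines instead of editing the input file first.

```shell
# Hypothetical input: three lines of text with one blank line in between.
tmp=$(mktemp)
printf '%s\n' "an" "apple of" "" "one's eye" > "$tmp"

# bash 4+: -t drops the trailing newline from every element.
readarray -t lines < "$tmp"

# Loop variant (also works in bash 3.x): keep only the non-empty lines.
kept=()
while IFS= read -r line; do
  [[ -n $line ]] && kept+=("$line")
done < "$tmp"

rm -f "$tmp"
echo "${#lines[@]} ${#kept[@]}"   # prints 4 3
declare -p kept
```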

Another option is to use read in bash: set IFS to a newline and the record delimiter to NUL, so the whole file is read in one call and split on newlines.
IFS=$'\n' read -r -d $'\0' -a a < test.txt

Plain sh
IFS='
'
set -- $(< test.txt)
unset IFS
echo "$1"
echo "$2"
echo "$#"
bash
IFS=$'\n' a=($(< test.txt))
echo "${a[0]}"
echo "${a[1]}"
echo "${a[@]}"
I'm inclined to say these are the best of the available solutions because they do not involve looping.
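One caveat applies to both forms: the unquoted substitution is still subject to pathname expansion, so a line containing * could be replaced by file names. The following is a sketch of the plain sh variant with globbing switched off around the split. The sample data and the variable names count and third are invented for illustration.

```shell
# Sample data (hypothetical): the last line contains a glob character.
data=$(printf '%s\n' "an" "apple of" "* star")

set -f            # disable pathname expansion so "*" stays literal
IFS='
'
set -- $data      # split on newlines only
unset IFS
set +f            # turn globbing back on

count=$#
third=$3
echo "$count"     # prints 3
echo "$third"     # prints * star
```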

Let's say:
cat file
an
apple of
one's eye
Use this while loop:
arr=()
while read -r l; do
[[ -n "$l" ]] && arr+=("$l")
done < file
TEST
set | grep arr
arr=([0]="an" [1]="apple of" [2]="one's eye")

Related

Creating an array of Strings from Grep Command

I'm pretty new to Linux and I've been trying some learning recently. One thing I'm struggling is Within a log file I would like to grep for all the unique IDs that exist and store them in an array.
The format of the ids are like so id=12345678,
I'm struggling though to get these in to an array. So far I've tried a range of things, the below however
a=($ (grep -HR1 `id=^[0-9]' logfile))
echo ${#a[@]}
but the echo count is always returned as 0. So it is clear the populating of the array is not working. Have explored other pages online, but nothing seems to have a clear explanation of what I am looking for exactly.
a=($(grep -Eow 'id=[0-9]+' logfile))
a=("${a[@]#id=}")
printf '%s\n' "${a[@]}"
It's safe to split an unquoted command substitution here, as we aren't printing pathname expansion characters (*?[]), or whitespace (other than the new lines which delimit the list).
If this were not the case, mapfile -t a < <(grep ...) is a good alternative.
-E is extended regex (for +)
-o prints only matching text
-w matches a whole word only
${a[@]#id=} strips the id= prefix from each array element
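Since the question asks for unique IDs, here is a sketch that adds sort -u to the pipeline and reads the result with mapfile (bash 4 or later assumed). The log contents are invented; only the id=NNN, format comes from the question.

```shell
# Hypothetical log contents; one ID appears twice.
log=$(mktemp)
printf '%s\n' 'GET /a id=12345678, ok' \
              'GET /b id=87654321, ok' \
              'GET /c id=12345678, ok' > "$log"

# -o prints one match per line; sort -u removes the duplicates.
mapfile -t ids < <(grep -Eo 'id=[0-9]+' "$log" | sort -u)
ids=("${ids[@]#id=}")     # strip the leading "id=" from each element

rm -f "$log"
echo "${#ids[@]}"         # prints 2
printf '%s\n' "${ids[@]}"
```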
Here is an example
my_array=()
while IFS= read -r line; do
my_array+=( "$line" )
done < <( ls )
echo ${#my_array[@]}
printf '%s\n' "${my_array[@]}"
It prints out 14 and then the names of the 14 files in the same folder. Just substitute your command for ls and you're set.
I suggest the readarray command, which makes sure each array element holds a full line.
readarray -t my_array < <(grep -Eo 'id=[0-9]+' logfile)
printf "%s\n" "${my_array[@]}"

split more than one matching output of awk '{print $1}' and store it in variables [duplicate]

I need to read the output of a command in my script into an array. The command is, for example:
ps aux | grep | grep | x
and it gives the output line by line like this:
10
20
30
I need to read the values from the command output into an array, and then I will do some work if the size of the array is less than three.
The other answers will break if output of command contains spaces (which is rather frequent) or glob characters like *, ?, [...].
To get the output of a command in an array, with one line per element, there are essentially 3 ways:
1. With Bash≥4, use mapfile; it's the most efficient:
mapfile -t my_array < <( my_command )
2. Otherwise, a loop reading the output (slower, but safe):
my_array=()
while IFS= read -r line; do
my_array+=( "$line" )
done < <( my_command )
3. As suggested by Charles Duffy in the comments (thanks!), the following might perform better than the loop method in number 2:
IFS=$'\n' read -r -d '' -a my_array < <( my_command && printf '\0' )
Please make sure you use exactly this form, i.e., make sure you have the following:
IFS=$'\n' on the same line as the read statement: this sets the environment variable IFS for the read statement only, so it won't affect the rest of your script at all. The purpose of this variable is to tell read to split the stream at the EOL character \n.
-r: this is important. It tells read to not interpret the backslashes as escape sequences.
-d '': please note the space between the -d option and its argument ''. If you don't leave a space here, the '' will never be seen, as it will disappear in the quote removal step when Bash parses the statement. This tells read to stop reading at the nil byte. Some people write it as -d $'\0', but it is not really necessary. -d '' is better.
-a my_array tells read to populate the array my_array while reading the stream.
You must use the printf '\0' statement after my_command, so that read returns 0; it's actually not a big deal if you don't (you'll just get a return code of 1, which is okay if you don't use set -e, which you shouldn't anyway), but just bear that in mind. It's cleaner and more semantically correct. Note that this is different from printf '', which doesn't output anything. printf '\0' prints a null byte, needed by read to happily stop reading there (remember the -d '' option?).
If you can, i.e., if you're sure your code will run on Bash≥4, use the first method. And you can see it's shorter too.
If you want to use read, the loop (method 2) might have an advantage over method 3 if you want to do some processing as the lines are read: you have direct access to it (via the $line variable in the example I gave), and you also have access to the lines already read (via the array ${my_array[@]} in the example I gave).
Note that mapfile provides a way to have a callback eval'd on each line read, and in fact you can even tell it to only call this callback every N lines read; have a look at help mapfile and the options -C and -c therein. (My opinion about this is that it's a little bit clunky, but can be used sometimes if you only have simple things to do — I don't really understand why this was even implemented in the first place!).
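For reference, a small sketch of the callback mechanism, assuming bash 4 or later. The function name count_line and the counter seen are made up for illustration. mapfile evaluates the callback in the current shell, so the counter is still set afterwards.

```shell
seen=0

# Callback: mapfile appends the index of the next element to be assigned
# (and, in current bash versions, the line itself) as arguments.
count_line() { seen=$((seen + 1)); }

# -c 1 invokes the callback after every line; -C names the callback.
mapfile -t -c 1 -C count_line my_array < <(printf '%s\n' "one two" "three four" "five")

echo "$seen"              # prints 3
echo "${#my_array[@]}"    # prints 3
```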
Now I'm going to tell you why the following method:
my_array=( $( my_command) )
is broken when there are spaces:
$ # I'm using this command to test:
$ echo "one two"; echo "three four"
one two
three four
$ # Now I'm going to use the broken method:
$ my_array=( $( echo "one two"; echo "three four" ) )
$ declare -p my_array
declare -a my_array='([0]="one" [1]="two" [2]="three" [3]="four")'
$ # As you can see, the fields are not the lines
$
$ # Now look at the correct method:
$ mapfile -t my_array < <(echo "one two"; echo "three four")
$ declare -p my_array
declare -a my_array='([0]="one two" [1]="three four")'
$ # Good!
Some people will then recommend using IFS=$'\n' to fix it:
$ IFS=$'\n'
$ my_array=( $(echo "one two"; echo "three four") )
$ declare -p my_array
declare -a my_array='([0]="one two" [1]="three four")'
$ # It works!
But now let's use another command, with globs:
$ echo "* one two"; echo "[three four]"
* one two
[three four]
$ IFS=$'\n'
$ my_array=( $(echo "* one two"; echo "[three four]") )
$ declare -p my_array
declare -a my_array='([0]="* one two" [1]="t")'
$ # What?
That's because I have a file called t in the current directory, and this filename is matched by the glob [three four]. At this point some people would recommend using set -f to disable globbing. But look at it: you have to change IFS and use set -f to be able to fix a broken technique (and you're not even really fixing it)! When doing that we're really fighting against the shell, not working with the shell.
$ mapfile -t my_array < <( echo "* one two"; echo "[three four]")
$ declare -p my_array
declare -a my_array='([0]="* one two" [1]="[three four]")'
Here we're working with the shell!
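To tie this back to the question, a sketch that reads the command output with mapfile and then checks whether the array holds fewer than three elements. The function my_command is a stand-in for the real pipeline, and the variable status is invented for illustration.

```shell
# Stand-in for the real command (hypothetical): prints one value per line.
my_command() { printf '%s\n' 10 20; }

mapfile -t my_array < <(my_command)

# Arithmetic test on the element count.
if (( ${#my_array[@]} < 3 )); then
  status="fewer than three values: ${#my_array[@]}"
else
  status="three or more values"
fi
echo "$status"    # prints fewer than three values: 2
```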
You can use
my_array=( $(<command>) )
to store the output of command <command> into the array my_array.
You can access the length of that array using
my_array_length=${#my_array[@]}
Now the length is stored in my_array_length.
Here is a simple example. Imagine that you want to put the file and directory names (under the current folder) into an array and count them. The script would look like this:
my_array=( `ls` )
my_array_length=${#my_array[@]}
echo $my_array_length
Or, you can iterate over this array by adding the following script:
for element in "${my_array[@]}"
do
echo "${element}"
done
Please note that this is the core concept and the input must be sanitized before processing, i.e. removing extra characters, handling empty strings, etc. (which is outside the topic of this thread).
This helps me all the time. Suppose you want to read the names of everything in the current directory into an array:
bucketlist=($(ls))
#then print them one by one
for bucket in "${bucketlist[@]}"; do
echo " here is bucket: ${bucket}"
done
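Both ls-based snippets split names on whitespace, so a file called "my file" becomes two elements. When the goal is only the entries of a directory, a glob avoids parsing ls altogether. A sketch in a throwaway directory; the file names and the array names from_ls and from_glob are invented.

```shell
dir=$(mktemp -d)
touch "$dir/my file" "$dir/other"

# Word-splitting the output of ls breaks "my file" into two elements.
from_ls=( $(ls "$dir") )

# A glob yields one element per entry, whatever characters the name contains.
from_glob=( "$dir"/* )
from_glob=( "${from_glob[@]##*/}" )   # drop the directory prefix

rm -rf "$dir"
echo "${#from_ls[@]}"     # prints 3
echo "${#from_glob[@]}"   # prints 2
```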

Why doesn't this split string expression give me an array?

I am wondering why this array expression in Bash doesn't give me an array. It just gives me the first element in the string:
IFS='\n' read -r -a POSSIBLE_ENCODINGS <<< $(iconv -l)
I want to try out all available encodings to see how reading different file encodings for a script in R works, and I am using this Bash-script to create text-files with all possible encodings:
#!/bin/bash
IFS='\n' read -r -a POSSIBLE_ENCODINGS <<< $(iconv -l)
echo "${POSSIBLE_ENCODINGS[@]}"
for CURRENT_ENCODING in "${POSSIBLE_ENCODINGS[@]}"
do
TRIMMED=$(echo $CURRENT_ENCODING | sed 's:/*$::')
iconv --verbose --from-code=UTF-8 --to-code="$TRIMMED" --output=encoded-${TRIMMED}.txt first_file.txt
echo "Current encoding: ${TRIMMED}"
echo "Output file:encoded-${TRIMMED}.txt"
done
EDIT: Code edited according to answers below:
#!/bin/bash
readarray -t possibleEncodings <<< "$(iconv -l)"
echo "${possibleEncodings[@]}"
for currentEncoding in "${possibleEncodings[@]}"
do
trimmedEncoding=$(echo $currentEncoding | sed 's:/*$::')
echo "Trimmed encoding: ${trimmedEncoding}"
iconv --verbose --from-code=UTF-8 --to-code="$trimmedEncoding" --output=encoded-${trimmedEncoding}.txt first_file.txt
echo "Current encoding: ${trimmedEncoding}"
echo "Output file:encoded-${trimmedEncoding}.txt"
done
You could just use readarray/mapfile instead, which is tailor-made for reading multi-line output into an array.
mapfile -t possibleEncodings < <(iconv -l)
The here-string is unnecessary when you can just run the command in a process substitution. The <() makes the command output appear as if it were a file for mapfile to read from.
As for why your original attempt didn't work: you call read only once, but there are still lines to read after the first one. You either need to read until EOF in a loop or use mapfile as above, which does the job for you.
As a side-note always use lowercase letters for user defined variable/array and function names. This lets you distinguish your variables from the shell's own environment variables which are upper-cased.
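Two details of the original attempt can be checked directly. Inside single quotes, \n is not a newline but a backslash followed by the letter n, and a single read stops after the first line whatever IFS is set to. A sketch, with list_encodings as a hypothetical stand-in for iconv -l:

```shell
literal='\n'      # two characters: a backslash and the letter n
newline=$'\n'     # one character: an actual newline
echo "${#literal} ${#newline}"              # prints 2 1

# Stand-in for "iconv -l" output (hypothetical sample).
list_encodings() { printf '%s\n' 'UTF-8//' 'UTF-16//' 'LATIN1//'; }

# A single read consumes only the first line...
read -r -a first_only < <(list_encodings)
# ...whereas mapfile keeps reading until end of input.
mapfile -t all_lines < <(list_encodings)

echo "${#first_only[@]} ${#all_lines[@]}"   # prints 1 3
```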
Because read reads only one line, the following while loop can be used:
arr=()
while read -r line; do
arr+=( "$line" )
done <<< "$(iconv -l)"
Otherwise, there is also the readarray builtin:
readarray -t arr <<< "$(iconv -l)"

Creating an array from a text file in Bash

A script takes a URL, parses it for the required fields, and redirects its output to be saved in a file, file.txt. The output is saved on a new line each time a field has been found.
file.txt
A Cat
A Dog
A Mouse
etc...
I want to take file.txt and create an array from it in a new script, where every line gets to be its own string variable in the array. So far I have tried:
#!/bin/bash
filename=file.txt
declare -a myArray
myArray=(`cat "$filename"`)
for (( i = 0 ; i < 9 ; i++))
do
echo "Element [$i]: ${myArray[$i]}"
done
When I run this script, whitespace results in words getting split and instead of getting
Desired output
Element [0]: A Cat
Element [1]: A Dog
etc...
I end up getting this:
Actual output
Element [0]: A
Element [1]: Cat
Element [2]: A
Element [3]: Dog
etc...
How can I adjust the loop below such that the entire string on each line will correspond one-to-one with each variable in the array?
Use the mapfile command:
mapfile -t myArray < file.txt
The error is using for -- the idiomatic way to loop over lines of a file is:
while IFS= read -r line; do echo ">>$line<<"; done < file.txt
See BashFAQ/005 for more details.
mapfile and readarray (which are synonymous) are available in Bash version 4 and above. If you have an older version of Bash, you can use a loop to read the file into an array:
arr=()
while IFS= read -r line; do
arr+=("$line")
done < file
In case the file has an incomplete (missing newline) last line, you could use this alternative:
arr=()
while IFS= read -r line || [[ "$line" ]]; do
arr+=("$line")
done < file
Related:
Need alternative to readarray/mapfile for script on older version of Bash
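The difference between the two loops shows up as soon as the last line lacks a trailing newline. A sketch using a temporary file; the array names plain and safe are made up for illustration.

```shell
file=$(mktemp)
printf 'A Cat\nA Dog\nA Mouse' > "$file"    # no newline after the last line

# Plain loop: read fails on the unterminated last line, so it is dropped.
plain=()
while IFS= read -r line; do
  plain+=("$line")
done < "$file"

# With the extra test, the partial line left in $line is still appended.
safe=()
while IFS= read -r line || [[ $line ]]; do
  safe+=("$line")
done < "$file"

rm -f "$file"
echo "${#plain[@]} ${#safe[@]}"   # prints 2 3
```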
You can do this too:
oldIFS="$IFS"
IFS=$'\n' arr=($(<file))
IFS="$oldIFS"
echo "${arr[1]}" # It will print `A Dog`.
Note:
Filename expansion still occurs. For example, if there's a line with a literal * it will expand to all the files in current folder. So use it only if your file is free of this kind of scenario.
Use mapfile or read -a
Always check your code using shellcheck. It will often give you the correct answer. In this case SC2207 covers reading a file that either has space separated or newline separated values into an array.
Don't do this
array=( $(mycommand) )
Files with values separated by newlines
mapfile -t array < <(mycommand)
Files with values separated by spaces
IFS=" " read -r -a array <<< "$(mycommand)"
The shellcheck page will give you the rationale why this is considered best practice.
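A short sketch of the space-separated case. Here mycommand is a placeholder function, and the sample output deliberately contains a * to show that read does not perform pathname expansion.

```shell
# Placeholder command (hypothetical): one line of space-separated values.
mycommand() { echo 'alpha beta *'; }

IFS=" " read -r -a array <<< "$(mycommand)"

echo "${#array[@]}"       # prints 3
echo "${array[2]}"        # prints *
```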
You can simply read each line from the file and assign it to an array.
#!/bin/bash
i=0
while IFS= read -r line
do
arr[$i]="$line"
i=$((i+1))
done < file.txt
This answer says to use
mapfile -t myArray < file.txt
I made a shim for mapfile if you want to use mapfile on bash < 4.x for whatever reason. It uses the existing mapfile command if you are on bash >= 4.x
Currently, only options -d and -t work. But that should be enough for that command above. I've only tested on macOS. On macOS Sierra 10.12.6, the system bash is 3.2.57(1)-release. So the shim can come in handy. You can also just update your bash with homebrew, build bash yourself, etc.
It uses this technique to set variables up one call stack.
Make sure to set the Internal Field Separator (IFS) variable to $'\n' so that it does not put each word into a new array entry.
#!/bin/bash
# move all 2020 - 2022 movies to /backup/movies
# put list into file 1 line per dir
# dirs are "movie name (year)/"
ls | grep -E '202[0-2]' > 2020_movies.txt
OLDIFS=${IFS}
IFS=$'\n' #fix separator
declare -a MOVIES # array for dir names
MOVIES=( $( cat 2020_movies.txt ) ) # load into array
for M in "${MOVIES[@]}" ; do
echo "[${M}]"
if [ -d "${M}" ] ; then # if dir name
mv -v "$M" /backup/movies/
fi
done
IFS=${OLDIFS} # restore standard separators
# not essential as IFS reverts when script ends
#END

Saving egrep output containing '*' to bash array

I would like to save egrep output to a bash array:
arr=( $(egrep -Rn 'regex') )
If there so happens to be a '*' in the egrep result, it appears like bash is expanding the '*' to be all files in current directory. And the expansion plus results of egrep are then saved into arr.
How do I fix this? I want the '*' in the grep results to be unaltered.
To use the idiom attempted in the question "correctly" might look something like this:
# DON'T DO THIS.
set -f # turn off globbing
IFS=$'\n' # word-split only on newlines
arr=( $(...) ) # populate array
unset IFS # return IFS to defaults (assuming it was there before)
set +f # turn globbing back on
Obviously, there's a lot of room to get this wrong and leave your shell in a state other than the way it started (What if your script had a different initial IFS value? What if this code is sourced from a script that wants globbing to be disabled to work correctly?). Don't do it.
One approach, compatible with bash 3.x, is to use read -a (reading into an array) with IFS (used to separate fields) containing a newline, and -d (used to separate records) set to a NUL:
IFS=$'\n' read -r -d '' -a arr < <(egrep -Rn 'regex' && printf '\0')
The trailing NUL added to the input is present to ensure that read exits successfully; otherwise, this could trigger an abrupt exit if using set -e.
A longer but more explicit approach is to do the iteration yourself:
arr=( )
while IFS= read -r; do
arr+=( "$REPLY" )
done < <(egrep -Rn 'regex')
Another, using bash 4.x features (readarray, AKA mapfile):
readarray -t arr < <(egrep -Rn 'regex')
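The effect can be reproduced in a scratch directory. This sketch uses grep -n '.' on an invented input file in place of the question's egrep -Rn 'regex'; the directory, file names and the array name broken are assumptions for illustration.

```shell
orig_dir=$PWD
dir=$(mktemp -d)
cd "$dir" || exit 1
touch a.txt b.txt                                # files a glob could match
printf '%s\n' 'x = y * z' 'plain line' > input

# Unquoted substitution: words are split and "*" is replaced by file names.
broken=( $(grep -n '.' input) )

# read -d '' with IFS set to a newline: every output line stays intact.
IFS=$'\n' read -r -d '' -a arr < <(grep -n '.' input && printf '\0')

cd "$orig_dir" || exit 1
rm -rf "$dir"
echo "${#broken[@]}"   # more than 2 elements
echo "${#arr[@]}"      # prints 2
echo "${arr[0]}"       # prints 1:x = y * z
```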
