Read file into array with empty lines - arrays

I'm using this code to load file into array in bash:
IFS=$'\n' read -d '' -r -a LINES < "$PAR1"
But unfortunately this code skips empty lines.
I tried the next code:
IFS=$'\n' read -r -a LINES < "$PAR1"
But this variant only loads one line.
How do I load file to array in bash, without skipping empty lines?
P.S. I check the number of loaded lines by the next command:
echo ${#LINES[@]}

You can use mapfile, available in Bash 4+:
mapfile -t lines < "$PAR1"

To avoid doing anything fancy, and stay compatible with all versions of bash in common use (as of this writing, Apple is shipping bash 3.2.x to avoid needing to comply with the GPLv3):
lines=( )
while IFS= read -r line; do
lines+=( "$line" )
done < "$PAR1"
See also BashFAQ #001.
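To see the difference concretely, here is a minimal sketch that compares the IFS-based read from the question with the loop above. The sample file contents and the names tmpfile, split_lines and loop_lines are made up for illustration.

```shell
#!/usr/bin/env bash
# Sample input with an empty line in the middle (hypothetical data).
tmpfile=$(mktemp)
printf 'first\n\nthird\n' > "$tmpfile"

# IFS-based splitting: newline is IFS whitespace, so consecutive
# newlines collapse and the empty line disappears.
IFS=$'\n' read -d '' -r -a split_lines < "$tmpfile"

# Loop-based reading: every line, including the empty one, becomes an element.
loop_lines=()
while IFS= read -r line; do
  loop_lines+=( "$line" )
done < "$tmpfile"

echo "read -d '': ${#split_lines[@]} elements"   # 2
echo "while loop: ${#loop_lines[@]} elements"    # 3
rm -f "$tmpfile"
```

On Bash 4+, mapfile -t behaves like the loop here and also keeps the empty line.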

Related

Why doesn't this split string expression give me an array?

I am wondering why this array expression in Bash doesn't give me an array. It just gives me the first element in the string:
IFS='\n' read -r -a POSSIBLE_ENCODINGS <<< $(iconv -l)
I want to try out all available encodings to see how reading different file encodings for a script in R works, and I am using this Bash-script to create text-files with all possible encodings:
#!/bin/bash
IFS='\n' read -r -a POSSIBLE_ENCODINGS <<< $(iconv -l)
echo "${POSSIBLE_ENCODINGS[@]}"
for CURRENT_ENCODING in "${POSSIBLE_ENCODINGS[@]}"
do
TRIMMED=$(echo $CURRENT_ENCODING | sed 's:/*$::')
iconv --verbose --from-code=UTF-8 --to-code="$TRIMMED" --output=encoded-${TRIMMED}.txt first_file.txt
echo "Current encoding: ${TRIMMED}"
echo "Output file:encoded-${TRIMMED}.txt"
done
EDIT: Code edited according to answers below:
#!/bin/bash
readarray -t possibleEncodings <<< "$(iconv -l)"
echo "${possibleEncodings[@]}"
for currentEncoding in "${possibleEncodings[@]}"
do
trimmedEncoding=$(echo $currentEncoding | sed 's:/*$::')
echo "Trimmed encoding: ${trimmedEncoding}"
iconv --verbose --from-code=UTF-8 --to-code="$trimmedEncoding" --output=encoded-${trimmedEncoding}.txt first_file.txt
echo "Current encoding: ${trimmedEncoding}"
echo "Output file:encoded-${trimmedEncoding}.txt"
done
You could just use readarray/mapfile instead, which are tailor-made for reading multi-line output into an array.
mapfile -t possibleEncodings < <(iconv -l)
The here-string is unnecessary when you can run the command in a process substitution. The <(...) construct presents the command's output as if it were a file for mapfile to read from.
As for why your original attempt didn't work: you call read only once, but there are still strings to read on the subsequent lines. You either need to read until EOF in a loop or use mapfile as above, which does the job for you.
As a side note, always use lowercase letters for user-defined variable, array, and function names. This lets you distinguish your variables from the shell's own environment variables, which are upper-cased.
Because read reads only one line, the following while loop can be used:
arr=()
while read -r line; do
arr+=( "$line" )
done <<< "$(iconv -l)"
Otherwise, there is also the readarray builtin:
readarray -t arr <<< "$(iconv -l)"
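The points above can be sketched as follows. The function list_encodings is a hypothetical stand-in for iconv -l so the example has predictable output. The last case illustrates a detail of the original attempt: inside single quotes, '\n' is a backslash followed by the letter n, not a newline.

```shell
#!/usr/bin/env bash
# list_encodings is a hypothetical stand-in for `iconv -l`.
list_encodings() { printf '%s\n' 'ASCII//' 'UTF-8//' 'LATIN1//'; }

# One read call consumes one line, whatever IFS is set to.
IFS=$'\n' read -r -a first_only < <(list_encodings)

# mapfile keeps reading until end of input.
mapfile -t all_encodings < <(list_encodings)

# IFS='\n' means "split on backslash or the letter n",
# so this word is split at every "n".
IFS='\n' read -r -a split_on_n <<< 'banana'

echo "single read: ${#first_only[@]} element(s)"     # 1
echo "mapfile:     ${#all_encodings[@]} element(s)"  # 3
echo "IFS='\\n':    ${#split_on_n[@]} element(s)"     # 3
```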

When do I set IFS to a newline in Bash?

I thought setting IFS to $'\n' would help me in reading an entire file into an array, as in:
IFS=$'\n' read -r -a array < file
However, the above command only reads the first line of the file into the first element of the array, and nothing else.
Even this reads only the first line into the array:
string=$'one\ntwo\nthree'
IFS=$'\n' read -r -a array <<< "$string"
I came across other posts on this site that talk about either using mapfile -t or a read loop to read a file into an array.
Now my question is: when do I use IFS=$'\n' at all?
You are a bit confused as to what IFS is. IFS is the Internal Field Separator used by bash to perform word-splitting to split lines into words after expansion. The default value is [ \t\n] (space, tab, newline).
By reassigning IFS=$'\n', you are removing the ' \t' and telling bash to only split words on newline characters (your thinking is correct). That has the effect of allowing some line with spaces to be read into a single array element without quoting.
Where your implementation fails is in your read -r -a array < file. The -a causes words in the line to be assigned to sequential array indexes. However, you have told bash to only break on a newline (which is the whole line). Since you only call read once, only one array index is filled.
You can either do:
while IFS=$'\n' read -r line; do
array+=( $line )
done < "$filename"
(which you could do without changing IFS if you simply quoted "$line")
Or using IFS=$'\n', you could do
IFS=$'\n'
array=( $(<filename) )
or finally, you could use readarray, which reads line by line and does not depend on IFS:
readarray array <filename
Try them and let me know if you have questions.
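As a small illustration of how IFS controls the splitting that read -a performs on a single line, consider the sketch below. The sample string and array names are invented for the example, and it assumes IFS still has its default value when the script starts.

```shell
#!/usr/bin/env bash
line='alpha beta:gamma delta'

read -r -a by_default <<< "$line"            # default IFS: split on whitespace
IFS=: read -r -a by_colon <<< "$line"        # split on colons only
IFS=$'\n' read -r -a by_newline <<< "$line"  # no newline inside the line: one field

echo "${#by_default[@]} ${#by_colon[@]} ${#by_newline[@]}"   # 3 2 1
```

In each case read still consumes exactly one line; IFS only decides how that line is cut into array elements.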
Your second try almost works, but you have to tell read that it should not just read until newline (the default behaviour), but for example until the null string:
$ IFS=$'\n' read -a arr -d '' <<< $'a b c\nd e f\ng h i'
$ declare -p arr
declare -a arr='([0]="a b c" [1]="d e f" [2]="g h i")'
But as you pointed out, mapfile/readarray is the way to go if you have it (requires Bash 4.0 or newer):
$ mapfile -t arr <<< $'a b c\nd e f\ng h i'
$ declare -p arr
declare -a arr='([0]="a b c" [1]="d e f" [2]="g h i")'
The -t option removes the newlines from each element.
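A quick way to see what -t does is to compare element lengths with and without it. This is only a sketch; the array names raw and trimmed are arbitrary.

```shell
#!/usr/bin/env bash
mapfile raw <<< $'a b c\nd e f'          # each element keeps its trailing newline
mapfile -t trimmed <<< $'a b c\nd e f'   # trailing newline stripped

echo "without -t: ${#raw[0]} characters"      # 6 ("a b c" plus newline)
echo "with -t:    ${#trimmed[0]} characters"  # 5
```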
As for when you'd want to use IFS=$'\n':
As just shown, if you want to read a file into an array, one line per element, your Bash is older than 4.0, and you don't want to use a loop
Some people promote using an IFS without a space to avoid unexpected side effects from word splitting; the proper approach in my opinion, though, is to understand word splitting and make sure to avoid it with proper quoting as desired.
I've seen IFS=$'\n' used in tab completion scripts, for example the one for cd in bash-completion: this script fiddles with paths and replaces colons with newlines, to then split them up using that IFS.

Creating an array from a text file in Bash

A script takes a URL, parses it for the required fields, and redirects its output to be saved in a file, file.txt. The output is saved on a new line each time a field has been found.
file.txt
A Cat
A Dog
A Mouse
etc...
I want to take file.txt and create an array from it in a new script, where every line gets to be its own string variable in the array. So far I have tried:
#!/bin/bash
filename=file.txt
declare -a myArray
myArray=(`cat "$filename"`)
for (( i = 0 ; i < 9 ; i++))
do
echo "Element [$i]: ${myArray[$i]}"
done
When I run this script, whitespace results in words getting split and instead of getting
Desired output
Element [0]: A Cat
Element [1]: A Dog
etc...
I end up getting this:
Actual output
Element [0]: A
Element [1]: Cat
Element [2]: A
Element [3]: Dog
etc...
How can I adjust the loop below such that the entire string on each line will correspond one-to-one with each variable in the array?
Use the mapfile command:
mapfile -t myArray < file.txt
The error is using for -- the idiomatic way to loop over lines of a file is:
while IFS= read -r line; do echo ">>$line<<"; done < file.txt
See BashFAQ/005 for more details.
mapfile and readarray (which are synonymous) are available in Bash version 4 and above. If you have an older version of Bash, you can use a loop to read the file into an array:
arr=()
while IFS= read -r line; do
arr+=("$line")
done < file
In case the file has an incomplete (missing newline) last line, you could use this alternative:
arr=()
while IFS= read -r line || [[ "$line" ]]; do
arr+=("$line")
done < file
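The effect of the extra test can be checked with a file whose last line has no trailing newline. The sketch below uses a temporary file and the made-up array names plain and guarded.

```shell
#!/usr/bin/env bash
tmpfile=$(mktemp)
printf 'A Cat\nA Dog' > "$tmpfile"   # no newline after "A Dog"

# Plain loop: read fails on the unterminated last line, so it is dropped.
plain=()
while IFS= read -r line; do
  plain+=("$line")
done < "$tmpfile"

# Guarded loop: read fails but has still filled $line, so the test keeps it.
guarded=()
while IFS= read -r line || [[ "$line" ]]; do
  guarded+=("$line")
done < "$tmpfile"

echo "plain:   ${#plain[@]} element(s)"     # 1
echo "guarded: ${#guarded[@]} element(s)"   # 2
rm -f "$tmpfile"
```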
Related:
Need alternative to readarray/mapfile for script on older version of Bash
You can do this too:
oldIFS="$IFS"
IFS=$'\n' arr=($(<file))
IFS="$oldIFS"
echo "${arr[1]}" # It will print `A Dog`.
Note:
Filename expansion still occurs. For example, if there's a line with a literal * it will expand to all the files in current folder. So use it only if your file is free of this kind of scenario.
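If you want to keep this approach anyway, one possible workaround is to switch off filename expansion with set -f around the assignment. The sketch below assumes a scratch directory containing two dummy files and an input file with a literal * line; all names are hypothetical.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
touch "$workdir/one.txt" "$workdir/two.txt"
printf 'A Cat\n*\n' > "$workdir/input.list"

cd "$workdir" || exit 1
old_ifs=$IFS
IFS=$'\n'
globbed=( $(<input.list) )   # "*" expands to the three files in the directory
set -f                       # turn off filename expansion
literal=( $(<input.list) )   # "*" stays a literal asterisk
set +f
IFS=$old_ifs
cd "$OLDPWD" || exit 1

echo "with globbing:    ${#globbed[@]} elements"   # 4
echo "without globbing: ${#literal[@]} elements"   # 2
rm -rf "$workdir"
```

mapfile or a read loop avoids the problem entirely, since neither performs filename expansion on the data.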
Use mapfile or read -a
Always check your code using shellcheck. It will often give you the correct answer. In this case SC2207 covers reading a file that either has space separated or newline separated values into an array.
Don't do this
array=( $(mycommand) )
Files with values separated by newlines
mapfile -t array < <(mycommand)
Files with values separated by spaces
IFS=" " read -r -a array <<< "$(mycommand)"
The shellcheck page will give you the rationale why this is considered best practice.
You can simply read each line from the file and assign it to an array.
#!/bin/bash
i=0
while IFS= read -r line
do
arr[$i]="$line"
i=$((i+1))
done < file.txt
This answer says to use
mapfile -t myArray < file.txt
I made a shim for mapfile if you want to use mapfile on bash < 4.x for whatever reason. It uses the existing mapfile command if you are on bash >= 4.x
Currently, only options -d and -t work. But that should be enough for that command above. I've only tested on macOS. On macOS Sierra 10.12.6, the system bash is 3.2.57(1)-release. So the shim can come in handy. You can also just update your bash with homebrew, build bash yourself, etc.
It uses this technique to set variables up one call stack.
Make sure to set the Internal Field Separator (IFS) variable to $'\n' so that it does not put each word into a new array entry.
#!/bin/bash
# move all 2020 - 2022 movies to /backup/movies
# put list into file 1 line per dir
# dirs are "movie name (year)/"
ls | egrep 202[0-2] > 2020_movies.txt
OLDIFS=${IFS}
IFS=$'\n' #fix separator
declare -a MOVIES # array for dir names
MOVIES=( $( cat "${1}" ) ) # load into array
for M in "${MOVIES[@]}" ; do
echo "[${M}]"
if [ -d "${M}" ] ; then # if dir name
mv -v "$M" /backup/movies/
fi
done
IFS=${OLDIFS} # restore standard separators
# not essential as IFS reverts when script ends
#END

Bash: read lines into an array *without* touching IFS

I'm trying to read the lines of output from a subshell into an array, and I'm not willing to set IFS because it's global. I don't want one part of the script to affect the following parts, because that's poor practice and I refuse to do it. Reverting IFS after the command is not an option because it's too much trouble to keep the reversion in the right place after editing the script. How can I explain to bash that I want each array element to contain an entire line, without having to set any global variables that will destroy future commands?
Here's an example showing the unwanted stickiness of IFS:
lines=($(egrep "^-o" speccmds.cmd))
echo "${#lines[@]} lines without IFS"
IFS=$'\r\n' lines=($(egrep "^-o" speccmds.cmd))
echo "${#lines[@]} lines with IFS"
lines=($(egrep "^-o" speccmds.cmd))
echo "${#lines[@]} lines without IFS?"
The output is:
42 lines without IFS
6 lines with IFS
6 lines without IFS?
This question is probably based on a misconception.
IFS=foo read does not change IFS outside of the read operation itself.
Thus, this would have side effects, and should be avoided:
IFS=
declare -a array
while read -r; do
array+=( "$REPLY" )
done < <(your-subshell-here)
...but this is perfectly side-effect free:
declare -a array
while IFS= read -r; do
array+=( "$REPLY" )
done < <(your-subshell-here)
With bash 4.0 or newer, there's also the option of readarray or mapfile (synonyms for the same operation):
mapfile -t array < <(your-subshell-here)
In examples later added to your question, you have code along the lines of:
lines=($(egrep "^-o" speccmds.cmd))
The better way to write this is:
mapfile -t lines < <(egrep "^-o" speccmds.cmd)
Are you trying to store the lines of the output in an array, or the words of each line?
lines
mapfile -t arrayname < <(your subshell)
This does not use IFS at all.
words
(your subshell) | while IFS=: read -ra words; do ...
The form var=value command args... puts the var variable into the environment of the command, and does not affect the current shell's environment.
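The difference between the two forms can be checked directly. In this sketch the variable names and sample data are invented; the second case mirrors the IFS=... lines=(...) line from the question, which is two plain assignments with no command and therefore does persist.

```shell
#!/usr/bin/env bash
saved_ifs=$IFS

# Assignment as a prefix to a command: IFS is changed only for that read.
IFS=: read -r -a fields <<< 'a:b:c'
[[ $IFS == "$saved_ifs" ]] && prefix_result="unchanged" || prefix_result="changed"

# Two assignments on one line and no command: both stay set in the shell.
IFS=$'\r\n' lines=( $(printf 'x y\nz\n') )
[[ $IFS == "$saved_ifs" ]] && plain_result="unchanged" || plain_result="changed"

IFS=$saved_ifs   # restore for the rest of the script
echo "after 'IFS=: read ...':      IFS $prefix_result"
echo "after 'IFS=... lines=(...)': IFS $plain_result"
```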

How to write an array ignoring space characters in shell scripting?

I have a text file, say test.text, which contains the following:
an
apple of
one's eye
I want to read these lines in an array using shell scripting by doing a cat test.text. I have tried using a=(`cat test.text`), but that doesn't work as it considers space as a delimiter. I need the values as a[0]=an , a[1]=apple of , a[2]=one's eye. I don't want to use IFS. Need help, thanks in advance..!!
In bash 4 or later
readarray a < test.text
This will include an empty element for each blank line, so you might want to remove the empty lines from the input file first.
In earlier versions, you'll need to build the array manually.
a=()
while read; do a+=("$REPLY"); done < test.text
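If blank lines should be skipped without editing the input file, one option is to filter them on the way in. The following sketch assumes grep -v '^$' as the filter and uses a temporary file with made-up contents.

```shell
#!/usr/bin/env bash
tmpfile=$(mktemp)
printf '%s\n' 'an' '' 'apple of' '' "one's eye" > "$tmpfile"

# Every line, blank or not, becomes an element.
readarray -t with_blanks < "$tmpfile"

# Drop empty lines before they reach readarray.
readarray -t without_blanks < <(grep -v '^$' "$tmpfile")

echo "unfiltered: ${#with_blanks[@]} elements"     # 5
echo "filtered:   ${#without_blanks[@]} elements"  # 3
rm -f "$tmpfile"
```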
One of the options you have is to use bash's read: set IFS to a newline and the line delimiter to NUL.
IFS=$'\n' read -d $'\0' -a a < test.txt
Plain sh
IFS='
'
set -- $(< test.txt)
unset IFS
echo "$1"
echo "$2"
echo "$@"
bash
IFS=$'\n' a=($(< test.txt))
echo "${a[0]}"
echo "${a[1]}"
echo "${a[@]}"
I'm inclined to say these are the best of the available solutions because they do not involve looping.
Let's say:
cat file
an
apple of
one's eye
Use this while loop:
arr=()
while read -r l; do
[[ -n "$l" ]] && arr+=("$l")
done < file
TEST
set | grep arr
arr=([0]="an" [1]="apple of" [2]="one's eye")

Resources