When do I set IFS to a newline in Bash? - arrays

I thought setting IFS to $'\n' would help me in reading an entire file into an array, as in:
IFS=$'\n' read -r -a array < file
However, the above command only reads the first line of the file into the first element of the array, and nothing else.
Even this reads only the first line into the array:
string=$'one\ntwo\nthree'
IFS=$'\n' read -r -a array <<< "$string"
I came across other posts on this site that talk about either using mapfile -t or a read loop to read a file into an array.
Now my question is: when do I use IFS=$'\n' at all?

You are a bit confused as to what IFS is. IFS is the Internal Field Separator used by bash to perform word-splitting to split lines into words after expansion. The default value is [ \t\n] (space, tab, newline).
By reassigning IFS=$'\n', you are removing the space and tab and telling bash to split words only on newline characters (your thinking is correct). That has the effect of allowing a line containing spaces to be read into a single array element without quoting.
Where your implementation fails is in read -r -a array < file. The -a causes the words of the input to be assigned to sequential array indexes. However, read stops at the first newline (its default delimiter), and you have told bash to split fields only on newlines, so the whole first line becomes a single field. Since you only call read once, only one array index is filled.
You can either do:
while read -r line; do
array+=( "$line" )
done < "$filename"
(quoting "$line" is what prevents word-splitting here; an IFS=$'\n' prefix on read would only affect read itself, not the expansion of $line in the loop body)
Or, using IFS=$'\n', you could do
IFS=$'\n'
array=( $(<filename) )
(note that this still performs pathname expansion on each line; run set -f first if lines may contain *, ? or [)
or finally, you could leave IFS alone and use readarray (-t strips the trailing newline from each element):
readarray -t array < filename
Try them and let me know if you have questions.
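To make the comparison concrete, here is a small self-contained sketch (bash 4+ assumed for readarray; the temp file and its contents are made up for illustration) showing the loop and readarray producing the same array from a file whose lines contain spaces:

```shell
#!/usr/bin/env bash
# Sketch: two ways to load a file into an array, one line per element.
tmp=$(mktemp)
printf '%s\n' 'one one' 'two' 'three three' > "$tmp"

# 1. Portable loop; quoting "$line" means no IFS changes are needed:
array1=()
while IFS= read -r line; do
    array1+=( "$line" )
done < "$tmp"

# 2. readarray (bash 4+); -t strips the trailing newline from each element:
readarray -t array2 < "$tmp"

echo "${#array1[@]} ${#array2[@]}"   # -> 3 3
printf '[%s]\n' "${array2[@]}"
rm -f "$tmp"
```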

Your second try almost works, but you have to tell read that it should not just read until newline (the default behaviour), but for example until the null string:
$ IFS=$'\n' read -r -a arr -d '' <<< $'a b c\nd e f\ng h i'
$ declare -p arr
declare -a arr='([0]="a b c" [1]="d e f" [2]="g h i")'
But as you pointed out, mapfile/readarray is the way to go if you have it (requires Bash 4.0 or newer):
$ mapfile -t arr <<< $'a b c\nd e f\ng h i'
$ declare -p arr
declare -a arr='([0]="a b c" [1]="d e f" [2]="g h i")'
The -t option removes the newlines from each element.
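A quick way to see what -t actually does is to run mapfile with and without it on the same here-string and inspect the elements (a minimal sketch, bash 4+ assumed):

```shell
#!/usr/bin/env bash
# Sketch: mapfile keeps the trailing newline in each element unless -t is given.
mapfile arr_raw <<< $'a b c\nd e f'
mapfile -t arr_trim <<< $'a b c\nd e f'

declare -p arr_raw    # elements end in a literal newline
declare -p arr_trim   # newlines stripped

[[ ${arr_raw[0]} == $'a b c\n' ]] && echo "raw keeps newline"
[[ ${arr_trim[0]} == 'a b c' ]] && echo "-t strips it"
```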
As for when you'd want to use IFS=$'\n':
As just shown, if you want to read a file into an array, one line per element, your Bash is older than 4.0, and you don't want to use a loop
Some people promote using an IFS without a space to avoid unexpected side effects from word splitting; the proper approach in my opinion, though, is to understand word splitting and make sure to avoid it with proper quoting as desired.
I've seen IFS=$'\n' used in tab completion scripts, for example the one for cd in bash-completion: this script fiddles with paths and replaces colons with newlines, to then split them up using that IFS.

Related

Creating an array of Strings from Grep Command

I'm pretty new to Linux and I've been doing some learning recently. One thing I'm struggling with: within a log file I would like to grep for all the unique IDs that exist and store them in an array.
The format of the IDs is like so: id=12345678,
I'm struggling, though, to get these into an array. So far I've tried a range of things; the below, however:
a=($(grep -HR1 'id=^[0-9]' logfile))
echo ${#a[@]}
always returns an echo count of 0, so it is clear the populating of the array is not working. I have explored other pages online, but nothing seems to have a clear explanation of what I am looking for exactly.
a=($(grep -Eow 'id=[0-9]+' logfile))
a=("${a[@]#id=}")
printf '%s\n' "${a[@]}"
It's safe to split the unquoted command substitution here, as the matches contain no pathname expansion characters (*, ?, [ ]) or whitespace (other than the newlines which delimit the list).
If this were not the case, mapfile -t a < <(grep ...) is a good alternative.
-E is extended regex (for +)
-o prints only matching text
-w matches a whole word only
${a[@]#id=} strips the id= prefix from each array element
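Put together, the pipeline above can be tried against a small made-up logfile (the file contents here are illustrative, not from the question):

```shell
#!/usr/bin/env bash
# Sketch: extract id=NNNN tokens into an array, then strip the id= prefix.
tmp=$(mktemp)
printf '%s\n' 'start id=12345678, ok' 'other id=87654321, ok' > "$tmp"

a=($(grep -Eow 'id=[0-9]+' "$tmp"))   # safe: matches contain no spaces or globs
a=("${a[@]#id=}")                     # strip the id= prefix from each element

echo "${#a[@]}"           # -> 2
printf '%s\n' "${a[@]}"
rm -f "$tmp"
```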
Here is an example
my_array=()
while IFS= read -r line; do
my_array+=( "$line" )
done < <( ls )
echo ${#my_array[@]}
printf '%s\n' "${my_array[@]}"
It prints out 14 and then the names of the 14 files in the same folder. Just substitute your command for ls and you're set.
Suggesting the readarray command, to make sure the array holds full lines. (Note that the question's pattern id=^[0-9] will never match, since ^ is literal in the middle of a pattern; an extended regex does the job:)
readarray -t my_array < <(grep -oE 'id=[0-9]+' logfile)
printf "%s\n" "${my_array[@]}"

Read odd lines of a text file and add to an array in bash

I have a text file with the following lines of text:
https://i.imgur.com/9iLSS35.png
Delete page: https://imgur.com/delete/SbmZQDht9Xk2NbW
https://i.imgur.com/9iLSS336.png
Delete page: https://imgur.com/delete/SbmZQDht9Xk2NbW
https://i.imgur.com/9iLSS37.png
Delete page: https://imgur.com/delete/SbmZQDht9Xk2NbW
I need to read the odd lines and add them to an array in bash. I have tried the following code, but it generates a variable and not an array as I need.
A=$(sed -n 1~2p ./upimagen.txt)
How can I read the odd lines of the text file and add each line to an array and then access separately?
The result would be:
${A[0]}
https://i.imgur.com/9iLSS35.png
${A[1]}
https://i.imgur.com/9iLSS336.png
...
Array assignment requires an outer set of parentheses.
A=($(sed -n 1~2p ./upimagen.txt))
This will split on all whitespace, though, not just newlines. It will also expand any globs that use * or ?. It's better to use readarray plus process substitution to ensure that each line is preserved as is:
readarray -t A < <(sed -n 1~2p ./upimagen.txt)
Just a slight variation on John's answer using awk. Using either readarray or the synonymous mapfile reading from a process substitution is probably your best bet, e.g.
readarray -t A < <(awk 'FNR % 2 {print $1}' ./upimagen.txt)
While there is nothing wrong with the direct initialization using command substitution, e.g. A=( $(...) ), if your files are large, or if you only need part of the range of lines, or say you have a custom function you want to run on every X number of elements read, mapfile/readarray offers a wide set of features over straight command substitution.
Using awk :
readarray -t A < <(awk 'NR%2' ./upimagen.txt)
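As a quick check of the NR%2 idiom: awk treats a nonzero expression as true and prints the line, so odd-numbered lines (NR%2 == 1) are printed. A minimal sketch with made-up contents standing in for upimagen.txt:

```shell
#!/usr/bin/env bash
# Sketch: NR%2 is 1 (true) on odd-numbered lines, so awk prints only those.
tmp=$(mktemp)
printf '%s\n' url1 del1 url2 del2 url3 del3 > "$tmp"

readarray -t A < <(awk 'NR%2' "$tmp")

echo "${#A[@]}"   # -> 3
echo "${A[1]}"    # -> url2
rm -f "$tmp"
```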

Read file into array with empty lines

I'm using this code to load file into array in bash:
IFS=$'\n' read -d '' -r -a LINES < "$PAR1"
But unfortunately this code skips empty lines.
I tried the next code:
IFS=$'\n' read -r -a LINES < "$PAR1"
But this variant only loads one line.
How do I load file to array in bash, without skipping empty lines?
P.S. I check the number of loaded lines by the next command:
echo ${#LINES[@]}
You can use mapfile available in BASH 4+
mapfile -t lines < "$PAR1"
To avoid doing anything fancy, and stay compatible with all versions of bash in common use (as of this writing, Apple is shipping bash 3.2.x to avoid needing to comply with the GPLv3):
lines=( )
while IFS= read -r line; do
lines+=( "$line" )
done
See also BashFAQ #001.
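The reason the read -d '' variant drops empty lines, while mapfile keeps them, can be demonstrated directly: newline is an IFS whitespace character, and runs of IFS whitespace count as a single separator. A minimal sketch (mapfile requires bash 4+):

```shell
#!/usr/bin/env bash
# Sketch: mapfile preserves empty lines; IFS=$'\n' read -a collapses them,
# because consecutive newlines (IFS whitespace) count as one separator.
tmp=$(mktemp)
printf 'a\n\nb\n' > "$tmp"

IFS=$'\n' read -d '' -r -a viaread < "$tmp"
mapfile -t viamap < "$tmp"

echo "${#viaread[@]}"   # -> 2 (empty line lost)
echo "${#viamap[@]}"    # -> 3 (empty line kept)
rm -f "$tmp"
```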

Bash: read lines into an array *without* touching IFS

I'm trying to read the lines of output from a subshell into an array, and I'm not willing to set IFS because it's global. I don't want one part of the script to affect the following parts, because that's poor practice and I refuse to do it. Reverting IFS after the command is not an option because it's too much trouble to keep the reversion in the right place after editing the script. How can I explain to bash that I want each array element to contain an entire line, without having to set any global variables that will destroy future commands?
Here's an example showing the unwanted stickiness of IFS:
lines=($(egrep "^-o" speccmds.cmd))
echo "${#lines[@]} lines without IFS"
IFS=$'\r\n' lines=($(egrep "^-o" speccmds.cmd))
echo "${#lines[@]} lines with IFS"
lines=($(egrep "^-o" speccmds.cmd))
echo "${#lines[@]} lines without IFS?"
The output is:
42 lines without IFS
6 lines with IFS
6 lines without IFS?
This question is probably based on a misconception.
IFS=foo read does not change IFS outside of the read operation itself.
Thus, this would have side effects, and should be avoided:
IFS=
declare -a array
while read -r; do
array+=( "$REPLY" )
done < <(your-subshell-here)
...but this is perfectly side-effect free:
declare -a array
while IFS= read -r; do
array+=( "$REPLY" )
done < <(your-subshell-here)
With bash 4.0 or newer, there's also the option of readarray or mapfile (synonyms for the same operation):
mapfile -t array < <(your-subshell-here)
In examples later added to your answer, you have code along the lines of:
lines=($(egrep "^-o" speccmds.cmd))
The better way to write this is:
mapfile -t lines < <(egrep "^-o" speccmds.cmd)
Are you trying to store the lines of the output in an array, or the words of each line?
lines
mapfile -t arrayname < <(your subshell)
This does not use IFS at all.
words
(your subshell) | while IFS=: read -ra words; do ...
The form var=value command args... puts the var variable into the environment of the command, and does not affect the current shell's environment.
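That scoping rule is easy to verify: set IFS to its default, use an IFS=: prefix on a single read, and confirm IFS afterwards is untouched (a minimal sketch; the variable names are illustrative):

```shell
#!/usr/bin/env bash
# Sketch: a var=value prefix scopes the assignment to that one command only.
IFS=$' \t\n'                       # the default value
IFS=: read -ra fields <<< "a:b:c"  # IFS=: applies only to this read

declare -p fields                  # three elements: a, b, c
[[ $IFS == $' \t\n' ]] && echo "IFS unchanged"
```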

How to create an array from the lines of a command's output

I have a file called failedfiles.txt with the following content:
failed1
failed2
failed3
I need to use grep to return the content on each line in that file, and save the output in a list to be accessed. So I want something like this:
temp_list=$(grep "[a-z]" failedfiles.txt)
However, the problem with this is that when I type
echo ${temp_list[0]}
I get the following output:
failed1 failed2 failed3
But what I want is when I do:
echo ${temp_list[0]}
to print
failed1
and when I do:
echo ${temp_list[1]}
to print
failed2
Thanks.
@devnull's helpful answer explains why your code didn't work as expected: command substitution always returns a single string (possibly composed of multiple lines).
However, simply putting (...) around a command substitution to create an array of lines will only work as expected if the lines output by the command do not have embedded spaces - otherwise, each individual (whitespace-separated) word will become its own array element.
Capturing command output lines at once, in an array:
To capture the lines output by an arbitrary command in an array, use the following:
bash < 4 (e.g., on OSX as of OS X 10.9.2): use read -a
IFS=$'\n' read -rd '' -a linesArray <<<"$(grep "[a-z]" failedfiles.txt)"
bash >= 4: use readarray:
readarray -t linesArray <<<"$(grep "[a-z]" failedfiles.txt)"
Note:
<<< initiates a so-called here-string, which pipes the string to its right (which happens to be the result of a command substitution here) into the command on the left via stdin.
While command <<< string is functionally equivalent to echo string | command in principle, the crucial difference is that the latter creates subshells, which make variable assignments in command pointless - they are localized to each subshell.
An alternative to combining here-strings with command substitution is [input] process substitution - <(...) - which, simply put, allows using a command's output as if it were an input file; the equivalent of <<<"$(command)" is < <(command).
read: -a reads into an array, and IFS=$'\n' ensures that every line is considered a separate field and thus read into its own array element; -d '' ensures that ALL lines are read at once (before breaking them into fields); -r turns off interpretation of escape sequences in the input.
readarray (also callable as mapfile) directly breaks input lines into an array of lines; -t ensures that the terminating \n is NOT included in the array elements.
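Both variants can be run side by side on the question's sample data to confirm they produce the same array (a minimal sketch; the read form works on bash 3.x, readarray needs 4+):

```shell
#!/usr/bin/env bash
# Sketch: pre-4.0 read -d '' approach vs. readarray, on the same input.
input=$'failed1\nfailed2\nfailed3'

IFS=$'\n' read -rd '' -a viaread <<< "$input"
readarray -t viamap <<< "$input"

declare -p viaread viamap            # both hold failed1, failed2, failed3
echo "${viaread[1]} ${viamap[2]}"    # -> failed2 failed3
```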
Looping over command output lines:
If there is no need to capture all lines in an array at once and looping over a command's output line by line is sufficient, use the following:
while IFS= read -r line; do
# ...
done < <(grep "[a-z]" failedfiles.txt)
IFS= ensures that each line is read unmodified in terms of whitespace; remove it to have leading and trailing whitespace trimmed.
-r ensures that the lines are read 'raw' in that substrings in the input that look like escape sequences - e.g., \t - are NOT interpreted as such.
Note the use of [input] process substitution (explained above) to provide the command output as input to the read loop.
You did not create an array. What you did was command substitution, which simply puts the output of a command into a variable.
In order to create an array, say:
temp_list=( $(grep "[a-z]" failedfiles.txt) )
You might also want to refer to Guide on Arrays.
The proper and portable way to loop over lines in a file is simply
while read -r line; do
... something with "$line"
done <failedfiles.txt