Assuming I have an output/file:
1,a,info
2,b,inf
3,c,in
I want to run a while loop with read:
while read r; do
    echo "$r"
    # extract line to $arr as array separated by ','
    # call some program (e.g. md5sum, echo ...) on one item of arr
done <<HEREDOC
1,a,info
2,b,inf
3,c,in
HEREDOC
I would like to use readarray and while, but compelling alternatives are welcome too.
There is a specific way to have readarray (mapfile) behave correctly with process substitution, but I keep forgetting it. This is intended as a Q&A, so an explanation would be nice.
Since compelling alternatives are welcome too and assuming you're just trying to populate arr one line at a time:
$ cat tst.sh
#!/usr/bin/env bash
while IFS=',' read -r -a arr; do
    # extract line to $arr as array separated by ','
    # echo the first item of arr
    echo "${arr[0]}"
done <<HEREDOC
1,a,info
2,b,inf
3,c,in
HEREDOC
$ ./tst.sh
1
2
3
or if you also need each whole input line in a separate variable r:
$ cat tst.sh
#!/usr/bin/env bash
while IFS= read -r r; do
    # extract line to $arr as array separated by ','
    # echo the first item of arr
    IFS=',' read -r -a arr <<< "$r"
    echo "${arr[0]}"
done <<HEREDOC
1,a,info
2,b,inf
3,c,in
HEREDOC
$ ./tst.sh
1
2
3
But bear in mind "Why is using a shell loop to process text considered bad practice?" anyway.
readarray (mapfile) and read -a disambiguation
First, readarray is a synonym for mapfile:
help readarray
readarray: readarray [-d delim] [-n count] [-O origin] [-s count] [-t] [-u fd] [-C callback] [-c quantum] [array]
Read lines from a file into an array variable.
A synonym for `mapfile'.
Then
help mapfile
mapfile: mapfile [-d delim] [-n count] [-O origin] [-s count] [-t] [-u fd] [-C callback] [-c quantum] [array]
Read lines from the standard input into an indexed array variable.
Read lines from the standard input into the indexed array variable ARRAY, or
from file descriptor FD if the -u option is supplied. The variable MAPFILE
is the default ARRAY.
Options:
-d delim Use DELIM to terminate lines, instead of newline
-n count Copy at most COUNT lines. If COUNT is 0, all lines are copied
-O origin Begin assigning to ARRAY at index ORIGIN. The default index is 0
-s count Discard the first COUNT lines read
-t Remove a trailing DELIM from each line read (default newline)
-u fd Read lines from file descriptor FD instead of the standard input
-C callback Evaluate CALLBACK each time QUANTUM lines are read
-c quantum Specify the number of lines read between each call to
CALLBACK
...
Then read -a:
help read
read: read [-ers] [-a array] [-d delim] [-i text] [-n nchars] [-N nchars] [-p prompt] [-t timeout] [-u fd] [name ...]
Read a line from the standard input and split it into fields.
Reads a single line from the standard input, or from file descriptor FD
if the -u option is supplied. The line is split into fields as with word
splitting, and the first word is assigned to the first NAME, the second
word to the second NAME, and so on, with any leftover words assigned to
the last NAME. Only the characters found in $IFS are recognized as word
delimiters.
...
Options:
-a array assign the words read to sequential indices of the array
variable ARRAY, starting at zero
...
Note:
Only the characters found in $IFS are recognized as word delimiters.
Useful with the -a flag!
Create an array from a split string
To create an array by splitting a string, you could either:
IFS=, read -ra myArray <<<'A,1,spaced string,42'
declare -p myArray
declare -a myArray=([0]="A" [1]="1" [2]="spaced string" [3]="42")
Or use mapfile; but as this command is intended to work on whole files, the syntax is somewhat counter-intuitive:
mapfile -td, myArray < <(printf %s 'A,1,spaced string,42')
declare -p myArray
declare -a myArray=([0]="A" [1]="1" [2]="spaced string" [3]="42")
Or, if you want to avoid the fork (the < <(printf ...) part), you have to:
mapfile -td, myArray <<<'A,1,spaced string,42'
myArray[-1]=${myArray[-1]%$'\n'}
declare -p myArray
declare -a myArray=([0]="A" [1]="1" [2]="spaced string" [3]="42")
This will be a little quicker, but not more readable...
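If you want to verify the speed claim yourself, here is a rough micro-benchmark sketch (absolute numbers will vary by machine, and the loop counts are arbitrary):
# compare one fork per iteration vs. none; 'a' is a throwaway array name
time for i in {1..5000}; do
    mapfile -td, a < <(printf %s 'A,1,spaced string,42')
done
time for i in {1..5000}; do
    mapfile -td, a <<<'A,1,spaced string,42'
    a[-1]=${a[-1]%$'\n'}
done
The here-string version avoids one fork per iteration, which is where the small speedup comes from.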
For your sample:
mapfile -t rows <<HEREDOC
1,a,info
2,b,inf
3,c,in
HEREDOC
for row in "${rows[@]}"; do
    IFS=, read -r -a cols <<<"$row"
    declare -p cols
done
declare -a cols=([0]="1" [1]="a" [2]="info")
declare -a cols=([0]="2" [1]="b" [2]="inf")
declare -a cols=([0]="3" [1]="c" [2]="in")
for row in "${rows[@]}"; do
    IFS=, read -r -a cols <<<"$row"
    printf ' %s | %s\n' "${cols[0]}" "${cols[2]}"
done
1 | info
2 | inf
3 | in
Or even, if you really want to use readarray:
for row in "${rows[@]}"; do
    readarray -td, cols <<<"$row"
    cols[-1]=${cols[-1]%$'\n'}
    declare -p cols
done
declare -a cols=([0]="1" [1]="a" [2]="info")
declare -a cols=([0]="2" [1]="b" [2]="inf")
declare -a cols=([0]="3" [1]="c" [2]="in")
Playing with the callback option:
(Added some spaces on last line)
testfunc() {
    local IFS array cnt line
    read -r cnt line <<< "$@"
    IFS=,
    read -r -a array <<< "$line"
    printf ' [%3d]: %3s | %3s :: %s\n' "$cnt" "${array[@]}"
}
mapfile -t -C testfunc -c 1 <<HEREDOC
1,a,info
2,b,inf
3,c d,in fo
HEREDOC
 [  0]:   1 |   a :: info
 [  1]:   2 |   b :: inf
 [  2]:   3 | c d :: in fo
Same, with -u flag:
Open the file descriptor:
exec {mydoc}<<HEREDOC
1,a,info
2,b,inf
3,c d,in fo
HEREDOC
Then
mapfile -u $mydoc -C testfunc -c 1
 [  0]:   1 |   a :: info
 [  1]:   2 |   b :: inf
 [  2]:   3 | c d :: in fo
And finally close the file descriptor:
exec {mydoc}<&-
About the bash csv module:
For further information about enable -f /path/to/csv csv, RFCs and limitations, have a look at my previous post on How to parse a CSV file in Bash?
If the loadable builtin csv is available/acceptable, something like:
help csv
csv: csv [-a ARRAY] string
Read comma-separated fields from a string.
Parse STRING, a line of comma-separated values, into individual fields,
and store them into the indexed array ARRAYNAME starting at index 0.
If ARRAYNAME is not supplied, "CSV" is the default array name.
The script:
#!/usr/bin/env bash
enable csv || exit
while IFS= read -r line && csv -a arr "$line"; do
    printf '%s\n' "${arr[0]}"
done <<HEREDOC
1,a,info
2,b,inf
3,c,in
HEREDOC
See help enable.
With bash 5.2+ there is a default path for the loadables in config-top.h, which should be configurable at compile time:
BASH_LOADABLES_PATH
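A minimal sketch of loading it via that path, assuming the loadables are installed under /usr/lib/bash (the actual directory varies by distribution; check yours):
# hypothetical install path; adjust for your distribution
BASH_LOADABLES_PATH=/usr/lib/bash
enable -f csv csv || exit
csv -a fields '1,"a,b",info'    # the quoted field keeps its embedded comma
declare -p fields
# declare -a fields=([0]="1" [1]="a,b" [2]="info")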
The solution is readarray -t -d, arr < <(printf "%s," "$r")
The special part is < <(...), because of how readarray reads its input.
There is no proper explanation to be found of why it first needs a redirection arrow and then process substitution, neither in the tldp page on process substitution nor on SS64.
My final understanding is that <(...) opens a named pipe and exposes it under a filename, and readarray waits for it to close. By putting that in place of a file behind <, it is handled by bash as ordinary file input and (anonymously) piped into stdin, while readarray itself runs in the current shell, so the array survives.
Example:
while IFS= read -r r; do
    echo "$r"
    readarray -t -d, arr < <(printf "%s," "$r")
    echo "${arr[0]}"
done <<HEREDOC
1,a,info
2,b,inf
3,c,in
HEREDOC
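For contrast, the naive pipe silently fails: readarray then runs in a subshell, so the array never reaches the main shell (which is what the lastpipe discussion below addresses):
unset arr
printf '%s,' '1,a,info' | readarray -t -d, arr
declare -p arr    # bash: declare: arr: not found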
Anyway, this is just a reminder for myself, because I keep forgetting, and readarray is the only place where I actually need this.
The question was also answered mostly here, here (on why the pipe isn't working) and somewhat here, but those posts are difficult to find and the reasoning is hard to comprehend.
For example, the shopt -s lastpipe solution is not clear at first, but it turns out that in bash every element of a pipeline normally executes in a subshell rather than in the main shell, so state changes such as variable assignments have no effect on the rest of the program. This option changes that behavior so the last pipeline element executes in the main shell (except in an interactive shell, where job control is active):
shopt -s lastpipe
while IFS= read -r r; do
    echo "$r"
    printf "%s," "$r" | readarray -t -d, arr
    echo "${arr[0]}"
done <<HEREDOC
1,a,info
2,b,inf
3,c,in
HEREDOC
One alternative to lastpipe would be to do all the work in the subshell:
while IFS= read -r r; do
    echo "$r"
    printf "%s," "$r" | {
        readarray -t -d, arr
        echo "${arr[0]}"
    }
done <<HEREDOC
1,a,info
2,b,inf
3,c,in
HEREDOC
With the following code:
string="[Git status]^functionGitStatus"
IFS='^' read -r -a array <<< "$string"
echo "${array[#]}"
echo "size: '${#array[#]}'"
for e in "${array[#]}"; do
echo "'$e'"
done
It works as expected and shows:
[Git status] functionGitStatus
size: '2'
'[Git status]'
'functionGitStatus'
I did some research, and for example at
How to split a string into an array in Bash?
it is possible to use ", " (comma with empty space).
If I try to accomplish the same approach:
string="[Git status]^ functionGitStatus"
IFS='^ ' read -r -a array <<< "$string"
echo "${array[#]}"
echo "size: '${#array[#]}'"
for e in "${array[#]}"; do
echo "'$e'"
done
I got:
[Git status] functionGitStatus
size: '3'
'[Git'
'status]'
'functionGitStatus'
Sadly not as expected (even though there is only one occurrence of ^).
I want to know if it is possible to use a complete word/term as a separator, for example:
string="[Git status]-fn:-functionGitStatus"
IFS='-fn:-' read -r -a array <<< "$string"
echo "${array[#]}"
echo "size: '${#array[#]}'"
for e in "${array[#]}"; do
echo "'$e'"
done
But it shows:
[Git status] u ctio GitStatus
size: '9'
'[Git status]'
''
''
''
''
''
'u'
'ctio'
'GitStatus'
It seems this is not possible, or perhaps there is a special flag to interpret the complete word/term. If it is not possible, what other function would help in this scenario?
With bash parameter expansion and mapfile:
$ string='[Git status]-fn:-functionGitStatus'
$ mapfile -t array <<< "${string//-fn:-/$'\n'}"
$ echo "${array[#]}"
[Git status] functionGitStatus
$ echo "size: '${#array[#]}'"
2
$ for e in "${array[#]}"; do echo "'$e'"; done
'[Git status]'
'functionGitStatus'
Explanation: "${string//-fn:-/$'\n'}" is a bash parameter expansion with substitution: all (because of // instead of /) -fn:- substrings in string are replaced by $'\n', that is, a newline (the $'...' syntax is documented in the QUOTING section of the bash manual). <<< is the here-string redirection operator; it "feeds" the mapfile command with the result of the parameter expansion. mapfile -t array (also documented in the bash manual) stores its input in the bash array named array, one line per cell, removing the trailing newline character (-t option). Note that IFS alone cannot do this: each character in IFS acts as an independent single-character delimiter, never as one multi-character separator.
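As a minimal illustration of the two substitution forms (s is just a throwaway example string):
s='one-two-one'
echo "${s/one/1}"     # 1-two-one   (single / : first match only)
echo "${s//one/1}"    # 1-two-1     (double // : all matches)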
Can somebody help me out? I want to split TEXT (a variable with \n in it) into an array in bash.
OK, I have some text variable:
variable='13423exa*lkco3nr*sw
kjenve*kejnv'
I want to split it into an array.
If the variable did not have a newline in it, I would do it with:
IFS='*' read -a array <<< "$variable"
I assumed the third element should be:
echo "${array[2]}"
>sw
>kjenve
But with the newline it is not working. Please point me in the right direction.
Use readarray.
$ variable='13423exa*lkco3nr*sw
kjenve*kejnv'
$ readarray -d '*' -t arr < <(printf "%s" "$variable")
$ declare -p arr
declare -a arr=([0]="13423exa" [1]="lkco3nr" [2]=$'sw\nkjenve' [3]="kejnv")
mapfile: -d: invalid option
Update bash, then use readarray.
If you can't, replace the separator with a zero byte and read it element by element with read -d '':
arr=()
while IFS= read -d '' -r e || [[ -n "$e" ]]; do
    arr+=("$e")
done < <(printf "%s" "$variable" | tr '*' '\0')
declare -p arr
declare -a arr=([0]="13423exa" [1]="lkco3nr" [2]=$'sw\nkjenve' [3]="kejnv")
You can use the readarray command, as in the following example:
readarray -d ':' -t my_array <<< "a:b:c:d:"
for (( i = 0; i < ${#my_array[*]}; i++ )); do
    echo "${my_array[i]}"
done
Here the -d parameter defines the delimiter and -t asks to remove the trailing delimiter from each element.
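To see what -t buys you, here is the same call without it: each element keeps its terminating delimiter, and the final newline from the here-string becomes an element of its own:
readarray -d ':' my_array <<< "a:b:c:d:"
declare -p my_array
# declare -a my_array=([0]="a:" [1]="b:" [2]="c:" [3]="d:" [4]=$'\n')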
Use an ending character different from newline:
end=.
IFS='*' read -r -d "$end" -a array <<< "$variable$end"
Of course this solution supposes there is at least one character not used in your input variable.
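A quick check of the trick against the question's variable, using declare -p to show the result:
variable='13423exa*lkco3nr*sw
kjenve*kejnv'
end=.
IFS='*' read -r -d "$end" -a array <<< "$variable$end"
declare -p array
# declare -a array=([0]="13423exa" [1]="lkco3nr" [2]=$'sw\nkjenve' [3]="kejnv")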
I'd like to either process one row of a csv file or the whole file.
The variables are set by the header row, which may be in any order.
There may be up to 12 columns, but only 3 or 4 variables are needed.
The source files might be in either format, and all I want from both is lastname and country. I know of many different ways and tools to do it if the columns were fixed and always in the same order. But they're not.
examplesource.csv:
firstname,lastname,country
Linus,Torvalds,Finland
Linus,van Pelt,USA
examplesource2.csv:
lastname,age,country
Torvalds,66,Finland
van Pelt,7,USA
I have cobbled together something from various Stackoverflow postings which looks a bit voodoo but seems fairly robust. I say "voodoo" because shellcheck complains that, for example, "firstname is referenced but not assigned". And yet it prints it.
#!/bin/bash
#set the field separator to newline
IFS=$'\n'
#split/transpose the first-line column titles to rows
COLUMNAMES=$(head -n1 examplesource.csv | tr ',' '\n')
#set an array and read the columns into it
columns=()
for line in $COLUMNAMES; do
    columns+=("$line")
done
#reset the field separator
IFS=","
#using -p here to debug in output
declare -ap columns
#read from line 2 onwards
sed 1d examplesource.csv | while read "${columns[@]}"; do
    echo "${firstname} ${lastname} is from ${country}"
done
In the case of looping through everything, it works perfectly for my needs and I can process within the "while read" loop. But to make it cleaner, I'd rather pass the current element(?) to an external function to process (not just echo).
And if I only wanted the array (current row) belonging to "Torvalds", I cannot find how to access that or even get its current index, e.g.: "if $wantedname && $lastname == $wantedname then call function with currentrow only, otherwise loop all rows and call function".
I know there aren't multidimensional associative arrays in bash from reading
Multidimensional associative arrays in Bash and I've tried to understand arrays from
https://opensource.com/article/18/5/you-dont-know-bash-intro-bash-arrays
Is it clear what I'm trying to achieve in a bash-only manner and does the question make sense?
Many thanks.
Let's shorten your function. Don't read the source twice (first with head, then with sed); you can do that in one pass. Also, the whole array-reading block can be shortened to just IFS=',' COLUMNAMES=($(head -n1 source.csv)). Here's a shorter version:
#!/bin/bash
cat examplesource.csv |
{
    IFS=',' read -r -a columnnames
    while IFS=',' read -r "${columnnames[@]}"; do
        echo "${firstname} ${lastname} is from ${country}"
    done
}
If you want to parse both files at the same time, i.e. join them, nothing simpler ;). First, let's number the lines in the first file using nl -w1 -s,. Then we use join to join the files on the people's names. Remember that join input needs to be sorted on the proper fields. Then we sort the output with sort using the line number from the first file. After that we can read all the data just like before:
# join the files, using `,` as the separator
# on the 3rd field from the first file and the first field from the second file
# the output should be first the fields from the first file, then the second file
# the country (field 1.4) is duplicated in 2.3, so just omitting it.
join -t, -13 -21 -o 1.1,1.2,1.3,2.2,2.3 <(
# number the lines in the first file
<examplesource.csv nl -w1 -s, |
# there is one field more, sort using the 3rd field
sort -t, -k3
) <(
# sort the second file using the first field
<examplesource2.csv sort -t, -k1
) |
# sort the output using the numbers from the first file
sort -t, -k1 -n |
# well, remove the numbers
cut -d, -f2- |
# just a normal read follows
{
    # read the headers
    IFS=, read -r -a names
    while IFS=, read -r "${names[@]}"; do
        # finally our output!
        echo "${firstname} ${lastname} is from ${country} and is so many ${age} years old!"
    done
}
Tested on tutorialspoint.
GNU Awk has multidimensional arrays. It also has array sorting mechanisms, which I have not used here. Please comment if you are interested in pursuing this solution further. The following depends on consistent key names and line numbers across input files, but can handle an arbitrary number of fields and input files.
$ gawk -V |gawk NR==1
GNU Awk 4.1.4, API: 1.1 (GNU MPFR 3.1.5, GNU MP 6.1.2)
$ gawk -F, '
FNR == 1 {for(f=1;f<=NF;f++) Key[f]=$f}
FNR != 1 {for(f=1;f<=NF;f++) People[FNR][Key[f]]=$f}
END {
    for(Person in People) {
        for(attribute in People[Person])
            output = output FS People[Person][attribute]
        print substr(output,2)
        output=""
    }
}
' file*
66,Finland,Linus,Torvalds
7,USA,Linus,van Pelt
A bash solution takes a bit more work than an awk solution, but if this is an exercise in what bash provides, bash has all you need to determine the column holding the last name from the first line of input and then output the lastname field from the remaining lines.
An easy approach is simply to read each line into a normal array and then loop over the elements of the first line to locate the column "lastname" appears in, saving that column index in a variable. You can then read each of the remaining lines the same way and output the lastname field by outputting the element at the saved column.
A short example would be:
#!/bin/bash

col=0   ## column index for lastname
cnt=0   ## line count

while IFS=',' read -r -a arr; do                    ## read each line into array
    if [ "$cnt" -eq '0' ]; then                     ## test if line count is zero
        for ((i = 0; i < "${#arr[@]}"; i++)); do    ## loop to find lastname
            [ "${arr[i]}" = 'lastname' ] &&         ## test for lastname
                { col=$i; break; }                  ## if found, save column, break loop
        done
    fi
    [ "$cnt" -gt '0' ] &&                           ## if not header row
        echo "line $cnt lastname: ${arr[col]}"      ## output lastname field
    ((cnt++))                                       ## increment line count
done < "$1"
Example Use/Output
Using your two data files, the output would be:
$ bash readcsv.sh ex1.csv
line 1 lastname: Torvalds
line 2 lastname: van Pelt
$ bash readcsv.sh ex2.csv
line 1 lastname: Torvalds
line 2 lastname: van Pelt
A similar implementation using awk would be:
awk -F, '
NR == 1 {
    for (i = 1; i <= NF; i++)
        if ($i == "lastname") { col = i; break }
    next
}
NR > 1 {
    print "lastname: ", $col
}
' ex1.csv
Example Use/Output
$ awk -F, 'NR == 1 { for (i = 1; i <= NF; i++) if ($i == "lastname") { col = i; break }; next } NR > 1 { print "lastname: ", $col }' ex1.csv
lastname: Torvalds
lastname: van Pelt
(output is the same for either file)
Thank you all. I've taken a couple of bits from two answers:
I used the answer from David to find the number of the row, then I used the elegantly simple solution from Kamil to loop through what I need.
The result is exactly what I wanted. Thank you all.
$ readexample.sh examplesource.csv "Torvalds"
Everyone
Linus Torvalds is from Finland
Linus van Pelt is from USA
now just Torvalds
Linus Torvalds is from Finland
And this is the code - now that you know what I want it to do, if anyone can see any dangers or improvements, please let me know as I'm always learning. Thanks.
#!/bin/bash
FILENAME="$1"
WANTED="$2"
printDetails() {
    SINGLEROW="$1"
    [[ -n "$SINGLEROW" ]] && opt=("--expression" "1p" "--expression" "${SINGLEROW}p") || opt=("--expression" "1p" "--expression" "2,199p")
    sed -n "${opt[@]}" "$FILENAME" |
    {
        IFS=',' read -r -a columnnames
        while IFS=',' read -r "${columnnames[@]}"; do
            echo "${firstname} ${lastname} is from ${country}"
        done
    }
}
findRow() {
    col=0   ## column index for lastname
    cnt=0   ## line count
    while IFS=',' read -r -a arr; do                    ## read each line into array
        if [ "$cnt" -eq '0' ]; then                     ## test if line count is zero
            for ((i = 0; i < "${#arr[@]}"; i++)); do    ## loop to find lastname
                [ "${arr[i]}" = 'lastname' ] &&         ## test for lastname
                    { col=$i; break; }                  ## if found, save column, break loop
            done
        fi
        [ "$cnt" -gt '0' ] &&                           ## if not header row
            if [ "${arr[col]}" == "$1" ]; then
                echo "$cnt"                             ## output the matching line number
            fi
        ((cnt++))                                       ## increment line count
    done <"$FILENAME"
}
echo "Everyone"
printDetails
if [ ! -z "${WANTED}" ]; then
echo -e "\nnow just ${WANTED}"
row=$(findRow "${WANTED}")
printDetails "$((row + 1))"
fi
I have a file from which I extract the first three columns using the cut command and write them into an array.
When I check the length of the array, it is giving me four. I need the array to have only 3 elements.
I think it's taking whitespace as the delimiter between array elements.
aaa|111|ADAM|1222|aauu
aaa|222|MIKE ALLEN|5678|gggg
aaa|333|JOE|1222|eeeee
target=($(cut -d '|' -f1-3 sample_file2.txt| sort -u ))
In bash 4 or later, use readarray with process substitution to populate the array. As is, your code cannot distinguish the whitespace separating each line of the output from the whitespace occurring in "MIKE ALLEN". The readarray command puts each line of the input into a separate array element.
readarray -t target < <(cut -d '|' -f1-3 sample_file2.txt| sort -u)
Prior to bash 4, you need a loop to read each line individually to assign to the array.
while IFS='' read -r line; do
    target+=("$line")
done < <(cut -d '|' -f1-3 sample_file2.txt | sort -u)
This should work:
IFS=$'\n' target=($(cut -d '|' -f1-3 sample_file2.txt| sort -u ))
Example:
#!/bin/bash
IFS=$'\n' target=($(cut -d '|' -f1-3 sample_file2.txt| sort -u ))
echo "${#target[@]}"
echo "${target[1]}"
Output:
3
aaa|222|MIKE ALLEN
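One caveat worth knowing about this approach: since there is no command name on that line, the IFS assignment is an ordinary assignment and persists in the current shell afterwards. A small sketch of saving and restoring it, if that matters:
oldIFS=$IFS
IFS=$'\n' target=($(cut -d '|' -f1-3 sample_file2.txt | sort -u))
IFS=$oldIFS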
As an alternative, using the infamous eval:
eval target=($(cut -sd '|' -f1-3 sample_file2.txt | sort -u | \
xargs -d\\n printf "'%s'\n"))