Fill a bash array from a NUL separated input

I want to create a bash array from a NUL separated input (from stdin).
Here's an example:
## Let's define this for clarity
$ hd() { hexdump -v -e '/1 "%02X "'; echo ;}
$ echo -en "A B\0C\nD\0E\0" | hd
41 20 42 00 43 0A 44 00 45 00
So this is my input.
Now, reading NUL-delimited values works fine as long as I don't use read's -a option:
$ while read -r -d '' v; do echo -n "$v" | hd; done < <(echo -en "A B\0C\nD\0E\0")
41 20 42
43 0A 44
45
We get the correct values. But I can't store these values using -a:
$ read -r -d '' -a arr < <(echo -en "A B\0C\nD\0E\0")
$ declare -p arr
declare -a arr='([0]="A" [1]="B")'
Which is obviously not what I wanted. I would like to have:
$ declare -p arr
declare -a arr='([0]="A B" [1]="C
D" [2]="E")'
Is there a way to do this with read -a, and if it can't work, why not? Do you know a simple way to do this (avoiding a while loop)?

read -a is the wrong tool for the job, as you've noticed; it only supports non-NUL delimiters. The appropriate technique is given in BashFAQ #1:
arr=()
while IFS= read -r -d '' entry; do
  arr+=( "$entry" )
done
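Applied to the sample stream, this yields exactly the desired array. A quick check (declare -p formatting varies slightly across bash versions; newer ones print embedded newlines with $'...' quoting):
$ arr=(); while IFS= read -r -d '' entry; do arr+=( "$entry" ); done < <(echo -en "A B\0C\nD\0E\0")
$ declare -p arr
declare -a arr=([0]="A B" [1]=$'C\nD' [2]="E")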
In terms of why read -d '' -a is the wrong tool: -d gives read an argument to use to determine when to stop reading entirely, rather than when to stop reading a single element.
Consider:
while IFS=$'\t' read -d $'\n' words; do
  ...
done
...this will read words separated by tab characters, until it reaches a newline. Thus, even with read -a, using -d '' will read until it reaches a NUL.
What you want, to read until no more content is available and split elements on NULs, is not a -d of NUL but no end delimiter at all, together with field splitting on NUL. read's usage currently makes neither available: there is no way to say "no delimiter", and IFS (like any bash string) cannot contain a NUL byte anyway.
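That underlying limitation is easy to demonstrate: bash strings are NUL-terminated, so a NUL byte cannot even be stored in IFS. A minimal check:
$ IFS=$'\0'; printf '%s' "$IFS" | wc -c
0
The $'\0' is silently truncated to an empty string before the assignment ever happens.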

bash-4.4-alpha added a -d option to mapfile:
The `mapfile' builtin now has a -d option to use an arbitrary character as the record delimiter, and a -t option to strip the delimiter as supplied with -d.
— https://tiswww.case.edu/php/chet/bash/CHANGES
Using this, we can simply write:
mapfile -t -d '' arr < <(echo -en "A B\0C\nD\0E\0")
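A quick check on bash 4.4 or newer (newer bash prints the embedded newline with $'...' quoting):
$ declare -p arr
declare -a arr=([0]="A B" [1]=$'C\nD' [2]="E")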

If anyone wonders, here's the function (using while) that I use to store values from a NUL-separated stdin:
read_array () {
  local var i value
  var="$1"
  i=0
  while read -r -d '' value; do
    printf -v "$var[$i]" "%s" "$value"
    i=$((i + 1))
  done
}
It can then be used quite cleanly:
$ read_array arr < <(echo -en "A B\0C\nD\0E\0")
$ declare -p arr
declare -a arr='([0]="A B" [1]="C
D" [2]="E")'

Here's a simplification of vaab's function. It uses bash 4.3's nameref feature:
read_array () {
  local -n a=$1
  while read -r -d '' value; do
    a+=("$value")
  done
}
Test:
test_it () {
  local -a arr
  read_array arr < <(echo -en "A B\0C\nD\0E\0")
  declare -p arr
}
test_it


read csv output into an array and process the variable in a loop using bash [duplicate]

Assuming I have an output/file:
1,a,info
2,b,inf
3,c,in
I want to run a while loop with read:
while read r ; do
  echo "$r";
  # extract line to $arr as array separated by ','
  # call some program (e.g. md5sum, echo ...) on one item of arr
done <<HEREDOC
1,a,info
2,b,inf
3,c,in
HEREDOC
I would like to use readarray and while, but compelling alternatives are welcome too.
There is a specific way to have readarray (mapfile) behave correctly with process substitution, but I keep forgetting it. This is intended as a Q&A, so an explanation would be nice.
Since compelling alternatives are welcome too and assuming you're just trying to populate arr one line at a time:
$ cat tst.sh
#!/usr/bin/env bash
while IFS=',' read -r -a arr ; do
  # extract line to $arr as array separated by ','
  # echo the first item of arr
  echo "${arr[0]}"
done <<HEREDOC
1,a,info
2,b,inf
3,c,in
HEREDOC
$ ./tst.sh
1
2
3
or if you also need each whole input line in a separate variable r:
$ cat tst.sh
#!/usr/bin/env bash
while IFS= read -r r ; do
  # extract line to $arr as array separated by ','
  # echo the first item of arr
  IFS=',' read -r -a arr <<< "$r"
  echo "${arr[0]}"
done <<HEREDOC
1,a,info
2,b,inf
3,c,in
HEREDOC
$ ./tst.sh
1
2
3
but bear in mind why using a shell loop to process text is considered bad practice anyway.
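For contrast, the same first-field extraction with no shell loop at all, along the lines the linked post recommends (a sketch; awk does the comma splitting itself):
$ awk -F',' '{print $1}' <<HEREDOC
1,a,info
2,b,inf
3,c,in
HEREDOC
1
2
3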
readarray (mapfile) and read -a disambiguation
First, readarray == mapfile:
help readarray
readarray: readarray [-d delim] [-n count] [-O origin] [-s count] [-t] [-u fd] [-C callback] [-c quantum] [array]
Read lines from a file into an array variable.
A synonym for `mapfile'.
Then
help mapfile
mapfile: mapfile [-d delim] [-n count] [-O origin] [-s count] [-t] [-u fd] [-C callback] [-c quantum] [array]
Read lines from the standard input into an indexed array variable.
Read lines from the standard input into the indexed array variable ARRAY, or
from file descriptor FD if the -u option is supplied. The variable MAPFILE
is the default ARRAY.
Options:
-d delim Use DELIM to terminate lines, instead of newline
-n count Copy at most COUNT lines. If COUNT is 0, all lines are copied
-O origin Begin assigning to ARRAY at index ORIGIN. The default index is 0
-s count Discard the first COUNT lines read
-t Remove a trailing DELIM from each line read (default newline)
-u fd Read lines from file descriptor FD instead of the standard input
-C callback Evaluate CALLBACK each time QUANTUM lines are read
-c quantum Specify the number of lines read between each call to
CALLBACK
...
While read -a:
help read
read: read [-ers] [-a array] [-d delim] [-i text] [-n nchars] [-N nchars] [-p prompt] [-t timeout] [-u fd] [name ...]
Read a line from the standard input and split it into fields.
Reads a single line from the standard input, or from file descriptor FD
if the -u option is supplied. The line is split into fields as with word
splitting, and the first word is assigned to the first NAME, the second
word to the second NAME, and so on, with any leftover words assigned to
the last NAME. Only the characters found in $IFS are recognized as word
delimiters.
...
Options:
-a array assign the words read to sequential indices of the array
variable ARRAY, starting at zero
...
Note:
Only the characters found in $IFS are recognized as word delimiters.
Useful with the -a flag!
Create an array from a split string
For creating an array by splitting a string you could either:
IFS=, read -ra myArray <<<'A,1,spaced string,42'
declare -p myArray
declare -a myArray=([0]="A" [1]="1" [2]="spaced string" [3]="42")
Or use mapfile; but as this command is intended to work on whole files, the syntax is somewhat counter-intuitive:
mapfile -td, myArray < <(printf %s 'A,1,spaced string,42')
declare -p myArray
declare -a myArray=([0]="A" [1]="1" [2]="spaced string" [3]="42")
Or, if you want to avoid the fork (< <(printf ...)), you have to strip the trailing newline yourself:
mapfile -td, myArray <<<'A,1,spaced string,42'
myArray[-1]=${myArray[-1]%$'\n'}
declare -p myArray
declare -a myArray=([0]="A" [1]="1" [2]="spaced string" [3]="42")
This will be a little quicker, but not more readable...
For your sample:
mapfile -t rows <<HEREDOC
1,a,info
2,b,inf
3,c,in
HEREDOC
for row in "${rows[@]}"; do
  IFS=, read -a cols <<<"$row"
  declare -p cols
done
declare -a cols=([0]="1" [1]="a" [2]="info")
declare -a cols=([0]="2" [1]="b" [2]="inf")
declare -a cols=([0]="3" [1]="c" [2]="in")
for row in "${rows[@]}"; do
  IFS=, read -a cols <<<"$row"
  printf ' %s | %s\n' "${cols[0]}" "${cols[2]}"
done
1 | info
2 | inf
3 | in
Or even, if really you want to use readarray:
for row in "${rows[@]}"; do
  readarray -td, cols <<<"$row"
  cols[-1]=${cols[-1]%$'\n'}
  declare -p cols
done
declare -a cols=([0]="1" [1]="a" [2]="info")
declare -a cols=([0]="2" [1]="b" [2]="inf")
declare -a cols=([0]="3" [1]="c" [2]="in")
Playing with the callback option (some spaces added on the last line):
testfunc() {
  local IFS array cnt line
  read cnt line <<< "$@"
  IFS=,
  read -a array <<< "$line"
  printf ' [%3d]: %3s | %3s :: %s\n' $cnt "${array[@]}"
}
mapfile -t -C testfunc -c 1 <<HEREDOC
1,a,info
2,b,inf
3,c d,in fo
HEREDOC
 [  0]:   1 |   a :: info
 [  1]:   2 |   b :: inf
 [  2]:   3 | c d :: in fo
Same, with the -u flag:
Open the file descriptor:
exec {mydoc}<<HEREDOC
1,a,info
2,b,inf
3,c d,in fo
HEREDOC
Then
mapfile -u $mydoc -C testfunc -c 1
 [  0]:   1 |   a :: info
 [  1]:   2 |   b :: inf
 [  2]:   3 | c d :: in fo
And finally close the file descriptor:
exec {mydoc}<&-
About the bash csv loadable module:
For further information about enable -f /path/to/csv csv, RFCs and limitations, have a look at my previous post about How to parse a CSV file in Bash?
If the loadable builtin csv is available/acceptable, something like:
help csv
csv: csv [-a ARRAY] string
Read comma-separated fields from a string.
Parse STRING, a line of comma-separated values, into individual fields,
and store them into the indexed array ARRAYNAME starting at index 0.
If ARRAYNAME is not supplied, "CSV" is the default array name.
The script:
#!/usr/bin/env bash
enable csv || exit
while IFS= read -r line && csv -a arr "$line"; do
  printf '%s\n' "${arr[0]}"
done <<HEREDOC
1,a,info
2,b,inf
3,c,in
HEREDOC
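Expected output, the first field of each row:
1
2
3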
See help enable
With bash 5.2+ there is a default path for the loadables, BASH_LOADABLES_PATH, set in config-top.h, which should be configurable at compile time.
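For example, on older bash you give enable the full path, while with BASH_LOADABLES_PATH set a bare name is enough. The directory below is an assumption and varies by distribution (Debian-based systems ship loadables under /usr/lib/bash):
BASH_LOADABLES_PATH=/usr/lib/bash    # hypothetical location, adjust for your system
enable -f csv csv                    # bash searches BASH_LOADABLES_PATH for the shared object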
The solution is readarray -t -d, arr < <(printf "%s," "$r")
The special part is < <(...), because readarray offers no obvious hint that it needs it: there is no proper reason to be found why it first needs a redirection arrow and then process substitution, neither in the tldp page on process substitution nor on SS64.
My final understanding is that <(...) opens a named pipe and readarray waits for it to close. By moving this in place of a file behind <, it is handled by bash as file input and (anonymously) piped into stdin.
Example:
while read -r r ; do
  echo "$r"
  readarray -t -d, arr < <(printf "%s," "$r")
  echo "${arr[0]}"
done <<HEREDOC
1,a,info
2,b,inf
3,c,in
HEREDOC
Anyway, this is just a reminder for myself, because I keep forgetting, and readarray is the only place where I actually need this.
The question was also answered mostly here, here (why the pipe isn't working) and somewhat here, but they are difficult to find and the reasoning is hard to comprehend.
For example, the shopt -s lastpipe solution is not clear at first, but it turns out that in bash the elements of a pipeline normally do not execute in the main shell, so state changes made there have no effect on the rest of the program. This option changes the behavior so that the last pipeline element executes in the main shell (except in an interactive shell):
shopt -s lastpipe
while read -r r ; do
  echo "$r"
  printf "%s," "$r" | readarray -t -d, arr
  echo "${arr[0]}"
done <<HEREDOC
1,a,info
2,b,inf
3,c,in
HEREDOC
One alternative to lastpipe would be to do all the activity in the subshell:
while read -r r ; do
  echo "$r"
  printf "%s," "$r" | {
    readarray -t -d, arr
    echo "${arr[0]}"
  }
done <<HEREDOC
1,a,info
2,b,inf
3,c,in
HEREDOC

Reading csv string into bash array

The following function uses awk to convert a CSV line to multiple lines. I can then assign the output to an array to be able to access the fields.
function csv_to_lines() {
  echo $@ | awk '
    BEGIN {FPAT = "([^,]*)|(\"[^\"]+\")";}
    {for(i=1; i<=NF; i++) {printf("%s\n", $i)}}'
}
line='A,B,"C,D",E'
arr=($(csv_to_lines $line))
printf '%s,' "${arr[@]}"
However, this doesn't work for empty fields. For example:
line='A,,,,,"C,D",E'
arr=($(csv_to_lines $line))
printf '%s,' "${arr[@]}"
Outputs
A,"C,D",E,
But I expected
A,,,,,"C,D",E,
Evidently, all empty lines are ignored when assigning to the array. How do I create an array that keeps the empty lines?
Current code:
$ line='A,,,,,"C,D",E'
$ csv_to_lines $line
A
"C,D"
E
Looking at the actual characters generated we see:
$ csv_to_lines $line | od -c
0000000 A \n \n \n \n \n " C , D " \n E \n
0000016
As is, the arr=(...) is going to split this data on whitespace and store the printable characters in the array, effectively doing the same as:
$ arr=(A
"C,D"
E)
$ typeset -p arr
declare -a arr=([0]="A" [1]="C,D" [2]="E")
$ printf '%s,' "${arr[@]}"
A,"C,D",E,
A couple of ideas for storing the 'blank lines' in the array:
Use mapfile to read each line into the array, eg:
$ mapfile -t arr < <(csv_to_lines $line)
$ typeset -p arr
declare -a arr=([0]="A" [1]="" [2]="" [3]="" [4]="" [5]="\"C,D\"" [6]="E")
Or have awk use something other than \n as a delimiter, then define a custom IFS to parse the function results into the array, eg:
$ function csv_to_lines() { echo $@ | awk '
    BEGIN {FPAT = "([^,]*)|(\"[^\"]+\")";}
    {for(i=1; i<=NF; i++) {printf("%s|", $i)}}'; }
$ csv_to_lines $line
A|||||"C,D"|E|
$ IFS='|' arr=($(csv_to_lines $line))
$ typeset -p arr
declare -a arr=([0]="A" [1]="" [2]="" [3]="" [4]="" [5]="\"C,D\"" [6]="E")
Both of these lead to:
$ printf '%s,' "${arr[@]}"
A,,,,,"C,D",E,

Bash. Split text to array by delimiter [duplicate]

Can somebody help me out? I want to split TEXT (a variable with \n) into an array in bash.
Ok, I have some text-variable:
variable='13423exa*lkco3nr*sw
kjenve*kejnv'
I want to split it in array.
If the variable did not have a newline in it, I would do it by:
IFS='*' read -a array <<< "$variable"
I assumed the third element should be:
$ echo "${array[2]}"
sw
kjenve
But with a newline it is not working. Please point me in the right direction.
Use readarray.
$ variable='13423exa*lkco3nr*sw
kjenve*kejnv'
$ readarray -d '*' -t arr < <(printf "%s" "$variable")
$ declare -p arr
declare -a arr=([0]="13423exa" [1]="lkco3nr" [2]=$'sw\nkjenve' [3]="kejnv")
mapfile: -d: invalid option
Update bash, then use readarray.
If you can't, replace the separator with a zero byte and read it element by element with read -d '':
arr=()
while IFS= read -d '' -r e || [[ -n "$e" ]]; do
  arr+=("$e")
done < <(printf "%s" "$variable" | tr '*' '\0')
declare -p arr
declare -a arr=([0]="13423exa" [1]="lkco3nr" [2]=$'sw\nkjenve' [3]="kejnv")
You can use the readarray command like in the following example:
readarray -d ':' -t my_array <<< "a:b:c:d:"
for (( i = 0; i < ${#my_array[*]}; i++ )); do
  echo "${my_array[i]}"
done
Where the -d parameter defines the delimiter and -t asks to remove the trailing delimiter.
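Note that the trailing newline added by <<< comes after the last ':' and therefore survives as a fifth element. A quick check (output format as printed by recent bash):
$ readarray -d ':' -t my_array <<< "a:b:c:d:"
$ declare -p my_array
declare -a my_array=([0]="a" [1]="b" [2]="c" [3]="d" [4]=$'\n')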
Use an ending character different from newline:
end=.
IFS='*' read -r -d "$end" -a array <<< "$variable$end"
Of course this solution supposes there is at least one character not used in your input variable (the IFS='*' supplies the element separator; without it, the default IFS would also split on the embedded newline).
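A quick check, using the IFS='*' variant shown above on the question's variable:
$ declare -p array
declare -a array=([0]="13423exa" [1]="lkco3nr" [2]=$'sw\nkjenve' [3]="kejnv")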

Parse variables from string and add them to an array with Bash

In Bash, how can I get the strings between curly braces (without the '_value' suffix) from, for example,
"\\*\\* ${host_name_value}.${host_domain_value} - ${host_ip_value}\\*\\*"
and put them into an array?
The result for the above example should be something like:
var_array=("host_name" "host_domain")
The string could also contain other stuff such as:
"${package_updates_count_value} ${package_updates_type_value} updates"
The result for the above example should be something like:
var_array=("package_updates_count" "package_updates_type")
All variables end with _value. There could be 1 or more variables in the string.
Not sure what would be the most efficient way and how I'd best handle this. Regex? Sed?
input='\\*\\* ${host_name_value}.${host_domain_value} \\*\\*'
# would also work with cat input or the like.
myarray=($(echo "$input" | awk -F'$' \
  '{for(i=1;i<=NF;i++) {match($i, /{([^}]*)_value}/, a); print a[1]}}'))
Split your line(s) on $. Check if a column contains { }. If it does, print what's after { and before _value}. (If not, it will print the empty string, which bash array creation will ignore.) Note that the three-argument form of match() requires GNU awk.
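A quick check with the first sample string (the empty match from the leading field is dropped by the unquoted array assignment):
$ declare -p myarray
declare -a myarray=([0]="host_name" [1]="host_domain")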
If there are only two variables, this will work.
input='\\*\\* ${host_name_value}.${host_domain_value} \\*\\*'
first=$(echo $input | sed -r -e 's/[}].+//' -e 's/.+[{]//')
last=$(echo $input | sed -r -e 's/.+[{]//' -e 's/[}].+//')
output="var_array=(\"$first\" \"$last\")"
Maybe not very efficient and beautiful, but it works well.
Starting with a string variable:
$ str='\\*\\* ${host_name_value}.${host_domain_value} - ${host_ip_value}\\*\\*'
Use grep -o to print all matching words.
$ grep -o '\${\w*_value}' <<< "$str"
${host_name_value}
${host_domain_value}
${host_ip_value}
Then remove ${ and _value}.
$ grep -o '\${\w*_value}' <<< "$str" | sed 's/^\${//; s/_value}$//'
host_name
host_domain
host_ip
Finally, use readarray to safely read the results into an array.
$ readarray -t var_array < <(grep -o '\${\w*_value}' <<< "$str" | sed 's/^\${//; s/_value}$//')
$ declare -p var_array
declare -a var_array=([0]="host_name" [1]="host_domain" [2]="host_ip")

convert JSON array to bash array preserving whitespaces

I want to transform a JSON file into a bash array of strings that I will later be able to iterate over. My JSON structure is as follows:
[
{
"USERID": "TMCCP",
"CREATED_DATE": "31/01/2020 17:52"
},
{
"USERID": "TMCCP",
"CREATED_DATE": "31/01/2020 17:52"
}
]
And this is my bash script:
test_cases=($(jq -c '.[]' data.json))
echo ${test_cases[0]}
echo ${test_cases[1]}
echo ${test_cases[2]}
echo ${test_cases[3]}
As you can see, it returns an array with 4 elements instead of 2. Output:
{"USERID":"TMCCP","CREATED_DATE":"31/01/2020
17:52"}
{"USERID":"TMCCP","CREATED_DATE":"31/01/2020
17:52"}
For some reason, having whitespace in the date field causes parsing issues. Any idea how to get around this?
Use readarray instead.
$ readarray -t test_cases < <(jq -c '.[]' file)
$ declare -p test_cases
declare -a test_cases=([0]="{\"USERID\":\"TMCCP\",\"CREATED_DATE\":\"31/01/2020 17:52\"}" [1]="{\"USERID\":\"TMCCP\",\"CREATED_DATE\":\"31/01/2020 17:52\"}")
And read can be used as shown below where readarray is unavailable.
IFS=$'\n' read -d '' -a test_cases < <(jq -c '.[]' file)
Use readarray to populate the array, rather than using an unquoted command substitution; bash doesn't care about JSON quoting when it splits the result into separate words.
readarray -t test_cases < <(jq -c '.[]' data.json)
In bash 3.2 (which is what you appear to be stuck with), you need something slightly more unwieldy:
while IFS= read -r line; do
  test_cases+=("$line")
done < <(jq -c '.[]' data.json)
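Either way, the array can then be iterated over safely, whitespace intact (a usage sketch):
for tc in "${test_cases[@]}"; do
  printf '%s\n' "$tc"
done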
