Reading csv string into bash array

The following function uses awk (GNU awk, whose FPAT variable does the CSV field splitting) to convert a CSV line into multiple lines. I can then assign the output to an array to access the fields.
function csv_to_lines() {
  echo "$@" | awk '
    BEGIN {FPAT = "([^,]*)|(\"[^\"]+\")"}
    {for (i = 1; i <= NF; i++) printf("%s\n", $i)}'
}
line='A,B,"C,D",E'
arr=($(csv_to_lines $line))
printf '%s,' "${arr[@]}"
However, this doesn't work for empty fields. For example:
line='A,,,,,"C,D",E'
arr=($(csv_to_lines $line))
printf '%s,' "${arr[@]}"
Outputs
A,"C,D",E,
But I expected
A,,,,,"C,D",E,
Evidently, all empty lines are ignored when assigning to the array. How do I create an array that keeps the empty lines?

Current code:
$ line='A,,,,,"C,D",E'
$ csv_to_lines $line
A
"C,D"
E
Looking at the actual characters generated, we see:
$ csv_to_lines $line | od -c
0000000 A \n \n \n \n \n " C , D " \n E \n
0000016
As is, the arr=(...) is going to split this data on whitespace and store the resulting non-empty words in the array, effectively doing the same as:
$ arr=(A
"C,D"
E)
$ typeset -p arr
declare -a arr=([0]="A" [1]="C,D" [2]="E")
$ printf '%s,' "${arr[@]}"
A,"C,D",E,
A couple ideas for storing the 'blank lines' in the array:
Use mapfile to read each line into the array, eg:
$ mapfile -t arr < <(csv_to_lines $line)
$ typeset -p arr
declare -a arr=([0]="A" [1]="" [2]="" [3]="" [4]="" [5]="\"C,D\"" [6]="E")
Or have awk use something other than \n as a delimiter, then define a custom IFS to parse the function results into the array, eg:
$ function csv_to_lines() { echo "$@" | awk '
BEGIN {FPAT = "([^,]*)|(\"[^\"]+\")"}
{for (i = 1; i <= NF; i++) printf("%s|", $i)}'; }
$ csv_to_lines $line
A|||||"C,D"|E|
$ IFS='|' arr=($(csv_to_lines $line))
$ typeset -p arr
declare -a arr=([0]="A" [1]="" [2]="" [3]="" [4]="" [5]="\"C,D\"" [6]="E")
Both of these lead to:
$ printf '%s,' "${arr[@]}"
A,,,,,"C,D",E,
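As a standalone illustration of the core difference (independent of the awk step, so it runs without GNU awk): word-splitting assignment drops the blank lines, while mapfile keeps one array element per line, blank or not.

```shell
# Blank lines vanish with arr=($(...)) but survive with mapfile.
fields=$'A\n\n\n"C,D"\nE'

arr1=($(printf '%s\n' "$fields"))   # split on whitespace: blanks dropped
mapfile -t arr2 <<< "$fields"       # one element per line: blanks kept

echo "${#arr1[@]} vs ${#arr2[@]}"   # 3 vs 5
```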

How to parse and convert string list to JSON string array in shell command?
'["test1","test2","test3"]'
to
test1
test2
test3
I tried like below:
string=$1
array=${string#"["}
array=${array%"]"}
IFS=',' read -r -a array <<< "$array"
echo "${array[@]}"
Any other optimized way?
As bash and jq are tagged, this solution relies on both (without summoning eval). The input string is expected in $string, and the output array is generated into ${array[@]}. It is robust with respect to spaces, newlines, quotes, etc., as it uses NUL as the delimiter.
mapfile -d '' array < <(jq -j '.[] + "\u0000"' <<< "$string")
Testing
string='["has spaces\tand tabs","has a\nnewline","has \"quotes\""]'
mapfile -d '' array < <(jq -j '.[] + "\u0000"' <<< "$string")
printf '==>%s<==\n' "${array[@]}"
==>has spaces and tabs<==
==>has a
newline<==
==>has "quotes"<==
Alternatively, let jq quote each element for the shell with its @sh filter, then eval the result:
eval "array=( $(jq -r 'map(@sh) | join(" ")' <<<"$string") )"

Bash. Split text to array by delimiter [duplicate]

This question already has answers here:
How to split a string into an array in Bash?
(24 answers)
Closed 1 year ago.
Can somebody help me out? I want to split a text variable (containing newlines) into an array in bash.
Ok, I have some text-variable:
variable='13423exa*lkco3nr*sw
kjenve*kejnv'
I want to split it in array.
If the variable did not have a newline in it, I would do it with:
IFS='*' read -a array <<< "$variable"
I assumed the third element should be:
echo "${array[2]}"
>sw
>kjenve
But with the newline it is not working. Please point me in the right direction.
Use readarray.
$ variable='13423exa*lkco3nr*sw
kjenve*kejnv'
$ readarray -d '*' -t arr < <(printf "%s" "$variable")
$ declare -p arr
declare -a arr=([0]="13423exa" [1]="lkco3nr" [2]=$'sw\nkjenve' [3]="kejnv")
mapfile: -d: invalid option
Update bash, then use readarray.
If you can't, replace the separator with a NUL byte and read the elements one by one with read -d '':
arr=()
while IFS= read -d '' -r e || [[ -n "$e" ]]; do
arr+=("$e")
done < <(printf "%s" "$variable" | tr '*' '\0');
declare -p arr
declare -a arr=([0]="13423exa" [1]="lkco3nr" [2]=$'sw\nkjenve' [3]="kejnv")
You can use the readarray command and use it like in the following example:
readarray -d ':' -t my_array <<< "a:b:c:d:"
for (( i = 0; i < ${#my_array[*]}; i++ )); do
echo "${my_array[i]}"
done
Where the -d parameter defines the delimiter and -t asks to remove the trailing delimiter.
Use an ending character different from newline:
end=.
IFS='*' read -r -d "$end" -a array <<< "$variable$end"
Of course, this solution supposes there is at least one character not used in your input variable.
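A runnable sketch of that idea (assuming `.` never occurs in the data, and setting IFS to the `*` separator for the duration of the read):

```shell
variable='13423exa*lkco3nr*sw
kjenve*kejnv'
end=.
# read until the sentinel '.', splitting fields on '*'; newlines pass through
IFS='*' read -r -d "$end" -a array <<< "$variable$end"
declare -p array
# declare -a array=([0]="13423exa" [1]="lkco3nr" [2]=$'sw\nkjenve' [3]="kejnv")
```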

Parse variables from string and add them to an array with Bash

In Bash, how can I get the strings between braces (without the '_value' suffix) from, for example,
"\\*\\* ${host_name_value}.${host_domain_value} - ${host_ip_value}\\*\\*"
and put them into an array?
The result for the above example should be something like:
var_array=("host_name" "host_domain")
The string could also contain other stuff such as:
"${package_updates_count_value} ${package_updates_type_value} updates"
The result for the above example should be something like:
var_array=("package_updates_count" "package_updates_type")
All variables end with _value. There could be 1 or more variables in the string.
Not sure what would be the most efficient way and how I'd best handle this. Regex? Sed?
input='\\*\\* ${host_name_value}.${host_domain_value} \\*\\*'
# would also work with cat input or the like.
myarray=($(echo "$input" | awk -F'$' \
'{for(i=1;i<=NF;i++) {match($i, /{([^}]*)_value}/, a); print a[1]}}'))
Split your line(s) on $. Check whether a column contains { }. If it does, print what's after { and before _value}. (If not, it will print the empty string, which bash array creation will ignore.) Note that the three-argument match() is a GNU awk extension.
If there are only two variables, this will work.
input='\\*\\* ${host_name_value}.${host_domain_value} \\*\\*'
first=$(echo $input | sed -r -e 's/[}].+//' -e 's/.+[{]//')
last=$(echo $input | sed -r -e 's/.+[{]//' -e 's/[}].+//')
output="var_array=(\"$first\" \"$last\")"
Maybe not very efficient and beautiful, but it works well.
Starting with a string variable:
$ str='\\*\\* ${host_name_value}.${host_domain_value} - ${host_ip_value}\\*\\*'
Use grep -o to print all matching words.
$ grep -o '\${\w*_value}' <<< "$str"
${host_name_value}
${host_domain_value}
${host_ip_value}
Then remove ${ and _value}.
$ grep -o '\${\w*_value}' <<< "$str" | sed 's/^\${//; s/_value}$//'
host_name
host_domain
host_ip
Finally, use readarray to safely read the results into an array.
$ readarray -t var_array < <(grep -o '\${\w*_value}' <<< "$str" | sed 's/^\${//; s/_value}$//')
$ declare -p var_array
declare -a var_array=([0]="host_name" [1]="host_domain" [2]="host_ip")
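The same extraction can also be done in pure bash with the `=~` operator in a loop, with no external tools (a sketch; the regex assumes variable names consist of word characters):

```shell
str='\\*\\* ${host_name_value}.${host_domain_value} - ${host_ip_value}\\*\\*'
var_array=()
while [[ $str =~ \$\{([A-Za-z0-9_]+)_value\} ]]; do
  var_array+=("${BASH_REMATCH[1]}")   # captured name without _value
  str=${str#*"${BASH_REMATCH[0]}"}    # drop everything through this match
done
declare -p var_array
# declare -a var_array=([0]="host_name" [1]="host_domain" [2]="host_ip")
```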

Appending a character to every element except the last one of a bash array

I am looking to append each element of a bash array with | except the last one.
array=("element1" "element2" "element3")
My desired output would be
array=("element1"|"element2"|"element3")
What I have done
for i in "${!array[@]}";
do
array1+=( "${array[$i]}|" )
done
Followed by
array=$(echo "${array1[@]}" | sed 's/|$//')
Is there any other looping approach I can use that only appends the character until the last but one element?
For your special case, where only one single character has to be inserted between each array member, the simplest solution is probably to use array expansion (and changing the separation character IFS according to your need beforehand):
$ array=("element1" "element2" "element3")
$ array=( $( IFS="|" ; echo "${array[*]}") )
$ echo "\$array[0] is '${array[0]}'"
$array[0] is 'element1|element2|element3'
You can use:
# append | all except last element
read -ra array < <( printf "%s|" "${array[@]:0:$((${#array[@]} - 1))}"; echo "${array[@]: -1}"; )
# Check array content now
declare -p array
declare -a array='([0]="element1|element2|element3")'
"${array[@]:0:$((${#array[@]} - 1))}" will get all but the last element of the array
"${array[@]: -1}" will get the last element of the array
printf "%s|" will append | after each of its arguments
< <(...) is process substitution to read output of any command from stdin
read -ra will read the input in an array
$ array=("element1" "element2" "element3")
$ printf -v str "|%s" "${array[@]}"
$ array=("${str:1}")
$ declare -p array
declare -a array='([0]="element1|element2|element3")'
The printf statement creates a string str that contains |element1|element2|element3, i.e., one more | than we want (at the beginning).
The next statement uses substring parameter expansion, ${str:1}, to skip the first character and reassigns to array, which now consists of a single element.
A simple solution (if you don't mind changing IFS) is:
$ array=("element1" "element2" "element3")
$ IFS="|"; printf "%s\n" "${array[*]}"
And to re-assign to the variable array (which doesn't change IFS):
$ array=("$(IFS="|"; printf "%s\n" "${array[*]}")")
$ printf '%s\n' "${array[@]}"
element1|element2|element3
An alternative solution is:
$ array=($(printf '%s|' "${array[@]}")); array="${array%?}"
$ printf '%s\n' "${array}"
A more complex solution (script) is:
array=("element1" "element2" "element3")
delimiter='|'
unset newarr
for val in "${array[@]}"
do newarr=$newarr${newarr+"$delimiter"}$val
done
array=("$newarr")
echo "array=($array)"
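The loop above generalizes into a small helper (the name `join_by` is illustrative): prefix every element after the first with the delimiter.

```shell
join_by() {
  local delim=$1 first=$2
  shift 2
  # "${@/#/$delim}" prepends the delimiter to each remaining argument
  printf '%s' "$first" "${@/#/$delim}"
  printf '\n'
}
join_by '|' element1 element2 element3   # element1|element2|element3
```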

Fill a bash array from a NUL separated input

I want to create a bash array from a NUL separated input (from stdin).
Here's an example:
## Let define this for clarity
$ hd() { hexdump -v -e '/1 "%02X "'; echo ;}
$ echo -en "A B\0C\nD\0E\0" | hd
41 20 42 00 43 0A 44 00 45 00
So this is my input.
Now, working with NUL works fine if not using the -a of read command:
$ while read -r -d '' v; do echo -n "$v" | hd; done < <(echo -en "A B\0C\nD\0E\0")
41 20 42
43 0A 44
45
We get the correct values. But I can't store these values using -a:
$ read -r -d '' -a arr < <(echo -en "A B\0C\nD\0E\0")
$ declare -p arr
declare -a arr='([0]="A" [1]="B")'
Which is obviously not what I wanted. I would like to have:
$ declare -p arr
declare -a arr='([0]="A B" [1]="C
D" [2]="E")'
Is there a way to go with read -a, and if it doesn't work, why? Do you know a simple way to do this (avoiding the while loop) ?
read -a is the wrong tool for the job, as you've noticed; it only supports non-NUL delimiters. The appropriate technique is given in BashFAQ #1:
arr=()
while IFS= read -r -d '' entry; do
arr+=( "$entry" )
done
In terms of why read -d '' -a is the wrong tool: -d gives read an argument to use to determine when to stop reading entirely, rather than when to stop reading a single element.
Consider:
while IFS=$'\t' read -r -d $'\n' -a words; do
...
done
...this will read words separated by tab characters, until it reaches a newline. Thus, even with read -a, using -d '' will read until it reaches a NUL.
What you want, to read until no more content is available and split by NULs, is not a '-d' of NUL, but no end-of-line character at all (and an empty IFS). This is not something read's usage currently makes available.
bash-4.4-alpha added a -d option to mapfile:
The `mapfile' builtin now has a -d option to use an arbitrary character as the record delimiter, and a -t option to strip the delimiter as supplied with -d.
— https://tiswww.case.edu/php/chet/bash/CHANGES
Using this, we can simply write:
mapfile -t -d '' arr < <(echo -en "A B\0C\nD\0E\0")
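A quick check of the result (requires bash ≥ 4.4 for mapfile -d):

```shell
# NUL-delimited input split into an array; -t strips the NUL delimiters
mapfile -t -d '' arr < <(echo -en "A B\0C\nD\0E\0")
declare -p arr
```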
If anyone wonders, here's the function (using while) that I use to store values from a NUL-separated stdin:
read_array () {
local i var
var="$1"
i=0
while read -r -d '' value; do
printf -v "$var[$i]" "%s" "$value"
i=$((i + 1))
done
}
It can then be used quite cleanly:
$ read_array arr < <(echo -en "A B\0C\nD\0E\0")
$ declare -p arr
declare -a arr='([0]="A B" [1]="C
D" [2]="E")'
Here's a simplification of @vaab's function. It uses bash 4.3's nameref feature:
read_array () {
local -n a=$1
while read -r -d '' value; do
a+=("$value")
done
}
Test:
test_it () {
local -a arr
read_array arr < <(echo -en "A B\0C\nD\0E\0")
declare -p arr
}
test_it
