In bash, I can pass quoted arguments to a command like this:
$ printf '[%s]\n' 'hello world'
[hello world]
But I can't get it to work right if the argument is coming from a subshell:
$ cat junk
'hello world'
$ printf '[%s]\n' $(cat junk)
['hello]
[world']
Or:
$ cat junk
hello world
$ printf '[%s]\n' $(cat junk)
[hello]
[world]
Or:
$ cat junk
hello\ world
$ printf '[%s]\n' $(cat junk)
[hello\]
[world]
How do I do this correctly?
EDIT: The solution also needs to handle this case:
$ printf '[%s]\n' abc 'hello world'
[abc]
[hello world]
So this solution doesn't work:
$ cat junk
abc 'hello world'
$ printf '[%s]\n' "$(cat junk)"
[abc 'hello world']
The question at Bash quoting issue has been suggested as a duplicate. However, it isn't clear how to apply its accepted answer; the following fails:
$ cat junk
abc 'hello world'
$ FOO=($(cat junk))
$ printf '[%s]\n' "${FOO[#]}"
[abc]
['hello]
[world']
There's no one good solution here, but you can choose between bad ones.
This answer requires changing the file format:
Using a NUL-delimited stream for the file is the safest approach; literally any C string (thus, any string bash can store as an array element) can be written and read in this manner.
# write file as a NUL-delimited stream
printf '%s\0' abc 'hello world' >junk
# read file as an array
foo=( )
while IFS= read -r -d '' entry; do
foo+=( "$entry" )
done <junk
If valid arguments can't contain newlines, you may wish to leave out the -d '' on the reading side and change the \0 on the writing side to \n to use newlines instead of NULs. Note that UNIX filenames can contain newlines, so if your possible arguments include filenames, this approach would be unwise.
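For completeness, a sketch of that newline-delimited variant (safe only under the no-newlines assumption just described):
# write file as a newline-delimited stream (assumes no argument contains a newline)
printf '%s\n' abc 'hello world' >junk
# read file as an array, one element per line
foo=( )
while IFS= read -r entry; do
foo+=( "$entry" )
done <junk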
This answer almost implements shell-like parsing semantics:
foo=( )
while IFS= read -r -d '' entry; do
foo+=( "$entry" )
done < <(xargs printf '%s\0' <junk)
xargs has some corner cases surrounding multi-line strings where its parsing isn't quite identical to how a shell does. It's a 99% solution, however.
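For instance, against the junk file from the question's edit, it yields the intended split:
$ cat junk
abc 'hello world'
$ foo=( )
$ while IFS= read -r -d '' entry; do foo+=( "$entry" ); done < <(xargs printf '%s\0' <junk)
$ printf '[%s]\n' "${foo[@]}"
[abc]
[hello world]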
This answer requires a Python interpreter:
The Python standard library shlex module supports POSIX-compliant string tokenization which is more true to the standard than that implemented by xargs. Note that bash/ksh extensions such as $'foo' are not honored.
shlex_split() {
python -c '
import shlex, sys
for item in shlex.split(sys.stdin.read()):
sys.stdout.write(item + "\0")
'
}
while IFS= read -r -d '' entry; do
foo+=( "$entry" )
done < <(shlex_split <junk)
These answers pose a security risk:
...specifically, if the contents of junk can be written to contain shell-sensitive code (like $(rm -rf /)), you don't want to use either of them:
# use declare
declare "foo=($(cat junk))"
# ...or use eval directly
eval "foo=( $(cat junk) )"
If you want to be sure that foo is written in a way that's safe to read in this way, and you control the code that writes to it, consider:
# write foo array to junk in an eval-safe way, if it contains at least one element
{ printf '%q ' "${foo[@]}" && printf '\n'; } >junk
Alternately, you could use:
# write a command which, when evaluated, will recreate the variable foo
declare -p foo >junk
and:
# run all commands in the file junk
source junk
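A quick round trip showing how the declare -p / source pairing behaves:
$ foo=( abc 'hello world' )
$ declare -p foo >junk
$ unset foo
$ source junk
$ printf '[%s]\n' "${foo[@]}"
[abc]
[hello world]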
I'm taking over a bash script from a colleague that reads a file, processes it, and prints another file based on each line, inside a while loop.
I now need to add some features to it. The one I'm having issues with right now is reading a file and putting each line into an array, except that the 2nd column of a line can be empty, e.g.:
For a text file with \t as separator:
A\tB\tC
A\t\tC
For a CSV file same but with , as separator:
A,B,C
A,,C
Which should then give
["A","B","C"] or ["A", "", "C"]
The code I took over is as follows:
while IFS=$'\t\r' read -r -a col; do
# Process the array, put that into a file
lp -d $printer $file_to_print
done < $input_file
Which works if B is filled, but B sometimes needs to be empty now, so when the input file leaves it empty, the created array (and thus the output file to print) just skips this empty cell (the array is then ["A","C"]).
I tried writing the whole block in awk, but this brought its own set of problems, making it difficult to call the lp command to print.
So my question is, how can I preserve the empty cell from the line into my bash array, so that I can call on it later and use it?
Thank you very much. I know this might be quite confusing, so please ask and I'll clarify.
Edit: As requested, here's the awk code I've tried. The issue is that it only prints the last print request, even though I know it loops over the whole file and the lp command is still inside the loop.
awk 'BEGIN {
inputfile="'"${optfile}"'"
outputfile="'"${file_loc}"'"
printer="'"${printer}"'"
while (getline < inputfile){
print "'"${prefix}"'" > outputfile
split($0,ft,"'"${IFSseps}"'");
if (length(ft[2]) == 0){
print "CODEPAGE 1252\nTEXT 465,191,\"ROMAN.TTF\",180,7,7,\""ft[1]"\"" >> outputfile
size_changer = 0
} else {
print "CODEPAGE 1252\nTEXT 465,191,\"ROMAN.TTF\",180,7,7,\""ft[1]"_"ft[2]"\"" >> outputfile
size_changer = 1
}
if ( split($0,ft,"'"${IFSseps}"'") > 6)
maxcounter = 6;
else
maxcounter = split($0,ft,"'"${IFSseps}"'");
for (i = 3; i <= maxcounter; i++){
x=191-(i-2)*33
print "CODEPAGE 1252\nTEXT 465,"x",\"ROMAN.TTF\",180,7,7,\""ft[i]"\"" >> outputfile
}
print "PRINT ""'"${copies}"'"",1" >> outputfile
close(outputfile)
"'"`lp -d ${printer} ${file_loc}`"'"
}
close("'"${file_loc}"'");
}'
EDIT2: Continuing to look for a solution, I tried the following code without success. This is weird, as just doing printf without putting the result in an array keeps the formatting intact.
$ cat testinput | tr '\t' '>'
A>B>C
A>>C
# Should normally be empty on the second output line
$ while read line; do IFS=$'\t' read -ra col < <(printf "$line"); echo ${col[1]}; done < testinput
B
C
For tab, it's complicated.
From 3.5.7 Word Splitting in the manual:
A sequence of IFS whitespace characters is also treated as a delimiter.
Since tab is an "IFS whitespace character", sequences of tabs are treated as a single delimiter
IFS=$'\t' read -ra ary <<<$'A\t\tC'
declare -p ary
declare -a ary=([0]="A" [1]="C")
What you can do is translate tabs to a non-whitespace character, assuming it does not clash with the actual data in the fields:
line=$'A\t\tC'
IFS=, read -ra ary <<<"${line//$'\t'/,}"
declare -p ary
declare -a ary=([0]="A" [1]="" [2]="C")
To avoid the risk of colliding with commas in the data, we can use an unusual ASCII character: FS, octal 034
line=$'A\t\tC'
printf -v FS '\034'
IFS="$FS" read -ra ary <<<"${line//$'\t'/"$FS"}"
# or, without the placeholder variable
IFS=$'\034' read -ra ary <<<"${line//$'\t'/$'\034'}"
declare -p ary
declare -a ary=([0]="A" [1]="" [2]="C")
One bash example uses parameter expansion to convert the delimiter into a \n and lets mapfile read each line in as a new array entry.
For tab-delimited data:
for line in $'A\tB\tC' $'A\t\tC'
do
mapfile -t array <<< "${line//$'\t'/$'\n'}"
echo "############# ${line}"
typeset -p array
done
############# A B C
declare -a array=([0]="A" [1]="B" [2]="C")
############# A C
declare -a array=([0]="A" [1]="" [2]="C")
NOTE: The $'...' construct ensures the \t is treated as a single <tab> character as opposed to the two literal characters \ + t.
For comma-delimited data:
for line in 'A,B,C' 'A,,C'
do
mapfile -t array <<< "${line//,/$'\n'}"
echo "############# ${line}"
typeset -p array
done
############# A,B,C
declare -a array=([0]="A" [1]="B" [2]="C")
############# A,,C
declare -a array=([0]="A" [1]="" [2]="C")
NOTE: This obviously (?) assumes the desired data does not contain a comma (,).
It may just be your # Process the array, put that into a file part.
IFS=, read -ra ray <<< "A,,C"
for e in "${ray[#]}"; do o="$o\"$e\","; done
echo "[${o%,}]"
["A","","C"]
See @Glenn's excellent answer regarding tabs.
My simple data file:
$: cat x # tab delimited, empty field 2 of line 2
a b c
d f
My test:
while IFS=$'\001' read -r a b c; do
echo "a:[$a] b:[$b] c:[$c]"
done < <(tr "\t" "\001"<x)
a:[a] b:[b] c:[c]
a:[d] b:[] c:[f]
Note that I used ^A (a 001 byte) but you might be able to use something as simple as a comma or pipe (|) character. Choose based on your data.
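Since the question wants the fields in an array, the same byte swap also works with read -ra; a sketch against the file x above:
while IFS=$'\001' read -r -a col; do
    declare -p col
done < <(tr "\t" "\001"<x)
declare -a col=([0]="a" [1]="b" [2]="c")
declare -a col=([0]="d" [1]="" [2]="f")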
In Bash, how can I get the strings between braces (without the '_value' suffix) from, for example,
"\\*\\* ${host_name_value}.${host_domain_value} - ${host_ip_value}\\*\\*"
and put them into an array?
The result for the above example should be something like:
var_array=("host_name" "host_domain")
The string could also contain other stuff such as:
"${package_updates_count_value} ${package_updates_type_value} updates"
The result for the above example should be something like:
var_array=("package_updates_count" "package_updates_type")
All variables end with _value. There could be 1 or more variables in the string.
Not sure what would be the most efficient way and how I'd best handle this. Regex? Sed?
input='\\*\\* ${host_name_value}.${host_domain_value} \\*\\*'
# would also work with cat input or the like.
myarray=($(echo "$input" | awk -F'$' \
'{for(i=1;i<=NF;i++) {match($i, /{([^}]*)_value}/, a); print a[1]}}'))
Split your line(s) on $. Check if a column contains { }. If it does, print what's after { and before _value}. (If not, it will print out the empty string, which bash array creation will ignore.)
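Checking the result (note that this relies on gawk: match() with a third array argument is a GNU awk extension):
$ declare -p myarray
declare -a myarray=([0]="host_name" [1]="host_domain")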
If there are only two variables, this will work.
input='\\*\\* ${host_name_value}.${host_domain_value} \\*\\*'
first=$(echo $input | sed -r -e 's/[}].+//' -e 's/.+[{]//')
last=$(echo $input | sed -r -e 's/.+[{]//' -e 's/[}].+//')
output="var_array=(\"$first\" \"$last\")"
Maybe not very efficient and beautiful, but it works well.
Starting with a string variable:
$ str='\\*\\* ${host_name_value}.${host_domain_value} - ${host_ip_value}\\*\\*'
Use grep -o to print all matching words.
$ grep -o '\${\w*_value}' <<< "$str"
${host_name_value}
${host_domain_value}
${host_ip_value}
Then remove ${ and _value}.
$ grep -o '\${\w*_value}' <<< "$str" | sed 's/^\${//; s/_value}$//'
host_name
host_domain
host_ip
Finally, use readarray to safely read the results into an array.
$ readarray -t var_array < <(grep -o '\${\w*_value}' <<< "$str" | sed 's/^\${//; s/_value}$//')
$ declare -p var_array
declare -a var_array=([0]="host_name" [1]="host_domain" [2]="host_ip")
The file /tmp/file.csv contains the following:
name,age,gender
bob,21,m
jane,32,f
The CSV file will always have headers.. but might contain a different number of fields:
id,title,url,description
1,foo name,foo.io,a cool foo site
2,bar title,http://bar.io,a great bar site
3,baz heading,https://baz.io,some description
In either case, I want to convert my CSV data into an array of associative arrays..
What I need
So, I want a Bash 4.3 function that takes CSV as piped input and sends the array to stdout:
/tmp/file.csv:
name,age,gender
bob,21,m
jane,32,f
It needs to be used in my templating system, like this:
{{foo | csv_to_array | foo2}}
^ this is a fixed API, I must use that syntax .. foo2 must receive the array as standard input.
The csv_to_array func must do its thing, so that afterwards I can do this:
$ declare -p row1; declare -p row2; declare -p new_array;
and it would give me this:
declare -A row1=([gender]="m" [name]="bob" [age]="21" )
declare -A row2=([gender]="f" [name]="jane" [age]="32" )
declare -a new_array=([0]="row1" [1]="row2")
..Once I have this array structure (an indexed array of associative array names), I have a shell-based templating system to access them, like so:
{{#new_array}}
Hi {{item.name}}, you are {{item.age}} years old.
{{/new_array}}
But I'm struggling to generate the arrays I need..
Things I tried:
I have already tried using this as a starting point to get the array structure I need:
while IFS=',' read -r -a my_array; do
echo ${my_array[0]} ${my_array[1]} ${my_array[2]}
done <<< $(cat /tmp/file.csv)
(from Shell: CSV to array)
..and also this:
cat /tmp/file.csv | while read line; do
line=( ${line//,/ } )
echo "0: ${line[0]}, 1: ${line[1]}, all: ${line[#]}"
done
(from https://www.reddit.com/r/commandline/comments/1kym4i/bash_create_array_from_one_line_in_csv/cbu9o2o/)
but I didn't really make any progress in getting what I want out the other end...
EDIT:
Accepted the 2nd answer, but I had to hack the library I am using to make either solution work..
I'll be happy to look at other answers which do not export the declare commands as strings to be run in the current env, but instead somehow hoist the resulting arrays from the declare commands into the current env (the current env being wherever the function is run from).
Example:
$ cat file.csv | csv_to_array
$ declare -p row2 # gives the data
So, to be clear, if the above ^ works in a terminal, it'll work in the library I'm using without the hacks I had to add (which involved grepping STDIN for ^declare -a and using source <(cat); eval $STDIN... in other functions)...
See my comments on the 2nd answer for more info.
The approach is straightforward:
Read the column headers into an array
Read the file line by line, in each line …
Create a new associative array and register its name in the array of array names
Read the fields and assign them according to the column headers
In the last step we cannot use read -a, mapfile, or things like these since they only create regular arrays with numbers as indices, but we want an associative array instead, so we have to create the array manually.
However, the implementation is a bit convoluted because of bash's quirks.
The following function parses stdin and creates arrays accordingly.
I took the liberty to rename your array new_array to rowNames.
#! /bin/bash
csvToArrays() {
IFS=, read -ra header
rowIndex=0
while IFS= read -r line; do
((rowIndex++))
rowName="row$rowIndex"
declare -Ag "$rowName"
IFS=, read -ra fields <<< "$line"
fieldIndex=0
for field in "${fields[#]}"; do
printf -v quotedFieldHeader %q "${header[fieldIndex++]}"
printf -v "$rowName[$quotedFieldHeader]" %s "$field"
done
rowNames+=("$rowName")
done
declare -p "${rowNames[#]}" rowNames
}
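The load-bearing trick is that printf -v can assign directly into an associative-array subscript; a minimal illustration, with made-up names:
declare -A demo
printf -v 'demo[some key]' %s 'some value'
declare -p demo
declare -A demo=(["some key"]="some value" )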
Calling the function in a pipe has no effect. Bash executes the commands in a pipe in a subshell, therefore you won't have access to the arrays created by someCommand | csvToArrays. Instead, call the function as either one of the following
csvToArrays < <(someCommand) # when input comes from a command, except "cat file"
csvToArrays < someFile # when input comes from a file
Bash scripts like these tend to be very slow. That's the reason why I didn't bother to extract printf -v quotedFieldHeader … from the inner loop even though it will do the same work over and over again.
I think the whole templating thing and everything related would be way easier to program and faster to execute in languages like Python, Perl, or similar.
The following script:
csv_to_array() {
local -a values
local -a headers
local counter
IFS=, read -r -a headers
declare -a new_array=()
counter=1
while IFS=, read -r -a values; do
new_array+=( row$counter )
declare -A "row$counter=($(
paste -d '' <(
printf "[%s]=\n" "${headers[#]}"
) <(
printf "%q\n" "${values[#]}"
)
))"
(( counter++ ))
done
declare -p new_array ${!row*}
}
foo2() {
source <(cat)
declare -p new_array ${!row*} |
sed 's/^/foo2: /'
}
echo "==> TEST 1 <=="
cat <<EOF |
id,title,url,description
1,foo name,foo.io,a cool foo site
2,bar title,http://bar.io,a great bar site
3,baz heading,https://baz.io,some description
EOF
csv_to_array |
foo2
echo "==> TEST 2 <=="
cat <<EOF |
name,age,gender
bob,21,m
jane,32,f
EOF
csv_to_array |
foo2
will output:
==> TEST 1 <==
foo2: declare -a new_array=([0]="row1" [1]="row2" [2]="row3")
foo2: declare -A row1=([url]="foo.io" [description]="a cool foo site" [id]="1" [title]="foo name" )
foo2: declare -A row2=([url]="http://bar.io" [description]="a great bar site" [id]="2" [title]="bar title" )
foo2: declare -A row3=([url]="https://baz.io" [description]="some description" [id]="3" [title]="baz heading" )
==> TEST 2 <==
foo2: declare -a new_array=([0]="row1" [1]="row2")
foo2: declare -A row1=([gender]="m" [name]="bob" [age]="21" )
foo2: declare -A row2=([gender]="f" [name]="jane" [age]="32" )
The output comes from the foo2 function.
The csv_to_array function first reads the headers. Then, for each line read, it appends a new element to the new_array array and also creates a new associative array named row$counter, whose elements are built by joining the header names with the values read from the line. At the end, the function emits the output of declare -p.
The foo2 function sources its standard input, so the arrays come into scope for it. It then outputs those values again, prefixing each line with foo2:.
I have a requirement to split a string on a multi-character delimiter and return the values in an array in Bash for further processing.
IFS can only take single-character delimiters.
a="2;AAAAA;BBBBB;1111_MultiCharDel_2;CCCC;DDDDDD;22222_MultiCharDel_2;EEEE;FFFFFFF;22222"
awk '{split($0,ArrayDeltaMulDep,"_MultiCharDel_")}' <<< $a
The input string can have several substrings separated by the MultiCharDel delimiter.
How can I access this array ArrayDeltaMulDep for further processing in Bash?
Your example string, a, does not contain newlines. If that is true in general, then:
a="2;AAAAA;BBBBB;1111_MultiCharDel_2;CCCC;DDDDDD;22222"
readarray -t b <<< "${a//MultiCharDel/$'\n'}"
We can verify that this split the string properly using declare -p to show the value of b:
$ declare -p b
declare -a b=([0]="2;AAAAA;BBBBB;1111_" [1]="_2;CCCC;DDDDDD;22222")
How it works:
readarray -t b
This reads lines from stdin and puts them in a bash array b.
<<< "${a//MultiCharDel/$'\n'}"
${a//MultiCharDel/$'\n'} uses pattern substitution to replace MultiCharDel with a newline character. <<< provides the result as stdin to the command readarray.
Hat tip: Chepner
More general solution
A bash string will never contain a null character (hex 00). Using GNU sed:
b=()
while read -d '' -r line
do
b+=("$line")
done < <(sed 's/MultiCharDel/\x00/g; s/$/\x00/' <<<"$a")
This again creates an array with the desired splitting:
$ declare -p b
declare -a b=([0]="2;AAAAA;BBBBB;1111_" [1]="_2;CCCC;DDDDDD;22222")
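If your bash is 4.4 or newer, readarray also accepts a custom delimiter, so the read loop can be replaced with a single call (same GNU sed assumption):
readarray -d '' b < <(sed 's/MultiCharDel/\x00/g; s/$/\x00/' <<<"$a")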
I am very new to Unix shell script and trying to get some knowledge in shell scripting. Please check my requirement and my approach.
I have an input file containing this data:
ABC = A:3 E:3 PS:6
PQR = B:5 S:5 AS:2 N:2
I am trying to parse the data and get the result as
ABC
A=3
E=3
PS=6
PQR
B=5
S=5
AS=2
N=2
The values can be added horizontally and vertically so I am trying to use an array. I am trying something like this:
myarr=($(cat main.conf | awk -F"=" 'NR!=1 {print $1}'))
echo ${myarr[1]}
# Or loop through every element in the array
for i in "${myarr[#]}"
do
:
echo $i
done
or
awk -F"=" 'NR!=1 {
print $1"\n"
STR=$2
IFS=':' read -r -a array <<< "$STR"
for i in "${!array[#]}"
do
echo "$i=>${array[i]}"
done
}' main.conf
But when I add this code to a .sh file and try to run it, I get syntax errors:
$ awk -F"=" 'NR!=1 {
> print $1"\n"
> STR=$2
> FS= read -r -a array <<< "$STR"
> for i in "${!array[#]}"
> do
> echo "$i=>${array[i]}"
> done
>
> }' main.conf
awk: cmd. line:4: FS= read -r -a array <<< "$STR"
awk: cmd. line:4: ^ syntax error
awk: cmd. line:5: for i in "${!array[@]}"
awk: cmd. line:5: ^ syntax error
awk: cmd. line:8: done
awk: cmd. line:8: ^ syntax error
How can I achieve the expected output above?
This is the awk code to do what you want:
$ cat tst.awk
BEGIN { FS="[ =:]+"; OFS="=" }
{
print $1
for (i=2;i<NF;i+=2) {
print $i, $(i+1)
}
print ""
}
and this is the shell script (yes, all a shell script does to manipulate text is call awk):
$ awk -f tst.awk file
ABC
A=3
E=3
PS=6
PQR
B=5
S=5
AS=2
N=2
A UNIX shell is an environment from which to call UNIX tools (find, sort, sed, grep, awk, tr, cut, etc.). It has its own language for manipulating (e.g. creating/destroying) files and processes and sequencing calls to tools but it is NOT intended to be used to manipulate text. The guys who invented shell also invented awk for shell to call to manipulate text.
Read https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice and the book Effective Awk Programming, 4th Edition, by Arnold Robbins.
First off, a command that does what you want:
$ sed 's/ = /\n/;y/: /=\n/' main.conf
ABC
A=3
E=3
PS=6
PQR
B=5
S=5
AS=2
N=2
This replaces, on each line, the first (and only) occurrence of = with a newline (the s command), then turns all : into = and all spaces into newlines (the y command). Notice that
this works only because there is a space at the end of the first line (otherwise it would be a bit more involved to get the empty line between the blocks) and
this works only with GNU sed because it substitutes newlines; see this fantastic answer for all the details and how to get it to work with BSD sed.
As for what you tried, there is almost too much wrong with it to try and fix it piece by piece: from the wild mixing of awk and Bash to syntax errors all over the place. I recommend you read good tutorials for both, for example:
The BashGuide
Effective AWK Programming
A Bash solution
Here is a way to solve the same problem in Bash; I didn't use any arrays.
#!/bin/bash
# Read line by line into the 'line' variable. Setting 'IFS' to the empty string
# preserves leading and trailing whitespace; '-r' prevents interpretation of
# backslash escapes
while IFS= read -r line; do
# Three parameter expansions:
# Replace ' = ' by newline (escape backslash)
line="${line/ = /\\n}"
# Replace ':' by '='
line="${line//:/=}"
# Replace spaces by newlines (escape backslash)
line="${line// /\\n}"
# Print the modified input line; '%b' expands backslash escapes
printf "%b" "$line"
done < "$1"
Output:
$ ./SO.sh main.conf
ABC
A=3
E=3
PS=6
PQR
B=5
S=5
AS=2
N=2