KSH scripting: how to split on ',' when values have escaped commas? - arrays

I am trying to write a ksh script that processes a file consisting of name-value pairs, several of them on each line.
Format is:
NAME1 VALUE1,NAME2 VALUE2,NAME3 VALUE3, etc
Suppose I write:
read l
IFS=","
set -A nvls $l
echo "${nvls[1]}"   # set -A arrays are 0-indexed, so index 1 is the second pair
This will give me second name-value pair, nice and easy. Now, suppose that the task is extended so that values could include commas. They should be escaped, like this:
NAME1 VALUE1,NAME2 VALUE2_1\,VALUE2_2,NAME3 VALUE3, etc
Obviously, my code no longer works, since "read" strips all quoting and second element of array will be just "NAME2 VALUE2_1".
I'm stuck with an older ksh that does not have "read -A array". I tried various tricks with "read -r" and "eval set -A ....", to no avail. I can't use "read nvl1 nvl2 nvl3" to do the unescaping and splitting inside read, since I don't know beforehand how many name-value pairs are on each line.
Does anyone have a useful trick up their sleeve for me?
PS
I know that I could do this in no time in Perl, Python, or even awk. However, I have to do it in ksh (... or die trying ;)

As often happens, I devised an answer minutes after asking the question in a public forum :(
I worked around the quoting/unquoting issue by piping the input file through the following sed script:
sed -e 's/\([^\]\),/\1\
/g;s/$/\
/'
It converted the input into:
NAME1.1 VALUE1.1
NAME1.2 VALUE1.2_1\,VALUE1.2_2
NAME1.3 VALUE1.3
<empty line>
NAME2.1 VALUE2.1
<second record continues>
Now, I can parse this input like this:
while read name value ; do
echo "$name => $value"
done
Value will have its commas unquoted by "read", and I can stuff "name" and "value" in some associative array, if I like.
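As a quick check of the approach above, here is a minimal sketch that feeds one sample record through the first sed substitution and the read loop. The sample line and the variable names (input, result) are made up for illustration, and the record-separating empty line is left out to keep it short.

```shell
# Sample record; the names and values are illustrative only.
input='NAME1 VALUE1,NAME2 VALUE2_1\,VALUE2_2,NAME3 VALUE3'

# Break the record at every comma that is not preceded by a backslash,
# then let "read" (without -r) drop the backslash from each "\,".
result=$(
  printf '%s\n' "$input" |
  sed -e 's/\([^\]\),/\1\
/g' |
  while read name value; do
    printf '%s => %s\n' "$name" "$value"
  done
)

printf '%s\n' "$result"
```

Because the loop runs inside a pipeline, variables set in its body may not survive it in every shell; capturing the output, as here, sidesteps that.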
PS
Since I can't accept my own answer, should I delete the question, or ...?

You can also change the \, pattern to something else that is known not to appear in any of your strings, and then change it back after you've split the input into an array. The ksh built-in pattern-substitution syntax can do this, so you don't need sed or awk. Note that ${var//pattern/replacement} is a ksh93 feature; ksh88 does not have it.
read l
l=${l//\\,/!!}
IFS=","
set -A nvls $l
unset IFS
echo "${nvls[1]//!!/,}"   # index 1 is the second pair; // restores every placeholder, not just the first
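The placeholder idea can be sketched end to end as follows. This is only an illustration under a few assumptions: the sample line is invented, "!!" is assumed never to occur in the data, and the array is built with nvls=( ... ), which works in ksh93 and bash (set -A is the ksh spelling used above).

```shell
line='NAME1 VALUE1,NAME2 VALUE2_1\,VALUE2_2,NAME3 VALUE3'
placeholder='!!'                      # assumed never to appear in the data

masked=${line//\\,/$placeholder}      # hide every escaped comma

old_ifs=$IFS
IFS=','
set -f                                # no filename globbing during the split
nvls=( $masked )                      # split on the remaining, real commas
set +f
IFS=$old_ifs

# Put the commas back in every element.
i=0
while [ "$i" -lt "${#nvls[@]}" ]; do
  nvls[$i]=${nvls[$i]//"$placeholder"/,}
  i=$((i + 1))
done

printf '%s\n' "${nvls[1]}"
```

Restoring IFS from a saved copy, rather than unsetting it, keeps whatever value the caller had in place.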

Related

Split comma separated and quoted string into an array in Bash

I need to split a comma separated, but quoted list of strings into an indexed bash array in a script.
I know there are a lot of posts on the web in general and also on SO that show how to create an indexed array from a given line / string, but I could not find any example that does the array elements the way I need. I apologise, if I have missed any obvious examples from SO itself.
I am reading a file that I receive from someone, and cannot change it.
The file is formatted like this
"Grant ACL","grantacls.sh"
"Revoke ACL","revokeacls.sh"
"Get ACls for Topic","topicacls.sh"
"Get Topics for User with ACLs","useracls.sh"
I need to create an array for each line above where the separator is comma - and each of the quoted string will be an array element. I have tried various options. The latest attempt was using a construct like this - copied from some example on the web
parseScriptMapLine=${scriptName[$IN_OPTION]}
mapfile -td ',' script1 < <(echo -n "${parseScriptMapLine//, /,}")
declare -p script1
echo "script1 $script1"
where script name is an associative array created from the original file, whose format is with 1, 2, etc. as the key and the other part after '=' sign as value.
The above snippet prints
script1
And the value part I need to split into an indexed array, so that I can pass the second element as a parameter. When creating indexed array from the value string, if I have to lose the quotes, that is fine or if it creates the elements with the quotes, that is fine too.
1="Grant ACL","grantacls.sh"
2="Revoke ACL","revokeacls.sh"
3="Get ACls for Topic","topicacls.sh"
4="Get Topics for User with ACLs","useracls.sh"
I have looked at a lot of examples, but haven't been able to get this particular requirement working.
Thank you
With apologies, I could not understand what you wanted - this sounds like an X/Y Problem. Can you clarify?
Maybe this?
$: while IFS=',"' read -r _ a _ _ d _ && [[ -n "$d" ]]; do echo "a=[$a] d=[$d]"; done < file
a=[Grant ACL] d=[grantacls.sh]
a=[Revoke ACL] d=[revokeacls.sh]
a=[Get ACls for Topic] d=[topicacls.sh]
a=[Get Topics for User with ACLs] d=[useracls.sh]
That will let you do whatever you wanted with the fields, which I named a and d.
If you just want to load the lines of the file into an array -
$: mapfile -t script1 < file
$: for i in "${!script1[@]}"; do echo "$i=${script1[i]}"; done
0="Grant ACL","grantacls.sh"
1="Revoke ACL","revokeacls.sh"
2="Get ACls for Topic","topicacls.sh"
3="Get Topics for User with ACLs","useracls.sh"
If you want a two-dimensional array, then sorry, you're going to have to use something besides bash, or get more creative.
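One way to "get more creative" is an associative array that maps each label to its script, so the second field can be looked up by the first. The sketch below assumes bash 4 or later; the variable names (data, script_for, label, script) are hypothetical and the sample lines are copied from the question.

```shell
# Sample data in the question's format.
data='"Grant ACL","grantacls.sh"
"Revoke ACL","revokeacls.sh"'

declare -A script_for                 # label -> script name (bash 4+)

# Same field-splitting trick as above: both , and " act as delimiters.
while IFS=',"' read -r _ label _ _ script _; do
  [[ -n $script ]] && script_for["$label"]=$script
done <<< "$data"

printf '%s\n' "${script_for["Revoke ACL"]}"
```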

Bash: Store sed result into array?

How can I fix the following code so that it stores the result of sed, which replaces each _ with -?
My code:
names=()
for entry_ in $foo
do
names+=($entry_ | sed -e "s/_/-/g")
done
echo names
You don't need sed for this, you can use bash's built-in parameter expansion + substitution capability to replace all _ characters with -: ${var//_/-}. You can even use it to do this for the entire list of elements in a single operation, but how you do it depends on what the source variable, foo, actually is.
If foo is an array (the much better way to do things), you can combine [@] ("get me all elements of the array") with the substitution:
names=( "${foo[@]//_/-}" )
If foo is a plain string, and you need to use word splitting to break it into elements for the array, you can do essentially the same thing without the [@] ('cause it's not an array) or the double-quotes (which prevent word splitting):
names=( ${foo//_/-} )
Note: I recommend avoiding word splitting if possible -- it often does something close to what you want, but almost never exactly what you want.
P.s. I third the recommendation of shellcheck. Among other things, it'll flag anything involving word splitting as a probable mistake.
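A small self-contained run of the array form, with made-up sample values, shows the effect on every element at once:

```shell
# Illustrative input; the element values are invented for the example.
foo=( "first_entry" "second_long_entry" "plain" )

# Replace every _ with - in every element, keeping one output element per input element.
names=( "${foo[@]//_/-}" )

printf '%s\n' "${names[@]}"
```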
This should be enough to get you there.
names=()
names+=( "$(echo "hello_world" | sed -e "s/_/-/g")" )
echo "${names[@]}"
Note that you need $ before the variable name when echoing it, and that appending to an array needs parentheses around the new element.
Also, look into installing shellcheck for your code editor; it will help you catch sneaky bugs and build better shell programming practices.

Read paired arrays from a file in bash

I have a bash script which breaks a bash array into pairs and lets me match on either element:
declare -a arr=(
"apple" "fruit"
"cabbage" "vegetables"
)
for ((i=0; i<${#arr[@]}; i+=2)); do
echo "${arr[i]} ${arr[i+1]}"
done
So when you run this script, it prints out each pair of elements from the array, like this:
# bash script
apple fruit
cabbage vegetables
and I can also choose any element I want with ${arr[i+#]}.
Now I'm trying to read this array from a separate text file, instead of inside the script since I'll be manipulating this array in the future.
I've tried this method so far, which looked pretty promising at first but didn't work at all;
filename='stuff.log'
filelines=`cat $filename`
for line in $filelines ; do
props=($line)
echo "${props[0]} ${props[1]}"
done
which should have printed the content below to the console (basically the same thing as the first script, where the array is inside the script), but instead it returned nothing.
# bash script
apple fruit
cabbage vegetables
And the inside of stuff.log is;
"apple" "fruit"
"cabbage" "vegetables"
How can I basically read the array from a separate file for the first script and also be able to manipulate the content of array file in the future?
I think, if you trust your input, you can do:
eval "props=($(<stuff.log))"
eval is evil, but it is used here so that the shell itself removes the leading and trailing " and correctly parses elements with spaces in them. We can be a little safer by reading the file into an array and then removing the leading and trailing " from each element:
IFS=$' \n' props=($(<stuff.log))
IFS=$'\n' props=($(printf "%s\n" "${props[@]}" | sed 's/^"//;s/"$//'))
Still, I would hesitate to use either method in production code. It would be better to write a proper parser that takes " into account and reads the input character by character.
If you want to read a file into an array, use mapfile or readarray commands (they are exactly the same command).
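As a rough sketch of the mapfile route, the following reads a file in the question's format and rebuilds the flat pairs array without eval. It assumes every line holds exactly two double-quoted fields; the temporary file stands in for stuff.log, and the names lines, pairs, key and value are illustrative.

```shell
# Stand-in for stuff.log, created only so the example is self-contained.
tmpfile=$(mktemp)
printf '%s\n' '"apple" "fruit"' '"cabbage" "vegetables"' > "$tmpfile"

mapfile -t lines < "$tmpfile"         # one array element per line
rm -f "$tmpfile"

pairs=()
for line in "${lines[@]}"; do
  # Splitting on " yields: empty, first field, the space, second field, rest.
  IFS='"' read -r _ key _ value _ <<< "$line"
  pairs+=( "$key" "$value" )
done

for ((i = 0; i < ${#pairs[@]}; i += 2)); do
  echo "${pairs[i]} ${pairs[i+1]}"
done
```

Fields that themselves contain a double quote would break this split, which is the case a real parser would have to handle.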

Shell Script regex matches to array and process each array element

While I've handled this task in other languages easily, I'm at a loss for which commands to use when Shell Scripting (CentOS/BASH)
I have some regex that provides many matches in a file I've read to a variable, and would like to take the regex matches to an array to loop over and process each entry.
I typically use https://regexr.com/ to form my capture groups, and throw that to JS/Python/Go to get an array and loop, but in shell scripting I'm not sure what I can use.
So far I've played with "sed" to find all matches and replace, but don't know if it's capable of returning an array to loop from matches.
Take regex, run on file, get array back. I would love some help with Shell Scripting for this task.
EDIT:
Based on comments, put this together (not working via shellcheck.net):
#!/bin/sh
examplefile="
asset('1a/1b/1c.ext')
asset('2a/2b/2c.ext')
asset('3a/3b/3c.ext')
"
examplearr=($(sed 'asset\((.*)\)' $examplefile))
for el in ${!examplearr[*]}
do
echo "${examplearr[$el]}"
done
This works in bash on a mac:
#!/bin/bash
examplefile="
asset('1a/1b/1c.ext')
asset('2a/2b/2c.ext')
asset('3a/3b/3c.ext')
"
examplearr=(`echo "$examplefile" | sed -e '/.*/s/asset(\(.*\))/\1/'`)
for el in ${examplearr[*]}; do
echo "$el"
done
output:
'1a/1b/1c.ext'
'2a/2b/2c.ext'
'3a/3b/3c.ext'
Note the wrapping of $examplefile in quotes, and the use of sed to replace the entire line with the match. If there will be other content in the file, either on the same lines as the "asset" string or in other lines with no assets at all you can refine it like this:
#!/bin/bash
examplefile="
fooasset('1a/1b/1c.ext')
asset('2a/2b/2c.ext')bar
foobar
fooasset('3a/3b/3c.ext')bar
"
examplearr=(`echo "$examplefile" | grep asset | sed -e '/.*/s/^.*asset(\(.*\)).*$/\1/'`)
for el in ${examplearr[*]}; do
echo "$el"
done
and achieve the same result.
There are several ways to do this. I'd do with GNU grep with perl-compatible regex (ah, delightful line noise):
mapfile -t examplearr < <(grep -oP '(?<=[(]).*?(?=[)])' <<<"$examplefile")
for i in "${!examplearr[@]}"; do printf "%d\t%s\n" "$i" "${examplearr[i]}"; done
0 '1a/1b/1c.ext'
1 '2a/2b/2c.ext'
2 '3a/3b/3c.ext'
This uses the bash mapfile command to read lines from stdin and assign them to an array.
The bits you're missing from the sed command:
$examplefile is text, not a filename, so you have to send it to sed's stdin
sed's a funny little language with 1-character commands: you've given it the "a" command, which is inappropriate in this case.
you only want to output the captured parts of the matches, not every line, so you need the -n option, and you need to print somewhere: the p flag in s///p means "print the [line] if a substitution was made".
sed -n 's/asset\(([^)]*)\)/\1/p' <<<"$examplefile"
# or
echo "$examplefile" | sed -n 's/asset\(([^)]*)\)/\1/p'
Note that this returns values like ('1a/1b/1c.ext'), with the parentheses. If you don't want them, add the -r or -E option to sed: among other things, that flips the meaning of ( and \(.
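Putting the -E variant together with mapfile gives something like the sketch below. The sample text follows the question, with one extra non-matching line added for illustration; the leading and trailing .* are an addition so that any text around the asset(...) call is dropped as well.

```shell
examplefile="
asset('1a/1b/1c.ext')
fooasset('2a/2b/2c.ext')bar
foobar
"

# With -E, \( and \) are literal parentheses and ( ) form the capture group.
# -n plus the p flag prints only the lines where a substitution happened.
mapfile -t examplearr < <(sed -n -E 's/.*asset\(([^)]*)\).*/\1/p' <<< "$examplefile")

for el in "${examplearr[@]}"; do
  echo "$el"
done
```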

Bash - how to ignore first delimiter of each line?

I have a file BookDB.txt which stores information in the following manner :
C++ for dummies:Jared:10.67:4:5
Java for dummies:David:10.45:3:6
PHP for dummies:Sarah:10.47:2:7
How do I ignore the first delimiter of each line and add the first 2 fields into an array? (Refer to example below).
Assuming that at runtime, the script asks the user for the variables TITLE and AUTHOR respectively. How would I then store the combined fields into an array?
Eg :
ARRAY=('C++ for dummies:Jared' 'Java for dummies:David' 'PHP for dummies:Sarah')
ARRAY=($TITLE:$AUTHOR)
This is very similar to your other question, and it would have been beneficial for you to link it.
My answer there can be modified to handle this quite easily.
IFS=$'\n'; arr=( $(awk -F':' '{print $1 ":" $2 }' BookDB.txt ) )
Note that there is no need to ignore the first delimiter to solve this problem. It suffices to acknowledge it and incorporate two fields instead of one.
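A variant that avoids changing IFS globally is to let mapfile collect awk's output line by line. This is only a sketch: the temporary file stands in for BookDB.txt, and the TITLE and AUTHOR values are hard-coded here where the real script would read them from the user.

```shell
# Stand-in for BookDB.txt so the example runs on its own.
tmpfile=$(mktemp)
printf '%s\n' 'C++ for dummies:Jared:10.67:4:5' \
              'Java for dummies:David:10.45:3:6' > "$tmpfile"

# Keep the first two colon-separated fields of each line as one element.
mapfile -t arr < <(awk -F: '{ print $1 ":" $2 }' "$tmpfile")
rm -f "$tmpfile"

# Values that would normally come from user input at runtime.
TITLE='PHP for dummies'
AUTHOR='Sarah'
arr+=( "$TITLE:$AUTHOR" )             # quoted, so the spaces stay in one element

printf '%s\n' "${arr[@]}"
```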
