Bash Array not sorting correctly - arrays

I need to order these 2 arrays, I don't care about the format of the output, I only need it to be ordered in order to compare them but this doesn't seem to work, although it works with simpler text. I also tried removing the --field-separator='"'
DIG_1=("sampletext""zzz""ms=ms91608007""asdas")
DIG_2=("zzz""ms=ms91608007""sampletext""asdas")
echo "unsorted:"
echo ${DIG_1[*]}
echo ${DIG_2[*]}
IFS=$'\n' sorted=($(sort --field-separator='"' <<<"${DIG_1[*]}")); unset IFS
IFS=$'\n' sorted2=($(sort --field-separator='"' <<<"${DIG_2[*]}")); unset IFS
echo "sorted:"
echo ${sorted[*]}
echo ${sorted2[*]}
And the output I get is:
unsorted:
sampletextzzzms=ms91608007asdas
zzzms=ms91608007sampletextasdas
sorted:
sampletextzzzms=ms91608007asdas
zzzms=ms91608007sampletextasdas
How Can I fix this? I want it to be, for example:
unsorted:
sampletextzzzms=ms91608007asdas
zzzms=ms91608007sampletextasdas
sorted:
asdasms=ms91608007sampletextzzz
asdasms=ms91608007sampletextzzz

There's no reason to use an array to store one element.
Since you need to keep the double quotes, you need to make efforts to preserve them:
DIG_1='"sampletext""zzz""ms=ms91608007""asdas"'
Otherwise the double quotes will be removed by the shell: 3.5.9 Quote Removal
When you use VAR=value some_command, that variable is only
set for the duration of some_command -- bash puts that variable in the environment for the command, not into the shell's own catalog of variables. Subsequently unsetting the
variable is not required -- unsetting the IFS variable is potentially harmful
for the rest of the program
sort won't sort the fields within a record, it's for sorting records against each other.
To accomplish what you want, this will do:
sorted_1=$(grep -Po '(?<=").*?(?=")' <<<"$DIG_1" | sort | paste -s -d "")

As anubhava mentioned in the comments, the current code is creating arrays of single values, ie:
$ DIG_1=("sampletext""zzz""ms=ms91608007""asdas")
$ typeset -p DIG_1
declare -a DIG_1=([0]="sampletextzzzms=ms91608007asdas")
$ DIG_2=("zzz""ms=ms91608007""sampletext""asdas")
$ typeset -p DIG_2
declare -a DIG_2=([0]="zzzms=ms91608007sampletextasdas")
Assuming the OP really does want an array, and that the array elements will be utilized in later code, we need a way to delimit the items of the array, and the easiest way to do this is with some white space, eg:
$ DIG_1=("sample text" "zzz" "ms=ms91608007" "asdas")
$ typeset -p DIG_1
declare -a DIG_1=([0]="sample text" [1]="zzz" [2]="ms=ms91608007" [3]="asdas")
$ DIG_2=("zzz" "ms=ms91608007" "sample text" "asdas")
$ typeset -p DIG_2
declare -a DIG_2=([0]="zzz" [1]="ms=ms91608007" [2]="sample text" [3]="asdas")
NOTE: I've added a single space to change "sampletext" to "sample text" so that we can see how a space is treated a) as part of the data vs b) as a delimiter.
NOTE: Assuming OPs code is generating the questionable array assignments (eg, DIG_1=("sampletext""zzz""ms=ms91608007""asdas")), it may make more sense to look into ways to 'fix' the array generator than to complicate the code by trying to figure out how to treat these single strings as a 4-part array definition.
Also, since the sample output (current vs desired) shows no double quotes I'm guessing this means the double quotes are not part of the actual data but rather just delimiters.
Now that we have an actual array of elements we can look at sorting the arrays and storing the results into additional (sorted) arrays, eg:
$ IFS=$'\n' sorted=($(printf "%s\n" "${DIG_1[#]}" | sort))
$ typeset -p sorted
declare -a sorted=([0]="asdas" [1]="ms=ms91608007" [2]="sample text" [3]="zzz")
$ IFS=$'\n' sorted2=($(printf "%s\n" "${DIG_2[#]}" | sort))
$ typeset -p sorted2
declare -a sorted2=([0]="asdas" [1]="ms=ms91608007" [2]="sample text" [3]="zzz")
At this point we now have 2 sets of arrays ... 1) original data (DIG_1[#] and DIG_2[#]) and 2) sorted (sorted[#] and sorted2[#]).
The OP can then slice-n-dice the data as desired, as well as print the contents of the arrays in any desired format, eg:
# print array elements on a single line with no delimiters, storing the results
# in variables for later use/comparison/display
$ printf -v srt "%s" "${sorted[#]}"
$ typeset -p srt
declare -- srt="asdasms=ms91608007sample textzzz"
$ echo "${srt}"
asdasms=ms91608007sample textzzz
$ printf -v srt2 "%s" "${sorted2[#]}"
$ typeset -p srt2
declare -- srt2="asdasms=ms91608007sample textzzz"
$ echo "${srt2}"
asdasms=ms91608007sample textzzz

Related

BASH - Create arrays from lines of csv file, where first entry is array name

I'm learning to script in Bash.
I have an CSV file, which contains next lines:
numbers,one,two,three,four,five
colors,red,blue,green,yellow,white
custom-1,a,b,c,d,e
custom+2,t,y,w,x,z
Need to create arrays from this, where first entry is array name, eg.
number=(one,two,three,four,five)
colors=(red,blue,green,yellow,white)
custom-1=(a,b,c,d,e)
custom+2=(t,y,w,x,z)
Here is my script:
IFS=","
while read NAME VALUES ; do
declare -a $NAME
arrays+=($NAME)
IFS=',' read -r -a $NAME <<< "${VALUES[0]}"
done < file.csv
When I try with csv file, containing only two first string (numbers and colors), code works well. And if i try to with number, colors, custom-1, custom-2, there is error during reading csv:
./script.sh: line 5: declare: `custom-1': not a valid identifier
./script.sh: line 7: read: `custom+2': not a valid identifier
because bash does not allow special characters in variable names, as far as I understand. Is there any way to avoid this?
As you cannot use the first column of your CSV file as bash array names one option would be to generate valid names using a counter (e.g. arrayN). If you want to access your data using the values of this first column you would also need to store them somewhere with the corresponding counter value. An associative array (declare -A names=()) would be perfect. Last but not least, the namerefs (declare -n arr=..., available starting with bash 4.3) will be convenient to store and access your data. Example:
declare -i cnt=1
declare -A names=()
while IFS=',' read -r -a line; do
names["${line[0]}"]="$cnt"
declare -n arr="array$cnt"
unset line[0]
declare -a arr=( "${line[#]}" )
((cnt++))
done < foo.csv
Now, to access the values corresponding to, let's say, entry custom+2, first get the corresponding counter value, declare a nameref pointing to the corresponding array and voilà:
$ cnt="${names[custom+2]}"
$ declare -n arr="array$cnt"
$ echo "${arr[#]}"
t y w x z
Let's declare a function for easier access:
getdata () {
local -i cnt="${names[$1]}"
local -n arr="array$cnt"
[ -z "$2" ] && echo "${arr[#]}" || echo "${arr[$2]}"
}
And then:
$ getdata "custom+2"
t y w x z
$ getdata "colors"
red blue green yellow white
$ getdata "colors" 3
yellow

how to parse json in bash script as an array? array value should have both key:value format

my json.file looks like
{
"price1" : "120.10",
"price2" : "110.30",
"price3" : "244.45"
}
I have used below sed command in my bash script to declare array that reads from json
array=( $(sed -n '/{/,/}/{s/[^:]*:[^"]*"\([^"]*\).*/\1/p;}' json.file) )
this gave me output for echo ${array[*]} values
120.10 110.10 244.45
I am looking for my array values to include the key name as well (key:value).
my desired output should be key:value format
price1:120.10 price2:110.10 price3:244.45
can someone please help or guide me?
Parsing json with sed is probably a bad idea. But let's assume it is a super-simple and regular kind of json format. You were not too far from a working solution:
$ array=($(sed -nr 's/.*"(.*)".*"(.*)".*/\1:\2/p' json.file))
$ echo ${array[*]}
price1:120.10 price2:110.30 price3:244.45
The p flag of the substitute command tells sed to print the result if there was a match. It is good to know. If you have only two quoted strings per line of interest it should work. The regular expression .*"(.*)".*"(.*)".* matches anything - .* - followed by a double quote, anything again, another double quote... The parentheses - (.*) - do not change what is matched. It just records the matched part between parentheses in one of nine buffers that can be used in the replacement string - \1:\2. So here, the replacement string corresponds to:
<first recorded match>:<second recorded match>
If you want to be more specific about the matching lines you can. For instance:
sed -nr 's/^ "(.*)" : "(.*)",?$/\1:\2/p' json.file
Also specifies that there are 2 leading spaces, that the colon is preceded and followed by one single space and that there is a optional comma after the last quote and before the end of line.
But there is also the much simpler:
sed -nr 's/[[:space:]",]//gp' json.file
that just removes all spaces, double quotes and commas, printing only the matching lines. And guess what? It is what you (apparently) want.
Anyway, remember that if your json files are more complex than what you show sed is definitely not the right tool.
Your solution just needs to extract the extra data:
array=( $(sed -n '/{/,/}/{s/"\([^"]*\)"[^:]*:[^"]*"\([^"]*\).*/\1:\2/p;}' <<< "$j") )
echo ${array[#]}
price1:120.10 price2:110.30 price3:244.45
Simpler looking solutions may work too, but I'm assuming that you wrote your pattern like that deliberately.
The key trick here is to enclose part of your pattern with escaped brackets \(...\) and then have a numbered substitution for each in the output part \1, \2, etc.
json='{"price1":"120.10","price2":"110.30","price3":"244.45"}'
array=($(jq -r 'to_entries[] | ( "\(.key):\(.value)")' <<< "$json"))
echo ${array[#]}
printf '%s\n' "${array[#]}"
declare -p array
echo
array=($(jq -r 'to_entries[] | ( "\(.key):\(.value)") | #sh' <<< "$json"))
echo ${array[#]}
printf '%s\n' "${array[#]}"
declare -p array
Output:
price1:120.10 price2:110.30 price3:244.45
price1:120.10
price2:110.30
price3:244.45
declare -a array='([0]="price1:120.10" [1]="price2:110.30" [2]="price3:244.45")'
'price1:120.10' 'price2:110.30' 'price3:244.45'
'price1:120.10'
'price2:110.30'
'price3:244.45'
declare -a array='([0]="'\''price1:120.10'\''" [1]="'\''price2:110.30'\''" [2]="'\''price3:244.45'\''")'
For bash scripting, you might want to use an Associative array:
declare -A prices
while IFS=$'\t' read -r key value; do
prices[$key]=$value
done < <(
jq -r 'to_entries[] | [.key, .value] | #tsv' json.file
)
Then, inspect the array
declare -p prices
declare -A prices='([price1]="120.10" [price3]="244.45" [price2]="110.30" )'
or iterate over it
for key in "${!prices[#]}"; do
printf '%s => %s\n' "$key" "${prices[$key]}"
done
price1 => 120.10
price3 => 244.45
price2 => 110.30

How do I convert CSV data into an associative array using Bash 4?

The file /tmp/file.csv contains the following:
name,age,gender
bob,21,m
jane,32,f
The CSV file will always have headers.. but might contain a different number of fields:
id,title,url,description
1,foo name,foo.io,a cool foo site
2,bar title,http://bar.io,a great bar site
3,baz heading,https://baz.io,some description
In either case, I want to convert my CSV data into an array of associative arrays..
What I need
So, I want a Bash 4.3 function that takes CSV as piped input and sends the array to stdout:
/tmp/file.csv:
name,age,gender
bob,21,m
jane,32,f
It needs to be used in my templating system, like this:
{{foo | csv_to_array | foo2}}
^ this is a fixed API, I must use that syntax .. foo2 must receive the array as standard input.
The csv_to_array func must do it's thing, so that afterwards I can do this:
$ declare -p row1; declare -p row2; declare -p new_array;
and it would give me this:
declare -A row1=([gender]="m" [name]="bob" [age]="21" )
declare -A row2=([gender]="f" [name]="jane" [age]="32" )
declare -a new_array=([0]="row1" [1]="row2")
..Once I have this array structure (an indexed array of associative array names), I have a shell-based templating system to access them, like so:
{{#new_array}}
Hi {{item.name}}, you are {{item.age}} years old.
{{/new_array}}
But I'm struggling to generate the arrays I need..
Things I tried:
I have already tried using this as a starting point to get the array structure I need:
while IFS=',' read -r -a my_array; do
echo ${my_array[0]} ${my_array[1]} ${my_array[2]}
done <<< $(cat /tmp/file.csv)
(from Shell: CSV to array)
..and also this:
cat /tmp/file.csv | while read line; do
line=( ${line//,/ } )
echo "0: ${line[0]}, 1: ${line[1]}, all: ${line[#]}"
done
(from https://www.reddit.com/r/commandline/comments/1kym4i/bash_create_array_from_one_line_in_csv/cbu9o2o/)
but I didn't really make any progress in getting what I want out the other end...
EDIT:
Accepted the 2nd answer, but I had to hack the library I am using to make either solution work..
I'll be happy to look at other answers, which do not export the declare commands as strings, to be run in the current env, but instead somehow hoist the resultant arrays of the declare commands to the current env (the current env is wherever the function is run from).
Example:
$ cat file.csv | csv_to_array
$ declare -p row2 # gives the data
So, to be clear, if the above ^ works in a terminal, it'll work in the library I'm using without the hacks I had to add (which involved grepping STDIN for ^declare -a and using source <(cat); eval $STDIN... in other functions)...
See my comments on the 2nd answer for more info.
The approach is straightforward:
Read the column headers into an array
Read the file line by line, in each line …
Create a new associative array and register its name in the array of array names
Read the fields and assign them according to the column headers
In the last step we cannot use read -a, mapfile, or things like these since they only create regular arrays with numbers as indices, but we want an associative array instead, so we have to create the array manually.
However, the implementation is a bit convoluted because of bash's quirks.
The following function parses stdin and creates arrays accordingly.
I took the liberty to rename your array new_array to rowNames.
#! /bin/bash
csvToArrays() {
IFS=, read -ra header
rowIndex=0
while IFS= read -r line; do
((rowIndex++))
rowName="row$rowIndex"
declare -Ag "$rowName"
IFS=, read -ra fields <<< "$line"
fieldIndex=0
for field in "${fields[#]}"; do
printf -v quotedFieldHeader %q "${header[fieldIndex++]}"
printf -v "$rowName[$quotedFieldHeader]" %s "$field"
done
rowNames+=("$rowName")
done
declare -p "${rowNames[#]}" rowNames
}
Calling the function in a pipe has no effect. Bash executes the commands in a pipe in a subshell, therefore you won't have access to the arrays created by someCommand | csvToArrays. Instead, call the function as either one of the following
csvToArrays < <(someCommand) # when input comes from a command, except "cat file"
csvToArrays < someFile # when input comes from a file
Bash scripts like these tend to be very slow. That's the reason why I didn't bother to extract printf -v quotedFieldHeader … from the inner loop even though it will do the same work over and over again.
I think the whole templating thing and everything related would be way easier to program and faster to execute in languages like python, perl, or something like that.
The following script:
csv_to_array() {
local -a values
local -a headers
local counter
IFS=, read -r -a headers
declare -a new_array=()
counter=1
while IFS=, read -r -a values; do
new_array+=( row$counter )
declare -A "row$counter=($(
paste -d '' <(
printf "[%s]=\n" "${headers[#]}"
) <(
printf "%q\n" "${values[#]}"
)
))"
(( counter++ ))
done
declare -p new_array ${!row*}
}
foo2() {
source <(cat)
declare -p new_array ${!row*} |
sed 's/^/foo2: /'
}
echo "==> TEST 1 <=="
cat <<EOF |
id,title,url,description
1,foo name,foo.io,a cool foo site
2,bar title,http://bar.io,a great bar site
3,baz heading,https://baz.io,some description
EOF
csv_to_array |
foo2
echo "==> TEST 2 <=="
cat <<EOF |
name,age,gender
bob,21,m
jane,32,f
EOF
csv_to_array |
foo2
will output:
==> TEST 1 <==
foo2: declare -a new_array=([0]="row1" [1]="row2" [2]="row3")
foo2: declare -A row1=([url]="foo.io" [description]="a cool foo site" [id]="1" [title]="foo name" )
foo2: declare -A row2=([url]="http://bar.io" [description]="a great bar site" [id]="2" [title]="bar title" )
foo2: declare -A row3=([url]="https://baz.io" [description]="some description" [id]="3" [title]="baz heading" )
==> TEST 2 <==
foo2: declare -a new_array=([0]="row1" [1]="row2")
foo2: declare -A row1=([gender]="m" [name]="bob" [age]="21" )
foo2: declare -A row2=([gender]="f" [name]="jane" [age]="32" )
The output comes from foo2 function.
The csv_to_array function first reads the headaers. Then for each read line it adds new element into new_array array and also creates a new associative array with the name row$index with elements created from joining the headers names with values read from the line. On the end the output from declare -p is outputted from the function.
The foo2 function sources the standard input, so the arrays come into scope for it. It outputs then those values again, prepending each line with foo2:.

Using Bash, is it possible to store an array in a dictionary

With bash, it is possible to store an array in a dictionary? I have shown some sample code of fetching an array from the dictionary but it seems to lose the fact that it is an array.
I expect it is the dict+=(["pos"]="${array[#]}") command but am unsure of how to do this or if it is even possible.
# Normal array behaviour (just an example)
array=(1 2 3)
for a in "${array[#]}"
do
echo "$a"
done
# Outputs:
# 1
# 2
# 3
# Array in a dictionary
declare -A dict
dict+=(["pos"]="${array[#]}")
# When I fetch the array, it is not an array anymore
posarray=("${dict[pos]}")
for a in "${posarray[#]}"
do
echo "$a"
done
# Outputs:
# 1 2 3
# but I want
# 1
# 2
# 3
No, but there are workarounds.
Using printf '%q ' + eval
You can flatten your array into a string:
printf -v array_str '%q ' "${array[#]}"
dict["pos"]=$array_str
...and then use eval to expand that array back:
# WARNING: Only safe if array was populated with eval-safe strings, as from printf %q
key=pos; dest=array
printf -v array_cmd "%q=( %s )" "$dest" "${dict[$key]}"
eval "$array_cmd"
Note that this is only safe if your associative array is populated through the code using printf '%q ' to escape the values before they're added; content that avoids this process is potentially unsafe to eval.
Using base64 encoding
Slower but safer (if you can't prevent modification of your dictionary's contents by untrusted code), another approach is to store a base64-encoded NUL-delimited list:
dict["pos"]=$(printf '%s\0' "${array[#]}" | openssl enc base64)
...and read it out the same way:
array=( )
while IFS= read -r -d '' item; do
array+=( "$item" )
done < <(openssl enc -d base64 <<<"${dict["pos"]}"
Using Multiple Variables + Indirect Expansion
This one's actually symmetric, though it requires bash 4.3 or newer. That said, it restricts your key names to those which are permissible as shell variable names.
key=pos
array=( "first value" "second value" )
printf -v var_name 'dict_%q' "$key"
declare -n var="$var_name"
var=( "${array[#]}" )
unset -n var
...whereafter declare -p dict_pos will emit declare -a dict_pos=([0]="first value" [1]="second value"). On the other end, for retrieval:
key=pos
printf -v var_name 'dict_%q' "$key"
declare -n var="$var_name"
array=( "${var[#]}" )
unset -n var
...whereafter declare -p array will emit declare -a array=([0]="first value" [1]="second value").
Dictionaries are associative arrays, so the question rephrased is: "Is it possible to store an array inside another array?"
No, it's not. Arrays cannot be nested.
dict+=(["pos"]="${array[#]}")
For this to work you'd need an extra set of parentheses to capture the value as an array and not a string:
dict+=(["pos"]=("${array[#]}"))
But that's not legal syntax.

Bash : Can Here Strings <<< work with multiple variables as input?

I am trying to initialize an array with multiple variables as below .
StringOne="This is a Test String"
StringTwo="This is a New String"
read -r -a Values <<< "$StringOne" "$StringTwo"
But It seems like array is getting values from only the first variable .ie StringOne
$ echo ${Values[0]}
This
$ echo ${Values[1]}
is
$ echo ${Values[2]}
a
$ echo ${Values[3]}
Test
${Values[4]}
String
$ echo ${Values[5]}
$ echo ${Values[6]}
$
What is wrong with this way of passing variable value for array initialization ? Cant we pass multiple variables with <<< operator ?
What is wrong with this way of passing variable value for array initialization ? Cant we pass multiple variables with <<< operator ?
Yes and no. The <<< operator takes one shell word as its operand, as is presented pretty clearly in its documentation. But you can combine the values of multiple variables in a single shell word by appropriate use of quoting:
StringOne="This is a Test String"
StringTwo="This is a New String"
read -r -a Values <<< "$StringOne $StringTwo"
echo "${Values[#]}"
Output:
This is a Test String This is a New String
Use:
read -r -a Values <<< "$StringOne"" ""$StringTwo"
Or:
read -r -a Values <<< "$StringOne"' '"$StringTwo"
Or:
out="$StringOne $StringTwo"
read -r -a Values <<< "$out"
Or some other variation on this theme.
This works because the string sent via <<< is the result of the expansion of what in on the right of it. We can make both variables act as one long string by concatenating them.

Resources