Convert an array into an associative array (Bash) - arrays

I had my code converting my string into an array, but I ended up noticing that I need it to be associative, with $9 as the key and the rest of the string as the value.
stringy=$(ls -l | awk '{print$3,$6,$7,$8,$9}')
declare -a myarray=()
Is it possible to do this using something similar to the following?
readarray -t myarray <<< "$stringy"
(Yes, parsing ls is not exactly wise ;) )
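One possible approach, sketched under a few assumptions: instead of readarray, loop over the lines with read and fill an associative array (this needs Bash 4 or later). The sample text below is made up and only stands in for the awk output; the names myassoc, owner, month, day, time_or_year and name are illustrative and not from the question.

```bash
#!/usr/bin/env bash
# Hypothetical sample standing in for: ls -l | awk '{print $3,$6,$7,$8,$9}'
stringy=$'alice Jan 5 10:01 notes.txt\nbob Feb 12 2021 report.pdf'

declare -A myassoc=()

# The last variable given to read receives the remainder of the line,
# so the fifth field (the file name) becomes the key.
while read -r owner month day time_or_year name; do
    [[ -n $name ]] || continue            # skip short lines such as "total 8"
    myassoc["$name"]="$owner $month $day $time_or_year"
done <<< "$stringy"

printf '%s\n' "${myassoc[notes.txt]}"     # alice Jan 5 10:01
printf '%s\n' "${myassoc[report.pdf]}"    # bob Feb 12 2021
```

Because the key comes from parsed ls output, this remains fragile for unusual file names; it is a sketch of the array handling only.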

Related

Bash Array not sorting correctly

I need to sort these 2 arrays. I don't care about the format of the output; I only need them ordered so that I can compare them. But this doesn't seem to work, although it works with simpler text. I also tried removing the --field-separator='"' option.
DIG_1=("sampletext""zzz""ms=ms91608007""asdas")
DIG_2=("zzz""ms=ms91608007""sampletext""asdas")
echo "unsorted:"
echo ${DIG_1[*]}
echo ${DIG_2[*]}
IFS=$'\n' sorted=($(sort --field-separator='"' <<<"${DIG_1[*]}")); unset IFS
IFS=$'\n' sorted2=($(sort --field-separator='"' <<<"${DIG_2[*]}")); unset IFS
echo "sorted:"
echo ${sorted[*]}
echo ${sorted2[*]}
And the output I get is:
unsorted:
sampletextzzzms=ms91608007asdas
zzzms=ms91608007sampletextasdas
sorted:
sampletextzzzms=ms91608007asdas
zzzms=ms91608007sampletextasdas
How can I fix this? I want it to be, for example:
unsorted:
sampletextzzzms=ms91608007asdas
zzzms=ms91608007sampletextasdas
sorted:
asdasms=ms91608007sampletextzzz
asdasms=ms91608007sampletextzzz
There's no reason to use an array to store one element.
Since you need to keep the double quotes, you need to make efforts to preserve them:
DIG_1='"sampletext""zzz""ms=ms91608007""asdas"'
Otherwise the double quotes will be removed by the shell (see section 3.5.9, Quote Removal, of the Bash manual).
When you use VAR=value some_command, that variable is only set for the duration of some_command -- bash puts that variable in the environment for the command, not into the shell's own catalog of variables. Subsequently unsetting the variable is not required -- unsetting the IFS variable is potentially harmful for the rest of the program.
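A small sketch of that scoping rule, using a made-up variable name (MYVAR) and a child bash process as the command. It also shows the contrasting case: a line that consists only of assignments, with no command after them, does change the shell's own variables.

```bash
#!/usr/bin/env bash
unset MYVAR

# Prefix assignment: MYVAR exists only in the environment of the child command.
in_child=$(MYVAR=hello bash -c 'printf %s "$MYVAR"')
after=${MYVAR-unset}                 # expands to "unset" if MYVAR is not set

echo "child saw: $in_child"          # child saw: hello
echo "shell has: $after"             # shell has: unset

# Assignments with no command following them persist in the current shell.
MYVAR=first other=second
echo "now: $MYVAR"                   # now: first
```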
sort won't sort the fields within a record, it's for sorting records against each other.
To accomplish what you want, this will do:
sorted_1=$(grep -Po '(?<=").*?(?=")' <<<"$DIG_1" | sort | paste -s -d "")
As anubhava mentioned in the comments, the current code is creating arrays of single values, ie:
$ DIG_1=("sampletext""zzz""ms=ms91608007""asdas")
$ typeset -p DIG_1
declare -a DIG_1=([0]="sampletextzzzms=ms91608007asdas")
$ DIG_2=("zzz""ms=ms91608007""sampletext""asdas")
$ typeset -p DIG_2
declare -a DIG_2=([0]="zzzms=ms91608007sampletextasdas")
Assuming the OP really does want an array, and that the array elements will be utilized in later code, we need a way to delimit the items of the array, and the easiest way to do this is with some white space, eg:
$ DIG_1=("sample text" "zzz" "ms=ms91608007" "asdas")
$ typeset -p DIG_1
declare -a DIG_1=([0]="sample text" [1]="zzz" [2]="ms=ms91608007" [3]="asdas")
$ DIG_2=("zzz" "ms=ms91608007" "sample text" "asdas")
$ typeset -p DIG_2
declare -a DIG_2=([0]="zzz" [1]="ms=ms91608007" [2]="sample text" [3]="asdas")
NOTE: I've added a single space to change "sampletext" to "sample text" so that we can see how a space is treated a) as part of the data vs b) as a delimiter.
NOTE: Assuming the OP's code is generating the questionable array assignments (eg, DIG_1=("sampletext""zzz""ms=ms91608007""asdas")), it may make more sense to look into ways to 'fix' the array generator than to complicate the code by trying to figure out how to treat these single strings as a 4-part array definition.
Also, since the sample output (current vs desired) shows no double quotes I'm guessing this means the double quotes are not part of the actual data but rather just delimiters.
Now that we have an actual array of elements we can look at sorting the arrays and storing the results into additional (sorted) arrays, eg:
$ IFS=$'\n' sorted=($(printf "%s\n" "${DIG_1[@]}" | sort))
$ typeset -p sorted
declare -a sorted=([0]="asdas" [1]="ms=ms91608007" [2]="sample text" [3]="zzz")
$ IFS=$'\n' sorted2=($(printf "%s\n" "${DIG_2[@]}" | sort))
$ typeset -p sorted2
declare -a sorted2=([0]="asdas" [1]="ms=ms91608007" [2]="sample text" [3]="zzz")
At this point we now have 2 sets of arrays ... 1) original data (DIG_1[@] and DIG_2[@]) and 2) sorted (sorted[@] and sorted2[@]).
The OP can then slice-n-dice the data as desired, as well as print the contents of the arrays in any desired format, eg:
# print array elements on a single line with no delimiters, storing the results
# in variables for later use/comparison/display
$ printf -v srt "%s" "${sorted[@]}"
$ typeset -p srt
declare -- srt="asdasms=ms91608007sample textzzz"
$ echo "${srt}"
asdasms=ms91608007sample textzzz
$ printf -v srt2 "%s" "${sorted2[@]}"
$ typeset -p srt2
declare -- srt2="asdasms=ms91608007sample textzzz"
$ echo "${srt2}"
asdasms=ms91608007sample textzzz
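Since the stated goal is to compare the two arrays, here is a hedged sketch of one way to do the sort and the comparison without touching IFS. It uses mapfile (Bash 4+) and assumes that no element contains a newline; the variable names joined1, joined2 and result are illustrative.

```bash
#!/usr/bin/env bash
DIG_1=("sample text" "zzz" "ms=ms91608007" "asdas")
DIG_2=("zzz" "ms=ms91608007" "sample text" "asdas")

# mapfile -t stores one line per element, so spaces inside elements survive
# and IFS is left alone.
mapfile -t sorted  < <(printf '%s\n' "${DIG_1[@]}" | sort)
mapfile -t sorted2 < <(printf '%s\n' "${DIG_2[@]}" | sort)

# Join each sorted array with newlines and compare the resulting strings.
joined1=$(printf '%s\n' "${sorted[@]}")
joined2=$(printf '%s\n' "${sorted2[@]}")

if [[ $joined1 == "$joined2" ]]; then
    result="same elements"
else
    result="different elements"
fi
echo "$result"                       # same elements
```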

How do I convert CSV data into an associative array using Bash 4?

The file /tmp/file.csv contains the following:
name,age,gender
bob,21,m
jane,32,f
The CSV file will always have headers.. but might contain a different number of fields:
id,title,url,description
1,foo name,foo.io,a cool foo site
2,bar title,http://bar.io,a great bar site
3,baz heading,https://baz.io,some description
In either case, I want to convert my CSV data into an array of associative arrays..
What I need
So, I want a Bash 4.3 function that takes CSV as piped input and sends the array to stdout:
/tmp/file.csv:
name,age,gender
bob,21,m
jane,32,f
It needs to be used in my templating system, like this:
{{foo | csv_to_array | foo2}}
^ this is a fixed API, I must use that syntax .. foo2 must receive the array as standard input.
The csv_to_array func must do its thing, so that afterwards I can do this:
$ declare -p row1; declare -p row2; declare -p new_array;
and it would give me this:
declare -A row1=([gender]="m" [name]="bob" [age]="21" )
declare -A row2=([gender]="f" [name]="jane" [age]="32" )
declare -a new_array=([0]="row1" [1]="row2")
..Once I have this array structure (an indexed array of associative array names), I have a shell-based templating system to access them, like so:
{{#new_array}}
Hi {{item.name}}, you are {{item.age}} years old.
{{/new_array}}
But I'm struggling to generate the arrays I need..
Things I tried:
I have already tried using this as a starting point to get the array structure I need:
while IFS=',' read -r -a my_array; do
echo ${my_array[0]} ${my_array[1]} ${my_array[2]}
done <<< $(cat /tmp/file.csv)
(from Shell: CSV to array)
..and also this:
cat /tmp/file.csv | while read line; do
line=( ${line//,/ } )
echo "0: ${line[0]}, 1: ${line[1]}, all: ${line[@]}"
done
(from https://www.reddit.com/r/commandline/comments/1kym4i/bash_create_array_from_one_line_in_csv/cbu9o2o/)
but I didn't really make any progress in getting what I want out the other end...
EDIT:
Accepted the 2nd answer, but I had to hack the library I am using to make either solution work..
I'll be happy to look at other answers, which do not export the declare commands as strings, to be run in the current env, but instead somehow hoist the resultant arrays of the declare commands to the current env (the current env is wherever the function is run from).
Example:
$ cat file.csv | csv_to_array
$ declare -p row2 # gives the data
So, to be clear, if the above ^ works in a terminal, it'll work in the library I'm using without the hacks I had to add (which involved grepping STDIN for ^declare -a and using source <(cat); eval $STDIN... in other functions)...
See my comments on the 2nd answer for more info.
The approach is straightforward:
Read the column headers into an array
Read the file line by line, in each line …
Create a new associative array and register its name in the array of array names
Read the fields and assign them according to the column headers
In the last step we cannot use read -a, mapfile, or things like these since they only create regular arrays with numbers as indices, but we want an associative array instead, so we have to create the array manually.
However, the implementation is a bit convoluted because of bash's quirks.
The following function parses stdin and creates arrays accordingly.
I took the liberty to rename your array new_array to rowNames.
#! /bin/bash
csvToArrays() {
IFS=, read -ra header
rowIndex=0
while IFS= read -r line; do
((rowIndex++))
rowName="row$rowIndex"
declare -Ag "$rowName"
IFS=, read -ra fields <<< "$line"
fieldIndex=0
for field in "${fields[@]}"; do
printf -v quotedFieldHeader %q "${header[fieldIndex++]}"
printf -v "$rowName[$quotedFieldHeader]" %s "$field"
done
rowNames+=("$rowName")
done
declare -p "${rowNames[@]}" rowNames
}
Calling the function in a pipe has no effect. Bash executes the commands in a pipe in a subshell, therefore you won't have access to the arrays created by someCommand | csvToArrays. Instead, call the function as either one of the following
csvToArrays < <(someCommand) # when input comes from a command, except "cat file"
csvToArrays < someFile # when input comes from a file
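The subshell effect can be seen with a much smaller function. This is only an illustrative sketch; set_flag and flag are made-up names, and it assumes Bash's default settings (the lastpipe option is off).

```bash
#!/usr/bin/env bash
# Reads one line from stdin into the global variable "flag".
set_flag() { read -r flag; }

flag=initial

# In a pipeline, set_flag runs in a subshell, so its assignment is lost.
echo changed | set_flag
after_pipe=$flag

# With process substitution, set_flag runs in the current shell.
set_flag < <(echo changed)
after_procsub=$flag

echo "$after_pipe $after_procsub"    # initial changed
```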
Bash scripts like these tend to be very slow. That's the reason why I didn't bother to extract printf -v quotedFieldHeader … from the inner loop even though it will do the same work over and over again.
I think the whole templating thing and everything related would be way easier to program and faster to execute in languages like python, perl, or something like that.
The following script:
csv_to_array() {
local -a values
local -a headers
local counter
IFS=, read -r -a headers
declare -a new_array=()
counter=1
while IFS=, read -r -a values; do
new_array+=( row$counter )
declare -A "row$counter=($(
paste -d '' <(
printf "[%s]=\n" "${headers[@]}"
) <(
printf "%q\n" "${values[@]}"
)
))"
(( counter++ ))
done
declare -p new_array ${!row*}
}
foo2() {
source <(cat)
declare -p new_array ${!row*} |
sed 's/^/foo2: /'
}
echo "==> TEST 1 <=="
cat <<EOF |
id,title,url,description
1,foo name,foo.io,a cool foo site
2,bar title,http://bar.io,a great bar site
3,baz heading,https://baz.io,some description
EOF
csv_to_array |
foo2
echo "==> TEST 2 <=="
cat <<EOF |
name,age,gender
bob,21,m
jane,32,f
EOF
csv_to_array |
foo2
will output:
==> TEST 1 <==
foo2: declare -a new_array=([0]="row1" [1]="row2" [2]="row3")
foo2: declare -A row1=([url]="foo.io" [description]="a cool foo site" [id]="1" [title]="foo name" )
foo2: declare -A row2=([url]="http://bar.io" [description]="a great bar site" [id]="2" [title]="bar title" )
foo2: declare -A row3=([url]="https://baz.io" [description]="some description" [id]="3" [title]="baz heading" )
==> TEST 2 <==
foo2: declare -a new_array=([0]="row1" [1]="row2")
foo2: declare -A row1=([gender]="m" [name]="bob" [age]="21" )
foo2: declare -A row2=([gender]="f" [name]="jane" [age]="32" )
The output comes from foo2 function.
The csv_to_array function first reads the headers. Then, for each line it reads, it adds a new element to the new_array array and creates a new associative array named row$counter, whose elements are built by joining the header names with the values read from that line. At the end, the function prints the output of declare -p.
The foo2 function sources its standard input, so the arrays come into scope for it. It then outputs those values again, prefixing each line with foo2:.

How can I see which values exist in a Bash array?

Given an array (in Bash), is there a command that prints the contents of the array according to indices?
Something like this: arr[0]=... , arr[1]=... ,...
I know that I can print it in a for loop, but I am looking for a command that does it.
Given an array with contiguous indices starting at 0:
$ arr=(one two three)
And non-contiguous indices:
$ declare -a arr2='([0]="one" [2]="two" [5]="three")'
You can print the values:
$ echo ${arr[*]} # same with arr2
one two three
Or, use a C style loop for arr:
$ for (( i=0;i<${#arr[@]};i++ )); do echo "arr[$i]=${arr[$i]}"; done
arr[0]=one
arr[1]=two
arr[2]=three
But that won't work for arr2.
So, you can expand the indices (contiguous or not) and print index, value like so:
$ for i in "${!arr2[@]}"; do echo "arr2[$i]=${arr2[$i]}"; done
arr2[0]=one
arr2[2]=two
arr2[5]=three
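If the loop is needed in more than one place, it could be wrapped in a small helper so that it behaves like a single command. This is only a sketch: print_array is a hypothetical name, and it relies on namerefs, which need Bash 4.3 or later. The array passed in must not itself be called _ref.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print every index/value pair of the named array.
print_array() {
    local -n _ref=$1                 # nameref to the array whose name was passed
    local i
    for i in "${!_ref[@]}"; do
        printf '%s[%s]=%s\n' "$1" "$i" "${_ref[$i]}"
    done
}

arr2=([0]=one [2]=two [5]=three)
print_array arr2
# arr2[0]=one
# arr2[2]=two
# arr2[5]=three
```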
Or inspect it with declare -p:
$ declare -p arr
declare -a arr='([0]="one" [1]="two" [2]="three")'
Which also works if the array has non-contiguous indices (where the C loop would break):
$ declare -p arr2
declare -a arr2='([0]="one" [2]="two" [5]="three")'
Note: A common mistake is to prefix the name with the sigil $ and assume you are addressing that array or variable. declare -p takes the unadorned name; with the sigil, the variable is dereferenced first, so you get the details of the name it contains instead of the variable itself:
$ k=arr
$ declare -p $k
declare -a arr='([0]="one" [1]="two" [2]="three")' # note this is 'arr' , not 'k'
$ declare -p k
declare -- k="arr"
Since declare with no arguments will print the entire Bash environment at that moment, you can also use utilities such as sed, grep, or awk against that output:
$ declare | grep 'arr'
arr=([0]="one" [1]="two" [2]="three")
k=arr

Can I pass an array to awk using -v?

I would like to be able to pass an array variable to awk. I don't mean a shell array but a native awk one. I know I can pass scalar variables like this:
awk -vfoo="1" 'NR==foo' file
Can I use the same mechanism to define an awk array? Something like:
$ awk -v"foo[0]=1" 'NR==foo' file
awk: fatal: `foo[0]' is not a legal variable name
I've tried a few variations of the above but none of them work on GNU awk 4.1.1 on my Debian. So, is there any version of awk (gawk,mawk or anything else) that can accept an array from the -v switch?
I know I can work around this and can easily think of ways to do so, I am just wondering if any awk implementation supports this kind of functionality natively.
You can use the split() function inside mawk or gawk to split the input of the "-v" value (here is the gawk man page):
split(s, a [, r [, seps] ])
Split the string s into the array a and the separators array seps on
the regular expression r, and return the number of fields.*
Here is an example in which I pass "ARRAYVAR", a comma-separated list of values that represents my array, to the awk program with "-v", split it into the internal array "arrayval" using the split() function, and then print the 3rd value of the array:
echo 0 | gawk -v ARRAYVAR="a,b,c,d,e,f" '{ split(ARRAYVAR,arrayval,","); print(arrayval[3]) }'
c
Seems to work :)
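When the values start out in a Bash array, the same split() idea might look like the sketch below. It assumes that no element contains a comma or a backslash (awk processes escape sequences in -v values); the names arr, joined, list, items and third are illustrative.

```bash
#!/usr/bin/env bash
arr=(a b c d e f)

# Join the elements with commas; IFS is changed only inside the subshell.
joined=$(IFS=,; printf '%s' "${arr[*]}")

# split() fills the awk array "items" with indices starting at 1.
third=$(awk -v list="$joined" 'BEGIN {
    n = split(list, items, ",")
    print items[3]
}')

echo "$third"                        # c
```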
It looks like it is impossible by definition.
From man awk we have that:
-v var=val
--assign var=val
Assign the value val to the variable var, before execution of the
program begins. Such variable values are available to the BEGIN rule
of an AWK program.
Then we read in Using Variables in a Program that:
The name of a variable must be a sequence of letters, digits, or
underscores, and it may not begin with a digit.
Variables in awk can be assigned either numeric or string values.
So the way the -v implementation is defined makes it impossible to provide an array as a variable, since any kind of usage of the characters = or [ is not allowed as part of the -v variable passing. And both are required, since arrays in awk are only associative.
If you don't insist on using -v you could use gawk's -i (include) option instead to read an awk file that contains the variable settings.
Like this:
if F=$(mktemp inputXXXXXX); then
cat >$F << 'END'
BEGIN {
foo[0]=1
}
END
cat $F
awk -i $F 'BEGIN { print foo[0] }' </dev/null
rm $F
fi
Sample trace (using gawk-4.2.1):
bash -x /tmp/test.sh
++ mktemp inputXXXXXX
+ F=inputrpMsan
+ cat
+ cat inputrpMsan
BEGIN {
foo[0]=1
}
+ awk -i inputrpMsan 'BEGIN { print foo[0] }'
1
+ rm inputrpMsan
Unfortunately, this is not possible. However, you can convert a bash array to an awk array using a few clever methods.
I wanted to do this recently by passing a bash array to awk to use it for filtering, so here is what I did:
$ arr=( hello world this is bash array )
$ echo -e 'this\nmight\nnot\nshow\nup' | awk 'BEGIN {
for (i = 1; i < ARGC; i++) {
my_filter[ARGV[i]]=1
ARGV[i]="" # unset ARGV[i] otherwise awk might try to read it as a file
}
} !my_filter[$0]' "${arr[@]}"
Output:
might
not
show
up
For associative arrays, you could pass it as a string of key-value pairs, and then reformat it in the BEGIN section.
$ echo | awk -v m="a,b;c,d" '
BEGIN {
split(m,M,";")
for (i in M) {
split(M[i],MM,",")
MA[MM[1]]=MM[2]
}
}
{
for (a in MA) {
printf("MA[%s]=%s\n",a, MA[a])
}
}'
Output:
MA[a]=b
MA[c]=d
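Building that key-value string from a Bash associative array could look roughly like this. It is a sketch under the assumption that keys and values contain none of the characters comma, semicolon or backslash; colors, pairs and value are made-up names.

```bash
#!/usr/bin/env bash
declare -A colors=([apple]=red [banana]=yellow)

# Serialize as "key,value;key,value" (iteration order is unspecified).
pairs=""
for key in "${!colors[@]}"; do
    pairs+="${key},${colors[$key]};"
done
pairs=${pairs%;}                     # drop the trailing semicolon

# Rebuild the mapping as an awk array and look up one key.
value=$(awk -v m="$pairs" 'BEGIN {
    n = split(m, M, ";")
    for (i = 1; i <= n; i++) {
        split(M[i], kv, ",")
        MA[kv[1]] = kv[2]
    }
    print MA["banana"]
}')

echo "$value"                        # yellow
```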

BASH: Array element becoming the name of a new array

I've got an array bucket1=('10' '22' 'bucket1')
As you can see, one of the elements is the name of the array bucket1
Now I'm creating a new array by copying bucket1:
array1=("${bucket1[@]}")
Now I'm changing one of the elements in array1:
array1[1]='30'
echo ${array1[@]} gives 10 30 bucket1
Now I want to feed that change back to the array bucket1, but without knowing that array1 was created from bucket1. Instead I want to use the third element of array1, namely bucket1.
Something like:
declare -a ${array1[2]}=${array1[@]}
So that I end up with new bucket1 array, containing ('10' '30' 'bucket1')
In short:
I want to copy an array, alter the copied array, apply the changes from the copied array in the original array using one of the elements from the copied array as the name of the original array.
Is this possible?
bucket1=(10 20 bucket1)
tmp=("${bucket1[@]}")
tmp[1]=30
declare -a "${tmp[2]}"=("${tmp[@]}")
bash: syntax error near unexpected token `('
Hmm, that doesn't work. Try assigning the elements one by one:
for i in ${!tmp[@]}; do declare "${tmp[2]}[$i]"="${tmp[i]}"; done
echo ${bucket1[1]}
30
This is MUCH easier in ksh93
$ bucket1=(10 20 bucket1)
$ nameref tmp=bucket1
$ tmp[1]=30
$ echo ${bucket1[1]}
30
You can use read -ra instead of declare here:
$> bucket1=('10' '22' 'bucket1')
$> array1=("${bucket1[@]}")
$> array1[1]='30 50'
$> declare -p array1
declare -a array1='([0]="10" [1]="30 50" [2]="bucket1")'
$> IFS=$'^G' && read -ra "${array1[2]}" < <(printf "%s^G" "${array1[@]}")
$> declare -p "${array1[2]}"
declare -a bucket1='([0]="10" [1]="30 50" [2]="bucket1")'
$> declare -p bucket1
declare -a bucket1='([0]="10" [1]="30 50" [2]="bucket1")'
All these declare -p commands are only there to print the array contents and can be removed in a real script.
^G is typed by pressing Ctrl-V followed by Ctrl-G.
With a little work, you can get the value of the array in a form suitable for use in the argument to declare.
IFS="=" read _ value <<< "$(set | grep '^array1=')"
declare -a "${array1[2]}=$value"
The quotes around the command substitution are necessary to work around a bug that is fixed in bash 4.3. However, if you have that version of bash, you can use named references to simplify this:
declare -n tmp=${array1[2]}
tmp=("${array1[@]}")
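Put together with the arrays from the question, the named-reference variant might look like the following sketch (Bash 4.3 or later). The nameref is called target here purely for illustration, and the value '30 50' is used to show that elements containing spaces are preserved.

```bash
#!/usr/bin/env bash
bucket1=('10' '22' 'bucket1')
array1=("${bucket1[@]}")
array1[1]='30 50'

# target becomes an alias for the array named by the third element.
declare -n target=${array1[2]}
target=("${array1[@]}")              # assigns to bucket1 through the nameref
unset -n target                      # remove the alias, not bucket1 itself

echo "${bucket1[1]}"                 # 30 50
```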
Try this:
unset ${array1[2]}
declare -a ${array1[2]}="`echo ${array1[@]}`"
First we clear the array and then the output of echo will be stored in the new array name.

Resources