Bash: Split a string into an array - arrays

First of all, let me state that I am very new to Bash scripting. I have tried to look for solutions for my problem, but couldn't find any that worked for me.
Let's assume I want to use bash to parse a file that looks like the following:
variable1 = value1
variable2 = value2
I split the file line by line using the following code:
cat /path/to/my.file | while read line; do
echo $line
done
From the $line variable I want to create an array that I want to split using = as a delimiter, so that I will be able to get the variable names and values from the array like so:
$array[0] #variable1
$array[1] #value1
What would be the best way to do this?

Set IFS to '=' in order to split the string on the = sign in your lines, i.e.:
cat file | while IFS='=' read key value; do
${array[0]}="$key"
${array[1]}="$value"
done
You may also be able to use the -a argument to specify an array to write into, i.e.:
cat file | while IFS='=' read -a array; do
...
done
bash version depending.
Old completely wrong answer for posterity:
Add the argument -d = to your read statement. Then you can do:
cat file | while read -d = key value; do
$array[0]="$key"
$array[1]="$value"
done

while IFS='=' read -r k v; do
: # do something with $k and $v
done < file
IFS is the 'inner field separator', which tells bash to split the line on an '=' sign.

Related

Put lines of a text file in an array in bash

I'm taking over a bash script from a colleague that reads a file, process it and print another file based on the line in the while loop at the moment.
I now need to append some features to it. The one I'm having issues with right now is to read a file and put each line into an array, except the 2nd column of that line can be empty, e.g.:
For a text file with \t as separator:
A\tB\tC
A\t\tC
For a CSV file same but with , as separator:
A,B,C
A,,C
Which should then give
["A","B","C"] or ["A", "", "C"]
The code I took over is as follow:
while IFS=$'\t\r' read -r -a col; do
# Process the array, put that into a file
lp -d $printer $file_to_print
done < $input_file
Which works if B is filled, but B need to be empty now sometimes, so when the input files keeps it empty, the created array and thus the output file to print just skips this empty cell (array is then ["A","C"]).
I tried writing the whole bloc on awk but this brought it's own sets of problems, making it difficult to call the lp command to print.
So my question is, how can I preserve the empty cell from the line into my bash array, so that I can call on it later and use it?
Thank you very much. I know this might be quite confused so please ask and I'll specify.
Edit: After request, here's the awk code I've tried. The issue here is that it only prints the last print request, while I know it loops over the whole file, and the lp command is still in the loop.
awk 'BEGIN {
inputfile="'"${optfile}"'"
outputfile="'"${file_loc}"'"
printer="'"${printer}"'"
while (getline < inputfile){
print "'"${prefix}"'" > outputfile
split($0,ft,"'"${IFSseps}"'");
if (length(ft[2]) == 0){
print "CODEPAGE 1252\nTEXT 465,191,\"ROMAN.TTF\",180,7,7,\""ft[1]"\"" >> outputfile
size_changer = 0
} else {
print "CODEPAGE 1252\nTEXT 465,191,\"ROMAN.TTF\",180,7,7,\""ft[1]"_"ft[2]"\"" >> outputfile
size_changer = 1
}
if ( split($0,ft,"'"${IFSseps}"'") > 6)
maxcounter = 6;
else
maxcounter = split($0,ft,"'"${IFSseps}"'");
for (i = 3; i <= maxcounter; i++){
x=191-(i-2)*33
print "CODEPAGE 1252\nTEXT 465,"x",\"ROMAN.TTF\",180,7,7,\""ft[i]"\"" >> outputfile
}
print "PRINT ""'"${copies}"'"",1" >> outputfile
close(outputfile)
"'"`lp -d ${printer} ${file_loc}`"'"
}
close("'"${file_loc}"'");
}'
EDIT2: Continuing to try to find a solution to it, I tried following code without success. This is weird, as just doing printf without putting it in an array keeps the formatting intact.
$ cat testinput | tr '\t' '>'
A>B>C
A>>C
# Should normally be empty on the second ouput line
$ while read line; do IFS=$'\t' read -ra col < <(printf "$line"); echo ${col[1]}; done < testinput
B
C
For tab, it's complicated.
From 3.5.7 Word Splitting in the manual:
A sequence of IFS whitespace characters is also treated as a delimiter.
Since tab is an "IFS whitespace character", sequences of tabs are treated as a single delimiter
IFS=$'\t' read -ra ary <<<$'A\t\tC'
declare -p ary
declare -a ary=([0]="A" [1]="C")
What you can do is translate tabs to a non-whitespace character, assuming it does not clash with the actual data in the fields:
line=$'A\t\tC'
IFS=, read -ra ary <<<"${line//$'\t'/,}"
declare -p ary
declare -a ary=([0]="A" [1]="" [2]="C")
To avoid the risk of colliding with commas in the data, we can use an unusual ASCII character: FS, octal 034
line=$'A\t\tC'
printf -v FS '\034'
IFS="$FS" read -ra ary <<<"${line//$'\t'/"$FS"}"
# or, without the placeholder variable
IFS=$'\034' read -ra ary <<<"${line//$'\t'/$'\034'}"
declare -p ary
declare -a ary=([0]="A" [1]="" [2]="C")
One bash example using parameter expansion where we convert the delimiter into a \n and let mapfile read in each line as a new array entry ...
For tab-delimited data:
for line in $'A\tB\tC' $'A\t\tC'
do
mapfile -t array <<< "${line//$'\t'/$'\n'}"
echo "############# ${line}"
typeset -p array
done
############# A B C
declare -a array=([0]="A" [1]="B" [2]="C")
############# A C
declare -a array=([0]="A" [1]="" [2]="C")
NOTE: The $'...' construct insures the \t is treated as a single <tab> character as opposed to the two literal characters \ + t.
For comma-delimited data:
for line in 'A,B,C' 'A,,C'
do
mapfile -t array <<< "${line//,/$'\n'}"
echo "############# ${line}"
typeset -p array
done
############# A,B,C
declare -a array=([0]="A" [1]="B" [2]="C")
############# A,,C
declare -a array=([0]="A" [1]="" [2]="C")
NOTE: This obviously (?) assumes the desired data does not contain a comma (,).
It may just be your # Process the array, put that into a file part.
IFS=, read -ra ray <<< "A,,C"
for e in "${ray[#]}"; do o="$o\"$e\","; done
echo "[${o%,}]"
["A","","C"]
See #Glenn's excellent answer regarding tabs.
My simple data file:
$: cat x # tab delimited, empty field 2 of line 2
a b c
d f
My test:
while IFS=$'\001' read -r a b c; do
echo "a:[$a] b:[$b] c:[$c]"
done < <(tr "\t" "\001"<x)
a:[a] b:[b] c:[c]
a:[d] b:[] c:[f]
Note that I used ^A (a 001 byte) but you might be able to use something as simple as a comma or pipe (|) character. Choose based on your data.

Output results from cat into different files with names specified into an array

I would like to do cat on several files, which names are stored in an array:
cat $input | grep -v "#" | cut -f 1,2,3
Here the content of the array:
echo $input
1.blastp 2.blastp 3.blastp 4.blastp 5.blastp 6.blastp 7.blastp 8.blastp 9.blastp 10.blastp 11.blastp 12.blastp 13.blastp 14.blastp 15.blastp 16.blastp 17.blastp 18.blastp 19.blastp 20.blastp
This will work just nicely. Now, I am struggling in storing the results into proper output files. So I want to also store the output into files which names are stored into another array:
echo $out_in
1_pairs.tab 2_pairs.tab 3_pairs.tab 4_pairs.tab 5_pairs.tab 6_pairs.tab 7_pairs.tab 8_pairs.tab 9_pairs.tab 10_pairs.tab 11_pairs.tab 12_pairs.tab 13_pairs.tab 14_pairs.tab 15_pairs.tab 16_pairs.tab 17_pairs.tab 18_pairs.tab 19_pairs.tab 20_pairs.tab
cat $input | grep -v "#" | cut -f 1,2,3 > "$out_in"
My problem is:
When I don't use the "" I will get 'ambiguous redirect' error.
When I use them, a single file will be created that comes by the name:
1_pairs.tab?2_pairs.tab?3_pairs.tab?4_pairs.tab?5_pairs.tab?6_pairs.tab?7_pairs.tab?8_pairs.tab?9_pairs.tab?10_pairs.tab?11_pairs.tab?12_pairs.tab?13_pairs.tab?14_pairs.tab?15_pairs.tab?16_pairs.tab?17_pairs.tab?18_pairs.tab?19_pairs.tab?20_pairs.tab
I don't get why the input array is read with no problem but that's not the case for the output array...
any ideas?
Thanks a lot!
D.
You cannot redirect output that way, the output is a stream of characters and the redirection can not know when to switch to the next file. You need a loop over input files.
Assuming that the file names do not contain spaces:
for fn in $input; do
grep -v "$fn" | cut -f 1,2,3 >"${fn%%.*}_pairs.tab"
done

how to use awk command where the file is perl variable

Unable to run the awk command in perl script and my file is variable.
I have tried in different ways like use system(awk '/"">/{nr[NR]; nr[NR+6]}; NR in nr' $download_content) and store the output in array but no luck.
$filter = `awk '/"">/{nr[NR]; nr[NR+6]}; NR in nr' $download_content`;
here $download_content is a webpage i.e in html format and i need to extract the search pattern line and its next 6th line.
Here is an example using IPC::Run3:
use IPC::Run3;
# [...]
my #cmd = ('awk', '/"">/{nr[NR]; nr[NR+6]}; NR in nr');
my $in = $download_content;
my $out;
run3 \#cmd, \$in, \$out;
$filter = $out;
Alternatively, you can do it in perl (not calling awk):
my #lines = split /\n/, $download_content;
my %nr;
my $NR = 0;
my $filter = "";
for ( #lines ) {
if ( /"">/ ) {
$nr{$NR}++;
$nr{$NR + 6}++;
}
$filter .= "$_\n" if $nr{$NR};
$NR++;
}
When you write something like:
awk '/"">/{nr[NR]; nr[NR+6]}; NR in nr' XXXX
Then awk expects "XXXX" to be the name of a file that it should work on. But (as I understand it) that's not the situation that you have. It sounds to me as though "XXXX" is the actual data that you want to work on. In that case, you need to pipe the data into awk. The easiest option is to use echo:
echo XXXX | awk '/"">/{nr[NR]; nr[NR+6]}; NR in nr'
You should be able to do the same thing using Perl' system() function.
But it might be simpler to reimplement your awk code as Perl.

How do I convert CSV data into an associative array using Bash 4?

The file /tmp/file.csv contains the following:
name,age,gender
bob,21,m
jane,32,f
The CSV file will always have headers.. but might contain a different number of fields:
id,title,url,description
1,foo name,foo.io,a cool foo site
2,bar title,http://bar.io,a great bar site
3,baz heading,https://baz.io,some description
In either case, I want to convert my CSV data into an array of associative arrays..
What I need
So, I want a Bash 4.3 function that takes CSV as piped input and sends the array to stdout:
/tmp/file.csv:
name,age,gender
bob,21,m
jane,32,f
It needs to be used in my templating system, like this:
{{foo | csv_to_array | foo2}}
^ this is a fixed API, I must use that syntax .. foo2 must receive the array as standard input.
The csv_to_array func must do it's thing, so that afterwards I can do this:
$ declare -p row1; declare -p row2; declare -p new_array;
and it would give me this:
declare -A row1=([gender]="m" [name]="bob" [age]="21" )
declare -A row2=([gender]="f" [name]="jane" [age]="32" )
declare -a new_array=([0]="row1" [1]="row2")
..Once I have this array structure (an indexed array of associative array names), I have a shell-based templating system to access them, like so:
{{#new_array}}
Hi {{item.name}}, you are {{item.age}} years old.
{{/new_array}}
But I'm struggling to generate the arrays I need..
Things I tried:
I have already tried using this as a starting point to get the array structure I need:
while IFS=',' read -r -a my_array; do
echo ${my_array[0]} ${my_array[1]} ${my_array[2]}
done <<< $(cat /tmp/file.csv)
(from Shell: CSV to array)
..and also this:
cat /tmp/file.csv | while read line; do
line=( ${line//,/ } )
echo "0: ${line[0]}, 1: ${line[1]}, all: ${line[#]}"
done
(from https://www.reddit.com/r/commandline/comments/1kym4i/bash_create_array_from_one_line_in_csv/cbu9o2o/)
but I didn't really make any progress in getting what I want out the other end...
EDIT:
Accepted the 2nd answer, but I had to hack the library I am using to make either solution work..
I'll be happy to look at other answers, which do not export the declare commands as strings, to be run in the current env, but instead somehow hoist the resultant arrays of the declare commands to the current env (the current env is wherever the function is run from).
Example:
$ cat file.csv | csv_to_array
$ declare -p row2 # gives the data
So, to be clear, if the above ^ works in a terminal, it'll work in the library I'm using without the hacks I had to add (which involved grepping STDIN for ^declare -a and using source <(cat); eval $STDIN... in other functions)...
See my comments on the 2nd answer for more info.
The approach is straightforward:
Read the column headers into an array
Read the file line by line, in each line …
Create a new associative array and register its name in the array of array names
Read the fields and assign them according to the column headers
In the last step we cannot use read -a, mapfile, or things like these since they only create regular arrays with numbers as indices, but we want an associative array instead, so we have to create the array manually.
However, the implementation is a bit convoluted because of bash's quirks.
The following function parses stdin and creates arrays accordingly.
I took the liberty to rename your array new_array to rowNames.
#! /bin/bash
csvToArrays() {
IFS=, read -ra header
rowIndex=0
while IFS= read -r line; do
((rowIndex++))
rowName="row$rowIndex"
declare -Ag "$rowName"
IFS=, read -ra fields <<< "$line"
fieldIndex=0
for field in "${fields[#]}"; do
printf -v quotedFieldHeader %q "${header[fieldIndex++]}"
printf -v "$rowName[$quotedFieldHeader]" %s "$field"
done
rowNames+=("$rowName")
done
declare -p "${rowNames[#]}" rowNames
}
Calling the function in a pipe has no effect. Bash executes the commands in a pipe in a subshell, therefore you won't have access to the arrays created by someCommand | csvToArrays. Instead, call the function as either one of the following
csvToArrays < <(someCommand) # when input comes from a command, except "cat file"
csvToArrays < someFile # when input comes from a file
Bash scripts like these tend to be very slow. That's the reason why I didn't bother to extract printf -v quotedFieldHeader … from the inner loop even though it will do the same work over and over again.
I think the whole templating thing and everything related would be way easier to program and faster to execute in languages like python, perl, or something like that.
The following script:
csv_to_array() {
local -a values
local -a headers
local counter
IFS=, read -r -a headers
declare -a new_array=()
counter=1
while IFS=, read -r -a values; do
new_array+=( row$counter )
declare -A "row$counter=($(
paste -d '' <(
printf "[%s]=\n" "${headers[#]}"
) <(
printf "%q\n" "${values[#]}"
)
))"
(( counter++ ))
done
declare -p new_array ${!row*}
}
foo2() {
source <(cat)
declare -p new_array ${!row*} |
sed 's/^/foo2: /'
}
echo "==> TEST 1 <=="
cat <<EOF |
id,title,url,description
1,foo name,foo.io,a cool foo site
2,bar title,http://bar.io,a great bar site
3,baz heading,https://baz.io,some description
EOF
csv_to_array |
foo2
echo "==> TEST 2 <=="
cat <<EOF |
name,age,gender
bob,21,m
jane,32,f
EOF
csv_to_array |
foo2
will output:
==> TEST 1 <==
foo2: declare -a new_array=([0]="row1" [1]="row2" [2]="row3")
foo2: declare -A row1=([url]="foo.io" [description]="a cool foo site" [id]="1" [title]="foo name" )
foo2: declare -A row2=([url]="http://bar.io" [description]="a great bar site" [id]="2" [title]="bar title" )
foo2: declare -A row3=([url]="https://baz.io" [description]="some description" [id]="3" [title]="baz heading" )
==> TEST 2 <==
foo2: declare -a new_array=([0]="row1" [1]="row2")
foo2: declare -A row1=([gender]="m" [name]="bob" [age]="21" )
foo2: declare -A row2=([gender]="f" [name]="jane" [age]="32" )
The output comes from foo2 function.
The csv_to_array function first reads the headaers. Then for each read line it adds new element into new_array array and also creates a new associative array with the name row$index with elements created from joining the headers names with values read from the line. On the end the output from declare -p is outputted from the function.
The foo2 function sources the standard input, so the arrays come into scope for it. It outputs then those values again, prepending each line with foo2:.

Can I pass an array to awk using -v?

I would like to be able to pass an array variable to awk. I don't mean a shell array but a native awk one. I know I can pass scalar variables like this:
awk -vfoo="1" 'NR==foo' file
Can I use the same mechanism to define an awk array? Something like:
$ awk -v"foo[0]=1" 'NR==foo' file
awk: fatal: `foo[0]' is not a legal variable name
I've tried a few variations of the above but none of them work on GNU awk 4.1.1 on my Debian. So, is there any version of awk (gawk,mawk or anything else) that can accept an array from the -v switch?
I know I can work around this and can easily think of ways to do so, I am just wondering if any awk implementation supports this kind of functionality natively.
You can use the split() function inside mawk or gawk to split the input of the "-v" value (here is the gawk man page):
split(s, a [, r [, seps] ])
Split the string s into the array a and the separators array seps on
the regular expression r, and return the number of fields.*
An example here in which i pass the value "ARRAYVAR", a comma separated list of values which is my array, with "-v" to the awk program, then split it into the internal variable array "arrayval" using the split() function and then print the 3rd value of the array:
echo 0 | gawk -v ARRAYVAR="a,b,c,d,e,f" '{ split(ARRAYVAR,arrayval,","); print(arrayval[3]) }'
c
Seems to work :)
It looks like it is impossible by definition.
From man awk we have that:
-v var=val
--assign var=val
Assign the value val to the variable var, before execution of the
program begins. Such variable values are available to the BEGIN rule
of an AWK program.
Then we read in Using Variables in a Program that:
The name of a variable must be a sequence of letters, digits, or
underscores, and it may not begin with a digit.
Variables in awk can be assigned either numeric or string values.
So the way the -v implementation is defined makes it impossible to provide an array as a variable, since any kind of usage of the characters = or [ is not allowed as part of the -v variable passing. And both are required, since arrays in awk are only associative.
If you don't insist on using -v you could use -i (include) instead to read an awk file that contains the variable settings.
Like this:
if F=$(mktemp inputXXXXXX); then
cat >$F << 'END'
BEGIN {
foo[0]=1
}
END
cat $F
awk -i $F 'BEGIN { print foo[0] }' </dev/null
rm $F
fi
Sample trace (using gawk-4.2.1):
bash -x /tmp/test.sh
++ mktemp inputXXXXXX
+ F=inputrpMsan
+ cat
+ cat inputrpMsan
BEGIN {
foo[0]=1
}
+ awk -i inputrpMsan 'BEGIN { print foo[0] }'
1
+ rm inputrpMsan
Unfortunately, this is not possible. However, you can convert a bash array to an awk array using a few clever methods.
I wanted to do this recently by passing a bash array to awk to use it for filtering, so here is what I did:
$ arr=( hello world this is bash array )
$ echo -e 'this\nmight\nnot\nshow\nup' | awk 'BEGIN {
for (i = 1; i < ARGC; i++) {
my_filter[ARGV[i]]=1
ARGV[i]="" # unset ARGV[i] otherwise awk might try to read it as a file
}
} !my_filter[$0]' "${arr[#]}"
Output:
might
not
show
up
For associative arrays, you could pass it as a string of key-value pairs, and then reformat it in the BEGIN section.
$ echo | awk -v m="a,b;c,d" '
BEGIN {
split(m,M,";")
for (i in M) {
split(M[i],MM,",")
MA[MM[1]]=MM[2]
}
}
{
for (a in MA) {
printf("MA[%s]=%s\n",a, MA[a])
}
}'
Output:
MA[a]=b
MA[c]=d

Resources