Put lines of a text file in an array in bash - arrays

I'm taking over a bash script from a colleague that reads a file, processes it, and prints another file for each line inside a while loop.
I now need to add some features to it. The one I'm having issues with right now is reading a file and putting each line into an array, where the 2nd column of a line can be empty, e.g.:
For a text file with \t as separator:
A\tB\tC
A\t\tC
For a CSV file same but with , as separator:
A,B,C
A,,C
These lines should then give
["A","B","C"] or ["A", "", "C"]
The code I took over is as follows:
while IFS=$'\t\r' read -r -a col; do
# Process the array, put that into a file
lp -d "$printer" "$file_to_print"
done < "$input_file"
This works if B is filled, but B now sometimes needs to be empty. When the input file leaves it empty, the resulting array (and thus the output file to print) simply skips the empty cell, so the array becomes ["A","C"].
I tried writing the whole block in awk, but that brought its own set of problems, making it difficult to call the lp command to print.
So my question is, how can I preserve the empty cell from the line in my bash array, so that I can use it later?
Thank you very much. I know this might be quite confusing, so please ask and I'll clarify.
Edit: As requested, here's the awk code I've tried. The issue is that it only prints the last print request, even though I know it loops over the whole file and the lp command is inside the loop.
awk 'BEGIN {
inputfile="'"${optfile}"'"
outputfile="'"${file_loc}"'"
printer="'"${printer}"'"
while (getline < inputfile){
print "'"${prefix}"'" > outputfile
split($0,ft,"'"${IFSseps}"'");
if (length(ft[2]) == 0){
print "CODEPAGE 1252\nTEXT 465,191,\"ROMAN.TTF\",180,7,7,\""ft[1]"\"" >> outputfile
size_changer = 0
} else {
print "CODEPAGE 1252\nTEXT 465,191,\"ROMAN.TTF\",180,7,7,\""ft[1]"_"ft[2]"\"" >> outputfile
size_changer = 1
}
if ( split($0,ft,"'"${IFSseps}"'") > 6)
maxcounter = 6;
else
maxcounter = split($0,ft,"'"${IFSseps}"'");
for (i = 3; i <= maxcounter; i++){
x=191-(i-2)*33
print "CODEPAGE 1252\nTEXT 465,"x",\"ROMAN.TTF\",180,7,7,\""ft[i]"\"" >> outputfile
}
print "PRINT ""'"${copies}"'"",1" >> outputfile
close(outputfile)
"'"`lp -d ${printer} ${file_loc}`"'"
}
close("'"${file_loc}"'");
}'
EDIT2: Still trying to find a solution, I tried the following code without success. This is weird, as just doing printf without putting the result in an array keeps the formatting intact.
$ cat testinput | tr '\t' '>'
A>B>C
A>>C
# Should normally be empty on the second output line
$ while read line; do IFS=$'\t' read -ra col < <(printf "$line"); echo ${col[1]}; done < testinput
B
C
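On the awk attempt from the first Edit: a likely explanation (my reading of the quoting, not something confirmed in the thread) is that the backtick command substitution sits in a double-quoted part of the shell word, so the shell runs lp once while it builds the awk program, before awk even starts. awk then only sees lp's output as an inert string. To run a command once per record, awk has to start it itself with system(). The sketch below shows that pattern and passes shell values with -v instead of splicing them into the program text. Assumptions: the label format is heavily simplified, the temporary files come from mktemp, and lp is replaced by a harmless stand-in (appending each label to a log file) so it runs without a printer.

```bash
#!/usr/bin/env bash
# Sketch: let awk run the print command itself, once per input line.
# Assumptions: simplified label format; lp is replaced by a stand-in that
# appends each label to a log file, so this runs without a printer.
input_file=$(mktemp)
label_file=$(mktemp)
log_file=$(mktemp)
printf 'A\tB\tC\nA\t\tC\n' > "$input_file"

awk -F'\t' -v out="$label_file" -v logf="$log_file" '
{
  label = ($2 == "") ? $1 : ($1 "_" $2)  # same idea as the ft[1]/ft[2] branch
  print "TEXT " label > out              # reopening after close() truncates again
  close(out)                             # flush the file before another process reads it
  # Real script (assumed, with -v printer="$printer"):
  #   system("lp -d \"" printer "\" \"" out "\"")
  system("cat \"" out "\" >> \"" logf "\"")
}' "$input_file"

cat "$log_file"
```

Note that awk -F'\t' does not merge consecutive tabs the way bash's IFS does, so $2 is simply empty on the second line.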

For tab, it's complicated.
From 3.5.7 Word Splitting in the manual:
A sequence of IFS whitespace characters is also treated as a delimiter.
Since tab is an "IFS whitespace character", a sequence of tabs is treated as a single delimiter:
IFS=$'\t' read -ra ary <<<$'A\t\tC'
declare -p ary
declare -a ary=([0]="A" [1]="C")
What you can do is translate tabs to a non-whitespace character, assuming it does not clash with the actual data in the fields:
line=$'A\t\tC'
IFS=, read -ra ary <<<"${line//$'\t'/,}"
declare -p ary
declare -a ary=([0]="A" [1]="" [2]="C")
To avoid the risk of colliding with commas in the data, we can use an unusual ASCII character, FS (octal 034):
line=$'A\t\tC'
printf -v FS '\034'
IFS="$FS" read -ra ary <<<"${line//$'\t'/"$FS"}"
# or, without the placeholder variable
IFS=$'\034' read -ra ary <<<"${line//$'\t'/$'\034'}"
declare -p ary
declare -a ary=([0]="A" [1]="" [2]="C")
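Applied to the question's loop, this approach could look like the sketch below. It assumes fields never contain the FS byte (octal 034) and that lines may end in CRLF, which is what the original IFS=$'\t\r' appears to handle. The temporary input file and the seen array are illustrative only.

```bash
#!/usr/bin/env bash
# Sketch of the question's loop with empty-field-safe splitting.
# Assumptions: fields never contain byte \034, and lines may end in CRLF.
input_file=$(mktemp)                     # stand-in for the real $input_file
printf 'A\tB\tC\r\nA\t\tC\r\n' > "$input_file"

tab=$'\t'
fs=$'\034'                               # ASCII "file separator": not IFS whitespace
seen=()                                  # illustrative: what each line produced
while IFS= read -r line; do
  line=${line%$'\r'}                     # drop a trailing carriage return, if any
  IFS=$fs read -r -a col <<< "${line//$tab/$fs}"
  # ... build the label file from "${col[@]}" and call lp here, as before ...
  seen+=("${#col[@]}:${col[1]}")         # field count and 2nd column
done < "$input_file"

printf '%s\n' "${seen[@]}"
```

The outer read keeps each whole line (IFS= and -r), and only the inner read splits it, after the tabs have been swapped for a non-whitespace separator.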

One bash approach uses parameter expansion to convert the delimiter into a newline (\n) and lets mapfile read each line as a new array entry:
For tab-delimited data:
for line in $'A\tB\tC' $'A\t\tC'
do
mapfile -t array <<< "${line//$'\t'/$'\n'}"
echo "############# ${line}"
typeset -p array
done
############# A B C
declare -a array=([0]="A" [1]="B" [2]="C")
############# A C
declare -a array=([0]="A" [1]="" [2]="C")
NOTE: The $'...' construct ensures the \t is treated as a single <tab> character, as opposed to the two literal characters \ + t.
For comma-delimited data:
for line in 'A,B,C' 'A,,C'
do
mapfile -t array <<< "${line//,/$'\n'}"
echo "############# ${line}"
typeset -p array
done
############# A,B,C
declare -a array=([0]="A" [1]="B" [2]="C")
############# A,,C
declare -a array=([0]="A" [1]="" [2]="C")
NOTE: This assumes the desired data does not contain a comma (,).
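Assuming bash 4.4 or later, mapfile (readarray) can also split on the delimiter directly with -d, which skips the delimiter-to-newline conversion. A sketch for the comma case:

```bash
#!/usr/bin/env bash
# Sketch, assuming bash 4.4+ (when mapfile/readarray gained the -d option).
for line in 'A,B,C' 'A,,C'; do
  # -d , ends each element at a comma; -t strips that comma from the element.
  # printf '%s' avoids the trailing newline a <<< here-string would add,
  # which would otherwise stay attached to the last element.
  mapfile -t -d , array < <(printf '%s' "$line")
  typeset -p array
done
```

For tab-separated lines, -d $'\t' should behave the same way, since mapfile ends a record at every delimiter and does not merge repeated ones. The data still must not contain the delimiter itself.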

It may just be your "# Process the array, put that into a file" part.
IFS=, read -ra ray <<< "A,,C"
for e in "${ray[@]}"; do o="$o\"$e\","; done
echo "[${o%,}]"
["A","","C"]
See @Glenn's excellent answer regarding tabs.
My simple data file:
$: cat x # tab delimited, empty field 2 of line 2
a b c
d f
My test:
while IFS=$'\001' read -r a b c; do
echo "a:[$a] b:[$b] c:[$c]"
done < <(tr "\t" "\001"<x)
a:[a] b:[b] c:[c]
a:[d] b:[] c:[f]
Note that I used ^A (a 001 byte) but you might be able to use something as simple as a comma or pipe (|) character. Choose based on your data.

Related

Bash, while read line by line, split strings on line divided by ",", store to array

I need to read a file line by line, split every line on ",", and store the fields in an array.
File source_file:
usl-coop,/root
usl-dev,/bin
Script.
i=1
while read -r line; do
IFS="," read -ra para_$i <<< $line
echo ${para_$i[@]}
((i++))
done < source_file
Expected output.
para_1[0]=usl-coop
para_1[1]=/root
para_2[0]=usl-dev
para_2[1]=/bin
The script outputs an error about echo:
./sofimon.sh: line 21: ${para_$i[@]}: bad substitution
When I echo the array one field at a time, for example
echo "${para_1[0]}"
it shows that the variables are stored.
But I need to access them with a variable inside the name, something like this:
${para_$i[1]}
Is it possible to do this?
Thanks.
S.
There is a trick to simulate 2D arrays using associative arrays. It works nicely, and I think it is the most flexible and extensible:
declare -A para
i=1
while IFS=, read -r -a line; do
for j in "${!line[@]}"; do
para[$i,$j]="${line[$j]}"
done
((i++)) ||:
done < source_file
declare -p para
will output:
declare -A para=([1,0]="usl-coop" [1,1]="/root" [2,1]="/bin" [2,0]="usl-dev" )
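Associative-array keys come back in no particular order (note [2,1] before [2,0] in the output above). A sketch that also records each row's field count in a hypothetical helper array cols, so the data can be printed back in row and column order, reproducing the question's expected output. The mktemp file stands in for source_file.

```bash
#!/usr/bin/env bash
# Sketch: para[row,col] plus a per-row field count (cols is hypothetical).
source_file=$(mktemp)
printf 'usl-coop,/root\nusl-dev,/bin\n' > "$source_file"

declare -A para
cols=()
rows=0
while IFS=, read -r -a line; do
  ((rows++)) || :                      # || : because ((0++)) returns status 1
  for j in "${!line[@]}"; do
    para[$rows,$j]=${line[$j]}
  done
  cols[rows]=${#line[@]}               # number of fields on this row
done < "$source_file"

# Walk rows and columns in order instead of relying on key order.
for ((r = 1; r <= rows; r++)); do
  for ((c = 0; c < cols[r]; c++)); do
    printf 'para_%d[%d]=%s\n' "$r" "$c" "${para[$r,$c]}"
  done
done
```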
Without modifying your script that much you could use indirect variable expansion. It's sometimes used in simpler scripts:
i=1
while IFS="," read -r -a para_$i; do
n="para_$i[@]"
echo "${!n}"
((i++)) ||:
done < source_file
declare -p ${!para_*}
or basically the same with a nameref, a named reference to another variable (side note: see how [@] needs to be part of the variable in indirect expansion, but not in a named reference):
i=1
while IFS="," read -r -a para_$i; do
declare -n n="para_$i" # (re)point the nameref at this line's array
echo "${n[@]}"
((i++)) ||:
done < source_file
declare -p ${!para_*}
both scripts above will output the same:
usl-coop /root
usl-dev /bin
declare -a para_1=([0]="usl-coop" [1]="/root")
declare -a para_2=([0]="usl-dev" [1]="/bin")
That said, I think you shouldn't read your file into memory at all; it's just a bad design. Shell and bash are built around passing files through pipes, streams, FIFOs, redirections, process substitutions, etc., without ever saving, copying, or storing them. If you have a file to parse, stream it to another process, parse it, and save the result, without ever holding the whole input in memory. If you want to find some data inside a file, use grep or awk.
Here is a short awk script that does the task.
awk 'BEGIN{FS=",";of="para_%d[%d]=%s\n"}{printf(of, NR, 0, $1);printf(of, NR, 1, $2)}' input.txt
It produces the desired output.
Explanation:
BEGIN{
FS=","; # set field separator to `,`
of="para_%d[%d]=%s\n" # define common printf output format
}
{ # for each input line
printf(of, NR, 0, $1); # output for current line, [0], left field
printf(of, NR, 1, $2) # output for current line, [1], right field
}

Return awk split array to a bash variable

I have a requirement to split a string on a multi-character delimiter and return the values into an array in Bash for further processing
IFS only accepts single-character delimiters.
a="2;AAAAA;BBBBB;1111_MultiCharDel_2;CCCC;DDDDDD;22222_MultiCharDel_2;EEEE;FFFFFFF;22222"
awk '{split($0,ArrayDeltaMulDep,"_MultiCharDel_")}' <<< "$a"
The input string can have several substrings separated by the MultiCharDel delimiter.
How can I access this array ArrayDeltaMulDep for further processing in Bash?
Your example string, a, does not contain newlines. If that is true in general, then:
a="2;AAAAA;BBBBB;1111_MultiCharDel_2;CCCC;DDDDDD;22222"
readarray -t b <<< "${a//MultiCharDel/$'\n'}"
We can verify that this split the string properly using declare -p to show the value of b:
$ declare -p b
declare -a b=([0]="2;AAAAA;BBBBB;1111_" [1]="_2;CCCC;DDDDDD;22222")
How it works:
readarray -t b
This reads lines from stdin and puts them in a bash array b.
<<< "${a//MultiCharDel/$'\n'}"
${a//MultiCharDel/$'\n'} uses pattern substitution to replace MultiCharDel with a newline character. <<< provides the result as stdin to the command readarray.
Hat tip: Chepner
More general solution
A bash string will never contain a null character (hex 00). Using GNU sed:
b=()
while IFS= read -r -d '' line
do
b+=("$line")
done < <(sed 's/MultiCharDel/\x00/g; s/$/\x00/' <<<"$a")
This again creates an array with the desired splitting:
$ declare -p b
declare -a b=([0]="2;AAAAA;BBBBB;1111_" [1]="_2;CCCC;DDDDDD;22222")
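An awk array only exists inside the awk process, so it cannot be handed to bash directly. One way around that (a sketch, assuming no element contains a newline) is to let awk split on the full _MultiCharDel_ delimiter, print one element per line, and let readarray -t collect the lines. The name parts is just illustrative.

```bash
#!/usr/bin/env bash
# Sketch: awk splits and prints one element per line; readarray -t collects them.
# Assumption: no element contains a newline.
a="2;AAAAA;BBBBB;1111_MultiCharDel_2;CCCC;DDDDDD;22222_MultiCharDel_2;EEEE;FFFFFFF;22222"

readarray -t parts < <(
  awk '{
    n = split($0, arr, "_MultiCharDel_")   # a multi-character separator is used as a regex
    for (i = 1; i <= n; i++) print arr[i]
  }' <<< "$a"
)
declare -p parts
```

With the question's full string this yields three elements, and because the whole _MultiCharDel_ string is the separator, no stray underscores are left on the pieces.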

Bash, split words into letters and save to array

I'm struggling with a project. I am supposed to write a bash script that works like the tr command. At the beginning I would like to save all command arguments into separate arrays, and if an argument is a word, I would like to have each character in a separate array field, e.g.:
tr_mine AB DC
I would like to have two arrays: a[0] = A, a[1] = B and b[0] = D, b[1] = C.
I found a way, but it's not working:
IFS="" read -r -a array <<< "$a"
No sed, no awk, all bash internals.
Assuming that words are always separated with blanks (space and/or tabs),
also assuming that words are given as arguments, and writing for bash only:
#!/bin/bash
blank=$'[ \t]'
varname='A'
n=1
while IFS='' read -r -d '' -N 1 c ; do
if [[ $c =~ $blank ]]; then n=$((n+1)); continue; fi
eval ${varname}${n}'+=("'"$c"'")'
done <<<"$@"
last=$(eval echo \${#${varname}${n}[@]}) ### Find last character index.
unset "${varname}${n}[$last-1]" ### Remove last (trailing) newline.
for ((j=1;j<=$n;j++)); do
k="A$j[@]"
printf '<%s> ' "${!k}"; echo
done
That will set each array A1, A2, A3, etc. ... to the letters of each word.
The value at the end of the first loop of $n is the count of words processed.
Printing may be a little tricky, that is why the code to access each letter is given above.
Applied to your sample text:
$ script.sh AB DC
<A> <B>
<D> <C>
The script is setting two (array) vars A1 and A2.
And each letter is one array element: A1[0] = A, A1[1] = B and A2[0] = D, A2[1] = C.
You need to set a variable ($k) to the array element to access.
For example, to echo the fourth letter (0-based) of the second word (1-based), you need to do (this may be changed if needed):
k="A2[3]"; echo "${!k}" ### Indirect addressing.
The script will work as this:
$ script.sh ABCD efghi
<A> <B> <C> <D>
<e> <f> <g> <h> <i>
Caveat: Characters will be split even if quoted. However, quoting the arguments is the correct way to use this script, to avoid the effect of shell metacharacters ( |,&,;,(,),<,>,space,tab ). Of course, spaces (even if repeated) will split words as defined by the variable $blank:
$ script.sh $'qwer;rttt fgf\ngfg'
<q> <w> <e> <r> <;> <r> <t> <t> <t>
<>
<>
<>
<f> <g> <f> <
> <g> <f> <g>
As the script accepts and correctly processes embedded newlines, we need unset "${varname}${n}[$last-1]" to remove the last (trailing) newline. If that is not desired, quote the line.
Security Note: The eval is not much of a problem here as it is only processing one character at a time. It would be difficult to create an attack based on just one character. Anyway, the usual warning is valid: Always sanitize your input before using this script. Also, most (not quoted) metacharacters of bash will break this script.
$ script.sh qwer(rttt fgfgfg
bash: syntax error near unexpected token `('
I would strongly suggest to do this in another language if possible, it will be a lot easier.
The closest I came up with is:
#!/bin/bash
sentence="AC DC"
words=`echo "$sentence" | tr " " "\n"`
# final array
declare -A result
# word count
wc=0
for i in $words; do
# letter count in the word
lc=0
for l in `echo "$i" | grep -o .`; do
result["w$wc-l$lc"]=$l
lc=$(($lc+1))
done
wc=$(($wc+1))
done
rLen=${#result[@]}
echo "Result Length $rLen"
for i in "${!result[@]}"
do
echo "$i => ${result[$i]}"
done
The above prints:
Result Length 4
w1-l1 => C
w1-l0 => D
w0-l0 => A
w0-l1 => C
Explanation:
Creating variables whose names come from other variables (dynamic variable names) is awkward in bash, so I am using an associative array (result) instead
Arrays in bash are one-dimensional. To fake a 2D array I build the keys from w (word index) and l (letter index). This will make further processing a pain...
Associative arrays are unordered, so the results appear in an arbitrary order when printed
${!result[@]} is used instead of ${result[@]}: the first iterates over the keys, the second over the values
I know this is not exactly what you asked for, but I hope it points you in the right direction
Try this:
sentence="$*"
read -r -a words <<< "$sentence"
for word in "${words[@]}"; do
inc=$(( i++ ))
read -r -a l${inc} <<< $(sed 's/./& /g' <<< $word)
done
echo "${words[1]}" # prints "DC" for: tr_mine AB DC
echo "${l1[1]}" # prints "C"
The first read reads all words, the internal one is for letters.
The sed command adds a space after each letter to make the string splittable by read -a. You can also use this sed command to remove unwanted characters from words (e.g. commas) before splitting.
If special characters are allowed in words, you can use a simple grep instead of the sed command (as suggested in http://www.unixcl.com/2009/07/split-string-to-characters-in-bash.html) :
read -r -a l${inc} <<< $(grep -o . <<< $word)
The word array is words.
The letter arrays are named l0, l1, and so on, with the number incremented for each word read.
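For the original tr_mine question, bash's substring expansion ${word:i:1} can split a word into characters without sed, grep, or eval. A minimal sketch, assuming bash 4.3+ for the nameref (local -n); split_chars is a hypothetical helper name, and the caller's array must not itself be named _out.

```bash
#!/usr/bin/env bash
# Sketch, assuming bash 4.3+ for namerefs. split_chars is a hypothetical helper.
split_chars() {
  local -n _out=$1          # nameref to the caller's array (must not be named _out)
  local word=$2 i
  _out=()
  for ((i = 0; i < ${#word}; i++)); do
    _out+=("${word:i:1}")   # substring of length 1 starting at offset i
  done
}

declare -a a b
split_chars a "AB"          # in tr_mine this would be: split_chars a "$1"
split_chars b "DC"          # ... and: split_chars b "$2"
declare -p a b
```

Because ${#word} and ${word:i:1} count characters rather than bytes in a UTF-8 locale, multibyte letters should stay intact as single elements.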

How can I handle an array where elements contain spaces in Bash?

Let's say I have a file named tmp.out that contains the following:
c:\My files\testing\more files\stuff\test.exe
c:\testing\files here\less files\less stuff\mytest.exe
I want to put the contents of that file into an array and I do it like so:
ARRAY=( `cat tmp.out` )
I then run this through a for loop like so
for i in ${ARRAY[@]};do echo ${i}; done
But the output ends up like this:
c:\My
files\testing\more
files\stuff\test.exe
c:\testing\files
here\less
files\less
stuff\mytest.exe
and I want the output to be:
c:\My files\testing\more files\stuff\test.exe
c:\testing\files here\less files\less stuff\mytest.exe
How can I resolve this?
In order to iterate over the values in an array, you need to quote the array expansion to avoid word splitting:
for i in "${values[@]}"; do
Of course, you should also quote the use of the value:
echo "${i}"
done
That doesn't answer the question of how to get the lines of a file into an array in the first place. If you have bash 4.0, you can use the mapfile builtin:
mapfile -t values < tmp.out
Otherwise, you'd need to temporarily change the value of IFS to a single newline, or use a loop over the read builtin.
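Putting the two pieces of this answer together, a self-contained sketch (assuming bash 4.0+ for mapfile; a mktemp file stands in for tmp.out):

```bash
#!/usr/bin/env bash
# Sketch, assuming bash 4.0+. A temporary file stands in for tmp.out.
tmp_out=$(mktemp)
printf '%s\n' 'c:\My files\testing\more files\stuff\test.exe' \
              'c:\testing\files here\less files\less stuff\mytest.exe' > "$tmp_out"

mapfile -t values < "$tmp_out"    # one element per line, newline stripped (-t)
for i in "${values[@]}"; do       # quoted: each element stays a single word
  printf '%s\n' "$i"              # %s never interprets the backslashes
done
```

printf '%s\n' is used instead of echo because some echo implementations (dash's echo, or bash with shopt -s xpg_echo) interpret backslash escapes, which would turn the \t in c:\testing into a tab.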
You can use the IFS variable, the Internal Field Separator. Setting it to the empty string for read keeps each line intact (no leading or trailing whitespace is trimmed), while read itself stops at each newline:
while IFS= read -r line ; do
ARRAY+=("$line")
done < tmp.out
-r is needed to keep the literal backslashes.
Another simple way to control word-splitting is by controlling the Internal Field Separator (IFS):
#!/bin/bash
oifs="$IFS" ## save original IFS
IFS=$'\n' ## set IFS to break on newline
array=( $( <dat/2lines.txt ) ) ## read lines into array
IFS="$oifs" ## restore original IFS
for ((i = 0; i < ${#array[@]}; i++)) do
printf "array[$i] : '%s'\n" "${array[i]}"
done
Input
$ cat dat/2lines.txt
c:\My files\testing\more files\stuff\test.exe
c:\testing\files here\less files\less stuff\mytest.exe
Output
$ bash arrayss.sh
array[0] : 'c:\My files\testing\more files\stuff\test.exe'
array[1] : 'c:\testing\files here\less files\less stuff\mytest.exe'

Why can't I append to array?

I'm not sure what's going on here:
#!/bin/bash
STRING_PREFIX="foo"
STRING_IDX="1,2,3,4,5"
declare -a STRING_ARRAY
main() {
assemble_strings
for i in "${STRING_ARRAY[@]}"; do
echo "TEST: $i"
done
}
assemble_strings() {
IFS=,
while IFS= read idx; do
STRING_ARRAY+=("${STRING_PREFIX}${idx}")
done < <(echo $STRING_IDX)
}
main
I expect an array of 5 strings, each prefixed with 'foo'. Instead I get an array of 1 string:
TEST: foo1 2 3 4 5
For bonus points, how can I avoid the loop entirely? I can't figure out how to create an array from an expression in bash.
First: Because you put IFS= at the front of your read, the prior IFS=, does nothing (insofar as that read is concerned).
Second: Because you aren't setting -d , in your read, it's using the default -- newline -- value as record terminator. (IFS determines the field separator, not the record terminator; with an empty IFS value, your records have only one field in them anyhow). Thus, when you call read, it reads the whole record -- up to the newline -- so your loop only runs once.
One approach, using read -a to read directly to an array (in this case, treating the entire input stream as a single record, with fields separated by commas):
string_idx=1,2,3,4,5
string_prefix=foo
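# use read to directly populate the array; printf (unlike <<<) adds no trailing
# newline, which would otherwise end up in the last field ("5\n")
IFS=, read -r -d '' -a string_array < <(printf '%s' "$string_idx")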
# go back through and tack on prefixes
for idx in "${!string_array[@]}"; do
string_array[$idx]="${string_prefix}${string_array[$idx]}"
done
# print values
printf ' entry: %s\n' "${string_array[@]}"
Another, making the smallest change to your existing code -- treating the input stream as a series of single-field comma-separated records:
string_idx=1,2,3,4,5
string_prefix=foo
string_array=( )
while IFS= read -r -d , idx; do
string_array+=( "${string_prefix}${idx}" )
done <<<"$string_idx,"
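For the bonus question, a sketch that avoids an explicit loop: read -a does the splitting, and the ${parts[@]/#/$string_prefix} expansion prepends the prefix to every element (the /# form anchors the empty pattern at the start of each element). The name parts is illustrative, and the sketch assumes the prefix contains no & character, since bash 5.2's patsub_replacement option gives & a special meaning in the replacement.

```bash
#!/usr/bin/env bash
# Sketch: split with read -a, then prefix every element in one expansion.
string_idx=1,2,3,4,5
string_prefix=foo

IFS=, read -r -a parts <<< "$string_idx"        # parts=(1 2 3 4 5)
string_array=("${parts[@]/#/$string_prefix}")   # foo1 foo2 ... foo5
printf ' entry: %s\n' "${string_array[@]}"
```

Here read stops at the newline that <<< appends, so unlike the -d '' variant no newline ends up in the last field.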

Resources