Error in reading a multi-line string into an array? - arrays

I am using the following bash code to read a multi-line string into an array. I want each array element to correspond to one line of the string.
mytext="line one
line two
line three"
IFS=$'\n' read -a lines <<<"${mytext}"
echo "len=${#lines[@]}"
for line in "${lines[@]}"
do
echo "[$line]"
done
I expected "len" to be 3 and the "lines" array to be properly initialized. However, I got the following result:
len=1
[line one]
Have I used the wrong "IFS"? What are the mistakes in the bash code?
Thanks in advance.

What's wrong with your solution is that read always reads a single line at a time, so setting IFS to a newline just puts that entire line into the first element of the array. Calling read again would overwrite the whole array rather than append to it. You can either build up the array iteratively:
lines=()
while read -r; do
lines+=("$REPLY")
done <<< "$mytext"
or swap the newlines for another character that does not appear in the text (here +) and split on that:
IFS='+' read -a lines <<< "${mytext//$'\n'/+}"
$ IFS=#
$ echo "${lines[*]}"
line one#line two#line three
Using mapfile (a.k.a. readarray) would be a more coherent, elegant solution, but it is only available in Bash 4 and later:
mapfile -t lines <<< "$mytext"
$ printf '[%s]\n' "${lines[@]}"
[line one]
[line two]
[line three]
Without the -t flag, mapfile will keep the newline attached to the array element.
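A single-call alternative that also works in Bash 3 (not taken from the answers above, so treat it as a sketch): make read use NUL as its line delimiter with -d '', so it consumes the whole here-string, and let IFS=$'\n' split it into lines. Because newline counts as IFS whitespace, consecutive newlines collapse, so empty lines are dropped; read also returns a non-zero status at end of input, which matters in scripts using set -e.

```bash
#!/bin/bash
mytext="line one
line two
line three"

# -d '' makes read stop only at a NUL byte (none here), so it reads the whole
# string; IFS=$'\n' then splits it into one array element per line.
# read exits non-zero at EOF, hence "|| true" for set -e scripts.
IFS=$'\n' read -r -d '' -a lines <<< "$mytext" || true

echo "len=${#lines[@]}"
printf '[%s]\n' "${lines[@]}"
```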

This while loop should work:
arr=()
while read -r line; do
arr+=("$line")
done <<< "$mytext"
declare -p arr
declare -a arr=([0]="line one" [1]="line two" [2]="line three")
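The loop above still trims leading and trailing blanks from each line, because read strips IFS whitespace. A variant, sketched here with made-up sample text, empties IFS for the read so each line is stored exactly as written, and keeps -r so backslashes stay literal:

```bash
#!/bin/bash
# Made-up sample text: an indented line and a line with a backslash
# and trailing spaces.
mytext=$'  indented line\nback\\slash  '

arr=()
# IFS= disables trimming of leading/trailing whitespace;
# -r keeps backslashes literal instead of treating them as escapes.
while IFS= read -r line; do
  arr+=("$line")
done <<< "$mytext"

declare -p arr
```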

Not sure what is wrong in your case, but here is a workaround:
a=0
while read -r "lines[$a]"; do
((a++))
done <<< "${mytext}"
unset "lines[$a]"  # the final read failed at end of input; drop the empty element it left at this index

Related

Bash, while read line by line, split strings on line divided by ",", store to array

I need to read a file line by line, split every line on ",", and store the fields in an array.
File source_file.
usl-coop,/root
usl-dev,/bin
Script.
i=1
while read -r line; do
IFS="," read -ra para_$i <<< $line
echo ${para_$i[@]}
((i++))
done < source_file
Expected output.
para_1[0]=usl-coop
para_1[1]=/root
para_2[0]=usl-dev
para_2[1]=/bin
The script fails with an error about the echo:
./sofimon.sh: line 21: ${para_$i[@]}: bad substitution
When I echo array one by one field, for example
echo para_1[0]
it shows, that variables are stored.
But I need to access it with a variable inside the name, something like this:
${para_$i[1]}
Is it possible to do this?
Thanks.
S.
There is a trick to simulate 2D arrays using associative arrays. It works nicely, and I think it is the most flexible and extensible approach:
declare -A para
i=1
while IFS=, read -r -a line; do
for j in "${!line[@]}"; do
para[$i,$j]="${line[$j]}"
done
((i++)) ||:
done < source_file
declare -p para
will output:
declare -A para=([1,0]="usl-coop" [1,1]="/root" [2,1]="/bin" [2,0]="usl-dev" )
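Since associative arrays have no defined order (note [2,1] printed before [2,0] above), reading the data back in order means looping over the row and column numbers yourself. A minimal sketch, assuming rows are numbered from 1 as in the script above; the sample data is inlined with a here-doc instead of source_file, and ${var+set} is used to test whether a key exists:

```bash
#!/bin/bash
declare -A para
i=1
while IFS=, read -r -a line; do
  for j in "${!line[@]}"; do
    para[$i,$j]="${line[$j]}"
  done
  ((i++)) || :
done <<'EOF'
usl-coop,/root
usl-dev,/bin
EOF

rows=$((i - 1))
for ((r = 1; r <= rows; r++)); do
  fields=()
  # Walk columns 0, 1, 2, ... until this row has no key for the column.
  for ((c = 0; ; c++)); do
    [[ ${para[$r,$c]+set} ]] || break
    fields+=("${para[$r,$c]}")
  done
  echo "row $r: ${fields[*]}"
done
```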
Without modifying your script that much you could use indirect variable expansion. It's sometimes used in simpler scripts:
i=1
while IFS="," read -r -a para_$i; do
n="para_$i[@]"
echo "${!n}"
((i++)) ||:
done < source_file
declare -p ${!para_*}
or basically the same with a nameref, a named reference to another variable (side note: with indirect expansion the [@] must be part of the variable's value, whereas with a nameref it is written in the expansion itself):
i=1
while IFS="," read -r -a para_$i; do
declare -n n="para_$i"
echo "${n[@]}"
((i++)) ||:
done < source_file
declare -p ${!para_*}
both scripts above will output the same:
usl-coop /root
usl-dev /bin
declare -a para_1=([0]="usl-coop" [1]="/root")
declare -a para_2=([0]="usl-dev" [1]="/bin")
That said, I think you shouldn't read your file into memory at all; it's bad design. Shells, bash included, are built around passing files through pipes, streams, FIFOs, redirections, process substitutions, etc. without ever saving, copying or storing the file. If you have a file to parse, stream it to another process, parse it and save the result, without ever storing the whole input in memory. If you want to find some data inside a file, use grep or awk.
Here is a short awk script that does the task:
awk 'BEGIN{FS=",";of="para_%d[%d]=%s\n"}{printf(of, NR, 0, $1);printf(of, NR, 1, $2)}' input.txt
It produces the desired output.
Explanation:
BEGIN{
FS=","; # set field separator to `,`
of="para_%d[%d]=%s\n" # define common printf output format
}
{ # for each input line
printf(of, NR, 0, $1); # output for current line, [0], left field
printf(of, NR, 1, $2) # output for current line, [1], right field
}

awk through text file with different delimiter count into array

I have a text file with 8000 lines; here is an example:
00122;IL;Chicago;Router;;1496009459
00133;IL;Chicago;Router;0;6.651;1496009460
00166;IL;Chicago;Router;0;5.798;1496009460
00177;IL;Chicago;Router;0;5.365;1496009460
00188;IL;Chicago;Router;0;22.347;1496009460
As you can see, the lines have different numbers of delimiters. I need to put all the columns separated by ';' into an array, no matter where the delimiters occur.
So the first line would have 6 fields and the second line would have 7.
When I tried to do it with the command below
Number=( $(awk '{print $1}' $FileName.txt) ) with a different array name and field for each column, I got strange behavior: not all fields are printed for some lines when I echo them all on one line.
Performance is very important (it needs to run in a matter of seconds), and I found awk to be the fastest approach so far, unless someone has a better one.
Any ideas why this is happening?
To dump the entire text file into an array, I would use the following. In this example, we use the two arrays ${finalarray[]} and ${subarray[]} (though the latter is unset at the end) and the variable $line. We assume the file name is file.txt.
#!/bin/bash
finalarray=()
while read -r line; do #For each cycle, the variable $line is the next line of the file
if [[ -z $line ]]; then continue; fi #If the line is empty, skip this cycle
IFS=";" read -r -a subarray <<< "$line" #Split $line into ${subarray[]} using ; as delim
finalarray+=( "${subarray[@]}" ) #Add every element from ${subarray[]} to ${finalarray[]}
unset subarray #clears the array
done <file.txt
If your empty lines in fact contain whitespace that read does not strip (with the default IFS, read already trims leading and trailing spaces and tabs), such as the carriage return at the end of each line in a CRLF file, the empty-line check won't catch them. Instead, you could use something like the following to skip any lines not containing semicolons.
if [[ $line != *';'* ]]; then continue; fi
On the other hand, this would skip all lines without a semicolon, even if you intended some of those lines to be a single array entry.
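If you want one array per column, as in the question, a single read loop can fill them all in one pass. This is a sketch under assumptions: the array names (ids, states, cities, kinds, stamps, counts) are made up, the last field is taken to be the timestamp, and a few sample lines are inlined. Because ';' is not an IFS whitespace character, read -a keeps the empty field between ';;', so the first line still yields 6 fields:

```bash
#!/bin/bash
# Hypothetical column arrays; adjust names and indices to the real layout.
ids=(); states=(); cities=(); kinds=(); stamps=(); counts=()

while IFS=';' read -r -a f; do
  (( ${#f[@]} )) || continue       # skip empty lines
  counts+=("${#f[@]}")             # number of fields on this line
  ids+=("${f[0]}")
  states+=("${f[1]}")
  cities+=("${f[2]}")
  kinds+=("${f[3]}")
  stamps+=("${f[${#f[@]}-1]}")     # last field, whatever the field count
done <<'EOF'
00122;IL;Chicago;Router;;1496009459
00133;IL;Chicago;Router;0;6.651;1496009460

00166;IL;Chicago;Router;0;5.798;1496009460
EOF

for k in "${!ids[@]}"; do
  printf '%s: %s fields, last=%s\n' "${ids[k]}" "${counts[k]}" "${stamps[k]}"
done
```

Keeping every column index aligned (element k of each array comes from the same line) is what lets the final loop print a line back out.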

Sort multiple column String array in bash

I have an array of strings:
arr[0]="1 10 2Z6UVU6h"
arr[1]="1 12 7YzF5mFs"
arr[2]="2 36 qRwAiLg7"
How could I sort by the 2nd column and use the 1st as a tie-breaker?
Is there anything similar to this:
sort -k 2,2n -k 1,1 $arr
As long as there are no newline characters in any array element, it's straightforward: just printf the array into sort and capture the output:
mapfile -t sorted < <(printf "%s\n" "${arr[@]}" | sort -k2,2n -k1,1)
(The use of process substitution is to avoid having the mapfile run in a subshell, which wouldn't be helpful since the goal is to set the value of $sorted in this shell.)
If the array elements might contain newlines, you could use NUL as a delimiter in the printf and the sort (option -z for sort). Before Bash 4.4, though, you'd have to replace mapfile with an explicit loop, because mapfile had no option to change the line delimiter (Bash 4.4 added mapfile -d). read does have one (-d '' makes read use NUL as the line delimiter), but it only reads one line at a time.
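The NUL-delimited variant described above could look like this. It is a sketch: the sample elements are reordered so that sorting visibly changes them, and sort -z is a GNU/BSD extension rather than POSIX. The process substitution again keeps the loop in the current shell so that sorted survives:

```bash
#!/bin/bash
arr=("2 36 qRwAiLg7" "1 12 7YzF5mFs" "1 10 2Z6UVU6h")

sorted=()
# printf '%s\0' terminates each element with NUL, so embedded newlines survive;
# read -d '' consumes one NUL-terminated record per call.
while IFS= read -r -d '' item; do
  sorted+=("$item")
done < <(printf '%s\0' "${arr[@]}" | sort -z -k2,2n -k1,1)

printf '%s\n' "${sorted[@]}"
```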

Bash, split words into letters and save to array

I'm struggling with a project. I am supposed to write a bash script that works like the tr command. To start, I would like to save all the command's arguments into separate arrays, and if an argument is a word, I would like to have each character in a separate array element, e.g.
tr_mine AB DC
I would like to have two arrays: a[0] = A, a[1] = B and b[0]=C b[1]=D.
I found a way, but it's not working:
IFS="" read -r -a array <<< "$a"
No sed, no awk, all bash internals.
Assuming that words are always separated with blanks (space and/or tabs),
also assuming that words are given as arguments, and writing for bash only:
#!/bin/bash
blank=$'[ \t]'
varname='A'
n=1
while IFS='' read -r -d '' -N 1 c ; do
if [[ $c =~ $blank ]]; then n=$((n+1)); continue; fi
eval ${varname}${n}'+=("'"$c"'")'
done <<<"$@"
last=$(eval echo \${#${varname}${n}[@]}) ### Find last character index.
unset "${varname}${n}[$last-1]" ### Remove last (trailing) newline.
for ((j=1;j<=$n;j++)); do
k="A$j[@]"
printf '<%s> ' "${!k}"; echo
done
That will set each array A1, A2, A3, etc. ... to the letters of each word.
After the first loop, the value of $n is the count of words processed.
Printing may be a little tricky, which is why the code to access each letter is included above.
Applied to your sample text:
$ script.sh AB DC
<A> <B>
<D> <C>
The script is setting two (array) vars A1 and A2.
And each letter is one array element: A1[0] = A, A1[1] = B and A2[0] = D, A2[1] = C.
You need to set a variable ($k) to the array element to access.
For example, to echo the fourth letter (index 3, 0-based) of the second word (1-based), you need to do the following (the numbering can be changed if needed):
k="A2[3]"; echo "${!k}" ### Indirect addressing.
The script will work as this:
$ script.sh ABCD efghi
<A> <B> <C> <D>
<e> <f> <g> <h> <i>
Caveat: characters will be split even if quoted. However, quoting the arguments is the correct way to use this script, to avoid the effect of shell metacharacters ( |,&,;,(,),<,>,space,tab ). Of course, spaces (even if repeated) will split words as defined by the variable $blank:
$ script.sh $'qwer;rttt fgf\ngfg'
<q> <w> <e> <r> <;> <r> <t> <t> <t>
<>
<>
<>
<f> <g> <f> <
> <g> <f> <g>
As the script accepts and correctly processes embedded newlines, we need to use unset "${varname}${n}[$last-1]" to remove the trailing newline that the here-string adds. If that is not desired, comment out that line.
Security Note: The eval is not much of a problem here as it is only processing one character at a time. It would be difficult to create an attack based on just one character. Anyway, the usual warning is valid: Always sanitize your input before using this script. Also, most (not quoted) metacharacters of bash will break this script.
$ script.sh qwer(rttt fgfgfg
bash: syntax error near unexpected token `('
I would strongly suggest doing this in another language if possible; it will be a lot easier.
Now, the closest I could come up with is:
#!/bin/bash
sentence="AC DC"
words=`echo "$sentence" | tr " " "\n"`
# final array
declare -A result
# word count
wc=0
for i in $words; do
# letter count in the word
lc=0
for l in `echo "$i" | grep -o .`; do
result["w$wc-l$lc"]=$l
lc=$(($lc+1))
done
wc=$(($wc+1))
done
rLen=${#result[@]}
echo "Result Length $rLen"
for i in "${!result[@]}"
do
echo "$i => ${result[$i]}"
done
The above prints:
Result Length 4
w1-l1 => C
w1-l0 => D
w0-l0 => A
w0-l1 => C
Explanation:
Bash has no direct syntax for creating variables whose names come from other variables (short of eval, declare or printf -v), so I am using an associative array instead (result)
Arrays in bash are one-dimensional. To fake a 2D array I use the indexes: w for words and l for letters. This will make further processing a pain...
Associative arrays are not ordered, so the results appear in an unspecified order when printing
${!result[@]} is used instead of ${result[@]}. The first iterates over the keys, while the second iterates over the values
I know this is not exactly what you ask for, but I hope it will point you to the right direction
Try this:
sentence="$@"
read -r -a words <<< "$sentence"
for word in "${words[@]}"; do
inc=$(( i++ ))
read -r -a l${inc} <<< $(sed 's/./& /g' <<< $word)
done
echo ${words[1]} # print "CD"
echo ${l1[1]} # print "D"
The first read reads all words, the internal one is for letters.
The sed command add a space after each letters to make the string splittable by read -a. You can also use this sed command to remove unwanted characters from words (eg commas) before splitting.
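If special characters are allowed in words, you can use a simple grep instead of the sed command (as suggested in http://www.unixcl.com/2009/07/split-string-to-characters-in-bash.html). Since grep -o prints one character per line, and Bash 4.4 and later no longer word-split here-strings, read its output with mapfile (Bash 4+) rather than read -a:
mapfile -t "l${inc}" < <(grep -o . <<< "$word")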
The word array is ${words[@]}.
The letter arrays are named l0, l1, ..., with the number incremented for each word read.
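For comparison, a sketch that sticks to the "all bash internals" constraint from the question without eval, sed or grep. The helper name split_chars is hypothetical; it copies one character at a time with substring expansion ${word:i:1} into an array whose name is passed in, using a nameref (local -n, Bash 4.3+):

```bash
#!/bin/bash
# split_chars NAME WORD  (hypothetical helper)
# Stores each character of WORD as a separate element of the array NAME.
split_chars() {
  local -n _chars=$1               # nameref to the caller's array
  local _word=$2 _i
  _chars=()
  for ((_i = 0; _i < ${#_word}; _i++)); do
    _chars+=("${_word:_i:1}")      # substring of length 1 at offset _i
  done
}

a=(); b=()
split_chars a "AB"
split_chars b "DC"
declare -p a b
```

In the tr-like script, split_chars a "$1"; split_chars b "$2" would build the two arrays from the question. The caller's array must not itself be named _chars, otherwise the nameref would refer to itself.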

How can I handle an array where elements contain spaces in Bash?

Let's say I have a file named tmp.out that contains the following:
c:\My files\testing\more files\stuff\test.exe
c:\testing\files here\less files\less stuff\mytest.exe
I want to put the contents of that file into an array and I do it like so:
ARRAY=( `cat tmp.out` )
I then run this through a for loop like so
for i in ${ARRAY[@]};do echo ${i}; done
But the output ends up like this:
c:\My
files\testing\more
files\stuff\test.sas
c:\testing\files
here\less
files\less
stuff\mytest.sas
and I want the output to be:
c:\My files\testing\more files\stuff\test.exe
c:\testing\files here\less files\less stuff\mytest.exe
How can I resolve this?
In order to iterate over the values in an array, you need to quote the array expansion to avoid word splitting:
for i in "${values[@]}"; do
Of course, you should also quote the use of the value:
echo "${i}"
done
That doesn't answer the question of how to get the lines of a file into an array in the first place. If you have bash 4.0, you can use the mapfile builtin:
mapfile -t values < tmp.out
Otherwise, you'd need to temporarily change the value of IFS to a single newline, or use a loop over the read builtin.
You can use the IFS variable, the Internal Field Separator. Setting it to the empty string for read stops read from trimming leading and trailing whitespace, so each call stores one whole line:
while IFS= read -r line ; do
ARRAY+=("$line")
done < tmp.out
-r is needed to keep the literal backslashes.
Another simple way to control word-splitting is by controlling the Internal Field Separator (IFS):
#!/bin/bash
oifs="$IFS" ## save original IFS
IFS=$'\n' ## set IFS to break on newline
array=( $( <dat/2lines.txt ) ) ## read lines into array
IFS="$oifs" ## restore original IFS
for ((i = 0; i < ${#array[@]}; i++)); do
printf "array[%d] : '%s'\n" "$i" "${array[i]}"
done
Input
$ cat dat/2lines.txt
c:\My files\testing\more files\stuff\test.exe
c:\testing\files here\less files\less stuff\mytest.exe
Output
$ bash arrayss.sh
array[0] : 'c:\My files\testing\more files\stuff\test.exe'
array[1] : 'c:\testing\files here\less files\less stuff\mytest.exe'
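One caveat with the unquoted $( ... ) approach above: the resulting words still undergo pathname expansion, so a line such as *.txt would be replaced by the matching file names. A sketch of the same idea with globbing switched off via set -f, run in a temporary directory with made-up file names (it assumes globbing was enabled beforehand, since it simply re-enables it with set +f):

```bash
#!/bin/bash
# Work in a throwaway directory so the glob has files it could match.
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1
touch a.txt b.txt
printf '%s\n' 'c:\My files\test.exe' '*.txt' > lines.txt

oifs=$IFS
IFS=$'\n'        # split only on newlines
set -f           # turn off pathname expansion so '*.txt' stays literal
array=( $(< lines.txt) )
set +f           # turn globbing back on
IFS=$oifs

for ((i = 0; i < ${#array[@]}; i++)); do
  printf "array[%d] : '%s'\n" "$i" "${array[i]}"
done

cd / && rm -rf "$workdir"
```

Without the set -f line, the *.txt entry would expand to a.txt, b.txt and lines.txt, giving four elements instead of two.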
