Read lines from a text file and store them in an array

So I need to read all the lines from a text file (passed as an argument when I call the script) which contains numbers in this form, one pair per line (separated by a single newline, not two):
num1:num2
num3:num4
I use this command block:
while IFS= read line
do
IFS=':' read -r -a X <<< "$line"
done < "$1"
to read the lines and store the numbers in array X, but the array only ever fills positions 0 and 1, and when the loop moves to the next line it just writes the new number (e.g. num3) over the old one (e.g. num1 in position 0).
Any solution to this?

With bash: replace every : with a line break and use mapfile to fill array x.
mapfile -t x < <(tr ':' '\n' < file)
declare -p x
Output:
declare -a x='([0]="num1" [1]="num2" [2]="num3" [3]="num4")'
See: help mapfile
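If you prefer to keep the per-line loop from the question, the fix is to append to X instead of overwriting it on every iteration. A minimal sketch (pair is just a temporary array name for the current line):
X=()
while IFS= read -r line; do
    IFS=':' read -r -a pair <<< "$line"   # split the current line on :
    X+=("${pair[@]}")                     # append both numbers to X
done < "$1"
declare -p X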

Related

Put lines of a text file in an array in bash

I'm taking over a bash script from a colleague that reads a file, processes it, and prints another file based on the current line of the while loop.
I now need to add some features to it. The one I'm having issues with right now is reading a file and putting each line into an array, except that the 2nd column of a line can be empty, e.g.:
For a text file with \t as separator:
A\tB\tC
A\t\tC
For a CSV file, the same but with , as the separator:
A,B,C
A,,C
Which should then give
["A","B","C"] or ["A", "", "C"]
The code I took over is as follows:
while IFS=$'\t\r' read -r -a col; do
# Process the array, put that into a file
lp -d $printer $file_to_print
done < $input_file
This works if B is filled, but B sometimes needs to be empty now, and when the input file leaves it empty, the created array, and thus the output file to print, just skips the empty cell (the array is then ["A","C"]).
I tried rewriting the whole block in awk, but that brought its own set of problems, making it difficult to call the lp command to print.
So my question is: how can I preserve the empty cell from the line in my bash array, so that I can call on it later and use it?
Thank you very much. I know this might be quite confusing, so please ask and I'll clarify.
Edit: As requested, here's the awk code I've tried. The issue is that it only prints the last print request, even though I know it loops over the whole file and the lp command is still inside the loop.
awk 'BEGIN {
    inputfile="'"${optfile}"'"
    outputfile="'"${file_loc}"'"
    printer="'"${printer}"'"
    while (getline < inputfile){
        print "'"${prefix}"'" > outputfile
        split($0,ft,"'"${IFSseps}"'");
        if (length(ft[2]) == 0){
            print "CODEPAGE 1252\nTEXT 465,191,\"ROMAN.TTF\",180,7,7,\""ft[1]"\"" >> outputfile
            size_changer = 0
        } else {
            print "CODEPAGE 1252\nTEXT 465,191,\"ROMAN.TTF\",180,7,7,\""ft[1]"_"ft[2]"\"" >> outputfile
            size_changer = 1
        }
        if ( split($0,ft,"'"${IFSseps}"'") > 6)
            maxcounter = 6;
        else
            maxcounter = split($0,ft,"'"${IFSseps}"'");
        for (i = 3; i <= maxcounter; i++){
            x=191-(i-2)*33
            print "CODEPAGE 1252\nTEXT 465,"x",\"ROMAN.TTF\",180,7,7,\""ft[i]"\"" >> outputfile
        }
        print "PRINT ""'"${copies}"'"",1" >> outputfile
        close(outputfile)
        "'"`lp -d ${printer} ${file_loc}`"'"
    }
    close("'"${file_loc}"'");
}'
EDIT2: Continuing to try to find a solution, I tried the following code without success. This is weird, as just doing printf without putting the result in an array keeps the formatting intact.
$ cat testinput | tr '\t' '>'
A>B>C
A>>C
# Should normally be empty on the second output line
$ while read line; do IFS=$'\t' read -ra col < <(printf "$line"); echo ${col[1]}; done < testinput
B
C
For tab, it's complicated.
From 3.5.7 Word Splitting in the manual:
A sequence of IFS whitespace characters is also treated as a delimiter.
Since tab is an "IFS whitespace character", sequences of tabs are treated as a single delimiter
IFS=$'\t' read -ra ary <<<$'A\t\tC'
declare -p ary
declare -a ary=([0]="A" [1]="C")
What you can do is translate tabs to a non-whitespace character, assuming it does not clash with the actual data in the fields:
line=$'A\t\tC'
IFS=, read -ra ary <<<"${line//$'\t'/,}"
declare -p ary
declare -a ary=([0]="A" [1]="" [2]="C")
To avoid the risk of colliding with commas in the data, we can use an unusual ASCII character: FS, octal 034
line=$'A\t\tC'
printf -v FS '\034'
IFS="$FS" read -ra ary <<<"${line//$'\t'/"$FS"}"
# or, without the placeholder variable
IFS=$'\034' read -ra ary <<<"${line//$'\t'/$'\034'}"
declare -p ary
declare -a ary=([0]="A" [1]="" [2]="C")
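Applied to the file-reading loop from the question, the same idea might look like this (a sketch; the $input_file name and the tab-separated layout are taken from the question):
while IFS= read -r line; do
    IFS=$'\034' read -r -a col <<< "${line//$'\t'/$'\034'}"
    declare -p col   # empty fields survive, e.g. col[1]="" for the A<tab><tab>C line
done < "$input_file"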
One bash example using parameter expansion, where we convert the delimiter into a \n and let mapfile read each line in as a new array entry ...
For tab-delimited data:
for line in $'A\tB\tC' $'A\t\tC'
do
mapfile -t array <<< "${line//$'\t'/$'\n'}"
echo "############# ${line}"
typeset -p array
done
############# A B C
declare -a array=([0]="A" [1]="B" [2]="C")
############# A C
declare -a array=([0]="A" [1]="" [2]="C")
NOTE: The $'...' construct ensures the \t is treated as a single <tab> character as opposed to the two literal characters \ + t.
For comma-delimited data:
for line in 'A,B,C' 'A,,C'
do
mapfile -t array <<< "${line//,/$'\n'}"
echo "############# ${line}"
typeset -p array
done
############# A,B,C
declare -a array=([0]="A" [1]="B" [2]="C")
############# A,,C
declare -a array=([0]="A" [1]="" [2]="C")
NOTE: This obviously (?) assumes the desired data does not contain a comma (,).
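To apply the same trick to the file from the question rather than to literal strings, one option (a sketch, assuming the tab-separated $input_file from the question) is to build one array per line inside the usual while read loop:
while IFS= read -r line; do
    mapfile -t col <<< "${line//$'\t'/$'\n'}"   # one array per input line, empty cells kept
    declare -p col
done < "$input_file"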
It may just be your "# Process the array, put that into a file" part.
IFS=, read -ra ray <<< "A,,C"
for e in "${ray[@]}"; do o="$o\"$e\","; done
echo "[${o%,}]"
["A","","C"]
See @Glenn's excellent answer regarding tabs.
My simple data file:
$: cat x # tab delimited, empty field 2 of line 2
a b c
d f
My test:
while IFS=$'\001' read -r a b c; do
echo "a:[$a] b:[$b] c:[$c]"
done < <(tr "\t" "\001"<x)
a:[a] b:[b] c:[c]
a:[d] b:[] c:[f]
Note that I used ^A (a 001 byte) but you might be able to use something as simple as a comma or pipe (|) character. Choose based on your data.

Split two numbers into two arrays

I need to split pairs of numbers in this form (they are from a text file):
Num1:Num2
Num3:Num4
And store num1 in array X, num2 in array Y, num3 in array X, and num4 in array Y.
With bash:
mapfile -t X < <(cut -d : -f 1 file) # read only first column
mapfile -t Y < <(cut -d : -f 2 file) # read only second column
declare -p X Y
Output:
declare -a X='([0]="num1" [1]="num3")'
declare -a Y='([0]="num2" [1]="num4")'
Disadvantage: The file is read twice.
You could perform the following steps:
Create destination arrays empty
Read file line by line, with a classic while read ... < file loop
Split each line on :, again using read
Append values to arrays
For example:
arr_x=()
arr_y=()
while IFS= read -r line || [ -n "$line" ]; do
    IFS=: read -r x y <<< "$line"
    arr_x+=("$x")
    arr_y+=("$y")
done < data.txt
echo "content of arr_x:"
for v in "${arr_x[@]}"; do
    echo "$v"
done
echo "content of arr_y:"
for v in "${arr_y[@]}"; do
    echo "$v"
done
Here is a quick bash solution:
c=0
while IFS=: read -r a b; do
    x[$c]="$a"
    y[$c]="$b"
    c=$((c+1))
done < input.txt
We redirect input.txt into a while loop, using the input field separator : to read the first number of each line into $a and the second into $b. Then we add them to the arrays as you specified, using a counter $c to track the position in the arrays.
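With the sample data above (num1:num2 and num3:num4 in input.txt), a quick check of the result would look like this:
$ declare -p x y
declare -a x=([0]="num1" [1]="num3")
declare -a y=([0]="num2" [1]="num4")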
Using the =~ operator to store the pair of numbers in the array BASH_REMATCH:
$ cat file
123:456
789:012
$ while read -r line
do
[[ $line =~ ([^:]*):(.*) ]] && echo ${BASH_REMATCH[1]} ${BASH_REMATCH[2]}
# do something else with numbers as they will be replaced on the next iteration
done < file
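As the comment says, the captures are replaced on every iteration; if the numbers have to survive the loop, one option (a sketch, not part of the original answer) is to append each capture to its own array:
X=() Y=()
while read -r line; do
    [[ $line =~ ([^:]*):(.*) ]] || continue
    X+=("${BASH_REMATCH[1]}")   # part before the colon
    Y+=("${BASH_REMATCH[2]}")   # part after the colon
done < file
declare -p X Y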

Bash: while reading line by line, split strings on each line divided by "," and store to array

I need to read a file line by line, split every line on ",", and store the pieces in an array.
File source_file:
usl-coop,/root
usl-dev,/bin
Script:
i=1
while read -r line; do
IFS="," read -ra para_$i <<< $line
echo ${para_$i[@]}
((i++))
done < source_file
Expected output:
para_1[0]=usl-coop
para_1[1]=/root
para_2[0]=usl-dev
para_2[1]=/bin
The script throws an error about the echo:
./sofimon.sh: line 21: ${para_$i[@]}: bad substitution
When I echo the array fields one by one, for example
echo ${para_1[0]}
it shows that the variables are stored.
But I need to use it with a variable inside, something like this:
${para_$i[1]}
Is it possible to do this?
Thanks.
S.
There is a trick to simulate 2D arrays using associative arrays. It works nicely and I think it is the most flexible and extensible:
declare -A para
i=1
while IFS=, read -r -a line; do
    for j in "${!line[@]}"; do
        para[$i,$j]="${line[$j]}"
    done
    ((i++)) ||:
done < source_file
declare -p para
will output:
declare -A para=([1,0]="usl-coop" [1,1]="/root" [2,1]="/bin" [2,0]="usl-dev" )
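Individual cells can then be read back with the same "row,column" key, for example:
$ echo "${para[2,1]}"
/bin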
Without modifying your script that much you could use indirect variable expansion. It's sometimes used in simpler scripts:
i=1
while IFS="," read -r -a para_$i; do
    n="para_$i[@]"
    echo "${!n}"
    ((i++)) ||:
done < source_file
declare -p ${!para_*}
or basically the same with a nameref, a named reference to another variable (side note: see how [@] needs to be part of the variable in indirect expansion, but not with the named reference):
i=1
while IFS="," read -r -a para_$i; do
    declare -n n="para_$i"   # (re)point the nameref at the current array
    echo "${n[@]}"
    ((i++)) ||:
done < source_file
declare -p ${!para_*}
both scripts above will output the same:
usl-coop /root
usl-dev /bin
declare -a para_1=([0]="usl-coop" [1]="/root")
declare -a para_2=([0]="usl-dev" [1]="/bin")
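And if you only need a single element of para_$i rather than the whole array, the same indirect expansion works with an index inside the placeholder (a small sketch):
i=2
n="para_$i[1]"
echo "${!n}"   # prints /bin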
That said, I think you shouldn't read your file into memory at all. It's just a bad design. The shell and bash are built around passing files along with pipes, streams, fifos, redirections, process substitutions, etc., without ever saving/copying/storing the file. If you have a file to parse, you should stream it to another process, parse it, and save the result, without ever storing the whole input in memory. If you want to find some data inside a file, use grep or awk.
Here is a short awk script that does the task.
awk 'BEGIN{FS=",";of="para_%d[%d]=%s\n"}{printf(of, NR, 0, $1);printf(of, NR, 1, $2)}' input.txt
It produces the desired output.
Explanation:
BEGIN{
    FS=",";                  # set field separator to `,`
    of="para_%d[%d]=%s\n"    # define common printf output format
}
{                            # for each input line
    printf(of, NR, 0, $1);   # output for current line, [0], left field
    printf(of, NR, 1, $2)    # output for current line, [1], right field
}

Storing data in multiple arrays (bash)

I am trying to store the contents of a .txt file in two sets of arrays in bash. The file is a list of characteristics for given data files, delimited by vertical bars (|). So far, I have written code that reads the file and prints each line of data separately, each followed by the given sections of the line.
#prints line of text and then separated version
while IFS='' read -r line || [[ -n "$line" ]]
do
echo "Text read from file: $line"
words=$(echo $line | tr "|" "\n")
for tests in $words
do
echo "> $tests"
done
done < "$1"
Example output:
Text read from file: this|is|data|in|a|file
> this
> is
> data
> in
> a
> file
Text read from file: another|example|of|data
> another
> example
> of
> data
Is there a way for me to store each individual line of data in one array, and then the broken-up parts of it in another? I was thinking this might be possible using a loop, but I am confused by arrays in bash (newbie).
OK -- I just read in the lines like you have done, and append them to the lines array. Then, use tr as you have done, and append to the words array. Just use the parentheses to mark them as array elements in the assignments:
$ cat data.txt
this|is|data|in|a|file
another|example|of|data
$ cat read_data.sh
#!/bin/bash
declare -a lines
declare -a words
while IFS='' read -r line || [[ -n "$line" ]]
do
    echo "Text read from file: $line"
    lines+=( "$line" )
    words+=( $(echo $line | tr "|" " ") )
done < "$1"
for (( ii=0; ii<${#lines[@]}; ii++ )); do
    echo "Line $ii ${lines[ii]}"
done
for (( ii=0; ii<${#words[@]}; ii++ )); do
    echo "Word $ii ${words[ii]}"
done
$ ./read_data.sh data.txt
Text read from file: this|is|data|in|a|file
Text read from file: another|example|of|data
Line 0 this|is|data|in|a|file
Line 1 another|example|of|data
Word 0 this
Word 1 is
Word 2 data
Word 3 in
Word 4 a
Word 5 file
Word 6 another
Word 7 example
Word 8 of
Word 9 data
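If you later need just the words of one particular line, one option (a sketch, not part of the original answer) is to re-split the stored line on | at that point:
IFS='|' read -r -a fields <<< "${lines[1]}"
declare -p fields
# declare -a fields=([0]="another" [1]="example" [2]="of" [3]="data")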

BASH copy of array

I am new to bash.
I have a string in a variable named ARRAY, and I need ARRAY to become an array whose elements are the parts of the original string separated by \n (newline).
This is what I have:
ARRAY=$'one\ntwo';
x=$ARRAY;
IFS=$'\n' read -rd '' -a y <<<"$x";
y=(${x//$'\n'/});
IFS=$'\n' y=(${x//$'\n'/ });
IFS=$'\n' y=($x);
unset ARRAY; (I tried unset ARRAY)
ARRAY=$y; (this does not work correctly)
echo ${ARRAY[1]}; //result ARRAY[0]="one",ARRAY[1]=""
But if I try echo ${y[1]}; //all is right y[0]="one" y[1]="two"
My problem is that I cannot set ARRAY as a copy of the y array.
The way you're splitting the string at the newlines is correct:
array=$'one\ntwo'
IFS=$'\n' read -rd '' -a y <<<"$array"
Now, why do you give it a different name, if eventually you want the variable array to contain the array? Just do:
IFS=$'\n' read -rd '' -a array <<<"$array"
There are no problems if array appears both times here.
Now, if you want to copy an array, you'll do this (assuming the array to copy is called y as in your example):
array=( "${y[#]}" )
Note, that this will not preserve the sparseness of the array (but in your case, y is not sparse so there are no problems with this).
Another comment: when you do IFS=$'\n' read -rd '' -a y <<<"$array", read will return with a return code of 1; while this is not a problem, you may still want to make read happy by using:
IFS=$'\n' read -rd '' -a array < <(printf '%s\0' "$array")
A last comment: instead of using read you can use the builtin mapfile (bash≥4.0 only):
mapfile -t array <<< "$array"
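A quick check with the sample string from the question:
$ array=$'one\ntwo'
$ mapfile -t array <<< "$array"
$ declare -p array
declare -a array=([0]="one" [1]="two")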
