bash: split text file at braces into array - arrays

I searched the site very thoroughly but was not able to turn up a fitting answer - most probably I wasn't asking the correct questions.
I have a text file with up to several thousand lines of coordinates formatted as in the following example:
[1]
-75.4532 75.8273
-115.00 64.5
-90.00 74.3333
-100.00 72.4167
-110.00 69.00
-120.8 56.284
[2]
-70.00 73.75
-100.00 69.3333
-110.00 65.1533
-90.00 71.5833
-80.00 73.00
[3]
-100.00 67.5
-67.7133 72.6611
-80.00 71.5
-90.00 70.00
-110.00 63.8667
-115.8 60.836
What I'm trying to achieve is to split the file into an array at the numbers in brackets, so that I can use the number in brackets as the array's index and the following lines as the corresponding value.
The next step would be looping through the array feeding each element to another program. If there is a more elegant approach I'm willing to listen.
All the best!

You can use sed to massage the file into a bash array definition:
declare -a "$(sed 's/\[/" &/g; s/\]/&="/g' file | sed '1s/^"/arr=(/; $s/$/")/')"
echo "${arr[2]}"
echo
echo ${arr[2]}
-70.00 73.75
-100.00 69.3333
-110.00 65.1533
-90.00 71.5833
-80.00 73.00
-70.00 73.75 -100.00 69.3333 -110.00 65.1533 -90.00 71.5833 -80.00 73.00
Printing with and without quotes to demonstrate the difference

Use a combination of read -d (to set the record delimiter) and IFS (to set the field separator):
# read content from file
content="$(<input_filename)"
# append record separator to avoid dropping the last record
content="$content["
# read into array
arr=()
while IFS=']' read -d '[' sub value; do
arr[$sub]=$value
done <<<"$content"
The resulting array will have an empty first element since it's zero-based. This can make it trickier to loop over it. You can remove the first element explicitly to make the loop easier:
unset arr[0]
Now you can loop over the elements:
for value in "${arr[@]}"; do
program <<< "$value"
done
or if you need the 1-based index as well:
for ((i=1; i<=${#arr[@]}; i++)); do
program "$i" "${arr[i]}"
done
Hope that helps!
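As an alternative to the read -d approach above, here is a minimal line-by-line sketch. It assumes every block header is a line consisting only of [n]; the temporary file and its sample data are hypothetical stand-ins for the real coordinate file.

```shell
#!/usr/bin/env bash
# Hypothetical sample data standing in for the real coordinate file.
input=$(mktemp)
cat > "$input" <<'EOF'
[1]
-75.4532 75.8273
-115.00 64.5
[2]
-70.00 73.75
-100.00 69.3333
EOF

re='^\[([0-9]+)\]$'   # matches a header line such as [12]
nl=$'\n'
arr=()
idx=
while IFS= read -r line; do
    if [[ $line =~ $re ]]; then
        idx=${BASH_REMATCH[1]}            # a "[n]" header starts block n
    elif [[ -n $idx ]]; then
        # append the line; add a newline only between lines of the same block
        arr[idx]+=${arr[idx]:+$nl}$line
    fi
done < "$input"
rm -f "$input"

# "${!arr[@]}" lists the indices actually in use, so index 0 is never visited
for i in "${!arr[@]}"; do
    printf 'block %s:\n%s\n' "$i" "${arr[i]}"
done
```

Because the loop only assigns indices that appear in the file, there is no empty element 0 to unset afterwards.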

Related

Select a specific line of an array in a for loop

I have an array with multiple lines, and I would like to select a specific line of the array in a for loop. And then use the data in the selected line for some analysis in the for loop.
ARRAY=(A1,A2
B1,B2
C1,C2)
When I applied the below code, I can assign "B1" to Trait1 and "B2" to Trait2. This works for j=1 or j=3 as well.
j=2
echo ${ARRAY[$j-1]}
Trait1="$(echo ${ARRAY[$j-1]} | cut -d',' -f1)"
Trait2="$(echo ${ARRAY[$j-1]} | cut -d',' -f2)"
echo $Trait1
echo $Trait2
Ultimately, I want to put the above code in a for loop. But this failed.
nLine=3
for j in $(eval echo "{1..$nLine}")
do
Trait1="$(echo ${ARRAY[$j-1]} | cut -d',' -f1)"
Trait2="$(echo ${ARRAY[$j-1]} | cut -d',' -f2)"
echo $Trait1
echo $Trait2
done
-bash: 1 2 3-1: syntax error in expression (error token is "2 3-1")
-bash: 1 2 3-1: syntax error in expression (error token is "2 3-1")
Thank you.
Keeping the current code design I'd probably want to use parameter expansion to break the array element into its 2 components ...
Data setup:
$ ARRAY=(A1,A2
B1,B2
C1,C2)
$ typeset -p ARRAY
declare -a ARRAY=([0]="A1,A2" [1]="B1,B2" [2]="C1,C2")
Quick demo of parameter expansion to break array element into parts:
$ x=${ARRAY[0]}
$ echo ${x}
A1,A2
$ echo ${x%,*}
A1
$ echo ${x#*,}
A2
Updating the OP's for loop to use parameter expansion, and replacing the eval/echo/j-1 with something a bit easier to read:
nLine=3
for (( j=0 ; j<nLine ; j++ ))
do
Trait1="${ARRAY[${j}]%,*}"
Trait2="${ARRAY[${j}]#*,}"
echo "${Trait1}"
echo "${Trait2}"
done
Generates the output:
A1
A2
B1
B2
C1
C2
The way you're using it, you should have a single string, not an array.
ARRAY='A1,A2
B1,B2
C1,C2'
Then when you use it as a here-string input to sed, it will receive multiple lines of input, and you can select the line you want with ${j}p.
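A minimal sketch of that single-string approach, assuming j holds the 1-based line number to select. The parameter expansions used to split the selected line are an illustrative choice, not part of the original suggestion.

```shell
#!/usr/bin/env bash
# A single multi-line string, not an array
ARRAY='A1,A2
B1,B2
C1,C2'

j=2
# sed -n suppresses default output; "${j}p" prints only line j
line=$(sed -n "${j}p" <<< "$ARRAY")

Trait1=${line%,*}   # everything before the last comma
Trait2=${line#*,}   # everything after the first comma
echo "$Trait1"
echo "$Trait2"
```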
As mentioned in the comments, this seems like a convoluted way to access array elements. You can simply use the shell's built-in array indexing operator.
Array assignments
ARRAY=(A1,A2
B1,B2
C1,C2)
and
ARRAY=(A1,A2 B1,B2 C1,C2)
are equivalent and both define an array with 3 elements: A1,A2, B1,B2, and C1,C2. Space(s), tab(s) and new-line(s) are used to separate array elements in compound assignments.
A for loop can be used to traverse an array:
# "${ARRAY[@]}" expands each element of ARRAY to a separate word
for elm in "${ARRAY[@]}"; do
# The value of the array element accessed is "$elm"
done
Or, using the alternate form of the for loop:
# ${#ARRAY[*]} expands to the number of elements in the ARRAY
for ((j = 0; j < ${#ARRAY[*]}; ++j)); do
# Array element can be referenced using ${ARRAY[j]}
done
assuming there is no gap between indices (bash allows an array with "holes").
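If the array may have holes, iterating over the indices that are actually set avoids that assumption. A short sketch; the unset call is only there to create a gap for illustration.

```shell
#!/usr/bin/env bash
ARRAY=(A1,A2 B1,B2 C1,C2)
unset 'ARRAY[1]'          # create a hole: remaining indices are 0 and 2

visited=()
# "${!ARRAY[@]}" expands to the list of indices that are set
for j in "${!ARRAY[@]}"; do
    visited+=("$j:${ARRAY[j]}")
done
printf '%s\n' "${visited[@]}"
```

A counting loop bounded by ${#ARRAY[*]} (which is 2 here) would visit the empty index 1 and never reach index 2.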
Here strings (a variant of here documents) and the read builtin with the IFS variable can be used to parse the elements of the array:
IFS=, read -r Trait1 Trait2 <<< "${ARRAY[j]}"
Here IFS is set to , so that read splits the array element into Trait1 and Trait2 at the comma.
So, a sample program to demonstrate all of these may be:
#!/bin/bash
ARRAY=(A1,A2
B1,B2
C1,C2)
for elem in "${ARRAY[@]}"; do
IFS=, read -r Trait1 Trait2 <<< "$elem"
printf "%s\n%s\n" "$Trait1" "$Trait2"
done
or,
#!/bin/bash
ARRAY=(A1,A2
B1,B2
C1,C2)
for ((j = 0; j < ${#ARRAY[*]}; ++j)); do
IFS=, read -r Trait1 Trait2 <<< "${ARRAY[j]}"
printf "%s\n%s\n" "$Trait1" "$Trait2"
done

Copy elements from an array to another preserving embedded spaces

Consider having an array which is as follows:
array=("BMW E46" "Ford Mustang" "Toyota GT86")
Running the following command proves that it has 3 elements:
echo ${#array[@]} # outputs 3
Now I will remove an element from the initial array:
unset array[0]
While this will remove the element from my array, the indices will remain the same:
echo ${!array[@]} # output: 1 2
My way of dealing with this problem (normally) is to just type array=(${array[@]}) but in this case, running the command will result in my elements being parsed as ('Ford' 'Mustang' ...).
Is there a way of fixing the indices without messing up the array elements?
Enclose ${array[@]} in double-quotes so that each element will be retained as a separate field.
$ array=("BMW E46" "Ford Mustang" "Toyota GT86")
$ unset array[0]
$ array=("${array[@]}")
$ declare -p array
declare -a array=([0]="Ford Mustang" [1]="Toyota GT86")
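The difference between the unquoted and the quoted copy can be checked by counting elements. A small sketch; the names unquoted and quoted are illustrative only.

```shell
#!/usr/bin/env bash
array=("BMW E46" "Ford Mustang" "Toyota GT86")
unset 'array[0]'

unquoted=(${array[@]})      # word splitting breaks elements at spaces
quoted=("${array[@]}")      # one word per element, indices renumbered from 0

echo "unquoted count: ${#unquoted[@]}"
echo "quoted count:   ${#quoted[@]}"
echo "quoted indices: ${!quoted[@]}"
```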

How to get user input as number and echo the stored array value of that number in bash scripting

I have written a script that prints the running node processes along with the cwd of each process. I store the values in an array using a for loop and echo that array.
How can I let the user enter an index into that array, based on the output the script prints, and then show the value stored at that index?
Example Myscript
array=$(netstat -nlp | grep node)
for i in ${array[*]}
do
echo $i
done
output is something like that
1056
2064
3024
I want something more advanced. I want to take input from the user like
Enter the regarding index from above list = 1
And let's suppose the user enters 1
Then the next output should be
Your selected value is 2064
Is it possible in bash?
First, you're not actually using an array; you are storing a plain string in the variable "array". The string contains words separated by whitespace, so when you supply the variable in the for statement, the unquoted value is subject to Word Splitting.
You need to use the array syntax for setting the array:
array=( $(netstat -nlp | grep node) )
However, the unquoted command substitution still exposes you to Filename Expansion. The best way to store the lines of a command into an array is to use the mapfile command with a process substitution:
mapfile -t array < <(netstat -nlp | grep node)
And in the for loop, make sure you quote all the variables and use index @
for i in "${array[@]}"; do
echo "$i"
done
Notes:
arrays created with mapfile will start at index 0, so be careful of off-by-one errors
I don't know how variables are implemented in bash, but there is this oddity:
if you refer to the array without an index, you'll get the first element:
array=( "hello" "world" )
echo "$array" # ==> hello
If you refer to a plain variable with array syntax and index zero, you'll get the value:
var=1234
echo "${var[0]}" # ==> 1234
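To cover the part of the question about reading the user's choice, here is a minimal sketch. The sample values stand in for the netstat output, select_value is a hypothetical helper name, and the here-string on read only simulates a user typing 1 (drop it for interactive use).

```shell
#!/usr/bin/env bash
# Hypothetical sample values standing in for the netstat output
array=(1056 2064 3024)

# select_value is an illustrative helper, not part of the original answer
select_value() {
    local idx=$1
    # accept only non-negative integers; 10# forces base 10 so "08" is not read as octal
    if [[ $idx =~ ^[0-9]+$ ]] && (( 10#$idx < ${#array[@]} )); then
        echo "Your selected value is ${array[10#$idx]}"
    else
        echo "Invalid index: $idx" >&2
        return 1
    fi
}

# print the list with its 0-based indices
for i in "${!array[@]}"; do
    echo "$i) ${array[i]}"
done

# the here-string simulates the user entering 1
read -r -p "Enter the index from the list above = " choice <<< "1"
select_value "$choice"
```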

Bash, split words into letters and save to array

I'm struggling with a project. I am supposed to write a bash script which works like the tr command. To begin with, I would like to save each of the command's arguments into a separate array. If an argument is a word, I would like each character in a separate array field, e.g.
tr_mine AB DC
I would like to have two arrays: a[0]=A, a[1]=B and b[0]=D, b[1]=C.
I found a way, but it's not working:
IFS="" read -r -a array <<< "$a"
No sed, no awk, all bash internals.
Assuming that words are always separated with blanks (space and/or tabs),
also assuming that words are given as arguments, and writing for bash only:
#!/bin/bash
blank=$'[ \t]'
varname='A'
n=1
while IFS='' read -r -d '' -N 1 c ; do
if [[ $c =~ $blank ]]; then n=$((n+1)); continue; fi
eval ${varname}${n}'+=("'"$c"'")'
done <<<"$@"
last=$(eval echo \${#${varname}${n}[@]}) ### Find last character index.
unset "${varname}${n}[$last-1]" ### Remove last (trailing) newline.
for ((j=1;j<=$n;j++)); do
k="A$j[@]"
printf '<%s> ' "${!k}"; echo
done
That will set each array A1, A2, A3, etc. ... to the letters of each word.
The value at the end of the first loop of $n is the count of words processed.
Printing may be a little tricky, that is why the code to access each letter is given above.
Applied to your sample text:
$ script.sh AB DC
<A> <B>
<D> <C>
The script is setting two (array) vars A1 and A2.
And each letter is one array element: A1[0] = A, A1[1] = B and A2[0]=D, A2[1]=C.
You need to set a variable ($k) to the array element to access.
For example, to echo fourth letter (0 based) of second word (1 based) you need to do (that may be changed if needed):
k="A2[3]"; echo "${!k}" ### Indirect addressing.
The script will work as this:
$ script.sh ABCD efghi
<A> <B> <C> <D>
<e> <f> <g> <h> <i>
Caveat: Characters will be split even if quoted. However, quoted arguments is the correct way to use this script to avoid the effect of shell metacharacters ( |,&,;,(,),<,>,space,tab ). Of course, spaces (even if repeated) will split words as defined by the variable $blank:
$ script.sh $'qwer;rttt fgf\ngfg'
<q> <w> <e> <r> <;> <r> <t> <t> <t>
<>
<>
<>
<f> <g> <f> <
> <g> <f> <g>
As the script will accept and correctly process embedded newlines, we need to use: unset "${varname}${n}[$last-1]" to remove the trailing newline that the here-string appends. If that is not desired, quote the line.
Security Note: The eval is not much of a problem here as it is only processing one character at a time. It would be difficult to create an attack based on just one character. Anyway, the usual warning is valid: Always sanitize your input before using this script. Also, most (not quoted) metacharacters of bash will break this script.
$ script.sh qwer(rttt fgfgfg
bash: syntax error near unexpected token `('
I would strongly suggest to do this in another language if possible, it will be a lot easier.
Now, the closest I come up with is:
#!/bin/bash
sentence="AC DC"
words=`echo "$sentence" | tr " " "\n"`
# final array
declare -A result
# word count
wc=0
for i in $words; do
# letter count in the word
lc=0
for l in `echo "$i" | grep -o .`; do
result["w$wc-l$lc"]=$l
lc=$(($lc+1))
done
wc=$(($wc+1))
done
rLen=${#result[@]}
echo "Result Length $rLen"
for i in "${!result[@]}"
do
echo "$i => ${result[$i]}"
done
The above prints:
Result Length 4
w1-l1 => C
w1-l0 => D
w0-l0 => A
w0-l1 => C
Explanation:
Dynamic variables are not supported in bash (ie create variables using variables) so I am using an associative array instead (result)
Arrays in bash are single dimension. To fake a 2D array I use the indexes: w for words and l for letters. This will make further processing a pain...
Associative arrays are not ordered thus results appear in random order when printing
${!result[@]} is used instead of ${result[@]}. The first iterates keys while the second iterates values
I know this is not exactly what you ask for, but I hope it will point you to the right direction
Try this :
sentence="$@"
read -r -a words <<< "$sentence"
for word in "${words[@]}"; do
inc=$(( i++ ))
read -r -a l${inc} <<< $(sed 's/./& /g' <<< $word)
done
echo ${words[1]} # print "CD"
echo ${l1[1]} # print "D"
The first read reads all words, the internal one is for letters.
The sed command add a space after each letters to make the string splittable by read -a. You can also use this sed command to remove unwanted characters from words (eg commas) before splitting.
If special characters are allowed in words, you can use a simple grep instead of the sed command (as suggested in http://www.unixcl.com/2009/07/split-string-to-characters-in-bash.html) :
read -r -a l${inc} <<< $(grep -o . <<< $word)
The words array is ${words[@]}.
The letters arrays are named l# where # is an increment added for each word read.
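For the two-argument case in the question, substring expansion alone is enough, with no eval and no external commands. A minimal sketch, assuming exactly two arguments; the fallback values AB and DC are only there so the example runs without arguments.

```shell
#!/usr/bin/env bash
# Fall back to sample arguments when none are given (illustration only)
(( $# )) || set -- AB DC

a=()
b=()
# ${1:i:1} is the single character of the first argument at offset i
for (( i = 0; i < ${#1}; i++ )); do
    a+=("${1:i:1}")
done
for (( i = 0; i < ${#2}; i++ )); do
    b+=("${2:i:1}")
done

declare -p a b
```

Each array then holds one character per element, which is the layout a tr-like script needs for mapping a[i] to b[i].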

Bash array with spaces in elements

I'm trying to construct an array in bash of the filenames from my camera:
FILES=(2011-09-04 21.43.02.jpg
2011-09-05 10.23.14.jpg
2011-09-09 12.31.16.jpg
2011-09-11 08.43.12.jpg)
As you can see, there is a space in the middle of each filename.
I've tried wrapping each name in quotes, and escaping the space with a backslash, neither of which works.
When I try to access the array elements, it continues to treat the space as the element delimiter.
How can I properly capture the filenames with a space inside the name?
I think the issue might be partly with how you're accessing the elements. If I do a simple for elem in $FILES, I experience the same issue as you. However, if I access the array through its indices, like so, it works if I add the elements either numerically or with escapes:
for ((i = 0; i < ${#FILES[@]}; i++))
do
echo "${FILES[$i]}"
done
Any of these declarations of $FILES should work:
FILES=(2011-09-04\ 21.43.02.jpg
2011-09-05\ 10.23.14.jpg
2011-09-09\ 12.31.16.jpg
2011-09-11\ 08.43.12.jpg)
or
FILES=("2011-09-04 21.43.02.jpg"
"2011-09-05 10.23.14.jpg"
"2011-09-09 12.31.16.jpg"
"2011-09-11 08.43.12.jpg")
or
FILES[0]="2011-09-04 21.43.02.jpg"
FILES[1]="2011-09-05 10.23.14.jpg"
FILES[2]="2011-09-09 12.31.16.jpg"
FILES[3]="2011-09-11 08.43.12.jpg"
There must be something wrong with the way you access the array's items. Here's how it's done:
for elem in "${files[@]}"
...
From the bash manpage:
Any element of an array may be referenced using ${name[subscript]}. ... If subscript is @ or *, the word expands to all members of name. These subscripts differ only when the word appears within double quotes. If the word is double-quoted, ${name[*]} expands to a single word with the value of each array member separated by the first character of the IFS special variable, and ${name[@]} expands each element of name to a separate word.
Of course, you should also use double quotes when accessing a single member
cp "${files[0]}" /tmp
You need to use IFS to stop space as element delimiter.
FILES=("2011-09-04 21.43.02.jpg"
"2011-09-05 10.23.14.jpg"
"2011-09-09 12.31.16.jpg"
"2011-09-11 08.43.12.jpg")
IFS=""
for jpg in ${FILES[*]}
do
echo "${jpg}"
done
If you want to separate on basis of . then just do IFS="."
Hope it helps you:)
I agree with others that it's likely how you're accessing the elements that is the problem. Quoting the file names in the array assignment is correct:
FILES=(
"2011-09-04 21.43.02.jpg"
"2011-09-05 10.23.14.jpg"
"2011-09-09 12.31.16.jpg"
"2011-09-11 08.43.12.jpg"
)
for f in "${FILES[@]}"
do
echo "$f"
done
Using double quotes around any array of the form "${FILES[@]}" splits the array into one word per array element. It doesn't do any word-splitting beyond that.
Using "${FILES[*]}" also has a special meaning, but it joins the array elements with the first character of $IFS, resulting in one word, which is probably not what you want.
Using a bare ${array[@]} or ${array[*]} subjects the result of that expansion to further word-splitting, so you'll end up with words split on spaces (and anything else in $IFS) instead of one word per array element.
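The four forms described above can be compared by counting how many words each one produces. A small sketch with the default IFS; count_args is a hypothetical helper that just prints its argument count.

```shell
#!/usr/bin/env bash
# count_args is an illustrative helper: it prints the number of arguments it received
count_args() { echo "$#"; }

array=("a b" "c")

echo "bare @:   $(count_args ${array[@]})"
echo "bare *:   $(count_args ${array[*]})"
echo "quoted @: $(count_args "${array[@]}")"
echo "quoted *: $(count_args "${array[*]}")"
```

Only the quoted @ form yields one word per element (2 here); the bare forms split "a b" in two, and the quoted * form joins everything into a single word.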
Using a C-style for loop is also fine and avoids worrying about word-splitting if you're not clear on it:
for (( i = 0; i < ${#FILES[@]}; i++ ))
do
echo "${FILES[$i]}"
done
If you had your array like this:
#!/bin/bash
Unix[0]='Debian'
Unix[1]="Red Hat"
Unix[2]='Ubuntu'
Unix[3]='Suse'
for i in $(echo ${Unix[@]});
do echo $i;
done
You would get:
Debian
Red
Hat
Ubuntu
Suse
The loop breaks the elements at the spaces and treats each piece as an individual item, even if you surround it with quotes, because the unquoted expansion inside $(echo ...) is subject to word splitting.
To get around this, instead of iterating over the elements of the array, you iterate over the indexes and use each index to fetch the full string that's wrapped in quotes.
It must be wrapped in quotes!
#!/bin/bash
Unix[0]='Debian'
Unix[1]='Red Hat'
Unix[2]='Ubuntu'
Unix[3]='Suse'
for i in $(echo ${!Unix[@]});
do echo "${Unix[$i]}";
done
Then you'll get:
Debian
Red Hat
Ubuntu
Suse
This was already answered above, but that answer was a bit terse and the man page excerpt is a bit cryptic. I wanted to provide a fully worked example to demonstrate how this works in practice.
If not quoted, an array just expands to strings separated by spaces, so that
for file in ${FILES[@]}; do
expands to
for file in 2011-09-04 21.43.02.jpg 2011-09-05 10.23.14.jpg 2011-09-09 12.31.16.jpg 2011-09-11 08.43.12.jpg ; do
But if you quote the expansion, bash adds double quotes around each term, so that:
for file in "${FILES[@]}"; do
expands to
for file in "2011-09-04 21.43.02.jpg" "2011-09-05 10.23.14.jpg" "2011-09-09 12.31.16.jpg" "2011-09-11 08.43.12.jpg" ; do
The simple rule of thumb is to always use [@] instead of [*] and quote array expansions if you want spaces preserved.
To elaborate on this a little further, the man page in the other answer is explaining that if unquoted, $* and $@ behave the same way, but they are different when quoted. So, given
array=(a b c)
Then $* and $@ both expand to
a b c
and "$*" expands to
"a b c"
and "$@" expands to
"a" "b" "c"
Not exactly an answer to the quoting/escaping problem of the original question but probably something that would actually have been more useful for the op:
unset FILES
for f in 2011-*.jpg; do FILES+=("$f"); done
echo "${FILES[@]}"
Where of course the expression would have to be adapted to the specific requirement (e.g. *.jpg for all or 2001-09-11*.jpg for only the pictures of a certain day).
For those who prefer to set the array in one line instead of using a for loop,
temporarily changing IFS to a newline can save you from escaping.
OLD_IFS="$IFS"
IFS=$'\n'
array=( $(ls *.jpg) ) #save the hassle to construct filename
IFS="$OLD_IFS"
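A glob assigned directly to the array gives the same result without touching IFS or parsing ls output. A minimal sketch; the temporary directory and the two file names are hypothetical test data.

```shell
#!/usr/bin/env bash
# Hypothetical test data: two file names containing spaces
tmpdir=$(mktemp -d)
touch "$tmpdir/2011-09-04 21.43.02.jpg" "$tmpdir/2011-09-05 10.23.14.jpg"

shopt -s nullglob            # an unmatched glob expands to nothing instead of itself
FILES=("$tmpdir"/*.jpg)      # each matching path becomes exactly one element
shopt -u nullglob

echo "${#FILES[@]} files"
for f in "${FILES[@]}"; do
    echo "${f##*/}"          # strip the directory part for display
done

rm -rf "$tmpdir"
```

Pathname expansion results are not word-split, so spaces (and even newlines) in file names are preserved.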
Escaping works.
#!/bin/bash
FILES=(2011-09-04\ 21.43.02.jpg
2011-09-05\ 10.23.14.jpg
2011-09-09\ 12.31.16.jpg
2011-09-11\ 08.43.12.jpg)
echo ${FILES[0]}
echo ${FILES[1]}
echo ${FILES[2]}
echo ${FILES[3]}
Output:
$ ./test.sh
2011-09-04 21.43.02.jpg
2011-09-05 10.23.14.jpg
2011-09-09 12.31.16.jpg
2011-09-11 08.43.12.jpg
Quoting the strings also produces the same output.
#! /bin/bash
renditions=(
"640x360 80k 60k"
"1280x720 320k 128k"
"1280x720 320k 128k"
)
for z in "${renditions[#]}"; do
echo "$z"
done
OUTPUT
640x360 80k 60k
1280x720 320k 128k
1280x720 320k 128k
Another solution is using a "while" loop instead of a "for" loop:
index=0
while [ ${index} -lt ${#Array[@]} ]
do
echo ${Array[${index}]}
index=$(( $index + 1 ))
done
If you aren't stuck on using bash, different handling of spaces in file names is one of the benefits of the fish shell. Consider a directory which contains two files: "a b.txt" and "b c.txt". Here's a reasonable guess at processing a list of files generated from another command with bash, but it fails because of the spaces in the file names, just as you experienced:
# bash
$ for f in $(ls *.txt); { echo $f; }
a
b.txt
b
c.txt
With fish, the syntax is nearly identical, but the result is what you'd expect:
# fish
for f in (ls *.txt); echo $f; end
a b.txt
b c.txt
It works differently because fish splits the output of commands on newlines, not spaces.
If you have a case where you do want to split on spaces instead of newlines, fish has a very readable syntax for that:
for f in (ls *.txt | string split " "); echo $f; end
If the elements of FILES come from another file whose file names are line-separated like this:
2011-09-04 21.43.02.jpg
2011-09-05 10.23.14.jpg
2011-09-09 12.31.16.jpg
2011-09-11 08.43.12.jpg
then try this so that the whitespaces in the file names aren't regarded as delimiters:
while read -r line; do
FILES+=("$line")
done < ./files.txt
If they come from another command, you need to rewrite the last line like this:
while read -r line; do
FILES+=("$line")
done < <(./output-files.sh)
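With bash 4 or later, the mapfile builtin can replace the while read loop. A short sketch; the here-document stands in for files.txt or for the output of a command.

```shell
#!/usr/bin/env bash
# The here-document is a stand-in for files.txt (bash 4+ is assumed for mapfile)
mapfile -t FILES <<'EOF'
2011-09-04 21.43.02.jpg
2011-09-05 10.23.14.jpg
EOF

# -t strips the trailing newline from each line; one line becomes one element
for f in "${FILES[@]}"; do
    printf '<%s>\n' "$f"
done
```

To read from a command instead, redirect from a process substitution, for example mapfile -t FILES < <(./output-files.sh).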
I used to reset the IFS value and rollback when done.
# backup IFS value
O_IFS=$IFS
# reset IFS value
IFS=""
FILES=(
"2011-09-04 21.43.02.jpg"
"2011-09-05 10.23.14.jpg"
"2011-09-09 12.31.16.jpg"
"2011-09-11 08.43.12.jpg"
)
for file in ${FILES[@]}; do
echo ${file}
done
# rollback IFS value
IFS=${O_IFS}
Possible output from the loop:
2011-09-04 21.43.02.jpg
2011-09-05 10.23.14.jpg
2011-09-09 12.31.16.jpg
2011-09-11 08.43.12.jpg

Resources