Converting bash array into string with - in between [duplicate]

This question already has answers here:
How can I join elements of an array in Bash?
(34 answers)
Closed 4 months ago.
How do I convert an array into a string with a dash '-' between the elements in bash? For example, I have this array:
arr=(a b c d e)
I want to convert it into a string "a-b-c-d".
I figured out this "a-b-c-d-e," but there is an unwanted dash at the end. Is there an efficient way of doing this?
Thanks

This is where the "${arr[*]}" expansion form is useful (note the double quotes and the * index). It joins the array elements using the first character of the IFS variable:
arr=(a b c d e)
joined=$(IFS='-'; echo "${arr[*]}")
declare -p joined
# => declare -- joined="a-b-c-d-e"
But if you want all but the last element, you'll combine this with the ${var:offset:length} expansion:
joined=$(IFS='-'; echo "${arr[*]:0: ${#arr[@]} - 1}")
declare -p joined
# => declare -- joined="a-b-c-d"
This one's a little trickier. The offset and length parts of that expansion are arithmetic expressions. I'm calculating the length as "the number of elements in the array minus one".
Note how I'm defining IFS inside the command substitution parentheses: this is overriding the variable in the subshell of the command substitution, so it won't affect the IFS variable in your current shell.
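As a minimal sketch of the same idea wrapped in a reusable function: join_by is a hypothetical helper name (not from the answer above), and it relies on "$*" joining the positional parameters with the first character of IFS, just as "${arr[*]}" does for arrays. Declaring IFS as local keeps the change from leaking into the calling shell.

```shell
#!/usr/bin/env bash

# join_by SEP ELEMENT... (hypothetical helper, for illustration only)
join_by() {
  local IFS="$1"        # local: the caller's IFS is left untouched
  shift
  printf '%s\n' "$*"    # "$*" joins the arguments with the first char of IFS
}

arr=(a b c d e)

joined=$(join_by - "${arr[@]}")
all_but_last=$(join_by - "${arr[@]:0:${#arr[@]}-1}")

echo "$joined"
echo "$all_but_last"
```

Since only the first character of IFS is used, this sketch handles single-character separators only.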

Using awk, if you want to remove the last entry:
$ awk -vOFS=- '{NF--;$1=$1}1' <<<"${arr[@]}"
a-b-c-d
or the complete array:
$ awk -vOFS=- '{$1=$1}1' <<<"${arr[@]}"
a-b-c-d-e

Related

bash dynamically reference dynamically created variables [duplicate]

This question already has answers here:
Dynamic variable names in Bash
(19 answers)
Closed 4 years ago.
I have the variable $foo="something" and would like to use:
bar="foo"; echo $($bar)
to get "something" echoed.
In bash, you can use ${!variable} to use variable variables.
foo="something"
bar="foo"
echo "${!bar}"
# something
eval echo \"\$$bar\" would do it.
The accepted answer is great. However, @Edison asked how to do the same for arrays. The trick is that you want your variable to hold the "[@]", so that the array is expanded with the "!". Check out this function to dump variables:
$ function dump_variables() {
for var in "$@"; do
echo "$var=${!var}"
done
}
$ STRING="Hello World"
$ ARRAY=("ab" "cd")
$ dump_variables STRING ARRAY 'ARRAY[@]'
This outputs:
STRING=Hello World
ARRAY=ab
ARRAY[@]=ab cd
When given as just ARRAY, only the first element is shown, as that's what is expanded by the !. By giving the ARRAY[@] format, you get the array and all its values expanded.
To make it more clear how to do it with arrays:
arr=( 'a' 'b' 'c' )
# construct a var assigning the string representation
# of the variable (array) as its value:
var='arr[@]'
echo "${!var}"
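A different technique from the ${!var} indirection shown above is a nameref, which bash 4.3 and later provide through declare -n. The sketch below assumes such a bash version; the names name, ref, count, and second are chosen only for illustration.

```shell
#!/usr/bin/env bash

arr=('a' 'b' 'c')
name=arr                  # a variable holding the array's name

declare -n ref="$name"    # ref now acts as an alias for arr (bash 4.3+)

count=${#ref[@]}          # number of elements, read through the alias
second=${ref[1]}          # element at index 1
ref+=('d')                # writes through the alias modify arr itself

echo "$count $second ${arr[3]}"
```

Unlike ${!var}, a nameref also allows assignment and element counting through the alias, which can be convenient when a function has to fill an array chosen by its caller.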

Reading several files into an associative array in bash (>4.0) [duplicate]

This question already has answers here:
How to pipe input to a Bash while loop and preserve variables after loop ends
(3 answers)
Closed 4 years ago.
I am new to associative arrays in bash, so please forgive me if I sound silly somewhere. Let's say I am reading through a large file and using a bash (version = 4.2.46) associative array to store FDR values for genes. For one file, I am simply doing:
declare -A array
while read ID GeneID geneSymbol chr strand exonStart_0base exonEnd upstreamES upstreamEE downstreamES downstreamEE ID IJC_SAMPLE_1 SJC_SAMPLE_1 IJC_SAMPLE_2 SJC_SAMPLE_2 IncFormLen SkipFormLen PValue FDR IncLevel1 IncLevel2 IncLevelDifference; do
array[$geneSymbol]="${array[$geneSymbol]}${array[$geneSymbol]:+,}$FDR" ;
done < input.txt
Which will store the FDR values that I can print by doing
for key in "${!array[@]}"; do echo "$key->${array[$key]}"; done
# Prints out
"ABHD14B"->0.285807588279,0.898327660004,0.820468496328
"DHFR"->0.464931314555,0.449582575347
...
I naively tried to read several files into my array by doing
declare -A array
find ./aligned.filtered/rMAT*/MATS_output/SE.MATS.JunctionCountOnly.txt -type f -exec cat {} + |
while read ID GeneID geneSymbol chr strand exonStart_0base exonEnd upstreamES upstreamEE downstreamES downstreamEE ID IJC_SAMPLE_1 SJC_SAMPLE_1 IJC_SAMPLE_2 SJC_SAMPLE_2 IncFormLen SkipFormLen PValue FDR IncLevel1 IncLevel2 IncLevelDifference;
do array[$geneSymbol]="${array[$geneSymbol]}${array[$geneSymbol]:+,}$FDR" ;
done
But in this case my array ends up being empty. I can of course cat all the files I need and save them into a single file that I can use as above, but it would be nice to know how to make an associative array to store data from several distinct files.
Thank you very much!
You probably shouldn't be doing this in bash in the first place, but your main problem is that the while loop runs in a subshell induced by the pipeline. Use process substitution to invert the relationship.
(Also, don't give names to all the fields you don't actually use; just split the line into an indexed array and pick out the two fields you actually want.)
while read -a fields; do
geneSymbol=${fields[2]} # third column, so zero-based index 2
FDR=${fields[...]} # some number; i'm not counting
array[$geneSymbol]="${array[$geneSymbol]}${array[$geneSymbol]:+,}$FDR"
done < <(find ./aligned.filtered/rMAT*/MATS_output/SE.MATS.JunctionCountOnly.txt -type f -exec cat {} +)
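The subshell effect described above can be seen in isolation with a counter. This is a small sketch assuming bash with default settings (in particular, the lastpipe option is not enabled); the variable names are made up for the demonstration.

```shell
#!/usr/bin/env bash

# Pipeline: the while loop runs in a subshell, so its increments are lost.
count=0
printf 'a\nb\n' | while read -r line; do count=$((count + 1)); done
piped_count=$count

# Process substitution: the loop runs in the current shell.
count=0
while read -r line; do count=$((count + 1)); done < <(printf 'a\nb\n')
procsub_count=$count

echo "pipeline: $piped_count, process substitution: $procsub_count"
```

With those assumptions the first counter stays at 0 while the second reaches 2, which is the same reason the associative array in the question ends up empty.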
find probably isn't necessary; just put your while loop inside a for loop:
for f in ./aligned.filtered/rMAT*/MATS_output/SE.MATS.JunctionCountOnly.txt; do
while read -a fields; do
...
done < "$f"
done
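Putting the for loop and the read -a loop together, a self-contained sketch might look like the following. The four-column toy files, the temporary directory, and the name fdr_by_gene are assumptions made for the demonstration; the real rMATS files have many more columns, so the field indices would differ.

```shell
#!/usr/bin/env bash

declare -A fdr_by_gene    # must be declared before the loop fills it

# Toy input (assumed layout): ID GeneID geneSymbol FDR
tmpdir=$(mktemp -d)
printf 'id1 G1 DHFR 0.46\nid2 G2 ABHD14B 0.28\n' > "$tmpdir/a.txt"
printf 'id3 G1 DHFR 0.44\n' > "$tmpdir/b.txt"

for f in "$tmpdir"/*.txt; do
  while read -r -a fields; do
    gene=${fields[2]}     # third column in the toy layout
    fdr=${fields[3]}      # fourth column in the toy layout
    # Append with a comma only when the key already has a value.
    fdr_by_gene[$gene]="${fdr_by_gene[$gene]}${fdr_by_gene[$gene]:+,}$fdr"
  done < "$f"
done

rm -r "$tmpdir"

for key in "${!fdr_by_gene[@]}"; do
  echo "$key->${fdr_by_gene[$key]}"
done
```

Because each file is redirected into the loop instead of piped, the loop body runs in the current shell and the array keeps its contents after the loops finish.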

How can I sort array in bash? [duplicate]

This question already has answers here:
How to sort an array in Bash
(20 answers)
Closed 4 years ago.
For example, I want to write a script that gets strings as arguments, inserts them into an array array_of_args, and then sorts this array to produce a sorted array.
How can I do it?
I thought of sorting the array (and printing it to stdout) in the following way:
array_of_args=("$@")
# sort_array=()
# i=0
for string in "${array_of_args[@]}"; do
echo "${string}"
done | sort
But I don't know how to insert the sorted values into an array (sort_array).
You can use the following script to sort input arguments that may contain whitespace, newlines, glob characters or any other special characters:
#!/usr/bin/env bash
args=("$@") # create an array of arguments
sarr=() # initialise an array for sorted arguments
# use printf '%s\0' "${args[@]}" | sort -z to print each argument delimited
# by NUL character and sort it
# iterate through sorted arguments and add it in sorted array
if (( $# )); then
while IFS= read -rd '' el; do
sarr+=("$el")
done < <(printf '%s\0' "${args[@]}" | sort -z)
fi
# examine sorted array
declare -p sarr
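On bash 4.4 or later, the read loop above could be replaced by mapfile with a NUL delimiter. This is a sketch of that variation, not the answer's own method; it assumes bash 4.4+ for mapfile -d and a sort that supports -z (for example GNU sort), and the sample arguments are invented.

```shell
#!/usr/bin/env bash

args=("pear" "apple pie" "banana")   # sample input, stands in for "$@"
sorted=()

if (( ${#args[@]} )); then
  # -d '' splits on NUL, -t strips the delimiter from each element
  mapfile -d '' -t sorted < <(printf '%s\0' "${args[@]}" | sort -z)
fi

declare -p sorted
```

The guard on the element count matters because printf '%s\0' with no arguments still prints one NUL, which would otherwise produce a single empty element.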

Bash, split words into letters and save to array

I'm struggling with a project. I am supposed to write a bash script which works like the tr command. To begin with, I would like to save all of the command's arguments into separate arrays. If an argument is a word, I would like to have each character in a separate array element, e.g.
tr_mine AB DC
I would like to have two arrays: a[0]=A, a[1]=B and b[0]=D, b[1]=C.
I found a way, but it's not working:
IFS="" read -r -a array <<< "$a"
No sed, no awk, all bash internals.
Assuming that words are always separated with blanks (space and/or tabs),
also assuming that words are given as arguments, and writing for bash only:
#!/bin/bash
blank=$'[ \t]'
varname='A'
n=1
while IFS='' read -r -d '' -N 1 c ; do
if [[ $c =~ $blank ]]; then n=$((n+1)); continue; fi
eval ${varname}${n}'+=("'"$c"'")'
done <<<"$@"
last=$(eval echo \${#${varname}${n}[@]}) ### Find last character index.
unset "${varname}${n}[$last-1]" ### Remove last (trailing) newline.
for ((j=1;j<=$n;j++)); do
k="A$j[@]"
printf '<%s> ' "${!k}"; echo
done
That will set each array A1, A2, A3, etc. ... to the letters of each word.
The value at the end of the first loop of $n is the count of words processed.
Printing may be a little tricky, that is why the code to access each letter is given above.
Applied to your sample text:
$ script.sh AB DC
<A> <B>
<D> <C>
The script is setting two (array) vars A1 and A2.
And each letter is one array element: A1[0]=A, A1[1]=B and A2[0]=D, A2[1]=C.
You need to set a variable ($k) to the array element to access.
For example, to echo fourth letter (0 based) of second word (1 based) you need to do (that may be changed if needed):
k="A2[3]"; echo "${!k}" ### Indirect addressing.
The script will work as this:
$ script.sh ABCD efghi
<A> <B> <C> <D>
<e> <f> <g> <h> <i>
Caveat: Characters will be split even if quoted. However, quoting the arguments is the correct way to use this script to avoid the effect of shell metacharacters ( |,&,;,(,),<,>,space,tab ). Of course, spaces (even if repeated) will split words as defined by the variable $blank:
$ script.sh $'qwer;rttt fgf\ngfg'
<q> <w> <e> <r> <;> <r> <t> <t> <t>
<>
<>
<>
<f> <g> <f> <
> <g> <f> <g>
As the script will accept and correctly process embedded newlines, we need to use unset "${varname}${n}[$last-1]" to remove the last trailing "newline". If that is not desired, quote the line.
Security Note: The eval is not much of a problem here as it is only processing one character at a time. It would be difficult to create an attack based on just one character. Anyway, the usual warning is valid: Always sanitize your input before using this script. Also, most (not quoted) metacharacters of bash will break this script.
$ script.sh qwer(rttt fgfgfg
bash: syntax error near unexpected token `('
I would strongly suggest doing this in another language if possible; it will be a lot easier.
Now, the closest I came up with is:
#!/bin/bash
sentence="AC DC"
words=`echo "$sentence" | tr " " "\n"`
# final array
declare -A result
# word count
wc=0
for i in $words; do
# letter count in the word
lc=0
for l in `echo "$i" | grep -o .`; do
result["w$wc-l$lc"]=$l
lc=$(($lc+1))
done
wc=$(($wc+1))
done
rLen=${#result[@]}
echo "Result Length $rLen"
for i in "${!result[@]}"
do
echo "$i => ${result[$i]}"
done
The above prints:
Result Length 4
w1-l1 => C
w1-l0 => D
w0-l0 => A
w0-l1 => C
Explanation:
Dynamic variables are not supported in bash (ie create variables using variables) so I am using an associative array instead (result)
Arrays in bash are single dimension. To fake a 2D array I use the indexes: w for words and l for letters. This will make further processing a pain...
Associative arrays are not ordered thus results appear in random order when printing
${!result[@]} is used instead of ${result[@]}. The first iterates over keys while the second iterates over values
I know this is not exactly what you asked for, but I hope it will point you in the right direction
Try this :
sentence="$@"
read -r -a words <<< "$sentence"
for word in ${words[@]}; do
inc=$(( i++ ))
read -r -a l${inc} <<< $(sed 's/./& /g' <<< $word)
done
echo ${words[1]} # print "CD"
echo ${l1[1]} # print "D"
The first read reads all words, the internal one is for letters.
The sed command adds a space after each letter to make the string splittable by read -a. You can also use this sed command to remove unwanted characters from words (e.g. commas) before splitting.
If special characters are allowed in words, you can use a simple grep instead of the sed command (as suggested in http://www.unixcl.com/2009/07/split-string-to-characters-in-bash.html) :
read -r -a l${inc} <<< $(grep -o . <<< $word)
The words array is ${words[@]}.
The letters arrays are named l# where # is an increment added for each word read.
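The answers above rely on eval, sed, or grep to split a word. As an alternative sketch using only bash builtins, substring expansion ${word:i:1} can pick out one character at a time, and a nameref (bash 4.3+) can fill an array chosen by the caller. split_chars is a hypothetical helper name; the target array must not itself be called out, since that would collide with the nameref.

```shell
#!/usr/bin/env bash

# split_chars ARRAY_NAME WORD (hypothetical helper, for illustration only)
split_chars() {
  local -n out=$1       # nameref to the caller's array (bash 4.3+)
  local word=$2 i
  out=()
  for (( i = 0; i < ${#word}; i++ )); do
    out+=("${word:i:1}")   # one character starting at offset i
  done
}

split_chars a "AB"
split_chars b "DC"

declare -p a b
```

With the question's example arguments, a ends up holding A and B, and b holds D and C. No eval is involved, so metacharacters inside a quoted argument are stored as ordinary characters.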

How to parse only selected column values using awk

I have a sample flat file which contains the following block
test my array which array is better array huh got it?
INDIA USA SA NZ AUS ARG ARM ARZ GER BRA SPN
I also have an array(ksh_arr2) which was defined like this
ksh_arr2=$(awk '{if(NR==1){for(i=1;i<=NF;i++){if($i~/^arr/){print i}}}}' testUnix.txt)
and contains the following integers
3 5 8
Now I want to parse only those column values which are at the respective numbered positions i.e. third fifth and eighth.
I also want the outputs from the 2nd line onwards.
So I tried the following
awk '{for(i=1;i<=NF;i++){if(NR >=1 && i=${ksh_arr2[i]}) do print$i ; done}}' testUnix.txt
but it is apparently not printing the desired outputs.
What am I missing ? Please help.
How I would approach it:
awk -vA="${ksh_arr2[*]}" 'BEGIN{split(A,B," ")}{for(i in B)print $B[i]}' file
Explanation
-vA="${ksh_arr2[*]}" - Set variable A to expanded ksh array
'BEGIN{split(A,B," ") - Splits the expanded array on spaces
(effectively recreating it in awk)
{for(i in B)print $B[i]} - Index in the new array print the field that is the number
contained in that index
Edit
If you want to preserve the order of the fields when printing then this would be better
awk -vA="${ksh_arr2[*]}" 'BEGIN{split(A,B," ")}{i=0;while(++i<=length(B))print $B[i]}' file
Since no sample output is shown, I don't know if this output is what you want. It is the output one gets from the code provided with the minimal changes required to get it to run:
$ awk -v k='3 5 8' 'BEGIN{split(k,a," ");} {for(i=1;i<=length(a);i++){print $a[i]}}' testUnix.txt
array
array
array
SA
AUS
ARZ
The above code prints out the selected columns in the same order supplied by the variable k.
Notes
The awk code never defined ksh_arr2. I presume that the value of this array was to be passed in from the shell. It is done here using the -v option to set the variable k to the value of ksh_arr2.
It is not possible to pass into awk an array directly. It is possible to pass in a string, as above, and then convert it to an array using the split function. Above the string k is converted to the awk array a.
awk syntax is different from shell syntax. For instance, awk does not use do or done.
Details
-v k='3 5 8'
This defines an awk variable k. To do this programmatically, replace 3 5 8 with a string or array from the shell.
BEGIN{split(k,a," ");}
This converts the space-separated values in variable k into an array named a.
for(i=1;i<=length(a);i++){print $a[i]}
This prints out each column in array a in order.
Alternate Output
If you want to keep the output from each line on a single line:
$ awk -v k='3 5 8' 'BEGIN{split(k,a," ");} {for(i=1;i<length(a);i++) printf "%s ",$a[i]; print $a[length(a)]}' testUnix.txt
array array array
SA AUS ARZ
awk 'NR>=1 { print $3 " " $5 " " $8 }' testUnix.txt
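Combining the pieces discussed above (passing the column list with -v, rebuilding it with split, and starting from the second line), one possible sketch is shown below. It assumes the column numbers live in a shell array called cols and recreates the sample file in a temporary location; both names are illustrative only.

```shell
#!/usr/bin/env bash

cols=(3 5 8)    # column positions, assumed to be known already

# Recreate the sample file from the question in a temporary location.
tmpfile=$(mktemp)
printf '%s\n' \
  'test my array which array is better array huh got it?' \
  'INDIA USA SA NZ AUS ARG ARM ARZ GER BRA SPN' > "$tmpfile"

result=$(awk -v k="${cols[*]}" '
  BEGIN { n = split(k, a, " ") }        # n holds the number of columns
  NR >= 2 {                             # skip the header line
    line = $a[1]
    for (i = 2; i <= n; i++) line = line " " $a[i]
    print line
  }' "$tmpfile")

rm -f "$tmpfile"
echo "$result"
```

Keeping the count returned by split in n avoids calling length on an array, which not every awk implementation accepts.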

Resources