Bash, split words into letters and save to array - arrays

I'm struggling with a project. I am supposed to write a bash script which will work like tr command. At the beginning I would like to save all commands arguments into separated arrays. And in case if an argument is a word I would like to have each char in separated array field,eg.
tr_mine AB DC
I would like to have two arrays: a[0] = A, a[1] = B and b[0]=C b[1]=D.
I found a way, but it's not working:
IFS="" read -r -a array <<< "$a"

No sed, no awk, all bash internals.
Assuming that words are always separated with blanks (space and/or tabs),
also assuming that words are given as arguments, and writing for bash only:
#!/bin/bash
blank=$'[ \t]'
varname='A'
n=1
while IFS='' read -r -d '' -N 1 c ; do
if [[ $c =~ $blank ]]; then n=$((n+1)); continue; fi
eval ${varname}${n}'+=("'"$c"'")'
done <<<"$#"
last=$(eval echo \${#${varname}${n}[#]}) ### Find last character index.
unset "${varname}${n}[$last-1]" ### Remove last (trailing) newline.
for ((j=1;j<=$n;j++)); do
k="A$j[#]"
printf '<%s> ' "${!k}"; echo
done
That will set each array A1, A2, A3, etc. ... to the letters of each word.
The value at the end of the first loop of $n is the count of words processed.
Printing may be a little tricky, that is why the code to access each letter is given above.
Applied to your sample text:
$ script.sh AB DC
<A> <B>
<D> <C>
The script is setting two (array) vars A1 and A2.
And each letter is one array element: A1[0] = A, A1[1] = B and A2[0]=C, A2[1]=D.
You need to set a variable ($k) to the array element to access.
For example, to echo fourth letter (0 based) of second word (1 based) you need to do (that may be changed if needed):
k="A2[3]"; echo "${!k}" ### Indirect addressing.
The script will work as this:
$ script.sh ABCD efghi
<A> <B> <C> <D>
<e> <f> <g> <h> <i>
Caveat: Characters will be split even if quoted. However, quoted arguments is the correct way to use this script to avoid the effect of shell metacharacters ( |,&,;,(,),<,>,space,tab ). Of course, spaces (even if repeated) will split words as defined by the variable $blank:
$ script.sh $'qwer;rttt fgf\ngfg'
<q> <w> <e> <r> <;> <r> <t> <t> <t>
<>
<>
<>
<f> <g> <f> <
> <g> <f> <g>
As the script will accept and correctly process embebed newlines we need to use: unset "${varname}${n}[$last-1]" to remove the last trailing "newline". If that is not desired, quote the line.
Security Note: The eval is not much of a problem here as it is only processing one character at a time. It would be difficult to create an attack based on just one character. Anyway, the usual warning is valid: Always sanitize your input before using this script. Also, most (not quoted) metacharacters of bash will break this script.
$ script.sh qwer(rttt fgfgfg
bash: syntax error near unexpected token `('

I would strongly suggest to do this in another language if possible, it will be a lot easier.
Now, the closest I come up with is:
#!/bin/bash
sentence="AC DC"
words=`echo "$sentence" | tr " " "\n"`
# final array
declare -A result
# word count
wc=0
for i in $words; do
# letter count in the word
lc=0
for l in `echo "$i" | grep -o .`; do
result["w$wc-l$lc"]=$l
lc=$(($lc+1))
done
wc=$(($wc+1))
done
rLen=${#result[#]}
echo "Result Length $rLen"
for i in "${!result[#]}"
do
echo "$i => ${result[$i]}"
done
The above prints:
Result Length 4
w1-l1 => C
w1-l0 => D
w0-l0 => A
w0-l1 => C
Explanation:
Dynamic variables are not supported in bash (ie create variables using variables) so I am using an associative array instead (result)
Arrays in bash are single dimension. To fake a 2D array I use the indexes: w for words and l for letters. This will make further processing a pain...
Associative arrays are not ordered thus results appear in random order when printing
${!result[#]} is used instead of ${result[#]}. The first iterates keys while the second iterates values
I know this is not exactly what you ask for, but I hope it will point you to the right direction

Try this :
sentence="$#"
read -r -a words <<< "$sentence"
for word in ${words[#]}; do
inc=$(( i++ ))
read -r -a l${inc} <<< $(sed 's/./& /g' <<< $word)
done
echo ${words[1]} # print "CD"
echo ${l1[1]} # print "D"
The first read reads all words, the internal one is for letters.
The sed command add a space after each letters to make the string splittable by read -a. You can also use this sed command to remove unwanted characters from words (eg commas) before splitting.
If special characters are allowed in words, you can use a simple grep instead of the sed command (as suggested in http://www.unixcl.com/2009/07/split-string-to-characters-in-bash.html) :
read -r -a l${inc} <<< $(grep -o . <<< $word)
The word array is ${w}.
The letters arrays are named l# where # is an increment added for each word read.

Related

How to create a for loop that reads each character in ksh

I'm unable to find any for loop syntax that works with ksh for me. I'm wanting to create a program that esentially will add up numbers that are assigned to letters when a user inputs a word into the program. I want the for loop to read the first character, grab the number value of the letter, and then add the value to a variable that starts out equaled to 0. Then the next letter will get added to the variable's value and so on until there are no more letters and there will be a variable from the for loop that equals a value.
I understand I more than likely need an array that specifies what a letter's (a-z) value would be (1-26)...which I am finding difficult to figure that out as well. Or worst case I figure out the for loop and then make about 26 if statements saying something like if the letter equals c, add 3 to the variable.
So far I have this (which is pretty bare bones):
#!/bin/ksh
typeset -A my_array
my_array[a]=1
my_array[b]=2
my_array[c]=3
echo "Enter a word: \c"
read work
for (( i=0; i<${work}; i++ )); do
echo "${work:$i:1}"
done
Pretty sure this for loop is bash and not ksh. And the array returns an error of typeset: bad option(s) (I understand I haven't specified the array in the for loop).
I want the array letters (a, b, c, etc) to correspond to a value such as a = 1, b = 2, c = 3 and so on. For example the word is 'abc' and so it would add 1 + 2 + 3 and the final output will be 6.
You were missing the pound in ${#work}, which expands to 'length of ${work}'.
#!/bin/ksh
typeset -A my_array
my_array[a]=1
my_array[b]=2
my_array[c]=3
read 'work?enter a word: '
for (( i=0; i<${#work}; i++ )); do
c=${work:$i:1} w=${my_array[${work:$i:1}]}
echo "$c: $w"
done
ksh2020 supports read -p PROMPT syntax, which would make this script 100% compatible with bash. ksh93 does not. You could also use printf 'enter a word: ', which works in all three (ksh2020, ksh93, bash) and zsh.
Both ksh2020 and ksh93 understand read var?prompt which I used above.
First check the input work, you only want {a..z}.
charwork=$(tr -cd "[a-z]" <<< "$work")
Next you can fill 2 arrays with corresponding values
a=( {a..z} )
b=( {1..26} )
Using these arrays you can make a file with sed commands
for ((i=0;i<26;i++)); do echo "s/${a[i]}/${b[i]}+/g"; done > replaceletters.sed
# test it
echo "abcz" | sed -f replaceletters.sed
# result: 1+2+3+26+
Before you can pipe this into bc, use sed to remove the last + character.
Append s/+$/\n/ to replaceletters.sed and bc can calculate it.
Now you can use sed for replacing letters by digits and insert + signs.
Combining the steps, you have
a=( {a..z} )
b=( {1..26} )
tr -cd "[a-z]" <<< "$work" |
sed -f <(
for ((i=0;i<26;i++)); do
echo "s/${a[i]}/${b[i]}+/g"
done
echo 's/+$/\n/'
) | bc
In the loop you can use $i and avoid array b, but remember that the array start with 0, so a[5] corresponds with 6.

BASH: Sorting Associative array without echoing it [duplicate]

I have an array in Bash, for example:
array=(a c b f 3 5)
I need to sort the array. Not just displaying the content in a sorted way, but to get a new array with the sorted elements. The new sorted array can be a completely new one or the old one.
You don't really need all that much code:
IFS=$'\n' sorted=($(sort <<<"${array[*]}"))
unset IFS
Supports whitespace in elements (as long as it's not a newline), and works in Bash 3.x.
e.g.:
$ array=("a c" b f "3 5")
$ IFS=$'\n' sorted=($(sort <<<"${array[*]}")); unset IFS
$ printf "[%s]\n" "${sorted[#]}"
[3 5]
[a c]
[b]
[f]
Note: #sorontar has pointed out that care is required if elements contain wildcards such as * or ?:
The sorted=($(...)) part is using the "split and glob" operator. You should turn glob off: set -f or set -o noglob or shopt -op noglob or an element of the array like * will be expanded to a list of files.
What's happening:
The result is a culmination six things that happen in this order:
IFS=$'\n'
"${array[*]}"
<<<
sort
sorted=($(...))
unset IFS
First, the IFS=$'\n'
This is an important part of our operation that affects the outcome of 2 and 5 in the following way:
Given:
"${array[*]}" expands to every element delimited by the first character of IFS
sorted=() creates elements by splitting on every character of IFS
IFS=$'\n' sets things up so that elements are expanded using a new line as the delimiter, and then later created in a way that each line becomes an element. (i.e. Splitting on a new line.)
Delimiting by a new line is important because that's how sort operates (sorting per line). Splitting by only a new line is not-as-important, but is needed preserve elements that contain spaces or tabs.
The default value of IFS is a space, a tab, followed by a new line, and would be unfit for our operation.
Next, the sort <<<"${array[*]}" part
<<<, called here strings, takes the expansion of "${array[*]}", as explained above, and feeds it into the standard input of sort.
With our example, sort is fed this following string:
a c
b
f
3 5
Since sort sorts, it produces:
3 5
a c
b
f
Next, the sorted=($(...)) part
The $(...) part, called command substitution, causes its content (sort <<<"${array[*]}) to run as a normal command, while taking the resulting standard output as the literal that goes where ever $(...) was.
In our example, this produces something similar to simply writing:
sorted=(3 5
a c
b
f
)
sorted then becomes an array that's created by splitting this literal on every new line.
Finally, the unset IFS
This resets the value of IFS to the default value, and is just good practice.
It's to ensure we don't cause trouble with anything that relies on IFS later in our script. (Otherwise we'd need to remember that we've switched things around--something that might be impractical for complex scripts.)
Original response:
array=(a c b "f f" 3 5)
readarray -t sorted < <(for a in "${array[#]}"; do echo "$a"; done | sort)
output:
$ for a in "${sorted[#]}"; do echo "$a"; done
3
5
a
b
c
f f
Note this version copes with values that contains special characters or whitespace (except newlines)
Note readarray is supported in bash 4+.
Edit Based on the suggestion by #Dimitre I had updated it to:
readarray -t sorted < <(printf '%s\0' "${array[#]}" | sort -z | xargs -0n1)
which has the benefit of even understanding sorting elements with newline characters embedded correctly. Unfortunately, as correctly signaled by #ruakh this didn't mean the the result of readarray would be correct, because readarray has no option to use NUL instead of regular newlines as line-separators.
If you don't need to handle special shell characters in the array elements:
array=(a c b f 3 5)
sorted=($(printf '%s\n' "${array[#]}"|sort))
With bash you'll need an external sorting program anyway.
With zsh no external programs are needed and special shell characters are easily handled:
% array=('a a' c b f 3 5); printf '%s\n' "${(o)array[#]}"
3
5
a a
b
c
f
ksh has set -s to sort ASCIIbetically.
Here's a pure Bash quicksort implementation:
#!/bin/bash
# quicksorts positional arguments
# return is in array qsort_ret
qsort() {
local pivot i smaller=() larger=()
qsort_ret=()
(($#==0)) && return 0
pivot=$1
shift
for i; do
# This sorts strings lexicographically.
if [[ $i < $pivot ]]; then
smaller+=( "$i" )
else
larger+=( "$i" )
fi
done
qsort "${smaller[#]}"
smaller=( "${qsort_ret[#]}" )
qsort "${larger[#]}"
larger=( "${qsort_ret[#]}" )
qsort_ret=( "${smaller[#]}" "$pivot" "${larger[#]}" )
}
Use as, e.g.,
$ array=(a c b f 3 5)
$ qsort "${array[#]}"
$ declare -p qsort_ret
declare -a qsort_ret='([0]="3" [1]="5" [2]="a" [3]="b" [4]="c" [5]="f")'
This implementation is recursive… so here's an iterative quicksort:
#!/bin/bash
# quicksorts positional arguments
# return is in array qsort_ret
# Note: iterative, NOT recursive! :)
qsort() {
(($#==0)) && return 0
local stack=( 0 $(($#-1)) ) beg end i pivot smaller larger
qsort_ret=("$#")
while ((${#stack[#]})); do
beg=${stack[0]}
end=${stack[1]}
stack=( "${stack[#]:2}" )
smaller=() larger=()
pivot=${qsort_ret[beg]}
for ((i=beg+1;i<=end;++i)); do
if [[ "${qsort_ret[i]}" < "$pivot" ]]; then
smaller+=( "${qsort_ret[i]}" )
else
larger+=( "${qsort_ret[i]}" )
fi
done
qsort_ret=( "${qsort_ret[#]:0:beg}" "${smaller[#]}" "$pivot" "${larger[#]}" "${qsort_ret[#]:end+1}" )
if ((${#smaller[#]}>=2)); then stack+=( "$beg" "$((beg+${#smaller[#]}-1))" ); fi
if ((${#larger[#]}>=2)); then stack+=( "$((end-${#larger[#]}+1))" "$end" ); fi
done
}
In both cases, you can change the order you use: I used string comparisons, but you can use arithmetic comparisons, compare wrt file modification time, etc. just use the appropriate test; you can even make it more generic and have it use a first argument that is the test function use, e.g.,
#!/bin/bash
# quicksorts positional arguments
# return is in array qsort_ret
# Note: iterative, NOT recursive! :)
# First argument is a function name that takes two arguments and compares them
qsort() {
(($#<=1)) && return 0
local compare_fun=$1
shift
local stack=( 0 $(($#-1)) ) beg end i pivot smaller larger
qsort_ret=("$#")
while ((${#stack[#]})); do
beg=${stack[0]}
end=${stack[1]}
stack=( "${stack[#]:2}" )
smaller=() larger=()
pivot=${qsort_ret[beg]}
for ((i=beg+1;i<=end;++i)); do
if "$compare_fun" "${qsort_ret[i]}" "$pivot"; then
smaller+=( "${qsort_ret[i]}" )
else
larger+=( "${qsort_ret[i]}" )
fi
done
qsort_ret=( "${qsort_ret[#]:0:beg}" "${smaller[#]}" "$pivot" "${larger[#]}" "${qsort_ret[#]:end+1}" )
if ((${#smaller[#]}>=2)); then stack+=( "$beg" "$((beg+${#smaller[#]}-1))" ); fi
if ((${#larger[#]}>=2)); then stack+=( "$((end-${#larger[#]}+1))" "$end" ); fi
done
}
Then you can have this comparison function:
compare_mtime() { [[ $1 -nt $2 ]]; }
and use:
$ qsort compare_mtime *
$ declare -p qsort_ret
to have the files in current folder sorted by modification time (newest first).
NOTE. These functions are pure Bash! no external utilities, and no subshells! they are safe wrt any funny symbols you may have (spaces, newline characters, glob characters, etc.).
NOTE2. The test [[ $i < $pivot ]] is correct. It uses the lexicographical string comparison. If your array only contains integers and you want to sort numerically, use ((i < pivot)) instead.
Please don't edit this answer to change that. It has already been edited (and rolled back) a couple of times. The test I gave here is correct and corresponds to the output given in the example: the example uses both strings and numbers, and the purpose is to sort it in lexicographical order. Using ((i < pivot)) in this case is wrong.
tl;dr:
Sort array a_in and store the result in a_out (elements must not have embedded newlines[1]
):
Bash v4+:
readarray -t a_out < <(printf '%s\n' "${a_in[#]}" | sort)
Bash v3:
IFS=$'\n' read -d '' -r -a a_out < <(printf '%s\n' "${a_in[#]}" | sort)
Advantages over antak's solution:
You needn't worry about accidental globbing (accidental interpretation of the array elements as filename patterns), so no extra command is needed to disable globbing (set -f, and set +f to restore it later).
You needn't worry about resetting IFS with unset IFS.[2]
Optional reading: explanation and sample code
The above combines Bash code with external utility sort for a solution that works with arbitrary single-line elements and either lexical or numerical sorting (optionally by field):
Performance: For around 20 elements or more, this will be faster than a pure Bash solution - significantly and increasingly so once you get beyond around 100 elements.
(The exact thresholds will depend on your specific input, machine, and platform.)
The reason it is fast is that it avoids Bash loops.
printf '%s\n' "${a_in[#]}" | sort performs the sorting (lexically, by default - see sort's POSIX spec):
"${a_in[#]}" safely expands to the elements of array a_in as individual arguments, whatever they contain (including whitespace).
printf '%s\n' then prints each argument - i.e., each array element - on its own line, as-is.
Note the use of a process substitution (<(...)) to provide the sorted output as input to read / readarray (via redirection to stdin, <), because read / readarray must run in the current shell (must not run in a subshell) in order for output variable a_out to be visible to the current shell (for the variable to remain defined in the remainder of the script).
Reading sort's output into an array variable:
Bash v4+: readarray -t a_out reads the individual lines output by sort into the elements of array variable a_out, without including the trailing \n in each element (-t).
Bash v3: readarray doesn't exist, so read must be used:
IFS=$'\n' read -d '' -r -a a_out tells read to read into array (-a) variable a_out, reading the entire input, across lines (-d ''), but splitting it into array elements by newlines (IFS=$'\n'. $'\n', which produces a literal newline (LF), is a so-called ANSI C-quoted string).
(-r, an option that should virtually always be used with read, disables unexpected handling of \ characters.)
Annotated sample code:
#!/usr/bin/env bash
# Define input array `a_in`:
# Note the element with embedded whitespace ('a c')and the element that looks like
# a glob ('*'), chosen to demonstrate that elements with line-internal whitespace
# and glob-like contents are correctly preserved.
a_in=( 'a c' b f 5 '*' 10 )
# Sort and store output in array `a_out`
# Saving back into `a_in` is also an option.
IFS=$'\n' read -d '' -r -a a_out < <(printf '%s\n' "${a_in[#]}" | sort)
# Bash 4.x: use the simpler `readarray -t`:
# readarray -t a_out < <(printf '%s\n' "${a_in[#]}" | sort)
# Print sorted output array, line by line:
printf '%s\n' "${a_out[#]}"
Due to use of sort without options, this yields lexical sorting (digits sort before letters, and digit sequences are treated lexically, not as numbers):
*
10
5
a c
b
f
If you wanted numerical sorting by the 1st field, you'd use sort -k1,1n instead of just sort, which yields (non-numbers sort before numbers, and numbers sort correctly):
*
a c
b
f
5
10
[1] To handle elements with embedded newlines, use the following variant (Bash v4+, with GNU sort):
readarray -d '' -t a_out < <(printf '%s\0' "${a_in[#]}" | sort -z).
Michał Górny's helpful answer has a Bash v3 solution.
[2] While IFS is set in the Bash v3 variant, the change is scoped to the command.
By contrast, what follows IFS=$'\n'  in antak's answer is an assignment rather than a command, in which case the IFS change is global.
In the 3-hour train trip from Munich to Frankfurt (which I had trouble to reach because Oktoberfest starts tomorrow) I was thinking about my first post. Employing a global array is a much better idea for a general sort function. The following function handles arbitary strings (newlines, blanks etc.):
declare BSORT=()
function bubble_sort()
{ #
# #param [ARGUMENTS]...
#
# Sort all positional arguments and store them in global array BSORT.
# Without arguments sort this array. Return the number of iterations made.
#
# Bubble sorting lets the heaviest element sink to the bottom.
#
(($# > 0)) && BSORT=("$#")
local j=0 ubound=$((${#BSORT[*]} - 1))
while ((ubound > 0))
do
local i=0
while ((i < ubound))
do
if [ "${BSORT[$i]}" \> "${BSORT[$((i + 1))]}" ]
then
local t="${BSORT[$i]}"
BSORT[$i]="${BSORT[$((i + 1))]}"
BSORT[$((i + 1))]="$t"
fi
((++i))
done
((++j))
((--ubound))
done
echo $j
}
bubble_sort a c b 'z y' 3 5
echo ${BSORT[#]}
This prints:
3 5 a b c z y
The same output is created from
BSORT=(a c b 'z y' 3 5)
bubble_sort
echo ${BSORT[#]}
Note that probably Bash internally uses smart-pointers, so the swap-operation could be cheap (although I doubt it). However, bubble_sort demonstrates that more advanced functions like merge_sort are also in the reach of the shell language.
Another solution that uses external sort and copes with any special characters (except for NULs :)). Should work with bash-3.2 and GNU or BSD sort (sadly, POSIX doesn't include -z).
local e new_array=()
while IFS= read -r -d '' e; do
new_array+=( "${e}" )
done < <(printf "%s\0" "${array[#]}" | LC_ALL=C sort -z)
First look at the input redirection at the end. We're using printf built-in to write out the array elements, zero-terminated. The quoting makes sure array elements are passed as-is, and specifics of shell printf cause it to reuse the last part of format string for each remaining parameter. That is, it's equivalent to something like:
for e in "${array[#]}"; do
printf "%s\0" "${e}"
done
The null-terminated element list is then passed to sort. The -z option causes it to read null-terminated elements, sort them and output null-terminated as well. If you needed to get only the unique elements, you can pass -u since it is more portable than uniq -z. The LC_ALL=C ensures stable sort order independently of locale — sometimes useful for scripts. If you want the sort to respect locale, remove that.
The <() construct obtains the descriptor to read from the spawned pipeline, and < redirects the standard input of the while loop to it. If you need to access the standard input inside the pipe, you may use another descriptor — exercise for the reader :).
Now, back to the beginning. The read built-in reads output from the redirected stdin. Setting empty IFS disables word splitting which is unnecessary here — as a result, read reads the whole 'line' of input to the single provided variable. -r option disables escape processing that is undesired here as well. Finally, -d '' sets the line delimiter to NUL — that is, tells read to read zero-terminated strings.
As a result, the loop is executed once for every successive zero-terminated array element, with the value being stored in e. The example just puts the items in another array but you may prefer to process them directly :).
Of course, that's just one of the many ways of achieving the same goal. As I see it, it is simpler than implementing complete sorting algorithm in bash and in some cases it will be faster. It handles all special characters including newlines and should work on most of the common systems. Most importantly, it may teach you something new and awesome about bash :).
Keep it simple ;)
In the following example, the array b is the sorted version of the array a!
The second line echos each item of the array a, then pipes them to the sort command, and the output is used to initiate the array b.
a=(2 3 1)
b=( $( for x in ${a[#]}; do echo $x; done | sort ) )
echo ${b[#]} # output: 1 2 3
min sort:
#!/bin/bash
array=(.....)
index_of_element1=0
while (( ${index_of_element1} < ${#array[#]} )); do
element_1="${array[${index_of_element1}]}"
index_of_element2=$((index_of_element1 + 1))
index_of_min=${index_of_element1}
min_element="${element_1}"
for element_2 in "${array[#]:$((index_of_element1 + 1))}"; do
min_element="`printf "%s\n%s" "${min_element}" "${element_2}" | sort | head -n+1`"
if [[ "${min_element}" == "${element_2}" ]]; then
index_of_min=${index_of_element2}
fi
let index_of_element2++
done
array[${index_of_element1}]="${min_element}"
array[${index_of_min}]="${element_1}"
let index_of_element1++
done
try this:
echo ${array[#]} | awk 'BEGIN{RS=" ";} {print $1}' | sort
Output will be:
3
5
a
b
c
f
Problem solved.
If you can compute a unique integer for each element in the array, like this:
tab='0123456789abcdefghijklmnopqrstuvwxyz'
# build the reversed ordinal map
for ((i = 0; i < ${#tab}; i++)); do
declare -g ord_${tab:i:1}=$i
done
function sexy_int() {
local sum=0
local i ch ref
for ((i = 0; i < ${#1}; i++)); do
ch="${1:i:1}"
ref="ord_$ch"
(( sum += ${!ref} ))
done
return $sum
}
sexy_int hello
echo "hello -> $?"
sexy_int world
echo "world -> $?"
then, you can use these integers as array indexes, because Bash always use sparse array, so no need to worry about unused indexes:
array=(a c b f 3 5)
for el in "${array[#]}"; do
sexy_int "$el"
sorted[$?]="$el"
done
echo "${sorted[#]}"
Pros. Fast.
Cons. Duplicated elements are merged, and it can be impossible to map contents to 32-bit unique integers.
array=(a c b f 3 5)
new_array=($(echo "${array[#]}" | sed 's/ /\n/g' | sort))
echo ${new_array[#]}
echo contents of new_array will be:
3 5 a b c f
There is a workaround for the usual problem of spaces and newlines:
Use a character that is not in the original array (like $'\1' or $'\4' or similar).
This function gets the job done:
# Sort an Array may have spaces or newlines with a workaround (wa=$'\4')
sortarray(){ local wa=$'\4' IFS=''
if [[ $* =~ [$wa] ]]; then
echo "$0: error: array contains the workaround char" >&2
exit 1
fi
set -f; local IFS=$'\n' x nl=$'\n'
set -- $(printf '%s\n' "${#//$nl/$wa}" | sort -n)
for x
do sorted+=("${x//$wa/$nl}")
done
}
This will sort the array:
$ array=( a b 'c d' $'e\nf' $'g\1h')
$ sortarray "${array[#]}"
$ printf '<%s>\n' "${sorted[#]}"
<a>
<b>
<c d>
<e
f>
<gh>
This will complain that the source array contains the workaround character:
$ array=( a b 'c d' $'e\nf' $'g\4h')
$ sortarray "${array[#]}"
./script: error: array contains the workaround char
description
We set two local variables wa (workaround char) and a null IFS
Then (with ifs null) we test that the whole array $*.
Does not contain any woraround char [[ $* =~ [$wa] ]].
If it does, raise a message and signal an error: exit 1
Avoid filename expansions: set -f
Set a new value of IFS (IFS=$'\n') a loop variable x and a newline var (nl=$'\n').
We print all values of the arguments received (the input array $#).
but we replace any new line by the workaround char "${#//$nl/$wa}".
send those values to be sorted sort -n.
and place back all the sorted values in the positional arguments set --.
Then we assign each argument one by one (to preserve newlines).
in a loop for x
to a new array: sorted+=(…)
inside quotes to preserve any existing newline.
restoring the workaround to a newline "${x//$wa/$nl}".
done
This question looks closely related. And BTW, here's a mergesort in Bash (without external processes):
mergesort() {
local -n -r input_reference="$1"
local -n output_reference="$2"
local -r -i size="${#input_reference[#]}"
local merge previous
local -a -i runs indices
local -i index previous_idx merged_idx \
run_a_idx run_a_stop \
run_b_idx run_b_stop
output_reference=("${input_reference[#]}")
if ((size == 0)); then return; fi
previous="${output_reference[0]}"
runs=(0)
for ((index = 0;;)) do
for ((++index;; ++index)); do
if ((index >= size)); then break 2; fi
if [[ "${output_reference[index]}" < "$previous" ]]; then break; fi
previous="${output_reference[index]}"
done
previous="${output_reference[index]}"
runs+=(index)
done
runs+=(size)
while (("${#runs[#]}" > 2)); do
indices=("${!runs[#]}")
merge=("${output_reference[#]}")
for ((index = 0; index < "${#indices[#]}" - 2; index += 2)); do
merged_idx=runs[indices[index]]
run_a_idx=merged_idx
previous_idx=indices[$((index + 1))]
run_a_stop=runs[previous_idx]
run_b_idx=runs[previous_idx]
run_b_stop=runs[indices[$((index + 2))]]
unset runs[previous_idx]
while ((run_a_idx < run_a_stop && run_b_idx < run_b_stop)); do
if [[ "${merge[run_a_idx]}" < "${merge[run_b_idx]}" ]]; then
output_reference[merged_idx++]="${merge[run_a_idx++]}"
else
output_reference[merged_idx++]="${merge[run_b_idx++]}"
fi
done
while ((run_a_idx < run_a_stop)); do
output_reference[merged_idx++]="${merge[run_a_idx++]}"
done
while ((run_b_idx < run_b_stop)); do
output_reference[merged_idx++]="${merge[run_b_idx++]}"
done
done
done
}
declare -ar input=({z..a}{z..a})
declare -a output
mergesort input output
echo "${input[#]}"
echo "${output[#]}"
Many thanks to the people that answered before me. Using their excellent input, bash documentation and ideas from other treads, this is what works perfectly for me without IFS change
array=("a \n c" b f "3 5")
Using process substitution and read array in bash > v4.4 WITH EOL character
readarray -t sorted < <(sort < <(printf '%s\n' "${array[#]}"))
Using process substitution and read array in bash > v4.4 WITH NULL character
readarray -td '' sorted < <(sort -z < <(printf '%s\0' "${array[#]}"))
Finally we verify with
printf "[%s]\n" "${sorted[#]}"
output is
[3 5]
[a \n c]
[b]
[f]
Please, let me know if that is a correct test for embedded /n as both solutions produce the same result, but the first one is not supposed to work properly with embedded /n
I am not convinced that you'll need an external sorting program in Bash.
Here is my implementation for the simple bubble-sort algorithm.
function bubble_sort()
{ #
# Sorts all positional arguments and echoes them back.
#
# Bubble sorting lets the heaviest (longest) element sink to the bottom.
#
local array=($#) max=$(($# - 1))
while ((max > 0))
do
local i=0
while ((i < max))
do
if [ ${array[$i]} \> ${array[$((i + 1))]} ]
then
local t=${array[$i]}
array[$i]=${array[$((i + 1))]}
array[$((i + 1))]=$t
fi
((i += 1))
done
((max -= 1))
done
echo ${array[#]}
}
array=(a c b f 3 5)
echo " input: ${array[#]}"
echo "output: $(bubble_sort ${array[#]})"
This shall print:
input: a c b f 3 5
output: 3 5 a b c f
a=(e b 'c d')
shuf -e "${a[#]}" | sort >/tmp/f
mapfile -t g </tmp/f
Great answers here. Learned a lot. After reading them all, I figure I'd throw my hat into the ring. I think this is the shortest method (and probably faster as it doesn't do much shell script parsing, though there is the matter of the spawning of printf and sort, but they're only called once each) and handles whitespace in the data:
a=(3 "2 a" 1) # Setup!
IFS=$'\n' b=( $(printf "%s\n" "${a[#]}" | sort) ); unset IFS # Sort!
printf "'%s' " "${b[#]}"; # Success!
Outputs:
'1' '2 a' '3'
Note that the IFS change is limited in scope to the line it is on. if you know that the array has no whitespace in it, you don't need the IFS modification.
Inspiration was from #yas's answer and #Alcamtar comments.
EDIT
Oh, I somehow missed the actually accepted answer which is even shorter than mine. Doh!
IFS=$'\n' sorted=($(sort <<<"${array[*]}")); unset IFS
Turns out that the unset is required because this is a variable assignment that has no command.
I'd recommend going to that answer because it has some interesting stuff on globbing which could be relevant if the array has wildcards in it. It also has a detailed description as to what is happening.
EDIT 2
GNU has an extension in which sort delimits records using \0 which is good if you have LFs in your data. However, when it gets returned to the shell to be assign to an array, I don't see a good way convert it so that the shell will delimit on \0, because even setting IFS=$'\0', the shell doesn't like it and doesn't properly break it up.
array=(z 'b c'); { set "${array[#]}"; printf '%s\n' "$#"; } \
| sort \
| mapfile -t array; declare -p array
declare -a array=([0]="b c" [1]="z")
Open an inline function {...} to get a fresh set of positional arguments (e.g. $1, $2, etc).
Copy the array to the positional arguments. (e.g. set "${array[#]}" will copy the nth array argument to the nth positional argument. Note the quotes preserve whitespace that may be contained in an array element).
Print each positional argument (e.g. printf '%s\n' "$#" will print each positional argument on its own line. Again, note the quotes preserve whitespace that may be contained in each positional argument).
Then sort does its thing.
Read the stream into an array with mapfile (e.g. mapfile -t array reads each line into the variable array and the -t ignores the \n in each line).
Dump the array to show its been sorted.
As a function:
set +m
shopt -s lastpipe
sort_array() {
declare -n ref=$1
set "${ref[#]}"
printf '%s\n' "$#"
| sort \
| mapfile -t $ref
}
then
array=(z y x); sort_array array; declare -p array
declare -a array=([0]="x" [1]="y" [2]="z")
I look forward to being ripped apart by all the UNIX gurus! :)
sorted=($(echo ${array[#]} | tr " " "\n" | sort))
In the spirit of bash / linux, I would pipe the best command-line tool for each step. sort does the main job but needs input separated by newline instead of space, so the very simple pipeline above simply does:
Echo array content --> replace space by newline --> sort
$() is to echo the result
($()) is to put the "echoed result" in an array
Note: as #sorontar mentioned in a comment to a different question:
The sorted=($(...)) part is using the "split and glob" operator. You should turn glob off: set -f or set -o noglob or shopt -op noglob or an element of the array like * will be expanded to a list of files.

Passing and Parsing an Array from one shell script to another

Due to the limitation of 9 parameters in a script, my objective is to pass about 30 strings bundled in an array from calling script (scriptA) to called script (scriptB).
My scriptA looks something like this...
#!/bin/bash
declare -a arr=( ab "c d" 123 "string with spaces" 456 )
. ./scriptB.sh "Task Name" "${arr[#]}"
My scriptB looks something like this...
#!/bin/bash
arg1="$1"
shift
arg2=("$#")
read -a arr1 <<< "$#"
j=0
for i in "${arr1[#]}"; do
#echo ${arr1[j]}
((j++))
case "$j" in
"1")
param1="${i//(}"
echo "$j=$param1"
;;
"2")
param2="${i}"
echo "$j=$param2"
;;
"3")
param3="${i}"
echo "$j=$param3"
;;
"4")
param4="${i}"
echo "$j=$param4"
;;
"5")
param5="${i//)}"
echo "$j=$param5"
;;
esac
done
OUTPUT:
1=ab
2=c
3=d
4=123
5=string
Problem:
1. I see parenthesis ( and ) gets added to the string which I have to strip them out
2. I see an array element (with spaces) though quoted under double quotes get to interpreted as separate elements by spaces.
read -a arr1 <<< "$#"
is wrong. The "$#" here is equal to "$*", and then read will split the input on whitespaces (spaces, tabs and newlines) and also interpret \ slashes and assign the result to array arr1. Remember to use read -r.
Do:
arr1=("$#")
to assign to an array. Then you could print with:
for ((i=1;i<${#arr1};++i)); do
printf "%d=%s\n" "$i" "${arr1[$i]}"
done
of 9 parameters in a script, my objective is to pass about 30 strings bundled in an array from calling script (scriptA)
Ok. But "${arr[#]}" is passing multiple arguments anyway. If you want to pass array as string, pass it as a string (note that eval is evil):
arr=( ab "c d" 123 "string with spaces" 456 )
./scriptB.sh "Task Name" "$(declare -p arr)"
# Then inside scriptB.sh, re-evaulate parameter 2:
eval "$2" # assigns to arr
Note that the scriptB.sh is sourced in your example, so passing arguments.... makes no sense anyway.
I see an array element (with spaces) though quoted under double quotes get to interpreted as separate elements by spaces
Yes, because you interpreted the content with read, which splits the input on characters in IFS, which by default is set to space, tab and newline. You could print arguments on separate lines and change IFS accordingly:
IFS=$'\n' read -r -a arr1 < <(printf "%s\n" "$#")
or even use a zero terminated string:
mapfile -t -d '' arr1 < <(printf "%s\0" "$#")
but those are just fancy and useless ways of writing arr1=("$#").
Note that in your code snipped, arg2 is an array.

Bash array with spaces in elements

I'm trying to construct an array in bash of the filenames from my camera:
FILES=(2011-09-04 21.43.02.jpg
2011-09-05 10.23.14.jpg
2011-09-09 12.31.16.jpg
2011-09-11 08.43.12.jpg)
As you can see, there is a space in the middle of each filename.
I've tried wrapping each name in quotes, and escaping the space with a backslash, neither of which works.
When I try to access the array elements, it continues to treat the space as the elementdelimiter.
How can I properly capture the filenames with a space inside the name?
I think the issue might be partly with how you're accessing the elements. If I do a simple for elem in $FILES, I experience the same issue as you. However, if I access the array through its indices, like so, it works if I add the elements either numerically or with escapes:
for ((i = 0; i < ${#FILES[#]}; i++))
do
echo "${FILES[$i]}"
done
Any of these declarations of $FILES should work:
FILES=(2011-09-04\ 21.43.02.jpg
2011-09-05\ 10.23.14.jpg
2011-09-09\ 12.31.16.jpg
2011-09-11\ 08.43.12.jpg)
or
FILES=("2011-09-04 21.43.02.jpg"
"2011-09-05 10.23.14.jpg"
"2011-09-09 12.31.16.jpg"
"2011-09-11 08.43.12.jpg")
or
FILES[0]="2011-09-04 21.43.02.jpg"
FILES[1]="2011-09-05 10.23.14.jpg"
FILES[2]="2011-09-09 12.31.16.jpg"
FILES[3]="2011-09-11 08.43.12.jpg"
There must be something wrong with the way you access the array's items. Here's how it's done:
for elem in "${files[#]}"
...
From the bash manpage:
Any element of an array may be referenced using ${name[subscript]}. ... If subscript is # or *, the word expands to all members of name. These subscripts differ only when the word appears within double quotes. If the word is double-quoted, ${name[*]} expands to a single word with the value of each array member separated by the first character of the IFS special variable, and ${name[#]} expands each element of name to a separate word.
Of course, you should also use double quotes when accessing a single member
cp "${files[0]}" /tmp
You need to use IFS to stop space as element delimiter.
FILES=("2011-09-04 21.43.02.jpg"
"2011-09-05 10.23.14.jpg"
"2011-09-09 12.31.16.jpg"
"2011-09-11 08.43.12.jpg")
IFS=""
for jpg in ${FILES[*]}
do
echo "${jpg}"
done
If you want to separate on basis of . then just do IFS="."
Hope it helps you:)
I agree with others that it's likely how you're accessing the elements that is the problem. Quoting the file names in the array assignment is correct:
FILES=(
"2011-09-04 21.43.02.jpg"
"2011-09-05 10.23.14.jpg"
"2011-09-09 12.31.16.jpg"
"2011-09-11 08.43.12.jpg"
)
for f in "${FILES[#]}"
do
echo "$f"
done
Using double quotes around any array of the form "${FILES[#]}" splits the array into one word per array element. It doesn't do any word-splitting beyond that.
Using "${FILES[*]}" also has a special meaning, but it joins the array elements with the first character of $IFS, resulting in one word, which is probably not what you want.
Using a bare ${array[#]} or ${array[*]} subjects the result of that expansion to further word-splitting, so you'll end up with words split on spaces (and anything else in $IFS) instead of one word per array element.
Using a C-style for loop is also fine and avoids worrying about word-splitting if you're not clear on it:
for (( i = 0; i < ${#FILES[#]}; i++ ))
do
echo "${FILES[$i]}"
done
If you had your array like this:
#!/bin/bash
Unix[0]='Debian'
Unix[1]="Red Hat"
Unix[2]='Ubuntu'
Unix[3]='Suse'
for i in $(echo ${Unix[#]});
do echo $i;
done
You would get:
Debian
Red
Hat
Ubuntu
Suse
I don't know why but the loop breaks down the spaces and puts them as an individual item, even you surround it with quotes.
To get around this, instead of calling the elements in the array, you call the indexes, which takes the full string thats wrapped in quotes.
It must be wrapped in quotes!
#!/bin/bash
Unix[0]='Debian'
Unix[1]='Red Hat'
Unix[2]='Ubuntu'
Unix[3]='Suse'
for i in $(echo ${!Unix[#]});
do echo ${Unix[$i]};
done
Then you'll get:
Debian
Red Hat
Ubuntu
Suse
This was already answered above, but that answer was a bit terse and the man page excerpt is a bit cryptic. I wanted to provide a fully worked example to demonstrate how this works in practice.
If not quoted, an array just expands to strings separated by spaces, so that
for file in ${FILES[#]}; do
expands to
for file in 2011-09-04 21.43.02.jpg 2011-09-05 10.23.14.jpg 2011-09-09 12.31.16.jpg 2011-09-11 08.43.12.jpg ; do
But if you quote the expansion, bash adds double quotes around each term, so that:
for file in "${FILES[#]}"; do
expands to
for file in "2011-09-04 21.43.02.jpg" "2011-09-05 10.23.14.jpg" "2011-09-09 12.31.16.jpg" "2011-09-11 08.43.12.jpg" ; do
The simple rule of thumb is to always use [#] instead of [*] and quote array expansions if you want spaces preserved.
To elaborate on this a little further, the man page in the other answer is explaining that if unquoted, $* an $# behave the same way, but they are different when quoted. So, given
array=(a b c)
Then $* and $# both expand to
a b c
and "$*" expands to
"a b c"
and "$#" expands to
"a" "b" "c"
Not exactly an answer to the quoting/escaping problem of the original question but probably something that would actually have been more useful for the op:
unset FILES
for f in 2011-*.jpg; do FILES+=("$f"); done
echo "${FILES[#]}"
Where of course the expression would have to be adopted to the specific requirement (e.g. *.jpg for all or 2001-09-11*.jpg for only the pictures of a certain day).
For those who prefer set array in oneline mode, instead of using for loop
Changing IFS temporarily to new line could save you from escaping.
OLD_IFS="$IFS"
IFS=$'\n'
array=( $(ls *.jpg) ) #save the hassle to construct filename
IFS="$OLD_IFS"
Escaping works.
#!/bin/bash
FILES=(2011-09-04\ 21.43.02.jpg
2011-09-05\ 10.23.14.jpg
2011-09-09\ 12.31.16.jpg
2011-09-11\ 08.43.12.jpg)
echo ${FILES[0]}
echo ${FILES[1]}
echo ${FILES[2]}
echo ${FILES[3]}
Output:
$ ./test.sh
2011-09-04 21.43.02.jpg
2011-09-05 10.23.14.jpg
2011-09-09 12.31.16.jpg
2011-09-11 08.43.12.jpg
Quoting the strings also produces the same output.
#! /bin/bash
renditions=(
"640x360 80k 60k"
"1280x720 320k 128k"
"1280x720 320k 128k"
)
for z in "${renditions[#]}"; do
echo "$z"
done
OUTPUT
640x360 80k 60k
1280x720 320k 128k
1280x720 320k 128k
`
Another solution is using a "while" loop instead a "for" loop:
index=0
while [ ${index} -lt ${#Array[#]} ]
do
echo ${Array[${index}]}
index=$(( $index + 1 ))
done
If you aren't stuck on using bash, different handling of spaces in file names is one of the benefits of the fish shell. Consider a directory which contains two files: "a b.txt" and "b c.txt". Here's a reasonable guess at processing a list of files generated from another command with bash, but it fails due to spaces in file names you experienced:
# bash
$ for f in $(ls *.txt); { echo $f; }
a
b.txt
b
c.txt
With fish, the syntax is nearly identical, but the result is what you'd expect:
# fish
for f in (ls *.txt); echo $f; end
a b.txt
b c.txt
It works differently because fish splits the output of commands on newlines, not spaces.
If you have a case where you do want to split on spaces instead of newlines, fish has a very readable syntax for that:
for f in (ls *.txt | string split " "); echo $f; end
If the elements of FILES come from another file whose file names are line-separated like this:
2011-09-04 21.43.02.jpg
2011-09-05 10.23.14.jpg
2011-09-09 12.31.16.jpg
2011-09-11 08.43.12.jpg
then try this so that the whitespaces in the file names aren't regarded as delimiters:
while read -r line; do
FILES+=("$line")
done < ./files.txt
If they come from another command, you need to rewrite the last line like this:
while read -r line; do
FILES+=("$line")
done < <(./output-files.sh)
I used to reset the IFS value and rollback when done.
# backup IFS value
O_IFS=$IFS
# reset IFS value
IFS=""
FILES=(
"2011-09-04 21.43.02.jpg"
"2011-09-05 10.23.14.jpg"
"2011-09-09 12.31.16.jpg"
"2011-09-11 08.43.12.jpg"
)
for file in ${FILES[#]}; do
echo ${file}
done
# rollback IFS value
IFS=${O_IFS}
Possible output from the loop:
2011-09-04 21.43.02.jpg
2011-09-05 10.23.14.jpg
2011-09-09 12.31.16.jpg
2011-09-11 08.43.12.jpg

How to sort an array in Bash

I have an array in Bash, for example:
array=(a c b f 3 5)
I need to sort the array. Not just displaying the content in a sorted way, but to get a new array with the sorted elements. The new sorted array can be a completely new one or the old one.
You don't really need all that much code:
IFS=$'\n' sorted=($(sort <<<"${array[*]}"))
unset IFS
Supports whitespace in elements (as long as it's not a newline), and works in Bash 3.x.
e.g.:
$ array=("a c" b f "3 5")
$ IFS=$'\n' sorted=($(sort <<<"${array[*]}")); unset IFS
$ printf "[%s]\n" "${sorted[#]}"
[3 5]
[a c]
[b]
[f]
Note: #sorontar has pointed out that care is required if elements contain wildcards such as * or ?:
The sorted=($(...)) part is using the "split and glob" operator. You should turn glob off: set -f or set -o noglob or shopt -op noglob or an element of the array like * will be expanded to a list of files.
What's happening:
The result is a culmination six things that happen in this order:
IFS=$'\n'
"${array[*]}"
<<<
sort
sorted=($(...))
unset IFS
First, the IFS=$'\n'
This is an important part of our operation that affects the outcome of 2 and 5 in the following way:
Given:
"${array[*]}" expands to every element delimited by the first character of IFS
sorted=() creates elements by splitting on every character of IFS
IFS=$'\n' sets things up so that elements are expanded using a new line as the delimiter, and then later created in a way that each line becomes an element. (i.e. Splitting on a new line.)
Delimiting by a new line is important because that's how sort operates (sorting per line). Splitting by only a new line is not-as-important, but is needed preserve elements that contain spaces or tabs.
The default value of IFS is a space, a tab, followed by a new line, and would be unfit for our operation.
Next, the sort <<<"${array[*]}" part
<<<, called here strings, takes the expansion of "${array[*]}", as explained above, and feeds it into the standard input of sort.
With our example, sort is fed this following string:
a c
b
f
3 5
Since sort sorts, it produces:
3 5
a c
b
f
Next, the sorted=($(...)) part
The $(...) part, called command substitution, causes its content (sort <<<"${array[*]}) to run as a normal command, while taking the resulting standard output as the literal that goes where ever $(...) was.
In our example, this produces something similar to simply writing:
sorted=(3 5
a c
b
f
)
sorted then becomes an array that's created by splitting this literal on every new line.
Finally, the unset IFS
This resets the value of IFS to the default value, and is just good practice.
It's to ensure we don't cause trouble with anything that relies on IFS later in our script. (Otherwise we'd need to remember that we've switched things around--something that might be impractical for complex scripts.)
Original response:
array=(a c b "f f" 3 5)
readarray -t sorted < <(for a in "${array[#]}"; do echo "$a"; done | sort)
output:
$ for a in "${sorted[#]}"; do echo "$a"; done
3
5
a
b
c
f f
Note this version copes with values that contains special characters or whitespace (except newlines)
Note readarray is supported in bash 4+.
Edit Based on the suggestion by #Dimitre I had updated it to:
readarray -t sorted < <(printf '%s\0' "${array[#]}" | sort -z | xargs -0n1)
which has the benefit of even understanding sorting elements with newline characters embedded correctly. Unfortunately, as correctly signaled by #ruakh this didn't mean the the result of readarray would be correct, because readarray has no option to use NUL instead of regular newlines as line-separators.
If you don't need to handle special shell characters in the array elements:
array=(a c b f 3 5)
sorted=($(printf '%s\n' "${array[#]}"|sort))
With bash you'll need an external sorting program anyway.
With zsh no external programs are needed and special shell characters are easily handled:
% array=('a a' c b f 3 5); printf '%s\n' "${(o)array[#]}"
3
5
a a
b
c
f
ksh has set -s to sort ASCIIbetically.
Here's a pure Bash quicksort implementation:
#!/bin/bash
# quicksorts positional arguments
# return is in array qsort_ret
qsort() {
local pivot i smaller=() larger=()
qsort_ret=()
(($#==0)) && return 0
pivot=$1
shift
for i; do
# This sorts strings lexicographically.
if [[ $i < $pivot ]]; then
smaller+=( "$i" )
else
larger+=( "$i" )
fi
done
qsort "${smaller[#]}"
smaller=( "${qsort_ret[#]}" )
qsort "${larger[#]}"
larger=( "${qsort_ret[#]}" )
qsort_ret=( "${smaller[#]}" "$pivot" "${larger[#]}" )
}
Use as, e.g.,
$ array=(a c b f 3 5)
$ qsort "${array[#]}"
$ declare -p qsort_ret
declare -a qsort_ret='([0]="3" [1]="5" [2]="a" [3]="b" [4]="c" [5]="f")'
This implementation is recursive… so here's an iterative quicksort:
#!/bin/bash
# quicksorts positional arguments
# return is in array qsort_ret
# Note: iterative, NOT recursive! :)
qsort() {
(($#==0)) && return 0
local stack=( 0 $(($#-1)) ) beg end i pivot smaller larger
qsort_ret=("$#")
while ((${#stack[#]})); do
beg=${stack[0]}
end=${stack[1]}
stack=( "${stack[#]:2}" )
smaller=() larger=()
pivot=${qsort_ret[beg]}
for ((i=beg+1;i<=end;++i)); do
if [[ "${qsort_ret[i]}" < "$pivot" ]]; then
smaller+=( "${qsort_ret[i]}" )
else
larger+=( "${qsort_ret[i]}" )
fi
done
qsort_ret=( "${qsort_ret[#]:0:beg}" "${smaller[#]}" "$pivot" "${larger[#]}" "${qsort_ret[#]:end+1}" )
if ((${#smaller[#]}>=2)); then stack+=( "$beg" "$((beg+${#smaller[#]}-1))" ); fi
if ((${#larger[#]}>=2)); then stack+=( "$((end-${#larger[#]}+1))" "$end" ); fi
done
}
In both cases, you can change the order you use: I used string comparisons, but you can use arithmetic comparisons, compare wrt file modification time, etc. just use the appropriate test; you can even make it more generic and have it use a first argument that is the test function use, e.g.,
#!/bin/bash
# quicksorts positional arguments
# return is in array qsort_ret
# Note: iterative, NOT recursive! :)
# First argument is a function name that takes two arguments and compares them
qsort() {
(($#<=1)) && return 0
local compare_fun=$1
shift
local stack=( 0 $(($#-1)) ) beg end i pivot smaller larger
qsort_ret=("$#")
while ((${#stack[#]})); do
beg=${stack[0]}
end=${stack[1]}
stack=( "${stack[#]:2}" )
smaller=() larger=()
pivot=${qsort_ret[beg]}
for ((i=beg+1;i<=end;++i)); do
if "$compare_fun" "${qsort_ret[i]}" "$pivot"; then
smaller+=( "${qsort_ret[i]}" )
else
larger+=( "${qsort_ret[i]}" )
fi
done
qsort_ret=( "${qsort_ret[#]:0:beg}" "${smaller[#]}" "$pivot" "${larger[#]}" "${qsort_ret[#]:end+1}" )
if ((${#smaller[#]}>=2)); then stack+=( "$beg" "$((beg+${#smaller[#]}-1))" ); fi
if ((${#larger[#]}>=2)); then stack+=( "$((end-${#larger[#]}+1))" "$end" ); fi
done
}
Then you can have this comparison function:
compare_mtime() { [[ $1 -nt $2 ]]; }
and use:
$ qsort compare_mtime *
$ declare -p qsort_ret
to have the files in current folder sorted by modification time (newest first).
NOTE. These functions are pure Bash! no external utilities, and no subshells! they are safe wrt any funny symbols you may have (spaces, newline characters, glob characters, etc.).
NOTE2. The test [[ $i < $pivot ]] is correct. It uses the lexicographical string comparison. If your array only contains integers and you want to sort numerically, use ((i < pivot)) instead.
Please don't edit this answer to change that. It has already been edited (and rolled back) a couple of times. The test I gave here is correct and corresponds to the output given in the example: the example uses both strings and numbers, and the purpose is to sort it in lexicographical order. Using ((i < pivot)) in this case is wrong.
tl;dr:
Sort array a_in and store the result in a_out (elements must not have embedded newlines[1]
):
Bash v4+:
readarray -t a_out < <(printf '%s\n' "${a_in[#]}" | sort)
Bash v3:
IFS=$'\n' read -d '' -r -a a_out < <(printf '%s\n' "${a_in[#]}" | sort)
Advantages over antak's solution:
You needn't worry about accidental globbing (accidental interpretation of the array elements as filename patterns), so no extra command is needed to disable globbing (set -f, and set +f to restore it later).
You needn't worry about resetting IFS with unset IFS.[2]
Optional reading: explanation and sample code
The above combines Bash code with external utility sort for a solution that works with arbitrary single-line elements and either lexical or numerical sorting (optionally by field):
Performance: For around 20 elements or more, this will be faster than a pure Bash solution - significantly and increasingly so once you get beyond around 100 elements.
(The exact thresholds will depend on your specific input, machine, and platform.)
The reason it is fast is that it avoids Bash loops.
printf '%s\n' "${a_in[#]}" | sort performs the sorting (lexically, by default - see sort's POSIX spec):
"${a_in[#]}" safely expands to the elements of array a_in as individual arguments, whatever they contain (including whitespace).
printf '%s\n' then prints each argument - i.e., each array element - on its own line, as-is.
Note the use of a process substitution (<(...)) to provide the sorted output as input to read / readarray (via redirection to stdin, <), because read / readarray must run in the current shell (must not run in a subshell) in order for output variable a_out to be visible to the current shell (for the variable to remain defined in the remainder of the script).
Reading sort's output into an array variable:
Bash v4+: readarray -t a_out reads the individual lines output by sort into the elements of array variable a_out, without including the trailing \n in each element (-t).
Bash v3: readarray doesn't exist, so read must be used:
IFS=$'\n' read -d '' -r -a a_out tells read to read into array (-a) variable a_out, reading the entire input, across lines (-d ''), but splitting it into array elements by newlines (IFS=$'\n'. $'\n', which produces a literal newline (LF), is a so-called ANSI C-quoted string).
(-r, an option that should virtually always be used with read, disables unexpected handling of \ characters.)
Annotated sample code:
#!/usr/bin/env bash
# Define input array `a_in`:
# Note the element with embedded whitespace ('a c')and the element that looks like
# a glob ('*'), chosen to demonstrate that elements with line-internal whitespace
# and glob-like contents are correctly preserved.
a_in=( 'a c' b f 5 '*' 10 )
# Sort and store output in array `a_out`
# Saving back into `a_in` is also an option.
IFS=$'\n' read -d '' -r -a a_out < <(printf '%s\n' "${a_in[#]}" | sort)
# Bash 4.x: use the simpler `readarray -t`:
# readarray -t a_out < <(printf '%s\n' "${a_in[#]}" | sort)
# Print sorted output array, line by line:
printf '%s\n' "${a_out[#]}"
Due to use of sort without options, this yields lexical sorting (digits sort before letters, and digit sequences are treated lexically, not as numbers):
*
10
5
a c
b
f
If you wanted numerical sorting by the 1st field, you'd use sort -k1,1n instead of just sort, which yields (non-numbers sort before numbers, and numbers sort correctly):
*
a c
b
f
5
10
[1] To handle elements with embedded newlines, use the following variant (Bash v4+, with GNU sort):
readarray -d '' -t a_out < <(printf '%s\0' "${a_in[#]}" | sort -z).
Michał Górny's helpful answer has a Bash v3 solution.
[2] While IFS is set in the Bash v3 variant, the change is scoped to the command.
By contrast, what follows IFS=$'\n'  in antak's answer is an assignment rather than a command, in which case the IFS change is global.
In the 3-hour train trip from Munich to Frankfurt (which I had trouble to reach because Oktoberfest starts tomorrow) I was thinking about my first post. Employing a global array is a much better idea for a general sort function. The following function handles arbitary strings (newlines, blanks etc.):
declare BSORT=()
function bubble_sort()
{ #
# #param [ARGUMENTS]...
#
# Sort all positional arguments and store them in global array BSORT.
# Without arguments sort this array. Return the number of iterations made.
#
# Bubble sorting lets the heaviest element sink to the bottom.
#
(($# > 0)) && BSORT=("$#")
local j=0 ubound=$((${#BSORT[*]} - 1))
while ((ubound > 0))
do
local i=0
while ((i < ubound))
do
if [ "${BSORT[$i]}" \> "${BSORT[$((i + 1))]}" ]
then
local t="${BSORT[$i]}"
BSORT[$i]="${BSORT[$((i + 1))]}"
BSORT[$((i + 1))]="$t"
fi
((++i))
done
((++j))
((--ubound))
done
echo $j
}
bubble_sort a c b 'z y' 3 5
echo ${BSORT[#]}
This prints:
3 5 a b c z y
The same output is created from
BSORT=(a c b 'z y' 3 5)
bubble_sort
echo ${BSORT[#]}
Note that probably Bash internally uses smart-pointers, so the swap-operation could be cheap (although I doubt it). However, bubble_sort demonstrates that more advanced functions like merge_sort are also in the reach of the shell language.
Another solution that uses external sort and copes with any special characters (except for NULs :)). Should work with bash-3.2 and GNU or BSD sort (sadly, POSIX doesn't include -z).
local e new_array=()
while IFS= read -r -d '' e; do
new_array+=( "${e}" )
done < <(printf "%s\0" "${array[#]}" | LC_ALL=C sort -z)
First look at the input redirection at the end. We're using printf built-in to write out the array elements, zero-terminated. The quoting makes sure array elements are passed as-is, and specifics of shell printf cause it to reuse the last part of format string for each remaining parameter. That is, it's equivalent to something like:
for e in "${array[#]}"; do
printf "%s\0" "${e}"
done
The null-terminated element list is then passed to sort. The -z option causes it to read null-terminated elements, sort them and output null-terminated as well. If you needed to get only the unique elements, you can pass -u since it is more portable than uniq -z. The LC_ALL=C ensures stable sort order independently of locale — sometimes useful for scripts. If you want the sort to respect locale, remove that.
The <() construct obtains the descriptor to read from the spawned pipeline, and < redirects the standard input of the while loop to it. If you need to access the standard input inside the pipe, you may use another descriptor — exercise for the reader :).
Now, back to the beginning. The read built-in reads output from the redirected stdin. Setting empty IFS disables word splitting which is unnecessary here — as a result, read reads the whole 'line' of input to the single provided variable. -r option disables escape processing that is undesired here as well. Finally, -d '' sets the line delimiter to NUL — that is, tells read to read zero-terminated strings.
As a result, the loop is executed once for every successive zero-terminated array element, with the value being stored in e. The example just puts the items in another array but you may prefer to process them directly :).
Of course, that's just one of the many ways of achieving the same goal. As I see it, it is simpler than implementing complete sorting algorithm in bash and in some cases it will be faster. It handles all special characters including newlines and should work on most of the common systems. Most importantly, it may teach you something new and awesome about bash :).
Keep it simple ;)
In the following example, the array b is the sorted version of the array a!
The second line echos each item of the array a, then pipes them to the sort command, and the output is used to initiate the array b.
a=(2 3 1)
b=( $( for x in ${a[#]}; do echo $x; done | sort ) )
echo ${b[#]} # output: 1 2 3
min sort:
#!/bin/bash
array=(.....)
index_of_element1=0
while (( ${index_of_element1} < ${#array[#]} )); do
element_1="${array[${index_of_element1}]}"
index_of_element2=$((index_of_element1 + 1))
index_of_min=${index_of_element1}
min_element="${element_1}"
for element_2 in "${array[#]:$((index_of_element1 + 1))}"; do
min_element="`printf "%s\n%s" "${min_element}" "${element_2}" | sort | head -n+1`"
if [[ "${min_element}" == "${element_2}" ]]; then
index_of_min=${index_of_element2}
fi
let index_of_element2++
done
array[${index_of_element1}]="${min_element}"
array[${index_of_min}]="${element_1}"
let index_of_element1++
done
try this:
echo ${array[#]} | awk 'BEGIN{RS=" ";} {print $1}' | sort
Output will be:
3
5
a
b
c
f
Problem solved.
If you can compute a unique integer for each element in the array, like this:
tab='0123456789abcdefghijklmnopqrstuvwxyz'
# build the reversed ordinal map
for ((i = 0; i < ${#tab}; i++)); do
declare -g ord_${tab:i:1}=$i
done
function sexy_int() {
local sum=0
local i ch ref
for ((i = 0; i < ${#1}; i++)); do
ch="${1:i:1}"
ref="ord_$ch"
(( sum += ${!ref} ))
done
return $sum
}
sexy_int hello
echo "hello -> $?"
sexy_int world
echo "world -> $?"
then, you can use these integers as array indexes, because Bash always use sparse array, so no need to worry about unused indexes:
array=(a c b f 3 5)
for el in "${array[#]}"; do
sexy_int "$el"
sorted[$?]="$el"
done
echo "${sorted[#]}"
Pros. Fast.
Cons. Duplicated elements are merged, and it can be impossible to map contents to 32-bit unique integers.
array=(a c b f 3 5)
new_array=($(echo "${array[#]}" | sed 's/ /\n/g' | sort))
echo ${new_array[#]}
echo contents of new_array will be:
3 5 a b c f
There is a workaround for the usual problem of spaces and newlines:
Use a character that is not in the original array (like $'\1' or $'\4' or similar).
This function gets the job done:
# Sort an Array may have spaces or newlines with a workaround (wa=$'\4')
sortarray(){ local wa=$'\4' IFS=''
if [[ $* =~ [$wa] ]]; then
echo "$0: error: array contains the workaround char" >&2
exit 1
fi
set -f; local IFS=$'\n' x nl=$'\n'
set -- $(printf '%s\n' "${#//$nl/$wa}" | sort -n)
for x
do sorted+=("${x//$wa/$nl}")
done
}
This will sort the array:
$ array=( a b 'c d' $'e\nf' $'g\1h')
$ sortarray "${array[#]}"
$ printf '<%s>\n' "${sorted[#]}"
<a>
<b>
<c d>
<e
f>
<gh>
This will complain that the source array contains the workaround character:
$ array=( a b 'c d' $'e\nf' $'g\4h')
$ sortarray "${array[#]}"
./script: error: array contains the workaround char
description
We set two local variables wa (workaround char) and a null IFS
Then (with ifs null) we test that the whole array $*.
Does not contain any woraround char [[ $* =~ [$wa] ]].
If it does, raise a message and signal an error: exit 1
Avoid filename expansions: set -f
Set a new value of IFS (IFS=$'\n') a loop variable x and a newline var (nl=$'\n').
We print all values of the arguments received (the input array $#).
but we replace any new line by the workaround char "${#//$nl/$wa}".
send those values to be sorted sort -n.
and place back all the sorted values in the positional arguments set --.
Then we assign each argument one by one (to preserve newlines).
in a loop for x
to a new array: sorted+=(…)
inside quotes to preserve any existing newline.
restoring the workaround to a newline "${x//$wa/$nl}".
done
This question looks closely related. And BTW, here's a mergesort in Bash (without external processes):
mergesort() {
local -n -r input_reference="$1"
local -n output_reference="$2"
local -r -i size="${#input_reference[#]}"
local merge previous
local -a -i runs indices
local -i index previous_idx merged_idx \
run_a_idx run_a_stop \
run_b_idx run_b_stop
output_reference=("${input_reference[#]}")
if ((size == 0)); then return; fi
previous="${output_reference[0]}"
runs=(0)
for ((index = 0;;)) do
for ((++index;; ++index)); do
if ((index >= size)); then break 2; fi
if [[ "${output_reference[index]}" < "$previous" ]]; then break; fi
previous="${output_reference[index]}"
done
previous="${output_reference[index]}"
runs+=(index)
done
runs+=(size)
while (("${#runs[#]}" > 2)); do
indices=("${!runs[#]}")
merge=("${output_reference[#]}")
for ((index = 0; index < "${#indices[#]}" - 2; index += 2)); do
merged_idx=runs[indices[index]]
run_a_idx=merged_idx
previous_idx=indices[$((index + 1))]
run_a_stop=runs[previous_idx]
run_b_idx=runs[previous_idx]
run_b_stop=runs[indices[$((index + 2))]]
unset runs[previous_idx]
while ((run_a_idx < run_a_stop && run_b_idx < run_b_stop)); do
if [[ "${merge[run_a_idx]}" < "${merge[run_b_idx]}" ]]; then
output_reference[merged_idx++]="${merge[run_a_idx++]}"
else
output_reference[merged_idx++]="${merge[run_b_idx++]}"
fi
done
while ((run_a_idx < run_a_stop)); do
output_reference[merged_idx++]="${merge[run_a_idx++]}"
done
while ((run_b_idx < run_b_stop)); do
output_reference[merged_idx++]="${merge[run_b_idx++]}"
done
done
done
}
declare -ar input=({z..a}{z..a})
declare -a output
mergesort input output
echo "${input[#]}"
echo "${output[#]}"
Many thanks to the people that answered before me. Using their excellent input, bash documentation and ideas from other treads, this is what works perfectly for me without IFS change
array=("a \n c" b f "3 5")
Using process substitution and read array in bash > v4.4 WITH EOL character
readarray -t sorted < <(sort < <(printf '%s\n' "${array[#]}"))
Using process substitution and read array in bash > v4.4 WITH NULL character
readarray -td '' sorted < <(sort -z < <(printf '%s\0' "${array[#]}"))
Finally we verify with
printf "[%s]\n" "${sorted[#]}"
output is
[3 5]
[a \n c]
[b]
[f]
Please, let me know if that is a correct test for embedded /n as both solutions produce the same result, but the first one is not supposed to work properly with embedded /n
I am not convinced that you'll need an external sorting program in Bash.
Here is my implementation for the simple bubble-sort algorithm.
function bubble_sort()
{ #
# Sorts all positional arguments and echoes them back.
#
# Bubble sorting lets the heaviest (longest) element sink to the bottom.
#
local array=($#) max=$(($# - 1))
while ((max > 0))
do
local i=0
while ((i < max))
do
if [ ${array[$i]} \> ${array[$((i + 1))]} ]
then
local t=${array[$i]}
array[$i]=${array[$((i + 1))]}
array[$((i + 1))]=$t
fi
((i += 1))
done
((max -= 1))
done
echo ${array[#]}
}
array=(a c b f 3 5)
echo " input: ${array[#]}"
echo "output: $(bubble_sort ${array[#]})"
This shall print:
input: a c b f 3 5
output: 3 5 a b c f
a=(e b 'c d')
shuf -e "${a[#]}" | sort >/tmp/f
mapfile -t g </tmp/f
Great answers here. Learned a lot. After reading them all, I figure I'd throw my hat into the ring. I think this is the shortest method (and probably faster as it doesn't do much shell script parsing, though there is the matter of the spawning of printf and sort, but they're only called once each) and handles whitespace in the data:
a=(3 "2 a" 1) # Setup!
IFS=$'\n' b=( $(printf "%s\n" "${a[#]}" | sort) ); unset IFS # Sort!
printf "'%s' " "${b[#]}"; # Success!
Outputs:
'1' '2 a' '3'
Note that the IFS change is limited in scope to the line it is on. if you know that the array has no whitespace in it, you don't need the IFS modification.
Inspiration was from #yas's answer and #Alcamtar comments.
EDIT
Oh, I somehow missed the actually accepted answer which is even shorter than mine. Doh!
IFS=$'\n' sorted=($(sort <<<"${array[*]}")); unset IFS
Turns out that the unset is required because this is a variable assignment that has no command.
I'd recommend going to that answer because it has some interesting stuff on globbing which could be relevant if the array has wildcards in it. It also has a detailed description as to what is happening.
EDIT 2
GNU has an extension in which sort delimits records using \0 which is good if you have LFs in your data. However, when it gets returned to the shell to be assign to an array, I don't see a good way convert it so that the shell will delimit on \0, because even setting IFS=$'\0', the shell doesn't like it and doesn't properly break it up.
array=(z 'b c'); { set "${array[#]}"; printf '%s\n' "$#"; } \
| sort \
| mapfile -t array; declare -p array
declare -a array=([0]="b c" [1]="z")
Open an inline function {...} to get a fresh set of positional arguments (e.g. $1, $2, etc).
Copy the array to the positional arguments. (e.g. set "${array[#]}" will copy the nth array argument to the nth positional argument. Note the quotes preserve whitespace that may be contained in an array element).
Print each positional argument (e.g. printf '%s\n' "$#" will print each positional argument on its own line. Again, note the quotes preserve whitespace that may be contained in each positional argument).
Then sort does its thing.
Read the stream into an array with mapfile (e.g. mapfile -t array reads each line into the variable array and the -t ignores the \n in each line).
Dump the array to show its been sorted.
As a function:
set +m
shopt -s lastpipe
sort_array() {
declare -n ref=$1
set "${ref[#]}"
printf '%s\n' "$#"
| sort \
| mapfile -t $ref
}
then
array=(z y x); sort_array array; declare -p array
declare -a array=([0]="x" [1]="y" [2]="z")
I look forward to being ripped apart by all the UNIX gurus! :)
sorted=($(echo ${array[#]} | tr " " "\n" | sort))
In the spirit of bash / linux, I would pipe the best command-line tool for each step. sort does the main job but needs input separated by newline instead of space, so the very simple pipeline above simply does:
Echo array content --> replace space by newline --> sort
$() is to echo the result
($()) is to put the "echoed result" in an array
Note: as #sorontar mentioned in a comment to a different question:
The sorted=($(...)) part is using the "split and glob" operator. You should turn glob off: set -f or set -o noglob or shopt -op noglob or an element of the array like * will be expanded to a list of files.

Resources