Converting a Bash array into a delimited string - arrays

I would like to know the following:
Why the given non-working example doesn't work.
If there are any other cleaner methods than those given in working example.
Non-working example
> ids=(1 2 3 4);echo ${ids[*]// /|}
1 2 3 4
> ids=(1 2 3 4);echo ${${ids[*]}// /|}
-bash: ${${ids[*]}// /|}: bad substitution
> ids=(1 2 3 4);echo ${"${ids[*]}"// /|}
-bash: ${"${ids[*]}"// /|}: bad substitution
Working example
> ids=(1 2 3 4);id="${ids[@]}";echo ${id// /|}
1|2|3|4
> ids=(1 2 3 4); lst=$( IFS='|'; echo "${ids[*]}" ); echo $lst
1|2|3|4
For context, the delimited string is to be used in a sed command for further parsing.

Because parentheses are used to delimit an array, not a string:
ids="1 2 3 4";echo ${ids// /|}
1|2|3|4
Some samples: Populating $ids with two strings: a b and c d
ids=("a b" "c d")
echo ${ids[*]// /|}
a|b c|d
IFS='|';echo "${ids[*]}";IFS=$' \t\n'
a b|c d
... and finally:
IFS='|';echo "${ids[*]// /|}";IFS=$' \t\n'
a|b|c|d
Here the array is joined using the first character of $IFS, and every space inside each element is replaced by |.
When you do:
id="${ids[@]}"
you copy the string built by joining the elements of the array ids with spaces into a new variable of type string.
Note: while "${ids[@]}" gives a space-separated string, "${ids[*]}" (with a star * instead of the at sign @) renders a string separated by the first character of $IFS.
what man bash says:
man -Len -Pcol\ -b bash | sed -ne '/^ *IFS /{N;N;p;q}'
IFS The Internal Field Separator that is used for word splitting
after expansion and to split lines into words with the read
builtin command. The default value is ``<space><tab><newline>''.
Playing with $IFS:
declare -p IFS
declare -- IFS="
"
printf "%q\n" "$IFS"
$' \t\n'
Literally a space, a tabulation and (meaning or) a line-feed. So, as long as the first character is a space, the use of * will do the same as @.
But:
{
IFS=: read -a array < <(echo root:x:0:0:root:/root:/bin/bash)
echo 1 "${array[@]}"
echo 2 "${array[*]}"
OIFS="$IFS" IFS=:
echo 3 "${array[@]}"
echo 4 "${array[*]}"
IFS="$OIFS"
}
1 root x 0 0 root /root /bin/bash
2 root x 0 0 root /root /bin/bash
3 root x 0 0 root /root /bin/bash
4 root:x:0:0:root:/root:/bin/bash
Note: The line IFS=: read -a array < <(...) uses : as the separator without setting $IFS permanently. This is shown by output line #2, which still has spaces as separators.
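The same scoping idea can be had with a function-local IFS. A minimal sketch, assuming a helper called join_by (a hypothetical name, not taken from the answer above) and a single-character delimiter, since "$*" only uses the first character of IFS:

```bash
#!/usr/bin/env bash
# join_by is a hypothetical helper name used only for illustration.
# "local IFS" confines the change to the function, so the caller's IFS is untouched.
join_by() {
    local IFS="$1"   # "$*" joins with the first character of IFS
    shift
    printf '%s\n' "$*"
}

ids=(1 2 3 4)
joined=$(join_by '|' "${ids[@]}")
echo "$joined"   # prints 1|2|3|4
```

The command substitution costs a subshell; that is the trade-off against changing IFS in the calling shell.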

You can use printf too, without any external commands or the need to manipulate IFS:
ids=(1 2 3 4) # create array
printf -v ids_d '|%s' "${ids[@]}" # yields "|1|2|3|4"
ids_d=${ids_d:1} # remove the leading '|'

Your first question is already addressed in F. Hauri's answer. Here's a canonical way to join the elements of an array:
ids=( 1 2 3 4 )
IFS=\| eval 'lst="${ids[*]}"'
Some people will cry out loud that eval is evil, yet it's perfectly safe here, thanks to the single quotes. This only has advantages: there are no subshells, IFS is not globally modified, it will not trim trailing newlines, and it's very simple.

A utility function to join an array of arguments into a delimited string:
#!/usr/bin/env bash
# Join arguments with delimiter
# @Params
# $1: The delimiter string
# ${@:2}: The arguments to join
# @Output
# >&1: The arguments separated by the delimiter string
array::join() {
(($#)) || return 1 # At least delimiter required
local -- delim="$1" str IFS=
shift
str="${*/#/$delim}" # Expands arguments with prefixed delimiter (Empty IFS)
printf '%s\n' "${str:${#delim}}" # Echo without first delimiter
}
declare -a my_array=( 'Paris' 'Berlin' 'London' 'Brussel' 'Madrid' 'Oslo' )
array::join ', ' "${my_array[@]}"
array::join '*' {1..9} | bc # 1*2*3*4*5*6*7*8*9=362880 Factorial 9
declare -a null_array=()
array::join '== Ultimate separator of nothing ==' "${null_array[@]}"
Output:
Paris, Berlin, London, Brussel, Madrid, Oslo
362880
Now with Bash 4.3+'s nameref variables, capturing output in a sub-shell is no longer needed.
#!/usr/bin/env bash
if ((BASH_VERSINFO[0] < 4 || (BASH_VERSINFO[0] == 4 && BASH_VERSINFO[1] < 3)))
then
printf 'Bash version 4.3 or above required for nameref variables\n' >&2
exit 1
fi
# Join arguments with delimiter
# @Params
# $1: The variable reference to receive the joined output
# $2: The delimiter string
# ${@:3}: The arguments to join
# @Output
array::join_to() {
(($# > 1)) || return 1 # At least nameref and delimiter required
local -n out="$1"
local -- delim="$2" str IFS=
shift 2
str="${*/#/$delim}" # Expands arguments with prefixed delimiter (Empty IFS)
# shellcheck disable=SC2034 # Nameref variable
out="${str:${#delim}}" # Discards prefixed delimiter
}
declare -g result1 result2 result3
declare -a my_array=( 'Paris' 'Berlin' 'London' 'Brussel' 'Madrid' 'Oslo' )
array::join_to result1 ', ' "${my_array[@]}"
array::join_to result2 '*' {1..9}
result2=$((result2)) # Expands arithmetic expression
declare -a null_array=()
array::join_to result3 '== Ultimate separator of nothing ==' "${null_array[@]}"
printf '%s\n' "$result1" "$result2" "$result3"

Related

BASH: Sorting Associative array without echoing it [duplicate]

I have an array in Bash, for example:
array=(a c b f 3 5)
I need to sort the array. Not just displaying the content in a sorted way, but to get a new array with the sorted elements. The new sorted array can be a completely new one or the old one.
You don't really need all that much code:
IFS=$'\n' sorted=($(sort <<<"${array[*]}"))
unset IFS
Supports whitespace in elements (as long as it's not a newline), and works in Bash 3.x.
e.g.:
$ array=("a c" b f "3 5")
$ IFS=$'\n' sorted=($(sort <<<"${array[*]}")); unset IFS
$ printf "[%s]\n" "${sorted[@]}"
[3 5]
[a c]
[b]
[f]
Note: @sorontar has pointed out that care is required if elements contain wildcards such as * or ?:
The sorted=($(...)) part is using the "split and glob" operator. You should turn glob off: set -f or set -o noglob or shopt -op noglob or an element of the array like * will be expanded to a list of files.
What's happening:
The result is a culmination of six things that happen in this order:
IFS=$'\n'
"${array[*]}"
<<<
sort
sorted=($(...))
unset IFS
First, the IFS=$'\n'
This is an important part of our operation that affects the outcome of 2 and 5 in the following way:
Given:
"${array[*]}" expands to every element delimited by the first character of IFS
sorted=() creates elements by splitting on every character of IFS
IFS=$'\n' sets things up so that elements are expanded using a new line as the delimiter, and then later created in a way that each line becomes an element. (i.e. Splitting on a new line.)
Delimiting by a new line is important because that's how sort operates (sorting per line). Splitting by only a new line is not as important, but is needed to preserve elements that contain spaces or tabs.
The default value of IFS is a space, a tab, followed by a new line, and would be unfit for our operation.
Next, the sort <<<"${array[*]}" part
<<<, called here strings, takes the expansion of "${array[*]}", as explained above, and feeds it into the standard input of sort.
With our example, sort is fed this following string:
a c
b
f
3 5
Since sort sorts, it produces:
3 5
a c
b
f
Next, the sorted=($(...)) part
The $(...) part, called command substitution, causes its content (sort <<<"${array[*]}") to run as a normal command, while taking the resulting standard output as the literal that goes wherever $(...) was.
In our example, this produces something similar to simply writing:
sorted=(3 5
a c
b
f
)
sorted then becomes an array that's created by splitting this literal on every new line.
Finally, the unset IFS
This makes word splitting behave as if IFS had its default value again, and is just good practice.
It's to ensure we don't cause trouble with anything that relies on IFS later in our script. (Otherwise we'd need to remember that we've switched things around--something that might be impractical for complex scripts.)
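The globbing caveat quoted earlier can be handled by switching noglob on around the assignment. A sketch under two assumptions that are not part of the answer: globbing was enabled beforehand (so set +f restores the previous state), and LC_ALL=C is added only to make the resulting order predictable:

```bash
#!/usr/bin/env bash
# Sketch: same technique, with globbing disabled while the unquoted $(...) is split.
array=("a c" b '*' "3 5")

set -f        # noglob: keeps '*' from expanding to file names
IFS=$'\n' sorted=($(LC_ALL=C sort <<<"${array[*]}"))
unset IFS     # word splitting behaves as default again
set +f        # re-enable globbing (assumes it was enabled before)

printf '[%s]\n' "${sorted[@]}"
```

In the C locale the '*' element sorts first and stays a literal asterisk instead of becoming a file list.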
Original response:
array=(a c b "f f" 3 5)
readarray -t sorted < <(for a in "${array[@]}"; do echo "$a"; done | sort)
output:
$ for a in "${sorted[@]}"; do echo "$a"; done
3
5
a
b
c
f f
Note this version copes with values that contain special characters or whitespace (except newlines).
Note readarray is supported in bash 4+.
Edit Based on the suggestion by @Dimitre I had updated it to:
readarray -t sorted < <(printf '%s\0' "${array[@]}" | sort -z | xargs -0n1)
which has the benefit of even sorting elements with embedded newline characters correctly. Unfortunately, as correctly signaled by @ruakh, this didn't mean the result of readarray would be correct, because readarray (before Bash 4.4 added -d) had no option to use NUL instead of regular newlines as line separators.
If you don't need to handle special shell characters in the array elements:
array=(a c b f 3 5)
sorted=($(printf '%s\n' "${array[@]}"|sort))
With bash you'll need an external sorting program anyway.
With zsh no external programs are needed and special shell characters are easily handled:
% array=('a a' c b f 3 5); printf '%s\n' "${(o)array[@]}"
3
5
a a
b
c
f
ksh has set -s to sort ASCIIbetically.
Here's a pure Bash quicksort implementation:
#!/bin/bash
# quicksorts positional arguments
# return is in array qsort_ret
qsort() {
local pivot i smaller=() larger=()
qsort_ret=()
(($#==0)) && return 0
pivot=$1
shift
for i; do
# This sorts strings lexicographically.
if [[ $i < $pivot ]]; then
smaller+=( "$i" )
else
larger+=( "$i" )
fi
done
qsort "${smaller[@]}"
smaller=( "${qsort_ret[@]}" )
qsort "${larger[@]}"
larger=( "${qsort_ret[@]}" )
qsort_ret=( "${smaller[@]}" "$pivot" "${larger[@]}" )
}
Use as, e.g.,
$ array=(a c b f 3 5)
$ qsort "${array[@]}"
$ declare -p qsort_ret
declare -a qsort_ret='([0]="3" [1]="5" [2]="a" [3]="b" [4]="c" [5]="f")'
This implementation is recursive… so here's an iterative quicksort:
#!/bin/bash
# quicksorts positional arguments
# return is in array qsort_ret
# Note: iterative, NOT recursive! :)
qsort() {
(($#==0)) && return 0
local stack=( 0 $(($#-1)) ) beg end i pivot smaller larger
qsort_ret=("$@")
while ((${#stack[@]})); do
beg=${stack[0]}
end=${stack[1]}
stack=( "${stack[@]:2}" )
smaller=() larger=()
pivot=${qsort_ret[beg]}
for ((i=beg+1;i<=end;++i)); do
if [[ "${qsort_ret[i]}" < "$pivot" ]]; then
smaller+=( "${qsort_ret[i]}" )
else
larger+=( "${qsort_ret[i]}" )
fi
done
qsort_ret=( "${qsort_ret[@]:0:beg}" "${smaller[@]}" "$pivot" "${larger[@]}" "${qsort_ret[@]:end+1}" )
if ((${#smaller[@]}>=2)); then stack+=( "$beg" "$((beg+${#smaller[@]}-1))" ); fi
if ((${#larger[@]}>=2)); then stack+=( "$((end-${#larger[@]}+1))" "$end" ); fi
done
}
In both cases, you can change the ordering: I used string comparisons, but you can use arithmetic comparisons, compare by file modification time, etc.; just use the appropriate test. You can even make it more generic and have it take a first argument that is the test function to use, e.g.,
#!/bin/bash
# quicksorts positional arguments
# return is in array qsort_ret
# Note: iterative, NOT recursive! :)
# First argument is a function name that takes two arguments and compares them
qsort() {
(($#<=1)) && return 0
local compare_fun=$1
shift
local stack=( 0 $(($#-1)) ) beg end i pivot smaller larger
qsort_ret=("$@")
while ((${#stack[@]})); do
beg=${stack[0]}
end=${stack[1]}
stack=( "${stack[@]:2}" )
smaller=() larger=()
pivot=${qsort_ret[beg]}
for ((i=beg+1;i<=end;++i)); do
if "$compare_fun" "${qsort_ret[i]}" "$pivot"; then
smaller+=( "${qsort_ret[i]}" )
else
larger+=( "${qsort_ret[i]}" )
fi
done
qsort_ret=( "${qsort_ret[@]:0:beg}" "${smaller[@]}" "$pivot" "${larger[@]}" "${qsort_ret[@]:end+1}" )
if ((${#smaller[@]}>=2)); then stack+=( "$beg" "$((beg+${#smaller[@]}-1))" ); fi
if ((${#larger[@]}>=2)); then stack+=( "$((end-${#larger[@]}+1))" "$end" ); fi
done
}
Then you can have this comparison function:
compare_mtime() { [[ $1 -nt $2 ]]; }
and use:
$ qsort compare_mtime *
$ declare -p qsort_ret
to have the files in current folder sorted by modification time (newest first).
NOTE. These functions are pure Bash! no external utilities, and no subshells! they are safe wrt any funny symbols you may have (spaces, newline characters, glob characters, etc.).
NOTE2. The test [[ $i < $pivot ]] is correct. It uses the lexicographical string comparison. If your array only contains integers and you want to sort numerically, use ((i < pivot)) instead.
Please don't edit this answer to change that. It has already been edited (and rolled back) a couple of times. The test I gave here is correct and corresponds to the output given in the example: the example uses both strings and numbers, and the purpose is to sort it in lexicographical order. Using ((i < pivot)) in this case is wrong.
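For integer-only input, the generic variant can be given an arithmetic comparator. The sketch below repeats the comparator-based qsort from above so that it runs on its own; compare_num is a hypothetical name introduced here for illustration:

```bash
#!/usr/bin/env bash
# Generic iterative quicksort as given above: first argument is a comparison function.
qsort() {
    (($#<=1)) && return 0
    local compare_fun=$1
    shift
    local stack=( 0 $(($#-1)) ) beg end i pivot smaller larger
    qsort_ret=("$@")
    while ((${#stack[@]})); do
        beg=${stack[0]}
        end=${stack[1]}
        stack=( "${stack[@]:2}" )
        smaller=() larger=()
        pivot=${qsort_ret[beg]}
        for ((i=beg+1;i<=end;++i)); do
            if "$compare_fun" "${qsort_ret[i]}" "$pivot"; then
                smaller+=( "${qsort_ret[i]}" )
            else
                larger+=( "${qsort_ret[i]}" )
            fi
        done
        qsort_ret=( "${qsort_ret[@]:0:beg}" "${smaller[@]}" "$pivot" "${larger[@]}" "${qsort_ret[@]:end+1}" )
        if ((${#smaller[@]}>=2)); then stack+=( "$beg" "$((beg+${#smaller[@]}-1))" ); fi
        if ((${#larger[@]}>=2)); then stack+=( "$((end-${#larger[@]}+1))" "$end" ); fi
    done
}

# compare_num (hypothetical): succeeds when $1 is numerically smaller than $2.
compare_num() { (( $1 < $2 )); }

qsort compare_num 10 9 2 33
declare -p qsort_ret   # elements in order: 2 9 10 33
```

With the lexicographic test the same input would come out as 10 2 33 9, which is why the comparator has to match the data.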
tl;dr:
Sort array a_in and store the result in a_out (elements must not have embedded newlines[1]):
Bash v4+:
readarray -t a_out < <(printf '%s\n' "${a_in[@]}" | sort)
Bash v3:
IFS=$'\n' read -d '' -r -a a_out < <(printf '%s\n' "${a_in[@]}" | sort)
Advantages over antak's solution:
You needn't worry about accidental globbing (accidental interpretation of the array elements as filename patterns), so no extra command is needed to disable globbing (set -f, and set +f to restore it later).
You needn't worry about resetting IFS with unset IFS.[2]
Optional reading: explanation and sample code
The above combines Bash code with external utility sort for a solution that works with arbitrary single-line elements and either lexical or numerical sorting (optionally by field):
Performance: For around 20 elements or more, this will be faster than a pure Bash solution - significantly and increasingly so once you get beyond around 100 elements.
(The exact thresholds will depend on your specific input, machine, and platform.)
The reason it is fast is that it avoids Bash loops.
printf '%s\n' "${a_in[@]}" | sort performs the sorting (lexically, by default - see sort's POSIX spec):
"${a_in[@]}" safely expands to the elements of array a_in as individual arguments, whatever they contain (including whitespace).
printf '%s\n' then prints each argument - i.e., each array element - on its own line, as-is.
Note the use of a process substitution (<(...)) to provide the sorted output as input to read / readarray (via redirection to stdin, <), because read / readarray must run in the current shell (must not run in a subshell) in order for output variable a_out to be visible to the current shell (for the variable to remain defined in the remainder of the script).
Reading sort's output into an array variable:
Bash v4+: readarray -t a_out reads the individual lines output by sort into the elements of array variable a_out, without including the trailing \n in each element (-t).
Bash v3: readarray doesn't exist, so read must be used:
IFS=$'\n' read -d '' -r -a a_out tells read to read into array (-a) variable a_out, reading the entire input, across lines (-d ''), but splitting it into array elements by newlines (IFS=$'\n'. $'\n', which produces a literal newline (LF), is a so-called ANSI C-quoted string).
(-r, an option that should virtually always be used with read, disables unexpected handling of \ characters.)
Annotated sample code:
#!/usr/bin/env bash
# Define input array `a_in`:
# Note the element with embedded whitespace ('a c') and the element that looks like
# a glob ('*'), chosen to demonstrate that elements with line-internal whitespace
# and glob-like contents are correctly preserved.
a_in=( 'a c' b f 5 '*' 10 )
# Sort and store output in array `a_out`
# Saving back into `a_in` is also an option.
IFS=$'\n' read -d '' -r -a a_out < <(printf '%s\n' "${a_in[@]}" | sort)
# Bash 4.x: use the simpler `readarray -t`:
# readarray -t a_out < <(printf '%s\n' "${a_in[@]}" | sort)
# Print sorted output array, line by line:
printf '%s\n' "${a_out[@]}"
Due to use of sort without options, this yields lexical sorting (digits sort before letters, and digit sequences are treated lexically, not as numbers):
*
10
5
a c
b
f
If you wanted numerical sorting by the 1st field, you'd use sort -k1,1n instead of just sort, which yields (non-numbers sort before numbers, and numbers sort correctly):
*
a c
b
f
5
10
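When every element is a plain integer, sort -n is enough. A small sketch (Bash 4+ for readarray; the sample values are made up for illustration):

```bash
#!/usr/bin/env bash
# Numeric sort of an integer-only array via external sort -n.
a_in=(10 5 1 20)
readarray -t a_out < <(printf '%s\n' "${a_in[@]}" | sort -n)
printf '%s\n' "${a_out[@]}"   # one number per line: 1, 5, 10, 20
```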
[1] To handle elements with embedded newlines, use the following variant (Bash v4.4+, which added readarray -d, with GNU sort):
readarray -d '' -t a_out < <(printf '%s\0' "${a_in[@]}" | sort -z).
Michał Górny's helpful answer has a Bash v3 solution.
[2] While IFS is set in the Bash v3 variant, the change is scoped to the command.
By contrast, what follows IFS=$'\n'  in antak's answer is an assignment rather than a command, in which case the IFS change is global.
On the 3-hour train trip from Munich to Frankfurt (which I had trouble reaching because Oktoberfest starts tomorrow) I was thinking about my first post. Employing a global array is a much better idea for a general sort function. The following function handles arbitrary strings (newlines, blanks etc.):
declare BSORT=()
function bubble_sort()
{ #
# @param [ARGUMENTS]...
#
# Sort all positional arguments and store them in global array BSORT.
# Without arguments sort this array. Return the number of iterations made.
#
# Bubble sorting lets the heaviest element sink to the bottom.
#
(($# > 0)) && BSORT=("$@")
local j=0 ubound=$((${#BSORT[*]} - 1))
while ((ubound > 0))
do
local i=0
while ((i < ubound))
do
if [ "${BSORT[$i]}" \> "${BSORT[$((i + 1))]}" ]
then
local t="${BSORT[$i]}"
BSORT[$i]="${BSORT[$((i + 1))]}"
BSORT[$((i + 1))]="$t"
fi
((++i))
done
((++j))
((--ubound))
done
echo $j
}
bubble_sort a c b 'z y' 3 5
echo ${BSORT[@]}
This prints:
3 5 a b c z y
The same output is created from
BSORT=(a c b 'z y' 3 5)
bubble_sort
echo ${BSORT[@]}
Note that probably Bash internally uses smart-pointers, so the swap-operation could be cheap (although I doubt it). However, bubble_sort demonstrates that more advanced functions like merge_sort are also in the reach of the shell language.
Another solution that uses external sort and copes with any special characters (except for NULs :)). Should work with bash-3.2 and GNU or BSD sort (sadly, POSIX doesn't include -z).
local e new_array=()
while IFS= read -r -d '' e; do
new_array+=( "${e}" )
done < <(printf "%s\0" "${array[@]}" | LC_ALL=C sort -z)
First look at the input redirection at the end. We're using printf built-in to write out the array elements, zero-terminated. The quoting makes sure array elements are passed as-is, and specifics of shell printf cause it to reuse the last part of format string for each remaining parameter. That is, it's equivalent to something like:
for e in "${array[@]}"; do
printf "%s\0" "${e}"
done
The null-terminated element list is then passed to sort. The -z option causes it to read null-terminated elements, sort them and output null-terminated as well. If you needed to get only the unique elements, you can pass -u since it is more portable than uniq -z. The LC_ALL=C ensures stable sort order independently of locale — sometimes useful for scripts. If you want the sort to respect locale, remove that.
The <() construct obtains the descriptor to read from the spawned pipeline, and < redirects the standard input of the while loop to it. If you need to access the standard input inside the pipe, you may use another descriptor — exercise for the reader :).
Now, back to the beginning. The read built-in reads output from the redirected stdin. Setting empty IFS disables word splitting which is unnecessary here — as a result, read reads the whole 'line' of input to the single provided variable. -r option disables escape processing that is undesired here as well. Finally, -d '' sets the line delimiter to NUL — that is, tells read to read zero-terminated strings.
As a result, the loop is executed once for every successive zero-terminated array element, with the value being stored in e. The example just puts the items in another array but you may prefer to process them directly :).
Of course, that's just one of the many ways of achieving the same goal. As I see it, it is simpler than implementing complete sorting algorithm in bash and in some cases it will be faster. It handles all special characters including newlines and should work on most of the common systems. Most importantly, it may teach you something new and awesome about bash :).
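The -u remark can be tried out with the same loop. A sketch assuming a sort that supports -z (GNU or BSD, as stated above); the sample array, including one element with an embedded newline, is made up for illustration:

```bash
#!/usr/bin/env bash
# Sort and de-duplicate NUL-terminated elements; newlines inside elements survive.
array=(b a b $'c\nd' a)
unique=()
while IFS= read -r -d '' e; do
    unique+=( "$e" )
done < <(printf '%s\0' "${array[@]}" | LC_ALL=C sort -z -u)

printf '<%s>\n' "${unique[@]}"
```

Three elements remain: a, b, and the two-line element, which is printed across two lines inside one pair of angle brackets.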
Keep it simple ;)
In the following example, the array b is the sorted version of the array a!
The second line echos each item of the array a, then pipes them to the sort command, and the output is used to initiate the array b.
a=(2 3 1)
b=( $( for x in ${a[@]}; do echo $x; done | sort ) )
echo ${b[@]} # output: 1 2 3
min sort:
#!/bin/bash
array=(.....)
index_of_element1=0
while (( ${index_of_element1} < ${#array[@]} )); do
element_1="${array[${index_of_element1}]}"
index_of_element2=$((index_of_element1 + 1))
index_of_min=${index_of_element1}
min_element="${element_1}"
for element_2 in "${array[@]:$((index_of_element1 + 1))}"; do
min_element="`printf "%s\n%s" "${min_element}" "${element_2}" | sort | head -n+1`"
if [[ "${min_element}" == "${element_2}" ]]; then
index_of_min=${index_of_element2}
fi
let index_of_element2++
done
array[${index_of_element1}]="${min_element}"
array[${index_of_min}]="${element_1}"
let index_of_element1++
done
try this:
echo ${array[@]} | awk 'BEGIN{RS=" ";} {print $1}' | sort
Output will be:
3
5
a
b
c
f
Problem solved.
If you can compute a unique integer for each element in the array, like this:
tab='0123456789abcdefghijklmnopqrstuvwxyz'
# build the reversed ordinal map
for ((i = 0; i < ${#tab}; i++)); do
declare -g ord_${tab:i:1}=$i
done
function sexy_int() {
local sum=0
local i ch ref
for ((i = 0; i < ${#1}; i++)); do
ch="${1:i:1}"
ref="ord_$ch"
(( sum += ${!ref} ))
done
return $sum
}
sexy_int hello
echo "hello -> $?"
sexy_int world
echo "world -> $?"
then you can use these integers as array indexes; because Bash always uses sparse arrays, there is no need to worry about unused indexes:
array=(a c b f 3 5)
for el in "${array[@]}"; do
sexy_int "$el"
sorted[$?]="$el"
done
echo "${sorted[@]}"
Pros. Fast.
Cons. Duplicated elements are merged, and it can be impossible to map contents to unique integers: as written, the value travels through the function's return status, which only holds 0 to 255.
array=(a c b f 3 5)
new_array=($(echo "${array[@]}" | sed 's/ /\n/g' | sort))
echo ${new_array[@]}
The contents of new_array will be:
3 5 a b c f
There is a workaround for the usual problem of spaces and newlines:
Use a character that is not in the original array (like $'\1' or $'\4' or similar).
This function gets the job done:
# Sort an array that may contain spaces or newlines, using a workaround char (wa=$'\4')
sortarray(){ local wa=$'\4' IFS=''
if [[ $* =~ [$wa] ]]; then
echo "$0: error: array contains the workaround char" >&2
exit 1
fi
set -f; local IFS=$'\n' x nl=$'\n'
set -- $(printf '%s\n' "${@//$nl/$wa}" | sort -n)
for x
do sorted+=("${x//$wa/$nl}")
done
}
This will sort the array:
$ array=( a b 'c d' $'e\nf' $'g\1h')
$ sortarray "${array[@]}"
$ printf '<%s>\n' "${sorted[@]}"
<a>
<b>
<c d>
<e
f>
<gh>
This will complain that the source array contains the workaround character:
$ array=( a b 'c d' $'e\nf' $'g\4h')
$ sortarray "${array[@]}"
./script: error: array contains the workaround char
description
We set two local variables: wa (the workaround char) and a null IFS.
Then (with IFS null) we test that the whole array $*
does not contain any workaround char: [[ $* =~ [$wa] ]].
If it does, print a message and signal an error: exit 1
Avoid filename expansions: set -f
Set a new value of IFS (IFS=$'\n'), a loop variable x and a newline var (nl=$'\n').
We print all values of the arguments received (the input array $@),
but we replace any newline with the workaround char: "${@//$nl/$wa}".
send those values to be sorted sort -n.
and place back all the sorted values in the positional arguments set --.
Then we assign each argument one by one (to preserve newlines).
in a loop for x
to a new array: sorted+=(…)
inside quotes to preserve any existing newline.
restoring the workaround to a newline "${x//$wa/$nl}".
done
This question looks closely related. And BTW, here's a mergesort in Bash (without external processes):
mergesort() {
local -n -r input_reference="$1"
local -n output_reference="$2"
local -r -i size="${#input_reference[@]}"
local merge previous
local -a -i runs indices
local -i index previous_idx merged_idx \
run_a_idx run_a_stop \
run_b_idx run_b_stop
output_reference=("${input_reference[@]}")
if ((size == 0)); then return; fi
previous="${output_reference[0]}"
runs=(0)
for ((index = 0;;)) do
for ((++index;; ++index)); do
if ((index >= size)); then break 2; fi
if [[ "${output_reference[index]}" < "$previous" ]]; then break; fi
previous="${output_reference[index]}"
done
previous="${output_reference[index]}"
runs+=(index)
done
runs+=(size)
while (("${#runs[@]}" > 2)); do
indices=("${!runs[@]}")
merge=("${output_reference[@]}")
for ((index = 0; index < "${#indices[@]}" - 2; index += 2)); do
merged_idx=runs[indices[index]]
run_a_idx=merged_idx
previous_idx=indices[$((index + 1))]
run_a_stop=runs[previous_idx]
run_b_idx=runs[previous_idx]
run_b_stop=runs[indices[$((index + 2))]]
unset runs[previous_idx]
while ((run_a_idx < run_a_stop && run_b_idx < run_b_stop)); do
if [[ "${merge[run_a_idx]}" < "${merge[run_b_idx]}" ]]; then
output_reference[merged_idx++]="${merge[run_a_idx++]}"
else
output_reference[merged_idx++]="${merge[run_b_idx++]}"
fi
done
while ((run_a_idx < run_a_stop)); do
output_reference[merged_idx++]="${merge[run_a_idx++]}"
done
while ((run_b_idx < run_b_stop)); do
output_reference[merged_idx++]="${merge[run_b_idx++]}"
done
done
done
}
declare -ar input=({z..a}{z..a})
declare -a output
mergesort input output
echo "${input[@]}"
echo "${output[@]}"
Many thanks to the people that answered before me. Using their excellent input, the bash documentation and ideas from other threads, this is what works perfectly for me without an IFS change:
array=("a \n c" b f "3 5")
Using process substitution and read array in bash > v4.4 WITH EOL character
readarray -t sorted < <(sort < <(printf '%s\n' "${array[@]}"))
Using process substitution and read array in bash > v4.4 WITH NULL character
readarray -td '' sorted < <(sort -z < <(printf '%s\0' "${array[@]}"))
Finally we verify with
printf "[%s]\n" "${sorted[@]}"
output is
[3 5]
[a \n c]
[b]
[f]
Please let me know if that is a correct test for an embedded \n, as both solutions produce the same result, but the first one is not supposed to work properly with an embedded \n.
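One possible check, offered as a sketch rather than a verdict on the answer: inside double quotes, "a \n c" holds a literal backslash followed by n, not a line feed, which would explain why both variants agree. ANSI-C quoting ($'a \n c') puts a real newline into the element. This assumes Bash 4.4+ and a sort that supports -z; by_line and by_nul are names chosen here for illustration:

```bash
#!/usr/bin/env bash
# First element contains a real line feed thanks to $'...' quoting.
array=($'a \n c' b f "3 5")

# Line-based: the embedded newline splits the first element in two.
readarray -t by_line < <(sort < <(printf '%s\n' "${array[@]}"))

# NUL-based: the element stays in one piece.
readarray -td '' by_nul < <(sort -z < <(printf '%s\0' "${array[@]}"))

echo "line-based: ${#by_line[@]} elements"   # 5
echo "NUL-based:  ${#by_nul[@]} elements"    # 4
```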
I am not convinced that you'll need an external sorting program in Bash.
Here is my implementation for the simple bubble-sort algorithm.
function bubble_sort()
{ #
# Sorts all positional arguments and echoes them back.
#
# Bubble sorting lets the heaviest (longest) element sink to the bottom.
#
local array=($@) max=$(($# - 1))
while ((max > 0))
do
local i=0
while ((i < max))
do
if [ ${array[$i]} \> ${array[$((i + 1))]} ]
then
local t=${array[$i]}
array[$i]=${array[$((i + 1))]}
array[$((i + 1))]=$t
fi
((i += 1))
done
((max -= 1))
done
echo ${array[@]}
}
array=(a c b f 3 5)
echo " input: ${array[@]}"
echo "output: $(bubble_sort ${array[@]})"
This shall print:
input: a c b f 3 5
output: 3 5 a b c f
a=(e b 'c d')
shuf -e "${a[@]}" | sort >/tmp/f
mapfile -t g </tmp/f
Great answers here. Learned a lot. After reading them all, I figure I'd throw my hat into the ring. I think this is the shortest method (and probably faster as it doesn't do much shell script parsing, though there is the matter of the spawning of printf and sort, but they're only called once each) and handles whitespace in the data:
a=(3 "2 a" 1) # Setup!
IFS=$'\n' b=( $(printf "%s\n" "${a[@]}" | sort) ); unset IFS # Sort!
printf "'%s' " "${b[@]}"; # Success!
Outputs:
'1' '2 a' '3'
Note that the IFS change is limited to the line it is on only because of the trailing unset IFS. If you know that the array has no whitespace in it, you don't need the IFS modification.
Inspiration was from @yas's answer and @Alcamtar's comments.
EDIT
Oh, I somehow missed the actually accepted answer which is even shorter than mine. Doh!
IFS=$'\n' sorted=($(sort <<<"${array[*]}")); unset IFS
Turns out that the unset is required because this is a variable assignment that has no command.
I'd recommend going to that answer because it has some interesting stuff on globbing which could be relevant if the array has wildcards in it. It also has a detailed description as to what is happening.
EDIT 2
GNU has an extension in which sort delimits records using \0, which is good if you have LFs in your data. However, when it gets returned to the shell to be assigned to an array, I don't see a good way to convert it so that the shell will delimit on \0, because even when setting IFS=$'\0', the shell doesn't like it and doesn't properly break it up.
array=(z 'b c'); { set "${array[@]}"; printf '%s\n' "$@"; } \
| sort \
| mapfile -t array; declare -p array
declare -a array=([0]="b c" [1]="z")
Open an inline function {...} to get a fresh set of positional arguments (e.g. $1, $2, etc).
Copy the array to the positional arguments. (e.g. set "${array[@]}" will copy the nth array argument to the nth positional argument. Note the quotes preserve whitespace that may be contained in an array element).
Print each positional argument (e.g. printf '%s\n' "$@" will print each positional argument on its own line. Again, note the quotes preserve whitespace that may be contained in each positional argument).
Then sort does its thing.
Read the stream into an array with mapfile (e.g. mapfile -t array reads each line into the variable array and the -t ignores the \n in each line).
Dump the array to show its been sorted.
As a function:
set +m
shopt -s lastpipe
sort_array() {
declare -n ref=$1
set "${ref[@]}"
printf '%s\n' "$@" \
| sort \
| mapfile -t ref
}
then
array=(z y x); sort_array array; declare -p array
declare -a array=([0]="x" [1]="y" [2]="z")
I look forward to being ripped apart by all the UNIX gurus! :)
sorted=($(echo ${array[@]} | tr " " "\n" | sort))
In the spirit of bash / linux, I would pipe the best command-line tool for each step. sort does the main job but needs input separated by newline instead of space, so the very simple pipeline above simply does:
Echo array content --> replace space by newline --> sort
$() is to echo the result
($()) is to put the "echoed result" in an array
Note: as @sorontar mentioned in a comment to a different question:
The sorted=($(...)) part is using the "split and glob" operator. You should turn glob off: set -f or set -o noglob or shopt -op noglob or an element of the array like * will be expanded to a list of files.

How to create multiple arrays from a text file and loop through the values of each array

I have a text file with the following:
Paige
Buckley
Govan
Mayer
King
Harrison
Atkins
Reinhardt
Wilson
Vaughan
Sergovia
Tarrega
My goal is to create an array for each set of names. Then Iterate through the first array of values then move on to the second array of values and lastly the third array. Each set is separated by a new line in the text file. Help with code or logic is much appreciated!
So far I have the following. I am unsure of the logic moving forward when I reach a line break. My research here also suggests that I can use readarray -d.
#!/bin/bash
my_array=()
while IFS= read -r line || [[ "$line" ]]; do
if [[ $line -eq "" ]];
.
.
.
arr+=("$line") # i know this adds the value to the array
done < "$1"
printf '%s\n' "${my_array[@]}"
desired output:
array1 = (Paige Buckley Govan Mayer King)
array2 = (Harrison Atkins Reinhardt Wilson)
array3 = (Vaughan Sergovia Tarrega)
#then loop through the each array one after the other.
Bash has no array-of-arrays, so you have to represent it another way.
You could leave the newlines and have an array of newline separated elements:
array=()
elem=""
while IFS= read -r line; do
if [[ "$line" != "" ]]; then
elem+="${elem:+$'\n'}$line" # accumulate lines in elem
else
array+=("$elem") # flush elem as array element
elem=""
fi
done
if [[ -n "$elem" ]]; then
array+=("$elem") # flush the last elem
fi
# iterate over array
for ((i=0;i<${#array[@]};++i)); do
# each array element is newline separated items
readarray -t elem <<<"${array[i]}"
printf 'array%d = (%s)\n' "$i" "${elem[*]}"
done
You could simplify the loop by having sed (GNU sed, for -z) replace each blank line with some unique character, for example:
readarray -d '#' -t array < <(sed -z 's/\n\n/#/g' file)
But overall, this awk generates same output:
awk -v RS= -v FS='\n' '{
printf "array%d = (", NR;
for (i=1;i<=NF;++i) printf "%s%s", $i, i==NF?"":" ";
printf ")\n"
}'
Using nameref :
#!/usr/bin/env bash
declare -a array1 array2 array3
declare -n array=array$((n=1))
while IFS= read -r line; do
test "$line" = "" && declare -n array=array$((n=n+1)) || array+=("$line")
done < "$1"
declare -p array1 array2 array3
Called with :
bash test.sh data
# result
declare -a array1=([0]="Paige" [1]="Buckley" [2]="Govan" [3]="Mayer" [4]="King")
declare -a array2=([0]="Harrison" [1]="Atkins" [2]="Reinhardt" [3]="Wilson")
declare -a array3=([0]="Vaughan" [1]="Sergovia" [2]="Tarrega")
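To then loop through each array in turn, one option is to point a nameref at each array name. The following is only a minimal sketch, assuming Bash 4.3+ and arrays named array1 to array3 as above. The sample values are hard-coded, and the names group and visited are made up for illustration.

```bash
#!/usr/bin/env bash
# Sample data standing in for the arrays built by the script above.
array1=(Paige Buckley Govan Mayer King)
array2=(Harrison Atkins Reinhardt Wilson)
array3=(Vaughan Sergovia Tarrega)

visited=()   # hypothetical: records the order in which names were seen
for name in array1 array2 array3; do
    declare -n group="$name"      # group is now an alias for the named array
    for person in "${group[@]}"; do
        printf '%s: %s\n' "$name" "$person"
        visited+=("$person")
    done
    unset -n group                # remove the nameref itself, not the array it points to
done
```

The loop variable and the nameref are kept separate on purpose, since a nameref used directly as a for-loop variable behaves differently.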
Assumptions:
blank lines are truly blank (ie, no need to worry about any white space on said lines)
could have consecutive blank lines
names could have embedded white space
the number of groups could vary and won't always be 3 (as with the sample data provided in the question)
OP is open to using a (simulated) 2-dimensional array as opposed to a (variable) number of 1-dimensional arrays
My data file:
$ cat names.dat
<<< leading blank lines
Paige
Buckley
Govan
Mayer
King Kong
<<< consecutive blank lines
Harrison
Atkins
Reinhardt
Wilson
Larry
Moe
Curly
Shemp
Vaughan
Sergovia
Tarrega
<<< trailing blank lines
One idea that uses a couple arrays:
array #1: associative array - the previously mentioned (simulated) 2-dimensional array with the index - [x,y] - where x is a unique identifier for a group of names and y is a unique identifier for a name within a group
array #2: 1-dimensional array to keep track of max(y) for each group x
Loading the arrays:
unset names max_y # make sure array names are not already in use
declare -A names # declare associative array
x=1 # init group counter
y=0 # init name counter
max_y=() # initialize the max(y) array
inc= # clear increment flag
while read -r name
do
if [[ "${name}" = '' ]] # if we found a blank line ...
then
[[ "${y}" -eq 0 ]] && # if this is a leading blank line then ...
continue # ignore and skip to the next line
inc=y # set flag to increment 'x'
else
[[ "${inc}" = 'y' ]] && # if increment flag is set ...
max_y[${x}]="${y}" && # make note of max(y) for this 'x'
((x++)) && # increment 'x' (group counter)
y=0 && # reset 'y'
inc= # clear increment flag
((y++)) # increment 'y' (name counter)
names[${x},${y}]="${name}" # save the name
fi
done < names.dat
max_y[${x}]="${y}" # make note of the last max(y) value
Contents of the array:
$ typeset -p names
declare -A names=([1,5]="King Kong" [1,4]="Mayer" [1,1]="Paige" [1,3]="Govan" [1,2]="Buckley" [3,4]="Shemp" [3,3]="Curly" [3,2]="Moe" [3,1]="Larry" [2,4]="Wilson" [2,2]="Atkins" [2,3]="Reinhardt" [2,1]="Harrison" [4,1]="Vaughan" [4,2]="Sergovia" [4,3]="Tarrega" )
$ for (( i=1; i<=${x}; i++ ))
do
for (( j=1; j<=${max_y[${i}]}; j++ ))
do
echo "names[${i},${j}] : ${names[${i},${j}]}"
done
echo ""
done
names[1,1] : Paige
names[1,2] : Buckley
names[1,3] : Govan
names[1,4] : Mayer
names[1,5] : King Kong
names[2,1] : Harrison
names[2,2] : Atkins
names[2,3] : Reinhardt
names[2,4] : Wilson
names[3,1] : Larry
names[3,2] : Moe
names[3,3] : Curly
names[3,4] : Shemp
names[4,1] : Vaughan
names[4,2] : Sergovia
names[4,3] : Tarrega

Passing and Parsing an Array from one shell script to another

Due to the limitation of 9 parameters in a script, my objective is to pass about 30 strings bundled in an array from calling script (scriptA) to called script (scriptB).
My scriptA looks something like this...
#!/bin/bash
declare -a arr=( ab "c d" 123 "string with spaces" 456 )
. ./scriptB.sh "Task Name" "${arr[@]}"
My scriptB looks something like this...
#!/bin/bash
arg1="$1"
shift
arg2=("$@")
read -a arr1 <<< "$@"
j=0
for i in "${arr1[@]}"; do
#echo ${arr1[j]}
((j++))
case "$j" in
"1")
param1="${i//(}"
echo "$j=$param1"
;;
"2")
param2="${i}"
echo "$j=$param2"
;;
"3")
param3="${i}"
echo "$j=$param3"
;;
"4")
param4="${i}"
echo "$j=$param4"
;;
"5")
param5="${i//)}"
echo "$j=$param5"
;;
esac
done
OUTPUT:
1=ab
2=c
3=d
4=123
5=string
Problem:
1. I see parenthesis ( and ) gets added to the string which I have to strip them out
2. I see an array element (with spaces) though quoted under double quotes get to interpreted as separate elements by spaces.
read -a arr1 <<< "$@"
is wrong. The "$@" here is equivalent to "$*", and read then splits the input on whitespace (spaces, tabs and newlines), interprets backslashes, and assigns the result to array arr1. Remember to use read -r.
Do:
arr1=("$@")
to assign to an array. Then you could print with:
for ((i=0;i<${#arr1[@]};++i)); do
printf "%d=%s\n" "$((i+1))" "${arr1[i]}"
done
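As an illustration of the difference, here is a small sketch that collects the same arguments both ways. The function receive and the variables task, received and wrong are hypothetical stand-ins for scriptB.sh, not part of the original scripts.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for scriptB.sh: the first argument is the task name,
# everything after it is collected into the global array `received`.
receive() {
    task=$1
    shift
    received=("$@")             # one array element per argument, spaces intact
    wrong=()
    read -r -a wrong <<< "$*"   # for contrast: re-splits everything on whitespace
}

arr=( ab "c d" 123 "string with spaces" 456 )
receive "Task Name" "${arr[@]}"

printf 'task=%s\n' "$task"
for ((i = 0; i < ${#received[@]}; ++i)); do
    printf '%d=%s\n' "$((i + 1))" "${received[i]}"
done
printf 'read -a produced %d elements instead of %d\n' "${#wrong[@]}" "${#received[@]}"
```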
of 9 parameters in a script, my objective is to pass about 30 strings bundled in an array from calling script (scriptA)
Ok. But "${arr[@]}" is passing multiple arguments anyway. If you want to pass the array as a string, pass it as a string (note that eval is evil):
arr=( ab "c d" 123 "string with spaces" 456 )
./scriptB.sh "Task Name" "$(declare -p arr)"
# Then inside scriptB.sh, re-evaluate parameter 2:
eval "$2" # assigns to arr
Note that the scriptB.sh is sourced in your example, so passing arguments.... makes no sense anyway.
I see an array element (with spaces) though quoted under double quotes get to interpreted as separate elements by spaces
Yes, because you interpreted the content with read, which splits the input on characters in IFS, which by default is set to space, tab and newline. You could print arguments on separate lines and change IFS accordingly:
IFS=$'\n' read -r -d '' -a arr1 < <(printf "%s\n" "$@")
or even use a zero terminated string:
mapfile -t -d '' arr1 < <(printf "%s\0" "$@")
but those are just fancy and useless ways of writing arr1=("$@").
Note that in your code snippet, arg2 is an array.

How do you unset all empty array elements in bash? [duplicate]

I need to remove an element from an array in bash shell.
Generally I'd simply do:
array=("${(@)array:#<element to remove>}")
Unfortunately the element I want to remove is a variable so I can't use the previous command.
Down here an example:
array+=(pluto)
array+=(pippo)
delete=(pluto)
array( ${array[@]/$delete} ) -> but clearly doesn't work because of {}
Any idea?
The following works as you would like in bash and zsh:
$ array=(pluto pippo)
$ delete=pluto
$ echo ${array[@]/$delete}
pippo
$ array=( "${array[@]/$delete}" ) #Quotes when working with strings
If need to delete more than one element:
...
$ delete=(pluto pippo)
for del in "${delete[@]}"
do
array=("${array[@]/$del}") #Quotes when working with strings
done
Caveat
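This technique does not delete whole elements: it removes the first substring matching $delete from each element, wherever it occurs. With the quoted form, an element that matches entirely is left behind as an empty string.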
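A small sketch of this caveat, using made-up sample data. It shows that the match need not cover a whole element, and that quoting decides whether emptied elements survive.

```bash
#!/usr/bin/env bash
array=(pluto plutonium "big pluto" pippo)   # made-up sample data
delete=pluto

quoted=( "${array[@]/$delete}" )   # keeps one word per element, even if it is now empty
unquoted=( ${array[@]/$delete} )   # empty results vanish and the rest is re-split

declare -p quoted unquoted
```

Neither form removes only the exact element "pluto", which is why the exact-match loop is needed.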
Update
To really remove an exact item, you need to walk through the array, comparing the target to each element, and using unset to delete an exact match.
array=(pluto pippo bob)
delete=(pippo)
for target in "${delete[@]}"; do
for i in "${!array[@]}"; do
if [[ ${array[i]} = "$target" ]]; then
unset 'array[i]'
fi
done
done
Note that if you do this, and one or more elements is removed, the indices will no longer be a continuous sequence of integers.
$ declare -p array
declare -a array=([0]="pluto" [2]="bob")
The simple fact is, arrays were not designed for use as mutable data structures. They are primarily used for storing lists of items in a single variable without needing to waste a character as a delimiter (e.g., to store a list of strings which can contain whitespace).
If gaps are a problem, then you need to rebuild the array to fill the gaps:
for i in "${!array[@]}"; do
new_array+=( "${array[i]}" )
done
array=("${new_array[@]}")
unset new_array
You could build up a new array without the undesired element, then assign it back to the old array. This works in bash:
array=(pluto pippo)
new_array=()
for value in "${array[@]}"
do
[[ $value != pluto ]] && new_array+=("$value")
done
array=("${new_array[@]}")
unset new_array
This yields:
echo "${array[@]}"
pippo
This is the most direct way to unset a value if you know its position.
$ array=(one two three)
$ echo ${#array[@]}
3
$ unset 'array[1]'
$ echo ${array[@]}
one three
$ echo ${#array[@]}
2
2
This answer is specific to the case of deleting multiple values from large arrays, where performance is important.
The most voted solutions are (1) pattern substitution on an array, or (2) iterating over the array elements. The first is fast, but can only deal with elements that have a distinct prefix; the second is O(n*k), where n is the array size and k is the number of elements to remove. Associative arrays are a relatively new feature and might not have been common when the question was originally posted.
For the exact-match case with large n and k, it is possible to improve performance from O(n*k) to O(n + k*log(k)). In practice this is O(n), assuming k is much lower than n. Most of the speedup comes from using an associative array to identify the items to be removed.
Performance (N = array size, K = number of values to delete), measured in seconds of user time:
N      K   New (s)  Current (s)  Speedup
1000   10  0.005    0.033        6X
10000  10  0.070    0.348        5X
10000  20  0.070    0.656        9X
10000  1   0.043    0.050        -7%
As expected, the current solution is linear in N*K, while the fast solution is practically linear in N and independent of K, with a much lower constant. The fast solution is slightly slower than the current solution when k=1, due to additional setup.
The 'Fast' solution: array=list of input, delete=list of values to remove.
declare -A delk
for del in "${delete[@]}" ; do delk[$del]=1 ; done
# Unset every item whose value is a key in delk
for k in "${!array[@]}" ; do
[ "${delk[${array[$k]}]-}" ] && unset 'array[k]'
done
# Compaction
array=("${array[@]}")
Benchmarked against current solution, from the most-voted answer.
for target in "${delete[@]}"; do
for i in "${!array[@]}"; do
if [[ ${array[i]} = "$target" ]]; then
unset 'array[i]'
fi
done
done
array=("${array[@]}")
Here's a one-line solution with mapfile:
$ mapfile -d $'\0' -t arr < <(printf '%s\0' "${arr[@]}" | grep -Pzv "<regexp>")
Example:
$ arr=("Adam" "Bob" "Claire"$'\n'"Smith" "David" "Eve" "Fred")
$ echo "Size: ${#arr[*]} Contents: ${arr[*]}"
Size: 6 Contents: Adam Bob Claire
Smith David Eve Fred
$ mapfile -d $'\0' -t arr < <(printf '%s\0' "${arr[@]}" | grep -Pzv "^Claire\nSmith$")
$ echo "Size: ${#arr[*]} Contents: ${arr[*]}"
Size: 5 Contents: Adam Bob David Eve Fred
This method allows for great flexibility by modifying/exchanging the grep command and doesn't leave any empty strings in the array.
Partial answer only
To delete the first item in the array
unset 'array[0]'
To delete the last item in the array
unset 'array[-1]'
To expand on the above answers, the following can be used to remove multiple elements from an array, without partial matching:
ARRAY=(one two onetwo three four threefour "one six")
TO_REMOVE=(one four)
TEMP_ARRAY=()
for pkg in "${ARRAY[@]}"; do
KEEP=true
for remove in "${TO_REMOVE[@]}"; do
if [[ ${pkg} == "${remove}" ]]; then
KEEP=false
break
fi
done
if ${KEEP}; then
TEMP_ARRAY+=("${pkg}")
fi
done
ARRAY=("${TEMP_ARRAY[@]}")
unset TEMP_ARRAY
This will result in an array containing:
(two onetwo three threefour "one six")
Here's a (probably very bash-specific) little function involving bash variable indirection and unset; it's a general solution that does not involve text substitution or discarding empty elements and has no problems with quoting/whitespace etc.
delete_ary_elmt() {
local word=$1 # the element to search for & delete
local aryref="$2[@]" # a necessary step since '${!$2[@]}' is a syntax error
local arycopy=("${!aryref}") # create a copy of the input array
local status=1
for (( i = ${#arycopy[@]} - 1; i >= 0; i-- )); do # iterate over indices backwards
elmt=${arycopy[$i]}
[[ $elmt == $word ]] && unset "$2[$i]" && status=0 # unset matching elmts in orig. ary
done
return $status # return 0 if something was deleted; 1 if not
}
array=(a 0 0 b 0 0 0 c 0 d e 0 0 0)
delete_ary_elmt 0 array
for e in "${array[@]}"; do
echo "$e"
done
# prints "a" "b" "c" "d" "e", one per line
Use it like delete_ary_elmt ELEMENT ARRAYNAME without any $ sigil. Switch the == $word for == $word* for prefix matches; use ${elmt,,} == ${word,,} for case-insensitive matches; etc., whatever bash [[ supports.
It works by determining the indices of the input array and iterating over them backwards (so deleting elements doesn't screw up iteration order). To get the indices you need to access the input array by name, which can be done via bash variable indirection: x=1; varname=x; echo ${!varname} # prints "1".
You can't access arrays by name like aryname=a; echo "${$aryname[@]}", this gives you an error. You can't do aryname=a; echo "${!aryname[@]}", this gives you the indices of the variable aryname (although it is not an array). What DOES work is aryref="a[@]"; echo "${!aryref}", which will print the elements of the array a, preserving shell-word quoting and whitespace exactly like echo "${a[@]}". But this only works for printing the elements of an array, not for printing its length or indices (aryref="!a[@]" or aryref="#a[@]" or "${!!aryref}" or "${#!aryref}", they all fail).
So I copy the original array by its name via bash indirection and get the indices from the copy. To iterate over the indices in reverse I use a C-style for loop. I could also do it by accessing the indices via ${!arycopy[@]} and reversing them with tac, which is a cat that turns around the input line order.
A function solution without variable indirection would probably have to involve eval, which may or may not be safe to use in that situation (I can't tell).
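On Bash 4.3 or newer, a nameref is another way to reach the caller's array without eval. The following is only a sketch of that alternative, not the answer's method: the function name delete_by_value and the internal name _ary are made up, and the caller's array must not itself be called _ary.

```bash
#!/usr/bin/env bash
# Hypothetical nameref-based variant (requires Bash 4.3+).
delete_by_value() {
    local word=$1
    local -n _ary=$2          # _ary becomes an alias for the caller's array
    local idx status=1
    # The index list is expanded once, before the loop body runs,
    # so unsetting elements while iterating is safe.
    for idx in "${!_ary[@]}"; do
        if [[ ${_ary[idx]} == "$word" ]]; then
            unset '_ary[idx]'
            status=0
        fi
    done
    return "$status"          # 0 if something was deleted, 1 if not
}

letters=(a 0 0 b 0 "c d" 0)
delete_by_value 0 letters
letters=("${letters[@]}")     # optional: close the gaps left by unset
printf '%s\n' "${letters[@]}"
```

Because the real indices are used rather than those of a copy, this variant should also cope with arrays that are already sparse.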
Using unset
To remove an element at a particular index, we can use unset and then copy the remaining elements to another array. unset does remove the element, but it leaves a gap in the indices (the array becomes sparse), so the removed index expands to an empty string. The copy is only needed if you want contiguous indices again.
declare -a arr=('aa' 'bb' 'cc' 'dd' 'ee')
unset 'arr[1]'
declare -a arr2=()
i=0
for element in "${arr[@]}"
do
arr2[$i]=$element
((++i))
done
echo "${arr[@]}"
echo "1st val is ${arr[1]}, 2nd val is ${arr[2]}"
echo "${arr2[@]}"
echo "1st val is ${arr2[1]}, 2nd val is ${arr2[2]}"
Output is
aa cc dd ee
1st val is , 2nd val is cc
aa cc dd ee
1st val is cc, 2nd val is dd
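To see what unset actually does to the indices, the index list can be inspected directly. This is a small sketch; the variable names other than arr are made up for illustration.

```bash
#!/usr/bin/env bash
arr=(aa bb cc dd ee)
unset 'arr[1]'

count_after_unset=${#arr[@]}        # element count drops by one
indices_after_unset="${!arr[*]}"    # index 1 is missing from the list
slot_one=${arr[1]-unset}            # expands to "unset" because index 1 no longer exists

arr=("${arr[@]}")                   # re-pack into contiguous indices
indices_after_repack="${!arr[*]}"

echo "count=$count_after_unset indices=[$indices_after_unset] slot1=$slot_one"
echo "after repack: [$indices_after_repack]"
```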
Using :<idx>
We can remove some set of elements using :<idx> also. For example if we want to remove 1st element we can use :1 as mentioned below.
declare -a arr=('aa' 'bb' 'cc' 'dd' 'ee')
arr2=("${arr[@]:1}")
echo "${arr2[@]}"
echo "1st val is ${arr2[1]}, 2nd val is ${arr2[2]}"
Output is
bb cc dd ee
1st val is cc, 2nd val is dd
http://wiki.bash-hackers.org/syntax/pe#substring_removal
${PARAMETER#PATTERN} # remove from beginning
${PARAMETER##PATTERN} # remove from the beginning, greedy match
${PARAMETER%PATTERN} # remove from the end
${PARAMETER%%PATTERN} # remove from the end, greedy match
In order to do a full remove element, you have to do an unset command with an if statement. If you don't care about removing prefixes from other variables or about supporting whitespace in the array, then you can just drop the quotes and forget about for loops.
See example below for a few different ways to clean up an array.
options=("foo" "bar" "foo" "foobar" "foo bar" "bars" "bar")
# remove bar from the start of each element
options=("${options[@]/#"bar"}")
# options=("foo" "" "foo" "foobar" "foo bar" "s" "")
# remove the complete string "foo" in a for loop
count=${#options[@]}
for ((i = 0; i < count; i++)); do
if [ "${options[i]}" = "foo" ] ; then
unset 'options[i]'
fi
done
# options=( "" "foobar" "foo bar" "s" "")
# remove empty options
# note the count variable can't be recalculated easily on a sparse array
for ((i = 0; i < count; i++)); do
# echo "Element $i: '${options[i]}'"
if [ -z "${options[i]}" ] ; then
unset 'options[i]'
fi
done
# options=("foobar" "foo bar" "s")
# list them with select
echo "Choose an option:"
PS3='Option? '
select i in "${options[@]}" Quit
do
case $i in
Quit) break ;;
*) echo "You selected \"$i\"" ;;
esac
done
Output
Choose an option:
1) foobar
2) foo bar
3) s
4) Quit
Option?
Hope that helps.
There is also this syntax, e.g. if you want to delete the 2nd element :
array=("${array[@]:0:1}" "${array[@]:2}")
which is in fact the concatenation of two array slices: the first from index 0 up to index 1 (exclusive), and the second from index 2 to the end.
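The same slicing idea works for any position. This is a minimal sketch with made-up sample data; pos is a hypothetical variable holding the zero-based position to drop.

```bash
#!/usr/bin/env bash
array=(one two "three 3" four)   # made-up sample data
pos=1                            # zero-based position of the element to drop

# Everything before pos, followed by everything after it.
array=("${array[@]:0:pos}" "${array[@]:pos+1}")

declare -p array
```

Since the result is assigned as a fresh array, its indices are contiguous again, unlike after unset.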
POSIX shell script does not have arrays.
So most probably you are using a specific dialect such as bash, korn shells or zsh.
Therefore, your question as of now cannot be answered.
Maybe this works for you:
unset array[$delete]
What I do is:
array="$(echo $array | tr ' ' '\n' | sed "/itemtodelete/d")"
BAM, that item is removed.
This is a quick-and-dirty solution that will work in simple cases but will break if (a) there are regex special characters in $delete, or (b) there are any spaces at all in any items. Starting with:
array+=(pluto)
array+=(pippo)
delete=(pluto)
Delete all entries exactly matching $delete:
array=(`echo $array | fmt -1 | grep -v "^${delete}$" | fmt -999999`)
resulting in
echo $array -> pippo, and making sure it's an array:
echo $array[1] -> pippo
fmt is a little obscure: fmt -1 wraps at the first column (to put each item on its own line; that's where the problem arises with items containing spaces). fmt -999999 unwraps it back to one line, putting back the spaces between items. There are other ways to do that, such as xargs.
Addendum: If you want to delete just the first match, use sed, as described here:
array=(`echo $array | fmt -1 | sed "0,/^${delete}$/{//d;}" | fmt -999999`)
Actually, I just noticed that the shell syntax somewhat has a behavior built-in that allows for easy reconstruction of the array when, as posed in the question, an item should be removed.
# let's set up an array of items to consume:
x=()
for (( i=0; i<10; i++ )); do
x+=("$i")
done
# here, we consume that array:
while (( ${#x[@]} )); do
i=$(( $RANDOM % ${#x[@]} ))
echo "${x[i]} / ${x[@]}"
x=("${x[@]:0:i}" "${x[@]:i+1}")
done
Notice how we constructed the array using bash's x+=() syntax?
You could actually add more than one item with that, the content of a whole other array at once.
In ZSH this is dead easy (note this uses more bash compatible syntax than necessary where possible for ease of understanding):
# I always include an edge case to make sure each element
# is not being word split.
start=(one two three 'four 4' five)
work=(${(@)start})
idx=2
val=${work[idx]}
# How to remove a single element easily.
# Also works for associative arrays (at least in zsh)
work[$idx]=()
echo "Array size went down by one: "
[[ $#work -eq $(($#start - 1)) ]] && echo "OK"
echo "Array item "$val" is now gone: "
[[ -z ${work[(r)$val]} ]] && echo OK
echo "Array contents are as expected: "
wanted=("${start[@]:0:1}" "${start[@]:2}")
[[ "${(j.:.)wanted[@]}" == "${(j.:.)work[@]}" ]] && echo "OK"
echo "-- array contents: start --"
print -l -r -- "-- $#start elements" ${(@)start}
echo "-- array contents: work --"
print -l -r -- "-- $#work elements" "${work[@]}"
Results:
Array size went down by one:
OK
Array item two is now gone:
OK
Array contents are as expected:
OK
-- array contents: start --
-- 5 elements
one
two
three
four 4
five
-- array contents: work --
-- 4 elements
one
three
four 4
five
To avoid conflicts with array indices after using unset - see https://stackoverflow.com/a/49626928/3223785 and https://stackoverflow.com/a/47798640/3223785 for more information - reassign the array to itself: ARRAY_VAR=(${ARRAY_VAR[@]}).
#!/bin/bash
ARRAY_VAR=(0 1 2 3 4 5 6 7 8 9)
unset ARRAY_VAR[5]
unset ARRAY_VAR[4]
ARRAY_VAR=(${ARRAY_VAR[@]})
echo ${ARRAY_VAR[@]}
A_LENGTH=${#ARRAY_VAR[*]}
for (( i=0; i<=$(( $A_LENGTH -1 )); i++ )) ; do
echo ""
echo "INDEX - $i"
echo "VALUE - ${ARRAY_VAR[$i]}"
done
exit 0
[Ref.: https://tecadmin.net/working-with-array-bash-script/ ]
How about something like:
array=(one two three)
array_t=" ${array[@]} "
delete=one
array=(${array_t// $delete / })
unset array_t
#!/bin/bash
echo "# define array with six elements"
arr=(zero one two three 'four 4' five)
echo "# unset by index: 0"
unset -v 'arr[0]'
for i in ${!arr[*]}; do echo "arr[$i]=${arr[$i]}"; done
arr_delete_by_content() { # value to delete
for i in ${!arr[*]}; do
[ "${arr[$i]}" = "$1" ] && unset -v 'arr[$i]'
done
}
echo "# unset in global variable where value: three"
arr_delete_by_content three
for i in ${!arr[*]}; do echo "arr[$i]=${arr[$i]}"; done
echo "# rearrange indices"
arr=( "${arr[@]}" )
for i in ${!arr[*]}; do echo "arr[$i]=${arr[$i]}"; done
delete_value() { # value arrayelements..., returns array decl.
local e val=$1; new=(); shift
for e in "${@}"; do [ "$val" != "$e" ] && new+=("$e"); done
declare -p new|sed 's,^[^=]*=,,'
}
echo "# new array without value: two"
declare -a arr="$(delete_value two "${arr[@]}")"
for i in ${!arr[*]}; do echo "arr[$i]=${arr[$i]}"; done
delete_values() { # arraydecl values..., returns array decl. (keeps indices)
declare -a arr="$1"; local i v; shift
for v in "${@}"; do
for i in ${!arr[*]}; do
[ "$v" = "${arr[$i]}" ] && unset -v 'arr[$i]'
done
done
declare -p arr|sed 's,^[^=]*=,,'
}
echo "# new array without values: one five (keep indices)"
declare -a arr="$(delete_values "$(declare -p arr|sed 's,^[^=]*=,,')" one five)"
for i in ${!arr[*]}; do echo "arr[$i]=${arr[$i]}"; done
# new array without multiple values and rearranged indices is left to the reader

How to sort an array in Bash

I have an array in Bash, for example:
array=(a c b f 3 5)
I need to sort the array. Not just displaying the content in a sorted way, but to get a new array with the sorted elements. The new sorted array can be a completely new one or the old one.
You don't really need all that much code:
IFS=$'\n' sorted=($(sort <<<"${array[*]}"))
unset IFS
Supports whitespace in elements (as long as it's not a newline), and works in Bash 3.x.
e.g.:
$ array=("a c" b f "3 5")
$ IFS=$'\n' sorted=($(sort <<<"${array[*]}")); unset IFS
$ printf "[%s]\n" "${sorted[@]}"
[3 5]
[a c]
[b]
[f]
Note: @sorontar has pointed out that care is required if elements contain wildcards such as * or ?:
The sorted=($(...)) part is using the "split and glob" operator. You should turn glob off: set -f or set -o noglob or shopt -op noglob or an element of the array like * will be expanded to a list of files.
What's happening:
The result is a culmination of six things that happen in this order:
IFS=$'\n'
"${array[*]}"
<<<
sort
sorted=($(...))
unset IFS
First, the IFS=$'\n'
This is an important part of our operation that affects the outcome of 2 and 5 in the following way:
Given:
"${array[*]}" expands to every element delimited by the first character of IFS
sorted=() creates elements by splitting on every character of IFS
IFS=$'\n' sets things up so that elements are expanded using a new line as the delimiter, and then later created in a way that each line becomes an element. (i.e. Splitting on a new line.)
Delimiting by a new line is important because that's how sort operates (sorting per line). Splitting by only a new line is not as important, but is needed to preserve elements that contain spaces or tabs.
The default value of IFS is a space, a tab, followed by a new line, and would be unfit for our operation.
Next, the sort <<<"${array[*]}" part
<<<, called here strings, takes the expansion of "${array[*]}", as explained above, and feeds it into the standard input of sort.
With our example, sort is fed this following string:
a c
b
f
3 5
Since sort sorts, it produces:
3 5
a c
b
f
Next, the sorted=($(...)) part
The $(...) part, called command substitution, causes its content (sort <<<"${array[*]}") to run as a normal command, while taking the resulting standard output as the literal that goes wherever $(...) was.
In our example, this produces something similar to simply writing:
sorted=(3 5
a c
b
f
)
sorted then becomes an array that's created by splitting this literal on every new line.
Finally, the unset IFS
This resets the value of IFS to the default value, and is just good practice.
It's to ensure we don't cause trouble with anything that relies on IFS later in our script. (Otherwise we'd need to remember that we've switched things around--something that might be impractical for complex scripts.)
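The IFS and globbing concerns can both be contained by wrapping the one-liner in a function. This is only a sketch under stated assumptions: sort_array is a hypothetical helper name, the result is left in the global array sorted, and LC_ALL=C is added here only to make the order predictable.

```bash
#!/usr/bin/env bash
# sort_array is a hypothetical helper; it leaves its result in the global array `sorted`.
sort_array() {
    local IFS=$'\n'                  # local: the caller's IFS is untouched
    local glob_was_on=1
    [[ $- == *f* ]] && glob_was_on=0
    set -f                           # keep elements such as '*' from expanding to file names
    sorted=( $(LC_ALL=C sort <<<"$*") )
    (( glob_was_on )) && set +f      # restore globbing only if it was enabled before
    return 0
}

sort_array "a c" b '*' f "3 5"
printf '[%s]\n' "${sorted[@]}"
```

As with the original one-liner, elements containing newlines are still not supported.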
Original response:
array=(a c b "f f" 3 5)
readarray -t sorted < <(for a in "${array[@]}"; do echo "$a"; done | sort)
output:
$ for a in "${sorted[@]}"; do echo "$a"; done
3
5
a
b
c
f f
Note this version copes with values that contains special characters or whitespace (except newlines)
Note readarray is supported in bash 4+.
Edit: Based on the suggestion by @Dimitre I had updated it to:
readarray -t sorted < <(printf '%s\0' "${array[@]}" | sort -z | xargs -0n1)
which has the benefit of correctly sorting even elements with embedded newline characters. Unfortunately, as correctly signaled by @ruakh, this didn't mean the result of readarray would be correct, because readarray had no option to use NUL instead of regular newlines as line separators (Bash 4.4 later added readarray -d).
If you don't need to handle special shell characters in the array elements:
array=(a c b f 3 5)
sorted=($(printf '%s\n' "${array[@]}"|sort))
With bash you'll need an external sorting program anyway.
With zsh no external programs are needed and special shell characters are easily handled:
% array=('a a' c b f 3 5); printf '%s\n' "${(o)array[@]}"
3
5
a a
b
c
f
ksh has set -s to sort ASCIIbetically.
Here's a pure Bash quicksort implementation:
#!/bin/bash
# quicksorts positional arguments
# return is in array qsort_ret
qsort() {
local pivot i smaller=() larger=()
qsort_ret=()
(($#==0)) && return 0
pivot=$1
shift
for i; do
# This sorts strings lexicographically.
if [[ $i < $pivot ]]; then
smaller+=( "$i" )
else
larger+=( "$i" )
fi
done
qsort "${smaller[@]}"
smaller=( "${qsort_ret[@]}" )
qsort "${larger[@]}"
larger=( "${qsort_ret[@]}" )
qsort_ret=( "${smaller[@]}" "$pivot" "${larger[@]}" )
}
Use as, e.g.,
$ array=(a c b f 3 5)
$ qsort "${array[@]}"
$ declare -p qsort_ret
declare -a qsort_ret='([0]="3" [1]="5" [2]="a" [3]="b" [4]="c" [5]="f")'
This implementation is recursive… so here's an iterative quicksort:
#!/bin/bash
# quicksorts positional arguments
# return is in array qsort_ret
# Note: iterative, NOT recursive! :)
qsort() {
(($#==0)) && return 0
local stack=( 0 $(($#-1)) ) beg end i pivot smaller larger
qsort_ret=("$@")
while ((${#stack[@]})); do
beg=${stack[0]}
end=${stack[1]}
stack=( "${stack[@]:2}" )
smaller=() larger=()
pivot=${qsort_ret[beg]}
for ((i=beg+1;i<=end;++i)); do
if [[ "${qsort_ret[i]}" < "$pivot" ]]; then
smaller+=( "${qsort_ret[i]}" )
else
larger+=( "${qsort_ret[i]}" )
fi
done
qsort_ret=( "${qsort_ret[@]:0:beg}" "${smaller[@]}" "$pivot" "${larger[@]}" "${qsort_ret[@]:end+1}" )
if ((${#smaller[@]}>=2)); then stack+=( "$beg" "$((beg+${#smaller[@]}-1))" ); fi
if ((${#larger[@]}>=2)); then stack+=( "$((end-${#larger[@]}+1))" "$end" ); fi
done
}
In both cases, you can change the order you use: I used string comparisons, but you can use arithmetic comparisons, compare wrt file modification time, etc. just use the appropriate test; you can even make it more generic and have it use a first argument that is the test function use, e.g.,
#!/bin/bash
# quicksorts positional arguments
# return is in array qsort_ret
# Note: iterative, NOT recursive! :)
# First argument is a function name that takes two arguments and compares them
qsort() {
(($#<=1)) && return 0
local compare_fun=$1
shift
local stack=( 0 $(($#-1)) ) beg end i pivot smaller larger
qsort_ret=("$@")
while ((${#stack[@]})); do
beg=${stack[0]}
end=${stack[1]}
stack=( "${stack[@]:2}" )
smaller=() larger=()
pivot=${qsort_ret[beg]}
for ((i=beg+1;i<=end;++i)); do
if "$compare_fun" "${qsort_ret[i]}" "$pivot"; then
smaller+=( "${qsort_ret[i]}" )
else
larger+=( "${qsort_ret[i]}" )
fi
done
qsort_ret=( "${qsort_ret[@]:0:beg}" "${smaller[@]}" "$pivot" "${larger[@]}" "${qsort_ret[@]:end+1}" )
if ((${#smaller[@]}>=2)); then stack+=( "$beg" "$((beg+${#smaller[@]}-1))" ); fi
if ((${#larger[@]}>=2)); then stack+=( "$((end-${#larger[@]}+1))" "$end" ); fi
done
}
Then you can have this comparison function:
compare_mtime() { [[ $1 -nt $2 ]]; }
and use:
$ qsort compare_mtime *
$ declare -p qsort_ret
to have the files in current folder sorted by modification time (newest first).
NOTE. These functions are pure Bash! no external utilities, and no subshells! they are safe wrt any funny symbols you may have (spaces, newline characters, glob characters, etc.).
NOTE2. The test [[ $i < $pivot ]] is correct. It uses the lexicographical string comparison. If your array only contains integers and you want to sort numerically, use ((i < pivot)) instead.
Please don't edit this answer to change that. It has already been edited (and rolled back) a couple of times. The test I gave here is correct and corresponds to the output given in the example: the example uses both strings and numbers, and the purpose is to sort it in lexicographical order. Using ((i < pivot)) in this case is wrong.
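The difference between the two tests is easy to check in isolation. In this small sketch the variable names lexical_first and numeric_first are made up for illustration.

```bash
#!/usr/bin/env bash
# String comparison: "10" sorts before "9" because "1" sorts before "9".
if [[ 10 < 9 ]]; then lexical_first=10; else lexical_first=9; fi

# Arithmetic comparison: 10 is not less than 9.
if (( 10 < 9 )); then numeric_first=10; else numeric_first=9; fi

echo "lexicographic order puts $lexical_first first"
echo "numeric order puts $numeric_first first"
```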
tl;dr:
Sort array a_in and store the result in a_out (elements must not have embedded newlines[1]):
Bash v4+:
readarray -t a_out < <(printf '%s\n' "${a_in[@]}" | sort)
Bash v3:
IFS=$'\n' read -d '' -r -a a_out < <(printf '%s\n' "${a_in[@]}" | sort)
Advantages over antak's solution:
You needn't worry about accidental globbing (accidental interpretation of the array elements as filename patterns), so no extra command is needed to disable globbing (set -f, and set +f to restore it later).
You needn't worry about resetting IFS with unset IFS.[2]
Optional reading: explanation and sample code
The above combines Bash code with external utility sort for a solution that works with arbitrary single-line elements and either lexical or numerical sorting (optionally by field):
Performance: For around 20 elements or more, this will be faster than a pure Bash solution - significantly and increasingly so once you get beyond around 100 elements.
(The exact thresholds will depend on your specific input, machine, and platform.)
The reason it is fast is that it avoids Bash loops.
printf '%s\n' "${a_in[@]}" | sort performs the sorting (lexically, by default - see sort's POSIX spec):
"${a_in[@]}" safely expands to the elements of array a_in as individual arguments, whatever they contain (including whitespace).
printf '%s\n' then prints each argument - i.e., each array element - on its own line, as-is.
Note the use of a process substitution (<(...)) to provide the sorted output as input to read / readarray (via redirection to stdin, <), because read / readarray must run in the current shell (must not run in a subshell) in order for output variable a_out to be visible to the current shell (for the variable to remain defined in the remainder of the script).
Reading sort's output into an array variable:
Bash v4+: readarray -t a_out reads the individual lines output by sort into the elements of array variable a_out, without including the trailing \n in each element (-t).
Bash v3: readarray doesn't exist, so read must be used:
IFS=$'\n' read -d '' -r -a a_out tells read to read into array (-a) variable a_out, reading the entire input, across lines (-d ''), but splitting it into array elements by newlines (IFS=$'\n'. $'\n', which produces a literal newline (LF), is a so-called ANSI C-quoted string).
(-r, an option that should virtually always be used with read, disables unexpected handling of \ characters.)
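One detail worth knowing about the Bash v3 form: because read -d '' reads until end of input without ever seeing a NUL delimiter, it reports a non-zero exit status even though the array is filled correctly. The sketch below captures that status; the variable status and the use of LC_ALL=C (for a predictable order) are additions for illustration, and the || guard only matters in scripts that run under set -e.

```bash
#!/usr/bin/env bash
a_in=( b "a c" 10 5 )   # made-up sample data

status=0
# The IFS assignment applies to this read command only.
IFS=$'\n' read -d '' -r -a a_out < <(printf '%s\n' "${a_in[@]}" | LC_ALL=C sort) || status=$?

printf '%s\n' "${a_out[@]}"
echo "read exit status: $status"
```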
Annotated sample code:
#!/usr/bin/env bash
# Define input array `a_in`:
# Note the element with embedded whitespace ('a c')and the element that looks like
# a glob ('*'), chosen to demonstrate that elements with line-internal whitespace
# and glob-like contents are correctly preserved.
a_in=( 'a c' b f 5 '*' 10 )
# Sort and store output in array `a_out`
# Saving back into `a_in` is also an option.
IFS=$'\n' read -d '' -r -a a_out < <(printf '%s\n' "${a_in[@]}" | sort)
# Bash 4.x: use the simpler `readarray -t`:
# readarray -t a_out < <(printf '%s\n' "${a_in[@]}" | sort)
# Print sorted output array, line by line:
printf '%s\n' "${a_out[@]}"
Due to use of sort without options, this yields lexical sorting (digits sort before letters, and digit sequences are treated lexically, not as numbers):
*
10
5
a c
b
f
If you wanted numerical sorting by the 1st field, you'd use sort -k1,1n instead of just sort, which yields (non-numbers sort before numbers, and numbers sort correctly):
*
a c
b
f
5
10
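As a small illustration of the numeric case, here is a minimal sketch that assumes the array holds only integers (the names nums and nums_sorted are made up for this example, and readarray needs Bash 4+):

```shell
#!/usr/bin/env bash
# Hypothetical input; the names nums and nums_sorted are illustrative only.
nums=(10 9 100 1)

# sort -n compares whole lines as numbers; readarray -t strips the newlines.
readarray -t nums_sorted < <(printf '%s\n' "${nums[@]}" | sort -n)

printf '%s\n' "${nums_sorted[@]}"
```

With plain sort the same input would come out as 1, 10, 100, 9, because digit sequences are then compared character by character.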
[1] To handle elements with embedded newlines, use the following variant (Bash v4.4+, with GNU sort):
readarray -d '' -t a_out < <(printf '%s\0' "${a_in[@]}" | sort -z)
Michał Górny's helpful answer has a Bash v3 solution.
[2] While IFS is set in the Bash v3 variant, the change is scoped to the command.
By contrast, what follows IFS=$'\n'  in antak's answer is an assignment rather than a command, in which case the IFS change is global.
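The difference described in footnote [2] can be checked directly. This is only a sketch; the variable names (saved_ifs, fields, parts, scoped, leaked) are invented for the demonstration:

```shell
#!/usr/bin/env bash
saved_ifs=$IFS

# Case 1: assignment prefixed to a command (read). IFS changes only for read.
IFS=':' read -r -a fields <<< 'a:b:c'
if [[ $IFS == "$saved_ifs" ]]; then scoped=yes; else scoped=no; fi

# Case 2: two assignments and no command. Both stay in effect afterwards.
IFS=':' parts=( $(printf 'x:y') )
if [[ $IFS == ':' ]]; then leaked=yes; else leaked=no; fi

IFS=$saved_ifs   # restore the original value by hand

echo "fields=${#fields[@]} parts=${#parts[@]} scoped=$scoped leaked=$leaked"
# prints: fields=3 parts=2 scoped=yes leaked=yes
```

That is why the assignment-only form needs an explicit restore (or unset IFS) afterwards, while the read form does not.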
On the 3-hour train trip from Munich to Frankfurt (which I had trouble reaching because Oktoberfest starts tomorrow) I was thinking about my first post. Employing a global array is a much better idea for a general sort function. The following function handles arbitrary strings (newlines, blanks etc.):
declare BSORT=()
function bubble_sort()
{ #
# @param [ARGUMENTS]...
#
# Sort all positional arguments and store them in global array BSORT.
# Without arguments sort this array. Return the number of iterations made.
#
# Bubble sorting lets the heaviest element sink to the bottom.
#
(($# > 0)) && BSORT=("$@")
local j=0 ubound=$((${#BSORT[*]} - 1))
while ((ubound > 0))
do
local i=0
while ((i < ubound))
do
if [ "${BSORT[$i]}" \> "${BSORT[$((i + 1))]}" ]
then
local t="${BSORT[$i]}"
BSORT[$i]="${BSORT[$((i + 1))]}"
BSORT[$((i + 1))]="$t"
fi
((++i))
done
((++j))
((--ubound))
done
echo $j
}
bubble_sort a c b 'z y' 3 5
echo ${BSORT[@]}
This prints the pass count echoed by bubble_sort itself (5), followed by the sorted array:
5
3 5 a b c z y
The same output is created from
BSORT=(a c b 'z y' 3 5)
bubble_sort
echo ${BSORT[@]}
Note that Bash probably uses smart pointers internally, so the swap operation could be cheap (although I doubt it). However, bubble_sort demonstrates that more advanced functions like merge_sort are also within reach of the shell language.
Another solution that uses external sort and copes with any special characters (except for NULs :)). Should work with bash-3.2 and GNU or BSD sort (sadly, POSIX doesn't include -z).
local e new_array=()
while IFS= read -r -d '' e; do
new_array+=( "${e}" )
done < <(printf "%s\0" "${array[@]}" | LC_ALL=C sort -z)
First look at the input redirection at the end. We're using printf built-in to write out the array elements, zero-terminated. The quoting makes sure array elements are passed as-is, and specifics of shell printf cause it to reuse the last part of format string for each remaining parameter. That is, it's equivalent to something like:
for e in "${array[@]}"; do
printf "%s\0" "${e}"
done
The null-terminated element list is then passed to sort. The -z option causes it to read null-terminated elements, sort them and output null-terminated elements as well. If you need only the unique elements, you can pass -u to sort, since that is more portable than uniq -z. LC_ALL=C ensures a stable sort order independent of locale, which is sometimes useful for scripts. If you want the sort to respect the locale, remove it.
The <() construct obtains the descriptor to read from the spawned pipeline, and < redirects the standard input of the while loop to it. If you need to access the standard input inside the pipe, you may use another descriptor — exercise for the reader :).
Now, back to the beginning. The read built-in reads output from the redirected stdin. Setting empty IFS disables word splitting which is unnecessary here — as a result, read reads the whole 'line' of input to the single provided variable. -r option disables escape processing that is undesired here as well. Finally, -d '' sets the line delimiter to NUL — that is, tells read to read zero-terminated strings.
As a result, the loop is executed once for every successive zero-terminated array element, with the value being stored in e. The example just puts the items in another array but you may prefer to process them directly :).
Of course, that's just one of the many ways of achieving the same goal. As I see it, it is simpler than implementing complete sorting algorithm in bash and in some cases it will be faster. It handles all special characters including newlines and should work on most of the common systems. Most importantly, it may teach you something new and awesome about bash :).
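For the "another descriptor" exercise mentioned above, one possible sketch (not necessarily what the author had in mind) is to feed the sorted stream on descriptor 3 and point read at it with -u 3, which leaves stdin free inside the loop. The sample array is made up, and sort -z is assumed to be available (GNU or BSD):

```shell
#!/usr/bin/env bash
# Illustrative input: note the element with an embedded newline.
array=(b $'a\nz' 'a c')
new_array=()

# Feed the sorted, NUL-terminated stream on descriptor 3 instead of stdin,
# so commands inside the loop can still read the script's own stdin.
while IFS= read -r -d '' -u 3 e; do
  new_array+=( "$e" )
done 3< <(printf '%s\0' "${array[@]}" | LC_ALL=C sort -z)

printf '<%s>\n' "${new_array[@]}"
```

In the C locale the element with the embedded newline sorts first, because a newline byte compares lower than a space.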
Keep it simple ;)
In the following example, the array b is the sorted version of the array a!
The second line echoes each item of the array a, pipes them to the sort command, and uses the output to initialize the array b.
a=(2 3 1)
b=( $( for x in ${a[@]}; do echo $x; done | sort ) )
echo ${b[@]} # output: 1 2 3
min sort:
#!/bin/bash
array=(.....)
index_of_element1=0
while (( ${index_of_element1} < ${#array[@]} )); do
element_1="${array[${index_of_element1}]}"
index_of_element2=$((index_of_element1 + 1))
index_of_min=${index_of_element1}
min_element="${element_1}"
for element_2 in "${array[@]:$((index_of_element1 + 1))}"; do
min_element="`printf "%s\n%s" "${min_element}" "${element_2}" | sort | head -n 1`"
if [[ "${min_element}" == "${element_2}" ]]; then
index_of_min=${index_of_element2}
fi
let index_of_element2++
done
array[${index_of_element1}]="${min_element}"
array[${index_of_min}]="${element_1}"
let index_of_element1++
done
try this:
echo ${array[@]} | awk 'BEGIN{RS=" ";} {print $1}' | sort
Output will be:
3
5
a
b
c
f
Problem solved.
If you can compute a unique integer for each element in the array, like this:
tab='0123456789abcdefghijklmnopqrstuvwxyz'
# build the reversed ordinal map
for ((i = 0; i < ${#tab}; i++)); do
declare -g ord_${tab:i:1}=$i
done
function sexy_int() {
local sum=0
local i ch ref
for ((i = 0; i < ${#1}; i++)); do
ch="${1:i:1}"
ref="ord_$ch"
(( sum += ${!ref} ))
done
return $sum
}
sexy_int hello
echo "hello -> $?"
sexy_int world
echo "world -> $?"
Then you can use these integers as array indexes. Bash indexed arrays are sparse, so there is no need to worry about unused indexes:
array=(a c b f 3 5)
for el in "${array[@]}"; do
sexy_int "$el"
sorted[$?]="$el"
done
echo "${sorted[@]}"
Pros. Fast.
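Cons. Duplicated elements are merged, and it can be impossible to map contents to unique 32-bit integers. As written, sexy_int also passes its result back through the exit status, which is limited to 0-255.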
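The sparse-index idea is easiest to see when the elements already are non-negative integers, so no mapping function is needed. A minimal sketch with made-up data (nums and sorted are illustrative names):

```shell
#!/usr/bin/env bash
# Illustrative data: non-negative integers only, with one duplicate (7).
nums=(42 7 1000 7 3)
sorted=()

for n in "${nums[@]}"; do
  sorted[n]=$n        # the value doubles as the index; the second 7 overwrites the first
done

echo "${sorted[@]}"   # prints: 3 7 42 1000
```

Expanding "${sorted[@]}" walks the indexes in ascending order, which is what produces the sorted result. It also shows the first drawback listed above: the duplicate 7 appears only once.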
array=(a c b f 3 5)
new_array=($(echo "${array[@]}" | sed 's/ /\n/g' | sort))
echo ${new_array[@]}
The contents of new_array will be:
3 5 a b c f
There is a workaround for the usual problem of spaces and newlines:
Use a character that is not in the original array (like $'\1' or $'\4' or similar).
This function gets the job done:
# Sort an array that may contain spaces or newlines, using a workaround char (wa=$'\4')
sortarray(){ local wa=$'\4' IFS=''
if [[ $* =~ [$wa] ]]; then
echo "$0: error: array contains the workaround char" >&2
exit 1
fi
set -f; local IFS=$'\n' x nl=$'\n'
set -- $(printf '%s\n' "${@//$nl/$wa}" | sort -n)
for x
do sorted+=("${x//$wa/$nl}")
done
}
This will sort the array:
$ array=( a b 'c d' $'e\nf' $'g\1h')
$ sortarray "${array[@]}"
$ printf '<%s>\n' "${sorted[@]}"
<a>
<b>
<c d>
<e
f>
<gh>
This will complain that the source array contains the workaround character:
$ array=( a b 'c d' $'e\nf' $'g\4h')
$ sortarray "${array[@]}"
./script: error: array contains the workaround char
Description
We set two local variables: wa (the workaround char) and a null IFS.
Then (with IFS null) we test that the whole array $* does not contain any workaround char: [[ $* =~ [$wa] ]].
If it does, print a message and signal an error: exit 1.
Avoid filename expansion: set -f.
Set a new value of IFS (IFS=$'\n'), a loop variable x and a newline variable (nl=$'\n').
We print all values of the arguments received (the input array $@), replacing any newline by the workaround char: "${@//$nl/$wa}".
Send those values to be sorted: sort -n.
Place all the sorted values back in the positional arguments: set --.
Then we assign each argument one by one, in a loop (for x), to a new array: sorted+=(…), inside quotes to preserve any existing newline and restoring the workaround char to a newline: "${x//$wa/$nl}".
Done.
This question looks closely related. And BTW, here's a mergesort in Bash (without external processes):
mergesort() {
local -n -r input_reference="$1"
local -n output_reference="$2"
local -r -i size="${#input_reference[@]}"
local merge previous
local -a -i runs indices
local -i index previous_idx merged_idx \
run_a_idx run_a_stop \
run_b_idx run_b_stop
output_reference=("${input_reference[@]}")
if ((size == 0)); then return; fi
previous="${output_reference[0]}"
runs=(0)
for ((index = 0;;)) do
for ((++index;; ++index)); do
if ((index >= size)); then break 2; fi
if [[ "${output_reference[index]}" < "$previous" ]]; then break; fi
previous="${output_reference[index]}"
done
previous="${output_reference[index]}"
runs+=(index)
done
runs+=(size)
while (("${#runs[@]}" > 2)); do
indices=("${!runs[@]}")
merge=("${output_reference[@]}")
for ((index = 0; index < "${#indices[@]}" - 2; index += 2)); do
merged_idx=runs[indices[index]]
run_a_idx=merged_idx
previous_idx=indices[$((index + 1))]
run_a_stop=runs[previous_idx]
run_b_idx=runs[previous_idx]
run_b_stop=runs[indices[$((index + 2))]]
unset runs[previous_idx]
while ((run_a_idx < run_a_stop && run_b_idx < run_b_stop)); do
if [[ "${merge[run_a_idx]}" < "${merge[run_b_idx]}" ]]; then
output_reference[merged_idx++]="${merge[run_a_idx++]}"
else
output_reference[merged_idx++]="${merge[run_b_idx++]}"
fi
done
while ((run_a_idx < run_a_stop)); do
output_reference[merged_idx++]="${merge[run_a_idx++]}"
done
while ((run_b_idx < run_b_stop)); do
output_reference[merged_idx++]="${merge[run_b_idx++]}"
done
done
done
}
declare -ar input=({z..a}{z..a})
declare -a output
mergesort input output
echo "${input[@]}"
echo "${output[@]}"
Many thanks to the people that answered before me. Using their excellent input, the bash documentation and ideas from other threads, this is what works perfectly for me without an IFS change:
array=("a \n c" b f "3 5")
Using process substitution and readarray in bash >= 4.4 with the EOL character as delimiter
readarray -t sorted < <(sort < <(printf '%s\n' "${array[@]}"))
Using process substitution and readarray in bash >= 4.4 with the NUL character as delimiter
readarray -td '' sorted < <(sort -z < <(printf '%s\0' "${array[@]}"))
Finally we verify with
printf "[%s]\n" "${sorted[@]}"
output is
[3 5]
[a \n c]
[b]
[f]
Please let me know if that is a correct test for embedded \n, as both solutions produce the same result, but the first one is not supposed to work properly with embedded \n.
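One likely reason both variants agree: inside double quotes, \n is just a backslash followed by the letter n, so the test array above never contains a real newline. A sketch of a test that does embed one, using ANSI C quoting ($'...'). The names by_line and by_nul are made up here, and Bash 4.4+ plus a sort with -z are assumed:

```shell
#!/usr/bin/env bash
# Requires Bash 4.4+ (readarray -d) and a sort that supports -z (GNU or BSD).
array=($'a\nc' b f '3 5')   # first element holds a real newline (ANSI C quoting)

# Newline-delimited: the embedded newline splits one element into two lines.
readarray -t by_line < <(printf '%s\n' "${array[@]}" | LC_ALL=C sort)

# NUL-delimited: each element travels as one record, newline included.
readarray -td '' by_nul < <(printf '%s\0' "${array[@]}" | LC_ALL=C sort -z)

echo "newline-delimited count: ${#by_line[@]}"   # 5
echo "NUL-delimited count: ${#by_nul[@]}"        # 4
printf '[%s]\n' "${by_nul[@]}"
```

With a real newline in the data the two approaches diverge: the newline-delimited version ends up with five elements, the NUL-delimited one keeps the original four.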
I am not convinced that you'll need an external sorting program in Bash.
Here is my implementation for the simple bubble-sort algorithm.
function bubble_sort()
{ #
# Sorts all positional arguments and echoes them back.
#
# Bubble sorting lets the heaviest (lexically greatest) element sink to the bottom.
#
local array=($@) max=$(($# - 1))
while ((max > 0))
do
local i=0
while ((i < max))
do
if [ ${array[$i]} \> ${array[$((i + 1))]} ]
then
local t=${array[$i]}
array[$i]=${array[$((i + 1))]}
array[$((i + 1))]=$t
fi
((i += 1))
done
((max -= 1))
done
echo ${array[@]}
}
array=(a c b f 3 5)
echo " input: ${array[@]}"
echo "output: $(bubble_sort ${array[@]})"
This prints:
input: a c b f 3 5
output: 3 5 a b c f
a=(e b 'c d')
shuf -e "${a[@]}" | sort >/tmp/f
mapfile -t g </tmp/f
Great answers here. Learned a lot. After reading them all, I figure I'd throw my hat into the ring. I think this is the shortest method (and probably faster as it doesn't do much shell script parsing, though there is the matter of the spawning of printf and sort, but they're only called once each) and handles whitespace in the data:
a=(3 "2 a" 1) # Setup!
IFS=$'\n' b=( $(printf "%s\n" "${a[@]}" | sort) ); unset IFS # Sort!
printf "'%s' " "${b[@]}"; # Success!
Outputs:
'1' '2 a' '3'
Note that the IFS change is confined to that line only because of the trailing unset IFS, which puts word splitting back to its default behavior. If you know that the array has no whitespace in it, you don't need the IFS modification.
Inspiration was from @yas's answer and @Alcamtar's comments.
EDIT
Oh, I somehow missed the actually accepted answer which is even shorter than mine. Doh!
IFS=$'\n' sorted=($(sort <<<"${array[*]}")); unset IFS
Turns out that the unset is required because this is a variable assignment that has no command.
I'd recommend going to that answer because it has some interesting stuff on globbing which could be relevant if the array has wildcards in it. It also has a detailed description as to what is happening.
EDIT 2
GNU sort has an extension (-z) that delimits records using \0, which is good if you have LFs in your data. However, when the output gets returned to the shell to be assigned to an array, I don't see a good way to convert it so that the shell will split on \0: even with IFS=$'\0', the shell doesn't like it and doesn't break it up properly.
array=(z 'b c'); { set "${array[@]}"; printf '%s\n' "$@"; } \
| sort \
| mapfile -t array; declare -p array
declare -a array=([0]="b c" [1]="z")
Open a command group {...}; as part of a pipeline it runs in a subshell, so it gets its own set of positional arguments (e.g. $1, $2, etc).
Copy the array to the positional arguments (e.g. set "${array[@]}" copies the nth array element to the nth positional argument; the quotes preserve whitespace that may be contained in an array element).
Print each positional argument (e.g. printf '%s\n' "$@" prints each positional argument on its own line; again, the quotes preserve whitespace that may be contained in each positional argument).
Then sort does its thing.
Read the stream into an array with mapfile (e.g. mapfile -t array reads each line into the array variable array, and -t strips the trailing \n from each line).
Dump the array to show it's been sorted.
As a function:
set +m
shopt -s lastpipe
sort_array() {
declare -n ref=$1
set -- "${ref[@]}"
printf '%s\n' "$@" \
| sort \
| mapfile -t ref
}
then
array=(z y x); sort_array array; declare -p array
declare -a array=([0]="x" [1]="y" [2]="z")
I look forward to being ripped apart by all the UNIX gurus! :)
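For comparison, here is a sketch of the same idea without lastpipe, feeding mapfile through process substitution instead. The function name sort_array_ps and the nameref arr_ref are hypothetical, namerefs need Bash 4.3+, and the caller's array must not itself be called arr_ref:

```shell
#!/usr/bin/env bash
# sort_array_ps is a hypothetical name; needs Bash 4.3+ for namerefs.
sort_array_ps() {
  local -n arr_ref=$1     # arr_ref now aliases the caller's array
  # The process substitution is started before mapfile overwrites the array,
  # so printf still sees the original elements.
  mapfile -t arr_ref < <(printf '%s\n' "${arr_ref[@]}" | LC_ALL=C sort)
}

letters=(z 'b c' y)
sort_array_ps letters
declare -p letters
```

Because mapfile is not part of a pipeline here, it runs in the current shell without any shopt or set +m setup. Like the original, it does not cope with elements that contain newlines, and an empty input array comes back as one empty element.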
sorted=($(echo ${array[@]} | tr " " "\n" | sort))
In the spirit of bash / linux, I would pipe the best command-line tool for each step. sort does the main job but needs input separated by newline instead of space, so the very simple pipeline above simply does:
Echo array content --> replace space by newline --> sort
$() captures the output of the pipeline (command substitution)
($()) puts that captured output into an array, one element per word
Note: as @sorontar mentioned in a comment to a different question:
The sorted=($(...)) part is using the "split and glob" operator. You should turn glob off: set -f or set -o noglob or shopt -so noglob, or an element of the array like * will be expanded to a list of files.
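A small sketch of that advice, assuming globbing was enabled beforehand (the sample array is made up, and LC_ALL=C is only there to make the order predictable):

```shell
#!/usr/bin/env bash
array=('*' b a)

set -f                       # same as set -o noglob: no pathname expansion
sorted=( $(printf '%s\n' "${array[@]}" | LC_ALL=C sort) )
set +f                       # assumes globbing was enabled beforehand

printf '<%s>\n' "${sorted[@]}"
```

With set -f in effect the * element survives as a literal asterisk. Without it, the unquoted command substitution would replace it with the names of the files in the current directory (if there are any). Elements containing spaces are still split, so this only addresses the glob half of "split and glob".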

Resources