Sorting an array of mixed data - arrays

Suppose that I have an array of such data:
arr[0] = "someText1 (x,y,z) a"
arr[1] = "someText2 (x,y,z) b"
How can I sort this array lexicographically [only taking the text into account] using Bash?

Join on newline, pass to sort.
(IFS=$'\n'; sort <<<"${arr[*]}")
sort <<<"fnord" simply sends the string "fnord" as the standard input to sort; this is a Bash convenience notation for the clumsier echo "fnord" | sort (plus it avoids the extra process) and similarly, sort <<<"${arr[*]}" feeds the array to sort.
Because array pasting depends on the value of IFS, we change it to a newline so that "${arr[*]}" will result in a newline-separated list (the default IFS would cause the entries in the array to be expanded to a space-separated list). In order to not change IFS permanently, we do this in a subshell; hence, the enclosing parentheses.
The Bash manual page is rather dense, but it's all there; or see the Reference Manual.
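If you also want the sorted lines back in a Bash array, one option (a sketch; readarray needs Bash 4+) is to read the sorted output with readarray, keeping the IFS change inside the process substitution's subshell:
arr=("someText1 (x,y,z) a" "someText2 (x,y,z) b")
readarray -t sorted < <(IFS=$'\n'; sort <<<"${arr[*]}")   # -t strips the trailing newline from each line
printf '%s\n' "${sorted[@]}"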

One way is to implement your own sorting algorithm; bubble-sort is pretty simple.
Another way is to use an external program, such as sort, to do your sorting. Here is a shell function that takes the array elements as arguments, and saves a sorted copy of the array into a variable named $SORTED:
function sort_array () {
    SORTED=()
    local elem
    while IFS= read -r -d '' elem; do
        SORTED+=("$elem")
    done < <(printf '%s\0' "$@" | sort -z)
}
(Note the use of null bytes as a delimiter, rather than newlines, so that your array elements are unrestricted. This is achieved by the -d '' option to read, the \0 in the printf format-string, and the -z option to sort.)
It can be used like this:
arr=('a b c' 'd e f' 'b c d' 'e f g' 'c d e')
printf '%s\n' "${arr[#]}" # prints elements, one per line
sort_array "${arr[#]}"
arr=("${SORTED[#]}")
printf '%s\n' "${arr[#]}" # same as above, but now it's sorted

This code lives in a modular library, but you can just include the needed functions from the companion file array.sh to make it self-contained:
https://github.com/konsolebox/bash-library/blob/master/array/sort.sh
The function is customizable: it can produce elements or indices, and it can specialize on strings or integers. Give it a try.
One more thing: it doesn't depend on external binaries like sort, and it doesn't risk reinterpreting the data.

Finding elements in common between two ksh or bash arrays efficiently

I am writing a Korn shell script. I have two arrays (say, arr1 and arr2), both containing strings, and I need to check which elements from arr1 are present (as whole strings or substrings) in arr2. The most intuitive solution is having nested for loops, and checking if each element from arr1 can be found in arr2 (through grep) like this:
for arr1Element in ${arr1[*]}; do
for arr2Element in ${arr2[*]}; do
# using grep to check if arr1Element is present in arr2Element
echo $arr2Element | grep $arr1Element
done
done
The issue is that arr2 has around 3000 elements, so running a nested loop takes a long time. I am wondering if there is a better way to do this in Bash.
If I were doing this in Java, I could have calculated hashes for elements in one of the arrays, and then looked for those hashes in the other array, but I don't think Bash has any functionality for doing something like this (unless I was willing to write a hash calculating function in Bash).
Any suggestions?
Since version 4.0 Bash has associative arrays:
$ declare -A elements
$ elements[hello]=world
$ echo ${elements[hello]}
world
You can use this in the same way you would a Java Map.
declare -A map
for el in "${arr1[@]}"; do
    map[$el]="x"
done
for el in "${arr2[@]}"; do
    if [ -n "${map[$el]}" ] ; then
        echo "${el}"
    fi
done
Dealing with substrings is an altogether more weighty problem, and would be a challenge in any language, short of the brute-force algorithm you're already using. You could build a binary-tree index of character sequences, but I wouldn't try that in Bash!
BashFAQ #36 describes doing set arithmetic (unions, disjoint sets, etc) in bash with comm.
Assuming your values can't contain literal newlines, the following will emit a line per item in both arr1 and arr2:
comm -12 <(printf '%s\n' "${arr1[@]}" | sort -u) \
         <(printf '%s\n' "${arr2[@]}" | sort -u)
If your arrays are pre-sorted, you can remove the sorts (which will make this extremely memory- and time-efficient with large arrays, moreso than the grep-based approach).
Since you're OK with using grep, and since you want to match substrings as well as full strings, one approach is to write:
printf '%s\n' "${arr2[#]}" \
| grep -o -F "$(printf '%s\n' "${arr1[#]}")
and let grep optimize as it sees fit.
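If you want the matching substrings collected in an array rather than just printed, one option (a sketch, assuming Bash 4+ for readarray; matches is a name chosen here) is:
readarray -t matches < <(
    printf '%s\n' "${arr2[@]}" \
      | grep -o -F "$(printf '%s\n' "${arr1[@]}")" \
      | sort -u
)
printf 'match: %s\n' "${matches[@]}"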
Here's a bash/awk idea:
# some sample arrays
$ arr1=( my first string "hello world")
$ arr2=( my last stringbean strings "well, hello world!")
# break array elements into separate lines
$ printf '%s\n' "${arr1[#]}"
my
first
string
hello world
$ printf '%s\n' "${arr2[#]}"
my
last
stringbean
strings
well, hello world!
# use the 'printf' command output as input to our awk command
$ awk '
NR==FNR { a[NR]=$0 ; next }
{ for (i in a)
if ($0 ~ a[i]) print "array1 string {"a[i]"} is a substring of array2 string {"$0"}" }
' <( printf '%s\n' "${arr1[@]}" ) \
  <( printf '%s\n' "${arr2[@]}" )
array1 string {my} is a substring of array2 string {my}
array1 string {string} is a substring of array2 string {stringbean}
array1 string {string} is a substring of array2 string {strings}
array1 string {hello world} is a substring of array2 string {well, hello world!}
NR==FNR : for file #1 only: store elements into awk array named 'a'
next : process next line in file #1; at this point the rest of the awk script is ignored for file #1; then for each line in file #2 ...
for (i in a) : for each index 'i' in array 'a' ...
if ($0 ~ a[i] ) : see if a[i] is a substring of the current line ($0) from file #2 and if so ...
print "array1... : output info about the match
A test run using the following data:
arr1 == 3300 elements
arr2 == 500 elements
When all arr2 elements have a substring/pattern match in arr1 (ie, 500 matches), total time to run is ~27 seconds ... so the repetitive looping through the array takes a toll.
Obviously (?) need to reduce the volume of repetitive actions ...
for an exact string match the comm solution by Charles Duffy makes sense (it runs against the same 3300/500 test set in about 0.5 seconds)
for a substring/pattern match I was able to get an egrep solution to run in about 5 seconds (see my other answer/post)
An egrep solution for substring/pattern matching ...
egrep -f <(printf '.*%s.*\n' "${arr1[@]}") \
         <(printf '%s\n' "${arr2[@]}")
egrep -f : take patterns to search from the file designated by the -f, which in this case is ...
<(printf '.*%s.*\n' "${arr1[@]}") : convert arr1 elements into 1 pattern per line, appending a regex wild card character (.*) as prefix and suffix
<(printf '%s\n' "${arr2[@]}") : convert arr2 elements into 1 string per line
When run against a sample data set like:
arr1 == 3300 elements
arr2 == 500 elements
... with 500 matches, total run time is ~5 seconds; there's still a good bit of repetitive processing going on with egrep, but not as bad as with my other answer (bash/awk) ... and of course not as fast as the comm solution, which eliminates the repetitive processing.

Sort multiple column String array in bash

I have an array of strings:
arr[0]="1 10 2Z6UVU6h"
arr[1]="1 12 7YzF5mFs"
arr[2]="2 36 qRwAiLg7"
How could I sort by the 2nd column and use the 1st as a tie break?
Is there anything similar to something like...
sort -k 2,2n -k 1,1 $arr
As long as there are no newline characters in any array element, it's straight-forward: Just printf the array into sort and capture the output:
mapfile -t sorted < <(printf "%s\n" "${arr[@]}" | sort -k2,2n -k1,1)
(The use of process substitution is to avoid having the mapfile run in a subshell, which wouldn't be helpful since the goal is to set the value of $sorted in this shell.)
If the array elements might contain newlines, then you could use NUL as a delimiter in the printf and the sort (option -z for sort), but you'd have to replace mapfile with an explicit loop because mapfile does not offer an option to change the line delimiter. read does (-d '' will cause read to use NUL as a line delimiter), but it only reads one line at a time.
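For the record, a minimal sketch of that NUL-delimited variant (assuming GNU sort for the -z option) could look like this:
sorted=()
while IFS= read -r -d '' elem; do
    sorted+=("$elem")
done < <(printf '%s\0' "${arr[@]}" | sort -z -k2,2n -k1,1)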

How to sort an array in Bash

I have an array in Bash, for example:
array=(a c b f 3 5)
I need to sort the array. Not just displaying the content in a sorted way, but to get a new array with the sorted elements. The new sorted array can be a completely new one or the old one.
You don't really need all that much code:
IFS=$'\n' sorted=($(sort <<<"${array[*]}"))
unset IFS
Supports whitespace in elements (as long as it's not a newline), and works in Bash 3.x.
e.g.:
$ array=("a c" b f "3 5")
$ IFS=$'\n' sorted=($(sort <<<"${array[*]}")); unset IFS
$ printf "[%s]\n" "${sorted[#]}"
[3 5]
[a c]
[b]
[f]
Note: @sorontar has pointed out that care is required if elements contain wildcards such as * or ?:
The sorted=($(...)) part is using the "split and glob" operator. You should turn glob off: set -f or set -o noglob or shopt -op noglob or an element of the array like * will be expanded to a list of files.
What's happening:
The result is a culmination of six things that happen in this order:
IFS=$'\n'
"${array[*]}"
<<<
sort
sorted=($(...))
unset IFS
First, the IFS=$'\n'
This is an important part of our operation that affects the outcome of 2 and 5 in the following way:
Given:
"${array[*]}" expands to every element delimited by the first character of IFS
sorted=() creates elements by splitting on every character of IFS
IFS=$'\n' sets things up so that elements are expanded using a new line as the delimiter, and then later created in a way that each line becomes an element. (i.e. Splitting on a new line.)
Delimiting by a new line is important because that's how sort operates (sorting per line). Splitting by only a new line is not as important, but is needed to preserve elements that contain spaces or tabs.
The default value of IFS is a space, a tab, followed by a new line, and would be unfit for our operation.
Next, the sort <<<"${array[*]}" part
<<<, called a here-string, takes the expansion of "${array[*]}", as explained above, and feeds it into the standard input of sort.
With our example, sort is fed this following string:
a c
b
f
3 5
Since sort sorts, it produces:
3 5
a c
b
f
Next, the sorted=($(...)) part
The $(...) part, called command substitution, causes its content (sort <<<"${array[*]}") to run as a normal command, while taking the resulting standard output as the literal text that goes wherever $(...) was.
In our example, this produces something similar to simply writing:
sorted=(3 5
a c
b
f
)
sorted then becomes an array that's created by splitting this literal on every new line.
Finally, the unset IFS
This resets the value of IFS to the default value, and is just good practice.
It's to ensure we don't cause trouble with anything that relies on IFS later in our script. (Otherwise we'd need to remember that we've switched things around--something that might be impractical for complex scripts.)
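Putting the pieces together (and adding the noglob guard mentioned in the note above), a self-contained sketch of this approach might look like:
array=("a c" b f "3 5")
set -f                                        # disable globbing so elements like '*' would not be expanded
IFS=$'\n' sorted=($(sort <<<"${array[*]}"))
unset IFS                                     # restore default word splitting
set +f                                        # re-enable globbing
printf "[%s]\n" "${sorted[@]}"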
Original response:
array=(a c b "f f" 3 5)
readarray -t sorted < <(for a in "${array[@]}"; do echo "$a"; done | sort)
output:
$ for a in "${sorted[@]}"; do echo "$a"; done
3
5
a
b
c
f f
Note this version copes with values that contain special characters or whitespace (except newlines)
Note readarray is supported in bash 4+.
Edit Based on the suggestion by @Dimitre I updated it to:
readarray -t sorted < <(printf '%s\0' "${array[@]}" | sort -z | xargs -0n1)
which has the benefit of even sorting elements with embedded newline characters correctly. Unfortunately, as correctly signaled by @ruakh, this didn't mean that the result of readarray would be correct, because readarray has no option to use NUL instead of regular newlines as line separators.
If you don't need to handle special shell characters in the array elements:
array=(a c b f 3 5)
sorted=($(printf '%s\n' "${array[@]}"|sort))
With bash you'll need an external sorting program anyway.
With zsh no external programs are needed and special shell characters are easily handled:
% array=('a a' c b f 3 5); printf '%s\n' "${(o)array[@]}"
3
5
a a
b
c
f
ksh has set -s to sort ASCIIbetically.
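For instance, a brief sketch of the ksh approach (assuming ksh93, where set -s sorts the positional parameters):
array=(a c b f 3 5)
set -s -- "${array[@]}"   # sort the positional parameters lexicographically
array=("$@")
echo "${array[@]}"        # 3 5 a b c f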
Here's a pure Bash quicksort implementation:
#!/bin/bash
# quicksorts positional arguments
# return is in array qsort_ret
qsort() {
   local pivot i smaller=() larger=()
   qsort_ret=()
   (($#==0)) && return 0
   pivot=$1
   shift
   for i; do
      # This sorts strings lexicographically.
      if [[ $i < $pivot ]]; then
         smaller+=( "$i" )
      else
         larger+=( "$i" )
      fi
   done
   qsort "${smaller[@]}"
   smaller=( "${qsort_ret[@]}" )
   qsort "${larger[@]}"
   larger=( "${qsort_ret[@]}" )
   qsort_ret=( "${smaller[@]}" "$pivot" "${larger[@]}" )
}
Use as, e.g.,
$ array=(a c b f 3 5)
$ qsort "${array[@]}"
$ declare -p qsort_ret
declare -a qsort_ret='([0]="3" [1]="5" [2]="a" [3]="b" [4]="c" [5]="f")'
This implementation is recursive… so here's an iterative quicksort:
#!/bin/bash
# quicksorts positional arguments
# return is in array qsort_ret
# Note: iterative, NOT recursive! :)
qsort() {
   (($#==0)) && return 0
   local stack=( 0 $(($#-1)) ) beg end i pivot smaller larger
   qsort_ret=("$@")
   while ((${#stack[@]})); do
      beg=${stack[0]}
      end=${stack[1]}
      stack=( "${stack[@]:2}" )
      smaller=() larger=()
      pivot=${qsort_ret[beg]}
      for ((i=beg+1;i<=end;++i)); do
         if [[ "${qsort_ret[i]}" < "$pivot" ]]; then
            smaller+=( "${qsort_ret[i]}" )
         else
            larger+=( "${qsort_ret[i]}" )
         fi
      done
      qsort_ret=( "${qsort_ret[@]:0:beg}" "${smaller[@]}" "$pivot" "${larger[@]}" "${qsort_ret[@]:end+1}" )
      if ((${#smaller[@]}>=2)); then stack+=( "$beg" "$((beg+${#smaller[@]}-1))" ); fi
      if ((${#larger[@]}>=2)); then stack+=( "$((end-${#larger[@]}+1))" "$end" ); fi
   done
}
In both cases, you can change the ordering you use: I used string comparisons, but you can use arithmetic comparisons, compare by file modification time, etc.; just use the appropriate test. You can even make it more generic and have it take a first argument that is the comparison function to use, e.g.,
#!/bin/bash
# quicksorts positional arguments
# return is in array qsort_ret
# Note: iterative, NOT recursive! :)
# First argument is a function name that takes two arguments and compares them
qsort() {
   (($#<=1)) && return 0
   local compare_fun=$1
   shift
   local stack=( 0 $(($#-1)) ) beg end i pivot smaller larger
   qsort_ret=("$@")
   while ((${#stack[@]})); do
      beg=${stack[0]}
      end=${stack[1]}
      stack=( "${stack[@]:2}" )
      smaller=() larger=()
      pivot=${qsort_ret[beg]}
      for ((i=beg+1;i<=end;++i)); do
         if "$compare_fun" "${qsort_ret[i]}" "$pivot"; then
            smaller+=( "${qsort_ret[i]}" )
         else
            larger+=( "${qsort_ret[i]}" )
         fi
      done
      qsort_ret=( "${qsort_ret[@]:0:beg}" "${smaller[@]}" "$pivot" "${larger[@]}" "${qsort_ret[@]:end+1}" )
      if ((${#smaller[@]}>=2)); then stack+=( "$beg" "$((beg+${#smaller[@]}-1))" ); fi
      if ((${#larger[@]}>=2)); then stack+=( "$((end-${#larger[@]}+1))" "$end" ); fi
   done
}
Then you can have this comparison function:
compare_mtime() { [[ $1 -nt $2 ]]; }
and use:
$ qsort compare_mtime *
$ declare -p qsort_ret
to have the files in current folder sorted by modification time (newest first).
NOTE. These functions are pure Bash! no external utilities, and no subshells! they are safe wrt any funny symbols you may have (spaces, newline characters, glob characters, etc.).
NOTE2. The test [[ $i < $pivot ]] is correct. It uses the lexicographical string comparison. If your array only contains integers and you want to sort numerically, use ((i < pivot)) instead.
Please don't edit this answer to change that. It has already been edited (and rolled back) a couple of times. The test I gave here is correct and corresponds to the output given in the example: the example uses both strings and numbers, and the purpose is to sort it in lexicographical order. Using ((i < pivot)) in this case is wrong.
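For example, a numeric ascending sort can be obtained with a comparison function like the following (a sketch; the name compare_num is chosen here and is not part of the original answer):
compare_num() { (( $1 < $2 )); }

array=(10 2 33 4)
qsort compare_num "${array[@]}"
declare -p qsort_ret    # expected order: 2 4 10 33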
tl;dr:
Sort array a_in and store the result in a_out (elements must not have embedded newlines[1]):
Bash v4+:
readarray -t a_out < <(printf '%s\n' "${a_in[@]}" | sort)
Bash v3:
IFS=$'\n' read -d '' -r -a a_out < <(printf '%s\n' "${a_in[@]}" | sort)
Advantages over antak's solution:
You needn't worry about accidental globbing (accidental interpretation of the array elements as filename patterns), so no extra command is needed to disable globbing (set -f, and set +f to restore it later).
You needn't worry about resetting IFS with unset IFS.[2]
Optional reading: explanation and sample code
The above combines Bash code with external utility sort for a solution that works with arbitrary single-line elements and either lexical or numerical sorting (optionally by field):
Performance: For around 20 elements or more, this will be faster than a pure Bash solution - significantly and increasingly so once you get beyond around 100 elements.
(The exact thresholds will depend on your specific input, machine, and platform.)
The reason it is fast is that it avoids Bash loops.
printf '%s\n' "${a_in[#]}" | sort performs the sorting (lexically, by default - see sort's POSIX spec):
"${a_in[#]}" safely expands to the elements of array a_in as individual arguments, whatever they contain (including whitespace).
printf '%s\n' then prints each argument - i.e., each array element - on its own line, as-is.
Note the use of a process substitution (<(...)) to provide the sorted output as input to read / readarray (via redirection to stdin, <), because read / readarray must run in the current shell (must not run in a subshell) in order for output variable a_out to be visible to the current shell (for the variable to remain defined in the remainder of the script).
Reading sort's output into an array variable:
Bash v4+: readarray -t a_out reads the individual lines output by sort into the elements of array variable a_out, without including the trailing \n in each element (-t).
Bash v3: readarray doesn't exist, so read must be used:
IFS=$'\n' read -d '' -r -a a_out tells read to read into array (-a) variable a_out, reading the entire input, across lines (-d ''), but splitting it into array elements by newlines (IFS=$'\n'; $'\n', which produces a literal newline (LF), is a so-called ANSI C-quoted string).
(-r, an option that should virtually always be used with read, disables unexpected handling of \ characters.)
Annotated sample code:
#!/usr/bin/env bash
# Define input array `a_in`:
# Note the element with embedded whitespace ('a c') and the element that looks like
# a glob ('*'), chosen to demonstrate that elements with line-internal whitespace
# and glob-like contents are correctly preserved.
a_in=( 'a c' b f 5 '*' 10 )
# Sort and store output in array `a_out`
# Saving back into `a_in` is also an option.
IFS=$'\n' read -d '' -r -a a_out < <(printf '%s\n' "${a_in[@]}" | sort)
# Bash 4.x: use the simpler `readarray -t`:
# readarray -t a_out < <(printf '%s\n' "${a_in[@]}" | sort)
# Print sorted output array, line by line:
printf '%s\n' "${a_out[#]}"
Due to use of sort without options, this yields lexical sorting (digits sort before letters, and digit sequences are treated lexically, not as numbers):
*
10
5
a c
b
f
If you wanted numerical sorting by the 1st field, you'd use sort -k1,1n instead of just sort, which yields (non-numbers sort before numbers, and numbers sort correctly):
*
a c
b
f
5
10
[1] To handle elements with embedded newlines, use the following variant (Bash v4+, with GNU sort):
readarray -d '' -t a_out < <(printf '%s\0' "${a_in[@]}" | sort -z).
Michał Górny's helpful answer has a Bash v3 solution.
[2] While IFS is set in the Bash v3 variant, the change is scoped to the command.
By contrast, what follows IFS=$'\n'  in antak's answer is an assignment rather than a command, in which case the IFS change is global.
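A quick sketch of that difference (safe to paste into an interactive shell):
IFS=$'\n' read -r -a tmp <<< "x y"            # prefix assignment to a command: the IFS change only applies to read
printf 'after read: IFS=%q\n' "$IFS"          # still the default space/tab/newline

IFS=$'\n' tmp=(x y)                           # plain assignments: the IFS change persists
printf 'after assignment: IFS=%q\n' "$IFS"    # now a lone newline
unset IFS                                     # restore the default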
On the 3-hour train trip from Munich to Frankfurt (which I had trouble reaching because Oktoberfest starts tomorrow) I was thinking about my first post. Employing a global array is a much better idea for a general sort function. The following function handles arbitrary strings (newlines, blanks, etc.):
declare BSORT=()
function bubble_sort()
{ #
  # @param [ARGUMENTS]...
  #
  # Sort all positional arguments and store them in global array BSORT.
  # Without arguments sort this array. Return the number of iterations made.
  #
  # Bubble sorting lets the heaviest element sink to the bottom.
  #
  (($# > 0)) && BSORT=("$@")
  local j=0 ubound=$((${#BSORT[*]} - 1))
  while ((ubound > 0))
  do
    local i=0
    while ((i < ubound))
    do
      if [ "${BSORT[$i]}" \> "${BSORT[$((i + 1))]}" ]
      then
        local t="${BSORT[$i]}"
        BSORT[$i]="${BSORT[$((i + 1))]}"
        BSORT[$((i + 1))]="$t"
      fi
      ((++i))
    done
    ((++j))
    ((--ubound))
  done
  echo $j
}
bubble_sort a c b 'z y' 3 5
echo ${BSORT[@]}
This prints:
3 5 a b c z y
The same output is created from
BSORT=(a c b 'z y' 3 5)
bubble_sort
echo ${BSORT[@]}
Note that Bash probably uses smart pointers internally, so the swap operation could be cheap (although I doubt it). However, bubble_sort demonstrates that more advanced functions like merge_sort are also within the reach of the shell language.
Another solution that uses external sort and copes with any special characters (except for NULs :)). Should work with bash-3.2 and GNU or BSD sort (sadly, POSIX doesn't include -z).
local e new_array=()
while IFS= read -r -d '' e; do
    new_array+=( "${e}" )
done < <(printf "%s\0" "${array[@]}" | LC_ALL=C sort -z)
First look at the input redirection at the end. We're using printf built-in to write out the array elements, zero-terminated. The quoting makes sure array elements are passed as-is, and specifics of shell printf cause it to reuse the last part of format string for each remaining parameter. That is, it's equivalent to something like:
for e in "${array[@]}"; do
    printf "%s\0" "${e}"
done
The null-terminated element list is then passed to sort. The -z option causes it to read null-terminated elements, sort them and output null-terminated as well. If you needed to get only the unique elements, you can pass -u since it is more portable than uniq -z. The LC_ALL=C ensures stable sort order independently of locale — sometimes useful for scripts. If you want the sort to respect locale, remove that.
The <() construct obtains the descriptor to read from the spawned pipeline, and < redirects the standard input of the while loop to it. If you need to access the standard input inside the pipe, you may use another descriptor — exercise for the reader :).
Now, back to the beginning. The read built-in reads output from the redirected stdin. Setting empty IFS disables word splitting which is unnecessary here — as a result, read reads the whole 'line' of input to the single provided variable. -r option disables escape processing that is undesired here as well. Finally, -d '' sets the line delimiter to NUL — that is, tells read to read zero-terminated strings.
As a result, the loop is executed once for every successive zero-terminated array element, with the value being stored in e. The example just puts the items in another array but you may prefer to process them directly :).
Of course, that's just one of the many ways of achieving the same goal. As I see it, it is simpler than implementing a complete sorting algorithm in bash, and in some cases it will be faster. It handles all special characters including newlines and should work on most common systems. Most importantly, it may teach you something new and awesome about bash :).
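Since the snippet uses local, it is meant to live inside a function; a minimal sketch of such a wrapper (the name sort_into_new_array is chosen here, not from the original answer) could be:
sort_into_new_array() {
    # Elements to sort are passed as arguments; the result lands in the global new_array.
    local e
    new_array=()
    while IFS= read -r -d '' e; do
        new_array+=( "$e" )
    done < <(printf '%s\0' "$@" | LC_ALL=C sort -z)
}

array=($'x\ny' 'a b' c)
sort_into_new_array "${array[@]}"
printf '<%s>\n' "${new_array[@]}"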
Keep it simple ;)
In the following example, the array b is the sorted version of the array a!
The second line echoes each item of the array a, pipes them to the sort command, and the output is used to initialize the array b.
a=(2 3 1)
b=( $( for x in ${a[@]}; do echo $x; done | sort ) )
echo ${b[@]} # output: 1 2 3
min sort:
#!/bin/bash
array=(.....)
index_of_element1=0
while (( index_of_element1 < ${#array[@]} )); do
    element_1="${array[${index_of_element1}]}"
    index_of_element2=$((index_of_element1 + 1))
    index_of_min=${index_of_element1}
    min_element="${element_1}"
    for element_2 in "${array[@]:$((index_of_element1 + 1))}"; do
        min_element="$(printf "%s\n%s" "${min_element}" "${element_2}" | sort | head -n 1)"
        if [[ "${min_element}" == "${element_2}" ]]; then
            index_of_min=${index_of_element2}
        fi
        let index_of_element2++
    done
    array[${index_of_element1}]="${min_element}"
    array[${index_of_min}]="${element_1}"
    let index_of_element1++
done
try this:
echo ${array[@]} | awk 'BEGIN{RS=" ";} {print $1}' | sort
Output will be:
3
5
a
b
c
f
Problem solved.
If you can compute a unique integer for each element in the array, like this:
tab='0123456789abcdefghijklmnopqrstuvwxyz'
# build the reversed ordinal map
for ((i = 0; i < ${#tab}; i++)); do
    declare -g ord_${tab:i:1}=$i
done
function sexy_int() {
    local sum=0
    local i ch ref
    for ((i = 0; i < ${#1}; i++)); do
        ch="${1:i:1}"
        ref="ord_$ch"
        (( sum += ${!ref} ))
    done
    return $sum
}
sexy_int hello
echo "hello -> $?"
sexy_int world
echo "world -> $?"
Then you can use these integers as array indexes, because Bash always uses sparse arrays, so there is no need to worry about unused indexes:
array=(a c b f 3 5)
for el in "${array[@]}"; do
    sexy_int "$el"
    sorted[$?]="$el"
done
echo "${sorted[@]}"
Pros. Fast.
Cons. Duplicated elements are merged, and it can be impossible to map contents to 32-bit unique integers.
array=(a c b f 3 5)
new_array=($(echo "${array[@]}" | sed 's/ /\n/g' | sort))
echo ${new_array[@]}
The contents of new_array will be:
3 5 a b c f
There is a workaround for the usual problem of spaces and newlines:
Use a character that is not in the original array (like $'\1' or $'\4' or similar).
This function gets the job done:
# Sort an array that may have spaces or newlines, with a workaround (wa=$'\4')
sortarray(){ local wa=$'\4' IFS=''
    if [[ $* =~ [$wa] ]]; then
        echo "$0: error: array contains the workaround char" >&2
        exit 1
    fi
    set -f; local IFS=$'\n' x nl=$'\n'
    set -- $(printf '%s\n' "${@//$nl/$wa}" | sort -n)
    for x
    do sorted+=("${x//$wa/$nl}")
    done
}
This will sort the array:
$ array=( a b 'c d' $'e\nf' $'g\1h')
$ sortarray "${array[@]}"
$ printf '<%s>\n' "${sorted[@]}"
<a>
<b>
<c d>
<e
f>
<gh>
This will complain that the source array contains the workaround character:
$ array=( a b 'c d' $'e\nf' $'g\4h')
$ sortarray "${array[@]}"
./script: error: array contains the workaround char
description
We set two local variables wa (workaround char) and a null IFS
Then (with IFS null) we test that the whole array $*
does not contain any workaround char: [[ $* =~ [$wa] ]].
If it does, raise a message and signal an error: exit 1
Avoid filename expansions: set -f
Set a new value of IFS (IFS=$'\n'), a loop variable x and a newline var (nl=$'\n').
We print all values of the arguments received (the input array $@),
but we replace any newline by the workaround char "${@//$nl/$wa}",
send those values to be sorted with sort -n,
and place back all the sorted values in the positional arguments: set --.
Then we assign each argument one by one (to preserve newlines).
in a loop for x
to a new array: sorted+=(…)
inside quotes to preserve any existing newline.
restoring the workaround to a newline "${x//$wa/$nl}".
done
This question looks closely related. And BTW, here's a mergesort in Bash (without external processes):
mergesort() {
    local -n -r input_reference="$1"
    local -n output_reference="$2"
    local -r -i size="${#input_reference[@]}"
    local merge previous
    local -a -i runs indices
    local -i index previous_idx merged_idx \
             run_a_idx run_a_stop \
             run_b_idx run_b_stop
    output_reference=("${input_reference[@]}")
    if ((size == 0)); then return; fi
    previous="${output_reference[0]}"
    runs=(0)
    for ((index = 0;;)) do
        for ((++index;; ++index)); do
            if ((index >= size)); then break 2; fi
            if [[ "${output_reference[index]}" < "$previous" ]]; then break; fi
            previous="${output_reference[index]}"
        done
        previous="${output_reference[index]}"
        runs+=(index)
    done
    runs+=(size)
    while (("${#runs[@]}" > 2)); do
        indices=("${!runs[@]}")
        merge=("${output_reference[@]}")
        for ((index = 0; index < "${#indices[@]}" - 2; index += 2)); do
            merged_idx=runs[indices[index]]
            run_a_idx=merged_idx
            previous_idx=indices[$((index + 1))]
            run_a_stop=runs[previous_idx]
            run_b_idx=runs[previous_idx]
            run_b_stop=runs[indices[$((index + 2))]]
            unset runs[previous_idx]
            while ((run_a_idx < run_a_stop && run_b_idx < run_b_stop)); do
                if [[ "${merge[run_a_idx]}" < "${merge[run_b_idx]}" ]]; then
                    output_reference[merged_idx++]="${merge[run_a_idx++]}"
                else
                    output_reference[merged_idx++]="${merge[run_b_idx++]}"
                fi
            done
            while ((run_a_idx < run_a_stop)); do
                output_reference[merged_idx++]="${merge[run_a_idx++]}"
            done
            while ((run_b_idx < run_b_stop)); do
                output_reference[merged_idx++]="${merge[run_b_idx++]}"
            done
        done
    done
}
declare -ar input=({z..a}{z..a})
declare -a output
mergesort input output
echo "${input[#]}"
echo "${output[#]}"
Many thanks to the people who answered before me. Using their excellent input, the bash documentation and ideas from other threads, this is what works perfectly for me without an IFS change:
array=("a \n c" b f "3 5")
Using process substitution and read array in bash > v4.4 WITH EOL character
readarray -t sorted < <(sort < <(printf '%s\n' "${array[@]}"))
Using process substitution and read array in bash > v4.4 WITH NULL character
readarray -td '' sorted < <(sort -z < <(printf '%s\0' "${array[@]}"))
Finally we verify with
printf "[%s]\n" "${sorted[@]}"
output is
[3 5]
[a \n c]
[b]
[f]
Please, let me know if that is a correct test for embedded \n, as both solutions produce the same result, but the first one is not supposed to work properly with embedded \n.
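For what it's worth, "a \n c" in double quotes contains a literal backslash and n, not a real newline, so it does not exercise the difference. A sketch of a test with a genuinely embedded newline (using ANSI C quoting) would be:
array=($'a\nc' b f "3 5")          # $'a\nc' contains a real newline
readarray -td '' sorted < <(sort -z < <(printf '%s\0' "${array[@]}"))
printf '[%s]\n' "${sorted[@]}"     # the element with the newline prints across two lines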
I am not convinced that you'll need an external sorting program in Bash.
Here is my implementation for the simple bubble-sort algorithm.
function bubble_sort()
{ #
  # Sorts all positional arguments and echoes them back.
  #
  # Bubble sorting lets the heaviest (longest) element sink to the bottom.
  #
  local array=($@) max=$(($# - 1))
  while ((max > 0))
  do
    local i=0
    while ((i < max))
    do
      if [ ${array[$i]} \> ${array[$((i + 1))]} ]
      then
        local t=${array[$i]}
        array[$i]=${array[$((i + 1))]}
        array[$((i + 1))]=$t
      fi
      ((i += 1))
    done
    ((max -= 1))
  done
  echo ${array[@]}
}
array=(a c b f 3 5)
echo " input: ${array[#]}"
echo "output: $(bubble_sort ${array[#]})"
This shall print:
input: a c b f 3 5
output: 3 5 a b c f
a=(e b 'c d')
shuf -e "${a[@]}" | sort >/tmp/f
mapfile -t g </tmp/f
Great answers here. Learned a lot. After reading them all, I figured I'd throw my hat into the ring. I think this is the shortest method (and probably faster, as it doesn't do much shell-script parsing; there is the matter of spawning printf and sort, but they're only called once each), and it handles whitespace in the data:
a=(3 "2 a" 1) # Setup!
IFS=$'\n' b=( $(printf "%s\n" "${a[@]}" | sort) ); unset IFS # Sort!
printf "'%s' " "${b[@]}"; # Success!
Outputs:
'1' '2 a' '3'
Note that the IFS change is limited in scope to the line it is on. If you know that the array has no whitespace in it, you don't need the IFS modification.
Inspiration was from @yas's answer and @Alcamtar's comments.
EDIT
Oh, I somehow missed the actually accepted answer which is even shorter than mine. Doh!
IFS=$'\n' sorted=($(sort <<<"${array[*]}")); unset IFS
Turns out that the unset is required because this is a variable assignment that has no command.
I'd recommend going to that answer because it has some interesting stuff on globbing which could be relevant if the array has wildcards in it. It also has a detailed description as to what is happening.
EDIT 2
GNU sort has an extension (-z) in which it delimits records using \0, which is good if you have LFs in your data. However, when the output gets returned to the shell to be assigned to an array, I don't see a good way to convert it so that the shell will delimit on \0, because even setting IFS=$'\0', the shell doesn't like it and doesn't properly break it up.
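For the record, the NUL-delimited variant shown earlier in this thread sidesteps IFS entirely by using readarray -d '' (Bash 4.4+); a sketch:
array=($'a\nb' z 'c d')
readarray -t -d '' sorted < <(printf '%s\0' "${array[@]}" | sort -z)
declare -p sorted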
array=(z 'b c'); { set "${array[@]}"; printf '%s\n' "$@"; } \
| sort \
| mapfile -t array; declare -p array
declare -a array=([0]="b c" [1]="z")
Open an inline function {...} to get a fresh set of positional arguments (e.g. $1, $2, etc).
Copy the array to the positional arguments. (e.g. set "${array[@]}" will copy the nth array element to the nth positional argument. Note the quotes preserve whitespace that may be contained in an array element).
Print each positional argument (e.g. printf '%s\n' "$@" will print each positional argument on its own line. Again, note the quotes preserve whitespace that may be contained in each positional argument).
Then sort does its thing.
Read the stream into an array with mapfile (e.g. mapfile -t array reads each line into the variable array, and -t strips the trailing \n from each line).
Dump the array to show it's been sorted.
As a function:
set +m             # job control must be off for lastpipe to take effect
shopt -s lastpipe  # run the last command of a pipeline (mapfile) in the current shell
sort_array() {
declare -n ref=$1
set "${ref[@]}"
printf '%s\n' "$@" \
| sort \
| mapfile -t $ref
}
then
array=(z y x); sort_array array; declare -p array
declare -a array=([0]="x" [1]="y" [2]="z")
I look forward to being ripped apart by all the UNIX gurus! :)
sorted=($(echo ${array[@]} | tr " " "\n" | sort))
In the spirit of bash / linux, I would pipe the best command-line tool for each step. sort does the main job but needs input separated by newline instead of space, so the very simple pipeline above simply does:
Echo array content --> replace space by newline --> sort
$() is to capture the result
($()) is to put the "captured result" in an array
Note: as @sorontar mentioned in a comment to a different question:
The sorted=($(...)) part is using the "split and glob" operator. You should turn glob off: set -f or set -o noglob or shopt -op noglob or an element of the array like * will be expanded to a list of files.
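Following that warning, a hedged variant of the same one-liner that disables globbing just around the split and then restores it:
set -f    # turn globbing off so elements like * are not expanded
sorted=($(echo ${array[@]} | tr " " "\n" | sort))
set +f    # turn globbing back on (assumes it was on to begin with)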

Passing arrays as parameters in bash

How can I pass an array as parameter to a bash function?
Note: After not finding an answer here on Stack Overflow, I posted my somewhat crude solution myself. It allows for only one array being passed, and it being the last element of the parameter list. Actually, it is not passing the array at all, but a list of its elements, which are re-assembled into an array by called_function(), but it worked for me. If someone knows a better way, feel free to add it here.
You can pass multiple arrays as arguments using something like this:
takes_ary_as_arg()
{
declare -a argAry1=("${!1}")
echo "${argAry1[#]}"
declare -a argAry2=("${!2}")
echo "${argAry2[#]}"
}
try_with_local_arys()
{
# array variables could have local scope
local descTable=(
"sli4-iread"
"sli4-iwrite"
"sli3-iread"
"sli3-iwrite"
)
local optsTable=(
"--msix --iread"
"--msix --iwrite"
"--msi --iread"
"--msi --iwrite"
)
takes_ary_as_arg descTable[@] optsTable[@]
}
try_with_local_arys
will echo:
sli4-iread sli4-iwrite sli3-iread sli3-iwrite
--msix --iread --msix --iwrite --msi --iread --msi --iwrite
Edit/notes: (from comments below)
descTable and optsTable are passed as names and are expanded in the function. Thus no $ is needed when given as parameters.
Note that this still works even with descTable etc being defined with local, because locals are visible to the functions they call.
The ! in ${!1} is indirect expansion: $1 holds the text descTable[@], so ${!1} expands to the elements of descTable.
declare -a just makes the indexed array explicit; it is not strictly necessary.
Note: This is the somewhat crude solution I posted myself, after not finding an answer here on Stack Overflow. It allows for only one array being passed, and it being the last element of the parameter list. Actually, it is not passing the array at all, but a list of its elements, which are re-assembled into an array by called_function(), but it worked for me. Somewhat later Ken posted his solution, but I kept mine here for "historic" reference.
calling_function()
{
variable="a"
array=( "x", "y", "z" )
called_function "${variable}" "${array[#]}"
}
called_function()
{
local_variable="${1}"
shift
local_array=("${#}")
}
Commenting on Ken Bertelson solution and answering Jan Hettich:
How it works
the takes_ary_as_arg descTable[@] optsTable[@] line in the try_with_local_arys() function sends:
This actually creates a copy of the descTable and optsTable arrays which are accessible to the takes_ary_as_arg function.
The takes_ary_as_arg() function receives descTable[@] and optsTable[@] as strings; that means $1 == descTable[@] and $2 == optsTable[@].
At the beginning of the takes_ary_as_arg() function it uses the ${!parameter} syntax, which is called indirect reference or sometimes double reference; this means that instead of using $1's value, we use the value of the expanded value of $1, for example:
baba=booba
variable=baba
echo ${variable} # baba
echo ${!variable} # booba
likewise for $2.
putting this in argAry1=("${!1}") creates argAry1 as an array (the brackets following =) with the expanded descTable[@], just like writing there argAry1=("${descTable[@]}") directly.
the declare there is not required.
N.B.: It is worth mentioning that array initialization using this bracket form initializes the new array according to IFS, the Internal Field Separator, which is by default space, tab and newline. In that case, since it used the [@] notation, each element is seen by itself as if it were quoted (contrary to [*]).
My reservation with it
In BASH, local variable scope is the current function and every child function called from it; this translates to the fact that the takes_ary_as_arg() function "sees" those descTable[@] and optsTable[@] arrays, thus it is working (see above explanation).
That being the case, why not directly look at those variables themselves? It is just like writing there:
argAry1=("${descTable[@]}")
See the above explanation, which just copies descTable[@] array's values according to the current IFS.
In summary
This is passing, in essence, nothing by value - as usual.
I also want to emphasize Dennis Williamson's comment above: sparse arrays (arrays without all the keys defined - with "holes" in them) will not work as expected - we would lose the keys and "condense" the array.
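A tiny illustration of that caveat (my own example, not from the answer):
sparse=([3]=c [10]=k)
copy=("${sparse[@]}")
declare -p copy    # declare -a copy=([0]="c" [1]="k"), the keys 3 and 10 are gone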
That being said, I do see the value for generalization: functions can thus get the arrays (or copies) without knowing their names:
for ~"copies": this technique is good enough, just need to keep aware, that the indices (keys) are gone.
for real copies:
we can use an eval for the keys, for example:
eval local keys=(\${!$1})
and then a loop using them to create a copy.
Note: here ! is not used in its previous indirect/double-expansion role; rather, in array context it returns the array indices (keys).
and, of course, if we were to pass the descTable and optsTable strings (without [@]), we could use the array itself (as in by reference) with eval, for a generic function that accepts arrays (a sketch follows below).
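Here is a minimal sketch of that last idea (all names are mine and purely illustrative): the function receives the bare array name, uses eval to fetch the keys, and then copies key/value pairs one by one, so sparse indices survive:
copy_with_keys() {
eval "local keys=(\"\${!$1[@]}\")"    # the indices (keys) of the source array
local k
copied=()    # result is left in the global array 'copied'
for k in "${keys[@]}"; do
eval "copied[\$k]=\${$1[\$k]}"
done
}
sparse2=([2]=two [7]=seven)
copy_with_keys sparse2
declare -p copied    # declare -a copied=([2]="two" [7]="seven")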
The basic problem here is that the bash developer(s) that designed/implemented arrays really screwed the pooch. They decided that ${array} was just short hand for ${array[0]}, which was a bad mistake. Especially when you consider that ${array[0]} has no meaning and evaluates to the empty string if the array type is associative.
Assigning an array takes the form array=(value1 ... valueN) where value has the syntax [subscript]=string, thereby assigning a value directly to a particular index in the array. This makes it so there can be two types of arrays, numerically indexed and hash indexed (called associative arrays in bash parlance). It also makes it so that you can create sparse numerically indexed arrays. Leaving off the [subscript]= part is short hand for a numerically indexed array, starting with the ordinal index of 0 and incrementing with each new value in the assignment statement.
Therefore, ${array} should evaluate to the entire array, indexes and all. It should evaluate to the inverse of the assignment statement. Any third year CS major should know that. In that case, this code would work exactly as you might expect it to:
declare -A foo bar
foo=${bar}
Then, passing arrays by value to functions and assigning one array to another would work as the rest of the shell syntax dictates. But because they didn't do this right, the assignment operator = doesn't work for arrays, and arrays can't be passed by value to functions or to subshells or output in general (echo ${array}) without code to chew through it all.
So, if it had been done right, then the following example would show how the usefulness of arrays in bash could be substantially better:
simple=(first=one second=2 third=3)
echo ${simple}
the resulting output should be:
(first=one second=2 third=3)
Then, arrays could use the assignment operator, and be passed by value to functions and even other shell scripts. Easily stored by outputting to a file, and easily loaded from a file into a script.
declare -A foo
read foo <file
Alas, we have been let down by an otherwise superlative bash development team.
As such, to pass an array to a function, there is really only one option, and that is to use the nameref feature:
function funky() {
local -n ARR
ARR=$1
echo "indexes: ${!ARR[#]}"
echo "values: ${ARR[#]}"
}
declare -A HASH
HASH=([foo]=bar [zoom]=fast)
funky HASH # notice that I'm just passing the word 'HASH' to the function
will result in the following output:
indexes: foo zoom
values: bar fast
Since this is passing by reference, you can also assign to the array in the function. Yes, the array being referenced has to have a global scope, but that shouldn't be too big a deal, considering that this is shell scripting. To pass an associative or sparse indexed array by value to a function requires throwing all the indexes and the values onto the argument list (not too useful if it's a large array) as single strings like this:
funky "${!array[*]}" "${array[*]}"
and then writing a bunch of code inside the function to reassemble the array.
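For completeness, a rough sketch of that reassembly (my own illustration, not the answer's code; it assumes neither keys nor values contain whitespace, which is exactly why the author calls this approach not too useful):
rebuild() {
local -a keys=($1)    # word-split the key string
local -a vals=($2)    # word-split the value string
local -A copy=()
local i
for i in "${!keys[@]}"; do
copy[${keys[i]}]=${vals[i]}
done
declare -p copy
}
declare -A HASH2=([foo]=bar [zoom]=fast)
rebuild "${!HASH2[*]}" "${HASH2[*]}"    # prints something like: declare -A copy=([zoom]="fast" [foo]="bar" )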
DevSolar's answer has one point I don't understand (maybe he has a specific reason to do so, but I can't think of one): he sets the array from the positional parameters element by element, iteratively.
An easier approach would be
called_function()
{
...
# do everything like shown by DevSolar
...
# now get a copy of the positional parameters
local_array=("$#")
...
}
An easy way to pass several arrays as parameter is to use a character-separated string. You can call your script like this:
./myScript.sh "value1;value2;value3" "somethingElse" "value4;value5" "anotherOne"
Then, you can extract it in your code like this:
myArray=$1
IFS=';' read -a myArray <<< "$myArray"
myOtherArray=$3
IFS=';' read -a myOtherArray <<< "$myOtherArray"
This way, you can actually pass multiple arrays as parameters, and they don't have to be the last parameters.
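A hedged sketch of the same idea used with a function instead of a script, including one way to build the joined strings from existing arrays (all names are illustrative):
print_joined() {
local -a first second
IFS=';' read -r -a first <<< "$1"
IFS=';' read -r -a second <<< "$2"
printf 'first: %s\n' "${first[@]}"
printf 'second: %s\n' "${second[@]}"
}
arr1=(value1 value2 value3); arr2=(value4 value5)
joined1=$(IFS=';'; echo "${arr1[*]}")    # value1;value2;value3
joined2=$(IFS=';'; echo "${arr2[*]}")
print_joined "$joined1" "$joined2"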
function aecho {
set "$1[$2]"    # replace the positional parameters with the single word "arrayname[index]"
echo "${!1}"    # indirect expansion then looks up that array element
}
Example
$ foo=(dog cat bird)
$ aecho foo 1
cat
Modern bash (apparently version 4.3 or later) allows you to pass arrays by reference. I'll show that below. If you'd like to manually serialize and deserialize the arrays instead, see my answer here for bash regular "indexed" arrays, and here for bash associative arrays. Passing arrays by reference, as shown below, is much easier and more concise, however, so that's what I now recommend.
The code below is also available online in my eRCaGuy_hello_world repo here: array_pass_as_bash_parameter_by_reference.sh. See also this example here: array_pass_as_bash_parameter_2_associative.sh.
Here is a demo for regular bash arrays:
function foo {
# declare a local **reference variable** (hence `-n`) named `data_ref`
# which is a reference to the value stored in the first parameter
# passed in
local -n data_ref="$1"
echo "${data_ref[0]}"
echo "${data_ref[1]}"
}
# declare a regular bash "indexed" array
declare -a data
data+=("Fred Flintstone")
data+=("Barney Rubble")
foo "data"
Sample output:
Fred Flintstone
Barney Rubble
...and here is a demo for associative bash arrays (ie: bash hash tables, "dictionaries", or "unordered maps"):
function foo {
# declare a local **reference variable** (hence `-n`) named `data_ref`
# which is a reference to the value stored in the first parameter
# passed in
local -n data_ref="$1"
echo "${data_ref["a"]}"
echo "${data_ref["b"]}"
}
# declare a bash associative array
declare -A data
data["a"]="Fred Flintstone"
data["b"]="Barney Rubble"
foo "data"
Sample output:
Fred Flintstone
Barney Rubble
References:
I modified the above code samples from @Todd Lehman's answer here: How to pass an associative array as argument to a function in Bash?
See also my manual serializing/deserializing answer here
And see my follow-up Question here: Why do the man bash pages state the declare and local -n attribute "cannot be applied to array variables", and yet it can?
This one works even with spaces:
format="\t%2s - %s\n"
function doAction
{
local_array=("$#")
for (( i = 0 ; i < ${#local_array[#]} ; i++ ))
do
printf "${format}" $i "${local_array[$i]}"
done
echo -n "Choose: "
option=""
read -n1 option
echo ${local_array[option]}
return
}
#the call:
doAction "${tools[#]}"
With a few tricks you can actually pass named parameters to functions, along with arrays.
The method I developed allows you to access parameters passed to a function like this:
testPassingParams() {
@var hello
l=4 @array anArrayWithFourElements
l=2 @array anotherArrayWithTwo
@var anotherSingle
@reference table # references only work in bash >=4.3
@params anArrayOfVariedSize
test "$hello" = "$1" && echo correct
#
test "${anArrayWithFourElements[0]}" = "$2" && echo correct
test "${anArrayWithFourElements[1]}" = "$3" && echo correct
test "${anArrayWithFourElements[2]}" = "$4" && echo correct
# etc...
#
test "${anotherArrayWithTwo[0]}" = "$6" && echo correct
test "${anotherArrayWithTwo[1]}" = "$7" && echo correct
#
test "$anotherSingle" = "$8" && echo correct
#
test "${table[test]}" = "works"
table[inside]="adding a new value"
#
# I'm using * just in this example:
test "${anArrayOfVariedSize[*]}" = "${*:10}" && echo correct
}
fourElements=( a1 a2 "a3 with spaces" a4 )
twoElements=( b1 b2 )
declare -A assocArray
assocArray[test]="works"
testPassingParams "first" "${fourElements[#]}" "${twoElements[#]}" "single with spaces" assocArray "and more... " "even more..."
test "${assocArray[inside]}" = "adding a new value"
In other words, not only can you call your parameters by their names (which makes for much more readable code), you can actually pass arrays (and references to variables - this feature works only in bash 4.3 though)! Plus, the mapped variables are all in the local scope, just as $1 (and others).
The code that makes this work is pretty light and works both in bash 3 and bash 4 (these are the only versions I've tested it with). If you're interested in more tricks like this that make developing with bash much nicer and easier, you can take a look at my Bash Infinity Framework, the code below was developed for that purpose.
Function.AssignParamLocally() {
local commandWithArgs=( $1 )
local command="${commandWithArgs[0]}"
shift
if [[ "$command" == "trap" || "$command" == "l="* || "$command" == "_type="* ]]
then
paramNo+=-1
return 0
fi
if [[ "$command" != "local" ]]
then
assignNormalCodeStarted=true
fi
local varDeclaration="${commandWithArgs[1]}"
if [[ $varDeclaration == '-n' ]]
then
varDeclaration="${commandWithArgs[2]}"
fi
local varName="${varDeclaration%%=*}"
# var value is only important if making an object later on from it
local varValue="${varDeclaration#*=}"
if [[ ! -z $assignVarType ]]
then
local previousParamNo=$(expr $paramNo - 1)
if [[ "$assignVarType" == "array" ]]
then
# passing array:
execute="$assignVarName=( \"\${#:$previousParamNo:$assignArrLength}\" )"
eval "$execute"
paramNo+=$(expr $assignArrLength - 1)
unset assignArrLength
elif [[ "$assignVarType" == "params" ]]
then
execute="$assignVarName=( \"\${#:$previousParamNo}\" )"
eval "$execute"
elif [[ "$assignVarType" == "reference" ]]
then
execute="$assignVarName=\"\$$previousParamNo\""
eval "$execute"
elif [[ ! -z "${!previousParamNo}" ]]
then
execute="$assignVarName=\"\$$previousParamNo\""
eval "$execute"
fi
fi
assignVarType="$__capture_type"
assignVarName="$varName"
assignArrLength="$__capture_arrLength"
}
Function.CaptureParams() {
__capture_type="$_type"
__capture_arrLength="$l"
}
alias @trapAssign='Function.CaptureParams; trap "declare -i \"paramNo+=1\"; Function.AssignParamLocally \"\$BASH_COMMAND\" \"\$@\"; [[ \$assignNormalCodeStarted = true ]] && trap - DEBUG && unset assignVarType && unset assignVarName && unset assignNormalCodeStarted && unset paramNo" DEBUG; '
alias @param='@trapAssign local'
alias @reference='_type=reference @trapAssign local -n'
alias @var='_type=var @param'
alias @params='_type=params @param'
alias @array='_type=array @param'
Just to add to the accepted answer, as I found it doesn't work well if the array contents are something like:
RUN_COMMANDS=(
"command1 param1... paramN"
"command2 param1... paramN"
)
In this case, each member of the array gets split, so the array the function sees is equivalent to:
RUN_COMMANDS=(
"command1"
"param1"
...
"command2"
...
)
To get this case to work, the way I found is to pass the variable name to the function, then use eval:
function run_list() {    # function name is illustrative
eval 'COMMANDS=( "${'"$1"'[@]}" )'
for COMMAND in "${COMMANDS[@]}"; do
echo $COMMAND
done
}
run_list RUN_COMMANDS
Just my 2¢
As ugly as it is, here is a workaround that works as long as you aren't passing an array explicitly, but a variable corresponding to an array:
function passarray()
{
eval array_internally=("$(echo '${'$1'[@]}')")
# access array now via array_internally
echo "${array_internally[@]}"
#...
}
array=(0 1 2 3 4 5)
passarray array # echo's (0 1 2 3 4 5) as expected
I'm sure someone can come up with a cleaner implementation of the idea, but I've found this to be a better solution than passing an array as "${array[@]}" and then accessing it internally using array_inside=("$@"). This becomes complicated when there are other positional/getopts parameters. In these cases, I've had to first determine and then remove the parameters not associated with the array using some combination of shift and array element removal.
A purist perspective likely views this approach as a violation of the language, but pragmatically speaking, this approach has saved me a whole lot of grief. On a related topic, I also use eval to assign an internally constructed array to a variable named according to a parameter target_varname I pass to the function:
eval $target_varname=$"(${array_inside[@]})"
Hope this helps someone.
My short answer is:
function display_two_array {
local arr1=$1
local arr2=$2
for i in $arr1
do
echo "arrary1: $i"
done
for i in $arr2
do
echo "arrary2: $i"
done
}
test_array=(1 2 3 4 5)
test_array2=(7 8 9 10 11)
display_two_array "${test_array[*]}" "${test_array2[*]}"
Note that ${test_array[*]} and ${test_array2[*]} must be surrounded by double quotes, otherwise the call will fail.
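A quick illustration (my addition) of why those quotes matter: without them, the joined string is word-split before the function ever sees it, so each element arrives as its own positional parameter:
show_count() { echo "got $# argument(s): $*"; }
test_array=(1 2 3 4 5)
show_count "${test_array[*]}"    # got 1 argument(s): 1 2 3 4 5
show_count ${test_array[*]}      # got 5 argument(s): 1 2 3 4 5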
The answer below shows you how to pass bash regular "indexed" arrays as parameters to a function essentially by serializing and deserializing them.
To see this manual serializing/deserializing for bash associative arrays (hash tables) instead of for regular indexed arrays, see my answer here.
For a better way (which requires bash version 4.3 or later, I think), which passes the arrays by reference, see the link just above and my other answer here.
Passing arrays by reference is much easier and more-concise, so that's what I now recommend. That being said, the manual serializing/deserializing techniques I show below are also extremely informative and useful.
Quick summary:
See the 3 separate function definitions below. I go over how to pass:
one bash array to a function
two or more bash arrays to a function, and
two or more bash arrays plus additional arguments (before or after the arrays) to a function.
12 years later and I still don't see any answers I really like here and which I would consider to be thorough enough, simple enough, and "canonical" enough for me to just use--answers which I can come back to again and again and copy and paste and expand when needed. So, here is my answer which I do consider to be all of these things.
How to pass bash arrays as parameters to bash functions
You might also call this "variadic argument parsing in bash functions or scripts", especially since the number of elements in each array passed in to the examples below can dynamically vary, and in bash the elements of an array essentially get passed to the function as separate input parameters even when the array is passed in via a single array expansion argument like this "${array1[@]}".
For all example code below, assume you have these two bash arrays for testing:
array1=()
array1+=("one")
array1+=("two")
array1+=("three")
array2=("four" "five" "six" "seven" "eight")
The code above and below is available in my bash/array_pass_as_bash_parameter.sh file in my eRCaGuy_hello_world repo on GitHub.
Example 1: how to pass one bash array to a function
To pass an array to a bash function, you have to pass all of its elements separately. Given bash array array1, the syntax to obtain all elements of this array is "${array1[@]}". Since all incoming parameters to a bash function or executable file get wrapped up in the magic bash input parameter array called @, you can read all members of the input array with the "$@" syntax, as shown below.
Function definition:
# Print all elements of a bash array.
# General form:
# print_one_array array1
# Example usage:
# print_one_array "${array1[#]}"
print_one_array() {
for element in "$#"; do
printf " %s\n" "$element"
done
}
Example usage:
echo "Printing array1"
# This syntax passes all members of array1 as separate input arguments to
# the function
print_one_array "${array1[#]}"
Example Output:
Printing array1
one
two
three
Example 2: how to pass two or more bash arrays to a function...
(and how to recapture the input arrays as separate bash arrays again)
Here, we need to differentiate which incoming parameters belong to which array. To do this, we need to know the size of each array, meaning the number of elements in each array. This is very similar to passing arrays in C, where we also generally must know the array length passed to any C function. Given bash array array1, the number of elements in it can be obtained with "${#array1[@]}" (notice the usage of the # symbol). In order to know where in the input arguments the array_len length parameter is, we must always pass the array length parameter for each array before passing the individual array elements, as shown below.
In order to parse the arrays, I use array slicing on the input argument array, @.
Here is a reminder on how bash array slicing syntax works (from my answer here). In the slicing syntax ${@:start:length}, the 1st number is the zero-based index to start slicing from, and the 2nd number is the number of elements to grab:
# array slicing basic format 1: grab a certain length starting at a certain
# index
echo "${#:2:5}"
# │ │
# │ └────> slice length
# └──────> slice starting index (zero-based)
# array slicing basic format 2: grab all remaining array elements starting at a
# certain index through to the end
echo "${#:2}"
# │
# │
# └──────> slice starting index (zero-based)
Also, in order to force the sliced parameters from the input array to become a new array, I surround them in parentheses (), like this, for example ("${@:$i:$array1_len}"). Those parentheses on the outside are important, again, because that's how we make an array in bash.
This example below only accepts two bash arrays, but following the given patterns it can be easily adapted to accept any number of bash arrays as arguments.
Function definition:
# Print all elements of two bash arrays.
# General form (notice length MUST come before the array in order
# to be able to parse the args!):
# print_two_arrays array1_len array1 array2_len array2
# Example usage:
# print_two_arrays "${#array1[#]}" "${array1[#]}" \
# "${#array2[#]}" "${array2[#]}"
print_two_arrays() {
# For debugging: print all input args
echo "All args to 'print_two_arrays':"
print_one_array "$#"
i=1
# Read array1_len into a variable
array1_len="${#:$i:1}"
((i++))
# Read array1 into a new array
array1=("${#:$i:$array1_len}")
((i += $array1_len))
# Read array2_len into a variable
array2_len="${#:$i:1}"
((i++))
# Read array2 into a new array
array2=("${#:$i:$array2_len}")
((i += $array2_len))
# Print the two arrays
echo "array1:"
print_one_array "${array1[#]}"
echo "array2:"
print_one_array "${array2[#]}"
}
Example usage:
echo "Printing array1 and array2"
print_two_arrays "${#array1[#]}" "${array1[#]}" "${#array2[#]}" "${array2[#]}"
Example Output:
Printing array1 and array2
All args to 'print_two_arrays':
3
one
two
three
5
four
five
six
seven
eight
array1:
one
two
three
array2:
four
five
six
seven
eight
Example 3: pass two bash arrays plus some extra args after that to a function
This is a tiny expansion of the example above. It also uses bash array slicing, just like the example above. Instead of stopping after parsing two full input arrays, however, we continue and parse a couple more arguments at the end. This pattern can be continued indefinitely for any number of bash arrays and any number of additional arguments, accommodating any input argument order, so long as the length of each bash array comes just before the elements of that array.
Function definition:
# Print all elements of two bash arrays, plus two extra args at the end.
# General form (notice length MUST come before the array in order
# to be able to parse the args!):
# print_two_arrays_plus_extra_args array1_len array1 array2_len array2 \
# extra_arg1 extra_arg2
# Example usage:
# print_two_arrays_plus_extra_args "${#array1[#]}" "${array1[#]}" \
# "${#array2[#]}" "${array2[#]}" "hello" "world"
print_two_arrays_plus_extra_args() {
i=1
# Read array1_len into a variable
array1_len="${#:$i:1}"
((i++))
# Read array1 into a new array
array1=("${#:$i:$array1_len}")
((i += $array1_len))
# Read array2_len into a variable
array2_len="${#:$i:1}"
((i++))
# Read array2 into a new array
array2=("${#:$i:$array2_len}")
((i += $array2_len))
# You can now read the extra arguments all at once and gather them into a
# new array like this:
extra_args_array=("${@:$i}")
# OR you can read the extra arguments individually into their own variables
# one-by-one like this
extra_arg1="${@:$i:1}"
((i++))
extra_arg2="${@:$i:1}"
((i++))
# Print the output
echo "array1:"
print_one_array "${array1[#]}"
echo "array2:"
print_one_array "${array2[#]}"
echo "extra_arg1 = $extra_arg1"
echo "extra_arg2 = $extra_arg2"
echo "extra_args_array:"
print_one_array "${extra_args_array[#]}"
}
Example usage:
echo "Printing array1 and array2 plus some extra args"
print_two_arrays_plus_extra_args "${#array1[#]}" "${array1[#]}" \
"${#array2[#]}" "${array2[#]}" "hello" "world"
Example Output:
Printing array1 and array2 plus some extra args
array1:
one
two
three
array2:
four
five
six
seven
eight
extra_arg1 = hello
extra_arg2 = world
extra_args_array:
hello
world
References:
I referenced a lot of my own sample code from my eRCaGuy_hello_world repo here:
array_practice.sh
array_slicing_demo.sh
[my answer on bash array slicing] Unix & Linux: Bash: slice of positional parameters
An answer to my question on "How can I create and use a backup copy of all input args ("$@") in bash?" - very useful for general array manipulation of the input argument array
An answer to "How to pass array as an argument to a function in Bash", which confirmed to me this really important concept that:
You cannot pass an array, you can only pass its elements (i.e. the expanded array).
See also:
[another answer of mine on this topic] How to pass array as an argument to a function in Bash
Requirement: Function to find a string in an array.
This is a slight simplification of DevSolar's solution in that it uses the arguments passed rather than copying them.
myarray=('foobar' 'foxbat')
function isInArray() {
local item=$1
shift
for one in $@; do
if [ $one = $item ]; then
return 0 # found
fi
done
return 1 # not found
}
var='foobar'
if isInArray $var ${myarray[@]}; then
echo "$var found in array"
else
echo "$var not found in array"
fi
You can also create a json file with an array, and then parse that json file with jq
For example:
my-array.json:
{
"array": ["item1","item2"]
}
script.sh:
ARRAY=$(jq -r '."array"' $1 | tr -d '[],"')
And then call the script like:
script.sh ./path-to-json/my-array.json
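An alternative sketch (my own, assuming the elements contain no newlines): let jq emit one element per line and read that into a real bash array instead of stripping brackets with tr:
readarray -t ARRAY < <(jq -r '.array[]' "$1")
printf '%s\n' "${ARRAY[@]}"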
You can find more ideas in this similar question: How to pass array as an argument to a function in Bash
